The layman’s guide to RegEx: What is RegEx exactly?

RegEx stands for regular expression and it is a common tool used by programmers to match patterns of string data. “WAIT, I thought this was the layman’s explanation?!” Let us continue.

A good way to think about regular expressions is as a sifter for string data. Let’s say you have a recipe for an amazing cake but your string flour is too lumpy and stuck together. The regular expression sifter lets you collect only the string flour you want so that you can finish the recipe. And you can keep adding filters to the sifter to make your flour even finer.

At the most fundamental level, a regular expression allows us to find, manipulate and collect data within strings. A string is one of the most common data types in computer science and it is usually just a sequence of characters between a pair of quotes.

"This is a string"

With regular expressions we can take a string like the one above and match certain patterns that we are looking to find. Say for instance we wanted to change every ‘s’ character to the number ‘5’.

We could use a regular expression to do that instead of re-typing the letter ‘s’ every single time it appears. This might seem like a trivial task for the above example, but imagine doing this for a research paper you have been working on or maybe even your first novel.
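As a minimal sketch in Ruby (the language used for the examples later in this post), the swap described above might look like the following. The variable names are made up for illustration, and gsub is the substitution method introduced further down.

```ruby
# The example string from above.
original = "This is a string"

# gsub returns a new string in which every match of /s/ is replaced by "5".
swapped = original.gsub(/s/, "5")

puts swapped
# prints: Thi5 i5 a 5tring
```

Every lowercase ‘s’ is replaced in one call, no matter how long the string is.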

Regular expressions make the process of matching certain data within strings much easier, and that is why they are so useful. You probably use the same idea every day without knowing it. Every time you open Google Docs or Microsoft Word and use the “Find and Replace” feature, you are doing a simple form of pattern matching, and Google Docs even has an option to search with regular expressions!

Using regular expressions

When I first started learning about regular expressions I was very intimidated because the syntax looked very complicated. You might see a regular expression that looks like this:

/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[\W]).{8,}$/

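The above is a regular expression for a password that satisfies a strict set of conditions: at least eight characters, including a digit, a lowercase letter, an uppercase letter and a non-word character such as a symbol. These are the types of regular expressions that I first saw, and they boggled my mind. Let’s start with something a little more straightforward.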

test1.png

Here we have a string of some sample song lyrics. Let’s switch up the words a little bit:

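# Read the lyrics into a string (assumes a file named 'lyrics' in the current directory).
lyrics_test = File.read('lyrics')

# Each pair is [pattern to find, text to put in its place].
replacements = [ [/club/, "pull request"], [/girl/, "labs"], [/the cut/, "my repo"], [/she/, "they"] ]

# gsub! rewrites lyrics_test in place, one pair at a time.
replacements.each { |pattern, replacement| lyrics_test.gsub!(pattern, replacement) }

puts lyrics_test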

test2.png

In this very simple example we use gsub, one of the most popular Ruby methods for working with regular expressions.

The method gsub takes two arguments, the first being the pattern you want to replace and the second being the text you want to replace it with. The code above calls gsub!, the variant that changes the string in place instead of returning a new one. In our example we have an array of pairs: each pattern we want to replace along with the value we want to replace it with.

replacements = [ [/club/, "pull request"]...

In Ruby, a regular expression is written between two forward slashes, and anything you place inside those slashes becomes your regular expression: /-ANYTHING-/. In the first iteration our regular expression is /club/ and we replace every match with the string ‘pull request’.
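To make the difference between gsub and gsub! concrete, here is a small sketch that does not need the lyrics file. The sample line is a hypothetical stand-in, not the real lyrics.

```ruby
# Hypothetical stand-in for the contents of the 'lyrics' file.
# dup gives us a copy that is safe to modify in place.
lyrics = "she was in the club".dup

# gsub returns a new string and leaves the original untouched.
swapped = lyrics.gsub(/club/, "pull request")

# gsub! changes the string itself, which is why the loop above
# never has to reassign lyrics_test.
lyrics.gsub!(/she/, "they")

puts swapped
# prints: she was in the pull request
puts lyrics
# prints: they was in the club
```

Note that a pattern like /she/ matches those three letters wherever they appear, so it would also change the middle of a longer word that happens to contain them.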

Let’s take a look at another example:

lyrics_test = File.read('lyrics')

# Pattern: a space, the letter t, exactly three more letters, then a space.
replacements = [ [/ [t][A-Za-z]{3} /, " code "] ]

replacements.each { |pattern, replacement| lyrics_test.gsub!(pattern, replacement) }

puts lyrics_test

In this example we want to replace every four-letter word that begins with the letter ‘t’ with the word ‘code’.

Looking more closely at our regular expression inside of /-ANYTHING-/ we can see that it starts with a space, which marks the start of a new word. Next we want only words that begin with the letter ‘t’. The character class [t] matches exactly that one letter (a plain t would work the same way).

Next we use a handy regular expression trick to specify any alphabetical letter. [A-Za-z] means ANY character from capital A-Z or lower case a-z. Finally we specify that we want exactly three of those characters using {3}, which together with the leading ‘t’ completes a four-letter word. The space at the end of the pattern marks the end of the word.
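Here is a small sketch of that pattern on a made-up sentence (the sample text and variable names are hypothetical). It also shows a limitation of matching on literal spaces: each match uses up the space after the word, so a second four-letter ‘t’ word directly after it is skipped, and a word at the very end of the string has no trailing space to match. One possible alternative, not used in the example above, is the word-boundary anchor \b.

```ruby
# Made-up sample sentence (hypothetical, not taken from the lyrics file).
sentence = "we took that ride to town"

# The pattern from the example: space, "t", three letters, space.
spaced = sentence.gsub(/ [t][A-Za-z]{3} /, " code ")
puts spaced
# prints: we code that ride to town

# Alternative sketch: \b matches the edge of a word without consuming a space.
bounded = sentence.gsub(/\bt[A-Za-z]{3}\b/, "code")
puts bounded
# prints: we code code ride to code
```

In both versions the two-letter word "to" is left alone, because {3} insists on exactly three letters after the ‘t’.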

test3.png

RegEx has a treasure chest of pattern matchers which allow you to match different sequences across your strings. I would highly recommend playing around with Rubular and RegExr to get more familiar. Once you get the syntax and matchers down, I promise it will begin to click!

 