Ruby Regex

Xavier Carty
5 min read · Mar 31, 2020



Regex can seem cryptic at first glance, but once you become accustomed to it, it can be a powerful tool that makes your life easier. I struggled to understand regex at first, but once I practiced, it became very helpful.

Ruby regular expressions help you find specific patterns inside strings, with the intent of extracting data for further processing.

For example, in my opinion the friendliest way to check whether a string matches a regex is the =~ operator:

# Find the word 'own'
"Do you own cats?" =~ /own/

Source: RubyGuides

The =~ operator returns the index where the first match begins, or nil if there is no match.
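A minimal sketch of both outcomes, using the same sentence as above (the variable names are just for illustration). Because nil is falsy in Ruby, the result of =~ can be used directly in a condition:

```ruby
sentence = "Do you own cats?"

# =~ returns the index where the first match starts...
index = sentence =~ /own/
p index # prints 7

# ...or nil when the pattern is not found
missing = sentence =~ /dog/
p missing # prints nil

# nil is falsy, so =~ works directly in an if/unless
puts "Found 'own' at index #{index}" if sentence =~ /own/
```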

Another way to check if a string matches a regex is to use the match method:

if "Do you own cats?".match(/own/)
  puts "Match found!"
end

Source: RubyGuides
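A short sketch of what match actually gives back: a MatchData object on success (truthy) or nil on failure. The MatchData methods shown (begin, pre_match, post_match) are standard Ruby; match? is available from Ruby 2.4 onward and returns only true or false:

```ruby
m = "Do you own cats?".match(/own/)

p m.class      # prints MatchData
p m[0]         # prints "own" (the matched text)
p m.begin(0)   # prints 7 (index where the match starts)
p m.pre_match  # prints "Do you "
p m.post_match # prints " cats?"

# No match gives nil, which is why match works in an if
p "Do you own cats?".match(/dog/) # prints nil

# match? skips building MatchData when you only need yes/no
p "Do you own cats?".match?(/own/) # prints true
```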

The scan method returns an array of all items in your string that match a given Regular Expression. For example:

"The rain in Spain lies mainly in the plain".scan(/\w+ain/)
# => ["rain", "Spain", "main", "plain"]
Source: Flatiron School

String Substitution

Regex is also useful for string substitution with string methods such as .sub, which replaces the first match, and .gsub, which replaces every match.

For an example, see: https://www.slideshare.net/Codemotion/luca-mearelli
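Here is a small sketch of .sub and .gsub side by side, using a made-up sentence. It also shows two forms both methods accept: a block that receives each match, and a replacement string where \1 and \2 refer to capture groups (single quotes keep the backslashes literal):

```ruby
text = "cats are cute, cats are fluffy"

# sub replaces only the first match
first_only = text.sub(/cats/, "dogs")

# gsub replaces every match
every = text.gsub(/cats/, "dogs")

# With a block, each matched string is passed in and replaced by the block's result
shouted = text.gsub(/cats/) { |match| match.upcase }

# \2 and \1 in the replacement refer to the second and first capture groups
swapped = "Doe, Jane".sub(/(\w+), (\w+)/, '\2 \1')

puts first_only # prints dogs are cute, cats are fluffy
puts every      # prints dogs are cute, dogs are fluffy
puts shouted    # prints CATS are cute, CATS are fluffy
puts swapped    # prints Jane Doe
```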

In my opinion, regex becomes truly powerful once you start using character classes, anchors, groups, quantifiers, and alternation.

Character Classes

., matches any character except a newline (\n). Equivalent to [^\n]. In Ruby, the m flag (/.../m) makes it match newlines too.

\w, matches any word character (letters, digits, and underscore). In Ruby it only matches ASCII characters (no accented or non-Roman letters). Equivalent to [A-Za-z0-9_].

\d, matches any digit character (0-9). Equivalent to [0-9].

\s, matches any whitespace character (spaces, tabs, line breaks).

[ABC], matches any character in the set.

[^ABC], matches any character that is not in the set.

[A-Z], matches any character whose character code falls between the two specified characters, inclusive.
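A quick sketch that puts several of these classes to work on a made-up log line, plus the newline behavior of . with and without the m flag:

```ruby
log = "Order #42 shipped on 2020-03-31"

digits  = log.scan(/\d+/)      # \d+ : one or more digits in a row
words   = log.scan(/[A-Z]\w*/) # an uppercase letter followed by word characters
symbols = log.scan(/[^\w\s]/)  # anything that is neither a word character nor whitespace

p digits  # prints ["42", "2020", "03", "31"]
p words   # prints ["Order"]
p symbols # prints ["#", "-", "-"]

# . does not match a newline by default...
p("a\nb" =~ /a.b/)  # prints nil
# ...but does with the m flag
p("a\nb" =~ /a.b/m) # prints 0
```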

Anchors

^, matches the beginning of a line. In Ruby, ^ always matches at the start of every line (no flag changes this), so use \A to match only the beginning of the string. This matches a position, not a character.

$, matches the end of a line. In Ruby, $ always matches at the end of every line; use \z to match only the end of the string (\Z also allows a single trailing newline). This matches a position, not a character.

\b, matches a word boundary: a position between a word character and a non-word character, or between a word character and the start or end of the string. This matches a position, not a character.

\B, matches any position that is not a word boundary. This matches a position, not a character.
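A minimal sketch of the Ruby-specific anchor behavior on a two-line string: ^ and $ fire on every line, while \A and \z only fire at the edges of the whole string. The last two lines contrast \b and \B:

```ruby
text = "first line\nsecond line"

p text.scan(/^\w+/)  # prints ["first", "second"]  (start of every line)
p text.scan(/\A\w+/) # prints ["first"]            (start of the string only)
p text.scan(/\w+$/)  # prints ["line", "line"]     (end of every line)
p text.scan(/\w+\z/) # prints ["line"]             (end of the string only)

# \b: "cat" as a whole word; \B: "cat" inside a longer word
p "cat concat cats".scan(/\bcat\b/).size # prints 1
p "cat concat cats" =~ /\Bcat/           # prints 7
```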

Groups & Lookaround

(ABC), groups multiple tokens together and creates a capture group for extracting a substring or using a back reference.

\1, matches the same text that a previous capture group matched. For example, \1 repeats the match of the first capture group and \3 repeats the third.

(?:ABC), groups multiple tokens together without creating a capture group.

(?=ABC), positive lookahead: matches only if ABC comes next, without including ABC in the result.

(?!ABC), negative lookahead: matches only if ABC does not come next (if it does, that match attempt fails).
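A sketch of each construct from this section on small made-up strings. Note the scan contrast: when a pattern has capturing groups, scan returns the groups rather than the whole match, which is one practical reason to reach for (?:...):

```ruby
# Capture groups: each (...) remembers what it matched
date = "2020-03-31".match(/(\d{4})-(\d{2})-(\d{2})/)
p date.captures # prints ["2020", "03", "31"]

# Backreference: \1 must repeat exactly what group 1 matched (finds a doubled word)
doubled = "this is is a test".match(/\b(\w+) \1\b/)
p doubled[1] # prints "is"

# A capturing group makes scan return the groups...
p "gray grey".scan(/gr(a|e)y/)   # prints [["a"], ["e"]]
# ...while a non-capturing group keeps the whole match
p "gray grey".scan(/gr(?:a|e)y/) # prints ["gray", "grey"]

# Positive lookahead: digits followed by "px", without including "px"
p "10px 20em 30px".scan(/\d+(?=px)/) # prints ["10", "30"]

# Negative lookahead: "ruby" only when it is NOT followed by "gems"
p "rubyist ruby rubygems".gsub(/ruby(?!gems)/, "RUBY") # prints "RUBYist RUBY rubygems"
```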

Quantifiers & Alternation

+, matches 1 or more of the preceding token.

*, matches 0 or more of the preceding token.

{1,3}, matches the specified quantity of the previous token. {1,3} will match 1 to 3. {3} will match exactly 3. {3,} will match 3 or more.

?, matches 0 or 1 of the preceding token, effectively making it optional.

+?, *? and ??, work like the quantifiers above but are lazy, matching as few characters as possible. By default, quantifiers are greedy and match as many characters as possible.

|, acts like a boolean OR. Matches the expression before or after the |. It can operate within a group, or on a whole expression. The patterns will be tested in order.
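A small sketch of greedy versus lazy matching, a bounded quantifier, an optional token, and alternation order, all on made-up strings. String#[] with a regex returns the first matched substring (or nil):

```ruby
html = "<b>bold</b> and <i>italic</i>"

greedy = html[/<.+>/]  # .+ runs to the LAST ">" it can reach
lazy   = html[/<.+?>/] # .+? stops at the FIRST ">" that completes the match

p greedy # prints "<b>bold</b> and <i>italic</i>"
p lazy   # prints "<b>"

# {2,3}: two or three digits, as a whole word
p "1 22 333 4444".scan(/\b\d{2,3}\b/) # prints ["22", "333"]

# ?: the "u" is optional
p "color colour".scan(/colou?r/) # prints ["color", "colour"]

# |: alternatives are tried left to right, so the first one that fits wins
p "I like cats and dogs".scan(/cat|dog/) # prints ["cat", "dog"]
p "catalog"[/cat|catalog/]               # prints "cat"
```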

Capture Groups

Using parentheses in our regex creates 'groups' that we can refer to by index in the results of methods such as scan and match. For example, a phone-number regex can use three capture groups for the three sets of digits. When we scan a list of numbers, each phone number is then broken down into subgroups based on the capture groups we built into the regular expression.

Source: Flatiron School
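The original Flatiron School screenshot is not reproduced here, so this is a stand-in sketch. It assumes a hypothetical "555-123-4567" phone format and uses made-up numbers. It shows scan returning one array of groups per phone number, named groups via (?<name>...), and grep picking out matching strings from an array:

```ruby
numbers = "Call 555-123-4567 or 555-987-6543 today"

# Three capture groups: area code, exchange, line number
phone = /(\d{3})-(\d{3})-(\d{4})/

# With capture groups, scan returns one array of captured groups per match
parts = numbers.scan(phone)
p parts # prints [["555", "123", "4567"], ["555", "987", "6543"]]

# Named groups make the pieces easier to read
named = /(?<area>\d{3})-(?<exchange>\d{3})-(?<line>\d{4})/
m = numbers.match(named)
p m[:area]     # prints "555"
p m[:exchange] # prints "123"
p m[:line]     # prints "4567"

# grep keeps the array elements that match the pattern
list = ["555-123-4567", "not a number", "555-987-6543"]
p list.grep(phone) # prints ["555-123-4567", "555-987-6543"]
```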

Regex can be difficult to grasp at first, but with practice you can match almost any pattern you need!
