Once again, it’s Enumerating Enumerable time! This is the latest in my series of articles where I set out to make better documentation for Ruby’s Enumerable module than Ruby-Doc.org’s. In this installment, I cover the grep method.
In case you missed any of the previous articles, they’re listed and linked below:
- all?
- any?
- collect / map
- count
- cycle
- detect / find
- drop
- drop_while
- each_cons
- each_slice
- each_with_index
- entries / to_a
- find_all / select
- find_index
- first
Enumerable#grep Quick Summary
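In the simplest possible terms | Which items in the collection are a === match for a given value? |
---|---|
Ruby version | 1.8 and 1.9 |
Expects | An argument to compare against each item using ===, plus an optional block. |
Returns | An array of the items for which argument === item is true or, if a block is given, an array of the block’s results for those items. |
Ruby-Doc.org’s entry | Enumerable#grep |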
Enumerable#grep, Regular Expressions and Arrays
The grep method’s name implies regular expressions, and that’s one of its uses. When given a regular expression as an argument and used without a block, grep returns an array containing the items in the original array that match the given regular expression.
```ruby
# Here's a list of countries, some of them with "stan" in their names.
#
# I'm including Stan Lee, creator of many wonderful superhero comics, simply because
# he's cool enough to be his own country.
countries = ["Afghanistan", "Burkina Faso", "Kazakhstan", "France", "Tajikistan",
             "Iceland", "Uzbekistan", "Australia", "Stan Lee"]
=> ["Afghanistan", "Burkina Faso", "Kazakhstan", "France", "Tajikistan", "Iceland", "Uzbekistan", "Australia", "Stan Lee"]

# Which countries have the string "stan" in their names?
countries.grep(/stan/)
=> ["Afghanistan", "Kazakhstan", "Tajikistan", "Uzbekistan"]

# Note that "Stan Lee" wasn't included in that list. "Stan" and "stan" aren't the
# same thing, but that's easy to fix with a character class that matches
# either "S" or "s":
countries.grep(/[Ss]tan/)
=> ["Afghanistan", "Kazakhstan", "Tajikistan", "Uzbekistan", "Stan Lee"]
```
When a block is used with grep, each item in the result array is passed through the block, and an array of the block’s return values is returned. Think of it as grep followed by a collect/map operation.
```ruby
# Let's get a look at those countries with "Stan" and "stan" in their names again:
countries.grep(/[Ss]tan/)
=> ["Afghanistan", "Kazakhstan", "Tajikistan", "Uzbekistan", "Stan Lee"]

# Let's get the lengths of the names of those countries:
countries.grep(/[Ss]tan/) {|country| country.length}
=> [11, 10, 10, 10, 8]

# It's a slightly shorter version of this:
countries.grep(/[Ss]tan/).map {|country| country.length}
=> [11, 10, 10, 10, 8]

# This time, let's find all the "stans" and uppercase them
countries.grep(/[Ss]tan/) {|country| country.upcase}
=> ["AFGHANISTAN", "KAZAKHSTAN", "TAJIKISTAN", "UZBEKISTAN", "STAN LEE"]

# And here's the version that uses map:
countries.grep(/[Ss]tan/).map {|country| country.upcase}
=> ["AFGHANISTAN", "KAZAKHSTAN", "TAJIKISTAN", "UZBEKISTAN", "STAN LEE"]
```
What Enumerable#grep Really Does: The === Operator
Here’s grep’s secret: it takes each item in the collection, compares it against the given argument using Ruby’s === (the “triple equals”) operator by evaluating argument === item, and returns an array of the items for which that comparison returns true.
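Here’s a quick way to see that for yourself: a minimal sketch that compares grep against select with an explicit === test. Note that the argument goes on the left-hand side of ===, so it’s the argument’s class, not the item’s, that decides what counts as a match. The country list is just a shortened version of the one above, and the variable names are only for illustration.

```ruby
countries = ["Afghanistan", "Burkina Faso", "Kazakhstan", "France", "Stan Lee"]
pattern = /[Ss]tan/

# Without a block, grep keeps the items for which pattern === item is true
via_grep   = countries.grep(pattern)
via_select = countries.select { |item| pattern === item }
p via_grep                 # => ["Afghanistan", "Kazakhstan", "Stan Lee"]
p via_grep == via_select   # => true

# With a block, grep maps the block over those same items
lengths_via_grep = countries.grep(pattern) { |item| item.length }
lengths_via_map  = countries.select { |item| pattern === item }.map { |item| item.length }
p lengths_via_grep                     # => [11, 10, 8]
p lengths_via_grep == lengths_via_map  # => true
```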
For regular expressions, the === operator is grep-like: the expression r === s returns true if there is a match for regular expression r in string s.
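Here’s Regexp#=== on its own, plus what happens when grep meets items that aren’t strings. This is a small sketch with made-up sample data; the point is that a non-string item such as 42 or nil simply fails to match rather than raising an error.

```ruby
stan = /stan/

# Regexp#=== is true when the regular expression matches somewhere in the string
p(stan === "Kazakhstan")  # => true
p(stan === "France")      # => false

# For non-string items like 42 and nil, Regexp#=== just returns false,
# so grep quietly skips them
mixed = ["Kazakhstan", 42, nil, "Uzbekistan"]
p mixed.grep(stan)        # => ["Kazakhstan", "Uzbekistan"]
```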
Different classes implement === differently. For example, in the Range class, === is used to see if an item is within the range: the expression r === x returns true if x is in range r. Here’s grep in action when its argument is a range:
```ruby
# These are some of the years in which the band Radiohead released an album
radiohead_album_years = [1993, 1995, 1997, 2000, 2003, 2007]
=> [1993, 1995, 1997, 2000, 2003, 2007]

# And these are the ones between 1996 and 2002 inclusive
radiohead_album_years.grep((1996..2002))
=> [1997, 2000]
```
Generally speaking, collection.grep(thing_to_compare) compares thing_to_compare with each item in collection using the === operator as defined for thing_to_compare’s class. It returns an array of those items in collection for which the comparison returned true.
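Classes themselves are a handy case. Class#=== returns true if the item is an instance of that class (or one of its subclasses), so grep can pick out items by type, while String#=== is plain equality. A minimal sketch, with an arbitrary mixed array as sample data:

```ruby
stuff = [1, "two", 3.0, :four, nil, 5]

# Class#=== checks whether each item is an instance of the class (or a subclass)
p stuff.grep(Numeric)  # => [1, 3.0, 5]
p stuff.grep(String)   # => ["two"]
p stuff.grep(Symbol)   # => [:four]

# String#=== is plain equality, so only an exact match gets through
p stuff.grep("two")    # => ["two"]
```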
Don’t forget that the extra processing, a map operation, comes “free” if you provide grep with a block:
```ruby
radiohead_album_years = [1993, 1995, 1997, 2000, 2003, 2007]
=> [1993, 1995, 1997, 2000, 2003, 2007]

# Adding a block performs a map operation on grep's initial results
radiohead_album_years.grep((1996..2002)) {|year| year % 2 == 1 ? "odd" : "even" }
=> ["odd", "even"]
```
Enumerable#grep and Hashes
I’ll put it simply: Enumerable#grep isn’t terribly useful with hashes. Like most Enumerable methods, when grep iterates through a hash, it sees each key-value pair as a two-element array whose first element is the key and whose second element is the corresponding value.
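To see the pairs grep is actually working with, convert a shortened version of the country-area hash with to_a. This sketch assumes Ruby 1.9 or later, where hashes remember insertion order; in 1.8 the pairs could come out in a different order. It also leans on Class#===, which checks whether an object is an instance of a class, so Array === pair is true for every pair.

```ruby
areas = {"Afghanistan" => 647_500, "France" => 547_030}

# Each key-value pair becomes a two-element [key, value] array
p areas.to_a          # => [["Afghanistan", 647500], ["France", 547030]]

# Class#=== checks whether each pair is an Array, which they all are...
p areas.grep(Array)   # => [["Afghanistan", 647500], ["France", 547030]]

# ...but a regular expression never matches an Array, so this comes back empty
p areas.grep(/stan/)  # => []
```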
As I mentioned earlier, grep uses the === operator to do its comparison, and for arrays, === returns true only when both arrays are equal, meaning they have the same elements in the same order:
```ruby
# Identical arrays
[1, 2] === [1, 2]
=> true

# How about the first array as a subset of the second?
[1] === [1, 2]
=> false

# How about the first array as a superset of the second?
[1, 2, 3] === [1, 2]
=> false

# How about one array as a permutation of the other?
[2, 1] === [1, 2]
=> false
```
The practical upshot of all this is that for hashes, grep will return the empty array [] for most arguments, with the notable exception of a two-element array that exactly matches one of the hash’s key-value pairs.
That was a bit wordy, but an example should clear things right up:
```ruby
# These are countries and their total areas (not counting outside territories)
# in square kilometres.
total_country_areas = {"Afghanistan" => 647_500, "Burkina Faso" => 274_200,
                       "Kazakhstan" => 2_717_300, "France" => 547_030}
=> {"Afghanistan"=>647500, "Burkina Faso"=>274200, "Kazakhstan"=>2717300, "France"=>547030}

# Is there a '"Burkina Faso" => 274200' item in the hash?
total_country_areas.grep(["Burkina Faso", 274_200])
=> [["Burkina Faso", 274200]]

# That worked because the array argument we provided was an exact match
# for one of the items in the hash when it is converted into an array.

# Is there a '"Burkina Faso" => 0' item in the hash?
total_country_areas.grep(["Burkina Faso", 0])
=> []

# That didn't work because the array argument didn't correspond to any of the items
# in the hash.
```
Making Hashes grep-able
If you need to find which keys in a hash pattern-match a given value, use the Hash#keys method (which returns an array of the hash’s keys) and grep that:
```ruby
# Again with the countries and the areas...
total_country_areas = {"Afghanistan" => 647_500, "Burkina Faso" => 274_200,
                       "Kazakhstan" => 2_717_300, "France" => 547_030}
=> {"Afghanistan"=>647500, "Burkina Faso"=>274200, "Kazakhstan"=>2717300, "France"=>547030}

# Which ones are the "stans"?
total_country_areas.keys.grep(/stan/)
=> ["Afghanistan", "Kazakhstan"]
```
If you need to find which values in a hash pattern-match a given value, use the Hash#values method (which returns an array of the hash’s values) and grep that:
```ruby
# Of the countries' total areas, which ones are between
# 500,000 and 1 million square km?
total_country_areas.values.grep((500_000..1_000_000))
=> [647500, 547030]
```
What if you want to find key-value pairs where either the key or the value is a === match for a given argument? There’s a way to do that, and I’ll cover it when we get to the Enumerable#inject method. It’ll be soon, I promise!
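Until then, here’s one possible stopgap: a minimal sketch that uses select instead of inject. The name grep_pairs is a hypothetical helper made up for this example, not a Ruby method, and the sketch assumes Ruby 1.9 or later.

```ruby
# Hypothetical helper (not part of Ruby): returns the [key, value] pairs where
# either the key or the value is a === match for pattern
def grep_pairs(hash, pattern)
  hash.to_a.select { |key, value| pattern === key || pattern === value }
end

total_country_areas = {"Afghanistan" => 647_500, "Burkina Faso" => 274_200,
                       "Kazakhstan" => 2_717_300, "France" => 547_030}

# Pairs whose key is a "stan"
p grep_pairs(total_country_areas, /stan/)
# => [["Afghanistan", 647500], ["Kazakhstan", 2717300]]

# Pairs whose area is between 500,000 and 1 million square km
p grep_pairs(total_country_areas, 500_000..1_000_000)
# => [["Afghanistan", 647500], ["France", 547030]]
```

Calling to_a before select keeps the result an array of pairs, like grep’s own output, since Hash#select returns a hash in Ruby 1.9 and later.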