The grep tool is more than 40 years old and is ubiquitous (with some variations) across Unix systems. Its full name, "global regular expression print", obscures its simple yet powerful purpose: to "search a file for a pattern".
The simplest invocation takes two arguments: the pattern and the target file. The following command:
grep hello somefile.txt
prints every line that contains the text "hello".
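To try this yourself, the sketch below creates a small, hypothetical somefile.txt (its contents are made up for illustration) in a temporary directory made with `mktemp -d`, then runs the same command. It also shows that grep matches text rather than whole words: the line "othello" counts as a match for hello.

```shell
# Work in a throwaway directory; it is deleted when the script exits.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# Hypothetical sample file, one line per printf argument.
printf '%s\n' 'hello there' 'goodbye' 'say hello again' 'othello' > somefile.txt

# Prints "hello there", "say hello again" and "othello":
# grep looks for the characters h-e-l-l-o anywhere in a line, not just as a whole word.
grep hello somefile.txt
```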
Like other Unix tools, grep accepts multiple file arguments, so it works with shell wildcard expansion. For example:
grep hello *.txt
will return all lines containing "hello" from every file in the current directory with a .txt extension. When grep is called on more than one file, as in this case, it prefixes each matching line with the name of the file in which the match was found:
a.txt:I say hello
a.txt:you say hello
b.txt:we all say hello
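The sketch below recreates that output. The two files and their contents are hypothetical, chosen so that the result matches the listing above. Note that it is the shell, not grep, that expands `*.txt` into the list of file names.

```shell
# Work in a throwaway directory; it is deleted when the script exits.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# Hypothetical input files.
printf '%s\n' 'I say hello' 'you say hello' > a.txt
printf '%s\n' 'we all say hello' 'goodbye' > b.txt

# The shell expands *.txt into "a.txt b.txt"; because grep receives two
# files, it prefixes each matching line with the name of its file.
grep hello *.txt
```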
And like most Unix tools, grep will read data piped in from another command-line tool. For example, you might want to filter a file through two grep calls. The following returns every line from the .txt files that contains both "hello" and "world":
grep hello *.txt | grep world
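Here is a small, self-contained run of that pipeline. The files and their contents are hypothetical. The last command shows that grep reads standard input whenever no file name is given.

```shell
# Work in a throwaway directory; it is deleted when the script exits.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# Hypothetical input files.
printf '%s\n' 'hello world' 'hello there' 'world peace' > a.txt
printf '%s\n' 'the world says hello' > b.txt

# The first grep keeps lines containing "hello" (prefixed with file names);
# the second grep reads that output from the pipe and keeps lines with "world".
grep hello *.txt | grep world

# With no file argument, grep filters whatever is piped into it.
printf '%s\n' 'hello world' 'hello moon' | grep world
```

Because the first grep adds the file-name prefixes, the second pattern is also tested against those names. A pattern such as `txt` would therefore match every line.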
For the following examples, imagine a file named ham.txt with these lines:
To be, or not to be: that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing end them? To die: to sleep;
No more; and by a sleep to say we end
The heart-ache and the thousand natural shocks
That flesh is heir to, 'tis a consummation
Devoutly to be wish'd. To die, to sleep;
To sleep: perchance to dream: ay, there's the rub;
For in that sleep of death what dreams may come
The -i option makes grep ignore capitalization when matching. Compare the output of these two commands:
grep "and" ham.txt
Output:
~~~
The slings and arrows of outrageous fortune,
No more; and by a sleep to say we end
The heart-ache and the thousand natural shocks
~~~
grep -i "and" ham.txt
Output:
~~~
The slings and arrows of outrageous fortune,
And by opposing end them? To die: to sleep;
No more; and by a sleep to say we end
The heart-ache and the thousand natural shocks
~~~
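To reproduce this comparison, the sketch below writes the ham.txt lines shown above into a temporary directory. It then compares the two searches with -c, which prints the number of matching lines instead of the lines themselves.

```shell
# Work in a throwaway directory; it is deleted when the script exits.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# Recreate ham.txt; quoting 'EOF' stops the shell from interpreting the text.
cat > ham.txt <<'EOF'
To be, or not to be: that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing end them? To die: to sleep;
No more; and by a sleep to say we end
The heart-ache and the thousand natural shocks
That flesh is heir to, 'tis a consummation
Devoutly to be wish'd. To die, to sleep;
To sleep: perchance to dream: ay, there's the rub;
For in that sleep of death what dreams may come
EOF

grep -c 'and' ham.txt     # case-sensitive: 3 matching lines
grep -c -i 'and' ham.txt  # case-insensitive: 4, adding "And by opposing end them?..."
```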
If you have a separate file of text patterns, the -f option lets you specify that file. grep treats each line in that file as a pattern to match against the target file, and prints every line that matches at least one of them. In this example, words.txt looks like this:
grep -f words.txt ham.txt
Output:
~~~
And by opposing end them? To die: to sleep;
The heart-ache and the thousand natural shocks
~~~
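Because the contents of words.txt aren't reproduced here, the sketch below uses its own hypothetical patterns file (patterns.txt) and target file (fruits.txt) to show how -f combines several patterns.

```shell
# Work in a throwaway directory; it is deleted when the script exits.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# Hypothetical patterns file: one pattern per line.
printf '%s\n' 'apple' 'cherry' > patterns.txt
# Hypothetical target file.
printf '%s\n' 'apple pie' 'banana split' 'cherry tart' 'pineapple' > fruits.txt

# A line is printed if it matches ANY pattern in patterns.txt:
# "apple pie", "cherry tart" and "pineapple".
grep -f patterns.txt fruits.txt
```

Watch out for blank lines in the patterns file. An empty pattern matches every line.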
The -v flag inverts the match, returning every line that does not match. The following returns all lines that do not contain a lowercase letter 'a':
grep -v a ham.txt
# GNU grep also accepts the option after the pattern: grep a -v ham.txt
Whether 'tis nobler in the mind to suffer
And by opposing end them? To die: to sleep;
Devoutly to be wish'd. To die, to sleep;
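The output above includes "And by opposing..." because -v, like the rest of grep, is case-sensitive by default. The sketch below uses a hypothetical four-word file to show that -v and -i can be combined.

```shell
# Work in a throwaway directory; it is deleted when the script exits.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# Hypothetical sample file.
printf '%s\n' 'cat' 'dog' 'Apple' 'bird' > sample.txt

# Lines WITHOUT a lowercase "a": dog, Apple, bird.
grep -v a sample.txt

# Adding -i also rejects the capital "A", leaving dog and bird.
grep -v -i a sample.txt
```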
By default, grep displays the entire line in which a match is found:
grep 'the' ham.txt
To be, or not to be: that is the question:
Whether 'tis nobler in the mind to suffer
And by opposing end them? To die: to sleep;
The heart-ache and the thousand natural shocks
To sleep: perchance to dream: ay, there's the rub;
However, if you want to see only the matched text, use the -o option:
grep -o 'the' ham.txt
the
the
the
the
the
the
the
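Because -o prints each match on its own line, piping its output to `wc -l` counts matches rather than matching lines. The one-line input below is made up for illustration. No file is needed, since grep reads the piped text.

```shell
# Prints "the" three times: the word "the", then inside "other" and "theme".
printf '%s\n' 'the other theme' | grep -o 'the'

# Counts matches: prints 3 (some wc implementations pad the number with spaces).
printf '%s\n' 'the other theme' | grep -o 'the' | wc -l

# By contrast, -c counts matching LINES, so this prints 1.
printf '%s\n' 'the other theme' | grep -c 'the'
```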
Note that the output shows every match of "the", whether it is the standalone word "the" or part of a longer word such as "Whether". By itself, this isn't very helpful, which is why we combine -o with a regular expression, as in the examples below.
The topic of regular expressions is worth a lesson of its own. Think of them as pattern matching on steroids. When doing extensive searches, you are rarely looking for exact words. Instead, you'll find yourself wanting to look for certain patterns, such as:
Regular expressions are a "mini-language" for describing such custom patterns. Regular-Expressions.info is a good (and comprehensive) place to start, but we'll cover them in another tutorial.
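By default, grep treats a pattern as a basic regular expression. With the -E option, grep interprets the pattern as an extended regular expression, in which operators such as +, ?, and | have their special meanings.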
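The difference is easy to see with the `?` operator ("the preceding character is optional"). The two input words below are chosen only for illustration.

```shell
# In a basic regular expression (grep's default), "?" is an ordinary character,
# so this searches for the literal text "colou?r" and prints 0.
printf '%s\n' 'color' 'colour' | grep -c 'colou?r'

# With -E, "u?" means "an optional u", so both lines match and this prints 2.
printf '%s\n' 'color' 'colour' | grep -cE 'colou?r'
```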
For example, to find all words that either are "the" or contain "the", use -E to specify the pattern, combined with -o to show just the matches:
grep -oE '\w*the\w*' ham.txt
the
Whether
the
them
the
there
the
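Applied to the same made-up phrase used in the -o example, the pattern now returns whole words instead of bare "the" fragments. This sketch assumes GNU grep, which understands `\w` ("word character") in patterns.

```shell
# Each match is "the" plus any word characters touching it on either side,
# so this prints "the", "other" and "theme", one per line.
printf '%s\n' 'the other theme' | grep -oE '\w*the\w*'
```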
Again, regular expression syntax is its own lesson, but \w*the\w* can be translated as: "Find the text 'the' together with any number of word characters (letters, digits, or underscores) immediately before or after it."
To find all words that begin with the letter "s", upper or lowercase, we use the -i flag with the following regular expression (\b matches a "word boundary", here the beginning of a word):
grep -oiE '\bs\w+' ham.txt
suffer
slings
sea
sleep
sleep
say
shocks
sleep
sleep
sleep
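To see what `\b` contributes, the sketch below runs the pattern with and without it on a made-up sentence. Like the examples above, it assumes GNU grep for `\b` and `\w`.

```shell
# Without \b, "s\w+" also matches an "s" in the middle of a word. This prints
# Seven, swans, swim, st (from "past") and ss (from "glass").
printf '%s\n' 'Seven swans swim past glass' | grep -oiE 's\w+'

# Anchoring with \b keeps only words that START with s: Seven, swans, swim.
printf '%s\n' 'Seven swans swim past glass' | grep -oiE '\bs\w+'
```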