UNIX Power Tools

Chapter 27: Searching Through Files

27.14 Compound Searches

You may recall that you can search for lines containing "this" or "that" using the | (alternation) metacharacter of egrep (27.5), which POSIX spells grep -E:

egrep 'this|that' files

But how do you grep for "this" and "that"? Conventional regular expressions have no "and" operator, because it would break the rule that a pattern matches one consecutive string of text. Well, agrep (28.9) is one version of grep that breaks all the rules. If you're lucky enough to have it installed, just use:

agrep 'cat;dog;bird' files

If you don't have agrep, a common technique is to filter the text through several greps so that only lines containing all the keywords make it through the pipeline intact:

grep cat files | grep dog | grep bird
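To see the filter pipeline at work, here is a small sketch you can paste into a shell. The file name pets.txt and its four lines are made up for the illustration; only the grep pipeline itself comes from the text above.

```shell
# Build a throwaway sample file (hypothetical contents) in a temp directory.
tmp=$(mktemp -d)
printf '%s\n' \
  'the cat chased the dog and the bird' \
  'a bird teased the dog, then the cat' \
  'cat and dog only' \
  'bird alone' > "$tmp/pets.txt"

# Each grep passes along only the survivors of the one before it,
# so a line must contain all three words to reach the end.
all_three=$(grep cat "$tmp/pets.txt" | grep dog | grep bird)
printf '%s\n' "$all_three"

rm -rf "$tmp"
```

Only the first two sample lines come out the far end, and the order of the words within a line makes no difference.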

But can it be done in one command? The closest you can come with grep is this idea:

grep 'cat.*dog.*bird' files

which has two limitations: the words must appear in the given order, and they cannot overlap. (The first limitation can be overcome for two words using egrep 'cat.*dog|dog.*cat', but the trick doesn't scale: every possible ordering needs its own alternative, so three words already take six.)
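The ordering limitation is easy to demonstrate. This is only a sketch: the file order.txt and its two lines are invented for the purpose, and it compares the single pattern against the pipeline of greps.

```shell
# Two sample lines (hypothetical) with the same words in opposite order.
tmp=$(mktemp -d)
printf '%s\n' \
  'cat then dog then bird' \
  'bird then dog then cat' > "$tmp/order.txt"

# The single pattern matches only when the words appear in this order.
ordered=$(grep -c 'cat.*dog.*bird' "$tmp/order.txt")

# The pipeline of greps does not care about order.
any_order=$(grep cat "$tmp/order.txt" | grep dog | grep -c bird)

echo "single pattern: $ordered line(s); pipeline: $any_order line(s)"
rm -rf "$tmp"
```

The single pattern counts one line and the pipeline counts two, because the reversed line never matches 'cat.*dog.*bird'.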

As usual, the problem can also be solved by moving beyond the grep family to more powerful tools. Here is how to do a line-by-line "and" search using sed, awk, or perl: [2]

[2] Some versions of nawk require an explicit $0~ in front of each pattern.

sed '/cat/!d; /dog/!d; /bird/!d' files
awk '/cat/ && /dog/ && /bird/' files
perl -ne 'print if /cat/ && /dog/ && /bird/' files
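As a quick check that these commands agree, the sketch below runs the sed and awk versions over the same made-up file (pets.txt is a hypothetical name). The awk line spells out $0 ~ in front of each pattern, which is the form the footnote says some versions of nawk need; it should behave the same on other awks.

```shell
# Sample input (hypothetical): only the first line has all three words.
tmp=$(mktemp -d)
printf '%s\n' \
  'dog, bird, cat' \
  'cat and dog' \
  'bird' > "$tmp/pets.txt"

# sed: delete any line that lacks one of the words; whatever is left has all three.
sed_out=$(sed '/cat/!d; /dog/!d; /bird/!d' "$tmp/pets.txt")

# awk: a pattern with no action prints the matching line.
awk_out=$(awk '$0 ~ /cat/ && $0 ~ /dog/ && $0 ~ /bird/' "$tmp/pets.txt")

printf 'sed: %s\nawk: %s\n' "$sed_out" "$awk_out"
rm -rf "$tmp"
```

Both commands print the same single line. The sed version works by elimination and the awk version by selection, but the result is identical.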

Okay, but what if you want to find where all the words occur in the same paragraph? Just turn on paragraph mode by setting RS="" in awk or by giving the -00 option to perl:

awk '/cat/ && /dog/ && /bird/ {print $0 ORS}' RS= files
perl -n00e 'print "$_\n" if /cat/ && /dog/ && /bird/' files
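Here is a small sketch of paragraph mode in awk. The file paras.txt and its three paragraphs are invented for the example; the only technique used is the empty RS from the command above.

```shell
# Three blank-line-separated paragraphs (hypothetical text).
tmp=$(mktemp -d)
printf '%s\n' \
  'The cat sat.' \
  'The dog barked at a bird.' \
  '' \
  'Only a cat here.' \
  '' \
  'bird and dog' \
  'but nothing else' > "$tmp/paras.txt"

# With RS set to the empty string, each paragraph is one record,
# so the three patterns may match on different lines of it.
hits=$(awk '/cat/ && /dog/ && /bird/ { n++ } END { print n + 0 }' RS= "$tmp/paras.txt")
echo "paragraphs containing all three words: $hits"

# Print the matching paragraph itself, embedded newline and all.
match=$(awk '/cat/ && /dog/ && /bird/' RS= "$tmp/paras.txt")
printf '%s\n' "$match"
rm -rf "$tmp"
```

Only the first paragraph qualifies, even though no single line in it contains all three words. A line-by-line search of the same file would find nothing.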

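And if you just want a list of the files that contain all the words anywhere in them? Well, perl can easily slurp in entire files if you have the memory and you use the -0 option to set the record separator to something that won't occur in the file. With no digits after it, -0 selects NUL, which a text file normally doesn't contain, so each file is read as a single record. The -l option, given before -0, makes print end each filename with a newline: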

perl -ln0e 'print $ARGV if /cat/ && /dog/ && /bird/' files

(Notice that as the problem gets harder, the less powerful commands drop out.)

The grep filter technique shown above also works on this problem. Just add a -l option (15.7) and the xargs command (9.21) to make it pass filenames through the pipeline rather than text lines:

grep -l cat files | xargs grep -l dog | xargs grep -l bird

(xargs is basically glue used when one program produces output that's needed by another program as command-line arguments.)
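The filename pipeline can be tried out on a few throwaway files. In this sketch the names a.txt, b.txt, and c.txt and their contents are made up; the pipeline is the one shown above.

```shell
# Three small files (hypothetical) in a temp directory.
tmp=$(mktemp -d)
printf 'cat\ndog\n'         > "$tmp/a.txt"   # missing "bird"
printf 'bird\n\ncat\ndog\n' > "$tmp/b.txt"   # all three, on different lines
printf 'bird\n'             > "$tmp/c.txt"   # only "bird"

# Each stage passes along only the names of files that contain its word;
# xargs turns those names into arguments for the next grep.
found=$(cd "$tmp" && grep -l cat a.txt b.txt c.txt | xargs grep -l dog | xargs grep -l bird)
echo "$found"
rm -rf "$tmp"
```

Only b.txt survives all three stages. One caveat: xargs splits its input on blanks and newlines, so this simple form assumes the filenames contain no whitespace.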

- GU

