6.4 Pattern-Matching Examples

Unless you are already familiar with regular expressions, the discussion of special characters above probably looks forbiddingly complex. A few more examples should make things clearer. In the examples that follow, a square (□) is used to mark a space; it is not a special character.

Let's work through how you might use some special characters in a replacement. Suppose that you have a long file and that you want to substitute the word child with the word children throughout that file. You first save the edited buffer with :w, then try the global replacement:

:%s/child/children/g

When you continue editing, you notice occurrences of words such as childrenish. You have unintentionally matched the word childish. Returning to the last saved buffer with :e!, you now try:

:%s/child□/children□/g

(Note that there is a space after child.) But this command misses the occurrences child., child,, child: and so on. After some thought, you remember that brackets allow you to specify one character from among a list, so you realize a solution:

:%s/child[□,.;:!?]/children[□,.;:!?]/g

This searches for child followed by either a space (indicated by □) or any one of the punctuation characters ,.;:!?. You expect to replace this with children followed by the corresponding space or punctuation mark, but you've ended up with a bunch of punctuation marks after every occurrence of children: brackets have no special meaning in the replacement string, so [□,.;:!?] is inserted literally. You need to save the space or punctuation mark inside \( and \). Then you can "replay" it with \1. Here's the next attempt:

:%s/child\([□,.;:!?]\)/children\1/g

When the search matches a character inside the \( and \), the \1 on the right-hand side restores the same character. The syntax may seem awfully complicated, but this command sequence can save you a lot of work! Any time you spend learning regular expression syntax will be repaid a thousandfold!
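For comparison outside vi, here is a minimal sketch of the same save-and-replay substitution using Python's re module. It assumes Python's syntax, which writes a group as ( ) rather than \( \) and uses a real space where this chapter shows □; the function name is hypothetical.

```python
import re

def pluralize_child(text: str) -> str:
    # ([ ,.;:!?]) saves the space or punctuation mark that follows "child";
    # \1 in the replacement replays exactly that saved character.
    return re.sub(r"child([ ,.;:!?])", r"children\1", text)

print(pluralize_child("The child, the child. A childish child!"))
print(pluralize_child("Fairchild "))  # still changed: the flaw described next
```

The first call leaves childish alone and keeps each punctuation mark; the second call produces Fairchildren, which is the remaining problem.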

The command is still not perfect, though. You've noticed that occurrences of Fairchild have been changed, so you need a way to match child when it isn't part of another word.

As it turns out, vi (but not all other programs that use regular expressions) has a special syntax for saying "only if the pattern is a complete word." The character sequence \< requires the pattern to match at the beginning of a word, whereas \> requires the pattern to match at the end of a word. Using both will restrict the match to a whole word. So, in the task given above, \<child\> will find all instances of the word child, whether followed by punctuation or spaces. Here's the substitution command you should use:

:%s/\<child\>/children/g
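Python's re module has no separate start-of-word and end-of-word anchors; its \b matches at either edge of a word, which serves the same purpose in this case. A minimal sketch under that assumption (the function name is hypothetical):

```python
import re

def pluralize_word(text: str) -> str:
    # \b on both sides plays the role of ex's \< and \>:
    # "child" must be a whole word, so "childish" and "Fairchild" are skipped.
    return re.sub(r"\bchild\b", "children", text)

print(pluralize_word("Fairchild's child, a childish child."))
```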

6.4.1 Search for General Class of Words

Suppose your subroutine names begin with the prefixes mgi, mgr, and mga.

If you want to save the prefixes, but want to change the name box to square, either of the following replacement commands will do the trick. The first example illustrates how \( and \) can be used to save whatever pattern was actually matched. The second example shows how you can search for one pattern but change another:

:g/mg\([ira]\)box/s//mg\1square/g

The global replacement keeps track of whether an i, r or a is saved. In that way, box is changed to square only when box is part of the routine's name.

:g/mg[ira]box/s/box/square/g

This has the same effect as the previous command, but it is a little less safe since it could change other instances of box on the same line, not just those within the routine names.
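The difference between the two commands is easy to see in a small Python sketch. The sample line and function names are hypothetical, and Python's ( ) corresponds to ex's \( \). The second function imitates :g/pattern/s/box/square/g by changing every box on any line that contains a routine name.

```python
import re

line = "call mgibox(x); box = mgrbox(y)"

def rename_saved_letter(text: str) -> str:
    # Like :g/mg\([ira]\)box/s//mg\1square/g: only box inside a routine name changes.
    return re.sub(r"mg([ira])box", r"mg\1square", text)

def rename_every_box(text: str) -> str:
    # Like :g/mg[ira]box/s/box/square/g: on each matching line, every box changes.
    return "\n".join(
        re.sub("box", "square", ln) if re.search(r"mg[ira]box", ln) else ln
        for ln in text.split("\n")
    )

print(rename_saved_letter(line))  # call mgisquare(x); box = mgrsquare(y)
print(rename_every_box(line))     # call mgisquare(x); square = mgrsquare(y)
```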

6.4.2 Block Move by Patterns

You can also move blocks of text delimited by patterns. For example, assume you have a 150-page reference manual. Each page is organized into three paragraphs with the same three headings: SYNTAX, DESCRIPTION, and PARAMETERS. A sample of one reference page follows:

 .Rh 0 "Get status of named file" "STAT"
 .Rh "SYNTAX"
 .nf
 integer*4 stat, retval
 integer*4 status(11)
 character*123 filename
 ...
 retval = stat (filename, status)
 .fi
 .Rh "DESCRIPTION"
 Writes the fields of a system data structure into the
 status array.
 These fields contain (among other
 things) information about the file's location, access
 privileges, owner, and time of last modification.
 .Rh "PARAMETERS"
 .IP "\fBfilename\fR" 15n
 A character string variable or constant containing
 the UNIX pathname for the file whose status you want
 to retrieve.
 You can give the ...

Suppose that it is decided to move DESCRIPTION above the SYNTAX paragraph. With pattern matching, you can move blocks of text on all 150 pages with one command!

:g /SYNTAX/.,/DESCRIPTION/-1 move /PARAMETERS/-1

This command works as follows. First, ex finds and marks each line that matches the first pattern (i.e., that contains the word SYNTAX). Second, for each marked line, it sets . (dot, the current line) to that line and executes the command. The move command then moves the block of lines from the current line (dot) to the line before the one containing the word DESCRIPTION (/DESCRIPTION/-1) to just before the line containing PARAMETERS (/PARAMETERS/-1).

Note that ex can place text only below the line specified. To tell ex to place text above a line, you first subtract one with -1, and then ex places your text below the previous line. In a case like this, one command saves literally hours of work. (This is a real-life example -- we once used a pattern match like this to rearrange a reference manual containing hundreds of pages.)
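The two-pass behavior described above (mark first, then act on each marked line) can be simulated in a short Python sketch. It is a simplification under stated assumptions: each page is assumed to contain SYNTAX, DESCRIPTION, and PARAMETERS in that order, searches do not wrap around the end of the file as ex's do, and the function names and sample page are hypothetical.

```python
def find_after(lines, start, word):
    """Index of the first line after `start` containing `word` (no wraparound)."""
    for i in range(start + 1, len(lines)):
        if word in lines[i][1]:
            return i
    raise ValueError(f"pattern not found: {word}")

def move_syntax_blocks(texts):
    """Simulate :g/SYNTAX/.,/DESCRIPTION/-1 move /PARAMETERS/-1."""
    # Tag every line with a unique number so the marks made in the first
    # pass still identify the right lines after earlier blocks have moved.
    lines = list(enumerate(texts))
    marks = [tag for tag, text in lines if "SYNTAX" in text]
    for tag in marks:
        dot = next(i for i, (t, _) in enumerate(lines) if t == tag)
        desc = find_after(lines, dot, "DESCRIPTION")
        params = find_after(lines, dot, "PARAMETERS")
        block = lines[dot:desc]          # the range .,/DESCRIPTION/-1
        del lines[dot:desc]
        params -= len(block)             # PARAMETERS shifted up by the deletion
        lines[params:params] = block     # place the block just above PARAMETERS
    return [text for _, text in lines]

page = ['.Rh 0 "Get status of named file" "STAT"', '.Rh "SYNTAX"', '.nf', '.fi',
        '.Rh "DESCRIPTION"', 'Writes the fields...', '.Rh "PARAMETERS"', 'A character string...']
for text in move_syntax_blocks(page * 2):
    print(text)
```

Tagging lines rather than remembering their positions mirrors the way ex marks lines before acting on them, so later marks stay valid while text moves around.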

Block definition by patterns can be used equally well with other ex commands. For example, if you wanted to delete all DESCRIPTION paragraphs in the reference chapter, you could enter:

:g/DESCRIPTION/,/PARAMETERS/-1d

This very powerful kind of change is implicit in ex's line addressing syntax, but it is not readily apparent even to experienced users. For this reason, whenever you are faced with a complex, repetitive editing task, take the time to analyze the problem and find out if you can apply pattern-matching tools to get the job done.

6.4.3 More Examples

Since the best way to learn pattern matching is by example, here is a list of pattern-matching examples, with explanations. Study the syntax carefully, so that you understand the principles at work. You should then be able to adapt these examples to your own situation.

Put troff italicization codes around the word RETURN:
```
:%s/RETURN/\\fI&\\fP/g
```
Notice that two backslashes (\\) are needed in the replacement, because the backslash in the troff italicization code will be interpreted as a special character. (\fI alone would be interpreted as fI; you must type \\fI to get \fI.)
Modify a list of pathnames in a file:
```
:%s/\/home\/tim/\/home\/linda/g
```
A slash (used as a delimiter in the global replacement sequence) must be escaped with a backslash when it is part of the pattern or replacement; use \/ to get /. An alternate way to achieve this same effect is to use a different character as the pattern delimiter. For example, you could make the above replacement using colons as delimiters. Thus:
```
:%s:/home/tim:/home/linda:g
```
This is much more readable.
Put HTML italicization codes around the word RETURN:
```
:%s:RETURN:<I>&</I>:g
```
Notice here the use of & to represent the text that was actually matched, and, as just described, the use of colons as delimiters instead of slashes.
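Python's re.sub has a counterpart to &: \g<0> in the replacement stands for the whole match. Python also processes backslash escapes in the replacement, so, just as in vi, the troff code needs a doubled backslash. A minimal sketch with an assumed sample sentence:

```python
import re

text = "Press RETURN to continue."

# \g<0> replays the whole match, like & in ex; \\ yields one literal backslash.
troff = re.sub("RETURN", r"\\fI\g<0>\\fP", text)
html = re.sub("RETURN", r"<I>\g<0></I>", text)

print(troff)  # Press \fIRETURN\fP to continue.
print(html)   # Press <I>RETURN</I> to continue.
```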
Change all periods to semicolons in lines 1 to 10:
```
:1,10s/\./;/g
```
A dot has special meaning in regular expression syntax and must be escaped with a backslash (\.).
Change all occurrences of the word help (or Help) to HELP:
```
:%s/[Hh]elp/HELP/g
```
or:
```
:%s/[Hh]elp/\U&/g
```
The \U changes the text that follows it in the replacement to all uppercase. Here that text is &, the repeated search pattern, which is either help or Help.
Replace one or more spaces with a single space:
```
:%s/□□*/□/g
```
Make sure you understand how the asterisk works as a special character. An asterisk following any character (or following any regular expression that matches a single character, such as . or [a-z]) matches zero or more instances of that character. Therefore, you must specify two spaces followed by an asterisk to match one or more spaces (one space, plus zero or more spaces).
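The same reasoning carries over to Python's re module. This sketch (Python syntax assumed; a real space stands in for □) shows both the correct pattern and the single-space pitfall:

```python
import re

def squeeze_spaces(text: str) -> str:
    # "  *" is one space followed by zero or more spaces: one or more spaces.
    return re.sub("  *", " ", text)

print(squeeze_spaces("a   b c    d"))  # a b c d

# With only one space before *, the pattern also matches the empty string
# at every position, so a space is inserted around every character.
print(repr(re.sub(" *", " ", "ab")))   # ' a b '
```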
Replace one or more spaces following a colon with two spaces:
```
:%s/:□□*/:□□/g
```
Replace one or more spaces following a period or a colon with two spaces:
```
:%s/\([:.]\)□□*/\1□□/g
```
Either of the two characters within brackets can be matched. This character is saved into a hold buffer, using \( and \), and restored on the right-hand side by the \1. Note that within brackets a special character such as a dot does not need to be escaped.
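A minimal Python sketch of the same save-and-replay idea, assuming Python's ( ) group syntax and a real space for □ (the function name and sample text are hypothetical):

```python
import re

def two_spaces_after(text: str) -> str:
    # ([:.]) saves the colon or period (the dot is literal inside brackets);
    # "  *" consumes the spaces that follow, and \1 replays the saved mark.
    return re.sub(r"([:.])  *", r"\1  ", text)

print(two_spaces_after("End. Next: one.   Done"))  # End.  Next:  one.  Done
```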
Standardize various uses of a word or heading:
```
:%s/^Note[□:s]*/Notes:□/g
```
The brackets enclose three characters: a space, a colon, and the letter s. Therefore, the pattern Note[□s:] will match Note□, Notes or Note:. An asterisk is added to the pattern so that it also matches Note (with zero spaces after it) and Notes: (the already correct spelling). Without the asterisk, Note would be missed entirely and Notes: would be incorrectly changed to Notes:□:.
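The effect of the asterisk can be checked with a small Python sketch (Python's re syntax assumed, with a real space for □; function names and samples are hypothetical):

```python
import re

def with_star(line: str) -> str:
    return re.sub(r"^Note[ :s]*", "Notes: ", line)

def without_star(line: str) -> str:
    return re.sub(r"^Note[ :s]", "Notes: ", line)

for sample in ["Note this", "Note:", "Notes: done", "Note"]:
    print(repr(with_star(sample)), repr(without_star(sample)))
```

Without the asterisk, "Notes: done" turns into "Notes: : done" and a bare "Note" is left unchanged.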
Delete all blank lines:
```
:g/^$/d
```
What you are actually matching here is the beginning of the line (^) followed by the end of the line ($), with nothing in between.
Delete all blank lines, plus any lines that contain only whitespace:
```
:g/^[□tab]*$/d
```
(In the line above, a tab is shown as tab.) A line may appear to be blank, but may in fact contain spaces or tabs. The previous example will not delete such a line. This example, like the one above it, searches for the beginning and end of the line. But instead of having nothing in between, the pattern tries to find any number of spaces or tabs. If no spaces or tabs are matched, the line is blank. To delete lines that contain whitespace but that aren't empty, you would have to match lines with at least one space or tab:
```
:g/^[□tab][□tab]*$/d
```
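The three blank-line patterns can be compared in Python, assuming that :g/pattern/d is modeled as "keep only the lines the pattern does not match" (the sample lines and helper name are hypothetical; \t is a tab):

```python
import re

lines = ["text", "", "  ", "\t", "more text"]

def keep_unless(pattern, lines):
    # :g/pattern/d deletes every line the pattern matches; keep the rest.
    return [ln for ln in lines if not re.search(pattern, ln)]

print(keep_unless(r"^$", lines))             # only the truly empty line goes
print(keep_unless(r"^[ \t]*$", lines))       # empty and whitespace-only lines go
print(keep_unless(r"^[ \t][ \t]*$", lines))  # whitespace-only lines go, "" stays
```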
Delete all leading spaces on every line:
```
:%s/^□□*\(.*\)/\1/
```
Use ^□□* to search for one or more spaces at the beginning of each line; then use \(.*\) to save the rest of the line into the first hold buffer. Restore the line without leading spaces, using \1.
Delete all spaces at the end of every line:
```
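:%s/□□*$//
```
For each line, □□*$ matches one or more spaces immediately before the end of the line, and replacing them with nothing deletes them. (Saving the text with \(.*\)□□*$ and restoring it with \1 does not work as you might hope: .* matches as much as it can, so the saved text keeps all but the last of the trailing spaces.)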
The substitutions in this example and the previous one will happen only once on any given line, so the g option doesn't need to follow the replacement string.
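Both space-trimming substitutions translate directly to Python's re module (Python syntax assumed; function names hypothetical). The third function shows the save-and-replay form for trailing spaces, for comparison:

```python
import re

def strip_leading(line: str) -> str:
    # ^  * consumes all leading spaces; (.*) saves the rest of the line for \1.
    return re.sub(r"^  *(.*)", r"\1", line)

def strip_trailing(line: str) -> str:
    # "  *$": one or more spaces immediately before the end of the line.
    return re.sub(r"  *$", "", line)

def strip_trailing_by_replay(line: str) -> str:
    # Greedy (.*) grabs as much as it can, leaving only one space for "  *$".
    return re.sub(r"(.*)  *$", r"\1", line)

print(repr(strip_leading("   indented")))           # 'indented'
print(repr(strip_trailing("padded   ")))            # 'padded'
print(repr(strip_trailing_by_replay("padded   ")))  # 'padded  '
```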
Insert a > at the start of every line in a file:
```
:%s/^/>□□/
```
What we're really doing here is "replacing" the start of the line with >□□. Of course, the start of the line (being a logical construct, not an actual character) isn't really replaced!
This command is useful when replying to mail or USENET news postings. Frequently, it is desirable to include part of the original message in your reply. By convention, the inclusion is distinguished from your reply by setting off the included text with a right angle bracket and a couple of spaces at the start of the line. This can be done easily as shown above. (Typically, only part of the original message will be included. Unneeded text can be deleted either before or after the above replacement.) Advanced mail systems do this automatically. However, if you're using vi to edit your mail, you can do it with this command.
Add a period to the end of the next six lines:
```
:.,+5s/$/./
```
The line address indicates the current line plus five lines. The $ indicates the end of line. As in the previous example, the $ is a logical construct. You aren't really replacing the end of the line.
Reverse the order of all hyphen-separated items in a list:
```
:%s/\(.*\)□-□\(.*\)/\2□-□\1/
```
Use \(.*\) to save text on the line into the first hold buffer, but only until you find □-□. Then use \(.*\) to save the rest of the line into the second hold buffer. Restore the saved portions of the line, reversing the order of the two hold buffers. The effect of this command on several items is shown below.
```
more - display files
```
becomes:
```
display files - more
```
and:
```
lp - print files
```
becomes:
```
print files - lp
```
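The same two-buffer swap, sketched with Python's re module (its \1 and \2 work like vi's; a real space stands in for □; the function name is hypothetical):

```python
import re

def swap_around_hyphen(line: str) -> str:
    # (.*) - (.*) saves the text before and after " - " into groups 1 and 2;
    # the replacement replays them in reverse order.
    return re.sub(r"(.*) - (.*)", r"\2 - \1", line)

print(swap_around_hyphen("more - display files"))  # display files - more
print(swap_around_hyphen("lp - print files"))      # print files - lp
```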
Change every word in a file to uppercase:
```
:%s/.*/\U&/
```
or:
```
:%s/./\U&/g
```
The \U flag at the start of the replacement string tells vi to change the replacement to uppercase. The & character replays the text matched by the search pattern as the replacement. These two commands are equivalent; however, the first form is considerably faster, since it results in only one substitution per line (.* matches the entire line, once per line), whereas the second form results in repeated substitutions on each line (. matches only a single character, with the replacement repeated on account of the trailing g).
Reverse the order of lines in a file:[8]
[8] From an article by Walter Zintz in UNIX World, May 1990.
```
:g/.*/mo0
```
The search pattern matches all lines (a line contains zero or more characters). Each line is moved, one by one, to the top of the file (that is, moved after imaginary line 0). As each matched line is placed at the top, it pushes the previously moved lines down, one by one, until the last line is on top. Since all lines have a beginning, the same result can be achieved more succinctly:
```
:g/^/mo0
```
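A short Python simulation of :g/^/mo0 shows why the lines come out reversed. It is a sketch that models the buffer as a list, not a description of how ex is implemented; the function name is hypothetical.

```python
def reverse_by_moving(lines):
    buf = list(lines)
    for i in range(len(buf)):
        # Before the i-th move, the i lines already moved occupy indexes 0..i-1,
        # so the next original line is at index i; mo0 puts it on top.
        buf.insert(0, buf.pop(i))
    return buf

print(reverse_by_moving(["one", "two", "three"]))  # ['three', 'two', 'one']
```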
In a database, on all lines not marked Paid in full, append the phrase Overdue:
```
:g!/Paid□in□full/s/$/□Overdue/
```
or the equivalent:
```
:v/Paid□in□full/s/$/□Overdue/
```
To affect all lines except those matching your pattern, add a ! to the g command, or simply use the v command.
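In Python, the effect of :v (or :g!) amounts to acting only on lines where the search fails. A minimal sketch with hypothetical records:

```python
import re

records = ["Smith 100 Paid in full", "Jones 250", "Brown 75"]

def flag_unpaid(lines):
    # Like :v/Paid in full/s/$/ Overdue/: act only on lines that do NOT match,
    # here by appending text at the end of the line.
    return [ln if re.search("Paid in full", ln) else ln + " Overdue" for ln in lines]

print(flag_unpaid(records))
```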
For any line that doesn't begin with a number, move the line to the end of the file:
```
:g!/^[0-9]/m$
```
or:
```
:g/^[^0-9]/m$
```
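As the first character within brackets, a caret negates the sense, so the two commands have nearly the same effect. The first one says, "Don't match lines that begin with a number," and the second one says, "Match lines that don't begin with a number." (They differ only on empty lines: [^0-9] must match an actual character, so the second command leaves empty lines where they are, while the first moves them to the end.)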
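Both commands can be simulated in Python. The sketch assumes that moving each marked line to $ in turn is equivalent to keeping the unmarked lines in order and appending the marked ones in order; the function names and sample lines are hypothetical. The sample includes an empty line, since that is where the two forms behave differently.

```python
import re

def move_g_bang(lines):
    # :g!/^[0-9]/m$ : lines that do not start with a digit go to the end.
    keep = [ln for ln in lines if re.search(r"^[0-9]", ln)]
    moved = [ln for ln in lines if not re.search(r"^[0-9]", ln)]
    return keep + moved

def move_negated_class(lines):
    # :g/^[^0-9]/m$ : [^0-9] needs a real first character, so "" never matches.
    moved = [ln for ln in lines if re.search(r"^[^0-9]", ln)]
    keep = [ln for ln in lines if not re.search(r"^[^0-9]", ln)]
    return keep + moved

sample = ["1 apple", "", "pear", "2 fig"]
print(move_g_bang(sample))         # ['1 apple', '2 fig', '', 'pear']
print(move_negated_class(sample))  # ['1 apple', '', '2 fig', 'pear']
```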
Change manually numbered section heads (e.g., 1.1, 1.2, etc.) to a troff macro (e.g., .Ah for an A-level heading):
```
:%s/^[1-9]\.[1-9]/.Ah/
```
The search string matches a digit other than zero, followed by a period, followed by another non-zero digit. Notice that the period doesn't need to be escaped in the replacement (though a \ would have no effect, either). The command above won't find chapter numbers containing two or more digits. To do so, modify the command like this:
```
:%s/^[1-9][0-9]*\.[1-9]/.Ah/
```
Now it will match chapters 10 to 99 (digits 1 to 9, followed by a digit), 100 to 999 (digits 1 to 9, followed by two digits), etc. The command still finds chapters 1 to 9 (digits 1 to 9, followed by no digit).
Remove numbering from section headings in a document. You want to change the sample lines:
```
2.1 Introduction
10.3.8 New Functions
```
into the lines:
```
Introduction
New Functions
```
Here's the command to do this:
```
:%s/^[1-9][0-9]*\.[1-9][0-9.]*□//
```
The search pattern resembles the one in the previous example, but now the numbers vary in length. At a minimum, the headings contain number, period, number, so you start with the search pattern from the previous example:
```
[1-9][0-9]*\.[1-9]
```
But in this example, the heading may continue with any number of digits or periods:
```
[0-9.]*
```
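The two section-number patterns can be tried in Python's re module, whose bracket expressions and \. behave the same way here. This is a sketch with hypothetical function names; the second pattern ends with the space that follows the number, so the heading text is left without a leading blank.

```python
import re

def to_ah_macro(line: str) -> str:
    # A number without a leading zero, a literal period, and a nonzero digit
    # at the start of the line are replaced by .Ah
    return re.sub(r"^[1-9][0-9]*\.[1-9]", ".Ah", line)

def strip_number(line: str) -> str:
    # [0-9.]* absorbs any further digits and periods; the final space
    # removes the separator between the number and the heading text.
    return re.sub(r"^[1-9][0-9]*\.[1-9][0-9.]* ", "", line)

print(to_ah_macro("12.3 Overview"))          # .Ah Overview
print(strip_number("2.1 Introduction"))      # Introduction
print(strip_number("10.3.8 New Functions"))  # New Functions
```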
Change the word Fortran to the phrase FORTRAN (acronym of FORmula TRANslation):
```
:%s/\(For\)\(tran\)/\U\1\2\E□(acronym□of□\U\1\Emula□\U\2\Eslation)/g
```
First, since we notice that the words FORmula and TRANslation use portions of the original word, we decide to save the search pattern in two pieces: \(For\) and \(tran\). The first time we restore it, we use both pieces together, converting all characters to uppercase: \U\1\2. Next, we undo the uppercase with \E; otherwise the remaining replacement text would all be uppercase. The replacement continues with actual typed words, then we restore the first hold buffer. This buffer still contains For, so again we convert to uppercase first: \U\1. Immediately after, we lowercase the rest of the word: \Emula. Finally, we restore the second hold buffer. This contains tran, so we precede the "replay" with uppercase, follow it with lowercase, and type out the rest of the word: \U\2\Eslation).
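Python's re.sub does not understand \U and \E in a replacement string, but a replacement function can do the same case changes explicitly. A minimal sketch under that assumption (function names hypothetical):

```python
import re

def expand_fortran(text: str) -> str:
    def repl(m):
        first, second = m.group(1), m.group(2)  # "For" and "tran"
        # \U\1\2\E: uppercase both saved pieces, then stop uppercasing.
        head = (first + second).upper()
        # \U\1\Emula and \U\2\Eslation: uppercase one piece, then plain text.
        return f"{head} (acronym of {first.upper()}mula {second.upper()}slation)"
    return re.sub(r"(For)(tran)", repl, text)

print(expand_fortran("Fortran code"))  # FORTRAN (acronym of FORmula TRANslation) code
```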

