sed & awk

Chapter 7
Writing Scripts for awk

7.6 Expressions

Expressions, which let you store, manipulate, and retrieve data, are quite different from anything you can do in sed, yet they are a common feature of most programming languages.

An expression is evaluated and returns a value. An expression consists of any combination of numeric and string constants, variables, operators, functions, and regular expressions. We covered regular expressions in detail in Chapter 3, Understanding Regular Expression Syntax, and they are summarized in Appendix B. Functions will be discussed fully in Chapter 9, Functions. In this section, we will look at expressions consisting of constants, variables, and operators.

There are two types of constants: string and numeric (for example, "red" and 1). A string constant must be enclosed in double quotes in an expression. Strings can make use of the escape sequences listed in Table 7.1.

Table 7.1: Escape Sequences
Sequence	Description
\a	Alert character, usually ASCII BEL character
\b	Backspace
\f	Formfeed
\n	Newline
\r	Carriage return
\t	Horizontal tab
\v	Vertical tab
\ddd	Character represented as 1 to 3 digit octal value
\xhex	Character represented as hexadecimal value [3]
\c	Any literal character c (e.g., \" for ") [4]

[3] POSIX does not provide "\x", but it is commonly available.

[4] Like ANSI C, POSIX leaves purposely undefined what you get when you put a backslash before any character not listed in the table. In most awks, you just get that character.
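As a rough illustration of Table 7.1, the following sketch prints strings that contain a few of the portable escape sequences. The shell function name escapes_demo is hypothetical, used only so the example can be rerun, and the octal example assumes an ASCII character set.

```shell
# Hypothetical wrapper so the awk program can be rerun by name.
escapes_demo() {
    awk 'BEGIN {
        print "name\tgrade"    # \t expands to a horizontal tab
        print "say \"hi\""     # \" yields a literal double quote
        print "\101"           # octal 101 is the letter A in ASCII
    }'
}
escapes_demo
```

The "\x" form is left out here because, as the footnote says, it is not guaranteed by POSIX.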

A variable is an identifier that references a value. To define a variable, you only have to name it and assign it a value. The name can contain only letters, digits, and underscores, and may not start with a digit. Case distinctions in variable names are important: Salary and salary are two different variables.

Variables are not declared; you do not have to tell awk what type of value will be stored in a variable. Each variable has a string value and a numeric value, and awk uses the appropriate value based on the context of the expression. (A string that does not begin with a number has a numeric value of 0.) Variables do not have to be initialized; awk automatically initializes them to the empty string, which acts like 0 if used as a number.

The following expression assigns a value to x:

x = 1

x is the name of the variable, = is an assignment operator, and 1 is a numeric constant.
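The behavior of uninitialized variables and of the two values each variable carries can be checked with a small sketch like the one below. The function name vars_demo and the variable names n and s are made up for illustration.

```shell
# Hypothetical wrapper; the awk program runs entirely in a BEGIN block.
vars_demo() {
    awk 'BEGIN {
        # n was never assigned, so it is the empty string,
        # which acts like 0 when used as a number.
        print n + 1
        print "[" n "]"
        # A string that does not begin with a number has the numeric value 0.
        s = "red"
        print s + 5
        # Names are case-sensitive: Salary and salary are separate variables.
        Salary = 100
        salary = 200
        print Salary, salary
    }'
}
vars_demo
```

Under these assumptions the four lines printed should be 1, [], 5, and 100 200.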

The following expression assigns the string "Hello" to the variable z:

z = "Hello"

There is no explicit symbol for string concatenation; strings placed side by side, separated only by a space, are concatenated. The expression:

z = "Hello" "World"

concatenates the two strings and assigns "HelloWorld" to the variable z.
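A short sketch of concatenation follows. The function name concat_demo and the variables greeting and n are hypothetical. It also shows that a separating space has to be supplied as a string of its own, and that a number is converted to a string when it is concatenated.

```shell
# Hypothetical wrapper around a BEGIN-only awk program.
concat_demo() {
    awk 'BEGIN {
        z = "Hello" "World"              # joined with nothing in between
        print z
        greeting = "Hello" " " "World"   # an explicit " " supplies the space
        print greeting
        n = 7
        print "count: " n                # the number 7 becomes the string "7"
    }'
}
concat_demo
```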

The dollar sign ($) operator is used to reference fields. The following expression assigns the value of the first field of the current input record to the variable w:

w = $1
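The field reference can be tried out with a sketch such as the following, which feeds one made-up input record to awk. The function name field_demo and the variable n are hypothetical. The example also assumes the usual awk behavior that $ can be applied to a variable or a parenthesized expression, not only to a constant.

```shell
# Hypothetical wrapper; the single input record is invented for the example.
field_demo() {
    printf 'john 85 92\n' | awk '{
        w = $1                  # first field of the current record
        n = 2
        print w, $n, $(n + 1)   # $n is field 2; $(n + 1) is field 3
    }'
}
field_demo
```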

A variety of operators can be used in expressions. Arithmetic operators are listed in Table 7.2.

Table 7.2: Arithmetic Operators
Operator	Description
+	Addition
-	Subtraction
*	Multiplication
/	Division
%	Modulo
^	Exponentiation
**	Exponentiation [5]

[5] This is a common extension. It is not in the POSIX standard, and often not in the system documentation, either. Its use is thus nonportable.
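The portable operators in Table 7.2 can be exercised with a sketch like this one. The function name arith_demo is hypothetical, and "**" is deliberately omitted because it is nonportable.

```shell
# Hypothetical wrapper around a BEGIN-only awk program.
arith_demo() {
    awk 'BEGIN {
        print 7 + 2, 7 - 2, 7 * 2   # addition, subtraction, multiplication
        print 7 / 2                 # division is not truncated to an integer
        print 7 % 2                 # remainder after division
        print 2 ^ 10                # exponentiation
    }'
}
arith_demo
```

Note that 7 / 2 yields 3.5 rather than 3, since awk does its arithmetic in floating point.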

Once a variable has been assigned a value, that value can be referenced using the name of the variable. The following expression adds 1 to the value of x and assigns it to the variable y:

y = x + 1

That is, awk evaluates x, adds 1 to it, and puts the result into the variable y. The statement:

print y

prints the value of y. If the following sequence of statements appears in a script:

x = 1
y = x + 1
print y

then the value of y is 2.

We could reduce these three statements to two:

x = 1
print x + 1

Notice, however, that after the print statement the value of x is still 1. We didn't change the value of x; we simply added 1 to it and printed that value. In other words, if a third statement print x followed, it would output 1. If, in fact, we wished to accumulate the value in x, we could use the assignment operator +=, as in x += 1. This operator combines two operations: it adds the value on its right (here, 1) to x and assigns the new value to x. Table 7.3 lists the assignment operators used in awk expressions.

Table 7.3: Assignment Operators
Operator	Description
++	Add 1 to variable.
--	Subtract 1 from variable.
+=	Assign result of addition.
-=	Assign result of subtraction.
*=	Assign result of multiplication.
/=	Assign result of division.
%=	Assign result of modulo.
^=	Assign result of exponentiation.
**=	Assign result of exponentiation. [6]

[6] As with "**", this is a common extension, which is also nonportable.
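The difference between computing x + 1 and accumulating with an assignment operator can be seen in a sketch like the following. The function name assign_demo is hypothetical and the starting value is arbitrary. Only the portable operators from Table 7.3 are used.

```shell
# Hypothetical wrapper around a BEGIN-only awk program.
assign_demo() {
    awk 'BEGIN {
        x = 1
        print x + 1       # prints 2, but x itself is unchanged
        print x           # still 1
        x += 1            # same as x = x + 1
        print x           # now 2
        x *= 5; print x   # 10
        x -= 4; print x   # 6
        x /= 3; print x   # 2
        x ^= 3; print x   # 8
        x %= 5; print x   # 3
    }'
}
assign_demo
```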

Look at the following example, which counts each blank line in a file.

# Count blank lines.
/^$/ {
	print x += 1
}

Although we didn't initialize the value of x, we can safely assume that its value is 0 up until the first blank line is encountered. The expression "x += 1" is evaluated each time a blank line is matched and the value of x is incremented by 1. The print statement prints the value returned by the expression. Because we execute the print statement for every blank line, we get a running count of blank lines.
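To watch the running count, the same rule can be run against a few lines of made-up input, as in this sketch. The function name running_count and the sample text are assumptions for illustration; the input contains three blank lines.

```shell
# Hypothetical wrapper; printf supplies six lines, three of them blank.
running_count() {
    printf 'one\n\ntwo\n\n\nthree\n' | awk '/^$/ { print x += 1 }'
}
running_count
```

With this input the script should print 1, 2, and 3 on separate lines, one for each blank line matched.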

There are different ways to write expressions, some more terse than others. The expression "x += 1" is more concise than the following equivalent expression:

x = x + 1

But neither of these expressions is as terse as the following expression:

++x

"++" is the increment operator. ("--" is the decrement operator.) Each time the expression is evaluated, the value of the variable is incremented by one. The increment and decrement operators can appear on either side of the operand, as prefix or postfix operators. The position determines which value the expression returns: the value after the change (prefix) or the value before it (postfix).

++x	Increment x before returning value (prefix)
x++	Increment x after returning value (postfix)

For instance, suppose our example were written as follows:

/^$/ {
	print x++
}

When the first blank line is matched, the expression returns the value "0"; the second blank line returns "1", and so on. If we put the increment operator before x, then the first time the expression is evaluated, it returns "1".
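The two positions can be compared side by side in a sketch like this one. The function name incr_demo is hypothetical, and x is set to 0 explicitly only to make the starting point visible.

```shell
# Hypothetical wrapper around a BEGIN-only awk program.
incr_demo() {
    awk 'BEGIN {
        x = 0
        print x++   # postfix: returns 0, then x becomes 1
        print x     # 1
        print ++x   # prefix: x becomes 2, then 2 is returned
        print x     # 2
    }'
}
incr_demo
```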

Let's implement that expression in our example. In addition, instead of printing a count each time a blank line is matched, we'll accumulate the count as the value of x and print only the total number of blank lines. The END pattern is the place to put the print that displays the value of x after the last input line is read.

# Count blank lines.
/^$/ { 
	++x
}
END {
	print x
}

Let's try it on the sample file that has three blank lines in it.

$ awk -f awkscr test
3

The script outputs the number of blank lines.
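Since the contents of the sample file are not shown, here is a sketch that runs the same logic on made-up input. The function name count_blank is hypothetical. As a small variation on the script above, it prints x + 0 rather than x, so that input with no blank lines yields 0 instead of an empty line.

```shell
# Hypothetical wrapper; "$@" names the input files, and with none
# given awk reads standard input.
count_blank() {
    awk '/^$/ { ++x } END { print x + 0 }' "$@"
}
printf 'one\n\ntwo\n\n\nthree\n' | count_blank   # input with three blank lines
printf 'no blanks here\n' | count_blank          # input with none
```

With the original print x, the second command would print an empty line, because x would still hold the empty string it was initialized to.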

7.6.1 Averaging Student Grades

Let's look at another example, one in which we sum a series of student grades and then calculate the average. Here's what the input file looks like:

john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84

There are five grades following the student's name. Here is the script that will give us each student's average:

# average five grades 
{ total = $2 + $3 + $4 + $5 + $6
  avg = total / 5
  print $1, avg }

This script adds together fields 2 through 6 to get the sum total of the five grades. The value of total is divided by 5 and assigned to the variable avg. ("/" is the operator for division.) The print statement outputs the student's name and average. Note that we could have skipped the assignment of avg and instead calculated the average as part of the print statement, as follows:

print $1, total / 5

This script shows how easy it is to write programs in awk. Awk parses the input into records and fields. You are spared having to read individual characters and declare data types; awk does this for you automatically.

Let's see a sample run of the script that calculates student averages:

$ awk -f grades.awk grades
john 87.4
andrea 86
jasper 85.6
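For readers who want to reproduce the run without creating the two files, the following sketch puts the script and the sample data in one place. The function name average_grades is hypothetical, and it uses the shorter form that computes the average inside the print statement.

```shell
# Hypothetical wrapper; reads the named files, or standard input if none.
average_grades() {
    awk '{
        total = $2 + $3 + $4 + $5 + $6   # sum of the five grades
        print $1, total / 5              # name and average
    }' "$@"
}
average_grades <<'EOF'
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84
EOF
```

The output should match the sample run above: 437 / 5 is 87.4, 430 / 5 is 86, and 428 / 5 is 85.6.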

