7.2 String I/O

Now we'll zoom back in to the string I/O level and examine the print and read statements, which give the shell I/O capabilities that are more analogous to those of conventional programming languages.

7.2.1 print

As we've seen countless times in this book, print simply prints its arguments to standard output. You should use it instead of the echo command, whose functionality differs from system to system. [3] Now we'll explore the command in greater detail.

[3] Specifically, there is a difference between System V and BSD versions. The latter accepts options similar to those of print, while the former accepts C language-style escape sequences.

7.2.1.1 print escape sequences

print accepts a number of options, as well as several escape sequences that start with a backslash. [4] These are similar to the escape sequences recognized by echo and the C language; they are listed in Table 7.2.

[4] You must use a double backslash if you don't surround the string that contains them with quotes; otherwise, the shell itself "steals" a backslash before passing the arguments to print.

These sequences exhibit fairly predictable behavior, except for \f: on some displays it causes a screen clear, while on others it causes a line feed. It ejects the page on most printers. \v is somewhat obsolete; it usually causes a line feed.

Table 7.2: print Escape Sequences
Sequence	Character printed
\a	ALERT or `[CTRL-G]`
\b	BACKSPACE or `[CTRL-H]`
\c	Omit final NEWLINE
\f	FORMFEED or `[CTRL-L]`
\n	NEWLINE (not at end of command) or `[CTRL-J]`
\r	RETURN (ENTER) or `[CTRL-M]`
\t	TAB or `[CTRL-I]`
\v	VERTICAL TAB or `[CTRL-K]`
\0n	ASCII character with octal (base-8) value n, where n is 1 to 3 digits
\\	Single backslash

The \0n sequence is even more device-dependent and can be used for complex I/O, such as cursor control and special graphics characters.
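
The point of footnote 4 about quoting can be tried with a small sketch that passes the same \t sequence to print three ways. The first line is our own stand-in, not part of the Korn shell: it defines a rough print substitute (built on printf's %b format) only when the script is run by some other shell, so the sketch can be tried anywhere. The variable names are invented for illustration.

```shell
# Assumption: outside the Korn shell, emulate print with printf's %b format.
[ -n "$KSH_VERSION" ] || print() { printf '%b\n' "$*"; }

quoted=$(print 'one\ttwo')     # quotes protect the backslash: a TAB is printed
doubled=$(print one\\ttwo)     # unquoted, so a double backslash is needed
stolen=$(print one\ttwo)       # the shell "steals" the backslash: onettwo
```

The first two variables end up holding the same text with a real TAB in the middle; the third holds the literal string onettwo.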

7.2.1.2 Options to print

print also accepts a few dash options; we've already seen -n for omitting the final NEWLINE. The options are listed in Table 7.3.

Table 7.3: print Options
Option	Function
-n	Omit the final newline (same as the \c escape sequence)
-r	Raw; ignore the escape sequences listed above
-p	Print on pipe to coroutine; see next chapter
-s	Print to command history file. See Chapter 2, Command-line Editing.
-un	Print to file descriptor n

Notice that some of these are redundant: print -n is the same as print with \c at the end of a line; print -un ... is equivalent to print ... >&n (though the former is more efficient).

However, print -s is not the same as print ... >> $HISTFILE. The latter command renders the vi and emacs editing modes temporarily inoperable; you must use print -s if you want to print to your history file.

Printing to your history file is useful if you want to edit something that the shell expands when it processes a command line; for example, a complex environment variable such as PATH. If you enter the command print -s PATH=$PATH and then press [CTRL-P] in emacs-mode (or ESC k in vi-mode), you will see something like this:

$ PATH=/bin:/usr/bin:/etc:/usr/ucb:/usr/local/bin:/home/billr/bin

That is, the shell expands the variable (and would expand anything else, like command substitutions, wildcards, etc.) before it writes the line to the history file. Your cursor will be at the end of the line (or at the beginning of the line in vi-mode), and you can edit your PATH without having to type in the whole thing again.

7.2.2 read

The other half of the shell's string I/O facilities is the read command, which allows you to read values into shell variables. The basic syntax is:

read var1 var2...

There are a few options, which we will cover in the section "Options to read," below. This statement takes a line from the standard input and breaks it down into words delimited by any of the characters in the value of the environment variable IFS (see Chapter 4, Basic Shell Programming; these are usually a space, a TAB, and NEWLINE). The words are assigned to variables var1, var2, etc. For example:

$ read fred bob
dave pete
$ print "$fred"
dave
$ print "$bob"
pete

If there are more words than variables, then excess words are assigned to the last variable. If you omit the variables altogether, the entire line of input is assigned to the variable REPLY.
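
These splitting rules can be tried without typing at a terminal by feeding read from a here-document. In this sketch the variable names and sample data are made up; the second read changes IFS for that one command only, so the colon becomes the delimiter.

```shell
# Three words, two variables: the last variable takes the excess.
read first rest <<EOF
dave pete mary
EOF
# first is "dave"; rest is "pete mary"

# Split on colons instead of whitespace, for this read only.
IFS=: read user shell <<EOF
billr:/bin/ksh
EOF
# user is "billr"; shell is "/bin/ksh"
```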

You may have identified this as the "missing ingredient" in the shell programming capabilities we have seen thus far. It resembles input statements in conventional languages, like its namesake in Pascal. So why did we wait this long to introduce it?

Actually, read is sort of an "escape hatch" from traditional shell programming philosophy, which dictates that the most important unit of data to process is a text file, and that UNIX utilities such as cut, grep, sort, etc., should be used as building blocks for writing programs.

read, on the other hand, implies line-by-line processing. You could use it to write a shell script that does what a pipeline of utilities would normally do, but such a script would inevitably look like:

while (read a line) do
    process the line
    print the processed line
end

This type of script is usually much slower than a pipeline; furthermore, it has the same form as a program someone might write in C (or some similar language) that does the same thing much, much faster. In other words, if you are going to write it in this line-by-line way, there is no point in writing a shell script. (The author has gone for years without writing a script with read in it.)

7.2.2.1 Reading lines from files

Nevertheless, shell scripts with read are useful for certain kinds of tasks. One is when you are reading data from a file small enough so that efficiency isn't a concern (say a few hundred lines or less), and it's really necessary to get bits of input into shell variables.

One task that we have already seen fits this description: Task 5-4, the script that a system administrator could use to set a user's TERM environment variable according to which terminal line he or she is using. The code in Chapter 5, Flow Control used a case statement to select the correct value for TERM.

This code would presumably reside in /etc/profile, the system-wide initialization file that the Korn shell runs before running a user's .profile. If the terminals on the system change over time - as surely they must - then the code would have to be changed. It would be better to store the information in a file and change just the file instead.

Assume we put the information in a file whose format is typical of such UNIX "system configuration" files: each line contains a device name, a TAB, and a TERM value. If the file, which we'll call /etc/terms, contained the same data as the case statement in Chapter 5, Flow Control, it would look like this:

console	s531
tty01	gl35a
tty03	gl35a
tty04	gl35a
tty07	t2000
tty08	s531

We can use read to get the data from this file, but first we need to know how to test for the end-of-file condition. Simple: read's exit status is 1 (i.e., non-0) when there is nothing to read. This leads to a clean while loop:

TERM=vt99       # assume this as a default
line=$(tty)
while read dev termtype; do
    if [[ $dev = $line ]]; then
        TERM=$termtype
        print "TERM set to $TERM."
        break
    fi
done

The while loop reads each line of the input into the variables dev and termtype. In each pass through the loop, the if looks for a match between $dev and the user's tty ($line, obtained by command substitution from the tty command). If a match is found, TERM is set, a message is printed, and the loop exits; otherwise TERM remains at the default setting of vt99.

We're not quite done, though: this code reads from the standard input, not from /etc/terms! We need to know how to redirect input to multiple commands. It turns out that there are a few ways of doing this.

7.2.2.2 I/O Redirection and multiple commands

One way to solve the problem is with a subshell, as we'll see in the next chapter. This involves creating a separate process to do the reading. However, it is usually more efficient to do it in the same process; the Korn shell gives us three ways of doing this.

The first, which we have seen already, is with a function:

function findterm {
    TERM=vt99       # assume this as a default
    line=$(tty)
    while read dev termtype; do
        if [[ $dev = $line ]]; then
            TERM=$termtype
            print "TERM set to $TERM."
            break;
        fi
    done
}

findterm < /etc/terms

A function acts like a script in that it has its own set of standard I/O descriptors, which can be redirected in the line of code that calls the function. In other words, you can think of this code as if findterm were a script and you typed findterm < /etc/terms on the command line. The read statement takes input from /etc/terms a line at a time, and the function runs correctly.
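
Here is a self-contained sketch of the same idea that can be run without touching /etc/terms. The function name lookup, the variable result, and the temporary file are all hypothetical. Two further assumptions are ours: tty normally prints a full path such as /dev/tty07, so the sketch strips the /dev/ prefix before comparing with the bare device names in the file; and the function is written in the name() form so that it also runs in other POSIX-style shells.

```shell
# lookup DEVICE: scan "device termtype" lines on standard input and
# leave the matching type in $result (hypothetical helper, default vt99).
lookup() {
    result=vt99
    while read dev termtype; do
        if [ "$dev" = "$1" ]; then
            result=$termtype
            break
        fi
    done
}

terms=$(mktemp)                       # stand-in for /etc/terms
printf 'console\ts531\ntty01\tgl35a\ntty07\tt2000\n' > "$terms"

line=/dev/tty07                       # what $(tty) might print
lookup "${line#/dev/}" < "$terms"     # strip /dev/, redirect at the call
from_file=$result

lookup tty99 <<EOF
console s531
EOF
from_heredoc=$result                  # no match, so the default remains

rm -f "$terms"
```

The same function body serves both calls; only the redirection on the calling line decides where read gets its input.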

The second way is by putting the I/O redirector at the end of the loop, like this:

TERM=vt99       # assume this as a default
line=$(tty)
while read dev termtype; do
    if [[ $dev = $line ]]; then
        TERM=$termtype
        print "TERM set to $TERM."
        break;
    fi
done < /etc/terms

You can use this technique with any flow-control construct, including if...fi, case...esac, select...done, and until...done. This makes sense because these are all compound statements that the shell treats as single commands for these purposes. This technique works fine - the read command reads a line at a time - as long as all of the input is done within the compound statement.

7.2.2.3 Code blocks

But if you want to redirect I/O to or from an arbitrary group of commands without creating a separate process, you need to use a construct that we haven't seen yet. If you surround some code with { and }, the code will behave like a function that has no name. This is another type of compound statement. In accordance with the equivalent concept in the C language, we'll call this a block of code. [5]

[5] LISP programmers may prefer to think of this as an anonymous function or lambda function.

What good is a block? In this case, it means that the code within the curly brackets ({}) will take standard I/O descriptors just as a function does, as described above. This construct is appropriate for the current example because the code needs to be called only once, and the entire script is not really large enough to merit breaking down into functions. Here is how we use a block in the example:

{
    TERM=vt99       # assume this as a default
    line=$(tty)
    while read dev termtype; do
        if [[ $dev = $line ]]; then
            TERM=$termtype
            print "TERM set to $TERM."
            break;
        fi
    done
} < /etc/terms

To help you understand how this works, think of the curly brackets and the code inside them as if they were one command, i.e.:

{ TERM=vt99; line=$(tty); while ... } < /etc/terms
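
To see why a block helps when more than one command must share the redirected input, consider this sketch. The heading line and the variable names are invented; the data comes from a here-document rather than a real file. The first read consumes one line, and the loop carries on from where it left off.

```shell
{
    read heading                      # first command takes line 1
    count=0
    while read dev termtype; do       # the loop picks up at line 2
        count=$((count + 1))
    done
} <<EOF
device termtype
console s531
tty01 gl35a
EOF
# heading is "device termtype"; count is 2
```

Had the redirector been attached to the while loop alone, the separate read for the heading would have taken its input from somewhere else.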

Configuration files for system administration tasks like this one are actually fairly common; a prominent example is /etc/hosts, which lists machines that are accessible in a TCP/IP network. We can make /etc/terms more like these standard files by allowing comment lines in the file that start with #, just as in shell scripts. This way /etc/terms can look like this:

#
# System Console is a Shande 531s
console	s531
#
# Prof. Subramaniam's line has a Givalt GL35a
tty01	gl35a
...

We can handle comment lines in two ways. First, we could modify the while loop so that it ignores lines beginning with #. We would take advantage of the fact that the equal sign (=) inside [[...]] does pattern matching, not just equality testing (and likewise != tests for a pattern that does not match):

if [[ $dev != \#* && $dev = $line ]]; then
    ...

The pattern is #*, which matches any string beginning with #. We must precede # with a backslash so that the shell doesn't treat the rest of the line as a comment. Also, remember from Chapter 5 that the && combines the two conditions so that both must be true for the entire condition to be true.

This would certainly work, but the more usual way to filter out comment lines is to use a pipeline with grep. We give grep the regular expression ^[^#], which matches any line whose first character is not # (so it also drops empty lines). Then we change the call to the block so that it reads from the output of the pipeline instead of directly from the file. [6]

[6] Unfortunately, using read with input from a pipe is often very inefficient, because of issues in the design of the shell that aren't relevant here.

grep "^[^#]" /etc/terms | {
    TERM=vt99
    ...
}
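
The filtering can be checked on its own with a sketch like the following. The temporary file is a hypothetical stand-in for /etc/terms, printf is used in place of print so that the sketch also runs in other shells, and the block's output is captured with command substitution so the result can be inspected afterward.

```shell
terms=$(mktemp)                       # hypothetical stand-in for /etc/terms
printf '%s\n' '#' '# System Console is a Shande 531s' > "$terms"
printf 'console\ts531\n\n' >> "$terms"          # data line, then a blank line
printf '%s\n' '# a second comment' >> "$terms"
printf 'tty01\tgl35a\n' >> "$terms"

# Only non-blank, non-comment lines reach the block.
devices=$(
    grep '^[^#]' "$terms" | {
        while read dev termtype; do
            printf '%s=%s\n' "$dev" "$termtype"
        done
    }
)
rm -f "$terms"
```

Of the six lines in the file, only the two data lines survive the grep, so devices holds console=s531 and tty01=gl35a on separate lines.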

We can also use read to improve our solution to Task 6-2, in which we emulate the multicolumn output of ls. In the solution in the previous chapter, we assumed that (as in System V-derived versions of UNIX) filenames are limited to 14 characters, and we used 14 as a fixed column width. We'll improve the solution so that it allows any filename length (as in BSD-derived UNIX versions) and uses the length of the longest filename (plus 2) as the column width.

We will need to make two passes through the list of files we want to display in multicolumn format. In the first pass, we will find the longest filename and use that to set the number of columns as well as their width; the second pass will do the actual output. Here is a block of code for the first pass:

ls "$@" | {
    let width=0
    while read fname; do
        if (( ${#fname} > $width )); then
            let width=${#fname}
        fi
    done
    let width="$width + 2"
    let numcols="${COLUMNS:-80} / $width"
}

This code looks a bit like an exercise from a first-semester programming class. The while loop goes through the input looking for files with names that are longer than the longest found so far; if a longer one is found, its length is saved as the new longest length.

After the loop finishes, we add 2 to the width to allow for space between columns. Then we divide the width of the terminal by the column width to get the number of columns; the shell's integer division operator truncates remainders, which is just what we want. Recall from Chapter 3 that the built-in variable COLUMNS often contains the display width; the construct ${COLUMNS:-80} gives a default of 80 if this variable is not set.
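
A worked instance of this arithmetic, as a sketch: the three filenames are invented and are supplied by a here-document rather than by ls, and the terminal width is fixed at 80 so the numbers are predictable. It uses $((...)) where the code above uses let.

```shell
cols=80                       # fixed stand-in for ${COLUMNS:-80}
width=0
while read fname; do
    if [ "${#fname}" -gt "$width" ]; then
        width=${#fname}       # remember the longest name seen so far
    fi
done <<EOF
README
Makefile
a_rather_long_filename.txt
EOF
width=$((width + 2))          # 26 + 2 = 28
numcols=$((cols / width))     # 80 / 28 truncates to 2
```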

The results of the block are the variables width and numcols. These are global variables, so they are accessible by the rest of the code inside our (eventual) script. In particular, we need them in our second pass through the filenames. The code for this resembles the code for our original solution; all we need to do is replace the fixed column width and number of columns with the variables:

set -A filenames $(ls "$@")
typeset -L$width fname
let count=0

while (( $count < ${#filenames[*]} )); do
    fname=${filenames[$count]}
    print "$fname  \\c"
    let count="count + 1"
    if [[ $((count % numcols)) = 0 ]]; then
         print		# output a NEWLINE
    fi
done

if [[ $((count % numcols)) != 0 ]]; then
    print
fi

The entire script consists of both pieces of code.

7.2.2.4 Reading User Input

The other type of task to which read is suited is prompting a user for input. Think about it: we have hardly seen any such scripts so far in this book. In fact, the only ones were the modified solutions to Task 5-4, which involved select.

As you've probably figured out, read can be used to get user input into shell variables. We can use print to prompt the user, like this:

print -n 'terminal? '
read TERM
print "TERM is $TERM"

Here is what this looks like when it runs:

terminal? vt99 
TERM is vt99

However, shell convention dictates that prompts should go to standard error, not standard output. (Recall that select prompts to standard error.) We could just use file descriptor 2 with the output redirector we saw earlier in this chapter:

print -n 'terminal? ' >&2
read TERM
print "TERM is $TERM"

However, this has various disadvantages. The shell provides a better way of doing the same thing: if you follow the first variable name in a read statement with a question mark (?) and a string, the shell will use that string as a prompt. In other words:

read TERM?'terminal? '
print "TERM is $TERM"

does the same as the above. This looks a bit nicer; also, the shell knows not to generate the prompt if the input is redirected to come from a file, and this scheme allows you to use vi- or emacs-mode on your input line.

We'll flesh out this simple example by showing how Task 5-4 would be done if select didn't exist. Compare this with the code in Chapter 5:

print 'Select your terminal type:'
done=false
while [[ $done = false ]]; do
    done=true		# assume user will make a valid choice
    {
        print '1) gl35a'
        print '2) t2000'
        print '3) s531'
        print '4) vt99'
    } >&2
    read REPLY?'terminal? '

    case $REPLY in
        1 ) TERM=gl35a ;;
        2 ) TERM=t2000 ;;
        3 ) TERM=s531 ;;
        4 ) TERM=vt99 ;;
        * ) print 'invalid.'
            done=false ;;
    esac
done
print "TERM is $TERM"

The while loop is necessary so that the code repeats if the user makes an invalid choice.

This is roughly twice as many lines of code as the first solution in Chapter 5 - but exactly as many as the later, more user-friendly version! This shows that select saves you code only if you don't mind using the same strings to display your menu choices as you use inside your script.

However, select has other advantages, including the ability to construct multicolumn menus if there are many choices, and better handling of null user input.

7.2.2.5 Options to read

read takes a set of options that are similar to those for print. Table 7.4 lists them.

Table 7.4: read Options
Option	Function
-r	Raw; do not use \ as line continuation character.
-p	Read from pipe to coroutine; see next chapter.
-s	Save input in command history file. See Chapter 2, Command-line Editing.
-un	Read from file descriptor n.

read lets you input lines that are longer than the width of your display device by providing backslash (\) as a continuation character, just as in shell scripts. The -r option to read overrides this, in case your script reads from a file that may contain lines that happen to end in backslashes.

read -r also preserves any other escape sequences the input might contain. For example, if the file fred contains this line:

A line with a\n escape sequence

Then read -r fredline will include the backslash in the variable fredline, whereas without the -r, read will "eat" the backslash. As a result:

$ read -r fredline < fred 
$ print "$fredline" 
A line with a
 escape sequence
$

However:

$ read fredline < fred 
$ print "$fredline" 
A line with an escape sequence
$
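
The line-continuation behavior can be seen in the same way. In this sketch the input comes from a quoted here-document (so that the shell itself leaves the backslash alone) and the variable names are invented.

```shell
# Without -r, backslash-newline joins the two physical lines.
read joined <<'EOF'
first half, \
second half
EOF
# joined is "first half, second half"

# With -r, the backslash is ordinary data and the line ends there.
read -r unjoined <<'EOF'
first half, \
second half
EOF
# unjoined is "first half, \" (backslash included)
```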

The -s option helps you if you are writing a highly interactive script and you want to provide the same command-history capability as the shell itself has. For example, say you are writing a new version of mail as a shell script. Your basic command loop might look like this:

while read -s cmd; do
    # process the command
done

Using read -s allows the user to retrieve previous commands to your program with the emacs-mode [CTRL-P] command or the vi-mode ESC k command. The kshdb debugger in Chapter 9, Debugging Shell Programs uses this feature.

Finally, the -un option is useful in scripts that read from more than one file at the same time. Here's an example of this that also uses the n< I/O redirector that we saw earlier in this chapter.

Task 7-4

Write a script that prints the contents of two files side by side.

We'll format the output so the two output columns are fixed at 30 characters wide. Here is the code:

typeset -L30 f1 f2
while read -u3 f1 && read -u4 f2; do
    print "$f1$f2"
done 3<$1 4<$2

read -u3 reads from file descriptor 3, and 3<$1 directs the file given as first argument to be input on that file descriptor; the same is true for the second argument and file descriptor 4. Remember that file descriptors 0, 1, and 2 are already used for standard I/O. We use file descriptors 3 and 4 for our two input files; it's best to start from 3 and work upwards to the shell's limit, which is 9.

The typeset command and the quotes around the argument to print ensure that the output columns are 30 characters wide and that all whitespace in the lines from the file is preserved. The while loop reads one line from each file until at least one of them runs out of input.

Assume the file dave contains the following:

DAVE
Height: 5'10"
Weight: 175 lbs.
Hair: brown
Eyes: brown

And the file shirley contains this:

SHIRLEY
Height: 5'6"
Weight: 142 lbs.
Hair: blonde
Eyes: blue

If the script is called twocols, then twocols dave shirley produces this output:

DAVE                          SHIRLEY
Height: 5'10"                 Height: 5'6"                  
Weight: 175 lbs.              Weight: 142 lbs.              
Hair: brown                   Hair: blonde                  
Eyes: brown                   Eyes: blue
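
As a variation that can be tried in shells without read -u or typeset -L, the sketch below swaps in two substitutes of our own: read ... <&3 in place of read -u3, and printf's %-30s format in place of typeset -L30 (it pads the left column but, unlike typeset -L30, does not truncate longer lines). The file contents are shortened, made-up samples, and the left file is deliberately one line longer than the right.

```shell
left=$(mktemp); right=$(mktemp)       # hypothetical sample files
printf 'DAVE\nHair: brown\nEyes: brown\n' > "$left"
printf 'SHIRLEY\nHair: blonde\n' > "$right"

out=$(
    while read -r f1 <&3 && read -r f2 <&4; do
        printf '%-30s%s\n' "$f1" "$f2"
    done 3<"$left" 4<"$right"
)
rm -f "$left" "$right"
# out has two lines; the third line of the left file is never printed
```

The loop stops as soon as either read fails, which is why the extra line in the longer file is dropped.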

