sed & awk

Chapter 10: The Bottom Drawer
 

10.8 Limitations

There are fixed limits within any awk implementation. The only trouble is that the documentation seldom reports them. Table 10.1 lists the limitations as described in The AWK Programming Language. These limitations are implementation-specific but they are good ballpark figures for most systems.

Table 10.1: Limitations
Item                               Limit
Number of fields per record        100
Characters per input record        3000
Characters per output record       3000
Characters per field               1024
Characters per printf string       3000
Characters in literal string       400
Characters in character class      400
Files open                         15
Pipes open                         1

NOTE: Despite the number in Table 10.1, experience has shown that most awks allow you to have more than one open pipe.
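One common way to stay under a limit on open files is to call close() on each output file as soon as you are done writing to it. The following is a minimal sketch, not a definitive implementation: the function name split_by_key and the ".txt" suffix are invented for illustration, and it assumes the first field of each line is safe to use as a filename (no slashes, for instance).

```shell
# Write each input line to a file named after its first field.
# split_by_key is a hypothetical helper; $1 is an existing output directory.
split_by_key() {
    outdir=$1
    awk -v dir="$outdir" '{
        out = dir "/" $1 ".txt"   # build the filename in a variable first
        print $0 >> out           # append, so reopening does not truncate
        close(out)                # release the file before the next record
    }'
}
```

Because each file is closed right after the write, at most one output file is open at a time, no matter how many distinct keys the input contains. The cost is one open and one close per record, which is slow for large inputs.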

In terms of numeric values, awk uses double-precision, floating-point numbers that are limited in size by the machine's architecture.
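To see what this means in practice, here is a small probe. It assumes a platform with IEEE 754 double-precision numbers, which is typical but not guaranteed; the function name double_limit_demo is made up for this sketch.

```shell
# Probe the precision of awk's numbers (assumes IEEE 754 doubles).
double_limit_demo() {
    awk 'BEGIN {
        big = 2^53                     # up to 2^53, every integer is exact
        if (big + 1 == big) {
            print "2^53 + 1 collapses to 2^53"
        } else {
            print "2^53 + 1 is distinct"
        }
        printf "%.17g\n", 0.1 + 0.2    # decimal fractions are rounded in binary
    }'
}

double_limit_demo
```

On such a platform, the first line reports that adding 1 to 2^53 has no effect, because the result cannot be represented exactly. Counters and sums that might grow that large need special handling.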

Running into these limits can cause unanticipated problems with scripts. In developing examples for the first edition of this book, Dale thought he'd write a search program that could look for a word or sequence of words in a single paragraph. The idea was to read a document as a series of multiline records and, if any of the fields contained the search term, print the record, which was a paragraph. It could be used to search through mail files where blank lines delimit paragraphs. The resulting program worked for small test files. However, when tried on larger files, the program dumped core because it encountered a paragraph that was longer than the maximum input record size, which is 3000 characters. (Actually, the file contained an included mail message where blank lines within the message were prefixed by ">", so the whole message was read as a single paragraph.) Thus, when reading multiple lines as a single record, you had better be sure that no record will be longer than 3000 characters. By the way, there is no particular error message that alerts you to the fact that the problem is the size of the current record.
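The following sketch is not Dale's original program. It is a simplified reconstruction of the idea, plus a line-by-line check that can be run first to find paragraphs that would exceed a given record size. The function names parasearch and longparas are invented here, and the search tests the whole record with index() rather than examining each field.

```shell
# Print every blank-line-delimited paragraph containing the literal string $1.
parasearch() {
    term=$1; shift
    awk -v term="$term" '
        BEGIN { RS = ""; ORS = "\n\n" }   # paragraph mode: blank lines end a record
        index($0, term) { print }         # literal substring match on the record
    ' "$@"
}

# Report paragraphs longer than $1 characters. Input is read one line at a
# time, so no single record approaches the size of a whole paragraph.
longparas() {
    max=$1; shift
    awk -v max="$max" '
        function report() {
            if (len > max)
                printf "paragraph starting at line %d: %d characters\n", start, len
        }
        /^$/ { report(); len = 0; start = 0; next }   # blank line ends a paragraph
        {
            if (!start) { start = NR; len = length($0) }
            else        { len += length($0) + 1 }     # +1 for the joining newline
        }
        END { report() }
    ' "$@"
}
```

Running something like "longparas 3000 mailfile" before the search would have pointed straight at the offending paragraph, since the check itself never builds a long record.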

Fortunately, gawk and mawk (see Chapter 11, A Flock of awks) don't have such small limits; for example, the number of fields in a record is limited in gawk to the maximum value that can be held in a C long, and certainly records can be longer than 3000 characters. These versions also allow you to have more open files and pipes.
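If you are unsure which kind of awk you have, a quick probe can tell you whether it handles a record well beyond the figures in Table 10.1. This is only a sketch; count_fields is a made-up name, and the probe checks one record size rather than finding the actual limit.

```shell
# Build one record with n fields and ask awk how many it sees.
count_fields() {
    n=$1
    awk -v n="$n" 'BEGIN {
        for (i = 1; i <= n; i++) printf "f%d ", i   # one long line, n fields
        print ""
    }' | awk '{ print NF }'
}

count_fields 5000
```

An awk without the small limits prints 5000. An awk that enforces them may print an error, truncate the record, or crash, so treat anything other than 5000 as a sign that the limits apply.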

Recent versions of the Bell Labs awk have two options, -mf N and -mr N, that allow you to set the maximum number of fields and the maximum record size on the command line, as an emergency way to get around the default limits.

(Sed implementations also have their own limits, which aren't documented. Experience has shown that most UNIX versions of sed have a limit of 99 or 100 substitute (s) commands.)
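Since the limit is undocumented, the practical way to learn whether your sed has it is to test. The sketch below generates a script of n substitute commands in which only the last one matches the input; the name sed_command_probe is hypothetical, and it assumes mktemp is available.

```shell
# Generate n substitute commands and run them; only the nth matches "k<n>".
sed_command_probe() {
    n=$1
    script=$(mktemp) || return 1
    awk -v n="$n" 'BEGIN {
        for (i = 1; i <= n; i++)
            print "s/^k" i "$/v" i "/"    # e.g. s/^k150$/v150/
    }' > "$script"
    printf 'k%d\n' "$n" | sed -f "$script"
    status=$?                             # exit status of sed
    rm -f "$script"
    return $status
}

sed_command_probe 150
```

If sed accepts all 150 commands, the output is v150. An error message, or output of k150, suggests that the script was rejected or that the last command was never applied.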

