[Chapter 17] 17.4 Fixed-Length Random Access Databases

17.4 Fixed-Length Random Access Databases

Another form of persistent data is the fixed-length, record-oriented disk file. In this scheme, the data consists of a number of records of identical length. The numbering of the records is either not important or determined by some indexing scheme.

For example, we might have a series of records in which the data has 40 characters of first name, a one-character middle initial, 40 characters of last name, and then a two-byte integer for the age. Each record is then 83 bytes long. If we were reading all of the data in the database, we'd read chunks of 83 bytes until we got to the end. If we wanted to go to the fifth record, we'd skip ahead four times 83 bytes (332 bytes) and read the fifth record directly.
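The arithmetic above can be sketched in a few lines of Perl. The helper name record_offset and the variable names are made up for this illustration; only the field widths come from the example layout.

```perl
use strict;
use warnings;

# Field widths (in bytes) from the example layout:
# first name, middle initial, last name, age.
my @field_widths = (40, 1, 40, 2);

my $record_length = 0;
$record_length += $_ for @field_widths;

# Hypothetical helper: byte offset of record number $n, counting from 1.
sub record_offset {
    my ($n) = @_;
    return ($n - 1) * $record_length;
}

print "record length: $record_length\n";               # 83
print "offset of record 5: ", record_offset(5), "\n";  # 332
```

Because every record has the same length, finding a record costs one multiplication, no matter how large the file is.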

Perl supports programs that use such a disk file. A few things are necessary in addition to what you already know:

Opening a disk file for both reading and writing
Moving around in this file to an arbitrary position
Fetching data by a length rather than up to the next newline
Writing data down in fixed-length blocks

The open function takes an additional plus sign before its I/O direction specification to indicate that the file is really being opened for both reading and writing. For example:

open(A,"+<b");  # open file b read/write (error if file absent)
open(C,"+>d");  # create file d with read/write access (truncates d if it exists)
open(E,"+>>f"); # open or create file f with read/write access (writes append)
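The difference between "+<" and "+>" matters for a database file, because "+>" discards whatever the file already contains. Here is a minimal sketch of that behavior. It uses the three-argument form of open with a lexical filehandle (available in Perl 5.6 and later) rather than the two-argument form shown above, and the file name names.db is purely illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a scratch directory so the sketch touches no real files.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/names.db";          # hypothetical file name

# "+<" fails when the file does not exist yet.
my $opened_missing = open(my $fh, "+<", $file) ? 1 : 0;

# "+>" creates the file (and would empty an existing one).
open($fh, "+>", $file) or die "cannot create $file: $!";
print $fh "x" x 83;                  # one record's worth of filler
close($fh) or die "cannot close $file: $!";

# Now "+<" succeeds and leaves the existing contents intact.
open($fh, "+<", $file) or die "cannot open $file: $!";
my $size_after_reopen = -s $fh;      # 83
close($fh);

# "+>" on the existing file throws the old data away.
open($fh, "+>", $file) or die "cannot reopen $file: $!";
my $size_after_clobber = -s $fh;     # 0
close($fh);

print "opened missing file: $opened_missing\n";
print "size after +<: $size_after_reopen, after +>: $size_after_clobber\n";
```

For updating records in place, "+<" is usually the mode you want once the file exists.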

Notice that all we've done is prepend a plus sign to the I/O direction.

Once we've got the file open, we need to move around in it. We do this with the seek function, which takes the same three parameters as the fseek(3) library routine. The first parameter is a filehandle; the second parameter gives an offset, which is interpreted in conjunction with the third parameter. Usually, you'll want the third parameter to be zero so that the second parameter selects a new absolute position for the next read from or write to the file. For example, to go to the fifth record on the filehandle NAMES (as described above), you can do this:

seek(NAMES,4*83,0);
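The third parameter can also be 1 (relative to the current position) or 2 (relative to the end of the file), as with fseek(3). The following sketch, which assumes a scratch file of ten blank records, uses the symbolic names that the standard Fcntl module exports for these values and checks each position with tell.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);              # SEEK_SET (0), SEEK_CUR (1), SEEK_END (2)
use File::Temp qw(tempfile);

my $record_length = 83;

# Scratch file holding ten blank records (830 bytes).
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh);
print $fh "\0" x (10 * $record_length);

# Absolute: go to the start of the fifth record.
seek($fh, 4 * $record_length, SEEK_SET) or die "seek failed: $!";
my $at_fifth = tell($fh);         # 332

# Relative: skip one record forward from the current position.
seek($fh, $record_length, SEEK_CUR) or die "seek failed: $!";
my $at_sixth = tell($fh);         # 415

# From the end: back up to the start of the last record.
seek($fh, -$record_length, SEEK_END) or die "seek failed: $!";
my $at_last = tell($fh);          # 747

close($fh);
print "$at_fifth $at_sixth $at_last\n";
```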

Once the file pointer has been repositioned, the next input or output will start there. For output, use the print function, but be sure that the data you are writing is the right length. To obtain the right length, we can call upon the pack function:

print NAMES pack("A40 A A40 s", $first, $middle, $last, $age);

That pack specifier gives 40 characters for $first, a single character for $middle, 40 more characters for $last, and a short (two bytes) for the $age. This should be 83 bytes long, and will be written at the current file position.
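As a quick check of that claim, the sketch below packs one record and inspects the result. The names and age are sample values invented for the illustration, and the 83-byte total assumes a platform where a native short is two bytes.

```perl
use strict;
use warnings;

my $format = "A40 A A40 s";

# Sample data for illustration only.
my $record = pack($format, "Fred", "Q", "Flintstone", 30);
my $length = length($record);                   # 83

# "A" pads short strings with spaces...
my $first_field = substr($record, 0, 40);       # "Fred" plus 36 spaces

# ...and silently truncates strings that are too long.
my $long_record = pack($format, "x" x 50, "Q", "Flintstone", 30);
my $long_length = length($long_record);         # still 83

# The age occupies the final two bytes as a native short.
my $age = unpack("s", substr($record, 81, 2));  # 30

print "length: $length, long length: $long_length, age: $age\n";
```

Since "A" pads and truncates as needed, the record is the same length whatever the lengths of the strings passed in.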

Last, we need to fetch a particular record. Although the <NAMES> construct returns all of the data from the current position to the next newline, that's not what we want here; a record is supposed to run for exactly 83 bytes, and there probably isn't a newline right there. Instead, we use the read function, which looks and works a lot like its UNIX system call counterpart:

$count = read(NAMES, $buf, 83);

The first parameter for read is the filehandle. The second parameter is a scalar variable that holds the data that will be read. The third parameter gives the number of bytes to read. The return value from read is the number of bytes actually read. This is typically the number of bytes asked for, but it will be smaller if you are too close to the end of the file, zero if you are already at the end of the file, and undef if an error occurred (for example, if the filehandle is not open).
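A loop that reads every record therefore has three cases to handle: a full record, a short final chunk, and the end of the file. The following is a minimal sketch, assuming a scratch file that holds two full records followed by ten stray bytes; the counter names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $record_length = 83;

# Scratch file: two full records followed by 10 stray bytes.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh);
print $fh "a" x $record_length, "b" x $record_length, "c" x 10;
seek($fh, 0, 0) or die "seek failed: $!";

my ($full_records, $leftover_bytes) = (0, 0);
while (1) {
    my $count = read($fh, my $buf, $record_length);
    die "read failed: $!" unless defined $count;   # undef signals an error
    last if $count == 0;                           # 0 signals end of file
    if ($count == $record_length) {
        $full_records++;
    } else {
        $leftover_bytes = $count;                  # short read near the end
    }
}
close($fh);

print "full records: $full_records, leftover bytes: $leftover_bytes\n";
```

A nonzero leftover usually means the file was damaged or was written with a different record length.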

Once you have the 83-byte record, just break it into its component parts with the unpack function:

($first, $middle, $last, $age) = unpack("A40 A A40 s", $buf);
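One detail worth knowing is that the "A" format strips trailing spaces when unpacking, so the padding that pack added disappears again, along with any trailing spaces that were in the original value. A small sketch with made-up sample values:

```perl
use strict;
use warnings;

my $format = "A40 A A40 s";

# Sample values: an interior space, an empty initial, trailing spaces.
my $buf = pack($format, "Mary Ann", "", "Smith  ", 41);

my ($first, $middle, $last, $age) = unpack($format, $buf);

# $first  is "Mary Ann": the interior space is kept, the padding is gone
# $middle is "":         the empty initial was stored as a single space
# $last   is "Smith":    the original trailing spaces are lost as well
print "[$first] [$middle] [$last] [$age]\n";
```

If trailing spaces are significant in your data, the "a" format (which pads with null bytes and strips nothing when unpacking) may be a better fit, at the cost of having to remove the padding yourself.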

Note that the pack and unpack format strings are the same. Most programs store this string in a variable early in the program, and even compute the length of the records using pack instead of sprinkling the constant 83 everywhere:

$names = "A40 A A40 s";
$names_length = length(pack($names)); # probably 83
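Putting the pieces together, here is one way a complete round trip might look. This is a sketch rather than a definitive implementation: the helpers write_record and read_record are hypothetical names, the data is invented, a temporary file stands in for the real database, and explicit empty values are passed to pack when computing the length so that the code stays quiet under warnings.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $names        = "A40 A A40 s";
my $names_length = length(pack($names, "", "", "", 0));  # 83 with a two-byte short

# A temporary read/write file stands in for the real database.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh);

# Hypothetical helper: store @fields as record number $n (counting from 1).
sub write_record {
    my ($n, @fields) = @_;
    seek($fh, ($n - 1) * $names_length, 0) or die "seek failed: $!";
    print {$fh} pack($names, @fields) or die "write failed: $!";
}

# Hypothetical helper: return the fields of record $n, or an empty list
# if no complete record exists at that position.
sub read_record {
    my ($n) = @_;
    seek($fh, ($n - 1) * $names_length, 0) or die "seek failed: $!";
    my $count = read($fh, my $buf, $names_length);
    die "read failed: $!" unless defined $count;
    return if $count != $names_length;
    return unpack($names, $buf);
}

write_record(1, "Fred",   "Q", "Flintstone", 30);
write_record(2, "Barney", "B", "Rubble",     28);
write_record(3, "Wilma",  "S", "Flintstone", 29);
write_record(2, "Barney", "B", "Rubble",     31);   # update record 2 in place

my @second    = read_record(2);    # ("Barney", "B", "Rubble", 31)
my @missing   = read_record(4);    # empty list: past the end of the file
my $file_size = -s $fh;            # 3 * 83 = 249 bytes
close($fh);

print "record 2: @second\n";
print "file size: $file_size\n";
```

Each helper calls seek before it reads or writes, which also satisfies the usual requirement to reposition the file pointer when switching between reading and writing on the same filehandle.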

