UNIX Power Tools

Chapter 20: Backing Up Files

20.8 Telling tar Which Files to Exclude or Include

[This article was written for SunOS. Many versions of tar don't have some or all of these features. Some do it in a different way. Check your tar manual page, or use the GNU tar (19.6) that we provide on the disc. -JP]

On some systems, make (28.13) creates filenames starting with a comma (,) to keep track of dependencies. Various editors create backup files whose names end with a percent sign (%) or a tilde (~). I often keep the original copy of a program with the .orig extension and old versions with a .old extension.

I usually don't want to save these files on my backups. There may also be some binary files that I don't want to archive but don't want to delete, either.

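A solution is to use the X flag to tar (20.1). [Check your tar manual page for the F and FF options, too. -JIK] The X flag takes an argument: the name of a file that lists the files to exclude from the archive. When flags are bundled, as in cvfX below, the arguments follow in the same order as the flags that need them, so f takes project.tar and X takes Exclude. Here is an example: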

% find project ! -type d -print | \
egrep '/,|%$|~$|\.old$|SCCS|/core$|\.o$|\.orig$' > Exclude
% tar cvfX project.tar Exclude project

In this example, find (17.1) lists every file under project but, because of ! -type d, does not print the directory names themselves. That matters because a directory name in the exclude list also excludes all the files inside that directory. egrep (27.5) then acts as a filter: only the pathnames that match one of its regular expressions are written to Exclude, and those are the files tar leaves out of the archive. The expression looks complex but is simple once you understand a few special characters:

/

The slash is not a special character. However, since no filename can contain a slash, it matches the beginning of a filename, as output by the find command.

|

The vertical bar separates each regular expression.

$

The dollar sign is one of the two regular expression "anchors" and specifies the end of the line, or filename in this case. The other anchor, which specifies the beginning of the line, is ^ (caret). But because find prints each file's full pathname, ^ matches only at the start of that pathname (the top-level directory name), not at the start of a filename further down the tree. That is why the patterns use a slash instead to mark the beginning of a filename.

\.

Normally the dot matches any character in a regular expression. Here, we want to match the actual character . (dot), which is why the backslash is used to quote or escape the normal meaning.

A breakdown of the patterns and examples of the files that match these patterns is given here:

Pattern     Matches Files           Used by
/,          starting with ,         make dependency files
%$          ending with %           textedit backup files
~$          ending with ~           emacs backup files
\.old$      ending with .old        old copies
SCCS        in SCCS directory       Source Code Control System (20.13)
/core$      with name of core       core dump (52.9)
\.o$        ending with .o          object files
\.orig$     ending with .orig       original version
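
To see the table in action, the patterns can be tried on a handful of pathnames before any archive is made. This is only a sketch: the pathnames are made up for illustration, and grep -E is used as the POSIX spelling of egrep. Note that project/score is not selected, because /core$ needs a slash immediately before core.

```shell
# Hypothetical pathnames, in the form find would print them.
paths='project/main.c
project/main.c~
project/main.o
project/,depend
project/notes.old
project/SCCS/s.main.c
project/core
project/score
project/parse.c.orig
project/Makefile'

# The pattern is the one from the article; grep -E behaves like egrep.
excluded=$(printf '%s\n' "$paths" |
    grep -E '/,|%$|~$|\.old$|SCCS|/core$|\.o$|\.orig$')

# Print the pathnames that would end up in the Exclude file.
printf '%s\n' "$excluded"
```

Everything printed would be left out of the archive; main.c, score, and Makefile are not printed and so would be archived.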

Instead of specifying which files are to be excluded, you can specify which files to archive using the -I option. As with the exclude flag, specifying a directory tells tar to include (or exclude) the entire directory. Note that the syntax of -I differs from the typical tar flag: it is not bundled into the first argument with c, v, and f but is given separately, with its own dash, followed directly by the name of the include file. The next example archives all C source files, header files, and makefiles. It uses egrep's () grouping operators to make the $ anchor apply to every pattern inside the parentheses:

% find project -type f -print | \
egrep '(\.[ch]|[Mm]akefile)$' > Include
% tar cvf project.tar -I Include
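
On a tar without -I, the same idea may be available under another name. The sketch below assumes GNU tar or bsdtar, where -T names a file of pathnames to archive; that is an assumption about your tar, not the SunOS syntax shown above. The scratch directory and file names are hypothetical.

```shell
# Build a small hypothetical tree in a scratch directory.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p project/src
: > project/Makefile
: > project/src/main.c
: > project/src/main.h
: > project/src/main.o

# Same selection as the article: C sources, headers, and makefiles.
find project -type f -print |
    grep -E '(\.[ch]|[Mm]akefile)$' > Include

# Assumed GNU tar/bsdtar option: -T reads the names to archive from Include.
tar -cf project.tar -T Include

# List the archive to confirm what went in.
members=$(tar -tf project.tar)
printf '%s\n' "$members"
```

The listing should show the makefile, main.c, and main.h, but not main.o.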

I suggest using find to create the include or exclude file. You can edit it afterward, if you wish. One caution: extra spaces at the end of any line will cause the file named on that line to be ignored.
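
If the list has been edited by hand, trailing blanks can be found and removed before tar reads it. A minimal sketch, using a made-up list in a temporary file:

```shell
# Hypothetical list: line 1 ends in two spaces, line 2 ends in a tab.
list=$(mktemp)
printf 'project/main.c  \nproject/Makefile\t\n' > "$list"

# Report the numbers of the lines that end in a space or tab.
dirty=$(grep -n '[[:space:]]$' "$list" | cut -d: -f1)
printf 'lines with trailing blanks: %s\n' "$dirty"

# Strip any run of spaces or tabs from the end of every line.
cleaned=$(sed 's/[[:space:]]*$//' "$list")
printf '%s\n' "$cleaned"
```

Redirect the sed output to a new file and give that file to tar instead of the original.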

One way to check the list that find produced is to have tar write to /dev/null (13.14). With the v flag, tar still lists each file it would archive, but no archive is kept:

% tar cvfX /dev/null Exclude project
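
Another way to check an exclude list is to write a scratch archive and list its contents. The sketch below assumes GNU tar or bsdtar, where the exclude file is given with -X as a separate option; the tree and file names are hypothetical.

```shell
# Build a small hypothetical tree in a scratch directory.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p project/SCCS
: > project/main.c
: > project/main.c~
: > project/main.o
: > project/SCCS/s.main.c

# Collect the pathnames to leave out, as in the article.
find project ! -type d -print |
    grep -E '/,|%$|~$|\.old$|SCCS|/core$|\.o$|\.orig$' > Exclude

# Assumed GNU tar/bsdtar option: -X names the file of pathnames to exclude.
tar -cf project.tar -X Exclude project

# List what was actually archived.
archived=$(tar -tf project.tar)
printf '%s\n' "$archived"
```

The listing should include project/main.c but none of the backup, object, or SCCS files.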

20.8.1 Including Other Directories

There are times when you want to make an archive of several directories. You may want to archive a source directory and another directory like /usr/local. The natural, but wrong, way to do this is to use the command:

% tar cvf /dev/rmt8 project /usr/local

NOTE: When using tar, you must never specify a directory name starting with a slash (/). This will cause problems when you restore a directory, as you will see later (20.10).

The proper way to handle the incorrect example above is to use the -C flag:

% tar cvf /dev/rmt8 project -C /usr local

This will archive /usr/local/... as local/.... Article 20.10 has more information.
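
The effect of -C is easy to confirm on a scratch copy before writing to tape. In this sketch the directories under the temporary directory are hypothetical stand-ins for the real project and /usr/local, and the archive is an ordinary file rather than /dev/rmt8.

```shell
# Stand-in directories: $work/src/project and $work/usr/local.
work=$(mktemp -d)
mkdir -p "$work/src/project" "$work/usr/local/bin"
: > "$work/src/project/main.c"
: > "$work/usr/local/bin/tool"

cd "$work/src" || exit 1
# Archive project from here, then change to the stand-in usr directory
# and archive local by its relative name.
tar -cf "$work/both.tar" project -C "$work/usr" local

# Every name in the listing should be relative (no leading slash).
names=$(tar -tf "$work/both.tar")
printf '%s\n' "$names"
```

The order matters: -C applies only to the pathnames that come after it on the command line.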

20.8.2 Type Pathnames Exactly

For the above options to work when you extract files from an archive, the pathname given in the include or exclude file must exactly match the pathname on the tape.

Here's a sample run. I'm extracting from a file named appe.tar. Of course, this example applies to tapes, too:

% tar tf appe.tar
appe
code/appendix/font_styles.c
code/appendix/xmemo.c
code/appendix/xshowbitmap.c
code/appendix/zcard.c
code/appendix/zcard.icon

Next, I create an exclude file, named exclude, that contains the lines:

code/appendix/zcard.c
code/appendix/zcard.icon

Now, I run the following tar command:

% tar xvfX appe.tar exclude
x appe, 6421 bytes, 13 tape blocks
x code/appendix/font_styles.c, 3457 bytes, 7 tape blocks
x code/appendix/xmemo.c, 10920 bytes, 22 tape blocks
x code/appendix/xshowbitmap.c, 20906 bytes, 41 tape blocks
code/appendix/zcard.c excluded
code/appendix/zcard.icon excluded
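
The same kind of run can be rehearsed on a small archive. This sketch assumes GNU tar or bsdtar, where the exclude file is given with -X; the archive contents are made up, and files are extracted into a separate scratch directory so nothing is overwritten.

```shell
# Build a small hypothetical archive.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p code/appendix out
: > code/appendix/xmemo.c
: > code/appendix/zcard.c
tar -cf appe.tar code

# The entry must match the member name exactly as tar tf prints it.
printf '%s\n' 'code/appendix/zcard.c' > exclude

# Extract into out/, skipping the names listed in exclude.
tar -xf "$work/appe.tar" -X "$work/exclude" -C "$work/out"

ls -R "$work/out"
```

A pathname written as ./code/appendix/zcard.c or with a different leading directory may not match the member name, so it is worth copying names from the tar tf output.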

20.8.3 Exclude the Archive File!

If you're archiving the current directory (.) instead of starting at a subdirectory, remember to start with two pathnames in the Exclude file: the archive that tar creates and the Exclude file itself. That keeps tar from trying to archive its own output!

% cat > Exclude
./somedir.tar
./Exclude
[CTRL-d]
% find . -type f -print | \
egrep '/,|%$|~$|\.old$|SCCS|/core$|\.o$|\.orig$' >>Exclude
% tar cvfX somedir.tar Exclude .

In that example, we used cat > (25.2) to create the file quickly; you could use a text editor instead. Notice that the pathnames in the Exclude file start with ./; that's what the tar command expects when you tell it to archive the current directory (.). The long find/egrep command line uses the >> operator (13.1) to add other pathnames to the end of the Exclude file.

Or, instead of adding the archive and exclude file's pathnames to the exclude file, you can move those two files somewhere out of the directory tree that tar will read.
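
That second approach might look like the sketch below, which keeps both the archive and the exclude file in the parent of the directory being archived. It assumes GNU tar or bsdtar (-X given as a separate option), and the directory and file names are hypothetical.

```shell
# Hypothetical directory to archive, inside a scratch area.
work=$(mktemp -d)
mkdir -p "$work/somedir"
: > "$work/somedir/main.c"
: > "$work/somedir/main.o"

cd "$work/somedir" || exit 1
# Write the exclude list outside the tree that tar will read.
find . -type f -print |
    grep -E '/,|%$|~$|\.old$|SCCS|/core$|\.o$|\.orig$' > "$work/Exclude"

# Write the archive outside the tree, too.
tar -cf "$work/somedir.tar" -X "$work/Exclude" .

listing=$(tar -tf "$work/somedir.tar")
printf '%s\n' "$listing"
```

Because neither file lives under the current directory, there is nothing extra to add to the exclude list.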

- BB, TOR

