UNIX Power Tools


24.16 Trimming a Huge Directory

Some implementations of the BSD fast filesystem never truncate directories. That is, when you delete a file, the filesystem marks its directory entry as "invalid," but doesn't actually delete the entry. The old entry can be re-used when someone creates a new file, but will never go away. Therefore, the directories themselves can only get larger with time. Directories usually don't occupy a huge amount of space, but searching through a large directory is noticeably slow. So you should avoid letting directories get too large.
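To judge whether a directory has grown large, look at the size of the directory file itself rather than at the files inside it. The following is only a minimal sketch, assuming a POSIX shell and an ls whose long format puts the size in bytes in the fifth field; the name dirsize is made up for illustration:

```sh
# dirsize [DIR]: print the size in bytes of the directory file itself,
# not of the files stored in it. "dirsize" is a hypothetical helper name.
dirsize() {
    # -d lists the directory entry rather than its contents; with plain
    # -l (no -g or -o) the fifth field of the long format is the size.
    ls -ld -- "${1:-.}" | awk '{ print $5 }'
}
```

Running dirsize on a directory before and after trimming it shows whether the trimming had any effect.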

On many UNIX systems, the only way to "shrink a directory" is to move all of its files somewhere else and then remove it; for example:


.[^A--/-^?] 

ls -lgd old        # Get old owner, group, and mode
mkdir new; chown user new; chgrp group new; chmod mode new
mv old/.??* old/.[^A--/-^?] old/* new        # ^A and ^? are CTRL-a and DEL
rmdir old
mv new old

This method also works on V7-ish filesystems. It cannot be applied to the root of a filesystem.
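The move step of this first method can also be written as a loop. This is a sketch under stated assumptions, not a replacement for the commands above: the function name rebuild_dir and the temporary name old.new.PID are invented here, the argument must not end in a slash, and ownership and mode are not copied, so the ls, chown, chgrp, and chmod steps still have to be done by hand. It uses the POSIX shell patterns .[!.]* and ..?* in place of the control-character range.

```sh
# rebuild_dir OLD: recreate directory OLD by moving every entry into a
# fresh directory and renaming that directory to OLD.
# Hypothetical helper; does NOT restore owner, group, or mode.
rebuild_dir() {
    old=$1
    new=$old.new.$$        # assumed to be an unused name on the same filesystem
    mkdir "$new" || return 1
    # Three patterns cover every name except . and ..:
    #   *        names that do not start with a dot
    #   .[!.]*   one dot followed by a non-dot character
    #   ..?*     two dots followed by at least one more character
    for f in "$old"/* "$old"/.[!.]* "$old"/..?*
    do
        # A pattern that matched nothing is left as literal text; skip it.
        [ -e "$f" ] || [ -L "$f" ] || continue
        mv "$f" "$new"/ || return 1
    done
    rmdir "$old" && mv "$new" "$old"
}
```

Because for is built into the shell, the names are never passed on one command line, so a very large directory does not run into the "argument list too long" limit that a single mv old/* new can hit.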

Other implementations of the BSD fast filesystem do truncate directories. They do this after a complete scan of the directory has shown that some number of trailing fragments are empty. Complete scans are forced for any operation that places a new name into the directory - such as creat(2) or link(2). In addition, new names are always placed in the earliest possible free slot. Hence, on these systems there is another way to shrink a directory. [How do you know if your BSD filesystem truncates directories? Try the pseudo-code below (but use actual commands), and see if it has an effect. -ML ]

while (the directory can be shrunk) {
    mv (file in last slot) (some short name)
    mv (the short name) (original name)
}

This works on the root of a filesystem as well as subdirectories.
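One pass of that loop, written with actual commands, might look like the following. It is a sketch only: the name shuffle_last is invented for illustration, it assumes an ls that supports -f, and it treats the last name printed by ls -f as "the file in the last slot".

```sh
# shuffle_last DIR [TMP]: rename the entry that `ls -f` lists last to the
# short name TMP (default "_") and then back to its original name.
# Hypothetical helper; the body runs in a subshell so the caller's
# working directory is not changed.
shuffle_last() (
    cd "$1" || exit 1
    tmp=${2:-_}
    # Refuse to run if the short name is already in use.
    if [ -e "$tmp" ] || [ -L "$tmp" ]; then
        echo "shuffle_last: $tmp already exists" >&2
        exit 1
    fi
    # ls -f prints entries in directory order. Drop . and .. (matched as
    # fixed, whole-line strings) and keep the final name.
    last=$(ls -f | grep -v -x -F -e . -e .. | tail -n 1)
    [ -n "$last" ] || exit 0        # directory is empty; nothing to move
    # The ./ prefix protects names that begin with a dash.
    mv "./$last" "./$tmp" && mv "./$tmp" "./$last"
)
```

Compare the output of ls -ld on the directory before and after calling this a number of times. If the size never drops even though the directory has many free slots, the filesystem probably does not truncate directories.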

Neither method should be used if some external agent (for example, a daemon) is busy looking at the directory. The first method will also fail if the external agent is quiet but will resume and hold the existing directory open (for example, a daemon program, like sendmail, that rescans the directory, but which is currently stopped or idle). The second method requires knowing a "safe" short name - i.e., a name that doesn't duplicate any other name in the directory.
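Finding a "safe" short name can be automated by testing a few candidates. A minimal sketch, assuming nothing else is creating files in the directory at the same time (otherwise the check and the later mv can race); the function name pick_safe_name and the candidate list are made up for illustration:

```sh
# pick_safe_name DIR: print a short name that does not currently exist
# in DIR, or fail if every candidate is taken. Hypothetical helper.
pick_safe_name() {
    for c in _ _0 _1 _2 _3 _4 _5 _6 _7 _8 _9
    do
        # -L also catches a dangling symbolic link, which -e misses.
        if [ ! -e "$1/$c" ] && [ ! -L "$1/$c" ]; then
            printf '%s\n' "$c"
            return 0
        fi
    done
    return 1        # no candidate is free; choose a name by hand
}
```

For example, safe=`pick_safe_name .` stores a name that can be used in place of the fixed underscore.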

I have found the second method useful enough to write a shell script to do the job. I call the script squoze:

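#! /bin/sh
#
# squoze - shrink the current directory by renaming its trailing entries
last=
ls -ldg                 # show the directory's size before
# Split command output on newlines only, so names with spaces survive
IFS='
'
while :
do
   # The last ten directory entries, final slot first
   set `ls -f | tail -10r`
   for i do
      # Stop on reaching the entry the previous pass ended on, or . or ..
      case "$i" in "$last"|.|..) break 2;; esac
      # _ (underscore) is the "safe, short" filename
      /bin/mv -i "$i" _ && /bin/mv _ "$i"
   done
   last="$i"
done
ls -ldg                 # show the directory's size after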

[The ls -f option lists entries in the order they appear in the directory; it doesn't sort. -JP ] This script does not handle filenames with embedded newlines. It is, however, safe to apply to a sendmail queue while sendmail is stopped.

- CT in comp.unix.admin on Usenet, 22 August 1991

