Salvaging Volumes

An unexpected interruption while the Volume Server or File Server is manipulating the data in a volume can leave the volume in an intermediate, corrupted state, rather than just creating a discrepancy between the information in the VLDB and volume headers. For example, the failure of an operation that saves changes to a file (by overwriting old data with new) can leave the old and new data mixed together on the disk.

If an operation halts because the Volume Server or File Server exits unexpectedly, the BOS Server automatically shuts down all components of the fs process and invokes the Salvager. The Salvager checks for and repairs any inconsistencies it can. Sometimes, however, there are symptoms of the following sort, which indicate corruption serious enough to create problems but not serious enough to cause the File Server component to fail. In these cases you can invoke the Salvager yourself by issuing the bos salvage command.

When you notice symptoms such as these, use the bos salvage command to invoke the Salvager before corruption spreads. (Even though it operates on volumes, the command belongs to the bos suite because the BOS Server must coordinate the shutdown and restart of the Volume Server and File Server with the Salvager. It shuts them down before the Salvager starts, and automatically restarts them when the salvage operation finishes.)

All of the AFS data stored on a file server machine is inaccessible during the salvage of one or more partitions. If you salvage just one volume, it alone is inaccessible.

When processing one or more partitions, the command restores consistency to corrupted read/write volumes where possible. For read-only or backup volumes, it inspects only the volume header:

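Combine the bos salvage command's arguments as indicated to salvage different numbers of volumes:

• To salvage all volumes on a file server machine, combine the -server argument and the -all flag.

• To salvage all volumes on one partition, combine the -server and -partition arguments.

• To salvage only one read/write volume, combine the -server, -partition, and -volume arguments.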
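The three salvage scopes (a whole machine, one partition, one read/write volume) map onto arguments as in the following sketch. salvage_cmd is a hypothetical helper, not part of AFS. It only prints the command line so that it can be reviewed first. The machine name fs1, partition /vicepa, and volume user.pat are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a bos salvage command for one scope.
#   salvage_cmd SERVER all
#   salvage_cmd SERVER partition PARTITION
#   salvage_cmd SERVER volume PARTITION VOLUME
salvage_cmd() {
    server=$1
    scope=$2
    case $scope in
        all)       echo "bos salvage -server $server -all" ;;
        partition) echo "bos salvage -server $server -partition $3" ;;
        volume)    echo "bos salvage -server $server -partition $3 -volume $4" ;;
        *)
            echo "usage: salvage_cmd SERVER all|partition PART|volume PART VOL" >&2
            return 1
            ;;
    esac
}
```

For example, `salvage_cmd fs1 volume /vicepa user.pat` prints the single-volume form. After checking it, you can pipe that output to `sh` to run it.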

The Salvager always writes a trace to the /usr/afs/logs/SalvageLog file on the file server machine where it runs. To record the trace in another file as well (either in AFS or on the local disk of the machine where you issue the bos salvage command), name the file with the -file argument. Or, to display the trace on the standard output stream as it is written to the /usr/afs/logs/SalvageLog file, include the -showlog flag.
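The -file argument should not be combined with the -showlog flag, so a script that builds the trace options might reject that pairing up front. The following is a minimal sketch with a hypothetical helper named trace_opts. It prints at most one of the two options.

```shell
#!/bin/sh
# Hypothetical helper: print the trace option for bos salvage.
#   $1 = pathname for an extra copy of the trace (empty for none)
#   $2 = "yes" to request -showlog instead
trace_opts() {
    file=$1
    show=$2
    if [ -n "$file" ] && [ "$show" = yes ]; then
        echo "trace_opts: do not combine -file with -showlog" >&2
        return 1
    fi
    if [ -n "$file" ]; then
        printf '%s\n' "-file $file"
    elif [ "$show" = yes ]; then
        printf '%s\n' "-showlog"
    fi
}
```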

By default, multiple Salvager subprocesses run in parallel: one for each partition up to four, and four subprocesses for four or more partitions. To increase or decrease the number of subprocesses running in parallel, provide a positive integer value for the -parallel argument.
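The default can be stated as "the smaller of the partition count and four". An explicit integer for -parallel replaces the four. The BOS Server also never starts more subprocesses than there are partitions. The following is a small sketch of that arithmetic. expected_subprocs is a hypothetical name, and it covers only the plain integer form of -parallel.

```shell
#!/bin/sh
# Hypothetical helper: expected number of parallel Salvager subprocesses.
#   $1 = number of partitions being salvaged
#   $2 = optional integer given to -parallel (defaults to 4)
expected_subprocs() {
    parts=$1
    limit=${2:-4}
    # Never more subprocesses than partitions.
    if [ "$parts" -lt "$limit" ]; then
        echo "$parts"
    else
        echo "$limit"
    fi
}
```

With this sketch, two partitions give 2, six partitions give 4, and six partitions with `-parallel 3` give 3.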

If there is more than one server partition on a physical disk, the Salvager by default salvages them serially to avoid the inefficiency of constantly moving the disk head from one partition to another. However, this strategy is often not ideal if the partitions are configured as logical volumes that span multiple disks. To force the Salvager to salvage logical volumes in parallel, provide the string all as the value for the -parallel argument. To specify the number of subprocesses to run in parallel, append a positive integer to the string with no intervening space (for example, -parallel all5 for five subprocesses). Omit the integer to run up to four subprocesses, depending on the number of logical volumes being salvaged.
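The accepted -parallel values are an integer from 1 to 32, the string all, or all followed immediately by such an integer. The following sketch validates a value against those rules before it is passed to bos salvage. parse_parallel is a hypothetical helper. It prints "disk N" for the default per-disk behavior or "all N" for the logical-volume behavior, where N is the subprocess limit.

```shell
#!/bin/sh
# Hypothetical helper: validate a -parallel value and print "MODE COUNT".
#   MODE is "disk" (partitions on one disk salvaged serially) or "all".
parse_parallel() {
    value=$1
    mode=disk
    case $value in
        all*) mode=all; value=${value#all} ;;   # strip the "all" prefix
    esac
    if [ -z "$value" ]; then
        count=4                                 # no integer: up to four
    else
        case $value in
            *[!0-9]*) echo "invalid -parallel value: $1" >&2; return 1 ;;
        esac
        if [ "$value" -lt 1 ] || [ "$value" -gt 32 ]; then
            echo "-parallel count out of range 1-32: $1" >&2
            return 1
        fi
        count=$value
    fi
    echo "$mode $count"
}
```

Under this sketch, `all5` yields `all 5`, while `5all` is rejected because the integer must follow the string.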

The Salvager creates temporary files as it runs, by default writing them to the partition it is salvaging. The number of files can be quite large, and if the partition is too full to accommodate them, the Salvager terminates without completing the salvage operation (it always removes the temporary files before exiting). Other Salvager subprocesses running at the same time continue until they finish salvaging all other partitions where there is enough disk space for temporary files. To complete the interrupted salvage, reissue the command against the appropriate partitions, adding the -tmpdir argument to redirect the temporary files to a local disk directory that has enough space.
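Before you reissue the command with -tmpdir, it can help to confirm that the chosen directory has room. The text does not say how much space the temporary files need. For that reason, the threshold in the following sketch is a caller-supplied guess. retry_with_tmpdir is a hypothetical helper that checks free space with `df -Pk` and prints the command to rerun against the interrupted partition.

```shell
#!/bin/sh
# Print the free space, in kilobytes, on the file system holding directory $1.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical helper: print a bos salvage command that sends temporary files
# to $3, but only if $3 has at least $4 KB free (the threshold is a guess).
retry_with_tmpdir() {
    server=$1
    part=$2
    dir=$3
    need_kb=$4
    avail=$(free_kb "$dir")
    if [ -z "$avail" ]; then
        echo "cannot determine free space in $dir" >&2
        return 1
    fi
    if [ "$avail" -lt "$need_kb" ]; then
        echo "only ${avail} KB free in $dir" >&2
        return 1
    fi
    echo "bos salvage -server $server -partition $part -tmpdir $dir"
}
```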

The -orphans argument controls how the Salvager handles orphaned files and directories that it finds on server partitions it is salvaging. An orphaned element is completely inaccessible because it is not referenced by the vnode of any directory that can act as its parent (is higher in the filespace). Orphaned objects occupy space on the server partition, but do not count against the volume's quota.

During the salvage, the output of the bos status command reports the following auxiliary status for the fs process:

   Salvaging file system
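Because this auxiliary status disappears when the salvage finishes, a script can poll for it. The following sketch assumes that `bos status -server <machine name> -instance fs` prints the auxiliary status text. The match ignores case because the exact capitalization may differ between releases. wait_for_salvage is a hypothetical name, and the 30-second default interval is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: block until the fs process on $1 no longer reports
# that it is salvaging. $2 = seconds between checks (default 30).
wait_for_salvage() {
    server=$1
    interval=${2:-30}
    while bos status -server "$server" -instance fs | grep -qi 'salvaging file system'; do
        sleep "$interval"
    done
}
```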

To salvage volumes

  1. Verify that you are listed in the /usr/afs/etc/UserList file. If necessary, issue the bos listusers command, which is fully described in the section To display the users in the UserList file.

       % bos listusers <machine name>
    
  2. Issue the bos salvage command to salvage one or more volumes.

       % bos salvage  -server <machine name>  [-partition <salvage partition>]  \
                      [-volume <salvage volume number or volume name>]  \
                      [-file <salvage log output file>]  [-all]  [-showlog]  \
                      [-parallel <# of max parallel partition salvaging>]  \
                      [-tmpdir <directory to place tmp files>]  \
                      [-orphans <ignore | remove | attach>]
    

    where

    -server

    Names the file server machine on which to salvage volumes. Combine this argument with the -all flag, with the -partition argument, or with both the -partition and -volume arguments.

    -partition

    Names a single partition on which to salvage all volumes. The -server argument must be provided along with this one.

    -volume

    Specifies the name or volume ID number of one read/write volume to salvage. Combine this argument with the -server and -partition arguments.

    -file

    Specifies the complete pathname of a file into which to write a trace of the salvage operation, in addition to the /usr/afs/logs/SalvageLog file on the server machine. If the file pathname is local, the trace is written to the specified file on the local disk of the machine where the bos salvage command is issued. If the -volume argument is included, the file can be in AFS, though not in the volume being salvaged. Do not combine this argument with the -showlog flag.

    -all

    Salvages all volumes on all of the partitions on the machine named by the -server argument.

    -showlog

    Displays the trace of the salvage operation on the standard output stream, as well as writing it to the /usr/afs/logs/SalvageLog file.

    -parallel

    Specifies the maximum number of Salvager subprocesses to run in parallel. Provide one of three values:

    • An integer from the range 1 to 32. A value of 1 means that a single Salvager process salvages the partitions sequentially.

    • The string all to run up to four Salvager subprocesses in parallel on partitions formatted as logical volumes that span multiple physical disks. Use this value only with such logical volumes.

    • The string all followed immediately (with no intervening space) by an integer from the range 1 to 32, to run the specified number of Salvager subprocesses in parallel on partitions formatted as logical volumes. Use this value only with such logical volumes.

    The BOS Server never starts more Salvager subprocesses than there are partitions, and always starts only one process to salvage a single volume. If this argument is omitted, up to four Salvager subprocesses run in parallel.

    -tmpdir

    Specifies the full pathname of a local disk directory to which the Salvager process writes temporary files as it runs. By default, it writes them to the partition it is currently salvaging.

    -orphans

    Controls how the Salvager handles orphaned files and directories. Choose one of the following three values:

    ignore

    Leaves the orphaned objects on the disk, but prints a message to the /usr/afs/logs/SalvageLog file reporting how many orphans were found and the approximate number of kilobytes they are consuming. This is the default if you omit the -orphans argument.

    remove

    Removes the orphaned objects, and prints a message to the /usr/afs/logs/SalvageLog file reporting how many orphans were removed and the approximate number of kilobytes they were consuming.

    attach

    Attaches the orphaned objects by creating a reference to them in the vnode of the volume's root directory. Since each object's actual name is now lost, the Salvager assigns each one a name of the following form:

    __ORPHANFILE__.index for files
    __ORPHANDIR__.index for directories

    where index is a two-digit number that uniquely identifies each object. The orphans are charged against the volume's quota and appear in the output of the ls command issued against the volume's root directory.
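The two steps of the procedure can be combined into one guarded script, as in the following sketch. It assumes that `bos listusers` prints a line of the form `SUsers are: name1 name2 ...`. Check that assumption against your release. in_userlist and salvage_volume are hypothetical helpers. You pass in the AFS administrator name, because the UserList holds AFS identities, which are not necessarily the local login name.

```shell
#!/bin/sh
# Step 1: succeed if AFS user $2 appears in the UserList on server $1.
# Assumes output of the form "SUsers are: name1 name2 ...".
in_userlist() {
    bos listusers -server "$1" | sed 's/^SUsers are://' | tr ' ' '\n' | grep -qx -- "$2"
}

# Step 2: salvage one read/write volume.
#   $1 = server, $2 = partition, $3 = volume, $4 = AFS admin name,
#   $5 = -orphans value (ignore, remove, or attach; default ignore)
salvage_volume() {
    server=$1
    partition=$2
    volume=$3
    admin=$4
    orphans=${5:-ignore}
    case $orphans in
        ignore|remove|attach) ;;
        *) echo "salvage_volume: bad -orphans value: $orphans" >&2; return 2 ;;
    esac
    if ! in_userlist "$server" "$admin"; then
        echo "salvage_volume: $admin is not in the UserList on $server" >&2
        return 1
    fi
    bos salvage -server "$server" -partition "$partition" -volume "$volume" \
        -orphans "$orphans"
}
```

For example, `salvage_volume fs1 /vicepa user.pat admin attach` salvages the placeholder volume user.pat and attaches any orphans. The __ORPHANFILE__ and __ORPHANDIR__ entries then appear when you list the volume's root directory with ls.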