Solaris – Everything you need to know before you run FSCK

Whenever a file system develops inconsistencies because of unexpected problems, its integrity needs to be verified. UNIX provides a sophisticated file system integrity checker, the fsck utility, which attempts to verify that all the links and blocks are correctly tied together.
This post covers the things you need to know before running it on a working machine.

:::: When does fsck need to be run?

Normally, fsck is called for during system boot, when the system tries to mount a file system that is dirty. However, you may also need to run fsck manually, for example to fix problems found while mounting a file system by hand.

::: How to run fsck?

To run fsck, the file system being checked must be unmounted. The best way to run fsck is from single-user mode. In single-user mode, all the file systems can be checked, even if they are marked clean, by forcing the check with the f suboption:
# fsck -o f /dev/rdsk/c0t0d0s0
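Because fsck should not be pointed at a mounted file system, a small guard can help. The following is a minimal sketch, assuming a POSIX shell such as ksh or bash and the Solaris mount table /etc/mnttab, whose first field is the mounted block device. The function names (rdsk_to_dsk, is_mounted, safe_fsck) and the MNTTAB and DRY_RUN variables are hypothetical, introduced only for illustration; -o f is the fsck_ufs suboption that forces a check even when the clean flag is set. On older Solaris releases, nawk or /usr/xpg4/bin/awk may be needed for awk -v.

```shell
#!/bin/sh
# Hypothetical guard around fsck: refuse to check a raw device whose block
# device is listed in the mount table. MNTTAB defaults to the Solaris
# /etc/mnttab; DRY_RUN=1 prints the fsck command instead of running it.

# Map a raw device to its block device: /dev/rdsk/c0t0d0s7 -> /dev/dsk/c0t0d0s7
rdsk_to_dsk() {
    echo "$1" | sed 's#/rdsk/#/dsk/#'
}

# Exit 0 if block device $1 is the first field of some mount table line.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${MNTTAB:-/etc/mnttab}"
}

safe_fsck() {
    raw=$1
    blk=$(rdsk_to_dsk "$raw")
    if is_mounted "$blk"; then
        echo "refusing: $blk is mounted, unmount it first" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "fsck -o f $raw"
    else
        fsck -o f "$raw"    # -o f: force a check even if the clean flag is set
    fi
}
```

For example, DRY_RUN=1 safe_fsck /dev/rdsk/c0t0d0s7 prints the command it would run. In single-user mode the root slice stays mounted, so this guard would refuse /dev/rdsk/c0t0d0s0; it is meant for non-root file systems.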

::: Which components does fsck check for consistency?

fsck applies consistency checks to these UFS file system components: the superblock, cylinder group blocks, inodes, indirect blocks, and data blocks.

::: Checking UFS file systems

fsck checks UFS file systems in five phases and can correct most errors automatically. Serious errors reported at the very beginning, especially before the start of Phase 1 is reported, indicate an invalid superblock.
In that case, terminate fsck and restart it with one of the alternate superblocks. On Solaris, the alternate superblock is given with the UFS-specific suboption -o b=n (for example, fsck -F ufs -o b=32 /dev/rdsk/c0t0d0s7). Block 32 is always an alternate and can be tried first, but if the front of the file system was overwritten, it may also be damaged.
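If block 32 is also damaged, newfs -N prints the layout a file system would get, including its list of superblock backups, without creating anything; the locations are only correct if you give it the same parameters the file system was originally created with. The sketch below assumes the newfs -N output contains a line with "super-block backups" followed by comma-separated block numbers, so verify the format on your release. list_alt_superblocks and fsck_alt_sb are hypothetical helper names, and DRY_RUN=1 prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helpers for recovering from a bad primary superblock.

# Read `newfs -N` output on stdin and print one alternate superblock number
# per line. Assumes the numbers follow a line containing "super-block backups".
list_alt_superblocks() {
    awk 'seen {
             n = split($0, f, /[ ,\t]+/)
             for (i = 1; i <= n; i++) if (f[i] ~ /^[0-9]+$/) print f[i]
         }
         /super-block backups/ { seen = 1 }'
}

# Check raw device $1 using alternate superblock $2 (default 32).
fsck_alt_sb() {
    cmd="fsck -F ufs -o b=${2:-32} $1"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$cmd"
    else
        $cmd    # unquoted on purpose so the words split into arguments
    fi
}
```

Typical use would be newfs -N /dev/rdsk/c0t0d0s7 | list_alt_superblocks, then fsck_alt_sb /dev/rdsk/c0t0d0s7 with one of the listed numbers.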
What is the superblock?
The superblock stores summary information about the file system and is the most commonly corrupted component of a UFS file system. Every change to the file system affects the superblock, which is cached in RAM and written to disk periodically, at the sync interval. If the CPU is halted before the last changes have been synced to disk, the on-disk superblock almost certainly becomes corrupted.
The five phases of fsck are described below.
Phase 1: Check Blocks and Sizes
Free blocks are stored in the cylinder group block maps. fsck checks that none of the blocks marked as free are claimed by any file. When all the blocks have been accounted for, fsck checks that the number of free blocks plus the number of blocks claimed by inodes equals the total number of blocks in the file system. The file system size and layout information are the most critical pieces of information for fsck. There is no way to verify these sizes directly, because they are statically determined when the file system is created, but fsck can check that they are within reasonable bounds. All other file system checks require these sizes to be correct. If fsck detects corruption in the static parameters of the primary superblock, it asks the operator to specify the location of an alternate superblock.
Phase 1 also checks the inode list, looking at individual inode entries. Inodes are checked sequentially, starting with inode 2 (inodes 0 and 1 are reserved). Each inode is checked for inconsistencies in format and type, link count, duplicate blocks, bad block numbers, and inode size. Messages that require an answer include:
  • UNKNOWN FILE TYPE I=inode number (CLEAR)
The file type bits in the inode are invalid. You can either leave the problem and try to recover the data by hand later, or erase the entry and its data by clearing the inode.
  • PARTIALLY TRUNCATED INODE I=inode number (SALVAGE)
The inode appears to point to less data than the file does. This is safe to salvage, because it indicates a crash while the file was being truncated to shorten it.
  • block BAD I=inode number
  • block DUP I=inode number
The disk block pointed to by the inode is either out of range for this inode or already in use by another file. These are informational messages.
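To make the BAD and DUP checks concrete, here is a toy model of the idea, not the actual fsck code: it reads hypothetical "inode block" pairs, reports any block number outside the file system as BAD, and reports any block already claimed by a different inode as DUP. The function name phase1_blocks and the input format are assumptions made for illustration.

```shell
#!/bin/sh
# Toy model of the Phase 1 block checks (illustration only, not fsck itself).
# stdin: one "inode block" pair per line; $1: number of blocks in the file system.
phase1_blocks() {
    awk -v size="$1" '
        BEGIN { max = size + 0 }
        $2 < 0 || $2 >= max {                       # outside the file system
            print $2 " BAD I=" $1; next
        }
        ($2 in owner) && owner[$2] != $1 {          # already claimed elsewhere
            print $2 " DUP I=" $1; next
        }
        { owner[$2] = $1 }                          # first claim wins
    '
}
```

For example, printf '2 100\n3 101\n4 100\n5 999\n' | phase1_blocks 500 prints 100 DUP I=4 and then 999 BAD I=5.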
Phase 2: Check Pathnames
This phase removes directory entries that point to bad inodes found in Phase 1, and checks for directories with inode pointers that are out of range or that point to bad inodes.
The following message can appear when there is major damage to the inode table:
  • ROOT INODE NOT DIRECTORY (FIX?)
The following messages appear when a bad inode number is found, when an unallocated inode is used in a directory, or when an inode that has a bad or duplicate block number is referenced:
  • I OUT OF RANGE I=inode number NAME=file name(REMOVE?)
  • UNALLOCATED I=inode number OWNER=0 MODE=M SIZE=S MTIME=T TYPE=F (REMOVE?)
  • BAD/DUP I=inode number OWNER=0 MODE=M SIZE=S MTIME=T TYPE=F (REMOVE?)
 
You can either remove the entry, losing the data, or leave the error. If you leave it, the file system is still damaged, but you have a chance to dump the file first and salvage part of the data before rerunning fsck to remove the entry.
The following errors can be corrected with little chance of further damage:
  • Various directory length errors: zero length, too short, not a multiple of the block size, or corrupted
fsck asks for confirmation to “fix” or “remove” the directory, as appropriate.
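The Phase 2 directory-entry checks can be modeled the same way. In this toy sketch (illustration only, not fsck itself), the first argument is the highest valid inode number, the second is a file listing allocated inodes one per line, and standard input holds hypothetical "name inode" directory entries. Entries with inode 0 are treated as empty directory slots, as in UFS; phase2_entries and the input format are assumptions.

```shell
#!/bin/sh
# Toy model of the Phase 2 directory entry checks (illustration only).
# $1: highest valid inode number; $2: file of allocated inodes, one per line;
# stdin: "name inode" directory entries.
phase2_entries() {
    awk -v maxino="$1" '
        NR == FNR           { alloc[$1] = 1; next }     # first file: allocated inodes
        $2 == 0             { next }                    # inode 0 marks an empty slot
        $2 + 0 > maxino + 0 { print "I OUT OF RANGE I=" $2 " NAME=" $1; next }
        !($2 in alloc)      { print "UNALLOCATED I=" $2 " NAME=" $1 }
    ' "$2" -
}
```

Given allocated inodes 2, 5 and 7 and a limit of 64, the entries "b 6" and "d 90" are reported as UNALLOCATED and I OUT OF RANGE respectively.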
Phase 3: Check Connectivity
In this phase, fsck checks the general connectivity of the file system and detects unreferenced directories. If it finds a directory that is not linked into the file system, it links that directory into the file system's lost+found directory. This condition can occur when inodes are written to the file system but the corresponding directory data blocks are not. fsck prints a status message for every directory placed in lost+found.
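Phase 3 is essentially a reachability check starting from the root directory, inode 2. This toy sketch (illustration only, made-up input format) reads "D inode" lines for each allocated directory and "E parent child" lines for each directory entry that names a subdirectory, walks the tree breadth-first from inode 2, and prints every directory it cannot reach. phase3_orphans is a hypothetical name.

```shell
#!/bin/sh
# Toy model of the Phase 3 connectivity check (illustration only).
# stdin: "D ino" for each allocated directory, "E parent child" for each
# directory entry that names a subdirectory. Prints unreachable directories.
phase3_orphans() {
    awk '
        $1 == "D" { dir[$2] = 1 }
        $1 == "E" { kids[$2] = kids[$2] " " $3 }
        END {
            reach[2] = 1; queue[1] = 2; head = 1; tail = 1   # start at the root, inode 2
            while (head <= tail) {                           # breadth-first walk
                n = split(kids[queue[head++]], k, " ")
                for (i = 1; i <= n; i++)
                    if (!(k[i] in reach)) { reach[k[i]] = 1; queue[++tail] = k[i] }
            }
            for (d in dir) if (!(d in reach)) print d        # never reached from /
        }' | sort -n
}
```

With directories 2, 5, 7 and 9 and entries 2 to 5 and 9 to 7, the sketch prints 7 and 9: neither can be reached from the root.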
Phase 4: Check Reference Counts
This phase uses the information from Phases 2 and 3 to check for unreferenced files and incorrect link counts on files, directories, and special files.
When a file's name is not known (it is an unreferenced file), fsck asks whether to reconnect it into the lost+found directory, using its inode number as its name. If you clear the file instead, its contents are lost. Unreferenced files that are empty are cleared automatically.
  • UNREF FILE I=inode number OWNER=0 MODE=M SIZE=S MTIME=T (RECONNECT?)
fsck prompts as follows whenever it finds a file or directory whose number of references differs from the link count recorded in its inode. You should let fsck adjust the count.
  • LINK COUNT FILE I=inode number OWNER=0 MODE=M SIZE=S MTIME=T COUNT=X (ADJUST?)
  • LINK COUNT DIR I=inode number OWNER=0 MODE=M SIZE=S MTIME=T COUNT=X (ADJUST?)
fsck also prompts as follows when a file or directory contains a bad or duplicate block. If you clear it now, the data is lost. Alternatively, you can leave the error, attempt to recover the data, and rerun fsck later to clear the file.
  • BAD/DUP FILE I=inode number OWNER=0 MODE=M SIZE=S MTIME=T (CLEAR)
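Phase 4 boils down to comparing the number of directory entries that reference each inode with the link count stored in the inode. In this toy sketch (illustration only, made-up input format), "L inode count" lines give the stored link counts and each "R inode" line is one directory entry referencing that inode; for a directory, its own "." entry and each subdirectory's ".." entry would also have to be included as references. phase4_counts is a hypothetical name.

```shell
#!/bin/sh
# Toy model of the Phase 4 reference count check (illustration only).
# stdin: "L ino count" (link count stored in the inode) and "R ino"
# (one line per directory entry that references the inode).
phase4_counts() {
    awk '
        $1 == "L" { stored[$2] = $3 }
        $1 == "R" { refs[$2]++ }
        END {
            for (i in stored) {
                if (!(i in refs))
                    print "UNREF I=" i                         # no entry names it
                else if (refs[i] != stored[i])
                    print "LINK COUNT I=" i " COUNT=" stored[i] " REFS=" refs[i]
            }
        }' | sort
}
```

For example, an inode with a stored count of 2 but only one referencing entry is reported as LINK COUNT, and an inode with no entries at all as UNREF.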
Phase 5: Check Cylinder Groups
This phase checks the free block and unused inode maps. It will automatically correct the free lists if necessary, although in manual mode it will ask permission first.
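The free-map comparison can also be sketched as a toy (illustration only, made-up input format): "C block" lines are blocks claimed by inodes, "F block" lines are blocks marked free in the stored map, and the argument is the total number of blocks. Every block should be exactly one of the two. phase5_freemap is a hypothetical name.

```shell
#!/bin/sh
# Toy model of the Phase 5 free-map check (illustration only).
# stdin: "C blk" for each block claimed by an inode, "F blk" for each block
# marked free in the stored map; $1: total number of blocks.
phase5_freemap() {
    awk -v total="$1" '
        $1 == "C" { claimed[$2] = 1 }
        $1 == "F" { free[$2] = 1 }
        END {
            for (b = 0; b < total + 0; b++) {
                if ((b in claimed) && (b in free))          # in use, yet listed as free
                    print "block " b " marked free but in use"
                if (!(b in claimed) && !(b in free))        # unused, yet not listed as free
                    print "block " b " unused but not marked free"
            }
        }'
}
```

With five blocks, blocks 0 and 1 claimed and blocks 1 to 3 marked free, the sketch reports block 1 as marked free but in use and block 4 as unused but not marked free.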

:::: How to deal with lost+found content after running fsck

UNIX file systems are very robust. However, if fsck found major problems or made a large number of corrections, rerun it to make sure the disk is not suffering a hardware failure. If you keep a log of all the inodes fsck clears, you can list the inodes on the backup tape and restore only the files that correspond to those inodes.
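One way to keep that inode log is to save the fsck output with tee, for example fsck -y /dev/rdsk/c0t0d0s7 2>&1 | tee /var/tmp/fsck.log (the log path is only an example). The hypothetical helpers below pull every inode number reported as I=n out of such a log, which is a superset of the inodes actually cleared, and look those numbers up in a saved table-of-contents listing, for example from ufsrestore tf /dev/rmt/0 > /var/tmp/toc.txt. The sketch assumes each listing line starts with the inode number followed by the path; check the format on your system. The inode numbers only match if they have not been reused since the dump was taken.

```shell
#!/bin/sh
# Hypothetical helpers for mapping inodes reported by fsck back to backup paths.

# Print the unique inode numbers mentioned as "I=<n>" in fsck log file $1.
fsck_log_inodes() {
    sed -n 's/.*I=\([0-9][0-9]*\).*/\1/p' "$1" | sort -n -u
}

# $1: file with one inode number per line; $2: saved ufsrestore listing.
# Prints the path of every listing line whose first field is a wanted inode.
names_for_inodes() {
    awk 'NR == FNR { want[$1] = 1; next }
         ($1 in want) {
             line = $0
             sub(/^[ \t]*[0-9]+[ \t]+/, "", line)   # drop the inode column, keep spaces in names
             print line
         }' "$1" "$2"
}
```

For example, fsck_log_inodes /var/tmp/fsck.log > /var/tmp/inodes.txt followed by names_for_inodes /var/tmp/inodes.txt /var/tmp/toc.txt gives the paths to extract with ufsrestore.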
If fsck reconnected unreferenced entries, it places them in the lost+found directory.
  • Items in the lost+found directory can be of any type: files, directories, special files (devices), or fifos.
  • If an item is a fifo, you can safely delete it: the process that opened it is long gone and will open a new one when it runs again.
  • For files, contact the owner and ask them to look at the contents and decide whether the file is worth keeping.
  • For directories, the files they contain should help you and the owner determine where they belong.
  • If necessary, look in the backup tape listing for a directory with those contents, then recreate the directory and move the files back.
  • Finally, remove the entry from lost+found.
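A quick way to start this triage is to list what fsck left in lost+found together with each entry's type and owner. The sketch below assumes a POSIX shell; triage_lost_found is a hypothetical name, and /work/lost+found in the usage line is only the example file system from this post.

```shell
#!/bin/sh
# Hypothetical helper: print "type owner name" for each entry in lost+found.
triage_lost_found() {
    for f in "$1"/*; do
        [ -e "$f" ] || [ -L "$f" ] || continue      # empty directory: glob unexpanded
        if   [ -L "$f" ]; then kind=symlink
        elif [ -p "$f" ]; then kind=fifo            # usually safe to delete
        elif [ -d "$f" ]; then kind=directory
        elif [ -b "$f" ] || [ -c "$f" ]; then kind=device
        elif [ -f "$f" ]; then kind=file
        else kind=other
        fi
        owner=$(ls -ld "$f" | awk '{ print $3 }')   # owner name, or UID if unknown
        echo "$kind $owner ${f##*/}"
    done
}
```

Running triage_lost_found /work/lost+found | sort groups the entries by type, so the fifos can be removed first and the files sorted by owner.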

::: Points to remember when running fsck

Keep the following points in mind when running the fsck command to check UFS file systems:
  • A file system should be inactive when you use fsck to check it.
  • File system changes waiting to be flushed to disk, or changes that occur while fsck is checking, can be interpreted as corruption and may not reliably indicate a problem.
  • A file system must be inactive when you use fsck to repair it.
  • File system changes waiting to be flushed to disk, or changes that occur while fsck is repairing, might corrupt the file system or crash the system.
  • Unmount a file system before running fsck on it, to ensure that it is inactive and that its data structures are as consistent as possible.
  • The only exceptions are the active root (/) and /usr file systems, which must be mounted for fsck itself to run.
  • If you need to repair the root (/) or /usr file system, boot the system from an alternate device if possible, so that these file systems are unmounted and inactive.

:::: How to run fsck in multiuser mode

Before running fsck on a file system in multiuser mode, you must unmount it without affecting the OS. Common reasons why a file system cannot be unmounted are:
  • processes or users are using the file system
  • the file system is being shared
The following file systems cannot be unmounted while the OS is in multiuser mode:
/
/usr
/var
The system used in this example has the following setup:
# df -k
 Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0t0d0s0 1985487 71707 1854216 4% /
/dev/dsk/c0t0d0s6 6049806 1389769 4599539 24% /usr
/dev/dsk/c0t0d0s1 1985487 187397 1738526 10% /var
/dev/dsk/c0t0d0s5 8066141 153699 7831781 2% /opt
/dev/dsk/c0t0d0s4 4032654 9 3992319 1% /test
/dev/dsk/c0t0d0s7 10082476 38393 9943259 1% /work
The df -k output shows three file systems that can be unmounted without affecting the OS:
/dev/dsk/c0t0d0s5 8066141 153699 7831781 2% /opt
/dev/dsk/c0t0d0s4 4032654 9 3992319 1% /test
/dev/dsk/c0t0d0s7 10082476 38393 9943259 1% /work
Even so, unmounting /opt (/dev/dsk/c0t0d0s5) could be a problem: /opt is used heavily by third-party software, and some Sun software also runs from it.
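Picking those candidates out of df -k can be scripted. This sketch prints every mount point from df -k output except /, /usr and /var; unmount_candidates is a hypothetical name, it assumes one line per file system (no wrapped device names), and the result still needs the kind of judgment described for /opt.

```shell
#!/bin/sh
# Hypothetical filter: read `df -k` output on stdin, skip the header line,
# and print mount points other than /, /usr and /var.
unmount_candidates() {
    awk 'NR > 1 && $NF != "/" && $NF != "/usr" && $NF != "/var" { print $NF }'
}
```

On the system shown above, df -k | unmount_candidates prints /opt, /test and /work.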
To run fsck on a file system, you must unmount it first:
# umount /work
umount: /work busy
As shown above, the attempt to unmount /work fails because the file system is busy. Use the fuser command, with or without the -c option, to see which processes are keeping it busy:
# fuser -c /work
/work: 29916c 9188c
# fuser /work
/work: 29917c 9188c
# ps -ef | grep 29916
root 29918 9188 0 15:51:21 pts/13 0:00 grep 29916
# ps -ef | grep 9188
root 29920 9188 0 15:51:42 pts/13 0:00 grep 9188
root 9188 563 0 Mar 08 pts/13 0:00 -sh
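The ps -ef | grep approach in the transcript above is noisy, because each grep matches its own process. A slightly tidier sketch, assuming a POSIX shell, extracts the PIDs from fuser output and shows each one with ps -f -p. fuser_pids and show_users_of are hypothetical names; show_users_of discards fuser's standard error because fuser prints the PIDs on standard output and the file name and letter codes on standard error, while fuser_pids also accepts the combined form shown in the transcript.

```shell
#!/bin/sh
# Hypothetical helpers: list the processes keeping a file system busy.

# Turn fuser output such as "/work: 29916c 9188c" (or just " 29916 9188")
# into one bare PID per line, dropping the name and the letter codes.
fuser_pids() {
    sed 's/^[^:]*://' | tr ' \t' '\n\n' | sed -n 's/^\([0-9][0-9]*\).*/\1/p'
}

# Show each process using the file system mounted at $1.
# ps -f -p PID prints only that process, so nothing matches itself.
show_users_of() {
    fuser -c "$1" 2>/dev/null | fuser_pids | while read -r pid; do
        ps -f -p "$pid" | sed 1d      # drop the ps header line
    done
}
```

Processes that exit between the fuser and ps calls simply produce no output.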
In this output, 9188 is the administrator's own login shell (-sh) on pts/13, and the c suffix means /work is its current directory; process 29916 no longer exists by the time ps runs. That is why the example below runs cd / before unmounting. Once you know that no other processes or users are using the file system, you can unmount it and run fsck.
The -k option makes fuser kill the processes associated with the file system automatically (fuser -k /work). Use caution with -k, and read the fuser man page first. Understanding the file system is critical: know the system well, and take care NOT to:
  • Kill processes that are critical to the machine's functionality and original design.
  • Unmount file systems that cannot be unmounted and checked while the system is in multiuser mode.
  • Kill processes at random just so that fsck can be run on a file system.
Make sure that the file system you are unmounting is not shared over NFS. Running the share command with no arguments shows all NFS-shared file systems.
If the file system appears in the share output, unshare it before attempting the unmount.
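The share check can be scripted too. This minimal sketch assumes that, as in Solaris share output, the shared path is the second whitespace-separated column; is_shared is a hypothetical name. It reads the share listing on standard input, so it can also be tested against saved output.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if mount point $1 appears as the path column
# (field 2) of `share` output read on stdin.
is_shared() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }'
}
```

A typical use is share | is_shared /work && unshare /work, run before the umount.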
For example, the complete sequence for /work is:
# cd /
# umount /work
# fsck -y /dev/rdsk/c0t0d0s7
** /dev/rdsk/c0t0d0s7
** Last Mounted on /work
** Phase 1 – Check Blocks and Sizes
** Phase 2 – Check Pathnames
** Phase 3 – Check Connectivity
** Phase 4 – Check Reference Counts
** Phase 5 – Check Cyl groups
11 files, 38393 used, 10044083 free (3 frags, 1255510 blocks, 0.0% fragmentation)
# fsck -m /dev/rdsk/c0t0d0s7
** /dev/rdsk/c0t0d0s7
ufs fsck: sanity check: /dev/rdsk/c0t0d0s7 okay
# mount /work
  • The -y option makes fsck answer yes to every prompt, so it automatically fixes, or tries to fix, problems in the file system.
  • The -m option then checks whether the file system is clean and suitable for mounting, without making any changes.
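The numbers in this output also illustrate the Phase 1 accounting described earlier. Assuming the UFS defaults of 8 KB blocks and 1 KB fragments (8 fragments per block), the free count is 3 + 1255510 × 8 = 10044083 fragments, and used plus free is 38393 + 10044083 = 10082476, which matches the kbytes column that df -k reported for /work. The hypothetical helper below repeats that arithmetic for any summary line; the fragments-per-block value is an assumption that can be overridden with the third argument.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check the arithmetic in a UFS fsck summary line.
# $1: summary line; $2: file system size in fragments (optional);
# $3: fragments per block (assumed 8 when omitted).
check_fsck_summary() {
    echo "$1" | awk -v total="$2" -v fpb="${3:-8}" '
        {
            gsub(/[(),]/, "")        # "11 files 38393 used 10044083 free 3 frags ..."
            used = $3; free = $5; frags = $7; blocks = $9
            ok = 1
            if (frags + blocks * fpb != free) {
                print "free count " free " != " frags " + " blocks " * " fpb; ok = 0
            }
            if (total != "" && used + free != total) {
                print "used + free = " (used + free) " != " total; ok = 0
            }
            if (ok) print "consistent: " used " used + " free " free = " (used + free)
            exit !ok
        }'
}
```

For the run above, check_fsck_summary with the summary line and 10082476 prints consistent: 38393 used + 10044083 free = 10082476.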
Ramdev

I started unixadminschool.com (aka gurkulindia.com) in 2009 as my own personal reference blog. Later I realized that my learnings might be helpful for other UNIX admins if I organized my knowledge base in a more user-friendly format, and the result is today's unixadminschool.com. You can connect with me at https://www.linkedin.com/in/unixadminschool/
