Linux : Identifying Disk Errors in Linux
The following procedure can be used to identify hard disk errors on a Linux system.
Execute the following command to list the disks currently attached to the system:
# /bin/more /proc/scsi/scsi
The output lists every disk currently visible to Linux. It varies with platform model and configuration, but will be similar to the following:
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: AMI Model: Virtual CDROM Rev: 1.00
Type: CD-ROM ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: AMI Model: Virtual Floppy Rev: 1.00
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi2 Channel: 00 Id: 02 Lun: 00
Vendor: SEAGATE Model: ST973401LSUN72G Rev: 0556
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 03 Lun: 00
Vendor: SEAGATE Model: ST973401LSUN72G Rev: 0556
Type: Direct-Access ANSI SCSI revision: 03
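On a system with many devices it can help to condense this listing to one line per device. The following is a minimal sketch, not a standard tool: the function name scsi_summary is made up for this example, and it assumes the Host/Vendor/Model/Rev layout shown above.

```shell
#!/bin/sh
# scsi_summary (hypothetical helper): reads /proc/scsi/scsi-style text on
# stdin and prints "<host> <channel> <id> <lun> <vendor> <model>" per device.
scsi_summary() {
    awk '
        # Remember the address fields of the most recent Host: line.
        /^Host:/ { addr = $2 " " $4 " " $6 " " $8 }
        /^[ \t]*Vendor:/ {
            line = $0
            sub(/^[ \t]*Vendor:[ \t]*/, "", line)  # drop the Vendor: label
            sub(/[ \t]*Rev:.*$/, "", line)         # drop the revision field
            sub(/[ \t]*Model:[ \t]*/, " ", line)   # join vendor and model
            gsub(/[ \t]+/, " ", line)              # collapse padding
            print addr " " line
        }
    '
}
```

Usage: scsi_summary < /proc/scsi/scsi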
List the partitions on each available hard disk:
# /sbin/fdisk -l
Output will vary depending on platform model and configuration, but will be similar to the following:
Disk /dev/sdb: 73.4 GB, 73407865856 bytes
255 heads, 63 sectors/track, 8924 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 * 1 131 1052226 83 Linux
/dev/sdb2 132 8402 66436807+ 83 Linux
/dev/sdb3 8403 8924 4192965 82 Linux swap
Disk /dev/sdc: 73.4 GB, 73407865856 bytes
255 heads, 63 sectors/track, 8924 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdc1 * 1 8924 71681998+ 83 Linux
The above commands are dynamic and reflect the currently available disks. Devices that have failed in such a way that they are now offline will not appear in this output. Some platforms have virtual devices used for adding storage media at runtime. These devices, which are usually identified as ‘virtual’, can be ignored, as their presence is not required for this diagnosis.
If one or more expected disks are missing from the output, you can assume that those disks have gone offline and are unable to respond to the SCSI/fdisk probes.
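The comparison between expected and present disks can be scripted. The sketch below is only an illustration: missing_disks is a made-up function name, the list of expected disks is something you must supply yourself, and the parsing assumes the "Disk /dev/NAME: ..." header lines shown in the fdisk output above.

```shell
#!/bin/sh
# missing_disks (hypothetical helper): takes a space-separated list of
# expected disk names as $1, reads "fdisk -l" output on stdin, and prints
# every expected disk that has no "Disk /dev/NAME:" header in that output.
missing_disks() {
    # Collect the disk names that fdisk reported.
    present=$(sed -n 's|^Disk /dev/\([^:]*\):.*|\1|p')
    for disk in $1; do
        # "echo $present" (unquoted) flattens the names onto one line.
        case " $(echo $present) " in
            *" $disk "*) ;;          # disk answered the probe
            *) echo "$disk" ;;       # disk is absent from the output
        esac
    done
}
```

Usage, with sdb and sdc standing in for whatever disks the system is supposed to have: /sbin/fdisk -l 2>/dev/null | missing_disks "sdb sdc"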
Diagnosis of Disk Errors
Search the system message logs for SCSI and file system errors:
# /bin/grep SCSI /var/log/messages*
# /bin/grep 'fs error' /var/log/messages*
Output will vary depending on platform model and configuration, but will be similar to the following:
Dec 12 12:30:00 GURKULSERVER kernel: SCSI device sdb: 143374738 512-byte hdwr sectors (73408 MB)
Dec 12 12:30:00 GURKULSERVER kernel: SCSI device sdb: drive cache: write through
Dec 12 12:30:01 GURKULSERVER kernel: SCSI device sdc: 143374738 512-byte hdwr sectors (73408 MB)
Dec 12 12:30:01 GURKULSERVER kernel: SCSI device sdc: drive cache: write through
Dec 12 13:35:00 GURKULSERVER kernel: SCSI error : <2 0 3 0> return code = 0x10000
Dec 12 13:35:00 GURKULSERVER kernel: EXT2-fs error (device sdc1): read_inode_bitmap: Cannot read inode bitmap - block_group = 422, inode_bitmap = 13828097
With the exception of the last two lines, the messages above are runtime events logged by the platform's hardware discovery during boot. They are not errors and can be ignored for diagnosis, but they are useful because they show which disks were available at boot time.
The last two lines are errors, logged because a component of the disk, or the whole disk, is failing.
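To separate the error lines from the boot-time discovery messages, the two searches can be narrowed into a single filter. This is a minimal sketch: disk_error_lines is a made-up name, and the patterns are taken from the sample messages above, so other kernels or drivers may word their errors differently.

```shell
#!/bin/sh
# disk_error_lines (hypothetical helper): reads syslog lines on stdin and
# keeps only lines containing "SCSI error" or "fs error". Discovery lines
# such as "SCSI device sdb: ..." do not match and are dropped.
disk_error_lines() {
    grep -E 'SCSI error|fs error'
}
```

Usage: cat /var/log/messages* | disk_error_lines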
The first of the two errors gives the SCSI error type.
The second identifies the device that suffered the SCSI error and reports the error decoded into human-readable form.
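Both error lines can be tied back to the earlier listings. The sketch below assumes that the four numbers in angle brackets are host, channel, id and lun, in that order; this is consistent with the "Host: scsi2 Channel: 00 Id: 03 Lun: 00" entry in the /proc/scsi/scsi output above, but the log line itself does not label them. Both function names are made up for this example.

```shell
#!/bin/sh
# scsi_error_addresses (hypothetical helper): reads syslog lines on stdin
# and rewrites the <h c i l> tuple of each "SCSI error" line in the style
# of a /proc/scsi/scsi Host: line, so the two can be compared by eye.
scsi_error_addresses() {
    sed -n 's/.*SCSI error *: *<\([0-9]*\) \([0-9]*\) \([0-9]*\) \([0-9]*\)>.*/\1 \2 \3 \4/p' |
    while read -r host channel id lun; do
        printf 'Host: scsi%d Channel: %02d Id: %02d Lun: %02d\n' \
            "$host" "$channel" "$id" "$lun"
    done
}

# fs_error_devices (hypothetical helper): reads syslog lines on stdin and
# prints each device named in an "fs error (device NAME)" message once.
fs_error_devices() {
    sed -n 's/.*fs error (device \([^)]*\)).*/\1/p' | sort -u
}
```

Usage: cat /var/log/messages* | scsi_error_addresses, then look for the printed Host: line in /proc/scsi/scsi. Likewise, cat /var/log/messages* | fs_error_devices names the partitions to look up in the fdisk listing.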
We search for the keyword SCSI because all storage devices on a modern Linux platform, including IDE/PATA, FC-AL, SAS, SATA, SCSI, and USB, emulate SCSI in order to be represented as storage devices. Therefore, most error messages reported in the system messages file are prefixed with the word SCSI.