Veritas Volume Manager – 'vxdisk list' Shows Disk(s) as "online failing" and 'vxprint -th' Shows Disk(s) as "FAILING"
Occasionally, for various reasons, one or more disks may get marked by Volume Manager as “failing”. This can be seen in the output of ‘vxdisk list’ and ‘vxprint -th’:
DEVICE       TYPE      DISK         GROUP        STATUS
c1t0d0s2     sliced    datadg01     datadg       online failing
dm L1-0 c1t0d0s2 - 71124291 - FAILING - -
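On a system with many disks, it can help to filter the listing rather than read it by eye. The following is a minimal sketch, assuming the five-column ‘vxdisk list’ layout shown above (DEVICE, TYPE, DISK, GROUP, STATUS); the function name list_failing_disks is hypothetical, and on Solaris nawk may be needed in place of awk.

```shell
# Read 'vxdisk list' output on stdin and print the device, disk name and
# disk group of every disk whose STATUS column contains the word "failing".
# An exact word match is used so that "failed" is not picked up.
list_failing_disks() {
    awk 'NR > 1 {
        # STATUS starts at field 5 and may span several words.
        for (i = 5; i <= NF; i++)
            if ($i == "failing") { print $1, $3, $4; break }
    }'
}

# Usage on a live system (assumes vxdisk is in PATH):
#   vxdisk list | list_failing_disks
```

For the sample listing above, this would print one line containing c1t0d0s2, datadg01 and datadg.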
These “failing” statuses are usually the result of a transient hardware problem of some sort. For example, an A5x00 array may have a flaky GBIC that causes inconsistent communication with the disks in the array. Because Volume Manager can only communicate with the drives intermittently, it may mark them as “failing”.
The flag tells the system administrator that something has happened. It does not necessarily indicate that the hard drive is bad; it means only that at some point Volume Manager was unable to communicate with the disk(s). The failing flag should NOT be confused with the “failed” status. A disk that shows up as “failed was:” is almost certainly bad, whereas a disk flagged as failing most likely is not.
Before turning this failing flag off, a firm decision should be made regarding the “real” status of the disk drive(s) in question. One should ALWAYS check the contents of the messages files in /var/adm to see if an actual hardware event occurred. Also, mail sent to the root account should be parsed and checked for messages sent by Volume Manager regarding a hardware problem.
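A quick first pass over the messages file might look like the sketch below. The search terms (vxvm, vxio, scsi, WARNING) are assumptions about what hardware-related entries typically contain, not a definitive list, and the function name scan_hw_events is hypothetical. On Solaris, ‘grep -E’ may require /usr/xpg4/bin/grep or egrep.

```shell
# Print lines from a messages file that look like kernel, SCSI or
# Volume Manager warnings.
# $1: path to the log file (defaults to /var/adm/messages)
scan_hw_events() {
    # The pattern list is an assumption; adjust it to the local hardware.
    grep -i -E 'vxvm|vxio|scsi|WARNING' "${1:-/var/adm/messages}"
}

# Usage:
#   scan_hw_events                        # current messages file
#   scan_hw_events /var/adm/messages.0    # an older, rotated file
```

An empty result does not prove that no hardware event occurred, so the matched lines (or their absence) should be read together with the root mail and the array's own diagnostics.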
If the drive really has hardware problems, the normal replacement routine in vxdiskadm (menu options 4 and 5) should be performed to replace the faulty unit.
If it is determined that a one-time “glitch” occurred and all indications seem fine (i.e., the output of ‘vxprint -ht’ shows all volumes and plexes as “ENABLED ACTIVE” and all subdisks as “ENA”), the failing flag can be turned off by running the following command:
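The ‘vxprint -ht’ health check can be sketched as a small filter. It assumes the usual record layout, in which volume (v) and plex (pl) records carry the kernel state in field 4 and the state in field 5, and subdisk (sd) records end with the mode field; column positions can differ between Volume Manager versions, so the field numbers should be confirmed against the header lines on the system in question. The function name check_vxprint is hypothetical.

```shell
# Read 'vxprint -ht' output on stdin, print every volume, plex or subdisk
# record that is not healthy, and return non-zero if any were found.
check_vxprint() {
    awk '
        # Volumes and plexes must be ENABLED (field 4) and ACTIVE (field 5).
        ($1 == "v" || $1 == "pl") && ($4 != "ENABLED" || $5 != "ACTIVE") { print; bad = 1 }
        # Subdisks must show ENA in the last field.
        $1 == "sd" && $NF != "ENA" { print; bad = 1 }
        END { exit bad + 0 }
    '
}

# Usage on a live system (assumes vxprint is in PATH):
#   vxprint -g datadg -ht | check_vxprint && echo "all objects look healthy"
```

Header lines use upper-case record types (V, PL, SD) and are therefore ignored by the lower-case comparisons.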
/usr/sbin/vxedit -g <diskgroup name> set failing=off <disk name>
For example:
/usr/sbin/vxedit -g datadg set failing=off datadg01
To verify that the command succeeded, run ‘vxdisk list’ and ‘vxprint -th’ again. If the disk shows as “online”, with no “failing” flag, the procedure was successful. The system should be monitored over the next several days to make sure that the disk does not go back to a failing condition. If a disk repeatedly returns to a failing state, a more thorough analysis of the hardware should be done to identify why.
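For the follow-up monitoring, the status of a single disk can be pulled out of the ‘vxdisk list’ output. This is a minimal sketch, assuming the five-column layout DEVICE, TYPE, DISK, GROUP, STATUS; the function name disk_status is hypothetical, and on Solaris nawk may be needed in place of awk because of the -v option.

```shell
# Read 'vxdisk list' output on stdin and print the full STATUS text
# for the disk media name given as $1 (for example datadg01).
disk_status() {
    awk -v disk="$1" 'NR > 1 && $3 == disk {
        # STATUS may be more than one word, so join fields 5..NF.
        s = $5
        for (i = 6; i <= NF; i++) s = s " " $i
        print s
    }'
}

# Usage on a live system (assumes vxdisk is in PATH):
#   [ "$(vxdisk list | disk_status datadg01)" = "online" ] || echo "datadg01 needs attention"
```

A check like this could be run periodically, for example from cron, during the days after the flag has been cleared.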