Veritas Volume Manager – 'vxdisk list' Shows Disk(s) as "online failing" and 'vxprint -th' Shows Disk(s) as "FAILING"

Occasionally, for various reasons, a disk, or disks, may get marked by Volume Manager as “failing”. This can be seen by doing ‘vxdisk list’ and ‘vxprint -th’:

vxdisk list:

DEVICE       TYPE      DISK         GROUP        STATUS
c1t0d0s2     sliced    datadg01     datadg       online failing

vxprint -th:

dm L1-0         c1t0d0s2     -        71124291 -        FAILING  -       -

These “failing” statuses are usually the result of a hardware hiccup of some sort. For example, an A5x00 array may have a flaky GBIC that causes inconsistent communication with the disks in the array. Because Volume Manager can only communicate with the drives intermittently, it may mark them as “failing”.

This is basically a flag to the system administrator that something has happened. It does not necessarily indicate that the hard drive is bad. It just means that at some point in time, Volume Manager was not able to communicate with the disk(s). The “failing” flag should NOT be confused with the “failed” flag. If a disk shows up as “failed was:”, it is almost certainly a bad disk. A disk with the “failing” flag, on the other hand, is most likely not actually bad.
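To spot flagged disks quickly, the status column can be filtered with a short script. The following is a minimal sketch, not part of the original procedure: the function name `list_failing_disks` is hypothetical, it assumes the five-column 'vxdisk list' layout shown above, and it assumes a POSIX awk (on Solaris, nawk or /usr/xpg4/bin/awk).

```shell
# Hypothetical helper: print "DISK GROUP" for every disk flagged as failing.
# Reads 'vxdisk list' style output on stdin, so it also works on saved output.
list_failing_disks() {
    awk '
        $1 == "DEVICE" { next }    # skip the header line
        {
            # STATUS starts at column 5 and may hold several words
            for (i = 5; i <= NF; i++)
                if ($i == "failing") { print $3, $4; break }
        }
    '
}

# On a live system (assumes vxdisk is in PATH):
#   vxdisk list | list_failing_disks
```

Because only the exact word “failing” is matched, a disk reported as “failed was:” is not listed by this sketch.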


Instructions:

Before turning this failing flag off, a firm decision should be made regarding the “real” status of the disk drive(s) in question. One should ALWAYS check the contents of the messages files in /var/adm to see if an actual hardware event occurred. Also, mail sent to the root account should be parsed and checked for messages sent by Volume Manager regarding a hardware problem.
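The log check described above could be started with a simple keyword search. This is only a sketch under assumptions: the function name `scan_messages` is hypothetical, the keyword list is a guess at typical entries rather than an exhaustive set of Volume Manager or driver messages, and `grep -E` assumes a POSIX grep (on Solaris, /usr/xpg4/bin/grep).

```shell
# Hypothetical helper: show log lines that look like storage or VxVM trouble.
# Arguments: one or more log files to scan.
# The keyword list below is an assumption; adjust it to your environment.
scan_messages() {
    grep -i -E 'vxvm|vxdmp|scsi|warning|offline|i/o error' "$@"
}

# On a live system:
#   scan_messages /var/adm/messages*
```

Anything this turns up still needs to be read in context; a match is a starting point for the decision, not a verdict on the disk.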

If the drive really has hardware problems, then the normal routine of running vxdiskadm options 4 (“Remove a disk for replacement”) and 5 (“Replace a failed or removed disk”) should be performed to replace the faulty unit.

If it is determined that a one-time “glitch” occurred and all indications seem fine (i.e., check the output of ‘vxprint -ht’ and make sure all volumes and plexes are “ENABLED ACTIVE” and all subdisks are “ENA”), the failing flag can be turned off by running the following command:

     /usr/sbin/vxedit -g <diskgroup name> set failing=off <disk name>

For example

     /usr/sbin/vxedit -g datadg set failing=off datadg01
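The health check recommended before clearing the flag (volumes and plexes enabled and active, subdisks “ENA”) can also be scripted. The sketch below is an illustration under assumptions: `check_vxprint` is a hypothetical name, and the column positions (kernel state and state in columns 4 and 5 of “v” and “pl” records, mode in the last column of “sd” records) follow typical ‘vxprint -ht’ output and may differ between versions. It assumes a POSIX awk.

```shell
# Hypothetical helper: read 'vxprint -ht' output on stdin and report any
# volume/plex that is not ENABLED ACTIVE or any subdisk that is not ENA.
# Exit status is 0 when everything looks healthy, 1 otherwise.
check_vxprint() {
    awk '
        BEGIN { bad = 0 }
        $1 == "v" || $1 == "pl" {
            if ($4 != "ENABLED" || $5 != "ACTIVE") { print "not healthy: " $0; bad = 1 }
        }
        $1 == "sd" {
            if ($NF != "ENA") { print "not healthy: " $0; bad = 1 }
        }
        END { exit bad }
    '
}

# On a live system (assumes vxprint is in PATH):
#   vxprint -g datadg -ht | check_vxprint && echo "volumes, plexes and subdisks look fine"
```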

To verify success of the command, do a vxdisk list and vxprint -th. If the disk shows as “online” without the “failing” flag, then the procedure was successful. The system should be monitored over the next several days to make sure that the disk does not go back to a failing condition. If a disk repeatedly goes back to a failing state, a more thorough analysis of the hardware should be done to identify why.
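For the follow-up monitoring, a per-disk check that returns an exit status is convenient to run by hand or from cron. This is a minimal sketch, assuming the same five-column 'vxdisk list' layout shown earlier and a POSIX awk that supports `-v`; the function name `disk_is_failing` is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) if the named disk is flagged failing.
# $1 = disk media name (e.g. datadg01); reads 'vxdisk list' output on stdin.
disk_is_failing() {
    awk -v disk="$1" '
        $3 == disk {
            for (i = 5; i <= NF; i++)
                if ($i == "failing") found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# On a live system (assumes vxdisk is in PATH):
#   if vxdisk list | disk_is_failing datadg01; then
#       echo "datadg01 is flagged failing again"
#   fi
```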

Ramdev

I started unixadminschool.com (aka gurkulindia.com) in 2009 as my own personal reference blog. Later I realized that my learnings might be helpful to other Unix admins if I managed my knowledge base in a more user-friendly format, and the result is today's unixadminschool.com. You can connect with me at https://www.linkedin.com/in/unixadminschool/