Veritas VxVM: Fails to start after rebooting the system

At boot time, the VERITAS Volume Manager configuration daemon, vxconfigd, scans all disks and reads their private regions. In rare situations where

– I/Os to a disk are failing, and

– the operating system takes a long time to return those failures to Volume Manager,

vxconfigd may take an extremely long time to process the disk configurations, so that Volume Manager appears to be hung and unable to start.

One way to identify the problem and to isolate the specific disk is:

1. Restart vxconfigd in debug mode, using either of the following methods:
a. i. Boot the system with Volume Manager disabled. This can be accomplished by creating the install-db file:

    # touch /etc/vx/reconfig.d/state.d/install-db

ii. When the system is up, manually start vxconfigd on the command line:

    # vxconfigd -k -x 9 -x mstimestamp -x tracefile=filename

The above steps may or may not work depending on which disk is problematic, and whether or not the root disk is under Volume Manager control.

or

b. Edit the vxvm-sysboot file (e.g. /sbin/init.d/vxvm-sysboot on HP-UX, /etc/init.d/vxvm-sysboot on Solaris), changing the line

    vxconfigd $vxconfigd_opts -m boot

to

    vxconfigd -x 9 -x mstimestamp -x tracefile=filename $vxconfigd_opts -m boot

and reboot the system.

2. When vxconfigd restarts, look for I/O errors in the debug log such as:

    07/31 10:40:45.929: DEBUG: IOCTL VOLDIO_READ len=1 priv,drid=0.1600,offset=2184: (thread= 3636)
    07/31 11:09:42.331: DEBUG: IOCTL completion (thread 3636): failed: errno=5 (I/O error)
    07/31 11:09:42.466: DEBUG: IOCTL VOLDIO_READ len=1 priv,drid=0.1600,offset=2192: (thread= 3636)
    07/31 11:38:39.674: DEBUG: IOCTL completion (thread 3636): failed: errno=5 (I/O error)
    07/31 11:38:39.813: DEBUG: IOCTL VOLDIO_READ len=1 priv,drid=0.1600,offset=2200: (thread= 3636)
    07/31 12:07:37.152: DEBUG: IOCTL completion (thread 3636): failed: errno=5 (I/O error)
    07/31 12:07:37.268: DEBUG: IOCTL VOLDIO_READ len=1 priv,drid=0.1600,offset=2208: (thread= 3636)


In this particular case, I/Os consistently failed about 29 minutes after they were issued, causing excessive delays in vxconfigd.
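The delay between each read and its failure can also be measured mechanically rather than by eye. The following is a minimal Python sketch, assuming the trace file uses exactly the line layout shown in the excerpt above; the function name slow_failures, the 60-second threshold, and the placeholder year are illustrative choices, not part of Volume Manager.

```python
import re
from datetime import datetime

# Layout assumed from the sample log: "MM/DD HH:MM:SS.mmm: DEBUG: IOCTL ..."
LINE_RE = re.compile(r"^\s*(\d\d/\d\d \d\d:\d\d:\d\d\.\d+): DEBUG: IOCTL (.*)$")
# An issued request, e.g. "VOLDIO_READ len=1 priv,drid=0.1600,offset=2184: (thread= 3636)"
ISSUE_RE = re.compile(r"^(\S+) .*drid=([\d.]+),.*\(thread=\s*(\d+)\)")
# A failed completion, e.g. "completion (thread 3636): failed: errno=5 (I/O error)"
DONE_RE = re.compile(r"^completion \(thread\s*(\d+)\): failed: errno=(\d+)")


def parse_ts(text):
    # The log carries no year, so an arbitrary placeholder year is used
    # (assumption); delays that span a year boundary are not handled.
    return datetime.strptime("2000/" + text, "%Y/%m/%d %H:%M:%S.%f")


def slow_failures(lines, threshold_s=60.0):
    """Return (drid, operation, errno, seconds) for each failed IOCTL
    whose completion arrived at least threshold_s after it was issued."""
    pending = {}   # thread id -> (issue time, operation, drid)
    results = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts, rest = parse_ts(m.group(1)), m.group(2)
        issue = ISSUE_RE.match(rest)
        if issue:
            pending[issue.group(3)] = (ts, issue.group(1), issue.group(2))
            continue
        done = DONE_RE.match(rest)
        if done and done.group(1) in pending:
            start, op, drid = pending.pop(done.group(1))
            elapsed = (ts - start).total_seconds()
            if elapsed >= threshold_s:
                results.append((drid, op, int(done.group(2)), elapsed))
    return results


if __name__ == "__main__":
    import sys
    # Usage: python slow_failures.py <tracefile>
    with open(sys.argv[1]) as fh:
        for drid, op, errno_, secs in slow_failures(fh):
            print(f"drid={drid} {op} errno={errno_} took {secs / 60:.1f} min")
```

Run against the excerpt above, this sketch reports three failed reads on drid 0.1600, each taking between 1736 and 1738 seconds, which matches the delay of about 29 minutes. The last read in the excerpt has no completion line yet and is therefore not reported.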


To identify the disk involved, search backwards in the log for a NEW_DISK entry whose “rid” matches the “drid” of the failing reads, such as:

    07/31 07:18:04.240: DEBUG: IOCTL NEW_DISK da=c123t12d6 rid=0.1600 dm= dmrid=0.0 new_dmrid=0.0 dgiid=0.0 pub_dev=1/575 priv_dev=1/575 pub_len=9223372036854775807 priv_len=9223372036854775807 kflag=0 vflag=0x60: return 0(0x0)
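The backward search can be scripted as well. The sketch below is only an illustration in Python, assuming the NEW_DISK entries look like the line above, with the da= field immediately followed by the rid= field; the function name disk_for_rid is hypothetical.

```python
import re

# Layout assumed from the sample entry: "... IOCTL NEW_DISK da=<device> rid=<id> ..."
NEW_DISK_RE = re.compile(r"IOCTL NEW_DISK da=(\S+) rid=(\S+)")


def disk_for_rid(lines, drid):
    """Search the log backwards and return the disk access name (da)
    of the most recent NEW_DISK entry whose rid equals drid, or None."""
    for line in reversed(list(lines)):
        m = NEW_DISK_RE.search(line)
        if m and m.group(2) == drid:
            return m.group(1)
    return None


if __name__ == "__main__":
    import sys
    # Usage: python disk_for_rid.py <tracefile> <drid>
    with open(sys.argv[1]) as fh:
        print(disk_for_rid(fh, sys.argv[2]))
```

For the entry shown above, looking up 0.1600 yields c123t12d6. Searching from the end of the log mirrors the manual procedure and picks the most recent matching entry.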

Workaround:

Exclude the disk identified in step 2 above from Volume Manager by specifying it in the /etc/vx/disks.exclude file, such as:

    c123t12d6


If the excluded disk belongs to a disk group, the disk group will not be imported, because the disk will not be found. If necessary, the import can be forced with “vxdg -f import”.


Ramdev


