Veritas VxVM: Fails to start after rebooting the system
At boot time, the VERITAS Volume Manager configuration daemon, vxconfigd, scans all disks and reads their private regions. In rare situations where
– I/Os to a disk are failing, and
– there are lengthy delays before the operating system returns the failures to Volume Manager,
vxconfigd may take an extremely long time to process the disk configurations, so that Volume Manager appears to be hung and unable to start.
One way to identify the problem and to isolate the specific disk is:
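1. Restart vxconfigd in debug mode, using either of the following methods:
a. Start vxconfigd manually after booting with Volume Manager disabled:
i. Boot the system with Volume Manager disabled. This can be accomplished by creating the install-db file: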
- # touch /etc/vx/reconfig.d/state.d/install-db
ii. When the system is up, manually start vxconfigd on the command line:
- # vxconfigd -k -x 9 -x mstimestamp -x tracefile=filename
The above steps may or may not work depending on which disk is problematic, and whether or not the root disk is under Volume Manager control.
b. Edit the vxvm-sysboot file (e.g. /sbin/init.d/vxvm-sysboot on HP-UX, /etc/init.d/vxvm-sysboot on Solaris), changing the line
- vxconfigd $vxconfigd_opts -m boot
to
- vxconfigd -x 9 -x mstimestamp -x tracefile=filename $vxconfigd_opts -m boot
and reboot the system.
2. When vxconfigd restarts, look for I/O errors in the debug log such as:
- 07/31 10:40:45.929: DEBUG: IOCTL VOLDIO_READ len=1 priv,drid=0.1600,offset=2184: (thread= 3636)
07/31 11:09:42.331: DEBUG: IOCTL completion (thread 3636): failed: errno=5 (I/O error)
07/31 11:09:42.466: DEBUG: IOCTL VOLDIO_READ len=1 priv,drid=0.1600,offset=2192: (thread= 3636)
07/31 11:38:39.674: DEBUG: IOCTL completion (thread 3636): failed: errno=5 (I/O error)
07/31 11:38:39.813: DEBUG: IOCTL VOLDIO_READ len=1 priv,drid=0.1600,offset=2200: (thread= 3636)
07/31 12:07:37.152: DEBUG: IOCTL completion (thread 3636): failed: errno=5 (I/O error)
07/31 12:07:37.268: DEBUG: IOCTL VOLDIO_READ len=1 priv,drid=0.1600,offset=2208: (thread= 3636)
In this particular case, I/Os consistently failed about 29 minutes after they were issued, causing excessive delays in vxconfigd.
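Scanning a long trace file by eye for these delays can be tedious. The following is a minimal Python sketch, not part of Volume Manager, that pairs each IOCTL request with the failed completion on the same thread and reports how long the failure took to come back. The regular expressions are modeled only on the sample lines shown above, and the function name slow_failures, the 60-second threshold, and the assumed year are all hypothetical choices; other vxconfigd versions may format the log differently.

```python
import re
from datetime import datetime

# Patterns modeled on the sample log lines above (an assumption; the
# exact format may differ between vxconfigd versions).
REQUEST = re.compile(
    r"(\d\d/\d\d \d\d:\d\d:\d\d\.\d+): DEBUG: IOCTL \w+ .*?"
    r"drid=([\d.]+),offset=(\d+): \(thread=\s*(\d+)\)"
)
COMPLETION = re.compile(
    r"(\d\d/\d\d \d\d:\d\d:\d\d\.\d+): DEBUG: IOCTL completion "
    r"\(thread (\d+)\): failed: errno=(\d+)"
)


def parse_ts(text, year=2000):
    # The log timestamps carry no year, so one is assumed here.
    # A leap year is used so that 02/29 parses.
    return datetime.strptime(f"{year} {text}", "%Y %m/%d %H:%M:%S.%f")


def slow_failures(lines, threshold_seconds=60.0):
    """Return (drid, offset, errno, elapsed_seconds) for each failed
    IOCTL that took at least threshold_seconds to complete."""
    pending = {}  # thread id -> (start time, drid, offset)
    found = []
    for line in lines:
        m = REQUEST.search(line)
        if m:
            ts, drid, offset, thread = m.groups()
            pending[thread] = (parse_ts(ts), drid, int(offset))
            continue
        m = COMPLETION.search(line)
        if m and m.group(2) in pending:
            start, drid, offset = pending.pop(m.group(2))
            elapsed = (parse_ts(m.group(1)) - start).total_seconds()
            if elapsed >= threshold_seconds:
                found.append((drid, offset, int(m.group(3)), elapsed))
    return found
```

Passing the lines of the trace file to slow_failures would list the drid of every request whose failure was slow to return. For the excerpt above, each reported delay is a little over 1,736 seconds.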
Searching backwards in the log for a “rid” matching the “drid”, one can identify the disk involved, such as:
- 07/31 07:18:04.240: DEBUG: IOCTL NEW_DISK da=c123t12d6 rid=0.1600 dm= dmrid=0.0 new_dmrid=0.0 dgiid=0.0 pub_dev=1/575 priv_dev=1/575 pub_len=9223372036854775807 priv_len=9223372036854775807 kflag=0 vflag=0x60: return 0(0x0)
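The backward search for the matching rid can also be scripted. The sketch below is a hedged illustration only: the helper name disk_for_rid is hypothetical, and the pattern assumes NEW_DISK entries always look like the sample line above, with da= immediately followed by rid=.

```python
import re

# Modeled on the sample NEW_DISK line above (an assumption about the format).
NEW_DISK = re.compile(r"IOCTL NEW_DISK da=(\S+) rid=([\d.]+)")


def disk_for_rid(lines, rid):
    """Search backwards through the log lines and return the da= name
    of the most recent NEW_DISK entry whose rid matches, or None."""
    for line in reversed(list(lines)):
        m = NEW_DISK.search(line)
        if m and m.group(2) == rid:
            return m.group(1)
    return None
```

Called with the log lines and the drid taken from the failing IOCTLs, it returns the disk access name to exclude, or None if no NEW_DISK entry for that rid is present.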
3. Exclude the disk identified in step 2 above (c123t12d6 in this example) from Volume Manager by listing its device name in the /etc/vx/disks.exclude file.
If the disk belongs to a disk group, then the disk group will not be imported as the disk will not be found. The “vxdg -f import” option can be used to force an import if necessary.
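Where the exclusion has to be applied on several hosts, the edit to /etc/vx/disks.exclude could be scripted. The sketch below assumes the file holds one disk name per line; the helper add_exclusion is hypothetical and is not a Volume Manager utility. It takes the file path as a parameter so it can be tried against a scratch file first.

```python
from pathlib import Path


def add_exclusion(disk, exclude_file="/etc/vx/disks.exclude"):
    """Append disk to the exclude file unless it is already listed.
    Assumes one disk name per line. Returns True if the file changed."""
    path = Path(exclude_file)
    text = path.read_text() if path.exists() else ""
    if disk in (entry.strip() for entry in text.splitlines()):
        return False  # already excluded, nothing to do
    # Start the new entry on its own line if the file lacks a final newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as fh:
        fh.write(prefix + disk + "\n")
    return True
```

Running it twice with the same disk name leaves a single entry in the file, so it is safe to repeat.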