Solaris SVM : Disk replacement for Systems With Internal FCAL Drives Under SVM (V280R, V480, V490, V880, V890):

Beginning with the Solaris 9 Operating System (OS), Solaris Volume Manager (SVM) software uses a new feature called Device-ID (or DevID), which identifies each disk not only by its c#t#d# name, but also by a unique ID generated from the disk's World Wide Name (WWN) or serial number. The SVM software relies on the Solaris OS to supply it with each disk's correct DevID.
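If you want to see what DevID the SVM software has recorded for a disk, the Device Relocation Information section of metastat output lists it. The following is only a rough sketch; the metadevice name d10 and all values shown are illustrative, and the exact output varies by Solaris release:
# metastat d10
...
Device Relocation Information:
Device      Reloc    Device ID
c1t1d0      Yes      id1,ssd@n20000004cfa19920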

To replace a disk, use the luxadm command to remove it and insert the new disk. This procedure causes an update of the Solaris OS device framework, so that the new disk’s DevID is inserted and the old disk’s DevID is removed.

PROCEDURE FOR REPLACING MIRRORED DISKS

The following set of commands should work in all cases. Follow the exact sequence to ensure a smooth operation.

To replace a disk that is controlled by SVM and is part of a mirror, perform the following steps:

1. Run metadetach to detach each submirror on the failing disk from its mirror:
# metadetach -f <mirror> <submirror>
NOTE: The “-f” option is not required if the metadevice is in an “okay” state.
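For example, if (hypothetically) d10 is the mirror and d11 is the submirror built on slices of the failing disk, the detach would look like this:
# metadetach -f d10 d11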
2. Run metaclear to remove the configuration from the disk:
# metaclear <submirror>
You can verify that no metadevices are left on the disk by running the following:
# metastat -p | grep c#t#d#
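Continuing the same hypothetical names, clearing the detached submirror and confirming that nothing else references the failing disk (here c1t1d0) might look like this:
# metaclear d11
# metastat -p | grep c1t1d0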
3. If there are any replicas on this disk, note the number of replicas, and remove them using the following:
# metadb -i
(Note the number of replicas on the failing disk.)
# metadb -d c#t#d#s#
Verify that there are no existing replicas left on the disk by running the following:
# metadb | grep c#t#d#
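As a hypothetical illustration, if the failing disk c1t1d0 held two replicas on slice 7, the removal and the check would be:
# metadb -i
# metadb -d c1t1d0s7
# metadb | grep c1t1d0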
4. If there are any mounted filesystems on this disk that are not under SVM control, or that are on non-mirrored metadevices, unmount them.
5. Run “format” or “prtvtoc/fmthard” to save the disk partition table information.
# prtvtoc /dev/rdsk/c#t#d#s2 > file
6. Run the ‘luxadm’ command to remove the failed disk.
# luxadm remove_device -F /dev/rdsk/c#t#d#s2
At the prompt, physically remove the disk and continue. The picld daemon notifies the system that the disk has been removed.
7. Initiate devfsadm cleanup subroutines by entering the following command:
# /usr/sbin/devfsadm -C -c disk
The default devfsadm operation is to attempt to load every driver in the system and attach these drivers to all possible device instances. The devfsadm command then creates device special files in the /devices directory and logical links in /dev.
With the "-c disk" option, devfsadm will only update disk device files. This saves time and is important on systems that have tape devices attached. Rebuilding these tape devices could cause undesirable results on non-Sun hardware.
The -C option cleans up the /dev directory, removing any dangling logical links to the device link names.
This should remove all the device paths for this particular disk. This can be verified with:
# ls -ld /dev/dsk/c#t#d*
This should return no devices.

8. It is now safe to physically replace the disk. Insert the new disk and configure it. Create the necessary entries in the Solaris OS device tree with one of the following commands:
# devfsadm
or
# /usr/sbin/luxadm insert_device <enclosure_name,sx>
where sx is the slot number
or
# /usr/sbin/luxadm insert_device (if enclosure name is not known)
Note: In many cases, luxadm insert_device does not require the enclosure name and slot number. Use the following to find the slot number:
# luxadm display <enclosure_name>
To find the enclosure name, use:
# luxadm probe
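As a purely hypothetical illustration, if luxadm probe reported an enclosure named FCloop and the new disk went into slot 3, the insert command would follow the syntax shown above (enclosure names differ between platforms, so always take the name from the probe output on your own system):
# /usr/sbin/luxadm insert_device FCloop,s3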
Run “ls -ld /dev/dsk/c1t1d*” to verify that the new device paths have been created.
CAUTION: After inserting a new disk and running devfsadm (or luxadm), the old ssd instance number changes to a new ssd instance number. This change is expected, so ignore it.
For example, when the error occurs on the following disk, whose ssd instance is ssd3:
WARNING: /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19920,0 (ssd3):
Error for Command: read(10)   Error Level: Retryable
Requested Block: 15392944   Error Block: 15392958
After inserting a new disk, the ssd instance changes to ssd10, as shown below. This is expected and is not a cause for concern.
picld[287]: [ID 727222 daemon.error] Device DISK0 inserted
qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(2): Loop ONLINE
scsi: [ID 799468 kern.info] ssd10 at fp2: name w21000011c63f0c94,0, bus address ef
genunix: [ID 936769 kern.info] ssd10 is /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0
scsi: [ID 365881 kern.info]
genunix: [ID 408114 kern.info] /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0 (ssd10) online
9. Run format, or fmthard with the saved prtvtoc output, to put the desired partition table on the new disk:
# fmthard -s file /dev/rdsk/c#t#d#s2
[‘file’ is the prtvtoc saved in step 5]
10. Use metainit and metattach to re-create the submirrors and attach them to their mirrors, which starts the resync:
# metainit <submirror> 1 1 c#t#d#s#
# metattach <mirror> <submirror>
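Carrying the earlier hypothetical names forward (d11 as the submirror being rebuilt on c1t1d0s0, and d10 as its mirror), the re-create and attach would be:
# metainit d11 1 1 c1t1d0s0
# metattach d10 d11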
11. If necessary, re-create the same number of replicas that existed previously, using the -c option of the metadb(1M) command:
# metadb -a -c# c#t#d#s#
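For instance, if two replicas had previously lived on slice 7 of the replaced disk (as in the hypothetical example in step 3), they could be re-created with:
# metadb -a -c 2 c1t1d0s7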
12. Be sure to correct the EEPROM entry for the boot-device (only if one of the root disks has been replaced).
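As a rough sketch of that last step (the device path shown is illustrative, taken from the example messages above): check the current setting with eeprom, find the new disk's physical path from its /dev/dsk link, and point boot-device at that path. Note that the OpenBoot path normally uses disk@w<WWN> where the Solaris /devices path shows ssd@w<WWN>.
# eeprom boot-device
# ls -l /dev/dsk/c1t0d0s0
# eeprom boot-device='/pci@8,600000/SUNW,qlc@2/fp@0,0/disk@w21000011c63f0c94,0:a'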

PROCEDURE FOR REPLACING A DISK IN A RAID-5 VOLUME

Note: If a disk is used in BOTH a mirror and a RAID-5 metadevice, do not use the following procedure; instead, follow the instructions for MIRRORED devices (above). This is because a RAID-5 array that has just been healed is treated as a single disk for mirroring purposes.
To replace an SVM-controlled disk that is part of a RAID-5 metadevice, follow these steps:

1. If there are any mounted filesystems on this disk that are not under SVM control, or that are on non-mirrored metadevices, unmount them.

2. If there are any replicas on this disk, remove them using:
# metadb -d c#t#d#s#
Verify there are no existing replicas left on the disk by running:
# metadb | grep c#t#d#

3. Run “format” or “prtvtoc/fmthard” to save the disk partition table information.
# prtvtoc /dev/rdsk/c#t#d#s2 > file

4. Run the ‘luxadm’ command to remove the failed disk.
# luxadm remove_device -F /dev/rdsk/c#t#d#s2
At the prompt, physically remove the disk and continue. The picld daemon notifies the system that the disk has been removed.

5. Initiate devfsadm cleanup subroutines by entering the following command:
# /usr/sbin/devfsadm -C -c disk
The default devfsadm operation is to attempt to load every driver in the system and attach these drivers to all possible device instances. The devfsadm command then creates device special files in the /devices directory and logical links in /dev.
With the "-c disk" option, devfsadm will only update disk device files. This saves time and is important on systems that have tape devices attached. Rebuilding these tape devices could cause undesirable results on non-Sun hardware.
The -C option cleans up the /dev directory, removing any dangling logical links to the device link names.

This should remove all the device paths for this particular disk. This can be verified with:
# ls -ld /dev/dsk/c#t#d*
This should return no devices.

6. It is now safe to physically replace the disk. Insert the new disk and configure it. Create the necessary entries in the Solaris OS device tree with one of the following commands:

# devfsadm
or
# /usr/sbin/luxadm insert_device <enclosure_name,sx> where sx is the slot number
or
# /usr/sbin/luxadm insert_device (if enclosure name is not known)

Note: In many cases, luxadm insert_device does not require the enclosure name and slot number. Use the following to find the slot number:

# luxadm display <enclosure_name>

To find the enclosure name, you can use:
# luxadm probe

Run “ls -ld /dev/dsk/c1t1d*” to verify that the new device paths have been created.
CAUTION: After inserting a new disk and running devfsadm (or luxadm), the old ssd instance number changes to a new ssd instance number. This change is expected, so ignore it.
For example:
When the error occurs on the following disk (ssd3):

WARNING: /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19920,0 (ssd3):
Error for Command: read(10)   Error Level: Retryable
Requested Block: 15392944   Error Block: 15392958
After inserting a new disk, the ssd instance changes to ssd10, as shown below. This is expected and is not a cause for concern.

picld[287]: [ID 727222 daemon.error] Device DISK0 inserted
qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(2): Loop ONLINE
scsi: [ID 799468 kern.info] ssd10 at fp2: name w21000011c63f0c94,0, bus address ef
genunix: [ID 936769 kern.info] ssd10 is /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0
scsi: [ID 365881 kern.info]
genunix: [ID 408114 kern.info] /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0 (ssd10) online

7. Run format, or fmthard with the saved prtvtoc output, to put the desired partition table on the new disk:
# fmthard -s file /dev/rdsk/c#t#d#s2
[‘file’ is the prtvtoc saved in step 3]

8. Run metadevadm on the disk to update the new DevID:

# metadevadm -u c#t#d#
Note: Due to BugID 4808079, a disk can show up as "unavailable" in metastat output after running step 8. To resolve this, run "metastat -i".

After running this command, the device should show a metastat status of "Okay". The fix for this bug has been delivered and integrated in s9u4_08, s9u5_02, and s10_35.
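To make the sequence concrete, suppose (hypothetically) that the replaced disk is c1t2d0; updating its DevID and refreshing the metadevice state would look like this:
# metadevadm -u c1t2d0
# metastat -i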

9. If necessary, recreate any replicas on the new disk:

# metadb -a c#t#d#s#

10. Run metareplace to enable and resync the new disk:

# metareplace -e d# c#t#d#s#
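For example, if (hypothetically) the RAID-5 metadevice is d20 and the replaced component is c1t2d0s0, the resync would be started with:
# metareplace -e d20 c1t2d0s0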

Ramdev

I started unixadminschool.com (aka gurkulindia.com) in 2009 as my own personal reference blog, and later I realized that my learnings might be helpful for other unix admins if I managed my knowledge base in a more user-friendly format. The result is today's unixadminschool.com. You can connect with me at https://www.linkedin.com/in/unixadminschool/

Responses

  1. Yoga says:

    I could not think you are more right..

  2. Yogesh Raheja says:

    @Yoga, thanks, but reality can't be changed…

  3. raju_3235 says:

    Thanks Yogesh/Ram,

    I have a concern below. Could you please advise me…

    Hi All,

    Can anyone help me how to extend SAN filesystem on solaris critical box which is under SVM and 1) No powerpath is installed on it.
    2) I don’t see C3 controller to which storage is connected
    3) OS is solaris 9 and its sun fire 480R
    4) I see persistent binding in sd.conf
    5) storage team allocated LUN with ID#A06 (which is 2566 in decimal and I believe if I add this in sd.conf file it wont detect as its greater than 255)

    Basic Info..
    Here is the FS which needs to extend by 20gb
    #df -h
    /dev/md/dsk/d1 46G 37G 8.4G 82% /install2

    ###Disks in d1 ###
    # metastat d1
    c3t0d32
    c3t0d33
    c3t0d75

    ###No powerpath ##
    # powermt display dev=all
    powermt: not found
    #

    ### inq output ###
    #/emc_migration/bin_SUN/inq
    /dev/rdsk/c3t0d32s2 :EMC :SYMMETRIX :5874 :77D20008 :4419840
    /dev/rdsk/c3t0d33s2 :EMC :SYMMETRIX :5874 :77D21008 :4419840
    —–
    ### format output #####

    # echo | format
    Searching for disks…done

    0. c1t0d0
    /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000005ag9c4862,0
    1. c1t1d0
    /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000005ag9c48ac,0
    2. c3t0d28

    ### sd.conf output## 
    I SEE PERSISTENT BINDING.

    output from sd.conf
    name="sd" parent="lpfc" target=0 lun=32 hba="lpfc0";
    name="sd" parent="lpfc" target=0 lun=32 hba="lpfc1";
    name="sd" parent="lpfc" target=0 lun=32 hba="lpfc2";

    #### luxadm probe
    Found Fibre Channel device(s):
    Node WWN:20000005ag9c48ac Device Type:Disk device
    Logical Path:/dev/rdsk/c1t1d0s2
    Node WWN:20000005ag9c4862 Device Type:Disk device
    Logical Path:/dev/rdsk/c1t0d0s2

    ##### cfgadm -al
    Ap_Id Type Receptacle Occupant Condition
    c0 scsi-bus connected configured unknown
    c0::dsk/c0t0d0 CD-ROM connected configured unknown

    #### luxadm -e port
    Found path to 1 HBA ports
    /devices/pci@9,600000/SUNW,qlc@2/fp@0,0:devctl CONNECTED
    #

    Thanks,
    Raj

  4. Mark says:

    Thanks. I found this very helpful replacing a drive in a V880 running solaris 8 with a fairly complicated raid 0+1 setup that had been grown onto some 3310 LUNS. SAVE A COPY of /etc/lvm/md.cf before you start if you are not sure how to re-create your submirror. I couldn’t replace the failed disk slice in my submirror, and had to metaclear the submirror, rebuild it with metainit, and reattach it.

    Another hurdle you may hit – luxadm remove_device didn’t want to play nice. I eventually got it to go by physically removing and reinserting the failed disk after I broke the mirror.

