Step By Step Procedure for Replacement of a Failed Disk in Solaris (SVM – In Solaris Volume Manager)

This post covers the procedure for replacing a failed disk under Solaris Volume Manager (SVM) in Solaris.

 

TO REPLACE FAILED DISK IN THE SYSTEM:

First of all, take backups of the following (necessary):
# metastat -p > /var/tmp/metastat-p-b4replacement
# metastat -t > /var/tmp/metastat-t-b4replacement
# metadb -i > /var/tmp/metadb-i-b4replacement
# echo | format > /var/tmp/format-b4replacement
# iostat -en > /var/tmp/iostat-en-b4replacement
# ifconfig -a > /var/tmp/ifconfig-a-b4replacement

 

1. Identify the failed disk with any of the following commands:
# echo | format
OR
# iostat -en     (use iostat -En for complete details about the failed disk)

OR
check the system logs (/var/adm/messages) and the output of dmesg.

yogesh-test#echo | format
Searching for disks…done
AVAILABLE DISK SELECTIONS:

0. c1t0d0
/pci@0,0/pci1000,30@10/sd@0,0
1. c1t2d0 <--- faulty drive
/pci@0,0/pci1000,30@10/sd@2,0

Specify disk (enter its number): Specify disk (enter its number):

Where: c1t0d0 is the root disk & c1t2d0 is the mirror disk.

yogesh-test#iostat -en
---- errors ---
s/w h/w trn tot device
0 0 0 0 fd0
0 0 0 0 md/d0
0 0 0 0 md/d1
0 0 0 0 md/d3
0 0 0 0 md/d10
0 0 0 0 md/d11
0 0 0 0 md/d13
0 0 0 0 md/d20
0 0 0 0 md/d21
0 0 0 0 md/d23
6 0 0 6 c1t0d0
10 0 0 10 c0t0d0
6 50 0 6 c1t2d0 <--- mirror disk is showing 50 h/w errors
0 0 0 0 yogesh:vold(pid568)
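
On a server with many devices, the faulty disk can be picked out of the listing with a small filter. The following is only a minimal sketch: hw_error_disks is a hypothetical helper name (not a Solaris command), and it assumes the five-column layout shown above (s/w, h/w, trn, tot, device).

```shell
# Hypothetical helper: print "device h/w-count" for every device whose
# hard-error column is non-zero. Reads `iostat -en` output on stdin.
# Header lines are skipped because their second field is not a number.
hw_error_disks() {
    awk '$2 ~ /^[0-9]+$/ && $2 > 0 { print $5, $2 }'
}
```

Typical use would be: iostat -en | hw_error_disks. For the listing above it prints "c1t2d0 50".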



2. Run the metadetach command to detach the failed disk's submirrors (break the mirror), then clear them:

# metastat -p
# metadetach -f <mirror> <submirror>     (-f forces the detach)
# metaclear <submirror>
# metastat -p | grep -i <submirror>     (checks whether the submirrors have been cleared)

yogesh-test#metastat -p
d3 -m d13 d23 1
d13 1 1 c1t0d0s3
d23 1 1 c1t2d0s3
d1 -m d11 d21 1
d11 1 1 c1t0d0s1
d21 1 1 c1t2d0s1
d0 -m d10 d20 1
d10 1 1 c1t0d0s0
d20 1 1 c1t2d0s0

yogesh-test#metadetach d0 d20; metadetach d1 d21; metadetach d3 d23
d0: submirror d20 is detached
d1: submirror d21 is detached
d3: submirror d23 is detached

yogesh-test#metastat -p
d3 -m d13 1
d13 1 1 c1t0d0s3
d1 -m d11 d21 1
d11 1 1 c1t0d0s1
d0 -m d10 d20 1
d10 1 1 c1t0d0s0
d20 1 1 c1t2d0s0
d21 1 1 c1t2d0s1
d23 1 1 c1t2d0s3

yogesh-test#metastat -ac     <--- Solaris 10 only (it won't work in earlier versions of Solaris)
d3 m 517MB d13
d13 s 517MB c1t0d0s3
d1 m 1.0GB d11 d21
d11 s 1.0GB c1t0d0s1
d0 m 7.8GB d10 d20
d10 s 7.8GB c1t0d0s0
d20 s 7.8GB c1t2d0s0
d21 s 1.0GB c1t2d0s1
d23 s 517MB c1t2d0s3

yogesh-test#metaclear d20; metaclear d21; metaclear d23
d20: Concat/Stripe is cleared
d21: Concat/Stripe is cleared
d23: Concat/Stripe is cleared
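
When a disk carries many submirrors, the detach and clear commands can be derived from the metastat -p listing instead of being typed by hand. The sketch below only prints the commands so they can be reviewed first; plan_detach is a hypothetical helper name, and it assumes the simple layout used in this post (two-way mirrors built on one-slice concats).

```shell
# Hypothetical helper: read `metastat -p` output on stdin and print (not run)
# the metadetach/metaclear commands for every submirror on the given disk.
# On Solaris use nawk or /usr/xpg4/bin/awk; the legacy /usr/bin/awk lacks -v.
plan_detach() {
    disk=$1
    awk -v disk="$disk" '
        # Mirror line, e.g. "d0 -m d10 d20 1": remember each submirror parent.
        $2 == "-m" {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^d[0-9]+$/) parent[$i] = $1
            next
        }
        # Concat line, e.g. "d20 1 1 c1t2d0s0": keep those on the failed disk.
        index($NF, disk "s") == 1 { bad[++n] = $1 }
        END {
            for (j = 1; j <= n; j++) {
                md = bad[j]
                if (md in parent) print "metadetach -f " parent[md] " " md
                print "metaclear " md
            }
        }'
}
```

Typical use would be: metastat -p | plan_detach c1t2d0, followed by a manual check of the printed commands before running any of them.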

3. Delete the state database replicas on the failed disk:

# metadb -i
# metadb -d /dev/dsk/<failed-disk-slice>
# metadb -i | grep -i <failed-disk>

yogesh-test#metadb -i

flags first blk block count
a m p luo 16 8192 /dev/dsk/c1t0d0s7
a p luo 8208 8192 /dev/dsk/c1t0d0s7
a p luo 16400 8192 /dev/dsk/c1t0d0s7
a W p luo 16 8192 /dev/dsk/c1t2d0s7
a W p luo 8208 8192 /dev/dsk/c1t2d0s7
a W p luo 16400 8192 /dev/dsk/c1t2d0s7

yogesh-test#metadb -d /dev/dsk/c1t2d0s7

yogesh-test#metadb -i
flags first blk block count
a m p luo 16 8192 /dev/dsk/c1t0d0s7
a p luo 8208 8192 /dev/dsk/c1t0d0s7
a p luo 16400 8192 /dev/dsk/c1t0d0s7
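
Before deleting anything, it can help to confirm which slices hold replicas flagged with errors (the capital-letter flags such as W in the first metadb -i listing). A rough sketch, assuming the column layout shown above, where the last three fields are first block, block count and device; replicas_with_errors is a hypothetical helper name.

```shell
# Hypothetical helper: read `metadb -i` output on stdin and print each slice
# that has at least one replica carrying an error flag (W, M, D, F, S or R).
# Only lines ending in a /dev/ path are treated as replica entries, so the
# header and the flag legend are ignored.
replicas_with_errors() {
    awk '$NF ~ /^\/dev\// {
        flags = ""
        for (i = 1; i <= NF - 3; i++) flags = flags $i
        if (flags ~ /[WMDFSR]/) print $NF
    }' | sort -u
}
```

Typical use would be: metadb -i | replicas_with_errors. For the first listing in this step it prints /dev/dsk/c1t2d0s7.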


4. Remove the hard drive from the device tree using the commands for your disk type:

 

 

######################################################################

In case of SCSI / SAS Disks

SCSI/SAS disks will appear as below in format output

0. c0t0d0 <DEFAULT cyl 17832 alt 2 hd 255 sec 63>
/pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@0,0

Command sequence to replace the disks:

- cfgadm -al
- cfgadm -c unconfigure Ap_Id     (e.g. cfgadm -c unconfigure c0::dsk/c0t0d0)
- cfgadm -x remove_device c0::dsk/c0t0d0     (for data disks only)

In case of FCAL disks (Sun 280R, V880, V490, V890)

FCAL disks will appear as below in format output

0. c1t0d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107> 
/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf365024,0 

Command sequence to replace the disks:

- luxadm -e port
- luxadm probe     (to display paths)
- luxadm remove_device -F /dev/rdsk/c#t#d#s2
- devfsadm -v -Cc disk     (where -C cleans up stale /dev entries and -c disk restricts the scan to the disk class)
- luxadm insert_device     (optional)
######################################################################

The procedure below applies to SCSI/SAS disks:

# cfgadm -al
# cfgadm -c unconfigure c1::dsk/c1t2d0

yogesh-test#cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected configured unknown
c1::dsk/c1t2d0 disk connected configured unknown

yogesh-test#cfgadm -c unconfigure c1::dsk/c1t2d0
Ap_Id Type Receptacle Occupant Condition
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected configured unknown
c1::dsk/c1t2d0 disk connected unconfigured unknown

5. Verify that the device has been removed from the device tree with the following command:
# cfgadm -al
yogesh-test#cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected configured unknown
c1::dsk/c1t2d0 disk connected unconfigured unknown
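
The verification can also be scripted by reading the Occupant column for one attachment point. This is a sketch only: occupant_state is a hypothetical helper name, and it assumes the five-column cfgadm -al layout shown above (Ap_Id, Type, Receptacle, Occupant, Condition).

```shell
# Hypothetical helper: read `cfgadm -al` output on stdin and print the
# Occupant state ("configured" or "unconfigured") for the given Ap_Id.
# On Solaris use nawk or /usr/xpg4/bin/awk; the legacy /usr/bin/awk lacks -v.
occupant_state() {
    ap_id=$1
    awk -v id="$ap_id" '$1 == id { print $4 }'
}
```

Typical use would be: cfgadm -al | occupant_state c1::dsk/c1t2d0. The disk is safe to pull only once this reports unconfigured.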

6. Remove the failed disk from the server and insert the new disk.

7. Configure the new hard drive with the following command:
# cfgadm -c configure c1::dsk/c1t2d0
yogesh-test# cfgadm -c configure c1::dsk/c1t2d0
Ap_Id Type Receptacle Occupant Condition
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected configured unknown
c1::dsk/c1t2d0 disk connected configured unknown

8. Verify that the device has been added to the device tree with the following command:

# cfgadm -al

yogesh-test#cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected configured unknown
c1::dsk/c1t2d0 disk connected configured unknown

9. Check the disk status on the server:

# echo | format OR # iostat -en

If the disk is not visible, run devfsadm -C to rebuild the device links. To reconfigure all disks, run:

# devfsadm -C -c disk
# echo | format OR # iostat -en

yogesh-test#echo | format
Searching for disks…done

AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@0,0/pci1000,30@10/sd@0,0
2. c1t2d0
/pci@0,0/pci1000,30@10/sd@2,0

Specify disk (enter its number): Specify disk (enter its number):
yogesh-test#

10. Compare the VTOC (partition table) of the root disk and the replaced disk. A new disk will not normally match, so copy the layout across with fmthard.

# prtvtoc /dev/rdsk/<root-disk>s2
# prtvtoc /dev/rdsk/<new-disk>s2

Copy the VTOC to the replaced disk:

# prtvtoc /dev/rdsk/<root-disk>s2 | fmthard -s - /dev/rdsk/<new-disk>s2
yogesh-test#prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t2d0s2
fmthard: New volume table of contents now in place.

yogesh-test#prtvtoc /dev/rdsk/c1t0d0s2
* /dev/rdsk/c1t0d0s2 partition map
*
* Dimensions:
* 512 bytes/sector
* 63 sectors/track
* 255 tracks/cylinder
* 16065 sectors/cylinder
* 1304 cylinders
* 1302 accessible cylinders
*

* Flags:
* 1: unmountable
* 10: read-only
*
* Unallocated space:
* First Sector Last
* Sector Count Sector
* 0 16065 16064
* 19631430 1285200 20916629
*
* First Sector Last

* Partition Tag Flags Sector Count Sector Mount Directory
0 2 00 16065 16386300 16402364
1 3 01 16402365 2104515 18506879
2 5 00 0 20916630 20916629
3 8 00 18506880 1060290 19567169
7 0 00 19567170 48195 19615364
8 1 01 0 16065 16064

yogesh-test#prtvtoc /dev/rdsk/c1t2d0s2
* /dev/rdsk/c1t2d0s2 partition map
*
* Dimensions:
* 512 bytes/sector
* 63 sectors/track
* 255 tracks/cylinder
* 16065 sectors/cylinder
* 1304 cylinders
* 1302 accessible cylinders
*
* Flags:
* 1: unmountable
* 10: read-only
*
* Unallocated space:
* First Sector Last
* Sector Count Sector
* 0 16065 16064
* 19631430 1285200 20916629
*
* First Sector Last
* Partition Tag Flags Sector Count Sector Mount Directory
0 2 00 16065 16386300 16402364
1 3 01 16402365 2104515 18506879
2 5 00 0 20916630 20916629
3 8 00 18506880 1060290 19567169
7 0 00 19567170 48195 19615364
8 1 01 0 16065 16064

yogesh-test#
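
Comparing the two listings by eye is error-prone, so the check can be scripted. A minimal sketch, assuming each prtvtoc output has first been saved to a file and that a POSIX shell such as ksh or bash is used; same_vtoc is a hypothetical helper name. Comment lines (those starting with *) are ignored because they contain the device name and would always differ.

```shell
# Hypothetical helper: succeed (exit 0) when two saved prtvtoc listings
# describe the same partition table, ignoring the "*" comment lines.
same_vtoc() {
    a=$(grep -v '^\*' "$1")
    b=$(grep -v '^\*' "$2")
    [ "$a" = "$b" ]
}
```

Typical use would be to save prtvtoc /dev/rdsk/c1t0d0s2 and prtvtoc /dev/rdsk/c1t2d0s2 to two files under /var/tmp and then run same_vtoc on them, printing a message depending on the exit status.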


11. Create the state database replicas on the replaced disk.

# metadb -a -f -c 3 /dev/dsk/<new-disk>s7     (slice 7 has been reserved for the state database replicas)

yogesh-test#metadb -i

flags first blk block count
a m p luo 16 8192 /dev/dsk/c1t0d0s7
a p luo 8208 8192 /dev/dsk/c1t0d0s7
a p luo 16400 8192 /dev/dsk/c1t0d0s7
a p luo 16 8192 /dev/dsk/c1t2d0s7
a p luo 8208 8192 /dev/dsk/c1t2d0s7
a p luo 16400 8192 /dev/dsk/c1t2d0s7
r – replica does not have device relocation information
o – replica active prior to last mddb configuration change
u – replica is up to date
l – locator for this replica was read successfully
c – replica’s location was in /etc/lvm/mddb.cf
p – replica’s location was patched in kernel
m – replica is master, this is replica selected as input
W – replica has device write errors
a – replica is active, commits are occurring to this replica
M – replica had problem with master blocks
D – replica had problem with data blocks
F – replica had format problems
S – replica is too small to hold current data base
R – replica had device read errors

12. Recreate the submirrors, reattach them to the mirrors, and wait until all mirrors have resynced.

# metainit <submirror> 1 1 <new-disk-slice>
# metattach <mirror> <submirror>
# metastat -ac OR metastat -t     (to check the resync status)

yogesh-test#metainit d23 1 1 c1t2d0s3
d23: Concat/Stripe is setup
yogesh-test#metainit d20 1 1 c1t2d0s0
d20: Concat/Stripe is setup
yogesh-test#metainit d21 1 1 c1t2d0s1
d21: Concat/Stripe is setup

yogesh-test#metattach d3 d23; metattach d0 d20; metattach d1 d21
d3: submirror d23 is attached
d0: submirror d20 is attached
d1: submirror d21 is attached

yogesh-test#metastat -ac
d3 m 517MB d13 d23
d13 s 517MB c1t0d0s3
d23 s 517MB c1t2d0s3
d1 m 1.0GB d11 d21
d11 s 1.0GB c1t0d0s1
d21 s 1.0GB c1t2d0s1
d0 m 7.8GB d10 d20
d10 s 7.8GB c1t0d0s0
d20 s 7.8GB c1t2d0s0
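
Waiting for the resync to finish can be scripted rather than checked by hand. The sketch below assumes the long-format metastat output, which reports a line of the form "Resync in progress: NN % done" while a submirror is resyncing; resync_running is a hypothetical helper name.

```shell
# Hypothetical helper: succeed (exit 0) while the metastat output on stdin
# still reports a resync in progress.
# Output is sent to /dev/null because the legacy Solaris grep lacks -q.
resync_running() {
    grep 'Resync in progress' >/dev/null
}

# Possible wait loop, to be run on the Solaris host itself:
#   while metastat | resync_running; do sleep 60; done
```

The loop simply polls once a minute and exits when no mirror reports a resync any longer.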

13. It is now safe to run the metadevadm command to update the device ID (devID) of the new disk.

# metadevadm -u <new-disk>
metadevadm -u updates the device ID information that SVM holds for the disk.
yogesh-test#metadevadm -u /dev/dsk/c1t2d0

Yogesh Raheja

Yogesh works as a Consultant in Unix Engineering and has several years of experience in Solaris, Linux, AIX and Veritas administration. He is certified in SCSA9, SCSA10, SCNA10, VXVM, VCS and ITILv3, and is passionate about sharing his knowledge with others. Specialties: Unix/Solaris servers, Linux (RHEL), AIX, Veritas Volume Manager, ZFS, Live Upgrade, storage migrations, and cluster deployment, administration and upgrade (VCS and HACMP) for banking, telecom, IT infrastructure and hosting services.

37 Responses

  1. arunkrish says:

    Hi,

    Its so helpful for me as a solaris admin aspirant .It would be more helpfull if you post something on sun hardware.I cant find a common difference between sun server[between entry level,midrange and enterprise]

  2. Let us know, what else do you want from our Gurkul India. We are always ready to assist you.

  3. Yogesh Raheja says:

    Sure, we are trying to bring every thing which is useful for all Unix Champs so that they can proceed with basis to critical tasks with better understanding for OS.

  4. nasir Khan says:

    Could you please write steps for jumpstarting on Sparc machine

  5. Sure, Nasir, give us few days as in my Sparc M/C mother board got failed… I will work on VMbox and make a doc for you with all configurations..

  6. Also this week I am completely engaged with Linux Filesystem Preparation along with my office work..:-)

  7. Satish says:

    Hi Yogesh really its very help full for me i want some more information about disk replacemet

    1)How to replace a failed disk under  veritas level
    2)How to convert root disk (SVM) under VXVm
    3)jumpstat 
    4)raid 0+1 1+0 which one best why?
    5)Mirroring the root disk in veritas level

    Please help me for the step by step and i need booting info also 

    ksatishbsc@gmail.com

  8. Bonny says:

    Hi GurukulIndia,

    we dont have console for v890 servers.
    could you please write steps for creating ILOM for sun-fire-v890 server.

  9. Hi Bonny, you can get the same info on SUN Site. There are dedicated PDFs given by SUN for the same.

  10. Gurkulindia Gurkulindia says:

    @bonny – my understanding is, v890 uses RSC , not sure why you are looking for ILOM. if you mean to say ALOM below is the reference to setup ALOM  – 

    http://wp.me/p1EO9J-2b  

    -> for v890  RSC setup please refer  

    http://wp.me/p1EO9J-28.    

    you can search the blog with V890 for further hardware details.

  11. Shreepati Jha says:

    Hey…. You are doing great work.

  12. Shreepati jha says:

    Could you please tell me process for the Fibre internal disk replacement….This will be a great help.

  13. Prasad says:

    Its really very helpful… I read the other day about VCS also… Can you plz share something basci about SVM also so that SVM will aslo be much easier for us… Thanks in Advance.

  14. Yogesh Raheja says:

    @Prasad, you are most welcome. We will prepare a post on SVM basic and publish it soon.

  15. Santosh says:

    Hi Yogesh,
    can you explain the same scenario in VxVm please

  16. Yogesh Raheja says:

    @Santosh, simplest way is to use vxdiskadm then option 4 & 5. For Complete scenario I have to make a post will whole process.

  17. Ramesh says:

    Hi Yogesh, Great Doc.. Really helped a lot … You and ur team are rocking .. Keep going ..

  18. Yogesh Raheja says:

    @Ramesh, Thanks a lot for your words!!!..

  19. sonu says:

    Hi yogesh, wondering why is that one of the disk doesn’t show in “echo |format” output and also doesn’t show in iostat -En output but the it shows in metastat -p output.

  20. Yogesh Raheja says:

    @Sonu, disk is present and is visible in format , meta & iostat. Infact format & iostat will show you each and every disk in Solaris OS.

  21. sonu says:

    Hi yogi,sorry for confusing you..I was talking about my solaris box…when i do a echo |format i can see only 3 disk although there are 4 disk in that. but when i do a metastat -p i can see the missing disk (as a submirror ) and also the output of “cfgadm -al” is showing that disk along with the other disk and the status is configured and connect..but do not see the disk when i do an iostat -En output its only 3 disk that is getting displayed ..so echo|format,iostat only shows only 3 disk and cfgadm -al, metastat -p shows only 3 disk..

  22. sonu says:

    above correction – last line ** so echo|format,iostat only shows only 3 disk and cfgadm -al, metastat -p shows all 4 disk..

  23. Yogesh Raheja says:

    @Sonu, I would suggest unconfiguring the disk and configuring it again with cfgadm -c unconfigure and cfgadm -c configure (use the -f option if that is not allowed), then rescanning the devices with devfsadm -Cv. The OS may not allow the unconfigure/configure, but this is the only option we have to test. Also check whether the disk paths are available under /dev/dsk and /dev/rdsk. If this doesn't work, the devices need to be reconfigured, and the best option for that is a reconfiguration reboot (reboot -- -r). Before proceeding, check which disk the system booted from and run installboot on both the root and mirror disks. Try the unconfigure/configure first and let us know whether it worked.

  24. Yogesh Raheja says:

    @Sonu..also if possible provide us with the outputs of metastat -ac & metadb -i & cfgadm -al & echo | format.

  25. sonu says:

    Thanks yogi for your valuable feedback , i will try that ..i am able to see the disk in the /dev/dsk and /dev/rdsk..lemme try devfsadm -Cv and in worst will go for reconfigure reboot as suggested by you.

  26. krishna says:

    Thanks Yogesh for your quick response…mean while do u have any link for patching for Cluster and under Zones.

  27. Yogesh Raheja says:

    @Krishna, process will remain same as we stated earlier. In cluster you have to shutdown the cluster, disabled the start vcs scripts (gab,llt,vcs etc etc) and proceed with patching. And with zones you can proceed normally (just by shutting down the zones) OS will automatically boot them if any reboot required patch will apply. But if you want to shorten the time edit the .xml file of zones, remove all of the filesystem entries from .xml file, shutdown the zones and then proceed with patching as normal.

  28. krishna says:

    Thanks Yogesh for ur valuable info… :)

  29. Yogesh Raheja says:

    @Krishna, anytime. Hope this helps you. :) Cheers.

  30. krishna says:

    Yogesh…can u pls share the patching procedure for vxvm…

  31. Mushtaq says:

    Hi , 

    I am wondering is there a way or steps that can help to find that the faulty disk is Local / JBOD / SAN

    thanks. 

    • Ramdev Ramdev says:

      Musthaq, you can identify that information from the device path displaying in the format output. most of the time the SAN device paths constructed using the HBA device driver name. And local will show with normal SCSI( or sometimes FCAL) paths, and jbod will have it’s own device path.

  32. Jeeva says:

    hi,

    please send steps to configure autofs,and LVM
