Solaris SVM: Recovery procedures when BOTH sides of the mirror indicate a "Last Erred" state
As a result of unusual failures such as multiple power failures, both submirrors of a Solstice DiskSuite/Solaris Volume Manager metadevice mirror may be left in an unusual “Last Erred / Last Erred” state. In that state it is impossible to determine which submirror must be fixed or replaced first in order to protect the data stored on the mirror.
Instructions:
The following are two examples of DiskSuite/Solaris Volume Manager metadevice mirrors with both submirrors indicating that components are in the “Last Erred” state.
Normal DiskSuite/Solaris Volume Manager recovery procedures indicate that the submirror in “Maintenance” state must be fixed BEFORE the submirror in “Last Erred” state. In these examples, it is impossible to determine which submirror must be fixed first to protect data.
The first example below is the metastat output from a simple mirrored metadevice made up of two submirrors, each built on a single slice of a physical disk. The second example is the metastat output from a mirrored metadevice made up of two submirrors, each a stripe of three physical components.
Although the metadevices are slightly different, the attempted recovery procedure is exactly the same in both cases. Details of the procedure follow the examples below.
EXAMPLE 1 – metastat of a simple mirrored metadevice:
Note in this example, both submirrors are in a “Last Erred” state, making normal recovery procedures impossible.
d14: Mirror
    Submirror 0: d15
      State: Needs maintenance
    Submirror 1: d16
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 46137984 blocks (22 GB)

d15: Submirror of d14
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components:
                metareplace d14 c3t32d0s0
    Size: 46137984 blocks (22 GB)
    Stripe 0:
        Device      Start Block  Dbase  State       Reloc  Hot Spare
        c3t32d0s0   0            No     Last Erred  Yes

d16: Submirror of d14
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components:
                metareplace d14 c3t33d0s0
    Size: 46137984 blocks (22 GB)
    Stripe 0:
        Device      Start Block  Dbase  State       Reloc  Hot Spare
        c3t33d0s0   0            No     Last Erred  Yes
EXAMPLE 2 – metastat of a striped mirrored metadevice:
Note in this example that all three stripe components of one submirror and one stripe component of the second submirror are in “Last Erred” state, making normal recovery procedures impossible.
d6: Mirror
    Submirror 0: d41
      State: Needs maintenance
    Submirror 1: d42
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 49132024 blocks (23 GB)

d41: Submirror of d6
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components:
                metareplace d6 c3t50d0s1
    Size: 49132024 blocks (23 GB)
    Stripe 0:
        Device      Start Block  Dbase  State       Reloc  Hot Spare
        c3t50d0s1   0            No     Last Erred  Yes
    Stripe 1:
        Device      Start Block  Dbase  State       Reloc  Hot Spare
        c3t35d0s0   10176        No     Last Erred  Yes
    Stripe 2:
        Device      Start Block  Dbase  State       Reloc  Hot Spare
        c3t54d0s4   0            No     Last Erred  Yes

d42: Submirror of d6
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components:
                metareplace d6 c3t57d0s4
    Size: 49132024 blocks (23 GB)
    Stripe 0:
        Device      Start Block  Dbase  State       Reloc  Hot Spare
        c3t51d0s1   0            No     Okay        Yes
    Stripe 1:
        Device      Start Block  Dbase  State       Reloc  Hot Spare
        c3t36d0s0   10176        No     Okay        Yes
    Stripe 2:
        Device      Start Block  Dbase  State       Reloc  Hot Spare
        c3t57d0s4   0            No     Last Erred  Yes
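On a system with many metadevices it can help to summarize component states rather than read the full metastat output. The following is a minimal sketch, not part of the original procedure: it assumes the usual multi-line metastat layout, in which each component line begins with a cNtNdNsN device name, and the function name summarize_states is hypothetical. On Solaris, the stock /usr/bin/awk may be too old for this syntax, so nawk or /usr/xpg4/bin/awk is probably the safer choice.

```shell
#!/bin/sh
# Summarize component states per metadevice from metastat output on stdin.
# summarize_states is a hypothetical helper name, not an SVM command.
summarize_states() {
    awk '
        # A line such as "d41: Submirror of d6" starts a new metadevice.
        /^d[0-9]+:/ { dev = $1; sub(/:$/, "", dev); next }
        # Component lines start with a device name such as c3t50d0s1.
        /^[ \t]*c[0-9]+t[0-9]+d[0-9]+s[0-9]+[ \t]/ {
            if ($0 ~ /Last Erred/)       erred[dev]++
            else if ($0 ~ /Maintenance/) maint[dev]++
            else if ($0 ~ /Okay/)        okay[dev]++
            seen[dev] = 1
        }
        END {
            for (d in seen)
                printf "%s last_erred=%d maintenance=%d okay=%d\n", d, erred[d], maint[d], okay[d]
        }
    ' | sort
}
```

Typical use would be to pipe metastat into summarize_states. A mirror whose submirrors all report last_erred above zero and maintenance equal to zero is in the situation this article describes.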
STEPS FOR ATTEMPTED RECOVERY
LOCATE BACKUP TAPES and have them available in the event that the metadevices cannot be recovered. There is no guarantee that the procedures presented below will result in data recovery.
In this case, the *safest* means for attempting recovery is to unmount and clear each mirror leaving only the submirrors.
Fsck and mount each submirror to validate the data.
Determine which submirrors *if any* are valid. The goal is to locate 1 good submirror from each metadevice. Once the good submirror has been located, recreate the mirror using the good submirror. Attach the remaining submirror and mount the mirror to its original mount point.
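Because the same sequence applies to every affected mirror, the first steps can be sketched as a dry-run helper that only prints the commands for review and never executes them. This is an illustrative sketch: the function name print_check_plan and its arguments are hypothetical, and the mount step is left out because the mount point depends on the system.

```shell
#!/bin/sh
# Dry-run sketch: print, but do not execute, the commands that
# unmount and clear a mirror and then check both of its submirrors.
# print_check_plan MIRROR SUBMIRROR_A SUBMIRROR_B (hypothetical helper)
print_check_plan() {
    mirror=$1; sub_a=$2; sub_b=$3
    echo "umount /dev/md/dsk/$mirror"
    echo "metaclear -f $mirror"
    echo "fsck /dev/md/rdsk/$sub_a"
    echo "fsck /dev/md/rdsk/$sub_b"
}
```

For the first example, print_check_plan d14 d15 d16 prints the four commands for mirror d14, which can be reviewed before running them by hand.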
Below is an example of the procedure used to recover metadevice d6 (the procedure is identical for metadevice d14)
# umount /dev/md/dsk/d6
# metaclear -f d6
# fsck /dev/md/rdsk/d41
# fsck /dev/md/rdsk/d42
# mount /dev/md/dsk/d41 /
** verify data at this point **
If data is determined to be valid, recreate the mirror metadevice using this submirror
# umount /dev/md/dsk/d41
# metainit d6 -m d41
Note: after recreating the mirror, do not attach the second submirror yet. Attaching it starts a resync that overwrites the data on the second submirror, so that copy is lost before the data on d6 has been validated.
Mount the metadevice:
# mount /dev/md/dsk/d6 /
If data is determined to be INVALID, unmount the first submirror, run metaclear -f d6 if the mirror has already been recreated, then mount the second submirror and attempt to validate its data:
# umount /dev/md/dsk/d41
# mount /dev/md/dsk/d42
** verify data at this point **
If data is determined to be valid, recreate the mirror metadevice using this submirror:
# umount /dev/md/dsk/d42
# metainit d6 -m d42
Mount the metadevice:
# mount /dev/md/dsk/d6 /
Validate the data. If all data is correct, attach the remaining submirror.
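The final attach step is not shown as a command above. The sketch below assumes the standard SVM syntax, metattach followed by the mirror and then the submirror, and prints the attach command for whichever submirror was NOT used to recreate the mirror. The function name print_attach_cmd and its arguments are hypothetical, and it only prints the command.

```shell
#!/bin/sh
# Print the metattach command for the submirror that was not used
# to recreate the mirror. Nothing is executed.
# print_attach_cmd MIRROR GOOD_SUBMIRROR SUBMIRROR_A SUBMIRROR_B (hypothetical helper)
print_attach_cmd() {
    mirror=$1; good=$2; sub_a=$3; sub_b=$4
    if [ "$good" = "$sub_a" ]; then
        other=$sub_b
    elif [ "$good" = "$sub_b" ]; then
        other=$sub_a
    else
        echo "error: $good is not one of $sub_a, $sub_b" >&2
        return 1
    fi
    echo "metattach $mirror $other"
}
```

If d41 held the valid data, print_attach_cmd d6 d41 d41 d42 prints the command that attaches d42 to d6. Once the attach has been run, the resync progress should be visible in the output of metastat d6. Run the attach only after the data on the mirror has been validated, because the resync overwrites the attached submirror.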
Very Good Doc
Looks pretty good, but you should take care with the formatting of the command output.
Let’s say, format it like code.
Nice, and makes things very clear. Very good doc.
I think it needs a small correction.
# metainit d6 -m d41
After creating the mirror we should not attach the second submirror. If we attach it, the data on the second submirror will be overwritten and the current data will be lost.
Mount the metadevice:
# mount /dev/md/dsk/d6 /
If data is determined to be INVALID, unmount first submirror, metaclear -f d6, mount second submirror and attempt to validate this data:
# umount /dev/md/dsk/d41
# mount /dev/md/dsk/d42
** verify data at this point **
If data is determined to be valid, recreate the mirror metadevice using this submirror:
# umount /dev/md/dsk/d42
# metainit d6 -m d42
Mount the metadevice:
# mount /dev/md/dsk/d6 /
Validate the data. If all data is correct, then attach the second submirror.
Thanks Subhash, I agree with you. I have updated the procedure.