Solaris SVM: Recovery procedures when BOTH sides of the mirror indicate a "Last Erred" state

After unusual failures such as multiple power failures, both submirrors of a Solstice DiskSuite/Solaris Volume Manager metadevice mirror may be left in a “Last Erred / Last Erred” state, making it impossible to determine which submirror must be fixed or replaced first in order to protect the data stored on the mirror.

Instructions:

The following are two examples of DiskSuite/Solaris Volume Manager metadevice mirrors with both submirrors indicating that components are in the “Last Erred” state.

Normal DiskSuite/Solaris Volume Manager recovery procedures indicate that the submirror in “Maintenance” state must be fixed BEFORE the submirror in “Last Erred” state. In these examples, it is impossible to determine which submirror must be fixed first to protect data.

The first example below is the metastat output from a simple mirrored metadevice made up of two submirrors, each built on a single slice of a physical disk. The second example is the metastat output from a mirrored metadevice made up of two submirrors, each a stripe of three physical components.

Although the metadevices are slightly different, the attempted recovery procedure is exactly the same in both cases. Details of the procedure follow the examples below.

EXAMPLE 1 – metastat of a simple mirrored metadevice:

Note that in this example both submirrors are in a “Last Erred” state, making normal recovery procedures impossible.

d14: Mirror
    Submirror 0: d15
      State: Needs maintenance
    Submirror 1: d16
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 46137984 blocks (22 GB)

d15: Submirror of d14
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components:
                metareplace d14 c3t32d0s0
    Size: 46137984 blocks (22 GB)
    Stripe 0:
        Device      Start Block  Dbase  State       Reloc  Hot Spare
        c3t32d0s0          0     No     Last Erred  Yes

d16: Submirror of d14
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components:
                metareplace d14 c3t33d0s0
    Size: 46137984 blocks (22 GB)
    Stripe 0:
        Device      Start Block  Dbase  State       Reloc  Hot Spare
        c3t33d0s0          0     No     Last Erred  Yes

EXAMPLE 2 – metastat of a striped mirrored metadevice:

Note in this example that the 3 stripes from one submirror and 1 stripe from the second submirror are in “Last Erred” state making normal recovery procedures impossible.

d6: Mirror
    Submirror 0: d41
      State: Needs maintenance
    Submirror 1: d42
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 49132024 blocks (23 GB)

d41: Submirror of d6
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components:
                metareplace d6 c3t50d0s1
    Size: 49132024 blocks (23 GB)
    Stripe 0:
        Device      Start Block  Dbase  State       Reloc  Hot Spare
        c3t50d0s1          0     No     Last Erred  Yes
    Stripe 1:
        Device      Start Block  Dbase  State       Reloc  Hot Spare
        c3t35d0s0      10176     No     Last Erred  Yes
    Stripe 2:
        Device      Start Block  Dbase  State       Reloc  Hot Spare
        c3t54d0s4          0     No     Last Erred  Yes

d42: Submirror of d6
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components:
                metareplace d6 c3t57d0s4
    Size: 49132024 blocks (23 GB)
    Stripe 0:
        Device      Start Block  Dbase  State       Reloc  Hot Spare
        c3t51d0s1          0     No     Okay        Yes
    Stripe 1:
        Device      Start Block  Dbase  State       Reloc  Hot Spare
        c3t36d0s0      10176     No     Okay        Yes
    Stripe 2:
        Device      Start Block  Dbase  State       Reloc  Hot Spare
        c3t57d0s4          0     No     Last Erred  Yes
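As a rough aid for spotting this condition, the sketch below counts, per submirror, how many components metastat reports as "Last Erred". It assumes the usual multi-line metastat layout (a "dN: Submirror of dM" header followed by component rows that begin with a cXtYdZsN slice name). The function name summarize_last_erred is made up for this illustration and is not an SVM command.

```shell
#!/bin/sh
# Summarize "Last Erred" components per submirror from metastat output on stdin.
# Assumption: input uses the standard multi-line metastat layout.
summarize_last_erred() {
    awk '
        # A header such as "d41: Submirror of d6" starts a submirror section.
        /^d[0-9]+: Submirror of / {
            sub(/:$/, "", $1)
            current = $1
            order[++n] = current
            next
        }
        # Any other "dN: ..." header (for example the mirror itself) ends it.
        /^d[0-9]+: / { current = ""; next }
        # Component rows begin with a slice name such as c3t50d0s1.
        current != "" && $1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ {
            total[current]++
            if ($0 ~ /Last Erred/) erred[current]++
        }
        END {
            for (i = 1; i <= n; i++) {
                name = order[i]
                printf "%s: %d of %d components Last Erred\n", name, erred[name], total[name]
            }
        }
    '
}
```

On a Solaris host this would be used as `metastat d6 | summarize_last_erred`. If more than one submirror of the same mirror shows a non-zero count, the normal "fix Maintenance first" rule no longer tells you where to start.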

STEPS FOR ATTEMPTED RECOVERY

LOCATE BACKUP TAPES and have them available in the event that the metadevices cannot be recovered. There is no guarantee that the procedures presented below will result in data recovery.

In this case, the *safest* means for attempting recovery is to unmount and clear each mirror leaving only the submirrors.

Fsck and mount each submirror to validate the data.

Determine which submirrors *if any* are valid. The goal is to locate 1 good submirror from each metadevice. Once the good submirror has been located, recreate the mirror using the good submirror. Attach the remaining submirror and mount the mirror to its original mount point.

Below is an example of the procedure used to recover metadevice d6 (the procedure is identical for metadevice d14):

# umount /dev/md/dsk/d6
# metaclear -f d6
# fsck /dev/md/rdsk/d41
# fsck /dev/md/rdsk/d42
# mount /dev/md/dsk/d41 /

** verify data at this point **

If the data is determined to be valid, unmount the submirror and recreate the mirror metadevice using it:

# umount /dev/md/dsk/d41
# metainit d6 -m d41

Note: after creating the mirror, do not attach the second submirror yet. Attaching it starts a resync that overwrites the second submirror, so whatever data it currently holds will be lost.

Mount the metadevice:

# mount /dev/md/dsk/d6 /

If the data is determined to be INVALID, unmount the first submirror, run metaclear -f d6 (needed only if d6 has already been recreated), mount the second submirror, and attempt to validate its data:

# umount /dev/md/dsk/d41
# mount /dev/md/dsk/d42

** verify data at this point **

If the data is determined to be valid, recreate the mirror metadevice using this submirror:

# umount /dev/md/dsk/d42
# metainit d6 -m d42

Mount the metadevice:

# mount /dev/md/dsk/d6 /

Validate the data. If all data is correct, attach the second submirror.
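The rebuild sequence can be written out as a dry run before anything is touched. The sketch below only prints commands for review and executes nothing. The function name print_rebuild_plan and the mount point /mnt/recover are hypothetical, and the final step assumes metattach, the standard SVM command for attaching a submirror to a mirror.

```shell
#!/bin/sh
# Dry run: print the rebuild commands for review; nothing is executed.
print_rebuild_plan() {
    mirror=$1   # cleared mirror to recreate, e.g. d6
    good=$2     # submirror whose data was verified, e.g. d42
    other=$3    # remaining submirror, to be resynced last, e.g. d41
    mnt=$4      # mount point (assumed here; supply the real one)

    echo "umount /dev/md/dsk/$good"
    echo "metainit $mirror -m $good"
    echo "mount /dev/md/dsk/$mirror $mnt"
    echo "# only after the data on $mirror has been validated:"
    echo "metattach $mirror $other"
}

# Example: d42 turned out to hold the valid data.
print_rebuild_plan d6 d42 d41 /mnt/recover
```

Keeping the attach as the last printed line mirrors the warning above: once the second submirror is attached, the resync overwrites it, so that step should wait until the recreated mirror has been checked.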

Ramdev

I have started unixadminschool.com ( aka gurkulindia.com) in 2009 as my own personal reference blog, and later sometime i have realized that my leanings might be helpful for other unixadmins if I manage my knowledge-base in more user friendly format. And the result is today's' unixadminschool.com. You can connect me at - https://www.linkedin.com/in/unixadminschool/
