Oracle Enterprise Linux – Configure RAID

RAID (Redundant Array of Inexpensive Disks) defines the use of multiple hard disks by a system to provide increased disk space, performance and availability. This article focuses solely on implementing RAID1, commonly referred to as disk mirroring, whereby two (or more) disks contain identical content. System availability and data integrity are maintained as long as at least one disk survives a failure.


Although the working examples use Oracle Enterprise Linux 5 (OEL5), the article applies equally to other Linux distributions and versions.

Before proceeding, take a complete backup of the system.

1. Original System Configuration

Prior to implementing RAID, the system comprised the following simple configuration:

# uname -a
Linux oel5raid1 2.6.18-92.el5 #1 SMP Fri May 23 22:17:30 EDT 2008 i686 i686 i386 GNU/Linux 

# cat /etc/enterprise-release
Enterprise Linux Enterprise Linux Server release 5.2 (Carthage)

 

# fdisk -l 

Disk /dev/sda: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         131     1052226   83  Linux
/dev/sda2             132         768     5116702+  83  Linux
/dev/sda3             769         899     1052257+  82  Linux swap / Solaris

 

# blkid
/dev/sda1: LABEL="/boot" UUID="4774c28e-01f8-4130-988d-9f9e3675a988" SEC_TYPE="ext2" TYPE="ext3"
/dev/sda2: LABEL="/1" UUID="3382d839-ebac-4db2-a19e-4c9e8dc56e0d" SEC_TYPE="ext2"  TYPE="ext3"
/dev/sda3: LABEL="SWAP-sda3" TYPE="swap"

 

# cat /etc/fstab
LABEL=/1                /                       ext3     defaults       1 1
LABEL=/boot             /boot                   ext3     defaults       1 2
tmpfs                   /dev/shm                tmpfs    defaults       0 0
devpts                  /dev/pts                devpts   gid=5,mode=620 0 0
sysfs                   /sys                    sysfs    defaults       0 0
proc                    /proc                   proc     defaults       0 0
LABEL=SWAP-sda3         swap                    swap     defaults       0 0

 

# mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw) 

# swapon -s
Filename                                Type            Size    Used    Priority
/dev/sda3                               partition       1052248 0       -1

2. Add Second Hard Disk

A second hard disk is added to the system. Ideally, the second disk should be exactly the same (make and model) as the first. To help avoid a single point of failure, attach the additional disk to a separate disk controller from the one used by the first disk.

# fdisk -l /dev/sdb
Disk /dev/sdb: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes 

Device Boot      Start         End      Blocks   Id  System

 

3. Partition Second Hard Disk

The second disk must contain the same configuration (partition layout) as the first disk. Disk partitioning can be performed manually using the fdisk(8) utility; however, the sfdisk(8) utility can be used to quickly and easily replicate the partition table from the first disk e.g.:

# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ...
OK 

Disk /dev/sdb: 1044 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
/dev/sdb: unrecognized partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

Device Boot    Start       End   #sectors  Id  System
/dev/sdb1   *        63   2104514    2104452  83  Linux
/dev/sdb2       2104515  12337919   10233405  83  Linux
/dev/sdb3      12337920  14442434    2104515  82  Linux swap / Solaris
/dev/sdb4             0         -          0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
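
A quick sanity check, before making further changes, is to dump both partition tables with sfdisk and compare them (the /tmp file names below are arbitrary); apart from the device names, the two layouts should be identical e.g.:

# sfdisk -d /dev/sda > /tmp/sda.layout
# sfdisk -d /dev/sdb > /tmp/sdb.layout
# diff /tmp/sda.layout /tmp/sdb.layout     # only /dev/sda vs /dev/sdb naming differences expected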

4. Modify Secondary Disk Partitions to Type RAID

Use the fdisk(8) utility to modify the second disk partitions from type 83/82 (linux/swap) to fd (raid) e.g.:

# fdisk /dev/sdb 

The number of cylinders for this disk is set to 1044.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sdb: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1         131     1052226   83  Linux
/dev/sdb2             132         768     5116702+  83  Linux
/dev/sdb3             769         899     1052257+  82  Linux swap / Solaris

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): fd
Changed system type of partition 3 to fd (Linux raid autodetect)

Command (m for help): p

Disk /dev/sdb: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1         131     1052226   fd  Linux raid autodetect
/dev/sdb2             132         768     5116702+  fd  Linux raid autodetect
/dev/sdb3             769         899     1052257+  fd  Linux raid autodetect

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
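
If your version of sfdisk(8) supports the --change-id option, the same type change can be scripted rather than performed interactively; a minimal sketch, assuming the three partitions created earlier:

# sfdisk --change-id /dev/sdb 1 fd        # /dev/sdb1 -> Linux raid autodetect
# sfdisk --change-id /dev/sdb 2 fd
# sfdisk --change-id /dev/sdb 3 fd
# sfdisk -d /dev/sdb | grep Id=fd         # confirm all three partitions now carry type fd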

Use the partprobe(8) or sfdisk(8) utility to update the kernel with the partition type changes e.g.:

# partprobe /dev/sdb

Verify creation of the new partitions on the second disk e.g.:

# cat /proc/partitions
major minor  #blocks  name 

8     0    8388608 sda
8     1    1052226 sda1
8     2    5116702 sda2
8     3    1052257 sda3
8    16    8388608 sdb
8    17    1052226 sdb1
8    18    5116702 sdb2
8    19    1052257 sdb3

5. Create RAID1 Arrays on Second Disk

Use the mdadm(8) utility to create raid1 arrays on the second disk partitions only e.g.:

# cat /proc/mdstat
Personalities :
unused devices: <none> 

# mdadm --create /dev/md1 --auto=yes --level=raid1 --raid-devices=2 /dev/sdb1 missing
mdadm: array /dev/md1 started.
# mdadm --create /dev/md2 --auto=yes --level=raid1 --raid-devices=2 /dev/sdb2 missing
mdadm: array /dev/md2 started.
# mdadm --create /dev/md3 --auto=yes --level=raid1 --raid-devices=2 /dev/sdb3 missing
mdadm: array /dev/md3 started.

# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb3[0]
1052160 blocks [2/1] [U_]

md2 : active raid1 sdb2[0]
5116608 blocks [2/1] [U_]

md1 : active raid1 sdb1[0]
1052160 blocks [2/1] [U_]

unused devices: <none>

 


In the example above, raid devices are created using the same numbering as the device partitions they include, e.g. /dev/md1 contains /dev/sdb1, and device /dev/sda1 will be added later. The term missing acts as a stub or placeholder that will eventually be replaced with the corresponding partitions on the first disk: /dev/sda1, /dev/sda2 and /dev/sda3.
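
To confirm which array a given partition belongs to, mdadm can also read the raid superblock directly from a component device; for example:

# mdadm --examine /dev/sdb1               # superblock details for a single component
# mdadm --examine --scan                  # summary of all raid components found on the system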

Check /proc/partitions to verify that the new raid devices are registered with the kernel e.g.:

# cat /proc/partitions
major minor  #blocks  name 

8     0    8388608 sda
8     1    1052226 sda1
8     2    5116702 sda2
8     3    1052257 sda3
8    16    8388608 sdb
8    17    1052226 sdb1
8    18    5116702 sdb2
8    19    1052257 sdb3
9     1    1052160 md1
9     2    5116608 md2
9     3    1052160 md3

6. Taking a Closer Look at RAID Devices

Use the mdadm(8) utility to review raid devices in detail e.g.:

# mdadm --query --detail /dev/md1
/dev/md1:
Version : 00.90.03
Creation Time : Tue Dec 30 21:46:44 2008
Raid Level : raid1
Array Size : 1052160 (1027.67 MiB 1077.41 MB)
Used Dev Size : 1052160 (1027.67 MiB 1077.41 MB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 1
Persistence : Superblock is persistent 

Update Time : Tue Dec 30 21:46:44 2008
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0

UUID : a4d5007d:6974901a:637e5622:e5b514c9
Events : 0.1

Number   Major   Minor   RaidDevice State
0       8       17        0      active sync   /dev/sdb1
1       0        0        1      removed

Note that raid device /dev/md1 solely contains one disk member, /dev/sdb1, at this point. The state of the array is clean, degraded, denoting that only one of the two underlying disk members is currently active and working.

7. RAID Configuration

Strictly speaking, a RAID configuration file is not required. With the relevant partitions marked as type raid (fd), the kernel will auto-assemble detected arrays on boot. If desired, create the RAID configuration file /etc/mdadm.conf or /etc/mdadm/mdadm.conf as a reference to raid device usage e.g.:

# mkdir  /etc/mdadm/
# echo "DEVICE /dev/hd*[0-9] /dev/sd*[0-9]" >> /etc/mdadm/mdadm.conf
# mdadm --detail --scan >> /etc/mdadm/mdadm.conf
# ln -s /etc/mdadm/mdadm.conf /etc/mdadm.conf 

# cat /etc/mdadm/mdadm.conf
DEVICE /dev/hd*[0-9] /dev/sd*[0-9]
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=a4d5007d:6974901a:637e5622:e5b514c9
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=0e8ce9c6:bd42917d:fd3412bf:01f49095
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=7d696890:890b2eb7:c17bf4e4:d542ba99

The DEVICE line limits the set of devices that mdadm will scan when assembling arrays or adding RAID disk members.
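
A simple way to confirm that the saved configuration still matches the running arrays (for example after later changes) is to diff the live scan against the ARRAY lines in the file; a sketch, assuming the file paths used above:

# mdadm --detail --scan > /tmp/md.live
# grep '^ARRAY' /etc/mdadm/mdadm.conf > /tmp/md.conf
# diff /tmp/md.live /tmp/md.conf          # no output means the file matches the running arrays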

8. Create Filesystems/Swap Devices on RAID devices

Once created, RAID devices are usable just like any other block device. Use the mkfs.ext3(8) or mke2fs(8) and mkswap(8) commands to create ext3 filesystems and a swap device on the RAID devices e.g.:

# mkfs.ext3 -L boot.md1 /dev/md1
mke2fs 1.39 (29-May-2006)
Filesystem label=boot.md1
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
131616 inodes, 263040 blocks
13152 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=272629760
9 block groups
32768 blocks per group, 32768 fragments per group
14624 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376 

Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 35 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

# mkfs.ext3 -L root.md2 /dev/md2
mke2fs 1.39 (29-May-2006)
Filesystem label=root.md2
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
640000 inodes, 1279152 blocks
63957 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=1312817152
40 block groups
32768 blocks per group, 32768 fragments per group
16000 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 35 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

# mkswap -L swap.md3 /dev/md3
Setting up swapspace version 1, size = 1077407 kB
LABEL=swap.md3, no uuid

# blkid
/dev/sda1: LABEL="/boot" UUID="4774c28e-01f8-4130-988d-9f9e3675a988" SEC_TYPE="ext2" TYPE="ext3"
/dev/sda2: LABEL="/1" UUID="3382d839-ebac-4db2-a19e-4c9e8dc56e0d" SEC_TYPE="ext2" TYPE="ext3"
/dev/sda3: LABEL="SWAP-sda3" TYPE="swap"
/dev/sdb1: LABEL="boot.md1" UUID="3d6916b0-0997-47e0-8409-ec4361480cab" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdb2: LABEL="root.md2" UUID="b2124e73-b35d-411c-9988-62fd47891c2d" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdb3: TYPE="swap" LABEL="swap.md3"
/dev/md1: LABEL="boot.md1" UUID="3d6916b0-0997-47e0-8409-ec4361480cab" SEC_TYPE="ext2" TYPE="ext3"
/dev/md2: LABEL="root.md2" UUID="b2124e73-b35d-411c-9988-62fd47891c2d" SEC_TYPE="ext2" TYPE="ext3"
/dev/md3: TYPE="swap" LABEL="swap.md3"

To avoid confusion later, the labels added to the filesystems and swap device denote the RAID device on which each is created.
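
Should a label ever need to be checked or changed later, e2label(8) reads or sets the label of an existing ext3 filesystem without reformatting; for example:

# e2label /dev/md1                        # prints boot.md1
# e2label /dev/md2                        # prints root.md2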

9. Backup Current System Configuration

Beyond this point, significant changes are made to the system, so take a backup of the core system configuration e.g.:

# cp /etc/fstab /etc/fstab.orig
# cp /boot/grub/grub.conf /boot/grub/grub.conf.orig
# mkdir /boot.orig
# sync
# cp -dpRxu /boot/* /boot.orig/

10. Mount Filesystems on RAID Devices

Mount the raided filesystems e.g.:

# mkdir /boot.md1
# mount -t ext3 /dev/md1 /boot.md1
# mount | grep boot
/dev/sda1 on /boot type ext3 (rw)
/dev/md1 on /boot.md1 type ext3 (rw) 

# mkdir /root.md2
# mount -t ext3 /dev/md2 /root.md2

11. Optionally Mount/Swapon the RAID Devices in Place of Their Non-RAID Counterparts

Optionally, test mounting the filesystems and enabling swap on the raided devices in place of their currently mounted non-raided counterparts e.g.:

# umount /boot
# umount /boot.md1
# mount -t ext3 /dev/md1 /boot
# mount | grep boot
/dev/md1 on /boot type ext3 (rw) 

# swapon -s
Filename                                Type            Size    Used    Priority
/dev/sda3                               partition       1052248 112     -1

# swapoff /dev/sda3
# swapon /dev/md3

# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md3                                partition       1052152 0       -2

Note: it is not possible to unmount/remount the root filesystem (/dev/sda2) as it’s currently in use.

12. Modify fstab to Use RAID Devices

Modify the /etc/fstab file to mount/swapon the raided devices at system boot.
Substitute the relevant LABEL= or /dev/sdaN entries with their corresponding /dev/mdN devices e.g.:

# cat /etc/fstab
/dev/md2                /                       ext3    defaults        1 1
/dev/md1                /boot                   ext3    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
/dev/md3                swap                    swap    defaults        0 0
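
The edit can be made by hand in an editor, or scripted to produce the same result; a minimal sed sketch (exact column alignment aside), assuming the label names used on this system and the /etc/fstab.orig backup taken in step 9:

# sed -i -e 's|^LABEL=/1 |/dev/md2 |' \
         -e 's|^LABEL=/boot |/dev/md1 |' \
         -e 's|^LABEL=SWAP-sda3 |/dev/md3 |' /etc/fstab
# diff /etc/fstab.orig /etc/fstab         # review the three substituted lines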

13. Add Fallback Title to grub.conf

A fallback title allows the system to boot using one title and fall back to another should any issues occur when booting with the first. This is particularly helpful because, without a fallback title, the system may fail to boot and a linux rescue may be needed to restore/recover the system.

Original /boot/grub/grub.conf:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda2
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Enterprise Linux (2.6.18-92.el5)
root (hd0,0)
kernel /vmlinuz-2.6.18-92.el5 ro root=LABEL=/1 rhgb quiet
initrd /initrd-2.6.18-92.el5.img

Modify the original /boot/grub/grub.conf file by adding the fallback parameter and a fallback grub boot title e.g.:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda2
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
fallback=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Enterprise Linux (2.6.18-92.el5)
root (hd1,0)
kernel /vmlinuz-2.6.18-92.el5 ro root=/dev/md2
initrd /initrd-2.6.18-92.el5.img

title Enterprise Linux (2.6.18-92.el5)
root (hd0,0)
kernel /vmlinuz-2.6.18-92.el5 ro root=LABEL=/1 rhgb quiet
initrd /initrd-2.6.18-92.el5.img

In the example above, the system is configured to boot using the first boot title (default=0), i.e. the one with /boot on the first partition of the second grub disk device (hd1,0) and the root filesystem on raid device /dev/md2. Should that fail to boot, the system will fall back (fallback=1) to the second boot title, i.e. the one with the /boot filesystem on the first partition of the first grub device (hd0,0) and the root filesystem identified by label /1. Note that grub boot title numbering starts from zero (0).

14. Remake Initial RAM Disk (One of Two)

Use the mkinitrd(8) utility to recreate the initial ram disk. The initial ram disk must be rebuilt with raid module support to ensure the system has the required drivers to boot from raided devices e.g.:

# cd /boot
# mv initrd-`uname -r`.img initrd-`uname -r`.img.orig 

# mkinitrd -v -f initrd-2.6.18-92.el5.img `uname -r`
Creating initramfs
Looking for deps of module ehci-hcd
Looking for deps of module ohci-hcd
Looking for deps of module uhci-hcd
Looking for deps of module ext3: jbd
Looking for deps of module jbd
Found RAID component md2
Looking for deps of module raid1
Looking for driver for device sdb
...
Adding module ehci-hcd
Adding module ohci-hcd
Adding module uhci-hcd
Adding module jbd
Adding module ext3
Adding module raid1
Adding module scsi_mod
Adding module sd_mod
Adding module scsi_transport_spi
Adding module mptbase
Adding module mptscsih
Adding module mptspi
Adding module libata
Adding module ata_piix
...

# ls -l /boot/initrd*
-rw------- 1 root root 2467721 Dec 30 23:09 /boot/initrd-2.6.18-92.el5.img
-rw------- 1 root root 2467715 Dec 29 17:52 /boot/initrd-2.6.18-92.el5.img.orig

 

Note: another mkinitrd run will be required later, after the /dev/sdaN partitions are added to the arrays.
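
To confirm the rebuilt image actually contains the raid1 driver, the initrd (a gzip-compressed cpio archive on OEL5) can be listed without rebooting; for example:

# zcat /boot/initrd-`uname -r`.img | cpio -it | grep raid1    # expect a raid1.ko entry in the listing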

15. Copy Contents of Non-RAID filesystems to RAID filesystems

If the raided filesystems were unmounted earlier, remount them as described in Step 10.
Copy the contents of the non-raided filesystems (/boot on /dev/sda1, / on /dev/sda2) to their corresponding filesystems on the raided devices (/boot.md1 on /dev/md1, /root.md2 on /dev/md2) e.g.:

# sync
# cp -dpRxu /boot/* /boot.md1 

OR

# cd /boot
# sync
# find . -xdev -print0 | cpio -0pdvum --sparse /boot.md1
...

 

# sync
# cp -dpRxu / /root.md2 

OR

# cd /
# sync
# find . -xdev -print0 | cpio -0pdvum --sparse /root.md2
...

Note: there is no need to copy the contents of the swap device. The non-raided swap device (/dev/sda3) is swapped off at system shutdown and the raided swap device (/dev/md3) swapped on at reboot.
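
As a rough cross-check that the copies completed, compare file counts on each source and destination filesystem; the numbers should match, allowing for anything written to / while the copy ran e.g.:

# find /boot -xdev | wc -l ; find /boot.md1 -xdev | wc -l
# find / -xdev | wc -l ; find /root.md2 -xdev | wc -l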

16. Install/Reinstall GRUB

To cater for the situation where one or the other raid disk member is unavailable, unusable or missing, GRUB [Grand Unified Boot Loader] must be installed to the boot sector (MBR) of every raid disk member participating in an array, i.e. /dev/sda and /dev/sdb. Use the grub(8) utility to install grub on the second grub disk (hd1) [/dev/sdb], currently the sole raid disk member e.g.:

# grub 

GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

[ Minimal BASH-like line editing is supported.  For the first word, TAB
lists possible command completions.  Anywhere else TAB lists the possible
completions of a device/filename.]

grub> root (hd1,0)
Filesystem type is ext2fs, partition type 0xfd

grub> setup (hd1)
Checking if "/boot/grub/stage1" exists... no
Checking if "/grub/stage1" exists... yes
Checking if "/grub/stage2" exists... yes
Checking if "/grub/e2fs_stage1_5" exists... yes
Running "embed /grub/e2fs_stage1_5 (hd1)"...  15 sectors are embedded.
succeeded
Running "install /grub/stage1 (hd1) (hd1)1+15 p (hd1,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.

Note: the ‘Checking if “/boot/grub/stage1” exists… no’ message above is safely ignorable – in this instance, /boot resides on a separate partition (/dev/sdb1 [/dev/md1]), the actual root of which is /grub/ where stage1, stage1.5 and stage2 all exist as expected.
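
The same installation can also be scripted rather than typed interactively, which is convenient when repeating it for both disks; a sketch using grub's batch mode:

# grub --batch <<EOF
root (hd1,0)
setup (hd1)
quit
EOF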

Optionally, reinstall grub on the first disk (hd0) [/dev/sda], the soon-to-be second raid member disk e.g.:

grub> root (hd0,0)
Filesystem type is ext2fs, partition type 0x83 

grub> setup (hd0)
Checking if "/boot/grub/stage1" exists... no
Checking if "/grub/stage1" exists... yes
Checking if "/grub/stage2" exists... yes
Checking if "/grub/e2fs_stage1_5" exists... yes
Running "embed /grub/e2fs_stage1_5 (hd0)"...  15 sectors are embedded.
succeeded
Running "install /grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.

The reference to (hd0,0) in /boot/grub/grub.conf is a grub disk reference that refers to the first disk, first partition, which in this instance is /dev/sda1, the partition that houses the non-raided /boot filesystem. Grub always references disks as (hdN) regardless of whether they are IDE or SCSI. At installation time, grub builds and stores a map of disk devices in the file /boot/grub/device.map. As the system was initially installed with only one disk present (/dev/sda), the contents of /boot/grub/device.map appear as follows:

# cat /boot/grub/device.map
# this device map was generated by anaconda
(hd0)     /dev/sda

Had the /boot filesystem been installed on /dev/sda3, say, the grub reference in /boot/grub/grub.conf would have been (hd0,2). Grub disk and partition numbering starts from zero (0), whereas device naming starts with 'a' e.g. /dev/hda (IDE), /dev/sda (SCSI), and partition numbering starts from 1 e.g. /dev/hda1, /dev/sda1.

If there is any confusion regarding grub-detected devices, grub itself may be used to detect and list the available devices e.g.:

# grub
Probing devices to guess BIOS drives. This may take a long time. 

GNU GRUB version 0.97 (640K lower / 3072K upper memory)

[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename.]

grub> root (hd<tab key>
Possible disks are: hd0 hd1

 

17. Reboot the System (Degraded Array)

Reboot the system. As a precaution, be sure to have your operating system installation/rescue media on hand. During boot up, review the console messages to determine which device is used to boot the system, i.e. /dev/md1 {/dev/sdb1} or the fallback device /dev/sda1. All going well, the system will be using the raid devices, albeit in a degraded state, i.e. all arrays still contain only one disk member (/dev/sdbN).

# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[0]
1052160 blocks [2/1] [U_] 

md2 : active raid1 sdb2[0]
5116608 blocks [2/1] [U_]

md3 : active raid1 sdb3[0]
1052160 blocks [2/1] [U_]

unused devices: <none>

Verify mounted filesystems (/, /boot) are those residing on raided devices e.g.:

# mount | grep md
/dev/md2 on / type ext3 (rw)
/dev/md1 on /boot type ext3 (rw) 

# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md3                                partition       1052152 0       -1

Further verify that no /dev/sdaN partitions are used e.g.:

# mount | grep sda

# swapon -s | grep sda
#

If you did not add a fallback title as described in step 13 and experienced booting issues, perform a linux rescue to restore/recover the system.

18. Modify Primary Disk Partitions to Type RAID

In preparation for adding /dev/sdaN partitions to their respective arrays, use the fdisk(8) utility to modify the primary disk partitions from type 83/82 (linux/swap) to fd (raid) e.g.:

# fdisk /dev/sda 

The number of cylinders for this disk is set to 1044.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         131     1052226   83  Linux
/dev/sda2             132         768     5116702+  83  Linux
/dev/sda3             769         899     1052257+  82  Linux swap / Solaris

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): fd
Changed system type of partition 3 to fd (Linux raid autodetect)

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

 

 

# fdisk -l /dev/sda 

Disk /dev/sda: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         131     1052226   fd  Linux raid autodetect
/dev/sda2             132         768     5116702+  fd  Linux raid autodetect
/dev/sda3             769         899     1052257+  fd  Linux raid autodetect

Use the partprobe(8) or sfdisk(8) utility to update the kernel with the partition type changes e.g.:

# partprobe /dev/sda

19. Add Primary Disk Partitions to RAID Arrays

Once the system has successfully booted using raid (i.e. the secondary disk), use the mdadm(8) utility to add the primary disk partitions to their respective arrays. All data on /dev/sdaN partitions will be destroyed in the process.

# mdadm --manage --add /dev/md1 /dev/sda1
mdadm: added /dev/sda1
# mdadm --manage --add /dev/md2 /dev/sda2
mdadm: added /dev/sda2
# mdadm --manage --add /dev/md3 /dev/sda3
mdadm: added /dev/sda3 

# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[2] sdb1[0]
1052160 blocks [2/1] [U_]
[=====>...............]  recovery = 25.1% (266112/1052160) finish=4.0min speed=3194K/sec

md2 : active raid1 sda2[2] sdb2[0]
5116608 blocks [2/1] [U_]
resync=DELAYED

md3 : active raid1 sda3[2] sdb3[0]
1052160 blocks [2/1] [U_]
resync=DELAYED

unused devices: <none>

 

Depending on the size of partitions/disks used, data synchronisation between raid disk members may take a long time. Use the watch(1) command to monitor disk synchronisation progress e.g.:

# watch -n 15 cat /proc/mdstat
...
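
If synchronisation is too slow (or too intrusive on a busy system), the kernel's md resync speed limits can be adjusted on the fly via /proc; the values are in KB/sec and revert to their defaults at reboot e.g.:

# cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
# echo 50000 > /proc/sys/dev/raid/speed_limit_min     # raise the resync floor from the default 1000 KB/sec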

Once complete, /proc/mdstat should denote clean and consistent arrays each with two active, working members e.g.:

# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[1] sdb1[0]
1052160 blocks [2/2] [UU] 

md2 : active raid1 sda2[1] sdb2[0]
5116608 blocks [2/2] [UU]

md3 : active raid1 sda3[1] sdb3[0]
1052160 blocks [2/2] [UU]

unused devices: <none>

20. Modify grub.conf

Once all the /dev/sdaN partitions have been added as disk members of their respective arrays, modify the /boot/grub/grub.conf file. Substitute the previous reference to LABEL=/1 in the second boot title with raid device /dev/md2 e.g.:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda2
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
fallback=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Enterprise Linux (2.6.18-92.el5)
root (hd1,0)
kernel /vmlinuz-2.6.18-92.el5 ro root=/dev/md2
initrd /initrd-2.6.18-92.el5.img
title Enterprise Linux (2.6.18-92.el5)
root (hd0,0)
kernel /vmlinuz-2.6.18-92.el5 ro root=/dev/md2
initrd /initrd-2.6.18-92.el5.img 

 

21. Remake Initial RAM Disk (Two of Two)

Use the mkinitrd(8) utility to recreate the initial ram disk (again) e.g.:

# cd /boot
# mv initrd-`uname -r`.img initrd-`uname -r`.img.1 

# mkinitrd -v -f initrd-`uname -r`.img `uname -r`
Creating initramfs
Looking for deps of module ehci-hcd
Looking for deps of module ohci-hcd
Looking for deps of module uhci-hcd
Looking for deps of module ext3: jbd
Looking for deps of module jbd
Found RAID component md2
Looking for deps of module raid1
...

# ls -l /boot/initrd*
-rw------- 1 root root 2477483 Dec 31 08:28 /boot/initrd-2.6.18-92.el5.img
-rw------- 1 root root 2478838 Dec 31 00:36 /boot/initrd-2.6.18-92.el5.img.1
-rw------- 1 root root 2467715 Dec 29 17:52 /boot/initrd-2.6.18-92.el5.img.orig

 

At this point, the system is up and running using raid1 devices for the / and /boot filesystems and the swap device.

22. Testing

Before relying on the newly configured system, test it for proper operation and increased availability.

Suggested testing includes:

  • boot from alternate boot title (clean array)
  • persistent mount on degraded array (/dev/sdb software failed)
  • boot into degraded array (/dev/sdb software removed)
  • boot into degraded array (/dev/sda physically removed)

Primary diagnostics to monitor during testing include:

  • console messages
  • dmesg
  • /proc/mdstat
  • mdadm --query --detail <md dev> (see also the non-interactive checks sketched below)
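
For unattended checking between tests, mdadm can also report array health via its exit status when --test is combined with --detail; a sketch (the exact exit codes are per the mdadm(8) man page):

# for md in /dev/md1 /dev/md2 /dev/md3; do mdadm --detail --test $md > /dev/null; echo "$md exit=$?"; done    # exit status 0 denotes a clean array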

22.1 Test – boot from alternate boot title (clean array)

As part of configuring the system to use raid, you will have already tested booting the system from the second disk, i.e. /dev/md1 {/dev/sdb1}. For this test, modify /boot/grub/grub.conf to boot the system using the first disk, /dev/md1 {/dev/sda1}, i.e.:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda2
#          initrd /initrd-version.img
#boot=/dev/sda
default=1
fallback=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Enterprise Linux (2.6.18-92.el5)
root (hd1,0)
kernel /vmlinuz-2.6.18-92.el5 ro root=/dev/md2 3
initrd /initrd-2.6.18-92.el5.img
title Enterprise Linux (2.6.18-92.el5)
root (hd0,0)
kernel /vmlinuz-2.6.18-92.el5 ro root=/dev/md2 3
initrd /initrd-2.6.18-92.el5.img

Note the changes to the default and fallback parameter values.

22.2 Test – persistent mount on degraded array (/dev/sdb software failed)

Verify that the /, /boot filesystems and swap device remain active, usable and writable after failing the second disk member of each raid array e.g.:

# mdadm --manage --fail /dev/md1 /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md1
# mdadm --manage --fail /dev/md2 /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md2
# mdadm --manage --fail /dev/md3 /dev/sdb3
mdadm: set /dev/sdb3 faulty in /dev/md3 

# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[2](F) sda1[1]
1052160 blocks [2/1] [_U]

md3 : active raid1 sdb3[2](F) sda3[1]
1052160 blocks [2/1] [_U]

md2 : active raid1 sdb2[2](F) sda2[1]
5116608 blocks [2/1] [_U]

unused devices: <none>

# mdadm --query --detail /dev/md1
/dev/md1:
Version : 00.90.03
Creation Time : Tue Dec 30 21:46:44 2008
Raid Level : raid1
Array Size : 1052160 (1027.67 MiB 1077.41 MB)
Used Dev Size : 1052160 (1027.67 MiB 1077.41 MB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Wed Dec 31 08:57:33 2008
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0

UUID : a4d5007d:6974901a:637e5622:e5b514c9
Events : 0.70

Number   Major   Minor   RaidDevice State
0       0        0        0      removed
1       8        1        1      active sync   /dev/sda1
2       8       17        -      faulty spare   /dev/sdb1

# dmesg
...
raid1: Disk failure on sdb1, disabling device.
Operation continuing on 1 devices
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:1, o:0, dev:sdb1
disk 1, wo:0, o:1, dev:sda1
RAID1 conf printout:
--- wd:1 rd:2
disk 1, wo:0, o:1, dev:sda1
raid1: Disk failure on sdb2, disabling device.
Operation continuing on 1 devices
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:1, o:0, dev:sdb2
disk 1, wo:0, o:1, dev:sda2
RAID1 conf printout:
--- wd:1 rd:2
disk 1, wo:0, o:1, dev:sda2
raid1: Disk failure on sdb3, disabling device.
Operation continuing on 1 devices
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:1, o:0, dev:sdb3
disk 1, wo:0, o:1, dev:sda3
RAID1 conf printout:
--- wd:1 rd:2
disk 1, wo:0, o:1, dev:sda3

# mount | grep md
/dev/md2 on / type ext3 (rw)
/dev/md1 on /boot type ext3 (rw)

# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md3                                partition       1052152 0       -1

22.3 Test – boot into degraded array (/dev/sdb software removed)

Having software-failed the second raid disk member (/dev/sdb), software-remove the second disk, then test a successful system boot e.g.:

# mdadm --manage --remove /dev/md1 /dev/sdb1
mdadm: hot removed /dev/sdb1
# mdadm --manage --remove /dev/md2 /dev/sdb2
mdadm: hot removed /dev/sdb2
# mdadm --manage --remove /dev/md3 /dev/sdb3
mdadm: hot removed /dev/sdb3 

# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[1]
1052160 blocks [2/1] [_U]

md3 : active raid1 sda3[1]
1052160 blocks [2/1] [_U]

md2 : active raid1 sda2[1]
5116608 blocks [2/1] [_U]

unused devices: <none>

# mdadm --query --detail /dev/md1
/dev/md1:
Version : 00.90.03
Creation Time : Tue Dec 30 21:46:44 2008
Raid Level : raid1
Array Size : 1052160 (1027.67 MiB 1077.41 MB)
Used Dev Size : 1052160 (1027.67 MiB 1077.41 MB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Wed Dec 31 09:06:21 2008
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0

UUID : a4d5007d:6974901a:637e5622:e5b514c9
Events : 0.72

Number   Major   Minor   RaidDevice State
0       0        0        0      removed
1       8        1        1      active sync   /dev/sda1

# dmesg
...
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdb2>
md: export_rdev(sdb2)
md: unbind<sdb3>
md: export_rdev(sdb3)

# mount | grep md
/dev/md2 on / type ext3 (rw)
/dev/md1 on /boot type ext3 (rw)

# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md3                                partition       1052152 0       -1

# shutdown -r now
...

On reboot, add the failed/removed second disk back to the arrays e.g.:

# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[1]
1052160 blocks [2/1] [_U] 

md3 : active raid1 sda3[1]
1052160 blocks [2/1] [_U]

md2 : active raid1 sda2[1]
5116608 blocks [2/1] [_U]

unused devices: <none>

# mdadm --manage --add /dev/md1 /dev/sdb1
mdadm: re-added /dev/sdb1
# mdadm --manage --add /dev/md2 /dev/sdb2
mdadm: re-added /dev/sdb2
# mdadm --manage --add /dev/md3 /dev/sdb3
mdadm: re-added /dev/sdb3

# dmesg
...
md: bind<sdb1>
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:1, o:1, dev:sdb1
disk 1, wo:0, o:1, dev:sda1
md: syncing RAID array md1
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 1052160 blocks.
md: bind<sdb2>
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:1, o:1, dev:sdb2
disk 1, wo:0, o:1, dev:sda2
md: delaying resync of md2 until md1 has finished resync (they share one or more physical units)
md: bind<sdb3>
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:1, o:1, dev:sdb3
disk 1, wo:0, o:1, dev:sda3
md: delaying resync of md3 until md1 has finished resync (they share one or more physical units)
md: delaying resync of md2 until md1 has finished resync (they share one or more physical units)

# watch -n 15 cat /proc/mdstat

 

22.4 Test – boot into degraded array (/dev/sda physically removed)

Similar to tests 22.2 and 22.3, test ongoing system operation and subsequent system boot after physically removing one or the other (or both) raid member disks e.g.:

# mdadm --manage --fail /dev/md1 /dev/sda1
mdadm: set /dev/sda1 faulty in /dev/md1
# mdadm --manage --fail /dev/md2 /dev/sda2
mdadm: set /dev/sda2 faulty in /dev/md2
# mdadm --manage --fail /dev/md3 /dev/sda3
mdadm: set /dev/sda3 faulty in /dev/md3 

# mdadm --manage --remove /dev/md1 /dev/sda1
mdadm: hot removed /dev/sda1
# mdadm --manage --remove /dev/md2 /dev/sda2
mdadm: hot removed /dev/sda2
# mdadm --manage --remove /dev/md3 /dev/sda3
mdadm: hot removed /dev/sda3

 

If the hardware supports it, the disk device can then be dynamically removed (hot unplugged) from the running system, as sketched below. Alternatively, shut down the system and physically remove device /dev/sda before rebooting. On boot, dynamically add /dev/sda back as a raid disk member, then repeat the same test, this time physically removing the second disk member /dev/sdb. This test not only validates the fallback boot title, but also emulates online replacement of a failed hard disk.
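
On controllers that support hot unplug, the disk can be removed from the running kernel through sysfs; a sketch, noting that the SCSI host number (host0 below) is an assumption and should be checked on your system:

# echo 1 > /sys/block/sda/device/delete             # hot remove /dev/sda from the running kernel
# echo "- - -" > /sys/class/scsi_host/host0/scan    # later, rescan the bus to rediscover the disk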

 
