RAID array maintenance
From CCMSTWiki
How to replace a drive on a RAID 5 array
This is a brief Howto for replacing a bad drive in a software RAID 5 array under Linux. This procedure should work for a RAID 1 array too.
Let's suppose we have a RAID 5 array composed of 9 drives. This is what the /proc/mdstat file might look like:
md0 : active raid5 sdj1[8] sdi1[7] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
7814078464 blocks level 5, 256k chunk, algorithm 2 [9/9] [UUUUUUUUU]
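As a rough sketch of how this file can be checked from a script: in the status string at the end of the second line, each U stands for a working member and an underscore for a missing or failed one, so [UUUU_UUUU] would mean one member is down. The helper name degraded_arrays below is made up for illustration, and it assumes the status string is the last field on the line, as in the sample above.

```shell
#!/bin/sh
# Hypothetical helper: print each md array whose member-status string
# (for example [UUUU_UUUU]) contains "_", i.e. a missing or failed member.
# Usage: degraded_arrays [FILE]    (FILE defaults to /proc/mdstat)
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }        # remember the array being described
        /\[[U_]+\]/ && name != "" {
            status = $NF                   # assumed to be the last field on the line
            if (status ~ /_/) print name, status
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Running degraded_arrays with no argument reads /proc/mdstat; it prints nothing when every member of every array is up.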
Now suppose one drive, say sde, is going bad (you see a lot of messages reporting errors on this drive, both on the console and in /var/log/messages). Here is how to replace it:
- Mark the drive as faulty:
mdadm --manage /dev/md0 --fail /dev/sde1
- Remove the device from the array:
mdadm --manage /dev/md0 --remove /dev/sde1
- Now comes the tricky part: identifying which physical hard drive corresponds to the sde device. Power down the machine and open up the chassis. Looking at the order in which the SATA cables are plugged into the motherboard, make a first guess at which drive is sde. Disconnect the power and SATA cables from that drive and power up the system. If you guessed right, the system should be able to come up and start the /dev/md0 device. If the guess was wrong, the system will report problems starting the /dev/md0 device; in that case power down, reconnect the drive and try another one. Rinse, lather, repeat, until the system is able to start with one hard drive disconnected: Congratulations, you found the hard drive to replace!
- Replace the drive. The replacement must have at least the same capacity as the old drive. Power the system up again. Now the new drive should be seen by the system as /dev/sde.
- Copy the partition table from one of the drives of the array into the new one. This will ensure that the new drive is partitioned exactly as the old one.
sfdisk -d /dev/sdf | sfdisk /dev/sde
- Add /dev/sde1 to the RAID array:
mdadm --manage /dev/md0 --add /dev/sde1
- Done! Now the system will have to resync the array. You can monitor the /proc/mdstat file to check on the progress of the operation. This might take a few hours, depending on the size of the array. It is better not to write to the array while it is syncing.
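For the monitoring part of the last step, here is a minimal sketch that pulls the progress figure out of /proc/mdstat. While an array is rebuilding, the kernel adds a line containing "recovery = NN.N%" (or "resync = NN.N%") under that array's entry. The function name resync_progress is made up for illustration, and the exact layout of the progress line is assumed from typical mdstat output.

```shell
#!/bin/sh
# Hypothetical helper: print the rebuild percentage of every md array
# that is currently recovering or resyncing.
# Usage: resync_progress [FILE]    (FILE defaults to /proc/mdstat)
resync_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }        # remember the array being described
        /(recovery|resync) *=/ {
            # the percentage is the field that ends in "%"
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print name, $i
        }
    ' "${1:-/proc/mdstat}"
}
```

Calling resync_progress with no argument reads /proc/mdstat and prints nothing once the rebuild has finished. For a quick look without any script, "watch -n 60 cat /proc/mdstat" redisplays the file every minute.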
--malagoli 11:57, 8 May 2009 (EDT)
