I recently had a failed drive in my RAID1 array. I’ve just installed
the replacement drive and thought I’d share the method.
Let’s look at the current situation:
root@ace:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda3[1]
      483403776 blocks [2/1] [_U]

md0 : active raid1 sda1[1]
      96256 blocks [2/1] [_U]

unused devices: <none>
So we can see we have two mirrored arrays with one drive missing in both.
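If you want more detail than /proc/mdstat provides, mdadm itself can describe each array; something along these lines (output omitted here) will report the degraded state and which slot is empty:

mdadm --detail /dev/md0
mdadm --detail /dev/md1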
Let’s check that the second drive has been recognised:
root@ace:~# dmesg | grep sd
[ 21.465395] Driver 'sd' needs updating - please use bus_type methods
[ 21.465486] sd 2:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
[ 21.465496] sd 2:0:0:0: [sda] Write Protect is off
[ 21.465498] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 21.465512] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 21.465562] sd 2:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
[ 21.465571] sd 2:0:0:0: [sda] Write Protect is off
[ 21.465573] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 21.465587] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 21.465590] sda: sda1 sda2 sda3
[ 21.487248] sd 2:0:0:0: [sda] Attached SCSI disk
[ 21.487303] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[ 21.487314] sd 2:0:1:0: [sdb] Write Protect is off
[ 21.487317] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[ 21.487331] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 21.487371] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[ 21.487381] sd 2:0:1:0: [sdb] Write Protect is off
[ 21.487382] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[ 21.487403] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 21.487407] sdb: unknown partition table
[ 21.502763] sd 2:0:1:0: [sdb] Attached SCSI disk
[ 21.506690] sd 2:0:0:0: Attached scsi generic sg0 type 0
[ 21.506711] sd 2:0:1:0: Attached scsi generic sg1 type 0
[ 21.793835] md: bind<sda1>
[ 21.858027] md: bind<sda3>
So, sda has three partitions (sda1, sda2 and sda3), and sdb has no partition
table at all. Let’s give sdb a partition table identical to sda’s. The easiest
way to do this is with sfdisk, dumping the layout of one disk and writing it to
the other:
root@ace:~# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 60801 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an MSDOS signature
 /dev/sdb: unrecognised partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1   *        63    192779     192717  fd  Linux RAID autodetect
/dev/sdb2        192780   9960299    9767520  82  Linux swap / Solaris
/dev/sdb3       9960300 976768064  966807765  fd  Linux RAID autodetect
/dev/sdb4             0         -          0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
If we look at dmesg now to confirm it worked, we’ll see:
root@ace:~# dmesg | grep sd
...
[ 224.246102] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[ 224.246322] sd 2:0:1:0: [sdb] Write Protect is off
[ 224.246325] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[ 224.246547] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 224.246686] sdb: unknown partition table
[ 227.326278] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[ 227.326504] sd 2:0:1:0: [sdb] Write Protect is off
[ 227.326507] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[ 227.326703] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 227.326708] sdb: sdb1 sdb2 sdb3
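If you’d rather not rely on dmesg, another way to verify the copy (not part of the original run above) is to list both partition tables and compare them; the start, end and Id columns should now match:

sfdisk -l /dev/sda
sfdisk -l /dev/sdb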
So now the two disks have identical partition tables. The next thing to do is to add the new partitions to their respective arrays:
root@ace:~# mdadm /dev/md0 --add /dev/sdb1
mdadm: added /dev/sdb1
root@ace:~# mdadm /dev/md1 --add /dev/sdb3
mdadm: added /dev/sdb3
Everything looks good. Let’s check dmesg:
[ 323.941542] md: bind<sdb1>
[ 324.038183] RAID1 conf printout:
[ 324.038189]  --- wd:1 rd:2
[ 324.038192]  disk 0, wo:1, o:1, dev:sdb1
[ 324.038195]  disk 1, wo:0, o:1, dev:sda1
[ 324.038300] md: recovery of RAID array md0
[ 324.038303] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 324.038305] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 324.038310] md: using 128k window, over a total of 96256 blocks.
[ 325.417219] md: md0: recovery done.
[ 325.453629] RAID1 conf printout:
[ 325.453632]  --- wd:2 rd:2
[ 325.453634]  disk 0, wo:0, o:1, dev:sdb1
[ 325.453636]  disk 1, wo:0, o:1, dev:sda1
[ 347.970105] md: bind<sdb3>
[ 348.004566] RAID1 conf printout:
[ 348.004571]  --- wd:1 rd:2
[ 348.004573]  disk 0, wo:1, o:1, dev:sdb3
[ 348.004574]  disk 1, wo:0, o:1, dev:sda3
[ 348.004657] md: recovery of RAID array md1
[ 348.004659] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 348.004660] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 348.004664] md: using 128k window, over a total of 483403776 blocks.
Everything still looks good. Let’s sit back and watch it rebuild using the wonderfully useful watch command:
root@ace:~# watch -n 1 cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdb3[2] sda3[1]
      483403776 blocks [2/1] [_U]
      [=====>...............]  recovery = 26.0% (126080960/483403776) finish=96.2min speed=61846K/sec

md0 : active raid1 sdb1[0] sda1[1]
      96256 blocks [2/2] [UU]

unused devices: <none>
The Ubuntu and Debian installers will let you create RAID1 arrays
with fewer drives than the array will eventually have, so you can use this
technique if you plan to add another drive after you’ve installed the
system. Just tell the installer the eventual number of drives, but only select
the partitions that are actually available during RAID setup. I used this
method recently when a new machine didn’t have enough SATA power cables and I
had to wait for an adaptor to be delivered.
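The same trick works outside the installer too. As a rough sketch (the device names here are illustrative rather than taken from the run above), mdadm will build a two-device RAID1 with only one real member if you pass the keyword missing, and the second partition can be added later exactly as shown earlier:

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 missing
# later, once the second drive is installed and partitioned:
mdadm /dev/md0 --add /dev/sdb1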
(Why did no one tell me about watch until recently? I wonder how many more
incredibly useful programs I’ve yet to discover, even after 10 years of using
Linux.)
A commenter said:
One extra step that I do is install an MBR on the new disk, to make it bootable:
install-mbr /dev/sdb
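On a system that boots with GRUB rather than a generic MBR, the rough equivalent is to reinstall the boot loader onto the new disk, for example:

grub-install /dev/sdb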
Another commenter said:
Great article!
Maybe it would be even more useful if merged here:
http://linux-raid.osdl.org/index.php/Reconstruction
Another commenter said:
Semi-tangential note about performance: on my home (== partly “play”) machine, I found that “mdadm --manage .. --fail”-ing the root partition before doing lots of package upgrades (installing KDE 4/experimental and lots of other updates in my case, on a mostly etch system. Dual-screen support sucks if the screens don’t have the same size, btw!) speeds up apt considerably, while the subsequent reconstruct step (--remove and then --add the partition) doesn’t slow down the system much under a light desktop workload.
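For reference, the sequence being described looks roughly like this (the array and partition names are illustrative):

mdadm --manage /dev/md1 --fail /dev/sdb3      # degrade the mirror before the upgrade
mdadm --manage /dev/md1 --remove /dev/sdb3
# ... run the package upgrades ...
mdadm --manage /dev/md1 --add /dev/sdb3       # re-add afterwards and let it resync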
My system is a few years old (no SATA, and probably not much cache on the disks either) and has only 512 MB of RAM, so maybe a better-equipped system would make this less noticeable.
(… and no, I probably wouldn’t force-fail part of my /home partition for any length of time 🙂)