I recently had a failed drive in my RAID1 array. I’ve just installed
the replacement drive and thought I’d share the method.
Let’s look at the current situation:
root@ace:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda3[1]
483403776 blocks [2/1] [_U]
md0 : active raid1 sda1[1]
96256 blocks [2/1] [_U]
unused devices: <none>
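So we have two mirrored arrays, each with one drive missing: [2/1] means the array expects two devices but only one is active, and the underscore in [_U] marks the empty slot.
Let’s check that the kernel has recognised the new drive: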
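If you’d rather not eyeball /proc/mdstat, the check can be scripted. This is a minimal sketch, assuming the status-line format shown above; degraded_arrays is a hypothetical helper name of my own, not part of mdadm. It prints every array whose [UU]-style status contains an underscore.

```shell
#!/usr/bin/env bash
# degraded_arrays (hypothetical helper): read /proc/mdstat-format text on
# stdin and print the name of each md array with a missing member.
# Assumes the status line looks like "... blocks [2/1] [_U]", where an
# underscore in the last bracket marks an empty or failed slot.
degraded_arrays() {
  awk '
    # Remember which array the following lines belong to.
    /^md[0-9]+ :/ { name = $1 }
    # A bracket holding only U and _, with at least one _, means degraded.
    /\[U*_[U_]*\]/ && name != "" {
      print name
      name = ""   # report each array once
    }
  '
}

# Usage on a live system:
#   degraded_arrays < /proc/mdstat
```

On the output above it would print md1 and md0; on a healthy system it prints nothing.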
root@ace:~# dmesg | grep sd
[ 21.465395] Driver 'sd' needs updating - please use bus_type methods
[ 21.465486] sd 2:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
[ 21.465496] sd 2:0:0:0: [sda] Write Protect is off
[ 21.465498] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 21.465512] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 21.465562] sd 2:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
[ 21.465571] sd 2:0:0:0: [sda] Write Protect is off
[ 21.465573] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 21.465587] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 21.465590] sda: sda1 sda2 sda3
[ 21.487248] sd 2:0:0:0: [sda] Attached SCSI disk
[ 21.487303] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[ 21.487314] sd 2:0:1:0: [sdb] Write Protect is off
[ 21.487317] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[ 21.487331] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 21.487371] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[ 21.487381] sd 2:0:1:0: [sdb] Write Protect is off
[ 21.487382] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[ 21.487403] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 21.487407] sdb: unknown partition table
[ 21.502763] sd 2:0:1:0: [sdb] Attached SCSI disk
[ 21.506690] sd 2:0:0:0: Attached scsi generic sg0 type 0
[ 21.506711] sd 2:0:1:0: Attached scsi generic sg1 type 0
[ 21.793835] md: bind<sda1>
[ 21.858027] md: bind<sda3>
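So sda has three partitions (sda1, sda2 and sda3), and sdb has no partition
table. Let’s give it the same one as sda. The easiest way to do this is with
sfdisk, dumping sda’s table and feeding it straight into sdb. Double-check the
device names first, because the second sfdisk overwrites the partition table of
whichever disk you give it: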
root@ace:~# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ...
OK
Disk /dev/sdb: 60801 cylinders, 255 heads, 63 sectors/track
sfdisk: ERROR: sector 0 does not have an MSDOS signature
/dev/sdb: unrecognised partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0
Device Boot Start End #sectors Id System
/dev/sdb1 * 63 192779 192717 fd Linux RAID autodetect
/dev/sdb2 192780 9960299 9767520 82 Linux swap / Solaris
/dev/sdb3 9960300 976768064 966807765 fd Linux RAID autodetect
/dev/sdb4 0 - 0 0 Empty
Successfully wrote the new partition table
Re-reading the partition table ...
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
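The sfdisk step can also be wrapped so that it keeps a copy of sda’s layout and refuses to copy a disk onto itself. This is a sketch under my own assumptions: clone_partition_table and the DRY_RUN switch are hypothetical names, it assumes an MBR (msdos) table like the one above, and because it is destructive for the target it only prints the commands unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# clone_partition_table SRC DST [DUMPFILE] (hypothetical helper)
# Saves SRC's layout with `sfdisk -d` and writes it onto DST.
# Destructive for DST, so it only prints the commands unless DRY_RUN=0.
clone_partition_table() {
  local src=$1 dst=$2 dump=${3:-partition-table.sfdisk}
  if [ -z "$src" ] || [ -z "$dst" ]; then
    echo "usage: clone_partition_table SRC DST [DUMPFILE]" >&2
    return 2
  fi
  if [ "$src" = "$dst" ]; then
    echo "refusing: source and target are the same disk" >&2
    return 1
  fi
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "sfdisk -d $src > $dump"
    echo "sfdisk $dst < $dump"
    return 0
  fi
  # Keep the dump on disk: it doubles as a backup of the source layout.
  sfdisk -d "$src" > "$dump" || return 1
  sfdisk "$dst" < "$dump"
}

# Preview, then run for real:
#   clone_partition_table /dev/sda /dev/sdb
#   DRY_RUN=0 clone_partition_table /dev/sda /dev/sdb
```

Going through a file rather than a pipe costs nothing and leaves you with a saved copy of sda’s table, which sfdisk can write back later if needed.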
If we check dmesg again to confirm it worked, we’ll see:
root@ace:~# dmesg | grep sd
...
[ 224.246102] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[ 224.246322] sd 2:0:1:0: [sdb] Write Protect is off
[ 224.246325] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[ 224.246547] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 224.246686] sdb: unknown partition table
[ 227.326278] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[ 227.326504] sd 2:0:1:0: [sdb] Write Protect is off
[ 227.326507] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[ 227.326703] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 227.326708] sdb: sdb1 sdb2 sdb3
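dmesg only tells us that sdb now has three partitions. To confirm the layouts really match, you can compare the two sfdisk dumps once the device names are taken out. A minimal sketch, assuming both dumps come from the same sfdisk version and the device path contains no sed metacharacters; normalise_dump is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# normalise_dump DEVICE (hypothetical helper): read an `sfdisk -d` dump on
# stdin and replace every occurrence of DEVICE (e.g. /dev/sda) with "DISK",
# so /dev/sda1 and /dev/sdb1 both become DISK1 and the dumps can be diffed.
# Assumes DEVICE contains no sed regex metacharacters and no "#".
normalise_dump() {
  local dev=$1
  sed "s#${dev}#DISK#g"
}

# On the live system (as root); diff prints nothing and succeeds if they match:
#   diff <(sfdisk -d /dev/sda | normalise_dump /dev/sda) \
#        <(sfdisk -d /dev/sdb | normalise_dump /dev/sdb) && echo "tables match"
```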
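So now we have identical partition tables. The next thing to do is to add the new RAID partitions to their arrays: sdb1 goes into md0 and sdb3 into md1 (sdb2 is the swap partition, so it isn’t part of either array):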
root@ace:~# mdadm /dev/md0 --add /dev/sdb1
mdadm: added /dev/sdb1
root@ace:~# mdadm /dev/md1 --add /dev/sdb3
mdadm: added /dev/sdb3
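With more arrays, typing one mdadm --add per array gets tedious, and a small loop does the same job. This sketch takes ARRAY:PARTITION pairs and only prints the commands unless DRY_RUN=0; add_members and DRY_RUN are my own hypothetical names, not mdadm features.

```shell
#!/usr/bin/env bash
# add_members ARRAY:PARTITION... (hypothetical helper)
# Adds each replacement partition to its array with `mdadm ARRAY --add PART`.
# Prints the commands instead of running them unless DRY_RUN=0.
add_members() {
  local pair array part
  for pair in "$@"; do
    array=${pair%%:*}   # text before the first colon
    part=${pair#*:}     # text after the first colon
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "mdadm $array --add $part"
    else
      mdadm "$array" --add "$part" || return 1
    fi
  done
}

# The two commands above, as one call:
#   DRY_RUN=0 add_members /dev/md0:/dev/sdb1 /dev/md1:/dev/sdb3
```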
Everything looks good. Let’s check dmesg:
[ 323.941542] md: bind<sdb1>
[ 324.038183] RAID1 conf printout:
[ 324.038189] --- wd:1 rd:2
[ 324.038192] disk 0, wo:1, o:1, dev:sdb1
[ 324.038195] disk 1, wo:0, o:1, dev:sda1
[ 324.038300] md: recovery of RAID array md0
[ 324.038303] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 324.038305] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 324.038310] md: using 128k window, over a total of 96256 blocks.
[ 325.417219] md: md0: recovery done.
[ 325.453629] RAID1 conf printout:
[ 325.453632] --- wd:2 rd:2
[ 325.453634] disk 0, wo:0, o:1, dev:sdb1
[ 325.453636] disk 1, wo:0, o:1, dev:sda1
[ 347.970105] md: bind<sdb3>
[ 348.004566] RAID1 conf printout:
[ 348.004571] --- wd:1 rd:2
[ 348.004573] disk 0, wo:1, o:1, dev:sdb3
[ 348.004574] disk 1, wo:0, o:1, dev:sda3
[ 348.004657] md: recovery of RAID array md1
[ 348.004659] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 348.004660] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 348.004664] md: using 128k window, over a total of 483403776 blocks.
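Everything still looks good: md0 is only 96256 blocks, so its recovery finished within a couple of seconds, and md1 has started rebuilding. Let’s sit back and watch it using the wonderfully useful watch command: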
root@ace:~# watch -n 1 cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdb3[2] sda3[1]
483403776 blocks [2/1] [_U]
[=====>...............] recovery = 26.0% (126080960/483403776) finish=96.2min speed=61846K/sec
md0 : active raid1 sdb1[0] sda1[1]
96256 blocks [2/2] [UU]
unused devices: <none>
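watch is perfect for keeping an eye on the rebuild. If you want just the numbers (for a script or a status bar, say), the recovery line can be parsed. A minimal sketch, assuming the mdstat format shown above; rebuild_progress is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# rebuild_progress (hypothetical helper): read /proc/mdstat-format text on
# stdin and print "ARRAY PERCENT FINISH" for each array that is recovering
# or resyncing, e.g. "md1 26.0% 96.2min".
rebuild_progress() {
  awk '
    /^md[0-9]+ :/ { name = $1 }
    /(recovery|resync) =/ {
      pct = ""; fin = ""
      for (i = 2; i <= NF; i++) {
        # The percentage follows "recovery =" or "resync =".
        if ($i == "=" && ($(i - 1) == "recovery" || $(i - 1) == "resync"))
          pct = $(i + 1)
        # The estimate is a field like "finish=96.2min"; drop the prefix.
        if ($i ~ /^finish=/)
          fin = substr($i, 8)
      }
      print name, pct, fin
    }
  '
}

# Poll every 10 seconds:
#   while sleep 10; do rebuild_progress < /proc/mdstat; done
```

If you simply want to block until it’s done, mdadm --wait /dev/md1 waits for any resync or recovery on that array to finish before returning.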
The Ubuntu and Debian installers will let you create RAID1 arrays
with fewer drives than you actually have, so you can use this technique
if you plan to add another drive after you’ve installed the
system. Just tell the installer the eventual number of drives, but select only the
partitions that are available during RAID setup. I used this method recently when a new machine
didn’t have enough SATA power cables and I had to wait for an adaptor to
be delivered.
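Outside the installer, mdadm can do the same thing: give the keyword missing in place of the absent device when creating the array, then --add the real partition later exactly as above. A sketch that builds that command; create_degraded_mirror and DRY_RUN are hypothetical names, and it only prints the command unless DRY_RUN=0:

```shell
#!/usr/bin/env bash
# create_degraded_mirror ARRAY PARTITION (hypothetical helper)
# Creates a two-device RAID1 with only PARTITION present; the keyword
# "missing" reserves the second slot for a drive added later with --add.
# Prints the command instead of running it unless DRY_RUN=0.
create_degraded_mirror() {
  local array=$1 part=$2
  local cmd=(mdadm --create "$array" --level=1 --raid-devices=2 "$part" missing)
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Preview the command:
#   create_degraded_mirror /dev/md0 /dev/sda1
```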
(Why did no one tell me about watch until recently? I wonder
how many more incredibly useful programs I still haven’t discovered after 10
years of using Linux.)