* Rebuild doesn't start
@ 2009-08-10 23:56 Oliver Martin
From: Oliver Martin @ 2009-08-10 23:56 UTC (permalink / raw)
To: linux-raid
Hello,
I have two raid5 arrays spanning a number of USB drives. Yesterday, I
unintentionally unplugged one of them while connecting another device
to the same hub. The drive I unplugged used to be /dev/sdh, but when I
plugged it back in, it became /dev/sdi. For md0, this didn't matter. I
re-added it and it performed a rebuild* which completed successfully.
md1, which used to consist of sde2 and sdh2, should now contain sde2
and sdi2. For some reason, though, the rebuild doesn't start when I add
sdi2. It seems md doesn't recognize sdi2 as the same device that used
to be sdh2. Is that correct? How can I tell md about the name change?
Thanks,
Oliver
[*] Bitmaps are enabled on both arrays, so I was somewhat surprised
about the full rebuild; isn't that what bitmaps are supposed to prevent?
$ mdadm /dev/md1 -a /dev/sdi2
mdadm: re-added /dev/sdi2
$ cat /proc/mdstat
[...]
md1 : active raid5 sdi2[0](F) sde2[2]
488375808 blocks super 1.1 level 5, 64k chunk, algorithm 2 [2/1] [_U]
bitmap: 0/8 pages [0KB], 32768KB chunk
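[Editorial aside: the `(F)` flag in the mdstat line above is the first hint that md considers sdi2 faulty rather than merely renamed. A minimal sketch of picking such devices out of a /proc/mdstat component list; the helper name is made up and this is not part of mdadm:]

```python
import re

def failed_devices(mdstat_line):
    """Return device names marked (F) (faulty) in a /proc/mdstat
    component list such as 'md1 : active raid5 sdi2[0](F) sde2[2]'."""
    # each component looks like NAME[slot] optionally followed by (F)
    return re.findall(r'(\w+)\[\d+\]\(F\)', mdstat_line)

print(failed_devices("md1 : active raid5 sdi2[0](F) sde2[2]"))  # ['sdi2']
```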
$ mdadm -D /dev/md1
/dev/md1:
Version : 1.01
Creation Time : Sun Apr 12 14:19:47 2009
Raid Level : raid5
Array Size : 488375808 (465.75 GiB 500.10 GB)
Used Dev Size : 488375808 (465.75 GiB 500.10 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue Aug 11 01:40:15 2009
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : quassel:1 (local to host quassel)
UUID : e9226e7f:cbdad2a1:481ce05b:9444d71d
Events : 106
Number Major Minor RaidDevice State
0 0 0 0 removed
2 8 66 1 active sync /dev/sde2
0 8 130 - faulty spare /dev/sdi2
$ mdadm -E /dev/sde2
/dev/sde2:
Magic : a92b4efc
Version : 1.1
Feature Map : 0x1
Array UUID : e9226e7f:cbdad2a1:481ce05b:9444d71d
Name : quassel:1 (local to host quassel)
Creation Time : Sun Apr 12 14:19:47 2009
Raid Level : raid5
Raid Devices : 2
Avail Dev Size : 976751736 (465.75 GiB 500.10 GB)
Array Size : 976751616 (465.75 GiB 500.10 GB)
Used Dev Size : 976751616 (465.75 GiB 500.10 GB)
Data Offset : 264 sectors
Super Offset : 0 sectors
State : clean
Device UUID : 0fcc7d6d:0ec92b47:c371f8e6:bd7d2cac
Internal Bitmap : 2 sectors from superblock
Update Time : Tue Aug 11 01:40:18 2009
Checksum : 4290b585 - correct
Events : 108
Layout : left-symmetric
Chunk Size : 64K
Array Slot : 2 (failed, failed, 1)
Array State : _U 2 failed
$ mdadm -E /dev/sdi2
/dev/sdi2:
Magic : a92b4efc
Version : 1.1
Feature Map : 0x1
Array UUID : e9226e7f:cbdad2a1:481ce05b:9444d71d
Name : quassel:1 (local to host quassel)
Creation Time : Sun Apr 12 14:19:47 2009
Raid Level : raid5
Raid Devices : 2
Avail Dev Size : 976751736 (465.75 GiB 500.10 GB)
Array Size : 976751616 (465.75 GiB 500.10 GB)
Used Dev Size : 976751616 (465.75 GiB 500.10 GB)
Data Offset : 264 sectors
Super Offset : 0 sectors
State : clean
Device UUID : 5ba69d85:c46d6bb0:bf71606e:2877b067
Internal Bitmap : 2 sectors from superblock
Update Time : Mon Aug 10 15:32:23 2009
Checksum : 6db9f21 - correct
Events : 28
Layout : left-symmetric
Chunk Size : 64K
Array Slot : 0 (failed, failed, 1)
Array State : _u 2 failed
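[Editorial aside: the telling difference between the two dumps is the Events counter: 108 on sde2 versus only 28 on sdi2, so sdi2's superblock is far behind the array. A small illustrative comparison, assuming the `mdadm -E` text format shown above (the helper is hypothetical, not an mdadm feature):]

```python
import re

def events_count(examine_output):
    """Pull the Events counter out of `mdadm -E` style text."""
    m = re.search(r'^\s*Events : (\d+)', examine_output, re.MULTILINE)
    return int(m.group(1)) if m else None

sde2 = "Update Time : Tue Aug 11 01:40:18 2009\nEvents : 108\n"
sdi2 = "Update Time : Mon Aug 10 15:32:23 2009\nEvents : 28\n"
# a large gap means the member is stale relative to the array
print(events_count(sde2) - events_count(sdi2))  # 80
```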
* Re: Rebuild doesn't start
From: NeilBrown @ 2009-08-11 0:56 UTC (permalink / raw)
To: Oliver Martin; +Cc: linux-raid
On Tue, August 11, 2009 9:56 am, Oliver Martin wrote:
> Hello,
>
> I have two raid5 arrays spanning a number of USB drives. Yesterday, I
> unintentionally unplugged one of them while connecting another device
> to the same hub. The drive I unplugged used to be /dev/sdh, but when I
> plugged it back in, it became /dev/sdi. For md0, this didn't matter. I
> re-added it and it performed a rebuild* which completed successfully.
>
> md1, which used to consist of sde2 and sdh2, should now contain sde2
> and sdi2. For some reason, though, the rebuild doesn't start when I add
> sdi2. It seems md doesn't recognize sdi2 as the same device that used
> to be sdh2. Is that correct? How can I tell md about the name change?
If you look closely at the "mdadm -D" etc. output that you included,
you will see that md1 thinks that sdi2 is faulty. Maybe it is.
You would need to check kernel logs to be sure.
> [*] Bitmaps are enabled on both arrays, so I was somewhat surprised
> about the full rebuild; isn't that what bitmaps are supposed to prevent?
Yes, bitmaps should prevent a full rebuild. I would need to see
kernel logs of when this rebuild happened and "mdadm -D" of the
array to have any hope of guessing why it didn't.
NeilBrown
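[Editorial aside: for context on why a bitmap should avoid a full rebuild, a write-intent bitmap tracks which regions changed while a member was missing, so a re-add only has to resync those regions. A toy model of the idea; illustrative only, not md's actual data structures:]

```python
class IntentBitmap:
    """Toy write-intent bitmap: one dirty bit per chunk of the array."""

    def __init__(self, chunks):
        self.dirty = [False] * chunks

    def on_write(self, chunk):
        self.dirty[chunk] = True   # set before the write hits the disks

    def on_in_sync(self, chunk):
        self.dirty[chunk] = False  # cleared once all members agree again

    def chunks_to_resync(self):
        # on re-add, only these chunks need copying, not the whole disk
        return [i for i, d in enumerate(self.dirty) if d]

bm = IntentBitmap(8)
for c in (2, 5):                  # writes while one member was unplugged
    bm.on_write(c)
print(bm.chunks_to_resync())      # [2, 5] -- partial resync, not a full rebuild
```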
* Re: Rebuild doesn't start
From: Oliver Martin @ 2009-08-11 13:28 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On Tue, 11 Aug 2009 10:56:02 +1000 (EST), NeilBrown wrote:
> If you look closely at the "mdadm -D" etc. output that you included,
> you will see that md1 thinks that sdi2 is faulty. Maybe it is.
> You would need to check kernel logs to be sure.
I don't think the drive is bad. SMART values look OK, and md0 didn't
have any problem with re-adding sdi1.
I forgot another strange thing: while I could add sdi1 to md0 and the
rebuild succeeded, I couldn't add sdi2 to md1 until after a reboot. I
always got an error like this:
mdadm: add new device failed for /dev/sdi2: Device or resource busy
When all this happened, I was running 2.6.29.1. Afterwards, I tried
upgrading to 2.6.30.4 to see if that solved the problem, but nothing
changed.
> Yes, bitmaps should prevent a full rebuild. I would need to see
> kernel logs of when this rebuild happened and "mdadm -D" of the
> array to have any hope of guessing why it didn't.
$ mdadm -D /dev/md0
/dev/md0:
Version : 1.01
Creation Time : Sat Mar 15 13:28:07 2008
Raid Level : raid5
Array Size : 1953535232 (1863.04 GiB 2000.42 GB)
Used Dev Size : 488383808 (465.76 GiB 500.11 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon Aug 10 19:29:47 2009
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : quassel:0 (local to host quassel)
UUID : 1111b4fd:4219035a:f52968e6:cc4dd971
Events : 650394
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
3 8 97 2 active sync /dev/sdg1
4 8 129 3 active sync /dev/sdi1
5 8 65 4 active sync /dev/sde1
--- kernel log ---
21:58:14 usb 4-5.2.4: USB disconnect, address 13
21:58:28 usb 4-5.2.4: new high speed USB device using ehci_hcd and address 17
21:58:28 usb 4-5.2.4: configuration #1 chosen from 1 choice
21:58:28 scsi10 : SCSI emulation for USB Mass Storage devices
21:58:28 usb-storage: device found at 17
21:58:28 usb-storage: waiting for device to settle before scanning
21:58:33 usb-storage: device scan complete
21:58:33 scsi 10:0:0:0: Direct-Access WDC WD10 EACS-00D6B0 PQ: 0 ANSI: 2 CCS
21:58:33 sd 10:0:0:0: [sdi] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
21:58:33 sd 10:0:0:0: [sdi] Write Protect is off
21:58:33 sd 10:0:0:0: [sdi] Mode Sense: 00 38 00 00
21:58:33 sd 10:0:0:0: [sdi] Assuming drive cache: write through
21:58:33 sd 10:0:0:0: [sdi] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
21:58:33 sd 10:0:0:0: [sdi] Write Protect is off
21:58:33 sd 10:0:0:0: [sdi] Mode Sense: 00 38 00 00
21:58:33 sd 10:0:0:0: [sdi] Assuming drive cache: write through
21:58:33 sdi: sdi1 sdi2
21:58:33 sd 10:0:0:0: [sdi] Attached SCSI disk
21:58:33 sd 10:0:0:0: Attached scsi generic sg9 type 0
I think here I unmounted the file system and stopped the LVM device on the
array, but I'm not entirely sure. The initial 17 second delay suggests that
this is the first time the array was accessed after unplugging the drive,
since the drives were all spun down at the time.
22:03:57 md: md0 still in use.
22:03:57 md: md1 still in use.
22:03:57 md: md0 still in use.
22:03:57 md: md1 still in use.
22:04:14 end_request: I/O error, dev sdh, sector 2
22:04:14 md: super_written gets error=-5, uptodate=0
22:04:14 raid5: Disk failure on sdh1, disabling device.
22:04:14 raid5: Operation continuing on 4 devices.
22:04:14 RAID5 conf printout:
22:04:14 --- rd:5 wd:4
22:04:14 disk 0, o:1, dev:sdb1
22:04:14 disk 1, o:1, dev:sdd1
22:04:14 disk 2, o:1, dev:sdg1
22:04:14 disk 3, o:0, dev:sdh1
22:04:14 disk 4, o:1, dev:sde1
22:04:14 RAID5 conf printout:
22:04:14 --- rd:5 wd:4
22:04:14 disk 0, o:1, dev:sdb1
22:04:14 disk 1, o:1, dev:sdd1
22:04:14 disk 2, o:1, dev:sdg1
22:04:14 disk 4, o:1, dev:sde1
22:04:16 md: md0 still in use.
22:04:16 md: md1 still in use.
22:04:16 md: md0 still in use.
22:04:16 md: md1 still in use.
22:04:21 raid5: Disk failure on sdh2, disabling device.
22:04:21 raid5: Operation continuing on 1 devices.
22:04:21 RAID5 conf printout:
22:04:21 --- rd:2 wd:1
22:04:21 disk 0, o:0, dev:sdh2
22:04:21 disk 1, o:1, dev:sde2
22:04:21 RAID5 conf printout:
22:04:21 --- rd:2 wd:1
22:04:21 disk 1, o:1, dev:sde2
/etc/init.d/mdadm-raid stop
This is mdadm 2.6.8 from Debian lenny. That segfault probably shouldn't
have happened...
22:04:32 md: md0 stopped.
22:04:32 md: unbind<sdb1>
22:04:32 md: export_rdev(sdb1)
22:04:32 md: unbind<sde1>
22:04:32 md: export_rdev(sde1)
22:04:32 md: unbind<sdh1>
22:04:32 md: export_rdev(sdh1)
22:04:32 md: unbind<sdg1>
22:04:32 md: export_rdev(sdg1)
22:04:32 md: unbind<sdd1>
22:04:32 md: export_rdev(sdd1)
22:04:32 mdadm[18096]: segfault at 118 ip 0806a7b9 sp bffb8160 error 4 in mdadm[8048000+2a000]
/etc/init.d/mdadm-raid start
22:04:37 md: md0 stopped.
22:04:38 md: bind<sdd1>
22:04:38 md: bind<sdg1>
22:04:38 md: bind<sdi1>
22:04:38 md: bind<sde1>
22:04:38 md: bind<sdb1>
22:04:38 md: kicking non-fresh sdi1 from array!
22:04:38 md: unbind<sdi1>
22:04:38 md: export_rdev(sdi1)
22:04:38 raid5: device sdb1 operational as raid disk 0
22:04:38 raid5: device sde1 operational as raid disk 4
22:04:38 raid5: device sdg1 operational as raid disk 2
22:04:38 raid5: device sdd1 operational as raid disk 1
22:04:38 raid5: allocated 5255kB for md0
22:04:38 raid5: raid level 5 set md0 active with 4 out of 5 devices, algorithm 2
22:04:38 RAID5 conf printout:
22:04:38 --- rd:5 wd:4
22:04:38 disk 0, o:1, dev:sdb1
22:04:38 disk 1, o:1, dev:sdd1
22:04:38 disk 2, o:1, dev:sdg1
22:04:38 disk 4, o:1, dev:sde1
22:04:38 md0: bitmap initialized from disk: read 1/1 pages, set 1 bits
22:04:38 created bitmap (8 pages) for device md0
22:04:38 md0: detected capacity change from 0 to 2000420077568
22:04:38 md0: unknown partition table
mdadm /dev/md0 -a /dev/sdi1
22:05:21 md: bind<sdi1>
22:05:21 RAID5 conf printout:
22:05:21 --- rd:5 wd:4
22:05:21 disk 0, o:1, dev:sdb1
22:05:21 disk 1, o:1, dev:sdd1
22:05:21 disk 2, o:1, dev:sdg1
22:05:21 disk 3, o:1, dev:sdi1
22:05:21 disk 4, o:1, dev:sde1
22:05:21 md: recovery of RAID array md0
22:05:21 md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
22:05:21 md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
22:05:21 md: using 128k window, over a total of 488383808 blocks.
This is probably where I tried to add sdi2 to md1 without any luck.
22:05:54 md: export_rdev(sdi2)
22:05:55 md: export_rdev(sdi2)