* Rebuild doesn't start
@ 2009-08-10 23:56 Oliver Martin
From: Oliver Martin @ 2009-08-10 23:56 UTC (permalink / raw)
To: linux-raid
Hello,
I have two raid5 arrays spanning a number of USB drives. Yesterday, I
unintentionally unplugged one of them while connecting another device
to the same hub. The drive I unplugged used to be /dev/sdh, but when I
plugged it back in, it became /dev/sdi. For md0, this didn't matter. I
re-added it and it performed a rebuild* which completed successfully.
md1, which used to consist of sde2 and sdh2, should now contain sde2
and sdi2. For some reason, though, the rebuild doesn't start when I add
sdi2. It seems md doesn't recognize sdi2 as the same device that used
to be sdh2. Is that correct? How can I tell md about the name change?
Thanks,
Oliver
[*] Bitmaps are enabled on both arrays, so I was somewhat surprised
about the full rebuild; isn't that what bitmaps are supposed to prevent?
$ mdadm /dev/md1 -a /dev/sdi2
mdadm: re-added /dev/sdi2
$ cat /proc/mdstat
[...]
md1 : active raid5 sdi2[0](F) sde2[2]
488375808 blocks super 1.1 level 5, 64k chunk, algorithm 2 [2/1] [_U]
bitmap: 0/8 pages [0KB], 32768KB chunk
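[Editorial aside: the `(F)` flag in the mdstat line above is the first hint that md considers sdi2 faulty rather than merely renamed. A minimal sketch of picking such devices out of a /proc/mdstat component list; the helper name is made up and this is not part of mdadm:]

```python
import re

def failed_devices(mdstat_line):
    """Return device names marked (F) (faulty) in a /proc/mdstat
    component list such as 'md1 : active raid5 sdi2[0](F) sde2[2]'."""
    # each component looks like NAME[slot] optionally followed by (F)
    return re.findall(r'(\w+)\[\d+\]\(F\)', mdstat_line)

print(failed_devices("md1 : active raid5 sdi2[0](F) sde2[2]"))  # ['sdi2']
```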
$ mdadm -D /dev/md1
/dev/md1:
Version : 1.01
Creation Time : Sun Apr 12 14:19:47 2009
Raid Level : raid5
Array Size : 488375808 (465.75 GiB 500.10 GB)
Used Dev Size : 488375808 (465.75 GiB 500.10 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Tue Aug 11 01:40:15 2009
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : quassel:1 (local to host quassel)
UUID : e9226e7f:cbdad2a1:481ce05b:9444d71d
Events : 106
Number Major Minor RaidDevice State
0 0 0 0 removed
2 8 66 1 active sync /dev/sde2
0 8 130 - faulty spare /dev/sdi2
$ mdadm -E /dev/sde2
/dev/sde2:
Magic : a92b4efc
Version : 1.1
Feature Map : 0x1
Array UUID : e9226e7f:cbdad2a1:481ce05b:9444d71d
Name : quassel:1 (local to host quassel)
Creation Time : Sun Apr 12 14:19:47 2009
Raid Level : raid5
Raid Devices : 2
Avail Dev Size : 976751736 (465.75 GiB 500.10 GB)
Array Size : 976751616 (465.75 GiB 500.10 GB)
Used Dev Size : 976751616 (465.75 GiB 500.10 GB)
Data Offset : 264 sectors
Super Offset : 0 sectors
State : clean
Device UUID : 0fcc7d6d:0ec92b47:c371f8e6:bd7d2cac
Internal Bitmap : 2 sectors from superblock
Update Time : Tue Aug 11 01:40:18 2009
Checksum : 4290b585 - correct
Events : 108
Layout : left-symmetric
Chunk Size : 64K
Array Slot : 2 (failed, failed, 1)
Array State : _U 2 failed
$ mdadm -E /dev/sdi2
/dev/sdi2:
Magic : a92b4efc
Version : 1.1
Feature Map : 0x1
Array UUID : e9226e7f:cbdad2a1:481ce05b:9444d71d
Name : quassel:1 (local to host quassel)
Creation Time : Sun Apr 12 14:19:47 2009
Raid Level : raid5
Raid Devices : 2
Avail Dev Size : 976751736 (465.75 GiB 500.10 GB)
Array Size : 976751616 (465.75 GiB 500.10 GB)
Used Dev Size : 976751616 (465.75 GiB 500.10 GB)
Data Offset : 264 sectors
Super Offset : 0 sectors
State : clean
Device UUID : 5ba69d85:c46d6bb0:bf71606e:2877b067
Internal Bitmap : 2 sectors from superblock
Update Time : Mon Aug 10 15:32:23 2009
Checksum : 6db9f21 - correct
Events : 28
Layout : left-symmetric
Chunk Size : 64K
Array Slot : 0 (failed, failed, 1)
Array State : _u 2 failed
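[Editorial aside: the telling difference between the two dumps is the Events counter: 108 on sde2 versus only 28 on sdi2, so sdi2's superblock is far behind the array. A small illustrative comparison, assuming the `mdadm -E` text format shown above (the helper is hypothetical, not an mdadm feature):]

```python
import re

def events_count(examine_output):
    """Pull the Events counter out of `mdadm -E` style text."""
    m = re.search(r'^\s*Events : (\d+)', examine_output, re.MULTILINE)
    return int(m.group(1)) if m else None

sde2 = "Update Time : Tue Aug 11 01:40:18 2009\nEvents : 108\n"
sdi2 = "Update Time : Mon Aug 10 15:32:23 2009\nEvents : 28\n"
# a large gap means the member is stale relative to the array
print(events_count(sde2) - events_count(sdi2))  # 80
```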
* Re: Rebuild doesn't start
From: NeilBrown @ 2009-08-11 0:56 UTC (permalink / raw)
To: Oliver Martin; +Cc: linux-raid
On Tue, August 11, 2009 9:56 am, Oliver Martin wrote:
> Hello,
>
> I have two raid5 arrays spanning a number of USB drives. Yesterday, I
> unintentionally unplugged one of them while connecting another device
> to the same hub. The drive I unplugged used to be /dev/sdh, but when I
> plugged it back in, it became /dev/sdi. For md0, this didn't matter. I
> re-added it and it performed a rebuild* which completed successfully.
>
> md1, which used to consist of sde2 and sdh2, should now contain sde2
> and sdi2. For some reason, though, the rebuild doesn't start when I add
> sdi2. It seems md doesn't recognize sdi2 as the same device that used
> to be sdh2. Is that correct? How can I tell md about the name change?
If you look closely at the "mdadm -D" etc. output that you included,
you will see that md1 thinks that sdi2 is faulty. Maybe it is.
You would need to check kernel logs to be sure.
> [*] Bitmaps are enabled on both arrays, so I was somewhat surprised
> about the full rebuild; isn't that what bitmaps are supposed to prevent?
Yes, bitmaps should prevent a full rebuild. I would need to see
kernel logs of when this rebuild happened and "mdadm -D" of the
array to have any hope of guessing why it didn't.
NeilBrown
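[Editorial aside: for context on why a bitmap should avoid a full rebuild, a write-intent bitmap tracks which regions changed while a member was missing, so a re-add only has to resync those regions. A toy model of the idea; illustrative only, not md's actual data structures:]

```python
class IntentBitmap:
    """Toy write-intent bitmap: one dirty bit per chunk of the array."""

    def __init__(self, chunks):
        self.dirty = [False] * chunks

    def on_write(self, chunk):
        self.dirty[chunk] = True   # set before the write hits the disks

    def on_in_sync(self, chunk):
        self.dirty[chunk] = False  # cleared once all members agree again

    def chunks_to_resync(self):
        # on re-add, only these chunks need copying, not the whole disk
        return [i for i, d in enumerate(self.dirty) if d]

bm = IntentBitmap(8)
for c in (2, 5):                  # writes while one member was unplugged
    bm.on_write(c)
print(bm.chunks_to_resync())      # [2, 5] -- partial resync, not a full rebuild
```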
* Re: Rebuild doesn't start
From: Oliver Martin @ 2009-08-11 13:28 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On Tue, 11 Aug 2009 10:56:02 +1000 (EST), NeilBrown wrote:
> If you look closely at the "mdadm -D" etc. output that you included,
> you will see that md1 thinks that sdi2 is faulty. Maybe it is.
> You would need to check kernel logs to be sure.
I don't think the drive is bad. SMART values look OK, and md0 didn't
have any problem with re-adding sdi1.
I forgot another strange thing: while I could add sdi1 to md0 and the
rebuild succeeded, I couldn't add sdi2 to md1 until after a reboot. I
always got an error like this:
mdadm: add new device failed for /dev/sdi2: Device or resource busy
When all this happened, I was running 2.6.29.1. Afterwards, I tried
upgrading to 2.6.30.4 to see if that solved the problem, but nothing
changed.
> Yes, bitmaps should prevent a full rebuild. I would need to see
> kernel logs of when this rebuild happened and "mdadm -D" of the
> array to have any hope of guessing why it didn't.
$ mdadm -D /dev/md0
/dev/md0:
Version : 1.01
Creation Time : Sat Mar 15 13:28:07 2008
Raid Level : raid5
Array Size : 1953535232 (1863.04 GiB 2000.42 GB)
Used Dev Size : 488383808 (465.76 GiB 500.11 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon Aug 10 19:29:47 2009
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Name : quassel:0 (local to host quassel)
UUID : 1111b4fd:4219035a:f52968e6:cc4dd971
Events : 650394
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
3 8 97 2 active sync /dev/sdg1
4 8 129 3 active sync /dev/sdi1
5 8 65 4 active sync /dev/sde1
--- kernel log ---
21:58:14 usb 4-5.2.4: USB disconnect, address 13
21:58:28 usb 4-5.2.4: new high speed USB device using ehci_hcd and address 17
21:58:28 usb 4-5.2.4: configuration #1 chosen from 1 choice
21:58:28 scsi10 : SCSI emulation for USB Mass Storage devices
21:58:28 usb-storage: device found at 17
21:58:28 usb-storage: waiting for device to settle before scanning
21:58:33 usb-storage: device scan complete
21:58:33 scsi 10:0:0:0: Direct-Access WDC WD10 EACS-00D6B0 PQ: 0 ANSI: 2 CCS
21:58:33 sd 10:0:0:0: [sdi] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
21:58:33 sd 10:0:0:0: [sdi] Write Protect is off
21:58:33 sd 10:0:0:0: [sdi] Mode Sense: 00 38 00 00
21:58:33 sd 10:0:0:0: [sdi] Assuming drive cache: write through
21:58:33 sd 10:0:0:0: [sdi] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
21:58:33 sd 10:0:0:0: [sdi] Write Protect is off
21:58:33 sd 10:0:0:0: [sdi] Mode Sense: 00 38 00 00
21:58:33 sd 10:0:0:0: [sdi] Assuming drive cache: write through
21:58:33 sdi: sdi1 sdi2
21:58:33 sd 10:0:0:0: [sdi] Attached SCSI disk
21:58:33 sd 10:0:0:0: Attached scsi generic sg9 type 0
I think here I unmounted the file system and stopped the LVM device on the
array, but I'm not entirely sure. The initial 17 second delay suggests that
this is the first time the array was accessed after unplugging the drive,
since the drives were all spun down at the time.
22:03:57 md: md0 still in use.
22:03:57 md: md1 still in use.
22:03:57 md: md0 still in use.
22:03:57 md: md1 still in use.
22:04:14 end_request: I/O error, dev sdh, sector 2
22:04:14 md: super_written gets error=-5, uptodate=0
22:04:14 raid5: Disk failure on sdh1, disabling device.
22:04:14 raid5: Operation continuing on 4 devices.
22:04:14 RAID5 conf printout:
22:04:14 --- rd:5 wd:4
22:04:14 disk 0, o:1, dev:sdb1
22:04:14 disk 1, o:1, dev:sdd1
22:04:14 disk 2, o:1, dev:sdg1
22:04:14 disk 3, o:0, dev:sdh1
22:04:14 disk 4, o:1, dev:sde1
22:04:14 RAID5 conf printout:
22:04:14 --- rd:5 wd:4
22:04:14 disk 0, o:1, dev:sdb1
22:04:14 disk 1, o:1, dev:sdd1
22:04:14 disk 2, o:1, dev:sdg1
22:04:14 disk 4, o:1, dev:sde1
22:04:16 md: md0 still in use.
22:04:16 md: md1 still in use.
22:04:16 md: md0 still in use.
22:04:16 md: md1 still in use.
22:04:21 raid5: Disk failure on sdh2, disabling device.
22:04:21 raid5: Operation continuing on 1 devices.
22:04:21 RAID5 conf printout:
22:04:21 --- rd:2 wd:1
22:04:21 disk 0, o:0, dev:sdh2
22:04:21 disk 1, o:1, dev:sde2
22:04:21 RAID5 conf printout:
22:04:21 --- rd:2 wd:1
22:04:21 disk 1, o:1, dev:sde2
/etc/init.d/mdadm-raid stop
This is mdadm 2.6.8 from Debian lenny. That segfault probably shouldn't
have happened...
22:04:32 md: md0 stopped.
22:04:32 md: unbind<sdb1>
22:04:32 md: export_rdev(sdb1)
22:04:32 md: unbind<sde1>
22:04:32 md: export_rdev(sde1)
22:04:32 md: unbind<sdh1>
22:04:32 md: export_rdev(sdh1)
22:04:32 md: unbind<sdg1>
22:04:32 md: export_rdev(sdg1)
22:04:32 md: unbind<sdd1>
22:04:32 md: export_rdev(sdd1)
22:04:32 mdadm[18096]: segfault at 118 ip 0806a7b9 sp bffb8160 error 4 in mdadm[8048000+2a000]
/etc/init.d/mdadm-raid start
22:04:37 md: md0 stopped.
22:04:38 md: bind<sdd1>
22:04:38 md: bind<sdg1>
22:04:38 md: bind<sdi1>
22:04:38 md: bind<sde1>
22:04:38 md: bind<sdb1>
22:04:38 md: kicking non-fresh sdi1 from array!
22:04:38 md: unbind<sdi1>
22:04:38 md: export_rdev(sdi1)
22:04:38 raid5: device sdb1 operational as raid disk 0
22:04:38 raid5: device sde1 operational as raid disk 4
22:04:38 raid5: device sdg1 operational as raid disk 2
22:04:38 raid5: device sdd1 operational as raid disk 1
22:04:38 raid5: allocated 5255kB for md0
22:04:38 raid5: raid level 5 set md0 active with 4 out of 5 devices, algorithm 2
22:04:38 RAID5 conf printout:
22:04:38 --- rd:5 wd:4
22:04:38 disk 0, o:1, dev:sdb1
22:04:38 disk 1, o:1, dev:sdd1
22:04:38 disk 2, o:1, dev:sdg1
22:04:38 disk 4, o:1, dev:sde1
22:04:38 md0: bitmap initialized from disk: read 1/1 pages, set 1 bits
22:04:38 created bitmap (8 pages) for device md0
22:04:38 md0: detected capacity change from 0 to 2000420077568
22:04:38 md0: unknown partition table
mdadm /dev/md0 -a /dev/sdi1
22:05:21 md: bind<sdi1>
22:05:21 RAID5 conf printout:
22:05:21 --- rd:5 wd:4
22:05:21 disk 0, o:1, dev:sdb1
22:05:21 disk 1, o:1, dev:sdd1
22:05:21 disk 2, o:1, dev:sdg1
22:05:21 disk 3, o:1, dev:sdi1
22:05:21 disk 4, o:1, dev:sde1
22:05:21 md: recovery of RAID array md0
22:05:21 md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
22:05:21 md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
22:05:21 md: using 128k window, over a total of 488383808 blocks.
This is probably where I tried to add sdi2 to md1 without any luck.
22:05:54 md: export_rdev(sdi2)
22:05:55 md: export_rdev(sdi2)