* raid10 regression: unrecoverable raids
From: Jes Sorensen @ 2012-03-19 10:59 UTC
To: Brown, Neil, linux-raid@vger.kernel.org
Hi,
commit 2bb77736ae5dca0a189829fbb7379d43364a9dac
Author: NeilBrown <neilb@suse.de>
Date: Wed Jul 27 11:00:36 2011 +1000
md/raid10: Make use of new recovery_disabled handling
This caused a serious regression, making it impossible to recover certain
o2-layout raid10 arrays once they enter a double-degraded state.
If I create an array like this:
[root@monkeybay ~]# mdadm --create /dev/md25 --raid-devices=4 --chunk=512
--level=raid10 --layout=o2 --assume-clean /dev/sda4 missing missing
/dev/sdd4
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md25 started.
Then adding a spare like this:
[root@monkeybay ~]# mdadm -a /dev/md25 /dev/sdb4
mdadm: added /dev/sdb4
The spare ends up being added into slot 4 rather than into the empty
slot 1 and the array never rebuilds.
[root@monkeybay ~]# mdadm --detail /dev/md25
/dev/md25:
Version : 1.2
Creation Time : Mon Mar 19 12:52:52 2012
Raid Level : raid10
Array Size : 39059456 (37.25 GiB 40.00 GB)
Used Dev Size : 19529728 (18.63 GiB 20.00 GB)
Raid Devices : 4
Total Devices : 3
Persistence : Superblock is persistent
Update Time : Mon Mar 19 12:52:56 2012
State : clean, degraded
Active Devices : 2
Working Devices : 3
Failed Devices : 0
Spare Devices : 1
Layout : offset=2
Chunk Size : 512K
Name : monkeybay:25 (local to host monkeybay)
UUID : afbf95cf:7015f3ff:a788bd4d:03b0fe32
Events : 7
Number Major Minor RaidDevice State
0 8 4 0 active sync /dev/sda4
1 0 0 1 removed
2 0 0 2 removed
3 8 52 3 active sync /dev/sdd4
4 8 20 - spare /dev/sdb4
[root@monkeybay ~]#
This only seems to happen with o2 arrays, whereas n2 ones rebuild fine.
I can reproduce the problem if I fail drives 0 and 3 or 1 and 2. Failing
1 and 3 or 2 and 4 works. The problem shows up both when creating the array
as above and when creating it with all four drives and then failing them.
I have been staring at this for a while, but it isn't quite obvious to
me whether it is the recovery procedure that doesn't handle the double
gap properly or whether it is the re-add that doesn't take the o2 layout
into account properly.
This is a fairly serious bug as once a raid hits this state, it is no
longer possible to rebuild it even by adding more drives :(
Neil, any idea what went wrong with the new bad block handling code in
this case?
Cheers,
Jes
dmesg output:
md: bind<sda4>
md: bind<sdd4>
md/raid10:md25: active with 2 out of 4 devices
md25: detected capacity change from 0 to 39996882944
md25:
md: bind<sdb4>
RAID10 conf printout:
--- wd:2 rd:4
disk 0, wo:0, o:1, dev:sda4
disk 1, wo:1, o:1, dev:sdb4
disk 3, wo:0, o:1, dev:sdd4
md: recovery of RAID array md25
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000
KB/sec) for recovery.
md: using 128k window, over a total of 19529728k.
md/raid10:md25: insufficient working devices for recovery.
md: md25: recovery done.
RAID10 conf printout:
--- wd:2 rd:4
disk 0, wo:0, o:1, dev:sda4
disk 1, wo:1, o:1, dev:sdb4
disk 3, wo:0, o:1, dev:sdd4
RAID10 conf printout:
--- wd:2 rd:4
disk 0, wo:0, o:1, dev:sda4
disk 3, wo:0, o:1, dev:sdd4
RAID10 conf printout:
--- wd:2 rd:4
disk 0, wo:0, o:1, dev:sda4
disk 2, wo:1, o:1, dev:sdb4
disk 3, wo:0, o:1, dev:sdd4
md: recovery of RAID array md25
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000
KB/sec) for recovery.
md: using 128k window, over a total of 19529728k.
md/raid10:md25: insufficient working devices for recovery.
md: md25: recovery done.
RAID10 conf printout:
--- wd:2 rd:4
disk 0, wo:0, o:1, dev:sda4
disk 2, wo:1, o:1, dev:sdb4
disk 3, wo:0, o:1, dev:sdd4
RAID10 conf printout:
--- wd:2 rd:4
disk 0, wo:0, o:1, dev:sda4
disk 3, wo:0, o:1, dev:sdd4
RAID10 conf printout:
--- wd:2 rd:4
disk 0, wo:0, o:1, dev:sda4
disk 3, wo:0, o:1, dev:sdd4
* Re: raid10 regression: unrecoverable raids
From: NeilBrown @ 2012-03-19 11:08 UTC
To: Jes Sorensen; +Cc: linux-raid@vger.kernel.org
On Mon, 19 Mar 2012 11:59:55 +0100 Jes Sorensen <Jes.Sorensen@redhat.com>
wrote:
> Hi,
>
> commit 2bb77736ae5dca0a189829fbb7379d43364a9dac
> Author: NeilBrown <neilb@suse.de>
> Date: Wed Jul 27 11:00:36 2011 +1000
>
> md/raid10: Make use of new recovery_disabled handling
>
> This caused a serious regression, making it impossible to recover certain
> o2-layout raid10 arrays once they enter a double-degraded state.
>
> If I create an array like this:
>
> [root@monkeybay ~]# mdadm --create /dev/md25 --raid-devices=4 --chunk=512
> --level=raid10 --layout=o2 --assume-clean /dev/sda4 missing missing
> /dev/sdd4
o2 places data thus:
A B C D
D A B C
where columns are devices.
You've created an array with no place to store B.
mdadm really shouldn't let you do that. That is the bug.
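Roughly, as a toy model in Python (it only tracks which device holds each
copy of a chunk on a 4-device, 2-copy array; it is not the kernel's real
geometry code, which also handles the chunk offsets):

    # Toy model of raid10 copy placement, 4 devices, 2 copies per chunk.
    # Illustration only, not the arithmetic from drivers/md/raid10.c.
    def copies(chunk, layout, ndisks=4):
        if layout == 'n2':               # near-2: fixed adjacent mirror pairs
            first = (chunk * 2) % ndisks
            return {first, (first + 1) % ndisks}
        return {chunk % ndisks, (chunk + 1) % ndisks}   # offset-2: copy rotates

    present = {0, 3}                     # sda4 and sdd4; slots 1 and 2 missing
    for layout in ('n2', 'o2'):
        for chunk in range(4):
            alive = sorted(copies(chunk, layout) & present)
            print(layout, 'chunk', chunk, 'surviving copies on', alive or 'NONE')

With slots 1 and 2 missing, n2 keeps at least one copy of every chunk, but
under o2 chunk 1 (the "B" above) has none left, so there is nothing to
recover from.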
> mdadm: Defaulting to version 1.2 metadata
> mdadm: array /dev/md25 started.
>
> Then adding a spare like this:
> [root@monkeybay ~]# mdadm -a /dev/md25 /dev/sdb4
> mdadm: added /dev/sdb4
>
> The spare ends up being added into slot 4 rather than into the empty
> slot 1 and the array never rebuilds.
How could it rebuild? There is nowhere to get B from.
I'm surprised this ever "worked"... but maybe I'm missing something.
NeilBrown
>
> [root@monkeybay ~]# mdadm --detail /dev/md25
> /dev/md25:
> Version : 1.2
> Creation Time : Mon Mar 19 12:52:52 2012
> Raid Level : raid10
> Array Size : 39059456 (37.25 GiB 40.00 GB)
> Used Dev Size : 19529728 (18.63 GiB 20.00 GB)
> Raid Devices : 4
> Total Devices : 3
> Persistence : Superblock is persistent
>
> Update Time : Mon Mar 19 12:52:56 2012
> State : clean, degraded
> Active Devices : 2
> Working Devices : 3
> Failed Devices : 0
> Spare Devices : 1
>
> Layout : offset=2
> Chunk Size : 512K
>
> Name : monkeybay:25 (local to host monkeybay)
> UUID : afbf95cf:7015f3ff:a788bd4d:03b0fe32
> Events : 7
>
> Number Major Minor RaidDevice State
> 0 8 4 0 active sync /dev/sda4
> 1 0 0 1 removed
> 2 0 0 2 removed
> 3 8 52 3 active sync /dev/sdd4
>
> 4 8 20 - spare /dev/sdb4
> [root@monkeybay ~]#
>
> This only seems to happen with o2 arrays, whereas n2 ones rebuild fine.
> I can reproduce the problem if I fail drives 0 and 3 or 1 and 2. Failing
> 1 and 3 or 2 and 4 works. The problem shows up both when creating the array
> as above and when creating it with all four drives and then failing them.
>
> I have been staring at this for a while, but it isn't quite obvious to
> me whether it is the recovery procedure that doesn't handle the double
> gap properly or whether it is the re-add that doesn't take the o2 layout
> into account properly.
>
> This is a fairly serious bug as once a raid hits this state, it is no
> longer possible to rebuild it even by adding more drives :(
>
> Neil, any idea what went wrong with the new bad block handling code in
> this case?
>
> Cheers,
> Jes
>
> dmesg output:
> md: bind<sda4>
> md: bind<sdd4>
> md/raid10:md25: active with 2 out of 4 devices
> md25: detected capacity change from 0 to 39996882944
> md25:
> md: bind<sdb4>
> RAID10 conf printout:
> --- wd:2 rd:4
> disk 0, wo:0, o:1, dev:sda4
> disk 1, wo:1, o:1, dev:sdb4
> disk 3, wo:0, o:1, dev:sdd4
> md: recovery of RAID array md25
> md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000
> KB/sec) for recovery.
> md: using 128k window, over a total of 19529728k.
> md/raid10:md25: insufficient working devices for recovery.
> md: md25: recovery done.
> RAID10 conf printout:
> --- wd:2 rd:4
> disk 0, wo:0, o:1, dev:sda4
> disk 1, wo:1, o:1, dev:sdb4
> disk 3, wo:0, o:1, dev:sdd4
> RAID10 conf printout:
> --- wd:2 rd:4
> disk 0, wo:0, o:1, dev:sda4
> disk 3, wo:0, o:1, dev:sdd4
> RAID10 conf printout:
> --- wd:2 rd:4
> disk 0, wo:0, o:1, dev:sda4
> disk 2, wo:1, o:1, dev:sdb4
> disk 3, wo:0, o:1, dev:sdd4
> md: recovery of RAID array md25
> md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000
> KB/sec) for recovery.
> md: using 128k window, over a total of 19529728k.
> md/raid10:md25: insufficient working devices for recovery.
> md: md25: recovery done.
> RAID10 conf printout:
> --- wd:2 rd:4
> disk 0, wo:0, o:1, dev:sda4
> disk 2, wo:1, o:1, dev:sdb4
> disk 3, wo:0, o:1, dev:sdd4
> RAID10 conf printout:
> --- wd:2 rd:4
> disk 0, wo:0, o:1, dev:sda4
> disk 3, wo:0, o:1, dev:sdd4
> RAID10 conf printout:
> --- wd:2 rd:4
> disk 0, wo:0, o:1, dev:sda4
> disk 3, wo:0, o:1, dev:sdd4
* Re: raid10 regression: unrecoverable raids
From: Jes Sorensen @ 2012-03-19 11:15 UTC
To: NeilBrown; +Cc: linux-raid@vger.kernel.org
On 03/19/12 12:08, NeilBrown wrote:
> On Mon, 19 Mar 2012 11:59:55 +0100 Jes Sorensen <Jes.Sorensen@redhat.com>
> wrote:
>
>> Hi,
>>
>> commit 2bb77736ae5dca0a189829fbb7379d43364a9dac
>> Author: NeilBrown <neilb@suse.de>
>> Date: Wed Jul 27 11:00:36 2011 +1000
>>
>> md/raid10: Make use of new recovery_disabled handling
>>
>> This caused a serious regression, making it impossible to recover certain
>> o2-layout raid10 arrays once they enter a double-degraded state.
>>
>> If I create an array like this:
>>
>> [root@monkeybay ~]# mdadm --create /dev/md25 --raid-devices=4 --chunk=512
>> --level=raid10 --layout=o2 --assume-clean /dev/sda4 missing missing
>> /dev/sdd4
>
> o2 places data thus:
>
> A B C D
> D A B C
>
> where columns are devices.
>
> You've created an array with no place to store B.
> mdadm really shouldn't let you do that. That is the bug.
Here I was thinking it would rely on alien storage that would get
swapped in magically when something was missing ;)
Actually I thought of raid10 here as operating more like two raid1s
concatenated.
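That picture does hold for n2, where the mirror pairs stay fixed at (0,1)
and (2,3), but with o2 the second copy rotates onto the next device, so the
copy pairs are (0,1), (1,2), (2,3) and (3,0); losing two adjacent devices
then loses one chunk entirely, at least in this simplified picture. A quick
check, using the same sort of toy placement as your diagram rather than
anything from mdadm or the kernel:

    ndisks = 4
    n2 = sorted({((c * 2) % ndisks, (c * 2 + 1) % ndisks) for c in range(ndisks)})
    o2 = sorted({(c % ndisks, (c + 1) % ndisks) for c in range(ndisks)})
    print('n2 mirror pairs:', n2)   # [(0, 1), (2, 3)]  i.e. two raid1s, striped
    print('o2 copy pairs:  ', o2)   # [(0, 1), (1, 2), (2, 3), (3, 0)]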
>> mdadm: Defaulting to version 1.2 metadata
>> mdadm: array /dev/md25 started.
>>
>> Then adding a spare like this:
>> [root@monkeybay ~]# mdadm -a /dev/md25 /dev/sdb4
>> mdadm: added /dev/sdb4
>>
>> The spare ends up being added into slot 4 rather than into the empty
>> slot 1 and the array never rebuilds.
>
> How could it rebuild? There is nowhere to get B from.
>
> I'm surprised this ever "worked"... but maybe I'm missing something.
Well, it seems to be more -ENOCLUE on my side here :) Should we do
something in mdadm to prevent creating an array this way?
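Something along these lines, perhaps; this is just a sketch of the check in
Python rather than actual mdadm code, with copy_devices() standing in for
whatever layout helper the real code would use:

    # Sketch: refuse --create if some chunk would end up with no present
    # copy at all, i.e. every device that would hold a copy is "missing".
    def copy_devices(chunk, layout, ndisks):
        # toy placement for the 2-copy layouts, as in the earlier example
        if layout == 'n2':
            first = (chunk * 2) % ndisks
            return {first, (first + 1) % ndisks}
        return {chunk % ndisks, (chunk + 1) % ndisks}   # 'o2'

    def creatable(layout, ndisks, missing_slots):
        present = set(range(ndisks)) - set(missing_slots)
        # one period of the layout is enough to check for these simple cases
        return all(copy_devices(c, layout, ndisks) & present
                   for c in range(ndisks))

    print(creatable('o2', 4, [1, 2]))   # False: chunk 1 would have no copy
    print(creatable('n2', 4, [1, 2]))   # True: degraded but recoverable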
Cheers,
Jes