From: NeilBrown <neilb@suse.de>
To: Jes Sorensen <Jes.Sorensen@redhat.com>
Cc: "linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: raid10 regression: unrecoverable raids
Date: Mon, 19 Mar 2012 22:08:01 +1100
Message-ID: <20120319220801.23671fc5@notabene.brown>
In-Reply-To: <4F6711AB.7010906@redhat.com>
On Mon, 19 Mar 2012 11:59:55 +0100 Jes Sorensen <Jes.Sorensen@redhat.com>
wrote:
> Hi,
>
> commit 2bb77736ae5dca0a189829fbb7379d43364a9dac
> Author: NeilBrown <neilb@suse.de>
> Date: Wed Jul 27 11:00:36 2011 +1000
>
> md/raid10: Make use of new recovery_disabled handling
>
> Caused a serious regression making it impossible to recover certain o2
> layout raid10 arrays if they enter a double degraded state.
>
> If I create an array like this:
>
> [root@monkeybay ~]# mdadm --create /dev/md25 --raid-devices=4 --chunk=512
> --level=raid10 --layout=o2 --assume-clean /dev/sda4 missing missing
> /dev/sdd4
o2 places data thus:
A B C D
D A B C
where columns are devices.
You've created an array with no place to store B.
mdadm really shouldn't let you do that. That is the bug.
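To make that concrete, here is a rough Python sketch of the placement rule implied by the diagram above: within one stripe, copy k of chunk c lands on device (c + k) mod 4. This is a simplified model for illustration only, not the kernel's raid10 code, and the function names are made up for this example.

```python
def offset_layout_devices(chunk, raid_disks, copies=2):
    """Device slots holding each copy of `chunk` within one stripe.

    Assumed model of the o2 diagram: copy k is shifted k devices to the right.
    """
    return [(chunk + k) % raid_disks for k in range(copies)]


def unrecoverable_chunks(present, raid_disks, copies=2):
    """Chunks (per stripe) that have no copy on any present device slot."""
    present = set(present)
    return [
        c
        for c in range(raid_disks)
        if not present.intersection(offset_layout_devices(c, raid_disks, copies))
    ]


if __name__ == "__main__":
    labels = "ABCD"
    # Slots 0 (sda4) and 3 (sdd4) present, slots 1 and 2 missing.
    lost = unrecoverable_chunks(present=[0, 3], raid_disks=4)
    print([labels[c] for c in lost])  # prints ['B']
```

Under the same model, an array with only slots 0 and 2 present loses nothing, which would match the report that failing two non-adjacent devices still rebuilds.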
> mdadm: Defaulting to version 1.2 metadata
> mdadm: array /dev/md25 started.
>
> Then adding a spare like this:
> [root@monkeybay ~]# mdadm -a /dev/md25 /dev/sdb4
> mdadm: added /dev/sdb4
>
> The spare ends up being added into slot 4 rather than into the empty
> slot 1 and the array never rebuilds.
How could it rebuild? There is nowhere to get B from.
I'm surprised this ever "worked"... but maybe I'm missing something.
NeilBrown
>
> [root@monkeybay ~]# mdadm --detail /dev/md25
> /dev/md25:
> Version : 1.2
> Creation Time : Mon Mar 19 12:52:52 2012
> Raid Level : raid10
> Array Size : 39059456 (37.25 GiB 40.00 GB)
> Used Dev Size : 19529728 (18.63 GiB 20.00 GB)
> Raid Devices : 4
> Total Devices : 3
> Persistence : Superblock is persistent
>
> Update Time : Mon Mar 19 12:52:56 2012
> State : clean, degraded
> Active Devices : 2
> Working Devices : 3
> Failed Devices : 0
> Spare Devices : 1
>
> Layout : offset=2
> Chunk Size : 512K
>
> Name : monkeybay:25 (local to host monkeybay)
> UUID : afbf95cf:7015f3ff:a788bd4d:03b0fe32
> Events : 7
>
> Number Major Minor RaidDevice State
> 0 8 4 0 active sync /dev/sda4
> 1 0 0 1 removed
> 2 0 0 2 removed
> 3 8 52 3 active sync /dev/sdd4
>
> 4 8 20 - spare /dev/sdb4
> [root@monkeybay ~]#
>
> This only seems to happen with o2 arrays, whereas n2 ones rebuild fine.
> I can reproduce the problem if I fail drives 0 and 3 or 1 and 2. Failing
> 1 and 3 or 2 and 4 works. The problem shows both when creating the array
> as above, or if creating it with all four drives and then failing them.
>
> I have been staring at this for a while, but it isn't quite obvious to
> me whether it is the recovery procedure that doesn't handle the double
> gap properly or whether it is the re-add that doesn't take the o2 layout
> into account properly.
>
> This is a fairly serious bug as once a raid hits this state, it is no
> longer possible to rebuild it even by adding more drives :(
>
> Neil, any idea what went wrong with the new bad block handling code in
> this case?
>
> Cheers,
> Jes
>
> dmesg output:
> md: bind<sda4>
> md: bind<sdd4>
> md/raid10:md25: active with 2 out of 4 devices
> md25: detected capacity change from 0 to 39996882944
> md25:
> md: bind<sdb4>
> RAID10 conf printout:
> --- wd:2 rd:4
> disk 0, wo:0, o:1, dev:sda4
> disk 1, wo:1, o:1, dev:sdb4
> disk 3, wo:0, o:1, dev:sdd4
> md: recovery of RAID array md25
> md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000
> KB/sec) for recovery.
> md: using 128k window, over a total of 19529728k.
> md/raid10:md25: insufficient working devices for recovery.
> md: md25: recovery done.
> RAID10 conf printout:
> --- wd:2 rd:4
> disk 0, wo:0, o:1, dev:sda4
> disk 1, wo:1, o:1, dev:sdb4
> disk 3, wo:0, o:1, dev:sdd4
> RAID10 conf printout:
> --- wd:2 rd:4
> disk 0, wo:0, o:1, dev:sda4
> disk 3, wo:0, o:1, dev:sdd4
> RAID10 conf printout:
> --- wd:2 rd:4
> disk 0, wo:0, o:1, dev:sda4
> disk 2, wo:1, o:1, dev:sdb4
> disk 3, wo:0, o:1, dev:sdd4
> md: recovery of RAID array md25
> md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> md: using maximum available idle IO bandwidth (but not more than 200000
> KB/sec) for recovery.
> md: using 128k window, over a total of 19529728k.
> md/raid10:md25: insufficient working devices for recovery.
> md: md25: recovery done.
> RAID10 conf printout:
> --- wd:2 rd:4
> disk 0, wo:0, o:1, dev:sda4
> disk 2, wo:1, o:1, dev:sdb4
> disk 3, wo:0, o:1, dev:sdd4
> RAID10 conf printout:
> --- wd:2 rd:4
> disk 0, wo:0, o:1, dev:sda4
> disk 3, wo:0, o:1, dev:sdd4
> RAID10 conf printout:
> --- wd:2 rd:4
> disk 0, wo:0, o:1, dev:sda4
> disk 3, wo:0, o:1, dev:sdd4