From: Jes Sorensen <Jes.Sorensen@redhat.com>
To: "Brown, Neil" <neilb@suse.de>,
"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: raid10 regression: unrecoverable raids
Date: Mon, 19 Mar 2012 11:59:55 +0100
Message-ID: <4F6711AB.7010906@redhat.com>
Hi,
commit 2bb77736ae5dca0a189829fbb7379d43364a9dac
Author: NeilBrown <neilb@suse.de>
Date: Wed Jul 27 11:00:36 2011 +1000
md/raid10: Make use of new recovery_disabled handling
Caused a serious regression that makes it impossible to recover certain
o2-layout raid10 arrays once they enter a double-degraded state.
If I create an array like this:
[root@monkeybay ~]# mdadm --create /dev/md25 --raid-devices=4 --chunk=512
--level=raid10 --layout=o2 --assume-clean /dev/sda4 missing missing
/dev/sdd4
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md25 started.
Then adding a spare like this:
[root@monkeybay ~]# mdadm -a /dev/md25 /dev/sdb4
mdadm: added /dev/sdb4
The spare ends up being added in slot 4 rather than in the empty
slot 1, and the array never rebuilds (see the sketch after the
--detail output below):
[root@monkeybay ~]# mdadm --detail /dev/md25
/dev/md25:
Version : 1.2
Creation Time : Mon Mar 19 12:52:52 2012
Raid Level : raid10
Array Size : 39059456 (37.25 GiB 40.00 GB)
Used Dev Size : 19529728 (18.63 GiB 20.00 GB)
Raid Devices : 4
Total Devices : 3
Persistence : Superblock is persistent
Update Time : Mon Mar 19 12:52:56 2012
State : clean, degraded
Active Devices : 2
Working Devices : 3
Failed Devices : 0
Spare Devices : 1
Layout : offset=2
Chunk Size : 512K
Name : monkeybay:25 (local to host monkeybay)
UUID : afbf95cf:7015f3ff:a788bd4d:03b0fe32
Events : 7
Number Major Minor RaidDevice State
0 8 4 0 active sync /dev/sda4
1 0 0 1 removed
2 0 0 2 removed
3 8 52 3 active sync /dev/sdd4
4 8 20 - spare /dev/sdb4
[root@monkeybay ~]#
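For what it's worth, my reading of the spare-placement loop in
raid10_add_disk() after the commit above is roughly this (simplified
and paraphrased from memory, not verbatim kernel code):

	for (mirror = first; mirror <= last; mirror++) {
		struct mirror_info *p = &conf->mirrors[mirror];

		/* A slot whose recovery_disabled stamp matches the
		 * array-wide value had a recently failed recovery
		 * and is skipped. */
		if (p->recovery_disabled == mddev->recovery_disabled)
			continue;
		if (p->rdev)
			continue;	/* slot already occupied */

		/* otherwise this slot gets the new device */
		...
	}

If every aborted recovery stamps the slot it was rebuilding, that would
match the dmesg below, where sdb4 is tried in slot 1, then in slot 2,
and finally left parked as an unused spare.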
This only seems to happen with o2 arrays, whereas n2 ones rebuild fine.
I can reproduce the problem if I fail drives 0 and 3 or 1 and 2. Failing
1 and 3 or 2 and 4 works. The problem shows up both when creating the
array degraded as above and when creating it with all four drives and
then failing them.
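For reference, here is a toy model of how I understand the two layouts
place their copies on four disks. This is only my mental model, not the
kernel's mapping code from drivers/md/raid10.c:

	/* Toy model of raid10 copy placement on 4 disks, n2 vs o2. */
	#include <stdio.h>

	#define DISKS 4

	int main(void)
	{
		int c;

		for (c = 0; c < DISKS; c++) {
			/* n2: both copies sit in the same stripe, on a
			 * fixed pair of adjacent disks. */
			int n = (c * 2) % DISKS;

			/* o2: the copy is written on the next row,
			 * one disk to the right of the primary. */
			int o = c % DISKS;

			printf("chunk %d: n2 on disks %d,%d  o2 on disks %d,%d\n",
			       c, n, (n + 1) % DISKS,
			       o, (o + 1) % DISKS);
		}
		return 0;
	}

With n2 the copy pairs are fixed (0/1 and 2/3), while with o2 every
disk shares chunks with both of its neighbours, so the set of disk
pairs that can fail together differs between the two layouts.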
I have been staring at this for a while, but it isn't obvious to me
whether the recovery procedure fails to handle the double gap properly,
or whether the re-add path doesn't take the o2 layout into account.
This is a fairly serious bug, as once an array hits this state it is no
longer possible to rebuild it even by adding more drives :(
Neil, any idea what went wrong with the new bad block handling code in
this case?
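In the dmesg below every recovery attempt aborts with "insufficient
working devices for recovery". My understanding of that path (again
paraphrased from my reading of drivers/md/raid10.c, not verbatim) is
that for each chunk being rebuilt the code looks for any in-sync copy
to read from, and gives up if none exists:

	/* Simplified sketch of the recovery source search. */
	for (j = 0; j < conf->copies; j++) {
		int d = r10_bio->devs[j].devnum;
		if (conf->mirrors[d].rdev &&
		    test_bit(In_sync, &conf->mirrors[d].rdev->flags))
			break;	/* found a copy to read from */
	}
	if (j == conf->copies) {
		/* no readable copy anywhere: abort the recovery */
		printk(KERN_INFO "md/raid10:%s: insufficient working "
		       "devices for recovery.\n", mdname(mddev));
	}

So it looks like the source search never finds a valid copy for some
chunks here, though I can't tell whether that is correct for the o2
geometry or a bug in the mapping.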
Cheers,
Jes
dmesg output:
md: bind<sda4>
md: bind<sdd4>
md/raid10:md25: active with 2 out of 4 devices
md25: detected capacity change from 0 to 39996882944
md25:
md: bind<sdb4>
RAID10 conf printout:
--- wd:2 rd:4
disk 0, wo:0, o:1, dev:sda4
disk 1, wo:1, o:1, dev:sdb4
disk 3, wo:0, o:1, dev:sdd4
md: recovery of RAID array md25
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000
KB/sec) for recovery.
md: using 128k window, over a total of 19529728k.
md/raid10:md25: insufficient working devices for recovery.
md: md25: recovery done.
RAID10 conf printout:
--- wd:2 rd:4
disk 0, wo:0, o:1, dev:sda4
disk 1, wo:1, o:1, dev:sdb4
disk 3, wo:0, o:1, dev:sdd4
RAID10 conf printout:
--- wd:2 rd:4
disk 0, wo:0, o:1, dev:sda4
disk 3, wo:0, o:1, dev:sdd4
RAID10 conf printout:
--- wd:2 rd:4
disk 0, wo:0, o:1, dev:sda4
disk 2, wo:1, o:1, dev:sdb4
disk 3, wo:0, o:1, dev:sdd4
md: recovery of RAID array md25
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000
KB/sec) for recovery.
md: using 128k window, over a total of 19529728k.
md/raid10:md25: insufficient working devices for recovery.
md: md25: recovery done.
RAID10 conf printout:
--- wd:2 rd:4
disk 0, wo:0, o:1, dev:sda4
disk 2, wo:1, o:1, dev:sdb4
disk 3, wo:0, o:1, dev:sdd4
RAID10 conf printout:
--- wd:2 rd:4
disk 0, wo:0, o:1, dev:sda4
disk 3, wo:0, o:1, dev:sdd4
RAID10 conf printout:
--- wd:2 rd:4
disk 0, wo:0, o:1, dev:sda4
disk 3, wo:0, o:1, dev:sdd4