linux-raid.vger.kernel.org archive mirror
* mdraid10 regression in 2.6.27.4 (possibly earlier)
@ 2008-11-02 11:27 Peter Rabbitson
  2008-11-02 17:37 ` md raid10 " Thomas Backlund
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Rabbitson @ 2008-11-02 11:27 UTC (permalink / raw)
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 394 bytes --]

Hi,

Some weeks ago I upgraded from 2.6.23 to 2.6.27.4. After a failed hard
drive I realized that re-adding drives to a degraded raid10 no longer
works (it adds the drive as a spare and never starts a resync). Booting
back into the old .23 kernel allowed me to complete and resync the array
as usual. Attached find a test case reliably failing on vanilla 2.6.27.4
with no patches.

Thank you



[-- Attachment #2: raid_test_2.6.27.4 --]
[-- Type: text/plain, Size: 511 bytes --]

#!/bin/bash

set -e
[ -e /dev/loop1 ] || modprobe loop

for i in 1 2 3 4; do
    dd if=/dev/zero of=blkloop_$i bs=10M count=1
    losetup /dev/loop$i blkloop_$i
done

mdadm -C /dev/md7 -n 4 -l 10 -p f3 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4

# wait for sync
sleep 2

mdadm -f /dev/md7 /dev/loop1
mdadm -r /dev/md7 /dev/loop1

mdadm -a /dev/md7 /dev/loop1
for i in 1 2 3 4; do
    cat /proc/mdstat
    sleep 2
done


mdadm -S /dev/md7
for i in 1 2 3 4; do
    losetup -d /dev/loop$i
    rm blkloop_$i
done



* Re: mdraid10 regression in 2.6.27.4 (possibly earlier)
@ 2008-11-02 17:33 George Spelvin
  2008-11-03  8:30 ` George Spelvin
  0 siblings, 1 reply; 10+ messages in thread
From: George Spelvin @ 2008-11-02 17:33 UTC (permalink / raw)
  To: linux-raid; +Cc: linux

I'd just like to note that I have the same problem with 2.6.27.
/proc/mdstat is stuck saying
      
md4 : active raid10 sde3[6](S) sdf3[4] sdd3[2] sdc3[1] sdb3[0] sda3[5]
      131837184 blocks 256K chunks 2 near-copies [6/5] [UUU_UU]
      bitmap: 31/126 pages [124KB], 512KB chunk

while the kernel complains every second or so:
Nov  2 17:28:32: RAID10 conf printout:
Nov  2 17:28:32:  --- wd:5 rd:6
Nov  2 17:28:32:  disk 0, wo:0, o:1, dev:sdb3
Nov  2 17:28:32:  disk 1, wo:0, o:1, dev:sdc3
Nov  2 17:28:32:  disk 2, wo:0, o:1, dev:sdd3
Nov  2 17:28:32:  disk 4, wo:0, o:1, dev:sdf3
Nov  2 17:28:32:  disk 5, wo:0, o:1, dev:sda3
Nov  2 17:28:33: RAID10 conf printout:
Nov  2 17:28:33:  --- wd:5 rd:6
Nov  2 17:28:33:  disk 0, wo:0, o:1, dev:sdb3
Nov  2 17:28:33:  disk 1, wo:0, o:1, dev:sdc3
Nov  2 17:28:33:  disk 2, wo:0, o:1, dev:sdd3
Nov  2 17:28:33:  disk 4, wo:0, o:1, dev:sdf3
Nov  2 17:28:33:  disk 5, wo:0, o:1, dev:sda3
Nov  2 17:28:38: RAID10 conf printout:
Nov  2 17:28:38:  --- wd:5 rd:6
Nov  2 17:28:38:  disk 0, wo:0, o:1, dev:sdb3
Nov  2 17:28:38:  disk 1, wo:0, o:1, dev:sdc3
Nov  2 17:28:38:  disk 2, wo:0, o:1, dev:sdd3
Nov  2 17:28:38:  disk 4, wo:0, o:1, dev:sdf3
Nov  2 17:28:38:  disk 5, wo:0, o:1, dev:sda3
Nov  2 17:28:38: RAID10 conf printout:
Nov  2 17:28:38:  --- wd:5 rd:6
Nov  2 17:28:38:  disk 0, wo:0, o:1, dev:sdb3
Nov  2 17:28:38:  disk 1, wo:0, o:1, dev:sdc3
Nov  2 17:28:38:  disk 2, wo:0, o:1, dev:sdd3
Nov  2 17:28:38:  disk 4, wo:0, o:1, dev:sdf3
Nov  2 17:28:38:  disk 5, wo:0, o:1, dev:sda3
Nov  2 17:28:43: RAID10 conf printout:
Nov  2 17:28:43:  --- wd:5 rd:6
Nov  2 17:28:43:  disk 0, wo:0, o:1, dev:sdb3
Nov  2 17:28:43:  disk 1, wo:0, o:1, dev:sdc3
Nov  2 17:28:43:  disk 2, wo:0, o:1, dev:sdd3
Nov  2 17:28:43:  disk 4, wo:0, o:1, dev:sdf3
Nov  2 17:28:43:  disk 5, wo:0, o:1, dev:sda3
Nov  2 17:28:44: RAID10 conf printout:
Nov  2 17:28:44:  --- wd:5 rd:6
Nov  2 17:28:44:  disk 0, wo:0, o:1, dev:sdb3
Nov  2 17:28:44:  disk 1, wo:0, o:1, dev:sdc3
Nov  2 17:28:44:  disk 2, wo:0, o:1, dev:sdd3
Nov  2 17:28:44:  disk 4, wo:0, o:1, dev:sdf3
Nov  2 17:28:44:  disk 5, wo:0, o:1, dev:sda3

This is Not Good (tm).

Quad-core Phenom, 64-bit kernel and userland.


* Re: md raid10 regression in 2.6.27.4 (possibly earlier)
  2008-11-02 11:27 mdraid10 regression in 2.6.27.4 (possibly earlier) Peter Rabbitson
@ 2008-11-02 17:37 ` Thomas Backlund
  2008-11-02 23:51   ` Thomas Backlund
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Backlund @ 2008-11-02 17:37 UTC (permalink / raw)
  To: linux-raid

Peter Rabbitson skrev:
> Hi,
> 
> Some weeks ago I upgraded from 2.6.23 to 2.6.27.4. After a failed hard
> drive I realized that re-adding drives to a degraded raid10 no longer
> works (it adds the drive as a spare and never starts a resync). Booting
> back into the old .23 kernel allowed me to complete and resync the array
> as usual. Attached find a test case reliably failing on vanilla 2.6.27.4
> with no patches.
> 

I've just been hit with the same problem...

I have a brand new server setup with 2.6.27.4 x86_64 kernel and a mix of
raid0, raid1, raid5 & raid10 partitions like this:
$ cat /proc/mdstat
Personalities : [raid10] [raid6] [raid5] [raid4] [raid1] [raid0]
md6 : active raid5 sdc8[2] sdb8[1] sda8[0] sdd8[3]
       2491319616 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
       bitmap: 0/198 pages [0KB], 2048KB chunk

md5 : active raid1 sda7[1] sdb7[0] sdd7[2]
       530048 blocks [4/4] [UUUU]

md3 : active raid10 sda5[4](S) sdb5[1] sdc5[2] sdd5[5](S)
       20980608 blocks 64K chunks 2 near-copies [4/2] [_UU_]

md1 : active raid0 sda2[0] sdb2[1] sdc2[2] sdd2[3]
       419456512 blocks 128k chunks

md2 : active raid10 sda3[4](S) sdc3[5](S) sdb3[1] sdd3[3]
       41961600 blocks 64K chunks 2 near-copies [4/2] [_U_U]

md0 : active raid10 sda1[0] sdd1[3] sdc1[2] sdb1[1]
       8401792 blocks 64K chunks 2 near-copies [4/4] [UUUU]

md4 : active raid10 sda6[0] sdd6[3] sdc6[2] sdb6[1]
       10506240 blocks 64K chunks 2 near-copies [4/4] [UUUU]



I have mdadm 2.6.7 with the following fixes:
d7ee65c960fa8a6886df7416307f57545ddc4460 "Fix bad metadata formatting"
43aaf431f66270080368d4b33378bd3dc0fa1c96 "Fix NULL pointer oops"

I was hitting the NULL pointer oops, which prevented my md arrays from
starting fully, but with the patches above I can (re)boot the system
without being dropped into maintenance mode...

but I can't bring these raid10 arrays back fully online:

md3 : active raid10 sda5[4](S) sdb5[1] sdc5[2] sdd5[5](S)
       20980608 blocks 64K chunks 2 near-copies [4/2] [_UU_]

md2 : active raid10 sda3[4](S) sdc3[5](S) sdb3[1] sdd3[3]
       41961600 blocks 64K chunks 2 near-copies [4/2] [_U_U]

I can remove and re-add the missing disks, but they only end up as
spares; they don't come back online...
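
For reference, this is roughly the sequence I am using, with md2 and
sda3 from the output above just as an example:

  mdadm /dev/md2 --remove /dev/sda3    # drop the inactive member
  mdadm /dev/md2 --add /dev/sda3       # re-add it to the array
  cat /proc/mdstat                     # it comes back as (S) only, and
                                       # no recovery ever starts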

Any Pointers?

--
Thomas


* Re: md raid10 regression in 2.6.27.4 (possibly earlier)
  2008-11-02 17:37 ` md raid10 " Thomas Backlund
@ 2008-11-02 23:51   ` Thomas Backlund
  2008-11-03 18:09     ` Thomas Backlund
  2008-11-05 23:30     ` md raid10 regression in 2.6.27.4 (possibly earlier) BISECTED Thomas Backlund
  0 siblings, 2 replies; 10+ messages in thread
From: Thomas Backlund @ 2008-11-02 23:51 UTC (permalink / raw)
  Cc: linux-raid

Thomas Backlund skrev:
> Peter Rabbitson skrev:
>> Hi,
>>
>> Some weeks ago I upgraded from 2.6.23 to 2.6.27.4. After a failed hard
>> drive I realized that re-adding drives to a degraded raid10 no longer
>> works (it adds the drive as a spare and never starts a resync). Booting
>> back into the old .23 kernel allowed me to complete and resync the array
>> as usual. Attached find a test case reliably failing on vanilla 2.6.27.4
>> with no patches.
>>
> 
> I've just been hit with the same problem...
> 
> I have a brand new server setup with 2.6.27.4 x86_64 kernel and a mix of
> raid0, raid1, raid5 & raid10 partitions like this:

And an extra datapoint.

Booting into 2.6.26.5 triggers an instant resync of the spare disks, so 
it means we have a regression between 2.6.26.5 and 2.6.27.4

If no-one has a good suggestion to try, I'll start bisecting tomorrow...
--
Thomas


* Re: mdraid10 regression in 2.6.27.4 (possibly earlier)
  2008-11-02 17:33 mdraid10 regression in 2.6.27.4 (possibly earlier) George Spelvin
@ 2008-11-03  8:30 ` George Spelvin
  0 siblings, 0 replies; 10+ messages in thread
From: George Spelvin @ 2008-11-03  8:30 UTC (permalink / raw)
  To: linux-raid; +Cc: tmb, linux

> And an extra datapoint.
>
> Booting into 2.6.26.5 triggers an instant resync of the spare disks, so 
> it means we have a regression between 2.6.26.5 and 2.6.27.4

Likewise, 2.6.26.6 fixed the problem for me.  And I can confirm that it's
broken in plain 2.6.27.

There are 8 commits to drivers/md/raid10.c in that interval.
commit 0310fa216decc3ecfab41f327638fa48a81f3735
    Allow raid10 resync to happening in larger chunks.
commit 1e24b15b267293567a8d752721c7ae63f281325a
    Merge branch 'for-linus' of git://neil.brown.name/md
commit 388667bed591b2359713bb17d5de0cf56e961447
    md: raid10: wake up frozen array
commit 8a392625b665c676a77c62f8608d10ff430bcb83
    Merge branch 'for-linus' of git://neil.brown.name/md
commit f233ea5c9e0d8b95e4283bf6a3436b88f6fd3586
    md: Make mddev->array_size sector-based.
commit cc371e66e340f35eed8dc4651c7c18e754c7fb26
    Add bvec_merge_data to handle stacked devices and ->merge_bvec()
commit 199050ea1ff2270174ee525b73bc4c3323098897
    rationalise return value for ->hot_add_disk method.
commit 6c2fce2ef6b4821c21b5c42c7207cb9cf8c87eda
    Support adding a spare to a live md array with external metadata.

It's not obvious which of these is the problem; they mostly look pretty simple
and safe, at least as far as raid10 is concerned.
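
For anyone who wants to poke at that interval themselves, a very
similar list can be pulled from a kernel git tree with something like
this (assuming the v2.6.26 and v2.6.27 release tags are present):

  git log --pretty=oneline v2.6.26..v2.6.27 -- drivers/md/raid10.c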


* Re: md raid10 regression in 2.6.27.4 (possibly earlier)
  2008-11-02 23:51   ` Thomas Backlund
@ 2008-11-03 18:09     ` Thomas Backlund
  2008-11-03 18:28       ` Justin Piszcz
  2008-11-05 23:30     ` md raid10 regression in 2.6.27.4 (possibly earlier) BISECTED Thomas Backlund
  1 sibling, 1 reply; 10+ messages in thread
From: Thomas Backlund @ 2008-11-03 18:09 UTC (permalink / raw)
  To: linux-raid

Thomas Backlund skrev:
> Thomas Backlund skrev:
>> Peter Rabbitson skrev:
>>> Hi,
>>>
>>> Some weeks ago I upgraded from 2.6.23 to 2.6.27.4. After a failed hard
>>> drive I realized that re-adding drives to a degraded raid10 no longer
>>> works (it adds the drive as a spare and never starts a resync). Booting
>>> back into the old .23 kernel allowed me to complete and resync the array
>>> as usual. Attached find a test case reliably failing on vanilla 2.6.27.4
>>> with no patches.
>>>
>>
>> I've just been hit with the same problem...
>>
>> I have a brand new server setup with 2.6.27.4 x86_64 kernel and a mix of
>> raid0, raid1, raid5 & raid10 partitions like this:
> 
> And an extra datapoint.
> 
> Booting into 2.6.26.5 triggers an instant resync of the spare disks, so 
> it means we have a regression between 2.6.26.5 and 2.6.27.4
> 
> If no-one has a good suggestion to try, I'll start bisecting tomorrow...

And some more info...
After rebooting into 2.6.27.4 I got this again:

md5 : active raid1 sdb7[1] sda7[0] sdd7[2]
       530048 blocks [4/3] [UUU_]

md3 : active raid10 sdc5[4](S) sda5[3] sdd5[0] sdb5[1]
       20980608 blocks 64K chunks 2 near-copies [4/3] [UU_U]

md2 : active raid10 sdc3[4](S) sda3[5](S) sdd3[3] sdb3[1]
       41961600 blocks 64K chunks 2 near-copies [4/2] [_U_U]

So it seems it's not only raid10 that is affected...

and here is how they are started:
[root@tmb ~]# cat /etc/udev/rules.d/70-mdadm.rules
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
	RUN+="/sbin/mdadm --incremental --run --scan $root/%k"

--
Thomas


* Re: md raid10 regression in 2.6.27.4 (possibly earlier)
  2008-11-03 18:09     ` Thomas Backlund
@ 2008-11-03 18:28       ` Justin Piszcz
  0 siblings, 0 replies; 10+ messages in thread
From: Justin Piszcz @ 2008-11-03 18:28 UTC (permalink / raw)
  To: Thomas Backlund; +Cc: linux-raid



On Mon, 3 Nov 2008, Thomas Backlund wrote:

> Thomas Backlund skrev:
>> Thomas Backlund skrev:
>>> Peter Rabbitson skrev:
>
> And some more info...
> After rebooting into 2.6.27.4 I got this again:
>
> md5 : active raid1 sdb7[1] sda7[0] sdd7[2]
>      530048 blocks [4/3] [UUU_]
>
> md3 : active raid10 sdc5[4](S) sda5[3] sdd5[0] sdb5[1]
>      20980608 blocks 64K chunks 2 near-copies [4/3] [UU_U]
>
> md2 : active raid10 sdc3[4](S) sda3[5](S) sdd3[3] sdb3[1]
>      41961600 blocks 64K chunks 2 near-copies [4/2] [_U_U]
>
> So it seems it's not only raid10 that is affected...
>
> and here is how they are started:
> [root@tmb ~]# cat /etc/udev/rules.d/70-mdadm.rules
> SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
> 	RUN+="/sbin/mdadm --incremental --run --scan $root/%k"
>

Maybe only raid10 + other raids?  Running r1+r5 here, no problems with 
2.6.27.4:

$ cat /proc/mdstat 
Personalities : [raid1] [raid6] [raid5] [raid4] 
md1 : active raid1 sdb2[1] sda2[0]
       136448 blocks [2/2] [UU]

md2 : active raid1 sdb3[1] sda3[0]
       276109056 blocks [2/2] [UU]

md3 : active raid5 sdl1[9] sdk1[6] sdj1[7] sdi1[5] sdh1[8] sdg1[4] sdf1[3] sde1[0] sdd1[1] sdc1[2]
       2637296640 blocks level 5, 1024k chunk, algorithm 2 [10/10] [UUUUUUUUUU]

md0 : active raid1 sdb1[1] sda1[0]
       16787776 blocks [2/2] [UU]

unused devices: <none>



* Re: md raid10 regression in 2.6.27.4 (possibly earlier) BISECTED
  2008-11-02 23:51   ` Thomas Backlund
  2008-11-03 18:09     ` Thomas Backlund
@ 2008-11-05 23:30     ` Thomas Backlund
  2008-11-06  6:18       ` Neil Brown
  1 sibling, 1 reply; 10+ messages in thread
From: Thomas Backlund @ 2008-11-05 23:30 UTC (permalink / raw)
  To: linux-raid

Thomas Backlund skrev:
> Thomas Backlund skrev:
>> Peter Rabbitson skrev:
>>> Hi,
>>>
>>> Some weeks ago I upgraded from 2.6.23 to 2.6.27.4. After a failed hard
>>> drive I realized that re-adding drives to a degraded raid10 no longer
>>> works (it adds the drive as a spare and never starts a resync). Booting
>>> back into the old .23 kernel allowed me to complete and resync the array
>>> as usual. Attached find a test case reliably failing on vanilla 2.6.27.4
>>> with no patches.
>>>
>>
>> I've just been hit with the same problem...
>>
>> I have a brand new server setup with 2.6.27.4 x86_64 kernel and a mix of
>> raid0, raid1, raid5 & raid10 partitions like this:
> 
> And an extra datapoint.
> 
> Booting into 2.6.26.5 triggers an instant resync of the spare disks, so 
> it means we have a regression between 2.6.26.5 and 2.6.27.4
> 
> If no-one has a good suggestion to try, I'll start bisecting tomorrow...

Ok,
so it was a pita to bisect, as the test kernels oopsed from time to time,
triggering a full rebuild of my 2.5 TiB raid5 which takes ~4-5 hours to
complete... and since I wanted to wait for all raids to be fully up and
synced between reboots, I had to wait a lot...
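
For the record, the bisect itself was just the usual git bisect run
between the last known-good and first known-bad mainline releases,
roughly like this (endpoints shown only as an illustration):

  git bisect start
  git bisect bad v2.6.27
  git bisect good v2.6.26
  # build and boot each candidate, try to re-add a member to a
  # degraded raid10, then mark the result:
  git bisect good     # if the resync starts
  git bisect bad      # if the disk just sits there as a spare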

But anyway...

This is the commit that breaks the raid10 rebuild/resync:

--- cut ---
6c2fce2ef6b4821c21b5c42c7207cb9cf8c87eda is first bad commit
commit 6c2fce2ef6b4821c21b5c42c7207cb9cf8c87eda
Author: Neil Brown <neilb@notabene.brown>
Date:   Sat Jun 28 08:31:31 2008 +1000

     Support adding a spare to a live md array with external metadata.

     i.e. extend the 'md/dev-XXX/slot' attribute so that you can
     tell a device to fill an vacant slot in an and md array.

     Signed-off-by: Neil Brown <neilb@suse.de>
--- cut ---

I have verified that adding this patch to a working 2.6.26 kernel breaks
the rebuild/resync

I have not verified if reverting it on a 2.6.27 kernel restores the 
rebuild/resync as it does not revert cleanly...

So...

Any suggestions of what to try next ?

--
Thomas




* Re: md raid10 regression in 2.6.27.4 (possibly earlier) BISECTED
  2008-11-05 23:30     ` md raid10 regression in 2.6.27.4 (possibly earlier) BISECTED Thomas Backlund
@ 2008-11-06  6:18       ` Neil Brown
  2008-11-06  9:23         ` Thomas Backlund
  0 siblings, 1 reply; 10+ messages in thread
From: Neil Brown @ 2008-11-06  6:18 UTC (permalink / raw)
  To: Thomas Backlund; +Cc: linux-raid

On Thursday November 6, tmb@mandriva.org wrote:
> 
> But anyway...
> 
> This is the commit that breaks the raid10 rebuild/resync:

Awesome. Thanks!

> 
> --- cut ---
> 6c2fce2ef6b4821c21b5c42c7207cb9cf8c87eda is first bad commit
> commit 6c2fce2ef6b4821c21b5c42c7207cb9cf8c87eda
> Author: Neil Brown <neilb@notabene.brown>
                      ^^^^^^^^^^^^^^^^^^^^
Groan.  I hadn't noticed that.  Fixed now, I hope.

> Date:   Sat Jun 28 08:31:31 2008 +1000
> 
>      Support adding a spare to a live md array with external metadata.
> 
>      i.e. extend the 'md/dev-XXX/slot' attribute so that you can
>      tell a device to fill an vacant slot in an and md array.
> 
>      Signed-off-by: Neil Brown <neilb@suse.de>
> --- cut ---
> 
> I have verified that adding this patch to a working 2.6.26 kernel breaks
> the rebuild/resync
> 
> I have not verified if reverting it on a 2.6.27 kernel restores the 
> rebuild/resync as it does not revert cleanly...
> 
> So...
> 
> Any suggestions of what to try next ?

You mean apart from hitting Neil with a clue-bat?

Maybe try this patch.  I haven't even compile tested it, but I'm
certain it'll fix your problem.

Thanks again,

NeilBrown

----------------------------------------------
From: NeilBrown <neilb@suse.de>
Date: Thu, 6 Nov 2008 17:14:31 +1100
Subject: [PATCH] md: fix bug in raid10 recovery.

Adding a spare to a raid10 doesn't cause recovery to start.
This is due to a silly typo in
  commit 6c2fce2ef6b4821c21b5c42c7207cb9cf8c87eda
and so is a bug in 2.6.27 and .28-rc.

Thanks to Thomas Backlund for bisecting to find this.

Cc: Thomas Backlund <tmb@mandriva.org>
Cc: stable@kernel.org

Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index da5129a..970a96e 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1137,7 +1137,7 @@ static int raid10_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
 	if (!enough(conf))
 		return -EINVAL;
 
-	if (rdev->raid_disk)
+	if (rdev->raid_disk >= 0)
 		first = last = rdev->raid_disk;
 
 	if (rdev->saved_raid_disk >= 0 &&
-- 
1.5.6.5
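
To test it, the patch should apply to a 2.6.27.y tree with something
like this (the patch file name is only an example):

  cd linux-2.6.27.4
  patch -p1 < md-fix-raid10-recovery.patch
  make && make modules_install install

then boot the new kernel and re-add the disk.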



* Re: md raid10 regression in 2.6.27.4 (possibly earlier) BISECTED
  2008-11-06  6:18       ` Neil Brown
@ 2008-11-06  9:23         ` Thomas Backlund
  0 siblings, 0 replies; 10+ messages in thread
From: Thomas Backlund @ 2008-11-06  9:23 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid@vger.kernel.org

Neil Brown skrev:
> On Thursday November 6, tmb@mandriva.org wrote:
>> But anyway...
>>
>> This is the commit that breaks the raid10 rebuild/resync:
> 
> Awesome. Thanks!
> 
>> --- cut ---
>> 6c2fce2ef6b4821c21b5c42c7207cb9cf8c87eda is first bad commit
>> commit 6c2fce2ef6b4821c21b5c42c7207cb9cf8c87eda
>> Author: Neil Brown <neilb@notabene.brown>
>                       ^^^^^^^^^^^^^^^^^^^^
> Groan.  I hadn't noticed that.  Fixed now, I hope.
> 
>> Date:   Sat Jun 28 08:31:31 2008 +1000
>>
>>      Support adding a spare to a live md array with external metadata.
>>
>>      i.e. extend the 'md/dev-XXX/slot' attribute so that you can
>>      tell a device to fill an vacant slot in an and md array.
>>
>>      Signed-off-by: Neil Brown <neilb@suse.de>
>> --- cut ---
>>
>> I have verified that adding this patch to a working 2.6.26 kernel breaks
>> the rebuild/resync
>>
>> I have not verified if reverting it on a 2.6.27 kernel restores the 
>> rebuild/resync as it does not revert cleanly...
>>
>> So...
>>
>> Any suggestions of what to try next ?
> 
> You mean apart from hitting Neil with a clue-bat?
> 

;-)

> Maybe try this patch.  I haven't even compile tested it, but I'm
> certain it'll fix your problem.
> 

Yeah,
After reading the commit that broke raid10, it's an obvious fix...

And I have now verified on a 2.6.27.4 kernel that it works!

Thanks!

> Thanks again,
> 
> NeilBrown
> 
> ----------------------------------------------
> From: NeilBrown <neilb@suse.de>
> Date: Thu, 6 Nov 2008 17:14:31 +1100
> Subject: [PATCH] md: fix bug in raid10 recovery.
> 
> Adding a spare to a raid10 doesn't cause recovery to start.
> This is due to a silly typo in
>   commit 6c2fce2ef6b4821c21b5c42c7207cb9cf8c87eda
> and so is a bug in 2.6.27 and .28-rc.
> 
> Thanks to Thomas Backlund for bisecting to find this.
> 
> Cc: Thomas Backlund <tmb@mandriva.org>
> Cc: stable@kernel.org
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
   Tested-by: Thomas Backlund <tmb@mandriva.org>
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index da5129a..970a96e 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1137,7 +1137,7 @@ static int raid10_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
>  	if (!enough(conf))
>  		return -EINVAL;
>  
> -	if (rdev->raid_disk)
> +	if (rdev->raid_disk >= 0)
>  		first = last = rdev->raid_disk;
>  
>  	if (rdev->saved_raid_disk >= 0 &&


--
Thomas
