From: Bart Kus <me@bartk.us>
To: linux-raid@vger.kernel.org
Subject: Re: (help!) MD RAID6 won't --re-add devices?
Date: Sat, 15 Jan 2011 11:50:58 -0800
Message-ID: <4D31FAA2.2080202@bartk.us>
In-Reply-To: <4D31DE07.1000507@bartk.us>

Some research has revealed a frightening solution:

http://forums.gentoo.org/viewtopic-t-716757-start-0.html

That thread calls upon mdadm --create with the --assume-clean flag.  It
also seems to reinforce my suspicion that MD lost my device order numbers
when it marked the drives as spare (thanks, MD!  Remind me to get you a
nice Christmas present next year.).  I know the order of 5 out of the 10
devices, which leaves 5! = 120 permutations to try.  I've whipped up some
software to generate all the permuted mdadm --create commands.
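
For the record, the generator is nothing fancy.  Something along these
lines (untested sketch: the known slots come from the --detail output
below, but the chunk size, metadata version, layout and data offset all
have to match whatever the original --create used, so treat every flag
here as an assumption to verify first):

#!/bin/sh
# Emit all 120 candidate orderings.  Slots 0,2,3,5,9 are known
# (sda1, sdc1, sdd1, sdm1, sdb1); slots 1,4,6,7,8 take the five
# re-attached drives (sdn1..sdr1) in every possible order.
for a in n o p q r; do
 for b in n o p q r; do
  for c in n o p q r; do
   for d in n o p q r; do
    for e in n o p q r; do
     # skip any ordering that uses the same drive twice
     case "$a$b$c$d$e" in *n*n*|*o*o*|*p*p*|*q*q*|*r*r*) continue ;; esac
     echo "mdadm --create /dev/md4 --assume-clean --level=6" \
          "--raid-devices=10 --chunk=64 --metadata=1.2" \
          "/dev/sda1 /dev/sd${a}1 /dev/sdc1 /dev/sdd1 /dev/sd${b}1" \
          "/dev/sdm1 /dev/sd${c}1 /dev/sd${d}1 /dev/sd${e}1 /dev/sdb1"
    done
   done
  done
 done
done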

The question now: how do I test if I've got the right combination?  Can 
I dd a meg off the assembled array and check for errors somewhere?
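
What I'm picturing per candidate is something like the below (again an
untested sketch: /mnt/test and the spot-checked file path are placeholders,
and xfs_repair -n plus a norecovery mount are just the read-only checks I
know of).  I suspect the first meg alone could look fine even with a wrong
order, since the early chunks may sit on drives whose slots I already know,
so checking a large known file seems safer:

# after running one candidate --create --assume-clean line from above:
mdadm --readonly /dev/md4                    # refuse writes to the members
dd if=/dev/md4 bs=1M count=1 | hexdump -C | head    # quick eyeball check
xfs_repair -n /dev/md4                       # -n = check only, never modify
mount -o ro,norecovery /dev/md4 /mnt/test    # read-only, skip log replay
md5sum /mnt/test/path/to/a/known/file        # does real data come back intact?
umount /mnt/test
mdadm --stop /dev/md4                        # tear down before the next try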

The other question: Is testing incorrect combinations destructive to any 
data on the drives?  Like, would RAID6 kick in and start "fixing" parity 
errors, even if I'm just reading?
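
My working assumption (please correct me if I'm wrong) is that plain reads
don't rewrite parity (only writes or a resync/check/repair pass do), so the
dangerous cases would be forgetting --assume-clean or letting a sync start.
Belt and braces, something like this before each test, assuming the kernel
accepts it on a read-only array:

mdadm --readonly /dev/md4                    # array rejects writes outright
cat /sys/block/md4/md/sync_action            # should read "idle"
echo frozen > /sys/block/md4/md/sync_action  # forbid resync/check/repair
blockdev --getro /dev/md4                    # prints 1 when read-only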

--Bart

On 1/15/2011 9:48 AM, Bart Kus wrote:
> Things seem to have gone from bad to worse.  I upgraded to the latest 
> mdadm, and it actually let me do an --add operation, but --re-add was 
> still failing.  It added all the devices as spares though.  I stopped 
> the array and tried to re-assemble it, but it's not starting.
>
> jo ~ # mdadm -A /dev/md4 -f -u da14eb85:00658f24:80f7a070:b9026515
> mdadm: /dev/md4 assembled from 5 drives and 5 spares - not enough to 
> start the array.
>
> How do I promote these "spares" to being the active devices they once
> were?  Yes, they're behind a few events, so there will be some data loss.
>
> --Bart
>
> On 1/13/2011 5:03 AM, Bart Kus wrote:
>> Hello,
>>
>> I had a Port Multiplier failure overnight.  This put 5 out of 10 
>> drives offline, degrading my RAID6 array.  The file system is still 
>> mounted (and failing to write):
>>
>> Buffer I/O error on device md4, logical block 3907023608
>> Filesystem "md4": xfs_log_force: error 5 returned.
>> etc...
>>
>> The array is in the following state:
>>
>> /dev/md4:
>>         Version : 1.02
>>   Creation Time : Sun Aug 10 23:41:49 2008
>>      Raid Level : raid6
>>      Array Size : 15628094464 (14904.11 GiB 16003.17 GB)
>>   Used Dev Size : 1953511808 (1863.01 GiB 2000.40 GB)
>>    Raid Devices : 10
>>   Total Devices : 11
>>     Persistence : Superblock is persistent
>>
>>     Update Time : Wed Jan 12 05:32:14 2011
>>           State : clean, degraded
>>  Active Devices : 5
>> Working Devices : 5
>>  Failed Devices : 6
>>   Spare Devices : 0
>>
>>      Chunk Size : 64K
>>
>>            Name : 4
>>            UUID : da14eb85:00658f24:80f7a070:b9026515
>>          Events : 4300692
>>
>>     Number   Major   Minor   RaidDevice State
>>       15       8        1        0      active sync   /dev/sda1
>>        1       0        0        1      removed
>>       12       8       33        2      active sync   /dev/sdc1
>>       16       8       49        3      active sync   /dev/sdd1
>>        4       0        0        4      removed
>>       20       8      193        5      active sync   /dev/sdm1
>>        6       0        0        6      removed
>>        7       0        0        7      removed
>>        8       0        0        8      removed
>>       13       8       17        9      active sync   /dev/sdb1
>>
>>       10       8       97        -      faulty spare
>>       11       8      129        -      faulty spare
>>       14       8      113        -      faulty spare
>>       17       8       81        -      faulty spare
>>       18       8       65        -      faulty spare
>>       19       8      145        -      faulty spare
>>
>> I have replaced the faulty PM and the drives have registered back 
>> with the system, under new names:
>>
>> sd 3:0:0:0: [sdn] Attached SCSI disk
>> sd 3:1:0:0: [sdo] Attached SCSI disk
>> sd 3:2:0:0: [sdp] Attached SCSI disk
>> sd 3:4:0:0: [sdr] Attached SCSI disk
>> sd 3:3:0:0: [sdq] Attached SCSI disk
>>
>> But I can't seem to --re-add them into the array now!
>>
>> # mdadm /dev/md4 --re-add /dev/sdn1 --re-add /dev/sdo1 --re-add 
>> /dev/sdp1 --re-add /dev/sdr1 --re-add /dev/sdq1
>> mdadm: add new device failed for /dev/sdn1 as 21: Device or resource 
>> busy
>>
>> I haven't unmounted the file system and/or stopped the /dev/md4 
>> device, since I think that would drop any buffers either layer might 
>> be holding.  I'd of course prefer to lose as little data as 
>> possible.  How can I get this array going again?
>>
>> PS: I think the reason "Failed Devices" shows 6 and not 5 is because 
>> I had a single HD failure a couple weeks back.  I replaced the drive 
>> and the array re-built A-OK.  I guess it still counted the failure 
>> since the array wasn't stopped during the repair.
>>
>> Thanks for any guidance,
>>
>> --Bart
>>
>> PPS: mdadm - v3.0 - 2nd June 2009
>> PPS: Linux jo.bartk.us 2.6.35-gentoo-r9 #1 SMP Sat Oct 2 21:22:14 PDT 
>> 2010 x86_64 Intel(R) Core(TM)2 Quad CPU @ 2.40GHz GenuineIntel GNU/Linux
>> PPS:  # mdadm --examine /dev/sdn1
>> /dev/sdn1:
>>           Magic : a92b4efc
>>         Version : 1.2
>>     Feature Map : 0x0
>>      Array UUID : da14eb85:00658f24:80f7a070:b9026515
>>            Name : 4
>>   Creation Time : Sun Aug 10 23:41:49 2008
>>      Raid Level : raid6
>>    Raid Devices : 10
>>
>>  Avail Dev Size : 3907023730 (1863.01 GiB 2000.40 GB)
>>      Array Size : 31256188928 (14904.11 GiB 16003.17 GB)
>>   Used Dev Size : 3907023616 (1863.01 GiB 2000.40 GB)
>>     Data Offset : 272 sectors
>>    Super Offset : 8 sectors
>>           State : clean
>>     Device UUID : c0cf419f:4c33dc64:84bc1c1a:7e9778ba
>>
>>     Update Time : Wed Jan 12 05:39:55 2011
>>        Checksum : bdb14e66 - correct
>>          Events : 4300672
>>
>>      Chunk Size : 64K
>>
>>    Device Role : spare
>>    Array State : A.AA.A...A ('A' == active, '.' == missing)
>>
>

