From: Bart Kus <me@bartk.us>
To: linux-raid@vger.kernel.org
Subject: Re: (help!) MD RAID6 won't --re-add devices?
Date: Sat, 15 Jan 2011 11:50:58 -0800
Message-ID: <4D31FAA2.2080202@bartk.us>
In-Reply-To: <4D31DE07.1000507@bartk.us>
Some research has revealed a frightening solution:
http://forums.gentoo.org/viewtopic-t-716757-start-0.html
That thread calls for mdadm --create with the --assume-clean flag. It
also seems to reinforce my suspicion that MD lost my device order
numbers when it marked the drives as spares (thanks, MD! Remind me to
get you a nice Christmas present next year.). I know the order of 5 out
of 10 devices, so that leaves 5! = 120 permutations to try. I've whipped up
some software to generate all the permuted mdadm --create commands.
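
A minimal sketch of such a generator, not the actual software mentioned
above. It assumes the five known slots are the ones listed in the
mdadm --detail output quoted below (0, 2, 3, 5, 9), that the five
re-attached partitions are /dev/sd[n-r]1, and that the device names have
not shifted since then. The --level, --chunk and --metadata values are
taken from the quoted output; the parity layout and the 272-sector data
offset are NOT pinned here and would have to match whatever the original
array used. The script only prints commands, it never runs them.

```python
from itertools import permutations

# Slot -> device for the members whose position is known
# (from the `mdadm --detail` output quoted later in this message).
KNOWN = {
    0: "/dev/sda1",
    2: "/dev/sdc1",
    3: "/dev/sdd1",
    5: "/dev/sdm1",
    9: "/dev/sdb1",
}
# Assumption: these are the five members whose slots were lost.
UNKNOWN = ["/dev/sdn1", "/dev/sdo1", "/dev/sdp1", "/dev/sdq1", "/dev/sdr1"]
RAID_DEVICES = 10


def candidate_orders(known, unknown, raid_devices):
    """Yield every full device order consistent with the known slots."""
    free_slots = [s for s in range(raid_devices) if s not in known]
    for perm in permutations(unknown):
        order = dict(known)
        order.update(zip(free_slots, perm))
        yield [order[s] for s in range(raid_devices)]


def create_command(devices, md="/dev/md4"):
    """Build (but do not execute) one mdadm --create command line."""
    args = [
        "mdadm", "--create", md, "--assume-clean",
        "--level=6", f"--raid-devices={len(devices)}",
        "--chunk=64", "--metadata=1.2",
    ]
    return " ".join(args + devices)


if __name__ == "__main__":
    for devices in candidate_orders(KNOWN, UNKNOWN, RAID_DEVICES):
        print(create_command(devices))
```

Printing rather than executing keeps the step that rewrites superblocks
a deliberate, manual one.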
The question now: how do I test if I've got the right combination? Can
I dd a meg off the assembled array and check for errors somewhere?
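
One possible read-only check, sketched under assumptions: the file system
on md4 is XFS (per the xfs_log_force message quoted below), and XFS keeps
a copy of its superblock at the start of every allocation group. Reading
only the first megabyte touches very few chunks, so this sketch instead
parses the primary superblock and then looks for the "XFSB" magic at each
allocation group boundary, which are spread across the whole array. The
function names are hypothetical, and a full pass is only a heuristic: it
suggests the order is plausible, it does not prove the order is right.

```python
import struct

XFS_MAGIC = b"XFSB"  # on-disk magic at the start of every XFS superblock


def secondary_superblock_offsets(primary):
    """Return byte offsets of the per-allocation-group superblock copies.

    `primary` is the first sector of the device. The big-endian fields
    used are sb_blocksize (offset 4), sb_agblocks (offset 84) and
    sb_agcount (offset 88).
    """
    if len(primary) < 92 or primary[:4] != XFS_MAGIC:
        raise ValueError("no XFS superblock at offset 0")
    (blocksize,) = struct.unpack_from(">I", primary, 4)
    agblocks, agcount = struct.unpack_from(">II", primary, 84)
    return [ag * agblocks * blocksize for ag in range(1, agcount)]


def count_good_superblocks(dev):
    """Count secondary superblocks whose magic is intact.

    `dev` is a binary file object opened read-only, for example
    open("/dev/md4", "rb"). Returns (good, total).
    """
    dev.seek(0)
    offsets = secondary_superblock_offsets(dev.read(512))
    good = 0
    for offset in offsets:
        dev.seek(offset)
        if dev.read(4) == XFS_MAGIC:
            good += 1
    return good, len(offsets)
```

A candidate order where good equals total is worth a closer look with a
no-modify file system check; one where the primary superblock is already
missing can be discarded immediately.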
The other question: Is testing incorrect combinations destructive to any
data on the drives? Like, would RAID6 kick in and start "fixing" parity
errors, even if I'm just reading?
--Bart
On 1/15/2011 9:48 AM, Bart Kus wrote:
> Things seem to have gone from bad to worse. I upgraded to the latest
> mdadm, and it actually let me do an --add operation, but --re-add was
> still failing. It added all the devices as spares though. I stopped
> the array and tried to re-assemble it, but it's not starting.
>
> jo ~ # mdadm -A /dev/md4 -f -u da14eb85:00658f24:80f7a070:b9026515
> mdadm: /dev/md4 assembled from 5 drives and 5 spares - not enough to
> start the array.
>
> How do I promote these "spares" to being the active devices they once
> were? Yes, they're behind a few events, so there will be some data loss.
>
> --Bart
>
> On 1/13/2011 5:03 AM, Bart Kus wrote:
>> Hello,
>>
>> I had a Port Multiplier failure overnight. This put 5 out of 10
>> drives offline, degrading my RAID6 array. The file system is still
>> mounted (and failing to write):
>>
>> Buffer I/O error on device md4, logical block 3907023608
>> Filesystem "md4": xfs_log_force: error 5 returned.
>> etc...
>>
>> The array is in the following state:
>>
>> /dev/md4:
>> Version : 1.02
>> Creation Time : Sun Aug 10 23:41:49 2008
>> Raid Level : raid6
>> Array Size : 15628094464 (14904.11 GiB 16003.17 GB)
>> Used Dev Size : 1953511808 (1863.01 GiB 2000.40 GB)
>> Raid Devices : 10
>> Total Devices : 11
>> Persistence : Superblock is persistent
>>
>> Update Time : Wed Jan 12 05:32:14 2011
>> State : clean, degraded
>> Active Devices : 5
>> Working Devices : 5
>> Failed Devices : 6
>> Spare Devices : 0
>>
>> Chunk Size : 64K
>>
>> Name : 4
>> UUID : da14eb85:00658f24:80f7a070:b9026515
>> Events : 4300692
>>
>> Number Major Minor RaidDevice State
>> 15 8 1 0 active sync /dev/sda1
>> 1 0 0 1 removed
>> 12 8 33 2 active sync /dev/sdc1
>> 16 8 49 3 active sync /dev/sdd1
>> 4 0 0 4 removed
>> 20 8 193 5 active sync /dev/sdm1
>> 6 0 0 6 removed
>> 7 0 0 7 removed
>> 8 0 0 8 removed
>> 13 8 17 9 active sync /dev/sdb1
>>
>> 10 8 97 - faulty spare
>> 11 8 129 - faulty spare
>> 14 8 113 - faulty spare
>> 17 8 81 - faulty spare
>> 18 8 65 - faulty spare
>> 19 8 145 - faulty spare
>>
>> I have replaced the faulty PM and the drives have registered back
>> with the system, under new names:
>>
>> sd 3:0:0:0: [sdn] Attached SCSI disk
>> sd 3:1:0:0: [sdo] Attached SCSI disk
>> sd 3:2:0:0: [sdp] Attached SCSI disk
>> sd 3:4:0:0: [sdr] Attached SCSI disk
>> sd 3:3:0:0: [sdq] Attached SCSI disk
>>
>> But I can't seem to --re-add them into the array now!
>>
>> # mdadm /dev/md4 --re-add /dev/sdn1 --re-add /dev/sdo1 --re-add
>> /dev/sdp1 --re-add /dev/sdr1 --re-add /dev/sdq1
>> mdadm: add new device failed for /dev/sdn1 as 21: Device or resource
>> busy
>>
>> I haven't unmounted the file system and/or stopped the /dev/md4
>> device, since I think that would drop any buffers either layer might
>> be holding. I'd of course prefer to lose as little data as
>> possible. How can I get this array going again?
>>
>> PS: I think the reason "Failed Devices" shows 6 and not 5 is because
>> I had a single HD failure a couple weeks back. I replaced the drive
>> and the array re-built A-OK. I guess it still counted the failure
>> since the array wasn't stopped during the repair.
>>
>> Thanks for any guidance,
>>
>> --Bart
>>
>> PPS: mdadm - v3.0 - 2nd June 2009
>> PPS: Linux jo.bartk.us 2.6.35-gentoo-r9 #1 SMP Sat Oct 2 21:22:14 PDT
>> 2010 x86_64 Intel(R) Core(TM)2 Quad CPU @ 2.40GHz GenuineIntel GNU/Linux
>> PPS: # mdadm --examine /dev/sdn1
>> /dev/sdn1:
>> Magic : a92b4efc
>> Version : 1.2
>> Feature Map : 0x0
>> Array UUID : da14eb85:00658f24:80f7a070:b9026515
>> Name : 4
>> Creation Time : Sun Aug 10 23:41:49 2008
>> Raid Level : raid6
>> Raid Devices : 10
>>
>> Avail Dev Size : 3907023730 (1863.01 GiB 2000.40 GB)
>> Array Size : 31256188928 (14904.11 GiB 16003.17 GB)
>> Used Dev Size : 3907023616 (1863.01 GiB 2000.40 GB)
>> Data Offset : 272 sectors
>> Super Offset : 8 sectors
>> State : clean
>> Device UUID : c0cf419f:4c33dc64:84bc1c1a:7e9778ba
>>
>> Update Time : Wed Jan 12 05:39:55 2011
>> Checksum : bdb14e66 - correct
>> Events : 4300672
>>
>> Chunk Size : 64K
>>
>> Device Role : spare
>> Array State : A.AA.A...A ('A' == active, '.' == missing)
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
Thread overview (6+ messages):
2011-01-13 13:03 (help!) MD RAID6 won't --re-add devices? Bart Kus
2011-01-15 17:48 ` Bart Kus
2011-01-15 19:50 ` Bart Kus [this message]
2011-01-16 0:05 ` Jérôme Poulin
2011-01-16 21:19 ` (help!) MD RAID6 won't --re-add devices? [SOLVED!] Bart Kus
-- strict thread matches above, loose matches on Subject: below --
2011-01-12 13:52 (help!) MD RAID6 won't --re-add devices? Bart Kus