From: Bart Kus <me@bartk.us>
To: "Jérôme Poulin" <jeromepoulin@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: (help!) MD RAID6 won't --re-add devices? [SOLVED!]
Date: Sun, 16 Jan 2011 13:19:26 -0800
Message-ID: <4D3360DE.3060603@bartk.us>
In-Reply-To: <AANLkTikwEG30jkZuPT+WAEc8=LLPi6c1-Rtv91uH3v-z@mail.gmail.com>
Thanks for the COW idea, had not thought of that. Luckily, I had 10
spare 2TB drives racked and powered, so I just backed up all the drives
using dd.
It turns out a good way to test whether you've got the right combination of
drives is to run echo check > /sys/block/mdX/md/sync_action, wait 5 seconds,
and then read /sys/block/mdX/md/mismatch_cnt. If you've found the right
combination, the count will be low or zero.
Another important thing to note is that the "Version" reported by mdadm
--detail /dev/mdX is NOT always the same as the version reported by mdadm
--examine /dev/sdX. I guess the array header and the drive header track
different version numbers. My array header was reporting 1.02 while all
the drives were showing 1.2.
And a key thing to know is that the default Data Offset has CHANGED over
the years. My original drives reported an offset of 272 sectors, and I
believe the array was made with mdadm-2.6.6. Using mdadm-3.1.4 to
create a new array put the offset at 2048 sectors, a huge change! Also,
it seems when mdadm-3.1.4 added the old drives (272 offset at the time)
into the array that was missing 5/10 drives and marked them as spares,
the spare-marking process changed the offset to 384 sectors. Creating the
array with mdadm-3.1.4 also reduced the Used Dev Size a bit compared to the
original array, so none of the permutations worked since everything was
misaligned. I had to downgrade to mdadm-3.0, which created the array with
the proper Used Dev Size and the proper Data Offset of 272 sectors, so that
the RAID6 blocks lined up.
Is there documentation somewhere about all these default changes? I saw
no options to specify the data offset either. That would be a good
option to add.
But the best thing to add would be a functional --re-add capability!
Reporting that the array is "busy" when I'm trying to return its 5 missing
drives isn't
useful. It should re-add its old drives as expected and flush any
pending buffers.
Below is the (very hacky) code I used to test all the permutations of
the 5 drives whose sequence was lost by being marked as spares.
Hopefully it doesn't have to help anyone in the future.
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
char *permutation[] = { "nopqr", "noprq", "noqpr", "noqrp", "norpq",
"norqp", "npoqr", "nporq", "npqor", "npqro", "nproq", "nprqo", "nqopr",
"nqorp", "nqpor", "nqpro", "nqrop", "nqrpo", "nropq", "nroqp", "nrpoq",
"nrpqo", "nrqop", "nrqpo", "onpqr", "onprq", "onqpr", "onqrp", "onrpq",
"onrqp", "opnqr", "opnrq", "opqnr", "opqrn", "oprnq", "oprqn", "oqnpr",
"oqnrp", "oqpnr", "oqprn", "oqrnp", "oqrpn", "ornpq", "ornqp", "orpnq",
"orpqn", "orqnp", "orqpn", "pnoqr", "pnorq", "pnqor", "pnqro", "pnroq",
"pnrqo", "ponqr", "ponrq", "poqnr", "poqrn", "pornq", "porqn", "pqnor",
"pqnro", "pqonr", "pqorn", "pqrno", "pqron", "prnoq", "prnqo", "pronq",
"proqn", "prqno", "prqon", "qnopr", "qnorp", "qnpor", "qnpro", "qnrop",
"qnrpo", "qonpr", "qonrp", "qopnr", "qoprn", "qornp", "qorpn", "qpnor",
"qpnro", "qponr", "qporn", "qprno", "qpron", "qrnop", "qrnpo", "qronp",
"qropn", "qrpno", "qrpon", "rnopq", "rnoqp", "rnpoq", "rnpqo", "rnqop",
"rnqpo", "ronpq", "ronqp", "ropnq", "ropqn", "roqnp", "roqpn", "rpnoq",
"rpnqo", "rponq", "rpoqn", "rpqno", "rpqon", "rqnop", "rqnpo", "rqonp",
"rqopn", "rqpno", "rqpon" };
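int main(void)
{
    size_t i;
    int mismatches, status;
    FILE *handle;
    char command[1024];

    for (i = 0; i < sizeof permutation / sizeof permutation[0]; i++) {
        mismatches = -1; /* Safety: stays -1 if the count cannot be read */

        /* Re-create the array with the 5 unknown slots filled from this permutation. */
        snprintf(command, sizeof command,
            "/sbin/mdadm --create /dev/md4 --assume-clean -R -e 1.2 -l 6 -n 10 -c 64 "
            "/dev/sda1 /dev/sd%c1 /dev/sdc1 /dev/sdd1 /dev/sd%c1 /dev/sdm1 "
            "/dev/sd%c1 /dev/sd%c1 /dev/sd%c1 /dev/sdb1",
            permutation[i][0], permutation[i][1],
            permutation[i][2], permutation[i][3], permutation[i][4]);
        printf("Running: %s\n", command);
        status = system(command);
        if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
            printf("Command error\n");
            return 1;
        }
        sleep(1);

        /* Start a parity check and let it run for a few seconds. */
        handle = fopen("/sys/block/md4/md/sync_action", "w");
        if (handle == NULL) {
            perror("sync_action");
            return 1;
        }
        fprintf(handle, "check\n");
        fclose(handle);
        sleep(5);

        /* A low or zero mismatch count suggests the right drive order. */
        handle = fopen("/sys/block/md4/md/mismatch_cnt", "r");
        if (handle == NULL) {
            perror("mismatch_cnt");
            return 1;
        }
        if (fscanf(handle, "%d", &mismatches) != 1)
            mismatches = -1;
        fclose(handle);
        printf("Permutation %s = %d mismatches\n",
            permutation[i], mismatches);
        fflush(stdout);

        /* Stop the array before trying the next permutation. */
        snprintf(command, sizeof command, "/sbin/mdadm --stop /dev/md4");
        printf("Running: %s\n", command);
        status = system(command);
        if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
            printf("Command error\n");
            return 1;
        }
        sleep(1);
    }
    return 0;
}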
The permutations I got from an online permutation generator:
http://users.telenet.be/vdmoortel/dirk/Maths/permutations.html
Didn't feel like writing that part of the algorithm.
--Bart
On 1/15/2011 4:05 PM, Jérôme Poulin wrote:
> On Sat, Jan 15, 2011 at 2:50 PM, Bart Kus<me@bartk.us> wrote:
>> Some research has revealed a frightening solution:
>>
>> http://forums.gentoo.org/viewtopic-t-716757-start-0.html
>>
>> That thread calls upon mdadm --create with the --assume-clean flag. It also
>> seems to reinforce my suspicions that MD has lost my device order numbers
>> when it marked the drives as spare (thanks, MD! Remind me to get you a nice
>> christmas present next year.). I know the order of 5 out of 10 devices, so
>> that leaves 120 permutations to try. I've whipped up some software to
>> generate all the permuted mdadm --create commands.
>>
>> The question now: how do I test if I've got the right combination? Can I dd
>> a meg off the assembled array and check for errors somewhere?
> I guess running a read-only fsck is the best way to prove it works.
>
>> The other question: Is testing incorrect combinations destructive to any
>> data on the drives? Like, would RAID6 kick in and start "fixing" parity
>> errors, even if I'm just reading?
>>
> If you don't want to risk your data, you could create a cowloop of
> each device before writing to it, or dm snapshot using dmsetup.
>
> I made a script for dmsetup snapshot on the side when I really needed
> it because cowloop wouldn't compile. Here it is, it should help you
> understand how it works!
>
>
> RODATA=$1
> shift
> COWFILE=$1
> shift
> FSIZE=$1
> shift
> PREFIX=$1
> shift
>
> if [ -z "$RODATA" ] || [ -z "$COWFILE" ] || [ -z "$FSIZE" ] || [ -n "$1" ]
> then
> echo "Usage: $0 [read only device] [loop file] [size of loop in MB] {prefix}"
> echo "Read only device won't ever get a write."
> echo "Loop file can be a file or device where writes will be directed to."
> echo "Size is specified in MB, you will be able to write that much change to the device created."
> echo "Prefix will get prepended to all devices created by this script in /dev/mapper"
> exit 1
> fi
>
> MRODATA=$PREFIX${RODATA#/dev/}data
> COWFILELOOP=$(losetup -f)
> MCOWFILE=$PREFIX${RODATA#/dev/}cow
> MSNAPSHOT=$PREFIX${RODATA#/dev/}snap
>
>
> dd if=/dev/zero of=$COWFILE bs=1M seek=$FSIZE count=1
> losetup $COWFILELOOP $COWFILE
> echo "0 $(blockdev --getsz $RODATA) linear $RODATA 0" | dmsetup create $MRODATA
> echo "0 $(blockdev --getsz $COWFILELOOP) linear $COWFILELOOP 0" | dmsetup create $MCOWFILE
> echo "0 $(blockdev --getsz /dev/mapper/$MRODATA) snapshot /dev/mapper/$MRODATA /dev/mapper/$MCOWFILE p 64" | dmsetup create $MSNAPSHOT
>
> echo "You can now use $MSNAPSHOT for your tests, up to ${FSIZE}MB."
> exit 0
>
>
>> --Bart
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html