linux-raid.vger.kernel.org archive mirror
From: Bart Kus <me@bartk.us>
To: "Jérôme Poulin" <jeromepoulin@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: (help!) MD RAID6 won't --re-add devices? [SOLVED!]
Date: Sun, 16 Jan 2011 13:19:26 -0800
Message-ID: <4D3360DE.3060603@bartk.us>
In-Reply-To: <AANLkTikwEG30jkZuPT+WAEc8=LLPi6c1-Rtv91uH3v-z@mail.gmail.com>

Thanks for the COW idea; I had not thought of that.  Luckily, I had 10
spare 2TB drives racked and powered, so I just backed up all the drives
using dd.

It turns out that a good way to test whether you've got the right
combination of drives is to run echo check > /sys/block/mdX/md/sync_action,
wait 5 seconds, and then read /sys/block/mdX/md/mismatch_cnt.  If you've
found the right combination, the count will be low or zero.
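A minimal C sketch of that test. It assumes the array's sysfs directory is passed in (for example /sys/block/md4/md); the helper name sample_mismatch_cnt is hypothetical. Since a full check of a large array can take hours, it only samples the counter after a short wait, which (like the manual 5-second procedure) assumes md updates mismatch_cnt while the check is still running.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: start a "check" on an md array, wait, and return
 * the current mismatch count. md_dir is the array's sysfs directory,
 * e.g. "/sys/block/md4/md". Returns -1 on any I/O error.
 */
long sample_mismatch_cnt(const char *md_dir, unsigned int wait_seconds)
{
	char path[512];
	long count = -1;
	FILE *f;

	/* Writing "check" to sync_action asks md to scrub parity without repairing. */
	snprintf(path, sizeof path, "%s/sync_action", md_dir);
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	if (fputs("check\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	/* The write is flushed here, so a rejected request shows up as an fclose error. */
	if (fclose(f) != 0)
		return -1;

	/* Assumption: the counter grows while the check runs, so a short sample suffices. */
	sleep(wait_seconds);

	snprintf(path, sizeof path, "%s/mismatch_cnt", md_dir);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fscanf(f, "%ld", &count) != 1)
		count = -1;
	fclose(f);
	return count;
}
```

Calling sample_mismatch_cnt("/sys/block/md4/md", 5) mirrors the manual procedure. It needs root, and if md refuses the request (for example because another resync is already running) the helper returns -1 instead of a count.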

Another important thing to note is that the "Version" reported by mdadm
--detail /dev/mdX is NOT always the same as the version reported by mdadm
--examine /dev/sdX.  I guess the array header and the drive headers track
different version numbers.  My array header was reporting 1.02 while all
the drives were showing 1.2.
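If, as seems likely but is only an assumption here, 1.02 is simply version 1.2 printed with a zero-padded minor number, the two values differ as strings but not as numbers. A small C sketch (same_md_version is a hypothetical name) that compares them as numeric major.minor pairs:

```c
#include <stdio.h>

/*
 * Assumption: "1.02" (mdadm --detail) and "1.2" (mdadm --examine) name the
 * same major.minor pair, the first with a zero-padded minor number.
 * Returns 1 if equal, 0 if different, -1 if either string cannot be parsed.
 * Any third component (e.g. the "03" in "00.90.03") is ignored.
 */
int same_md_version(const char *a, const char *b)
{
	int amaj, amin, bmaj, bmin;

	/* %d parses decimal, so a leading zero ("02") is not treated as octal. */
	if (sscanf(a, "%d.%d", &amaj, &amin) != 2 ||
	    sscanf(b, "%d.%d", &bmaj, &bmin) != 2)
		return -1;
	return amaj == bmaj && amin == bmin;
}
```

Under that assumption, same_md_version("1.02", "1.2") reports a match, while a plain string comparison would not.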

And a key thing to know is that the default Data Offset has CHANGED over
the years.  My original drives reported an offset of 272 sectors, and I
believe the array was made with mdadm-2.6.6.  Using mdadm-3.1.4 to
create a new array put the offset at 2048 sectors, a huge change!  Also,
it seems that when mdadm-3.1.4 added the old drives (still at a
272-sector offset) back into the array, which was missing 5 of its 10
drives, and marked them as spares, the spare-marking process changed
their offset to 384 sectors.  The array created with mdadm-3.1.4 also
had a slightly smaller Used Dev Size than the original array, so none of
the permutations worked, since everything was misaligned.  I had to
downgrade to mdadm-3.0, which created the array with the proper Used Dev
Size and the proper Data Offset of 272 sectors, so the RAID6 blocks
lined up.
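The misalignment can be made concrete with a little arithmetic. This C sketch assumes 512-byte sectors and that each member's chunks are laid out back to back starting at its Data Offset, so a member chunk begins at offset × 512 + index × chunk size; member_chunk_byte is a hypothetical helper, and the 64 KiB chunk matches the -c 64 passed to mdadm in the program in this message.

```c
#include <stdio.h>

#define SECTOR_BYTES 512LL

/*
 * Byte position on a member disk where that member's chunk number
 * chunk_index begins, given the superblock's Data Offset (in 512-byte
 * sectors) and the chunk size in KiB. Assumes chunks are contiguous
 * from the data offset onward.
 */
long long member_chunk_byte(long long data_offset_sectors, long long chunk_kib,
			    long long chunk_index)
{
	long long chunk_bytes = chunk_kib * 1024;

	return data_offset_sectors * SECTOR_BYTES + chunk_index * chunk_bytes;
}
```

With a 64 KiB chunk, chunk 0 starts at byte 139264 for a 272-sector offset, 196608 for 384 sectors, and 1048576 for 2048 sectors. The 2048-sector layout therefore starts 909312 bytes (13.875 chunks) later than the original one, so md would read every stripe from the wrong place on each member regardless of drive order.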

Is there documentation somewhere about all these default changes?  I saw 
no options to specify the data offset either.  That would be a good 
option to add.

But the best thing to add would be a working --re-add!  Reporting that
the array is "busy" when I'm trying to return its 5 missing drives isn't
useful.  It should re-add its old drives as expected and flush any
pending buffers.

Below is the (very hacky) code I used to test all the permutations of
the 5 drives whose order was lost when they were marked as spares.
Hopefully nobody will need it in the future.

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

char *permutation[] = { "nopqr", "noprq", "noqpr", "noqrp", "norpq", 
"norqp", "npoqr", "nporq", "npqor", "npqro", "nproq", "nprqo", "nqopr", 
"nqorp", "nqpor", "nqpro", "nqrop", "nqrpo", "nropq", "nroqp", "nrpoq", 
"nrpqo", "nrqop", "nrqpo", "onpqr", "onprq", "onqpr", "onqrp", "onrpq", 
"onrqp", "opnqr", "opnrq", "opqnr", "opqrn", "oprnq", "oprqn", "oqnpr", 
"oqnrp", "oqpnr", "oqprn", "oqrnp", "oqrpn", "ornpq", "ornqp", "orpnq", 
"orpqn", "orqnp", "orqpn", "pnoqr", "pnorq", "pnqor", "pnqro", "pnroq", 
"pnrqo", "ponqr", "ponrq", "poqnr", "poqrn", "pornq", "porqn", "pqnor", 
"pqnro", "pqonr", "pqorn", "pqrno", "pqron", "prnoq", "prnqo", "pronq", 
"proqn", "prqno", "prqon", "qnopr", "qnorp", "qnpor", "qnpro", "qnrop", 
"qnrpo", "qonpr", "qonrp", "qopnr", "qoprn", "qornp", "qorpn", "qpnor", 
"qpnro", "qponr", "qporn", "qprno", "qpron", "qrnop", "qrnpo", "qronp", 
"qropn", "qrpno", "qrpon", "rnopq", "rnoqp", "rnpoq", "rnpqo", "rnqop", 
"rnqpo", "ronpq", "ronqp", "ropnq", "ropqn", "roqnp", "roqpn", "rpnoq", 
"rpnqo", "rponq", "rpoqn", "rpqno", "rpqon", "rqnop", "rqnpo", "rqonp", 
"rqopn", "rqpno", "rqpon" };

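int main(void)
{
	size_t i;
	int mismatches, status;
	FILE *handle;
	char command[1024];

	for (i = 0; i < sizeof permutation / sizeof permutation[0]; i++) {
		mismatches = -1; /* Safety: stays -1 if mismatch_cnt can't be read */

		/* The five %c slots (positions 2, 5, 7, 8 and 9) are the drives
		 * whose order was lost; the other five positions are known. */
		snprintf(command, sizeof command,
			 "/sbin/mdadm --create /dev/md4 --assume-clean -R -e 1.2 -l 6 -n 10 -c 64 "
			 "/dev/sda1 /dev/sd%c1 /dev/sdc1 /dev/sdd1 /dev/sd%c1 /dev/sdm1 "
			 "/dev/sd%c1 /dev/sd%c1 /dev/sd%c1 /dev/sdb1",
			 permutation[i][0], permutation[i][1], permutation[i][2],
			 permutation[i][3], permutation[i][4]);
		printf("Running: %s\n", command);
		fflush(stdout);
		status = system(command);
		if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
			printf("Command error\n");
			return 1;
		}
		sleep(1);

		/* Start a parity check, let it run briefly, then sample the count. */
		handle = fopen("/sys/block/md4/md/sync_action", "w");
		if (handle == NULL) {
			perror("/sys/block/md4/md/sync_action");
			return 1;
		}
		fprintf(handle, "check\n");
		fclose(handle);
		sleep(5);

		handle = fopen("/sys/block/md4/md/mismatch_cnt", "r");
		if (handle == NULL) {
			perror("/sys/block/md4/md/mismatch_cnt");
			return 1;
		}
		if (fscanf(handle, "%d", &mismatches) != 1)
			mismatches = -1;
		fclose(handle);
		printf("Permutation %s = %d mismatches\n", permutation[i], mismatches);
		fflush(stdout);

		/* Tear the array down before trying the next order. */
		snprintf(command, sizeof command, "/sbin/mdadm --stop /dev/md4");
		printf("Running: %s\n", command);
		fflush(stdout);
		status = system(command);
		if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
			printf("Command error\n");
			return 1;
		}
		sleep(1);
	}
	return 0;
}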

I got the permutations from an online permutation generator:

http://users.telenet.be/vdmoortel/dirk/Maths/permutations.html

I didn't feel like writing that part of the algorithm myself.
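For anyone who does want to write that part, the table is just the 120 (5!) orderings of "nopqr" in lexicographic order, which the standard next-permutation algorithm generates directly. A C sketch; next_permutation and list_drive_orders are hypothetical names, not part of any library:

```c
#include <stdio.h>
#include <string.h>

/*
 * Rearrange s (NUL-terminated) into the next lexicographic permutation.
 * Returns 1 on success, 0 if s was already the last permutation.
 */
int next_permutation(char *s)
{
	size_t n = strlen(s);
	size_t i, j;
	char tmp;

	if (n < 2)
		return 0;
	/* Walk left past the non-increasing suffix. */
	i = n - 1;
	while (i > 0 && s[i - 1] >= s[i])
		i--;
	if (i == 0)
		return 0; /* whole string descending: last permutation */
	i--; /* pivot: rightmost position with s[i] < s[i + 1] */

	/* Swap the pivot with the rightmost character greater than it. */
	j = n - 1;
	while (s[j] <= s[i])
		j--;
	tmp = s[i]; s[i] = s[j]; s[j] = tmp;

	/* Reverse the suffix so it becomes the smallest possible ordering. */
	for (i = i + 1, j = n - 1; i < j; i++, j--) {
		tmp = s[i]; s[i] = s[j]; s[j] = tmp;
	}
	return 1;
}

/*
 * Fill out[] with every ordering of the drive letters "nopqr", in the same
 * order as the permutation table in the program. Returns how many were
 * written (120 if max allows).
 */
int list_drive_orders(char out[][6], int max)
{
	char cur[6] = "nopqr"; /* must start sorted ascending */
	int count = 0;

	do {
		if (count >= max)
			break;
		memcpy(out[count], cur, sizeof cur);
		count++;
	} while (next_permutation(cur));
	return count;
}
```

Starting from an ascending string is what makes the loop visit every ordering exactly once; the same code works for any other set of distinct drive letters.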

--Bart


On 1/15/2011 4:05 PM, Jérôme Poulin wrote:
> On Sat, Jan 15, 2011 at 2:50 PM, Bart Kus <me@bartk.us> wrote:
>> Some research has revealed a frightening solution:
>>
>> http://forums.gentoo.org/viewtopic-t-716757-start-0.html
>>
>> That thread calls upon mdadm --create with the --assume-clean flag.  It also
>> seems to reinforce my suspicions that MD has lost my device order numbers
>> when it marked the drives as spare (thanks, MD!  Remind me to get you a nice
>> Christmas present next year.).  I know the order of 5 out of 10 devices, so
>> that leaves 120 permutations to try.  I've whipped up some software to
>> generate all the permuted mdadm --create commands.
>>
>> The question now: how do I test if I've got the right combination?  Can I dd
>> a meg off the assembled array and check for errors somewhere?
> I guess running a read-only fsck is the best way to prove it works.
>
>> The other question: Is testing incorrect combinations destructive to any
>> data on the drives?  Like, would RAID6 kick in and start "fixing" parity
>> errors, even if I'm just reading?
>>
> If you don't want to risk your data, you could create a cowloop of
> each device before writing to it, or a dm snapshot using dmsetup.
>
> I made a script for dmsetup snapshots on the side when I really needed
> one, because cowloop wouldn't compile. Here it is; it should help you
> understand how it works!
>
>
> RODATA=$1
> shift
> COWFILE=$1
> shift
> FSIZE=$1
> shift
> PREFIX=$1
> shift
>
> if [ -z "$RODATA" ] || [ -z "$COWFILE" ] || [ -z "$FSIZE" ] || [ -n "$1" ]
> then
> 	echo "Usage: $0 [read only device] [loop file] [size of loop in MB] {prefix}"
> 	echo "Read only device won't ever get a write."
> 	echo "Loop file can be a file or device where writes will be directed to."
> 	echo "Size is specified in MB; you will be able to write that much change to the device created."
> 	echo "Prefix will get prepended to all devices created by this script in /dev/mapper"
> 	exit 1
> fi
>
> MRODATA=$PREFIX${RODATA#/dev/}data
> COWFILELOOP=$(losetup -f)
> MCOWFILE=$PREFIX${RODATA#/dev/}cow
> MSNAPSHOT=$PREFIX${RODATA#/dev/}snap
>
>
> dd if=/dev/zero of=$COWFILE bs=1M seek=$FSIZE count=1
> losetup $COWFILELOOP $COWFILE
> echo "0 $(blockdev --getsz $RODATA) linear $RODATA 0" | dmsetup create $MRODATA
> echo "0 $(blockdev --getsz $COWFILELOOP) linear $COWFILELOOP 0" | dmsetup create $MCOWFILE
> echo "0 $(blockdev --getsz /dev/mapper/$MRODATA) snapshot /dev/mapper/$MRODATA /dev/mapper/$MCOWFILE p 64" | dmsetup create $MSNAPSHOT
>
> echo "You can now use $MSNAPSHOT for your tests, up to ${FSIZE}MB."
> exit 0
>
>
>> --Bart
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thread overview: 5+ messages
2011-01-13 13:03 (help!) MD RAID6 won't --re-add devices? Bart Kus
2011-01-15 17:48 ` Bart Kus
2011-01-15 19:50   ` Bart Kus
2011-01-16  0:05     ` Jérôme Poulin
2011-01-16 21:19       ` Bart Kus [this message]
