From: Neil Brown <neilb@suse.de>
To: George Spelvin <linux@horizon.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: 3-way mirrors
Date: Wed, 8 Sep 2010 08:01:55 +1000
Message-ID: <20100908080155.2ed9300e@notabene>
In-Reply-To: <20100907141904.3696.qmail@science.horizon.com>
On 7 Sep 2010 10:19:04 -0400
"George Spelvin" <linux@horizon.com> wrote:
> After some frustration with RAID-5 finding mismatches and not being
> able to figure out which drive has the problem, I'm setting up a rather
> intricate 5-way mirrored (x 2-way striped) system.
>
> The intention is that 3 copies will be on line at any time (dropping to
> 2 in case of disk failure), while copies 4 and 5 will be kept off-site.
> Occasionally one will come in, be re-synced, and then removed again.
> (The file system can be quiesced briefly to permit a clean split.)
>
> Anyway, one nice property of a 2-drive redundancy (3+-way mirror or
> RAID-6) is error detection: in case of a mismatch, it's possible to
> finger the offending drive.
>
> My understanding of the current code is that it just copies one mirror
> (the first readable?) to the others. Does someone have a patch to vote
> on the data? If not, can someone point me at the relevant bit of code
> and orient me enough that I can create it?
>
The relevant bit of code is in the MD_RECOVERY_REQUESTED branch of
sync_request_write() in drivers/md/raid1.c.
Look for "memcmp".
This code runs when you "echo repair > /sys/block/mdXXX/md/sync_action".
It has already read all blocks and now compares them to see if they are the
same. If not, it copies the first to any that are different.
You probably want to factor that code out into a separate function before
trying to add any 'voting' code.
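Below is a minimal userspace sketch of such a 'voting' step. It assumes the n mirror copies of one block are already in plain memory buffers. raid1.c actually works on bios and pages, so this shows only the decision and not how to wire it into sync_request_write(). vote_block() is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not present in drivers/md/raid1.c): bufs[0..n-1]
 * hold the same block as read from each of n mirrors, each len bytes long.
 * Returns the index of the first copy that matches a strict majority of
 * all n copies (itself included), or -1 if no copy has a majority.
 */
static int vote_block(const unsigned char *const bufs[], size_t n, size_t len)
{
	for (size_t i = 0; i < n; i++) {
		size_t agree = 0;

		/* count how many copies are byte-identical to copy i */
		for (size_t j = 0; j < n; j++)
			if (memcmp(bufs[i], bufs[j], len) == 0)
				agree++;
		if (2 * agree > n)
			return (int)i;
	}
	return -1;
}
```

A repair pass built on this would rewrite only the mirrors whose buffer differs from bufs[winner], and could log those as the suspect drives. With only two copies, a mismatch never produces a majority. That is why at least three copies are needed to finger the offending drive.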
> (The other thing I'd love is a more advanced sync_action that can accept a
> block number found by "check" as a parameter to "repair" so I don't have
> to wait while the array is re-scanned. Um... I suppose this depends on
> a local patch I have that logs the sector numbers of mismatches.)
This is already possible via the sync_min and sync_max sysfs files.
Write a sector number to sync_max and a lower sector number to sync_min.
Then write 'repair' to 'sync_action'.
When sync_completed reaches sync_max, the repair will pause.
You can then let it continue by writing a larger number to sync_max, or tell
it to finish by writing 'idle' to 'sync_action'.
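The same sequence can be sketched in C. It writes sync_max first, then sync_min, then 'repair' to sync_action, in the order described above. The helper names are hypothetical, and md_dir is assumed to be the array's md sysfs directory, e.g. "/sys/block/md0/md". Values are in sectors.

```c
#include <stdio.h>

/* Hypothetical helper: write one value to md_dir/attr. Returns 0 on success. */
static int md_write_attr(const char *md_dir, const char *attr, const char *value)
{
	char path[512];
	int n = snprintf(path, sizeof(path), "%s/%s", md_dir, attr);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	int ok = fputs(value, f) >= 0;
	/* a rejected sysfs write may only show up when the buffer is flushed */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Hypothetical helper: start a repair from sector lo that pauses at sector hi. */
static int md_repair_range(const char *md_dir, unsigned long long lo,
			   unsigned long long hi)
{
	char buf[32];

	snprintf(buf, sizeof(buf), "%llu", hi);
	if (md_write_attr(md_dir, "sync_max", buf) != 0)
		return -1;
	snprintf(buf, sizeof(buf), "%llu", lo);
	if (md_write_attr(md_dir, "sync_min", buf) != 0)
		return -1;
	return md_write_attr(md_dir, "sync_action", "repair");
}
```

Once sync_completed reaches hi, you can continue with md_write_attr(md_dir, "sync_max", <larger value>) or finish with md_write_attr(md_dir, "sync_action", "idle").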
If you have patches that you think are generally useful, feel free to submit
them to me for consideration for upstream inclusion.
>
>
> Another thing I'm a bit worried about is the kernel's tendency to
> add drives in the lowest-numbered open slot in a RAID. When used in
> multiply-mirrored RAID-10, this tends to fill up the first stripe half
> before starting on the second.
This is controlled by raid10_add_disk in drivers/md/raid10.c. I would
happily accept a patch which made a more balanced choice about where to add
the new disk.
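One possible "more balanced choice" is sketched below. It is hypothetical policy code and not the actual raid10_add_disk(). It assumes a RAID-10 "near" layout where raid_disks is a multiple of the number of near copies, so each run of near_copies consecutive slots holds the same data (slots 0-4 and 5-9 in the 5-copy, 10-slot setup above). Instead of taking the lowest free slot, it takes a free slot in the mirror group with the fewest working members.

```c
#include <stdbool.h>

/*
 * Hypothetical: occupied[s] is true if slot s has a working device.
 * Returns the chosen free slot, or -1 if every slot is occupied.
 */
static int pick_balanced_slot(const bool occupied[], int raid_disks, int near_copies)
{
	int best_slot = -1, best_count = near_copies + 1;

	for (int g = 0; g < raid_disks / near_copies; g++) {
		int count = 0, free_slot = -1;

		for (int s = g * near_copies; s < (g + 1) * near_copies; s++) {
			if (occupied[s])
				count++;
			else if (free_slot < 0)
				free_slot = s;   /* lowest free slot in this group */
		}
		/* strict '<' keeps the lowest-numbered group on ties */
		if (free_slot >= 0 && count < best_count) {
			best_count = count;
			best_slot = free_slot;
		}
	}
	return best_slot;
}
```

Start with three working copies of each half (slots 0-2 and 5-7). This policy would place the first new drive in slot 3 and the second in slot 8. Lowest-free-slot placement would use slots 3 and 4.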
>
> I'm worried that someone not paying attention will --add rather than
> --re-add the off-site backup drives and create mirrors 4 and 5 of
> the first stripe half, thus producing an incomplete backup.
It is already on my to-do list for mdadm-3.2 to reject a --add that looks
like it should be a --re-add. You will need --force to make it a spare, or
--zero it first.
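The planned mdadm check can be pictured as the decision below. It is a hypothetical sketch, not mdadm source. The struct, the enum and check_add() are invented for illustration, and "looks like it should be a --re-add" is assumed to mean the device's metadata names the target array and records a former slot in it.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of a device's existing md metadata. */
struct dev_metadata {
	bool has_superblock;          /* false once the metadata has been zeroed */
	unsigned char array_uuid[16]; /* array this device last belonged to */
	int role;                     /* former slot in that array, -1 if none */
};

enum add_decision { ADD_AS_SPARE, REFUSE_USE_READD };

static enum add_decision check_add(const struct dev_metadata *dev,
				   const unsigned char target_uuid[16], bool force)
{
	/* --force, or a device with no metadata, is simply added as a spare */
	if (force || !dev->has_superblock)
		return ADD_AS_SPARE;
	/* a former member of this very array should be --re-added instead */
	if (memcmp(dev->array_uuid, target_uuid, 16) == 0 && dev->role >= 0)
		return REFUSE_USE_READD;
	return ADD_AS_SPARE;
}
```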
>
> Any suggestions on how to mitigate this risk? And if it happens,
> how do I recover? Is there a way to force a drive to be added
> as 9/10, even if 5/10 is currently empty?
1/ hack at mdadm or wait for mdadm-3.2, or feed people more coffee:-)
2/ You probably cannot recover with any amount of certainty.
3/ That is entirely a kernel decision - 'fix' the kernel.
NeilBrown
>
>
> Thank you very much!