From: "George Spelvin" <linux@horizon.com>
To: iordan@cdf.toronto.edu, linux@horizon.com
Cc: linux-raid@vger.kernel.org
Subject: Re: 3-way mirrors
Date: 7 Sep 2010 14:49:17 -0400
Message-ID: <20100907184917.8181.qmail@science.horizon.com>
In-Reply-To: <4C866354.3050904@cdf.toronto.edu>
Iordan Iordanov wrote:
>> Anyway, one nice property of a 2-drive redundancy (3+-way mirror or
>> RAID-6) is error detection: in case of a mismatch, it's possible to
>> finger the offending drive.
> When we see a mismatch_cnt > 0, we would run a dd/cmp script which would
> detect the drive and sector which is mismatched (i.e. we would craft a
> script which runs three dd processes in parallel, reading from each
> drive, and compares the data).
> When an inconsistency is discovered, we would have the sector which
> doesn't match, and which drive it's on. However, even at 60MB/s, this
> would take 5 hours to perform with our 1TB drives. So, it would be much
> better if we could do this while we are up, somehow.
That was my hope, for the md software to do it automatically.
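(For what it's worth, while waiting for md to learn it, the userspace check
doesn't have to be three dd's piped into cmp. Here's an untested C sketch:
it compares in 1 MiB chunks rather than per sector, assumes at most one copy
differs within any given chunk, and takes the three component devices as
arguments -- the device names are whatever your components actually are.)

/* compare3.c -- naive three-way compare of raid1 components.
 * Usage: compare3 /dev/<comp1> /dev/<comp2> /dev/<comp3>
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (1 << 20)			/* compare 1 MiB at a time */

static char buf[3][CHUNK];

int main(int argc, char **argv)
{
	int fd[3], i;
	long long pos = 0;

	if (argc != 4) {
		fprintf(stderr, "usage: %s dev1 dev2 dev3\n", argv[0]);
		return 1;
	}
	for (i = 0; i < 3; i++) {
		fd[i] = open(argv[i + 1], O_RDONLY);
		if (fd[i] < 0) {
			perror(argv[i + 1]);
			return 1;
		}
	}
	for (;;) {
		ssize_t n = read(fd[0], buf[0], CHUNK);

		if (n <= 0)		/* EOF (or error) on first component */
			break;
		for (i = 1; i < 3; i++)
			if (read(fd[i], buf[i], n) != n) {
				perror(argv[i + 1]);
				return 1;
			}
		if (memcmp(buf[0], buf[1], n) || memcmp(buf[0], buf[2], n)) {
			/* Assuming only one copy differs, finger it. */
			int odd = !memcmp(buf[0], buf[1], n) ? 2 :
				  !memcmp(buf[0], buf[2], n) ? 1 : 0;

			printf("mismatch in chunk at byte %lld: odd drive %s\n",
			       pos, argv[odd + 1]);
		}
		pos += n;
	}
	return 0;
}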
>> My understanding of the current code is that it just copies one mirror
>> (the first readable?) to the others. Does someone have a patch to vote
>> on the data? If not, can someone point me at the relevant bit of code
>> and orient me enough that I can create it?
> Resyncing an entire drive is probably not necessary with a mismatch,
> because you already know the rest of the drive is synced and can simply
> manually force a particular sector to match.
Ideally, I'd like ZFS-like checksums on the data, with a mismatch triggering
a read of all mirrors and a reconstruction attempt. With that, a silently
corrupted sector on RAID-5 can be pinpointed and fixed.
But in the meantime, I'd like check/repair passes to tell me if 2 of the 3
mirrors agree, so I can blame the third.
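(As for forcing a single sector back into sync by hand: the brute-force way
is to read the affected block through the md device and write the same data
straight back, which makes md rewrite every mirror -- with whichever copy it
happened to return on the read, which is exactly the problem voting would
solve. A rough sketch, with a made-up block number and an assumed 4KiB
block size:)

/* rewrite_block.c -- force one array block back into sync by re-writing
 * it through /dev/mdX.  Propagates whichever copy md returns on the read,
 * so only use it once you know (or don't care) which copy is correct.
 */
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	const size_t blk = 4096;		/* assumed block size */
	const off_t off = 12345678LL * blk;	/* made-up array block number */
	void *buf;
	int fd = open("/dev/md0", O_RDWR | O_DIRECT);

	if (fd < 0 || posix_memalign(&buf, blk, blk)) {
		perror("setup");
		return 1;
	}
	if (pread(fd, buf, blk, off) != (ssize_t)blk ||
	    pwrite(fd, buf, blk, off) != (ssize_t)blk ||
	    fsync(fd)) {
		perror("rewrite");
		return 1;
	}
	return 0;
}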
>> (The other thing I'd love is a more advanced repair interface that can accept a
>> block number found by "check" as a parameter to "repair" so I don't have
>> to wait while the array is re-scanned. Um... I suppose this depends on
>> a local patch I have that logs the sector numbers of mismatches.)
> Yes, but don't you run the risk of syncing the "bad" data from the
> mismatch drive to the other two drives if you do this automatically?
> Don't you also need a parameter to specify which drive to sync from?
That's why I wanted the voting, so the RAID software could decide
automatically. I don't see a practical way to identify the correct
block contents in isolation, although mapping the block up to a logical
file may find a file which can be checked for consistency.
(But debugfs takes forever to run icheck + ncheck on a large filesystem.)
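(To be concrete about the voting: all I mean is majority among the three
copies of each block. Something like the hypothetical helper below -- a
sketch, not tied to any particular spot in the md code, where each "copy"
is just a buffer already read from one mirror:)

#include <stddef.h>	/* size_t */
#include <string.h>	/* memcmp */

/* Given three copies of the same block, return the index (0, 1 or 2) of
 * the copy that disagrees with the other two, or -1 if all three agree
 * or no two of them match (no majority to vote with).
 */
int vote3(const void *a, const void *b, const void *c, size_t len)
{
	int ab = !memcmp(a, b, len);
	int ac = !memcmp(a, c, len);
	int bc = !memcmp(b, c, len);

	if (ab && ac)
		return -1;	/* all three agree */
	if (ab)
		return 2;	/* a == b, so c is the odd one out */
	if (ac)
		return 1;	/* a == c, so b is the odd one out */
	if (bc)
		return 0;	/* b == c, so a is the odd one out */
	return -1;		/* all three differ */
}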
> At any rate, if the mismatch sector(s) are also logged during the array
> check, then resyncing this sector by hand would be easy and fast with
> minimal downtime. It would be great to have this functionality to start
> with.
I use the following patch. Note that it reports the offset in 512-byte
sectors within a single component; multiply by the number of data drives
and divide by sectors per block to get a block offset within the RAID
array.
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index d1d6891..2dcffcd 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1363,6 +1363,8 @@ static void sync_request_write(mddev_t *mddev, r10bio_t *r10_bio)
 					break;
 			if (j == vcnt)
 				continue;
+			printk(KERN_INFO "%s: Mismatch at sector %llu\n",
+			       mdname(mddev), (unsigned long long)r10_bio->sector);
 			mddev->resync_mismatches += r10_bio->sectors;
 		}
 		if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 96c6902..a0a0b08 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2732,6 +2732,8 @@ static void handle_parity_checks5(raid5_conf_t *conf, struct stripe_head *sh,
 			 */
 			set_bit(STRIPE_INSYNC, &sh->state);
 		else {
+			printk(KERN_INFO "%s: Mismatch at sector %llu\n", mdname(conf->mddev),
+			       (unsigned long long)sh->sector);
 			conf->mddev->resync_mismatches += STRIPE_SECTORS;
 			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
 				/* don't try to repair!! */
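A worked example of that conversion, with made-up numbers: on a 4-drive
RAID-5 (three data drives) under a filesystem with 4KiB blocks (8 sectors
per block), a reported component sector of 123456789 is array block
123456789 * 3 / 8 = 46296295. The same thing as a throwaway helper:

/* sector2block.c -- convert the component-relative sector printed by the
 * patch into a block offset within the array, per the formula above.
 * The numbers below are made up; plug in your own geometry.
 */
#include <stdio.h>

int main(void)
{
	unsigned long long component_sector = 123456789ULL;	/* from dmesg */
	unsigned int data_drives = 3;		/* e.g. 4-drive RAID-5 */
	unsigned int sectors_per_block = 8;	/* 4 KiB block / 512 B sectors */

	printf("array block %llu\n",
	       component_sector * data_drives / sectors_per_block);
	return 0;
}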