From: NeilBrown <neilb@suse.com>
To: Nix <nix@esperi.org.uk>, Chris Murphy <lists@colorremedies.com>
Cc: David Brown <david.brown@hesbynett.no>,
Anthony Youngman <antlists@youngman.org.uk>,
Phil Turmel <philip@turmel.org>, "Ravi (Tom) Hale" <ravi@hale.ee>,
Linux-RAID <linux-raid@vger.kernel.org>
Subject: Re: A sector-of-mismatch warning patch (was Re: Fault tolerance with badblocks)
Date: Wed, 17 May 2017 07:11:17 +1000
Message-ID: <87bmqsmrre.fsf@notabene.neil.brown.name>
In-Reply-To: <87efvpmqf6.fsf@notabene.neil.brown.name>
On Tue, May 16 2017, NeilBrown wrote:
> On Tue, May 09 2017, Nix wrote:
>
>> On 9 May 2017, Chris Murphy verbalised:
>>
>>> 1. md reports all data drives and the LBAs for the affected stripe
>>
>> Enough rambling from me. Here's a hilariously untested patch against
>> 4.11 (as in I haven't even booted with it: my systems are kind of in
>> flux right now as I migrate to the md-based server that got me all
>> concerned about this). It compiles! And it's definitely safer than
>> trying a repair, and makes it possible to recover from a real mismatch
>> without losing all your hair in the process, or determine that a
>> mismatch is spurious or irrelevant. And that's enough for me, frankly.
>> This is a very rare problem, one hopes.
>>
>> (It's probably not ideal, because the error is just known to be
>> somewhere in that stripe, not on that sector, which makes determining
>> the affected data somewhat harder. But at least you can figure out what
>> filesystem it's on. :) )
>>
>> 8<------------------------------------------------------------->8
>> From: Nick Alcock <nick.alcock@oracle.com>
>> Subject: [PATCH] md: report sector of stripes with check mismatches
>>
>> This makes it possible, with appropriate filesystem support, for a
>> sysadmin to tell what is affected by the mismatch, and whether
>> it should be ignored (if it's inside a swap partition, for
>> instance).
>>
>> We ratelimit to prevent log flooding: if there are so many
>> mismatches that ratelimiting is necessary, the individual messages
>> are relatively unlikely to be important (either the machine is
>> swapping like crazy or something is very wrong with the disk).
>>
>> Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
>> ---
>> drivers/md/raid5.c | 16 ++++++++++++----
>> 1 file changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index ed5cd705b985..bcd2e5150e29 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -3959,10 +3959,14 @@ static void handle_parity_checks5(struct r5conf *conf, struct stripe_head *sh,
>> set_bit(STRIPE_INSYNC, &sh->state);
>> else {
>> atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches);
>> - if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
>> + if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) {
>> /* don't try to repair!! */
>> set_bit(STRIPE_INSYNC, &sh->state);
>> - else {
>> + pr_warn_ratelimited("%s: mismatch around sector "
>> + "%llu\n", __func__,
>> + (unsigned long long)
>> + sh->sector);
>> + } else {
>
> I think there is no point giving the function name,
> but that you should give the name of the array.
> Also "around" is a little vague.
> Maybe something like:
>
>> + pr_warn_ratelimited("%s: mismatch sector in range "
>> + "%llu-%llu\n", mdname(conf->mddev),
>> + (unsigned long long) sh->sector,
>> + (unsigned long long) sh->sector + STRIPE_SECTORS);
>
> As an optional enhancement, you could add "will recalculate P/Q" or
> "left unchanged" as appropriate.
>
> Providing at least that the array name is included in the message, I
> support this patch.
Actually, I have another caveat. I don't think we want these messages
during an initial resync, or any resync, only during a 'check' or
'repair'.
So add a check for MD_RECOVERY_REQUESTED, or maybe for
   sh->sector >= conf->mddev->recovery_cp
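
For illustration only, here is a rough userspace sketch of the
MD_RECOVERY_REQUESTED variant combined with the message format suggested
above. It is not the patch itself: test_bit() and the flag numbers are
stand-ins for the kernel definitions, STRIPE_SECTORS is assumed to be 8
(4 KiB pages), and should_report_mismatch() and format_mismatch() are
hypothetical helper names.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-ins for the kernel's recovery flags. The bit numbers
 * here are illustrative and need not match the kernel's values. */
enum {
	MD_RECOVERY_REQUESTED,	/* user requested a 'check' or 'repair' */
	MD_RECOVERY_CHECK,	/* 'check': count mismatches, do not repair */
};

/* Assumption: one stripe unit is a 4 KiB page, i.e. 8 sectors of 512 bytes. */
#define STRIPE_SECTORS 8ULL

/* Minimal stand-in for the kernel's test_bit() on a single word. */
static bool test_bit(int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/* Hypothetical helper: report only for a user-requested check/repair,
 * never for an initial resync or any other resync. */
static bool should_report_mismatch(const unsigned long *recovery)
{
	return test_bit(MD_RECOVERY_REQUESTED, recovery);
}

/* Hypothetical helper: build the message with the array name, the sector
 * range of the stripe, and what will happen to the parity. */
static int format_mismatch(char *buf, size_t len, const char *array,
			   unsigned long long sector,
			   const unsigned long *recovery)
{
	const char *action = test_bit(MD_RECOVERY_CHECK, recovery) ?
		"left unchanged" : "will recalculate P/Q";

	return snprintf(buf, len,
			"%s: mismatch sector in range %llu-%llu (%s)\n",
			array, sector, sector + STRIPE_SECTORS, action);
}
```

With both flags set (a 'check'), an array called "md0" and a stripe at
sector 1024, format_mismatch() yields
"md0: mismatch sector in range 1024-1032 (left unchanged)". In the
kernel the string would go through pr_warn_ratelimited() rather than
snprintf(), and the array name would come from mdname(conf->mddev).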
NeilBrown
>
> NeilBrown
>
>
>
>> sh->check_state = check_state_compute_run;
>> set_bit(STRIPE_COMPUTE_RUN, &sh->state);
>> set_bit(STRIPE_OP_COMPUTE_BLK, &s->ops_request);
>> @@ -4111,10 +4115,14 @@ static void handle_parity_checks6(struct r5conf *conf, struct stripe_head *sh,
>> }
>> } else {
>> atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches);
>> - if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
>> + if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) {
>> /* don't try to repair!! */
>> set_bit(STRIPE_INSYNC, &sh->state);
>> - else {
>> + pr_warn_ratelimited("%s: mismatch around sector "
>> + "%llu\n", __func__,
>> + (unsigned long long)
>> + sh->sector);
>> + } else {
>> int *target = &sh->ops.target;
>>
>> sh->ops.target = -1;
>> --
>> 2.12.2.212.gea238cf35.dirty
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html