From: David Brown <david.brown@hesbynett.no>
To: Anthony Youngman <antlists@youngman.org.uk>, Nix <nix@esperi.org.uk>
Cc: "Ravi (Tom) Hale" <ravi@hale.ee>, linux-raid@vger.kernel.org
Subject: Re: Fault tolerance with badblocks
Date: Tue, 09 May 2017 12:11:07 +0200 [thread overview]
Message-ID: <591195BB.7080200@hesbynett.no> (raw)
In-Reply-To: <17fe9ff3-1096-8303-a228-e910a77d8146@youngman.org.uk>
On 08/05/17 20:00, Anthony Youngman wrote:
> With redundant raid (and that doesn't include a two-disk, or even
> three-disk mirror), it SHOULD recalculate the failed block. If it
> doesn't bother even though it can, I'd call that a bug in scrub.
Please read:
<http://neil.brown.name/blog/20100211050355>
> What I
> thought happened was that it reads a stripe direct from disk, and if
> that failed it read the same stripe via the raid code, to get the raid
> error correction to fire, and then it rewrote the stripe.
That /is/ what happens.
As I mentioned in another reply, /reading/ is enough to trigger a
re-write on the disk if significant /correctable/ errors are discovered
by the disk's firmware. It is extremely rare that the raid level will
see an error (see the linked article by Neil Brown) - usually, the raid
level sees a missing block because the disk firmware could not read the
block correctly. In such cases, the raid software will write the
correct data back to the disk at the same logical block, and the disk
firmware will re-map it to a different block.
>
> What would be a nice touch, is that if we have a massive timeout for
> non-SCT drives, if the scrub has to wait more than, say, 10 seconds for
> a read to succeed it then assumes the block is failing and rewrites it.
I don't think the raid level can do that - it must wait for the drive to
finish handling the read request, or drop the drive entirely.
If the disk takes a long time to read a block, then it will either fail
and mark the block bad, or it will get the data off the disk and then
automatically re-write the data to a re-mapped block. The scrub can
therefore handle it like any other read.
> Actually, scrub that (groan... :-) - if the drive takes longer than 1/3
> of the timeout to respond, then the scrub assumes it's dodgy and
> rewrites it.
>>
>> If there was a way to get md to *rewrite* everything during scrub,
>> rather than just checking, this might help (in addition to letting the
>> drive refresh the magnetization of absolutely everything). "repair" mode
>> appears to do no writes until an error is found, whereupon (on RAID 6)
>> it proceeds to make a "repair" that is more likely than not to overwrite
>> good data with bad. Optionally writing what's already there on non-error
>> seems like it might be a worthwhile (and fairly simple) change.
>>
> Agreed. But without some heuristic, it's actually going to make a scrub
> much slower, and achieve very little apart from adding unnecessary wear
> to the drive.
>
> Cheers,
> Wol
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2017-05-09 10:11 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-05-04 10:04 Fault tolerance in RAID0 with badblocks Ravi (Tom) Hale
2017-05-04 13:44 ` Wols Lists
2017-05-05 4:03 ` Fault tolerance " Ravi (Tom) Hale
2017-05-05 19:20 ` Anthony Youngman
2017-05-06 11:21 ` Ravi (Tom) Hale
2017-05-06 13:00 ` Wols Lists
2017-05-08 14:50 ` Nix
2017-05-08 18:00 ` Anthony Youngman
2017-05-09 10:11 ` David Brown [this message]
2017-05-09 10:18 ` Nix
2017-05-08 19:02 ` Phil Turmel
2017-05-08 19:52 ` Nix
2017-05-08 20:27 ` Anthony Youngman
2017-05-09 9:53 ` Nix
2017-05-09 11:09 ` David Brown
2017-05-09 11:27 ` Nix
2017-05-09 11:58 ` David Brown
2017-05-09 17:25 ` Chris Murphy
2017-05-09 19:44 ` Wols Lists
2017-05-10 3:53 ` Chris Murphy
2017-05-10 4:49 ` Wols Lists
2017-05-10 17:18 ` Chris Murphy
2017-05-16 3:20 ` NeilBrown
2017-05-10 5:00 ` Dave Stevens
2017-05-10 16:44 ` Edward Kuns
2017-05-10 18:09 ` Chris Murphy
2017-05-09 20:18 ` Nix
2017-05-09 20:52 ` Wols Lists
2017-05-10 8:41 ` David Brown
2017-05-09 21:06 ` A sector-of-mismatch warning patch (was Re: Fault tolerance with badblocks) Nix
2017-05-12 11:14 ` Nix
2017-05-16 3:27 ` NeilBrown
2017-05-16 9:13 ` Nix
2017-05-16 21:11 ` NeilBrown
2017-05-16 21:46 ` Nix
2017-05-18 0:07 ` Shaohua Li
2017-05-19 4:53 ` NeilBrown
2017-05-19 10:31 ` Nix
2017-05-19 16:48 ` Shaohua Li
2017-06-02 12:28 ` Nix
2017-05-19 4:49 ` NeilBrown
2017-05-19 10:32 ` Nix
2017-05-19 16:55 ` Shaohua Li
2017-05-21 22:00 ` NeilBrown
2017-05-09 19:16 ` Fault tolerance with badblocks Phil Turmel
2017-05-09 20:01 ` Nix
2017-05-09 20:57 ` Wols Lists
2017-05-09 21:22 ` Nix
2017-05-09 21:23 ` Phil Turmel
2017-05-09 21:32 ` NeilBrown
2017-05-10 19:03 ` Nix
2017-05-09 16:05 ` Chris Murphy
2017-05-09 17:49 ` Wols Lists
2017-05-10 3:06 ` Chris Murphy
2017-05-08 20:56 ` Phil Turmel
2017-05-09 10:28 ` Nix
2017-05-09 10:50 ` Reindl Harald
2017-05-09 11:15 ` Nix
2017-05-09 11:48 ` Reindl Harald
2017-05-09 16:11 ` Nix
2017-05-09 16:46 ` Reindl Harald
2017-05-09 7:37 ` David Brown
2017-05-09 9:58 ` Nix
2017-05-09 10:28 ` Brad Campbell
2017-05-09 10:40 ` Nix
2017-05-09 12:15 ` Tim Small
2017-05-09 15:30 ` Nix
2017-05-05 20:23 ` Peter Grandi
2017-05-05 22:14 ` Nix
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=591195BB.7080200@hesbynett.no \
--to=david.brown@hesbynett.no \
--cc=antlists@youngman.org.uk \
--cc=linux-raid@vger.kernel.org \
--cc=nix@esperi.org.uk \
--cc=ravi@hale.ee \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.