Re: Fault tolerance with badblocks

From: Nix <nix@esperi.org.uk>
To: Phil Turmel <philip@turmel.org>
Cc: Wols Lists <antlists@youngman.org.uk>,
	"Ravi (Tom) Hale" <ravi@hale.ee>,
	linux-raid@vger.kernel.org
Subject: Re: Fault tolerance with badblocks
Date: Tue, 09 May 2017 11:28:53 +0100
Message-ID: <871sry728q.fsf@esperi.org.uk>
In-Reply-To: <e2196f02-2b94-8afb-06a0-9695d441c890@turmel.org> (Phil Turmel's message of "Mon, 8 May 2017 16:56:24 -0400")

On 8 May 2017, Phil Turmel said:

> On 05/08/2017 03:52 PM, Nix wrote:
>> And... then what do you do? On RAID-6, it appears the answer is "live
>> with a high probability of inevitable corruption".
>
> No, you investigate the quality of your data and the integrity of the
> rest of the system, as something *other* than a drive problem caused the
> mismatch.  (Swap is a known exception, though.)

Yeah, I'm going to "rely" on the fact that this machine has heaps of
memory and won't be swapping much when it does a RAID scrub. :)

But "you investigate the quality of your data"... so now, on a single
mismatch that won't go away, I have to compare all my data with backups,
taking countless hours and emitting heaps of spurious errors because no
backup is ever quite up to date? Those backups *live* on hard drives, so
they have exactly the same chance of spurious disk-layer errors as the
array they are backing up (quite possibly higher).

Honestly, scrubs are looking less and less desirable the more I talk
about them. Massive worry inducers that don't actually spot problems in
any meaningful sense (not even at the level of "there is a problem on
this disk", just "there is a problem on this array").

>> That's not very good.
>> (AIUI, if a check scrub finds a URE, it'll rewrite it, and when in the
>> common case the drive spares it out and the write succeeds, this will
>> not be reported as a mismatch: is this right?)
>
> This is also wrong, because you are assuming sparing-out is the common
> case.  A read error does not automatically trigger relocation.  It
> triggers *verification* of the next *write*.  In young drives,

So I guess we only need to worry about mismatches if they don't go away
and are persistently in the same place on the same drive. (Only you
can't tell what place that is, or what drive that is, because md doesn't
tell you. I'm really tempted to fix *that* at least, a printk() or
something.)
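
To illustrate how little md reports today, here is a minimal sketch that
reads the scrub result from sysfs. It assumes the standard md sysfs files
`sync_action` and `mismatch_cnt`; the helper names and the example path
(an array called md0) are hypothetical. All you get back is one number
for the whole array, with no sector and no drive attached.

```python
from pathlib import Path


def read_mismatch_cnt(md_sysfs_dir):
    """Return the array-wide mismatch count found under an md sysfs directory.

    md_sysfs_dir is typically something like /sys/block/md0/md
    (md0 is an assumed array name, not taken from this thread).
    """
    text = Path(md_sysfs_dir, "mismatch_cnt").read_text()
    return int(text.strip())


def scrub_is_idle(md_sysfs_dir):
    """Return True when sync_action reports that no check or repair is running."""
    state = Path(md_sysfs_dir, "sync_action").read_text().strip()
    return state == "idle"
```

A caller would wait until `scrub_is_idle("/sys/block/md0/md")` is true after
a check and then look at `read_mismatch_cnt("/sys/block/md0/md")`; nothing in
either file says where the mismatch was.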

> { Drive self tests might do some pre-emptive rewriting of marginal
> sectors -- it's not something drive manufacturers are documenting.  But
> a drive self-test cannot fix an unreadable sector -- it doesn't know
> what to write there. }

Agreed.

>>> This is actually counterproductive.  Rewriting everything may refresh
>>> the magnetism on weakening sectors, but will also prevent the drive from
>>> *finding* weakening sectors that really do need relocation.
>> 
>> If a sector weakens purely because of neighbouring writes or temperature
>> or a vibrating housing or something (i.e. not because of actual damage),
>> so that a rewrite will strengthen it and relocation was never necessary,
>> surely you've just saved a pointless bit of sector sparing? (I don't
>> know: I'm not sure what the relative frequency of these things is. Read
>> and write errors in general are so rare that it's quite possible I'm
>> worrying about nothing at all. I do know I forgot to scrub my old
>> hardware RAID array for about three years and nothing bad happened...)
>
> Drives that are in applications that get *read* pretty often don't need
> much if any scrubbing -- the application itself will expose problem
> sectors.  Hobbyists and home media servers can go months with specific
> files unread, so developing problems can hit in clusters.  Regular
> scrubbing will catch these problems before they take your array down.

Yeah, and I have plenty of archival data on this array -- it's the first
one I've ever had that's big enough to consider using for that as well
as for frequently-used stuff whose integrity I care about. (But even the
frequently-read stuff is bcached, so as far as reads from the underlying
disks go, even that is in effect archival much of the time.)
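
For reference, a rough sketch of how the regular scrub discussed above is
started on md, assuming the standard `sync_action` sysfs file. The helper
name and the md0 path are hypothetical, and on a real system the write
needs root.

```python
from pathlib import Path


def request_check(md_sysfs_dir):
    """Ask md to start a read-only scrub by writing "check" to sync_action.

    md_sysfs_dir is typically something like /sys/block/md0/md
    (md0 is an assumed array name).
    """
    Path(md_sysfs_dir, "sync_action").write_text("check\n")
```

Calling `request_check("/sys/block/md0/md")` from a periodic job is one way
to make sure rarely-read archival data still gets read from time to time.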

> And you can't compare hardware array behavior to MD -- they have their
> own algorithms to take care of attached disks without OS intervention.

I don't see what the difference is between a hardware array controller
with its own noddy OS, barely-maintained software, creaking processor,
and not very big battery-backed RAM and md with a decent OS, much faster
processor, decent software, and often masses of RAM and a journal on
SSD, except that the md array will be far faster and if anything goes
wrong you have a much higher chance of actually getting your data back
with md. :)

The days of saying "hardware arrays are just different/better, md cannot
compete with them" are many years in the past. People are *replacing*
hardware arrays with md these days because the hardware arrays are
*worse* on almost every metric. If hardware arrays have magic recovery
algorithms that md and/or the Linux block layer don't, the question now
is "why not?", not "oh, we cannot compare".
