Re: What are mdadm maintainers to do? (error recovery redundancy/data loss)

From: Chris Murphy <lists@colorremedies.com>
To: linux-raid@vger.kernel.org
Subject: Re: What are mdadm maintainers to do? (error recovery redundancy/data loss)
Date: Tue, 17 Feb 2015 18:02:00 -0700
Message-ID: <CAJCQCtTv-HMX77GWqA+_1rfkWodsTETp_Y3w5En-N4nGCymWsA@mail.gmail.com>
In-Reply-To: <54E3C51C.2080106@websitemanagers.com.au>

On Tue, Feb 17, 2015 at 3:47 PM, Adam Goryachev
<mailinglists@websitemanagers.com.au> wrote:

> If we enable SCT ERC on every drive that supports it, and we are using the
> drive (only) in a RAID0/linear array then what is the downside?

Unnecessary data loss.

> As I
> understand it, the drive will no longer try for > 120sec to recover the data
> stored in the "bad" sector, and instead return an unreadable error message
> in a short amount of time (well below 30 seconds) which means the driver
> will be able to return a read error to the application (or FS or MD) and the
> system as a whole will carry on.

Not necessarily; it depends on what's in that sector. If it's user
data, this means a sector (or possibly more) of data loss. If it's
file system metadata, it means progressive file system corruption.

Configuring the drive to give up too soon is completely inappropriate
for single, raid0 or linear configurations.

Arguably the drive should have already recovered this data. If a
longer recovery can recover it, then why isn't the drive writing the
data back to that sector, so that the next read isn't so ambiguous
that it requires long recovery? I can't answer that question. In some
cases that appears to happen; in other cases it doesn't. But the
followup is that there really ought to be some way for user space to
get access to these kinds of errors rather than them accumulating
until disaster strikes.

The counterargument to that is, it's still cheaper to buy the drive
specified for the use case.

> If we didn't enable SCT ERC, then the
> entire drive would vanish, (because the timeout wasn't changed for the
> driver) and the current read and every future read/write will all fail, and
> the system will probably crash (well, depending on the application, FS
> layout, etc).

Umm no. If SCT ERC remains at a high value or disabled, and the kernel
command timer is also increased, the drive has a longer window in
which to recover. That's the appropriate configuration for single,
linear, and raid0.
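
As a rough sketch of that configuration (not something from the
thread), the two settings could be prepared like this. The device name
"sdX" and the 180 second value are placeholders I picked for
illustration; `smartctl -l scterc,0,0` is the smartmontools way to
disable SCT ERC, and /sys/block/<dev>/device/timeout is the per-device
kernel command timer in seconds. Nothing here touches the system until
apply_kernel_timeout() is called, which needs root.

```python
def non_redundant_plan(dev_name, timeout_s=180):
    """Settings for a single/linear/raid0 member: long drive recovery, long kernel timer.

    dev_name and timeout_s are illustrative assumptions, not values from the thread.
    """
    # scterc,0,0 disables SCT ERC so the drive keeps its full recovery time.
    smartctl_argv = ["smartctl", "-l", "scterc,0,0", "/dev/" + dev_name]
    # sysfs file holding the SCSI command timer for this device, in seconds.
    timeout_path = "/sys/block/" + dev_name + "/device/timeout"
    return smartctl_argv, timeout_path, str(timeout_s)


def apply_kernel_timeout(timeout_path, value):
    """Write the new command timer value (requires root; not persistent across reboots)."""
    with open(timeout_path, "w") as f:
        f.write(value)


if __name__ == "__main__":
    argv, path, value = non_redundant_plan("sdX")
    print(" ".join(argv))
    print("echo", value, ">", path)
```

The point of keeping both halves together is that raising only one of
them doesn't help: a long drive recovery with the default kernel timer
still ends in a link reset before the drive has answered.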

>
> So, IMHO, it seems that by default, every SCT ERC capable drive should have
> this enabled by default. As a part of error recovery (ie, crap that really
> important data stored on those few unreadable sectors) the user could
> manually disable SCT ERC and re-attempt to request the data from the drive
> (eg, during dd_rescue or similar).

If you do this for single, linear, or raid0 it will increase the
incidence of data loss that would otherwise not occur if deep/long
recovery times were available.

Before changing these settings, there should be some better
understanding of what the manufacturer-defined recovery times actually
are in the real world, and whether or not these long recoveries are
helpful. Presumably they'd say they are helpful, but I think we need
facts to contradict their position before second-guessing the default
settings. And we have such facts to do exactly that when it comes to
raid1, 5, 6 with such drives, which is why the recommendation is to
change SCT ERC if supported.
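
For the redundant case, a minimal sketch of building that SCT ERC
change might look like the following. smartctl takes the read and write
limits in tenths of a second; the 7.0 second limit is a commonly used
value and an assumption on my part, as is the 30 second figure used for
the kernel command timer (the usual Linux default). The only thing the
function enforces is that the drive gives up before the kernel does.

```python
def scterc_argv(dev, limit_s=7.0, kernel_timeout_s=30):
    """Build a smartctl command that caps drive error recovery at limit_s seconds.

    limit_s and kernel_timeout_s are assumed example values.
    """
    if limit_s <= 0:
        raise ValueError("use scterc,0,0 explicitly if you mean to disable SCT ERC")
    if limit_s >= kernel_timeout_s:
        # The drive must report the read error before the command timer expires,
        # otherwise md never sees the error and cannot rewrite the sector.
        raise ValueError("SCT ERC limit must be below the kernel command timer")
    deciseconds = round(limit_s * 10)  # smartctl expects units of 100 ms
    return ["smartctl", "-l", "scterc,%d,%d" % (deciseconds, deciseconds), dev]


if __name__ == "__main__":
    print(" ".join(scterc_argv("/dev/sdX")))
```

On many drives this setting does not survive a power cycle, so it
typically has to be reapplied at boot.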

> Secondly, changing the timeout for those drives that don't support SCT ERC,
> again, it is fairly similar to above, we get the error from the drive before
> the timeout, except we will avoid the only possible downside above (failing
> to read a very unlikely but possible to read sector). Again, we will avoid
> dropping the entire drive, even if all operations on this drive will stop
> for a longer period of time, it is probably better than stopping
> permanently.

Not by default. You can't assume any drive hang is due to bad sectors
that merely need a longer recovery time. It could be some other error
condition, in which case a 120 or 180 second delay *by default* means
no error messages at all for upwards of 3 minutes.

And in any case the proper place to change the default kernel command
timer value is in the kernel, not with a udev rule.

I don't know if a udev rule can say "If the drive is exclusively used
by md, lvm, btrfs, or zfs at raid1, 4+ or nested levels of those, and
if the drive does not support configurable SCT ERC, then change the
kernel command timer for those devices to ~120 seconds". If it can,
then that might be a plausible way to use consumer drives that the
manufacturer rather explicitly proscribes from use in raid...

But the contra argument to that is, why should anyone do this work for
(sorry) basically cheap users who don't want to buy the proper drive
for the specific use case? There are limited resources for this work.
And in fact the problem has a workaround, if not a solution.

What we still don't have is something that reports any such problems
to user space.

-- 
Chris Murphy
