From: Chris <email.bug@arcor.de>
To: linux-raid@vger.kernel.org
Subject: Re: What are mdadm maintainers to do? (error recovery redundancy/data loss)
Date: Wed, 18 Feb 2015 11:04:35 +0000 (UTC)
Message-ID: <loom.20150218T102011-486@post.gmane.org>
In-Reply-To: CAJCQCtTv-HMX77GWqA+_1rfkWodsTETp_Y3w5En-N4nGCymWsA@mail.gmail.com

>

Hello all,

The discussion about SCTERC boils down to letting the drive attempt error
recovery a little longer or a little shorter. For any given disk, experience
seems to show that the difference is slight: if ERC is allowed to run longer,
you may see the first unrecoverable errors (UREs) just a little (maybe only a
month) later.

UREs are inevitable. Thus, if I run a filesystem on just a single drive, it
will get corrupted at some point, and there is nothing to be done about it.

Wait, except... use a redundant raid! And here it makes a big difference
that the drive's ERC actually terminates before the controller timeout, so
that you do not lose all your redundancy again and end up at high risk of
UREs showing up during the re-sync.
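
As a minimal sketch of that condition: smartctl expresses ERC limits in units
of 100 ms (so 70 means 7.0 seconds), while the kernel command timer is given
in seconds. The function name erc_fits_timeout is made up for illustration,
and treating an ERC value of 0 as "no limit" is an assumption of this sketch.

```shell
#!/bin/sh
# erc_fits_timeout ERC_DECISECONDS TIMEOUT_SECONDS
# Succeeds (exit status 0) when the drive would give up on error recovery
# before the kernel command timer fires.
# Assumption: an ERC value of 0 means "no limit", so it never fits.
erc_fits_timeout() {
    erc_ds=$1
    timeout_s=$2
    [ "$erc_ds" -gt 0 ] && [ "$erc_ds" -lt $((timeout_s * 10)) ]
}

# 7.0 s of ERC against a 30 s command timer
if erc_fits_timeout 70 30; then
    echo "ERC ends before the command timer"
fi
```

With a two-minute recovery attempt (1200) the same check fails against a
30-second timer and passes against a 180-second one.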

So for a proper comparison we need to look at the difference it makes in the
usage scenarios (a delayed error vs. losing redundant error resilience plus
triggering UREs), not at the incidence of a single recoverable/unrecoverable
error. It looks to me that it makes a big difference to redundant raids and
no qualitative difference to single-disk filesystems.

And we need to keep in mind that single-disk filesystems also depend on
the disk to stop grinding away at ERC attempts before the controller
timeout. Otherwise, a disk reset may make the system clear buffers and lose
open files? Without prolonging the Linux default controller timeout, SCTERC
can prevent that where supported.
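
For reference, the controller timeout meant here is the per-device kernel
command timer exposed in sysfs, in seconds. A small sketch for looking at the
current values follows; the function name cmd_timeout and its optional
SYSFS_ROOT argument are made up for illustration (the argument only exists so
the function can be pointed at a fake tree).

```shell
#!/bin/sh
# cmd_timeout DEV [SYSFS_ROOT]
# Prints the kernel command timer (seconds) for a block device such as "sda".
# SYSFS_ROOT defaults to /sys.
cmd_timeout() {
    cat "${2:-/sys}/block/$1/device/timeout"
}

# Show the timer for every disk that exposes one.
for f in /sys/block/*/device/timeout; do
    [ -r "$f" ] || continue          # skips the unexpanded glob if nothing matches
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    echo "$dev: $(cmd_timeout "$dev") s"
done
```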



> in any case the proper place to change the default kernel command
> timer value is in the kernel, not with a udev rule.

Right. And as you write, increasing the controller timeout has clear downsides.

Note as well that, as long as the proposed script (a temporary safety measure)
maximizes the controller timeout to remedy disks that don't support
SCTERC, it would even fix the timeout mismatch for single-disk filesystems.
(It lets the controller wait until the disk finally succeeds or fails in its
recovery attempts.)

So the proposed script actually covers a case that benefits raid0 setups as
well (as long as the Linux default does not adapt to the disk parameters),
but increasing the controller timeout in all cases would introduce long and
unreported I/O blocking into all redundant setups.


> I don't know if a udev rule can say "If the drive exclusively uses md,
> lvm, btrfs, zfs raid1, 4+ or nested of those, and if the drive does
> not support configurable SCT ERC, then change the kernel command timer
> for those devices to ~120 seconds" then that might be a plausible
> solution to use consumer drives the manufacturer rather explicitly
> proscribes from use in raid...

The script called by the udev rule could do that, but it can be kept as simple
as proposed and can set SCTERC regardless, because setting SCTERC below the
controller timeout makes a qualitative difference in running redundant
arrays and a marginal difference in running non-redundant filesystems. (And
nevertheless set a long controller timeout for devices that don't support
SCTERC.)



All in all, it looks like a quite simple change is appropriate:

In udev-md-raid-assembly.rules, below LABEL="md_inc" (which handles only
md-supported devices), add one rule:

# fix timeouts for redundant raids, if possible
TEST=="/usr/sbin/smartctl", ENV{MD_LEVEL}=="raid[1-9]*", \
  RUN+="/usr/bin/mdadm-erc-timout-fix"


And in a new /usr/bin/mdadm-erc-timout-fix file implement:

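  # HDD_DEV: kernel name of the member disk, e.g. sda (assumed here; how
  # the udev rule hands it to the script is still open)
  erc=$(/usr/sbin/smartctl -l scterc "/dev/${HDD_DEV}")
  if echo "$erc" | grep -q "Disabled"; then
    /usr/sbin/smartctl -l scterc,70,70 "/dev/${HDD_DEV}"
  elif ! echo "$erc" | grep -q "seconds"; then
    echo 180 > "/sys/block/${HDD_DEV}/device/timeout"
  fi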
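
The decision in the outline above can be separated from its side effects, so
that it can be tried against canned text before touching a real drive. This
is only a sketch: the function name classify_scterc and the action names are
made up, and the sample strings are stand-ins rather than verbatim smartctl
output. Only the keywords "Disabled" and "seconds" come from the outline.

```shell
#!/bin/sh
# classify_scterc TEXT
# TEXT is whatever `smartctl -l scterc <device>` printed.
# Prints the action the outline would take:
#   enable-erc     ERC is supported but currently disabled
#   keep           ERC is already set (the text mentions "seconds")
#   raise-timeout  anything else, treated as "ERC not supported"
classify_scterc() {
    case "$1" in
        *Disabled*) echo enable-erc ;;
        *seconds*)  echo keep ;;
        *)          echo raise-timeout ;;
    esac
}

# Canned examples (hypothetical strings, not captured from a drive):
classify_scterc "Read: Disabled"          # prints enable-erc
classify_scterc "Read: 70 (7.0 seconds)"  # prints keep
classify_scterc "command not supported"   # prints raise-timeout
```

On a real system the argument would be "$(smartctl -l scterc /dev/sda)".
Note that any unexpected text, including an error from smartctl itself, falls
through to raise-timeout, which matches the outline's "does not return
seconds" branch.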


Regards,
Chris



