Re: help with the little script (erc timout fix)

linux-raid.vger.kernel.org archive mirror

From: NeilBrown <neilb@suse.de>
To: Chris <email.bug@arcor.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: help with the little script (erc timout fix)
Date: Thu, 19 Feb 2015 08:25:34 +1100
Message-ID: <20150219082534.0830ee30@notabene.brown>
In-Reply-To: <loom.20150218T155053-576@post.gmane.org>


On Wed, 18 Feb 2015 15:04:53 +0000 (UTC) Chris <email.bug@arcor.de> wrote:

> 
> Hello,
> 
> by adapting what I could find, I compiled the following short snippet now.
> 
> Could list members please look at this novice code and suggest a way to
> determine the containing disk device $HDD_DEV from the partition/disk,
> before I dare to test this.
> 
> 
> 
> In udev-md-raid-assembly.rules, below LABEL="md_inc" (the section that only
> handles md-supported devices) add:
> 
> # fix timeouts for redundant raids, if possible
> IMPORT{program}="BINDIR/mdadm --examine --export $tempnode"
> TEST=="/usr/sbin/smartctl", ENV{MD_LEVEL}=="raid[1-9]*", RUN+="BINDIR/mdadm-erc-timout-fix.sh $tempnode"

It might make sense to have two rules, one for partitions and one for whole
disks (based on ENV{DEVTYPE}).  Then use $parent to get the disk from the
partition, and $devnode to get the device node of the disk itself.
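
If the rule passes the partition node through unchanged instead, the script
can find the containing disk itself. Stripping trailing digits from the name
breaks on names like nvme0n1p1, whereas sysfs nests each partition's directory
inside its disk's directory. A minimal sketch, assuming /sys is mounted and
readlink -f is available (GNU coreutils and busybox both provide it);
disk_of and SYSFS_ROOT are hypothetical names, the latter only there so the
function can be tried against a fake tree:

```sh
#!/bin/sh
# Hypothetical helper: map a block device node (partition or whole disk)
# to the name of the disk that contains it, using sysfs.
# SYSFS_ROOT is an illustrative override for testing; normally /sys.
disk_of() {
    name=${1##*/}                           # /dev/sda1 -> sda1
    sys=${SYSFS_ROOT:-/sys}/class/block/$name
    if [ -e "$sys/partition" ]; then
        # A partition's sysfs directory is nested inside its disk's
        # directory, so the parent directory's name is the disk name.
        real=$(readlink -f "$sys") || return 1
        real=${real%/*}                     # drop the partition component
        printf '%s\n' "${real##*/}"
    elif [ -e "$sys" ]; then
        printf '%s\n' "$name"               # already a whole disk
    else
        return 1                            # unknown device
    fi
}
```

In the script this would be used as HDD_DEV=/dev/$(disk_of "$1").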

> 
> And in a new mdadm-erc-timout-fix.sh file implement:
> 
>   #! /bin/sh
> 
>   HDD_DEV= $1 somehow stripping off the trailing numbers?
> 
>   if smartctl -l scterc ${HDD_DEV} | grep -q Disabled ; then
>     /usr/sbin/smartctl -l scterc,70,70 ${HDD_DEV}
>   else
>     if ! smartctl -l scterc ${HDD_DEV} | grep -q seconds ; then
>       echo 180 >/sys/block/${HDD_DEV}/device/timeout
>     fi
>   fi

You should be consistent: either use /usr/sbin/smartctl everywhere, or
explicitly set $PATH and just use smartctl everywhere.
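
A rough sketch of the script with $PATH set once, untested against real
drives. It assumes it is given the whole-disk node (for example /dev/sda via
$devnode, or /dev/$parent for a partition), keeps the original logic of
grepping the output of smartctl -l scterc, and erc_fix and SYSFS_ROOT are
hypothetical names:

```sh
#!/bin/sh
# mdadm-erc-timout-fix.sh DISK  (sketch; DISK is a whole-disk node such as
# /dev/sda, not a partition)

# Set PATH once, then call smartctl by name everywhere.
PATH=/usr/sbin:/usr/bin:/sbin:/bin
export PATH

erc_fix() {
    dev=$1
    name=${dev##*/}                     # /dev/sda -> sda, for the sysfs path
    sysfs=${SYSFS_ROOT:-/sys}           # SYSFS_ROOT: hypothetical test override
    erc=$(smartctl -l scterc "$dev")    # query the drive once, reuse the output
    if printf '%s\n' "$erc" | grep -q Disabled; then
        # ERC supported but off: cap recovery at 7.0 s (units of 100 ms)
        smartctl -l scterc,70,70 "$dev"
    elif ! printf '%s\n' "$erc" | grep -q seconds; then
        # No ERC support: let the drive finish its own recovery before
        # the kernel's command timer gives up on it.
        echo 180 > "$sysfs/block/$name/device/timeout"
    fi
}

if [ $# -gt 0 ]; then
    erc_fix "$1"
fi
```

The scterc values are in units of 100 ms, so 70,70 means 7.0 seconds for
both reads and writes.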

> 
> Correct execution during boot would seem to require that distro
> package managers hook smartctl and the script into the initramfs
> generation.
> 
> Regards,
> Chris

One problem with this approach is that it assumes circumstances don't change.
If you have a working RAID1, then limiting the timeout on both devices makes
sense.  If you have a degraded RAID1 with only one device left then you
really want the drive to try as hard as it can to get the data.
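
One way to act on this, sketched purely as an illustration, is to choose the
ERC setting from the array's current state rather than once at assembly time.
md reports how many members are missing in /sys/block/mdX/md/degraded, and
smartctl treats an scterc value of 0 as disabling ERC, so the drive falls back
to its own, much longer, recovery. erc_setting_for and SYSFS_ROOT are
hypothetical names, and treating any missing member as "try hard" is a
simplification (a RAID6 missing one disk still has redundancy):

```sh
#!/bin/sh
# Sketch of a state-dependent ERC policy (illustrative only).
# SYSFS_ROOT is a hypothetical override for testing; normally /sys.
erc_setting_for() {
    md=${1##*/}                                 # /dev/md0 -> md0
    f=${SYSFS_ROOT:-/sys}/block/$md/md/degraded
    read -r missing < "$f" || return 1          # number of missing members
    if [ "$missing" -eq 0 ]; then
        echo 70,70   # redundancy intact: give up quickly, md can rewrite from a mirror
    else
        echo 0,0     # degraded: disable ERC so the drive tries as hard as it can
    fi
}
```

The result would be applied with something like
smartctl -l scterc,$(erc_setting_for /dev/md0) /dev/sda, and would have to be
re-run whenever the array's state changes (for example from an mdadm --monitor
PROGRAM hook); with ERC disabled, the kernel timeout needs raising to 180 too.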

There is a "FAILFAST" mechanism in the kernel which allows the filesystem,
md, etc. to indicate that they want accesses to "fail fast", which presumably
means using a shorter timeout.
I would rather md used this flag where appropriate, and that the device
responded to it by using suitable timeouts.

The problem is that FAILFAST isn't documented usefully and it is very hard to
figure out what exactly (if anything) it does.

But until that is resolved, a fix like this is probably a good idea.

NeilBrown

Thread overview: 24+ messages
2015-02-14 21:59 re-add POLICY Chris
2015-02-15 19:03 ` re-add POLICY: conflict detection? Chris
2015-02-16  3:28 ` re-add POLICY NeilBrown
2015-02-16 12:23   ` Chris
2015-02-16 13:17     ` Phil Turmel
2015-02-16 16:15       ` desktop disk's error recovery timouts (was: re-add POLICY) Chris
2015-02-16 17:19         ` desktop disk's error recovery timouts Phil Turmel
2015-02-16 17:48           ` What are mdadm maintainers to do? (was: desktop disk's error recovery timeouts) Chris
2015-02-16 19:44             ` What are mdadm maintainers to do? Phil Turmel
2015-02-16 23:49             ` What are mdadm maintainers to do? (was: desktop disk's error recovery timeouts) NeilBrown
2015-02-17  7:52               ` What are mdadm maintainers to do? (error recovery redundancy/data loss) Chris
2015-02-17  8:48                 ` Mikael Abrahamsson
2015-02-17 10:37                   ` Chris
2015-02-17 19:33                 ` Chris Murphy
2015-02-17 22:47                   ` Adam Goryachev
2015-02-18  1:02                     ` Chris Murphy
2015-02-18 11:04                       ` Chris
2015-02-19  6:12                         ` Chris Murphy
2015-02-20  5:12                           ` Roger Heflin
2015-02-17 23:33                   ` Chris
2015-02-18 15:04               ` help with the little script (erc timout fix) Chris
2015-02-18 21:25                 ` NeilBrown [this message]
2015-02-17 15:09     ` re-add POLICY Chris
2015-02-22 13:23       ` Chris
