rules and scripts (erc timeout fix)

* rules and scripts (erc timeout fix)
@ 2015-02-20 16:54 email.bug
       [not found] ` <B639B66A-F606-43CD-8FCC-D1A7810762D1@gmx.de>
  0 siblings, 1 reply; 3+ messages in thread
From: email.bug @ 2015-02-20 16:54 UTC (permalink / raw)
  To: linux-raid

Hello all,

enjoy. I tested that the scripts set the timeouts correctly here, but I only have
drives that support ERC timeouts (even if some have it disabled by default), none that
would really require setting a long controller timeout.

Cheers,
Chris

smartctl-timeouts README

The smartctl-timeouts scripts adjust the disk timeouts according to use-cases,
fixing commonly mismatched defaults that have often led to data loss.

The scripts are to be called by udev rules during device initialization.
Every redundancy providing block device module may ship with proper udev rules
that initialize the timeouts for their possibly redundant devices.
The module may further adjust the actual status according to run-time changes.

NOTE: Correct execution during boot requires that distro package managers
      hook smartctl and the smartctl-timeouts scripts into the initramfs.

RATIONALE

The error recovery (ERC) timeout *must* be shorter than the controller timeout.

Otherwise read errors will cause controller resets, leading to direct data loss
or, if it is a redundant disk, loss of redundancy and a very high probability
of another read error and data loss when re-establishing the redundancy.

If a drive does not support adjusting its ERC timeout, the controller timeout
must be increased above the drive's maximal error recovery time.
If you don't want that kind of long device timeout, you should look for a drive
with SCT ERC timeout support. (smartctl -l scterc /dev/...)
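
The constraint above can be written down as a small check. This is only a sketch: the function name erc_fits_controller is made up for illustration, the ERC value is assumed to be given in tenths of a second (the unit that "smartctl -l scterc,N,N" takes) and the controller timeout in seconds (the unit of /sys/block/<dev>/device/timeout).

```sh
#!/bin/sh
# Hypothetical helper, not part of the posted scripts.
# $1: ERC timeout in tenths of a second (unit of "smartctl -l scterc,N,N")
# $2: controller timeout in seconds (unit of /sys/block/<dev>/device/timeout)
# Succeeds only if the drive gives up before the controller would reset it.
erc_fits_controller() {
  [ "$1" -lt $(( $2 * 10 )) ]
}

# 7.0s ERC against a 30s controller timeout: the drive reports the error first.
erc_fits_controller 70 30 && echo "ok: 7.0s ERC < 30s controller timeout"
# 99.9s ERC against a 30s controller timeout: the reset would come first.
erc_fits_controller 999 30 || echo "mismatch: 99.9s ERC >= 30s controller timeout"
```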

IMPACT

If possible, the ERC timeout is adjusted to the controller timeout minus 5 seconds,
for all disks that contain possibly redundant data.

The controller timeout is only changed (raising it to LONG_CTRL_WAIT_SECONDS)
for drives without SCTERC support and entirely non-redundant disks, to allow these
drives to properly finish their error recovery before a reset is triggered.

Because controller timeouts are only increased selectively (only drives without SCTERC
support and surely non-redundant disks), the scripts won't change any timeouts in
professional, dedicated, redundant setups (e.g. storage servers etc.), except if
LONG_WAIT_ALL_NONREDUND_DISKS is configured to be true.
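
The "controller timeout minus 5 seconds" rule amounts to a little arithmetic. A minimal sketch follows, assuming the helper name erc_tenths_for (hypothetical), a scterc value expressed in tenths of a second, and an upper limit of 999 for that value. Refusing controller timeouts of 5 seconds or less is also an assumption of this sketch, not something the README states.

```sh
#!/bin/sh
# Hypothetical helper, not part of the posted scripts.
# $1: controller timeout in seconds
# Prints the ERC timeout in tenths of a second (controller timeout - 5s).
erc_tenths_for() {
  tenths=$(( $1 * 10 - 50 ))
  if [ "$tenths" -lt 1 ]; then
    # Assumption: refuse rather than pass 0, which would disable ERC.
    echo "controller timeout of ${1}s leaves no room for a 5s margin" >&2
    return 1
  fi
  if [ "$tenths" -gt 999 ]; then
    tenths=999   # assumed upper limit of the scterc value
  fi
  echo "$tenths"
}

erc_tenths_for 30    # prints 250 (25.0s)
erc_tenths_for 183   # prints 999 (capped)
```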

TODO

* non-redundant-partitions: conditional udev triggering, or a test in the script could
  determine if all partitions of the disk have been detected already and are all
  non-redundant, to call non-redundant-disk in this case.

* parser to read ERC timeout values?
    - redundant-disk: a previously set "controller timeout - 5 seconds" ERC timeout
      (possibly-redundant), could also be reset to 7 seconds, not just a "Disabled" value.

* If a redundancy controlling kernel module is to make dynamic adjustments,
  "redundant-partition" needs implementation.

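For the "parser to read ERC timeout values" item, one possible shape is sketched below. The function name parse_scterc_read is hypothetical, and the line formats it expects from "smartctl -l scterc" ("Read: 70 (7.0 seconds)" and "Read: Disabled") are an assumption that should be checked against the installed smartctl version.

```sh
#!/bin/sh
# Hypothetical parser for the "Read:" line of `smartctl -l scterc` output,
# read from stdin. Assumed line formats:
#            Read:     70 (7.0 seconds)
#            Read: Disabled
# Prints the read ERC value in tenths of a second, "disabled", or
# "unsupported" when no such line is present.
parse_scterc_read() {
  awk '
    $1 == "Read:" && $2 == "Disabled" { print "disabled"; found = 1; exit }
    $1 == "Read:" && $2 ~ /^[0-9]+$/  { print $2; found = 1; exit }
    END { if (!found) print "unsupported" }
  '
}

# Sample input standing in for real smartctl output:
printf '           Read:     70 (7.0 seconds)\n          Write:     70 (7.0 seconds)\n' \
  | parse_scterc_read

# On a real system (as root) this would be, for example:
#   smartctl -l scterc /dev/sda | parse_scterc_read
```

Reading the value once and branching on the result would also replace the paired "grep -q Disabled" / "grep -q seconds" calls with a single smartctl invocation.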
[parent not found: <B639B66A-F606-43CD-8FCC-D1A7810762D1@gmx.de>]

* Re: udev rules and scripts (erc timeout fix)
       [not found] ` <B639B66A-F606-43CD-8FCC-D1A7810762D1@gmx.de>
@ 2015-02-22 10:23   ` Chris
  2015-02-25 14:37     ` Chris
  0 siblings, 1 reply; 3+ messages in thread
From: Chris @ 2015-02-22 10:23 UTC (permalink / raw)
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 24224 bytes --]

Hello all,

thanks for letting me know that the attachment didn't come through the
mailing list.

The following is an improved version with its README, inserted inline.
It has changes to support configuring specific timeouts, and switching
between them.

Cheers,
Chris

--------
# smartctl-timeouts_defaults
# Defaults used by smartctl-timeouts scripts:

NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS="183"
# Should always be set. (e.g. 183)
# Used for disks without SCTERC support, to prevent too early
# resets. This long controller timeout value should be above
# the usual error recovery time of the harddrives without
# SCTERC support. Unfortunately, these values don't seem
# to be readable from the drives nor published.

NONREDUNDANT_UNSURE_RESET_ALL_DISKS=""
# If "true", the ERC timeout gets disabled for non-redundant disks
# and NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS is used as the controller timeout.
# Can be set to "true" to try letting non-redundant disks fully
# complete their error recovery attempt.

# The configuration options below can be left blank or commented
# out. This results in working with the hardware, kernel, or
# distribution defaults, and making only the necessary adjustments when
# initializing.
# But without configuring specific values, switching between
# the redundancy modes may not work well.

#NONREDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS="63"
# May be set to allow ample ERC time (e.g. 63).
# If blank the current timeout will not be changed, if possible.
# Note that the max. ERC timeout is 99.9 seconds (scterc value 999), so a
# longer controller timeout won't result in longer error correction
# attempts. Possibly use NONREDUNDANT_UNSURE_RESET_ALL_DISKS if your
# disk will do longer error correction attempts when the ERC
# timeout is disabled.

#POSSIBLY_REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS="48"
# May be set to allow some (-5s ) ERC timeout, yet not blocking
# redundant disks for too long.
# If blank the current setting will not be changed, if possible.

#REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS="29"
# May be set to quickly reset blocking disks.
# If blank the current setting will not be changed, if possible.

TIMING_CMD="/usr/sbin/smartctl -l scterc"
set -o nounset -o errexit

-------
# do not edit this file, it will be overwritten on update

# Don't process any events if anaconda is running as anaconda brings up
# raid devices manually
ENV{ANACONDA}=="?*", GOTO="md_inc_end"
# assemble md arrays

SUBSYSTEM!="block", GOTO="md_inc_end"

# handle potential components of arrays (the ones supported by md)
ENV{ID_FS_TYPE}=="linux_raid_member", GOTO="md_inc"

# "noiswmd" on kernel command line stops mdadm from handling
#  "isw" (aka IMSM - Intel RAID).
# "nodmraid" on kernel command line stops mdadm from handling
#  "isw" or "ddf".
IMPORT{cmdline}="noiswmd"
IMPORT{cmdline}="nodmraid"

ENV{nodmraid}=="?*", GOTO="md_inc_end"
ENV{ID_FS_TYPE}=="ddf_raid_member", GOTO="md_inc"
ENV{noiswmd}=="?*", GOTO="md_inc_end"
ENV{ID_FS_TYPE}=="isw_raid_member", GOTO="md_inc"
GOTO="md_inc_end"

LABEL="md_inc"

# initialize redundancy possibility status
# (only the kernel module could set actual run-time state, and may in the future
# set a dynamic FASTFAIL kernel device property instead of calling smartctl-timeout scripts)
IMPORT{program}="BINDIR/mdadm --examine --export $tempnode"
ENV{MD_LEVEL}=="raid[1-9]*", ENV{REDUNDANT_DEV}="possibly"
ENV{MD_LEVEL}=="raid0", ENV{REDUNDANT_DEV}="false"

# remember you can limit what gets auto/incrementally assembled by
# mdadm.conf(5)'s 'AUTO' and selectively whitelist using 'ARRAY'
ACTION=="add|change", IMPORT{program}="BINDIR/mdadm --incremental --export $tempnode --offroot ${DEVLINKS}"
ACTION=="add|change", ENV{MD_STARTED}=="*unsafe*", ENV{MD_FOREIGN}=="no", ENV{SYSTEMD_WANTS}+="mdadm-last-resort@$env{MD_DEVICE}.timer"
ACTION=="remove", ENV{ID_PATH}=="?*", RUN+="BINDIR/mdadm -If $name --path $env{ID_PATH}"
ACTION=="remove", ENV{ID_PATH}!="?*", RUN+="BINDIR/mdadm -If $name"

LABEL="md_inc_end"

# initialize redundancy status for all surely non-redundant devices
# (The mdadm, btrfs, zfs, lvm, ... devices need to be adjusted by their own packages)
ENV{ID_FS_TYPE}!="linux_raid*|ddf_raid*|isw_raid*|lvm_*|LVM*|btrfs*|zfs*", ENV{REDUNDANT_DEV}="false"

# call initial HDD error correction timeouts adjustment
# (TEST is a match key and needs "=="; disks pass $kernel because the scripts
# prepend /dev/ and /sys/block/ to the name themselves)
ENV{DEVTYPE}=="partition", ENV{REDUNDANT_DEV}=="possibly", TEST=="/usr/sbin/smartctl", RUN+="BINDIR/smartctl-timeouts_possibly-redundant-partition.sh $parent"
ENV{DEVTYPE}=="partition", ENV{REDUNDANT_DEV}=="false", TEST=="/usr/sbin/smartctl", RUN+="BINDIR/smartctl-timeouts_non-redundant-partition.sh $parent"
ENV{DEVTYPE}=="disk", ENV{REDUNDANT_DEV}=="possibly", TEST=="/usr/sbin/smartctl", RUN+="BINDIR/smartctl-timeouts_possibly-redundant-disk.sh $kernel"
ENV{DEVTYPE}=="disk", ENV{REDUNDANT_DEV}=="false", TEST=="/usr/sbin/smartctl", RUN+="BINDIR/smartctl-timeouts_non-redundant-disk.sh $kernel"

------
#!/bin/sh
# smartctl-timeouts_possibly-redundant-disk.sh

SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
. $SCRIPT_DIR/smartctl-timeouts_defaults

HDD_DEV="$1"

echo "Adjusting $HDD_DEV timeouts:"

if ! ${TIMING_CMD} /dev/${HDD_DEV} | grep -q Disabled \
  && ! ${TIMING_CMD} /dev/${HDD_DEV} | grep -q seconds
then
  # ERC timeout is not supported (not disabled and not set):
  # * Set the controller timeout to be considerably loooooong.
  #   - To allow the drive to give up its ERC attempts by itself.
  #   - Let the drive return a proper read error, so that the redundancy
  #     provider (md, lvm, btrfs, ...) can re-write the bad block.
  #   - Disk read errors thus result in long i/o blocking periods with
  #     no error messages that could be seen by or reported to the user,
  #   - but waiting this long should prevent unnecessary controller resets of the
  #     entire drive and the corresponding loss of redundancy/data.
  echo "Drive without ERC timeout support, setting NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS (${NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS}s)"
  echo ${NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS} >/sys/block/${HDD_DEV}/device/timeout
else
  SWITCH_FROM_OTHER_CONFIGURED_SMARTCTL_TIMEOUT="false"

  # reset controller timeout, if a configured value was previously set
  if [ `cat /sys/block/${HDD_DEV}/device/timeout` = ${NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS:--1} ] \
    || [ `cat /sys/block/${HDD_DEV}/device/timeout` = ${NONREDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS:--1} ] \
    || [ `cat /sys/block/${HDD_DEV}/device/timeout` = ${REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS:--1} ]
  then
    SWITCH_FROM_OTHER_CONFIGURED_SMARTCTL_TIMEOUT="true"
    echo "resetting controller from another configured value (`cat /sys/block/${HDD_DEV}/device/timeout`s) to ${POSSIBLY_REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS:-30}s"
    echo ${POSSIBLY_REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS:-30} >/sys/block/${HDD_DEV}/device/timeout
  else
    # set possibly-redundant timeout anyway, if configured
    if [ ${POSSIBLY_REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS:-undefined} != "undefined" ] ; then
      echo "setting controller timeout to POSSIBLY_REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS (${POSSIBLY_REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS}s)"
      echo ${POSSIBLY_REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS} >/sys/block/${HDD_DEV}/device/timeout
    fi
  fi

  if ${TIMING_CMD} /dev/${HDD_DEV} | grep -q Disabled \
     || [ $SWITCH_FROM_OTHER_CONFIGURED_SMARTCTL_TIMEOUT = "true" ] \
     || [ ${POSSIBLY_REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS:-undefined} != "undefined" ]
  then
    # ERC timeout is disabled or configured:
    # * set it to controller timeout -5 seconds
    #   - Allows redundancy provider to read data from another disk and re-write the bad block
    #     before the controller resets the entire drive and the raid loses redundancy/data completely.
    #   - Longer than the usual 7s default of dedicated raid drives, to allow as much
    #     ERC time as possible (good if degraded and for non-redundant partitions on same drive).
    ERC_TENTHS=$(expr `cat /sys/block/${HDD_DEV}/device/timeout` \* 10 - 50)
    # prevent exceeding max. scterc value
    if [ $ERC_TENTHS -gt 999 ] ; then
      ERC_TENTHS="999"
    fi
    echo "maximizing ERC timeout to controller timeout -5 seconds (`expr $ERC_TENTHS / 10`s)"
    ${TIMING_CMD},$ERC_TENTHS,$ERC_TENTHS /dev/${HDD_DEV} > /dev/null
  fi
fi
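
The "was a configured value previously set" test in the script above reads the sysfs timeout once per comparison and relies on the ":--1" fallback for unset options. A minimal sketch of the same idea as one helper follows. The name timeout_is_one_of is hypothetical, and treating blank arguments as "never matches" is this sketch's own convention, not the script's.

```sh
#!/bin/sh
# Hypothetical helper, not part of the posted scripts.
# $1: current controller timeout; remaining arguments: configured values.
# Succeeds if the current timeout equals one of the configured values.
# Blank arguments (options left unconfigured) never match.
timeout_is_one_of() {
  current=$1
  shift
  for candidate in "$@"; do
    [ -n "$candidate" ] && [ "$current" = "$candidate" ] && return 0
  done
  return 1
}

# Sample values: current timeout 183s, other options 183 / 63 / unset.
if timeout_is_one_of 183 183 63 ""; then
  echo "switching from another configured value"
fi
```

In the script the first argument would come from a single read of /sys/block/${HDD_DEV}/device/timeout, with the other configured timeouts passed as "${VAR:-}".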
----------

#! /bin/sh
# smartctl-timeouts_possibly-redundant-partition.sh

# This script sets the timeouts for "mixed drives" that contain redundant
# and non-redundant partitions.

# A single, possibly-redundant partition is enough to set the entire drive's
# timeouts to possibly-redundant settings (with a defined ERC timeout slightly
# below the default or configured controller timeout, if possible).
#
# This avoids the risk of unknown disk recovery times and the need for a very
# long controller timeout. Where configuring such an ERC timeout is possible,
# this means the disk recovery may be terminated sooner than the drive
# would without the timeout set, but it ensures that there will be no resets
# leading to data loss and redundancy loss.

SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
$SCRIPT_DIR/smartctl-timeouts_possibly-redundant-disk.sh $1

--------------
#!/bin/sh
# smartctl-timeouts_non-redundant-disk.sh

SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
. $SCRIPT_DIR/smartctl-timeouts_defaults

HDD_DEV="$1"

echo "Adjusting $HDD_DEV timeouts:"

if [ ${NONREDUNDANT_UNSURE_RESET_ALL_DISKS:-false} = "true" ] ; then
  # * disable any ERC timeout
  #   - Allows the drive to do ERC without imposing a timeout.
  ${TIMING_CMD},0,0 /dev/${HDD_DEV} > /dev/null

  # * Set the controller timeout to be considerably loooooong.
  #   - To allow the drive to give up its ERC attempts by itself.
  #   - Let the drive return a proper read error, so that the redundancy
  #     provider (md, lvm, btrfs, ...) can re-write the bad block.
  #   - Disk read errors thus result in long i/o blocking periods with
  #     no error messages that could be seen by or reported to the user,
  #   - but waiting this long should prevent unnecessary controller resets of the
  #     entire drive and the corresponding loss of redundancy/data.
  echo "NONREDUNDANT_UNSURE_RESET_ALL_DISKS is true, setting NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS (${NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS}s)"
  echo ${NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS} >/sys/block/${HDD_DEV}/device/timeout

else

  if ! ${TIMING_CMD} /dev/${HDD_DEV} | grep -q Disabled \
    && ! ${TIMING_CMD} /dev/${HDD_DEV} | grep -q seconds
  then
    # ERC timeout is not supported (not disabled and not set)
    # * Set the controller timeout to be considerably loooooong.
    echo "Drive without ERC timeout support, setting NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS (${NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS}s)"
    echo ${NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS} >/sys/block/${HDD_DEV}/device/timeout

  else

    if ${TIMING_CMD} /dev/${HDD_DEV} | grep -q seconds \
      || [ ${NONREDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS:-undefined} != "undefined" ]
    then

      # reset controller timeout, if a configured value was previously set
      if [ `cat /sys/block/${HDD_DEV}/device/timeout` = ${NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS:--1} ] \
        || [ `cat /sys/block/${HDD_DEV}/device/timeout` = ${POSSIBLY_REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS:--1} ] \
        || [ `cat /sys/block/${HDD_DEV}/device/timeout` = ${REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS:--1} ]
      then
        echo "resetting controller from another configured value (`cat /sys/block/${HDD_DEV}/device/timeout`s) to ${NONREDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS:-60}s"
        echo ${NONREDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS:-60} >/sys/block/${HDD_DEV}/device/timeout
      else
        # set non-redundant timeout anyway, if configured
        if [ ${NONREDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS:-undefined} != "undefined" ] ; then
          echo "setting configured NONREDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS (${NONREDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS}s)"
          echo ${NONREDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS} >/sys/block/${HDD_DEV}/device/timeout
        fi
      fi

      # An ERC timeout is set or configured:
      # * change ERC timeout to controller timeout -5 seconds
      #   - Longer than the usual 7s default of dedicated raid drives, to allow as much
      #     ERC time as possible.
      ERC_TENTHS=$(expr `cat /sys/block/${HDD_DEV}/device/timeout` \* 10 - 50)

      # prevent exceeding max. scterc value
      if [ $ERC_TENTHS -gt 999 ] ; then
        ERC_TENTHS="999"
      fi

      echo "maximizing ERC timeout to controller timeout -5 seconds (`expr $ERC_TENTHS / 10`s)"
      ${TIMING_CMD},$ERC_TENTHS,$ERC_TENTHS /dev/${HDD_DEV} > /dev/null

    else # ERC timeout disabled
      echo "found ERC timeout disabled, setting NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS (${NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS}s)"
      echo ${NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS} >/sys/block/${HDD_DEV}/device/timeout
    fi
  fi
fi

--------------
#!/bin/sh
# smartctl-timeouts_non-redundant-partition.sh

# Because there may also be redundant partitions on this disk we must not
# unconditionally alter the timeouts.

SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
. $SCRIPT_DIR/smartctl-timeouts_defaults

HDD_DEV="$1"

REDUNDANT_DISK="unchecked"

# TODO
#for dev in `cd /sys/block/${HDD_DEV} ; ls -d ${HDD_DEV}*` ; do
#  if equivalent to udev's ENV{REDUNDANT_DEV}=="yes|possibly"; then
#    REDUNDANT_DISK="possibly"
#  fi
#done
#if [ $REDUNDANT_DISK = "unchecked" ] ; then
#  REDUNDANT_DISK="false"
#fi

if [ $REDUNDANT_DISK = "false" ] \
# TODO  && all partitions have been detected by udev already
then
  $SCRIPT_DIR/smartctl-timeouts_non-redundant-disk.sh $1
else
  $SCRIPT_DIR/smartctl-timeouts_possibly-redundant-disk.sh $1
fi
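
One way the TODO loop above might be filled in is sketched here. It is an assumption-laden sketch, not the author's implementation: the function name disk_redundancy is hypothetical, and it presumes that the REDUNDANT_DEV property set by the udev rules can be read back per partition (for example with "udevadm info --query=property").

```sh
#!/bin/sh
# Hypothetical sketch of the TODO: decide the whole disk's status from the
# REDUNDANT_DEV value of each partition, read one value per line from stdin.
# An empty or unknown line stands for a partition not classified yet.
disk_redundancy() {
  status="false"
  while IFS= read -r value; do
    case "$value" in
      yes|possibly) echo "possibly"; return 0 ;;
      false)        ;;
      *)            status="unchecked" ;;
    esac
  done
  echo "$status"
}

# Sample input: two partitions, both classified as non-redundant.
printf 'false\nfalse\n' | disk_redundancy

# On a real system the values might be collected like this (assumption):
#   for part in /sys/block/sda/sda*; do
#     udevadm info --query=property --name="/dev/${part##*/}" \
#       | awk -F= '$1 == "REDUNDANT_DEV" { v = $2 } END { print v }'
#   done | disk_redundancy
```

Only a result of "false" would justify calling the non-redundant-disk script; "unchecked" means at least one partition has not been seen by udev yet.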

-------------
#!/bin/sh
# smartctl-timeouts_redundant-disk.sh

# Redundant timeouts are NEVER to be triggered by udev rules!
# Only the redundancy-providing kernel module knows the actual run-time
# redundancy status, so only it can adjust the status and call this script dynamically.
# Udev rules can only determine "possibly redundant" devices.

SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
. $SCRIPT_DIR/smartctl-timeouts_defaults

HDD_DEV="$1"

echo "Adjusting $HDD_DEV timeouts:"

if ! ${TIMING_CMD} /dev/${HDD_DEV} | grep -q Disabled \
  && ! ${TIMING_CMD} /dev/${HDD_DEV} | grep -q seconds
then
    # ERC timeout is not supported (not disabled and not set):
    # * Set the controller timeout to be considerably loooooong.
    #   - To allow the drive to give up its ERC attempts by itself.
    #   - Let the drive return a proper read error, so that the redundancy
    #     provider (md, lvm, btrfs, ...) can re-write the bad block.
    #   - Disk read errors thus result in long i/o blocking periods with
    #     no error messages that could be seen by or reported to the user,
    #   - but waiting this long should prevent unnecessary controller resets of the
    #     entire drive and the corresponding loss of redundancy/data.
    echo "Drive without ERC timeout support, setting NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS (${NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS}s)"
    echo ${NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS} >/sys/block/${HDD_DEV}/device/timeout
else
  SWITCH_FROM_OTHER_CONFIGURED_SMARTCTL_TIMEOUT="false"

  # reset controller timeout, if a configured value was previously set
  if [ `cat /sys/block/${HDD_DEV}/device/timeout` = ${NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS:--1} ] \
    || [ `cat /sys/block/${HDD_DEV}/device/timeout` = ${NONREDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS:--1} ] \
    || [ `cat /sys/block/${HDD_DEV}/device/timeout` = ${POSSIBLY_REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS:--1} ]
  then
    SWITCH_FROM_OTHER_CONFIGURED_SMARTCTL_TIMEOUT="true"
    echo "resetting controller from another configured value (`cat /sys/block/${HDD_DEV}/device/timeout`s) to ${REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS:-30}s"
    echo ${REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS:-30} >/sys/block/${HDD_DEV}/device/timeout
  else
    # set redundant timeout anyway, if configured
    if [ ${REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS:-undefined} != "undefined" ] ; then
      echo "setting controller timeout to REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS (${REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS}s)"
      echo ${REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS} >/sys/block/${HDD_DEV}/device/timeout
    else
      if [ `cat /sys/block/${HDD_DEV}/device/timeout` -gt 30 ] ; then
        echo "reducing controller timeout to 30 seconds"
        echo 30 >/sys/block/${HDD_DEV}/device/timeout
      fi
    fi
  fi

  if ${TIMING_CMD} /dev/${HDD_DEV} | grep -q Disabled \
    || [ $SWITCH_FROM_OTHER_CONFIGURED_SMARTCTL_TIMEOUT = "true" ] \
    || [ ${REDUNDANT_DISK_CONTROLLER_TIMEOUT_SECONDS:-undefined} != "undefined" ] \
# TODO:  || [ $(expr `cat /sys/block/${HDD_DEV}/device/timeout` \* 10 - 50) = read of current ERC timeout value ]
  then
    # ERC timeout is disabled, is configured, or has been "maximized" to the controller timeout -5 seconds:
    # * set it to 7 seconds
    #   - The usual quick 7s default of dedicated raid drives.
    #   - Allows redundancy provider to quickly read data from another disk and re-write the bad block
    #     before the controller resets the entire drive and the raid loses redundancy/data completely.
    echo "setting ERC timeout to 7 seconds"
    ${TIMING_CMD},70,70 /dev/${HDD_DEV} > /dev/null
  fi
fi
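
The TODO condition in the script above (detect an ERC timeout that was earlier "maximized" to the controller timeout minus 5 seconds) could look roughly like this. The helper name erc_was_maximized is hypothetical, and the current ERC value is assumed to come from some parser of the smartctl output, which the posted scripts do not have yet.

```sh
#!/bin/sh
# Hypothetical check, not part of the posted scripts.
# $1: current ERC timeout in tenths of a second
# $2: controller timeout in seconds, sampled BEFORE the script rewrites it
# Succeeds if the ERC value equals "controller timeout - 5s", including the
# 999 cap that the possibly-redundant script applies.
erc_was_maximized() {
  expected=$(( $2 * 10 - 50 ))
  if [ "$expected" -gt 999 ]; then
    expected=999
  fi
  [ "$1" -eq "$expected" ]
}

erc_was_maximized 250 30 && echo "ERC was maximized, resetting it to 7s is safe"
erc_was_maximized 70 30 || echo "ERC holds some other value, leaving it alone"
```

Note the ordering issue: the script changes the controller timeout before it looks at the ERC timeout, so the old controller timeout would have to be saved first for this comparison to be meaningful.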

------
#!/bin/sh
# smartctl-timeouts_redundant-partition.sh

# Redundant timeouts are NEVER to be triggered by udev rules!
# Only the redundancy-providing kernel module knows the actual run-time
# redundancy status, so only it can adjust the status and call this script dynamically.
# Udev rules can only determine "possibly redundant" devices.

SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
. $SCRIPT_DIR/smartctl-timeouts_defaults

HDD_DEV="$1"

REDUNDANT_DISK="unchecked"

# TODO
#for dev in `cd /sys/block/${HDD_DEV} ; ls -d ${HDD_DEV}*` ; do
#  if equivalent to udev's ENV{REDUNDANT_DEV}=="false"; then
#    REDUNDANT_DISK="false"
#  fi
#done
#if [ $REDUNDANT_DISK = "unchecked" ] ; then
#  REDUNDANT_DISK="true"
#fi

if [ $REDUNDANT_DISK = "true" ] \
# TODO:  && all partitions have been detected by udev already
then
  $SCRIPT_DIR/smartctl-timeouts_redundant-disk.sh $1
else
  $SCRIPT_DIR/smartctl-timeouts_possibly-redundant-disk.sh $1
fi

---------

smartctl-timeouts README

The smartctl-timeouts scripts adjust controller and disk timeouts according
to redundancy status, and fix commonly mismatching defaults with drives that
have no error recovery timeout configured, which has often led to data loss.

The scripts are to be called by udev rules during device initialization,
and by kernel modules according to run-time changes in redundancy status.
Every redundancy providing block device module may ship with proper udev rules
that initialize the timeouts for their possibly redundant devices.

An alternative to these scripts may be to investigate the FASTFAIL
feature in the kernel.

NOTE: Correct execution during boot requires that distro package managers
      hook smartctl and the smartctl-timeouts scripts into the initramfs.

RATIONALE

The error recovery (ERC) timeout *must* be shorter than the controller timeout.

Otherwise read errors will cause controller resets, leading to direct data loss
or, if it is a redundant disk, loss of redundancy and a very high probability
of another read error and data loss when re-establishing the redundancy.

If a drive does not support adjusting its ERC timeout, the controller timeout
must be increased above the drive's maximal error recovery time.
If you don't want that kind of long device timeout, you should look for a drive
with SCT ERC timeout support. (smartctl -l scterc /dev/...)

IMPACT (without having specific timeouts configured)

For possibly redundant disks: If supported but simply disabled in the drive,
the ERC timeout is adjusted to the current controller timeout minus 5 seconds.

The controller timeout is only raised (to NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS)
for drives without SCTERC support, and for entirely non-redundant disks,
in an attempt to allow these drives to finish their error recovery normally
before a reset is triggered.

As controller timeouts are only increased selectively (only drives without SCTERC
support and surely non-redundant disks), the scripts only adapt mismatching
timeouts, by default. Existing manufacturer or custom ERC timeout settings (as in
professional, dedicated, redundant setups, e.g. storage servers etc.) won't be
changed, except with specific configuration options.
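
The default behaviour described in this section can be condensed into a lookup. The sketch below is derived from reading the posted disk scripts with no specific timeouts configured and with a controller timeout that the scripts have not touched before. The function name default_action and the state names are made up for illustration.

```sh
#!/bin/sh
# Hypothetical summary of the default behaviour, not part of the scripts.
# $1: redundancy class ("possibly" or "false")
# $2: ERC state of the drive ("unsupported", "disabled" or "set")
default_action() {
  case "$1:$2" in
    possibly:unsupported|false:unsupported|false:disabled)
      echo "raise controller timeout" ;;
    possibly:disabled|false:set)
      echo "set ERC to controller timeout - 5s" ;;
    possibly:set)
      echo "leave unchanged" ;;
    *)
      echo "unknown"; return 1 ;;
  esac
}

default_action possibly disabled   # prints: set ERC to controller timeout - 5s
default_action possibly set        # prints: leave unchanged
```

"raise controller timeout" here means writing NONREDUNDANT_UNSURE_CONTROLLER_RESET_SECONDS to /sys/block/<dev>/device/timeout.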

TODO

* non-redundant-partitions: conditional udev triggering, or a test in the script could
  determine if all partitions of the disk have been detected already and are all
  non-redundant, to call non-redundant-disk in this case.

* parser to read ERC timeout values?
    - redundant-disk: a previously set "controller timeout - 5 seconds" ERC timeout
      (possibly-redundant), could also be reset to 7 seconds, not just a "Disabled" value.

* If a redundancy controlling kernel module is to make dynamic adjustments,
  "redundant-partition" needs implementation.

[-- Attachment #2: smartctl-timeouts_email2.zip --]
[-- Type: application/zip, Size: 9916 bytes --]

* Re: udev rules and scripts (erc timeout fix)
  2015-02-22 10:23   ` udev " Chris
@ 2015-02-25 14:37     ` Chris
  0 siblings, 0 replies; 3+ messages in thread
From: Chris @ 2015-02-25 14:37 UTC (permalink / raw)
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 2379 bytes --]


Hello,

I managed to test and bugfix the udev rules.

Simply dropping all files into /etc/udev/rules.d/ is enough, and
a "udevadm test /block/<drive_dev>/<partition_dev>" (dry run) for a
raid member seems to run ok and tells me it would execute the
appropriate script.

However, for me, un- and re-plugging a device still does not seem to
execute the script. The _possibly-redundant-disk.sh currently logs
into /tmp/timeout-tst, but udev events don't appear there.

If you have the possibility to test this in your /etc/udev/rules.d/
setup, it would be much appreciated.
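
For anyone trying this, a possible test sequence is sketched below. The helper udev_retest_cmds is hypothetical and only prints the commands for review. The partition name sdb1 is a placeholder, and the udevadm options shown should be checked against the installed udev version.

```sh
#!/bin/sh
# Hypothetical helper: print (not run) the commands for re-testing the rules
# against one partition, e.g. "sdb1". Run them as root after reviewing.
udev_retest_cmds() {
  part=$1
  # reload the rules files after editing them
  echo "udevadm control --reload"
  # dry run: shows which RUN programs would be executed for the device
  echo "udevadm test \"\$(udevadm info --query=path --name=/dev/$part)\""
  # replay a real event without unplugging the device
  echo "udevadm trigger --action=change --sysname-match=$part"
}

udev_retest_cmds sdb1
```

Running "udevadm monitor --property" in a second terminal while triggering should show whether REDUNDANT_DEV is actually set on the event.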

Cheers,
Chris




# /etc/udev/rules.d/99-test_mdadm_smartctl-timeouts.rules

SUBSYSTEM!="block", GOTO="md_inc_end"

# handle potential components of arrays (the ones supported by md)
ENV{ID_FS_TYPE}=="linux_raid_member", GOTO="md_inc"

# "noiswmd" on kernel command line stops mdadm from handling
#  "isw" (aka IMSM - Intel RAID).
# "nodmraid" on kernel command line stops mdadm from handling
#  "isw" or "ddf".
IMPORT{cmdline}="noiswmd"
IMPORT{cmdline}="nodmraid"

ENV{nodmraid}=="?*", GOTO="md_inc_end"
ENV{ID_FS_TYPE}=="ddf_raid_member", GOTO="md_inc"
ENV{noiswmd}=="?*", GOTO="md_inc_end"
ENV{ID_FS_TYPE}=="isw_raid_member", GOTO="md_inc"
GOTO="md_inc_end"

LABEL="md_inc"

# initialize redundancy possibility status
# (only the kernel module could set actual run-time state, and may in the future
# set a dynamic FASTFAIL kernel device property instead of calling smartctl-timeout scripts)

IMPORT{program}="/sbin/mdadm --examine --export $tempnode"
ENV{MD_LEVEL}=="raid[1-9]*", ENV{REDUNDANT_DEV}="possibly"
ENV{MD_LEVEL}=="raid0", ENV{REDUNDANT_DEV}="false"


LABEL="md_inc_end"


# call initial HDD error correction timeouts adjustment
ENV{DEVTYPE}=="partition", ENV{REDUNDANT_DEV}=="possibly", TEST=="/usr/sbin/smartctl", RUN+="/etc/udev/rules.d/smartctl-timeouts_possibly-redundant-partition.sh $parent"
ENV{DEVTYPE}=="partition", ENV{REDUNDANT_DEV}=="false|", TEST=="/usr/sbin/smartctl", RUN+="/etc/udev/rules.d/smartctl-timeouts_non-redundant-partition.sh $parent"
ENV{DEVTYPE}=="disk", ENV{REDUNDANT_DEV}=="possibly", TEST=="/usr/sbin/smartctl", RUN+="/etc/udev/rules.d/smartctl-timeouts_possibly-redundant-disk.sh $kernel"
ENV{DEVTYPE}=="disk", ENV{REDUNDANT_DEV}=="false", TEST=="/usr/sbin/smartctl", RUN+="/etc/udev/rules.d/smartctl-timeouts_non-redundant-disk.sh $kernel"

[-- Attachment #2: smartctl-timeouts_udevadm-test.tar.gz --]
[-- Type: application/x-gzip, Size: 5448 bytes --]
