From: Phil Turmel <philip@turmel.org>
To: Chris Murphy <lists@colorremedies.com>
Cc: linux-raid Raid <linux-raid@vger.kernel.org>
Subject: Re: recommended way to add ssd cache to mdraid array
Date: Fri, 11 Jan 2013 19:47:51 -0500
Message-ID: <50F0B2B7.1080604@turmel.org>
In-Reply-To: <3EE4457C-A185-4223-A84D-5BFBB204A94C@colorremedies.com>

On 01/11/2013 12:46 PM, Chris Murphy wrote:
> 
> On Jan 11, 2013, at 10:39 AM, Chris Murphy <lists@colorremedies.com>
> wrote:
>> 
>> They probably have a high ERC time out as all consumer disks do so
>> you should also check /sys/block/sdX/device/timeout and make sure
>> it's not significantly less than the drive. It may be possible for
>> smartctl or hdparm to figure out what the drive ERC timeout is.
>> 
>> http://cgi.csc.liv.ac.uk/~greg/projects/erc/
> 
> Actually what I wrote is misleading to the point it's wrong. You want
> the linux device time out to be greater than the device timeout. The
> device needs to be allowed to give up, and report back a read error
> to linux/md, so that md knows it should reconstruct the missing data
> from parity, and overwrite the (obviously) bad blocks causing the
> read error.
> 
> If the linux device time out is even a little bit less than the
> drive's timeout, md never gets the sector read error, doesn't repair
> it, since linux boots the whole drive. Now instead of repairing a few
> sectors, you have a degraded array on your hands. Usual consumer
> drive time outs are quite high, they can be up to a couple minutes
> long. Linux device time out is 30 seconds.

This isn't quite right.  When the Linux driver stack times out, it
passes the error to MD.  MD doesn't care whether the drive or the
controller reported the error; it just knows that it couldn't read that
block.  It goes to recovery, which typically reconstructs the missing
data from the other drives in a few milliseconds, and tries to write it
back to the disk that failed the read.  *That* write instantly fails,
since the controller is resetting the link and the drive is still in
la-la land trying to read the data.  MD will tolerate several bad reads
before it kicks out a drive, but will immediately kick it if a write
fails.
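
The sequence above can be sketched as a toy timeline.  This is only an
illustration, not MD's actual logic: the function name, the rebuild
time default, and the 7-second example are assumptions of mine.  The
30-second and couple-of-minutes figures come from the quoted text.

```python
def writeback_outcome(kernel_timeout_s, drive_recovery_s, rebuild_s=0.005):
    """Toy model of a read error on one array member (hypothetical helper).

    kernel_timeout_s: Linux command timeout for the device, in seconds.
    drive_recovery_s: how long the drive keeps retrying before giving up.
    rebuild_s: assumed time MD needs to reconstruct the block.
    """
    # From MD's point of view the read fails at whichever comes first:
    # the drive reporting an error, or the kernel command timer expiring.
    read_fails_at = min(kernel_timeout_s, drive_recovery_s)
    # MD reconstructs the block from the other members and writes it back.
    write_at = read_fails_at + rebuild_s
    # The drive only answers again once its own recovery attempt has ended.
    drive_responsive = write_at >= drive_recovery_s
    return "repaired" if drive_responsive else "kicked"


# Desktop drive that retries for two minutes, default 30 s kernel timeout.
print(writeback_outcome(30, 120))
# Drive that gives up after an assumed 7 s, same kernel timeout.
print(writeback_outcome(30, 7))
```

The model ignores the time the link reset itself takes, so it is
optimistic near the boundary.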

By the time you come to investigate, the drive has completed its
timeout, the link has reset, and the otherwise good drive is sitting
idle (failed).

Any array running with mismatched timeouts will kick a drive on every
unrecoverable read error, where an array with properly matched timeouts
would likely have just rewritten the bad sector.
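
A minimal sketch of checking for that mismatch, assuming you already
know the drive's error recovery limit in seconds (from the datasheet
or from smartctl).  The sysfs path is the one quoted earlier in the
thread; both function names are hypothetical, and treating an unknown
or disabled limit as a mismatch is my assumption, not something stated
above.

```python
from pathlib import Path


def kernel_timeout_seconds(dev, sysfs_root="/sys/block"):
    """Read the kernel command timeout for a disk such as 'sda'."""
    path = Path(sysfs_root, dev, "device", "timeout")
    return int(path.read_text().strip())


def timeouts_mismatched(kernel_timeout_s, drive_erc_s):
    """Return True if the kernel may give up before the drive does.

    drive_erc_s is the drive's error recovery limit in seconds, or None
    if it is unknown or disabled.  None is treated as a mismatch because
    such a drive may retry for minutes (assumption).
    """
    if drive_erc_s is None:
        return True
    # Equal values are treated as mismatched to leave no margin for error.
    return kernel_timeout_s <= drive_erc_s


# Example with the figures from the thread: 30 s kernel timeout, and a
# drive that may retry for about two minutes.
print(timeouts_mismatched(30, 120))
```

On a real system you would pass kernel_timeout_seconds("sdX") as the
first argument for each array member.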

Sadly, many hobbyist arrays are built with desktop drives, and the
timeouts are left mismatched.  When that hobbyist later learns s/he
should be scrubbing, the long-overdue scrub is very likely to produce
UREs on multiple drives (BOOM).

Phil
