Re: md RAID with enterprise-class SATA or SAS drives

From: Phil Turmel <philip@turmel.org>
To: Daniel Pocock <daniel@pocock.com.au>
Cc: Marcus Sorensen <shadowsor@gmail.com>, linux-raid@vger.kernel.org
Subject: Re: md RAID with enterprise-class SATA or SAS drives
Date: Thu, 10 May 2012 11:15:10 -0400
Message-ID: <4FABDB7E.2050602@turmel.org>
In-Reply-To: <4FABD7BF.2020208@pocock.com.au>

On 05/10/2012 10:59 AM, Daniel Pocock wrote:
> 
>> Here is where Marcus and I part ways.  A very common report I see on
>> this mailing list is people who have lost arrays where the drives all
>> appear to be healthy.  Given the large size of today's hard drives,
>> even healthy drives will occasionally have an unrecoverable read error.
>>
>> When this happens in a raid array with a desktop drive without SCTERC,
>> the driver times out and reports an error to MD.  MD proceeds to
>> reconstruct the missing data and tries to write it back to the bad
>> sector.  However, that drive is still trying to read the bad sector and
>> ignores the controller.  The write is immediately rejected.  BOOM!  The
>> *write* error ejects that member from the array.  And you are now
>> degraded.
>>
>> If you don't notice the degraded array right away, you probably won't
>> notice until a URE on another drive pops up.  Once that happens, you
>> can't complete a resync to revive the array.
> 
> What action would you recommend for someone running md on desktop drives
> today?  Can md be configured in some way to avoid such a disaster?

You have to set the controller's link timeout longer than the drive's
worst-case error recovery time.  Unfortunately, drive makers generally
don't specify that time, so you only discover it when you hit a real
URE.  In my experience, it's on the order of two to three minutes.

One thing to keep in mind:  If you set the controller timeout that high,
you may encounter protocol timeouts in your services running on top of
those filesystems.  So it isn't a general solution.

FWIW, the per-device setting is /sys/block/sdX/device/timeout (in
seconds).
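
A minimal sketch of reading and raising that timeout from Python.  The
helper names and the sysfs_root parameter are my own, added only so the
code can be tried against a scratch directory; the 180 used in the note
below is just the upper end of the "two to three minutes" estimate above,
not a recommended value.

```python
from pathlib import Path


def timeout_path(disk: str, sysfs_root: str = "/sys") -> Path:
    """Return the sysfs file holding the command timeout for one disk."""
    # 'disk' is the kernel name without /dev/, e.g. "sda".
    return Path(sysfs_root) / "block" / disk / "device" / "timeout"


def read_timeout(disk: str, sysfs_root: str = "/sys") -> int:
    """Read the current timeout, in seconds."""
    return int(timeout_path(disk, sysfs_root).read_text().strip())


def set_timeout(disk: str, seconds: int, sysfs_root: str = "/sys") -> None:
    """Write a new timeout, in seconds (needs root on a real system)."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    timeout_path(disk, sysfs_root).write_text(f"{seconds}\n")
```

On a real machine this would be called as root, for example
set_timeout("sda", 180).  The value does not survive a reboot, so it
would have to be reapplied from a boot script or similar.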

>> Running a "check" or "repair" on an array without TLER will have the
>> opposite of the intended effect: any URE will kick a drive out instead
>> of fixing it.
>>
>> In the same scenario with an enterprise drive, or a drive with SCTERC
>> turned on, the drive read times out before the controller driver, the
>> controller never resets the link to the drive, and the followup write
>> succeeds.  (The sector is either successfully corrected in place, or
>> it is relocated by the drive.)  No BOOM.
> 
> I tend to agree with that approach, and I think that is what Adaptec is
> proposing in their FAQ
> 
> Presumably, if you really do need one of those sectors, the SCTERC
> timeout can be extended (e.g. by disk recovery software) to try harder?

Sure.  SCTERC is set with the smartctl command (its "-l scterc" option).
If you need to run dd_rescue or some other recovery tool on a disk, you
can simply set SCTERC back to zero (disabled).  Or cycle power on the
drive.  But you would also have to raise the controller's timeout to
match, or it is pointless.
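
A rough sketch of how that could be scripted.  The function names are
hypothetical, and 70 (7.0 seconds; smartctl takes the value in units of
100 ms) is only a commonly used figure that I am assuming here; it is not
taken from this thread.  The last helper encodes the rule discussed
above: the drive must give up before the controller driver does.

```python
import subprocess
from typing import List


def scterc_command(device: str, read_ds: int, write_ds: int) -> List[str]:
    """Build the smartctl command line that sets SCT ERC on one drive.

    read_ds and write_ds are in deciseconds (units of 100 ms);
    0 disables the limit.
    """
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


def set_scterc(device: str, read_ds: int, write_ds: int) -> None:
    """Run smartctl to apply the setting (needs root and smartmontools)."""
    subprocess.run(scterc_command(device, read_ds, write_ds), check=True)


def erc_beats_driver_timeout(erc_ds: int, driver_timeout_s: int) -> bool:
    """True if the drive reports the error before the driver times out."""
    if erc_ds <= 0:
        # ERC disabled: the drive may keep retrying with no upper bound.
        return False
    return erc_ds / 10.0 < driver_timeout_s
```

For array duty one might call set_scterc("/dev/sdX", 70, 70); for a
recovery run, set_scterc("/dev/sdX", 0, 0) and then raise the driver
timeout as well.  Since cycling power resets the drive, the setting
would need to be reapplied at every boot.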

I don't know what you'd do with an enterprise drive that has TLER by
default.

>>>> - if a non-RAID SAS card is used, does it matter which card is chosen?
>>>> Does md work equally well with all of them?
>>>
>>> Yes, I believe md raid would work equally well on all SAS HBAs,
>>> however the cards themselves vary in performance. Some cards that have
>>> simple RAID built-in can be flashed to a dumb card in order to reclaim
>>> more card memory (LSI "IR mode" cards), but the performance gain is
>>> generally minimal
>>
>> Hardware RAID cards usually offer battery-backed write cache, which is
>> very valuable in some applications.  I don't have a need for that kind
>> of performance, so I can't speak to the details.  (Is Stan H.
>> listening?)
> 
> BBWC is not just expensive, it also has an extra management overhead,
> batteries need to have full discharges occasionally (at a time when
> cache is off), routine battery replacement, etc

I haven't had to deal with this :-)

Phil
