From: Phil Turmel <philip@turmel.org>
To: Daniel Pocock <daniel@pocock.com.au>
Cc: Marcus Sorensen <shadowsor@gmail.com>, linux-raid@vger.kernel.org
Subject: Re: md RAID with enterprise-class SATA or SAS drives
Date: Thu, 10 May 2012 11:15:10 -0400
Message-ID: <4FABDB7E.2050602@turmel.org>
In-Reply-To: <4FABD7BF.2020208@pocock.com.au>
On 05/10/2012 10:59 AM, Daniel Pocock wrote:
>
>> Here is where Marcus and I part ways. A very common report I see on
>> this mailing list is people who have lost arrays where the drives all
>> appear to be healthy. Given the large size of today's hard drives,
>> even healthy drives will occasionally have an unrecoverable read error.
>>
>> When this happens in a raid array with a desktop drive without SCTERC,
>> the driver times out and reports an error to MD. MD proceeds to
>> reconstruct the missing data and tries to write it back to the bad
>> sector. However, that drive is still trying to read the bad sector and
>> ignores the controller. The write is immediately rejected. BOOM! The
>> *write* error ejects that member from the array. And you are now
>> degraded.
>>
>> If you don't notice the degraded array right away, you probably won't
>> notice until a URE on another drive pops up. Once that happens, you
>> can't complete a resync to revive the array.
>
> What action would you recommend for someone running md on desktop drives
> today? Can md be configured in some way to avoid such a disaster?
You have to set the controller's link timeout longer than the drive's
worst-case error-recovery time. Unfortunately, that figure is generally
not specified, so you only discover it when you hit a real URE. In my
experience, it's on the order of two to three minutes.
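A toy model of that rule, as a minimal Python sketch. The function and constant names are hypothetical, and the 180-second figure is only the upper end of the "two to three minutes" estimate above, not a value from any drive specification. It answers one question: does the controller give up on the drive before the drive gives up on the sector?

```python
from typing import Optional

# Hypothetical constant: assumed worst-case internal recovery time for a
# desktop drive without SCTERC. Upper end of "two to three minutes";
# not a manufacturer specification.
ASSUMED_DESKTOP_RECOVERY_S = 180.0


def link_reset_expected(controller_timeout_s: float,
                        erc_limit_s: Optional[float] = None,
                        worst_case_recovery_s: float = ASSUMED_DESKTOP_RECOVERY_S) -> bool:
    """Return True if the controller would time out (and reset the link)
    before the drive finishes error recovery on a bad sector.

    erc_limit_s is the SCTERC/TLER read limit in seconds, or None when
    ERC is disabled and the drive may retry for its full internal time.
    """
    # How long the drive keeps retrying before it reports the read error.
    drive_gives_up_after = worst_case_recovery_s if erc_limit_s is None else erc_limit_s
    # Safe case: the drive reports the error first, the link stays up,
    # and MD's rewrite of the reconstructed data can succeed.
    return drive_gives_up_after >= controller_timeout_s
```

With the usual 30-second kernel default, `link_reset_expected(30)` is True for a drive without ERC, while `link_reset_expected(30, erc_limit_s=7.0)` is False. Raising the timeout past the assumed worst case, e.g. `link_reset_expected(181)`, also avoids the reset, at the cost described next.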
One thing to keep in mind: If you set the controller timeout that high,
you may encounter protocol timeouts in your services running on top of
those filesystems. So it isn't a general solution.
FWIW, the per-drive timeout (in seconds) is in /sys/block/sdX/device/timeout.
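A minimal Python sketch of setting that value, assuming it runs as root on each member drive. The helper name is hypothetical, and the `sysfs_root` parameter exists only so the function can be pointed at a scratch directory for testing.

```python
from pathlib import Path


def set_device_timeout(dev: str, seconds: int, sysfs_root: str = "/sys/block") -> int:
    """Write the command timeout (in seconds) for a block device such as
    "sda" and return the value read back from the same file."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    # e.g. /sys/block/sda/device/timeout
    path = Path(sysfs_root) / dev / "device" / "timeout"
    path.write_text(f"{seconds}\n")
    return int(path.read_text().strip())
```

For example, `set_device_timeout("sda", 180)` for every member of the array. The setting does not survive a reboot, so it would have to be reapplied at boot (from a udev rule or boot script, for instance).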
>> Running a "check" or "repair" on an array without TLER will have the
>> opposite of the intended effect: any URE will kick a drive out instead
>> of fixing it.
>>
>> In the same scenario with an enterprise drive, or a drive with SCTERC
>> turned on, the drive read times out before the controller driver, the
>> controller never resets the link to the drive, and the follow-up write
>> succeeds. (The sector is either successfully corrected in place, or
>> it is relocated by the drive.) No BOOM.
>
> I tend to agree with that approach, and I think that is what Adaptec is
> proposing in their FAQ.
>
> Presumably, if you really do need one of those sectors, the SCTERC
> timeout can be extended (e.g. by disk recovery software) to try harder?
Sure. SCTERC is set with smartctl's "-l scterc" option. If you need to run
dd_rescue or some other recovery tool on a disk, you can simply set
SCTERC back to zero (disabled), or power-cycle the drive, since the
setting typically doesn't survive a power cycle. But you would also have
to raise the controller's timeout, or it is pointless.
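A minimal Python sketch that builds (but does not run) the smartctl command line. The helper name is hypothetical; the `-l scterc,READ,WRITE` syntax is smartctl's, with both times in units of 100 ms, so 70 means 7.0 seconds and 0,0 disables ERC.

```python
from typing import List


def scterc_command(device: str, read_ds: int, write_ds: int) -> List[str]:
    """Return the argv for setting SCT Error Recovery Control limits.

    read_ds and write_ds are in tenths of a second (smartctl's unit);
    0,0 turns ERC off so the drive retries for as long as it likes.
    """
    for value in (read_ds, write_ds):
        if value < 0:
            raise ValueError("ERC times must be non-negative")
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]
```

For normal array use, something like `subprocess.run(scterc_command("/dev/sdX", 70, 70), check=True)`; before a dd_rescue session, `scterc_command("/dev/sdX", 0, 0)` together with a longer controller timeout. Running `smartctl -l scterc /dev/sdX` without values just reports the current setting, or that the drive doesn't support SCT ERC.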
I don't know what you'd do with an enterprise drive that has TLER by
default.
>>>> - if a non-RAID SAS card is used, does it matter which card is chosen?
>>>> Does md work equally well with all of them?
>>>
>>> Yes, I believe md raid would work equally well on all SAS HBAs,
>>> however the cards themselves vary in performance. Some cards that have
>>> simple RAID built-in can be flashed to a dumb card in order to reclaim
>>> more card memory (LSI "IR mode" cards), but the performance gain is
>>> generally minimal.
>>
>> Hardware RAID cards usually offer battery-backed write cache, which is
>> very valuable in some applications. I don't have a need for that kind
>> of performance, so I can't speak to the details. (Is Stan H.
>> listening?)
>
> BBWC is not just expensive; it also has extra management overhead:
> batteries need occasional full discharges (at a time when the cache is
> off), routine battery replacement, etc.
I haven't had to deal with this :-)
Phil