Re: md RAID with enterprise-class SATA or SAS drives


From: Daniel Pocock <daniel@pocock.com.au>
To: Phil Turmel <philip@turmel.org>
Cc: Marcus Sorensen <shadowsor@gmail.com>, linux-raid@vger.kernel.org
Subject: Re: md RAID with enterprise-class SATA or SAS drives
Date: Thu, 10 May 2012 20:30:37 +0000
Message-ID: <4FAC256D.2000903@pocock.com.au>
In-Reply-To: <4FAC1259.6090407@turmel.org>



On 10/05/12 19:09, Phil Turmel wrote:
> On 05/10/2012 02:42 PM, Daniel Pocock wrote:
>>
>> I think you have to look at the average user's perspective: even most IT
>> people don't want to know everything about what goes on in their drives.
>>  They just expect stuff to work in a manner they consider `sensible'.
>> There is an expectation that if you have RAID you have more safety than
>> without RAID.  The idea that a whole array can go down because of
>> different sectors failing in each drive seems to violate that expectation.
> 
> You absolutely do have more safety, you just might not have as much more
> safety as you think.  Modern distributions try hard to automate much of
> this setup (e.g. Ubuntu tries to set up mdmon for you when you install
> mdadm), but it is not 100%.
> 
> Expectations have also changed in the past few years, too, in opposing
> ways.  One, hard drive capacities have skyrocketed (Yay!), but error
> rate specs have not, so typical users are more likely to encounter UREs.
> 
> Two, Linux has gained much more acceptance from home users building
> media servers and such, with much more exposure to non-enterprise
> components.
> 
> Not to excuse the situation--just to explain it.  Coding in this
> arena is mostly volunteers, too.

I understand what you mean, and some of those issues can't be solved
with some quick fix.

However, the degraded-array situation, where an ordinary user doesn't
know what to do, is probably not so bad for a highly technical user who
can choose the correct drive to rescue.

In the heat of battle (I've been in various corporate environments when
RAID systems have gone down) there is often tremendous pressure and
emotion.  In that scenario, someone might not have a lot of time to
investigate what is really wrong, and might form the conclusion that all
the drives are completely dead even though it is just a case of a few
bad sectors on each.

>>> Coordinating the drive and the controller timeouts is the *only* way
>>> to avoid the URE kickout scenario.
>>
>> I really think that is something that needs consideration, as a minimum,
>> should md log a warning message if SCTERC is not supported and
>> configured in a satisfactory way?
> 
> This sounds useful.

Maybe it could be checked periodically, in case the setting changes or
in case not all drives are present at boot time.
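
A periodic check along these lines could be sketched as a small shell
function that classifies the output of "smartctl -l scterc".  The
function name scterc_state and its three labels are made up for
illustration, and the matching relies only on the "Read:" lines that
smartctl printed for the two drives discussed in this thread, so other
drives or smartctl versions may need different patterns.

```sh
#!/bin/sh
# Classify `smartctl -l scterc` output read from stdin.
# Prints one of: enabled, disabled, unknown (hypothetical labels).
scterc_state() {
    awk '
        # The "Read:" line carries either a timeout or the word Disabled.
        $1 == "Read:" {
            seen = 1
            state = ($2 == "Disabled") ? "disabled" : "enabled"
        }
        # No "Read:" line at all: the report was missing or unrecognised.
        END { print (seen ? state : "unknown") }
    '
}
```

Something like "smartctl -l scterc /dev/sda | scterc_state" could then
be run from cron, logging a warning whenever the result is not
"enabled".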

>>> Changing TLER/ERC when an array becomes degraded for a real hardware
>>> failure is a useful idea. I think I'll look at scripting that.
>>
>> Ok, so I bought an enterprise grade drive, the WD RE4 (2TB) and I'm
>> about to add it in place of the drive that failed.
>>
>> I did a quick check with smartctl:
>>
>> # smartctl -a /dev/sdb -l scterc
>> ....
>> SCT Error Recovery Control:
>>            Read:     70 (7.0 seconds)
>>           Write:     70 (7.0 seconds)
>>
>> so the TLER feature appears to be there.  I haven't tried changing it.
>>
>> For my old Barracuda 7200.12 that is still working, I see this:
>>
>> SCT Error Recovery Control:
>>            Read: Disabled
>>           Write: Disabled
> 
> You should try changing it.  Drives that don't support it won't even
> show you that.
> 
> You can then put "smartctl -l scterc,70,70 /dev/sdX" in /etc/rc.local or
> your distribution's equivalent.

Done. It looks like the drive accepted it.

This is what I put in rc.local.  I'm hoping that my drives always come
up as sd[ab], of course.  Are there other ways to do this using disk
labels, or does md have any type of callback/hook scripts (e.g. like
ppp-up.d)?

echo -n "smartctl: Trying to enable SCTERC / TLER on main disks..."
/usr/sbin/smartctl -l scterc,70,70 /dev/sda > /dev/null
/usr/sbin/smartctl -l scterc,70,70 /dev/sdb > /dev/null
echo "."
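
On the naming question, one option that does not depend on sd[ab] is to
use the persistent symlinks that udev creates under /dev/disk/by-id/.
A rough sketch follows.  The function name set_erc and the SMARTCTL
override variable are made up for illustration, and the by-id names in
the usage note are placeholders, not real drive identifiers.

```sh
#!/bin/sh
# Path taken from the rc.local snippet above; override SMARTCTL to test.
SMARTCTL=${SMARTCTL:-/usr/sbin/smartctl}

# set_erc TIMEOUT DEVICE...
# TIMEOUT is in tenths of a second (70 = 7.0 s), used for read and write.
set_erc() {
    timeout=$1
    shift
    rc=0
    for dev in "$@"; do
        # Keep going after a failure so one bad drive does not skip the rest.
        if ! $SMARTCTL -l "scterc,$timeout,$timeout" "$dev" > /dev/null; then
            echo "set_erc: smartctl returned non-zero status for $dev" >&2
            rc=1
        fi
    done
    return $rc
}
```

Usage would look like "set_erc 70 /dev/disk/by-id/ata-EXAMPLE_A
/dev/disk/by-id/ata-EXAMPLE_B", with the real names taken from
"ls -l /dev/disk/by-id/".  As for hooks, mdadm's monitor mode can run a
program on array events (the PROGRAM line in mdadm.conf), which may be
one place to call this from.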

I also have some /sbin/blockdev --setra calls in rc.local.  Do you have
any suggestions on how that should be optimized for the LVM/md
combination?  For example, I have:

Raw partitions: /dev/sd[ab]2 as elements of the RAID1
MD: /dev/md2 as a PV for LVM
LVM: various LVs for different things (e.g. some for photos, some for
compiling large source code projects, with very different IO patterns
for each LV)
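
For experimenting with this, note that "blockdev --setra" takes its
value in 512-byte sectors and applies to one block device at a time, so
each LV can carry its own setting.  As a sketch only, the helpers below
convert KiB to sectors and print the command they would run.  The names
kib_to_sectors and ra_plan, the volume group vg0, and the sizes in the
usage note are invented for illustration and are not tuning
recommendations.

```sh
#!/bin/sh
# Convert a read-ahead size in KiB to the 512-byte sectors blockdev expects.
kib_to_sectors() {
    echo $(( $1 * 2 ))
}

# ra_plan DEVICE KIB: print (not run) the blockdev command for one device.
ra_plan() {
    echo "/sbin/blockdev --setra $(kib_to_sectors "$2") $1"
}
```

For example, "ra_plan /dev/vg0/photos 1024" prints the command for a
1 MiB read-ahead on a hypothetical photos LV, and "blockdev --getra
/dev/md2" shows the current value on the md device for comparison.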

>> and a diff between the full output for both drives reveals the following:
>>
>> -SCT capabilities:             (0x103f) SCT Status supported.
>> +SCT capabilities:             (0x303f) SCT Status supported.
>>                                         SCT Error Recovery Control supported.
>>                                         SCT Feature Control supported.
>>                                         SCT Data Table supported.
>>
>>
>>
>>
>>>> Here are a few odd things to consider, if you're worried about this topic:
>>>>
>>>> * Using smartctl to increase the ERC timeout on enterprise SATA
>>>> drives, say to 25 seconds, for use with md. I have no idea if this
>>>> will cause the drive to actually try different methods of recovery,
>>>> but it could be a good middle ground.
>>>
>>
>> What are the consequences if I don't do that?  I currently have 7
>> seconds on my new drive.  If md can't read a sector from the drive, will
>> it fail the whole drive?  Will it automatically read the sector from the
>> other drive so the application won't know something bad happened?  Will
>> it automatically try to re-write the sector on the drive that couldn't
>> read it?
> 
> MD fails drives on *write* errors.  It reconstructs from mirrors or
> parity on read errors and writes the result back to the origin drive.

OK, that is reassuring.
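
The read-error path described above only helps if the drive reports the
error before the kernel gives up on the command, which is the timeout
coordination mentioned earlier.  A minimal sketch of that comparison
follows, assuming the drive's ERC value is in tenths of a second (as
smartctl prints it) and the kernel's per-device command timeout is in
seconds, as exposed in /sys/block/sdX/device/timeout (commonly 30 by
default).  The function name erc_within_timeout is hypothetical.

```sh
#!/bin/sh
# erc_within_timeout ERC_DECISECONDS KERNEL_TIMEOUT_SECONDS
# Succeeds when ERC is enabled (non-zero) and the drive would give up
# before the kernel command timeout expires.
erc_within_timeout() {
    # Compare in tenths of a second to avoid fractions.
    [ "$1" -gt 0 ] && [ "$1" -lt $(( $2 * 10 )) ]
}
```

For example: erc_within_timeout 70 "$(cat /sys/block/sda/device/timeout)"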

>> Would you know how btrfs behaves in that same scenario - does it try to
>> write out the sector to the drive that failed the read?  Does it also
>> try to write out the sector when a read came in with a bad checksum and
>> it got a good copy from the other drive?
> 
> I haven't experimented with btrfs yet.  It is still marked experimental.

Apparently:

a) it may be supported in the next round of major distributions (e.g.
Debian 7 is considering it)
b) the only reason it is still marked experimental (this is what I've
read, not my opinion, as I don't know enough about it) is simply that
btrfsck is not complete

Also, there is heavy competition from ZFS on FreeBSD.  I hear a lot about
people using that combination because of the perceived lateness of btrfs
on Linux.  Once again, though, I don't know how well the ZFS/FreeBSD
combination handles drive hardware; all I know is that ZFS has the
checksum capability (which gives it an edge over any regular RAID1 like
mdraid).
