Re: Read errors on raid5 ignored, array still clean .. then disaster !!

linux-raid.vger.kernel.org archive mirror

From: Roger Heflin <rogerheflin@gmail.com>
To: Giovanni Tessore <giotex@texsoft.it>
Cc: Asdo <asdo@shiftmail.org>, linux-raid@vger.kernel.org
Subject: Re: Read errors on raid5 ignored, array still clean .. then disaster !!
Date: Sat, 30 Jan 2010 19:23:01 -0600
Message-ID: <4B64DB75.70607@gmail.com>
In-Reply-To: <4B64B0C1.1040503@texsoft.it>

Giovanni Tessore wrote:
> 
>> RAID-5 unfortunately is inherently insecure, here is why:
>> If one drive gets kicked, MD starts recovering to a spare.
>> At that point any single read error during the regeneration (that's a 
>> scrub) will fail the array.
>> This is a problem that cannot be overcome in theory.
> Yes, I was just getting the same conclusion :-(
> Suppose you have 2 TB mainstream disks, with an unrecoverable read 
> error rate of 1 sector every 1E+14 bits = 1.25E+13 bytes.
> It means that you will likely get an error once every 6.25 times you 
> read the whole disk!
> So if a disk fails, you have about a 1-in-6 chance of failing the 
> array during reconstruction.
> Simply unacceptable!
> 
> I looked at the specs of some enterprise disks, and their read error 
> rate is 1 sector every 1E+15 or 1E+16 bits. Better, but still risky.
> Ok.. I'll definitely move to raid-6.
> 
> Also, raid-1 with fewer than 3 disks becomes useless the same way :-(
> Same for raid-10 ...wow
> 
> Well, these two threads on read errors came out as kinda instructive ... 
> doh!!
> 
> Regards
> 
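
The arithmetic in the quoted message can be checked with a short sketch. It assumes, as the quote implicitly does, that unrecoverable read errors (UREs) occur independently at exactly the spec-sheet rate (modeled here as a Poisson process) and that "2 TB" means 2E+12 bytes. `ure_stats` is a hypothetical helper name, not part of any RAID tool.

```python
import math


def ure_stats(disk_bytes: float, bits_per_error: float, disks_read: int = 1):
    """Return (expected UREs, probability of at least one URE) when reading
    `disks_read` whole disks of `disk_bytes` each.

    Assumption: errors are independent and occur at the spec-sheet rate of
    one per `bits_per_error` bits, i.e. a Poisson model.
    """
    bits_read = disk_bytes * 8 * disks_read
    expected = bits_read / bits_per_error
    p_any = 1.0 - math.exp(-expected)  # P(at least one event) under Poisson
    return expected, p_any


if __name__ == "__main__":
    two_tb = 2e12  # 2 TB in decimal bytes, as drive vendors count capacity
    for rate in (1e14, 1e15, 1e16):  # desktop vs. enterprise spec-sheet rates
        expected, p_any = ure_stats(two_tb, rate)
        print(f"1 per {rate:.0e} bits: {expected:.4f} expected UREs, "
              f"P(>=1) = {p_any:.2%}, one error every {1 / expected:.2f} full reads")
```

For 1 error per 1E+14 bits this gives 0.16 expected errors per full 2 TB read (one every 6.25 reads) and roughly a 15% chance of at least one. A RAID-5 rebuild reads every surviving disk, so `disks_read` would be N-1 for an N-disk array. The reply below argues that real drives do not follow this uniform model, so treat the output as the spec-sheet view rather than a prediction.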

The manufacturer error numbers don't mean much. Good disks typically 
won't fail a rebuild that often: I have done a lot of rebuilds, and read 
errors during a rebuild are fairly rare, especially if you are doing 
proper patrol reads/scans of the RAID arrays. If you have disks 
sitting for long periods of time without read scans, then all bets are 
off and you will have issues.
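On Linux md, a patrol read of this kind corresponds to a "check" scrub, started by writing `check` to the array's `sync_action` file in sysfs. md then reads every stripe, and for read errors it hits it attempts to rewrite the sector from redundancy while the array still has redundancy. The sketch below assumes root access and an array whose sysfs directory is `/sys/block/md0/md`; the function names are hypothetical.

```python
from pathlib import Path


def start_check(md_dir: Path) -> bool:
    """Ask md to scrub (read and verify) every stripe of the array.

    `md_dir` is the array's sysfs directory, e.g. /sys/block/md0/md.
    Returns False without changing anything if another sync action
    (resync, recovery, check, ...) is already running.
    """
    action = md_dir / "sync_action"
    if action.read_text().strip() != "idle":
        return False
    action.write_text("check\n")  # same effect as: echo check > sync_action
    return True


def mismatch_count(md_dir: Path) -> int:
    """Sectors the last check/repair pass found inconsistent (or rewrote)."""
    return int((md_dir / "mismatch_cnt").read_text().strip())
```

`start_check(Path("/sys/block/md0/md"))` returns immediately; progress shows up in `/proc/mdstat`. Running it periodically (some distributions ship a scheduled job for this) is what keeps latent bad sectors from being discovered for the first time during a rebuild.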

I have never seen a properly good disk actually expose that high an 
error rate to the OS. I have dealt with more than 5000 disks, with 
several years of history on them.

I have seen a few manufacturer "lots" of disks that had seriously high 
error rates (certain sizes and certain manufacturing ranges). In one set 
of disks (2000 desktop drives and 600 "enterprise" drives, all from the 
same company), the desktop drives were almost perfect, with fewer than 
10 replaced after 2 years. Out of the 600 "enterprise" disks, we had 
replaced about 50 (read errors) by the time we finally got the 
manufacturer to send an engineer on-site to validate what was happening 
and RMA the entire lot (all 550 disks that had not yet been replaced).

Nothing in the published error rate indicated that behavior. If you get 
a bad lot, it will be very bad; if you don't get a bad lot, you very 
likely won't have issues. Including the bad lots' data in the overall 
error rate may well push the average that high, but your luck will 
depend on whether you have a good or a bad lot.
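The pooling effect can be illustrated with the replacement counts from the previous paragraph. This is only a sketch: it compares replacement fractions over the period, not per-bit error rates, and it treats the "fewer than 10" desktop replacements as an upper bound of 10. `replacement_rates` is a hypothetical helper.

```python
def replacement_rates(lots: dict[str, tuple[int, int]]):
    """lots maps a name to (disks, replaced).

    Returns the per-lot replaced fraction and the pooled fraction
    obtained by lumping every lot together.
    """
    per_lot = {name: replaced / disks for name, (disks, replaced) in lots.items()}
    total_disks = sum(disks for disks, _ in lots.values())
    total_replaced = sum(replaced for _, replaced in lots.values())
    return per_lot, total_replaced / total_disks


lots = {
    "desktop": (2000, 10),    # "fewer than 10" replaced; 10 used as an upper bound
    "enterprise": (600, 50),  # "about 50" replaced for read errors
}
per_lot, pooled = replacement_rates(lots)
for name, frac in per_lot.items():
    print(f"{name:>10}: {frac:.2%}")
print(f"{'pooled':>10}: {pooled:.2%}")
```

The pooled figure (about 2.3%) describes neither population: the desktop drives sit at or below 0.5%, the bad "enterprise" lot at about 8.3%.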
