From: Giovanni Tessore <giotex@texsoft.it>
To: Asdo <asdo@shiftmail.org>
Cc: linux-raid@vger.kernel.org, Neil Brown <neilb@suse.de>
Subject: Re: Read errors on raid5 ignored, array still clean .. then disaster !!
Date: Mon, 01 Feb 2010 11:56:39 +0100
Message-ID: <4B66B367.6030803@texsoft.it>
In-Reply-To: <4B65943A.4040800@shiftmail.org>
Asdo wrote:
> Asdo wrote:
>> Giovanni Tessore wrote:
>>> Hm, funny ... I just now read in md's man page:
>>>
>>> "In kernels prior to about 2.6.15, a read error would cause the
>>> same effect as a write error. In later kernels, a read-error will
>>> instead cause md to attempt a recovery by overwriting the bad block.
>>> .... "
>>>
>>> So things have changed since 2.6.15 ... I was not so wrong to expect
>>> "the old behaviour" and to be disappointed.
>>> [CUT]
>>
>> I have the feeling the current behaviour is the correct one at least
>> for RAID-6.
>>
>> [CUT]
>>
>> RAID-5 unfortunately is inherently insecure, here is why:
>> If one drive gets kicked, MD starts recovering to a spare.
>> At that point any single read error during the regeneration (that's a
>> scrub) will fail the array.
>> This is a problem that cannot be overcome in theory.
>> Even with the old algorithm, any sector failed after the last scrub
>> will take the array down when one disk is kicked (array will go down
>> during recovery).
>> So you would need to scrub continuously, or you would need
>> hyper-reliable disks.
>>
>> Yes, kicking a drive as soon as it presents the first unreadable
>> sector can be a strategy for trying to select hyper-reliable disks...
>>
>> Ok after all I might agree this can be a reasonable strategy for
>> raid1,4,5...
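To put a rough number on the quoted argument: assuming the commonly
quoted consumer-drive spec of one unrecoverable read error per 10^14
bits (an assumption, not a measurement) and a 2 TB member disk, the
chance of hitting at least one URE while reading one whole drive
during a rebuild is already substantial:

    # back-of-envelope: P(>=1 URE) = 1 - (1-p)^bits ~= 1 - exp(-bits*p)
    awk 'BEGIN { bits = 2e12 * 8; p = 1e-14;
                 printf "%.1f%%\n", 100 * (1 - exp(-bits * p)) }'
    # prints ~14.8% -- per drive, per full read

So "scrub continuously or buy hyper-reliable disks" is not an
exaggeration.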
Yes, the new behaviour is good for raid-6.
But unsafe for raid 1, 4, 5, 10.
The old behaviour saved me in the past, and it would have saved me
this time too, by letting me replace the disk as soon as possible; the
new one didn't help at all.
At the very least, the new behaviour must clearly alert the user that
a drive is getting read errors on raid 1, 4, 5, 10.
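In the meantime one can at least watch the per-device corrected-error
counters by hand; a minimal sketch, assuming your kernel exposes the
'errors' attribute for each member (device names below are examples):

    # warn about any md member that has accumulated corrected read errors
    for f in /sys/block/md*/md/dev-*/errors; do
        [ -e "$f" ] || continue
        n=$(cat "$f")
        [ "$n" -gt 0 ] && echo "WARNING: $f = $n corrected read errors"
    done

As far as I can tell, mdadm --monitor does not raise an event for
these, which is exactly the problem.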
>>
>> I'd also agree that with 1.x superblock it would be desirable to be
>> able to set the maximum number of corrected read errors before a
>> drive is kicked, which could be set by default to 0 for raid 1,4,5
>> and to... I don't know... 20 (50? 100?) for raid-6.
It now seems to be hard-coded to 256 ...
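(Very recent kernels seem to grow a sysfs tunable for this, though
apparently only raid10 honours it so far; a sketch, assuming the
attribute exists on your kernel:

    # inspect / lower the corrected-read-error limit for md0
    cat /sys/block/md0/md/max_read_errors    # default seems to be 20
    echo 1 > /sys/block/md0/md/max_read_errors

For raid5 the 256 above appears to be baked into the code.)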
> I can add that this situation with raid 1,4,5,10 would be greatly
> ameliorated when the hot-device-replace feature gets implemented.
> The failures of raid 1,4,5,10 are due to the zero redundancy you get
> in the time frame from when a drive is kicked to the end of the
> regeneration.
> However if the hot-device-replace feature is added, and gets linked to
> the drive-kicking process, the problem would disappear.
>
> Ideally instead of kicking (=failing) a drive directly, the
> hot-device-replace feature would be triggered, so the new drive would
> be replicated from the one being kicked (a few damaged blocks can be
> read from parity in case of read error from the disk being replaced,
> but don't "fail" the drive during the replace process just for this)
> In this way you get 1 redundancy instead of zero during rebuild, and
> the chances of the array going down during the rebuild process are
> practically nullified.
>
> I think the "hot-device-replace" action can replace the "fail" action
> in the most used scenarios, which is the drive being kicked due to:
> 1 - unrecoverable read error (end of relocation sectors available)
> 2 - surpassing the threshold for max corrected read errors (see above,
> if/when this gets implemented on 1.x superblock)
Both solutions seem good to me ... even if, yes, #1 is probably
covered by #2.
And personally I'd keep zero, or a very low value, for the max
corrected-read-error threshold on raid 1, 4, 5, 10.
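If/when hot-device-replace gets implemented, I'd imagine the trigger
being a simple per-device flag; a purely hypothetical sketch (the
'want_replacement' state name and the sysfs path are my assumptions,
not an existing interface):

    # hypothetical: add the target spare, then ask md to rebuild it
    # from sdb1 while sdb1 stays in the array, instead of failing sdb1
    mdadm /dev/md0 --add /dev/sde1
    echo want_replacement > /sys/block/md0/md/dev-sdb1/state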
May I also suggest something for emergency situations (no hot spares
available, an already degraded array, read errors on the remaining
disk(s))? Suppose you have a single disk that is getting read errors:
maybe you lose some data, but you can still make a backup and save
most of it.
If a degraded array gets an unrecoverable read error, reconstruction
is no longer feasible: the disk is marked failed and the whole array
fails. Then you have to recreate it with --force or --assume-clean and
start backing up the data ... but on each further read error the array
goes offline again ... recreate in --force mode ... and so on (which
needs skill and is error-prone).
Maybe it would be useful, on an unrecoverable read error on a degraded
array, to:
1) send a big alert to the admin, with detailed info
2) not fail the disk and the whole array, but set the array read-only
3) report read errors to the OS (as for a single drive)
This would allow a partial backup, saving as much data as possible,
without having to tamper with create --force etc.
Experienced users may still try to overcome the situation by re-adding
devices (maybe one dropped out simply due to a timeout), with create
--force, etc., but many people will have big trouble doing so, and
they just see all their data gone, when only a few sectors out of many
TB are unreadable and most of the data could be saved.
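For the record, the recovery dance I'm describing looks roughly like
this (a sketch only; device names, level and chunk size are examples,
and --create over existing members is dangerous unless it exactly
matches the original geometry and device order):

    # first try a forced assembly, which keeps the metadata intact
    mdadm --assemble --force /dev/md0 /dev/sd[bcde]1
    # last resort: recreate in place without triggering a resync
    mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=4 \
          --chunk=64 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
    # then copy off what is readable, skipping unreadable sectors
    ddrescue /dev/md0 /mnt/backup/md0.img /mnt/backup/md0.log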
Best regards.
--
Cordiali saluti.
Yours faithfully.
Giovanni Tessore