* read error recovery threshold
From: Eric Mei @ 2014-09-15 16:56 UTC
To: linux-raid
Hi,
After a read error is detected, RAID6 initiates a recovery procedure to try
to correct it, until the number of read errors exceeds a threshold, which is
"conf->max_nr_stripes" (see raid5_end_read_request()). I'm wondering about
the reasoning behind this. To me the threshold seems like a drive property,
but max_nr_stripes is an array-wide cache setting and can be changed at
runtime. In our specific case, we observed a drive emitting lots of read
errors without being marked as faulty because of the larger max_nr_stripes
setting.
Looking at other parts of the MD code, there is "mddev::max_corr_read_errors",
which is set to 20, but only RAID10 makes use of it. Also, the comment
above MD_DEFAULT_MAX_CORRECTED_READ_ERRORS says "...We divide the read
error count by 2 for every hour elapsed between read errors", but I
don't see any code matching this description.
Any thoughts? Thanks
Eric
* Re: read error recovery threshold
From: NeilBrown @ 2014-09-22 3:35 UTC
To: Eric Mei; +Cc: linux-raid
On Mon, 15 Sep 2014 10:56:11 -0600 Eric Mei <meijia@gmail.com> wrote:
> Hi,
>
> After a read error is detected, RAID6 initiates a recovery procedure to try
> to correct it, until the number of read errors exceeds a threshold, which is
> "conf->max_nr_stripes" (see raid5_end_read_request()). I'm wondering about
> the reasoning behind this. To me the threshold seems like a drive property,
> but max_nr_stripes is an array-wide cache setting and can be changed at
> runtime. In our specific case, we observed a drive emitting lots of read
> errors without being marked as faulty because of the larger max_nr_stripes
> setting.
>
> Looking at other parts of the MD code, there is "mddev::max_corr_read_errors",
> which is set to 20, but only RAID10 makes use of it. Also, the comment
> above MD_DEFAULT_MAX_CORRECTED_READ_ERRORS says "...We divide the read
> error count by 2 for every hour elapsed between read errors", but I
> don't see any code matching this description.
>
> Any thoughts? Thanks
Yes, it is inconsistent.
It wasn't designed to be inconsistent, it just happened.
A patch with good justification will be looked on kindly.
Thanks,
NeilBrown