From: Giovanni Tessore <giotex@texsoft.it>
To: linux-raid@vger.kernel.org
Subject: Re: Read errors on raid5 ignored, array still clean .. then disaster !!
Date: Wed, 27 Jan 2010 17:15:05 +0100
Message-ID: <4B606689.8060608@texsoft.it>
In-Reply-To: <4B601A5F.9050301@shiftmail.org>
> Also is it possible that you experienced an electricity surge or a
> physical shock on the computer?
No, the machine is well protected by a good UPS unit.
I had a look at the kernel sources (2.6.24; I'll check the latest kernel
later). I'm not a kernel expert and never needed to look deeply inside it
before, but:
In drivers/md/raid5.c:
raid5_end_read_request()
{ ...
    else if (atomic_read(&rdev->read_errors) > conf->max_nr_stripes)
        printk(KERN_WARNING
               "raid5:%s: Too many read errors, failing device %s.\n",
               mdname(conf->mddev), bdn);
... }
So it does keep track of how many read errors occurred: the driver
detects recovered read errors and counts them!
Later in the same source:
int run(mddev_t *mddev)
{ ...
    conf->max_nr_stripes = NR_STRIPES;
... }
It looks like the limit is set statically to 256 (NR_STRIPES) recovered
read errors before the device is marked as faulty.
Moreover, *Documentation/md.txt* itself states that for each md device
in /sys/block there is a directory for each physical device in the
array, such as /sys/block/md0/md/dev-sda1, and that each directory
contains many per-device parameters, among them:
...
    errors
        An approximate count of read errors that have been detected on
        this device but have not caused the device to be evicted from
        the array (either because they were corrected or because they
        happened while the array was read-only). When using version-1
        metadata, this value persists across restarts of the array.
...
So the information on how many read errors occurred on each device is
collected and available!
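To make this concrete, the counter could be read from userspace with a few lines of C. This is a minimal sketch, assuming the `errors` attribute holds a single decimal number as the documentation excerpt suggests; `read_error_count()` is a hypothetical helper name, not part of mdadm or the kernel.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a non-negative decimal counter from a sysfs-style text file.
 * Returns the value, or -1 if the file cannot be opened or does not
 * start with a non-negative number.
 */
static long read_error_count(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long value = -1;
    if (fscanf(f, "%ld", &value) != 1 || value < 0)
        value = -1;

    fclose(f);
    return value;
}
```

With the example path from md.txt, a call would look like `read_error_count("/sys/block/md0/md/dev-sda1/errors")`.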
I would suggest the following, which *would surely help a lot in
preventing disasters* like mine:
- it seems that the maximum number of read errors allowed is set statically
in raid5.c to 256 by "conf->max_nr_stripes = NR_STRIPES;"; it could be
made configurable through an entry under /sys/block/mdXX
- let /proc/mdstat report clearly how many read errors occurred per
device, if any
- let mdadm be configurable in monitor mode to trigger alerts when the
number of read errors for a device changes or goes > n
- explain clearly in the how-to and other user documentation how raid
behaves on read errors; after a quick survey among my colleagues, I
noticed that nobody was aware of this, and all of them were sure that
raid behaved the same way for both write and read errors!
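The alert rule proposed in the third point can be written down in a few lines of C. This is only a sketch of the suggested policy: `should_alert()` is a hypothetical function, `threshold` stands for the "n" in the proposal, and nothing here is claimed to exist in mdadm 2.6.3.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical alert policy for a monitor loop.
 * previous:  error count seen at the last poll
 * current:   error count seen now
 * threshold: the configurable "n" from the proposal
 * Returns true when the count has changed since the last poll or is
 * above the threshold.
 */
static bool should_alert(long previous, long current, long threshold)
{
    assert(previous >= 0 && current >= 0 && threshold >= 0);

    return current != previous || current > threshold;
}
```

A monitor would poll each device's counter periodically, call this with the last and current values, and send a mail when it returns true.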
I examined the kernel 2.6.24 source and mdadm 2.6.3; maybe newer
versions already do this; if so, sorry.
My knowledge of the linux-raid implementation is not good (otherwise I
would answer here, not ask :P ), but maybe I can help.
Thanks
Giovanni