From: Redeeman <redeeman@metanurb.dk>
To: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: detection/correction of corruption with raid6
Date: Tue, 16 Dec 2008 23:25:17 +0100
Message-ID: <1229466317.22331.71.camel@localhost>
In-Reply-To: <1229464713.6159.16.camel@localhost.localdomain>
On Tue, 2008-12-16 at 22:58 +0100, Piergiorgio Sartor wrote:
> Hi all,
>
> while I do agree that the issue needs more in-depth thinking,
> I would like to tell a recent story that happened to me.
>
> I was testing a RAID-6 array with 7 small HDs.
> The intention was to get used to different situations: repair,
> grow, fail, remove, etc.
>
> After some playing, I started to check the files on the array
> and I found out that they were not (always) correct.
> So I started a check of the array, which returned some 1000 or
> more mismatches.
>
> After some investigation, I found out that one HD had a "flaky"
> interface: data was written correctly, but sometimes, randomly,
> reads returned some "wrong" bits (re-cabling solved the issue).
>
> To check this with RAID-6, I could run the check with 6 disks,
> for 7 times, each with a different disk removed, until one run
> returned no mismatches.
> At this point, I knew which "data path" was defective.
>
> It would have saved a lot of time, if the check could have
> done this automatically...
Exactly! This is partly the point I am making too.
>
> So my RFE would be, if possible, to try, during a RAID-6 check,
> to find out whether an HD has the mismatch, and which one.
> Ideally, at the end of the check, the system log should show
> how many mismatches, if any, are likely to belong to which HD
> or are undetermined.
> This would help to diagnose the full data path and reduce
> testing time in case of problems.
> If only one HD turns out to be problematic, this one could be
> failed, removed and the complete cabling, I/F and so on checked.
> Of course, this goes beyond the simple "HD failure protection"
> scope of RAID, nevertheless I do not see why this possibility
> should be neglected, unless it is too complex/difficult to
> implement and maintain.
Yeah, I do not know how much more complicated this would make
things, but I would imagine it would be worth it.
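
On how a check could point at one disk: a minimal sketch, assuming the usual RAID-6 construction where P is the XOR of the data blocks and Q is a Reed-Solomon syndrome over GF(2^8) (polynomial 0x11d, generator 2). If exactly one block in a stripe is wrong, comparing the P and Q differences tells which one. The function names are made up for illustration and this is not md's code; it works on one byte per disk, whereas a real check would repeat it for every byte of the stripe and tally the results per disk.

```python
# Sketch only: single-byte RAID-6 syndromes over GF(2^8), polynomial 0x11d,
# generator 2. All names here are hypothetical, not taken from the md driver.

def build_tables():
    """Build exp/log tables for GF(2^8) with generator 2."""
    exp = [0] * 510
    log = [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x <<= 1
        if x & 0x100:
            x ^= 0x11D
    # Duplicate the table so gf_mul can skip a modulo on the exponent sum.
    for i in range(255, 510):
        exp[i] = exp[i - 255]
    return exp, log


GF_EXP, GF_LOG = build_tables()


def gf_mul(a, b):
    """Multiply two field elements."""
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def syndromes(data):
    """Return (P, Q) for one byte from each data disk."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d                      # P is plain XOR parity
        q ^= gf_mul(GF_EXP[i], d)   # Q weights disk i by g^i
    return p, q


def locate_error(data, p_stored, q_stored):
    """Guess which block is wrong, assuming at most one is.

    Returns "ok", "P", "Q", a data disk index, or "undetermined".
    """
    p_calc, q_calc = syndromes(data)
    dp = p_calc ^ p_stored
    dq = q_calc ^ q_stored
    if dp == 0 and dq == 0:
        return "ok"
    if dq == 0:
        return "P"   # only P disagrees, so the P block itself is suspect
    if dp == 0:
        return "Q"   # only Q disagrees, so the Q block itself is suspect
    # One bad data byte on disk z gives dp = E and dq = g^z * E.
    z = (GF_LOG[dq] - GF_LOG[dp]) % 255
    return z if z < len(data) else "undetermined"


if __name__ == "__main__":
    good = [0x11, 0x22, 0x33, 0x44, 0x55]   # 7-disk RAID-6: 5 data + P + Q
    p, q = syndromes(good)
    bad = list(good)
    bad[3] ^= 0x08                           # flip one bit on data disk 3
    print(locate_error(bad, p, q))
```

With more than one bad block in the same stripe the result can be a wrong index or "undetermined", which is where the "undetermined" bucket in the RFE would come from.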
>
> Regarding the possibility of recovery, I have one question:
>
> Why might a RAID system have inconsistencies?
> Why do we have a "check" command at all, to run weekly or monthly?
As previously stated in the discussion, while most bit flips apparently
do not happen on the disk itself, they do happen, whether in RAM, on the
PCI bus, in the controller, etc.
Also, I imagine it is just to stay on top of things: read everything and
make sure it works (but this is pure speculation).
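
For reference, the check is driven through sysfs: writing `check` to `md/sync_action` starts it, and `md/mismatch_cnt` holds the count afterwards. A small sketch of wrapping that follows. The helper names and the `sysfs_root` parameter are assumptions for illustration (the parameter is only there so the code can be tried against a scratch directory), and writing to the real files needs root.

```python
from pathlib import Path


def md_sysfs_dir(array="md0", sysfs_root="/sys/block"):
    """Path of the md attribute directory for one array (hypothetical helper)."""
    return Path(sysfs_root) / array / "md"


def start_check(array="md0", sysfs_root="/sys/block"):
    """Request a read-only consistency check of the array."""
    (md_sysfs_dir(array, sysfs_root) / "sync_action").write_text("check\n")


def read_mismatch_cnt(array="md0", sysfs_root="/sys/block"):
    """Return the mismatch count reported by the last check or repair."""
    text = (md_sysfs_dir(array, sysfs_root) / "mismatch_cnt").read_text()
    return int(text.strip())
```

The count only says that something in the array disagreed, not which disk was responsible, which is the gap the RFE above is about.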
>
> Thanks,
>
> bye,
>
>