From: NeilBrown <neilb@suse.de>
To: Phil Turmel <philip@turmel.org>
Cc: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>,
linux-raid@vger.kernel.org
Subject: Re: md road-map: 2011
Date: Thu, 17 Feb 2011 11:52:57 +1100
Message-ID: <20110217115257.28a8d174@notabene.brown>
In-Reply-To: <4D5C6AAF.1040600@turmel.org>
On Wed, 16 Feb 2011 19:24:15 -0500 Phil Turmel <philip@turmel.org> wrote:
> On 02/16/2011 04:48 PM, NeilBrown wrote:
> > On Wed, 16 Feb 2011 21:29:39 +0100 Piergiorgio Sartor
> >>
> >>> Better reporting of inconsistencies.
> >>> ------------------------------------
> >>>
> >>> When a 'check' finds a data inconsistency it would be useful if it
> >>> was reported. That would allow a sysadmin to try to understand the
> >>> cause and possibly fix it.
> >>
> >> Could you, please, consider to add, for RAID-6, the
> >> capability to report also which device, potentially,
> >> has the problem? Thanks!
> >
> > I would rather leave that to user-space. If I report where the problem is, a
> > tool could directly read all the blocks in that stripe and perform any fancy
> > calculations you like. I may even write that tool (but no promises).
>
> Hmmm. The existing "check" code, if it encounters a read error, will use
> available redundancy to recover that data and rewrite it on the spot.
>
> Without a read error, or with multiple redundancy, the calculations to
> check consistency are performed and reported. With all the data "hot", and half
> the calculation to pinpoint an inconsistency done, it seems a shame to have
> userspace redo it.
>
> Are you adamantly opposed to the kernel doing this? (For Raid6) Code talks,
> of course, but I'd rather not start if I'm only going to be shot down.
>
I like to think I remain open-minded to any compelling arguments.

However, putting code into the kernel which *only* tells user-space something
that it could figure out for itself doesn't sound sensible - though it
depends a bit on how much code.

Also - as I understand it - the RAID6 code works on a byte-by-byte basis.
Thus the P and Q bytes are computed from the N data bytes, and collections of
these bytes form blocks.

The "which block is bad" calculation takes the data bytes and the P and Q
bytes and produces a new byte.  If that byte is < N, it means that just
changing the data byte at that index can make P and Q consistent (if it is N,
then the P byte is bad; if it is N+1, then the Q byte is bad).  If it is >N+1,
then ... possibly multiple bytes are bad ... my knowledge gets hazy here.
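
A minimal sketch of that per-byte calculation, for illustration only.  It
assumes the usual RAID6 arithmetic (GF(2^8) with polynomial 0x11d and
generator 2, so P is the XOR of the data bytes and Q is the XOR of g^i * D_i).
The function names pq() and locate() are hypothetical, as is the choice of
returning None for a consistent byte position and N+2 for "cannot be pinned
on one byte"; none of this is the kernel's code.

```python
# Build log/antilog tables for GF(2^8), polynomial 0x11d, generator 2.
GF_EXP = [0] * 255
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D


def gf_mul(a, b):
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return GF_EXP[(GF_LOG[a] + GF_LOG[b]) % 255]


def pq(data):
    """Return (P, Q) for one byte position across the N data devices."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d                      # P is plain XOR parity
        q ^= gf_mul(GF_EXP[i], d)   # Q weights device i by g^i
    return p, q


def locate(data, p_stored, q_stored):
    """Return the index of the single byte that explains the mismatch.

    None  -> P and Q already agree with the data
    < N   -> changing that data byte alone makes P and Q consistent
    N     -> the P byte is bad
    N+1   -> the Q byte is bad
    N+2   -> no single byte explains it (assumed marker, see above)
    """
    n = len(data)                   # assumes n <= 255
    p_calc, q_calc = pq(data)
    dp = p_stored ^ p_calc
    dq = q_stored ^ q_calc
    if dp == 0 and dq == 0:
        return None
    if dq == 0:
        return n
    if dp == 0:
        return n + 1
    # A single bad data byte z with error e gives dp = e and dq = g^z * e,
    # so z is the difference of the two logarithms.
    z = (GF_LOG[dq] - GF_LOG[dp]) % 255
    return z if z < n else n + 2
```

For example, with four data bytes, flipping bits in data[2] makes locate()
return 2, while corrupting only the stored P byte makes it return 4.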

So when you do the computation on all of the bytes in all of the blocks you
get a block full of answers.

If the answers are all the same - that tells you something fairly strong.
If they are "all different" then that is also a fairly strong statement.
But what if most are the same, but a few are different?  How do you interpret
that?
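
To make the "block full of answers" concrete, here is a small sketch that
only tallies the per-byte answers and deliberately does not interpret them.
It assumes each answer is an integer index as described above, with None
standing for a byte position that was already consistent; summarise() is a
hypothetical helper name.

```python
from collections import Counter


def summarise(answers):
    """Return (answer, count) pairs, most frequent first.

    Byte positions that were already consistent (None) are ignored.
    """
    counts = Counter(a for a in answers if a is not None)
    return counts.most_common()


# Every inconsistent byte points at device 2: a fairly strong statement.
print(summarise([2, 2, None, 2, 2, None, 2, 2]))

# Most bytes point at device 2, but a few point elsewhere: what then?
print(summarise([2, 2, 2, 2, 5, None, 0, 2]))
```

The second case is the awkward one: the tally is easy to produce, but
deciding what it means is a policy question.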

The point I'm trying to get to is that the result of this RAID6 calculation
isn't a simple "that device is bad".  It is a block of data that needs to be
interpreted.

I'd rather have user-space do that interpretation, so it may as well do the
calculation too.

If you wanted to do it in the kernel, you would need to be very clear about
what information you provide, what it means exactly, and why it is sufficient.

NeilBrown