linux-raid.vger.kernel.org archive mirror
From: Peter Rabbitson <rabbit+list@rabbit.us>
To: Theodore Tso <tytso@MIT.EDU>
Cc: Bas van Schaik <bas@tuxes.nl>, linux-raid@vger.kernel.org
Subject: Re: Redundancy check using "echo check > sync_action": error reporting?
Date: Fri, 21 Mar 2008 00:08:20 +0100
Message-ID: <47E2EE64.5080101@rabbit.us>
In-Reply-To: <20080320163551.GG13719@mit.edu>

Theodore Tso wrote:
> On Thu, Mar 20, 2008 at 03:19:08PM +0100, Bas van Schaik wrote:
>>> There's no explicit message produced by the md module, no.  You need to
>>> check the /sys/block/md{X}/md/mismatch_cnt entry to find out how many
>>> mismatches there are.  Similarly, following a repair this will indicate
>>> how many mismatches it thinks have been fixed (by updating the parity
>>> block to match the data blocks).
>>>   
>> Marvellous! I naively assumed that the module would warn me, but that's
>> not true. Wouldn't it be appropriate to print a message to dmesg if such
>> a mismatch occurs during a check? Such a mismatch clearly means that
>> there is something wrong with your hardware lying beneath md, doesn't it?
> 
> If a mismatch is detected in a RAID-6 configuration, it should be
> possible to figure out what should be fixed (since with two hot spares
> there should be enough redundancy not only to detect an error, but to
> correct it.)  Out of curiosity, does md do this automatically, either
> when reading from a stripe, or during a resync operation?
> 

In my modest experience running root and high-performance spool volumes on 
various RAID levels, I have concluded that the current check mechanism does 
not give the user enough control. We can debate all we want about what the MD 
driver should do when it finds a mismatch, yet there is no way for the user to 
figure out where the mismatch is and take appropriate action. This does not 
apply only to RAID5/6 - what about RAID1/10 with >2 chunk copies? What if the 
one wrong copy is picked and written over all the good ones?
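
To make the RAID1/10 case concrete, here is a minimal userspace sketch of how 
the odd copy could be singled out once the raw chunk contents are available. 
It is not something md does today; the function name find_odd_copies and the 
strict-majority rule are assumptions made for illustration only.

```python
import hashlib
from collections import Counter


def find_odd_copies(copies):
    """Return the indices of copies that disagree with the majority.

    copies: list of bytes objects, one per mirror holding the same chunk.
    Returns [] when all copies agree, and None when no content is held by
    a strict majority (e.g. a 2-way mirror whose halves differ), because
    in that case nothing here can tell which copy is the good one.
    """
    digests = [hashlib.sha1(c).hexdigest() for c in copies]
    winner, votes = Counter(digests).most_common(1)[0]
    if votes * 2 <= len(copies):
        return None
    return [i for i, d in enumerate(digests) if d != winner]
```

With three or more copies a single bad chunk is outvoted; with only two 
copies the function returns None, which is exactly the situation where a 
blind repair may overwrite the good data.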

I think that the solution is rather simple, and I would contribute a patch if 
I had any C experience. The current check mechanism remains the same - 
mismatch_cnt is incremented/reset just the same as before. However, for every 
mismatching chunk the kernel would printk the following:

1) the start offset of the chunk (raid1/10) or stripe (raid5/6) within the MD 
device
2) one line for every active disk containing:
	a) the offset of the chunk within the MD component
	b) a {md5|sha1}sum of the chunk
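
The report described above can be approximated from userspace. The sketch 
below is only an illustration: chunk_report is a hypothetical helper, the 
output format is made up, and the caller is assumed to already know the 
offset of the chunk within each component (which in reality depends on the 
array layout and data offset).

```python
import hashlib


def chunk_report(md_offset, components, chunk_size, algo="sha1"):
    """Build report lines for one mismatching chunk or stripe.

    md_offset:  offset of the chunk/stripe within the MD device (only
                echoed in the header line here).
    components: list of (path, component_offset) pairs, one per active
                disk; the offsets are assumed to be supplied by the caller.
    chunk_size: number of bytes to read from each component.
    algo:       any hashlib algorithm name, e.g. "sha1" or "md5".
    """
    lines = [f"mismatch at md offset {md_offset}"]
    for path, comp_offset in components:
        with open(path, "rb") as f:
            f.seek(comp_offset)
            data = f.read(chunk_size)
        digest = hashlib.new(algo, data).hexdigest()
        lines.append(f"  {path} offset {comp_offset} {algo} {digest}")
    return lines
```

For a 3-way mirror this yields one header line plus three checksum lines, so 
a differing digest stands out immediately.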

For a typical array this will take no more than 8 lines in dmesg, yet it will 
allow:

1) a human to determine at a glance which disk holds a mismatching chunk in 
raid 1/10
2) a userspace tool to determine the same for raid 6 by calculating the 
parity for every possible combination of chunks
3) external tools to determine which file might have been affected on the 
layered file system
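
As a rough sketch of the raid 6 userspace tool from point 2: assume each 
chunk in turn is the bad one, rebuild it from P and the remaining data, and 
see whether Q then agrees. The code assumes the usual RAID-6 arithmetic 
(GF(2^8) with polynomial 0x11d and generator 2), a single corrupted chunk per 
stripe, and data chunks passed in logical stripe order rather than physical 
disk order. All function names are made up for the example.

```python
def gf_mul2(x):
    """Multiply by the generator 2 in GF(2^8), polynomial 0x11d."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
    return x & 0xFF


def gf_mul(a, b):
    """Multiply two field elements (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = gf_mul2(a)
        b >>= 1
    return r


def compute_p(data):
    """P parity: plain XOR of all data chunks."""
    out = bytearray(len(data[0]))
    for block in data:
        for i, v in enumerate(block):
            out[i] ^= v
    return bytes(out)


def compute_q(data):
    """Q parity: XOR of (2**index * chunk) over GF(2^8)."""
    out = bytearray(len(data[0]))
    coeff = 1
    for block in data:
        for i, v in enumerate(block):
            out[i] ^= gf_mul(coeff, v)
        coeff = gf_mul2(coeff)
    return bytes(out)


def locate_bad_chunk(data, p, q):
    """Return "ok", "P", "Q", the index of the bad data chunk, or None.

    None means no single-chunk explanation fits (more than one bad chunk).
    """
    data = list(data)
    p_ok = compute_p(data) == p
    q_ok = compute_q(data) == q
    if p_ok and q_ok:
        return "ok"
    if q_ok:
        return "P"  # data and Q agree, so P is the odd one out
    if p_ok:
        return "Q"  # data and P agree, so Q is the odd one out
    candidates = []
    for i in range(len(data)):
        # Pretend chunk i is bad: rebuild it from P and the other chunks.
        rebuilt = compute_p(data[:i] + data[i + 1:] + [p])
        if compute_q(data[:i] + [rebuilt] + data[i + 1:]) == q:
            candidates.append(i)
    return candidates[0] if len(candidates) == 1 else None
```

The trial-and-error loop follows the idea in point 2 directly; a syndrome 
calculation could find the bad chunk without trying each candidate, but it 
would be harder to follow.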


Now of course the problem remains how to repair the array using the 
information obtained above. I think the best way would be to extend the syntax 
of repair itself, so that:

echo repair > .../sync_action would use the old heuristics

echo repair <mdoffset> <component N> > .../sync_action would update the chunk 
on drive N which corresponds to the chunk/stripe at mdoffset within the MD 
device, using the information from the other drives, and not the other way 
around as might happen with just a repair.
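
For reference, the interface that exists today (a bare check or repair 
written to sync_action, with the result read from mismatch_cnt) can be driven 
from a script roughly as follows. The targeted "repair <mdoffset> <component 
N>" form above is only a proposal and is not used here. The helper names and 
the sysfs_root parameter are assumptions added so the sketch can be tried 
against a scratch directory.

```python
import os
import time


def md_attr(md_name, attr, sysfs_root="/sys/block"):
    """Path of an md sysfs attribute, e.g. /sys/block/md0/md/sync_action."""
    return os.path.join(sysfs_root, md_name, "md", attr)


def start_check(md_name, sysfs_root="/sys/block"):
    """Equivalent of: echo check > /sys/block/<md_name>/md/sync_action"""
    with open(md_attr(md_name, "sync_action", sysfs_root), "w") as f:
        f.write("check\n")


def wait_until_idle(md_name, sysfs_root="/sys/block", poll_seconds=5.0):
    """Poll sync_action until it reads "idle" again."""
    while True:
        with open(md_attr(md_name, "sync_action", sysfs_root)) as f:
            if f.read().strip() == "idle":
                return
        time.sleep(poll_seconds)


def read_mismatch_cnt(md_name, sysfs_root="/sys/block"):
    """Return the current mismatch_cnt value as an integer."""
    with open(md_attr(md_name, "mismatch_cnt", sysfs_root)) as f:
        return int(f.read().strip())


# Typical use on a real array (needs root):
#   start_check("md0"); wait_until_idle("md0"); print(read_mismatch_cnt("md0"))
```

Note that this only yields a count; it says nothing about where the 
mismatches are, which is the gap the proposal tries to close.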

Just my 2c

Peter
