linux-raid.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.de>
To: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: md: raid5 resync corrects read errors on data block - is this correct?
Date: Thu, 13 Sep 2012 10:19:24 +1000
Message-ID: <20120913101924.13431e6e@notabene.brown>
In-Reply-To: <CAGRgLy6Syso2dfELFhpK9nCyEgvuD6k3F=MRJBnrVU=btKqKVw@mail.gmail.com>

On Wed, 12 Sep 2012 19:49:52 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:

> Hi Neil,
> I have done some more investigation on that.
> 
> I see that according to handle_stripe_dirtying(), raid6 always does
> reconstruct-write, while raid5 checks what will be cheaper in terms of
> IOs - read-modify-write or reconstruct-write. For example, for a
> 3-drive raid5, both are the same, so because of:
> 
> if (rmw < rcw && rmw > 0)
> ... /* this is not picked, because in this case rmw==rcw==1 */
> 
> reconstruct-write is always performed for such 3-drive raid5. Is this correct?

Yes. 
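
To make the rule above concrete, here is a rough sketch in plain C. It is
not the kernel code: struct blk, count_rmw(), count_rcw() and
choose_strategy() are hypothetical names, and the model assumes each block
is either fully cached or not cached at all. Only the final comparison is
taken from the snippet quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one block of a stripe. */
struct blk {
    bool towrite;   /* a write is pending for this block */
    bool overwrite; /* the pending write covers the whole block */
    bool uptodate;  /* block contents are already cached */
};

enum strategy { STRAT_RMW, STRAT_RCW };

/* Reads needed for read-modify-write: the old contents of every block
 * being written plus the old parity, unless already cached. */
static int count_rmw(const struct blk *b, size_t ndisks, size_t pd_idx)
{
    int reads = 0;

    assert(pd_idx < ndisks);
    for (size_t i = 0; i < ndisks; i++)
        if ((b[i].towrite || i == pd_idx) && !b[i].uptodate)
            reads++;
    return reads;
}

/* Reads needed for reconstruct-write: every data block that is neither
 * fully overwritten nor already cached. */
static int count_rcw(const struct blk *b, size_t ndisks, size_t pd_idx)
{
    int reads = 0;

    assert(pd_idx < ndisks);
    for (size_t i = 0; i < ndisks; i++)
        if (i != pd_idx && !b[i].overwrite && !b[i].uptodate)
            reads++;
    return reads;
}

/* The comparison quoted above: a tie (or rmw == 0) falls through to
 * reconstruct-write. */
static enum strategy choose_strategy(int rmw, int rcw)
{
    return (rmw < rcw && rmw > 0) ? STRAT_RMW : STRAT_RCW;
}
```

In this model a partial write to one block of a 4-drive array with nothing
cached needs 2 reads for rmw and 3 for rcw, so rmw is chosen; any case
where the two counts are equal ends up as reconstruct-write.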

> 
> The issue with doing read-modify-writes is that later we have no
> reliable way to know whether the parity block is correct - when we
> later do reconstruct-write because of a read error, for example. For
> read requests we could have perhaps checked the bitmap, and do
> reconstruct-write if the relevant bit is not set, but for write
> requests the relevant bit will always be set, because it is set when
> the write is started.
> 
> I tried the following scenario, which showed a data corruption:
> # Create 4-drive raid5 in "--force" mode, so resync starts
> # Write one sector on a stripe that resync has not handled yet. RMW is
> performed, but the parity is incorrect because two other data blocks
> were not taken into account (they contain garbage).
> # Induce a read-error on the sector that I just wrote to
> # Let resync handle this stripe
> 
> As a result, resync corrects my sector using other two data blocks +
> parity block, which is out of sync. When I read back the sector, data
> is incorrect.
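
The scenario above can be reproduced on paper with XOR parity. The
following is only an illustrative sketch, assuming one byte per block and
a 4-drive layout (three data blocks plus parity); struct stripe and the
helper names are hypothetical and have nothing to do with the md data
structures.

```c
#include <assert.h>
#include <stdint.h>

#define NDATA 3 /* data blocks per stripe in a 4-drive raid5 */

/* Hypothetical one-byte-per-block stripe. */
struct stripe {
    uint8_t d[NDATA]; /* data blocks */
    uint8_t p;        /* parity block */
};

/* Read-modify-write: patch the parity with old data XOR new data.
 * This is only correct if the parity was consistent beforehand. */
static void write_rmw(struct stripe *s, int idx, uint8_t val)
{
    assert(idx >= 0 && idx < NDATA);
    s->p = (uint8_t)(s->p ^ s->d[idx] ^ val);
    s->d[idx] = val;
}

/* Reconstruct-write: recompute the parity from all data blocks. */
static void write_rcw(struct stripe *s, int idx, uint8_t val)
{
    assert(idx >= 0 && idx < NDATA);
    s->d[idx] = val;
    s->p = (uint8_t)(s->d[0] ^ s->d[1] ^ s->d[2]);
}

/* Rebuild one data block from the parity and the other data blocks,
 * as is done when that block returns a read error. */
static uint8_t reconstruct(const struct stripe *s, int idx)
{
    uint8_t v = s->p;

    assert(idx >= 0 && idx < NDATA);
    for (int i = 0; i < NDATA; i++)
        if (i != idx)
            v = (uint8_t)(v ^ s->d[i]);
    return v;
}
```

If the parity is off by some error e before the write, write_rmw() carries
e forward unchanged, so reconstruct() returns the new data XOR e.
write_rcw() discards the old parity, so reconstruct() returns the new
data.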
> 
> I see that I can easily force raid5 to always do reconstruct-write,
> the same way you do for raid6. However, I realize that for
> performance reasons, it is better to do RMW if possible.
> 
> What do you think about the following rough suggestion: in
> handle_stripe_dirtying() check whether resync is ongoing or should be
> started - using MD_RECOVERY_SYNC, for example. If there is an ongoing
> resync, there is a good reason for that, probably parity on some
> stripes is out of date. So in that case, always force
> reconstruct-write. Otherwise, count what is cheaper like you do now.
> (Can RCW really be cheaper than RMW?)
> 
> So during resync, array performance will be lower, but we will ensure
> that all stripe-blocks are consistent. What do you think?

I'm fairly sure we used to do that - long long ago. (hunts through git
history...)  No.  The code-fragment was there but it was commented out.

I think it would be good to avoid 'rmw' if the sector offset is at or
beyond recovery_cp, i.e. in the region that resync has not reached yet.
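
Roughly what such a check could look like, as a stand-alone sketch and not
a patch: it assumes recovery_cp is the point up to which resync has
completed, so the stripes whose parity cannot be trusted are those at or
beyond it, and that a recovery_cp of MAX_SECTOR means the whole array is
in sync. MAX_SECTOR, must_reconstruct_write() and use_rmw() are
hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Assumption: this value of recovery_cp means "nothing left to resync". */
#define MAX_SECTOR UINT64_MAX

/* True if the stripe lies in the region resync has not covered yet, so
 * its parity may be stale and must not be used for rmw. */
static bool must_reconstruct_write(sector_t stripe_sector,
                                   sector_t recovery_cp)
{
    return recovery_cp < MAX_SECTOR && stripe_sector >= recovery_cp;
}

/* Cost comparison as before, with rmw ruled out for unsynced stripes. */
static bool use_rmw(int rmw, int rcw, sector_t stripe_sector,
                    sector_t recovery_cp)
{
    assert(rmw >= 0 && rcw >= 0);
    if (must_reconstruct_write(stripe_sector, recovery_cp))
        return false;
    return rmw < rcw && rmw > 0;
}
```

With this, writes behind the resync point keep the cheaper-of-the-two
behaviour, and only writes to the not-yet-synced part pay for the extra
reads.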

Care to write a patch?

Thanks,
NeilBrown


Thread overview: 12+ messages
2012-09-11 19:10 md: raid5 resync corrects read errors on data block - is this correct? Alexander Lyakas
2012-09-11 22:29 ` NeilBrown
2012-09-12  7:15   ` Alexander Lyakas
2012-09-12 16:49   ` Alexander Lyakas
2012-09-13  0:19     ` NeilBrown [this message]
2012-09-13 16:05       ` Alexander Lyakas
2012-09-13 16:11         ` Alexander Lyakas
2012-09-17 11:15           ` Alexander Lyakas
2012-09-19  5:59             ` NeilBrown
2012-09-20  8:26               ` Alexander Lyakas
2012-09-25  6:57                 ` NeilBrown
2012-09-25  7:50                   ` Alexander Lyakas
