Re: md: raid5 resync corrects read errors on data block - is this correct?


From: NeilBrown <neilb@suse.de>
To: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: md: raid5 resync corrects read errors on data block - is this correct?
Date: Tue, 25 Sep 2012 16:57:02 +1000
Message-ID: <20120925165702.0d7afcd7@notabene.brown>
In-Reply-To: <CAGRgLy6EsYj90f4UqPt9=p_qumWLq2vCCYwa-YL97P=k6R58HQ@mail.gmail.com>


On Thu, 20 Sep 2012 11:26:50 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:

> Hi Neil,
> you are completely right. I got confused between mddev->recovery_cp
> and sb->resync_offset; the latter may become 0 due to in-flight WRITEs
> and not due to resync. Looking at the code again, I see that
> recovery_cp is totally one-way from sb->resync_offset to MaxSector
> (except for explicit loading via sysfs). Also recovery_cp is not
> relevant to "check" and "repair". So recovery_cp is pretty simple
> after all.
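
A minimal sketch of the test this reasoning leads to, for readers following
the thread. It assumes recovery_cp only ever advances toward MaxSector, as
described above. The names force_rcw and MAX_SECTOR are hypothetical, and
sector_t is redefined as a plain 64-bit integer so the snippet builds outside
the kernel; it is an illustration, not the md code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins (assumptions): the kernel has its own sector_t and MaxSector. */
typedef uint64_t sector_t;
#define MAX_SECTOR (~(sector_t)0)

/*
 * Hypothetical helper: should a write to the stripe starting at
 * stripe_sector be forced to reconstruct-write?
 */
bool force_rcw(int max_degraded, sector_t recovery_cp, sector_t stripe_sector)
{
	/* 1 for RAID4/5, 2 for RAID6 */
	assert(max_degraded == 1 || max_degraded == 2);

	if (max_degraded == 2)
		return true;	/* RAID6: always rcw in the code under discussion */

	/* Parity at or beyond recovery_cp has not been resynced yet. */
	return recovery_cp < MAX_SECTOR && stripe_sector >= recovery_cp;
}
```

With recovery_cp at sector 1000, a RAID5 stripe at sector 2000 is forced to
rcw while a stripe at sector 500 is not; once recovery_cp reaches MAX_SECTOR
no RAID5 stripe is forced.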
> 
> Below is V2 patch. (I have also to credit it to somebody else, because
> he was the one that said - just do rcw while you are resyncing).
> 
> Thanks,
> Alex.
> 
> 
> -----------------
> From cc3e2bfcf2fd2c69180577949425d69de88706bb Mon Sep 17 00:00:00 2001
> From: Alex Lyakas <alex@zadarastorage.com>
> Date: Thu, 13 Sep 2012 18:55:00 +0300
> Subject: [PATCH] When RAID5 is dirty, force reconstruct-write instead of
>  read-modify-write.
> 
> Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
> Signed-off-by: Yair Hershko <yair@zadarastorage.com>

Signed-off-by has a very specific meaning - it isn't just a way of giving
credit.
If Yair wrote some of the code, this is fine.
If not, then something like "Suggested-by:" might be more appropriate.
Should I change it to that?

applied, thanks.

NeilBrown


> 
> diff --git a/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c b/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c
> index 5332202..9fdd5e3 100644
> --- a/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c
> +++ b/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c
> @@ -2555,12 +2555,24 @@ static void handle_stripe_dirtying(struct r5conf *conf,
>                                    int disks)
>  {
>         int rmw = 0, rcw = 0, i;
> -       if (conf->max_degraded == 2) {
> -               /* RAID6 requires 'rcw' in current implementation
> -                * Calculate the real rcw later - for now fake it
> +       sector_t recovery_cp = conf->mddev->recovery_cp;
> +
> +       /* RAID6 requires 'rcw' in current implementation.
> +        * Otherwise, check whether resync is now happening or should start.
> +        * If yes, then the array is dirty (after unclean shutdown or
> +        * initial creation), so parity in some stripes might be inconsistent.
> +        * In this case, we need to always do reconstruct-write, to ensure
> +        * that in case of drive failure or read-error correction, we
> +        * generate correct data from the parity.
> +        */
> +       if (conf->max_degraded == 2 ||
> +           (recovery_cp < MaxSector && sh->sector >= recovery_cp)) {
> +               /* Calculate the real rcw later - for now make it
>                  * look like rcw is cheaper
>                  */
>                 rcw = 1; rmw = 2;
> +               pr_debug("force RCW max_degraded=%u, recovery_cp=%lu sh->sector=%lu\n",
> +                        conf->max_degraded, recovery_cp, sh->sector);
>         } else for (i = disks; i--; ) {
>                 /* would I have to read this buffer for read_modify_write */
>                 struct r5dev *dev = &sh->dev[i];
> 
> 
> 
> 
> 
> 
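
For readers who want to see why the comment in the patch insists on
reconstruct-write, here is a toy XOR-parity model. It is a sketch under
stated assumptions, not the md implementation: three data blocks of four
bytes each, a single parity block, and hypothetical helper names throughout.
It shows that read-modify-write carries a stale parity error forward, while
reconstruct-write replaces the parity with a correct one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NDATA 3	/* data blocks per stripe (illustrative) */
#define BLK   4	/* bytes per block (illustrative) */

struct stripe {
	uint8_t data[NDATA][BLK];
	uint8_t parity[BLK];
};

/* Build a "dirty" stripe: the parity does not match the data. */
void make_dirty_stripe(struct stripe *s)
{
	for (int d = 0; d < NDATA; d++)
		for (int b = 0; b < BLK; b++)
			s->data[d][b] = (uint8_t)(0x10 * (d + 1) + b);
	memset(s->parity, 0xAA, BLK);	/* stale value, not the XOR of the data */
}

/* read-modify-write: new parity = old parity ^ old data ^ new data */
void write_rmw(struct stripe *s, int idx, const uint8_t *newdata)
{
	assert(idx >= 0 && idx < NDATA);
	for (int b = 0; b < BLK; b++) {
		s->parity[b] ^= s->data[idx][b] ^ newdata[b];
		s->data[idx][b] = newdata[b];
	}
}

/* reconstruct-write: parity is recomputed from every data block */
void write_rcw(struct stripe *s, int idx, const uint8_t *newdata)
{
	assert(idx >= 0 && idx < NDATA);
	memcpy(s->data[idx], newdata, BLK);
	for (int b = 0; b < BLK; b++) {
		uint8_t p = 0;
		for (int d = 0; d < NDATA; d++)
			p ^= s->data[d][b];
		s->parity[b] = p;
	}
}

/* Returns 1 if rebuilding block `lost` from parity yields its real contents. */
int rebuild_matches(const struct stripe *s, int lost)
{
	uint8_t out[BLK];

	assert(lost >= 0 && lost < NDATA);
	for (int b = 0; b < BLK; b++) {
		out[b] = s->parity[b];
		for (int d = 0; d < NDATA; d++)
			if (d != lost)
				out[b] ^= s->data[d][b];
	}
	return memcmp(out, s->data[lost], BLK) == 0;
}
```

Starting from make_dirty_stripe(), a write_rmw() to block 0 leaves
rebuild_matches() false for block 1, whereas a write_rcw() to block 0 makes
it true. That is the failure mode the patch avoids: in the dirty region, a
later drive failure or read-error correction would otherwise regenerate wrong
data from the parity.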
> On Wed, Sep 19, 2012 at 8:59 AM, NeilBrown <neilb@suse.de> wrote:
> > On Mon, 17 Sep 2012 14:15:16 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
> > wrote:
> >
> >> Hi Neil,
> >> below is a bit less-ugly version of the patch.
> >> Thanks,
> >> Alex.
> >>
> >> From 05cf800d623bf558c99d542cf8bf083c85b7e5d5 Mon Sep 17 00:00:00 2001
> >> From: Alex Lyakas <alex@zadarastorage.com>
> >> Date: Thu, 13 Sep 2012 18:55:00 +0300
> >> Subject: [PATCH] When RAID5 is dirty, force reconstruct-write instead of
> >>  read-modify-write.
> >>
> >> Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
> >> Signed-off-by: Yair Hershko <yair@zadarastorage.com>
> >>
> >> diff --git a/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c b/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c
> >> index 5332202..0702785 100644
> >> --- a/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c
> >> +++ b/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c
> >> @@ -2555,12 +2555,36 @@ static void handle_stripe_dirtying(struct r5conf *conf,
> >>                                    int disks)
> >>  {
> >>         int rmw = 0, rcw = 0, i;
> >> -       if (conf->max_degraded == 2) {
> >> -               /* RAID6 requires 'rcw' in current implementation
> >> -                * Calculate the real rcw later - for now fake it
> >> +       sector_t recovery_cp = conf->mddev->recovery_cp;
> >> +       unsigned long recovery = conf->mddev->recovery;
> >> +       int needed = test_bit(MD_RECOVERY_NEEDED, &recovery);
> >> +       int resyncing = test_bit(MD_RECOVERY_SYNC, &recovery) &&
> >> +                       !test_bit(MD_RECOVERY_REQUESTED, &recovery) &&
> >> +                       !test_bit(MD_RECOVERY_CHECK, &recovery);
> >> +       int transitional = test_bit(MD_RECOVERY_RUNNING, &recovery) &&
> >> +                          !test_bit(MD_RECOVERY_SYNC, &recovery) &&
> >> +                          !test_bit(MD_RECOVERY_RECOVER, &recovery) &&
> >> +                          !test_bit(MD_RECOVERY_DONE, &recovery) &&
> >> +                          !test_bit(MD_RECOVERY_RESHAPE, &recovery);
> >
> > Thanks Alex,
> >  however I don't understand why you want to test all of these bits.
> > Isn't it enough just to check ->recovery_cp ??
> >
> >> +
> >> +       /* RAID6 requires 'rcw' in current implementation.
> >> +        * Otherwise, attempt to check whether resync is now happening
> >> +        * or should start.
> >> +         * If yes, then the array is dirty (after unclean shutdown or
> >> +         * initial creation), so parity in some stripes might be inconsistent.
> >> +         * In this case, we need to always do reconstruct-write, to ensure
> >> +         * that in case of drive failure or read-error correction, we
> >> +         * generate correct data from the parity.
> >> +         */
> >> +       if (conf->max_degraded == 2 ||
> >> +           (recovery_cp < MaxSector && sh->sector >= recovery_cp &&
> >> +            (needed || resyncing || transitional))) {
> >> +               /* Calculate the real rcw later - for now fake it
> >>                  * look like rcw is cheaper
> >
> > Also, we should probably fix this comment.  s/fake/make/
> >
> > Thanks,
> > NeilBrown
> >
> >
> >
> >>                  */
> >>                 rcw = 1; rmw = 2;
> >> +               pr_debug("force RCW max_degraded=%u, recovery_cp=%lu sh->sector=%lu recovery=0x%lx\n",
> >> +                        conf->max_degraded, recovery_cp, sh->sector, recovery);
> >>         } else for (i = disks; i--; ) {
> >>                 /* would I have to read this buffer for read_modify_write */
> >>                 struct r5dev *dev = &sh->dev[i];
> >



