Re: raid1 repair: sync_request() aborts if one of the drives has bad block recorded

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: NeilBrown <neilb@suse.de>
To: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: raid1 repair: sync_request() aborts if one of the drives has bad block recorded
Date: Tue, 17 Jul 2012 11:17:53 +1000	[thread overview]
Message-ID: <20120717111753.178d53d5@notabene.brown> (raw)
In-Reply-To: <CAGRgLy6JZXiHFg11Z-_3krj8iKp_Wk7X_wz8OyikyvhLpxzcxg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5439 bytes --]

On Mon, 16 Jul 2012 11:45:12 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:

> Hi Neil,
> thanks for your comments.
> 
> > I don't see that MD_RECOVERY_REQUESTED is really an issue is it?
> > We the wrong thing happen in any case where there just one device with no bad
> > block, and at least one device with a bad block.  In that case we want to
> > skip forward by the number of bad blocks, not skip to the end of the array.
> 
> I think that MD_RECOVERY_REQUESTED has some meaning here. Because
> there are only two places where we increase "write_targets":
> Here:
> 		} else if (!test_bit(In_sync, &rdev->flags)) {
> 			bio->bi_rw = WRITE;
> 			bio->bi_end_io = end_sync_write;
> 			write_targets ++;
> and
> 	if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) && read_targets > 0)
> 		/* extra read targets are also write targets */
> 		write_targets += read_targets-1;
> 
> So if we'll see a device that is not In_sync, we will increase
> write_targets and won't fall into that issue. That's why I was
> thinking to check for MD_RECOVERY_REQUESTED.

But neither of the code segements you have identified depend on
MD_RECOVERY_REQUESTED.

The problem occurs with resync/repair/check but not with recover.


> 
> >
> > So this?
> >
> > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > index 240ff31..c443f93 100644
> > --- a/drivers/md/raid1.c
> > +++ b/drivers/md/raid1.c
> > @@ -2422,7 +2422,10 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr, int *skipp
> >                 /* There is nowhere to write, so all non-sync
> >                  * drives must be failed - so we are finished
> >                  */
> > -               sector_t rv = max_sector - sector_nr;
> > +               sector_t rv;
> > +               if (min_bad > 0)
> > +                       max_sector = sector_nr + min_bad;
> > +               rv = max_sector - sector_nr;
> >                 *skipped = 1;
> >                 put_buf(r1_bio);
> >                 return rv;
> >
> 
> I tested it and it works. However, I have a question: shouldn't we try
> to clear bad blocks during "repair" in particular or during any
> kind-of-sync in general?

Thanks for testing.


> 
> Because currently the following is happening:
> - In sync_request devices are selected as candidates for READs and for
> WRITEs (using various criteria)
> - Then we issue the READs and eventually end up in sync_request_write()
> - Here we schedule WRITEs (without consulting bad blocks or WriteErrorSeen bit)
> - In end_sync_write, we check if we can clear some bad blocks, and if
> yes, eventually we end up in handle_sync_write_finished(), where we
> clear bad blocks
> 
> So getting back to sync_request(), the issue is this: if we consider a
> device for READs, we check for badblocks. If we find a badblock, we
> skip it and don't consider this device neither for READs nor for
> WRITEs. How about:
> - We consider a device for READs
> - If it has a bad block, we consider it for WRITEs instead (so we have
> a 3rd place where we consider the device for WRITE).
> - We may consult WriteErrorSeen bit as we do in normal make_request(),
> but not sure, because right now on sync path, we don't consult it

Good point.
I think we should consider WriteErrorSeen here.  We don't currently consult
it in sync because we avoid bad blocks completely.
So you patch would become:

  } else if (!test_bit(WriteErrorSeen, &rdev->flags)) {
         bio->bi_rw = WRITE;
         bio->bi_end_io = end_sync_write;
         write_targets++;
  }

If you could confirm that works and resubmit it with a 'Signed-off-by:' line
I'll apply it.
Thanks.
I've already included my previous patch in my queue and tagged it for -stable.

NeilBrown

> 
> So this patch:
> --- c:/work/code_backups/psp13/raid1.c  Sun Jul 15 16:39:07 2012
> +++ ubuntu-precise/source/drivers/md/raid1.c    Mon Jul 16 11:35:40 2012
> @@ -2385,20 +2385,25 @@
>                                         if (wonly < 0)
>                                                 wonly = i;
>                                 } else {
>                                         if (disk < 0)
>                                                 disk = i;
>                                 }
>                                 bio->bi_rw = READ;
>                                 bio->bi_end_io = end_sync_read;
>                                 read_targets++;
>                         }
> +                       else {
> +                               bio->bi_rw = WRITE;
> +                               bio->bi_end_io = end_sync_write;
> +                               write_targets ++;
> +                       }
>                 }
>                 if (bio->bi_end_io) {
>                         atomic_inc(&rdev->nr_pending);
>                         bio->bi_sector = sector_nr + rdev->data_offset;
>                         bio->bi_bdev = rdev->bdev;
>                         bio->bi_private = r1_bio;
>                 }
>         }
>         rcu_read_unlock();
>         if (disk < 0)
> 
> 
> I tested this patch (together with yours) and it cleared the bad
> blocks, but not sure what else I might be missing. My Linux kernel
> programming skills are still ... questionable.
> 
> Thanks,
> Alex.
> 
> 
> 
> >
> > Thanks,
> > NeilBrown
> >


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

next prev parent reply	other threads:[~2012-07-17  1:17 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-12 15:38 raid1 repair: sync_request() aborts if one of the drives has bad block recorded Alexander Lyakas
2012-07-16  3:37 ` NeilBrown
2012-07-16  8:45   ` Alexander Lyakas
2012-07-17  1:17     ` NeilBrown [this message]
2012-07-17 13:17       ` Alexander Lyakas
2012-07-24 19:30         ` Alexander Lyakas
2012-07-31  2:11           ` NeilBrown
2012-07-31  5:56             ` Alexander Lyakas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120717111753.178d53d5@notabene.brown \
    --to=neilb@suse.de \
    --cc=alex.bolshoy@gmail.com \
    --cc=linux-raid@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).