From: NeilBrown <neilb@suse.de>
To: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: raid1 repair: sync_request() aborts if one of the drives has bad block recorded
Date: Tue, 17 Jul 2012 11:17:53 +1000 [thread overview]
Message-ID: <20120717111753.178d53d5@notabene.brown> (raw)
In-Reply-To: <CAGRgLy6JZXiHFg11Z-_3krj8iKp_Wk7X_wz8OyikyvhLpxzcxg@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 5439 bytes --]
On Mon, 16 Jul 2012 11:45:12 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:
> Hi Neil,
> thanks for your comments.
>
> > I don't see that MD_RECOVERY_REQUESTED is really an issue is it?
> > We the wrong thing happen in any case where there just one device with no bad
> > block, and at least one device with a bad block. In that case we want to
> > skip forward by the number of bad blocks, not skip to the end of the array.
>
> I think that MD_RECOVERY_REQUESTED has some meaning here. Because
> there are only two places where we increase "write_targets":
> Here:
> } else if (!test_bit(In_sync, &rdev->flags)) {
> bio->bi_rw = WRITE;
> bio->bi_end_io = end_sync_write;
> write_targets ++;
> and
> if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) && read_targets > 0)
> /* extra read targets are also write targets */
> write_targets += read_targets-1;
>
> So if we'll see a device that is not In_sync, we will increase
> write_targets and won't fall into that issue. That's why I was
> thinking to check for MD_RECOVERY_REQUESTED.
But neither of the code segements you have identified depend on
MD_RECOVERY_REQUESTED.
The problem occurs with resync/repair/check but not with recover.
>
> >
> > So this?
> >
> > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > index 240ff31..c443f93 100644
> > --- a/drivers/md/raid1.c
> > +++ b/drivers/md/raid1.c
> > @@ -2422,7 +2422,10 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr, int *skipp
> > /* There is nowhere to write, so all non-sync
> > * drives must be failed - so we are finished
> > */
> > - sector_t rv = max_sector - sector_nr;
> > + sector_t rv;
> > + if (min_bad > 0)
> > + max_sector = sector_nr + min_bad;
> > + rv = max_sector - sector_nr;
> > *skipped = 1;
> > put_buf(r1_bio);
> > return rv;
> >
>
> I tested it and it works. However, I have a question: shouldn't we try
> to clear bad blocks during "repair" in particular or during any
> kind-of-sync in general?
Thanks for testing.
>
> Because currently the following is happening:
> - In sync_request devices are selected as candidates for READs and for
> WRITEs (using various criteria)
> - Then we issue the READs and eventually end up in sync_request_write()
> - Here we schedule WRITEs (without consulting bad blocks or WriteErrorSeen bit)
> - In end_sync_write, we check if we can clear some bad blocks, and if
> yes, eventually we end up in handle_sync_write_finished(), where we
> clear bad blocks
>
> So getting back to sync_request(), the issue is this: if we consider a
> device for READs, we check for badblocks. If we find a badblock, we
> skip it and don't consider this device neither for READs nor for
> WRITEs. How about:
> - We consider a device for READs
> - If it has a bad block, we consider it for WRITEs instead (so we have
> a 3rd place where we consider the device for WRITE).
> - We may consult WriteErrorSeen bit as we do in normal make_request(),
> but not sure, because right now on sync path, we don't consult it
Good point.
I think we should consider WriteErrorSeen here. We don't currently consult
it in sync because we avoid bad blocks completely.
So you patch would become:
} else if (!test_bit(WriteErrorSeen, &rdev->flags)) {
bio->bi_rw = WRITE;
bio->bi_end_io = end_sync_write;
write_targets++;
}
If you could confirm that works and resubmit it with a 'Signed-off-by:' line
I'll apply it.
Thanks.
I've already included my previous patch in my queue and tagged it for -stable.
NeilBrown
>
> So this patch:
> --- c:/work/code_backups/psp13/raid1.c Sun Jul 15 16:39:07 2012
> +++ ubuntu-precise/source/drivers/md/raid1.c Mon Jul 16 11:35:40 2012
> @@ -2385,20 +2385,25 @@
> if (wonly < 0)
> wonly = i;
> } else {
> if (disk < 0)
> disk = i;
> }
> bio->bi_rw = READ;
> bio->bi_end_io = end_sync_read;
> read_targets++;
> }
> + else {
> + bio->bi_rw = WRITE;
> + bio->bi_end_io = end_sync_write;
> + write_targets ++;
> + }
> }
> if (bio->bi_end_io) {
> atomic_inc(&rdev->nr_pending);
> bio->bi_sector = sector_nr + rdev->data_offset;
> bio->bi_bdev = rdev->bdev;
> bio->bi_private = r1_bio;
> }
> }
> rcu_read_unlock();
> if (disk < 0)
>
>
> I tested this patch (together with yours) and it cleared the bad
> blocks, but not sure what else I might be missing. My Linux kernel
> programming skills are still ... questionable.
>
> Thanks,
> Alex.
>
>
>
> >
> > Thanks,
> > NeilBrown
> >
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
next prev parent reply other threads:[~2012-07-17 1:17 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-07-12 15:38 raid1 repair: sync_request() aborts if one of the drives has bad block recorded Alexander Lyakas
2012-07-16 3:37 ` NeilBrown
2012-07-16 8:45 ` Alexander Lyakas
2012-07-17 1:17 ` NeilBrown [this message]
2012-07-17 13:17 ` Alexander Lyakas
2012-07-24 19:30 ` Alexander Lyakas
2012-07-31 2:11 ` NeilBrown
2012-07-31 5:56 ` Alexander Lyakas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120717111753.178d53d5@notabene.brown \
--to=neilb@suse.de \
--cc=alex.bolshoy@gmail.com \
--cc=linux-raid@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).