From: NeilBrown <neilb@suse.de>
To: Jes Sorensen <Jes.Sorensen@redhat.com>
Cc: linux-raid@vger.kernel.org, Shaohua Li <shli@kernel.org>,
Eryu Guan <eguan@redhat.com>
Subject: Re: raid5 lockups post ca64cae96037de16e4af92678814f5d4bf0c1c65
Date: Tue, 12 Mar 2013 12:32:24 +1100 [thread overview]
Message-ID: <20130312123224.62018981@notabene.brown> (raw)
In-Reply-To: <20130312093231.72c54735@notabene.brown>
[-- Attachment #1: Type: text/plain, Size: 6300 bytes --]
On Tue, 12 Mar 2013 09:32:31 +1100 NeilBrown <neilb@suse.de> wrote:
> On Wed, 06 Mar 2013 10:31:55 +0100 Jes Sorensen <Jes.Sorensen@redhat.com>
> wrote:
>
> >
> > I am attaching the test script I am running too. It was written by Eryu
> > Guan.
>
> Thanks for that. I've tried using it but haven't managed to trigger a BUG
> yet. What size are the loop files? I mostly use fairly small ones, but
> maybe it needs to be bigger to trigger the problem.
Shortly after I wrote that I got a bug-on! It hasn't happened again though.
This was using code without that latest patch I sent. The bug was
BUG_ON(s->uptodate != disks);
in the check_state_compute_result case of handle_parity_checks5() which is
probably the same cause as your most recent BUG.
I've revised my thinking a bit and am now running with this patch which I
think should fix a problem that probably caused the symptoms we have seen.
If you could run your tests for a while too and is whether it will still crash
for you, I'd really appreciate it.
Thanks,
NeilBrown
Subject: [PATCH] md/raid5: ensure sync and recovery don't happen at the same
time.
A number of problems can occur due to races between
resync/recovery and discard.
- if sync_request calls handle_stripe() while a discard is
happening on the stripe, it might call handle_stripe_clean_event
before all of the individual discard requests have completed
(so some devices are still locked, but not all).
Since commit ca64cae96037de16e4af92678814f5d4bf0c1c65
md/raid5: Make sure we clear R5_Discard when discard is finished.
this will cause R5_Discard to be cleared for the parity device,
so handle_stripe_clean_event will not be called when the other
devices do become unlocked, so their ->written will not be cleared.
This ultimately leads to a WARN_ON in init_stripe and a lock-up.
- If handle_stripe_clean_event does clear R5_UPTODATE at an awkward
time for resync, it can lead to s->uptodate being less than disks
in handle_parity_checks5(), which triggers a BUG (because it is
one).
So:
- keep R5_Discard on the parity device until all other devices have
completed their discard request
- make sure don't try to have a 'discard' and a 'sync' action at
the same time.
Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 51af9da..c216dd3 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2609,6 +2609,8 @@ handle_failed_sync(struct r5conf *conf, struct stripe_head *sh,
int i;
clear_bit(STRIPE_SYNCING, &sh->state);
+ if (test_and_clear_bit(R5_Overlap, &sh->dev[sh->pd_idx].flags))
+ wake_up(&conf->wait_for_overlap);
s->syncing = 0;
s->replacing = 0;
/* There is nothing more to do for sync/check/repair.
@@ -2782,6 +2784,7 @@ static void handle_stripe_clean_event(struct r5conf *conf,
{
int i;
struct r5dev *dev;
+ int discard_pending = 0;
for (i = disks; i--; )
if (sh->dev[i].written) {
@@ -2810,9 +2813,23 @@ static void handle_stripe_clean_event(struct r5conf *conf,
STRIPE_SECTORS,
!test_bit(STRIPE_DEGRADED, &sh->state),
0);
- }
- } else if (test_bit(R5_Discard, &sh->dev[i].flags))
- clear_bit(R5_Discard, &sh->dev[i].flags);
+ } else if (test_bit(R5_Discard, &dev->flags))
+ discard_pending = 1;
+ }
+ if (!discard_pending &&
+ test_bit(R5_Discard, &sh->dev[sh->pd_idx].flags)) {
+ clear_bit(R5_Discard, &sh->dev[sh->pd_idx].flags);
+ clear_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags);
+ if (sh->qd_idx >= 0) {
+ clear_bit(R5_Discard, &sh->dev[sh->qd_idx].flags);
+ clear_bit(R5_UPTODATE, &sh->dev[sh->qd_idx].flags);
+ }
+ /* now that discard is done we can proceed with any sync */
+ clear_bit(STRIPE_DISCARD, &sh->state);
+ if (test_bit(STRIPE_SYNC_REQUESTED, &sh->state))
+ set_bit(STRIPE_HANDLE, &sh->state);
+
+ }
if (test_and_clear_bit(STRIPE_FULL_WRITE, &sh->state))
if (atomic_dec_and_test(&conf->pending_full_writes))
@@ -3464,9 +3481,15 @@ static void handle_stripe(struct stripe_head *sh)
return;
}
- if (test_and_clear_bit(STRIPE_SYNC_REQUESTED, &sh->state)) {
- set_bit(STRIPE_SYNCING, &sh->state);
- clear_bit(STRIPE_INSYNC, &sh->state);
+ if (test_bit(STRIPE_SYNC_REQUESTED, &sh->state)) {
+ spin_lock(&sh->stripe_lock);
+ /* Cannot process 'sync' concurrently with 'discard' */
+ if (!test_bit(STRIPE_DISCARD, &sh->state) &&
+ test_and_clear_bit(STRIPE_SYNC_REQUESTED, &sh->state)) {
+ set_bit(STRIPE_SYNCING, &sh->state);
+ clear_bit(STRIPE_INSYNC, &sh->state);
+ }
+ spin_unlock(&sh->stripe_lock);
}
clear_bit(STRIPE_DELAYED, &sh->state);
@@ -3626,6 +3649,8 @@ static void handle_stripe(struct stripe_head *sh)
test_bit(STRIPE_INSYNC, &sh->state)) {
md_done_sync(conf->mddev, STRIPE_SECTORS, 1);
clear_bit(STRIPE_SYNCING, &sh->state);
+ if (test_and_clear_bit(R5_Overlap, &sh->dev[sh->pd_idx].flags))
+ wake_up(&conf->wait_for_overlap);
}
/* If the failed drives are just a ReadError, then we might need
@@ -4222,6 +4247,13 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi)
prepare_to_wait(&conf->wait_for_overlap, &w,
TASK_UNINTERRUPTIBLE);
spin_lock_irq(&sh->stripe_lock);
+ if (test_bit(STRIPE_SYNCING, &sh->state)) {
+ set_bit(R5_Overlap, &sh->dev[sh->pd_idx].flags);
+ spin_unlock_irq(&sh->stripe_lock);
+ release_stripe(sh);
+ schedule();
+ goto again;
+ }
for (d = 0; d < conf->raid_disks; d++) {
if (d == sh->pd_idx || d == sh->qd_idx)
continue;
@@ -4233,6 +4265,7 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi)
goto again;
}
}
+ set_bit(STRIPE_DISCARD, &sh->state);
finish_wait(&conf->wait_for_overlap, &w);
for (d = 0; d < conf->raid_disks; d++) {
if (d == sh->pd_idx || d == sh->qd_idx)
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index 2afd835..996bdf3 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -324,6 +324,7 @@ enum {
STRIPE_COMPUTE_RUN,
STRIPE_OPS_REQ_PENDING,
STRIPE_ON_UNPLUG_LIST,
+ STRIPE_DISCARD,
};
/*
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
next prev parent reply other threads:[~2013-03-12 1:32 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-04 13:50 raid5 lockups post ca64cae96037de16e4af92678814f5d4bf0c1c65 Jes Sorensen
2013-03-04 21:00 ` NeilBrown
2013-03-05 8:44 ` Jes Sorensen
2013-03-06 2:18 ` NeilBrown
2013-03-06 9:31 ` Jes Sorensen
2013-03-11 22:32 ` NeilBrown
2013-03-12 1:32 ` NeilBrown [this message]
2013-03-12 11:12 ` joystick
2013-03-20 0:54 ` NeilBrown
2013-03-12 13:45 ` Jes Sorensen
2013-03-12 23:35 ` NeilBrown
2013-03-13 7:32 ` Jes Sorensen
2013-03-14 7:35 ` Jes Sorensen
2013-03-20 0:55 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130312123224.62018981@notabene.brown \
--to=neilb@suse.de \
--cc=Jes.Sorensen@redhat.com \
--cc=eguan@redhat.com \
--cc=linux-raid@vger.kernel.org \
--cc=shli@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).