From: Dan Williams <dan.j.williams@intel.com>
To: "BERTRAND Joël" <joel.bertrand@systella.fr>
Cc: linux-raid@vger.kernel.org, sparclinux@vger.kernel.org
Subject: Re: [BUG] Raid5 trouble
Date: Wed, 17 Oct 2007 17:46:08 -0700
Message-ID: <1192668368.22506.6.camel@dwillia2-linux.ch.intel.com>
In-Reply-To: <47163BF9.304@systella.fr>
[-- Attachment #1: Type: text/plain, Size: 1608 bytes --]
On Wed, 2007-10-17 at 09:44 -0700, BERTRAND Joël wrote:
> Dan,
>
> I have modified get_stripe_work like this :
>
> static unsigned long get_stripe_work(struct stripe_head *sh)
> {
> unsigned long pending;
> int ack = 0;
> int a,b,c,d,e,f,g;
>
> pending = sh->ops.pending;
>
> test_and_ack_op(STRIPE_OP_BIOFILL, pending);
> a=ack;
> test_and_ack_op(STRIPE_OP_COMPUTE_BLK, pending);
> b=ack;
> test_and_ack_op(STRIPE_OP_PREXOR, pending);
> c=ack;
> test_and_ack_op(STRIPE_OP_BIODRAIN, pending);
> d=ack;
> test_and_ack_op(STRIPE_OP_POSTXOR, pending);
> e=ack;
> test_and_ack_op(STRIPE_OP_CHECK, pending);
> f=ack;
> if (test_and_clear_bit(STRIPE_OP_IO, &sh->ops.pending))
> ack++;
> g=ack;
>
> sh->ops.count -= ack;
>
> if (sh->ops.count<0) printk("%d %d %d %d %d %d %d\n",
> a,b,c,d,e,f,g);
> BUG_ON(sh->ops.count < 0);
>
> return pending;
> }
>
> and I obtain on console :
>
> 1 1 1 1 1 2
> kernel BUG at drivers/md/raid5.c:390!
> \|/ ____ \|/
> "@'/ .. \`@"
> /_| \__/ |_\
> \__U_/
> md7_resync(5409): Kernel bad sw trap 5 [#1]
>
> If that can help you...
>
> JKB
This is more evidence that the problem is probably mishandling of
STRIPE_OP_BIOFILL. The attached patch (which replaces the previous one)
moves the clearing of the BIOFILL 'pending' and 'ack' bits into
handle_stripe5() and adds some debug information.
--
Dan
[-- Attachment #2: fix-biofill-clear2.patch --]
[-- Type: text/x-patch, Size: 1830 bytes --]
raid5: fix clearing of biofill operations (try2)

From: Dan Williams <dan.j.williams@intel.com>

ops_complete_biofill() runs outside of spin_lock(&sh->lock) and clears the
'pending' and 'ack' bits. Since the test_and_ack_op() macro only checks
against 'complete' it can get an inconsistent snapshot of pending work.

Move the clearing of these bits to handle_stripe5(), under the lock.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/md/raid5.c |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index f96dea9..3808f52 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -377,7 +377,12 @@ static unsigned long get_stripe_work(struct stripe_head *sh)
 		ack++;
 
 	sh->ops.count -= ack;
-	BUG_ON(sh->ops.count < 0);
+	if (unlikely(sh->ops.count < 0)) {
+		printk(KERN_ERR "pending: %#lx ops.pending: %#lx ops.ack: %#lx "
+			"ops.complete: %#lx\n", pending, sh->ops.pending,
+			sh->ops.ack, sh->ops.complete);
+		BUG();
+	}
 
 	return pending;
 }
@@ -551,8 +556,7 @@ static void ops_complete_biofill(void *stripe_head_ref)
 			}
 		}
 	}
-	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
-	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
+	set_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
 
 	return_io(return_bi);
 
@@ -2630,6 +2634,13 @@ static void handle_stripe5(struct stripe_head *sh)
 	s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
 	/* Now to look around and see what can be done */
 
+	/* clean-up completed biofill operations */
+	if (test_bit(STRIPE_OP_BIOFILL, &sh->ops.complete)) {
+		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
+		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
+		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
+	}
+
 	rcu_read_lock();
 	for (i=disks; i--; ) {
 		mdk_rdev_t *rdev;