From: Dan Williams <dan.j.williams@intel.com>
To: "BERTRAND Joël" <joel.bertrand@systella.fr>
Cc: linux-raid@vger.kernel.org, sparclinux@vger.kernel.org
Subject: Re: [BUG] Raid5 trouble
Date: Wed, 17 Oct 2007 17:46:08 -0700
Message-ID: <1192668368.22506.6.camel@dwillia2-linux.ch.intel.com>
In-Reply-To: <47163BF9.304@systella.fr>
[-- Attachment #1: Type: text/plain, Size: 1608 bytes --]
On Wed, 2007-10-17 at 09:44 -0700, BERTRAND Joël wrote:
> Dan,
>
> I have modified get_stripe_work() like this:
>
> static unsigned long get_stripe_work(struct stripe_head *sh)
> {
> 	unsigned long pending;
> 	int ack = 0;
> 	int a,b,c,d,e,f,g;
>
> 	pending = sh->ops.pending;
>
> 	test_and_ack_op(STRIPE_OP_BIOFILL, pending);
> 	a=ack;
> 	test_and_ack_op(STRIPE_OP_COMPUTE_BLK, pending);
> 	b=ack;
> 	test_and_ack_op(STRIPE_OP_PREXOR, pending);
> 	c=ack;
> 	test_and_ack_op(STRIPE_OP_BIODRAIN, pending);
> 	d=ack;
> 	test_and_ack_op(STRIPE_OP_POSTXOR, pending);
> 	e=ack;
> 	test_and_ack_op(STRIPE_OP_CHECK, pending);
> 	f=ack;
> 	if (test_and_clear_bit(STRIPE_OP_IO, &sh->ops.pending))
> 		ack++;
> 	g=ack;
>
> 	sh->ops.count -= ack;
>
> 	if (sh->ops.count<0)
> 		printk("%d %d %d %d %d %d %d\n", a,b,c,d,e,f,g);
> 	BUG_ON(sh->ops.count < 0);
>
> 	return pending;
> }
>
> and I get this on the console:
>
> 1 1 1 1 1 2
> kernel BUG at drivers/md/raid5.c:390!
> \|/ ____ \|/
> "@'/ .. \`@"
> /_| \__/ |_\
> \__U_/
> md7_resync(5409): Kernel bad sw trap 5 [#1]
>
> If that can help you...
>
> JKB
This is further evidence that the problem is probably mishandling of
STRIPE_OP_BIOFILL. The attached patch (replacing the previous one) moves
the clearing of the biofill 'pending' and 'ack' bits into handle_stripe5()
and adds some debug information.
--
Dan
[-- Attachment #2: fix-biofill-clear2.patch --]
[-- Type: text/x-patch, Size: 1830 bytes --]
raid5: fix clearing of biofill operations (try2)
From: Dan Williams <dan.j.williams@intel.com>
ops_complete_biofill() runs outside of spin_lock(&sh->lock) and clears the
'pending' and 'ack' bits. Since the test_and_ack_op() macro only checks
against 'complete' it can get an inconsistent snapshot of pending work.
Move the clearing of these bits to handle_stripe5(), under the lock.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/md/raid5.c | 17 ++++++++++++++---
1 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index f96dea9..3808f52 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -377,7 +377,12 @@ static unsigned long get_stripe_work(struct stripe_head *sh)
ack++;
sh->ops.count -= ack;
- BUG_ON(sh->ops.count < 0);
+ if (unlikely(sh->ops.count < 0)) {
+ printk(KERN_ERR "pending: %#lx ops.pending: %#lx ops.ack: %#lx "
+ "ops.complete: %#lx\n", pending, sh->ops.pending,
+ sh->ops.ack, sh->ops.complete);
+ BUG();
+ }
return pending;
}
@@ -551,8 +556,7 @@ static void ops_complete_biofill(void *stripe_head_ref)
}
}
}
- clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
- clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
+ set_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
return_io(return_bi);
@@ -2630,6 +2634,13 @@ static void handle_stripe5(struct stripe_head *sh)
s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
/* Now to look around and see what can be done */
+ /* clean-up completed biofill operations */
+ if (test_bit(STRIPE_OP_BIOFILL, &sh->ops.complete)) {
+ clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
+ clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
+ clear_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
+ }
+
rcu_read_lock();
for (i=disks; i--; ) {
mdk_rdev_t *rdev;
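
For readers following the thread, the idea behind the patch can be sketched
in userspace C. This is only an illustration, not kernel code: struct
stripe_ops, OP_BIOFILL_BIT, request_biofill(), complete_biofill() and
handle_stripe() are hypothetical names, a pthread mutex stands in for
sh->lock, and C11 atomics stand in for the kernel bit operations. The
completion callback only sets the 'complete' bit; 'pending' and 'ack' are
cleared later by the handler while it holds the lock, so the accounting
sees a consistent view.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for STRIPE_OP_BIOFILL. */
#define OP_BIOFILL_BIT (1UL << 0)

/* Simplified model of sh->ops; not the kernel's actual layout. */
struct stripe_ops {
	pthread_mutex_t lock;	/* stands in for sh->lock */
	unsigned long pending;	/* protected by lock */
	unsigned long ack;	/* protected by lock */
	atomic_ulong complete;	/* set outside the lock by the callback */
	int count;		/* requested but not yet acked operations */
};

static void stripe_ops_init(struct stripe_ops *ops)
{
	pthread_mutex_init(&ops->lock, NULL);
	ops->pending = 0;
	ops->ack = 0;
	atomic_init(&ops->complete, 0);
	ops->count = 0;
}

/* Request a biofill: mark it pending and account for it, under the lock. */
static void request_biofill(struct stripe_ops *ops)
{
	pthread_mutex_lock(&ops->lock);
	if (!(ops->pending & OP_BIOFILL_BIT)) {
		ops->pending |= OP_BIOFILL_BIT;
		ops->count++;
	}
	pthread_mutex_unlock(&ops->lock);
}

/* Completion callback: runs without the lock, so it only flags completion. */
static void complete_biofill(struct stripe_ops *ops)
{
	atomic_fetch_or(&ops->complete, OP_BIOFILL_BIT);
}

/* Handler: retire completed work, then ack new work, all under the lock. */
static int handle_stripe(struct stripe_ops *ops)
{
	int acked = 0;
	int count;

	pthread_mutex_lock(&ops->lock);

	/* Clean up a completed biofill (mirrors the hunk in handle_stripe5). */
	if (atomic_load(&ops->complete) & OP_BIOFILL_BIT) {
		ops->pending &= ~OP_BIOFILL_BIT;
		ops->ack &= ~OP_BIOFILL_BIT;
		atomic_fetch_and(&ops->complete, ~OP_BIOFILL_BIT);
	}

	/* Ack a pending operation at most once. */
	if ((ops->pending & OP_BIOFILL_BIT) && !(ops->ack & OP_BIOFILL_BIT)) {
		ops->ack |= OP_BIOFILL_BIT;
		acked++;
	}

	ops->count -= acked;
	assert(ops->count >= 0);	/* plays the role of the BUG() check */
	count = ops->count;

	pthread_mutex_unlock(&ops->lock);
	return count;
}
```

In this model, calling complete_biofill() leaves 'pending' and 'ack'
untouched until the next handle_stripe() call, which is the ordering the
patch aims for.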