From: "BERTRAND Joël" <joel.bertrand@systella.fr>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-raid@vger.kernel.org, sparclinux@vger.kernel.org
Subject: Re: [BUG] Raid5 trouble
Date: Wed, 17 Oct 2007 18:44:41 +0200
Message-ID: <47163BF9.304@systella.fr>
In-Reply-To: <e9c3a7c20710170840u2ed8d6a9x26523eec6700ad11@mail.gmail.com>
Dan Williams wrote:
> On 10/17/07, Dan Williams <dan.j.williams@intel.com> wrote:
>> On 10/17/07, BERTRAND Joël <joel.bertrand@systella.fr> wrote:
>>> BERTRAND Joël wrote:
>>>> Hello,
>>>>
>>>> I run 2.6.23 linux kernel on two T1000 (sparc64) servers. Each
>>>> server has a partitionable raid5 array (/dev/md/d0) and I have to
>>>> synchronize both raid5 volumes by raid1. Thus, I have tried to build a
>>>> raid1 volume between /dev/md/d0p1 and /dev/sdi1 (exported by iscsi from
>>>> the second server) and I obtain a BUG :
>>>>
>>>> Root gershwin:[/usr/scripts] > mdadm -C /dev/md7 -l1 -n2 /dev/md/d0p1
>>>> /dev/sdi1
>>>> ...
>>> Hello,
>>>
>>> I have fixed iscsi-target, and I have tested it. It works now without
>>> any trouble. Patches were posted on iscsi-target mailing list. When I
>>> use iSCSI to access to foreign raid5 volume, it works fine. I can format
>>> foreign volume, copy large files on it... But when I tried to create a
>>> new raid1 volume with a local raid5 volume and a foreign raid5 volume, I
>>> receive my well known Oops. You can find my dmesg after Oops :
>>>
>> Can you send your .config and your bootup dmesg?
>>
>
> I found a problem which may lead to the operations count dropping
> below zero. If ops_complete_biofill() gets preempted in between the
> following calls:
>
> raid5.c:554> clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
> raid5.c:555> clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
>
> ...then get_stripe_work() can recount/re-acknowledge STRIPE_OP_BIOFILL
> causing the assertion. In fact, the 'pending' bit should always be
> cleared first, but the other cases are protected by
> spin_lock(&sh->lock). Patch attached.

	Dan,

	I have modified get_stripe_work() like this:

static unsigned long get_stripe_work(struct stripe_head *sh)
{
	unsigned long pending;
	int ack = 0;
	int a, b, c, d, e, f, g;	/* running value of ack after each step */

	pending = sh->ops.pending;

	test_and_ack_op(STRIPE_OP_BIOFILL, pending);
	a = ack;
	test_and_ack_op(STRIPE_OP_COMPUTE_BLK, pending);
	b = ack;
	test_and_ack_op(STRIPE_OP_PREXOR, pending);
	c = ack;
	test_and_ack_op(STRIPE_OP_BIODRAIN, pending);
	d = ack;
	test_and_ack_op(STRIPE_OP_POSTXOR, pending);
	e = ack;
	test_and_ack_op(STRIPE_OP_CHECK, pending);
	f = ack;
	if (test_and_clear_bit(STRIPE_OP_IO, &sh->ops.pending))
		ack++;
	g = ack;

	sh->ops.count -= ack;

	if (sh->ops.count < 0)
		printk("%d %d %d %d %d %d %d\n", a, b, c, d, e, f, g);
	BUG_ON(sh->ops.count < 0);

	return pending;
}

and I get this on the console:

1 1 1 1 1 2
kernel BUG at drivers/md/raid5.c:390!
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
md7_resync(5409): Kernel bad sw trap 5 [#1]

	I hope this helps...
JKB
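
The race Dan describes in the quoted text above can be modeled in user space. The following is only a minimal sketch under stated assumptions, not the kernel code: `struct op_model`, `model_request()`, `model_get_work()` and `model_run()` are hypothetical names, a single operation stands in for STRIPE_OP_BIOFILL, and the preemption is simulated by calling the worker between the two clear steps of the completion path. It assumes that an operation is counted once when requested and acknowledged only when it is pending and not yet acked.

```c
#include <stdbool.h>

/* Simplified user-space model of one stripe operation (hypothetical names). */
struct op_model {
	bool pending;	/* operation has been requested */
	bool ack;	/* operation has already been counted by the worker */
	int count;	/* requested but not yet acknowledged operations */
};

/* Request the operation: mark it pending and count it once. */
static void model_request(struct op_model *m)
{
	m->pending = true;
	m->count++;
}

/* Acknowledge the operation if it is pending and not yet acked. */
static void model_get_work(struct op_model *m)
{
	if (m->pending && !m->ack) {
		m->ack = true;
		m->count--;
	}
}

/*
 * Run request -> get_work -> completion, with a second get_work call
 * "preempting" the completion between its two clear steps.
 * Returns the final count; a negative value models the BUG_ON firing.
 */
int model_run(bool clear_pending_first)
{
	struct op_model m = { false, false, 0 };

	model_request(&m);
	model_get_work(&m);		/* count: 1 -> 0 */

	if (clear_pending_first) {
		m.pending = false;
		model_get_work(&m);	/* sees !pending, does nothing */
		m.ack = false;
	} else {
		m.ack = false;
		model_get_work(&m);	/* sees pending && !ack, counts again */
		m.pending = false;
	}
	return m.count;
}
```

In this model, model_run(false) (ack cleared first, as in the two raid5.c lines quoted above) returns -1, while model_run(true) (pending cleared first) returns 0. Whether the real failure follows exactly this interleaving is not established by the model.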