From: "BERTRAND Joël" <joel.bertrand@systella.fr>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-raid@vger.kernel.org, sparclinux@vger.kernel.org
Subject: Re: [BUG] Raid5 trouble
Date: Wed, 17 Oct 2007 18:44:41 +0200
Message-ID: <47163BF9.304@systella.fr>
In-Reply-To: <e9c3a7c20710170840u2ed8d6a9x26523eec6700ad11@mail.gmail.com>
Dan Williams wrote:
> On 10/17/07, Dan Williams <dan.j.williams@intel.com> wrote:
>> On 10/17/07, BERTRAND Joël <joel.bertrand@systella.fr> wrote:
>>> BERTRAND Joël wrote:
>>>> Hello,
>>>>
>>>> I run 2.6.23 linux kernel on two T1000 (sparc64) servers. Each
>>>> server has a partitionable raid5 array (/dev/md/d0) and I have to
>>>> synchronize both raid5 volumes by raid1. Thus, I have tried to build a
>>>> raid1 volume between /dev/md/d0p1 and /dev/sdi1 (exported by iscsi from
>>>> the second server) and I obtain a BUG :
>>>>
>>>> Root gershwin:[/usr/scripts] > mdadm -C /dev/md7 -l1 -n2 /dev/md/d0p1
>>>> /dev/sdi1
>>>> ...
>>> Hello,
>>>
>>> I have fixed iscsi-target, and I have tested it. It works now without
>>> any trouble. Patches were posted on iscsi-target mailing list. When I
>>> use iSCSI to access to foreign raid5 volume, it works fine. I can format
>>> foreign volume, copy large files on it... But when I tried to create a
>>> new raid1 volume with a local raid5 volume and a foreign raid5 volume, I
>>> receive my well known Oops. You can find my dmesg after Oops :
>>>
>> Can you send your .config and your bootup dmesg?
>>
>
> I found a problem which may lead to the operations count dropping
> below zero. If ops_complete_biofill() gets preempted in between the
> following calls:
>
> raid5.c:554> clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
> raid5.c:555> clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
>
> ...then get_stripe_work() can recount/re-acknowledge STRIPE_OP_BIOFILL
> causing the assertion. In fact, the 'pending' bit should always be
> cleared first, but the other cases are protected by
> spin_lock(&sh->lock). Patch attached.
Dan,

I have modified get_stripe_work() like this:

static unsigned long get_stripe_work(struct stripe_head *sh)
{
	unsigned long pending;
	int ack = 0;
	int a, b, c, d, e, f, g;

	pending = sh->ops.pending;

	test_and_ack_op(STRIPE_OP_BIOFILL, pending);
	a = ack;
	test_and_ack_op(STRIPE_OP_COMPUTE_BLK, pending);
	b = ack;
	test_and_ack_op(STRIPE_OP_PREXOR, pending);
	c = ack;
	test_and_ack_op(STRIPE_OP_BIODRAIN, pending);
	d = ack;
	test_and_ack_op(STRIPE_OP_POSTXOR, pending);
	e = ack;
	test_and_ack_op(STRIPE_OP_CHECK, pending);
	f = ack;
	if (test_and_clear_bit(STRIPE_OP_IO, &sh->ops.pending))
		ack++;
	g = ack;

	sh->ops.count -= ack;
	if (sh->ops.count < 0)
		printk("%d %d %d %d %d %d %d\n", a, b, c, d, e, f, g);
	BUG_ON(sh->ops.count < 0);

	return pending;
}

and I get the following on the console:

1 1 1 1 1 2
kernel BUG at drivers/md/raid5.c:390!
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
md7_resync(5409): Kernel bad sw trap 5 [#1]

I hope this helps.

JKB