From: "BERTRAND Joël" <joel.bertrand@systella.fr>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-raid@vger.kernel.org, sparclinux@vger.kernel.org
Subject: Re: [BUG] Raid5 trouble
Date: Wed, 17 Oct 2007 18:44:41 +0200
Message-ID: <47163BF9.304@systella.fr>
In-Reply-To: <e9c3a7c20710170840u2ed8d6a9x26523eec6700ad11@mail.gmail.com>

Dan Williams wrote:
> On 10/17/07, Dan Williams <dan.j.williams@intel.com> wrote:
>> On 10/17/07, BERTRAND Joël <joel.bertrand@systella.fr> wrote:
>>> BERTRAND Joël wrote:
>>>>     Hello,
>>>>
>>>>     I run the 2.6.23 Linux kernel on two T1000 (sparc64) servers. Each
>>>> server has a partitionable raid5 array (/dev/md/d0) and I have to
>>>> synchronize both raid5 volumes with raid1. Thus, I tried to build a
>>>> raid1 volume from /dev/md/d0p1 and /dev/sdi1 (exported over iSCSI from
>>>> the second server) and I get a BUG:
>>>>
>>>> Root gershwin:[/usr/scripts] > mdadm -C /dev/md7 -l1 -n2 /dev/md/d0p1
>>>> /dev/sdi1
>>>> ...
>>>         Hello,
>>>
>>>         I have fixed iscsi-target and tested it. It now works without
>>> any trouble. Patches were posted on the iscsi-target mailing list. When I
>>> use iSCSI to access a foreign raid5 volume, it works fine. I can format
>>> the foreign volume, copy large files to it... But when I try to create a
>>> new raid1 volume from a local raid5 volume and a foreign raid5 volume, I
>>> get my well-known Oops. Here is my dmesg after the Oops:
>>>
>> Can you send your .config and your bootup dmesg?
>>
> 
> I found a problem which may lead to the operations count dropping
> below zero.  If ops_complete_biofill() gets preempted in between the
> following calls:
> 
> raid5.c:554> clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
> raid5.c:555> clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
> 
> ...then get_stripe_work() can recount/re-acknowledge STRIPE_OP_BIOFILL
> causing the assertion.  In fact, the 'pending' bit should always be
> cleared first, but the other cases are protected by
> spin_lock(&sh->lock).  Patch attached.

	Dan,

	I have modified get_stripe_work() like this:

static unsigned long get_stripe_work(struct stripe_head *sh)
{
         unsigned long pending;
         int ack = 0;
         int a, b, c, d, e, f, g; /* debug: value of ack after each op */

         pending = sh->ops.pending;

         test_and_ack_op(STRIPE_OP_BIOFILL, pending);
         a = ack;
         test_and_ack_op(STRIPE_OP_COMPUTE_BLK, pending);
         b = ack;
         test_and_ack_op(STRIPE_OP_PREXOR, pending);
         c = ack;
         test_and_ack_op(STRIPE_OP_BIODRAIN, pending);
         d = ack;
         test_and_ack_op(STRIPE_OP_POSTXOR, pending);
         e = ack;
         test_and_ack_op(STRIPE_OP_CHECK, pending);
         f = ack;
         if (test_and_clear_bit(STRIPE_OP_IO, &sh->ops.pending))
                 ack++;
         g = ack;

         sh->ops.count -= ack;

         /* debug: dump the snapshots just before the assertion fires */
         if (sh->ops.count < 0)
                 printk("%d %d %d %d %d %d %d\n", a, b, c, d, e, f, g);
         BUG_ON(sh->ops.count < 0);

         return pending;
}

and I get this on the console:

  1 1 1 1 1 2
kernel BUG at drivers/md/raid5.c:390!
               \|/ ____ \|/
               "@'/ .. \`@"
               /_| \__/ |_\
                  \__U_/
md7_resync(5409): Kernel bad sw trap 5 [#1]

	I hope this helps.

	JKB