Re: raid5, 2 drives dead at same time,kernel will Oops?

From: "3tcdgwg3" <3tcdgwg3@prodigy.net>
To: 3tcdgwg3 <3tcdgwg3@prodigy.net>, Neil Brown <neilb@cse.unsw.edu.au>
Cc: linux-raid@vger.kernel.org
Subject: Re: raid5, 2 drives dead at same time,kernel will Oops?
Date: Fri, 30 May 2003 13:33:03 -0700
Message-ID: <020701c326ea$abfa0d80$7b07a8c0@pluto>
In-Reply-To: 00a901c31ffe$19ead340$7b07a8c0@pluto

Hi,

I have some other issues under this "more than 1
arm broken in a raid5 array" condition. The next
important one is this:

Suppose I have two arrays. The first is a raid5 with a resync
in progress; the second is another raid5 whose resync is scheduled
to start after the first raid5 array has synced.
At this point, if I kill two arms in the first raid5 array, its resync
stops but is never aborted. Consequently, the second raid5
array never gets a chance to start its resync.
Is there a fix for this?

Thanks

-W


----- Original Message -----
From: "3tcdgwg3" <3tcdgwg3@prodigy.net>
To: "Neil Brown" <neilb@cse.unsw.edu.au>
Cc: <linux-raid@vger.kernel.org>
Sent: Wednesday, May 21, 2003 6:04 PM
Subject: Re: raid5, 2 drives dead at same time,kernel will Oops?


> Neil,
> Preliminary testing looks good; I will test more
> when I have time.
>
> Thanks,
> -Will.
> ----- Original Message -----
> From: "Neil Brown" <neilb@cse.unsw.edu.au>
> To: "3tcdgwg3" <3tcdgwg3@prodigy.net>
> Cc: <linux-raid@vger.kernel.org>
> Sent: Tuesday, May 20, 2003 7:42 PM
> Subject: Re: raid5, 2 drives dead at same time,kernel will Oops?
>
>
> > On Monday May 19, 3tcdgwg3@prodigy.net wrote:
> > > Hi,
> > >
> > > I am trying to simulate the case where two drives
> > > in an array fail at the same time.
> > > I use two IDE drives and create a
> > > raid5 array with 4 arms, as follows:
> > >
> > > /dev/hdc1
> > > /dev/hde1
> > > /dev/hdc2
> > > /dev/hde2
> > >
> > > This is just a test; I know that creating two arms on
> > > one hard drive doesn't make much sense.
> > >
> > >
> > > Anyway, when I run this array and power off one
> > > of the hard drives (/dev/hde) to simulate two arms failing
> > > at the same time, I get a system Oops. I am using
> > > 2.4-18 kernel.
> > >
> > > Can anyone tell me if this is normal, or if there is a fix for this?
> > >
> >
> > Congratulations and thanks.  You have managed to trigger a bug that
> > no-one else has found.
> >
> > The following patch (against 2.4.20) should fix it.  If you can test
> > and confirm I would really appreciate it.
> >
> > NeilBrown
> >
> >
> > ------------------------------------------------------------
> > Handle concurrent failure of two drives in raid5
> >
> > If two drives both fail during a write request, raid5 doesn't
> > cope properly and will eventually oops.
> >
> > With this patch, blocks that have already been 'written'
> > are failed when double drive failure is noticed, as well as
> > blocks that are about to be written.
> >
> >  ----------- Diffstat output ------------
> >  ./drivers/md/raid5.c |   10 +++++++++-
> >  1 files changed, 9 insertions(+), 1 deletion(-)
> >
> > diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
> > --- ./drivers/md/raid5.c~current~ 2003-05-21 12:42:07.000000000 +1000
> > +++ ./drivers/md/raid5.c 2003-05-21 12:37:37.000000000 +1000
> > @@ -882,7 +882,7 @@ static void handle_stripe(struct stripe_
> >   /* check if the array has lost two devices and, if so, some requests might
> >   * need to be failed
> >   */
> > - if (failed > 1 && to_read+to_write) {
> > + if (failed > 1 && to_read+to_write+written) {
> >   for (i=disks; i--; ) {
> >   /* fail all writes first */
> >   if (sh->bh_write[i]) to_write--;
> > @@ -891,6 +891,14 @@ static void handle_stripe(struct stripe_
> >   bh->b_reqnext = return_fail;
> >   return_fail = bh;
> >   }
> > + /* and fail all 'written' */
> > + if (sh->bh_written[i]) written--;
> > + while ((bh = sh->bh_written[i])) {
> > + sh->bh_written[i] = bh->b_reqnext;
> > + bh->b_reqnext = return_fail;
> > + return_fail = bh;
> > + }
> > +
> >   /* fail any reads if this device is non-operational */
> >   if (!conf->disks[i].operational) {
> >   spin_lock_irq(&conf->device_lock);
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
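
For readers following the quoted patch, here is a minimal user-space sketch of the idea it describes: once more than one device has failed, both the not-yet-started writes and the already-'written' requests on every disk are moved onto a failure list. This is only a model under stated assumptions, not the kernel code. `struct buf`, `struct stripe`, `drain_chain` and `fail_stripe_requests` are hypothetical names made up for illustration; only the field names `b_reqnext`, `bh_write`, `bh_written` and the `return_fail` list are taken from the diff.

```c
#include <stddef.h>

#define NDISKS 4

/* Hypothetical, simplified stand-in for the kernel's struct buffer_head. */
struct buf {
    int id;
    struct buf *b_reqnext;   /* next request in the same per-disk chain */
};

/* Hypothetical, simplified stand-in for the kernel's struct stripe_head. */
struct stripe {
    struct buf *bh_write[NDISKS];    /* writes not yet started */
    struct buf *bh_written[NDISKS];  /* writes already issued ('written') */
};

/* Move every buffer on *chain to the front of *return_fail.
 * Returns the number of buffers moved. */
static int drain_chain(struct buf **chain, struct buf **return_fail)
{
    int moved = 0;
    struct buf *bh;

    while ((bh = *chain) != NULL) {
        *chain = bh->b_reqnext;        /* unlink from the per-disk chain */
        bh->b_reqnext = *return_fail;  /* push onto the failure list */
        *return_fail = bh;
        moved++;
    }
    return moved;
}

/* Model of the double-failure branch: with more than one failed device,
 * fail the pending writes AND the 'written' requests on every disk.
 * Returns the number of buffers failed (0 if at most one device failed). */
static int fail_stripe_requests(struct stripe *sh, int failed,
                                struct buf **return_fail)
{
    int i, n = 0;

    if (failed <= 1)
        return 0;
    for (i = NDISKS; i--; ) {
        n += drain_chain(&sh->bh_write[i], return_fail);
        n += drain_chain(&sh->bh_written[i], return_fail);
    }
    return n;
}
```

The sketch leaves out locking, the read path and the to_write/written counters that the real handle_stripe() maintains; it only shows why the 'written' chains have to be drained alongside the pending writes when two drives are gone.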

Thread overview: 8+ messages
2003-05-06 23:24 Upgrading Raid1 array to have 2. disks Anders Fugmann
2003-05-06 23:46 ` Mads Peter Bach
2003-05-07  0:00 ` Neil Brown
2003-05-08 19:11   ` Anders Fugmann
2003-05-20  0:33   ` raid5, 2 drives dead at same time,kernel will Oops? 3tcdgwg3
2003-05-21  2:42     ` Neil Brown
2003-05-22  1:04       ` 3tcdgwg3
2003-05-30 20:33         ` 3tcdgwg3 [this message]
