linux-raid.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.de>
To: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: linux-raid <linux-raid@vger.kernel.org>,
	tim.gardner@canonical.com, gregkh@suse.de
Subject: Re: RAID5: failing an active component during spare rebuild - arrays hangs
Date: Wed, 14 Dec 2011 22:32:40 +1100
Message-ID: <20111214223240.01045828@notabene.brown>
In-Reply-To: <CAGRgLy7fs5AUGk1a+Cp36VgqRvP23fYWx_GM9yViJM6YmmMdSw@mail.gmail.com>

On Wed, 14 Dec 2011 12:27:43 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:

> Hello Neil,
> we are looking at Ubuntu-oneiric kernel 3.0.0-14.23.
> We see that this fix was delivered to it by the following commit:
> ---------------------------------
> commit 5669de653e363cfaf2a2c7c48ea224a730f5a7a9
> Author: NeilBrown <neilb@suse.de>
> Date:   Wed Oct 26 10:31:04 2011 +1100
> 
>     md/raid5: fix bug that could result in reads from a failed device.
> 
>     BugLink: http://bugs.launchpad.net/bugs/890952
> 
>     commit 355840e7a7e56bb2834fd3b0da64da5465f8aeaa upstream.
> ------------------------------------
> However, looking at the diff, we see that only the handle_stripe6()
> function was fixed and not handle_stripe5(). That also explains why we
> saw this issue on oneiric with raid5. Here is the diff:
> ----------------------------------------------------------
> alex@ubuntu-alyakas-srv:/mnt/share/src/ubuntu-oneiric$ git diff ccfe5df60a583cbad36969344679903585e2eac7 5669de653e363cfaf2a2c7c48ea224a730f5a7a9
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 2581ba1..e509147 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -3369,7 +3369,7 @@ static void handle_stripe6(struct stripe_head *sh)
>                         /* Not in-sync */;
>                 else if (test_bit(In_sync, &rdev->flags))
>                         set_bit(R5_Insync, &dev->flags);
> -               else {
> +               else if (!test_bit(Faulty, &rdev->flags)) {
>                         /* in sync if before recovery_offset */
>                         if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
>                                 set_bit(R5_Insync, &dev->flags);
> -----------------------------------------------
> 
> Why was the raid5 fix not applied there? Should we apply the same fix
> for raid5 manually?
> Also copying the other two people who signed off on the commit.

Yes, I stuffed up when I back-ported the patch for -stable and missed the
RAID5 bit.  I've been meaning to send an update to stable but haven't yet.
Will do it in the morning - thanks for the reminder.

NeilBrown


> 
> Thanks,
>   Alex.
> 
> 
> On Tue, Dec 6, 2011 at 11:21 PM, NeilBrown <neilb@suse.de> wrote:
> > On Tue, 6 Dec 2011 23:07:53 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
> > wrote:
> >
> >> Thanks, Neil!!!
> >> Looks like this patch solves the issue. I applied it manually though,
> >> for some reason git refused to apply it.
> >>
> >> Thanks again for great help,
> >>   Alex.
> >
> > Great.  Thanks for the confirmation.
> >
> > NeilBrown
> >
> >
> >>
> >>
> >> On Tue, Dec 6, 2011 at 5:16 AM, NeilBrown <neilb@suse.de> wrote:
> >> > On Sun, 27 Nov 2011 11:56:17 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
> >> > wrote:
> >> >
> >> >> Hello Neil,
> >> >> we have compiled the natty kernel with dynamic debugging enabled for
> >> >> raid456, and reproduced the problem.
> >> >> The kernel log is available at
> >> >> https://docs.google.com/open?id=0B9rmyUifdvMLMzk1YjYwZDUtYzhhYi00MDRlLTkzYjItMDM0Y2ZhZmU3ZDRk
> >> >>
> >> >> Some more information:
> >> >> - array was created at Nov 27 11:28:03
> >> >> - manual drive failure was issued at 11:28:09
> >> >>
> >> >> Please let me know if you need any additional information.
> >> >>
> >> >
> >> > Hi,
> >> >  sorry for the long delay, I've had a lot of distractions this past week.
> >> >
> >> > It looks like you are hitting the bug fixed by upstream commit
> >> >    355840e7a7e56bb2834fd3b0da64da5465f8aeaa
> >> >
> >> > The symptoms are slightly different to those described in that commit but I'm
> >> > sure the root problem is the same.
> >> >
> >> > That patch doesn't apply to 2.6.38 though.
> >> > Use this one.
> >> >
> >> > NeilBrown
> >> >
> >> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >> > index 78536fd..8144126 100644
> >> > --- a/drivers/md/raid5.c
> >> > +++ b/drivers/md/raid5.c
> >> > @@ -3086,7 +3086,7 @@ static void handle_stripe5(struct stripe_head *sh)
> >> >                        /* Not in-sync */;
> >> >                else if (test_bit(In_sync, &rdev->flags))
> >> >                        set_bit(R5_Insync, &dev->flags);
> >> > -               else {
> >> > +               else if (!test_bit(Faulty, &rdev->flags)) {
> >> >                        /* could be in-sync depending on recovery/reshape status */
> >> >                        if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
> >> >                                set_bit(R5_Insync, &dev->flags);
> >> > @@ -3377,7 +3377,7 @@ static void handle_stripe6(struct stripe_head *sh)
> >> >                        /* Not in-sync */;
> >> >                else if (test_bit(In_sync, &rdev->flags))
> >> >                        set_bit(R5_Insync, &dev->flags);
> >> > -               else {
> >> > +               else if (!test_bit(Faulty, &rdev->flags)) {
> >> >                        /* in sync if before recovery_offset */
> >> >                        if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
> >> >                                set_bit(R5_Insync, &dev->flags);
> >
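
For readers following the thread, the effect of the one-line change in both
hunks can be sketched as a small standalone C model. This is not kernel code
and is only a simplified illustration: struct model_rdev, the two helper
functions, and the MODEL_STRIPE_SECTORS value of 8 are assumptions made here,
and the NULL check stands in for the branch above the "Not in-sync" comment,
which the diff context does not show.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 8 sectors per stripe unit (one 4 KiB page of 512-byte sectors). */
#define MODEL_STRIPE_SECTORS 8ULL

/* Hypothetical stand-in for the per-device state the patch inspects. */
struct model_rdev {
	bool in_sync;             /* models the In_sync flag */
	bool faulty;              /* models the Faulty flag */
	uint64_t recovery_offset; /* sectors below this offset are rebuilt */
};

/* Pre-patch logic: every device that is not In_sync reaches the plain
 * "else" branch, so only recovery_offset is consulted. */
static bool insync_before_fix(const struct model_rdev *rdev, uint64_t sector)
{
	if (rdev == NULL)
		return false; /* Not in-sync */
	if (rdev->in_sync)
		return true;
	return sector + MODEL_STRIPE_SECTORS <= rdev->recovery_offset;
}

/* Post-patch logic: recovery_offset is consulted only when the device
 * is not flagged Faulty. */
static bool insync_after_fix(const struct model_rdev *rdev, uint64_t sector)
{
	if (rdev == NULL)
		return false; /* Not in-sync */
	if (rdev->in_sync)
		return true;
	if (!rdev->faulty)
		return sector + MODEL_STRIPE_SECTORS <= rdev->recovery_offset;
	return false;
}
```

In this model, a device that is flagged Faulty but whose recovery_offset still
lies beyond the stripe is treated as in-sync by the old logic and as not
in-sync by the new one, while a healthy rebuilding spare is handled
identically by both.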


