From: NeilBrown <neilb@suse.de>
To: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: linux-raid <linux-raid@vger.kernel.org>,
tim.gardner@canonical.com, gregkh@suse.de
Subject: Re: RAID5: failing an active component during spare rebuild - arrays hangs
Date: Wed, 14 Dec 2011 22:32:40 +1100
Message-ID: <20111214223240.01045828@notabene.brown>
In-Reply-To: <CAGRgLy7fs5AUGk1a+Cp36VgqRvP23fYWx_GM9yViJM6YmmMdSw@mail.gmail.com>
On Wed, 14 Dec 2011 12:27:43 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:
> Hello Neil,
> we are looking at Ubuntu-oneiric kernel 3.0.0-14.23.
> We see that this fix was delivered to it by the following commit:
> ---------------------------------
> commit 5669de653e363cfaf2a2c7c48ea224a730f5a7a9
> Author: NeilBrown <neilb@suse.de>
> Date: Wed Oct 26 10:31:04 2011 +1100
>
> md/raid5: fix bug that could result in reads from a failed device.
>
> BugLink: http://bugs.launchpad.net/bugs/890952
>
> commit 355840e7a7e56bb2834fd3b0da64da5465f8aeaa upstream.
> ------------------------------------
> However, when looking at the diff, we see that only the handle_stripe6()
> function was fixed and not handle_stripe5(). That also explains why we
> saw this issue on oneiric with raid5. Here is the diff:
> ----------------------------------------------------------
> alex@ubuntu-alyakas-srv:/mnt/share/src/ubuntu-oneiric$ git diff
> ccfe5df60a583cbad36969344679903585e2eac7
> 5669de653e363cfaf2a2c7c48ea224a730f5a7a9
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 2581ba1..e509147 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -3369,7 +3369,7 @@ static void handle_stripe6(struct stripe_head *sh)
>  			/* Not in-sync */;
>  		else if (test_bit(In_sync, &rdev->flags))
>  			set_bit(R5_Insync, &dev->flags);
> -		else {
> +		else if (!test_bit(Faulty, &rdev->flags)) {
>  			/* in sync if before recovery_offset */
>  			if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
>  				set_bit(R5_Insync, &dev->flags);
> -----------------------------------------------
>
> Why was the fix for raid5 not applied there? Should we apply the same
> fix for raid5 manually as well?
> I am also copying the other two people who signed off on the commit.
Yes, I stuffed up when I back-ported the patch for -stable and missed the
RAID5 bit. I've been meaning to send an update to stable but haven't yet.
Will do it in the morning - thanks for the reminder.
NeilBrown
>
> Thanks,
> Alex.
>
>
> On Tue, Dec 6, 2011 at 11:21 PM, NeilBrown <neilb@suse.de> wrote:
> > On Tue, 6 Dec 2011 23:07:53 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
> > wrote:
> >
> >> Thanks, Neil!!!
> >> Looks like this patch solves the issue. I applied it manually, though;
> >> for some reason git refused to apply it.
> >>
> >> Thanks again for great help,
> >> Alex.
> >
> > Great. Thanks for the confirmation.
> >
> > NeilBrown
> >
> >
> >>
> >>
> >> On Tue, Dec 6, 2011 at 5:16 AM, NeilBrown <neilb@suse.de> wrote:
> >> > On Sun, 27 Nov 2011 11:56:17 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
> >> > wrote:
> >> >
> >> >> Hello Neil,
> >> >> we have compiled the natty kernel with dynamic debugging enabled for
> >> >> raid456, and reproduced the problem.
> >> >> The kernel log is available at
> >> >> https://docs.google.com/open?id=0B9rmyUifdvMLMzk1YjYwZDUtYzhhYi00MDRlLTkzYjItMDM0Y2ZhZmU3ZDRk
> >> >>
> >> >> Some more information:
> >> >> - array was created at Nov 27 11:28:03
> >> >> - manual drive failure was issued at 11:28:09
> >> >>
> >> >> Please let me know if you need any additional information.
> >> >>
> >> >
> >> > Hi,
> >> > sorry for the long delay, I've had a lot of distractions this past week.
> >> >
> >> > It looks like you are hitting the bug fixed by upstream commit
> >> > 355840e7a7e56bb2834fd3b0da64da5465f8aeaa
> >> >
> >> > The symptoms are slightly different to those described in that commit but I'm
> >> > sure the root problem is the same.
> >> >
> >> > That patch doesn't apply to 2.6.38 though.
> >> > Use this one.
> >> >
> >> > NeilBrown
> >> >
> >> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >> > index 78536fd..8144126 100644
> >> > --- a/drivers/md/raid5.c
> >> > +++ b/drivers/md/raid5.c
> >> > @@ -3086,7 +3086,7 @@ static void handle_stripe5(struct stripe_head *sh)
> >> >  			/* Not in-sync */;
> >> >  		else if (test_bit(In_sync, &rdev->flags))
> >> >  			set_bit(R5_Insync, &dev->flags);
> >> > -		else {
> >> > +		else if (!test_bit(Faulty, &rdev->flags)) {
> >> >  			/* could be in-sync depending on recovery/reshape status */
> >> >  			if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
> >> >  				set_bit(R5_Insync, &dev->flags);
> >> > @@ -3377,7 +3377,7 @@ static void handle_stripe6(struct stripe_head *sh)
> >> >  			/* Not in-sync */;
> >> >  		else if (test_bit(In_sync, &rdev->flags))
> >> >  			set_bit(R5_Insync, &dev->flags);
> >> > -		else {
> >> > +		else if (!test_bit(Faulty, &rdev->flags)) {
> >> >  			/* in sync if before recovery_offset */
> >> >  			if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
> >> >  				set_bit(R5_Insync, &dev->flags);
> >
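
For anyone reading the thread later, the effect of the one-line change in
both handle_stripe5() and handle_stripe6() can be sketched as a small
user-space model. This is only an illustration, not kernel code: the struct
model_rdev type, the MODEL_STRIPE_SECTORS value and both function names are
hypothetical stand-ins for the rdev flags and fields seen in the patch. It
also assumes that a device which was fully in sync before being failed
still carries a very large recovery_offset, which is what would let the old
"else" branch treat it as readable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the few rdev fields the patch looks at. */
struct model_rdev {
	bool in_sync;             /* models test_bit(In_sync, &rdev->flags) */
	bool faulty;              /* models test_bit(Faulty, &rdev->flags) */
	uint64_t recovery_offset; /* sectors below this are already rebuilt */
};

/* Assumed stripe size in sectors, for the model only. */
#define MODEL_STRIPE_SECTORS 8u

/* Decision as it was before the fix: Faulty is never consulted. */
static bool insync_before_fix(const struct model_rdev *rdev, uint64_t sector)
{
	if (!rdev)
		return false;           /* Not in-sync */
	if (rdev->in_sync)
		return true;
	/* in sync if before recovery_offset */
	return sector + MODEL_STRIPE_SECTORS <= rdev->recovery_offset;
}

/* Decision with the fix: a Faulty device is never reported as in sync. */
static bool insync_after_fix(const struct model_rdev *rdev, uint64_t sector)
{
	if (!rdev)
		return false;           /* Not in-sync */
	if (rdev->in_sync)
		return true;
	if (rdev->faulty)
		return false;           /* the added !test_bit(Faulty, ...) check */
	/* in sync if before recovery_offset */
	return sector + MODEL_STRIPE_SECTORS <= rdev->recovery_offset;
}
```

Under those assumptions, a device with in_sync cleared, faulty set and a
large recovery_offset is reported as in sync by the first function and as
not in sync by the second, while a recovering spare that is not faulty is
handled identically by both.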