From: NeilBrown <neilb@suse.de>
To: Jes Sorensen <Jes.Sorensen@redhat.com>
Cc: linux-raid@vger.kernel.org, Shaohua Li <shli@kernel.org>,
Eryu Guan <eguan@redhat.com>
Subject: Re: raid5 lockups post ca64cae96037de16e4af92678814f5d4bf0c1c65
Date: Tue, 12 Mar 2013 09:32:31 +1100
Message-ID: <20130312093231.72c54735@notabene.brown>
In-Reply-To: <wrfj4ngog9h0.fsf@redhat.com>
On Wed, 06 Mar 2013 10:31:55 +0100 Jes Sorensen <Jes.Sorensen@redhat.com>
wrote:
> NeilBrown <neilb@suse.de> writes:
> > On Tue, 05 Mar 2013 09:44:54 +0100 Jes Sorensen <Jes.Sorensen@redhat.com>
> > wrote:
> >> > Does this fix it?
> >> >
> >> > NeilBrown
> >>
> >> Unfortunately no, I still see these crashes with this one applied :(
> >>
> >
> > Thanks - the symptom looked similar, but now that I look more closely I can
> > see it is quite different.
> >
> > How about this then? I can't really see what is happening, but based on the
> > patch that you identified it must be related to these flags.
> > It seems that handle_stripe_clean_event() is being called too early, and it
> > doesn't clear out the ->written bios because they are still locked or
> > something. But it does clear R5_Discard on the parity block, so
> > handle_stripe_clean_event doesn't get called again.
> >
> > This makes the handling of the various flags somewhat more uniform, which is
> > probably a good thing.
>
> Hi Neil,
>
> With this one applied I end up with an OOPS instead. Note I had to
> modify the last test/clear bit sequence to use &sh->dev[i].flags instead
> of &dev->flags to avoid a compiler warning.
Oops.
>
> I am attaching the test script I am running too. It was written by Eryu
> Guan.

Thanks for that. I've tried using it but haven't managed to trigger a BUG
yet. What size are the loop files? I mostly use fairly small ones, but
maybe they need to be bigger to trigger the problem.

My current guess is that recovery and discard are both called on the stripe
at the same time and they race and leave the stripe in a slightly confused
state. But I haven't found the exact state yet.

The discard code always attaches a 'discard' request to every device in a
stripe_head all at once, under stripe_lock. However when ops_bio_drain picks
those discard requests off ->towrite and puts them on ->written, it takes
stripe_lock once per device.

Maybe we just need to change the coverage of stripe_lock there so that it is
held for the entire loop. We would still want to drop it before calling
async_copy_data() and reclaim it afterwards, but that wouldn't affect the
'discard' case.
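
To make the two locking patterns concrete, here is a rough userspace model
(a sketch only, not the raid5.c code). A pthread mutex stands in for
stripe_lock, and stripe_model, dev_model, copy_data_model() and the other
names are made up for illustration. drain_per_device() mirrors the
once-per-device locking described above; drain_whole_loop() holds the lock
across the loop and drops it only around the data copy, which a discard
never reaches.

```c
/* Userspace model only: a pthread mutex stands in for stripe_lock and
 * every type and function name here is hypothetical, not kernel API. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NDEV 4

struct bio { int is_discard; };            /* stand-in for a write request */

struct dev_model {
    struct bio *towrite;                   /* queued, not yet drained */
    struct bio *written;                   /* drained, awaiting completion */
};

struct stripe_model {
    pthread_mutex_t stripe_lock;
    struct dev_model dev[NDEV];
};

void stripe_init(struct stripe_model *sh)
{
    pthread_mutex_init(&sh->stripe_lock, NULL);
    for (int i = 0; i < NDEV; i++)
        sh->dev[i].towrite = sh->dev[i].written = NULL;
}

/* Discard is attached to every device in one critical section. */
void attach_discard(struct stripe_model *sh, struct bio *d)
{
    pthread_mutex_lock(&sh->stripe_lock);
    for (int i = 0; i < NDEV; i++)
        sh->dev[i].towrite = d;
    pthread_mutex_unlock(&sh->stripe_lock);
}

/* Stand-in for async_copy_data(); called without stripe_lock held. */
void copy_data_model(struct bio *b) { (void)b; }

/* Lock taken once per device: between iterations another thread can see
 * some devices with the discard on ->written and others on ->towrite. */
void drain_per_device(struct stripe_model *sh)
{
    for (int i = 0; i < NDEV; i++) {
        pthread_mutex_lock(&sh->stripe_lock);
        struct bio *b = sh->dev[i].towrite;
        if (b) {
            assert(sh->dev[i].written == NULL);
            sh->dev[i].towrite = NULL;
            sh->dev[i].written = b;
        }
        pthread_mutex_unlock(&sh->stripe_lock);
        if (b && !b->is_discard)
            copy_data_model(b);
    }
}

/* Lock held for the whole loop and dropped only around the copy, so a
 * discard moves on all devices within a single critical section. */
void drain_whole_loop(struct stripe_model *sh)
{
    pthread_mutex_lock(&sh->stripe_lock);
    for (int i = 0; i < NDEV; i++) {
        struct bio *b = sh->dev[i].towrite;
        if (!b)
            continue;
        assert(sh->dev[i].written == NULL);
        sh->dev[i].towrite = NULL;
        sh->dev[i].written = b;
        if (!b->is_discard) {
            pthread_mutex_unlock(&sh->stripe_lock);
            copy_data_model(b);
            pthread_mutex_lock(&sh->stripe_lock);
        }
    }
    pthread_mutex_unlock(&sh->stripe_lock);
}

/* Helper for checking the model: devices whose request reached ->written. */
int count_written(const struct stripe_model *sh)
{
    int n = 0;
    for (int i = 0; i < NDEV; i++)
        n += (sh->dev[i].written != NULL);
    return n;
}
```

Run single-threaded, both variants end in the same state; the difference is
only in what a concurrent holder of the lock could observe part-way through.
Whether that window is what actually confuses the stripe is still a guess.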
>
>
> [ 2623.554780] kernel BUG at drivers/md/raid5.c:2954!
Could you confirm exactly which line this was - there are a few BUG_ON()s
around there. They are all related to R5_UPTODATE not being set I think,
but it might help to know exactly when it isn't set.
Thanks,
NeilBrown