From: Neil Brown <neilb@suse.de>
To: Justin Bronder <jsbronder@gentoo.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: Raid10 device hangs during resync and heavy I/O.
Date: Sat, 7 Aug 2010 21:22:48 +1000
Message-ID: <20100807212248.1bda536b@notabene>
In-Reply-To: <20100802203754.GA10647@gmail.com>

On Mon, 2 Aug 2010 16:37:54 -0400
Justin Bronder <jsbronder@gentoo.org> wrote:
> On 02/08/10 12:58 +1000, Neil Brown wrote:
> > On Mon, 2 Aug 2010 12:29:49 +1000
> > Neil Brown <neilb@suse.de> wrote:
> >
> >
> > > Ahhhh.... I see the problem. Because a 'generic_make_request' is already
> > > active, the one called by raid10::make_request just queues the request until
> > > the top level one completes. This results in a deadlock.
> > >
> > > I'll have to ponder a bit to figure out the best way to fix this.
> > >
> >
> > So, one good strong cup of tea later I think I have a good solution.
> >
> > Would you be able to test with this patch and confirm that you cannot
> > reproduce the hang?
>
> I've been running with this patch on 2.6.34.1 all day and have yet to cause
> the hang. Given it took under 5 minutes earlier, feel free to add:
>
> Tested-by: Justin Bronder <jsbronder@gentoo.org>
>
> I really appreciate you taking care of this. Thanks.

And thank you for testing. I've queued this up now and will send it to Linus
and -stable shortly.

NeilBrown

>
> > Thanks.
> >
> > NeilBrown
> >
> > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > index 42e64e4..d1d6891 100644
> > --- a/drivers/md/raid10.c
> > +++ b/drivers/md/raid10.c
> > @@ -825,11 +825,29 @@ static int make_request(mddev_t *mddev, struct bio * bio)
> > */
> > bp = bio_split(bio,
> > chunk_sects - (bio->bi_sector & (chunk_sects - 1)) );
> > +
> > + /* Each of these 'make_request' calls will call 'wait_barrier'.
> > + * If the first succeeds but the second blocks due to the resync
> > + * thread raising the barrier, we will deadlock because the
> > + * IO to the underlying device will be queued in generic_make_request
> > + * and will never complete, so will never reduce nr_pending.
> > + * So increment nr_waiting here so no new raise_barriers will
> > + * succeed, and so the second wait_barrier cannot block.
> > + */
> > + spin_lock_irq(&conf->resync_lock);
> > + conf->nr_waiting++;
> > + spin_unlock_irq(&conf->resync_lock);
> > +
> > if (make_request(mddev, &bp->bio1))
> > generic_make_request(&bp->bio1);
> > if (make_request(mddev, &bp->bio2))
> > generic_make_request(&bp->bio2);
> >
> > + spin_lock_irq(&conf->resync_lock);
> > + conf->nr_waiting--;
> > + wake_up(&conf->wait_barrier);
> > + spin_unlock_irq(&conf->resync_lock);
> > +
> > bio_pair_release(bp);
> > return 0;
> > bad_map:
> >
>