From: Mike Snitzer <snitzer@redhat.com>
To: Ming Lei <tom.leiming@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Jeff Moyer <jmoyer@redhat.com>,
dm-devel@redhat.com, Mikulas Patocka <mpatocka@redhat.com>,
Kent Overstreet <kent.overstreet@gmail.com>,
"Alasdair G. Kergon" <agk@redhat.com>
Subject: Re: [PATCH v3 for-4.4] block: flush queued bios when process blocks to avoid deadlock
Date: Tue, 20 Oct 2015 15:57:07 -0400 [thread overview]
Message-ID: <20151020195707.GB31689@redhat.com> (raw)
In-Reply-To: <CACVXFVP7GwgWUHc4sHZwShwmdzFT5ZrHRcEH4zd5A8HommmCJg@mail.gmail.com>
On Sat, Oct 17 2015 at 12:04pm -0400,
Ming Lei <tom.leiming@gmail.com> wrote:
> On Thu, Oct 15, 2015 at 4:47 AM, Mike Snitzer <snitzer@redhat.com> wrote:
> > From: Mikulas Patocka <mpatocka@redhat.com>
> >
> > The block layer uses per-process bio list to avoid recursion in
> > generic_make_request. When generic_make_request is called recursively,
> > the bio is added to current->bio_list and generic_make_request returns
> > immediately. The top-level instance of generic_make_request takes bios
> > from current->bio_list and processes them.
> >
> > Commit df2cb6daa4 ("block: Avoid deadlocks with bio allocation by
> > stacking drivers") created a workqueue for every bio set and code
> > in bio_alloc_bioset() that tries to resolve some low-memory deadlocks by
> > redirecting bios queued on current->bio_list to the workqueue if the
> > system is low on memory. However another deadlock (see below **) may
> > happen, without any low memory condition, because generic_make_request
> > is queuing bios to current->bio_list (rather than submitting them).
> >
> > Fix this deadlock by redirecting any bios on current->bio_list to the
> > bio_set's rescue workqueue on every schedule call. Consequently, when
> > the process blocks on a mutex, the bios queued on current->bio_list are
> > dispatched to independent workqueus and they can complete without
> > waiting for the mutex to be available.
>
> It isn't common to acquire mutex/semaphone inside .make_request()
> or .request_fn(), so I am wondering it is good to reuse the rescuing
> workqueue for this unusual case.
Which specific locking are you concerned about?
> Also sometimes it can hurt performance by converting I/O submission
> from one context into concurrent contexts of workqueue, especially
> in case of sequential I/O, since plug & plug merge can't be used any
> more.
True, plug and plug merge wouldn't be usable but this recursive call to
generic_make_request isn't expected to be common.
This patch was to fix a relatively obscure bio spliting scenario that
fell out of the complexity of dm-snapshot.
> > diff --git a/block/bio.c b/block/bio.c
> > index ad3f276..99f5a2ad 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -354,35 +354,35 @@ static void bio_alloc_rescue(struct work_struct *work)
> > }
> > }
> >
> > -static void punt_bios_to_rescuer(struct bio_set *bs)
> > +/**
> > + * blk_flush_bio_list
> > + * @tsk: task_struct whose bio_list must be flushed
> > + *
> > + * Pop bios queued on @tsk->bio_list and submit each of them to
> > + * their rescue workqueue.
> > + *
> > + * If the bio doesn't have a bio_set, we leave it on @tsk->bio_list.
> > + * However, stacking drivers should use bio_set, so this shouldn't be
> > + * an issue.
> > + */
> > +void blk_flush_bio_list(struct task_struct *tsk)
...
> > + while ((bio = bio_list_pop(&list))) {
> > + struct bio_set *bs = bio->bi_pool;
> > + if (unlikely(!bs)) {
> > + bio_list_add(tsk->bio_list, bio);
> > + continue;
> > + }
> >
> > - queue_work(bs->rescue_workqueue, &bs->rescue_work);
> > + spin_lock(&bs->rescue_lock);
> > + bio_list_add(&bs->rescue_list, bio);
> > + queue_work(bs->rescue_workqueue, &bs->rescue_work);
> > + spin_unlock(&bs->rescue_lock);
> > + }
>
> Not like rescuring path, schedule out can be quite frequent, and the
> above change will switch to submit these I/Os from wq concurrently,
> which might hurt performance for sequential I/O.
>
> Also I am wondering why not submit these I/Os in 'current' context
> just like what flush plug does?
Flush plug during schedule makes use of kblockd so I'm not sure what
you're referring to here.
> > }
> >
> > /**
> > @@ -422,7 +422,6 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
> > */
> > struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
> > {
> > - gfp_t saved_gfp = gfp_mask;
> > unsigned front_pad;
> > unsigned inline_vecs;
> > unsigned long idx = BIO_POOL_NONE;
> > @@ -457,23 +456,11 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
> > * reserve.
> > *
> > * We solve this, and guarantee forward progress, with a rescuer
> > - * workqueue per bio_set. If we go to allocate and there are
> > - * bios on current->bio_list, we first try the allocation
> > - * without __GFP_WAIT; if that fails, we punt those bios we
> > - * would be blocking to the rescuer workqueue before we retry
> > - * with the original gfp_flags.
> > + * workqueue per bio_set. If an allocation would block (due to
> > + * __GFP_WAIT) the scheduler will first punt all bios on
> > + * current->bio_list to the rescuer workqueue.
> > */
> > -
> > - if (current->bio_list && !bio_list_empty(current->bio_list))
> > - gfp_mask &= ~__GFP_WAIT;
> > -
> > p = mempool_alloc(bs->bio_pool, gfp_mask);
> > - if (!p && gfp_mask != saved_gfp) {
> > - punt_bios_to_rescuer(bs);
> > - gfp_mask = saved_gfp;
> > - p = mempool_alloc(bs->bio_pool, gfp_mask);
> > - }
> > -
> > front_pad = bs->front_pad;
> > inline_vecs = BIO_INLINE_VECS;
> > }
> > @@ -486,12 +473,6 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
> >
> > if (nr_iovecs > inline_vecs) {
> > bvl = bvec_alloc(gfp_mask, nr_iovecs, &idx, bs->bvec_pool);
> > - if (!bvl && gfp_mask != saved_gfp) {
> > - punt_bios_to_rescuer(bs);
> > - gfp_mask = saved_gfp;
> > - bvl = bvec_alloc(gfp_mask, nr_iovecs, &idx, bs->bvec_pool);
> > - }
> > -
>
> Looks you touched rescuing path for bio allocation, and better to just
> do one thing in one patch.
Yes, good point, I've split the patches, you can see the result in the 2
topmost commits in this branch:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip
Definitely helped clean up the patches and make them more
approachable/reviewable. Thanks for the suggestion.
I'll hold off on sending out v4 of these patches until I can better
undersatnd your concerns about using the rescue workqueue during
schedule.
Mike
next prev parent reply other threads:[~2015-10-20 19:57 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-05-27 15:03 [PATCH] block: flush queued bios when the process blocks Mikulas Patocka
2014-05-27 15:08 ` Jens Axboe
2014-05-27 15:23 ` Mikulas Patocka
2014-05-27 15:42 ` Jens Axboe
2014-05-27 16:26 ` Mikulas Patocka
2014-05-27 17:33 ` Mike Snitzer
2014-05-27 19:56 ` Kent Overstreet
2015-10-05 19:50 ` Mike Snitzer
2014-05-27 17:42 ` [PATCH] " Jens Axboe
2014-05-27 18:14 ` [dm-devel] " Christoph Hellwig
2014-05-27 19:59 ` Kent Overstreet
2014-05-27 19:56 ` Mikulas Patocka
2014-05-27 20:06 ` Kent Overstreet
2014-05-29 23:52 ` Mikulas Patocka
2015-10-05 20:59 ` Mike Snitzer
2015-10-06 13:28 ` Mikulas Patocka
2015-10-06 13:47 ` Mike Snitzer
2015-10-06 14:10 ` Mikulas Patocka
2015-10-06 14:26 ` Mikulas Patocka
2015-10-06 18:17 ` [dm-devel] " Mikulas Patocka
2015-10-06 18:50 ` Mike Snitzer
2015-10-06 20:16 ` [PATCH v2] " Mike Snitzer
2015-10-06 20:26 ` Mike Snitzer
2015-10-08 15:04 ` Mikulas Patocka
2015-10-08 15:08 ` Mike Snitzer
2015-10-09 19:52 ` Mike Snitzer
2015-10-09 19:59 ` Mike Snitzer
2015-10-14 20:47 ` [PATCH v3 for-4.4] block: flush queued bios when process blocks to avoid deadlock Mike Snitzer
2015-10-14 21:44 ` Jeff Moyer
2015-10-17 16:04 ` Ming Lei
2015-10-20 19:57 ` Mike Snitzer [this message]
2015-10-20 20:03 ` Mikulas Patocka
2015-10-21 16:38 ` Ming Lei
2015-10-21 21:49 ` Mikulas Patocka
2015-10-22 1:53 ` Ming Lei
2015-10-15 3:27 ` [PATCH v2] block: flush queued bios when the process blocks Ming Lei
2015-10-15 8:06 ` Mike Snitzer
2015-10-16 3:08 ` Ming Lei
2015-10-16 15:29 ` Mike Snitzer
2015-10-17 15:54 ` Ming Lei
2015-10-17 15:54 ` Ming Lei
2015-10-09 11:58 ` kbuild test robot
2014-05-27 17:59 ` [PATCH] " Kent Overstreet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20151020195707.GB31689@redhat.com \
--to=snitzer@redhat.com \
--cc=agk@redhat.com \
--cc=axboe@kernel.dk \
--cc=dm-devel@redhat.com \
--cc=jmoyer@redhat.com \
--cc=kent.overstreet@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mpatocka@redhat.com \
--cc=tom.leiming@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.