From: Tejun Heo <tj@kernel.org>
To: Kent Overstreet <koverstreet@google.com>
Cc: linux-bcache@vger.kernel.org, linux-kernel@vger.kernel.org,
dm-devel@redhat.com, vgoyal@redhat.com, mpatocka@redhat.com,
bharrosh@panasas.com, Jens Axboe <axboe@kernel.dk>
Subject: Re: [PATCH v6 07/13] block: Avoid deadlocks with bio allocation by stacking drivers
Date: Fri, 24 Aug 2012 13:28:55 -0700
Message-ID: <20120824202855.GE21325@google.com>
In-Reply-To: <20120824055554.GC11977@moria.home.lan>
Hello,
On Thu, Aug 23, 2012 at 10:55:54PM -0700, Kent Overstreet wrote:
> > Why aren't we turning off __GFP_WAIT instead? e.g. What if the caller
> > has one of NUMA flags or __GFP_COLD specified?
>
> Didn't think of that. The reason I did it that way is I wasn't sure if
> just doing &= ~__GFP_WAIT would be correct, since that would leave
> __GFP_IO|__GFP_FS set.
Using the appropriate __GFP_IO/FS flags is the caller's
responsibility. The only thing bioset needs to worry about and act on
here is __GFP_WAIT causing an indefinite wait in mempool.
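
To illustrate the point about masking, here is a minimal user-space
sketch. The DEMO_* flag values and the demo_mask_wait() helper are
hypothetical stand-ins, not the kernel's definitions; the only thing
taken from the discussion is the `&= ~__GFP_WAIT` idea, which clears the
wait bit and leaves every other caller-supplied flag alone.

```c
/* Hypothetical flag values for illustration only; the real kernel
 * definitions live in the kernel headers and differ from these. */
typedef unsigned int gfp_t;

#define DEMO_GFP_WAIT 0x01u /* stands in for __GFP_WAIT */
#define DEMO_GFP_IO   0x02u /* stands in for __GFP_IO */
#define DEMO_GFP_FS   0x04u /* stands in for __GFP_FS */
#define DEMO_GFP_COLD 0x08u /* stands in for __GFP_COLD */

/* Clear only the wait bit. Replacing the whole mask with a fixed
 * "no wait" value would instead drop flags such as DEMO_GFP_COLD. */
static gfp_t demo_mask_wait(gfp_t gfp_mask)
{
	return gfp_mask & ~DEMO_GFP_WAIT;
}
```

With this approach a caller that passed DEMO_GFP_WAIT | DEMO_GFP_IO |
DEMO_GFP_COLD gets back DEMO_GFP_IO | DEMO_GFP_COLD, so placement hints
and the caller's IO/FS choices are preserved.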
> > Please don't mix struct definition relocation (or any relocation
> > really) with actual changes. It buries the actual changes and makes
> > reviewing difficult.
>
> Make a new patch that does nothing more than reorder the definitions,
> then?
Yeap, with justification.
> block: Avoid deadlocks with bio allocation by stacking drivers
>
> Previously, if we ever try to allocate more than once from the same bio
> set while running under generic_make_request() (i.e. a stacking block
> driver), we risk deadlock.
>
> This is because of the code in generic_make_request() that converts
> recursion to iteration; any bios we submit won't actually be submitted
> (so they can complete and eventually be freed) until after we return -
> this means if we allocate a second bio, we're blocking the first one
> from ever being freed.
>
> Thus if enough threads call into a stacking block driver at the same
> time with bios that need multiple splits, and the bio_set's reserve gets
> used up, we deadlock.
>
> This can be worked around in the driver code - we could check if we're
> running under generic_make_request(), then mask out __GFP_WAIT when we
> go to allocate a bio, and if the allocation fails punt to workqueue and
> retry the allocation.
>
> But this is tricky and not a generic solution. This patch solves it for
> all users by inverting the previously described technique. We allocate a
> rescuer workqueue for each bio_set, and then in the allocation code if
> there are bios on current->bio_list we would be blocking, we punt them
> to the rescuer workqueue to be submitted.
>
> Tested it by forcing the rescue codepath to be taken (by disabling the
> first GFP_NOWAIT attempt), and then ran it with bcache (which does a lot
> of arbitrary bio splitting) and verified that the rescuer was being
> invoked.
Yeah, the description looks good to me.
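
The recursion-to-iteration behaviour described in the quoted text can be
sketched in user space as follows. This is a toy model under stated
assumptions, not the kernel's generic_make_request(): demo_submit(),
demo_driver(), and the fixed-size demo_current list are all hypothetical
names standing in for the submit path, a stacking driver, and
current->bio_list.

```c
#define DEMO_MAX 8

/* Hypothetical stand-in for current->bio_list: a small FIFO of bio ids. */
struct demo_list {
	int ids[DEMO_MAX];
	int head, tail;
	int active; /* nonzero while a submit loop is already running */
};

static struct demo_list demo_current;
static int demo_order[DEMO_MAX]; /* order in which bios reach the driver */
static int demo_count;           /* number of bios the driver has handled */
static int demo_seen_in_driver;  /* demo_count when bio 1's handler returns */

static void demo_submit(int id);

/* Toy stacking driver: bio 1 is split into bios 2 and 3. */
static void demo_driver(int id)
{
	demo_order[demo_count++] = id;
	if (id == 1) {
		demo_submit(2); /* only queued, not handled yet */
		demo_submit(3); /* only queued, not handled yet */
		demo_seen_in_driver = demo_count;
	}
}

static void demo_submit(int id)
{
	if (demo_current.active) {
		/* Nested call: queue the bio and return immediately. */
		demo_current.ids[demo_current.tail++] = id;
		return;
	}
	demo_current.active = 1;
	demo_current.head = demo_current.tail = 0;
	demo_current.ids[demo_current.tail++] = id;
	/* Drain the queue iteratively instead of recursing. */
	while (demo_current.head < demo_current.tail)
		demo_driver(demo_current.ids[demo_current.head++]);
	demo_current.active = 0;
}
```

Calling demo_submit(1) handles the bios in the order 1, 2, 3, and
demo_seen_in_driver stays at 1: bios 2 and 3 cannot make progress (and so
cannot complete and be freed) until the handler for bio 1 has returned.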
> struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
> {
> + gfp_t saved_gfp = gfp_mask;
> unsigned front_pad;
> unsigned inline_vecs;
> unsigned long idx = BIO_POOL_NONE;
> @@ -318,16 +336,44 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
> p = kmalloc(sizeof(struct bio) +
> nr_iovecs * sizeof(struct bio_vec),
> gfp_mask);
> +
> front_pad = 0;
> inline_vecs = nr_iovecs;
> } else {
> + /*
> + * generic_make_request() converts recursion to iteration; this
> + * means if we're running beneath it, any bios we allocate and
> + * submit will not be submitted (and thus freed) until after we
> + * return.
> + *
> + * This exposes us to a potential deadlock if we allocate
> + * multiple bios from the same bio_set() while running
> + * underneath generic_make_request(). If we were to allocate
> + * multiple bios (say a stacking block driver that was splitting
> + * bios), we would deadlock if we exhausted the mempool's
> + * reserve.
> + *
> + * We solve this, and guarantee forward progress, with a rescuer
> + * workqueue per bio_set. If we go to allocate and there are
> + * bios on current->bio_list, we first try the allocation
> + * without __GFP_WAIT; if that fails, we punt those bios we
> + * would be blocking to the rescuer workqueue before we retry
> + * with the original gfp_flags.
> + */
Can you please add a comment in generic_make_request() to describe the
issue briefly and link back here?
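
The allocation strategy in the quoted comment (try without waiting, punt
the pending bios to the rescuer on failure, then retry) can be modelled
in user space roughly as below. Everything here is a hypothetical
simplification: demo_pool stands in for the mempool reserve, demo_ctx for
current->bio_list plus the rescuer, and the model assumes each punted bio
completes at once and returns its slot, which the real code does not
guarantee to happen immediately.

```c
#include <stdbool.h>

struct demo_pool {
	int free_slots; /* models the mempool's reserve */
};

struct demo_ctx {
	int pending; /* bios queued on the list, each holding a pool slot */
	int rescued; /* bios handed to the rescuer so far */
};

/* Non-blocking attempt: succeeds only if a slot is free right now. */
static bool demo_try_alloc(struct demo_pool *p)
{
	if (p->free_slots > 0) {
		p->free_slots--;
		return true;
	}
	return false;
}

/* Model assumption: the rescuer submits the punted bios and each one
 * completes, giving its slot back to the pool. */
static void demo_punt_to_rescuer(struct demo_pool *p, struct demo_ctx *c)
{
	c->rescued += c->pending;
	p->free_slots += c->pending;
	c->pending = 0;
}

static bool demo_alloc(struct demo_pool *p, struct demo_ctx *c, bool may_wait)
{
	/* First attempt never waits, mirroring the masked-out wait flag. */
	if (demo_try_alloc(p))
		return true;
	if (!may_wait)
		return false;
	/* We would block: release the bios we are holding back first. */
	if (c->pending > 0)
		demo_punt_to_rescuer(p, c);
	/* Retry; the real code would sleep here with the original flags. */
	return demo_try_alloc(p);
}
```

With a one-slot pool, a first allocation succeeds; if that bio is then
left pending and a second allocation is attempted with may_wait set, the
first attempt fails, the pending bio is punted, and the retry succeeds.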
> void bioset_free(struct bio_set *bs)
> {
> + if (bs->rescue_workqueue)
Why is the conditional necessary? Is it possible to have a bioset w/o
rescue_workqueue?
> + destroy_workqueue(bs->rescue_workqueue);
> +
> if (bs->bio_pool)
> mempool_destroy(bs->bio_pool);
This makes bioset_free() require process context, which is probably
okay for bioset but still isn't nice. Might be worth noting in the patch
description.
Thanks.
--
tejun