All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, Keith Busch <keith.busch@intel.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Jiri Kosina <jkosina@suse.cz>, Ming Lei <ming.lei@canonical.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	NeilBrown <neilb@suse.com>,
	linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	Takashi Iwai <tiwai@suse.de>,
	linux-bcache@vger.kernel.org, Zheng Liu <gnehzuil.liu@gmail.com>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	dm-devel@redhat.com, Shaohua Li <shli@kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Alasdair Kergon <agk@redhat.com>,
	Roland Kammerer <roland.kammerer@linbit.com>
Subject: Re: [PATCH 1/1] block: fix blk_queue_split() resource exhaustion
Date: Fri, 8 Jul 2016 14:49:57 -0400	[thread overview]
Message-ID: <20160708184957.GA10765@redhat.com> (raw)
In-Reply-To: <1467990243-3531-2-git-send-email-lars.ellenberg@linbit.com>

On Fri, Jul 08 2016 at 11:04am -0400,
Lars Ellenberg <lars.ellenberg@linbit.com> wrote:

> For a long time, generic_make_request() converts recursion into
> iteration by queuing recursive arguments on current->bio_list.
> 
> This is convenient for stacking drivers,
> the top-most driver would take the originally submitted bio,
> and re-submit a re-mapped version of it, or one or more clones,
> or one or more new allocated bios to its backend(s). Which
> are then simply processed in turn, and each can again queue
> more "backend-bios" until we reach the bottom of the driver stack,
> and actually dispatch to the real backend device.
> 
> Any stacking driver ->make_request_fn() could expect that,
> once it returns, any backend-bios it submitted via recursive calls
> to generic_make_request() would now be processed and dispatched, before
> the current task would call into this driver again.
> 
> This is changed by commit
>   54efd50 block: make generic_make_request handle arbitrarily sized bios
> 
> Drivers may call blk_queue_split() inside their ->make_request_fn(),
> which may split the current bio into a front-part to be dealt with
> immediately, and a remainder-part, which may need to be split even
> further. That remainder-part will simply also be pushed to
> current->bio_list, and would end up being head-of-queue, in front
> of any backend-bios the current make_request_fn() might submit during
> processing of the fron-part.
> 
> Which means the current task would immediately end up back in the same
> make_request_fn() of the same driver again, before any of its backend
> bios have even been processed.
> 
> This can lead to resource starvation deadlock.
> Drivers could avoid this by learning to not need blk_queue_split(),
> or by submitting their backend bios in a different context (dedicated
> kernel thread, work_queue context, ...). Or by playing funny re-ordering
> games with entries on current->bio_list.
> 
> Instead, I suggest to distinguish between recursive calls to
> generic_make_request(), and pushing back the remainder part in
> blk_queue_split(), by pointing current->bio_lists to a
> 	struct recursion_to_iteration_bio_lists {
> 		struct bio_list recursion;
> 		struct bio_list queue;
> 	}
> 
> By providing each q->make_request_fn() with an empty "recursion"
> bio_list, then merging any recursively submitted bios to the
> head of the "queue" list, we can make the recursion-to-iteration
> logic in generic_make_request() process deepest level bios first,
> and "sibling" bios of the same level in "natural" order.
> 
> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
> Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
> ---
>  block/bio.c               | 27 +++++++++++++++++--------
>  block/blk-core.c          | 50 +++++++++++++++++++++++++----------------------
>  block/blk-merge.c         |  5 ++++-
>  drivers/md/bcache/btree.c | 12 ++++++------
>  drivers/md/dm-bufio.c     |  2 +-
>  drivers/md/md.h           |  7 +++++++
>  drivers/md/raid1.c        |  5 ++---
>  drivers/md/raid10.c       |  5 ++---
>  include/linux/bio.h       | 18 +++++++++++++++++
>  include/linux/sched.h     |  4 ++--
>  10 files changed, 88 insertions(+), 47 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 848cd35..1f9fcf0 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -366,12 +366,16 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
>  	 */
>  
>  	bio_list_init(&punt);
> -	bio_list_init(&nopunt);
>  
> -	while ((bio = bio_list_pop(current->bio_list)))
> +	bio_list_init(&nopunt);
> +	while ((bio = bio_list_pop(&current->bio_lists->recursion)))
>  		bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
> +	current->bio_lists->recursion = nopunt;
>  
> -	*current->bio_list = nopunt;
> +	bio_list_init(&nopunt);
> +	while ((bio = bio_list_pop(&current->bio_lists->queue)))
> +		bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
> +	current->bio_lists->queue = nopunt;
>  
>  	spin_lock(&bs->rescue_lock);
>  	bio_list_merge(&bs->rescue_list, &punt);
> @@ -380,6 +384,13 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
>  	queue_work(bs->rescue_workqueue, &bs->rescue_work);
>  }
>  
> +static bool current_has_pending_bios(void)
> +{
> +	return current->bio_lists &&
> +		(!bio_list_empty(&current->bio_lists->queue) ||
> +		 !bio_list_empty(&current->bio_lists->recursion));
> +}
> +

This should be moved to include/linux/bio.h

> diff --git a/block/blk-core.c b/block/blk-core.c
> index 3cfd67d..74bceea 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2066,35 +2071,34 @@ blk_qc_t generic_make_request(struct bio *bio)
>  	 * Before entering the loop, bio->bi_next is NULL (as all callers
>  	 * ensure that) so we have a list with a single bio.
>  	 * We pretend that we have just taken it off a longer list, so
> -	 * we assign bio_list to a pointer to the bio_list_on_stack,
> -	 * thus initialising the bio_list of new bios to be
> -	 * added.  ->make_request() may indeed add some more bios
> -	 * through a recursive call to generic_make_request.  If it
> -	 * did, we find a non-NULL value in bio_list and re-enter the loop
> -	 * from the top.  In this case we really did just take the bio
> -	 * of the top of the list (no pretending) and so remove it from
> -	 * bio_list, and call into ->make_request() again.
> +	 * we assign bio_list to a pointer to the bio_lists_on_stack,
> +	 * thus initialising the bio_lists of new bios to be added.
> +	 * ->make_request() may indeed add some more bios to .recursion
> +	 * through a recursive call to generic_make_request.  If it did,
> +	 * we find a non-NULL value in .recursion, merge .recursion back to the
> +	 * head of .queue, and re-enter the loop from the top.  In this case we
> +	 * really did just take the bio of the top of the list (no pretending)
> +	 * and so remove it from .queue, and call into ->make_request() again.
>  	 */
>  	BUG_ON(bio->bi_next);
> -	bio_list_init(&bio_list_on_stack);
> -	current->bio_list = &bio_list_on_stack;
> +	bio_list_init(&bio_lists_on_stack.queue);
> +	current->bio_lists = &bio_lists_on_stack;
>  	do {
>  		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
>  
>  		if (likely(blk_queue_enter(q, false) == 0)) {
> +			bio_list_init(&bio_lists_on_stack.recursion);
>  			ret = q->make_request_fn(q, bio);
> -
>  			blk_queue_exit(q);
> -
> -			bio = bio_list_pop(current->bio_list);
> +			bio_list_merge_head(&bio_lists_on_stack.queue,
> +					&bio_lists_on_stack.recursion);
> +			/* XXX bio_list_init(&bio_lists_on_stack.recursion); */

Please remove this XXX commented code.

> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index b4f3352..b62e65f4 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -664,6 +664,13 @@ static inline void rdev_dec_pending(struct md_rdev *rdev, struct mddev *mddev)
>  	}
>  }
>  
> +static inline bool current_has_pending_bios(void)
> +{
> +	return current->bio_lists && (
> +	      !bio_list_empty(&current->bio_lists->queue) ||
> +	      !bio_list_empty(&current->bio_lists->recursion));
> +}
> +

This can be removed now that include/linux/bio.h exports the same.

  reply	other threads:[~2016-07-08 18:49 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-08 15:04 [PATCH 0/1] block: fix blk_queue_split() resource exhaustion Lars Ellenberg
2016-07-08 15:04 ` [PATCH 1/1] " Lars Ellenberg
2016-07-08 18:49   ` Mike Snitzer [this message]
2016-07-11 14:13     ` Lars Ellenberg
2016-07-11 14:10   ` [PATCH v2 " Lars Ellenberg
2016-07-12  2:55     ` [dm-devel] " NeilBrown
2016-07-13  2:18       ` Eric Wheeler
2016-07-13  2:32         ` Mike Snitzer
2016-07-19  9:00           ` Lars Ellenberg
2016-07-19  9:00             ` Lars Ellenberg
2016-07-21 22:53             ` Eric Wheeler
2016-07-25 20:39               ` Jeff Moyer
2016-08-11  4:16             ` Eric Wheeler
2017-01-07 19:56             ` Lars Ellenberg
2017-01-07 19:56               ` Lars Ellenberg
2016-09-16 20:14         ` [dm-devel] " Matthias Ferdinand
2016-09-18 23:10           ` Eric Wheeler
2016-09-19 20:43             ` Matthias Ferdinand
2016-09-21 21:08               ` bcache: discard BUG (was: [dm-devel] [PATCH v2 1/1] block: fix blk_queue_split() resource exhaustion) Eric Wheeler
2016-12-23  8:49     ` [PATCH v2 1/1] block: fix blk_queue_split() resource exhaustion Michael Wang
2016-12-23 11:45       ` Lars Ellenberg
2016-12-23 11:45         ` Lars Ellenberg
2017-01-02 14:33         ` [dm-devel] " Jack Wang
2017-01-02 14:33           ` Jack Wang
2017-01-04  5:12           ` NeilBrown
2017-01-04  5:12             ` [dm-devel] " NeilBrown
2017-01-04 18:50             ` Mike Snitzer
2017-01-04 18:50               ` Mike Snitzer
2017-01-05 10:54               ` 王金浦
2017-01-05 10:54                 ` 王金浦
2017-01-06 16:50               ` Mikulas Patocka
2017-01-06 16:50                 ` Mikulas Patocka
2017-01-06 17:34                 ` Mikulas Patocka
2017-01-06 17:34                   ` Mikulas Patocka
2017-01-06 17:34                   ` Mikulas Patocka
2017-01-06 19:52                   ` Mike Snitzer
2017-01-06 19:52                     ` Mike Snitzer
2017-01-06 23:01                     ` NeilBrown
2017-01-06 23:01                       ` NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160708184957.GA10765@redhat.com \
    --to=snitzer@redhat.com \
    --cc=agk@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@redhat.com \
    --cc=gnehzuil.liu@gmail.com \
    --cc=jkosina@suse.cz \
    --cc=keith.busch@intel.com \
    --cc=kent.overstreet@gmail.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=lars.ellenberg@linbit.com \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ming.lei@canonical.com \
    --cc=mingo@redhat.com \
    --cc=neilb@suse.com \
    --cc=peterz@infradead.org \
    --cc=roland.kammerer@linbit.com \
    --cc=shli@kernel.org \
    --cc=tiwai@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.