From: Mike Snitzer <snitzer@redhat.com>
To: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: linux-block@vger.kernel.org,
Roland Kammerer <roland.kammerer@linbit.com>,
Jens Axboe <axboe@kernel.dk>, NeilBrown <neilb@suse.com>,
Kent Overstreet <kent.overstreet@gmail.com>,
Shaohua Li <shli@kernel.org>, Alasdair Kergon <agk@redhat.com>,
dm-devel@redhat.com, Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Takashi Iwai <tiwai@suse.de>, Jiri Kosina <jkosina@suse.cz>,
Zheng Liu <gnehzuil.liu@gmail.com>,
Ming Lei <ming.lei@canonical.com>,
Keith Busch <keith.busch@intel.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
linux-kernel@vger.kernel.org, linux-bcache@vger.kernel.org,
linux-raid@vger.kernel.org
Subject: Re: block: fix blk_queue_split() resource exhaustion
Date: Thu, 7 Jul 2016 10:36:39 -0400 [thread overview]
Message-ID: <20160707143637.GA2890@redhat.com> (raw)
In-Reply-To: <1466583730-28595-1-git-send-email-lars.ellenberg@linbit.com>
On Wed, Jun 22 2016 at 4:22am -0400,
Lars Ellenberg <lars.ellenberg@linbit.com> wrote:
> For a long time, generic_make_request() converts recursion into
> iteration by queuing recursive arguments on current->bio_list.
>
> This is convenient for stacking drivers,
> the top-most driver would take the originally submitted bio,
> and re-submit a re-mapped version of it, or one or more clones,
> or one or more new allocated bios to its backend(s). Which
> are then simply processed in turn, and each can again queue
> more "backend-bios" until we reach the bottom of the driver stack,
> and actually dispatch to the real backend device.
>
> Any stacking driver ->make_request_fn() could expect that,
> once it returns, any backend-bios it submitted via recursive calls
> to generic_make_request() would now be processed and dispatched, before
> the current task would call into this driver again.
>
> This is changed by commit
> 54efd50 block: make generic_make_request handle arbitrarily sized bios
>
> Drivers may call blk_queue_split() inside their ->make_request_fn(),
> which may split the current bio into a front-part to be dealt with
> immediately, and a remainder-part, which may need to be split even
> further. That remainder-part will simply also be pushed to
> current->bio_list, and would end up being head-of-queue, in front
> of any backend-bios the current make_request_fn() might submit during
> processing of the fron-part.
>
> Which means the current task would immediately end up back in the same
> make_request_fn() of the same driver again, before any of its backend
> bios have even been processed.
>
> This can lead to resource starvation deadlock.
> Drivers could avoid this by learning to not need blk_queue_split(),
> or by submitting their backend bios in a different context (dedicated
> kernel thread, work_queue context, ...). Or by playing funny re-ordering
> games with entries on current->bio_list.
>
> Instead, I suggest to distinguish between recursive calls to
> generic_make_request(), and pushing back the remainder part in
> blk_queue_split(), by pointing current->bio_lists to a
> struct recursion_to_iteration_bio_lists {
> struct bio_list recursion;
> struct bio_list remainder;
> }
>
> To have all bios targeted to drivers lower in the stack processed before
> processing the next piece of a bio targeted at the higher levels,
> as long as queued bios resulting from recursion are available,
> they will continue to be processed in FIFO order.
> Pushed back bio-parts resulting from blk_queue_split() will be processed
> in LIFO order, one-by-one, whenever the recursion list becomes empty.
>
> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
> Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
I've rebased this patch against Jens' for-4.8/core (resolved conflict in
blk-merge.c) and pushed the result to this wip2 branch, feel free to use
it to resubmit for inclusion if/when that is the way forward, see:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip2&id=36cee4b1ddef0a46562045b421792a847c570b6b
prev parent reply other threads:[~2016-07-07 14:36 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-22 8:22 [RFC] block: fix blk_queue_split() resource exhaustion Lars Ellenberg
2016-06-22 8:22 ` Lars Ellenberg
2016-06-24 11:36 ` Ming Lei
2016-06-24 14:27 ` Lars Ellenberg
2016-06-24 15:15 ` Mike Snitzer
2016-06-28 8:24 ` Lars Ellenberg
2016-06-25 9:30 ` [RFC] " Ming Lei
2016-06-28 8:45 ` Lars Ellenberg
2016-07-02 10:03 ` Ming Lei
2016-07-02 10:28 ` Ming Lei
2016-07-04 8:20 ` Lars Ellenberg
2016-07-04 10:47 ` Ming Lei
2016-07-06 12:38 ` Lars Ellenberg
2016-07-06 15:57 ` Ming Lei
2016-07-07 8:03 ` Lars Ellenberg
2016-07-07 8:03 ` Lars Ellenberg
2016-07-07 13:14 ` Ming Lei
2016-07-07 5:35 ` [dm-devel] " NeilBrown
2016-07-07 5:35 ` NeilBrown
2016-07-07 8:16 ` Lars Ellenberg
2016-07-07 12:39 ` Lars Ellenberg
2016-07-07 12:39 ` Lars Ellenberg
2016-07-07 12:47 ` Mike Snitzer
2016-07-07 22:07 ` [dm-devel] [RFC] " NeilBrown
2016-07-08 8:02 ` Lars Ellenberg
2016-07-08 9:39 ` NeilBrown
2016-07-08 13:00 ` Lars Ellenberg
2016-07-08 13:24 ` Re[2]: " Pavel Goran
2016-07-08 13:59 ` Lars Ellenberg
2016-07-08 11:08 ` Ming Lei
2016-07-08 11:08 ` Ming Lei
2016-07-08 12:52 ` Lars Ellenberg
2016-07-08 13:05 ` Mike Snitzer
2016-07-07 12:45 ` Mike Snitzer
2016-07-07 22:40 ` NeilBrown
2016-07-07 14:36 ` Mike Snitzer [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160707143637.GA2890@redhat.com \
--to=snitzer@redhat.com \
--cc=agk@redhat.com \
--cc=axboe@kernel.dk \
--cc=dm-devel@redhat.com \
--cc=gnehzuil.liu@gmail.com \
--cc=jkosina@suse.cz \
--cc=keith.busch@intel.com \
--cc=kent.overstreet@gmail.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=lars.ellenberg@linbit.com \
--cc=linux-bcache@vger.kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-raid@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=ming.lei@canonical.com \
--cc=mingo@redhat.com \
--cc=neilb@suse.com \
--cc=peterz@infradead.org \
--cc=roland.kammerer@linbit.com \
--cc=shli@kernel.org \
--cc=tiwai@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.