* Patch "dm: ensure bio submission follows a depth-first tree walk" has been added to the 4.15-stable tree
@ 2018-03-22 13:06 gregkh
2018-03-22 13:47 ` Mike Snitzer
0 siblings, 1 reply; 4+ messages in thread
From: gregkh @ 2018-03-22 13:06 UTC (permalink / raw)
To: neilb, alexander.levin, gregkh, snitzer; +Cc: stable, stable-commits
This is a note to let you know that I've just added the patch titled
dm: ensure bio submission follows a depth-first tree walk
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
The filename of the patch is:
dm-ensure-bio-submission-follows-a-depth-first-tree-walk.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
>From foo@baz Thu Mar 22 14:03:39 CET 2018
From: NeilBrown <neilb@suse.com>
Date: Wed, 6 Sep 2017 09:43:28 +1000
Subject: dm: ensure bio submission follows a depth-first tree walk
From: NeilBrown <neilb@suse.com>
[ Upstream commit 18a25da84354c6bb655320de6072c00eda6eb602 ]
A dm device can, in general, represent a tree of targets, each of which
handles a sub-range of the range of blocks handled by the parent.
The bio sequencing managed by generic_make_request() requires that bios
are generated and handled in a depth-first manner. Each call to a
make_request_fn() may submit bios to a single member device, and may
submit bios for a reduced region of the same device as the
make_request_fn.
In particular, any bios submitted to member devices must be expected to
be processed in order, so a later one must never wait for an earlier
one.
This ordering is usually achieved by using bio_split() to reduce a bio
to a size that can be completely handled by one target, and resubmitting
the remainder to the originating device. bio_queue_split() shows the
canonical approach.
dm doesn't follow this approach, largely because it has needed to split
bios since long before bio_split() was available. It currently can
submit bios to separate targets within the one dm_make_request() call.
Dependencies between these targets, as can happen with dm-snap, can
cause deadlocks if either bios gets stuck behind the other in the queues
managed by generic_make_request(). This requires the 'rescue'
functionality provided by dm_offload_{start,end}.
Some of this requirement can be removed by changing the order of bio
submission to follow the canonical approach. That is, if dm finds that
it needs to split a bio, the remainder should be sent to
generic_make_request() rather than being handled immediately. This
delays the handling until the first part is completely processed, so the
deadlock problems do not occur.
__split_and_process_bio() can be called both from dm_make_request() and
from dm_wq_work(). When called from dm_wq_work() the current approach
is perfectly satisfactory as each bio will be processed immediately.
When called from dm_make_request(), current->bio_list will be non-NULL,
and in this case it is best to create a separate "clone" bio for the
remainder.
When we use bio_clone_bioset() to split off the front part of a bio
and chain the two together and submit the remainder to
generic_make_request(), it is important that the newly allocated
bio is used as the head to be processed immediately, and the original
bio gets "bio_advance()"d and sent to generic_make_request() as the
remainder. Otherwise, if the newly allocated bio is used as the
remainder, and if it then needs to be split again, then the next
bio_clone_bioset() call will be made while holding a reference a bio
(result of the first clone) from the same bioset. This can potentially
exhaust the bioset mempool and result in a memory allocation deadlock.
Note that there is no race caused by reassigning cio.io->bio after already
calling __map_bio(). This bio will only be dereferenced again after
dec_pending() has found io->io_count to be zero, and this cannot happen
before the dec_pending() call at the end of __split_and_process_bio().
To provide the clone bio when splitting, we use q->bio_split. This
was previously being freed by bio-based dm to avoid having excess
rescuer threads. As bio_split bio sets no longer create rescuer
threads, there is little cost and much gain from restoring the
q->bio_split bio set.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/md/dm.c | 33 ++++++++++++++++++++++++---------
1 file changed, 24 insertions(+), 9 deletions(-)
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1499,8 +1499,29 @@ static void __split_and_process_bio(stru
} else {
ci.bio = bio;
ci.sector_count = bio_sectors(bio);
- while (ci.sector_count && !error)
+ while (ci.sector_count && !error) {
error = __split_and_process_non_flush(&ci);
+ if (current->bio_list && ci.sector_count && !error) {
+ /*
+ * Remainder must be passed to generic_make_request()
+ * so that it gets handled *after* bios already submitted
+ * have been completely processed.
+ * We take a clone of the original to store in
+ * ci.io->bio to be used by end_io_acct() and
+ * for dec_pending to use for completion handling.
+ * As this path is not used for REQ_OP_ZONE_REPORT,
+ * the usage of io->bio in dm_remap_zone_report()
+ * won't be affected by this reassignment.
+ */
+ struct bio *b = bio_clone_bioset(bio, GFP_NOIO,
+ md->queue->bio_split);
+ ci.io->bio = b;
+ bio_advance(bio, (bio_sectors(bio) - ci.sector_count) << 9);
+ bio_chain(b, bio);
+ generic_make_request(bio);
+ break;
+ }
+ }
}
/* drop the extra reference count */
@@ -1511,8 +1532,8 @@ static void __split_and_process_bio(stru
*---------------------------------------------------------------*/
/*
- * The request function that just remaps the bio built up by
- * dm_merge_bvec.
+ * The request function that remaps the bio to one target and
+ * splits off any remainder.
*/
static blk_qc_t dm_make_request(struct request_queue *q, struct bio *bio)
{
@@ -2035,12 +2056,6 @@ int dm_setup_md_queue(struct mapped_devi
case DM_TYPE_DAX_BIO_BASED:
dm_init_normal_md_queue(md);
blk_queue_make_request(md->queue, dm_make_request);
- /*
- * DM handles splitting bios as needed. Free the bio_split bioset
- * since it won't be used (saves 1 process per bio-based DM device).
- */
- bioset_free(md->queue->bio_split);
- md->queue->bio_split = NULL;
if (type == DM_TYPE_DAX_BIO_BASED)
queue_flag_set_unlocked(QUEUE_FLAG_DAX, md->queue);
Patches currently in stable-queue which might be from neilb@suse.com are
queue-4.15/dm-ensure-bio-submission-follows-a-depth-first-tree-walk.patch
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Patch "dm: ensure bio submission follows a depth-first tree walk" has been added to the 4.15-stable tree
2018-03-22 13:06 Patch "dm: ensure bio submission follows a depth-first tree walk" has been added to the 4.15-stable tree gregkh
@ 2018-03-22 13:47 ` Mike Snitzer
2018-03-22 13:56 ` Mikulas Patocka
2018-03-22 14:01 ` Greg KH
0 siblings, 2 replies; 4+ messages in thread
From: Mike Snitzer @ 2018-03-22 13:47 UTC (permalink / raw)
To: gregkh
Cc: neilb, alexander.levin, stable, stable-commits, dm-devel,
Mikulas Patocka
On Thu, Mar 22 2018 at 9:06am -0400,
gregkh@linuxfoundation.org <gregkh@linuxfoundation.org> wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> dm: ensure bio submission follows a depth-first tree walk
>
> to the 4.15-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
>
> The filename of the patch is:
> dm-ensure-bio-submission-follows-a-depth-first-tree-walk.patch
> and it can be found in the queue-4.15 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable@vger.kernel.org> know about it.
The following applies to 4.14 stable@ too.
I think it very questionable to pull this into stable trees. How'd it
even elevate to be considered for stable@?
But if you do, you definitely need this additional upstream commit:
8dd601fa8317 ("dm: correctly handle chained bios in dec_pending()")
Even with that I'm not excited about supporting some partial backport of
these changes because there were a lot more related changes -- only
taking a subset makes these stable@ kernels unicorns.. I deal with
enough unicorn vendor kernels (but on my terms, with my control over
what is "supportable"):
Anyway, other related commits are:
80cd17578310 dm crypt: remove BIOSET_NEED_RESCUER flag
c110a4b6e603 dm io: remove BIOSET_NEED_RESCUER flag from bios bioset
f31c21e4365c dm: remove unused 'num_write_bios' target interface
318716ddea08 dm: safely allocate multiple bioset bios
4a3f54d94d5c dm: remove BIOSET_NEED_RESCUER based dm_offload infrastructure
0776aa0e30aa dm: ensure bio-based DM's bioset and io_pool support targets' maximum IOs
3d7f45625a84 dm: fix __send_changing_extent_only() to send first bio and chain remainder
So NAK from me. If others have a compelling argument I'm open to
considering.
Mike
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Patch "dm: ensure bio submission follows a depth-first tree walk" has been added to the 4.15-stable tree
2018-03-22 13:47 ` Mike Snitzer
@ 2018-03-22 13:56 ` Mikulas Patocka
2018-03-22 14:01 ` Greg KH
1 sibling, 0 replies; 4+ messages in thread
From: Mikulas Patocka @ 2018-03-22 13:56 UTC (permalink / raw)
To: Mike Snitzer
Cc: gregkh, neilb, alexander.levin, stable, stable-commits, dm-devel
I agree.
The patch "dm: ensure bio submission follows a depth-first tree walk"
doesn't really fix any bug (and it may introduce new bugs), so it isn't
applicable to the stable kernel branches.
The kernel 4.15 and below have a "dm_offload" mechanism that avoids the
possible deadlocks in dm, there is no reason why this should be converted
to the "depth-first tree walk".
Mikulas
On Thu, 22 Mar 2018, Mike Snitzer wrote:
> On Thu, Mar 22 2018 at 9:06am -0400,
> gregkh@linuxfoundation.org <gregkh@linuxfoundation.org> wrote:
>
> >
> > This is a note to let you know that I've just added the patch titled
> >
> > dm: ensure bio submission follows a depth-first tree walk
> >
> > to the 4.15-stable tree which can be found at:
> > http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
> >
> > The filename of the patch is:
> > dm-ensure-bio-submission-follows-a-depth-first-tree-walk.patch
> > and it can be found in the queue-4.15 subdirectory.
> >
> > If you, or anyone else, feels it should not be added to the stable tree,
> > please let <stable@vger.kernel.org> know about it.
>
> The following applies to 4.14 stable@ too.
>
> I think it very questionable to pull this into stable trees. How'd it
> even elevate to be considered for stable@?
>
> But if you do, you definitely need this additional upstream commit:
> 8dd601fa8317 ("dm: correctly handle chained bios in dec_pending()")
>
> Even with that I'm not excited about supporting some partial backport of
> these changes because there were a lot more related changes -- only
> taking a subset makes these stable@ kernels unicorns.. I deal with
> enough unicorn vendor kernels (but on my terms, with my control over
> what is "supportable"):
>
> Anyway, other related commits are:
> 80cd17578310 dm crypt: remove BIOSET_NEED_RESCUER flag
> c110a4b6e603 dm io: remove BIOSET_NEED_RESCUER flag from bios bioset
> f31c21e4365c dm: remove unused 'num_write_bios' target interface
> 318716ddea08 dm: safely allocate multiple bioset bios
> 4a3f54d94d5c dm: remove BIOSET_NEED_RESCUER based dm_offload infrastructure
> 0776aa0e30aa dm: ensure bio-based DM's bioset and io_pool support targets' maximum IOs
> 3d7f45625a84 dm: fix __send_changing_extent_only() to send first bio and chain remainder
>
> So NAK from me. If others have a compelling argument I'm open to
> considering.
>
> Mike
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Patch "dm: ensure bio submission follows a depth-first tree walk" has been added to the 4.15-stable tree
2018-03-22 13:47 ` Mike Snitzer
2018-03-22 13:56 ` Mikulas Patocka
@ 2018-03-22 14:01 ` Greg KH
1 sibling, 0 replies; 4+ messages in thread
From: Greg KH @ 2018-03-22 14:01 UTC (permalink / raw)
To: Mike Snitzer
Cc: neilb, alexander.levin, stable, stable-commits, dm-devel,
Mikulas Patocka
On Thu, Mar 22, 2018 at 09:47:59AM -0400, Mike Snitzer wrote:
> On Thu, Mar 22 2018 at 9:06am -0400,
> gregkh@linuxfoundation.org <gregkh@linuxfoundation.org> wrote:
>
> >
> > This is a note to let you know that I've just added the patch titled
> >
> > dm: ensure bio submission follows a depth-first tree walk
> >
> > to the 4.15-stable tree which can be found at:
> > http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
> >
> > The filename of the patch is:
> > dm-ensure-bio-submission-follows-a-depth-first-tree-walk.patch
> > and it can be found in the queue-4.15 subdirectory.
> >
> > If you, or anyone else, feels it should not be added to the stable tree,
> > please let <stable@vger.kernel.org> know about it.
>
> The following applies to 4.14 stable@ too.
>
> I think it very questionable to pull this into stable trees. How'd it
> even elevate to be considered for stable@?
>
> But if you do, you definitely need this additional upstream commit:
> 8dd601fa8317 ("dm: correctly handle chained bios in dec_pending()")
>
> Even with that I'm not excited about supporting some partial backport of
> these changes because there were a lot more related changes -- only
> taking a subset makes these stable@ kernels unicorns.. I deal with
> enough unicorn vendor kernels (but on my terms, with my control over
> what is "supportable"):
>
> Anyway, other related commits are:
> 80cd17578310 dm crypt: remove BIOSET_NEED_RESCUER flag
> c110a4b6e603 dm io: remove BIOSET_NEED_RESCUER flag from bios bioset
> f31c21e4365c dm: remove unused 'num_write_bios' target interface
> 318716ddea08 dm: safely allocate multiple bioset bios
> 4a3f54d94d5c dm: remove BIOSET_NEED_RESCUER based dm_offload infrastructure
> 0776aa0e30aa dm: ensure bio-based DM's bioset and io_pool support targets' maximum IOs
> 3d7f45625a84 dm: fix __send_changing_extent_only() to send first bio and chain remainder
>
> So NAK from me. If others have a compelling argument I'm open to
> considering.
Thanks for letting me know, I've now dropped this from the 4.15.y,
4.14.y, and 4.9.y stable patch queues.
greg k-h
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2018-03-22 14:02 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-03-22 13:06 Patch "dm: ensure bio submission follows a depth-first tree walk" has been added to the 4.15-stable tree gregkh
2018-03-22 13:47 ` Mike Snitzer
2018-03-22 13:56 ` Mikulas Patocka
2018-03-22 14:01 ` Greg KH
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).