stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Patch "dm: ensure bio submission follows a depth-first tree walk" has been added to the 4.15-stable tree
@ 2018-03-22 13:06 gregkh
  2018-03-22 13:47 ` Mike Snitzer
  0 siblings, 1 reply; 4+ messages in thread
From: gregkh @ 2018-03-22 13:06 UTC (permalink / raw)
  To: neilb, alexander.levin, gregkh, snitzer; +Cc: stable, stable-commits


This is a note to let you know that I've just added the patch titled

    dm: ensure bio submission follows a depth-first tree walk

to the 4.15-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     dm-ensure-bio-submission-follows-a-depth-first-tree-walk.patch
and it can be found in the queue-4.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


>From foo@baz Thu Mar 22 14:03:39 CET 2018
From: NeilBrown <neilb@suse.com>
Date: Wed, 6 Sep 2017 09:43:28 +1000
Subject: dm: ensure bio submission follows a depth-first tree walk

From: NeilBrown <neilb@suse.com>


[ Upstream commit 18a25da84354c6bb655320de6072c00eda6eb602 ]

A dm device can, in general, represent a tree of targets, each of which
handles a sub-range of the range of blocks handled by the parent.

The bio sequencing managed by generic_make_request() requires that bios
are generated and handled in a depth-first manner.  Each call to a
make_request_fn() may submit bios to a single member device, and may
submit bios for a reduced region of the same device as the
make_request_fn.

In particular, any bios submitted to member devices must be expected to
be processed in order, so a later one must never wait for an earlier
one.

This ordering is usually achieved by using bio_split() to reduce a bio
to a size that can be completely handled by one target, and resubmitting
the remainder to the originating device. bio_queue_split() shows the
canonical approach.

dm doesn't follow this approach, largely because it has needed to split
bios since long before bio_split() was available.  It currently can
submit bios to separate targets within the one dm_make_request() call.
Dependencies between these targets, as can happen with dm-snap, can
cause deadlocks if either bios gets stuck behind the other in the queues
managed by generic_make_request().  This requires the 'rescue'
functionality provided by dm_offload_{start,end}.

Some of this requirement can be removed by changing the order of bio
submission to follow the canonical approach.  That is, if dm finds that
it needs to split a bio, the remainder should be sent to
generic_make_request() rather than being handled immediately.  This
delays the handling until the first part is completely processed, so the
deadlock problems do not occur.

__split_and_process_bio() can be called both from dm_make_request() and
from dm_wq_work().  When called from dm_wq_work() the current approach
is perfectly satisfactory as each bio will be processed immediately.
When called from dm_make_request(), current->bio_list will be non-NULL,
and in this case it is best to create a separate "clone" bio for the
remainder.

When we use bio_clone_bioset() to split off the front part of a bio
and chain the two together and submit the remainder to
generic_make_request(), it is important that the newly allocated
bio is used as the head to be processed immediately, and the original
bio gets "bio_advance()"d and sent to generic_make_request() as the
remainder.  Otherwise, if the newly allocated bio is used as the
remainder, and if it then needs to be split again, then the next
bio_clone_bioset() call will be made while holding a reference a bio
(result of the first clone) from the same bioset.  This can potentially
exhaust the bioset mempool and result in a memory allocation deadlock.

Note that there is no race caused by reassigning cio.io->bio after already
calling __map_bio().  This bio will only be dereferenced again after
dec_pending() has found io->io_count to be zero, and this cannot happen
before the dec_pending() call at the end of __split_and_process_bio().

To provide the clone bio when splitting, we use q->bio_split.  This
was previously being freed by bio-based dm to avoid having excess
rescuer threads.  As bio_split bio sets no longer create rescuer
threads, there is little cost and much gain from restoring the
q->bio_split bio set.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/md/dm.c |   33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1499,8 +1499,29 @@ static void __split_and_process_bio(stru
 	} else {
 		ci.bio = bio;
 		ci.sector_count = bio_sectors(bio);
-		while (ci.sector_count && !error)
+		while (ci.sector_count && !error) {
 			error = __split_and_process_non_flush(&ci);
+			if (current->bio_list && ci.sector_count && !error) {
+				/*
+				 * Remainder must be passed to generic_make_request()
+				 * so that it gets handled *after* bios already submitted
+				 * have been completely processed.
+				 * We take a clone of the original to store in
+				 * ci.io->bio to be used by end_io_acct() and
+				 * for dec_pending to use for completion handling.
+				 * As this path is not used for REQ_OP_ZONE_REPORT,
+				 * the usage of io->bio in dm_remap_zone_report()
+				 * won't be affected by this reassignment.
+				 */
+				struct bio *b = bio_clone_bioset(bio, GFP_NOIO,
+								 md->queue->bio_split);
+				ci.io->bio = b;
+				bio_advance(bio, (bio_sectors(bio) - ci.sector_count) << 9);
+				bio_chain(b, bio);
+				generic_make_request(bio);
+				break;
+			}
+		}
 	}
 
 	/* drop the extra reference count */
@@ -1511,8 +1532,8 @@ static void __split_and_process_bio(stru
  *---------------------------------------------------------------*/
 
 /*
- * The request function that just remaps the bio built up by
- * dm_merge_bvec.
+ * The request function that remaps the bio to one target and
+ * splits off any remainder.
  */
 static blk_qc_t dm_make_request(struct request_queue *q, struct bio *bio)
 {
@@ -2035,12 +2056,6 @@ int dm_setup_md_queue(struct mapped_devi
 	case DM_TYPE_DAX_BIO_BASED:
 		dm_init_normal_md_queue(md);
 		blk_queue_make_request(md->queue, dm_make_request);
-		/*
-		 * DM handles splitting bios as needed.  Free the bio_split bioset
-		 * since it won't be used (saves 1 process per bio-based DM device).
-		 */
-		bioset_free(md->queue->bio_split);
-		md->queue->bio_split = NULL;
 
 		if (type == DM_TYPE_DAX_BIO_BASED)
 			queue_flag_set_unlocked(QUEUE_FLAG_DAX, md->queue);


Patches currently in stable-queue which might be from neilb@suse.com are

queue-4.15/dm-ensure-bio-submission-follows-a-depth-first-tree-walk.patch

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Patch "dm: ensure bio submission follows a depth-first tree walk" has been added to the 4.15-stable tree
  2018-03-22 13:06 Patch "dm: ensure bio submission follows a depth-first tree walk" has been added to the 4.15-stable tree gregkh
@ 2018-03-22 13:47 ` Mike Snitzer
  2018-03-22 13:56   ` Mikulas Patocka
  2018-03-22 14:01   ` Greg KH
  0 siblings, 2 replies; 4+ messages in thread
From: Mike Snitzer @ 2018-03-22 13:47 UTC (permalink / raw)
  To: gregkh
  Cc: neilb, alexander.levin, stable, stable-commits, dm-devel,
	Mikulas Patocka

On Thu, Mar 22 2018 at  9:06am -0400,
gregkh@linuxfoundation.org <gregkh@linuxfoundation.org> wrote:

> 
> This is a note to let you know that I've just added the patch titled
> 
>     dm: ensure bio submission follows a depth-first tree walk
> 
> to the 4.15-stable tree which can be found at:
>     http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
> 
> The filename of the patch is:
>      dm-ensure-bio-submission-follows-a-depth-first-tree-walk.patch
> and it can be found in the queue-4.15 subdirectory.
> 
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable@vger.kernel.org> know about it.

The following applies to 4.14 stable@ too.

I think it very questionable to pull this into stable trees.  How'd it
even elevate to be considered for stable@?

But if you do, you definitely need this additional upstream commit:
8dd601fa8317 ("dm: correctly handle chained bios in dec_pending()")

Even with that I'm not excited about supporting some partial backport of
these changes because there were a lot more related changes -- only
taking a subset makes these stable@ kernels unicorns.. I deal with
enough unicorn vendor kernels (but on my terms, with my control over
what is "supportable"):

Anyway, other related commits are:
80cd17578310 dm crypt: remove BIOSET_NEED_RESCUER flag
c110a4b6e603 dm io: remove BIOSET_NEED_RESCUER flag from bios bioset
f31c21e4365c dm: remove unused 'num_write_bios' target interface
318716ddea08 dm: safely allocate multiple bioset bios
4a3f54d94d5c dm: remove BIOSET_NEED_RESCUER based dm_offload infrastructure
0776aa0e30aa dm: ensure bio-based DM's bioset and io_pool support targets' maximum IOs
3d7f45625a84 dm: fix __send_changing_extent_only() to send first bio and chain remainder

So NAK from me.  If others have a compelling argument I'm open to
considering.

Mike

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Patch "dm: ensure bio submission follows a depth-first tree walk" has been added to the 4.15-stable tree
  2018-03-22 13:47 ` Mike Snitzer
@ 2018-03-22 13:56   ` Mikulas Patocka
  2018-03-22 14:01   ` Greg KH
  1 sibling, 0 replies; 4+ messages in thread
From: Mikulas Patocka @ 2018-03-22 13:56 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: gregkh, neilb, alexander.levin, stable, stable-commits, dm-devel

I agree.

The patch "dm: ensure bio submission follows a depth-first tree walk" 
doesn't really fix any bug (and it may introduce new bugs), so it isn't 
applicable to the stable kernel branches.

The kernel 4.15 and below have a "dm_offload" mechanism that avoids the 
possible deadlocks in dm, there is no reason why this should be converted 
to the "depth-first tree walk".

Mikulas


On Thu, 22 Mar 2018, Mike Snitzer wrote:

> On Thu, Mar 22 2018 at  9:06am -0400,
> gregkh@linuxfoundation.org <gregkh@linuxfoundation.org> wrote:
> 
> > 
> > This is a note to let you know that I've just added the patch titled
> > 
> >     dm: ensure bio submission follows a depth-first tree walk
> > 
> > to the 4.15-stable tree which can be found at:
> >     http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
> > 
> > The filename of the patch is:
> >      dm-ensure-bio-submission-follows-a-depth-first-tree-walk.patch
> > and it can be found in the queue-4.15 subdirectory.
> > 
> > If you, or anyone else, feels it should not be added to the stable tree,
> > please let <stable@vger.kernel.org> know about it.
> 
> The following applies to 4.14 stable@ too.
> 
> I think it very questionable to pull this into stable trees.  How'd it
> even elevate to be considered for stable@?
> 
> But if you do, you definitely need this additional upstream commit:
> 8dd601fa8317 ("dm: correctly handle chained bios in dec_pending()")
> 
> Even with that I'm not excited about supporting some partial backport of
> these changes because there were a lot more related changes -- only
> taking a subset makes these stable@ kernels unicorns.. I deal with
> enough unicorn vendor kernels (but on my terms, with my control over
> what is "supportable"):
> 
> Anyway, other related commits are:
> 80cd17578310 dm crypt: remove BIOSET_NEED_RESCUER flag
> c110a4b6e603 dm io: remove BIOSET_NEED_RESCUER flag from bios bioset
> f31c21e4365c dm: remove unused 'num_write_bios' target interface
> 318716ddea08 dm: safely allocate multiple bioset bios
> 4a3f54d94d5c dm: remove BIOSET_NEED_RESCUER based dm_offload infrastructure
> 0776aa0e30aa dm: ensure bio-based DM's bioset and io_pool support targets' maximum IOs
> 3d7f45625a84 dm: fix __send_changing_extent_only() to send first bio and chain remainder
> 
> So NAK from me.  If others have a compelling argument I'm open to
> considering.
> 
> Mike
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Patch "dm: ensure bio submission follows a depth-first tree walk" has been added to the 4.15-stable tree
  2018-03-22 13:47 ` Mike Snitzer
  2018-03-22 13:56   ` Mikulas Patocka
@ 2018-03-22 14:01   ` Greg KH
  1 sibling, 0 replies; 4+ messages in thread
From: Greg KH @ 2018-03-22 14:01 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: neilb, alexander.levin, stable, stable-commits, dm-devel,
	Mikulas Patocka

On Thu, Mar 22, 2018 at 09:47:59AM -0400, Mike Snitzer wrote:
> On Thu, Mar 22 2018 at  9:06am -0400,
> gregkh@linuxfoundation.org <gregkh@linuxfoundation.org> wrote:
> 
> > 
> > This is a note to let you know that I've just added the patch titled
> > 
> >     dm: ensure bio submission follows a depth-first tree walk
> > 
> > to the 4.15-stable tree which can be found at:
> >     http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
> > 
> > The filename of the patch is:
> >      dm-ensure-bio-submission-follows-a-depth-first-tree-walk.patch
> > and it can be found in the queue-4.15 subdirectory.
> > 
> > If you, or anyone else, feels it should not be added to the stable tree,
> > please let <stable@vger.kernel.org> know about it.
> 
> The following applies to 4.14 stable@ too.
> 
> I think it very questionable to pull this into stable trees.  How'd it
> even elevate to be considered for stable@?
> 
> But if you do, you definitely need this additional upstream commit:
> 8dd601fa8317 ("dm: correctly handle chained bios in dec_pending()")
> 
> Even with that I'm not excited about supporting some partial backport of
> these changes because there were a lot more related changes -- only
> taking a subset makes these stable@ kernels unicorns.. I deal with
> enough unicorn vendor kernels (but on my terms, with my control over
> what is "supportable"):
> 
> Anyway, other related commits are:
> 80cd17578310 dm crypt: remove BIOSET_NEED_RESCUER flag
> c110a4b6e603 dm io: remove BIOSET_NEED_RESCUER flag from bios bioset
> f31c21e4365c dm: remove unused 'num_write_bios' target interface
> 318716ddea08 dm: safely allocate multiple bioset bios
> 4a3f54d94d5c dm: remove BIOSET_NEED_RESCUER based dm_offload infrastructure
> 0776aa0e30aa dm: ensure bio-based DM's bioset and io_pool support targets' maximum IOs
> 3d7f45625a84 dm: fix __send_changing_extent_only() to send first bio and chain remainder
> 
> So NAK from me.  If others have a compelling argument I'm open to
> considering.

Thanks for letting me know, I've now dropped this from the 4.15.y,
4.14.y, and 4.9.y stable patch queues.

greg k-h

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-03-22 14:02 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-03-22 13:06 Patch "dm: ensure bio submission follows a depth-first tree walk" has been added to the 4.15-stable tree gregkh
2018-03-22 13:47 ` Mike Snitzer
2018-03-22 13:56   ` Mikulas Patocka
2018-03-22 14:01   ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).