From: Mike Snitzer <snitzer@redhat.com>
To: gregkh@linuxfoundation.org
Cc: neilb@suse.com, alexander.levin@microsoft.com,
stable@vger.kernel.org, stable-commits@vger.kernel.org
Subject: Re: Patch "dm: ensure bio submission follows a depth-first tree walk" has been added to the 4.14-stable tree
Date: Thu, 22 Mar 2018 09:49:09 -0400 [thread overview]
Message-ID: <20180322134909.GC27235@redhat.com> (raw)
In-Reply-To: <152172538955144@kroah.com>
Please see my reply to:
Patch "dm: ensure bio submission follows a depth-first tree walk" has
been added to the 4.15-stable tree
NAK
On Thu, Mar 22 2018 at 9:29am -0400,
gregkh@linuxfoundation.org <gregkh@linuxfoundation.org> wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> dm: ensure bio submission follows a depth-first tree walk
>
> to the 4.14-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
>
> The filename of the patch is:
> dm-ensure-bio-submission-follows-a-depth-first-tree-walk.patch
> and it can be found in the queue-4.14 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable@vger.kernel.org> know about it.
>
>
> From foo@baz Thu Mar 22 14:26:48 CET 2018
> From: NeilBrown <neilb@suse.com>
> Date: Wed, 6 Sep 2017 09:43:28 +1000
> Subject: dm: ensure bio submission follows a depth-first tree walk
>
> From: NeilBrown <neilb@suse.com>
>
>
> [ Upstream commit 18a25da84354c6bb655320de6072c00eda6eb602 ]
>
> A dm device can, in general, represent a tree of targets, each of which
> handles a sub-range of the range of blocks handled by the parent.
>
> The bio sequencing managed by generic_make_request() requires that bios
> are generated and handled in a depth-first manner. Each call to a
> make_request_fn() may submit bios to a single member device, and may
> submit bios for a reduced region of the same device as the
> make_request_fn.
>
> In particular, any bios submitted to member devices must be expected to
> be processed in order, so a later one must never wait for an earlier
> one.
>
> This ordering is usually achieved by using bio_split() to reduce a bio
> to a size that can be completely handled by one target, and resubmitting
> the remainder to the originating device. bio_queue_split() shows the
> canonical approach.
>
> dm doesn't follow this approach, largely because it has needed to split
> bios since long before bio_split() was available. It currently can
> submit bios to separate targets within the one dm_make_request() call.
> Dependencies between these targets, as can happen with dm-snap, can
> cause deadlocks if either bios gets stuck behind the other in the queues
> managed by generic_make_request(). This requires the 'rescue'
> functionality provided by dm_offload_{start,end}.
>
> Some of this requirement can be removed by changing the order of bio
> submission to follow the canonical approach. That is, if dm finds that
> it needs to split a bio, the remainder should be sent to
> generic_make_request() rather than being handled immediately. This
> delays the handling until the first part is completely processed, so the
> deadlock problems do not occur.
>
> __split_and_process_bio() can be called both from dm_make_request() and
> from dm_wq_work(). When called from dm_wq_work() the current approach
> is perfectly satisfactory as each bio will be processed immediately.
> When called from dm_make_request(), current->bio_list will be non-NULL,
> and in this case it is best to create a separate "clone" bio for the
> remainder.
>
> When we use bio_clone_bioset() to split off the front part of a bio
> and chain the two together and submit the remainder to
> generic_make_request(), it is important that the newly allocated
> bio is used as the head to be processed immediately, and the original
> bio gets "bio_advance()"d and sent to generic_make_request() as the
> remainder. Otherwise, if the newly allocated bio is used as the
> remainder, and if it then needs to be split again, then the next
> bio_clone_bioset() call will be made while holding a reference a bio
> (result of the first clone) from the same bioset. This can potentially
> exhaust the bioset mempool and result in a memory allocation deadlock.
>
> Note that there is no race caused by reassigning cio.io->bio after already
> calling __map_bio(). This bio will only be dereferenced again after
> dec_pending() has found io->io_count to be zero, and this cannot happen
> before the dec_pending() call at the end of __split_and_process_bio().
>
> To provide the clone bio when splitting, we use q->bio_split. This
> was previously being freed by bio-based dm to avoid having excess
> rescuer threads. As bio_split bio sets no longer create rescuer
> threads, there is little cost and much gain from restoring the
> q->bio_split bio set.
>
> Signed-off-by: NeilBrown <neilb@suse.com>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
> drivers/md/dm.c | 33 ++++++++++++++++++++++++---------
> 1 file changed, 24 insertions(+), 9 deletions(-)
>
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1497,8 +1497,29 @@ static void __split_and_process_bio(stru
> } else {
> ci.bio = bio;
> ci.sector_count = bio_sectors(bio);
> - while (ci.sector_count && !error)
> + while (ci.sector_count && !error) {
> error = __split_and_process_non_flush(&ci);
> + if (current->bio_list && ci.sector_count && !error) {
> + /*
> + * Remainder must be passed to generic_make_request()
> + * so that it gets handled *after* bios already submitted
> + * have been completely processed.
> + * We take a clone of the original to store in
> + * ci.io->bio to be used by end_io_acct() and
> + * for dec_pending to use for completion handling.
> + * As this path is not used for REQ_OP_ZONE_REPORT,
> + * the usage of io->bio in dm_remap_zone_report()
> + * won't be affected by this reassignment.
> + */
> + struct bio *b = bio_clone_bioset(bio, GFP_NOIO,
> + md->queue->bio_split);
> + ci.io->bio = b;
> + bio_advance(bio, (bio_sectors(bio) - ci.sector_count) << 9);
> + bio_chain(b, bio);
> + generic_make_request(bio);
> + break;
> + }
> + }
> }
>
> /* drop the extra reference count */
> @@ -1509,8 +1530,8 @@ static void __split_and_process_bio(stru
> *---------------------------------------------------------------*/
>
> /*
> - * The request function that just remaps the bio built up by
> - * dm_merge_bvec.
> + * The request function that remaps the bio to one target and
> + * splits off any remainder.
> */
> static blk_qc_t dm_make_request(struct request_queue *q, struct bio *bio)
> {
> @@ -2044,12 +2065,6 @@ int dm_setup_md_queue(struct mapped_devi
> case DM_TYPE_DAX_BIO_BASED:
> dm_init_normal_md_queue(md);
> blk_queue_make_request(md->queue, dm_make_request);
> - /*
> - * DM handles splitting bios as needed. Free the bio_split bioset
> - * since it won't be used (saves 1 process per bio-based DM device).
> - */
> - bioset_free(md->queue->bio_split);
> - md->queue->bio_split = NULL;
>
> if (type == DM_TYPE_DAX_BIO_BASED)
> queue_flag_set_unlocked(QUEUE_FLAG_DAX, md->queue);
>
>
> Patches currently in stable-queue which might be from neilb@suse.com are
>
> queue-4.14/dm-ensure-bio-submission-follows-a-depth-first-tree-walk.patch
next prev parent reply other threads:[~2018-03-22 13:49 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-03-22 13:29 Patch "dm: ensure bio submission follows a depth-first tree walk" has been added to the 4.14-stable tree gregkh
2018-03-22 13:49 ` Mike Snitzer [this message]
2018-03-22 14:01 ` Greg KH
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180322134909.GC27235@redhat.com \
--to=snitzer@redhat.com \
--cc=alexander.levin@microsoft.com \
--cc=gregkh@linuxfoundation.org \
--cc=neilb@suse.com \
--cc=stable-commits@vger.kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.