From: Mike Snitzer <snitzer@redhat.com>
To: NeilBrown <neilb@suse.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>,
Jens Axboe <axboe@kernel.dk>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
linux-block@vger.kernel.org,
device-mapper development <dm-devel@redhat.com>,
Zdenek Kabelac <zkabelac@redhat.com>
Subject: Re: new patchset to eliminate DM's use of BIOSET_NEED_RESCUER
Date: Tue, 21 Nov 2017 23:28:38 -0500
Message-ID: <20171122042838.GB20417@redhat.com>
In-Reply-To: <87bmjv0xos.fsf@notabene.neil.brown.name>

On Tue, Nov 21 2017 at 11:00pm -0500,
NeilBrown <neilb@suse.com> wrote:

> On Tue, Nov 21 2017, Mikulas Patocka wrote:
>
> > On Tue, 21 Nov 2017, Mike Snitzer wrote:
> >
> >> On Tue, Nov 21 2017 at 4:23pm -0500,
> >> Mikulas Patocka <mpatocka@redhat.com> wrote:
> >>
> >> > This is not correct:
> >> >
> >> > static void dm_wq_work(struct work_struct *work)
> >> > {
> >> > 	struct mapped_device *md = container_of(work, struct mapped_device, work);
> >> > 	struct bio *bio;
> >> > 	int srcu_idx;
> >> > 	struct dm_table *map;
> >> >
> >> > 	if (!bio_list_empty(&md->rescued)) {
> >> > 		struct bio_list list;
> >> > 		spin_lock_irq(&md->deferred_lock);
> >> > 		list = md->rescued;
> >> > 		bio_list_init(&md->rescued);
> >> > 		spin_unlock_irq(&md->deferred_lock);
> >> > 		while ((bio = bio_list_pop(&list)))
> >> > 			generic_make_request(bio);
> >> > 	}
> >> >
> >> > 	map = dm_get_live_table(md, &srcu_idx);
> >> >
> >> > 	while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
> >> > 		spin_lock_irq(&md->deferred_lock);
> >> > 		bio = bio_list_pop(&md->deferred);
> >> > 		spin_unlock_irq(&md->deferred_lock);
> >> >
> >> > 		if (!bio)
> >> > 			break;
> >> >
> >> > 		if (dm_request_based(md))
> >> > 			generic_make_request(bio);
> >> > 		else
> >> > 			__split_and_process_bio(md, map, bio);
> >> > 	}
> >> >
> >> > 	dm_put_live_table(md, srcu_idx);
> >> > }
> >> >
> >> > You can see that if we are in dm_wq_work in __split_and_process_bio, we
> >> > will not process md->rescued list.
> >>
> >> Can you elaborate further? We cannot be "in dm_wq_work in
> >> __split_and_process_bio" simultaneously. Do you mean as a side-effect
> >> of scheduling away from __split_and_process_bio?
> >>
> >> The more detail you can share the better.
> >
> > Suppose this scenario:
> >
> > * dm_wq_work calls __split_and_process_bio
> > * __split_and_process_bio eventually reaches the function snapshot_map
> > * snapshot_map attempts to take the snapshot lock
> >
> > * the snapshot lock could be released only if some bios submitted by the
> > snapshot driver to the underlying device complete
> > * the bios submitted to the underlying device were already offloaded by
> > some other task and they are waiting on the list md->rescued
> > * the bios waiting on md->rescued are not processed, because dm_wq_work is
> > blocked in snapshot_map (called from __split_and_process_bio)
>
> Yes, I think you are right.
>
> I think the solution is to get rid of the dm_offload() infrastructure
> and make it not necessary.
> i.e. discard my patches
> dm: prepare to discontinue use of BIOSET_NEED_RESCUER
> and
> dm: revise 'rescue' strategy for bio-based bioset allocations
>
> And build on "dm: ensure bio submission follows a depth-first tree walk"
> which was written after those and already makes dm_offload() less
> important.
>
> Since that "depth-first" patch, every request to the dm device, after
> the initial splitting, allocates just one dm_target_io structure, and
> makes just one __map_bio() call, and so will behave exactly the way
> generic_make_request() expects and copes with - thus avoiding awkward
> dependencies and deadlocks. Except....

Yes, FYI I've also verified that even with just the "depth-first" patch
(and dm_offload disabled) the snapshot deadlock is fixed.

> a/ If any target defines ->num_write_bios() to return >1,
> __clone_and_map_data_bio() will make multiple calls to alloc_tio()
> and __map_bio(), which might need rescuing.
> But no target defines num_write_bios, and none have since it was
> removed from dm-cache 4.5 years ago.
> Can we discard num_write_bios??

Yes.

> b/ If any target sets any of num_{flush,discard,write_same,write_zeroes}_bios
> to a value > 1, then __send_duplicate_bios() will also make multiple
> calls to alloc_tio() and __map_bio().
> Some do.
> dm-cache-target: flush=2
> dm-snap: flush=2
> dm-stripe: discard, write_same, write_zeroes all set to 'stripes'.
>
> These will only be a problem if the second (or subsequent) alloc_tio()
> blocks waiting for an earlier allocation to complete. This will only
> be a problem if multiple threads are each trying to allocate multiple
> dm_target_io from the same bioset at the same time.
> This is rare and should be easier to address than the current
> dm_offload() approach.
> One possibility would be to copy the approach taken by
> crypt_alloc_buffer() which needs to allocate multiple entries from a
> mempool.
> It first tries with GFP_NOWAIT. If that fails it takes a mutex and
> tries with GFP_NOIO. This means only one thread will try to allocate
> multiple bios at once, so there can be no deadlock.
>
> Below are two RFC patches. The first removes num_write_bios.
> The second is incomplete and makes a stab at allocating multiple bios
> at once safely.
> A third would be needed to remove dm_offload() etc... but I cannot quite
> fit that in today - must be off.

Great.

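For reference, the "GFP_NOWAIT first, then a mutex plus GFP_NOIO" idea
quoted above can be sketched in userspace roughly as follows. This is
only an illustration under loud assumptions: a counting pool and
pthreads stand in for the bioset/mempool, and every name here
(struct pool, pool_get_many(), POOL_SIZE, ...) is hypothetical and not
taken from dm.c or dm-crypt.

```c
/* Userspace analogue of a two-phase multi-element allocation.
 * All names are hypothetical; nothing here is kernel API. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4

struct pool {
	pthread_mutex_t lock;       /* protects nr_free */
	pthread_cond_t  freed;      /* signalled when an element is returned */
	pthread_mutex_t multi_lock; /* serialises sleeping multi-element allocations */
	int nr_free;
};

void pool_init(struct pool *p)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->freed, NULL);
	pthread_mutex_init(&p->multi_lock, NULL);
	p->nr_free = POOL_SIZE;
}

/* Take one element. With wait == false, fail instead of sleeping
 * (the role GFP_NOWAIT plays in the quoted description). */
bool pool_get(struct pool *p, bool wait)
{
	bool ok = true;

	pthread_mutex_lock(&p->lock);
	while (p->nr_free == 0) {
		if (!wait) {
			ok = false;
			break;
		}
		pthread_cond_wait(&p->freed, &p->lock);
	}
	if (ok)
		p->nr_free--;
	pthread_mutex_unlock(&p->lock);
	return ok;
}

void pool_put(struct pool *p)
{
	pthread_mutex_lock(&p->lock);
	p->nr_free++;
	pthread_cond_signal(&p->freed);
	pthread_mutex_unlock(&p->lock);
}

/* Phase 1: grab n elements without sleeping; give everything back on failure. */
bool pool_get_many_nowait(struct pool *p, int n)
{
	int got;

	for (got = 0; got < n; got++) {
		if (!pool_get(p, false)) {
			while (got--)
				pool_put(p);
			return false;
		}
	}
	return true;
}

/* Phase 2 fallback: only one thread at a time may sleep while holding
 * a partial set, so two partial holders can never wait on each other. */
void pool_get_many(struct pool *p, int n)
{
	int i;

	assert(n <= POOL_SIZE);	/* otherwise even a lone caller could never finish */
	if (pool_get_many_nowait(p, n))
		return;
	pthread_mutex_lock(&p->multi_lock);
	for (i = 0; i < n; i++)
		pool_get(p, true);
	pthread_mutex_unlock(&p->multi_lock);
}
```

The point of the rollback in phase 1 is that a thread waiting for
multi_lock holds nothing, so the one thread inside the slow path only
ever waits on elements that will be returned by completed allocations.
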
> From: NeilBrown <neilb@suse.com>
> Date: Wed, 22 Nov 2017 14:25:18 +1100
> Subject: [PATCH] DM: remove num_write_bios target interface.
>
> No target provides num_write_bios and none has done
> since 2013.
> Having the possibility of num_write_bios > 1 complicates
> bio allocation.
> So remove the interface and assume there is only one bio
> needed.
> If a target ever needs more, it must provide a suitable
> bioset and allocate itself based on its particular needs.
>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
> drivers/md/dm.c | 22 ++++------------------
> include/linux/device-mapper.h | 15 ---------------
> 2 files changed, 4 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index b20febd6cbc7..8c1a05609eea 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1323,27 +1323,13 @@ static int __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti,
> {
> struct bio *bio = ci->bio;
> struct dm_target_io *tio;
> - unsigned target_bio_nr;
> - unsigned num_target_bios = 1;
> int r = 0;
>
> - /*
> - * Does the target want to receive duplicate copies of the bio?
> - */
> - if (bio_data_dir(bio) == WRITE && ti->num_write_bios)
> - num_target_bios = ti->num_write_bios(ti, bio);
> -
> - for (target_bio_nr = 0; target_bio_nr < num_target_bios; target_bio_nr++) {
> - tio = alloc_tio(ci, ti, target_bio_nr);
> - tio->len_ptr = len;
> - r = clone_bio(tio, bio, sector, *len);
> - if (r < 0) {
> - free_tio(tio);
> - break;
> - }
> + tio = alloc_tio(ci, ti, 0);
> + tio->len_ptr = len;
> + r = clone_bio(tio, bio, sector, *len);
> + if (r >= 0)
> __map_bio(tio);
> - }
> -

This bit is wrong: free_tio() is still needed if clone_bio() fails. I can
fix it up, though.
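
A rough userspace sketch of the shape I have in mind, assuming the
single-tio path simply keeps the old error handling. The *_stub
functions, struct tio_stub, live_tios and clone_should_fail are all
hypothetical stand-ins with simplified signatures, not the real
definitions in drivers/md/dm.c:

```c
/* Simplified userspace stand-ins; not the real drivers/md/dm.c code. */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct tio_stub { int mapped; };

int live_tios;         /* allocated and not yet freed or handed off */
int clone_should_fail; /* test knob: force clone_bio_stub() to fail */

struct tio_stub *alloc_tio_stub(void)
{
	struct tio_stub *tio = calloc(1, sizeof(*tio));

	assert(tio);
	live_tios++;
	return tio;
}

void free_tio_stub(struct tio_stub *tio)
{
	live_tios--;
	free(tio);
}

int clone_bio_stub(struct tio_stub *tio)
{
	(void)tio;
	return clone_should_fail ? -ENOMEM : 0;
}

/* Mapping hands the tio off; this stub models that by releasing it. */
void map_bio_stub(struct tio_stub *tio)
{
	tio->mapped = 1;
	live_tios--;
	free(tio);
}

/* One tio, one clone, one map call, with the failure path kept. */
int clone_and_map_one(void)
{
	struct tio_stub *tio = alloc_tio_stub();
	int r = clone_bio_stub(tio);

	if (r < 0) {
		free_tio_stub(tio);	/* the cleanup the simplified hunk dropped */
		return r;
	}
	map_bio_stub(tio);
	return 0;
}
```

With the free on the error path, no tio is left allocated whether or
not the clone step succeeds.
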
I'll work through your patches tomorrow.
Thanks,
Mike