From: Mike Snitzer <snitzer@redhat.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: NeilBrown <neilb@suse.com>, Jens Axboe <axboe@kernel.dk>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
linux-block@vger.kernel.org,
device-mapper development <dm-devel@redhat.com>,
Zdenek Kabelac <zkabelac@redhat.com>
Subject: Re: new patchset to eliminate DM's use of BIOSET_NEED_RESCUER
Date: Tue, 21 Nov 2017 21:32:03 -0500 [thread overview]
Message-ID: <20171122023203.GA20417@redhat.com> (raw)
In-Reply-To: <alpine.LRH.2.02.1711212009040.28456@file01.intranet.prod.int.rdu2.redhat.com>
On Tue, Nov 21 2017 at 8:21pm -0500,
Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
> On Tue, 21 Nov 2017, Mike Snitzer wrote:
>
> > On Tue, Nov 21 2017 at 4:23pm -0500,
> > Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> > > This is not correct:
> > >
> > > 2206 static void dm_wq_work(struct work_struct *work)
> > > 2207 {
> > > 2208 struct mapped_device *md = container_of(work, struct mapped_device, work);
> > > 2209 struct bio *bio;
> > > 2210 int srcu_idx;
> > > 2211 struct dm_table *map;
> > > 2212
> > > 2213 if (!bio_list_empty(&md->rescued)) {
> > > 2214 struct bio_list list;
> > > 2215 spin_lock_irq(&md->deferred_lock);
> > > 2216 list = md->rescued;
> > > 2217 bio_list_init(&md->rescued);
> > > 2218 spin_unlock_irq(&md->deferred_lock);
> > > 2219 while ((bio = bio_list_pop(&list)))
> > > 2220 generic_make_request(bio);
> > > 2221 }
> > > 2222
> > > 2223 map = dm_get_live_table(md, &srcu_idx);
> > > 2224
> > > 2225 while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
> > > 2226 spin_lock_irq(&md->deferred_lock);
> > > 2227 bio = bio_list_pop(&md->deferred);
> > > 2228 spin_unlock_irq(&md->deferred_lock);
> > > 2229
> > > 2230 if (!bio)
> > > 2231 break;
> > > 2232
> > > 2233 if (dm_request_based(md))
> > > 2234 generic_make_request(bio);
> > > 2235 else
> > > 2236 __split_and_process_bio(md, map, bio);
> > > 2237 }
> > > 2238
> > > 2239 dm_put_live_table(md, srcu_idx);
> > > 2240 }
> > >
> > > You can see that if we are in dm_wq_work in __split_and_process_bio, we
> > > will not process md->rescued list.
> >
> > Can you elaborate further? We cannot be "in dm_wq_work in
> > __split_and_process_bio" simultaneously. Do you mean as a side-effect
> > of scheduling away from __split_and_process_bio?
> >
> > The more detail you can share the better.
>
> Suppose this scenario:
>
> * dm_wq_work calls __split_and_process_bio
Right, I later realized this was the call chain you were referring to.
Not sure how I missed it the first time around.
> * __split_and_process_bio eventually reaches the function snapshot_map
> * snapshot_map attempts to take the snapshot lock
>
> * the snapshot lock could be released only if some bios submitted by the
> snapshot driver to the underlying device complete
> * the bios submitted to the underlying device were already offloaded by
> some other task and they are waiting on the list md->rescued
> * the bios waiting on md->rescued are not processed, because dm_wq_work is
> blocked in snapshot_map (called from __split_and_process_bio)
SO you're saying the case that Neil doesn't think should happen:
https://lkml.org/lkml/2017/11/21/658
...can happen.
> > > The processing of md->rescued is also wrong - bios for different devices
> > > must be offloaded to different helper threads, so that processing a bio
> > > for a lower device doesn't depend on processing a bio for a higher device.
> > > If you offload all the bios on current->bio_list to the same thread, the
> > > bios still depend on each other and the deadlock will still happen.
> >
> > Commit 325738403 ("dm: revise 'rescue' strategy for bio-based bioset
> > allocations") speaks to this with:
> >
> > "Note that only current->bio_list[0] is offloaded. current->bio_list[1]
> > contains bios that were scheduled *before* the current one started, so
> > they must have been submitted from higher up the stack, and we cannot be
> > waiting for them here (thanks to the "dm: ensure bio submission follows
> > a depth-first tree walk" commit). Also, we now rescue *all* bios on the
> > list as there is nothing to be gained by being more selective."
>
> I think you are right - if we only offload current->bio_list[0], then
> mixing of dependent bios on the offloaded list won't happen.
>
> > And again: this patchset passes your dm-snapshot deadlock test. Is
> > that test somehow lacking?
>
> With your patchset, the deadlock would happen only if bios are queued on
> &md->deferred - and that happens only in case of resume or if we are
> processing REQ_PREFLUSH with non-zero data size.
>
> So, the simple test that I wrote doesn't trigger it, but a more complex
> test involving REQ_PREFLUSH could.
Makes sense. But I need to think further about _why_ bios submitted to
the snapshot driver's underlying device would end up on md->rescued
(like you suggested above). Again, Neil thinks it not possible. Neil
said:
"they will not be recursive calls, so nothing will be added to
current->bio_list[0] and nothing will be moved to md->rescued. Each
generic_make_request() will completely submit the request in the lower
level devel."
Mike
next prev parent reply other threads:[~2017-11-22 2:32 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-06-18 4:38 [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
2017-06-18 4:38 ` [PATCH 04/13] blk: use non-rescuing bioset for q->bio_split NeilBrown
2017-06-18 4:38 ` [PATCH 02/13] blk: replace bioset_create_nobvec() with a flags arg to bioset_create() NeilBrown
2017-06-18 4:38 ` [PATCH 03/13] blk: make the bioset rescue_workqueue optional NeilBrown
2017-06-18 4:38 ` [PATCH 01/13] blk: remove bio_set arg from blk_queue_split() NeilBrown
2017-06-18 4:38 ` [PATCH 09/13] lightnvm/pblk-read: use bio_clone_fast() NeilBrown
2017-06-18 4:38 ` [PATCH 06/13] rbd: use bio_clone_fast() instead of bio_clone() NeilBrown
2017-06-18 4:38 ` [PATCH 07/13] drbd: " NeilBrown
2017-06-18 4:38 ` [PATCH 08/13] pktcdvd: " NeilBrown
2017-06-18 4:38 ` [PATCH 05/13] block: Improvements to bounce-buffer handling NeilBrown
2017-06-18 4:38 ` [PATCH 12/13] block: remove bio_clone() and all references NeilBrown
2017-06-18 4:38 ` [PATCH 10/13] xen-blkfront: remove bio splitting NeilBrown
2017-06-18 4:38 ` [PATCH 13/13] block: don't check for BIO_MAX_PAGES in blk_bio_segment_split() NeilBrown
2017-06-18 4:38 ` [PATCH 11/13] bcache: use kmalloc to allocate bio in bch_data_verify() NeilBrown
2017-06-18 18:41 ` [PATCH 00/13] block: assorted cleanup for bio splitting and cloning Jens Axboe
2017-06-18 21:36 ` NeilBrown
2017-11-20 16:43 ` Mike Snitzer
2017-11-21 0:34 ` [dm-devel] " NeilBrown
2017-11-21 0:34 ` NeilBrown
2017-11-21 0:34 ` NeilBrown
2017-11-21 1:35 ` Mike Snitzer
2017-11-21 1:35 ` Mike Snitzer
2017-11-21 12:10 ` Mike Snitzer
2017-11-21 12:10 ` Mike Snitzer
2017-11-21 12:43 ` Mike Snitzer
2017-11-21 12:43 ` Mike Snitzer
2017-11-21 19:47 ` new patchset to eliminate DM's use of BIOSET_NEED_RESCUER [was: Re: [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.] Mike Snitzer
2017-11-21 21:23 ` [dm-devel] " Mikulas Patocka
2017-11-21 22:51 ` new patchset to eliminate DM's use of BIOSET_NEED_RESCUER Mike Snitzer
2017-11-22 1:21 ` Mikulas Patocka
2017-11-22 2:32 ` Mike Snitzer [this message]
2017-11-22 4:00 ` [dm-devel] " NeilBrown
2017-11-22 4:00 ` NeilBrown
2017-11-22 4:00 ` NeilBrown
2017-11-22 4:28 ` Mike Snitzer
2017-11-22 4:28 ` Mike Snitzer
2017-11-22 21:18 ` Mike Snitzer
2017-11-22 18:24 ` [dm-devel] " Mikulas Patocka
2017-11-22 18:49 ` Mike Snitzer
2017-11-23 5:12 ` [dm-devel] " NeilBrown
2017-11-23 5:12 ` NeilBrown
2017-11-23 22:52 ` [PATCH] dm: use cloned bio as head, not remainder, in __split_and_process_bio() NeilBrown
2017-11-23 22:52 ` NeilBrown
2017-11-27 14:23 ` Mike Snitzer
2017-11-28 22:18 ` [dm-devel] " NeilBrown
2017-11-28 22:18 ` NeilBrown
2017-11-21 23:03 ` [dm-devel] new patchset to eliminate DM's use of BIOSET_NEED_RESCUER [was: Re: [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.] NeilBrown
2017-11-21 23:03 ` NeilBrown
2017-11-21 19:44 ` [dm-devel] [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
2017-11-21 19:44 ` NeilBrown
2017-11-21 19:44 ` NeilBrown
2017-11-21 19:50 ` Mike Snitzer
2017-11-21 19:50 ` Mike Snitzer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171122023203.GA20417@redhat.com \
--to=snitzer@redhat.com \
--cc=axboe@kernel.dk \
--cc=dm-devel@redhat.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mpatocka@redhat.com \
--cc=neilb@suse.com \
--cc=zkabelac@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.