linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: NeilBrown <neilb@suse.com>, Jens Axboe <axboe@kernel.dk>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	linux-block@vger.kernel.org,
	device-mapper development <dm-devel@redhat.com>,
	Zdenek Kabelac <zkabelac@redhat.com>
Subject: Re: new patchset to eliminate DM's use of BIOSET_NEED_RESCUER
Date: Tue, 21 Nov 2017 21:32:03 -0500	[thread overview]
Message-ID: <20171122023203.GA20417@redhat.com> (raw)
In-Reply-To: <alpine.LRH.2.02.1711212009040.28456@file01.intranet.prod.int.rdu2.redhat.com>

On Tue, Nov 21 2017 at  8:21pm -0500,
Mikulas Patocka <mpatocka@redhat.com> wrote:

> 
> 
> On Tue, 21 Nov 2017, Mike Snitzer wrote:
> 
> > On Tue, Nov 21 2017 at  4:23pm -0500,
> > Mikulas Patocka <mpatocka@redhat.com> wrote:
> > 
> > > This is not correct:
> > > 
> > >    2206 static void dm_wq_work(struct work_struct *work)
> > >    2207 {
> > >    2208         struct mapped_device *md = container_of(work, struct mapped_device, work);
> > >    2209         struct bio *bio;
> > >    2210         int srcu_idx;
> > >    2211         struct dm_table *map;
> > >    2212
> > >    2213         if (!bio_list_empty(&md->rescued)) {
> > >    2214                 struct bio_list list;
> > >    2215                 spin_lock_irq(&md->deferred_lock);
> > >    2216                 list = md->rescued;
> > >    2217                 bio_list_init(&md->rescued);
> > >    2218                 spin_unlock_irq(&md->deferred_lock);
> > >    2219                 while ((bio = bio_list_pop(&list)))
> > >    2220                         generic_make_request(bio);
> > >    2221         }
> > >    2222
> > >    2223         map = dm_get_live_table(md, &srcu_idx);
> > >    2224
> > >    2225         while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
> > >    2226                 spin_lock_irq(&md->deferred_lock);
> > >    2227                 bio = bio_list_pop(&md->deferred);
> > >    2228                 spin_unlock_irq(&md->deferred_lock);
> > >    2229
> > >    2230                 if (!bio)
> > >    2231                         break;
> > >    2232
> > >    2233                 if (dm_request_based(md))
> > >    2234                         generic_make_request(bio);
> > >    2235                 else
> > >    2236                         __split_and_process_bio(md, map, bio);
> > >    2237         }
> > >    2238
> > >    2239         dm_put_live_table(md, srcu_idx);
> > >    2240 }
> > > 
> > > You can see that if we are in dm_wq_work in __split_and_process_bio, we 
> > > will not process md->rescued list.
> > 
> > Can you elaborate further?  We cannot be "in dm_wq_work in
> > __split_and_process_bio" simultaneously.  Do you mean as a side-effect
> > of scheduling away from __split_and_process_bio?
> > 
> > The more detail you can share the better.
> 
> Suppose this scenario:
> 
> * dm_wq_work calls __split_and_process_bio

Right, I later realized this was the call chain you were referring to.
Not sure how I missed it the first time around.

> * __split_and_process_bio eventually reaches the function snapshot_map
> * snapshot_map attempts to take the snapshot lock
> 
> * the snapshot lock could be released only if some bios submitted by the 
> snapshot driver to the underlying device complete
> * the bios submitted to the underlying device were already offloaded by 
> some other task and they are waiting on the list md->rescued
> * the bios waiting on md->rescued are not processed, because dm_wq_work is 
> blocked in snapshot_map (called from __split_and_process_bio)

SO you're saying the case that Neil doesn't think should happen:
https://lkml.org/lkml/2017/11/21/658
...can happen.

> > > The processing of md->rescued is also wrong - bios for different devices 
> > > must be offloaded to different helper threads, so that processing a bio 
> > > for a lower device doesn't depend on processing a bio for a higher device. 
> > > If you offload all the bios on current->bio_list to the same thread, the 
> > > bios still depend on each other and the deadlock will still happen.
> > 
> > Commit 325738403 ("dm: revise 'rescue' strategy for bio-based bioset
> > allocations") speaks to this with:
> > 
> > "Note that only current->bio_list[0] is offloaded.  current->bio_list[1]
> > contains bios that were scheduled *before* the current one started, so
> > they must have been submitted from higher up the stack, and we cannot be
> > waiting for them here (thanks to the "dm: ensure bio submission follows
> > a depth-first tree walk" commit).  Also, we now rescue *all* bios on the
> > list as there is nothing to be gained by being more selective."
> 
> I think you are right - if we only offload current->bio_list[0], then 
> mixing of dependent bios on the offloaded list won't happen.
> 
> > And again: this patchset passes your dm-snapshot deadlock test.  Is
> > that test somehow lacking?
> 
> With your patchset, the deadlock would happen only if bios are queued on 
> &md->deferred - and that happens only in case of resume or if we are 
> processing REQ_PREFLUSH with non-zero data size.
> 
> So, the simple test that I wrote doesn't trigger it, but a more complex 
> test involving REQ_PREFLUSH could.

Makes sense.  But I need to think further about _why_ bios submitted to
the snapshot driver's underlying device would end up on md->rescued
(like you suggested above).  Again, Neil thinks it not possible.  Neil
said:
"they will not be recursive calls, so nothing will be added to
current->bio_list[0] and nothing will be moved to md->rescued.  Each
generic_make_request() will completely submit the request in the lower
level devel."

Mike

  reply	other threads:[~2017-11-22  2:32 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-18  4:38 [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
2017-06-18  4:38 ` [PATCH 04/13] blk: use non-rescuing bioset for q->bio_split NeilBrown
2017-06-18  4:38 ` [PATCH 01/13] blk: remove bio_set arg from blk_queue_split() NeilBrown
2017-06-18  4:38 ` [PATCH 03/13] blk: make the bioset rescue_workqueue optional NeilBrown
2017-06-18  4:38 ` [PATCH 02/13] blk: replace bioset_create_nobvec() with a flags arg to bioset_create() NeilBrown
2017-06-18  4:38 ` [PATCH 06/13] rbd: use bio_clone_fast() instead of bio_clone() NeilBrown
2017-06-18  4:38 ` [PATCH 08/13] pktcdvd: " NeilBrown
2017-06-18  4:38 ` [PATCH 05/13] block: Improvements to bounce-buffer handling NeilBrown
2017-06-18  4:38 ` [PATCH 07/13] drbd: use bio_clone_fast() instead of bio_clone() NeilBrown
2017-06-18  4:38 ` [PATCH 09/13] lightnvm/pblk-read: use bio_clone_fast() NeilBrown
2017-06-18  4:38 ` [PATCH 13/13] block: don't check for BIO_MAX_PAGES in blk_bio_segment_split() NeilBrown
2017-06-18  4:38 ` [PATCH 11/13] bcache: use kmalloc to allocate bio in bch_data_verify() NeilBrown
2017-06-18  4:38 ` [PATCH 10/13] xen-blkfront: remove bio splitting NeilBrown
2017-06-18  4:38 ` [PATCH 12/13] block: remove bio_clone() and all references NeilBrown
2017-06-18 18:41 ` [PATCH 00/13] block: assorted cleanup for bio splitting and cloning Jens Axboe
2017-06-18 21:36   ` NeilBrown
2017-11-20 16:43     ` Mike Snitzer
2017-11-21  0:34       ` [dm-devel] " NeilBrown
2017-11-21  1:35         ` Mike Snitzer
2017-11-21 12:10           ` Mike Snitzer
2017-11-21 12:43             ` Mike Snitzer
2017-11-21 19:47               ` new patchset to eliminate DM's use of BIOSET_NEED_RESCUER [was: Re: [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.] Mike Snitzer
2017-11-21 21:23                 ` [dm-devel] " Mikulas Patocka
2017-11-21 22:51                   ` new patchset to eliminate DM's use of BIOSET_NEED_RESCUER Mike Snitzer
2017-11-22  1:21                     ` Mikulas Patocka
2017-11-22  2:32                       ` Mike Snitzer [this message]
2017-11-22  4:00                       ` [dm-devel] " NeilBrown
2017-11-22  4:28                         ` Mike Snitzer
2017-11-22 21:18                           ` Mike Snitzer
2017-11-22 18:24                         ` [dm-devel] " Mikulas Patocka
2017-11-22 18:49                           ` Mike Snitzer
2017-11-23  5:12                           ` [dm-devel] " NeilBrown
2017-11-23 22:52                           ` [PATCH] dm: use cloned bio as head, not remainder, in __split_and_process_bio() NeilBrown
2017-11-27 14:23                             ` Mike Snitzer
2017-11-28 22:18                               ` [dm-devel] " NeilBrown
2017-11-21 23:03                   ` [dm-devel] new patchset to eliminate DM's use of BIOSET_NEED_RESCUER [was: Re: [PATCH 00/13] block: assorted cleanup for bio splitting and cloning.] NeilBrown
2017-11-21 19:44             ` [dm-devel] [PATCH 00/13] block: assorted cleanup for bio splitting and cloning NeilBrown
2017-11-21 19:50               ` Mike Snitzer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171122023203.GA20417@redhat.com \
    --to=snitzer@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@redhat.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpatocka@redhat.com \
    --cc=neilb@suse.com \
    --cc=zkabelac@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).