All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: "axboe@kernel.dk" <axboe@kernel.dk>,
	device-mapper development <dm-devel@redhat.com>,
	"hch@lst.de" <hch@lst.de>
Subject: Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
Date: Fri, 2 Sep 2016 11:12:13 -0400	[thread overview]
Message-ID: <20160902151213.GA17508@redhat.com> (raw)
In-Reply-To: <6af010f8-0a8f-cf0e-d819-3b8e1c20b56e@sandisk.com>

On Thu, Sep 01 2016 at  8:03pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 09/01/2016 04:48 PM, Mike Snitzer wrote:
> > On Thu, Sep 01 2016 at  7:17pm -0400,
> > Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >> Sorry that I misread your previous e-mail. After I received your
> >> latest e-mail I rebased my tree on top of the devel.bart branch
> >> mentioned above. My tests still pass. The only two patches in my
> >> tree that are relevant and that are not in the devel.bart branch
> >> have been attached to this e-mail. Did your test involve the sd
> >> driver? If so, do the attached two patches help? If the sd driver
> >> was not involved, can you provide more information about the hang
> >> you ran into? The output and log messages generated by the following
> >> commands after the hang has been reproduced would be very welcome:
> >> * echo w > /proc/sysrq-trigger
> >> * (cd /sys/block && grep -a '' dm*/mq/*/{pending,cpu*/rq_list})
> > 
> > sd is used.  I'll apply those patches and test, tomorrow, but I'm pretty
> > skeptical.
> > 
> > Haven't had any problems with these tests for quite a while.  The tests
> > I'm running are just those in the mptest testsuite, see:
> > https://github.com/snitm/mptest
> > 
> > Running them should be as simple as you doing:
> > 
> > git clone git://github.com/snitm/mptest.git
> > cd mptest
> > ./runtest
> > 
> > The default is to use dm-mq on scsi-mq ontop of tcmloop.
> > 
> > [ ... ]
> 
> Hello Mike,
> 
> If I run mptest on my setup I can reproduce the hang. But what I see is
> that the service-time path selector is in use when the hang is triggered.
> I will patch that path selector in the same way as I did with the
> queue-length path selector and rerun the test.
> 
> # dmsetup table
> 1Linux_scsi_debug_2000: 0 2097152 multipath 3 retain_attached_hw_handler queue_mode mq 1 alua 3 1 service-time 0 1 2 8:128 1 1 service-time 0 1 2 8:144 1 1 service-time 0 1 2 8:112 1 1 
> mp: 0 2097152 multipath 3 retain_attached_hw_handler queue_mode mq 1 alua 2 1 queue-length 0 2 1 8:96 1 8:112 1 queue-length 0 2 1 8:128 1 8:144 1 
> # (cd /sys/block && grep -a '' dm*/mq/*/{pending,cpu*/rq_list}) | grep -v ':$'
> dm-0/mq/0/pending:      ffff880358610000
> dm-1/mq/0/pending:      ffff880358220200
> dm-1/mq/0/pending:      ffff880358220400

In case I haven't been clear: calling blk_mq_freeze_queue() _after_
you've suspended the DM device will only trigger IO that will get
re-queued to the DM device's blk-mq queue.  So you're creating a
livelock since blk_mq_freeze_queue_wait() will not return, see stack
from:
https://www.redhat.com/archives/dm-devel/2016-September/msg00017.html

FYI, the BLK_MQ_S_STOPPED check that you removed from dm_mq_queue_rq()
in this commit...
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=devel.bart&id=69eb3e60e099a6117fc754e70eedd504685326ad
...is effectively serving as this when the device is suspended:

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index b5db523..3bc16dc 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -863,6 +863,9 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
                dm_put_live_table(md, srcu_idx);
        }

+       if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)))
+               return BLK_MQ_RQ_QUEUE_BUSY;
+
        if (ti->type->busy && ti->type->busy(ti))
                return BLK_MQ_RQ_QUEUE_BUSY;

Which is comparable to what the old .request_fn DM does (see:
dm_make_request).

Without that type of suspend check the code will go on to call
dm-mpath.c:multipath_clone_and_map() which will only result in
DM_MAPIO_REQUEUE (and dm_mq_queue_rq returning BLK_MQ_RQ_QUEUE_BUSY in
the case when multipath has no paths, via must_push_back_rq()).

Anyway, not what we want in general.  The goal during DM device suspend
is to stop IO from being mapped.  With request-based DM suspend we're
punting requests back to the block layer's request_queue.

So in the case of blk-mq request-based DM: we cannot expect
blk_mq_freeze_queue(), during suspend, to complete if requests are
getting requeued to the blk-mq queue via BLK_MQ_RQ_QUEUE_BUSY.

Mike

  reply	other threads:[~2016-09-02 15:12 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-31 22:14 [PATCH 0/9] dm patches for kernel v4.9 Bart Van Assche
2016-08-31 22:15 ` [PATCH 1/9] blk-mq: Introduce blk_mq_queue_stopped() Bart Van Assche
2016-08-31 22:16 ` [PATCH 2/9] dm: Rename a function argument Bart Van Assche
2016-09-01  3:29   ` Mike Snitzer
2016-09-01 14:17     ` Bart Van Assche
2016-08-31 22:16 ` [PATCH 3/9] dm: Introduce signal_pending_state() Bart Van Assche
2016-08-31 22:16 ` [PATCH 4/9] dm: Convert wait loops Bart Van Assche
2016-08-31 22:17 ` [PATCH 5/9] dm: Add two lockdep_assert_held() statements Bart Van Assche
2016-08-31 22:17 ` [PATCH 6/9] dm: Simplify dm_old_stop_queue() Bart Van Assche
2016-08-31 22:17 ` [PATCH 7/9] dm: Mark block layer queue dead before destroying the dm device Bart Van Assche
2016-08-31 22:18 ` [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues Bart Van Assche
2016-09-01  3:13   ` Mike Snitzer
2016-09-01 14:23     ` Bart Van Assche
2016-09-01 15:05       ` Mike Snitzer
2016-09-01 15:31         ` Bart Van Assche
2016-09-01 15:50           ` Mike Snitzer
2016-09-01 16:12             ` Mike Snitzer
2016-09-01 17:59               ` Bart Van Assche
2016-09-01 19:05                 ` Mike Snitzer
2016-09-01 19:35                   ` Mike Snitzer
2016-09-01 20:15                   ` Bart Van Assche
2016-09-01 20:33                     ` Mike Snitzer
2016-09-01 20:39                       ` Bart Van Assche
2016-09-01 20:48                         ` Mike Snitzer
2016-09-01 20:52                           ` Bart Van Assche
2016-09-01 21:17                             ` Mike Snitzer
2016-09-01 22:18                               ` Mike Snitzer
2016-09-01 22:22                                 ` Bart Van Assche
2016-09-01 22:26                                   ` Mike Snitzer
2016-09-01 23:17                                     ` Bart Van Assche
2016-09-01 23:47                                       ` Mike Snitzer
2016-09-02  0:03                                         ` Bart Van Assche
2016-09-02 15:12                                           ` Mike Snitzer [this message]
2016-09-02 16:10                                             ` should blk-mq halt requeue processing while queue is frozen? [was: Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues] Mike Snitzer
2016-09-02 22:42                                               ` [dm-devel] should blk-mq halt requeue processing while queue is frozen? Bart Van Assche
2016-09-02 22:42                                                 ` Bart Van Assche
2016-09-03  0:34                                                 ` Mike Snitzer
2016-09-07 16:41                                                 ` Mike Snitzer
2016-09-13  8:01                                                   ` [dm-devel] " Bart Van Assche
2016-09-13 14:36                                                     ` Mike Snitzer
2016-08-31 22:18 ` [PATCH 9/9] dm path selector: Avoid that device removal triggers an infinite loop Bart Van Assche
2016-09-01  2:49   ` Mike Snitzer
2016-09-01 14:14     ` Bart Van Assche
2016-09-01 15:06       ` Mike Snitzer
2016-09-01 15:22         ` Bart Van Assche
2016-09-01 15:26           ` Mike Snitzer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160902151213.GA17508@redhat.com \
    --to=snitzer@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=bart.vanassche@sandisk.com \
    --cc=dm-devel@redhat.com \
    --cc=hch@lst.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.