dm-devel.redhat.com archive mirror
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: "axboe@kernel.dk" <axboe@kernel.dk>,
	device-mapper development <dm-devel@redhat.com>,
	"hch@lst.de" <hch@lst.de>
Subject: Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
Date: Fri, 2 Sep 2016 11:12:13 -0400	[thread overview]
Message-ID: <20160902151213.GA17508@redhat.com> (raw)
In-Reply-To: <6af010f8-0a8f-cf0e-d819-3b8e1c20b56e@sandisk.com>

On Thu, Sep 01 2016 at  8:03pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 09/01/2016 04:48 PM, Mike Snitzer wrote:
> > On Thu, Sep 01 2016 at  7:17pm -0400,
> > Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >> Sorry that I misread your previous e-mail. After I received your
> >> latest e-mail I rebased my tree on top of the devel.bart branch
> >> mentioned above. My tests still pass. The only two patches in my
> >> tree that are relevant and that are not in the devel.bart branch
> >> have been attached to this e-mail. Did your test involve the sd
> >> driver? If so, do the attached two patches help? If the sd driver
> >> was not involved, can you provide more information about the hang
> >> you ran into? The output and log messages generated by the following
> >> commands after the hang has been reproduced would be very welcome:
> >> * echo w > /proc/sysrq-trigger
> >> * (cd /sys/block && grep -a '' dm*/mq/*/{pending,cpu*/rq_list})
> > 
> > sd is used.  I'll apply those patches and test, tomorrow, but I'm pretty
> > skeptical.
> > 
> > Haven't had any problems with these tests for quite a while.  The tests
> > I'm running are just those in the mptest testsuite, see:
> > https://github.com/snitm/mptest
> > 
> > Running them should be as simple as you doing:
> > 
> > git clone git://github.com/snitm/mptest.git
> > cd mptest
> > ./runtest
> > 
> > The default is to use dm-mq on scsi-mq ontop of tcmloop.
> > 
> > [ ... ]
> 
> Hello Mike,
> 
> If I run mptest on my setup I can reproduce the hang. But what I see is
> that the service-time path selector is in use when the hang is triggered.
> I will patch that path selector in the same way as I did with the
> queue-length path selector and rerun the test.
> 
> # dmsetup table
> 1Linux_scsi_debug_2000: 0 2097152 multipath 3 retain_attached_hw_handler queue_mode mq 1 alua 3 1 service-time 0 1 2 8:128 1 1 service-time 0 1 2 8:144 1 1 service-time 0 1 2 8:112 1 1 
> mp: 0 2097152 multipath 3 retain_attached_hw_handler queue_mode mq 1 alua 2 1 queue-length 0 2 1 8:96 1 8:112 1 queue-length 0 2 1 8:128 1 8:144 1 
> # (cd /sys/block && grep -a '' dm*/mq/*/{pending,cpu*/rq_list}) | grep -v ':$'
> dm-0/mq/0/pending:      ffff880358610000
> dm-1/mq/0/pending:      ffff880358220200
> dm-1/mq/0/pending:      ffff880358220400

In case I haven't been clear: calling blk_mq_freeze_queue() _after_
you've suspended the DM device will only trigger IO that will get
re-queued to the DM device's blk-mq queue.  So you're creating a
livelock since blk_mq_freeze_queue_wait() will not return, see stack
from:
https://www.redhat.com/archives/dm-devel/2016-September/msg00017.html

FYI, the BLK_MQ_S_STOPPED check that you removed from dm_mq_queue_rq()
in this commit...
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=devel.bart&id=69eb3e60e099a6117fc754e70eedd504685326ad
...is effectively serving as this when the device is suspended:

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index b5db523..3bc16dc 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -863,6 +863,9 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
                dm_put_live_table(md, srcu_idx);
        }

+       if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)))
+               return BLK_MQ_RQ_QUEUE_BUSY;
+
        if (ti->type->busy && ti->type->busy(ti))
                return BLK_MQ_RQ_QUEUE_BUSY;

Which is comparable to what the old .request_fn DM does (see:
dm_make_request).

Without that type of suspend check the code will go on to call
dm-mpath.c:multipath_clone_and_map() which will only result in
DM_MAPIO_REQUEUE (and dm_mq_queue_rq returning BLK_MQ_RQ_QUEUE_BUSY in
the case when multipath has no paths, via must_push_back_rq()).

Anyway, not what we want in general.  The goal during DM device suspend
is to stop IO from being mapped.  With request-based DM suspend we're
punting requests back to the block layer's request_queue.

So in the case of blk-mq request-based DM: we cannot expect
blk_mq_freeze_queue(), during suspend, to complete if requests are
getting requeued to the blk-mq queue via BLK_MQ_RQ_QUEUE_BUSY.

Mike

  reply	other threads:[~2016-09-02 15:12 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-31 22:14 [PATCH 0/9] dm patches for kernel v4.9 Bart Van Assche
2016-08-31 22:15 ` [PATCH 1/9] blk-mq: Introduce blk_mq_queue_stopped() Bart Van Assche
2016-08-31 22:16 ` [PATCH 2/9] dm: Rename a function argument Bart Van Assche
2016-09-01  3:29   ` Mike Snitzer
2016-09-01 14:17     ` Bart Van Assche
2016-08-31 22:16 ` [PATCH 3/9] dm: Introduce signal_pending_state() Bart Van Assche
2016-08-31 22:16 ` [PATCH 4/9] dm: Convert wait loops Bart Van Assche
2016-08-31 22:17 ` [PATCH 5/9] dm: Add two lockdep_assert_held() statements Bart Van Assche
2016-08-31 22:17 ` [PATCH 6/9] dm: Simplify dm_old_stop_queue() Bart Van Assche
2016-08-31 22:17 ` [PATCH 7/9] dm: Mark block layer queue dead before destroying the dm device Bart Van Assche
2016-08-31 22:18 ` [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues Bart Van Assche
2016-09-01  3:13   ` Mike Snitzer
2016-09-01 14:23     ` Bart Van Assche
2016-09-01 15:05       ` Mike Snitzer
2016-09-01 15:31         ` Bart Van Assche
2016-09-01 15:50           ` Mike Snitzer
2016-09-01 16:12             ` Mike Snitzer
2016-09-01 17:59               ` Bart Van Assche
2016-09-01 19:05                 ` Mike Snitzer
2016-09-01 19:35                   ` Mike Snitzer
2016-09-01 20:15                   ` Bart Van Assche
2016-09-01 20:33                     ` Mike Snitzer
2016-09-01 20:39                       ` Bart Van Assche
2016-09-01 20:48                         ` Mike Snitzer
2016-09-01 20:52                           ` Bart Van Assche
2016-09-01 21:17                             ` Mike Snitzer
2016-09-01 22:18                               ` Mike Snitzer
2016-09-01 22:22                                 ` Bart Van Assche
2016-09-01 22:26                                   ` Mike Snitzer
2016-09-01 23:17                                     ` Bart Van Assche
2016-09-01 23:47                                       ` Mike Snitzer
2016-09-02  0:03                                         ` Bart Van Assche
2016-09-02 15:12                                           ` Mike Snitzer [this message]
2016-09-02 16:10                                             ` should blk-mq halt requeue processing while queue is frozen? [was: Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues] Mike Snitzer
2016-09-02 22:42                                               ` [dm-devel] should blk-mq halt requeue processing while queue is frozen? Bart Van Assche
2016-09-03  0:34                                                 ` Mike Snitzer
2016-09-07 16:41                                                 ` Mike Snitzer
2016-09-13  8:01                                                   ` [dm-devel] " Bart Van Assche
2016-09-13 14:36                                                     ` Mike Snitzer
2016-08-31 22:18 ` [PATCH 9/9] dm path selector: Avoid that device removal triggers an infinite loop Bart Van Assche
2016-09-01  2:49   ` Mike Snitzer
2016-09-01 14:14     ` Bart Van Assche
2016-09-01 15:06       ` Mike Snitzer
2016-09-01 15:22         ` Bart Van Assche
2016-09-01 15:26           ` Mike Snitzer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160902151213.GA17508@redhat.com \
    --to=snitzer@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=bart.vanassche@sandisk.com \
    --cc=dm-devel@redhat.com \
    --cc=hch@lst.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).