From: Mike Snitzer <snitzer@redhat.com>
To: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: "axboe@kernel.dk" <axboe@kernel.dk>,
device-mapper development <dm-devel@redhat.com>,
"hch@lst.de" <hch@lst.de>
Subject: Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
Date: Fri, 2 Sep 2016 11:12:13 -0400 [thread overview]
Message-ID: <20160902151213.GA17508@redhat.com> (raw)
In-Reply-To: <6af010f8-0a8f-cf0e-d819-3b8e1c20b56e@sandisk.com>
On Thu, Sep 01 2016 at 8:03pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> On 09/01/2016 04:48 PM, Mike Snitzer wrote:
> > On Thu, Sep 01 2016 at 7:17pm -0400,
> > Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >> Sorry that I misread your previous e-mail. After I received your
> >> latest e-mail I rebased my tree on top of the devel.bart branch
> >> mentioned above. My tests still pass. The only two patches in my
> >> tree that are relevant and that are not in the devel.bart branch
> >> have been attached to this e-mail. Did your test involve the sd
> >> driver? If so, do the attached two patches help? If the sd driver
> >> was not involved, can you provide more information about the hang
> >> you ran into? The output and log messages generated by the following
> >> commands after the hang has been reproduced would be very welcome:
> >> * echo w > /proc/sysrq-trigger
> >> * (cd /sys/block && grep -a '' dm*/mq/*/{pending,cpu*/rq_list})
> >
> > sd is used. I'll apply those patches and test, tomorrow, but I'm pretty
> > skeptical.
> >
> > Haven't had any problems with these tests for quite a while. The tests
> > I'm running are just those in the mptest testsuite, see:
> > https://github.com/snitm/mptest
> >
> > Running them should be as simple as you doing:
> >
> > git clone git://github.com/snitm/mptest.git
> > cd mptest
> > ./runtest
> >
> > The default is to use dm-mq on scsi-mq ontop of tcmloop.
> >
> > [ ... ]
>
> Hello Mike,
>
> If I run mptest on my setup I can reproduce the hang. But what I see is
> that the service-time path selector is in use when the hang is triggered.
> I will patch that path selector in the same way as I did with the
> queue-length path selector and rerun the test.
>
> # dmsetup table
> 1Linux_scsi_debug_2000: 0 2097152 multipath 3 retain_attached_hw_handler queue_mode mq 1 alua 3 1 service-time 0 1 2 8:128 1 1 service-time 0 1 2 8:144 1 1 service-time 0 1 2 8:112 1 1
> mp: 0 2097152 multipath 3 retain_attached_hw_handler queue_mode mq 1 alua 2 1 queue-length 0 2 1 8:96 1 8:112 1 queue-length 0 2 1 8:128 1 8:144 1
> # (cd /sys/block && grep -a '' dm*/mq/*/{pending,cpu*/rq_list}) | grep -v ':$'
> dm-0/mq/0/pending: ffff880358610000
> dm-1/mq/0/pending: ffff880358220200
> dm-1/mq/0/pending: ffff880358220400
In case I haven't been clear: calling blk_mq_freeze_queue() _after_
you've suspended the DM device will only trigger IO that will get
re-queued to the DM device's blk-mq queue. So you're creating a
livelock since blk_mq_freeze_queue_wait() will not return, see stack
from:
https://www.redhat.com/archives/dm-devel/2016-September/msg00017.html
FYI, the BLK_MQ_S_STOPPED check that you removed from dm_mq_queue_rq()
in this commit...
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=devel.bart&id=69eb3e60e099a6117fc754e70eedd504685326ad
...is effectively serving as this when the device is suspended:
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index b5db523..3bc16dc 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -863,6 +863,9 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
dm_put_live_table(md, srcu_idx);
}
+ if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)))
+ return BLK_MQ_RQ_QUEUE_BUSY;
+
if (ti->type->busy && ti->type->busy(ti))
return BLK_MQ_RQ_QUEUE_BUSY;
Which is comparable to what the old .request_fn DM does (see:
dm_make_request).
Without that type of suspend check the code will go on to call
dm-mpath.c:multipath_clone_and_map() which will only result in
DM_MAPIO_REQUEUE (and dm_mq_queue_rq returning BLK_MQ_RQ_QUEUE_BUSY in
the case when multipath has no paths, via must_push_back_rq()).
Anyway, not what we want in general. The goal during DM device suspend
is to stop IO from being mapped. With request-based DM suspend we're
punting requests back to the block layer's request_queue.
So in the case of blk-mq request-based DM: we cannot expect
blk_mq_freeze_queue(), during suspend, to complete if requests are
getting requeued to the blk-mq queue via BLK_MQ_RQ_QUEUE_BUSY.
Mike
next prev parent reply other threads:[~2016-09-02 15:12 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-31 22:14 [PATCH 0/9] dm patches for kernel v4.9 Bart Van Assche
2016-08-31 22:15 ` [PATCH 1/9] blk-mq: Introduce blk_mq_queue_stopped() Bart Van Assche
2016-08-31 22:16 ` [PATCH 2/9] dm: Rename a function argument Bart Van Assche
2016-09-01 3:29 ` Mike Snitzer
2016-09-01 14:17 ` Bart Van Assche
2016-08-31 22:16 ` [PATCH 3/9] dm: Introduce signal_pending_state() Bart Van Assche
2016-08-31 22:16 ` [PATCH 4/9] dm: Convert wait loops Bart Van Assche
2016-08-31 22:17 ` [PATCH 5/9] dm: Add two lockdep_assert_held() statements Bart Van Assche
2016-08-31 22:17 ` [PATCH 6/9] dm: Simplify dm_old_stop_queue() Bart Van Assche
2016-08-31 22:17 ` [PATCH 7/9] dm: Mark block layer queue dead before destroying the dm device Bart Van Assche
2016-08-31 22:18 ` [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues Bart Van Assche
2016-09-01 3:13 ` Mike Snitzer
2016-09-01 14:23 ` Bart Van Assche
2016-09-01 15:05 ` Mike Snitzer
2016-09-01 15:31 ` Bart Van Assche
2016-09-01 15:50 ` Mike Snitzer
2016-09-01 16:12 ` Mike Snitzer
2016-09-01 17:59 ` Bart Van Assche
2016-09-01 19:05 ` Mike Snitzer
2016-09-01 19:35 ` Mike Snitzer
2016-09-01 20:15 ` Bart Van Assche
2016-09-01 20:33 ` Mike Snitzer
2016-09-01 20:39 ` Bart Van Assche
2016-09-01 20:48 ` Mike Snitzer
2016-09-01 20:52 ` Bart Van Assche
2016-09-01 21:17 ` Mike Snitzer
2016-09-01 22:18 ` Mike Snitzer
2016-09-01 22:22 ` Bart Van Assche
2016-09-01 22:26 ` Mike Snitzer
2016-09-01 23:17 ` Bart Van Assche
2016-09-01 23:47 ` Mike Snitzer
2016-09-02 0:03 ` Bart Van Assche
2016-09-02 15:12 ` Mike Snitzer [this message]
2016-09-02 16:10 ` should blk-mq halt requeue processing while queue is frozen? [was: Re: [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues] Mike Snitzer
2016-09-02 22:42 ` [dm-devel] should blk-mq halt requeue processing while queue is frozen? Bart Van Assche
2016-09-03 0:34 ` Mike Snitzer
2016-09-07 16:41 ` Mike Snitzer
2016-09-13 8:01 ` [dm-devel] " Bart Van Assche
2016-09-13 14:36 ` Mike Snitzer
2016-08-31 22:18 ` [PATCH 9/9] dm path selector: Avoid that device removal triggers an infinite loop Bart Van Assche
2016-09-01 2:49 ` Mike Snitzer
2016-09-01 14:14 ` Bart Van Assche
2016-09-01 15:06 ` Mike Snitzer
2016-09-01 15:22 ` Bart Van Assche
2016-09-01 15:26 ` Mike Snitzer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160902151213.GA17508@redhat.com \
--to=snitzer@redhat.com \
--cc=axboe@kernel.dk \
--cc=bart.vanassche@sandisk.com \
--cc=dm-devel@redhat.com \
--cc=hch@lst.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).