From: Jens Axboe <axboe@kernel.dk>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
Date: Tue, 19 Nov 2013 16:59:37 -0700 [thread overview]
Message-ID: <20131119235937.GC4094@kernel.dk> (raw)
In-Reply-To: <20131119232308.GS11434@dastard>
On Wed, Nov 20 2013, Dave Chinner wrote:
> On Tue, Nov 19, 2013 at 03:51:27PM -0700, Jens Axboe wrote:
> > On 11/19/2013 03:42 PM, Jens Axboe wrote:
> > > On 11/19/2013 02:43 PM, Jens Axboe wrote:
> > >> On 11/19/2013 02:34 PM, Dave Chinner wrote:
> > >>> On Tue, Nov 19, 2013 at 02:20:42PM -0700, Jens Axboe wrote:
> > >>>> On Tue, Nov 19 2013, Jens Axboe wrote:
> > >>>>> On Tue, Nov 19 2013, Dave Chinner wrote:
> > >>>>>> Hi Jens,
> > >>>>>>
> > >>>>>> I was just running xfstests on a 3.13 kernel that has had the block
> > >>>>>> layer changed merged into it. generic/269 on XFS is hanging on a 2
> > >>>>>> CPU VM using virtio,cache=none for the block devices under test,
> > >>>>>> with many (130+) threads stuck below submit_bio() like this:
> > >>>>>>
> > >>>>>> Call Trace:
> > >>>>>> [<ffffffff81adb1c9>] schedule+0x29/0x70
> > >>>>>> [<ffffffff817833ee>] percpu_ida_alloc+0x16e/0x330
> > >>>>>> [<ffffffff81759bef>] blk_mq_wait_for_tags+0x1f/0x40
> > >>>>>> [<ffffffff81758bee>] blk_mq_alloc_request_pinned+0x4e/0xf0
> > >>>>>> [<ffffffff8175931b>] blk_mq_make_request+0x3bb/0x4a0
> > >>>>>> [<ffffffff8174d2b2>] generic_make_request+0xc2/0x110
> > >>>>>> [<ffffffff8174e40c>] submit_bio+0x6c/0x120
> > >>>>>>
> > >>>>>> reads and writes are hung, both data (direct and buffered) and
> > >>>>>> metadata.
> > >>>>>>
> > >>>>>> Some IOs are sitting in io_schedule, waiting for IO completion (both
> > >>>>>> buffered and direct IO, both reads and writes) so it looks like IO
> > >>>>>> completion has stalled in some manner, too.
> > >>>>>
> > >>>>> Can I get a recipe to reproduce this? I haven't had any luck so far.
> > >>>>
> > >>>> OK, I reproduced it. Looks weird, basically all 64 commands are in
> > >>>> flight, but haven't completed. So the next one that comes in just sits
> > >>>> there forever. I can't find any sysfs debug entries for virtio, would be
> > >>>> nice to inspect its queue as well...
> > >>>
> > >>> Does it have anything to do with the fact that the request queue
> > >>> depth is 128 entries and the tag pool only has 66 tags in it? i.e:
> > >>>
> > >>> /sys/block/vdb/queue/nr_requests
> > >>> 128
> > >>>
> > >>> /sys/block/vdb/mq/0/tags
> > >>> nr_tags=66, reserved_tags=2, batch_move=16, max_cache=32
> > >>> nr_free=0, nr_reserved=1
> > >>> cpu00: nr_free=0
> > >>> cpu01: nr_free=0
> > >>>
> > >>> Seems to imply that if we queue up more than 66 IOs without
> > >>> dispatching them, we'll run out of tags. And without another IO
> > >>> coming through, the "none" scheduler that virtio uses will never
> > >>> get a trigger to push out the currently queued IO?
> > >>
> > >> No, the nr_requests isn't actually relevant in the blk-mq context, the
> > >> driver sets its own depth. For the above, it's 64 normal commands, and 2
> > >> reserved. The reserved would be for a flush, for instance. If someone
> > >> attempts to queue more than the allocated number of requests, it'll stop
> > >> the blk-mq queue and kick things into gear on the virtio side. Then when
> > >> requests complete, we start the queue again.
> > >>
> > >> If you look at virtio_queue_rq(), that handles a single request. This
> > >> request is already tagged at this point. If we can't add it to the ring,
> > >> we simply stop the queue and kick off whatever pending we might have. We
> > >> return BLK_MQ_RQ_QUEUE_BUSY to blk-mq, which tells that to back off on
> > >> sending us more. When we get the virtblk_done() callback from virtio, we
> > >> end the requests on the blk-mq side and restart the queue.
> > >
> > > I added some debug code to see if we had anything pending on the blk-mq
> > > side, and it's all empty. It really just looks like we are missing
> > > completions on the virtio side. Very odd.
> >
> > Patching in the old rq path works however, so...
>
> ... we've found a race condition ;)
>
> FWIW, the thing that comes immediately to mind for me is that all
> the wakeups and queue kicking is done outside of any locks, using
> status gained from inside the lock context. Hence the mq stop/start
> and virt queue kicking can all race - perhaps that's resulting in a
> missed wakeup, restart or queue kick?
I _think_ it only happens when queue depth > device queue depth, but I
don't immediately see what is wrong. That logic is pretty much tried and
true. As a debug measure, I just ignored the stop bit, but it doesn't
change anything. BTW, that is pretty much the same as before.
Anyway, no real news yet, will debug later tonight or tomorrow morning
again.
--
Jens Axboe
next prev parent reply other threads:[~2013-11-19 23:59 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-11-19 8:02 [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices Dave Chinner
2013-11-19 10:36 ` Christoph Hellwig
2013-11-19 16:05 ` Jens Axboe
2013-11-19 16:09 ` Christoph Hellwig
2013-11-19 16:16 ` Jens Axboe
2013-11-19 21:30 ` Dave Chinner
2013-11-19 21:40 ` Jens Axboe
2013-11-19 20:15 ` Jens Axboe
2013-11-19 21:20 ` Jens Axboe
2013-11-19 21:34 ` Dave Chinner
2013-11-19 21:43 ` Jens Axboe
2013-11-19 22:42 ` Jens Axboe
2013-11-19 22:51 ` Jens Axboe
2013-11-19 23:23 ` Dave Chinner
2013-11-19 23:59 ` Jens Axboe [this message]
2013-11-20 0:08 ` Jens Axboe
2013-11-20 1:44 ` Shaohua Li
2013-11-20 1:54 ` Jens Axboe
2013-11-20 2:02 ` Jens Axboe
2013-11-20 2:53 ` Dave Chinner
2013-11-20 3:12 ` Jens Axboe
2013-11-20 8:07 ` Christoph Hellwig
2013-11-20 16:21 ` Jens Axboe
2013-11-20 8:04 ` Christoph Hellwig
2013-11-20 16:20 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131119235937.GC4094@kernel.dk \
--to=axboe@kernel.dk \
--cc=david@fromorbit.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).