linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
@ 2013-11-19  8:02 Dave Chinner
  2013-11-19 10:36 ` Christoph Hellwig
  2013-11-19 20:15 ` Jens Axboe
  0 siblings, 2 replies; 25+ messages in thread
From: Dave Chinner @ 2013-11-19  8:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

Hi Jens,

I was just running xfstests on a 3.13 kernel that has had the block
layer changed merged into it. generic/269 on XFS is hanging on a 2
CPU VM using virtio,cache=none for the block devices under test,
with many (130+) threads stuck below submit_bio() like this:

 Call Trace:
  [<ffffffff81adb1c9>] schedule+0x29/0x70
  [<ffffffff817833ee>] percpu_ida_alloc+0x16e/0x330
  [<ffffffff81759bef>] blk_mq_wait_for_tags+0x1f/0x40
  [<ffffffff81758bee>] blk_mq_alloc_request_pinned+0x4e/0xf0
  [<ffffffff8175931b>] blk_mq_make_request+0x3bb/0x4a0
  [<ffffffff8174d2b2>] generic_make_request+0xc2/0x110
  [<ffffffff8174e40c>] submit_bio+0x6c/0x120

reads and writes are hung, both data (direct and buffered) and
metadata.

Some IOs are sitting in io_schedule, waiting for IO completion (both
buffered and direct IO, both reads and writes) so it looks like IO
completion has stalled in some manner, too.

Also, when I run iostat, all my virtio block devices have
disappeared from it. i.e. I only see /dev/sda, and not /dev/vd[a-d]
like all previous kernels have shown up. That appears to be due to
/proc/diskstats not showing any stats for those devices anymore.

$ cat /proc/diskstats |grep vd
 253       0 vda 0 0 0 0 0 0 0 0 0 0 0
 253      16 vdb 0 0 0 0 0 0 0 0 0 0 0
 253      32 vdc 0 0 0 0 0 0 0 0 0 0 0
 253      48 vdd 0 0 0 0 0 0 0 0 0 0 0

I have no idea if it's related to the above hang, but either way
breaking iostat is a major regression....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-19  8:02 [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices Dave Chinner
@ 2013-11-19 10:36 ` Christoph Hellwig
  2013-11-19 16:05   ` Jens Axboe
  2013-11-19 20:15 ` Jens Axboe
  1 sibling, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2013-11-19 10:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Jens Axboe, linux-kernel

On Tue, Nov 19, 2013 at 07:02:18PM +1100, Dave Chinner wrote:
> I have no idea if it's related to the above hang, but either way
> breaking iostat is a major regression....

Both of them are most likely due to the conversion of virtio_blk
to the blk-mq code.

I've not seen the hang in my heavy xfstests testing, but that was a
slightly different codebase than what finally got in, so I'll try
to reproduce it once I get some spare QA cycles.

The missing stats should be solved by something like the patch below
(untested so far):


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 862f458..41e3b85 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -175,7 +175,7 @@ static void blk_mq_rq_ctx_init(struct blk_mq_ctx *ctx, struct request *rq,
 			       unsigned int rw_flags)
 {
 	rq->mq_ctx = ctx;
-	rq->cmd_flags = rw_flags;
+	rq->cmd_flags = rw_flags | REQ_IO_STAT;
 	ctx->rq_dispatched[rw_is_sync(rw_flags)]++;
 }
 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-19 10:36 ` Christoph Hellwig
@ 2013-11-19 16:05   ` Jens Axboe
  2013-11-19 16:09     ` Christoph Hellwig
  2013-11-19 21:30     ` Dave Chinner
  0 siblings, 2 replies; 25+ messages in thread
From: Jens Axboe @ 2013-11-19 16:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Dave Chinner, linux-kernel

On Tue, Nov 19 2013, Christoph Hellwig wrote:
> On Tue, Nov 19, 2013 at 07:02:18PM +1100, Dave Chinner wrote:
> > I have no idea if it's related to the above hang, but either way
> > breaking iostat is a major regression....
> 
> Both of them are most likely due to the conversion of virtio_blk
> to the blk-mq code.
> 
> I've not seen the hang in my heavy xfstests testing, but that was a
> slightly different codebase than what finally got in, so I'll try
> to reproduce it once I get some spare QA cycles.

Thanks! Dave, if you still have it in that state, can you dump the
contents of /sys/block/<devs>/mq/ for the device/devices that are hung?

> The missing stats should be solved by something like the patch below
> (untested so far):
> 
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 862f458..41e3b85 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -175,7 +175,7 @@ static void blk_mq_rq_ctx_init(struct blk_mq_ctx *ctx, struct request *rq,
>  			       unsigned int rw_flags)
>  {
>  	rq->mq_ctx = ctx;
> -	rq->cmd_flags = rw_flags;
> +	rq->cmd_flags = rw_flags | REQ_IO_STAT;
>  	ctx->rq_dispatched[rw_is_sync(rw_flags)]++;
>  }

I have queued up the below.


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 862f458..ef90f18 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -171,9 +171,12 @@ bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
 }
 EXPORT_SYMBOL(blk_mq_can_queue);
 
-static void blk_mq_rq_ctx_init(struct blk_mq_ctx *ctx, struct request *rq,
-			       unsigned int rw_flags)
+static void blk_mq_rq_ctx_init(struct request_queue *q, struct blk_mq_ctx *ctx,
+			       struct request *rq, unsigned int rw_flags)
 {
+	if (blk_queue_io_stat(q))
+		rw_flags |= REQ_IO_STAT;
+
 	rq->mq_ctx = ctx;
 	rq->cmd_flags = rw_flags;
 	ctx->rq_dispatched[rw_is_sync(rw_flags)]++;
@@ -197,7 +200,7 @@ static struct request *blk_mq_alloc_request_pinned(struct request_queue *q,
 
 		rq = __blk_mq_alloc_request(hctx, gfp & ~__GFP_WAIT, reserved);
 		if (rq) {
-			blk_mq_rq_ctx_init(ctx, rq, rw);
+			blk_mq_rq_ctx_init(q, ctx, rq, rw);
 			break;
 		} else if (!(gfp & __GFP_WAIT))
 			break;
@@ -921,7 +924,7 @@ static void blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	trace_block_getrq(q, bio, rw);
 	rq = __blk_mq_alloc_request(hctx, GFP_ATOMIC, false);
 	if (likely(rq))
-		blk_mq_rq_ctx_init(ctx, rq, rw);
+		blk_mq_rq_ctx_init(q, ctx, rq, rw);
 	else {
 		blk_mq_put_ctx(ctx);
 		trace_block_sleeprq(q, bio, rw);

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-19 16:05   ` Jens Axboe
@ 2013-11-19 16:09     ` Christoph Hellwig
  2013-11-19 16:16       ` Jens Axboe
  2013-11-19 21:30     ` Dave Chinner
  1 sibling, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2013-11-19 16:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Dave Chinner, linux-kernel

On Tue, Nov 19, 2013 at 09:05:09AM -0700, Jens Axboe wrote:
> I have queued up the below.

I don't think that will work.  We only set QUEUE_FLAG_IO_STAT in
blk_init_allocated_queue, which doesn't get called from
blk_mq_init_queue.  That being said I really don't understand why we
need to have the QUEUE_FLAG_IO_STAT flag at all.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-19 16:09     ` Christoph Hellwig
@ 2013-11-19 16:16       ` Jens Axboe
  0 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2013-11-19 16:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Dave Chinner, linux-kernel

On 11/19/2013 09:09 AM, Christoph Hellwig wrote:
> On Tue, Nov 19, 2013 at 09:05:09AM -0700, Jens Axboe wrote:
>> I have queued up the below.
> 
> I don't think that will work.  We only set QUEUE_FLAG_IO_STAT in
> blk_init_allocated_queue, which doesn't get called from
> blk_mq_init_queue.  That being said I really don't understand why we
> need to have the QUEUE_FLAG_IO_STAT flag at all.

Hmm good point, we need to init those flags there too. I'll get this
properly tested.

Back when the accounting was initially made optional, there was about a
1% reduction in overhead with it off. That's why the tunable is there.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-19  8:02 [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices Dave Chinner
  2013-11-19 10:36 ` Christoph Hellwig
@ 2013-11-19 20:15 ` Jens Axboe
  2013-11-19 21:20   ` Jens Axboe
  1 sibling, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2013-11-19 20:15 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel

On Tue, Nov 19 2013, Dave Chinner wrote:
> Hi Jens,
> 
> I was just running xfstests on a 3.13 kernel that has had the block
> layer changed merged into it. generic/269 on XFS is hanging on a 2
> CPU VM using virtio,cache=none for the block devices under test,
> with many (130+) threads stuck below submit_bio() like this:
> 
>  Call Trace:
>   [<ffffffff81adb1c9>] schedule+0x29/0x70
>   [<ffffffff817833ee>] percpu_ida_alloc+0x16e/0x330
>   [<ffffffff81759bef>] blk_mq_wait_for_tags+0x1f/0x40
>   [<ffffffff81758bee>] blk_mq_alloc_request_pinned+0x4e/0xf0
>   [<ffffffff8175931b>] blk_mq_make_request+0x3bb/0x4a0
>   [<ffffffff8174d2b2>] generic_make_request+0xc2/0x110
>   [<ffffffff8174e40c>] submit_bio+0x6c/0x120
> 
> reads and writes are hung, both data (direct and buffered) and
> metadata.
> 
> Some IOs are sitting in io_schedule, waiting for IO completion (both
> buffered and direct IO, both reads and writes) so it looks like IO
> completion has stalled in some manner, too.

Can I get a recipe to reproduce this? I haven't had any luck so far.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-19 20:15 ` Jens Axboe
@ 2013-11-19 21:20   ` Jens Axboe
  2013-11-19 21:34     ` Dave Chinner
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2013-11-19 21:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel

On Tue, Nov 19 2013, Jens Axboe wrote:
> On Tue, Nov 19 2013, Dave Chinner wrote:
> > Hi Jens,
> > 
> > I was just running xfstests on a 3.13 kernel that has had the block
> > layer changed merged into it. generic/269 on XFS is hanging on a 2
> > CPU VM using virtio,cache=none for the block devices under test,
> > with many (130+) threads stuck below submit_bio() like this:
> > 
> >  Call Trace:
> >   [<ffffffff81adb1c9>] schedule+0x29/0x70
> >   [<ffffffff817833ee>] percpu_ida_alloc+0x16e/0x330
> >   [<ffffffff81759bef>] blk_mq_wait_for_tags+0x1f/0x40
> >   [<ffffffff81758bee>] blk_mq_alloc_request_pinned+0x4e/0xf0
> >   [<ffffffff8175931b>] blk_mq_make_request+0x3bb/0x4a0
> >   [<ffffffff8174d2b2>] generic_make_request+0xc2/0x110
> >   [<ffffffff8174e40c>] submit_bio+0x6c/0x120
> > 
> > reads and writes are hung, both data (direct and buffered) and
> > metadata.
> > 
> > Some IOs are sitting in io_schedule, waiting for IO completion (both
> > buffered and direct IO, both reads and writes) so it looks like IO
> > completion has stalled in some manner, too.
> 
> Can I get a recipe to reproduce this? I haven't had any luck so far.

OK, I reproduced it. Looks weird, basically all 64 commands are in
flight, but haven't completed. So the next one that comes in just sits
there forever. I can't find any sysfs debug entries for virtio, would be
nice to inspect its queue as well...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-19 16:05   ` Jens Axboe
  2013-11-19 16:09     ` Christoph Hellwig
@ 2013-11-19 21:30     ` Dave Chinner
  2013-11-19 21:40       ` Jens Axboe
  1 sibling, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2013-11-19 21:30 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, linux-kernel

On Tue, Nov 19, 2013 at 09:05:09AM -0700, Jens Axboe wrote:
> On Tue, Nov 19 2013, Christoph Hellwig wrote:
> > On Tue, Nov 19, 2013 at 07:02:18PM +1100, Dave Chinner wrote:
> > > I have no idea if it's related to the above hang, but either way
> > > breaking iostat is a major regression....
> > 
> > Both of them are most likely due to the conversion of virtio_blk
> > to the blk-mq code.
> > 
> > I've not seen the hang in my heavy xfstests testing, but that was a
> > slightly different codebase than what finally got in, so I'll try
> > to reproduce it once I get some spare QA cycles.
> 
> Thanks! Dave, if you still have it in that state, can you dump the
> contents of /sys/block/<devs>/mq/ for the device/devices that are hung?

# find /sys/block/vdb/mq -type f -print -exec cat {} \;
/sys/block/vdb/mq/0/run
1313835
/sys/block/vdb/mq/0/cpu0/completed
546857 207203
/sys/block/vdb/mq/0/cpu0/rq_list
CTX pending:
/sys/block/vdb/mq/0/cpu0/merged
0
/sys/block/vdb/mq/0/cpu0/dispatched
257733 496352
/sys/block/vdb/mq/0/cpu1/completed
547714 200741
/sys/block/vdb/mq/0/cpu1/rq_list
CTX pending:
/sys/block/vdb/mq/0/cpu1/merged
0
/sys/block/vdb/mq/0/cpu1/dispatched
262114 486381
/sys/block/vdb/mq/0/tags
nr_tags=66, reserved_tags=2, batch_move=16, max_cache=32
nr_free=0, nr_reserved=1
  cpu00: nr_free=0
  cpu01: nr_free=0
/sys/block/vdb/mq/0/ipi_redirect
0
/sys/block/vdb/mq/0/queued
1384795
/sys/block/vdb/mq/0/dispatched
       0        194596
       1        1022691
       2        68540
       4        10138
       8        7686
      16        10234
      32        0
      64        0
     128        0
     256        0
/sys/block/vdb/mq/0/pending
HCTX pending:
#

It looks like it's hung because of this:

/sys/block/vdb/mq/0/tags
nr_tags=66, reserved_tags=2, batch_move=16, max_cache=32
nr_free=0, nr_reserved=1
  cpu00: nr_free=0
  cpu01: nr_free=0

No free tags in teh pool, nor any free tags in the per-cpu
magazines. Perhaps there is a leak somewhere?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-19 21:20   ` Jens Axboe
@ 2013-11-19 21:34     ` Dave Chinner
  2013-11-19 21:43       ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2013-11-19 21:34 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

On Tue, Nov 19, 2013 at 02:20:42PM -0700, Jens Axboe wrote:
> On Tue, Nov 19 2013, Jens Axboe wrote:
> > On Tue, Nov 19 2013, Dave Chinner wrote:
> > > Hi Jens,
> > > 
> > > I was just running xfstests on a 3.13 kernel that has had the block
> > > layer changed merged into it. generic/269 on XFS is hanging on a 2
> > > CPU VM using virtio,cache=none for the block devices under test,
> > > with many (130+) threads stuck below submit_bio() like this:
> > > 
> > >  Call Trace:
> > >   [<ffffffff81adb1c9>] schedule+0x29/0x70
> > >   [<ffffffff817833ee>] percpu_ida_alloc+0x16e/0x330
> > >   [<ffffffff81759bef>] blk_mq_wait_for_tags+0x1f/0x40
> > >   [<ffffffff81758bee>] blk_mq_alloc_request_pinned+0x4e/0xf0
> > >   [<ffffffff8175931b>] blk_mq_make_request+0x3bb/0x4a0
> > >   [<ffffffff8174d2b2>] generic_make_request+0xc2/0x110
> > >   [<ffffffff8174e40c>] submit_bio+0x6c/0x120
> > > 
> > > reads and writes are hung, both data (direct and buffered) and
> > > metadata.
> > > 
> > > Some IOs are sitting in io_schedule, waiting for IO completion (both
> > > buffered and direct IO, both reads and writes) so it looks like IO
> > > completion has stalled in some manner, too.
> > 
> > Can I get a recipe to reproduce this? I haven't had any luck so far.
> 
> OK, I reproduced it. Looks weird, basically all 64 commands are in
> flight, but haven't completed. So the next one that comes in just sits
> there forever. I can't find any sysfs debug entries for virtio, would be
> nice to inspect its queue as well...

Does it have anything to do with the fact that the request queue
depth is 128 entries and the tag pool only has 66 tags in it? i.e:

/sys/block/vdb/queue/nr_requests
128

/sys/block/vdb/mq/0/tags
nr_tags=66, reserved_tags=2, batch_move=16, max_cache=32
nr_free=0, nr_reserved=1
  cpu00: nr_free=0
  cpu01: nr_free=0

Seems to imply that if we queue up more than 66 IOs without
dispatching them, we'll run out of tags. And without another IO
coming through, the "none" scheduler that virtio uses will never
get a trigger to push out the currently queued IO?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-19 21:30     ` Dave Chinner
@ 2013-11-19 21:40       ` Jens Axboe
  0 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2013-11-19 21:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-kernel

On 11/19/2013 02:30 PM, Dave Chinner wrote:
> On Tue, Nov 19, 2013 at 09:05:09AM -0700, Jens Axboe wrote:
>> On Tue, Nov 19 2013, Christoph Hellwig wrote:
>>> On Tue, Nov 19, 2013 at 07:02:18PM +1100, Dave Chinner wrote:
>>>> I have no idea if it's related to the above hang, but either way
>>>> breaking iostat is a major regression....
>>>
>>> Both of them are most likely due to the conversion of virtio_blk
>>> to the blk-mq code.
>>>
>>> I've not seen the hang in my heavy xfstests testing, but that was a
>>> slightly different codebase than what finally got in, so I'll try
>>> to reproduce it once I get some spare QA cycles.
>>
>> Thanks! Dave, if you still have it in that state, can you dump the
>> contents of /sys/block/<devs>/mq/ for the device/devices that are hung?
> 
> # find /sys/block/vdb/mq -type f -print -exec cat {} \;
> /sys/block/vdb/mq/0/run
> 1313835
> /sys/block/vdb/mq/0/cpu0/completed
> 546857 207203
> /sys/block/vdb/mq/0/cpu0/rq_list
> CTX pending:
> /sys/block/vdb/mq/0/cpu0/merged
> 0
> /sys/block/vdb/mq/0/cpu0/dispatched
> 257733 496352
> /sys/block/vdb/mq/0/cpu1/completed
> 547714 200741
> /sys/block/vdb/mq/0/cpu1/rq_list
> CTX pending:
> /sys/block/vdb/mq/0/cpu1/merged
> 0
> /sys/block/vdb/mq/0/cpu1/dispatched
> 262114 486381
> /sys/block/vdb/mq/0/tags
> nr_tags=66, reserved_tags=2, batch_move=16, max_cache=32
> nr_free=0, nr_reserved=1
>   cpu00: nr_free=0
>   cpu01: nr_free=0
> /sys/block/vdb/mq/0/ipi_redirect
> 0
> /sys/block/vdb/mq/0/queued
> 1384795
> /sys/block/vdb/mq/0/dispatched
>        0        194596
>        1        1022691
>        2        68540
>        4        10138
>        8        7686
>       16        10234
>       32        0
>       64        0
>      128        0
>      256        0
> /sys/block/vdb/mq/0/pending
> HCTX pending:
> #
> 
> It looks like it's hung because of this:
> 
> /sys/block/vdb/mq/0/tags
> nr_tags=66, reserved_tags=2, batch_move=16, max_cache=32
> nr_free=0, nr_reserved=1
>   cpu00: nr_free=0
>   cpu01: nr_free=0
> 
> No free tags in teh pool, nor any free tags in the per-cpu
> magazines. Perhaps there is a leak somewhere?

It's hung because all tags and requests are allocated an inflight. In
the above, you have 65 allocated, which must be 64 reads+writes and a
flush out of the reserved pool. Which matches the free stats here near
the bottom, 0 normal tags free and 1 reserved free.

The end_io path for virtio-blk is pretty darn simple. I'm trying to
debug this now...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-19 21:34     ` Dave Chinner
@ 2013-11-19 21:43       ` Jens Axboe
  2013-11-19 22:42         ` Jens Axboe
  2013-11-20  8:04         ` Christoph Hellwig
  0 siblings, 2 replies; 25+ messages in thread
From: Jens Axboe @ 2013-11-19 21:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel

On 11/19/2013 02:34 PM, Dave Chinner wrote:
> On Tue, Nov 19, 2013 at 02:20:42PM -0700, Jens Axboe wrote:
>> On Tue, Nov 19 2013, Jens Axboe wrote:
>>> On Tue, Nov 19 2013, Dave Chinner wrote:
>>>> Hi Jens,
>>>>
>>>> I was just running xfstests on a 3.13 kernel that has had the block
>>>> layer changed merged into it. generic/269 on XFS is hanging on a 2
>>>> CPU VM using virtio,cache=none for the block devices under test,
>>>> with many (130+) threads stuck below submit_bio() like this:
>>>>
>>>>  Call Trace:
>>>>   [<ffffffff81adb1c9>] schedule+0x29/0x70
>>>>   [<ffffffff817833ee>] percpu_ida_alloc+0x16e/0x330
>>>>   [<ffffffff81759bef>] blk_mq_wait_for_tags+0x1f/0x40
>>>>   [<ffffffff81758bee>] blk_mq_alloc_request_pinned+0x4e/0xf0
>>>>   [<ffffffff8175931b>] blk_mq_make_request+0x3bb/0x4a0
>>>>   [<ffffffff8174d2b2>] generic_make_request+0xc2/0x110
>>>>   [<ffffffff8174e40c>] submit_bio+0x6c/0x120
>>>>
>>>> reads and writes are hung, both data (direct and buffered) and
>>>> metadata.
>>>>
>>>> Some IOs are sitting in io_schedule, waiting for IO completion (both
>>>> buffered and direct IO, both reads and writes) so it looks like IO
>>>> completion has stalled in some manner, too.
>>>
>>> Can I get a recipe to reproduce this? I haven't had any luck so far.
>>
>> OK, I reproduced it. Looks weird, basically all 64 commands are in
>> flight, but haven't completed. So the next one that comes in just sits
>> there forever. I can't find any sysfs debug entries for virtio, would be
>> nice to inspect its queue as well...
> 
> Does it have anything to do with the fact that the request queue
> depth is 128 entries and the tag pool only has 66 tags in it? i.e:
> 
> /sys/block/vdb/queue/nr_requests
> 128
> 
> /sys/block/vdb/mq/0/tags
> nr_tags=66, reserved_tags=2, batch_move=16, max_cache=32
> nr_free=0, nr_reserved=1
>   cpu00: nr_free=0
>   cpu01: nr_free=0
> 
> Seems to imply that if we queue up more than 66 IOs without
> dispatching them, we'll run out of tags. And without another IO
> coming through, the "none" scheduler that virtio uses will never
> get a trigger to push out the currently queued IO?

No, the nr_requests isn't actually relevant in the blk-mq context, the
driver sets its own depth. For the above, it's 64 normal commands, and 2
reserved. The reserved would be for a flush, for instance. If someone
attempts to queue more than the allocated number of requests, it'll stop
the blk-mq queue and kick things into gear on the virtio side. Then when
requests complete, we start the queue again.

If you look at virtio_queue_rq(), that handles a single request. This
request is already tagged at this point. If we can't add it to the ring,
we simply stop the queue and kick off whatever pending we might have. We
return BLK_MQ_RQ_QUEUE_BUSY to blk-mq, which tells that to back off on
sending us more. When we get the virtblk_done() callback from virtio, we
end the requests on the blk-mq side and restart the queue.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-19 21:43       ` Jens Axboe
@ 2013-11-19 22:42         ` Jens Axboe
  2013-11-19 22:51           ` Jens Axboe
  2013-11-20  8:04         ` Christoph Hellwig
  1 sibling, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2013-11-19 22:42 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel

On 11/19/2013 02:43 PM, Jens Axboe wrote:
> On 11/19/2013 02:34 PM, Dave Chinner wrote:
>> On Tue, Nov 19, 2013 at 02:20:42PM -0700, Jens Axboe wrote:
>>> On Tue, Nov 19 2013, Jens Axboe wrote:
>>>> On Tue, Nov 19 2013, Dave Chinner wrote:
>>>>> Hi Jens,
>>>>>
>>>>> I was just running xfstests on a 3.13 kernel that has had the block
>>>>> layer changed merged into it. generic/269 on XFS is hanging on a 2
>>>>> CPU VM using virtio,cache=none for the block devices under test,
>>>>> with many (130+) threads stuck below submit_bio() like this:
>>>>>
>>>>>  Call Trace:
>>>>>   [<ffffffff81adb1c9>] schedule+0x29/0x70
>>>>>   [<ffffffff817833ee>] percpu_ida_alloc+0x16e/0x330
>>>>>   [<ffffffff81759bef>] blk_mq_wait_for_tags+0x1f/0x40
>>>>>   [<ffffffff81758bee>] blk_mq_alloc_request_pinned+0x4e/0xf0
>>>>>   [<ffffffff8175931b>] blk_mq_make_request+0x3bb/0x4a0
>>>>>   [<ffffffff8174d2b2>] generic_make_request+0xc2/0x110
>>>>>   [<ffffffff8174e40c>] submit_bio+0x6c/0x120
>>>>>
>>>>> reads and writes are hung, both data (direct and buffered) and
>>>>> metadata.
>>>>>
>>>>> Some IOs are sitting in io_schedule, waiting for IO completion (both
>>>>> buffered and direct IO, both reads and writes) so it looks like IO
>>>>> completion has stalled in some manner, too.
>>>>
>>>> Can I get a recipe to reproduce this? I haven't had any luck so far.
>>>
>>> OK, I reproduced it. Looks weird, basically all 64 commands are in
>>> flight, but haven't completed. So the next one that comes in just sits
>>> there forever. I can't find any sysfs debug entries for virtio, would be
>>> nice to inspect its queue as well...
>>
>> Does it have anything to do with the fact that the request queue
>> depth is 128 entries and the tag pool only has 66 tags in it? i.e:
>>
>> /sys/block/vdb/queue/nr_requests
>> 128
>>
>> /sys/block/vdb/mq/0/tags
>> nr_tags=66, reserved_tags=2, batch_move=16, max_cache=32
>> nr_free=0, nr_reserved=1
>>   cpu00: nr_free=0
>>   cpu01: nr_free=0
>>
>> Seems to imply that if we queue up more than 66 IOs without
>> dispatching them, we'll run out of tags. And without another IO
>> coming through, the "none" scheduler that virtio uses will never
>> get a trigger to push out the currently queued IO?
> 
> No, the nr_requests isn't actually relevant in the blk-mq context, the
> driver sets its own depth. For the above, it's 64 normal commands, and 2
> reserved. The reserved would be for a flush, for instance. If someone
> attempts to queue more than the allocated number of requests, it'll stop
> the blk-mq queue and kick things into gear on the virtio side. Then when
> requests complete, we start the queue again.
> 
> If you look at virtio_queue_rq(), that handles a single request. This
> request is already tagged at this point. If we can't add it to the ring,
> we simply stop the queue and kick off whatever pending we might have. We
> return BLK_MQ_RQ_QUEUE_BUSY to blk-mq, which tells that to back off on
> sending us more. When we get the virtblk_done() callback from virtio, we
> end the requests on the blk-mq side and restart the queue.

I added some debug code to see if we had anything pending on the blk-mq
side, and it's all empty. It really just looks like we are missing
completions on the virtio side. Very odd.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-19 22:42         ` Jens Axboe
@ 2013-11-19 22:51           ` Jens Axboe
  2013-11-19 23:23             ` Dave Chinner
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2013-11-19 22:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel

On 11/19/2013 03:42 PM, Jens Axboe wrote:
> On 11/19/2013 02:43 PM, Jens Axboe wrote:
>> On 11/19/2013 02:34 PM, Dave Chinner wrote:
>>> On Tue, Nov 19, 2013 at 02:20:42PM -0700, Jens Axboe wrote:
>>>> On Tue, Nov 19 2013, Jens Axboe wrote:
>>>>> On Tue, Nov 19 2013, Dave Chinner wrote:
>>>>>> Hi Jens,
>>>>>>
>>>>>> I was just running xfstests on a 3.13 kernel that has had the block
>>>>>> layer changed merged into it. generic/269 on XFS is hanging on a 2
>>>>>> CPU VM using virtio,cache=none for the block devices under test,
>>>>>> with many (130+) threads stuck below submit_bio() like this:
>>>>>>
>>>>>>  Call Trace:
>>>>>>   [<ffffffff81adb1c9>] schedule+0x29/0x70
>>>>>>   [<ffffffff817833ee>] percpu_ida_alloc+0x16e/0x330
>>>>>>   [<ffffffff81759bef>] blk_mq_wait_for_tags+0x1f/0x40
>>>>>>   [<ffffffff81758bee>] blk_mq_alloc_request_pinned+0x4e/0xf0
>>>>>>   [<ffffffff8175931b>] blk_mq_make_request+0x3bb/0x4a0
>>>>>>   [<ffffffff8174d2b2>] generic_make_request+0xc2/0x110
>>>>>>   [<ffffffff8174e40c>] submit_bio+0x6c/0x120
>>>>>>
>>>>>> reads and writes are hung, both data (direct and buffered) and
>>>>>> metadata.
>>>>>>
>>>>>> Some IOs are sitting in io_schedule, waiting for IO completion (both
>>>>>> buffered and direct IO, both reads and writes) so it looks like IO
>>>>>> completion has stalled in some manner, too.
>>>>>
>>>>> Can I get a recipe to reproduce this? I haven't had any luck so far.
>>>>
>>>> OK, I reproduced it. Looks weird, basically all 64 commands are in
>>>> flight, but haven't completed. So the next one that comes in just sits
>>>> there forever. I can't find any sysfs debug entries for virtio, would be
>>>> nice to inspect its queue as well...
>>>
>>> Does it have anything to do with the fact that the request queue
>>> depth is 128 entries and the tag pool only has 66 tags in it? i.e:
>>>
>>> /sys/block/vdb/queue/nr_requests
>>> 128
>>>
>>> /sys/block/vdb/mq/0/tags
>>> nr_tags=66, reserved_tags=2, batch_move=16, max_cache=32
>>> nr_free=0, nr_reserved=1
>>>   cpu00: nr_free=0
>>>   cpu01: nr_free=0
>>>
>>> Seems to imply that if we queue up more than 66 IOs without
>>> dispatching them, we'll run out of tags. And without another IO
>>> coming through, the "none" scheduler that virtio uses will never
>>> get a trigger to push out the currently queued IO?
>>
>> No, the nr_requests isn't actually relevant in the blk-mq context, the
>> driver sets its own depth. For the above, it's 64 normal commands, and 2
>> reserved. The reserved would be for a flush, for instance. If someone
>> attempts to queue more than the allocated number of requests, it'll stop
>> the blk-mq queue and kick things into gear on the virtio side. Then when
>> requests complete, we start the queue again.
>>
>> If you look at virtio_queue_rq(), that handles a single request. This
>> request is already tagged at this point. If we can't add it to the ring,
>> we simply stop the queue and kick off whatever pending we might have. We
>> return BLK_MQ_RQ_QUEUE_BUSY to blk-mq, which tells that to back off on
>> sending us more. When we get the virtblk_done() callback from virtio, we
>> end the requests on the blk-mq side and restart the queue.
> 
> I added some debug code to see if we had anything pending on the blk-mq
> side, and it's all empty. It really just looks like we are missing
> completions on the virtio side. Very odd.

Patching in the old rq path works however, so...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-19 22:51           ` Jens Axboe
@ 2013-11-19 23:23             ` Dave Chinner
  2013-11-19 23:59               ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2013-11-19 23:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

On Tue, Nov 19, 2013 at 03:51:27PM -0700, Jens Axboe wrote:
> On 11/19/2013 03:42 PM, Jens Axboe wrote:
> > On 11/19/2013 02:43 PM, Jens Axboe wrote:
> >> On 11/19/2013 02:34 PM, Dave Chinner wrote:
> >>> On Tue, Nov 19, 2013 at 02:20:42PM -0700, Jens Axboe wrote:
> >>>> On Tue, Nov 19 2013, Jens Axboe wrote:
> >>>>> On Tue, Nov 19 2013, Dave Chinner wrote:
> >>>>>> Hi Jens,
> >>>>>>
> >>>>>> I was just running xfstests on a 3.13 kernel that has had the block
> >>>>>> layer changed merged into it. generic/269 on XFS is hanging on a 2
> >>>>>> CPU VM using virtio,cache=none for the block devices under test,
> >>>>>> with many (130+) threads stuck below submit_bio() like this:
> >>>>>>
> >>>>>>  Call Trace:
> >>>>>>   [<ffffffff81adb1c9>] schedule+0x29/0x70
> >>>>>>   [<ffffffff817833ee>] percpu_ida_alloc+0x16e/0x330
> >>>>>>   [<ffffffff81759bef>] blk_mq_wait_for_tags+0x1f/0x40
> >>>>>>   [<ffffffff81758bee>] blk_mq_alloc_request_pinned+0x4e/0xf0
> >>>>>>   [<ffffffff8175931b>] blk_mq_make_request+0x3bb/0x4a0
> >>>>>>   [<ffffffff8174d2b2>] generic_make_request+0xc2/0x110
> >>>>>>   [<ffffffff8174e40c>] submit_bio+0x6c/0x120
> >>>>>>
> >>>>>> reads and writes are hung, both data (direct and buffered) and
> >>>>>> metadata.
> >>>>>>
> >>>>>> Some IOs are sitting in io_schedule, waiting for IO completion (both
> >>>>>> buffered and direct IO, both reads and writes) so it looks like IO
> >>>>>> completion has stalled in some manner, too.
> >>>>>
> >>>>> Can I get a recipe to reproduce this? I haven't had any luck so far.
> >>>>
> >>>> OK, I reproduced it. Looks weird, basically all 64 commands are in
> >>>> flight, but haven't completed. So the next one that comes in just sits
> >>>> there forever. I can't find any sysfs debug entries for virtio, would be
> >>>> nice to inspect its queue as well...
> >>>
> >>> Does it have anything to do with the fact that the request queue
> >>> depth is 128 entries and the tag pool only has 66 tags in it? i.e:
> >>>
> >>> /sys/block/vdb/queue/nr_requests
> >>> 128
> >>>
> >>> /sys/block/vdb/mq/0/tags
> >>> nr_tags=66, reserved_tags=2, batch_move=16, max_cache=32
> >>> nr_free=0, nr_reserved=1
> >>>   cpu00: nr_free=0
> >>>   cpu01: nr_free=0
> >>>
> >>> Seems to imply that if we queue up more than 66 IOs without
> >>> dispatching them, we'll run out of tags. And without another IO
> >>> coming through, the "none" scheduler that virtio uses will never
> >>> get a trigger to push out the currently queued IO?
> >>
> >> No, the nr_requests isn't actually relevant in the blk-mq context, the
> >> driver sets its own depth. For the above, it's 64 normal commands, and 2
> >> reserved. The reserved would be for a flush, for instance. If someone
> >> attempts to queue more than the allocated number of requests, it'll stop
> >> the blk-mq queue and kick things into gear on the virtio side. Then when
> >> requests complete, we start the queue again.
> >>
> >> If you look at virtio_queue_rq(), that handles a single request. This
> >> request is already tagged at this point. If we can't add it to the ring,
> >> we simply stop the queue and kick off whatever pending we might have. We
> >> return BLK_MQ_RQ_QUEUE_BUSY to blk-mq, which tells that to back off on
> >> sending us more. When we get the virtblk_done() callback from virtio, we
> >> end the requests on the blk-mq side and restart the queue.
> > 
> > I added some debug code to see if we had anything pending on the blk-mq
> > side, and it's all empty. It really just looks like we are missing
> > completions on the virtio side. Very odd.
> 
> Patching in the old rq path works however, so...

... we've found a race condition ;)

FWIW, the thing that comes immediately to mind for me is that all
the wakeups and queue kicking is done outside of any locks, using
status gained from inside the lock context. Hence the mq stop/start
and virt queue kicking can all race - perhaps that's resulting in a
missed wakeup, restart or queue kick?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-19 23:23             ` Dave Chinner
@ 2013-11-19 23:59               ` Jens Axboe
  2013-11-20  0:08                 ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2013-11-19 23:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel

On Wed, Nov 20 2013, Dave Chinner wrote:
> On Tue, Nov 19, 2013 at 03:51:27PM -0700, Jens Axboe wrote:
> > On 11/19/2013 03:42 PM, Jens Axboe wrote:
> > > On 11/19/2013 02:43 PM, Jens Axboe wrote:
> > >> On 11/19/2013 02:34 PM, Dave Chinner wrote:
> > >>> On Tue, Nov 19, 2013 at 02:20:42PM -0700, Jens Axboe wrote:
> > >>>> On Tue, Nov 19 2013, Jens Axboe wrote:
> > >>>>> On Tue, Nov 19 2013, Dave Chinner wrote:
> > >>>>>> Hi Jens,
> > >>>>>>
> > >>>>>> I was just running xfstests on a 3.13 kernel that has had the block
> > >>>>>> layer changed merged into it. generic/269 on XFS is hanging on a 2
> > >>>>>> CPU VM using virtio,cache=none for the block devices under test,
> > >>>>>> with many (130+) threads stuck below submit_bio() like this:
> > >>>>>>
> > >>>>>>  Call Trace:
> > >>>>>>   [<ffffffff81adb1c9>] schedule+0x29/0x70
> > >>>>>>   [<ffffffff817833ee>] percpu_ida_alloc+0x16e/0x330
> > >>>>>>   [<ffffffff81759bef>] blk_mq_wait_for_tags+0x1f/0x40
> > >>>>>>   [<ffffffff81758bee>] blk_mq_alloc_request_pinned+0x4e/0xf0
> > >>>>>>   [<ffffffff8175931b>] blk_mq_make_request+0x3bb/0x4a0
> > >>>>>>   [<ffffffff8174d2b2>] generic_make_request+0xc2/0x110
> > >>>>>>   [<ffffffff8174e40c>] submit_bio+0x6c/0x120
> > >>>>>>
> > >>>>>> reads and writes are hung, both data (direct and buffered) and
> > >>>>>> metadata.
> > >>>>>>
> > >>>>>> Some IOs are sitting in io_schedule, waiting for IO completion (both
> > >>>>>> buffered and direct IO, both reads and writes) so it looks like IO
> > >>>>>> completion has stalled in some manner, too.
> > >>>>>
> > >>>>> Can I get a recipe to reproduce this? I haven't had any luck so far.
> > >>>>
> > >>>> OK, I reproduced it. Looks weird, basically all 64 commands are in
> > >>>> flight, but haven't completed. So the next one that comes in just sits
> > >>>> there forever. I can't find any sysfs debug entries for virtio, would be
> > >>>> nice to inspect its queue as well...
> > >>>
> > >>> Does it have anything to do with the fact that the request queue
> > >>> depth is 128 entries and the tag pool only has 66 tags in it? i.e:
> > >>>
> > >>> /sys/block/vdb/queue/nr_requests
> > >>> 128
> > >>>
> > >>> /sys/block/vdb/mq/0/tags
> > >>> nr_tags=66, reserved_tags=2, batch_move=16, max_cache=32
> > >>> nr_free=0, nr_reserved=1
> > >>>   cpu00: nr_free=0
> > >>>   cpu01: nr_free=0
> > >>>
> > >>> Seems to imply that if we queue up more than 66 IOs without
> > >>> dispatching them, we'll run out of tags. And without another IO
> > >>> coming through, the "none" scheduler that virtio uses will never
> > >>> get a trigger to push out the currently queued IO?
> > >>
> > >> No, the nr_requests isn't actually relevant in the blk-mq context, the
> > >> driver sets its own depth. For the above, it's 64 normal commands, and 2
> > >> reserved. The reserved would be for a flush, for instance. If someone
> > >> attempts to queue more than the allocated number of requests, it'll stop
> > >> the blk-mq queue and kick things into gear on the virtio side. Then when
> > >> requests complete, we start the queue again.
> > >>
> > >> If you look at virtio_queue_rq(), that handles a single request. This
> > >> request is already tagged at this point. If we can't add it to the ring,
> > >> we simply stop the queue and kick off whatever pending we might have. We
> > >> return BLK_MQ_RQ_QUEUE_BUSY to blk-mq, which tells that to back off on
> > >> sending us more. When we get the virtblk_done() callback from virtio, we
> > >> end the requests on the blk-mq side and restart the queue.
> > > 
> > > I added some debug code to see if we had anything pending on the blk-mq
> > > side, and it's all empty. It really just looks like we are missing
> > > completions on the virtio side. Very odd.
> > 
> > Patching in the old rq path works however, so...
> 
> ... we've found a race condition ;)
> 
> FWIW, the thing that comes immediately to mind for me is that all
> the wakeups and queue kicking is done outside of any locks, using
> status gained from inside the lock context. Hence the mq stop/start
> and virt queue kicking can all race - perhaps that's resulting in a
> missed wakeup, restart or queue kick?

I _think_ it only happens when queue depth > device queue depth, but I
don't immediately see what is wrong. That logic is pretty much tried and
true. As a debug measure, I just ignored the stop bit, but it doesn't
change anything. BTW, that is pretty much the same as before.

Anyway, no real news yet, will debug later tonight or tomorrow morning
again.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-19 23:59               ` Jens Axboe
@ 2013-11-20  0:08                 ` Jens Axboe
  2013-11-20  1:44                   ` Shaohua Li
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2013-11-20  0:08 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel

On Tue, Nov 19 2013, Jens Axboe wrote:
> On Wed, Nov 20 2013, Dave Chinner wrote:
> > On Tue, Nov 19, 2013 at 03:51:27PM -0700, Jens Axboe wrote:
> > > On 11/19/2013 03:42 PM, Jens Axboe wrote:
> > > > On 11/19/2013 02:43 PM, Jens Axboe wrote:
> > > >> On 11/19/2013 02:34 PM, Dave Chinner wrote:
> > > >>> On Tue, Nov 19, 2013 at 02:20:42PM -0700, Jens Axboe wrote:
> > > >>>> On Tue, Nov 19 2013, Jens Axboe wrote:
> > > >>>>> On Tue, Nov 19 2013, Dave Chinner wrote:
> > > >>>>>> Hi Jens,
> > > >>>>>>
> > > >>>>>> I was just running xfstests on a 3.13 kernel that has had the block
> > > >>>>>> layer changed merged into it. generic/269 on XFS is hanging on a 2
> > > >>>>>> CPU VM using virtio,cache=none for the block devices under test,
> > > >>>>>> with many (130+) threads stuck below submit_bio() like this:
> > > >>>>>>
> > > >>>>>>  Call Trace:
> > > >>>>>>   [<ffffffff81adb1c9>] schedule+0x29/0x70
> > > >>>>>>   [<ffffffff817833ee>] percpu_ida_alloc+0x16e/0x330
> > > >>>>>>   [<ffffffff81759bef>] blk_mq_wait_for_tags+0x1f/0x40
> > > >>>>>>   [<ffffffff81758bee>] blk_mq_alloc_request_pinned+0x4e/0xf0
> > > >>>>>>   [<ffffffff8175931b>] blk_mq_make_request+0x3bb/0x4a0
> > > >>>>>>   [<ffffffff8174d2b2>] generic_make_request+0xc2/0x110
> > > >>>>>>   [<ffffffff8174e40c>] submit_bio+0x6c/0x120
> > > >>>>>>
> > > >>>>>> reads and writes are hung, both data (direct and buffered) and
> > > >>>>>> metadata.
> > > >>>>>>
> > > >>>>>> Some IOs are sitting in io_schedule, waiting for IO completion (both
> > > >>>>>> buffered and direct IO, both reads and writes) so it looks like IO
> > > >>>>>> completion has stalled in some manner, too.
> > > >>>>>
> > > >>>>> Can I get a recipe to reproduce this? I haven't had any luck so far.
> > > >>>>
> > > >>>> OK, I reproduced it. Looks weird, basically all 64 commands are in
> > > >>>> flight, but haven't completed. So the next one that comes in just sits
> > > >>>> there forever. I can't find any sysfs debug entries for virtio, would be
> > > >>>> nice to inspect its queue as well...
> > > >>>
> > > >>> Does it have anything to do with the fact that the request queue
> > > >>> depth is 128 entries and the tag pool only has 66 tags in it? i.e:
> > > >>>
> > > >>> /sys/block/vdb/queue/nr_requests
> > > >>> 128
> > > >>>
> > > >>> /sys/block/vdb/mq/0/tags
> > > >>> nr_tags=66, reserved_tags=2, batch_move=16, max_cache=32
> > > >>> nr_free=0, nr_reserved=1
> > > >>>   cpu00: nr_free=0
> > > >>>   cpu01: nr_free=0
> > > >>>
> > > >>> Seems to imply that if we queue up more than 66 IOs without
> > > >>> dispatching them, we'll run out of tags. And without another IO
> > > >>> coming through, the "none" scheduler that virtio uses will never
> > > >>> get a trigger to push out the currently queued IO?
> > > >>
> > > >> No, the nr_requests isn't actually relevant in the blk-mq context, the
> > > >> driver sets its own depth. For the above, it's 64 normal commands, and 2
> > > >> reserved. The reserved would be for a flush, for instance. If someone
> > > >> attempts to queue more than the allocated number of requests, it'll stop
> > > >> the blk-mq queue and kick things into gear on the virtio side. Then when
> > > >> requests complete, we start the queue again.
> > > >>
> > > >> If you look at virtio_queue_rq(), that handles a single request. This
> > > >> request is already tagged at this point. If we can't add it to the ring,
> > > >> we simply stop the queue and kick off whatever pending we might have. We
> > > >> return BLK_MQ_RQ_QUEUE_BUSY to blk-mq, which tells that to back off on
> > > >> sending us more. When we get the virtblk_done() callback from virtio, we
> > > >> end the requests on the blk-mq side and restart the queue.
> > > > 
> > > > I added some debug code to see if we had anything pending on the blk-mq
> > > > side, and it's all empty. It really just looks like we are missing
> > > > completions on the virtio side. Very odd.
> > > 
> > > Patching in the old rq path works however, so...
> > 
> > ... we've found a race condition ;)
> > 
> > FWIW, the thing that comes immediately to mind for me is that all
> > the wakeups and queue kicking is done outside of any locks, using
> > status gained from inside the lock context. Hence the mq stop/start
> > and virt queue kicking can all race - perhaps that's resulting in a
> > missed wakeup, restart or queue kick?
> 
> I _think_ it only happens when queue depth > device queue depth, but I
> don't immediately see what is wrong. That logic is pretty much tried and
> true. As a debug measure, I just ignored the stop bit, but it doesn't
> change anything. BTW, that is pretty much the same as before.
> 
> Anyway, no real news yet, will debug later tonight or tomorrow morning
> again.

BTW, ran a blktrace on the device, also looks like completion events
just stop coming in:

[...]
253,16   0   321645     2.044497762  1394  P   N [fio]
253,16   0   321646     2.044498812  1394  U   N [fio] 1
253,16   0   321647     2.044499839  1394  D   R 5768832 + 8 [fio]
253,16   1   307476     2.044504353  1396  S   R 4412024 + 8 [fio]
253,16   1   307477     2.044505381  1396  P   N [fio]
253,16   1   307478     2.044506799  1396  U   N [fio] 1
253,16   1   307479     2.044507784  1396  D   R 4412024 + 8 [fio]
253,16   0   321648     2.044509571  1394  C   R 3828264 + 8 [0]
253,16   0   321649     2.044525980  1394  Q   R 13757184 + 8 [fio]
253,16   0   321650     2.044526706  1394  G   R 13757184 + 8 [fio]
253,16   1   307480     2.044527670  1396  C   R 16292472 + 8 [0]
253,16   0   321651     2.044527830  1394  P   N [fio]
253,16   0   321652     2.044528889  1394  U   N [fio] 1
253,16   0   321653     2.044530006  1394  D   R 13757184 + 8 [fio]
253,16   0   321654     2.044539915  1394  C   R 7557264 + 8 [0]
253,16   0   321655     2.044544008  1394  C   R 4320544 + 8 [0]
253,16   1   307481     2.044574922  1396  C   R 15264800 + 8 [0]
253,16   1   307482     2.044579465  1396  C   R 15595656 + 8 [0]
253,16   0   321656     2.044591661  1394  C   R 8071352 + 8 [0]
253,16   1   307483     2.044595664  1395  P   N [fio]
253,16   1   307484     2.044597335  1395  U   N [fio] 1
253,16   1   307485     2.044605257  1395  C   R 15726296 + 8 [0]
253,16   0   321657     2.044607859  1394  Q   R 17562168 + 8 [fio]
253,16   0   321658     2.044608531  1394  G   R 17562168 + 8 [fio]
253,16   0   321659     2.044609756  1394  P   N [fio]
253,16   0   321660     2.044611086  1394  U   N [fio] 1
253,16   0   321661     2.044612275  1394  D   R 17562168 + 8 [fio]
253,16   1   307486     2.044614316  1395  D   R 4664552 + 8 [fio]
253,16   0   321662     2.044621707  1394  C   R 18403168 + 8 [0]
253,16   0   321663     2.044626241  1394  C   R 6357824 + 8 [0]
253,16   1   307487     2.044637446  1393  Q   R 2061240 + 8 [fio]
253,16   1   307488     2.044638263  1393  G   R 2061240 + 8 [fio]
253,16   1   307489     2.044645914  1393  C   R 6364168 + 8 [0]
253,16   0   321664     2.044650488  1394  Q   R 20380488 + 8 [fio]
253,16   0   321665     2.044651199  1394  G   R 20380488 + 8 [fio]
253,16   1   307490     2.044654060  1393  P   N [fio]
253,16   1   307491     2.044655466  1393  U   N [fio] 1
253,16   1   307492     2.044656734  1393  D   R 2061240 + 8 [fio]
253,16   0   321666     2.044659910  1394  C   R 16579544 + 8 [0]
253,16   0   321667     2.044663679  1394  C   R 14878944 + 8 [0]
253,16   0   321668     2.044671537  1394  P   N [fio]
253,16   0   321669     2.044673120  1394  U   N [fio] 1
253,16   0   321670     2.044674203  1394  D   R 20380488 + 8 [fio]
253,16   1   307493     2.044678596  1395  Q   R 15126872 + 8 [fio]
253,16   1   307494     2.044679280  1395  G   R 15126872 + 8 [fio]
253,16   1   307495     2.044682674  1395  P   N [fio]
253,16   1   307496     2.044689057  1395  C   R 6251136 + 8 [0]
253,16   1   307497     2.044697970  1395  U   N [fio] 1
253,16   1   307498     2.044699073  1395  D   R 15126872 + 8 [fio]
253,16   0   321671     2.044702366  1394  C   R 6028184 + 8 [0]
253,16   0   321672     2.044710344  1394  Q   R 8698464 + 8 [fio]
253,16   0   321673     2.044711072  1394  G   R 8698464 + 8 [fio]
253,16   0   321674     2.044712179  1394  P   N [fio]
253,16   0   321675     2.044713626  1394  U   N [fio] 1
253,16   1   307499     2.044713961  1393  C   R 20851408 + 8 [0]
253,16   0   321676     2.044714693  1394  D   R 8698464 + 8 [fio]
253,16   0   321677     2.044724918  1394  C   R 16890840 + 8 [0]
253,16   1   307500     2.044733628  1393  Q   R 20289656 + 8 [fio]
253,16   1   307501     2.044739576  1393  C   R 6570816 + 8 [0]
253,16   1   307502     2.044747620  1393  G   R 20289656 + 8 [fio]
253,16   1   307503     2.044748710  1393  P   N [fio]
253,16   1   307504     2.044750128  1393  U   N [fio] 1
253,16   0   321678     2.044750392  1394  C   R 6513968 + 8 [0]
253,16   1   307505     2.044751307  1393  D   R 20289656 + 8 [fio]
253,16   0   321679     2.044758314  1394  Q   R 5781280 + 8 [fio]
253,16   0   321680     2.044758861  1394  G   R 5781280 + 8 [fio]
253,16   0   321681     2.044759973  1394  P   N [fio]
253,16   0   321682     2.044761183  1394  U   N [fio] 1
253,16   1   307506     2.044761883  1396  Q   R 7477904 + 8 [fio]
253,16   0   321683     2.044762348  1394  D   R 5781280 + 8 [fio]
253,16   1   307507     2.044762462  1396  G   R 7477904 + 8 [fio]
253,16   1   307508     2.044763486  1396  P   N [fio]
253,16   1   307509     2.044764604  1396  U   N [fio] 1
253,16   1   307510     2.044765426  1396  D   R 7477904 + 8 [fio]
253,16   0   321684     2.044776281  1394  Q   R 17552200 + 8 [fio]
253,16   0   321685     2.044776867  1394  G   R 17552200 + 8 [fio]
253,16   0   321686     2.044777975  1394  P   N [fio]
253,16   1   307511     2.044778944  1396  C   R 16910272 + 8 [0]
253,16   0   321687     2.044779060  1394  U   N [fio] 1
253,16   0   321688     2.044780234  1394  D   R 17552200 + 8 [fio]
253,16   0   321689     2.044792840  1394  C   R 19736656 + 8 [0]
253,16   1   307512     2.044803867  1395  C   R 18088120 + 8 [0]
253,16   0   321690     2.044817910  1394  C   R 19495248 + 8 [0]
253,16   1   307513     2.044831267  1396  C   R 18613664 + 8 [0]
253,16   0   321691     2.044835246  1394  Q   R 2662776 + 8 [fio]
253,16   0   321692     2.044835878  1394  G   R 2662776 + 8 [fio]
253,16   0   321693     2.044836968  1394  P   N [fio]
253,16   0   321694     2.044838353  1394  U   N [fio] 1
253,16   0   321695     2.044839447  1394  D   R 2662776 + 8 [fio]
253,16   1   307514     2.044851246  1395  Q   R 8578768 + 8 [fio]
253,16   1   307515     2.044851868  1395  G   R 8578768 + 8 [fio]
253,16   1   307516     2.044853020  1395  P   N [fio]
253,16   0   321696     2.044853136  1394  Q   R 12783160 + 8 [fio]
253,16   0   321697     2.044853700  1394  G   R 12783160 + 8 [fio]
253,16   1   307517     2.044854185  1395  U   N [fio] 1
253,16   0   321698     2.044854759  1394  P   N [fio]
253,16   1   307518     2.044855289  1395  D   R 8578768 + 8 [fio]
253,16   0   321699     2.044855745  1394  U   N [fio] 1
253,16   0   321700     2.044856691  1394  D   R 12783160 + 8 [fio]
253,16   1   307519     2.044869744  1395  Q   R 1741184 + 8 [fio]
253,16   0   321701     2.044869754  1394  Q   R 11982944 + 8 [fio]
253,16   0   321702     2.044870388  1394  G   R 11982944 + 8 [fio]
253,16   1   307520     2.044870481  1395  G   R 1741184 + 8 [fio]
253,16   0   321703     2.044871564  1394  P   N [fio]
253,16   1   307521     2.044871650  1395  P   N [fio]
253,16   0   321704     2.044872615  1394  U   N [fio] 1
253,16   1   307522     2.044872687  1395  U   N [fio] 1
253,16   0   321705     2.044873938  1394  D   R 11982944 + 8 [fio]
253,16   0   321706     2.044875583  1394  D   R 1741184 + 8 [fio]
253,16   1   307523     2.044884491  1395  Q   R 11475600 + 8 [fio]
253,16   1   307524     2.044885101  1395  G   R 11475600 + 8 [fio]
253,16   1   307525     2.044886205  1395  P   N [fio]
253,16   0   321707     2.044887140  1394  Q   R 6125272 + 8 [fio]
253,16   1   307526     2.044887313  1395  U   N [fio] 1
253,16   0   321708     2.044887737  1394  G   R 6125272 + 8 [fio]
253,16   1   307527     2.044888330  1395  D   R 11475600 + 8 [fio]
253,16   0   321709     2.044888832  1394  P   N [fio]
253,16   0   321710     2.044889710  1394  U   N [fio] 1
253,16   0   321711     2.044890577  1394  D   R 6125272 + 8 [fio]
253,16   1   307528     2.044900606  1395  Q   R 634648 + 8 [fio]
253,16   1   307529     2.044901228  1395  G   R 634648 + 8 [fio]
253,16   0   321712     2.044901785  1394  Q   R 9187912 + 8 [fio]
253,16   1   307530     2.044902443  1395  P   N [fio]
253,16   0   321713     2.044902452  1394  G   R 9187912 + 8 [fio]
253,16   0   321714     2.044903495  1394  P   N [fio]
253,16   1   307531     2.044903547  1395  U   N [fio] 1
253,16   0   321715     2.044904448  1394  U   N [fio] 1
253,16   1   307532     2.044904580  1395  D   R 634648 + 8 [fio]
253,16   0   321716     2.044905342  1394  D   R 9187912 + 8 [fio]
253,16   1   307533     2.044917154  1395  Q   R 10607112 + 8 [fio]
253,16   1   307534     2.044917842  1395  G   R 10607112 + 8 [fio]
253,16   1   307535     2.044918920  1395  P   N [fio]
253,16   0   321717     2.044918974  1394  C   R 20819560 + 8 [0]
253,16   1   307536     2.044920027  1395  U   N [fio] 1
253,16   1   307537     2.044920973  1395  D   R 10607112 + 8 [fio]
253,16   1   307538     2.044932045  1395  C   R 18488736 + 8 [0]
253,16   0   321718     2.044938208  1394  Q   R 3527824 + 8 [fio]
253,16   0   321719     2.044938891  1394  G   R 3527824 + 8 [fio]
253,16   0   321720     2.044946221  1394  C   R 1054040 + 8 [0]
253,16   0   321721     2.044950233  1394  C   R 9042344 + 8 [0]
253,16   0   321722     2.044957832  1394  P   N [fio]
253,16   0   321723     2.044959392  1394  U   N [fio] 1
253,16   1   307539     2.044959886  1393  Q   R 18613872 + 8 [fio]
253,16   0   321724     2.044960611  1394  D   R 3527824 + 8 [fio]
253,16   1   307540     2.044960694  1393  G   R 18613872 + 8 [fio]
253,16   1   307541     2.044961852  1393  P   N [fio]
253,16   1   307542     2.044970724  1393  C   R 19504408 + 8 [0]
253,16   1   307543     2.044974744  1393  C   R 509080 + 8 [0]
253,16   0   321725     2.044980982  1394  Q   R 599736 + 8 [fio]
253,16   0   321726     2.044981594  1394  G   R 599736 + 8 [fio]
253,16   1   307544     2.044982536  1393  U   N [fio] 1
253,16   0   321727     2.044982690  1394  P   N [fio]
253,16   0   321728     2.044983622  1394  U   N [fio] 1
253,16   1   307545     2.044983687  1393  D   R 18613872 + 8 [fio]
253,16   0   321729     2.044986857  1394  D   R 599736 + 8 [fio]
253,16   1   307546     2.044992685  1396  Q   R 15873256 + 8 [fio]
253,16   0   321730     2.044993135  1394  C   R 9258392 + 8 [0]
253,16   1   307547     2.044993319  1396  G   R 15873256 + 8 [fio]
253,16   1   307548     2.044994455  1396  P   N [fio]
253,16   1   307549     2.044995683  1396  U   N [fio] 1
253,16   1   307550     2.044996660  1396  D   R 15873256 + 8 [fio]
253,16   0   321731     2.044996827  1394  C   R 12335824 + 8 [0]
253,16   1   307551     2.045015913  1396  C   R 10141512 + 8 [0]
253,16   1   307552     2.045019947  1396  C   R 2943808 + 8 [0]
253,16   0   321732     2.045023048  1394  Q   R 15878680 + 8 [fio]
253,16   0   321733     2.045023693  1394  G   R 15878680 + 8 [fio]
253,16   0   321734     2.045024976  1394  P   N [fio]
253,16   0   321735     2.045026157  1394  U   N [fio] 1
253,16   0   321736     2.045027185  1394  D   R 15878680 + 8 [fio]
253,16   0   321737     2.045039003  1394  C   R 12536088 + 8 [0]
253,16   0   321738     2.045043351  1394  C   R 12728832 + 8 [0]
253,16   1   307553     2.045047258  1393  Q   R 16429168 + 8 [fio]
253,16   1   307554     2.045047952  1393  G   R 16429168 + 8 [fio]
253,16   1   307555     2.045049191  1393  P   N [fio]
253,16   1   307556     2.045050400  1393  U   N [fio] 1
253,16   1   307557     2.045051596  1393  D   R 16429168 + 8 [fio]
253,16   1   307558     2.045059051  1393  C   R 19676536 + 8 [0]
253,16   1   307559     2.045063260  1393  C   R 3602624 + 8 [0]
253,16   0   321739     2.045067833  1394  Q   R 8871496 + 8 [fio]
253,16   0   321740     2.045068568  1394  G   R 8871496 + 8 [fio]
253,16   0   321741     2.045069712  1394  P   N [fio]
253,16   0   321742     2.045070860  1394  U   N [fio] 1
253,16   0   321743     2.045071828  1394  D   R 8871496 + 8 [fio]
253,16   1   307560     2.045081644  1396  Q   R 18234504 + 8 [fio]
253,16   1   307561     2.045082288  1396  G   R 18234504 + 8 [fio]
253,16   1   307562     2.045083331  1396  P   N [fio]
253,16   0   321744     2.045083945  1394  C   R 12931496 + 8 [0]
253,16   1   307563     2.045084583  1396  U   N [fio] 1
253,16   1   307564     2.045085578  1396  D   R 18234504 + 8 [fio]
253,16   0   321745     2.045087682  1394  C   R 8223176 + 8 [0]
253,16   1   307565     2.045107378  1396  C   R 3414184 + 8 [0]
253,16   0   321746     2.045109170  1394  Q   R 7639272 + 8 [fio]
253,16   0   321747     2.045109940  1394  G   R 7639272 + 8 [fio]
253,16   0   321748     2.045111122  1394  P   N [fio]
253,16   1   307566     2.045111541  1396  C   R 1703056 + 8 [0]
253,16   0   321749     2.045112293  1394  U   N [fio] 1
253,16   0   321750     2.045113409  1394  D   R 7639272 + 8 [fio]
253,16   0   321751     2.045129019  1394  C   R 4844912 + 8 [0]
253,16   0   321752     2.045133082  1394  C   R 14433456 + 8 [0]
253,16   1   307567     2.045138375  1393  Q   R 1554408 + 8 [fio]
253,16   1   307568     2.045139072  1393  G   R 1554408 + 8 [fio]
253,16   0   321753     2.045159160  1394  Q   R 3870352 + 8 [fio]
253,16   0   321754     2.045159766  1394  G   R 3870352 + 8 [fio]
253,16   0   321755     2.045160881  1394  P   N [fio]
253,16   0   321756     2.045162172  1394  U   N [fio] 1
253,16   0   321757     2.045163282  1394  D   R 3870352 + 8 [fio]
253,16   1   307569     2.045169743  1393  C   R 2935704 + 8 [0]
253,16   1   307570     2.045178563  1393  P   N [fio]
253,16   1   307571     2.045180295  1393  U   N [fio] 1
253,16   0   321758     2.045180768  1394  C   R 15646472 + 8 [0]
253,16   1   307572     2.045181451  1393  D   R 1554408 + 8 [fio]
253,16   0   321759     2.045184556  1394  C   R 14682176 + 8 [0]
253,16   0   321760     2.045187371  1394  C   R 13046784 + 8 [0]
253,16   0   321761     2.045196543  1394  Q   R 11609952 + 8 [fio]
253,16   0   321762     2.045197289  1394  G   R 11609952 + 8 [fio]
253,16   1   307573     2.045198166  1393  C   R 14489456 + 8 [0]
253,16   0   321763     2.045198329  1394  P   N [fio]
253,16   0   321764     2.045199520  1394  U   N [fio] 1
253,16   0   321765     2.045200544  1394  D   R 11609952 + 8 [fio]
253,16   0   321766     2.045208562  1394  C   R 6083512 + 8 [0]
253,16   1   307574     2.045219756  1395  C   R 3340928 + 8 [0]
253,16   0   321767     2.045232590  1394  C   R 6967144 + 8 [0]
253,16   1   307575     2.045243177  1396  C   R 15966080 + 8 [0]
253,16   1   307576     2.045250844  1396  Q   R 9139408 + 8 [fio]
253,16   1   307577     2.045251499  1396  G   R 9139408 + 8 [fio]
253,16   1   307578     2.045252527  1396  P   N [fio]
253,16   1   307579     2.045254025  1396  U   N [fio] 1
253,16   0   321768     2.045254048  1394  C   R 5681104 + 8 [0]
253,16   1   307580     2.045255169  1396  D   R 9139408 + 8 [fio]
253,16   0   321769     2.045262286  1394  Q   R 9056968 + 8 [fio]
253,16   0   321770     2.045262923  1394  G   R 9056968 + 8 [fio]
253,16   0   321771     2.045263851  1394  P   N [fio]
253,16   0   321772     2.045265016  1394  U   N [fio] 1
253,16   0   321773     2.045266087  1394  D   R 9056968 + 8 [fio]
253,16   1   307581     2.045267796    40  C   R 14598208 + 8 [0]
253,16   0   321774     2.045284312  1394  Q   R 3200080 + 8 [fio]
253,16   0   321775     2.045284867  1394  G   R 3200080 + 8 [fio]
253,16   0   321776     2.045285862  1394  P   N [fio]
253,16   1   307582     2.045287878  1396  Q   R 18113240 + 8 [fio]
253,16   1   307583     2.045288509  1396  G   R 18113240 + 8 [fio]
253,16   1   307584     2.045289508  1396  P   N [fio]
253,16   1   307585     2.045290565  1396  U   N [fio] 1
253,16   1   307586     2.045291629  1396  D   R 18113240 + 8 [fio]
253,16   0   321777     2.045294432  1394  C   R 4126280 + 8 [0]
253,16   0   321778     2.045302896  1394  U   N [fio] 1
253,16   0   321779     2.045303909  1394  D   R 3200080 + 8 [fio]
253,16   1   307587     2.045305145  1396  Q   R 12895512 + 8 [fio]
253,16   1   307588     2.045305766  1396  G   R 12895512 + 8 [fio]
253,16   1   307589     2.045306887  1396  P   N [fio]
253,16   1   307590     2.045308050  1396  U   N [fio] 1
253,16   1   307591     2.045309139  1396  D   R 12895512 + 8 [fio]
253,16   1   307592     2.045322292  1396  Q   R 14034472 + 8 [fio]
253,16   0   321780     2.045322301  1394  Q   R 796184 + 8 [fio]
253,16   0   321781     2.045322940  1394  G   R 796184 + 8 [fio]
253,16   1   307593     2.045322958  1396  G   R 14034472 + 8 [fio]
253,16   0   321782     2.045324193  1394  P   N [fio]
253,16   1   307594     2.045324199  1396  P   N [fio]
253,16   1   307595     2.045325295  1396  U   N [fio] 1
253,16   0   321783     2.045325314  1394  U   N [fio] 1
253,16   0   321784     2.045326501  1394  D   R 14034472 + 8 [fio]
253,16   1   307596     2.045326516  1396  D   R 796184 + 8 [fio]
253,16   1   307597     2.045339252  1396  Q   R 8827184 + 8 [fio]
253,16   0   321785     2.045339262  1394  Q   R 16108264 + 8 [fio]
253,16   0   321786     2.045339900  1394  G   R 16108264 + 8 [fio]
253,16   1   307598     2.045339902  1396  G   R 8827184 + 8 [fio]
253,16   1   307599     2.045341051  1396  P   N [fio]
253,16   0   321787     2.045341126  1394  P   N [fio]
253,16   1   307600     2.045342012  1396  U   N [fio] 1
253,16   0   321788     2.045342086  1394  U   N [fio] 1
253,16   0   321789     2.045343282  1394  D   R 16108264 + 8 [fio]
253,16   0   321790     2.045344979  1394  D   R 8827184 + 8 [fio]
253,16   1   307601     2.045353588  1396  Q   R 11285896 + 8 [fio]
253,16   1   307602     2.045354336  1396  G   R 11285896 + 8 [fio]
253,16   1   307603     2.045355348  1396  P   N [fio]
253,16   0   321791     2.045356120  1394  Q   R 8593080 + 8 [fio]
253,16   1   307604     2.045356402  1396  U   N [fio] 1
253,16   0   321792     2.045356714  1394  G   R 8593080 + 8 [fio]
253,16   1   307605     2.045357472  1396  D   R 11285896 + 8 [fio]
253,16   0   321793     2.045357789  1394  P   N [fio]
253,16   0   321794     2.045358691  1394  U   N [fio] 1
253,16   0   321795     2.045359550  1394  D   R 8593080 + 8 [fio]
253,16   1   307606     2.045369558  1396  Q   R 17509128 + 8 [fio]
253,16   1   307607     2.045370220  1396  G   R 17509128 + 8 [fio]
253,16   0   321796     2.045370807  1394  Q   R 7393376 + 8 [fio]
253,16   1   307608     2.045371249  1396  P   N [fio]
253,16   0   321797     2.045371405  1394  G   R 7393376 + 8 [fio]
253,16   1   307609     2.045372368  1396  U   N [fio] 1
253,16   0   321798     2.045372460  1394  P   N [fio]
253,16   1   307610     2.045373379  1396  D   R 17509128 + 8 [fio]
253,16   0   321799     2.045373443  1394  U   N [fio] 1
253,16   0   321800     2.045374334  1394  D   R 7393376 + 8 [fio]
253,16   1   307611     2.045385849  1396  Q   R 15227224 + 8 [fio]
253,16   0   321801     2.045385859  1394  Q   R 4025880 + 8 [fio]
253,16   0   321802     2.045386465  1394  G   R 4025880 + 8 [fio]
253,16   1   307612     2.045386490  1396  G   R 15227224 + 8 [fio]
253,16   0   321803     2.045387752  1394  P   N [fio]
253,16   1   307613     2.045387756  1396  P   N [fio]
253,16   1   307614     2.045388787  1396  U   N [fio] 1
253,16   0   321804     2.045388802  1394  U   N [fio] 1
253,16   0   321805     2.045390041  1394  D   R 4025880 + 8 [fio]
253,16   0   321806     2.045391662  1394  D   R 15227224 + 8 [fio]
253,16   1   307615     2.045400454  1396  Q   R 1524424 + 8 [fio]
253,16   1   307616     2.045401113  1396  G   R 1524424 + 8 [fio]
253,16   1   307617     2.045402135  1396  P   N [fio]
253,16   0   321807     2.045402877  1394  Q   R 20400040 + 8 [fio]
253,16   1   307618     2.045403201  1396  U   N [fio] 1
253,16   0   321808     2.045403474  1394  G   R 20400040 + 8 [fio]
253,16   1   307619     2.045404307  1396  D   R 1524424 + 8 [fio]
253,16   0   321809     2.045404603  1394  P   N [fio]
253,16   0   321810     2.045405503  1394  U   N [fio] 1
253,16   0   321811     2.045406368  1394  D   R 20400040 + 8 [fio]
253,16   1   307620     2.045416364  1396  Q   R 12912080 + 8 [fio]
253,16   1   307621     2.045417025  1396  G   R 12912080 + 8 [fio]
253,16   0   321812     2.045417239  1394  Q   R 14427880 + 8 [fio]
253,16   0   321813     2.045417755  1394  G   R 14427880 + 8 [fio]
253,16   1   307622     2.045418346  1396  P   N [fio]
253,16   0   321814     2.045418901  1394  P   N [fio]
253,16   1   307623     2.045419483  1396  U   N [fio] 1
253,16   0   321815     2.045419817  1394  U   N [fio] 1
253,16   1   307624     2.045420512  1396  D   R 12912080 + 8 [fio]
253,16   0   321816     2.045420817  1394  D   R 14427880 + 8 [fio]
253,16   0   321817     2.045433275  1394  Q   R 5098080 + 8 [fio]
253,16   1   307625     2.045433281  1396  Q   R 9204032 + 8 [fio]
253,16   1   307626     2.045433910  1396  G   R 9204032 + 8 [fio]
253,16   0   321818     2.045433939  1394  G   R 5098080 + 8 [fio]
253,16   1   307627     2.045435074  1396  P   N [fio]
253,16   0   321819     2.045435151  1394  P   N [fio]
253,16   1   307628     2.045436065  1396  U   N [fio] 1
253,16   0   321820     2.045436134  1394  U   N [fio] 1
253,16   1   307629     2.045437153  1396  D   R 5098080 + 8 [fio]
253,16   0   321821     2.045437189  1394  D   R 9204032 + 8 [fio]
253,16   0   321822     2.045449724  1394  Q   R 8743352 + 8 [fio]
253,16   1   307630     2.045449731  1396  Q   R 3957640 + 8 [fio]
253,16   1   307631     2.045450373  1396  G   R 3957640 + 8 [fio]
253,16   0   321823     2.045450385  1394  G   R 8743352 + 8 [fio]
253,16   1   307632     2.045451499  1396  P   N [fio]
253,16   0   321824     2.045451560  1394  P   N [fio]
253,16   1   307633     2.045452564  1396  U   N [fio] 1
253,16   0   321825     2.045452595  1394  U   N [fio] 1
253,16   1   307634     2.045453896  1396  D   R 8743352 + 8 [fio]
253,16   1   307635     2.045455483  1396  D   R 3957640 + 8 [fio]
253,16   0   321826     2.045464230  1394  Q   R 18141352 + 8 [fio]
253,16   0   321827     2.045464809  1394  G   R 18141352 + 8 [fio]
253,16   0   321828     2.045465808  1394  P   N [fio]
253,16   1   307636     2.045466884  1396  Q   R 14356248 + 8 [fio]
253,16   0   321829     2.045466925  1394  U   N [fio] 1
253,16   1   307637     2.045467445  1396  G   R 14356248 + 8 [fio]
253,16   0   321830     2.045467911  1394  D   R 18141352 + 8 [fio]
253,16   1   307638     2.045469917  1396  P   N [fio]
253,16   1   307639     2.045470841  1396  U   N [fio] 1
253,16   1   307640     2.045471757  1396  D   R 14356248 + 8 [fio]
253,16   0   321831     2.045480371  1394  Q   R 3678872 + 8 [fio]
253,16   0   321832     2.045480944  1394  G   R 3678872 + 8 [fio]
253,16   0   321833     2.045482626  1394  S   R 3678872 + 8 [fio]
253,16   1   307641     2.045483448  1396  Q   R 14080120 + 8 [fio]
253,16   1   307642     2.045483952  1396  G   R 14080120 + 8 [fio]
253,16   1   307643     2.045484932  1396  P   N [fio]
253,16   1   307644     2.045485963  1396  U   N [fio] 1
253,16   1   307645     2.045486944  1396  D   R 14080120 + 8 [fio]
253,16   1   307646     2.045500301  1396  Q   R 7780032 + 8 [fio]
253,16   1   307647     2.045500913  1396  G   R 7780032 + 8 [fio]
253,16   1   307648     2.045502001  1396  P   N [fio]
253,16   1   307649     2.045503033  1396  U   N [fio] 1
253,16   1   307650     2.045504023  1396  D   R 7780032 + 8 [fio]
253,16   1   307651     2.045514780  1396  Q   R 19258792 + 8 [fio]
253,16   1   307652     2.045515251  1396  G   R 19258792 + 8 [fio]
253,16   1   307653     2.045516023  1396  P   N [fio]
253,16   1   307654     2.045516601  1396  U   N [fio] 1
253,16   1   307655     2.045517220  1396  D   R 19258792 + 8 [fio]
253,16   1   307656     2.045524439  1396  Q   R 17379048 + 8 [fio]
253,16   1   307657     2.045524744  1396  G   R 17379048 + 8 [fio]
253,16   1   307658     2.045525291  1396  P   N [fio]
253,16   1   307659     2.045525815  1396  U   N [fio] 1
253,16   1   307660     2.045526389  1396  D   R 17379048 + 8 [fio]
253,16   1   307661     2.045533654  1396  Q   R 6701640 + 8 [fio]
253,16   1   307662     2.045533958  1396  G   R 6701640 + 8 [fio]
253,16   1   307663     2.045535034  1396  S   R 6701640 + 8 [fio]
253,16   1   307664     2.045545018  1395  Q   R 3973592 + 8 [fio]
253,16   1   307665     2.045545304  1395  G   R 3973592 + 8 [fio]
253,16   1   307666     2.045546132  1395  S   R 3973592 + 8 [fio]
253,16   1   307667     2.045574463  1393  Q   R 2334424 + 8 [fio]
253,16   1   307668     2.045574949  1393  G   R 2334424 + 8 [fio]
253,16   1   307669     2.045575980  1393  S   R 2334424 + 8 [fio]
CPU0 (vdb):
 Reads Queued:      52,778,  211,112KiB	 Writes Queued:           0,        0KiB
 Read Dispatches:   53,948,  215,792KiB	 Write Dispatches:        0,        0KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:   51,912,  207,648KiB	 Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:            64        	 Write depth:             0
 IO unplugs:        52,773        	 Timer unplugs:           0
CPU1 (vdb):
 Reads Queued:      50,593,  202,372KiB	 Writes Queued:           0,        0KiB
 Read Dispatches:   49,419,  197,676KiB	 Write Dispatches:        0,        0KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:   51,391,  205,564KiB	 Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:            64        	 Write depth:             0
 IO unplugs:        50,594        	 Timer unplugs:           0

Total (vdb):
 Reads Queued:     103,371,  413,484KiB	 Writes Queued:           0,        0KiB
 Read Dispatches:  103,367,  413,468KiB	 Write Dispatches:        0,        0KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:  103,303,  413,212KiB	 Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 IO unplugs:       103,367        	 Timer unplugs:           0

Throughput (R/W): 202,059KiB/s / 0KiB/s
Events (vdb): 629,502 entries
Skips: 0 forward (0 -   0.0%)

The last part of the trace is all 4 processes I run failing to allocate
a request and going to sleep. The last completion event is some 150
lines up, after various successful dispatch etc events.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-20  0:08                 ` Jens Axboe
@ 2013-11-20  1:44                   ` Shaohua Li
  2013-11-20  1:54                     ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Shaohua Li @ 2013-11-20  1:44 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Dave Chinner, linux-kernel

On Tue, Nov 19, 2013 at 05:08:58PM -0700, Jens Axboe wrote:
> On Tue, Nov 19 2013, Jens Axboe wrote:
> > On Wed, Nov 20 2013, Dave Chinner wrote:
> > > On Tue, Nov 19, 2013 at 03:51:27PM -0700, Jens Axboe wrote:
> > > > On 11/19/2013 03:42 PM, Jens Axboe wrote:
> > > > > On 11/19/2013 02:43 PM, Jens Axboe wrote:
> > > > >> On 11/19/2013 02:34 PM, Dave Chinner wrote:
> > > > >>> On Tue, Nov 19, 2013 at 02:20:42PM -0700, Jens Axboe wrote:
> > > > >>>> On Tue, Nov 19 2013, Jens Axboe wrote:
> > > > >>>>> On Tue, Nov 19 2013, Dave Chinner wrote:
> > > > >>>>>> Hi Jens,
> > > > >>>>>>
> > > > >>>>>> I was just running xfstests on a 3.13 kernel that has had the block
> > > > >>>>>> layer changed merged into it. generic/269 on XFS is hanging on a 2
> > > > >>>>>> CPU VM using virtio,cache=none for the block devices under test,
> > > > >>>>>> with many (130+) threads stuck below submit_bio() like this:
> > > > >>>>>>
> > > > >>>>>>  Call Trace:
> > > > >>>>>>   [<ffffffff81adb1c9>] schedule+0x29/0x70
> > > > >>>>>>   [<ffffffff817833ee>] percpu_ida_alloc+0x16e/0x330
> > > > >>>>>>   [<ffffffff81759bef>] blk_mq_wait_for_tags+0x1f/0x40
> > > > >>>>>>   [<ffffffff81758bee>] blk_mq_alloc_request_pinned+0x4e/0xf0
> > > > >>>>>>   [<ffffffff8175931b>] blk_mq_make_request+0x3bb/0x4a0
> > > > >>>>>>   [<ffffffff8174d2b2>] generic_make_request+0xc2/0x110
> > > > >>>>>>   [<ffffffff8174e40c>] submit_bio+0x6c/0x120
> > > > >>>>>>
> > > > >>>>>> reads and writes are hung, both data (direct and buffered) and
> > > > >>>>>> metadata.
> > > > >>>>>>
> > > > >>>>>> Some IOs are sitting in io_schedule, waiting for IO completion (both
> > > > >>>>>> buffered and direct IO, both reads and writes) so it looks like IO
> > > > >>>>>> completion has stalled in some manner, too.
> > > > >>>>>
> > > > >>>>> Can I get a recipe to reproduce this? I haven't had any luck so far.
> > > > >>>>
> > > > >>>> OK, I reproduced it. Looks weird, basically all 64 commands are in
> > > > >>>> flight, but haven't completed. So the next one that comes in just sits
> > > > >>>> there forever. I can't find any sysfs debug entries for virtio, would be
> > > > >>>> nice to inspect its queue as well...
> > > > >>>
> > > > >>> Does it have anything to do with the fact that the request queue
> > > > >>> depth is 128 entries and the tag pool only has 66 tags in it? i.e:
> > > > >>>
> > > > >>> /sys/block/vdb/queue/nr_requests
> > > > >>> 128
> > > > >>>
> > > > >>> /sys/block/vdb/mq/0/tags
> > > > >>> nr_tags=66, reserved_tags=2, batch_move=16, max_cache=32
> > > > >>> nr_free=0, nr_reserved=1
> > > > >>>   cpu00: nr_free=0
> > > > >>>   cpu01: nr_free=0
> > > > >>>
> > > > >>> Seems to imply that if we queue up more than 66 IOs without
> > > > >>> dispatching them, we'll run out of tags. And without another IO
> > > > >>> coming through, the "none" scheduler that virtio uses will never
> > > > >>> get a trigger to push out the currently queued IO?
> > > > >>
> > > > >> No, the nr_requests isn't actually relevant in the blk-mq context, the
> > > > >> driver sets its own depth. For the above, it's 64 normal commands, and 2
> > > > >> reserved. The reserved would be for a flush, for instance. If someone
> > > > >> attempts to queue more than the allocated number of requests, it'll stop
> > > > >> the blk-mq queue and kick things into gear on the virtio side. Then when
> > > > >> requests complete, we start the queue again.
> > > > >>
> > > > >> If you look at virtio_queue_rq(), that handles a single request. This
> > > > >> request is already tagged at this point. If we can't add it to the ring,
> > > > >> we simply stop the queue and kick off whatever pending we might have. We
> > > > >> return BLK_MQ_RQ_QUEUE_BUSY to blk-mq, which tells that to back off on
> > > > >> sending us more. When we get the virtblk_done() callback from virtio, we
> > > > >> end the requests on the blk-mq side and restart the queue.
> > > > > 
> > > > > I added some debug code to see if we had anything pending on the blk-mq
> > > > > side, and it's all empty. It really just looks like we are missing
> > > > > completions on the virtio side. Very odd.
> > > > 
> > > > Patching in the old rq path works however, so...
> > > 
> > > ... we've found a race condition ;)
> > > 
> > > FWIW, the thing that comes immediately to mind for me is that all
> > > the wakeups and queue kicking is done outside of any locks, using
> > > status gained from inside the lock context. Hence the mq stop/start
> > > and virt queue kicking can all race - perhaps that's resulting in a
> > > missed wakeup, restart or queue kick?
> > 
> > I _think_ it only happens when queue depth > device queue depth, but I
> > don't immediately see what is wrong. That logic is pretty much tried and
> > true. As a debug measure, I just ignored the stop bit, but it doesn't
> > change anything. BTW, that is pretty much the same as before.
> > 
> > Anyway, no real news yet, will debug later tonight or tomorrow morning
> > again.
> 
> BTW, ran a blktrace on the device, also looks like completion events
> just stop coming in:
> 
> [...]
> 253,16   0   321645     2.044497762  1394  P   N [fio]
> 253,16   0   321646     2.044498812  1394  U   N [fio] 1
> 253,16   0   321647     2.044499839  1394  D   R 5768832 + 8 [fio]
> 253,16   1   307476     2.044504353  1396  S   R 4412024 + 8 [fio]
> 253,16   1   307477     2.044505381  1396  P   N [fio]
> 253,16   1   307478     2.044506799  1396  U   N [fio] 1
> 253,16   1   307479     2.044507784  1396  D   R 4412024 + 8 [fio]
> 253,16   0   321648     2.044509571  1394  C   R 3828264 + 8 [0]
> 253,16   0   321649     2.044525980  1394  Q   R 13757184 + 8 [fio]
> 253,16   0   321650     2.044526706  1394  G   R 13757184 + 8 [fio]
> 253,16   1   307480     2.044527670  1396  C   R 16292472 + 8 [0]
> 253,16   0   321651     2.044527830  1394  P   N [fio]
> 253,16   0   321652     2.044528889  1394  U   N [fio] 1
> 253,16   0   321653     2.044530006  1394  D   R 13757184 + 8 [fio]
> 253,16   0   321654     2.044539915  1394  C   R 7557264 + 8 [0]
> 253,16   0   321655     2.044544008  1394  C   R 4320544 + 8 [0]
> 253,16   1   307481     2.044574922  1396  C   R 15264800 + 8 [0]
> 253,16   1   307482     2.044579465  1396  C   R 15595656 + 8 [0]
> 253,16   0   321656     2.044591661  1394  C   R 8071352 + 8 [0]
> 253,16   1   307483     2.044595664  1395  P   N [fio]
> 253,16   1   307484     2.044597335  1395  U   N [fio] 1
> 253,16   1   307485     2.044605257  1395  C   R 15726296 + 8 [0]
> 253,16   0   321657     2.044607859  1394  Q   R 17562168 + 8 [fio]
> 253,16   0   321658     2.044608531  1394  G   R 17562168 + 8 [fio]
> 253,16   0   321659     2.044609756  1394  P   N [fio]
> 253,16   0   321660     2.044611086  1394  U   N [fio] 1
> 253,16   0   321661     2.044612275  1394  D   R 17562168 + 8 [fio]
> 253,16   1   307486     2.044614316  1395  D   R 4664552 + 8 [fio]
> 253,16   0   321662     2.044621707  1394  C   R 18403168 + 8 [0]
> 253,16   0   321663     2.044626241  1394  C   R 6357824 + 8 [0]
> 253,16   1   307487     2.044637446  1393  Q   R 2061240 + 8 [fio]
> 253,16   1   307488     2.044638263  1393  G   R 2061240 + 8 [fio]
> 253,16   1   307489     2.044645914  1393  C   R 6364168 + 8 [0]
> 253,16   0   321664     2.044650488  1394  Q   R 20380488 + 8 [fio]
> 253,16   0   321665     2.044651199  1394  G   R 20380488 + 8 [fio]
> 253,16   1   307490     2.044654060  1393  P   N [fio]
> 253,16   1   307491     2.044655466  1393  U   N [fio] 1
> 253,16   1   307492     2.044656734  1393  D   R 2061240 + 8 [fio]
> 253,16   0   321666     2.044659910  1394  C   R 16579544 + 8 [0]
> 253,16   0   321667     2.044663679  1394  C   R 14878944 + 8 [0]
> 253,16   0   321668     2.044671537  1394  P   N [fio]
> 253,16   0   321669     2.044673120  1394  U   N [fio] 1
> 253,16   0   321670     2.044674203  1394  D   R 20380488 + 8 [fio]
> 253,16   1   307493     2.044678596  1395  Q   R 15126872 + 8 [fio]
> 253,16   1   307494     2.044679280  1395  G   R 15126872 + 8 [fio]
> 253,16   1   307495     2.044682674  1395  P   N [fio]
> 253,16   1   307496     2.044689057  1395  C   R 6251136 + 8 [0]
> 253,16   1   307497     2.044697970  1395  U   N [fio] 1
> 253,16   1   307498     2.044699073  1395  D   R 15126872 + 8 [fio]
> 253,16   0   321671     2.044702366  1394  C   R 6028184 + 8 [0]
> 253,16   0   321672     2.044710344  1394  Q   R 8698464 + 8 [fio]
> 253,16   0   321673     2.044711072  1394  G   R 8698464 + 8 [fio]
> 253,16   0   321674     2.044712179  1394  P   N [fio]
> 253,16   0   321675     2.044713626  1394  U   N [fio] 1
> 253,16   1   307499     2.044713961  1393  C   R 20851408 + 8 [0]
> 253,16   0   321676     2.044714693  1394  D   R 8698464 + 8 [fio]
> 253,16   0   321677     2.044724918  1394  C   R 16890840 + 8 [0]
> 253,16   1   307500     2.044733628  1393  Q   R 20289656 + 8 [fio]
> 253,16   1   307501     2.044739576  1393  C   R 6570816 + 8 [0]
> 253,16   1   307502     2.044747620  1393  G   R 20289656 + 8 [fio]
> 253,16   1   307503     2.044748710  1393  P   N [fio]
> 253,16   1   307504     2.044750128  1393  U   N [fio] 1
> 253,16   0   321678     2.044750392  1394  C   R 6513968 + 8 [0]
> 253,16   1   307505     2.044751307  1393  D   R 20289656 + 8 [fio]
> 253,16   0   321679     2.044758314  1394  Q   R 5781280 + 8 [fio]
> 253,16   0   321680     2.044758861  1394  G   R 5781280 + 8 [fio]
> 253,16   0   321681     2.044759973  1394  P   N [fio]
> 253,16   0   321682     2.044761183  1394  U   N [fio] 1
> 253,16   1   307506     2.044761883  1396  Q   R 7477904 + 8 [fio]
> 253,16   0   321683     2.044762348  1394  D   R 5781280 + 8 [fio]
> 253,16   1   307507     2.044762462  1396  G   R 7477904 + 8 [fio]
> 253,16   1   307508     2.044763486  1396  P   N [fio]
> 253,16   1   307509     2.044764604  1396  U   N [fio] 1
> 253,16   1   307510     2.044765426  1396  D   R 7477904 + 8 [fio]
> 253,16   0   321684     2.044776281  1394  Q   R 17552200 + 8 [fio]
> 253,16   0   321685     2.044776867  1394  G   R 17552200 + 8 [fio]
> 253,16   0   321686     2.044777975  1394  P   N [fio]
> 253,16   1   307511     2.044778944  1396  C   R 16910272 + 8 [0]
> 253,16   0   321687     2.044779060  1394  U   N [fio] 1
> 253,16   0   321688     2.044780234  1394  D   R 17552200 + 8 [fio]
> 253,16   0   321689     2.044792840  1394  C   R 19736656 + 8 [0]
> 253,16   1   307512     2.044803867  1395  C   R 18088120 + 8 [0]
> 253,16   0   321690     2.044817910  1394  C   R 19495248 + 8 [0]
> 253,16   1   307513     2.044831267  1396  C   R 18613664 + 8 [0]
> 253,16   0   321691     2.044835246  1394  Q   R 2662776 + 8 [fio]
> 253,16   0   321692     2.044835878  1394  G   R 2662776 + 8 [fio]
> 253,16   0   321693     2.044836968  1394  P   N [fio]
> 253,16   0   321694     2.044838353  1394  U   N [fio] 1
> 253,16   0   321695     2.044839447  1394  D   R 2662776 + 8 [fio]
> 253,16   1   307514     2.044851246  1395  Q   R 8578768 + 8 [fio]
> 253,16   1   307515     2.044851868  1395  G   R 8578768 + 8 [fio]
> 253,16   1   307516     2.044853020  1395  P   N [fio]
> 253,16   0   321696     2.044853136  1394  Q   R 12783160 + 8 [fio]
> 253,16   0   321697     2.044853700  1394  G   R 12783160 + 8 [fio]
> 253,16   1   307517     2.044854185  1395  U   N [fio] 1
> 253,16   0   321698     2.044854759  1394  P   N [fio]
> 253,16   1   307518     2.044855289  1395  D   R 8578768 + 8 [fio]
> 253,16   0   321699     2.044855745  1394  U   N [fio] 1
> 253,16   0   321700     2.044856691  1394  D   R 12783160 + 8 [fio]
> 253,16   1   307519     2.044869744  1395  Q   R 1741184 + 8 [fio]
> 253,16   0   321701     2.044869754  1394  Q   R 11982944 + 8 [fio]
> 253,16   0   321702     2.044870388  1394  G   R 11982944 + 8 [fio]
> 253,16   1   307520     2.044870481  1395  G   R 1741184 + 8 [fio]
> 253,16   0   321703     2.044871564  1394  P   N [fio]
> 253,16   1   307521     2.044871650  1395  P   N [fio]
> 253,16   0   321704     2.044872615  1394  U   N [fio] 1
> 253,16   1   307522     2.044872687  1395  U   N [fio] 1
> 253,16   0   321705     2.044873938  1394  D   R 11982944 + 8 [fio]
> 253,16   0   321706     2.044875583  1394  D   R 1741184 + 8 [fio]
> 253,16   1   307523     2.044884491  1395  Q   R 11475600 + 8 [fio]
> 253,16   1   307524     2.044885101  1395  G   R 11475600 + 8 [fio]
> 253,16   1   307525     2.044886205  1395  P   N [fio]
> 253,16   0   321707     2.044887140  1394  Q   R 6125272 + 8 [fio]
> 253,16   1   307526     2.044887313  1395  U   N [fio] 1
> 253,16   0   321708     2.044887737  1394  G   R 6125272 + 8 [fio]
> 253,16   1   307527     2.044888330  1395  D   R 11475600 + 8 [fio]
> 253,16   0   321709     2.044888832  1394  P   N [fio]
> 253,16   0   321710     2.044889710  1394  U   N [fio] 1
> 253,16   0   321711     2.044890577  1394  D   R 6125272 + 8 [fio]
> 253,16   1   307528     2.044900606  1395  Q   R 634648 + 8 [fio]
> 253,16   1   307529     2.044901228  1395  G   R 634648 + 8 [fio]
> 253,16   0   321712     2.044901785  1394  Q   R 9187912 + 8 [fio]
> 253,16   1   307530     2.044902443  1395  P   N [fio]
> 253,16   0   321713     2.044902452  1394  G   R 9187912 + 8 [fio]
> 253,16   0   321714     2.044903495  1394  P   N [fio]
> 253,16   1   307531     2.044903547  1395  U   N [fio] 1
> 253,16   0   321715     2.044904448  1394  U   N [fio] 1
> 253,16   1   307532     2.044904580  1395  D   R 634648 + 8 [fio]
> 253,16   0   321716     2.044905342  1394  D   R 9187912 + 8 [fio]
> 253,16   1   307533     2.044917154  1395  Q   R 10607112 + 8 [fio]
> 253,16   1   307534     2.044917842  1395  G   R 10607112 + 8 [fio]
> 253,16   1   307535     2.044918920  1395  P   N [fio]
> 253,16   0   321717     2.044918974  1394  C   R 20819560 + 8 [0]
> 253,16   1   307536     2.044920027  1395  U   N [fio] 1
> 253,16   1   307537     2.044920973  1395  D   R 10607112 + 8 [fio]
> 253,16   1   307538     2.044932045  1395  C   R 18488736 + 8 [0]
> 253,16   0   321718     2.044938208  1394  Q   R 3527824 + 8 [fio]
> 253,16   0   321719     2.044938891  1394  G   R 3527824 + 8 [fio]
> 253,16   0   321720     2.044946221  1394  C   R 1054040 + 8 [0]
> 253,16   0   321721     2.044950233  1394  C   R 9042344 + 8 [0]
> 253,16   0   321722     2.044957832  1394  P   N [fio]
> 253,16   0   321723     2.044959392  1394  U   N [fio] 1
> 253,16   1   307539     2.044959886  1393  Q   R 18613872 + 8 [fio]
> 253,16   0   321724     2.044960611  1394  D   R 3527824 + 8 [fio]
> 253,16   1   307540     2.044960694  1393  G   R 18613872 + 8 [fio]
> 253,16   1   307541     2.044961852  1393  P   N [fio]
> 253,16   1   307542     2.044970724  1393  C   R 19504408 + 8 [0]
> 253,16   1   307543     2.044974744  1393  C   R 509080 + 8 [0]
> 253,16   0   321725     2.044980982  1394  Q   R 599736 + 8 [fio]
> 253,16   0   321726     2.044981594  1394  G   R 599736 + 8 [fio]
> 253,16   1   307544     2.044982536  1393  U   N [fio] 1
> 253,16   0   321727     2.044982690  1394  P   N [fio]
> 253,16   0   321728     2.044983622  1394  U   N [fio] 1
> 253,16   1   307545     2.044983687  1393  D   R 18613872 + 8 [fio]
> 253,16   0   321729     2.044986857  1394  D   R 599736 + 8 [fio]
> 253,16   1   307546     2.044992685  1396  Q   R 15873256 + 8 [fio]
> 253,16   0   321730     2.044993135  1394  C   R 9258392 + 8 [0]
> 253,16   1   307547     2.044993319  1396  G   R 15873256 + 8 [fio]
> 253,16   1   307548     2.044994455  1396  P   N [fio]
> 253,16   1   307549     2.044995683  1396  U   N [fio] 1
> 253,16   1   307550     2.044996660  1396  D   R 15873256 + 8 [fio]
> 253,16   0   321731     2.044996827  1394  C   R 12335824 + 8 [0]
> 253,16   1   307551     2.045015913  1396  C   R 10141512 + 8 [0]
> 253,16   1   307552     2.045019947  1396  C   R 2943808 + 8 [0]
> 253,16   0   321732     2.045023048  1394  Q   R 15878680 + 8 [fio]
> 253,16   0   321733     2.045023693  1394  G   R 15878680 + 8 [fio]
> 253,16   0   321734     2.045024976  1394  P   N [fio]
> 253,16   0   321735     2.045026157  1394  U   N [fio] 1
> 253,16   0   321736     2.045027185  1394  D   R 15878680 + 8 [fio]
> 253,16   0   321737     2.045039003  1394  C   R 12536088 + 8 [0]
> 253,16   0   321738     2.045043351  1394  C   R 12728832 + 8 [0]
> 253,16   1   307553     2.045047258  1393  Q   R 16429168 + 8 [fio]
> 253,16   1   307554     2.045047952  1393  G   R 16429168 + 8 [fio]
> 253,16   1   307555     2.045049191  1393  P   N [fio]
> 253,16   1   307556     2.045050400  1393  U   N [fio] 1
> 253,16   1   307557     2.045051596  1393  D   R 16429168 + 8 [fio]
> 253,16   1   307558     2.045059051  1393  C   R 19676536 + 8 [0]
> 253,16   1   307559     2.045063260  1393  C   R 3602624 + 8 [0]
> 253,16   0   321739     2.045067833  1394  Q   R 8871496 + 8 [fio]
> 253,16   0   321740     2.045068568  1394  G   R 8871496 + 8 [fio]
> 253,16   0   321741     2.045069712  1394  P   N [fio]
> 253,16   0   321742     2.045070860  1394  U   N [fio] 1
> 253,16   0   321743     2.045071828  1394  D   R 8871496 + 8 [fio]
> 253,16   1   307560     2.045081644  1396  Q   R 18234504 + 8 [fio]
> 253,16   1   307561     2.045082288  1396  G   R 18234504 + 8 [fio]
> 253,16   1   307562     2.045083331  1396  P   N [fio]
> 253,16   0   321744     2.045083945  1394  C   R 12931496 + 8 [0]
> 253,16   1   307563     2.045084583  1396  U   N [fio] 1
> 253,16   1   307564     2.045085578  1396  D   R 18234504 + 8 [fio]
> 253,16   0   321745     2.045087682  1394  C   R 8223176 + 8 [0]
> 253,16   1   307565     2.045107378  1396  C   R 3414184 + 8 [0]
> 253,16   0   321746     2.045109170  1394  Q   R 7639272 + 8 [fio]
> 253,16   0   321747     2.045109940  1394  G   R 7639272 + 8 [fio]
> 253,16   0   321748     2.045111122  1394  P   N [fio]
> 253,16   1   307566     2.045111541  1396  C   R 1703056 + 8 [0]
> 253,16   0   321749     2.045112293  1394  U   N [fio] 1
> 253,16   0   321750     2.045113409  1394  D   R 7639272 + 8 [fio]
> 253,16   0   321751     2.045129019  1394  C   R 4844912 + 8 [0]
> 253,16   0   321752     2.045133082  1394  C   R 14433456 + 8 [0]
> 253,16   1   307567     2.045138375  1393  Q   R 1554408 + 8 [fio]
> 253,16   1   307568     2.045139072  1393  G   R 1554408 + 8 [fio]
> 253,16   0   321753     2.045159160  1394  Q   R 3870352 + 8 [fio]
> 253,16   0   321754     2.045159766  1394  G   R 3870352 + 8 [fio]
> 253,16   0   321755     2.045160881  1394  P   N [fio]
> 253,16   0   321756     2.045162172  1394  U   N [fio] 1
> 253,16   0   321757     2.045163282  1394  D   R 3870352 + 8 [fio]
> 253,16   1   307569     2.045169743  1393  C   R 2935704 + 8 [0]
> 253,16   1   307570     2.045178563  1393  P   N [fio]
> 253,16   1   307571     2.045180295  1393  U   N [fio] 1
> 253,16   0   321758     2.045180768  1394  C   R 15646472 + 8 [0]
> 253,16   1   307572     2.045181451  1393  D   R 1554408 + 8 [fio]
> 253,16   0   321759     2.045184556  1394  C   R 14682176 + 8 [0]
> 253,16   0   321760     2.045187371  1394  C   R 13046784 + 8 [0]
> 253,16   0   321761     2.045196543  1394  Q   R 11609952 + 8 [fio]
> 253,16   0   321762     2.045197289  1394  G   R 11609952 + 8 [fio]
> 253,16   1   307573     2.045198166  1393  C   R 14489456 + 8 [0]
> 253,16   0   321763     2.045198329  1394  P   N [fio]
> 253,16   0   321764     2.045199520  1394  U   N [fio] 1
> 253,16   0   321765     2.045200544  1394  D   R 11609952 + 8 [fio]
> 253,16   0   321766     2.045208562  1394  C   R 6083512 + 8 [0]
> 253,16   1   307574     2.045219756  1395  C   R 3340928 + 8 [0]
> 253,16   0   321767     2.045232590  1394  C   R 6967144 + 8 [0]
> 253,16   1   307575     2.045243177  1396  C   R 15966080 + 8 [0]
> 253,16   1   307576     2.045250844  1396  Q   R 9139408 + 8 [fio]
> 253,16   1   307577     2.045251499  1396  G   R 9139408 + 8 [fio]
> 253,16   1   307578     2.045252527  1396  P   N [fio]
> 253,16   1   307579     2.045254025  1396  U   N [fio] 1
> 253,16   0   321768     2.045254048  1394  C   R 5681104 + 8 [0]
> 253,16   1   307580     2.045255169  1396  D   R 9139408 + 8 [fio]
> 253,16   0   321769     2.045262286  1394  Q   R 9056968 + 8 [fio]
> 253,16   0   321770     2.045262923  1394  G   R 9056968 + 8 [fio]
> 253,16   0   321771     2.045263851  1394  P   N [fio]
> 253,16   0   321772     2.045265016  1394  U   N [fio] 1
> 253,16   0   321773     2.045266087  1394  D   R 9056968 + 8 [fio]
> 253,16   1   307581     2.045267796    40  C   R 14598208 + 8 [0]
> 253,16   0   321774     2.045284312  1394  Q   R 3200080 + 8 [fio]
> 253,16   0   321775     2.045284867  1394  G   R 3200080 + 8 [fio]
> 253,16   0   321776     2.045285862  1394  P   N [fio]
> 253,16   1   307582     2.045287878  1396  Q   R 18113240 + 8 [fio]
> 253,16   1   307583     2.045288509  1396  G   R 18113240 + 8 [fio]
> 253,16   1   307584     2.045289508  1396  P   N [fio]
> 253,16   1   307585     2.045290565  1396  U   N [fio] 1
> 253,16   1   307586     2.045291629  1396  D   R 18113240 + 8 [fio]
> 253,16   0   321777     2.045294432  1394  C   R 4126280 + 8 [0]
> 253,16   0   321778     2.045302896  1394  U   N [fio] 1
> 253,16   0   321779     2.045303909  1394  D   R 3200080 + 8 [fio]
> 253,16   1   307587     2.045305145  1396  Q   R 12895512 + 8 [fio]
> 253,16   1   307588     2.045305766  1396  G   R 12895512 + 8 [fio]
> 253,16   1   307589     2.045306887  1396  P   N [fio]
> 253,16   1   307590     2.045308050  1396  U   N [fio] 1
> 253,16   1   307591     2.045309139  1396  D   R 12895512 + 8 [fio]
> 253,16   1   307592     2.045322292  1396  Q   R 14034472 + 8 [fio]
> 253,16   0   321780     2.045322301  1394  Q   R 796184 + 8 [fio]
> 253,16   0   321781     2.045322940  1394  G   R 796184 + 8 [fio]
> 253,16   1   307593     2.045322958  1396  G   R 14034472 + 8 [fio]
> 253,16   0   321782     2.045324193  1394  P   N [fio]
> 253,16   1   307594     2.045324199  1396  P   N [fio]
> 253,16   1   307595     2.045325295  1396  U   N [fio] 1
> 253,16   0   321783     2.045325314  1394  U   N [fio] 1
> 253,16   0   321784     2.045326501  1394  D   R 14034472 + 8 [fio]
> 253,16   1   307596     2.045326516  1396  D   R 796184 + 8 [fio]
> 253,16   1   307597     2.045339252  1396  Q   R 8827184 + 8 [fio]
> 253,16   0   321785     2.045339262  1394  Q   R 16108264 + 8 [fio]
> 253,16   0   321786     2.045339900  1394  G   R 16108264 + 8 [fio]
> 253,16   1   307598     2.045339902  1396  G   R 8827184 + 8 [fio]
> 253,16   1   307599     2.045341051  1396  P   N [fio]
> 253,16   0   321787     2.045341126  1394  P   N [fio]
> 253,16   1   307600     2.045342012  1396  U   N [fio] 1
> 253,16   0   321788     2.045342086  1394  U   N [fio] 1
> 253,16   0   321789     2.045343282  1394  D   R 16108264 + 8 [fio]
> 253,16   0   321790     2.045344979  1394  D   R 8827184 + 8 [fio]
> 253,16   1   307601     2.045353588  1396  Q   R 11285896 + 8 [fio]
> 253,16   1   307602     2.045354336  1396  G   R 11285896 + 8 [fio]
> 253,16   1   307603     2.045355348  1396  P   N [fio]
> 253,16   0   321791     2.045356120  1394  Q   R 8593080 + 8 [fio]
> 253,16   1   307604     2.045356402  1396  U   N [fio] 1
> 253,16   0   321792     2.045356714  1394  G   R 8593080 + 8 [fio]
> 253,16   1   307605     2.045357472  1396  D   R 11285896 + 8 [fio]
> 253,16   0   321793     2.045357789  1394  P   N [fio]
> 253,16   0   321794     2.045358691  1394  U   N [fio] 1
> 253,16   0   321795     2.045359550  1394  D   R 8593080 + 8 [fio]
> 253,16   1   307606     2.045369558  1396  Q   R 17509128 + 8 [fio]
> 253,16   1   307607     2.045370220  1396  G   R 17509128 + 8 [fio]
> 253,16   0   321796     2.045370807  1394  Q   R 7393376 + 8 [fio]
> 253,16   1   307608     2.045371249  1396  P   N [fio]
> 253,16   0   321797     2.045371405  1394  G   R 7393376 + 8 [fio]
> 253,16   1   307609     2.045372368  1396  U   N [fio] 1
> 253,16   0   321798     2.045372460  1394  P   N [fio]
> 253,16   1   307610     2.045373379  1396  D   R 17509128 + 8 [fio]
> 253,16   0   321799     2.045373443  1394  U   N [fio] 1
> 253,16   0   321800     2.045374334  1394  D   R 7393376 + 8 [fio]
> 253,16   1   307611     2.045385849  1396  Q   R 15227224 + 8 [fio]
> 253,16   0   321801     2.045385859  1394  Q   R 4025880 + 8 [fio]
> 253,16   0   321802     2.045386465  1394  G   R 4025880 + 8 [fio]
> 253,16   1   307612     2.045386490  1396  G   R 15227224 + 8 [fio]
> 253,16   0   321803     2.045387752  1394  P   N [fio]
> 253,16   1   307613     2.045387756  1396  P   N [fio]
> 253,16   1   307614     2.045388787  1396  U   N [fio] 1
> 253,16   0   321804     2.045388802  1394  U   N [fio] 1
> 253,16   0   321805     2.045390041  1394  D   R 4025880 + 8 [fio]
> 253,16   0   321806     2.045391662  1394  D   R 15227224 + 8 [fio]
> 253,16   1   307615     2.045400454  1396  Q   R 1524424 + 8 [fio]
> 253,16   1   307616     2.045401113  1396  G   R 1524424 + 8 [fio]
> 253,16   1   307617     2.045402135  1396  P   N [fio]
> 253,16   0   321807     2.045402877  1394  Q   R 20400040 + 8 [fio]
> 253,16   1   307618     2.045403201  1396  U   N [fio] 1
> 253,16   0   321808     2.045403474  1394  G   R 20400040 + 8 [fio]
> 253,16   1   307619     2.045404307  1396  D   R 1524424 + 8 [fio]
> 253,16   0   321809     2.045404603  1394  P   N [fio]
> 253,16   0   321810     2.045405503  1394  U   N [fio] 1
> 253,16   0   321811     2.045406368  1394  D   R 20400040 + 8 [fio]
> 253,16   1   307620     2.045416364  1396  Q   R 12912080 + 8 [fio]
> 253,16   1   307621     2.045417025  1396  G   R 12912080 + 8 [fio]
> 253,16   0   321812     2.045417239  1394  Q   R 14427880 + 8 [fio]
> 253,16   0   321813     2.045417755  1394  G   R 14427880 + 8 [fio]
> 253,16   1   307622     2.045418346  1396  P   N [fio]
> 253,16   0   321814     2.045418901  1394  P   N [fio]
> 253,16   1   307623     2.045419483  1396  U   N [fio] 1
> 253,16   0   321815     2.045419817  1394  U   N [fio] 1
> 253,16   1   307624     2.045420512  1396  D   R 12912080 + 8 [fio]
> 253,16   0   321816     2.045420817  1394  D   R 14427880 + 8 [fio]
> 253,16   0   321817     2.045433275  1394  Q   R 5098080 + 8 [fio]
> 253,16   1   307625     2.045433281  1396  Q   R 9204032 + 8 [fio]
> 253,16   1   307626     2.045433910  1396  G   R 9204032 + 8 [fio]
> 253,16   0   321818     2.045433939  1394  G   R 5098080 + 8 [fio]
> 253,16   1   307627     2.045435074  1396  P   N [fio]
> 253,16   0   321819     2.045435151  1394  P   N [fio]
> 253,16   1   307628     2.045436065  1396  U   N [fio] 1
> 253,16   0   321820     2.045436134  1394  U   N [fio] 1
> 253,16   1   307629     2.045437153  1396  D   R 5098080 + 8 [fio]
> 253,16   0   321821     2.045437189  1394  D   R 9204032 + 8 [fio]
> 253,16   0   321822     2.045449724  1394  Q   R 8743352 + 8 [fio]
> 253,16   1   307630     2.045449731  1396  Q   R 3957640 + 8 [fio]
> 253,16   1   307631     2.045450373  1396  G   R 3957640 + 8 [fio]
> 253,16   0   321823     2.045450385  1394  G   R 8743352 + 8 [fio]
> 253,16   1   307632     2.045451499  1396  P   N [fio]
> 253,16   0   321824     2.045451560  1394  P   N [fio]
> 253,16   1   307633     2.045452564  1396  U   N [fio] 1
> 253,16   0   321825     2.045452595  1394  U   N [fio] 1
> 253,16   1   307634     2.045453896  1396  D   R 8743352 + 8 [fio]
> 253,16   1   307635     2.045455483  1396  D   R 3957640 + 8 [fio]
> 253,16   0   321826     2.045464230  1394  Q   R 18141352 + 8 [fio]
> 253,16   0   321827     2.045464809  1394  G   R 18141352 + 8 [fio]
> 253,16   0   321828     2.045465808  1394  P   N [fio]
> 253,16   1   307636     2.045466884  1396  Q   R 14356248 + 8 [fio]
> 253,16   0   321829     2.045466925  1394  U   N [fio] 1
> 253,16   1   307637     2.045467445  1396  G   R 14356248 + 8 [fio]
> 253,16   0   321830     2.045467911  1394  D   R 18141352 + 8 [fio]
> 253,16   1   307638     2.045469917  1396  P   N [fio]
> 253,16   1   307639     2.045470841  1396  U   N [fio] 1
> 253,16   1   307640     2.045471757  1396  D   R 14356248 + 8 [fio]
> 253,16   0   321831     2.045480371  1394  Q   R 3678872 + 8 [fio]
> 253,16   0   321832     2.045480944  1394  G   R 3678872 + 8 [fio]
> 253,16   0   321833     2.045482626  1394  S   R 3678872 + 8 [fio]
> 253,16   1   307641     2.045483448  1396  Q   R 14080120 + 8 [fio]
> 253,16   1   307642     2.045483952  1396  G   R 14080120 + 8 [fio]
> 253,16   1   307643     2.045484932  1396  P   N [fio]
> 253,16   1   307644     2.045485963  1396  U   N [fio] 1
> 253,16   1   307645     2.045486944  1396  D   R 14080120 + 8 [fio]
> 253,16   1   307646     2.045500301  1396  Q   R 7780032 + 8 [fio]
> 253,16   1   307647     2.045500913  1396  G   R 7780032 + 8 [fio]
> 253,16   1   307648     2.045502001  1396  P   N [fio]
> 253,16   1   307649     2.045503033  1396  U   N [fio] 1
> 253,16   1   307650     2.045504023  1396  D   R 7780032 + 8 [fio]
> 253,16   1   307651     2.045514780  1396  Q   R 19258792 + 8 [fio]
> 253,16   1   307652     2.045515251  1396  G   R 19258792 + 8 [fio]
> 253,16   1   307653     2.045516023  1396  P   N [fio]
> 253,16   1   307654     2.045516601  1396  U   N [fio] 1
> 253,16   1   307655     2.045517220  1396  D   R 19258792 + 8 [fio]
> 253,16   1   307656     2.045524439  1396  Q   R 17379048 + 8 [fio]
> 253,16   1   307657     2.045524744  1396  G   R 17379048 + 8 [fio]
> 253,16   1   307658     2.045525291  1396  P   N [fio]
> 253,16   1   307659     2.045525815  1396  U   N [fio] 1
> 253,16   1   307660     2.045526389  1396  D   R 17379048 + 8 [fio]
> 253,16   1   307661     2.045533654  1396  Q   R 6701640 + 8 [fio]
> 253,16   1   307662     2.045533958  1396  G   R 6701640 + 8 [fio]
> 253,16   1   307663     2.045535034  1396  S   R 6701640 + 8 [fio]
> 253,16   1   307664     2.045545018  1395  Q   R 3973592 + 8 [fio]
> 253,16   1   307665     2.045545304  1395  G   R 3973592 + 8 [fio]
> 253,16   1   307666     2.045546132  1395  S   R 3973592 + 8 [fio]
> 253,16   1   307667     2.045574463  1393  Q   R 2334424 + 8 [fio]
> 253,16   1   307668     2.045574949  1393  G   R 2334424 + 8 [fio]
> 253,16   1   307669     2.045575980  1393  S   R 2334424 + 8 [fio]
> CPU0 (vdb):
>  Reads Queued:      52,778,  211,112KiB	 Writes Queued:           0,        0KiB
>  Read Dispatches:   53,948,  215,792KiB	 Write Dispatches:        0,        0KiB
>  Reads Requeued:         0		 Writes Requeued:         0
>  Reads Completed:   51,912,  207,648KiB	 Writes Completed:        0,        0KiB
>  Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
>  Read depth:            64        	 Write depth:             0
>  IO unplugs:        52,773        	 Timer unplugs:           0
> CPU1 (vdb):
>  Reads Queued:      50,593,  202,372KiB	 Writes Queued:           0,        0KiB
>  Read Dispatches:   49,419,  197,676KiB	 Write Dispatches:        0,        0KiB
>  Reads Requeued:         0		 Writes Requeued:         0
>  Reads Completed:   51,391,  205,564KiB	 Writes Completed:        0,        0KiB
>  Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
>  Read depth:            64        	 Write depth:             0
>  IO unplugs:        50,594        	 Timer unplugs:           0
> 
> Total (vdb):
>  Reads Queued:     103,371,  413,484KiB	 Writes Queued:           0,        0KiB
>  Read Dispatches:  103,367,  413,468KiB	 Write Dispatches:        0,        0KiB
>  Reads Requeued:         0		 Writes Requeued:         0
>  Reads Completed:  103,303,  413,212KiB	 Writes Completed:        0,        0KiB
>  Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
>  IO unplugs:       103,367        	 Timer unplugs:           0
> 
> Throughput (R/W): 202,059KiB/s / 0KiB/s
> Events (vdb): 629,502 entries
> Skips: 0 forward (0 -   0.0%)
> 
> The last part of the trace is all 4 processes I run failing to allocate
> a request and going to sleep. The last completion event is some 150
> lines up, after various successful dispatch etc events.

Looks like a race condition, below works for me, please try.


Subject: virtio_blk: fix race condition

virtqueue_kick() isn't multi-thread safe.

Signed-off-by: Shaohua Li <shli@fusionio.com>

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 588479d..f353959 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -204,10 +204,11 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
 		virtqueue_kick(vblk->vq);
 		return BLK_MQ_RQ_QUEUE_BUSY;
 	}
-	spin_unlock_irqrestore(&vblk->vq_lock, flags);
 
 	if (last)
 		virtqueue_kick(vblk->vq);
+	spin_unlock_irqrestore(&vblk->vq_lock, flags);
+
 	return BLK_MQ_RQ_QUEUE_OK;
 }
 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-20  1:44                   ` Shaohua Li
@ 2013-11-20  1:54                     ` Jens Axboe
  2013-11-20  2:02                       ` Jens Axboe
  2013-11-20  8:07                       ` Christoph Hellwig
  0 siblings, 2 replies; 25+ messages in thread
From: Jens Axboe @ 2013-11-20  1:54 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Dave Chinner, linux-kernel

On Wed, Nov 20 2013, Shaohua Li wrote:
> On Tue, Nov 19, 2013 at 05:08:58PM -0700, Jens Axboe wrote:
> > On Tue, Nov 19 2013, Jens Axboe wrote:
> > > On Wed, Nov 20 2013, Dave Chinner wrote:
> > > > On Tue, Nov 19, 2013 at 03:51:27PM -0700, Jens Axboe wrote:
> > > > > On 11/19/2013 03:42 PM, Jens Axboe wrote:
> > > > > > On 11/19/2013 02:43 PM, Jens Axboe wrote:
> > > > > >> On 11/19/2013 02:34 PM, Dave Chinner wrote:
> > > > > >>> On Tue, Nov 19, 2013 at 02:20:42PM -0700, Jens Axboe wrote:
> > > > > >>>> On Tue, Nov 19 2013, Jens Axboe wrote:
> > > > > >>>>> On Tue, Nov 19 2013, Dave Chinner wrote:
> > > > > >>>>>> Hi Jens,
> > > > > >>>>>>
> > > > > >>>>>> I was just running xfstests on a 3.13 kernel that has had the block
> > > > > >>>>>> layer changed merged into it. generic/269 on XFS is hanging on a 2
> > > > > >>>>>> CPU VM using virtio,cache=none for the block devices under test,
> > > > > >>>>>> with many (130+) threads stuck below submit_bio() like this:
> > > > > >>>>>>
> > > > > >>>>>>  Call Trace:
> > > > > >>>>>>   [<ffffffff81adb1c9>] schedule+0x29/0x70
> > > > > >>>>>>   [<ffffffff817833ee>] percpu_ida_alloc+0x16e/0x330
> > > > > >>>>>>   [<ffffffff81759bef>] blk_mq_wait_for_tags+0x1f/0x40
> > > > > >>>>>>   [<ffffffff81758bee>] blk_mq_alloc_request_pinned+0x4e/0xf0
> > > > > >>>>>>   [<ffffffff8175931b>] blk_mq_make_request+0x3bb/0x4a0
> > > > > >>>>>>   [<ffffffff8174d2b2>] generic_make_request+0xc2/0x110
> > > > > >>>>>>   [<ffffffff8174e40c>] submit_bio+0x6c/0x120
> > > > > >>>>>>
> > > > > >>>>>> reads and writes are hung, both data (direct and buffered) and
> > > > > >>>>>> metadata.
> > > > > >>>>>>
> > > > > >>>>>> Some IOs are sitting in io_schedule, waiting for IO completion (both
> > > > > >>>>>> buffered and direct IO, both reads and writes) so it looks like IO
> > > > > >>>>>> completion has stalled in some manner, too.
> > > > > >>>>>
> > > > > >>>>> Can I get a recipe to reproduce this? I haven't had any luck so far.
> > > > > >>>>
> > > > > >>>> OK, I reproduced it. Looks weird, basically all 64 commands are in
> > > > > >>>> flight, but haven't completed. So the next one that comes in just sits
> > > > > >>>> there forever. I can't find any sysfs debug entries for virtio, would be
> > > > > >>>> nice to inspect its queue as well...
> > > > > >>>
> > > > > >>> Does it have anything to do with the fact that the request queue
> > > > > >>> depth is 128 entries and the tag pool only has 66 tags in it? i.e:
> > > > > >>>
> > > > > >>> /sys/block/vdb/queue/nr_requests
> > > > > >>> 128
> > > > > >>>
> > > > > >>> /sys/block/vdb/mq/0/tags
> > > > > >>> nr_tags=66, reserved_tags=2, batch_move=16, max_cache=32
> > > > > >>> nr_free=0, nr_reserved=1
> > > > > >>>   cpu00: nr_free=0
> > > > > >>>   cpu01: nr_free=0
> > > > > >>>
> > > > > >>> Seems to imply that if we queue up more than 66 IOs without
> > > > > >>> dispatching them, we'll run out of tags. And without another IO
> > > > > >>> coming through, the "none" scheduler that virtio uses will never
> > > > > >>> get a trigger to push out the currently queued IO?
> > > > > >>
> > > > > >> No, the nr_requests isn't actually relevant in the blk-mq context, the
> > > > > >> driver sets its own depth. For the above, it's 64 normal commands, and 2
> > > > > >> reserved. The reserved would be for a flush, for instance. If someone
> > > > > >> attempts to queue more than the allocated number of requests, it'll stop
> > > > > >> the blk-mq queue and kick things into gear on the virtio side. Then when
> > > > > >> requests complete, we start the queue again.
> > > > > >>
> > > > > >> If you look at virtio_queue_rq(), that handles a single request. This
> > > > > >> request is already tagged at this point. If we can't add it to the ring,
> > > > > >> we simply stop the queue and kick off whatever pending we might have. We
> > > > > >> return BLK_MQ_RQ_QUEUE_BUSY to blk-mq, which tells that to back off on
> > > > > >> sending us more. When we get the virtblk_done() callback from virtio, we
> > > > > >> end the requests on the blk-mq side and restart the queue.
> > > > > > 
> > > > > > I added some debug code to see if we had anything pending on the blk-mq
> > > > > > side, and it's all empty. It really just looks like we are missing
> > > > > > completions on the virtio side. Very odd.
> > > > > 
> > > > > Patching in the old rq path works however, so...
> > > > 
> > > > ... we've found a race condition ;)
> > > > 
> > > > FWIW, the thing that comes immediately to mind for me is that all
> > > > the wakeups and queue kicking is done outside of any locks, using
> > > > status gained from inside the lock context. Hence the mq stop/start
> > > > and virt queue kicking can all race - perhaps that's resulting in a
> > > > missed wakeup, restart or queue kick?
> > > 
> > > I _think_ it only happens when queue depth > device queue depth, but I
> > > don't immediately see what is wrong. That logic is pretty much tried and
> > > true. As a debug measure, I just ignored the stop bit, but it doesn't
> > > change anything. BTW, that is pretty much the same as before.
> > > 
> > > Anyway, no real news yet, will debug later tonight or tomorrow morning
> > > again.
> > 
> > BTW, ran a blktrace on the device, also looks like completion events
> > just stop coming in:
> > 
> > [...]
> > 253,16   0   321645     2.044497762  1394  P   N [fio]
> > 253,16   0   321646     2.044498812  1394  U   N [fio] 1
> > 253,16   0   321647     2.044499839  1394  D   R 5768832 + 8 [fio]
> > 253,16   1   307476     2.044504353  1396  S   R 4412024 + 8 [fio]
> > 253,16   1   307477     2.044505381  1396  P   N [fio]
> > 253,16   1   307478     2.044506799  1396  U   N [fio] 1
> > 253,16   1   307479     2.044507784  1396  D   R 4412024 + 8 [fio]
> > 253,16   0   321648     2.044509571  1394  C   R 3828264 + 8 [0]
> > 253,16   0   321649     2.044525980  1394  Q   R 13757184 + 8 [fio]
> > 253,16   0   321650     2.044526706  1394  G   R 13757184 + 8 [fio]
> > 253,16   1   307480     2.044527670  1396  C   R 16292472 + 8 [0]
> > 253,16   0   321651     2.044527830  1394  P   N [fio]
> > 253,16   0   321652     2.044528889  1394  U   N [fio] 1
> > 253,16   0   321653     2.044530006  1394  D   R 13757184 + 8 [fio]
> > 253,16   0   321654     2.044539915  1394  C   R 7557264 + 8 [0]
> > 253,16   0   321655     2.044544008  1394  C   R 4320544 + 8 [0]
> > 253,16   1   307481     2.044574922  1396  C   R 15264800 + 8 [0]
> > 253,16   1   307482     2.044579465  1396  C   R 15595656 + 8 [0]
> > 253,16   0   321656     2.044591661  1394  C   R 8071352 + 8 [0]
> > 253,16   1   307483     2.044595664  1395  P   N [fio]
> > 253,16   1   307484     2.044597335  1395  U   N [fio] 1
> > 253,16   1   307485     2.044605257  1395  C   R 15726296 + 8 [0]
> > 253,16   0   321657     2.044607859  1394  Q   R 17562168 + 8 [fio]
> > 253,16   0   321658     2.044608531  1394  G   R 17562168 + 8 [fio]
> > 253,16   0   321659     2.044609756  1394  P   N [fio]
> > 253,16   0   321660     2.044611086  1394  U   N [fio] 1
> > 253,16   0   321661     2.044612275  1394  D   R 17562168 + 8 [fio]
> > 253,16   1   307486     2.044614316  1395  D   R 4664552 + 8 [fio]
> > 253,16   0   321662     2.044621707  1394  C   R 18403168 + 8 [0]
> > 253,16   0   321663     2.044626241  1394  C   R 6357824 + 8 [0]
> > 253,16   1   307487     2.044637446  1393  Q   R 2061240 + 8 [fio]
> > 253,16   1   307488     2.044638263  1393  G   R 2061240 + 8 [fio]
> > 253,16   1   307489     2.044645914  1393  C   R 6364168 + 8 [0]
> > 253,16   0   321664     2.044650488  1394  Q   R 20380488 + 8 [fio]
> > 253,16   0   321665     2.044651199  1394  G   R 20380488 + 8 [fio]
> > 253,16   1   307490     2.044654060  1393  P   N [fio]
> > 253,16   1   307491     2.044655466  1393  U   N [fio] 1
> > 253,16   1   307492     2.044656734  1393  D   R 2061240 + 8 [fio]
> > 253,16   0   321666     2.044659910  1394  C   R 16579544 + 8 [0]
> > 253,16   0   321667     2.044663679  1394  C   R 14878944 + 8 [0]
> > 253,16   0   321668     2.044671537  1394  P   N [fio]
> > 253,16   0   321669     2.044673120  1394  U   N [fio] 1
> > 253,16   0   321670     2.044674203  1394  D   R 20380488 + 8 [fio]
> > 253,16   1   307493     2.044678596  1395  Q   R 15126872 + 8 [fio]
> > 253,16   1   307494     2.044679280  1395  G   R 15126872 + 8 [fio]
> > 253,16   1   307495     2.044682674  1395  P   N [fio]
> > 253,16   1   307496     2.044689057  1395  C   R 6251136 + 8 [0]
> > 253,16   1   307497     2.044697970  1395  U   N [fio] 1
> > 253,16   1   307498     2.044699073  1395  D   R 15126872 + 8 [fio]
> > 253,16   0   321671     2.044702366  1394  C   R 6028184 + 8 [0]
> > 253,16   0   321672     2.044710344  1394  Q   R 8698464 + 8 [fio]
> > 253,16   0   321673     2.044711072  1394  G   R 8698464 + 8 [fio]
> > 253,16   0   321674     2.044712179  1394  P   N [fio]
> > 253,16   0   321675     2.044713626  1394  U   N [fio] 1
> > 253,16   1   307499     2.044713961  1393  C   R 20851408 + 8 [0]
> > 253,16   0   321676     2.044714693  1394  D   R 8698464 + 8 [fio]
> > 253,16   0   321677     2.044724918  1394  C   R 16890840 + 8 [0]
> > 253,16   1   307500     2.044733628  1393  Q   R 20289656 + 8 [fio]
> > 253,16   1   307501     2.044739576  1393  C   R 6570816 + 8 [0]
> > 253,16   1   307502     2.044747620  1393  G   R 20289656 + 8 [fio]
> > 253,16   1   307503     2.044748710  1393  P   N [fio]
> > 253,16   1   307504     2.044750128  1393  U   N [fio] 1
> > 253,16   0   321678     2.044750392  1394  C   R 6513968 + 8 [0]
> > 253,16   1   307505     2.044751307  1393  D   R 20289656 + 8 [fio]
> > 253,16   0   321679     2.044758314  1394  Q   R 5781280 + 8 [fio]
> > 253,16   0   321680     2.044758861  1394  G   R 5781280 + 8 [fio]
> > 253,16   0   321681     2.044759973  1394  P   N [fio]
> > 253,16   0   321682     2.044761183  1394  U   N [fio] 1
> > 253,16   1   307506     2.044761883  1396  Q   R 7477904 + 8 [fio]
> > 253,16   0   321683     2.044762348  1394  D   R 5781280 + 8 [fio]
> > 253,16   1   307507     2.044762462  1396  G   R 7477904 + 8 [fio]
> > 253,16   1   307508     2.044763486  1396  P   N [fio]
> > 253,16   1   307509     2.044764604  1396  U   N [fio] 1
> > 253,16   1   307510     2.044765426  1396  D   R 7477904 + 8 [fio]
> > 253,16   0   321684     2.044776281  1394  Q   R 17552200 + 8 [fio]
> > 253,16   0   321685     2.044776867  1394  G   R 17552200 + 8 [fio]
> > 253,16   0   321686     2.044777975  1394  P   N [fio]
> > 253,16   1   307511     2.044778944  1396  C   R 16910272 + 8 [0]
> > 253,16   0   321687     2.044779060  1394  U   N [fio] 1
> > 253,16   0   321688     2.044780234  1394  D   R 17552200 + 8 [fio]
> > 253,16   0   321689     2.044792840  1394  C   R 19736656 + 8 [0]
> > 253,16   1   307512     2.044803867  1395  C   R 18088120 + 8 [0]
> > 253,16   0   321690     2.044817910  1394  C   R 19495248 + 8 [0]
> > 253,16   1   307513     2.044831267  1396  C   R 18613664 + 8 [0]
> > 253,16   0   321691     2.044835246  1394  Q   R 2662776 + 8 [fio]
> > 253,16   0   321692     2.044835878  1394  G   R 2662776 + 8 [fio]
> > 253,16   0   321693     2.044836968  1394  P   N [fio]
> > 253,16   0   321694     2.044838353  1394  U   N [fio] 1
> > 253,16   0   321695     2.044839447  1394  D   R 2662776 + 8 [fio]
> > 253,16   1   307514     2.044851246  1395  Q   R 8578768 + 8 [fio]
> > 253,16   1   307515     2.044851868  1395  G   R 8578768 + 8 [fio]
> > 253,16   1   307516     2.044853020  1395  P   N [fio]
> > 253,16   0   321696     2.044853136  1394  Q   R 12783160 + 8 [fio]
> > 253,16   0   321697     2.044853700  1394  G   R 12783160 + 8 [fio]
> > 253,16   1   307517     2.044854185  1395  U   N [fio] 1
> > 253,16   0   321698     2.044854759  1394  P   N [fio]
> > 253,16   1   307518     2.044855289  1395  D   R 8578768 + 8 [fio]
> > 253,16   0   321699     2.044855745  1394  U   N [fio] 1
> > 253,16   0   321700     2.044856691  1394  D   R 12783160 + 8 [fio]
> > 253,16   1   307519     2.044869744  1395  Q   R 1741184 + 8 [fio]
> > 253,16   0   321701     2.044869754  1394  Q   R 11982944 + 8 [fio]
> > 253,16   0   321702     2.044870388  1394  G   R 11982944 + 8 [fio]
> > 253,16   1   307520     2.044870481  1395  G   R 1741184 + 8 [fio]
> > 253,16   0   321703     2.044871564  1394  P   N [fio]
> > 253,16   1   307521     2.044871650  1395  P   N [fio]
> > 253,16   0   321704     2.044872615  1394  U   N [fio] 1
> > 253,16   1   307522     2.044872687  1395  U   N [fio] 1
> > 253,16   0   321705     2.044873938  1394  D   R 11982944 + 8 [fio]
> > 253,16   0   321706     2.044875583  1394  D   R 1741184 + 8 [fio]
> > 253,16   1   307523     2.044884491  1395  Q   R 11475600 + 8 [fio]
> > 253,16   1   307524     2.044885101  1395  G   R 11475600 + 8 [fio]
> > 253,16   1   307525     2.044886205  1395  P   N [fio]
> > 253,16   0   321707     2.044887140  1394  Q   R 6125272 + 8 [fio]
> > 253,16   1   307526     2.044887313  1395  U   N [fio] 1
> > 253,16   0   321708     2.044887737  1394  G   R 6125272 + 8 [fio]
> > 253,16   1   307527     2.044888330  1395  D   R 11475600 + 8 [fio]
> > 253,16   0   321709     2.044888832  1394  P   N [fio]
> > 253,16   0   321710     2.044889710  1394  U   N [fio] 1
> > 253,16   0   321711     2.044890577  1394  D   R 6125272 + 8 [fio]
> > 253,16   1   307528     2.044900606  1395  Q   R 634648 + 8 [fio]
> > 253,16   1   307529     2.044901228  1395  G   R 634648 + 8 [fio]
> > 253,16   0   321712     2.044901785  1394  Q   R 9187912 + 8 [fio]
> > 253,16   1   307530     2.044902443  1395  P   N [fio]
> > 253,16   0   321713     2.044902452  1394  G   R 9187912 + 8 [fio]
> > 253,16   0   321714     2.044903495  1394  P   N [fio]
> > 253,16   1   307531     2.044903547  1395  U   N [fio] 1
> > 253,16   0   321715     2.044904448  1394  U   N [fio] 1
> > 253,16   1   307532     2.044904580  1395  D   R 634648 + 8 [fio]
> > 253,16   0   321716     2.044905342  1394  D   R 9187912 + 8 [fio]
> > 253,16   1   307533     2.044917154  1395  Q   R 10607112 + 8 [fio]
> > 253,16   1   307534     2.044917842  1395  G   R 10607112 + 8 [fio]
> > 253,16   1   307535     2.044918920  1395  P   N [fio]
> > 253,16   0   321717     2.044918974  1394  C   R 20819560 + 8 [0]
> > 253,16   1   307536     2.044920027  1395  U   N [fio] 1
> > 253,16   1   307537     2.044920973  1395  D   R 10607112 + 8 [fio]
> > 253,16   1   307538     2.044932045  1395  C   R 18488736 + 8 [0]
> > 253,16   0   321718     2.044938208  1394  Q   R 3527824 + 8 [fio]
> > 253,16   0   321719     2.044938891  1394  G   R 3527824 + 8 [fio]
> > 253,16   0   321720     2.044946221  1394  C   R 1054040 + 8 [0]
> > 253,16   0   321721     2.044950233  1394  C   R 9042344 + 8 [0]
> > 253,16   0   321722     2.044957832  1394  P   N [fio]
> > 253,16   0   321723     2.044959392  1394  U   N [fio] 1
> > 253,16   1   307539     2.044959886  1393  Q   R 18613872 + 8 [fio]
> > 253,16   0   321724     2.044960611  1394  D   R 3527824 + 8 [fio]
> > 253,16   1   307540     2.044960694  1393  G   R 18613872 + 8 [fio]
> > 253,16   1   307541     2.044961852  1393  P   N [fio]
> > 253,16   1   307542     2.044970724  1393  C   R 19504408 + 8 [0]
> > 253,16   1   307543     2.044974744  1393  C   R 509080 + 8 [0]
> > 253,16   0   321725     2.044980982  1394  Q   R 599736 + 8 [fio]
> > 253,16   0   321726     2.044981594  1394  G   R 599736 + 8 [fio]
> > 253,16   1   307544     2.044982536  1393  U   N [fio] 1
> > 253,16   0   321727     2.044982690  1394  P   N [fio]
> > 253,16   0   321728     2.044983622  1394  U   N [fio] 1
> > 253,16   1   307545     2.044983687  1393  D   R 18613872 + 8 [fio]
> > 253,16   0   321729     2.044986857  1394  D   R 599736 + 8 [fio]
> > 253,16   1   307546     2.044992685  1396  Q   R 15873256 + 8 [fio]
> > 253,16   0   321730     2.044993135  1394  C   R 9258392 + 8 [0]
> > 253,16   1   307547     2.044993319  1396  G   R 15873256 + 8 [fio]
> > 253,16   1   307548     2.044994455  1396  P   N [fio]
> > 253,16   1   307549     2.044995683  1396  U   N [fio] 1
> > 253,16   1   307550     2.044996660  1396  D   R 15873256 + 8 [fio]
> > 253,16   0   321731     2.044996827  1394  C   R 12335824 + 8 [0]
> > 253,16   1   307551     2.045015913  1396  C   R 10141512 + 8 [0]
> > 253,16   1   307552     2.045019947  1396  C   R 2943808 + 8 [0]
> > 253,16   0   321732     2.045023048  1394  Q   R 15878680 + 8 [fio]
> > 253,16   0   321733     2.045023693  1394  G   R 15878680 + 8 [fio]
> > 253,16   0   321734     2.045024976  1394  P   N [fio]
> > 253,16   0   321735     2.045026157  1394  U   N [fio] 1
> > 253,16   0   321736     2.045027185  1394  D   R 15878680 + 8 [fio]
> > 253,16   0   321737     2.045039003  1394  C   R 12536088 + 8 [0]
> > 253,16   0   321738     2.045043351  1394  C   R 12728832 + 8 [0]
> > 253,16   1   307553     2.045047258  1393  Q   R 16429168 + 8 [fio]
> > 253,16   1   307554     2.045047952  1393  G   R 16429168 + 8 [fio]
> > 253,16   1   307555     2.045049191  1393  P   N [fio]
> > 253,16   1   307556     2.045050400  1393  U   N [fio] 1
> > 253,16   1   307557     2.045051596  1393  D   R 16429168 + 8 [fio]
> > 253,16   1   307558     2.045059051  1393  C   R 19676536 + 8 [0]
> > 253,16   1   307559     2.045063260  1393  C   R 3602624 + 8 [0]
> > 253,16   0   321739     2.045067833  1394  Q   R 8871496 + 8 [fio]
> > 253,16   0   321740     2.045068568  1394  G   R 8871496 + 8 [fio]
> > 253,16   0   321741     2.045069712  1394  P   N [fio]
> > 253,16   0   321742     2.045070860  1394  U   N [fio] 1
> > 253,16   0   321743     2.045071828  1394  D   R 8871496 + 8 [fio]
> > 253,16   1   307560     2.045081644  1396  Q   R 18234504 + 8 [fio]
> > 253,16   1   307561     2.045082288  1396  G   R 18234504 + 8 [fio]
> > 253,16   1   307562     2.045083331  1396  P   N [fio]
> > 253,16   0   321744     2.045083945  1394  C   R 12931496 + 8 [0]
> > 253,16   1   307563     2.045084583  1396  U   N [fio] 1
> > 253,16   1   307564     2.045085578  1396  D   R 18234504 + 8 [fio]
> > 253,16   0   321745     2.045087682  1394  C   R 8223176 + 8 [0]
> > 253,16   1   307565     2.045107378  1396  C   R 3414184 + 8 [0]
> > 253,16   0   321746     2.045109170  1394  Q   R 7639272 + 8 [fio]
> > 253,16   0   321747     2.045109940  1394  G   R 7639272 + 8 [fio]
> > 253,16   0   321748     2.045111122  1394  P   N [fio]
> > 253,16   1   307566     2.045111541  1396  C   R 1703056 + 8 [0]
> > 253,16   0   321749     2.045112293  1394  U   N [fio] 1
> > 253,16   0   321750     2.045113409  1394  D   R 7639272 + 8 [fio]
> > 253,16   0   321751     2.045129019  1394  C   R 4844912 + 8 [0]
> > 253,16   0   321752     2.045133082  1394  C   R 14433456 + 8 [0]
> > 253,16   1   307567     2.045138375  1393  Q   R 1554408 + 8 [fio]
> > 253,16   1   307568     2.045139072  1393  G   R 1554408 + 8 [fio]
> > 253,16   0   321753     2.045159160  1394  Q   R 3870352 + 8 [fio]
> > 253,16   0   321754     2.045159766  1394  G   R 3870352 + 8 [fio]
> > 253,16   0   321755     2.045160881  1394  P   N [fio]
> > 253,16   0   321756     2.045162172  1394  U   N [fio] 1
> > 253,16   0   321757     2.045163282  1394  D   R 3870352 + 8 [fio]
> > 253,16   1   307569     2.045169743  1393  C   R 2935704 + 8 [0]
> > 253,16   1   307570     2.045178563  1393  P   N [fio]
> > 253,16   1   307571     2.045180295  1393  U   N [fio] 1
> > 253,16   0   321758     2.045180768  1394  C   R 15646472 + 8 [0]
> > 253,16   1   307572     2.045181451  1393  D   R 1554408 + 8 [fio]
> > 253,16   0   321759     2.045184556  1394  C   R 14682176 + 8 [0]
> > 253,16   0   321760     2.045187371  1394  C   R 13046784 + 8 [0]
> > 253,16   0   321761     2.045196543  1394  Q   R 11609952 + 8 [fio]
> > 253,16   0   321762     2.045197289  1394  G   R 11609952 + 8 [fio]
> > 253,16   1   307573     2.045198166  1393  C   R 14489456 + 8 [0]
> > 253,16   0   321763     2.045198329  1394  P   N [fio]
> > 253,16   0   321764     2.045199520  1394  U   N [fio] 1
> > 253,16   0   321765     2.045200544  1394  D   R 11609952 + 8 [fio]
> > 253,16   0   321766     2.045208562  1394  C   R 6083512 + 8 [0]
> > 253,16   1   307574     2.045219756  1395  C   R 3340928 + 8 [0]
> > 253,16   0   321767     2.045232590  1394  C   R 6967144 + 8 [0]
> > 253,16   1   307575     2.045243177  1396  C   R 15966080 + 8 [0]
> > 253,16   1   307576     2.045250844  1396  Q   R 9139408 + 8 [fio]
> > 253,16   1   307577     2.045251499  1396  G   R 9139408 + 8 [fio]
> > 253,16   1   307578     2.045252527  1396  P   N [fio]
> > 253,16   1   307579     2.045254025  1396  U   N [fio] 1
> > 253,16   0   321768     2.045254048  1394  C   R 5681104 + 8 [0]
> > 253,16   1   307580     2.045255169  1396  D   R 9139408 + 8 [fio]
> > 253,16   0   321769     2.045262286  1394  Q   R 9056968 + 8 [fio]
> > 253,16   0   321770     2.045262923  1394  G   R 9056968 + 8 [fio]
> > 253,16   0   321771     2.045263851  1394  P   N [fio]
> > 253,16   0   321772     2.045265016  1394  U   N [fio] 1
> > 253,16   0   321773     2.045266087  1394  D   R 9056968 + 8 [fio]
> > 253,16   1   307581     2.045267796    40  C   R 14598208 + 8 [0]
> > 253,16   0   321774     2.045284312  1394  Q   R 3200080 + 8 [fio]
> > 253,16   0   321775     2.045284867  1394  G   R 3200080 + 8 [fio]
> > 253,16   0   321776     2.045285862  1394  P   N [fio]
> > 253,16   1   307582     2.045287878  1396  Q   R 18113240 + 8 [fio]
> > 253,16   1   307583     2.045288509  1396  G   R 18113240 + 8 [fio]
> > 253,16   1   307584     2.045289508  1396  P   N [fio]
> > 253,16   1   307585     2.045290565  1396  U   N [fio] 1
> > 253,16   1   307586     2.045291629  1396  D   R 18113240 + 8 [fio]
> > 253,16   0   321777     2.045294432  1394  C   R 4126280 + 8 [0]
> > 253,16   0   321778     2.045302896  1394  U   N [fio] 1
> > 253,16   0   321779     2.045303909  1394  D   R 3200080 + 8 [fio]
> > 253,16   1   307587     2.045305145  1396  Q   R 12895512 + 8 [fio]
> > 253,16   1   307588     2.045305766  1396  G   R 12895512 + 8 [fio]
> > 253,16   1   307589     2.045306887  1396  P   N [fio]
> > 253,16   1   307590     2.045308050  1396  U   N [fio] 1
> > 253,16   1   307591     2.045309139  1396  D   R 12895512 + 8 [fio]
> > 253,16   1   307592     2.045322292  1396  Q   R 14034472 + 8 [fio]
> > 253,16   0   321780     2.045322301  1394  Q   R 796184 + 8 [fio]
> > 253,16   0   321781     2.045322940  1394  G   R 796184 + 8 [fio]
> > 253,16   1   307593     2.045322958  1396  G   R 14034472 + 8 [fio]
> > 253,16   0   321782     2.045324193  1394  P   N [fio]
> > 253,16   1   307594     2.045324199  1396  P   N [fio]
> > 253,16   1   307595     2.045325295  1396  U   N [fio] 1
> > 253,16   0   321783     2.045325314  1394  U   N [fio] 1
> > 253,16   0   321784     2.045326501  1394  D   R 14034472 + 8 [fio]
> > 253,16   1   307596     2.045326516  1396  D   R 796184 + 8 [fio]
> > 253,16   1   307597     2.045339252  1396  Q   R 8827184 + 8 [fio]
> > 253,16   0   321785     2.045339262  1394  Q   R 16108264 + 8 [fio]
> > 253,16   0   321786     2.045339900  1394  G   R 16108264 + 8 [fio]
> > 253,16   1   307598     2.045339902  1396  G   R 8827184 + 8 [fio]
> > 253,16   1   307599     2.045341051  1396  P   N [fio]
> > 253,16   0   321787     2.045341126  1394  P   N [fio]
> > 253,16   1   307600     2.045342012  1396  U   N [fio] 1
> > 253,16   0   321788     2.045342086  1394  U   N [fio] 1
> > 253,16   0   321789     2.045343282  1394  D   R 16108264 + 8 [fio]
> > 253,16   0   321790     2.045344979  1394  D   R 8827184 + 8 [fio]
> > 253,16   1   307601     2.045353588  1396  Q   R 11285896 + 8 [fio]
> > 253,16   1   307602     2.045354336  1396  G   R 11285896 + 8 [fio]
> > 253,16   1   307603     2.045355348  1396  P   N [fio]
> > 253,16   0   321791     2.045356120  1394  Q   R 8593080 + 8 [fio]
> > 253,16   1   307604     2.045356402  1396  U   N [fio] 1
> > 253,16   0   321792     2.045356714  1394  G   R 8593080 + 8 [fio]
> > 253,16   1   307605     2.045357472  1396  D   R 11285896 + 8 [fio]
> > 253,16   0   321793     2.045357789  1394  P   N [fio]
> > 253,16   0   321794     2.045358691  1394  U   N [fio] 1
> > 253,16   0   321795     2.045359550  1394  D   R 8593080 + 8 [fio]
> > 253,16   1   307606     2.045369558  1396  Q   R 17509128 + 8 [fio]
> > 253,16   1   307607     2.045370220  1396  G   R 17509128 + 8 [fio]
> > 253,16   0   321796     2.045370807  1394  Q   R 7393376 + 8 [fio]
> > 253,16   1   307608     2.045371249  1396  P   N [fio]
> > 253,16   0   321797     2.045371405  1394  G   R 7393376 + 8 [fio]
> > 253,16   1   307609     2.045372368  1396  U   N [fio] 1
> > 253,16   0   321798     2.045372460  1394  P   N [fio]
> > 253,16   1   307610     2.045373379  1396  D   R 17509128 + 8 [fio]
> > 253,16   0   321799     2.045373443  1394  U   N [fio] 1
> > 253,16   0   321800     2.045374334  1394  D   R 7393376 + 8 [fio]
> > 253,16   1   307611     2.045385849  1396  Q   R 15227224 + 8 [fio]
> > 253,16   0   321801     2.045385859  1394  Q   R 4025880 + 8 [fio]
> > 253,16   0   321802     2.045386465  1394  G   R 4025880 + 8 [fio]
> > 253,16   1   307612     2.045386490  1396  G   R 15227224 + 8 [fio]
> > 253,16   0   321803     2.045387752  1394  P   N [fio]
> > 253,16   1   307613     2.045387756  1396  P   N [fio]
> > 253,16   1   307614     2.045388787  1396  U   N [fio] 1
> > 253,16   0   321804     2.045388802  1394  U   N [fio] 1
> > 253,16   0   321805     2.045390041  1394  D   R 4025880 + 8 [fio]
> > 253,16   0   321806     2.045391662  1394  D   R 15227224 + 8 [fio]
> > 253,16   1   307615     2.045400454  1396  Q   R 1524424 + 8 [fio]
> > 253,16   1   307616     2.045401113  1396  G   R 1524424 + 8 [fio]
> > 253,16   1   307617     2.045402135  1396  P   N [fio]
> > 253,16   0   321807     2.045402877  1394  Q   R 20400040 + 8 [fio]
> > 253,16   1   307618     2.045403201  1396  U   N [fio] 1
> > 253,16   0   321808     2.045403474  1394  G   R 20400040 + 8 [fio]
> > 253,16   1   307619     2.045404307  1396  D   R 1524424 + 8 [fio]
> > 253,16   0   321809     2.045404603  1394  P   N [fio]
> > 253,16   0   321810     2.045405503  1394  U   N [fio] 1
> > 253,16   0   321811     2.045406368  1394  D   R 20400040 + 8 [fio]
> > 253,16   1   307620     2.045416364  1396  Q   R 12912080 + 8 [fio]
> > 253,16   1   307621     2.045417025  1396  G   R 12912080 + 8 [fio]
> > 253,16   0   321812     2.045417239  1394  Q   R 14427880 + 8 [fio]
> > 253,16   0   321813     2.045417755  1394  G   R 14427880 + 8 [fio]
> > 253,16   1   307622     2.045418346  1396  P   N [fio]
> > 253,16   0   321814     2.045418901  1394  P   N [fio]
> > 253,16   1   307623     2.045419483  1396  U   N [fio] 1
> > 253,16   0   321815     2.045419817  1394  U   N [fio] 1
> > 253,16   1   307624     2.045420512  1396  D   R 12912080 + 8 [fio]
> > 253,16   0   321816     2.045420817  1394  D   R 14427880 + 8 [fio]
> > 253,16   0   321817     2.045433275  1394  Q   R 5098080 + 8 [fio]
> > 253,16   1   307625     2.045433281  1396  Q   R 9204032 + 8 [fio]
> > 253,16   1   307626     2.045433910  1396  G   R 9204032 + 8 [fio]
> > 253,16   0   321818     2.045433939  1394  G   R 5098080 + 8 [fio]
> > 253,16   1   307627     2.045435074  1396  P   N [fio]
> > 253,16   0   321819     2.045435151  1394  P   N [fio]
> > 253,16   1   307628     2.045436065  1396  U   N [fio] 1
> > 253,16   0   321820     2.045436134  1394  U   N [fio] 1
> > 253,16   1   307629     2.045437153  1396  D   R 5098080 + 8 [fio]
> > 253,16   0   321821     2.045437189  1394  D   R 9204032 + 8 [fio]
> > 253,16   0   321822     2.045449724  1394  Q   R 8743352 + 8 [fio]
> > 253,16   1   307630     2.045449731  1396  Q   R 3957640 + 8 [fio]
> > 253,16   1   307631     2.045450373  1396  G   R 3957640 + 8 [fio]
> > 253,16   0   321823     2.045450385  1394  G   R 8743352 + 8 [fio]
> > 253,16   1   307632     2.045451499  1396  P   N [fio]
> > 253,16   0   321824     2.045451560  1394  P   N [fio]
> > 253,16   1   307633     2.045452564  1396  U   N [fio] 1
> > 253,16   0   321825     2.045452595  1394  U   N [fio] 1
> > 253,16   1   307634     2.045453896  1396  D   R 8743352 + 8 [fio]
> > 253,16   1   307635     2.045455483  1396  D   R 3957640 + 8 [fio]
> > 253,16   0   321826     2.045464230  1394  Q   R 18141352 + 8 [fio]
> > 253,16   0   321827     2.045464809  1394  G   R 18141352 + 8 [fio]
> > 253,16   0   321828     2.045465808  1394  P   N [fio]
> > 253,16   1   307636     2.045466884  1396  Q   R 14356248 + 8 [fio]
> > 253,16   0   321829     2.045466925  1394  U   N [fio] 1
> > 253,16   1   307637     2.045467445  1396  G   R 14356248 + 8 [fio]
> > 253,16   0   321830     2.045467911  1394  D   R 18141352 + 8 [fio]
> > 253,16   1   307638     2.045469917  1396  P   N [fio]
> > 253,16   1   307639     2.045470841  1396  U   N [fio] 1
> > 253,16   1   307640     2.045471757  1396  D   R 14356248 + 8 [fio]
> > 253,16   0   321831     2.045480371  1394  Q   R 3678872 + 8 [fio]
> > 253,16   0   321832     2.045480944  1394  G   R 3678872 + 8 [fio]
> > 253,16   0   321833     2.045482626  1394  S   R 3678872 + 8 [fio]
> > 253,16   1   307641     2.045483448  1396  Q   R 14080120 + 8 [fio]
> > 253,16   1   307642     2.045483952  1396  G   R 14080120 + 8 [fio]
> > 253,16   1   307643     2.045484932  1396  P   N [fio]
> > 253,16   1   307644     2.045485963  1396  U   N [fio] 1
> > 253,16   1   307645     2.045486944  1396  D   R 14080120 + 8 [fio]
> > 253,16   1   307646     2.045500301  1396  Q   R 7780032 + 8 [fio]
> > 253,16   1   307647     2.045500913  1396  G   R 7780032 + 8 [fio]
> > 253,16   1   307648     2.045502001  1396  P   N [fio]
> > 253,16   1   307649     2.045503033  1396  U   N [fio] 1
> > 253,16   1   307650     2.045504023  1396  D   R 7780032 + 8 [fio]
> > 253,16   1   307651     2.045514780  1396  Q   R 19258792 + 8 [fio]
> > 253,16   1   307652     2.045515251  1396  G   R 19258792 + 8 [fio]
> > 253,16   1   307653     2.045516023  1396  P   N [fio]
> > 253,16   1   307654     2.045516601  1396  U   N [fio] 1
> > 253,16   1   307655     2.045517220  1396  D   R 19258792 + 8 [fio]
> > 253,16   1   307656     2.045524439  1396  Q   R 17379048 + 8 [fio]
> > 253,16   1   307657     2.045524744  1396  G   R 17379048 + 8 [fio]
> > 253,16   1   307658     2.045525291  1396  P   N [fio]
> > 253,16   1   307659     2.045525815  1396  U   N [fio] 1
> > 253,16   1   307660     2.045526389  1396  D   R 17379048 + 8 [fio]
> > 253,16   1   307661     2.045533654  1396  Q   R 6701640 + 8 [fio]
> > 253,16   1   307662     2.045533958  1396  G   R 6701640 + 8 [fio]
> > 253,16   1   307663     2.045535034  1396  S   R 6701640 + 8 [fio]
> > 253,16   1   307664     2.045545018  1395  Q   R 3973592 + 8 [fio]
> > 253,16   1   307665     2.045545304  1395  G   R 3973592 + 8 [fio]
> > 253,16   1   307666     2.045546132  1395  S   R 3973592 + 8 [fio]
> > 253,16   1   307667     2.045574463  1393  Q   R 2334424 + 8 [fio]
> > 253,16   1   307668     2.045574949  1393  G   R 2334424 + 8 [fio]
> > 253,16   1   307669     2.045575980  1393  S   R 2334424 + 8 [fio]
> > CPU0 (vdb):
> >  Reads Queued:      52,778,  211,112KiB	 Writes Queued:           0,        0KiB
> >  Read Dispatches:   53,948,  215,792KiB	 Write Dispatches:        0,        0KiB
> >  Reads Requeued:         0		 Writes Requeued:         0
> >  Reads Completed:   51,912,  207,648KiB	 Writes Completed:        0,        0KiB
> >  Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
> >  Read depth:            64        	 Write depth:             0
> >  IO unplugs:        52,773        	 Timer unplugs:           0
> > CPU1 (vdb):
> >  Reads Queued:      50,593,  202,372KiB	 Writes Queued:           0,        0KiB
> >  Read Dispatches:   49,419,  197,676KiB	 Write Dispatches:        0,        0KiB
> >  Reads Requeued:         0		 Writes Requeued:         0
> >  Reads Completed:   51,391,  205,564KiB	 Writes Completed:        0,        0KiB
> >  Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
> >  Read depth:            64        	 Write depth:             0
> >  IO unplugs:        50,594        	 Timer unplugs:           0
> > 
> > Total (vdb):
> >  Reads Queued:     103,371,  413,484KiB	 Writes Queued:           0,        0KiB
> >  Read Dispatches:  103,367,  413,468KiB	 Write Dispatches:        0,        0KiB
> >  Reads Requeued:         0		 Writes Requeued:         0
> >  Reads Completed:  103,303,  413,212KiB	 Writes Completed:        0,        0KiB
> >  Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
> >  IO unplugs:       103,367        	 Timer unplugs:           0
> > 
> > Throughput (R/W): 202,059KiB/s / 0KiB/s
> > Events (vdb): 629,502 entries
> > Skips: 0 forward (0 -   0.0%)
> > 
> > The last part of the trace is all 4 processes I run failing to allocate
> > a request and going to sleep. The last completion event is some 150
> > lines up, after various successful dispatch etc events.
> 
> Looks like a race condition, below works for me, please try.
> 
> 
> Subject: virtio_blk: fix race condition
> 
> virtqueue_kick() isn't multi-thread safe.
> 
> Signed-off-by: Shaohua Li <shli@fusionio.com>
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 588479d..f353959 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -204,10 +204,11 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
>  		virtqueue_kick(vblk->vq);
>  		return BLK_MQ_RQ_QUEUE_BUSY;
>  	}
> -	spin_unlock_irqrestore(&vblk->vq_lock, flags);
>  
>  	if (last)
>  		virtqueue_kick(vblk->vq);
> +	spin_unlock_irqrestore(&vblk->vq_lock, flags);
> +
>  	return BLK_MQ_RQ_QUEUE_OK;
>  }

Just stumbled on that too. You need one more, btw, for the sg failure
case:


diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 588479d58f52..6a680d4de7f1 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -199,15 +199,16 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
 
 	spin_lock_irqsave(&vblk->vq_lock, flags);
 	if (__virtblk_add_req(vblk->vq, vbr, vbr->sg, num) < 0) {
+		virtqueue_kick(vblk->vq);
 		spin_unlock_irqrestore(&vblk->vq_lock, flags);
 		blk_mq_stop_hw_queue(hctx);
-		virtqueue_kick(vblk->vq);
 		return BLK_MQ_RQ_QUEUE_BUSY;
 	}
-	spin_unlock_irqrestore(&vblk->vq_lock, flags);
 
 	if (last)
 		virtqueue_kick(vblk->vq);
+
+	spin_unlock_irqrestore(&vblk->vq_lock, flags);
 	return BLK_MQ_RQ_QUEUE_OK;
 }
 

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-20  1:54                     ` Jens Axboe
@ 2013-11-20  2:02                       ` Jens Axboe
  2013-11-20  2:53                         ` Dave Chinner
  2013-11-20  8:07                       ` Christoph Hellwig
  1 sibling, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2013-11-20  2:02 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Dave Chinner, linux-kernel

On Tue, Nov 19 2013, Jens Axboe wrote:
> > Looks like a race condition, below works for me, please try.
> > 
> > 
> > Subject: virtio_blk: fix race condition
> > 
> > virtqueue_kick() isn't multi-thread safe.
> > 
> > Signed-off-by: Shaohua Li <shli@fusionio.com>
> > 
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index 588479d..f353959 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -204,10 +204,11 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
> >  		virtqueue_kick(vblk->vq);
> >  		return BLK_MQ_RQ_QUEUE_BUSY;
> >  	}
> > -	spin_unlock_irqrestore(&vblk->vq_lock, flags);
> >  
> >  	if (last)
> >  		virtqueue_kick(vblk->vq);
> > +	spin_unlock_irqrestore(&vblk->vq_lock, flags);
> > +
> >  	return BLK_MQ_RQ_QUEUE_OK;
> >  }
> 
> Just stumbled on that too. You need one more, btw, for the sg failure
> case:
> 
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 588479d58f52..6a680d4de7f1 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -199,15 +199,16 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
>  
>  	spin_lock_irqsave(&vblk->vq_lock, flags);
>  	if (__virtblk_add_req(vblk->vq, vbr, vbr->sg, num) < 0) {
> +		virtqueue_kick(vblk->vq);
>  		spin_unlock_irqrestore(&vblk->vq_lock, flags);
>  		blk_mq_stop_hw_queue(hctx);
> -		virtqueue_kick(vblk->vq);
>  		return BLK_MQ_RQ_QUEUE_BUSY;
>  	}
> -	spin_unlock_irqrestore(&vblk->vq_lock, flags);
>  
>  	if (last)
>  		virtqueue_kick(vblk->vq);
> +
> +	spin_unlock_irqrestore(&vblk->vq_lock, flags);
>  	return BLK_MQ_RQ_QUEUE_OK;
>  }

Tested successfully here too. Dave, please give it a go, looks like this
should fix it up for you. Committed here:

http://git.kernel.dk/?p=linux-block.git;a=commit;h=f02b9ac35a47dff745c7637fbc095f01cc03646e

and will send it upstream today, so we can avoid having an -rc1 with a
broken virtio-blk.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-20  2:02                       ` Jens Axboe
@ 2013-11-20  2:53                         ` Dave Chinner
  2013-11-20  3:12                           ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2013-11-20  2:53 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Shaohua Li, linux-kernel

On Tue, Nov 19, 2013 at 07:02:30PM -0700, Jens Axboe wrote:
> On Tue, Nov 19 2013, Jens Axboe wrote:
> > > Looks like a race condition, below works for me, please try.
> > > 
> > > 
> > > Subject: virtio_blk: fix race condition
> > > 
> > > virtqueue_kick() isn't multi-thread safe.
> > > 
> > > Signed-off-by: Shaohua Li <shli@fusionio.com>
> > > 
> > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > index 588479d..f353959 100644
> > > --- a/drivers/block/virtio_blk.c
> > > +++ b/drivers/block/virtio_blk.c
> > > @@ -204,10 +204,11 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
> > >  		virtqueue_kick(vblk->vq);
> > >  		return BLK_MQ_RQ_QUEUE_BUSY;
> > >  	}
> > > -	spin_unlock_irqrestore(&vblk->vq_lock, flags);
> > >  
> > >  	if (last)
> > >  		virtqueue_kick(vblk->vq);
> > > +	spin_unlock_irqrestore(&vblk->vq_lock, flags);
> > > +
> > >  	return BLK_MQ_RQ_QUEUE_OK;
> > >  }
> > 
> > Just stumbled on that too. You need one more, btw, for the sg failure
> > case:
> > 
> > 
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index 588479d58f52..6a680d4de7f1 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -199,15 +199,16 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
> >  
> >  	spin_lock_irqsave(&vblk->vq_lock, flags);
> >  	if (__virtblk_add_req(vblk->vq, vbr, vbr->sg, num) < 0) {
> > +		virtqueue_kick(vblk->vq);
> >  		spin_unlock_irqrestore(&vblk->vq_lock, flags);
> >  		blk_mq_stop_hw_queue(hctx);
> > -		virtqueue_kick(vblk->vq);
> >  		return BLK_MQ_RQ_QUEUE_BUSY;
> >  	}
> > -	spin_unlock_irqrestore(&vblk->vq_lock, flags);
> >  
> >  	if (last)
> >  		virtqueue_kick(vblk->vq);
> > +
> > +	spin_unlock_irqrestore(&vblk->vq_lock, flags);
> >  	return BLK_MQ_RQ_QUEUE_OK;
> >  }
> 
> Tested successfully here too.

Ah, so it is exactly the problem I suggested it might be. ;)

> Dave, please give it a go, looks like this
> should fix it up for you. Committed here:
> 
> http://git.kernel.dk/?p=linux-block.git;a=commit;h=f02b9ac35a47dff745c7637fbc095f01cc03646e

Testing it now. might take a little while to confirm given it had
taken a few iterations of xfstests before I tripped over it...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-20  2:53                         ` Dave Chinner
@ 2013-11-20  3:12                           ` Jens Axboe
  0 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2013-11-20  3:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Shaohua Li, linux-kernel

On Wed, Nov 20 2013, Dave Chinner wrote:
> On Tue, Nov 19, 2013 at 07:02:30PM -0700, Jens Axboe wrote:
> > On Tue, Nov 19 2013, Jens Axboe wrote:
> > > > Looks like a race condition, below works for me, please try.
> > > > 
> > > > 
> > > > Subject: virtio_blk: fix race condition
> > > > 
> > > > virtqueue_kick() isn't multi-thread safe.
> > > > 
> > > > Signed-off-by: Shaohua Li <shli@fusionio.com>
> > > > 
> > > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > > index 588479d..f353959 100644
> > > > --- a/drivers/block/virtio_blk.c
> > > > +++ b/drivers/block/virtio_blk.c
> > > > @@ -204,10 +204,11 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
> > > >  		virtqueue_kick(vblk->vq);
> > > >  		return BLK_MQ_RQ_QUEUE_BUSY;
> > > >  	}
> > > > -	spin_unlock_irqrestore(&vblk->vq_lock, flags);
> > > >  
> > > >  	if (last)
> > > >  		virtqueue_kick(vblk->vq);
> > > > +	spin_unlock_irqrestore(&vblk->vq_lock, flags);
> > > > +
> > > >  	return BLK_MQ_RQ_QUEUE_OK;
> > > >  }
> > > 
> > > Just stumbled on that too. You need one more, btw, for the sg failure
> > > case:
> > > 
> > > 
> > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > index 588479d58f52..6a680d4de7f1 100644
> > > --- a/drivers/block/virtio_blk.c
> > > +++ b/drivers/block/virtio_blk.c
> > > @@ -199,15 +199,16 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
> > >  
> > >  	spin_lock_irqsave(&vblk->vq_lock, flags);
> > >  	if (__virtblk_add_req(vblk->vq, vbr, vbr->sg, num) < 0) {
> > > +		virtqueue_kick(vblk->vq);
> > >  		spin_unlock_irqrestore(&vblk->vq_lock, flags);
> > >  		blk_mq_stop_hw_queue(hctx);
> > > -		virtqueue_kick(vblk->vq);
> > >  		return BLK_MQ_RQ_QUEUE_BUSY;
> > >  	}
> > > -	spin_unlock_irqrestore(&vblk->vq_lock, flags);
> > >  
> > >  	if (last)
> > >  		virtqueue_kick(vblk->vq);
> > > +
> > > +	spin_unlock_irqrestore(&vblk->vq_lock, flags);
> > >  	return BLK_MQ_RQ_QUEUE_OK;
> > >  }
> > 
> > Tested successfully here too.
> 
> Ah, so it is exactly the problem I suggested it might be. ;)

It isn't actually, it's not a race between the queue conditions, the
stopping/starting etc or inside/outside lock state checking. It's a
"simple" race between the virtqueue operations. It is a race, however,
but I think that one was given :-)

> > Dave, please give it a go, looks like this
> > should fix it up for you. Committed here:
> > 
> > http://git.kernel.dk/?p=linux-block.git;a=commit;h=f02b9ac35a47dff745c7637fbc095f01cc03646e
> 
> Testing it now. might take a little while to confirm given it had
> taken a few iterations of xfstests before I tripped over it...

I feel pretty confident in it, fwiw. My test case was boiled down to
trigger it in seconds, and it survived a lengthy run afterwards.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-19 21:43       ` Jens Axboe
  2013-11-19 22:42         ` Jens Axboe
@ 2013-11-20  8:04         ` Christoph Hellwig
  2013-11-20 16:20           ` Jens Axboe
  1 sibling, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2013-11-20  8:04 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Dave Chinner, linux-kernel

On Tue, Nov 19, 2013 at 02:43:51PM -0700, Jens Axboe wrote:
> No, the nr_requests isn't actually relevant in the blk-mq context, the
> driver sets its own depth. For the above, it's 64 normal commands, and 2
> reserved. The reserved would be for a flush, for instance. If someone
> attempts to queue more than the allocated number of requests, it'll stop
> the blk-mq queue and kick things into gear on the virtio side. Then when
> requests complete, we start the queue again.

Btw, any reason we lowerered the queue depth from 128 to 64 in
virtio-blk?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-20  1:54                     ` Jens Axboe
  2013-11-20  2:02                       ` Jens Axboe
@ 2013-11-20  8:07                       ` Christoph Hellwig
  2013-11-20 16:21                         ` Jens Axboe
  1 sibling, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2013-11-20  8:07 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Shaohua Li, Dave Chinner, linux-kernel

> Just stumbled on that too. You need one more, btw, for the sg failure
> case:
> 
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 588479d58f52..6a680d4de7f1 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -199,15 +199,16 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
>  
>  	spin_lock_irqsave(&vblk->vq_lock, flags);
>  	if (__virtblk_add_req(vblk->vq, vbr, vbr->sg, num) < 0) {
> +		virtqueue_kick(vblk->vq);
>  		spin_unlock_irqrestore(&vblk->vq_lock, flags);
>  		blk_mq_stop_hw_queue(hctx);
> -		virtqueue_kick(vblk->vq);
>  		return BLK_MQ_RQ_QUEUE_BUSY;
>  	}
> -	spin_unlock_irqrestore(&vblk->vq_lock, flags);
>  
>  	if (last)
>  		virtqueue_kick(vblk->vq);
> +
> +	spin_unlock_irqrestore(&vblk->vq_lock, flags);

Note that virtqueue_kick can be split into a virtqueue_prepare_split
that needs the lock, and a virtqueue_notify that doesn't, but it might
not be nessecary to do that optimization right now.

Btw, how much do I need to bribe you to send mails where I don't have 
step over 500 lines of dead quotes to find the actual patch?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-20  8:04         ` Christoph Hellwig
@ 2013-11-20 16:20           ` Jens Axboe
  0 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2013-11-20 16:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Dave Chinner, linux-kernel

On 11/20/2013 01:04 AM, Christoph Hellwig wrote:
> On Tue, Nov 19, 2013 at 02:43:51PM -0700, Jens Axboe wrote:
>> No, the nr_requests isn't actually relevant in the blk-mq context, the
>> driver sets its own depth. For the above, it's 64 normal commands, and 2
>> reserved. The reserved would be for a flush, for instance. If someone
>> attempts to queue more than the allocated number of requests, it'll stop
>> the blk-mq queue and kick things into gear on the virtio side. Then when
>> requests complete, we start the queue again.
> 
> Btw, any reason we lowerered the queue depth from 128 to 64 in
> virtio-blk?

No, we can easily bump it to 128 again, I think it was a leftover from
where it had more queues and testing.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices
  2013-11-20  8:07                       ` Christoph Hellwig
@ 2013-11-20 16:21                         ` Jens Axboe
  0 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2013-11-20 16:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Shaohua Li, Dave Chinner, linux-kernel

On 11/20/2013 01:07 AM, Christoph Hellwig wrote:
>> Just stumbled on that too. You need one more, btw, for the sg failure
>> case:
>>
>>
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index 588479d58f52..6a680d4de7f1 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -199,15 +199,16 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
>>  
>>  	spin_lock_irqsave(&vblk->vq_lock, flags);
>>  	if (__virtblk_add_req(vblk->vq, vbr, vbr->sg, num) < 0) {
>> +		virtqueue_kick(vblk->vq);
>>  		spin_unlock_irqrestore(&vblk->vq_lock, flags);
>>  		blk_mq_stop_hw_queue(hctx);
>> -		virtqueue_kick(vblk->vq);
>>  		return BLK_MQ_RQ_QUEUE_BUSY;
>>  	}
>> -	spin_unlock_irqrestore(&vblk->vq_lock, flags);
>>  
>>  	if (last)
>>  		virtqueue_kick(vblk->vq);
>> +
>> +	spin_unlock_irqrestore(&vblk->vq_lock, flags);
> 
> Note that virtqueue_kick can be split into a virtqueue_prepare_split
> that needs the lock, and a virtqueue_notify that doesn't, but it might
> not be nessecary to do that optimization right now.

Right, that would be easily doable. Seems it's in the realm of micro
optimizations, but worth doing next time it's touched.

> Btw, how much do I need to bribe you to send mails where I don't have 
> step over 500 lines of dead quotes to find the actual patch?

Sorry, the next one was trimmed :-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2013-11-20 16:22 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-19  8:02 [Regression x2, 3.13-git] virtio block mq hang, iostat busted on virtio devices Dave Chinner
2013-11-19 10:36 ` Christoph Hellwig
2013-11-19 16:05   ` Jens Axboe
2013-11-19 16:09     ` Christoph Hellwig
2013-11-19 16:16       ` Jens Axboe
2013-11-19 21:30     ` Dave Chinner
2013-11-19 21:40       ` Jens Axboe
2013-11-19 20:15 ` Jens Axboe
2013-11-19 21:20   ` Jens Axboe
2013-11-19 21:34     ` Dave Chinner
2013-11-19 21:43       ` Jens Axboe
2013-11-19 22:42         ` Jens Axboe
2013-11-19 22:51           ` Jens Axboe
2013-11-19 23:23             ` Dave Chinner
2013-11-19 23:59               ` Jens Axboe
2013-11-20  0:08                 ` Jens Axboe
2013-11-20  1:44                   ` Shaohua Li
2013-11-20  1:54                     ` Jens Axboe
2013-11-20  2:02                       ` Jens Axboe
2013-11-20  2:53                         ` Dave Chinner
2013-11-20  3:12                           ` Jens Axboe
2013-11-20  8:07                       ` Christoph Hellwig
2013-11-20 16:21                         ` Jens Axboe
2013-11-20  8:04         ` Christoph Hellwig
2013-11-20 16:20           ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).