* [Qemu-devel] 4k seq read splitting for virtio-blk - possible workarounds?
@ 2015-10-26 11:50 Andrey Korolyov
  2015-10-26 15:37 ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Andrey Korolyov @ 2015-10-26 11:50 UTC (permalink / raw)
  To: qemu-devel@nongnu.org; +Cc: Peter Lieven

[-- Attachment #1: Type: text/plain, Size: 773 bytes --]

Hi,

during a test against a generic storage backend with an NBD frontend we
found that the virtio block device always splits a single ranged read
request into 4k ones, bringing the overall performance of sequential
reads far below virtio-scsi. Random reads on small blocks do relatively
well, since the overhead there is small compared to the sequential
case, and writes are fine in all cases. Multiread improves the
situation slightly, but it would be nice to see ranged read requests
passed through to the backend completely, without intermediate
splitting.

Samples measured on an NBD backend during 128k sequential reads for
both virtio-blk and virtio-scsi are attached. Please let me know
whether it looks like I missed something or this behavior is plainly
wrong.

Thanks!

[-- Attachment #2: virtio-blk.txt --]
[-- Type: text/plain, Size: 1651 bytes --]

125550: *NBD_CMD_READ from 513298432 (1002536) len 4096, exp->buf, +(READ from fd 5 offset 513298432 len 4096), buf->net, +OK!
125551: *NBD_CMD_READ from 513302528 (1002544) len 4096, exp->buf, +(READ from fd 5 offset 513302528 len 4096), buf->net, +OK!
125552: *NBD_CMD_READ from 513306624 (1002552) len 4096, exp->buf, +(READ from fd 5 offset 513306624 len 4096), buf->net, +OK!
125553: *NBD_CMD_READ from 513310720 (1002560) len 4096, exp->buf, +(READ from fd 5 offset 513310720 len 4096), buf->net, +OK!
125554: *NBD_CMD_READ from 513314816 (1002568) len 4096, exp->buf, +(READ from fd 5 offset 513314816 len 4096), buf->net, +OK!
125555: *NBD_CMD_READ from 513318912 (1002576) len 4096, exp->buf, +(READ from fd 5 offset 513318912 len 4096), buf->net, +OK!
125556: *NBD_CMD_READ from 513323008 (1002584) len 4096, exp->buf, +(READ from fd 5 offset 513323008 len 4096), buf->net, +OK!
125557: *NBD_CMD_READ from 513327104 (1002592) len 4096, exp->buf, +(READ from fd 5 offset 513327104 len 4096), buf->net, +OK!
125558: *NBD_CMD_READ from 513331200 (1002600) len 4096, exp->buf, +(READ from fd 5 offset 513331200 len 4096), buf->net, +OK!
125559: *NBD_CMD_READ from 513335296 (1002608) len 4096, exp->buf, +(READ from fd 5 offset 513335296 len 4096), buf->net, +OK!
125560: *NBD_CMD_READ from 513339392 (1002616) len 4096, exp->buf, +(READ from fd 5 offset 513339392 len 4096), buf->net, +OK!
125561: *NBD_CMD_READ from 513343488 (1002624) len 4096, exp->buf, +(READ from fd 5 offset 513343488 len 4096), buf->net, +OK!
125562: *NBD_CMD_READ from 513347584 (1002632) len 4096, exp->buf, +(READ from fd 5 offset 513347584 len 4096), buf->net, +OK!

[-- Attachment #3: virtio-scsi.txt --]
[-- Type: text/plain, Size: 1834 bytes --]

8294: *NBD_CMD_READ from 1071120384 (2092032) len 131072, exp->buf, +(READ from fd 5 offset 1071120384 len 131072), buf->net, +OK!
8295: *NBD_CMD_READ from 1071251456 (2092288) len 131072, exp->buf, +(READ from fd 5 offset 1071251456 len 131072), buf->net, +OK!
8296: *NBD_CMD_READ from 1071382528 (2092544) len 131072, exp->buf, +(READ from fd 5 offset 1071382528 len 131072), buf->net, +OK!
8297: *NBD_CMD_READ from 1071513600 (2092800) len 131072, exp->buf, +(READ from fd 5 offset 1071513600 len 131072), buf->net, +OK!
8298: *NBD_CMD_READ from 1071644672 (2093056) len 131072, exp->buf, +(READ from fd 5 offset 1071644672 len 131072), buf->net, +OK!
8299: *NBD_CMD_READ from 1071775744 (2093312) len 131072, exp->buf, +(READ from fd 5 offset 1071775744 len 131072), buf->net, +OK!
8300: *NBD_CMD_READ from 1071906816 (2093568) len 131072, exp->buf, +(READ from fd 5 offset 1071906816 len 131072), buf->net, +OK!
8301: *NBD_CMD_READ from 1072037888 (2093824) len 131072, exp->buf, +(READ from fd 5 offset 1072037888 len 131072), buf->net, +OK!
8302: *NBD_CMD_READ from 1072168960 (2094080) len 131072, exp->buf, +(READ from fd 5 offset 1072168960 len 131072), buf->net, +OK!
8303: *NBD_CMD_READ from 1072300032 (2094336) len 131072, exp->buf, +(READ from fd 5 offset 1072300032 len 131072), buf->net, +OK!
8304: *NBD_CMD_READ from 1072431104 (2094592) len 131072, exp->buf, +(READ from fd 5 offset 1072431104 len 131072), buf->net, +OK!
8305: *NBD_CMD_READ from 1072562176 (2094848) len 131072, exp->buf, +(READ from fd 5 offset 1072562176 len 131072), buf->net, +OK!
8306: *NBD_CMD_READ from 1072693248 (2095104) len 131072, exp->buf, +(READ from fd 5 offset 1072693248 len 131072), buf->net, +OK!
8307: *NBD_CMD_READ from 1072824320 (2095360) len 131072, exp->buf, +(READ from fd 5 offset 1072824320 len 131072), buf->net, +OK!


* Re: [Qemu-devel] 4k seq read splitting for virtio-blk - possible workarounds?
  2015-10-26 11:50 [Qemu-devel] 4k seq read splitting for virtio-blk - possible workarounds? Andrey Korolyov
@ 2015-10-26 15:37 ` Paolo Bonzini
  2015-10-26 16:31   ` Andrey Korolyov
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2015-10-26 15:37 UTC (permalink / raw)
  To: Andrey Korolyov, qemu-devel@nongnu.org; +Cc: Peter Lieven



On 26/10/2015 12:50, Andrey Korolyov wrote:
> Hi,
> 
> during a test against a generic storage backend with an NBD frontend we
> found that the virtio block device always splits a single ranged read
> request into 4k ones, bringing the overall performance of sequential
> reads far below virtio-scsi. Random reads on small blocks do relatively
> well, since the overhead there is small compared to the sequential
> case, and writes are fine in all cases. Multiread improves the
> situation slightly, but it would be nice to see ranged read requests
> passed through to the backend completely, without intermediate
> splitting.
> 
> Samples measured on an NBD backend during 128k sequential reads for
> both virtio-blk and virtio-scsi are attached. Please let me know
> whether it looks like I missed something or this behavior is plainly
> wrong.

What does the blktrace look like in the guest?

Paolo


* Re: [Qemu-devel] 4k seq read splitting for virtio-blk - possible workarounds?
  2015-10-26 15:37 ` Paolo Bonzini
@ 2015-10-26 16:31   ` Andrey Korolyov
  2015-10-26 16:37     ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Andrey Korolyov @ 2015-10-26 16:31 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Sergey Fionov, Peter Lieven, qemu-devel@nongnu.org

On Mon, Oct 26, 2015 at 6:37 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 26/10/2015 12:50, Andrey Korolyov wrote:
>> Hi,
>>
>> during a test against a generic storage backend with an NBD frontend we
>> found that the virtio block device always splits a single ranged read
>> request into 4k ones, bringing the overall performance of sequential
>> reads far below virtio-scsi. Random reads on small blocks do relatively
>> well, since the overhead there is small compared to the sequential
>> case, and writes are fine in all cases. Multiread improves the
>> situation slightly, but it would be nice to see ranged read requests
>> passed through to the backend completely, without intermediate
>> splitting.
>>
>> Samples measured on an NBD backend during 128k sequential reads for
>> both virtio-blk and virtio-scsi are attached. Please let me know
>> whether it looks like I missed something or this behavior is plainly
>> wrong.
>
> What does the blktrace look like in the guest?
>

Yep, thanks for the suggestion. It now looks like a pure driver issue:

 Reads Queued:       11008,    44032KiB  Writes Queued:           0,        0KiB
 Read Dispatches:    11008,    44032KiB  Write Dispatches:        0,        0KiB

vs

 Reads Queued:      185728,   742912KiB  Writes Queued:           0,        0KiB
 Read Dispatches:     2902,   742912KiB  Write Dispatches:        0,        0KiB

Because the guest virtio-blk driver lacks *any* blk scheduler management,
this is kinda logical. Requests for scsi backend are dispatched in
single block-sized chunks as well, but they are mostly merged by a
scheduler before being passed to the device layer. (Above, virtio-blk
dispatched 44032 KiB in 11008 requests, exactly 4 KiB each, while
virtio-scsi dispatched 742912 KiB in only 2902 requests, roughly 256 KiB
each.) Could there be any improvement here, short of writing an underlay
between the virtio emulator backend and the real storage?


* Re: [Qemu-devel] 4k seq read splitting for virtio-blk - possible workarounds?
  2015-10-26 16:31   ` Andrey Korolyov
@ 2015-10-26 16:37     ` Paolo Bonzini
  2015-10-26 16:43       ` Andrey Korolyov
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2015-10-26 16:37 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Sergey Fionov, Jens Axboe, Jeff Moyer, Peter Lieven,
	qemu-devel@nongnu.org

On 26/10/2015 17:31, Andrey Korolyov wrote:
>> the virtio block device always splits a single ranged read
>> request into 4k ones, bringing the overall performance of
>> sequential reads far below virtio-scsi.
>> 
>> What does the blktrace look like in the guest?
> 
> Yep, thanks for the suggestion. It now looks like a pure driver issue:
> 
>  Reads Queued:       11008,    44032KiB  Writes Queued:           0,        0KiB
>  Read Dispatches:    11008,    44032KiB  Write Dispatches:        0,        0KiB
> 
> vs
> 
>  Reads Queued:      185728,   742912KiB  Writes Queued:           0,        0KiB
>  Read Dispatches:     2902,   742912KiB  Write Dispatches:        0,        0KiB
> 
> Because the guest virtio-blk driver lacks *any* blk scheduler management,
> this is kinda logical. Requests for scsi backend are dispatched in
                                                       ^^^^^^^^^^

queued you mean?

> single block-sized chunks as well, but they are mostly merged by a
> scheduler before being passed to the device layer. Could there be any
> improvement here, short of writing an underlay between the virtio
> emulator backend and the real storage?

This is probably the fall-out of converting the virtio-blk to use
blk-mq, which was premature to say the least.  Jeff Moyer was working on
it, but I'm not sure if this has been merged.  Andrey, what kernel are
you using?

Paolo


* Re: [Qemu-devel] 4k seq read splitting for virtio-blk - possible workarounds?
  2015-10-26 16:37     ` Paolo Bonzini
@ 2015-10-26 16:43       ` Andrey Korolyov
  2015-10-26 17:03         ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Andrey Korolyov @ 2015-10-26 16:43 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sergey Fionov, Jens Axboe, Jeff Moyer, Peter Lieven,
	qemu-devel@nongnu.org

On Mon, Oct 26, 2015 at 7:37 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 26/10/2015 17:31, Andrey Korolyov wrote:
>>> the virtio block device always splits a single ranged read
>>> request into 4k ones, bringing the overall performance of
>>> sequential reads far below virtio-scsi.
>>>
>>> What does the blktrace look like in the guest?
>>
>> Yep, thanks for the suggestion. It now looks like a pure driver issue:
>>
>>  Reads Queued:       11008,    44032KiB  Writes Queued:           0,        0KiB
>>  Read Dispatches:    11008,    44032KiB  Write Dispatches:        0,        0KiB
>>
>> vs
>>
>>  Reads Queued:      185728,   742912KiB  Writes Queued:           0,        0KiB
>>  Read Dispatches:     2902,   742912KiB  Write Dispatches:        0,        0KiB
>>
>> Because the guest virtio-blk driver lacks *any* blk scheduler management,
>> this is kinda logical. Requests for scsi backend are dispatched in
>                                                        ^^^^^^^^^^
>
> queued you mean?
>
>> single block-sized chunks as well, but they are mostly merged by a
>> scheduler before being passed to the device layer. Could there be any
>> improvement here, short of writing an underlay between the virtio
>> emulator backend and the real storage?
>
> This is probably the fall-out of converting the virtio-blk to use
> blk-mq, which was premature to say the least.  Jeff Moyer was working on
> it, but I'm not sure if this has been merged.  Andrey, what kernel are
> you using?
>
> Paolo

Queued, sorry for the honest typo. The guest kernel is a 3.16.x from
jessie, so regular blk-mq is there. Is there any point in trying
something newer? And of course I didn't think of trying something
older; I will try against 3.10 now.


* Re: [Qemu-devel] 4k seq read splitting for virtio-blk - possible workarounds?
  2015-10-26 16:43       ` Andrey Korolyov
@ 2015-10-26 17:03         ` Paolo Bonzini
  2015-10-26 17:18           ` Andrey Korolyov
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2015-10-26 17:03 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Sergey Fionov, Jens Axboe, Jeff Moyer, Peter Lieven,
	qemu-devel@nongnu.org



On 26/10/2015 17:43, Andrey Korolyov wrote:
> On Mon, Oct 26, 2015 at 7:37 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 26/10/2015 17:31, Andrey Korolyov wrote:
>>>> the virtio block device always splits a single ranged read
>>>> request into 4k ones, bringing the overall performance of
>>>> sequential reads far below virtio-scsi.
>>>>
>>>> What does the blktrace look like in the guest?
>>>
>>> Yep, thanks for the suggestion. It now looks like a pure driver issue:
>>>
>>>  Reads Queued:       11008,    44032KiB  Writes Queued:           0,        0KiB
>>>  Read Dispatches:    11008,    44032KiB  Write Dispatches:        0,        0KiB
>>>
>>> vs
>>>
>>>  Reads Queued:      185728,   742912KiB  Writes Queued:           0,        0KiB
>>>  Read Dispatches:     2902,   742912KiB  Write Dispatches:        0,        0KiB
>>>
>>> Because the guest virtio-blk driver lacks *any* blk scheduler management,
>>> this is kinda logical. Requests for scsi backend are dispatched in
>>                                                        ^^^^^^^^^^
>>
>> queued you mean?
>>
>>> single block-sized chunks as well, but they are mostly merged by a
>>> scheduler before being passed to the device layer. Could there be any
>>> improvement here, short of writing an underlay between the virtio
>>> emulator backend and the real storage?
>>
>> This is probably the fall-out of converting the virtio-blk to use
>> blk-mq, which was premature to say the least.  Jeff Moyer was working on
>> it, but I'm not sure if this has been merged.  Andrey, what kernel are
>> you using?
> 
> Queued, sorry for the honest typo. The guest kernel is a 3.16.x from
> jessie, so regular blk-mq is there. Is there any point in trying
> something newer? And of course I didn't think of trying something
> older; I will try against 3.10 now.

Yes, it makes sense to try both something older and something newer.  I
found this:

    commit e6c4438ba7cb615448492849970aaf0aaa1cc973
    Author: Jeff Moyer <jmoyer@redhat.com>
    Date:   Fri May 8 10:51:30 2015 -0700

    blk-mq: fix plugging in blk_sq_make_request

Looking at the meat of the patch, we have:

 	const int is_sync = rw_is_sync(bio->bi_rw);
 	const int is_flush_fua = bio->bi_rw & (REQ_FLUSH | REQ_FUA);
-	unsigned int use_plug, request_count = 0;
+	struct blk_plug *plug;
+	unsigned int request_count = 0;
 	struct blk_map_ctx data;
 	struct request *rq;

-	/*
-	 * If we have multiple hardware queues, just go directly to
-	 * one of those for sync IO.
-	 */
-	use_plug = !is_flush_fua && !is_sync;

For reads rw_is_sync() returns true, hence use_plug is always false and
reads bypass the plug where merging happens.  So 4.2 kernels could fix
this issue.
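
To make that concrete, here is a tiny standalone model of the old
decision (an illustration only: use_plug() below just mirrors the
removed expression from the hunk above, and the sync/FUA flags are
hand-picked stand-ins for rw_is_sync() and the bio flags):

/* Standalone model of the pre-4.2 blk_sq_make_request() plug decision.
 * Illustration only, not kernel code. */
#include <stdbool.h>
#include <stdio.h>

static bool use_plug(bool is_sync, bool is_flush_fua)
{
	return !is_flush_fua && !is_sync;
}

int main(void)
{
	/* Reads always count as sync, so they skip the plug list and are
	 * never merged: every 4k bio becomes its own request. */
	printf("read:  use_plug = %d\n", use_plug(true, false));

	/* Buffered writes are typically not sync, so they do get plugged
	 * and merged, which matches "writes are fine in all cases". */
	printf("write: use_plug = %d\n", use_plug(false, false));
	return 0;
}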

Paolo


* Re: [Qemu-devel] 4k seq read splitting for virtio-blk - possible workarounds?
  2015-10-26 17:03         ` Paolo Bonzini
@ 2015-10-26 17:18           ` Andrey Korolyov
  2015-10-26 17:32             ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Andrey Korolyov @ 2015-10-26 17:18 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sergey Fionov, Jens Axboe, Jeff Moyer, Peter Lieven,
	qemu-devel@nongnu.org

On Mon, Oct 26, 2015 at 8:03 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 26/10/2015 17:43, Andrey Korolyov wrote:
>> On Mon, Oct 26, 2015 at 7:37 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> On 26/10/2015 17:31, Andrey Korolyov wrote:
>>>>> the virtio block device always splits a single ranged read
>>>>> request into 4k ones, bringing the overall performance of
>>>>> sequential reads far below virtio-scsi.
>>>>>
>>>>> What does the blktrace look like in the guest?
>>>>
>>>> Yep, thanks for the suggestion. It now looks like a pure driver issue:
>>>>
>>>>  Reads Queued:       11008,    44032KiB  Writes Queued:           0,        0KiB
>>>>  Read Dispatches:    11008,    44032KiB  Write Dispatches:        0,        0KiB
>>>>
>>>> vs
>>>>
>>>>  Reads Queued:      185728,   742912KiB  Writes Queued:           0,        0KiB
>>>>  Read Dispatches:     2902,   742912KiB  Write Dispatches:        0,        0KiB
>>>>
>>>> Because the guest virtio-blk driver lacks *any* blk scheduler management,
>>>> this is kinda logical. Requests for scsi backend are dispatched in
>>>                                                        ^^^^^^^^^^
>>>
>>> queued you mean?
>>>
>>>> single block-sized chunks as well, but they are mostly merged by a
>>>> scheduler before being passed to the device layer. Could there be any
>>>> improvement here, short of writing an underlay between the virtio
>>>> emulator backend and the real storage?
>>>
>>> This is probably the fall-out of converting the virtio-blk to use
>>> blk-mq, which was premature to say the least.  Jeff Moyer was working on
>>> it, but I'm not sure if this has been merged.  Andrey, what kernel are
>>> you using?
>>
>> Queued, sorry for the honest typo. The guest kernel is a 3.16.x from
>> jessie, so regular blk-mq is there. Is there any point in trying
>> something newer? And of course I didn't think of trying something
>> older; I will try against 3.10 now.
>
> Yes, it makes sense to try both something older and something newer.  I
> found this:
>
>     commit e6c4438ba7cb615448492849970aaf0aaa1cc973
>     Author: Jeff Moyer <jmoyer@redhat.com>
>     Date:   Fri May 8 10:51:30 2015 -0700
>
>     blk-mq: fix plugging in blk_sq_make_request
>
> Looking at the meat of the patch, we have:
>
>         const int is_sync = rw_is_sync(bio->bi_rw);
>         const int is_flush_fua = bio->bi_rw & (REQ_FLUSH | REQ_FUA);
> -       unsigned int use_plug, request_count = 0;
> +       struct blk_plug *plug;
> +       unsigned int request_count = 0;
>         struct blk_map_ctx data;
>         struct request *rq;
>
> -       /*
> -        * If we have multiple hardware queues, just go directly to
> -        * one of those for sync IO.
> -        */
> -       use_plug = !is_flush_fua && !is_sync;
>
> For reads rw_is_sync() returns true, hence use_plug is always false and
> reads bypass the plug where merging happens.  So 4.2 kernels could fix
> this issue.
>
> Paolo

Yes, both cases are positive; thanks for the very detailed explanation
and the tips. Does this also mean that most current distros, which ship
the 'broken' >=3.13 <4.2 driver, bring sequential read performance down
to its knees for virtio-blk, which is almost always the default
selection, especially on rotating media or media with high request
latency such as hybrid disks?


* Re: [Qemu-devel] 4k seq read splitting for virtio-blk - possible workarounds?
  2015-10-26 17:18           ` Andrey Korolyov
@ 2015-10-26 17:32             ` Paolo Bonzini
  2015-10-26 18:28               ` Andrey Korolyov
  2015-10-27  2:04               ` Fam Zheng
  0 siblings, 2 replies; 12+ messages in thread
From: Paolo Bonzini @ 2015-10-26 17:32 UTC (permalink / raw)
  To: Andrey Korolyov
  Cc: Sergey Fionov, Jens Axboe, Jeff Moyer, Peter Lieven,
	qemu-devel@nongnu.org



On 26/10/2015 18:18, Andrey Korolyov wrote:
> Yes, both cases are positive; thanks for the very detailed explanation
> and the tips. Does this also mean that most current distros, which ship
> the 'broken' >=3.13 <4.2 driver, bring sequential read performance down
> to its knees for virtio-blk, which is almost always the default
> selection, especially on rotating media or media with high request
> latency such as hybrid disks?

Yes, this is why I said the conversion was premature.  On one hand I
totally agree that virtio-blk is a great guinea pig for blk-mq
conversion, on the other hand people are using the thing in production
and the effects weren't quite understood.

It's a common misconception that virt doesn't benefit from the elevator,
but actually you get (well... used to get...) much better performance
from the deadline scheduler than the noop scheduler.  Merging is the
main reason, because it lowers the amount of work that you have to do in
the host.

Even if you don't get better performance, merging will get better CPU
utilization because the longer s/g lists take time to process in the
host, and the effect's much larger than a few extra milliwatts in a
bare-metal controller.

Having a "real" multiqueue model in the host (real = one I/O thread and
one AIO context per guest queue, with each I/O thread able to service
multiple disks; rather than a "fake" multiqueue where you still have one
I/O thread and AIO context per guest disk, so all the queues really
funnel into one in the host) should fix this, but it's at least a few
months away in QEMU... probably something like QEMU 2.8.  My plan is for
2.6 to have fine-grained critical sections (patches written, will repost
during 2.5 hard freeze), 2.7 (unlikely 2.6) to have fine-grained locks,
and 2.8 or 2.9 to have multiqueue.

Paolo


* Re: [Qemu-devel] 4k seq read splitting for virtio-blk - possible workarounds?
  2015-10-26 17:32             ` Paolo Bonzini
@ 2015-10-26 18:28               ` Andrey Korolyov
  2015-10-27  2:04               ` Fam Zheng
  1 sibling, 0 replies; 12+ messages in thread
From: Andrey Korolyov @ 2015-10-26 18:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sergey Fionov, Jens Axboe, Jeff Moyer, Peter Lieven,
	qemu-devel@nongnu.org

On Mon, Oct 26, 2015 at 8:32 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 26/10/2015 18:18, Andrey Korolyov wrote:
>> Yes, both cases are positive, thanks for very detailed explanation and
>> for tips. Does this also mean that most current distros which are
>> using 'broken' >=3.13 <4.2 driver would bring sequential read
>> performance, especially on rotating media, or media with high request
>> latency like hybrid disk, down to knees for virtio, which almost
>> always is a default selection?
>
> Yes, this is why I said the conversion was premature.  On one hand I
> totally agree that virtio-blk is a great guinea pig for blk-mq
> conversion, on the other hand people are using the thing in production
> and the effects weren't quite understood.
>
> It's a common misconception that virt doesn't benefit from the elevator,
> but actually you get (well... used to get...) much better performance
> from the deadline scheduler than the noop scheduler.  Merging is the
> main reason, because it lowers the amount of work that you have to do in
> the host.
>
> Even if you don't get better performance, merging will get better CPU
> utilization because the longer s/g lists take time to process in the
> host, and the effect's much larger than a few extra milliwatts in a
> bare-metal controller.
>
> Having a "real" multiqueue model in the host (real = one I/O thread and
> one AIO context per guest queue, with each I/O thread able to service
> multiple disks; rather than a "fake" multiqueue where you still have one
> I/O thread and AIO context per guest disk, so all the queues really
> funnel into one in the host) should fix this, but it's at least a few
> months away in QEMU... probably something like QEMU 2.8.  My plan is for
> 2.6 to have fine-grained critical sections (patches written, will repost
> during 2.5 hard freeze), 2.7 (unlikely 2.6) to have fine-grained locks,
> and 2.8 or 2.9 to have multiqueue.
>


BTW it seems that I made a slightly stronger claim than is actually
true: at least 3.18 works fine in all cases, so the issue was fixed
somewhat earlier than 4.2. If so, the fix could possibly land in the
remaining distro queues, even for distros that are not following the
stable branches. I could bisect this in my spare time, because I don't
see any obvious candidate in the diff that would fix the misbehavior at
a glance.


* Re: [Qemu-devel] 4k seq read splitting for virtio-blk - possible workarounds?
  2015-10-26 17:32             ` Paolo Bonzini
  2015-10-26 18:28               ` Andrey Korolyov
@ 2015-10-27  2:04               ` Fam Zheng
  2015-10-27  9:30                 ` Paolo Bonzini
  1 sibling, 1 reply; 12+ messages in thread
From: Fam Zheng @ 2015-10-27  2:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jens Axboe, Andrey Korolyov, Peter Lieven, qemu-devel@nongnu.org,
	Jeff Moyer, Sergey Fionov

On Mon, 10/26 18:32, Paolo Bonzini wrote:
> Having a "real" multiqueue model in the host (real = one I/O thread and
> one AIO context per guest queue, with each I/O thread able to service
> multiple disks; rather than a "fake" multiqueue where you still have one
> I/O thread and AIO context per guest disk, so all the queues really
> funnel into one in the host) should fix this, but it's at least a few
> months away in QEMU... probably something like QEMU 2.8.  My plan is for
> 2.6 to have fine-grained critical sections (patches written, will repost
> during 2.5 hard freeze), 2.7 (unlikely 2.6) to have fine-grained locks,
> and 2.8 or 2.9 to have multiqueue.

Paolo,

You're talking about virtio-scsi, right? What about virtio-blk? Do you think we
should resume the "fake" virtio-blk multiqueue work on the QEMU side?

Thanks,
Fam


* Re: [Qemu-devel] 4k seq read splitting for virtio-blk - possible workarounds?
  2015-10-27  2:04               ` Fam Zheng
@ 2015-10-27  9:30                 ` Paolo Bonzini
  2015-10-30 20:04                   ` Andrey Korolyov
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2015-10-27  9:30 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Jens Axboe, Andrey Korolyov, Peter Lieven, qemu-devel@nongnu.org,
	Jeff Moyer, Sergey Fionov



On 27/10/2015 03:04, Fam Zheng wrote:
> > My plan is for
> > 2.6 to have fine-grained critical sections (patches written, will repost
> > during 2.5 hard freeze), 2.7 (unlikely 2.6) to have fine-grained locks,
> > and 2.8 or 2.9 to have multiqueue.
> 
> You're talking about virtio-scsi, right? What about virtio-blk? Do you think we
> should resume the "fake" virtio-blk multiqueue work on the QEMU side?

Even both.  If you resume the "fake" virtio-blk multiqueue, converting
to real multiqueue at the same time as virtio-scsi should be trivial.
The difficult part of course is not the device models, it's the block
layer...

Paolo


* Re: [Qemu-devel] 4k seq read splitting for virtio-blk - possible workarounds?
  2015-10-27  9:30                 ` Paolo Bonzini
@ 2015-10-30 20:04                   ` Andrey Korolyov
  0 siblings, 0 replies; 12+ messages in thread
From: Andrey Korolyov @ 2015-10-30 20:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jens Axboe, Fam Zheng, Peter Lieven, qemu-devel@nongnu.org,
	Jeff Moyer, Sergey Fionov, Christoph Hellwig

> BTW it seems that I made a slightly stronger claim than is actually
> true: at least 3.18 works fine in all cases, so the issue was fixed
> somewhat earlier than 4.2.


Ok, it turns out that the fix was brought in by

commit 447f05bb488bff4282088259b04f47f0f9f76760
Author: Akinobu Mita <akinobu.mita@gmail.com>
Date:   Thu Oct 9 15:26:58 2014 -0700

    block_dev: implement readpages() to optimize sequential read

This patch looks like a perfect candidate for the affected stable
queues, though it could really be called 'a collateral fix'. Any
objections to pinging the stable/distro maintainers, or suggestions for
better drop-in replacements, are very welcome.
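
For reference, the core of that change, as far as I can reconstruct it
from memory (so treat the exact signatures as approximate), is just
wiring the block device mapping up to mpage_readpages(), which batches
readahead pages into large bios instead of issuing one page-sized read
at a time:

/* Rough sketch of the fs/block_dev.c part of that commit
 * (reconstructed from memory, not a verbatim quote). */
static int blkdev_readpages(struct file *file, struct address_space *mapping,
		struct list_head *pages, unsigned nr_pages)
{
	return mpage_readpages(mapping, pages, nr_pages, blkdev_get_block);
}

static const struct address_space_operations def_blk_aops = {
	.readpage	= blkdev_readpage,
	.readpages	= blkdev_readpages,	/* the new hook */
	/* ... */
};

With readpages() in place, sequential buffered reads of the block
device get submitted as multi-page bios, which is presumably why 3.18
guests stop showing the 4k splitting even without the blk-mq plugging
fix.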

Thanks!

