From: Paolo Bonzini
Date: Mon, 26 Oct 2015 18:03:16 +0100
Message-ID: <562E5CD4.8010902@redhat.com>
References: <562E48B9.6090600@redhat.com> <562E56B8.2030109@redhat.com>
Subject: Re: [Qemu-devel] 4k seq read splitting for virtio-blk - possible workarounds?
To: Andrey Korolyov
Cc: Sergey Fionov, Jens Axboe, Jeff Moyer, Peter Lieven, "qemu-devel@nongnu.org"

On 26/10/2015 17:43, Andrey Korolyov wrote:
> On Mon, Oct 26, 2015 at 7:37 PM, Paolo Bonzini wrote:
>> On 26/10/2015 17:31, Andrey Korolyov wrote:
>>>> the virtio block device is always splitting a single read
>>>> range request into 4k ones, bringing the overall performance of
>>>> sequential reads far below virtio-scsi.
>>>>
>>>> What does the blktrace look like in the guest?
>>>
>>> Yep, thanks for the suggestion. It now looks like a pure driver issue:
>>>
>>> Reads Queued:      11008,    44032KiB  Writes Queued:         0,     0KiB
>>> Read Dispatches:   11008,    44032KiB  Write Dispatches:      0,     0KiB
>>>
>>> vs
>>>
>>> Reads Queued:     185728,   742912KiB  Writes Queued:         0,     0KiB
>>> Read Dispatches:    2902,   742912KiB  Write Dispatches:      0,     0KiB
>>>
>>> Because the guest virtio-blk driver lacks *any* blk scheduler management,
>>> this is kinda logical. Requests for scsi backend are dispatched in
>>                                                       ^^^^^^^^^^
>>
>> queued you mean?
>>
>>> single block-sized chunks as well, but they are mostly merged by a
>>> scheduler before being passed to the device layer. Could there be any
>>> improvement to the situation other than writing an underlay between
>>> the virtio emulator backend and the real storage?
>>
>> This is probably the fall-out of converting virtio-blk to use
>> blk-mq, which was premature to say the least.  Jeff Moyer was working
>> on it, but I'm not sure if this has been merged.  Andrey, what kernel
>> are you using?
>
> Queued, sorry for the honest typo. The guest kernel is a 3.16.x from
> jessie, so regular blk-mq is there. Any point in trying something
> newer? And of course I didn't think about something older; I will try
> against 3.10 now.

Yes, it makes sense to try both something older and something newer.
I found this:

    commit e6c4438ba7cb615448492849970aaf0aaa1cc973
    Author: Jeff Moyer
    Date:   Fri May 8 10:51:30 2015 -0700

        blk-mq: fix plugging in blk_sq_make_request

Looking at the meat of the patch, we have:

        const int is_sync = rw_is_sync(bio->bi_rw);
        const int is_flush_fua = bio->bi_rw & (REQ_FLUSH | REQ_FUA);
-       unsigned int use_plug, request_count = 0;
+       struct blk_plug *plug;
+       unsigned int request_count = 0;
        struct blk_map_ctx data;
        struct request *rq;

-       /*
-        * If we have multiple hardware queues, just go directly to
-        * one of those for sync IO.
-        */
-       use_plug = !is_flush_fua && !is_sync;

For reads, rw_is_sync() returns true, hence use_plug was always false
and read bios never took the plug-merge path before being dispatched.
So 4.2 kernels could fix this issue.
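To make the effect easy to see outside the kernel, here is a toy
userspace model of the removed decision. It is only a sketch: the
REQ_* values and the simplified rw_is_sync() below are stand-ins that
mirror the shape of the hunk quoted above, not the actual blk-mq code.

    /*
     * Toy userspace model of the pre-4.2 blk_sq_make_request() decision.
     * NOT kernel code: flag values and rw_is_sync() are stand-ins.
     */
    #include <stdbool.h>
    #include <stdio.h>

    #define REQ_WRITE (1u << 0)   /* stand-in flag bits */
    #define REQ_FLUSH (1u << 1)
    #define REQ_FUA   (1u << 2)

    /* The kernel's rw_is_sync() treats every read as synchronous. */
    static bool rw_is_sync(unsigned int rw)
    {
            return !(rw & REQ_WRITE);   /* real code also honours REQ_SYNC */
    }

    int main(void)
    {
            unsigned int rw = 0;        /* one plain 4k sequential read bio */
            bool is_sync = rw_is_sync(rw);
            bool is_flush_fua = (rw & (REQ_FLUSH | REQ_FUA)) != 0;

            /* the condition removed by commit e6c4438b */
            bool use_plug = !is_flush_fua && !is_sync;

            printf("is_sync=%d use_plug=%d -> %s\n", is_sync, use_plug,
                   use_plug ? "bio may be plug-merged with its neighbours"
                            : "bio becomes its own request, nothing merged");
            return 0;
    }

It prints is_sync=1 use_plug=0, i.e. the old single-queue path skipped
the plug for every read, which matches the 1:1 Queued/Dispatched ratio
in your trace. With the plug-based logic from the commit above,
consecutive 4k reads should again be merged before dispatch.

Paolo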