public inbox for linux-kernel@vger.kernel.org
From: Liu Yuan <namei.unix@gmail.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Avi Kivity <avi@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Khoa Huynh <khoa@us.ibm.com>,
	Badari Pulavarty <pbadari@us.ibm.com>
Subject: Re: [RFC PATCH]vhost-blk: In-kernel accelerator for virtio block device
Date: Fri, 29 Jul 2011 15:22:00 +0800	[thread overview]
Message-ID: <4E325F98.5090308@gmail.com> (raw)
In-Reply-To: <CAJSP0QUKz7LFuARF_n2LcBi_uuSzfmkjrWsHAWrXXkEaQJKkEA@mail.gmail.com>

Hi Stefan
On 07/28/2011 11:44 PM, Stefan Hajnoczi wrote:
> On Thu, Jul 28, 2011 at 3:29 PM, Liu Yuan<namei.unix@gmail.com>  wrote:
>
> Did you investigate userspace virtio-blk performance?  If so, what
> issues did you find?
>

Yes. In the performance table I presented, userspace virtio-blk lags 
behind vhost-blk (even though this prototype is a very primitive 
implementation) by about 15%.

Actually, the motivation to start vhost-blk is that, in our observation, 
KVM (virtio enabled) in RHEL 6 is worse than Xen (PV) in RHEL from a 
disk I/O perspective, especially for sequential reads/writes (around a 
20% gap).

We'll deploy a large number of KVM-based systems as the infrastructure 
for some services, and this gap is really unpleasant.

By design, IMHO, virtio performance is supposed to be comparable to 
the para-virtualization solution, if not better, because with KVM the 
guest and the backend driver can sit in the same address space via 
mmap(). This should reduce the overhead of page table modification, and 
thus speed up buffer management and transfer considerably compared with 
Xen PV.

I am not in a qualified position to talk about QEMU, but I think the 
surprising performance improvement from this very primitive vhost-blk 
simply shows that the internal structure of QEMU's I/O path is rather 
bloated. I say *surprising* because vhost basically just reduces the 
number of system calls, something that has been heavily tuned by chip 
manufacturers for years. So I guess the performance gain of vhost-blk 
can mainly be attributed to a *shorter and simpler* code path.

Anyway, IMHO, compared with the userspace approach, the in-kernel one 
allows more flexibility and better integration with the kernel I/O 
stack, since we don't need two I/O stacks for the guest OS.

> I have a hacked up world here that basically implements vhost-blk in userspace:
> http://repo.or.cz/w/qemu/stefanha.git/blob/refs/heads/virtio-blk-data-plane:/hw/virtio-blk.c
>
>   * A dedicated virtqueue thread sleeps on ioeventfd
>   * Guest memory is pre-mapped and accessed directly (not using QEMU's
> usual memory access functions)
>   * Linux AIO is used, the QEMU block layer is bypassed
>   * Completion interrupts are injected from the virtqueue thread using ioctl
>
> I will try to rebase onto qemu-kvm.git/master (this work is several
> months old).  Then we can compare to see how much of the benefit can
> be gotten in userspace.
>
I don't really get you about vhost-blk in user space, since the vhost 
infrastructure itself means an in-kernel accelerator implemented in the 
kernel. I guess what you mean is a rewrite of virtio-blk in user space, 
with a dedicated thread handling requests and a shorter code path 
similar to vhost-blk's.

>> [performance]
>>
>>         Currently, the fio benchmarking numbers are rather promising. Sequential read throughput is improved by as much as 16% and latency is reduced by up to 14%. For sequential write, the figures are 13.5% and 13% respectively.
>>
>> sequential read:
>> +-------------+-------------+---------------+---------------+
>> | iodepth     | 1           |   2           |   3           |
>> +-------------+-------------+---------------+----------------
>> | virtio-blk  | 4116(214)   |   7814(222)   |   8867(306)   |
>> +-------------+-------------+---------------+---------------+
>> | vhost-blk   | 4755(183)   |   8645(202)   |   10084(266)  |
>> +-------------+-------------+---------------+---------------+
>>
>> 4116(214) means 4116 IOPS; the completion latency is 214 us.
>>
>> sequential write:
>> +-------------+-------------+----------------+--------------+
>> | iodepth     |  1          |    2           |  3           |
>> +-------------+-------------+----------------+--------------+
>> | virtio-blk  | 3848(228)   |   6505(275)    |  9335(291)   |
>> +-------------+-------------+----------------+--------------+
>> | vhost-blk   | 4370(198)   |   7009(249)    |  9938(264)   |
>> +-------------+-------------+----------------+--------------+
>>
>> the fio command for sequential read:
>>
>> sudo fio -name iops -readonly -rw=read -runtime=120 -iodepth 1 -filename /dev/vda -ioengine libaio -direct=1 -bs=512
>>
>> and config file for sequential write is:
>>
>> dev@taobao:~$ cat rw.fio
>> -------------------------
>> [test]
>>
>> rw=rw
>> size=200M
>> directory=/home/dev/data
>> ioengine=libaio
>> iodepth=1
>> direct=1
>> bs=512
>> -------------------------
> 512 byte blocksize is very small, given that you can expect a file
> system to have 4 KB or so block sizes.  It would be interesting to
> measure a wider range of block sizes: 4 KB, 64 KB, and 128 KB for
> example.
>
> Stefan
Actually, I have tested 4KB; it shows the same improvement. What I care 
more about is iodepth, since batched AIO would benefit from it. But my 
laptop's SATA disk doesn't behave as well as it advertises: it says its 
NCQ queue depth is 32, and the kernel tells me it supports 31 requests 
in one go. When I increase the iodepth in the test up to 4, both the 
host's and the guest's IOPS drop drastically.
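
For what it's worth, a single fio job file can sweep the larger block 
sizes Stefan suggests together with deeper iodepths. The fragment below 
is only an illustrative sketch: it assumes the same /dev/vda raw-device 
setup as the sequential read command above, and the section names and 
parameter values are hypothetical.

```ini
; hypothetical sweep of block size and iodepth, one job per step
[global]
filename=/dev/vda
rw=read
runtime=120
ioengine=libaio
direct=1

[bs4k]
bs=4k
iodepth=1

[bs64k]
stonewall   ; serialize: wait for the previous job to finish
bs=64k
iodepth=2

[bs128k]
stonewall
bs=128k
iodepth=4
```

Without the stonewall lines fio would run all three jobs concurrently, 
which would mix the results rather than produce one number per step.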

Yuan

Thread overview: 42+ messages
2011-07-28 14:29 [RFC PATCH]vhost-blk: In-kernel accelerator for virtio block device Liu Yuan
2011-07-28 14:29 ` [RFC PATCH] vhost-blk: An in-kernel accelerator for virtio-blk Liu Yuan
2011-07-28 14:47   ` Christoph Hellwig
2011-07-29 11:19     ` Liu Yuan
2011-07-28 15:18   ` Stefan Hajnoczi
2011-07-28 15:22   ` Michael S. Tsirkin
2011-07-29 15:09     ` Liu Yuan
2011-08-01  6:25     ` Liu Yuan
2011-08-01  8:12       ` Michael S. Tsirkin
2011-08-01  8:55         ` Liu Yuan
2011-08-01 10:26           ` Michael S. Tsirkin
2011-08-11 19:59     ` Dongsu Park
2011-08-12  8:56       ` Alan Cox
2011-07-28 14:29 ` [RFC PATCH] vhost: Enable vhost-blk support Liu Yuan
2011-07-28 15:44 ` [RFC PATCH]vhost-blk: In-kernel accelerator for virtio block device Stefan Hajnoczi
2011-07-29  4:48   ` Stefan Hajnoczi
2011-07-29  7:59     ` Liu Yuan
2011-07-29 10:55       ` Christoph Hellwig
2011-07-29  7:22   ` Liu Yuan [this message]
2011-07-29  9:06     ` Stefan Hajnoczi
2011-07-29 12:01       ` Liu Yuan
2011-07-29 12:29         ` Stefan Hajnoczi
2011-07-29 12:50           ` Stefan Hajnoczi
2011-07-29 14:45             ` Liu Yuan
2011-07-29 14:50               ` Liu Yuan
2011-07-29 15:25         ` Sasha Levin
2011-08-01  8:17           ` Avi Kivity
2011-08-01  9:18             ` Liu Yuan
2011-08-01  9:37               ` Avi Kivity
2011-07-29 18:12     ` Badari Pulavarty
2011-08-01  5:46       ` Liu Yuan
2011-08-01  8:12         ` Christoph Hellwig
2011-08-04 21:58         ` Badari Pulavarty
2011-08-05  7:56           ` Liu Yuan
2011-08-05 11:04           ` Liu Yuan
2011-08-05 18:02             ` Badari Pulavarty
2011-08-08  1:35               ` Liu Yuan
2011-08-08  5:04                 ` Badari Pulavarty
2011-08-08  7:31                   ` Liu Yuan
2011-08-08 17:16                     ` Badari Pulavarty
2011-08-10  2:19                       ` Liu Yuan
2011-08-10 20:37                         ` Badari Pulavarty
