From: Badari Pulavarty <pbadari@us.ibm.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: kvm@vger.kernel.org
Subject: Re: [RFC] vhost-blk implementation
Date: Wed, 24 Mar 2010 13:22:37 -0700
Message-ID: <4BAA748D.40509@us.ibm.com>
In-Reply-To: <20100324200402.GA22272@infradead.org>
Christoph Hellwig wrote:
>> Inspired by vhost-net implementation, I did initial prototype
>> of vhost-blk to see if it provides any benefits over QEMU virtio-blk.
>> I haven't handled all the error cases, fixed naming conventions etc.,
>> but the implementation is stable to play with. I tried not to deviate
>> from vhost-net implementation where possible.
>>
>
> Can you also send the qemu side of it?
>
>
>> with vhost-blk:
>> ----------------
>>
>> # time dd if=/dev/vda of=/dev/null bs=128k iflag=direct
>> 640000+0 records in
>> 640000+0 records out
>> 83886080000 bytes (84 GB) copied, 126.135 seconds, 665 MB/s
>>
>> real 2m6.137s
>> user 0m0.281s
>> sys 0m14.725s
>>
>> without vhost-blk: (virtio)
>> ---------------------------
>>
>> # time dd if=/dev/vda of=/dev/null bs=128k iflag=direct
>> 640000+0 records in
>> 640000+0 records out
>> 83886080000 bytes (84 GB) copied, 275.466 seconds, 305 MB/s
>>
>> real 4m35.468s
>> user 0m0.373s
>> sys 0m48.074s
>>
>
> Which caching mode is this? I assume data=writeback, because otherwise
> you'd be doing synchronous I/O directly from the handler.
>
Yes. This is with the default (writeback) cache model. As mentioned earlier,
readahead is helping here, and in most cases the data would already be in the
pagecache.
>
>> +static int do_handle_io(struct file *file, uint32_t type, uint64_t sector,
>> + struct iovec *iov, int in)
>> +{
>> + loff_t pos = sector << 8;
>> + int ret = 0;
>> +
>> + if (type & VIRTIO_BLK_T_FLUSH) {
>> + ret = vfs_fsync(file, file->f_path.dentry, 1);
>> + } else if (type & VIRTIO_BLK_T_OUT) {
>> + ret = vfs_writev(file, iov, in, &pos);
>> + } else {
>> + ret = vfs_readv(file, iov, in, &pos);
>> + }
>> + return ret;
>>
>
> I have to admit I don't understand the vhost architecture at all, but
> where do the actual data pointers used by the iovecs reside?
> vfs_readv/writev expect both the iovec itself and the buffers
> pointed to by it to reside in userspace, so just using kernel buffers
> here will break badly on architectures with different user/kernel
> mappings. A lot of this is fixable using simple set_fs & co tricks,
> but for direct I/O which uses get_user_pages even that will fail badly.
>
The iovecs and buffers are user-space pointers (from the host kernel's point
of view). They are guest addresses, so I don't need to do any set_fs tricks.
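For reference, here is a small standalone sketch of the offset arithmetic and
request-type dispatch that do_handle_io performs. It is not the patch code:
the names sector_to_offset, classify and enum blk_op are made up for
illustration, and the BLK_T_* values are copied from the virtio-blk request
types (IN = 0, OUT = 1, FLUSH = 4). virtio-blk counts sectors in 512-byte
units, so a shift of 9, rather than the 8 in the quoted hunk, appears to be
what is intended.

```c
#include <assert.h>
#include <stdint.h>

/* Request type values as defined for virtio-blk. */
#define BLK_T_IN     0u
#define BLK_T_OUT    1u
#define BLK_T_FLUSH  4u

/* virtio-blk sectors are 512 bytes, i.e. 1 << 9. */
#define SECTOR_SHIFT 9

/* Hypothetical classification used only in this sketch. */
enum blk_op { BLK_OP_READ, BLK_OP_WRITE, BLK_OP_FLUSH };

/* Convert a sector number into a byte offset into the backing file. */
uint64_t sector_to_offset(uint64_t sector)
{
    /* Guard against losing high bits in the shift. */
    assert(sector <= (UINT64_MAX >> SECTOR_SHIFT));
    return sector << SECTOR_SHIFT;
}

/* Same test order as the quoted function: flush, then write, else read. */
enum blk_op classify(uint32_t type)
{
    if (type & BLK_T_FLUSH)
        return BLK_OP_FLUSH;
    if (type & BLK_T_OUT)
        return BLK_OP_WRITE;
    return BLK_OP_READ;
}
```

With bs=128k as in the dd runs above, consecutive requests would start 256
sectors apart, which this helper maps to offsets 131072 bytes apart.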
> Also it seems like you're doing all the I/O synchronous here? For
> data=writeback operations that could explain the read speedup
> as you're avoiding context switches, but for actual write I/O
> which has to get data to disk (either directly from vfs_writev or
> later through vfs_fsync) this seems like a really bad idea stealing
> a lot of guest time that should happen in the background.
>
Yes. QEMU virtio-blk batches up all the writes and hands the work off to
another thread. When the writes are complete, it sends a status completion.
Since I am doing everything synchronously (even though it is only a write to
the pagecache), one request at a time, that explains the slowdown. We need to
find a way to
1) batch IO writes together
2) hand off to another thread to do the IO, so that the vhost thread can
handle the next set of requests
3) update the status on completion
What should I do here? I could create a bunch of kernel threads to do the IO
for me, or somehow fit in and reuse the AIO io_submit() mechanism. What's the
best way here? I hate to duplicate all the code the VFS is doing.
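To make the three steps concrete, here is a rough userspace sketch using
pthreads. It is only an illustration of the batch / hand-off / complete
pattern, not a proposal for the kernel side: struct io_queue, queue_submit,
io_worker, run_demo, QUEUE_DEPTH and BATCH_MAX are all hypothetical names,
and the "IO" is a placeholder that just marks the request done.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_DEPTH 8   /* hypothetical ring size */
#define BATCH_MAX   4   /* hypothetical cap on requests per batch */

enum req_status { REQ_PENDING, REQ_DONE };

struct blk_req {
    unsigned long long sector;
    enum req_status status;
};

struct io_queue {
    struct blk_req *ring[QUEUE_DEPTH];
    size_t head, count;
    int closed;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

static void queue_init(struct io_queue *q)
{
    q->head = q->count = 0;
    q->closed = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Producer side (the vhost thread's role): queue the request and return. */
static void queue_submit(struct io_queue *q, struct blk_req *req)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_DEPTH)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->ring[(q->head + q->count) % QUEUE_DEPTH] = req;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Tell the worker that no more requests will arrive. */
static void queue_close(struct io_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Worker side: pull up to BATCH_MAX requests, do the IO, mark them done. */
static void *io_worker(void *arg)
{
    struct io_queue *q = arg;
    struct blk_req *batch[BATCH_MAX];

    for (;;) {
        size_t n = 0;

        pthread_mutex_lock(&q->lock);
        while (q->count == 0 && !q->closed)
            pthread_cond_wait(&q->not_empty, &q->lock);
        if (q->count == 0) {            /* closed and fully drained */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        while (n < BATCH_MAX && q->count > 0) {   /* step 1: build a batch */
            batch[n++] = q->ring[q->head];
            q->head = (q->head + 1) % QUEUE_DEPTH;
            q->count--;
        }
        pthread_cond_broadcast(&q->not_full);
        pthread_mutex_unlock(&q->lock);

        /* Steps 2 and 3: placeholder for the real IO and status update. */
        for (size_t i = 0; i < n; i++)
            batch[i]->status = REQ_DONE;
    }
}

/* Push nreq requests through the queue; return how many completed. */
size_t run_demo(size_t nreq)
{
    struct blk_req reqs[64];
    struct io_queue q;
    pthread_t tid;
    size_t done = 0;

    assert(nreq <= 64);
    queue_init(&q);
    pthread_create(&tid, NULL, io_worker, &q);
    for (size_t i = 0; i < nreq; i++) {
        reqs[i].sector = i * 256;       /* 128k requests, 512-byte sectors */
        reqs[i].status = REQ_PENDING;
        queue_submit(&q, &reqs[i]);
    }
    queue_close(&q);
    pthread_join(tid, NULL);            /* worker's writes are visible after join */
    for (size_t i = 0; i < nreq; i++)
        done += (reqs[i].status == REQ_DONE);
    return done;
}
```

The submitter only blocks when the ring is full, so it can keep pulling
requests while the worker is busy with the previous batch. Whether kernel
threads or the AIO path is the better fit is exactly the open question above;
this sketch does not answer it.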
>
> Other than that the code seems quite nice and simple, but one huge
> problem is that it'll only support raw images, and thus misses out
> on all the "nice" image formats used in qemu deployments, especially
> qcow2. It's also missing the ioctl magic we're having in various
> places, both for controlling host devices like cdroms and SG
> passthrough.
>
True... unfortunately, I don't understand all of those (qcow2) details yet!
I need to read up on them before I can even comment :(
Thanks,
Badari