From: Christoph Hellwig <hch@infradead.org>
To: Badari Pulavarty <pbadari@us.ibm.com>
Cc: kvm@vger.kernel.org
Subject: Re: [RFC] vhost-blk implementation
Date: Wed, 24 Mar 2010 16:04:02 -0400
Message-ID: <20100324200402.GA22272@infradead.org>
In-Reply-To: <1269306023.7931.72.camel@badari-desktop>
> Inspired by vhost-net implementation, I did initial prototype
> of vhost-blk to see if it provides any benefits over QEMU virtio-blk.
> I haven't handled all the error cases, fixed naming conventions etc.,
> but the implementation is stable to play with. I tried not to deviate
> from vhost-net implementation where possible.
Can you also send the qemu side of it?
> with vhost-blk:
> ----------------
>
> # time dd if=/dev/vda of=/dev/null bs=128k iflag=direct
> 640000+0 records in
> 640000+0 records out
> 83886080000 bytes (84 GB) copied, 126.135 seconds, 665 MB/s
>
> real 2m6.137s
> user 0m0.281s
> sys 0m14.725s
>
> without vhost-blk: (virtio)
> ---------------------------
>
> # time dd if=/dev/vda of=/dev/null bs=128k iflag=direct
> 640000+0 records in
> 640000+0 records out
> 83886080000 bytes (84 GB) copied, 275.466 seconds, 305 MB/s
>
> real 4m35.468s
> user 0m0.373s
> sys 0m48.074s
Which caching mode is this? I assume data=writeback, because otherwise
you'd be doing synchronous I/O directly from the handler.
> +static int do_handle_io(struct file *file, uint32_t type, uint64_t sector,
> +		struct iovec *iov, int in)
> +{
> +	loff_t pos = sector << 8;
> +	int ret = 0;
> +
> +	if (type & VIRTIO_BLK_T_FLUSH) {
> +		ret = vfs_fsync(file, file->f_path.dentry, 1);
> +	} else if (type & VIRTIO_BLK_T_OUT) {
> +		ret = vfs_writev(file, iov, in, &pos);
> +	} else {
> +		ret = vfs_readv(file, iov, in, &pos);
> +	}
> +	return ret;
> +}
I have to admit I don't understand the vhost architecture at all, but
where do the actual data pointers used by the iovecs reside?
vfs_readv/writev expect both the iovec itself and the buffers
pointed to by it to reside in userspace, so just using kernel buffers
here will break badly on architectures with different user/kernel
mappings. A lot of this is fixable using simple set_fs & co tricks,
but for direct I/O which uses get_user_pages even that will fail badly.
Also it seems like you're doing all the I/O synchronously here? For
data=writeback operations that could explain the read speedup,
as you're avoiding context switches, but for actual write I/O
which has to get data to disk (either directly from vfs_writev or
later through vfs_fsync) this seems like a really bad idea, stealing
a lot of guest time for work that should happen in the background.
Other than that the code seems quite nice and simple, but one huge
problem is that it'll only support raw images, and thus misses out
on all the "nice" image formats used in qemu deployments, especially
qcow2. It's also missing the ioctl magic we have in various
places, both for controlling host devices like cdroms and for SG
passthrough.