From: Badari Pulavarty
Subject: Re: [RFC] vhost-blk implementation
Date: Tue, 23 Mar 2010 07:55:24 -0700
Message-ID: <4BA8D65C.2060605@us.ibm.com>
References: <1269306023.7931.72.camel@badari-desktop> <4BA891E2.9040500@redhat.com>
In-Reply-To: <4BA891E2.9040500@redhat.com>
To: Avi Kivity
Cc: kvm@vger.kernel.org

Avi Kivity wrote:
> On 03/23/2010 03:00 AM, Badari Pulavarty wrote:
>> Forgot to CC: KVM list earlier
>>
>> [RFC] vhost-blk implementation.eml
>>
>> Subject: [RFC] vhost-blk implementation
>> From: Badari Pulavarty
>> Date: Mon, 22 Mar 2010 17:34:06 -0700
>> To: virtualization@lists.linux-foundation.org, qemu-devel@nongnu.org
>>
>> Hi,
>>
>> Inspired by the vhost-net implementation, I did an initial prototype
>> of vhost-blk to see if it provides any benefits over QEMU virtio-blk.
>> I haven't handled all the error cases, fixed naming conventions, etc.,
>> but the implementation is stable enough to play with. I tried not to
>> deviate from the vhost-net implementation where possible.
>>
>> NOTE: The only change I had to make to the vhost core code is to
>> increase VHOST_NET_MAX_SG to 130 (128 + 2) in vhost.h.
>>
>> Performance:
>> =============
>>
>> I have done simple tests to see how it performs. I got very
>> encouraging results on sequential read tests. But on sequential
>> write tests, I see a degradation over virtio-blk that I can't
>> explain. Can someone shed light on what's happening here?
>>
>> Read Results:
>> =============
>> The test reads an 84GB file from the host (through virtio). I unmount
>> and mount the filesystem on the host to make sure there is nothing
>> in the page cache.
>>
>> +#define VHOST_BLK_VQ_MAX 1
>> +
>> +struct vhost_blk {
>> +	struct vhost_dev dev;
>> +	struct vhost_virtqueue vqs[VHOST_BLK_VQ_MAX];
>> +	struct vhost_poll poll[VHOST_BLK_VQ_MAX];
>> +};
>> +
>> +static int do_handle_io(struct file *file, uint32_t type, uint64_t sector,
>> +			struct iovec *iov, int in)
>> +{
>> +	loff_t pos = sector << 9;	/* virtio-blk sectors are 512 bytes */
>> +	int ret = 0;
>> +
>> +	if (type & VIRTIO_BLK_T_FLUSH) {
>> +		ret = vfs_fsync(file, file->f_path.dentry, 1);
>> +	} else if (type & VIRTIO_BLK_T_OUT) {
>> +		ret = vfs_writev(file, iov, in, &pos);
>> +	} else {
>> +		ret = vfs_readv(file, iov, in, &pos);
>> +	}
>> +	return ret;
>> +}
>
> This should be done asynchronously. That is likely the cause of the
> write performance degradation. For reads, readahead means that you're
> async anyway, but writes/syncs are still synchronous.

I am not sure what you mean by async here. Even if I use f_op->aio_write(),
it's still synchronous (except for DIO). Since we are writing to the page
cache and not waiting for write() to complete, this is the best we can do
here. Do you mean offloading write() handling to another thread?

> I also think it should be done at the bio layer.

I am not sure what you meant here. Do you want to do submit_bio()
directly? It's not going to be that simple.
Since the sector# is an offset within the file, one has to do getblocks()
on it to find the real disk block numbers, and we have to do
get_user_pages() on these iovecs before submitting them to the bio layer.
All of this work is done by vfs_write()/vfs_read() anyway. I am not sure
what you are suggesting here.

> File I/O is going to be slower; if we do vhost-blk we should
> concentrate on maximum performance. The block layer also exposes more
> functionality we can use (asynchronous barriers, for example).
>
> btw, for fairness, cpu measurements should be done from the host side
> and include the vhost thread.

Will do.

Thanks,
Badari
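The "offload write() handling to another thread" idea discussed in this
thread can be sketched in userspace C as follows. This is a minimal
illustration only, not the vhost-blk kernel code: it uses pwritev() in
place of vfs_writev(), and every name in it (struct blk_req,
write_worker, offload_write_demo) is hypothetical. The point it shows is
that the submitter hands the still-synchronous write to a worker thread
instead of blocking on it directly:

```c
/*
 * Sketch: offload a synchronous vectored write to a worker thread.
 * In vhost-blk the virtqueue handler would enqueue the request and
 * return immediately; here the submitter just joins the worker so
 * the demo can verify the result. All names are hypothetical.
 */
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

struct blk_req {
	int fd;
	uint64_t sector;	/* 512-byte sectors, as in virtio-blk */
	struct iovec iov[1];
	ssize_t result;
};

/* Worker: performs the synchronous write on behalf of the submitter. */
static void *write_worker(void *arg)
{
	struct blk_req *req = arg;
	off_t pos = (off_t)req->sector << 9;	/* sector -> byte offset */

	req->result = pwritev(req->fd, req->iov, 1, pos);
	return NULL;
}

/* Submit one write, let the worker run it, then read it back. */
int offload_write_demo(void)
{
	char tmpl[] = "/tmp/vhost-blk-demo-XXXXXX";
	int fd = mkstemp(tmpl);
	if (fd < 0)
		return -1;
	unlink(tmpl);	/* anonymous scratch file */

	char data[] = "hello vhost-blk";
	struct blk_req req = { .fd = fd, .sector = 1 };
	req.iov[0].iov_base = data;
	req.iov[0].iov_len = sizeof(data);

	pthread_t worker;
	if (pthread_create(&worker, NULL, write_worker, &req) != 0) {
		close(fd);
		return -1;
	}
	pthread_join(worker, NULL);	/* stands in for a completion */

	char buf[sizeof(data)] = { 0 };
	ssize_t n = pread(fd, buf, sizeof(buf), (off_t)1 << 9);
	close(fd);

	if (req.result != (ssize_t)sizeof(data) || n != (ssize_t)sizeof(data))
		return -1;
	return memcmp(buf, data, sizeof(data)) == 0 ? 0 : -1;
}
```

In the kernel the join would of course be replaced by a completion that
signals the guest through the used ring; the sketch only demonstrates
the submitter/worker split that the discussion above is about.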