From mboxrd@z Thu Jan 1 00:00:00 1970
From: Badari Pulavarty
Subject: Re: [RFC PATCH]vhost-blk: In-kernel accelerator for virtio block device
Date: Sun, 14 Aug 2011 21:17:59 -0700
Message-ID: <4E489DF7.3050707@us.ibm.com>
References: <1311863346-4338-1-git-send-email-namei.unix@gmail.com>
 <4E325F98.5090308@gmail.com>
 <4E32F7F2.4080607@us.ibm.com>
 <4E363DB9.70801@gmail.com>
 <1312495132.9603.4.camel@badari-desktop>
 <4E3BCE4D.7090809@gmail.com>
 <4E3C302A.3040500@us.ibm.com>
 <4E3F3D4E.70104@gmail.com>
 <4E3F6E72.1000907@us.ibm.com>
 <4E3F90E3.9080600@gmail.com>
 <4E4019E1.2090508@us.ibm.com>
 <4E41EAC5.8060001@gmail.com>
 <1313008667.9603.14.camel@badari-desktop>
 <4E4345F1.90107@gmail.com>
 <4E434A51.8000902@gmail.com>
 <4E44B100.3000208@us.ibm.com>
 <4E44E40C.7040407@gmail.com>
 <4E45113C.3040502@gmail.com>
 <4E4550DB.3020802@us.ibm.com>
 <4E489097.1070307@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org, Dongsu Park
To: Liu Yuan
Return-path:
Received: from e38.co.us.ibm.com ([32.97.110.159]:57911 "EHLO
 e38.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751435Ab1HOESB (ORCPT );
 Mon, 15 Aug 2011 00:18:01 -0400
Received: from d03relay05.boulder.ibm.com (d03relay05.boulder.ibm.com [9.17.195.107])
 by e38.co.us.ibm.com (8.14.4/8.13.1) with ESMTP id p7EKei8O027059
 for ; Sun, 14 Aug 2011 14:40:44 -0600
Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167])
 by d03relay05.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id p7F4HxLK152968
 for ; Sun, 14 Aug 2011 22:17:59 -0600
Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1])
 by d03av01.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p7F4HwQi011585
 for ; Sun, 14 Aug 2011 22:17:59 -0600
In-Reply-To: <4E489097.1070307@gmail.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 8/14/2011 8:20 PM, Liu Yuan wrote:
> On 08/13/2011 12:12 AM, Badari Pulavarty wrote:
>> On 8/12/2011 4:40 AM, Liu Yuan wrote:
>>> On 08/12/2011 04:27 PM, Liu Yuan wrote:
>>>> On 08/12/2011 12:50 PM, Badari Pulavarty wrote:
>>>>> On 8/10/2011 8:19 PM, Liu Yuan wrote:
>>>>>> On 08/11/2011 11:01 AM, Liu Yuan wrote:
>>>>>>>
>>>>>>>> It looks like the patch wouldn't work for testing multiple
>>>>>>>> devices.
>>>>>>>>
>>>>>>>> vhost_blk_open() does
>>>>>>>> +       used_info_cachep = KMEM_CACHE(used_info,
>>>>>>>>                 SLAB_HWCACHE_ALIGN |
>>>>>>>>                 SLAB_PANIC);
>>>>>>>>
>>>>>>>
>>>>>>> This is weird. How do you open multiple devices? I just opened the
>>>>>>> devices with the following command:
>>>>>>>
>>>>>>> -drive file=/dev/sda6,if=virtio,cache=none,aio=native -drive
>>>>>>> file=~/data0.img,if=virtio,cache=none,aio=native -drive
>>>>>>> file=~/data1.img,if=virtio,cache=none,aio=native
>>>>>>>
>>>>>>> And I didn't meet any problem.
>>>>>>>
>>>>>>> This would tell qemu to open three devices, and pass three FDs
>>>>>>> to three instances of vhost_blk module.
>>>>>>> So KMEM_CACHE() is okay in vhost_blk_open().
>>>>>>>
>>>>>>
>>>>>> Oh, you are right. KMEM_CACHE() is in the wrong place. It is three
>>>>>> instances of vhost worker threads that are created. Hmmm, but I
>>>>>> didn't meet any problem when opening it and running it. So strange.
>>>>>> I'll go figure it out.
>>>>>>
>>>>>>>> When opening second device, we get panic since used_info_cachep is
>>>>>>>> already created. Just to make progress I moved this call to
>>>>>>>> vhost_blk_init().
>>>>>>>>
>>>>>>>> I don't see any host panics now. With single block device (dd),
>>>>>>>> it seems to work fine. But when I start testing multiple block
>>>>>>>> devices I quickly run into hangs in the guest. I see following
>>>>>>>> messages in the guest from virtio_ring.c:
>>>>>>>>
>>>>>>>> virtio_blk virtio2: requests: id 0 is not a head !
>>>>>>>> virtio_blk virtio1: requests: id 0 is not a head !
>>>>>>>> virtio_blk virtio4: requests: id 1 is not a head !
>>>>>>>> virtio_blk virtio3: requests: id 39 is not a head !
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Badari
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> vq->data[] is initialized by the guest virtio-blk driver and
>>>>>>> vhost_blk is unaware of it. It looks like the used ID passed
>>>>>>> by vhost_blk to the guest virtio_blk is wrong, but that should
>>>>>>> not happen. :|
>>>>>>>
>>>>>>> And I can't reproduce this on my laptop. :(
>>>>>>>
>>>>> Finally, found the issue :)
>>>>>
>>>>> Culprit is:
>>>>>
>>>>> +static struct io_event events[MAX_EVENTS];
>>>>>
>>>>> With multiple devices, multiple threads could be executing
>>>>> handle_completion() (one for each fd) at the same time. The
>>>>> "events" array is global :( Need to make it one per device/fd.
>>>>>
>>>>> For test, I changed MAX_EVENTS to 32 and moved the "events" array
>>>>> to be local (stack) to handle_completion(). Tests are running fine.
>>>>>
>>>>> Your laptop must have a single processor, hence you have only one
>>>>> thread executing handle_completion() at any time.
>>>>>
>>>>> Thanks,
>>>>> Badari
>>>>>
>>>>>
>>>> Good catch, this is rather cool! Yup, I develop it mostly in a
>>>> nested KVM environment, and the L2 host only runs a single
>>>> processor :(
>>>>
>>>> Thanks,
>>>> Yuan
>>> By the way, MAX_EVENTS should be 128, which is as many requests as
>>> the guest virtio_blk driver can batch-submit; a smaller value causes
>>> an array overflow.
>>> I turned on debugging and saw more than 100 requests batched from
>>> the guest OS.
>>>
>>
>> Hmm.. I am not sure why you see over 100 outstanding events per fd.
>> Max events could be as high as the number of outstanding IOs.
>>
>> Anyway, instead of putting it on stack, I kmalloced it now.
>>
>> Dongsu Park, here is the complete patch.
>>
>> Thanks
>> Badari
>>
>>
> On a physical machine, the block device driver posts a queue depth
> that limits the number of pending requests, normally 31. But the
> virtio driver doesn't post one in the guest OS.
> So nothing prevents the OS from batch-submitting more than 31 requests.
>
> I have noticed over 100 pending requests during guest OS
> initialization, and it is reproducible.
>
> BTW, how is the perf number for vhost-blk in your environment?

Right now I am doing "dd" tests to check functionality and stability.

I plan to collect FFSB benchmark results across six virtio-blk/vhost-blk
disks with all profiles - seq read, seq write, random read, random write -
with block sizes varying from 4k to 1MB.

I will start the test tomorrow. It will take a few days to run through
all the scenarios. I don't have an easy way to collect host CPU
consumption, but for now let's focus on throughput and latency. I will
share the results in a few days.

Thanks
Badari
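
For reference, the per-device "events" buffer idea from earlier in the thread can be sketched in user-space C. This is only an illustration under stated assumptions, not the vhost-blk patch: struct demo_event, struct demo_blk_dev and the demo_* functions are hypothetical names, calloc() stands in for kmalloc(), and MAX_EVENTS is set to the 128 suggested in the thread. The point is that each device (fd) owns its own completion buffer, so two handlers working at the same time cannot overwrite each other's entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_EVENTS 128 /* batch size suggested in the thread */

/* Hypothetical stand-in for the kernel's struct io_event. */
struct demo_event {
	unsigned int head; /* descriptor head id to report back to the guest */
};

/* Hypothetical per-device state: one instance per opened fd. */
struct demo_blk_dev {
	struct demo_event *events; /* private buffer, not a shared global */
	size_t nr_events;
};

/* Allocate the private buffer once per device. Returns 0 on success. */
int demo_blk_dev_init(struct demo_blk_dev *dev)
{
	assert(dev != NULL);
	dev->events = calloc(MAX_EVENTS, sizeof(*dev->events));
	dev->nr_events = 0;
	return dev->events ? 0 : -1;
}

void demo_blk_dev_destroy(struct demo_blk_dev *dev)
{
	free(dev->events);
	dev->events = NULL;
	dev->nr_events = 0;
}

/*
 * Step 1 of a completion pass: store finished request heads in the
 * device's own buffer. Input beyond MAX_EVENTS is clamped so the
 * buffer cannot overflow. Returns the number of entries stored.
 */
size_t demo_fetch_events(struct demo_blk_dev *dev,
			 const unsigned int *heads, size_t n)
{
	size_t i;

	if (n > MAX_EVENTS)
		n = MAX_EVENTS;
	for (i = 0; i < n; i++)
		dev->events[i].head = heads[i];
	dev->nr_events = n;
	return n;
}

/*
 * Step 2: copy the stored heads to the caller (the "used" ids) and
 * reset the buffer. Returns the number of ids written to out.
 */
size_t demo_report_used(struct demo_blk_dev *dev,
			unsigned int *out, size_t cap)
{
	size_t i, n = dev->nr_events < cap ? dev->nr_events : cap;

	for (i = 0; i < n; i++)
		out[i] = dev->events[i].head;
	dev->nr_events = 0;
	return n;
}
```

If device A fetches its events, device B then fetches its own, and only afterwards A reports, A still reports exactly the heads it fetched. With a single global array, B's fetch in the middle would have replaced A's entries, which is one way the guest could end up being handed ids it never submitted on that queue.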