From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:60604)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1XCTD8-0000KQ-JW for qemu-devel@nongnu.org;
    Wed, 30 Jul 2014 08:42:34 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1XCTD0-00063d-PC for qemu-devel@nongnu.org;
    Wed, 30 Jul 2014 08:42:26 -0400
Received: from e06smtp12.uk.ibm.com ([195.75.94.108]:58399)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1XCTD0-00063O-GP for qemu-devel@nongnu.org;
    Wed, 30 Jul 2014 08:42:18 -0400
Received: from /spool/local by e06smtp12.uk.ibm.com
    with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted
    for from ; Wed, 30 Jul 2014 13:42:15 +0100
Received: from b06cxnps3075.portsmouth.uk.ibm.com
    (d06relay10.portsmouth.uk.ibm.com [9.149.109.195])
    by d06dlp02.portsmouth.uk.ibm.com (Postfix) with ESMTP id 46BCA2190045
    for ; Wed, 30 Jul 2014 13:41:58 +0100 (BST)
Received: from d06av01.portsmouth.uk.ibm.com
    (d06av01.portsmouth.uk.ibm.com [9.149.37.212])
    by b06cxnps3075.portsmouth.uk.ibm.com (8.13.8/8.13.8/NCO v10.0)
    with ESMTP id s6UCgDpe35323946 for ; Wed, 30 Jul 2014 12:42:13 GMT
Received: from d06av01.portsmouth.uk.ibm.com (localhost [127.0.0.1])
    by d06av01.portsmouth.uk.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout)
    with ESMTP id s6UCgBWx003674 for ; Wed, 30 Jul 2014 06:42:13 -0600
Message-ID: <53D8E822.7050007@de.ibm.com>
Date: Wed, 30 Jul 2014 14:42:10 +0200
From: Christian Borntraeger
MIME-Version: 1.0
References: <1406720388-18671-1-git-send-email-ming.lei@canonical.com>
In-Reply-To: <1406720388-18671-1-git-send-email-ming.lei@canonical.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 00/14] dataplane: optimization and multi virtqueue support
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Ming Lei , qemu-devel@nongnu.org, Peter Maydell , Paolo
    Bonzini , Stefan Hajnoczi
Cc: Kevin Wolf , Christian Ehrhardt , Fam Zheng , "Michael S. Tsirkin"

On 30/07/14 13:39, Ming Lei wrote:
> These patches bring up below 4 changes:
>
> - introduce selective coroutine bypass mechanism
>   for improving performance of virtio-blk dataplane with
>   raw format image
>
> - introduce object allocation pool and apply it to
>   virtio-blk dataplane for improving its performance
>
> - linux-aio changes: fixing for cases of -EAGAIN and partial
>   completion, increase max events to 256, and remove one unuseful
>   fields in 'struct qemu_laiocb'
>
> - support multi virtqueue for virtio-blk dataplane
>
> The virtio-blk multi virtqueue feature will be added to virtio spec 1.1[1],
> and the 3.17 linux kernel[2] will support the feature in virtio-blk driver.
> For those who wants to play the stuff, the kernel side patche can be found
> in either Jens's block tree[3] or linux-next[4].
>
> Below fio script running from VM is used for test improvement of these patches:
>
> [global]
> direct=1
> size=128G
> bsrange=4k-4k
> timeout=120
> numjobs=${JOBS}
> ioengine=libaio
> iodepth=64
> filename=/dev/vdc
> group_reporting=1
>
> [f]
> rw=randread
>
> One quadcore VM(8G RAM) is created in below host to run above fio test:
>
> - server(16cores: 8 physical cores, 2 threads per physical core)
>
> Follows the test result on throughput improvement(IOPS) with
> this patchset(4 virtqueues per virito-blk device) against QEMU
> 2.1.0-rc5: 30% throughput improvement can be observed, and
> scalability for parallel I/Os is improved more(80% throughput
> improvement is observed in case of 4 JOBS).
>
> From above result, we can see both scalability and performance
> get improved a lot.
>
> After commit 580b6b2aa2(dataplane: use the QEMU block
> layer for I/O), average time for submiting one single
> request has been increased a lot, as my trace, the average
> time taken for submiting one request has been doubled even
> though block plug&unplug mechanism is introduced to
> ease its effect. That is why this patchset introduces
> selective coroutine bypass mechanism and object allocation
> pool for saving the time first. Based on QEMU 2.0, only
> single virtio-blk dataplane multi virtqueue patch can get
> better improvement than current result[5].
>
> TODO:
> - optimize block layer for linux aio, so that
>   more time can be saved for submitting request
> - support more than one aio-context for improving
>   virtio-blk performance

[...]

>
> [1], http://marc.info/?l=linux-api&m=140486843317107&w=2
> [2], http://marc.info/?l=linux-api&m=140418368421229&w=2
> [3], http://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git/ #for-3.17/drivers
> [4], https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/
> [5], http://marc.info/?l=linux-api&m=140377573830230&w=2

FYI, I just tested with one virtqueue on s390 (3.15 as guest). It was
just a quick sniff, but we are getting closer to the fio results that we
had before commit 580b6b2aa2 (dataplane: use the QEMU block layer for
I/O). I can't give proper numbers right now, as I am on a shared storage
subsystem, but this looks like we are on the right track. I have not
looked at the code, though.

Christian
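
For readers unfamiliar with the "object allocation pool" named in the
quoted cover letter, here is a minimal sketch of the general technique:
preallocate a fixed number of objects once, then hand them out and take
them back through a free list, so the per-request path avoids malloc and
free. This is not the code from the patch series. The names obj_pool,
pool_init, pool_get, pool_put and pool_destroy are hypothetical, and the
sketch assumes single-threaded use and an object size that is a multiple
of the object's alignment (true for sizeof of any C type).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-capacity object pool; not taken from the QEMU patches. */
typedef struct obj_pool {
    void **free_list;   /* stack of pointers to currently free objects */
    size_t free_count;  /* number of valid entries on the stack */
    size_t capacity;    /* total number of objects owned by the pool */
    char *storage;      /* one contiguous allocation backing all objects */
} obj_pool;

/* Allocate backing storage for `capacity` objects of `obj_size` bytes.
 * Returns 0 on success, -1 if an allocation fails. */
static int pool_init(obj_pool *p, size_t obj_size, size_t capacity)
{
    assert(obj_size > 0 && capacity > 0);
    p->storage = malloc(obj_size * capacity);
    p->free_list = malloc(sizeof(void *) * capacity);
    if (!p->storage || !p->free_list) {
        free(p->storage);
        free(p->free_list);
        return -1;
    }
    /* Initially every slot is free. */
    for (size_t i = 0; i < capacity; i++) {
        p->free_list[i] = p->storage + i * obj_size;
    }
    p->free_count = capacity;
    p->capacity = capacity;
    return 0;
}

/* Pop a free object, or return NULL when the pool is exhausted
 * (a caller could then fall back to a plain malloc). */
static void *pool_get(obj_pool *p)
{
    return p->free_count ? p->free_list[--p->free_count] : NULL;
}

/* Push back an object previously obtained from pool_get(). */
static void pool_put(obj_pool *p, void *obj)
{
    assert(p->free_count < p->capacity);
    p->free_list[p->free_count++] = obj;
}

/* Release all memory owned by the pool. */
static void pool_destroy(obj_pool *p)
{
    free(p->free_list);
    free(p->storage);
    p->free_list = NULL;
    p->storage = NULL;
    p->free_count = 0;
    p->capacity = 0;
}
```

A caller would run pool_init once at device setup, call pool_get and
pool_put around each request, and call pool_destroy at teardown. Both
get and put are O(1) and touch no allocator state, which is the kind of
per-request saving the cover letter is after.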