From: Paolo Bonzini
Date: Wed, 06 Aug 2014 10:50:53 +0200
Subject: Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
To: Ming Lei
Cc: Kevin Wolf, Peter Maydell, Fam Zheng, "Michael S. Tsirkin",
    qemu-devel, Stefan Hajnoczi

On 06/08/2014 10:38, Ming Lei wrote:
> On Wed, Aug 6, 2014 at 3:45 PM, Paolo Bonzini wrote:
>> On 06/08/2014 07:33, Ming Lei wrote:
>>>>> I played a bit with the following, I hope it's not too naive. I
>>>>> couldn't see a difference with your patches, but at least one
>>>>> reason for this is probably that my laptop SSD isn't fast enough
>>>>> to make the CPU the bottleneck. I haven't tried a ramdisk yet;
>>>>> that would probably be the next thing. (I actually wrote the
>>>>> patch up just for some profiling of my own, not for comparing
>>>>> throughput, but it should be usable for that as well.)
>>> This might not be a good test, since it is basically a sequential
>>> read test, which the kernel can optimize a lot. That is why I
>>> always use a randread benchmark.
>>
>> A microbenchmark already exists in tests/test-coroutine.c, and it
>> doesn't really tell us much; it's obvious that coroutines execute
>> more code. The question is why that affects iops.
>
> Could you take a look at the coroutine benchmark I wrote? The
> results show that coroutines decrease performance a lot compared
> with bypassing them, which is what this patch set does.

Your benchmark is synchronous, while disk I/O is asynchronous (a
sketch of the asynchronous pattern is below).

Also, your benchmark doesn't add much compared to "time
tests/test-coroutine -m perf -p /perf/yield". That test takes 8
seconds on my machine, and 10^8 plain function calls obviously take
much less than 8 seconds. I've sent a patch that adds a "baseline"
function-call benchmark to test-coroutine (also sketched below).

>> The sequential read should be the right workload. For fio, you
>> want to push as many iops as possible to QEMU, so you need
>> randread. But qemu-img does not run in a guest, and if the kernel
>> optimizes sequential reads, then the bypass should show even more
>> benefit, because it makes userspace proportionally more expensive.
>> Do you agree with this?
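
By randread I mean something along the lines of the following fio job,
run against the virtio-blk disk inside the guest. The device name,
block size and queue depth here are placeholders, not values from this
thread; any O_DIRECT random-read job with a reasonably deep iodepth
should do:

    fio --name=randread --rw=randread --ioengine=libaio --direct=1 \
        --bs=4k --iodepth=32 --runtime=30 --time_based \
        --filename=/dev/vdb

Random reads defeat the guest kernel's readahead, so every request
actually travels through virtio-blk and QEMU, which is what we want
to measure.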
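
On the synchronous vs. asynchronous point: in QEMU, a coroutine that
submits disk I/O yields while the request is in flight and is
re-entered from the completion callback, so the cost that matters on
the iops path includes the yield and the re-enter from the event
loop, not just a create/enter/return. A minimal sketch of the
pattern; bdrv_aio_readv and the coroutine calls are the current API,
while CoReadRequest and co_readv_example are made-up names for
illustration:

#include "block/block.h"
#include "block/coroutine.h"

/* Made-up helper struct: tracks one in-flight request. */
typedef struct CoReadRequest {
    Coroutine *co;    /* coroutine to re-enter on completion */
    int ret;          /* filled in by the callback */
} CoReadRequest;

static void co_read_cb(void *opaque, int ret)
{
    CoReadRequest *req = opaque;

    req->ret = ret;
    qemu_coroutine_enter(req->co, NULL);  /* resume past the yield below */
}

/* Coroutine context: submit the AIO, then give up the CPU until the
 * event loop runs co_read_cb. */
static int coroutine_fn co_readv_example(BlockDriverState *bs,
                                         int64_t sector_num,
                                         int nb_sectors,
                                         QEMUIOVector *qiov)
{
    CoReadRequest req = {
        .co = qemu_coroutine_self(),
    };

    bdrv_aio_readv(bs, sector_num, qiov, nb_sectors, co_read_cb, &req);
    qemu_coroutine_yield();    /* I/O in flight; the event loop runs */
    return req.ret;
}

A synchronous call/return benchmark never exercises the yield and the
re-enter from the callback, which is exactly the half that the
dataplane iops path pays for.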
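
As for the baseline: /perf/yield ping-pongs between the main coroutine
and one child coroutine, and a fair baseline is an equally hot plain
function call. A sketch of what I mean, using the glib test timer
helpers that tests/test-coroutine.c already uses (this shows the idea,
not necessarily the exact patch I sent):

#include <glib.h>
#include "block/coroutine.h"

/* This is essentially what the in-tree /perf/yield test does. */
static void coroutine_fn yield_loop(void *opaque)
{
    unsigned int *i = opaque;

    while ((*i) > 0) {
        (*i)--;
        qemu_coroutine_yield();            /* bounce back to the caller */
    }
}

static void perf_yield(void)
{
    unsigned int i = 100000000;
    Coroutine *co = qemu_coroutine_create(yield_loop);

    g_test_timer_start();
    while (i > 0) {
        qemu_coroutine_enter(co, &i);      /* runs until the next yield */
    }
    g_test_message("Yield: %f s", g_test_timer_elapsed());
}

/* Keep the callee out of line so the loop really measures a call. */
static __attribute__((noinline)) void dummy_call(unsigned int *i)
{
    (*i)--;
}

static void perf_baseline(void)
{
    unsigned int i = 100000000;

    g_test_timer_start();
    while (i > 0) {
        dummy_call(&i);
    }
    g_test_message("Function call: %f s", g_test_timer_elapsed());
}

Register both with g_test_add_func("/perf/yield", perf_yield) and
g_test_add_func("/perf/baseline", perf_baseline); the difference
between the two numbers is the per-yield coroutine overhead,
independent of any disk.

Paolo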