From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dongsu Park
Subject: Re: virtio-blk performance regression and qemu-kvm
Date: Wed, 22 Feb 2012 17:48:40 +0100
Message-ID: <20120222164840.GA8517@gmail.com>
References: <20120210143639.GA17883@gmail.com> <20120221155725.GA950@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org
To: Stefan Hajnoczi
Content-Disposition: inline
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

Hi Stefan,

see below.

On 21.02.2012 17:27, Stefan Hajnoczi wrote:
> On Tue, Feb 21, 2012 at 3:57 PM, Dongsu Park wrote:
......
> I'm not sure if O_DIRECT and Linux AIO to /dev/ram0 is a good idea.
> At least with tmpfs O_DIRECT does not even work - which kind of makes
> sense there because tmpfs lives in the page cache. My point here is
> that ramdisk does not follow the same rules or have the same
> performance characteristics as real disks do. It's something to be
> careful about. Did you run this test because you noticed a real-world
> regression?

That's a good point, I agree with you: /dev/ram0 isn't a good choice in
this case.

I did notice real-world regressions, but not with /dev/ram0. That's why
I tested again with a block device backed by a raw file image. The
result, however, was nearly the same: a regression since 0.15.

......
> Try turning ioeventfd off for the virtio-blk device:
>
>   -device virtio-blk-pci,ioeventfd=off,...
>
> You might see better performance since ramdisk I/O should be very
> low-latency. The overhead of using ioeventfd might not make it
> worthwhile. The ioeventfd feature was added post-0.14 IIRC.
> Normally
> it helps avoid stealing vcpu time and also causing lock contention
> inside the guest - but if host I/O latency is extremely low it might
> be faster to issue I/O from the vcpu thread.

Thanks for the tip. I tried that too, but with no success.

However, today I observed an interesting phenomenon. On the qemu-kvm
command line, if I set the -smp maxcpus value to 32, R/W bandwidth gets
boosted up to 100 MBps:

# /usr/bin/kvm ... -smp 2,cores=1,maxcpus=32,threads=1 -numa mynode,mem=32G,nodeid=mynodeid

That looks weird, because my test machine has only 4 physical CPUs.
Setting maxcpus=4, on the other hand, brings only poor performance
(< 30 MBps).

Additionally, performance seems to decrease when more vCPUs are pinned.
In the libvirt XML, for example, "<vcpu cpuset='...'>2</vcpu>" causes
performance degradation, but a plain "<vcpu>2</vcpu>" is ok. That
doesn't look reasonable either.

Cheers,
Dongsu
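
P.S. To make the comparison concrete, here is a sketch of the two
invocations side by side. Only the -smp/-numa/ioeventfd options are the
ones actually discussed above; the drive path, image format, and the
other options are placeholders for illustration, not my exact command
line:

```shell
# "Fast" case: maxcpus=32 (roughly 100 MBps in my tests).
# Drive path and format below are placeholders.
/usr/bin/kvm \
    -smp 2,cores=1,maxcpus=32,threads=1 \
    -numa mynode,mem=32G,nodeid=mynodeid \
    -drive file=/path/to/disk.raw,if=none,id=drive0,format=raw \
    -device virtio-blk-pci,drive=drive0

# "Slow" case: maxcpus=4 (< 30 MBps), here also with Stefan's
# ioeventfd=off suggestion applied to the virtio-blk device:
/usr/bin/kvm \
    -smp 2,cores=1,maxcpus=4,threads=1 \
    -drive file=/path/to/disk.raw,if=none,id=drive0,format=raw \
    -device virtio-blk-pci,drive=drive0,ioeventfd=off
```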