From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dongsu Park
Subject: Re: virtio-blk performance regression and qemu-kvm
Date: Wed, 22 Feb 2012 17:48:40 +0100
Message-ID: <20120222164840.GA8517@gmail.com>
References: <20120210143639.GA17883@gmail.com> <20120221155725.GA950@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org
To: Stefan Hajnoczi
Return-path:
Received: from mail-ee0-f46.google.com ([74.125.83.46]:50243 "EHLO mail-ee0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750888Ab2BVQst (ORCPT ); Wed, 22 Feb 2012 11:48:49 -0500
Received: by eekc14 with SMTP id c14so77415eek.19 for ; Wed, 22 Feb 2012 08:48:48 -0800 (PST)
Content-Disposition: inline
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

Hi Stefan, see below.

On 21.02.2012 17:27, Stefan Hajnoczi wrote:
> On Tue, Feb 21, 2012 at 3:57 PM, Dongsu Park
> wrote:
......
> I'm not sure if O_DIRECT and Linux AIO to /dev/ram0 is a good idea.
> At least with tmpfs O_DIRECT does not even work - which kind of makes
> sense there because tmpfs lives in the page cache. My point here is
> that ramdisk does not follow the same rules or have the same
> performance characteristics as real disks do. It's something to be
> careful about. Did you run this test because you noticed a real-world
> regression?

That's a good point. I agree with you: /dev/ram0 isn't a good choice in
this case. Of course I noticed real-world regressions, but not with
/dev/ram0.

Therefore I tested again with a block device backed by a raw file image.
The result, however, was nearly the same: a regression since 0.15.

......
> Try turning ioeventfd off for the virtio-blk device:
>
> -device virtio-blk-pci,ioeventfd=off,...
>
> You might see better performance since ramdisk I/O should be very
> low-latency. The overhead of using ioeventfd might not make it
> worthwhile. The ioeventfd feature was added post-0.14 IIRC.
> Normally it helps avoid stealing vcpu time and also causing lock
> contention inside the guest - but if host I/O latency is extremely low
> it might be faster to issue I/O from the vcpu thread.

Thanks for the tip. I tried that too, but without success.

However, today I observed an interesting phenomenon. On the qemu-kvm
command line, if I set -smp maxcpus to 32, R/W bandwidth gets boosted
up to 100 MBps.

# /usr/bin/kvm ... -smp 2,cores=1,maxcpus=32,threads=1 -numa mynode,mem=32G,nodeid=mynodeid

That looks weird, because my test machine has only 4 physical CPUs.
Setting maxcpus=4, however, gives only poor performance (< 30 MBps).

Additionally, performance seems to decrease if more vCPUs are pinned.
In the libvirt XML, for example, "2" causes performance degradation, but "2" is ok.
That doesn't look reasonable either.

Cheers,
Dongsu
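
For anyone who wants to repeat the maxcpus comparison described above, here is a minimal dry-run sketch. It only prints the two command lines and does not start a guest. The image path and the drive id "vd0" are hypothetical placeholders, and the -drive options (cache=none for O_DIRECT, aio=native for Linux AIO) are an assumption about how the test disk was attached; the thread does not show the full command line.

```shell
#!/bin/sh
# Dry run: print one qemu-kvm command line per maxcpus value under test.
# IMAGE and the drive id "vd0" are hypothetical, not taken from the thread.
IMAGE=${IMAGE:-/path/to/test.img}
# Set IOEVENTFD=off to also try the suggestion quoted in the message.
IOEVENTFD=${IOEVENTFD:-on}

build_cmd() {
    # $1 = maxcpus value; every other -smp field is kept as in the message
    printf '%s ' /usr/bin/kvm \
        -smp "2,cores=1,threads=1,maxcpus=$1" \
        -drive "file=$IMAGE,if=none,id=vd0,format=raw,cache=none,aio=native" \
        -device "virtio-blk-pci,drive=vd0,ioeventfd=$IOEVENTFD"
    printf '\n'
}

# The two values compared in the message: 4 (slow) and 32 (fast).
for n in 4 32; do
    build_cmd "$n"
done
```

Keeping every option except maxcpus identical between the two runs makes it easier to attribute any bandwidth difference to that single setting.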