From: Dongsu Park <dongsu.park@profitbricks.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org
Subject: Re: virtio-blk performance regression and qemu-kvm
Date: Wed, 22 Feb 2012 17:48:40 +0100
Message-ID: <20120222164840.GA8517@gmail.com>
In-Reply-To: <CAJSP0QVjXX2NbJ1mOQtyJ7akycem3p4u6t4y3H5Nv1Dq-CQ9AQ@mail.gmail.com>
Hi Stefan,
see below.
On 21.02.2012 17:27, Stefan Hajnoczi wrote:
> On Tue, Feb 21, 2012 at 3:57 PM, Dongsu Park
> <dongsu.park@profitbricks.com> wrote:
...<snip>...
> I'm not sure if O_DIRECT and Linux AIO to /dev/ram0 is a good idea.
> At least with tmpfs O_DIRECT does not even work - which kind of makes
> sense there because tmpfs lives in the page cache. My point here is
> that ramdisk does not follow the same rules or have the same
> performance characteristics as real disks do. It's something to be
> careful about. Did you run this test because you noticed a real-world
> regression?
That's a good point.
I agree with you. /dev/ram0 isn't a good choice in this case.
Of course I noticed real-world regressions, but not with /dev/ram0.
So I tested again with a block device backed by a raw image file.
The result, however, was nearly the same: a regression since 0.15.
...<snip>...
> Try turning ioeventfd off for the virtio-blk device:
>
> -device virtio-blk-pci,ioeventfd=off,...
>
> You might see better performance since ramdisk I/O should be very
> low-latency. The overhead of using ioeventfd might not make it
> worthwhile. The ioeventfd feature was added post-0.14 IIRC. Normally
> it helps avoid stealing vcpu time and also causing lock contention
> inside the guest - but if host I/O latency is extremely low it might
> be faster to issue I/O from the vcpu thread.
Thanks for the tip. I tried that too, but without success.
However, today I observed an interesting phenomenon.
On the qemu-kvm command line, if I set -smp maxcpus=32,
R/W bandwidth goes up to 100 MBps:
# /usr/bin/kvm ...
-smp 2,cores=1,maxcpus=32,threads=1 -numa mynode,mem=32G,nodeid=mynodeid
That looks weird, because my test machine has only 4 physical CPUs.
Yet setting maxcpus=4 gives only poor performance (< 30 MBps).
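To make the comparison explicit: the fast and slow runs differ only in the maxcpus value passed to -smp. A small Python sketch that rebuilds both option strings from the values above (smp_arg is a hypothetical helper, not part of qemu-kvm; the rest of the command line stays elided, as above):

```python
def smp_arg(vcpus: int, maxcpus: int, cores: int = 1, threads: int = 1) -> str:
    """Build the value of qemu-kvm's -smp option in the form used above."""
    return f"{vcpus},cores={cores},maxcpus={maxcpus},threads={threads}"


# Same guest (2 vCPUs), only maxcpus differs between the two runs.
fast = ["-smp", smp_arg(2, 32)]  # observed: up to 100 MBps
slow = ["-smp", smp_arg(2, 4)]   # observed: < 30 MBps

print(" ".join(fast))
print(" ".join(slow))
```

The first line printed matches the -smp value of the command line quoted above.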
Additionally, performance seems to decrease when the vCPUs are pinned to more host CPUs.
In libvirt XML, for example, "<vcpu cpuset='0-1'>2</vcpu>" causes
performance degradation, but "<vcpu cpuset='1'>2</vcpu>" is OK.
That doesn't look reasonable either.
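For reference, a minimal Python sketch (not libvirt code; parse_cpuset and vcpus_per_host_cpu are hypothetical helpers) of what the two cpuset values above mean. It assumes libvirt's cpuset syntax of comma-separated CPU numbers, inclusive ranges "a-b", and "^n" exclusions, and simplifies by applying all exclusions after all inclusions:

```python
def parse_cpuset(spec: str) -> set[int]:
    """Expand a libvirt-style cpuset string such as '0-1' or '1-4,^3'.

    Simplification: '^n' exclusions are applied after all inclusions.
    """
    include: set[int] = set()
    exclude: set[int] = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if item.startswith("^"):
            exclude.add(int(item[1:]))          # excluded host CPU
        elif "-" in item:
            first, last = (int(x) for x in item.split("-", 1))
            include.update(range(first, last + 1))  # inclusive range
        else:
            include.add(int(item))              # single host CPU
    return include - exclude


def vcpus_per_host_cpu(vcpus: int, cpuset: str) -> float:
    """Average number of vCPUs sharing each host CPU in the cpuset."""
    return vcpus / len(parse_cpuset(cpuset))


print(parse_cpuset("0-1"), vcpus_per_host_cpu(2, "0-1"))
print(parse_cpuset("1"), vcpus_per_host_cpu(2, "1"))
```

With cpuset='0-1' the 2 vCPUs may spread over host CPUs 0 and 1 (1.0 vCPU per host CPU), while cpuset='1' puts both on host CPU 1 (2.0 per host CPU). So the faster configuration is the one with fewer host CPUs available to the guest.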
Cheers,
Dongsu