* Re: performance tuning, problem with paravirtualized clock
2008-04-06 19:56 performance tuning, problem with paravirtualized clock Nikola Ciprich
@ 2008-04-06 20:38 ` Anthony Liguori
2008-04-06 21:57 ` Nikola Ciprich
2008-04-06 20:49 ` Dor Laor
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Anthony Liguori @ 2008-04-06 20:38 UTC
To: Nikola Ciprich; +Cc: kvm-devel
Nikola Ciprich wrote:
> Hi,
> I spent some time trying to tune the performance of a KVM guest, using kernel
> compilation as a kind of benchmark (I use virtual machines for compiling a
> lot, so it's a good benchmark for me in general).
>
> Host machine: 2x quad core XEON E5420 @ 2.50GHz, 4GB RAM, 2.6.24 + kvm-64
> guest configuration: all 8 cores available, 2GB RAM, 2.6.24 or latest GIT
> + kvm-64
>
> some results:
> - compilation in KVM guest is roughly 2x slower than on bare metal.
> - trying various block device backends (ide, scsi, virtio_blk) didn't
> really matter much for my case
> - enabling CONFIG_KVM_GUEST under latest GIT with kvm-64 patch applied
> decreased compile time by about 10%, which is nice!
> - enabling CONFIG_KVM_CLOCK made the guest unstable, often unable to finish
> booting at all; disabling ACPI made it a bit better, but it was still quite
> unstable (cpu0 lock-ups, etc.)
>
> Is there currently anything more I could do to improve performance? I'm
> wondering what is slowing compilation: if I compare some CPU-intensive
> application (e.g. bzip2), it seems to run at nearly native speed, but kernel
> compilation is much slower even if run from a ramdisk. Maybe it could be
> improved further by tuning the scheduler, etc.?
>
I would think you should get about 70% of native with what you've done
above. I've not seen instabilities with CONFIG_KVM_CLOCK myself.
Setting up a hugetlbfs mount and using -mem-path may give you a bit of a
bump too, but I'd be surprised if it was more than 5%.
The next biggest win you're going to see is using NPT (available in the
recent AMD Barcelona/Phenom processors). NPT + hugetlbfs should get
you pretty close to native (I'd reckon 95-98%).
On the Intel side of things, you'll have to wait until Nehalem, which
will support EPT (Intel's version of NPT).
Can you be specific about your guest configurations? Are you using -smp 8?
Regards,
Anthony Liguori
> anyways keep up the good work!
> cheers!
> nik
>
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
> Register now and save $200. Hurry, offer ends at 11:59 p.m.,
> Monday, April 7! Use priority code J8TLD2.
> http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
> _______________________________________________
> kvm-devel mailing list
> kvm-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/kvm-devel
>
* Re: performance tuning, problem with paravirtualized clock
2008-04-06 20:38 ` Anthony Liguori
@ 2008-04-06 21:57 ` Nikola Ciprich
2008-04-06 22:08 ` Anthony Liguori
0 siblings, 1 reply; 11+ messages in thread
From: Nikola Ciprich @ 2008-04-06 21:57 UTC
To: kvm-devel
Hi Anthony!
Anthony Liguori wrote:
> I would think you should get about 70% of native with what you've done
> above. I've not seen instabilities with CONFIG_KVM_CLOCK myself.
>
> Setting up a hugetlbfs mount and using -mem-path may give you a bit of
> a bump too, but I'd be surprised if it was more than 5%.
I've tried it now, and starting kvm with -mem-path pointing to a
hugetlbfs-mounted dir fails immediately; I see the following message in
the host's dmesg:
VM: killing process qemu-system-x86
Pointing it to a tmpfs-mounted dir seems to work; I'll measure the
performance gain...
>
> The next biggest win you're going to see is using NPT (available in
> the recent AMD Barcelona/Phenom processors). NPT + hugetlbfs should
> get you pretty close to native (I'd reckon 95-98%).
Yup, it seemed to me that kvm performs WAY better on my Phenom-based
home desktop! I'll check that later too.
>
> On the Intel side of things, you'll have to wait until Nehalem,
> which will support EPT (Intel's version of NPT).
>
> Can you be specific about your guest configurations? Are you using
> -smp 8?
yes, I'm using -smp 8
>
> Regards,
>
> Anthony Liguori
* Re: performance tuning, problem with paravirtualized clock
2008-04-06 21:57 ` Nikola Ciprich
@ 2008-04-06 22:08 ` Anthony Liguori
2008-04-06 22:24 ` Nikola Ciprich
0 siblings, 1 reply; 11+ messages in thread
From: Anthony Liguori @ 2008-04-06 22:08 UTC
To: Nikola Ciprich; +Cc: kvm-devel
Nikola Ciprich wrote:
> Hi Anthony!
> Anthony Liguori wrote:
>
>> I would think you should get about 70% of native with what you've done
>> above. I've not seen instabilities with CONFIG_KVM_CLOCK myself.
>>
>> Setting up a hugetlbfs mount and using -mem-path may give you a bit of
>> a bump too, but I'd be surprised if it was more than 5%.
>>
> I've tried it now, and starting kvm with -mem-path pointing to a
> hugetlbfs-mounted dir fails immediately; I see the following message in
> the host's dmesg:
> VM: killing process qemu-system-x86
> Pointing it to a tmpfs-mounted dir seems to work; I'll measure the
> performance gain...
>
You won't see a gain with tmpfs. Make sure you reserve huge pages
first. For a 1GB guest, you'll need something like:
echo 540 > /proc/sys/vm/nr_hugepages
When you create a VM, you need a bit more memory than 1GB for per-guest
overhead. That's why I reserve 540 instead of 512. You can probably
get away with 530 really.
Check that it succeeded by cat'ing /proc/meminfo.
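The arithmetic behind the 540 figure can be sketched in shell. This is a minimal sketch, assuming 2 MiB huge pages (which is what makes a 1GB guest come to 512 pages) and a hypothetical 56 MB overhead default chosen only because it reproduces 540; the helper name `hugepages_needed` and the /hugepages mount point are made up for illustration. The real page size is reported as `Hugepagesize` in /proc/meminfo.

```shell
#!/bin/sh
# Sketch: number of huge pages to reserve for a guest backed by -mem-path.
# Assumptions (not taken from kvm itself): 2 MiB huge pages by default and a
# hypothetical 56 MB per-guest overhead, which yields 540 pages for 1024 MB.

# hugepages_needed GUEST_MB [OVERHEAD_MB] [HUGEPAGE_KB]
hugepages_needed() {
    guest_mb=$1
    overhead_mb=${2:-56}
    page_kb=${3:-2048}
    total_kb=$(( (guest_mb + overhead_mb) * 1024 ))
    # Round up so that a partially used page is still reserved.
    echo $(( (total_kb + page_kb - 1) / page_kb ))
}

# Hypothetical usage as root (not executed by this sketch):
#   echo "$(hugepages_needed 1024)" > /proc/sys/vm/nr_hugepages
#   grep -i huge /proc/meminfo        # HugePages_Total should read 540
#   mkdir -p /hugepages && mount -t hugetlbfs hugetlbfs /hugepages
#   qemu-system-x86_64 -m 1024 -mem-path /hugepages ...
```

With an overhead of 36 MB the same function gives 530, the lower figure mentioned above.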
>> The next biggest win you're going to see is using NPT (available in
>> the recent AMD Barcelona/Phenom processors). NPT + hugetlbfs should
>> get you pretty close to native (I'd reckon 95-98%).
>>
> Yup, it seemed to me that kvm performs WAY better on my Phenom-based
> home desktop! I'll check that later too.
>
Definitely. A parallel compile is one of the best-case scenarios for
NPT, so you should see the most dramatic improvement there.
>> On the Intel side of things, you'll have to wait until Nehalem,
>> which will support EPT (Intel's version of NPT).
>>
>> Can you be specific about your guest configurations? Are you using
>> -smp 8?
>>
> yes, I'm using -smp 8
>
It's not quite apples to apples then, since you're sharing CPUs with the
host. Typically, if I'm benchmarking an 8-way system with 2GB of RAM,
I'll create a 4-way guest with 1GB of RAM and then, to generate native
numbers, reboot the host with maxcpus=4 mem=1G.
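Such a comparison can be reduced to one number, the guest's speed as a percentage of native, from wall-clock times of the same build in both setups. A minimal sketch; `pct_of_native` is a hypothetical helper, and the guest command line in the comments is an assumption modeled on the 4-way/1GB example above.

```shell
#!/bin/sh
# Sketch: express a guest benchmark result as a percentage of native speed.
# NATIVE_SECS and GUEST_SECS are wall-clock times for the same workload.

# pct_of_native NATIVE_SECS GUEST_SECS
pct_of_native() {
    awk -v n="$1" -v g="$2" 'BEGIN { printf "%.1f\n", n / g * 100 }'
}

# Hypothetical like-for-like setup on an 8-way host with 2GB of RAM:
#   guest:  qemu-system-x86_64 -smp 4 -m 1024 ...
#   native: boot the host kernel with "maxcpus=4 mem=1G"
# Time the identical build in each, then compare, e.g.:
#   pct_of_native 300 600    # a build twice as slow in the guest -> 50.0
```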
Regards,
Anthony Liguori
>> Regards,
>>
>> Anthony Liguori
>>
* Re: performance tuning, problem with paravirtualized clock
2008-04-06 22:08 ` Anthony Liguori
@ 2008-04-06 22:24 ` Nikola Ciprich
2008-04-06 22:33 ` Anthony Liguori
0 siblings, 1 reply; 11+ messages in thread
From: Nikola Ciprich @ 2008-04-06 22:24 UTC
To: Anthony Liguori, kvm-devel
Anthony Liguori wrote:
> You won't see a gain with tmpfs. Make sure you reserve huge pages
> first. For a 1GB guest, you'll need something like:
>
> echo 540 > /proc/sys/vm/nr_hugepages
>
> When you create a VM, you need a bit more memory than 1GB for
> per-guest overhead. That's why I reserve 540 instead of 512. You can
> probably get away with 530 really.
>
> Check that it succeeded by cat'ing /proc/meminfo.
>
Well, I tried various values now, and booting fails immediately:
Decompressing Linux...
invalid compressed format (err=1)
-- System halted
weird...
anyways I'll also give fresh kvm-65 a try now :)
* Re: performance tuning, problem with paravirtualized clock
2008-04-06 22:24 ` Nikola Ciprich
@ 2008-04-06 22:33 ` Anthony Liguori
0 siblings, 0 replies; 11+ messages in thread
From: Anthony Liguori @ 2008-04-06 22:33 UTC
To: Nikola Ciprich; +Cc: kvm-devel
Nikola Ciprich wrote:
> Anthony Liguori wrote:
>> You won't see a gain with tmpfs. Make sure you reserve huge pages
>> first. For a 1GB guest, you'll need something like:
>>
>> echo 540 > /proc/sys/vm/nr_hugepages
>>
>> When you create a VM, you need a bit more memory than 1GB for
>> per-guest overhead. That's why I reserve 540 instead of 512. You
>> can probably get away with 530 really.
>>
>> Check that it succeeded by cat'ing /proc/meminfo.
>>
> Well, I tried various values now, and booting fails immediately:
> Decompressing Linux...
It's very likely that you won't be able to allocate enough pages to run
a guest if your system has been running sufficiently long and memory is
highly fragmented.
If you cat /proc/meminfo and HugePages_Free is much less than 540,
you're not going to be able to create a 1GB guest. Unfortunately, you
won't get a failure until the guest tries to use memory.
The only solution is to reboot and reserve huge pages early before they
get fragmented.
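Since the shortfall only surfaces once the guest touches its memory, it may be worth checking HugePages_Free before launching. A minimal sketch; the function names are hypothetical, and the optional file argument exists only so the check can be run against a saved copy of /proc/meminfo. Reserving early can also be done at boot with the kernel's `hugepages=` parameter (e.g. `hugepages=540`).

```shell
#!/bin/sh
# Sketch: refuse to start a hugetlbfs-backed guest when too few huge pages
# are free. MEMINFO defaults to /proc/meminfo.

# hugepages_free [MEMINFO] -> prints the HugePages_Free count
hugepages_free() {
    awk '$1 == "HugePages_Free:" { print $2 }' "${1:-/proc/meminfo}"
}

# enough_hugepages REQUIRED [MEMINFO] -> exit status 0 if enough are free
enough_hugepages() {
    nfree=$(hugepages_free "$2")
    # A missing HugePages_Free line is treated as zero free pages.
    [ "${nfree:-0}" -ge "$1" ]
}

# Hypothetical usage before launching a 1GB guest:
#   enough_hugepages 540 || echo "too few free huge pages; reserve at boot" >&2
```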
Regards,
Anthony Liguori
> invalid compressed format (err=1)
>
> -- System halted
>
>
> weird...
> anyways I'll also give fresh kvm-65 a try now :)
>
* Re: performance tuning, problem with paravirtualized clock
2008-04-06 19:56 performance tuning, problem with paravirtualized clock Nikola Ciprich
2008-04-06 20:38 ` Anthony Liguori
@ 2008-04-06 20:49 ` Dor Laor
2008-04-06 22:25 ` Nikola Ciprich
2008-04-07 4:14 ` Avi Kivity
2008-04-07 15:41 ` Marcelo Tosatti
3 siblings, 1 reply; 11+ messages in thread
From: Dor Laor @ 2008-04-06 20:49 UTC
To: Nikola Ciprich; +Cc: kvm-devel
On Sun, 2008-04-06 at 21:56 +0200, Nikola Ciprich wrote:
> Hi,
> I spent some time trying to tune the performance of a KVM guest, using kernel
> compilation as a kind of benchmark (I use virtual machines for compiling a
> lot, so it's a good benchmark for me in general).
>
> Host machine: 2x quad core XEON E5420 @ 2.50GHz, 4GB RAM, 2.6.24 + kvm-64
> guest configuration: all 8 cores available, 2GB RAM, 2.6.24 or latest GIT
> + kvm-64
>
> some results:
> - compilation in KVM guest is roughly 2x slower than on bare metal.
> - trying various block device backends (ide, scsi, virtio_blk) didn't
> really matter much for my case
> - enabling CONFIG_KVM_GUEST under latest GIT with kvm-64 patch applied
> decreased compile time by about 10%, which is nice!
> - enabling CONFIG_KVM_CLOCK made the guest unstable, often unable to finish
> booting at all; disabling ACPI made it a bit better, but it was still quite
> unstable (cpu0 lock-ups, etc.)
>
> Is there currently anything more I could do to improve performance? I'm
> wondering what is slowing compilation: if I compare some CPU-intensive
> application (e.g. bzip2), it seems to run at nearly native speed, but kernel
> compilation is much slower even if run from a ramdisk. Maybe it could be
> improved further by tuning the scheduler, etc.?
>
Can you try a non-SMP guest and check the results?
Also if you do try smp guest, can you pin each thread to a different
physical core and re-test?
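One way to try this pinning is to give each thread of the qemu process its own host CPU with `taskset`. A minimal sketch: `pin_plan` and `qemu_tids` are hypothetical helpers, and the round-robin mapping is an assumption; it does not tell VCPU threads apart from qemu's other threads, so check which thread is which before relying on the result.

```shell
#!/bin/sh
# Sketch: print taskset commands that pin threads to host CPUs round-robin.
# The plan is only printed, so it can be reviewed before anything changes.

# pin_plan NCPUS TID... -> one "taskset -p -c CPU TID" line per thread
pin_plan() {
    ncpus=$1
    shift
    cpu=0
    for tid in "$@"; do
        echo "taskset -p -c $cpu $tid"
        cpu=$(( (cpu + 1) % ncpus ))
    done
}

# qemu_tids PID -> thread IDs of a running process, from /proc/PID/task
qemu_tids() {
    ls "/proc/$1/task"
}

# Hypothetical usage (QEMU_PID is the pid of the qemu-system-x86_64 process):
#   pin_plan "$(getconf _NPROCESSORS_ONLN)" $(qemu_tids "$QEMU_PID")
#   pin_plan "$(getconf _NPROCESSORS_ONLN)" $(qemu_tids "$QEMU_PID") | sh
```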
Regards,
Dor
> anyways keep up the good work!
> cheers!
> nik
* Re: performance tuning, problem with paravirtualized clock
2008-04-06 19:56 performance tuning, problem with paravirtualized clock Nikola Ciprich
2008-04-06 20:38 ` Anthony Liguori
2008-04-06 20:49 ` Dor Laor
@ 2008-04-07 4:14 ` Avi Kivity
2008-04-07 15:41 ` Marcelo Tosatti
3 siblings, 0 replies; 11+ messages in thread
From: Avi Kivity @ 2008-04-07 4:14 UTC
To: Nikola Ciprich; +Cc: kvm-devel
Nikola Ciprich wrote:
> Hi,
> I spent some time trying to tune the performance of a KVM guest, using kernel
> compilation as a kind of benchmark (I use virtual machines for compiling a
> lot, so it's a good benchmark for me in general).
>
> Host machine: 2x quad core XEON E5420 @ 2.50GHz, 4GB RAM, 2.6.24 + kvm-64
> guest configuration: all 8 cores available, 2GB RAM, 2.6.24 or latest GIT
> + kvm-64
>
> some results:
> - compilation in KVM guest is roughly 2x slower than on bare metal.
>
50% scaling is actually quite good for 8-way.
What do you get for 4-way guests?
> - enabling CONFIG_KVM_GUEST under latest GIT with kvm-64 patch applied
> decreased compile time by about 10%, which is nice!
>
>
We expect to improve this some more as paravirt_ops improves.
> Is there currently anything more I could do to improve performance? I'm
> wondering what is slowing compilation: if I compare some CPU-intensive
> application (e.g. bzip2), it seems to run at nearly native speed, but kernel
> compilation is much slower even if run from a ramdisk. Maybe it could be
> improved further by tuning the scheduler, etc.?
>
>
As others mentioned, large pages may help somewhat, as well as newer
hardware. kvm-65 also improves scalability.
How much cpu does the qemu process consume? Perfect utilization would
be around 800%. What's the system/user ratio (I think you need to use
2.6.25-rc to get accurate results for this)?
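Both numbers can be read from /proc. A minimal sketch, assuming Linux's /proc/PID/stat layout (field 14 is utime and field 15 is stime, in clock ticks, summed over all threads of the process) and a command name without spaces; `cpu_report`, `read_ticks` and `sample_qemu` are hypothetical names.

```shell
#!/bin/sh
# Sketch: estimate a qemu process's CPU usage and its system/user split.

# read_ticks PID -> prints "utime stime" (clock ticks) from /proc/PID/stat
read_ticks() {
    awk '{ print $14, $15 }' "/proc/$1/stat"
}

# cpu_report UTIME_DELTA STIME_DELTA TICKS_PER_SEC INTERVAL_SECS
# Prints total CPU% (800% = eight fully busy cores) and the sys/user ratio.
cpu_report() {
    awk -v u="$1" -v s="$2" -v hz="$3" -v t="$4" 'BEGIN {
        pct = (u + s) / hz / t * 100
        ratio = (u > 0) ? s / u : 0
        printf "cpu=%.0f%% sys/user=%.2f\n", pct, ratio
    }'
}

# sample_qemu PID [INTERVAL] -> samples the counters twice and reports
sample_qemu() {
    pid=$1
    interval=${2:-5}
    hz=$(getconf CLK_TCK)
    set -- $(read_ticks "$pid"); u0=$1; s0=$2
    sleep "$interval"
    set -- $(read_ticks "$pid"); u1=$1; s1=$2
    cpu_report $((u1 - u0)) $((s1 - s0)) "$hz" "$interval"
}
```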
Can you provide the result of 'kvm_stat -1' taken a few times during a
compile run?
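To collect several `kvm_stat -1` snapshots without babysitting the terminal, a small loop can run it at fixed intervals while the guest compiles. A minimal sketch; `sample_n` is a hypothetical helper, and the count and interval in the usage comment are arbitrary.

```shell
#!/bin/sh
# Sketch: run a command COUNT times, INTERVAL seconds apart, with a header
# before each sample so the snapshots are easy to tell apart in a log.

# sample_n COUNT INTERVAL CMD [ARGS...]
sample_n() {
    count=$1
    interval=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "--- sample $i ---"
        "$@"
        # No sleep after the last sample.
        if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
        i=$((i + 1))
    done
}

# Hypothetical usage on the host, once the guest build is under way:
#   sample_n 5 30 kvm_stat -1 > kvm_stat.log
```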
--
Any sufficiently difficult bug is indistinguishable from a feature.
* Re: performance tuning, problem with paravirtualized clock
2008-04-06 19:56 performance tuning, problem with paravirtualized clock Nikola Ciprich
` (2 preceding siblings ...)
2008-04-07 4:14 ` Avi Kivity
@ 2008-04-07 15:41 ` Marcelo Tosatti
2008-04-07 20:47 ` Nikola Ciprich
3 siblings, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2008-04-07 15:41 UTC
To: Nikola Ciprich; +Cc: kvm-devel
On Sun, Apr 06, 2008 at 09:56:39PM +0200, Nikola Ciprich wrote:
> Hi,
> I spent some time trying to tune the performance of a KVM guest, using kernel
> compilation as a kind of benchmark (I use virtual machines for compiling a
> lot, so it's a good benchmark for me in general).
>
> Host machine: 2x quad core XEON E5420 @ 2.50GHz, 4GB RAM, 2.6.24 + kvm-64
> guest configuration: all 8 cores available, 2GB RAM, 2.6.24 or latest GIT
> + kvm-64
>
> some results:
> - compilation in KVM guest is roughly 2x slower than on bare metal.
> - trying various block device backends (ide, scsi, virtio_blk) didn't
> really matter much for my case
> - enabling CONFIG_KVM_GUEST under latest GIT with kvm-64 patch applied
> decreased compile time by about 10%, which is nice!
> - enabling CONFIG_KVM_CLOCK made the guest unstable, often unable to finish
> booting at all; disabling ACPI made it a bit better, but it was still quite
> unstable (cpu0 lock-ups, etc.)
Can you please provide more details on this? Which kernel version are
you running on the host?
Please try -git (there was a lock inversion problem in the KVM clock
which could cause lock ups).
>
> Is there currently anything more I could do to improve performance? I'm
> wondering what is slowing compilation: if I compare some CPU-intensive
> application (e.g. bzip2), it seems to run at nearly native speed, but kernel
> compilation is much slower even if run from a ramdisk. Maybe it could be
> improved further by tuning the scheduler, etc.?
You might want to pin each guest VCPU to a physical host CPU and check
if that makes a difference.
* Re: performance tuning, problem with paravirtualized clock
2008-04-07 15:41 ` Marcelo Tosatti
@ 2008-04-07 20:47 ` Nikola Ciprich
0 siblings, 0 replies; 11+ messages in thread
From: Nikola Ciprich @ 2008-04-07 20:47 UTC
To: Marcelo Tosatti; +Cc: kvm-devel
Hi Marcelo!
>> Host machine: 2x quad core XEON E5420 @ 2.50GHz, 4GB RAM, 2.6.24 + kvm-64
>> guest configuration: all 8 cores available, 2GB RAM, 2.6.24 or latest GIT
>> + kvm-64
> Can you please provide more details on this? Which kernel version are
> you running on the host?
As I wrote, I use 2.6.24 for the host (+ kvm-65 at the moment), and I'm
trying both 2.6.24 and git for the guest.
>
> Please try -git (there was a lock inversion problem in the KVM clock
> which could cause lock ups).
Unfortunately the problem persists with today's git with kvm-65 patch
applied.
> You might want to pin each guest VCPU to a physical host CPU and check
> if that makes a difference.
It seems that processes are divided on cores quite well, but I'll try
pinning them explicitly to be sure...