* Windows Server 2008 VM performance
@ 2009-06-02 21:05 Andrew Theurer
2009-06-02 21:38 ` Avi Kivity
0 siblings, 1 reply; 7+ messages in thread
From: Andrew Theurer @ 2009-06-02 21:05 UTC (permalink / raw)
To: KVM list
I've been looking at how KVM handles Windows guests, and I am a little
concerned with the CPU overhead. My test case is as follows:
I am running 4 instances of a J2EE benchmark. Each instance needs one
application server and one DB server. 8 VMs in total are used.
I have the same App and DB software for Linux and Windows (and same
versions) so I can compare between Linux and Windows. I also have
another hypervisor which I can test both Windows and Linux VMs.
The host has EPT capable processors. VMs in KVM are backed with large
pages.
Test results:
Config                                   CPU utilization
------                                   ---------------
KVM-85
  Windows Server 2008 64-bit VMs              44.84
  RedHat 5.3 w/ 2.6.29 64-bit VMs             24.56
Other-Hypervisor
  Windows Server 2008 64-bit VMs              30.63
  RedHat 5.3 w/ 2.6.18 64-bit VMs             27.13
-KVM running Windows VMs uses 46% more CPU than the Other-Hypervisor
-The Other-Hypervisor provides an optimized virtual network driver
-KVM results listed above did not use virtio_net or virtio_disk for
Windows, but do for Linux
-One extra KVM run (not listed above) was made with virtio_net for the
Windows VMs, but it only reduced CPU by 2%
-Most of the CPU overhead could be attributed to the DB VMs, where there
are about 5 MB/sec of writes per VM
-I don't have a virtio_block driver for Windows to test. Does one exist?
-All tests above had 2 vCPUs per VM
Here's a comparison of kvm_stat between Windows (run1) and Linux (run2):
                run1     run2  run1/run2
                ----     ----  ---------
efer_relo:        0        0     1
exits    :  1206880   121916     9.899
fpu_reloa:   210969    20863    10.112
halt_exit:    15092    13222     1.141
halt_wake:    14466     9294     1.556
host_stat:   211066    45117     4.678
hypercall:        0        0     1
insn_emul:   119582    38126     3.136
insn_emul:        0        0     1
invlpg   :        0        0     1
io_exits :   131051    26349     4.974
irq_exits:     8128    12937     0.628
irq_injec:    29955    21825     1.373
irq_windo:     2504     2022     1.238
kvm_reque:        0        0     1
largepage:        1       64     0.009
mmio_exit:    59224        0     Inf
mmu_cache:        0        3     0.000
mmu_flood:        0        0     1
mmu_pde_z:        0        0     1
mmu_pte_u:        0        0     1
mmu_pte_w:        0        0     1
mmu_recyc:        0        0     1
mmu_shado:        0        0     1
mmu_unsyn:        0        0     1
mmu_unsyn:        0        0     1
nmi_injec:        0        0     1
nmi_windo:        0        0     1
pf_fixed :        1       67     0.009
pf_guest :        0        0     1
remote_tl:        0        0     1
request_n:        0        0     1
signal_ex:        0        0     1
tlb_flush:      220    14037     0.016
10x the number of exits, a problem?
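For reference, the run1/run2 column in a table like this can be reproduced
from the raw counters with a few lines of script; the dictionaries below are
hand-filled from the table above rather than parsed from real kvm_stat output:

```python
# Reproduce the run1/run2 ratio column from two sets of kvm_stat counters.
# The input dicts are hand-filled here; parsing real kvm_stat output is
# left out for brevity.

def ratios(run1, run2):
    out = {}
    for name, a in run1.items():
        b = run2.get(name, 0)
        if a == 0 and b == 0:
            out[name] = 1.0           # the tables above show 0/0 as 1
        elif b == 0:
            out[name] = float("inf")  # shown as "Inf" above
        else:
            out[name] = a / b
    return out

run1 = {"exits": 1206880, "io_exits": 131051, "mmio_exit": 59224}
run2 = {"exits": 121916, "io_exits": 26349, "mmio_exit": 0}
for name, r in ratios(run1, run2).items():
    print(f"{name:10s} {r:.3f}")
```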
I happened to try just one vCPU per VM for KVM/Windows VMs, and I was
surprised how much of a difference it made:
Config                                            CPU utilization
------                                            ---------------
KVM-85
  Windows Server 2008 64-bit VMs, 2 vCPU per VM        44.84
  Windows Server 2008 64-bit VMs, 1 vCPU per VM        36.44
A 19% reduction in CPU utilization vs KVM/Windows-2vCPU! That does not
explain all the overhead (vs Other-Hypervisor, 2 vCPUs per VM), but it
sure seems like a lot between 1 and 2 vCPUs for KVM/Windows VMs. I have
not run with 1 vCPU per VM on the Other-Hypervisor, but I will soon.
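For what it's worth, the two percentages quoted above check out against the
CPU utilization tables (a quick sanity calculation, not part of the original
measurements):

```python
# Sanity-check the quoted percentages against the CPU utilization tables.
kvm_win_2vcpu = 44.84    # KVM, Windows 2008, 2 vCPU per VM
kvm_win_1vcpu = 36.44    # KVM, Windows 2008, 1 vCPU per VM
other_win_2vcpu = 30.63  # Other-Hypervisor, Windows 2008, 2 vCPU per VM

# "KVM running Windows VMs uses 46% more CPU than the Other-Hypervisor"
more_cpu = (kvm_win_2vcpu / other_win_2vcpu - 1) * 100
print(f"{more_cpu:.0f}% more CPU than Other-Hypervisor")  # -> 46%

# "A 19% reduction in CPU utilization" going from 2 vCPUs to 1
reduction = (1 - kvm_win_1vcpu / kvm_win_2vcpu) * 100
print(f"{reduction:.0f}% reduction with 1 vCPU per VM")   # -> 19%
```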
Anyway, I also collected kvm_stat for the 1 vCPU case, and here it is
compared to KVM/Linux VMs with 2 vCPUs:
                run1     run2  run1/run2
                ----     ----  ---------
efer_relo:        0        0     1
exits    :  1184471   121916     9.715
fpu_reloa:   192766    20863     9.240
halt_exit:     4697    13222     0.355
halt_wake:     4360     9294     0.469
host_stat:   192828    45117     4.274
hypercall:        0        0     1
insn_emul:   130487    38126     3.422
insn_emul:        0        0     1
invlpg   :        0        0     1
io_exits :   114430    26349     4.343
irq_exits:     7075    12937     0.547
irq_injec:    29930    21825     1.371
irq_windo:     2391     2022     1.182
kvm_reque:        0        0     1
largepage:        0       64     0.001
mmio_exit:    69028        0     Inf
mmu_cache:        0        3     0.000
mmu_flood:        0        0     1
mmu_pde_z:        0        0     1
mmu_pte_u:        0        0     1
mmu_pte_w:        0        0     1
mmu_recyc:        0        0     1
mmu_shado:        0        0     1
mmu_unsyn:        0        0     1
mmu_unsyn:        0        0     1
nmi_injec:        0        0     1
nmi_windo:        0        0     1
pf_fixed :        0       67     0.001
pf_guest :        0        0     1
remote_tl:        0        0     1
request_n:        0        0     1
signal_ex:        0        0     1
tlb_flush:      124    14037     0.009
Still see the huge difference in vm_exits, so I guess not all is great yet.
So, what would be the best course of action for this? Is there a
virtio_block driver to test? Can we find the root cause of the exits
(is there a way to get a stack dump or something that can show where
they are coming from)?
Thanks!
-Andrew
P.S. Here is the qemu cmd line for the windows VMs:
/usr/local/bin/qemu-system-x86_64 -name newcastle-xdbt01 -hda
/dev/disk/by-id/scsi-3600a0b80000f1eb10000074f4a02b08a -net
nic,model=e1000,vlan=0,macaddr=00:50:56:00:00:03 -m 1024 -mempath
/hugetlbfs -net tap,vlan=0,ifname=tap3,script=/etc/qemu-ifup -vnc
127.0.0.1:3 -smp 2 -daemonize
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Windows Server 2008 VM performance
2009-06-02 21:05 Windows Server 2008 VM performance Andrew Theurer
@ 2009-06-02 21:38 ` Avi Kivity
2009-06-02 21:45 ` Javier Guerra
2009-06-03 14:21 ` Andrew Theurer
0 siblings, 2 replies; 7+ messages in thread
From: Avi Kivity @ 2009-06-02 21:38 UTC (permalink / raw)
To: Andrew Theurer; +Cc: KVM list
Andrew Theurer wrote:
> I've been looking at how KVM handles windows guests, and I am a little
> concerned with the CPU overhead. My test case is as follows:
>
> I am running 4 instances of a J2EE benchmark. Each instance needs one
> application server and one DB server. 8 VMs in total are used.
>
> I have the same App and DB software for Linux and Windows (and same
> versions) so I can compare between Linux and Windows. I also have
> another hypervisor which I can test both Windows and Linux VMs.
>
> The host has EPT capable processors. VMs in KVM are backed with large
> pages.
>
>
> Test results:
>
> Config CPU utilization
> ------ -----
> KVM-85
> Windows Server 2008 64-bit VMs 44.84
> RedHat 5.3 w/ 2.6.29 64-bit VMs 24.56
> Other-Hypervisor
> Windows Server 2008 64-bit VMs 30.63
> RedHat 5.3 w/ 2.6.18 64-bit VMs 27.13
>
> -KVM running Windows VMs uses 46% more CPU than the Other-Hypervisor
> -The Other-Hypervisor provides an optimized virtual network driver
> -KVM results listed above did not use virtio_net or virtio_disk for
> Windows, but do for Linux
> -One extra KVM run (not listed above) was made with virtio_net for
> Windows VMs but only reduced CPU by 2%
I think the publicly available driver is pretty old and unoptimized.
> -Most of the CPU overhead could be attributed to the DB VMs, where
> there is about 5 MB/sec writes per VM
> -I don't have a virtio_block driver for Windows to test. Does one exist?
Exists, not public though.
> -All tests above had 2 vCPUS per VM
>
>
> Here's a comparison of kvm_stat between Windows (run1) and Linux (run2):
>
> run1 run2 run1/run2
> ---- ---- ---------
> efer_relo: 0 0 1
> exits : 1206880 121916 9.899
total exits is the prime measure of course.
> fpu_reloa: 210969 20863 10.112
> halt_exit: 15092 13222 1.141
> halt_wake: 14466 9294 1.556
> host_stat: 211066 45117 4.678
host state reloads measure (approximately) exits to userspace, likely
due to the unoptimized drivers.
> hypercall: 0 0 1
> insn_emul: 119582 38126 3.136
again, lack of drivers
> insn_emul: 0 0 1
> invlpg : 0 0 1
> io_exits : 131051 26349 4.974
ditto
> irq_exits: 8128 12937 0.628
> irq_injec: 29955 21825 1.373
> irq_windo: 2504 2022 1.238
> kvm_reque: 0 0 1
> largepage: 1 64 0.009
> mmio_exit: 59224 0 Inf
wow, linux avoids mmio completely. good.
>
>
> 10x the number of exits, a problem?
_the_ problem.
>
> I happened to try just one vCPU per VM for KVM/Windows VMs, and I was
> surprised how much of a difference it made:
>
> Config CPU utilization
> ------ -----
> KVM-85
> Windows Server 2008 64-bit VMs, 2 vCPU per VM 44.84
> Windows Server 2008 64-bit VMs, 1 vCPU per VM 36.44
>
> A 19% reduction in CPU utilization vs KVM/Windows-2vCPU! Does not
> explain all the overhead (vs Other-Hypervisor, 2 vCPUs per VM) but,
> that sure seems like a lot between 1 to 2 vCPUs for KVM/Windows-VMs.
Inter-process communication is expensive. Marcelo added some
optimizations (the sending vcpu used to wait for the target vcpu, now
they don't). They're in kvm-86 (of course). You'll need 2.6.26+ on the
host for them to take effect (of course, for many features, the more
recent the host the faster kvm on top can run).
Windows 2008 also implements some hypervisor accelerations which are
especially useful on smp. Gleb has started some work on this. I don't
know if other_hypervisor implements them.
Finally, SMP is expensive! Data moves across caches, processors wait for
each other, etc.
> I have not run with 1 vCPU per VM with Other-Hypervisor, but I will
> soon. Anyway, I also collected kvm_stat for the 1 vCPU case, and here
> it is compared to KVM/Linux VMs with 2 vCPUs:
>
> run1 run2 run1/run2
> ---- ---- ---------
> efer_relo: 0 0 1
> exits : 1184471 121916 9.715
>
> Still see the huge difference in vm_exits, so I guess not all is great
> yet.
Yeah, the exit rate stayed the same, so it's probably IPC costs and
intrinsic SMP costs.
> So, what would be the best course of action for this?
> Is there a virtio_block driver to test?
There is, but it isn't available yet.
> Can we find the root cause of the exits (is there a way to get a stack
> dump or something that can show where they are coming from)?
Marcelo is working on a super-duper easy to use kvm trace which can show
what's going on. The old one is reasonably easy though it exports less
data. If you can generate some traces, I'll have a look at them.
>
> P.S. Here is the qemu cmd line for the windows VMs:
> /usr/local/bin/qemu-system-x86_64 -name newcastle-xdbt01 -hda
> /dev/disk/by-id/scsi-3600a0b80000f1eb10000074f4a02b08a
Use: -drive file=/dev/very-long-name,cache=none instead of -hda to disable
the host cache. Won't make much of a difference for 5 MB/s though.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: Windows Server 2008 VM performance
2009-06-02 21:38 ` Avi Kivity
@ 2009-06-02 21:45 ` Javier Guerra
2009-06-02 21:48 ` Avi Kivity
2009-06-03 14:21 ` Andrew Theurer
1 sibling, 1 reply; 7+ messages in thread
From: Javier Guerra @ 2009-06-02 21:45 UTC (permalink / raw)
To: Avi Kivity; +Cc: Andrew Theurer, KVM list
On Tue, Jun 2, 2009 at 4:38 PM, Avi Kivity <avi@redhat.com> wrote:
> Andrew Theurer wrote:
>> P.S. Here is the qemu cmd line for the windows VMs:
>> /usr/local/bin/qemu-system-x86_64 -name newcastle-xdbt01 -hda
>> /dev/disk/by-id/scsi-3600a0b80000f1eb10000074f4a02b08a
>
> Use: -drive file=/dev/very-long-name,cache=none instead of -hda to disable
> the host cache. Won't make much of a difference for 5 MB/s though.
Would emulating SCSI instead of IDE help while we hope and wait for
virtio_block drivers (and a better virtio_net)?
--
Javier
* Re: Windows Server 2008 VM performance
2009-06-02 21:45 ` Javier Guerra
@ 2009-06-02 21:48 ` Avi Kivity
2009-06-02 22:15 ` Ryan Harper
0 siblings, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2009-06-02 21:48 UTC (permalink / raw)
To: Javier Guerra; +Cc: Andrew Theurer, KVM list
Javier Guerra wrote:
> On Tue, Jun 2, 2009 at 4:38 PM, Avi Kivity <avi@redhat.com> wrote:
>
>> Andrew Theurer wrote:
>>
>>> P.S. Here is the qemu cmd line for the windows VMs:
>>> /usr/local/bin/qemu-system-x86_64 -name newcastle-xdbt01 -hda
>>> /dev/disk/by-id/scsi-3600a0b80000f1eb10000074f4a02b08a
>>>
>> Use: -drive file=/dev/very-long-name,cache=none instead of -hda to disable
>> the host cache. Won't make much of a difference for 5 MB/s though.
>>
>
> would emulating SCSI instead of IDE help while we hope and wait for
> virtio_block drivers? (and better virtio_net).
>
I don't think the SCSI emulation currently supports command parallelism,
but I could be wrong.
Worth trying.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: Windows Server 2008 VM performance
2009-06-02 21:48 ` Avi Kivity
@ 2009-06-02 22:15 ` Ryan Harper
0 siblings, 0 replies; 7+ messages in thread
From: Ryan Harper @ 2009-06-02 22:15 UTC (permalink / raw)
To: Avi Kivity; +Cc: Javier Guerra, Andrew Theurer, KVM list
* Avi Kivity <avi@redhat.com> [2009-06-02 16:49]:
> Javier Guerra wrote:
> >On Tue, Jun 2, 2009 at 4:38 PM, Avi Kivity <avi@redhat.com> wrote:
> >
> >>Andrew Theurer wrote:
> >>
> >>>P.S. Here is the qemu cmd line for the windows VMs:
> >>>/usr/local/bin/qemu-system-x86_64 -name newcastle-xdbt01 -hda
> >>>/dev/disk/by-id/scsi-3600a0b80000f1eb10000074f4a02b08a
> >>>
> >>Use: -drive file=/dev/very-long-name,cache=none instead of -hda to disable
> >>the host cache. Won't make much of a difference for 5 MB/s though.
> >>
> >
> >would emulating SCSI instead of IDE help while we hope and wait for
> >virtio_block drivers? (and better virtio_net).
> >
>
> I don't think the scsi emulation currently supports command parallelism,
> but I could be wrong.
>
For reads only, and I don't think Win2k8 installs to LSI SCSI actually
complete.
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com
* Re: Windows Server 2008 VM performance
2009-06-02 21:38 ` Avi Kivity
2009-06-02 21:45 ` Javier Guerra
@ 2009-06-03 14:21 ` Andrew Theurer
2009-06-03 17:38 ` Marcelo Tosatti
1 sibling, 1 reply; 7+ messages in thread
From: Andrew Theurer @ 2009-06-03 14:21 UTC (permalink / raw)
To: Avi Kivity; +Cc: KVM list
Avi Kivity wrote:
> Andrew Theurer wrote:
>
>
>> Is there a virtio_block driver to test?
>
> There is, but it isn't available yet.
OK. Can I assume a better virtio_net driver is in the works as well?
>
>> Can we find the root cause of the exits (is there a way to get a stack
>> dump or something that can show where they are coming from)?
>
> Marcelo is working on a super-duper easy to use kvm trace which can
> show what's going on. The old one is reasonably easy though it
> exports less data. If you can generate some traces, I'll have a look
> at them.
Thanks Avi. I'll try out kvm-86 and see if I can generate some kvm
trace data.
-Andrew
* Re: Windows Server 2008 VM performance
2009-06-03 14:21 ` Andrew Theurer
@ 2009-06-03 17:38 ` Marcelo Tosatti
0 siblings, 0 replies; 7+ messages in thread
From: Marcelo Tosatti @ 2009-06-03 17:38 UTC (permalink / raw)
To: Andrew Theurer; +Cc: Avi Kivity, KVM list
[-- Attachment #1: Type: text/plain, Size: 1607 bytes --]
On Wed, Jun 03, 2009 at 09:21:16AM -0500, Andrew Theurer wrote:
> Avi Kivity wrote:
>> Andrew Theurer wrote:
>>
>>
>>> Is there a virtio_block driver to test?
>>
>> There is, but it isn't available yet.
> OK. Can I assume a better virtio_net driver is in the works as well?
>>
>>> Can we find the root cause of the exits (is there a way to get a stack
>>> dump or something that can show where they are coming from)?
>>
>> Marcelo is working on a super-duper easy to use kvm trace which can
>> show what's going on. The old one is reasonably easy though it
>> exports less data. If you can generate some traces, I'll have a look
>> at them.
>
> Thanks Avi. I'll try out kvm-86 and see if I can generate some kvm
> trace data.
Clone
git://git.kernel.org/pub/scm/linux/kernel/git/marcelo/linux-2.6-x86-kvmtrace.git,
check out the (remote) branch kvmtrace, and _build with KVM=y, KVM_{INTEL,AMD}=y_
(there's a missing export symbol; will pull the upstream fix ASAP).
Then, once running the benchmark with this kernel, on a phase that's
indicative of the workload, do:
echo kvm_exit > /debugfs/tracing/set_event
sleep n
cat /debugfs/tracing/trace > /tmp/trace-save.txt
Make n relatively small, 1 or 2 seconds. Would be nice to see UP vs SMP
Win2008 guests.
echo > /debugfs/tracing/set_event to stop tracing
echo > /debugfs/tracing/trace to zero the trace buffer
With some post-processing you can then get the exit reason percentages.
Alternatively you can use systemtap (see attached script; it needs some
adjustment in the entry/exit line number probes) so it calculates the
exit percentages for you.
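As a sketch of that post-processing step (this assumes the kvm_exit
tracepoint lines contain "reason <NAME>", which may need adjusting for the
actual trace format):

```python
# Tally exit reasons from a saved ftrace log and print percentages.
# The "kvm_exit: reason <NAME>" pattern is an assumption about the
# tracepoint's output format; adjust the regex if yours differs.
import re
from collections import Counter

def exit_percentages(lines):
    counts = Counter()
    for line in lines:
        m = re.search(r"kvm_exit:\s+reason\s+(\S+)", line)
        if m:
            counts[m.group(1)] += 1
    total = sum(counts.values())
    return [(name, 100.0 * n / total) for name, n in counts.most_common()]

# Typical use:
#   for name, pct in exit_percentages(open("/tmp/trace-save.txt")):
#       print(f"{name:24s} {pct:6.2f}%")
```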
[-- Attachment #2: kvm.stp --]
[-- Type: text/plain, Size: 1722 bytes --]
global exit_names[70]
global latency[70]
global vmexit
global vmentry
global vmx_exit_reason
// run stap -t kvm.stp to find out the overhead of which probe
// and change it here. should be automatic!
global exit_probe_overhead = 645
global entry_probe_overhead = 882
global handlexit_probe_overhead = 405
global overhead
probe begin (1)
{
	overhead = 882 + 645 + 405

	exit_names[0]  = "EXCEPTION"
	exit_names[1]  = "EXTERNAL INT"
	exit_names[7]  = "PENDING INTERRUPT"
	exit_names[10] = "CPUID"
	exit_names[18] = "HYPERCALL"
	exit_names[28] = "CR ACCESS"
	exit_names[29] = "DR ACCESS"
	exit_names[30] = "IO INSTRUCTION (LIGHT)"
	exit_names[31] = "MSR READ"
	exit_names[32] = "MSR WRITE"
	exit_names[44] = "APIC ACCESS"
	exit_names[60] = "IO INSTRUCTION (HEAVY)"
}

probe module("kvm_intel").statement("kvm_handle_exit@arch/x86/kvm/vmx.c:3011")
{
	vmx_exit_reason = $exit_reason
}

// exit
probe module("kvm_intel").statement("vmx_vcpu_run@arch/x86/kvm/vmx.c:3226")
{
	vmexit = get_cycles()
}

// entry
probe module("kvm_intel").function("vmx_vcpu_run")
{
	vmentry = get_cycles()
	if (vmx_exit_reason != 12)
		latency[vmx_exit_reason] <<< vmentry - vmexit - overhead
}

// heavy-exit
probe module("kvm").function("kvm_arch_vcpu_ioctl_run")
{
	vmx_exit_reason = 60
}

probe end {
	foreach (x in latency-) {
		printf("%d: %s\n", x, exit_names[x])
		printf("avg %d = sum %d / count %d\n",
		       @avg(latency[x]), @sum(latency[x]), @count(latency[x]))
		printf("min %d max %d\n", @min(latency[x]), @max(latency[x]))
		print(@hist_linear(latency[x], 10000000, 50000000, 10000000))
	}
	print("\n")
}