Performace data when running Windows VMs

* Performace data when running Windows VMs
@ 2009-08-26 14:57 Andrew Theurer
  2009-08-26 15:44 ` Avi Kivity
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Theurer @ 2009-08-26 14:57 UTC (permalink / raw)
  To: kvm

I recently gathered some performance data when running Windows Server
2008 VMs, and I wanted to share it here.  There are 12 Windows Server
2008 64-bit VMs (1 vcpu, 2 GB each) running, which handle the concurrent
execution of 6 J2EE-type benchmarks.  Each benchmark needs an App VM and
a Database VM.  The benchmark clients inject a fixed rate of requests
which yields X% CPU utilization on the host.  A different hypervisor was
compared; KVM used about 60% more CPU cycles to complete the same amount
of work.  Both had their hypervisor-specific paravirt IO drivers in the
VMs.

Server is a 2 socket Core/i7, SMT off, with 72 GB memory

Host kernel used was kvm.git v2.6.31-rc3-3419-g6df4865
Qemu was kvm-87.  I tried a few newer versions of Qemu; none of them
worked with the RedHat virtIO Windows drivers.  I tried:

f3600c589a9ee5ea4c0fec74ed4e06a15b461d52
0.11.0-rc1
0.10.6
kvm-88

All but 0.10.6 had "Problem code 10" driver error in the VM.  0.10.6 had
"a disk read error occurred" very early in the booting of the VM.

I/O on the host was not what I would call very high:  outbound network
averaged 163 Mbit/s and inbound 8 Mbit/s, while disk read ops were
243/sec and write ops were 561/sec.

Host CPU breakdown was the following:

user  nice  system irq  softirq guest  idle  iowait
5.67  0.00  11.64  0.09 1.05    31.90  46.06 3.59


The amount of kernel time had me concerned.  Here is oprofile:


> samples  %        app name                 symbol name
> 1163422  52.3744  kvm-intel.ko             vmx_vcpu_run
> 103996    4.6816  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_set_debugreg
> 81036     3.6480  kvm.ko                   kvm_arch_vcpu_ioctl_run
> 37913     1.7068  qemu-system-x86_64       cpu_physical_memory_rw
> 34720     1.5630  qemu-system-x86_64       phys_page_find_alloc
> 23234     1.0459  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_write_msr_safe
> 20964     0.9437  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_get_debugreg
> 17628     0.7936  libc-2.5.so              memcpy
> 16587     0.7467  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __down_read
> 15681     0.7059  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __up_read
> 15466     0.6962  kvm.ko                   find_highest_vector
> 14611     0.6578  qemu-system-x86_64       qemu_get_ram_ptr
> 11254     0.5066  kvm-intel.ko             vmcs_writel
> 11133     0.5012  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 copy_user_generic_string
> 10917     0.4915  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_read_msr_safe
> 10760     0.4844  qemu-system-x86_64       virtqueue_get_head
> 9025      0.4063  kvm-intel.ko             vmx_handle_exit
> 8953      0.4030  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 schedule
> 8753      0.3940  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 fget_light
> 8465      0.3811  qemu-system-x86_64       virtqueue_avail_bytes
> 8185      0.3685  kvm-intel.ko             handle_cr
> 8069      0.3632  kvm.ko                   kvm_set_irq
> 7697      0.3465  kvm.ko                   kvm_lapic_sync_from_vapic
> 7586      0.3415  qemu-system-x86_64       main_loop_wait
> 7480      0.3367  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 do_select
> 7121      0.3206  qemu-system-x86_64       lduw_phys
> 7003      0.3153  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 audit_syscall_exit
> 6062      0.2729  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 kfree
> 5477      0.2466  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 fput
> 5454      0.2455  kvm.ko                   kvm_lapic_get_cr8
> 5096      0.2294  kvm.ko                   kvm_load_guest_fpu
> 5057      0.2277  kvm.ko                   apic_update_ppr
> 4929      0.2219  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 up_read
> 4900      0.2206  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 audit_syscall_entry
> 4866      0.2191  kvm.ko                   kvm_apic_has_interrupt
> 4670      0.2102  kvm-intel.ko             skip_emulated_instruction
> 4644      0.2091  kvm.ko                   kvm_cpu_has_interrupt
> 4548      0.2047  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __switch_to
> 4328      0.1948  kvm.ko                   kvm_apic_accept_pic_intr
> 4303      0.1937  libpthread-2.5.so        pthread_mutex_lock
> 4235      0.1906  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 system_call
> 4175      0.1879  kvm.ko                   kvm_put_guest_fpu
> 4170      0.1877  qemu-system-x86_64       ldl_phys
> 4098      0.1845  kvm-intel.ko             vmx_set_interrupt_shadow
> 4003      0.1802  qemu-system-x86_64       kvm_run

I was wondering why the get/set debugreg was so high.  I don't recall
seeing this much with Linux VMs.
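
To put a number on the debug register cost, here is a minimal sketch in Python that adds up the oprofile percentages for the two debugreg accessors. The values are copied from the listing above; the dictionary and function names are illustrative only and not part of oprofile.

```python
# Percent-of-samples values copied from the oprofile listing above.
PROFILE_PERCENT = {
    "vmx_vcpu_run": 52.3744,
    "native_set_debugreg": 4.6816,
    "native_get_debugreg": 0.9437,
}


def combined_share(symbols, profile=PROFILE_PERCENT):
    """Sum the percentage of samples attributed to the given symbols."""
    return sum(profile[name] for name in symbols)


debugreg_share = combined_share(["native_set_debugreg", "native_get_debugreg"])

# Estimate the total sample count from the first row of the listing:
# 1163422 samples were reported as 52.3744 % of the total.
total_samples = 1163422 / (PROFILE_PERCENT["vmx_vcpu_run"] / 100)

print(f"debugreg accessors: {debugreg_share:.2f}% of samples")
print(f"estimated total samples: {total_samples:,.0f}")
```

The same helper could be pointed at other groups of symbols from the listing, for example the read-semaphore functions, by adding their rows to the dictionary.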

Here is an average of kvm_stat:


> efer_relo  0
> exits      1262814
> fpu_reloa  103842
> halt_exit  9918
> halt_wake  9763
> host_stat  103846
> hypercall  0
> insn_emul  23277
> insn_emul  23277
> invlpg     0
> io_exits   82717
> irq_exits  12797
> irq_injec  18806
> irq_windo  1194
> largepage  12
> mmio_exit  0
> mmu_cache  0
> mmu_flood  0
> mmu_pde_z  0
> mmu_pte_u  0
> mmu_pte_w  0
> mmu_recyc  0
> mmu_shado  0
> mmu_unsyn  0
> nmi_injec  0
> nmi_windo  0
> pf_fixed   12
> pf_guest   0
> remote_tl  0
> request_i  0
> signal_ex  0
> tlb_flush  0

For 12 VMs, does the number of exits/sec seem reasonable?

Comments?

Thanks,

-Andrew


* Re: Performace data when running Windows VMs
  2009-08-26 14:57 Performace data when running Windows VMs Andrew Theurer
@ 2009-08-26 15:44 ` Avi Kivity
  2009-08-26 16:14   ` Andrew Theurer
  0 siblings, 1 reply; 8+ messages in thread
From: Avi Kivity @ 2009-08-26 15:44 UTC (permalink / raw)
  To: habanero; +Cc: kvm, Yan Vugenfirer

On 08/26/2009 05:57 PM, Andrew Theurer wrote:
> I recently gathered some performance data when running Windows Server
> 2008 VMs, and I wanted to share it here.  There are 12 Windows
> Server2008 64-bit VMs (1 vcpu, 2 GB) running which handle the concurrent
> execution of 6 J2EE type benchmarks.  Each benchmark needs a App VM and
> a Database VM.  The benchmark clients inject a fixed rate of requests
> which yields X% CPU utilization on the host.  A different hypervisor was
> compared; KVM used about 60% more CPU cycles to complete the same amount
> of work.  Both had their hypervisor specific paravirt IO drivers in the
> VMs.
>
> Server is a 2 socket Core/i7, SMT off, with 72 GB memory
>    

Did you use large pages?

> Host kernel used was kvm.git v2.6.31-rc3-3419-g6df4865
> Qemu was kvm-87.  I tried a few newer versions of Qemu; none of them
> worked with the RedHat virtIO Windows drivers.  I tried:
>
> f3600c589a9ee5ea4c0fec74ed4e06a15b461d52
> 0.11.0-rc1
> 0.10.6
> kvm-88
>
> All but 0.10.6 had "Problem code 10" driver error in the VM.  0.10.6 had
> "a disk read error occurred" very early in the booting of the VM.
>    

Yan?

> I/O on the host was not what I would call very high:  outbound network
> averaged at 163 Mbit/s inbound was 8 Mbit/s, while disk read ops was
> 243/sec and write ops was 561/sec
>    

What was the disk bandwidth used?  Presumably, direct access to the 
volume with cache=off?

linux-aio should help reduce cpu usage.

> Host CPU breakdown was the following:
>
> user  nice  system irq  softirq guest  idle  iowait
> 5.67  0.00  11.64  0.09 1.05    31.90  46.06 3.59
>
>
> The amount of kernel time had me concerned.  Here is oprofile:
>    

user+system is about 55% of guest time, and it's all overhead.
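
As a quick check of that ratio, here is a minimal sketch in Python using the host CPU breakdown quoted above. The variable names are illustrative, and treating user+system as pure virtualization overhead is the assumption made in the comment above, not something the script verifies.

```python
# Host CPU breakdown (percent of total CPU time), copied from the table above.
cpu = {
    "user": 5.67,
    "nice": 0.00,
    "system": 11.64,
    "irq": 0.09,
    "softirq": 1.05,
    "guest": 31.90,
    "idle": 46.06,
    "iowait": 3.59,
}

# Sanity check: the columns should add up to roughly 100 %.
total = sum(cpu.values())

# Host-side time spent outside guest mode, relative to time spent in the guest.
overhead = cpu["user"] + cpu["system"]
overhead_vs_guest = overhead / cpu["guest"]

print(f"columns sum to {total:.2f}%")
print(f"user+system = {overhead:.2f}%, which is {overhead_vs_guest:.0%} of guest time")
```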

>> samples  %        app name                 symbol name
>> 1163422  52.3744  kvm-intel.ko             vmx_vcpu_run
>> 103996    4.6816  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_set_debugreg
>> 81036     3.6480  kvm.ko                   kvm_arch_vcpu_ioctl_run
>> 37913     1.7068  qemu-system-x86_64       cpu_physical_memory_rw
>> 34720     1.5630  qemu-system-x86_64       phys_page_find_alloc
>>      

We should really optimize these two.

>> 23234     1.0459  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_write_msr_safe
>> 20964     0.9437  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_get_debugreg
>> 17628     0.7936  libc-2.5.so              memcpy
>> 16587     0.7467  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __down_read
>> 15681     0.7059  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __up_read
>> 15466     0.6962  kvm.ko                   find_highest_vector
>> 14611     0.6578  qemu-system-x86_64       qemu_get_ram_ptr
>> 11254     0.5066  kvm-intel.ko             vmcs_writel
>> 11133     0.5012  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 copy_user_generic_string
>> 10917     0.4915  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_read_msr_safe
>> 10760     0.4844  qemu-system-x86_64       virtqueue_get_head
>> 9025      0.4063  kvm-intel.ko             vmx_handle_exit
>> 8953      0.4030  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 schedule
>> 8753      0.3940  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 fget_light
>> 8465      0.3811  qemu-system-x86_64       virtqueue_avail_bytes
>> 8185      0.3685  kvm-intel.ko             handle_cr
>> 8069      0.3632  kvm.ko                   kvm_set_irq
>> 7697      0.3465  kvm.ko                   kvm_lapic_sync_from_vapic
>> 7586      0.3415  qemu-system-x86_64       main_loop_wait
>> 7480      0.3367  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 do_select
>> 7121      0.3206  qemu-system-x86_64       lduw_phys
>> 7003      0.3153  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 audit_syscall_exit
>> 6062      0.2729  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 kfree
>> 5477      0.2466  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 fput
>> 5454      0.2455  kvm.ko                   kvm_lapic_get_cr8
>> 5096      0.2294  kvm.ko                   kvm_load_guest_fpu
>> 5057      0.2277  kvm.ko                   apic_update_ppr
>> 4929      0.2219  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 up_read
>> 4900      0.2206  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 audit_syscall_entry
>> 4866      0.2191  kvm.ko                   kvm_apic_has_interrupt
>> 4670      0.2102  kvm-intel.ko             skip_emulated_instruction
>> 4644      0.2091  kvm.ko                   kvm_cpu_has_interrupt
>> 4548      0.2047  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __switch_to
>> 4328      0.1948  kvm.ko                   kvm_apic_accept_pic_intr
>> 4303      0.1937  libpthread-2.5.so        pthread_mutex_lock
>> 4235      0.1906  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 system_call
>> 4175      0.1879  kvm.ko                   kvm_put_guest_fpu
>> 4170      0.1877  qemu-system-x86_64       ldl_phys
>> 4098      0.1845  kvm-intel.ko             vmx_set_interrupt_shadow
>> 4003      0.1802  qemu-system-x86_64       kvm_run
>>      
> I was wondering why the get/set debugreg was so high.  I don't recall
> seeing this much with Linux VMs.
>    

Could it be that Windows uses the debug registers?  Maybe we're 
incorrectly deciding to switch them.

Apart from that, nothing really stands out.  We'll just have to optimize 
things one by one.

> Here is an average of kvm_stat:
>
>
>    
>> efer_relo  0
>> exits      1262814
>>      

100K exits/sec/vm.  This is high.

>> fpu_reloa  103842
>>      

So is this -- maybe we're misdetecting fpu usage on EPT.

>> halt_exit  9918
>> halt_wake  9763
>> host_stat  103846
>>      

This is presumably due to virtio in qemu.

>> hypercall  0
>> insn_emul  23277
>> insn_emul  23277
>> invlpg     0
>> io_exits   82717
>>      

Yes, it is.

>> irq_exits  12797
>> irq_injec  18806
>> irq_windo  1194
>> largepage  12
>> mmio_exit  0
>> mmu_cache  0
>> mmu_flood  0
>> mmu_pde_z  0
>> mmu_pte_u  0
>> mmu_pte_w  0
>> mmu_recyc  0
>> mmu_shado  0
>> mmu_unsyn  0
>> nmi_injec  0
>> nmi_windo  0
>> pf_fixed   12
>> pf_guest   0
>> remote_tl  0
>> request_i  0
>> signal_ex  0
>> tlb_flush  0
>>      
> For 12 VMs, do the number of exits/sec seem reasonable?
>
> Comments?
>    

Not all of the exits are accounted for, so we're missing a big part of 
the picture.  2.6.32 will have better statistics through ftrace.
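
To illustrate how much of the picture is missing, here is a rough sketch in Python that compares the total exit rate with the counters that look like individual exit reasons. Which counters count as exit reasons is an assumption made for this example (insn_emul is left out because emulation happens as a consequence of other exits); the names are the truncated ones printed by kvm_stat above.

```python
NUM_VMS = 12

# Per-second rates copied from the kvm_stat output above.
KVM_STAT = {
    "exits": 1262814,
    "halt_exit": 9918,
    "hypercall": 0,
    "io_exits": 82717,
    "irq_exits": 12797,
    "irq_windo": 1194,
    "mmio_exit": 0,
    "nmi_windo": 0,
    "pf_fixed": 12,
    "pf_guest": 0,
    "signal_ex": 0,
}

# Assumption: each of these counters corresponds to a distinct exit reason.
EXIT_REASON_COUNTERS = [name for name in KVM_STAT if name != "exits"]

exits_per_vm = KVM_STAT["exits"] / NUM_VMS
accounted = sum(KVM_STAT[name] for name in EXIT_REASON_COUNTERS)
unaccounted = KVM_STAT["exits"] - accounted
accounted_share = accounted / KVM_STAT["exits"]

print(f"exits per VM per second: {exits_per_vm:,.1f}")
print(f"accounted for by listed counters: {accounted:,} ({accounted_share:.1%})")
print(f"not attributed to any listed counter: {unaccounted:,}")
```

Under that assumption, well over a million exits per second are not attributed to any of the listed counters, which matches the remark that the statistics are incomplete.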

-- 
error compiling committee.c: too many arguments to function


* Re: Performace data when running Windows VMs
  2009-08-26 15:44 ` Avi Kivity
@ 2009-08-26 16:14   ` Andrew Theurer
  2009-08-26 16:26     ` Avi Kivity
  2009-08-26 16:27     ` Brian Jackson
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Theurer @ 2009-08-26 16:14 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Yan Vugenfirer

On Wed, 2009-08-26 at 18:44 +0300, Avi Kivity wrote:
> On 08/26/2009 05:57 PM, Andrew Theurer wrote:
> > I recently gathered some performance data when running Windows Server
> > 2008 VMs, and I wanted to share it here.  There are 12 Windows
> > Server2008 64-bit VMs (1 vcpu, 2 GB) running which handle the concurrent
> > execution of 6 J2EE type benchmarks.  Each benchmark needs a App VM and
> > a Database VM.  The benchmark clients inject a fixed rate of requests
> > which yields X% CPU utilization on the host.  A different hypervisor was
> > compared; KVM used about 60% more CPU cycles to complete the same amount
> > of work.  Both had their hypervisor specific paravirt IO drivers in the
> > VMs.
> >
> > Server is a 2 socket Core/i7, SMT off, with 72 GB memory
> >    
> 
> Did you use large pages?

Yes.
> 
> > Host kernel used was kvm.git v2.6.31-rc3-3419-g6df4865
> > Qemu was kvm-87.  I tried a few newer versions of Qemu; none of them
> > worked with the RedHat virtIO Windows drivers.  I tried:
> >
> > f3600c589a9ee5ea4c0fec74ed4e06a15b461d52
> > 0.11.0-rc1
> > 0.10.6
> > kvm-88
> >
> > All but 0.10.6 had "Problem code 10" driver error in the VM.  0.10.6 had
> > "a disk read error occurred" very early in the booting of the VM.
> >    
> 
> Yan?
> 
> > I/O on the host was not what I would call very high:  outbound network
> > averaged at 163 Mbit/s inbound was 8 Mbit/s, while disk read ops was
> > 243/sec and write ops was 561/sec
> >    
> 
> What was the disk bandwidth used?  Presumably, direct access to the 
> volume with cache=off?

2.4 MB/sec write, 0.6MB/sec read, cache=none
The VMs' boot disks are IDE, but apps use their second disk which is
virtio.

> linux-aio should help reduce cpu usage.

I assume this is in a newer version of Qemu?

> > Host CPU breakdown was the following:
> >
> > user  nice  system irq  softirq guest  idle  iowait
> > 5.67  0.00  11.64  0.09 1.05    31.90  46.06 3.59
> >
> >
> > The amount of kernel time had me concerned.  Here is oprofile:
> >    
> 
> user+system is about 55% of guest time, and it's all overhead.
> 
> >> samples  %        app name                 symbol name
> >> 1163422  52.3744  kvm-intel.ko             vmx_vcpu_run
> >> 103996    4.6816  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_set_debugreg
> >> 81036     3.6480  kvm.ko                   kvm_arch_vcpu_ioctl_run
> >> 37913     1.7068  qemu-system-x86_64       cpu_physical_memory_rw
> >> 34720     1.5630  qemu-system-x86_64       phys_page_find_alloc
> >>      
> 
> We should really optimize these two.
> 
> >> 23234     1.0459  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_write_msr_safe
> >> 20964     0.9437  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_get_debugreg
> >> 17628     0.7936  libc-2.5.so              memcpy
> >> 16587     0.7467  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __down_read
> >> 15681     0.7059  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __up_read
> >> 15466     0.6962  kvm.ko                   find_highest_vector
> >> 14611     0.6578  qemu-system-x86_64       qemu_get_ram_ptr
> >> 11254     0.5066  kvm-intel.ko             vmcs_writel
> >> 11133     0.5012  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 copy_user_generic_string
> >> 10917     0.4915  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_read_msr_safe
> >> 10760     0.4844  qemu-system-x86_64       virtqueue_get_head
> >> 9025      0.4063  kvm-intel.ko             vmx_handle_exit
> >> 8953      0.4030  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 schedule
> >> 8753      0.3940  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 fget_light
> >> 8465      0.3811  qemu-system-x86_64       virtqueue_avail_bytes
> >> 8185      0.3685  kvm-intel.ko             handle_cr
> >> 8069      0.3632  kvm.ko                   kvm_set_irq
> >> 7697      0.3465  kvm.ko                   kvm_lapic_sync_from_vapic
> >> 7586      0.3415  qemu-system-x86_64       main_loop_wait
> >> 7480      0.3367  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 do_select
> >> 7121      0.3206  qemu-system-x86_64       lduw_phys
> >> 7003      0.3153  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 audit_syscall_exit
> >> 6062      0.2729  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 kfree
> >> 5477      0.2466  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 fput
> >> 5454      0.2455  kvm.ko                   kvm_lapic_get_cr8
> >> 5096      0.2294  kvm.ko                   kvm_load_guest_fpu
> >> 5057      0.2277  kvm.ko                   apic_update_ppr
> >> 4929      0.2219  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 up_read
> >> 4900      0.2206  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 audit_syscall_entry
> >> 4866      0.2191  kvm.ko                   kvm_apic_has_interrupt
> >> 4670      0.2102  kvm-intel.ko             skip_emulated_instruction
> >> 4644      0.2091  kvm.ko                   kvm_cpu_has_interrupt
> >> 4548      0.2047  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __switch_to
> >> 4328      0.1948  kvm.ko                   kvm_apic_accept_pic_intr
> >> 4303      0.1937  libpthread-2.5.so        pthread_mutex_lock
> >> 4235      0.1906  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 system_call
> >> 4175      0.1879  kvm.ko                   kvm_put_guest_fpu
> >> 4170      0.1877  qemu-system-x86_64       ldl_phys
> >> 4098      0.1845  kvm-intel.ko             vmx_set_interrupt_shadow
> >> 4003      0.1802  qemu-system-x86_64       kvm_run
> >>      
> > I was wondering why the get/set debugreg was so high.  I don't recall
> > seeing this much with Linux VMs.
> >    
> 
> Could it be that Windows uses the debug registers?  Maybe we're 
> incorrectly deciding to switch them.

I was wondering about that.  I was thinking of just backing out the
support for debugregs and seeing what happens.

Did the up/down_read seem kind of high?  Are we doing a lot of locking?

> 
> Apart from that, nothing really stands out.  We'll just have to optimize 
> things one by one.
> 
> > Here is an average of kvm_stat:
> >
> >
> >    
> >> efer_relo  0
> >> exits      1262814
> >>      
> 
> 100K exits/sec/vm.  This is high.
> 
> >> fpu_reloa  103842
> >>      
> 
> So is this -- maybe we're misdetecting fpu usage on EPT.
> 
> >> halt_exit  9918
> >> halt_wake  9763
> >> host_stat  103846
> >>      
> 
> This is presumably due to virtio in qemu.
> 
> >> hypercall  0
> >> insn_emul  23277
> >> insn_emul  23277
> >> invlpg     0
> >> io_exits   82717
> >>      
> 
> Yes, it is.
> 
> >> irq_exits  12797
> >> irq_injec  18806
> >> irq_windo  1194
> >> largepage  12
> >> mmio_exit  0
> >> mmu_cache  0
> >> mmu_flood  0
> >> mmu_pde_z  0
> >> mmu_pte_u  0
> >> mmu_pte_w  0
> >> mmu_recyc  0
> >> mmu_shado  0
> >> mmu_unsyn  0
> >> nmi_injec  0
> >> nmi_windo  0
> >> pf_fixed   12
> >> pf_guest   0
> >> remote_tl  0
> >> request_i  0
> >> signal_ex  0
> >> tlb_flush  0
> >>      
> > For 12 VMs, do the number of exits/sec seem reasonable?
> >
> > Comments?
> >    
> 
> Not all of the exits are accounted for, so we're missing a big part of 
> the picture.  2.6.32 will have better statistics through ftrace.

Thanks for the comments!

-Andrew


* Re: Performace data when running Windows VMs
  2009-08-26 16:14   ` Andrew Theurer
@ 2009-08-26 16:26     ` Avi Kivity
  2009-08-26 17:51       ` Andrew Theurer
  2009-08-26 16:27     ` Brian Jackson
  1 sibling, 1 reply; 8+ messages in thread
From: Avi Kivity @ 2009-08-26 16:26 UTC (permalink / raw)
  To: habanero; +Cc: kvm, Yan Vugenfirer

On 08/26/2009 07:14 PM, Andrew Theurer wrote:
> On Wed, 2009-08-26 at 18:44 +0300, Avi Kivity wrote:
>    
>> On 08/26/2009 05:57 PM, Andrew Theurer wrote:
>>      
>>> I recently gathered some performance data when running Windows Server
>>> 2008 VMs, and I wanted to share it here.  There are 12 Windows
>>> Server2008 64-bit VMs (1 vcpu, 2 GB) running which handle the concurrent
>>> execution of 6 J2EE type benchmarks.  Each benchmark needs a App VM and
>>> a Database VM.  The benchmark clients inject a fixed rate of requests
>>> which yields X% CPU utilization on the host.  A different hypervisor was
>>> compared; KVM used about 60% more CPU cycles to complete the same amount
>>> of work.  Both had their hypervisor specific paravirt IO drivers in the
>>> VMs.
>>>
>>> Server is a 2 socket Core/i7, SMT off, with 72 GB memory
>>>
>>>        
>> Did you use large pages?
>>      
> Yes.
>    

The stats show 'largepage = 12'.  Something's wrong.  There's a commit 
(7736d680) that's supposed to fix largepage support for kvm-87, maybe 
it's incomplete.

>>> I/O on the host was not what I would call very high:  outbound network
>>> averaged at 163 Mbit/s inbound was 8 Mbit/s, while disk read ops was
>>> 243/sec and write ops was 561/sec
>>>
>>>        
>> What was the disk bandwidth used?  Presumably, direct access to the
>> volume with cache=off?
>>      
> 2.4 MB/sec write, 0.6MB/sec read, cache=none
> The VMs' boot disks are IDE, but apps use their second disk which is
> virtio.
>    

Chickenfeed.

Do the network stats include interguest traffic?  I presume *all* of the 
traffic was interguest.

>> linux-aio should help reduce cpu usage.
>>      
> I assume this is in a newer version of Qemu?
>    

No, posted and awaiting merge.

>> Could it be that Windows uses the debug registers?  Maybe we're
>> incorrectly deciding to switch them.
>>      
> I was wondering about that.  I was thinking of just backing out the
> support for debugregs and see what happens.
>
> Did the up/down_read seem kind of high?  Are we doing a lot of locking?
>    

It is.  We do.  Marcelo made some threats to remove this lock.

-- 
error compiling committee.c: too many arguments to function


* Re: Performace data when running Windows VMs
  2009-08-26 16:26     ` Avi Kivity
@ 2009-08-26 17:51       ` Andrew Theurer
  2009-08-26 19:20         ` Avi Kivity
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Theurer @ 2009-08-26 17:51 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Yan Vugenfirer

On Wed, 2009-08-26 at 19:26 +0300, Avi Kivity wrote:
> On 08/26/2009 07:14 PM, Andrew Theurer wrote:
> > On Wed, 2009-08-26 at 18:44 +0300, Avi Kivity wrote:
> >    
> >> On 08/26/2009 05:57 PM, Andrew Theurer wrote:
> >>      
> >>> I recently gathered some performance data when running Windows Server
> >>> 2008 VMs, and I wanted to share it here.  There are 12 Windows
> >>> Server2008 64-bit VMs (1 vcpu, 2 GB) running which handle the concurrent
> >>> execution of 6 J2EE type benchmarks.  Each benchmark needs a App VM and
> >>> a Database VM.  The benchmark clients inject a fixed rate of requests
> >>> which yields X% CPU utilization on the host.  A different hypervisor was
> >>> compared; KVM used about 60% more CPU cycles to complete the same amount
> >>> of work.  Both had their hypervisor specific paravirt IO drivers in the
> >>> VMs.
> >>>
> >>> Server is a 2 socket Core/i7, SMT off, with 72 GB memory
> >>>
> >>>        
> >> Did you use large pages?
> >>      
> > Yes.
> >    
> 
> The stats show 'largepage = 12'.  Something's wrong.  There's a commit 
> (7736d680) that's supposed to fix largepage support for kvm-87, maybe 
> it's incomplete.

How strange.  /proc/meminfo showed that almost all of the pages were
used:

HugePages_Total:   12556
HugePages_Free:      220
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

I just assumed they were used properly.  Maybe not.
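
As a rough cross-check, the following Python sketch parses the /proc/meminfo excerpt above and compares the huge page memory in use with the memory configured for the guests. It assumes that "2 GB" per VM means 2048 MiB; the function and variable names are illustrative only.

```python
MEMINFO_TEXT = """\
HugePages_Total:   12556
HugePages_Free:      220
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""


def parse_meminfo(text):
    """Return a dict mapping each field name to its leading integer value."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            values[key.strip()] = int(rest.split()[0])
    return values


info = parse_meminfo(MEMINFO_TEXT)
used_pages = info["HugePages_Total"] - info["HugePages_Free"]
used_mib = used_pages * info["Hugepagesize"] // 1024  # Hugepagesize is in kB

# Assumption: 12 guests with 2 GB (taken here as 2048 MiB) each.
guest_mib = 12 * 2 * 1024

print(f"huge pages in use: {used_pages} ({used_mib} MiB)")
print(f"configured guest memory: {guest_mib} MiB")
print(f"difference: {used_mib - guest_mib} MiB")
```

The huge page memory in use is close to the total guest memory, which is what one would expect if the guests are backed by huge pages, although the numbers alone do not prove how the pages are mapped.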

> >>> I/O on the host was not what I would call very high:  outbound network
> >>> averaged at 163 Mbit/s inbound was 8 Mbit/s, while disk read ops was
> >>> 243/sec and write ops was 561/sec
> >>>
> >>>        
> >> What was the disk bandwidth used?  Presumably, direct access to the
> >> volume with cache=off?
> >>      
> > 2.4 MB/sec write, 0.6MB/sec read, cache=none
> > The VMs' boot disks are IDE, but apps use their second disk which is
> > virtio.
> >    
> 
> Chickenfeed.
> 
> Do the network stats include interguest traffic?  I presume *all* of the 
> traffic was interguest.

Sar network data:

>                  IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s
> Average:           lo      0.00      0.00      0.00      0.00 
> Average:         usb0      0.39      0.19      0.02      0.01 
> Average:         eth0   2968.83   5093.02    340.13   6966.64
> Average:         eth1   2992.92   5124.08    342.75   7008.53 
> Average:         eth2   1455.53   2500.63    167.45   3421.64 
> Average:         eth3   1500.59   2574.36    171.98   3524.82 
> Average:          br0      2.41      0.95      0.32      0.13 
> Average:          br1      1.52      0.00      0.20      0.00 
> Average:          br2      1.52      0.00      0.20      0.00 
> Average:          br3      1.52      0.00      0.20      0.00 
> Average:          br4      0.00      0.00      0.00      0.00 
> Average:         tap3    669.38    708.07    290.89    140.81 
> Average:       tap109    678.53    723.58    294.07    143.31 
> Average:       tap215    673.20    711.47    291.99    141.78 
> Average:       tap321    675.26    719.33    293.01    142.37 
> Average:        tap27    679.23    729.90    293.86    143.60 
> Average:       tap133    680.17    734.08    294.33    143.85 
> Average:         tap2   1002.24   2214.19   3458.54    457.95 
> Average:       tap108   1021.85   2246.53   3491.02    463.48 
> Average:       tap214   1002.81   2195.22   3411.80    457.28 
> Average:       tap320   1017.43   2241.49   3508.20    462.54 
> Average:        tap26   1028.52   2237.98   3483.84    462.53 
> Average:       tap132   1034.05   2240.89   3493.37    463.32 

tap0-99 go to eth0, 100-199 to eth1, 200-299 to eth2, 300-399 to eth3.
There is some inter-guest traffic between VM pairs (like taps 2&3,
108&109, etc.) but not that significant.
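
For reference, here is a small Python sketch that sums the physical interface rates from the sar output above and converts them to Mbit/s. It assumes the conversion kB/s * 8 / 1024, which appears to reproduce the 163 Mbit/s outbound and 8 Mbit/s inbound figures quoted earlier in the thread; the names are illustrative.

```python
# (rxkB/s, txkB/s) per physical interface, copied from the sar averages above.
SAR_KB_PER_SEC = {
    "eth0": (340.13, 6966.64),
    "eth1": (342.75, 7008.53),
    "eth2": (167.45, 3421.64),
    "eth3": (171.98, 3524.82),
}


def kb_to_mbit(kb_per_sec):
    """Convert kB/s to Mbit/s, assuming 1 Mbit = 1024 kbit."""
    return kb_per_sec * 8 / 1024


rx_mbit = kb_to_mbit(sum(rx for rx, _ in SAR_KB_PER_SEC.values()))
tx_mbit = kb_to_mbit(sum(tx for _, tx in SAR_KB_PER_SEC.values()))

print(f"inbound:  {rx_mbit:.1f} Mbit/s")
print(f"outbound: {tx_mbit:.1f} Mbit/s")
```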

> 
> >> linux-aio should help reduce cpu usage.
> >>      
> > I assume this is in a newer version of Qemu?
> >    
> 
> No, posted and awaiting merge.
> 
> >> Could it be that Windows uses the debug registers?  Maybe we're
> >> incorrectly deciding to switch them.
> >>      
> > I was wondering about that.  I was thinking of just backing out the
> > support for debugregs and see what happens.
> >
> > Did the up/down_read seem kind of high?  Are we doing a lot of locking?
> >    
> 
> It is.  We do.  Marcelo made some threats to remove this lock.

Thanks,

-Andrew



* Re: Performace data when running Windows VMs
  2009-08-26 17:51       ` Andrew Theurer
@ 2009-08-26 19:20         ` Avi Kivity
  0 siblings, 0 replies; 8+ messages in thread
From: Avi Kivity @ 2009-08-26 19:20 UTC (permalink / raw)
  To: habanero; +Cc: kvm, Yan Vugenfirer

On 08/26/2009 08:51 PM, Andrew Theurer wrote:
>>
>> The stats show 'largepage = 12'.  Something's wrong.  There's a commit
>> (7736d680) that's supposed to fix largepage support for kvm-87, maybe
>> it's incomplete.
>>      
> How strange.  /proc/meminfo showed that almost all of the pages were
> used:
>
> HugePages_Total:   12556
> HugePages_Free:      220
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
>
> I just assumed they were used properly.  Maybe not.
>    

My mistake.  The kvm_stat numbers you provided were rate (per second), 
so it just means it's still faulting in pages at a rate of 1 per guest 
per second.

>    
>>>>> I/O on the host was not what I would call very high:  outbound network
>>>>> averaged at 163 Mbit/s inbound was 8 Mbit/s, while disk read ops was
>>>>> 243/sec and write ops was 561/sec
>>>>>
>>>>>
>>>>>            
>>>> What was the disk bandwidth used?  Presumably, direct access to the
>>>> volume with cache=off?
>>>>
>>>>          
>>> 2.4 MB/sec write, 0.6MB/sec read, cache=none
>>> The VMs' boot disks are IDE, but apps use their second disk which is
>>> virtio.
>>>
>>>        
>> Chickenfeed.
>>
>> Do the network stats include interguest traffic?  I presume *all* of the
>> traffic was interguest.
>>      
> Sar network data:
>
>    
>>                   IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s
>> Average:           lo      0.00      0.00      0.00      0.00
>> Average:         usb0      0.39      0.19      0.02      0.01
>> Average:         eth0   2968.83   5093.02    340.13   6966.64
>> Average:         eth1   2992.92   5124.08    342.75   7008.53
>> Average:         eth2   1455.53   2500.63    167.45   3421.64
>> Average:         eth3   1500.59   2574.36    171.98   3524.82
>> Average:          br0      2.41      0.95      0.32      0.13
>> Average:          br1      1.52      0.00      0.20      0.00
>> Average:          br2      1.52      0.00      0.20      0.00
>> Average:          br3      1.52      0.00      0.20      0.00
>> Average:          br4      0.00      0.00      0.00      0.00
>> Average:         tap3    669.38    708.07    290.89    140.81
>> Average:       tap109    678.53    723.58    294.07    143.31
>> Average:       tap215    673.20    711.47    291.99    141.78
>> Average:       tap321    675.26    719.33    293.01    142.37
>> Average:        tap27    679.23    729.90    293.86    143.60
>> Average:       tap133    680.17    734.08    294.33    143.85
>> Average:         tap2   1002.24   2214.19   3458.54    457.95
>> Average:       tap108   1021.85   2246.53   3491.02    463.48
>> Average:       tap214   1002.81   2195.22   3411.80    457.28
>> Average:       tap320   1017.43   2241.49   3508.20    462.54
>> Average:        tap26   1028.52   2237.98   3483.84    462.53
>> Average:       tap132   1034.05   2240.89   3493.37    463.32
>>      
> tap0-99 go to eth0, 100-199 to eth1, 200-299 to eth2, 300-399 to eth3.
> There is some inter-guest traffic between VM pairs (like taps 2&3,
> 108&109, etc.) but not that significant.
>    

Oh, so there are external load generators involved.

Can you run this on kvm.git master, with

CONFIG_TRACEPOINTS=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_RING_BUFFER=y
CONFIG_FTRACE_NMI_ENTER=y
CONFIG_EVENT_TRACING=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
CONFIG_DYNAMIC_FTRACE=y

(some may be overkill)

and, while the test is running, do:

  cd /sys/kernel/debug/tracing
  echo kvm > set_event
  (wait two seconds)
  cat trace > /tmp/trace

and send me the result, compressed as /tmp/trace.bz2?  It should be
quite big.
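
The capture steps above, including the compression into /tmp/trace.bz2, could be scripted roughly as follows. This is only a sketch: the function name and its parameters are made up for this example, it assumes debugfs is mounted at /sys/kernel/debug, and it has to run as root on a kernel built with the tracing options listed above.

```python
import bz2
import time
from pathlib import Path


def capture_kvm_trace(tracing_dir="/sys/kernel/debug/tracing",
                      out_path="/tmp/trace.bz2",
                      wait_seconds=2.0):
    """Enable the kvm trace events, wait, then save the trace bzip2-compressed.

    Returns the number of uncompressed bytes read from the trace file.
    """
    tracing = Path(tracing_dir)

    # Equivalent of: echo kvm > set_event
    (tracing / "set_event").write_text("kvm\n")

    # Let the benchmark run for a moment while events are recorded.
    time.sleep(wait_seconds)

    # Equivalent of: cat trace > /tmp/trace, followed by bzip2 compression.
    data = (tracing / "trace").read_bytes()
    with bz2.open(out_path, "wb") as out:
        out.write(data)
    return len(data)


if __name__ == "__main__":
    size = capture_kvm_trace()
    print(f"captured {size} bytes of trace data")
```

Compressing in the same step keeps the file small enough to mail, since the raw trace is expected to be large.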

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


* Re: Performace data when running Windows VMs
  2009-08-26 16:14   ` Andrew Theurer
  2009-08-26 16:26     ` Avi Kivity
@ 2009-08-26 16:27     ` Brian Jackson
  2009-08-26 17:52       ` Andrew Theurer
  1 sibling, 1 reply; 8+ messages in thread
From: Brian Jackson @ 2009-08-26 16:27 UTC (permalink / raw)
  To: habanero; +Cc: Avi Kivity, kvm, Yan Vugenfirer

On Wednesday 26 August 2009 11:14:57 am Andrew Theurer wrote:
<snip>
> >
> > > I/O on the host was not what I would call very high:  outbound network
> > > averaged at 163 Mbit/s inbound was 8 Mbit/s, while disk read ops was
> > > 243/sec and write ops was 561/sec
> >
> > What was the disk bandwidth used?  Presumably, direct access to the
> > volume with cache=off?
>
> 2.4 MB/sec write, 0.6MB/sec read, cache=none
> The VMs' boot disks are IDE, but apps use their second disk which is
> virtio.


In my testing, I got better performance from IDE than from the new virtio
block driver for Windows. There appears to be some optimization left to do
on it.



* Re: Performace data when running Windows VMs
  2009-08-26 16:27     ` Brian Jackson
@ 2009-08-26 17:52       ` Andrew Theurer
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Theurer @ 2009-08-26 17:52 UTC (permalink / raw)
  To: Brian Jackson; +Cc: Avi Kivity, kvm, Yan Vugenfirer

On Wed, 2009-08-26 at 11:27 -0500, Brian Jackson wrote:
> On Wednesday 26 August 2009 11:14:57 am Andrew Theurer wrote:
> <snip>
> > >
> > > > I/O on the host was not what I would call very high:  outbound network
> > > > averaged at 163 Mbit/s inbound was 8 Mbit/s, while disk read ops was
> > > > 243/sec and write ops was 561/sec
> > >
> > > What was the disk bandwidth used?  Presumably, direct access to the
> > > volume with cache=off?
> >
> > 2.4 MB/sec write, 0.6MB/sec read, cache=none
> > The VMs' boot disks are IDE, but apps use their second disk which is
> > virtio.
> 
> 
> In my testing, I got better performance from IDE than the new virtio block 
> driver for windows. There appears to be some optimization left to do on them.

Thanks Brian.  I will try IDE on both VM disks to see how it compares.

-Andrew

