* Performace data when running Windows VMs
From: Andrew Theurer @ 2009-08-26 14:57 UTC
To: kvm

I recently gathered some performance data when running Windows Server
2008 VMs, and I wanted to share it here. There are 12 Windows
Server 2008 64-bit VMs (1 vcpu, 2 GB) running which handle the concurrent
execution of 6 J2EE type benchmarks. Each benchmark needs an App VM and
a Database VM. The benchmark clients inject a fixed rate of requests
which yields X% CPU utilization on the host. A different hypervisor was
compared; KVM used about 60% more CPU cycles to complete the same amount
of work. Both had their hypervisor specific paravirt IO drivers in the
VMs.

Server is a 2 socket Core/i7, SMT off, with 72 GB memory

Host kernel used was kvm.git v2.6.31-rc3-3419-g6df4865
Qemu was kvm-87. I tried a few newer versions of Qemu; none of them
worked with the RedHat virtIO Windows drivers. I tried:

f3600c589a9ee5ea4c0fec74ed4e06a15b461d52
0.11.0-rc1
0.10.6
kvm-88

All but 0.10.6 had "Problem code 10" driver error in the VM. 0.10.6 had
"a disk read error occurred" very early in the booting of the VM.

I/O on the host was not what I would call very high: outbound network
averaged at 163 Mbit/s inbound was 8 Mbit/s, while disk read ops was
243/sec and write ops was 561/sec

Host CPU breakdown was the following:

user  nice  system  irq   softirq  guest  idle   iowait
5.67  0.00  11.64   0.09  1.05     31.90  46.06  3.59


The amount of kernel time had me concerned. Here is oprofile:

> samples  %        app name                 symbol name
> 1163422  52.3744  kvm-intel.ko             vmx_vcpu_run
> 103996    4.6816  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_set_debugreg
> 81036     3.6480  kvm.ko                   kvm_arch_vcpu_ioctl_run
> 37913     1.7068  qemu-system-x86_64       cpu_physical_memory_rw
> 34720     1.5630  qemu-system-x86_64       phys_page_find_alloc
> 23234     1.0459  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_write_msr_safe
> 20964     0.9437  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_get_debugreg
> 17628     0.7936  libc-2.5.so              memcpy
> 16587     0.7467  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __down_read
> 15681     0.7059  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __up_read
> 15466     0.6962  kvm.ko                   find_highest_vector
> 14611     0.6578  qemu-system-x86_64       qemu_get_ram_ptr
> 11254     0.5066  kvm-intel.ko             vmcs_writel
> 11133     0.5012  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 copy_user_generic_string
> 10917     0.4915  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_read_msr_safe
> 10760     0.4844  qemu-system-x86_64       virtqueue_get_head
> 9025      0.4063  kvm-intel.ko             vmx_handle_exit
> 8953      0.4030  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 schedule
> 8753      0.3940  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 fget_light
> 8465      0.3811  qemu-system-x86_64       virtqueue_avail_bytes
> 8185      0.3685  kvm-intel.ko             handle_cr
> 8069      0.3632  kvm.ko                   kvm_set_irq
> 7697      0.3465  kvm.ko                   kvm_lapic_sync_from_vapic
> 7586      0.3415  qemu-system-x86_64       main_loop_wait
> 7480      0.3367  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 do_select
> 7121      0.3206  qemu-system-x86_64       lduw_phys
> 7003      0.3153  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 audit_syscall_exit
> 6062      0.2729  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 kfree
> 5477      0.2466  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 fput
> 5454      0.2455  kvm.ko                   kvm_lapic_get_cr8
> 5096      0.2294  kvm.ko                   kvm_load_guest_fpu
> 5057      0.2277  kvm.ko                   apic_update_ppr
> 4929      0.2219  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 up_read
> 4900      0.2206  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 audit_syscall_entry
> 4866      0.2191  kvm.ko                   kvm_apic_has_interrupt
> 4670      0.2102  kvm-intel.ko             skip_emulated_instruction
> 4644      0.2091  kvm.ko                   kvm_cpu_has_interrupt
> 4548      0.2047  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __switch_to
> 4328      0.1948  kvm.ko                   kvm_apic_accept_pic_intr
> 4303      0.1937  libpthread-2.5.so        pthread_mutex_lock
> 4235      0.1906  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 system_call
> 4175      0.1879  kvm.ko                   kvm_put_guest_fpu
> 4170      0.1877  qemu-system-x86_64       ldl_phys
> 4098      0.1845  kvm-intel.ko             vmx_set_interrupt_shadow
> 4003      0.1802  qemu-system-x86_64       kvm_run

I was wondering why the get/set debugreg was so high. I don't recall
seeing this much with Linux VMs.

Here is an average of kvm_stat:

> efer_relo        0
> exits      1262814
> fpu_reloa   103842
> halt_exit     9918
> halt_wake     9763
> host_stat   103846
> hypercall        0
> insn_emul    23277
> insn_emul    23277
> invlpg           0
> io_exits     82717
> irq_exits    12797
> irq_injec    18806
> irq_windo     1194
> largepage       12
> mmio_exit        0
> mmu_cache        0
> mmu_flood        0
> mmu_pde_z        0
> mmu_pte_u        0
> mmu_pte_w        0
> mmu_recyc        0
> mmu_shado        0
> mmu_unsyn        0
> nmi_injec        0
> nmi_windo        0
> pf_fixed        12
> pf_guest         0
> remote_tl        0
> request_i        0
> signal_ex        0
> tlb_flush        0

For 12 VMs, does the number of exits/sec seem reasonable?

Comments?

Thanks,

-Andrew
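One way to put a number on the debug-register question is to add up the profile shares of the related symbols. The following is a minimal sketch, not part of the original report: it hard-codes a few rows from the oprofile listing above (with the long vmlinux image name shortened to `vmlinux`), and the helper names `parse_rows` and `share_of` are made up for illustration.

```python
# A few rows copied from the oprofile listing above.
# Columns: samples, percent, app name, symbol name.
# Assumption: the long vmlinux image name is shortened to "vmlinux" here.
ROWS = """\
1163422 52.3744 kvm-intel.ko vmx_vcpu_run
103996 4.6816 vmlinux native_set_debugreg
81036 3.6480 kvm.ko kvm_arch_vcpu_ioctl_run
20964 0.9437 vmlinux native_get_debugreg
"""


def parse_rows(text):
    """Split whitespace-separated rows into (samples, pct, app, symbol) tuples."""
    rows = []
    for line in text.splitlines():
        samples, pct, app, symbol = line.split()
        rows.append((int(samples), float(pct), app, symbol))
    return rows


def share_of(rows, needle):
    """Sum the percentage of every symbol whose name contains `needle`."""
    return sum(pct for _, pct, _, symbol in rows if needle in symbol)


rows = parse_rows(ROWS)
debugreg_pct = share_of(rows, "debugreg")
print(f"get/set debugreg: {debugreg_pct:.4f}% of all samples")
```

On these rows the two debugreg accessors together account for a little over 5.6% of all samples, which is more than any single symbol other than `vmx_vcpu_run`.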
* Re: Performace data when running Windows VMs
From: Avi Kivity @ 2009-08-26 15:44 UTC
To: habanero; +Cc: kvm, Yan Vugenfirer

On 08/26/2009 05:57 PM, Andrew Theurer wrote:
> I recently gathered some performance data when running Windows Server
> 2008 VMs, and I wanted to share it here. There are 12 Windows
> Server 2008 64-bit VMs (1 vcpu, 2 GB) running which handle the concurrent
> execution of 6 J2EE type benchmarks. Each benchmark needs an App VM and
> a Database VM. The benchmark clients inject a fixed rate of requests
> which yields X% CPU utilization on the host. A different hypervisor was
> compared; KVM used about 60% more CPU cycles to complete the same amount
> of work. Both had their hypervisor specific paravirt IO drivers in the
> VMs.
>
> Server is a 2 socket Core/i7, SMT off, with 72 GB memory
>

Did you use large pages?

> Host kernel used was kvm.git v2.6.31-rc3-3419-g6df4865
> Qemu was kvm-87. I tried a few newer versions of Qemu; none of them
> worked with the RedHat virtIO Windows drivers. I tried:
>
> f3600c589a9ee5ea4c0fec74ed4e06a15b461d52
> 0.11.0-rc1
> 0.10.6
> kvm-88
>
> All but 0.10.6 had "Problem code 10" driver error in the VM. 0.10.6 had
> "a disk read error occurred" very early in the booting of the VM.
>

Yan?

> I/O on the host was not what I would call very high: outbound network
> averaged at 163 Mbit/s inbound was 8 Mbit/s, while disk read ops was
> 243/sec and write ops was 561/sec
>

What was the disk bandwidth used? Presumably, direct access to the
volume with cache=off?

linux-aio should help reduce cpu usage.

> Host CPU breakdown was the following:
>
> user  nice  system  irq   softirq  guest  idle   iowait
> 5.67  0.00  11.64   0.09  1.05     31.90  46.06  3.59
>
>
> The amount of kernel time had me concerned. Here is oprofile:
>

user+system is about 55% of guest time, and it's all overhead.

>> samples  %        app name                 symbol name
>> 1163422  52.3744  kvm-intel.ko             vmx_vcpu_run
>> 103996    4.6816  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_set_debugreg
>> 81036     3.6480  kvm.ko                   kvm_arch_vcpu_ioctl_run
>> 37913     1.7068  qemu-system-x86_64       cpu_physical_memory_rw
>> 34720     1.5630  qemu-system-x86_64       phys_page_find_alloc
>>

We should really optimize these two.

>> 23234     1.0459  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_write_msr_safe
>> 20964     0.9437  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_get_debugreg
>> 17628     0.7936  libc-2.5.so              memcpy
>> 16587     0.7467  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __down_read
>> 15681     0.7059  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __up_read
>> 15466     0.6962  kvm.ko                   find_highest_vector
>> 14611     0.6578  qemu-system-x86_64       qemu_get_ram_ptr
>> 11254     0.5066  kvm-intel.ko             vmcs_writel
>> 11133     0.5012  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 copy_user_generic_string
>> 10917     0.4915  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_read_msr_safe
>> 10760     0.4844  qemu-system-x86_64       virtqueue_get_head
>> 9025      0.4063  kvm-intel.ko             vmx_handle_exit
>> 8953      0.4030  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 schedule
>> 8753      0.3940  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 fget_light
>> 8465      0.3811  qemu-system-x86_64       virtqueue_avail_bytes
>> 8185      0.3685  kvm-intel.ko             handle_cr
>> 8069      0.3632  kvm.ko                   kvm_set_irq
>> 7697      0.3465  kvm.ko                   kvm_lapic_sync_from_vapic
>> 7586      0.3415  qemu-system-x86_64       main_loop_wait
>> 7480      0.3367  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 do_select
>> 7121      0.3206  qemu-system-x86_64       lduw_phys
>> 7003      0.3153  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 audit_syscall_exit
>> 6062      0.2729  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 kfree
>> 5477      0.2466  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 fput
>> 5454      0.2455  kvm.ko                   kvm_lapic_get_cr8
>> 5096      0.2294  kvm.ko                   kvm_load_guest_fpu
>> 5057      0.2277  kvm.ko                   apic_update_ppr
>> 4929      0.2219  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 up_read
>> 4900      0.2206  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 audit_syscall_entry
>> 4866      0.2191  kvm.ko                   kvm_apic_has_interrupt
>> 4670      0.2102  kvm-intel.ko             skip_emulated_instruction
>> 4644      0.2091  kvm.ko                   kvm_cpu_has_interrupt
>> 4548      0.2047  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __switch_to
>> 4328      0.1948  kvm.ko                   kvm_apic_accept_pic_intr
>> 4303      0.1937  libpthread-2.5.so        pthread_mutex_lock
>> 4235      0.1906  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 system_call
>> 4175      0.1879  kvm.ko                   kvm_put_guest_fpu
>> 4170      0.1877  qemu-system-x86_64       ldl_phys
>> 4098      0.1845  kvm-intel.ko             vmx_set_interrupt_shadow
>> 4003      0.1802  qemu-system-x86_64       kvm_run
>>
> I was wondering why the get/set debugreg was so high. I don't recall
> seeing this much with Linux VMs.
>

Could it be that Windows uses the debug registers? Maybe we're
incorrectly deciding to switch them.

Apart from that, nothing really stands out. We'll just have to optimize
things one by one.

> Here is an average of kvm_stat:
>
>
>> efer_relo        0
>> exits      1262814
>>

100K exits/sec/vm. This is high.

>> fpu_reloa   103842
>>

So is this -- maybe we're misdetecting fpu usage on EPT.

>> halt_exit     9918
>> halt_wake     9763
>> host_stat   103846
>>

This is presumably due to virtio in qemu.

>> hypercall        0
>> insn_emul    23277
>> insn_emul    23277
>> invlpg           0
>> io_exits     82717
>>

Yes, it is.

>> irq_exits    12797
>> irq_injec    18806
>> irq_windo     1194
>> largepage       12
>> mmio_exit        0
>> mmu_cache        0
>> mmu_flood        0
>> mmu_pde_z        0
>> mmu_pte_u        0
>> mmu_pte_w        0
>> mmu_recyc        0
>> mmu_shado        0
>> mmu_unsyn        0
>> nmi_injec        0
>> nmi_windo        0
>> pf_fixed        12
>> pf_guest         0
>> remote_tl        0
>> request_i        0
>> signal_ex        0
>> tlb_flush        0
>>
> For 12 VMs, does the number of exits/sec seem reasonable?
>
> Comments?
>

Not all of the exits are accounted for, so we're missing a big part of
the picture. 2.6.32 will have better statistics through ftrace.

--
error compiling committee.c: too many arguments to function
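The two ratios quoted in this reply follow directly from the reported numbers. A small sketch of the arithmetic, assuming (as stated later in the thread) that the kvm_stat values are per-second rates summed over all 12 guests; the variable names are illustrative only.

```python
# Host CPU breakdown from the original report (percent of total host CPU).
cpu = {
    "user": 5.67, "nice": 0.00, "system": 11.64, "irq": 0.09,
    "softirq": 1.05, "guest": 31.90, "idle": 46.06, "iowait": 3.59,
}

# kvm_stat "exits" value: events per second across all guests (assumption
# based on the later clarification that the figures are rates).
exits_per_sec = 1262814
num_vms = 12

# Host-side user+system time relative to time spent running guest code.
overhead_ratio = (cpu["user"] + cpu["system"]) / cpu["guest"]

# Average exit rate per virtual machine.
exits_per_vm = exits_per_sec / num_vms

print(f"user+system relative to guest time: {overhead_ratio:.1%}")
print(f"exits per second per VM: {exits_per_vm:,.1f}")
```

The first figure comes out at roughly 54%, in line with the "about 55%" estimate, and the second at roughly 105,000, in line with "100K exits/sec/vm".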
* Re: Performace data when running Windows VMs
From: Andrew Theurer @ 2009-08-26 16:14 UTC
To: Avi Kivity; +Cc: kvm, Yan Vugenfirer

On Wed, 2009-08-26 at 18:44 +0300, Avi Kivity wrote:
> On 08/26/2009 05:57 PM, Andrew Theurer wrote:
> > I recently gathered some performance data when running Windows Server
> > 2008 VMs, and I wanted to share it here. There are 12 Windows
> > Server 2008 64-bit VMs (1 vcpu, 2 GB) running which handle the concurrent
> > execution of 6 J2EE type benchmarks. Each benchmark needs an App VM and
> > a Database VM. The benchmark clients inject a fixed rate of requests
> > which yields X% CPU utilization on the host. A different hypervisor was
> > compared; KVM used about 60% more CPU cycles to complete the same amount
> > of work. Both had their hypervisor specific paravirt IO drivers in the
> > VMs.
> >
> > Server is a 2 socket Core/i7, SMT off, with 72 GB memory
> >
>
> Did you use large pages?

Yes.

> > Host kernel used was kvm.git v2.6.31-rc3-3419-g6df4865
> > Qemu was kvm-87. I tried a few newer versions of Qemu; none of them
> > worked with the RedHat virtIO Windows drivers. I tried:
> >
> > f3600c589a9ee5ea4c0fec74ed4e06a15b461d52
> > 0.11.0-rc1
> > 0.10.6
> > kvm-88
> >
> > All but 0.10.6 had "Problem code 10" driver error in the VM. 0.10.6 had
> > "a disk read error occurred" very early in the booting of the VM.
> >
>
> Yan?
>
> > I/O on the host was not what I would call very high: outbound network
> > averaged at 163 Mbit/s inbound was 8 Mbit/s, while disk read ops was
> > 243/sec and write ops was 561/sec
> >
>
> What was the disk bandwidth used? Presumably, direct access to the
> volume with cache=off?

2.4 MB/sec write, 0.6MB/sec read, cache=none
The VMs' boot disks are IDE, but apps use their second disk which is
virtio.

> linux-aio should help reduce cpu usage.

I assume this is in a newer version of Qemu?

> > Host CPU breakdown was the following:
> >
> > user  nice  system  irq   softirq  guest  idle   iowait
> > 5.67  0.00  11.64   0.09  1.05     31.90  46.06  3.59
> >
> >
> > The amount of kernel time had me concerned. Here is oprofile:
> >
>
> user+system is about 55% of guest time, and it's all overhead.
>
> >> samples  %        app name                 symbol name
> >> 1163422  52.3744  kvm-intel.ko             vmx_vcpu_run
> >> 103996    4.6816  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_set_debugreg
> >> 81036     3.6480  kvm.ko                   kvm_arch_vcpu_ioctl_run
> >> 37913     1.7068  qemu-system-x86_64       cpu_physical_memory_rw
> >> 34720     1.5630  qemu-system-x86_64       phys_page_find_alloc
> >>
>
> We should really optimize these two.
>
> >> 23234     1.0459  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_write_msr_safe
> >> 20964     0.9437  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_get_debugreg
> >> 17628     0.7936  libc-2.5.so              memcpy
> >> 16587     0.7467  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __down_read
> >> 15681     0.7059  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __up_read
> >> 15466     0.6962  kvm.ko                   find_highest_vector
> >> 14611     0.6578  qemu-system-x86_64       qemu_get_ram_ptr
> >> 11254     0.5066  kvm-intel.ko             vmcs_writel
> >> 11133     0.5012  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 copy_user_generic_string
> >> 10917     0.4915  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 native_read_msr_safe
> >> 10760     0.4844  qemu-system-x86_64       virtqueue_get_head
> >> 9025      0.4063  kvm-intel.ko             vmx_handle_exit
> >> 8953      0.4030  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 schedule
> >> 8753      0.3940  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 fget_light
> >> 8465      0.3811  qemu-system-x86_64       virtqueue_avail_bytes
> >> 8185      0.3685  kvm-intel.ko             handle_cr
> >> 8069      0.3632  kvm.ko                   kvm_set_irq
> >> 7697      0.3465  kvm.ko                   kvm_lapic_sync_from_vapic
> >> 7586      0.3415  qemu-system-x86_64       main_loop_wait
> >> 7480      0.3367  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 do_select
> >> 7121      0.3206  qemu-system-x86_64       lduw_phys
> >> 7003      0.3153  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 audit_syscall_exit
> >> 6062      0.2729  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 kfree
> >> 5477      0.2466  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 fput
> >> 5454      0.2455  kvm.ko                   kvm_lapic_get_cr8
> >> 5096      0.2294  kvm.ko                   kvm_load_guest_fpu
> >> 5057      0.2277  kvm.ko                   apic_update_ppr
> >> 4929      0.2219  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 up_read
> >> 4900      0.2206  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 audit_syscall_entry
> >> 4866      0.2191  kvm.ko                   kvm_apic_has_interrupt
> >> 4670      0.2102  kvm-intel.ko             skip_emulated_instruction
> >> 4644      0.2091  kvm.ko                   kvm_cpu_has_interrupt
> >> 4548      0.2047  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 __switch_to
> >> 4328      0.1948  kvm.ko                   kvm_apic_accept_pic_intr
> >> 4303      0.1937  libpthread-2.5.so        pthread_mutex_lock
> >> 4235      0.1906  vmlinux-2.6.31-rc5-v2.6.31-rc3-3419-g6df4865-autokern1 system_call
> >> 4175      0.1879  kvm.ko                   kvm_put_guest_fpu
> >> 4170      0.1877  qemu-system-x86_64       ldl_phys
> >> 4098      0.1845  kvm-intel.ko             vmx_set_interrupt_shadow
> >> 4003      0.1802  qemu-system-x86_64       kvm_run
> >>
> > I was wondering why the get/set debugreg was so high. I don't recall
> > seeing this much with Linux VMs.
> >
>
> Could it be that Windows uses the debug registers? Maybe we're
> incorrectly deciding to switch them.

I was wondering about that. I was thinking of just backing out the
support for debugregs and see what happens.

Did the up/down_read seem kind of high? Are we doing a lot of locking?

>
> Apart from that, nothing really stands out. We'll just have to optimize
> things one by one.
>
> > Here is an average of kvm_stat:
> >
> >
> >> efer_relo        0
> >> exits      1262814
> >>
>
> 100K exits/sec/vm. This is high.
>
> >> fpu_reloa   103842
> >>
>
> So is this -- maybe we're misdetecting fpu usage on EPT.
>
> >> halt_exit     9918
> >> halt_wake     9763
> >> host_stat   103846
> >>
>
> This is presumably due to virtio in qemu.
>
> >> hypercall        0
> >> insn_emul    23277
> >> insn_emul    23277
> >> invlpg           0
> >> io_exits     82717
> >>
>
> Yes, it is.
>
> >> irq_exits    12797
> >> irq_injec    18806
> >> irq_windo     1194
> >> largepage       12
> >> mmio_exit        0
> >> mmu_cache        0
> >> mmu_flood        0
> >> mmu_pde_z        0
> >> mmu_pte_u        0
> >> mmu_pte_w        0
> >> mmu_recyc        0
> >> mmu_shado        0
> >> mmu_unsyn        0
> >> nmi_injec        0
> >> nmi_windo        0
> >> pf_fixed        12
> >> pf_guest         0
> >> remote_tl        0
> >> request_i        0
> >> signal_ex        0
> >> tlb_flush        0
> >>
> > For 12 VMs, does the number of exits/sec seem reasonable?
> >
> > Comments?
> >
>
> Not all of the exits are accounted for, so we're missing a big part of
> the picture. 2.6.32 will have better statistics through ftrace.

Thanks for the comments!

-Andrew
* Re: Performace data when running Windows VMs
From: Avi Kivity @ 2009-08-26 16:26 UTC
To: habanero; +Cc: kvm, Yan Vugenfirer

On 08/26/2009 07:14 PM, Andrew Theurer wrote:
> On Wed, 2009-08-26 at 18:44 +0300, Avi Kivity wrote:
>
>> On 08/26/2009 05:57 PM, Andrew Theurer wrote:
>>
>>> I recently gathered some performance data when running Windows Server
>>> 2008 VMs, and I wanted to share it here. There are 12 Windows
>>> Server 2008 64-bit VMs (1 vcpu, 2 GB) running which handle the concurrent
>>> execution of 6 J2EE type benchmarks. Each benchmark needs an App VM and
>>> a Database VM. The benchmark clients inject a fixed rate of requests
>>> which yields X% CPU utilization on the host. A different hypervisor was
>>> compared; KVM used about 60% more CPU cycles to complete the same amount
>>> of work. Both had their hypervisor specific paravirt IO drivers in the
>>> VMs.
>>>
>>> Server is a 2 socket Core/i7, SMT off, with 72 GB memory
>>>
>>>
>> Did you use large pages?
>>
> Yes.
>

The stats show 'largepage = 12'. Something's wrong. There's a commit
(7736d680) that's supposed to fix largepage support for kvm-87, maybe
it's incomplete.

>>> I/O on the host was not what I would call very high: outbound network
>>> averaged at 163 Mbit/s inbound was 8 Mbit/s, while disk read ops was
>>> 243/sec and write ops was 561/sec
>>>
>>>
>> What was the disk bandwidth used? Presumably, direct access to the
>> volume with cache=off?
>>
> 2.4 MB/sec write, 0.6MB/sec read, cache=none
> The VMs' boot disks are IDE, but apps use their second disk which is
> virtio.
>

Chickenfeed.

Do the network stats include interguest traffic? I presume *all* of the
traffic was interguest.

>> linux-aio should help reduce cpu usage.
>>
> I assume this is in a newer version of Qemu?
>

No, posted and awaiting merge.

>> Could it be that Windows uses the debug registers? Maybe we're
>> incorrectly deciding to switch them.
>>
> I was wondering about that. I was thinking of just backing out the
> support for debugregs and see what happens.
>
> Did the up/down_read seem kind of high? Are we doing a lot of locking?
>

It is. We do. Marcelo made some threats to remove this lock.

--
error compiling committee.c: too many arguments to function
* Re: Performace data when running Windows VMs
From: Andrew Theurer @ 2009-08-26 17:51 UTC
To: Avi Kivity; +Cc: kvm, Yan Vugenfirer

On Wed, 2009-08-26 at 19:26 +0300, Avi Kivity wrote:
> On 08/26/2009 07:14 PM, Andrew Theurer wrote:
> > On Wed, 2009-08-26 at 18:44 +0300, Avi Kivity wrote:
> >
> >> On 08/26/2009 05:57 PM, Andrew Theurer wrote:
> >>
> >>> I recently gathered some performance data when running Windows Server
> >>> 2008 VMs, and I wanted to share it here. There are 12 Windows
> >>> Server 2008 64-bit VMs (1 vcpu, 2 GB) running which handle the concurrent
> >>> execution of 6 J2EE type benchmarks. Each benchmark needs an App VM and
> >>> a Database VM. The benchmark clients inject a fixed rate of requests
> >>> which yields X% CPU utilization on the host. A different hypervisor was
> >>> compared; KVM used about 60% more CPU cycles to complete the same amount
> >>> of work. Both had their hypervisor specific paravirt IO drivers in the
> >>> VMs.
> >>>
> >>> Server is a 2 socket Core/i7, SMT off, with 72 GB memory
> >>>
> >>>
> >> Did you use large pages?
> >>
> > Yes.
> >
>
> The stats show 'largepage = 12'. Something's wrong. There's a commit
> (7736d680) that's supposed to fix largepage support for kvm-87, maybe
> it's incomplete.

How strange. /proc/meminfo showed that almost all of the pages were
used:

HugePages_Total: 12556
HugePages_Free:    220
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:     2048 kB

I just assumed they were used properly. Maybe not.

> >>> I/O on the host was not what I would call very high: outbound network
> >>> averaged at 163 Mbit/s inbound was 8 Mbit/s, while disk read ops was
> >>> 243/sec and write ops was 561/sec
> >>>
> >>>
> >> What was the disk bandwidth used? Presumably, direct access to the
> >> volume with cache=off?
> >>
> > 2.4 MB/sec write, 0.6MB/sec read, cache=none
> > The VMs' boot disks are IDE, but apps use their second disk which is
> > virtio.
> >
>
> Chickenfeed.
>
> Do the network stats include interguest traffic? I presume *all* of the
> traffic was interguest.

Sar network data:

>          IFACE    rxpck/s  txpck/s   rxkB/s   txkB/s
> Average: lo          0.00     0.00     0.00     0.00
> Average: usb0        0.39     0.19     0.02     0.01
> Average: eth0     2968.83  5093.02   340.13  6966.64
> Average: eth1     2992.92  5124.08   342.75  7008.53
> Average: eth2     1455.53  2500.63   167.45  3421.64
> Average: eth3     1500.59  2574.36   171.98  3524.82
> Average: br0         2.41     0.95     0.32     0.13
> Average: br1         1.52     0.00     0.20     0.00
> Average: br2         1.52     0.00     0.20     0.00
> Average: br3         1.52     0.00     0.20     0.00
> Average: br4         0.00     0.00     0.00     0.00
> Average: tap3      669.38   708.07   290.89   140.81
> Average: tap109    678.53   723.58   294.07   143.31
> Average: tap215    673.20   711.47   291.99   141.78
> Average: tap321    675.26   719.33   293.01   142.37
> Average: tap27     679.23   729.90   293.86   143.60
> Average: tap133    680.17   734.08   294.33   143.85
> Average: tap2     1002.24  2214.19  3458.54   457.95
> Average: tap108   1021.85  2246.53  3491.02   463.48
> Average: tap214   1002.81  2195.22  3411.80   457.28
> Average: tap320   1017.43  2241.49  3508.20   462.54
> Average: tap26    1028.52  2237.98  3483.84   462.53
> Average: tap132   1034.05  2240.89  3493.37   463.32

tap0-99 go to eth0, 100-199 to eth1, 200-299 to eth2, 300-399 to eth4.
There is some inter-guest traffic between VM pairs (like taps 2&3,
108&119, etc.) but not that significant.

>
> >> linux-aio should help reduce cpu usage.
> >>
> > I assume this is in a newer version of Qemu?
> >
>
> No, posted and awaiting merge.
>
> >> Could it be that Windows uses the debug registers? Maybe we're
> >> incorrectly deciding to switch them.
> >>
> > I was wondering about that. I was thinking of just backing out the
> > support for debugregs and see what happens.
> >
> > Did the up/down_read seem kind of high? Are we doing a lot of locking?
> >
>
> It is. We do. Marcelo made some threats to remove this lock.

Thanks,

-Andrew
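Two of the figures in this message can be cross-checked against numbers given earlier in the thread. The sketch below is only a sanity check under stated assumptions: sar's kB is taken as 1024 bytes and 1 Mbit as 2**20 bits (a convention picked because it reproduces the 163 Mbit/s and 8 Mbit/s quoted in the original report), and the 2 GB per guest is treated as 2 GiB. All names are illustrative.

```python
# Physical NIC averages from the sar output above: name -> (rxkB/s, txkB/s).
NICS = {
    "eth0": (340.13, 6966.64),
    "eth1": (342.75, 7008.53),
    "eth2": (167.45, 3421.64),
    "eth3": (171.98, 3524.82),
}


def kb_per_s_to_mbit_per_s(kb_per_s):
    """Convert kB/s to Mbit/s, assuming 1 kB = 1024 bytes and 1 Mbit = 2**20 bits."""
    return kb_per_s * 1024 * 8 / 2**20


rx_mbit = kb_per_s_to_mbit_per_s(sum(rx for rx, _ in NICS.values()))
tx_mbit = kb_per_s_to_mbit_per_s(sum(tx for _, tx in NICS.values()))

# Huge page accounting from the /proc/meminfo excerpt above.
hp_total, hp_free, hp_size_kb = 12556, 220, 2048
used_pages = hp_total - hp_free
used_mib = used_pages * hp_size_kb // 1024

# Guest RAM: 12 VMs at 2 GB each (assumed to mean 2 GiB).
guest_mib = 12 * 2 * 1024

print(f"inbound:  {rx_mbit:.2f} Mbit/s")
print(f"outbound: {tx_mbit:.2f} Mbit/s")
print(f"huge pages in use: {used_pages} ({used_mib} MiB) vs guest RAM {guest_mib} MiB")
```

Under these assumptions the four physical interfaces add up to the inbound and outbound rates in the original report, and the huge pages in use cover slightly more than the combined guest memory.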
* Re: Performace data when running Windows VMs
From: Avi Kivity @ 2009-08-26 19:20 UTC
To: habanero; +Cc: kvm, Yan Vugenfirer

On 08/26/2009 08:51 PM, Andrew Theurer wrote:
>>
>> The stats show 'largepage = 12'. Something's wrong. There's a commit
>> (7736d680) that's supposed to fix largepage support for kvm-87, maybe
>> it's incomplete.
>>
> How strange. /proc/meminfo showed that almost all of the pages were
> used:
>
> HugePages_Total: 12556
> HugePages_Free:    220
> HugePages_Rsvd:      0
> HugePages_Surp:      0
> Hugepagesize:     2048 kB
>
> I just assumed they were used properly. Maybe not.
>

My mistake. The kvm_stat numbers you provided were rate (per second),
so it just means it's still faulting in pages at a rate of 1 per guest
per second.

>
>>>>> I/O on the host was not what I would call very high: outbound network
>>>>> averaged at 163 Mbit/s inbound was 8 Mbit/s, while disk read ops was
>>>>> 243/sec and write ops was 561/sec
>>>>>
>>>>>
>>>>>
>>>> What was the disk bandwidth used? Presumably, direct access to the
>>>> volume with cache=off?
>>>>
>>>>
>>> 2.4 MB/sec write, 0.6MB/sec read, cache=none
>>> The VMs' boot disks are IDE, but apps use their second disk which is
>>> virtio.
>>>
>>>
>> Chickenfeed.
>>
>> Do the network stats include interguest traffic? I presume *all* of the
>> traffic was interguest.
>>
> Sar network data:
>
>
>>          IFACE    rxpck/s  txpck/s   rxkB/s   txkB/s
>> Average: lo          0.00     0.00     0.00     0.00
>> Average: usb0        0.39     0.19     0.02     0.01
>> Average: eth0     2968.83  5093.02   340.13  6966.64
>> Average: eth1     2992.92  5124.08   342.75  7008.53
>> Average: eth2     1455.53  2500.63   167.45  3421.64
>> Average: eth3     1500.59  2574.36   171.98  3524.82
>> Average: br0         2.41     0.95     0.32     0.13
>> Average: br1         1.52     0.00     0.20     0.00
>> Average: br2         1.52     0.00     0.20     0.00
>> Average: br3         1.52     0.00     0.20     0.00
>> Average: br4         0.00     0.00     0.00     0.00
>> Average: tap3      669.38   708.07   290.89   140.81
>> Average: tap109    678.53   723.58   294.07   143.31
>> Average: tap215    673.20   711.47   291.99   141.78
>> Average: tap321    675.26   719.33   293.01   142.37
>> Average: tap27     679.23   729.90   293.86   143.60
>> Average: tap133    680.17   734.08   294.33   143.85
>> Average: tap2     1002.24  2214.19  3458.54   457.95
>> Average: tap108   1021.85  2246.53  3491.02   463.48
>> Average: tap214   1002.81  2195.22  3411.80   457.28
>> Average: tap320   1017.43  2241.49  3508.20   462.54
>> Average: tap26    1028.52  2237.98  3483.84   462.53
>> Average: tap132   1034.05  2240.89  3493.37   463.32
>>
> tap0-99 go to eth0, 100-199 to eth1, 200-299 to eth2, 300-399 to eth4.
> There is some inter-guest traffic between VM pairs (like taps 2&3,
> 108&119, etc.) but not that significant.
>

Oh, so there are external load generators involved.

Can you run this on kvm.git master, with

CONFIG_TRACEPOINTS=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_RING_BUFFER=y
CONFIG_FTRACE_NMI_ENTER=y
CONFIG_EVENT_TRACING=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
CONFIG_DYNAMIC_FTRACE=y

(some may be overkill)

and, while the test is running, do:

  cd /sys/kernel/debug/tracing
  echo kvm > set_event
  (wait two seconds)
  cat trace > /tmp/trace

and send me /tmp/trace.bz2? should be quite big.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
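The capture steps requested above can also be scripted. This is a hedged sketch rather than anything from the thread: the function name `capture_kvm_trace` and its parameters are invented for illustration, it assumes debugfs is mounted at /sys/kernel/debug and that the script runs with enough privilege to write `set_event`, and it compresses the result with bz2 because a `/tmp/trace.bz2` file was asked for.

```python
import bz2
import time
from pathlib import Path


def capture_kvm_trace(tracing_dir="/sys/kernel/debug/tracing",
                      out_path="/tmp/trace.bz2", wait_s=2.0):
    """Enable the kvm trace events, wait, then save the trace buffer bz2-compressed.

    Returns the number of uncompressed bytes read from the trace file.
    """
    tracing = Path(tracing_dir)

    # Equivalent of: echo kvm > set_event
    (tracing / "set_event").write_text("kvm\n")

    # Equivalent of: (wait two seconds)
    time.sleep(wait_s)

    # Equivalent of: cat trace > /tmp/trace, followed by bzip2 compression.
    data = (tracing / "trace").read_bytes()
    with bz2.open(out_path, "wb") as out:
        out.write(data)
    return len(data)


# Example call (needs root and a kernel built with the tracing options above):
#   capture_kvm_trace()
```

Taking the tracing directory as a parameter keeps the function usable on systems where tracefs is mounted elsewhere, and makes it possible to try it against an ordinary directory first.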
* Re: Performace data when running Windows VMs
From: Brian Jackson @ 2009-08-26 16:27 UTC
To: habanero; +Cc: Avi Kivity, kvm, Yan Vugenfirer

On Wednesday 26 August 2009 11:14:57 am Andrew Theurer wrote:

<snip>

> >
> > > I/O on the host was not what I would call very high: outbound network
> > > averaged at 163 Mbit/s inbound was 8 Mbit/s, while disk read ops was
> > > 243/sec and write ops was 561/sec
> >
> > What was the disk bandwidth used? Presumably, direct access to the
> > volume with cache=off?
>
> 2.4 MB/sec write, 0.6MB/sec read, cache=none
> The VMs' boot disks are IDE, but apps use their second disk which is
> virtio.

In my testing, I got better performance from IDE than the new virtio block
driver for windows. There appears to be some optimization left to do on them.
* Re: Performace data when running Windows VMs
From: Andrew Theurer @ 2009-08-26 17:52 UTC
To: Brian Jackson; +Cc: Avi Kivity, kvm, Yan Vugenfirer

On Wed, 2009-08-26 at 11:27 -0500, Brian Jackson wrote:
> On Wednesday 26 August 2009 11:14:57 am Andrew Theurer wrote:
>
> <snip>
>
> > >
> > > > I/O on the host was not what I would call very high: outbound network
> > > > averaged at 163 Mbit/s inbound was 8 Mbit/s, while disk read ops was
> > > > 243/sec and write ops was 561/sec
> > >
> > > What was the disk bandwidth used? Presumably, direct access to the
> > > volume with cache=off?
> >
> > 2.4 MB/sec write, 0.6MB/sec read, cache=none
> > The VMs' boot disks are IDE, but apps use their second disk which is
> > virtio.
>
>
> In my testing, I got better performance from IDE than the new virtio block
> driver for windows. There appears to be some optimization left to do on them.

Thanks Brian. I will try IDE on both VM disks to see how it compares.

-Andrew