public inbox for kvm@vger.kernel.org
* Windows Server 2008 VM performance
@ 2009-06-02 21:05 Andrew Theurer
  2009-06-02 21:38 ` Avi Kivity
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Theurer @ 2009-06-02 21:05 UTC (permalink / raw)
  To: KVM list

I've been looking at how KVM handles Windows guests, and I am a little 
concerned with the CPU overhead.  My test case is as follows:

I am running 4 instances of a J2EE benchmark.  Each instance needs one 
application server and one DB server.  8 VMs in total are used.

I have the same App and DB software for Linux and Windows (and same 
versions) so I can compare between Linux and Windows.  I also have 
another hypervisor which I can test both Windows and Linux VMs.

The host has EPT capable processors.  VMs in KVM are backed with large 
pages.


Test results:

Config                                CPU utilization
------                                -----
KVM-85
   Windows Server 2008 64-bit VMs     44.84
   RedHat 5.3 w/ 2.6.29 64-bit VMs    24.56
Other-Hypervisor
   Windows Server 2008 64-bit VMs     30.63
   RedHat 5.3 w/ 2.6.18 64-bit VMs    27.13

-KVM running Windows VMs uses 46% more CPU than the Other-Hypervisor
-The Other-Hypervisor provides an optimized virtual network driver
-KVM results listed above did not use virtio_net or virtio_disk for 
Windows, but did for Linux
-One extra KVM run (not listed above) was made with virtio_net for 
Windows VMs but only reduced CPU by 2%
-Most of the CPU overhead could be attributed to the DB VMs, where there 
is about 5 MB/sec writes per VM
-I don't have a virtio_block driver for Windows to test.  Does one exist?
-All tests above had 2 vCPUs per VM


Here's a comparison of kvm_stat between Windows (run1) and Linux (run2):

                 run1          run2        run1/run2
                 ----          ----        ---------
efer_relo:          0             0         1
exits    :    1206880        121916         9.899
fpu_reloa:     210969         20863        10.112
halt_exit:      15092         13222         1.141
halt_wake:      14466          9294         1.556
host_stat:     211066         45117         4.678
hypercall:          0             0         1
insn_emul:     119582         38126         3.136
insn_emul:          0             0         1
invlpg   :          0             0         1
io_exits :     131051         26349         4.974
irq_exits:       8128         12937         0.628
irq_injec:      29955         21825         1.373
irq_windo:       2504          2022         1.238
kvm_reque:          0             0         1
largepage:          1            64         0.009
mmio_exit:      59224             0           Inf
mmu_cache:          0             3         0.000
mmu_flood:          0             0         1
mmu_pde_z:          0             0         1
mmu_pte_u:          0             0         1
mmu_pte_w:          0             0         1
mmu_recyc:          0             0         1
mmu_shado:          0             0         1
mmu_unsyn:          0             0         1
mmu_unsyn:          0             0         1
nmi_injec:          0             0         1
nmi_windo:          0             0         1
pf_fixed :          1            67         0.009
pf_guest :          0             0         1
remote_tl:          0             0         1
request_n:          0             0         1
signal_ex:          0             0         1
tlb_flush:        220         14037         0.016
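A ratio table like the one above can be produced from two saved "name value" 
snapshots with a short script.  Here is a minimal sketch; run1.txt/run2.txt 
are hypothetical filenames with illustrative values, and raw kvm_stat output 
may need massaging into this two-column form first:

```shell
# Sample snapshot data (illustrative, not a full kvm_stat dump).
cat > run1.txt <<'EOF'
exits 1206880
io_exits 131051
EOF
cat > run2.txt <<'EOF'
exits 121916
io_exits 26349
EOF
# First pass stores run1 values; second pass prints name, both values,
# and the run1/run2 ratio (1 when both are zero, Inf when only run2 is).
awk 'NR==FNR { a[$1] = $2; next }
     {
       if ($2 != 0)    r = sprintf("%.3f", a[$1] / $2)
       else if (a[$1]) r = "Inf"
       else            r = "1"
       printf "%-10s %10d %10d %10s\n", $1, a[$1], $2, r
     }' run1.txt run2.txt
```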


10x the number of exits, a problem?

I happened to try just one vCPU per VM for KVM/Windows VMs, and I was 
surprised how much of a difference it made:

Config                                               CPU utilization
------                                               -----
KVM-85
   Windows Server 2008 64-bit VMs, 2 vCPU per VM     44.84
   Windows Server 2008 64-bit VMs, 1 vCPU per VM     36.44

A 19% reduction in CPU utilization vs KVM/Windows-2vCPU!  That does not 
explain all the overhead (vs Other-Hypervisor, 2 vCPUs per VM), but it 
sure seems like a large cost for going from 1 to 2 vCPUs on KVM/Windows 
VMs.  I have not run with 1 vCPU per VM on the Other-Hypervisor, but I 
will soon.  
Anyway, I also collected kvm_stat for the 1 vCPU case, and here it is 
compared to KVM/Linux VMs with 2 vCPUs:

                 run1          run2        run1/run2
                 ----          ----        ---------
efer_relo:          0             0         1
exits    :    1184471        121916         9.715
fpu_reloa:     192766         20863         9.240
halt_exit:       4697         13222         0.355
halt_wake:       4360          9294         0.469
host_stat:     192828         45117         4.274
hypercall:          0             0         1
insn_emul:     130487         38126         3.422
insn_emul:          0             0         1
invlpg   :          0             0         1
io_exits :     114430         26349         4.343
irq_exits:       7075         12937         0.547
irq_injec:      29930         21825         1.371
irq_windo:       2391          2022         1.182
kvm_reque:          0             0         1
largepage:          0            64         0.001
mmio_exit:      69028             0           Inf
mmu_cache:          0             3         0.000
mmu_flood:          0             0         1
mmu_pde_z:          0             0         1
mmu_pte_u:          0             0         1
mmu_pte_w:          0             0         1
mmu_recyc:          0             0         1
mmu_shado:          0             0         1
mmu_unsyn:          0             0         1
mmu_unsyn:          0             0         1
nmi_injec:          0             0         1
nmi_windo:          0             0         1
pf_fixed :          0            67         0.001
pf_guest :          0             0         1
remote_tl:          0             0         1
request_n:          0             0         1
signal_ex:          0             0         1
tlb_flush:        124         14037         0.009

Still see the huge difference in vm_exits, so I guess not all is great yet.

So, what would be the best course of action for this?  Is there a 
virtio_block driver to test?  Can we find the root cause of the exits 
(is there a way to get a stack dump or something that can show where 
they are coming from)?

Thanks!

-Andrew


P.S. Here is the qemu cmd line for the windows VMs:
/usr/local/bin/qemu-system-x86_64 -name newcastle-xdbt01 -hda 
/dev/disk/by-id/scsi-3600a0b80000f1eb10000074f4a02b08a -net 
nic,model=e1000,vlan=0,macaddr=00:50:56:00:00:03 -m 1024 -mempath 
/hugetlbfs -net tap,vlan=0,ifname=tap3,script=/etc/qemu-ifup -vnc 
127.0.0.1:3 -smp 2 -daemonize




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Windows Server 2008 VM performance
  2009-06-02 21:05 Windows Server 2008 VM performance Andrew Theurer
@ 2009-06-02 21:38 ` Avi Kivity
  2009-06-02 21:45   ` Javier Guerra
  2009-06-03 14:21   ` Andrew Theurer
  0 siblings, 2 replies; 7+ messages in thread
From: Avi Kivity @ 2009-06-02 21:38 UTC (permalink / raw)
  To: Andrew Theurer; +Cc: KVM list

Andrew Theurer wrote:
> I've been looking at how KVM handles Windows guests, and I am a little 
> concerned with the CPU overhead.  My test case is as follows:
>
> I am running 4 instances of a J2EE benchmark.  Each instance needs one 
> application server and one DB server.  8 VMs in total are used.
>
> I have the same App and DB software for Linux and Windows (and same 
> versions) so I can compare between Linux and Windows.  I also have 
> another hypervisor which I can test both Windows and Linux VMs.
>
> The host has EPT capable processors.  VMs in KVM are backed with large 
> pages.
>
>
> Test results:
>
> Config                                CPU utilization
> ------                                -----
> KVM-85
>   Windows Server 2008 64-bit VMs     44.84
>   RedHat 5.3 w/ 2.6.29 64-bit VMs    24.56
> Other-Hypervisor
>   Windows Server 2008 64-bit VMs     30.63
>   RedHat 5.3 w/ 2.6.18 64-bit VMs    27.13
>
> -KVM running Windows VMs uses 46% more CPU than the Other-Hypervisor
> -The Other-Hypervisor provides an optimized virtual network driver
> -KVM results listed above did not use virtio_net or virtio_disk for 
> Windows, but do for Linux
> -One extra KVM run (not listed above) was made with virtio_net for 
> Windows VMs but only reduced CPU by 2%

I think the publicly available driver is pretty old and unoptimized.

> -Most of the CPU overhead could be attributed to the DB VMs, where 
> there is about 5 MB/sec writes per VM
> -I don't have a virtio_block driver for Windows to test.  Does one exist?

Exists, not public though.

> -All tests above had 2 vCPUS per VM
>
>
> Here's a comparison of kvm_stat between Windows (run1) and Linux (run2):
>
>                 run1          run2        run1/run2
>                 ----          ----        ---------
> efer_relo:          0             0         1
> exits    :    1206880        121916         9.899

total exits is the prime measure of course.

> fpu_reloa:     210969         20863        10.112
> halt_exit:      15092         13222         1.141
> halt_wake:      14466          9294         1.556
> host_stat:     211066         45117         4.678

host state reloads measure (approximately) exits to userspace, likely 
due to the unoptimized drivers.

> hypercall:          0             0         1
> insn_emul:     119582         38126         3.136

again lack of drivers

> insn_emul:          0             0         1
> invlpg   :          0             0         1
> io_exits :     131051         26349         4.974

ditto

> irq_exits:       8128         12937         0.628
> irq_injec:      29955         21825         1.373
> irq_windo:       2504          2022         1.238
> kvm_reque:          0             0         1
> largepage:          1            64         0.009
> mmio_exit:      59224             0           Inf

wow, linux avoids mmio completely.  good.

>
>
> 10x the number of exits, a problem?

_the_ problem.

>
> I happened to try just one vCPU per VM for KVM/Windows VMs, and I was 
> surprised how much of a difference it made:
>
> Config                                               CPU utilization
> ------                                               -----
> KVM-85
>   Windows Server 2008 64-bit VMs, 2 vCPU per VM     44.84
>   Windows Server 2008 64-bit VMs, 1 vCPU per VM     36.44
>
> A 19% reduction in CPU utilization vs KVM/Windows-2vCPU!  That does not 
> explain all the overhead (vs Other-Hypervisor, 2 vCPUs per VM), but it 
> sure seems like a large cost for going from 1 to 2 vCPUs on KVM/Windows VMs.  

Inter-process communication is expensive.  Marcelo added some 
optimizations (the sending vcpu used to wait for the target vcpu; now 
it doesn't).  They're in kvm-86 (of course).  You'll need 2.6.26+ on the 
host for them to take effect (in general, the more recent the host 
kernel, the faster kvm on top can run).

Windows 2008 also implements some hypervisor accelerations which are 
especially useful on smp.  Gleb has started some work on this.  I don't 
know if other_hypervisor implements them.

Finally, smp is expensive!  Data moves across caches, processors wait 
for each other, etc.

> I have not run with 1 vCPU per VM with Other-Hypervisor, but I will 
> soon.  Anyway, I also collected kvm_stat for the 1 vCPU case, and here 
> it is compared to KVM/Linux VMs with 2 vCPUs:
>
>                 run1          run2        run1/run2
>                 ----          ----        ---------
> efer_relo:          0             0         1
> exits    :    1184471        121916         9.715
>
> Still see the huge difference in vm_exits, so I guess not all is great 
> yet.

Yeah, exit rate stayed the same, so it's probably IPC costs and 
intrinsic smp costs.

> So, what would be the best course of action for this?


> Is there a virtio_block driver to test?  

There is, but it isn't available yet.

> Can we find the root cause of the exits (is there a way to get a stack 
> dump or something that can show where they are coming from)?

Marcelo is working on a super-duper easy to use kvm trace which can show 
what's going on.  The old one is reasonably easy though it exports less 
data.  If you can generate some traces, I'll have a look at them.

>
> P.S. Here is the qemu cmd line for the windows VMs:
> /usr/local/bin/qemu-system-x86_64 -name newcastle-xdbt01 -hda 
> /dev/disk/by-id/scsi-3600a0b80000f1eb10000074f4a02b08a 

Use: -drive file=/dev/very-long-name,cache=none instead of -hda to 
disable the host cache.  Won't make much of a difference for 5 MB/s though.



-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: Windows Server 2008 VM performance
  2009-06-02 21:38 ` Avi Kivity
@ 2009-06-02 21:45   ` Javier Guerra
  2009-06-02 21:48     ` Avi Kivity
  2009-06-03 14:21   ` Andrew Theurer
  1 sibling, 1 reply; 7+ messages in thread
From: Javier Guerra @ 2009-06-02 21:45 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Andrew Theurer, KVM list

On Tue, Jun 2, 2009 at 4:38 PM, Avi Kivity <avi@redhat.com> wrote:
> Andrew Theurer wrote:
>> P.S. Here is the qemu cmd line for the windows VMs:
>> /usr/local/bin/qemu-system-x86_64 -name newcastle-xdbt01 -hda
>> /dev/disk/by-id/scsi-3600a0b80000f1eb10000074f4a02b08a
>
> Use: -drive file=/dev/very-long-name,cache=none instead of -hda to disable the
> host cache.  Won't make much of a difference for 5 MB/s though.

Would emulating SCSI instead of IDE help while we hope and wait for
virtio_block drivers (and better virtio_net)?

-- 
Javier


* Re: Windows Server 2008 VM performance
  2009-06-02 21:45   ` Javier Guerra
@ 2009-06-02 21:48     ` Avi Kivity
  2009-06-02 22:15       ` Ryan Harper
  0 siblings, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2009-06-02 21:48 UTC (permalink / raw)
  To: Javier Guerra; +Cc: Andrew Theurer, KVM list

Javier Guerra wrote:
> On Tue, Jun 2, 2009 at 4:38 PM, Avi Kivity <avi@redhat.com> wrote:
>   
>> Andrew Theurer wrote:
>>     
>>> P.S. Here is the qemu cmd line for the windows VMs:
>>> /usr/local/bin/qemu-system-x86_64 -name newcastle-xdbt01 -hda
>>> /dev/disk/by-id/scsi-3600a0b80000f1eb10000074f4a02b08a
>>>       
>> Use: -drive file=/dev/very-long-name,cache=none instead of -hda to disable the
>> host cache.  Won't make much of a difference for 5 MB/s though.
>>     
>
> would emulating SCSI instead of IDE help while we hope and wait for
> virtio_block drivers? (and better virtio_net).
>   

I don't think the scsi emulation currently supports command parallelism, 
but I could be wrong.

Worth trying.
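For reference, switching the disk in the command line from the original 
mail to emulated SCSI (with the host cache disabled) might look roughly 
like this.  This is a sketch, not a tested invocation: -drive option 
spelling varies across qemu versions, and everything except the -drive 
change is copied from the original command:

```shell
/usr/local/bin/qemu-system-x86_64 -name newcastle-xdbt01 \
    -drive file=/dev/disk/by-id/scsi-3600a0b80000f1eb10000074f4a02b08a,if=scsi,cache=none \
    -net nic,model=e1000,vlan=0,macaddr=00:50:56:00:00:03 \
    -net tap,vlan=0,ifname=tap3,script=/etc/qemu-ifup \
    -m 1024 -mempath /hugetlbfs -vnc 127.0.0.1:3 -smp 2 -daemonize
```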

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: Windows Server 2008 VM performance
  2009-06-02 21:48     ` Avi Kivity
@ 2009-06-02 22:15       ` Ryan Harper
  0 siblings, 0 replies; 7+ messages in thread
From: Ryan Harper @ 2009-06-02 22:15 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Javier Guerra, Andrew Theurer, KVM list

* Avi Kivity <avi@redhat.com> [2009-06-02 16:49]:
> Javier Guerra wrote:
> >On Tue, Jun 2, 2009 at 4:38 PM, Avi Kivity <avi@redhat.com> wrote:
> >  
> >>Andrew Theurer wrote:
> >>    
> >>>P.S. Here is the qemu cmd line for the windows VMs:
> >>>/usr/local/bin/qemu-system-x86_64 -name newcastle-xdbt01 -hda
> >>>/dev/disk/by-id/scsi-3600a0b80000f1eb10000074f4a02b08a
> >>>      
> >>Use: -drive file=/dev/very-long-name,cache=none instead of -hda to disable the
> >>host cache.  Won't make much of a difference for 5 MB/s though.
> >>    
> >
> >would emulating SCSI instead of IDE help while we hope and wait for
> >virtio_block drivers? (and better virtio_net).
> >  
> 
> I don't think the scsi emulation currently supports command parallelism, 
> but I could be wrong.
> 
For reads only, and I don't think 2k8 installs to lsi scsi actually
complete.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com


* Re: Windows Server 2008 VM performance
  2009-06-02 21:38 ` Avi Kivity
  2009-06-02 21:45   ` Javier Guerra
@ 2009-06-03 14:21   ` Andrew Theurer
  2009-06-03 17:38     ` Marcelo Tosatti
  1 sibling, 1 reply; 7+ messages in thread
From: Andrew Theurer @ 2009-06-03 14:21 UTC (permalink / raw)
  To: Avi Kivity; +Cc: KVM list

Avi Kivity wrote:
> Andrew Theurer wrote:
>
>
>> Is there a virtio_block driver to test?  
>
> There is, but it isn't available yet.
OK.  Can I assume a better virtio_net driver is in the works as well?
>
>> Can we find the root cause of the exits (is there a way to get a stack 
>> dump or something that can show where they are coming from)?
>
> Marcelo is working on a super-duper easy to use kvm trace which can 
> show what's going on.  The old one is reasonably easy though it 
> exports less data.  If you can generate some traces, I'll have a look 
> at them.

Thanks Avi.  I'll try out kvm-86 and see if I can generate some kvm 
trace data.

-Andrew



* Re: Windows Server 2008 VM performance
  2009-06-03 14:21   ` Andrew Theurer
@ 2009-06-03 17:38     ` Marcelo Tosatti
  0 siblings, 0 replies; 7+ messages in thread
From: Marcelo Tosatti @ 2009-06-03 17:38 UTC (permalink / raw)
  To: Andrew Theurer; +Cc: Avi Kivity, KVM list

[-- Attachment #1: Type: text/plain, Size: 1607 bytes --]

On Wed, Jun 03, 2009 at 09:21:16AM -0500, Andrew Theurer wrote:
> Avi Kivity wrote:
>> Andrew Theurer wrote:
>>
>>
>>> Is there a virtio_block driver to test?  
>>
>> There is, but it isn't available yet.
> OK.  Can I assume a better virtio_net driver is in the works as well?
>>
>>> Can we find the root cause of the exits (is there a way to get a stack  
>>> dump or something that can show where they are coming from)?
>>
>> Marcelo is working on a super-duper easy to use kvm trace which can  
>> show what's going on.  The old one is reasonably easy though it  
>> exports less data.  If you can generate some traces, I'll have a look  
>> at them.
>
> Thanks Avi.  I'll try out kvm-86 and see if I can generate some kvm  
> trace data.

Clone
git://git.kernel.org/pub/scm/linux/kernel/git/marcelo/linux-2.6-x86-kvmtrace.git,
check out the (remote) branch kvmtrace, and _build with KVM=y,
KVM_{INTEL,AMD}=y_ (there's a missing export symbol; I will pull the
upstream fix ASAP).

Then, once the benchmark is running with this kernel, during a phase that's
indicative of the workload, do:

echo kvm_exit > /debugfs/tracing/set_event
sleep n
cat /debugfs/tracing/trace > /tmp/trace-save.txt

Make n relatively small, 1 or 2 seconds.  It would be nice to see UP vs
SMP Win2008 guests.

echo > /debugfs/tracing/set_event    # stop tracing
echo > /debugfs/tracing/trace        # zero the trace buffer

With some post-processing you can then get the exit-reason percentages.
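For example, a rough awk pass over the saved trace can count reasons and 
print percentages.  The sample lines and the "reason <NAME>" field are 
assumptions; adjust to the actual kvm_exit tracepoint format on your 
kernel (trace-sample.txt is illustrative data, not a real trace):

```shell
# Illustrative trace lines, mimicking ftrace output with a kvm_exit event.
cat > trace-sample.txt <<'EOF'
 qemu-1234 [000] 100.000001: kvm_exit: reason IO_INSTRUCTION rip 0xfff
 qemu-1234 [000] 100.000002: kvm_exit: reason APIC_ACCESS rip 0xfff
 qemu-1234 [000] 100.000003: kvm_exit: reason IO_INSTRUCTION rip 0xfff
EOF
# Count the token following "reason" on each kvm_exit line, then print
# per-reason counts and their share of all exits.
awk '/kvm_exit/ {
         for (i = 1; i < NF; i++)
             if ($i == "reason") { n[$(i+1)]++; total++ }
     }
     END {
         for (r in n)
             printf "%-22s %6d  %5.1f%%\n", r, n[r], 100 * n[r] / total
     }' trace-sample.txt
```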

Alternatively you can use systemtap (see the attached script; it needs
some adjustment in the entry/exit line-number probes), which calculates
the exit percentages for you.



[-- Attachment #2: kvm.stp --]
[-- Type: text/plain, Size: 1722 bytes --]

global exit_names[70]
global latency[70]
global vmexit
global vmentry
global vmx_exit_reason

// run stap -t kvm.stp to find out the overhead of which probe
// and change it here. should be automatic!
global exit_probe_overhead = 645
global entry_probe_overhead = 882
global handlexit_probe_overhead = 405
global overhead

probe begin (1)
{
    overhead = exit_probe_overhead + entry_probe_overhead + handlexit_probe_overhead
    exit_names[0] = "EXCEPTION"
    exit_names[1] = "EXTERNAL INT"
    exit_names[7] = "PENDING INTERRUPT"
    exit_names[10] = "CPUID"
    exit_names[18] = "HYPERCALL"
    exit_names[28] = "CR ACCESS"
    exit_names[29] = "DR ACCESS"
    exit_names[30] = "IO INSTRUCTION (LIGHT)"
    exit_names[31] = "MSR READ"
    exit_names[32] = "MSR WRITE"
    exit_names[44] = "APIC ACCESS"
    exit_names[60] = "IO INSTRUCTION (HEAVY)"
}

probe module("kvm_intel").statement("kvm_handle_exit@arch/x86/kvm/vmx.c:3011")
{
    vmx_exit_reason = $exit_reason
}

// exit
probe module("kvm_intel").statement("vmx_vcpu_run@arch/x86/kvm/vmx.c:3226") 
{
    vmexit = get_cycles()
}

// entry
probe module("kvm_intel").function("vmx_vcpu_run")
{
    vmentry = get_cycles()
    if (vmx_exit_reason != 12)
        latency[vmx_exit_reason] <<< vmentry - vmexit - overhead
}

//heavy-exit
probe module("kvm").function("kvm_arch_vcpu_ioctl_run")
{
    vmx_exit_reason = 60
}

probe end {
       foreach (x in latency-) {
        printf("%d: %s\n", x, exit_names[x])
        printf ("avg %d = sum %d / count %d\n",
                   @avg(latency[x]), @sum(latency[x]), @count(latency[x]))
        printf ("min %d max %d\n", @min(latency[x]), @max(latency[x]))
        print(@hist_linear(latency[x], 10000000, 50000000, 10000000))
        }
        
        print("\n")
}





end of thread, other threads:[~2009-06-03 17:39 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz, follow: Atom feed)
-- links below jump to the message on this page --
2009-06-02 21:05 Windows Server 2008 VM performance Andrew Theurer
2009-06-02 21:38 ` Avi Kivity
2009-06-02 21:45   ` Javier Guerra
2009-06-02 21:48     ` Avi Kivity
2009-06-02 22:15       ` Ryan Harper
2009-06-03 14:21   ` Andrew Theurer
2009-06-03 17:38     ` Marcelo Tosatti
