* KVM and NUMA
@ 2010-07-15 17:10 Ralf Spenneberg
From: Ralf Spenneberg @ 2010-07-15 17:10 UTC
To: kvm
Hi,
I just had a chance to play with KVM on Ubuntu 10.04 LTS on some new HP
360 g6 with Nehalem processors. I have a feeling that KVM and NUMA on
these machines do not play well together.
Doing some benchmarks I got bizarre numbers. Sometimes the VMs
performed fine, and sometimes the performance was very bad! Apparently
KVM does not recognize the NUMA architecture and places memory and
processes randomly, so they often end up on different NUMA cells.
First a couple of specs of the machine:
Two Nehalem sockets with E5520 CPUs, hyperthreading turned off, 4 cores
per socket, 8 processors in total.
# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
stepping : 5
...
Linux recognizes the NUMA architecture:
# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6
node 0 size: 12277 MB
node 0 free: 9183 MB
node 1 cpus: 1 3 5 7
node 1 size: 12287 MB
node 1 free: 8533 MB
node distances:
node 0 1
0: 10 20
1: 20 10
So I have got two cells with 4 cores each.
Virsh does not recognize the topology:
# virsh capabilities
<capabilities>
<host>
<cpu>
<arch>x86_64</arch>
<model>core2duo</model>
<topology sockets='2' cores='4' threads='1'/>
<feature name='lahf_lm'/>
..
I guess this is because QEMU does not recognize the NUMA architecture
(QEMU monitor):
(qemu) info numa
0 nodes
This is on an Ubuntu 10.04 LTS system:
Linux lxkvm01 2.6.32-23-server #37-Ubuntu SMP Fri Jun 11 09:11:11 UTC
2010 x86_64 GNU/Linux
The package used was:
qemu-kvm 0.12.3+noroms-0ubuntu9.2
When running benchmarks, numastat shows a lot of misses whenever the
performance is bad.
# numastat
                    node0        node1
numa_hit         15527158     17015505
numa_miss         7032982      3512950
numa_foreign      3512950      7032982
interleave_hit       8078         8264
local_node       15525187     17006655
other_node        7034953      3521800
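(For reference, a rough way to check where a single guest's memory actually
ended up is to sum the per-node page counts in the qemu process's numa_maps;
a sketch, with the pgrep pattern depending on how the guest process is named
on your system:
# pid=$(pgrep -f qemu | head -n 1)
# awk '{ for (i = 1; i <= NF; i++)
         if ($i ~ /^N[0-9]+=/) { split($i, kv, "="); pages[kv[1]] += kv[2] } }
       END { for (n in pages) print n, pages[n], "pages" }' /proc/$pid/numa_maps
A guest whose memory is split across both cells shows large counts for both
N0 and N1.)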
So apparently KVM does not utilize the NUMA architecture. Did I do
something wrong? Is KVM missing a patch? Do I need to activate something
in KVM for it to recognize the NUMA architecture?
I tried the newest kernel module 2.6.32.15 and qemu-kvm 0.12.4 without
any change.
Any hints?
Kind regards,
Ralf
* Re: KVM and NUMA
From: Daniel P. Berrange @ 2010-07-15 19:31 UTC
To: Ralf Spenneberg; +Cc: kvm
On Thu, Jul 15, 2010 at 07:10:35PM +0200, Ralf Spenneberg wrote:
> Hi,
>
> I just had a chance to play with KVM on Ubuntu 10.04 LTS on some new HP
> 360 g6 with Nehalem processors. I have a feeling that KVM and NUMA on
> these machines do not play well together.
>
> Doing some benchmarks I got bizarre numbers. Sometimes the VMs
> performed fine, and sometimes the performance was very bad! Apparently
> KVM does not recognize the NUMA architecture and places memory and
> processes randomly, so they often end up on different NUMA cells.
>
> First a couple of specs of the machine:
> Two Nehalem sockets with E5520 CPUs, hyperthreading turned off, 4 cores
> per socket, 8 processors in total.
>
>
> Linux recognizes the NUMA architecture:
> # numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 2 4 6
> node 0 size: 12277 MB
> node 0 free: 9183 MB
> node 1 cpus: 1 3 5 7
> node 1 size: 12287 MB
> node 1 free: 8533 MB
> node distances:
> node 0 1
> 0: 10 20
> 1: 20 10
If numactl --hardware works, then libvirt should work, since libvirt
uses the numactl library to query the topology.
>
> So I have got two cells with 4 cores each.
>
> Virsh does not recognize the topology:
> # virsh capabilities
> <capabilities>
> <host>
> <cpu>
> <arch>x86_64</arch>
> <model>core2duo</model>
> <topology sockets='2' cores='4' threads='1'/>
> <feature name='lahf_lm'/>
> ..
The NUMA topology does not get put inside the <cpu> element. It is one
level up, in a <topology> element, e.g.:
<capabilities>
<host>
<cpu>
<arch>x86_64</arch>
....snip....
</cpu>
...snip...
<topology>
<cells num='2'>
<cell id='0'>
<cpus num='4'>
<cpu id='0'/>
<cpu id='1'/>
<cpu id='2'/>
<cpu id='3'/>
</cpus>
</cell>
<cell id='1'>
<cpus num='4'>
<cpu id='4'/>
<cpu id='5'/>
<cpu id='6'/>
<cpu id='7'/>
</cpus>
</cell>
</cells>
</topology>
This shows 2 NUMA nodes (cells in libvirt terminology), each with
4 CPUs. You can also query the free RAM in each node/cell:
# virsh freecell 0
0: 1922084 kB
# virsh freecell 1
1: 1035700 kB
From both of these you can then decide where to place the guest.
> I guess this is because QEMU does not recognize the NUMA architecture
> (QEMU monitor):
> (qemu) info numa
> 0 nodes
IIRC this is reporting the guest NUMA topology, which is
completely independent of the host NUMA topology.
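(If you did want the guest itself to see a NUMA topology, qemu-kvm has
a -numa option for that; a rough sketch, with the sizes and CPU ranges
purely as an example and the exact syntax possibly differing between
versions:
qemu-system-x86_64 -m 2048 -smp 4 \
    -numa node,mem=1024M,cpus=0-1 \
    -numa node,mem=1024M,cpus=2-3 ...
That only shapes what the guest sees; it says nothing about where the
host actually places the guest's threads and memory.)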
> So apparently KVM does not utilize the NUMA architecture. Did I do
> something wrong? Is KVM missing a patch? Do I need to activate something
> in KVM for it to recognize the NUMA architecture?
There are two aspects to NUMA: 1. placing QEMU on the appropriate NUMA
nodes, and 2. defining the guest NUMA topology.
By default QEMU will float freely across all CPUs and all the guest
RAM will appear in one node. This can be bad for performance,
especially if you are benchmarking.
So for performance testing you definitely want to bind QEMU to the
CPUs within a single NUMA node at startup; this will mean that all
memory accesses are local to that node, unless you give the guest
more virtual RAM than there is free RAM on the local NUMA node.
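(If you were starting qemu-kvm by hand rather than via libvirt, the
binding itself would look roughly like this, assuming you picked node 0
and keeping the rest of the command line as usual:
# numactl --cpunodebind=0 --membind=0 qemu-system-x86_64 -m 2048 -smp 4 ...
--cpunodebind keeps the process on node 0's CPUs and --membind makes
its allocations come from node 0's memory.)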
Since you suggest you're using libvirt, the low-level way to do this is
in the guest XML at the <vcpu> element.
In my capabilities XML example above you can see 2 NUMA nodes,
each with 4 CPUs. So if I want to restrict the guest to the
first NUMA node, which has CPU numbers 0, 1, 2 and 3, then I'd do:
<domain type='kvm' id='8'>
<name>rhel6x86_64</name>
<uuid>0bbf8187-bce1-bc77-2a2c-fb033816f7f4</uuid>
<memory>819200</memory>
<currentMemory>819200</currentMemory>
<vcpu cpuset='0-3'>2</vcpu>
...snip...
You can verify the pinning with virsh vcpuinfo:
# virsh vcpuinfo rhel5xen
VCPU: 0
CPU: 1
State: running
CPU time: 15.9s
CPU Affinity: yyyy----
VCPU: 1
CPU: 2
State: running
CPU time: 9.5s
CPU Affinity: yyyy----
....snip rest...
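(If you need to adjust the pinning of an already running guest, virsh
also has a vcpupin command; e.g. for the guest above, with a plain
comma-separated CPU list:
# virsh vcpupin rhel5xen 0 0,1,2,3
# virsh vcpupin rhel5xen 1 0,1,2,3
The cpuset in the <vcpu> element is what takes effect again the next
time the guest is started.)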
It is not yet possible to define the guest-visible NUMA topology via
libvirt, but that shouldn't be too critical for performance unless
you need your guest to be able to span multiple host nodes.
For further performance you also really want to enable hugepages on
your host (e.g. mount hugetlbfs at /dev/hugepages), then restart the
libvirtd daemon, and then add the following to your guest XML just
after the <memory> element:
<memoryBacking>
<hugepages/>
</memoryBacking>
This will make it pre-allocate hugepages for all guest RAM at startup.
NB the downside is that you can't overcommit RAM, but that's a tradeoff
between maximising utilization and maximising performance.
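(For completeness, the host-side hugepage setup is roughly the
following; the page count is purely an example and should be sized to
your guest RAM, and note the kernel spreads the reserved pages across
the NUMA nodes:
# mkdir -p /dev/hugepages
# mount -t hugetlbfs hugetlbfs /dev/hugepages
# echo 1024 > /proc/sys/vm/nr_hugepages    # 1024 x 2MB pages = 2GB
# grep Huge /proc/meminfo                  # verify HugePages_Total / HugePages_Free
Depending on the libvirt version there is also a hugetlbfs_mount
setting in /etc/libvirt/qemu.conf to point libvirtd at the mount before
you restart it.)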
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
* Re: KVM and NUMA
From: Ralf Spenneberg @ 2010-07-16 6:35 UTC
To: kvm
Hi Daniel,
thanks for your response.
On Thursday, 15.07.2010, at 20:31 +0100, Daniel P. Berrange wrote:
> If numactl --hardware works, then libvirt should work, since libvirt
> uses the numactl library to query the topology.
Ok. I did not know that, and in my case it does not seem to work. See
below.
> The NUMA topology does not get put inside the <cpu> element. It is one
> level up, in a <topology> element, e.g.:
>
In my case (Ubuntu 10.04 LTS) there is no such <topology> element; the
only topology information is inside the <cpu> element. Full host listing:
<capabilities>
<host>
<cpu>
<arch>x86_64</arch>
<model>core2duo</model>
<topology sockets='2' cores='4' threads='1'/>
<feature name='lahf_lm'/>
<feature name='rdtscp'/>
<feature name='popcnt'/>
<feature name='dca'/>
<feature name='xtpr'/>
<feature name='cx16'/>
<feature name='tm2'/>
<feature name='est'/>
<feature name='vmx'/>
<feature name='ds_cpl'/>
<feature name='pbe'/>
<feature name='tm'/>
<feature name='ht'/>
<feature name='ss'/>
<feature name='acpi'/>
<feature name='ds'/>
</cpu>
<migration_features>
<live/>
<uri_transports>
<uri_transport>tcp</uri_transport>
</uri_transports>
</migration_features>
<secmodel>
<model>apparmor</model>
<doi>0</doi>
</secmodel>
</host>
> > I guess this is because QEMU does not recognize the NUMA architecture
> > (QEMU monitor):
> > (qemu) info numa
> > 0 nodes
Thanks for the clarification.
> There are two aspects to NUMA: 1. placing QEMU on the appropriate NUMA
> nodes, and 2. defining the guest NUMA topology.
Right. I am interested in placing QEMU on the appropriate node.
>
> By default QEMU will float freely across all CPUs and all the guest
> RAM will appear in one node. This can be bad for performance,
> especially if you are benchmarking.
> So for performance testing you definitely want to bind QEMU to the
> CPUs within a single NUMA node at startup; this will mean that all
> memory accesses are local to that node, unless you give the guest
> more virtual RAM than there is free RAM on the local NUMA node.
> Since you suggest you're using libvirt, the low-level way to do this is
> in the guest XML at the <vcpu> element.
Ok. But will my QEMU use the appropriate RAM, given that it
does not recognize the architecture?
> For further performance you also really want to enable hugepages on
> your host (e.g. mount hugetlbfs at /dev/hugepages), then restart the
> libvirtd daemon, and then add the following to your guest XML just
> after the <memory> element:
>
> <memoryBacking>
> <hugepages/>
> </memoryBacking>
I have played with that, too. I could mount the hugetlbfs filesystem and
define the mountpoint in libvirt. The guest started OK, but I could not
verify that it was actually being used: /proc/meminfo always showed 100%
free huge pages whether the guest was running or not. Shouldn't these
pages be used while the guest is running?
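(For reference, what I am watching is
# grep Huge /proc/meminfo
i.e. the HugePages_Total / HugePages_Free / HugePages_Rsvd counters. If
I understand the accounting correctly, HugePages_Rsvd should jump to
roughly the guest size as soon as qemu maps the hugetlbfs backing file,
even before the pages are actually faulted in, so that would be another
counter worth checking.)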
As I said: Ubuntu, not RHEL.
Kind regards,
Ralf