From: "Daniel P. Berrange" <berrange@redhat.com>
To: Ralf Spenneberg <software@opensource-security.de>
Cc: kvm@vger.kernel.org
Subject: Re: KVM and NUMA
Date: Thu, 15 Jul 2010 20:31:24 +0100 [thread overview]
Message-ID: <20100715193124.GA24837@redhat.com> (raw)
In-Reply-To: <1279213835.21655.77.camel@localhost>
On Thu, Jul 15, 2010 at 07:10:35PM +0200, Ralf Spenneberg wrote:
> Hi,
>
> I just had a chance to play with KVM on Ubuntu 10.04 LTS on some new HP
> 360 g6 with Nehalem processors. I have a feeling that KVM and NUMA on
> these machines do not play well together.
>
> Doing some benchmarks I got bizarre numbers. Sometimes the VMs were
> performing fine and some times the performance was very bad! Apparently
> KVM does not recognize the NUMA-architecture and places memory and
> process randomly and therefore often on different numa cells.
>
> First a couple of specs of the machine:
> Two Nehalem sockets with E5520, Hyperthreading turned off, 4 cores per
> socket, all in all 8 processors.
>
>
> Linux recognizes the NUMA-architecture:
> # numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 2 4 6
> node 0 size: 12277 MB
> node 0 free: 9183 MB
> node 1 cpus: 1 3 5 7
> node 1 size: 12287 MB
> node 1 free: 8533 MB
> node distances:
> node 0 1
> 0: 10 20
> 1: 20 10
If numactl --hardware works, then libvirt should work,
since libvirt uses the numactl library to query the topology.
>
> So I have got two cells with 4 cores each.
>
> Virsh does not recognize the topology:
> # virsh capabilities
> <capabilities>
> <host>
> <cpu>
> <arch>x86_64</arch>
> <model>core2duo</model>
> <topology sockets='2' cores='4' threads='1'/>
> <feature name='lahf_lm'/>
> ..
The NUMA topology does not get put inside the <cpu> element. It
is one level up, in a <topology> element, e.g.
<capabilities>
<host>
<cpu>
<arch>x86_64</arch>
....snip....
</cpu>
...snip...
<topology>
<cells num='2'>
<cell id='0'>
<cpus num='4'>
<cpu id='0'/>
<cpu id='1'/>
<cpu id='2'/>
<cpu id='3'/>
</cpus>
</cell>
<cell id='1'>
<cpus num='4'>
<cpu id='4'/>
<cpu id='5'/>
<cpu id='6'/>
<cpu id='7'/>
</cpus>
</cell>
</cells>
</topology>
This shows 2 numa nodes (cells in libvirt terminology) each with
4 CPUs. You can also query free RAM in each node/cell
# virsh freecell 0
0: 1922084 kB
# virsh freecell 1
1: 1035700 kB
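As an illustrative sketch (the pick_cell helper below is mine, not a virsh feature), you could pick the cell with the most free RAM by parsing that freecell output:

```shell
# Pick the NUMA cell with the most free memory.
# stdin: lines in "virsh freecell" format, e.g. "0: 1922084 kB"
pick_cell() {
  sort -t: -k2 -n |   # numeric sort on the kB field
  tail -1 |           # keep the cell with the largest value
  cut -d: -f1         # emit just the cell id
}

# Using the numbers shown above:
printf '0: 1922084 kB\n1: 1035700 kB\n' | pick_cell   # prints 0
```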
From both of these you can then decide where to place the guest.
> I guess this is the fact, because QEMU does not recognize the
> NUMA-Architecture (QEMU-Monitor):
> (qemu) info numa
> 0 nodes
IIRC this is reporting the guest NUMA topology, which is
completely independent of the host NUMA topology.
> So apparently KVM does not utilize the NUMA-architecture. Did I do
> something wrong. Is KVM missing a patch? Do I need to activate something
> in KVM to recognize the NUMA-Architecture?
There are two aspects to NUMA: 1. placing QEMU on appropriate NUMA
nodes, and 2. defining the guest NUMA topology.
By default QEMU will float freely across any CPUs and all the guest
RAM will appear in one node. This can be bad for performance,
especially if you are benchmarking.
So for performance testing you definitely want to bind QEMU to the
CPUs within a single NUMA node at startup; this will mean that all
memory accesses are local to the node, unless you give the guest
more virtual RAM than there is free RAM on the local NUMA node.
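(If you are launching QEMU by hand rather than via libvirt, numactl itself can do the binding; the qemu-kvm arguments below are purely illustrative, not from this setup.)

```shell
# Bind both the QEMU process CPUs and its memory allocations to
# host NUMA node 0. The guest options shown are placeholders.
numactl --cpunodebind=0 --membind=0 \
    qemu-kvm -m 800 -smp 2 [...other guest options...]
```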
Since you suggest you're using libvirt, the low level way to do
this is in the guest XML via the <vcpu> element.
In my capabilities XML example above you can see 2 numa nodes,
each with 4 cpus. So if I want to restrict the guest to the
first NUMA node which has CPU numbers 0, 1, 2, 3, then I'd do
<domain type='kvm' id='8'>
<name>rhel6x86_64</name>
<uuid>0bbf8187-bce1-bc77-2a2c-fb033816f7f4</uuid>
<memory>819200</memory>
<currentMemory>819200</currentMemory>
<vcpu cpuset='0-3'>2</vcpu>
...snip...
You can verify the pinning with virsh vcpuinfo:
# virsh vcpuinfo rhel5xen
VCPU: 0
CPU: 1
State: running
CPU time: 15.9s
CPU Affinity: yyyy----
VCPU: 1
CPU: 2
State: running
CPU time: 9.5s
CPU Affinity: yyyy----
....snip rest...
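You can also change the pinning of an individual virtual CPU on a running guest with virsh vcpupin (guest name taken from the vcpuinfo example above):

```shell
# Pin virtual CPU 0 of the guest to host CPUs 0-3 (node 0 here)
virsh vcpupin rhel5xen 0 0-3
```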
It is not yet possible to define the guest-visible NUMA topology via
libvirt, but that shouldn't be too critical for performance unless
you need your guest to be able to span multiple host nodes.
For further performance gains you also really want to enable hugepages
on your host (e.g. mount hugetlbfs at /dev/hugepages), then restart
the libvirtd daemon, and then add the following to your guest XML just
after the <memory> element:
<memoryBacking>
<hugepages/>
</memoryBacking>
This will make it pre-allocate hugepages for all guest RAM at startup.
NB the downside is that you can't overcommit RAM, but that's a tradeoff
between maximising utilization and maximising performance.
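As a rough sizing sketch (assuming the common x86_64 default of 2 MiB hugepages; check Hugepagesize in /proc/meminfo on your host), the number of pages to reserve for the 819200 kB guest in the XML example above works out as:

```shell
# How many 2 MiB hugepages cover the guest's RAM?
guest_kib=819200      # the <memory> value from the example XML
page_kib=2048         # 2 MiB hugepage size (x86_64 default, an assumption)
pages=$(( (guest_kib + page_kib - 1) / page_kib ))   # round up
echo "$pages"         # prints 400

# On the host you would then reserve the pages and mount hugetlbfs:
#   echo "$pages" > /proc/sys/vm/nr_hugepages
#   mount -t hugetlbfs hugetlbfs /dev/hugepages
```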
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Thread overview: 3+ messages
2010-07-15 17:10 KVM and NUMA Ralf Spenneberg
2010-07-15 19:31 ` Daniel P. Berrange [this message]
2010-07-16 6:35 ` Ralf Spenneberg