From: Andre Przywara <andre.przywara@amd.com>
To: Avi Kivity <avi@redhat.com>
Cc: kvm@vger.kernel.org, "Daniel P. Berrange" <berrange@redhat.com>,
Andi Kleen <ak@suse.de>
Subject: Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests
Date: Mon, 1 Dec 2008 15:15:19 +0100
Message-ID: <4933F177.5040802@amd.com>
In-Reply-To: <49318A10.7080801@redhat.com>
Avi Kivity wrote:
> Andre Przywara wrote:
>> The user (or better: management application) specifies the host nodes
>> the guest should use: -nodes 2,3 would create a two node guest mapped to
>> node 2 and 3 on the host. These numbers are handed over to libnuma:
>> VCPUs are pinned to the nodes and the allocated guest memory is bound to
>> its respective node. Since libnuma seems not to be installed
>> everywhere, the user has to enable this via configure --enable-numa.
>> In the BIOS code an ACPI SRAT table was added, which describes the NUMA
>> topology to the guest. The number of nodes is communicated via the CMOS
>> RAM (offset 0x3E). If someone thinks of this as a bad idea, tell me.
>
> There exists now a firmware interface in qemu for this kind of
> communications.
Oh, you are right, I missed that (it was well hidden). I was looking at how
the BIOS detects the memory size and the number of CPUs, and these methods
are quite cumbersome. Why not convert them to the FW_CFG methods (which the
qemu side already sets)? Is it to avoid diverging too much from the original
BOCHS BIOS?
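A rough sketch of what the guest-side SRAT memory entries described in the quoted patch text could look like. It assumes one ACPI Memory Affinity structure (type 1, 40 bytes, field layout per the ACPI spec) per guest node and an equal split of guest RAM. The helper name srat_fill_mem and the split policy are hypothetical, not the code from the patch, and the sketch ignores the PCI hole and other gaps in the real x86 memory map:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ACPI SRAT Memory Affinity Structure (type 1, 40 bytes), layout per the ACPI spec. */
struct srat_mem_affinity {
    uint8_t  type;              /* 1 = Memory Affinity */
    uint8_t  length;            /* 40 */
    uint32_t proximity_domain;  /* guest NUMA node number */
    uint16_t reserved1;
    uint32_t base_lo;           /* base address, low 32 bits */
    uint32_t base_hi;           /* base address, high 32 bits */
    uint32_t length_lo;         /* range length, low 32 bits */
    uint32_t length_hi;         /* range length, high 32 bits */
    uint32_t reserved2;
    uint32_t flags;             /* bit 0: enabled */
    uint64_t reserved3;
} __attribute__((packed));

_Static_assert(sizeof(struct srat_mem_affinity) == 40,
               "SRAT memory affinity entry must be 40 bytes");

#define SRAT_MEM_ENABLED 0x1u

/*
 * Hypothetical helper: fill one entry per guest node, splitting ram_size
 * equally and giving any remainder to the last node. Holes in the guest
 * physical address space (e.g. the PCI hole) are ignored in this sketch.
 */
static void srat_fill_mem(struct srat_mem_affinity *e, uint32_t nr_nodes,
                          uint64_t ram_size)
{
    uint64_t base = 0;

    assert(e != NULL && nr_nodes > 0);
    for (uint32_t i = 0; i < nr_nodes; i++) {
        uint64_t len = (i == nr_nodes - 1) ? ram_size - base
                                           : ram_size / nr_nodes;

        memset(&e[i], 0, sizeof(e[i]));
        e[i].type = 1;
        e[i].length = (uint8_t)sizeof(e[i]);
        e[i].proximity_domain = i;
        e[i].base_lo = (uint32_t)base;
        e[i].base_hi = (uint32_t)(base >> 32);
        e[i].length_lo = (uint32_t)len;
        e[i].length_hi = (uint32_t)(len >> 32);
        e[i].flags = SRAT_MEM_ENABLED;
        base += len;
    }
}
```

With 8 GB and two nodes this yields node 0 at 0..4 GB and node 1 at 4..8 GB. The CPU side (Processor Local APIC Affinity structures, type 0) would be filled analogously, one entry per VCPU.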
>> Node over-committing is allowed (-nodes 0,0,0,0), omitting the -nodes
>> parameter reverts to the old behavior.
>
> '-nodes' is too generic a name ('node' could also mean a host). Suggest
> -numanode.
>
> Need more flexibility: specify the range of memory per node, which cpus
> are in the node, relative weights for the SRAT table:
>
> -numanode node=1,cpu=2,cpu=3,start=1G,size=1G,hostnode=3
I converted my code to use the new firmware interface. This also makes
it possible to pass more information between qemu and the BIOS (the lack
of such a channel prevented a more flexible command line in the first
version).
So I would opt for the following:
- use numanode (or simply numa?) instead of the misleading -nodes
- allow passing memory sizes, VCPU subsets and host CPU pin info
I would prefer Daniel's version:
-numa <nrnodes>[,mem:<node1size>[;<node2size>...]]
[,cpu:<node1cpus>[;<node2cpus>...]]
[,pin:<node1hostnode>[;<node2hostnode>...]]
That would allow simple invocations like -numa 2 (for a two-node guest);
omitted options would fall back to defaults (resources split equally
among the nodes).
The only problem is the default for the host side, as libnuma requires
the nodes to be named explicitly. Maybe make the pin: part _not_
optional? I would at least want to pin the memory; pinning the VCPUs is
open for discussion...
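A minimal sketch of parsing the proposed -numa syntax, assuming per-node memory sizes in MB and ';'-separated lists that must contain exactly one entry per guest node. numa_parse, struct numa_opts and MAX_NODES are hypothetical names, not qemu code; cpu: is left out for brevity, and pin: defaults to -1 (unpinned) since there is no obvious default host node:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES 8  /* hypothetical limit for this sketch */

/* Hypothetical result of parsing
 * "<nrnodes>[,mem:<size>[;<size>...]][,pin:<hostnode>[;<hostnode>...]]" */
struct numa_opts {
    int nr_nodes;
    uint64_t mem_mb[MAX_NODES];  /* guest memory per node, in MB */
    int host_node[MAX_NODES];    /* host node to bind to, -1 = not pinned */
};

/* Parse "a;b;c" (exactly len chars) into out[]; returns the count or -1. */
static int parse_list(const char *s, size_t len, long *out, int max)
{
    const char *end = s + len;
    int n = 0;

    while (s < end) {
        char *stop;
        long v = strtol(s, &stop, 10);
        if (stop == s || stop > end || v < 0 || n == max)
            return -1;
        out[n++] = v;
        s = stop;
        if (s < end) {
            if (*s != ';')
                return -1;
            if (++s == end)  /* reject a trailing ';' */
                return -1;
        }
    }
    return n;
}

/* Returns 0 on success, -1 on a malformed argument. ram_mb is the total
 * guest memory used for the default equal split. */
static int numa_parse(const char *arg, uint64_t ram_mb, struct numa_opts *o)
{
    char *after;
    const char *p;
    long n;

    assert(arg != NULL && o != NULL);
    n = strtol(arg, &after, 10);
    if (after == arg || n < 1 || n > MAX_NODES)
        return -1;
    o->nr_nodes = (int)n;

    /* Defaults: equal memory split (remainder to the last node), no pinning. */
    for (int i = 0; i < n; i++) {
        o->mem_mb[i] = ram_mb / (uint64_t)n;
        o->host_node[i] = -1;
    }
    o->mem_mb[n - 1] += ram_mb % (uint64_t)n;

    for (p = after; *p == ','; ) {
        const char *key = p + 1;
        const char *val = strchr(key, ':');
        const char *end;
        long v[MAX_NODES];

        if (val == NULL)
            return -1;
        val++;
        end = strchr(val, ',');
        if (end == NULL)
            end = val + strlen(val);
        if (parse_list(val, (size_t)(end - val), v, MAX_NODES) != n)
            return -1;  /* need exactly one entry per guest node */

        if (strncmp(key, "mem:", 4) == 0) {
            for (int i = 0; i < n; i++)
                o->mem_mb[i] = (uint64_t)v[i];
        } else if (strncmp(key, "pin:", 4) == 0) {
            for (int i = 0; i < n; i++)
                o->host_node[i] = (int)v[i];
        } else {
            return -1;  /* cpu: and unknown keys are not handled here */
        }
        p = end;
    }
    return *p == '\0' ? 0 : -1;
}
```

For example, "2" with 1024 MB of guest RAM gives two unpinned 512 MB nodes, while "2,mem:768;256,pin:2;3" puts 768 MB on host node 2 and 256 MB on host node 3. Over-committing such as "4,pin:0;0;0;0" is accepted as before.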
>
> Also need a monitor command to change host nodes dynamically:
Implementing a monitor interface is a good idea.
> (qemu) numanode 1 0
Does that include page migration? That would be easy to do with mbind()
and the MPOL_MF_MOVE flag, but it would take some time and resources
(which I think is OK if explicitly triggered from the monitor).
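A sketch of that migration step, assuming the page-aligned guest RAM range of one node is already known. It calls the mbind system call directly via syscall(2), so libnuma is not required; move_range_to_node is a hypothetical name. MPOL_MF_MOVE only moves pages mapped by this process alone (moving shared pages would need MPOL_MF_MOVE_ALL, which requires CAP_SYS_NICE):

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* for syscall() */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/mempolicy.h>  /* MPOL_BIND, MPOL_MF_MOVE */

/*
 * Hypothetical helper: bind [addr, addr + len) to host node `node` and ask
 * the kernel to migrate pages that currently live on other nodes.
 * addr must be page-aligned. Returns 0 on success, -1 with errno set.
 */
static int move_range_to_node(void *addr, size_t len, int node)
{
    unsigned long mask;

    assert(addr != NULL);
    if (node < 0 || node >= (int)(8 * sizeof(mask))) {
        errno = EINVAL;
        return -1;
    }
    mask = 1UL << node;

    /* The kernel reads maxnode - 1 bits of the mask, so pass one more than
     * the mask width. MPOL_MF_MOVE migrates only pages mapped by this
     * process alone; this can take a while for a large range. */
    return (int)syscall(SYS_mbind, addr, len, (unsigned long)MPOL_BIND,
                        &mask, (unsigned long)(8 * sizeof(mask) + 1),
                        (unsigned long)MPOL_MF_MOVE);
}
```

A monitor command like "numanode 1 0" could then call this once for guest node 1's memory range with host node 0, after re-pinning that node's VCPU threads.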
Any other useful commands for the monitor? Maybe (temporary) VCPU
migration without page migration?
Regards,
Andre.
--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
----to satisfy European Law for business letters:
AMD Saxony Limited Liability Company & Co. KG,
Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register Court Dresden: HRA 4896, General Partner authorized
to represent: AMD Saxony LLC (Wilmington, Delaware, US)
General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy