From: Andre Przywara <andre.przywara@amd.com>
To: Avi Kivity <avi@redhat.com>
Cc: kvm@vger.kernel.org, "Daniel P. Berrange" <berrange@redhat.com>,
	Andi Kleen <ak@suse.de>
Subject: Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests
Date: Mon, 1 Dec 2008 15:15:19 +0100
Message-ID: <4933F177.5040802@amd.com>
In-Reply-To: <49318A10.7080801@redhat.com>

Avi Kivity wrote:
> Andre Przywara wrote:
>> The user (or better: management application) specifies the host nodes
>> the guest should use: -nodes 2,3 would create a two node guest mapped to
>> node 2 and 3 on the host. These numbers are handed over to libnuma:
>> VCPUs are pinned to the nodes and the allocated guest memory is bound to
>> it's respective node. Since libnuma seems not to be installed
>> everywhere, the user has to enable this via configure --enable-numa
>> In the BIOS code an ACPI SRAT table was added, which describes the NUMA
>> topology to the guest. The number of nodes is communicated via the CMOS
>> RAM (offset 0x3E). If someone thinks of this as a bad idea, tell me.
> 
> There exists now a firmware interface in qemu for this kind of 
> communications.
Oh, you are right, I missed that (it was well hidden). I was looking at 
how the BIOS detects the memory size and the number of CPUs, and those 
methods are quite cumbersome. Why not convert them to the FW_CFG methods 
(which the qemu side already sets up)? Is the reason to avoid diverging 
too much from the original BOCHS BIOS?

>> Node over-committing is allowed (-nodes 0,0,0,0), omitting the -nodes
>> parameter reverts to the old behavior.
> 
> '-nodes' is too generic a name ('node' could also mean a host).  Suggest 
> -numanode.
> 
> Need more flexibility: specify the range of memory per node, which cpus 
> are in the node, relative weights for the SRAT table:
> 
>   -numanode node=1,cpu=2,cpu=3,start=1G,size=1G,hostnode=3

I converted my code to use the new firmware interface. This also makes 
it possible to pass more information between qemu and the BIOS (the lack 
of such a channel prevented a more flexible command line in the first 
version).
So I would opt for the following:
- use numanode (or simply numa?) instead of the misleading -nodes
- allow passing memory sizes, VCPU subsets and host CPU pin info
I would prefer Daniel's version:
-numa <nrnodes>[,mem:<node1size>[;<node2size>...]]
[,cpu:<node1cpus>[;<node2cpus>...]]
[,pin:<node1hostnode>[;<node2hostnode>...]]

That would allow simple things like -numa 2 (for a two-node guest); 
options that are not given would fall back to defaults (resources split 
equally among the nodes).
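
A minimal sketch of how the proposed syntax and its defaults could be
handled, written in Python for brevity rather than in qemu's C. Everything
not spelled out above is an assumption made for illustration: the function
names, memory sizes given as plain integers in MB, and a CPU entry being
either a single number or an inclusive 'a-b' range.

```python
def split_evenly(total, parts):
    """Split total into `parts` integers that differ by at most one."""
    base, extra = divmod(total, parts)
    return [base + (1 if i < extra else 0) for i in range(parts)]


def parse_cpu_set(text):
    """Parse one per-node CPU entry into a list of VCPU numbers."""
    # Assumed syntax: a single number ('3') or an inclusive range ('0-2').
    if "-" in text:
        lo, hi = text.split("-", 1)
        return list(range(int(lo), int(hi) + 1))
    return [int(text)]


def parse_numa(arg, total_mem_mb, total_cpus):
    """Parse '<nrnodes>[,mem:a;b][,cpu:x;y][,pin:p;q]' into a dict."""
    fields = arg.split(",")
    nr_nodes = int(fields[0])
    if nr_nodes < 1:
        raise ValueError("need at least one guest node")

    opts = {}
    for field in fields[1:]:
        key, sep, value = field.partition(":")
        if not sep or key not in ("mem", "cpu", "pin"):
            raise ValueError(f"bad -numa field: {field!r}")
        per_node = value.split(";")
        if len(per_node) != nr_nodes:
            raise ValueError(f"{key}: expected {nr_nodes} entries")
        opts[key] = per_node

    # Options that were not given fall back to an equal split.
    if "mem" in opts:
        mem = [int(m) for m in opts["mem"]]
    else:
        mem = split_evenly(total_mem_mb, nr_nodes)

    if "cpu" in opts:
        cpus = [parse_cpu_set(c) for c in opts["cpu"]]
    else:
        cpus, next_cpu = [], 0
        for count in split_evenly(total_cpus, nr_nodes):
            cpus.append(list(range(next_cpu, next_cpu + count)))
            next_cpu += count

    # No default is invented for the host side; None means "not specified".
    pin = [int(p) for p in opts["pin"]] if "pin" in opts else None
    return {"nodes": nr_nodes, "mem_mb": mem, "cpus": cpus, "pin": pin}
```

With a 4096 MB, 4 VCPU guest, parse_numa("2", 4096, 4) yields two nodes
with 2048 MB and two VCPUs each and leaves the pinning unspecified.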

The only problem is the default for the host side, as libnuma requires 
the nodes to be named explicitly. Maybe make the pin: part _not_ 
optional? I would at least want to pin the memory; whether to pin the 
VCPUs as well is open to discussion...
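
To make the two options concrete, here is a small Python sketch of how a
missing pin: part could be treated. The function name resolve_pin, the
require_pin switch and the round-robin fallback are all hypothetical;
the fallback is just one possible default, not something proposed above.

```python
def resolve_pin(nr_nodes, pin, host_nodes, require_pin=True):
    """Return one host node per guest node, or raise ValueError."""
    if pin is None:
        if require_pin:
            # Variant 1: the pin: part is mandatory.
            raise ValueError("pin: must name a host node for every guest node")
        # Variant 2 (assumed fallback): spread guest nodes round-robin
        # over the host nodes that are available.
        return [host_nodes[i % len(host_nodes)] for i in range(nr_nodes)]

    if len(pin) != nr_nodes:
        raise ValueError(f"pin: expected {nr_nodes} entries, got {len(pin)}")
    unknown = [n for n in pin if n not in host_nodes]
    if unknown:
        raise ValueError(f"pin: unknown host nodes {unknown}")
    # The same host node may appear more than once (over-committing).
    return list(pin)
```

An explicit list such as [0, 0, 0, 0] passes unchanged, which matches
the over-committing case from the first version of the patch.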

> 
> Also need a monitor command to change host nodes dynamically:
Implementing a monitor interface is a good idea.
> (qemu) numanode 1 0
Does that include page migration? That would be easy to do with 
mbind() and the MPOL_MF_MOVE flag, but it would take some time and 
resources (which I think is OK if explicitly triggered in the monitor).
Any other useful commands for the monitor? Maybe (temporary) VCPU 
migration without page migration?

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
----to satisfy European Law for business letters:
AMD Saxony Limited Liability Company & Co. KG,
Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register Court Dresden: HRA 4896, General Partner authorized
to represent: AMD Saxony LLC (Wilmington, Delaware, US)
General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy

