From: Andre Przywara <andre.przywara@amd.com>
To: Avi Kivity <avi@redhat.com>
Cc: kvm@vger.kernel.org, "Daniel P. Berrange" <berrange@redhat.com>,
	Andi Kleen <ak@suse.de>
Subject: Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests
Date: Mon, 1 Dec 2008 15:15:19 +0100
Message-ID: <4933F177.5040802@amd.com>
In-Reply-To: <49318A10.7080801@redhat.com>

Avi Kivity wrote:
> Andre Przywara wrote:
>> The user (or better: management application) specifies the host nodes
>> the guest should use: -nodes 2,3 would create a two node guest mapped to
>> node 2 and 3 on the host. These numbers are handed over to libnuma:
>> VCPUs are pinned to the nodes and the allocated guest memory is bound to
>> its respective node. Since libnuma seems not to be installed
>> everywhere, the user has to enable this via configure --enable-numa
>> In the BIOS code an ACPI SRAT table was added, which describes the NUMA
>> topology to the guest. The number of nodes is communicated via the CMOS
>> RAM (offset 0x3E). If someone thinks of this as a bad idea, tell me.
> 
> There exists now a firmware interface in qemu for this kind of 
> communications.
Oh, right you are, I missed that (it was well hidden). I was looking at 
how the BIOS detects the memory size and the number of CPUs, and these 
methods are quite cumbersome. Why not convert them to the FW_CFG methods 
(which the qemu side already sets)? Is it to avoid diverging too much 
from the original BOCHS BIOS?

>> Node over-committing is allowed (-nodes 0,0,0,0), omitting the -nodes
>> parameter reverts to the old behavior.
> 
> '-nodes' is too generic a name ('node' could also mean a host).  Suggest 
> -numanode.
> 
> Need more flexibility: specify the range of memory per node, which cpus 
> are in the node, relative weights for the SRAT table:
> 
>   -numanode node=1,cpu=2,cpu=3,start=1G,size=1G,hostnode=3

I converted my code to use the new firmware interface. This also makes 
it possible to pass more information between qemu and the BIOS (the lack 
of which prevented a more flexible command line in the first version).
So I would opt for the following:
- use -numanode (or simply -numa?) instead of the misleading -nodes
- allow passing memory sizes, VCPU subsets and host node pinning info
I would prefer Daniel's version:
-numa <nrnodes>[,mem:<node1size>[;<node2size>...]]
[,cpu:<node1cpus>[;<node2cpus>...]]
[,pin:<node1hostnode>[;<node2hostnode>...]]

That would allow simple things like -numa 2 (for a two-node guest); 
options that are not given would fall back to defaults (resources split 
equally between the nodes).
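
To make the proposed syntax concrete, here is a minimal sketch in C of how 
such an option string might be parsed. It is not taken from the patches: 
the names parse_numa, struct numa_opts and MAX_NODES are hypothetical, 
memory sizes are assumed to be plain numbers in MB, and VCPU lists are 
assumed to be ranges like 0-1 and are kept as text. Options that are not 
given keep the defaults (memory split equally, no host node).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES 8   /* hypothetical limit, for the sketch only */

/* Hypothetical container for one parsed -numa option. */
struct numa_opts {
    int  nr_nodes;
    long mem_mb[MAX_NODES];     /* per-node memory, unit assumed to be MB */
    char cpus[MAX_NODES][32];   /* per-node VCPU list as text, "" = default */
    int  host_node[MAX_NODES];  /* host node to pin to, -1 = not given */
};

/* Copy the ';'-separated fields of list[0..len) into out.
 * Returns the field count, or -1 on too many or too long fields. */
static int split_fields(const char *list, size_t len, char out[][32], int max)
{
    const char *end = list + len;
    int n = 0;

    for (;;) {
        if (n == max)
            return -1;
        const char *sep = memchr(list, ';', (size_t)(end - list));
        size_t flen = (size_t)((sep ? sep : end) - list);
        if (flen >= 32)
            return -1;
        memcpy(out[n], list, flen);
        out[n][flen] = '\0';
        n++;
        if (!sep)
            return n;
        list = sep + 1;
    }
}

/* Parse "<nrnodes>[,mem:...][,cpu:...][,pin:...]". Returns 0 or -1. */
int parse_numa(const char *arg, long total_mem_mb, struct numa_opts *o)
{
    assert(arg && o);

    char *end;
    long nodes = strtol(arg, &end, 10);
    if (end == arg || nodes < 1 || nodes > MAX_NODES)
        return -1;
    o->nr_nodes = (int)nodes;

    /* Defaults: equally split memory, no VCPU list, no host node. */
    for (int i = 0; i < o->nr_nodes; i++) {
        o->mem_mb[i] = total_mem_mb / nodes;
        o->cpus[i][0] = '\0';
        o->host_node[i] = -1;
    }

    const char *rest = end;
    while (*rest == ',') {
        const char *key = rest + 1;
        size_t len = strcspn(key, ",");
        char f[MAX_NODES][32];

        if (len < 4 || key[3] != ':')
            return -1;
        int n = split_fields(key + 4, len - 4, f, o->nr_nodes);
        if (n < 0)
            return -1;
        for (int i = 0; i < n; i++) {
            if (f[i][0] == '\0')
                continue;               /* empty field keeps the default */
            if (!strncmp(key, "mem", 3))
                o->mem_mb[i] = strtol(f[i], NULL, 10);
            else if (!strncmp(key, "cpu", 3))
                strcpy(o->cpus[i], f[i]);
            else if (!strncmp(key, "pin", 3))
                o->host_node[i] = atoi(f[i]);
            else
                return -1;              /* unknown sub-option */
        }
        rest = key + len;
    }
    return *rest == '\0' ? 0 : -1;
}
```

With this sketch, parse_numa("2", 4096, &o) would give two nodes of 
2048 MB each with no host node set, while a string that lists more fields 
than there are nodes is rejected.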

The only problem is the default for the host side, as libnuma requires 
the nodes to be named explicitly. Maybe make the pin: part _not_ 
optional? I would at least want to pin the memory; whether to pin the 
VCPUs as well is open to discussion...

> 
> Also need a monitor command to change host nodes dynamically:
Implementing a monitor interface is a good idea.
> (qemu) numanode 1 0
Does that include page migration? That would be easily possible with 
mbind(MPOL_MF_MOVE), but would take some time and resources (which I 
think is OK if explicitly triggered in the monitor).
Any other useful commands for the monitor? Maybe (temporary) VCPU 
migration without page migration?
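
For reference, a rough sketch in C of what the page migration could look 
like, assuming the guest memory is one page-aligned region in the qemu 
process. The function names rebind_guest_memory and node_mask are 
hypothetical and not part of the patches. The sketch calls the mbind 
system call directly so that it builds without libnuma; real code would 
more likely use the mbind() wrapper from <numaif.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/mempolicy.h>   /* MPOL_BIND, MPOL_MF_MOVE */

/* Build a node mask with only `node` set. A single unsigned long is
 * enough for the sketch (nodes 0..63 on a 64-bit host). */
unsigned long node_mask(int node)
{
    assert(node >= 0 && node < (int)(8 * sizeof(unsigned long)));
    return 1UL << node;
}

/* Bind [addr, addr + len) to host_node and ask the kernel to migrate
 * pages that were already allocated on other nodes. addr must be page
 * aligned. Returns 0 on success, -1 with errno set on failure. */
int rebind_guest_memory(void *addr, size_t len, int host_node)
{
    unsigned long mask = node_mask(host_node);

    /* maxnode is the number of bits in the mask plus one. */
    return (int)syscall(SYS_mbind, addr, (unsigned long)len,
                        (unsigned long)MPOL_BIND, &mask,
                        (unsigned long)(8 * sizeof(mask) + 1),
                        (unsigned long)MPOL_MF_MOVE);
}
```

MPOL_MF_MOVE only moves pages that are used exclusively by the calling 
process, which should be the common case for guest RAM. The call can fail 
on hosts without NUMA support or without the named node, so a monitor 
command built on it would have to report the error.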

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
----to satisfy European Law for business letters:
AMD Saxony Limited Liability Company & Co. KG,
Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register Court Dresden: HRA 4896, General Partner authorized
to represent: AMD Saxony LLC (Wilmington, Delaware, US)
General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
