Re: [PATCH 0/3][RFC] NUMA: add host side pinning


From: Andre Przywara <andre.przywara@amd.com>
To: Alexander Graf <agraf@suse.de>
Cc: Anthony Liguori <anthony@codemonkey.ws>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [PATCH 0/3][RFC] NUMA: add host side pinning
Date: Thu, 24 Jun 2010 12:58:53 +0200
Message-ID: <4C233A6D.7030805@amd.com>
In-Reply-To: <865764AB-4E51-4ED4-8832-AED6A237A9D3@suse.de>

Alexander Graf wrote:
> On 24.06.2010, at 00:21, Anthony Liguori wrote:
> 
>> On 06/23/2010 04:09 PM, Andre Przywara wrote:
>>> Hi,
>>>
>>> these three patches add basic NUMA pinning to KVM. According to a
>>> user-provided assignment, parts of the guest's memory will be bound to
>>> different host nodes. This should increase performance in large virtual
>>> machines and on loaded hosts.
>>> These patches are quite basic (but they work), and I am sending them as
>>> an RFC to get some feedback before implementing things in vain.
>>>
>>> To use it, you need to provide a guest NUMA configuration; this could be
>>> as simple as "-numa node -numa node" to give two nodes in the guest. Then
>>> you pin these nodes to different host nodes with a separate command-line
>>> option: "-numa pin,nodeid=0,host=0 -numa pin,nodeid=1,host=2"
>>> This separation of host and guest config sounds a bit complicated, but
>>> was requested the last time I submitted a similar version.
>>> I refrained from binding the vCPUs to physical CPUs for now, but this
>>> can be added later with a "cpubind" option to "-numa pin,". Alternatively,
>>> this could be done from a management application using sched_setaffinity().
>>>
>>> Please note that this is currently written for qemu-kvm, although I am not
>>> up to date regarding the current status of upstream QEMU's true SMP
>>> capabilities. The final patch will be made against upstream QEMU anyway.
>>> Also, this currently supports Linux hosts only (are any other KVM hosts
>>> alive?) and PC guests only. I think both can be fixed easily if someone
>>> requests it (and gives me a pointer to further information).
>>>
>>> Please comment on the approach in general and the implementation.
>>>   
>> If we integrated -mem-path with -numa such that a different path could be used for each NUMA node (and we let an explicit file be specified instead of just a directory), then, if I understand correctly, we could use numactl without any specific integration in qemu.  Does this sound correct?
>>
>> IOW:
>>
>> qemu -numa node,mem=1G,nodeid=0,cpus=0-1,memfile=/dev/shm/node0.mem -numa node,mem=2G,nodeid=1,cpus=2-3,memfile=/dev/shm/node1.mem
>>
>> It's then possible to say:
>>
>> numactl --file /dev/shm/node0.mem --interleave=0,1
>> numactl --file /dev/shm/node1.mem --membind=2
>>
>> I think this approach is nicer because it gives the user a lot more flexibility without having us chase other tools like numactl.  For instance, your patches only support pinning and not interleaving.
> 
> Interesting idea.
> 
> So who would create the /dev/shm/nodeXX files?
Currently QEMU does. It creates a more or less unique file name, opens
the file and unlinks it right away. The difference would be to name the
file after the option value and not unlink it.
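To make the difference concrete, here is a minimal C sketch of the two variants. It is not QEMU's actual code: the helper `map_guest_ram()` and all file names are hypothetical, and it assumes a POSIX host. With `path == NULL` it mimics the current behaviour (a unique name from `mkstemp()`, unlinked immediately); with a path, as in the proposed `memfile=` option, the named file is kept so that an external tool such as numactl could find it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `size` bytes of guest RAM backed by a file.
 * path == NULL: current behaviour. A unique file is created from `tmpl`
 *   (which must end in "XXXXXX") and unlinked at once, so nothing is left
 *   behind, but no other process can find it by name either.
 * path != NULL: proposed behaviour. The named file is kept, so an
 *   external tool could apply a memory policy to it. */
void *map_guest_ram(const char *path, char *tmpl, size_t size)
{
    int fd;

    assert(path != NULL || tmpl != NULL);
    if (path == NULL) {
        fd = mkstemp(tmpl);
        if (fd < 0)
            return NULL;
        unlink(tmpl);              /* name is gone; data lives while mapped */
    } else {
        fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd < 0)
            return NULL;
    }
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    void *ram = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                     /* the mapping keeps the file referenced */
    return ram == MAP_FAILED ? NULL : ram;
}
```

For the first node in Anthony's example this would be `map_guest_ram("/dev/shm/node0.mem", NULL, 1UL << 30)`; the file then stays in /dev/shm after QEMU exits until someone removes it.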

> I can imagine starting numactl before qemu, even though that's
> cumbersome. I don't think it's feasible to start numactl after
> qemu is running. That would involve so much magic that I'd prefer
> qemu to call numactl itself.
With the current code, the files would not exist before QEMU allocated
the RAM, and once they exist, QEMU could already have touched pages
before numactl set the policy.
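
The ordering problem comes from first-touch allocation: a page only gets a physical frame (and with it a host node) when it is first touched, and a policy applied to the range afterwards does not move pages that already exist unless `mbind()` is called with `MPOL_MF_MOVE`. The following C sketch, assuming Linux with glibc, uses `mincore()` to show pages becoming resident only once they are written. The helper name `resident_pages()` is hypothetical, and `mincore()` reports residency only, not which node a page landed on.

```c
#define _DEFAULT_SOURCE            /* mincore(), MAP_ANONYMOUS */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return how many pages of [addr, addr + len) have a physical frame right
 * now, or (size_t)-1 on error. `addr` must be page aligned (an mmap() result). */
size_t resident_pages(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(npages ? npages : 1);
    size_t n = 0;

    if (vec == NULL || mincore(addr, len, vec) != 0) {
        free(vec);
        return (size_t)-1;
    }
    for (size_t i = 0; i < npages; i++)
        n += vec[i] & 1;           /* bit 0 set: page is resident */
    free(vec);
    return n;
}
```

For a freshly `mmap()`ed anonymous region the function returns 0; after writing one byte into each page it returns the number of pages. Any policy meant to govern placement therefore has to be in place before that first write.
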
To avoid this, I'd like to see the pinning done from within QEMU. I am
not sure whether calling numactl via system() and friends is OK; I'd
prefer to issue the syscalls directly (as in patch 3/3) and pull the
necessary options into the "-numa pin,..." command line. We could mimic
numactl's syntax here.
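A minimal C sketch of what issuing the syscalls directly could look like, assuming a Linux host. `parse_nodelist()` accepts a numactl-like node list such as "0,1" or "0-2,5", and `alloc_guest_node()` maps anonymous memory and applies the policy with the raw `mbind` system call before any page is touched. Both names are hypothetical and not taken from patch 3/3. The mode values are copied from `<linux/mempolicy.h>`; libnuma's `<numaif.h>` wrapper for `mbind()` would be the more usual choice, but it needs `-lnuma`.

```c
#define _GNU_SOURCE                /* syscall(), MAP_ANONYMOUS */
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Policy modes, values as in <linux/mempolicy.h> (MPOL_BIND, MPOL_INTERLEAVE). */
enum { MODE_BIND = 2, MODE_INTERLEAVE = 3 };

/* Parse a numactl-like node list ("2", "0,1", "0-2,5") into a bit mask.
 * Returns 0 on success, -1 on malformed input or a node number that does
 * not fit into an unsigned long. */
int parse_nodelist(const char *s, unsigned long *mask)
{
    unsigned long m = 0;

    assert(s != NULL && mask != NULL);
    for (;;) {
        char *end;
        unsigned long lo = strtoul(s, &end, 10), hi = lo;

        if (end == s)
            return -1;             /* empty item or no digits */
        if (*end == '-') {         /* range "lo-hi" */
            s = end + 1;
            hi = strtoul(s, &end, 10);
            if (end == s || hi < lo)
                return -1;
        }
        if (hi >= sizeof(m) * CHAR_BIT)
            return -1;
        for (unsigned long n = lo; n <= hi; n++)
            m |= 1UL << n;
        if (*end == '\0')
            break;
        if (*end != ',')
            return -1;
        s = end + 1;
    }
    *mask = m;
    return 0;
}

/* Map `size` bytes and set the NUMA policy before the first touch.
 * Returns NULL (with errno set) if parsing, mapping or mbind fails. */
void *alloc_guest_node(size_t size, int mode, const char *nodes)
{
    unsigned long mask;

    if (parse_nodelist(nodes, &mask) != 0) {
        errno = EINVAL;
        return NULL;
    }
    void *ram = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ram == MAP_FAILED)
        return NULL;
#ifdef SYS_mbind
    /* The kernel decrements maxnode before reading the mask, hence + 1. */
    long rc = syscall(SYS_mbind, ram, (unsigned long)size, (unsigned long)mode,
                      &mask, (unsigned long)(sizeof(mask) * CHAR_BIT + 1), 0UL);
#else
    long rc = -1;
    errno = ENOSYS;
#endif
    if (rc != 0) {
        int saved = errno;
        munmap(ram, size);
        errno = saved;
        return NULL;
    }
    return ram;                    /* pages are placed when first touched */
}
```

With this, `alloc_guest_node(2UL << 30, MODE_BIND, "2")` would roughly correspond to Anthony's `--membind=2` line, and `MODE_INTERLEAVE` with "0,1" to `--interleave=0,1`, which would also cover the interleaving case the current patches lack.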

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12


Thread overview: 20+ messages
2010-06-23 21:09 [PATCH 0/3][RFC] NUMA: add host side pinning Andre Przywara
2010-06-23 21:09 ` [PATCH 1/3] NUMA: add Linux libnuma detection Andre Przywara
2010-06-23 21:09 ` [PATCH 2/3] NUMA: add parsing of host NUMA pin option Andre Przywara
2010-06-23 21:09 ` [PATCH 3/3] NUMA: realize NUMA memory pinning Andre Przywara
2010-06-23 22:21 ` [PATCH 0/3][RFC] NUMA: add host side pinning Anthony Liguori
2010-06-23 22:29   ` Alexander Graf
2010-06-24 10:58     ` Andre Przywara [this message]
2010-06-24 11:12       ` Avi Kivity
2010-06-24 11:34         ` Andre Przywara
2010-06-24 11:42           ` Avi Kivity
2010-06-28 16:20             ` Anthony Liguori
2010-06-28 16:26               ` Alexander Graf
2010-06-29  9:46               ` Avi Kivity
2010-06-25 11:00           ` Jes Sorensen
2010-06-25 11:06             ` Andre Przywara
2010-06-25 11:37               ` Jes Sorensen
2010-06-28 16:17         ` Anthony Liguori
2010-06-29  9:48           ` Avi Kivity
2010-06-24  6:44   ` Andre Przywara
2010-06-24 13:14   ` Andi Kleen
