Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests

From: Anthony Liguori <anthony@codemonkey.ws>
To: Avi Kivity <avi@redhat.com>
Cc: Andre Przywara <andre.przywara@amd.com>,
	kvm@vger.kernel.org, "Daniel P. Berrange" <berrange@redhat.com>,
	Andi Kleen <ak@suse.de>
Subject: Re: [PATCH 0/3] KVM-userspace: add NUMA support for guests
Date: Mon, 01 Dec 2008 09:49:04 -0600
Message-ID: <49340770.9000908@codemonkey.ws>
In-Reply-To: <493404CA.3060404@redhat.com>

Avi Kivity wrote:
> Anthony Liguori wrote:
>>
>> I see no compelling reason to do cpu placement internally.  It can be 
>> done quite effectively externally.
>>
>> Memory allocation is tough, but I don't think it's out of reach.  
>> Looking at the numactl man page, you can do:
>>
>> numactl  --offset=1G  --length=1G --membind=1 --file /dev/shm/A --touch
>>       Bind the second gigabyte in the tmpfs file /dev/shm/A to node 1.
>>
>>
>> Since we can already create VM's with the -mem-path argument, if you 
>> create a 2GB guest and want it to span two numa nodes, you could do:
>>
>> numactl  --offset=0G  --length=1G --membind=0 --file /dev/shm/A --touch
>> numactl  --offset=1G  --length=1G --membind=1 --file /dev/shm/A --touch
>>
>> And then create the VM with:
>>
>> qemu-system-x86_64 -mem-path /dev/shm/A -m 2G ...
>>
>> What's best about this approach, is that you get full access to what 
>> numactl is capable of.  Interleaving, rebalancing, etc.
>
> It looks horribly difficult and unintuitive.  It forces you to use 
> -mem-path (which is an abomination; the only reason it lives is that 
> we can't allocate large pages without it).

As opposed to inventing new options for QEMU that convey all of the same 
information a slightly different way?  We're stuck with -mem-path so we 
might as well make good use of it.

The proposed syntax is:

qemu -numanode node=1,cpu=2,cpu=3,start=1G,size=1G,hostnode=3

The new syntax would be:

qemu -smp 4 -numa nodes=2,cpus=1:2:3:4,mem=1G:1G -mem-path /dev/hugetlbfs/foo

Then you would have to look up the thread ids, and do

taskset <vcpu1>
taskset <vcpu2>
taskset <vcpu3>
taskset <vcpu4>
numactl -o 0G -l 1G -m 0 -f /dev/hugetlbfs/foo
numactl -o 1G -l 1G -m 1 -f /dev/hugetlbfs/foo

This may look like a lot more work, but specifying NUMA placement only 
at startup is not going to be nearly enough.  What if you have a very 
large NUMA system and want to rebalance virtual machines?  You need a 
mechanism to do this, and it would then have to be exposed through the 
monitor.  In fact, you'll almost certainly end up introducing a 
taskset-like monitor command and a numactl-like monitor command.

Why reinvent the wheel?  Plus, taskset and numactl give you a lot of 
flexibility.  All we're going to do by cooking this stuff into QEMU is 
artificially limit ourselves.
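
As a rough sketch of the external approach described above, the helper 
functions below only print the numactl and taskset command lines rather 
than running them.  The function names, the fixed 1G slice per node, the 
mapping of slice i to host node i, and the mapping of vcpu k to host cpu k 
are all assumptions made for illustration; the vcpu thread ids are 
expected to be looked up by the caller and passed in.

```shell
#!/bin/sh
# Sketch only: prints commands instead of executing them.
# Assumptions (not from the thread): 1G of guest memory per host node,
# slice i is bound to host node i, vcpu k is pinned to host cpu k.

# Print one numactl line per host node, binding consecutive 1G slices
# of the -mem-path backing file to nodes 0..N-1.
numa_bind_cmds() {
    file=$1
    nodes=$2
    i=0
    while [ "$i" -lt "$nodes" ]; do
        echo "numactl --offset=${i}G --length=1G --membind=$i --file $file --touch"
        i=$((i + 1))
    done
}

# Print one taskset line per vcpu thread id given as an argument.
vcpu_pin_cmds() {
    cpu=0
    for tid in "$@"; do
        echo "taskset -cp $cpu $tid"
        cpu=$((cpu + 1))
    done
}
```

For a 2GB guest backed by /dev/shm/A, `numa_bind_cmds /dev/shm/A 2` 
prints the two numactl invocations quoted earlier, and 
`vcpu_pin_cmds <tid1> <tid2>` prints one pinning command per thread id.  
Rebalancing later is then a matter of re-running the same tools with 
different node or cpu numbers.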

Regards,

Anthony Liguori
