From: Peter Zijlstra <peterz@infradead.org>
To: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org,
Liu ping fan <kernelfans@gmail.com>,
linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
Avi Kivity <avi@redhat.com>,
Anthony Liguori <anthony@codemonkey.ws>
Subject: Re: [Qemu-devel] [PATCH 1/2] sched: add virt sched domain for the guest
Date: Wed, 23 May 2012 17:52:47 +0200
Message-ID: <1337788367.9783.12.camel@laptop>
In-Reply-To: <4FBD00DA.5080308@linux.vnet.ibm.com>

On Wed, 2012-05-23 at 08:23 -0700, Dave Hansen wrote:
> On 05/23/2012 01:48 AM, Peter Zijlstra wrote:
> > On Wed, 2012-05-23 at 16:34 +0800, Liu ping fan wrote:
> >> > so we need to migrate some of vcpus from node-B to node-A, or to
> >> > node-C.
> > This is absolutely broken, you cannot do that.
> >
> > A guest task might want to be node affine, it looks at the topology sets
> > a cpu affinity mask and expects to stay on that node.
> >
> > But then you come along, and flip one of those cpus to another node. The
> > guest task will now run on another node and get remote memory accesses.
>
> Insane, sure. But, if the node has physically gone away, what do we do?
> I think we've got to either kill the guest, or let it run somewhere
> suboptimal. Sounds like you're advocating killing it. ;)

You all seem terribly confused. If you want a guest that 100% mirrors
the host topology you need hard-binding of all vcpu threads and clearly
you're in trouble if you unplug a host cpu while there's still a vcpu
expecting to run there.

That's an administrator error and you get to keep the pieces, I don't
care.

In case you want simple virt-numa where a number of vcpus constitute a
vnode and have their memory all on the same node the vcpus run on,
what does it matter if you unplug something in the host? Just migrate
everything -- including memory.

But what Liu was proposing is completely insane and broken. You cannot
simply remap cpu:node relations. Wanting to do that shows a profound
lack of understanding.

Our kernel assumes that a cpu remains on the same node. All userspace
that does anything with NUMA assumes the same. You cannot change this.
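
As an illustration of that userspace assumption (a minimal sketch, not
taken from any patch in this thread): a node-affine task typically reads
the node's CPU list from sysfs once and pins itself to it. The helper
names parse_cpulist and pin_to_node are hypothetical; the sysfs path is
the usual /sys/devices/system/node/nodeN/cpulist, and
os.sched_setaffinity is the Python standard-library wrapper (Linux only).

```python
import os


def parse_cpulist(text):
    """Parse a sysfs cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty string: the node lists no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_node(node):
    """Pin the caller to the CPUs that sysfs reports for `node`.

    Hypothetical helper. The mask is computed once; if one of these CPU
    ids were later remapped to a different node, nothing here would
    notice and the task would silently run remote from its memory.
    """
    path = "/sys/devices/system/node/node%d/cpulist" % node
    with open(path) as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(0, cpus)  # pid 0 means the caller itself
    return cpus
```

A guest task that calls pin_to_node(0) stays local to node 0's memory
only for as long as those CPU ids remain on node 0, which is the
invariant a cpu:node remap would break.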
Thread overview: 14+ messages
2012-05-23 6:32 [Qemu-devel] [RFC] kvm: export host NUMA info to guest's scheduler Liu Ping Fan
2012-05-23 6:32 ` [Qemu-devel] [PATCH 1/2] sched: add virt sched domain for the guest Liu Ping Fan
2012-05-23 7:54 ` Peter Zijlstra
2012-05-23 8:10 ` Liu ping fan
2012-05-23 8:23 ` Peter Zijlstra
2012-05-23 8:34 ` Liu ping fan
2012-05-23 8:48 ` Peter Zijlstra
2012-05-23 9:58 ` Liu ping fan
2012-05-23 10:14 ` Peter Zijlstra
2012-05-23 15:23 ` Dave Hansen
2012-05-23 15:52 ` Peter Zijlstra [this message]
2012-05-23 6:32 ` [Qemu-devel] [PATCH 2/2] sched: add virt domain device's driver Liu Ping Fan
2012-05-23 6:32 ` [Qemu-devel] [PATCH] kvm: collect vcpus' numa info for guest's scheduler Liu Ping Fan
2012-05-23 6:32 ` [Qemu-devel] [PATCH] Qemu: add virt sched domain device Liu Ping Fan