From: Peter Zijlstra <peterz@infradead.org>
To: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Liu ping fan <kernelfans@gmail.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
qemu-devel@nongnu.org, Ingo Molnar <mingo@redhat.com>,
Avi Kivity <avi@redhat.com>,
Anthony Liguori <anthony@codemonkey.ws>
Subject: Re: [PATCH 1/2] sched: add virt sched domain for the guest
Date: Wed, 23 May 2012 17:52:47 +0200
Message-ID: <1337788367.9783.12.camel@laptop>
In-Reply-To: <4FBD00DA.5080308@linux.vnet.ibm.com>

On Wed, 2012-05-23 at 08:23 -0700, Dave Hansen wrote:
> On 05/23/2012 01:48 AM, Peter Zijlstra wrote:
> > On Wed, 2012-05-23 at 16:34 +0800, Liu ping fan wrote:
> >> > so we need to migrate some of vcpus from node-B to node-A, or to
> >> > node-C.
> > This is absolutely broken, you cannot do that.
> >
> > A guest task might want to be node affine, it looks at the topology sets
> > a cpu affinity mask and expects to stay on that node.
> >
> > But then you come along, and flip one of those cpus to another node. The
> > guest task will now run on another node and get remote memory accesses.
>
> Insane, sure. But, if the node has physically gone away, what do we do?
> I think we've got to either kill the guest, or let it run somewhere
> suboptimal. Sounds like you're advocating killing it. ;)

You all seem terribly confused. If you want a guest that 100% mirrors
the host topology you need hard-binding of all vcpu threads and clearly
you're in trouble if you unplug a host cpu while there's still a vcpu
expecting to run there.

That's an administrator error and you get to keep the pieces, I don't
care.

In case you want simple virt-numa where a number of vcpus constitute a
vnode and have their memory all on the same node the vcpus are run on,
what does it matter if you unplug something in the host? Just migrate
everything -- including memory.

But what Liu was proposing is completely insane and broken. You cannot
simply remap cpu:node relations. Wanting to do that shows a profound
lack of understanding.

Our kernel assumes that a cpu remains on the same node. All userspace
that does anything with NUMA assumes the same. You cannot change this.
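
To make the userspace assumption concrete, here is a minimal sketch of what
a node-affine task typically does. It is not taken from the patch set or
from any particular program; the helper names (parse_cpulist, node_cpus,
pin_to_node) are hypothetical, and it assumes a Linux system that exposes
/sys/devices/system/node/nodeN/cpulist. The task reads the node's cpu list
once, sets its affinity mask, and never looks at the topology again, so a
later cpu:node remap would leave it running on remote memory without
noticing.

```python
import os


def parse_cpulist(text):
    """Parse a kernel cpulist string such as "0-3,8-11" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list (e.g. a memory-only node) yields no cpus
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def node_cpus(node, sysfs_root="/sys/devices/system/node"):
    """Return the cpus that sysfs lists for the given NUMA node right now."""
    with open(os.path.join(sysfs_root, f"node{node}", "cpulist")) as f:
        return parse_cpulist(f.read())


def pin_to_node(node):
    """Bind the calling process to the node's cpus (Linux only).

    The mask is computed once from the topology seen at call time; nothing
    here re-reads sysfs afterwards, which is the assumption at stake.
    """
    cpus = node_cpus(node)
    os.sched_setaffinity(0, cpus)  # raises OSError if the mask is unusable
    return cpus


if __name__ == "__main__":
    print("pinned to cpus:", sorted(pin_to_node(0)))
```

Whether the same pattern is done through libnuma, numactl or raw
sched_setaffinity(2), the mask is derived from a cpu:node mapping that is
treated as fixed for the lifetime of the task.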