From: Peter Zijlstra
To: Dave Hansen
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Liu ping fan, linux-kernel@vger.kernel.org, Ingo Molnar, Avi Kivity, Anthony Liguori
Subject: Re: [Qemu-devel] [PATCH 1/2] sched: add virt sched domain for the guest
Date: Wed, 23 May 2012 17:52:47 +0200
Message-ID: <1337788367.9783.12.camel@laptop>
In-Reply-To: <4FBD00DA.5080308@linux.vnet.ibm.com>
References: <1337754751-9018-1-git-send-email-kernelfans@gmail.com> <1337754751-9018-2-git-send-email-kernelfans@gmail.com> <1337759644.9698.49.camel@twins> <1337761402.9698.62.camel@twins> <1337762914.9698.65.camel@twins> <4FBD00DA.5080308@linux.vnet.ibm.com>

On Wed, 2012-05-23 at 08:23 -0700, Dave Hansen wrote:
> On 05/23/2012 01:48 AM, Peter Zijlstra wrote:
> > On Wed, 2012-05-23 at 16:34 +0800, Liu ping fan wrote:
> >> > so we need to migrate some of vcpus from node-B to node-A, or to
> >> > node-C.
> > This is absolutely broken, you cannot do that.
> >
> > A guest task might want to be node affine, it looks at the topology sets
> > a cpu affinity mask and expects to stay on that node.
> >
> > But then you come along, and flip one of those cpus to another node. The
> > guest task will now run on another node and get remote memory accesses.
>
> Insane, sure. But, if the node has physically gone away, what do we do?
> I think we've got to either kill the guest, or let it run somewhere
> suboptimal. Sounds like you're advocating killing it. ;)

You all seem terribly confused. If you want a guest that 100% mirrors
the host topology you need hard-binding of all vcpu threads and clearly
you're in trouble if you unplug a host cpu while there's still a vcpu
expecting to run there. That's an administrator error and you get to
keep the pieces, I don't care.

In case you want simple virt-numa where a number of vcpus constitute a
vnode and have their memory all on the same node the vcpus are run on,
what does it matter if you unplug something in the host? Just migrate
everything -- including memory.

But what Liu was proposing is completely insane and broken. You cannot
simply remap cpu:node relations. Wanting to do that shows a profound
lack of understanding.

Our kernel assumes that a cpu remains on the same node. All userspace
that does anything with NUMA assumes the same. You cannot change this.
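
To make the cpu:node argument concrete, here is a toy model (not kernel
code) of a node-affine task. Everything in it is an assumption made for
illustration: the cpu->node maps are plain dicts, and the helper names
cpus_on_node and nodes_covered are hypothetical. It only shows that a
mask computed from the topology at start-up silently spans two nodes once
a cpu is remapped.

```python
# Toy model only: cpu -> node maps are plain dicts, not real kernel state.
# The helper names below are hypothetical and exist only for this sketch.

def cpus_on_node(cpu_to_node, node):
    """Return the set of CPUs that the given topology reports for `node`."""
    return {cpu for cpu, n in cpu_to_node.items() if n == node}


def nodes_covered(cpu_to_node, mask):
    """Return the set of nodes that the CPUs in `mask` sit on."""
    return {cpu_to_node[cpu] for cpu in mask}


# Topology as the guest sees it at start-up:
# CPUs 0-1 on node 0, CPUs 2-3 on node 1.
topology = {0: 0, 1: 0, 2: 1, 3: 1}

# A node-affine task reads the topology once and pins itself to node 0.
mask = cpus_on_node(topology, 0)

# Later, CPU 1 is remapped to node 1 behind the task's back.
remapped = dict(topology)
remapped[1] = 1

# The task never recomputes its mask, so the same CPUs are checked
# against both the old and the new topology.
before = nodes_covered(topology, mask)
after = nodes_covered(remapped, mask)
print("nodes before remap:", before)
print("nodes after remap:", after)
```

The task's mask is unchanged, yet after the remap it covers both nodes,
so part of its runtime lands away from its memory.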