From mboxrd@z Thu Jan  1 00:00:00 1970
From: Juergen Gross <juergen.gross@ts.fujitsu.com>
Subject: Re: Hypervisor crash(!) on xl cpupool-numa-split
Date: Tue, 08 Feb 2011 06:43:43 +0100
Message-ID: <4D50D80F.9000007@ts.fujitsu.com>
References: <4D41FD3A.5090506@amd.com>	<201102021539.06664.stephan.diestelhorst@amd.com>	<4D4974D1.1080503@ts.fujitsu.com>	<201102021701.05665.stephan.diestelhorst@amd.com>	<4D4A43B7.5040707@ts.fujitsu.com>
	<4D4A72D8.3020502@ts.fujitsu.com>	<4D4C08B6.30600@amd.com>
	<4D4FE7E2.9070605@amd.com>	<4D4FF452.6060508@ts.fujitsu.com>
	<AANLkTinoRUQC_suVYFM9-x3D00KvYofq3R=XkCQUj6RP@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <AANLkTinoRUQC_suVYFM9-x3D00KvYofq3R=XkCQUj6RP@mail.gmail.com>
List-Unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Andre Przywara <andre.przywara@amd.com>, "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>, "Diestelhorst, Stephan" <Stephan.Diestelhorst@amd.com>
List-Id: xen-devel@lists.xenproject.org

On 02/07/11 16:55, George Dunlap wrote:
> Juergen,
>
> What is supposed to happen if a domain is in cpupool0, and then all of
> the cpus are taken out of cpupool0?  Is that possible?

No. Cpupool0 can never be without a cpu, as Dom0 is always a member of cpupool0.

>
> It looks like there's code in cpupools.c:cpupool_unassign_cpu() which
> will move all VMs in a cpupool to cpupool0 before removing the last
> cpu.  But what happens if cpupool0 is the pool that has become empty?
> It seems like that breaks a lot of the assumptions; e.g.,
> sched_move_domain() seems to assume that the pool we're moving a VM to
> actually has cpus.

The move of VMs to cpupool0 is done only for domains which are dying.
If there are any active domains in the cpupool, removing the last cpu from
it will be denied.
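
As a rough illustration of that rule, here is a toy model. It is only a
sketch: struct toy_pool, toy_unassign_last_rule() and the -EBUSY/-EINVAL
return values are made up for this example and are not the real cpupool
data structures or code.

```c
#include <errno.h>

/* Hypothetical, simplified pool; NOT the real struct cpupool. */
struct toy_pool {
    int n_cpus;         /* cpus currently assigned to the pool */
    int n_active_doms;  /* domains that are still alive */
    int n_dying_doms;   /* domains that are being torn down */
};

/*
 * Try to take one cpu away from 'pool'. 'pool0' stands in for cpupool0.
 * Returns 0 on success or a negative errno value (assumed codes).
 */
int toy_unassign_last_rule(struct toy_pool *pool, struct toy_pool *pool0)
{
    if (pool->n_cpus == 0)
        return -EINVAL;

    if (pool->n_cpus == 1) {
        /* cpupool0 always contains Dom0, so it may never become empty. */
        if (pool == pool0)
            return -EBUSY;

        /* Last cpu: denied while any active domain is still in the pool. */
        if (pool->n_active_doms > 0)
            return -EBUSY;

        /* Only dying domains are left; park them in cpupool0. */
        pool0->n_dying_doms += pool->n_dying_doms;
        pool->n_dying_doms = 0;
    }

    pool->n_cpus--;
    return 0;
}
```

With one cpu and one active domain the call is refused; once only dying
domains remain, they end up in pool0 and the cpu can go.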

>
> While we're at it, what's with the "(cpu != cpu_moving_cpu)" in the
> first half of cpupool_unassign_cpu()?  Under what conditions are you
> anticipating cpupool_unassign_cpu() being called a second time before
> the first completes?  If you have to abort the move because
> schedule_cpu_switch() failed, wouldn't it be better just to roll the
> whole transaction back, rather than leaving it hanging in the middle?

Not really. Migrating all vcpus to another cpu can take some time. In that
case -EAGAIN is returned, and the cpu has already been removed from the
cpupool's cpumask of valid cpus so that no other vcpus get scheduled on it.
Without cpu_moving_cpu, forward progress would not be guaranteed.
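
A minimal sketch of the retry pattern, under stated assumptions: the
struct, the function name, the "one vcpu migrated per call" behaviour and
the -EBUSY/-ENOENT/-EINVAL codes are invented for illustration. Only the
-EAGAIN return and the idea of remembering the moving cpu come from the
description above.

```c
#include <errno.h>

#define TOY_NO_CPU (-1)

/* Hypothetical, simplified pool state; NOT the real struct cpupool. */
struct toy_pool {
    unsigned int cpu_valid;  /* bitmask of cpus the scheduler may use */
    int moving_cpu;          /* cpu whose removal is in flight, or TOY_NO_CPU */
    int vcpus_on_moving;     /* vcpus still to be migrated off that cpu */
};

/* Remove 'cpu' from the pool; callers repeat the call while it is -EAGAIN. */
int toy_remove_cpu(struct toy_pool *p, int cpu)
{
    if (cpu < 0 || cpu >= 32)
        return -EINVAL;

    /* Only one removal may be in flight at a time (assumed rule). */
    if (p->moving_cpu != TOY_NO_CPU && p->moving_cpu != cpu)
        return -EBUSY;

    if (p->moving_cpu == TOY_NO_CPU) {
        if (!(p->cpu_valid & (1u << cpu)))
            return -ENOENT;
        /* First call: hide the cpu from the scheduler right away. */
        p->cpu_valid &= ~(1u << cpu);
        p->moving_cpu = cpu;
    }
    /* else: a retry for the moving cpu, which is no longer in cpu_valid. */

    /* Stand-in for "migration takes time": move one vcpu per call. */
    if (p->vcpus_on_moving > 0)
        p->vcpus_on_moving--;
    if (p->vcpus_on_moving > 0)
        return -EAGAIN;

    p->moving_cpu = TOY_NO_CPU;
    return 0;
}
```

In this sketch, a retry that did not recognise the moving cpu would be
refused by the cpu_valid test, because the first call already cleared the
bit. The half-finished removal could then never complete.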

>
> Hmm, and why does RMCPU call cpupool_get_by_id() with exact==0?  What
> could possibly be the use of grabbing a random cpupool and then trying
> to remove the specified cpu from it?

This is a very good question :-)
I think this should be fixed. It seems to be a copy-and-paste error. I'll
send a patch.
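
To show why exact matters here, a small sketch. It assumes that a
non-exact lookup returns the first pool whose id is greater than or equal
to the requested one, which is handy for iterating over pools but wrong
for removing a cpu. The types and the function name are hypothetical and
not the real cpupool_get_by_id().

```c
#include <stddef.h>

/* Hypothetical pool entry; the array is assumed sorted by ascending id. */
struct toy_pool {
    int id;
};

/*
 * Find a pool by id.
 *   exact != 0: only a pool with exactly this id is returned.
 *   exact == 0: the first pool with an id >= the requested one is returned.
 * Returns NULL when nothing matches.
 */
const struct toy_pool *toy_get_by_id(const struct toy_pool *pools, size_t n,
                                     int id, int exact)
{
    for (size_t i = 0; i < n; i++) {
        if (pools[i].id >= id) {
            if (exact && pools[i].id != id)
                return NULL;
            return &pools[i];
        }
    }
    return NULL;
}
```

With pools 0, 2 and 5, a non-exact lookup of id 1 hands back pool 2, so a
remove request naming a nonexistent pool would act on a different pool
instead of failing.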


Thanks for your thoughts,


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html