From: Juergen Gross
Subject: Re: Hypervisor crash(!) on xl cpupool-numa-split
Date: Tue, 08 Feb 2011 06:43:43 +0100
Message-ID: <4D50D80F.9000007@ts.fujitsu.com>
To: George Dunlap
Cc: Andre Przywara, xen-devel@lists.xensource.com, "Diestelhorst, Stephan"

On 02/07/11 16:55, George Dunlap wrote:
> Juergen,
>
> What is supposed to happen if a domain is in cpupool0, and then all of
> the cpus are taken out of cpupool0? Is that possible?

No. Cpupool0 can't be without any cpu, as Dom0 is always a member of
cpupool0.

>
> It looks like there's code in cpupools.c:cpupool_unassign_cpu() which
> will move all VMs in a cpupool to cpupool0 before removing the last
> cpu. But what happens if cpupool0 is the pool that has become empty?
> It seems like that breaks a lot of the assumptions; e.g.,
> sched_move_domain() seems to assume that the pool we're moving a VM to
> actually has cpus.

VMs are moved to cpupool0 only if they are dying. If there are any
active domains in the cpupool, removing its last cpu will be denied.

>
> While we're at it, what's with the "(cpu != cpu_moving_cpu)" in the
> first half of cpupool_unassign_cpu()? Under what conditions are you
> anticipating cpupool_unassign_cpu() being called a second time before
> the first completes?
> If you have to abort the move because
> schedule_cpu_switch() failed, wouldn't it be better just to roll the
> whole transaction back, rather than leaving it hanging in the middle?

Not really. It could take some time until all vcpus have been migrated
to another cpu. In this case -EAGAIN is returned, and the cpu has
already been removed from the cpupool's cpumask of valid cpus, so that
no other vcpus get scheduled on it. Without cpu_moving_cpu, forward
progress would not be guaranteed.

>
> Hmm, and why does RMCPU call cpupool_get_by_id() with exact==0? What
> could possibly be the use of grabbing a random cpupool and then trying
> to remove the specified cpu from it?

This is a very good question :-)
I think this should be fixed. It seems to be a copy-and-paste error.
I'll send a patch.


Thanks for your thoughts,

Juergen

-- 
Juergen Gross
Principal Developer Operating Systems
TSP ES&S SWE OS6
Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions
e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28
Internet: ts.fujitsu.com
D-80807 Muenchen
Company details: ts.fujitsu.com/imprint.html
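For readers following the thread, the semantics Juergen describes can be illustrated with a small, self-contained C model. This is a sketch under stated assumptions, not Xen's code: struct pool, pool_unassign_cpu(), pool_find_by_id(), moving_cpu/moving_pool and the pending_vcpus counter are all hypothetical stand-ins. The model assumes one pending vcpu migrates per attempt, and it omits the move of dying domains to cpupool0. It shows three points from the reply. First, removing the last cpu of a pool that still has active domains is refused. Second, once a removal is in flight, the cpu is already hidden from the pool's valid mask and only a retry of that same request (after -EAGAIN) may continue, which is the forward-progress role of cpu_moving_cpu. Third, a non-exact lookup can return a different pool than the one asked for. The model assumes, as a labelled simplification, that such a lookup returns the first pool whose id is greater than or equal to the requested one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified cpupool model; not Xen's real structures. */
struct pool {
    int id;                  /* pools are kept sorted by id */
    unsigned int cpu_valid;  /* bitmask of cpus the scheduler may use */
    int n_active_dom;        /* domains in the pool that are not dying */
    int pending_vcpus;       /* vcpus still to migrate off the cpu being removed */
};

/* Models cpu_moving_cpu: -1 means no removal is in flight. */
static int moving_cpu = -1;
static struct pool *moving_pool = NULL;

static int popcount(unsigned int m)
{
    int n = 0;
    for ( ; m; m &= m - 1 )   /* clear the lowest set bit each round */
        n++;
    return n;
}

/*
 * Hypothetical lookup. exact == true: only a pool with this id.
 * exact == false (assumed semantics): the first pool with id >= @id,
 * which suits iteration but is wrong for "remove cpu from pool @id".
 */
static struct pool *pool_find_by_id(struct pool *pools, int n, int id,
                                    bool exact)
{
    for ( int i = 0; i < n; i++ )
    {
        if ( pools[i].id == id )
            return &pools[i];
        if ( pools[i].id > id )
            return exact ? NULL : &pools[i];
    }
    return NULL;
}

/*
 * Try to remove @cpu from @p. Returns 0 when done, -EBUSY when refused,
 * -EAGAIN while vcpus are still migrating (the caller retries).
 */
static int pool_unassign_cpu(struct pool *p, int cpu)
{
    unsigned int bit;

    assert(cpu >= 0 && cpu < 32);
    bit = 1u << cpu;

    if ( moving_cpu != -1 )
    {
        /* A removal is in flight: only a retry of that request proceeds. */
        if ( cpu != moving_cpu || p != moving_pool )
            return -EBUSY;
    }
    else
    {
        if ( !(p->cpu_valid & bit) )
            return 0;        /* cpu not in this pool: nothing to do */
        /* Last cpu of a pool with active domains: deny the removal. */
        if ( popcount(p->cpu_valid) == 1 && p->n_active_dom > 0 )
            return -EBUSY;
        /* Hide the cpu first so no new vcpus are scheduled on it. */
        p->cpu_valid &= ~bit;
        moving_cpu = cpu;
        moving_pool = p;
    }

    /* Simulated migration: one vcpu leaves per attempt. */
    if ( p->pending_vcpus > 0 && --p->pending_vcpus > 0 )
        return -EAGAIN;

    moving_cpu = -1;
    moving_pool = NULL;
    return 0;
}
```

For example, with a pool holding cpus 2 and 3 and two vcpus still to migrate, the first request to remove cpu 3 returns -EAGAIN with cpu 3 already cleared from cpu_valid, a concurrent request for cpu 2 gets -EBUSY, and repeating the request for cpu 3 returns 0. Keeping the half-finished state instead of rolling it back is what lets the retry resume where it stopped.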