From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juergen Gross
Subject: Re: Hypervisor crash(!) on xl cpupool-numa-split
Date: Fri, 28 Jan 2011 12:44:00 +0100
Message-ID: <4D42AC00.8050109@ts.fujitsu.com>
References: <4D41FD3A.5090506@amd.com> <4D426673.7020200@ts.fujitsu.com> <4D42A35D.3050507@amd.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <4D42A35D.3050507@amd.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Andre Przywara
Cc: "xen-devel@lists.xensource.com", Ian Jackson, Keir Fraser
List-Id: xen-devel@lists.xenproject.org

On 01/28/11 12:07, Andre Przywara wrote:
> Juergen Gross wrote:
>> On 01/28/11 00:18, Andre Przywara wrote:
>>> Hi,
>>>
>>> when I boot my machine without restricting Dom0 (dom0_mem=
>>> dom0_max_vcpus=) I get an _hypervisor_ crash when I run
>>> # xl cpupool-numa-split
>>> If Dom0's resources are limited on the Xen cmdline, everything works
>>> fine.
>>> The crashdump points to a scheduling problem with weights, so I assume
>>> the NUMA distribution algorithm some fools the hypervisor completely.
>>>
>>> I will investigate this further tomorrow, but maybe someone has some
>>> good idea.
>>
>> I've seen this once with an older cpupool version on a 24 processor
>> machine.
>> It was NOT related to NUMA, but did occur only on reboot after a Dom0
>> panic.
>> The machine had an init script creating a cpupool and populating it with
>> cpus. The machine was in a panic loop due to the BUG in sched_acct
>> then until
>> it was resetted manually. After the reset the problem was gone.
>>
>> As I was never able to reproduce the problem later (the same software is
>> running on dozens of machines!), I assumed there was a problem related to
>> the first Dom0 panic, may be some destroyed BIOS tables.
>>
>> Can the crash be reproduced easily?
> Yes.
> If I don't specify dom0_max_vcpus= and dom0_mem= on the Xen cmdline, I
> can reliably trigger the crash with xl cpupool-numa-split.
> Omitting dom0_max_vcpus only does not suffice.

Do I understand correctly?
No crash with only dom0_max_vcpus= and no crash with only dom0_mem= ?

Could you try this patch?

diff -r b59f04eb8978 xen/common/schedule.c
--- a/xen/common/schedule.c	Fri Jan 21 18:06:23 2011 +0000
+++ b/xen/common/schedule.c	Fri Jan 28 12:42:46 2011 +0100
@@ -1301,7 +1301,9 @@ void schedule_cpu_switch(unsigned int cp
 
     idle = idle_vcpu[cpu];
     ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
+    BUG_ON(ppriv == NULL);
     vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
+    BUG_ON(vpriv == NULL);
 
     pcpu_schedule_lock_irqsave(cpu, flags);
 

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html