From: Juergen Gross
Subject: Re: Hypervisor crash(!) on xl cpupool-numa-split
Date: Fri, 28 Jan 2011 07:47:15 +0100
Message-ID: <4D426673.7020200@ts.fujitsu.com>
References: <4D41FD3A.5090506@amd.com>
In-Reply-To: <4D41FD3A.5090506@amd.com>
To: Andre Przywara
Cc: "xen-devel@lists.xensource.com", Ian Jackson, Keir Fraser
List-Id: xen-devel@lists.xenproject.org

On 01/28/11 00:18, Andre Przywara wrote:
> Hi,
>
> when I boot my machine without restricting Dom0 (dom0_mem=
> dom0_max_vcpus=) I get a _hypervisor_ crash when I run
> # xl cpupool-numa-split
> If Dom0's resources are limited on the Xen cmdline, everything works fine.
> The crashdump points to a scheduling problem with weights, so I assume
> the NUMA distribution algorithm somehow fools the hypervisor completely.
>
> I will investigate this further tomorrow, but maybe someone has some
> good idea.

I've seen this once with an older cpupool version on a 24-processor
machine. It was NOT related to NUMA, but occurred only on reboot after
a Dom0 panic. The machine had an init script that created a cpupool and
populated it with cpus. The machine then stayed in a panic loop due to
the BUG in csched_acct until it was reset manually. After the reset the
problem was gone.

As I was never able to reproduce the problem later (the same software is
running on dozens of machines!), I assumed it was related to the first
Dom0 panic, maybe some corrupted BIOS tables.

Can the crash be reproduced easily?


Juergen

>
> Regards,
> Andre.
>
> root@dosorca:/data/images# xl cpupool-numa-split
> (XEN) Xen BUG at sched_credit.c:990
> (XEN) ----[ Xen-4.1.0-rc2-pre x86_64 debug=y Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[] csched_acct+0x11f/0x419
> (XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor
> (XEN) rax: 0000000000000010 rbx: 0000000000000f00 rcx: 0000000000000100
> (XEN) rdx: 0000000000001000 rsi: ffff830437ffa600 rdi: 0000000000000010
> (XEN) rbp: ffff82c480297e10 rsp: ffff82c480297d80 r8: 0000000000000100
> (XEN) r9: 0000000000000006 r10: ffff82c4802d4100 r11: 000000afc7df0edf
> (XEN) r12: ffff830437ffa5e0 r13: ffff82c480117fd9 r14: ffff830437f9f2e8
> (XEN) r15: ffff830434321ec0 cr0: 000000008005003b cr4: 00000000000006f0
> (XEN) cr3: 000000080df4e000 cr2: ffff88179af79618
> (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Xen stack trace from rsp=ffff82c480297d80:
> (XEN) 0000000000000282 fffffed4802d3f80 0000000000000eff ffff830437ffa5e0
> (XEN) ffff830437ffa5e8 ffff830437ffa870 ffff830437ffa5e0 0000000000000282
> (XEN) ffff830437ffa5e8 00002a3037ffa870 00000f0000000f00 0000000000000000
> (XEN) ffff82c400000000 ffff82c4802d3f80 ffff830437ffa5e0 ffff82c480117fd9
> (XEN) ffff830437f9f2e8 ffff830437f9f2e0 ffff82c480297e40 ffff82c480125f34
> (XEN) 0000000000000002 ffff830437ffa600 ffff82c4802d3f80 000000afb6f8667f
> (XEN) ffff82c480297e90 ffff82c480126259 ffff82c48024ae20 ffff82c4802d3f80
> (XEN) ffff830437f9f2e0 0000000000000000 0000000000000000 ffff82c4802b0880
> (XEN) ffff82c480297f18 ffffffffffffffff ffff82c480297ed0 ffff82c480123327
> (XEN) ffff82c4802d4a00 ffff82c480297f18 ffff82c48024ae20 ffff82c480297f18
> (XEN) 000000afb6abd652 ffff82c4802d3ec0 ffff82c480297ee0 ffff82c4801233a2
> (XEN) ffff82c480297f10 ffff82c4801563f5 0000000000000000 ffff8300c7cd6000
> (XEN) 0000000000000000 ffff8300c7ad4000 ffff82c480297d48 0000000000000000
> (XEN) 0000000000000000 0000000000000000 ffffffff81a69060 ffff8817a8503f10
> (XEN) ffff8817a8503fd8 0000000000000246 ffff8817a8503e80 ffff880000000001
> (XEN) 0000000000000000 0000000000000000 ffffffff810093aa 000000aafab2f86e
> (XEN) 00000000deadbeef 00000000deadbeef 0000010000000000 ffffffff810093aa
> (XEN) 000000000000e033 0000000000000246 ffff8817a8503ef8 000000000000e02b
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 ffff8300c7cd6000 0000000000000000 0000000000000000
> (XEN) Xen call trace:
> (XEN) [] csched_acct+0x11f/0x419
> (XEN) [] execute_timer+0x4e/0x6c
> (XEN) [] timer_softirq_action+0xf2/0x245
> (XEN) [] __do_softirq+0x88/0x99
> (XEN) [] do_softirq+0x6a/0x7a
> (XEN) [] idle_loop+0x6a/0x6f
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Xen BUG at sched_credit.c:990
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
>
>

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6              Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions  e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                 Internet: ts.fujitsu.com
D-80807 Muenchen              Company details: ts.fujitsu.com/imprint.html