From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juergen Gross
Subject: Re: Hypervisor crash(!) on xl cpupool-numa-split
Date: Mon, 31 Jan 2011 08:04:45 +0100
Message-ID: <4D465F0D.4010408@ts.fujitsu.com>
References: <4D41FD3A.5090506@amd.com> <4D426673.7020200@ts.fujitsu.com> <4D42A35D.3050507@amd.com> <4D42AC00.8050109@ts.fujitsu.com> <4D42C153.5050104@amd.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <4D42C153.5050104@amd.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Andre Przywara
Cc: Ian Jackson, "xen-devel@lists.xensource.com", Keir Fraser
List-Id: xen-devel@lists.xenproject.org

On 01/28/11 14:14, Andre Przywara wrote:
>>
>> Do I understand correctly?
>> No crash with only dom0_max_vcpus= and no crash with only dom0_mem= ?
> Yes, see my previous mail to George.
>
>>
>> Could you try this patch?
> Ok, the crash dump is as follows:

Hmm, is the new crash reproducible as well? It does not seem to be directly
related to my diagnosis patch...

Currently I have no NUMA machine available. I tried to use numa=fake=...
boot parameter, but this seems to fake only NUMA memory nodes, all cpus are
still in node 0:

(XEN) 'u' pressed -> dumping numa info (now-0x120:5D5E0203)
(XEN) idx0 -> NODE0 start->0 size->524288
(XEN) phys_to_nid(0000000000001000) -> 0 should be 0
(XEN) idx1 -> NODE1 start->524288 size->524288
(XEN) phys_to_nid(0000000080001000) -> 1 should be 1
(XEN) idx2 -> NODE2 start->1048576 size->524288
(XEN) phys_to_nid(0000000100001000) -> 2 should be 2
(XEN) idx3 -> NODE3 start->1572864 size->1835008
(XEN) phys_to_nid(0000000180001000) -> 3 should be 3
(XEN) CPU0 -> NODE0
(XEN) CPU1 -> NODE0
(XEN) CPU2 -> NODE0
(XEN) CPU3 -> NODE0
(XEN) Memory location of each domain:
(XEN) Domain 0 (total: 3003121):
(XEN)     Node 0: 433864
(XEN)     Node 1: 258522
(XEN)     Node 2: 514315
(XEN)     Node 3: 1796420

I suspect a problem with the __cpuinit stuff overwriting some node info.
Andre, could you check this? I hope to reproduce your problem on my machine.

> (XEN) Xen BUG at sched_credit.c:384
> (XEN) ----[ Xen-4.1.0-rc2-pre x86_64 debug=y Not tainted ]----
> (XEN) CPU: 2
> (XEN) RIP: e008:[] csched_alloc_pdata+0x146/0x17f
> (XEN) RFLAGS: 0000000000010093 CONTEXT: hypervisor
> (XEN) rax: ffff830434322000 rbx: ffff830434418748 rcx: 0000000000000024
> (XEN) rdx: ffff82c4802d3ec0 rsi: 0000000000000003 rdi: ffff8304343c9100
> (XEN) rbp: ffff83043457fce8 rsp: ffff83043457fca8 r8: 0000000000000001
> (XEN) r9: ffff830434418748 r10: ffff82c48021a0a0 r11: 0000000000000286
> (XEN) r12: 0000000000000024 r13: ffff83123a3b2b60 r14: ffff830434418730
> (XEN) r15: 0000000000000024 cr0: 000000008005003b cr4: 00000000000006f0
> (XEN) cr3: 00000008061df000 cr2: ffff8817a21f87a0
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Xen stack trace from rsp=ffff83043457fca8:
> (XEN)    ffff83043457fcb8 ffff83123a3b2b60 0000000000000286 0000000000000024
> (XEN)    ffff830434418820 ffff83123a3b2a70 0000000000000024 ffff82c4802b0880
> (XEN)    ffff83043457fd58 ffff82c48011fa63 ffff82f60102aa80 0000000000081554
> (XEN)    ffff8300c7cfa000 0000000000000000 0000400000000000 ffff82c480248e00
> (XEN)    0000000000000002 0000000000000024 ffff830434418820 0000000000305000
> (XEN)    ffff82c4802550e4 ffff82c4802b0880 ffff83043457fd78 ffff82c48010188c
> (XEN)    ffff83043457fe40 0000000000000024 ffff83043457fdb8 ffff82c480101b94
> (XEN)    ffff83043457fdb8 ffff82c4801836f2 fffffffe00000286 ffff83043457ff18
> (XEN)    0000000002170004 0000000000305000 ffff83043457fef8 ffff82c480125281
> (XEN)    ffff83043457fdd8 0000000180153c9d 0000000000000000 ffff82c4801068f8
> (XEN)    0000000000000296 ffff8300c7e0a1c8 aaaaaaaaaaaaaaaa 0000000000000000
> (XEN)    ffff88007d1ac170 ffff88007d1ac170 ffff83043457fef8 ffff82c480113d8a
> (XEN)    ffff83043457fe78 ffff83043457fe88 0000000800000012 0000000600000004
> (XEN)    0000000000000000 ffffffff00000024 0000000000000000 00007fac2e0e5a00
> (XEN)    0000000002170000 0000000000000000 0000000000000000 ffffffffffffffff
> (XEN)    0000000000000000 0000000000000080 000000000000002f 0000000002170004
> (XEN)    0000000002172004 0000000002174004 00007fff878f1c80 0000000000000033
> (XEN)    ffff83043457fed8 ffff8300c7e0a000 00007fff878f1b30 0000000000305000
> (XEN)    0000000000000003 0000000000000003 00007cfbcba800c7 ffff82c480207dd8
> (XEN)    ffffffff8100946a 0000000000000023 0000000000000003 0000000000000003
> (XEN) Xen call trace:
> (XEN)    [] csched_alloc_pdata+0x146/0x17f
> (XEN)    [] schedule_cpu_switch+0x75/0x1eb
> (XEN)    [] cpupool_assign_cpu_locked+0x44/0x8b
> (XEN)    [] cpupool_do_sysctl+0x1fb/0x461
> (XEN)    [] do_sysctl+0x921/0xa30
> (XEN)    [] syscall_enter+0xc8/0x122
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 2:
> (XEN) Xen BUG at sched_credit.c:384
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
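
To make the observation about the faked NUMA setup easier to re-check on other boxes, here is a minimal Python sketch that parses the 'u' debug-key output quoted in this mail and lists nodes that own memory but have no CPU assigned. It is not part of Xen or its tools; the function names (`parse_numa_dump`, `nodes_without_cpus`) and the regular expressions are assumptions written against the line shapes shown in the dump above, so other Xen versions may print something different.

```python
import re

# Excerpt of the 'u' debug-key output quoted in this mail.
DUMP = """\
(XEN) idx0 -> NODE0 start->0 size->524288
(XEN) idx1 -> NODE1 start->524288 size->524288
(XEN) idx2 -> NODE2 start->1048576 size->524288
(XEN) idx3 -> NODE3 start->1572864 size->1835008
(XEN) CPU0 -> NODE0
(XEN) CPU1 -> NODE0
(XEN) CPU2 -> NODE0
(XEN) CPU3 -> NODE0
"""

# Hypothetical patterns, derived only from the lines above.
CPU_RE = re.compile(r"CPU(\d+) -> NODE(\d+)")
MEM_RE = re.compile(r"idx\d+ -> NODE(\d+) start->(\d+) size->(\d+)")


def parse_numa_dump(text):
    """Return (cpu -> node, node -> (start_page, size_pages)) from a dump."""
    cpu_to_node = {}
    mem_nodes = {}
    for line in text.splitlines():
        m = CPU_RE.search(line)
        if m:
            cpu_to_node[int(m.group(1))] = int(m.group(2))
            continue
        m = MEM_RE.search(line)
        if m:
            mem_nodes[int(m.group(1))] = (int(m.group(2)), int(m.group(3)))
    return cpu_to_node, mem_nodes


def nodes_without_cpus(cpu_to_node, mem_nodes):
    """Nodes that have a memory range but no CPU mapped to them."""
    return sorted(set(mem_nodes) - set(cpu_to_node.values()))


cpu_to_node, mem_nodes = parse_numa_dump(DUMP)
print(nodes_without_cpus(cpu_to_node, mem_nodes))  # prints [1, 2, 3]
```

For the dump in this mail the result is nodes 1, 2 and 3, which matches the statement that only memory nodes are faked while all CPUs stay in node 0.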