From mboxrd@z Thu Jan 1 00:00:00 1970
From: Guillaume Rousse
Subject: Re: Arbitrary reboot with xen 3.4.x
Date: Fri, 20 Nov 2009 11:42:23 +0100
Message-ID: <4B06728F.1050008@inria.fr>
References: <4B058940.2050009@inria.fr> <20091119180951.GF16033@reaktio.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <20091119180951.GF16033@reaktio.net>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Pasi Kärkkäinen
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

Pasi Kärkkäinen wrote:
> On Thu, Nov 19, 2009 at 07:06:56PM +0100, Guillaume Rousse wrote:
>> Hello.
>>
>> I've a dom0 working perfectly under xen 3.3.x, with about 15 HVM domU.
>> When migrating to xen 3.4.1, with the same dom0 kernel (2.6.27.37),
>> everything seems to be fine, I can launch the various hosts, but 5 to 10
>> minutes later, the host violently reboots... I can't find any trace in
>> the logs. I do have a second host with the same configuration and setup,
>> and the result is similar. It seems to be linked with domU activity,
>> because without any domU, or without any domU with actual activity, I
>> don't have any reboot. I had to rollback to xen 3.3.0.
>>
>
> Did you try the new Xen 3.4.2 ?

I just did this morning. Without any changelog, it's a bit 'upgrade and
pray'...

>> It seems like a hardware issue (but it doesn't appear with 3.3.0), or
>> a crash in the hypervisor, that syslog is unable to catch when it
>> appears. How can I try to get a trace ?
>>
>
> You should setup a serial console, so you can capture and
> log the full console (xen + dom0 kernel) output to another computer..

Indeed.

Here is the output. The first domU crash, caused by a memory ballooning
issue, is not fatal.
The second crash, however, is. I don't know whether that is
because of an incorrect state left by the initial crash, or because of
additional domUs launched in the interim.

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) Domain 1 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-3.4.1 x86_64 debug=n Not tainted ]----
(XEN) CPU: 3
(XEN) RIP: 0010:[]
(XEN) RFLAGS: 0000000000010246 CONTEXT: hvm guest
(XEN) rax: 00000000007028b8 rbx: 0000000000001000 rcx: 0000000000000200
(XEN) rdx: 0000000000000000 rsi: 00000000007028b8 rdi: ffff8800123a0000
(XEN) rbp: ffff88001a119b68 rsp: ffff88001a119b50 r8: ffffea00003fcb00
(XEN) r9: 000000000001050f r10: 0000000000000000 r11: 0000000000000001
(XEN) r12: 0000000000001000 r13: 0000000000000000 r14: ffff88001796aea8
(XEN) r15: 0000000000001000 cr0: 000000008005003b cr4: 00000000000006f0
(XEN) cr3: 000000001a079000 cr2: 00007fc176c772e8
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0018 cs: 0010
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) Domain 2 reported crashed by domain 0 on cpu#0:
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) ----[ Xen-3.4.1 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[] hash_foreach+0x59/0xe0
(XEN) RFLAGS: 0000000000010296 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff8284000c1780 rcx: 00000000000060bc
(XEN) rdx: ffff83041f98c000 rsi: 0000000000000336 rdi: ffff8300be7c0000
(XEN) rbp: 0000000000000336 rsp: ffff828c80257848 r8: 0000000000200c00
(XEN) r9: 0000000000000001 r10: ffff83041f98c000 r11: ffff828c801b10e0
(XEN) r12: 0000000000000001 r13: 0000000000000000 r14: 00000000000060bc
(XEN) r15: ffff828c80205f80 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 0000000021759000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff828c80257848:
(XEN)    0000000000000000 ffff8300be7c0000 ffff83041f98c000 ffff8284000c1780
(XEN)    ffff8300be7c0000 00000000000060bc 0000000000000000 00000000000144bc
(XEN)    ffff8300be7c0000 ffff828c801aae4d ffff828c80257960 00000000000060bc
(XEN)    ffff828c80257960 ffff83041f98c000 ffff83041f98c000 ffff828c801b13bf
(XEN)    00000000000144bc 0000000000200c00 ffff83041f4ed5e0 ffff83041f98d130
(XEN)    ffff828c80284d24 ffff83041f4ed5e0 ffff828c80257960 ffff828c80257968
(XEN)    ffff83041f98c000 00000000000144bc 0000000000000000 ffff828c801a96d4
(XEN)    0000000000000200 2000000000000000 ffff828c80257a80 000000061f98c000
(XEN)    0000000000000200 007fffffffffffff 0000000000000000 ffff83041f4ed000
(XEN)    000000000041f4ed 0000000000000001 0000000000000001 0000000000000200
(XEN)    00000000000144bc ffff83041f98c000 0000000000000006 ffff828c801a5991
(XEN)    ffff828c80257abc 0000000000000001 ffff828c80257ba8 007fffffffffffff
(XEN)    ffff828c802579f0 ffff83041f98c000 ffff828c80257a80 ffff828c801a6efb
(XEN)    0000000400000000 0000000000000000 ffff8300060bc000 ffff8300060bb000
(XEN)    ffff8300060ba000 ffff8300060b9000 ffff8300060b8000 ffff8300060b7000
(XEN)    ffff8300060b6000 ffff8300060b5000 ffff8300060b4000 ffff8300060b3000
(XEN)    ffff8300060b2000 ffff8300060b1000 ffff8300060b0000 ffff8300060af000
(XEN)    ffff8300060ae000 ffff828c801f16dc 0000000000000082 0000000100000001
(XEN)    0000000100000001 0000000100000001 0000000100000001 0000000100000001
(XEN)    0000000100000001 0000000100000001 0000000100000001 0000000000000286
(XEN) Xen call trace:
(XEN)    [] hash_foreach+0x59/0xe0
(XEN)    [] sh_remove_all_mappings+0x8d/0x200
(XEN)    [] shadow_write_p2m_entry+0x2df/0x330
(XEN)    [] p2m_set_entry+0x344/0x430
(XEN)    [] set_p2m_entry+0x71/0xa0
(XEN)    [] p2m_pod_zero_check+0x1db/0x310
(XEN)    [] p2m_pod_demand_populate+0x830/0xa40
(XEN)    [] p2m_gfn_to_mfn+0x224/0x260
(XEN)    [] mod_l1_entry+0x6e5/0x7b0
(XEN)    [] do_mmu_update+0x937/0x16e0
(XEN)    [] get_page_type+0xb/0x20
(XEN)    [] do_multicall+0x164/0x370
(XEN)    [] syscall_enter+0xa9/0xae
(XEN)
(XEN) Pagetable walk from 0000000000000000:
(XEN)  L4[0x000] = 000000001cb48067 00000000003d6ca9
(XEN)  L3[0x000] = 000000000c58b067 00000000003e72ec
(XEN)  L2[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000000
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

My domUs all have this configuration:

memory = 256
maxmem = 512

The values differ from one domU to another, but always with the same
ratio between memory and maxmem.
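
Since every crash above logs "Out of populate-on-demand memory" and the
configs set maxmem above memory, a quick way to list the guests exposed
to this could look like the sketch below. It is only an illustration,
not something shipped with Xen. It assumes the guest configs are
xm-style files (which use Python syntax) and that HVM guests are marked
with builder = 'hvm'; the function name pod_gap_mib is made up.

```python
# Hypothetical helper, not part of Xen: report how far maxmem exceeds
# memory in an HVM guest config.

def pod_gap_mib(config_text):
    """Return maxmem - memory (MiB) for an HVM guest config, else 0."""
    settings = {}
    # xm guest configs are Python code, so exec() can evaluate them.
    # Only run this on config files you trust.
    exec(config_text, {}, settings)
    # Assumption: HVM guests declare builder = 'hvm'.
    if settings.get("builder") != "hvm":
        return 0
    memory = settings.get("memory")
    maxmem = settings.get("maxmem")
    if memory is None or maxmem is None:
        return 0
    return max(int(maxmem) - int(memory), 0)


example = """
builder = 'hvm'
memory = 256
maxmem = 512
"""
print(pod_gap_mib(example))  # prints 256
```

A result of 0 means memory and maxmem are equal (or the guest is not
HVM); anything above 0 marks a guest configured like the ones that
crash here.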
Setting maxmem above memory seems quite useless for HVM domUs, since
memory ballooning is not supported AFAIK unless PV drivers are used
(and I can't manage to build them). With identical values for memory
and maxmem, the issue doesn't appear.

With Xen 3.4.2, the domUs still crash, but at least dom0 does not
reboot. So it's just less bad :)
-- 
BOFH excuse #426:

internet is needed to catch the etherbunny