* Re: [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
From: Ian Campbell @ 2015-09-22  8:53 UTC
To: grub-devel, Vladimir 'φ-coder/phcoder' Serbinenko
Cc: Andreas Sundstrom, xen-devel

Hi Vladimir & grub-devel,

Do you have any thoughts on this issue with i386 pv-grub2?

Thanks, Ian.

On Mon, 2015-09-21 at 22:03 +0200, Andreas Sundstrom wrote:
> This is using Debian Jessie and grub 2.02~beta2-22 (with Debian patches
> applied) and Xen 4.4.1
>
> I originally posted a bug report with Debian but got the suggestion to
> file bugs with upstream as well.
> Debian bug report:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799480
>
> Note that my original thought was that this bug probably is within GRUB.
> But Ian asked me to file a bug with Xen as well, you have to live with
> the fact that it is centered around GRUB though.
>
> Here's the information from my original bug report:
>
> Using 64-bit dom0 and 32-bit domU, PV (para-virtualized) grub sometimes
> fails when chainloading the domU's grub. 64-bit domU seems to work 100%
> of the time.
>
> My understanding of the process:
>
> * dom0 launches domU with grub that is loaded from dom0's disk.
> * Grub reads config file from memdisk, and then looks for grub binary in
>   domU filesystem.
> * If grub is found in domU it then chainloads (multiboot) that grub
>   binary and the domU grub reads grub.cfg and continues booting.
> * If grub is not found in domU it reads grub.cfg and continues with
>   boot.
>
> It fails at step 3 in my list of the boot process, but sometimes it
> does work so it may be something like a race condition that causes the
> problem?
>
> A workaround is to not install or rename /boot/xen in domU so that the
> first grub that is loaded from dom0's disk will not find the grub
> binary in the domU filesystem and hence continues to read grub.cfg and
> boot. The drawback of this is of course that the two versions can't
> differ too much as there are different setups creating grub.cfg and
> then reading/parsing it at boot time.
>
> I am not sure at this point whether this is a problem in XEN or a
> problem in grub but I compiled the legacy pvgrub that uses some minios
> from XEN (don't really know much more about it) and when that legacy
> pvgrub chainloads the domU grub it seems to work 100% of the time. Now
> the legacy pvgrub is not a real alternative as it's not packaged for
> Debian though.
>
> When it fails "xl create vm -c" outputs this:
> Parsing config from /etc/xen/vm
> libxl: error: libxl_dom.c:35:libxl__domain_type: unable to get domain type for domid=16
> Unable to attach console
> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console child [0] exited with error status 1
>
> And "xl dmesg" shows errors like this:
> (XEN) traps.c:2514:d15 Domain attempted WRMSR 00000000c0010201 from 0x0000000000000000 to 0x000000000000ffff.
> (XEN) d16:v0: unhandled page fault (ec=0010)
> (XEN) Pagetable walk from 0000000000000000:
> (XEN) L4[0x000] = 0000000200256027 000000000000049c
> (XEN) L3[0x000] = 0000000200255027 000000000000049d
> (XEN) L2[0x000] = 0000000200251023 00000000000004a1
> (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0 compat_create_bounce_frame+0xc6/0xde
> (XEN) Domain 16 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-4.4.1 x86_64 debug=n Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e019:[<0000000000000000>]
> (XEN) RFLAGS: 0000000000000246 EM: 1 CONTEXT: pv guest
> (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
> (XEN) rdx: 0000000000000000 rsi: 0000000000499000 rdi: 0000000000800000
> (XEN) rbp: 000000000000000a rsp: 00000000005a5ff0 r8: 0000000000000000
> (XEN) r9: 0000000000000000 r10: ffff83023e9b9000 r11: ffff83023e9b9000
> (XEN) r12: 0000033f3d335bfb r13: ffff82d080300800 r14: ffff82d0802ea940
> (XEN) r15: ffff83005e819000 cr0: 000000008005003b cr4: 00000000000506f0
> (XEN) cr3: 0000000200b7a000 cr2: 0000000000000000
> (XEN) ds: e021 es: e021 fs: e021 gs: e021 ss: e021 cs: e019
> (XEN) Guest stack trace from esp=005a5ff0:
> (XEN) 00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389 0016b388
> (XEN) 0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381 0016b380
> (XEN) 0016b37f 0016b37e 0016b37d 0016b37c 0016b37b 0016b37a 0016b379 0016b378
> (XEN) 0016b377 0016b376 0016b375 0016b374 0016b373 0016b372 0016b371 0016b370
> (XEN) 0016b36f 0016b36e 0016b36d 0016b36c 0016b36b 0016b36a 0016b369 0016b368
> (XEN) 0016b367 0016b366 0016b365 0016b364 0016b363 0016b362 0016b361 0016b360
> (XEN) 0016b35f 0016b35e 0016b35d 0016b35c 0016b35b 0016b35a 0016b359 0016b358
> (XEN) 0016b357 0016b356 0016b355 0016b354 0016b353 0016b352 0016b351 0016b350
> (XEN) 0016b34f 0016b34e 0016b34d 0016b34c 0016b34b 0016b34a 0016b349 0016b348
> (XEN) 0016b347 0016b346 0016b345 0016b344 0016b343 0016b342 0016b341 0016b340
> (XEN) 0016b33f 0016b33e 0016b33d 0016b33c 0016b33b 0016b33a 0016b339 0016b338
> (XEN) 0016b337 0016b336 0016b335 0016b334 0016b333 0016b332 0016b331 0016b330
> (XEN) 0016b32f 0016b32e 0016b32d 0016b32c 0016b32b 0016b32a 0016b329 0016b328
> (XEN) 0016b327 0016b326 0016b325 0016b324 0016b323 0016b322 0016b321 0016b320
> (XEN) 0016b31f 0016b31e 0016b31d 0016b31c 0016b31b 0016b31a 0016b319 0016b318
> (XEN) 0016b317 0016b316 0016b315 0016b314 0016b313 0016b312 0016b311 0016b310
> (XEN) 0016b30f 0016b30e 0016b30d 0016b30c 0016b30b 0016b30a 0016b309 0016b308
> (XEN) 0016b307 0016b306 0016b305 0016b304 0016b303 0016b302 0016b301 0016b300
> (XEN) 0016b2ff 0016b2fe 0016b2fd 0016b2fc 0016b2fb 0016b2fa 0016b2f9 0016b2f8
> (XEN) 0016b2f7 0016b2f6 0016b2f5 0016b2f4 0016b2f3 0016b2f2 0016b2f1 0016b2f0
>
> An easy way to find out which grub you are in if the machine boots is
> to hit 'c' and type 'ls', only the grub from dom0 will know about
> (memdisk). So when trying to replicate the issue (and the domU
> actually starts) you can hit 'c', type 'ls' (check for memdisk) and
> then type 'halt' and relaunch the domU. Usually I can't launch more
> than 4-5 times in a row before it fails, often it fails on my first
> try.
>
> For information I have reproduced on two different AMD desktop
> processor machines, not sure if Intel would be any different. I'm
> pretty sure I did tests with grub from unstable with same result at
> some point, but can test again if that is likely to work.
>
> The package that is installed on the domU side is "grub-xen".
>
> I am unable to understand how to debug grub further on my own, I have
> printed out text from grub so that I understood that it is the
> chainload that fails. I see no output from the domU grub (except when
> it works as it should of course). I can help with further testing if
> needed.
>
> /Andreas

* Re: [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2016-01-22 12:56 UTC
To: Ian Campbell, grub-devel
Cc: Andreas Sundstrom, xen-devel

On 22.09.2015 10:53, Ian Campbell wrote:
> Hi Vladimir & grub-devel,
>
> Do you have any thoughts on this issue with i386 pv-grub2?
>
Is it still an issue? If so I'll try to replicate it. From the stack dump I
see that it has jumped to NULL. GRUB has no threads, so it's not a race
condition with itself, but it may be one with some Xen part. An alternative
possibility is that GRUB forgets to flush the cache at some point in the
boot process.

> Thanks, Ian.

* Re: [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
From: Andrew Cooper @ 2016-01-22 13:01 UTC
To: Vladimir 'φ-coder/phcoder' Serbinenko, Ian Campbell, grub-devel
Cc: Andreas Sundstrom, xen-devel

On 22/01/16 12:56, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> On 22.09.2015 10:53, Ian Campbell wrote:
>> Hi Vladimir & grub-devel,
>>
>> Do you have any thoughts on this issue with i386 pv-grub2?
>>
> Is it still an issue? If so I'll try to replicate it. From the stack dump I
> see that it has jumped to NULL. GRUB has no threads, so it's not a race
> condition with itself, but it may be one with some Xen part. An alternative
> possibility is that GRUB forgets to flush the cache at some point in the
> boot process.

Looks like GRUB doesn't have a trap table registered with Xen (the PV
equivalent of the IDT).

First, Xen tried to inject a #GP fault and found that the entry EIP was
at 0 (which is sadly the default if nothing is specified). It then took
a page fault while attempting to inject the #GP, and crashed the domain.

~Andrew

* Re: [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2016-01-22 13:08 UTC
To: Andrew Cooper, Ian Campbell, grub-devel
Cc: Andreas Sundstrom, xen-devel

On 22.01.2016 14:01, Andrew Cooper wrote:
> Looks like GRUB doesn't have a trap table registered with Xen (the PV
> equivalent of the IDT).
>
> First, Xen tried to inject a #GP fault and found that the entry EIP was
> at 0 (which is sadly the default if nothing is specified). It then took
> a page fault while attempting to inject the #GP, and crashed the domain.
>
Do you have a link on how to add one? We can put a catch-stacktrace-abort
on it.

> ~Andrew

* Re: [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2016-01-22 13:08 ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2016-01-22 13:43 ` Andrew Cooper
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Cooper @ 2016-01-22 13:43 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko, Ian Campbell, grub-devel
  Cc: Andreas Sundstrom, xen-devel

On 22/01/16 13:08, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> On 22.01.2016 14:01, Andrew Cooper wrote:
>> On 22/01/16 12:56, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
>>> On 22.09.2015 10:53, Ian Campbell wrote:
>>>> Hi Vladimir & grub-devel,
>>>>
>>>> Do you have any thoughts on this issue with i386 pv-grub2?
>>>>
>>> Is it still an issue? If so I'll try to replicate it. From stack dump I
>>> see that it has jumped to NULL. GRUB has no threads so it's not a race
>>> condition with itself but may be one with some Xen part. An alternative
>>> possibility is that grub forgets to flush cache at some point in boot
>>> process.
>> Looks like GRUB doesn't have a traptable registered with Xen (the PV
>> equivalent of the IDT).
>>
>> First, Xen tried to inject a #GP fault and found that the entry EIP was
>> at 0 (which is sadly the default if nothing is specified). It then took
>> a pagefault while attempting to inject the #GP, and crashed the domain.
>>
> Do you have a link on how to add one? We can put a catch-stacktrace-abort
> on it.

This is from my microkernel framework, and is probably the most succinct
code implementation:
http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen-test-framework.git;a=blob;f=arch/x86/pv/traps.c;h=7f9a1908d260659c10f5cbb1d2d234c9fea1edb5;hb=HEAD#l31

The hypercall ABI documentation is:
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/arch-x86/xen.h;h=cdd93c1c6446a92e89188c6a5132538188825d27;hb=refs/heads/staging#l126

~Andrew
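For readers following along, the trap table registration discussed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not GRUB's code and not the code behind the link: `struct trap_info` is redeclared here to mirror the layout in Xen's public arch-x86/xen.h header, `SKETCH_GUEST_CS`, `sketch_fatal_trap`, `sketch_build_trap_table` and `sketch_set_trap_table` are hypothetical names, and the last of these is only a stand-in that counts entries where a real PV guest would issue the set_trap_table hypercall. The selector value is assumed to be the flat kernel code selector of a 32-bit PV guest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Redeclared to mirror struct trap_info from Xen's public arch-x86/xen.h,
 * so that this sketch builds without the Xen headers. */
struct trap_info {
    uint8_t       vector;   /* exception vector number */
    uint8_t       flags;    /* privilege level and event-mask bits */
    uint16_t      cs;       /* code selector the handler runs with */
    unsigned long address;  /* handler entry point; 0 ends the table */
};

/* Assumption: flat kernel code selector for a 32-bit PV guest. */
#define SKETCH_GUEST_CS 0xe019u

enum { SKETCH_VEC_GP = 13, SKETCH_VEC_PF = 14 };

/* Hypothetical placeholder. A real handler would be an assembly entry
 * stub that dumps a stack trace and aborts. */
static void sketch_fatal_trap(void)
{
}

/* Fill `table` with entries for #GP and #PF followed by a zero
 * terminator. Returns the number of real entries written. */
static size_t sketch_build_trap_table(struct trap_info *table, size_t capacity)
{
    static const uint8_t vectors[] = { SKETCH_VEC_GP, SKETCH_VEC_PF };
    const size_t n = sizeof(vectors) / sizeof(vectors[0]);

    assert(capacity > n); /* need room for the terminating entry */

    for (size_t i = 0; i < n; i++) {
        table[i].vector  = vectors[i];
        table[i].flags   = 0;
        table[i].cs      = SKETCH_GUEST_CS;
        table[i].address = (unsigned long)(uintptr_t)sketch_fatal_trap;
    }
    table[n] = (struct trap_info){ 0, 0, 0, 0 };
    return n;
}

/* Stand-in for the set_trap_table hypercall: walks the table up to the
 * zero-address terminator and reports how many entries it saw. */
static size_t sketch_set_trap_table(const struct trap_info *table)
{
    size_t count = 0;

    while (table[count].address != 0)
        count++;
    return count;
}
```

With a table registered this way, the entry EIP for #GP would no longer be 0, so a fault would reach a handler in the guest that can report it (the "catch-stacktrace-abort" idea) instead of Xen failing to inject the exception and crashing the domain.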
* Re: [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
  2016-01-22 12:56 ` Vladimir 'φ-coder/phcoder' Serbinenko
  2016-01-22 13:01   ` Andrew Cooper
@ 2016-01-22 17:44   ` Andreas Sundstrom
  1 sibling, 0 replies; 6+ messages in thread
From: Andreas Sundstrom @ 2016-01-22 17:44 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko, Ian Campbell, grub-devel
  Cc: xen-devel

On 2016-01-22 13:56, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> On 22.09.2015 10:53, Ian Campbell wrote:
>> Hi Vladimir & grub-devel,
>>
>> Do you have any thoughts on this issue with i386 pv-grub2?
>>
> Is it still an issue? If so I'll try to replicate it. From stack dump I
> see that it has jumped to NULL. GRUB has no threads so it's not a race
> condition with itself but may be one with some Xen part. An alternative
> possibility is that grub forgets to flush cache at some point in boot
> process.

I can still reproduce the issue. I don't think much has changed in my
setup since the report. I run the current version of Xen and GRUB from
Debian stable.

/Andreas
Thread overview: 6+ messages
[not found] <5600628A.20202@zappa.cx>
2015-09-22  8:53 ` [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub Ian Campbell
2016-01-22 12:56   ` Vladimir 'φ-coder/phcoder' Serbinenko
2016-01-22 13:01     ` Andrew Cooper
2016-01-22 13:08       ` Vladimir 'φ-coder/phcoder' Serbinenko
2016-01-22 13:43         ` Andrew Cooper
2016-01-22 17:44     ` Andreas Sundstrom