* Re: [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
[not found] <5600628A.20202@zappa.cx>
@ 2015-09-22 8:53 ` Ian Campbell
2016-01-22 12:56 ` Vladimir 'φ-coder/phcoder' Serbinenko
0 siblings, 1 reply; 6+ messages in thread
From: Ian Campbell @ 2015-09-22 8:53 UTC (permalink / raw)
To: grub-devel, Vladimir 'φ-coder/phcoder' Serbinenko
Cc: Andreas Sundstrom, xen-devel
Hi Vladimir & grub-devel,
Do you have any thoughts on this issue with i386 pv-grub2?
Thanks, Ian.
On Mon, 2015-09-21 at 22:03 +0200, Andreas Sundstrom wrote:
> This is using Debian Jessie and grub 2.02~beta2-22 (with Debian patches
> applied) and Xen 4.4.1
>
> I originally posted a bug report with Debian but got the suggestion to
> file bugs with upstream as well.
> Debian bug report:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799480
>
> Note that my original thought was that this bug is probably within GRUB.
> But Ian asked me to file a bug with Xen as well, so you will have to live
> with the fact that it is centered around GRUB.
>
> Here's the information from my original bug report:
>
> With a 64-bit dom0 and a 32-bit domU, PV (para-virtualized) grub
> sometimes fails when chainloading the domU's grub. A 64-bit domU seems
> to work 100% of the time.
>
> My understanding of the process:
>
> 1. dom0 launches domU with grub that is loaded from dom0's disk.
> 2. Grub reads its config file from memdisk, and then looks for a grub
>    binary in the domU filesystem.
> 3. If grub is found in domU, it chainloads (multiboot) that grub binary,
>    and the domU grub reads grub.cfg and continues booting.
> 4. If grub is not found in domU, it reads grub.cfg itself and continues
>    with boot.
>
> It fails at step 3 in my list of the boot process, but sometimes it
> does work so it may be something like a race condition that causes the
> problem?
>
> A workaround is to not install or rename /boot/xen in domU so that the
> first grub that is loaded from dom0's disk will not find the grub
> binary in the domU filesystem and hence continues to read grub.cfg and
> boot. The drawback of this is of course that the two versions can't
> differ too much as there are different setups creating grub.cfg and
> then reading/parsing it at boot time.
>
> I am not sure at this point whether this is a problem in XEN or a
> problem in grub, but I compiled the legacy pvgrub, which uses some minios
> from XEN (I don't really know much more about it), and when that legacy
> pvgrub chainloads the domU grub it seems to work 100% of the time.
> However, the legacy pvgrub is not a real alternative, as it's not
> packaged for Debian.
>
> When it fails "xl create vm -c" outputs this:
> Parsing config from /etc/xen/vm
> libxl: error: libxl_dom.c:35:libxl__domain_type: unable to get domain
> type for domid=16
> Unable to attach console
> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console
> child [0] exited with error status 1
>
> And "xl dmesg" shows errors like this:
> (XEN) traps.c:2514:d15 Domain attempted WRMSR 00000000c0010201 from
> 0x0000000000000000 to 0x000000000000ffff.
> (XEN) d16:v0: unhandled page fault (ec=0010)
> (XEN) Pagetable walk from 0000000000000000:
> (XEN) L4[0x000] = 0000000200256027 000000000000049c
> (XEN) L3[0x000] = 0000000200255027 000000000000049d
> (XEN) L2[0x000] = 0000000200251023 00000000000004a1
> (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0
> compat_create_bounce_frame+0xc6/0xde
> (XEN) Domain 16 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-4.4.1 x86_64 debug=n Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e019:[<0000000000000000>]
> (XEN) RFLAGS: 0000000000000246 EM: 1 CONTEXT: pv guest
> (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
> (XEN) rdx: 0000000000000000 rsi: 0000000000499000 rdi: 0000000000800000
> (XEN) rbp: 000000000000000a rsp: 00000000005a5ff0 r8: 0000000000000000
> (XEN) r9: 0000000000000000 r10: ffff83023e9b9000 r11: ffff83023e9b9000
> (XEN) r12: 0000033f3d335bfb r13: ffff82d080300800 r14: ffff82d0802ea940
> (XEN) r15: ffff83005e819000 cr0: 000000008005003b cr4: 00000000000506f0
> (XEN) cr3: 0000000200b7a000 cr2: 0000000000000000
> (XEN) ds: e021 es: e021 fs: e021 gs: e021 ss: e021 cs: e019
> (XEN) Guest stack trace from esp=005a5ff0:
> (XEN) 00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389
> 0016b388
> (XEN) 0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381
> 0016b380
> (XEN) 0016b37f 0016b37e 0016b37d 0016b37c 0016b37b 0016b37a 0016b379
> 0016b378
> (XEN) 0016b377 0016b376 0016b375 0016b374 0016b373 0016b372 0016b371
> 0016b370
> (XEN) 0016b36f 0016b36e 0016b36d 0016b36c 0016b36b 0016b36a 0016b369
> 0016b368
> (XEN) 0016b367 0016b366 0016b365 0016b364 0016b363 0016b362 0016b361
> 0016b360
> (XEN) 0016b35f 0016b35e 0016b35d 0016b35c 0016b35b 0016b35a 0016b359
> 0016b358
> (XEN) 0016b357 0016b356 0016b355 0016b354 0016b353 0016b352 0016b351
> 0016b350
> (XEN) 0016b34f 0016b34e 0016b34d 0016b34c 0016b34b 0016b34a 0016b349
> 0016b348
> (XEN) 0016b347 0016b346 0016b345 0016b344 0016b343 0016b342 0016b341
> 0016b340
> (XEN) 0016b33f 0016b33e 0016b33d 0016b33c 0016b33b 0016b33a 0016b339
> 0016b338
> (XEN) 0016b337 0016b336 0016b335 0016b334 0016b333 0016b332 0016b331
> 0016b330
> (XEN) 0016b32f 0016b32e 0016b32d 0016b32c 0016b32b 0016b32a 0016b329
> 0016b328
> (XEN) 0016b327 0016b326 0016b325 0016b324 0016b323 0016b322 0016b321
> 0016b320
> (XEN) 0016b31f 0016b31e 0016b31d 0016b31c 0016b31b 0016b31a 0016b319
> 0016b318
> (XEN) 0016b317 0016b316 0016b315 0016b314 0016b313 0016b312 0016b311
> 0016b310
> (XEN) 0016b30f 0016b30e 0016b30d 0016b30c 0016b30b 0016b30a 0016b309
> 0016b308
> (XEN) 0016b307 0016b306 0016b305 0016b304 0016b303 0016b302 0016b301
> 0016b300
> (XEN) 0016b2ff 0016b2fe 0016b2fd 0016b2fc 0016b2fb 0016b2fa 0016b2f9
> 0016b2f8
> (XEN) 0016b2f7 0016b2f6 0016b2f5 0016b2f4 0016b2f3 0016b2f2 0016b2f1
> 0016b2f0
>
> An easy way to find out which grub you are in if the machine boots is
> to hit 'c' and type 'ls'; only the grub from dom0 will know about
> (memdisk). So when trying to replicate the issue (and the domU
> actually starts) you can hit 'c', type 'ls' (check for memdisk) and
> then type 'halt' and relaunch the domU. Usually I can't launch more
> than 4-5 times in a row before it fails, often it fails on my first
> try.
>
> For information I have reproduced on two different AMD desktop
> processor machines, not sure if Intel would be any different. I'm
> pretty sure I did tests with grub from unstable with same result at
> some point, but can test again if that is likely to work.
>
> The package that is installed on the domU side is "grub-xen".
>
> I don't know how to debug grub further on my own. I have printed out
> text from grub, which showed me that it is the chainload that fails. I
> see no output from the domU grub (except when it works as it should, of
> course). I can help with further testing if needed.
>
> /Andreas
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
2015-09-22 8:53 ` [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub Ian Campbell
@ 2016-01-22 12:56 ` Vladimir 'φ-coder/phcoder' Serbinenko
2016-01-22 13:01 ` Andrew Cooper
2016-01-22 17:44 ` Andreas Sundstrom
0 siblings, 2 replies; 6+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2016-01-22 12:56 UTC (permalink / raw)
To: Ian Campbell, grub-devel; +Cc: Andreas Sundstrom, xen-devel
[-- Attachment #1: Type: text/plain, Size: 7439 bytes --]
On 22.09.2015 10:53, Ian Campbell wrote:
> Hi Vladimir & grub-devel,
>
> Do you have any thoughts on this issue with i386 pv-grub2?
>
Is it still an issue? If so I'll try to replicate it. From the stack dump
I see that it has jumped to NULL. GRUB has no threads, so it's not a race
condition with itself, but it may be one with some part of Xen. An
alternative possibility is that grub forgets to flush the cache at some
point in the boot process.
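As a rough illustration of the reading above, the jump to NULL can be
checked mechanically from the register dump in the original report. This
is only a minimal Python sketch: the helper names parse_crash and
jumped_to_null are made up for this example, and the regular expressions
assume nothing beyond the "xl dmesg" line format shown in the report.

```python
import re

# Three lines copied from the "xl dmesg" output in the original report.
DUMP = """\
(XEN) d16:v0: unhandled page fault (ec=0010)
(XEN) RIP: e019:[<0000000000000000>]
(XEN) cr3: 0000000200b7a000 cr2: 0000000000000000
"""


def parse_crash(dump):
    """Return (rip, cr2) as integers from a Xen register dump.

    Hypothetical helper for illustration, not part of Xen or GRUB.
    """
    rip = re.search(r"RIP:\s+[0-9a-f]+:\[<([0-9a-f]+)>\]", dump)
    cr2 = re.search(r"cr2:\s+([0-9a-f]+)", dump)
    if rip is None or cr2 is None:
        raise ValueError("no RIP/cr2 found in dump")
    return int(rip.group(1), 16), int(cr2.group(1), 16)


def jumped_to_null(dump):
    """True if the guest faulted while executing at address 0."""
    rip, cr2 = parse_crash(dump)
    # cr2 holds the faulting address; if it equals the instruction
    # pointer and both are zero, the guest tried to run code at NULL.
    return rip == 0 and cr2 == 0


print(jumped_to_null(DUMP))
```

Feeding it the full output of a failed boot should work the same way, as
long as the RIP and cr2 lines are present in that format.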
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 213 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
2016-01-22 12:56 ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2016-01-22 13:01 ` Andrew Cooper
2016-01-22 13:08 ` Vladimir 'φ-coder/phcoder' Serbinenko
2016-01-22 17:44 ` Andreas Sundstrom
1 sibling, 1 reply; 6+ messages in thread
From: Andrew Cooper @ 2016-01-22 13:01 UTC (permalink / raw)
To: Vladimir 'φ-coder/phcoder' Serbinenko, Ian Campbell,
grub-devel
Cc: Andreas Sundstrom, xen-devel
[-- Attachment #1: Type: text/plain, Size: 7962 bytes --]
On 22/01/16 12:56, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> On 22.09.2015 10:53, Ian Campbell wrote:
>> Hi Vladimir & grub-devel,
>>
>> Do you have any thoughts on this issue with i386 pv-grub2?
>>
> Is it still an issue? If so I'll try to replicate it. From the stack dump
> I see that it has jumped to NULL. GRUB has no threads, so it's not a race
> condition with itself, but it may be one with some part of Xen. An
> alternative possibility is that grub forgets to flush the cache at some
> point in the boot process.
Looks like GRUB doesn't have a traptable registered with Xen (the PV
equivalent of the IDT).
First, Xen tried to inject a #GP fault and found that the entry EIP was
at 0 (which is sadly the default if nothing is specified). It then took
a pagefault while attempting to inject the #GP, and crashed the domain.
~Andrew
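For readers following the dump, the error code in "unhandled page fault
(ec=0010)" can be decoded bit by bit. The sketch below is a small Python
illustration, assuming the value is printed in hexadecimal and using the
architectural x86 page-fault error code bits; the names PF_BITS and
decode_pf are invented for this example.

```python
# Architectural x86 page-fault error code bits:
# (mask, meaning when set, meaning when clear or None to stay silent).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "non-write access"),
    (0x04, "user mode", "supervisor mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_pf(ec):
    """Return a list of human-readable properties for an error code."""
    out = []
    for mask, if_set, if_clear in PF_BITS:
        if ec & mask:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


# "ec=0010" from the report, read as hexadecimal.
print(decode_pf(int("0010", 16)))
```

For 0x10 this reports an instruction fetch from a page that is not
present, which fits the L1[0x000] = 0 entry in the pagetable walk and the
RIP of 0 in the register dump.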
[-- Attachment #2: Type: text/html, Size: 8673 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
2016-01-22 13:01 ` Andrew Cooper
@ 2016-01-22 13:08 ` Vladimir 'φ-coder/phcoder' Serbinenko
2016-01-22 13:43 ` Andrew Cooper
0 siblings, 1 reply; 6+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2016-01-22 13:08 UTC (permalink / raw)
To: Andrew Cooper, Ian Campbell, grub-devel; +Cc: Andreas Sundstrom, xen-devel
[-- Attachment #1: Type: text/plain, Size: 8465 bytes --]
On 22.01.2016 14:01, Andrew Cooper wrote:
> On 22/01/16 12:56, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
>> On 22.09.2015 10:53, Ian Campbell wrote:
>>> Hi Vladimir & grub-devel,
>>>
>>> Do you have any thoughts on this issue with i386 pv-grub2?
>>>
>> Is it still an issue? If so I'll try to replicate it. From the stack dump
>> I see that it has jumped to NULL. GRUB has no threads, so it's not a race
>> condition with itself, but it may be one with some part of Xen. An
>> alternative possibility is that grub forgets to flush the cache at some
>> point in the boot process.
>
> Looks like GRUB doesn't have a traptable registered with Xen (the PV
> equivalent of the IDT).
>
> First, Xen tried to inject a #GP fault and found that the entry EIP was
> at 0 (which is sadly the default if nothing is specified). It then took
> a pagefault while attempting to inject the #GP, and crashed the domain.
>
Do you have a link on how to add one? We can put a catch-stacktrace-abort
on it.
> ~Andrew
>
>>> Thanks, Ian.
>>>
>>> On Mon, 2015-09-21 at 22:03 +0200, Andreas Sundstrom wrote:
>>>> This is using Debian Jessie and grub 2.02~beta2-22 (with Debian patches
>>>> applied) and Xen 4.4.1
>>>>
>>>> I originally posted a bug report with Debian but got the suggestion to
>>>> file bugs with upstream as well.
>>>> Debian bug report:
>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799480
>>>>
>>>> Note that my original thought was that this bug probably is within GRUB.
>>>> But Ian asked me to file a bug with Xen as well, you have to live with
>>>> the
>>>> fact that it is centered around GRUB though.
>>>>
>>>> Here's the information from my original bug report:
>>>>
>>>> Using 64-bit dom0 and 32-bit domU PV (para-virtualized) grub sometimes
>>>> fail when chainloading the domU's grub. 64-bit domU seem to work 100%
>>>> of the time.
>>>>
>>>> My understanding of the process:
>>>>
>>>> * dom0 launches domU with grub that is loaded from dom0's disk.
>>>> * Grub reads config file from memdisk, and then looks for grub binary in
>>>> domU filesystem.
>>>> * If grub is found in domU it then chainloads (multiboot) that grub
>>>> binary
>>>> and the domU grub reads grub.cfg and continue booting.
>>>> * If grub is not found in domU it reads grub.cfg and continues with
>>>> boot.
>>>>
>>>> It fails at step 3 in my list of the boot process, but sometimes it
>>>> does work so it may be something like a race condition that causes the
>>>> problem?
>>>>
>>>> A workaround is to not install or rename /boot/xen in domU so that the
>>>> first grub that is loaded from dom0's disk will not find the grub
>>>> binary in the domU filesystem and hence continues to read grub.cfg and
>>>> boot. The drawback of this is of course that the two versions can't
>>>> differ too much as there are different setups creating grub.cfg and
>>>> then reading/parsing it at boot time.
>>>>
>>>> I am not sure at this point whether this is a problem in XEN or a
>>>> problem in grub but I compiled the legacy pvgrub that uses some minios
>>>> from XEN (don't really know much more about it) and when that legacy
>>>> pvgrub chainloads the domU grub it seems to work 100% of the time. Now
>>>> the legace pvgrub is not a real alternative as it's not packaged for
>>>> Debian though.
>>>>
>>>> When it fails "xl create vm -c" outputs this:
>>>> Parsing config from /etc/xen/vm
>>>> libxl: error: libxl_dom.c:35:libxl__domain_type: unable to get domain
>>>> type for domid=16
>>>> Unable to attach console
>>>> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console
>>>> child [0] exited with error status 1
>>>>
>>>> And "xl dmesg" shows errors like this:
>>>> (XEN) traps.c:2514:d15 Domain attempted WRMSR 00000000c0010201 from
>>>> 0x0000000000000000 to 0x000000000000ffff.
>>>> (XEN) d16:v0: unhandled page fault (ec=0010)
>>>> (XEN) Pagetable walk from 0000000000000000:
>>>> (XEN) L4[0x000] = 0000000200256027 000000000000049c
>>>> (XEN) L3[0x000] = 0000000200255027 000000000000049d
>>>> (XEN) L2[0x000] = 0000000200251023 00000000000004a1
>>>> (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
>>>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08021feb0
>>>> compat_create_bounce_frame+0xc6/0xde
>>>> (XEN) Domain 16 (vcpu#0) crashed on cpu#0:
>>>> (XEN) ----[ Xen-4.4.1 x86_64 debug=n Not tainted ]----
>>>> (XEN) CPU: 0
>>>> (XEN) RIP: e019:[<0000000000000000>]
>>>> (XEN) RFLAGS: 0000000000000246 EM: 1 CONTEXT: pv guest
>>>> (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
>>>> (XEN) rdx: 0000000000000000 rsi: 0000000000499000 rdi: 0000000000800000
>>>> (XEN) rbp: 000000000000000a rsp: 00000000005a5ff0 r8: 0000000000000000
>>>> (XEN) r9: 0000000000000000 r10: ffff83023e9b9000 r11: ffff83023e9b9000
>>>> (XEN) r12: 0000033f3d335bfb r13: ffff82d080300800 r14: ffff82d0802ea940
>>>> (XEN) r15: ffff83005e819000 cr0: 000000008005003b cr4: 00000000000506f0
>>>> (XEN) cr3: 0000000200b7a000 cr2: 0000000000000000
>>>> (XEN) ds: e021 es: e021 fs: e021 gs: e021 ss: e021 cs: e019
>>>> (XEN) Guest stack trace from esp=005a5ff0:
>>>> (XEN) 00000010 00000000 0001e019 00010046 0016b38b 0016b38a 0016b389
>>>> 0016b388
>>>> (XEN) 0016b387 0016b386 0016b385 0016b384 0016b383 0016b382 0016b381
>>>> 0016b380
>>>> (XEN) 0016b37f 0016b37e 0016b37d 0016b37c 0016b37b 0016b37a 0016b379
>>>> 0016b378
>>>> (XEN) 0016b377 0016b376 0016b375 0016b374 0016b373 0016b372 0016b371
>>>> 0016b370
>>>> (XEN) 0016b36f 0016b36e 0016b36d 0016b36c 0016b36b 0016b36a 0016b369
>>>> 0016b368
>>>> (XEN) 0016b367 0016b366 0016b365 0016b364 0016b363 0016b362 0016b361
>>>> 0016b360
>>>> (XEN) 0016b35f 0016b35e 0016b35d 0016b35c 0016b35b 0016b35a 0016b359
>>>> 0016b358
>>>> (XEN) 0016b357 0016b356 0016b355 0016b354 0016b353 0016b352 0016b351
>>>> 0016b350
>>>> (XEN) 0016b34f 0016b34e 0016b34d 0016b34c 0016b34b 0016b34a 0016b349
>>>> 0016b348
>>>> (XEN) 0016b347 0016b346 0016b345 0016b344 0016b343 0016b342 0016b341
>>>> 0016b340
>>>> (XEN) 0016b33f 0016b33e 0016b33d 0016b33c 0016b33b 0016b33a 0016b339
>>>> 0016b338
>>>> (XEN) 0016b337 0016b336 0016b335 0016b334 0016b333 0016b332 0016b331
>>>> 0016b330
>>>> (XEN) 0016b32f 0016b32e 0016b32d 0016b32c 0016b32b 0016b32a 0016b329
>>>> 0016b328
>>>> (XEN) 0016b327 0016b326 0016b325 0016b324 0016b323 0016b322 0016b321
>>>> 0016b320
>>>> (XEN) 0016b31f 0016b31e 0016b31d 0016b31c 0016b31b 0016b31a 0016b319
>>>> 0016b318
>>>> (XEN) 0016b317 0016b316 0016b315 0016b314 0016b313 0016b312 0016b311
>>>> 0016b310
>>>> (XEN) 0016b30f 0016b30e 0016b30d 0016b30c 0016b30b 0016b30a 0016b309
>>>> 0016b308
>>>> (XEN) 0016b307 0016b306 0016b305 0016b304 0016b303 0016b302 0016b301
>>>> 0016b300
>>>> (XEN) 0016b2ff 0016b2fe 0016b2fd 0016b2fc 0016b2fb 0016b2fa 0016b2f9
>>>> 0016b2f8
>>>> (XEN) 0016b2f7 0016b2f6 0016b2f5 0016b2f4 0016b2f3 0016b2f2 0016b2f1
>>>> 0016b2f0
>>>>
>>>> An easy way to find out which grub you are in if the machine boots is
>>>> to hit 'c' and type 'ls', only the grub from dom0 will know about
>>>> (memdisk). So when trying to replicate the issue (and the domU
>>>> actually starts) you can hit 'c', type 'ls' (check for memdisk) and
>>>> then type 'halt' and relaunch the domU. Usually I can't launch more
>>>> than 4-5 times in a row before it fails, often it fails on my first
>>>> try.
>>>>
>>>> For information I have reproduced on two different AMD desktop
>>>> processor machines, not sure if Intel would be any different. I'm
>>>> pretty sure I did tests with grub from unstable with same result at
>>>> some point, but can test again if that is likely to work.
>>>>
>>>> The package that is in installed on the domU side is "grub-xen".
>>>>
>>>> I am unable to understand how to debug grub further on my own, I have
>>>> printed out text from grub so that I understood that it is the
>>>> chainload that fails. I see no output from the domU grub (except when
>>>> it works as it should of course). I can help with further testing if
>>>> needed.
>>>>
>>>> /Andreas
>>>>
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@lists.xen.org
>>>> http://lists.xen.org/xen-devel
* Re: [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
2016-01-22 13:08 ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2016-01-22 13:43 ` Andrew Cooper
0 siblings, 0 replies; 6+ messages in thread
From: Andrew Cooper @ 2016-01-22 13:43 UTC (permalink / raw)
To: Vladimir 'φ-coder/phcoder' Serbinenko, Ian Campbell,
grub-devel
Cc: Andreas Sundstrom, xen-devel
On 22/01/16 13:08, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> On 22.01.2016 14:01, Andrew Cooper wrote:
>> On 22/01/16 12:56, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
>>> On 22.09.2015 10:53, Ian Campbell wrote:
>>>> Hi Vladimir & grub-devel,
>>>>
>>>> Do you have any thoughts on this issue with i386 pv-grub2?
>>>>
>>> Is it still an issue? If so I'll try to replicate it. From stack dump I
>>> see that it has jumped to NULL. GRUB has no threads so it's not a race
>>> condition with itself but may be one with some Xen part. An alternative
>>> possibility is that grub forgets to flush cache at some point in boot
>>> process.
>> Looks like GRUB doesn't have a traptable registered with Xen (the PV
>> equivalent of the IDT).
>>
>> First, Xen tried to inject a #GP fault and found that the entry EIP was
>> at 0 (which is sadly the default if nothing is specified). It then took
>> a pagefault while attempting to inject the #GP, and crashed the domain.
>>
> Do you have a link how to add one? We can put a catch-stacktrace-abort
> on it.
This is from my microkernel framework, and is probably the most succinct
code implementation:
http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen-test-framework.git;a=blob;f=arch/x86/pv/traps.c;h=7f9a1908d260659c10f5cbb1d2d234c9fea1edb5;hb=HEAD#l31
The hypercall ABI documentation is:
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/arch-x86/xen.h;h=cdd93c1c6446a92e89188c6a5132538188825d27;hb=refs/heads/staging#l126
~Andrew
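
For readers following the thread, here is a minimal sketch of what "registering a traptable" could look like on the guest side. Only the struct trap_info layout is taken from the Xen public header linked above. Every sketch_* name, the placeholder selector value, and the choice of vectors are assumptions made for illustration; this is not code from GRUB or from the test framework. In a real PV guest the finished table would be handed to Xen through the set_trap_table hypercall (commonly wrapped as HYPERVISOR_set_trap_table), and the handler address would point at an assembly entry stub rather than a C function. As far as I understand the ABI, Xen stops reading the table at the first entry whose address is 0, hence the zeroed terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One trap-table entry, laid out as in Xen's public arch-x86/xen.h. */
struct trap_info {
    uint8_t       vector;  /* exception vector                         */
    uint8_t       flags;   /* bits 0-3: privilege level;               */
                           /* bit 4: clear event enable                */
    uint16_t      cs;      /* code selector                            */
    unsigned long address; /* handler entry point (code offset)        */
};

/* ASSUMPTION: placeholder only. A real guest would use the kernel code
 * selector defined by the Xen public headers for its architecture. */
#define SKETCH_KERNEL_CS 0

/* Hypothetical catch-all handler. A real one would be an assembly stub
 * that dumps registers and a stack trace, then aborts. */
static void sketch_fault_handler(void)
{
}

/* Vectors covered by this sketch: #DE, #UD, #GP, #PF. */
static const uint8_t sketch_vectors[] = { 0, 6, 13, 14 };
#define SKETCH_NR_VECTORS (sizeof(sketch_vectors) / sizeof(sketch_vectors[0]))

/* Fill `table` with one entry per vector plus a zeroed terminator.
 * Returns the number of real (non-terminator) entries written. */
static size_t sketch_build_trap_table(struct trap_info *table, size_t capacity)
{
    size_t i;

    assert(table != NULL);
    assert(capacity > SKETCH_NR_VECTORS); /* leave room for the terminator */

    for (i = 0; i < SKETCH_NR_VECTORS; i++) {
        table[i].vector  = sketch_vectors[i];
        table[i].flags   = 0;
        table[i].cs      = SKETCH_KERNEL_CS;
        table[i].address = (unsigned long)(uintptr_t)&sketch_fault_handler;
    }

    /* Zeroed entry marks the end of the table. */
    table[i].vector  = 0;
    table[i].flags   = 0;
    table[i].cs      = 0;
    table[i].address = 0;

    return i;
}

/* Stand-in for the consumer side: walk the table the way the terminator
 * convention implies, counting entries until address == 0. */
static size_t sketch_count_entries(const struct trap_info *table)
{
    size_t n = 0;

    while (table[n].address != 0)
        n++;
    return n;
}
```

With no table registered at all, every entry point stays at its default of 0, which matches the jump to NULL seen in the stack dump once Xen tries to deliver the #GP.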
* Re: [Xen-devel] [BUG] XEN domU crash when PV grub chainloads 32-bit domU grub
2016-01-22 12:56 ` Vladimir 'φ-coder/phcoder' Serbinenko
2016-01-22 13:01 ` Andrew Cooper
@ 2016-01-22 17:44 ` Andreas Sundstrom
1 sibling, 0 replies; 6+ messages in thread
From: Andreas Sundstrom @ 2016-01-22 17:44 UTC (permalink / raw)
To: Vladimir 'φ-coder/phcoder' Serbinenko, Ian Campbell,
grub-devel
Cc: xen-devel
On 2016-01-22 13:56, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> On 22.09.2015 10:53, Ian Campbell wrote:
>> Hi Vladimir & grub-devel,
>>
>> Do you have any thoughts on this issue with i386 pv-grub2?
>>
> Is it still an issue? If so I'll try to replicate it. From stack dump I
> see that it has jumped to NULL. GRUB has no threads so it's not a race
> condition with itself but may be one with some Xen part. An alternative
> possibility is that grub forgets to flush cache at some point in boot
> process.
I can still reproduce the issue.
I don't think much has changed in my setup since the report.
I run the current version of Xen and GRUB from Debian stable.
/Andreas