* DomU: kernel BUG at arch/x86/xen/enlighten.c:425
@ 2013-03-08 2:23 James Sinclair
2013-03-08 8:38 ` Jan Beulich
0 siblings, 1 reply; 11+ messages in thread
From: James Sinclair @ 2013-03-08 2:23 UTC (permalink / raw)
To: xen-devel
Xen: (XEN) Xen version 4.1.5-pre (maker@(none)) (gcc version 4.4.5 (Debian 4.4.5-8) ) Mon Feb 4 12:59:38 EST 2013
Dom0: 3.7.7-1-x86_64 #1 SMP Thu Feb 14 15:58:35 EST 2013 x86_64 GNU/Linux
DomU: 3.7.10 (32 bit)
We're seeing this bug pop up fairly regularly. The BUG below is from the first time it's triggered after a reboot - subsequent triggers produce similar tracebacks with the kernel flagged as "Tainted: G D" - I can provide more examples if desired.
------------[ cut here ]------------
kernel BUG at arch/x86/xen/enlighten.c:425!
invalid opcode: 0000 [#1] SMP
Modules linked in:
Pid: 3158, comm: ntpd Not tainted 3.7.10-linode49 #2
EIP: 0061:[<c0102e8a>] EFLAGS: 00210282 CPU: 0
EIP is at set_aliased_prot+0x9a/0x110
EAX: ffffffea EBX: ee2e6000 ECX: 2c74a063 EDX: 80000019
ESI: 00000000 EDI: 00002000 EBP: 2c74a063 ESP: ead9da10
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
CR0: 8005003b CR2: 4a258bc4 CR3: 2aa7a000 CR4: 00002660
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process ntpd (pid: 3158, ti=ead9c000 task=eb6e3300 task.ti=ead9c000)
Stack:
0007fb6b ee2e6000 80000019 ec1ed000 00000001 ee2e6000 00000200 00002000
eb6e3300 c0102f27 eb19cc80 eb19cc80 00000000 c010ba5f eb19cc80 eb19cc80
c013083c c010863b c078c405 00000000 00000000 c091e1c0 c015ade1 00000000
Call Trace:
[<c0102f27>] ? xen_free_ldt+0x27/0x40
[<c010ba5f>] ? destroy_context+0x2f/0xa0
[<c013083c>] ? __mmdrop+0x1c/0xb0
[<c010863b>] ? __switch_to+0x13b/0x3a0
[<c078c405>] ? _raw_spin_lock+0x5/0x10
[<c015ade1>] ? finish_task_switch+0xb1/0xc0
[<c078b77d>] ? __schedule+0x1fd/0x590
[<c0639aa7>] ? get_unique_tuple+0x167/0x200
[<c06258c2>] ? nf_ct_invert_tuple+0x52/0x70
[<c078c62f>] ? _raw_spin_lock_bh+0xf/0x20
[<c078ad9d>] ? schedule_hrtimeout_range_clock+0x10d/0x120
[<c078c437>] ? _raw_spin_lock_irqsave+0x27/0x40
[<c014ecca>] ? add_wait_queue+0x1a/0x50
[<c078c481>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[<c078adbf>] ? schedule_hrtimeout_range+0xf/0x20
[<c01e4a47>] ? poll_schedule_timeout+0x37/0x50
[<c01e57f5>] ? do_select+0x445/0x510
[<c04f97f5>] ? info_for_irq+0x5/0x20
[<c04f9eb0>] ? evtchn_from_irq+0x10/0x40
[<c01e4be0>] ? __pollwait+0xf0/0xf0
[<c01e4be0>] ? __pollwait+0xf0/0xf0
[<c01e4be0>] ? __pollwait+0xf0/0xf0
[<c01e4be0>] ? __pollwait+0xf0/0xf0
[<c01e4be0>] ? __pollwait+0xf0/0xf0
[<c01e4be0>] ? __pollwait+0xf0/0xf0
[<c01e4be0>] ? __pollwait+0xf0/0xf0
[<c069c55c>] ? udp_sendmsg+0x2ec/0x820
[<c0678250>] ? ip_append_page+0x4e0/0x4e0
[<c078c437>] ? _raw_spin_lock_irqsave+0x27/0x40
[<c078c481>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[<c05d9cc2>] ? __skb_recv_datagram+0xf2/0x260
[<c078c62f>] ? _raw_spin_lock_bh+0xf/0x20
[<c05d1586>] ? lock_sock_fast+0x16/0x50
[<c069bf76>] ? udp_recvmsg+0x76/0x2e0
[<c06a293c>] ? inet_recvmsg+0x5c/0xb0
[<c06a3817>] ? inet_sendmsg+0x47/0xb0
[<c05ce847>] ? sock_recvmsg+0xe7/0x100
[<c078c437>] ? _raw_spin_lock_irqsave+0x27/0x40
[<c078c481>] ? _raw_spin_unlock_irqrestore+0x11/0x20
[<c0112017>] ? save_xstate_sig+0x297/0x310
[<c01e5a6d>] ? core_sys_select+0x1ad/0x2a0
[<c014165f>] ? __set_task_blocked+0x2f/0x80
[<c01419a9>] ? set_current_blocked+0x49/0x60
[<c0142fd4>] ? signal_delivered+0x44/0x60
[<c0111002>] ? fpu_finit+0x52/0x60
[<c018adb6>] ? __audit_syscall_exit+0x3b6/0x3f0
[<c0112749>] ? syscall_trace_leave+0xd9/0x110
[<c01e5dcf>] ? sys_select+0x2f/0xb0
[<c078c844>] ? syscall_call+0x7/0xb
Code: d2 0f a4 c2 0c c1 e0 0c 09 da 09 c6 89 f0 ff 15 d8 34 92 c0 31 f6 89 c5 8b 5c 24 04 89 54 24 08 89 c1 e8 3a e3 ff ff 85 c0 74 04 <0f> 0b eb fe 8b 04 24 8b 54 24 0c c1 e0 05 8b 04 10 c1 e8 1e 69
EIP: [<c0102e8a>] set_aliased_prot+0x9a/0x110 SS:ESP 0069:ead9da10
---[ end trace a8da5f670b3c74bb ]---
Several processes have been seen to trigger it: audispd, gam, java, klogd, ntpd, perl, syslogd.
Eventually init will trigger it and cause a kernel panic:
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
I've done some Googling to see if this is a known issue. All I could find was a closed Debian bug from Dec 2010 with no resolution:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=607709
Please let me know if you need further info from me.
Thanks,
James Sinclair
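A note for anyone reading the register dump above: in the Code: line, the bytes just before the marked <0f> 0b (85 c0 74 04) are a test of EAX followed by a conditional jump over the trapping instruction, so the BUG fires when the preceding call returns non-zero. Read as a signed 32-bit value, EAX: ffffffea is -22, which is -EINVAL on Linux. The helper below is only an illustrative sketch of that conversion; decode_return is a made-up name, not something from the kernel or Xen.

```python
import errno


def decode_return(reg_hex, bits=32):
    """Interpret a hex register value from an oops as a signed return code.

    Hypothetical helper for illustration only.
    """
    value = int(reg_hex, 16)
    # Two's complement: values with the top bit set are negative.
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    if value == 0:
        name = "success"
    elif value < 0:
        # Linux returns errors as negated errno values.
        name = errno.errorcode.get(-value, "unknown")
    else:
        name = "positive"
    return value, name


print(decode_return("ffffffea"))  # EAX from the trace above
```

This only explains what the failing call returned; it does not say why the hypervisor rejected the request.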
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425
2013-03-08 2:23 James Sinclair
@ 2013-03-08 8:38 ` Jan Beulich
[not found] ` <4A885B42-B352-4FE4-A0A7-2B10CF595E61@linode.com>
0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2013-03-08 8:38 UTC (permalink / raw)
To: James Sinclair; +Cc: xen-devel
>>> On 08.03.13 at 03:23, James Sinclair <james.sinclair@linode.com> wrote:
> Xen: (XEN) Xen version 4.1.5-pre (maker@(none)) (gcc version 4.4.5 (Debian
> 4.4.5-8) ) Mon Feb 4 12:59:38 EST 2013
> Dom0: 3.7.7-1-x86_64 #1 SMP Thu Feb 14 15:58:35 EST 2013 x86_64 GNU/Linux
> DomU: 3.7.10 (32 bit)
>
> We're seeing this bug pop up fairly regularly. The BUG below is from the
> first time it's triggered after a reboot - subsequent triggers produce similar
> tracebacks with the kernel flagged as "Tainted: G D" - I can provide more
> examples if desired.
Two fundamental things that are missing: For one, this is almost
certainly being accompanied by some hypervisor message(s), so
you will want to provide the hypervisor log (making sure the log
level is high enough). And then, even more with you using a not
yet released hypervisor version, you would want to clarify whether
this is a regression of some sort (i.e. whether a certain hypervisor
and/or kernel version are known not to exhibit this).
> ------------[ cut here ]------------
> kernel BUG at arch/x86/xen/enlighten.c:425!
> invalid opcode: 0000 [#1] SMP
> Modules linked in:
> Pid: 3158, comm: ntpd Not tainted 3.7.10-linode49 #2
> EIP: 0061:[<c0102e8a>] EFLAGS: 00210282 CPU: 0
> EIP is at set_aliased_prot+0x9a/0x110
> EAX: ffffffea EBX: ee2e6000 ECX: 2c74a063 EDX: 80000019
> ESI: 00000000 EDI: 00002000 EBP: 2c74a063 ESP: ead9da10
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
> CR0: 8005003b CR2: 4a258bc4 CR3: 2aa7a000 CR4: 00002660
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
> Process ntpd (pid: 3158, ti=ead9c000 task=eb6e3300 task.ti=ead9c000)
> Stack:
> 0007fb6b ee2e6000 80000019 ec1ed000 00000001 ee2e6000 00000200 00002000
> eb6e3300 c0102f27 eb19cc80 eb19cc80 00000000 c010ba5f eb19cc80 eb19cc80
> c013083c c010863b c078c405 00000000 00000000 c091e1c0 c015ade1 00000000
> Call Trace:
> [<c0102f27>] ? xen_free_ldt+0x27/0x40
Quite unusual to have a process with LDT these days. And hence
quite likely for this to be the culprit. The primary guess therefore
would be that the LDT didn't get cleared yet (MMUEXT_SET_LDT).
Jan
> [<c010ba5f>] ? destroy_context+0x2f/0xa0
> [<c013083c>] ? __mmdrop+0x1c/0xb0
> [<c010863b>] ? __switch_to+0x13b/0x3a0
> [<c078c405>] ? _raw_spin_lock+0x5/0x10
> [<c015ade1>] ? finish_task_switch+0xb1/0xc0
> [<c078b77d>] ? __schedule+0x1fd/0x590
> [<c0639aa7>] ? get_unique_tuple+0x167/0x200
> [<c06258c2>] ? nf_ct_invert_tuple+0x52/0x70
> [<c078c62f>] ? _raw_spin_lock_bh+0xf/0x20
> [<c078ad9d>] ? schedule_hrtimeout_range_clock+0x10d/0x120
> [<c078c437>] ? _raw_spin_lock_irqsave+0x27/0x40
> [<c014ecca>] ? add_wait_queue+0x1a/0x50
> [<c078c481>] ? _raw_spin_unlock_irqrestore+0x11/0x20
> [<c078adbf>] ? schedule_hrtimeout_range+0xf/0x20
> [<c01e4a47>] ? poll_schedule_timeout+0x37/0x50
> [<c01e57f5>] ? do_select+0x445/0x510
> [<c04f97f5>] ? info_for_irq+0x5/0x20
> [<c04f9eb0>] ? evtchn_from_irq+0x10/0x40
> [<c01e4be0>] ? __pollwait+0xf0/0xf0
> [<c01e4be0>] ? __pollwait+0xf0/0xf0
> [<c01e4be0>] ? __pollwait+0xf0/0xf0
> [<c01e4be0>] ? __pollwait+0xf0/0xf0
> [<c01e4be0>] ? __pollwait+0xf0/0xf0
> [<c01e4be0>] ? __pollwait+0xf0/0xf0
> [<c01e4be0>] ? __pollwait+0xf0/0xf0
> [<c069c55c>] ? udp_sendmsg+0x2ec/0x820
> [<c0678250>] ? ip_append_page+0x4e0/0x4e0
> [<c078c437>] ? _raw_spin_lock_irqsave+0x27/0x40
> [<c078c481>] ? _raw_spin_unlock_irqrestore+0x11/0x20
> [<c05d9cc2>] ? __skb_recv_datagram+0xf2/0x260
> [<c078c62f>] ? _raw_spin_lock_bh+0xf/0x20
> [<c05d1586>] ? lock_sock_fast+0x16/0x50
> [<c069bf76>] ? udp_recvmsg+0x76/0x2e0
> [<c06a293c>] ? inet_recvmsg+0x5c/0xb0
> [<c06a3817>] ? inet_sendmsg+0x47/0xb0
> [<c05ce847>] ? sock_recvmsg+0xe7/0x100
> [<c078c437>] ? _raw_spin_lock_irqsave+0x27/0x40
> [<c078c481>] ? _raw_spin_unlock_irqrestore+0x11/0x20
> [<c0112017>] ? save_xstate_sig+0x297/0x310
> [<c01e5a6d>] ? core_sys_select+0x1ad/0x2a0
> [<c014165f>] ? __set_task_blocked+0x2f/0x80
> [<c01419a9>] ? set_current_blocked+0x49/0x60
> [<c0142fd4>] ? signal_delivered+0x44/0x60
> [<c0111002>] ? fpu_finit+0x52/0x60
> [<c018adb6>] ? __audit_syscall_exit+0x3b6/0x3f0
> [<c0112749>] ? syscall_trace_leave+0xd9/0x110
> [<c01e5dcf>] ? sys_select+0x2f/0xb0
> [<c078c844>] ? syscall_call+0x7/0xb
> Code: d2 0f a4 c2 0c c1 e0 0c 09 da 09 c6 89 f0 ff 15 d8 34 92 c0 31 f6
> 89 c5 8b 5c 24 04 89 54 24 08 89 c1 e8 3a e3 ff ff 85 c0 74 04 <0f> 0b eb fe 8b
> 04 24 8b 54 24 0c c1 e0 05 8b 04 10 c1 e8 1e 69
> EIP: [<c0102e8a>] set_aliased_prot+0x9a/0x110 SS:ESP 0069:ead9da10
> ---[ end trace a8da5f670b3c74bb ]---
>
> Several processes have been seen to trigger it - audispd gam java klogd ntpd
> perl syslogd
> Eventually init will trigger it and cause a kernel panic:
>
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>
> I've done some Googling to see if this is a known issue. All I could find
> was a closed Debian bug from Dec 2010 with no resolution:
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=607709
>
> Please let me know if you need further info from me.
>
> Thanks,
> James Sinclair
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425
[not found] ` <4A885B42-B352-4FE4-A0A7-2B10CF595E61@linode.com>
@ 2013-03-12 7:45 ` Jan Beulich
2013-03-12 22:56 ` James Sinclair
0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2013-03-12 7:45 UTC (permalink / raw)
To: James Sinclair; +Cc: xen-devel
>>> On 12.03.13 at 04:56, James Sinclair <james.sinclair@linode.com> wrote:
(re-adding xen-devel to Cc)
> I've done further testing and I have confirmed that I can trigger the bug
> pretty reliably on several Xen versions:
>
> * xen-3.4.4
> * xen-4.1.4
> * xen-4.1.5-pre
>
> Dom0 kernels range between 2.6.18 & 3.7.7-1. I've also tried quite a few DomU
> kernels and I can trigger it in everything I've tried between 2.6.39.1 and
> 3.7.10. In the end, I haven't managed to find a combination of Xen, Dom0 &
> DomU where I cannot trigger it.
Partly because you apparently never tried non-pvops DomU kernels.
But that's not the point here.
> As for the hypervisor log, I'm not seeing anything get logged at all. I
> suspect that I'm not capturing them correctly. Using Xen 4.1.4 I added the
> following to the command line:
>
> loglvl=all guest_loglvl=all sync_console console_to_ring earlyprintk=xen
> debug loglevel=8
Now, this mixture of hypervisor and kernel options already
suggests that you don't look in the right place. Please put Xen
options on the Xen command line, and kernel ones on the kernel
line.
> I'm seeing extra logs from Xen during boot, but nothing when the bug is
> triggered in the DomU. I'm assuming any extra logging will show up on the
> console or in /var/log/xen/xend.log - is there somewhere else I should be
> looking? Or something I'm not setting correctly?
They show up on the serial console (if in use) or in the output of
"xl dmesg" (or "xm dmesg" if xend is in use). Without extra
precautions, they _won't_ show up anywhere under /var/log.
Jan
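One way to check whether the hypervisor logged anything at the moment of the BUG is to capture the "xl dmesg" output before and after triggering it and compare the two. The following is a minimal sketch, assuming it runs in Dom0 with xl on the PATH and that the log only ever gains lines at the end (older lines may fall off the front of the ring). The function names are made up for illustration.

```python
import subprocess


def read_hypervisor_log(cmd=("xl", "dmesg")):
    """Return the hypervisor log as a list of lines (use "xm" with xend)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def new_lines(before, after):
    """Return the lines that were appended after `before` was captured.

    Looks for the longest tail of `before` that `after` starts with, so it
    still works when the oldest lines have been dropped from the ring.
    """
    for k in range(len(before), 0, -1):
        if after[:k] == before[-k:]:
            return after[k:]
    # No overlap at all: treat everything in `after` as new.
    return list(after)


# Intended use (not executed here):
#   before = read_hypervisor_log()
#   ... trigger the bug in the DomU ...
#   for line in new_lines(before, read_hypervisor_log()):
#       print(line)
```

An empty result means the hypervisor logged nothing between the two captures.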
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425
2013-03-12 7:45 ` Jan Beulich
@ 2013-03-12 22:56 ` James Sinclair
2013-03-13 0:38 ` James Sinclair
2013-03-13 9:50 ` Ian Campbell
0 siblings, 2 replies; 11+ messages in thread
From: James Sinclair @ 2013-03-12 22:56 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
On 12/03/2013, at 6:45 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 12.03.13 at 04:56, James Sinclair <james.sinclair@linode.com> wrote:
>
> (re-adding xen-devel to Cc)
>
>> I've done further testing and I have confirmed that I can trigger the bug
>> pretty reliably on several Xen versions:
>>
>> * xen-3.4.4
>> * xen-4.1.4
>> * xen-4.1.5-pre
>>
>> Dom0 kernels range between 2.6.18 & 3.7.7-1. I've also tried quite a few DomU
>> kernels and I can trigger it in everything I've tried between 2.6.39.1 and
>> 3.7.10. In the end, I haven't managed to find a combination of Xen, Dom0 &
>> DomU where I cannot trigger it.
>
> Partly because you apparently never tried non-pvops DomU kernels.
> But that's not the point here.
>
>> As for the hypervisor log, I'm not seeing anything get logged at all. I
>> suspect that I'm not capturing them correctly. Using Xen 4.1.4 I added the
>> following to the command line:
>>
>> loglvl=all guest_loglvl=all sync_console console_to_ring earlyprintk=xen
>> debug loglevel=8
>
> Now, this mixture of hypervisor and kernel options already
> suggests that you don't look in the right place. Please put Xen
> options on the Xen command line, and kernel ones on the kernel
> line.
The documentation I found that mentioned using "earlyprintk=xen debug loglevel=8" did not make it clear those are kernel options. I've corrected that now:
Hypervisor options: loglvl=all guest_loglvl=all sync_console console_to_ring
Kernel options: earlyprintk=xen debug loglevel=8
>
>> I'm seeing extra logs from Xen during boot, but nothing when the bug is
>> triggered in the DomU. I'm assuming any extra logging will show up on the
>> console or in /var/log/xen/xend.log - is there somewhere else I should be
>> looking? Or something I'm not setting correctly?
>
> They show up on the serial console (if in use) or in the output of
> "xl dmesg" (or "xm dmesg" if xend is in use). Without extra
> precautions, they _won't_ show up anywhere under /var/log.
I'm not seeing anything extra logged on the serial console nor in "xl dmesg" when the bug hits. The only log entry I'm getting is a warning from XendDomainInfo that the domain has crashed, and that's only logged when the bug hits a critical process (usually init) that causes the domU kernel to panic.
I'm working on building a hypervisor with "debug=y" enabled to see if that gives me anything extra.
James
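For reference, the split of the mixed command line described above can be sketched as follows. This is only an illustration: the XEN_OPTIONS set contains just the four hypervisor option names mentioned in this thread, not a complete list, and split_options is a made-up helper.

```python
# Hypervisor option names taken from this thread only; not a complete list.
XEN_OPTIONS = {"loglvl", "guest_loglvl", "sync_console", "console_to_ring"}


def split_options(mixed):
    """Split a mixed option string into (hypervisor line, kernel line)."""
    xen, kernel = [], []
    for token in mixed.split():
        # Compare on the option name, ignoring any "=value" part.
        name = token.split("=", 1)[0]
        (xen if name in XEN_OPTIONS else kernel).append(token)
    return " ".join(xen), " ".join(kernel)


mixed = ("loglvl=all guest_loglvl=all sync_console console_to_ring "
         "earlyprintk=xen debug loglevel=8")
xen_line, kernel_line = split_options(mixed)
print("Hypervisor options:", xen_line)
print("Kernel options:", kernel_line)
```

The first string belongs on the Xen line of the boot loader entry and the second on the Dom0 kernel line.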
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425
2013-03-12 22:56 ` James Sinclair
@ 2013-03-13 0:38 ` James Sinclair
2013-03-13 9:50 ` Ian Campbell
1 sibling, 0 replies; 11+ messages in thread
From: James Sinclair @ 2013-03-13 0:38 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
On 13/03/2013, at 9:56 AM, James Sinclair <james.sinclair@linode.com> wrote:
> On 12/03/2013, at 6:45 PM, Jan Beulich <JBeulich@suse.com> wrote:
>
>>>>> On 12.03.13 at 04:56, James Sinclair <james.sinclair@linode.com> wrote:
>>
>> (re-adding xen-devel to Cc)
>>
>>> I've done further testing and I have confirmed that I can trigger the bug
>>> pretty reliably on several Xen versions:
>>>
>>> * xen-3.4.4
>>> * xen-4.1.4
>>> * xen-4.1.5-pre
>>>
>>> Dom0 kernels range between 2.6.18 & 3.7.7-1. I've also tried quite a few DomU
>>> kernels and I can trigger it in everything I've tried between 2.6.39.1 and
>>> 3.7.10. In the end, I haven't managed to find a combination of Xen, Dom0 &
>>> DomU where I cannot trigger it.
>>
>> Partly because you apparently never tried non-pvops DomU kernels.
>> But that's not the point here.
>>
>>> As for the hypervisor log, I'm not seeing anything get logged at all. I
>>> suspect that I'm not capturing them correctly. Using Xen 4.1.4 I added the
>>> following to the command line:
>>>
>>> loglvl=all guest_loglvl=all sync_console console_to_ring earlyprintk=xen
>>> debug loglevel=8
>>
>> Now, this mixture of hypervisor and kernel options already
>> suggests that you don't look in the right place. Please put Xen
>> options on the Xen command line, and kernel ones on the kernel
>> line.
>
> The documentation I found that mentioned using "earlyprintk=xen debug loglevel=8" did not make it clear those are kernel options. I've corrected that now:
>
> Hypervisor options: loglvl=all guest_loglvl=all sync_console console_to_ring
> Kernel options: earlyprintk=xen debug loglevel=8
>
>>
>>> I'm seeing extra logs from Xen during boot, but nothing when the bug is
>>> triggered in the DomU. I'm assuming any extra logging will show up on the
>>> console or in /var/log/xen/xend.log - is there somewhere else I should be
>>> looking? Or something I'm not setting correctly?
>>
>> They show up on the serial console (if in use) or in the output of
>> "xl dmesg" (or "xm dmesg" if xend is in use). Without extra
>> precautions, they _won't_ show up anywhere under /var/log.
>
> I'm not seeing anything extra logged on the serial console nor in "xl dmesg" when the bug hits. The only log entry I'm getting is a warning from XendDomainInfo that the domain has crashed, and that's only logged when the bug hits a critical process (usually init) that causes the domU kernel to panic.
>
> I'm working on building a hypervisor with "debug=y" enabled to see if that gives me anything extra.
I've managed to set up a debug build of the 4.1.4 hypervisor and, with logging turned all the way up (as far as I can tell), nothing gets logged when the domU hits the kernel bug.
I'm not sure what to try next.
James
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425
2013-03-12 22:56 ` James Sinclair
2013-03-13 0:38 ` James Sinclair
@ 2013-03-13 9:50 ` Ian Campbell
2013-03-18 0:27 ` James Sinclair
1 sibling, 1 reply; 11+ messages in thread
From: Ian Campbell @ 2013-03-13 9:50 UTC (permalink / raw)
To: James Sinclair; +Cc: Jan Beulich, xen-devel
On Tue, 2013-03-12 at 22:56 +0000, James Sinclair wrote:
> The documentation I found that mentioned using "earlyprintk=xen debug
> loglevel=8" did not make it clear those are kernel options.
Which document was this? Perhaps we can update it.
> I've corrected that now:
>
> Hypervisor options: loglvl=all guest_loglvl=all sync_console console_to_ring
> Kernel options: earlyprintk=xen debug loglevel=8
Ian.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425
2013-03-13 9:50 ` Ian Campbell
@ 2013-03-18 0:27 ` James Sinclair
0 siblings, 0 replies; 11+ messages in thread
From: James Sinclair @ 2013-03-18 0:27 UTC (permalink / raw)
To: Ian Campbell; +Cc: Jan Beulich, xen-devel
On 13/03/2013, at 8:50 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Tue, 2013-03-12 at 22:56 +0000, James Sinclair wrote:
>> The documentation I found that mentioned using "earlyprintk=xen debug
>> loglevel=8" did not make it clear those are kernel options.
>
> Which document was this? Perhaps we can update it.
I thought it was in the wiki but after going through my browser's history of wiki pages I didn't find it.
Anyway, I've corrected my configuration and triggered the bug - still nothing being logged when it happens. What's the next step? Is there something else I should try to diagnose the issue?
James
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425
@ 2015-09-15 16:03 Thomas DEBESSE
2015-09-15 16:09 ` Andrew Cooper
0 siblings, 1 reply; 11+ messages in thread
From: Thomas DEBESSE @ 2015-09-15 16:03 UTC (permalink / raw)
To: xen-devel
Hi, I'm replying to this thread from 2013:
http://lists.xen.org/archives/html/xen-devel/2013-03/threads.html#00649
Like James Sinclair, all I could find is a closed Debian bug from Dec 2010
with no resolution:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=607709
Do you have some news about this bug?
I got it too with a 3.16 kernel on Debian:
Sep 15 16:57:14 server kernel: [ 19.844447] ------------[ cut here
]------------
Sep 15 16:57:14 server kernel: [ 19.844468] kernel BUG at
/build/linux-sPqfgd/linux-3.16.7-ckt11/arch/x86/xen/enlighten.c:494!
Sep 15 16:57:14 server kernel: [ 19.844479] invalid opcode: 0000 [#1] SMP
Sep 15 16:57:14 server kernel: [ 19.844487] Modules linked in: fuse nfsd
auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc evdev coretemp
pcspkr ext4 crc16 mbcache jbd2 dm_mod md_mod xen_netfront xen_blkfront
Sep 15 16:57:14 server kernel: [ 19.844519] CPU: 1 PID: 930 Comm: cmd Not
tainted 3.16.0-4-686-pae #1 Debian 3.16.7-ckt11-1
Sep 15 16:57:14 server kernel: [ 19.844529] task: e8ba4560 ti: c29f8000
task.ti: c29f8000
Sep 15 16:57:14 server kernel: [ 19.844535] EIP: 0061:[<c100373d>]
EFLAGS: 00010282 CPU: 1
Sep 15 16:57:14 server kernel: [ 19.844545] EIP is at
set_aliased_prot+0x10d/0x120
Sep 15 16:57:14 server kernel: [ 19.844551] EAX: ffffffea EBX: ede01000
ECX: cc5ae063 EDX: 80000000
Sep 15 16:57:14 server kernel: [ 19.844558] ESI: 00000000 EDI: 80000001
EBP: c29f9dbc ESP: c29f9d98
Sep 15 16:57:14 server kernel: [ 19.844564] DS: 007b ES: 007b FS: 00d8
GS: 00e0 SS: 0069
Sep 15 16:57:14 server kernel: [ 19.844570] CR0: 8005003b CR2: 00111484
CR3: 029ab000 CR4: 00002660
Sep 15 16:57:14 server kernel: [ 19.844578] Stack:
Sep 15 16:57:14 server kernel: [ 19.844582] 80000000 cc5ae063 001f3c8a
ede01000 ecac2140 00000001 ede02000 00000400
Sep 15 16:57:14 server kernel: [ 19.844594] 00000000 c29f9dd0 c1003781
c2831ac0 e8892010 c2831ac0 c29f9ddc c10122be
Sep 15 16:57:14 server kernel: [ 19.844606] 00000000 c29f9e00 c1053fa6
c29f9df0 c1002e90 e8ba4560 ecdcf8c0 00000000
Sep 15 16:57:14 server kernel: [ 19.844618] Call Trace:
Sep 15 16:57:14 server kernel: [ 19.844628] [<c1003781>] ?
xen_free_ldt+0x31/0x40
Sep 15 16:57:14 server kernel: [ 19.844640] [<c10122be>] ?
destroy_context+0x2e/0x90
Sep 15 16:57:14 server kernel: [ 19.844651] [<c1053fa6>] ?
__mmdrop+0x26/0x90
Sep 15 16:57:14 server kernel: [ 19.844659] [<c1002e90>] ?
xen_end_context_switch+0x10/0x20
Sep 15 16:57:14 server kernel: [ 19.844668] [<c107c59f>] ?
finish_task_switch+0x9f/0xd0
Sep 15 16:57:14 server kernel: [ 19.844677] [<c1478e60>] ?
__schedule+0x230/0x6e0
Sep 15 16:57:14 server kernel: [ 19.844685] [<c116e381>] ?
__sb_end_write+0x31/0x70
Sep 15 16:57:14 server kernel: [ 19.844694] [<c117361c>] ?
pipe_write+0x34c/0x3d0
Sep 15 16:57:14 server kernel: [ 19.844703] [<c147be59>] ?
_raw_spin_lock_irqsave+0x19/0x40
Sep 15 16:57:14 server kernel: [ 19.844713] [<c147baa3>] ?
_raw_spin_unlock_irqrestore+0x13/0x20
Sep 15 16:57:14 server kernel: [ 19.844723] [<c1090398>] ?
prepare_to_wait+0x48/0x70
Sep 15 16:57:14 server kernel: [ 19.844732] [<c117324d>] ?
pipe_wait+0x4d/0x80
Sep 15 16:57:14 server kernel: [ 19.844740] [<c1090680>] ?
prepare_to_wait_event+0xd0/0xd0
Sep 15 16:57:14 server kernel: [ 19.844749] [<c11737f1>] ?
pipe_read+0x151/0x260
Sep 15 16:57:14 server kernel: [ 19.844758] [<c116bd96>] ?
new_sync_read+0x66/0xa0
Sep 15 16:57:14 server kernel: [ 19.844766] [<c116bd30>] ?
default_llseek+0x170/0x170
Sep 15 16:57:14 server kernel: [ 19.844774] [<c116c620>] ?
vfs_read+0x80/0x150
Sep 15 16:57:14 server kernel: [ 19.844780] [<c116cdc6>] ?
SyS_read+0x46/0x90
Sep 15 16:57:14 server kernel: [ 19.844789] [<c147c2df>] ?
sysenter_do_call+0x12/0x12
Sep 15 16:57:14 server kernel: [ 19.844794] Code: 2e 83 c4 18 5b 5e 5f 5d
c3 90 8d 74 26 00 83 3d d4 92 76 c1 02 75 c8 8d b4 26 00 00 00 00 e8 2b 5e
13 00 83 c4 18 5b 5e 5f 5d c3 <0f> 0b 0f 0b 0f 0b 8d b6 00 00 00 00 8d bc
27 00 00 00 00 55 89
Sep 15 16:57:14 server kernel: [ 19.844868] EIP: [<c100373d>]
set_aliased_prot+0x10d/0x120 SS:ESP 0069:c29f9d98
Sep 15 16:57:14 server kernel: [ 19.844882] ---[ end trace
5b8a5a9c639bac8c ]---
The message above is from the DomU kernel. In fact, when I get this message,
I'm lucky: it means the error was handled without crashing. Most of the
time the VM just reboots itself before logging or printing any message at
all.
On the Dom0 side, `xl dmesg` shows nothing.
I downgraded my DomU kernel to 3.2 and it seems to work for now but it's
not a fix.
I was running xen 4.4.1-9 and linux 3.16.7-ckt11-1 (686-pae) from Debian.
I don't have any more information.
--
Thomas DEBESSE
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425
2015-09-15 16:03 DomU: kernel BUG at arch/x86/xen/enlighten.c:425 Thomas DEBESSE
@ 2015-09-15 16:09 ` Andrew Cooper
2015-09-15 16:28 ` Thomas DEBESSE
0 siblings, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2015-09-15 16:09 UTC (permalink / raw)
To: Thomas DEBESSE, xen-devel
On 15/09/15 17:03, Thomas DEBESSE wrote:
> Hi, I'm replying to this thread from 2013:
> http://lists.xen.org/archives/html/xen-devel/2013-03/threads.html#00649
>
> Like James Sinclair, all I could find is a closed Debian bug from Dec
> 2010 with no resolution:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=607709
>
> Do you have some news about this bug?
>
> I got it too with a 3.16 kernel on Debian:
>
> Sep 15 16:57:14 server kernel: [ 19.844447] ------------[ cut here
> ]------------
> Sep 15 16:57:14 server kernel: [ 19.844468] kernel BUG at
> /build/linux-sPqfgd/linux-3.16.7-ckt11/arch/x86/xen/enlighten.c:494!
> Sep 15 16:57:14 server kernel: [ 19.844479] invalid opcode: 0000
> [#1] SMP
> Sep 15 16:57:14 server kernel: [ 19.844487] Modules linked in: fuse
> nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc evdev
> coretemp pcspkr ext4 crc16 mbcache jbd2 dm_mod md_mod xen_netfront
> xen_blkfront
> Sep 15 16:57:14 server kernel: [ 19.844519] CPU: 1 PID: 930 Comm:
> cmd Not tainted 3.16.0-4-686-pae #1 Debian 3.16.7-ckt11-1
> Sep 15 16:57:14 server kernel: [ 19.844529] task: e8ba4560 ti:
> c29f8000 task.ti: c29f8000
> Sep 15 16:57:14 server kernel: [ 19.844535] EIP: 0061:[<c100373d>]
> EFLAGS: 00010282 CPU: 1
> Sep 15 16:57:14 server kernel: [ 19.844545] EIP is at
> set_aliased_prot+0x10d/0x120
> Sep 15 16:57:14 server kernel: [ 19.844551] EAX: ffffffea EBX:
> ede01000 ECX: cc5ae063 EDX: 80000000
> Sep 15 16:57:14 server kernel: [ 19.844558] ESI: 00000000 EDI:
> 80000001 EBP: c29f9dbc ESP: c29f9d98
> Sep 15 16:57:14 server kernel: [ 19.844564] DS: 007b ES: 007b FS:
> 00d8 GS: 00e0 SS: 0069
> Sep 15 16:57:14 server kernel: [ 19.844570] CR0: 8005003b CR2:
> 00111484 CR3: 029ab000 CR4: 00002660
> Sep 15 16:57:14 server kernel: [ 19.844578] Stack:
> Sep 15 16:57:14 server kernel: [ 19.844582] 80000000 cc5ae063
> 001f3c8a ede01000 ecac2140 00000001 ede02000 00000400
> Sep 15 16:57:14 server kernel: [ 19.844594] 00000000 c29f9dd0
> c1003781 c2831ac0 e8892010 c2831ac0 c29f9ddc c10122be
> Sep 15 16:57:14 server kernel: [ 19.844606] 00000000 c29f9e00
> c1053fa6 c29f9df0 c1002e90 e8ba4560 ecdcf8c0 00000000
> Sep 15 16:57:14 server kernel: [ 19.844618] Call Trace:
> Sep 15 16:57:14 server kernel: [ 19.844628] [<c1003781>] ?
> xen_free_ldt+0x31/0x40
> Sep 15 16:57:14 server kernel: [ 19.844640] [<c10122be>] ?
> destroy_context+0x2e/0x90
> Sep 15 16:57:14 server kernel: [ 19.844651] [<c1053fa6>] ?
> __mmdrop+0x26/0x90
> Sep 15 16:57:14 server kernel: [ 19.844659] [<c1002e90>] ?
> xen_end_context_switch+0x10/0x20
> Sep 15 16:57:14 server kernel: [ 19.844668] [<c107c59f>] ?
> finish_task_switch+0x9f/0xd0
> Sep 15 16:57:14 server kernel: [ 19.844677] [<c1478e60>] ?
> __schedule+0x230/0x6e0
> Sep 15 16:57:14 server kernel: [ 19.844685] [<c116e381>] ?
> __sb_end_write+0x31/0x70
> Sep 15 16:57:14 server kernel: [ 19.844694] [<c117361c>] ?
> pipe_write+0x34c/0x3d0
> Sep 15 16:57:14 server kernel: [ 19.844703] [<c147be59>] ?
> _raw_spin_lock_irqsave+0x19/0x40
> Sep 15 16:57:14 server kernel: [ 19.844713] [<c147baa3>] ?
> _raw_spin_unlock_irqrestore+0x13/0x20
> Sep 15 16:57:14 server kernel: [ 19.844723] [<c1090398>] ?
> prepare_to_wait+0x48/0x70
> Sep 15 16:57:14 server kernel: [ 19.844732] [<c117324d>] ?
> pipe_wait+0x4d/0x80
> Sep 15 16:57:14 server kernel: [ 19.844740] [<c1090680>] ?
> prepare_to_wait_event+0xd0/0xd0
> Sep 15 16:57:14 server kernel: [ 19.844749] [<c11737f1>] ?
> pipe_read+0x151/0x260
> Sep 15 16:57:14 server kernel: [ 19.844758] [<c116bd96>] ?
> new_sync_read+0x66/0xa0
> Sep 15 16:57:14 server kernel: [ 19.844766] [<c116bd30>] ?
> default_llseek+0x170/0x170
> Sep 15 16:57:14 server kernel: [ 19.844774] [<c116c620>] ?
> vfs_read+0x80/0x150
> Sep 15 16:57:14 server kernel: [ 19.844780] [<c116cdc6>] ?
> SyS_read+0x46/0x90
> Sep 15 16:57:14 server kernel: [ 19.844789] [<c147c2df>] ?
> sysenter_do_call+0x12/0x12
> Sep 15 16:57:14 server kernel: [ 19.844794] Code: 2e 83 c4 18 5b 5e
> 5f 5d c3 90 8d 74 26 00 83 3d d4 92 76 c1 02 75 c8 8d b4 26 00 00 00
> 00 e8 2b 5e 13 00 83 c4 18 5b 5e 5f 5d c3 <0f> 0b 0f 0b 0f 0b 8d b6 00
> 00 00 00 8d bc 27 00 00 00 00 55 89
> Sep 15 16:57:14 server kernel: [ 19.844868] EIP: [<c100373d>]
> set_aliased_prot+0x10d/0x120 SS:ESP 0069:c29f9d98
> Sep 15 16:57:14 server kernel: [ 19.844882] ---[ end trace
> 5b8a5a9c639bac8c ]---
>
> The message above is from DomU kernel. In fact, when I get this
> message, I'm lucky: it means the error was handled without crashing.
> Most of the case the vm just reboot itself before logging or printing
> any message at all.
> On Dom0 side, `xl dmesg` shows nothing.
>
> I downgraded my DomU kernel to 3.2 and it seems to work for now but
> it's not a fix.
>
> I was running xen 4.4.1-9 and linux 3.16.7-ckt11-1 (686-pae) from Debian.
>
> I don't have more information, at all.
The instantiation of HYPERVISOR_update_va_mapping() in
set_aliased_prot() has always been buggy in pvops kernels.
This bug should be fixed by c/s 0b0e55 "x86/xen: Probe target addresses
in set_aliased_prot before the hypercall" which is in the process of
being backported to #stable as a prerequisite for the recent LDT CVE fixes.
~Andrew
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425
2015-09-15 16:09 ` Andrew Cooper
@ 2015-09-15 16:28 ` Thomas DEBESSE
2015-09-15 16:35 ` Andrew Cooper
0 siblings, 1 reply; 11+ messages in thread
From: Thomas DEBESSE @ 2015-09-15 16:28 UTC (permalink / raw)
To: Andrew Cooper; +Cc: xen-devel
2015-09-15 18:09 GMT+02:00 Andrew Cooper <andrew.cooper3@citrix.com>:
> The instantiation of HYPERVISOR_update_va_mapping() in set_aliased_prot()
> has always been buggy in pvops kernels.
>
> This bug should be fixed by c/s 0b0e55 "x86/xen: Probe target addresses in
> set_aliased_prot before the hypercall" which is in the process of being
> backported to #stable as a prerequisite for the recent LDT CVE fixes.
>
> ~Andrew
>
OK thanks, I will stick with the 3.2 kernel for the moment, it seems to
work with it.
I don't know why, but only one vm seems to be affected (and they all are
built the same way).
--
Thomas DEBESSE
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425
2015-09-15 16:28 ` Thomas DEBESSE
@ 2015-09-15 16:35 ` Andrew Cooper
0 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2015-09-15 16:35 UTC (permalink / raw)
To: Thomas DEBESSE; +Cc: xen-devel
On 15/09/15 17:28, Thomas DEBESSE wrote:
> 2015-09-15 18:09 GMT+02:00 Andrew Cooper <andrew.cooper3@citrix.com
> <mailto:andrew.cooper3@citrix.com>>:
>
> The instantiation of HYPERVISOR_update_va_mapping() in
> set_aliased_prot() has always been buggy in pvops kernels.
>
> This bug should be fixed by c/s 0b0e55 "x86/xen: Probe target
> addresses in set_aliased_prot before the hypercall" which is in
> the process of being backported to #stable as a prerequisite for
> the recent LDT CVE fixes.
>
> ~Andrew
>
>
> OK thanks, I will stick with the 3.2 kernel for the moment, it seems
> to work with it.
> I don't know why, but only one vm seems to be affected (and they all
> are built the same way).
It will depend on userspace running in the VM as to whether the bug gets
tickled.
~Andrew
^ permalink raw reply [flat|nested] 11+ messages in thread