* DomU Oopsing on xen-3.0-testing changeset 8259
@ 2005-12-23 19:03 Charles Duffy
2006-06-19 13:12 ` Oops in xen 3.0.2 dequeue_signal [was: Re: DomU Oopsing on xen-3.0-testing changeset 8259] Charles Duffy
0 siblings, 1 reply; 6+ messages in thread
From: Charles Duffy @ 2005-12-23 19:03 UTC (permalink / raw)
To: xen-devel
[-- Attachment #1: Type: text/plain, Size: 508 bytes --]
One of my DomUs is sporadically oopsing, roughly once per day. This was
first observed on a pre-3.0-release changeset; after upgrading to
changeset 8259 on the xen-3.0-testing branch (after the release), it
still occurs.
This effectively kills the instance when it occurs -- worse, the
instance in question *stays* down even though panic=5 is specified as an
extra parameter to be passed to the DomU kernel.
The text of the oops is attached, as are my kernel configs (which are a
touch nonstandard).
[-- Attachment #2: xen-oops --]
[-- Type: text/plain, Size: 1530 bytes --]
Unable to handle kernel NULL pointer dereference at 0000000000000a33 RIP:
<ffffffff8013e0a3>{__dequeue_signal+275}
PGD 11146067 PUD 185ee067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: ipv6 dm_mod
Pid: 2642, comm: java Tainted: GF 2.6.12.6-xenU
RIP: e030:[<ffffffff8013e0a3>] <ffffffff8013e0a3>{__dequeue_signal+275}
RSP: e02b:ffff88002aa1be58 EFLAGS: 00010046
RAX: 0000000000000a33 RBX: ffff880013d3fa88 RCX: 0000000000000009
RDX: 0000000000000200 RSI: 0000000000000a33 RDI: ffff880013d3fa88
RBP: ffff88003e680a30 R08: ffff88002aa1a000 R09: 0000000000000000
R10: 0000000000000060 R11: 0000000000000612 R12: 000000000000000a
R13: ffff88002aa1bec8 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000048c34960(0063) GS:ffffffff803e9d80(0000) knlGS:ffffffff803e9d80
CS: e033 DS: 0000 ES: 0000
Process java (pid: 2642, threadinfo ffff88002aa1a000, task ffff880013d3f4b0)
Stack: 0000000000000000 ffff88002aa1bec0 ffff88002aa1bec8 ffff880013d3f4b0
0000000000000000 ffffffff8013e1dd ffff88002aa1bed8 0000000000000000
0000000000000000 7fffffffffffffff
Call Trace:<ffffffff8013e1dd>{dequeue_signal+45} <ffffffff80140a1d>{sys_rt_sigtimedwait+589}
<ffffffff80140d93>{sys_tgkill+291} <ffffffff801403cd>{sigprocmask+253}
<ffffffff801404d3>{sys_rt_sigprocmask+211} <ffffffff80112406>{system_call+134}
<ffffffff80112380>{system_call+0}
Code: 48 8b 00 0f 18 08 48 39 de 75 e2 48 85 ed 0f 84 84 00 00 00
RIP <ffffffff8013e0a3>{__dequeue_signal+275} RSP <ffff88002aa1be58>
CR2: 0000000000000a33
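A quick way to read a dump like the one above is to check which general-purpose registers hold the faulting address reported in CR2. The sketch below is only an illustration: the helper names `parse_registers` and `registers_matching_fault` are hypothetical, and the register lines are copied from this oops.

```python
import re

# Register lines copied from the oops above.
OOPS = """\
RAX: 0000000000000a33 RBX: ffff880013d3fa88 RCX: 0000000000000009
RDX: 0000000000000200 RSI: 0000000000000a33 RDI: ffff880013d3fa88
RBP: ffff88003e680a30 R08: ffff88002aa1a000 R09: 0000000000000000
R10: 0000000000000060 R11: 0000000000000612 R12: 000000000000000a
R13: ffff88002aa1bec8 R14: 0000000000000000 R15: 0000000000000000
CR2: 0000000000000a33
"""

def parse_registers(text):
    """Return {name: value} for every 'NAME: <16 hex digits>' pair in text."""
    pairs = re.findall(r"\b([A-Z0-9]{2,3}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}

def registers_matching_fault(text):
    """Return (fault address from CR2, sorted names of registers equal to it)."""
    regs = parse_registers(text)
    fault = regs.pop("CR2")
    return fault, sorted(name for name, value in regs.items() if value == fault)

fault_addr, suspects = registers_matching_fault(OOPS)
print(hex(fault_addr), suspects)  # 0xa33 ['RAX', 'RSI']
```

Here RAX and RSI both equal the faulting address. That fits the first Code bytes, `48 8b 00`, which the ksymoops output later in this thread decodes as `mov (%rax),%rax`.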
[-- Attachment #3: config-2.6.12.6-xen0.bz2 --]
[-- Type: application/octet-stream, Size: 7761 bytes --]
[-- Attachment #4: config-2.6.12.6-xenU.bz2 --]
[-- Type: application/octet-stream, Size: 4522 bytes --]
[-- Attachment #5: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 6+ messages in thread
* Oops in xen 3.0.2 dequeue_signal [was: Re: DomU Oopsing on xen-3.0-testing changeset 8259]
2005-12-23 19:03 DomU Oopsing on xen-3.0-testing changeset 8259 Charles Duffy
@ 2006-06-19 13:12 ` Charles Duffy
2006-06-19 13:59 ` Keir Fraser
2006-06-19 14:00 ` Keir Fraser
0 siblings, 2 replies; 6+ messages in thread
From: Charles Duffy @ 2006-06-19 13:12 UTC (permalink / raw)
To: xen-devel

I'm seeing the same behavior I previously reported against
xen-3.0-testing changeset 8259, albeit much more sporadically, on Xen
3.0.2 (with a 2.6.16.16 kernel built via the Gentoo Xen packages). I'd
use stock XenSource binaries, but last I checked they don't have
support for some of my hardware (ie. the 3w9xxx driver).

Hints on anything I can do to provide more detailed information (in
the hopes of actually getting this fixed) would be welcome.

ksymoops output looks like the following:

RIP: e030:[<ffffffff8013b1e3>] <ffffffff8013b1e3>{__dequeue_signal+259}
Using defaults from ksymoops -t elf64-x86-64 -a i386:x86-64
RSP: e02b:ffff88003144fe38 EFLAGS: 00010446
RAX: 0000000000000000 RBX: ffff88000b3e06d0 RCX: 0000000000000009
RDX: 0000000000000200 RSI: ffff88003144feb0 RDI: 0000000000000000
RBP: ffff88003144fe68 R08: ffff88003144e000 R09: 0000000000000000
R10: 0000000000000060 R11: 00000000fffffffa R12: ffff88001b05c950
R13: 000000000000000a R14: ffff88003144feb8 R15: 000000000000000a
FS: 00002b464b3890a0(0063) GS:ffffffff80535000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Stack: 000000003144fe88 0000000000000000 ffff88003144feb0 ffff88003144feb8
ffff88000b3e00c0 0000000000000000 ffff88003144fe98 ffffffff8013b2f0
0000000000000000 0000000000000000
Call Trace: <ffffffff8013b2f0>{dequeue_signal+48} <ffffffff8013dcf4>{sys_rt_sigtimedwait+596}
<ffffffff8013e02a>{do_tkill+250} <ffffffff8013ab12>{recalc_sigpending+18}
<ffffffff8013d621>{sigprocmask+225} <ffffffff8013d79c>{sys_rt_sigprocmask+268}
<ffffffff8010b27e>{system_call+134} <ffffffff8010b1f8>{system_call+0}
Code: 48 8b 00 0f 18 08 48 39 df 75 e4 4d 85 e4 74 64 49 8b 54 24
>>RIP; ffffffff8013b1e3 <__dequeue_signal+103/1e0> <=====
>>RBX; ffff88000b3e06d0 <__start___xen_guest+ffff88000b3d42ea/ffffffff800f3c1a>
>>RSI; ffff88003144feb0 <__start___xen_guest+ffff880031443aca/ffffffff800f3c1a>
>>RBP; ffff88003144fe68 <__start___xen_guest+ffff880031443a82/ffffffff800f3c1a>
>>R08; ffff88003144e000 <__start___xen_guest+ffff880031441c1a/ffffffff800f3c1a>
>>R11; 00000000fffffffa <__start___xen_guest+ffff3c14/ffffffff800f3c1a>
>>R12; ffff88001b05c950 <__start___xen_guest+ffff88001b05056a/ffffffff800f3c1a>
>>R14; ffff88003144feb8 <__start___xen_guest+ffff880031443ad2/ffffffff800f3c1a>
Trace; ffffffff8013b2f0 <dequeue_signal+30/e0>
Trace; ffffffff8013e02a <do_tkill+fa/150>
Trace; ffffffff8013d621 <sigprocmask+e1/150>
Trace; ffffffff8010b27e <system_call+86/8b>
Code; ffffffff8013b1e3 <__dequeue_signal+103/1e0>
0000000000000000 <_RIP>:
Code; ffffffff8013b1e3 <__dequeue_signal+103/1e0> <=====
0: 48 8b 00 mov (%rax),%rax <=====
Code; ffffffff8013b1e6 <__dequeue_signal+106/1e0>
3: 0f 18 08 prefetcht0 (%rax)
Code; ffffffff8013b1e9 <__dequeue_signal+109/1e0>
6: 48 39 df cmp %rbx,%rdi
Code; ffffffff8013b1ec <__dequeue_signal+10c/1e0>
9: 75 e4 jne ffffffffffffffef <_RIP+0xffffffffffffffef>
Code; ffffffff8013b1ee <__dequeue_signal+10e/1e0>
b: 4d 85 e4 test %r12,%r12
Code; ffffffff8013b1f1 <__dequeue_signal+111/1e0>
e: 74 64 je 74 <_RIP+0x74>
Code; ffffffff8013b1f3 <__dequeue_signal+113/1e0>
10: 49 8b 54 24 00 mov 0x0(%r12),%rdx
CR2: 0000000000000000
^ permalink raw reply [flat|nested] 6+ messages in thread
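The raw oops prints symbol offsets in decimal (`{__dequeue_signal+259}`), while ksymoops prints them in hex together with the function length (`<__dequeue_signal+103/1e0>`). The sketch below converts between the two and derives the function's address range from the values in this message. The helper names `oops_symbol_to_hex` and `function_bounds` are hypothetical, chosen only for illustration.

```python
import re

def oops_symbol_to_hex(entry):
    """Convert '{symbol+DEC}' from a raw oops into ksymoops-style 'symbol+HEX'."""
    m = re.fullmatch(r"\{?(\w+)\+(\d+)\}?", entry.strip())
    if m is None:
        raise ValueError(f"unrecognised symbol entry: {entry!r}")
    return f"{m.group(1)}+{int(m.group(2)):x}"

def function_bounds(rip, offset, length):
    """Return (start, end) addresses of the function that contains rip."""
    start = rip - offset
    return start, start + length

# Values taken from the oops above: RIP, decimal offset, and hex length 0x1e0.
start, end = function_bounds(0xffffffff8013b1e3, 259, 0x1e0)

for raw in ("{__dequeue_signal+259}", "{dequeue_signal+48}", "{do_tkill+250}"):
    print(raw, "->", oops_symbol_to_hex(raw))
print(hex(start), hex(end))
```

The computed end address, 0xffffffff8013b2c0, is also where `dequeue_signal` starts according to the trace entry (0xffffffff8013b2f0 minus 0x30), so the numbers agree with each other. A range like this could be passed to `objdump -d --start-address=... --stop-address=...` on a vmlinux built with debug symbols, assuming one is available for the crashing kernel.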
* Re: Oops in xen 3.0.2 dequeue_signal [was: Re: DomU Oopsing on xen-3.0-testing changeset 8259]
2006-06-19 13:12 ` Oops in xen 3.0.2 dequeue_signal [was: Re: DomU Oopsing on xen-3.0-testing changeset 8259] Charles Duffy
@ 2006-06-19 13:59 ` Keir Fraser
2006-06-26 21:15 ` Charles Duffy
2006-07-18 20:20 ` Charles Duffy
2006-06-19 14:00 ` Keir Fraser
1 sibling, 2 replies; 6+ messages in thread
From: Keir Fraser @ 2006-06-19 13:59 UTC (permalink / raw)
To: Charles Duffy; +Cc: xen-devel

On 19 Jun 2006, at 14:12, Charles Duffy wrote:
> I'm seeing the same behavior I previously reported against
> xen-3.0-testing changeset 8259, albeit much more sporadically, on Xen
> 3.0.2 (with a 2.6.16.16 kernel built via the Gentoo Xen packages). I'd
> use stock XenSource binaries, but last I checked they don't have
> support for some of my hardware (ie. the 3w9xxx driver).
>
> Hints on anything I can do to provide more detailed information (in
> the hopes of actually getting this fixed) would be welcome.

Does it always crash in __dequeue_signal()? You might have to add some
tracing in there to find out exactly which part of the function it is
crashing in.

-- Keir
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Oops in xen 3.0.2 dequeue_signal [was: Re: DomU Oopsing on xen-3.0-testing changeset 8259]
2006-06-19 13:59 ` Keir Fraser
@ 2006-06-26 21:15 ` Charles Duffy
2006-07-18 20:20 ` Charles Duffy
1 sibling, 0 replies; 6+ messages in thread
From: Charles Duffy @ 2006-06-26 21:15 UTC (permalink / raw)
To: xen-devel

Keir Fraser wrote:
> Does it always crash in __dequeue_signal()?

Yes.

> You might have to add some tracing in there to find out exactly which
> part of the function it is crashing in.

Any way to do that with less performance impact than adding printks to
what I presume is a quite-commonly-called method? This system is used by
internal personnel; while its performance and uptime aren't critical, it
would be a good thing if they weren't impacted more than necessary.

> Also, try upgrading to 3.0-testing tip and see if you still get the problem.

Should it be adequate to upgrade this DomU only, or is there cause to
also upgrade the hypervisor and Dom0?

(I'm also building a kernel with debug symbols; I anticipate that this
will let me get an annotated disassembled copy of the source in
question, and thus figure out which line of source maps to the
instruction offset from the top of the method we're in... at least, in
theory).
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Oops in xen 3.0.2 dequeue_signal [was: Re: DomU Oopsing on xen-3.0-testing changeset 8259]
2006-06-19 13:59 ` Keir Fraser
2006-06-26 21:15 ` Charles Duffy
@ 2006-07-18 20:20 ` Charles Duffy
1 sibling, 0 replies; 6+ messages in thread
From: Charles Duffy @ 2006-07-18 20:20 UTC (permalink / raw)
To: xen-devel

Keir Fraser wrote:
> On 19 Jun 2006, at 14:12, Charles Duffy wrote:
>
>> I'm seeing the same behavior I previously reported against
>> xen-3.0-testing changeset 8259, albeit much more sporadically, on Xen
>> 3.0.2 (with a 2.6.16.16 kernel built via the Gentoo Xen packages). I'd
>> use stock XenSource binaries, but last I checked they don't have
>> support for some of my hardware (ie. the 3w9xxx driver).
>>
>> Hints on anything I can do to provide more detailed information (in
>> the hopes of actually getting this fixed) would be welcome.
>
> Does it always crash in __dequeue_signal()? You might have to add some
> tracing in there to find out exactly which part of the function it is
> crashing in.

Okay. I've rebuilt against a debug-enabled kernel, and (on getting
another panic) disassembled vmlinux to try to match the instruction it's
failing in to an individual source line.

The crash appears to be occurring in the second instruction generated
for kernel/signal.c:1976 (from Linux-2.6.16.16+Xen 3.0.2):

kernel/signal.c:1976
/* Run the handler. */
*return_ka = *ka;
ffffffff8013d152: 48 8b 75 d0 mov 0xffffffffffffffd0(%rbp),%rsi
ffffffff8013d156: 48 89 06 mov %rax,(%rsi) <<<=== HERE
ffffffff8013d159: 48 8b 42 f0 mov 0xfffffffffffffff0(%rdx),%rax
ffffffff8013d15d: 48 89 46 08 mov %rax,0x8(%rsi)
ffffffff8013d161: 48 8b 42 f8 mov 0xfffffffffffffff8(%rdx),%rax
ffffffff8013d165: 48 89 46 10 mov %rax,0x10(%rsi)
ffffffff8013d169: 48 8b 41 18 mov 0x18(%rcx),%rax
ffffffff8013d16d: 48 89 46 18 mov %rax,0x18(%rsi)

My x86 assembler is tremendously rusty, but it looks to me like
return_ka (which is passed in as a parameter to get_signal_to_deliver)
points somewhere it shouldn't. This parameter is passed in from
arch/x86_64/kernel/signal.c's do_signal(), where it's declared as a
function-local variable with its home on the stack. The code all looks
fine at a glance -- but since the top of the stack is at
ffff88013e87fe18, it doesn't make much sense for a variable living on
the stack defined just a few calls ago to be at 7c51186269a192da.

I'm guessing there's some kind of funky race condition going on -- but
beyond that vague assertion, I'm pretty much lost. Ideas, anyone?

ksymoops output follows:

CPU 0
Pid: 16571, comm: java Not tainted 2.6.16.18-xen #4
RIP: e030:[<ffffffff8013d156>] <ffffffff8013d156>{get_signal_to_deliver+662}
Using defaults from ksymoops -t elf64-x86-64 -a i386:x86-64
RSP: e02b:ffff88013e87fdc8 EFLAGS: 00010406
RAX: 00002ab89d447a1b RBX: 000000000000000a RCX: ffff88000061eb68
RDX: ffff88000061eb80 RSI: 7c51186269a192da RDI: ffff880144962750
RBP: ffff88013e87fe18 R08: 0000000000000000 R09: 0000000000003a66
R10: 0000000000000000 R11: ffffffff8010b27e R12: 000000000000000a
R13: ffff88013e87fe48 R14: 0000000000000008 R15: ffff88013e87fe48
FS: 00002b47b9c0f900(0063) GS:ffffffff80535000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Stack: ffff88013e87fe68 7acefa865eaca248 26ab946c27ba950b 46b67a71dd1c67e3
7c51186269a192da 1287d8a161cad8d5 60de4c306b46ae9f a035a0ac294ee773
6cd46345a1e152ae 228b761ceaf9a045
Call Trace: <ffffffff8010b27e>{system_call+134} <ffffffff8010ad69>{sys_rt_sigsuspend+249}
<ffffffff8010b681>{ptregscall_common+61}
Code: 48 89 06 48 8b 42 f0 48 89 46 08 48 8b 42 f8 48 89 46 10 48
>>RIP; ffffffff8013d156 <get_signal_to_deliver+296/6e0> <=====
>>RAX; 00002ab89d447a1b <__crc_ioctl_by_bdev+2ab79d5e3940/fffffffe8029bf25>
>>RCX; ffff88000061eb68 <__crc_ioctl_by_bdev+ffff87ff007baa8d/fffffffe8029bf25>
>>RDX; ffff88000061eb80 <__crc_ioctl_by_bdev+ffff87ff007baaa5/fffffffe8029bf25>
>>RSI; 7c51186269a192da <__crc_ioctl_by_bdev+7c51186169bb51ff/fffffffe8029bf25>
>>RDI; ffff880144962750 <__crc_ioctl_by_bdev+ffff880044afe675/fffffffe8029bf25>
>>RBP; ffff88013e87fe18 <__crc_ioctl_by_bdev+ffff88003ea1bd3d/fffffffe8029bf25>
>>R11; ffffffff8010b27e <system_call+86/8b>
>>R13; ffff88013e87fe48 <__crc_ioctl_by_bdev+ffff88003ea1bd6d/fffffffe8029bf25>
>>R15; ffff88013e87fe48 <__crc_ioctl_by_bdev+ffff88003ea1bd6d/fffffffe8029bf25>
Trace; ffffffff8010b27e <system_call+86/8b>
Trace; ffffffff8010b681 <ptregscall_common+3d/64>
Code; ffffffff8013d156 <get_signal_to_deliver+296/6e0>
0000000000000000 <_RIP>:
Code; ffffffff8013d156 <get_signal_to_deliver+296/6e0> <=====
0: 48 89 06 mov %rax,(%rsi) <=====
Code; ffffffff8013d159 <get_signal_to_deliver+299/6e0>
3: 48 8b 42 f0 mov 0xfffffffffffffff0(%rdx),%rax
Code; ffffffff8013d15d <get_signal_to_deliver+29d/6e0>
7: 48 89 46 08 mov %rax,0x8(%rsi)
Code; ffffffff8013d161 <get_signal_to_deliver+2a1/6e0>
b: 48 8b 42 f8 mov 0xfffffffffffffff8(%rdx),%rax
Code; ffffffff8013d165 <get_signal_to_deliver+2a5/6e0>
f: 48 89 46 10 mov %rax,0x10(%rsi)
Code; ffffffff8013d169 <get_signal_to_deliver+2a9/6e0>
13: 48 00 00 rex64 add %al,(%rax)
^ permalink raw reply [flat|nested] 6+ messages in thread
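The observation that RSI cannot be a stack address can be checked mechanically. The sketch below tests two things: whether an address is canonical on x86-64 (bits 63..47 all equal), and whether it lies within the same kernel stack as RSP. The 8 KiB `THREAD_SIZE` is an assumption; it fits the first oops in this thread, where RSP ffff88002aa1be58 rounds down to the reported threadinfo ffff88002aa1a000. The helper names are hypothetical.

```python
THREAD_SIZE = 8192  # assumption: 8 KiB kernel stacks, as the first oops suggests

def is_canonical(addr):
    """True if bits 63..47 of a 64-bit address are all zeros or all ones."""
    top = (addr & 0xFFFFFFFFFFFFFFFF) >> 47
    return top == 0 or top == 0x1FFFF

def on_same_kernel_stack(addr, rsp, thread_size=THREAD_SIZE):
    """True if addr lies in the thread_size-aligned block that contains rsp."""
    base = rsp & ~(thread_size - 1)
    return base <= addr < base + thread_size

# Register values copied from the ksymoops output above.
RSP = 0xffff88013e87fdc8
RBP = 0xffff88013e87fe18
RSI = 0x7c51186269a192da

for name, value in (("RBP", RBP), ("RSI", RSI)):
    print(name, is_canonical(value), on_same_kernel_stack(value, RSP))
```

Under these assumptions RBP passes both checks and RSI fails both. A non-canonical address used as a memory operand normally raises a general protection fault rather than a page fault. The same value, 7c51186269a192da, also appears in the stack dump among other random-looking words, which may point to the stack contents having been overwritten rather than the pointer being computed wrongly. That reading is a guess, not something the dump proves.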
* Re: Oops in xen 3.0.2 dequeue_signal [was: Re: DomU Oopsing on xen-3.0-testing changeset 8259]
2006-06-19 13:12 ` Oops in xen 3.0.2 dequeue_signal [was: Re: DomU Oopsing on xen-3.0-testing changeset 8259] Charles Duffy
2006-06-19 13:59 ` Keir Fraser
@ 2006-06-19 14:00 ` Keir Fraser
1 sibling, 0 replies; 6+ messages in thread
From: Keir Fraser @ 2006-06-19 14:00 UTC (permalink / raw)
To: Charles Duffy; +Cc: xen-devel

On 19 Jun 2006, at 14:12, Charles Duffy wrote:
> I'm seeing the same behavior I previously reported against
> xen-3.0-testing changeset 8259, albeit much more sporadically, on Xen
> 3.0.2 (with a 2.6.16.16 kernel built via the Gentoo Xen packages). I'd
> use stock XenSource binaries, but last I checked they don't have
> support for some of my hardware (ie. the 3w9xxx driver).
>
> Hints on anything I can do to provide more detailed information (in
> the hopes of actually getting this fixed) would be welcome.

Also, try upgrading to 3.0-testing tip and see if you still get the
problem.

-- Keir
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2006-07-18 20:20 UTC | newest]
Thread overview: 6+ messages
2005-12-23 19:03 DomU Oopsing on xen-3.0-testing changeset 8259 Charles Duffy
2006-06-19 13:12 ` Oops in xen 3.0.2 dequeue_signal [was: Re: DomU Oopsing on xen-3.0-testing changeset 8259] Charles Duffy
2006-06-19 13:59 ` Keir Fraser
2006-06-26 21:15 ` Charles Duffy
2006-07-18 20:20 ` Charles Duffy
2006-06-19 14:00 ` Keir Fraser