* Recurring OOPS in latest -unstable kernel
From: Kip Macy @ 2005-07-03 1:33 UTC
To: Ian Pratt, Keir Fraser, xen-devel
I hit the following oops a couple of times a day - it seems to
correspond to tearing down a vif:
Jul 3 01:30:13 ubuntu kernel: ------------[ cut here ]------------
Jul 3 01:30:13 ubuntu kernel: kernel BUG at include/linux/dcache.h:293!
Jul 3 01:30:13 ubuntu kernel: invalid operand: 0000 [#1]
Jul 3 01:30:13 ubuntu kernel: SMP
Jul 3 01:30:13 ubuntu kernel: Modules linked in: video thermal processor fan button battery ac mptscsih mptbase
Jul 3 01:30:13 ubuntu kernel: CPU: 0
Jul 3 01:30:13 ubuntu kernel: EIP: 0061:[<c0193100>] Not tainted VLI
Jul 3 01:30:13 ubuntu kernel: EFLAGS: 00010246 (2.6.11.12-xen0)
Jul 3 01:30:13 ubuntu kernel: EIP is at sysfs_remove_dir+0x100/0x110
Jul 3 01:30:13 ubuntu kernel: eax: 00000000 ebx: d557b3d4 ecx: dcfd4234 edx: d557b3d4
Jul 3 01:30:13 ubuntu kernel: esi: da0e1a20 edi: dd4d1424 ebp: 00000006 esp: c089de64
Jul 3 01:30:13 ubuntu kernel: ds: 007b es: 007b ss: 0069
Jul 3 01:30:13 ubuntu kernel: Process events/0 (pid: 10, threadinfo=c089c000 task=c075ca40)
Jul 3 01:30:13 ubuntu kernel: Stack: c0191f02 dcfd4dbc dc576000 d557b3d4 da0e1a20 dc576000 00000006 c0211070
Jul 3 01:30:13 ubuntu kernel: d557b3d4 00000002 d557b340 c03f087a d557b3d4 d557b340 da0e1a20 c03f1948
Jul 3 01:30:13 ubuntu kernel: da0e1a20 dc576000 c04c84a0 dc576000 00000006 dc576144 c012cd55 c04c84a0
Jul 3 01:30:13 ubuntu kernel: Call Trace:
Jul 3 01:30:13 ubuntu kernel: [<c0191f02>] sysfs_hash_and_remove+0x52/0xe9
Jul 3 01:30:13 ubuntu kernel: [<c0211070>] kobject_del+0x20/0x30
Jul 3 01:30:13 ubuntu kernel: [<c03f087a>] br_del_if+0x3a/0x5c
Jul 3 01:30:13 ubuntu kernel: [<c03f1948>] br_device_event+0xb8/0x100
Jul 3 01:30:13 ubuntu kernel: [<c012cd55>] notifier_call_chain+0x25/0x40
Jul 3 01:30:13 ubuntu kernel: [<c03a4a2f>] unregister_netdevice+0x14f/0x270
Jul 3 01:30:13 ubuntu kernel: [<c03a4b65>] unregister_netdev+0x15/0x1e
Jul 3 01:30:13 ubuntu kernel: [<c02be4f5>] netif_destroy+0x75/0x90
Jul 3 01:30:13 ubuntu kernel: [<c02bdeb4>] netif_ctrlif_rx+0x64/0xb0
Jul 3 01:30:13 ubuntu kernel: [<c0105550>] __ctrl_if_rxmsg_deferred+0x40/0x50
Jul 3 01:30:13 ubuntu kernel: [<c012fbc8>] worker_thread+0x1d8/0x260
Jul 3 01:30:14 ubuntu kernel: [<c0105510>] __ctrl_if_rxmsg_deferred+0x0/0x50
Jul 3 01:30:14 ubuntu kernel: [<c011a930>] default_wake_function+0x0/0x20
Jul 3 01:30:14 ubuntu kernel: [<c011a930>] default_wake_function+0x0/0x20
Jul 3 01:30:14 ubuntu kernel: [<c012f9f0>] worker_thread+0x0/0x260
Jul 3 01:30:14 ubuntu kernel: [<c01341ad>] kthread+0xbd/0x100
Jul 3 01:30:14 ubuntu kernel: [<c01340f0>] kthread+0x0/0x100
Jul 3 01:30:14 ubuntu kernel: [<c0107b15>] kernel_thread_helper+0x5/0x10
Jul 3 01:30:14 ubuntu kernel: Code: 89 44 24 08 8b 00 89 04 24 e8 0d 25 fb ff 8b 54 24 08 8b 42 04 89 04 24 e8 7e e0 07 00 8b 44 24 08 89 04 24 e8 f2 24 fb ff eb 92 <0f> 0b 25 01 53 65 42 c0 e9 13 ff ff ff 8d 76 00 83 ec 20 89 5c
Jul 3 01:31:57 ubuntu xenstored: xenstored corruption: connection id 0: err Bad address: Unknown error 14 (Bad address)
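
For anyone triaging a report like this, a small script can pull the call chain out of the log and sanity-check the trapping bytes. The sketch below is not a tool from the Xen or Linux trees. It works on a shortened excerpt of the log above, and it assumes the i386 BUG() layout of that era (a ud2 opcode, 0f 0b, followed by a 16-bit line number and a 32-bit pointer to the file name), which only holds for kernels built with verbose BUG reporting. The names OOPS, call_chain and decode_bug are made up for illustration.

```python
import re

# Shortened excerpt of the oops above (syslog prefix stripped, Code: line trimmed).
OOPS = """\
EIP is at sysfs_remove_dir+0x100/0x110
Call Trace:
 [<c0191f02>] sysfs_hash_and_remove+0x52/0xe9
 [<c0211070>] kobject_del+0x20/0x30
 [<c03f087a>] br_del_if+0x3a/0x5c
 [<c03f1948>] br_device_event+0xb8/0x100
 [<c012cd55>] notifier_call_chain+0x25/0x40
 [<c03a4a2f>] unregister_netdevice+0x14f/0x270
 [<c03a4b65>] unregister_netdev+0x15/0x1e
 [<c02be4f5>] netif_destroy+0x75/0x90
Code: eb 92 <0f> 0b 25 01 53 65 42 c0 e9 13 ff ff ff
"""

# One trace entry: [<address>] symbol+0xoffset/0xsize
FRAME = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def call_chain(text):
    """Return the symbol names from the trace entries, in printed order."""
    return [m.group(2) for m in FRAME.finditer(text)]


def decode_bug(text):
    """Decode the bytes at the trapping EIP (the <..> marker).

    Assumption: the trap is `ud2; .word line; .long file_ptr`.
    Returns (line, file_ptr), or None if the marked bytes are not ud2.
    """
    code = re.search(r"Code:\s*(.*)", text).group(1).split()
    idx = next(i for i, tok in enumerate(code) if tok.startswith("<"))
    raw = bytes(int(tok.strip("<>"), 16) for tok in code[idx:idx + 8])
    if raw[:2] != b"\x0f\x0b":
        return None
    line = int.from_bytes(raw[2:4], "little")
    file_ptr = int.from_bytes(raw[4:8], "little")
    return line, file_ptr


if __name__ == "__main__":
    print(" <- ".join(call_chain(OOPS)))
    print(decode_bug(OOPS))
```

Under that assumption the bytes 25 01 after the ud2 decode to line 293, which agrees with the "kernel BUG at include/linux/dcache.h:293!" line in the report.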
* Re: Recurring OOPS in latest -unstable kernel
From: Keir Fraser @ 2005-07-03 8:27 UTC
To: Kip Macy; +Cc: xen-devel, Ian Pratt
On 3 Jul 2005, at 02:33, Kip Macy wrote:
> I hit the following oops a couple of times a day - it seems to
> correspond to tearing down a vif:
Are you actually trying to tear down a vif when the crash occurs, or is
its refcnt falling to zero because of a bug?
We've had this bug report at least once before, but I couldn't find any
obvious problem from reading through the backtrace...
-- Keir
* Re: Recurring OOPS in latest -unstable kernel
From: Kip Macy @ 2005-07-03 16:21 UTC
To: Keir Fraser; +Cc: xen-devel, Ian Pratt
This happens periodically when a domU crashes, so I can't say for
sure. I've been more focused on debugging my domU :-)
-Kip
* RE: Recurring OOPS in latest -unstable kernel
From: Ian Pratt @ 2005-07-03 19:36 UTC
To: Keir Fraser, Kip Macy; +Cc: xen-devel, Ian Pratt
> -----Original Message-----
> > I hit the following oops a couple of times a day - it seems to
> > correspond to tearing down a vif:
>
> Are you actually trying to tear down a vif when the crash
> occurs, or is its refcnt falling to zero because of a bug?
>
> We've had this bug report at least once before, but I
> couldn't find any obvious problem from reading through the
> backtrace...
This sounds rather like the bug that's being seen with the ported SuSE
kernel. Appended is a summary of the info we have on it.
Ian
The problem looks obscure to me: a request seems to be routed to the
wrong netback (vifX.0) device, the refcount drops to 0, and then we
oops. (The normal oops path is the BUG() at line 101 of
netback/interface.c; I patched the kernel to get a backtrace at the
place where we schedule the work.)
The same code (in netback) works in 2.6.9rc2/2.6.11.x, so something
screws up the ringbuffers -- should we start reviewing the path down
from hypervisor_callback?
Something strange seems to happen there with ringbuffer assignment to
interfaces and I guess we need to review the upcall path.
Somewhere, we may clobber an argument, possibly involving CONFIG_REGPARM
...
I don't know the code well enough to see it without adding a lot of
instrumentation to the code.
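
To make the suspected failure mode concrete, here is a toy model of what a misrouted request would do to per-interface reference counts. It is only an illustration of the hypothesis above, not the netback code: the Vif class, its get/put methods and handle_request are all hypothetical names, and the real driver's locking and deferred teardown are left out.

```python
class Vif:
    """Toy stand-in for a netback interface (hypothetical, not the real structure)."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 1          # reference held by whoever created the interface
        self.torn_down = False

    def get(self):
        # Taking a reference on a dead interface is the kind of state a BUG() would catch.
        assert self.refcnt > 0, "get on an interface whose refcount already hit zero"
        self.refcnt += 1

    def put(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            # The real code would schedule teardown work at this point.
            self.torn_down = True


def handle_request(taken_on, completed_on):
    """Take a reference on one interface and drop it on another.

    With correct routing both arguments are the same interface and the
    counts stay balanced. Passing two different interfaces models a
    request that was routed to the wrong device.
    """
    taken_on.get()
    completed_on.put()


if __name__ == "__main__":
    vif1, vif2 = Vif("vif1.0"), Vif("vif2.0")
    handle_request(vif1, vif1)   # balanced: vif1 stays at 1
    handle_request(vif1, vif2)   # misrouted: vif1 leaks a ref, vif2 drops to 0
    print(vif1.name, vif1.refcnt, vif1.torn_down)
    print(vif2.name, vif2.refcnt, vif2.torn_down)
```

In this model a single misrouted completion is enough to tear down an interface that is still in use, while the other interface can never be released. That matches the symptom of a refcount reaching zero with no explicit vif teardown, but it says nothing about where the routing actually goes wrong.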
* Re: Recurring OOPS in latest -unstable kernel
From: Kip Macy @ 2005-07-03 20:28 UTC
To: Ian Pratt; +Cc: xen-devel, Ian Pratt
Just to clarify - this is straight out of the -unstable tree from
yesterday with no CONFIG_REGPARM. Nonetheless, a few things are
different:
CONFIG_MK8=y
CONFIG_SMP=y
# CONFIG_PREEMPT is not set
-Kip
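
When comparing a failing build against a known-good one, it can help to diff the two kernel configs mechanically rather than by eye. The sketch below parses the usual .config syntax ("CONFIG_X=value" for set options and "# CONFIG_X is not set" for unset ones) and reports differences. The CONFIG_A text is the fragment quoted above. CONFIG_B is an invented comparison config, and the function names are illustrative only.

```python
import re

# Fragment quoted in the message above.
CONFIG_A = """\
CONFIG_MK8=y
CONFIG_SMP=y
# CONFIG_PREEMPT is not set
"""

# Invented config to compare against (not from the thread).
CONFIG_B = """\
# CONFIG_MK8 is not set
CONFIG_SMP=y
# CONFIG_PREEMPT is not set
CONFIG_REGPARM=y
"""

SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name to its value, or to None when explicitly not set."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = UNSET.match(line)
        if m:
            opts[m.group(1)] = None
    return opts


def diff_config(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ.

    Options missing from one side are treated the same as "not set".
    """
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    for name, (left, right) in diff_config(
        parse_config(CONFIG_A), parse_config(CONFIG_B)
    ).items():
        print(name, left, right)
```

Treating a missing option as "not set" is a simplification. It is adequate for spotting candidates such as CONFIG_REGPARM, but it hides options that exist in only one kernel version.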
* Re: Recurring OOPS in latest -unstable kernel
From: Kip Macy @ 2005-07-03 22:20 UTC
To: Ian Pratt; +Cc: xen-devel, Ian Pratt
Let me know if there is anything I can do to help out. Having to
reboot every second or third dom create is frustrating. I know you
have a v40z there, so it would be surprising if you couldn't reproduce
it.
-Kip