Recurring OOPS in latest -unstable kernel

* Recurring OOPS in latest -unstable kernel
@ 2005-07-03  1:33 Kip Macy
  2005-07-03  8:27 ` Keir Fraser
  0 siblings, 1 reply; 6+ messages in thread
From: Kip Macy @ 2005-07-03  1:33 UTC
  To: Ian Pratt, Keir Fraser, xen-devel

I hit the following oops a couple of times a day - it seems to
correspond to tearing down a vif:

Jul  3 01:30:13 ubuntu kernel: ------------[ cut here ]------------
Jul  3 01:30:13 ubuntu kernel: kernel BUG at include/linux/dcache.h:293!
Jul  3 01:30:13 ubuntu kernel: invalid operand: 0000 [#1]
Jul  3 01:30:13 ubuntu kernel: SMP 
Jul  3 01:30:13 ubuntu kernel: Modules linked in: video thermal processor fan button battery ac mptscsih mptbase
Jul  3 01:30:13 ubuntu kernel: CPU:    0
Jul  3 01:30:13 ubuntu kernel: EIP:    0061:[<c0193100>]    Not tainted VLI
Jul  3 01:30:13 ubuntu kernel: EFLAGS: 00010246   (2.6.11.12-xen0) 
Jul  3 01:30:13 ubuntu kernel: EIP is at sysfs_remove_dir+0x100/0x110
Jul  3 01:30:13 ubuntu kernel: eax: 00000000   ebx: d557b3d4   ecx: dcfd4234   edx: d557b3d4
Jul  3 01:30:13 ubuntu kernel: esi: da0e1a20   edi: dd4d1424   ebp: 00000006   esp: c089de64
Jul  3 01:30:13 ubuntu kernel: ds: 007b   es: 007b   ss: 0069
Jul  3 01:30:13 ubuntu kernel: Process events/0 (pid: 10, threadinfo=c089c000 task=c075ca40)
Jul  3 01:30:13 ubuntu kernel: Stack: c0191f02 dcfd4dbc dc576000 d557b3d4 da0e1a20 dc576000 00000006 c0211070
Jul  3 01:30:13 ubuntu kernel:        d557b3d4 00000002 d557b340 c03f087a d557b3d4 d557b340 da0e1a20 c03f1948
Jul  3 01:30:13 ubuntu kernel:        da0e1a20 dc576000 c04c84a0 dc576000 00000006 dc576144 c012cd55 c04c84a0
Jul  3 01:30:13 ubuntu kernel: Call Trace:
Jul  3 01:30:13 ubuntu kernel:  [<c0191f02>] sysfs_hash_and_remove+0x52/0xe9
Jul  3 01:30:13 ubuntu kernel:  [<c0211070>] kobject_del+0x20/0x30
Jul  3 01:30:13 ubuntu kernel:  [<c03f087a>] br_del_if+0x3a/0x5c
Jul  3 01:30:13 ubuntu kernel:  [<c03f1948>] br_device_event+0xb8/0x100
Jul  3 01:30:13 ubuntu kernel:  [<c012cd55>] notifier_call_chain+0x25/0x40
Jul  3 01:30:13 ubuntu kernel:  [<c03a4a2f>] unregister_netdevice+0x14f/0x270
Jul  3 01:30:13 ubuntu kernel:  [<c03a4b65>] unregister_netdev+0x15/0x1e
Jul  3 01:30:13 ubuntu kernel:  [<c02be4f5>] netif_destroy+0x75/0x90
Jul  3 01:30:13 ubuntu kernel:  [<c02bdeb4>] netif_ctrlif_rx+0x64/0xb0
Jul  3 01:30:13 ubuntu kernel:  [<c0105550>] __ctrl_if_rxmsg_deferred+0x40/0x50
Jul  3 01:30:13 ubuntu kernel:  [<c012fbc8>] worker_thread+0x1d8/0x260
Jul  3 01:30:14 ubuntu kernel:  [<c0105510>] __ctrl_if_rxmsg_deferred+0x0/0x50
Jul  3 01:30:14 ubuntu kernel:  [<c011a930>] default_wake_function+0x0/0x20
Jul  3 01:30:14 ubuntu kernel:  [<c011a930>] default_wake_function+0x0/0x20
Jul  3 01:30:14 ubuntu kernel:  [<c012f9f0>] worker_thread+0x0/0x260
Jul  3 01:30:14 ubuntu kernel:  [<c01341ad>] kthread+0xbd/0x100
Jul  3 01:30:14 ubuntu kernel:  [<c01340f0>] kthread+0x0/0x100
Jul  3 01:30:14 ubuntu kernel:  [<c0107b15>] kernel_thread_helper+0x5/0x10
Jul  3 01:30:14 ubuntu kernel: Code: 89 44 24 08 8b 00 89 04 24 e8 0d 25 fb ff 8b 54 24 08 8b 42 04 89 04 24 e8 7e e0 07 00 8b 44 24 08 89 04 24 e8 f2 24 fb ff eb 92 <0f> 0b 25 01 53 65 42 c0 e9 13 ff ff ff 8d 76 00 83 ec 20 89 5c
Jul  3 01:31:57 ubuntu xenstored: xenstored corruption: connection id 0: err Bad address: Unknown error 14 (Bad address)
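
To make the teardown path in the trace above easier to read, the frames can be pulled out of the syslog text with a short script. This is only a sketch: the regular expression assumes the `[<address>] symbol+0xoffset/0xsize` frame format seen in this log, and `parse_frames` and `TRACE_SAMPLE` are names invented for this illustration.

```python
import re

# Matches frames such as "[<c0211070>] kobject_del+0x20/0x30".
# Lines like "EIP: 0061:[<c0193100>] Not tainted VLI" do not match,
# because no "+0x" follows the word after the address.
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(log_text):
    """Return (address, symbol, offset, size) tuples in the order printed."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            addr, symbol, offset, size = match.groups()
            frames.append((int(addr, 16), symbol, int(offset, 16), int(size, 16)))
    return frames


# A few frames copied from the oops above (hypothetical sample variable).
TRACE_SAMPLE = """\
Jul  3 01:30:13 ubuntu kernel: EIP:    0061:[<c0193100>]    Not tainted VLI
Jul  3 01:30:13 ubuntu kernel:  [<c0211070>] kobject_del+0x20/0x30
Jul  3 01:30:13 ubuntu kernel:  [<c03f087a>] br_del_if+0x3a/0x5c
Jul  3 01:30:13 ubuntu kernel:  [<c03a4b65>] unregister_netdev+0x15/0x1e
Jul  3 01:30:13 ubuntu kernel:  [<c02be4f5>] netif_destroy+0x75/0x90
"""

frames = parse_frames(TRACE_SAMPLE)
symbols = [symbol for _, symbol, _, _ in frames]

# Innermost frame first, so the caller chain reads left to right.
print(" <- ".join(symbols))
```

Read this way, the sample shows the bridge code (`br_del_if`) being reached from `netif_destroy` via `unregister_netdev`, which is consistent with the vif teardown mentioned above.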


* Re: Recurring OOPS in latest -unstable kernel
  2005-07-03  1:33 Kip Macy
@ 2005-07-03  8:27 ` Keir Fraser
  2005-07-03 16:21   ` Kip Macy
  0 siblings, 1 reply; 6+ messages in thread
From: Keir Fraser @ 2005-07-03  8:27 UTC
  To: Kip Macy; +Cc: xen-devel, Ian Pratt

On 3 Jul 2005, at 02:33, Kip Macy wrote:

> I hit the following oops a couple of times a day - it seems to
> correspond to tearing down a vif:

Are you actually trying to tear down a vif when the crash occurs, or is 
its refcnt falling to zero because of a bug?

We've had this bug report at least once before, but I couldn't find any
obvious problem from reading through the backtrace...

  -- Keir


* Re: Recurring OOPS in latest -unstable kernel
  2005-07-03  8:27 ` Keir Fraser
@ 2005-07-03 16:21   ` Kip Macy
  0 siblings, 0 replies; 6+ messages in thread
From: Kip Macy @ 2005-07-03 16:21 UTC
  To: Keir Fraser; +Cc: xen-devel, Ian Pratt

This happens periodically when a domU crashes, so I can't say for
sure. I've been more focused on debugging my domU :-)

                           -Kip



* RE: Recurring OOPS in latest -unstable kernel
@ 2005-07-03 19:36 Ian Pratt
  2005-07-03 20:28 ` Kip Macy
  0 siblings, 1 reply; 6+ messages in thread
From: Ian Pratt @ 2005-07-03 19:36 UTC
  To: Keir Fraser, Kip Macy; +Cc: xen-devel, Ian Pratt

> > I hit the following oops a couple of times a day - it seems to
> > correspond to tearing down a vif:
>
> Are you actually trying to tear down a vif when the crash occurs, or is
> its refcnt falling to zero because of a bug?
>
> We've had this bug report at least once before, but I couldn't find any
> obvious problem from reading through the backtrace...

This sounds rather like the bug that's being seen with the ported SuSE
kernel. Appended is a summary of the info we have on it.

Ian

The problem looks obscure to me: a request seems to be routed to the
wrong netback (vifX.0) device, the refcount drops to 0, and then we
oops. (The normal oops path is the BUG() at line 101 of
netback/interface.c; I patched the kernel to get a backtrace at the
place where we schedule the work.)

The same code (in netback) works in 2.6.9rc2/2.6.11.x, so something
screws up the ringbuffers -- should we start reviewing the path down
from hypervisor_callback?

Something strange seems to happen there with the assignment of
ringbuffers to interfaces, and I guess we need to review the upcall path.
Somewhere, we may clobber an argument, possibly involving CONFIG_REGPARM ...
I don't know the code well enough to see it without adding a lot of
instrumentation.
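
The suspected failure mode (a request taken on one interface but released on another) can be illustrated with a toy reference-count model. This is purely a sketch of the hypothesis, not the netback code: the `Vif` class, `handle_request`, and the starting count of 1 are all assumptions made for the illustration.

```python
class Vif:
    """Toy stand-in for a netback interface; not the real kernel structure."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 1          # assumed: one reference held by the creator
        self.destroyed = False

    def get(self):
        if self.destroyed:
            raise RuntimeError(f"{self.name}: get() after teardown")
        self.refcnt += 1

    def put(self):
        if self.refcnt <= 0:
            raise RuntimeError(f"{self.name}: put() with refcnt {self.refcnt}")
        self.refcnt -= 1
        if self.refcnt == 0:
            # In the real driver this is roughly where teardown work
            # would be scheduled; here we only record that it happened.
            self.destroyed = True


def handle_request(taken_on, released_on):
    """Take a reference for the request, then release it when done."""
    taken_on.get()
    released_on.put()


vif1 = Vif("vif1.0")
vif2 = Vif("vif2.0")

handle_request(vif1, vif1)   # balanced: vif1 ends where it started
handle_request(vif1, vif2)   # misrouted: get on vif1, put on vif2

for vif in (vif1, vif2):
    print(vif.name, "refcnt =", vif.refcnt, "destroyed =", vif.destroyed)
```

In this model the misrouted request leaves vif1.0 with a leaked reference while vif2.0 hits zero and is torn down even though its creator never released it, which matches the symptom of a refcount reaching 0 without an explicit teardown request.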


* Re: Recurring OOPS in latest -unstable kernel
  2005-07-03 19:36 Recurring OOPS in latest -unstable kernel Ian Pratt
@ 2005-07-03 20:28 ` Kip Macy
  2005-07-03 22:20   ` Kip Macy
  0 siblings, 1 reply; 6+ messages in thread
From: Kip Macy @ 2005-07-03 20:28 UTC
  To: Ian Pratt; +Cc: xen-devel, Ian Pratt

Just to clarify - this is straight out of the -unstable tree from
yesterday with no CONFIG_REGPARM. Nonetheless, a few things are
different:
CONFIG_MK8=y
CONFIG_SMP=y
# CONFIG_PREEMPT is not set

              -Kip
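
For comparing configurations between machines, options like the ones listed above can be read out of a kernel `.config` with a few lines of Python. This is a minimal sketch that assumes the usual `CONFIG_X=value` and `# CONFIG_X is not set` line forms; `parse_config`, `is_enabled`, and `CONFIG_SAMPLE` are names made up for this example.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name -> value string, or None when explicitly 'not set'."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = SET_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = UNSET_RE.match(line)
        if match:
            options[match.group(1)] = None
    return options


def is_enabled(options, name):
    """True only for options built in ('y') or built as a module ('m')."""
    return options.get(name) in ("y", "m")


# The three lines quoted in the message above (hypothetical sample variable).
CONFIG_SAMPLE = """\
CONFIG_MK8=y
CONFIG_SMP=y
# CONFIG_PREEMPT is not set
"""

opts = parse_config(CONFIG_SAMPLE)
for name in ("CONFIG_MK8", "CONFIG_SMP", "CONFIG_PREEMPT", "CONFIG_REGPARM"):
    state = "enabled" if is_enabled(opts, name) else "not enabled"
    print(f"{name}: {state}")
```

An option that is absent from the file (here `CONFIG_REGPARM`) and one that is explicitly "not set" both report as not enabled, but the returned dictionary keeps the two cases distinguishable.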


* Re: Recurring OOPS in latest -unstable kernel
  2005-07-03 20:28 ` Kip Macy
@ 2005-07-03 22:20   ` Kip Macy
  0 siblings, 0 replies; 6+ messages in thread
From: Kip Macy @ 2005-07-03 22:20 UTC
  To: Ian Pratt; +Cc: xen-devel, Ian Pratt

Let me know if there is anything I can do to help out. Having to
reboot every second or third dom create is frustrating. I know you
have a v40z there, so it would be surprising if you couldn't reproduce
it.

     -Kip


