netdev.vger.kernel.org archive mirror
* 2.5.50 BUG_TRAP on !dev->deadbeaf, and oopses
@ 2002-11-30 21:09 David Brownell
  2002-12-01 12:45 ` Stefan Rompf
  0 siblings, 1 reply; 5+ messages in thread
From: David Brownell @ 2002-11-30 21:09 UTC
  To: netdev

Since sometime before 2.5.4x kernels, many of the usb networking drivers
running on 2.5 tend to trigger trouble like this (full text appended)
when they unplug.  The drivers in question didn't change how they talked
to the network stack; the stack started to complain, and oopses started:

   KERNEL: assertion (!dev->deadbeaf) failed at net/core/dev.c(2544)

I think there's another bug, beyond the obvious speling erorz.  Namely,
that "deadbeaf" is only set after that BUG_TRAP, or on one error path.
The assertion prevents hotpluggable network drivers from unregistering
when the hardware goes away ... which is a regression.

For now I'm just commenting out that broken assertion, but I wonder
if a better fix wouldn't be a "no deadbeaf" diet for the kernel.  But
there might be more problems than that.


The next message I got (at least in this 2.5.50 oops) was

   unregister_netdevice: device /dfd74058 never was registered

That's odd because we know for a fact that the earlier call to
register_netdev() returned.  Something got deeply confused,
and likely that caused the oops.

I remember seeing similar failures with the 'pegasus' driver,
with the "deadbeaf" problem and an oops, but I don't remember
whether those oopses were at all like this one (or gave that
"never registered" message).

Since there have been 2.5 kernels, using essentially identical
drivers, that don't trigger any of those problems, I'm wondering
what's up.  I suspect the networking code caused all of
these, now that the sysfs-related bugs in usbcore (which caused
different unplug problems) seem to be mostly gone.  Suggestions?

- Dave


Here's the full trace, pretty typical of what I've seen when
unplugging those network devices:


KERNEL: assertion (!dev->deadbeaf) failed at net/core/dev.c(2544)
unregister_netdevice: device /dfd74058 never was registered
eip: c0107bf0
------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:123!
invalid operand: 0000
CPU:    0
EIP:    0060:[<c0107c4f>]    Tainted: G S
EFLAGS: 00010086
EIP is at __down+0x5f/0x1c0
eax: 0000000e   ebx: dfd74034   ecx: 00000000   edx: 0000ac2c
esi: dfd74034   edi: dfd7402c   ebp: 00000286   esp: db5dde38
ds: 0068   es: 0068   ss: 0068
Process khubd (pid: 913, threadinfo=db5dc000 task=d64eb940)
Stack: c027b05c c0107bf0 d64eb940 00000000 d64eb940 c011bec0 00000000 00000000
        dfd74058 dfd74058 dfa5c504 dfd74034 e08897e0 dfd7402c db5dde84 c010807b
        dfd74034 00000000 00000000 c6f7e000 e08897a8 e088991e 00000077 e0854302
Call Trace:
  [<c0107bf0>] __down+0x0/0x1c0
  [<c011bec0>] default_wake_function+0x0/0x40
  [<e08897e0>] +0x0/0x18 [usbnet]
  [<c010807b>] __down_failed+0xb/0x14
  [<e08897a8>] .text.lock.usbnet+0x9b/0xd3 [usbnet]
  [<e088991e>] +0x36/0x2d8 [usbnet]
  [<e0854302>] +0x36/0x1174 [usbcore]
  [<e0889820>] usbnet_driver+0x0/0xc0 [usbnet]
  [<e0889820>] usbnet_driver+0x0/0xc0 [usbnet]
  [<e0846248>] usb_device_remove+0xc8/0x140 [usbcore]
  [<e0889838>] usbnet_driver+0x18/0xc0 [usbnet]
  [<c01c8a52>] detach+0x42/0x50
  [<e085a360>] usb_bus_type+0x0/0x120 [usbcore]
  [<e085a394>] usb_bus_type+0x34/0x120 [usbcore]
  [<c01c8a70>] device_detach+0x10/0x20
  [<e0889838>] usbnet_driver+0x18/0xc0 [usbnet]
  [<c01c8bca>] bus_remove_device+0x5a/0xb0
  [<c01c8138>] device_del+0x78/0xa0
  [<c01c816b>] device_unregister+0xb/0x16
  [<e0846b55>] usb_disconnect+0x95/0xf0 [usbcore]
  [<e0849250>] usb_hub_port_connect_change+0xa0/0x2c0 [usbcore]
  [<e084965b>] usb_hub_events+0x1eb/0x420 [usbcore]
  [<e0857300>] +0x1ec0/0x3d20 [usbcore]
  [<e08498c5>] usb_hub_thread+0x35/0x100 [usbcore]
  [<c01091c9>] ret_from_fork+0x5/0x14
  [<c011bec0>] default_wake_function+0x0/0x40
  [<e085a4b8>] khubd_wait+0x8/0x10 [usbcore]
  [<e085a4b8>] khubd_wait+0x8/0x10 [usbcore]
  [<e0849890>] usb_hub_thread+0x0/0x100 [usbcore]
  [<c0107029>] kernel_thread_helper+0x5/0xc

Code: 0f 0b 7b 00 44 af 27 c0 59 5b 8d b4 26 00 00 00 00 f0 fe 4e
  <6>note: khubd[913] exited with preempt_count 1


Note that killing khubd() like that meant that no usb
device connects or disconnects can be processed again
without rebooting.


* Re: 2.5.50 BUG_TRAP on !dev->deadbeaf, and oopses
  2002-11-30 21:09 2.5.50 BUG_TRAP on !dev->deadbeaf, and oopses David Brownell
@ 2002-12-01 12:45 ` Stefan Rompf
  2002-12-02 18:44   ` David Brownell
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Rompf @ 2002-12-01 12:45 UTC
  To: David Brownell; +Cc: netdev

Hi,

David Brownell wrote:

>    KERNEL: assertion (!dev->deadbeaf) failed at net/core/dev.c(2544)
> 
> I think there's another bug, beyond the obvious speling erorz.  Namely,
> that "deadbeaf" is only set after that BUG_TRAP, or on one error path.
> The assertion prevents hotpluggable network drivers from unregistering
> when the hardware goes away ... which is a regression.

actually, the assertion is triggered when someone tries to unregister a
netdevice twice, and that's also why you get

>    unregister_netdevice: device /dfd74058 never was registered

From a short browsing through usb.c I don't see a similar bug catcher
in usb_device_remove(), so have a look whether the USB subsystem itself
removes an unplugged device twice for some reason.
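
A minimal userspace sketch of the double-unregister scenario described
above. It is a toy model under stated assumptions, not the code in
net/core/dev.c: the toy_netdev struct, toy_dev_base list and both toy_*
functions are hypothetical names, and the model assumes the flag is set
when the device is unlinked.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device; not the kernel's layout. */
struct toy_netdev {
    const char *name;
    int deadbeaf;                /* assumed: nonzero once unregistered */
    struct toy_netdev *next;
};

/* Head of the list of registered devices (hypothetical). */
static struct toy_netdev *toy_dev_base;

static void toy_register(struct toy_netdev *dev)
{
    dev->deadbeaf = 0;
    dev->next = toy_dev_base;
    toy_dev_base = dev;
}

/* Returns 0 on success, -1 if the device is not on the list. */
static int toy_unregister(struct toy_netdev *dev)
{
    struct toy_netdev **dp;

    /* Warn, but carry on, as a BUG_TRAP-style check would. */
    if (dev->deadbeaf)
        fprintf(stderr, "assertion (!dev->deadbeaf) failed\n");

    /* Unlink the device from the list if it is still there. */
    for (dp = &toy_dev_base; *dp != NULL; dp = &(*dp)->next) {
        if (*dp == dev) {
            *dp = dev->next;
            dev->next = NULL;
            dev->deadbeaf = 1;
            return 0;
        }
    }

    fprintf(stderr, "unregister: device %s/%p never was registered\n",
            dev->name, (void *)dev);
    return -1;
}
```

In this model the first toy_unregister() call succeeds; a second call on
the same device prints both messages, in the same order as in the report.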

Stefan


* Re: 2.5.50 BUG_TRAP on !dev->deadbeaf, and oopses
  2002-12-01 12:45 ` Stefan Rompf
@ 2002-12-02 18:44   ` David Brownell
  2002-12-08 22:42     ` David Brownell
  0 siblings, 1 reply; 5+ messages in thread
From: David Brownell @ 2002-12-02 18:44 UTC
  To: Stefan Rompf; +Cc: netdev

Hi Stefan,

>>   KERNEL: assertion (!dev->deadbeaf) failed at net/core/dev.c(2544)
>>
>>I think there's another bug, beyond the obvious speling erorz.  Namely,
>>that "deadbeaf" is only set after that BUG_TRAP, or on one error path.
>>The assertion prevents hotpluggable network drivers from unregistering
>>when the hardware goes away ... which is a regression.
> 
> 
> actually, the assertion is triggered when someone tries to unregister a
> netdevice twice, and that's also why you get

Then why does a grep of all kernel files not turn up other places where
'deadbeaf' gets set?  There's strange stuff going on here regardless
(as well as speling issue), which looks pretty buglike.

Plus: this kind of bugcatch should use magic numbers, or maybe zero.
Assuming "any nonzero value is valid", like this assertion does, is
clearly going to fail for any of the class of bugs highlighted by
slab poisoning.  (0xa5a5a5a5 gets accepted as valid...)
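
To make the magic-number suggestion concrete, here is a rough sketch,
assuming a hypothetical toy_dev struct and made-up magic values (none
of these names exist in the kernel).  A flag test can only tell zero
from nonzero, while a magic field can tell "live", "dead" and "garbage"
apart:

```c
#include <stdint.h>

/* Hypothetical magic values, chosen for illustration only. */
#define TOY_DEV_LIVE 0x4c495645u   /* ASCII "LIVE" */
#define TOY_DEV_DEAD 0x44454144u   /* ASCII "DEAD" */

enum toy_state { TOY_STATE_LIVE, TOY_STATE_DEAD, TOY_STATE_CORRUPT };

/* Hypothetical device record carrying a magic field. */
struct toy_dev {
    uint32_t magic;
};

/* Flag-style test: only distinguishes zero from nonzero, so a
 * poison pattern looks the same as a deliberately set flag. */
static int toy_flag_is_set(uint32_t flag)
{
    return flag != 0;
}

/* Magic-style test: anything that is neither known value is
 * reported as corrupt, including zero and poison patterns. */
static enum toy_state toy_check_magic(const struct toy_dev *dev)
{
    switch (dev->magic) {
    case TOY_DEV_LIVE:
        return TOY_STATE_LIVE;
    case TOY_DEV_DEAD:
        return TOY_STATE_DEAD;
    default:
        return TOY_STATE_CORRUPT;
    }
}
```

With this layout a poisoned field such as 0xa5a5a5a5, or a stray
pointer that lands on a zero, is reported as corrupt instead of being
mistaken for one of the two legitimate states.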


>>   unregister_netdevice: device /dfd74058 never was registered
> 
> 
> From a short browsing through usb.c I don't see a similar bug catcher
> in usb_device_remove(), so have a look whether the USB subsystem itself
> removes an unplugged device twice for some reason.

At least one failure path also involves "rmmod" of the network
drivers, where the device hardware is still around; so that code
would not always be called.

I wouldn't rule out problems in the relevant usbcore/sysfs bits,
even now that they seem to have stabilized again (and yes, I was
wondering about multiple disconnects too), but all that deadbeaf
logic still looks fishy to me.

- Dave


* Re: 2.5.50 BUG_TRAP on !dev->deadbeaf, and oopses
  2002-12-02 18:44   ` David Brownell
@ 2002-12-08 22:42     ` David Brownell
  2002-12-09 19:51       ` David Brownell
  0 siblings, 1 reply; 5+ messages in thread
From: David Brownell @ 2002-12-08 22:42 UTC
  To: Stefan Rompf; +Cc: netdev

Following up on my earlier reply ...

> Hi Stefan,
> 
>>>   KERNEL: assertion (!dev->deadbeaf) failed at net/core/dev.c(2544)
>>>
>>> I think there's another bug, beyond the obvious speling erorz.  Namely,
>>> that "deadbeaf" is only set after that BUG_TRAP, or on one error path.
>>> The assertion prevents hotpluggable network drivers from unregistering
>>> when the hardware goes away ... which is a regression.
>>
>>
>>
>> actually, the assertion is triggered when someone tries to unregister a
>> netdevice twice, and that's also why you get


FWIW I just added a printk so I could see if the disconnect() method
was called more than once (by USB) per your guess ... no, it wasn't.
It's called once, leading to flaky diagnostics and a BUG().
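
For illustration only, the kind of instrumentation meant here looks
roughly like the userspace sketch below.  It is not the actual usbnet
change: toy_usb_dev and toy_disconnect() are hypothetical names, and
fprintf() stands in for printk().

```c
#include <stdio.h>

/* Hypothetical per-device record; not the usbnet driver's struct. */
struct toy_usb_dev {
    const char *name;
    unsigned int disconnect_calls;   /* times disconnect has run */
};

/* Log every call with a running count, then return that count so a
 * caller can see whether the method ran more than once. */
static unsigned int toy_disconnect(struct toy_usb_dev *dev)
{
    dev->disconnect_calls++;
    fprintf(stderr, "disconnect(%s): call #%u\n",
            dev->name, dev->disconnect_calls);
    return dev->disconnect_calls;
}
```

A count that never goes past 1 for a given device is the "called once"
result described above.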

So this is clearly some kind of network layer problem, as I described
in my original message (and then in this one).  Behavioral proof, as
well as the one that came from inspecting the kernel source code and
noticing that "deadbeaf" clearly can't be achieving what it seems to
be intending to do...

Is there someone who has a clear explanation of exactly how "deadbeaf"
was once expected to work -- and now (since sometime before about
2.5.40) evidently doesn't?

It seems to be driven by side effects, and whatever comments are in
the code aren't any help.  The only case "deadbeaf" could be set is
still documented as an error path ... but evidently those USB drivers
don't hit that "error" path any more on 2.5 (but they do on 2.4, and
did earlier in 2.5 also).

My thought is that there were some bugs covering for each other, and
one of them got fixed ... exposing this.  But without knowing what
the networking code was really expecting to do, I can't fix anything.



> Then why will grep of all kernel files not turn up other places where
> 'deadbeaf' gets set?  There's strange stuff going on here regardless
> (as well as speling issue), which looks pretty buglike.
> 
> Plus: this kind of bugcatch should use magic numbers, or maybe zero.
> Assuming "any nonzero value is valid", like this assertion does, is
> clearly going to fail for any of the class of bugs highlighted by
> slab poisoning.  (0xa5a5a5a5 gets accepted as valid...)
> 
> 
>>>   unregister_netdevice: device /dfd74058 never was registered
>>
>>
>>
>> From a short browsing through usb.c I don't see a similar bug catcher
>> in usb_device_remove(), so have a look whether the USB subsystem itself
>> removes an unplugged device twice for some reason.
> 
> 
> At least one failure path also involves "rmmod" of the network
> drivers, where the device hardware is still around; so that code
> would not always be called.
> 
> I wouldn't rule out problems in the relevant usbcore/sysfs bits,
> even now that they seem to have stabilized again (and yes, I was
> wondering about multiple disconnects too), but all that deadbeaf
> logic still looks fishy to me.

Right now I _would_ absolutely rule out such a problem.  And that
"deadbeaf" stuff still looks more than a little bit dubious.


> - Dave
> 
> 
> 


* Re: 2.5.50 BUG_TRAP on !dev->deadbeaf, and oopses
  2002-12-08 22:42     ` David Brownell
@ 2002-12-09 19:51       ` David Brownell
  0 siblings, 0 replies; 5+ messages in thread
From: David Brownell @ 2002-12-09 19:51 UTC
  To: Stefan Rompf; +Cc: netdev


> Is there someone who has a clear explanation of exactly how "deadbeaf"
> was once expected to work -- and now (since sometime before about
> 2.5.40) evidently doesn't?
> 
> It seems to be driven by side effects, and whatever comments are in
> the code aren't any help.  The only case "deadbeaf" could be set is
> still documented as an error path ... 

All that still holds true.  There's something fishy going on, or
just old cruft that's lingered.  I suppose I should just patch it
and see if contradictory information appears then.

>> Plus: this kind of bugcatch should use magic numbers, or maybe zero.
>> Assuming "any nonzero value is valid", like this assertion does, is
>> clearly going to fail for any of the class of bugs highlighted by
>> slab poisoning.  (0xa5a5a5a5 gets accepted as valid...)

Actually I found a place where the wrong pointer was being used.
Heh -- lucky me, to make my point that way ... a stray pointer
happened to point to a zero, and so triggered that warning.  As a
bugcatch it's pretty poor:  it wouldn't normally trigger on that
kind of bug, either.

- Dave
