* E1000: bug on error path in e1000_probe()
@ 2006-08-01 15:16 Stephane Doyon
2006-08-01 18:54 ` Auke Kok
0 siblings, 1 reply; 3+ messages in thread
From: Stephane Doyon @ 2006-08-01 15:16 UTC (permalink / raw)
To: linux.nics, netdev
Hi,
The e1000_probe() function passes references to the netdev structure
before it's actually registered. In the (admittedly obscure) case where
the netdev registration fails, we are left with a dangling reference.
Specifically, e1000_probe() calls
netif_carrier_off(netdev);
before register_netdev(netdev).
(It also calls pci_set_drvdata(pdev, netdev) rather early, not sure how
important that is.)
netif_carrier_off() does linkwatch_fire_event(dev);, which in turn does
dev_hold(dev); and queues up an event with a reference to the netdev.
But the net_device reference counting mechanism only works on registered
netdevs.
Should the register_netdev() call fail, the error path does
free_netdev(netdev);, and when the event goes off, it accesses random
memory through the dangling reference.
I would recommend moving the register_netdev() call earlier.
Thanks
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: E1000: bug on error path in e1000_probe()
2006-08-01 15:16 E1000: bug on error path in e1000_probe() Stephane Doyon
@ 2006-08-01 18:54 ` Auke Kok
2006-08-01 19:29 ` Stephane Doyon
0 siblings, 1 reply; 3+ messages in thread
From: Auke Kok @ 2006-08-01 18:54 UTC (permalink / raw)
To: Stephane Doyon; +Cc: netdev, Jesse Brandeburg, Cramer, Jeb J
Stephane Doyon wrote:
> The e1000_probe() function passes references to the netdev structure
> before it's actually registered. In the (admittedly obscure) case where
> the netdev registration fails, we are left with a dangling reference.
>
> Specifically, e1000_probe() calls
> netif_carrier_off(netdev);
> before register_netdev(netdev).
>
> (It also calls pci_set_drvdata(pdev, netdev) rather early, not sure how
> important that is.)
>
> netif_carrier_off() does linkwatch_fire_event(dev);, which in turn does
> dev_hold(dev); and queues up an event with a reference to the netdev.
>
> But the net_device reference counting mechanism only works on registered
> netdevs.
>
> Should the register_netdev() call fail, the error path does
> free_netdev(netdev);, and when the event goes off, it accesses random
> memory through the dangling reference.
>
> I would recommend moving the register_netdev() call earlier.
We agree that this may be an issue and we're looking at how this mis-ordering
entered the code in the first place. I'm probably going to send a patch later
today or include it in this week-worths upstream patches later this week.
We were wondering however how you encountered this problem? Did you see a case
where this race actually happened? it might be an interesting case to look at.
Or did you do this by code review only?
Auke
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: E1000: bug on error path in e1000_probe()
2006-08-01 18:54 ` Auke Kok
@ 2006-08-01 19:29 ` Stephane Doyon
0 siblings, 0 replies; 3+ messages in thread
From: Stephane Doyon @ 2006-08-01 19:29 UTC (permalink / raw)
To: Auke Kok; +Cc: netdev, Jesse Brandeburg, Cramer, Jeb J
On Tue, 1 Aug 2006, Auke Kok wrote:
> Stephane Doyon wrote:
>> The e1000_probe() function passes references to the netdev structure
>> before it's actually registered. In the (admittedly obscure) case where
>> the netdev registration fails, we are left with a dangling reference.
>>
>> Specifically, e1000_probe() calls
>> netif_carrier_off(netdev);
>> before register_netdev(netdev).
>>
>> (It also calls pci_set_drvdata(pdev, netdev) rather early, not sure how
>> important that is.)
>>
>> netif_carrier_off() does linkwatch_fire_event(dev);, which in turn does
>> dev_hold(dev); and queues up an event with a reference to the netdev.
>>
>> But the net_device reference counting mechanism only works on registered
>> netdevs.
>>
>> Should the register_netdev() call fail, the error path does
>> free_netdev(netdev);, and when the event goes off, it accesses random
>> memory through the dangling reference.
>>
>> I would recommend moving the register_netdev() call earlier.
>
> We agree that this may be an issue and we're looking at how this mis-ordering
> entered the code in the first place. I'm probably going to send a patch later
> today or include it in this week-worths upstream patches later this week.
Thank you for looking at this.
> We were wondering however how you encountered this problem? Did you see a
> case where this race actually happened? it might be an interesting case to
> look at. Or did you do this by code review only?
Yes I did see a case where this happened, but the failure in
register_netdev() was due to some unrelated kernel code modifications I
was working with. The effect of the dangling reference was an unhandled
"kernel paging request somewhere in the USB EHCI driver where some pointer
got corrupted. The USB modules were being inserted soon after the e1000. I
moved things around and eventually I put a sleep after the modprobe e1000,
and then I got a "BUG at kernel/timer.c:279!" instead, and the backtrace
showed mod_timer() <= __netdev_watchdog_up() <= ... <= dev_activate)( <=
linkwatch_run_queue()... I figured out the problem from there. In
e1000_probe(), I moved netif_carrier_off() (and pci_set_drvdata( and
netif_stop_queue() too for good measure) to after the register_netdev()
call, and that made the weird effects go away. The error path in question
is pretty obscure, but it wasn't easy working backward from the memory
corruption effect, so that's my extra incentive for reporting the problem.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2006-08-01 19:31 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-01 15:16 E1000: bug on error path in e1000_probe() Stephane Doyon
2006-08-01 18:54 ` Auke Kok
2006-08-01 19:29 ` Stephane Doyon
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).