netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* netif_tx_disable vs netif_stop_queue (possible races?)
@ 2006-06-08 23:14 Daniel Drake
  2006-06-09  4:41 ` Herbert Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Drake @ 2006-06-08 23:14 UTC (permalink / raw)
  To: netdev; +Cc: David Brownell

Hi,

In order to avoid disconnect races in ZD1211, I'm trying to convince 
myself that other USB network drivers are race-free (and why).

I brought up a possible issue in usbnet with David Brownell, but it 
appears neither of us are 100% clear on the details, so it would be good 
to get some extra clarification.

USB devices can be yanked out by the user at *any* time, and typically 
free_netdev is called in the disconnect handler, which frees all 
device-specific structures.

The issue I brought up is that code using these structures may be 
running concurrently on another CPU when the disconnect handler is 
invoked, resulting in access to freed data.

More specifically, we're talking about drivers/usb/net/usbnet.c and the 
usbnet_disconnect() function.  The race I am highlighting is that 
usbnet's hard_start_xmit handler (usbnet_start_xmit) may be running when 
the disconnect happens.

Is this a possible scenario?

My thoughts on avoiding this are to ensure that the TX queue is stopped 
and all current transmissions in usbnet_stop (the netdev->stop hook). I 
understand that the stop function is called during disconnect, because 
usbnet_disconnect() calls unregister_netdev(). The stop function already 
has a call to netif_stop_queue but (based on my knowledge from LDD3) 
this is not enough.

My understanding is that netif_stop_queue() should be used only inside a 
hard_start_xmit function, when it wants to make sure that no more 
transmissions are started until the queue is reopened.

I suggested that we use netif_tx_disable here, which is effectively the 
same but because it takes the lock, it guarantees that no 
hard_start_xmit transmissions are running on any CPU when it returns.

David pointed out that netif_tx_disable is rarely used, and a large 
number of drivers call netif_stop_queue in their stop function, so if 
what I suggest is true, there may be many such potential races 
elsewhere. Am I missing something simple?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: netif_tx_disable vs netif_stop_queue (possible races?)
  2006-06-08 23:14 netif_tx_disable vs netif_stop_queue (possible races?) Daniel Drake
@ 2006-06-09  4:41 ` Herbert Xu
  2006-06-09 15:29   ` Daniel Drake
  0 siblings, 1 reply; 8+ messages in thread
From: Herbert Xu @ 2006-06-09  4:41 UTC (permalink / raw)
  To: Daniel Drake; +Cc: netdev, david-b

Daniel Drake <dsd@gentoo.org> wrote:
> 
> More specifically, we're talking about drivers/usb/net/usbnet.c and the 
> usbnet_disconnect() function.  The race I am highlighting is that 
> usbnet's hard_start_xmit handler (usbnet_start_xmit) may be running when 
> the disconnect happens.
> 
> Is this a possible scenario?

It should be safe, if only because of the synchronize_net that occurs
before a netdev can be freed.

However, there is definitely room for clean-ups.  In particular, LLTX
drivers have changed the meaning of xmit_lock so things like dev_deactivate
no longer carries the same power they used to.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: netif_tx_disable vs netif_stop_queue (possible races?)
  2006-06-09  4:41 ` Herbert Xu
@ 2006-06-09 15:29   ` Daniel Drake
  2006-06-09 23:35     ` Herbert Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Drake @ 2006-06-09 15:29 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, david-b

Herbert Xu wrote:
> Daniel Drake <dsd@gentoo.org> wrote:
>> More specifically, we're talking about drivers/usb/net/usbnet.c and the 
>> usbnet_disconnect() function.  The race I am highlighting is that 
>> usbnet's hard_start_xmit handler (usbnet_start_xmit) may be running when 
>> the disconnect happens.
>>
>> Is this a possible scenario?
> 
> It should be safe, if only because of the synchronize_net that occurs
> before a netdev can be freed.

Can I interpret your response as: If the TX queue is disabled in 
advance, no hard_start_xmit functions will be running on any CPU after 
synchronize_net() has returned?

The synchronize_net() code doesn't make it very clear.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: netif_tx_disable vs netif_stop_queue (possible races?)
  2006-06-09 15:29   ` Daniel Drake
@ 2006-06-09 23:35     ` Herbert Xu
  2006-06-10 12:42       ` Daniel Drake
  0 siblings, 1 reply; 8+ messages in thread
From: Herbert Xu @ 2006-06-09 23:35 UTC (permalink / raw)
  To: Daniel Drake; +Cc: netdev, david-b

On Fri, Jun 09, 2006 at 04:29:13PM +0100, Daniel Drake wrote:
> 
> Can I interpret your response as: If the TX queue is disabled in 
> advance, no hard_start_xmit functions will be running on any CPU after 
> synchronize_net() has returned?

Correct.  All callers of hard_start_xmit do so under RCU or equivalent
locks so they must be complete by the time synchronize_net() returns.

What you got watch out for though are paths where the driver might
reenable the queue.  That could be a real bug.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: netif_tx_disable vs netif_stop_queue (possible races?)
  2006-06-09 23:35     ` Herbert Xu
@ 2006-06-10 12:42       ` Daniel Drake
  2006-06-10 12:59         ` Evgeniy Polyakov
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Drake @ 2006-06-10 12:42 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, david-b

Herbert Xu wrote:
> Correct.  All callers of hard_start_xmit do so under RCU or equivalent
> locks so they must be complete by the time synchronize_net() returns.

Does this hold for other operations? Such as:

- The netdev->set_mac_address function
- The wireless ioctl's (SIOCSIWESSID, etc)

Are these also guaranteed to have returned after synchronize_net()?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: netif_tx_disable vs netif_stop_queue (possible races?)
  2006-06-10 12:42       ` Daniel Drake
@ 2006-06-10 12:59         ` Evgeniy Polyakov
  2006-06-10 16:40           ` Daniel Drake
  0 siblings, 1 reply; 8+ messages in thread
From: Evgeniy Polyakov @ 2006-06-10 12:59 UTC (permalink / raw)
  To: Daniel Drake; +Cc: Herbert Xu, netdev, david-b

On Sat, Jun 10, 2006 at 01:42:21PM +0100, Daniel Drake (dsd@gentoo.org) wrote:
> Herbert Xu wrote:
> >Correct.  All callers of hard_start_xmit do so under RCU or equivalent
> >locks so they must be complete by the time synchronize_net() returns.
> 
> Does this hold for other operations? Such as:
> 
> - The netdev->set_mac_address function
> - The wireless ioctl's (SIOCSIWESSID, etc)
> 
> Are these also guaranteed to have returned after synchronize_net()?

None of above calls is protected with RCU (except set_mac_address()
called through ioctl, which is performed under read_lock which disables
preemtption), so they still can run after synchronize_net().

But if you are talking about synchronize_net() inside
unregister_netdevice(), which is called from 
usbnet_disconnect()->unregister_netdev(), than it is safe.

> Thanks,
> Daniel

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: netif_tx_disable vs netif_stop_queue (possible races?)
  2006-06-10 12:59         ` Evgeniy Polyakov
@ 2006-06-10 16:40           ` Daniel Drake
  2006-06-10 17:15             ` Evgeniy Polyakov
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Drake @ 2006-06-10 16:40 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Herbert Xu, netdev, david-b

Evgeniy Polyakov wrote:
> On Sat, Jun 10, 2006 at 01:42:21PM +0100, Daniel Drake (dsd@gentoo.org) wrote:
>> Herbert Xu wrote:
>>> Correct.  All callers of hard_start_xmit do so under RCU or equivalent
>>> locks so they must be complete by the time synchronize_net() returns.
>> Does this hold for other operations? Such as:
>>
>> - The netdev->set_mac_address function
>> - The wireless ioctl's (SIOCSIWESSID, etc)
>>
>> Are these also guaranteed to have returned after synchronize_net()?
> 
> None of above calls is protected with RCU (except set_mac_address()
> called through ioctl, which is performed under read_lock which disables
> preemtption), so they still can run after synchronize_net().
> 
> But if you are talking about synchronize_net() inside
> unregister_netdevice(), which is called from 
> usbnet_disconnect()->unregister_netdev(), than it is safe.

Are you referring to set_mac_address in the above statement, or both 
set_mac_address *and* the wireless ioctls?

I'm basically just looking to clarify that after unregister_netdev has 
completed, none of the following can be still in progress on any CPU, 
and none of the following can be triggered again:

1. hard_start_xmit handler
2. set_mac_address handler
3. WX ioctls

It's logical that this is the case, but the code doesn't make that very 
clear (and would certainly result in many potential ZD1211 races if this 
was not the case).

Daniel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: netif_tx_disable vs netif_stop_queue (possible races?)
  2006-06-10 16:40           ` Daniel Drake
@ 2006-06-10 17:15             ` Evgeniy Polyakov
  0 siblings, 0 replies; 8+ messages in thread
From: Evgeniy Polyakov @ 2006-06-10 17:15 UTC (permalink / raw)
  To: Daniel Drake; +Cc: Herbert Xu, netdev, david-b

On Sat, Jun 10, 2006 at 05:40:39PM +0100, Daniel Drake (dsd@gentoo.org) wrote:
> Evgeniy Polyakov wrote:
> >On Sat, Jun 10, 2006 at 01:42:21PM +0100, Daniel Drake (dsd@gentoo.org) 
> >wrote:
> >>Herbert Xu wrote:
> >>>Correct.  All callers of hard_start_xmit do so under RCU or equivalent
> >>>locks so they must be complete by the time synchronize_net() returns.
> >>Does this hold for other operations? Such as:
> >>
> >>- The netdev->set_mac_address function
> >>- The wireless ioctl's (SIOCSIWESSID, etc)
> >>
> >>Are these also guaranteed to have returned after synchronize_net()?
> >
> >None of above calls is protected with RCU (except set_mac_address()
> >called through ioctl, which is performed under read_lock which disables
> >preemtption), so they still can run after synchronize_net().
> >
> >But if you are talking about synchronize_net() inside
> >unregister_netdevice(), which is called from 
> >usbnet_disconnect()->unregister_netdev(), than it is safe.
> 
> Are you referring to set_mac_address in the above statement, or both 
> set_mac_address *and* the wireless ioctls?

oth calls have the same nature actually, and both calls are not
protected by RCU.

> I'm basically just looking to clarify that after unregister_netdev has 
> completed, none of the following can be still in progress on any CPU, 
> and none of the following can be triggered again:
> 
> 1. hard_start_xmit handler
> 2. set_mac_address handler
> 3. WX ioctls
> 
> It's logical that this is the case, but the code doesn't make that very 
> clear (and would certainly result in many potential ZD1211 races if this 
> was not the case).

set_mac_address() and wireless ioctls are protected by rtnl.
unregister_netdevice()  is called under rtnl protection too.
But hard_start_xmit() is not protected (and can not be protected in all
situations) by sleeping semaphore (like rtnl), 
so instead it runs under RCU, which is synchronized in synchronize_net()
inside unregister_netdevice().

> Daniel

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2006-06-10 17:15 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-06-08 23:14 netif_tx_disable vs netif_stop_queue (possible races?) Daniel Drake
2006-06-09  4:41 ` Herbert Xu
2006-06-09 15:29   ` Daniel Drake
2006-06-09 23:35     ` Herbert Xu
2006-06-10 12:42       ` Daniel Drake
2006-06-10 12:59         ` Evgeniy Polyakov
2006-06-10 16:40           ` Daniel Drake
2006-06-10 17:15             ` Evgeniy Polyakov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).