Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-20 23:51   ` Herbert Xu
@ 2006-04-20 22:36     ` Michael Chan
  2006-04-21  1:27       ` Herbert Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Chan @ 2006-04-20 22:36 UTC (permalink / raw)
  To: Herbert Xu; +Cc: shawvrana, netdev, Auke Kok, David S. Miller

On Fri, 2006-04-21 at 09:51 +1000, Herbert Xu wrote:

> 
> Actually TG3 is buggy too.  If the reset task is scheduled but
> isn't running yet there is no synchronisation here to prevent the
> reset task from running after tg3_close releases the tp lock.
> 

If we're in tg3_close() and the reset task isn't running yet, tg3_close
() will proceed. However, when the reset task finally runs, it will see
that netif_running() is zero and will just return.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
       [not found] <200604201035.00100.shaw@vranix.com>
@ 2006-04-20 23:36 ` Herbert Xu
  2006-04-20 23:51   ` Herbert Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2006-04-20 23:36 UTC (permalink / raw)
  To: shawvrana; +Cc: netdev, Auke Kok, David S. Miller, Michael Chan

On Thu, Apr 20, 2006 at 05:35:00PM +0000, shawvrana@gmail.com wrote:
> 
> If the e1000_tx_timeout_task were running concurrently with e1000_down, it 
> seems that they could both attempt to kfree_skb concurrently when running 
> e1000_unmap_and_free_tx_resource.   I googled around to find mention of this 
> anywhere with no luck.  Has this been discussed already?

Yes that's definitely buggy.  There needs to be some form of
synchronisation as the TG3 driver does.  However, to be frank
I'm not too fond of what the TG3 driver does either.  Is there
no better way than an msleep loop?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-20 23:36 ` e1000_down and tx_timeout worker race cleaning the transmit buffers Herbert Xu
@ 2006-04-20 23:51   ` Herbert Xu
  2006-04-20 22:36     ` Michael Chan
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2006-04-20 23:51 UTC (permalink / raw)
  To: shawvrana; +Cc: netdev, Auke Kok, David S. Miller, Michael Chan

On Fri, Apr 21, 2006 at 09:36:31AM +1000, herbert wrote:
> 
> Yes that's definitely buggy.  There needs to be some form of
> synchronisation as the TG3 driver does.  However, to be frank
> I'm not too fond of what the TG3 driver does either.  Is there
> no better way than an msleep loop?

Actually TG3 is buggy too.  If the reset task is scheduled but
isn't running yet there is no synchronisation here to prevent the
reset task from running after tg3_close releases the tp lock.

It needs to kill the reset task and make sure it doesn't get
rescheduled by someone else.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-21  1:33         ` Herbert Xu
@ 2006-04-21  0:10           ` Michael Chan
  2006-04-21  2:37             ` Herbert Xu
                               ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Michael Chan @ 2006-04-21  0:10 UTC (permalink / raw)
  To: Herbert Xu; +Cc: shawvrana, netdev, Auke Kok, David S. Miller

On Fri, 2006-04-21 at 11:33 +1000, Herbert Xu wrote:

> Actually, what if the tg3_close is followed by a tg3_open? That could
> produce a spurious reset which I suppose isn't that bad.

Yes, an extra reset. And yes, it isn't too bad.

> Also if the
> module is unloaded bad things will happen as well.

In tg3_remove_one(), we call flush_scheduled_work() in case the
reset_task is still pending. Here, it is safe to call
flush_scheduled_work() because we're not holding the rtnl. Again, when
it runs, nothing bad will happen because it will see netif_running() ==
0.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-21  2:40               ` Herbert Xu
@ 2006-04-21  1:24                 ` Michael Chan
  2006-04-21 13:27                   ` Andy Gospodarek
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Chan @ 2006-04-21  1:24 UTC (permalink / raw)
  To: Herbert Xu; +Cc: shawvrana, netdev, auke-jan.h.kok, davem, jgarzik

On Fri, 2006-04-21 at 12:40 +1000, Herbert Xu wrote:

> One simple solution is to establish a separate queue for RTNL-holding
> users or vice versa for non-RTNL holding networking users.  That
> would allow the drivers to safely flush the non-RTNL queue while
> holding the RTNL.

You mean a separate workqueue for net drivers to use instead of the
keventd_wq? Yeah, I think that'll work. Each driver can also create its
own workqueue but that may be a bit more wasteful.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-20 22:36     ` Michael Chan
@ 2006-04-21  1:27       ` Herbert Xu
  2006-04-21  1:33         ` Herbert Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2006-04-21  1:27 UTC (permalink / raw)
  To: Michael Chan; +Cc: shawvrana, netdev, Auke Kok, David S. Miller

On Thu, Apr 20, 2006 at 03:36:57PM -0700, Michael Chan wrote:
>
> If we're in tg3_close() and the reset task isn't running yet, tg3_close
> () will proceed. However, when the reset task finally runs, it will see
> that netif_running() is zero and will just return.

Yes you're absolutely right.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-21  2:42             ` Shaw Vrana
@ 2006-04-21  1:33               ` Michael Chan
  0 siblings, 0 replies; 20+ messages in thread
From: Michael Chan @ 2006-04-21  1:33 UTC (permalink / raw)
  To: Shaw Vrana; +Cc: Herbert Xu, netdev, Auke Kok, David S. Miller

On Thu, 2006-04-20 at 19:42 -0700, Shaw Vrana wrote:

> I'll bite!  Here's a patch to add a call to flush_scheduled_work() in 
> e1000_down.  It's against 2.6.16.9.
> 
You're not following our discussion. It is not safe to call
flush_scheduled_work() in a driver's close() because it is holding the
rtnl and can deadlock with linkwatch_event() if it happens to be on the
workqueue.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-21  1:27       ` Herbert Xu
@ 2006-04-21  1:33         ` Herbert Xu
  2006-04-21  0:10           ` Michael Chan
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2006-04-21  1:33 UTC (permalink / raw)
  To: Michael Chan; +Cc: shawvrana, netdev, Auke Kok, David S. Miller

On Fri, Apr 21, 2006 at 11:27:01AM +1000, herbert wrote:
> On Thu, Apr 20, 2006 at 03:36:57PM -0700, Michael Chan wrote:
> >
> > If we're in tg3_close() and the reset task isn't running yet, tg3_close
> > () will proceed. However, when the reset task finally runs, it will see
> > that netif_running() is zero and will just return.
> 
> Yes you're absolutely right.

Actually, what if the tg3_close is followed by a tg3_open? That could
produce a spurious reset which I suppose isn't that bad.  Also if the
module is unloaded bad things will happen as well.  So I still don't
feel too comfortable about leaving it scheduled after a close.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-21  0:10           ` Michael Chan
@ 2006-04-21  2:37             ` Herbert Xu
  2006-04-21  2:40               ` Herbert Xu
  2006-04-21  2:42             ` Shaw Vrana
  2006-04-21  3:05             ` shaw
  2 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2006-04-21  2:37 UTC (permalink / raw)
  To: Michael Chan; +Cc: herbert, shawvrana, netdev, auke-jan.h.kok, davem, jgarzik

Michael Chan <mchan@broadcom.com> wrote:
> 
> In tg3_remove_one(), we call flush_scheduled_work() in case the
> reset_task is still pending. Here, it is safe to call

Great.

> flush_scheduled_work() because we're not holding the rtnl. Again, when

Hmm doing a quick grep seems to indicate that quite a number of drivers
do this in netdev->close or other callbacks under RTNL.  This means that
they're all vulnerable to the linkwatch deadlock that you alluded to.

Rather than dealing with this individually in each driver perhaps we should
come up with a more centralised solution?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-21  2:37             ` Herbert Xu
@ 2006-04-21  2:40               ` Herbert Xu
  2006-04-21  1:24                 ` Michael Chan
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2006-04-21  2:40 UTC (permalink / raw)
  To: Michael Chan; +Cc: shawvrana, netdev, auke-jan.h.kok, davem, jgarzik

On Fri, Apr 21, 2006 at 12:37:36PM +1000, Herbert Xu wrote:
> 
> Rather than dealing with this individually in each driver perhaps we should
> come up with a more centralised solution?

One simple solution is to establish a separate queue for RTNL-holding
users or vice versa for non-RTNL holding networking users.  That
would allow the drivers to safely flush the non-RTNL queue while
holding the RTNL.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-21  0:10           ` Michael Chan
  2006-04-21  2:37             ` Herbert Xu
@ 2006-04-21  2:42             ` Shaw Vrana
  2006-04-21  1:33               ` Michael Chan
  2006-04-21  3:05             ` shaw
  2 siblings, 1 reply; 20+ messages in thread
From: Shaw Vrana @ 2006-04-21  2:42 UTC (permalink / raw)
  To: Michael Chan; +Cc: Herbert Xu, shawvrana, netdev, Auke Kok, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 441 bytes --]

On Thursday 20 April 2006 17:10, Michael Chan wrote:
> In tg3_remove_one(), we call flush_scheduled_work() in case the
> reset_task is still pending. Here, it is safe to call
> flush_scheduled_work() because we're not holding the rtnl. Again, when
> it runs, nothing bad will happen because it will see netif_running() ==
> 0.

I'll bite!  Here's a patch to add a call to flush_scheduled_work() in 
e1000_down.  It's against 2.6.16.9.

Shaw

[-- Attachment #2: e1000_flush_in_close.patch --]
[-- Type: text/x-diff, Size: 614 bytes --]

diff -u -uprN -X linux-2.6.16.9/Documentation/dontdiff linux-2.6.16.9/drivers/net/e1000/e1000_main.c linux-2.6.16.9-patch/drivers/net/e1000/e1000_main.c
--- linux-2.6.16.9/drivers/net/e1000/e1000_main.c	2006-04-18 23:10:14.000000000 -0700
+++ linux-2.6.16.9-patch/drivers/net/e1000/e1000_main.c	2006-04-20 19:36:55.000000000 -0700
@@ -538,6 +538,7 @@ e1000_down(struct e1000_adapter *adapter
 	del_timer_sync(&adapter->tx_fifo_stall_timer);
 	del_timer_sync(&adapter->watchdog_timer);
 	del_timer_sync(&adapter->phy_info_timer);
+	flush_scheduled_work();	
 
 #ifdef CONFIG_E1000_NAPI
 	netif_poll_disable(netdev);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-21  0:10           ` Michael Chan
  2006-04-21  2:37             ` Herbert Xu
  2006-04-21  2:42             ` Shaw Vrana
@ 2006-04-21  3:05             ` shaw
  2 siblings, 0 replies; 20+ messages in thread
From: shaw @ 2006-04-21  3:05 UTC (permalink / raw)
  To: Michael Chan; +Cc: Herbert Xu, shawvrana, netdev, Auke Kok, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 602 bytes --]

I've replied to this once before, but haven't seen my last two emails on the 
list, so I'm sending again with different settings.  Sorry for the noise.

On Thursday 20 April 2006 17:10, Michael Chan wrote:
> In tg3_remove_one(), we call flush_scheduled_work() in case the
> reset_task is still pending. Here, it is safe to call
> flush_scheduled_work() because we're not holding the rtnl. Again, when
> it runs, nothing bad will happen because it will see netif_running() ==
> 0.

I'll bite!  Here's a patch to add a call to flush_scheduled_work() in 
e1000_down.  It's against 2.6.16.9.

Thanks,
Shaw

[-- Attachment #2: e1000_flush_in_close.patch --]
[-- Type: text/x-diff, Size: 614 bytes --]

diff -u -uprN -X linux-2.6.16.9/Documentation/dontdiff linux-2.6.16.9/drivers/net/e1000/e1000_main.c linux-2.6.16.9-patch/drivers/net/e1000/e1000_main.c
--- linux-2.6.16.9/drivers/net/e1000/e1000_main.c	2006-04-18 23:10:14.000000000 -0700
+++ linux-2.6.16.9-patch/drivers/net/e1000/e1000_main.c	2006-04-20 19:36:55.000000000 -0700
@@ -538,6 +538,7 @@ e1000_down(struct e1000_adapter *adapter
 	del_timer_sync(&adapter->tx_fifo_stall_timer);
 	del_timer_sync(&adapter->watchdog_timer);
 	del_timer_sync(&adapter->phy_info_timer);
+	flush_scheduled_work();	
 
 #ifdef CONFIG_E1000_NAPI
 	netif_poll_disable(netdev);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-21  1:24                 ` Michael Chan
@ 2006-04-21 13:27                   ` Andy Gospodarek
  2006-04-21 15:28                     ` Michael Chan
  0 siblings, 1 reply; 20+ messages in thread
From: Andy Gospodarek @ 2006-04-21 13:27 UTC (permalink / raw)
  To: Michael Chan
  Cc: Herbert Xu, shawvrana, netdev, auke-jan.h.kok, davem, jgarzik

On Thu, Apr 20, 2006 at 06:24:36PM -0700, Michael Chan wrote:
> On Fri, 2006-04-21 at 12:40 +1000, Herbert Xu wrote:
> 
> > One simple solution is to establish a separate queue for RTNL-holding
> > users or vice versa for non-RTNL holding networking users.  That
> > would allow the drivers to safely flush the non-RTNL queue while
> > holding the RTNL.
> 
> You mean a separate workqueue for net drivers to use instead of the
> keventd_wq? Yeah, I think that'll work. Each driver can also create its
> own workqueue but that may be a bit more wasteful.
> 

Isn't the only possibility for a linkwatch deadlock when the
__LINK_STATE_LINKWATCH_PENDING but is set in dev->state?  

Off the top of my head...

Would it be interesting to change the calls for flush_scheduled_work()
to a new function net_flush_scheduled_work() with the intent on
eventually creating new a new work queue but temporarily just checking
to make sure there are no linkwatch events pending and if there are
allowing them to run first before calling flush_scheduled_work()?

This probably isn't a perfect solution, but I thought I'd throw it out
there and see what you think....

-andy

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-21 13:27                   ` Andy Gospodarek
@ 2006-04-21 15:28                     ` Michael Chan
  2006-04-21 20:01                       ` Andy Gospodarek
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Chan @ 2006-04-21 15:28 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Herbert Xu, shawvrana, netdev, auke-jan.h.kok, davem, jgarzik

On Fri, 2006-04-21 at 09:27 -0400, Andy Gospodarek wrote:

> 
> Isn't the only possibility for a linkwatch deadlock when the
> __LINK_STATE_LINKWATCH_PENDING but is set in dev->state?  

This device that you're about to close may be on the linkwatch list, or
other devices may also be on the linkwatch list. As long as
linkwatch_event is in front of your driver's task on the same workqueue,
it will deadlock.

> 
> Off the top of my head...
> 
> Would it be interesting to change the calls for flush_scheduled_work()
> to a new function net_flush_scheduled_work() with the intent on
> eventually creating new a new work queue but temporarily just checking
> to make sure there are no linkwatch events pending and if there are
> allowing them to run first before calling flush_scheduled_work()?
> 

Using the same workqueue and holding the rtnl_lock, I'm not sure how you
can let linkwatch_event run first without deadlocking.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-21 20:01                       ` Andy Gospodarek
@ 2006-04-21 19:00                         ` Michael Chan
  2006-04-21 20:46                           ` Andy Gospodarek
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Chan @ 2006-04-21 19:00 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Herbert Xu, shawvrana, netdev, auke-jan.h.kok, davem, jgarzik

On Fri, 2006-04-21 at 16:01 -0400, Andy Gospodarek wrote:

> I just hate to see extra resources used to solve problems that good
> coding can solve (not that my suggestion is necessarily a 'good' one),
> so I was trying to think of a way to resolve this without explicitly
> adding another workqueue.

If you don't want to add another workqueue, then look at tg3, bnx2, and
one of the smc drivers on how to effectively wait for the driver's
workqueue task to finish without deadlocking with linkwatch_event.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-21 15:28                     ` Michael Chan
@ 2006-04-21 20:01                       ` Andy Gospodarek
  2006-04-21 19:00                         ` Michael Chan
  0 siblings, 1 reply; 20+ messages in thread
From: Andy Gospodarek @ 2006-04-21 20:01 UTC (permalink / raw)
  To: Michael Chan
  Cc: Herbert Xu, shawvrana, netdev, auke-jan.h.kok, davem, jgarzik

On 4/21/06, Michael Chan <mchan@broadcom.com> wrote:
> On Fri, 2006-04-21 at 09:27 -0400, Andy Gospodarek wrote:
>
> >
> > Isn't the only possibility for a linkwatch deadlock when the
> > __LINK_STATE_LINKWATCH_PENDING but is set in dev->state?
>
> This device that you're about to close may be on the linkwatch list, or
> other devices may also be on the linkwatch list. As long as
> linkwatch_event is in front of your driver's task on the same workqueue,
> it will deadlock.

My thought was that you would only try and service the linkwatch queue
if the device you are closing has the __LINK_STATE_LINKWATCH_PENDING
bit set in that device's dev->state.    I agree there still could be
problems with ordering of events on the workqueue, so it may make
sense to use a separate workqueue for linkwatch events if we can't
come up with a more consistent way to call flush_scheduled_work across
all drivers.

I just hate to see extra resources used to solve problems that good
coding can solve (not that my suggestion is necessarily a 'good' one),
so I was trying to think of a way to resolve this without explicitly
adding another workqueue.

>
> >
> > Off the top of my head...
> >
> > Would it be interesting to change the calls for flush_scheduled_work()
> > to a new function net_flush_scheduled_work() with the intent on
> > eventually creating new a new work queue but temporarily just checking
> > to make sure there are no linkwatch events pending and if there are
> > allowing them to run first before calling flush_scheduled_work()?
> >
>
> Using the same workqueue and holding the rtnl_lock, I'm not sure how you
> can let linkwatch_event run first without deadlocking.
>
>

You wouldn't call linkwatch_event, you would call linkwatch_run_queue
(with special care taken to make sure you take the rtnl_lock if
needed).  See a similar example for calling linkwatch_run_queue in
netdev_wait_allrefs.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-21 19:00                         ` Michael Chan
@ 2006-04-21 20:46                           ` Andy Gospodarek
  2006-04-27  0:14                             ` Shaw
  0 siblings, 1 reply; 20+ messages in thread
From: Andy Gospodarek @ 2006-04-21 20:46 UTC (permalink / raw)
  To: Michael Chan
  Cc: Herbert Xu, shawvrana, netdev, auke-jan.h.kok, davem, jgarzik

On 4/21/06, Michael Chan <mchan@broadcom.com> wrote:
> On Fri, 2006-04-21 at 16:01 -0400, Andy Gospodarek wrote:
>
> > I just hate to see extra resources used to solve problems that good
> > coding can solve (not that my suggestion is necessarily a 'good' one),
> > so I was trying to think of a way to resolve this without explicitly
> > adding another workqueue.
>
> If you don't want to add another workqueue, then look at tg3, bnx2, and
> one of the smc drivers on how to effectively wait for the driver's
> workqueue task to finish without deadlocking with linkwatch_event.
>

I agree 100%.  I just hope others can manage to figure that out too.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-21 20:46                           ` Andy Gospodarek
@ 2006-04-27  0:14                             ` Shaw
  2006-04-27  4:55                               ` Auke Kok
  0 siblings, 1 reply; 20+ messages in thread
From: Shaw @ 2006-04-27  0:14 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Michael Chan, Herbert Xu, netdev, auke-jan.h.kok, davem, jgarzik

[-- Attachment #1: Type: text/plain, Size: 1255 bytes --]

On 4/21/06, Andy Gospodarek <andy@greyhouse.net> wrote:
> On 4/21/06, Michael Chan <mchan@broadcom.com> wrote:
> > On Fri, 2006-04-21 at 16:01 -0400, Andy Gospodarek wrote:
> >
> > > I just hate to see extra resources used to solve problems that good
> > > coding can solve (not that my suggestion is necessarily a 'good' one),
> > > so I was trying to think of a way to resolve this without explicitly
> > > adding another workqueue.
> >
> > If you don't want to add another workqueue, then look at tg3, bnx2, and
> > one of the smc drivers on how to effectively wait for the driver's
> > workqueue task to finish without deadlocking with linkwatch_event.
> >
>
> I agree 100%.  I just hope others can manage to figure that out too.

Ok, here's another attempt.  The goal here is to serialize attempts to
clean the tx and rx buffers, and ensure that e1000_close is called
after the tx_timeout_task has completed running and/or that the task
is safe to run after e1000_close hasrun.

I'm concerned about the addition of the netif_running check to
e1000_down.  While something like this is needed, I'm not familiar
enough w/ the code to know if this is okay.
All explanations and comments are greatly appreciated.

Thanks,
Shaw

[-- Attachment #2: e1000.patch --]
[-- Type: text/x-patch, Size: 2736 bytes --]

diff -u -uprN -X linux-2.6.16.11/Documentation/dontdiff linux-2.6.16.11/drivers/net/e1000/e1000.h linux-2.6.16.11.e1000_patch/drivers/net/e1000/e1000.h
--- linux-2.6.16.11/drivers/net/e1000/e1000.h	2006-04-24 13:20:24.000000000 -0700
+++ linux-2.6.16.11.e1000_patch/drivers/net/e1000/e1000.h	2006-04-26 16:23:46.475842000 -0700
@@ -358,5 +358,8 @@ struct e1000_adapter {
 #ifdef CONFIG_PCI_MSI
 	boolean_t have_msi;
 #endif
+	uint32_t flags;
+#define E1000_CLEANING 0x00000001
+	spinlock_t clean_lock;
 };
 #endif /* _E1000_H_ */
diff -u -uprN -X linux-2.6.16.11/Documentation/dontdiff linux-2.6.16.11/drivers/net/e1000/e1000_main.c linux-2.6.16.11.e1000_patch/drivers/net/e1000/e1000_main.c
--- linux-2.6.16.11/drivers/net/e1000/e1000_main.c	2006-04-24 13:20:24.000000000 -0700
+++ linux-2.6.16.11.e1000_patch/drivers/net/e1000/e1000_main.c	2006-04-26 16:59:48.742905000 -0700
@@ -525,6 +525,16 @@ e1000_down(struct e1000_adapter *adapter
 	boolean_t mng_mode_enabled = (adapter->hw.mac_type >= e1000_82571) &&
 				     e1000_check_mng_mode(&adapter->hw);
 
+	spin_lock_bh(&adapter->clean_lock);
+	adapter->flags |= E1000_CLEANING;
+
+	if (!netif_running(netdev)) {
+	    adapter->flags &= ~E1000_CLEANING;
+	    spin_unlock_bh(&adapter->clean_lock);
+	    return;
+	}
+	spin_unlock_bh(&adapter->clean_lock);
+
 	e1000_irq_disable(adapter);
 #ifdef CONFIG_E1000_MQ
 	while (atomic_read(&adapter->rx_sched_call_data.count) != 0);
@@ -549,8 +559,12 @@ e1000_down(struct e1000_adapter *adapter
 	netif_stop_queue(netdev);
 
 	e1000_reset(adapter);
+
+	spin_lock_bh(&adapter->clean_lock);
 	e1000_clean_all_tx_rings(adapter);
 	e1000_clean_all_rx_rings(adapter);
+	adapter->flags &= ~E1000_CLEANING;
+	spin_unlock_bh(&adapter->clean_lock);
 
 	/* Power down the PHY so no link is implied when interface is down *
 	 * The PHY cannot be powered down if any of the following is TRUE *
@@ -1109,6 +1123,8 @@ e1000_sw_init(struct e1000_adapter *adap
 
 	atomic_set(&adapter->irq_sem, 1);
 	spin_lock_init(&adapter->stats_lock);
+	spin_lock_init(&adapter->clean_lock);
+	adapter->flags = 0;
 
 	return 0;
 }
@@ -1269,10 +1285,18 @@ e1000_close(struct net_device *netdev)
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 
+	/* Calling flush_scheduled_work() may deadlock because
+	 * linkwatch_event() may be on the workqueue and it will 
+	 * try to get the rtnl_lock which we are holding. */
+	while (adapter->flags & E1000_CLEANING) 
+	    msleep(1);
+
 	e1000_down(adapter);
 
+	spin_lock_bh(&adapter->clean_lock);
 	e1000_free_all_tx_resources(adapter);
 	e1000_free_all_rx_resources(adapter);
+	spin_unlock_bh(&adapter->clean_lock);
 
 	if ((adapter->hw.mng_cookie.status &
 			  E1000_MNG_DHCP_COOKIE_STATUS_VLAN_SUPPORT)) {



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-27  0:14                             ` Shaw
@ 2006-04-27  4:55                               ` Auke Kok
  2006-04-29 21:57                                 ` Shaw Vrana
  0 siblings, 1 reply; 20+ messages in thread
From: Auke Kok @ 2006-04-27  4:55 UTC (permalink / raw)
  To: Shaw
  Cc: Andy Gospodarek, Michael Chan, Herbert Xu, netdev, auke-jan.h.kok,
	davem, jgarzik

Shaw wrote:
> On 4/21/06, Andy Gospodarek <andy@greyhouse.net> wrote:
>> On 4/21/06, Michael Chan <mchan@broadcom.com> wrote:
>>> On Fri, 2006-04-21 at 16:01 -0400, Andy Gospodarek wrote:
>>>
>>>> I just hate to see extra resources used to solve problems that good
>>>> coding can solve (not that my suggestion is necessarily a 'good' one),
>>>> so I was trying to think of a way to resolve this without explicitly
>>>> adding another workqueue.
>>> If you don't want to add another workqueue, then look at tg3, bnx2, and
>>> one of the smc drivers on how to effectively wait for the driver's
>>> workqueue task to finish without deadlocking with linkwatch_event.
>>>
>> I agree 100%.  I just hope others can manage to figure that out too.
> 
> Ok, here's another attempt.  The goal here is to serialize attempts to
> clean the tx and rx buffers, and ensure that e1000_close is called
> after the tx_timeout_task has completed running and/or that the task
> is safe to run after e1000_close hasrun.
> 
> I'm concerned about the addition of the netif_running check to
> e1000_down.  While something like this is needed, I'm not familiar
> enough w/ the code to know if this is okay.
> All explanations and comments are greatly appreciated.

I apologise for not getting back on this earlier but Jesse Brandeburg and I 
have been digging into this for two days and making some big progress. One of 
the main fixes will be that we're taking out a watchdog reset task completely 
and doing down/up cycles instead, which removes a large portion of the race 
conditions at this stage completely (the tx_timeout triggers a watchdog reset 
which can happen during an e1000_down causing a double free interrupt, or a 
double allocation).

We're making good progress with this and are now working on removing the last 
race between the ioctl path and the ifdn/ifup stuff, where the last remaining 
race location is in the ethtool test which does all sorts of funny lowlevel 
driver stuff that can seriously OOPS if you're running ethtool tests while 
ifup/downing your interface.

While I appreciate patches ;^) I think we're on a better path by making these 
cleanups, and actually reducing the code in large places. I hope to be able to 
push something out for RFC soon. Added benefit will be that we're dropping a 
whole bunch of irq operations where we didn't need to (soft resets).

Cheers,

Auke

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: e1000_down and tx_timeout worker race cleaning the transmit buffers
  2006-04-27  4:55                               ` Auke Kok
@ 2006-04-29 21:57                                 ` Shaw Vrana
  0 siblings, 0 replies; 20+ messages in thread
From: Shaw Vrana @ 2006-04-29 21:57 UTC (permalink / raw)
  To: Auke Kok
  Cc: Andy Gospodarek, Michael Chan, Herbert Xu, netdev, auke-jan.h.kok,
	davem, jgarzik

Hi Auke,

On 4/26/06, Auke Kok <sofar@foo-projects.org> wrote:

> > I'm concerned about the addition of the netif_running check to
> > e1000_down.  While something like this is needed, I'm not familiar
> > enough w/ the code to know if this is okay.
> > All explanations and comments are greatly appreciated.
> While I appreciate patches ;^) I think we're on a better path by making these
> cleanups, and actually reducing the code in large places. I hope to be able to
> push something out for RFC soon. Added benefit will be that we're dropping a
> whole bunch of irq operations where we didn't need to (soft resets).

Well, it looks like my patch won't work as the e1000_close is called
by dev.c only after it clears the __LINK_STATE_START bit, which means
that e1000_down will exit prematurely when called from e1000_close in
my approach.  Any feedback on the approach would be appreciated, as
your upcoming patch sounds like it might be too aggressive to get put
into a stabilization patch.  ;)

I understand the need to fix the problems associated with the
watchdog_task as well, though I wonder if it wouldn't be better to
remove it altogether given the complexity of cleaning up after these
tasks in general.  I've personally had more problems with the watchdog
task than with the possible sleep in the watchdog timer code.  I can't
help but siding with the RedHat folks who currently ship a version of
the e1000 driver that fixes the mechanism used to sleep instead of the
watchdog_task approach.  Perhaps I missed the discussion of this, I'm
only finding the patch itself with google.

Thanks,
Shaw

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2006-04-29 21:57 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <200604201035.00100.shaw@vranix.com>
2006-04-20 23:36 ` e1000_down and tx_timeout worker race cleaning the transmit buffers Herbert Xu
2006-04-20 23:51   ` Herbert Xu
2006-04-20 22:36     ` Michael Chan
2006-04-21  1:27       ` Herbert Xu
2006-04-21  1:33         ` Herbert Xu
2006-04-21  0:10           ` Michael Chan
2006-04-21  2:37             ` Herbert Xu
2006-04-21  2:40               ` Herbert Xu
2006-04-21  1:24                 ` Michael Chan
2006-04-21 13:27                   ` Andy Gospodarek
2006-04-21 15:28                     ` Michael Chan
2006-04-21 20:01                       ` Andy Gospodarek
2006-04-21 19:00                         ` Michael Chan
2006-04-21 20:46                           ` Andy Gospodarek
2006-04-27  0:14                             ` Shaw
2006-04-27  4:55                               ` Auke Kok
2006-04-29 21:57                                 ` Shaw Vrana
2006-04-21  2:42             ` Shaw Vrana
2006-04-21  1:33               ` Michael Chan
2006-04-21  3:05             ` shaw

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).