[RFC PATCH] e100: Fix workqueue race

* [RFC PATCH] e100: Fix workqueue race
@ 2010-01-21 16:48 Alan Cox
  2010-01-21 17:20 ` Stephen Hemminger
  2010-01-22  8:42 ` Jarek Poplawski
  0 siblings, 2 replies; 5+ messages in thread
From: Alan Cox @ 2010-01-21 16:48 UTC
  To: jesse.brandeburg, netdev

(Incidentally this doesn't seem to be the only net driver that looks
suspect here)

e100: Fix the TX workqueue race

From: Alan Cox <alan@linux.intel.com>

Nothing stops the workqueue being left to run in parallel with close or a
few other operations. This causes double unmaps and the like.

See kerneloops.org #1041230 for an example

Signed-off-by: Alan Cox <alan@linux.intel.com>
---

 drivers/net/e100.c |   13 +++++++++++--
 1 files changed, 11 insertions(+), 2 deletions(-)


diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index 5c7a155..5e02e4f 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -2232,7 +2232,7 @@ err_rx_clean_list:
 	return err;
 }
 
-static void e100_down(struct nic *nic)
+static void e100_do_down(struct nic *nic)
 {
 	/* wait here for poll to complete */
 	napi_disable(&nic->napi);
@@ -2245,6 +2245,15 @@ static void e100_down(struct nic *nic)
 	e100_rx_clean_list(nic);
 }
 
+/* For the non TX timeout case we want to kill the tx timeout before
+   we do this otherwise a parallel tx timeout will make a nasty mess. */
+
+static void e100_down(struct nic *nic)
+{
+	cancel_work_sync(&nic->tx_timeout_task);
+	e100_do_down(nic);
+}
+
 static void e100_tx_timeout(struct net_device *netdev)
 {
 	struct nic *nic = netdev_priv(netdev);
@@ -2261,7 +2270,7 @@ static void e100_tx_timeout_task(struct work_struct *work)
 
 	DPRINTK(TX_ERR, DEBUG, "scb.status=0x%02X\n",
 		ioread8(&nic->csr->scb.status));
-	e100_down(netdev_priv(netdev));
+	e100_do_down(netdev_priv(netdev));
 	e100_up(netdev_priv(netdev));
 }
 

* Re: [RFC PATCH] e100: Fix workqueue race
  2010-01-21 16:48 [RFC PATCH] e100: Fix workqueue race Alan Cox
@ 2010-01-21 17:20 ` Stephen Hemminger
  2010-01-22  8:42 ` Jarek Poplawski
  1 sibling, 0 replies; 5+ messages in thread
From: Stephen Hemminger @ 2010-01-21 17:20 UTC
  To: Alan Cox; +Cc: jesse.brandeburg, netdev

On Thu, 21 Jan 2010 16:48:01 +0000
Alan Cox <alan@linux.intel.com> wrote:

> (Incidentally this doesn't seem to be the only net driver that looks
> suspect here)
> 
> e100: Fix the TX workqueue race
> 
> From: Alan Cox <alan@linux.intel.com>
> 
> Nothing stops the workqueue being left to run in parallel with close or a
> few other operations. This causes double unmaps and the like.
> 
> See kerneloops.org #1041230 for an example
> 
> Signed-off-by: Alan Cox <alan@linux.intel.com>

Most drivers solve this by getting rtnl_lock in the timeout work
function.
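
For illustration, the locking approach described above can be modelled in
userspace roughly as follows. This is only a sketch, not the e100 code: every
name is a hypothetical stand-in, a boolean replaces rtnl_lock, and the
"still running" check inside the work function is an added assumption rather
than something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: every name below is hypothetical, not a kernel API. */
struct model_nic {
	bool lock_held;     /* stands in for rtnl_lock being held */
	bool running;       /* stands in for netif_running(netdev) */
	bool mapped;        /* TX/RX buffers currently mapped */
	int resets;         /* down/up cycles completed by the timeout task */
	int double_unmaps;  /* teardowns attempted on already-unmapped buffers */
};

/* Single-threaded stand-ins for rtnl_lock()/rtnl_unlock(). */
static void model_lock(struct model_nic *nic)
{
	assert(!nic->lock_held);
	nic->lock_held = true;
}

static void model_unlock(struct model_nic *nic)
{
	assert(nic->lock_held);
	nic->lock_held = false;
}

static void model_up(struct model_nic *nic)
{
	nic->mapped = true;
}

static void model_down(struct model_nic *nic)
{
	if (!nic->mapped)
		nic->double_unmaps++;
	nic->mapped = false;
}

/* Open and close run with the lock held, mirroring callbacks invoked under rtnl. */
static void model_open(struct model_nic *nic)
{
	model_lock(nic);
	model_up(nic);
	nic->running = true;
	model_unlock(nic);
}

static void model_close(struct model_nic *nic)
{
	model_lock(nic);
	nic->running = false;
	model_down(nic);
	model_unlock(nic);
}

/* Timeout work: take the lock, then reset only if the device is still up. */
static void model_tx_timeout_task(struct model_nic *nic)
{
	model_lock(nic);
	if (nic->running) {
		model_down(nic);
		model_up(nic);
		nic->resets++;
	}
	model_unlock(nic);
}
```

In this model a timeout task that runs after model_close() returns without
touching the buffers, so double_unmaps stays at 0. With such a scheme the
close path cannot also wait for the work item while holding the same lock,
since the work item would be blocked on that lock.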


* Re: [RFC PATCH] e100: Fix workqueue race
  2010-01-21 16:48 [RFC PATCH] e100: Fix workqueue race Alan Cox
  2010-01-21 17:20 ` Stephen Hemminger
@ 2010-01-22  8:42 ` Jarek Poplawski
  2010-01-22  9:07   ` Jarek Poplawski
  1 sibling, 1 reply; 5+ messages in thread
From: Jarek Poplawski @ 2010-01-22  8:42 UTC
  To: Alan Cox; +Cc: jesse.brandeburg, netdev

On 21-01-2010 17:48, Alan Cox wrote:
> +/* For the non TX timeout case we want to kill the tx timeout before
> +   we do this otherwise a parallel tx timeout will make a nasty mess. */
> +
> +static void e100_down(struct nic *nic)
> +{
> +	cancel_work_sync(&nic->tx_timeout_task);

Can't tx_timeout_task be triggered just between these two calls here?

Jarek P.

> +	e100_do_down(nic);
> +}

* Re: [RFC PATCH] e100: Fix workqueue race
  2010-01-22  8:42 ` Jarek Poplawski
@ 2010-01-22  9:07   ` Jarek Poplawski
  2010-01-22  9:38     ` Jarek Poplawski
  0 siblings, 1 reply; 5+ messages in thread
From: Jarek Poplawski @ 2010-01-22  9:07 UTC
  To: Alan Cox; +Cc: jesse.brandeburg, netdev

On Fri, Jan 22, 2010 at 08:42:00AM +0000, Jarek Poplawski wrote:
> On 21-01-2010 17:48, Alan Cox wrote:
> > +/* For the non TX timeout case we want to kill the tx timeout before
> > +   we do this otherwise a parallel tx timeout will make a nasty mess. */
> > +
> > +static void e100_down(struct nic *nic)
> > +{
> > +	cancel_work_sync(&nic->tx_timeout_task);
> 
> Can't tx_timeout_task be triggered just between these two calls here?

More exactly: except when this is called from dev_close(), where it
should work OK (at least as long as tx_timeout_task doesn't take any
lock held here, especially rtnl_lock).

Jarek P.

> 
> > +	e100_do_down(nic);
> > +}

* Re: [RFC PATCH] e100: Fix workqueue race
  2010-01-22  9:07   ` Jarek Poplawski
@ 2010-01-22  9:38     ` Jarek Poplawski
  0 siblings, 0 replies; 5+ messages in thread
From: Jarek Poplawski @ 2010-01-22  9:38 UTC
  To: Alan Cox; +Cc: jesse.brandeburg, netdev

On Fri, Jan 22, 2010 at 09:07:31AM +0000, Jarek Poplawski wrote:
> On Fri, Jan 22, 2010 at 08:42:00AM +0000, Jarek Poplawski wrote:
> > On 21-01-2010 17:48, Alan Cox wrote:
> > > +/* For the non TX timeout case we want to kill the tx timeout before
> > > +   we do this otherwise a parallel tx timeout will make a nasty mess. */
> > > +
> > > +static void e100_down(struct nic *nic)
> > > +{
> > > +	cancel_work_sync(&nic->tx_timeout_task);
> > 
> > Can't tx_timeout_task be triggered just between these two calls here?
> 
> More exactly: except when this is called from dev_close(), where it
> should work OK (at least as long as tx_timeout_task doesn't take any
> lock held here, especially rtnl_lock).

Hmm... Even more exactly: since tx_timeout_task can be triggered by
more than just dev_watchdog(), dev_close() is suspect too.

Jarek P.
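
The window discussed in this subthread can be replayed as one fixed
interleaving in a small userspace model. This is a sketch under stated
assumptions: all names are hypothetical stand-ins, e100_tx_timeout() is
assumed to queue the work item with schedule_work(), and a boolean flag
replaces real concurrency.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: every name below is hypothetical, not a kernel API. */
struct model_nic {
	bool work_pending;  /* tx_timeout_task queued but not yet run */
	bool mapped;        /* TX/RX buffers currently mapped */
	int double_unmaps;  /* teardowns attempted on already-unmapped buffers */
};

/* Stand-in for the timeout handler queueing the work item. */
static void model_schedule_work(struct model_nic *nic)
{
	nic->work_pending = true;
}

/* Stand-in for cancel_work_sync(): only removes work queued so far. */
static void model_cancel_work_sync(struct model_nic *nic)
{
	nic->work_pending = false;
}

static void model_do_down(struct model_nic *nic)
{
	if (!nic->mapped)
		nic->double_unmaps++;
	nic->mapped = false;
}

static void model_up(struct model_nic *nic)
{
	assert(!nic->mapped);  /* in this model, up always follows a down */
	nic->mapped = true;
}

/* Body of the timeout work item: do_down followed by up, as in the patch. */
static void model_run_pending_work(struct model_nic *nic)
{
	if (!nic->work_pending)
		return;
	nic->work_pending = false;
	model_do_down(nic);
	model_up(nic);
}

/*
 * Shape of e100_down() from the patch. The flag replays a timeout that
 * fires after the cancel but before the teardown.
 */
static void model_down(struct model_nic *nic, bool timeout_fires_in_window)
{
	model_cancel_work_sync(nic);
	if (timeout_fires_in_window)
		model_schedule_work(nic);
	model_do_down(nic);
}
```

When the timeout fires inside the window, the late work item tears down
buffers that are already gone and then brings the device back up, which is
the kind of mess the patch comment warns about. Without the late timeout the
model ends with the device down and no double unmap.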
