* [RFC PATCH] e100: Fix workqueue race
From: Alan Cox @ 2010-01-21 16:48 UTC
To: jesse.brandeburg, netdev
(Incidentally this doesn't seem to be the only net driver that looks
suspect here)
e100: Fix the TX workqueue race
From: Alan Cox <alan@linux.intel.com>
Nothing stops the workqueue being left to run in parallel with close or a
few other operations. This causes double unmaps and the like.
See kerneloops.org #1041230 for an example
Signed-off-by: Alan Cox <alan@linux.intel.com>
---
drivers/net/e100.c | 13 +++++++++++--
1 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index 5c7a155..5e02e4f 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -2232,7 +2232,7 @@ err_rx_clean_list:
 	return err;
 }
 
-static void e100_down(struct nic *nic)
+static void e100_do_down(struct nic *nic)
 {
 	/* wait here for poll to complete */
 	napi_disable(&nic->napi);
@@ -2245,6 +2245,15 @@ static void e100_down(struct nic *nic)
 	e100_rx_clean_list(nic);
 }
 
+/* For the non TX timeout case we want to kill the tx timeout before
+   we do this otherwise a parallel tx timeout will make a nasty mess. */
+
+static void e100_down(struct nic *nic)
+{
+	cancel_work_sync(&nic->tx_timeout_task);
+	e100_do_down(nic);
+}
+
 static void e100_tx_timeout(struct net_device *netdev)
 {
 	struct nic *nic = netdev_priv(netdev);
@@ -2261,7 +2270,7 @@ static void e100_tx_timeout_task(struct work_struct *work)
 
 	DPRINTK(TX_ERR, DEBUG, "scb.status=0x%02X\n",
 		ioread8(&nic->csr->scb.status));
-	e100_down(netdev_priv(netdev));
+	e100_do_down(netdev_priv(netdev));
 	e100_up(netdev_priv(netdev));
 }
 
* Re: [RFC PATCH] e100: Fix workqueue race
From: Stephen Hemminger @ 2010-01-21 17:20 UTC
To: Alan Cox; +Cc: jesse.brandeburg, netdev
On Thu, 21 Jan 2010 16:48:01 +0000
Alan Cox <alan@linux.intel.com> wrote:
> (Incidentally this doesn't seem to be the only net driver that looks
> suspect here)
>
> e100: Fix the TX workqueue race
>
> From: Alan Cox <alan@linux.intel.com>
>
> Nothing stops the workqueue being left to run in parallel with close or a
> few other operations. This causes double unmaps and the like.
>
> See kerneloops.org #1041230 for an example
>
> Signed-off-by: Alan Cox <alan@linux.intel.com>
Most drivers solve this by getting rtnl_lock in the timeout work
function.
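
The locking pattern described above can be modelled in userspace. This is only a sketch under stated assumptions, not e100 code: cfg_lock is a hypothetical pthread mutex standing in for rtnl_lock, and struct nic_model and the model_* functions are invented for illustration. The point it tries to show is that the work function takes the lock and then re-checks whether the device is still running before it resets anything.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for rtnl_lock: one mutex that serialises all
 * configuration paths (close and timeout recovery in this model). */
static pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER;

struct nic_model {
	bool running;	/* models the result of netif_running() */
	int resets;	/* down/up cycles performed by the timeout work */
};

/* Models the close path. The caller must already hold cfg_lock, in the
 * same way a driver's stop routine is entered with rtnl_lock held. */
static void model_close_locked(struct nic_model *nic)
{
	nic->running = false;
}

/* Models the timeout work function: take the lock, then re-check state
 * so that a late work item does nothing on a closed device. */
static void model_tx_timeout_work(struct nic_model *nic)
{
	pthread_mutex_lock(&cfg_lock);
	if (nic->running)
		nic->resets++;	/* stands in for the down/up reset */
	pthread_mutex_unlock(&cfg_lock);
}

/* Runs one timeout while up and one after close; returns reset count. */
static int model_rtnl_demo(void)
{
	struct nic_model nic = { .running = true, .resets = 0 };

	model_tx_timeout_work(&nic);	/* device up: reset happens */
	assert(nic.resets == 1);

	pthread_mutex_lock(&cfg_lock);
	model_close_locked(&nic);
	pthread_mutex_unlock(&cfg_lock);

	model_tx_timeout_work(&nic);	/* late work: device already closed */
	return nic.resets;
}
```

In this model the second call is harmless because the state check happens under the same lock the close path holds, so no cancel call is needed on the close side.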
* Re: [RFC PATCH] e100: Fix workqueue race
From: Jarek Poplawski @ 2010-01-22 8:42 UTC
To: Alan Cox; +Cc: jesse.brandeburg, netdev
On 21-01-2010 17:48, Alan Cox wrote:
> (Incidentally this doesn't seem to be the only net driver that looks
> suspect here)
>
> e100: Fix the TX workqueue race
>
> From: Alan Cox <alan@linux.intel.com>
>
> Nothing stops the workqueue being left to run in parallel with close or a
> few other operations. This causes double unmaps and the like.
>
> See kerneloops.org #1041230 for an example
>
> Signed-off-by: Alan Cox <alan@linux.intel.com>
> ---
>
> drivers/net/e100.c | 13 +++++++++++--
> 1 files changed, 11 insertions(+), 2 deletions(-)
>
>
> diff --git a/drivers/net/e100.c b/drivers/net/e100.c
> index 5c7a155..5e02e4f 100644
> --- a/drivers/net/e100.c
> +++ b/drivers/net/e100.c
> @@ -2232,7 +2232,7 @@ err_rx_clean_list:
> return err;
> }
>
> -static void e100_down(struct nic *nic)
> +static void e100_do_down(struct nic *nic)
> {
> /* wait here for poll to complete */
> napi_disable(&nic->napi);
> @@ -2245,6 +2245,15 @@ static void e100_down(struct nic *nic)
> e100_rx_clean_list(nic);
> }
>
> +/* For the non TX timeout case we want to kill the tx timeout before
> + we do this otherwise a parallel tx timeout will make a nasty mess. */
> +
> +static void e100_down(struct nic *nic)
> +{
> + cancel_work_sync(&nic->tx_timeout_task);
Can't tx_timeout_task be triggered just between these two calls here?
Jarek P.
> + e100_do_down(nic);
> +}
> +
> static void e100_tx_timeout(struct net_device *netdev)
> {
> struct nic *nic = netdev_priv(netdev);
> @@ -2261,7 +2270,7 @@ static void e100_tx_timeout_task(struct work_struct *work)
>
> DPRINTK(TX_ERR, DEBUG, "scb.status=0x%02X\n",
> ioread8(&nic->csr->scb.status));
> - e100_down(netdev_priv(netdev));
> + e100_do_down(netdev_priv(netdev));
> e100_up(netdev_priv(netdev));
> }
>
> --
* Re: [RFC PATCH] e100: Fix workqueue race
From: Jarek Poplawski @ 2010-01-22 9:07 UTC
To: Alan Cox; +Cc: jesse.brandeburg, netdev
On Fri, Jan 22, 2010 at 08:42:00AM +0000, Jarek Poplawski wrote:
> On 21-01-2010 17:48, Alan Cox wrote:
> > (Incidentally this doesn't seem to be the only net driver that looks
> > suspect here)
> >
> > e100: Fix the TX workqueue race
> >
> > From: Alan Cox <alan@linux.intel.com>
> >
> > Nothing stops the workqueue being left to run in parallel with close or a
> > few other operations. This causes double unmaps and the like.
> >
> > See kerneloops.org #1041230 for an example
> >
> > Signed-off-by: Alan Cox <alan@linux.intel.com>
> > ---
> >
> > drivers/net/e100.c | 13 +++++++++++--
> > 1 files changed, 11 insertions(+), 2 deletions(-)
> >
> >
> > diff --git a/drivers/net/e100.c b/drivers/net/e100.c
> > index 5c7a155..5e02e4f 100644
> > --- a/drivers/net/e100.c
> > +++ b/drivers/net/e100.c
> > @@ -2232,7 +2232,7 @@ err_rx_clean_list:
> > return err;
> > }
> >
> > -static void e100_down(struct nic *nic)
> > +static void e100_do_down(struct nic *nic)
> > {
> > /* wait here for poll to complete */
> > napi_disable(&nic->napi);
> > @@ -2245,6 +2245,15 @@ static void e100_down(struct nic *nic)
> > e100_rx_clean_list(nic);
> > }
> >
> > +/* For the non TX timeout case we want to kill the tx timeout before
> > + we do this otherwise a parallel tx timeout will make a nasty mess. */
> > +
> > +static void e100_down(struct nic *nic)
> > +{
> > + cancel_work_sync(&nic->tx_timeout_task);
>
> Can't tx_timeout_task be triggered just between these two calls here?
More exactly: except when this is called from dev_close(), where it
should work OK (at least as long as tx_timeout_task doesn't take any
lock held here, especially rtnl_lock).
Jarek P.
>
> > + e100_do_down(nic);
> > +}
> > +
> > static void e100_tx_timeout(struct net_device *netdev)
> > {
> > struct nic *nic = netdev_priv(netdev);
> > @@ -2261,7 +2270,7 @@ static void e100_tx_timeout_task(struct work_struct *work)
> >
> > DPRINTK(TX_ERR, DEBUG, "scb.status=0x%02X\n",
> > ioread8(&nic->csr->scb.status));
> > - e100_down(netdev_priv(netdev));
> > + e100_do_down(netdev_priv(netdev));
> > e100_up(netdev_priv(netdev));
> > }
> >
> > --
* Re: [RFC PATCH] e100: Fix workqueue race
From: Jarek Poplawski @ 2010-01-22 9:38 UTC
To: Alan Cox; +Cc: jesse.brandeburg, netdev
On Fri, Jan 22, 2010 at 09:07:31AM +0000, Jarek Poplawski wrote:
> On Fri, Jan 22, 2010 at 08:42:00AM +0000, Jarek Poplawski wrote:
> > On 21-01-2010 17:48, Alan Cox wrote:
> > > (Incidentally this doesn't seem to be the only net driver that looks
> > > suspect here)
> > >
> > > e100: Fix the TX workqueue race
> > >
> > > From: Alan Cox <alan@linux.intel.com>
> > >
> > > Nothing stops the workqueue being left to run in parallel with close or a
> > > few other operations. This causes double unmaps and the like.
> > >
> > > See kerneloops.org #1041230 for an example
> > >
> > > Signed-off-by: Alan Cox <alan@linux.intel.com>
> > > ---
> > >
> > > drivers/net/e100.c | 13 +++++++++++--
> > > 1 files changed, 11 insertions(+), 2 deletions(-)
> > >
> > >
> > > diff --git a/drivers/net/e100.c b/drivers/net/e100.c
> > > index 5c7a155..5e02e4f 100644
> > > --- a/drivers/net/e100.c
> > > +++ b/drivers/net/e100.c
> > > @@ -2232,7 +2232,7 @@ err_rx_clean_list:
> > > return err;
> > > }
> > >
> > > -static void e100_down(struct nic *nic)
> > > +static void e100_do_down(struct nic *nic)
> > > {
> > > /* wait here for poll to complete */
> > > napi_disable(&nic->napi);
> > > @@ -2245,6 +2245,15 @@ static void e100_down(struct nic *nic)
> > > e100_rx_clean_list(nic);
> > > }
> > >
> > > +/* For the non TX timeout case we want to kill the tx timeout before
> > > + we do this otherwise a parallel tx timeout will make a nasty mess. */
> > > +
> > > +static void e100_down(struct nic *nic)
> > > +{
> > > + cancel_work_sync(&nic->tx_timeout_task);
> >
> > Can't tx_timeout_task be triggered just between these two calls here?
>
> More exactly: except when this is called from dev_close(), where it
> should work OK (at least as long as tx_timeout_task doesn't take any
> lock held here, especially rtnl_lock).
Hmm... Even more exactly, since tx_timeout_task can be triggered not
only by dev_watchdog(), dev_close() is suspicious too.
Jarek P.
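
The window discussed in this sub-thread can be written down as a tiny deterministic model. It is a sketch under assumptions, not kernel code: struct work_model and the model_* helpers are hypothetical, a single pending flag stands in for the queued state of tx_timeout_task, and the timeout_fires_in_window parameter plays the role of any caller that schedules the work after the cancel has returned.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one work item: only its queued state is tracked. */
struct work_model {
	bool pending;
};

/* Models scheduling the timeout work. */
static void model_schedule(struct work_model *w)
{
	w->pending = true;
}

/* Models cancelling the work and waiting for it: nothing is queued
 * at the moment this returns. */
static void model_cancel_sync(struct work_model *w)
{
	w->pending = false;
}

/* Models the patched down path. Returns true if the work is queued
 * again by the time the actual teardown would begin. */
static bool model_down(struct work_model *w, bool timeout_fires_in_window)
{
	model_cancel_sync(w);
	if (timeout_fires_in_window)
		model_schedule(w);	/* re-queued after the cancel returned */
	/* the e100_do_down() step would start here */
	return w->pending;
}

/* Returns 1 if the re-queue case leaves work pending during teardown. */
static int model_window_demo(void)
{
	struct work_model w = { .pending = true };

	assert(!model_down(&w, false));	/* no re-queue: teardown is safe */
	return model_down(&w, true) ? 1 : 0;
}
```

The cancel only guarantees that nothing is queued at the instant it returns. Unless something else prevents re-scheduling, the work can be pending again while the teardown runs, which is the case the model reports.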