From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jarek Poplawski
Subject: Re: [RFC PATCH] e100: Fix workqueue race
Date: Fri, 22 Jan 2010 09:07:31 +0000
Message-ID: <20100122090731.GC6200@ff.dom.local>
References: <20100121164801.416170b9@linux.intel.com> <20100122084200.GB6200@ff.dom.local>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: jesse.brandeburg@intel.com, netdev@vger.kernel.org
To: Alan Cox
Return-path:
Received: from mail-fx0-f223.google.com ([209.85.220.223]:52301 "EHLO mail-fx0-f223.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751844Ab0AVJHj (ORCPT ); Fri, 22 Jan 2010 04:07:39 -0500
Received: by fxm23 with SMTP id 23so1050290fxm.38 for ; Fri, 22 Jan 2010 01:07:37 -0800 (PST)
Content-Disposition: inline
In-Reply-To: <20100122084200.GB6200@ff.dom.local>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Fri, Jan 22, 2010 at 08:42:00AM +0000, Jarek Poplawski wrote:
> On 21-01-2010 17:48, Alan Cox wrote:
> > (Incidentally this doesn't seem to be the only net driver that looks
> > suspect here)
> >
> > e100: Fix the TX workqueue race
> >
> > From: Alan Cox
> >
> > Nothing stops the workqueue being left to run in parallel with close or a
> > few other operations. This causes double unmaps and the like.
> >
> > See kerneloops.org #1041230 for an example
> >
> > Signed-off-by: Alan Cox
> > ---
> >
> >  drivers/net/e100.c |   13 +++++++++++--
> >  1 files changed, 11 insertions(+), 2 deletions(-)
> >
> >
> > diff --git a/drivers/net/e100.c b/drivers/net/e100.c
> > index 5c7a155..5e02e4f 100644
> > --- a/drivers/net/e100.c
> > +++ b/drivers/net/e100.c
> > @@ -2232,7 +2232,7 @@ err_rx_clean_list:
> >  	return err;
> >  }
> >
> > -static void e100_down(struct nic *nic)
> > +static void e100_do_down(struct nic *nic)
> >  {
> >  	/* wait here for poll to complete */
> >  	napi_disable(&nic->napi);
> > @@ -2245,6 +2245,15 @@ static void e100_down(struct nic *nic)
> >  	e100_rx_clean_list(nic);
> >  }
> >
> > +/* For the non TX timeout case we want to kill the tx timeout before
> > +   we do this otherwise a parallel tx timeout will make a nasty mess. */
> > +
> > +static void e100_down(struct nic *nic)
> > +{
> > +	cancel_work_sync(&nic->tx_timeout_task);
>
> Can't tx_timeout_task be triggered just between these two calls here?

More exactly: except when this is called from dev_close(), where it
should work OK. (At least as long as tx_timeout_task doesn't take any
lock held here - especially rtnl_lock.)

Jarek P.

>
> > +	e100_do_down(nic);
> > +}
> > +
> >  static void e100_tx_timeout(struct net_device *netdev)
> >  {
> >  	struct nic *nic = netdev_priv(netdev);
> > @@ -2261,7 +2270,7 @@ static void e100_tx_timeout_task(struct work_struct *work)
> >
> >  	DPRINTK(TX_ERR, DEBUG, "scb.status=0x%02X\n",
> >  		ioread8(&nic->csr->scb.status));
> > -	e100_down(netdev_priv(netdev));
> > +	e100_do_down(netdev_priv(netdev));
> >  	e100_up(netdev_priv(netdev));
> >  }
> >
> > --
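
To make the window between the two calls concrete, here is a toy
userspace model in C. It is only a sketch, not kernel code: struct
toy_work, toy_schedule(), toy_cancel_sync() and toy_down() are
hypothetical stand-ins for the work item, schedule_work(),
cancel_work_sync() and the patched e100_down(), and the tx timeout
firing is simulated by an explicit flag rather than a real timer.

```c
#include <stdbool.h>

/* Toy stand-in for a work item: it only tracks whether it is queued. */
struct toy_work {
	bool pending;
};

/* Hypothetical model of schedule_work(): queue the item unless already queued. */
static bool toy_schedule(struct toy_work *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Hypothetical model of cancel_work_sync(): drop a queued item. */
static bool toy_cancel_sync(struct toy_work *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	return was_pending;
}

/*
 * Model of the patched e100_down(): cancel first, then tear down.
 * timeout_fires_in_window simulates the tx timeout handler running
 * between the two calls; nothing in this model prevents that.
 * Returns true if a work item is still queued after the teardown step.
 */
static bool toy_down(struct toy_work *w, bool timeout_fires_in_window)
{
	toy_cancel_sync(w);
	if (timeout_fires_in_window)
		toy_schedule(w);	/* the window asked about above */
	/* the e100_do_down() equivalent would run here */
	return w->pending;
}
```

In this model toy_down() leaves nothing queued when the timeout stays
quiet, but returns with the work item queued again when the timeout
fires inside the window, so the task could still run after the
teardown.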