From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jarek Poplawski
Subject: Re: [RFC PATCH] e100: Fix workqueue race
Date: Fri, 22 Jan 2010 09:38:34 +0000
Message-ID: <20100122093834.GA7629@ff.dom.local>
References: <20100121164801.416170b9@linux.intel.com> <20100122084200.GB6200@ff.dom.local> <20100122090731.GC6200@ff.dom.local>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: jesse.brandeburg@intel.com, netdev@vger.kernel.org
To: Alan Cox
Return-path:
Received: from mail-fx0-f223.google.com ([209.85.220.223]:53233 "EHLO mail-fx0-f223.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753118Ab0AVJik (ORCPT ); Fri, 22 Jan 2010 04:38:40 -0500
Received: by fxm23 with SMTP id 23so1080008fxm.38 for ; Fri, 22 Jan 2010 01:38:39 -0800 (PST)
Content-Disposition: inline
In-Reply-To: <20100122090731.GC6200@ff.dom.local>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Fri, Jan 22, 2010 at 09:07:31AM +0000, Jarek Poplawski wrote:
> On Fri, Jan 22, 2010 at 08:42:00AM +0000, Jarek Poplawski wrote:
> > On 21-01-2010 17:48, Alan Cox wrote:
> > > (Incidentally this doesn't seem to be the only net driver that looks
> > > suspect here)
> > >
> > > e100: Fix the TX workqueue race
> > >
> > > From: Alan Cox
> > >
> > > Nothing stops the workqueue being left to run in parallel with close or a
> > > few other operations. This causes double unmaps and the like.
> > >
> > > See kerneloops.org #1041230 for an example
> > >
> > > Signed-off-by: Alan Cox
> > > ---
> > >
> > >  drivers/net/e100.c |   13 +++++++++++--
> > >  1 files changed, 11 insertions(+), 2 deletions(-)
> > >
> > >
> > > diff --git a/drivers/net/e100.c b/drivers/net/e100.c
> > > index 5c7a155..5e02e4f 100644
> > > --- a/drivers/net/e100.c
> > > +++ b/drivers/net/e100.c
> > > @@ -2232,7 +2232,7 @@ err_rx_clean_list:
> > >  	return err;
> > >  }
> > >
> > > -static void e100_down(struct nic *nic)
> > > +static void e100_do_down(struct nic *nic)
> > >  {
> > >  	/* wait here for poll to complete */
> > >  	napi_disable(&nic->napi);
> > > @@ -2245,6 +2245,15 @@ static void e100_down(struct nic *nic)
> > >  	e100_rx_clean_list(nic);
> > >  }
> > >
> > > +/* For the non TX timeout case we want to kill the tx timeout before
> > > +   we do this otherwise a parallel tx timeout will make a nasty mess. */
> > > +
> > > +static void e100_down(struct nic *nic)
> > > +{
> > > +	cancel_work_sync(&nic->tx_timeout_task);
> >
> > Can't tx_timeout_task be triggered just between these two calls here?
>
> More exactly: except when this is called from dev_close(), where it
> should work OK. (At least until tx_timeout_task doesn't take any lock
> held here - especially rtnl_lock.)

Hmm... Even more exactly, since tx_timeout_task can be triggered not
only by dev_watchdog(), dev_close() is suspicious too.

Jarek P.
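
To make the window in question concrete, here is a minimal user-space
sketch. It is only a model under stated assumptions: every name in it
(model_nic, model_schedule_timeout(), the going_down flag, and so on) is
hypothetical and is neither a kernel API nor part of the patch. It
replays one interleaving deterministically: something queues the
timeout work after the cancel but before the teardown. The "guarded"
variant shows one possible way to close the window, not necessarily the
right fix for e100.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space model only: none of these names are kernel APIs. */
struct model_nic {
	bool work_pending;   /* models a queued tx_timeout_task */
	bool going_down;     /* hypothetical guard flag, not in the patch */
	bool resources_live; /* models the rings that the down path unmaps */
};

/* Models any caller that queues the timeout work. */
static void model_schedule_timeout(struct model_nic *nic, bool check_guard)
{
	assert(nic != NULL);
	if (check_guard && nic->going_down)
		return; /* refuse to queue while teardown is in progress */
	nic->work_pending = true;
}

/* Models cancel_work_sync(): on return nothing is queued. */
static void model_cancel_work(struct model_nic *nic)
{
	nic->work_pending = false;
}

/* Models e100_do_down(): releases what the work item would touch. */
static void model_do_down(struct model_nic *nic)
{
	nic->resources_live = false;
}

/*
 * Replays the interleaving from this thread: a timeout source fires in
 * the window between the cancel and the teardown. Returns true if work
 * is left queued against torn-down resources.
 */
static bool model_down_leaves_stale_work(bool guarded)
{
	struct model_nic nic = { false, false, true };

	if (guarded)
		nic.going_down = true;         /* set before cancelling */
	model_cancel_work(&nic);
	model_schedule_timeout(&nic, guarded); /* fires inside the window */
	model_do_down(&nic);

	return nic.work_pending && !nic.resources_live;
}
```

Calling model_down_leaves_stale_work(false) returns true (the work is
queued again after the cancel), while model_down_leaves_stale_work(true)
returns false. In real concurrent code the flag update and the check
would also need proper locking or ordering, which this single-threaded
model leaves out.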