From mboxrd@z Thu Jan 1 00:00:00 1970
From: Matt Mackall
Subject: Re: serious netpoll bug w/NAPI
Date: Wed, 9 Feb 2005 10:32:19 -0800
Message-ID: <20050209183219.GA2366@waste.org>
References: <20050208201634.03074349.davem@davemloft.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: netdev@oss.sgi.com
To: "David S. Miller", Jeff Moyer
Content-Disposition: inline
In-Reply-To: <20050208201634.03074349.davem@davemloft.net>
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
List-Id: netdev.vger.kernel.org

On Tue, Feb 08, 2005 at 08:16:34PM -0800, David S. Miller wrote:
>
> Consider a NAPI device currently executing it's poll function,
> pushing SKBs into the networking stack.
>
> Some of these will generate response packets etc.
>
> If for some reason a printk() is generated by the packet processing
> and:
>
> 1) the netconsole output device is the same as the NAPI device
>    processing packets
>
> 2) netif_queue_stopped() is true because the tx queue is full
>
> the netpoll code will recurse back into the driver's poll function.
> This is incredibly illegal and results in all kinds of driver state
> corruption. ->poll() must execute only once at a time.

On closer inspection, there's a couple other related failure cases
with the new ->poll logic in netpoll.

I'm afraid it looks like CONFIG_NETPOLL will need to guard ->poll()
with a per-device spinlock on netpoll-enabled devices. This will mean
putting a pointer to struct netpoll in struct net_device (which I
should have done in the first place) and will take a few patches to
sort out.

-- 
Mathematics is the supreme nostalgia of our time.
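
For illustration only, here is a rough user-space sketch of the guard described above: struct net_device carries a pointer to struct netpoll, and ->poll() is entered only under a per-device lock. It is not the eventual patch. The structs are simplified stand-ins for the kernel's, the lock is modelled as a C11 atomic_flag used as a try-lock (an assumption, so that a recursive call on the same CPU is refused instead of deadlocking), and the names guarded_poll, demo_poll, demo_reentrancy, poll_lock, polls_run and polls_refused are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct net_device;

/* Hypothetical, heavily simplified stand-in for struct netpoll. */
struct netpoll {
    atomic_flag poll_lock;     /* models the per-device spinlock */
    atomic_int polls_run;      /* times ->poll() was actually entered */
    atomic_int polls_refused;  /* times a concurrent/recursive poll was refused */
};

/* Hypothetical, heavily simplified stand-in for struct net_device. */
struct net_device {
    struct netpoll *np;                   /* NULL if netpoll is not enabled */
    int (*poll)(struct net_device *dev);  /* driver's NAPI-style poll hook */
};

/*
 * Enter dev->poll() only if nobody else is inside it.
 * Returns true if the poll ran, false if it was refused.
 * Assumption: a try-lock, so same-context recursion backs off.
 */
static bool guarded_poll(struct net_device *dev)
{
    assert(dev != NULL && dev->poll != NULL);

    struct netpoll *np = dev->np;
    if (np == NULL) {
        /* Not a netpoll-enabled device: no guard is applied. */
        dev->poll(dev);
        return true;
    }

    if (atomic_flag_test_and_set(&np->poll_lock)) {
        /* ->poll() is already executing for this device. */
        atomic_fetch_add(&np->polls_refused, 1);
        return false;
    }

    atomic_fetch_add(&np->polls_run, 1);
    dev->poll(dev);
    atomic_flag_clear(&np->poll_lock);
    return true;
}

/*
 * Demo poll routine: stands in for packet processing that triggers a
 * printk() while the tx queue is full, which makes the netconsole path
 * try to poll the same device again.
 */
static int demo_poll(struct net_device *dev)
{
    guarded_poll(dev);  /* the would-be recursion */
    return 0;
}

/* Runs one outer poll; returns how many times ->poll() was entered. */
int demo_reentrancy(int *refused)
{
    struct netpoll np = { .poll_lock = ATOMIC_FLAG_INIT };
    struct net_device dev = { .np = &np, .poll = demo_poll };

    atomic_init(&np.polls_run, 0);
    atomic_init(&np.polls_refused, 0);

    guarded_poll(&dev);

    if (refused != NULL)
        *refused = atomic_load(&np.polls_refused);
    return atomic_load(&np.polls_run);
}
```

Calling demo_reentrancy() drives one outer poll whose body attempts to poll the same device again; the nested attempt is counted as refused and the poll body is entered exactly once. A real kernel version would also have to decide what the refused caller does with its pending output, which this sketch leaves out.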