From mboxrd@z Thu Jan 1 00:00:00 1970
From: "David S. Miller"
Subject: serious netpoll bug w/NAPI
Date: Tue, 8 Feb 2005 20:16:34 -0800
Message-ID: <20050208201634.03074349.davem@davemloft.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: mpm@selenic.com
To: netdev@oss.sgi.com
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
List-Id: netdev.vger.kernel.org

Consider a NAPI device currently executing its poll function,
pushing SKBs into the networking stack. Some of these will
generate response packets etc.

If for some reason a printk() is generated by the packet processing
and:

1) the netconsole output device is the same as the NAPI device
   processing packets
2) netif_queue_stopped() is true because the tx queue is full

the netpoll code will recurse back into the driver's poll function.
This is incredibly illegal and results in all kinds of driver state
corruption. ->poll() must execute only once at a time.

This situation is actually quite common, via the ipt_LOG.c packet
logging module.

What the netpoll code appears to be trying to do is get the TX queue
to make forward progress by invoking ->poll() if pending. The trouble
is that ->poll() at the top level will not clear the
__LINK_STATE_RX_SCHED bit and delete itself from the poll list until
it is done with ->poll() processing.

So we get backtraces like:

	tg3_rx()
	tg3_poll()
	poll_napi()
	netpoll_poll()
	write_msg()
	..
	printk()
	...
	ip_rcv()
	...
	netif_receive_skb()
	tg3_rx()
	tg3_poll()
	net_rx_action()
	__do_softirq()
	do_softirq()

resulting in RX queue corruption in the driver and usually NULL skb
pointer dereferences.
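
To make the re-entry concrete, here is a minimal user-space model of the
sequence described above. It is only a sketch: every name in it
(model_dev, model_poll, model_netpoll_send, model_run) is hypothetical and
is not a kernel API, and the "guard" flag is an assumed illustration of
refusing to re-enter, not a description of any fix that was actually
applied. The rx_sched field stands in for __LINK_STATE_RX_SCHED and
tx_stopped for netif_queue_stopped().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_dev {
	bool rx_sched;       /* stands in for __LINK_STATE_RX_SCHED */
	bool tx_stopped;     /* stands in for netif_queue_stopped() */
	int  poll_depth;     /* current nesting of model_poll() */
	int  max_poll_depth; /* deepest nesting observed */
	bool guard;          /* assumption: refuse to poll while in poll */
};

static void model_poll(struct model_dev *dev);

/* Models the netconsole send path reached from printk(). */
static void model_netpoll_send(struct model_dev *dev)
{
	if (!dev->tx_stopped || !dev->rx_sched)
		return;		/* nothing forces a poll */
	if (dev->guard && dev->poll_depth > 0)
		return;		/* hypothetical guard: do not re-enter */
	model_poll(dev);	/* try to make the TX queue progress */
}

/* Models the driver's ->poll(); it must never be nested. */
static void model_poll(struct model_dev *dev)
{
	assert(dev != NULL);

	dev->poll_depth++;
	if (dev->poll_depth > dev->max_poll_depth)
		dev->max_poll_depth = dev->poll_depth;

	/*
	 * Only the outermost invocation "receives" a packet that
	 * triggers a log message; this keeps the model finite.
	 */
	if (dev->poll_depth == 1)
		model_netpoll_send(dev);

	dev->poll_depth--;

	/* rx_sched is cleared only once the top-level poll is done. */
	if (dev->poll_depth == 0)
		dev->rx_sched = false;
}

/* Runs one top-level poll and returns the deepest nesting reached. */
int model_run(bool guard)
{
	struct model_dev dev = {
		.rx_sched = true,
		.tx_stopped = true,
		.poll_depth = 0,
		.max_poll_depth = 0,
		.guard = guard,
	};

	model_poll(&dev);
	return dev.max_poll_depth;
}
```

In this model, model_run(false) reaches a nesting depth of 2, which
corresponds to the second tg3_poll() in the backtrace, while
model_run(true) stays at depth 1 because the send path backs off when
the device is already inside its poll function.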