From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Miller
Subject: Re: [PATCH] net: cpu offline cause napi stall
Date: Tue, 07 Jun 2011 01:01:36 -0700 (PDT)
Message-ID: <20110607.010136.1857896380762569754.davem@davemloft.net>
References: <20110601204233.GA2410@osiris.boeblingen.de.ibm.com> <20110606.145051.267562411413352856.davem@davemloft.net> <1307429403.2642.77.camel@edumazet-laptop>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <1307429403.2642.77.camel@edumazet-laptop>
Sender: netdev-owner@vger.kernel.org
List-Archive:
List-Post:
To: eric.dumazet@gmail.com
Cc: heiko.carstens@de.ibm.com, blaschka@linux.vnet.ibm.com, netdev@vger.kernel.org, linux-s390@vger.kernel.org
List-ID:

From: Eric Dumazet
Date: Tue, 07 Jun 2011 08:50:03 +0200

> From: Heiko Carstens
>
> Frank Blaschka reported :
>
> During heavy network load we turn off/on cpus.
> Sometimes this causes a stall on the network device.
> Digging into the dump I found out following:
>
> napi is scheduled but does not run. From the I/O buffers
> and the napi state I see napi/rx_softirq processing has stopped
> because the budget was reached. napi stays in the
> softnet_data poll_list and the rx_softirq was raised again.
>
> I assume at this time the cpu offline comes in,
> the rx softirq is raised/moved to another cpu but napi stays in the
> poll_list of the softnet_data of the now offline cpu.
>
> Reviewing dev_cpu_callback (net/core/dev.c) I did not find the
> poll_list is transfered to the new cpu.
>
>
> This patch is a straightforward implementation of Frank suggestion :
>
> Transfert poll_list and trigger NET_RX_SOFTIRQ on new cpu.
>
> Reported-by: Frank Blaschka
> Signed-off-by: Heiko Carstens
> Signed-off-by: Eric Dumazet
> Tested-by: Eric Dumazet

Applied, thanks everyone.
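
For readers following the thread, the idea in the quoted report (move the pending poll_list entries off the offline cpu and raise the rx softirq on the cpu that takes over) can be modelled in userspace roughly as below. This is only an illustrative sketch, not the applied patch and not kernel code: struct napi_model, struct softnet_model, poll_list_add_tail(), poll_list_len() and migrate_poll_list() are all hypothetical names, and a boolean flag stands in for raising NET_RX_SOFTIRQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a napi instance waiting to be polled. */
struct napi_model {
    const char *name;
    struct napi_model *next;
};

/* Hypothetical stand-in for the per-cpu softnet_data state. */
struct softnet_model {
    struct napi_model *poll_head;   /* first entry of the poll list */
    struct napi_model *poll_tail;   /* last entry of the poll list */
    bool rx_softirq_pending;        /* models "NET_RX_SOFTIRQ was raised" */
};

/* Append one entry to the tail of a cpu's poll list. */
static void poll_list_add_tail(struct softnet_model *sd, struct napi_model *n)
{
    n->next = NULL;
    if (sd->poll_tail)
        sd->poll_tail->next = n;
    else
        sd->poll_head = n;
    sd->poll_tail = n;
}

/* Count the entries currently queued on a cpu's poll list. */
static size_t poll_list_len(const struct softnet_model *sd)
{
    size_t len = 0;
    for (const struct napi_model *n = sd->poll_head; n; n = n->next)
        len++;
    return len;
}

/*
 * Move every pending entry from the offline cpu's list to the tail of the
 * live cpu's list, keeping their order. If anything was moved, mark the rx
 * softirq as pending on the live cpu so the entries get polled again.
 * Returns the number of entries moved.
 */
static size_t migrate_poll_list(struct softnet_model *dead,
                                struct softnet_model *live)
{
    size_t moved = 0;

    assert(dead != live);

    while (dead->poll_head) {
        struct napi_model *n = dead->poll_head;

        dead->poll_head = n->next;
        poll_list_add_tail(live, n);
        moved++;
    }
    dead->poll_tail = NULL;

    if (moved)
        live->rx_softirq_pending = true;

    return moved;
}
```

In this model, skipping migrate_poll_list() when a cpu goes away reproduces the reported symptom: the entries stay queued on a list that nothing polls any more.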