From: Cong Wang
Subject: Re: NULL deref in bnx2 / crashes ? ( was: netconsole leads to stalled CPU task )
Date: Thu, 23 Aug 2012 08:31:14 +0000 (UTC)
To: netdev@vger.kernel.org

On Thu, 23 Aug 2012 at 07:57 GMT, Cong Wang wrote:
> On Wed, 22 Aug 2012 at 14:29 GMT, Sylvain Munaut wrote:
>> Hi,
>>
>>
>> The machine with the intel card still hard freeze (no output / no nothing ...)
>> The machine with the bnx2 don't crash anymore and no NULL deref, but
>> the modprobe still hangs and I get this every 180 sec or so :
>
> The NULL-deref can be reproduced easily, and Eric's patch could fix it.
> So, Eric, can you resend your patch with your SOB?
>
> I can't reproduce the hang as it is net driver specific, it is
> probably related with my patch:
>
> commit 6bdb7fe31046ac50b47e83c35cd6c6b6160a475d
> Author: Amerigo Wang
> Date:   Fri Aug 10 01:24:50 2012 +0000
>
>     netpoll: re-enable irq in poll_napi()
>

Could you test the following patch?
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index ddc453b..ed4d1e4 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -166,11 +166,18 @@ static int poll_one_napi(struct netpoll_info *npinfo,
 static void poll_napi(struct net_device *dev)
 {
 	struct napi_struct *napi;
+	LIST_HEAD(napi_list);
 	int budget = 16;
 
 	WARN_ON_ONCE(!irqs_disabled());
 
-	list_for_each_entry(napi, &dev->napi_list, dev_list) {
+	/* After we enable the IRQ, new entries could be added
+	 * to this list, we need to save it before re-enable
+	 * IRQ.
+	 */
+	list_splice_tail(&dev->napi_list, &napi_list);
+
+	list_for_each_entry(napi, &napi_list, dev_list) {
 		local_irq_enable();
 		if (napi->poll_owner != smp_processor_id() &&
 		    spin_trylock(&napi->poll_lock)) {
@@ -187,6 +194,7 @@ static void poll_napi(struct net_device *dev)
 		}
 		local_irq_disable();
 	}
+	list_splice_tail(&napi_list, &dev->napi_list);
 }
 
 static void service_arp_queue(struct netpoll_info *npi)

However, it seems we should take the rtnl lock to make sure dev->napi_list
is really safe, but I am not sure whether the following one makes sense.

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index ddc453b..7770e2b 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -170,8 +170,9 @@ static void poll_napi(struct net_device *dev)
 
 	WARN_ON_ONCE(!irqs_disabled());
 
+	local_irq_enable();
+	rtnl_lock();
 	list_for_each_entry(napi, &dev->napi_list, dev_list) {
-		local_irq_enable();
 		if (napi->poll_owner != smp_processor_id() &&
 		    spin_trylock(&napi->poll_lock)) {
 			rcu_read_lock_bh();
@@ -180,13 +181,12 @@ static void poll_napi(struct net_device *dev)
 			rcu_read_unlock_bh();
 			spin_unlock(&napi->poll_lock);
 
-			if (!budget) {
-				local_irq_disable();
+			if (!budget)
 				break;
-			}
 		}
-		local_irq_disable();
 	}
+	rtnl_unlock();
+	local_irq_disable();
 }
 
 static void service_arp_queue(struct netpoll_info *npi)