From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Paul E. McKenney" Subject: Re: [0/8] netpoll/bridge fixes Date: Tue, 15 Jun 2010 22:08:08 -0700 Message-ID: <20100616050808.GD2911@linux.vnet.ibm.com> References: <20100610145915.721a86b7@nehalam> <20100610224839.GA22469@gondor.apana.org.au> <20100611021142.GA24490@gondor.apana.org.au> <20100615.113940.245399246.davem@davemloft.net> <1276657139.19249.50.camel@edumazet-laptop> Reply-To: paulmck@linux.vnet.ibm.com Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: David Miller , herbert@gondor.apana.org.au, shemminger@vyatta.com, mst@redhat.com, frzhang@redhat.com, netdev@vger.kernel.org, amwang@redhat.com, mpm@selenic.com To: Eric Dumazet Return-path: Received: from e4.ny.us.ibm.com ([32.97.182.144]:49547 "EHLO e4.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754095Ab0FPFIM (ORCPT ); Wed, 16 Jun 2010 01:08:12 -0400 Received: from d01relay03.pok.ibm.com (d01relay03.pok.ibm.com [9.56.227.235]) by e4.ny.us.ibm.com (8.14.4/8.13.1) with ESMTP id o5G4sru7031840 for ; Wed, 16 Jun 2010 00:54:53 -0400 Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay03.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id o5G58BSg115936 for ; Wed, 16 Jun 2010 01:08:11 -0400 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id o5G58AMA016279 for ; Wed, 16 Jun 2010 01:08:11 -0400 Content-Disposition: inline In-Reply-To: <1276657139.19249.50.camel@edumazet-laptop> Sender: netdev-owner@vger.kernel.org List-ID: On Wed, Jun 16, 2010 at 04:58:59AM +0200, Eric Dumazet wrote: > Le mardi 15 juin 2010 =E0 11:39 -0700, David Miller a =E9crit : > > From: Herbert Xu > > Date: Fri, 11 Jun 2010 12:11:42 +1000 > >=20 > > > On Fri, Jun 11, 2010 at 08:48:39AM +1000, Herbert Xu wrote: > > >> On Thu, Jun 10, 2010 at 02:59:15PM -0700, Stephen Hemminger wrot= e: > > >> > > > >> > Okay, then add a comment where in_irq is used? > > >>=20 > > >> Actually let me put it into a wrapper. I'll respin the patches. > > >=20 > > > OK here is a repost. And this time it really is 8 patches :) > > > I've tested it lightly. > >=20 > > All applied to net-next-2.6, thanks Herbert. >=20 > Well... >=20 > [ 52.914014] =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D > [ 52.914018] [ INFO: suspicious rcu_dereference_check() usage. ] > [ 52.914020] --------------------------------------------------- > [ 52.914024] include/linux/netpoll.h:67 invoked rcu_dereference_che= ck() without protection! > [ 52.914027]=20 > [ 52.914027] other info that might help us debug this: > [ 52.914029]=20 > [ 52.914031]=20 > [ 52.914032] rcu_scheduler_active =3D 1, debug_locks =3D 1 > [ 52.914035] 4 locks held by swapper/0: > [ 52.914037] #0: (&n->timer){+.-...}, at: [] run_timer_= softirq+0x1b8/0x419 > [ 52.914052] #1: (slock-AF_INET){+.....}, at: [] icmp_s= end+0x149/0x58b > [ 52.914063] #2: (rcu_read_lock_bh){.+....}, at: [] dev= _queue_xmit+0xf7/0x5df > [ 52.914073] #3: (rcu_read_lock_bh){.+....}, at: [] net= if_rx+0x0/0x195 > [ 52.914081]=20 > [ 52.914081] stack backtrace: > [ 52.914086] Pid: 0, comm: swapper Not tainted 2.6.35-rc1-00508-gdb= e3a24-dirty #78 > [ 52.914089] Call Trace: > [ 52.914095] [] ? printk+0xf/0x13 > [ 52.914103] [] lockdep_rcu_dereference+0x74/0x7d > [ 52.914107] [] netif_rx+0x6b/0x195 > [ 52.914111] [] ? dev_queue_xmit+0xf7/0x5df > [ 52.914117] [] loopback_xmit+0x4a/0x70 > [ 52.914122] [] dev_hard_start_xmit+0x25b/0x322 > [ 52.914126] [] dev_queue_xmit+0x4c5/0x5df > [ 52.914131] [] ? trace_hardirqs_on+0xb/0xd > [ 52.914135] [] neigh_resolve_output+0x2e8/0x33f > [ 52.914142] [] ? eth_header+0x0/0x8e > [ 52.914147] [] ip_finish_output+0x323/0x3b1 > [ 52.914152] [] ? local_bh_enable_ip+0x97/0xad > [ 52.914156] [] ip_output+0xe2/0xfe > [ 52.914160] [] ip_local_out+0x41/0x55 > [ 52.914164] [] ip_push_pending_frames+0x284/0x2fa > [ 52.914169] [] icmp_push_reply+0xe8/0xf3 > [ 52.914174] [] icmp_send+0x542/0x58b > [ 52.914181] [] ? find_busiest_group+0x1c9/0x631 > [ 52.914188] [] ipv4_link_failure+0x17/0x7b > [ 52.914193] [] arp_error_report+0x46/0x61 > [ 52.914197] [] neigh_invalidate+0x68/0x80 > [ 52.914201] [] neigh_timer_handler+0x124/0x1d2 > [ 52.914206] [] run_timer_softirq+0x29e/0x419 > [ 52.914210] [] ? neigh_timer_handler+0x0/0x1d2 > [ 52.914215] [] __do_softirq+0x126/0x277 > [ 52.914219] [] ? __do_softirq+0x0/0x277 > [ 52.914222] [] ? irq_exit+0x38/0x74 > [ 52.914230] [] ? do_IRQ+0x87/0x9b > [ 52.914235] [] ? common_interrupt+0x2e/0x34 > [ 52.914241] [] ? sched_clock_local+0x3f/0x11f > [ 52.914249] [] ? acpi_idle_enter_bm+0x271/0x2a0 > [ 52.914256] [] ? cpuidle_idle_call+0x76/0x151 > [ 52.914261] [] ? cpu_idle+0x49/0x76 > [ 52.914266] [] ? rest_init+0xd6/0xdb > [ 52.914274] [] ? start_kernel+0x31b/0x320 > [ 52.914278] [] ? i386_start_kernel+0xc9/0xd0 >=20 >=20 > Paul, could you please explain if current lockdep rules are correct, = or could be relaxed ? >=20 > I thought : >=20 > rcu_read_lock_bh(); >=20 > was a shorthand to >=20 > local_disable_bh(); > rcu_read_lock(); In CONFIG_TREE_RCU and CONFIG_TINY_RCU, rcu_read_lock_bh() is actually shorthand for only local_disable_bh(). Therefore, rcu_dereference() will scream if only rcu_read_lock_bh() is held. However, in CONFIG_PREEMPT_TREE_RCU, rcu_read_lock_bh() is its own mechanism that does local_disable_bh() but has its own set of grace periods, independent of those of rcu_read_lock(). > Why lockdep is not able to make a correct diagnostic ? Here is the situation I am concerned about: o Task 0 does rcu_read_lock(), then p=3Drcu_dereference_bh(). If we make the change you are asking for, rcu_dereference_bh() is OK with this. o Task 0 now is preempted before finishing its RCU read-side critical section. o Task 1 removes the data element referenced by pointer p, then invokes synchronize_rcu_bh(). o Task 0 does not block synchronize_rcu_bh(), so the grace period completes. o Task 1 frees up the data element referenced by pointer p, which might be reallocated as some other type, unmapped, or whatever else. o Task 0 resumes, and is sadly disappointed when the data element referenced by pointer p has been swept out from under it. Or am I missing something here? Thanx, Paul > Thanks >=20 > [PATCH net-next-2.6] netpoll: Fix one rcu_dereference() lockdep splat >=20 > lockdep doesnt allow yet following construct : >=20 > rcu_read_lock_bh(); > npinfo =3D rcu_dereference(skb->dev->npinfo); >=20 > Fix lockdep splat using rcu_dereference_bh() >=20 > Signed-off-by: Eric Dumazet > --- > include/linux/netpoll.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) >=20 > diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h > index 4c77fe7..472365e 100644 > --- a/include/linux/netpoll.h > +++ b/include/linux/netpoll.h > @@ -64,7 +64,7 @@ static inline bool netpoll_rx(struct sk_buff *skb) > bool ret =3D false; >=20 > rcu_read_lock_bh(); > - npinfo =3D rcu_dereference(skb->dev->npinfo); > + npinfo =3D rcu_dereference_bh(skb->dev->npinfo); >=20 > if (!npinfo || (list_empty(&npinfo->rx_np) && !npinfo->rx_flags)) > goto out; >=20 >=20