From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Paul E. McKenney" Subject: Re: [PATCH 6/13] bridge: Add core IGMP snooping support Date: Fri, 5 Mar 2010 21:06:56 -0800 Message-ID: <20100306050656.GA6812@linux.vnet.ibm.com> References: <20100228054012.GA7583@gondor.apana.org.au> <20100305234327.GJ6764@linux.vnet.ibm.com> <20100306011718.GA12812@gondor.apana.org.au> Reply-To: paulmck@linux.vnet.ibm.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: "David S. Miller" , netdev@vger.kernel.org, Stephen Hemminger To: Herbert Xu Return-path: Received: from e2.ny.us.ibm.com ([32.97.182.142]:58050 "EHLO e2.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750783Ab0CFFG7 (ORCPT ); Sat, 6 Mar 2010 00:06:59 -0500 Received: from d01relay05.pok.ibm.com (d01relay05.pok.ibm.com [9.56.227.237]) by e2.ny.us.ibm.com (8.14.3/8.13.1) with ESMTP id o264uZTW010343 for ; Fri, 5 Mar 2010 23:56:35 -0500 Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay05.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id o2656wwo176336 for ; Sat, 6 Mar 2010 00:06:58 -0500 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.14.3/8.13.1/NCO v10.0 AVout) with ESMTP id o2656vfx031798 for ; Sat, 6 Mar 2010 00:06:57 -0500 Content-Disposition: inline In-Reply-To: <20100306011718.GA12812@gondor.apana.org.au> Sender: netdev-owner@vger.kernel.org List-ID: On Sat, Mar 06, 2010 at 09:17:18AM +0800, Herbert Xu wrote: > On Fri, Mar 05, 2010 at 03:43:27PM -0800, Paul E. McKenney wrote: > > > > Cool!!! You use a pair of list_head structures, so that a given > > element can be in both the old and the new hash table simultaneously. > > Of course, an RCU grace period must elapse between consecutive resizings. > > Which appears to be addressed. > > Thanks :) I will try to extend this to other existing hash tables > where the number of updates can be limited like it is here. > > > The teardown needs an rcu_barrier_bh() rather than the current > > synchronize_rcu_bh(), please see below. > > All the call_rcu_bh's are done under multicast_lock. The first > check taken after taking the multicast_lock is whether we've > started the tear-down. So where it currently calls synchronize() > it should already be the case that no call_rcu_bh's are still > running. Agreed, but the callbacks registered by the call_rcu_bh() might run at any time, possibly quite some time after the synchronize_rcu_bh() completes. For example, the last call_rcu_bh() might register on one CPU, and the synchronize_rcu_bh() on another CPU. Then there is no guarantee that the call_rcu_bh()'s callback will execute before the synchronize_rcu_bh() returns. In contrast, rcu_barrier_bh() is guaranteed not to return until all pending RCU-bh callbacks have executed. > > Also, I don't see how the teardown code is preventing new readers from > > finding the data structures before they are being passed to call_rcu_bh(). > > You can't safely start the RCU grace period until -after- all new readers > > have been excluded. (But I could easily be missing something here.) > > I understand. However, AFAICS whatever it is that we are destroying > is taken off the reader's visible data structure before call_rcu_bh. > Do you have a particular case in mind where this is not the case? I might simply have missed the operation that removed reader visibility, looking again... Ah, I see it. The "br->mdb = NULL" in br_multicast_stop() makes it impossible for the readers to get to any of the data. Right? If so, my confusion, you are right, this one is OK. > > The br_multicast_del_pg() looks to need rcu_read_lock_bh() and > > rcu_read_unlock_bh() around its loop, if I understand the pointer-walking > > scheme correctly. > > Any function that modifies the data structure is done under the > multicast_lock, including br_multicast_del_pg. But spin_lock() does not take the place of rcu_read_lock_bh(). And so, in theory, the RCU-bh grace period could complete between the time that br_multicast_del_pg() does its call_rcu_bh() and the "*pp = p->next;" at the top of the next loop iteration. If so, then br_multicast_free_pg()'s kfree() will possibly have clobbered "p->next". Low probability, yes, but a long-running interrupt could do the trick. Or is there something I am missing that is preventing an RCU-bh grace period from completing near the bottom of br_multicast_del_pg()'s "for" loop? > > Hmmm... Where is the read-side code? Wherever it is, it cannot safely > > dereference the ->old pointer. > > Right, the old pointer is merely there to limit rehashings to one > per window. So it isn't used by the read-side. Good! > The read-side is the data path (non-IGMP multicast packets). The > sole entry point is br_mdb_get(). Hmmm... So the caller is responsible for rcu_read_lock_bh()? Shouldn't the br_mdb_get() code path be using hlist_for_each_entry_rcu() in __br_mdb_ip_get(), then? Or is something else going on here? Thanx, Paul > Cheers, > -- > Visit Openswan at http://www.openswan.org/ > Email: Herbert Xu ~{PmV>HI~} > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt