From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: [PATCH] netfilter: use per-cpu spinlock rather than RCU (v3)
Date: Fri, 17 Apr 2009 07:16:32 +0200
Message-ID: <49E810B0.9000906@cosmosbay.com>
References: <20090415170111.6e1ca264@nehalam> <49E72E83.50702@trash.net> <20090416.153354.170676392.davem@davemloft.net> <20090416234955.GL6924@linux.vnet.ibm.com> <20090417012812.GA25534@linux.vnet.ibm.com> <20090416215033.3e648a7a@nehalam>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: paulmck@linux.vnet.ibm.com, David Miller, kaber@trash.net, torvalds@linux-foundation.org, jeff.chua.linux@gmail.com, paulus@samba.org, mingo@elte.hu, laijs@cn.fujitsu.com, jengelh@medozas.de, r000n@r000n.net, linux-kernel@vger.kernel.org, netfilter-devel@vger.kernel.org, netdev@vger.kernel.org, benh@kernel.crashing.org, mathieu.desnoyers@polymtl.ca
To: Stephen Hemminger
Return-path:
In-Reply-To: <20090416215033.3e648a7a@nehalam>
Sender: netfilter-devel-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

Stephen Hemminger wrote:
> On Thu, 16 Apr 2009 18:28:12 -0700
> "Paul E. McKenney" wrote:
>
>> On Thu, Apr 16, 2009 at 04:49:55PM -0700, Paul E. McKenney wrote:
>>> On Thu, Apr 16, 2009 at 03:33:54PM -0700, David Miller wrote:
>>>> From: Patrick McHardy
>>>> Date: Thu, 16 Apr 2009 15:11:31 +0200
>>>>
>>>>> Linus Torvalds wrote:
>>>>>> On Wed, 15 Apr 2009, Stephen Hemminger wrote:
>>>>>>> The counters are the bigger problem, otherwise we could just free
>>>>>>> table info via rcu. Do we really have to support: replace where the
>>>>>>> counter values coming out to user space are always exactly accurate,
>>>>>>> or is it allowed to replace a rule and maybe lose some counter ticks
>>>>>>> (worst case NCPU-1).
>>>>>> Why not just read the counters from the old one at RCU free time (they
>>>>>> are guaranteed to be stable at that point, since we're all done with
>>>>>> those entries), and apply them at that point to the current setup?
>>>>> We need the counters immediately to copy them to userspace, so waiting
>>>>> for an asynchronous RCU free is not going to work.
>>>> It just occurred to me that since all netfilter packet handling
>>>> goes through one place, we could have a sort-of "netfilter RCU"
>>>> of sorts to solve this problem.
>>> OK, I am putting one together...
>>>
>>> It will be needed sooner or later, though I suspect per-CPU locking
>>> would work fine in this case.
>> And here is a crude first cut. Untested, probably does not even compile.
>>
>> Straight conversion of Mathieu Desnoyers's user-space RCU implementation
>> at git://lttng.org/userspace-rcu.git to the kernel (and yes, I did help
>> a little, but he must bear the bulk of the guilt). Picked on srcu.h
>> and srcu.c out of sheer laziness. User-space testing gives deep
>> sub-microsecond grace-period latencies, so it should be fast enough, at
>> least if you don't mind two smp_call_function() invocations per grace
>> period and spinning on each instance of a per-CPU variable.
>>
>> Again, I believe per-CPU locking should work fine for the netfilter
>> counters, but I guess "friends don't let friends use hashed locks".
>> (I would not know for sure, never having used them myself, except of
>> course to protect hash tables.)
>>
>> Most definitely -not- for inclusion at this point. Next step is to hack
>> up the relevant rcutorture code and watch it explode on contact. ;-)
>>
>> Signed-off-by: Paul E. McKenney
>
> I am glad to see this worked on, but would rather not use RCU in this case
> of iptables. It would be good for some of the other long grace period stuff.
>
> The code to do per-cpu entry consolidation by alloc/flip in 2.6.30-rc2 was
> hard to debug and more convoluted, so it probably would be a long-term
> maintenance nightmare. The issue was the variable-size skip structure, so
> it made for lots of iterators, etc. If the non-RCU per-cpu spinlock
> version is just as fast, it is easier to understand.

I agree that for 2.6.30 we could use a per-cpu spinlock as your last patch
did; it would be very risky to push this new RCU right now.

But this new stuff looks very promising (no more locked ops on the fast
path), and, considering the new percpu_{add|sub...} infrastructure, very
fast:

static inline void rcu_read_unlock_fgp(void)
{
	barrier();
	percpu_sub(rcu_fgp_active_readers, 1); /* one instruction on x86 */
	preempt_enable();
}

I wonder if IPIs are really necessary on x86 if we use percpu_sub(), since
it already contains a barrier, and rcu_read_lock_fgp() also ends with a
barrier() call...

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel"
in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html