From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: David Miller <davem@davemloft.net>
Cc: kaber@trash.net, torvalds@linux-foundation.org,
shemminger@vyatta.com, dada1@cosmosbay.com,
jeff.chua.linux@gmail.com, paulus@samba.org, mingo@elte.hu,
laijs@cn.fujitsu.com, jengelh@medozas.de, r000n@r000n.net,
linux-kernel@vger.kernel.org, netfilter-devel@vger.kernel.org,
netdev@vger.kernel.org, benh@kernel.crashing.org,
mathieu.desnoyers@polymtl.ca
Subject: Re: [PATCH] netfilter: use per-cpu spinlock rather than RCU (v3)
Date: Thu, 16 Apr 2009 18:28:12 -0700 [thread overview]
Message-ID: <20090417012812.GA25534@linux.vnet.ibm.com> (raw)
In-Reply-To: <20090416234955.GL6924@linux.vnet.ibm.com>
On Thu, Apr 16, 2009 at 04:49:55PM -0700, Paul E. McKenney wrote:
> On Thu, Apr 16, 2009 at 03:33:54PM -0700, David Miller wrote:
> > From: Patrick McHardy <kaber@trash.net>
> > Date: Thu, 16 Apr 2009 15:11:31 +0200
> >
> > > Linus Torvalds wrote:
> > >> On Wed, 15 Apr 2009, Stephen Hemminger wrote:
> > >>> The counters are the bigger problem, otherwise we could just free
> > >>> table
> > >>> info via rcu. Do we really have to support: replace where the counter
> > >>> values coming out to user space are always exactly accurate, or is it
> > >>> allowed to replace a rule and maybe lose some counter ticks (worst
> > >>> case
> > >>> NCPU-1).
> > >> Why not just read the counters fromt he old one at RCU free time (they
> > >> are guaranteed to be stable at that point, since we're all done with
> > >> those entries), and apply them at that point to the current setup?
> > >
> > > We need the counters immediately to copy them to userspace, so waiting
> > > for an asynchronous RCU free is not going to work.
> >
> > It just occurred to me that since all netfilter packet handling
> > goes through one place, we could have a sort-of "netfilter RCU"
> > of sorts to solve this problem.
>
> OK, I am putting one together...
>
> It will be needed sooner or later, though I suspect per-CPU locking
> would work fine in this case.
And here is a crude first cut. Untested, probably does not even compile.
Straight conversion of Mathieu Desnoyers's user-space RCU implementation
at git://lttng.org/userspace-rcu.git to the kernel (and yes, I did help
a little, but he must bear the bulk of the guilt). Pick on srcu.h
and srcu.c out of sheer laziness. User-space testing gives deep
sub-microsecond grace-period latencies, so should be fast enough, at
least if you don't mind two smp_call_function() invocations per grace
period and spinning on each instance of a per-CPU variable.
Again, I believe per-CPU locking should work fine for the netfilter
counters, but I guess "friends don't let friends use hashed locks".
(I would not know for sure, never having used them myself, except of
course to protect hash tables.)
Most definitely -not- for inclusion at this point. Next step is to hack
up the relevant rcutorture code and watch it explode on contact. ;-)
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
include/linux/srcu.h | 30 ++++++++++++++++++++++++
kernel/srcu.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 93 insertions(+)
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index aca0eee..4577cd0 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -50,4 +50,34 @@ void srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
void synchronize_srcu(struct srcu_struct *sp);
long srcu_batches_completed(struct srcu_struct *sp);
+/* Single bit for grace-period index, low-order bits are nesting counter. */
+#define RCU_FGP_COUNT 1UL
+#define RCU_FGP_PARITY (1UL << (sizeof(long) << 2))
+#define RCU_FGP_NEST_MASK (RCU_FGP_PARITY - 1)
+
+extern long rcu_fgp_ctr;
+DECLARE_PER_CPU(long, rcu_fgp_active_readers);
+
+static inline void rcu_read_lock_fgp(void)
+{
+ long tmp;
+ long *uarp;
+
+ preempt_disable();
+ uarp = &__get_cpu_var(rcu_fgp_active_readers);
+ tmp = *uarp;
+ if (likely(!(tmp & RCU_FGP_NEST_MASK)))
+ *uarp = rcu_fgp_ctr; /* Outermost rcu_read_lock(). */
+ else
+ *uarp = tmp + RCU_FGP_COUNT; /* Nested rcu_read_lock(). */
+ barrier();
+}
+
+static inline void rcu_read_unlock_fgp(void)
+{
+ barrier();
+ __get_cpu_var(rcu_fgp_active_readers)--;
+ preempt_enable();
+}
+
#endif
diff --git a/kernel/srcu.c b/kernel/srcu.c
index b0aeeaf..a5dc464 100644
--- a/kernel/srcu.c
+++ b/kernel/srcu.c
@@ -255,3 +255,66 @@ EXPORT_SYMBOL_GPL(srcu_read_lock);
EXPORT_SYMBOL_GPL(srcu_read_unlock);
EXPORT_SYMBOL_GPL(synchronize_srcu);
EXPORT_SYMBOL_GPL(srcu_batches_completed);
+
+DEFINE_MUTEX(rcu_fgp_mutex);
+long rcu_fgp_ctr = RCU_FGP_COUNT;
+DEFINE_PER_CPU(long, rcu_fgp_active_readers);
+
+/*
+ * Determine if the specified counter value indicates that we need to
+ * wait on the corresponding CPU to exit an rcu_fgp read-side critical
+ * section. Return non-zero if so.
+ *
+ * Assumes that rcu_fgp_mutex is held, and thus that rcu_fgp_ctr is
+ * unchanging.
+ */
+static inline int rcu_old_fgp_ongoing(long *value)
+{
+ long v = ACCESS_ONCE(*value);
+
+ return (v & RCU_FGP_NEST_MASK) &&
+ ((v ^ rcu_fgp_ctr) & RCU_FGP_PARITY);
+}
+
+static void rcu_fgp_wait_for_quiescent_state(void)
+{
+ int cpu;
+ long *uarp;
+
+ for_each_online_cpu(cpu) {
+ uarp = &per_cpu(rcu_fgp_active_readers, cpu);
+ while (rcu_old_fgp_ongoing(uarp))
+ cpu_relax();
+ }
+}
+
+static void rcu_fgp_do_mb(void *unused)
+{
+ smp_mb(); /* Each CPU does a memory barrier. */
+}
+
+void synchronize_rcu_fgp(void)
+{
+ mutex_lock(&rcu_fgp_mutex);
+
+ /* CPUs must see earlier change before parity flip. */
+ smp_call_function(rcu_fgp_do_mb, NULL, 1);
+
+ /*
+ * We must flip twice to correctly handle tasks that stall
+ * in rcu_read_lock_fgp() between the time that they fetch
+ * rcu_fgp_ctr and the time that the store to their CPU's
+ * rcu_fgp_active_readers. No matter when they resume
+ * execution, we will wait for them to get to the corresponding
+ * rcu_read_unlock_fgp().
+ */
+ ACCESS_ONCE(rcu_fgp_ctr) ^= RCU_FGP_PARITY; /* flip parity 0 -> 1 */
+ rcu_fgp_wait_for_quiescent_state(); /* wait for old readers */
+ ACCESS_ONCE(rcu_fgp_ctr) ^= RCU_FGP_PARITY; /* flip parity 1 -> 0 */
+ rcu_fgp_wait_for_quiescent_state(); /* wait for old readers */
+
+ /* Prevent CPUs from reordering out of prior RCU critical sections. */
+ smp_call_function(rcu_fgp_do_mb, NULL, 1);
+
+ mutex_unlock(&rcu_fgp_mutex);
+}
next prev parent reply other threads:[~2009-04-17 1:28 UTC|newest]
Thread overview: 216+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-04-10 9:15 iptables very slow after commit784544739a25c30637397ace5489eeb6e15d7d49 Jeff Chua
2009-04-10 16:52 ` Stephen Hemminger
2009-04-11 1:07 ` Jeff Chua
2009-04-11 1:25 ` David Miller
2009-04-11 1:39 ` iptables very slow after commit 784544739a25c30637397ace5489eeb6e15d7d49 Linus Torvalds
2009-04-11 4:15 ` Paul E. McKenney
2009-04-11 5:14 ` Jan Engelhardt
2009-04-11 5:42 ` Paul E. McKenney
2009-04-11 6:00 ` David Miller
2009-04-11 18:12 ` Kyle Moffett
2009-04-11 18:32 ` Arkadiusz Miskiewicz
2009-04-12 0:54 ` david
2009-04-12 5:05 ` Kyle Moffett
2009-04-12 12:30 ` Harald Welte
2009-04-12 16:38 ` Jan Engelhardt
2009-04-11 15:07 ` Stephen Hemminger
2009-04-11 16:05 ` Jeff Chua
2009-04-11 17:51 ` Linus Torvalds
2009-04-11 7:08 ` Ingo Molnar
2009-04-11 15:05 ` Stephen Hemminger
2009-04-11 17:48 ` Paul E. McKenney
2009-04-12 10:54 ` Ingo Molnar
2009-04-12 11:34 ` Paul Mackerras
2009-04-12 17:31 ` Paul E. McKenney
2009-04-13 1:13 ` David Miller
2009-04-13 4:04 ` Paul E. McKenney
2009-04-13 16:53 ` [PATCH] netfilter: use per-cpu spinlock rather than RCU Stephen Hemminger
2009-04-13 17:40 ` Eric Dumazet
2009-04-13 18:11 ` Stephen Hemminger
2009-04-13 19:06 ` Martin Josefsson
2009-04-13 19:17 ` Linus Torvalds
2009-04-13 22:24 ` Andrew Morton
2009-04-13 23:20 ` Stephen Hemminger
2009-04-13 23:26 ` Andrew Morton
2009-04-13 23:37 ` Linus Torvalds
2009-04-13 23:52 ` Ingo Molnar
2009-04-14 12:27 ` Patrick McHardy
2009-04-14 14:23 ` Eric Dumazet
2009-04-14 14:45 ` Stephen Hemminger
2009-04-14 15:49 ` Eric Dumazet
2009-04-14 16:51 ` Jeff Chua
2009-04-14 18:17 ` [PATCH] netfilter: use per-cpu spinlock rather than RCU (v2) Stephen Hemminger
2009-04-14 19:28 ` Eric Dumazet
2009-04-14 21:11 ` Stephen Hemminger
2009-04-14 21:13 ` [PATCH] netfilter: use per-cpu spinlock rather than RCU (v3) Stephen Hemminger
2009-04-14 21:40 ` Eric Dumazet
2009-04-15 10:59 ` Patrick McHardy
2009-04-15 16:31 ` Stephen Hemminger
2009-04-15 20:55 ` Stephen Hemminger
2009-04-15 21:07 ` Eric Dumazet
2009-04-15 21:55 ` Jan Engelhardt
2009-04-16 12:12 ` Patrick McHardy
2009-04-16 12:24 ` Jan Engelhardt
2009-04-16 12:31 ` Patrick McHardy
2009-04-15 21:57 ` [PATCH] netfilter: use per-cpu rwlock rather than RCU (v4) Stephen Hemminger
2009-04-15 23:48 ` [PATCH] netfilter: use per-cpu spinlock rather than RCU (v3) David Miller
2009-04-16 0:01 ` Stephen Hemminger
2009-04-16 0:05 ` David Miller
2009-04-16 12:28 ` Patrick McHardy
2009-04-16 0:10 ` Linus Torvalds
2009-04-16 0:45 ` [PATCH] netfilter: use per-cpu spinlock and RCU (v5) Stephen Hemminger
2009-04-16 5:01 ` Eric Dumazet
2009-04-16 13:53 ` Patrick McHardy
2009-04-16 14:47 ` Paul E. McKenney
2009-04-16 16:10 ` [PATCH] netfilter: use per-cpu recursive spinlock (v6) Eric Dumazet
2009-04-16 16:20 ` Eric Dumazet
2009-04-16 16:37 ` Linus Torvalds
2009-04-16 16:59 ` Patrick McHardy
2009-04-16 17:58 ` Paul E. McKenney
2009-04-16 18:41 ` Eric Dumazet
2009-04-16 20:49 ` [PATCH[] netfilter: use per-cpu reader-writer lock (v0.7) Stephen Hemminger
2009-04-16 21:02 ` Linus Torvalds
2009-04-16 23:04 ` Ingo Molnar
2009-04-17 0:13 ` [PATCH] netfilter: use per-cpu recursive spinlock (v6) Paul E. McKenney
2009-04-16 13:11 ` [PATCH] netfilter: use per-cpu spinlock rather than RCU (v3) Patrick McHardy
2009-04-16 22:33 ` David Miller
2009-04-16 23:49 ` Paul E. McKenney
2009-04-16 23:52 ` [PATCH] netfilter: per-cpu spin-lock with recursion (v0.8) Stephen Hemminger
2009-04-17 0:15 ` Jeff Chua
2009-04-17 5:55 ` Peter Zijlstra
2009-04-17 6:03 ` Eric Dumazet
2009-04-17 6:14 ` Eric Dumazet
2009-04-17 17:08 ` Peter Zijlstra
2009-04-17 11:17 ` Patrick McHardy
2009-04-17 1:28 ` Paul E. McKenney [this message]
2009-04-17 2:19 ` [PATCH] netfilter: use per-cpu spinlock rather than RCU (v3) Mathieu Desnoyers
2009-04-17 5:05 ` Paul E. McKenney
2009-04-17 5:44 ` Mathieu Desnoyers
2009-04-17 14:51 ` Paul E. McKenney
2009-04-17 4:50 ` Stephen Hemminger
2009-04-17 5:08 ` Paul E. McKenney
2009-04-17 5:16 ` Eric Dumazet
2009-04-17 5:40 ` Paul E. McKenney
2009-04-17 8:07 ` David Miller
2009-04-17 15:00 ` Paul E. McKenney
2009-04-17 17:22 ` Peter Zijlstra
2009-04-17 17:32 ` Linus Torvalds
2009-04-17 6:12 ` Peter Zijlstra
2009-04-17 16:33 ` Paul E. McKenney
2009-04-17 16:51 ` Peter Zijlstra
2009-04-17 21:29 ` Paul E. McKenney
2009-04-18 9:40 ` Evgeniy Polyakov
2009-04-18 14:14 ` Paul E. McKenney
2009-04-20 17:34 ` [PATCH] netfilter: use per-cpu recursive lock (v10) Stephen Hemminger
2009-04-20 18:21 ` Paul E. McKenney
2009-04-20 18:25 ` Eric Dumazet
2009-04-20 20:32 ` Stephen Hemminger
2009-04-20 20:42 ` Stephen Hemminger
2009-04-20 21:05 ` Paul E. McKenney
2009-04-20 21:23 ` Paul Mackerras
2009-04-20 21:58 ` Paul E. McKenney
2009-04-20 22:41 ` Paul Mackerras
2009-04-20 23:01 ` [PATCH] netfilter: use per-cpu recursive lock (v11) Stephen Hemminger
2009-04-21 3:41 ` Lai Jiangshan
2009-04-21 3:56 ` Eric Dumazet
2009-04-21 4:15 ` Stephen Hemminger
2009-04-21 5:22 ` Lai Jiangshan
2009-04-21 5:45 ` Stephen Hemminger
2009-04-21 6:52 ` Lai Jiangshan
2009-04-21 8:16 ` Evgeniy Polyakov
2009-04-21 8:42 ` Lai Jiangshan
2009-04-21 8:49 ` David Miller
2009-04-21 8:55 ` Eric Dumazet
2009-04-21 9:22 ` Evgeniy Polyakov
2009-04-21 9:34 ` Lai Jiangshan
2009-04-21 5:34 ` Lai Jiangshan
2009-04-21 4:59 ` Eric Dumazet
2009-04-21 16:37 ` Paul E. McKenney
2009-04-21 5:46 ` Lai Jiangshan
2009-04-21 16:13 ` Linus Torvalds
2009-04-21 16:43 ` Stephen Hemminger
2009-04-21 16:50 ` Linus Torvalds
2009-04-21 18:02 ` Ingo Molnar
2009-04-21 18:15 ` Stephen Hemminger
2009-04-21 19:10 ` Ingo Molnar
2009-04-21 19:46 ` Eric Dumazet
2009-04-22 7:35 ` Ingo Molnar
2009-04-22 8:53 ` Eric Dumazet
2009-04-22 10:13 ` Jarek Poplawski
2009-04-22 11:26 ` Ingo Molnar
2009-04-22 11:39 ` Jarek Poplawski
2009-04-22 11:18 ` Ingo Molnar
2009-04-22 15:19 ` Linus Torvalds
2009-04-22 16:57 ` Eric Dumazet
2009-04-22 17:18 ` Linus Torvalds
2009-04-22 20:46 ` Jarek Poplawski
2009-04-22 17:48 ` Ingo Molnar
2009-04-21 21:04 ` Stephen Hemminger
2009-04-22 8:00 ` Ingo Molnar
2009-04-21 19:39 ` Ingo Molnar
2009-04-21 21:39 ` [PATCH] netfilter: use per-cpu recursive lock (v13) Stephen Hemminger
2009-04-22 4:17 ` Paul E. McKenney
2009-04-22 14:57 ` Eric Dumazet
2009-04-22 15:32 ` Linus Torvalds
2009-04-24 4:09 ` [PATCH] netfilter: use per-CPU recursive lock {XIV} Stephen Hemminger
2009-04-24 4:58 ` Eric Dumazet
2009-04-24 15:33 ` Patrick McHardy
2009-04-24 16:18 ` Stephen Hemminger
2009-04-24 20:43 ` Jarek Poplawski
2009-04-25 20:30 ` [PATCH] netfilter: iptables no lockdep is needed Stephen Hemminger
2009-04-26 8:18 ` Jarek Poplawski
2009-04-26 18:24 ` [PATCH] netfilter: use per-CPU recursive lock {XV} Eric Dumazet
2009-04-26 18:56 ` Mathieu Desnoyers
2009-04-26 21:57 ` Stephen Hemminger
2009-04-26 22:32 ` Mathieu Desnoyers
2009-04-27 17:44 ` Peter Zijlstra
2009-04-27 18:30 ` [PATCH] netfilter: use per-CPU r**ursive " Stephen Hemminger
2009-04-27 18:54 ` Ingo Molnar
2009-04-27 19:06 ` Stephen Hemminger
2009-04-27 19:46 ` Linus Torvalds
2009-04-27 19:48 ` Linus Torvalds
2009-04-27 20:36 ` Evgeniy Polyakov
2009-04-27 20:58 ` Linus Torvalds
2009-04-27 21:40 ` Stephen Hemminger
2009-04-27 22:24 ` Linus Torvalds
2009-04-27 23:01 ` Linus Torvalds
2009-04-27 23:03 ` Linus Torvalds
2009-04-28 6:58 ` Eric Dumazet
2009-04-28 11:53 ` David Miller
2009-04-28 12:40 ` Ingo Molnar
2009-04-28 13:43 ` David Miller
2009-04-28 13:52 ` Mathieu Desnoyers
2009-04-28 14:37 ` David Miller
2009-04-28 14:49 ` Mathieu Desnoyers
2009-04-28 15:00 ` David Miller
2009-04-28 16:24 ` [PATCH] netfilter: revised locking for x_tables Stephen Hemminger
2009-04-28 16:50 ` Linus Torvalds
2009-04-28 16:55 ` Linus Torvalds
2009-04-29 5:37 ` David Miller
2009-04-30 3:26 ` Jeff Chua
2009-04-30 3:31 ` David Miller
2009-04-28 15:42 ` [PATCH] netfilter: use per-CPU r**ursive lock {XV} Paul E. McKenney
2009-04-28 17:35 ` Christoph Lameter
2009-04-28 15:09 ` Linus Torvalds
2009-04-27 23:32 ` Linus Torvalds
2009-04-28 7:41 ` Peter Zijlstra
2009-04-28 14:22 ` Paul E. McKenney
2009-04-28 7:42 ` Jan Engelhardt
2009-04-26 19:31 ` [PATCH] netfilter: use per-CPU recursive " Mathieu Desnoyers
2009-04-26 20:55 ` Eric Dumazet
2009-04-26 21:39 ` Mathieu Desnoyers
2009-04-21 18:34 ` [PATCH] netfilter: use per-cpu recursive lock (v11) Paul E. McKenney
2009-04-21 20:14 ` Linus Torvalds
2009-04-20 23:44 ` [PATCH] netfilter: use per-cpu recursive lock (v10) Paul E. McKenney
2009-04-16 0:02 ` [PATCH] netfilter: use per-cpu spinlock rather than RCU (v3) Linus Torvalds
2009-04-16 6:26 ` Eric Dumazet
2009-04-16 14:33 ` Paul E. McKenney
2009-04-15 3:23 ` David Miller
2009-04-14 17:19 ` [PATCH] netfilter: use per-cpu spinlock rather than RCU Stephen Hemminger
2009-04-11 15:50 ` iptables very slow after commit 784544739a25c30637397ace5489eeb6e15d7d49 Stephen Hemminger
2009-04-11 17:43 ` Paul E. McKenney
2009-04-11 18:57 ` Linus Torvalds
2009-04-12 0:34 ` Paul E. McKenney
2009-04-12 7:23 ` Evgeniy Polyakov
2009-04-12 16:06 ` Stephen Hemminger
2009-04-12 17:30 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090417012812.GA25534@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=benh@kernel.crashing.org \
--cc=dada1@cosmosbay.com \
--cc=davem@davemloft.net \
--cc=jeff.chua.linux@gmail.com \
--cc=jengelh@medozas.de \
--cc=kaber@trash.net \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@polymtl.ca \
--cc=mingo@elte.hu \
--cc=netdev@vger.kernel.org \
--cc=netfilter-devel@vger.kernel.org \
--cc=paulus@samba.org \
--cc=r000n@r000n.net \
--cc=shemminger@vyatta.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).