From: Eric Dumazet <dada1@cosmosbay.com>
To: Stephen Hemminger <shemminger@vyatta.com>,
David Miller <davem@davemloft.net>
Cc: Jarek Poplawski <jarkao2@gmail.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Ingo Molnar <mingo@elte.hu>, Paul Mackerras <paulus@samba.org>,
paulmck@linux.vnet.ibm.com, Evgeniy Polyakov <zbr@ioremap.net>,
kaber@trash.net, jeff.chua.linux@gmail.com, laijs@cn.fujitsu.com,
jengelh@medozas.de, r000n@r000n.net,
linux-kernel@vger.kernel.org, netfilter-devel@vger.kernel.org,
netdev@vger.kernel.org, benh@kernel.crashing.org,
mathieu.desnoyers@polymtl.ca
Subject: [PATCH] netfilter: use per-CPU recursive lock {XV}
Date: Sun, 26 Apr 2009 20:24:35 +0200 [thread overview]
Message-ID: <49F4A6E3.7080102@cosmosbay.com> (raw)
In-Reply-To: <20090425133052.4cb711f5@nehalam>
From: Stephen Hemminger <shemminger@vyatta.com>
> Epilogue due to master Jarek. Lockdep carest not about the locking
> doth bestowed. Therefore no keys are needed.
>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
So far, so good, should be ready for inclusion now, nobody complained :)
I include the final patch, merge of your last two patches.
David, could you please review it once again and apply it if it's OK ?
Thanks to all for your help and patience
[PATCH] netfilter: use per-CPU recursive lock {XV}
In days of old in 2.6.29, netfilter did locketh using a
lock of the reader kind when doing its table business, and do
a writer when with pen in hand like a overworked accountant
did replace the tables. This sucketh and caused the single
lock to fly back and forth like a poor errant boy.
But then netfilter was blessed with RCU and the performance
was divine, but alas there were those that suffered for
trying to replace their many rules one at a time.
So now RCU must be vanquished from the scene, and better
chastity belts be placed upon this valuable asset most dear.
The locks that were but one are now replaced by one per suitor.
The repair was made after much discussion involving
Eric the wise, and Linus the foul. With flowers springing
up amid the thorns some peace has finally prevailed and
all is soothed. This patch and purple prose was penned by
in honor of "Talk like Shakespeare" day.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
What hath changed over the last two setting suns:
* more words, mostly correct...
* no need to locketh for writeh on current cpu tis
always so
* the locking of all cpu's on replace is always done as
part of the get_counters cycle, so the sychronize swip
in replace tables is gone with only a comment remaing
* Epilogue due to master Jarek. Lockdep carest not about
the locking doth bestowed. Therefore no keys are needed.
include/linux/netfilter/x_tables.h | 55 ++++++++++-
net/ipv4/netfilter/arp_tables.c | 125 +++++++-------------------
net/ipv4/netfilter/ip_tables.c | 126 +++++++--------------------
net/ipv6/netfilter/ip6_tables.c | 123 +++++++-------------------
net/netfilter/x_tables.c | 50 +++++-----
5 files changed, 183 insertions(+), 296 deletions(-)
diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 7b1a652..511debb 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -354,9 +354,6 @@ struct xt_table
/* What hooks you will enter on */
unsigned int valid_hooks;
- /* Lock for the curtain */
- struct mutex lock;
-
/* Man behind the curtain... */
struct xt_table_info *private;
@@ -434,8 +431,56 @@ extern void xt_proto_fini(struct net *net, u_int8_t af);
extern struct xt_table_info *xt_alloc_table_info(unsigned int size);
extern void xt_free_table_info(struct xt_table_info *info);
-extern void xt_table_entry_swap_rcu(struct xt_table_info *old,
- struct xt_table_info *new);
+
+/*
+ * Per-CPU read/write lock associated with per-cpu table entries.
+ * This is not a general solution but makes reader locking fast since
+ * there is no shared variable to cause cache ping-pong; but adds an
+ * additional write-side penalty since update must lock all
+ * possible CPU's.
+ *
+ * Read lock is used by ip/arp/ip6 tables rule processing which runs per-cpu.
+ * It needs to ensure that the rules are not being changed while packet
+ * is being processed. In some cases, the read lock will be acquired
+ * twice on the same CPU; this is okay because read locks handle nesting.
+ *
+ * Write lock is used in two cases:
+ * 1. reading counter values
+ * all readers need to be stopped and the per-CPU values are summed.
+ *
+ * 2. replacing tables
+ * any readers that are using the old tables have to complete
+ * before freeing the old table. This is handled by reading
+ * as a side effect of reading counters
+ */
+DECLARE_PER_CPU(rwlock_t, xt_info_locks);
+
+static inline void xt_info_rdlock_bh(void)
+{
+ /*
+ * Note: can not use read_lock_bh(&__get_cpu_var(xt_info_locks))
+ * because need to ensure that preemption is disable before
+ * acquiring per-cpu-variable, so do it as a two step process
+ */
+ local_bh_disable();
+ read_lock(&__get_cpu_var(xt_info_locks));
+}
+
+static inline void xt_info_rdunlock_bh(void)
+{
+ read_unlock_bh(&__get_cpu_var(xt_info_locks));
+}
+
+static inline void xt_info_wrlock(unsigned int cpu)
+{
+ write_lock(&per_cpu(xt_info_locks, cpu));
+}
+
+static inline void xt_info_wrunlock(unsigned int cpu)
+{
+
+ write_unlock(&per_cpu(xt_info_locks, cpu));
+}
/*
* This helper is performance critical and must be inlined
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 5ba533d..831fe18 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -253,9 +253,9 @@ unsigned int arpt_do_table(struct sk_buff *skb,
indev = in ? in->name : nulldevname;
outdev = out ? out->name : nulldevname;
- rcu_read_lock_bh();
- private = rcu_dereference(table->private);
- table_base = rcu_dereference(private->entries[smp_processor_id()]);
+ xt_info_rdlock_bh();
+ private = table->private;
+ table_base = private->entries[smp_processor_id()];
e = get_entry(table_base, private->hook_entry[hook]);
back = get_entry(table_base, private->underflow[hook]);
@@ -273,6 +273,7 @@ unsigned int arpt_do_table(struct sk_buff *skb,
hdr_len = sizeof(*arp) + (2 * sizeof(struct in_addr)) +
(2 * skb->dev->addr_len);
+
ADD_COUNTER(e->counters, hdr_len, 1);
t = arpt_get_target(e);
@@ -328,8 +329,7 @@ unsigned int arpt_do_table(struct sk_buff *skb,
e = (void *)e + e->next_offset;
}
} while (!hotdrop);
-
- rcu_read_unlock_bh();
+ xt_info_rdunlock_bh();
if (hotdrop)
return NF_DROP;
@@ -711,9 +711,12 @@ static void get_counters(const struct xt_table_info *t,
/* Instead of clearing (by a previous call to memset())
* the counters and using adds, we set the counters
* with data used by 'current' CPU
- * We dont care about preemption here.
+ *
+ * Bottom half has to be disabled to prevent deadlock
+ * if new softirq were to run and call ipt_do_table
*/
- curcpu = raw_smp_processor_id();
+ local_bh_disable();
+ curcpu = smp_processor_id();
i = 0;
ARPT_ENTRY_ITERATE(t->entries[curcpu],
@@ -726,73 +729,22 @@ static void get_counters(const struct xt_table_info *t,
if (cpu == curcpu)
continue;
i = 0;
+ xt_info_wrlock(cpu);
ARPT_ENTRY_ITERATE(t->entries[cpu],
t->size,
add_entry_to_counter,
counters,
&i);
+ xt_info_wrunlock(cpu);
}
-}
-
-
-/* We're lazy, and add to the first CPU; overflow works its fey magic
- * and everything is OK. */
-static int
-add_counter_to_entry(struct arpt_entry *e,
- const struct xt_counters addme[],
- unsigned int *i)
-{
- ADD_COUNTER(e->counters, addme[*i].bcnt, addme[*i].pcnt);
-
- (*i)++;
- return 0;
-}
-
-/* Take values from counters and add them back onto the current cpu */
-static void put_counters(struct xt_table_info *t,
- const struct xt_counters counters[])
-{
- unsigned int i, cpu;
-
- local_bh_disable();
- cpu = smp_processor_id();
- i = 0;
- ARPT_ENTRY_ITERATE(t->entries[cpu],
- t->size,
- add_counter_to_entry,
- counters,
- &i);
local_bh_enable();
}
-static inline int
-zero_entry_counter(struct arpt_entry *e, void *arg)
-{
- e->counters.bcnt = 0;
- e->counters.pcnt = 0;
- return 0;
-}
-
-static void
-clone_counters(struct xt_table_info *newinfo, const struct xt_table_info *info)
-{
- unsigned int cpu;
- const void *loc_cpu_entry = info->entries[raw_smp_processor_id()];
-
- memcpy(newinfo, info, offsetof(struct xt_table_info, entries));
- for_each_possible_cpu(cpu) {
- memcpy(newinfo->entries[cpu], loc_cpu_entry, info->size);
- ARPT_ENTRY_ITERATE(newinfo->entries[cpu], newinfo->size,
- zero_entry_counter, NULL);
- }
-}
-
static struct xt_counters *alloc_counters(struct xt_table *table)
{
unsigned int countersize;
struct xt_counters *counters;
struct xt_table_info *private = table->private;
- struct xt_table_info *info;
/* We need atomic snapshot of counters: rest doesn't change
* (other than comefrom, which userspace doesn't care
@@ -802,30 +754,11 @@ static struct xt_counters *alloc_counters(struct xt_table *table)
counters = vmalloc_node(countersize, numa_node_id());
if (counters == NULL)
- goto nomem;
-
- info = xt_alloc_table_info(private->size);
- if (!info)
- goto free_counters;
-
- clone_counters(info, private);
-
- mutex_lock(&table->lock);
- xt_table_entry_swap_rcu(private, info);
- synchronize_net(); /* Wait until smoke has cleared */
+ return ERR_PTR(-ENOMEM);
- get_counters(info, counters);
- put_counters(private, counters);
- mutex_unlock(&table->lock);
-
- xt_free_table_info(info);
+ get_counters(private, counters);
return counters;
-
- free_counters:
- vfree(counters);
- nomem:
- return ERR_PTR(-ENOMEM);
}
static int copy_entries_to_user(unsigned int total_size,
@@ -1094,8 +1027,9 @@ static int __do_replace(struct net *net, const char *name,
(newinfo->number <= oldinfo->initial_entries))
module_put(t->me);
- /* Get the old counters. */
+ /* Get the old counters, and synchronize with replace */
get_counters(oldinfo, counters);
+
/* Decrease module usage counts and free resource */
loc_cpu_old_entry = oldinfo->entries[raw_smp_processor_id()];
ARPT_ENTRY_ITERATE(loc_cpu_old_entry, oldinfo->size, cleanup_entry,
@@ -1165,10 +1099,23 @@ static int do_replace(struct net *net, void __user *user, unsigned int len)
return ret;
}
+/* We're lazy, and add to the first CPU; overflow works its fey magic
+ * and everything is OK. */
+static int
+add_counter_to_entry(struct arpt_entry *e,
+ const struct xt_counters addme[],
+ unsigned int *i)
+{
+ ADD_COUNTER(e->counters, addme[*i].bcnt, addme[*i].pcnt);
+
+ (*i)++;
+ return 0;
+}
+
static int do_add_counters(struct net *net, void __user *user, unsigned int len,
int compat)
{
- unsigned int i;
+ unsigned int i, curcpu;
struct xt_counters_info tmp;
struct xt_counters *paddc;
unsigned int num_counters;
@@ -1224,26 +1171,26 @@ static int do_add_counters(struct net *net, void __user *user, unsigned int len,
goto free;
}
- mutex_lock(&t->lock);
+ local_bh_disable();
private = t->private;
if (private->number != num_counters) {
ret = -EINVAL;
goto unlock_up_free;
}
- preempt_disable();
i = 0;
/* Choose the copy that is on our node */
- loc_cpu_entry = private->entries[smp_processor_id()];
+ curcpu = smp_processor_id();
+ loc_cpu_entry = private->entries[curcpu];
+ xt_info_wrlock(curcpu);
ARPT_ENTRY_ITERATE(loc_cpu_entry,
private->size,
add_counter_to_entry,
paddc,
&i);
- preempt_enable();
+ xt_info_wrunlock(curcpu);
unlock_up_free:
- mutex_unlock(&t->lock);
-
+ local_bh_enable();
xt_table_unlock(t);
module_put(t->me);
free:
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 810c0b6..2ec8d72 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -338,10 +338,9 @@ ipt_do_table(struct sk_buff *skb,
tgpar.hooknum = hook;
IP_NF_ASSERT(table->valid_hooks & (1 << hook));
-
- rcu_read_lock_bh();
- private = rcu_dereference(table->private);
- table_base = rcu_dereference(private->entries[smp_processor_id()]);
+ xt_info_rdlock_bh();
+ private = table->private;
+ table_base = private->entries[smp_processor_id()];
e = get_entry(table_base, private->hook_entry[hook]);
@@ -436,8 +435,7 @@ ipt_do_table(struct sk_buff *skb,
e = (void *)e + e->next_offset;
}
} while (!hotdrop);
-
- rcu_read_unlock_bh();
+ xt_info_rdunlock_bh();
#ifdef DEBUG_ALLOW_ALL
return NF_ACCEPT;
@@ -896,10 +894,13 @@ get_counters(const struct xt_table_info *t,
/* Instead of clearing (by a previous call to memset())
* the counters and using adds, we set the counters
- * with data used by 'current' CPU
- * We dont care about preemption here.
+ * with data used by 'current' CPU.
+ *
+ * Bottom half has to be disabled to prevent deadlock
+ * if new softirq were to run and call ipt_do_table
*/
- curcpu = raw_smp_processor_id();
+ local_bh_disable();
+ curcpu = smp_processor_id();
i = 0;
IPT_ENTRY_ITERATE(t->entries[curcpu],
@@ -912,74 +913,22 @@ get_counters(const struct xt_table_info *t,
if (cpu == curcpu)
continue;
i = 0;
+ xt_info_wrlock(cpu);
IPT_ENTRY_ITERATE(t->entries[cpu],
t->size,
add_entry_to_counter,
counters,
&i);
+ xt_info_wrunlock(cpu);
}
-
-}
-
-/* We're lazy, and add to the first CPU; overflow works its fey magic
- * and everything is OK. */
-static int
-add_counter_to_entry(struct ipt_entry *e,
- const struct xt_counters addme[],
- unsigned int *i)
-{
- ADD_COUNTER(e->counters, addme[*i].bcnt, addme[*i].pcnt);
-
- (*i)++;
- return 0;
-}
-
-/* Take values from counters and add them back onto the current cpu */
-static void put_counters(struct xt_table_info *t,
- const struct xt_counters counters[])
-{
- unsigned int i, cpu;
-
- local_bh_disable();
- cpu = smp_processor_id();
- i = 0;
- IPT_ENTRY_ITERATE(t->entries[cpu],
- t->size,
- add_counter_to_entry,
- counters,
- &i);
local_bh_enable();
}
-
-static inline int
-zero_entry_counter(struct ipt_entry *e, void *arg)
-{
- e->counters.bcnt = 0;
- e->counters.pcnt = 0;
- return 0;
-}
-
-static void
-clone_counters(struct xt_table_info *newinfo, const struct xt_table_info *info)
-{
- unsigned int cpu;
- const void *loc_cpu_entry = info->entries[raw_smp_processor_id()];
-
- memcpy(newinfo, info, offsetof(struct xt_table_info, entries));
- for_each_possible_cpu(cpu) {
- memcpy(newinfo->entries[cpu], loc_cpu_entry, info->size);
- IPT_ENTRY_ITERATE(newinfo->entries[cpu], newinfo->size,
- zero_entry_counter, NULL);
- }
-}
-
static struct xt_counters * alloc_counters(struct xt_table *table)
{
unsigned int countersize;
struct xt_counters *counters;
struct xt_table_info *private = table->private;
- struct xt_table_info *info;
/* We need atomic snapshot of counters: rest doesn't change
(other than comefrom, which userspace doesn't care
@@ -988,30 +937,11 @@ static struct xt_counters * alloc_counters(struct xt_table *table)
counters = vmalloc_node(countersize, numa_node_id());
if (counters == NULL)
- goto nomem;
+ return ERR_PTR(-ENOMEM);
- info = xt_alloc_table_info(private->size);
- if (!info)
- goto free_counters;
-
- clone_counters(info, private);
-
- mutex_lock(&table->lock);
- xt_table_entry_swap_rcu(private, info);
- synchronize_net(); /* Wait until smoke has cleared */
-
- get_counters(info, counters);
- put_counters(private, counters);
- mutex_unlock(&table->lock);
-
- xt_free_table_info(info);
+ get_counters(private, counters);
return counters;
-
- free_counters:
- vfree(counters);
- nomem:
- return ERR_PTR(-ENOMEM);
}
static int
@@ -1306,8 +1236,9 @@ __do_replace(struct net *net, const char *name, unsigned int valid_hooks,
(newinfo->number <= oldinfo->initial_entries))
module_put(t->me);
- /* Get the old counters. */
+ /* Get the old counters, and synchronize with replace */
get_counters(oldinfo, counters);
+
/* Decrease module usage counts and free resource */
loc_cpu_old_entry = oldinfo->entries[raw_smp_processor_id()];
IPT_ENTRY_ITERATE(loc_cpu_old_entry, oldinfo->size, cleanup_entry,
@@ -1377,11 +1308,23 @@ do_replace(struct net *net, void __user *user, unsigned int len)
return ret;
}
+/* We're lazy, and add to the first CPU; overflow works its fey magic
+ * and everything is OK. */
+static int
+add_counter_to_entry(struct ipt_entry *e,
+ const struct xt_counters addme[],
+ unsigned int *i)
+{
+ ADD_COUNTER(e->counters, addme[*i].bcnt, addme[*i].pcnt);
+
+ (*i)++;
+ return 0;
+}
static int
do_add_counters(struct net *net, void __user *user, unsigned int len, int compat)
{
- unsigned int i;
+ unsigned int i, curcpu;
struct xt_counters_info tmp;
struct xt_counters *paddc;
unsigned int num_counters;
@@ -1437,25 +1380,26 @@ do_add_counters(struct net *net, void __user *user, unsigned int len, int compat
goto free;
}
- mutex_lock(&t->lock);
+ local_bh_disable();
private = t->private;
if (private->number != num_counters) {
ret = -EINVAL;
goto unlock_up_free;
}
- preempt_disable();
i = 0;
/* Choose the copy that is on our node */
- loc_cpu_entry = private->entries[raw_smp_processor_id()];
+ curcpu = smp_processor_id();
+ loc_cpu_entry = private->entries[curcpu];
+ xt_info_wrlock(curcpu);
IPT_ENTRY_ITERATE(loc_cpu_entry,
private->size,
add_counter_to_entry,
paddc,
&i);
- preempt_enable();
+ xt_info_wrunlock(curcpu);
unlock_up_free:
- mutex_unlock(&t->lock);
+ local_bh_enable();
xt_table_unlock(t);
module_put(t->me);
free:
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 800ae85..219e165 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -365,9 +365,9 @@ ip6t_do_table(struct sk_buff *skb,
IP_NF_ASSERT(table->valid_hooks & (1 << hook));
- rcu_read_lock_bh();
- private = rcu_dereference(table->private);
- table_base = rcu_dereference(private->entries[smp_processor_id()]);
+ xt_info_rdlock_bh();
+ private = table->private;
+ table_base = private->entries[smp_processor_id()];
e = get_entry(table_base, private->hook_entry[hook]);
@@ -466,7 +466,7 @@ ip6t_do_table(struct sk_buff *skb,
#ifdef CONFIG_NETFILTER_DEBUG
((struct ip6t_entry *)table_base)->comefrom = NETFILTER_LINK_POISON;
#endif
- rcu_read_unlock_bh();
+ xt_info_rdunlock_bh();
#ifdef DEBUG_ALLOW_ALL
return NF_ACCEPT;
@@ -926,9 +926,12 @@ get_counters(const struct xt_table_info *t,
/* Instead of clearing (by a previous call to memset())
* the counters and using adds, we set the counters
* with data used by 'current' CPU
- * We dont care about preemption here.
+ *
+ * Bottom half has to be disabled to prevent deadlock
+ * if new softirq were to run and call ipt_do_table
*/
- curcpu = raw_smp_processor_id();
+ local_bh_disable();
+ curcpu = smp_processor_id();
i = 0;
IP6T_ENTRY_ITERATE(t->entries[curcpu],
@@ -941,72 +944,22 @@ get_counters(const struct xt_table_info *t,
if (cpu == curcpu)
continue;
i = 0;
+ xt_info_wrlock(cpu);
IP6T_ENTRY_ITERATE(t->entries[cpu],
t->size,
add_entry_to_counter,
counters,
&i);
+ xt_info_wrunlock(cpu);
}
-}
-
-/* We're lazy, and add to the first CPU; overflow works its fey magic
- * and everything is OK. */
-static int
-add_counter_to_entry(struct ip6t_entry *e,
- const struct xt_counters addme[],
- unsigned int *i)
-{
- ADD_COUNTER(e->counters, addme[*i].bcnt, addme[*i].pcnt);
-
- (*i)++;
- return 0;
-}
-
-/* Take values from counters and add them back onto the current cpu */
-static void put_counters(struct xt_table_info *t,
- const struct xt_counters counters[])
-{
- unsigned int i, cpu;
-
- local_bh_disable();
- cpu = smp_processor_id();
- i = 0;
- IP6T_ENTRY_ITERATE(t->entries[cpu],
- t->size,
- add_counter_to_entry,
- counters,
- &i);
local_bh_enable();
}
-static inline int
-zero_entry_counter(struct ip6t_entry *e, void *arg)
-{
- e->counters.bcnt = 0;
- e->counters.pcnt = 0;
- return 0;
-}
-
-static void
-clone_counters(struct xt_table_info *newinfo, const struct xt_table_info *info)
-{
- unsigned int cpu;
- const void *loc_cpu_entry = info->entries[raw_smp_processor_id()];
-
- memcpy(newinfo, info, offsetof(struct xt_table_info, entries));
- for_each_possible_cpu(cpu) {
- memcpy(newinfo->entries[cpu], loc_cpu_entry, info->size);
- IP6T_ENTRY_ITERATE(newinfo->entries[cpu], newinfo->size,
- zero_entry_counter, NULL);
- }
-}
-
static struct xt_counters *alloc_counters(struct xt_table *table)
{
unsigned int countersize;
struct xt_counters *counters;
struct xt_table_info *private = table->private;
- struct xt_table_info *info;
/* We need atomic snapshot of counters: rest doesn't change
(other than comefrom, which userspace doesn't care
@@ -1015,30 +968,11 @@ static struct xt_counters *alloc_counters(struct xt_table *table)
counters = vmalloc_node(countersize, numa_node_id());
if (counters == NULL)
- goto nomem;
+ return ERR_PTR(-ENOMEM);
- info = xt_alloc_table_info(private->size);
- if (!info)
- goto free_counters;
-
- clone_counters(info, private);
-
- mutex_lock(&table->lock);
- xt_table_entry_swap_rcu(private, info);
- synchronize_net(); /* Wait until smoke has cleared */
-
- get_counters(info, counters);
- put_counters(private, counters);
- mutex_unlock(&table->lock);
-
- xt_free_table_info(info);
+ get_counters(private, counters);
return counters;
-
- free_counters:
- vfree(counters);
- nomem:
- return ERR_PTR(-ENOMEM);
}
static int
@@ -1334,8 +1268,9 @@ __do_replace(struct net *net, const char *name, unsigned int valid_hooks,
(newinfo->number <= oldinfo->initial_entries))
module_put(t->me);
- /* Get the old counters. */
+ /* Get the old counters, and synchronize with replace */
get_counters(oldinfo, counters);
+
/* Decrease module usage counts and free resource */
loc_cpu_old_entry = oldinfo->entries[raw_smp_processor_id()];
IP6T_ENTRY_ITERATE(loc_cpu_old_entry, oldinfo->size, cleanup_entry,
@@ -1405,11 +1340,24 @@ do_replace(struct net *net, void __user *user, unsigned int len)
return ret;
}
+/* We're lazy, and add to the first CPU; overflow works its fey magic
+ * and everything is OK. */
+static int
+add_counter_to_entry(struct ip6t_entry *e,
+ const struct xt_counters addme[],
+ unsigned int *i)
+{
+ ADD_COUNTER(e->counters, addme[*i].bcnt, addme[*i].pcnt);
+
+ (*i)++;
+ return 0;
+}
+
static int
do_add_counters(struct net *net, void __user *user, unsigned int len,
int compat)
{
- unsigned int i;
+ unsigned int i, curcpu;
struct xt_counters_info tmp;
struct xt_counters *paddc;
unsigned int num_counters;
@@ -1465,25 +1413,28 @@ do_add_counters(struct net *net, void __user *user, unsigned int len,
goto free;
}
- mutex_lock(&t->lock);
+
+ local_bh_disable();
private = t->private;
if (private->number != num_counters) {
ret = -EINVAL;
goto unlock_up_free;
}
- preempt_disable();
i = 0;
/* Choose the copy that is on our node */
- loc_cpu_entry = private->entries[raw_smp_processor_id()];
+ curcpu = smp_processor_id();
+ xt_info_wrlock(curcpu);
+ loc_cpu_entry = private->entries[curcpu];
IP6T_ENTRY_ITERATE(loc_cpu_entry,
private->size,
add_counter_to_entry,
paddc,
&i);
- preempt_enable();
+ xt_info_wrunlock(curcpu);
+
unlock_up_free:
- mutex_unlock(&t->lock);
+ local_bh_enable();
xt_table_unlock(t);
module_put(t->me);
free:
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 509a956..5807a4d 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -625,20 +625,6 @@ void xt_free_table_info(struct xt_table_info *info)
}
EXPORT_SYMBOL(xt_free_table_info);
-void xt_table_entry_swap_rcu(struct xt_table_info *oldinfo,
- struct xt_table_info *newinfo)
-{
- unsigned int cpu;
-
- for_each_possible_cpu(cpu) {
- void *p = oldinfo->entries[cpu];
- rcu_assign_pointer(oldinfo->entries[cpu], newinfo->entries[cpu]);
- newinfo->entries[cpu] = p;
- }
-
-}
-EXPORT_SYMBOL_GPL(xt_table_entry_swap_rcu);
-
/* Find table by name, grabs mutex & ref. Returns ERR_PTR() on error. */
struct xt_table *xt_find_table_lock(struct net *net, u_int8_t af,
const char *name)
@@ -676,32 +662,43 @@ void xt_compat_unlock(u_int8_t af)
EXPORT_SYMBOL_GPL(xt_compat_unlock);
#endif
+DEFINE_PER_CPU(rwlock_t, xt_info_locks);
+EXPORT_PER_CPU_SYMBOL_GPL(xt_info_locks);
+
+
struct xt_table_info *
xt_replace_table(struct xt_table *table,
unsigned int num_counters,
struct xt_table_info *newinfo,
int *error)
{
- struct xt_table_info *oldinfo, *private;
+ struct xt_table_info *private;
/* Do the substitution. */
- mutex_lock(&table->lock);
+ local_bh_disable();
private = table->private;
+
/* Check inside lock: is the old number correct? */
if (num_counters != private->number) {
duprintf("num_counters != table->private->number (%u/%u)\n",
num_counters, private->number);
- mutex_unlock(&table->lock);
+ local_bh_enable();
*error = -EAGAIN;
return NULL;
}
- oldinfo = private;
- rcu_assign_pointer(table->private, newinfo);
- newinfo->initial_entries = oldinfo->initial_entries;
- mutex_unlock(&table->lock);
- synchronize_net();
- return oldinfo;
+ table->private = newinfo;
+ newinfo->initial_entries = private->initial_entries;
+
+ /*
+ * Even though table entries have now been swapped, other CPU's
+ * may still be using the old entries. This is okay, because
+ * resynchronization happens because of the locking done
+ * during the get_counters() routine.
+ */
+ local_bh_enable();
+
+ return private;
}
EXPORT_SYMBOL_GPL(xt_replace_table);
@@ -734,7 +731,6 @@ struct xt_table *xt_register_table(struct net *net, struct xt_table *table,
/* Simplifies replace_table code. */
table->private = bootstrap;
- mutex_init(&table->lock);
if (!xt_replace_table(table, 0, newinfo, &ret))
goto unlock;
@@ -1147,7 +1143,11 @@ static struct pernet_operations xt_net_ops = {
static int __init xt_init(void)
{
- int i, rv;
+ unsigned int i;
+ int rv;
+
+ for_each_possible_cpu(i)
+ rwlock_init(&per_cpu(xt_info_locks, i));
xt = kmalloc(sizeof(struct xt_af) * NFPROTO_NUMPROTO, GFP_KERNEL);
if (!xt)
next prev parent reply other threads:[~2009-04-26 18:24 UTC|newest]
Thread overview: 254+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-04-10 9:15 iptables very slow after commit784544739a25c30637397ace5489eeb6e15d7d49 Jeff Chua
2009-04-10 16:52 ` Stephen Hemminger
2009-04-11 1:07 ` Jeff Chua
2009-04-11 1:25 ` David Miller
2009-04-11 1:39 ` iptables very slow after commit 784544739a25c30637397ace5489eeb6e15d7d49 Linus Torvalds
2009-04-11 4:15 ` Paul E. McKenney
2009-04-11 5:14 ` Jan Engelhardt
2009-04-11 5:42 ` Paul E. McKenney
2009-04-11 6:00 ` David Miller
2009-04-11 18:12 ` Kyle Moffett
2009-04-11 18:12 ` Kyle Moffett
2009-04-11 18:32 ` Arkadiusz Miskiewicz
2009-04-11 18:32 ` Arkadiusz Miskiewicz
2009-04-12 0:54 ` david
2009-04-12 5:05 ` Kyle Moffett
2009-04-12 5:05 ` Kyle Moffett
2009-04-12 12:30 ` Harald Welte
2009-04-12 16:38 ` Jan Engelhardt
2009-04-11 15:07 ` Stephen Hemminger
2009-04-11 16:05 ` Jeff Chua
2009-04-11 16:05 ` Jeff Chua
2009-04-11 17:51 ` Linus Torvalds
2009-04-11 7:08 ` Ingo Molnar
2009-04-11 15:05 ` Stephen Hemminger
2009-04-11 17:48 ` Paul E. McKenney
2009-04-12 10:54 ` Ingo Molnar
2009-04-12 11:34 ` Paul Mackerras
2009-04-12 17:31 ` Paul E. McKenney
2009-04-13 1:13 ` David Miller
2009-04-13 4:04 ` Paul E. McKenney
2009-04-13 16:53 ` [PATCH] netfilter: use per-cpu spinlock rather than RCU Stephen Hemminger
2009-04-13 17:40 ` Eric Dumazet
2009-04-13 17:40 ` Eric Dumazet
2009-04-13 18:11 ` Stephen Hemminger
2009-04-13 19:06 ` Martin Josefsson
2009-04-13 19:17 ` Linus Torvalds
2009-04-13 22:24 ` Andrew Morton
2009-04-13 23:20 ` Stephen Hemminger
2009-04-13 23:26 ` Andrew Morton
2009-04-13 23:37 ` Linus Torvalds
2009-04-13 23:52 ` Ingo Molnar
2009-04-14 12:27 ` Patrick McHardy
2009-04-14 14:23 ` Eric Dumazet
2009-04-14 14:45 ` Stephen Hemminger
2009-04-14 15:49 ` Eric Dumazet
2009-04-14 15:49 ` Eric Dumazet
2009-04-14 16:51 ` Jeff Chua
2009-04-14 18:17 ` [PATCH] netfilter: use per-cpu spinlock rather than RCU (v2) Stephen Hemminger
2009-04-14 19:28 ` Eric Dumazet
2009-04-14 21:11 ` Stephen Hemminger
2009-04-14 21:13 ` [PATCH] netfilter: use per-cpu spinlock rather than RCU (v3) Stephen Hemminger
2009-04-14 21:40 ` Eric Dumazet
2009-04-14 21:40 ` Eric Dumazet
2009-04-15 10:59 ` Patrick McHardy
2009-04-15 10:59 ` Patrick McHardy
2009-04-15 16:31 ` Stephen Hemminger
2009-04-15 16:31 ` Stephen Hemminger
2009-04-15 20:55 ` Stephen Hemminger
2009-04-15 21:07 ` Eric Dumazet
2009-04-15 21:55 ` Jan Engelhardt
2009-04-16 12:12 ` Patrick McHardy
2009-04-16 12:24 ` Jan Engelhardt
2009-04-16 12:24 ` Jan Engelhardt
2009-04-16 12:31 ` Patrick McHardy
2009-04-16 12:31 ` Patrick McHardy
2009-04-15 21:57 ` [PATCH] netfilter: use per-cpu rwlock rather than RCU (v4) Stephen Hemminger
2009-04-15 23:48 ` [PATCH] netfilter: use per-cpu spinlock rather than RCU (v3) David Miller
2009-04-16 0:01 ` Stephen Hemminger
2009-04-16 0:05 ` David Miller
2009-04-16 12:28 ` Patrick McHardy
2009-04-16 0:10 ` Linus Torvalds
2009-04-16 0:45 ` [PATCH] netfilter: use per-cpu spinlock and RCU (v5) Stephen Hemminger
2009-04-16 5:01 ` Eric Dumazet
2009-04-16 13:53 ` Patrick McHardy
2009-04-16 13:53 ` Patrick McHardy
2009-04-16 14:47 ` Paul E. McKenney
2009-04-16 14:47 ` Paul E. McKenney
2009-04-16 16:10 ` [PATCH] netfilter: use per-cpu recursive spinlock (v6) Eric Dumazet
2009-04-16 16:10 ` Eric Dumazet
2009-04-16 16:20 ` Eric Dumazet
2009-04-16 16:20 ` Eric Dumazet
2009-04-16 16:37 ` Linus Torvalds
2009-04-16 16:59 ` Patrick McHardy
2009-04-16 17:58 ` Paul E. McKenney
2009-04-16 17:58 ` Paul E. McKenney
2009-04-16 18:41 ` Eric Dumazet
2009-04-16 20:49 ` [PATCH[] netfilter: use per-cpu reader-writer lock (v0.7) Stephen Hemminger
2009-04-16 21:02 ` Linus Torvalds
2009-04-16 23:04 ` Ingo Molnar
2009-04-17 0:13 ` [PATCH] netfilter: use per-cpu recursive spinlock (v6) Paul E. McKenney
2009-04-17 0:13 ` Paul E. McKenney
2009-04-16 13:11 ` [PATCH] netfilter: use per-cpu spinlock rather than RCU (v3) Patrick McHardy
2009-04-16 22:33 ` David Miller
2009-04-16 23:49 ` Paul E. McKenney
2009-04-16 23:52 ` [PATCH] netfilter: per-cpu spin-lock with recursion (v0.8) Stephen Hemminger
2009-04-17 0:15 ` Jeff Chua
2009-04-17 5:55 ` Peter Zijlstra
2009-04-17 6:03 ` Eric Dumazet
2009-04-17 6:14 ` Eric Dumazet
2009-04-17 6:14 ` Eric Dumazet
2009-04-17 17:08 ` Peter Zijlstra
2009-04-17 11:17 ` Patrick McHardy
2009-04-17 11:17 ` Patrick McHardy
2009-04-17 1:28 ` [PATCH] netfilter: use per-cpu spinlock rather than RCU (v3) Paul E. McKenney
2009-04-17 2:19 ` Mathieu Desnoyers
2009-04-17 5:05 ` Paul E. McKenney
2009-04-17 5:44 ` Mathieu Desnoyers
2009-04-17 14:51 ` Paul E. McKenney
2009-04-17 4:50 ` Stephen Hemminger
2009-04-17 5:08 ` Paul E. McKenney
2009-04-17 5:16 ` Eric Dumazet
2009-04-17 5:16 ` Eric Dumazet
2009-04-17 5:40 ` Paul E. McKenney
2009-04-17 5:40 ` Paul E. McKenney
2009-04-17 8:07 ` David Miller
2009-04-17 15:00 ` Paul E. McKenney
2009-04-17 17:22 ` Peter Zijlstra
2009-04-17 17:32 ` Linus Torvalds
2009-04-17 6:12 ` Peter Zijlstra
2009-04-17 16:33 ` Paul E. McKenney
2009-04-17 16:51 ` Peter Zijlstra
2009-04-17 21:29 ` Paul E. McKenney
2009-04-18 9:40 ` Evgeniy Polyakov
2009-04-18 14:14 ` Paul E. McKenney
2009-04-20 17:34 ` [PATCH] netfilter: use per-cpu recursive lock (v10) Stephen Hemminger
2009-04-20 18:21 ` Paul E. McKenney
2009-04-20 18:25 ` Eric Dumazet
2009-04-20 18:25 ` Eric Dumazet
2009-04-20 20:32 ` Stephen Hemminger
2009-04-20 20:42 ` Stephen Hemminger
2009-04-20 21:05 ` Paul E. McKenney
2009-04-20 21:05 ` Paul E. McKenney
2009-04-20 21:23 ` Paul Mackerras
2009-04-20 21:58 ` Paul E. McKenney
2009-04-20 22:41 ` Paul Mackerras
2009-04-20 23:01 ` [PATCH] netfilter: use per-cpu recursive lock (v11) Stephen Hemminger
2009-04-21 3:41 ` Lai Jiangshan
2009-04-21 3:56 ` Eric Dumazet
2009-04-21 4:15 ` Stephen Hemminger
2009-04-21 5:22 ` Lai Jiangshan
2009-04-21 5:45 ` Stephen Hemminger
2009-04-21 5:45 ` Stephen Hemminger
2009-04-21 6:52 ` Lai Jiangshan
2009-04-21 8:16 ` Evgeniy Polyakov
2009-04-21 8:42 ` Lai Jiangshan
2009-04-21 8:49 ` David Miller
2009-04-21 8:55 ` Eric Dumazet
2009-04-21 9:22 ` Evgeniy Polyakov
2009-04-21 9:34 ` Lai Jiangshan
2009-04-21 9:34 ` Lai Jiangshan
2009-04-21 5:34 ` Lai Jiangshan
2009-04-21 5:34 ` Lai Jiangshan
2009-04-21 4:59 ` Eric Dumazet
2009-04-21 4:59 ` Eric Dumazet
2009-04-21 16:37 ` Paul E. McKenney
2009-04-21 5:46 ` Lai Jiangshan
2009-04-21 16:13 ` Linus Torvalds
2009-04-21 16:43 ` Stephen Hemminger
2009-04-21 16:50 ` Linus Torvalds
2009-04-21 18:02 ` Ingo Molnar
2009-04-21 18:15 ` Stephen Hemminger
2009-04-21 19:10 ` Ingo Molnar
2009-04-21 19:46 ` Eric Dumazet
2009-04-21 19:46 ` Eric Dumazet
2009-04-22 7:35 ` Ingo Molnar
2009-04-22 7:35 ` Ingo Molnar
2009-04-22 8:53 ` Eric Dumazet
2009-04-22 10:13 ` Jarek Poplawski
2009-04-22 11:26 ` Ingo Molnar
2009-04-22 11:39 ` Jarek Poplawski
2009-04-22 11:18 ` Ingo Molnar
2009-04-22 15:19 ` Linus Torvalds
2009-04-22 16:57 ` Eric Dumazet
2009-04-22 17:18 ` Linus Torvalds
2009-04-22 20:46 ` Jarek Poplawski
2009-04-22 17:48 ` Ingo Molnar
2009-04-21 21:04 ` Stephen Hemminger
2009-04-22 8:00 ` Ingo Molnar
2009-04-21 19:39 ` Ingo Molnar
2009-04-21 21:39 ` [PATCH] netfilter: use per-cpu recursive lock (v13) Stephen Hemminger
2009-04-22 4:17 ` Paul E. McKenney
2009-04-22 14:57 ` Eric Dumazet
2009-04-22 15:32 ` Linus Torvalds
2009-04-24 4:09 ` [PATCH] netfilter: use per-CPU recursive lock {XIV} Stephen Hemminger
2009-04-24 4:58 ` Eric Dumazet
2009-04-24 15:33 ` Patrick McHardy
2009-04-24 15:33 ` Patrick McHardy
2009-04-24 16:18 ` Stephen Hemminger
2009-04-24 16:18 ` Stephen Hemminger
2009-04-24 20:43 ` Jarek Poplawski
2009-04-25 20:30 ` [PATCH] netfilter: iptables no lockdep is needed Stephen Hemminger
2009-04-26 8:18 ` Jarek Poplawski
2009-04-26 18:24 ` Eric Dumazet [this message]
2009-04-26 18:56 ` [PATCH] netfilter: use per-CPU recursive lock {XV} Mathieu Desnoyers
2009-04-26 21:57 ` Stephen Hemminger
2009-04-26 22:32 ` Mathieu Desnoyers
2009-04-27 17:44 ` Peter Zijlstra
2009-04-27 18:30 ` [PATCH] netfilter: use per-CPU r**ursive " Stephen Hemminger
2009-04-27 18:54 ` Ingo Molnar
2009-04-27 19:06 ` Stephen Hemminger
2009-04-27 19:46 ` Linus Torvalds
2009-04-27 19:48 ` Linus Torvalds
2009-04-27 20:36 ` Evgeniy Polyakov
2009-04-27 20:58 ` Linus Torvalds
2009-04-27 21:40 ` Stephen Hemminger
2009-04-27 21:40 ` Stephen Hemminger
2009-04-27 22:24 ` Linus Torvalds
2009-04-27 23:01 ` Linus Torvalds
2009-04-27 23:03 ` Linus Torvalds
2009-04-28 6:58 ` Eric Dumazet
2009-04-28 6:58 ` Eric Dumazet
2009-04-28 11:53 ` David Miller
2009-04-28 12:40 ` Ingo Molnar
2009-04-28 13:43 ` David Miller
2009-04-28 13:52 ` Mathieu Desnoyers
2009-04-28 14:37 ` David Miller
2009-04-28 14:49 ` Mathieu Desnoyers
2009-04-28 15:00 ` David Miller
2009-04-28 16:24 ` [PATCH] netfilter: revised locking for x_tables Stephen Hemminger
2009-04-28 16:50 ` Linus Torvalds
2009-04-28 16:55 ` Linus Torvalds
2009-04-29 5:37 ` David Miller
[not found] ` <20090428.223708.168741998.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2009-04-30 3:26 ` Jeff Chua
2009-04-30 3:26 ` Jeff Chua
[not found] ` <b6a2187b0904292026k7d6107a7vcdc761d4149f40aa-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-04-30 3:31 ` David Miller
2009-04-30 3:31 ` David Miller
2009-05-01 8:38 ` [PATCH] netfilter: use likely() in xt_info_rdlock_bh() Eric Dumazet
2009-05-01 16:10 ` David Miller
2009-04-28 15:42 ` [PATCH] netfilter: use per-CPU r**ursive lock {XV} Paul E. McKenney
2009-04-28 17:35 ` Christoph Lameter
2009-04-28 15:09 ` Linus Torvalds
2009-04-27 23:32 ` Linus Torvalds
2009-04-28 7:41 ` Peter Zijlstra
2009-04-28 14:22 ` Paul E. McKenney
2009-04-28 7:42 ` Jan Engelhardt
2009-04-26 19:31 ` [PATCH] netfilter: use per-CPU recursive " Mathieu Desnoyers
2009-04-26 20:55 ` Eric Dumazet
2009-04-26 20:55 ` Eric Dumazet
2009-04-26 21:39 ` Mathieu Desnoyers
2009-04-21 18:34 ` [PATCH] netfilter: use per-cpu recursive lock (v11) Paul E. McKenney
2009-04-21 20:14 ` Linus Torvalds
2009-04-20 23:44 ` [PATCH] netfilter: use per-cpu recursive lock (v10) Paul E. McKenney
2009-04-16 0:02 ` [PATCH] netfilter: use per-cpu spinlock rather than RCU (v3) Linus Torvalds
2009-04-16 6:26 ` Eric Dumazet
2009-04-16 14:33 ` Paul E. McKenney
2009-04-15 3:23 ` David Miller
2009-04-14 17:19 ` [PATCH] netfilter: use per-cpu spinlock rather than RCU Stephen Hemminger
2009-04-11 15:50 ` iptables very slow after commit 784544739a25c30637397ace5489eeb6e15d7d49 Stephen Hemminger
2009-04-11 17:43 ` Paul E. McKenney
2009-04-11 18:57 ` Linus Torvalds
2009-04-12 0:34 ` Paul E. McKenney
2009-04-12 7:23 ` Evgeniy Polyakov
2009-04-12 16:06 ` Stephen Hemminger
2009-04-12 17:30 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=49F4A6E3.7080102@cosmosbay.com \
--to=dada1@cosmosbay.com \
--cc=benh@kernel.crashing.org \
--cc=davem@davemloft.net \
--cc=jarkao2@gmail.com \
--cc=jeff.chua.linux@gmail.com \
--cc=jengelh@medozas.de \
--cc=kaber@trash.net \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@polymtl.ca \
--cc=mingo@elte.hu \
--cc=netdev@vger.kernel.org \
--cc=netfilter-devel@vger.kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
--cc=paulus@samba.org \
--cc=r000n@r000n.net \
--cc=shemminger@vyatta.com \
--cc=torvalds@linux-foundation.org \
--cc=zbr@ioremap.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.