* [PATCH] fix iptables on systems with discontiguous processor ids
@ 2005-10-10 16:41 Harald Welte
2005-10-10 21:15 ` David S. Miller
2005-10-10 21:48 ` Sébastien Bernard
0 siblings, 2 replies; 40+ messages in thread
From: Harald Welte @ 2005-10-10 16:41 UTC (permalink / raw)
To: David Miller; +Cc: Netfilter Development Mailinglist
[-- Attachment #1: Type: text/plain, Size: 3804 bytes --]
Hi Dave!
This is my proposed patch for the problem you've described. Please test
and submit. If it works, I'll also prepare a patch for {arp,ip6}_tables.
[NETFILTER] ip_tables: fix per cpu handling
As David Miller points out, the "smp_processor_id()'s" are not always
contiguous. Esp. some boxes like Sun Ultra60 have CPU 0 and 2 installed in
one system, but no "1". Therefore the current logic of how iptables
manages the per cpu copies of the ruleset is broken. This patch is
supposed to fix it.
Signed-off-by: Harald Welte <laforge@netfilter.org>
---
commit e759eaa9e9e92330c5fcfd760d767d4f39375a03
tree 63b96f7df57f51dc7e284969c3a08b9264cc2c5f
parent 1ab8dccdbf16c09f8da124fc4c82024de24dfae2
author Harald Welte <laforge@netfilter.org> Mon, 10 Oct 2005 18:29:24 +0200
committer Harald Welte <laforge@netfilter.org> Mon, 10 Oct 2005 18:29:24 +0200
net/ipv4/netfilter/ip_tables.c | 26 ++++++++++++++++++++------
1 files changed, 20 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -27,6 +27,7 @@
#include <asm/semaphore.h>
#include <linux/proc_fs.h>
#include <linux/err.h>
+#include <linux/cpumask.h>
#include <linux/netfilter_ipv4/ip_tables.h>
@@ -124,6 +125,19 @@ static LIST_HEAD(ipt_tables);
#define up(x) do { printk("UP:%u:" #x "\n", __LINE__); up(x); } while(0)
#endif
+/* Find the highest possible smp_processor_id() */
+static unsigned int highest_processor_id(void)
+{
+ unsigned int cpu, highest = 0;
+
+ for_each_cpu_mask(cpu, cpu_possible_map) {
+ if (cpu > highest)
+ highest = cpu;
+ }
+
+ return highest;
+}
+
/* Returns whether matches rule or not. */
static inline int
ip_packet_match(const struct iphdr *ip,
@@ -921,8 +935,8 @@ translate_table(const char *name,
}
/* And one copy for every other CPU */
- for (i = 1; i < num_possible_cpus(); i++) {
- memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*i,
+ for_each_cpu_mask(i, cpu_possible_map) {
+ memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*(1+i),
newinfo->entries,
SMP_ALIGN(newinfo->size));
}
@@ -943,7 +957,7 @@ replace_table(struct ipt_table *table,
struct ipt_entry *table_base;
unsigned int i;
- for (i = 0; i < num_possible_cpus(); i++) {
+ for_each_cpu_mask(i, cpu_possible_map) {
table_base =
(void *)newinfo->entries
+ TABLE_OFFSET(newinfo, i);
@@ -990,7 +1004,7 @@ get_counters(const struct ipt_table_info
unsigned int cpu;
unsigned int i;
- for (cpu = 0; cpu < num_possible_cpus(); cpu++) {
+ for_each_cpu_mask(cpu, cpu_possible_map) {
i = 0;
IPT_ENTRY_ITERATE(t->entries + TABLE_OFFSET(t, cpu),
t->size,
@@ -1128,7 +1142,7 @@ do_replace(void __user *user, unsigned i
return -ENOMEM;
newinfo = vmalloc(sizeof(struct ipt_table_info)
- + SMP_ALIGN(tmp.size) * num_possible_cpus());
+ + SMP_ALIGN(tmp.size) * (highest_processor_id()+1));
if (!newinfo)
return -ENOMEM;
@@ -1458,7 +1472,7 @@ int ipt_register_table(struct ipt_table
= { 0, 0, 0, { 0 }, { 0 }, { } };
newinfo = vmalloc(sizeof(struct ipt_table_info)
- + SMP_ALIGN(repl->size) * num_possible_cpus());
+ + SMP_ALIGN(repl->size) * (highest_processor_id()+1));
if (!newinfo)
return -ENOMEM;
--
- Harald Welte <laforge@netfilter.org> http://netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going
on while IP was being designed." -- Paul Vixie
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-10 16:41 [PATCH] fix iptables on systems with discontiguous processor ids Harald Welte @ 2005-10-10 21:15 ` David S. Miller 2005-10-10 21:41 ` Patrick McHardy ` (2 more replies) 2005-10-10 21:48 ` Sébastien Bernard 1 sibling, 3 replies; 40+ messages in thread From: David S. Miller @ 2005-10-10 21:15 UTC (permalink / raw) To: laforge; +Cc: netfilter-devel From: Harald Welte <laforge@netfilter.org> Date: Mon, 10 Oct 2005 18:41:41 +0200 > This is my proposed patch for the problem you've described. Please test > and submit. If it works, I'll also prepare a patch for {arp,ip6}_tables. Ebtables needs it too. "git grep num_possible_cpus" shows all of the users in the whole tree, %80 of which are the netfilter cases we're fixing here :-) Why don't any of the existing interfaces on cpumasks and numbers provide what you need here? Perhaps this routine you are adding (highest_processor_id()) belongs in linux/cpumask.h? Perhaps named something like "highest_possible_processor_id()" to be consistent with the "num_possible_cpus()" naming? Otherwise the patch looks fine, thanks for following up on this Harald. I'll push the rest of the netfilter fixes once you cook up the final patch for this, thanks. Thanks. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-10 21:15 ` David S. Miller @ 2005-10-10 21:41 ` Patrick McHardy 2005-10-10 21:52 ` Patrick McHardy 2005-10-10 22:04 ` David S. Miller 2005-10-10 21:46 ` Harald Welte 2005-10-11 10:44 ` Harald Welte 2 siblings, 2 replies; 40+ messages in thread From: Patrick McHardy @ 2005-10-10 21:41 UTC (permalink / raw) To: David S. Miller; +Cc: laforge, netfilter-devel David S. Miller wrote: > From: Harald Welte <laforge@netfilter.org> > Date: Mon, 10 Oct 2005 18:41:41 +0200 > > Why don't any of the existing interfaces on cpumasks and numbers > provide what you need here? Perhaps this routine you are adding > (highest_processor_id()) belongs in linux/cpumask.h? Perhaps > named something like "highest_possible_processor_id()" to be > consistent with the "num_possible_cpus()" naming? Even better would be a function returning the CPU ID as "physical" ID, skipping holes in the space. This would allow to save the memory for unused CPU IDs. Maybe a small table mapping "virtual" to "phyiscal" IDs? ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-10 21:41 ` Patrick McHardy @ 2005-10-10 21:52 ` Patrick McHardy 2005-10-10 22:06 ` David S. Miller 2005-10-10 22:04 ` David S. Miller 1 sibling, 1 reply; 40+ messages in thread From: Patrick McHardy @ 2005-10-10 21:52 UTC (permalink / raw) To: David S. Miller; +Cc: laforge, netfilter-devel Patrick McHardy wrote: > David S. Miller wrote: > >>From: Harald Welte <laforge@netfilter.org> >>Date: Mon, 10 Oct 2005 18:41:41 +0200 >> >>Why don't any of the existing interfaces on cpumasks and numbers >>provide what you need here? Perhaps this routine you are adding >>(highest_processor_id()) belongs in linux/cpumask.h? Perhaps >>named something like "highest_possible_processor_id()" to be >>consistent with the "num_possible_cpus()" naming? > > > Even better would be a function returning the CPU ID as "physical" > ID, skipping holes in the space. This would allow to save the memory > for unused CPU IDs. Maybe a small table mapping "virtual" to "phyiscal" > IDs? I don't know which other architectures are affected by this, but for the case Dave mentioned some function static inline int cpu_physical_id(void) { return smp_processor_id() >> 1; } should also solve the problem. Even if not as easily implementable as this. it looks like a logical complement to num_possible_cpus(). ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-10 21:52 ` Patrick McHardy @ 2005-10-10 22:06 ` David S. Miller 2005-10-11 14:23 ` Harald Welte 0 siblings, 1 reply; 40+ messages in thread From: David S. Miller @ 2005-10-10 22:06 UTC (permalink / raw) To: kaber; +Cc: laforge, netfilter-devel From: Patrick McHardy <kaber@trash.net> Date: Mon, 10 Oct 2005 23:52:11 +0200 > I don't know which other architectures are affected by this, but for > the case Dave mentioned some function > > static inline int cpu_physical_id(void) > { > return smp_processor_id() >> 1; > } > > should also solve the problem. Even if not as easily implementable > as this. it looks like a logical complement to num_possible_cpus(). Stop papering around this issue, please. :-) The cpu ID space is discontiguous, period. If you need to allocate per-cpu data there are very straightforward ways to do this. Many other subsystems do this in an error free manner and I think netfilter iptables can do it too. :) ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-10 22:06 ` David S. Miller @ 2005-10-11 14:23 ` Harald Welte 2005-10-11 19:39 ` David S. Miller 0 siblings, 1 reply; 40+ messages in thread From: Harald Welte @ 2005-10-11 14:23 UTC (permalink / raw) To: David S. Miller; +Cc: netfilter-devel, kaber [-- Attachment #1: Type: text/plain, Size: 1713 bytes --] On Mon, Oct 10, 2005 at 03:06:23PM -0700, David S. Miller wrote: > If you need to allocate per-cpu data there are very straightforward > ways to do this. Many other subsystems do this in an error free > manner and I think netfilter iptables can do it too. :) Yes, it _could_ if the design was not originally making the assumption that the cpuid space is contiuous. I think it could still be solved (without running into any userspace compatibility issues), although not in a very straightforward way :( iptables very much predates any per_cpu code... given that discontiguous processor id's seem to be very rare, I think it's fine to waste some memory on those few systems by ussing this "allocate array from 0 to max smp processor id" approach. Any kind of mapping, as David pointed out, is probably difficult and prone to cause races, etc. For a better solution, I was thinking about having an array ouf pointers for entry points (e.g. instead of using TABLE_OFFSET() to calculate an offset, dereference the table address from that pointer array. The array is indexed by smp_processor_id(), and at allocation time we only allocate memory (and fill in pointers) for those smp_processor_id()'s that actually exist. This will require some more intrusive changes, which I'm not inclined to add for 2.6.14. -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-11 14:23 ` Harald Welte @ 2005-10-11 19:39 ` David S. Miller 2005-10-12 6:36 ` Harald Welte 0 siblings, 1 reply; 40+ messages in thread From: David S. Miller @ 2005-10-11 19:39 UTC (permalink / raw) To: laforge; +Cc: netfilter-devel, kaber From: Harald Welte <laforge@netfilter.org> Date: Tue, 11 Oct 2005 16:23:04 +0200 > given that discontiguous processor id's seem to be very rare, I think > it's fine to waste some memory on those few systems by ussing this > "allocate array from 0 to max smp processor id" approach. Let's get your patch working first :-) The original reporter said that your patch still OOPSes when he tries to start using iptables. Didn't you see that? On thinking about this some more, the duplication of _all_ of this information per-cpu is quite questionable, at least the "read mostly" parts that just describe the rules. The counters make tons of sense, per cpu, but that's the majority of it. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-11 19:39 ` David S. Miller @ 2005-10-12 6:36 ` Harald Welte 0 siblings, 0 replies; 40+ messages in thread From: Harald Welte @ 2005-10-12 6:36 UTC (permalink / raw) To: David S. Miller; +Cc: netfilter-devel, kaber [-- Attachment #1: Type: text/plain, Size: 1950 bytes --] On Tue, Oct 11, 2005 at 12:39:20PM -0700, David S. Miller wrote: > From: Harald Welte <laforge@netfilter.org> > Date: Tue, 11 Oct 2005 16:23:04 +0200 > > > given that discontiguous processor id's seem to be very rare, I think > > it's fine to waste some memory on those few systems by ussing this > > "allocate array from 0 to max smp processor id" approach. > > Let's get your patch working first :-) The original reporter > said that your patch still OOPSes when he tries to start > using iptables. Didn't you see that? Yes, I did. Please note my emails are heavily delayed, since I'm almost every day travelling (either plane or train). > On thinking about this some more, the duplication of _all_ of this > information per-cpu is quite questionable, at least the "read mostly" > parts that just describe the rules. The counters make tons of sense, > per cpu, but that's the majority of it. Yes, it makes no sense. I never questioned that. I really dislike a lot of these strange things in ip_tables. Unfortunately a change of something fundamental like this will require lots of code auditing (basically all match/target extensions). At the moment a match/target can modify it's cpu-local matchinfo, and maybe it's not suposed to change global state. Also, anything that writes to target/matchinfo would then require additional (write)locking. It definitely is a design mistake, I think Rusty admitted to that even someyears ago. But now we have to live with the legacy, and now we need a quick fix, not a complete redesign :) -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-10 21:41 ` Patrick McHardy 2005-10-10 21:52 ` Patrick McHardy @ 2005-10-10 22:04 ` David S. Miller 1 sibling, 0 replies; 40+ messages in thread From: David S. Miller @ 2005-10-10 22:04 UTC (permalink / raw) To: kaber; +Cc: laforge, netfilter-devel From: Patrick McHardy <kaber@trash.net> Date: Mon, 10 Oct 2005 23:41:43 +0200 > Even better would be a function returning the CPU ID as "physical" > ID, skipping holes in the space. This would allow to save the memory > for unused CPU IDs. Maybe a small table mapping "virtual" to "phyiscal" > IDs? We used to have this and it was really ugly and error prone. I like the discontiguous cpu ID space much better. I was so happy when I could blow away those virtual<-->physical cpu ID mappings. People got it wrong all the time and it makes hotplug cpu support more difficult. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-10 21:15 ` David S. Miller 2005-10-10 21:41 ` Patrick McHardy @ 2005-10-10 21:46 ` Harald Welte 2005-10-11 13:54 ` Henrik Nordstrom 2005-10-11 10:44 ` Harald Welte 2 siblings, 1 reply; 40+ messages in thread From: Harald Welte @ 2005-10-10 21:46 UTC (permalink / raw) To: David S. Miller; +Cc: netfilter-devel [-- Attachment #1: Type: text/plain, Size: 2482 bytes --] On Mon, Oct 10, 2005 at 02:15:18PM -0700, David S. Miller wrote: > From: Harald Welte <laforge@netfilter.org> > Date: Mon, 10 Oct 2005 18:41:41 +0200 > > > This is my proposed patch for the problem you've described. Please test > > and submit. If it works, I'll also prepare a patch for {arp,ip6}_tables. > > Ebtables needs it too. "git grep num_possible_cpus" shows all > of the users in the whole tree, %80 of which are the netfilter > cases we're fixing here :-) > > Why don't any of the existing interfaces on cpumasks and numbers > provide what you need here? I don't know why they are incomplete ;) My guess is that they're mostly targeted to per_cpu users. Basically to solve the 'iptables' problem, there are two ways: 1) allocate one copy of the ruleset per cpu that actually exists, independent of their smp_processor_id(). This means that for every packet traversing the ruleset, we need to resolve the logical "cpu number" from the physical smp_processor_id(). Since there is no apparent mapping between them, we'd need to iterate over the cpu_possible bitmask and find "how many bits are set between 0 and smp_processor_id()". I'm not sure how expensive such calculations are, but I'd rather not additional code to the per-packet path. Advantage of this solution is that there is only one copy of the ruleset per physical cpu. 2) allocate one copy of the ruleset for every "possible smp_processor_id()", which is what the patch that I sent in my last email implements. > Perhaps this routine you are adding (highest_processor_id()) belongs > in linux/cpumask.h? Perhaps named something like > "highest_possible_processor_id()" to be consistent with the > "num_possible_cpus()" naming? I thought it was too specific for the generic code, but since we'll need it from all the foobar_tables, it's probably best in cpumask.h > Otherwise the patch looks fine, thanks for following up on this > Harald. I'll push the rest of the netfilter fixes once you cook > up the final patch for this, thanks. ok. -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-10 21:46 ` Harald Welte @ 2005-10-11 13:54 ` Henrik Nordstrom 2005-10-11 14:13 ` Eric Dumazet 0 siblings, 1 reply; 40+ messages in thread From: Henrik Nordstrom @ 2005-10-11 13:54 UTC (permalink / raw) To: Harald Welte; +Cc: netfilter-devel, David S. Miller On Mon, 10 Oct 2005, Harald Welte wrote: > I don't know why they are incomplete ;) My guess is that they're mostly > targeted to per_cpu users. Basically to solve the 'iptables' problem, > there are two ways: Or three. 3) Keep an NUM_CPUS array of pointers to the per-cpu tables, allocated separately per CPU instead of allocating a single large blob for all CPUs. These pointers are updated using RCU, and indexed using the physical CPU id. Cost: One (most likely cached) indirect memory access on each packet, and slightly more complex update procedure (num_possible_cpus allocations and pointers to update instead of just one). Regards Henrik ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-11 13:54 ` Henrik Nordstrom @ 2005-10-11 14:13 ` Eric Dumazet 2005-10-11 17:45 ` Harald Welte 0 siblings, 1 reply; 40+ messages in thread From: Eric Dumazet @ 2005-10-11 14:13 UTC (permalink / raw) To: Henrik Nordstrom; +Cc: Harald Welte, netfilter-devel, David S. Miller Henrik Nordstrom a écrit : > On Mon, 10 Oct 2005, Harald Welte wrote: > >> I don't know why they are incomplete ;) My guess is that they're mostly >> targeted to per_cpu users. Basically to solve the 'iptables' problem, >> there are two ways: > > > Or three. > > 3) Keep an NUM_CPUS array of pointers to the per-cpu tables, allocated > separately per CPU instead of allocating a single large blob for all > CPUs. These pointers are updated using RCU, and indexed using the > physical CPU id. > Yes, this is what I suggested some weeks ago with a patch. As vmalloc_node() is not yet part of kernel, I even coded a substitute for it. (per_cpu/kmalloc are not an option since the allocations done in iptables can be very large) http://marc.theaimsgroup.com/?l=linux-netdev&m=112733887410796&w=2 > Cost: One (most likely cached) indirect memory access on each packet, > and slightly more complex update procedure (num_possible_cpus > allocations and pointers to update instead of just one). > > Regards > Henrik > > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-11 14:13 ` Eric Dumazet @ 2005-10-11 17:45 ` Harald Welte 2005-10-11 22:57 ` David S. Miller 0 siblings, 1 reply; 40+ messages in thread From: Harald Welte @ 2005-10-11 17:45 UTC (permalink / raw) To: Eric Dumazet; +Cc: netfilter-devel, David S. Miller, Henrik Nordstrom [-- Attachment #1: Type: text/plain, Size: 1459 bytes --] On Tue, Oct 11, 2005 at 04:13:03PM +0200, Eric Dumazet wrote: > Henrik Nordstrom a écrit : > >On Mon, 10 Oct 2005, Harald Welte wrote: > >>I don't know why they are incomplete ;) My guess is that they're mostly > >>targeted to per_cpu users. Basically to solve the 'iptables' problem, > >>there are two ways: > >Or three. > > 3) Keep an NUM_CPUS array of pointers to the per-cpu tables, allocated separately per > >CPU instead of allocating a single large blob for all CPUs. These pointers are updated > >using RCU, and indexed using the physical CPU id. > > Yes, this is what I suggested some weeks ago with a patch. > > As vmalloc_node() is not yet part of kernel, I even coded a substitute for it. > (per_cpu/kmalloc are not an option since the allocations done in iptables can be very large) Eric, I still have your patches on hold and hope that vmalloc_node will eventually appear. I don't really like the idea of adding code to iptables that has to go knee-deep into the vm code.... So once vmalloc_node() is there, I'm open to merging your changes. -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-11 17:45 ` Harald Welte @ 2005-10-11 22:57 ` David S. Miller 2005-10-12 7:00 ` Harald Welte 2005-10-13 22:01 ` Eric Dumazet 0 siblings, 2 replies; 40+ messages in thread From: David S. Miller @ 2005-10-11 22:57 UTC (permalink / raw) To: laforge; +Cc: netfilter-devel, hno, dada1 From: Harald Welte <laforge@netfilter.org> Date: Tue, 11 Oct 2005 19:45:56 +0200 > Eric, I still have your patches on hold and hope that vmalloc_node will > eventually appear. I don't really like the idea of adding code to > iptables that has to go knee-deep into the vm code.... So once > vmalloc_node() is there, I'm open to merging your changes. After thinking about this a bit, what I'm trying to do now is champion getting the vmalloc_node() patch into 2.6.14 so we can use Eric's approach to fix this bug cleanly. Attached is the current version of the vmalloc_node() patch. Eric, can you send me an updated copy of your patch, such that it uses this vmalloc_node() instead of the mempolicy hacks it had previously? If this fails, I'm afraid we might need to go with the NR_CPUS option for now as much as I'd really really hate to do that. Thanks. From: Christoph Lameter <clameter@engr.sgi.com> This patch adds vmalloc_node(size, node) -> Allocate necessary memory on the specified node and get_vm_area_node(size, flags, node) and the other functions that it depends on. Signed-off-by: Andrew Morton <akpm@osdl.org> --- include/linux/vmalloc.h | 8 ++++- mm/vmalloc.c | 73 +++++++++++++++++++++++++++++++++++++----------- 2 files changed, 64 insertions(+), 17 deletions(-) diff -puN include/linux/vmalloc.h~vmalloc_node include/linux/vmalloc.h --- 25/include/linux/vmalloc.h~vmalloc_node Tue Oct 11 14:16:27 2005 +++ 25-akpm/include/linux/vmalloc.h Tue Oct 11 14:19:30 2005 @@ -32,10 +32,14 @@ struct vm_struct { * Highlevel APIs for driver use */ extern void *vmalloc(unsigned long size); +extern void *vmalloc_node(unsigned long size, int node); extern void *vmalloc_exec(unsigned long size); extern void *vmalloc_32(unsigned long size); extern void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot); -extern void *__vmalloc_area(struct vm_struct *area, gfp_t gfp_mask, pgprot_t prot); +extern void *__vmalloc_area(struct vm_struct *area, gfp_t gfp_mask, + pgprot_t prot); +extern void *__vmalloc_node(unsigned long size, gfp_t gfp_mask, + pgprot_t prot, int node); extern void vfree(void *addr); extern void *vmap(struct page **pages, unsigned int count, @@ -48,6 +52,8 @@ extern void vunmap(void *addr); extern struct vm_struct *get_vm_area(unsigned long size, unsigned long flags); extern struct vm_struct *__get_vm_area(unsigned long size, unsigned long flags, unsigned long start, unsigned long end); +extern struct vm_struct *get_vm_area_node(unsigned long size, + unsigned long flags, int node); extern struct vm_struct *remove_vm_area(void *addr); extern struct vm_struct *__remove_vm_area(void *addr); extern int map_vm_area(struct vm_struct *area, pgprot_t prot, diff -puN mm/vmalloc.c~vmalloc_node mm/vmalloc.c --- 25/mm/vmalloc.c~vmalloc_node Tue Oct 11 14:16:27 2005 +++ 25-akpm/mm/vmalloc.c Tue Oct 11 14:22:42 2005 @@ -5,6 +5,7 @@ * Support of BIGMEM added by Gerhard Wichert, Siemens AG, July 1999 * SMP-safe vmalloc/vfree/ioremap, Tigran Aivazian <tigran@veritas.com>, May 2000 * Major rework to support vmap/vunmap, Christoph Hellwig, SGI, August 2002 + * Numa awareness, Christoph Lameter, SGI, June 2005 */ #include <linux/mm.h> @@ -158,8 +159,8 @@ int map_vm_area(struct vm_struct *area, return err; } -struct vm_struct *__get_vm_area(unsigned long size, unsigned long flags, - unsigned long start, unsigned long end) +struct vm_struct *__get_vm_area_node(unsigned long size, unsigned long flags, + unsigned long start, unsigned long end, int node) { struct vm_struct **p, *tmp, *area; unsigned long align = 1; @@ -178,7 +179,7 @@ struct vm_struct *__get_vm_area(unsigned addr = ALIGN(start, align); size = PAGE_ALIGN(size); - area = kmalloc(sizeof(*area), GFP_KERNEL); + area = kmalloc_node(sizeof(*area), GFP_KERNEL, node); if (unlikely(!area)) return NULL; @@ -231,6 +232,12 @@ out: return NULL; } +struct vm_struct *__get_vm_area(unsigned long size, unsigned long flags, + unsigned long start, unsigned long end) +{ + return __get_vm_area_node(size, flags, start, end, -1); +} + /** * get_vm_area - reserve a contingous kernel virtual area * @@ -246,6 +253,11 @@ struct vm_struct *get_vm_area(unsigned l return __get_vm_area(size, flags, VMALLOC_START, VMALLOC_END); } +struct vm_struct *get_vm_area_node(unsigned long size, unsigned long flags, int node) +{ + return __get_vm_area_node(size, flags, VMALLOC_START, VMALLOC_END, node); +} + /* Caller must hold vmlist_lock */ struct vm_struct *__remove_vm_area(void *addr) { @@ -342,7 +354,6 @@ void vfree(void *addr) BUG_ON(in_interrupt()); __vunmap(addr, 1); } - EXPORT_SYMBOL(vfree); /** @@ -360,7 +371,6 @@ void vunmap(void *addr) BUG_ON(in_interrupt()); __vunmap(addr, 0); } - EXPORT_SYMBOL(vunmap); /** @@ -392,10 +402,10 @@ void *vmap(struct page **pages, unsigned return area->addr; } - EXPORT_SYMBOL(vmap); -void *__vmalloc_area(struct vm_struct *area, gfp_t gfp_mask, pgprot_t prot) +void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, + pgprot_t prot, int node) { struct page **pages; unsigned int nr_pages, array_size, i; @@ -406,9 +416,9 @@ void *__vmalloc_area(struct vm_struct *a area->nr_pages = nr_pages; /* Please note that the recursion is strictly bounded. */ if (array_size > PAGE_SIZE) - pages = __vmalloc(array_size, gfp_mask, PAGE_KERNEL); + pages = __vmalloc_node(array_size, gfp_mask, PAGE_KERNEL, node); else - pages = kmalloc(array_size, (gfp_mask & ~__GFP_HIGHMEM)); + pages = kmalloc_node(array_size, (gfp_mask & ~__GFP_HIGHMEM), node); area->pages = pages; if (!area->pages) { remove_vm_area(area->addr); @@ -418,7 +428,10 @@ void *__vmalloc_area(struct vm_struct *a memset(area->pages, 0, array_size); for (i = 0; i < area->nr_pages; i++) { - area->pages[i] = alloc_page(gfp_mask); + if (node < 0) + area->pages[i] = alloc_page(gfp_mask); + else + area->pages[i] = alloc_pages_node(node, gfp_mask, 0); if (unlikely(!area->pages[i])) { /* Successfully allocated i pages, free them in __vunmap() */ area->nr_pages = i; @@ -435,18 +448,25 @@ fail: return NULL; } +void *__vmalloc_area(struct vm_struct *area, gfp_t gfp_mask, pgprot_t prot) +{ + return __vmalloc_area_node(area, gfp_mask, prot, -1); +} + /** - * __vmalloc - allocate virtually contiguous memory + * __vmalloc_node - allocate virtually contiguous memory * * @size: allocation size * @gfp_mask: flags for the page level allocator * @prot: protection mask for the allocated pages + * @node node to use for allocation or -1 * * Allocate enough pages to cover @size from the page level * allocator with @gfp_mask flags. Map them into contiguous * kernel virtual space, using a pagetable protection of @prot. */ -void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot) +void *__vmalloc_node(unsigned long size, gfp_t gfp_mask, pgprot_t prot, + int node) { struct vm_struct *area; @@ -454,13 +474,18 @@ void *__vmalloc(unsigned long size, gfp_ if (!size || (size >> PAGE_SHIFT) > num_physpages) return NULL; - area = get_vm_area(size, VM_ALLOC); + area = get_vm_area_node(size, VM_ALLOC, node); if (!area) return NULL; - return __vmalloc_area(area, gfp_mask, prot); + return __vmalloc_area_node(area, gfp_mask, prot, node); } +EXPORT_SYMBOL(__vmalloc_node); +void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot) +{ + return __vmalloc_node(size, gfp_mask, prot, -1); +} EXPORT_SYMBOL(__vmalloc); /** @@ -478,9 +503,26 @@ void *vmalloc(unsigned long size) { return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL); } - EXPORT_SYMBOL(vmalloc); +/** + * vmalloc_node - allocate memory on a specific node + * + * @size: allocation size + * @node; numa node + * + * Allocate enough pages to cover @size from the page level + * allocator and map them into contiguous kernel virtual space. + * + * For tight cotrol over page level allocator and protection flags + * use __vmalloc() instead. + */ +void *vmalloc_node(unsigned long size, int node) +{ + return __vmalloc_node(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL, node); +} +EXPORT_SYMBOL(vmalloc_node); + #ifndef PAGE_KERNEL_EXEC # define PAGE_KERNEL_EXEC PAGE_KERNEL #endif @@ -515,7 +557,6 @@ void *vmalloc_32(unsigned long size) { return __vmalloc(size, GFP_KERNEL, PAGE_KERNEL); } - EXPORT_SYMBOL(vmalloc_32); long vread(char *buf, char *addr, unsigned long count) ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-11 22:57 ` David S. Miller @ 2005-10-12 7:00 ` Harald Welte 2005-10-13 22:01 ` Eric Dumazet 1 sibling, 0 replies; 40+ messages in thread From: Harald Welte @ 2005-10-12 7:00 UTC (permalink / raw) To: David S. Miller; +Cc: netfilter-devel, hno, dada1 [-- Attachment #1: Type: text/plain, Size: 1266 bytes --] On Tue, Oct 11, 2005 at 03:57:33PM -0700, David S. Miller wrote: > From: Harald Welte <laforge@netfilter.org> > Date: Tue, 11 Oct 2005 19:45:56 +0200 > > > Eric, I still have your patches on hold and hope that vmalloc_node will > > eventually appear. I don't really like the idea of adding code to > > iptables that has to go knee-deep into the vm code.... So once > > vmalloc_node() is there, I'm open to merging your changes. > > After thinking about this a bit, what I'm trying to do now is > champion getting the vmalloc_node() patch into 2.6.14 so we > can use Eric's approach to fix this bug cleanly. Ok, seems fine withe me. > If this fails, I'm afraid we might need to go with the NR_CPUS > option for now as much as I'd really really hate to do that. Also fine with me, we should probably document it somewhere for 2.6.14 and get it finally fixed in 2.6.15 -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-11 22:57 ` David S. Miller 2005-10-12 7:00 ` Harald Welte @ 2005-10-13 22:01 ` Eric Dumazet 2005-10-17 4:18 ` David S. Miller 1 sibling, 1 reply; 40+ messages in thread From: Eric Dumazet @ 2005-10-13 22:01 UTC (permalink / raw) To: David S. Miller; +Cc: laforge, netfilter-devel, hno [-- Attachment #1: Type: text/plain, Size: 2207 bytes --] David S. Miller a écrit : > From: Harald Welte <laforge@netfilter.org> > Date: Tue, 11 Oct 2005 19:45:56 +0200 > > >>Eric, I still have your patches on hold and hope that vmalloc_node will >>eventually appear. I don't really like the idea of adding code to >>iptables that has to go knee-deep into the vm code.... So once >>vmalloc_node() is there, I'm open to merging your changes. > > > After thinking about this a bit, what I'm trying to do now is > champion getting the vmalloc_node() patch into 2.6.14 so we > can use Eric's approach to fix this bug cleanly. > > Attached is the current version of the vmalloc_node() patch. > Eric, can you send me an updated copy of your patch, such > that it uses this vmalloc_node() instead of the mempolicy > hacks it had previously? > > If this fails, I'm afraid we might need to go with the NR_CPUS > option for now as much as I'd really really hate to do that. > > Thanks. Hi David, sorry for being late, I was traveling these last two days. Here is the patch, depending on the fact that a vmalloc_node() already exists. (If not, just replace vmalloc_node() call by vmalloc()) [PATCH] iptables : SMP/NUMA allocation, and elimination of a problem with discontiguous processor ids. Part of the performance problem we have with netfilter is memory allocation is not NUMA aware, but 'only' SMP aware (ie each CPU normally touch separate cache lines) Even with small iptables rules, the cost of this misplacement can be high on common workloads. Instead of using one vmalloc() area (located in the node of the iptables process), we now allocate an area for each possible CPU, using vmalloc_node() so that memory should be allocated in the CPU's node if possible. If the size of ipt_table is small enough (less than one page), we use kmalloc_node() instead of vmalloc_node(), to use less memory and less TLB entries) for small setups. Please note that this patch doesnt increase the number of allocated bytes, only the location of allocated zones. For machines where processors id may be discontiguous, only needed memory is allocated. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> [-- Attachment #2: patch_iptables.4 --] [-- Type: text/plain, Size: 12553 bytes --] --- linux-2.6.14-rc4/net/ipv4/netfilter/ip_tables.c 2005-10-11 03:19:19.000000000 +0200 +++ linux-2.6.14-rc4-ed/net/ipv4/netfilter/ip_tables.c 2005-10-14 01:52:14.000000000 +0200 @@ -82,11 +82,6 @@ context stops packets coming through and allows user context to read the counters or update the rules. - To be cache friendly on SMP, we arrange them like so: - [ n-entries ] - ... cache-align padding ... - [ n-entries ] - Hence the start of any table is given by get_table() below. */ /* The table itself */ @@ -104,20 +99,15 @@ unsigned int underflow[NF_IP_NUMHOOKS]; /* ipt_entry tables: one per CPU */ - char entries[0] ____cacheline_aligned; + void *entries[NR_CPUS]; }; static LIST_HEAD(ipt_target); static LIST_HEAD(ipt_match); static LIST_HEAD(ipt_tables); +#define SET_COUNTER(c,b,p) do { (c).bcnt = (b); (c).pcnt = (p); } while(0) #define ADD_COUNTER(c,b,p) do { (c).bcnt += (b); (c).pcnt += (p); } while(0) -#ifdef CONFIG_SMP -#define TABLE_OFFSET(t,p) (SMP_ALIGN((t)->size)*(p)) -#else -#define TABLE_OFFSET(t,p) 0 -#endif - #if 0 #define down(x) do { printk("DOWN:%u:" #x "\n", __LINE__); down(x); } while(0) #define down_interruptible(x) ({ int __r; printk("DOWNi:%u:" #x "\n", __LINE__); __r = down_interruptible(x); if (__r != 0) printk("ABORT-DOWNi:%u\n", __LINE__); __r; }) @@ -289,8 +279,7 @@ read_lock_bh(&table->lock); IP_NF_ASSERT(table->valid_hooks & (1 << hook)); - table_base = (void *)table->private->entries - + TABLE_OFFSET(table->private, smp_processor_id()); + table_base = (void *)table->private->entries[smp_processor_id()]; e = get_entry(table_base, table->private->hook_entry[hook]); #ifdef CONFIG_NETFILTER_DEBUG @@ -562,7 +551,8 @@ /* Figures out from what hook each rule can be called: returns 0 if there are loops. Puts hook bitmask in comefrom. */ static int -mark_source_chains(struct ipt_table_info *newinfo, unsigned int valid_hooks) +mark_source_chains(struct ipt_table_info *newinfo, + unsigned int valid_hooks, void *entry0) { unsigned int hook; @@ -571,7 +561,7 @@ for (hook = 0; hook < NF_IP_NUMHOOKS; hook++) { unsigned int pos = newinfo->hook_entry[hook]; struct ipt_entry *e - = (struct ipt_entry *)(newinfo->entries + pos); + = (struct ipt_entry *)(entry0 + pos); if (!(valid_hooks & (1 << hook))) continue; @@ -621,13 +611,13 @@ goto next; e = (struct ipt_entry *) - (newinfo->entries + pos); + (entry0 + pos); } while (oldpos == pos + e->next_offset); /* Move along one */ size = e->next_offset; e = (struct ipt_entry *) - (newinfo->entries + pos + size); + (entry0 + pos + size); e->counters.pcnt = pos; pos += size; } else { @@ -644,7 +634,7 @@ newpos = pos + e->next_offset; } e = (struct ipt_entry *) - (newinfo->entries + newpos); + (entry0 + newpos); e->counters.pcnt = pos; pos = newpos; } @@ -854,6 +844,7 @@ translate_table(const char *name, unsigned int valid_hooks, struct ipt_table_info *newinfo, + void *entry0, unsigned int size, unsigned int number, const unsigned int *hook_entries, @@ -874,11 +865,11 @@ duprintf("translate_table: size %u\n", newinfo->size); i = 0; /* Walk through entries, checking offsets. */ - ret = IPT_ENTRY_ITERATE(newinfo->entries, newinfo->size, + ret = IPT_ENTRY_ITERATE(entry0, newinfo->size, check_entry_size_and_hooks, newinfo, - newinfo->entries, - newinfo->entries + size, + entry0, + entry0 + size, hook_entries, underflows, &i); if (ret != 0) return ret; @@ -906,25 +897,24 @@ } } - if (!mark_source_chains(newinfo, valid_hooks)) + if (!mark_source_chains(newinfo, valid_hooks, entry0)) return -ELOOP; /* Finally, each sanity check must pass */ i = 0; - ret = IPT_ENTRY_ITERATE(newinfo->entries, newinfo->size, + ret = IPT_ENTRY_ITERATE(entry0, newinfo->size, check_entry, name, size, &i); if (ret != 0) { - IPT_ENTRY_ITERATE(newinfo->entries, newinfo->size, + IPT_ENTRY_ITERATE(entry0, newinfo->size, cleanup_entry, &i); return ret; } /* And one copy for every other CPU */ - for (i = 1; i < num_possible_cpus(); i++) { - memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*i, - newinfo->entries, - SMP_ALIGN(newinfo->size)); + for_each_cpu(i) { + if (newinfo->entries[i] && newinfo->entries[i] != entry0) + memcpy(newinfo->entries[i], entry0, newinfo->size); } return ret; @@ -940,15 +930,12 @@ #ifdef CONFIG_NETFILTER_DEBUG { - struct ipt_entry *table_base; - unsigned int i; + int cpu; - for (i = 0; i < num_possible_cpus(); i++) { - table_base = - (void *)newinfo->entries - + TABLE_OFFSET(newinfo, i); - - table_base->comefrom = 0xdead57ac; + for_each_cpu(cpu) { + struct ipt_entry *table_base = newinfo->entries[cpu]; + if (table_base) + table_base->comefrom = 0xdead57ac; } } #endif @@ -983,21 +970,51 @@ return 0; } +static inline int +set_entry_to_counter(const struct ipt_entry *e, + struct ipt_counters total[], + unsigned int *i) +{ + SET_COUNTER(total[*i], e->counters.bcnt, e->counters.pcnt); + + (*i)++; + return 0; +} + static void get_counters(const struct ipt_table_info *t, struct ipt_counters counters[]) { unsigned int cpu; unsigned int i; + unsigned int curcpu = get_cpu(); + + /* + * Instead of clearing (by a previous call to memset()) + * the counters and using adds, we set the counters + * with data used by our CPU + */ + i = 0; + IPT_ENTRY_ITERATE(t->entries[curcpu], + t->size, + set_entry_to_counter, + counters, + &i); - for (cpu = 0; cpu < num_possible_cpus(); cpu++) { + /* + * Then for every other CPUS, we add their counters + */ + for_each_cpu(cpu) { + if (cpu == curcpu) + continue; i = 0; - IPT_ENTRY_ITERATE(t->entries + TABLE_OFFSET(t, cpu), + IPT_ENTRY_ITERATE(t->entries[cpu], t->size, add_entry_to_counter, counters, &i); } + put_cpu(); } static int @@ -1009,6 +1026,7 @@ struct ipt_entry *e; struct ipt_counters *counters; int ret = 0; + void *loc_cpu_entry; /* We need atomic snapshot of counters: rest doesn't change (other than comefrom, which userspace doesn't care @@ -1020,13 +1038,16 @@ return -ENOMEM; /* First, sum counters... */ - memset(counters, 0, countersize); write_lock_bh(&table->lock); get_counters(table->private, counters); write_unlock_bh(&table->lock); - /* ... then copy entire thing from CPU 0... */ - if (copy_to_user(userptr, table->private->entries, total_size) != 0) { + /* + * choose the copy that is on our node/cpu, + */ + loc_cpu_entry = table->private->entries[get_cpu()]; + /* ... then copy entire thing ... */ + if (copy_to_user(userptr, loc_cpu_entry, total_size) != 0) { ret = -EFAULT; goto free_counters; } @@ -1038,7 +1059,7 @@ struct ipt_entry_match *m; struct ipt_entry_target *t; - e = (struct ipt_entry *)(table->private->entries + off); + e = (struct ipt_entry *)(loc_cpu_entry + off); if (copy_to_user(userptr + off + offsetof(struct ipt_entry, counters), &counters[num], @@ -1075,6 +1096,7 @@ } free_counters: + put_cpu(); vfree(counters); return ret; } @@ -1107,6 +1129,42 @@ return ret; } +static void free_table_info(struct ipt_table_info *info) +{ + int cpu; + for_each_cpu(cpu) { + if (info->size <= PAGE_SIZE) + kfree(info->entries[cpu]); + else + vfree(info->entries[cpu]); + } + kfree(info); +} + +static struct ipt_table_info *alloc_table_info(unsigned int size) +{ +struct ipt_table_info *newinfo; +int cpu; + newinfo = kzalloc(sizeof(struct ipt_table_info), GFP_KERNEL); + if (!newinfo) + return NULL; + newinfo->size = size; + for_each_cpu(cpu) { + if (size <= PAGE_SIZE) + newinfo->entries[cpu] = kmalloc_node(size, + GFP_KERNEL, + cpu_to_node(cpu)); + else + newinfo->entries[cpu] = vmalloc_node(size, cpu_to_node(cpu)); + if (newinfo->entries[cpu] == 0) { + free_table_info(newinfo); + return NULL; + } + } + return newinfo; +} + + static int do_replace(void __user *user, unsigned int len) { @@ -1115,6 +1173,7 @@ struct ipt_table *t; struct ipt_table_info *newinfo, *oldinfo; struct ipt_counters *counters; + void *loc_cpu_entry, *loc_cpu_old_entry; if (copy_from_user(&tmp, user, sizeof(tmp)) != 0) return -EFAULT; @@ -1127,12 +1186,14 @@ if ((SMP_ALIGN(tmp.size) >> PAGE_SHIFT) + 2 > num_physpages) return -ENOMEM; - newinfo = vmalloc(sizeof(struct ipt_table_info) - + SMP_ALIGN(tmp.size) * num_possible_cpus()); + newinfo = alloc_table_info(tmp.size); if (!newinfo) return -ENOMEM; - - if (copy_from_user(newinfo->entries, user + sizeof(tmp), + /* + * choose the copy that is on our node/cpu + */ + loc_cpu_entry = newinfo->entries[get_cpu()]; + if (copy_from_user(loc_cpu_entry, user + sizeof(tmp), tmp.size) != 0) { ret = -EFAULT; goto free_newinfo; @@ -1143,10 +1204,9 @@ ret = -ENOMEM; goto free_newinfo; } - memset(counters, 0, tmp.num_counters * sizeof(struct ipt_counters)); ret = translate_table(tmp.name, tmp.valid_hooks, - newinfo, tmp.size, tmp.num_entries, + newinfo, loc_cpu_entry, tmp.size, tmp.num_entries, tmp.hook_entry, tmp.underflow); if (ret != 0) goto free_newinfo_counters; @@ -1185,8 +1245,10 @@ /* Get the old counters. */ get_counters(oldinfo, counters); /* Decrease module usage counts and free resource */ - IPT_ENTRY_ITERATE(oldinfo->entries, oldinfo->size, cleanup_entry,NULL); - vfree(oldinfo); + loc_cpu_old_entry = oldinfo->entries[smp_processor_id()]; + IPT_ENTRY_ITERATE(loc_cpu_old_entry, oldinfo->size, cleanup_entry,NULL); + put_cpu(); + free_table_info(oldinfo); if (copy_to_user(tmp.counters, counters, sizeof(struct ipt_counters) * tmp.num_counters) != 0) ret = -EFAULT; @@ -1198,11 +1260,12 @@ module_put(t->me); up(&ipt_mutex); free_newinfo_counters_untrans: - IPT_ENTRY_ITERATE(newinfo->entries, newinfo->size, cleanup_entry,NULL); + IPT_ENTRY_ITERATE(loc_cpu_entry, newinfo->size, cleanup_entry,NULL); free_newinfo_counters: vfree(counters); free_newinfo: - vfree(newinfo); + put_cpu(); + free_table_info(newinfo); return ret; } @@ -1235,6 +1298,7 @@ struct ipt_counters_info tmp, *paddc; struct ipt_table *t; int ret = 0; + void *loc_cpu_entry; if (copy_from_user(&tmp, user, sizeof(tmp)) != 0) return -EFAULT; @@ -1264,7 +1328,11 @@ } i = 0; - IPT_ENTRY_ITERATE(t->private->entries, + /* + * Choose the copy that is on our node + */ + loc_cpu_entry = t->private->entries[smp_processor_id()]; + IPT_ENTRY_ITERATE(loc_cpu_entry, t->private->size, add_counter_to_entry, paddc->counters, @@ -1456,27 +1524,32 @@ struct ipt_table_info *newinfo; static struct ipt_table_info bootstrap = { 0, 0, 0, { 0 }, { 0 }, { } }; + void *loc_cpu_entry; - newinfo = vmalloc(sizeof(struct ipt_table_info) - + SMP_ALIGN(repl->size) * num_possible_cpus()); + newinfo = alloc_table_info(repl->size); if (!newinfo) return -ENOMEM; - memcpy(newinfo->entries, repl->entries, repl->size); + /* + * choose the copy that is on our node/cpu + */ + loc_cpu_entry = newinfo->entries[get_cpu()]; + memcpy(loc_cpu_entry, repl->entries, repl->size); ret = translate_table(table->name, table->valid_hooks, - newinfo, repl->size, + newinfo, loc_cpu_entry, repl->size, repl->num_entries, repl->hook_entry, repl->underflow); + put_cpu(); if (ret != 0) { - vfree(newinfo); + free_table_info(newinfo); return ret; } ret = down_interruptible(&ipt_mutex); if (ret != 0) { - vfree(newinfo); + free_table_info(newinfo); return ret; } @@ -1505,20 +1578,26 @@ return ret; free_unlock: - vfree(newinfo); + free_table_info(newinfo); goto unlock; } void ipt_unregister_table(struct ipt_table *table) { + void *loc_cpu_entry; down(&ipt_mutex); LIST_DELETE(&ipt_tables, table); up(&ipt_mutex); - /* Decrease module usage counts and free resources */ - IPT_ENTRY_ITERATE(table->private->entries, table->private->size, + /* + * Decrease module usage counts and free resources + * Choose the copy that is on our node/cpu + */ + loc_cpu_entry = table->private->entries[get_cpu()]; + IPT_ENTRY_ITERATE(loc_cpu_entry, table->private->size, cleanup_entry, NULL); - vfree(table->private); + put_cpu(); + free_table_info(table->private); } /* Returns 1 if the port is matched by the range, 0 otherwise */ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-13 22:01 ` Eric Dumazet @ 2005-10-17 4:18 ` David S. Miller 0 siblings, 0 replies; 40+ messages in thread From: David S. Miller @ 2005-10-17 4:18 UTC (permalink / raw) To: dada1; +Cc: laforge, netfilter-devel, hno From: Eric Dumazet <dada1@cosmosbay.com> Date: Fri, 14 Oct 2005 00:01:13 +0200 > Hi David, sorry for being late, I was traveling these last two days. No problem, I'm about to go traveling for 2 weeks :-) We put in a slightly different, arguably simpler/safer fix, in for 2.6.14, but I invite you to submit this patch of your's once the vmalloc_node() stuff goes into 2.6.15 Thanks! ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-10 21:15 ` David S. Miller 2005-10-10 21:41 ` Patrick McHardy 2005-10-10 21:46 ` Harald Welte @ 2005-10-11 10:44 ` Harald Welte 2005-10-11 17:15 ` Bart De Schuymer 2005-10-11 17:54 ` Harald Welte 2 siblings, 2 replies; 40+ messages in thread From: Harald Welte @ 2005-10-11 10:44 UTC (permalink / raw) To: David S. Miller, bdschuym; +Cc: netfilter-devel [-- Attachment #1: Type: text/plain, Size: 1712 bytes --] On Mon, Oct 10, 2005 at 02:15:18PM -0700, David S. Miller wrote: > From: Harald Welte <laforge@netfilter.org> > Date: Mon, 10 Oct 2005 18:41:41 +0200 > > > This is my proposed patch for the problem you've described. Please test > > and submit. If it works, I'll also prepare a patch for {arp,ip6}_tables. > > Ebtables needs it too. "git grep num_possible_cpus" shows all > of the users in the whole tree, %80 of which are the netfilter > cases we're fixing here :-) mh. I'm not very familiar with the ebtables specific changes, and apparently it differs quite a bit in counter handling. Anyway, I've tried to come up with a patch that also includes ebtables. Bart: Could you please comment on it? If you have missed the previous thread: some systems have smp processor id's "0 2" but no "1", e.g. num_possible_cpus() is 2 (and we allocate two rulesets), but then we try to index it with '2', e.g. the non-existing 3rd copy. Unfortunately I don't have a SMP box during travel, so I can't test whether it at least still works on 'normal' smp boxes. So all I can say is that it looks right, and it compiles with CONFIG_SMP. Please, could anybody test this on their SMP boxes ASAP? I won't be having access to a SMP machine for testing until 2005-10-18, and presumably we don't want to delay this fix. -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-11 10:44 ` Harald Welte @ 2005-10-11 17:15 ` Bart De Schuymer 2005-10-11 17:55 ` Harald Welte 2005-10-11 17:54 ` Harald Welte 1 sibling, 1 reply; 40+ messages in thread From: Bart De Schuymer @ 2005-10-11 17:15 UTC (permalink / raw) To: Harald Welte; +Cc: netfilter-devel, David S. Miller Op di, 11-10-2005 te 12:44 +0200, schreef Harald Welte: > On Mon, Oct 10, 2005 at 02:15:18PM -0700, David S. Miller wrote: > > Ebtables needs it too. "git grep num_possible_cpus" shows all > > of the users in the whole tree, %80 of which are the netfilter > > cases we're fixing here :-) > > mh. I'm not very familiar with the ebtables specific changes, and > apparently it differs quite a bit in counter handling. Counters are allocated separately from the rules so that rules don't need to be duplicated for each cpu. But the fix for ebtables should be very similar to the one for iptables. > Anyway, I've tried to come up with a patch that also includes ebtables. > > Bart: Could you please comment on it? If you have missed the previous > thread: some systems have smp processor id's "0 2" but no "1", e.g. > num_possible_cpus() is 2 (and we allocate two rulesets), but then we try > to index it with '2', e.g. the non-existing 3rd copy. I will once I can see it :) cheers, Bart ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-11 17:15 ` Bart De Schuymer @ 2005-10-11 17:55 ` Harald Welte 0 siblings, 0 replies; 40+ messages in thread From: Harald Welte @ 2005-10-11 17:55 UTC (permalink / raw) To: Bart De Schuymer; +Cc: netfilter-devel, David S. Miller [-- Attachment #1: Type: text/plain, Size: 941 bytes --] On Tue, Oct 11, 2005 at 05:15:01PM +0000, Bart De Schuymer wrote: > > mh. I'm not very familiar with the ebtables specific changes, and > > apparently it differs quite a bit in counter handling. > > Counters are allocated separately from the rules so that rules don't > need to be duplicated for each cpu. Yes, this makes a lot of sense. I didn't want to imply that I dislike that ip/eb tables difference. I just wanted to say that I don't know the code well enough. > I will once I can see it :) see my other mail, sorry forgot the attachment. -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-11 10:44 ` Harald Welte 2005-10-11 17:15 ` Bart De Schuymer @ 2005-10-11 17:54 ` Harald Welte 2005-10-11 21:54 ` Sébastien Bernard 2005-10-12 22:54 ` David S. Miller 1 sibling, 2 replies; 40+ messages in thread From: Harald Welte @ 2005-10-11 17:54 UTC (permalink / raw) To: David S. Miller, bdschuym, netfilter-devel [-- Attachment #1.1: Type: text/plain, Size: 512 bytes --] *sigh* forgot to attach the patch. This probably still has the same problem that was reported (oops when get_entries() is called). -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #1.2: s --] [-- Type: text/plain, Size: 9560 bytes --] diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -392,4 +392,18 @@ extern cpumask_t cpu_present_map; #define for_each_online_cpu(cpu) for_each_cpu_mask((cpu), cpu_online_map) #define for_each_present_cpu(cpu) for_each_cpu_mask((cpu), cpu_present_map) +/* Find the highest possible smp_processor_id() */ +static inline unsigned int highest_possible_processor_id(void) +{ + unsigned int cpu, highest = 0; + + for_each_cpu_mask(cpu, cpu_possible_map) { + if (cpu > highest) + highest = cpu; + } + + return highest; +} + + #endif /* __LINUX_CPUMASK_H */ diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c --- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -26,6 +26,7 @@ #include <linux/spinlock.h> #include <asm/uaccess.h> #include <linux/smp.h> +#include <linux/cpumask.h> #include <net/sock.h> /* needed for logical [in,out]-dev filtering */ #include "../br_private.h" @@ -823,10 +824,11 @@ static int translate_table(struct ebt_re /* this will get free'd in do_replace()/ebt_register_table() if an error occurs */ newinfo->chainstack = (struct ebt_chainstack **) - vmalloc(num_possible_cpus() * sizeof(struct ebt_chainstack)); + vmalloc((highest_possible_processor_id()+1) + * sizeof(struct ebt_chainstack)); if (!newinfo->chainstack) return -ENOMEM; - for (i = 0; i < num_possible_cpus(); i++) { + for_each_cpu_mask(i, cpu_possible_map) { newinfo->chainstack[i] = vmalloc(udc_cnt * sizeof(struct ebt_chainstack)); if (!newinfo->chainstack[i]) { @@ -893,11 +895,8 @@ static void get_counters(struct ebt_coun int i, cpu; struct ebt_counter *counter_base; - /* counters of cpu 0 */ - memcpy(counters, oldcounters, - sizeof(struct ebt_counter) * nentries); /* add other counters to those of cpu 0 */ - for (cpu = 1; cpu < num_possible_cpus(); cpu++) { + for_each_cpu_mask(cpu, cpu_possible_map) { counter_base = COUNTER_BASE(oldcounters, nentries, cpu); for (i = 0; i < nentries; i++) { counters[i].pcnt += counter_base[i].pcnt; @@ -929,7 +928,8 @@ static int do_replace(void __user *user, BUGPRINT("Entries_size never zero\n"); return -EINVAL; } - countersize = COUNTER_OFFSET(tmp.nentries) * num_possible_cpus(); + countersize = COUNTER_OFFSET(tmp.nentries) * + (highest_possible_processor_id()+1); newinfo = (struct ebt_table_info *) vmalloc(sizeof(struct ebt_table_info) + countersize); if (!newinfo) @@ -1022,7 +1022,7 @@ static int do_replace(void __user *user, vfree(table->entries); if (table->chainstack) { - for (i = 0; i < num_possible_cpus(); i++) + for_each_cpu_mask(i, cpu_possible_map) vfree(table->chainstack[i]); vfree(table->chainstack); } @@ -1040,7 +1040,7 @@ free_counterstmp: vfree(counterstmp); /* can be initialized in translate_table() */ if (newinfo->chainstack) { - for (i = 0; i < num_possible_cpus(); i++) + for_each_cpu_mask(i, cpu_possible_map) vfree(newinfo->chainstack[i]); vfree(newinfo->chainstack); } @@ -1132,7 +1132,8 @@ int ebt_register_table(struct ebt_table return -EINVAL; } - countersize = COUNTER_OFFSET(table->table->nentries) * num_possible_cpus(); + countersize = COUNTER_OFFSET(table->table->nentries) * + (highest_possible_processor_id()+1); newinfo = (struct ebt_table_info *) vmalloc(sizeof(struct ebt_table_info) + countersize); ret = -ENOMEM; @@ -1186,7 +1187,7 @@ free_unlock: up(&ebt_mutex); free_chainstack: if (newinfo->chainstack) { - for (i = 0; i < num_possible_cpus(); i++) + for_each_cpu_mask(i, cpu_possible_map) vfree(newinfo->chainstack[i]); vfree(newinfo->chainstack); } @@ -1209,7 +1210,7 @@ void ebt_unregister_table(struct ebt_tab up(&ebt_mutex); vfree(table->private->entries); if (table->private->chainstack) { - for (i = 0; i < num_possible_cpus(); i++) + for_each_cpu_mask(i, cpu_possible_map) vfree(table->private->chainstack[i]); vfree(table->private->chainstack); } diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c --- a/net/ipv4/netfilter/arp_tables.c +++ b/net/ipv4/netfilter/arp_tables.c @@ -716,8 +716,8 @@ static int translate_table(const char *n } /* And one copy for every other CPU */ - for (i = 1; i < num_possible_cpus(); i++) { - memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*i, + for_each_cpu_mask(i, cpu_possible_map) { + memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*(1+i), newinfo->entries, SMP_ALIGN(newinfo->size)); } @@ -767,7 +767,7 @@ static void get_counters(const struct ar unsigned int cpu; unsigned int i; - for (cpu = 0; cpu < num_possible_cpus(); cpu++) { + for_each_cpu_mask(cpu, cpu_possible_map) { i = 0; ARPT_ENTRY_ITERATE(t->entries + TABLE_OFFSET(t, cpu), t->size, @@ -885,7 +885,8 @@ static int do_replace(void __user *user, return -ENOMEM; newinfo = vmalloc(sizeof(struct arpt_table_info) - + SMP_ALIGN(tmp.size) * num_possible_cpus()); + + SMP_ALIGN(tmp.size) * + (highest_possible_processor_id()+1)); if (!newinfo) return -ENOMEM; @@ -1158,7 +1159,8 @@ int arpt_register_table(struct arpt_tabl = { 0, 0, 0, { 0 }, { 0 }, { } }; newinfo = vmalloc(sizeof(struct arpt_table_info) - + SMP_ALIGN(repl->size) * num_possible_cpus()); + + SMP_ALIGN(repl->size) * + (highest_possible_processor_id()+1)); if (!newinfo) { ret = -ENOMEM; return ret; diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -27,6 +27,7 @@ #include <asm/semaphore.h> #include <linux/proc_fs.h> #include <linux/err.h> +#include <linux/cpumask.h> #include <linux/netfilter_ipv4/ip_tables.h> @@ -921,8 +922,8 @@ translate_table(const char *name, } /* And one copy for every other CPU */ - for (i = 1; i < num_possible_cpus(); i++) { - memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*i, + for_each_cpu_mask(i, cpu_possible_map) { + memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*(1+i), newinfo->entries, SMP_ALIGN(newinfo->size)); } @@ -943,7 +944,7 @@ replace_table(struct ipt_table *table, struct ipt_entry *table_base; unsigned int i; - for (i = 0; i < num_possible_cpus(); i++) { + for_each_cpu_mask(i, cpu_possible_map) { table_base = (void *)newinfo->entries + TABLE_OFFSET(newinfo, i); @@ -990,7 +991,7 @@ get_counters(const struct ipt_table_info unsigned int cpu; unsigned int i; - for (cpu = 0; cpu < num_possible_cpus(); cpu++) { + for_each_cpu_mask(cpu, cpu_possible_map) { i = 0; IPT_ENTRY_ITERATE(t->entries + TABLE_OFFSET(t, cpu), t->size, @@ -1128,7 +1129,8 @@ do_replace(void __user *user, unsigned i return -ENOMEM; newinfo = vmalloc(sizeof(struct ipt_table_info) - + SMP_ALIGN(tmp.size) * num_possible_cpus()); + + SMP_ALIGN(tmp.size) * + (highest_possible_processor_id()+1)); if (!newinfo) return -ENOMEM; @@ -1458,7 +1460,8 @@ int ipt_register_table(struct ipt_table = { 0, 0, 0, { 0 }, { 0 }, { } }; newinfo = vmalloc(sizeof(struct ipt_table_info) - + SMP_ALIGN(repl->size) * num_possible_cpus()); + + SMP_ALIGN(repl->size) * + (highest_possible_processor_id()+1)); if (!newinfo) return -ENOMEM; diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -28,6 +28,7 @@ #include <asm/uaccess.h> #include <asm/semaphore.h> #include <linux/proc_fs.h> +#include <linux/cpumask.h> #include <linux/netfilter_ipv6/ip6_tables.h> @@ -950,8 +951,8 @@ translate_table(const char *name, } /* And one copy for every other CPU */ - for (i = 1; i < num_possible_cpus(); i++) { - memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*i, + for_each_cpu_mask(i, cpu_possible_map) { + memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*(1+i), newinfo->entries, SMP_ALIGN(newinfo->size)); } @@ -973,6 +974,7 @@ replace_table(struct ip6t_table *table, unsigned int i; for (i = 0; i < num_possible_cpus(); i++) { + for_each_cpu_mask(i, cpu_possible_mask) { table_base = (void *)newinfo->entries + TABLE_OFFSET(newinfo, i); @@ -1019,7 +1021,7 @@ get_counters(const struct ip6t_table_inf unsigned int cpu; unsigned int i; - for (cpu = 0; cpu < num_possible_cpus(); cpu++) { + for_each_cpu_mask(cpu, cpu_possible_map) { i = 0; IP6T_ENTRY_ITERATE(t->entries + TABLE_OFFSET(t, cpu), t->size, @@ -1153,7 +1155,8 @@ do_replace(void __user *user, unsigned i return -ENOMEM; newinfo = vmalloc(sizeof(struct ip6t_table_info) - + SMP_ALIGN(tmp.size) * num_possible_cpus()); + + SMP_ALIGN(tmp.size) * + (highest_possible_processor_id()+1)); if (!newinfo) return -ENOMEM; @@ -1467,7 +1470,8 @@ int ip6t_register_table(struct ip6t_tabl = { 0, 0, 0, { 0 }, { 0 }, { } }; newinfo = vmalloc(sizeof(struct ip6t_table_info) - + SMP_ALIGN(repl->size) * num_possible_cpus()); + + SMP_ALIGN(repl->size) * + (highest_possible_processor_id()+1)); if (!newinfo) return -ENOMEM; [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-11 17:54 ` Harald Welte @ 2005-10-11 21:54 ` Sébastien Bernard 2005-10-11 22:32 ` David S. Miller 2005-10-12 6:43 ` Harald Welte 2005-10-12 22:54 ` David S. Miller 1 sibling, 2 replies; 40+ messages in thread From: Sébastien Bernard @ 2005-10-11 21:54 UTC (permalink / raw) To: Harald Welte; +Cc: netfilter-devel, bdschuym, David S. Miller Harald Welte a écrit : >*sigh* > >forgot to attach the patch. > >This probably still has the same problem that was reported (oops when >get_entries() is called). > > > >------------------------------------------------------------------------ > >diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h >--- a/include/linux/cpumask.h >+++ b/include/linux/cpumask.h >@@ -392,4 +392,18 @@ extern cpumask_t cpu_present_map; > #define for_each_online_cpu(cpu) for_each_cpu_mask((cpu), cpu_online_map) > #define for_each_present_cpu(cpu) for_each_cpu_mask((cpu), cpu_present_map) > >+/* Find the highest possible smp_processor_id() */ >+static inline unsigned int highest_possible_processor_id(void) >+{ >+ unsigned int cpu, highest = 0; >+ >+ for_each_cpu_mask(cpu, cpu_possible_map) { >+ if (cpu > highest) >+ highest = cpu; >+ } >+ >+ return highest; >+} >+ >+ > #endif /* __LINUX_CPUMASK_H */ >diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c >--- a/net/bridge/netfilter/ebtables.c >+++ b/net/bridge/netfilter/ebtables.c >@@ -26,6 +26,7 @@ > #include <linux/spinlock.h> > #include <asm/uaccess.h> > #include <linux/smp.h> >+#include <linux/cpumask.h> > #include <net/sock.h> > /* needed for logical [in,out]-dev filtering */ > #include "../br_private.h" >@@ -823,10 +824,11 @@ static int translate_table(struct ebt_re > /* this will get free'd in do_replace()/ebt_register_table() > if an error occurs */ > newinfo->chainstack = (struct ebt_chainstack **) >- vmalloc(num_possible_cpus() * sizeof(struct ebt_chainstack)); >+ vmalloc((highest_possible_processor_id()+1) >+ * sizeof(struct ebt_chainstack)); > if (!newinfo->chainstack) > return -ENOMEM; >- for (i = 0; i < num_possible_cpus(); i++) { >+ for_each_cpu_mask(i, cpu_possible_map) { > newinfo->chainstack[i] = > vmalloc(udc_cnt * sizeof(struct ebt_chainstack)); > if (!newinfo->chainstack[i]) { >@@ -893,11 +895,8 @@ static void get_counters(struct ebt_coun > int i, cpu; > struct ebt_counter *counter_base; > >- /* counters of cpu 0 */ >- memcpy(counters, oldcounters, >- sizeof(struct ebt_counter) * nentries); > /* add other counters to those of cpu 0 */ >- for (cpu = 1; cpu < num_possible_cpus(); cpu++) { >+ for_each_cpu_mask(cpu, cpu_possible_map) { > counter_base = COUNTER_BASE(oldcounters, nentries, cpu); > for (i = 0; i < nentries; i++) { > counters[i].pcnt += counter_base[i].pcnt; >@@ -929,7 +928,8 @@ static int do_replace(void __user *user, > BUGPRINT("Entries_size never zero\n"); > return -EINVAL; > } >- countersize = COUNTER_OFFSET(tmp.nentries) * num_possible_cpus(); >+ countersize = COUNTER_OFFSET(tmp.nentries) * >+ (highest_possible_processor_id()+1); > newinfo = (struct ebt_table_info *) > vmalloc(sizeof(struct ebt_table_info) + countersize); > if (!newinfo) >@@ -1022,7 +1022,7 @@ static int do_replace(void __user *user, > > vfree(table->entries); > if (table->chainstack) { >- for (i = 0; i < num_possible_cpus(); i++) >+ for_each_cpu_mask(i, cpu_possible_map) > vfree(table->chainstack[i]); > vfree(table->chainstack); > } >@@ -1040,7 +1040,7 @@ free_counterstmp: > vfree(counterstmp); > /* can be initialized in translate_table() */ > if (newinfo->chainstack) { >- for (i = 0; i < num_possible_cpus(); i++) >+ for_each_cpu_mask(i, cpu_possible_map) > vfree(newinfo->chainstack[i]); > vfree(newinfo->chainstack); > } >@@ -1132,7 +1132,8 @@ int ebt_register_table(struct ebt_table > return -EINVAL; > } > >- countersize = COUNTER_OFFSET(table->table->nentries) * num_possible_cpus(); >+ countersize = COUNTER_OFFSET(table->table->nentries) * >+ (highest_possible_processor_id()+1); > newinfo = (struct ebt_table_info *) > vmalloc(sizeof(struct ebt_table_info) + countersize); > ret = -ENOMEM; >@@ -1186,7 +1187,7 @@ free_unlock: > up(&ebt_mutex); > free_chainstack: > if (newinfo->chainstack) { >- for (i = 0; i < num_possible_cpus(); i++) >+ for_each_cpu_mask(i, cpu_possible_map) > vfree(newinfo->chainstack[i]); > vfree(newinfo->chainstack); > } >@@ -1209,7 +1210,7 @@ void ebt_unregister_table(struct ebt_tab > up(&ebt_mutex); > vfree(table->private->entries); > if (table->private->chainstack) { >- for (i = 0; i < num_possible_cpus(); i++) >+ for_each_cpu_mask(i, cpu_possible_map) > vfree(table->private->chainstack[i]); > vfree(table->private->chainstack); > } >diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c >--- a/net/ipv4/netfilter/arp_tables.c >+++ b/net/ipv4/netfilter/arp_tables.c >@@ -716,8 +716,8 @@ static int translate_table(const char *n > } > > /* And one copy for every other CPU */ >- for (i = 1; i < num_possible_cpus(); i++) { >- memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*i, >+ for_each_cpu_mask(i, cpu_possible_map) { >+ memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*(1+i), > newinfo->entries, > SMP_ALIGN(newinfo->size)); > } >@@ -767,7 +767,7 @@ static void get_counters(const struct ar > unsigned int cpu; > unsigned int i; > >- for (cpu = 0; cpu < num_possible_cpus(); cpu++) { >+ for_each_cpu_mask(cpu, cpu_possible_map) { > i = 0; > ARPT_ENTRY_ITERATE(t->entries + TABLE_OFFSET(t, cpu), > t->size, >@@ -885,7 +885,8 @@ static int do_replace(void __user *user, > return -ENOMEM; > > newinfo = vmalloc(sizeof(struct arpt_table_info) >- + SMP_ALIGN(tmp.size) * num_possible_cpus()); >+ + SMP_ALIGN(tmp.size) * >+ (highest_possible_processor_id()+1)); > if (!newinfo) > return -ENOMEM; > >@@ -1158,7 +1159,8 @@ int arpt_register_table(struct arpt_tabl > = { 0, 0, 0, { 0 }, { 0 }, { } }; > > newinfo = vmalloc(sizeof(struct arpt_table_info) >- + SMP_ALIGN(repl->size) * num_possible_cpus()); >+ + SMP_ALIGN(repl->size) * >+ (highest_possible_processor_id()+1)); > if (!newinfo) { > ret = -ENOMEM; > return ret; >diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c >--- a/net/ipv4/netfilter/ip_tables.c >+++ b/net/ipv4/netfilter/ip_tables.c >@@ -27,6 +27,7 @@ > #include <asm/semaphore.h> > #include <linux/proc_fs.h> > #include <linux/err.h> >+#include <linux/cpumask.h> > > #include <linux/netfilter_ipv4/ip_tables.h> > >@@ -921,8 +922,8 @@ translate_table(const char *name, > } > > /* And one copy for every other CPU */ >- for (i = 1; i < num_possible_cpus(); i++) { >- memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*i, >+ for_each_cpu_mask(i, cpu_possible_map) { >+ memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*(1+i), > newinfo->entries, > SMP_ALIGN(newinfo->size)); > } >@@ -943,7 +944,7 @@ replace_table(struct ipt_table *table, > struct ipt_entry *table_base; > unsigned int i; > >- for (i = 0; i < num_possible_cpus(); i++) { >+ for_each_cpu_mask(i, cpu_possible_map) { > table_base = > (void *)newinfo->entries > + TABLE_OFFSET(newinfo, i); >@@ -990,7 +991,7 @@ get_counters(const struct ipt_table_info > unsigned int cpu; > unsigned int i; > >- for (cpu = 0; cpu < num_possible_cpus(); cpu++) { >+ for_each_cpu_mask(cpu, cpu_possible_map) { > i = 0; > IPT_ENTRY_ITERATE(t->entries + TABLE_OFFSET(t, cpu), > t->size, >@@ -1128,7 +1129,8 @@ do_replace(void __user *user, unsigned i > return -ENOMEM; > > newinfo = vmalloc(sizeof(struct ipt_table_info) >- + SMP_ALIGN(tmp.size) * num_possible_cpus()); >+ + SMP_ALIGN(tmp.size) * >+ (highest_possible_processor_id()+1)); > if (!newinfo) > return -ENOMEM; > >@@ -1458,7 +1460,8 @@ int ipt_register_table(struct ipt_table > = { 0, 0, 0, { 0 }, { 0 }, { } }; > > newinfo = vmalloc(sizeof(struct ipt_table_info) >- + SMP_ALIGN(repl->size) * num_possible_cpus()); >+ + SMP_ALIGN(repl->size) * >+ (highest_possible_processor_id()+1)); > if (!newinfo) > return -ENOMEM; > >diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c >--- a/net/ipv6/netfilter/ip6_tables.c >+++ b/net/ipv6/netfilter/ip6_tables.c >@@ -28,6 +28,7 @@ > #include <asm/uaccess.h> > #include <asm/semaphore.h> > #include <linux/proc_fs.h> >+#include <linux/cpumask.h> > > #include <linux/netfilter_ipv6/ip6_tables.h> > >@@ -950,8 +951,8 @@ translate_table(const char *name, > } > > /* And one copy for every other CPU */ >- for (i = 1; i < num_possible_cpus(); i++) { >- memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*i, >+ for_each_cpu_mask(i, cpu_possible_map) { >+ memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*(1+i), > newinfo->entries, > SMP_ALIGN(newinfo->size)); > } >@@ -973,6 +974,7 @@ replace_table(struct ip6t_table *table, > unsigned int i; > > for (i = 0; i < num_possible_cpus(); i++) { >+ for_each_cpu_mask(i, cpu_possible_mask) { > table_base = > (void *)newinfo->entries > + TABLE_OFFSET(newinfo, i); >@@ -1019,7 +1021,7 @@ get_counters(const struct ip6t_table_inf > unsigned int cpu; > unsigned int i; > >- for (cpu = 0; cpu < num_possible_cpus(); cpu++) { >+ for_each_cpu_mask(cpu, cpu_possible_map) { > i = 0; > IP6T_ENTRY_ITERATE(t->entries + TABLE_OFFSET(t, cpu), > t->size, >@@ -1153,7 +1155,8 @@ do_replace(void __user *user, unsigned i > return -ENOMEM; > > newinfo = vmalloc(sizeof(struct ip6t_table_info) >- + SMP_ALIGN(tmp.size) * num_possible_cpus()); >+ + SMP_ALIGN(tmp.size) * >+ (highest_possible_processor_id()+1)); > if (!newinfo) > return -ENOMEM; > >@@ -1467,7 +1470,8 @@ int ip6t_register_table(struct ip6t_tabl > = { 0, 0, 0, { 0 }, { 0 }, { } }; > > newinfo = vmalloc(sizeof(struct ip6t_table_info) >- + SMP_ALIGN(repl->size) * num_possible_cpus()); >+ + SMP_ALIGN(repl->size) * >+ (highest_possible_processor_id()+1)); > if (!newinfo) > return -ENOMEM; > > > On sparc64, cpu_possible_map does not exist. You have either cpu_online_map or phys_cpu_present_map. Seb ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-11 21:54 ` Sébastien Bernard @ 2005-10-11 22:32 ` David S. Miller 2005-10-12 6:22 ` seb 2005-10-12 6:43 ` Harald Welte 1 sibling, 1 reply; 40+ messages in thread From: David S. Miller @ 2005-10-11 22:32 UTC (permalink / raw) To: seb; +Cc: laforge, netfilter-devel, bdschuym From: Sébastien Bernard <seb@frankengul.org> Date: Tue, 11 Oct 2005 23:54:13 +0200 > On sparc64, cpu_possible_map does not exist. Yes, it does, although indirectly. Otherwise for_each_cpu() would be horrible broken and never allow building sparc64 SMP kernels :-) It's defined in kernel/sched.c for non-SMP and defined to phys_cpu_present_map when SMP. Did you try to build his patch and it actually failed? If so, please post the build failure instead of speculation about why it failed so that this can be properly analyzed :-) ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-11 22:32 ` David S. Miller @ 2005-10-12 6:22 ` seb 2005-10-12 6:31 ` David S. Miller 0 siblings, 1 reply; 40+ messages in thread From: seb @ 2005-10-12 6:22 UTC (permalink / raw) To: netfilter-devel On Tue, Oct 11, 2005 at 03:32:57PM -0700, David S. Miller wrote: > From: Sébastien Bernard <seb@frankengul.org> > Date: Tue, 11 Oct 2005 23:54:13 +0200 > > > On sparc64, cpu_possible_map does not exist. > > Yes, it does, although indirectly. Otherwise for_each_cpu() would be > horrible broken and never allow building sparc64 SMP kernels :-) > > It's defined in kernel/sched.c for non-SMP and defined to > phys_cpu_present_map when SMP. > > Did you try to build his patch and it actually failed? If so, please > post the build failure instead of speculation about why it failed so > that this can be properly analyzed :-) > It is no speculation. After building, the modules_install reports the symbol undefined. Well, I see now that it is only a lack of EXPORT_SYMBOL. I was checking other arch. All smp arch used cpu_online_map and cpu_possible_map but sparc which used phys_cpu_present_map (a specificity of the sparc port I presume). Anyway, it's on its way now. Seb ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-12 6:22 ` seb @ 2005-10-12 6:31 ` David S. Miller 2005-10-12 7:04 ` Sébastien Bernard 0 siblings, 1 reply; 40+ messages in thread From: David S. Miller @ 2005-10-12 6:31 UTC (permalink / raw) To: seb; +Cc: netfilter-devel From: seb@frankengul.org Date: Wed, 12 Oct 2005 08:22:18 +0200 > After building, the modules_install reports the symbol undefined. > Well, I see now that it is only a lack of EXPORT_SYMBOL. Aha, that makes sense. > All smp arch used cpu_online_map and cpu_possible_map but sparc which > used phys_cpu_present_map (a specificity of the sparc port I presume). Yes, we do this because since we lack cpu hotplug support this means "cpu present"=="cpu possible". ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-12 6:31 ` David S. Miller @ 2005-10-12 7:04 ` Sébastien Bernard 2005-10-12 7:34 ` David S. Miller 0 siblings, 1 reply; 40+ messages in thread From: Sébastien Bernard @ 2005-10-12 7:04 UTC (permalink / raw) To: David S. Miller; +Cc: netfilter-devel David S. Miller a écrit : >From: seb@frankengul.org >Date: Wed, 12 Oct 2005 08:22:18 +0200 > > > >>After building, the modules_install reports the symbol undefined. >>Well, I see now that it is only a lack of EXPORT_SYMBOL. >> >> > >Aha, that makes sense. > > > >>All smp arch used cpu_online_map and cpu_possible_map but sparc which >>used phys_cpu_present_map (a specificity of the sparc port I presume). >> >> > >Yes, we do this because since we lack cpu hotplug support >this means "cpu present"=="cpu possible". > > > Ok, heres is the oops provoked by the second patch posted by Harald on the 11th of October. unable to handle paging_request (some zeros...)140098000 iptables_restor(2488): Oops [#1] TPC: <get_couters +0x68/0xe0> Caller: do_ipt_get_ctl + 0x5b8/0x7a0 nf_sock_opt + 0xa4 / 0x160 ip_get_sockopt + 0xa4 / 0x5e0 sock_common_getsockopt + 0x1c/0x40 sys_get_sockopt + 0x84/0xe0 compat_sys_getsockopt + 0x28/0x120 linux_sparc_syscall32 + 0x34/0x40 Note that it is a different crash. The previous was a unable to handle kernel request NULL dereference. As a result, the machine locks hard. Hope it helps. Seb ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-12 7:04 ` Sébastien Bernard @ 2005-10-12 7:34 ` David S. Miller 0 siblings, 0 replies; 40+ messages in thread From: David S. Miller @ 2005-10-12 7:34 UTC (permalink / raw) To: seb; +Cc: netfilter-devel From: Sébastien Bernard <seb@frankengul.org> Date: Wed, 12 Oct 2005 09:04:13 +0200 > unable to handle paging_request (some zeros...)140098000 This 0x140098000 is in the VMALLOC area, presumable this is an unmapped out-of-range VMALLOC address. > iptables_restor(2488): Oops [#1] > TPC: <get_couters +0x68/0xe0> I guess there is still some indexing problem in Harald's patch, which is what causes the out-of-range vmalloc access. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-11 21:54 ` Sébastien Bernard 2005-10-11 22:32 ` David S. Miller @ 2005-10-12 6:43 ` Harald Welte 1 sibling, 0 replies; 40+ messages in thread From: Harald Welte @ 2005-10-12 6:43 UTC (permalink / raw) To: Sébastien Bernard; +Cc: netfilter-devel, bdschuym, David S. Miller [-- Attachment #1: Type: text/plain, Size: 624 bytes --] On Tue, Oct 11, 2005 at 11:54:13PM +0200, Sébastien Bernard wrote: > Harald Welte a écrit : > On sparc64, cpu_possible_map does not exist. strange, I guess this should be fixed in the sparc code then. Maybe I was just missing some includes... -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-11 17:54 ` Harald Welte 2005-10-11 21:54 ` Sébastien Bernard @ 2005-10-12 22:54 ` David S. Miller 2005-10-13 7:18 ` seb 2005-10-13 8:46 ` Herbert Xu 1 sibling, 2 replies; 40+ messages in thread From: David S. Miller @ 2005-10-12 22:54 UTC (permalink / raw) To: laforge; +Cc: netfilter-devel, bdschuym, seb From: Harald Welte <laforge@netfilter.org> Date: Tue, 11 Oct 2005 19:54:57 +0200 I think I found the bug in your patch Harald. > @@ -716,8 +716,8 @@ static int translate_table(const char *n > } > > /* And one copy for every other CPU */ > - for (i = 1; i < num_possible_cpus(); i++) { > - memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*i, > + for_each_cpu_mask(i, cpu_possible_map) { > + memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*(1+i), > newinfo->entries, > SMP_ALIGN(newinfo->size)); > } This isn't correct I think. It assumes cpu numbers are linear :-) In the two cpu case of cpu "0" and cpu "2" what this loop will do is: 1) copy entry 0 to entry 1 2) copy entry 0 to entry 3 which fails to initialize entry 2 and that's how we crash. Maybe the loop is better constructed like this (btw, for_each_cpu() walks over cpu_possible_map, no need to expand that by hand): for_each_cpu(i) { if (i == 0) continue; memcpy(newinfo->entries + SMP_ALIGN(newinfo->size) * i, newinfo->entries, SMP_ALIGN(newinfo->size)); } Here is an updated patch, and it takes care of all of the platforms that don't export the necessary cpumask symbols. Sebastian can you test this one out please? Thanks a lot. diff --git a/arch/cris/arch-v32/kernel/smp.c b/arch/cris/arch-v32/kernel/smp.c index 2c5cae0..957f551 100644 --- a/arch/cris/arch-v32/kernel/smp.c +++ b/arch/cris/arch-v32/kernel/smp.c @@ -15,6 +15,7 @@ #include <linux/kernel.h> #include <linux/cpumask.h> #include <linux/interrupt.h> +#include <linux/module.h> #define IPI_SCHEDULE 1 #define IPI_CALL 2 @@ -28,6 +29,7 @@ spinlock_t cris_atomic_locks[] = { [0 .. /* CPU masks */ cpumask_t cpu_online_map = CPU_MASK_NONE; cpumask_t phys_cpu_present_map = CPU_MASK_NONE; +EXPORT_SYMBOL(phys_cpu_present_map); /* Variables used during SMP boot */ volatile int cpu_now_booting = 0; diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c index 56a39d6..5ecefc0 100644 --- a/arch/sh/kernel/smp.c +++ b/arch/sh/kernel/smp.c @@ -22,6 +22,7 @@ #include <linux/time.h> #include <linux/timex.h> #include <linux/sched.h> +#include <linux/module.h> #include <asm/atomic.h> #include <asm/processor.h> @@ -39,6 +40,8 @@ struct sh_cpuinfo cpu_data[NR_CPUS]; extern void per_cpu_trap_init(void); cpumask_t cpu_possible_map; +EXPORT_SYMBOL(cpu_possible_map); + cpumask_t cpu_online_map; static atomic_t cpus_booted = ATOMIC_INIT(0); diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index b15826f..36c46a7 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -392,4 +392,18 @@ extern cpumask_t cpu_present_map; #define for_each_online_cpu(cpu) for_each_cpu_mask((cpu), cpu_online_map) #define for_each_present_cpu(cpu) for_each_cpu_mask((cpu), cpu_present_map) +/* Find the highest possible smp_processor_id() */ +static inline unsigned int highest_possible_processor_id(void) +{ + unsigned int cpu, highest = 0; + + for_each_cpu_mask(cpu, cpu_possible_map) { + if (cpu > highest) + highest = cpu; + } + + return highest; +} + + #endif /* __LINUX_CPUMASK_H */ diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c index c454014..9010661 100644 --- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -26,6 +26,7 @@ #include <linux/spinlock.h> #include <asm/uaccess.h> #include <linux/smp.h> +#include <linux/cpumask.h> #include <net/sock.h> /* needed for logical [in,out]-dev filtering */ #include "../br_private.h" @@ -823,10 +824,11 @@ static int translate_table(struct ebt_re /* this will get free'd in do_replace()/ebt_register_table() if an error occurs */ newinfo->chainstack = (struct ebt_chainstack **) - vmalloc(num_possible_cpus() * sizeof(struct ebt_chainstack)); + vmalloc((highest_possible_processor_id()+1) + * sizeof(struct ebt_chainstack)); if (!newinfo->chainstack) return -ENOMEM; - for (i = 0; i < num_possible_cpus(); i++) { + for_each_cpu(i) { newinfo->chainstack[i] = vmalloc(udc_cnt * sizeof(struct ebt_chainstack)); if (!newinfo->chainstack[i]) { @@ -893,11 +895,8 @@ static void get_counters(struct ebt_coun int i, cpu; struct ebt_counter *counter_base; - /* counters of cpu 0 */ - memcpy(counters, oldcounters, - sizeof(struct ebt_counter) * nentries); /* add other counters to those of cpu 0 */ - for (cpu = 1; cpu < num_possible_cpus(); cpu++) { + for_each_cpu(cpu) { counter_base = COUNTER_BASE(oldcounters, nentries, cpu); for (i = 0; i < nentries; i++) { counters[i].pcnt += counter_base[i].pcnt; @@ -929,7 +928,8 @@ static int do_replace(void __user *user, BUGPRINT("Entries_size never zero\n"); return -EINVAL; } - countersize = COUNTER_OFFSET(tmp.nentries) * num_possible_cpus(); + countersize = COUNTER_OFFSET(tmp.nentries) * + (highest_possible_processor_id()+1); newinfo = (struct ebt_table_info *) vmalloc(sizeof(struct ebt_table_info) + countersize); if (!newinfo) @@ -1022,7 +1022,7 @@ static int do_replace(void __user *user, vfree(table->entries); if (table->chainstack) { - for (i = 0; i < num_possible_cpus(); i++) + for_each_cpu(i) vfree(table->chainstack[i]); vfree(table->chainstack); } @@ -1040,7 +1040,7 @@ free_counterstmp: vfree(counterstmp); /* can be initialized in translate_table() */ if (newinfo->chainstack) { - for (i = 0; i < num_possible_cpus(); i++) + for_each_cpu(i) vfree(newinfo->chainstack[i]); vfree(newinfo->chainstack); } @@ -1132,7 +1132,8 @@ int ebt_register_table(struct ebt_table return -EINVAL; } - countersize = COUNTER_OFFSET(table->table->nentries) * num_possible_cpus(); + countersize = COUNTER_OFFSET(table->table->nentries) * + (highest_possible_processor_id()+1); newinfo = (struct ebt_table_info *) vmalloc(sizeof(struct ebt_table_info) + countersize); ret = -ENOMEM; @@ -1186,7 +1187,7 @@ free_unlock: up(&ebt_mutex); free_chainstack: if (newinfo->chainstack) { - for (i = 0; i < num_possible_cpus(); i++) + for_each_cpu(i) vfree(newinfo->chainstack[i]); vfree(newinfo->chainstack); } @@ -1209,7 +1210,7 @@ void ebt_unregister_table(struct ebt_tab up(&ebt_mutex); vfree(table->private->entries); if (table->private->chainstack) { - for (i = 0; i < num_possible_cpus(); i++) + for_each_cpu(i) vfree(table->private->chainstack[i]); vfree(table->private->chainstack); } diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c index fa16342..a796928 100644 --- a/net/ipv4/netfilter/arp_tables.c +++ b/net/ipv4/netfilter/arp_tables.c @@ -716,8 +716,10 @@ static int translate_table(const char *n } /* And one copy for every other CPU */ - for (i = 1; i < num_possible_cpus(); i++) { - memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*i, + for_each_cpu(i) { + if (i == 0) + continue; + memcpy(newinfo->entries + SMP_ALIGN(newinfo->size) * i, newinfo->entries, SMP_ALIGN(newinfo->size)); } @@ -767,7 +769,7 @@ static void get_counters(const struct ar unsigned int cpu; unsigned int i; - for (cpu = 0; cpu < num_possible_cpus(); cpu++) { + for_each_cpu(cpu) { i = 0; ARPT_ENTRY_ITERATE(t->entries + TABLE_OFFSET(t, cpu), t->size, @@ -885,7 +887,8 @@ static int do_replace(void __user *user, return -ENOMEM; newinfo = vmalloc(sizeof(struct arpt_table_info) - + SMP_ALIGN(tmp.size) * num_possible_cpus()); + + SMP_ALIGN(tmp.size) * + (highest_possible_processor_id()+1)); if (!newinfo) return -ENOMEM; @@ -1158,7 +1161,8 @@ int arpt_register_table(struct arpt_tabl = { 0, 0, 0, { 0 }, { 0 }, { } }; newinfo = vmalloc(sizeof(struct arpt_table_info) - + SMP_ALIGN(repl->size) * num_possible_cpus()); + + SMP_ALIGN(repl->size) * + (highest_possible_processor_id()+1)); if (!newinfo) { ret = -ENOMEM; return ret; diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index eef99a1..75c27e9 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -27,6 +27,7 @@ #include <asm/semaphore.h> #include <linux/proc_fs.h> #include <linux/err.h> +#include <linux/cpumask.h> #include <linux/netfilter_ipv4/ip_tables.h> @@ -921,8 +922,10 @@ translate_table(const char *name, } /* And one copy for every other CPU */ - for (i = 1; i < num_possible_cpus(); i++) { - memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*i, + for_each_cpu(i) { + if (i == 0) + continue; + memcpy(newinfo->entries + SMP_ALIGN(newinfo->size) * i, newinfo->entries, SMP_ALIGN(newinfo->size)); } @@ -943,7 +946,7 @@ replace_table(struct ipt_table *table, struct ipt_entry *table_base; unsigned int i; - for (i = 0; i < num_possible_cpus(); i++) { + for_each_cpu(i) { table_base = (void *)newinfo->entries + TABLE_OFFSET(newinfo, i); @@ -990,7 +993,7 @@ get_counters(const struct ipt_table_info unsigned int cpu; unsigned int i; - for (cpu = 0; cpu < num_possible_cpus(); cpu++) { + for_each_cpu(cpu) { i = 0; IPT_ENTRY_ITERATE(t->entries + TABLE_OFFSET(t, cpu), t->size, @@ -1128,7 +1131,8 @@ do_replace(void __user *user, unsigned i return -ENOMEM; newinfo = vmalloc(sizeof(struct ipt_table_info) - + SMP_ALIGN(tmp.size) * num_possible_cpus()); + + SMP_ALIGN(tmp.size) * + (highest_possible_processor_id()+1)); if (!newinfo) return -ENOMEM; @@ -1458,7 +1462,8 @@ int ipt_register_table(struct ipt_table = { 0, 0, 0, { 0 }, { 0 }, { } }; newinfo = vmalloc(sizeof(struct ipt_table_info) - + SMP_ALIGN(repl->size) * num_possible_cpus()); + + SMP_ALIGN(repl->size) * + (highest_possible_processor_id()+1)); if (!newinfo) return -ENOMEM; diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c index 2da514b..b03e906 100644 --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -28,6 +28,7 @@ #include <asm/uaccess.h> #include <asm/semaphore.h> #include <linux/proc_fs.h> +#include <linux/cpumask.h> #include <linux/netfilter_ipv6/ip6_tables.h> @@ -950,8 +951,10 @@ translate_table(const char *name, } /* And one copy for every other CPU */ - for (i = 1; i < num_possible_cpus(); i++) { - memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*i, + for_each_cpu(i) { + if (i == 0) + continue; + memcpy(newinfo->entries + SMP_ALIGN(newinfo->size) * i, newinfo->entries, SMP_ALIGN(newinfo->size)); } @@ -973,6 +976,7 @@ replace_table(struct ip6t_table *table, unsigned int i; for (i = 0; i < num_possible_cpus(); i++) { + for_each_cpu(i) { table_base = (void *)newinfo->entries + TABLE_OFFSET(newinfo, i); @@ -1019,7 +1023,7 @@ get_counters(const struct ip6t_table_inf unsigned int cpu; unsigned int i; - for (cpu = 0; cpu < num_possible_cpus(); cpu++) { + for_each_cpu(cpu) { i = 0; IP6T_ENTRY_ITERATE(t->entries + TABLE_OFFSET(t, cpu), t->size, @@ -1153,7 +1157,8 @@ do_replace(void __user *user, unsigned i return -ENOMEM; newinfo = vmalloc(sizeof(struct ip6t_table_info) - + SMP_ALIGN(tmp.size) * num_possible_cpus()); + + SMP_ALIGN(tmp.size) * + (highest_possible_processor_id()+1)); if (!newinfo) return -ENOMEM; @@ -1467,7 +1472,8 @@ int ip6t_register_table(struct ip6t_tabl = { 0, 0, 0, { 0 }, { 0 }, { } }; newinfo = vmalloc(sizeof(struct ip6t_table_info) - + SMP_ALIGN(repl->size) * num_possible_cpus()); + + SMP_ALIGN(repl->size) * + (highest_possible_processor_id()+1)); if (!newinfo) return -ENOMEM; ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-12 22:54 ` David S. Miller @ 2005-10-13 7:18 ` seb 2005-10-13 19:11 ` David S. Miller 2005-10-13 8:46 ` Herbert Xu 1 sibling, 1 reply; 40+ messages in thread From: seb @ 2005-10-13 7:18 UTC (permalink / raw) To: David S. Miller; +Cc: netfilter-devel, bdschuym On Wed, Oct 12, 2005 at 03:54:19PM -0700, David S. Miller wrote: > From: Harald Welte <laforge@netfilter.org> > Date: Tue, 11 Oct 2005 19:54:57 +0200 > > I think I found the bug in your patch Harald. > > > @@ -716,8 +716,8 @@ static int translate_table(const char *n > > } > > > > /* And one copy for every other CPU */ > > - for (i = 1; i < num_possible_cpus(); i++) { > > - memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*i, > > + for_each_cpu_mask(i, cpu_possible_map) { > > + memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*(1+i), > > newinfo->entries, > > SMP_ALIGN(newinfo->size)); > > } > > This isn't correct I think. It assumes cpu numbers are > linear :-) > > In the two cpu case of cpu "0" and cpu "2" what this loop > will do is: > > 1) copy entry 0 to entry 1 > 2) copy entry 0 to entry 3 > > which fails to initialize entry 2 and that's how we crash. > > Maybe the loop is better constructed like this (btw, for_each_cpu() > walks over cpu_possible_map, no need to expand that by hand): > > for_each_cpu(i) { > if (i == 0) > continue; > > memcpy(newinfo->entries + SMP_ALIGN(newinfo->size) * i, > newinfo->entries, > SMP_ALIGN(newinfo->size)); > } > > Here is an updated patch, and it takes care of all of the platforms > that don't export the necessary cpumask symbols. > > Sebastian can you test this one out please? > > Thanks a lot. You're welcome :). I'm pleased to announce that the mail i'm writing is on the machine with your patch applied, and nothing bad happenned in the 5 first minutes. So I can reasonnably assert that the patch is correct and fix the problem. Just one little point: Should the cpu_possible_map be an exported symbol for all smp arch or not ? Seb ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-13 7:18 ` seb @ 2005-10-13 19:11 ` David S. Miller 0 siblings, 0 replies; 40+ messages in thread From: David S. Miller @ 2005-10-13 19:11 UTC (permalink / raw) To: seb; +Cc: netfilter-devel, bdschuym From: seb@frankengul.org Date: Thu, 13 Oct 2005 09:18:51 +0200 > Just one little point: Should the cpu_possible_map be an exported symbol > for all smp arch or not ? Several, like sparc64, definite it to something else. I audited the whole tree to make sure it would work on every SMP platform. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-12 22:54 ` David S. Miller 2005-10-13 7:18 ` seb @ 2005-10-13 8:46 ` Herbert Xu 2005-10-13 21:14 ` David S. Miller 1 sibling, 1 reply; 40+ messages in thread From: Herbert Xu @ 2005-10-13 8:46 UTC (permalink / raw) To: David S. Miller; +Cc: laforge, netfilter-devel, bdschuym, seb David S. Miller <davem@davemloft.net> wrote: > > diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h > index b15826f..36c46a7 100644 > --- a/include/linux/cpumask.h > +++ b/include/linux/cpumask.h > @@ -392,4 +392,18 @@ extern cpumask_t cpu_present_map; > #define for_each_online_cpu(cpu) for_each_cpu_mask((cpu), cpu_online_map) > #define for_each_present_cpu(cpu) for_each_cpu_mask((cpu), cpu_present_map) > > +/* Find the highest possible smp_processor_id() */ > +static inline unsigned int highest_possible_processor_id(void) > +{ > + unsigned int cpu, highest = 0; > + > + for_each_cpu_mask(cpu, cpu_possible_map) { > + if (cpu > highest) > + highest = cpu; > + } > + > + return highest; > +} This is a cue for someone to do find_last_bit for 2.6.15 :) In the mean time we can at least kill the (cpu > highest) check since cpu is monotonically increasing. > @@ -893,11 +895,8 @@ static void get_counters(struct ebt_coun > int i, cpu; > struct ebt_counter *counter_base; > > - /* counters of cpu 0 */ > - memcpy(counters, oldcounters, > - sizeof(struct ebt_counter) * nentries); > /* add other counters to those of cpu 0 */ > - for (cpu = 1; cpu < num_possible_cpus(); cpu++) { > + for_each_cpu(cpu) { > counter_base = COUNTER_BASE(oldcounters, nentries, cpu); > for (i = 0; i < nentries; i++) { > counters[i].pcnt += counter_base[i].pcnt; I'm not sure about this. Counters hasn't been zeroed yet, has it? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-13 8:46 ` Herbert Xu @ 2005-10-13 21:14 ` David S. Miller 2005-10-13 22:37 ` seb 2005-10-14 18:33 ` Harald Welte 0 siblings, 2 replies; 40+ messages in thread From: David S. Miller @ 2005-10-13 21:14 UTC (permalink / raw) To: herbert; +Cc: laforge, netfilter-devel, bdschuym, seb From: Herbert Xu <herbert@gondor.apana.org.au> Date: Thu, 13 Oct 2005 18:46:06 +1000 > > +/* Find the highest possible smp_processor_id() */ > > +static inline unsigned int highest_possible_processor_id(void) > > +{ > > + unsigned int cpu, highest = 0; > > + > > + for_each_cpu_mask(cpu, cpu_possible_map) { > > + if (cpu > highest) > > + highest = cpu; > > + } > > + > > + return highest; > > +} > > In the mean time we can at least kill the (cpu > highest) check since > cpu is monotonically increasing. Correct. > > @@ -893,11 +895,8 @@ static void get_counters(struct ebt_coun > > int i, cpu; > > struct ebt_counter *counter_base; > > > > - /* counters of cpu 0 */ > > - memcpy(counters, oldcounters, > > - sizeof(struct ebt_counter) * nentries); > > /* add other counters to those of cpu 0 */ > > - for (cpu = 1; cpu < num_possible_cpus(); cpu++) { > > + for_each_cpu(cpu) { > > counter_base = COUNTER_BASE(oldcounters, nentries, cpu); > > for (i = 0; i < nentries; i++) { > > counters[i].pcnt += counter_base[i].pcnt; > > I'm not sure about this. Counters hasn't been zeroed yet, has it? Good catch, I fixed this up to preserve the original memset(), then do the loop like: for_each_cpu(cpu) { if (cpu == 0) continue; ... With these two fixes, the following is what I'm about to push to Linus. Thanks everyone. diff --git a/arch/cris/arch-v32/kernel/smp.c b/arch/cris/arch-v32/kernel/smp.c index 2c5cae0..957f551 100644 --- a/arch/cris/arch-v32/kernel/smp.c +++ b/arch/cris/arch-v32/kernel/smp.c @@ -15,6 +15,7 @@ #include <linux/kernel.h> #include <linux/cpumask.h> #include <linux/interrupt.h> +#include <linux/module.h> #define IPI_SCHEDULE 1 #define IPI_CALL 2 @@ -28,6 +29,7 @@ spinlock_t cris_atomic_locks[] = { [0 .. /* CPU masks */ cpumask_t cpu_online_map = CPU_MASK_NONE; cpumask_t phys_cpu_present_map = CPU_MASK_NONE; +EXPORT_SYMBOL(phys_cpu_present_map); /* Variables used during SMP boot */ volatile int cpu_now_booting = 0; diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c index 56a39d6..5ecefc0 100644 --- a/arch/sh/kernel/smp.c +++ b/arch/sh/kernel/smp.c @@ -22,6 +22,7 @@ #include <linux/time.h> #include <linux/timex.h> #include <linux/sched.h> +#include <linux/module.h> #include <asm/atomic.h> #include <asm/processor.h> @@ -39,6 +40,8 @@ struct sh_cpuinfo cpu_data[NR_CPUS]; extern void per_cpu_trap_init(void); cpumask_t cpu_possible_map; +EXPORT_SYMBOL(cpu_possible_map); + cpumask_t cpu_online_map; static atomic_t cpus_booted = ATOMIC_INIT(0); diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index b15826f..fe97783 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -392,4 +392,16 @@ extern cpumask_t cpu_present_map; #define for_each_online_cpu(cpu) for_each_cpu_mask((cpu), cpu_online_map) #define for_each_present_cpu(cpu) for_each_cpu_mask((cpu), cpu_present_map) +/* Find the highest possible smp_processor_id() */ +static inline unsigned int highest_possible_processor_id(void) +{ + unsigned int cpu, highest = 0; + + for_each_cpu_mask(cpu, cpu_possible_map) + highest = cpu; + + return highest; +} + + #endif /* __LINUX_CPUMASK_H */ diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c index c454014..f8ffbf6 100644 --- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -26,6 +26,7 @@ #include <linux/spinlock.h> #include <asm/uaccess.h> #include <linux/smp.h> +#include <linux/cpumask.h> #include <net/sock.h> /* needed for logical [in,out]-dev filtering */ #include "../br_private.h" @@ -823,10 +824,11 @@ static int translate_table(struct ebt_re /* this will get free'd in do_replace()/ebt_register_table() if an error occurs */ newinfo->chainstack = (struct ebt_chainstack **) - vmalloc(num_possible_cpus() * sizeof(struct ebt_chainstack)); + vmalloc((highest_possible_processor_id()+1) + * sizeof(struct ebt_chainstack)); if (!newinfo->chainstack) return -ENOMEM; - for (i = 0; i < num_possible_cpus(); i++) { + for_each_cpu(i) { newinfo->chainstack[i] = vmalloc(udc_cnt * sizeof(struct ebt_chainstack)); if (!newinfo->chainstack[i]) { @@ -895,9 +897,12 @@ static void get_counters(struct ebt_coun /* counters of cpu 0 */ memcpy(counters, oldcounters, - sizeof(struct ebt_counter) * nentries); + sizeof(struct ebt_counter) * nentries); + /* add other counters to those of cpu 0 */ - for (cpu = 1; cpu < num_possible_cpus(); cpu++) { + for_each_cpu(cpu) { + if (cpu == 0) + continue; counter_base = COUNTER_BASE(oldcounters, nentries, cpu); for (i = 0; i < nentries; i++) { counters[i].pcnt += counter_base[i].pcnt; @@ -929,7 +934,8 @@ static int do_replace(void __user *user, BUGPRINT("Entries_size never zero\n"); return -EINVAL; } - countersize = COUNTER_OFFSET(tmp.nentries) * num_possible_cpus(); + countersize = COUNTER_OFFSET(tmp.nentries) * + (highest_possible_processor_id()+1); newinfo = (struct ebt_table_info *) vmalloc(sizeof(struct ebt_table_info) + countersize); if (!newinfo) @@ -1022,7 +1028,7 @@ static int do_replace(void __user *user, vfree(table->entries); if (table->chainstack) { - for (i = 0; i < num_possible_cpus(); i++) + for_each_cpu(i) vfree(table->chainstack[i]); vfree(table->chainstack); } @@ -1040,7 +1046,7 @@ free_counterstmp: vfree(counterstmp); /* can be initialized in translate_table() */ if (newinfo->chainstack) { - for (i = 0; i < num_possible_cpus(); i++) + for_each_cpu(i) vfree(newinfo->chainstack[i]); vfree(newinfo->chainstack); } @@ -1132,7 +1138,8 @@ int ebt_register_table(struct ebt_table return -EINVAL; } - countersize = COUNTER_OFFSET(table->table->nentries) * num_possible_cpus(); + countersize = COUNTER_OFFSET(table->table->nentries) * + (highest_possible_processor_id()+1); newinfo = (struct ebt_table_info *) vmalloc(sizeof(struct ebt_table_info) + countersize); ret = -ENOMEM; @@ -1186,7 +1193,7 @@ free_unlock: up(&ebt_mutex); free_chainstack: if (newinfo->chainstack) { - for (i = 0; i < num_possible_cpus(); i++) + for_each_cpu(i) vfree(newinfo->chainstack[i]); vfree(newinfo->chainstack); } @@ -1209,7 +1216,7 @@ void ebt_unregister_table(struct ebt_tab up(&ebt_mutex); vfree(table->private->entries); if (table->private->chainstack) { - for (i = 0; i < num_possible_cpus(); i++) + for_each_cpu(i) vfree(table->private->chainstack[i]); vfree(table->private->chainstack); } diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c index fa16342..a796928 100644 --- a/net/ipv4/netfilter/arp_tables.c +++ b/net/ipv4/netfilter/arp_tables.c @@ -716,8 +716,10 @@ static int translate_table(const char *n } /* And one copy for every other CPU */ - for (i = 1; i < num_possible_cpus(); i++) { - memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*i, + for_each_cpu(i) { + if (i == 0) + continue; + memcpy(newinfo->entries + SMP_ALIGN(newinfo->size) * i, newinfo->entries, SMP_ALIGN(newinfo->size)); } @@ -767,7 +769,7 @@ static void get_counters(const struct ar unsigned int cpu; unsigned int i; - for (cpu = 0; cpu < num_possible_cpus(); cpu++) { + for_each_cpu(cpu) { i = 0; ARPT_ENTRY_ITERATE(t->entries + TABLE_OFFSET(t, cpu), t->size, @@ -885,7 +887,8 @@ static int do_replace(void __user *user, return -ENOMEM; newinfo = vmalloc(sizeof(struct arpt_table_info) - + SMP_ALIGN(tmp.size) * num_possible_cpus()); + + SMP_ALIGN(tmp.size) * + (highest_possible_processor_id()+1)); if (!newinfo) return -ENOMEM; @@ -1158,7 +1161,8 @@ int arpt_register_table(struct arpt_tabl = { 0, 0, 0, { 0 }, { 0 }, { } }; newinfo = vmalloc(sizeof(struct arpt_table_info) - + SMP_ALIGN(repl->size) * num_possible_cpus()); + + SMP_ALIGN(repl->size) * + (highest_possible_processor_id()+1)); if (!newinfo) { ret = -ENOMEM; return ret; diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index eef99a1..75c27e9 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -27,6 +27,7 @@ #include <asm/semaphore.h> #include <linux/proc_fs.h> #include <linux/err.h> +#include <linux/cpumask.h> #include <linux/netfilter_ipv4/ip_tables.h> @@ -921,8 +922,10 @@ translate_table(const char *name, } /* And one copy for every other CPU */ - for (i = 1; i < num_possible_cpus(); i++) { - memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*i, + for_each_cpu(i) { + if (i == 0) + continue; + memcpy(newinfo->entries + SMP_ALIGN(newinfo->size) * i, newinfo->entries, SMP_ALIGN(newinfo->size)); } @@ -943,7 +946,7 @@ replace_table(struct ipt_table *table, struct ipt_entry *table_base; unsigned int i; - for (i = 0; i < num_possible_cpus(); i++) { + for_each_cpu(i) { table_base = (void *)newinfo->entries + TABLE_OFFSET(newinfo, i); @@ -990,7 +993,7 @@ get_counters(const struct ipt_table_info unsigned int cpu; unsigned int i; - for (cpu = 0; cpu < num_possible_cpus(); cpu++) { + for_each_cpu(cpu) { i = 0; IPT_ENTRY_ITERATE(t->entries + TABLE_OFFSET(t, cpu), t->size, @@ -1128,7 +1131,8 @@ do_replace(void __user *user, unsigned i return -ENOMEM; newinfo = vmalloc(sizeof(struct ipt_table_info) - + SMP_ALIGN(tmp.size) * num_possible_cpus()); + + SMP_ALIGN(tmp.size) * + (highest_possible_processor_id()+1)); if (!newinfo) return -ENOMEM; @@ -1458,7 +1462,8 @@ int ipt_register_table(struct ipt_table = { 0, 0, 0, { 0 }, { 0 }, { } }; newinfo = vmalloc(sizeof(struct ipt_table_info) - + SMP_ALIGN(repl->size) * num_possible_cpus()); + + SMP_ALIGN(repl->size) * + (highest_possible_processor_id()+1)); if (!newinfo) return -ENOMEM; diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c index 2da514b..b03e906 100644 --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -28,6 +28,7 @@ #include <asm/uaccess.h> #include <asm/semaphore.h> #include <linux/proc_fs.h> +#include <linux/cpumask.h> #include <linux/netfilter_ipv6/ip6_tables.h> @@ -950,8 +951,10 @@ translate_table(const char *name, } /* And one copy for every other CPU */ - for (i = 1; i < num_possible_cpus(); i++) { - memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*i, + for_each_cpu(i) { + if (i == 0) + continue; + memcpy(newinfo->entries + SMP_ALIGN(newinfo->size) * i, newinfo->entries, SMP_ALIGN(newinfo->size)); } @@ -973,6 +976,7 @@ replace_table(struct ip6t_table *table, unsigned int i; for (i = 0; i < num_possible_cpus(); i++) { + for_each_cpu(i) { table_base = (void *)newinfo->entries + TABLE_OFFSET(newinfo, i); @@ -1019,7 +1023,7 @@ get_counters(const struct ip6t_table_inf unsigned int cpu; unsigned int i; - for (cpu = 0; cpu < num_possible_cpus(); cpu++) { + for_each_cpu(cpu) { i = 0; IP6T_ENTRY_ITERATE(t->entries + TABLE_OFFSET(t, cpu), t->size, @@ -1153,7 +1157,8 @@ do_replace(void __user *user, unsigned i return -ENOMEM; newinfo = vmalloc(sizeof(struct ip6t_table_info) - + SMP_ALIGN(tmp.size) * num_possible_cpus()); + + SMP_ALIGN(tmp.size) * + (highest_possible_processor_id()+1)); if (!newinfo) return -ENOMEM; @@ -1467,7 +1472,8 @@ int ip6t_register_table(struct ip6t_tabl = { 0, 0, 0, { 0 }, { 0 }, { } }; newinfo = vmalloc(sizeof(struct ip6t_table_info) - + SMP_ALIGN(repl->size) * num_possible_cpus()); + + SMP_ALIGN(repl->size) * + (highest_possible_processor_id()+1)); if (!newinfo) return -ENOMEM; ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-13 21:14 ` David S. Miller @ 2005-10-13 22:37 ` seb 2005-10-14 18:33 ` Harald Welte 1 sibling, 0 replies; 40+ messages in thread From: seb @ 2005-10-13 22:37 UTC (permalink / raw) To: netfilter-devel On Thu, Oct 13, 2005 at 02:14:08PM -0700, David S. Miller wrote: > > With these two fixes, the following is what I'm about to push > to Linus. > > Thanks everyone. > I'll be off this machine for the week-end (now in fact). I won't be able to test both patch until Sunday evening. I'll try to test it tomorrow morning (GMT+1). Seb ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-13 21:14 ` David S. Miller 2005-10-13 22:37 ` seb @ 2005-10-14 18:33 ` Harald Welte 1 sibling, 0 replies; 40+ messages in thread From: Harald Welte @ 2005-10-14 18:33 UTC (permalink / raw) To: David S. Miller; +Cc: netfilter-devel, bdschuym, herbert, seb [-- Attachment #1: Type: text/plain, Size: 748 bytes --] On Thu, Oct 13, 2005 at 02:14:08PM -0700, David S. Miller wrote: > With these two fixes, the following is what I'm about to push > to Linus. Ok, thanks Dave. There was little chance for me to respond quickly, since Linux Kongress had really limited network connectivity, and it kept breaking down, preventing me from accessing my emails :( Enjoy your holidays, -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-10 16:41 [PATCH] fix iptables on systems with discontiguous processor ids Harald Welte 2005-10-10 21:15 ` David S. Miller @ 2005-10-10 21:48 ` Sébastien Bernard 2005-10-11 14:23 ` Harald Welte 1 sibling, 1 reply; 40+ messages in thread From: Sébastien Bernard @ 2005-10-10 21:48 UTC (permalink / raw) To: Harald Welte, netfilter-devel Harald Welte a écrit : >Hi Dave! > >This is my proposed patch for the problem you've described. Please test >and submit. If it works, I'll also prepare a patch for {arp,ip6}_tables. > > >[NETFILTER] ip_tables: fix per cpu handling > >As David Miller points out, the "smp_processor_id()'s" are not always >contiguous. Esp. some boxes like Sun Ultra60 have CPU 0 and 2 installed in >one system, but no "1". Therefore the current logic of how iptables >manages the per cpu copies of the ruleset is broken. This patch is >supposed to fix it. > >Signed-off-by: Harald Welte <laforge@netfilter.org> > >--- >commit e759eaa9e9e92330c5fcfd760d767d4f39375a03 >tree 63b96f7df57f51dc7e284969c3a08b9264cc2c5f >parent 1ab8dccdbf16c09f8da124fc4c82024de24dfae2 >author Harald Welte <laforge@netfilter.org> Mon, 10 Oct 2005 18:29:24 +0200 >committer Harald Welte <laforge@netfilter.org> Mon, 10 Oct 2005 18:29:24 +0200 > >[snip] > > I tested this patch on my machine. It ooops instantly in get_counters() from ip_tables.c. Seb ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-10 21:48 ` Sébastien Bernard @ 2005-10-11 14:23 ` Harald Welte 2005-10-11 20:59 ` Sébastien Bernard 0 siblings, 1 reply; 40+ messages in thread From: Harald Welte @ 2005-10-11 14:23 UTC (permalink / raw) To: Sébastien Bernard; +Cc: netfilter-devel [-- Attachment #1: Type: text/plain, Size: 1091 bytes --] On Mon, Oct 10, 2005 at 11:48:16PM +0200, Sébastien Bernard wrote: > >Signed-off-by: Harald Welte <laforge@netfilter.org> > >--- > >commit e759eaa9e9e92330c5fcfd760d767d4f39375a03 > >tree 63b96f7df57f51dc7e284969c3a08b9264cc2c5f > >parent 1ab8dccdbf16c09f8da124fc4c82024de24dfae2 > >author Harald Welte <laforge@netfilter.org> Mon, 10 Oct 2005 18:29:24 +0200 > >committer Harald Welte <laforge@netfilter.org> Mon, 10 Oct 2005 18:29:24 +0200 > >[snip] > > > I tested this patch on my machine. > It ooops instantly in get_counters() from ip_tables.c. Mh, can you capture the oops and send it? I'm travelling and don't have access to any SMP box, so it's unlikely I will be able to resolve this alone. -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-11 14:23 ` Harald Welte @ 2005-10-11 20:59 ` Sébastien Bernard 2005-10-11 21:33 ` David S. Miller 2005-10-12 6:40 ` Harald Welte 0 siblings, 2 replies; 40+ messages in thread From: Sébastien Bernard @ 2005-10-11 20:59 UTC (permalink / raw) To: Harald Welte; +Cc: netfilter-devel Harald Welte a écrit : >On Mon, Oct 10, 2005 at 11:48:16PM +0200, Sébastien Bernard wrote: > > >>>[snip] >>> >>> >>I tested this patch on my machine. >>It ooops instantly in get_counters() from ip_tables.c. >> >> > >Mh, can you capture the oops and send it? I'm travelling and don't have >access to any SMP box, so it's unlikely I will be able to resolve this >alone. > > What information do you need exactly ? The oops can be tiresome to write down. Is the stackdump enough or do you need register info ? Seb ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-11 20:59 ` Sébastien Bernard @ 2005-10-11 21:33 ` David S. Miller 2005-10-12 6:40 ` Harald Welte 1 sibling, 0 replies; 40+ messages in thread From: David S. Miller @ 2005-10-11 21:33 UTC (permalink / raw) To: seb; +Cc: laforge, netfilter-devel From: Sébastien Bernard <seb@frankengul.org> Date: Tue, 11 Oct 2005 22:59:20 +0200 > What information do you need exactly ? > The oops can be tiresome to write down. > Is the stackdump enough or do you need register info ? The "TPC" should be translated into a specific symbol in the debugging output, as should the backtrace call trace. I guess those symbols ought to be enough for now, and if we need more info we can ask for it specifically later after going over the log you give us. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH] fix iptables on systems with discontiguous processor ids 2005-10-11 20:59 ` Sébastien Bernard 2005-10-11 21:33 ` David S. Miller @ 2005-10-12 6:40 ` Harald Welte 1 sibling, 0 replies; 40+ messages in thread From: Harald Welte @ 2005-10-12 6:40 UTC (permalink / raw) To: Sébastien Bernard; +Cc: netfilter-devel [-- Attachment #1: Type: text/plain, Size: 1092 bytes --] On Tue, Oct 11, 2005 at 10:59:20PM +0200, Sébastien Bernard wrote: > Harald Welte a écrit : > > >On Mon, Oct 10, 2005 at 11:48:16PM +0200, Sébastien Bernard wrote: > > > >>>[snip] > >>> > >>I tested this patch on my machine. > >>It ooops instantly in get_counters() from ip_tables.c. > >> > >Mh, can you capture the oops and send it? I'm travelling and don't have > >access to any SMP box, so it's unlikely I will be able to resolve this > >alone. > > > What information do you need exactly ? > The oops can be tiresome to write down. please use a serial console or take a photograph with a digital camera. ;) Pleaes also send me the corresponding ip_tables.ko object file. Thanks -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2005-10-17 4:18 UTC | newest] Thread overview: 40+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-10-10 16:41 [PATCH] fix iptables on systems with discontiguous processor ids Harald Welte 2005-10-10 21:15 ` David S. Miller 2005-10-10 21:41 ` Patrick McHardy 2005-10-10 21:52 ` Patrick McHardy 2005-10-10 22:06 ` David S. Miller 2005-10-11 14:23 ` Harald Welte 2005-10-11 19:39 ` David S. Miller 2005-10-12 6:36 ` Harald Welte 2005-10-10 22:04 ` David S. Miller 2005-10-10 21:46 ` Harald Welte 2005-10-11 13:54 ` Henrik Nordstrom 2005-10-11 14:13 ` Eric Dumazet 2005-10-11 17:45 ` Harald Welte 2005-10-11 22:57 ` David S. Miller 2005-10-12 7:00 ` Harald Welte 2005-10-13 22:01 ` Eric Dumazet 2005-10-17 4:18 ` David S. Miller 2005-10-11 10:44 ` Harald Welte 2005-10-11 17:15 ` Bart De Schuymer 2005-10-11 17:55 ` Harald Welte 2005-10-11 17:54 ` Harald Welte 2005-10-11 21:54 ` Sébastien Bernard 2005-10-11 22:32 ` David S. Miller 2005-10-12 6:22 ` seb 2005-10-12 6:31 ` David S. Miller 2005-10-12 7:04 ` Sébastien Bernard 2005-10-12 7:34 ` David S. Miller 2005-10-12 6:43 ` Harald Welte 2005-10-12 22:54 ` David S. Miller 2005-10-13 7:18 ` seb 2005-10-13 19:11 ` David S. Miller 2005-10-13 8:46 ` Herbert Xu 2005-10-13 21:14 ` David S. Miller 2005-10-13 22:37 ` seb 2005-10-14 18:33 ` Harald Welte 2005-10-10 21:48 ` Sébastien Bernard 2005-10-11 14:23 ` Harald Welte 2005-10-11 20:59 ` Sébastien Bernard 2005-10-11 21:33 ` David S. Miller 2005-10-12 6:40 ` Harald Welte
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.