* limited number if iptable rules on 64bit hosts
@ 2005-02-02 13:38 Olaf Hering
2005-02-02 22:25 ` Olaf Hering
0 siblings, 1 reply; 10+ messages in thread
From: Olaf Hering @ 2005-02-02 13:38 UTC (permalink / raw)
To: netdev
What buffer or sysctl value has to change to allow more than 3445 rules
like this (on a 64bit host with 64bit iptables)?
iptables -A FORWARD -j ACCEPT
setsockopt(3, SOL_IP, 0x40 /* IP_??? */,
"filter\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 524368) =
-1 ENOMEM (Cannot allocate memory)
I see this with 2.6.5 and 2.6.11.
* Re: limited number if iptable rules on 64bit hosts
2005-02-02 13:38 limited number if iptable rules on 64bit hosts Olaf Hering
@ 2005-02-02 22:25 ` Olaf Hering
2005-02-02 22:38 ` Bill Rugolsky Jr.
0 siblings, 1 reply; 10+ messages in thread
From: Olaf Hering @ 2005-02-02 22:25 UTC (permalink / raw)
To: netdev
On Wed, Feb 02, Olaf Hering wrote:
>
> What buffer or sysctl value has to change to allow more than 3445 rules
> like this (on a 64bit host with 64bit iptables)?
>
> iptables -A FORWARD -j ACCEPT
>
> setsockopt(3, SOL_IP, 0x40 /* IP_??? */,
> "filter\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 524368) =
> -1 ENOMEM (Cannot allocate memory)
it triggers the first -ENOMEM in
net/ipv4/netfilter/ip_tables.c:do_replace
sizeof(struct ipt_table_info)+SMP_ALIGN(tmp.size)*NR_CPUS == 67108992 bytes
128+524288*128==67108992
(sizeof(struct ipt_table_info) + (((tmp.size) + (1 << 7)-1) & ~((1 << 7)-1)) * 128)
hmm, no braces missing.
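The size computation above can be reproduced in user space. This is only a minimal sketch, not kernel code: the function names smp_align and table_alloc_size are made up for illustration, and the 128-byte alignment is taken from the (1 << 7) in the expanded expression.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's SMP_ALIGN(): round x up to the
 * next multiple of align, which must be a power of two. */
size_t smp_align(size_t x, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Bytes requested for the whole table: one header plus one aligned copy
 * of the rule blob for every possible CPU. The alignment of 128 is an
 * assumption taken from the expression quoted in the message. */
size_t table_alloc_size(size_t header, size_t blob, size_t nr_cpus)
{
    return header + smp_align(blob, 128) * nr_cpus;
}
```

With a 128-byte header, a 524288-byte aligned blob and 128 CPUs, table_alloc_size() returns 67108992, the figure quoted above.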
* Re: limited number if iptable rules on 64bit hosts
2005-02-02 22:25 ` Olaf Hering
@ 2005-02-02 22:38 ` Bill Rugolsky Jr.
2005-02-02 22:52 ` Olaf Hering
0 siblings, 1 reply; 10+ messages in thread
From: Bill Rugolsky Jr. @ 2005-02-02 22:38 UTC (permalink / raw)
To: Olaf Hering; +Cc: netdev
On Wed, Feb 02, 2005 at 11:25:16PM +0100, Olaf Hering wrote:
> it triggers the first -ENOMEM in
> net/ipv4/netfilter/ip_tables.c:do_replace
>
> sizeof(struct ipt_table_info)+SMP_ALIGN(tmp.size)*NR_CPUS == 67108992 bytes
>
> 128+524288*128==67108992
>
> (sizeof(struct ipt_table_info) + (((tmp.size) + (1 << 7)-1) & ~((1 << 7)-1)) * 128)
>
> hmm, no braces missing.
I don't have time to look now [I'm running for the door],
but that's possibly the vmalloc() limit of 64M (67108864) ?
Regards,
Bill Rugolsky
* Re: limited number if iptable rules on 64bit hosts
2005-02-02 22:38 ` Bill Rugolsky Jr.
@ 2005-02-02 22:52 ` Olaf Hering
2005-02-03 11:19 ` Olaf Kirch
0 siblings, 1 reply; 10+ messages in thread
From: Olaf Hering @ 2005-02-02 22:52 UTC (permalink / raw)
To: Bill Rugolsky Jr.; +Cc: netdev
On Wed, Feb 02, Bill Rugolsky Jr. wrote:
> I don't have time to look now [I'm running for the door],
> but that's possibly the vmalloc() limit of 64M (67108864) ?
maybe.
->size is a user-provided value; I haven't looked closely at the iptables
source. It seems we have to live with this limitation.
* Re: limited number if iptable rules on 64bit hosts
2005-02-02 22:52 ` Olaf Hering
@ 2005-02-03 11:19 ` Olaf Kirch
2005-02-03 18:48 ` David S. Miller
0 siblings, 1 reply; 10+ messages in thread
From: Olaf Kirch @ 2005-02-03 11:19 UTC (permalink / raw)
To: Olaf Hering; +Cc: Bill Rugolsky Jr., netdev
On Wed, Feb 02, 2005 at 11:52:58PM +0100, Olaf Hering wrote:
> > I don't have time to look now [I'm running for the door],
> > but that's possibly the vmalloc() limit of 64M (67108864) ?
>
> maybe.
> ->size is a user-provided value; I haven't looked closely at the iptables
> source. It seems we have to live with this limitation.
The problem is two-fold. netfilter tries to allocate some data
per-CPU and does
vmalloc(sizeof(struct ipt_table_info)
+ SMP_ALIGN(tmp.size) * NR_CPUS);
At 3445 rules, tmp.size is 524272 (why does it want that much memory? I
would expect the only data that's per-CPU is the packet and byte
counters).
In some of our kernel configurations, NR_CPUS is 128 or even more,
and we run into a vmalloc limit here.
vmalloc wants to kmalloc an array of struct page pointers, and with that
array capped at 131072 bytes, a 64bit platform is limited to
131072 / 8 = 16384 pages, or 67108864 bytes. In the example Olaf H posted,
we fail at 128 + 524288 * 128 = 67108992 bytes (SMP_ALIGN rounds 524272 up
to 524288), i.e. 16385 pages.
So I guess it all boils down to why netfilter needs 150-odd bytes
per rule and CPU.
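The page-count limit described above can be checked with a few lines of user-space C. This is a rough sketch under stated assumptions, not the kernel's logic: the function names are hypothetical, the 4096-byte page size is inferred from 16384 pages equalling 67108864 bytes, and the 131072-byte cap and 8-byte pointer size are the figures from this message.

```c
#include <assert.h>
#include <stddef.h>

/* Pages needed to back an allocation of `bytes` bytes (rounded up). */
size_t pages_needed(size_t bytes, size_t page_size)
{
    assert(page_size != 0);
    return (bytes + page_size - 1) / page_size;
}

/* Largest page count whose pointer array still fits into one array
 * allocation of at most max_array_bytes. */
size_t max_pages(size_t max_array_bytes, size_t ptr_size)
{
    assert(ptr_size != 0);
    return max_array_bytes / ptr_size;
}

/* Returns 1 if the request fits under the limit, 0 otherwise. */
int request_fits(size_t bytes, size_t page_size,
                 size_t max_array_bytes, size_t ptr_size)
{
    return pages_needed(bytes, page_size)
        <= max_pages(max_array_bytes, ptr_size);
}
```

Under those assumptions a 67108992-byte request needs 16385 pages against a ceiling of 16384, so it is exactly one page too large, while 67108864 bytes would still fit.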
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@suse.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
* Re: limited number if iptable rules on 64bit hosts
2005-02-03 11:19 ` Olaf Kirch
@ 2005-02-03 18:48 ` David S. Miller
2005-02-03 18:59 ` Olaf Hering
0 siblings, 1 reply; 10+ messages in thread
From: David S. Miller @ 2005-02-03 18:48 UTC (permalink / raw)
To: Olaf Kirch; +Cc: olh, brugolsky, netdev
On Thu, 3 Feb 2005 12:19:39 +0100
Olaf Kirch <okir@suse.de> wrote:
> At 3445 rules, tmp.size is 524272 (why does it want that much memory? I
> would expect the only data that's per-CPU is the packet and byte
> counters).
The rule itself is replicated per-cpu as well to keep L2 cache
accesses local per cpu on SMP systems.
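As a rough illustration of what per-cpu replication means for memory use, the following user-space sketch lays out one aligned copy of a rule blob per CPU inside a single buffer. It is an assumption-laden model, not the netfilter code: align_up, copy_offset and replicate are hypothetical names, calloc stands in for vmalloc, and the 128-byte alignment is taken from the figures earlier in the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ALIGN_BYTES 128u  /* assumed alignment, from (1 << 7) */

/* Round x up to the next multiple of ALIGN_BYTES. */
size_t align_up(size_t x)
{
    return (x + ALIGN_BYTES - 1) & ~((size_t)ALIGN_BYTES - 1);
}

/* Byte offset of the copy that belongs to `cpu`. */
size_t copy_offset(size_t blob_size, unsigned int cpu)
{
    return align_up(blob_size) * cpu;
}

/* Allocate nr_cpus aligned slots in one buffer and copy the blob into
 * each slot. Returns NULL on failure; the caller frees the result. */
unsigned char *replicate(const void *blob, size_t blob_size,
                         unsigned int nr_cpus)
{
    unsigned char *all;
    unsigned int cpu;

    assert(blob != NULL && blob_size > 0 && nr_cpus > 0);
    all = calloc(nr_cpus, align_up(blob_size));
    if (!all)
        return NULL;
    for (cpu = 0; cpu < nr_cpus; cpu++)
        memcpy(all + copy_offset(blob_size, cpu), blob, blob_size);
    return all;
}
```

The total size grows linearly with the number of CPUs, which is why a blob of roughly half a megabyte becomes a 64M request at 128 CPUs.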
* Re: limited number if iptable rules on 64bit hosts
2005-02-03 18:48 ` David S. Miller
@ 2005-02-03 18:59 ` Olaf Hering
2005-02-03 19:00 ` David S. Miller
0 siblings, 1 reply; 10+ messages in thread
From: Olaf Hering @ 2005-02-03 18:59 UTC (permalink / raw)
To: David S. Miller; +Cc: Olaf Kirch, brugolsky, netdev
On Thu, Feb 03, David S. Miller wrote:
> On Thu, 3 Feb 2005 12:19:39 +0100
> Olaf Kirch <okir@suse.de> wrote:
>
> > At 3445 rules, tmp.size is 524272 (why does it want that much memory? I
> > would expect the only data that's per-CPU is the packet and byte
> > counters).
>
> The rule itself is replicated per-cpu as well to keep L2 cache
> accesses local per cpu on SMP systems.
Andy made this change, which helped on a dual box.
diff -u linux-2.6.5/net/ipv4/netfilter/ip_tables.c-o linux-2.6.5/net/ipv4/netfilter/ip_tables.c
--- linux-2.6.5/net/ipv4/netfilter/ip_tables.c-o 2005-02-03 08:06:33.000000000 +0100
+++ linux-2.6.5/net/ipv4/netfilter/ip_tables.c 2005-02-03 13:06:32.163182472 +0100
@@ -29,6 +29,12 @@
#include <linux/netfilter_ipv4/ip_tables.h>
+#ifdef CONFIG_HOTPLUG_CPU
+#define NF_NR_CPUS NR_CPUS
+#else
+#define NF_NR_CPUS num_online_cpus()
+#endif
+
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Netfilter Core Team <coreteam@netfilter.org>");
MODULE_DESCRIPTION("IPv4 packet filter");
@@ -860,7 +866,7 @@
}
/* And one copy for every other CPU */
- for (i = 1; i < NR_CPUS; i++) {
+ for (i = 1; i < NF_NR_CPUS; i++) {
memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*i,
newinfo->entries,
SMP_ALIGN(newinfo->size));
@@ -882,7 +888,7 @@
struct ipt_entry *table_base;
unsigned int i;
- for (i = 0; i < NR_CPUS; i++) {
+ for (i = 0; i < NF_NR_CPUS; i++) {
table_base =
(void *)newinfo->entries
+ TABLE_OFFSET(newinfo, i);
@@ -929,7 +935,7 @@
unsigned int cpu;
unsigned int i;
- for (cpu = 0; cpu < NR_CPUS; cpu++) {
+ for (cpu = 0; cpu < NF_NR_CPUS; cpu++) {
i = 0;
IPT_ENTRY_ITERATE(t->entries + TABLE_OFFSET(t, cpu),
t->size,
@@ -1067,7 +1073,7 @@
return -ENOMEM;
newinfo = vmalloc(sizeof(struct ipt_table_info)
- + SMP_ALIGN(tmp.size) * NR_CPUS);
+ + SMP_ALIGN(tmp.size) * NF_NR_CPUS);
if (!newinfo)
return -ENOMEM;
@@ -1380,7 +1386,7 @@
= { 0, 0, 0, { 0 }, { 0 }, { } };
newinfo = vmalloc(sizeof(struct ipt_table_info)
- + SMP_ALIGN(table->table->size) * NR_CPUS);
+ + SMP_ALIGN(table->table->size) * NF_NR_CPUS);
if (!newinfo)
return -ENOMEM;
diff -u linux-2.6.5/mm/vmalloc.c-o linux-2.6.5/mm/vmalloc.c
--- linux-2.6.5/mm/vmalloc.c-o 2005-02-03 08:06:50.000000000 +0100
+++ linux-2.6.5/mm/vmalloc.c 2005-02-03 13:07:44.162236952 +0100
@@ -310,7 +310,10 @@
__free_page(area->pages[i]);
}
- kfree(area->pages);
+ if (area->nr_pages * sizeof(struct page *) >= 4*PAGE_SIZE)
+ vfree(area->pages);
+ else
+ kfree(area->pages);
}
kfree(area);
@@ -414,7 +417,11 @@
array_size = (nr_pages * sizeof(struct page *));
area->nr_pages = nr_pages;
- area->pages = pages = kmalloc(array_size, (gfp_mask & ~__GFP_HIGHMEM));
+
+ if (array_size >= 4*PAGE_SIZE)
+ area->pages = pages = __vmalloc(array_size, (gfp_mask & ~__GFP_HIGHMEM), PAGE_KERNEL);
+ else
+ area->pages = pages = kmalloc(array_size, (gfp_mask & ~__GFP_HIGHMEM));
if (!area->pages) {
remove_vm_area(area->addr);
kfree(area);
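The mm/vmalloc.c part of the patch only works if the allocation path and the free path make the same decision about where the page-pointer array lives. A minimal sketch of that idea, with the decision factored into one helper, could look like this. The enum and function name are hypothetical and nothing here is kernel code; only the threshold of 4*PAGE_SIZE is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the two allocators used in the patch. */
enum backing { BACKING_KMALLOC, BACKING_VMALLOC };

/* Decide where the page-pointer array should live. Both the allocation
 * and the free path must call this with the same arguments, otherwise
 * the array would be released with the wrong function. */
enum backing pick_backing(size_t nr_pages, size_t ptr_size,
                          size_t page_size)
{
    size_t array_size = nr_pages * ptr_size;

    assert(ptr_size != 0 && page_size != 0);
    return array_size >= 4 * page_size ? BACKING_VMALLOC
                                       : BACKING_KMALLOC;
}
```

Assuming 4096-byte pages and 8-byte pointers, the switch to the vmalloc-backed array happens at 2048 pages, i.e. for allocations of 8M and larger.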
* Re: limited number if iptable rules on 64bit hosts
2005-02-03 18:59 ` Olaf Hering
@ 2005-02-03 19:00 ` David S. Miller
2005-02-03 19:33 ` Bart De Schuymer
2005-02-03 21:35 ` Bill Rugolsky Jr.
0 siblings, 2 replies; 10+ messages in thread
From: David S. Miller @ 2005-02-03 19:00 UTC (permalink / raw)
To: Olaf Hering; +Cc: okir, netdev, netfilter-devel, brugolsky
On Thu, 3 Feb 2005 19:59:28 +0100
Olaf Hering <olh@suse.de> wrote:
> On Thu, Feb 03, David S. Miller wrote:
>
> > The rule itself is replicated per-cpu as well to keep L2 cache
> > accesses local per cpu on SMP systems.
>
> Andy made this change, which helped on a dual box.
It might not help for Olaf's 128 cpu box though :-)
I think we should reconsider the idea of replicating the rule itself per-cpu.
Also, this thread should have begun with netfilter-devel at least on
the CC:, added.
* Re: limited number if iptable rules on 64bit hosts
2005-02-03 19:00 ` David S. Miller
@ 2005-02-03 19:33 ` Bart De Schuymer
2005-02-03 21:35 ` Bill Rugolsky Jr.
1 sibling, 0 replies; 10+ messages in thread
From: Bart De Schuymer @ 2005-02-03 19:33 UTC (permalink / raw)
To: David S. Miller; +Cc: okir, netdev, netfilter-devel, brugolsky, Olaf Hering
On Thu, 03-02-2005 at 11:00 -0800, David S. Miller wrote:
> On Thu, 3 Feb 2005 19:59:28 +0100
> Olaf Hering <olh@suse.de> wrote:
>
> > On Thu, Feb 03, David S. Miller wrote:
> >
> > > The rule itself is replicated per-cpu as well to keep L2 cache
> > > accesses local per cpu on SMP systems.
> >
> > Andy made this change, which helped on a dual box.
>
> It might not help for Olaf's 128 cpu box though :-)
>
> I think we should reconsider the idea of replicating the rule itself per-cpu.
> Also, this thread should have begun with netfilter-devel at least on
> the CC:, added.
Note that ebtables only has per-cpu counters.
I wonder what limits are seen on such systems for ebtables.
cheers,
Bart
* Re: limited number if iptable rules on 64bit hosts
2005-02-03 19:00 ` David S. Miller
2005-02-03 19:33 ` Bart De Schuymer
@ 2005-02-03 21:35 ` Bill Rugolsky Jr.
1 sibling, 0 replies; 10+ messages in thread
From: Bill Rugolsky Jr. @ 2005-02-03 21:35 UTC (permalink / raw)
To: David S. Miller; +Cc: okir, netdev, netfilter-devel, Olaf Hering
On Thu, Feb 03, 2005 at 11:00:49AM -0800, David S. Miller wrote:
> It might not help for Olaf's 128 cpu box though :-)
>
> I think we should reconsider the idea of replicating the rule itself per-cpu.
> Also, this thread should have begun with netfilter-devel at least on
> the CC:, added.
As Olaf Kirch pointed out, an entry is about 150 bytes, while the counters
are two 64-bit ints, and it looks like 'unsigned int comefrom' is set as
the chains are traversed [net/ipv4/netfilter/ip_tables.c]:
/* Save old back ptr in next entry */
struct ipt_entry *next
= (void *)e + e->next_offset;
next->comefrom
= (void *)back - table_base;
/* set back pointer to next entry */
back = next;
That's 20-24 bytes of state per-entry per-cpu, for a factor of 6-7 savings,
at the expense of hairing up the code slightly to do parallel indexed
access, Fortran style.
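The saving can be estimated with a small model. This is a sketch under assumptions, not a proposed layout: struct percpu_entry_state and both function names are hypothetical, and the struct simply holds the two 64-bit counters plus comefrom mentioned above (20 bytes of payload, 24 with padding on a typical 64-bit ABI).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-entry, per-CPU state: two counters plus comefrom. */
struct percpu_entry_state {
    uint64_t packets;
    uint64_t bytes;
    unsigned int comefrom;
};

/* Current scheme: every CPU gets a full copy of every entry. */
size_t replicated_bytes(size_t entries, size_t entry_size, size_t nr_cpus)
{
    assert(nr_cpus > 0);
    return entries * entry_size * nr_cpus;
}

/* Split scheme: one shared copy of the entries, plus a parallel array
 * of per-CPU state indexed by entry number. */
size_t split_bytes(size_t entries, size_t entry_size, size_t nr_cpus)
{
    assert(nr_cpus > 0);
    return entries * entry_size
         + entries * sizeof(struct percpu_entry_state) * nr_cpus;
}
```

With entries of about 150 bytes and 128 CPUs, each entry costs 150 * 128 = 19200 bytes when replicated, against 150 + 24 * 128 = 3222 bytes when split, which is roughly the factor of six estimated above.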
If I am understanding the mm code correctly, a single vmalloc() allocation
is currently limited to 64M on a 64-bit platform, but the VMALLOC address
range is much greater, so one might also prefer to do a kmalloc()/vmalloc()
per CPU, perhaps by creating {vmalloc,vfree}_percpu() and using the
percpu interfaces.
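The per-CPU allocation idea can be modelled in user space as one buffer per CPU instead of one large buffer. The sketch below is only an illustration: alloc_percpu_copies and free_percpu_copies are made-up names (the {vmalloc,vfree}_percpu() helpers suggested above do not exist in the thread's kernels), and malloc stands in for kmalloc()/vmalloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocate one buffer per CPU and copy the rule blob into each.
 * Returns an array of nr_cpus pointers, or NULL on failure. */
void **alloc_percpu_copies(const void *blob, size_t blob_size,
                           unsigned int nr_cpus)
{
    void **copies;
    unsigned int cpu;

    assert(blob != NULL && blob_size > 0 && nr_cpus > 0);
    copies = calloc(nr_cpus, sizeof(*copies));
    if (!copies)
        return NULL;
    for (cpu = 0; cpu < nr_cpus; cpu++) {
        copies[cpu] = malloc(blob_size);
        if (!copies[cpu]) {
            /* Undo the partial allocation before reporting failure. */
            while (cpu-- > 0)
                free(copies[cpu]);
            free(copies);
            return NULL;
        }
        memcpy(copies[cpu], blob, blob_size);
    }
    return copies;
}

/* Release everything obtained from alloc_percpu_copies(). */
void free_percpu_copies(void **copies, unsigned int nr_cpus)
{
    unsigned int cpu;

    if (!copies)
        return;
    for (cpu = 0; cpu < nr_cpus; cpu++)
        free(copies[cpu]);
    free(copies);
}
```

Each individual request is then only as large as one copy of the table, so the single-allocation limit applies per CPU rather than to the sum over all CPUs.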
Bill Rugolsky