netdev.vger.kernel.org archive mirror
* limited number if iptable rules on 64bit hosts
@ 2005-02-02 13:38 Olaf Hering
  2005-02-02 22:25 ` Olaf Hering
  0 siblings, 1 reply; 10+ messages in thread
From: Olaf Hering @ 2005-02-02 13:38 UTC
  To: netdev


What buffer or sysctl value has to change to allow more than 3445 rules
like this (on a 64bit host with 64bit iptables)?

iptables -A FORWARD -j ACCEPT

setsockopt(3, SOL_IP, 0x40 /* IP_??? */,
"filter\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 524368) =
-1 ENOMEM (Cannot allocate memory)

I see this with 2.6.5 and 2.6.11.


* Re: limited number if iptable rules on 64bit hosts
  2005-02-02 13:38 limited number if iptable rules on 64bit hosts Olaf Hering
@ 2005-02-02 22:25 ` Olaf Hering
  2005-02-02 22:38   ` Bill Rugolsky Jr.
  0 siblings, 1 reply; 10+ messages in thread
From: Olaf Hering @ 2005-02-02 22:25 UTC
  To: netdev

 On Wed, Feb 02, Olaf Hering wrote:

> 
> What buffer or sysctl value has to change to allow more than 3445 rules
> like this (on a 64bit host with 64bit iptables)?
> 
> iptables -A FORWARD -j ACCEPT
> 
> setsockopt(3, SOL_IP, 0x40 /* IP_??? */,
> "filter\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 524368) =
> -1 ENOMEM (Cannot allocate memory)

it triggers the first -ENOMEM in
net/ipv4/netfilter/ip_tables.c:do_replace

sizeof(struct ipt_table_info)+SMP_ALIGN(tmp.size)*NR_CPUS == 67108992 bytes

128+524288*128==67108992

(sizeof(struct ipt_table_info) + (((tmp.size) + (1 << 7)-1) & ~((1 << 7)-1)) * 128)

hmm, no braces missing.
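
The arithmetic above can be checked with a small userspace sketch. The
constants are assumptions read off the numbers in this message (a 128-byte
cache line, NR_CPUS of 128, and a 128-byte struct ipt_table_info), not
values taken from kernel headers, and the function names are illustrative
only.

```c
/* Assumed values, read off the numbers quoted in this thread. */
#define CACHE_BYTES_ASSUMED     128UL  /* (1 << 7) in the expanded macro */
#define NR_CPUS_ASSUMED         128UL  /* build-time CPU limit */
#define TABLE_INFO_SIZE_ASSUMED 128UL  /* sizeof(struct ipt_table_info) */

/* Round x up to the next multiple of the assumed cache line size. */
static unsigned long smp_align(unsigned long x)
{
	return (x + CACHE_BYTES_ASSUMED - 1) & ~(CACHE_BYTES_ASSUMED - 1);
}

/* Bytes requested from vmalloc() for a table of table_size bytes:
 * one header plus one cache-aligned copy of the table per CPU. */
static unsigned long replace_alloc_size(unsigned long table_size)
{
	return TABLE_INFO_SIZE_ASSUMED
		+ smp_align(table_size) * NR_CPUS_ASSUMED;
}
```

Under these assumptions, replace_alloc_size(524288) gives 67108992, which
is 128 bytes more than 64M.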


* Re: limited number if iptable rules on 64bit hosts
  2005-02-02 22:25 ` Olaf Hering
@ 2005-02-02 22:38   ` Bill Rugolsky Jr.
  2005-02-02 22:52     ` Olaf Hering
  0 siblings, 1 reply; 10+ messages in thread
From: Bill Rugolsky Jr. @ 2005-02-02 22:38 UTC
  To: Olaf Hering; +Cc: netdev

On Wed, Feb 02, 2005 at 11:25:16PM +0100, Olaf Hering wrote:
> it triggers the first -ENOMEM in
> net/ipv4/netfilter/ip_tables.c:do_replace
> 
> sizeof(struct ipt_table_info)+SMP_ALIGN(tmp.size)*NR_CPUS == 67108992 bytes
> 
> 128+524288*128==67108992
> 
> (sizeof(struct ipt_table_info) + (((tmp.size) + (1 << 7)-1) & ~((1 << 7)-1)) * 128)
> 
> hmm, no braces missing.

I don't have time to look now [I'm running for the door],
but that's possibly the vmalloc() limit of 64M (67108864) ?

Regards,

	Bill Rugolsky


* Re: limited number if iptable rules on 64bit hosts
  2005-02-02 22:38   ` Bill Rugolsky Jr.
@ 2005-02-02 22:52     ` Olaf Hering
  2005-02-03 11:19       ` Olaf Kirch
  0 siblings, 1 reply; 10+ messages in thread
From: Olaf Hering @ 2005-02-02 22:52 UTC
  To: Bill Rugolsky Jr.; +Cc: netdev

 On Wed, Feb 02, Bill Rugolsky Jr. wrote:

> I don't have time to look now [I'm running for the door],
> but that's possibly the vmalloc() limit of 64M (67108864) ?

Maybe.
->size is a user-provided value; I haven't looked closely at the iptables
source. It seems we have to live with this limitation.


* Re: limited number if iptable rules on 64bit hosts
  2005-02-02 22:52     ` Olaf Hering
@ 2005-02-03 11:19       ` Olaf Kirch
  2005-02-03 18:48         ` David S. Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Olaf Kirch @ 2005-02-03 11:19 UTC
  To: Olaf Hering; +Cc: Bill Rugolsky Jr., netdev

On Wed, Feb 02, 2005 at 11:52:58PM +0100, Olaf Hering wrote:
> > I don't have time to look now [I'm running for the door],
> > but that's possibly the vmalloc() limit of 64M (67108864) ?
> 
> Maybe.
> ->size is a user-provided value; I haven't looked closely at the iptables
> source. It seems we have to live with this limitation.

The problem is two-fold. netfilter tries to allocate some data
per-CPU and does

	vmalloc(sizeof(struct ipt_table_info)
	                + SMP_ALIGN(tmp.size) * NR_CPUS);

At 3445 rules, tmp.size is 524272 (why does it want that much memory? I
would expect the only data that's per-CPU is the packet and byte
counters).

In some of our kernel configurations, NR_CPUS is 128 or even more,
and we run into a vmalloc limit here.

vmalloc wants to kmalloc an array of struct page pointers, and on
a 64bit platform this means you're limited to 131072 / 8 = 16384
pages, or 67108864 bytes. In the example Olaf H posted, we fail at
128 + 524288 * 128 = 67108992 bytes (524288 being tmp.size rounded up
by SMP_ALIGN), i.e. 16385 pages.

So I guess it all boils down to why netfilter needs 150-odd bytes
per rule and CPU.
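
The page-count limit described above can be sketched in userspace C. The
4096-byte page size is an assumption inferred from 16384 pages equalling
67108864 bytes; the 131072-byte cap on the pointer array and the 8-byte
pointer size are the figures given in this message. The names are
illustrative, not kernel symbols.

```c
/* Assumed values; see the surrounding text for where each comes from. */
#define PAGE_SIZE_ASSUMED      4096UL    /* inferred: 67108864 / 16384 */
#define PTR_ARRAY_MAX_ASSUMED  131072UL  /* cap on the page-pointer array */
#define PTR_SIZE_ASSUMED       8UL       /* sizeof(struct page *), 64-bit */

/* Number of pages needed to back an allocation of the given size. */
static unsigned long pages_needed(unsigned long bytes)
{
	return (bytes + PAGE_SIZE_ASSUMED - 1) / PAGE_SIZE_ASSUMED;
}

/* Largest page count whose pointer array still fits under the cap. */
static unsigned long max_vmalloc_pages(void)
{
	return PTR_ARRAY_MAX_ASSUMED / PTR_SIZE_ASSUMED;
}

/* Nonzero if a single vmalloc() of this size stays within the limit. */
static int vmalloc_fits(unsigned long bytes)
{
	return pages_needed(bytes) <= max_vmalloc_pages();
}
```

With these numbers a request of exactly 67108864 bytes still fits, while
67108992 bytes needs one page too many.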

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
okir@suse.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


* Re: limited number if iptable rules on 64bit hosts
  2005-02-03 11:19       ` Olaf Kirch
@ 2005-02-03 18:48         ` David S. Miller
  2005-02-03 18:59           ` Olaf Hering
  0 siblings, 1 reply; 10+ messages in thread
From: David S. Miller @ 2005-02-03 18:48 UTC
  To: Olaf Kirch; +Cc: olh, brugolsky, netdev

On Thu, 3 Feb 2005 12:19:39 +0100
Olaf Kirch <okir@suse.de> wrote:

> At 3445 rules, tmp.size is 524272 (why does it want that much memory? I
> would expect the only data that's per-CPU is the packet and byte
> counters).

The rule itself is replicated per-cpu as well to keep L2 cache
accesses local per cpu on SMP systems.


* Re: limited number if iptable rules on 64bit hosts
  2005-02-03 18:48         ` David S. Miller
@ 2005-02-03 18:59           ` Olaf Hering
  2005-02-03 19:00             ` David S. Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Olaf Hering @ 2005-02-03 18:59 UTC
  To: David S. Miller; +Cc: Olaf Kirch, brugolsky, netdev

 On Thu, Feb 03, David S. Miller wrote:

> On Thu, 3 Feb 2005 12:19:39 +0100
> Olaf Kirch <okir@suse.de> wrote:
> 
> > At 3445 rules, tmp.size is 524272 (why does it want that much memory? I
> > would expect the only data that's per-CPU is the packet and byte
> > counters).
> 
> The rule itself is replicated per-cpu as well to keep L2 cache
> accesses local per cpu on SMP systems.

Andy made this change, which helped on a dual box.


diff -u linux-2.6.5/net/ipv4/netfilter/ip_tables.c-o linux-2.6.5/net/ipv4/netfilter/ip_tables.c
--- linux-2.6.5/net/ipv4/netfilter/ip_tables.c-o	2005-02-03 08:06:33.000000000 +0100
+++ linux-2.6.5/net/ipv4/netfilter/ip_tables.c	2005-02-03 13:06:32.163182472 +0100
@@ -29,6 +29,12 @@
 
 #include <linux/netfilter_ipv4/ip_tables.h>
 
+#ifdef CONFIG_HOTPLUG_CPU
+#define NF_NR_CPUS NR_CPUS
+#else
+#define NF_NR_CPUS num_online_cpus() 
+#endif
+
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Netfilter Core Team <coreteam@netfilter.org>");
 MODULE_DESCRIPTION("IPv4 packet filter");
@@ -860,7 +866,7 @@
 	}
 
 	/* And one copy for every other CPU */
-	for (i = 1; i < NR_CPUS; i++) {
+	for (i = 1; i < NF_NR_CPUS; i++) {
 		memcpy(newinfo->entries + SMP_ALIGN(newinfo->size)*i,
 		       newinfo->entries,
 		       SMP_ALIGN(newinfo->size));
@@ -882,7 +888,7 @@
 		struct ipt_entry *table_base;
 		unsigned int i;
 
-		for (i = 0; i < NR_CPUS; i++) {
+		for (i = 0; i < NF_NR_CPUS; i++) {
 			table_base =
 				(void *)newinfo->entries
 				+ TABLE_OFFSET(newinfo, i);
@@ -929,7 +935,7 @@
 	unsigned int cpu;
 	unsigned int i;
 
-	for (cpu = 0; cpu < NR_CPUS; cpu++) {
+	for (cpu = 0; cpu < NF_NR_CPUS; cpu++) {
 		i = 0;
 		IPT_ENTRY_ITERATE(t->entries + TABLE_OFFSET(t, cpu),
 				  t->size,
@@ -1067,7 +1073,7 @@
 		return -ENOMEM;
 
 	newinfo = vmalloc(sizeof(struct ipt_table_info)
-			  + SMP_ALIGN(tmp.size) * NR_CPUS);
+			  + SMP_ALIGN(tmp.size) * NF_NR_CPUS);
 	if (!newinfo)
 		return -ENOMEM;
 
@@ -1380,7 +1386,7 @@
 		= { 0, 0, 0, { 0 }, { 0 }, { } };
 
 	newinfo = vmalloc(sizeof(struct ipt_table_info)
-			  + SMP_ALIGN(table->table->size) * NR_CPUS);
+			  + SMP_ALIGN(table->table->size) * NF_NR_CPUS);
 	if (!newinfo)
 		return -ENOMEM;
 
diff -u linux-2.6.5/mm/vmalloc.c-o linux-2.6.5/mm/vmalloc.c
--- linux-2.6.5/mm/vmalloc.c-o	2005-02-03 08:06:50.000000000 +0100
+++ linux-2.6.5/mm/vmalloc.c	2005-02-03 13:07:44.162236952 +0100
@@ -310,7 +310,10 @@
 			__free_page(area->pages[i]);
 		}
 
-		kfree(area->pages);
+		if (area->nr_pages * sizeof(struct page *) >= 4*PAGE_SIZE)
+			vfree(area->pages);
+		else
+			kfree(area->pages);
 	}
 
 	kfree(area);
@@ -414,7 +417,11 @@
 	array_size = (nr_pages * sizeof(struct page *));
 
 	area->nr_pages = nr_pages;
-	area->pages = pages = kmalloc(array_size, (gfp_mask & ~__GFP_HIGHMEM));
+
+	if (array_size >= 4*PAGE_SIZE) 
+		area->pages = pages = __vmalloc(array_size, (gfp_mask & ~__GFP_HIGHMEM), PAGE_KERNEL);
+	else
+		area->pages = pages = kmalloc(array_size, (gfp_mask & ~__GFP_HIGHMEM));
 	if (!area->pages) {
 		remove_vm_area(area->addr);
 		kfree(area);


* Re: limited number if iptable rules on 64bit hosts
  2005-02-03 18:59           ` Olaf Hering
@ 2005-02-03 19:00             ` David S. Miller
  2005-02-03 19:33               ` Bart De Schuymer
  2005-02-03 21:35               ` Bill Rugolsky Jr.
  0 siblings, 2 replies; 10+ messages in thread
From: David S. Miller @ 2005-02-03 19:00 UTC
  To: Olaf Hering; +Cc: okir, netdev, netfilter-devel, brugolsky

On Thu, 3 Feb 2005 19:59:28 +0100
Olaf Hering <olh@suse.de> wrote:

>  On Thu, Feb 03, David S. Miller wrote:
> 
> > The rule itself is replicated per-cpu as well to keep L2 cache
> > accesses local per cpu on SMP systems.
> 
> Andy made this change, which helped on a dual box.

It might not help for Olaf's 128 cpu box though :-)

I think we should reconsider the idea of replicating the rule itself per-cpu.
Also, this thread should have begun with netfilter-devel at least on
the CC:, so I have added it.


* Re: limited number if iptable rules on 64bit hosts
  2005-02-03 19:00             ` David S. Miller
@ 2005-02-03 19:33               ` Bart De Schuymer
  2005-02-03 21:35               ` Bill Rugolsky Jr.
  1 sibling, 0 replies; 10+ messages in thread
From: Bart De Schuymer @ 2005-02-03 19:33 UTC
  To: David S. Miller; +Cc: okir, netdev, netfilter-devel, brugolsky, Olaf Hering

On Thu, 03-02-2005 at 11:00 -0800, David S. Miller wrote:
> On Thu, 3 Feb 2005 19:59:28 +0100
> Olaf Hering <olh@suse.de> wrote:
> 
> >  On Thu, Feb 03, David S. Miller wrote:
> > 
> > > The rule itself is replicated per-cpu as well to keep L2 cache
> > > accesses local per cpu on SMP systems.
> > 
> > Andy made this change, which helped on a dual box.
> 
> It might not help for Olaf's 128 cpu box though :-)
> 
> I think we should reconsider the idea of replicating the rule itself per-cpu.
> Also, this thread should have begun with netfilter-devel at least on
> the CC:, so I have added it.

Note that ebtables only has per-cpu counters.
I wonder what limits are seen on such systems for ebtables.

cheers,
Bart


* Re: limited number if iptable rules on 64bit hosts
  2005-02-03 19:00             ` David S. Miller
  2005-02-03 19:33               ` Bart De Schuymer
@ 2005-02-03 21:35               ` Bill Rugolsky Jr.
  1 sibling, 0 replies; 10+ messages in thread
From: Bill Rugolsky Jr. @ 2005-02-03 21:35 UTC
  To: David S. Miller; +Cc: okir, netdev, netfilter-devel, Olaf Hering

On Thu, Feb 03, 2005 at 11:00:49AM -0800, David S. Miller wrote:
> It might not help for Olaf's 128 cpu box though :-)
> 
> I think we should reconsider the idea of replicating the rule itself per-cpu.
> Also, this thread should have begun with netfilter-devel at least on
> the CC:, so I have added it.

As Olaf Kirch pointed out, an entry is about 150 bytes, while the counters
are two 64-bit ints, and it looks like 'unsigned int comefrom' is set as
the chains are traversed [net/ipv4/netfilter/ip_tables.c]:

	/* Save old back ptr in next entry */
	struct ipt_entry *next
		= (void *)e + e->next_offset;
	next->comefrom
		= (void *)back - table_base;
	/* set back pointer to next entry */
	back = next;

That's 20-24 bytes of state per-entry per-cpu, for a factor of 6-7 savings,
at the expense of hairing up the code slightly to do parallel indexed
access, Fortran style.

If I am understanding the mm code correctly, a single vmalloc() allocation
is currently limited to 64M on a 64-bit platform, but the VMALLOC address
range is much greater, so one might also prefer to do a kmalloc()/vmalloc()
per CPU, perhaps by creating {vmalloc,vfree}_percpu() and using the
percpu interfaces.
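
A rough userspace sketch of the two ideas above: keep only the mutable
per-rule state per CPU, indexed by rule number, and make one allocation
per CPU instead of one large block. Everything here is hypothetical
(struct rule_state, struct percpu_state, and the helper names do not
exist in the kernel), and calloc()/free() merely stand in for whatever
per-CPU kmalloc()/vmalloc() variant would be used.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-rule, per-CPU mutable state: two counters + comefrom. */
struct rule_state {
	uint64_t pcnt;          /* packet counter */
	uint64_t bcnt;          /* byte counter */
	unsigned int comefrom;  /* back pointer saved during traversal */
};

/* Hypothetical container: one state array per CPU, indexed by rule. */
struct percpu_state {
	unsigned int ncpus;
	unsigned int nrules;
	struct rule_state **cpu;
};

static void percpu_state_free(struct percpu_state *s)
{
	unsigned int i;

	if (!s)
		return;
	if (s->cpu)
		for (i = 0; i < s->ncpus; i++)
			free(s->cpu[i]);  /* free(NULL) is a no-op */
	free(s->cpu);
	free(s);
}

/* One modest allocation per CPU rather than a single huge one. */
static struct percpu_state *percpu_state_alloc(unsigned int ncpus,
					       unsigned int nrules)
{
	struct percpu_state *s = calloc(1, sizeof(*s));
	unsigned int i;

	if (!s)
		return NULL;
	s->ncpus = ncpus;
	s->nrules = nrules;
	s->cpu = calloc(ncpus, sizeof(*s->cpu));
	if (!s->cpu) {
		free(s);
		return NULL;
	}
	for (i = 0; i < ncpus; i++) {
		s->cpu[i] = calloc(nrules, sizeof(**s->cpu));
		if (!s->cpu[i]) {
			percpu_state_free(s);
			return NULL;
		}
	}
	return s;
}

/* Update the counters for one rule on one CPU; the rule itself is shared. */
static void count_packet(struct percpu_state *s, unsigned int cpu,
			 unsigned int rule, uint64_t bytes)
{
	s->cpu[cpu][rule].pcnt += 1;
	s->cpu[cpu][rule].bcnt += bytes;
}
```

In this layout each CPU's array costs nrules * sizeof(struct rule_state)
bytes, so no single allocation grows with the CPU count.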

	Bill Rugolsky


