netfilter-devel.vger.kernel.org archive mirror
* netfilter question
  2005-02-10 23:15   ` ULOG target for ipv6 Jonas Berlin
@ 2005-02-11 22:10     ` Pedro Fortuna
  0 siblings, 0 replies; 10+ messages in thread
From: Pedro Fortuna @ 2005-02-11 22:10 UTC (permalink / raw)
  To: netfilter-devel

Hello guys. I'll try to make it as short and simple as I can.

I want to develop a kernel module that will run on two Linux hosts
connected by a crossover Ethernet cable. This kernel module will
intercept a specific type of traffic (as an example, let's say FTP
packets encapsulated in DIX frames), both incoming and outgoing, and
change the ethertype in the frame header.

Outgoing DIX frames carrying FTP packets get their ethertype changed
to a private, non-standard ethertype number just before they leave
the host (i.e. before they are passed to the network driver). The
frame is intercepted with the NF_IP_POST_ROUTING hook.

Incoming DIX frames carrying FTP packets get their ethertype changed
back (at this point it carries the non-standard ethertype number) to
the standard IPv4 ethertype number (i.e. 0x0800), just after they are
processed by the network driver. The frame is intercepted with the
NF_IP_PRE_ROUTING hook.

My question is:
I'm not sure whether I will be able to intercept the incoming frames,
because they carry a non-standard ethertype number. They might get
dropped before passing through the NF_IP_PRE_ROUTING hook, due to the
unrecognized ethertype number. Is this true or false?
If the frame reaches the hook before the packet type is identified,
then I'll have no trouble, because my netfilter module restores the
original ethertype number, thus making the whole process transparent
to the TCP/IP stacks running on both hosts.
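
For reference, the outgoing side of what I have in mind would look
roughly like this (a minimal sketch against the 2.6-era hook API; the
0x88B5 value, the function names and the missing FTP match are only
placeholders):

#include <linux/module.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <asm/byteorder.h>

#define PRIVATE_ETHERTYPE 0x88B5	/* placeholder value */

static unsigned int retag_post_routing(unsigned int hooknum,
				       struct sk_buff **pskb,
				       const struct net_device *in,
				       const struct net_device *out,
				       int (*okfn)(struct sk_buff *))
{
	/* a real module would first match FTP traffic here; whether
	 * changing skb->protocol at this point is enough to affect the
	 * ethertype the driver puts on the wire is part of what I want
	 * to find out */
	(*pskb)->protocol = htons(PRIVATE_ETHERTYPE);
	return NF_ACCEPT;
}

static struct nf_hook_ops retag_ops = {
	.hook     = retag_post_routing,
	.pf       = PF_INET,
	.hooknum  = NF_IP_POST_ROUTING,
	.priority = NF_IP_PRI_LAST,
};

static int __init retag_init(void)
{
	return nf_register_hook(&retag_ops);
}

static void __exit retag_exit(void)
{
	nf_unregister_hook(&retag_ops);
}

module_init(retag_init);
module_exit(retag_exit);
MODULE_LICENSE("GPL");

The incoming side would be the symmetric NF_IP_PRE_ROUTING hook putting
skb->protocol back to ETH_P_IP, assuming the frame gets that far, which
is exactly my question above.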

I could explain what I need this for, but then you would have three
times more text to read :P I tried to keep this post to a minimal,
painless size.

Regards,
-Pedro Fortuna

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: netfilter question
       [not found] <cad49557-7c7a-83c9-d2b6-71d9624f0d52@miromedia.ca>
@ 2016-11-16 13:33 ` Eric Dumazet
  2016-11-16 15:02   ` Florian Westphal
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2016-11-16 13:33 UTC (permalink / raw)
  To: Eric Desrochers; +Cc: Florian Westphal, netfilter-devel

On Wed, Nov 16, 2016 at 2:22 AM, Eric Desrochers <ericd@miromedia.ca> wrote:
> Hi Eric,
>
> My name is Eric and I'm reaching you today as I found your name in multiple netfilter kernel commits, and was hoping we can discuss about a potential regression.
>
> I identified (git bisect) this commit [https://github.com/torvalds/linux/commit/71ae0dff02d756e4d2ca710b79f2ff5390029a5f] as the one introducing a serious performance slowdown when using the binary ip/ip6tables with a large number of policies.
>
> I also tried with the latest and greatest v4.9-rc4 mainline kernel, and the slowdown is still present.
>
> So even commit [https://github.com/torvalds/linux/commit/a1a56aaa0735c7821e85c37268a7c12e132415cb] which introduce a 16 bytes alignment on xt_counter percpu allocations so that bytes and packets sit in same cache line, doesn't have impact too.
>
>
> Everything I found is detailed in the following bug : https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1640786
>
> Of course, I'm totally aware that "iptables-restore" should be the favorite choice as it is way more efficient (note that using iptables-restore doesn't exhibit the problem) but some folks still rely on ip/ip6tables and might face this performance slowdown.
>
> I found the problem today, I will continue to investigate on my side, but I was wondering if we could have a discussion about this subject.
>
> Thanks in advance.
>
> Regards,
>
> Eric
>

Hi Eric

Thanks for your mail, but you should CC it to the netfilter-devel mailing list.

The key point is that we really care about the fast path: packet processing.
And the cited commit helps this a lot by lowering the memory footprint on
hosts with many cores.
This is a step in the right direction.

Now we probably should batch the percpu allocations one page at a
time, or ask Tejun if percpu allocations could be made really, really
fast (probably much harder).

But really you should not use iptables one rule at a time...
This will never compete with iptables-restore. ;)

Florian, would you have time to work on a patch trying to group the
percpu allocations one page at a time?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: netfilter question
  2016-11-16 13:33 ` netfilter question Eric Dumazet
@ 2016-11-16 15:02   ` Florian Westphal
  2016-11-16 15:23     ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Florian Westphal @ 2016-11-16 15:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Desrochers, Florian Westphal, netfilter-devel

Eric Dumazet <edumazet@google.com> wrote:
> On Wed, Nov 16, 2016 at 2:22 AM, Eric Desrochers <ericd@miromedia.ca> wrote:
> > Hi Eric,
> >
> > My name is Eric and I'm reaching you today as I found your name in multiple netfilter kernel commits, and was hoping we can discuss about a potential regression.
> >
> > I identified (git bisect) this commit [https://github.com/torvalds/linux/commit/71ae0dff02d756e4d2ca710b79f2ff5390029a5f] as the one introducing a serious performance slowdown when using the binary ip/ip6tables with a large number of policies.
> >
> > I also tried with the latest and greatest v4.9-rc4 mainline kernel, and the slowdown is still present.
> >
> > So even commit [https://github.com/torvalds/linux/commit/a1a56aaa0735c7821e85c37268a7c12e132415cb] which introduce a 16 bytes alignment on xt_counter percpu allocations so that bytes and packets sit in same cache line, doesn't have impact too.
> >
> >
> > Everything I found is detailed in the following bug : https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1640786
> >
> > Of course, I'm totally aware that "iptables-restore" should be the favorite choice as it is way more efficient (note that using iptables-restore doesn't exhibit the problem) but some folks still rely on ip/ip6tables and might face this performance slowdown.
> >
> > I found the problem today, I will continue to investigate on my side, but I was wondering if we could have a discussion about this subject.
> >
> > Thanks in advance.

[..]

> Key point is that we really care about fast path : packet processing.
> And cited commit helps this a lot by lowering the memory foot print on
> hosts with many cores.
> This is a step into right direction.
> 
> Now we probably should batch the percpu allocations one page at a
> time, or ask Tejun if percpu allocations could be really really fast
> (probably much harder)
> 
> But really you should not use iptables one rule at a time...
> This will never compete with iptables-restore. ;)
> 
> Florian, would you have time to work on a patch trying to group the
> percpu allocations one page at a time ?

You mean something like this?:
        xt_entry_foreach(iter, entry0, newinfo->size) {
-               ret = find_check_entry(iter, net, repl->name, repl->size);
-               if (ret != 0)
+               if (pcpu_alloc == 0) {
+                       pcnt = __alloc_percpu(PAGE_SIZE, sizeof(struct xt_counters));
+                       if (IS_ERR_VALUE(pcnt))
+                               BUG();
+               }
+
+               iter->counters.pcnt = pcnt + pcpu_alloc;
+               iter->counters.bcnt = !!pcpu_alloc;
+               pcpu_alloc += sizeof(struct xt_counters);
+
+               if (pcpu_alloc > PAGE_SIZE - sizeof(struct xt_counters))
+                       pcpu_alloc = 0;
+
+               ret = find_check_entry(iter, net, repl->name, repl->size)
 ...

This is going to be ugly since we'll have to deal with SMP vs. NONSMP (i.e. no percpu allocations)
in ip/ip6/arptables.

Error unwind will also be a mess (we can abuse .bcnt to tell if pcpu offset should be free'd or not).

But maybe I don't understand what you are suggesting :)
Can you elaborate?

Thanks!

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: netfilter question
  2016-11-16 15:02   ` Florian Westphal
@ 2016-11-16 15:23     ` Eric Dumazet
  2016-11-17  0:07       ` Florian Westphal
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2016-11-16 15:23 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Eric Dumazet, Eric Desrochers, netfilter-devel

On Wed, 2016-11-16 at 16:02 +0100, Florian Westphal wrote:
> Eric Dumazet <edumazet@google.com> wrote:
> > On Wed, Nov 16, 2016 at 2:22 AM, Eric Desrochers <ericd@miromedia.ca> wrote:
> > > Hi Eric,
> > >
> > > My name is Eric and I'm reaching you today as I found your name in multiple netfilter kernel commits, and was hoping we can discuss about a potential regression.
> > >
> > > I identified (git bisect) this commit [https://github.com/torvalds/linux/commit/71ae0dff02d756e4d2ca710b79f2ff5390029a5f] as the one introducing a serious performance slowdown when using the binary ip/ip6tables with a large number of policies.
> > >
> > > I also tried with the latest and greatest v4.9-rc4 mainline kernel, and the slowdown is still present.
> > >
> > > So even commit [https://github.com/torvalds/linux/commit/a1a56aaa0735c7821e85c37268a7c12e132415cb] which introduce a 16 bytes alignment on xt_counter percpu allocations so that bytes and packets sit in same cache line, doesn't have impact too.
> > >
> > >
> > > Everything I found is detailed in the following bug : https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1640786
> > >
> > > Of course, I'm totally aware that "iptables-restore" should be the favorite choice as it is way more efficient (note that using iptables-restore doesn't exhibit the problem) but some folks still rely on ip/ip6tables and might face this performance slowdown.
> > >
> > > I found the problem today, I will continue to investigate on my side, but I was wondering if we could have a discussion about this subject.
> > >
> > > Thanks in advance.
> 
> [..]
> 
> > Key point is that we really care about fast path : packet processing.
> > And cited commit helps this a lot by lowering the memory foot print on
> > hosts with many cores.
> > This is a step into right direction.
> > 
> > Now we probably should batch the percpu allocations one page at a
> > time, or ask Tejun if percpu allocations could be really really fast
> > (probably much harder)
> > 
> > But really you should not use iptables one rule at a time...
> > This will never compete with iptables-restore. ;)
> > 
> > Florian, would you have time to work on a patch trying to group the
> > percpu allocations one page at a time ?
> 
> You mean something like this ? :
>         xt_entry_foreach(iter, entry0, newinfo->size) {
> -               ret = find_check_entry(iter, net, repl->name, repl->size);
> -               if (ret != 0)
> +               if (pcpu_alloc == 0) {
> +                       pcnt = __alloc_percpu(PAGE_SIZE, sizeof(struct xt_counters));

alignment should be a page.

> +                       if (IS_ERR_VALUE(pcnt))
> +                               BUG();

well. no BUG() for sure ;)

> +               }
> +
> +               iter->counters.pcnt = pcnt + pcpu_alloc;
> +               iter->counters.bcnt = !!pcpu_alloc;
> +               pcpu_alloc += sizeof(struct xt_counters);
> +
> +               if (pcpu_alloc > PAGE_SIZE - sizeof(struct xt_counters))
> +                       pcpu_alloc = 0;
> +
> +               ret = find_check_entry(iter, net, repl->name, repl->size)
>  ...
> 
> This is going to be ugly since we'll have to deal with SMP vs. NONSMP (i.e. no perpcu allocations)
> in ip/ip6/arptables.

Time for a common helper then ...

> 
> Error unwind will also be a mess (we can abuse .bcnt to tell if pcpu offset should be free'd or not).

Free if the address is aligned to a page boundary ?

Otherwise skip it, it already has been freed earlier.
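
i.e. something like this (untested sketch; only the counter that starts
a fresh percpu page owns the allocation):

	if (nr_cpu_ids > 1 && ((unsigned long)pcnt & (PAGE_SIZE - 1)) == 0)
		free_percpu((void __percpu *)(unsigned long)pcnt);

where pcnt is e->counters.pcnt.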

> 
> But maybe I don't understand what you are suggesting :)
> Can you elaborate?

Note that this grouping will also help data locality.

I definitely have servers with a huge number of percpu allocations and I
fear we might have many TLB misses because of possible spread of
xt_counters.

Note that percpu pages must not be shared by multiple users
(ip/ip6/arptables); each table should get its own cache.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: netfilter question
  2016-11-16 15:23     ` Eric Dumazet
@ 2016-11-17  0:07       ` Florian Westphal
  2016-11-17  2:34         ` Eric Dumazet
                           ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Florian Westphal @ 2016-11-17  0:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Florian Westphal, Eric Dumazet, Eric Desrochers, netfilter-devel

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > On Wed, Nov 16, 2016 at 2:22 AM, Eric Desrochers <ericd@miromedia.ca> wrote:
> > > > Hi Eric,
> > > >
> > > > My name is Eric and I'm reaching you today as I found your name in multiple netfilter kernel commits, and was hoping we can discuss about a potential regression.
> > > >
> > > > I identified (git bisect) this commit [https://github.com/torvalds/linux/commit/71ae0dff02d756e4d2ca710b79f2ff5390029a5f] as the one introducing a serious performance slowdown when using the binary ip/ip6tables with a large number of policies.
> > > >
> > > > I also tried with the latest and greatest v4.9-rc4 mainline kernel, and the slowdown is still present.
> > > >
> > > > So even commit [https://github.com/torvalds/linux/commit/a1a56aaa0735c7821e85c37268a7c12e132415cb] which introduce a 16 bytes alignment on xt_counter percpu allocations so that bytes and packets sit in same cache line, doesn't have impact too.
> > > >
> > > >
> > > > Everything I found is detailed in the following bug : https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1640786
> > > >
> > > > Of course, I'm totally aware that "iptables-restore" should be the favorite choice as it is way more efficient (note that using iptables-restore doesn't exhibit the problem) but some folks still rely on ip/ip6tables and might face this performance slowdown.
> > > >
> > > > I found the problem today, I will continue to investigate on my side, but I was wondering if we could have a discussion about this subject.
> > > >
> > > > Thanks in advance.
> > 
> > [..]
> > 
> > > Key point is that we really care about fast path : packet processing.
> > > And cited commit helps this a lot by lowering the memory foot print on
> > > hosts with many cores.
> > > This is a step into right direction.
> > > 
> > > Now we probably should batch the percpu allocations one page at a
> > > time, or ask Tejun if percpu allocations could be really really fast
> > > (probably much harder)
> > > 
> > > But really you should not use iptables one rule at a time...
> > > This will never compete with iptables-restore. ;)
> > > 
> > > Florian, would you have time to work on a patch trying to group the
> > > percpu allocations one page at a time ?
> > 
> > You mean something like this ? :
> >         xt_entry_foreach(iter, entry0, newinfo->size) {
> > -               ret = find_check_entry(iter, net, repl->name, repl->size);
> > -               if (ret != 0)
> > +               if (pcpu_alloc == 0) {
> > +                       pcnt = __alloc_percpu(PAGE_SIZE, sizeof(struct xt_counters));
> 
> alignment should be a page.

[..]

> > Error unwind will also be a mess (we can abuse .bcnt to tell if pcpu offset should be free'd or not).
> 
> Free if the address is aligned to a page boundary ?

Good idea.  This seems to work for me.  Eric (Desrochers), does this
improve the situation for you as well?

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -403,38 +403,14 @@ static inline unsigned long ifname_compare_aligned(const char *_a,
 	return ret;
 }
 
+struct xt_percpu_counter_alloc_state {
+	unsigned int off;
+	const char *mem;
+};
 
-/* On SMP, ip(6)t_entry->counters.pcnt holds address of the
- * real (percpu) counter.  On !SMP, its just the packet count,
- * so nothing needs to be done there.
- *
- * xt_percpu_counter_alloc returns the address of the percpu
- * counter, or 0 on !SMP. We force an alignment of 16 bytes
- * so that bytes/packets share a common cache line.
- *
- * Hence caller must use IS_ERR_VALUE to check for error, this
- * allows us to return 0 for single core systems without forcing
- * callers to deal with SMP vs. NONSMP issues.
- */
-static inline unsigned long xt_percpu_counter_alloc(void)
-{
-	if (nr_cpu_ids > 1) {
-		void __percpu *res = __alloc_percpu(sizeof(struct xt_counters),
-						    sizeof(struct xt_counters));
-
-		if (res == NULL)
-			return -ENOMEM;
-
-		return (__force unsigned long) res;
-	}
-
-	return 0;
-}
-static inline void xt_percpu_counter_free(u64 pcnt)
-{
-	if (nr_cpu_ids > 1)
-		free_percpu((void __percpu *) (unsigned long) pcnt);
-}
+bool xt_percpu_counter_alloc(struct xt_percpu_counter_alloc_state *state,
+			     struct xt_counters *counter);
+void xt_percpu_counter_free(struct xt_counters *cnt);
 
 static inline struct xt_counters *
 xt_get_this_cpu_counter(struct xt_counters *cnt)
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 39004da318e2..cbea0cb030da 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -411,17 +411,15 @@ static inline int check_target(struct arpt_entry *e, const char *name)
 }
 
 static inline int
-find_check_entry(struct arpt_entry *e, const char *name, unsigned int size)
+find_check_entry(struct arpt_entry *e, const char *name, unsigned int size,
+		 struct xt_percpu_counter_alloc_state *alloc_state)
 {
 	struct xt_entry_target *t;
 	struct xt_target *target;
-	unsigned long pcnt;
 	int ret;
 
-	pcnt = xt_percpu_counter_alloc();
-	if (IS_ERR_VALUE(pcnt))
+	if (!xt_percpu_counter_alloc(alloc_state, &e->counters))
 		return -ENOMEM;
-	e->counters.pcnt = pcnt;
 
 	t = arpt_get_target(e);
 	target = xt_request_find_target(NFPROTO_ARP, t->u.user.name,
@@ -439,7 +437,7 @@ find_check_entry(struct arpt_entry *e, const char *name, unsigned int size)
 err:
 	module_put(t->u.kernel.target->me);
 out:
-	xt_percpu_counter_free(e->counters.pcnt);
+	xt_percpu_counter_free(&e->counters);
 
 	return ret;
 }
@@ -519,7 +517,7 @@ static inline void cleanup_entry(struct arpt_entry *e)
 	if (par.target->destroy != NULL)
 		par.target->destroy(&par);
 	module_put(par.target->me);
-	xt_percpu_counter_free(e->counters.pcnt);
+	xt_percpu_counter_free(&e->counters);
 }
 
 /* Checks and translates the user-supplied table segment (held in
@@ -528,6 +526,7 @@ static inline void cleanup_entry(struct arpt_entry *e)
 static int translate_table(struct xt_table_info *newinfo, void *entry0,
 			   const struct arpt_replace *repl)
 {
+	struct xt_percpu_counter_alloc_state alloc_state = { 0 };
 	struct arpt_entry *iter;
 	unsigned int *offsets;
 	unsigned int i;
@@ -590,7 +589,7 @@ static int translate_table(struct xt_table_info *newinfo, void *entry0,
 	/* Finally, each sanity check must pass */
 	i = 0;
 	xt_entry_foreach(iter, entry0, newinfo->size) {
-		ret = find_check_entry(iter, repl->name, repl->size);
+		ret = find_check_entry(iter, repl->name, repl->size, &alloc_state);
 		if (ret != 0)
 			break;
 		++i;
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 46815c8a60d7..0024550516d1 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -531,7 +531,8 @@ static int check_target(struct ipt_entry *e, struct net *net, const char *name)
 
 static int
 find_check_entry(struct ipt_entry *e, struct net *net, const char *name,
-		 unsigned int size)
+		 unsigned int size,
+		 struct xt_percpu_counter_alloc_state *alloc_state)
 {
 	struct xt_entry_target *t;
 	struct xt_target *target;
@@ -539,12 +540,9 @@ find_check_entry(struct ipt_entry *e, struct net *net, const char *name,
 	unsigned int j;
 	struct xt_mtchk_param mtpar;
 	struct xt_entry_match *ematch;
-	unsigned long pcnt;
 
-	pcnt = xt_percpu_counter_alloc();
-	if (IS_ERR_VALUE(pcnt))
+	if (!xt_percpu_counter_alloc(alloc_state, &e->counters))
 		return -ENOMEM;
-	e->counters.pcnt = pcnt;
 
 	j = 0;
 	mtpar.net	= net;
@@ -582,7 +580,7 @@ find_check_entry(struct ipt_entry *e, struct net *net, const char *name,
 		cleanup_match(ematch, net);
 	}
 
-	xt_percpu_counter_free(e->counters.pcnt);
+	xt_percpu_counter_free(&e->counters);
 
 	return ret;
 }
@@ -670,7 +668,7 @@ cleanup_entry(struct ipt_entry *e, struct net *net)
 	if (par.target->destroy != NULL)
 		par.target->destroy(&par);
 	module_put(par.target->me);
-	xt_percpu_counter_free(e->counters.pcnt);
+	xt_percpu_counter_free(&e->counters);
 }
 
 /* Checks and translates the user-supplied table segment (held in
@@ -679,6 +677,7 @@ static int
 translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 		const struct ipt_replace *repl)
 {
+	struct xt_percpu_counter_alloc_state alloc_state = { 0 };
 	struct ipt_entry *iter;
 	unsigned int *offsets;
 	unsigned int i;
@@ -738,7 +737,7 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 	/* Finally, each sanity check must pass */
 	i = 0;
 	xt_entry_foreach(iter, entry0, newinfo->size) {
-		ret = find_check_entry(iter, net, repl->name, repl->size);
+		ret = find_check_entry(iter, net, repl->name, repl->size, &alloc_state);
 		if (ret != 0)
 			break;
 		++i;
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 6ff42b8301cc..123d9af6742e 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -562,7 +562,8 @@ static int check_target(struct ip6t_entry *e, struct net *net, const char *name)
 
 static int
 find_check_entry(struct ip6t_entry *e, struct net *net, const char *name,
-		 unsigned int size)
+		 unsigned int size,
+		 struct xt_percpu_counter_alloc_state *alloc_state)
 {
 	struct xt_entry_target *t;
 	struct xt_target *target;
@@ -570,12 +571,9 @@ find_check_entry(struct ip6t_entry *e, struct net *net, const char *name,
 	unsigned int j;
 	struct xt_mtchk_param mtpar;
 	struct xt_entry_match *ematch;
-	unsigned long pcnt;
 
-	pcnt = xt_percpu_counter_alloc();
-	if (IS_ERR_VALUE(pcnt))
+	if (!xt_percpu_counter_alloc(alloc_state, &e->counters))
 		return -ENOMEM;
-	e->counters.pcnt = pcnt;
 
 	j = 0;
 	mtpar.net	= net;
@@ -612,7 +610,7 @@ find_check_entry(struct ip6t_entry *e, struct net *net, const char *name,
 		cleanup_match(ematch, net);
 	}
 
-	xt_percpu_counter_free(e->counters.pcnt);
+	xt_percpu_counter_free(&e->counters);
 
 	return ret;
 }
@@ -699,8 +697,7 @@ static void cleanup_entry(struct ip6t_entry *e, struct net *net)
 	if (par.target->destroy != NULL)
 		par.target->destroy(&par);
 	module_put(par.target->me);
-
-	xt_percpu_counter_free(e->counters.pcnt);
+	xt_percpu_counter_free(&e->counters);
 }
 
 /* Checks and translates the user-supplied table segment (held in
@@ -709,6 +706,7 @@ static int
 translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 		const struct ip6t_replace *repl)
 {
+	struct xt_percpu_counter_alloc_state alloc_state = { 0 };
 	struct ip6t_entry *iter;
 	unsigned int *offsets;
 	unsigned int i;
@@ -768,7 +766,7 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 	/* Finally, each sanity check must pass */
 	i = 0;
 	xt_entry_foreach(iter, entry0, newinfo->size) {
-		ret = find_check_entry(iter, net, repl->name, repl->size);
+		ret = find_check_entry(iter, net, repl->name, repl->size, &alloc_state);
 		if (ret != 0)
 			break;
 		++i;
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index ad818e52859b..a4d1084b163f 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1615,6 +1615,59 @@ void xt_proto_fini(struct net *net, u_int8_t af)
 }
 EXPORT_SYMBOL_GPL(xt_proto_fini);
 
+/**
+ * xt_percpu_counter_alloc - allocate x_tables rule counter
+ *
+ * @state: pointer to xt_percpu allocation state
+ * @counter: pointer to counter struct inside the ip(6)/arpt_entry struct
+ *
+ * On SMP, the packet counter [ ip(6)t_entry->counters.pcnt ] will then
+ * contain the address of the real (percpu) counter.
+ *
+ * Rule evaluation needs to use xt_get_this_cpu_counter() helper
+ * to fetch the real percpu counter.
+ *
+ * To speed up allocation and improve data locality, an entire
+ * page is allocated.
+ *
+ * xt_percpu_counter_alloc_state contains the base address of the
+ * allocated page and the current sub-offset.
+ *
+ * returns false on error.
+ */
+bool xt_percpu_counter_alloc(struct xt_percpu_counter_alloc_state *state,
+			     struct xt_counters *counter)
+{
+	BUILD_BUG_ON(PAGE_SIZE < (sizeof(*counter) * 2));
+
+	if (nr_cpu_ids <= 1)
+		return true;
+
+	if (state->mem == NULL) {
+		state->mem = __alloc_percpu(PAGE_SIZE, PAGE_SIZE);
+		if (!state->mem)
+			return false;
+	}
+	counter->pcnt = (__force unsigned long)(state->mem + state->off);
+	state->off += sizeof(*counter);
+	if (state->off > (PAGE_SIZE - sizeof(*counter))) {
+		state->mem = NULL;
+		state->off = 0;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(xt_percpu_counter_alloc);
+
+void xt_percpu_counter_free(struct xt_counters *counters)
+{
+	unsigned long pcnt = counters->pcnt;
+
+	if (nr_cpu_ids > 1 && (pcnt & (PAGE_SIZE - 1)) == 0)
+		free_percpu((void __percpu *) (unsigned long)pcnt);
+}
+EXPORT_SYMBOL_GPL(xt_percpu_counter_free);
+
 static int __net_init xt_net_init(struct net *net)
 {
 	int i;

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: netfilter question
  2016-11-17  0:07       ` Florian Westphal
@ 2016-11-17  2:34         ` Eric Dumazet
  2016-11-17 15:49         ` Eric Desrochers
  2016-11-20  6:33         ` Eric Dumazet
  2 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2016-11-17  2:34 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Eric Dumazet, Eric Desrochers, netfilter-devel

On Thu, 2016-11-17 at 01:07 +0100, Florian Westphal wrote:

Seems very nice!

> +
> +void xt_percpu_counter_free(struct xt_counters *counters)
> +{
> +	unsigned long pcnt = counters->pcnt;
> +
> +	if (nr_cpu_ids > 1 && (pcnt & (PAGE_SIZE - 1)) == 0)
> +		free_percpu((void __percpu *) (unsigned long)pcnt);
> +}


pcnt is already an "unsigned long"

This packing might also speed up "iptables -nvL" dumps ;)



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: netfilter question
  2016-11-17  0:07       ` Florian Westphal
  2016-11-17  2:34         ` Eric Dumazet
@ 2016-11-17 15:49         ` Eric Desrochers
  2016-11-20  6:33         ` Eric Dumazet
  2 siblings, 0 replies; 10+ messages in thread
From: Eric Desrochers @ 2016-11-17 15:49 UTC (permalink / raw)
  To: Florian Westphal, Eric Dumazet; +Cc: Eric Dumazet, netfilter-devel

Hi Florian,

Thanks for the quick response. I will give it a try and get back to you with the outcome of my test.

On 2016-11-17 01:07 AM, Florian Westphal wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> [..]
>
> Good idea.  This seems to work for me.  Eric (Desrochers), does this
> improve the situation for you as well?
>
> [..]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: netfilter question
  2016-11-17  0:07       ` Florian Westphal
  2016-11-17  2:34         ` Eric Dumazet
  2016-11-17 15:49         ` Eric Desrochers
@ 2016-11-20  6:33         ` Eric Dumazet
       [not found]           ` <CAGUFhKwQTRRJpfGi2fRkFfGdpLYMN-2F9G+dEsavM7UGbkjjdA@mail.gmail.com>
  2 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2016-11-20  6:33 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Eric Dumazet, Eric Desrochers, netfilter-devel

On Thu, 2016-11-17 at 01:07 +0100, Florian Westphal wrote:

> +	if (state->mem == NULL) {
> +		state->mem = __alloc_percpu(PAGE_SIZE, PAGE_SIZE);
> +		if (!state->mem)
> +			return false;
> +	}

This will fail on arches where PAGE_SIZE=65536

The percpu allocator limit is PCPU_MIN_UNIT_SIZE (32 KB).

So maybe use a smaller value like 4096?

#define XT_PCPU_BLOC_SIZE 4096
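
i.e. on top of Florian's patch, roughly (untested sketch; the boundary
test in xt_percpu_counter_free() would have to use the same constant
instead of PAGE_SIZE):

	if (state->mem == NULL) {
		state->mem = __alloc_percpu(XT_PCPU_BLOC_SIZE,
					    XT_PCPU_BLOC_SIZE);
		if (!state->mem)
			return false;
	}
	counter->pcnt = (__force unsigned long)(state->mem + state->off);
	state->off += sizeof(*counter);
	if (state->off > (XT_PCPU_BLOC_SIZE - sizeof(*counter))) {
		state->mem = NULL;
		state->off = 0;
	}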



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: netfilter question
       [not found]           ` <CAGUFhKwQTRRJpfGi2fRkFfGdpLYMN-2F9G+dEsavM7UGbkjjdA@mail.gmail.com>
@ 2016-11-20 17:31             ` Eric Dumazet
  2016-11-20 17:55               ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2016-11-20 17:31 UTC (permalink / raw)
  To: Eric D; +Cc: Eric Dumazet, netfilter-devel, Florian Westphal

On Sun, 2016-11-20 at 12:22 -0500, Eric D wrote:
> I'm currently abroad for work and will come back home soon. I will
> test the solution and provide feedback to Florian by end of week.
> 
> Thanks for jumping on this quickly.
> 
> Eric
> 
> 
> On Nov 20, 2016 7:33 AM, "Eric Dumazet" <eric.dumazet@gmail.com>
> wrote:
>         On Thu, 2016-11-17 at 01:07 +0100, Florian Westphal wrote:
>         
>         > +     if (state->mem == NULL) {
>         > +             state->mem = __alloc_percpu(PAGE_SIZE,
>         PAGE_SIZE);
>         > +             if (!state->mem)
>         > +                     return false;
>         > +     }
>         
>         This will fail on arches where PAGE_SIZE=65536
>         
>         percpu allocator limit is PCPU_MIN_UNIT_SIZE  ( 32 KB )
>         
>         So maybe use a smaller value like 4096 ?
>         
>         #define XT_PCPU_BLOC_SIZE 4096
>         
Thanks Eric, I will test the patch myself, because I believe we need it
asap ;)





^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: netfilter question
  2016-11-20 17:31             ` Eric Dumazet
@ 2016-11-20 17:55               ` Eric Dumazet
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2016-11-20 17:55 UTC (permalink / raw)
  To: Eric D; +Cc: Eric Dumazet, netfilter-devel, Florian Westphal

On Sun, 2016-11-20 at 09:31 -0800, Eric Dumazet wrote:

> Thanks Eric, I will test the patch myself, because I believe we need it
> asap ;)


Current net-next without Florian's patch:

lpaa24:~# time for f in `seq 1 2000` ; do iptables -A FORWARD ; done

real	0m12.856s
user	0m0.590s
sys	0m11.131s


perf record ... ; perf report ->

    47.45%  iptables  [kernel.kallsyms]  [k] pcpu_alloc_area                      
     8.49%  iptables  [kernel.kallsyms]  [k] memset_erms                          
     7.35%  iptables  [kernel.kallsyms]  [k] get_counters                         
     2.87%  iptables  [kernel.kallsyms]  [k] __memmove                            
     2.33%  iptables  [kernel.kallsyms]  [k] pcpu_alloc                           
     2.07%  iptables  [kernel.kallsyms]  [k] _find_next_bit.part.0                
     1.62%  iptables  xtables-multi      [.] 0x000000000001bb9d                   
     1.25%  iptables  [kernel.kallsyms]  [k] page_fault                           
     1.01%  iptables  [kernel.kallsyms]  [k] memcmp                               
     0.94%  iptables  [kernel.kallsyms]  [k] translate_table                      
     0.76%  iptables  [kernel.kallsyms]  [k] find_next_bit                        
     0.73%  iptables  [kernel.kallsyms]  [k] filemap_map_pages                    
     0.68%  iptables  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string       
     0.54%  iptables  [kernel.kallsyms]  [k] __get_user_8                         
     0.54%  iptables  [kernel.kallsyms]  [k] clear_page_c_e                

After the patch:

lpaa24:~# time for f in `seq 1 2000` ; do iptables -A FORWARD ; done

real	0m3.867s
user	0m0.559s
sys	0m2.216s

    22.15%  iptables  [kernel.kallsyms]  [k] get_counters                           
     5.85%  iptables  xtables-multi      [.] 0x000000000001bbac                     
     3.99%  iptables  [kernel.kallsyms]  [k] page_fault                             
     2.37%  iptables  [kernel.kallsyms]  [k] memcmp                                 
     2.19%  iptables  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string         
     1.89%  iptables  [kernel.kallsyms]  [k] translate_table                        
     1.78%  iptables  [kernel.kallsyms]  [k] memset_erms                            
     1.74%  iptables  [kernel.kallsyms]  [k] clear_page_c_e                         
     1.73%  iptables  [kernel.kallsyms]  [k] __get_user_8                           
     1.72%  iptables  [kernel.kallsyms]  [k] perf_iterate_ctx                       
     1.21%  iptables  [kernel.kallsyms]  [k] handle_mm_fault                        
     0.98%  iptables  [kernel.kallsyms]  [k] unmap_page_range          

So this is a huge win. And I suspect the data path will also gain from all
pcpu counters being in the same area of memory (this is where I am very interested).



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-11-20 17:55 UTC | newest]

Thread overview: 10+ messages
     [not found] <cad49557-7c7a-83c9-d2b6-71d9624f0d52@miromedia.ca>
2016-11-16 13:33 ` netfilter question Eric Dumazet
2016-11-16 15:02   ` Florian Westphal
2016-11-16 15:23     ` Eric Dumazet
2016-11-17  0:07       ` Florian Westphal
2016-11-17  2:34         ` Eric Dumazet
2016-11-17 15:49         ` Eric Desrochers
2016-11-20  6:33         ` Eric Dumazet
     [not found]           ` <CAGUFhKwQTRRJpfGi2fRkFfGdpLYMN-2F9G+dEsavM7UGbkjjdA@mail.gmail.com>
2016-11-20 17:31             ` Eric Dumazet
2016-11-20 17:55               ` Eric Dumazet
2005-02-08  7:50 netfilter & ipv6 Jonas Berlin
     [not found] ` <53965.213.236.112.75.1107867276.squirrel@213.236.112.75>
2005-02-10 23:15   ` ULOG target for ipv6 Jonas Berlin
2005-02-11 22:10     ` netfilter question Pedro Fortuna
