* Re: netfilter question
  [not found] <cad49557-7c7a-83c9-d2b6-71d9624f0d52@miromedia.ca>

From: Eric Dumazet @ 2016-11-16 13:33 UTC
To: Eric Desrochers; +Cc: Florian Westphal, netfilter-devel

On Wed, Nov 16, 2016 at 2:22 AM, Eric Desrochers <ericd@miromedia.ca> wrote:
> Hi Eric,
>
> My name is Eric and I'm reaching out to you today as I found your name in
> multiple netfilter kernel commits, and was hoping we could discuss a
> potential regression.
>
> I identified (via git bisect) this commit
> [https://github.com/torvalds/linux/commit/71ae0dff02d756e4d2ca710b79f2ff5390029a5f]
> as the one introducing a serious performance slowdown when using the
> ip/ip6tables binaries with a large number of policies.
>
> I also tried with the latest and greatest v4.9-rc4 mainline kernel, and
> the slowdown is still present.
>
> Even commit
> [https://github.com/torvalds/linux/commit/a1a56aaa0735c7821e85c37268a7c12e132415cb],
> which introduces a 16-byte alignment on xt_counter percpu allocations so
> that bytes and packets sit in the same cache line, has no impact either.
>
> Everything I found is detailed in the following bug:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1640786
>
> Of course, I'm fully aware that "iptables-restore" should be the preferred
> choice as it is far more efficient (note that using iptables-restore does
> not exhibit the problem), but some folks still rely on ip/ip6tables and
> might face this performance slowdown.
>
> I found the problem today and will continue to investigate on my side,
> but I was wondering if we could have a discussion about this subject.
>
> Thanks in advance.
>
> Regards,
>
> Eric

Hi Eric,

Thanks for your mail, but you should CC the netfilter-devel mailing list.

The key point is that we really care about the fast path: packet
processing. The cited commit helps this a lot by lowering the memory
footprint on hosts with many cores; it is a step in the right direction.

Now we should probably batch the percpu allocations one page at a time, or
ask Tejun if percpu allocations could be made really, really fast (probably
much harder).

But really, you should not use iptables one rule at a time... that will
never compete with iptables-restore. ;)

Florian, would you have time to work on a patch trying to group the
percpu allocations one page at a time?

* Re: netfilter question

From: Florian Westphal @ 2016-11-16 15:02 UTC
To: Eric Dumazet; +Cc: Eric Desrochers, Florian Westphal, netfilter-devel

Eric Dumazet <edumazet@google.com> wrote:
> On Wed, Nov 16, 2016 at 2:22 AM, Eric Desrochers <ericd@miromedia.ca> wrote:
> [..]
> The key point is that we really care about the fast path: packet
> processing. The cited commit helps this a lot by lowering the memory
> footprint on hosts with many cores; it is a step in the right direction.
>
> Now we should probably batch the percpu allocations one page at a time,
> or ask Tejun if percpu allocations could be made really, really fast
> (probably much harder).
>
> But really, you should not use iptables one rule at a time... that will
> never compete with iptables-restore. ;)
>
> Florian, would you have time to work on a patch trying to group the
> percpu allocations one page at a time?

You mean something like this?

	xt_entry_foreach(iter, entry0, newinfo->size) {
-		ret = find_check_entry(iter, net, repl->name, repl->size);
-		if (ret != 0)
+		if (pcpu_alloc == 0) {
+			pcnt = __alloc_percpu(PAGE_SIZE, sizeof(struct xt_counters));
+			if (IS_ERR_VALUE(pcnt))
+				BUG();
+		}
+
+		iter->counters.pcnt = pcnt + pcpu_alloc;
+		iter->counters.bcnt = !!pcpu_alloc;
+		pcpu_alloc += sizeof(struct xt_counters);
+
+		if (pcpu_alloc > PAGE_SIZE - sizeof(struct xt_counters))
+			pcpu_alloc = 0;
+
+		ret = find_check_entry(iter, net, repl->name, repl->size);
	...

This is going to be ugly, since we'll have to deal with SMP vs. NONSMP
(i.e. no percpu allocations) in ip/ip6/arptables.

Error unwind will also be a mess (we can abuse .bcnt to tell whether the
pcpu offset should be freed or not).

But maybe I don't understand what you are suggesting :)
Can you elaborate? Thanks!

* Re: netfilter question

From: Eric Dumazet @ 2016-11-16 15:23 UTC
To: Florian Westphal; +Cc: Eric Dumazet, Eric Desrochers, netfilter-devel

On Wed, 2016-11-16 at 16:02 +0100, Florian Westphal wrote:
> Eric Dumazet <edumazet@google.com> wrote:
> [..]
> > Florian, would you have time to work on a patch trying to group the
> > percpu allocations one page at a time?
>
> You mean something like this?
>
> 	xt_entry_foreach(iter, entry0, newinfo->size) {
> -		ret = find_check_entry(iter, net, repl->name, repl->size);
> -		if (ret != 0)
> +		if (pcpu_alloc == 0) {
> +			pcnt = __alloc_percpu(PAGE_SIZE, sizeof(struct xt_counters));

The alignment should be a page.

> +			if (IS_ERR_VALUE(pcnt))
> +				BUG();

Well, no BUG() for sure ;)

[..]

> This is going to be ugly, since we'll have to deal with SMP vs. NONSMP
> (i.e. no percpu allocations) in ip/ip6/arptables.

Time for a common helper then ...

> Error unwind will also be a mess (we can abuse .bcnt to tell whether the
> pcpu offset should be freed or not).

Free it if the address is aligned to a page boundary? Otherwise skip it;
it has already been freed earlier.

> But maybe I don't understand what you are suggesting :)
> Can you elaborate?

Note that this grouping will also help data locality. I definitely have
servers with a huge number of percpu allocations, and I fear we might see
many TLB misses because of the possible spread of xt_counters.

Note that percpu pages must not be shared by multiple users
(ip/ip6/arptables); each table should get its own cache.

* Re: netfilter question

From: Florian Westphal @ 2016-11-17 0:07 UTC
To: Eric Dumazet; Cc: Florian Westphal, Eric Dumazet, Eric Desrochers, netfilter-devel

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> [..]
> > You mean something like this?
> >
> > 	xt_entry_foreach(iter, entry0, newinfo->size) {
> > -		ret = find_check_entry(iter, net, repl->name, repl->size);
> > -		if (ret != 0)
> > +		if (pcpu_alloc == 0) {
> > +			pcnt = __alloc_percpu(PAGE_SIZE, sizeof(struct xt_counters));
>
> The alignment should be a page.
[..]
> > Error unwind will also be a mess (we can abuse .bcnt to tell whether
> > the pcpu offset should be freed or not).
>
> Free it if the address is aligned to a page boundary?

Good idea. This seems to work for me. Eric (Desrochers), does this
improve the situation for you as well?

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -403,38 +403,14 @@ static inline unsigned long ifname_compare_aligned(const char *_a,
 	return ret;
 }
 
+struct xt_percpu_counter_alloc_state {
+	unsigned int off;
+	const char *mem;
+};
 
-/* On SMP, ip(6)t_entry->counters.pcnt holds address of the
- * real (percpu) counter. On !SMP, its just the packet count,
- * so nothing needs to be done there.
- *
- * xt_percpu_counter_alloc returns the address of the percpu
- * counter, or 0 on !SMP. We force an alignment of 16 bytes
- * so that bytes/packets share a common cache line.
- *
- * Hence caller must use IS_ERR_VALUE to check for error, this
- * allows us to return 0 for single core systems without forcing
- * callers to deal with SMP vs. NONSMP issues.
- */
-static inline unsigned long xt_percpu_counter_alloc(void)
-{
-	if (nr_cpu_ids > 1) {
-		void __percpu *res = __alloc_percpu(sizeof(struct xt_counters),
-						    sizeof(struct xt_counters));
-
-		if (res == NULL)
-			return -ENOMEM;
-
-		return (__force unsigned long) res;
-	}
-
-	return 0;
-}
-static inline void xt_percpu_counter_free(u64 pcnt)
-{
-	if (nr_cpu_ids > 1)
-		free_percpu((void __percpu *) (unsigned long) pcnt);
-}
+bool xt_percpu_counter_alloc(struct xt_percpu_counter_alloc_state *state,
+			     struct xt_counters *counter);
+void xt_percpu_counter_free(struct xt_counters *cnt);
 
 static inline struct xt_counters *
 xt_get_this_cpu_counter(struct xt_counters *cnt)
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 39004da318e2..cbea0cb030da 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -411,17 +411,15 @@ static inline int check_target(struct arpt_entry *e, const char *name)
 }
 
 static inline int
-find_check_entry(struct arpt_entry *e, const char *name, unsigned int size)
+find_check_entry(struct arpt_entry *e, const char *name, unsigned int size,
+		 struct xt_percpu_counter_alloc_state *alloc_state)
 {
 	struct xt_entry_target *t;
 	struct xt_target *target;
-	unsigned long pcnt;
 	int ret;
 
-	pcnt = xt_percpu_counter_alloc();
-	if (IS_ERR_VALUE(pcnt))
+	if (!xt_percpu_counter_alloc(alloc_state, &e->counters))
 		return -ENOMEM;
-	e->counters.pcnt = pcnt;
 
 	t = arpt_get_target(e);
 	target = xt_request_find_target(NFPROTO_ARP, t->u.user.name,
@@ -439,7 +437,7 @@ find_check_entry(struct arpt_entry *e, const char *name, unsigned int size)
 err:
 	module_put(t->u.kernel.target->me);
 out:
-	xt_percpu_counter_free(e->counters.pcnt);
+	xt_percpu_counter_free(&e->counters);
 
 	return ret;
 }
@@ -519,7 +517,7 @@ static inline void cleanup_entry(struct arpt_entry *e)
 	if (par.target->destroy != NULL)
 		par.target->destroy(&par);
 	module_put(par.target->me);
-	xt_percpu_counter_free(e->counters.pcnt);
+	xt_percpu_counter_free(&e->counters);
 }
 
 /* Checks and translates the user-supplied table segment (held in
@@ -528,6 +526,7 @@ static inline void cleanup_entry(struct arpt_entry *e)
 static int translate_table(struct xt_table_info *newinfo, void *entry0,
 			   const struct arpt_replace *repl)
 {
+	struct xt_percpu_counter_alloc_state alloc_state = { 0 };
 	struct arpt_entry *iter;
 	unsigned int *offsets;
 	unsigned int i;
@@ -590,7 +589,7 @@ static int translate_table(struct xt_table_info *newinfo, void *entry0,
 	/* Finally, each sanity check must pass */
 	i = 0;
 	xt_entry_foreach(iter, entry0, newinfo->size) {
-		ret = find_check_entry(iter, repl->name, repl->size);
+		ret = find_check_entry(iter, repl->name, repl->size, &alloc_state);
 		if (ret != 0)
 			break;
 		++i;
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 46815c8a60d7..0024550516d1 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -531,7 +531,8 @@ static int check_target(struct ipt_entry *e, struct net *net, const char *name)
 
 static int
 find_check_entry(struct ipt_entry *e, struct net *net, const char *name,
-		 unsigned int size)
+		 unsigned int size,
+		 struct xt_percpu_counter_alloc_state *alloc_state)
 {
 	struct xt_entry_target *t;
 	struct xt_target *target;
@@ -539,12 +540,9 @@ find_check_entry(struct ipt_entry *e, struct net *net, const char *name,
 	unsigned int j;
 	struct xt_mtchk_param mtpar;
 	struct xt_entry_match *ematch;
-	unsigned long pcnt;
 
-	pcnt = xt_percpu_counter_alloc();
-	if (IS_ERR_VALUE(pcnt))
+	if (!xt_percpu_counter_alloc(alloc_state, &e->counters))
 		return -ENOMEM;
-	e->counters.pcnt = pcnt;
 
 	j = 0;
 	mtpar.net	= net;
@@ -582,7 +580,7 @@ find_check_entry(struct ipt_entry *e, struct net *net, const char *name,
 		cleanup_match(ematch, net);
 	}
 
-	xt_percpu_counter_free(e->counters.pcnt);
+	xt_percpu_counter_free(&e->counters);
 
 	return ret;
 }
@@ -670,7 +668,7 @@ cleanup_entry(struct ipt_entry *e, struct net *net)
 	if (par.target->destroy != NULL)
 		par.target->destroy(&par);
 	module_put(par.target->me);
-	xt_percpu_counter_free(e->counters.pcnt);
+	xt_percpu_counter_free(&e->counters);
 }
 
 /* Checks and translates the user-supplied table segment (held in
@@ -679,6 +677,7 @@ static int
 translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 		const struct ipt_replace *repl)
 {
+	struct xt_percpu_counter_alloc_state alloc_state = { 0 };
 	struct ipt_entry *iter;
 	unsigned int *offsets;
 	unsigned int i;
@@ -738,7 +737,7 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 	/* Finally, each sanity check must pass */
 	i = 0;
 	xt_entry_foreach(iter, entry0, newinfo->size) {
-		ret = find_check_entry(iter, net, repl->name, repl->size);
+		ret = find_check_entry(iter, net, repl->name, repl->size, &alloc_state);
 		if (ret != 0)
 			break;
 		++i;
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 6ff42b8301cc..123d9af6742e 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -562,7 +562,8 @@ static int check_target(struct ip6t_entry *e, struct net *net, const char *name)
 
 static int
 find_check_entry(struct ip6t_entry *e, struct net *net, const char *name,
-		 unsigned int size)
+		 unsigned int size,
+		 struct xt_percpu_counter_alloc_state *alloc_state)
 {
 	struct xt_entry_target *t;
 	struct xt_target *target;
@@ -570,12 +571,9 @@ find_check_entry(struct ip6t_entry *e, struct net *net, const char *name,
 	unsigned int j;
 	struct xt_mtchk_param mtpar;
 	struct xt_entry_match *ematch;
-	unsigned long pcnt;
 
-	pcnt = xt_percpu_counter_alloc();
-	if (IS_ERR_VALUE(pcnt))
+	if (!xt_percpu_counter_alloc(alloc_state, &e->counters))
 		return -ENOMEM;
-	e->counters.pcnt = pcnt;
 
 	j = 0;
 	mtpar.net	= net;
@@ -612,7 +610,7 @@ find_check_entry(struct ip6t_entry *e, struct net *net, const char *name,
 		cleanup_match(ematch, net);
 	}
 
-	xt_percpu_counter_free(e->counters.pcnt);
+	xt_percpu_counter_free(&e->counters);
 
 	return ret;
 }
@@ -699,8 +697,7 @@ static void cleanup_entry(struct ip6t_entry *e, struct net *net)
 	if (par.target->destroy != NULL)
 		par.target->destroy(&par);
 	module_put(par.target->me);
-
-	xt_percpu_counter_free(e->counters.pcnt);
+	xt_percpu_counter_free(&e->counters);
 }
 
 /* Checks and translates the user-supplied table segment (held in
@@ -709,6 +706,7 @@ static int
 translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 		const struct ip6t_replace *repl)
 {
+	struct xt_percpu_counter_alloc_state alloc_state = { 0 };
 	struct ip6t_entry *iter;
 	unsigned int *offsets;
 	unsigned int i;
@@ -768,7 +766,7 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 	/* Finally, each sanity check must pass */
 	i = 0;
 	xt_entry_foreach(iter, entry0, newinfo->size) {
-		ret = find_check_entry(iter, net, repl->name, repl->size);
+		ret = find_check_entry(iter, net, repl->name, repl->size, &alloc_state);
 		if (ret != 0)
 			break;
 		++i;
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index ad818e52859b..a4d1084b163f 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1615,6 +1615,59 @@ void xt_proto_fini(struct net *net, u_int8_t af)
 }
 EXPORT_SYMBOL_GPL(xt_proto_fini);
 
+/**
+ * xt_percpu_counter_alloc - allocate x_tables rule counter
+ *
+ * @state: pointer to xt_percpu allocation state
+ * @counter: pointer to counter struct inside the ip(6)/arpt_entry struct
+ *
+ * On SMP, the packet counter [ ip(6)t_entry->counters.pcnt ] will then
+ * contain the address of the real (percpu) counter.
+ *
+ * Rule evaluation needs to use xt_get_this_cpu_counter() helper
+ * to fetch the real percpu counter.
+ *
+ * To speed up allocation and improve data locality, an entire
+ * page is allocated.
+ *
+ * xt_percpu_counter_alloc_state contains the base address of the
+ * allocated page and the current sub-offset.
+ *
+ * returns false on error.
+ */
+bool xt_percpu_counter_alloc(struct xt_percpu_counter_alloc_state *state,
+			     struct xt_counters *counter)
+{
+	BUILD_BUG_ON(PAGE_SIZE < (sizeof(*counter) * 2));
+
+	if (nr_cpu_ids <= 1)
+		return true;
+
+	if (state->mem == NULL) {
+		state->mem = __alloc_percpu(PAGE_SIZE, PAGE_SIZE);
+		if (!state->mem)
+			return false;
+	}
+	counter->pcnt = (__force unsigned long)(state->mem + state->off);
+	state->off += sizeof(*counter);
+	if (state->off > (PAGE_SIZE - sizeof(*counter))) {
+		state->mem = NULL;
+		state->off = 0;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(xt_percpu_counter_alloc);
+
+void xt_percpu_counter_free(struct xt_counters *counters)
+{
+	unsigned long pcnt = counters->pcnt;
+
+	if (nr_cpu_ids > 1 && (pcnt & (PAGE_SIZE - 1)) == 0)
+		free_percpu((void __percpu *) (unsigned long)pcnt);
+}
+EXPORT_SYMBOL_GPL(xt_percpu_counter_free);
+
 static int __net_init xt_net_init(struct net *net)
 {
 	int i;

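The kernel-doc above refers to xt_get_this_cpu_counter(); for context,
here is a sketch of that helper as it exists in x_tables.h in this era,
reproduced from memory (details may differ slightly). It shows how the
packet fast path turns the stored .pcnt value back into this CPU's counter:

	/* Resolve e->counters into the counter for this CPU: on SMP,
	 * .pcnt holds a percpu address; on UP it is the counter itself. */
	static inline struct xt_counters *
	xt_get_this_cpu_counter(struct xt_counters *cnt)
	{
		if (nr_cpu_ids > 1)
			return this_cpu_ptr((void __percpu *)(unsigned long)cnt->pcnt);

		return cnt;
	}
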
* Re: netfilter question

From: Eric Dumazet @ 2016-11-17 2:34 UTC
To: Florian Westphal; +Cc: Eric Dumazet, Eric Desrochers, netfilter-devel

On Thu, 2016-11-17 at 01:07 +0100, Florian Westphal wrote:

Seems very nice!

> +void xt_percpu_counter_free(struct xt_counters *counters)
> +{
> +	unsigned long pcnt = counters->pcnt;
> +
> +	if (nr_cpu_ids > 1 && (pcnt & (PAGE_SIZE - 1)) == 0)
> +		free_percpu((void __percpu *) (unsigned long)pcnt);
> +}

pcnt is already an "unsigned long", so the inner cast is redundant.

This packing might also speed up "iptables -nvL" dumps ;)

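A sketch of the free helper with that redundant cast dropped (illustrative,
not necessarily the final merged version):

	void xt_percpu_counter_free(struct xt_counters *counters)
	{
		unsigned long pcnt = counters->pcnt;

		/* Only the entry whose counter sits at the start of the
		 * shared percpu block owns the allocation and may free it. */
		if (nr_cpu_ids > 1 && (pcnt & (PAGE_SIZE - 1)) == 0)
			free_percpu((void __percpu *)pcnt);
	}
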
* Re: netfilter question

From: Eric Desrochers @ 2016-11-17 15:49 UTC
To: Florian Westphal, Eric Dumazet; +Cc: Eric Dumazet, netfilter-devel

Hi Florian,

Thanks for the quick response. I will give it a try and get back to you
with the outcome of my test.

On 2016-11-17 01:07 AM, Florian Westphal wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> [..]
> > Free it if the address is aligned to a page boundary?
>
> Good idea. This seems to work for me. Eric (Desrochers), does this
> improve the situation for you as well?
>
> [..]

* Re: netfilter question

From: Eric Dumazet @ 2016-11-20 6:33 UTC
To: Florian Westphal; +Cc: Eric Dumazet, Eric Desrochers, netfilter-devel

On Thu, 2016-11-17 at 01:07 +0100, Florian Westphal wrote:

> +	if (state->mem == NULL) {
> +		state->mem = __alloc_percpu(PAGE_SIZE, PAGE_SIZE);
> +		if (!state->mem)
> +			return false;
> +	}

This will fail on arches where PAGE_SIZE = 65536: the percpu allocator's
limit is PCPU_MIN_UNIT_SIZE (32 KB).

So maybe use a smaller value like 4096?

#define XT_PCPU_BLOCK_SIZE 4096

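A sketch of the adjustment being suggested here, applied to the allocator
from the patch above. The constant name follows the suggestion in the
message; using it for the alignment and the BUILD_BUG_ON as well is an
assumption, and the free side would then mask with (XT_PCPU_BLOCK_SIZE - 1)
instead of (PAGE_SIZE - 1):

	/* Allocate counters in fixed 4 KB blocks rather than PAGE_SIZE ones,
	 * so the percpu request stays below PCPU_MIN_UNIT_SIZE (32 KB) even
	 * on architectures with 64 KB pages. */
	#define XT_PCPU_BLOCK_SIZE 4096

	bool xt_percpu_counter_alloc(struct xt_percpu_counter_alloc_state *state,
				     struct xt_counters *counter)
	{
		BUILD_BUG_ON(XT_PCPU_BLOCK_SIZE < (sizeof(*counter) * 2));

		if (nr_cpu_ids <= 1)
			return true;

		if (!state->mem) {
			state->mem = __alloc_percpu(XT_PCPU_BLOCK_SIZE,
						    XT_PCPU_BLOCK_SIZE);
			if (!state->mem)
				return false;
		}
		counter->pcnt = (__force unsigned long)(state->mem + state->off);
		state->off += sizeof(*counter);
		if (state->off > (XT_PCPU_BLOCK_SIZE - sizeof(*counter))) {
			state->mem = NULL;
			state->off = 0;
		}
		return true;
	}
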
[parent not found: <CAGUFhKwQTRRJpfGi2fRkFfGdpLYMN-2F9G+dEsavM7UGbkjjdA@mail.gmail.com>]
* Re: netfilter question

From: Eric Dumazet @ 2016-11-20 17:31 UTC
To: Eric D; +Cc: Eric Dumazet, netfilter-devel, Florian Westphal

On Sun, 2016-11-20 at 12:22 -0500, Eric D wrote:
> I'm currently abroad for work and will come back home soon. I will
> test the solution and provide feedback to Florian by the end of the week.
>
> Thanks for jumping on this quickly.
>
> Eric
>
> On Nov 20, 2016 7:33 AM, "Eric Dumazet" <eric.dumazet@gmail.com> wrote:
> [..]
> > So maybe use a smaller value like 4096?
> >
> > #define XT_PCPU_BLOCK_SIZE 4096

Thanks Eric. I will test the patch myself, because I believe we need it
asap ;)

* Re: netfilter question

From: Eric Dumazet @ 2016-11-20 17:55 UTC
To: Eric D; +Cc: Eric Dumazet, netfilter-devel, Florian Westphal

On Sun, 2016-11-20 at 09:31 -0800, Eric Dumazet wrote:
> Thanks Eric. I will test the patch myself, because I believe we need it
> asap ;)

Current net-next without Florian's patch:

lpaa24:~# time for f in `seq 1 2000` ; do iptables -A FORWARD ; done

real	0m12.856s
user	0m0.590s
sys	0m11.131s

perf report:

    47.45%  iptables  [kernel.kallsyms]  [k] pcpu_alloc_area
     8.49%  iptables  [kernel.kallsyms]  [k] memset_erms
     7.35%  iptables  [kernel.kallsyms]  [k] get_counters
     2.87%  iptables  [kernel.kallsyms]  [k] __memmove
     2.33%  iptables  [kernel.kallsyms]  [k] pcpu_alloc
     2.07%  iptables  [kernel.kallsyms]  [k] _find_next_bit.part.0
     1.62%  iptables  xtables-multi      [.] 0x000000000001bb9d
     1.25%  iptables  [kernel.kallsyms]  [k] page_fault
     1.01%  iptables  [kernel.kallsyms]  [k] memcmp
     0.94%  iptables  [kernel.kallsyms]  [k] translate_table
     0.76%  iptables  [kernel.kallsyms]  [k] find_next_bit
     0.73%  iptables  [kernel.kallsyms]  [k] filemap_map_pages
     0.68%  iptables  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
     0.54%  iptables  [kernel.kallsyms]  [k] __get_user_8
     0.54%  iptables  [kernel.kallsyms]  [k] clear_page_c_e

After the patch:

lpaa24:~# time for f in `seq 1 2000` ; do iptables -A FORWARD ; done

real	0m3.867s
user	0m0.559s
sys	0m2.216s

    22.15%  iptables  [kernel.kallsyms]  [k] get_counters
     5.85%  iptables  xtables-multi      [.] 0x000000000001bbac
     3.99%  iptables  [kernel.kallsyms]  [k] page_fault
     2.37%  iptables  [kernel.kallsyms]  [k] memcmp
     2.19%  iptables  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
     1.89%  iptables  [kernel.kallsyms]  [k] translate_table
     1.78%  iptables  [kernel.kallsyms]  [k] memset_erms
     1.74%  iptables  [kernel.kallsyms]  [k] clear_page_c_e
     1.73%  iptables  [kernel.kallsyms]  [k] __get_user_8
     1.72%  iptables  [kernel.kallsyms]  [k] perf_iterate_ctx
     1.21%  iptables  [kernel.kallsyms]  [k] handle_mm_fault
     0.98%  iptables  [kernel.kallsyms]  [k] unmap_page_range

So this is a huge win. And I suspect the data path will also gain from all
pcpu counters being in the same area of memory (this is where I am most
interested).

* netfilter & ipv6 @ 2005-02-08 7:50 Jonas Berlin [not found] ` <53965.213.236.112.75.1107867276.squirrel@213.236.112.75> 0 siblings, 1 reply; 10+ messages in thread From: Jonas Berlin @ 2005-02-08 7:50 UTC (permalink / raw) To: netfilter-devel Hi! I have found that there seems to be a bunch of match & target modules for iptables that don't exist for ip6tables.. Like CLASSIFY for example. And some that exist have only a subset of the features. Also some modules exist only for linux 2.4 but were never ported to 2.6 or don't compile cleanly anymore on the newest 2.6 kernels. While I don't have that much experience of 2.6 yet and only have been hacking some smaller modules for 2.4 in the past, I think I should have enough skills to at least port/update some of these. So I thought maybe I could help out with this task - ipv6 is something that I hope and think will take over ipv4 some day and I'd like for linux to be able to provide the same experience then as ipv4 users of iptables have today. I have some questions.. 1. Some modules (will) look almost identical on ipv4 on and ipv6 - would there be any point in making the ipv4 modules export some symbols and then re-use those methods in ipv6 if it wouldn't imply any (major) changes to the ipv4 modules? 2. I think it could be nice to keep track of what modules exist for ipv4 and ipv6 on some web page and maybe also let people sign up to join the porting effort (if there are any =). What do you think? And is that something I could help out with as well? I have a server at home that could run it, however behind an 0,5M adsl line.. 3. If I succeed porting something, what should I name the patch to be sent to patch-o-matic-ng? I don't think I could just extend the existing ones as that would suggest the original authors have something to do with it and also some of them also are pending or already included in the ekernel. So, would CLASSIFY_v6 be an acceptable name for the ipv6 version of CLASSIFY? And maybe condition_26 for the 2.6 port of condition from 2.4? What's your general procedure if someone have updates to a module that someone else did? Should the original authors be contacted instead of sending patches to this list? Any suggestions you have are welcome! -- - xkr47 ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <53965.213.236.112.75.1107867276.squirrel@213.236.112.75>]
* ULOG target for ipv6

From: Jonas Berlin @ 2005-02-10 23:15 UTC
To: Martijn Lievaart, netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 678 bytes --]

Martijn Lievaart wrote:
> What I mostly miss in ipv6 support is conntrack and ULOG.

Here's my first attempt at ULOG for ipv6. It's _untested_, but it compiles
at least on 2.6, and it also patches fine against 2.4.28.

As a side note - I have no idea how a (the) userspace daemon will react to
getting ipv6 packets. :) Feel free to send me any problems you encounter
with the kernel module or iptables extension, though.

Good luck :)

--------

Someone could also point me to the latest instructions on how to
contribute - the online docs I found here didn't work very well:
http://netfilter.org/documentation/HOWTO//netfilter-extensions-HOWTO-9.html

-- 
- xkr47

[-- Attachment #2: ipv6-ULOG-1.tar.bz2 --]
[-- Type: application/octet-stream, Size: 5806 bytes --]

* netfilter question

From: Pedro Fortuna @ 2005-02-11 22:10 UTC
To: netfilter-devel

Hello guys. I'll try to make this as short and simple as I can.

I want to develop a kernel module which will be running on two Linux hosts
connected by a crossover network cable (ethernet). This kernel module will
intercept a specific type of traffic (as an example, let's say FTP packets
encapsulated in DIX frames), both incoming and outgoing, and change the
ethertype in the frame header.

Outgoing DIX frames carrying FTP packets get their ethertype changed to a
private, non-standard ethertype number just before they leave the host
(i.e. before they are passed to the network driver). The frame is
intercepted with the NF_IP_POST_ROUTING hook.

Incoming DIX frames carrying FTP packets get their ethertype (at this
point, a non-standard ethertype number) changed back to the standard IPv4
ethertype number (i.e. 0x800) just after they are processed by the network
driver. The frame is intercepted with the NF_IP_PRE_ROUTING hook.

My doubt is: I'm not sure I will be able to intercept the incoming frames,
because they have a non-standard ethertype number. They might get dropped
before passing through the NF_IP_PRE_ROUTING hook, due to the unrecognized
ethertype number. Is this true or false? If the frame reaches the hook
before the stack tries to identify the packet type, then I'll have no
trouble, because my netfilter module changes the frame back to the
original ethertype number, thus making the whole process transparent to
the TCP/IP stacks running on both hosts.

I could explain what the hell I need this for, but then you would have
three times more text to read :P I tried to restrict this post to a
minimum painless size.

Regards,
-Pedro Fortuna

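For reference, a minimal sketch of the POST_ROUTING side described above,
using the 2.6-era netfilter API; the module, function, and ethertype names
here are hypothetical. As to the question itself: NF_IP_PRE_ROUTING is
invoked from inside the IPv4 receive path, which a frame only enters after
eth_type_trans() has dispatched it by ethertype, so incoming frames with
an unrecognized ethertype would be dropped before reaching the hook. A
module that needs to receive such frames would instead register a
link-layer packet handler for that ethertype with dev_add_pack().

	#include <linux/module.h>
	#include <linux/skbuff.h>
	#include <linux/netfilter.h>
	#include <linux/netfilter_ipv4.h>

	#define ETH_P_PRIVATE 0x88b5	/* hypothetical private ethertype */

	/* Runs just before the packet is handed to the driver; a real
	 * module would match FTP traffic here and rewrite the link-layer
	 * header / skb->protocol accordingly. */
	static unsigned int ethertype_rewrite(unsigned int hooknum,
					      struct sk_buff **pskb,
					      const struct net_device *in,
					      const struct net_device *out,
					      int (*okfn)(struct sk_buff *))
	{
		return NF_ACCEPT;
	}

	static struct nf_hook_ops rewrite_ops = {
		.hook     = ethertype_rewrite,
		.pf       = PF_INET,
		.hooknum  = NF_IP_POST_ROUTING,
		.priority = NF_IP_PRI_LAST,
	};

	static int __init rewrite_init(void)
	{
		return nf_register_hook(&rewrite_ops);
	}

	static void __exit rewrite_exit(void)
	{
		nf_unregister_hook(&rewrite_ops);
	}

	module_init(rewrite_init);
	module_exit(rewrite_exit);
	MODULE_LICENSE("GPL");
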
Thread overview: 10+ messages

[not found] <cad49557-7c7a-83c9-d2b6-71d9624f0d52@miromedia.ca>
2016-11-16 13:33 ` netfilter question  Eric Dumazet
2016-11-16 15:02   ` Florian Westphal
2016-11-16 15:23     ` Eric Dumazet
2016-11-17  0:07       ` Florian Westphal
2016-11-17  2:34         ` Eric Dumazet
2016-11-17 15:49         ` Eric Desrochers
2016-11-20  6:33         ` Eric Dumazet
[not found]                ` <CAGUFhKwQTRRJpfGi2fRkFfGdpLYMN-2F9G+dEsavM7UGbkjjdA@mail.gmail.com>
2016-11-20 17:31           ` Eric Dumazet
2016-11-20 17:55             ` Eric Dumazet

2005-02-08  7:50 netfilter & ipv6  Jonas Berlin
[not found] ` <53965.213.236.112.75.1107867276.squirrel@213.236.112.75>
2005-02-10 23:15   ` ULOG target for ipv6  Jonas Berlin
2005-02-11 22:10     ` netfilter question  Pedro Fortuna