netfilter-devel.vger.kernel.org archive mirror
* [PATCH 0/7] netfilter: connlimit: scalability improvements
@ 2014-03-07 13:37 Florian Westphal
  2014-03-07 13:37 ` [PATCH 1/7] netfilter: connlimit: factor hlist search into new function Florian Westphal
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: Florian Westphal @ 2014-03-07 13:37 UTC (permalink / raw)
  To: netfilter-devel

The connlimit match suffers from two problems:

- lock contention when multiple cpus invoke the match function
- algorithmic complexity: on average the connlimit match needs to
  traverse a list of length NUMBER_OF_CONNTRACKS / 256 (the number of
  hash buckets), as it has to test which entries are still active by
  querying conntrack.

This patch set tries to solve both issues.

Tested on 4-core machine, load was generated via synflood from
randomly-generated IP addresses.

Config:
sysctl net.nf_conntrack_max=256000
echo 65536 > /sys/module/nf_conntrack/parameters/hashsize

With conntrack but without any iptables rules, the machine is not
cpu-limited when flooding; the network is simply not able to handle more
packets (close to 100 kpps rx, 50 kpps outbound syn/acks).
RPS was disabled in this test.

When adding
-A INPUT -p tcp --syn -m connlimit --connlimit-above 5 --connlimit-mask 32 --connlimit-saddr

this changes: the entire test is now cpu-bound, and the kernel only
handles ~6 kpps rx.

Enabling RPS helps (at the cost of more cpus being busy), but rx still
maxes out at ~35 kpps.

A perf trace in the RPS-on test shows the lock contention:
+  20.84%   ksoftirqd/2  [kernel.kallsyms]              [k] _raw_spin_lock_bh
+  20.76%   ksoftirqd/1  [kernel.kallsyms]              [k] _raw_spin_lock_bh
+  20.42%   ksoftirqd/0  [kernel.kallsyms]              [k] _raw_spin_lock_bh
+   6.07%   ksoftirqd/2  [nf_conntrack]                 [k] ____nf_conntrack_find
+   6.07%   ksoftirqd/1  [nf_conntrack]                 [k] ____nf_conntrack_find
+   5.97%   ksoftirqd/0  [nf_conntrack]                 [k] ____nf_conntrack_find
+   2.47%   ksoftirqd/2  [nf_conntrack]                 [k] hash_conntrack_raw
+   2.45%   ksoftirqd/0  [nf_conntrack]                 [k] hash_conntrack_raw
+   2.44%   ksoftirqd/1  [nf_conntrack]                 [k] hash_conntrack_raw

With keyed locks the contention goes away, providing some improvement
(50 kpps rx, 10 kpps tx):
+  20.95%  ksoftirqd/0  [nf_conntrack]                 [k] ____nf_conntrack_find
+  20.50%  ksoftirqd/1  [nf_conntrack]                 [k] ____nf_conntrack_find
+  20.27%  ksoftirqd/2  [nf_conntrack]                 [k] ____nf_conntrack_find
+   5.76%  ksoftirqd/1  [nf_conntrack]                 [k] hash_conntrack_raw
+   5.39%  ksoftirqd/2  [nf_conntrack]                 [k] hash_conntrack_raw
+   5.35%  ksoftirqd/0  [nf_conntrack]                 [k] hash_conntrack_raw
+   2.00%  ksoftirqd/1  [kernel.kallsyms]              [k] __rcu_read_unlock
+   1.95%  ksoftirqd/0  [kernel.kallsyms]              [k] __rcu_read_unlock
+   1.86%  ksoftirqd/2  [kernel.kallsyms]              [k] __rcu_read_unlock
+   1.14%  ksoftirqd/0  [nf_conntrack]                 [k] __nf_conntrack_find_get
+   1.14%  ksoftirqd/2  [nf_conntrack]                 [k] __nf_conntrack_find_get
+   1.05%  ksoftirqd/1  [nf_conntrack]                 [k] __nf_conntrack_find_get

With rbtree-based storage (and keyed locks), however, we can handle
*almost* the same load as without the rule (90 kpps rx, 51 kpps outbound):

+  17.24%       swapper  [nf_conntrack]                 [k] ____nf_conntrack_find
+   6.60%   ksoftirqd/2  [nf_conntrack]                 [k] ____nf_conntrack_find
+   2.73%       swapper  [nf_conntrack]                 [k] hash_conntrack_raw
+   2.36%       swapper  [xt_connlimit]                 [k] count_tree
+   2.23%       swapper  [nf_conntrack]                 [k] __nf_conntrack_confirm
+   2.00%       swapper  [kernel.kallsyms]              [k] _raw_spin_lock
+   1.40%       swapper  [nf_conntrack]                 [k] __nf_conntrack_find_get
+   1.29%       swapper  [kernel.kallsyms]              [k] __rcu_read_unlock
+   1.13%       swapper  [kernel.kallsyms]              [k] _raw_spin_lock_bh
+   1.13%   ksoftirqd/2  [nf_conntrack]                 [k] hash_conntrack_raw
+   1.06%       swapper  [kernel.kallsyms]              [k] sha_transform

Comments welcome.

These changes may also be pulled from

git://git.breakpoint.cc/fw/nf-next connlimit_18

Florian Westphal (7):
      netfilter: connlimit: factor hlist search into new function
      netfilter: connlimit: improve packet-to-closed-connection logic
      netfilter: connlimit: move insertion of new element out of count function
      netfilter: connlimit: use kmem_cache for conn objects
      netfilter: connlimit: use keyed locks
      netfilter: connlimit: make same_source_net signed
      netfilter: connlimit: use rbtree for per-host conntrack obj storage

 net/netfilter/xt_connlimit.c | 300 +++++++++++++++++++++++++++++++++----------
 1 file changed, 231 insertions(+), 69 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/7] netfilter: connlimit: factor hlist search into new function
  2014-03-07 13:37 [PATCH 0/7] netfilter: connlimit: scalability improvements Florian Westphal
@ 2014-03-07 13:37 ` Florian Westphal
  2014-03-07 13:37 ` [PATCH 2/7] netfilter: connlimit: improve packet-to-closed-connection logic Florian Westphal
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2014-03-07 13:37 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

This simplifies the followup patch that introduces separate locks for
each of the hash slots.

Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/xt_connlimit.c | 49 +++++++++++++++++++++++++++++---------------
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
index c40b269..6988818 100644
--- a/net/netfilter/xt_connlimit.c
+++ b/net/netfilter/xt_connlimit.c
@@ -92,30 +92,24 @@ same_source_net(const union nf_inet_addr *addr,
 	}
 }
 
-static int count_them(struct net *net,
-		      struct xt_connlimit_data *data,
-		      const struct nf_conntrack_tuple *tuple,
-		      const union nf_inet_addr *addr,
-		      const union nf_inet_addr *mask,
-		      u_int8_t family)
+static int count_hlist(struct net *net,
+		       struct hlist_head *head,
+		       const struct nf_conntrack_tuple *tuple,
+		       const union nf_inet_addr *addr,
+		       const union nf_inet_addr *mask,
+		       u_int8_t family)
 {
 	const struct nf_conntrack_tuple_hash *found;
 	struct xt_connlimit_conn *conn;
 	struct hlist_node *n;
 	struct nf_conn *found_ct;
-	struct hlist_head *hash;
 	bool addit = true;
 	int matches = 0;
 
-	if (family == NFPROTO_IPV6)
-		hash = &data->iphash[connlimit_iphash6(addr, mask)];
-	else
-		hash = &data->iphash[connlimit_iphash(addr->ip & mask->ip)];
-
 	rcu_read_lock();
 
 	/* check the saved connections */
-	hlist_for_each_entry_safe(conn, n, hash, node) {
+	hlist_for_each_entry_safe(conn, n, head, node) {
 		found    = nf_conntrack_find_get(net, NF_CT_DEFAULT_ZONE,
 						 &conn->tuple);
 		found_ct = NULL;
@@ -166,13 +160,38 @@ static int count_them(struct net *net,
 			return -ENOMEM;
 		conn->tuple = *tuple;
 		conn->addr = *addr;
-		hlist_add_head(&conn->node, hash);
+		hlist_add_head(&conn->node, head);
 		++matches;
 	}
 
 	return matches;
 }
 
+static int count_them(struct net *net,
+		      struct xt_connlimit_data *data,
+		      const struct nf_conntrack_tuple *tuple,
+		      const union nf_inet_addr *addr,
+		      const union nf_inet_addr *mask,
+		      u_int8_t family)
+{
+	struct hlist_head *hhead;
+	int count;
+	u32 hash;
+
+	if (family == NFPROTO_IPV6)
+		hash = connlimit_iphash6(addr, mask);
+	else
+		hash = connlimit_iphash(addr->ip & mask->ip);
+
+	hhead = &data->iphash[hash];
+
+	spin_lock_bh(&data->lock);
+	count = count_hlist(net, hhead, tuple, addr, mask, family);
+	spin_unlock_bh(&data->lock);
+
+	return count;
+}
+
 static bool
 connlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
 {
@@ -202,10 +221,8 @@ connlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
 			  iph->daddr : iph->saddr;
 	}
 
-	spin_lock_bh(&info->data->lock);
 	connections = count_them(net, info->data, tuple_ptr, &addr,
 	                         &info->mask, par->family);
-	spin_unlock_bh(&info->data->lock);
 
 	if (connections < 0)
 		/* kmalloc failed, drop it entirely */
-- 
1.8.1.5



* [PATCH 2/7] netfilter: connlimit: improve packet-to-closed-connection logic
  2014-03-07 13:37 [PATCH 0/7] netfilter: connlimit: scalability improvements Florian Westphal
  2014-03-07 13:37 ` [PATCH 1/7] netfilter: connlimit: factor hlist search into new function Florian Westphal
@ 2014-03-07 13:37 ` Florian Westphal
  2014-03-07 13:37 ` [PATCH 3/7] netfilter: connlimit: move insertion of new element out of count function Florian Westphal
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2014-03-07 13:37 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Instead of freeing the entry from our list and then adding it back
again in the 'packet to a closed connection' case, just keep the
matching entry around.  Also drop the found_ct != NULL test, as
nf_ct_tuplehash_to_ctrack() is just container_of() and thus cannot
return NULL for a non-NULL argument.

Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/xt_connlimit.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
index 6988818..d4c6db1 100644
--- a/net/netfilter/xt_connlimit.c
+++ b/net/netfilter/xt_connlimit.c
@@ -112,29 +112,22 @@ static int count_hlist(struct net *net,
 	hlist_for_each_entry_safe(conn, n, head, node) {
 		found    = nf_conntrack_find_get(net, NF_CT_DEFAULT_ZONE,
 						 &conn->tuple);
-		found_ct = NULL;
+		if (found == NULL) {
+			hlist_del(&conn->node);
+			kfree(conn);
+			continue;
+		}
 
-		if (found != NULL)
-			found_ct = nf_ct_tuplehash_to_ctrack(found);
+		found_ct = nf_ct_tuplehash_to_ctrack(found);
 
-		if (found_ct != NULL &&
-		    nf_ct_tuple_equal(&conn->tuple, tuple) &&
-		    !already_closed(found_ct))
+		if (nf_ct_tuple_equal(&conn->tuple, tuple)) {
 			/*
 			 * Just to be sure we have it only once in the list.
 			 * We should not see tuples twice unless someone hooks
 			 * this into a table without "-p tcp --syn".
 			 */
 			addit = false;
-
-		if (found == NULL) {
-			/* this one is gone */
-			hlist_del(&conn->node);
-			kfree(conn);
-			continue;
-		}
-
-		if (already_closed(found_ct)) {
+		} else if (already_closed(found_ct)) {
 			/*
 			 * we do not care about connections which are
 			 * closed already -> ditch it
-- 
1.8.1.5



* [PATCH 3/7] netfilter: connlimit: move insertion of new element out of count function
  2014-03-07 13:37 [PATCH 0/7] netfilter: connlimit: scalability improvements Florian Westphal
  2014-03-07 13:37 ` [PATCH 1/7] netfilter: connlimit: factor hlist search into new function Florian Westphal
  2014-03-07 13:37 ` [PATCH 2/7] netfilter: connlimit: improve packet-to-closed-connection logic Florian Westphal
@ 2014-03-07 13:37 ` Florian Westphal
  2014-03-07 13:37 ` [PATCH 4/7] netfilter: connlimit: use kmem_cache for conn objects Florian Westphal
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2014-03-07 13:37 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Allows easier code-reuse in followup patches.

Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/xt_connlimit.c | 38 +++++++++++++++++++++++---------------
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
index d4c6db1..0220d40 100644
--- a/net/netfilter/xt_connlimit.c
+++ b/net/netfilter/xt_connlimit.c
@@ -97,13 +97,12 @@ static int count_hlist(struct net *net,
 		       const struct nf_conntrack_tuple *tuple,
 		       const union nf_inet_addr *addr,
 		       const union nf_inet_addr *mask,
-		       u_int8_t family)
+		       u_int8_t family, bool *addit)
 {
 	const struct nf_conntrack_tuple_hash *found;
 	struct xt_connlimit_conn *conn;
 	struct hlist_node *n;
 	struct nf_conn *found_ct;
-	bool addit = true;
 	int matches = 0;
 
 	rcu_read_lock();
@@ -126,7 +125,7 @@ static int count_hlist(struct net *net,
 			 * We should not see tuples twice unless someone hooks
 			 * this into a table without "-p tcp --syn".
 			 */
-			addit = false;
+			*addit = false;
 		} else if (already_closed(found_ct)) {
 			/*
 			 * we do not care about connections which are
@@ -146,20 +145,22 @@ static int count_hlist(struct net *net,
 
 	rcu_read_unlock();
 
-	if (addit) {
-		/* save the new connection in our list */
-		conn = kmalloc(sizeof(*conn), GFP_ATOMIC);
-		if (conn == NULL)
-			return -ENOMEM;
-		conn->tuple = *tuple;
-		conn->addr = *addr;
-		hlist_add_head(&conn->node, head);
-		++matches;
-	}
-
 	return matches;
 }
 
+static bool add_hlist(struct hlist_head *head,
+		      const struct nf_conntrack_tuple *tuple,
+		      const union nf_inet_addr *addr)
+{
+	struct xt_connlimit_conn *conn = kmalloc(sizeof(*conn), GFP_ATOMIC);
+	if (conn == NULL)
+		return false;
+	conn->tuple = *tuple;
+	conn->addr = *addr;
+	hlist_add_head(&conn->node, head);
+	return true;
+}
+
 static int count_them(struct net *net,
 		      struct xt_connlimit_data *data,
 		      const struct nf_conntrack_tuple *tuple,
@@ -170,6 +171,7 @@ static int count_them(struct net *net,
 	struct hlist_head *hhead;
 	int count;
 	u32 hash;
+	bool addit = true;
 
 	if (family == NFPROTO_IPV6)
 		hash = connlimit_iphash6(addr, mask);
@@ -179,7 +181,13 @@ static int count_them(struct net *net,
 	hhead = &data->iphash[hash];
 
 	spin_lock_bh(&data->lock);
-	count = count_hlist(net, hhead, tuple, addr, mask, family);
+	count = count_hlist(net, hhead, tuple, addr, mask, family, &addit);
+	if (addit) {
+		if (add_hlist(hhead, tuple, addr))
+			count++;
+		else
+			count = -ENOMEM;
+	}
 	spin_unlock_bh(&data->lock);
 
 	return count;
-- 
1.8.1.5



* [PATCH 4/7] netfilter: connlimit: use kmem_cache for conn objects
  2014-03-07 13:37 [PATCH 0/7] netfilter: connlimit: scalability improvements Florian Westphal
                   ` (2 preceding siblings ...)
  2014-03-07 13:37 ` [PATCH 3/7] netfilter: connlimit: move insertion of new element out of count function Florian Westphal
@ 2014-03-07 13:37 ` Florian Westphal
  2014-03-07 13:37 ` [PATCH 5/7] netfilter: connlimit: use keyed locks Florian Westphal
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2014-03-07 13:37 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

We might allocate thousands of these objects (one per connection).
Use a distinct kmem cache to permit simple tracking of how many
objects are currently used by the connlimit match via sysfs.

Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/xt_connlimit.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
index 0220d40..a8eaabb 100644
--- a/net/netfilter/xt_connlimit.c
+++ b/net/netfilter/xt_connlimit.c
@@ -44,6 +44,7 @@ struct xt_connlimit_data {
 };
 
 static u_int32_t connlimit_rnd __read_mostly;
+static struct kmem_cache *connlimit_conn_cachep __read_mostly;
 
 static inline unsigned int connlimit_iphash(__be32 addr)
 {
@@ -113,7 +114,7 @@ static int count_hlist(struct net *net,
 						 &conn->tuple);
 		if (found == NULL) {
 			hlist_del(&conn->node);
-			kfree(conn);
+			kmem_cache_free(connlimit_conn_cachep, conn);
 			continue;
 		}
 
@@ -133,7 +134,7 @@ static int count_hlist(struct net *net,
 			 */
 			nf_ct_put(found_ct);
 			hlist_del(&conn->node);
-			kfree(conn);
+			kmem_cache_free(connlimit_conn_cachep, conn);
 			continue;
 		}
 
@@ -152,7 +153,9 @@ static bool add_hlist(struct hlist_head *head,
 		      const struct nf_conntrack_tuple *tuple,
 		      const union nf_inet_addr *addr)
 {
-	struct xt_connlimit_conn *conn = kmalloc(sizeof(*conn), GFP_ATOMIC);
+	struct xt_connlimit_conn *conn;
+
+	conn = kmem_cache_alloc(connlimit_conn_cachep, GFP_ATOMIC);
 	if (conn == NULL)
 		return false;
 	conn->tuple = *tuple;
@@ -285,7 +288,7 @@ static void connlimit_mt_destroy(const struct xt_mtdtor_param *par)
 	for (i = 0; i < ARRAY_SIZE(info->data->iphash); ++i) {
 		hlist_for_each_entry_safe(conn, n, &hash[i], node) {
 			hlist_del(&conn->node);
-			kfree(conn);
+			kmem_cache_free(connlimit_conn_cachep, conn);
 		}
 	}
 
@@ -305,12 +308,23 @@ static struct xt_match connlimit_mt_reg __read_mostly = {
 
 static int __init connlimit_mt_init(void)
 {
-	return xt_register_match(&connlimit_mt_reg);
+	int ret;
+	connlimit_conn_cachep = kmem_cache_create("xt_connlimit_conn",
+					   sizeof(struct xt_connlimit_conn),
+					   0, 0, NULL);
+	if (!connlimit_conn_cachep)
+		return -ENOMEM;
+
+	ret = xt_register_match(&connlimit_mt_reg);
+	if (ret != 0)
+		kmem_cache_destroy(connlimit_conn_cachep);
+	return ret;
 }
 
 static void __exit connlimit_mt_exit(void)
 {
 	xt_unregister_match(&connlimit_mt_reg);
+	kmem_cache_destroy(connlimit_conn_cachep);
 }
 
 module_init(connlimit_mt_init);
-- 
1.8.1.5



* [PATCH 5/7] netfilter: connlimit: use keyed locks
  2014-03-07 13:37 [PATCH 0/7] netfilter: connlimit: scalability improvements Florian Westphal
                   ` (3 preceding siblings ...)
  2014-03-07 13:37 ` [PATCH 4/7] netfilter: connlimit: use kmem_cache for conn objects Florian Westphal
@ 2014-03-07 13:37 ` Florian Westphal
  2014-03-09 17:13   ` Jan Engelhardt
  2014-03-07 13:37 ` [PATCH 6/7] netfilter: connlimit: make same_source_net signed Florian Westphal
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2014-03-07 13:37 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

connlimit currently suffers from spinlock contention, example for
4-core system with rps enabled:

+  20.84%   ksoftirqd/2  [kernel.kallsyms] [k] _raw_spin_lock_bh
+  20.76%   ksoftirqd/1  [kernel.kallsyms] [k] _raw_spin_lock_bh
+  20.42%   ksoftirqd/0  [kernel.kallsyms] [k] _raw_spin_lock_bh
+   6.07%   ksoftirqd/2  [nf_conntrack]    [k] ____nf_conntrack_find
+   6.07%   ksoftirqd/1  [nf_conntrack]    [k] ____nf_conntrack_find
+   5.97%   ksoftirqd/0  [nf_conntrack]    [k] ____nf_conntrack_find
+   2.47%   ksoftirqd/2  [nf_conntrack]    [k] hash_conntrack_raw
+   2.45%   ksoftirqd/0  [nf_conntrack]    [k] hash_conntrack_raw
+   2.44%   ksoftirqd/1  [nf_conntrack]    [k] hash_conntrack_raw

Use an array of keyed locks instead of a single lock; this allows
parallel lookup/insert/delete when entries are hashed to different
lock slots.  With the patch:

+  20.95%  ksoftirqd/0  [nf_conntrack] [k] ____nf_conntrack_find
+  20.50%  ksoftirqd/1  [nf_conntrack] [k] ____nf_conntrack_find
+  20.27%  ksoftirqd/2  [nf_conntrack] [k] ____nf_conntrack_find
+   5.76%  ksoftirqd/1  [nf_conntrack] [k] hash_conntrack_raw
+   5.39%  ksoftirqd/2  [nf_conntrack] [k] hash_conntrack_raw
+   5.35%  ksoftirqd/0  [nf_conntrack] [k] hash_conntrack_raw
+   2.00%  ksoftirqd/1  [kernel.kallsyms] [k] __rcu_read_unlock

This improves the rx processing rate from ~35 kpps to ~50 kpps.

Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/xt_connlimit.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
index a8eaabb..892566f 100644
--- a/net/netfilter/xt_connlimit.c
+++ b/net/netfilter/xt_connlimit.c
@@ -31,6 +31,9 @@
 #include <net/netfilter/nf_conntrack_tuple.h>
 #include <net/netfilter/nf_conntrack_zones.h>
 
+#define CONNLIMIT_SLOTS	256 /* power-of-two */
+#define CONNLIMIT_LOCK_SLOTS	32 /* power-of-two */
+
 /* we will save the tuples of all connections we care about */
 struct xt_connlimit_conn {
 	struct hlist_node		node;
@@ -39,8 +42,8 @@ struct xt_connlimit_conn {
 };
 
 struct xt_connlimit_data {
-	struct hlist_head	iphash[256];
-	spinlock_t		lock;
+	struct hlist_head	iphash[CONNLIMIT_SLOTS];
+	spinlock_t		locks[CONNLIMIT_LOCK_SLOTS];
 };
 
 static u_int32_t connlimit_rnd __read_mostly;
@@ -48,7 +51,8 @@ static struct kmem_cache *connlimit_conn_cachep __read_mostly;
 
 static inline unsigned int connlimit_iphash(__be32 addr)
 {
-	return jhash_1word((__force __u32)addr, connlimit_rnd) & 0xFF;
+	return jhash_1word((__force __u32)addr,
+			    connlimit_rnd) % CONNLIMIT_SLOTS;
 }
 
 static inline unsigned int
@@ -61,7 +65,8 @@ connlimit_iphash6(const union nf_inet_addr *addr,
 	for (i = 0; i < ARRAY_SIZE(addr->ip6); ++i)
 		res.ip6[i] = addr->ip6[i] & mask->ip6[i];
 
-	return jhash2((u32 *)res.ip6, ARRAY_SIZE(res.ip6), connlimit_rnd) & 0xFF;
+	return jhash2((u32 *)res.ip6, ARRAY_SIZE(res.ip6),
+		       connlimit_rnd) % CONNLIMIT_SLOTS;
 }
 
 static inline bool already_closed(const struct nf_conn *conn)
@@ -183,7 +188,7 @@ static int count_them(struct net *net,
 
 	hhead = &data->iphash[hash];
 
-	spin_lock_bh(&data->lock);
+	spin_lock_bh(&data->locks[hash % CONNLIMIT_LOCK_SLOTS]);
 	count = count_hlist(net, hhead, tuple, addr, mask, family, &addit);
 	if (addit) {
 		if (add_hlist(hhead, tuple, addr))
@@ -191,7 +196,7 @@ static int count_them(struct net *net,
 		else
 			count = -ENOMEM;
 	}
-	spin_unlock_bh(&data->lock);
+	spin_unlock_bh(&data->locks[hash % CONNLIMIT_LOCK_SLOTS]);
 
 	return count;
 }
@@ -227,7 +232,6 @@ connlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
 
 	connections = count_them(net, info->data, tuple_ptr, &addr,
 	                         &info->mask, par->family);
-
 	if (connections < 0)
 		/* kmalloc failed, drop it entirely */
 		goto hotdrop;
@@ -268,7 +272,9 @@ static int connlimit_mt_check(const struct xt_mtchk_param *par)
 		return -ENOMEM;
 	}
 
-	spin_lock_init(&info->data->lock);
+	for (i = 0; i < ARRAY_SIZE(info->data->locks); ++i)
+		spin_lock_init(&info->data->locks[i]);
+
 	for (i = 0; i < ARRAY_SIZE(info->data->iphash); ++i)
 		INIT_HLIST_HEAD(&info->data->iphash[i]);
 
@@ -309,6 +315,9 @@ static struct xt_match connlimit_mt_reg __read_mostly = {
 static int __init connlimit_mt_init(void)
 {
 	int ret;
+
+	BUILD_BUG_ON(CONNLIMIT_LOCK_SLOTS > CONNLIMIT_SLOTS);
+
 	connlimit_conn_cachep = kmem_cache_create("xt_connlimit_conn",
 					   sizeof(struct xt_connlimit_conn),
 					   0, 0, NULL);
-- 
1.8.1.5



* [PATCH 6/7] netfilter: connlimit: make same_source_net signed
  2014-03-07 13:37 [PATCH 0/7] netfilter: connlimit: scalability improvements Florian Westphal
                   ` (4 preceding siblings ...)
  2014-03-07 13:37 ` [PATCH 5/7] netfilter: connlimit: use keyed locks Florian Westphal
@ 2014-03-07 13:37 ` Florian Westphal
  2014-03-07 13:37 ` [PATCH 7/7] netfilter: connlimit: use rbtree for per-host conntrack obj storage Florian Westphal
  2014-03-12 12:58 ` [PATCH 0/7] netfilter: connlimit: scalability improvements Pablo Neira Ayuso
  7 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2014-03-07 13:37 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

same_source_net() currently returns 1 if the addresses are in the same
network.  Make it work like memcmp/strcmp so it can be used as an
rbtree search function.

Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/xt_connlimit.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
index 892566f..d3a83a9 100644
--- a/net/netfilter/xt_connlimit.c
+++ b/net/netfilter/xt_connlimit.c
@@ -78,13 +78,14 @@ static inline bool already_closed(const struct nf_conn *conn)
 		return 0;
 }
 
-static inline unsigned int
+static int
 same_source_net(const union nf_inet_addr *addr,
 		const union nf_inet_addr *mask,
 		const union nf_inet_addr *u3, u_int8_t family)
 {
 	if (family == NFPROTO_IPV4) {
-		return (addr->ip & mask->ip) == (u3->ip & mask->ip);
+		return ntohl(addr->ip & mask->ip) -
+		       ntohl(u3->ip & mask->ip);
 	} else {
 		union nf_inet_addr lh, rh;
 		unsigned int i;
@@ -94,7 +95,7 @@ same_source_net(const union nf_inet_addr *addr,
 			rh.ip6[i] = u3->ip6[i] & mask->ip6[i];
 		}
 
-		return memcmp(&lh.ip6, &rh.ip6, sizeof(lh.ip6)) == 0;
+		return memcmp(&lh.ip6, &rh.ip6, sizeof(lh.ip6));
 	}
 }
 
@@ -143,7 +144,7 @@ static int count_hlist(struct net *net,
 			continue;
 		}
 
-		if (same_source_net(addr, mask, &conn->addr, family))
+		if (same_source_net(addr, mask, &conn->addr, family) == 0)
 			/* same source network -> be counted! */
 			++matches;
 		nf_ct_put(found_ct);
-- 
1.8.1.5



* [PATCH 7/7] netfilter: connlimit: use rbtree for per-host conntrack obj storage
  2014-03-07 13:37 [PATCH 0/7] netfilter: connlimit: scalability improvements Florian Westphal
                   ` (5 preceding siblings ...)
  2014-03-07 13:37 ` [PATCH 6/7] netfilter: connlimit: make same_source_net signed Florian Westphal
@ 2014-03-07 13:37 ` Florian Westphal
  2014-03-07 14:47   ` Eric Dumazet
  2014-03-12 12:58 ` [PATCH 0/7] netfilter: connlimit: scalability improvements Pablo Neira Ayuso
  7 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2014-03-07 13:37 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

With the current match design, every invocation of the connlimit match
function has to perform (number_of_conntracks / 256) lookups on
average in the conntrack table [to GC/delete stale entries].
This is also the reason why ____nf_conntrack_find() accounts for
> 20% cpu time per core in perf top.

This patch changes the storage to rbtree which cuts down the number of
ct objects that need testing.

When looking up a new tuple, we only test the connections of the host
objects we visit while searching for the wanted host/network (or
the leaf we need to insert at).

The slot count is reduced to 32.  Increasing the slot count doesn't
speed things up much because of the rbtree's logarithmic nature.

before patch (50kpps rx, 10kpps tx):
+  20.95%  ksoftirqd/0  [nf_conntrack] [k] ____nf_conntrack_find
+  20.50%  ksoftirqd/1  [nf_conntrack] [k] ____nf_conntrack_find
+  20.27%  ksoftirqd/2  [nf_conntrack] [k] ____nf_conntrack_find
+   5.76%  ksoftirqd/1  [nf_conntrack] [k] hash_conntrack_raw
+   5.39%  ksoftirqd/2  [nf_conntrack] [k] hash_conntrack_raw
+   5.35%  ksoftirqd/0  [nf_conntrack] [k] hash_conntrack_raw

after (90kpps, 51kpps tx):
+  17.24%       swapper  [nf_conntrack]    [k] ____nf_conntrack_find
+   6.60%   ksoftirqd/2  [nf_conntrack]    [k] ____nf_conntrack_find
+   2.73%       swapper  [nf_conntrack]    [k] hash_conntrack_raw
+   2.36%       swapper  [xt_connlimit]    [k] count_tree

Obvious disadvantages compared to the previous version are the
increased code complexity and memory cost.

Partially based on Eric Dumazet's fq scheduler.

Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/xt_connlimit.c | 214 +++++++++++++++++++++++++++++++++----------
 1 file changed, 167 insertions(+), 47 deletions(-)

diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
index d3a83a9..190bbf2 100644
--- a/net/netfilter/xt_connlimit.c
+++ b/net/netfilter/xt_connlimit.c
@@ -19,6 +19,7 @@
 #include <linux/jhash.h>
 #include <linux/slab.h>
 #include <linux/list.h>
+#include <linux/rbtree.h>
 #include <linux/module.h>
 #include <linux/random.h>
 #include <linux/skbuff.h>
@@ -31,8 +32,9 @@
 #include <net/netfilter/nf_conntrack_tuple.h>
 #include <net/netfilter/nf_conntrack_zones.h>
 
-#define CONNLIMIT_SLOTS	256 /* power-of-two */
+#define CONNLIMIT_SLOTS		32 /* power-of-two */
 #define CONNLIMIT_LOCK_SLOTS	32 /* power-of-two */
+#define CONNLIMIT_GC_MAX_NODES	16
 
 /* we will save the tuples of all connections we care about */
 struct xt_connlimit_conn {
@@ -41,12 +43,20 @@ struct xt_connlimit_conn {
 	union nf_inet_addr		addr;
 };
 
+struct xt_connlimit_rb {
+	struct rb_node node;
+	struct hlist_head hhead; /* connections/hosts in same subnet */
+	union nf_inet_addr addr; /* search key */
+};
+
 struct xt_connlimit_data {
-	struct hlist_head	iphash[CONNLIMIT_SLOTS];
+	struct rb_root climit_root4[CONNLIMIT_SLOTS];
+	struct rb_root climit_root6[CONNLIMIT_SLOTS];
 	spinlock_t		locks[CONNLIMIT_LOCK_SLOTS];
 };
 
 static u_int32_t connlimit_rnd __read_mostly;
+static struct kmem_cache *connlimit_rb_cachep __read_mostly;
 static struct kmem_cache *connlimit_conn_cachep __read_mostly;
 
 static inline unsigned int connlimit_iphash(__be32 addr)
@@ -99,19 +109,33 @@ same_source_net(const union nf_inet_addr *addr,
 	}
 }
 
-static int count_hlist(struct net *net,
-		       struct hlist_head *head,
-		       const struct nf_conntrack_tuple *tuple,
-		       const union nf_inet_addr *addr,
-		       const union nf_inet_addr *mask,
-		       u_int8_t family, bool *addit)
+static bool add_hlist(struct hlist_head *head,
+		      const struct nf_conntrack_tuple *tuple,
+		      const union nf_inet_addr *addr)
+{
+	struct xt_connlimit_conn *conn;
+
+	conn = kmem_cache_alloc(connlimit_conn_cachep, GFP_ATOMIC);
+	if (conn == NULL)
+		return false;
+	conn->tuple = *tuple;
+	conn->addr = *addr;
+	hlist_add_head(&conn->node, head);
+	return true;
+}
+
+static unsigned int check_hlist(struct net *net,
+				struct hlist_head *head,
+				const struct nf_conntrack_tuple *tuple,
+				bool *addit)
 {
 	const struct nf_conntrack_tuple_hash *found;
 	struct xt_connlimit_conn *conn;
 	struct hlist_node *n;
 	struct nf_conn *found_ct;
-	int matches = 0;
+	unsigned int length = 0;
 
+	*addit = true;
 	rcu_read_lock();
 
 	/* check the saved connections */
@@ -144,30 +168,104 @@ static int count_hlist(struct net *net,
 			continue;
 		}
 
-		if (same_source_net(addr, mask, &conn->addr, family) == 0)
-			/* same source network -> be counted! */
-			++matches;
 		nf_ct_put(found_ct);
+		length++;
 	}
 
 	rcu_read_unlock();
 
-	return matches;
+	return length;
 }
 
-static bool add_hlist(struct hlist_head *head,
-		      const struct nf_conntrack_tuple *tuple,
-		      const union nf_inet_addr *addr)
+static void tree_nodes_free(struct rb_root *root,
+			    struct xt_connlimit_rb *gc_nodes[],
+			    unsigned int gc_count)
+{
+	struct xt_connlimit_rb *rbconn;
+
+	while (gc_count) {
+		rbconn = gc_nodes[--gc_count];
+		rb_erase(&rbconn->node, root);
+		kmem_cache_free(connlimit_rb_cachep, rbconn);
+	}
+}
+
+static unsigned int
+count_tree(struct net *net, struct rb_root *root,
+	   const struct nf_conntrack_tuple *tuple,
+	   const union nf_inet_addr *addr, const union nf_inet_addr *mask,
+	   u8 family)
 {
+	struct xt_connlimit_rb *gc_nodes[CONNLIMIT_GC_MAX_NODES];
+	struct rb_node **rbnode, *parent = NULL;
+	struct xt_connlimit_rb *rbconn;
 	struct xt_connlimit_conn *conn;
+	unsigned int count = 0;
+	unsigned int gc_count = 0;
+
+	rbnode = &(root->rb_node);
+
+	while (*rbnode) {
+		int diff;
+		bool addit;
+
+		rbconn = container_of(*rbnode, struct xt_connlimit_rb, node);
+
+		parent = *rbnode;
+		diff = same_source_net(addr, mask, &rbconn->addr, family);
+		if (diff < 0) {
+			rbnode = &((*rbnode)->rb_left);
+		} else if (diff > 0) {
+			rbnode = &((*rbnode)->rb_right);
+		} else {
+			/* same source network -> be counted! */
+			count = check_hlist(net, &rbconn->hhead, tuple, &addit);
+
+			tree_nodes_free(root, gc_nodes, gc_count);
+			if (!addit)
+				return count;
+
+			if (!add_hlist(&rbconn->hhead, tuple, addr))
+				return 0; /* hotdrop */
+
+			return count + 1;
+		}
+
+		if (gc_count >= ARRAY_SIZE(gc_nodes))
+			continue;
+
+		/* only used for GC on hhead, retval and 'addit' ignored */
+		check_hlist(net, &rbconn->hhead, tuple, &addit);
+		if (hlist_empty(&rbconn->hhead)) {
+			/* node with no connections, prep for removal */
+			gc_nodes[gc_count++] = rbconn;
+		}
+	}
+
+	/* no match, need to insert new node */
+	rbconn = kmem_cache_alloc(connlimit_rb_cachep, GFP_ATOMIC);
+	if (rbconn == NULL)
+		goto out;
 
 	conn = kmem_cache_alloc(connlimit_conn_cachep, GFP_ATOMIC);
-	if (conn == NULL)
-		return false;
+	if (conn == NULL) {
+		kmem_cache_free(connlimit_rb_cachep, rbconn);
+		goto out;
+	}
+
 	conn->tuple = *tuple;
 	conn->addr = *addr;
-	hlist_add_head(&conn->node, head);
-	return true;
+	rbconn->addr = *addr;
+
+	INIT_HLIST_HEAD(&rbconn->hhead);
+	hlist_add_head(&conn->node, &rbconn->hhead);
+
+	rb_link_node(&rbconn->node, parent, rbnode);
+	rb_insert_color(&rbconn->node, root);
+	count = 1;
+ out:
+	tree_nodes_free(root, gc_nodes, gc_count);
+	return count;
 }
 
 static int count_them(struct net *net,
@@ -177,26 +275,22 @@ static int count_them(struct net *net,
 		      const union nf_inet_addr *mask,
 		      u_int8_t family)
 {
-	struct hlist_head *hhead;
+	struct rb_root *root;
 	int count;
 	u32 hash;
-	bool addit = true;
 
-	if (family == NFPROTO_IPV6)
+	if (family == NFPROTO_IPV6) {
 		hash = connlimit_iphash6(addr, mask);
-	else
+		root = &data->climit_root6[hash];
+	} else {
 		hash = connlimit_iphash(addr->ip & mask->ip);
-
-	hhead = &data->iphash[hash];
+		root = &data->climit_root4[hash];
+	}
 
 	spin_lock_bh(&data->locks[hash % CONNLIMIT_LOCK_SLOTS]);
-	count = count_hlist(net, hhead, tuple, addr, mask, family, &addit);
-	if (addit) {
-		if (add_hlist(hhead, tuple, addr))
-			count++;
-		else
-			count = -ENOMEM;
-	}
+
+	count = count_tree(net, root, tuple, addr, mask, family);
+
 	spin_unlock_bh(&data->locks[hash % CONNLIMIT_LOCK_SLOTS]);
 
 	return count;
@@ -212,7 +306,7 @@ connlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	const struct nf_conntrack_tuple *tuple_ptr = &tuple;
 	enum ip_conntrack_info ctinfo;
 	const struct nf_conn *ct;
-	int connections;
+	unsigned int connections;
 
 	ct = nf_ct_get(skb, &ctinfo);
 	if (ct != NULL)
@@ -233,7 +327,7 @@ connlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
 
 	connections = count_them(net, info->data, tuple_ptr, &addr,
 	                         &info->mask, par->family);
-	if (connections < 0)
+	if (connections == 0)
 		/* kmalloc failed, drop it entirely */
 		goto hotdrop;
 
@@ -276,28 +370,44 @@ static int connlimit_mt_check(const struct xt_mtchk_param *par)
 	for (i = 0; i < ARRAY_SIZE(info->data->locks); ++i)
 		spin_lock_init(&info->data->locks[i]);
 
-	for (i = 0; i < ARRAY_SIZE(info->data->iphash); ++i)
-		INIT_HLIST_HEAD(&info->data->iphash[i]);
+	for (i = 0; i < ARRAY_SIZE(info->data->climit_root4); ++i)
+		info->data->climit_root4[i] = RB_ROOT;
+	for (i = 0; i < ARRAY_SIZE(info->data->climit_root6); ++i)
+		info->data->climit_root6[i] = RB_ROOT;
 
 	return 0;
 }
 
-static void connlimit_mt_destroy(const struct xt_mtdtor_param *par)
+static void destroy_tree(struct rb_root *r)
 {
-	const struct xt_connlimit_info *info = par->matchinfo;
 	struct xt_connlimit_conn *conn;
+	struct xt_connlimit_rb *rbconn;
 	struct hlist_node *n;
-	struct hlist_head *hash = info->data->iphash;
-	unsigned int i;
+	struct rb_node *node;
 
-	nf_ct_l3proto_module_put(par->family);
+	while ((node = rb_first(r)) != NULL) {
+		rbconn = container_of(node, struct xt_connlimit_rb, node);
 
-	for (i = 0; i < ARRAY_SIZE(info->data->iphash); ++i) {
-		hlist_for_each_entry_safe(conn, n, &hash[i], node) {
-			hlist_del(&conn->node);
+		rb_erase(node, r);
+
+		hlist_for_each_entry_safe(conn, n, &rbconn->hhead, node)
 			kmem_cache_free(connlimit_conn_cachep, conn);
-		}
+
+		kmem_cache_free(connlimit_rb_cachep, rbconn);
 	}
+}
+
+static void connlimit_mt_destroy(const struct xt_mtdtor_param *par)
+{
+	const struct xt_connlimit_info *info = par->matchinfo;
+	unsigned int i;
+
+	nf_ct_l3proto_module_put(par->family);
+
+	for (i = 0; i < ARRAY_SIZE(info->data->climit_root4); ++i)
+		destroy_tree(&info->data->climit_root4[i]);
+	for (i = 0; i < ARRAY_SIZE(info->data->climit_root6); ++i)
+		destroy_tree(&info->data->climit_root6[i]);
 
 	kfree(info->data);
 }
@@ -325,9 +435,18 @@ static int __init connlimit_mt_init(void)
 	if (!connlimit_conn_cachep)
 		return -ENOMEM;
 
+	connlimit_rb_cachep = kmem_cache_create("xt_connlimit_rb",
+					   sizeof(struct xt_connlimit_rb),
+					   0, 0, NULL);
+	if (!connlimit_rb_cachep) {
+		kmem_cache_destroy(connlimit_conn_cachep);
+		return -ENOMEM;
+	}
 	ret = xt_register_match(&connlimit_mt_reg);
-	if (ret != 0)
+	if (ret != 0) {
 		kmem_cache_destroy(connlimit_conn_cachep);
+		kmem_cache_destroy(connlimit_rb_cachep);
+	}
 	return ret;
 }
 
@@ -335,6 +454,7 @@ static void __exit connlimit_mt_exit(void)
 {
 	xt_unregister_match(&connlimit_mt_reg);
 	kmem_cache_destroy(connlimit_conn_cachep);
+	kmem_cache_destroy(connlimit_rb_cachep);
 }
 
 module_init(connlimit_mt_init);
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 7/7] netfilter: connlimit: use rbtree for per-host conntrack obj storage
  2014-03-07 13:37 ` [PATCH 7/7] netfilter: connlimit: use rbtree for per-host conntrack obj storage Florian Westphal
@ 2014-03-07 14:47   ` Eric Dumazet
  2014-03-07 16:15     ` Florian Westphal
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2014-03-07 14:47 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, 2014-03-07 at 14:37 +0100, Florian Westphal wrote:

> +	/* no match, need to insert new node */
> +	rbconn = kmem_cache_alloc(connlimit_rb_cachep, GFP_ATOMIC);
> +	if (rbconn == NULL)
> +		goto out;
>  
>  	conn = kmem_cache_alloc(connlimit_conn_cachep, GFP_ATOMIC);
> -	if (conn == NULL)
> -		return false;
> +	if (conn == NULL) {
> +		kmem_cache_free(connlimit_rb_cachep, rbconn);
> +		goto out;
> +	}
> +
>  	conn->tuple = *tuple;
>  	conn->addr = *addr;
> -	hlist_add_head(&conn->node, head);
> -	return true;
> +	rbconn->addr = *addr;
> +
> +	INIT_HLIST_HEAD(&rbconn->hhead);
> +	hlist_add_head(&conn->node, &rbconn->hhead);
> +
> +	rb_link_node(&rbconn->node, parent, rbnode);
> +	rb_insert_color(&rbconn->node, root);
> +	count = 1;
> + out:
> +	tree_nodes_free(root, gc_nodes, gc_count);
> +	return count;
>  }

Very nice work Florian

I would call tree_nodes_free() _before_ attempting the
kmem_cache_alloc(), so that the allocation can reuse a hot object that
you freed right before allocation.


* Re: [PATCH 7/7] netfilter: connlimit: use rbtree for per-host conntrack obj storage
  2014-03-07 14:47   ` Eric Dumazet
@ 2014-03-07 16:15     ` Florian Westphal
  2014-03-09 18:42       ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2014-03-07 16:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Florian Westphal, netfilter-devel

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2014-03-07 at 14:37 +0100, Florian Westphal wrote:
> 
> > +	/* no match, need to insert new node */
> > +	rbconn = kmem_cache_alloc(connlimit_rb_cachep, GFP_ATOMIC);
> > +	if (rbconn == NULL)
> > +		goto out;
> >  
> >  	conn = kmem_cache_alloc(connlimit_conn_cachep, GFP_ATOMIC);
> > -	if (conn == NULL)
> > -		return false;
> > +	if (conn == NULL) {
> > +		kmem_cache_free(connlimit_rb_cachep, rbconn);
> > +		goto out;
> > +	}
> > +
> >  	conn->tuple = *tuple;
> >  	conn->addr = *addr;
> > -	hlist_add_head(&conn->node, head);
> > -	return true;
> > +	rbconn->addr = *addr;
> > +
> > +	INIT_HLIST_HEAD(&rbconn->hhead);
> > +	hlist_add_head(&conn->node, &rbconn->hhead);
> > +
> > +	rb_link_node(&rbconn->node, parent, rbnode);
> > +	rb_insert_color(&rbconn->node, root);
> > +	count = 1;
> > + out:
> > +	tree_nodes_free(root, gc_nodes, gc_count);
> > +	return count;
> >  }
> 
> Very nice work Florian

Thanks Eric.

> I would call tree_nodes_free() _before_ attempting the
> kmem_cache_alloc(), so that the allocation can reuse a hot object that
> you freed right before allocation.

Hmm, that would be nice.  I need to think about it again;
the problem is that moving it up could result in
freeing the would-be parent of the new node.


* Re: [PATCH 5/7] netfilter: connlimit: use keyed locks
  2014-03-07 13:37 ` [PATCH 5/7] netfilter: connlimit: use keyed locks Florian Westphal
@ 2014-03-09 17:13   ` Jan Engelhardt
  2014-03-09 18:31     ` Florian Westphal
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Engelhardt @ 2014-03-09 17:13 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel


On Friday 2014-03-07 14:37, Florian Westphal wrote:
> 
>+#define CONNLIMIT_SLOTS	256 /* power-of-two */
>+#define CONNLIMIT_LOCK_SLOTS	32 /* power-of-two */

It's clear that 256 and 32 are powers of two. Or did you intend to
write "these should be a power of two"? Which would then raise the
question of whether they really need to be a power of two.

Given

> static inline unsigned int connlimit_iphash(__be32 addr)
> {
>-	return jhash_1word((__force __u32)addr, connlimit_rnd) & 0xFF;
>+	return jhash_1word((__force __u32)addr,
>+			    connlimit_rnd) % CONNLIMIT_SLOTS;
> }
> 
>@@ -183,7 +188,7 @@ static int count_them(struct net *net,
> 
> 	hhead = &data->iphash[hash];
> 
>-	spin_lock_bh(&data->lock);
>+	spin_lock_bh(&data->locks[hash % CONNLIMIT_LOCK_SLOTS]);
> 	count = count_hlist(net, hhead, tuple, addr, mask, family, &addit);
> 	if (addit) {
> 		if (add_hlist(hhead, tuple, addr))

it would seem that it is sufficient to have CONNLIMIT_SLOTS be
a multiple of CONNLIMIT_LOCK_SLOTS and

> {
> 	int ret;
>+
>+	BUILD_BUG_ON(CONNLIMIT_LOCK_SLOTS > CONNLIMIT_SLOTS);
>+

be followed by

	BUILD_BUG_ON(CONNLIMIT_SLOTS % CONNLIMIT_LOCK_SLOTS != 0);


* Re: [PATCH 5/7] netfilter: connlimit: use keyed locks
  2014-03-09 17:13   ` Jan Engelhardt
@ 2014-03-09 18:31     ` Florian Westphal
  0 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2014-03-09 18:31 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Florian Westphal, netfilter-devel

Jan Engelhardt <jengelh@inai.de> wrote:
> On Friday 2014-03-07 14:37, Florian Westphal wrote:
> > 
> >+#define CONNLIMIT_SLOTS	256 /* power-of-two */
> >+#define CONNLIMIT_LOCK_SLOTS	32 /* power-of-two */
> 
> It's clear that 256 and 32 are powers of two. Or did you intend to
> write "these should be a power of two"? Which would then raise the
> question if they really need to be a power of two.

Right, the intent was to avoid a mod instruction.

> 	BUILD_BUG_ON(CONNLIMIT_SLOTS % CONNLIMIT_LOCK_SLOTS != 0);

I like it.  Will change, thanks for suggesting this.


* Re: [PATCH 7/7] netfilter: connlimit: use rbtree for per-host conntrack obj storage
  2014-03-07 16:15     ` Florian Westphal
@ 2014-03-09 18:42       ` Eric Dumazet
  2014-03-09 18:43         ` Florian Westphal
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2014-03-09 18:42 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, 2014-03-07 at 17:15 +0100, Florian Westphal wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:

> > I would call tree_nodes_free() _before_ attempting the
> > kmem_cache_alloc(), so that the allocation can reuse a hot object that
> > you freed right before allocation.
> 
> Hmm, that would be nice.  I need to think about it again,
> problem is that moving it at this time could result in
> freeing the would-be parent of the new node.

Yeah, that's why fq_gc() is followed by a full lookup.

In practice, the lookup done in fq_gc() brings in cpu cache all the
cache lines, and second lookup is very fast.


* Re: [PATCH 7/7] netfilter: connlimit: use rbtree for per-host conntrack obj storage
  2014-03-09 18:42       ` Eric Dumazet
@ 2014-03-09 18:43         ` Florian Westphal
  2014-03-09 21:45           ` Florian Westphal
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2014-03-09 18:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Florian Westphal, netfilter-devel

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Hmm, that would be nice.  I need to think about it again,
> > problem is that moving it at this time could result in
> > freeing the would-be parent of the new node.
> 
> Yeah, that's why fq_gc() is followed by a full lookup.
> 
> In practice, the lookup done in fq_gc() brings in cpu cache all the
> cache lines, and second lookup is very fast.

I had wondered about this.  Ok, that makes sense.
I'll change it to be more like fq.

Thanks for explaining this.


* Re: [PATCH 7/7] netfilter: connlimit: use rbtree for per-host conntrack obj storage
  2014-03-09 18:43         ` Florian Westphal
@ 2014-03-09 21:45           ` Florian Westphal
  0 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2014-03-09 21:45 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Eric Dumazet, netfilter-devel

Florian Westphal <fw@strlen.de> wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > Hmm, that would be nice.  I need to think about it again,
> > > problem is that moving it at this time could result in
> > > freeing the would-be parent of the new node.
> > 
> > Yeah, that's why fq_gc() is followed by a full lookup.
> > 
> > In practice, the lookup done in fq_gc() brings in cpu cache all the
> > cache lines, and second lookup is very fast.
> 
> I had wondered about this.  Ok, that makes sense.
> I'll change it to be more like fq.
> 
> Thanks for explaining this.

Not exactly pretty, but in most cases the restart won't be needed
(either because we found the node to add to, or there was no stale node to remove).

I'll fold the following into patch #7, but will wait until Tuesday
before resend in order to give others a chance to comment.

diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
--- a/net/netfilter/xt_connlimit.c
+++ b/net/netfilter/xt_connlimit.c
@@ -202,7 +202,9 @@ count_tree(struct net *net, struct rb_root *root,
 	struct xt_connlimit_conn *conn;
 	unsigned int count = 0;
 	unsigned int gc_count = 0;
+	bool no_gc = false;
 
+ restart:
 	rbnode = &(root->rb_node);
 	while (*rbnode) {
 		int diff;
@@ -230,7 +232,7 @@ count_tree(struct net *net, struct rb_root *root,
 			return count + 1;
 		}
 
-		if (gc_count >= ARRAY_SIZE(gc_nodes))
+		if (no_gc || gc_count >= ARRAY_SIZE(gc_nodes))
 			continue;
 
 		/* only used for GC on hhead, retval and 'addit' ignored */
@@ -239,15 +241,22 @@ count_tree(struct net *net, struct rb_root *root,
 			gc_nodes[gc_count++] = rbconn;
 	}
 
+	if (gc_count) {
+		no_gc = true;
+		tree_nodes_free(root, gc_nodes, gc_count);
+		gc_count = 0;
+		goto restart;
+	}
+
 	/* no match, need to insert new node */
 	rbconn = kmem_cache_alloc(connlimit_rb_cachep, GFP_ATOMIC);
 	if (rbconn == NULL)
-		goto out;
+		return 0;
 
 	conn = kmem_cache_alloc(connlimit_conn_cachep, GFP_ATOMIC);
 	if (conn == NULL) {
 		kmem_cache_free(connlimit_rb_cachep, rbconn);
-		goto out;
+		return 0;
 	}
 
 	conn->tuple = *tuple;
@@ -259,10 +268,7 @@ count_tree(struct net *net, struct rb_root *root,
 
 	rb_link_node(&rbconn->node, parent, rbnode);
 	rb_insert_color(&rbconn->node, root);
-	count = 1;
- out:
-	tree_nodes_free(root, gc_nodes, gc_count);
-	return count;
+	return 1;
 }
 
 static int count_them(struct net *net,


* Re: [PATCH 0/7] netfilter: connlimit: scalability improvements
  2014-03-07 13:37 [PATCH 0/7] netfilter: connlimit: scalability improvements Florian Westphal
                   ` (6 preceding siblings ...)
  2014-03-07 13:37 ` [PATCH 7/7] netfilter: connlimit: use rbtree for per-host conntrack obj storage Florian Westphal
@ 2014-03-12 12:58 ` Pablo Neira Ayuso
  7 siblings, 0 replies; 16+ messages in thread
From: Pablo Neira Ayuso @ 2014-03-12 12:58 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, Mar 07, 2014 at 02:37:08PM +0100, Florian Westphal wrote:
> Florian Westphal (7):
>       netfilter: connlimit: factor hlist search into new function
>       netfilter: connlimit: improve packet-to-closed-connection logic
>       netfilter: connlimit: move insertion of new element out of count function
>       netfilter: connlimit: use kmem_cache for conn objects

Unless you have any concerns, I'm going to apply patches 1 to 4, which
look good to me. You can send me the three remaining patches after you
have addressed the remaining issues.

Thanks.

>       netfilter: connlimit: use keyed locks
>       netfilter: connlimit: make same_source_net signed
>       netfilter: connlimit: use rbtree for per-host conntrack obj storage

