* [PATCH v2 0/3] netfilter: connlimit: scalability improvements
@ 2014-03-12 22:49 Florian Westphal
2014-03-12 22:49 ` [PATCH v2 1/3] netfilter: connlimit: use keyed locks Florian Westphal
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Florian Westphal @ 2014-03-12 22:49 UTC (permalink / raw)
To: netfilter-devel
Resending the last three patches of the set; I have addressed
the comments I've received. See the individual patches for what
changed vs. v1.
I've done a brief re-test with 2 hours of synflooding,
nf_conntrack_max=2000000, and a conntrack -F flush every 10 seconds,
and did not encounter any issues.
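For reference, the periodic flush was just a shell loop along the
lines of:

  while true; do conntrack -F; sleep 10; done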
I am copying the original v1 cover letter below.
The connlimit match suffers from two problems:
- lock contention when multiple cpus invoke the match function
- algorithmic complexity: on average the connlimit match needs to
  examine NUMBER_OF_CONNTRACKS / HASH_BUCKETS (the bucket count is
  fixed at 256) connections, because it tests, for every connection
  assigned to the same bucket as the new one, whether the conntrack
  is still active.
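(As a worked example with the test config used below: nf_conntrack_max
of 256000 and 256 buckets mean that, on average, each match invocation
walks 256000 / 256 = 1000 saved tuples, performing a conntrack table
lookup for each.)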
This patch set tries to solve both issues.
Tested on a 4-core machine; load was generated via synflood from
randomly-generated IP addresses.
Config:
sysctl net.nf_conntrack_max=256000
echo 65536 > /sys/module/nf_conntrack/parameters/hashsize
With conntrack enabled but without any iptables rules, the machine is
not cpu limited when flooding; the network simply cannot handle more
packets (close to 100 kpps rx, 50 kpps outbound syn/acks).
RPS was disabled in this test.
When adding
-A INPUT -p tcp --syn -m connlimit --connlimit-above 5 --connlimit-mask 32 --connlimit-saddr
this changes: the entire test becomes cpu-bound and we can only handle
~6 kpps rx and ~4 kpps tx.
Enabling rps helps (echo 7 > /sys/class/net/eth0/queues/rx-0/rps_cpus),
at the cost of more cpu cycles, but we still max out at ~35 kpps rx.
perf top in this case shows the lock contention:
+ 20.84% ksoftirqd/2 [kernel.kallsyms] [k] _raw_spin_lock_bh
+ 20.76% ksoftirqd/1 [kernel.kallsyms] [k] _raw_spin_lock_bh
+ 20.42% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_lock_bh
+ 6.07% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find
+ 6.07% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find
+ 5.97% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find
+ 2.47% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw
+ 2.45% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw
+ 2.44% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw
With keyed locks the contention goes away, providing some improvement
(50 kpps rx, 10 kpps tx):
+ 20.95% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find
+ 20.50% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find
+ 20.27% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find
+ 5.76% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw
+ 5.39% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw
+ 5.35% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw
+ 2.00% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_unlock
+ 1.95% ksoftirqd/0 [kernel.kallsyms] [k] __rcu_read_unlock
+ 1.86% ksoftirqd/2 [kernel.kallsyms] [k] __rcu_read_unlock
+ 1.14% ksoftirqd/0 [nf_conntrack] [k] __nf_conntrack_find_get
+ 1.14% ksoftirqd/2 [nf_conntrack] [k] __nf_conntrack_find_get
+ 1.05% ksoftirqd/1 [nf_conntrack] [k] __nf_conntrack_find_get
With rbtree-based storage (and keyed locks), however, we can handle
*almost* the same load as without the rule (90 kpps rx, 51 kpps outbound):
+ 17.24% swapper [nf_conntrack] [k] ____nf_conntrack_find
+ 6.60% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find
+ 2.73% swapper [nf_conntrack] [k] hash_conntrack_raw
+ 2.36% swapper [xt_connlimit] [k] count_tree
+ 2.23% swapper [nf_conntrack] [k] __nf_conntrack_confirm
+ 2.00% swapper [kernel.kallsyms] [k] _raw_spin_lock
+ 1.40% swapper [nf_conntrack] [k] __nf_conntrack_find_get
+ 1.29% swapper [kernel.kallsyms] [k] __rcu_read_unlock
+ 1.13% swapper [kernel.kallsyms] [k] _raw_spin_lock_bh
+ 1.13% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw
+ 1.06% swapper [kernel.kallsyms] [k] sha_transform
xt_connlimit.c | 259 ++++++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 200 insertions(+), 59 deletions(-)
* [PATCH v2 1/3] netfilter: connlimit: use keyed locks
2014-03-12 22:49 [PATCH v2 0/3] netfilter: connlimit: scalability improvements Florian Westphal
@ 2014-03-12 22:49 ` Florian Westphal
2014-03-12 22:49 ` [PATCH v2 2/3] netfilter: connlimit: make same_source_net signed Florian Westphal
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2014-03-12 22:49 UTC (permalink / raw)
To: netfilter-devel; +Cc: Florian Westphal
connlimit currently suffers from spinlock contention; example profile
for a 4-core system with rps enabled:
+ 20.84% ksoftirqd/2 [kernel.kallsyms] [k] _raw_spin_lock_bh
+ 20.76% ksoftirqd/1 [kernel.kallsyms] [k] _raw_spin_lock_bh
+ 20.42% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_lock_bh
+ 6.07% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find
+ 6.07% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find
+ 5.97% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find
+ 2.47% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw
+ 2.45% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw
+ 2.44% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw
Use an array of spinlocks keyed by the hash slot, allowing parallel
lookup/insert/delete whenever the entries involved hash to different
slots. A minimal sketch of the idea (the helper below is illustrative
only; the struct and constants are the patch's own):
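  /* hypothetical helper: each of the 256 hash buckets maps onto one
   * of CONNLIMIT_LOCK_SLOTS (32) locks, so two cpus contend only when
   * their flows happen to share a lock slot.
   */
  static spinlock_t *bucket_lock(struct xt_connlimit_data *data,
                                 unsigned int hash)
  {
          return &data->locks[hash % CONNLIMIT_LOCK_SLOTS];
  }

With patch: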
+ 20.95% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find
+ 20.50% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find
+ 20.27% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find
+ 5.76% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw
+ 5.39% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw
+ 5.35% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw
+ 2.00% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_unlock
Improves the rx processing rate from ~35 kpps to ~50 kpps.
Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
Changes in v2:
Address Jan's comments wrt. CONNLIMIT_SLOTS being a power of 2
and add BUILD_BUG_ONs for it.
net/netfilter/xt_connlimit.c | 26 ++++++++++++++++++--------
1 file changed, 18 insertions(+), 8 deletions(-)
diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
index a8eaabb..ad290cc 100644
--- a/net/netfilter/xt_connlimit.c
+++ b/net/netfilter/xt_connlimit.c
@@ -31,6 +31,9 @@
#include <net/netfilter/nf_conntrack_tuple.h>
#include <net/netfilter/nf_conntrack_zones.h>
+#define CONNLIMIT_SLOTS 256
+#define CONNLIMIT_LOCK_SLOTS 32
+
/* we will save the tuples of all connections we care about */
struct xt_connlimit_conn {
struct hlist_node node;
@@ -39,8 +42,8 @@ struct xt_connlimit_conn {
};
struct xt_connlimit_data {
- struct hlist_head iphash[256];
- spinlock_t lock;
+ struct hlist_head iphash[CONNLIMIT_SLOTS];
+ spinlock_t locks[CONNLIMIT_LOCK_SLOTS];
};
static u_int32_t connlimit_rnd __read_mostly;
@@ -48,7 +51,8 @@ static struct kmem_cache *connlimit_conn_cachep __read_mostly;
static inline unsigned int connlimit_iphash(__be32 addr)
{
- return jhash_1word((__force __u32)addr, connlimit_rnd) & 0xFF;
+ return jhash_1word((__force __u32)addr,
+ connlimit_rnd) % CONNLIMIT_SLOTS;
}
static inline unsigned int
@@ -61,7 +65,8 @@ connlimit_iphash6(const union nf_inet_addr *addr,
for (i = 0; i < ARRAY_SIZE(addr->ip6); ++i)
res.ip6[i] = addr->ip6[i] & mask->ip6[i];
- return jhash2((u32 *)res.ip6, ARRAY_SIZE(res.ip6), connlimit_rnd) & 0xFF;
+ return jhash2((u32 *)res.ip6, ARRAY_SIZE(res.ip6),
+ connlimit_rnd) % CONNLIMIT_SLOTS;
}
static inline bool already_closed(const struct nf_conn *conn)
@@ -183,7 +188,7 @@ static int count_them(struct net *net,
hhead = &data->iphash[hash];
- spin_lock_bh(&data->lock);
+ spin_lock_bh(&data->locks[hash % CONNLIMIT_LOCK_SLOTS]);
count = count_hlist(net, hhead, tuple, addr, mask, family, &addit);
if (addit) {
if (add_hlist(hhead, tuple, addr))
@@ -191,7 +196,7 @@ static int count_them(struct net *net,
else
count = -ENOMEM;
}
- spin_unlock_bh(&data->lock);
+ spin_unlock_bh(&data->locks[hash % CONNLIMIT_LOCK_SLOTS]);
return count;
}
@@ -227,7 +232,6 @@ connlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
connections = count_them(net, info->data, tuple_ptr, &addr,
&info->mask, par->family);
-
if (connections < 0)
/* kmalloc failed, drop it entirely */
goto hotdrop;
@@ -268,7 +272,9 @@ static int connlimit_mt_check(const struct xt_mtchk_param *par)
return -ENOMEM;
}
- spin_lock_init(&info->data->lock);
+ for (i = 0; i < ARRAY_SIZE(info->data->locks); ++i)
+ spin_lock_init(&info->data->locks[i]);
+
for (i = 0; i < ARRAY_SIZE(info->data->iphash); ++i)
INIT_HLIST_HEAD(&info->data->iphash[i]);
@@ -309,6 +315,10 @@ static struct xt_match connlimit_mt_reg __read_mostly = {
static int __init connlimit_mt_init(void)
{
int ret;
+
+ BUILD_BUG_ON(CONNLIMIT_LOCK_SLOTS > CONNLIMIT_SLOTS);
+ BUILD_BUG_ON((CONNLIMIT_SLOTS % CONNLIMIT_LOCK_SLOTS) != 0);
+
connlimit_conn_cachep = kmem_cache_create("xt_connlimit_conn",
sizeof(struct xt_connlimit_conn),
0, 0, NULL);
--
1.8.1.5
* [PATCH v2 2/3] netfilter: connlimit: make same_source_net signed
2014-03-12 22:49 [PATCH v2 0/3] netfilter: connlimit: scalability improvements Florian Westphal
2014-03-12 22:49 ` [PATCH v2 1/3] netfilter: connlimit: use keyed locks Florian Westphal
@ 2014-03-12 22:49 ` Florian Westphal
2014-03-12 22:49 ` [PATCH v2 3/3] netfilter: connlimit: use rbtree for per-host conntrack obj storage Florian Westphal
2014-03-17 11:44 ` [PATCH v2 0/3] netfilter: connlimit: scalability improvements Pablo Neira Ayuso
3 siblings, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2014-03-12 22:49 UTC (permalink / raw)
To: netfilter-devel; +Cc: Florian Westphal
same_source_net() currently returns 1 if the addresses are within the
same (masked) network. Make it work like memcmp()/strcmp(), returning
<0/0/>0, so it can be used as an rbtree search function.
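For illustration, this gives the comparator contract the next patch
relies on when walking the tree (sketch mirroring the count_tree()
code added there; rbconn and node are that patch's variables):

  diff = same_source_net(addr, mask, &rbconn->addr, family);
  if (diff < 0)
          node = node->rb_left;    /* key sorts before this node */
  else if (diff > 0)
          node = node->rb_right;   /* key sorts after this node */
  else
          break;                   /* same source network: found */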
Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
no changes since v1.
net/netfilter/xt_connlimit.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
index ad290cc..dc5207f 100644
--- a/net/netfilter/xt_connlimit.c
+++ b/net/netfilter/xt_connlimit.c
@@ -78,13 +78,14 @@ static inline bool already_closed(const struct nf_conn *conn)
return 0;
}
-static inline unsigned int
+static int
same_source_net(const union nf_inet_addr *addr,
const union nf_inet_addr *mask,
const union nf_inet_addr *u3, u_int8_t family)
{
if (family == NFPROTO_IPV4) {
- return (addr->ip & mask->ip) == (u3->ip & mask->ip);
+ return ntohl(addr->ip & mask->ip) -
+ ntohl(u3->ip & mask->ip);
} else {
union nf_inet_addr lh, rh;
unsigned int i;
@@ -94,7 +95,7 @@ same_source_net(const union nf_inet_addr *addr,
rh.ip6[i] = u3->ip6[i] & mask->ip6[i];
}
- return memcmp(&lh.ip6, &rh.ip6, sizeof(lh.ip6)) == 0;
+ return memcmp(&lh.ip6, &rh.ip6, sizeof(lh.ip6));
}
}
@@ -143,7 +144,7 @@ static int count_hlist(struct net *net,
continue;
}
- if (same_source_net(addr, mask, &conn->addr, family))
+ if (same_source_net(addr, mask, &conn->addr, family) == 0)
/* same source network -> be counted! */
++matches;
nf_ct_put(found_ct);
--
1.8.1.5
* [PATCH v2 3/3] netfilter: connlimit: use rbtree for per-host conntrack obj storage
2014-03-12 22:49 [PATCH v2 0/3] netfilter: connlimit: scalability improvements Florian Westphal
2014-03-12 22:49 ` [PATCH v2 1/3] netfilter: connlimit: use keyed locks Florian Westphal
2014-03-12 22:49 ` [PATCH v2 2/3] netfilter: connlimit: make same_source_net signed Florian Westphal
@ 2014-03-12 22:49 ` Florian Westphal
2014-03-17 11:44 ` [PATCH v2 0/3] netfilter: connlimit: scalability improvements Pablo Neira Ayuso
3 siblings, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2014-03-12 22:49 UTC (permalink / raw)
To: netfilter-devel; +Cc: Florian Westphal
With the current match design, every invocation of the connlimit match
function means we have to perform (number_of_conntracks / 256) lookups
in the conntrack table on average [ to perform GC/delete stale
entries ].
This is also the reason why ____nf_conntrack_find() shows up in perf
top with > 20% cpu time per core.
This patch changes the storage to an rbtree, which cuts down the
number of ct objects that need testing.
When looking up a new tuple, we only test the connections of the host
objects we visit while searching for the wanted host/network (or
the leaf we need to insert at).
The slot count is reduced to 32; increasing it further doesn't speed
things up much because of the logarithmic nature of the rbtree.
The resulting per-rule layout is roughly (names as in the patch):
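  xt_connlimit_data
      climit_root4[hash] / climit_root6[hash]    (32 rb_roots each)
        `-> xt_connlimit_rb        (one tree node per source host/net,
               |                     keyed by the masked address)
               `-> hhead: hlist of xt_connlimit_conn  (the saved tuples)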
before patch (50 kpps rx, 10 kpps tx):
+ 20.95% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find
+ 20.50% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find
+ 20.27% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find
+ 5.76% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw
+ 5.39% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw
+ 5.35% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw
after (90 kpps rx, 51 kpps tx):
+ 17.24% swapper [nf_conntrack] [k] ____nf_conntrack_find
+ 6.60% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find
+ 2.73% swapper [nf_conntrack] [k] hash_conntrack_raw
+ 2.36% swapper [xt_connlimit] [k] count_tree
Obvious disadvantages compared to the previous version are the
increase in code complexity and the increased memory cost.
Partially based on Eric Dumazet's fq scheduler.
Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
Changes in v2:
Address Eric Dumazet's suggestion to free GC'd tree nodes
before allocating a new one.
This requires a restart of the lookup but, as Eric explained,
the second lookup will be fast since all the data is already in the
cache.
net/netfilter/xt_connlimit.c | 224 ++++++++++++++++++++++++++++++++++---------
1 file changed, 177 insertions(+), 47 deletions(-)
diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
index dc5207f..b1d43d4 100644
--- a/net/netfilter/xt_connlimit.c
+++ b/net/netfilter/xt_connlimit.c
@@ -19,6 +19,7 @@
#include <linux/jhash.h>
#include <linux/slab.h>
#include <linux/list.h>
+#include <linux/rbtree.h>
#include <linux/module.h>
#include <linux/random.h>
#include <linux/skbuff.h>
@@ -31,8 +32,9 @@
#include <net/netfilter/nf_conntrack_tuple.h>
#include <net/netfilter/nf_conntrack_zones.h>
-#define CONNLIMIT_SLOTS 256
+#define CONNLIMIT_SLOTS 32
#define CONNLIMIT_LOCK_SLOTS 32
+#define CONNLIMIT_GC_MAX_NODES 8
/* we will save the tuples of all connections we care about */
struct xt_connlimit_conn {
@@ -41,12 +43,20 @@ struct xt_connlimit_conn {
union nf_inet_addr addr;
};
+struct xt_connlimit_rb {
+ struct rb_node node;
+ struct hlist_head hhead; /* connections/hosts in same subnet */
+ union nf_inet_addr addr; /* search key */
+};
+
struct xt_connlimit_data {
- struct hlist_head iphash[CONNLIMIT_SLOTS];
+ struct rb_root climit_root4[CONNLIMIT_SLOTS];
+ struct rb_root climit_root6[CONNLIMIT_SLOTS];
spinlock_t locks[CONNLIMIT_LOCK_SLOTS];
};
static u_int32_t connlimit_rnd __read_mostly;
+static struct kmem_cache *connlimit_rb_cachep __read_mostly;
static struct kmem_cache *connlimit_conn_cachep __read_mostly;
static inline unsigned int connlimit_iphash(__be32 addr)
@@ -99,19 +109,33 @@ same_source_net(const union nf_inet_addr *addr,
}
}
-static int count_hlist(struct net *net,
- struct hlist_head *head,
- const struct nf_conntrack_tuple *tuple,
- const union nf_inet_addr *addr,
- const union nf_inet_addr *mask,
- u_int8_t family, bool *addit)
+static bool add_hlist(struct hlist_head *head,
+ const struct nf_conntrack_tuple *tuple,
+ const union nf_inet_addr *addr)
+{
+ struct xt_connlimit_conn *conn;
+
+ conn = kmem_cache_alloc(connlimit_conn_cachep, GFP_ATOMIC);
+ if (conn == NULL)
+ return false;
+ conn->tuple = *tuple;
+ conn->addr = *addr;
+ hlist_add_head(&conn->node, head);
+ return true;
+}
+
+static unsigned int check_hlist(struct net *net,
+ struct hlist_head *head,
+ const struct nf_conntrack_tuple *tuple,
+ bool *addit)
{
const struct nf_conntrack_tuple_hash *found;
struct xt_connlimit_conn *conn;
struct hlist_node *n;
struct nf_conn *found_ct;
- int matches = 0;
+ unsigned int length = 0;
+ *addit = true;
rcu_read_lock();
/* check the saved connections */
@@ -144,30 +168,114 @@ static int count_hlist(struct net *net,
continue;
}
- if (same_source_net(addr, mask, &conn->addr, family) == 0)
- /* same source network -> be counted! */
- ++matches;
nf_ct_put(found_ct);
+ length++;
}
rcu_read_unlock();
- return matches;
+ return length;
}
-static bool add_hlist(struct hlist_head *head,
- const struct nf_conntrack_tuple *tuple,
- const union nf_inet_addr *addr)
+static void tree_nodes_free(struct rb_root *root,
+ struct xt_connlimit_rb *gc_nodes[],
+ unsigned int gc_count)
+{
+ struct xt_connlimit_rb *rbconn;
+
+ while (gc_count) {
+ rbconn = gc_nodes[--gc_count];
+ rb_erase(&rbconn->node, root);
+ kmem_cache_free(connlimit_rb_cachep, rbconn);
+ }
+}
+
+static unsigned int
+count_tree(struct net *net, struct rb_root *root,
+ const struct nf_conntrack_tuple *tuple,
+ const union nf_inet_addr *addr, const union nf_inet_addr *mask,
+ u8 family)
{
+ struct xt_connlimit_rb *gc_nodes[CONNLIMIT_GC_MAX_NODES];
+ struct rb_node **rbnode, *parent;
+ struct xt_connlimit_rb *rbconn;
struct xt_connlimit_conn *conn;
+ unsigned int gc_count;
+ bool no_gc = false;
+
+ restart:
+ gc_count = 0;
+ parent = NULL;
+ rbnode = &(root->rb_node);
+ while (*rbnode) {
+ int diff;
+ bool addit;
+
+ rbconn = container_of(*rbnode, struct xt_connlimit_rb, node);
+
+ parent = *rbnode;
+ diff = same_source_net(addr, mask, &rbconn->addr, family);
+ if (diff < 0) {
+ rbnode = &((*rbnode)->rb_left);
+ } else if (diff > 0) {
+ rbnode = &((*rbnode)->rb_right);
+ } else {
+ /* same source network -> be counted! */
+ unsigned int count;
+ count = check_hlist(net, &rbconn->hhead, tuple, &addit);
+
+ tree_nodes_free(root, gc_nodes, gc_count);
+ if (!addit)
+ return count;
+
+ if (!add_hlist(&rbconn->hhead, tuple, addr))
+ return 0; /* hotdrop */
+
+ return count + 1;
+ }
+
+ if (no_gc || gc_count >= ARRAY_SIZE(gc_nodes))
+ continue;
+
+ /* only used for GC on hhead, retval and 'addit' ignored */
+ check_hlist(net, &rbconn->hhead, tuple, &addit);
+ if (hlist_empty(&rbconn->hhead))
+ gc_nodes[gc_count++] = rbconn;
+ }
+
+ if (gc_count) {
+ no_gc = true;
+ tree_nodes_free(root, gc_nodes, gc_count);
+ /* tree_node_free before new allocation permits
+ * allocator to re-use newly free'd object.
+ *
+ * This is a rare event; in most cases we will find
+ * existing node to re-use. (or gc_count is 0).
+ */
+ goto restart;
+ }
+
+ /* no match, need to insert new node */
+ rbconn = kmem_cache_alloc(connlimit_rb_cachep, GFP_ATOMIC);
+ if (rbconn == NULL)
+ return 0;
conn = kmem_cache_alloc(connlimit_conn_cachep, GFP_ATOMIC);
- if (conn == NULL)
- return false;
+ if (conn == NULL) {
+ kmem_cache_free(connlimit_rb_cachep, rbconn);
+ return 0;
+ }
+
conn->tuple = *tuple;
conn->addr = *addr;
- hlist_add_head(&conn->node, head);
- return true;
+ rbconn->addr = *addr;
+
+ INIT_HLIST_HEAD(&rbconn->hhead);
+ hlist_add_head(&conn->node, &rbconn->hhead);
+
+ rb_link_node(&rbconn->node, parent, rbnode);
+ rb_insert_color(&rbconn->node, root);
+ return 1;
}
static int count_them(struct net *net,
@@ -177,26 +285,22 @@ static int count_them(struct net *net,
const union nf_inet_addr *mask,
u_int8_t family)
{
- struct hlist_head *hhead;
+ struct rb_root *root;
int count;
u32 hash;
- bool addit = true;
- if (family == NFPROTO_IPV6)
+ if (family == NFPROTO_IPV6) {
hash = connlimit_iphash6(addr, mask);
- else
+ root = &data->climit_root6[hash];
+ } else {
hash = connlimit_iphash(addr->ip & mask->ip);
-
- hhead = &data->iphash[hash];
+ root = &data->climit_root4[hash];
+ }
spin_lock_bh(&data->locks[hash % CONNLIMIT_LOCK_SLOTS]);
- count = count_hlist(net, hhead, tuple, addr, mask, family, &addit);
- if (addit) {
- if (add_hlist(hhead, tuple, addr))
- count++;
- else
- count = -ENOMEM;
- }
+
+ count = count_tree(net, root, tuple, addr, mask, family);
+
spin_unlock_bh(&data->locks[hash % CONNLIMIT_LOCK_SLOTS]);
return count;
@@ -212,7 +316,7 @@ connlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
const struct nf_conntrack_tuple *tuple_ptr = &tuple;
enum ip_conntrack_info ctinfo;
const struct nf_conn *ct;
- int connections;
+ unsigned int connections;
ct = nf_ct_get(skb, &ctinfo);
if (ct != NULL)
@@ -233,7 +337,7 @@ connlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
connections = count_them(net, info->data, tuple_ptr, &addr,
&info->mask, par->family);
- if (connections < 0)
+ if (connections == 0)
/* kmalloc failed, drop it entirely */
goto hotdrop;
@@ -276,28 +380,44 @@ static int connlimit_mt_check(const struct xt_mtchk_param *par)
for (i = 0; i < ARRAY_SIZE(info->data->locks); ++i)
spin_lock_init(&info->data->locks[i]);
- for (i = 0; i < ARRAY_SIZE(info->data->iphash); ++i)
- INIT_HLIST_HEAD(&info->data->iphash[i]);
+ for (i = 0; i < ARRAY_SIZE(info->data->climit_root4); ++i)
+ info->data->climit_root4[i] = RB_ROOT;
+ for (i = 0; i < ARRAY_SIZE(info->data->climit_root6); ++i)
+ info->data->climit_root6[i] = RB_ROOT;
return 0;
}
-static void connlimit_mt_destroy(const struct xt_mtdtor_param *par)
+static void destroy_tree(struct rb_root *r)
{
- const struct xt_connlimit_info *info = par->matchinfo;
struct xt_connlimit_conn *conn;
+ struct xt_connlimit_rb *rbconn;
struct hlist_node *n;
- struct hlist_head *hash = info->data->iphash;
- unsigned int i;
+ struct rb_node *node;
- nf_ct_l3proto_module_put(par->family);
+ while ((node = rb_first(r)) != NULL) {
+ rbconn = container_of(node, struct xt_connlimit_rb, node);
- for (i = 0; i < ARRAY_SIZE(info->data->iphash); ++i) {
- hlist_for_each_entry_safe(conn, n, &hash[i], node) {
- hlist_del(&conn->node);
+ rb_erase(node, r);
+
+ hlist_for_each_entry_safe(conn, n, &rbconn->hhead, node)
kmem_cache_free(connlimit_conn_cachep, conn);
- }
+
+ kmem_cache_free(connlimit_rb_cachep, rbconn);
}
+}
+
+static void connlimit_mt_destroy(const struct xt_mtdtor_param *par)
+{
+ const struct xt_connlimit_info *info = par->matchinfo;
+ unsigned int i;
+
+ nf_ct_l3proto_module_put(par->family);
+
+ for (i = 0; i < ARRAY_SIZE(info->data->climit_root4); ++i)
+ destroy_tree(&info->data->climit_root4[i]);
+ for (i = 0; i < ARRAY_SIZE(info->data->climit_root6); ++i)
+ destroy_tree(&info->data->climit_root6[i]);
kfree(info->data);
}
@@ -326,9 +446,18 @@ static int __init connlimit_mt_init(void)
if (!connlimit_conn_cachep)
return -ENOMEM;
+ connlimit_rb_cachep = kmem_cache_create("xt_connlimit_rb",
+ sizeof(struct xt_connlimit_rb),
+ 0, 0, NULL);
+ if (!connlimit_rb_cachep) {
+ kmem_cache_destroy(connlimit_conn_cachep);
+ return -ENOMEM;
+ }
ret = xt_register_match(&connlimit_mt_reg);
- if (ret != 0)
+ if (ret != 0) {
kmem_cache_destroy(connlimit_conn_cachep);
+ kmem_cache_destroy(connlimit_rb_cachep);
+ }
return ret;
}
@@ -336,6 +465,7 @@ static void __exit connlimit_mt_exit(void)
{
xt_unregister_match(&connlimit_mt_reg);
kmem_cache_destroy(connlimit_conn_cachep);
+ kmem_cache_destroy(connlimit_rb_cachep);
}
module_init(connlimit_mt_init);
--
1.8.1.5
* Re: [PATCH v2 0/3] netfilter: connlimit: scalability improvements
2014-03-12 22:49 [PATCH v2 0/3] netfilter: connlimit: scalability improvements Florian Westphal
` (2 preceding siblings ...)
2014-03-12 22:49 ` [PATCH v2 3/3] netfilter: connlimit: use rbtree for per-host conntrack obj storage Florian Westphal
@ 2014-03-17 11:44 ` Pablo Neira Ayuso
3 siblings, 0 replies; 5+ messages in thread
From: Pablo Neira Ayuso @ 2014-03-17 11:44 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel
On Wed, Mar 12, 2014 at 11:49:48PM +0100, Florian Westphal wrote:
> Resending the last three patches of the set; I have addressed
> the comments I've received. See the individual patches for what
> changed vs. v1.
>
> I've done a brief re-test with 2 hours of synflooding,
> nf_conntrack_max=2000000, and a conntrack -F flush every 10 seconds,
> and did not encounter any issues.
>
> I am copying the original v1 cover letter below.
>
> The connlimit match suffers from two problems:
>
> - lock contention when multiple cpus invoke the match function
> - algorithmic complexity: on average the connlimit match needs to
>   examine NUMBER_OF_CONNTRACKS / HASH_BUCKETS (the bucket count is
>   fixed at 256) connections, because it tests, for every connection
>   assigned to the same bucket as the new one, whether the conntrack
>   is still active.
>
> This patch set tries to solve both issues.
Series applied, thanks Florian.