* [PATCH 0/4 nf-next v2] netfilter: rework conncount API to receive ct directly
@ 2025-11-10 15:42 Fernando Fernandez Mancera
2025-11-10 15:42 ` [PATCH 1/4 nf-next v2] netfilter: conntrack: add nf_ct_get_or_find() helper Fernando Fernandez Mancera
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Fernando Fernandez Mancera @ 2025-11-10 15:42 UTC (permalink / raw)
To: netfilter-devel
Cc: coreteam, pablo, fw, phil, aconole, echaudro, i.maximets,
Fernando Fernandez Mancera
This series is fixing two different problems. The first issue is related
to duplicated entries when used for non-confirmed connections in
nft_connlimit and xt_connlimit. Now, nf_conncount_add() checks whether
the connection is confirmed or not. If the connection is confirmed,
skip the add.
In order to do that, the nf_conncount API is now receiving struct nf_conn
as argument instead of tuple and zone. In addition, nf_conncount_count()
also needs to receive the net because it calls nf_conncount_gc_list()
inside it if ct is NULL.
The second issue this series is fixing is related to
nft_connlimit/xt_connlimit not updating the list of connection for
confirmed connections breaking softlimiting use-cases like limiting the
bandwidth when too many connections are open.
This has been tested on datapath using connlimit in nftables and
iptables. I have stressed the system up to 2000 connections.
CC'ing openvswitch maintainers as this change on the API required me to
touch their code. I am not very familiar with the internals of
openvswitch but I believe this should be fine for them as they hold a
reference to a valid ct already. If you could provide some testing from
openvswitch side it would be really helpful.
Fernando Fernandez Mancera (4):
netfilter: conntrack: add nf_ct_get_or_find() helper
netfilter: nf_conncount: only track connection if it is not confirmed
netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH
netfilter: nft_connlimit: update connection list if add was skipped
include/net/netfilter/nf_conntrack.h | 3 +
include/net/netfilter/nf_conntrack_count.h | 10 +--
net/netfilter/nf_conncount.c | 94 +++++++++++++---------
net/netfilter/nf_conntrack_core.c | 35 ++++++++
net/netfilter/nft_connlimit.c | 45 ++++++-----
net/netfilter/xt_connlimit.c | 27 +++----
net/openvswitch/conntrack.c | 14 ++--
7 files changed, 133 insertions(+), 95 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/4 nf-next v2] netfilter: conntrack: add nf_ct_get_or_find() helper
2025-11-10 15:42 [PATCH 0/4 nf-next v2] netfilter: rework conncount API to receive ct directly Fernando Fernandez Mancera
@ 2025-11-10 15:42 ` Fernando Fernandez Mancera
2025-11-10 15:42 ` [PATCH 2/4 nf-next v2] netfilter: nf_conncount: only track connection if it is not confirmed Fernando Fernandez Mancera
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Fernando Fernandez Mancera @ 2025-11-10 15:42 UTC (permalink / raw)
To: netfilter-devel
Cc: coreteam, pablo, fw, phil, aconole, echaudro, i.maximets,
Fernando Fernandez Mancera
Introduce a helper function which tries to get a ct, if it is a
template, then the tuple pointer from the skb and perform a lookup to
find the actual ct.
A refcounted bool parameter must be passed in order to know if the
caller need to call nf_ct_put() after it is done using the ct.
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
---
v2: added this commit to the series
---
include/net/netfilter/nf_conntrack.h | 3 +++
net/netfilter/nf_conntrack_core.c | 35 ++++++++++++++++++++++++++++
2 files changed, 38 insertions(+)
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index aa0a7c82199e..5a0dd5715122 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -201,6 +201,9 @@ bool nf_ct_get_tuplepr(const struct sk_buff *skb, unsigned int nhoff,
u_int16_t l3num, struct net *net,
struct nf_conntrack_tuple *tuple);
+struct nf_conn *nf_ct_get_or_find(struct net *net, const struct sk_buff *skb,
+ u16 l3num, bool *refcounted);
+
void __nf_ct_refresh_acct(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
u32 extra_jiffies, unsigned int bytes);
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 0b95f226f211..3ef152a378cc 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -428,6 +428,41 @@ bool nf_ct_get_tuplepr(const struct sk_buff *skb, unsigned int nhoff,
}
EXPORT_SYMBOL_GPL(nf_ct_get_tuplepr);
+struct nf_conn *
+nf_ct_get_or_find(struct net *net, const struct sk_buff *skb, u16 l3num,
+ bool *refcounted)
+{
+ const struct nf_conntrack_tuple_hash *h;
+ const struct nf_conntrack_zone *zone;
+ struct nf_conntrack_tuple tuple;
+ enum ip_conntrack_info ctinfo;
+ struct nf_conn *ct;
+
+ ct = nf_ct_get(skb, &ctinfo);
+ if (ct && !nf_ct_is_template(ct))
+ return ct;
+
+ if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), l3num, net, &tuple))
+ return NULL;
+
+ if (ct)
+ zone = nf_ct_zone(ct);
+ else
+ zone = &nf_ct_zone_dflt;
+
+ h = nf_conntrack_find_get(net, zone, &tuple);
+ if (!h)
+ return NULL;
+
+ ct = nf_ct_tuplehash_to_ctrack(h);
+
+ /* refcount increased, nf_ct_put() is needed after done with it */
+ *refcounted = true;
+
+ return ct;
+}
+EXPORT_SYMBOL_GPL(nf_ct_get_or_find);
+
bool
nf_ct_invert_tuple(struct nf_conntrack_tuple *inverse,
const struct nf_conntrack_tuple *orig)
--
2.51.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/4 nf-next v2] netfilter: nf_conncount: only track connection if it is not confirmed
2025-11-10 15:42 [PATCH 0/4 nf-next v2] netfilter: rework conncount API to receive ct directly Fernando Fernandez Mancera
2025-11-10 15:42 ` [PATCH 1/4 nf-next v2] netfilter: conntrack: add nf_ct_get_or_find() helper Fernando Fernandez Mancera
@ 2025-11-10 15:42 ` Fernando Fernandez Mancera
2025-11-10 20:44 ` Florian Westphal
2025-11-10 15:42 ` [PATCH 3/4 nf-next v2] netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH Fernando Fernandez Mancera
2025-11-10 15:42 ` [PATCH 4/4 nf-next v2] netfilter: nft_connlimit: update connection list if add was skipped Fernando Fernandez Mancera
3 siblings, 1 reply; 7+ messages in thread
From: Fernando Fernandez Mancera @ 2025-11-10 15:42 UTC (permalink / raw)
To: netfilter-devel
Cc: coreteam, pablo, fw, phil, aconole, echaudro, i.maximets,
Fernando Fernandez Mancera
Since commit d265929930e2 ("netfilter: nf_conncount: reduce unnecessary
GC") if nftables/iptables connlimit is used without a check for new
connections there can be duplicated entries tracked.
Pass the nf_conn struct directly to the nf_conncount API and check
whether the connection is confirmed or not inside nf_conncount_add(). If
the connection is confirmed, skip it.
Fixes: d265929930e2 ("netfilter: nf_conncount: reduce unnecessary GC")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
---
v2: handle the ct refcount, avoid copy-paste by using the helper
introduced in the new commit "nf_ct_find_or_get"
---
include/net/netfilter/nf_conntrack_count.h | 10 +---
net/netfilter/nf_conncount.c | 70 ++++++++++++----------
net/netfilter/nft_connlimit.c | 31 +++++-----
net/netfilter/xt_connlimit.c | 27 ++++-----
net/openvswitch/conntrack.c | 14 ++---
5 files changed, 70 insertions(+), 82 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack_count.h b/include/net/netfilter/nf_conntrack_count.h
index 1b58b5b91ff6..4fc2fb03d093 100644
--- a/include/net/netfilter/nf_conntrack_count.h
+++ b/include/net/netfilter/nf_conntrack_count.h
@@ -18,15 +18,11 @@ struct nf_conncount_list {
struct nf_conncount_data *nf_conncount_init(struct net *net, unsigned int keylen);
void nf_conncount_destroy(struct net *net, struct nf_conncount_data *data);
-unsigned int nf_conncount_count(struct net *net,
+unsigned int nf_conncount_count(struct net *net, const struct nf_conn *ct,
struct nf_conncount_data *data,
- const u32 *key,
- const struct nf_conntrack_tuple *tuple,
- const struct nf_conntrack_zone *zone);
+ const u32 *key);
-int nf_conncount_add(struct net *net, struct nf_conncount_list *list,
- const struct nf_conntrack_tuple *tuple,
- const struct nf_conntrack_zone *zone);
+int nf_conncount_add(const struct nf_conn *ct, struct nf_conncount_list *list);
void nf_conncount_list_init(struct nf_conncount_list *list);
diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
index 913ede2f57f9..803589f4eaa1 100644
--- a/net/netfilter/nf_conncount.c
+++ b/net/netfilter/nf_conncount.c
@@ -122,16 +122,22 @@ find_or_evict(struct net *net, struct nf_conncount_list *list,
return ERR_PTR(-EAGAIN);
}
-static int __nf_conncount_add(struct net *net,
- struct nf_conncount_list *list,
- const struct nf_conntrack_tuple *tuple,
- const struct nf_conntrack_zone *zone)
+static int __nf_conncount_add(const struct nf_conn *ct,
+ struct nf_conncount_list *list)
{
const struct nf_conntrack_tuple_hash *found;
struct nf_conncount_tuple *conn, *conn_n;
+ const struct nf_conntrack_tuple *tuple;
+ const struct nf_conntrack_zone *zone;
struct nf_conn *found_ct;
unsigned int collect = 0;
+ if (!ct || nf_ct_is_confirmed(ct))
+ return -EINVAL;
+
+ tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+ zone = nf_ct_zone(ct);
+
if ((u32)jiffies == list->last_gc)
goto add_new_node;
@@ -140,7 +146,7 @@ static int __nf_conncount_add(struct net *net,
if (collect > CONNCOUNT_GC_MAX_NODES)
break;
- found = find_or_evict(net, list, conn);
+ found = find_or_evict(nf_ct_net(ct), list, conn);
if (IS_ERR(found)) {
/* Not found, but might be about to be confirmed */
if (PTR_ERR(found) == -EAGAIN) {
@@ -198,16 +204,13 @@ static int __nf_conncount_add(struct net *net,
return 0;
}
-int nf_conncount_add(struct net *net,
- struct nf_conncount_list *list,
- const struct nf_conntrack_tuple *tuple,
- const struct nf_conntrack_zone *zone)
+int nf_conncount_add(const struct nf_conn *ct, struct nf_conncount_list *list)
{
int ret;
/* check the saved connections */
spin_lock_bh(&list->list_lock);
- ret = __nf_conncount_add(net, list, tuple, zone);
+ ret = __nf_conncount_add(ct, list);
spin_unlock_bh(&list->list_lock);
return ret;
@@ -308,21 +311,22 @@ static void schedule_gc_worker(struct nf_conncount_data *data, int tree)
}
static unsigned int
-insert_tree(struct net *net,
+insert_tree(const struct nf_conn *ct,
struct nf_conncount_data *data,
struct rb_root *root,
unsigned int hash,
- const u32 *key,
- const struct nf_conntrack_tuple *tuple,
- const struct nf_conntrack_zone *zone)
+ const u32 *key)
{
struct nf_conncount_rb *gc_nodes[CONNCOUNT_GC_MAX_NODES];
+ const struct nf_conntrack_zone *zone = nf_ct_zone(ct);
+ const struct nf_conntrack_tuple *tuple;
struct rb_node **rbnode, *parent;
struct nf_conncount_rb *rbconn;
struct nf_conncount_tuple *conn;
unsigned int count = 0, gc_count = 0;
bool do_gc = true;
+ tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
spin_lock_bh(&nf_conncount_locks[hash]);
restart:
parent = NULL;
@@ -340,8 +344,8 @@ insert_tree(struct net *net,
} else {
int ret;
- ret = nf_conncount_add(net, &rbconn->list, tuple, zone);
- if (ret)
+ ret = nf_conncount_add(ct, &rbconn->list);
+ if (ret && ret != -EINVAL)
count = 0; /* hotdrop */
else
count = rbconn->list.count;
@@ -352,7 +356,7 @@ insert_tree(struct net *net,
if (gc_count >= ARRAY_SIZE(gc_nodes))
continue;
- if (do_gc && nf_conncount_gc_list(net, &rbconn->list))
+ if (do_gc && nf_conncount_gc_list(nf_ct_net(ct), &rbconn->list))
gc_nodes[gc_count++] = rbconn;
}
@@ -394,11 +398,9 @@ insert_tree(struct net *net,
}
static unsigned int
-count_tree(struct net *net,
+count_tree(struct net *net, const struct nf_conn *ct,
struct nf_conncount_data *data,
- const u32 *key,
- const struct nf_conntrack_tuple *tuple,
- const struct nf_conntrack_zone *zone)
+ const u32 *key)
{
struct rb_root *root;
struct rb_node *parent;
@@ -422,7 +424,7 @@ count_tree(struct net *net,
} else {
int ret;
- if (!tuple) {
+ if (!ct) {
nf_conncount_gc_list(net, &rbconn->list);
return rbconn->list.count;
}
@@ -437,19 +439,23 @@ count_tree(struct net *net,
}
/* same source network -> be counted! */
- ret = __nf_conncount_add(net, &rbconn->list, tuple, zone);
+ ret = __nf_conncount_add(ct, &rbconn->list);
spin_unlock_bh(&rbconn->list.list_lock);
- if (ret)
+ if (ret && ret != -EINVAL) {
return 0; /* hotdrop */
- else
+ } else {
+ /* -EINVAL means add was skipped, update the list */
+ if (ret == -EINVAL)
+ nf_conncount_gc_list(net, &rbconn->list);
return rbconn->list.count;
+ }
}
}
- if (!tuple)
+ if (!ct)
return 0;
- return insert_tree(net, data, root, hash, key, tuple, zone);
+ return insert_tree(ct, data, root, hash, key);
}
static void tree_gc_worker(struct work_struct *work)
@@ -511,16 +517,14 @@ static void tree_gc_worker(struct work_struct *work)
}
/* Count and return number of conntrack entries in 'net' with particular 'key'.
- * If 'tuple' is not null, insert it into the accounting data structure.
+ * If 'ct' is not null, insert the tuple into the accounting data structure.
* Call with RCU read lock.
*/
-unsigned int nf_conncount_count(struct net *net,
+unsigned int nf_conncount_count(struct net *net, const struct nf_conn *ct,
struct nf_conncount_data *data,
- const u32 *key,
- const struct nf_conntrack_tuple *tuple,
- const struct nf_conntrack_zone *zone)
+ const u32 *key)
{
- return count_tree(net, data, key, tuple, zone);
+ return count_tree(net, ct, data, key);
}
EXPORT_SYMBOL_GPL(nf_conncount_count);
diff --git a/net/netfilter/nft_connlimit.c b/net/netfilter/nft_connlimit.c
index fc35a11cdca2..102c2ed3de41 100644
--- a/net/netfilter/nft_connlimit.c
+++ b/net/netfilter/nft_connlimit.c
@@ -24,29 +24,28 @@ static inline void nft_connlimit_do_eval(struct nft_connlimit *priv,
const struct nft_pktinfo *pkt,
const struct nft_set_ext *ext)
{
- const struct nf_conntrack_zone *zone = &nf_ct_zone_dflt;
- const struct nf_conntrack_tuple *tuple_ptr;
- struct nf_conntrack_tuple tuple;
- enum ip_conntrack_info ctinfo;
- const struct nf_conn *ct;
+ bool refcounted = false;
+ struct nf_conn *ct;
unsigned int count;
+ int err;
- tuple_ptr = &tuple;
-
- ct = nf_ct_get(pkt->skb, &ctinfo);
- if (ct != NULL) {
- tuple_ptr = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
- zone = nf_ct_zone(ct);
- } else if (!nf_ct_get_tuplepr(pkt->skb, skb_network_offset(pkt->skb),
- nft_pf(pkt), nft_net(pkt), &tuple)) {
+ ct = nf_ct_get_or_find(nft_net(pkt), pkt->skb, nft_pf(pkt), &refcounted);
+ if (!ct) {
regs->verdict.code = NF_DROP;
return;
}
- if (nf_conncount_add(nft_net(pkt), priv->list, tuple_ptr, zone)) {
- regs->verdict.code = NF_DROP;
- return;
+ err = nf_conncount_add(ct, priv->list);
+ if (err) {
+ if (err != -EINVAL) {
+ if (refcounted)
+ nf_ct_put(ct);
+ regs->verdict.code = NF_DROP;
+ return;
+ }
}
+ if (refcounted)
+ nf_ct_put(ct);
count = READ_ONCE(priv->list->count);
diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
index 0189f8b6b0bd..8c21890e4536 100644
--- a/net/netfilter/xt_connlimit.c
+++ b/net/netfilter/xt_connlimit.c
@@ -29,24 +29,16 @@
static bool
connlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
{
- struct net *net = xt_net(par);
const struct xt_connlimit_info *info = par->matchinfo;
- struct nf_conntrack_tuple tuple;
- const struct nf_conntrack_tuple *tuple_ptr = &tuple;
- const struct nf_conntrack_zone *zone = &nf_ct_zone_dflt;
- enum ip_conntrack_info ctinfo;
- const struct nf_conn *ct;
+ struct net *net = xt_net(par);
unsigned int connections;
+ bool refcounted = false;
+ struct nf_conn *ct;
u32 key[5];
- ct = nf_ct_get(skb, &ctinfo);
- if (ct != NULL) {
- tuple_ptr = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
- zone = nf_ct_zone(ct);
- } else if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
- xt_family(par), net, &tuple)) {
+ ct = nf_ct_get_or_find(net, skb, xt_family(par), &refcounted);
+ if (!ct)
goto hotdrop;
- }
if (xt_family(par) == NFPROTO_IPV6) {
const struct ipv6hdr *iph = ipv6_hdr(skb);
@@ -59,18 +51,19 @@ connlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
for (i = 0; i < ARRAY_SIZE(addr.ip6); ++i)
addr.ip6[i] &= info->mask.ip6[i];
memcpy(key, &addr, sizeof(addr.ip6));
- key[4] = zone->id;
+ key[4] = nf_ct_zone(ct)->id;
} else {
const struct iphdr *iph = ip_hdr(skb);
key[0] = (info->flags & XT_CONNLIMIT_DADDR) ?
(__force __u32)iph->daddr : (__force __u32)iph->saddr;
key[0] &= (__force __u32)info->mask.ip;
- key[1] = zone->id;
+ key[1] = nf_ct_zone(ct)->id;
}
- connections = nf_conncount_count(net, info->data, key, tuple_ptr,
- zone);
+ connections = nf_conncount_count(net, ct, info->data, key);
+ if (refcounted)
+ nf_ct_put(ct);
if (connections == 0)
/* kmalloc failed, drop it entirely */
goto hotdrop;
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index e573e9221302..e23a0de48ff9 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -929,7 +929,7 @@ static u32 ct_limit_get(const struct ovs_ct_limit_info *info, u16 zone)
static int ovs_ct_check_limit(struct net *net,
const struct ovs_conntrack_info *info,
- const struct nf_conntrack_tuple *tuple)
+ const struct nf_conn *ct)
{
struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
const struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
@@ -942,8 +942,8 @@ static int ovs_ct_check_limit(struct net *net,
if (per_zone_limit == OVS_CT_LIMIT_UNLIMITED)
return 0;
- connections = nf_conncount_count(net, ct_limit_info->data,
- &conncount_key, tuple, &info->zone);
+ connections = nf_conncount_count(net, ct, ct_limit_info->data,
+ &conncount_key);
if (connections > per_zone_limit)
return -ENOMEM;
@@ -972,8 +972,7 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
#if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
if (static_branch_unlikely(&ovs_ct_limit_enabled)) {
if (!nf_ct_is_confirmed(ct)) {
- err = ovs_ct_check_limit(net, info,
- &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
+ err = ovs_ct_check_limit(net, info, ct);
if (err) {
net_warn_ratelimited("openvswitch: zone: %u "
"exceeds conntrack limit\n",
@@ -1762,16 +1761,13 @@ static int __ovs_ct_limit_get_zone_limit(struct net *net,
u16 zone_id, u32 limit,
struct sk_buff *reply)
{
- struct nf_conntrack_zone ct_zone;
struct ovs_zone_limit zone_limit;
u32 conncount_key = zone_id;
zone_limit.zone_id = zone_id;
zone_limit.limit = limit;
- nf_ct_zone_init(&ct_zone, zone_id, NF_CT_DEFAULT_ZONE_DIR, 0);
- zone_limit.count = nf_conncount_count(net, data, &conncount_key, NULL,
- &ct_zone);
+ zone_limit.count = nf_conncount_count(net, NULL, data, &conncount_key);
return nla_put_nohdr(reply, sizeof(zone_limit), &zone_limit);
}
--
2.51.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 3/4 nf-next v2] netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH
2025-11-10 15:42 [PATCH 0/4 nf-next v2] netfilter: rework conncount API to receive ct directly Fernando Fernandez Mancera
2025-11-10 15:42 ` [PATCH 1/4 nf-next v2] netfilter: conntrack: add nf_ct_get_or_find() helper Fernando Fernandez Mancera
2025-11-10 15:42 ` [PATCH 2/4 nf-next v2] netfilter: nf_conncount: only track connection if it is not confirmed Fernando Fernandez Mancera
@ 2025-11-10 15:42 ` Fernando Fernandez Mancera
2025-11-10 15:42 ` [PATCH 4/4 nf-next v2] netfilter: nft_connlimit: update connection list if add was skipped Fernando Fernandez Mancera
3 siblings, 0 replies; 7+ messages in thread
From: Fernando Fernandez Mancera @ 2025-11-10 15:42 UTC (permalink / raw)
To: netfilter-devel
Cc: coreteam, pablo, fw, phil, aconole, echaudro, i.maximets,
Fernando Fernandez Mancera
For convenience when performing GC over the connection list, make
nf_conncount_gc_list() to disable BH. This unifies the behavior with
nf_conncount_add() and nf_conncount_count().
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
---
v2: no changes done
---
net/netfilter/nf_conncount.c | 24 +++++++++++++++++-------
net/netfilter/nft_connlimit.c | 7 +------
2 files changed, 18 insertions(+), 13 deletions(-)
diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
index 803589f4eaa1..cb52fe928a77 100644
--- a/net/netfilter/nf_conncount.c
+++ b/net/netfilter/nf_conncount.c
@@ -227,8 +227,8 @@ void nf_conncount_list_init(struct nf_conncount_list *list)
EXPORT_SYMBOL_GPL(nf_conncount_list_init);
/* Return true if the list is empty. Must be called with BH disabled. */
-bool nf_conncount_gc_list(struct net *net,
- struct nf_conncount_list *list)
+static bool __nf_conncount_gc_list(struct net *net,
+ struct nf_conncount_list *list)
{
const struct nf_conntrack_tuple_hash *found;
struct nf_conncount_tuple *conn, *conn_n;
@@ -240,10 +240,6 @@ bool nf_conncount_gc_list(struct net *net,
if ((u32)jiffies == READ_ONCE(list->last_gc))
return false;
- /* don't bother if other cpu is already doing GC */
- if (!spin_trylock(&list->list_lock))
- return false;
-
list_for_each_entry_safe(conn, conn_n, &list->head, node) {
found = find_or_evict(net, list, conn);
if (IS_ERR(found)) {
@@ -272,7 +268,21 @@ bool nf_conncount_gc_list(struct net *net,
if (!list->count)
ret = true;
list->last_gc = (u32)jiffies;
- spin_unlock(&list->list_lock);
+
+ return ret;
+};
+
+bool nf_conncount_gc_list(struct net *net,
+ struct nf_conncount_list *list)
+{
+ bool ret;
+
+ /* don't bother if other cpu is already doing GC */
+ if (!spin_trylock_bh(&list->list_lock))
+ return false;
+
+ ret = __nf_conncount_gc_list(net, list);
+ spin_unlock_bh(&list->list_lock);
return ret;
}
diff --git a/net/netfilter/nft_connlimit.c b/net/netfilter/nft_connlimit.c
index 102c2ed3de41..ecb65da113d6 100644
--- a/net/netfilter/nft_connlimit.c
+++ b/net/netfilter/nft_connlimit.c
@@ -237,13 +237,8 @@ static void nft_connlimit_destroy_clone(const struct nft_ctx *ctx,
static bool nft_connlimit_gc(struct net *net, const struct nft_expr *expr)
{
struct nft_connlimit *priv = nft_expr_priv(expr);
- bool ret;
- local_bh_disable();
- ret = nf_conncount_gc_list(net, priv->list);
- local_bh_enable();
-
- return ret;
+ return nf_conncount_gc_list(net, priv->list);
}
static struct nft_expr_type nft_connlimit_type;
--
2.51.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 4/4 nf-next v2] netfilter: nft_connlimit: update connection list if add was skipped
2025-11-10 15:42 [PATCH 0/4 nf-next v2] netfilter: rework conncount API to receive ct directly Fernando Fernandez Mancera
` (2 preceding siblings ...)
2025-11-10 15:42 ` [PATCH 3/4 nf-next v2] netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH Fernando Fernandez Mancera
@ 2025-11-10 15:42 ` Fernando Fernandez Mancera
3 siblings, 0 replies; 7+ messages in thread
From: Fernando Fernandez Mancera @ 2025-11-10 15:42 UTC (permalink / raw)
To: netfilter-devel
Cc: coreteam, pablo, fw, phil, aconole, echaudro, i.maximets,
Fernando Fernandez Mancera
Connlimit expression can be used for all kind of packets and not only
for packets with connection state new. See this ruleset as example:
table ip filter {
chain input {
type filter hook input priority filter; policy accept;
tcp dport 22 ct count over 4 counter
}
}
Currently, if the connection count goes over the limit the counter will
count the packets. When a connection is closed, the connection count
won't decrement as it should because it is only updated for new
connections due to an optimization on __nf_conncount_add() that prevents
updating the list if the connection is duplicated.
To solve this problem, check whether the connection was skipped and if
so, update the list. This fix isn't necessary on xt_connlimit as the new
nf_conncount API updates the list when the add is skipped inside
nf_conncount_count().
Fixes: 976afca1ceba ("netfilter: nf_conncount: Early exit in nf_conncount_lookup() and cleanup")
Closes: https://lore.kernel.org/netfilter/trinity-85c72a88-d762-46c3-be97-36f10e5d9796-1761173693813@3c-app-mailcom-bs12/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
---
v2: no changes done
---
net/netfilter/nft_connlimit.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nft_connlimit.c b/net/netfilter/nft_connlimit.c
index ecb65da113d6..9e029397387a 100644
--- a/net/netfilter/nft_connlimit.c
+++ b/net/netfilter/nft_connlimit.c
@@ -37,7 +37,14 @@ static inline void nft_connlimit_do_eval(struct nft_connlimit *priv,
err = nf_conncount_add(ct, priv->list);
if (err) {
- if (err != -EINVAL) {
+ if (err == -EINVAL) {
+ /* Call gc to update the list count if any connection has
+ * been closed already. This is useful to softlimit
+ * connections like limiting bandwidth based on a number
+ * of open connections.
+ */
+ nf_conncount_gc_list(nf_ct_net(ct), priv->list);
+ } else {
if (refcounted)
nf_ct_put(ct);
regs->verdict.code = NF_DROP;
--
2.51.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/4 nf-next v2] netfilter: nf_conncount: only track connection if it is not confirmed
2025-11-10 15:42 ` [PATCH 2/4 nf-next v2] netfilter: nf_conncount: only track connection if it is not confirmed Fernando Fernandez Mancera
@ 2025-11-10 20:44 ` Florian Westphal
2025-11-10 21:40 ` Fernando Fernandez Mancera
0 siblings, 1 reply; 7+ messages in thread
From: Florian Westphal @ 2025-11-10 20:44 UTC (permalink / raw)
To: Fernando Fernandez Mancera
Cc: netfilter-devel, coreteam, pablo, phil, aconole, echaudro,
i.maximets
Fernando Fernandez Mancera <fmancera@suse.de> wrote:
> diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
> index 0189f8b6b0bd..8c21890e4536 100644
> --- a/net/netfilter/xt_connlimit.c
> +++ b/net/netfilter/xt_connlimit.c
> @@ -29,24 +29,16 @@
> static bool
> connlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
> {
> - struct net *net = xt_net(par);
> const struct xt_connlimit_info *info = par->matchinfo;
> - struct nf_conntrack_tuple tuple;
> - const struct nf_conntrack_tuple *tuple_ptr = &tuple;
> - const struct nf_conntrack_zone *zone = &nf_ct_zone_dflt;
> - enum ip_conntrack_info ctinfo;
> - const struct nf_conn *ct;
> + struct net *net = xt_net(par);
> unsigned int connections;
> + bool refcounted = false;
> + struct nf_conn *ct;
> u32 key[5];
>
> - ct = nf_ct_get(skb, &ctinfo);
> - if (ct != NULL) {
> - tuple_ptr = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
> - zone = nf_ct_zone(ct);
> - } else if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
> - xt_family(par), net, &tuple)) {
> + ct = nf_ct_get_or_find(net, skb, xt_family(par), &refcounted);
> + if (!ct)
> goto hotdrop;
> - }
This can't work this way for -t raw use case, which we need
to preserve.
Anyone who uses -t raw ... -m connlimit ...
will now have their packets dropped, so no connection
can make forward progress (not even when using iptables --syn).
We need to get rid of the
if ((u32)jiffies == list->last_gc)
goto add_new_node;
check in nf_conncount_add(), or, (to not add perf regress for ovs ...)
apply it when a (confirmed) conntrack entry is present.
Given that limitation, I don't think this nf_ct_get_or_find() helper
makes any sense, since you still need to pass the tupleptr down
to count_tree().
I think passing in the sk_buff is better, so all of this
conntrack/tuple/zone etc. stuff is hidden away in nf_conncount.c.
I think you could start by *adding*
unsigned int nf_conncount_count_skb(struct net *net,
const struct sk_buff *skb,
struct nf_conncount_data *data,
const u32 *key);
As frontend function for nf_conncount_count().
This new function could (re)use some of the code you made
for nf_ct_get_or_find(), the zone usage there looks correct
to me.
Then, in patch 2, convert -m connlimit.
You could send that as an initial patch set already.
Then, in patch 3 (or later followup patch set), convert remaining user
(ovs) and hide old api.
Then, in patch 4, start pushing down the sk_buff more in nf_conncount.c
until its available for nf_conncount_add().
Then, add nf_conncount_add_skb and repeat this process.
Does that make sense?
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/4 nf-next v2] netfilter: nf_conncount: only track connection if it is not confirmed
2025-11-10 20:44 ` Florian Westphal
@ 2025-11-10 21:40 ` Fernando Fernandez Mancera
0 siblings, 0 replies; 7+ messages in thread
From: Fernando Fernandez Mancera @ 2025-11-10 21:40 UTC (permalink / raw)
To: Florian Westphal
Cc: netfilter-devel, coreteam, pablo, phil, aconole, echaudro,
i.maximets
On 11/10/25 9:44 PM, Florian Westphal wrote:
> Fernando Fernandez Mancera <fmancera@suse.de> wrote:
>> diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
>> index 0189f8b6b0bd..8c21890e4536 100644
>> --- a/net/netfilter/xt_connlimit.c
>> +++ b/net/netfilter/xt_connlimit.c
>> @@ -29,24 +29,16 @@
>> static bool
>> connlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
>> {
>> - struct net *net = xt_net(par);
>> const struct xt_connlimit_info *info = par->matchinfo;
>> - struct nf_conntrack_tuple tuple;
>> - const struct nf_conntrack_tuple *tuple_ptr = &tuple;
>> - const struct nf_conntrack_zone *zone = &nf_ct_zone_dflt;
>> - enum ip_conntrack_info ctinfo;
>> - const struct nf_conn *ct;
>> + struct net *net = xt_net(par);
>> unsigned int connections;
>> + bool refcounted = false;
>> + struct nf_conn *ct;
>> u32 key[5];
>>
>> - ct = nf_ct_get(skb, &ctinfo);
>> - if (ct != NULL) {
>> - tuple_ptr = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
>> - zone = nf_ct_zone(ct);
>> - } else if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
>> - xt_family(par), net, &tuple)) {
>> + ct = nf_ct_get_or_find(net, skb, xt_family(par), &refcounted);
>> + if (!ct)
>> goto hotdrop;
>> - }
>
> This can't work this way for -t raw use case, which we need
> to preserve.
>
> Anyone who uses -t raw ... -m connlimit ...
>
> will now have their packets dropped, so no connection
> can make forward progress (not even when using iptables --syn).
>
> We need to get rid of the
> if ((u32)jiffies == list->last_gc)
> goto add_new_node;
>
> check in nf_conncount_add(), or, (to not add perf regress for ovs ...)
> apply it when a (confirmed) conntrack entry is present.
>
> Given that limitation, I don't think this nf_ct_get_or_find() helper
> makes any sense, since you still need to pass the tupleptr down
> to count_tree().
>
*sight* you are right, this approach still have the same problem with
raw as we won't have a ct..
> I think passing in the sk_buff is better, so all of this
> conntrack/tuple/zone etc. stuff is hidden away in nf_conncount.c.
>
Yes, this is the only solution possible passing down the sk_buff and
handle all of that there.
> I think you could start by *adding*
>> unsigned int nf_conncount_count_skb(struct net *net,
> const struct sk_buff *skb,
> struct nf_conncount_data *data,
> const u32 *key);
>
> As frontend function for nf_conncount_count().
> This new function could (re)use some of the code you made
> for nf_ct_get_or_find(), the zone usage there looks correct
> to me.
>
> Then, in patch 2, convert -m connlimit.
> You could send that as an initial patch set already.
>
> Then, in patch 3 (or later followup patch set), convert remaining user
> (ovs) and hide old api.
>
> Then, in patch 4, start pushing down the sk_buff more in nf_conncount.c
> until its available for nf_conncount_add().
>
> Then, add nf_conncount_add_skb and repeat this process.
>
> Does that make sense?
Yes. Thank you for all these explanations Florian. This is being much
complex than I expected initially..
Thanks,
Fernando.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-11-10 21:40 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-10 15:42 [PATCH 0/4 nf-next v2] netfilter: rework conncount API to receive ct directly Fernando Fernandez Mancera
2025-11-10 15:42 ` [PATCH 1/4 nf-next v2] netfilter: conntrack: add nf_ct_get_or_find() helper Fernando Fernandez Mancera
2025-11-10 15:42 ` [PATCH 2/4 nf-next v2] netfilter: nf_conncount: only track connection if it is not confirmed Fernando Fernandez Mancera
2025-11-10 20:44 ` Florian Westphal
2025-11-10 21:40 ` Fernando Fernandez Mancera
2025-11-10 15:42 ` [PATCH 3/4 nf-next v2] netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH Fernando Fernandez Mancera
2025-11-10 15:42 ` [PATCH 4/4 nf-next v2] netfilter: nft_connlimit: update connection list if add was skipped Fernando Fernandez Mancera
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.