All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH nf] netfilter: nf_conncount: prevent connlimit drops for early confirmed ct
@ 2026-05-13 12:15 Fernando Fernandez Mancera
  2026-05-13 12:45 ` Florian Westphal
  0 siblings, 1 reply; 5+ messages in thread
From: Fernando Fernandez Mancera @ 2026-05-13 12:15 UTC (permalink / raw)
  To: netfilter-devel
  Cc: coreteam, phil, fw, pablo, Fernando Fernandez Mancera,
	Alejandro Olivan Alvarez

Commit 69894e5b4c5e ("netfilter: nft_connlimit: update the count if add
was skipped") introduced a regression where packets for valid
connections are dropped when using connlimit for soft-limiting
scenarios.

The issue occurs when a new connection reuses a socket currently in
the TIME_WAIT state. In this scenario, the connection tracking entry
is evaluated as already confirmed. Previously, __nf_conncount_add()
assumed that if a connection was confirmed and did not originate from
the loopback interface, it should skip the addition and return -EEXIST.

Skipping the addition triggers a garbage collection run that cleans up
the TIME_WAIT connection. Consequently, the active connection count
drops to 0, which xt_connlimit mishandles, leading to the false rejection
of the perfectly valid new connection.

Fix this by replacing the interface check with protocol-agnostic state
checks. We now skip the tree insertion and preserve the lockless garbage
collection optimization only if the connection is IPS_ASSURED or
IP_CT_ESTABLISHED. This allows early-confirmed setup packets (such as
reused TIME_WAIT sockets or locally generated SYN-ACKs) to be properly
evaluated and counted without falsely dropping. The goto
check_connections path is maintained to ensure these setup packets are
deduplicated correctly.

This has been tested with slowhttptest and HTTP server configured
locally to ensure we are not breaking soft-limiting scenarios for local
or external connections. In addition, it was tested with a OVS zone
limit too.

Fixes: 69894e5b4c5e ("netfilter: nft_connlimit: update the count if add was skipped")
Reported-by: Alejandro Olivan Alvarez <alejandro.olivan.alvarez@gmail.com>
Closes: https://lore.kernel.org/netfilter-devel/177349610461.3071718.4083978280323144323@eldamar.lan/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
---
 net/netfilter/nf_conncount.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
index 00eed5b4d1b1..bf718d26acda 100644
--- a/net/netfilter/nf_conncount.c
+++ b/net/netfilter/nf_conncount.c
@@ -129,13 +129,13 @@ static bool get_ct_or_tuple_from_skb(struct net *net,
 				     struct nf_conn **ct,
 				     struct nf_conntrack_tuple *tuple,
 				     const struct nf_conntrack_zone **zone,
+				     enum ip_conntrack_info *ctinfo,
 				     bool *refcounted)
 {
 	const struct nf_conntrack_tuple_hash *h;
-	enum ip_conntrack_info ctinfo;
 	struct nf_conn *found_ct;
 
-	found_ct = nf_ct_get(skb, &ctinfo);
+	found_ct = nf_ct_get(skb, ctinfo);
 	if (found_ct && !nf_ct_is_template(found_ct)) {
 		*tuple = found_ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
 		*zone = nf_ct_zone(found_ct);
@@ -169,27 +169,22 @@ static int __nf_conncount_add(struct net *net,
 	const struct nf_conntrack_tuple_hash *found;
 	struct nf_conncount_tuple *conn, *conn_n;
 	struct nf_conntrack_tuple tuple;
+	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = NULL;
 	struct nf_conn *found_ct;
 	unsigned int collect = 0;
 	bool refcounted = false;
 	int err = 0;
 
-	if (!get_ct_or_tuple_from_skb(net, skb, l3num, &ct, &tuple, &zone, &refcounted))
+	if (!get_ct_or_tuple_from_skb(net, skb, l3num, &ct, &tuple, &zone, &ctinfo, &refcounted))
 		return -ENOENT;
 
 	if (ct && nf_ct_is_confirmed(ct)) {
-		/* local connections are confirmed in postrouting so confirmation
-		 * might have happened before hitting connlimit
-		 */
-		if (skb->skb_iif != LOOPBACK_IFINDEX) {
+		if (test_bit(IPS_ASSURED_BIT, &ct->status) || ctinfo == IP_CT_ESTABLISHED) {
 			err = -EEXIST;
 			goto out_put;
 		}
 
-		/* this is likely a local connection, skip optimization to avoid
-		 * adding duplicates from a 'packet train'
-		 */
 		goto check_connections;
 	}
 
@@ -408,6 +403,7 @@ insert_tree(struct net *net,
 	struct nf_conntrack_tuple tuple;
 	struct nf_conncount_tuple *conn;
 	struct nf_conncount_rb *rbconn;
+	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = NULL;
 
 	spin_lock_bh(&nf_conncount_locks[hash]);
@@ -451,7 +447,7 @@ insert_tree(struct net *net,
 		goto restart;
 	}
 
-	if (get_ct_or_tuple_from_skb(net, skb, l3num, &ct, &tuple, &zone, &refcounted)) {
+	if (get_ct_or_tuple_from_skb(net, skb, l3num, &ct, &tuple, &zone, &ctinfo, &refcounted)) {
 		/* expected case: match, insert new node */
 		rbconn = kmem_cache_alloc(conncount_rb_cachep, GFP_ATOMIC);
 		if (rbconn == NULL)
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH nf] netfilter: nf_conncount: prevent connlimit drops for early confirmed ct
  2026-05-13 12:15 [PATCH nf] netfilter: nf_conncount: prevent connlimit drops for early confirmed ct Fernando Fernandez Mancera
@ 2026-05-13 12:45 ` Florian Westphal
  2026-05-13 13:48   ` Fernando Fernandez Mancera
  0 siblings, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2026-05-13 12:45 UTC (permalink / raw)
  To: Fernando Fernandez Mancera
  Cc: netfilter-devel, coreteam, phil, pablo, Alejandro Olivan Alvarez

Fernando Fernandez Mancera <fmancera@suse.de> wrote:
>  	if (ct && nf_ct_is_confirmed(ct)) {
> -		/* local connections are confirmed in postrouting so confirmation
> -		 * might have happened before hitting connlimit
> -		 */
> -		if (skb->skb_iif != LOOPBACK_IFINDEX) {
> +		if (test_bit(IPS_ASSURED_BIT, &ct->status) || ctinfo == IP_CT_ESTABLISHED) {
>  			err = -EEXIST;
>  			goto out_put;
>  		}

IIRC IP_CT_ESTABLISHED requires we can observe traffic in both
directions.  I'm not sure it was a good idea to allow this new
usage case added in 69894e5b4c5e ("netfilter: nft_connlimit: update the count if add was skipped"),
but I don't think we can revert it either :-(

No idea how to fix this mess.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH nf] netfilter: nf_conncount: prevent connlimit drops for early confirmed ct
  2026-05-13 12:45 ` Florian Westphal
@ 2026-05-13 13:48   ` Fernando Fernandez Mancera
  2026-05-13 14:13     ` Florian Westphal
  0 siblings, 1 reply; 5+ messages in thread
From: Fernando Fernandez Mancera @ 2026-05-13 13:48 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netfilter-devel, coreteam, phil, pablo, Alejandro Olivan Alvarez



On 5/13/26 2:45 PM, Florian Westphal wrote:
> Fernando Fernandez Mancera <fmancera@suse.de> wrote:
>>   	if (ct && nf_ct_is_confirmed(ct)) {
>> -		/* local connections are confirmed in postrouting so confirmation
>> -		 * might have happened before hitting connlimit
>> -		 */
>> -		if (skb->skb_iif != LOOPBACK_IFINDEX) {
>> +		if (test_bit(IPS_ASSURED_BIT, &ct->status) || ctinfo == IP_CT_ESTABLISHED) {
>>   			err = -EEXIST;
>>   			goto out_put;
>>   		}
> 
> IIRC IP_CT_ESTABLISHED requires we can observe traffic in both
> directions.  I'm not sure it was a good idea to allow this new
> usage case added in 69894e5b4c5e ("netfilter: nft_connlimit: update the count if add was skipped"),
> but I don't think we can revert it either :-(
> 

Whether allowing or not, it was already being used as the tcp --syn 
requirement was never enforced. This currently fixes it in most of the 
situations, we just missed an edge case.

The proposed solutions should be fine as it covers:

1. Standard connlimit usage with tcp --syn for both local/non-local 
connections and input/output hooks

2. soft-limiting connlimit usage without tcp --syn on both 
local/non-local connections and input/output hooks

3. OVS zone limits (this is actually like scenario 1.)

It also covers re-use of sockets in TIME_WAIT and fixes hotdrops for 
outgoing already tracked connections when doing softlimiting in the 
INPUT hook. Of course, it prevents the double tracking even for 
retransmissions.

About IP_CT_ESTABLISHED, I added it because it was not clear to me that 
IPS_ASSURED_BIT is always set. I guess yes for TCP/UDP but what about 
other protocols? (Are we supporting other protocols???) Anyway, I have 
tested it and confirmed that for TCP/UDP it is safe to drop it.

And please note that the idea is to be cautious when returning --EXIST. 
If IPS_ASSURED_BIT is set we can for sure skip the tracking BUT if not, 
we run a GC skipping the skip optimization..

FWIW; I run a test with 60k connections and no warnings/drops/dup 
tracking seen.

> No idea how to fix this mess.

Is it that bad? I mean, it has some back and forth and I apologize for 
that but overall this is fixing some real use cases.

Thanks for the patience Florian,
Fernando.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH nf] netfilter: nf_conncount: prevent connlimit drops for early confirmed ct
  2026-05-13 13:48   ` Fernando Fernandez Mancera
@ 2026-05-13 14:13     ` Florian Westphal
  2026-05-13 14:52       ` Fernando Fernandez Mancera
  0 siblings, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2026-05-13 14:13 UTC (permalink / raw)
  To: Fernando Fernandez Mancera
  Cc: netfilter-devel, coreteam, phil, pablo, Alejandro Olivan Alvarez

Fernando Fernandez Mancera <fmancera@suse.de> wrote:
> About IP_CT_ESTABLISHED, I added it because it was not clear to me that 
> IPS_ASSURED_BIT is always set. I guess yes for TCP/UDP but what about 
> other protocols? (Are we supporting other protocols???) Anyway, I have 
> tested it and confirmed that for TCP/UDP it is safe to drop it.

SCTP is the only other relevant one for this use-case, I think.

> And please note that the idea is to be cautious when returning --EXIST. 
> If IPS_ASSURED_BIT is set we can for sure skip the tracking BUT if not, 
> we run a GC skipping the skip optimization..

6 months from now I will no longer know wtf this assured check is
doing.  Please consider rewriting the existing comments so that this
makes some sense.

> Is it that bad? I mean, it has some back and forth and I apologize for 
> that but overall this is fixing some real use cases.

I know, this isn't your fault. Conncount is used in all kinds of cases
that it wasn't designed for and thus we have this esoteric breakage in
first place.

No way we can avoid it.  I think your patch is the best we can do here.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH nf] netfilter: nf_conncount: prevent connlimit drops for early confirmed ct
  2026-05-13 14:13     ` Florian Westphal
@ 2026-05-13 14:52       ` Fernando Fernandez Mancera
  0 siblings, 0 replies; 5+ messages in thread
From: Fernando Fernandez Mancera @ 2026-05-13 14:52 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netfilter-devel, coreteam, phil, pablo, Alejandro Olivan Alvarez

On 5/13/26 4:13 PM, Florian Westphal wrote:
> Fernando Fernandez Mancera <fmancera@suse.de> wrote:
>> About IP_CT_ESTABLISHED, I added it because it was not clear to me that
>> IPS_ASSURED_BIT is always set. I guess yes for TCP/UDP but what about
>> other protocols? (Are we supporting other protocols???) Anyway, I have
>> tested it and confirmed that for TCP/UDP it is safe to drop it.
> 
> SCTP is the only other relevant one for this use-case, I think.
> 

Then we can drop it, thanks Florian.

>> And please note that the idea is to be cautious when returning --EXIST.
>> If IPS_ASSURED_BIT is set we can for sure skip the tracking BUT if not,
>> we run a GC skipping the skip optimization..
> 
> 6 months from now I will no longer know wtf this assured check is
> doing.  Please consider rewriting the existing comments so that this
> makes some sense.
> 

Fair point, let me add some comments around it.

>> Is it that bad? I mean, it has some back and forth and I apologize for
>> that but overall this is fixing some real use cases.
> 
> I know, this isn't your fault. Conncount is used in all kinds of cases
> that it wasn't designed for and thus we have this esoteric breakage in
> first place.
> 
> No way we can avoid it.  I think your patch is the best we can do here.
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-05-13 14:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-13 12:15 [PATCH nf] netfilter: nf_conncount: prevent connlimit drops for early confirmed ct Fernando Fernandez Mancera
2026-05-13 12:45 ` Florian Westphal
2026-05-13 13:48   ` Fernando Fernandez Mancera
2026-05-13 14:13     ` Florian Westphal
2026-05-13 14:52       ` Fernando Fernandez Mancera

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.