All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH nf-next] netfilter: nf_conncount: fix tracking of connections from localhost
@ 2026-01-18 11:13 Fernando Fernandez Mancera
  2026-01-18 12:22 ` Florian Westphal
  0 siblings, 1 reply; 7+ messages in thread
From: Fernando Fernandez Mancera @ 2026-01-18 11:13 UTC (permalink / raw)
  To: netfilter-devel
  Cc: coreteam, pablo, fw, phil, Fernando Fernandez Mancera,
	Michal Slabihoudek

Since commit be102eb6a0e7 ("netfilter: nf_conncount: rework API to use
sk_buff directly"), we skip the adding and trigger a GC when the ct is
confirmed. For connections originated from local to local it doesn't
work because the connection is confirmed from a early stage, therefore
tracking is always skipped.

In order to fix this, we check whether IPS_SEEN_REPLY_BIT is set to
understand if it is really confirmed. If it isn't then we fallback on a
GC plus track operation skipping the optimization. This fallback is
necessary to avoid duplicated tracking of a packet train e.g 10 UDP
datagrams sent on a burst when initiating the connection.

Tested with xt_connlimit/nft_connlimit and OVS limit and with a HTTP
server and iperf3 on UDP mode.

Fixes: be102eb6a0e7 ("netfilter: nf_conncount: rework API to use sk_buff directly")
Reported-by: Michal Slabihoudek <michal.slabihoudek@gooddata.com>
Closes: https://lore.kernel.org/netfilter/6989BD9F-8C24-4397-9AD7-4613B28BF0DB@gooddata.com/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
---
Note: rebased in top of nf-next/testing tree
 net/netfilter/nf_conncount.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
index 288936f5c1bf..5588cd0fcd9a 100644
--- a/net/netfilter/nf_conncount.c
+++ b/net/netfilter/nf_conncount.c
@@ -179,14 +179,25 @@ static int __nf_conncount_add(struct net *net,
 		return -ENOENT;
 
 	if (ct && nf_ct_is_confirmed(ct)) {
-		err = -EEXIST;
-		goto out_put;
+		/* connections from localhost are confirmed almost instantly,
+		 * check if there has been a reply
+		 */
+		if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
+			err = -EEXIST;
+			goto out_put;
+		}
+
+		/* this is likely a local connection, skip optimization to avoid
+		 * adding duplicates from a 'packet train'
+		 */
+		goto check_connections;
 	}
 
 	if ((u32)jiffies == list->last_gc &&
 	    (list->count - list->last_gc_count) < CONNCOUNT_GC_MAX_COLLECT)
 		goto add_new_node;
 
+check_connections:
 	/* check the saved connections */
 	list_for_each_entry_safe(conn, conn_n, &list->head, node) {
 		if (collect > CONNCOUNT_GC_MAX_COLLECT)
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH nf-next] netfilter: nf_conncount: fix tracking of connections from localhost
  2026-01-18 11:13 [PATCH nf-next] netfilter: nf_conncount: fix tracking of connections from localhost Fernando Fernandez Mancera
@ 2026-01-18 12:22 ` Florian Westphal
  2026-01-18 15:50   ` Fernando Fernandez Mancera
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Westphal @ 2026-01-18 12:22 UTC (permalink / raw)
  To: Fernando Fernandez Mancera
  Cc: netfilter-devel, coreteam, pablo, phil, Michal Slabihoudek

Fernando Fernandez Mancera <fmancera@suse.de> wrote:
> Since commit be102eb6a0e7 ("netfilter: nf_conncount: rework API to use
> sk_buff directly"), we skip the adding and trigger a GC when the ct is
> confirmed. For connections originated from local to local it doesn't
> work because the connection is confirmed from a early stage, therefore
> tracking is always skipped.

Alternative:

@@ -415,7 +415,7 @@ insert_tree(struct net *net,
                        if (ret && ret != -EEXIST)
                                count = 0; /* hotdrop */
                        else
-                               count = rbconn->list.count;
+                               count = rbconn->list.count ? rbconn->list.count : 1;

?

connlimit for localhost connections only works correctly in output or
postrouting, even before any of your changes.

As you say; for local connections, confirmation happens in postrouting,
i.e., before prerouting rules are evaluated.

Hence, even before any of your changes, the conntrack limit is never
effective because the connections are confirmed before.
In the reported example, its no problem to create 1000k connections,
500 will go through, rest will eventually time out.  But they are
created.

AFAICS the problem is erroneous trigger of "hotdrop" mode.  First
connection attempt allocates new node, with count == 1.

Subsequent attempt encounter -EEXIST check instead of add, then
return list length with is 0, not 1  so packet gets dropped.

Without your patches, connections won't complete once reaching
the limit, but the conntrack entries can be allocated and are confirmed
regardless of "-m connlimit".

Thus I'm not sold on this use case, it doesn't limit connections,
it only limits established ones.

If its legit case, then we should have a test case for this in
iptables.git .

That said, the patch looks correct to me.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH nf-next] netfilter: nf_conncount: fix tracking of connections from localhost
  2026-01-18 12:22 ` Florian Westphal
@ 2026-01-18 15:50   ` Fernando Fernandez Mancera
  2026-01-18 16:03     ` Florian Westphal
  0 siblings, 1 reply; 7+ messages in thread
From: Fernando Fernandez Mancera @ 2026-01-18 15:50 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netfilter-devel, coreteam, pablo, phil, Michal Slabihoudek

On 1/18/26 1:22 PM, Florian Westphal wrote:
> Fernando Fernandez Mancera <fmancera@suse.de> wrote:
>> Since commit be102eb6a0e7 ("netfilter: nf_conncount: rework API to use
>> sk_buff directly"), we skip the adding and trigger a GC when the ct is
>> confirmed. For connections originated from local to local it doesn't
>> work because the connection is confirmed from a early stage, therefore
>> tracking is always skipped.
> 
> Alternative:
> 
> @@ -415,7 +415,7 @@ insert_tree(struct net *net,
>                          if (ret && ret != -EEXIST)
>                                  count = 0; /* hotdrop */
>                          else
> -                               count = rbconn->list.count;
> +                               count = rbconn->list.count ? rbconn->list.count : 1;
> 
> ?
> 

I also thought about this but to me it feels a bit misleading. In 
essence, packets will go through but connections won't be tracked.

See the comment below, if you believe we should not cover this use case 
I am fine with this proposed alternative.

> connlimit for localhost connections only works correctly in output or
> postrouting, even before any of your changes.
> 
> As you say; for local connections, confirmation happens in postrouting,
> i.e., before prerouting rules are evaluated.
> 
> Hence, even before any of your changes, the conntrack limit is never
> effective because the connections are confirmed before.
> In the reported example, its no problem to create 1000k connections,
> 500 will go through, rest will eventually time out.  But they are
> created.
> 
> AFAICS the problem is erroneous trigger of "hotdrop" mode.  First
> connection attempt allocates new node, with count == 1.
> 
> Subsequent attempt encounter -EEXIST check instead of add, then
> return list length with is 0, not 1  so packet gets dropped.
> 
> Without your patches, connections won't complete once reaching
> the limit, but the conntrack entries can be allocated and are confirmed
> regardless of "-m connlimit".
> 
> Thus I'm not sold on this use case, it doesn't limit connections,
> it only limits established ones.
> 

Yes, this is completely right. I think the main issue here is that the 
usage and definition of connlimit has been too open. E.g people started 
using it for soft-limiting connections while it wasn't designed for that.

Here, this is a similar situation. I have seen people using connlimit to 
limit the number of connections established against a specific backend.

> If its legit case, then we should have a test case for this in
> iptables.git .
> 
> That said, the patch looks correct to me.

To me it is a legit case, corner case for sure but legit. Anyway, I am 
fine adding both solutions and won't push hard on this because as I said 
is quite corner case. I am just afraid of breaking existing use cases.

Thank you for the feedback Florian!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH nf-next] netfilter: nf_conncount: fix tracking of connections from localhost
  2026-01-18 15:50   ` Fernando Fernandez Mancera
@ 2026-01-18 16:03     ` Florian Westphal
  2026-01-18 16:34       ` Fernando Fernandez Mancera
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Westphal @ 2026-01-18 16:03 UTC (permalink / raw)
  To: Fernando Fernandez Mancera
  Cc: netfilter-devel, coreteam, pablo, phil, Michal Slabihoudek

Fernando Fernandez Mancera <fmancera@suse.de> wrote:
> To me it is a legit case, corner case for sure but legit. Anyway, I am fine
> adding both solutions and won't push hard on this because as I said is quite
> corner case. I am just afraid of breaking existing use cases.

Yes :-/

What about a v2 that checks skb->skb_iif, would that work?
I dislike the need to unconditionally wait for SEEN_REPLY.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH nf-next] netfilter: nf_conncount: fix tracking of connections from localhost
  2026-01-18 16:03     ` Florian Westphal
@ 2026-01-18 16:34       ` Fernando Fernandez Mancera
  2026-01-19  0:21         ` Florian Westphal
  0 siblings, 1 reply; 7+ messages in thread
From: Fernando Fernandez Mancera @ 2026-01-18 16:34 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netfilter-devel, coreteam, pablo, phil, Michal Slabihoudek

On 1/18/26 5:03 PM, Florian Westphal wrote:
> Fernando Fernandez Mancera <fmancera@suse.de> wrote:
>> To me it is a legit case, corner case for sure but legit. Anyway, I am fine
>> adding both solutions and won't push hard on this because as I said is quite
>> corner case. I am just afraid of breaking existing use cases.
> 
> Yes :-/
> 
> What about a v2 that checks skb->skb_iif, would that work?
> I dislike the need to unconditionally wait for SEEN_REPLY.
> 

After a quick test, it works for local connections. Although it doesn't 
work for reverse-connlimit on INPUT. Consider the following ruleset:

iptables -I INPUT -p tcp --sport 80 --tcp-flags FIN,SYN,RST,ACK SYN -m 
connlimit --connlimit-above 100 -j LOG --log-prefix "Exceeded limit 
established connections to 443"

But I believe this isn't correct and this should be in OUTPUT instead. 
In the OUTPUT it works well.

To clarify this is the diff:

diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
index 5588cd0fcd9a..339aaf5e3393 100644
--- a/net/netfilter/nf_conncount.c
+++ b/net/netfilter/nf_conncount.c
@@ -182,7 +182,7 @@ static int __nf_conncount_add(struct net *net,
                 /* connections from localhost are confirmed almost 
instantly,
                  * check if there has been a reply
                  */
-               if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
+               if (skb->skb_iif != 1) {
                         err = -EEXIST;
                         goto out_put;
                 }

I will send a V2 and ask for testing from Michal if possible.

Thanks,
Fernando.


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH nf-next] netfilter: nf_conncount: fix tracking of connections from localhost
  2026-01-18 16:34       ` Fernando Fernandez Mancera
@ 2026-01-19  0:21         ` Florian Westphal
  2026-01-19 16:37           ` Fernando Fernandez Mancera
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Westphal @ 2026-01-19  0:21 UTC (permalink / raw)
  To: Fernando Fernandez Mancera
  Cc: netfilter-devel, coreteam, pablo, phil, Michal Slabihoudek

Fernando Fernandez Mancera <fmancera@suse.de> wrote:
> After a quick test, it works for local connections. Although it doesn't 
> work for reverse-connlimit on INPUT. Consider the following ruleset:
> 
> iptables -I INPUT -p tcp --sport 80 --tcp-flags FIN,SYN,RST,ACK SYN -m 
> connlimit --connlimit-above 100 -j LOG --log-prefix "Exceeded limit 
> established connections to 443"

Mhh, what is that supposed to do?

'sport 80' meaning that we're client and we're receiving back a syn/ack?
The rule only matches syn packets.

I'm confused what this should accomplish.

> To clarify this is the diff:
> 
> diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
> index 5588cd0fcd9a..339aaf5e3393 100644
> --- a/net/netfilter/nf_conncount.c
> +++ b/net/netfilter/nf_conncount.c
> @@ -182,7 +182,7 @@ static int __nf_conncount_add(struct net *net,
>                  /* connections from localhost are confirmed almost 
> instantly,
>                   * check if there has been a reply
>                   */
> -               if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
> +               if (skb->skb_iif != 1) {
>                          err = -EEXIST;
>                          goto out_put;
>                  }
> 
> I will send a V2 and ask for testing from Michal if possible.

Thanks!  connlimit is very old and there is no formal spec as
to what its supposed to do, so I supsect we should try to at least
fix the reported regression.   I'm fine with both approaches
(REPLY and iif test), but the iif one would be 'better' in the sense
that its a clear workaround for the more shady corner case :-)

Would you mind updating the comment was well to explain that
this is related to loopback traffic, with conncount sitting
in prerouting and thus after conntrack confirmation?

Thanks!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH nf-next] netfilter: nf_conncount: fix tracking of connections from localhost
  2026-01-19  0:21         ` Florian Westphal
@ 2026-01-19 16:37           ` Fernando Fernandez Mancera
  0 siblings, 0 replies; 7+ messages in thread
From: Fernando Fernandez Mancera @ 2026-01-19 16:37 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netfilter-devel, coreteam, pablo, phil, Michal Slabihoudek



On 1/19/26 1:21 AM, Florian Westphal wrote:
> Fernando Fernandez Mancera <fmancera@suse.de> wrote:
>> After a quick test, it works for local connections. Although it doesn't
>> work for reverse-connlimit on INPUT. Consider the following ruleset:
>>
>> iptables -I INPUT -p tcp --sport 80 --tcp-flags FIN,SYN,RST,ACK SYN -m
>> connlimit --connlimit-above 100 -j LOG --log-prefix "Exceeded limit
>> established connections to 443"
> 
> Mhh, what is that supposed to do?
> 
> 'sport 80' meaning that we're client and we're receiving back a syn/ack?
> The rule only matches syn packets.
> 
> I'm confused what this should accomplish.
> 

Sorry that is copy-paste typo.

  iptables -I INPUT -p tcp --sport 80 --tcp-flags FIN,SYN,RST,ACK SYN,ACK
  -m connlimit --connlimit-above 100 -j LOG --log-prefix "Exceed
  established connections to 80"

Anyway, I think this should not be supported. In order to accomplish the 
case where we are the client, it should use OUTPUT.

>> To clarify this is the diff:
>>
>> diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c
>> index 5588cd0fcd9a..339aaf5e3393 100644
>> --- a/net/netfilter/nf_conncount.c
>> +++ b/net/netfilter/nf_conncount.c
>> @@ -182,7 +182,7 @@ static int __nf_conncount_add(struct net *net,
>>                   /* connections from localhost are confirmed almost
>> instantly,
>>                    * check if there has been a reply
>>                    */
>> -               if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
>> +               if (skb->skb_iif != 1) {
>>                           err = -EEXIST;
>>                           goto out_put;
>>                   }
>>
>> I will send a V2 and ask for testing from Michal if possible.
> 
> Thanks!  connlimit is very old and there is no formal spec as
> to what its supposed to do, so I supsect we should try to at least
> fix the reported regression.   I'm fine with both approaches
> (REPLY and iif test), but the iif one would be 'better' in the sense
> that its a clear workaround for the more shady corner case :-)
> 
> Would you mind updating the comment was well to explain that
> this is related to loopback traffic, with conncount sitting
> in prerouting and thus after conntrack confirmation?
> 

Sure thing, in addition I am using LOOPBACK_IFINDEX which is defined in 
include/net/flow.h and included by include/linux/netfilter.h and I 
consider it better than the magic number.

Thanks,
Fernando.

> Thanks!
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-01-19 16:38 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-18 11:13 [PATCH nf-next] netfilter: nf_conncount: fix tracking of connections from localhost Fernando Fernandez Mancera
2026-01-18 12:22 ` Florian Westphal
2026-01-18 15:50   ` Fernando Fernandez Mancera
2026-01-18 16:03     ` Florian Westphal
2026-01-18 16:34       ` Fernando Fernandez Mancera
2026-01-19  0:21         ` Florian Westphal
2026-01-19 16:37           ` Fernando Fernandez Mancera

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.