* [PATCH nf v2] netfilter: nft_connlimit: fix duplicated tracking of a connection
@ 2025-10-29 13:23 Fernando Fernandez Mancera
2025-10-29 20:33 ` Pablo Neira Ayuso
0 siblings, 1 reply; 6+ messages in thread
From: Fernando Fernandez Mancera @ 2025-10-29 13:23 UTC (permalink / raw)
To: netfilter-devel; +Cc: coreteam, fw, pablo, Fernando Fernandez Mancera
Connlimit expression can be used for all kind of packets and not only
for packets with connection state new. See this ruleset as example:
table ip filter {
chain input {
type filter hook input priority filter; policy accept;
tcp dport 22 ct count over 4 counter
}
}
Currently, if the connection count goes over the limit the counter will
count the packets. When a connection is closed, the connection count
won't decrement as it should because it is only updated for new
connections due to an optimization on __nf_conncount_add() that prevents
updating the list if the connection is duplicated.
In addition, since commit d265929930e2 ("netfilter: nf_conncount: reduce
unnecessary GC") there can be situations where a duplicated connection
is added to the list. This is caused by two packets from the same
connection being processed during the same jiffy.
To solve these problems, check whether this is a new connection and only
add the connection to the list if that is the case during connlimit
evaluation. Otherwise run a GC to update the count. This doesn't yield a
performance degradation.
Fixed in xt_connlimit too.
Fixes: d265929930e2 ("netfilter: nf_conncount: reduce unnecessary GC")
Fixes: 976afca1ceba ("netfilter: nf_conncount: Early exit in nf_conncount_lookup() and cleanup")
Closes: https://lore.kernel.org/netfilter/trinity-85c72a88-d762-46c3-be97-36f10e5d9796-1761173693813@3c-app-mailcom-bs12/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
---
v2: use nf_ct_is_confirmed(), add comment about why the gc call is
needed and fix this in xt_connlimit too.
---
net/netfilter/nft_connlimit.c | 17 ++++++++++++++---
net/netfilter/xt_connlimit.c | 14 ++++++++++++--
2 files changed, 26 insertions(+), 5 deletions(-)
diff --git a/net/netfilter/nft_connlimit.c b/net/netfilter/nft_connlimit.c
index fc35a11cdca2..dedea1681e73 100644
--- a/net/netfilter/nft_connlimit.c
+++ b/net/netfilter/nft_connlimit.c
@@ -43,9 +43,20 @@ static inline void nft_connlimit_do_eval(struct nft_connlimit *priv,
return;
}
- if (nf_conncount_add(nft_net(pkt), priv->list, tuple_ptr, zone)) {
- regs->verdict.code = NF_DROP;
- return;
+ if (!ct || !nf_ct_is_confirmed(ct)) {
+ if (nf_conncount_add(nft_net(pkt), priv->list, tuple_ptr, zone)) {
+ regs->verdict.code = NF_DROP;
+ return;
+ }
+ } else {
+ /* Call gc to update the list count if any connection has been
+ * closed already. This is useful to softlimit connections
+ * like limiting bandwidth based on a number of open
+ * connections.
+ */
+ local_bh_disable();
+ nf_conncount_gc_list(nft_net(pkt), priv->list);
+ local_bh_enable();
}
count = READ_ONCE(priv->list->count);
diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
index 0189f8b6b0bd..5c90e1929d86 100644
--- a/net/netfilter/xt_connlimit.c
+++ b/net/netfilter/xt_connlimit.c
@@ -69,8 +69,18 @@ connlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
key[1] = zone->id;
}
- connections = nf_conncount_count(net, info->data, key, tuple_ptr,
- zone);
+ if (!ct || !nf_ct_is_confirmed(ct)) {
+ connections = nf_conncount_count(net, info->data, key, tuple_ptr,
+ zone);
+ } else {
+ /* Call nf_conncount_count() with NULL tuple and zone to update
+ * the list if any connection has been closed already. This is
+ * useful to softlimit connections like limiting bandwidth based
+ * on a number of open connections.
+ */
+ connections = nf_conncount_count(net, info->data, key, NULL, NULL);
+ }
+
if (connections == 0)
/* kmalloc failed, drop it entirely */
goto hotdrop;
--
2.51.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH nf v2] netfilter: nft_connlimit: fix duplicated tracking of a connection
2025-10-29 13:23 [PATCH nf v2] netfilter: nft_connlimit: fix duplicated tracking of a connection Fernando Fernandez Mancera
@ 2025-10-29 20:33 ` Pablo Neira Ayuso
2025-10-30 8:12 ` Fernando Fernandez Mancera
0 siblings, 1 reply; 6+ messages in thread
From: Pablo Neira Ayuso @ 2025-10-29 20:33 UTC (permalink / raw)
To: Fernando Fernandez Mancera; +Cc: netfilter-devel, coreteam, fw
Hi Fernando,
On Wed, Oct 29, 2025 at 02:23:18PM +0100, Fernando Fernandez Mancera wrote:
> Connlimit expression can be used for all kind of packets and not only
> for packets with connection state new. See this ruleset as example:
>
> table ip filter {
> chain input {
> type filter hook input priority filter; policy accept;
> tcp dport 22 ct count over 4 counter
> }
> }
>
> Currently, if the connection count goes over the limit the counter will
> count the packets. When a connection is closed, the connection count
> won't decrement as it should because it is only updated for new
> connections due to an optimization on __nf_conncount_add() that prevents
> updating the list if the connection is duplicated.
>
> In addition, since commit d265929930e2 ("netfilter: nf_conncount: reduce
> unnecessary GC") there can be situations where a duplicated connection
> is added to the list. This is caused by two packets from the same
> connection being processed during the same jiffy.
>
> To solve these problems, check whether this is a new connection and only
> add the connection to the list if that is the case during connlimit
> evaluation. Otherwise run a GC to update the count. This doesn't yield a
> performance degradation.
This is true is list is small, e.g. ct count over 4.
But user could much larger value, then every packet could trigger a
long list walk, because gc is bound to CONNCOUNT_GC_MAX_NODES which is
the maximum number of nodes that is _collected_.
And maybe the user selects:
ct count over N mark set 0x1
where N is high, the gc walk can be long.
TBH, I added this expression mainly focusing on being used with
dynset, I allowed it too in rules for parity. In the dynset case,
there is a front-end datastructure (set) and this conncount list
is per element. Maybe there high ct count is also possible.
With this patch, gc is called more frequently, not only for each new
packet.
> Fixed in xt_connlimit too.
>
> Fixes: d265929930e2 ("netfilter: nf_conncount: reduce unnecessary GC")
> Fixes: 976afca1ceba ("netfilter: nf_conncount: Early exit in nf_conncount_lookup() and cleanup")
> Closes: https://lore.kernel.org/netfilter/trinity-85c72a88-d762-46c3-be97-36f10e5d9796-1761173693813@3c-app-mailcom-bs12/
> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
> ---
> v2: use nf_ct_is_confirmed(), add comment about why the gc call is
> needed and fix this in xt_connlimit too.
> ---
> net/netfilter/nft_connlimit.c | 17 ++++++++++++++---
> net/netfilter/xt_connlimit.c | 14 ++++++++++++--
> 2 files changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/net/netfilter/nft_connlimit.c b/net/netfilter/nft_connlimit.c
> index fc35a11cdca2..dedea1681e73 100644
> --- a/net/netfilter/nft_connlimit.c
> +++ b/net/netfilter/nft_connlimit.c
> @@ -43,9 +43,20 @@ static inline void nft_connlimit_do_eval(struct nft_connlimit *priv,
> return;
> }
>
> - if (nf_conncount_add(nft_net(pkt), priv->list, tuple_ptr, zone)) {
> - regs->verdict.code = NF_DROP;
> - return;
> + if (!ct || !nf_ct_is_confirmed(ct)) {
> + if (nf_conncount_add(nft_net(pkt), priv->list, tuple_ptr, zone)) {
> + regs->verdict.code = NF_DROP;
> + return;
> + }
> + } else {
> + /* Call gc to update the list count if any connection has been
> + * closed already. This is useful to softlimit connections
> + * like limiting bandwidth based on a number of open
> + * connections.
> + */
> + local_bh_disable();
> + nf_conncount_gc_list(nft_net(pkt), priv->list);
> + local_bh_enable();
> }
>
> count = READ_ONCE(priv->list->count);
> diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
> index 0189f8b6b0bd..5c90e1929d86 100644
> --- a/net/netfilter/xt_connlimit.c
> +++ b/net/netfilter/xt_connlimit.c
> @@ -69,8 +69,18 @@ connlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
> key[1] = zone->id;
> }
>
> - connections = nf_conncount_count(net, info->data, key, tuple_ptr,
> - zone);
> + if (!ct || !nf_ct_is_confirmed(ct)) {
> + connections = nf_conncount_count(net, info->data, key, tuple_ptr,
> + zone);
> + } else {
> + /* Call nf_conncount_count() with NULL tuple and zone to update
> + * the list if any connection has been closed already. This is
> + * useful to softlimit connections like limiting bandwidth based
> + * on a number of open connections.
> + */
> + connections = nf_conncount_count(net, info->data, key, NULL, NULL);
> + }
Maybe remove this from xt_connlimit?
> +
> if (connections == 0)
> /* kmalloc failed, drop it entirely */
> goto hotdrop;
> --
> 2.51.0
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH nf v2] netfilter: nft_connlimit: fix duplicated tracking of a connection
2025-10-29 20:33 ` Pablo Neira Ayuso
@ 2025-10-30 8:12 ` Fernando Fernandez Mancera
2025-10-30 23:04 ` Pablo Neira Ayuso
0 siblings, 1 reply; 6+ messages in thread
From: Fernando Fernandez Mancera @ 2025-10-30 8:12 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, coreteam, fw
On 10/29/25 9:33 PM, Pablo Neira Ayuso wrote:
> Hi Fernando,
>
> On Wed, Oct 29, 2025 at 02:23:18PM +0100, Fernando Fernandez Mancera wrote:
>> Connlimit expression can be used for all kind of packets and not only
>> for packets with connection state new. See this ruleset as example:
>>
>> table ip filter {
>> chain input {
>> type filter hook input priority filter; policy accept;
>> tcp dport 22 ct count over 4 counter
>> }
>> }
>>
>> Currently, if the connection count goes over the limit the counter will
>> count the packets. When a connection is closed, the connection count
>> won't decrement as it should because it is only updated for new
>> connections due to an optimization on __nf_conncount_add() that prevents
>> updating the list if the connection is duplicated.
>>
>> In addition, since commit d265929930e2 ("netfilter: nf_conncount: reduce
>> unnecessary GC") there can be situations where a duplicated connection
>> is added to the list. This is caused by two packets from the same
>> connection being processed during the same jiffy.
>>
>> To solve these problems, check whether this is a new connection and only
>> add the connection to the list if that is the case during connlimit
>> evaluation. Otherwise run a GC to update the count. This doesn't yield a
>> performance degradation.
>
> This is true is list is small, e.g. ct count over 4.
>
> But user could much larger value, then every packet could trigger a
> long list walk, because gc is bound to CONNCOUNT_GC_MAX_NODES which is
> the maximum number of nodes that is _collected_.
>
> And maybe the user selects:
>
> ct count over N mark set 0x1
>
> where N is high, the gc walk can be long.
>
> TBH, I added this expression mainly focusing on being used with
> dynset, I allowed it too in rules for parity. In the dynset case,
> there is a front-end datastructure (set) and this conncount list
> is per element. Maybe there high ct count is also possible.
>
> With this patch, gc is called more frequently, not only for each new
> packet.
>
How is it called more frequently? Before, it was calling
nf_conncount_add() for every packet which is indeed performing a GC
inside, both nf_conncount_add() and nf_conncount_gc_list() return
immediately if a GC was performed during the same jiffy.
I tried with a limit of 2000 connections and didn't notice any
performance change. I could try with CONNCOUNT_GC_MAX_NODES too.
>> Fixed in xt_connlimit too.
>>
>> Fixes: d265929930e2 ("netfilter: nf_conncount: reduce unnecessary GC")
>> Fixes: 976afca1ceba ("netfilter: nf_conncount: Early exit in nf_conncount_lookup() and cleanup")
>> Closes: https://lore.kernel.org/netfilter/trinity-85c72a88-d762-46c3-be97-36f10e5d9796-1761173693813@3c-app-mailcom-bs12/
>> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
>> ---
>> v2: use nf_ct_is_confirmed(), add comment about why the gc call is
>> needed and fix this in xt_connlimit too.
>> ---
>> net/netfilter/nft_connlimit.c | 17 ++++++++++++++---
>> net/netfilter/xt_connlimit.c | 14 ++++++++++++--
>> 2 files changed, 26 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/netfilter/nft_connlimit.c b/net/netfilter/nft_connlimit.c
>> index fc35a11cdca2..dedea1681e73 100644
>> --- a/net/netfilter/nft_connlimit.c
>> +++ b/net/netfilter/nft_connlimit.c
>> @@ -43,9 +43,20 @@ static inline void nft_connlimit_do_eval(struct nft_connlimit *priv,
>> return;
>> }
>>
>> - if (nf_conncount_add(nft_net(pkt), priv->list, tuple_ptr, zone)) {
>> - regs->verdict.code = NF_DROP;
>> - return;
>> + if (!ct || !nf_ct_is_confirmed(ct)) {
>> + if (nf_conncount_add(nft_net(pkt), priv->list, tuple_ptr, zone)) {
>> + regs->verdict.code = NF_DROP;
>> + return;
>> + }
>> + } else {
>> + /* Call gc to update the list count if any connection has been
>> + * closed already. This is useful to softlimit connections
>> + * like limiting bandwidth based on a number of open
>> + * connections.
>> + */
>> + local_bh_disable();
>> + nf_conncount_gc_list(nft_net(pkt), priv->list);
>> + local_bh_enable();
>> }
>>
>> count = READ_ONCE(priv->list->count);
>> diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
>> index 0189f8b6b0bd..5c90e1929d86 100644
>> --- a/net/netfilter/xt_connlimit.c
>> +++ b/net/netfilter/xt_connlimit.c
>> @@ -69,8 +69,18 @@ connlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
>> key[1] = zone->id;
>> }
>>
>> - connections = nf_conncount_count(net, info->data, key, tuple_ptr,
>> - zone);
>> + if (!ct || !nf_ct_is_confirmed(ct)) {
>> + connections = nf_conncount_count(net, info->data, key, tuple_ptr,
>> + zone);
>> + } else {
>> + /* Call nf_conncount_count() with NULL tuple and zone to update
>> + * the list if any connection has been closed already. This is
>> + * useful to softlimit connections like limiting bandwidth based
>> + * on a number of open connections.
>> + */
>> + connections = nf_conncount_count(net, info->data, key, NULL, NULL);
>> + }
>
> Maybe remove this from xt_connlimit?
>
Dropping this would leave xt_connlimit broken for the use-cases I
discussed with Florian on the v1.
>> +
>> if (connections == 0)
>> /* kmalloc failed, drop it entirely */
>> goto hotdrop;
>> --
>> 2.51.0
>>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH nf v2] netfilter: nft_connlimit: fix duplicated tracking of a connection
2025-10-30 8:12 ` Fernando Fernandez Mancera
@ 2025-10-30 23:04 ` Pablo Neira Ayuso
2025-10-30 23:40 ` Florian Westphal
2025-10-31 8:55 ` Fernando Fernandez Mancera
0 siblings, 2 replies; 6+ messages in thread
From: Pablo Neira Ayuso @ 2025-10-30 23:04 UTC (permalink / raw)
To: Fernando Fernandez Mancera; +Cc: netfilter-devel, coreteam, fw
Hi Fernando,
On Thu, Oct 30, 2025 at 09:12:32AM +0100, Fernando Fernandez Mancera wrote:
>
> On 10/29/25 9:33 PM, Pablo Neira Ayuso wrote:
> > Hi Fernando,
> >
> > On Wed, Oct 29, 2025 at 02:23:18PM +0100, Fernando Fernandez Mancera wrote:
> > > Connlimit expression can be used for all kind of packets and not only
> > > for packets with connection state new. See this ruleset as example:
> > >
> > > table ip filter {
> > > chain input {
> > > type filter hook input priority filter; policy accept;
> > > tcp dport 22 ct count over 4 counter
> > > }
> > > }
> > >
> > > Currently, if the connection count goes over the limit the counter will
> > > count the packets. When a connection is closed, the connection count
> > > won't decrement as it should because it is only updated for new
> > > connections due to an optimization on __nf_conncount_add() that prevents
> > > updating the list if the connection is duplicated.
> > >
> > > In addition, since commit d265929930e2 ("netfilter: nf_conncount: reduce
> > > unnecessary GC") there can be situations where a duplicated connection
> > > is added to the list. This is caused by two packets from the same
> > > connection being processed during the same jiffy.
> > >
> > > To solve these problems, check whether this is a new connection and only
> > > add the connection to the list if that is the case during connlimit
> > > evaluation. Otherwise run a GC to update the count. This doesn't yield a
> > > performance degradation.
> >
> > This is true is list is small, e.g. ct count over 4.
> >
> > But user could much larger value, then every packet could trigger a
> > long list walk, because gc is bound to CONNCOUNT_GC_MAX_NODES which is
> > the maximum number of nodes that is _collected_.
> >
> > And maybe the user selects:
> >
> > ct count over N mark set 0x1
> >
> > where N is high, the gc walk can be long.
> >
> > TBH, I added this expression mainly focusing on being used with
> > dynset, I allowed it too in rules for parity. In the dynset case,
> > there is a front-end datastructure (set) and this conncount list
> > is per element. Maybe there high ct count is also possible.
> >
> > With this patch, gc is called more frequently, not only for each new
> > packet.
> >
>
> How is it called more frequently? Before, it was calling nf_conncount_add()
> for every packet which is indeed performing a GC inside, both
> nf_conncount_add() and nf_conncount_gc_list() return immediately if a GC was
> performed during the same jiffy.
Before this patch, without 'ct state new' in front, this was just
adding duplicates, then count is wrong, ie. this is broken.
Assuming 'ct state new' in place, then gc is only called when new
entries for the initial packet of a connection (still broken because
duplicates due to retransmissions are possible).
My proposal:
- Follow a more conservative approach: Perform this gc cycle for
confirmed ct only when 'ct count over' evaluates true or 'ct count'
evaluates false.
- For the confirmed ct case, stop gc inmediately when one slot is
released to short-circuit the walk.
... but still long walk could possible.
- More difficult: For the confirmed ct case, add a limit on the
maximum entries that are walked over in the gc iteration for each
packet. If no connections are found to be released, annotate the
entry at which this stops and a jiffy timestamp, to resume from where
the gc walk has stopped in the previous gc. The timestamp could be
used to decide whether to make a full gc walk or not. I mean, explore
a bit more advance gc logic now that this will be alled for every
packet.
> I tried with a limit of 2000 connections and didn't notice any performance
> change. I could try with CONNCOUNT_GC_MAX_NODES too.
>
> > > Fixed in xt_connlimit too.
> > >
> > > Fixes: d265929930e2 ("netfilter: nf_conncount: reduce unnecessary GC")
> > > Fixes: 976afca1ceba ("netfilter: nf_conncount: Early exit in nf_conncount_lookup() and cleanup")
> > > Closes: https://lore.kernel.org/netfilter/trinity-85c72a88-d762-46c3-be97-36f10e5d9796-1761173693813@3c-app-mailcom-bs12/
> > > Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
> > > ---
> > > v2: use nf_ct_is_confirmed(), add comment about why the gc call is
> > > needed and fix this in xt_connlimit too.
> > > ---
> > > net/netfilter/nft_connlimit.c | 17 ++++++++++++++---
> > > net/netfilter/xt_connlimit.c | 14 ++++++++++++--
> > > 2 files changed, 26 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/net/netfilter/nft_connlimit.c b/net/netfilter/nft_connlimit.c
> > > index fc35a11cdca2..dedea1681e73 100644
> > > --- a/net/netfilter/nft_connlimit.c
> > > +++ b/net/netfilter/nft_connlimit.c
> > > @@ -43,9 +43,20 @@ static inline void nft_connlimit_do_eval(struct nft_connlimit *priv,
> > > return;
> > > }
> > > - if (nf_conncount_add(nft_net(pkt), priv->list, tuple_ptr, zone)) {
> > > - regs->verdict.code = NF_DROP;
> > > - return;
> > > + if (!ct || !nf_ct_is_confirmed(ct)) {
> > > + if (nf_conncount_add(nft_net(pkt), priv->list, tuple_ptr, zone)) {
> > > + regs->verdict.code = NF_DROP;
> > > + return;
> > > + }
> > > + } else {
> > > + /* Call gc to update the list count if any connection has been
> > > + * closed already. This is useful to softlimit connections
> > > + * like limiting bandwidth based on a number of open
> > > + * connections.
> > > + */
> > > + local_bh_disable();
> > > + nf_conncount_gc_list(nft_net(pkt), priv->list);
> > > + local_bh_enable();
> > > }
> > > count = READ_ONCE(priv->list->count);
> > > diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
> > > index 0189f8b6b0bd..5c90e1929d86 100644
> > > --- a/net/netfilter/xt_connlimit.c
> > > +++ b/net/netfilter/xt_connlimit.c
> > > @@ -69,8 +69,18 @@ connlimit_mt(const struct sk_buff *skb, struct xt_action_param *par)
> > > key[1] = zone->id;
> > > }
> > > - connections = nf_conncount_count(net, info->data, key, tuple_ptr,
> > > - zone);
> > > + if (!ct || !nf_ct_is_confirmed(ct)) {
> > > + connections = nf_conncount_count(net, info->data, key, tuple_ptr,
> > > + zone);
> > > + } else {
> > > + /* Call nf_conncount_count() with NULL tuple and zone to update
> > > + * the list if any connection has been closed already. This is
> > > + * useful to softlimit connections like limiting bandwidth based
> > > + * on a number of open connections.
> > > + */
> > > + connections = nf_conncount_count(net, info->data, key, NULL, NULL);
> > > + }
> >
> > Maybe remove this from xt_connlimit?
> >
>
> Dropping this would leave xt_connlimit broken for the use-cases I discussed
> with Florian on the v1.
I don't want to drop this, I am wondering if walking a very long
linear list for each packet can be an issue for every packet.
For xt_connlimit, there is a rbtree of lists, ie. the list is splitted
accross different rbtree nodes and the list should be smaller.
But for nft_connlimit, this is a raw linear list exposed to packet
path, and users can set an arbitrarily large count number.
> > > +
> > > if (connections == 0)
> > > /* kmalloc failed, drop it entirely */
> > > goto hotdrop;
> > > --
> > > 2.51.0
> > >
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH nf v2] netfilter: nft_connlimit: fix duplicated tracking of a connection
2025-10-30 23:04 ` Pablo Neira Ayuso
@ 2025-10-30 23:40 ` Florian Westphal
2025-10-31 8:55 ` Fernando Fernandez Mancera
1 sibling, 0 replies; 6+ messages in thread
From: Florian Westphal @ 2025-10-30 23:40 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: Fernando Fernandez Mancera, netfilter-devel, coreteam
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > TBH, I added this expression mainly focusing on being used with
> > > dynset, I allowed it too in rules for parity. In the dynset case,
> > > there is a front-end datastructure (set) and this conncount list
> > > is per element. Maybe there high ct count is also possible.
> > >
> > > With this patch, gc is called more frequently, not only for each new
> > > packet.
> > >
> >
> > How is it called more frequently? Before, it was calling nf_conncount_add()
> > for every packet which is indeed performing a GC inside, both
> > nf_conncount_add() and nf_conncount_gc_list() return immediately if a GC was
> > performed during the same jiffy.
>
> Before this patch, without 'ct state new' in front, this was just
> adding duplicates, then count is wrong, ie. this is broken.
Yep, this was broken. It did never occur to me to use this
for anything other than 'stop accepting new connections from
address/network when over x'.
> Assuming 'ct state new' in place, then gc is only called when new
> entries for the initial packet of a connection (still broken because
> duplicates due to retransmissions are possible).
>
> My proposal:
>
> - Follow a more conservative approach: Perform this gc cycle for
> confirmed ct only when 'ct count over' evaluates true or 'ct count'
> evaluates false.
> - For the confirmed ct case, stop gc inmediately when one slot is
> released to short-circuit the walk.
Its currently ending gc early after 8 reaped entries.
> ... but still long walk could possible.
Yes.
> - More difficult: For the confirmed ct case, add a limit on the
> maximum entries that are walked over in the gc iteration for each
> packet. If no connections are found to be released, annotate the
> entry at which this stops and a jiffy timestamp, to resume from where
> the gc walk has stopped in the previous gc. The timestamp could be
> used to decide whether to make a full gc walk or not. I mean, explore
> a bit more advance gc logic now that this will be alled for every
> packet.
There is this:
if ((u32)jiffies == READ_ONCE(list->last_gc))
return false;
i.e. we never collect more than once per jiffy and list.
(actually we could, if one cpu has sailed past the test
and another cpu completed the list walk right after,
but I don't think its a frequent thing to happen).
I don't think its worth playing with further logic here
because even if entry x was checked one jiffy ago there
is always a chance that this connection has been torn
down right now.
If you think its needed to avoid long list walks
(i.e. resume on next packet), then I think we need
a dedicated gc function that does this, e.g. by
relinking scanned-and-valid entries at the tail.
But for insert case we have to scan until we reach
end-of-list or evict at least one entry, else we report
incorrect ct count.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH nf v2] netfilter: nft_connlimit: fix duplicated tracking of a connection
2025-10-30 23:04 ` Pablo Neira Ayuso
2025-10-30 23:40 ` Florian Westphal
@ 2025-10-31 8:55 ` Fernando Fernandez Mancera
1 sibling, 0 replies; 6+ messages in thread
From: Fernando Fernandez Mancera @ 2025-10-31 8:55 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, coreteam, fw
On 10/31/25 12:04 AM, Pablo Neira Ayuso wrote:
> Hi Fernando,
>
> On Thu, Oct 30, 2025 at 09:12:32AM +0100, Fernando Fernandez Mancera wrote:
>>
>> On 10/29/25 9:33 PM, Pablo Neira Ayuso wrote:
>>> Hi Fernando,
>>>
>>> On Wed, Oct 29, 2025 at 02:23:18PM +0100, Fernando Fernandez Mancera wrote:
>>>> Connlimit expression can be used for all kind of packets and not only
>>>> for packets with connection state new. See this ruleset as example:
>>>>
>>>> table ip filter {
>>>> chain input {
>>>> type filter hook input priority filter; policy accept;
>>>> tcp dport 22 ct count over 4 counter
>>>> }
>>>> }
>>>>
>>>> Currently, if the connection count goes over the limit the counter will
>>>> count the packets. When a connection is closed, the connection count
>>>> won't decrement as it should because it is only updated for new
>>>> connections due to an optimization on __nf_conncount_add() that prevents
>>>> updating the list if the connection is duplicated.
>>>>
>>>> In addition, since commit d265929930e2 ("netfilter: nf_conncount: reduce
>>>> unnecessary GC") there can be situations where a duplicated connection
>>>> is added to the list. This is caused by two packets from the same
>>>> connection being processed during the same jiffy.
>>>>
>>>> To solve these problems, check whether this is a new connection and only
>>>> add the connection to the list if that is the case during connlimit
>>>> evaluation. Otherwise run a GC to update the count. This doesn't yield a
>>>> performance degradation.
>>>
>>> This is true is list is small, e.g. ct count over 4.
>>>
>>> But user could much larger value, then every packet could trigger a
>>> long list walk, because gc is bound to CONNCOUNT_GC_MAX_NODES which is
>>> the maximum number of nodes that is _collected_.
>>>
>>> And maybe the user selects:
>>>
>>> ct count over N mark set 0x1
>>>
>>> where N is high, the gc walk can be long.
>>>
>>> TBH, I added this expression mainly focusing on being used with
>>> dynset, I allowed it too in rules for parity. In the dynset case,
>>> there is a front-end datastructure (set) and this conncount list
>>> is per element. Maybe there high ct count is also possible.
>>>
>>> With this patch, gc is called more frequently, not only for each new
>>> packet.
>>>
>>
>> How is it called more frequently? Before, it was calling nf_conncount_add()
>> for every packet which is indeed performing a GC inside, both
>> nf_conncount_add() and nf_conncount_gc_list() return immediately if a GC was
>> performed during the same jiffy.
>
> Before this patch, without 'ct state new' in front, this was just
> adding duplicates, then count is wrong, ie. this is broken.
>
> Assuming 'ct state new' in place, then gc is only called when new
> entries for the initial packet of a connection (still broken because
> duplicates due to retransmissions are possible).
>
> My proposal:
>
> - Follow a more conservative approach: Perform this gc cycle for
> confirmed ct only when 'ct count over' evaluates true or 'ct count'
> evaluates false.
>
I like this in particular because it would save CPU.. as long as the
condition is not met AFAIU we don't need to check if the list is updated
or not.
This would be easy to add to the proposed patch.
> - For the confirmed ct case, stop gc inmediately when one slot is
> released to short-circuit the walk.
>
I guess we care only about long lists.. maybe as Florian said stopping
after 8 collects is enough. We could reduce it if needed.
> ... but still long walk could possible.
>
> - More difficult: For the confirmed ct case, add a limit on the
> maximum entries that are walked over in the gc iteration for each
> packet. If no connections are found to be released, annotate the
> entry at which this stops and a jiffy timestamp, to resume from where
> the gc walk has stopped in the previous gc. The timestamp could be
> used to decide whether to make a full gc walk or not. I mean, explore
> a bit more advance gc logic now that this will be alled for every
> packet.
My proposal is to defer all the changes to nf_conncount.c to a follow-up
patch on nf-next. As there are more users of this, it seems risky to me
to change the logic on rc4 or greater. What if I send a v3 with the
change mentioned above to save some CPU cycles and then we move forward
the discussion in a series for nf-next? Do you agree?
Thanks,
Fernando.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-10-31 8:55 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-29 13:23 [PATCH nf v2] netfilter: nft_connlimit: fix duplicated tracking of a connection Fernando Fernandez Mancera
2025-10-29 20:33 ` Pablo Neira Ayuso
2025-10-30 8:12 ` Fernando Fernandez Mancera
2025-10-30 23:04 ` Pablo Neira Ayuso
2025-10-30 23:40 ` Florian Westphal
2025-10-31 8:55 ` Fernando Fernandez Mancera
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).