public inbox for netdev@vger.kernel.org
* [PATCH net-next] netfilter: conntrack: expose gc_scan_interval_max via sysctl
@ 2026-03-11 19:40 Prasanna S Panchamukhi
  2026-03-12 12:12 ` Fernando Fernandez Mancera
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Prasanna S Panchamukhi @ 2026-03-11 19:40 UTC (permalink / raw)
  To: netfilter-devel
  Cc: panchamukhi, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Pablo Neira Ayuso, Florian Westphal, Phil Sutter, netdev,
	linux-doc, linux-kernel, coreteam

The conntrack garbage collection worker uses an adaptive algorithm that
adjusts the scan interval based on the average timeout of tracked
entries.  The upper bound of this interval is hardcoded as
GC_SCAN_INTERVAL_MAX (60 seconds).

Expose the upper bound as a new sysctl,
net.netfilter.nf_conntrack_gc_scan_interval_max, so it can be tuned at
runtime without rebuilding the kernel.  The default remains 60 seconds
to preserve existing behavior.  The sysctl is global and read-only in
non-init network namespaces, consistent with nf_conntrack_max and
nf_conntrack_buckets.

In environments where long-lived offloaded flows dominate the table,
the adaptive average drifts toward the maximum, delaying cleanup
of short-lived expired entries such as those in TCP CLOSE state
(10s timeout). Adding a sysctl to set the maximum GC scan interval
allows tuning it to the environment.

Signed-off-by: Prasanna S Panchamukhi <panchamukhi@arista.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: Jonathan Corbet <corbet@lwn.net>
cc: Shuah Khan <skhan@linuxfoundation.org>
cc: Pablo Neira Ayuso <pablo@netfilter.org>
cc: Florian Westphal <fw@strlen.de>
cc: Phil Sutter <phil@nwl.cc>
cc: netdev@vger.kernel.org
cc: linux-doc@vger.kernel.org
cc: linux-kernel@vger.kernel.org
to: netfilter-devel@vger.kernel.org
cc: coreteam@netfilter.org
---
 Documentation/networking/nf_conntrack-sysctl.rst | 11 +++++++++++
 include/net/netfilter/nf_conntrack.h             |  1 +
 net/netfilter/nf_conntrack_core.c                |  9 ++++++---
 net/netfilter/nf_conntrack_standalone.c          | 10 ++++++++++
 4 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/nf_conntrack-sysctl.rst b/Documentation/networking/nf_conntrack-sysctl.rst
index 35f889259fcd..c848eef9bc4f 100644
--- a/Documentation/networking/nf_conntrack-sysctl.rst
+++ b/Documentation/networking/nf_conntrack-sysctl.rst
@@ -64,6 +64,17 @@ nf_conntrack_frag6_timeout - INTEGER (seconds)
 
 	Time to keep an IPv6 fragment in memory.
 
+nf_conntrack_gc_scan_interval_max - INTEGER (seconds)
+	default 60
+
+	Maximum interval between garbage collection scans of the connection
+	tracking table. The GC worker uses an adaptive algorithm that adjusts
+	the scan interval based on average entry timeouts; this parameter caps
+	the upper bound. Lower values cause expired entries (e.g. connections
+	in CLOSE state) to be cleaned up faster, at the cost of slightly more
+	CPU usage. Minimum value is 1.
+	This sysctl is only writeable in the initial net namespace.
+
 nf_conntrack_generic_timeout - INTEGER (seconds)
 	default 600
 
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index bc42dd0e10e6..0449577f322e 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -331,6 +331,7 @@ extern struct hlist_nulls_head *nf_conntrack_hash;
 extern unsigned int nf_conntrack_htable_size;
 extern seqcount_spinlock_t nf_conntrack_generation;
 extern unsigned int nf_conntrack_max;
+extern unsigned int nf_conntrack_gc_scan_interval_max;
 
 /* must be called with rcu read lock held */
 static inline void
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 27ce5fda8993..54949246f329 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -91,7 +91,7 @@ static DEFINE_MUTEX(nf_conntrack_mutex);
  * allowing non-idle machines to wakeup more often when needed.
  */
 #define GC_SCAN_INITIAL_COUNT	100
-#define GC_SCAN_INTERVAL_INIT	GC_SCAN_INTERVAL_MAX
+#define GC_SCAN_INTERVAL_INIT	nf_conntrack_gc_scan_interval_max
 
 #define GC_SCAN_MAX_DURATION	msecs_to_jiffies(10)
 #define GC_SCAN_EXPIRED_MAX	(64000u / HZ)
@@ -204,6 +204,9 @@ EXPORT_SYMBOL_GPL(nf_conntrack_htable_size);
 
 unsigned int nf_conntrack_max __read_mostly;
 EXPORT_SYMBOL_GPL(nf_conntrack_max);
+
+unsigned int nf_conntrack_gc_scan_interval_max __read_mostly = GC_SCAN_INTERVAL_MAX;
+
 seqcount_spinlock_t nf_conntrack_generation __read_mostly;
 static siphash_aligned_key_t nf_conntrack_hash_rnd;
 
@@ -1568,7 +1571,7 @@ static void gc_worker(struct work_struct *work)
 				delta_time = nfct_time_stamp - gc_work->start_time;
 
 				/* re-sched immediately if total cycle time is exceeded */
-				next_run = delta_time < (s32)GC_SCAN_INTERVAL_MAX;
+				next_run = delta_time < (s32)nf_conntrack_gc_scan_interval_max;
 				goto early_exit;
 			}
 
@@ -1630,7 +1633,7 @@ static void gc_worker(struct work_struct *work)
 
 	gc_work->next_bucket = 0;
 
-	next_run = clamp(next_run, GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_MAX);
+	next_run = clamp(next_run, GC_SCAN_INTERVAL_MIN, nf_conntrack_gc_scan_interval_max);
 
 	delta_time = max_t(s32, nfct_time_stamp - gc_work->start_time, 1);
 	if (next_run > (unsigned long)delta_time)
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 207b240b14e5..f8cab779763f 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -637,6 +637,7 @@ enum nf_ct_sysctl_index {
 	NF_SYSCTL_CT_PROTO_TIMEOUT_GRE,
 	NF_SYSCTL_CT_PROTO_TIMEOUT_GRE_STREAM,
 #endif
+	NF_SYSCTL_CT_GC_SCAN_INTERVAL_MAX,
 
 	NF_SYSCTL_CT_LAST_SYSCTL,
 };
@@ -920,6 +921,14 @@ static struct ctl_table nf_ct_sysctl_table[] = {
 		.proc_handler   = proc_dointvec_jiffies,
 	},
 #endif
+	[NF_SYSCTL_CT_GC_SCAN_INTERVAL_MAX] = {
+		.procname	= "nf_conntrack_gc_scan_interval_max",
+		.data		= &nf_conntrack_gc_scan_interval_max,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_jiffies,
+		.extra1		= SYSCTL_ONE,
+	},
 };
 
 static struct ctl_table nf_ct_netfilter_table[] = {
@@ -1043,6 +1052,7 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
 		table[NF_SYSCTL_CT_MAX].mode = 0444;
 		table[NF_SYSCTL_CT_EXPECT_MAX].mode = 0444;
 		table[NF_SYSCTL_CT_BUCKETS].mode = 0444;
+		table[NF_SYSCTL_CT_GC_SCAN_INTERVAL_MAX].mode = 0444;
 	}
 
 	cnet->sysctl_header = register_net_sysctl_sz(net, "net/netfilter",
-- 
2.50.1 (Apple Git-155)



* Re: [PATCH net-next] netfilter: conntrack: expose gc_scan_interval_max via sysctl
  2026-03-11 19:40 [PATCH net-next] netfilter: conntrack: expose gc_scan_interval_max via sysctl Prasanna S Panchamukhi
@ 2026-03-12 12:12 ` Fernando Fernandez Mancera
  2026-03-12 12:15 ` Fernando Fernandez Mancera
  2026-03-12 12:36 ` Florian Westphal
  2 siblings, 0 replies; 8+ messages in thread
From: Fernando Fernandez Mancera @ 2026-03-12 12:12 UTC (permalink / raw)
  To: Prasanna S Panchamukhi, netfilter-devel
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Shuah Khan, Pablo Neira Ayuso,
	Florian Westphal, Phil Sutter, netdev, linux-doc, linux-kernel,
	coreteam

On 3/11/26 8:40 PM, Prasanna S Panchamukhi wrote:
> The conntrack garbage collection worker uses an adaptive algorithm that
> adjusts the scan interval based on the average timeout of tracked
> entries.  The upper bound of this interval is hardcoded as
> GC_SCAN_INTERVAL_MAX (60 seconds).
> 
> Expose the upper bound as a new sysctl,
> net.netfilter.nf_conntrack_gc_scan_interval_max, so it can be tuned at
> runtime without rebuilding the kernel.  The default remains 60 seconds
> to preserve existing behavior.  The sysctl is global and read-only in
> non-init network namespaces, consistent with nf_conntrack_max and
> nf_conntrack_buckets.
> 
> In environments where long-lived offloaded flows dominate the table,
> the adaptive average drifts toward the maximum, delaying cleanup
> of short-lived expired entries such as those in TCP CLOSE state
> (10s timeout). Adding a sysctl to set the maximum GC scan interval
> allows tuning it to the environment.
> 
> Signed-off-by: Prasanna S Panchamukhi <panchamukhi@arista.com>
[...]
> ---
>   Documentation/networking/nf_conntrack-sysctl.rst | 11 +++++++++++
>   include/net/netfilter/nf_conntrack.h             |  1 +
>   net/netfilter/nf_conntrack_core.c                |  9 ++++++---
>   net/netfilter/nf_conntrack_standalone.c          | 10 ++++++++++
>   4 files changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/networking/nf_conntrack-sysctl.rst b/Documentation/networking/nf_conntrack-sysctl.rst
> index 35f889259fcd..c848eef9bc4f 100644
> --- a/Documentation/networking/nf_conntrack-sysctl.rst
> +++ b/Documentation/networking/nf_conntrack-sysctl.rst
> @@ -64,6 +64,17 @@ nf_conntrack_frag6_timeout - INTEGER (seconds)
>   
>   	Time to keep an IPv6 fragment in memory.
>   
> +nf_conntrack_gc_scan_interval_max - INTEGER (seconds)
> +	default 60
> +
> +	Maximum interval between garbage collection scans of the connection
> +	tracking table. The GC worker uses an adaptive algorithm that adjusts
> +	the scan interval based on average entry timeouts; this parameter caps
> +	the upper bound. Lower values cause expired entries (e.g. connections
> +	in CLOSE state) to be cleaned up faster, at the cost of slightly more
> +	CPU usage. Minimum value is 1.
> +	This sysctl is only writeable in the initial net namespace.
> +
>   nf_conntrack_generic_timeout - INTEGER (seconds)
>   	default 600
>   
> diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
> index bc42dd0e10e6..0449577f322e 100644
> --- a/include/net/netfilter/nf_conntrack.h
> +++ b/include/net/netfilter/nf_conntrack.h
> @@ -331,6 +331,7 @@ extern struct hlist_nulls_head *nf_conntrack_hash;
>   extern unsigned int nf_conntrack_htable_size;
>   extern seqcount_spinlock_t nf_conntrack_generation;
>   extern unsigned int nf_conntrack_max;
> +extern unsigned int nf_conntrack_gc_scan_interval_max;
>   
>   /* must be called with rcu read lock held */
>   static inline void
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 27ce5fda8993..54949246f329 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -91,7 +91,7 @@ static DEFINE_MUTEX(nf_conntrack_mutex);
>    * allowing non-idle machines to wakeup more often when needed.
>    */
>   #define GC_SCAN_INITIAL_COUNT	100
> -#define GC_SCAN_INTERVAL_INIT	GC_SCAN_INTERVAL_MAX
> +#define GC_SCAN_INTERVAL_INIT	nf_conntrack_gc_scan_interval_max
>   
>   #define GC_SCAN_MAX_DURATION	msecs_to_jiffies(10)
>   #define GC_SCAN_EXPIRED_MAX	(64000u / HZ)
> @@ -204,6 +204,9 @@ EXPORT_SYMBOL_GPL(nf_conntrack_htable_size);
>   
>   unsigned int nf_conntrack_max __read_mostly;
>   EXPORT_SYMBOL_GPL(nf_conntrack_max);
> +
> +unsigned int nf_conntrack_gc_scan_interval_max __read_mostly = GC_SCAN_INTERVAL_MAX;
> +
>   seqcount_spinlock_t nf_conntrack_generation __read_mostly;
>   static siphash_aligned_key_t nf_conntrack_hash_rnd;
>   
> @@ -1568,7 +1571,7 @@ static void gc_worker(struct work_struct *work)
>   				delta_time = nfct_time_stamp - gc_work->start_time;
>   
>   				/* re-sched immediately if total cycle time is exceeded */
> -				next_run = delta_time < (s32)GC_SCAN_INTERVAL_MAX;
> +				next_run = delta_time < (s32)nf_conntrack_gc_scan_interval_max;
>   				goto early_exit;
>   			}
>   
> @@ -1630,7 +1633,7 @@ static void gc_worker(struct work_struct *work)
>   
>   	gc_work->next_bucket = 0;
>   
> -	next_run = clamp(next_run, GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_MAX);
> +	next_run = clamp(next_run, GC_SCAN_INTERVAL_MIN, nf_conntrack_gc_scan_interval_max);
>   
>   	delta_time = max_t(s32, nfct_time_stamp - gc_work->start_time, 1);
>   	if (next_run > (unsigned long)delta_time)
> diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
> index 207b240b14e5..f8cab779763f 100644
> --- a/net/netfilter/nf_conntrack_standalone.c
> +++ b/net/netfilter/nf_conntrack_standalone.c
> @@ -637,6 +637,7 @@ enum nf_ct_sysctl_index {
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_GRE,
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_GRE_STREAM,
>   #endif
> +	NF_SYSCTL_CT_GC_SCAN_INTERVAL_MAX,
>   
>   	NF_SYSCTL_CT_LAST_SYSCTL,
>   };
> @@ -920,6 +921,14 @@ static struct ctl_table nf_ct_sysctl_table[] = {
>   		.proc_handler   = proc_dointvec_jiffies,
>   	},
>   #endif
> +	[NF_SYSCTL_CT_GC_SCAN_INTERVAL_MAX] = {
> +		.procname	= "nf_conntrack_gc_scan_interval_max",
> +		.data		= &nf_conntrack_gc_scan_interval_max,
> +		.maxlen		= sizeof(unsigned int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_jiffies,
> +		.extra1		= SYSCTL_ONE,
> +	},
>   };
>   
>   static struct ctl_table nf_ct_netfilter_table[] = {
> @@ -1043,6 +1052,7 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
>   		table[NF_SYSCTL_CT_MAX].mode = 0444;
>   		table[NF_SYSCTL_CT_EXPECT_MAX].mode = 0444;
>   		table[NF_SYSCTL_CT_BUCKETS].mode = 0444;
> +		table[NF_SYSCTL_CT_GC_SCAN_INTERVAL_MAX].mode = 0444;
>   	}
>   
>   	cnet->sysctl_header = register_net_sysctl_sz(net, "net/netfilter",



* Re: [PATCH net-next] netfilter: conntrack: expose gc_scan_interval_max via sysctl
  2026-03-11 19:40 [PATCH net-next] netfilter: conntrack: expose gc_scan_interval_max via sysctl Prasanna S Panchamukhi
  2026-03-12 12:12 ` Fernando Fernandez Mancera
@ 2026-03-12 12:15 ` Fernando Fernandez Mancera
  2026-03-12 21:44   ` Prasanna Panchamukhi
  2026-03-12 12:36 ` Florian Westphal
  2 siblings, 1 reply; 8+ messages in thread
From: Fernando Fernandez Mancera @ 2026-03-12 12:15 UTC (permalink / raw)
  To: Prasanna S Panchamukhi, netfilter-devel
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Shuah Khan, Pablo Neira Ayuso,
	Florian Westphal, Phil Sutter, netdev, linux-doc, linux-kernel,
	coreteam

On 3/11/26 8:40 PM, Prasanna S Panchamukhi wrote:
> The conntrack garbage collection worker uses an adaptive algorithm that
> adjusts the scan interval based on the average timeout of tracked
> entries.  The upper bound of this interval is hardcoded as
> GC_SCAN_INTERVAL_MAX (60 seconds).
> 
> Expose the upper bound as a new sysctl,
> net.netfilter.nf_conntrack_gc_scan_interval_max, so it can be tuned at
> runtime without rebuilding the kernel.  The default remains 60 seconds
> to preserve existing behavior.  The sysctl is global and read-only in
> non-init network namespaces, consistent with nf_conntrack_max and
> nf_conntrack_buckets.
> 
> In environments where long-lived offloaded flows dominate the table,
> the adaptive average drifts toward the maximum, delaying cleanup
> of short-lived expired entries such as those in TCP CLOSE state
> (10s timeout). Adding a sysctl to set the maximum GC scan interval
> allows tuning it to the environment.
> 
> Signed-off-by: Prasanna S Panchamukhi <panchamukhi@arista.com>
[...]
> ---
>   Documentation/networking/nf_conntrack-sysctl.rst | 11 +++++++++++
>   include/net/netfilter/nf_conntrack.h             |  1 +
>   net/netfilter/nf_conntrack_core.c                |  9 ++++++---
>   net/netfilter/nf_conntrack_standalone.c          | 10 ++++++++++
>   4 files changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/networking/nf_conntrack-sysctl.rst b/Documentation/networking/nf_conntrack-sysctl.rst
> index 35f889259fcd..c848eef9bc4f 100644
> --- a/Documentation/networking/nf_conntrack-sysctl.rst
> +++ b/Documentation/networking/nf_conntrack-sysctl.rst
> @@ -64,6 +64,17 @@ nf_conntrack_frag6_timeout - INTEGER (seconds)
>   
>   	Time to keep an IPv6 fragment in memory.
>   
> +nf_conntrack_gc_scan_interval_max - INTEGER (seconds)
> +	default 60
> +
> +	Maximum interval between garbage collection scans of the connection
> +	tracking table. The GC worker uses an adaptive algorithm that adjusts
> +	the scan interval based on average entry timeouts; this parameter caps
> +	the upper bound. Lower values cause expired entries (e.g. connections
> +	in CLOSE state) to be cleaned up faster, at the cost of slightly more
> +	CPU usage. Minimum value is 1.
> +	This sysctl is only writeable in the initial net namespace.
> +

I think it would be a good idea to add under which situations it is good 
to tweak this setting.

>   nf_conntrack_generic_timeout - INTEGER (seconds)
>   	default 600
>   
> diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
> index bc42dd0e10e6..0449577f322e 100644
> --- a/include/net/netfilter/nf_conntrack.h
> +++ b/include/net/netfilter/nf_conntrack.h
> @@ -331,6 +331,7 @@ extern struct hlist_nulls_head *nf_conntrack_hash;
>   extern unsigned int nf_conntrack_htable_size;
>   extern seqcount_spinlock_t nf_conntrack_generation;
>   extern unsigned int nf_conntrack_max;
> +extern unsigned int nf_conntrack_gc_scan_interval_max;
>

Could it be just int? so there is no need to cast it to s32 later?

>   /* must be called with rcu read lock held */
>   static inline void
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 27ce5fda8993..54949246f329 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -91,7 +91,7 @@ static DEFINE_MUTEX(nf_conntrack_mutex);
>    * allowing non-idle machines to wakeup more often when needed.
>    */
>   #define GC_SCAN_INITIAL_COUNT	100
> -#define GC_SCAN_INTERVAL_INIT	GC_SCAN_INTERVAL_MAX
> +#define GC_SCAN_INTERVAL_INIT	nf_conntrack_gc_scan_interval_max
>   
>   #define GC_SCAN_MAX_DURATION	msecs_to_jiffies(10)
>   #define GC_SCAN_EXPIRED_MAX	(64000u / HZ)
> @@ -204,6 +204,9 @@ EXPORT_SYMBOL_GPL(nf_conntrack_htable_size);
>   
>   unsigned int nf_conntrack_max __read_mostly;
>   EXPORT_SYMBOL_GPL(nf_conntrack_max);
> +
> +unsigned int nf_conntrack_gc_scan_interval_max __read_mostly = GC_SCAN_INTERVAL_MAX;
> +
>   seqcount_spinlock_t nf_conntrack_generation __read_mostly;
>   static siphash_aligned_key_t nf_conntrack_hash_rnd;
>   
> @@ -1568,7 +1571,7 @@ static void gc_worker(struct work_struct *work)
>   				delta_time = nfct_time_stamp - gc_work->start_time;
>   
>   				/* re-sched immediately if total cycle time is exceeded */
> -				next_run = delta_time < (s32)GC_SCAN_INTERVAL_MAX;
> +				next_run = delta_time < (s32)nf_conntrack_gc_scan_interval_max;
>   				goto early_exit;
>   			}
>   

READ_ONCE() is required IMHO as it can be modified from sysctl concurrently.

> @@ -1630,7 +1633,7 @@ static void gc_worker(struct work_struct *work)
>   
>   	gc_work->next_bucket = 0;
>   
> -	next_run = clamp(next_run, GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_MAX);
> +	next_run = clamp(next_run, GC_SCAN_INTERVAL_MIN, nf_conntrack_gc_scan_interval_max);
>   

Likewise here, READ_ONCE() recommended..

Thanks,
Fernando.


* Re: [PATCH net-next] netfilter: conntrack: expose gc_scan_interval_max via sysctl
  2026-03-11 19:40 [PATCH net-next] netfilter: conntrack: expose gc_scan_interval_max via sysctl Prasanna S Panchamukhi
  2026-03-12 12:12 ` Fernando Fernandez Mancera
  2026-03-12 12:15 ` Fernando Fernandez Mancera
@ 2026-03-12 12:36 ` Florian Westphal
  2026-03-12 22:31   ` Prasanna Panchamukhi
  2 siblings, 1 reply; 8+ messages in thread
From: Florian Westphal @ 2026-03-12 12:36 UTC (permalink / raw)
  To: Prasanna S Panchamukhi
  Cc: netfilter-devel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Pablo Neira Ayuso, Phil Sutter, netdev, linux-doc, linux-kernel,
	coreteam

Prasanna S Panchamukhi <panchamukhi@arista.com> wrote:
> The conntrack garbage collection worker uses an adaptive algorithm that
> adjusts the scan interval based on the average timeout of tracked
> entries.  The upper bound of this interval is hardcoded as
> GC_SCAN_INTERVAL_MAX (60 seconds).
> 
> Expose the upper bound as a new sysctl,
> net.netfilter.nf_conntrack_gc_scan_interval_max, so it can be tuned at
> runtime without rebuilding the kernel.  The default remains 60 seconds
> to preserve existing behavior.  The sysctl is global and read-only in
> non-init network namespaces, consistent with nf_conntrack_max and
> nf_conntrack_buckets.

This was proposed before, see:

https://lore.kernel.org/netfilter-devel/aO-id5W6Tr7frdHN@strlen.de/
https://lore.kernel.org/netfilter-devel/aRsuU57juCvsMBKE@strlen.de/

I did not hear back wrt. the horizon cache.

I'm not 100% opposed to this, but I do wonder if we really can't do
better than the current avg strategy.


* Re: [PATCH net-next] netfilter: conntrack: expose gc_scan_interval_max via sysctl
  2026-03-12 12:15 ` Fernando Fernandez Mancera
@ 2026-03-12 21:44   ` Prasanna Panchamukhi
  0 siblings, 0 replies; 8+ messages in thread
From: Prasanna Panchamukhi @ 2026-03-12 21:44 UTC (permalink / raw)
  To: Fernando Fernandez Mancera
  Cc: netfilter-devel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Pablo Neira Ayuso, Florian Westphal, Phil Sutter, netdev,
	linux-doc, linux-kernel, coreteam

Hi Fernando,

Thank you for the quick review.

On Thu, Mar 12, 2026 at 5:15 AM Fernando Fernandez Mancera
<fmancera@suse.de> wrote:
>
> On 3/11/26 8:40 PM, Prasanna S Panchamukhi wrote:
> > The conntrack garbage collection worker uses an adaptive algorithm that
> > adjusts the scan interval based on the average timeout of tracked
> > entries.  The upper bound of this interval is hardcoded as
> > GC_SCAN_INTERVAL_MAX (60 seconds).
> >
> > Expose the upper bound as a new sysctl,
> > net.netfilter.nf_conntrack_gc_scan_interval_max, so it can be tuned at
> > runtime without rebuilding the kernel.  The default remains 60 seconds
> > to preserve existing behavior.  The sysctl is global and read-only in
> > non-init network namespaces, consistent with nf_conntrack_max and
> > nf_conntrack_buckets.
> >
> > In environments where long-lived offloaded flows dominate the table,
> > the adaptive average drifts toward the maximum, delaying cleanup
> > of short-lived expired entries such as those in TCP CLOSE state
> > (10s timeout). Adding a sysctl to set the maximum GC scan interval
> > allows tuning it to the environment.
> >
> > Signed-off-by: Prasanna S Panchamukhi <panchamukhi@arista.com>
> [...]
> > ---
> >   Documentation/networking/nf_conntrack-sysctl.rst | 11 +++++++++++
> >   include/net/netfilter/nf_conntrack.h             |  1 +
> >   net/netfilter/nf_conntrack_core.c                |  9 ++++++---
> >   net/netfilter/nf_conntrack_standalone.c          | 10 ++++++++++
> >   4 files changed, 28 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/networking/nf_conntrack-sysctl.rst b/Documentation/networking/nf_conntrack-sysctl.rst
> > index 35f889259fcd..c848eef9bc4f 100644
> > --- a/Documentation/networking/nf_conntrack-sysctl.rst
> > +++ b/Documentation/networking/nf_conntrack-sysctl.rst
> > @@ -64,6 +64,17 @@ nf_conntrack_frag6_timeout - INTEGER (seconds)
> >
> >       Time to keep an IPv6 fragment in memory.
> >
> > +nf_conntrack_gc_scan_interval_max - INTEGER (seconds)
> > +     default 60
> > +
> > +     Maximum interval between garbage collection scans of the connection
> > +     tracking table. The GC worker uses an adaptive algorithm that adjusts
> > +     the scan interval based on average entry timeouts; this parameter caps
> > +     the upper bound. Lower values cause expired entries (e.g. connections
> > +     in CLOSE state) to be cleaned up faster, at the cost of slightly more
> > +     CPU usage. Minimum value is 1.
> > +     This sysctl is only writeable in the initial net namespace.
> > +
>
> I think it would be a good idea to add under which situations it is good
> to tweak this setting.


Done.

>
>
> >   nf_conntrack_generic_timeout - INTEGER (seconds)
> >       default 600
> >
> > diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
> > index bc42dd0e10e6..0449577f322e 100644
> > --- a/include/net/netfilter/nf_conntrack.h
> > +++ b/include/net/netfilter/nf_conntrack.h
> > @@ -331,6 +331,7 @@ extern struct hlist_nulls_head *nf_conntrack_hash;
> >   extern unsigned int nf_conntrack_htable_size;
> >   extern seqcount_spinlock_t nf_conntrack_generation;
> >   extern unsigned int nf_conntrack_max;
> > +extern unsigned int nf_conntrack_gc_scan_interval_max;
> >
>
> Could it be just int? so there is no need to cast it to s32 later?



Regarding the data type, I encountered the following compilation error
when trying to address the signedness:

"../../net/netfilter/nf_conntrack_core.c: In function 'gc_worker':
../../include/linux/compiler_types.h:548:45: error: call to
'__compiletime_assert_1027' declared with attribute error:
clamp(next_run, (1ul * 250), gc_scan_max) signedness error"


>
> >   /* must be called with rcu read lock held */
> >   static inline void
> > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > index 27ce5fda8993..54949246f329 100644
> > --- a/net/netfilter/nf_conntrack_core.c
> > +++ b/net/netfilter/nf_conntrack_core.c
> > @@ -91,7 +91,7 @@ static DEFINE_MUTEX(nf_conntrack_mutex);
> >    * allowing non-idle machines to wakeup more often when needed.
> >    */
> >   #define GC_SCAN_INITIAL_COUNT       100
> > -#define GC_SCAN_INTERVAL_INIT        GC_SCAN_INTERVAL_MAX
> > +#define GC_SCAN_INTERVAL_INIT        nf_conntrack_gc_scan_interval_max
> >
> >   #define GC_SCAN_MAX_DURATION        msecs_to_jiffies(10)
> >   #define GC_SCAN_EXPIRED_MAX (64000u / HZ)
> > @@ -204,6 +204,9 @@ EXPORT_SYMBOL_GPL(nf_conntrack_htable_size);
> >
> >   unsigned int nf_conntrack_max __read_mostly;
> >   EXPORT_SYMBOL_GPL(nf_conntrack_max);
> > +
> > +unsigned int nf_conntrack_gc_scan_interval_max __read_mostly = GC_SCAN_INTERVAL_MAX;
> > +
> >   seqcount_spinlock_t nf_conntrack_generation __read_mostly;
> >   static siphash_aligned_key_t nf_conntrack_hash_rnd;
> >
> > @@ -1568,7 +1571,7 @@ static void gc_worker(struct work_struct *work)
> >                               delta_time = nfct_time_stamp - gc_work->start_time;
> >
> >                               /* re-sched immediately if total cycle time is exceeded */
> > -                             next_run = delta_time < (s32)GC_SCAN_INTERVAL_MAX;
> > +                             next_run = delta_time < (s32)nf_conntrack_gc_scan_interval_max;
> >                               goto early_exit;
> >                       }
> >
>
> READ_ONCE() is required IMHO as it can be modified from sysctl concurrently.
Done.
>
> > @@ -1630,7 +1633,7 @@ static void gc_worker(struct work_struct *work)
> >
> >       gc_work->next_bucket = 0;
> >
> > -     next_run = clamp(next_run, GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_MAX);
> > +     next_run = clamp(next_run, GC_SCAN_INTERVAL_MIN, nf_conntrack_gc_scan_interval_max);
> >
>
> Likewise here, READ_ONCE() recommended..

Done. I have also added a local variable gc_scan_max to avoid multiple
load instructions since it is referenced twice in the code.

>
> Thanks,
> Fernando.


* Re: [PATCH net-next] netfilter: conntrack: expose gc_scan_interval_max via sysctl
  2026-03-12 12:36 ` Florian Westphal
@ 2026-03-12 22:31   ` Prasanna Panchamukhi
  2026-03-12 22:42     ` Pablo Neira Ayuso
  2026-03-12 23:10     ` Florian Westphal
  0 siblings, 2 replies; 8+ messages in thread
From: Prasanna Panchamukhi @ 2026-03-12 22:31 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netfilter-devel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Pablo Neira Ayuso, Phil Sutter, netdev, linux-doc, linux-kernel,
	coreteam

On Thu, Mar 12, 2026 at 5:36 AM Florian Westphal <fw@strlen.de> wrote:
>
> Prasanna S Panchamukhi <panchamukhi@arista.com> wrote:
> > The conntrack garbage collection worker uses an adaptive algorithm that
> > adjusts the scan interval based on the average timeout of tracked
> > entries.  The upper bound of this interval is hardcoded as
> > GC_SCAN_INTERVAL_MAX (60 seconds).
> >
> > Expose the upper bound as a new sysctl,
> > net.netfilter.nf_conntrack_gc_scan_interval_max, so it can be tuned at
> > runtime without rebuilding the kernel.  The default remains 60 seconds
> > to preserve existing behavior.  The sysctl is global and read-only in
> > non-init network namespaces, consistent with nf_conntrack_max and
> > nf_conntrack_buckets.
>
> This was proposed before, see:
>
> https://lore.kernel.org/netfilter-devel/aO-id5W6Tr7frdHN@strlen.de/
> https://lore.kernel.org/netfilter-devel/aRsuU57juCvsMBKE@strlen.de/
>
> I did not hear back wrt. the horizon cache.
>
> I'm not 100% opposed to this, but I do wonder if we really can't do
> better than the current avg strategy.

Hi Florian,

Our primary goal is to cap the maximum time taken by the GC to clean
up expired entries. We rely on user-space notifications to clean up
these entries from the hardware, so ensuring a predictable upper bound
is important for our use case.

Regarding the adaptive strategy, we are using this sysctl to address
environments where the current average-based calculation delays the
cleanup of short-lived entries.

Thanks,
Prasanna


* Re: [PATCH net-next] netfilter: conntrack: expose gc_scan_interval_max via sysctl
  2026-03-12 22:31   ` Prasanna Panchamukhi
@ 2026-03-12 22:42     ` Pablo Neira Ayuso
  2026-03-12 23:10     ` Florian Westphal
  1 sibling, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2026-03-12 22:42 UTC (permalink / raw)
  To: Prasanna Panchamukhi
  Cc: Florian Westphal, netfilter-devel, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan, Phil Sutter, netdev, linux-doc, linux-kernel,
	coreteam

Hi Prasanna,

On Thu, Mar 12, 2026 at 03:31:06PM -0700, Prasanna Panchamukhi wrote:
> On Thu, Mar 12, 2026 at 5:36 AM Florian Westphal <fw@strlen.de> wrote:
> >
> > Prasanna S Panchamukhi <panchamukhi@arista.com> wrote:
> > > The conntrack garbage collection worker uses an adaptive algorithm that
> > > adjusts the scan interval based on the average timeout of tracked
> > > entries.  The upper bound of this interval is hardcoded as
> > > GC_SCAN_INTERVAL_MAX (60 seconds).
> > >
> > > Expose the upper bound as a new sysctl,
> > > net.netfilter.nf_conntrack_gc_scan_interval_max, so it can be tuned at
> > > runtime without rebuilding the kernel.  The default remains 60 seconds
> > > to preserve existing behavior.  The sysctl is global and read-only in
> > > non-init network namespaces, consistent with nf_conntrack_max and
> > > nf_conntrack_buckets.
> >
> > This was proposed before, see:
> >
> > https://lore.kernel.org/netfilter-devel/aO-id5W6Tr7frdHN@strlen.de/
> > https://lore.kernel.org/netfilter-devel/aRsuU57juCvsMBKE@strlen.de/
> >
> > I did not hear back wrt. the horizon cache.
> >
> > I'm not 100% opposed to this, but I do wonder if we really can't do
> > better than the current avg strategy.
> 
> Hi Florian,
> 
> Our primary goal is to cap the maximum time taken by the GC to clean
> up expired entries. We rely on user-space notifications to clean up
> these entries from the hardware, so ensuring a predictable upper bound
> is important for our use case.

Is there any reason why you decided not to use the existing hardware
offload infrastructure for this purpose instead?


* Re: [PATCH net-next] netfilter: conntrack: expose gc_scan_interval_max via sysctl
  2026-03-12 22:31   ` Prasanna Panchamukhi
  2026-03-12 22:42     ` Pablo Neira Ayuso
@ 2026-03-12 23:10     ` Florian Westphal
  1 sibling, 0 replies; 8+ messages in thread
From: Florian Westphal @ 2026-03-12 23:10 UTC (permalink / raw)
  To: Prasanna Panchamukhi
  Cc: netfilter-devel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Pablo Neira Ayuso, Phil Sutter, netdev, linux-doc, linux-kernel,
	coreteam

Prasanna Panchamukhi <panchamukhi@arista.com> wrote:
> Our primary goal is to cap the maximum time taken by the GC to clean
> up expired entries. We rely on user-space notifications to clean up
> these entries from the hardware, so ensuring a predictable upper bound
> is important for our use case.

Sure, but why can't we try to give a better default behavior?

while true; do conntrack -L >/dev/null; done

basically does what you want already (but in a dumb way).

> Regarding the adaptive strategy, we are using this sysctl to address
> environments where the current average-based calculation delays the
> cleanup of short-lived entries.

Yes, and I did propose to adapt the existing strategy to provide more
timely notifications.

