* [PATCH net-next] netfilter: conntrack: Reduce cond_resched frequency in gc_worker
@ 2025-10-14 11:51 lirongqing
2025-10-14 13:06 ` Florian Westphal
0 siblings, 1 reply; 5+ messages in thread
From: lirongqing @ 2025-10-14 11:51 UTC
To: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
Phil Sutter, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, netfilter-devel, coreteam, netdev,
linux-kernel
Cc: Li RongQing
From: Li RongQing <lirongqing@baidu.com>
The current implementation calls cond_resched() in every iteration
of the garbage collection loop. This creates some overhead when
processing large conntrack tables with billions of entries,
as each cond_resched() invocation involves scheduler operations.
To reduce this overhead, implement a time-based throttling mechanism
that calls cond_resched() at most once per millisecond. This maintains
system responsiveness while minimizing scheduler contention.
gc_worker() with hashsize=10000 shows measurable improvement:
Before: 7114.274us
After: 5993.518us (15.8% reduction)
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
net/netfilter/nf_conntrack_core.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 344f882..779ca03 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1513,7 +1513,7 @@ static bool gc_worker_can_early_drop(const struct nf_conn *ct)
static void gc_worker(struct work_struct *work)
{
unsigned int i, hashsz, nf_conntrack_max95 = 0;
- u32 end_time, start_time = nfct_time_stamp;
+ u32 end_time, resched_time, start_time = nfct_time_stamp;
struct conntrack_gc_work *gc_work;
unsigned int expired_count = 0;
unsigned long next_run;
@@ -1536,6 +1536,7 @@ static void gc_worker(struct work_struct *work)
count = gc_work->count;
end_time = start_time + GC_SCAN_MAX_DURATION;
+ resched_time = nfct_time_stamp;
do {
struct nf_conntrack_tuple_hash *h;
@@ -1615,7 +1616,10 @@ static void gc_worker(struct work_struct *work)
* we will just continue with next hash slot.
*/
rcu_read_unlock();
- cond_resched();
+ if (nfct_time_stamp - resched_time > msecs_to_jiffies(1)) {
+ cond_resched();
+ resched_time = nfct_time_stamp;
+ }
i++;
delta_time = nfct_time_stamp - end_time;
--
2.9.4
* Re: [PATCH net-next] netfilter: conntrack: Reduce cond_resched frequency in gc_worker
2025-10-14 11:51 [PATCH net-next] netfilter: conntrack: Reduce cond_resched frequency in gc_worker lirongqing
@ 2025-10-14 13:06 ` Florian Westphal
2025-10-15  1:56 ` Li,Rongqing
0 siblings, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2025-10-14 13:06 UTC
To: lirongqing
Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Phil Sutter,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, netfilter-devel, coreteam, netdev, linux-kernel
lirongqing <lirongqing@baidu.com> wrote:
> From: Li RongQing <lirongqing@baidu.com>
>
> The current implementation calls cond_resched() in every iteration
> of the garbage collection loop. This creates some overhead when
> processing large conntrack tables with billions of entries,
> as each cond_resched() invocation involves scheduler operations.
>
> To reduce this overhead, implement a time-based throttling mechanism
> that calls cond_resched() at most once per millisecond. This maintains
> system responsiveness while minimizing scheduler contention.
>
> gc_worker() with hashsize=10000 shows measurable improvement:
>
> Before: 7114.274us
> After: 5993.518us (15.8% reduction)
I dislike this, I have never seen this pattern.
Whole point of cond_resched() is to let scheduler decide.
Maybe it would be better to move gc_worker off to its own
work queue (create_workqueue()) instead of reusing system wq
so one can tune the priority instead?
* RE: Re: [PATCH net-next] netfilter: conntrack: Reduce cond_resched frequency in gc_worker
2025-10-14 13:06 ` Florian Westphal
@ 2025-10-15 1:56 ` Li,Rongqing
2025-10-15 11:06 ` Florian Westphal
0 siblings, 1 reply; 5+ messages in thread
From: Li,Rongqing @ 2025-10-15  1:56 UTC
To: Florian Westphal
Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Phil Sutter,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, netfilter-devel@vger.kernel.org,
coreteam@netfilter.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
> > The current implementation calls cond_resched() in every iteration of
> > the garbage collection loop. This creates some overhead when
> > processing large conntrack tables with billions of entries, as each
> > cond_resched() invocation involves scheduler operations.
> >
> > To reduce this overhead, implement a time-based throttling mechanism
> > that calls cond_resched() at most once per millisecond. This maintains
> > system responsiveness while minimizing scheduler contention.
> >
> > gc_worker() with hashsize=10000 shows measurable improvement:
> >
> > Before: 7114.274us
> > After: 5993.518us (15.8% reduction)
>
> I dislike this, I have never seen this pattern.
>
This patch is similar to
commit 271557de7cbfdecb08e89ae1ca74647ceb57224f
xfs: reduce the rate of cond_resched calls inside scrub
> Whole point of cond_resched() is to let scheduler decide.
>
> Maybe it would be better to move gc_worker off to its own work queue
> (create_workqueue()) instead of reusing system wq so one can tune the
> priority instead?
I am fine with moving gc_worker to its own work queue.
Thanks
-Li
* Re: [PATCH net-next] netfilter: conntrack: Reduce cond_resched frequency in gc_worker
2025-10-15  1:56 ` Li,Rongqing
@ 2025-10-15 11:06 ` Florian Westphal
2025-10-15 11:49 ` Peter Zijlstra
0 siblings, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2025-10-15 11:06 UTC
To: Li,Rongqing
Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, netfilter-devel@vger.kernel.org,
coreteam@netfilter.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, mingo, peterz, juri.lelli,
vincent.guittot
Li,Rongqing <lirongqing@baidu.com> wrote:
[ CC scheduler experts & drop netfilter maintainers ]
Context: proposed patch
(https://patchwork.ozlabs.org/project/netfilter-devel/patch/20251014115103.2678-1-lirongqing@baidu.com/)
does:
- cond_resched();
+ if (jiffies - resched_time > msecs_to_jiffies(1)) {
+ cond_resched();
+ resched_time = jiffies;
+ }
... and my knee-jerk reaction was "reject".
But author pointed me at:
commit 271557de7cbfdecb08e89ae1ca74647ceb57224f
xfs: reduce the rate of cond_resched calls inside scrub
So:
Is calling cond_resched() unconditionally while walking hashtable/tree etc.
really discouraged? I see a truckload of cond_resched() calls in similar
walkers all over networking. I find it hard to believe that conntrack is
somehow special and should call it only once per ms.
If cond_resched() is really so expensive even just for *checking*
(retval 0), then maybe we should only call it for every n-th hash slot?
(every L1_CACHE_BYTES?).
But even in that case it would be good to have a comment or documentation
entry about recommended usage, or better yet, make a variant of
xchk_maybe_relax() available via sched.h...
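
A minimal sketch of what such a helper could look like (hypothetical, not an existing kernel API; modelled loosely on the xfs pattern, combining both ideas above — a cheap counter so jiffies is only read every n-th call, plus a time gate so cond_resched() runs at most once per millisecond):

#include <linux/jiffies.h>
#include <linux/sched.h>

struct maybe_resched {
	unsigned int calls;	/* iterations since last time check */
	unsigned long next_due;	/* jiffies of next allowed resched */
};

static inline void maybe_resched(struct maybe_resched *mr)
{
	/* Cheap path: look at the clock only on every 64th call. */
	if (++mr->calls & 63)
		return;
	/* Time gate: reschedule at most once per millisecond. */
	if (time_before(jiffies, mr->next_due))
		return;
	mr->next_due = jiffies + msecs_to_jiffies(1);
	cond_resched();
}

A caller such as gc_worker() would keep a zero-initialized struct maybe_resched on the stack and call maybe_resched(&mr) where it now calls cond_resched().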
* Re: [PATCH net-next] netfilter: conntrack: Reduce cond_resched frequency in gc_worker
2025-10-15 11:06 ` Florian Westphal
@ 2025-10-15 11:49 ` Peter Zijlstra
0 siblings, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2025-10-15 11:49 UTC
To: Florian Westphal
Cc: Li,Rongqing, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, netfilter-devel@vger.kernel.org,
coreteam@netfilter.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, mingo, juri.lelli, vincent.guittot,
Thomas Gleixner
On Wed, Oct 15, 2025 at 01:06:01PM +0200, Florian Westphal wrote:
> Li,Rongqing <lirongqing@baidu.com> wrote:
>
> [ CC scheduler experts & drop netfilter maintainers ]
>
> Context: proposed patch
> (https://patchwork.ozlabs.org/project/netfilter-devel/patch/20251014115103.2678-1-lirongqing@baidu.com/)
> does:
>
> - cond_resched();
> + if (jiffies - resched_time > msecs_to_jiffies(1)) {
> + cond_resched();
> + resched_time = jiffies;
> + }
>
> ... and my knee-jerk reaction was "reject".
>
> But author pointed me at:
> commit 271557de7cbfdecb08e89ae1ca74647ceb57224f
> xfs: reduce the rate of cond_resched calls inside scrub
>
> So:
>
> Is calling cond_resched() unconditionally while walking hashtable/tree etc.
> really discouraged? I see a truckload of cond_resched() calls in similar
> walkers all over networking. I find it hard to believe that conntrack is
> somehow special and should call it only once per ms.
>
> If cond_resched() is really so expensive even just for *checking*
> (retval 0), then maybe we should only call it for every n-th hash slot?
> (every L1_CACHE_BYTES?).
>
> But even in that case it would be good to have a comment or documentation
> entry about recommended usage, or better yet, make a variant of
> xchk_maybe_relax() available via sched.h...
The plan is to remove cond_resched() and friends entirely and have
PREEMPT_LAZY fully replace PREEMPT_VOLUNTARY.
But I don't think we currently have anybody actively working on this.
Ideally distros should be switching to LAZY and report performance vs
VOLUNTARY such that we can try and address things.
Perhaps the thing to do is to just disable VOLUNTARY and see what
happens :-)