From: Chuang Wang <nashuiliang@gmail.com>
Cc: Chuang Wang <nashuiliang@gmail.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>,
Kuniyuki Iwashima <kuniyu@google.com>,
Samiullah Khawaja <skhawaja@google.com>,
Hangbin Liu <liuhangbin@gmail.com>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next] net: reduce RFS/ARFS flow updates by checking LLC affinity
Date: Sun, 8 Mar 2026 15:09:16 +0800
Message-ID: <20260308070925.58939-1-nashuiliang@gmail.com>

rps_record_sock_flow() currently rewrites the sock flow table entry
whenever a socket is processed on a CPU other than the recorded one.
In high-load scenarios, especially with Accelerated RFS (ARFS), this
triggers frequent flow steering updates via ndo_rx_flow_steer().

For drivers that implement hardware flow steering, such as mlx5, these
constant updates lead to significant contention on internal driver locks
(e.g., arfs_lock). This contention often becomes a performance
bottleneck that outweighs the steering benefits.

Introduce a cache-aware update strategy: keep the existing record while
the old and new CPUs share a Last Level Cache (LLC), and update it only
when the flow migrates across an LLC boundary, when the recorded CPU is
invalid or no longer active, or when the current task is no longer
allowed to run on the recorded CPU. This minimizes expensive hardware
reconfiguration while preserving cache locality for the application.

Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
---
include/net/rps.h | 17 +--------------
net/core/dev.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+), 16 deletions(-)
diff --git a/include/net/rps.h b/include/net/rps.h
index e33c6a2fa8bb..2cd8698a79d5 100644
--- a/include/net/rps.h
+++ b/include/net/rps.h
@@ -55,22 +55,7 @@ struct rps_sock_flow_table {
 
 #define RPS_NO_CPU 0xffff
 
-static inline void rps_record_sock_flow(rps_tag_ptr tag_ptr, u32 hash)
-{
-	unsigned int index = hash & rps_tag_to_mask(tag_ptr);
-	u32 val = hash & ~net_hotdata.rps_cpu_mask;
-	struct rps_sock_flow_table *table;
-
-	/* We only give a hint, preemption can change CPU under us */
-	val |= raw_smp_processor_id();
-
-	table = rps_tag_to_table(tag_ptr);
-	/* The following WRITE_ONCE() is paired with the READ_ONCE()
-	 * here, and another one in get_rps_cpu().
-	 */
-	if (READ_ONCE(table[index].ent) != val)
-		WRITE_ONCE(table[index].ent, val);
-}
+void rps_record_sock_flow(rps_tag_ptr tag_ptr, u32 hash);
 
 static inline void _sock_rps_record_flow_hash(__u32 hash)
 {
diff --git a/net/core/dev.c b/net/core/dev.c
index 203dc36aaed5..770cfb6fe06b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5175,6 +5175,60 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 	return cpu;
 }
 
+/**
+ * rps_record_cond - Determine if RPS flow table should be updated
+ * @old_val: Previous flow record value
+ * @new_val: Target flow record value
+ *
+ * Return: true if the record needs an update.
+ */
+static inline bool rps_record_cond(u32 old_val, u32 new_val)
+{
+	u32 old_cpu = old_val & net_hotdata.rps_cpu_mask;
+	u32 new_cpu = new_val & net_hotdata.rps_cpu_mask;
+
+	if (old_val == new_val)
+		return false;
+
+	/* Force update if the recorded CPU is invalid or has gone offline */
+	if (old_cpu >= nr_cpu_ids || !cpu_active(old_cpu))
+		return true;
+
+	/*
+	 * Force an update if the current task is no longer permitted
+	 * to run on the old_cpu.
+	 */
+	if (!cpumask_test_cpu(old_cpu, current->cpus_ptr))
+		return true;
+
+	/*
+	 * If CPUs do not share a cache, allow the update to prevent
+	 * expensive remote memory accesses and cache misses.
+	 */
+	if (!cpus_share_cache(old_cpu, new_cpu))
+		return true;
+
+	return false;
+}
+
+void rps_record_sock_flow(rps_tag_ptr tag_ptr, u32 hash)
+{
+	unsigned int index = hash & rps_tag_to_mask(tag_ptr);
+	u32 val = hash & ~net_hotdata.rps_cpu_mask;
+	struct rps_sock_flow_table *table;
+
+	/* We only give a hint, preemption can change CPU under us */
+	val |= raw_smp_processor_id();
+
+	table = rps_tag_to_table(tag_ptr);
+	/* The following WRITE_ONCE() is paired with the READ_ONCE()
+	 * here, and another one in get_rps_cpu().
+	 */
+	if (rps_record_cond(READ_ONCE(table[index].ent), val))
+		WRITE_ONCE(table[index].ent, val);
+}
+EXPORT_SYMBOL(rps_record_sock_flow);
+
 #ifdef CONFIG_RFS_ACCEL
 
 /**
--
2.47.3