* [PATCH v2 net-next 0/7] rfs: use high-order allocations for hash tables
@ 2026-03-01 18:14 Eric Dumazet
2026-03-01 18:14 ` [PATCH v2 net-next 1/7] net: add rps_tag_ptr type and helpers Eric Dumazet
` (6 more replies)
0 siblings, 7 replies; 11+ messages in thread
From: Eric Dumazet @ 2026-03-01 18:14 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
Eric Dumazet
This series adds rps_tag_ptr, which encodes both a pointer to a
power-of-two hash table and the log2 of its size in a single unsigned long.
RFS hash tables (global and per rx-queue) are converted to rps_tag_ptr.
This removes a cache line miss on lookup and allows high-order allocations.
The global hash table can benefit from huge pages.
v2: address various kernel bot reports.
Eric Dumazet (7):
net: add rps_tag_ptr type and helpers
net-sysfs: remove rcu field from 'struct rps_sock_flow_table'
net-sysfs: add rps_sock_flow_table_mask() helper
net-sysfs: use rps_tag_ptr and remove metadata from
rps_sock_flow_table
net-sysfs: get rid of rps_dev_flow_lock
net-sysfs: remove rcu field from 'struct rps_dev_flow_table'
net-sysfs: use rps_tag_ptr and remove metadata from rps_dev_flow_table
Documentation/networking/scaling.rst | 13 ++--
include/net/hotdata.h | 5 +-
include/net/netdev_rx_queue.h | 3 +-
include/net/rps-types.h | 24 +++++++
include/net/rps.h | 49 ++++++---------
net/core/dev.c | 61 +++++++++++-------
net/core/net-sysfs.c | 66 +++++++++----------
net/core/sysctl_net_core.c | 94 +++++++++++++++-------------
8 files changed, 176 insertions(+), 139 deletions(-)
create mode 100644 include/net/rps-types.h
--
2.53.0.473.g4a7958ca14-goog
* [PATCH v2 net-next 1/7] net: add rps_tag_ptr type and helpers
2026-03-01 18:14 [PATCH v2 net-next 0/7] rfs: use high-order allocations for hash tables Eric Dumazet
@ 2026-03-01 18:14 ` Eric Dumazet
2026-03-01 18:14 ` [PATCH v2 net-next 2/7] net-sysfs: remove rcu field from 'struct rps_sock_flow_table' Eric Dumazet
` (5 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2026-03-01 18:14 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
Eric Dumazet
Add a new rps_tag_ptr type that encodes a pointer to a power-of-two
table together with the log2 of the table size.
Three helpers are added, converting an rps_tag_ptr to:
1) The log2 of the size.
2) A mask: (size - 1).
3) A pointer to the array.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/rps-types.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
create mode 100644 include/net/rps-types.h
diff --git a/include/net/rps-types.h b/include/net/rps-types.h
new file mode 100644
index 0000000000000000000000000000000000000000..6b90a66866c1f75dae768bed84a4eeb9ffb5fc1a
--- /dev/null
+++ b/include/net/rps-types.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _NET_RPS_TYPES_H
+#define _NET_RPS_TYPES_H
+
+/* Define a rps_tag_ptr:
+ * Low order 5 bits are used to store the ilog2(size) of an RPS table.
+ */
+typedef unsigned long rps_tag_ptr;
+
+static inline u8 rps_tag_to_log(rps_tag_ptr tag_ptr)
+{
+ return tag_ptr & 31U;
+}
+
+static inline u32 rps_tag_to_mask(rps_tag_ptr tag_ptr)
+{
+ return (1U << rps_tag_to_log(tag_ptr)) - 1;
+}
+
+static inline void *rps_tag_to_table(rps_tag_ptr tag_ptr)
+{
+ return (void *)(tag_ptr & ~31UL);
+}
+#endif /* _NET_RPS_TYPES_H */
--
2.53.0.473.g4a7958ca14-goog
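For readers following along outside the kernel tree, the encoding added by
this patch can be sketched in userspace C. This is an illustration under
stated assumptions, not the kernel code: every demo_ name is hypothetical,
uintptr_t stands in for unsigned long, and aligned_alloc() is used only to
guarantee that the low 5 bits of the table address are zero.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for rps_tag_ptr. */
typedef uintptr_t demo_tag_ptr;

#define DEMO_TAG_MASK 31UL	/* low 5 bits hold ilog2(size) */

/* Pack a table pointer and the log2 of its entry count into one word. */
static demo_tag_ptr demo_tag_make(void *table, unsigned int log)
{
	assert(((uintptr_t)table & DEMO_TAG_MASK) == 0); /* needs 32-byte alignment */
	assert(log <= DEMO_TAG_MASK);
	return (uintptr_t)table | log;
}

static unsigned int demo_tag_to_log(demo_tag_ptr tag)
{
	return tag & DEMO_TAG_MASK;
}

static uint32_t demo_tag_to_mask(demo_tag_ptr tag)
{
	return (1U << demo_tag_to_log(tag)) - 1;
}

static void *demo_tag_to_table(demo_tag_ptr tag)
{
	return (void *)(tag & ~(uintptr_t)DEMO_TAG_MASK);
}

/* Allocate a table of (1 << log) u32 slots, tag it, and check the round trip.
 * Returns 0 when pointer, log and mask are all recovered intact.
 */
static int demo_tag_roundtrip(unsigned int log)
{
	size_t bytes = ((size_t)1 << log) * sizeof(uint32_t);
	demo_tag_ptr tag;
	void *table;
	int ok;

	if (bytes < 64)
		bytes = 64;	/* aligned_alloc() wants a multiple of the alignment */
	table = aligned_alloc(64, bytes);
	if (!table)
		return -1;

	tag = demo_tag_make(table, log);
	ok = demo_tag_to_table(tag) == table &&
	     demo_tag_to_log(tag) == log &&
	     demo_tag_to_mask(tag) == (uint32_t)(((uint64_t)1 << log) - 1);
	free(table);
	return ok ? 0 : -1;
}
```

A table of 1 << 10 entries carries the value 10 in its low bits, so a reader
can recover both the mask (1023) and the base address from a single load.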
* [PATCH v2 net-next 2/7] net-sysfs: remove rcu field from 'struct rps_sock_flow_table'
2026-03-01 18:14 [PATCH v2 net-next 0/7] rfs: use high-order allocations for hash tables Eric Dumazet
2026-03-01 18:14 ` [PATCH v2 net-next 1/7] net: add rps_tag_ptr type and helpers Eric Dumazet
@ 2026-03-01 18:14 ` Eric Dumazet
2026-03-01 18:14 ` [PATCH v2 net-next 3/7] net-sysfs: add rps_sock_flow_table_mask() helper Eric Dumazet
` (4 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2026-03-01 18:14 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
Eric Dumazet
Removing rcu_head (and @mask in a following patch)
will allow a power-of-two allocation and thus high-order
allocation for better performance.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/rps.h | 1 -
net/core/sysctl_net_core.c | 4 +++-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/net/rps.h b/include/net/rps.h
index f1794cd2e7fb32a36bde9959fab651663ab190fd..32cfa250d9f931b8ab1c94e0410d0820bb9c999f 100644
--- a/include/net/rps.h
+++ b/include/net/rps.h
@@ -60,7 +60,6 @@ struct rps_dev_flow_table {
* meaning we use 32-6=26 bits for the hash.
*/
struct rps_sock_flow_table {
- struct rcu_head rcu;
u32 mask;
u32 ents[] ____cacheline_aligned_in_smp;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 03aea10073f003b0339884ee0f40b8c96d7d22e2..0b659c932cffef45e05207890b8187d64ae3c85a 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -147,6 +147,7 @@ static int rps_sock_flow_sysctl(const struct ctl_table *table, int write,
};
struct rps_sock_flow_table *orig_sock_table, *sock_table;
static DEFINE_MUTEX(sock_flow_mutex);
+ void *tofree = NULL;
mutex_lock(&sock_flow_mutex);
@@ -193,13 +194,14 @@ static int rps_sock_flow_sysctl(const struct ctl_table *table, int write,
if (orig_sock_table) {
static_branch_dec(&rps_needed);
static_branch_dec(&rfs_needed);
- kvfree_rcu(orig_sock_table, rcu);
+ tofree = orig_sock_table;
}
}
}
mutex_unlock(&sock_flow_mutex);
+ kvfree_rcu_mightsleep(tofree);
return ret;
}
#endif /* CONFIG_RPS */
--
2.53.0.473.g4a7958ca14-goog
* [PATCH v2 net-next 3/7] net-sysfs: add rps_sock_flow_table_mask() helper
2026-03-01 18:14 [PATCH v2 net-next 0/7] rfs: use high-order allocations for hash tables Eric Dumazet
2026-03-01 18:14 ` [PATCH v2 net-next 1/7] net: add rps_tag_ptr type and helpers Eric Dumazet
2026-03-01 18:14 ` [PATCH v2 net-next 2/7] net-sysfs: remove rcu field from 'struct rps_sock_flow_table' Eric Dumazet
@ 2026-03-01 18:14 ` Eric Dumazet
2026-03-01 18:14 ` [PATCH v2 net-next 4/7] net-sysfs: use rps_tag_ptr and remove metadata from rps_sock_flow_table Eric Dumazet
` (3 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2026-03-01 18:14 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
Eric Dumazet
In preparation for the following patch, abstract access
to the @mask field in 'struct rps_sock_flow_table'.
Also clean up rps_sock_flow_sysctl() a bit:
- Rename orig_sock_table to o_sock_table.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/rps.h | 11 ++++++++---
net/core/dev.c | 4 +++-
net/core/sysctl_net_core.c | 19 ++++++++++---------
3 files changed, 21 insertions(+), 13 deletions(-)
diff --git a/include/net/rps.h b/include/net/rps.h
index 32cfa250d9f931b8ab1c94e0410d0820bb9c999f..82cdffdf3e6b0035e7ceeb130b5b4ac19772e46c 100644
--- a/include/net/rps.h
+++ b/include/net/rps.h
@@ -60,18 +60,23 @@ struct rps_dev_flow_table {
* meaning we use 32-6=26 bits for the hash.
*/
struct rps_sock_flow_table {
- u32 mask;
+ u32 _mask;
u32 ents[] ____cacheline_aligned_in_smp;
};
#define RPS_SOCK_FLOW_TABLE_SIZE(_num) (offsetof(struct rps_sock_flow_table, ents[_num]))
+static inline u32 rps_sock_flow_table_mask(const struct rps_sock_flow_table *table)
+{
+ return table->_mask;
+}
+
#define RPS_NO_CPU 0xffff
static inline void rps_record_sock_flow(struct rps_sock_flow_table *table,
u32 hash)
{
- unsigned int index = hash & table->mask;
+ unsigned int index = hash & rps_sock_flow_table_mask(table);
u32 val = hash & ~net_hotdata.rps_cpu_mask;
/* We only give a hint, preemption can change CPU under us */
@@ -129,7 +134,7 @@ static inline void _sock_rps_delete_flow(const struct sock *sk)
rcu_read_lock();
table = rcu_dereference(net_hotdata.rps_sock_flow_table);
if (table) {
- index = hash & table->mask;
+ index = hash & rps_sock_flow_table_mask(table);
if (READ_ONCE(table->ents[index]) != RPS_NO_CPU)
WRITE_ONCE(table->ents[index], RPS_NO_CPU);
}
diff --git a/net/core/dev.c b/net/core/dev.c
index 1cf3ad840697ed93a6c4cc5163aae514fda90eff..de70ef784d6363b3af4f9279e107647c90f5af19 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5112,12 +5112,14 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
if (flow_table && sock_flow_table) {
struct rps_dev_flow *rflow;
u32 next_cpu;
+ u32 flow_id;
u32 ident;
/* First check into global flow table if there is a match.
* This READ_ONCE() pairs with WRITE_ONCE() from rps_record_sock_flow().
*/
- ident = READ_ONCE(sock_flow_table->ents[hash & sock_flow_table->mask]);
+ flow_id = hash & rps_sock_flow_table_mask(sock_flow_table);
+ ident = READ_ONCE(sock_flow_table->ents[flow_id]);
if ((ident ^ hash) & ~net_hotdata.rps_cpu_mask)
goto try_rps;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 0b659c932cffef45e05207890b8187d64ae3c85a..cfbe798493b5789dc8baedf9dcbe9c20918e2ba6 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -145,16 +145,17 @@ static int rps_sock_flow_sysctl(const struct ctl_table *table, int write,
.maxlen = sizeof(size),
.mode = table->mode
};
- struct rps_sock_flow_table *orig_sock_table, *sock_table;
+ struct rps_sock_flow_table *o_sock_table, *sock_table;
static DEFINE_MUTEX(sock_flow_mutex);
void *tofree = NULL;
mutex_lock(&sock_flow_mutex);
- orig_sock_table = rcu_dereference_protected(
+ o_sock_table = rcu_dereference_protected(
net_hotdata.rps_sock_flow_table,
lockdep_is_held(&sock_flow_mutex));
- size = orig_size = orig_sock_table ? orig_sock_table->mask + 1 : 0;
+ size = o_sock_table ? rps_sock_flow_table_mask(o_sock_table) + 1 : 0;
+ orig_size = size;
ret = proc_dointvec(&tmp, write, buffer, lenp, ppos);
@@ -165,6 +166,7 @@ static int rps_sock_flow_sysctl(const struct ctl_table *table, int write,
mutex_unlock(&sock_flow_mutex);
return -EINVAL;
}
+ sock_table = o_sock_table;
size = roundup_pow_of_two(size);
if (size != orig_size) {
sock_table =
@@ -175,26 +177,25 @@ static int rps_sock_flow_sysctl(const struct ctl_table *table, int write,
}
net_hotdata.rps_cpu_mask =
roundup_pow_of_two(nr_cpu_ids) - 1;
- sock_table->mask = size - 1;
- } else
- sock_table = orig_sock_table;
+ sock_table->_mask = size - 1;
+ }
for (i = 0; i < size; i++)
sock_table->ents[i] = RPS_NO_CPU;
} else
sock_table = NULL;
- if (sock_table != orig_sock_table) {
+ if (sock_table != o_sock_table) {
rcu_assign_pointer(net_hotdata.rps_sock_flow_table,
sock_table);
if (sock_table) {
static_branch_inc(&rps_needed);
static_branch_inc(&rfs_needed);
}
- if (orig_sock_table) {
+ if (o_sock_table) {
static_branch_dec(&rps_needed);
static_branch_dec(&rfs_needed);
- tofree = orig_sock_table;
+ tofree = o_sock_table;
}
}
}
--
2.53.0.473.g4a7958ca14-goog
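The entry format that the hunks above read and write can be sketched in
userspace C. This is a rough model, not the kernel implementation: the demo_
names are hypothetical, and demo_entry_cpu() assumes the CPU is recovered
with `ident & rps_cpu_mask`, which is not shown in this patch. Each u32 slot
keeps the upper hash bits together with the CPU number in the low bits, and a
lookup accepts the slot only when the upper bits agree with the packet hash.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors "roundup_pow_of_two(nr_cpu_ids) - 1" from rps_sock_flow_sysctl(). */
static uint32_t demo_cpu_mask(unsigned int nr_cpu_ids)
{
	uint32_t n = 1;

	while (n < nr_cpu_ids)
		n <<= 1;
	return n - 1;
}

/* Value stored by rps_record_sock_flow(): upper hash bits | current CPU. */
static uint32_t demo_make_entry(uint32_t hash, uint32_t cpu, uint32_t cpu_mask)
{
	return (hash & ~cpu_mask) | cpu;
}

/* Same test as get_rps_cpu(): the slot belongs to this flow only when the
 * bits above the CPU field are identical.
 */
static int demo_entry_matches(uint32_t ident, uint32_t hash, uint32_t cpu_mask)
{
	return ((ident ^ hash) & ~cpu_mask) == 0;
}

/* Assumed extraction of the CPU number from a matching slot. */
static uint32_t demo_entry_cpu(uint32_t ident, uint32_t cpu_mask)
{
	return ident & cpu_mask;
}
```

With 64 possible CPUs the mask is 63, so 6 bits of each slot hold the CPU and
the remaining 26 bits hold the hash, which matches the "32-6=26 bits" comment
in include/net/rps.h.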
* [PATCH v2 net-next 4/7] net-sysfs: use rps_tag_ptr and remove metadata from rps_sock_flow_table
2026-03-01 18:14 [PATCH v2 net-next 0/7] rfs: use high-order allocations for hash tables Eric Dumazet
` (2 preceding siblings ...)
2026-03-01 18:14 ` [PATCH v2 net-next 3/7] net-sysfs: add rps_sock_flow_table_mask() helper Eric Dumazet
@ 2026-03-01 18:14 ` Eric Dumazet
2026-03-01 18:14 ` [PATCH v2 net-next 5/7] net-sysfs: get rid of rps_dev_flow_lock Eric Dumazet
` (2 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2026-03-01 18:14 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
Eric Dumazet
Instead of storing the @mask at the beginning of rps_sock_flow_table,
use 5 low order bits of the rps_tag_ptr to store the log of the size.
This removes a potential cache line miss to fetch @mask.
More importantly, we can switch to vmalloc_huge() without wasting memory.
Tested with:
numactl --interleave=all bash -c "echo 4194304 >/proc/sys/net/core/rps_sock_flow_entries"
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
Documentation/networking/scaling.rst | 13 ++--
include/net/hotdata.h | 5 +-
include/net/rps.h | 42 ++++++-------
net/core/dev.c | 12 ++--
net/core/sysctl_net_core.c | 89 +++++++++++++++-------------
5 files changed, 86 insertions(+), 75 deletions(-)
diff --git a/Documentation/networking/scaling.rst b/Documentation/networking/scaling.rst
index 0023afa530ec166bb13558318053ff5ed0906b71..6c261eb48845a40516f201233df13694863ee8cd 100644
--- a/Documentation/networking/scaling.rst
+++ b/Documentation/networking/scaling.rst
@@ -403,16 +403,21 @@ Both of these need to be set before RFS is enabled for a receive queue.
Values for both are rounded up to the nearest power of two. The
suggested flow count depends on the expected number of active connections
at any given time, which may be significantly less than the number of open
-connections. We have found that a value of 32768 for rps_sock_flow_entries
-works fairly well on a moderately loaded server.
+connections. We have found that a value of 65536 for rps_sock_flow_entries
+works fairly well on a moderately loaded server. Big servers might
+need 1048576 or even higher values.
+
+On a NUMA host it is advisable to spread rps_sock_flow_entries on all nodes.
+
+numactl --interleave=all bash -c "echo 1048576 >/proc/sys/net/core/rps_sock_flow_entries"
For a single queue device, the rps_flow_cnt value for the single queue
would normally be configured to the same value as rps_sock_flow_entries.
For a multi-queue device, the rps_flow_cnt for each queue might be
configured as rps_sock_flow_entries / N, where N is the number of
-queues. So for instance, if rps_sock_flow_entries is set to 32768 and there
+queues. So for instance, if rps_sock_flow_entries is set to 131072 and there
are 16 configured receive queues, rps_flow_cnt for each queue might be
-configured as 2048.
+configured as 8192.
Accelerated RFS
diff --git a/include/net/hotdata.h b/include/net/hotdata.h
index 6632b1aa7584821fd4ab42163b77dfff6732a45e..62534d1f3c707038cd6b805ccc1889e7709d999d 100644
--- a/include/net/hotdata.h
+++ b/include/net/hotdata.h
@@ -6,6 +6,9 @@
#include <linux/types.h>
#include <linux/netdevice.h>
#include <net/protocol.h>
+#ifdef CONFIG_RPS
+#include <net/rps-types.h>
+#endif
struct skb_defer_node {
struct llist_head defer_list;
@@ -33,7 +36,7 @@ struct net_hotdata {
struct kmem_cache *skbuff_fclone_cache;
struct kmem_cache *skb_small_head_cache;
#ifdef CONFIG_RPS
- struct rps_sock_flow_table __rcu *rps_sock_flow_table;
+ rps_tag_ptr rps_sock_flow_table;
u32 rps_cpu_mask;
#endif
struct skb_defer_node __percpu *skb_defer_nodes;
diff --git a/include/net/rps.h b/include/net/rps.h
index 82cdffdf3e6b0035e7ceeb130b5b4ac19772e46c..dee930d9dd38e0e975e78d938bc7adc96048b724 100644
--- a/include/net/rps.h
+++ b/include/net/rps.h
@@ -8,6 +8,7 @@
#include <net/hotdata.h>
#ifdef CONFIG_RPS
+#include <net/rps-types.h>
extern struct static_key_false rps_needed;
extern struct static_key_false rfs_needed;
@@ -60,45 +61,38 @@ struct rps_dev_flow_table {
* meaning we use 32-6=26 bits for the hash.
*/
struct rps_sock_flow_table {
- u32 _mask;
-
- u32 ents[] ____cacheline_aligned_in_smp;
+ u32 ent;
};
-#define RPS_SOCK_FLOW_TABLE_SIZE(_num) (offsetof(struct rps_sock_flow_table, ents[_num]))
-
-static inline u32 rps_sock_flow_table_mask(const struct rps_sock_flow_table *table)
-{
- return table->_mask;
-}
#define RPS_NO_CPU 0xffff
-static inline void rps_record_sock_flow(struct rps_sock_flow_table *table,
- u32 hash)
+static inline void rps_record_sock_flow(rps_tag_ptr tag_ptr, u32 hash)
{
- unsigned int index = hash & rps_sock_flow_table_mask(table);
+ unsigned int index = hash & rps_tag_to_mask(tag_ptr);
u32 val = hash & ~net_hotdata.rps_cpu_mask;
+ struct rps_sock_flow_table *table;
/* We only give a hint, preemption can change CPU under us */
val |= raw_smp_processor_id();
+ table = rps_tag_to_table(tag_ptr);
/* The following WRITE_ONCE() is paired with the READ_ONCE()
* here, and another one in get_rps_cpu().
*/
- if (READ_ONCE(table->ents[index]) != val)
- WRITE_ONCE(table->ents[index], val);
+ if (READ_ONCE(table[index].ent) != val)
+ WRITE_ONCE(table[index].ent, val);
}
static inline void _sock_rps_record_flow_hash(__u32 hash)
{
- struct rps_sock_flow_table *sock_flow_table;
+ rps_tag_ptr tag_ptr;
if (!hash)
return;
rcu_read_lock();
- sock_flow_table = rcu_dereference(net_hotdata.rps_sock_flow_table);
- if (sock_flow_table)
- rps_record_sock_flow(sock_flow_table, hash);
+ tag_ptr = READ_ONCE(net_hotdata.rps_sock_flow_table);
+ if (tag_ptr)
+ rps_record_sock_flow(tag_ptr, hash);
rcu_read_unlock();
}
@@ -125,6 +119,7 @@ static inline void _sock_rps_record_flow(const struct sock *sk)
static inline void _sock_rps_delete_flow(const struct sock *sk)
{
struct rps_sock_flow_table *table;
+ rps_tag_ptr tag_ptr;
u32 hash, index;
hash = READ_ONCE(sk->sk_rxhash);
@@ -132,11 +127,12 @@ static inline void _sock_rps_delete_flow(const struct sock *sk)
return;
rcu_read_lock();
- table = rcu_dereference(net_hotdata.rps_sock_flow_table);
- if (table) {
- index = hash & rps_sock_flow_table_mask(table);
- if (READ_ONCE(table->ents[index]) != RPS_NO_CPU)
- WRITE_ONCE(table->ents[index], RPS_NO_CPU);
+ tag_ptr = READ_ONCE(net_hotdata.rps_sock_flow_table);
+ if (tag_ptr) {
+ index = hash & rps_tag_to_mask(tag_ptr);
+ table = rps_tag_to_table(tag_ptr);
+ if (READ_ONCE(table[index].ent) != RPS_NO_CPU)
+ WRITE_ONCE(table[index].ent, RPS_NO_CPU);
}
rcu_read_unlock();
}
diff --git a/net/core/dev.c b/net/core/dev.c
index de70ef784d6363b3af4f9279e107647c90f5af19..d4837b058b2ff02e94f9590e310edbcb06dad0f2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5075,9 +5075,9 @@ set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
struct rps_dev_flow **rflowp)
{
- const struct rps_sock_flow_table *sock_flow_table;
struct netdev_rx_queue *rxqueue = dev->_rx;
struct rps_dev_flow_table *flow_table;
+ rps_tag_ptr global_tag_ptr;
struct rps_map *map;
int cpu = -1;
u32 tcpu;
@@ -5108,8 +5108,9 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
if (!hash)
goto done;
- sock_flow_table = rcu_dereference(net_hotdata.rps_sock_flow_table);
- if (flow_table && sock_flow_table) {
+ global_tag_ptr = READ_ONCE(net_hotdata.rps_sock_flow_table);
+ if (flow_table && global_tag_ptr) {
+ struct rps_sock_flow_table *sock_flow_table;
struct rps_dev_flow *rflow;
u32 next_cpu;
u32 flow_id;
@@ -5118,8 +5119,9 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
/* First check into global flow table if there is a match.
* This READ_ONCE() pairs with WRITE_ONCE() from rps_record_sock_flow().
*/
- flow_id = hash & rps_sock_flow_table_mask(sock_flow_table);
- ident = READ_ONCE(sock_flow_table->ents[flow_id]);
+ flow_id = hash & rps_tag_to_mask(global_tag_ptr);
+ sock_flow_table = rps_tag_to_table(global_tag_ptr);
+ ident = READ_ONCE(sock_flow_table[flow_id].ent);
if ((ident ^ hash) & ~net_hotdata.rps_cpu_mask)
goto try_rps;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index cfbe798493b5789dc8baedf9dcbe9c20918e2ba6..502705e0464981ecfc32233d22c747e14b3febf7 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -138,68 +138,73 @@ static int rps_default_mask_sysctl(const struct ctl_table *table, int write,
static int rps_sock_flow_sysctl(const struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
+ struct rps_sock_flow_table *o_sock_table, *sock_table;
+ static DEFINE_MUTEX(sock_flow_mutex);
+ rps_tag_ptr o_tag_ptr, tag_ptr;
unsigned int orig_size, size;
- int ret, i;
struct ctl_table tmp = {
.data = &size,
.maxlen = sizeof(size),
.mode = table->mode
};
- struct rps_sock_flow_table *o_sock_table, *sock_table;
- static DEFINE_MUTEX(sock_flow_mutex);
void *tofree = NULL;
+ int ret, i;
+ u8 log;
mutex_lock(&sock_flow_mutex);
- o_sock_table = rcu_dereference_protected(
- net_hotdata.rps_sock_flow_table,
- lockdep_is_held(&sock_flow_mutex));
- size = o_sock_table ? rps_sock_flow_table_mask(o_sock_table) + 1 : 0;
+ o_tag_ptr = tag_ptr = net_hotdata.rps_sock_flow_table;
+
+ size = o_tag_ptr ? rps_tag_to_mask(o_tag_ptr) + 1 : 0;
+ o_sock_table = rps_tag_to_table(o_tag_ptr);
orig_size = size;
ret = proc_dointvec(&tmp, write, buffer, lenp, ppos);
- if (write) {
- if (size) {
- if (size > 1<<29) {
- /* Enforce limit to prevent overflow */
+ if (!write)
+ goto unlock;
+
+ if (size) {
+ if (size > 1<<29) {
+ /* Enforce limit to prevent overflow */
+ mutex_unlock(&sock_flow_mutex);
+ return -EINVAL;
+ }
+ sock_table = o_sock_table;
+ size = roundup_pow_of_two(size);
+ if (size != orig_size) {
+ sock_table = vmalloc_huge(size * sizeof(*sock_table),
+ GFP_KERNEL);
+ if (!sock_table) {
mutex_unlock(&sock_flow_mutex);
- return -EINVAL;
- }
- sock_table = o_sock_table;
- size = roundup_pow_of_two(size);
- if (size != orig_size) {
- sock_table =
- vmalloc(RPS_SOCK_FLOW_TABLE_SIZE(size));
- if (!sock_table) {
- mutex_unlock(&sock_flow_mutex);
- return -ENOMEM;
- }
- net_hotdata.rps_cpu_mask =
- roundup_pow_of_two(nr_cpu_ids) - 1;
- sock_table->_mask = size - 1;
+ return -ENOMEM;
}
+ net_hotdata.rps_cpu_mask =
+ roundup_pow_of_two(nr_cpu_ids) - 1;
+ log = ilog2(size);
+ tag_ptr = (rps_tag_ptr)sock_table | log;
+ }
- for (i = 0; i < size; i++)
- sock_table->ents[i] = RPS_NO_CPU;
- } else
- sock_table = NULL;
-
- if (sock_table != o_sock_table) {
- rcu_assign_pointer(net_hotdata.rps_sock_flow_table,
- sock_table);
- if (sock_table) {
- static_branch_inc(&rps_needed);
- static_branch_inc(&rfs_needed);
- }
- if (o_sock_table) {
- static_branch_dec(&rps_needed);
- static_branch_dec(&rfs_needed);
- tofree = o_sock_table;
- }
+ for (i = 0; i < size; i++)
+ sock_table[i].ent = RPS_NO_CPU;
+ } else {
+ sock_table = NULL;
+ tag_ptr = 0UL;
+ }
+ if (tag_ptr != o_tag_ptr) {
+ smp_store_release(&net_hotdata.rps_sock_flow_table, tag_ptr);
+ if (sock_table) {
+ static_branch_inc(&rps_needed);
+ static_branch_inc(&rfs_needed);
+ }
+ if (o_sock_table) {
+ static_branch_dec(&rps_needed);
+ static_branch_dec(&rfs_needed);
+ tofree = o_sock_table;
}
}
+unlock:
mutex_unlock(&sock_flow_mutex);
kvfree_rcu_mightsleep(tofree);
--
2.53.0.473.g4a7958ca14-goog
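The sizing arithmetic performed by rps_sock_flow_sysctl() in this patch can
be checked with a small userspace sketch. It is an approximation under
assumptions: the demo_ helpers are hypothetical replacements for
roundup_pow_of_two() and ilog2(), and each entry is taken to be one u32, as
in the reworked 'struct rps_sock_flow_table'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_sizing {
	uint32_t entries;	/* requested count rounded up to a power of two */
	unsigned int log;	/* value that would go in the low 5 tag bits */
	size_t bytes;		/* allocation size for the table */
};

static uint32_t demo_roundup_pow_of_two(uint32_t n)
{
	uint32_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

static unsigned int demo_ilog2(uint32_t n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/* Returns 0 on success, -1 when the request exceeds the 1 << 29 limit.
 * A request of 0 (table disabled) is reported as an all-zero sizing.
 */
static int demo_size_table(uint32_t requested, struct demo_sizing *out)
{
	out->entries = 0;
	out->log = 0;
	out->bytes = 0;
	if (requested > (1U << 29))
		return -1;
	if (!requested)
		return 0;

	out->entries = demo_roundup_pow_of_two(requested);
	out->log = demo_ilog2(out->entries);
	out->bytes = (size_t)out->entries * sizeof(uint32_t);
	return 0;
}
```

For the 4194304 entries used in the test command above, this gives a log of
22 and a 16 MiB table. Because the count is capped at 1 << 29, the log never
exceeds 29 and always fits in the 5 tag bits.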
* [PATCH v2 net-next 5/7] net-sysfs: get rid of rps_dev_flow_lock
2026-03-01 18:14 [PATCH v2 net-next 0/7] rfs: use high-order allocations for hash tables Eric Dumazet
` (3 preceding siblings ...)
2026-03-01 18:14 ` [PATCH v2 net-next 4/7] net-sysfs: use rps_tag_ptr and remove metadata from rps_sock_flow_table Eric Dumazet
@ 2026-03-01 18:14 ` Eric Dumazet
2026-03-01 18:14 ` [PATCH v2 net-next 6/7] net-sysfs: remove rcu field from 'struct rps_dev_flow_table' Eric Dumazet
2026-03-01 18:14 ` [PATCH v2 net-next 7/7] net-sysfs: use rps_tag_ptr and remove metadata from rps_dev_flow_table Eric Dumazet
6 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2026-03-01 18:14 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
Eric Dumazet
Use unrcu_pointer() and xchg() in store_rps_dev_flow_table_cnt()
instead of a dedicated spinlock.
Make a similar change in rx_queue_release(), so that both
functions use a similar construct and synchronization.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/net-sysfs.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 07624b682b08b24da790d377f1deec4dc0a84269..52fcf7fa58a808e79c1a17c8719830bcfb7c1674 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1084,7 +1084,6 @@ static ssize_t store_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue,
{
unsigned long mask, count;
struct rps_dev_flow_table *table, *old_table;
- static DEFINE_SPINLOCK(rps_dev_flow_lock);
int rc;
if (!capable(CAP_NET_ADMIN))
@@ -1128,11 +1127,8 @@ static ssize_t store_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue,
table = NULL;
}
- spin_lock(&rps_dev_flow_lock);
- old_table = rcu_dereference_protected(queue->rps_flow_table,
- lockdep_is_held(&rps_dev_flow_lock));
- rcu_assign_pointer(queue->rps_flow_table, table);
- spin_unlock(&rps_dev_flow_lock);
+ old_table = unrcu_pointer(xchg(&queue->rps_flow_table,
+ RCU_INITIALIZER(table)));
if (old_table)
call_rcu(&old_table->rcu, rps_dev_flow_table_release);
@@ -1161,8 +1157,8 @@ static void rx_queue_release(struct kobject *kobj)
{
struct netdev_rx_queue *queue = to_rx_queue(kobj);
#ifdef CONFIG_RPS
+ struct rps_dev_flow_table *old_table;
struct rps_map *map;
- struct rps_dev_flow_table *flow_table;
map = rcu_dereference_protected(queue->rps_map, 1);
if (map) {
@@ -1170,11 +1166,9 @@ static void rx_queue_release(struct kobject *kobj)
kfree_rcu(map, rcu);
}
- flow_table = rcu_dereference_protected(queue->rps_flow_table, 1);
- if (flow_table) {
- RCU_INIT_POINTER(queue->rps_flow_table, NULL);
- call_rcu(&flow_table->rcu, rps_dev_flow_table_release);
- }
+ old_table = unrcu_pointer(xchg(&queue->rps_flow_table, NULL));
+ if (old_table)
+ call_rcu(&old_table->rcu, rps_dev_flow_table_release);
#endif
memset(kobj, 0, sizeof(*kobj));
--
2.53.0.473.g4a7958ca14-goog
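The lock-free replacement used in this patch follows a familiar pattern:
atomically swap in the new pointer and take ownership of whatever was there
before. A minimal userspace sketch with C11 atomics is given below. It is
only an analogy: the demo_ names are hypothetical, atomic_exchange() stands
in for the kernel's xchg(), and the RCU grace period that must elapse before
the old table is freed is not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the rps_flow_table slot of an rx queue. */
struct demo_queue {
	_Atomic(void *) flow_table;
};

/* Publish new_table (may be NULL) and return the previous table.
 * Exactly one caller receives each old pointer, so no extra lock is
 * needed to decide who is responsible for freeing it.
 */
static void *demo_swap_table(struct demo_queue *q, void *new_table)
{
	return atomic_exchange(&q->flow_table, new_table);
}

/* Returns 0 when every swap hands back the previously stored pointer. */
static int demo_swap_selftest(void)
{
	struct demo_queue q;
	int a, b;

	atomic_init(&q.flow_table, NULL);
	if (demo_swap_table(&q, &a) != NULL)
		return -1;
	if (demo_swap_table(&q, &b) != (void *)&a)
		return -1;
	if (demo_swap_table(&q, NULL) != (void *)&b)
		return -1;
	return atomic_load(&q.flow_table) == NULL ? 0 : -1;
}
```

The same helper covers both call sites: the store path passes the newly
allocated table, and the release path passes NULL.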
* [PATCH v2 net-next 6/7] net-sysfs: remove rcu field from 'struct rps_dev_flow_table'
2026-03-01 18:14 [PATCH v2 net-next 0/7] rfs: use high-order allocations for hash tables Eric Dumazet
` (4 preceding siblings ...)
2026-03-01 18:14 ` [PATCH v2 net-next 5/7] net-sysfs: get rid of rps_dev_flow_lock Eric Dumazet
@ 2026-03-01 18:14 ` Eric Dumazet
2026-03-01 18:14 ` [PATCH v2 net-next 7/7] net-sysfs: use rps_tag_ptr and remove metadata from rps_dev_flow_table Eric Dumazet
6 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2026-03-01 18:14 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
Eric Dumazet
Remove rps_dev_flow_table_release() in favor of kvfree_rcu_mightsleep().
In the following patch, we will remove the "u8 @log" field,
and the size of 'struct rps_dev_flow_table' will be a power of two.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/rps.h | 1 -
net/core/net-sysfs.c | 11 ++---------
2 files changed, 2 insertions(+), 10 deletions(-)
diff --git a/include/net/rps.h b/include/net/rps.h
index dee930d9dd38e0e975e78d938bc7adc96048b724..e900480e828b487c721b3ef392f4abb427ad442c 100644
--- a/include/net/rps.h
+++ b/include/net/rps.h
@@ -44,7 +44,6 @@ struct rps_dev_flow {
*/
struct rps_dev_flow_table {
u8 log;
- struct rcu_head rcu;
struct rps_dev_flow flows[];
};
#define RPS_DEV_FLOW_TABLE_SIZE(_num) (sizeof(struct rps_dev_flow_table) + \
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 52fcf7fa58a808e79c1a17c8719830bcfb7c1674..fd6f81930bc6437957f32206c84db87ee242fede 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1072,13 +1072,6 @@ static ssize_t show_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue,
return sysfs_emit(buf, "%lu\n", val);
}
-static void rps_dev_flow_table_release(struct rcu_head *rcu)
-{
- struct rps_dev_flow_table *table = container_of(rcu,
- struct rps_dev_flow_table, rcu);
- vfree(table);
-}
-
static ssize_t store_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue,
const char *buf, size_t len)
{
@@ -1131,7 +1124,7 @@ static ssize_t store_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue,
RCU_INITIALIZER(table)));
if (old_table)
- call_rcu(&old_table->rcu, rps_dev_flow_table_release);
+ kvfree_rcu_mightsleep(old_table);
return len;
}
@@ -1168,7 +1161,7 @@ static void rx_queue_release(struct kobject *kobj)
old_table = unrcu_pointer(xchg(&queue->rps_flow_table, NULL));
if (old_table)
- call_rcu(&old_table->rcu, rps_dev_flow_table_release);
+ kvfree_rcu_mightsleep(old_table);
#endif
memset(kobj, 0, sizeof(*kobj));
--
2.53.0.473.g4a7958ca14-goog
* [PATCH v2 net-next 7/7] net-sysfs: use rps_tag_ptr and remove metadata from rps_dev_flow_table
2026-03-01 18:14 [PATCH v2 net-next 0/7] rfs: use high-order allocations for hash tables Eric Dumazet
` (5 preceding siblings ...)
2026-03-01 18:14 ` [PATCH v2 net-next 6/7] net-sysfs: remove rcu field from 'struct rps_dev_flow_table' Eric Dumazet
@ 2026-03-01 18:14 ` Eric Dumazet
2026-03-01 22:05 ` kernel test robot
2026-03-01 23:38 ` kernel test robot
6 siblings, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2026-03-01 18:14 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
Eric Dumazet
Instead of storing the @log at the beginning of rps_dev_flow_table,
use the 5 low order bits of the rps_tag_ptr to store the log of the size.
This removes a potential cache line miss (for light traffic).
This allows us to switch to one high-order allocation instead of vmalloc()
when CONFIG_RFS_ACCEL is not set.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
v2: removed "struct rps_dev_flow_table" to address kernel build bot report.
include/net/netdev_rx_queue.h | 3 +-
include/net/rps.h | 10 -------
net/core/dev.c | 53 ++++++++++++++++++++---------------
net/core/net-sysfs.c | 53 ++++++++++++++++++++---------------
4 files changed, 63 insertions(+), 56 deletions(-)
diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
index cfa72c4853876c6fcb84b5c551580d9205f7b29d..08f81329fc11dc86767f9da661be8c7194dc1da2 100644
--- a/include/net/netdev_rx_queue.h
+++ b/include/net/netdev_rx_queue.h
@@ -8,13 +8,14 @@
#include <net/xdp.h>
#include <net/page_pool/types.h>
#include <net/netdev_queues.h>
+#include <net/rps-types.h>
/* This structure contains an instance of an RX queue. */
struct netdev_rx_queue {
struct xdp_rxq_info xdp_rxq;
#ifdef CONFIG_RPS
struct rps_map __rcu *rps_map;
- struct rps_dev_flow_table __rcu *rps_flow_table;
+ rps_tag_ptr rps_flow_table;
#endif
struct kobject kobj;
const struct attribute_group **groups;
diff --git a/include/net/rps.h b/include/net/rps.h
index e900480e828b487c721b3ef392f4abb427ad442c..e33c6a2fa8bbca3555ecccbbf9132d01cc433c36 100644
--- a/include/net/rps.h
+++ b/include/net/rps.h
@@ -39,16 +39,6 @@ struct rps_dev_flow {
};
#define RPS_NO_FILTER 0xffff
-/*
- * The rps_dev_flow_table structure contains a table of flow mappings.
- */
-struct rps_dev_flow_table {
- u8 log;
- struct rps_dev_flow flows[];
-};
-#define RPS_DEV_FLOW_TABLE_SIZE(_num) (sizeof(struct rps_dev_flow_table) + \
- ((_num) * sizeof(struct rps_dev_flow)))
-
/*
* The rps_sock_flow_table contains mappings of flows to the last CPU
* on which they were processed by the application (set in recvmsg).
diff --git a/net/core/dev.c b/net/core/dev.c
index d4837b058b2ff02e94f9590e310edbcb06dad0f2..053a30a8c0ea4464d3b61c7dde8ad916eeef1c19 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4968,16 +4968,16 @@ EXPORT_SYMBOL(rps_needed);
struct static_key_false rfs_needed __read_mostly;
EXPORT_SYMBOL(rfs_needed);
-static u32 rfs_slot(u32 hash, const struct rps_dev_flow_table *flow_table)
+static u32 rfs_slot(u32 hash, rps_tag_ptr tag_ptr)
{
- return hash_32(hash, flow_table->log);
+ return hash_32(hash, rps_tag_to_log(tag_ptr));
}
#ifdef CONFIG_RFS_ACCEL
/**
* rps_flow_is_active - check whether the flow is recently active.
* @rflow: Specific flow to check activity.
- * @flow_table: per-queue flowtable that @rflow belongs to.
+ * @log: ilog2(hashsize).
* @cpu: CPU saved in @rflow.
*
* If the CPU has processed many packets since the flow's last activity
@@ -4986,7 +4986,7 @@ static u32 rfs_slot(u32 hash, const struct rps_dev_flow_table *flow_table)
* Return: true if flow was recently active.
*/
static bool rps_flow_is_active(struct rps_dev_flow *rflow,
- struct rps_dev_flow_table *flow_table,
+ u8 log,
unsigned int cpu)
{
unsigned int flow_last_active;
@@ -4999,7 +4999,7 @@ static bool rps_flow_is_active(struct rps_dev_flow *rflow,
flow_last_active = READ_ONCE(rflow->last_qtail);
return (int)(sd_input_head - flow_last_active) <
- (int)(10 << flow_table->log);
+ (int)(10 << log);
}
#endif
@@ -5011,9 +5011,10 @@ set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
u32 head;
#ifdef CONFIG_RFS_ACCEL
struct netdev_rx_queue *rxqueue;
- struct rps_dev_flow_table *flow_table;
+ struct rps_dev_flow *flow_table;
struct rps_dev_flow *old_rflow;
struct rps_dev_flow *tmp_rflow;
+ rps_tag_ptr q_tag_ptr;
unsigned int tmp_cpu;
u16 rxq_index;
u32 flow_id;
@@ -5028,16 +5029,18 @@ set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
goto out;
rxqueue = dev->_rx + rxq_index;
- flow_table = rcu_dereference(rxqueue->rps_flow_table);
- if (!flow_table)
+ q_tag_ptr = READ_ONCE(rxqueue->rps_flow_table);
+ if (!q_tag_ptr)
goto out;
- flow_id = rfs_slot(hash, flow_table);
- tmp_rflow = &flow_table->flows[flow_id];
+ flow_id = rfs_slot(hash, q_tag_ptr);
+ flow_table = rps_tag_to_table(q_tag_ptr);
+ tmp_rflow = flow_table + flow_id;
tmp_cpu = READ_ONCE(tmp_rflow->cpu);
if (READ_ONCE(tmp_rflow->filter) != RPS_NO_FILTER) {
- if (rps_flow_is_active(tmp_rflow, flow_table,
+ if (rps_flow_is_active(tmp_rflow,
+ rps_tag_to_log(q_tag_ptr),
tmp_cpu)) {
if (hash != READ_ONCE(tmp_rflow->hash) ||
next_cpu == tmp_cpu)
@@ -5076,8 +5079,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
struct rps_dev_flow **rflowp)
{
struct netdev_rx_queue *rxqueue = dev->_rx;
- struct rps_dev_flow_table *flow_table;
- rps_tag_ptr global_tag_ptr;
+ rps_tag_ptr global_tag_ptr, q_tag_ptr;
struct rps_map *map;
int cpu = -1;
u32 tcpu;
@@ -5098,9 +5100,9 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
/* Avoid computing hash if RFS/RPS is not active for this rxqueue */
- flow_table = rcu_dereference(rxqueue->rps_flow_table);
+ q_tag_ptr = READ_ONCE(rxqueue->rps_flow_table);
map = rcu_dereference(rxqueue->rps_map);
- if (!flow_table && !map)
+ if (!q_tag_ptr && !map)
goto done;
skb_reset_network_header(skb);
@@ -5109,8 +5111,9 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
goto done;
global_tag_ptr = READ_ONCE(net_hotdata.rps_sock_flow_table);
- if (flow_table && global_tag_ptr) {
+ if (q_tag_ptr && global_tag_ptr) {
struct rps_sock_flow_table *sock_flow_table;
+ struct rps_dev_flow *flow_table;
struct rps_dev_flow *rflow;
u32 next_cpu;
u32 flow_id;
@@ -5130,7 +5133,9 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
/* OK, now we know there is a match,
* we can look at the local (per receive queue) flow table
*/
- rflow = &flow_table->flows[rfs_slot(hash, flow_table)];
+ flow_id = rfs_slot(hash, q_tag_ptr);
+ flow_table = rps_tag_to_table(q_tag_ptr);
+ rflow = flow_table + flow_id;
tcpu = rflow->cpu;
/*
@@ -5190,19 +5195,23 @@ bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index,
u32 flow_id, u16 filter_id)
{
struct netdev_rx_queue *rxqueue = dev->_rx + rxq_index;
- struct rps_dev_flow_table *flow_table;
+ struct rps_dev_flow *flow_table;
struct rps_dev_flow *rflow;
+ rps_tag_ptr q_tag_ptr;
bool expire = true;
+ u8 log;
rcu_read_lock();
- flow_table = rcu_dereference(rxqueue->rps_flow_table);
- if (flow_table && flow_id < (1UL << flow_table->log)) {
+ q_tag_ptr = READ_ONCE(rxqueue->rps_flow_table);
+ log = rps_tag_to_log(q_tag_ptr);
+ if (q_tag_ptr && flow_id < (1UL << log)) {
unsigned int cpu;
- rflow = &flow_table->flows[flow_id];
+ flow_table = rps_tag_to_table(q_tag_ptr);
+ rflow = flow_table + flow_id;
cpu = READ_ONCE(rflow->cpu);
if (READ_ONCE(rflow->filter) == filter_id &&
- rps_flow_is_active(rflow, flow_table, cpu))
+ rps_flow_is_active(rflow, log, cpu))
expire = false;
}
rcu_read_unlock();
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index fd6f81930bc6437957f32206c84db87ee242fede..38bf2b42efb9abbf0a2b66b1d7dae130c85f1966 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1060,14 +1060,12 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
static ssize_t show_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue,
char *buf)
{
- struct rps_dev_flow_table *flow_table;
unsigned long val = 0;
+ rps_tag_ptr tag_ptr;
- rcu_read_lock();
- flow_table = rcu_dereference(queue->rps_flow_table);
- if (flow_table)
- val = 1UL << flow_table->log;
- rcu_read_unlock();
+ tag_ptr = READ_ONCE(queue->rps_flow_table);
+ if (tag_ptr)
+ val = 1UL << rps_tag_to_log(tag_ptr);
return sysfs_emit(buf, "%lu\n", val);
}
@@ -1075,8 +1073,10 @@ static ssize_t show_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue,
static ssize_t store_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue,
const char *buf, size_t len)
{
+ struct rps_dev_flow *table;
+ rps_tag_ptr otag, tag_ptr = 0UL;
unsigned long mask, count;
- struct rps_dev_flow_table *table, *old_table;
+ size_t sz;
int rc;
if (!capable(CAP_NET_ADMIN))
@@ -1107,24 +1107,31 @@ static ssize_t store_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue,
return -EINVAL;
}
#endif
- table = vmalloc(RPS_DEV_FLOW_TABLE_SIZE(mask + 1));
+ sz = max_t(size_t, sizeof(*table) * (mask + 1),
+ PAGE_SIZE);
+ if (sz <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER) ||
+ is_power_of_2(sz))
+ table = kvmalloc(sz, GFP_KERNEL);
+ else
+ table = vmalloc(sz);
if (!table)
return -ENOMEM;
-
- table->log = ilog2(mask) + 1;
+ tag_ptr = (rps_tag_ptr)table;
+ if (rps_tag_to_log(tag_ptr)) {
+ pr_err_once("store_rps_dev_flow_table_cnt() got a non page aligned allocation.\n");
+ kvfree(table);
+ return -ENOMEM;
+ }
+ tag_ptr |= (ilog2(mask) + 1);
for (count = 0; count <= mask; count++) {
- table->flows[count].cpu = RPS_NO_CPU;
- table->flows[count].filter = RPS_NO_FILTER;
+ table[count].cpu = RPS_NO_CPU;
+ table[count].filter = RPS_NO_FILTER;
}
- } else {
- table = NULL;
}
- old_table = unrcu_pointer(xchg(&queue->rps_flow_table,
- RCU_INITIALIZER(table)));
-
- if (old_table)
- kvfree_rcu_mightsleep(old_table);
+ otag = xchg(&queue->rps_flow_table, tag_ptr);
+ if (otag)
+ kvfree_rcu_mightsleep(rps_tag_to_table(otag));
return len;
}
@@ -1150,7 +1157,7 @@ static void rx_queue_release(struct kobject *kobj)
{
struct netdev_rx_queue *queue = to_rx_queue(kobj);
#ifdef CONFIG_RPS
- struct rps_dev_flow_table *old_table;
+ rps_tag_ptr tag_ptr;
struct rps_map *map;
map = rcu_dereference_protected(queue->rps_map, 1);
@@ -1159,9 +1166,9 @@ static void rx_queue_release(struct kobject *kobj)
kfree_rcu(map, rcu);
}
- old_table = unrcu_pointer(xchg(&queue->rps_flow_table, NULL));
- if (old_table)
- kvfree_rcu_mightsleep(old_table);
+ tag_ptr = xchg(&queue->rps_flow_table, 0UL);
+ if (tag_ptr)
+ kvfree_rcu_mightsleep(rps_tag_to_table(tag_ptr));
#endif
memset(kobj, 0, sizeof(*kobj));
--
2.53.0.473.g4a7958ca14-goog
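
For readers following the conversion above, here is a minimal userspace sketch of the tagged-pointer idea: a page-aligned table address has zero low bits, so log2 of the entry count can be stored there and recovered with a mask. This is not the code from include/net/rps-types.h. Every demo_* name, the 5-bit tag width, the 4096-byte page size and the 0xffff "no CPU" value are assumptions made for illustration; only the multiplicative constant matches the kernel's generic hash_32().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096UL
#define DEMO_LOG_MASK  0x1fUL	/* assumption: low 5 bits carry log2(entries) */
#define DEMO_NO_ENTRY  0xffff	/* assumption: "unused" marker for cpu/filter */

typedef uintptr_t demo_tag_ptr;	/* hypothetical stand-in for rps_tag_ptr */

struct demo_flow {		/* hypothetical stand-in for struct rps_dev_flow */
	uint16_t cpu;
	uint16_t filter;
	uint32_t last_qtail;
};

static unsigned int demo_tag_to_log(demo_tag_ptr tag)
{
	return (unsigned int)(tag & DEMO_LOG_MASK);
}

static struct demo_flow *demo_tag_to_table(demo_tag_ptr tag)
{
	return (struct demo_flow *)(tag & ~(demo_tag_ptr)DEMO_LOG_MASK);
}

/* Multiplicative hash, keeping the top 'log' bits of the product. */
static uint32_t demo_slot(uint32_t hash, demo_tag_ptr tag)
{
	return (uint32_t)(hash * 0x61C88647u) >> (32 - demo_tag_to_log(tag));
}

/* Allocate 2^log entries (log in 1..20 for this demo); returns 0 on failure. */
static demo_tag_ptr demo_table_alloc(unsigned int log)
{
	struct demo_flow *table;
	demo_tag_ptr tag;
	size_t sz, i;

	if (log < 1 || log > 20)
		return 0;
	sz = sizeof(*table) << log;
	/* aligned_alloc() wants a size that is a multiple of the alignment */
	sz = (sz + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
	table = aligned_alloc(DEMO_PAGE_SIZE, sz);
	if (!table)
		return 0;
	tag = (demo_tag_ptr)table;
	if (demo_tag_to_log(tag)) {	/* low bits must be free for the tag */
		free(table);
		return 0;
	}
	for (i = 0; i < ((size_t)1 << log); i++) {
		table[i].cpu = DEMO_NO_ENTRY;
		table[i].filter = DEMO_NO_ENTRY;
		table[i].last_qtail = 0;
	}
	return tag | log;
}

/* Round-trip check: returns 0 when the tag decodes to what was encoded. */
static int demo_selfcheck(void)
{
	demo_tag_ptr tag = demo_table_alloc(10);
	struct demo_flow *table;
	uint32_t slot;
	int ok;

	if (!tag)
		return -1;
	table = demo_tag_to_table(tag);
	slot = demo_slot(0xdeadbeefu, tag);
	ok = demo_tag_to_log(tag) == 10 && slot < 1024 &&
	     table[slot].filter == DEMO_NO_ENTRY;
	free(table);
	return ok ? 0 : -1;
}
```

Calling demo_selfcheck() from a main() exercises the whole round trip. A reader needs only the single tagged word to find both the table and its size, which is where the saved cache line miss mentioned in the cover letter comes from.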
* Re: [PATCH v2 net-next 7/7] net-sysfs: use rps_tag_ptr and remove metadata from rps_dev_flow_table
2026-03-01 18:14 ` [PATCH v2 net-next 7/7] net-sysfs: use rps_tag_ptr and remove metadata from rps_dev_flow_table Eric Dumazet
@ 2026-03-01 22:05 ` kernel test robot
2026-03-01 23:38 ` kernel test robot
1 sibling, 0 replies; 11+ messages in thread
From: kernel test robot @ 2026-03-01 22:05 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: oe-kbuild-all, Simon Horman, Kuniyuki Iwashima, netdev,
eric.dumazet, Eric Dumazet
Hi Eric,
kernel test robot noticed the following build errors:
[auto build test ERROR on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/net-add-rps_tag_ptr-type-and-helpers/20260302-021900
base: net-next/main
patch link: https://lore.kernel.org/r/20260301181457.3539105-8-edumazet%40google.com
patch subject: [PATCH v2 net-next 7/7] net-sysfs: use rps_tag_ptr and remove metadata from rps_dev_flow_table
config: sh-defconfig (https://download.01.org/0day-ci/archive/20260302/202603020519.4JHxNgoV-lkp@intel.com/config)
compiler: sh4-linux-gcc (GCC) 15.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260302/202603020519.4JHxNgoV-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603020519.4JHxNgoV-lkp@intel.com/
All errors (new ones prefixed by >>):
net/core/net-sysfs.c: In function 'store_rps_dev_flow_table_cnt':
>> net/core/net-sysfs.c:1104:41: error: implicit declaration of function 'RPS_DEV_FLOW_TABLE_SIZE' [-Wimplicit-function-declaration]
1104 | if (mask > (ULONG_MAX - RPS_DEV_FLOW_TABLE_SIZE(1))
| ^~~~~~~~~~~~~~~~~~~~~~~
vim +/RPS_DEV_FLOW_TABLE_SIZE +1104 net/core/net-sysfs.c
fec5e652e58fa6 Tom Herbert 2010-04-16 1072
f5acb907dc24c3 Eric Dumazet 2010-04-19 1073 static ssize_t store_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue,
fec5e652e58fa6 Tom Herbert 2010-04-16 1074 const char *buf, size_t len)
fec5e652e58fa6 Tom Herbert 2010-04-16 1075 {
5b629854b594cd Eric Dumazet 2026-03-01 1076 struct rps_dev_flow *table;
5b629854b594cd Eric Dumazet 2026-03-01 1077 rps_tag_ptr otag, tag_ptr = 0UL;
60b778ce519625 Eric Dumazet 2011-12-24 1078 unsigned long mask, count;
5b629854b594cd Eric Dumazet 2026-03-01 1079 size_t sz;
60b778ce519625 Eric Dumazet 2011-12-24 1080 int rc;
fec5e652e58fa6 Tom Herbert 2010-04-16 1081
fec5e652e58fa6 Tom Herbert 2010-04-16 1082 if (!capable(CAP_NET_ADMIN))
fec5e652e58fa6 Tom Herbert 2010-04-16 1083 return -EPERM;
fec5e652e58fa6 Tom Herbert 2010-04-16 1084
60b778ce519625 Eric Dumazet 2011-12-24 1085 rc = kstrtoul(buf, 0, &count);
60b778ce519625 Eric Dumazet 2011-12-24 1086 if (rc < 0)
60b778ce519625 Eric Dumazet 2011-12-24 1087 return rc;
fec5e652e58fa6 Tom Herbert 2010-04-16 1088
fec5e652e58fa6 Tom Herbert 2010-04-16 1089 if (count) {
60b778ce519625 Eric Dumazet 2011-12-24 1090 mask = count - 1;
60b778ce519625 Eric Dumazet 2011-12-24 1091 /* mask = roundup_pow_of_two(count) - 1;
60b778ce519625 Eric Dumazet 2011-12-24 1092 * without overflows...
60b778ce519625 Eric Dumazet 2011-12-24 1093 */
60b778ce519625 Eric Dumazet 2011-12-24 1094 while ((mask | (mask >> 1)) != mask)
60b778ce519625 Eric Dumazet 2011-12-24 1095 mask |= (mask >> 1);
60b778ce519625 Eric Dumazet 2011-12-24 1096 /* On 64 bit arches, must check mask fits in table->mask (u32),
8e3bff96afa673 stephen hemminger 2013-12-08 1097 * and on 32bit arches, must check
8e3bff96afa673 stephen hemminger 2013-12-08 1098 * RPS_DEV_FLOW_TABLE_SIZE(mask + 1) doesn't overflow.
60b778ce519625 Eric Dumazet 2011-12-24 1099 */
60b778ce519625 Eric Dumazet 2011-12-24 1100 #if BITS_PER_LONG > 32
60b778ce519625 Eric Dumazet 2011-12-24 1101 if (mask > (unsigned long)(u32)mask)
a0a129f8b6cff5 Xi Wang 2011-12-22 1102 return -EINVAL;
60b778ce519625 Eric Dumazet 2011-12-24 1103 #else
60b778ce519625 Eric Dumazet 2011-12-24 @1104 if (mask > (ULONG_MAX - RPS_DEV_FLOW_TABLE_SIZE(1))
a0a129f8b6cff5 Xi Wang 2011-12-22 1105 / sizeof(struct rps_dev_flow)) {
fec5e652e58fa6 Tom Herbert 2010-04-16 1106 /* Enforce a limit to prevent overflow */
fec5e652e58fa6 Tom Herbert 2010-04-16 1107 return -EINVAL;
fec5e652e58fa6 Tom Herbert 2010-04-16 1108 }
60b778ce519625 Eric Dumazet 2011-12-24 1109 #endif
5b629854b594cd Eric Dumazet 2026-03-01 1110 sz = max_t(size_t, sizeof(*table) * (mask + 1),
5b629854b594cd Eric Dumazet 2026-03-01 1111 PAGE_SIZE);
5b629854b594cd Eric Dumazet 2026-03-01 1112 if (sz <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER) ||
5b629854b594cd Eric Dumazet 2026-03-01 1113 is_power_of_2(sz))
5b629854b594cd Eric Dumazet 2026-03-01 1114 table = kvmalloc(sz, GFP_KERNEL);
5b629854b594cd Eric Dumazet 2026-03-01 1115 else
5b629854b594cd Eric Dumazet 2026-03-01 1116 table = vmalloc(sz);
fec5e652e58fa6 Tom Herbert 2010-04-16 1117 if (!table)
fec5e652e58fa6 Tom Herbert 2010-04-16 1118 return -ENOMEM;
5b629854b594cd Eric Dumazet 2026-03-01 1119 tag_ptr = (rps_tag_ptr)table;
5b629854b594cd Eric Dumazet 2026-03-01 1120 if (rps_tag_to_log(tag_ptr)) {
5b629854b594cd Eric Dumazet 2026-03-01 1121 pr_err_once("store_rps_dev_flow_table_cnt() got a non page aligned allocation.\n");
5b629854b594cd Eric Dumazet 2026-03-01 1122 kvfree(table);
5b629854b594cd Eric Dumazet 2026-03-01 1123 return -ENOMEM;
5b629854b594cd Eric Dumazet 2026-03-01 1124 }
5b629854b594cd Eric Dumazet 2026-03-01 1125 tag_ptr |= (ilog2(mask) + 1);
97bcc5b6f45425 Krishna Kumar 2025-08-25 1126 for (count = 0; count <= mask; count++) {
5b629854b594cd Eric Dumazet 2026-03-01 1127 table[count].cpu = RPS_NO_CPU;
5b629854b594cd Eric Dumazet 2026-03-01 1128 table[count].filter = RPS_NO_FILTER;
97bcc5b6f45425 Krishna Kumar 2025-08-25 1129 }
6648c65e7ea72c stephen hemminger 2017-08-18 1130 }
fec5e652e58fa6 Tom Herbert 2010-04-16 1131
5b629854b594cd Eric Dumazet 2026-03-01 1132 otag = xchg(&queue->rps_flow_table, tag_ptr);
5b629854b594cd Eric Dumazet 2026-03-01 1133 if (otag)
5b629854b594cd Eric Dumazet 2026-03-01 1134 kvfree_rcu_mightsleep(rps_tag_to_table(otag));
fec5e652e58fa6 Tom Herbert 2010-04-16 1135
fec5e652e58fa6 Tom Herbert 2010-04-16 1136 return len;
fec5e652e58fa6 Tom Herbert 2010-04-16 1137 }
fec5e652e58fa6 Tom Herbert 2010-04-16 1138
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 net-next 7/7] net-sysfs: use rps_tag_ptr and remove metadata from rps_dev_flow_table
2026-03-01 18:14 ` [PATCH v2 net-next 7/7] net-sysfs: use rps_tag_ptr and remove metadata from rps_dev_flow_table Eric Dumazet
2026-03-01 22:05 ` kernel test robot
@ 2026-03-01 23:38 ` kernel test robot
2026-03-02 6:02 ` Eric Dumazet
1 sibling, 1 reply; 11+ messages in thread
From: kernel test robot @ 2026-03-01 23:38 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: llvm, oe-kbuild-all, Simon Horman, Kuniyuki Iwashima, netdev,
eric.dumazet, Eric Dumazet
Hi Eric,
kernel test robot noticed the following build errors:
[auto build test ERROR on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/net-add-rps_tag_ptr-type-and-helpers/20260302-021900
base: net-next/main
patch link: https://lore.kernel.org/r/20260301181457.3539105-8-edumazet%40google.com
patch subject: [PATCH v2 net-next 7/7] net-sysfs: use rps_tag_ptr and remove metadata from rps_dev_flow_table
config: hexagon-allmodconfig (https://download.01.org/0day-ci/archive/20260302/202603020743.NEL0agrx-lkp@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260302/202603020743.NEL0agrx-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603020743.NEL0agrx-lkp@intel.com/
All errors (new ones prefixed by >>):
>> net/core/net-sysfs.c:1104:27: error: call to undeclared function 'RPS_DEV_FLOW_TABLE_SIZE'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
1104 | if (mask > (ULONG_MAX - RPS_DEV_FLOW_TABLE_SIZE(1))
| ^
1 error generated.
vim +/RPS_DEV_FLOW_TABLE_SIZE +1104 net/core/net-sysfs.c
fec5e652e58fa60 Tom Herbert 2010-04-16 1072
f5acb907dc24c38 Eric Dumazet 2010-04-19 1073 static ssize_t store_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue,
fec5e652e58fa60 Tom Herbert 2010-04-16 1074 const char *buf, size_t len)
fec5e652e58fa60 Tom Herbert 2010-04-16 1075 {
5b629854b594cdb Eric Dumazet 2026-03-01 1076 struct rps_dev_flow *table;
5b629854b594cdb Eric Dumazet 2026-03-01 1077 rps_tag_ptr otag, tag_ptr = 0UL;
60b778ce5196251 Eric Dumazet 2011-12-24 1078 unsigned long mask, count;
5b629854b594cdb Eric Dumazet 2026-03-01 1079 size_t sz;
60b778ce5196251 Eric Dumazet 2011-12-24 1080 int rc;
fec5e652e58fa60 Tom Herbert 2010-04-16 1081
fec5e652e58fa60 Tom Herbert 2010-04-16 1082 if (!capable(CAP_NET_ADMIN))
fec5e652e58fa60 Tom Herbert 2010-04-16 1083 return -EPERM;
fec5e652e58fa60 Tom Herbert 2010-04-16 1084
60b778ce5196251 Eric Dumazet 2011-12-24 1085 rc = kstrtoul(buf, 0, &count);
60b778ce5196251 Eric Dumazet 2011-12-24 1086 if (rc < 0)
60b778ce5196251 Eric Dumazet 2011-12-24 1087 return rc;
fec5e652e58fa60 Tom Herbert 2010-04-16 1088
fec5e652e58fa60 Tom Herbert 2010-04-16 1089 if (count) {
60b778ce5196251 Eric Dumazet 2011-12-24 1090 mask = count - 1;
60b778ce5196251 Eric Dumazet 2011-12-24 1091 /* mask = roundup_pow_of_two(count) - 1;
60b778ce5196251 Eric Dumazet 2011-12-24 1092 * without overflows...
60b778ce5196251 Eric Dumazet 2011-12-24 1093 */
60b778ce5196251 Eric Dumazet 2011-12-24 1094 while ((mask | (mask >> 1)) != mask)
60b778ce5196251 Eric Dumazet 2011-12-24 1095 mask |= (mask >> 1);
60b778ce5196251 Eric Dumazet 2011-12-24 1096 /* On 64 bit arches, must check mask fits in table->mask (u32),
8e3bff96afa6736 stephen hemminger 2013-12-08 1097 * and on 32bit arches, must check
8e3bff96afa6736 stephen hemminger 2013-12-08 1098 * RPS_DEV_FLOW_TABLE_SIZE(mask + 1) doesn't overflow.
60b778ce5196251 Eric Dumazet 2011-12-24 1099 */
60b778ce5196251 Eric Dumazet 2011-12-24 1100 #if BITS_PER_LONG > 32
60b778ce5196251 Eric Dumazet 2011-12-24 1101 if (mask > (unsigned long)(u32)mask)
a0a129f8b6cff54 Xi Wang 2011-12-22 1102 return -EINVAL;
60b778ce5196251 Eric Dumazet 2011-12-24 1103 #else
60b778ce5196251 Eric Dumazet 2011-12-24 @1104 if (mask > (ULONG_MAX - RPS_DEV_FLOW_TABLE_SIZE(1))
a0a129f8b6cff54 Xi Wang 2011-12-22 1105 / sizeof(struct rps_dev_flow)) {
fec5e652e58fa60 Tom Herbert 2010-04-16 1106 /* Enforce a limit to prevent overflow */
fec5e652e58fa60 Tom Herbert 2010-04-16 1107 return -EINVAL;
fec5e652e58fa60 Tom Herbert 2010-04-16 1108 }
60b778ce5196251 Eric Dumazet 2011-12-24 1109 #endif
5b629854b594cdb Eric Dumazet 2026-03-01 1110 sz = max_t(size_t, sizeof(*table) * (mask + 1),
5b629854b594cdb Eric Dumazet 2026-03-01 1111 PAGE_SIZE);
5b629854b594cdb Eric Dumazet 2026-03-01 1112 if (sz <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER) ||
5b629854b594cdb Eric Dumazet 2026-03-01 1113 is_power_of_2(sz))
5b629854b594cdb Eric Dumazet 2026-03-01 1114 table = kvmalloc(sz, GFP_KERNEL);
5b629854b594cdb Eric Dumazet 2026-03-01 1115 else
5b629854b594cdb Eric Dumazet 2026-03-01 1116 table = vmalloc(sz);
fec5e652e58fa60 Tom Herbert 2010-04-16 1117 if (!table)
fec5e652e58fa60 Tom Herbert 2010-04-16 1118 return -ENOMEM;
5b629854b594cdb Eric Dumazet 2026-03-01 1119 tag_ptr = (rps_tag_ptr)table;
5b629854b594cdb Eric Dumazet 2026-03-01 1120 if (rps_tag_to_log(tag_ptr)) {
5b629854b594cdb Eric Dumazet 2026-03-01 1121 pr_err_once("store_rps_dev_flow_table_cnt() got a non page aligned allocation.\n");
5b629854b594cdb Eric Dumazet 2026-03-01 1122 kvfree(table);
5b629854b594cdb Eric Dumazet 2026-03-01 1123 return -ENOMEM;
5b629854b594cdb Eric Dumazet 2026-03-01 1124 }
5b629854b594cdb Eric Dumazet 2026-03-01 1125 tag_ptr |= (ilog2(mask) + 1);
97bcc5b6f45425a Krishna Kumar 2025-08-25 1126 for (count = 0; count <= mask; count++) {
5b629854b594cdb Eric Dumazet 2026-03-01 1127 table[count].cpu = RPS_NO_CPU;
5b629854b594cdb Eric Dumazet 2026-03-01 1128 table[count].filter = RPS_NO_FILTER;
97bcc5b6f45425a Krishna Kumar 2025-08-25 1129 }
6648c65e7ea72c3 stephen hemminger 2017-08-18 1130 }
fec5e652e58fa60 Tom Herbert 2010-04-16 1131
5b629854b594cdb Eric Dumazet 2026-03-01 1132 otag = xchg(&queue->rps_flow_table, tag_ptr);
5b629854b594cdb Eric Dumazet 2026-03-01 1133 if (otag)
5b629854b594cdb Eric Dumazet 2026-03-01 1134 kvfree_rcu_mightsleep(rps_tag_to_table(otag));
fec5e652e58fa60 Tom Herbert 2010-04-16 1135
fec5e652e58fa60 Tom Herbert 2010-04-16 1136 return len;
fec5e652e58fa60 Tom Herbert 2010-04-16 1137 }
fec5e652e58fa60 Tom Herbert 2010-04-16 1138
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 net-next 7/7] net-sysfs: use rps_tag_ptr and remove metadata from rps_dev_flow_table
2026-03-01 23:38 ` kernel test robot
@ 2026-03-02 6:02 ` Eric Dumazet
0 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2026-03-02 6:02 UTC (permalink / raw)
To: kernel test robot
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, llvm,
oe-kbuild-all, Simon Horman, Kuniyuki Iwashima, netdev,
eric.dumazet
On Mon, Mar 2, 2026 at 12:39 AM kernel test robot <lkp@intel.com> wrote:
>
> Hi Eric,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on net-next/main]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/net-add-rps_tag_ptr-type-and-helpers/20260302-021900
> base: net-next/main
> patch link: https://lore.kernel.org/r/20260301181457.3539105-8-edumazet%40google.com
> patch subject: [PATCH v2 net-next 7/7] net-sysfs: use rps_tag_ptr and remove metadata from rps_dev_flow_table
> config: hexagon-allmodconfig (https://download.01.org/0day-ci/archive/20260302/202603020743.NEL0agrx-lkp@intel.com/config)
> compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260302/202603020743.NEL0agrx-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202603020743.NEL0agrx-lkp@intel.com/
Yeah, in V3 I will remove the BITS_PER_LONG distinction, which is prone to
errors like this one:
#if BITS_PER_LONG > 32
...
#else
...
#endif
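
As an illustration only (this is not the V3 code, which is not shown in this thread), one way to drop the word-size split is a single checked multiplication that behaves the same on 32-bit and 64-bit builds. The sketch below is userspace C using the GCC/Clang builtin __builtin_mul_overflow(); in-kernel code would more likely reach for check_mul_overflow() from <linux/overflow.h>. The demo_* names and the entry_size parameter are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Same loop as in store_rps_dev_flow_table_cnt():
 * mask = roundup_pow_of_two(count) - 1, computed without overflow.
 * The caller must pass count > 0.
 */
static unsigned long demo_roundup_mask(unsigned long count)
{
	unsigned long mask = count - 1;

	while ((mask | (mask >> 1)) != mask)
		mask |= (mask >> 1);
	return mask;
}

/*
 * Hypothetical helper: compute the table size in bytes for 'count'
 * requested entries. Returns false when the request cannot be honoured.
 */
static bool demo_table_size(unsigned long count, size_t entry_size, size_t *sz)
{
	unsigned long mask;

	if (!count)
		return false;
	mask = demo_roundup_mask(count);
	/* The mask has to fit in 32 bits whatever the word size is. */
	if (mask > UINT32_MAX)
		return false;
	/* On 32-bit, mask + 1 would wrap to 0 for an all-ones mask. */
	if (mask == ULONG_MAX)
		return false;
	/* One checked multiply replaces the per-arch #if/#else branches. */
	return !__builtin_mul_overflow((size_t)mask + 1, entry_size, sz);
}
```

With a single code path, every configuration compiles the same expression, so a stale macro reference cannot hide in a branch that only some architectures build.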