netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/4] rps: misc changes
@ 2025-04-07 16:35 Eric Dumazet
  2025-04-07 16:35 ` [PATCH net-next 1/4] net: rps: change skb_flow_limit() hash function Eric Dumazet
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Eric Dumazet @ 2025-04-07 16:35 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Willem de Bruijn, netdev, eric.dumazet,
	Eric Dumazet

Minor changes in rps:

skb_flow_limit() is probably unused these days,
and data-races are quite theoretical.

Eric Dumazet (4):
  net: rps: change skb_flow_limit() hash function
  net: rps: annotate data-races around (struct sd_flow_limit)->count
  net: add data-race annotations in softnet_seq_show()
  net: rps: remove kfree_rcu_mightsleep() use

 include/net/rps.h          |  5 +++--
 net/core/dev.c             | 11 +++++++----
 net/core/dev.h             |  5 +++--
 net/core/net-procfs.c      |  9 +++++----
 net/core/sysctl_net_core.c |  6 +++---
 5 files changed, 21 insertions(+), 15 deletions(-)

-- 
2.49.0.504.g3bcea36a83-goog


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net-next 1/4] net: rps: change skb_flow_limit() hash function
  2025-04-07 16:35 [PATCH net-next 0/4] rps: misc changes Eric Dumazet
@ 2025-04-07 16:35 ` Eric Dumazet
  2025-04-07 18:53   ` Willem de Bruijn
  2025-04-07 16:36 ` [PATCH net-next 2/4] net: rps: annotate data-races around (struct sd_flow_limit)->count Eric Dumazet
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2025-04-07 16:35 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Willem de Bruijn, netdev, eric.dumazet,
	Eric Dumazet

As explained in commit f3483c8e1da6 ("net: rfs: hash function change"),
masking low order bits of skb_get_hash(skb) has low entropy.

A NIC with 32 RX queues uses the 5 low order bits of rss key
to select a queue. This means all packets landing to a given
queue share the same 5 low order bits.

Switch to hash_32() to reduce hash collisions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/dev.c             | 2 +-
 net/core/dev.h             | 2 +-
 net/core/sysctl_net_core.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 0608605cfc24..f674236f34be 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5038,7 +5038,7 @@ static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen)
 	rcu_read_lock();
 	fl = rcu_dereference(sd->flow_limit);
 	if (fl) {
-		new_flow = skb_get_hash(skb) & (fl->num_buckets - 1);
+		new_flow = hash_32(skb_get_hash(skb), fl->log_buckets);
 		old_flow = fl->history[fl->history_head];
 		fl->history[fl->history_head] = new_flow;
 
diff --git a/net/core/dev.h b/net/core/dev.h
index 7ee203395d8e..b1cd44b5f009 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -16,7 +16,7 @@ struct cpumask;
 #define FLOW_LIMIT_HISTORY	(1 << 7)  /* must be ^2 and !overflow buckets */
 struct sd_flow_limit {
 	u64			count;
-	unsigned int		num_buckets;
+	u8			log_buckets;
 	unsigned int		history_head;
 	u16			history[FLOW_LIMIT_HISTORY];
 	u8			buckets[];
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index c7769ee0d9c5..5cfe76ede523 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -248,7 +248,7 @@ static int flow_limit_cpu_sysctl(const struct ctl_table *table, int write,
 					ret = -ENOMEM;
 					goto write_unlock;
 				}
-				cur->num_buckets = netdev_flow_limit_table_len;
+				cur->log_buckets = ilog2(netdev_flow_limit_table_len);
 				rcu_assign_pointer(sd->flow_limit, cur);
 			}
 		}
-- 
2.49.0.504.g3bcea36a83-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next 2/4] net: rps: annotate data-races around (struct sd_flow_limit)->count
  2025-04-07 16:35 [PATCH net-next 0/4] rps: misc changes Eric Dumazet
  2025-04-07 16:35 ` [PATCH net-next 1/4] net: rps: change skb_flow_limit() hash function Eric Dumazet
@ 2025-04-07 16:36 ` Eric Dumazet
  2025-04-07 18:55   ` Willem de Bruijn
  2025-04-07 16:36 ` [PATCH net-next 3/4] net: add data-race annotations in softnet_seq_show() Eric Dumazet
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2025-04-07 16:36 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Willem de Bruijn, netdev, eric.dumazet,
	Eric Dumazet

softnet_seq_show() can read fl->count while another cpu
updates this field from skb_flow_limit().

Make this field an 'unsigned int', as its only consumer
only deals with 32 bit.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/dev.c        | 3 ++-
 net/core/dev.h        | 2 +-
 net/core/net-procfs.c | 3 ++-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index f674236f34be..969883173182 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5049,7 +5049,8 @@ static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen)
 			fl->buckets[old_flow]--;
 
 		if (++fl->buckets[new_flow] > (FLOW_LIMIT_HISTORY >> 1)) {
-			fl->count++;
+			/* Pairs with READ_ONCE() in softnet_seq_show() */
+			WRITE_ONCE(fl->count, fl->count + 1);
 			rcu_read_unlock();
 			return true;
 		}
diff --git a/net/core/dev.h b/net/core/dev.h
index b1cd44b5f009..e855e1cb43fd 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -15,7 +15,7 @@ struct cpumask;
 /* Random bits of netdevice that don't need to be exposed */
 #define FLOW_LIMIT_HISTORY	(1 << 7)  /* must be ^2 and !overflow buckets */
 struct sd_flow_limit {
-	u64			count;
+	unsigned int		count;
 	u8			log_buckets;
 	unsigned int		history_head;
 	u16			history[FLOW_LIMIT_HISTORY];
diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
index 3e92bf0f9060..69782d62fbe1 100644
--- a/net/core/net-procfs.c
+++ b/net/core/net-procfs.c
@@ -132,8 +132,9 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
 
 	rcu_read_lock();
 	fl = rcu_dereference(sd->flow_limit);
+	/* Pairs with WRITE_ONCE() in skb_flow_limit() */
 	if (fl)
-		flow_limit_count = fl->count;
+		flow_limit_count = READ_ONCE(fl->count);
 	rcu_read_unlock();
 #endif
 
-- 
2.49.0.504.g3bcea36a83-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next 3/4] net: add data-race annotations in softnet_seq_show()
  2025-04-07 16:35 [PATCH net-next 0/4] rps: misc changes Eric Dumazet
  2025-04-07 16:35 ` [PATCH net-next 1/4] net: rps: change skb_flow_limit() hash function Eric Dumazet
  2025-04-07 16:36 ` [PATCH net-next 2/4] net: rps: annotate data-races around (struct sd_flow_limit)->count Eric Dumazet
@ 2025-04-07 16:36 ` Eric Dumazet
  2025-04-07 19:04   ` Willem de Bruijn
  2025-04-07 16:36 ` [PATCH net-next 4/4] net: rps: remove kfree_rcu_mightsleep() use Eric Dumazet
  2025-04-08 20:08 ` [PATCH net-next 0/4] rps: misc changes patchwork-bot+netdevbpf
  4 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2025-04-07 16:36 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Willem de Bruijn, netdev, eric.dumazet,
	Eric Dumazet

softnet_seq_show() reads several fields that might be updated
concurrently. Add READ_ONCE() and WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/dev.c        | 6 ++++--
 net/core/net-procfs.c | 6 +++---
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 969883173182..4ccc6dc5303e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4953,7 +4953,8 @@ static void rps_trigger_softirq(void *data)
 	struct softnet_data *sd = data;
 
 	____napi_schedule(sd, &sd->backlog);
-	sd->received_rps++;
+	/* Pairs with READ_ONCE() in softnet_seq_show() */
+	WRITE_ONCE(sd->received_rps, sd->received_rps + 1);
 }
 
 #endif /* CONFIG_RPS */
@@ -7523,7 +7524,8 @@ static __latent_entropy void net_rx_action(void)
 		 */
 		if (unlikely(budget <= 0 ||
 			     time_after_eq(jiffies, time_limit))) {
-			sd->time_squeeze++;
+			/* Pairs with READ_ONCE() in softnet_seq_show() */
+			WRITE_ONCE(sd->time_squeeze, sd->time_squeeze + 1);
 			break;
 		}
 	}
diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
index 69782d62fbe1..4f0f0709a1cb 100644
--- a/net/core/net-procfs.c
+++ b/net/core/net-procfs.c
@@ -145,11 +145,11 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
 	seq_printf(seq,
 		   "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
 		   "%08x %08x\n",
-		   sd->processed, atomic_read(&sd->dropped),
-		   sd->time_squeeze, 0,
+		   READ_ONCE(sd->processed), atomic_read(&sd->dropped),
+		   READ_ONCE(sd->time_squeeze), 0,
 		   0, 0, 0, 0, /* was fastroute */
 		   0,	/* was cpu_collision */
-		   sd->received_rps, flow_limit_count,
+		   READ_ONCE(sd->received_rps), flow_limit_count,
 		   input_qlen + process_qlen, (int)seq->index,
 		   input_qlen, process_qlen);
 	return 0;
-- 
2.49.0.504.g3bcea36a83-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next 4/4] net: rps: remove kfree_rcu_mightsleep() use
  2025-04-07 16:35 [PATCH net-next 0/4] rps: misc changes Eric Dumazet
                   ` (2 preceding siblings ...)
  2025-04-07 16:36 ` [PATCH net-next 3/4] net: add data-race annotations in softnet_seq_show() Eric Dumazet
@ 2025-04-07 16:36 ` Eric Dumazet
  2025-04-07 19:36   ` Willem de Bruijn
  2025-04-08 20:08 ` [PATCH net-next 0/4] rps: misc changes patchwork-bot+netdevbpf
  4 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2025-04-07 16:36 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Willem de Bruijn, netdev, eric.dumazet,
	Eric Dumazet

Add an rcu_head to sd_flow_limit and rps_sock_flow_table structs
to use the more conventional and predictable k[v]free_rcu().

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/rps.h          | 5 +++--
 net/core/dev.h             | 1 +
 net/core/sysctl_net_core.c | 4 ++--
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/net/rps.h b/include/net/rps.h
index e358e9711f27..507f4aa5d39b 100644
--- a/include/net/rps.h
+++ b/include/net/rps.h
@@ -57,9 +57,10 @@ struct rps_dev_flow_table {
  * meaning we use 32-6=26 bits for the hash.
  */
 struct rps_sock_flow_table {
-	u32	mask;
+	struct rcu_head	rcu;
+	u32		mask;
 
-	u32	ents[] ____cacheline_aligned_in_smp;
+	u32		ents[] ____cacheline_aligned_in_smp;
 };
 #define	RPS_SOCK_FLOW_TABLE_SIZE(_num) (offsetof(struct rps_sock_flow_table, ents[_num]))
 
diff --git a/net/core/dev.h b/net/core/dev.h
index e855e1cb43fd..710abc05ebdb 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -15,6 +15,7 @@ struct cpumask;
 /* Random bits of netdevice that don't need to be exposed */
 #define FLOW_LIMIT_HISTORY	(1 << 7)  /* must be ^2 and !overflow buckets */
 struct sd_flow_limit {
+	struct rcu_head		rcu;
 	unsigned int		count;
 	u8			log_buckets;
 	unsigned int		history_head;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 5cfe76ede523..5dbb2c6f371d 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -201,7 +201,7 @@ static int rps_sock_flow_sysctl(const struct ctl_table *table, int write,
 			if (orig_sock_table) {
 				static_branch_dec(&rps_needed);
 				static_branch_dec(&rfs_needed);
-				kvfree_rcu_mightsleep(orig_sock_table);
+				kvfree_rcu(orig_sock_table, rcu);
 			}
 		}
 	}
@@ -239,7 +239,7 @@ static int flow_limit_cpu_sysctl(const struct ctl_table *table, int write,
 				     lockdep_is_held(&flow_limit_update_mutex));
 			if (cur && !cpumask_test_cpu(i, mask)) {
 				RCU_INIT_POINTER(sd->flow_limit, NULL);
-				kfree_rcu_mightsleep(cur);
+				kfree_rcu(cur, rcu);
 			} else if (!cur && cpumask_test_cpu(i, mask)) {
 				cur = kzalloc_node(len, GFP_KERNEL,
 						   cpu_to_node(i));
-- 
2.49.0.504.g3bcea36a83-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 1/4] net: rps: change skb_flow_limit() hash function
  2025-04-07 16:35 ` [PATCH net-next 1/4] net: rps: change skb_flow_limit() hash function Eric Dumazet
@ 2025-04-07 18:53   ` Willem de Bruijn
  0 siblings, 0 replies; 10+ messages in thread
From: Willem de Bruijn @ 2025-04-07 18:53 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Willem de Bruijn, netdev, eric.dumazet,
	Eric Dumazet

Eric Dumazet wrote:
> As explained in commit f3483c8e1da6 ("net: rfs: hash function change"),
> masking low order bits of skb_get_hash(skb) has low entropy.
> 
> A NIC with 32 RX queues uses the 5 low order bits of rss key
> to select a queue. This means all packets landing to a given
> queue share the same 5 low order bits.
> 
> Switch to hash_32() to reduce hash collisions.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 2/4] net: rps: annotate data-races around (struct sd_flow_limit)->count
  2025-04-07 16:36 ` [PATCH net-next 2/4] net: rps: annotate data-races around (struct sd_flow_limit)->count Eric Dumazet
@ 2025-04-07 18:55   ` Willem de Bruijn
  0 siblings, 0 replies; 10+ messages in thread
From: Willem de Bruijn @ 2025-04-07 18:55 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Willem de Bruijn, netdev, eric.dumazet,
	Eric Dumazet

Eric Dumazet wrote:
> softnet_seq_show() can read fl->count while another cpu
> updates this field from skb_flow_limit().
> 
> Make this field an 'unsigned int', as its only consumer
> only deals with 32 bit.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 3/4] net: add data-race annotations in softnet_seq_show()
  2025-04-07 16:36 ` [PATCH net-next 3/4] net: add data-race annotations in softnet_seq_show() Eric Dumazet
@ 2025-04-07 19:04   ` Willem de Bruijn
  0 siblings, 0 replies; 10+ messages in thread
From: Willem de Bruijn @ 2025-04-07 19:04 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Willem de Bruijn, netdev, eric.dumazet,
	Eric Dumazet

Eric Dumazet wrote:
> softnet_seq_show() reads several fields that might be updated
> concurrently. Add READ_ONCE() and WRITE_ONCE() annotations.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 4/4] net: rps: remove kfree_rcu_mightsleep() use
  2025-04-07 16:36 ` [PATCH net-next 4/4] net: rps: remove kfree_rcu_mightsleep() use Eric Dumazet
@ 2025-04-07 19:36   ` Willem de Bruijn
  0 siblings, 0 replies; 10+ messages in thread
From: Willem de Bruijn @ 2025-04-07 19:36 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Willem de Bruijn, netdev, eric.dumazet,
	Eric Dumazet

Eric Dumazet wrote:
> Add an rcu_head to sd_flow_limit and rps_sock_flow_table structs
> to use the more conventional and predictable k[v]free_rcu().
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 0/4] rps: misc changes
  2025-04-07 16:35 [PATCH net-next 0/4] rps: misc changes Eric Dumazet
                   ` (3 preceding siblings ...)
  2025-04-07 16:36 ` [PATCH net-next 4/4] net: rps: remove kfree_rcu_mightsleep() use Eric Dumazet
@ 2025-04-08 20:08 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 10+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-04-08 20:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuba, pabeni, horms, willemb, netdev, eric.dumazet

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon,  7 Apr 2025 16:35:58 +0000 you wrote:
> Minor changes in rps:
> 
> skb_flow_limit() is probably unused these days,
> and data-races are quite theoretical.
> 
> Eric Dumazet (4):
>   net: rps: change skb_flow_limit() hash function
>   net: rps: annotate data-races around (struct sd_flow_limit)->count
>   net: add data-race annotations in softnet_seq_show()
>   net: rps: remove kfree_rcu_mightsleep() use
> 
> [...]

Here is the summary with links:
  - [net-next,1/4] net: rps: change skb_flow_limit() hash function
    https://git.kernel.org/netdev/net-next/c/c3025e94daa9
  - [net-next,2/4] net: rps: annotate data-races around (struct sd_flow_limit)->count
    https://git.kernel.org/netdev/net-next/c/7b6f0a852da3
  - [net-next,3/4] net: add data-race annotations in softnet_seq_show()
    https://git.kernel.org/netdev/net-next/c/22d046a778e4
  - [net-next,4/4] net: rps: remove kfree_rcu_mightsleep() use
    https://git.kernel.org/netdev/net-next/c/0a7de4a8f898

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-04-08 20:08 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-07 16:35 [PATCH net-next 0/4] rps: misc changes Eric Dumazet
2025-04-07 16:35 ` [PATCH net-next 1/4] net: rps: change skb_flow_limit() hash function Eric Dumazet
2025-04-07 18:53   ` Willem de Bruijn
2025-04-07 16:36 ` [PATCH net-next 2/4] net: rps: annotate data-races around (struct sd_flow_limit)->count Eric Dumazet
2025-04-07 18:55   ` Willem de Bruijn
2025-04-07 16:36 ` [PATCH net-next 3/4] net: add data-race annotations in softnet_seq_show() Eric Dumazet
2025-04-07 19:04   ` Willem de Bruijn
2025-04-07 16:36 ` [PATCH net-next 4/4] net: rps: remove kfree_rcu_mightsleep() use Eric Dumazet
2025-04-07 19:36   ` Willem de Bruijn
2025-04-08 20:08 ` [PATCH net-next 0/4] rps: misc changes patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).