netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters
@ 2025-08-28 10:27 Eric Dumazet
  2025-08-28 10:27 ` [PATCH net-next 1/5] inet_diag: annotate data-races in inet_diag_msg_common_fill() Eric Dumazet
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Eric Dumazet @ 2025-08-28 10:27 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell
  Cc: Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, netdev,
	eric.dumazet, Eric Dumazet

inet_diag_bc_sk() pulls five cache lines per socket,
while most filters only need the two first ones.

We can change it to only pull needed cache lines,
to make things like "ss -temoi src :21456" much faster.

First patches (1-3) are annotating data-races as a first step.

Eric Dumazet (5):
  inet_diag: annotate data-races in inet_diag_msg_common_fill()
  tcp: annotate data-races in tcp_req_diag_fill()
  inet_diag: annotate data-races in inet_diag_bc_sk()
  inet_diag: change inet_diag_bc_sk() first argument
  inet_diag: avoid cache line misses in inet_diag_bc_sk()

 include/linux/inet_diag.h |  7 +++-
 net/ipv4/inet_diag.c      | 85 ++++++++++++++++++++++-----------------
 net/ipv4/raw_diag.c       | 10 ++---
 net/ipv4/tcp_diag.c       | 12 +++---
 net/ipv4/tcp_output.c     |  2 +-
 net/ipv4/udp_diag.c       | 10 ++---
 net/mptcp/mptcp_diag.c    | 15 ++-----
 7 files changed, 70 insertions(+), 71 deletions(-)

-- 
2.51.0.268.g9569e192d0-goog


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next 1/5] inet_diag: annotate data-races in inet_diag_msg_common_fill()
  2025-08-28 10:27 [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters Eric Dumazet
@ 2025-08-28 10:27 ` Eric Dumazet
  2025-08-29  1:09   ` Kuniyuki Iwashima
  2025-08-28 10:27 ` [PATCH net-next 2/5] tcp: annotate data-races in tcp_req_diag_fill() Eric Dumazet
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2025-08-28 10:27 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell
  Cc: Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, netdev,
	eric.dumazet, Eric Dumazet

inet_diag_msg_common_fill() can run without socket lock.
Add READ_ONCE() or data_race() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/inet_diag.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 9d4dcd17728cd17e693b59cee03f37b75f6c923a..7a9c347bc66fe35fa9771649db2f205af30e2a44 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -71,25 +71,25 @@ static void inet_diag_unlock_handler(const struct inet_diag_handler *handler)
 
 void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk)
 {
-	r->idiag_family = sk->sk_family;
+	r->idiag_family = READ_ONCE(sk->sk_family);
 
-	r->id.idiag_sport = htons(sk->sk_num);
-	r->id.idiag_dport = sk->sk_dport;
-	r->id.idiag_if = sk->sk_bound_dev_if;
+	r->id.idiag_sport = htons(READ_ONCE(sk->sk_num));
+	r->id.idiag_dport = READ_ONCE(sk->sk_dport);
+	r->id.idiag_if = READ_ONCE(sk->sk_bound_dev_if);
 	sock_diag_save_cookie(sk, r->id.idiag_cookie);
 
 #if IS_ENABLED(CONFIG_IPV6)
-	if (sk->sk_family == AF_INET6) {
-		*(struct in6_addr *)r->id.idiag_src = sk->sk_v6_rcv_saddr;
-		*(struct in6_addr *)r->id.idiag_dst = sk->sk_v6_daddr;
+	if (r->idiag_family == AF_INET6) {
+		data_race(*(struct in6_addr *)r->id.idiag_src = sk->sk_v6_rcv_saddr);
+		data_race(*(struct in6_addr *)r->id.idiag_dst = sk->sk_v6_daddr);
 	} else
 #endif
 	{
 	memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src));
 	memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst));
 
-	r->id.idiag_src[0] = sk->sk_rcv_saddr;
-	r->id.idiag_dst[0] = sk->sk_daddr;
+	r->id.idiag_src[0] = READ_ONCE(sk->sk_rcv_saddr);
+	r->id.idiag_dst[0] = READ_ONCE(sk->sk_daddr);
 	}
 }
 EXPORT_SYMBOL_GPL(inet_diag_msg_common_fill);
-- 
2.51.0.268.g9569e192d0-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 2/5] tcp: annotate data-races in tcp_req_diag_fill()
  2025-08-28 10:27 [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters Eric Dumazet
  2025-08-28 10:27 ` [PATCH net-next 1/5] inet_diag: annotate data-races in inet_diag_msg_common_fill() Eric Dumazet
@ 2025-08-28 10:27 ` Eric Dumazet
  2025-08-29  1:11   ` Kuniyuki Iwashima
  2025-08-28 10:27 ` [PATCH net-next 3/5] inet_diag: annotate data-races in inet_diag_bc_sk() Eric Dumazet
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2025-08-28 10:27 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell
  Cc: Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, netdev,
	eric.dumazet, Eric Dumazet

req->num_retrans and rsk_timer.expires are read locklessly,
and can be changed from tcp_rtx_synack().

Add READ_ONCE()/WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_diag.c   | 4 ++--
 net/ipv4/tcp_output.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index 2f3a779ce7a2da7d59c6a471c155c3e6d1563acd..4ed6b93527f4ad00f34cc732639c0c82d0feff08 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -248,12 +248,12 @@ static int tcp_req_diag_fill(struct sock *sk, struct sk_buff *skb,
 	inet_diag_msg_common_fill(r, sk);
 	r->idiag_state = TCP_SYN_RECV;
 	r->idiag_timer = 1;
-	r->idiag_retrans = reqsk->num_retrans;
+	r->idiag_retrans = READ_ONCE(reqsk->num_retrans);
 
 	BUILD_BUG_ON(offsetof(struct inet_request_sock, ir_cookie) !=
 		     offsetof(struct sock, sk_cookie));
 
-	tmo = inet_reqsk(sk)->rsk_timer.expires - jiffies;
+	tmo = READ_ONCE(inet_reqsk(sk)->rsk_timer.expires) - jiffies;
 	r->idiag_expires = jiffies_delta_to_msecs(tmo);
 	r->idiag_rqueue	= 0;
 	r->idiag_wqueue	= 0;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 06b26a6efd628e85f97bdb7253c344565b0ed56d..e180364b8ddad4baa9978418ffd9c8b871342cb9 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -4438,7 +4438,7 @@ int tcp_rtx_synack(const struct sock *sk, struct request_sock *req)
 			tcp_sk_rw(sk)->total_retrans++;
 		}
 		trace_tcp_retransmit_synack(sk, req);
-		req->num_retrans++;
+		WRITE_ONCE(req->num_retrans, req->num_retrans + 1);
 	}
 	return res;
 }
-- 
2.51.0.268.g9569e192d0-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 3/5] inet_diag: annotate data-races in inet_diag_bc_sk()
  2025-08-28 10:27 [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters Eric Dumazet
  2025-08-28 10:27 ` [PATCH net-next 1/5] inet_diag: annotate data-races in inet_diag_msg_common_fill() Eric Dumazet
  2025-08-28 10:27 ` [PATCH net-next 2/5] tcp: annotate data-races in tcp_req_diag_fill() Eric Dumazet
@ 2025-08-28 10:27 ` Eric Dumazet
  2025-08-29  1:12   ` Kuniyuki Iwashima
  2025-08-28 10:27 ` [PATCH net-next 4/5] inet_diag: change inet_diag_bc_sk() first argument Eric Dumazet
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2025-08-28 10:27 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell
  Cc: Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, netdev,
	eric.dumazet, Eric Dumazet

inet_diag_bc_sk() runs with an unlocked socket,
annotate potential races with READ_ONCE().

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/inet_diag.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 7a9c347bc66fe35fa9771649db2f205af30e2a44..3827e9979d4f9a4b33665e08ce69eb803fe4f948 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -580,7 +580,7 @@ static void entry_fill_addrs(struct inet_diag_entry *entry,
 			     const struct sock *sk)
 {
 #if IS_ENABLED(CONFIG_IPV6)
-	if (sk->sk_family == AF_INET6) {
+	if (entry->family == AF_INET6) {
 		entry->saddr = sk->sk_v6_rcv_saddr.s6_addr32;
 		entry->daddr = sk->sk_v6_daddr.s6_addr32;
 	} else
@@ -593,18 +593,18 @@ static void entry_fill_addrs(struct inet_diag_entry *entry,
 
 int inet_diag_bc_sk(const struct nlattr *bc, struct sock *sk)
 {
-	struct inet_sock *inet = inet_sk(sk);
+	const struct inet_sock *inet = inet_sk(sk);
 	struct inet_diag_entry entry;
 
 	if (!bc)
 		return 1;
 
-	entry.family = sk->sk_family;
+	entry.family = READ_ONCE(sk->sk_family);
 	entry_fill_addrs(&entry, sk);
-	entry.sport = inet->inet_num;
-	entry.dport = ntohs(inet->inet_dport);
-	entry.ifindex = sk->sk_bound_dev_if;
-	entry.userlocks = sk_fullsock(sk) ? sk->sk_userlocks : 0;
+	entry.sport = READ_ONCE(inet->inet_num);
+	entry.dport = ntohs(READ_ONCE(inet->inet_dport));
+	entry.ifindex = READ_ONCE(sk->sk_bound_dev_if);
+	entry.userlocks = sk_fullsock(sk) ? READ_ONCE(sk->sk_userlocks) : 0;
 	if (sk_fullsock(sk))
 		entry.mark = READ_ONCE(sk->sk_mark);
 	else if (sk->sk_state == TCP_NEW_SYN_RECV)
-- 
2.51.0.268.g9569e192d0-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 4/5] inet_diag: change inet_diag_bc_sk() first argument
  2025-08-28 10:27 [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters Eric Dumazet
                   ` (2 preceding siblings ...)
  2025-08-28 10:27 ` [PATCH net-next 3/5] inet_diag: annotate data-races in inet_diag_bc_sk() Eric Dumazet
@ 2025-08-28 10:27 ` Eric Dumazet
  2025-08-29  1:22   ` Kuniyuki Iwashima
  2025-08-28 10:27 ` [PATCH net-next 5/5] inet_diag: avoid cache line misses in inet_diag_bc_sk() Eric Dumazet
  2025-08-30  2:40 ` [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters patchwork-bot+netdevbpf
  5 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2025-08-28 10:27 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell
  Cc: Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, netdev,
	eric.dumazet, Eric Dumazet

We want to have access to the inet_diag_dump_data structure
in the following patch.

This patch removes duplication in callers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/inet_diag.h |  2 +-
 net/ipv4/inet_diag.c      |  3 ++-
 net/ipv4/raw_diag.c       | 10 +++-------
 net/ipv4/tcp_diag.c       |  8 +++-----
 net/ipv4/udp_diag.c       | 10 +++-------
 net/mptcp/mptcp_diag.c    | 15 ++++-----------
 6 files changed, 16 insertions(+), 32 deletions(-)

diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h
index 30bf8f7ea62b6b34dbf45d061fb8870a91be0944..86a0641ec36e1bf25483a8e6c3412073b9893d36 100644
--- a/include/linux/inet_diag.h
+++ b/include/linux/inet_diag.h
@@ -46,7 +46,7 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 		      const struct inet_diag_req_v2 *req,
 		      u16 nlmsg_flags, bool net_admin);
 
-int inet_diag_bc_sk(const struct nlattr *_bc, struct sock *sk);
+int inet_diag_bc_sk(const struct inet_diag_dump_data *cb_data, struct sock *sk);
 
 void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk);
 
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 3827e9979d4f9a4b33665e08ce69eb803fe4f948..11710304268781581b3559aca770d50dc0090ef3 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -591,8 +591,9 @@ static void entry_fill_addrs(struct inet_diag_entry *entry,
 	}
 }
 
-int inet_diag_bc_sk(const struct nlattr *bc, struct sock *sk)
+int inet_diag_bc_sk(const struct inet_diag_dump_data *cb_data, struct sock *sk)
 {
+	const struct nlattr *bc = cb_data->inet_diag_nla_bc;
 	const struct inet_sock *inet = inet_sk(sk);
 	struct inet_diag_entry entry;
 
diff --git a/net/ipv4/raw_diag.c b/net/ipv4/raw_diag.c
index cc793bd8de258c3a12f11e95cec81c5ae4b9a7f6..943e5998e0ad5e23d0f3c4d364139bcb07ac5e0c 100644
--- a/net/ipv4/raw_diag.c
+++ b/net/ipv4/raw_diag.c
@@ -126,9 +126,9 @@ static int raw_diag_dump_one(struct netlink_callback *cb,
 static int sk_diag_dump(struct sock *sk, struct sk_buff *skb,
 			struct netlink_callback *cb,
 			const struct inet_diag_req_v2 *r,
-			struct nlattr *bc, bool net_admin)
+			bool net_admin)
 {
-	if (!inet_diag_bc_sk(bc, sk))
+	if (!inet_diag_bc_sk(cb->data, sk))
 		return 0;
 
 	return inet_sk_diag_fill(sk, NULL, skb, cb, r, NLM_F_MULTI, net_admin);
@@ -140,17 +140,13 @@ static void raw_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
 	bool net_admin = netlink_net_capable(cb->skb, CAP_NET_ADMIN);
 	struct raw_hashinfo *hashinfo = raw_get_hashinfo(r);
 	struct net *net = sock_net(skb->sk);
-	struct inet_diag_dump_data *cb_data;
 	int num, s_num, slot, s_slot;
 	struct hlist_head *hlist;
 	struct sock *sk = NULL;
-	struct nlattr *bc;
 
 	if (IS_ERR(hashinfo))
 		return;
 
-	cb_data = cb->data;
-	bc = cb_data->inet_diag_nla_bc;
 	s_slot = cb->args[0];
 	num = s_num = cb->args[1];
 
@@ -174,7 +170,7 @@ static void raw_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
 			if (r->id.idiag_dport != inet->inet_dport &&
 			    r->id.idiag_dport)
 				goto next;
-			if (sk_diag_dump(sk, skb, cb, r, bc, net_admin) < 0)
+			if (sk_diag_dump(sk, skb, cb, r, net_admin) < 0)
 				goto out_unlock;
 next:
 			num++;
diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index 4ed6b93527f4ad00f34cc732639c0c82d0feff08..d83efd91f461c8ad0157faeebae051b32cb07bf4 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -320,11 +320,9 @@ static void tcp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
 	u32 idiag_states = r->idiag_states;
 	struct inet_hashinfo *hashinfo;
 	int i, num, s_i, s_num;
-	struct nlattr *bc;
 	struct sock *sk;
 
 	hashinfo = net->ipv4.tcp_death_row.hashinfo;
-	bc = cb_data->inet_diag_nla_bc;
 	if (idiag_states & TCPF_SYN_RECV)
 		idiag_states |= TCPF_NEW_SYN_RECV;
 	s_i = cb->args[1];
@@ -365,7 +363,7 @@ static void tcp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
 				    r->id.idiag_sport)
 					goto next_listen;
 
-				if (!inet_diag_bc_sk(bc, sk))
+				if (!inet_diag_bc_sk(cb_data, sk))
 					goto next_listen;
 
 				if (inet_sk_diag_fill(sk, inet_csk(sk), skb,
@@ -432,7 +430,7 @@ static void tcp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
 					    r->sdiag_family != sk->sk_family)
 						goto next_bind;
 
-					if (!inet_diag_bc_sk(bc, sk))
+					if (!inet_diag_bc_sk(cb_data, sk))
 						goto next_bind;
 
 					sock_hold(sk);
@@ -519,7 +517,7 @@ static void tcp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
 				goto next_normal;
 			twsk_build_assert();
 
-			if (!inet_diag_bc_sk(bc, sk))
+			if (!inet_diag_bc_sk(cb_data, sk))
 				goto next_normal;
 
 			if (!refcount_inc_not_zero(&sk->sk_refcnt))
diff --git a/net/ipv4/udp_diag.c b/net/ipv4/udp_diag.c
index 38cb3a28e4ed6d54f7078afa2700e71db9ce4b85..6e491c720c9075bcef9d5daf9bc852fab3412231 100644
--- a/net/ipv4/udp_diag.c
+++ b/net/ipv4/udp_diag.c
@@ -16,9 +16,9 @@
 static int sk_diag_dump(struct sock *sk, struct sk_buff *skb,
 			struct netlink_callback *cb,
 			const struct inet_diag_req_v2 *req,
-			struct nlattr *bc, bool net_admin)
+			bool net_admin)
 {
-	if (!inet_diag_bc_sk(bc, sk))
+	if (!inet_diag_bc_sk(cb->data, sk))
 		return 0;
 
 	return inet_sk_diag_fill(sk, NULL, skb, cb, req, NLM_F_MULTI,
@@ -92,12 +92,8 @@ static void udp_dump(struct udp_table *table, struct sk_buff *skb,
 {
 	bool net_admin = netlink_net_capable(cb->skb, CAP_NET_ADMIN);
 	struct net *net = sock_net(skb->sk);
-	struct inet_diag_dump_data *cb_data;
 	int num, s_num, slot, s_slot;
-	struct nlattr *bc;
 
-	cb_data = cb->data;
-	bc = cb_data->inet_diag_nla_bc;
 	s_slot = cb->args[0];
 	num = s_num = cb->args[1];
 
@@ -130,7 +126,7 @@ static void udp_dump(struct udp_table *table, struct sk_buff *skb,
 			    r->id.idiag_dport)
 				goto next;
 
-			if (sk_diag_dump(sk, skb, cb, r, bc, net_admin) < 0) {
+			if (sk_diag_dump(sk, skb, cb, r, net_admin) < 0) {
 				spin_unlock_bh(&hslot->lock);
 				goto done;
 			}
diff --git a/net/mptcp/mptcp_diag.c b/net/mptcp/mptcp_diag.c
index 0566dd793810a58055d33548bcb5e511116eed61..ac974299de71cdf85cba7d848d14a241f5ff4dc8 100644
--- a/net/mptcp/mptcp_diag.c
+++ b/net/mptcp/mptcp_diag.c
@@ -15,9 +15,9 @@
 static int sk_diag_dump(struct sock *sk, struct sk_buff *skb,
 			struct netlink_callback *cb,
 			const struct inet_diag_req_v2 *req,
-			struct nlattr *bc, bool net_admin)
+			bool net_admin)
 {
-	if (!inet_diag_bc_sk(bc, sk))
+	if (!inet_diag_bc_sk(cb->data, sk))
 		return 0;
 
 	return inet_sk_diag_fill(sk, inet_csk(sk), skb, cb, req, NLM_F_MULTI,
@@ -76,9 +76,7 @@ static void mptcp_diag_dump_listeners(struct sk_buff *skb, struct netlink_callba
 				      const struct inet_diag_req_v2 *r,
 				      bool net_admin)
 {
-	struct inet_diag_dump_data *cb_data = cb->data;
 	struct mptcp_diag_ctx *diag_ctx = (void *)cb->ctx;
-	struct nlattr *bc = cb_data->inet_diag_nla_bc;
 	struct net *net = sock_net(skb->sk);
 	struct inet_hashinfo *hinfo;
 	int i;
@@ -121,7 +119,7 @@ static void mptcp_diag_dump_listeners(struct sk_buff *skb, struct netlink_callba
 			if (!refcount_inc_not_zero(&sk->sk_refcnt))
 				goto next_listen;
 
-			ret = sk_diag_dump(sk, skb, cb, r, bc, net_admin);
+			ret = sk_diag_dump(sk, skb, cb, r, net_admin);
 
 			sock_put(sk);
 
@@ -154,15 +152,10 @@ static void mptcp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
 	bool net_admin = netlink_net_capable(cb->skb, CAP_NET_ADMIN);
 	struct mptcp_diag_ctx *diag_ctx = (void *)cb->ctx;
 	struct net *net = sock_net(skb->sk);
-	struct inet_diag_dump_data *cb_data;
 	struct mptcp_sock *msk;
-	struct nlattr *bc;
 
 	BUILD_BUG_ON(sizeof(cb->ctx) < sizeof(*diag_ctx));
 
-	cb_data = cb->data;
-	bc = cb_data->inet_diag_nla_bc;
-
 	while ((msk = mptcp_token_iter_next(net, &diag_ctx->s_slot,
 					    &diag_ctx->s_num)) != NULL) {
 		struct inet_sock *inet = (struct inet_sock *)msk;
@@ -181,7 +174,7 @@ static void mptcp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
 		    r->id.idiag_dport)
 			goto next;
 
-		ret = sk_diag_dump(sk, skb, cb, r, bc, net_admin);
+		ret = sk_diag_dump(sk, skb, cb, r, net_admin);
 next:
 		sock_put(sk);
 		if (ret < 0) {
-- 
2.51.0.268.g9569e192d0-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next 5/5] inet_diag: avoid cache line misses in inet_diag_bc_sk()
  2025-08-28 10:27 [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters Eric Dumazet
                   ` (3 preceding siblings ...)
  2025-08-28 10:27 ` [PATCH net-next 4/5] inet_diag: change inet_diag_bc_sk() first argument Eric Dumazet
@ 2025-08-28 10:27 ` Eric Dumazet
  2025-08-29  1:33   ` Kuniyuki Iwashima
  2025-08-30  2:40 ` [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters patchwork-bot+netdevbpf
  5 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2025-08-28 10:27 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell
  Cc: Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, netdev,
	eric.dumazet, Eric Dumazet

inet_diag_bc_sk() pulls five cache lines per socket,
while most filters only need the two first ones.

Add three booleans to struct inet_diag_dump_data,
that are selectively set if a filter needs specific socket fields.

- mark_needed       /* INET_DIAG_BC_MARK_COND present. */
- cgroup_needed     /* INET_DIAG_BC_CGROUP_COND present. */
- userlocks_needed  /* INET_DIAG_BC_AUTO present. */

This removes millions of cache lines misses per ss invocation
when simple filters are specified on busy servers.

offsetof(struct sock, sk_userlocks) = 0xf3
offsetof(struct sock, sk_mark) = 0x20c
offsetof(struct sock, sk_cgrp_data) = 0x298

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/inet_diag.h |  5 ++++
 net/ipv4/inet_diag.c      | 52 +++++++++++++++++++++++----------------
 2 files changed, 36 insertions(+), 21 deletions(-)

diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h
index 86a0641ec36e1bf25483a8e6c3412073b9893d36..704fd415c2b497dfba591a7ef46009dec7824d75 100644
--- a/include/linux/inet_diag.h
+++ b/include/linux/inet_diag.h
@@ -38,6 +38,11 @@ struct inet_diag_dump_data {
 #define inet_diag_nla_bpf_stgs req_nlas[INET_DIAG_REQ_SK_BPF_STORAGES]
 
 	struct bpf_sk_storage_diag *bpf_stg_diag;
+	bool mark_needed;	/* INET_DIAG_BC_MARK_COND present. */
+#ifdef CONFIG_SOCK_CGROUP_DATA
+	bool cgroup_needed;	/* INET_DIAG_BC_CGROUP_COND present. */
+#endif
+	bool userlocks_needed;	/* INET_DIAG_BC_AUTO present. */
 };
 
 struct inet_connection_sock;
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 11710304268781581b3559aca770d50dc0090ef3..f0b6c5a411a2008e2a039ed37e262f3f132e58ac 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -605,18 +605,22 @@ int inet_diag_bc_sk(const struct inet_diag_dump_data *cb_data, struct sock *sk)
 	entry.sport = READ_ONCE(inet->inet_num);
 	entry.dport = ntohs(READ_ONCE(inet->inet_dport));
 	entry.ifindex = READ_ONCE(sk->sk_bound_dev_if);
-	entry.userlocks = sk_fullsock(sk) ? READ_ONCE(sk->sk_userlocks) : 0;
-	if (sk_fullsock(sk))
-		entry.mark = READ_ONCE(sk->sk_mark);
-	else if (sk->sk_state == TCP_NEW_SYN_RECV)
-		entry.mark = inet_rsk(inet_reqsk(sk))->ir_mark;
-	else if (sk->sk_state == TCP_TIME_WAIT)
-		entry.mark = inet_twsk(sk)->tw_mark;
-	else
-		entry.mark = 0;
+	if (cb_data->userlocks_needed)
+		entry.userlocks = sk_fullsock(sk) ? READ_ONCE(sk->sk_userlocks) : 0;
+	if (cb_data->mark_needed) {
+		if (sk_fullsock(sk))
+			entry.mark = READ_ONCE(sk->sk_mark);
+		else if (sk->sk_state == TCP_NEW_SYN_RECV)
+			entry.mark = inet_rsk(inet_reqsk(sk))->ir_mark;
+		else if (sk->sk_state == TCP_TIME_WAIT)
+			entry.mark = inet_twsk(sk)->tw_mark;
+		else
+			entry.mark = 0;
+	}
 #ifdef CONFIG_SOCK_CGROUP_DATA
-	entry.cgroup_id = sk_fullsock(sk) ?
-		cgroup_id(sock_cgroup_ptr(&sk->sk_cgrp_data)) : 0;
+	if (cb_data->cgroup_needed)
+		entry.cgroup_id = sk_fullsock(sk) ?
+			cgroup_id(sock_cgroup_ptr(&sk->sk_cgrp_data)) : 0;
 #endif
 
 	return inet_diag_bc_run(bc, &entry);
@@ -716,16 +720,21 @@ static bool valid_cgroupcond(const struct inet_diag_bc_op *op, int len,
 }
 #endif
 
-static int inet_diag_bc_audit(const struct nlattr *attr,
+static int inet_diag_bc_audit(struct inet_diag_dump_data *cb_data,
 			      const struct sk_buff *skb)
 {
-	bool net_admin = netlink_net_capable(skb, CAP_NET_ADMIN);
+	const struct nlattr *attr = cb_data->inet_diag_nla_bc;
 	const void *bytecode, *bc;
 	int bytecode_len, len;
+	bool net_admin;
+
+	if (!attr)
+		return 0;
 
-	if (!attr || nla_len(attr) < sizeof(struct inet_diag_bc_op))
+	if (nla_len(attr) < sizeof(struct inet_diag_bc_op))
 		return -EINVAL;
 
+	net_admin = netlink_net_capable(skb, CAP_NET_ADMIN);
 	bytecode = bc = nla_data(attr);
 	len = bytecode_len = nla_len(attr);
 
@@ -757,14 +766,18 @@ static int inet_diag_bc_audit(const struct nlattr *attr,
 				return -EPERM;
 			if (!valid_markcond(bc, len, &min_len))
 				return -EINVAL;
+			cb_data->mark_needed = true;
 			break;
 #ifdef CONFIG_SOCK_CGROUP_DATA
 		case INET_DIAG_BC_CGROUP_COND:
 			if (!valid_cgroupcond(bc, len, &min_len))
 				return -EINVAL;
+			cb_data->cgroup_needed = true;
 			break;
 #endif
 		case INET_DIAG_BC_AUTO:
+			cb_data->userlocks_needed = true;
+			fallthrough;
 		case INET_DIAG_BC_JMP:
 		case INET_DIAG_BC_NOP:
 			break;
@@ -841,13 +854,10 @@ static int __inet_diag_dump_start(struct netlink_callback *cb, int hdrlen)
 		kfree(cb_data);
 		return err;
 	}
-	nla = cb_data->inet_diag_nla_bc;
-	if (nla) {
-		err = inet_diag_bc_audit(nla, skb);
-		if (err) {
-			kfree(cb_data);
-			return err;
-		}
+	err = inet_diag_bc_audit(cb_data, skb);
+	if (err) {
+		kfree(cb_data);
+		return err;
 	}
 
 	nla = cb_data->inet_diag_nla_bpf_stgs;
-- 
2.51.0.268.g9569e192d0-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 1/5] inet_diag: annotate data-races in inet_diag_msg_common_fill()
  2025-08-28 10:27 ` [PATCH net-next 1/5] inet_diag: annotate data-races in inet_diag_msg_common_fill() Eric Dumazet
@ 2025-08-29  1:09   ` Kuniyuki Iwashima
  0 siblings, 0 replies; 12+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-29  1:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
	Simon Horman, Willem de Bruijn, netdev, eric.dumazet

On Thu, Aug 28, 2025 at 3:27 AM Eric Dumazet <edumazet@google.com> wrote:
>
> inet_diag_msg_common_fill() can run without socket lock.
> Add READ_ONCE() or data_race() annotations.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 2/5] tcp: annotate data-races in tcp_req_diag_fill()
  2025-08-28 10:27 ` [PATCH net-next 2/5] tcp: annotate data-races in tcp_req_diag_fill() Eric Dumazet
@ 2025-08-29  1:11   ` Kuniyuki Iwashima
  0 siblings, 0 replies; 12+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-29  1:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
	Simon Horman, Willem de Bruijn, netdev, eric.dumazet

On Thu, Aug 28, 2025 at 3:27 AM Eric Dumazet <edumazet@google.com> wrote:
>
> req->num_retrans and rsk_timer.expires are read locklessly,
> and can be changed from tcp_rtx_synack().
>
> Add READ_ONCE()/WRITE_ONCE() annotations.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 3/5] inet_diag: annotate data-races in inet_diag_bc_sk()
  2025-08-28 10:27 ` [PATCH net-next 3/5] inet_diag: annotate data-races in inet_diag_bc_sk() Eric Dumazet
@ 2025-08-29  1:12   ` Kuniyuki Iwashima
  0 siblings, 0 replies; 12+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-29  1:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
	Simon Horman, Willem de Bruijn, netdev, eric.dumazet

On Thu, Aug 28, 2025 at 3:27 AM Eric Dumazet <edumazet@google.com> wrote:
>
> inet_diag_bc_sk() runs with an unlocked socket,
> annotate potential races with READ_ONCE().
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 4/5] inet_diag: change inet_diag_bc_sk() first argument
  2025-08-28 10:27 ` [PATCH net-next 4/5] inet_diag: change inet_diag_bc_sk() first argument Eric Dumazet
@ 2025-08-29  1:22   ` Kuniyuki Iwashima
  0 siblings, 0 replies; 12+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-29  1:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
	Simon Horman, Willem de Bruijn, netdev, eric.dumazet

On Thu, Aug 28, 2025 at 3:27 AM Eric Dumazet <edumazet@google.com> wrote:
>
> We want to have access to the inet_diag_dump_data structure
> in the following patch.
>
> This patch removes duplication in callers.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 5/5] inet_diag: avoid cache line misses in inet_diag_bc_sk()
  2025-08-28 10:27 ` [PATCH net-next 5/5] inet_diag: avoid cache line misses in inet_diag_bc_sk() Eric Dumazet
@ 2025-08-29  1:33   ` Kuniyuki Iwashima
  0 siblings, 0 replies; 12+ messages in thread
From: Kuniyuki Iwashima @ 2025-08-29  1:33 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
	Simon Horman, Willem de Bruijn, netdev, eric.dumazet

On Thu, Aug 28, 2025 at 3:27 AM Eric Dumazet <edumazet@google.com> wrote:
>
> inet_diag_bc_sk() pulls five cache lines per socket,
> while most filters only need the two first ones.
>
> Add three booleans to struct inet_diag_dump_data,
> that are selectively set if a filter needs specific socket fields.
>
> - mark_needed       /* INET_DIAG_BC_MARK_COND present. */
> - cgroup_needed     /* INET_DIAG_BC_CGROUP_COND present. */
> - userlocks_needed  /* INET_DIAG_BC_AUTO present. */
>
> This removes millions of cache lines misses per ss invocation
> when simple filters are specified on busy servers.
>
> offsetof(struct sock, sk_userlocks) = 0xf3
> offsetof(struct sock, sk_mark) = 0x20c
> offsetof(struct sock, sk_cgrp_data) = 0x298
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

Thanks!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters
  2025-08-28 10:27 [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters Eric Dumazet
                   ` (4 preceding siblings ...)
  2025-08-28 10:27 ` [PATCH net-next 5/5] inet_diag: avoid cache line misses in inet_diag_bc_sk() Eric Dumazet
@ 2025-08-30  2:40 ` patchwork-bot+netdevbpf
  5 siblings, 0 replies; 12+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-08-30  2:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, kuba, pabeni, ncardwell, horms, kuniyu, willemb, netdev,
	eric.dumazet

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 28 Aug 2025 10:27:33 +0000 you wrote:
> inet_diag_bc_sk() pulls five cache lines per socket,
> while most filters only need the two first ones.
> 
> We can change it to only pull needed cache lines,
> to make things like "ss -temoi src :21456" much faster.
> 
> First patches (1-3) are annotating data-races as a first step.
> 
> [...]

Here is the summary with links:
  - [net-next,1/5] inet_diag: annotate data-races in inet_diag_msg_common_fill()
    https://git.kernel.org/netdev/net-next/c/9a574257b968
  - [net-next,2/5] tcp: annotate data-races in tcp_req_diag_fill()
    https://git.kernel.org/netdev/net-next/c/8e60447f0831
  - [net-next,3/5] inet_diag: annotate data-races in inet_diag_bc_sk()
    https://git.kernel.org/netdev/net-next/c/4fd84a0aaf2b
  - [net-next,4/5] inet_diag: change inet_diag_bc_sk() first argument
    https://git.kernel.org/netdev/net-next/c/9529320ad64e
  - [net-next,5/5] inet_diag: avoid cache line misses in inet_diag_bc_sk()
    https://git.kernel.org/netdev/net-next/c/95fa78830e5b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-08-30  2:40 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-28 10:27 [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters Eric Dumazet
2025-08-28 10:27 ` [PATCH net-next 1/5] inet_diag: annotate data-races in inet_diag_msg_common_fill() Eric Dumazet
2025-08-29  1:09   ` Kuniyuki Iwashima
2025-08-28 10:27 ` [PATCH net-next 2/5] tcp: annotate data-races in tcp_req_diag_fill() Eric Dumazet
2025-08-29  1:11   ` Kuniyuki Iwashima
2025-08-28 10:27 ` [PATCH net-next 3/5] inet_diag: annotate data-races in inet_diag_bc_sk() Eric Dumazet
2025-08-29  1:12   ` Kuniyuki Iwashima
2025-08-28 10:27 ` [PATCH net-next 4/5] inet_diag: change inet_diag_bc_sk() first argument Eric Dumazet
2025-08-29  1:22   ` Kuniyuki Iwashima
2025-08-28 10:27 ` [PATCH net-next 5/5] inet_diag: avoid cache line misses in inet_diag_bc_sk() Eric Dumazet
2025-08-29  1:33   ` Kuniyuki Iwashima
2025-08-30  2:40 ` [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).