* [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters @ 2025-08-28 10:27 Eric Dumazet 2025-08-28 10:27 ` [PATCH net-next 1/5] inet_diag: annotate data-races in inet_diag_msg_common_fill() Eric Dumazet ` (5 more replies) 0 siblings, 6 replies; 12+ messages in thread From: Eric Dumazet @ 2025-08-28 10:27 UTC (permalink / raw) To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell Cc: Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, netdev, eric.dumazet, Eric Dumazet inet_diag_bc_sk() pulls five cache lines per socket, while most filters only need the two first ones. We can change it to only pull needed cache lines, to make things like "ss -temoi src :21456" much faster. First patches (1-3) are annotating data-races as a first step. Eric Dumazet (5): inet_diag: annotate data-races in inet_diag_msg_common_fill() tcp: annotate data-races in tcp_req_diag_fill() inet_diag: annotate data-races in inet_diag_bc_sk() inet_diag: change inet_diag_bc_sk() first argument inet_diag: avoid cache line misses in inet_diag_bc_sk() include/linux/inet_diag.h | 7 +++- net/ipv4/inet_diag.c | 85 ++++++++++++++++++++++----------------- net/ipv4/raw_diag.c | 10 ++--- net/ipv4/tcp_diag.c | 12 +++--- net/ipv4/tcp_output.c | 2 +- net/ipv4/udp_diag.c | 10 ++--- net/mptcp/mptcp_diag.c | 15 ++----- 7 files changed, 70 insertions(+), 71 deletions(-) -- 2.51.0.268.g9569e192d0-goog ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH net-next 1/5] inet_diag: annotate data-races in inet_diag_msg_common_fill() 2025-08-28 10:27 [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters Eric Dumazet @ 2025-08-28 10:27 ` Eric Dumazet 2025-08-29 1:09 ` Kuniyuki Iwashima 2025-08-28 10:27 ` [PATCH net-next 2/5] tcp: annotate data-races in tcp_req_diag_fill() Eric Dumazet ` (4 subsequent siblings) 5 siblings, 1 reply; 12+ messages in thread From: Eric Dumazet @ 2025-08-28 10:27 UTC (permalink / raw) To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell Cc: Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, netdev, eric.dumazet, Eric Dumazet inet_diag_msg_common_fill() can run without socket lock. Add READ_ONCE() or data_race() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> --- net/ipv4/inet_diag.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index 9d4dcd17728cd17e693b59cee03f37b75f6c923a..7a9c347bc66fe35fa9771649db2f205af30e2a44 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -71,25 +71,25 @@ static void inet_diag_unlock_handler(const struct inet_diag_handler *handler) void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk) { - r->idiag_family = sk->sk_family; + r->idiag_family = READ_ONCE(sk->sk_family); - r->id.idiag_sport = htons(sk->sk_num); - r->id.idiag_dport = sk->sk_dport; - r->id.idiag_if = sk->sk_bound_dev_if; + r->id.idiag_sport = htons(READ_ONCE(sk->sk_num)); + r->id.idiag_dport = READ_ONCE(sk->sk_dport); + r->id.idiag_if = READ_ONCE(sk->sk_bound_dev_if); sock_diag_save_cookie(sk, r->id.idiag_cookie); #if IS_ENABLED(CONFIG_IPV6) - if (sk->sk_family == AF_INET6) { - *(struct in6_addr *)r->id.idiag_src = sk->sk_v6_rcv_saddr; - *(struct in6_addr *)r->id.idiag_dst = sk->sk_v6_daddr; + if (r->idiag_family == AF_INET6) { + data_race(*(struct in6_addr *)r->id.idiag_src = sk->sk_v6_rcv_saddr); + data_race(*(struct in6_addr *)r->id.idiag_dst = sk->sk_v6_daddr); } else #endif { memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src)); memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst)); - r->id.idiag_src[0] = sk->sk_rcv_saddr; - r->id.idiag_dst[0] = sk->sk_daddr; + r->id.idiag_src[0] = READ_ONCE(sk->sk_rcv_saddr); + r->id.idiag_dst[0] = READ_ONCE(sk->sk_daddr); } } EXPORT_SYMBOL_GPL(inet_diag_msg_common_fill); -- 2.51.0.268.g9569e192d0-goog ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 1/5] inet_diag: annotate data-races in inet_diag_msg_common_fill() 2025-08-28 10:27 ` [PATCH net-next 1/5] inet_diag: annotate data-races in inet_diag_msg_common_fill() Eric Dumazet @ 2025-08-29 1:09 ` Kuniyuki Iwashima 0 siblings, 0 replies; 12+ messages in thread From: Kuniyuki Iwashima @ 2025-08-29 1:09 UTC (permalink / raw) To: Eric Dumazet Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell, Simon Horman, Willem de Bruijn, netdev, eric.dumazet On Thu, Aug 28, 2025 at 3:27 AM Eric Dumazet <edumazet@google.com> wrote: > > inet_diag_msg_common_fill() can run without socket lock. > Add READ_ONCE() or data_race() annotations. > > Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH net-next 2/5] tcp: annotate data-races in tcp_req_diag_fill() 2025-08-28 10:27 [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters Eric Dumazet 2025-08-28 10:27 ` [PATCH net-next 1/5] inet_diag: annotate data-races in inet_diag_msg_common_fill() Eric Dumazet @ 2025-08-28 10:27 ` Eric Dumazet 2025-08-29 1:11 ` Kuniyuki Iwashima 2025-08-28 10:27 ` [PATCH net-next 3/5] inet_diag: annotate data-races in inet_diag_bc_sk() Eric Dumazet ` (3 subsequent siblings) 5 siblings, 1 reply; 12+ messages in thread From: Eric Dumazet @ 2025-08-28 10:27 UTC (permalink / raw) To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell Cc: Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, netdev, eric.dumazet, Eric Dumazet req->num_retrans and rsk_timer.expires are read locklessly, and can be changed from tcp_rtx_synack(). Add READ_ONCE()/WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> --- net/ipv4/tcp_diag.c | 4 ++-- net/ipv4/tcp_output.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c index 2f3a779ce7a2da7d59c6a471c155c3e6d1563acd..4ed6b93527f4ad00f34cc732639c0c82d0feff08 100644 --- a/net/ipv4/tcp_diag.c +++ b/net/ipv4/tcp_diag.c @@ -248,12 +248,12 @@ static int tcp_req_diag_fill(struct sock *sk, struct sk_buff *skb, inet_diag_msg_common_fill(r, sk); r->idiag_state = TCP_SYN_RECV; r->idiag_timer = 1; - r->idiag_retrans = reqsk->num_retrans; + r->idiag_retrans = READ_ONCE(reqsk->num_retrans); BUILD_BUG_ON(offsetof(struct inet_request_sock, ir_cookie) != offsetof(struct sock, sk_cookie)); - tmo = inet_reqsk(sk)->rsk_timer.expires - jiffies; + tmo = READ_ONCE(inet_reqsk(sk)->rsk_timer.expires) - jiffies; r->idiag_expires = jiffies_delta_to_msecs(tmo); r->idiag_rqueue = 0; r->idiag_wqueue = 0; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 06b26a6efd628e85f97bdb7253c344565b0ed56d..e180364b8ddad4baa9978418ffd9c8b871342cb9 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -4438,7 +4438,7 @@ int tcp_rtx_synack(const struct sock *sk, struct request_sock *req) tcp_sk_rw(sk)->total_retrans++; } trace_tcp_retransmit_synack(sk, req); - req->num_retrans++; + WRITE_ONCE(req->num_retrans, req->num_retrans + 1); } return res; } -- 2.51.0.268.g9569e192d0-goog ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 2/5] tcp: annotate data-races in tcp_req_diag_fill() 2025-08-28 10:27 ` [PATCH net-next 2/5] tcp: annotate data-races in tcp_req_diag_fill() Eric Dumazet @ 2025-08-29 1:11 ` Kuniyuki Iwashima 0 siblings, 0 replies; 12+ messages in thread From: Kuniyuki Iwashima @ 2025-08-29 1:11 UTC (permalink / raw) To: Eric Dumazet Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell, Simon Horman, Willem de Bruijn, netdev, eric.dumazet On Thu, Aug 28, 2025 at 3:27 AM Eric Dumazet <edumazet@google.com> wrote: > > req->num_retrans and rsk_timer.expires are read locklessly, > and can be changed from tcp_rtx_synack(). > > Add READ_ONCE()/WRITE_ONCE() annotations. > > Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH net-next 3/5] inet_diag: annotate data-races in inet_diag_bc_sk() 2025-08-28 10:27 [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters Eric Dumazet 2025-08-28 10:27 ` [PATCH net-next 1/5] inet_diag: annotate data-races in inet_diag_msg_common_fill() Eric Dumazet 2025-08-28 10:27 ` [PATCH net-next 2/5] tcp: annotate data-races in tcp_req_diag_fill() Eric Dumazet @ 2025-08-28 10:27 ` Eric Dumazet 2025-08-29 1:12 ` Kuniyuki Iwashima 2025-08-28 10:27 ` [PATCH net-next 4/5] inet_diag: change inet_diag_bc_sk() first argument Eric Dumazet ` (2 subsequent siblings) 5 siblings, 1 reply; 12+ messages in thread From: Eric Dumazet @ 2025-08-28 10:27 UTC (permalink / raw) To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell Cc: Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, netdev, eric.dumazet, Eric Dumazet inet_diag_bc_sk() runs with an unlocked socket, annotate potential races with READ_ONCE(). Signed-off-by: Eric Dumazet <edumazet@google.com> --- net/ipv4/inet_diag.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index 7a9c347bc66fe35fa9771649db2f205af30e2a44..3827e9979d4f9a4b33665e08ce69eb803fe4f948 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -580,7 +580,7 @@ static void entry_fill_addrs(struct inet_diag_entry *entry, const struct sock *sk) { #if IS_ENABLED(CONFIG_IPV6) - if (sk->sk_family == AF_INET6) { + if (entry->family == AF_INET6) { entry->saddr = sk->sk_v6_rcv_saddr.s6_addr32; entry->daddr = sk->sk_v6_daddr.s6_addr32; } else @@ -593,18 +593,18 @@ static void entry_fill_addrs(struct inet_diag_entry *entry, int inet_diag_bc_sk(const struct nlattr *bc, struct sock *sk) { - struct inet_sock *inet = inet_sk(sk); + const struct inet_sock *inet = inet_sk(sk); struct inet_diag_entry entry; if (!bc) return 1; - entry.family = sk->sk_family; + entry.family = READ_ONCE(sk->sk_family); entry_fill_addrs(&entry, sk); - entry.sport = inet->inet_num; - entry.dport = ntohs(inet->inet_dport); - entry.ifindex = sk->sk_bound_dev_if; - entry.userlocks = sk_fullsock(sk) ? sk->sk_userlocks : 0; + entry.sport = READ_ONCE(inet->inet_num); + entry.dport = ntohs(READ_ONCE(inet->inet_dport)); + entry.ifindex = READ_ONCE(sk->sk_bound_dev_if); + entry.userlocks = sk_fullsock(sk) ? READ_ONCE(sk->sk_userlocks) : 0; if (sk_fullsock(sk)) entry.mark = READ_ONCE(sk->sk_mark); else if (sk->sk_state == TCP_NEW_SYN_RECV) -- 2.51.0.268.g9569e192d0-goog ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 3/5] inet_diag: annotate data-races in inet_diag_bc_sk() 2025-08-28 10:27 ` [PATCH net-next 3/5] inet_diag: annotate data-races in inet_diag_bc_sk() Eric Dumazet @ 2025-08-29 1:12 ` Kuniyuki Iwashima 0 siblings, 0 replies; 12+ messages in thread From: Kuniyuki Iwashima @ 2025-08-29 1:12 UTC (permalink / raw) To: Eric Dumazet Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell, Simon Horman, Willem de Bruijn, netdev, eric.dumazet On Thu, Aug 28, 2025 at 3:27 AM Eric Dumazet <edumazet@google.com> wrote: > > inet_diag_bc_sk() runs with an unlocked socket, > annotate potential races with READ_ONCE(). > > Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH net-next 4/5] inet_diag: change inet_diag_bc_sk() first argument 2025-08-28 10:27 [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters Eric Dumazet ` (2 preceding siblings ...) 2025-08-28 10:27 ` [PATCH net-next 3/5] inet_diag: annotate data-races in inet_diag_bc_sk() Eric Dumazet @ 2025-08-28 10:27 ` Eric Dumazet 2025-08-29 1:22 ` Kuniyuki Iwashima 2025-08-28 10:27 ` [PATCH net-next 5/5] inet_diag: avoid cache line misses in inet_diag_bc_sk() Eric Dumazet 2025-08-30 2:40 ` [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters patchwork-bot+netdevbpf 5 siblings, 1 reply; 12+ messages in thread From: Eric Dumazet @ 2025-08-28 10:27 UTC (permalink / raw) To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell Cc: Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, netdev, eric.dumazet, Eric Dumazet We want to have access to the inet_diag_dump_data structure in the following patch. This patch removes duplication in callers. Signed-off-by: Eric Dumazet <edumazet@google.com> --- include/linux/inet_diag.h | 2 +- net/ipv4/inet_diag.c | 3 ++- net/ipv4/raw_diag.c | 10 +++------- net/ipv4/tcp_diag.c | 8 +++----- net/ipv4/udp_diag.c | 10 +++------- net/mptcp/mptcp_diag.c | 15 ++++----------- 6 files changed, 16 insertions(+), 32 deletions(-) diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h index 30bf8f7ea62b6b34dbf45d061fb8870a91be0944..86a0641ec36e1bf25483a8e6c3412073b9893d36 100644 --- a/include/linux/inet_diag.h +++ b/include/linux/inet_diag.h @@ -46,7 +46,7 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk, const struct inet_diag_req_v2 *req, u16 nlmsg_flags, bool net_admin); -int inet_diag_bc_sk(const struct nlattr *_bc, struct sock *sk); +int inet_diag_bc_sk(const struct inet_diag_dump_data *cb_data, struct sock *sk); void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk); diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index 3827e9979d4f9a4b33665e08ce69eb803fe4f948..11710304268781581b3559aca770d50dc0090ef3 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -591,8 +591,9 @@ static void entry_fill_addrs(struct inet_diag_entry *entry, } } -int inet_diag_bc_sk(const struct nlattr *bc, struct sock *sk) +int inet_diag_bc_sk(const struct inet_diag_dump_data *cb_data, struct sock *sk) { + const struct nlattr *bc = cb_data->inet_diag_nla_bc; const struct inet_sock *inet = inet_sk(sk); struct inet_diag_entry entry; diff --git a/net/ipv4/raw_diag.c b/net/ipv4/raw_diag.c index cc793bd8de258c3a12f11e95cec81c5ae4b9a7f6..943e5998e0ad5e23d0f3c4d364139bcb07ac5e0c 100644 --- a/net/ipv4/raw_diag.c +++ b/net/ipv4/raw_diag.c @@ -126,9 +126,9 @@ static int raw_diag_dump_one(struct netlink_callback *cb, static int sk_diag_dump(struct sock *sk, struct sk_buff *skb, struct netlink_callback *cb, const struct inet_diag_req_v2 *r, - struct nlattr *bc, bool net_admin) + bool net_admin) { - if (!inet_diag_bc_sk(bc, sk)) + if (!inet_diag_bc_sk(cb->data, sk)) return 0; return inet_sk_diag_fill(sk, NULL, skb, cb, r, NLM_F_MULTI, net_admin); @@ -140,17 +140,13 @@ static void raw_diag_dump(struct sk_buff *skb, struct netlink_callback *cb, bool net_admin = netlink_net_capable(cb->skb, CAP_NET_ADMIN); struct raw_hashinfo *hashinfo = raw_get_hashinfo(r); struct net *net = sock_net(skb->sk); - struct inet_diag_dump_data *cb_data; int num, s_num, slot, s_slot; struct hlist_head *hlist; struct sock *sk = NULL; - struct nlattr *bc; if (IS_ERR(hashinfo)) return; - cb_data = cb->data; - bc = cb_data->inet_diag_nla_bc; s_slot = cb->args[0]; num = s_num = cb->args[1]; @@ -174,7 +170,7 @@ static void raw_diag_dump(struct sk_buff *skb, struct netlink_callback *cb, if (r->id.idiag_dport != inet->inet_dport && r->id.idiag_dport) goto next; - if (sk_diag_dump(sk, skb, cb, r, bc, net_admin) < 0) + if (sk_diag_dump(sk, skb, cb, r, net_admin) < 0) goto out_unlock; next: num++; diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c index 4ed6b93527f4ad00f34cc732639c0c82d0feff08..d83efd91f461c8ad0157faeebae051b32cb07bf4 100644 --- a/net/ipv4/tcp_diag.c +++ b/net/ipv4/tcp_diag.c @@ -320,11 +320,9 @@ static void tcp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb, u32 idiag_states = r->idiag_states; struct inet_hashinfo *hashinfo; int i, num, s_i, s_num; - struct nlattr *bc; struct sock *sk; hashinfo = net->ipv4.tcp_death_row.hashinfo; - bc = cb_data->inet_diag_nla_bc; if (idiag_states & TCPF_SYN_RECV) idiag_states |= TCPF_NEW_SYN_RECV; s_i = cb->args[1]; @@ -365,7 +363,7 @@ static void tcp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb, r->id.idiag_sport) goto next_listen; - if (!inet_diag_bc_sk(bc, sk)) + if (!inet_diag_bc_sk(cb_data, sk)) goto next_listen; if (inet_sk_diag_fill(sk, inet_csk(sk), skb, @@ -432,7 +430,7 @@ static void tcp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb, r->sdiag_family != sk->sk_family) goto next_bind; - if (!inet_diag_bc_sk(bc, sk)) + if (!inet_diag_bc_sk(cb_data, sk)) goto next_bind; sock_hold(sk); @@ -519,7 +517,7 @@ static void tcp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb, goto next_normal; twsk_build_assert(); - if (!inet_diag_bc_sk(bc, sk)) + if (!inet_diag_bc_sk(cb_data, sk)) goto next_normal; if (!refcount_inc_not_zero(&sk->sk_refcnt)) diff --git a/net/ipv4/udp_diag.c b/net/ipv4/udp_diag.c index 38cb3a28e4ed6d54f7078afa2700e71db9ce4b85..6e491c720c9075bcef9d5daf9bc852fab3412231 100644 --- a/net/ipv4/udp_diag.c +++ b/net/ipv4/udp_diag.c @@ -16,9 +16,9 @@ static int sk_diag_dump(struct sock *sk, struct sk_buff *skb, struct netlink_callback *cb, const struct inet_diag_req_v2 *req, - struct nlattr *bc, bool net_admin) + bool net_admin) { - if (!inet_diag_bc_sk(bc, sk)) + if (!inet_diag_bc_sk(cb->data, sk)) return 0; return inet_sk_diag_fill(sk, NULL, skb, cb, req, NLM_F_MULTI, @@ -92,12 +92,8 @@ static void udp_dump(struct udp_table *table, struct sk_buff *skb, { bool net_admin = netlink_net_capable(cb->skb, CAP_NET_ADMIN); struct net *net = sock_net(skb->sk); - struct inet_diag_dump_data *cb_data; int num, s_num, slot, s_slot; - struct nlattr *bc; - cb_data = cb->data; - bc = cb_data->inet_diag_nla_bc; s_slot = cb->args[0]; num = s_num = cb->args[1]; @@ -130,7 +126,7 @@ static void udp_dump(struct udp_table *table, struct sk_buff *skb, r->id.idiag_dport) goto next; - if (sk_diag_dump(sk, skb, cb, r, bc, net_admin) < 0) { + if (sk_diag_dump(sk, skb, cb, r, net_admin) < 0) { spin_unlock_bh(&hslot->lock); goto done; } diff --git a/net/mptcp/mptcp_diag.c b/net/mptcp/mptcp_diag.c index 0566dd793810a58055d33548bcb5e511116eed61..ac974299de71cdf85cba7d848d14a241f5ff4dc8 100644 --- a/net/mptcp/mptcp_diag.c +++ b/net/mptcp/mptcp_diag.c @@ -15,9 +15,9 @@ static int sk_diag_dump(struct sock *sk, struct sk_buff *skb, struct netlink_callback *cb, const struct inet_diag_req_v2 *req, - struct nlattr *bc, bool net_admin) + bool net_admin) { - if (!inet_diag_bc_sk(bc, sk)) + if (!inet_diag_bc_sk(cb->data, sk)) return 0; return inet_sk_diag_fill(sk, inet_csk(sk), skb, cb, req, NLM_F_MULTI, @@ -76,9 +76,7 @@ static void mptcp_diag_dump_listeners(struct sk_buff *skb, struct netlink_callba const struct inet_diag_req_v2 *r, bool net_admin) { - struct inet_diag_dump_data *cb_data = cb->data; struct mptcp_diag_ctx *diag_ctx = (void *)cb->ctx; - struct nlattr *bc = cb_data->inet_diag_nla_bc; struct net *net = sock_net(skb->sk); struct inet_hashinfo *hinfo; int i; @@ -121,7 +119,7 @@ static void mptcp_diag_dump_listeners(struct sk_buff *skb, struct netlink_callba if (!refcount_inc_not_zero(&sk->sk_refcnt)) goto next_listen; - ret = sk_diag_dump(sk, skb, cb, r, bc, net_admin); + ret = sk_diag_dump(sk, skb, cb, r, net_admin); sock_put(sk); @@ -154,15 +152,10 @@ static void mptcp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb, bool net_admin = netlink_net_capable(cb->skb, CAP_NET_ADMIN); struct mptcp_diag_ctx *diag_ctx = (void *)cb->ctx; struct net *net = sock_net(skb->sk); - struct inet_diag_dump_data *cb_data; struct mptcp_sock *msk; - struct nlattr *bc; BUILD_BUG_ON(sizeof(cb->ctx) < sizeof(*diag_ctx)); - cb_data = cb->data; - bc = cb_data->inet_diag_nla_bc; - while ((msk = mptcp_token_iter_next(net, &diag_ctx->s_slot, &diag_ctx->s_num)) != NULL) { struct inet_sock *inet = (struct inet_sock *)msk; @@ -181,7 +174,7 @@ static void mptcp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb, r->id.idiag_dport) goto next; - ret = sk_diag_dump(sk, skb, cb, r, bc, net_admin); + ret = sk_diag_dump(sk, skb, cb, r, net_admin); next: sock_put(sk); if (ret < 0) { -- 2.51.0.268.g9569e192d0-goog ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 4/5] inet_diag: change inet_diag_bc_sk() first argument 2025-08-28 10:27 ` [PATCH net-next 4/5] inet_diag: change inet_diag_bc_sk() first argument Eric Dumazet @ 2025-08-29 1:22 ` Kuniyuki Iwashima 0 siblings, 0 replies; 12+ messages in thread From: Kuniyuki Iwashima @ 2025-08-29 1:22 UTC (permalink / raw) To: Eric Dumazet Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell, Simon Horman, Willem de Bruijn, netdev, eric.dumazet On Thu, Aug 28, 2025 at 3:27 AM Eric Dumazet <edumazet@google.com> wrote: > > We want to have access to the inet_diag_dump_data structure > in the following patch. > > This patch removes duplication in callers. > > Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH net-next 5/5] inet_diag: avoid cache line misses in inet_diag_bc_sk() 2025-08-28 10:27 [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters Eric Dumazet ` (3 preceding siblings ...) 2025-08-28 10:27 ` [PATCH net-next 4/5] inet_diag: change inet_diag_bc_sk() first argument Eric Dumazet @ 2025-08-28 10:27 ` Eric Dumazet 2025-08-29 1:33 ` Kuniyuki Iwashima 2025-08-30 2:40 ` [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters patchwork-bot+netdevbpf 5 siblings, 1 reply; 12+ messages in thread From: Eric Dumazet @ 2025-08-28 10:27 UTC (permalink / raw) To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell Cc: Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, netdev, eric.dumazet, Eric Dumazet inet_diag_bc_sk() pulls five cache lines per socket, while most filters only need the two first ones. Add three booleans to struct inet_diag_dump_data, that are selectively set if a filter needs specific socket fields. - mark_needed /* INET_DIAG_BC_MARK_COND present. */ - cgroup_needed /* INET_DIAG_BC_CGROUP_COND present. */ - userlocks_needed /* INET_DIAG_BC_AUTO present. */ This removes millions of cache lines misses per ss invocation when simple filters are specified on busy servers. offsetof(struct sock, sk_userlocks) = 0xf3 offsetof(struct sock, sk_mark) = 0x20c offsetof(struct sock, sk_cgrp_data) = 0x298 Signed-off-by: Eric Dumazet <edumazet@google.com> --- include/linux/inet_diag.h | 5 ++++ net/ipv4/inet_diag.c | 52 +++++++++++++++++++++++---------------- 2 files changed, 36 insertions(+), 21 deletions(-) diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h index 86a0641ec36e1bf25483a8e6c3412073b9893d36..704fd415c2b497dfba591a7ef46009dec7824d75 100644 --- a/include/linux/inet_diag.h +++ b/include/linux/inet_diag.h @@ -38,6 +38,11 @@ struct inet_diag_dump_data { #define inet_diag_nla_bpf_stgs req_nlas[INET_DIAG_REQ_SK_BPF_STORAGES] struct bpf_sk_storage_diag *bpf_stg_diag; + bool mark_needed; /* INET_DIAG_BC_MARK_COND present. */ +#ifdef CONFIG_SOCK_CGROUP_DATA + bool cgroup_needed; /* INET_DIAG_BC_CGROUP_COND present. */ +#endif + bool userlocks_needed; /* INET_DIAG_BC_AUTO present. */ }; struct inet_connection_sock; diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index 11710304268781581b3559aca770d50dc0090ef3..f0b6c5a411a2008e2a039ed37e262f3f132e58ac 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -605,18 +605,22 @@ int inet_diag_bc_sk(const struct inet_diag_dump_data *cb_data, struct sock *sk) entry.sport = READ_ONCE(inet->inet_num); entry.dport = ntohs(READ_ONCE(inet->inet_dport)); entry.ifindex = READ_ONCE(sk->sk_bound_dev_if); - entry.userlocks = sk_fullsock(sk) ? READ_ONCE(sk->sk_userlocks) : 0; - if (sk_fullsock(sk)) - entry.mark = READ_ONCE(sk->sk_mark); - else if (sk->sk_state == TCP_NEW_SYN_RECV) - entry.mark = inet_rsk(inet_reqsk(sk))->ir_mark; - else if (sk->sk_state == TCP_TIME_WAIT) - entry.mark = inet_twsk(sk)->tw_mark; - else - entry.mark = 0; + if (cb_data->userlocks_needed) + entry.userlocks = sk_fullsock(sk) ? READ_ONCE(sk->sk_userlocks) : 0; + if (cb_data->mark_needed) { + if (sk_fullsock(sk)) + entry.mark = READ_ONCE(sk->sk_mark); + else if (sk->sk_state == TCP_NEW_SYN_RECV) + entry.mark = inet_rsk(inet_reqsk(sk))->ir_mark; + else if (sk->sk_state == TCP_TIME_WAIT) + entry.mark = inet_twsk(sk)->tw_mark; + else + entry.mark = 0; + } #ifdef CONFIG_SOCK_CGROUP_DATA - entry.cgroup_id = sk_fullsock(sk) ? - cgroup_id(sock_cgroup_ptr(&sk->sk_cgrp_data)) : 0; + if (cb_data->cgroup_needed) + entry.cgroup_id = sk_fullsock(sk) ? + cgroup_id(sock_cgroup_ptr(&sk->sk_cgrp_data)) : 0; #endif return inet_diag_bc_run(bc, &entry); @@ -716,16 +720,21 @@ static bool valid_cgroupcond(const struct inet_diag_bc_op *op, int len, } #endif -static int inet_diag_bc_audit(const struct nlattr *attr, +static int inet_diag_bc_audit(struct inet_diag_dump_data *cb_data, const struct sk_buff *skb) { - bool net_admin = netlink_net_capable(skb, CAP_NET_ADMIN); + const struct nlattr *attr = cb_data->inet_diag_nla_bc; const void *bytecode, *bc; int bytecode_len, len; + bool net_admin; + + if (!attr) + return 0; - if (!attr || nla_len(attr) < sizeof(struct inet_diag_bc_op)) + if (nla_len(attr) < sizeof(struct inet_diag_bc_op)) return -EINVAL; + net_admin = netlink_net_capable(skb, CAP_NET_ADMIN); bytecode = bc = nla_data(attr); len = bytecode_len = nla_len(attr); @@ -757,14 +766,18 @@ static int inet_diag_bc_audit(const struct nlattr *attr, return -EPERM; if (!valid_markcond(bc, len, &min_len)) return -EINVAL; + cb_data->mark_needed = true; break; #ifdef CONFIG_SOCK_CGROUP_DATA case INET_DIAG_BC_CGROUP_COND: if (!valid_cgroupcond(bc, len, &min_len)) return -EINVAL; + cb_data->cgroup_needed = true; break; #endif case INET_DIAG_BC_AUTO: + cb_data->userlocks_needed = true; + fallthrough; case INET_DIAG_BC_JMP: case INET_DIAG_BC_NOP: break; @@ -841,13 +854,10 @@ static int __inet_diag_dump_start(struct netlink_callback *cb, int hdrlen) kfree(cb_data); return err; } - nla = cb_data->inet_diag_nla_bc; - if (nla) { - err = inet_diag_bc_audit(nla, skb); - if (err) { - kfree(cb_data); - return err; - } + err = inet_diag_bc_audit(cb_data, skb); + if (err) { + kfree(cb_data); + return err; } nla = cb_data->inet_diag_nla_bpf_stgs; -- 2.51.0.268.g9569e192d0-goog ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 5/5] inet_diag: avoid cache line misses in inet_diag_bc_sk() 2025-08-28 10:27 ` [PATCH net-next 5/5] inet_diag: avoid cache line misses in inet_diag_bc_sk() Eric Dumazet @ 2025-08-29 1:33 ` Kuniyuki Iwashima 0 siblings, 0 replies; 12+ messages in thread From: Kuniyuki Iwashima @ 2025-08-29 1:33 UTC (permalink / raw) To: Eric Dumazet Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell, Simon Horman, Willem de Bruijn, netdev, eric.dumazet On Thu, Aug 28, 2025 at 3:27 AM Eric Dumazet <edumazet@google.com> wrote: > > inet_diag_bc_sk() pulls five cache lines per socket, > while most filters only need the two first ones. > > Add three booleans to struct inet_diag_dump_data, > that are selectively set if a filter needs specific socket fields. > > - mark_needed /* INET_DIAG_BC_MARK_COND present. */ > - cgroup_needed /* INET_DIAG_BC_CGROUP_COND present. */ > - userlocks_needed /* INET_DIAG_BC_AUTO present. */ > > This removes millions of cache lines misses per ss invocation > when simple filters are specified on busy servers. > > offsetof(struct sock, sk_userlocks) = 0xf3 > offsetof(struct sock, sk_mark) = 0x20c > offsetof(struct sock, sk_cgrp_data) = 0x298 > > Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Thanks! ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters 2025-08-28 10:27 [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters Eric Dumazet ` (4 preceding siblings ...) 2025-08-28 10:27 ` [PATCH net-next 5/5] inet_diag: avoid cache line misses in inet_diag_bc_sk() Eric Dumazet @ 2025-08-30 2:40 ` patchwork-bot+netdevbpf 5 siblings, 0 replies; 12+ messages in thread From: patchwork-bot+netdevbpf @ 2025-08-30 2:40 UTC (permalink / raw) To: Eric Dumazet Cc: davem, kuba, pabeni, ncardwell, horms, kuniyu, willemb, netdev, eric.dumazet Hello: This series was applied to netdev/net-next.git (main) by Jakub Kicinski <kuba@kernel.org>: On Thu, 28 Aug 2025 10:27:33 +0000 you wrote: > inet_diag_bc_sk() pulls five cache lines per socket, > while most filters only need the two first ones. > > We can change it to only pull needed cache lines, > to make things like "ss -temoi src :21456" much faster. > > First patches (1-3) are annotating data-races as a first step. > > [...] Here is the summary with links: - [net-next,1/5] inet_diag: annotate data-races in inet_diag_msg_common_fill() https://git.kernel.org/netdev/net-next/c/9a574257b968 - [net-next,2/5] tcp: annotate data-races in tcp_req_diag_fill() https://git.kernel.org/netdev/net-next/c/8e60447f0831 - [net-next,3/5] inet_diag: annotate data-races in inet_diag_bc_sk() https://git.kernel.org/netdev/net-next/c/4fd84a0aaf2b - [net-next,4/5] inet_diag: change inet_diag_bc_sk() first argument https://git.kernel.org/netdev/net-next/c/9529320ad64e - [net-next,5/5] inet_diag: avoid cache line misses in inet_diag_bc_sk() https://git.kernel.org/netdev/net-next/c/95fa78830e5b You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-08-30 2:40 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-08-28 10:27 [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters Eric Dumazet 2025-08-28 10:27 ` [PATCH net-next 1/5] inet_diag: annotate data-races in inet_diag_msg_common_fill() Eric Dumazet 2025-08-29 1:09 ` Kuniyuki Iwashima 2025-08-28 10:27 ` [PATCH net-next 2/5] tcp: annotate data-races in tcp_req_diag_fill() Eric Dumazet 2025-08-29 1:11 ` Kuniyuki Iwashima 2025-08-28 10:27 ` [PATCH net-next 3/5] inet_diag: annotate data-races in inet_diag_bc_sk() Eric Dumazet 2025-08-29 1:12 ` Kuniyuki Iwashima 2025-08-28 10:27 ` [PATCH net-next 4/5] inet_diag: change inet_diag_bc_sk() first argument Eric Dumazet 2025-08-29 1:22 ` Kuniyuki Iwashima 2025-08-28 10:27 ` [PATCH net-next 5/5] inet_diag: avoid cache line misses in inet_diag_bc_sk() Eric Dumazet 2025-08-29 1:33 ` Kuniyuki Iwashima 2025-08-30 2:40 ` [PATCH net-next 0/5] inet_diag: make dumps faster with simple filters patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).