netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags
@ 2025-02-07 16:23 Paolo Abeni
  2025-02-07 16:23 ` [RFC PATCH 1/2] sock: introduce set_tsflags operation Paolo Abeni
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Paolo Abeni @ 2025-02-07 16:23 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Eric Dumazet, Kuniyuki Iwashima,
	David S. Miller, Jakub Kicinski, Simon Horman, Neal Cardwell,
	David Ahern

While benchmarking the recently shared page frag revert, I observed a
lot of cache misses in the UDP RX path due to false sharing between the
sk_tsflags and the sk_forward_alloc sk fields.

Here comes a solution attempt for such a problem, inspired by commit
f796feabb9f5 ("udp: add local "peek offset enabled" flag").

The first patch adds a new proto op allowing protocol specific operation
on tsflags updates, and the 2nd one leverages such operation to cache
the problematic field in a cache friendly manner.

The need for a new operation is possibly suboptimal, hence the RFC tag,
but I could not find other good solutions. I considered:
- moving the sk_tsflags just before 'sk_policy', in the 'sock_read_rxtx'
  group. It arguably belongs to such group, but the change would create
  a couple of holes, increasing the 'struct sock' size and would have 
  side effects on other protocols
- moving the sk_tsflags just before 'sk_stamp'; similar to the above,
  would possibly reduce the side effects, as most of 'struct sock'
  layout will be unchanged. Could increase the number of cacheline
  accessed in the TX path.

I opted for the present solution as it should minimize the side effects
to other protocols.

Paolo Abeni (2):
  sock: introduce set_tsflags operation
  udp: avoid false sharing via protocol specific set_tsflags

 include/linux/udp.h | 12 ++++++++++++
 include/net/sock.h  | 15 +++++++++++----
 include/net/tcp.h   |  1 +
 net/core/sock.c     | 24 +++++++++---------------
 net/ipv4/tcp.c      | 16 ++++++++++++++++
 net/ipv4/tcp_ipv4.c |  1 +
 net/ipv4/udp.c      |  3 ++-
 net/ipv6/tcp_ipv6.c |  1 +
 net/ipv6/udp.c      |  3 ++-
 9 files changed, 55 insertions(+), 21 deletions(-)

-- 
2.48.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC PATCH 1/2] sock: introduce set_tsflags operation
  2025-02-07 16:23 [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags Paolo Abeni
@ 2025-02-07 16:23 ` Paolo Abeni
  2025-02-07 16:23 ` [RFC PATCH 2/2] udp: avoid false sharing via protocol specific set_tsflags Paolo Abeni
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2025-02-07 16:23 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Eric Dumazet, Kuniyuki Iwashima,
	David S. Miller, Jakub Kicinski, Simon Horman, Neal Cardwell,
	David Ahern

The generic socket code for SO_TIMESTAMPING_* carries some TCP
specific bits. Add a new protocol operation hooked on such socket
option and move the TCP implementation into the relevant callback.

Place the new hook after all the core checks, to allow the protocol
to take "conclusive" actions.

The next patch will add a tsflags implementation for the UDP protocol.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/sock.h  |  1 +
 include/net/tcp.h   |  1 +
 net/core/sock.c     | 24 +++++++++---------------
 net/ipv4/tcp.c      | 16 ++++++++++++++++
 net/ipv4/tcp_ipv4.c |  1 +
 net/ipv6/tcp_ipv6.c |  1 +
 6 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 8036b3b79cd8..282dd23b90dc 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1246,6 +1246,7 @@ struct proto {
 					int optname, char __user *optval,
 					int __user *option);
 	void			(*keepalive)(struct sock *sk, int valbool);
+	int			(*tsflags)(struct sock *sk, int val);
 #ifdef CONFIG_COMPAT
 	int			(*compat_ioctl)(struct sock *sk,
 					unsigned int cmd, unsigned long arg);
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5b2b04835688..962500d0c4a9 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -416,6 +416,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
 int tcp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
 		   unsigned int optlen);
 void tcp_set_keepalive(struct sock *sk, int val);
+int tcp_set_tsflags(struct sock *sk, int val);
 void tcp_syn_ack_timeout(const struct request_sock *req);
 int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		int flags, int *addr_len);
diff --git a/net/core/sock.c b/net/core/sock.c
index eae2ae70a2e0..aafba8e30080 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -911,21 +911,6 @@ int sock_set_timestamping(struct sock *sk, int optname,
 	    !(val & SOF_TIMESTAMPING_OPT_ID))
 		return -EINVAL;
 
-	if (val & SOF_TIMESTAMPING_OPT_ID &&
-	    !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) {
-		if (sk_is_tcp(sk)) {
-			if ((1 << sk->sk_state) &
-			    (TCPF_CLOSE | TCPF_LISTEN))
-				return -EINVAL;
-			if (val & SOF_TIMESTAMPING_OPT_ID_TCP)
-				atomic_set(&sk->sk_tskey, tcp_sk(sk)->write_seq);
-			else
-				atomic_set(&sk->sk_tskey, tcp_sk(sk)->snd_una);
-		} else {
-			atomic_set(&sk->sk_tskey, 0);
-		}
-	}
-
 	if (val & SOF_TIMESTAMPING_OPT_STATS &&
 	    !(val & SOF_TIMESTAMPING_OPT_TSONLY))
 		return -EINVAL;
@@ -936,6 +921,15 @@ int sock_set_timestamping(struct sock *sk, int optname,
 			return ret;
 	}
 
+	if (sk->sk_prot->tsflags) {
+		if (sk->sk_prot->tsflags(sk, val))
+			return -EINVAL;
+	} else {
+		if (val & SOF_TIMESTAMPING_OPT_ID &&
+		    !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID))
+			atomic_set(&sk->sk_tskey, 0);
+	}
+
 	WRITE_ONCE(sk->sk_tsflags, val);
 	sock_valbool_flag(sk, SOCK_TSTAMP_NEW, optname == SO_TIMESTAMPING_NEW);
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 7f43d31c9400..4bd795d311d2 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1741,6 +1741,22 @@ int tcp_set_rcvlowat(struct sock *sk, int val)
 }
 EXPORT_SYMBOL(tcp_set_rcvlowat);
 
+int tcp_set_tsflags(struct sock *sk, int val)
+{
+	if (val & SOF_TIMESTAMPING_OPT_ID &&
+	    !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) {
+		if ((1 << sk->sk_state) &
+		    (TCPF_CLOSE | TCPF_LISTEN))
+			return -EINVAL;
+		if (val & SOF_TIMESTAMPING_OPT_ID_TCP)
+			atomic_set(&sk->sk_tskey, tcp_sk(sk)->write_seq);
+		else
+			atomic_set(&sk->sk_tskey, tcp_sk(sk)->snd_una);
+	}
+	return 0;
+}
+EXPORT_SYMBOL(tcp_set_tsflags);
+
 void tcp_update_recv_tstamps(struct sk_buff *skb,
 			     struct scm_timestamping_internal *tss)
 {
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index cc2b5194a18d..d37ff97508d2 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -3361,6 +3361,7 @@ struct proto tcp_prot = {
 	.getsockopt		= tcp_getsockopt,
 	.bpf_bypass_getsockopt	= tcp_bpf_bypass_getsockopt,
 	.keepalive		= tcp_set_keepalive,
+	.tsflags		= tcp_set_tsflags,
 	.recvmsg		= tcp_recvmsg,
 	.sendmsg		= tcp_sendmsg,
 	.splice_eof		= tcp_splice_eof,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 2debdf085a3b..99092c9c1856 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -2334,6 +2334,7 @@ struct proto tcpv6_prot = {
 	.getsockopt		= tcp_getsockopt,
 	.bpf_bypass_getsockopt	= tcp_bpf_bypass_getsockopt,
 	.keepalive		= tcp_set_keepalive,
+	.tsflags		= tcp_set_tsflags,
 	.recvmsg		= tcp_recvmsg,
 	.sendmsg		= tcp_sendmsg,
 	.splice_eof		= tcp_splice_eof,
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 2/2] udp: avoid false sharing via protocol specific set_tsflags
  2025-02-07 16:23 [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags Paolo Abeni
  2025-02-07 16:23 ` [RFC PATCH 1/2] sock: introduce set_tsflags operation Paolo Abeni
@ 2025-02-07 16:23 ` Paolo Abeni
  2025-02-10  4:00 ` [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags Willem de Bruijn
  2025-02-10  7:49 ` Eric Dumazet
  3 siblings, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2025-02-07 16:23 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, Eric Dumazet, Kuniyuki Iwashima,
	David S. Miller, Jakub Kicinski, Simon Horman, Neal Cardwell,
	David Ahern

After commit 5d4cc87414c5 ("net: reorganize "struct sock" fields"),
the sk_tsflags field shares the same cacheline with sk_forward_alloc.

The UDP protocol does not acquire the sock lock in the RX path;
forward allocations are protected via the receive queue spinlock.

Due to the above, under high packet rate traffic, when the BH and the
user-space process run on different CPUs, UDP packet reception will
experience a cache miss while accessing sk_tsflags.

Similarly to commit f796feabb9f5 ("udp: add local "peek offset enabled"
flag"), add a new field in the udp_sock struct to store a copy of the
ts_flags value in a cache friendly manner.

Use the newly introduced protocol op to sync-up the new field at every
sk_tsflags update.

With this patch applied, on an AMD epic server with i40e NICs, I
measured a 10% performance improvement for small packets UDP flood
performance tests - possibly a larger delta could be observed with more
recent H/W.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/udp.h | 12 ++++++++++++
 include/net/sock.h  | 14 ++++++++++----
 net/ipv4/udp.c      |  3 ++-
 net/ipv6/udp.c      |  3 ++-
 4 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index 0807e21cfec9..b186ac20fd38 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -101,6 +101,9 @@ struct udp_sock {
 
 	/* Cache friendly copy of sk->sk_peek_off >= 0 */
 	bool		peeking_with_offset;
+
+	/* Cache friendly copy of sk_tsflags & TSFLAGS_ANY */
+	bool		tsflags_any;
 };
 
 #define udp_test_bit(nr, sk)			\
@@ -125,6 +128,15 @@ static inline int udp_set_peek_off(struct sock *sk, int val)
 	return 0;
 }
 
+static inline int udp_set_tsflags(struct sock *sk, int val)
+{
+	WRITE_ONCE(udp_sk(sk)->tsflags_any, !!(val & TSFLAGS_ANY));
+	if (val & SOF_TIMESTAMPING_OPT_ID &&
+	    !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID))
+		atomic_set(&sk->sk_tskey, 0);
+	return 0;
+}
+
 static inline void udp_set_no_check6_tx(struct sock *sk, bool val)
 {
 	udp_assign_bit(NO_CHECK6_TX, sk, val);
diff --git a/include/net/sock.h b/include/net/sock.h
index 282dd23b90dc..5767c2dace2a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2660,8 +2660,8 @@ void __sock_recv_cmsgs(struct msghdr *msg, struct sock *sk,
 		       struct sk_buff *skb);
 
 #define SK_DEFAULT_STAMP (-1L * NSEC_PER_SEC)
-static inline void sock_recv_cmsgs(struct msghdr *msg, struct sock *sk,
-				   struct sk_buff *skb)
+static inline void sock_do_recv_cmsgs(struct msghdr *msg, struct sock *sk,
+				      struct sk_buff *skb, bool tsflags_any)
 {
 #define FLAGS_RECV_CMSGS ((1UL << SOCK_RXQ_OVFL)			| \
 			   (1UL << SOCK_RCVTSTAMP)			| \
@@ -2670,8 +2670,7 @@ static inline void sock_recv_cmsgs(struct msghdr *msg, struct sock *sk,
 #define TSFLAGS_ANY	  (SOF_TIMESTAMPING_SOFTWARE			| \
 			   SOF_TIMESTAMPING_RAW_HARDWARE)
 
-	if (sk->sk_flags & FLAGS_RECV_CMSGS ||
-	    READ_ONCE(sk->sk_tsflags) & TSFLAGS_ANY)
+	if (sk->sk_flags & FLAGS_RECV_CMSGS || tsflags_any)
 		__sock_recv_cmsgs(msg, sk, skb);
 	else if (unlikely(sock_flag(sk, SOCK_TIMESTAMP)))
 		sock_write_timestamp(sk, skb->tstamp);
@@ -2679,6 +2678,13 @@ static inline void sock_recv_cmsgs(struct msghdr *msg, struct sock *sk,
 		sock_write_timestamp(sk, 0);
 }
 
+static inline void sock_recv_cmsgs(struct msghdr *msg, struct sock *sk,
+				   struct sk_buff *skb)
+{
+	return sock_do_recv_cmsgs(msg, sk, skb,
+				  READ_ONCE(sk->sk_tsflags) & TSFLAGS_ANY);
+}
+
 void __sock_tx_timestamp(__u32 tsflags, __u8 *tx_flags);
 
 /**
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index a9bb9ce5438e..001bb4579330 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2084,7 +2084,7 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags,
 		UDP_INC_STATS(sock_net(sk),
 			      UDP_MIB_INDATAGRAMS, is_udplite);
 
-	sock_recv_cmsgs(msg, sk, skb);
+	sock_do_recv_cmsgs(msg, sk, skb, udp_sk(sk)->tsflags_any);
 
 	/* Copy the address. */
 	if (sin) {
@@ -3191,6 +3191,7 @@ struct proto udp_prot = {
 	.destroy		= udp_destroy_sock,
 	.setsockopt		= udp_setsockopt,
 	.getsockopt		= udp_getsockopt,
+	.tsflags		= udp_set_tsflags,
 	.sendmsg		= udp_sendmsg,
 	.recvmsg		= udp_recvmsg,
 	.splice_eof		= udp_splice_eof,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index c6ea438b5c75..28e2eb331ceb 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -532,7 +532,7 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	if (!peeking)
 		SNMP_INC_STATS(mib, UDP_MIB_INDATAGRAMS);
 
-	sock_recv_cmsgs(msg, sk, skb);
+	sock_do_recv_cmsgs(msg, sk, skb, udp_sk(sk)->tsflags_any);
 
 	/* Copy the address. */
 	if (msg->msg_name) {
@@ -1917,6 +1917,7 @@ struct proto udpv6_prot = {
 	.destroy		= udpv6_destroy_sock,
 	.setsockopt		= udpv6_setsockopt,
 	.getsockopt		= udpv6_getsockopt,
+	.tsflags		= udp_set_tsflags,
 	.sendmsg		= udpv6_sendmsg,
 	.recvmsg		= udpv6_recvmsg,
 	.splice_eof		= udpv6_splice_eof,
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags
  2025-02-07 16:23 [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags Paolo Abeni
  2025-02-07 16:23 ` [RFC PATCH 1/2] sock: introduce set_tsflags operation Paolo Abeni
  2025-02-07 16:23 ` [RFC PATCH 2/2] udp: avoid false sharing via protocol specific set_tsflags Paolo Abeni
@ 2025-02-10  4:00 ` Willem de Bruijn
  2025-02-10 15:13   ` Eric Dumazet
  2025-02-10  7:49 ` Eric Dumazet
  3 siblings, 1 reply; 14+ messages in thread
From: Willem de Bruijn @ 2025-02-10  4:00 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Willem de Bruijn, Eric Dumazet, Kuniyuki Iwashima,
	David S. Miller, Jakub Kicinski, Simon Horman, Neal Cardwell,
	David Ahern

Paolo Abeni wrote:
> While benchmarking the recently shared page frag revert, I observed a
> lot of cache misses in the UDP RX path due to false sharing between the
> sk_tsflags and the sk_forward_alloc sk fields.
> 
> Here comes a solution attempt for such a problem, inspired by commit
> f796feabb9f5 ("udp: add local "peek offset enabled" flag").
> 
> The first patch adds a new proto op allowing protocol specific operation
> on tsflags updates, and the 2nd one leverages such operation to cache
> the problematic field in a cache friendly manner.
> 
> The need for a new operation is possibly suboptimal, hence the RFC tag,
> but I could not find other good solutions. I considered:
> - moving the sk_tsflags just before 'sk_policy', in the 'sock_read_rxtx'
>   group. It arguably belongs to such group, but the change would create
>   a couple of holes, increasing the 'struct sock' size and would have 
>   side effects on other protocols
> - moving the sk_tsflags just before 'sk_stamp'; similar to the above,
>   would possibly reduce the side effects, as most of 'struct sock'
>   layout will be unchanged. Could increase the number of cacheline
>   accessed in the TX path.
> 
> I opted for the present solution as it should minimize the side effects
> to other protocols.

The code looks solid at a high level to me.

But if the issue can be adddressed by just moving a field, that is
quite appealing. So have no reviewed closely yet.

Question is which field to swap it with. Something like sk_rcvlowat is
not used in UDP. But is clearly not a write_rxtx or write_tx field.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags
  2025-02-07 16:23 [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags Paolo Abeni
                   ` (2 preceding siblings ...)
  2025-02-10  4:00 ` [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags Willem de Bruijn
@ 2025-02-10  7:49 ` Eric Dumazet
  3 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2025-02-10  7:49 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Willem de Bruijn, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Simon Horman, Neal Cardwell, David Ahern

On Fri, Feb 7, 2025 at 5:24 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> While benchmarking the recently shared page frag revert, I observed a
> lot of cache misses in the UDP RX path due to false sharing between the
> sk_tsflags and the sk_forward_alloc sk fields.
>
> Here comes a solution attempt for such a problem, inspired by commit
> f796feabb9f5 ("udp: add local "peek offset enabled" flag").
>
> The first patch adds a new proto op allowing protocol specific operation
> on tsflags updates, and the 2nd one leverages such operation to cache
> the problematic field in a cache friendly manner.
>
> The need for a new operation is possibly suboptimal, hence the RFC tag,
> but I could not find other good solutions. I considered:
> - moving the sk_tsflags just before 'sk_policy', in the 'sock_read_rxtx'
>   group. It arguably belongs to such group, but the change would create
>   a couple of holes, increasing the 'struct sock' size and would have
>   side effects on other protocols
> - moving the sk_tsflags just before 'sk_stamp'; similar to the above,
>   would possibly reduce the side effects, as most of 'struct sock'
>   layout will be unchanged. Could increase the number of cacheline
>   accessed in the TX path.
>
> I opted for the present solution as it should minimize the side effects
> to other protocols.

Hmm, thanks for the analysis. I will take a look at this today.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags
  2025-02-10  4:00 ` [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags Willem de Bruijn
@ 2025-02-10 15:13   ` Eric Dumazet
  2025-02-10 16:16     ` Paolo Abeni
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2025-02-10 15:13 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Paolo Abeni, netdev, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Simon Horman, Neal Cardwell, David Ahern

On Mon, Feb 10, 2025 at 5:00 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Paolo Abeni wrote:
> > While benchmarking the recently shared page frag revert, I observed a
> > lot of cache misses in the UDP RX path due to false sharing between the
> > sk_tsflags and the sk_forward_alloc sk fields.
> >
> > Here comes a solution attempt for such a problem, inspired by commit
> > f796feabb9f5 ("udp: add local "peek offset enabled" flag").
> >
> > The first patch adds a new proto op allowing protocol specific operation
> > on tsflags updates, and the 2nd one leverages such operation to cache
> > the problematic field in a cache friendly manner.
> >
> > The need for a new operation is possibly suboptimal, hence the RFC tag,
> > but I could not find other good solutions. I considered:
> > - moving the sk_tsflags just before 'sk_policy', in the 'sock_read_rxtx'
> >   group. It arguably belongs to such group, but the change would create
> >   a couple of holes, increasing the 'struct sock' size and would have
> >   side effects on other protocols
> > - moving the sk_tsflags just before 'sk_stamp'; similar to the above,
> >   would possibly reduce the side effects, as most of 'struct sock'
> >   layout will be unchanged. Could increase the number of cacheline
> >   accessed in the TX path.
> >
> > I opted for the present solution as it should minimize the side effects
> > to other protocols.
>
> The code looks solid at a high level to me.
>
> But if the issue can be adddressed by just moving a field, that is
> quite appealing. So have no reviewed closely yet.
>

sk_tsflags has not been put in an optimal group, I would indeed move it,
even if this creates one hole.

Holes tend to be used quite fast anyway with new fields.

Perhaps sock_read_tx group would be the best location,
because tcp_recv_timestamp() is not called in the fast path.

diff --git a/include/net/sock.h b/include/net/sock.h
index 8036b3b79cd8be64550dcfd6ce213039460acb1f..b54fbf2d9e72c3d3300e1f7638ecfbb99fdf409d
100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -444,7 +444,6 @@ struct sock {
        socket_lock_t           sk_lock;
        u32                     sk_reserved_mem;
        int                     sk_forward_alloc;
-       u32                     sk_tsflags;
        __cacheline_group_end(sock_write_rxtx);

        __cacheline_group_begin(sock_write_tx);
@@ -474,6 +473,7 @@ struct sock {
        unsigned long           sk_max_pacing_rate;
        long                    sk_sndtimeo;
        u32                     sk_priority;
+       u32                     sk_tsflags;
        u32                     sk_mark;
        struct dst_entry __rcu  *sk_dst_cache;
        netdev_features_t       sk_route_caps;
diff --git a/net/core/sock.c b/net/core/sock.c
index eae2ae70a2e03df370d8ef7750a7bb13cc3b8d8f..4f855361b6c7fa74c449bf5ea3a0e88b7c0f33fb
100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -4371,7 +4371,6 @@ static int __init sock_struct_check(void)
        CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_write_rxtx, sk_lock);
        CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_write_rxtx,
sk_reserved_mem);
        CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_write_rxtx,
sk_forward_alloc);
-       CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_write_rxtx, sk_tsflags);

        CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_write_tx,
sk_omem_alloc);
        CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_write_tx,
sk_omem_alloc);
@@ -4394,6 +4393,7 @@ static int __init sock_struct_check(void)
        CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_read_tx, sk_sndtimeo);
        CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_read_tx, sk_priority);
        CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_read_tx, sk_mark);
+       CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_read_tx, sk_tsflags);
        CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_read_tx, sk_dst_cache);
        CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_read_tx, sk_route_caps);
        CACHELINE_ASSERT_GROUP_MEMBER(struct sock, sock_read_tx, sk_gso_type);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags
  2025-02-10 15:13   ` Eric Dumazet
@ 2025-02-10 16:16     ` Paolo Abeni
  2025-02-10 16:37       ` Eric Dumazet
  2025-02-10 21:24       ` Paolo Abeni
  0 siblings, 2 replies; 14+ messages in thread
From: Paolo Abeni @ 2025-02-10 16:16 UTC (permalink / raw)
  To: Eric Dumazet, Willem de Bruijn
  Cc: netdev, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Simon Horman, Neal Cardwell, David Ahern

On 2/10/25 4:13 PM, Eric Dumazet wrote:
> On Mon, Feb 10, 2025 at 5:00 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
>>
>> Paolo Abeni wrote:
>>> While benchmarking the recently shared page frag revert, I observed a
>>> lot of cache misses in the UDP RX path due to false sharing between the
>>> sk_tsflags and the sk_forward_alloc sk fields.
>>>
>>> Here comes a solution attempt for such a problem, inspired by commit
>>> f796feabb9f5 ("udp: add local "peek offset enabled" flag").
>>>
>>> The first patch adds a new proto op allowing protocol specific operation
>>> on tsflags updates, and the 2nd one leverages such operation to cache
>>> the problematic field in a cache friendly manner.
>>>
>>> The need for a new operation is possibly suboptimal, hence the RFC tag,
>>> but I could not find other good solutions. I considered:
>>> - moving the sk_tsflags just before 'sk_policy', in the 'sock_read_rxtx'
>>>   group. It arguably belongs to such group, but the change would create
>>>   a couple of holes, increasing the 'struct sock' size and would have
>>>   side effects on other protocols
>>> - moving the sk_tsflags just before 'sk_stamp'; similar to the above,
>>>   would possibly reduce the side effects, as most of 'struct sock'
>>>   layout will be unchanged. Could increase the number of cacheline
>>>   accessed in the TX path.
>>>
>>> I opted for the present solution as it should minimize the side effects
>>> to other protocols.
>>
>> The code looks solid at a high level to me.
>>
>> But if the issue can be adddressed by just moving a field, that is
>> quite appealing. So have no reviewed closely yet.
>>
> 
> sk_tsflags has not been put in an optimal group, I would indeed move it,
> even if this creates one hole.
> 
> Holes tend to be used quite fast anyway with new fields.
> 
> Perhaps sock_read_tx group would be the best location,
> because tcp_recv_timestamp() is not called in the fast path.

Just to wrap my head on the above reasoning: for UDP such a change could
possibly increase the number of `struct sock` cache-line accessed in the
RX path (the `sock_write_tx` group should not be touched otherwise) but
that will not matter much, because we expect a low number of UDP sockets
in the system, right?

Side note: FWIW I think we will have 2 holes, 4 bytes each, one after
`sk_forward_alloc` and another one after `sk_mark`.

I missed that explicit alignment of the `tcp_sock_write_tx` group; that
will prevent the overall grow of `struct tcp_sock`, and will avoid bad
side effects while changing the struct layout.

I expect the change you propose would perform alike the RFC patches, but
I'll try to do an explicit test later (and report here the results).

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags
  2025-02-10 16:16     ` Paolo Abeni
@ 2025-02-10 16:37       ` Eric Dumazet
  2025-02-10 17:53         ` Willem de Bruijn
  2025-02-10 21:24       ` Paolo Abeni
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2025-02-10 16:37 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Willem de Bruijn, netdev, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Simon Horman, Neal Cardwell, David Ahern

On Mon, Feb 10, 2025 at 5:16 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 2/10/25 4:13 PM, Eric Dumazet wrote:
> > On Mon, Feb 10, 2025 at 5:00 AM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> >>
> >> Paolo Abeni wrote:
> >>> While benchmarking the recently shared page frag revert, I observed a
> >>> lot of cache misses in the UDP RX path due to false sharing between the
> >>> sk_tsflags and the sk_forward_alloc sk fields.
> >>>
> >>> Here comes a solution attempt for such a problem, inspired by commit
> >>> f796feabb9f5 ("udp: add local "peek offset enabled" flag").
> >>>
> >>> The first patch adds a new proto op allowing protocol specific operation
> >>> on tsflags updates, and the 2nd one leverages such operation to cache
> >>> the problematic field in a cache friendly manner.
> >>>
> >>> The need for a new operation is possibly suboptimal, hence the RFC tag,
> >>> but I could not find other good solutions. I considered:
> >>> - moving the sk_tsflags just before 'sk_policy', in the 'sock_read_rxtx'
> >>>   group. It arguably belongs to such group, but the change would create
> >>>   a couple of holes, increasing the 'struct sock' size and would have
> >>>   side effects on other protocols
> >>> - moving the sk_tsflags just before 'sk_stamp'; similar to the above,
> >>>   would possibly reduce the side effects, as most of 'struct sock'
> >>>   layout will be unchanged. Could increase the number of cacheline
> >>>   accessed in the TX path.
> >>>
> >>> I opted for the present solution as it should minimize the side effects
> >>> to other protocols.
> >>
> >> The code looks solid at a high level to me.
> >>
> >> But if the issue can be adddressed by just moving a field, that is
> >> quite appealing. So have no reviewed closely yet.
> >>
> >
> > sk_tsflags has not been put in an optimal group, I would indeed move it,
> > even if this creates one hole.
> >
> > Holes tend to be used quite fast anyway with new fields.
> >
> > Perhaps sock_read_tx group would be the best location,
> > because tcp_recv_timestamp() is not called in the fast path.
>
> Just to wrap my head on the above reasoning: for UDP such a change could
> possibly increase the number of `struct sock` cache-line accessed in the
> RX path (the `sock_write_tx` group should not be touched otherwise) but
> that will not matter much, because we expect a low number of UDP sockets
> in the system, right?

Are you referring to UDP applications needing timestamps ?

Because sk_tsflags is mostly always used in TX

We have not seen this issue because 97dc7cd92ac67f6e05 ("ptp: Support
late timestamp determination")
was not in our kernels at that time.

Perhaps we could change netdev_get_tstamp() so that we read sk->sk_tsflags
only when really needed ?

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5429581f22995bff639e6962a317adbd0ce30cff..848b70fb116421bf02159a53524a0700b87e851a
100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -5103,18 +5103,6 @@ static inline void netdev_rx_csum_fault(struct
net_device *dev,
 void net_enable_timestamp(void);
 void net_disable_timestamp(void);

-static inline ktime_t netdev_get_tstamp(struct net_device *dev,
-                                       const struct
skb_shared_hwtstamps *hwtstamps,
-                                       bool cycles)
-{
-       const struct net_device_ops *ops = dev->netdev_ops;
-
-       if (ops->ndo_get_tstamp)
-               return ops->ndo_get_tstamp(dev, hwtstamps, cycles);
-
-       return hwtstamps->hwtstamp;
-}
-
 #ifndef CONFIG_PREEMPT_RT
 static inline void netdev_xmit_set_more(bool more)
 {
diff --git a/net/socket.c b/net/socket.c
index 262a28b59c7f0f760fd29e207f270e65150abec8..6dc52c72fccd22f25c6e90d68de491863dc23689
100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -799,9 +799,22 @@ static bool skb_is_swtx_tstamp(const struct
sk_buff *skb, int false_tstamp)
        return skb->tstamp && !false_tstamp && skb_is_err_queue(skb);
 }

+static ktime_t netdev_get_tstamp(struct net_device *dev,
+                                const struct skb_shared_hwtstamps *hwtstamps,
+                                struct sock *sk)
+{
+       const struct net_device_ops *ops = dev->netdev_ops;
+
+       if (ops->ndo_get_tstamp) {
+               bool cycles = READ_ONCE(sk->sk_tsflags) &
SOF_TIMESTAMPING_BIND_PHC;
+
+               return ops->ndo_get_tstamp(dev, hwtstamps, cycles);
+       }
+       return hwtstamps->hwtstamp;
+}
+
 static ktime_t get_timestamp(struct sock *sk, struct sk_buff *skb,
int *if_index)
 {
-       bool cycles = READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_BIND_PHC;
        struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb);
        struct net_device *orig_dev;
        ktime_t hwtstamp;
@@ -810,7 +823,7 @@ static ktime_t get_timestamp(struct sock *sk,
struct sk_buff *skb, int *if_index
        orig_dev = dev_get_by_napi_id(skb_napi_id(skb));
        if (orig_dev) {
                *if_index = orig_dev->ifindex;
-               hwtstamp = netdev_get_tstamp(orig_dev, shhwtstamps, cycles);
+               hwtstamp = netdev_get_tstamp(orig_dev, shhwtstamps, sk);
        } else {
                hwtstamp = shhwtstamps->hwtstamp;
        }


>
> Side note: FWIW I think we will have 2 holes, 4 bytes each, one after
> `sk_forward_alloc` and another one after `sk_mark`.
>
> I missed that explicit alignment of the `tcp_sock_write_tx` group; that
> will prevent the overall grow of `struct tcp_sock`, and will avoid bad
> side effects while changing the struct layout.
>
> I expect the change you propose would perform alike the RFC patches, but
> I'll try to do an explicit test later (and report here the results).
>
> Thanks,
>
> Paolo
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags
  2025-02-10 16:37       ` Eric Dumazet
@ 2025-02-10 17:53         ` Willem de Bruijn
  2025-02-10 20:54           ` Paolo Abeni
  0 siblings, 1 reply; 14+ messages in thread
From: Willem de Bruijn @ 2025-02-10 17:53 UTC (permalink / raw)
  To: Eric Dumazet, Paolo Abeni
  Cc: Willem de Bruijn, netdev, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Simon Horman, Neal Cardwell, David Ahern

Eric Dumazet wrote:
> On Mon, Feb 10, 2025 at 5:16 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On 2/10/25 4:13 PM, Eric Dumazet wrote:
> > > On Mon, Feb 10, 2025 at 5:00 AM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > >>
> > >> Paolo Abeni wrote:
> > >>> While benchmarking the recently shared page frag revert, I observed a
> > >>> lot of cache misses in the UDP RX path due to false sharing between the
> > >>> sk_tsflags and the sk_forward_alloc sk fields.
> > >>>
> > >>> Here comes a solution attempt for such a problem, inspired by commit
> > >>> f796feabb9f5 ("udp: add local "peek offset enabled" flag").
> > >>>
> > >>> The first patch adds a new proto op allowing protocol specific operation
> > >>> on tsflags updates, and the 2nd one leverages such operation to cache
> > >>> the problematic field in a cache friendly manner.
> > >>>
> > >>> The need for a new operation is possibly suboptimal, hence the RFC tag,
> > >>> but I could not find other good solutions. I considered:
> > >>> - moving the sk_tsflags just before 'sk_policy', in the 'sock_read_rxtx'
> > >>>   group. It arguably belongs to such group, but the change would create
> > >>>   a couple of holes, increasing the 'struct sock' size and would have
> > >>>   side effects on other protocols
> > >>> - moving the sk_tsflags just before 'sk_stamp'; similar to the above,
> > >>>   would possibly reduce the side effects, as most of 'struct sock'
> > >>>   layout will be unchanged. Could increase the number of cacheline
> > >>>   accessed in the TX path.
> > >>>
> > >>> I opted for the present solution as it should minimize the side effects
> > >>> to other protocols.
> > >>
> > >> The code looks solid at a high level to me.
> > >>
> > >> But if the issue can be adddressed by just moving a field, that is
> > >> quite appealing. So have no reviewed closely yet.
> > >>
> > >
> > > sk_tsflags has not been put in an optimal group, I would indeed move it,
> > > even if this creates one hole.
> > >
> > > Holes tend to be used quite fast anyway with new fields.
> > >
> > > Perhaps sock_read_tx group would be the best location,
> > > because tcp_recv_timestamp() is not called in the fast path.
> >
> > Just to wrap my head on the above reasoning: for UDP such a change could
> > possibly increase the number of `struct sock` cache-line accessed in the
> > RX path (the `sock_write_tx` group should not be touched otherwise) but
> > that will not matter much, because we expect a low number of UDP sockets
> > in the system, right?
> 
> Are you referring to UDP applications needing timestamps ?
> 
> Because sk_tsflags is mostly always used in TX

I thought the issue on rx was with the test in sock_recv_cmsgs.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags
  2025-02-10 17:53         ` Willem de Bruijn
@ 2025-02-10 20:54           ` Paolo Abeni
  0 siblings, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2025-02-10 20:54 UTC (permalink / raw)
  To: Willem de Bruijn, Eric Dumazet
  Cc: netdev, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Simon Horman, Neal Cardwell, David Ahern

On 2/10/25 6:53 PM, Willem de Bruijn wrote:
> Eric Dumazet wrote:
>> On Mon, Feb 10, 2025 at 5:16 PM Paolo Abeni <pabeni@redhat.com> wrote:
>>>
>>> On 2/10/25 4:13 PM, Eric Dumazet wrote:
>>>> On Mon, Feb 10, 2025 at 5:00 AM Willem de Bruijn
>>>> <willemdebruijn.kernel@gmail.com> wrote:
>>>>>
>>>>> Paolo Abeni wrote:
>>>>>> While benchmarking the recently shared page frag revert, I observed a
>>>>>> lot of cache misses in the UDP RX path due to false sharing between the
>>>>>> sk_tsflags and the sk_forward_alloc sk fields.
>>>>>>
>>>>>> Here comes a solution attempt for such a problem, inspired by commit
>>>>>> f796feabb9f5 ("udp: add local "peek offset enabled" flag").
>>>>>>
>>>>>> The first patch adds a new proto op allowing protocol specific operation
>>>>>> on tsflags updates, and the 2nd one leverages such operation to cache
>>>>>> the problematic field in a cache friendly manner.
>>>>>>
>>>>>> The need for a new operation is possibly suboptimal, hence the RFC tag,
>>>>>> but I could not find other good solutions. I considered:
>>>>>> - moving the sk_tsflags just before 'sk_policy', in the 'sock_read_rxtx'
>>>>>>   group. It arguably belongs to such group, but the change would create
>>>>>>   a couple of holes, increasing the 'struct sock' size and would have
>>>>>>   side effects on other protocols
>>>>>> - moving the sk_tsflags just before 'sk_stamp'; similar to the above,
>>>>>>   would possibly reduce the side effects, as most of 'struct sock'
>>>>>>   layout will be unchanged. Could increase the number of cacheline
>>>>>>   accessed in the TX path.
>>>>>>
>>>>>> I opted for the present solution as it should minimize the side effects
>>>>>> to other protocols.
>>>>>
>>>>> The code looks solid at a high level to me.
>>>>>
>>>>> But if the issue can be adddressed by just moving a field, that is
>>>>> quite appealing. So have no reviewed closely yet.
>>>>>
>>>>
>>>> sk_tsflags has not been put in an optimal group, I would indeed move it,
>>>> even if this creates one hole.
>>>>
>>>> Holes tend to be used quite fast anyway with new fields.
>>>>
>>>> Perhaps sock_read_tx group would be the best location,
>>>> because tcp_recv_timestamp() is not called in the fast path.
>>>
>>> Just to wrap my head on the above reasoning: for UDP such a change could
>>> possibly increase the number of `struct sock` cache-line accessed in the
>>> RX path (the `sock_write_tx` group should not be touched otherwise) but
>>> that will not matter much, because we expect a low number of UDP sockets
>>> in the system, right?
>>
>> Are you referring to UDP applications needing timestamps ?
>>
>> Because sk_tsflags is mostly always used in TX
> 
> I thought the issue on rx was with the test in sock_recv_cmsgs.

Yes, the critical bits in my tests are in sock_recv_cmsgs(): such
function touches sk->sk_tsflags unconditionally on each recvmsg call.

/P


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags
  2025-02-10 16:16     ` Paolo Abeni
  2025-02-10 16:37       ` Eric Dumazet
@ 2025-02-10 21:24       ` Paolo Abeni
  2025-02-10 21:26         ` Eric Dumazet
  2025-02-10 21:33         ` Eric Dumazet
  1 sibling, 2 replies; 14+ messages in thread
From: Paolo Abeni @ 2025-02-10 21:24 UTC (permalink / raw)
  To: Eric Dumazet, Willem de Bruijn
  Cc: netdev, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Simon Horman, Neal Cardwell, David Ahern

On 2/10/25 5:16 PM, Paolo Abeni wrote:
> I expect the change you propose would perform alike the RFC patches, but
> I'll try to do an explicit test later (and report here the results).

I ran my test on the sock layout change, and it gave the same (good)
results as the RFC. Note that such test uses a single socket receiver,
so it's not affected in any way by the eventual increase of touched
'struct sock' cachelines.

BTW it just occurred to me that if we could use another bit from
sk_flags, something alike the following (completely untested!!!) would
do, without changing the struct sock layout and without adding other
sock proto ops:

---
diff --git a/include/net/sock.h b/include/net/sock.h
index 8036b3b79cd8..a526db7f5c60 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -954,6 +954,7 @@ enum sock_flags {
 	SOCK_TSTAMP_NEW, /* Indicates 64 bit timestamps always */
 	SOCK_RCVMARK, /* Receive SO_MARK  ancillary data with packet */
 	SOCK_RCVPRIORITY, /* Receive SO_PRIORITY ancillary data with packet */
+	SOCK_TIMESTAMPING_ANY, /* sk_tsflags & TSFLAGS_ANY */
 };

 #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL <<
SOCK_TIMESTAMPING_RX_SOFTWARE))
@@ -2665,12 +2666,12 @@ static inline void sock_recv_cmsgs(struct msghdr
*msg, struct sock *sk,
 #define FLAGS_RECV_CMSGS ((1UL << SOCK_RXQ_OVFL)			| \
 			   (1UL << SOCK_RCVTSTAMP)			| \
 			   (1UL << SOCK_RCVMARK)			|\
-			   (1UL << SOCK_RCVPRIORITY))
+			   (1UL << SOCK_RCVPRIORITY)			|\
+			   (1UL << SOCK_TIMESTAMPING_ANY))
 #define TSFLAGS_ANY	  (SOF_TIMESTAMPING_SOFTWARE			| \
 			   SOF_TIMESTAMPING_RAW_HARDWARE)

-	if (sk->sk_flags & FLAGS_RECV_CMSGS ||
-	    READ_ONCE(sk->sk_tsflags) & TSFLAGS_ANY)
+	if (sk->sk_flags & FLAGS_RECV_CMSGS)
 		__sock_recv_cmsgs(msg, sk, skb);
 	else if (unlikely(sock_flag(sk, SOCK_TIMESTAMP)))
 		sock_write_timestamp(sk, skb->tstamp);
diff --git a/net/core/sock.c b/net/core/sock.c
index eae2ae70a2e0..a197f0a0b878 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -938,6 +938,7 @@ int sock_set_timestamping(struct sock *sk, int optname,

 	WRITE_ONCE(sk->sk_tsflags, val);
 	sock_valbool_flag(sk, SOCK_TSTAMP_NEW, optname == SO_TIMESTAMPING_NEW);
+	sock_valbool_flag(sk, SOCK_TIMESTAMPING_ANY, !!(val & TSFLAGS_ANY));

 	if (val & SOF_TIMESTAMPING_RX_SOFTWARE)
 		sock_enable_timestamp(sk,

Cheers,

Paolo


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags
  2025-02-10 21:24       ` Paolo Abeni
@ 2025-02-10 21:26         ` Eric Dumazet
  2025-02-11  3:16           ` Willem de Bruijn
  2025-02-10 21:33         ` Eric Dumazet
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2025-02-10 21:26 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Willem de Bruijn, netdev, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Simon Horman, Neal Cardwell, David Ahern

On Mon, Feb 10, 2025 at 10:24 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 2/10/25 5:16 PM, Paolo Abeni wrote:
> > I expect the change you propose would perform alike the RFC patches, but
> > I'll try to do an explicit test later (and report here the results).
>
> I ran my test on the sock layout change, and it gave the same (good)
> results as the RFC. Note that such test uses a single socket receiver,
> so it's not affected in any way by the eventual increase of touched
> 'struct sock' cachelines.
>
> BTW it just occurred to me that if we could use another bit from
> sk_flags, something alike the following (completely untested!!!) would
> do, without changing the struct sock layout and without adding other
> sock proto ops:
>
> ---
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 8036b3b79cd8..a526db7f5c60 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -954,6 +954,7 @@ enum sock_flags {
>         SOCK_TSTAMP_NEW, /* Indicates 64 bit timestamps always */
>         SOCK_RCVMARK, /* Receive SO_MARK  ancillary data with packet */
>         SOCK_RCVPRIORITY, /* Receive SO_PRIORITY ancillary data with packet */
> +       SOCK_TIMESTAMPING_ANY, /* sk_tsflags & TSFLAGS_ANY */
>  };
>
>  #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL <<
> SOCK_TIMESTAMPING_RX_SOFTWARE))
> @@ -2665,12 +2666,12 @@ static inline void sock_recv_cmsgs(struct msghdr
> *msg, struct sock *sk,
>  #define FLAGS_RECV_CMSGS ((1UL << SOCK_RXQ_OVFL)                       | \
>                            (1UL << SOCK_RCVTSTAMP)                      | \
>                            (1UL << SOCK_RCVMARK)                        |\
> -                          (1UL << SOCK_RCVPRIORITY))
> +                          (1UL << SOCK_RCVPRIORITY)                    |\
> +                          (1UL << SOCK_TIMESTAMPING_ANY))
>  #define TSFLAGS_ANY      (SOF_TIMESTAMPING_SOFTWARE                    | \
>                            SOF_TIMESTAMPING_RAW_HARDWARE)
>
> -       if (sk->sk_flags & FLAGS_RECV_CMSGS ||
> -           READ_ONCE(sk->sk_tsflags) & TSFLAGS_ANY)
> +       if (sk->sk_flags & FLAGS_RECV_CMSGS)
>                 __sock_recv_cmsgs(msg, sk, skb);
>         else if (unlikely(sock_flag(sk, SOCK_TIMESTAMP)))
>                 sock_write_timestamp(sk, skb->tstamp);
> diff --git a/net/core/sock.c b/net/core/sock.c
> index eae2ae70a2e0..a197f0a0b878 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -938,6 +938,7 @@ int sock_set_timestamping(struct sock *sk, int optname,
>
>         WRITE_ONCE(sk->sk_tsflags, val);
>         sock_valbool_flag(sk, SOCK_TSTAMP_NEW, optname == SO_TIMESTAMPING_NEW);
> +       sock_valbool_flag(sk, SOCK_TIMESTAMPING_ANY, !!(val & TSFLAGS_ANY));
>
>         if (val & SOF_TIMESTAMPING_RX_SOFTWARE)
>                 sock_enable_timestamp(sk,

This looks nice indeed.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags
  2025-02-10 21:24       ` Paolo Abeni
  2025-02-10 21:26         ` Eric Dumazet
@ 2025-02-10 21:33         ` Eric Dumazet
  1 sibling, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2025-02-10 21:33 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Willem de Bruijn, netdev, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Simon Horman, Neal Cardwell, David Ahern

On Mon, Feb 10, 2025 at 10:24 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 2/10/25 5:16 PM, Paolo Abeni wrote:
> > I expect the change you propose would perform alike the RFC patches, but
> > I'll try to do an explicit test later (and report here the results).
>
> I ran my test on the sock layout change, and it gave the same (good)
> results as the RFC. Note that such test uses a single socket receiver,
> so it's not affected in any way by the eventual increase of touched
> 'struct sock' cachelines.
>
> BTW it just occurred to me that if we could use another bit from
> sk_flags, something alike the following (completely untested!!!) would
> do, without changing the struct sock layout and without adding other
> sock proto ops:
>
> ---
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 8036b3b79cd8..a526db7f5c60 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -954,6 +954,7 @@ enum sock_flags {
>         SOCK_TSTAMP_NEW, /* Indicates 64 bit timestamps always */
>         SOCK_RCVMARK, /* Receive SO_MARK  ancillary data with packet */
>         SOCK_RCVPRIORITY, /* Receive SO_PRIORITY ancillary data with packet */
> +       SOCK_TIMESTAMPING_ANY, /* sk_tsflags & TSFLAGS_ANY */
>  };
>
>  #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL <<
> SOCK_TIMESTAMPING_RX_SOFTWARE))
> @@ -2665,12 +2666,12 @@ static inline void sock_recv_cmsgs(struct msghdr
> *msg, struct sock *sk,
>  #define FLAGS_RECV_CMSGS ((1UL << SOCK_RXQ_OVFL)                       | \
>                            (1UL << SOCK_RCVTSTAMP)                      | \
>                            (1UL << SOCK_RCVMARK)                        |\
> -                          (1UL << SOCK_RCVPRIORITY))
> +                          (1UL << SOCK_RCVPRIORITY)                    |\
> +                          (1UL << SOCK_TIMESTAMPING_ANY))
>  #define TSFLAGS_ANY      (SOF_TIMESTAMPING_SOFTWARE                    | \
>                            SOF_TIMESTAMPING_RAW_HARDWARE)
>
> -       if (sk->sk_flags & FLAGS_RECV_CMSGS ||
> -           READ_ONCE(sk->sk_tsflags) & TSFLAGS_ANY)
> +       if (sk->sk_flags & FLAGS_RECV_CMSGS)

BTW we should READ_ONCE(sk->sk_flags) ...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags
  2025-02-10 21:26         ` Eric Dumazet
@ 2025-02-11  3:16           ` Willem de Bruijn
  0 siblings, 0 replies; 14+ messages in thread
From: Willem de Bruijn @ 2025-02-11  3:16 UTC (permalink / raw)
  To: Eric Dumazet, Paolo Abeni
  Cc: Willem de Bruijn, netdev, Kuniyuki Iwashima, David S. Miller,
	Jakub Kicinski, Simon Horman, Neal Cardwell, David Ahern

Eric Dumazet wrote:
> On Mon, Feb 10, 2025 at 10:24 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On 2/10/25 5:16 PM, Paolo Abeni wrote:
> > > I expect the change you propose would perform alike the RFC patches, but
> > > I'll try to do an explicit test later (and report here the results).
> >
> > I ran my test on the sock layout change, and it gave the same (good)
> > results as the RFC. Note that such test uses a single socket receiver,
> > so it's not affected in any way by the eventual increase of touched
> > 'struct sock' cachelines.
> >
> > BTW it just occurred to me that if we could use another bit from
> > sk_flags, something alike the following (completely untested!!!) would
> > do, without changing the struct sock layout and without adding other
> > sock proto ops:
> >
> > ---
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 8036b3b79cd8..a526db7f5c60 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -954,6 +954,7 @@ enum sock_flags {
> >         SOCK_TSTAMP_NEW, /* Indicates 64 bit timestamps always */
> >         SOCK_RCVMARK, /* Receive SO_MARK  ancillary data with packet */
> >         SOCK_RCVPRIORITY, /* Receive SO_PRIORITY ancillary data with packet */
> > +       SOCK_TIMESTAMPING_ANY, /* sk_tsflags & TSFLAGS_ANY */
> >  };
> >
> >  #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL <<
> > SOCK_TIMESTAMPING_RX_SOFTWARE))
> > @@ -2665,12 +2666,12 @@ static inline void sock_recv_cmsgs(struct msghdr
> > *msg, struct sock *sk,
> >  #define FLAGS_RECV_CMSGS ((1UL << SOCK_RXQ_OVFL)                       | \
> >                            (1UL << SOCK_RCVTSTAMP)                      | \
> >                            (1UL << SOCK_RCVMARK)                        |\
> > -                          (1UL << SOCK_RCVPRIORITY))
> > +                          (1UL << SOCK_RCVPRIORITY)                    |\
> > +                          (1UL << SOCK_TIMESTAMPING_ANY))
> >  #define TSFLAGS_ANY      (SOF_TIMESTAMPING_SOFTWARE                    | \
> >                            SOF_TIMESTAMPING_RAW_HARDWARE)
> >
> > -       if (sk->sk_flags & FLAGS_RECV_CMSGS ||
> > -           READ_ONCE(sk->sk_tsflags) & TSFLAGS_ANY)
> > +       if (sk->sk_flags & FLAGS_RECV_CMSGS)
> >                 __sock_recv_cmsgs(msg, sk, skb);
> >         else if (unlikely(sock_flag(sk, SOCK_TIMESTAMP)))
> >                 sock_write_timestamp(sk, skb->tstamp);
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index eae2ae70a2e0..a197f0a0b878 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -938,6 +938,7 @@ int sock_set_timestamping(struct sock *sk, int optname,
> >
> >         WRITE_ONCE(sk->sk_tsflags, val);
> >         sock_valbool_flag(sk, SOCK_TSTAMP_NEW, optname == SO_TIMESTAMPING_NEW);
> > +       sock_valbool_flag(sk, SOCK_TIMESTAMPING_ANY, !!(val & TSFLAGS_ANY));
> >
> >         if (val & SOF_TIMESTAMPING_RX_SOFTWARE)
> >                 sock_enable_timestamp(sk,
> 
> This looks nice indeed.

+1

I thought we could refine to use RX bits rather than TSFLAGS_ANY.
But not trivial, and today already uses TSFLAGS_ANY so out of scope
for this patch. sock_recv_timestamp has weird behavior: returning a
timestamp also if SOF_TIMESTAMPING_SOFTWARE is set without an RX flag.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-02-11  3:16 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-07 16:23 [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags Paolo Abeni
2025-02-07 16:23 ` [RFC PATCH 1/2] sock: introduce set_tsflags operation Paolo Abeni
2025-02-07 16:23 ` [RFC PATCH 2/2] udp: avoid false sharing via protocol specific set_tsflags Paolo Abeni
2025-02-10  4:00 ` [RFC PATCH 0/2] udp: avoid false sharing on sk_tsflags Willem de Bruijn
2025-02-10 15:13   ` Eric Dumazet
2025-02-10 16:16     ` Paolo Abeni
2025-02-10 16:37       ` Eric Dumazet
2025-02-10 17:53         ` Willem de Bruijn
2025-02-10 20:54           ` Paolo Abeni
2025-02-10 21:24       ` Paolo Abeni
2025-02-10 21:26         ` Eric Dumazet
2025-02-11  3:16           ` Willem de Bruijn
2025-02-10 21:33         ` Eric Dumazet
2025-02-10  7:49 ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).