* [PATCH net-next 0/2] udp: avoid false sharing on receive
@ 2022-10-19 10:01 Paolo Abeni
2022-10-19 10:02 ` [PATCH net-next 1/2] net: introduce and use custom sockopt socket flag Paolo Abeni
2022-10-19 10:02 ` [PATCH net-next 2/2] udp: track the forward memory release threshold in a hot cacheline Paolo Abeni
0 siblings, 2 replies; 7+ messages in thread
From: Paolo Abeni @ 2022-10-19 10:01 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, mptcp, David Ahern,
Mat Martineau, Matthieu Baerts
Under high UDP load, the BH processing and the user-space receiver can
run on different cores.
The UDP implementation puts a lot of effort into avoiding false sharing
in the receive path, but recent changes to the struct sock layout moved
the sk_forward_alloc and sk_rcvbuf fields onto the same cacheline:
	/* --- cacheline 4 boundary (256 bytes) --- */
		struct sk_buff *   tail;
	} sk_backlog;
	int                sk_forward_alloc;
	unsigned int       sk_reserved_mem;
	unsigned int       sk_ll_usec;
	unsigned int       sk_napi_id;
	int                sk_rcvbuf;
sk_forward_alloc is updated by the BH, while sk_rcvbuf is accessed by
udp_recvmsg(), causing false sharing.
A possible solution would be to re-order the struct sock fields to
avoid the false sharing. Such a change would be prone to invalidation
by future layout changes and could have negative side effects on other
workloads.
Instead, this series takes a different approach, touching only the UDP
socket layout.
The first patch generalizes the custom setsockopt infrastructure so
that UDP can track the buffer size, and the second patch addresses the
issue by copying the relevant buffer information into an already-hot
cacheline.
Overall the above gives a 10% peak throughput increase under UDP flood.
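For reference, the user-visible interface is unchanged: the receive
buffer is still sized with a plain SO_RCVBUF setsockopt, the series only
changes how the derived threshold is cached inside the kernel. A minimal
userspace sketch, illustrative only and not part of the series:

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int val = 1 << 20;            /* request a 1 MiB receive buffer */
	socklen_t len = sizeof(val);

	/* With this series, the UDP code refreshes its internal
	 * forward-release threshold (sk_rcvbuf >> 2) on this path.
	 */
	setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, sizeof(val));
	getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len);
	printf("effective rcvbuf: %d bytes\n", val); /* the kernel roughly doubles the request */
	return 0;
}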
Paolo Abeni (2):
net: introduce and use custom sockopt socket flag
udp: track the forward memory release threshold in a hot cacheline
include/linux/net.h | 1 +
include/linux/udp.h | 3 +++
net/ipv4/udp.c | 22 +++++++++++++++++++---
net/ipv6/udp.c | 8 ++++++--
net/mptcp/protocol.c | 4 ++++
net/socket.c | 8 +-------
6 files changed, 34 insertions(+), 12 deletions(-)
--
2.37.3
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH net-next 1/2] net: introduce and use custom sockopt socket flag
2022-10-19 10:01 [PATCH net-next 0/2] udp: avoid false sharing on receive Paolo Abeni
@ 2022-10-19 10:02 ` Paolo Abeni
2022-10-19 15:16 ` Matthieu Baerts
2022-10-19 10:02 ` [PATCH net-next 2/2] udp: track the forward memory release threshold in a hot cacheline Paolo Abeni
1 sibling, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2022-10-19 10:02 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, mptcp, David Ahern,
Mat Martineau, Matthieu Baerts
We will soon introduce custom setsockopt for UDP sockets, too.
Instead of doing even more complex arbitrary checks inside
sock_use_custom_sol_socket(), add a new socket flag and set it
for the relevant socket types (currently only MPTCP).
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
include/linux/net.h | 1 +
net/mptcp/protocol.c | 4 ++++
net/socket.c | 8 +-------
3 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/include/linux/net.h b/include/linux/net.h
index 711c3593c3b8..59350fd85823 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -41,6 +41,7 @@ struct net;
#define SOCK_NOSPACE 2
#define SOCK_PASSCRED 3
#define SOCK_PASSSEC 4
+#define SOCK_CUSTOM_SOCKOPT 5
#ifndef ARCH_HAS_SOCKET_TYPES
/**
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f599ad44ed24..0448a5c3da3c 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2708,6 +2708,8 @@ static int mptcp_init_sock(struct sock *sk)
if (ret)
return ret;
+ set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags);
+
/* fetch the ca name; do it outside __mptcp_init_sock(), so that clone will
* propagate the correct value
*/
@@ -3684,6 +3686,8 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
struct mptcp_subflow_context *subflow;
struct sock *newsk = newsock->sk;
+ set_bit(SOCK_CUSTOM_SOCKOPT, &newsock->flags);
+
lock_sock(newsk);
/* PM/worker can now acquire the first subflow socket
diff --git a/net/socket.c b/net/socket.c
index 00da9ce3dba0..55c5d536e5f6 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2199,13 +2199,7 @@ SYSCALL_DEFINE4(recv, int, fd, void __user *, ubuf, size_t, size,
static bool sock_use_custom_sol_socket(const struct socket *sock)
{
- const struct sock *sk = sock->sk;
-
- /* Use sock->ops->setsockopt() for MPTCP */
- return IS_ENABLED(CONFIG_MPTCP) &&
- sk->sk_protocol == IPPROTO_MPTCP &&
- sk->sk_type == SOCK_STREAM &&
- (sk->sk_family == AF_INET || sk->sk_family == AF_INET6);
+ return test_bit(SOCK_CUSTOM_SOCKOPT, &sock->flags);
}
/*
--
2.37.3
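To illustrate the intended usage: any other protocol that wants to see
SOL_SOCKET options in its own setsockopt handler would opt in the same
way as the MPTCP hunks above (and as UDP does in the next patch). A
hypothetical sketch, with the foo_* names invented for illustration:

/* Hypothetical example, not part of this series: opting a protocol in to
 * SOCK_CUSTOM_SOCKOPT so that SOL_SOCKET options reach its own setsockopt()
 * instead of being handled directly in net/socket.c.
 */
static int foo_init_sock(struct sock *sk)
{
	set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags);
	return 0;
}

static int foo_setsockopt(struct sock *sk, int level, int optname,
			  sockptr_t optval, unsigned int optlen)
{
	if (level == SOL_SOCKET)
		/* apply the generic handling, then any custom tweak */
		return sk_setsockopt(sk, level, optname, optval, optlen);

	return -ENOPROTOOPT;
}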
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH net-next 2/2] udp: track the forward memory release threshold in a hot cacheline
2022-10-19 10:01 [PATCH net-next 0/2] udp: avoid false sharing on receive Paolo Abeni
2022-10-19 10:02 ` [PATCH net-next 1/2] net: introduce and use custom sockopt socket flag Paolo Abeni
@ 2022-10-19 10:02 ` Paolo Abeni
2022-10-19 16:33 ` Kuniyuki Iwashima
1 sibling, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2022-10-19 10:02 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, mptcp, David Ahern,
Mat Martineau, Matthieu Baerts
When the receiver process and the BH run on different cores,
udp_rmem_release() experiences a cache miss while accessing sk_rcvbuf,
as the latter shares the same cacheline with sk_forward_alloc, which is
written by the BH.
With this patch, UDP tracks the rcvbuf value and its updates via the
custom SOL_SOCKET socket option handling, and copies the forward memory
threshold value used by udp_rmem_release() into a different cacheline,
one that is already accessed by that function and uncontended.
Overall the above gives a 10% peak throughput increase under UDP flood.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
include/linux/udp.h | 3 +++
net/ipv4/udp.c | 22 +++++++++++++++++++---
net/ipv6/udp.c | 8 ++++++--
3 files changed, 28 insertions(+), 5 deletions(-)
diff --git a/include/linux/udp.h b/include/linux/udp.h
index e96da4157d04..5cdba00a904a 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -87,6 +87,9 @@ struct udp_sock {
/* This field is dirtied by udp_recvmsg() */
int forward_deficit;
+
+ /* This fields follows rcvbuf value, and is touched by udp_recvmsg */
+ int forward_threshold;
};
#define UDP_MAX_SEGMENTS (1 << 6UL)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 8126f67d18b3..915f573587fa 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1448,7 +1448,7 @@ static void udp_rmem_release(struct sock *sk, int size, int partial,
if (likely(partial)) {
up->forward_deficit += size;
size = up->forward_deficit;
- if (size < (sk->sk_rcvbuf >> 2) &&
+ if (size < READ_ONCE(up->forward_threshold) &&
!skb_queue_empty(&up->reader_queue))
return;
} else {
@@ -1622,8 +1622,12 @@ static void udp_destruct_sock(struct sock *sk)
int udp_init_sock(struct sock *sk)
{
- skb_queue_head_init(&udp_sk(sk)->reader_queue);
+ struct udp_sock *up = udp_sk(sk);
+
+ skb_queue_head_init(&up->reader_queue);
+ up->forward_threshold = sk->sk_rcvbuf >> 2;
sk->sk_destruct = udp_destruct_sock;
+ set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags);
return 0;
}
@@ -2671,6 +2675,18 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
int err = 0;
int is_udplite = IS_UDPLITE(sk);
+ if (level == SOL_SOCKET) {
+ err = sk_setsockopt(sk, level, optname, optval, optlen);
+
+ if (optname == SO_RCVBUF || optname == SO_RCVBUFFORCE) {
+ sockopt_lock_sock(sk);
+ /* paired with READ_ONCE in udp_rmem_release() */
+ WRITE_ONCE(up->forward_threshold, sk->sk_rcvbuf >> 2);
+ sockopt_release_sock(sk);
+ }
+ return err;
+ }
+
if (optlen < sizeof(int))
return -EINVAL;
@@ -2784,7 +2800,7 @@ EXPORT_SYMBOL(udp_lib_setsockopt);
int udp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
unsigned int optlen)
{
- if (level == SOL_UDP || level == SOL_UDPLITE)
+ if (level == SOL_UDP || level == SOL_UDPLITE || level == SOL_SOCKET)
return udp_lib_setsockopt(sk, level, optname,
optval, optlen,
udp_push_pending_frames);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 8d09f0ea5b8c..1ed20bcfd7a0 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -64,8 +64,12 @@ static void udpv6_destruct_sock(struct sock *sk)
int udpv6_init_sock(struct sock *sk)
{
- skb_queue_head_init(&udp_sk(sk)->reader_queue);
+ struct udp_sock *up = udp_sk(sk);
+
+ skb_queue_head_init(&up->reader_queue);
+ up->forward_threshold = sk->sk_rcvbuf >> 2;
sk->sk_destruct = udpv6_destruct_sock;
+ set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags);
return 0;
}
@@ -1671,7 +1675,7 @@ void udpv6_destroy_sock(struct sock *sk)
int udpv6_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
unsigned int optlen)
{
- if (level == SOL_UDP || level == SOL_UDPLITE)
+ if (level == SOL_UDP || level == SOL_UDPLITE || level == SOL_SOCKET)
return udp_lib_setsockopt(sk, level, optname,
optval, optlen,
udp_v6_push_pending_frames);
--
2.37.3
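As a worked example of the new field (illustrative numbers, not from the
patch): with the usual SO_RCVBUF doubling described in socket(7), a
256 KiB request gives roughly:

/* Illustrative arithmetic only. */
int requested = 256 * 1024;           /* SO_RCVBUF value from user space   */
int rcvbuf = requested * 2;           /* ~512 KiB ends up in sk->sk_rcvbuf */
int forward_threshold = rcvbuf >> 2;  /* 128 KiB: udp_rmem_release() keeps
                                       * deferring the forward-memory
                                       * release until the deficit reaches
                                       * roughly this value (while the
                                       * reader queue is non-empty).       */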
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net-next 1/2] net: introduce and use custom sockopt socket flag
2022-10-19 10:02 ` [PATCH net-next 1/2] net: introduce and use custom sockopt socket flag Paolo Abeni
@ 2022-10-19 15:16 ` Matthieu Baerts
0 siblings, 0 replies; 7+ messages in thread
From: Matthieu Baerts @ 2022-10-19 15:16 UTC (permalink / raw)
To: Paolo Abeni, netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, mptcp, David Ahern,
Mat Martineau
Hi Paolo,
On 19/10/2022 12:02, Paolo Abeni wrote:
> We will soon introduce custom setsockopt for UDP sockets, too.
> Instead of doing even more complex arbitrary checks inside
> sock_use_custom_sol_socket(), add a new socket flag and set it
> for the relevant socket types (currently only MPTCP).
Good idea, it is clearer and it looks good to me!
For the modifications in MPTCP:
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH net-next 2/2] udp: track the forward memory release threshold in a hot cacheline
2022-10-19 10:02 ` [PATCH net-next 2/2] udp: track the forward memory release threshold in a hot cacheline Paolo Abeni
@ 2022-10-19 16:33 ` Kuniyuki Iwashima
2022-10-19 16:58 ` Paolo Abeni
0 siblings, 1 reply; 7+ messages in thread
From: Kuniyuki Iwashima @ 2022-10-19 16:33 UTC (permalink / raw)
To: pabeni
Cc: davem, dsahern, edumazet, kuba, mathew.j.martineau,
matthieu.baerts, mptcp, netdev
From: Paolo Abeni <pabeni@redhat.com>
Date: Wed, 19 Oct 2022 12:02:01 +0200
> When the receiver process and the BH run on different cores,
> udp_rmem_release() experiences a cache miss while accessing sk_rcvbuf,
> as the latter shares the same cacheline with sk_forward_alloc, which is
> written by the BH.
>
> With this patch, UDP tracks the rcvbuf value and its updates via the
> custom SOL_SOCKET socket option handling, and copies the forward memory
> threshold value used by udp_rmem_release() into a different cacheline,
> one that is already accessed by that function and uncontended.
>
> Overall the above gives a 10% peak throughput increase under UDP flood.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> include/linux/udp.h | 3 +++
> net/ipv4/udp.c | 22 +++++++++++++++++++---
> net/ipv6/udp.c | 8 ++++++--
> 3 files changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/udp.h b/include/linux/udp.h
> index e96da4157d04..5cdba00a904a 100644
> --- a/include/linux/udp.h
> +++ b/include/linux/udp.h
> @@ -87,6 +87,9 @@ struct udp_sock {
>
> /* This field is dirtied by udp_recvmsg() */
> int forward_deficit;
> +
> + /* This fields follows rcvbuf value, and is touched by udp_recvmsg */
> + int forward_threshold;
> };
>
> #define UDP_MAX_SEGMENTS (1 << 6UL)
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 8126f67d18b3..915f573587fa 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1448,7 +1448,7 @@ static void udp_rmem_release(struct sock *sk, int size, int partial,
> if (likely(partial)) {
> up->forward_deficit += size;
> size = up->forward_deficit;
> - if (size < (sk->sk_rcvbuf >> 2) &&
> + if (size < READ_ONCE(up->forward_threshold) &&
> !skb_queue_empty(&up->reader_queue))
> return;
> } else {
> @@ -1622,8 +1622,12 @@ static void udp_destruct_sock(struct sock *sk)
>
> int udp_init_sock(struct sock *sk)
> {
> - skb_queue_head_init(&udp_sk(sk)->reader_queue);
> + struct udp_sock *up = udp_sk(sk);
> +
> + skb_queue_head_init(&up->reader_queue);
> + up->forward_threshold = sk->sk_rcvbuf >> 2;
> sk->sk_destruct = udp_destruct_sock;
> + set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags);
> return 0;
> }
>
> @@ -2671,6 +2675,18 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
> int err = 0;
> int is_udplite = IS_UDPLITE(sk);
>
> + if (level == SOL_SOCKET) {
> + err = sk_setsockopt(sk, level, optname, optval, optlen);
> +
> + if (optname == SO_RCVBUF || optname == SO_RCVBUFFORCE) {
> + sockopt_lock_sock(sk);
Can we drop this lock by adding READ_ONCE() to sk->sk_rcvbuf below ?
> + /* paired with READ_ONCE in udp_rmem_release() */
> + WRITE_ONCE(up->forward_threshold, sk->sk_rcvbuf >> 2);
> + sockopt_release_sock(sk);
> + }
> + return err;
> + }
> +
> if (optlen < sizeof(int))
> return -EINVAL;
>
> @@ -2784,7 +2800,7 @@ EXPORT_SYMBOL(udp_lib_setsockopt);
> int udp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
> unsigned int optlen)
> {
> - if (level == SOL_UDP || level == SOL_UDPLITE)
> + if (level == SOL_UDP || level == SOL_UDPLITE || level == SOL_SOCKET)
> return udp_lib_setsockopt(sk, level, optname,
> optval, optlen,
> udp_push_pending_frames);
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 8d09f0ea5b8c..1ed20bcfd7a0 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -64,8 +64,12 @@ static void udpv6_destruct_sock(struct sock *sk)
>
> int udpv6_init_sock(struct sock *sk)
> {
> - skb_queue_head_init(&udp_sk(sk)->reader_queue);
> + struct udp_sock *up = udp_sk(sk);
> +
> + skb_queue_head_init(&up->reader_queue);
> + up->forward_threshold = sk->sk_rcvbuf >> 2;
> sk->sk_destruct = udpv6_destruct_sock;
> + set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags);
> return 0;
> }
It's time to factorise this part like udp_destruct_common() ?
>
> @@ -1671,7 +1675,7 @@ void udpv6_destroy_sock(struct sock *sk)
> int udpv6_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
> unsigned int optlen)
> {
> - if (level == SOL_UDP || level == SOL_UDPLITE)
> + if (level == SOL_UDP || level == SOL_UDPLITE || level == SOL_SOCKET)
> return udp_lib_setsockopt(sk, level, optname,
> optval, optlen,
> udp_v6_push_pending_frames);
> --
> 2.37.3
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net-next 2/2] udp: track the forward memory release threshold in a hot cacheline
2022-10-19 16:33 ` Kuniyuki Iwashima
@ 2022-10-19 16:58 ` Paolo Abeni
2022-10-19 17:09 ` Kuniyuki Iwashima
0 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2022-10-19 16:58 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: davem, dsahern, edumazet, kuba, mathew.j.martineau,
matthieu.baerts, mptcp, netdev
On Wed, 2022-10-19 at 09:33 -0700, Kuniyuki Iwashima wrote:
> From: Paolo Abeni <pabeni@redhat.com>
> Date: Wed, 19 Oct 2022 12:02:01 +0200
> > When the receiver process and the BH run on different cores,
> > udp_rmem_release() experiences a cache miss while accessing sk_rcvbuf,
> > as the latter shares the same cacheline with sk_forward_alloc, which is
> > written by the BH.
> >
> > With this patch, UDP tracks the rcvbuf value and its updates via the
> > custom SOL_SOCKET socket option handling, and copies the forward memory
> > threshold value used by udp_rmem_release() into a different cacheline,
> > one that is already accessed by that function and uncontended.
> >
> > Overall the above gives a 10% peak throughput increase under UDP flood.
> >
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > include/linux/udp.h | 3 +++
> > net/ipv4/udp.c | 22 +++++++++++++++++++---
> > net/ipv6/udp.c | 8 ++++++--
> > 3 files changed, 28 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/linux/udp.h b/include/linux/udp.h
> > index e96da4157d04..5cdba00a904a 100644
> > --- a/include/linux/udp.h
> > +++ b/include/linux/udp.h
> > @@ -87,6 +87,9 @@ struct udp_sock {
> >
> > /* This field is dirtied by udp_recvmsg() */
> > int forward_deficit;
> > +
> > + /* This fields follows rcvbuf value, and is touched by udp_recvmsg */
> > + int forward_threshold;
> > };
> >
> > #define UDP_MAX_SEGMENTS (1 << 6UL)
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index 8126f67d18b3..915f573587fa 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -1448,7 +1448,7 @@ static void udp_rmem_release(struct sock *sk, int size, int partial,
> > if (likely(partial)) {
> > up->forward_deficit += size;
> > size = up->forward_deficit;
> > - if (size < (sk->sk_rcvbuf >> 2) &&
> > + if (size < READ_ONCE(up->forward_threshold) &&
> > !skb_queue_empty(&up->reader_queue))
> > return;
> > } else {
> > @@ -1622,8 +1622,12 @@ static void udp_destruct_sock(struct sock *sk)
> >
> > int udp_init_sock(struct sock *sk)
> > {
> > - skb_queue_head_init(&udp_sk(sk)->reader_queue);
> > + struct udp_sock *up = udp_sk(sk);
> > +
> > + skb_queue_head_init(&up->reader_queue);
> > + up->forward_threshold = sk->sk_rcvbuf >> 2;
> > sk->sk_destruct = udp_destruct_sock;
> > + set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags);
> > return 0;
> > }
> >
> > @@ -2671,6 +2675,18 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
> > int err = 0;
> > int is_udplite = IS_UDPLITE(sk);
> >
> > + if (level == SOL_SOCKET) {
> > + err = sk_setsockopt(sk, level, optname, optval, optlen);
> > +
> > + if (optname == SO_RCVBUF || optname == SO_RCVBUFFORCE) {
> > + sockopt_lock_sock(sk);
>
> Can we drop this lock by adding READ_ONCE() to sk->sk_rcvbuf below ?
I think we can't. If there are racing threads updating rcvbuf, we could
end up with a mismatching value in forward_threshold. Not a likely
scenario, but still... This is the control path; acquiring the lock once
more should not be a problem.
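To spell out the race the lock avoids (illustrative interleaving only,
with "mirror" meaning the WRITE_ONCE(up->forward_threshold, ...) update
above):

/*
 * Without the lock around the mirror update:
 *
 *   T1: sk_setsockopt(SO_RCVBUF) -> sk_rcvbuf = A
 *   T1: reads sk_rcvbuf (A) for the mirror, then stalls
 *   T2: sk_setsockopt(SO_RCVBUF) -> sk_rcvbuf = B
 *   T2: mirror: forward_threshold = B >> 2
 *   T1: mirror: forward_threshold = A >> 2   <- stale, no longer matching
 *                                               the final sk_rcvbuf (B)
 *
 * With sockopt_lock_sock() held around the read of sk_rcvbuf and the
 * write of forward_threshold, the two updates cannot interleave this
 * way and the mirror stays consistent with sk_rcvbuf.
 */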
> > + /* paired with READ_ONCE in udp_rmem_release() */
> > + WRITE_ONCE(up->forward_threshold, sk->sk_rcvbuf >> 2);
> > + sockopt_release_sock(sk);
> > + }
> > + return err;
> > + }
> > +
> > if (optlen < sizeof(int))
> > return -EINVAL;
> >
> > @@ -2784,7 +2800,7 @@ EXPORT_SYMBOL(udp_lib_setsockopt);
> > int udp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
> > unsigned int optlen)
> > {
> > - if (level == SOL_UDP || level == SOL_UDPLITE)
> > + if (level == SOL_UDP || level == SOL_UDPLITE || level == SOL_SOCKET)
> > return udp_lib_setsockopt(sk, level, optname,
> > optval, optlen,
> > udp_push_pending_frames);
> > diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> > index 8d09f0ea5b8c..1ed20bcfd7a0 100644
> > --- a/net/ipv6/udp.c
> > +++ b/net/ipv6/udp.c
> > @@ -64,8 +64,12 @@ static void udpv6_destruct_sock(struct sock *sk)
> >
> > int udpv6_init_sock(struct sock *sk)
> > {
> > - skb_queue_head_init(&udp_sk(sk)->reader_queue);
> > + struct udp_sock *up = udp_sk(sk);
> > +
> > + skb_queue_head_init(&up->reader_queue);
> > + up->forward_threshold = sk->sk_rcvbuf >> 2;
> > sk->sk_destruct = udpv6_destruct_sock;
> > + set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags);
> > return 0;
> > }
>
> It's time to factorise this part like udp_destruct_common() ?
I guess it makes sense. Possibly 'udp_lib_destruct()', just to follow
the other helpers' naming style?
>
Thanks,
Paolo
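A rough sketch of the factorisation discussed above, with a hypothetical
helper name; this is not code from the series:

/* Hypothetical shared helper for the duplicated v4/v6 init code; it would
 * need to be visible to both net/ipv4/udp.c and net/ipv6/udp.c.
 */
int udp_lib_init_sock(struct sock *sk, void (*destruct)(struct sock *sk))
{
	struct udp_sock *up = udp_sk(sk);

	skb_queue_head_init(&up->reader_queue);
	up->forward_threshold = sk->sk_rcvbuf >> 2;
	sk->sk_destruct = destruct;
	set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags);
	return 0;
}

/* udp_init_sock() would then reduce to
 *	return udp_lib_init_sock(sk, udp_destruct_sock);
 * and udpv6_init_sock() to
 *	return udp_lib_init_sock(sk, udpv6_destruct_sock);
 */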
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net-next 2/2] udp: track the forward memory release threshold in a hot cacheline
2022-10-19 16:58 ` Paolo Abeni
@ 2022-10-19 17:09 ` Kuniyuki Iwashima
0 siblings, 0 replies; 7+ messages in thread
From: Kuniyuki Iwashima @ 2022-10-19 17:09 UTC (permalink / raw)
To: pabeni
Cc: davem, dsahern, edumazet, kuba, kuniyu, mathew.j.martineau,
matthieu.baerts, mptcp, netdev
From: Paolo Abeni <pabeni@redhat.com>
Date: Wed, 19 Oct 2022 18:58:33 +0200
> On Wed, 2022-10-19 at 09:33 -0700, Kuniyuki Iwashima wrote:
> > From: Paolo Abeni <pabeni@redhat.com>
> > Date: Wed, 19 Oct 2022 12:02:01 +0200
> > > When the receiver process and the BH run on different cores,
> > > udp_rmem_release() experiences a cache miss while accessing sk_rcvbuf,
> > > as the latter shares the same cacheline with sk_forward_alloc, which is
> > > written by the BH.
> > >
> > > With this patch, UDP tracks the rcvbuf value and its updates via the
> > > custom SOL_SOCKET socket option handling, and copies the forward memory
> > > threshold value used by udp_rmem_release() into a different cacheline,
> > > one that is already accessed by that function and uncontended.
> > >
> > > Overall the above gives a 10% peak throughput increase under UDP flood.
> > >
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > > include/linux/udp.h | 3 +++
> > > net/ipv4/udp.c | 22 +++++++++++++++++++---
> > > net/ipv6/udp.c | 8 ++++++--
> > > 3 files changed, 28 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/include/linux/udp.h b/include/linux/udp.h
> > > index e96da4157d04..5cdba00a904a 100644
> > > --- a/include/linux/udp.h
> > > +++ b/include/linux/udp.h
> > > @@ -87,6 +87,9 @@ struct udp_sock {
> > >
> > > /* This field is dirtied by udp_recvmsg() */
> > > int forward_deficit;
> > > +
> > > + /* This fields follows rcvbuf value, and is touched by udp_recvmsg */
> > > + int forward_threshold;
> > > };
> > >
> > > #define UDP_MAX_SEGMENTS (1 << 6UL)
> > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > index 8126f67d18b3..915f573587fa 100644
> > > --- a/net/ipv4/udp.c
> > > +++ b/net/ipv4/udp.c
> > > @@ -1448,7 +1448,7 @@ static void udp_rmem_release(struct sock *sk, int size, int partial,
> > > if (likely(partial)) {
> > > up->forward_deficit += size;
> > > size = up->forward_deficit;
> > > - if (size < (sk->sk_rcvbuf >> 2) &&
> > > + if (size < READ_ONCE(up->forward_threshold) &&
> > > !skb_queue_empty(&up->reader_queue))
> > > return;
> > > } else {
> > > @@ -1622,8 +1622,12 @@ static void udp_destruct_sock(struct sock *sk)
> > >
> > > int udp_init_sock(struct sock *sk)
> > > {
> > > - skb_queue_head_init(&udp_sk(sk)->reader_queue);
> > > + struct udp_sock *up = udp_sk(sk);
> > > +
> > > + skb_queue_head_init(&up->reader_queue);
> > > + up->forward_threshold = sk->sk_rcvbuf >> 2;
> > > sk->sk_destruct = udp_destruct_sock;
> > > + set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags);
> > > return 0;
> > > }
> > >
> > > @@ -2671,6 +2675,18 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
> > > int err = 0;
> > > int is_udplite = IS_UDPLITE(sk);
> > >
> > > + if (level == SOL_SOCKET) {
> > > + err = sk_setsockopt(sk, level, optname, optval, optlen);
> > > +
> > > + if (optname == SO_RCVBUF || optname == SO_RCVBUFFORCE) {
> > > + sockopt_lock_sock(sk);
> >
> > Can we drop this lock by adding READ_ONCE() to sk->sk_rcvbuf below ?
>
> I think we can't. If there are racing threads updating rcvbuf, we could
> end up with a mismatching value in forward_threshold. Not a likely
> scenario, but still... This is the control path; acquiring the lock once
> more should not be a problem.
I see.
Thank you!
> > > + /* paired with READ_ONCE in udp_rmem_release() */
> > > + WRITE_ONCE(up->forward_threshold, sk->sk_rcvbuf >> 2);
> > > + sockopt_release_sock(sk);
> > > + }
> > > + return err;
> > > + }
> > > +
> > > if (optlen < sizeof(int))
> > > return -EINVAL;
> > >
> > > @@ -2784,7 +2800,7 @@ EXPORT_SYMBOL(udp_lib_setsockopt);
> > > int udp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
> > > unsigned int optlen)
> > > {
> > > - if (level == SOL_UDP || level == SOL_UDPLITE)
> > > + if (level == SOL_UDP || level == SOL_UDPLITE || level == SOL_SOCKET)
> > > return udp_lib_setsockopt(sk, level, optname,
> > > optval, optlen,
> > > udp_push_pending_frames);
> > > diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> > > index 8d09f0ea5b8c..1ed20bcfd7a0 100644
> > > --- a/net/ipv6/udp.c
> > > +++ b/net/ipv6/udp.c
> > > @@ -64,8 +64,12 @@ static void udpv6_destruct_sock(struct sock *sk)
> > >
> > > int udpv6_init_sock(struct sock *sk)
> > > {
> > > - skb_queue_head_init(&udp_sk(sk)->reader_queue);
> > > + struct udp_sock *up = udp_sk(sk);
> > > +
> > > + skb_queue_head_init(&up->reader_queue);
> > > + up->forward_threshold = sk->sk_rcvbuf >> 2;
> > > sk->sk_destruct = udpv6_destruct_sock;
> > > + set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags);
> > > return 0;
> > > }
> >
> > It's time to factorise this part like udp_destruct_common() ?
>
> I guess it makes sense. Possibly 'udp_lib_destruct()', just to follow
> the other helpers' naming style?
Ah, I should have named it so :)
>
> >
> Thanks,
>
> Paolo
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread
Thread overview: 7+ messages
2022-10-19 10:01 [PATCH net-next 0/2] udp: avoid false sharing on receive Paolo Abeni
2022-10-19 10:02 ` [PATCH net-next 1/2] net: introduce and use custom sockopt socket flag Paolo Abeni
2022-10-19 15:16 ` Matthieu Baerts
2022-10-19 10:02 ` [PATCH net-next 2/2] udp: track the forward memory release threshold in a hot cacheline Paolo Abeni
2022-10-19 16:33 ` Kuniyuki Iwashima
2022-10-19 16:58 ` Paolo Abeni
2022-10-19 17:09 ` Kuniyuki Iwashima