netdev.vger.kernel.org archive mirror
From: Paolo Abeni <pabeni@redhat.com>
To: Eric Dumazet <edumazet@google.com>
Cc: "David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Simon Horman <horms@kernel.org>,
	netdev@vger.kernel.org, eric.dumazet@gmail.com,
	Willem de Bruijn <willemb@google.com>,
	Kuniyuki Iwashima <kuniyu@google.com>
Subject: Re: [PATCH net-next 3/3] net: add new sk->sk_drops1 field
Date: Tue, 26 Aug 2025 09:16:38 +0200
Message-ID: <8f09830a-d83d-43c9-b36b-88ba0a23e9b2@redhat.com>
In-Reply-To: <CANn89iLNnYXH0z4BOc0UZjvbuZ5gWWHVTP1MrOHkVUq26szCKA@mail.gmail.com>

On 8/26/25 8:46 AM, Eric Dumazet wrote:
> On Mon, Aug 25, 2025 at 11:34 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> On 8/25/25 9:59 PM, Eric Dumazet wrote:
>>> sk->sk_drops can be heavily contended when
>>> changed from many cpus.
>>>
>>> Instead of using a too expensive per-cpu data structure,
>>> add a second sk->sk_drops1 field and change
>>> sk_drops_inc() to be NUMA aware.
>>>
>>> This patch adds 64 bytes per socket.
>>
>> I'm wondering: since the main target for dealing with drops are UDP
>> sockets, have you considered adding sk_drops1 to udp_sock, instead?
> 
> I actually saw the issues on RAW sockets; some applications were using them
> in an inappropriate way. This was not an attack on single UDP sockets, but
> a self-inflicted issue on RAW sockets.
> 
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Thu Mar 7 16:29:43 2024 +0000
> 
>     ipv6: raw: check sk->sk_rcvbuf earlier
> 
>     There is no point cloning an skb and having to free the clone
>     if the receive queue of the raw socket is full.
> 
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Reviewed-by: Willem de Bruijn <willemb@google.com>
>     Link: https://lore.kernel.org/r/20240307162943.2523817-1-edumazet@google.com
>     Signed-off-by: Jakub Kicinski <kuba@kernel.org>
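
The idea in the commit message quoted above (check the receive buffer limit
before cloning, so a full queue costs one counter bump and no clone/free
pair) can be sketched in userspace C. This is only an illustrative model,
not the kernel code: struct rawsk_model, raw_deliver_model() and the
malloc()/memcpy() stand-in for skb cloning are all hypothetical names
introduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model of a raw socket's receive accounting. */
struct rawsk_model {
	atomic_int rmem_alloc;	/* bytes currently charged to the queue */
	int rcvbuf;		/* receive buffer limit, in bytes */
	atomic_int drops;	/* drop counter */
	unsigned long clones;	/* clones actually made (illustration only) */
};

/*
 * Returns a heap copy of the packet charged to the socket, or NULL if the
 * packet was dropped (queue full) or the allocation failed.
 */
static void *raw_deliver_model(struct rawsk_model *sk, const void *pkt,
			       size_t len)
{
	void *clone;

	assert(sk && pkt);

	/* Early check: the queue is already full, so skip the clone. */
	if (atomic_load_explicit(&sk->rmem_alloc, memory_order_relaxed) >=
	    sk->rcvbuf) {
		atomic_fetch_add_explicit(&sk->drops, 1, memory_order_relaxed);
		return NULL;
	}

	/* malloc() + memcpy() stands in for cloning the skb. */
	clone = malloc(len);
	if (!clone)
		return NULL;
	memcpy(clone, pkt, len);
	sk->clones++;
	atomic_fetch_add_explicit(&sk->rmem_alloc, (int)len,
				  memory_order_relaxed);
	return clone;
}
```

With rcvbuf set to the size of one packet, a second delivery attempt is
dropped before any copy is made: only the drop counter changes.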

I see, thanks for the pointer. Perhaps something like the following
(completely untested) could fit? With a similar delta for raw sockets,
sk_drops_{read,inc,reset} would check sk_drop_counters and, if it is set,
use it instead of sk->sk_drops. Otherwise I have no objections at all!
---
diff --git a/include/net/sock.h b/include/net/sock.h
index 63a6a48afb48..3dd76c04bd86 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -102,6 +102,11 @@ struct net;
 typedef __u32 __bitwise __portpair;
 typedef __u64 __bitwise __addrpair;

+struct socket_drop_counters {
+	atomic_t sk_drops0 ____cacheline_aligned_in_smp;
+	atomic_t sk_drops1 ____cacheline_aligned_in_smp;
+};
+
 /**
  *	struct sock_common - minimal network layer representation of sockets
  *	@skc_daddr: Foreign IPv4 addr
@@ -449,6 +454,7 @@ struct sock {
 #ifdef CONFIG_XFRM
 	struct xfrm_policy __rcu *sk_policy[2];
 #endif
+	struct socket_drop_counters *sk_drop_counters;
 	__cacheline_group_end(sock_read_rxtx);

 	__cacheline_group_begin(sock_write_rxtx);
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 4e1a672af4c5..45eec01fbbb2 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -108,6 +108,8 @@ struct udp_sock {
 	 * the last UDP socket cacheline.
 	 */
 	struct hlist_node	tunnel_list;
+
+	struct socket_drop_counters drop_counters;
 };

 #define udp_test_bit(nr, sk)			\
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cc3ce0f762ec..eff90755b6ac 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1818,6 +1818,7 @@ static void udp_destruct_sock(struct sock *sk)
 int udp_init_sock(struct sock *sk)
 {
 	udp_lib_init_sock(sk);
+	sk->sk_drop_counters = &udp_sk(sk)->drop_counters;
 	sk->sk_destruct = udp_destruct_sock;
 	set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags);
 	return 0;
---
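
For readers following along, the helper side of the proposal above
(sk_drops_{read,inc,reset} checking sk_drop_counters and falling back to
sk->sk_drops) might look roughly like this userspace model. It is a sketch
under assumptions, not the patch: it uses C11 atomics instead of atomic_t,
a 64-byte cache line, hypothetical *_model names, and it picks the counter
by NUMA node parity, which the thread does not specify.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHELINE_BYTES 64	/* assumption: 64-byte cache lines */

/* Userspace stand-in for the struct added to sock.h in the diff above. */
struct socket_drop_counters {
	alignas(CACHELINE_BYTES) atomic_int sk_drops0;
	alignas(CACHELINE_BYTES) atomic_int sk_drops1;
};

/* Each counter sits on its own cache line. */
static_assert(sizeof(struct socket_drop_counters) == 2 * CACHELINE_BYTES,
	      "one cache line per counter");

/* Hypothetical, heavily reduced model of struct sock. */
struct sock_model {
	atomic_int sk_drops;
	/* NULL unless the protocol opted in, as udp_init_sock() does above. */
	struct socket_drop_counters *sk_drop_counters;
};

/* numa_node is passed in explicitly; parity selection is an assumption. */
static void sk_drops_inc_model(struct sock_model *sk, unsigned int numa_node)
{
	struct socket_drop_counters *sdc = sk->sk_drop_counters;

	if (sdc) {
		atomic_int *ctr = (numa_node & 1) ? &sdc->sk_drops1
						  : &sdc->sk_drops0;

		atomic_fetch_add_explicit(ctr, 1, memory_order_relaxed);
		return;
	}
	atomic_fetch_add_explicit(&sk->sk_drops, 1, memory_order_relaxed);
}

/* Readers sum both split counters, or fall back to the single field. */
static int sk_drops_read_model(struct sock_model *sk)
{
	struct socket_drop_counters *sdc = sk->sk_drop_counters;

	if (sdc)
		return atomic_load_explicit(&sdc->sk_drops0,
					    memory_order_relaxed) +
		       atomic_load_explicit(&sdc->sk_drops1,
					    memory_order_relaxed);
	return atomic_load_explicit(&sk->sk_drops, memory_order_relaxed);
}

static void sk_drops_reset_model(struct sock_model *sk)
{
	struct socket_drop_counters *sdc = sk->sk_drop_counters;

	if (sdc) {
		atomic_store_explicit(&sdc->sk_drops0, 0, memory_order_relaxed);
		atomic_store_explicit(&sdc->sk_drops1, 0, memory_order_relaxed);
		return;
	}
	atomic_store_explicit(&sk->sk_drops, 0, memory_order_relaxed);
}
```

Sockets that never set sk_drop_counters pay only for the pointer plus one
extra branch in each helper, while protocols that opt in keep the two
counters on separate cache lines inside their own socket structure.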

>> Plus an additional conditional/casting in sk_drops_{read,inc,reset}.
>>
>> That would save some memory and also offer the opportunity to use more
>> memory to deal with NUMA hosts.
>>
>> (I had the crazy idea to keep sk_drops on a contended cacheline and use 2
>> (or more) cacheline aligned fields for udp_sock only).
> 
> I am working on rmem_alloc batches on both producer and consumer
> as a follow-up to a recent thread on netdev:
> 
> https://lore.kernel.org/netdev/aKh_yi0gASYajhev@bzorp3/T/#m392d5c87ab08d6ae005c23ffc8a3186cbac07cf2
> 
> Right now, when multiple cpus (running on different NUMA nodes) are
> feeding packets to __udp_enqueue_schedule_skb()
> we are touching two cache lines, my plan is to reduce this to a single one.

Obviously looking forward to it!

Thanks,

Paolo


Thread overview: 13+ messages
2025-08-25 19:59 [PATCH net-next 0/3] net: better drop accounting Eric Dumazet
2025-08-25 19:59 ` [PATCH net-next 1/3] net: add sk_drops_read(), sk_drops_inc() and sk_drops_reset() helpers Eric Dumazet
2025-08-25 20:15   ` Kuniyuki Iwashima
2025-08-25 19:59 ` [PATCH net-next 2/3] net: move sk_drops out of sock_write_rx group Eric Dumazet
2025-08-25 20:17   ` Kuniyuki Iwashima
2025-08-25 19:59 ` [PATCH net-next 3/3] net: add new sk->sk_drops1 field Eric Dumazet
2025-08-25 20:19   ` Kuniyuki Iwashima
2025-08-26  5:46     ` Eric Dumazet
2025-08-26  5:56       ` Kuniyuki Iwashima
2025-08-26  6:33   ` Paolo Abeni
2025-08-26  6:46     ` Eric Dumazet
2025-08-26  7:16       ` Paolo Abeni [this message]
2025-08-26 12:11         ` Eric Dumazet
