From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Eric Dumazet <edumazet@google.com>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: "David S . Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
Willem de Bruijn <willemb@google.com>,
Kuniyuki Iwashima <kuniyu@google.com>,
David Ahern <dsahern@kernel.org>,
netdev@vger.kernel.org, eric.dumazet@gmail.com
Subject: Re: [PATCH net-next 09/10] udp: make busylock per socket
Date: Tue, 16 Sep 2025 15:15:00 -0400
Message-ID: <willemdebruijn.kernel.17894869e21ae@gmail.com>
In-Reply-To: <CANn89i+KGxhZNmFw8TsD9GzQ8=Acag_ALDw9AB5A4gupBpRzQQ@mail.gmail.com>
Eric Dumazet wrote:
> On Tue, Sep 16, 2025 at 9:31 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Eric Dumazet wrote:
> > > While having all spinlocks packed into an array was a space saver,
> > > this also caused NUMA imbalance and hash collisions.
> > >
> > > UDPv6 socket size becomes 1600 after this patch.
> > >
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > ---
> > > include/linux/udp.h | 1 +
> > > include/net/udp.h | 1 +
> > > net/ipv4/udp.c | 20 ++------------------
> > > 3 files changed, 4 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/include/linux/udp.h b/include/linux/udp.h
> > > index 6ed008ab166557e868c1918daaaa5d551b7989a7..e554890c4415b411f35007d3ece9e6042db7a544 100644
> > > --- a/include/linux/udp.h
> > > +++ b/include/linux/udp.h
> > > @@ -109,6 +109,7 @@ struct udp_sock {
> > > */
> > > struct hlist_node tunnel_list;
> > > struct numa_drop_counters drop_counters;
> > > + spinlock_t busylock ____cacheline_aligned_in_smp;
> > > };
> > >
> > > #define udp_test_bit(nr, sk) \
> > > diff --git a/include/net/udp.h b/include/net/udp.h
> > > index a08822e294b038c0d00d4a5f5cac62286a207926..eecd64097f91196897f45530540b9c9b68c5ba4e 100644
> > > --- a/include/net/udp.h
> > > +++ b/include/net/udp.h
> > > @@ -289,6 +289,7 @@ static inline void udp_lib_init_sock(struct sock *sk)
> > > struct udp_sock *up = udp_sk(sk);
> > >
> > > sk->sk_drop_counters = &up->drop_counters;
> > > + spin_lock_init(&up->busylock);
> > > skb_queue_head_init(&up->reader_queue);
> > > INIT_HLIST_NODE(&up->tunnel_list);
> > > up->forward_threshold = sk->sk_rcvbuf >> 2;
> > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > index 25143f932447df2a84dd113ca33e1ccf15b3503c..7d1444821ee51a19cd5fd0dd5b8d096104c9283c 100644
> > > --- a/net/ipv4/udp.c
> > > +++ b/net/ipv4/udp.c
> > > @@ -1689,17 +1689,11 @@ static void udp_skb_dtor_locked(struct sock *sk, struct sk_buff *skb)
> > > * to relieve pressure on the receive_queue spinlock shared by consumer.
> > > * Under flood, this means that only one producer can be in line
> > > * trying to acquire the receive_queue spinlock.
> > > - * These busylock can be allocated on a per cpu manner, instead of a
> > > - * per socket one (that would consume a cache line per socket)
> > > */
> > > -static int udp_busylocks_log __read_mostly;
> > > -static spinlock_t *udp_busylocks __read_mostly;
> > > -
> > > -static spinlock_t *busylock_acquire(void *ptr)
> > > +static spinlock_t *busylock_acquire(struct sock *sk)
> > > {
> > > - spinlock_t *busy;
> > > + spinlock_t *busy = &udp_sk(sk)->busylock;
> > >
> > > - busy = udp_busylocks + hash_ptr(ptr, udp_busylocks_log);
> > > spin_lock(busy);
> > > return busy;
> > > }
> > > @@ -3997,7 +3991,6 @@ static void __init bpf_iter_register(void)
> > > void __init udp_init(void)
> > > {
> > > unsigned long limit;
> > > - unsigned int i;
> > >
> > > udp_table_init(&udp_table, "UDP");
> > > limit = nr_free_buffer_pages() / 8;
> > > @@ -4006,15 +3999,6 @@ void __init udp_init(void)
> > > sysctl_udp_mem[1] = limit;
> > > sysctl_udp_mem[2] = sysctl_udp_mem[0] * 2;
> > >
> > > - /* 16 spinlocks per cpu */
> > > - udp_busylocks_log = ilog2(nr_cpu_ids) + 4;
> > > - udp_busylocks = kmalloc(sizeof(spinlock_t) << udp_busylocks_log,
> > > - GFP_KERNEL);
> >
> > A per-socket busylock is preferable over growing this array to be
> > fully per-cpu (and converting it to per-cpu allocation to avoid
> > false sharing)?
> >
> > Because that would take a lot of space on modern server platforms?
> > Just trying to understand the trade-off made.
>
> The goal of the busylock is to have a single gate before sk->sk_receive_queue.
>
> Having per-cpu spinlocks would not fit the need?
Oh, of course. For high-rate UDP servers I was immediately, and
mistakenly, thinking of SO_REUSEPORT and CPU pinning, and thus of
different sockets that might accidentally share a hashed lock or hit
cache-line false sharing.
But this is a single socket being hit with traffic from multiple
producers.
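
For readers following along, a simplified sketch of the gating pattern
being discussed (hypothetical function name; memory accounting and
error handling omitted, the real path is __udp_enqueue_schedule_skb()):

/* The busylock is a gate in front of the receive_queue spinlock that
 * producers share with the consumer: under flood, only one producer
 * at a time is allowed to wait on the shared lock.
 */
static int udp_enqueue_sketch(struct sock *sk, struct sk_buff *skb)
{
	/* Per-socket gate after this patch (was a hashed array entry). */
	spinlock_t *busy = busylock_acquire(sk);

	spin_lock(&sk->sk_receive_queue.lock);
	__skb_queue_tail(&sk->sk_receive_queue, skb);
	spin_unlock(&sk->sk_receive_queue.lock);

	busylock_release(busy);
	return 0;
}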
> Note that having per-NUMA receive queues is on my plate, but not finished yet.
Interesting!
> I tried to remove the busylock entirely (modern UDP has a second
> queue (up->reader_queue), so __skb_recv_udp() splices things in
> batches), but the busylock was still beneficial.
Good to know, thanks.
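
To make that batching concrete, a rough sketch of the two-queue
consumer side being described (hypothetical function name, locking
simplified):

/* The reader normally dequeues from its private reader_queue; only
 * when that runs dry does it take the shared receive_queue lock, and
 * then it splices the whole backlog across in one batch.
 */
static struct sk_buff *udp_recv_sketch(struct sock *sk)
{
	struct sk_buff_head *rq = &udp_sk(sk)->reader_queue;
	struct sk_buff *skb;

	spin_lock_bh(&rq->lock);
	skb = __skb_dequeue(rq);
	if (!skb) {
		/* One shared-lock acquisition refills a whole batch. */
		spin_lock(&sk->sk_receive_queue.lock);
		skb_queue_splice_tail_init(&sk->sk_receive_queue, rq);
		spin_unlock(&sk->sk_receive_queue.lock);
		skb = __skb_dequeue(rq);
	}
	spin_unlock_bh(&rq->lock);
	return skb;
}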
Thread overview: 55+ messages
2025-09-16 16:09 [PATCH net-next 00/10] udp: increase RX performance under stress Eric Dumazet
2025-09-16 16:09 ` [PATCH net-next 01/10] ipv6: make ipv6_pinfo.saddr_cache a boolean Eric Dumazet
2025-09-17 14:59 ` Willem de Bruijn
2025-09-17 15:30 ` David Ahern
2025-09-17 17:56 ` Kuniyuki Iwashima
2025-09-16 16:09 ` [PATCH net-next 02/10] ipv6: make ipv6_pinfo.daddr_cache " Eric Dumazet
2025-09-17 14:59 ` Willem de Bruijn
2025-09-17 15:33 ` David Ahern
2025-09-17 17:57 ` Kuniyuki Iwashima
2025-09-16 16:09 ` [PATCH net-next 03/10] ipv6: np->rxpmtu race annotation Eric Dumazet
2025-09-17 14:59 ` Willem de Bruijn
2025-09-17 15:34 ` David Ahern
2025-09-17 17:59 ` Kuniyuki Iwashima
2025-09-16 16:09 ` [PATCH net-next 04/10] ipv6: reorganise struct ipv6_pinfo Eric Dumazet
2025-09-17 15:00 ` Willem de Bruijn
2025-09-17 15:36 ` David Ahern
2025-09-17 18:01 ` Kuniyuki Iwashima
2025-09-16 16:09 ` [PATCH net-next 05/10] udp: refine __udp_enqueue_schedule_skb() test Eric Dumazet
2025-09-17 15:00 ` Willem de Bruijn
2025-09-17 15:57 ` Eric Dumazet
2025-09-17 17:53 ` Kuniyuki Iwashima
2025-09-17 19:07 ` Willem de Bruijn
2025-09-17 15:39 ` David Ahern
2025-09-16 16:09 ` [PATCH net-next 06/10] udp: update sk_rmem_alloc before busylock acquisition Eric Dumazet
2025-09-17 15:01 ` Willem de Bruijn
2025-09-17 16:07 ` Eric Dumazet
2025-09-17 19:14 ` Willem de Bruijn
2025-09-17 15:44 ` David Ahern
2025-09-17 18:02 ` Kuniyuki Iwashima
2025-09-16 16:09 ` [PATCH net-next 07/10] net: group sk_backlog and sk_receive_queue Eric Dumazet
2025-09-17 15:01 ` Willem de Bruijn
2025-09-17 15:45 ` David Ahern
2025-09-17 18:11 ` Kuniyuki Iwashima
2025-09-16 16:09 ` [PATCH net-next 08/10] udp: add udp_drops_inc() helper Eric Dumazet
2025-09-17 15:02 ` Willem de Bruijn
2025-09-17 15:47 ` David Ahern
2025-09-17 18:13 ` Kuniyuki Iwashima
2025-09-16 16:09 ` [PATCH net-next 09/10] udp: make busylock per socket Eric Dumazet
2025-09-16 16:31 ` Willem de Bruijn
2025-09-16 17:10 ` Eric Dumazet
2025-09-16 19:15 ` Willem de Bruijn [this message]
2025-09-17 15:03 ` Willem de Bruijn
2025-09-17 15:52 ` David Ahern
2025-09-17 18:15 ` Kuniyuki Iwashima
2025-09-16 16:09 ` [PATCH net-next 10/10] udp: use skb_attempt_defer_free() Eric Dumazet
2025-09-17 15:03 ` Willem de Bruijn
2025-09-17 15:55 ` David Ahern
2025-09-17 16:15 ` Paolo Abeni
2025-09-17 16:32 ` Eric Dumazet
2025-09-18 6:38 ` Paolo Abeni
2025-09-17 18:20 ` Kuniyuki Iwashima
2025-10-13 21:44 ` [REGRESSION] xfrm issue bisected to 6471658dc66c ("udp: use skb_attempt_defer_free()") Michal Kubecek
2025-10-13 22:12 ` Eric Dumazet
2025-09-18 1:03 ` [PATCH net-next 00/10] udp: increase RX performance under stress Jakub Kicinski
2025-09-18 8:40 ` patchwork-bot+netdevbpf