From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Eric Dumazet <edumazet@google.com>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: "David S . Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
Willem de Bruijn <willemb@google.com>,
Kuniyuki Iwashima <kuniyu@google.com>,
David Ahern <dsahern@kernel.org>,
netdev@vger.kernel.org, eric.dumazet@gmail.com
Subject: Re: [PATCH net-next 09/10] udp: make busylock per socket
Date: Tue, 16 Sep 2025 15:15:00 -0400 [thread overview]
Message-ID: <willemdebruijn.kernel.17894869e21ae@gmail.com> (raw)
In-Reply-To: <CANn89i+KGxhZNmFw8TsD9GzQ8=Acag_ALDw9AB5A4gupBpRzQQ@mail.gmail.com>
Eric Dumazet wrote:
> On Tue, Sep 16, 2025 at 9:31 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Eric Dumazet wrote:
> > > While having all spinlocks packed into an array was a space saver,
> > > this also caused NUMA imbalance and hash collisions.
> > >
> > > UDPv6 socket size becomes 1600 after this patch.
> > >
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > ---
> > > include/linux/udp.h | 1 +
> > > include/net/udp.h | 1 +
> > > net/ipv4/udp.c | 20 ++------------------
> > > 3 files changed, 4 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/include/linux/udp.h b/include/linux/udp.h
> > > index 6ed008ab166557e868c1918daaaa5d551b7989a7..e554890c4415b411f35007d3ece9e6042db7a544 100644
> > > --- a/include/linux/udp.h
> > > +++ b/include/linux/udp.h
> > > @@ -109,6 +109,7 @@ struct udp_sock {
> > > */
> > > struct hlist_node tunnel_list;
> > > struct numa_drop_counters drop_counters;
> > > + spinlock_t busylock ____cacheline_aligned_in_smp;
> > > };
> > >
> > > #define udp_test_bit(nr, sk) \
> > > diff --git a/include/net/udp.h b/include/net/udp.h
> > > index a08822e294b038c0d00d4a5f5cac62286a207926..eecd64097f91196897f45530540b9c9b68c5ba4e 100644
> > > --- a/include/net/udp.h
> > > +++ b/include/net/udp.h
> > > @@ -289,6 +289,7 @@ static inline void udp_lib_init_sock(struct sock *sk)
> > > struct udp_sock *up = udp_sk(sk);
> > >
> > > sk->sk_drop_counters = &up->drop_counters;
> > > + spin_lock_init(&up->busylock);
> > > skb_queue_head_init(&up->reader_queue);
> > > INIT_HLIST_NODE(&up->tunnel_list);
> > > up->forward_threshold = sk->sk_rcvbuf >> 2;
> > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > index 25143f932447df2a84dd113ca33e1ccf15b3503c..7d1444821ee51a19cd5fd0dd5b8d096104c9283c 100644
> > > --- a/net/ipv4/udp.c
> > > +++ b/net/ipv4/udp.c
> > > @@ -1689,17 +1689,11 @@ static void udp_skb_dtor_locked(struct sock *sk, struct sk_buff *skb)
> > > * to relieve pressure on the receive_queue spinlock shared by consumer.
> > > * Under flood, this means that only one producer can be in line
> > > * trying to acquire the receive_queue spinlock.
> > > - * These busylock can be allocated on a per cpu manner, instead of a
> > > - * per socket one (that would consume a cache line per socket)
> > > */
> > > -static int udp_busylocks_log __read_mostly;
> > > -static spinlock_t *udp_busylocks __read_mostly;
> > > -
> > > -static spinlock_t *busylock_acquire(void *ptr)
> > > +static spinlock_t *busylock_acquire(struct sock *sk)
> > > {
> > > - spinlock_t *busy;
> > > + spinlock_t *busy = &udp_sk(sk)->busylock;
> > >
> > > - busy = udp_busylocks + hash_ptr(ptr, udp_busylocks_log);
> > > spin_lock(busy);
> > > return busy;
> > > }
> > > @@ -3997,7 +3991,6 @@ static void __init bpf_iter_register(void)
> > > void __init udp_init(void)
> > > {
> > > unsigned long limit;
> > > - unsigned int i;
> > >
> > > udp_table_init(&udp_table, "UDP");
> > > limit = nr_free_buffer_pages() / 8;
> > > @@ -4006,15 +3999,6 @@ void __init udp_init(void)
> > > sysctl_udp_mem[1] = limit;
> > > sysctl_udp_mem[2] = sysctl_udp_mem[0] * 2;
> > >
> > > - /* 16 spinlocks per cpu */
> > > - udp_busylocks_log = ilog2(nr_cpu_ids) + 4;
> > > - udp_busylocks = kmalloc(sizeof(spinlock_t) << udp_busylocks_log,
> > > - GFP_KERNEL);
> >
> > A per sock busylock is preferable over increasing this array to be
> > full percpu (and converting percpu to avoid false sharing)?
> >
> > Because that would take a lot of space on modern server platforms?
> > Just trying to understand the trade-off made.
>
> The goal of the busylock is to have a single gate before sk->sk_receive_queue.
>
> Having per-cpu spinlocks will not fit the need ?
Oh of course. For high rate UDP servers I was mistakenly immediately
thinking of SO_REUSEPORT and CPU pinning.
And thus different sockets that may accidentally share a hashed lock
or hit cacheline false sharing.
But this is a single socket being hit with traffic from multiple
producers.
> Note that having per-NUMA receive queues is on my plate, but not finished yet.
Interesting!
> I tried to remove the busylock (because modern UDP has a second queue
> (up->reader_queue),
> so __skb_recv_udp() splices things in batches), but busylock was still
> beneficial.
Good to know, thanks.
next prev parent reply other threads:[~2025-09-16 19:15 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-16 16:09 [PATCH net-next 00/10] udp: increase RX performance under stress Eric Dumazet
2025-09-16 16:09 ` [PATCH net-next 01/10] ipv6: make ipv6_pinfo.saddr_cache a boolean Eric Dumazet
2025-09-17 14:59 ` Willem de Bruijn
2025-09-17 15:30 ` David Ahern
2025-09-17 17:56 ` Kuniyuki Iwashima
2025-09-16 16:09 ` [PATCH net-next 02/10] ipv6: make ipv6_pinfo.daddr_cache " Eric Dumazet
2025-09-17 14:59 ` Willem de Bruijn
2025-09-17 15:33 ` David Ahern
2025-09-17 17:57 ` Kuniyuki Iwashima
2025-09-16 16:09 ` [PATCH net-next 03/10] ipv6: np->rxpmtu race annotation Eric Dumazet
2025-09-17 14:59 ` Willem de Bruijn
2025-09-17 15:34 ` David Ahern
2025-09-17 17:59 ` Kuniyuki Iwashima
2025-09-16 16:09 ` [PATCH net-next 04/10] ipv6: reorganise struct ipv6_pinfo Eric Dumazet
2025-09-17 15:00 ` Willem de Bruijn
2025-09-17 15:36 ` David Ahern
2025-09-17 18:01 ` Kuniyuki Iwashima
2025-09-16 16:09 ` [PATCH net-next 05/10] udp: refine __udp_enqueue_schedule_skb() test Eric Dumazet
2025-09-17 15:00 ` Willem de Bruijn
2025-09-17 15:57 ` Eric Dumazet
2025-09-17 17:53 ` Kuniyuki Iwashima
2025-09-17 19:07 ` Willem de Bruijn
2025-09-17 15:39 ` David Ahern
2025-09-16 16:09 ` [PATCH net-next 06/10] udp: update sk_rmem_alloc before busylock acquisition Eric Dumazet
2025-09-17 15:01 ` Willem de Bruijn
2025-09-17 16:07 ` Eric Dumazet
2025-09-17 19:14 ` Willem de Bruijn
2025-09-17 15:44 ` David Ahern
2025-09-17 18:02 ` Kuniyuki Iwashima
2025-09-16 16:09 ` [PATCH net-next 07/10] net: group sk_backlog and sk_receive_queue Eric Dumazet
2025-09-17 15:01 ` Willem de Bruijn
2025-09-17 15:45 ` David Ahern
2025-09-17 18:11 ` Kuniyuki Iwashima
2025-09-16 16:09 ` [PATCH net-next 08/10] udp: add udp_drops_inc() helper Eric Dumazet
2025-09-17 15:02 ` Willem de Bruijn
2025-09-17 15:47 ` David Ahern
2025-09-17 18:13 ` Kuniyuki Iwashima
2025-09-16 16:09 ` [PATCH net-next 09/10] udp: make busylock per socket Eric Dumazet
2025-09-16 16:31 ` Willem de Bruijn
2025-09-16 17:10 ` Eric Dumazet
2025-09-16 19:15 ` Willem de Bruijn [this message]
2025-09-17 15:03 ` Willem de Bruijn
2025-09-17 15:52 ` David Ahern
2025-09-17 18:15 ` Kuniyuki Iwashima
2025-09-16 16:09 ` [PATCH net-next 10/10] udp: use skb_attempt_defer_free() Eric Dumazet
2025-09-17 15:03 ` Willem de Bruijn
2025-09-17 15:55 ` David Ahern
2025-09-17 16:15 ` Paolo Abeni
2025-09-17 16:32 ` Eric Dumazet
2025-09-18 6:38 ` Paolo Abeni
2025-09-17 18:20 ` Kuniyuki Iwashima
2025-10-13 21:44 ` [REGRESSION] xfrm issue bisected to 6471658dc66c ("udp: use skb_attempt_defer_free()") Michal Kubecek
2025-10-13 22:12 ` Eric Dumazet
2025-09-18 1:03 ` [PATCH net-next 00/10] udp: increase RX performance under stress Jakub Kicinski
2025-09-18 8:40 ` patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=willemdebruijn.kernel.17894869e21ae@gmail.com \
--to=willemdebruijn.kernel@gmail.com \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=edumazet@google.com \
--cc=eric.dumazet@gmail.com \
--cc=horms@kernel.org \
--cc=kuba@kernel.org \
--cc=kuniyu@google.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=willemb@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.