netdev.vger.kernel.org archive mirror
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Eric Dumazet <edumazet@google.com>,
	 "David S . Miller" <davem@davemloft.net>,
	 Jakub Kicinski <kuba@kernel.org>,
	 Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>,
	 Willem de Bruijn <willemb@google.com>,
	 Kuniyuki Iwashima <kuniyu@google.com>,
	 netdev@vger.kernel.org,  eric.dumazet@gmail.com,
	 Eric Dumazet <edumazet@google.com>
Subject: Re: [PATCH v4 net-next] udp: remove busylock and add per NUMA queues
Date: Mon, 22 Sep 2025 08:52:32 -0400
Message-ID: <willemdebruijn.kernel.af97f0e88745@gmail.com>
In-Reply-To: <20250922104240.2182559-1-edumazet@google.com>

Eric Dumazet wrote:
> busylock was protecting UDP sockets against packet floods,
> but unfortunately was not protecting the host itself.
> 
> Under stress, many cpus could spin while acquiring the busylock,
> and the NIC had to drop packets. Or packets would be dropped
> in the cpu backlog if RPS/RFS were in place.
> 
> This patch replaces the busylock with intermediate
> lockless queues (one queue per NUMA node).
> 
> This means that fewer cpus have to acquire
> the UDP receive queue lock.
> 
> Most of the cpus can either:
> - immediately drop the packet,
> - or queue it in their NUMA-aware lockless queue.
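The choice above (drop immediately or queue) can be sketched as a lock-free charge-then-check against a receive-buffer budget. This is a minimal C11 userspace illustration, assuming the drop decision compares an atomically updated byte counter with the socket receive buffer size; `struct rx_budget`, `rx_try_charge()` and `rx_uncharge()` are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-socket receive budget (not a kernel structure). */
struct rx_budget {
	atomic_int rmem_alloc;	/* bytes currently charged to the socket */
	int rcvbuf;		/* configured limit, like SO_RCVBUF */
};

/*
 * Charge @size bytes for an incoming packet without taking any lock.
 * Returns true if the packet may be queued, false if it must be dropped.
 */
static bool rx_try_charge(struct rx_budget *b, int size)
{
	int rmem = atomic_fetch_add_explicit(&b->rmem_alloc, size,
					     memory_order_relaxed) + size;

	if (rmem > b->rcvbuf) {
		/* Over budget: undo our own charge and drop right away. */
		atomic_fetch_sub_explicit(&b->rmem_alloc, size,
					  memory_order_relaxed);
		return false;
	}
	return true;
}

/* Release the charge once the reader has consumed the packet. */
static void rx_uncharge(struct rx_budget *b, int size)
{
	atomic_fetch_sub_explicit(&b->rmem_alloc, size, memory_order_relaxed);
}
```

In this sketch, a CPU that finds the budget exhausted pays one atomic add and one atomic subtract and never touches the receive queue lock.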
> 
> Then one of the cpus is chosen to process this lockless queue
> in a batch.
> 
> The batch only contains packets that were cooked on the same
> NUMA node, thus with very limited latency impact.
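The queue-then-batch scheme above can be illustrated with the classic "first pusher drains" pattern on a lock-free singly linked list, similar in spirit to the kernel's llist (`llist_add()` reports whether the list was empty before the add). This is a minimal C11 userspace sketch under stated assumptions, not the patch's code: `struct sock_rx`, `prod_push()`, `prod_drain()` and `rx_enqueue()` are hypothetical, a spinning `atomic_bool` stands in for the receive queue lock, and `NR_NODES` is fixed at 2 for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_NODES 2	/* hypothetical node count for the sketch */

struct pkt {
	struct pkt *next;
	int id;
};

/* Hypothetical socket: one lock-free producer list per NUMA node,
 * plus a FIFO receive queue protected by a spinlock. */
struct sock_rx {
	_Atomic(struct pkt *) prod[NR_NODES];	/* LIFO, pushed lock-free */
	atomic_bool locked;			/* stands in for the rxq lock */
	struct pkt *rxq_head, **rxq_tail;	/* FIFO seen by the reader */
	int batches;				/* times the lock was taken */
};

static void rx_init(struct sock_rx *s)
{
	for (int n = 0; n < NR_NODES; n++)
		atomic_init(&s->prod[n], NULL);
	atomic_init(&s->locked, false);
	s->rxq_head = NULL;
	s->rxq_tail = &s->rxq_head;
	s->batches = 0;
}

/* Lock-free push; returns true if the list was empty beforehand,
 * i.e. the caller is now responsible for draining it. */
static bool prod_push(struct sock_rx *s, int node, struct pkt *p)
{
	struct pkt *old = atomic_load_explicit(&s->prod[node],
					       memory_order_relaxed);
	do {
		p->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&s->prod[node], &old, p,
			memory_order_release, memory_order_relaxed));
	return old == NULL;
}

/* Take the receive queue lock once and move the whole per-node batch. */
static void prod_drain(struct sock_rx *s, int node)
{
	struct pkt *list, *fifo = NULL;

	while (atomic_exchange_explicit(&s->locked, true, memory_order_acquire))
		;	/* spin until we own the receive queue lock */

	/* Detach everything queued on this node so far. */
	list = atomic_exchange_explicit(&s->prod[node], NULL,
					memory_order_acquire);
	/* The list is newest-first; reverse it to restore arrival order. */
	while (list) {
		struct pkt *next = list->next;

		list->next = fifo;
		fifo = list;
		list = next;
	}
	/* Splice the batch onto the tail of the receive queue. */
	if (fifo) {
		*s->rxq_tail = fifo;
		while (fifo->next)
			fifo = fifo->next;
		s->rxq_tail = &fifo->next;
	}
	s->batches++;
	atomic_store_explicit(&s->locked, false, memory_order_release);
}

/* Per-packet entry point, called on the CPU that received the packet. */
static void rx_enqueue(struct sock_rx *s, int node, struct pkt *p)
{
	if (prod_push(s, node, p))
		prod_drain(s, node);
	/* else: the CPU that found this node's list empty drains p for us */
}
```

Only the CPU whose push finds its node's list empty takes the lock; every other CPU on that node returns after a successful compare-and-swap, and its packet is moved in that drainer's batch.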
> 
> Tested:
> 
> DDoS targeting a victim UDP socket, on a platform with 6 NUMA nodes
> (Intel(R) Xeon(R) 6985P-C)
> 
> Before:
> 
> nstat -n ; sleep 1 ; nstat | grep Udp
> Udp6InDatagrams                 1004179            0.0
> Udp6InErrors                    3117               0.0
> Udp6RcvbufErrors                3117               0.0
> 
> After:
> 
> nstat -n ; sleep 1 ; nstat | grep Udp
> Udp6InDatagrams                 1116633            0.0
> Udp6InErrors                    14197275           0.0
> Udp6RcvbufErrors                14197275           0.0
> 
> We can see this host can now process 14.2 M more packets per second
> while under attack, and the victim socket can receive 11 % more
> packets.
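The two headline figures follow directly from the nstat counters, each covering the one-second `sleep 1` window:

```latex
\begin{aligned}
\frac{1116633}{1004179} &\approx 1.112
  &&\Rightarrow \text{about } 11\,\% \text{ more datagrams delivered to the victim socket} \\
14197275 - 3117 &= 14194158 \approx 14.2\,\text{M}
  &&\text{more packets per second processed (and dropped) by the host}
\end{aligned}
```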
> 
> I used a small bpftrace program measuring time (in us) spent in
> __udp_enqueue_schedule_skb().
> 
> Before:
> 
> @udp_enqueue_us[398]:
> [0]                24901 |@@@                                                 |
> [1]                63512 |@@@@@@@@@                                           |
> [2, 4)            344827 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [4, 8)            244673 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                |
> [8, 16)            54022 |@@@@@@@@                                            |
> [16, 32)          222134 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                   |
> [32, 64)          232042 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                  |
> [64, 128)           4219 |                                                    |
> [128, 256)           188 |                                                    |
> 
> After:
> 
> @udp_enqueue_us[398]:
> [0]              5608855 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [1]              1111277 |@@@@@@@@@@                                          |
> [2, 4)            501439 |@@@@                                                |
> [4, 8)            102921 |                                                    |
> [8, 16)            29895 |                                                    |
> [16, 32)           43500 |                                                    |
> [32, 64)           31552 |                                                    |
> [64, 128)            979 |                                                    |
> [128, 256)            13 |                                                    |
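Both histograms use power-of-two buckets: [0] and [1] each hold a single value, then [2, 4), [4, 8) and so on, which is the layout bpftrace's hist() prints. A value v lands in the bucket whose lower bound is the largest power of two not exceeding v; a minimal C sketch of that mapping (`log2_bucket_lo()` is a hypothetical helper, not part of bpftrace):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Lower bound of the power-of-two bucket that @v falls into, matching
 * the labels in the histograms above: [0], [1], [2, 4), [4, 8), ...
 */
static uint64_t log2_bucket_lo(uint64_t v)
{
	uint64_t lo = 1;

	if (v < 2)
		return v;	/* buckets [0] and [1] hold single values */
	while (lo <= v / 2)
		lo *= 2;	/* grow to the largest power of two <= v */
	return lo;
}
```

Dividing v by 2 in the loop condition avoids overflowing lo for values near UINT64_MAX.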
> 
> Note that the remaining bottleneck for this platform is in
> udp_drops_inc() because we limited struct numa_drop_counters
> to only two nodes so far.
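To see why two slots can still be a bottleneck on a 6-node machine, here is a minimal C11 sketch of a drop counter split into two cache-line-aligned slots, with the slot chosen as node modulo 2. This is an illustration only, assuming that selection rule and a 64-byte cache line; `struct drop_counters`, `drops_add()` and `drops_read()` are hypothetical and not the kernel's `struct numa_drop_counters`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>

#define NR_DROP_SLOTS 2	/* hypothetical: mirrors "only two nodes" */
#define CACHE_LINE 64	/* assumed cache line size */

/* Hypothetical drop counter with one cache line per slot. */
struct drop_counters {
	struct {
		alignas(CACHE_LINE) atomic_long drops;
	} slot[NR_DROP_SLOTS];
};

/* Count drops on the slot for the caller's node. With 6 nodes and
 * 2 slots, nodes 0, 2, 4 share slot 0 and nodes 1, 3, 5 share slot 1. */
static void drops_add(struct drop_counters *dc, int node, long val)
{
	atomic_fetch_add_explicit(&dc->slot[node % NR_DROP_SLOTS].drops, val,
				  memory_order_relaxed);
}

/* Readers sum all slots to get the total drop count. */
static long drops_read(struct drop_counters *dc)
{
	long sum = 0;

	for (int i = 0; i < NR_DROP_SLOTS; i++)
		sum += atomic_load_explicit(&dc->slot[i].drops,
					    memory_order_relaxed);
	return sum;
}
```

In this sketch every drop still performs an atomic add on a cache line shared by three nodes, so cross-node cache-line bouncing remains; one slot per node would avoid it at the cost of a larger per-socket structure.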
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Paolo Abeni <pabeni@redhat.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>


Thread overview: 4+ messages
2025-09-22 10:42 [PATCH v4 net-next] udp: remove busylock and add per NUMA queues Eric Dumazet
2025-09-22 12:52 ` Willem de Bruijn [this message]
2025-09-23 18:22 ` Kuniyuki Iwashima
2025-09-24  0:00 ` patchwork-bot+netdevbpf