From: Gabriel Krisman Bertazi <krisman@suse.de>
To: Eric Dumazet <edumazet@google.com>
Cc: willemdebruijn.kernel@gmail.com,  davem@davemloft.net,
	dsahern@kernel.org,  kuba@kernel.org,  pabeni@redhat.com,
	kuniyu@google.com,  horms@kernel.org,  netdev@vger.kernel.org
Subject: Re: [PATCH] udp: Force compute_score to always inline
Date: Thu, 09 Apr 2026 18:50:53 -0400
Message-ID: <87v7dzoiia.fsf@mailhost.krisman.be>
In-Reply-To: <CANn89iKQhLOdtn-_viyDN8ytjJtR-4p0gteXL6gGSHoUYZp5Hw@mail.gmail.com> (Eric Dumazet's message of "Thu, 9 Apr 2026 15:36:15 -0700")

Eric Dumazet <edumazet@google.com> writes:

> On Thu, Apr 9, 2026 at 3:16 PM Gabriel Krisman Bertazi <krisman@suse.de> wrote:
>
>>
>> Back in 2024 I reported a 7-12% regression on an iperf3 UDP loopback
>> throughput test that we traced to the extra overhead of calling
>> compute_score in two places, introduced by commit f0ea27e7bfe1 ("udp:
>> re-score reuseport groups when connected sockets are present").  At the
>> time, I pointed out the overhead was caused by the multiple calls,
>> associated with cpu-specific mitigations, and merged commit
>> 50aee97d1511 ("udp: Avoid call to compute_score on multiple sites") to
>> jump back explicitly, to force the rescore call in a single place.
>>
>> Recently though, we got another regression report against a newer distro
>> version, which a team colleague traced back to the same root-cause.
>> Turns out that once we updated to gcc-13, the compiler got smart enough
>> to unroll the loop, undoing my previous mitigation.  Let's bite the
>> bullet and __always_inline compute_score on both ipv4 and ipv6 to
>> prevent gcc from de-optimizing it again in the future.  These functions
>> are only called in two places each, udpX_lib_lookup1 and
>> udpX_lib_lookup2, so the extra size shouldn't be a problem and it is hot
>> enough to be very visible in profiles.  In fact, with gcc13, forcing
>> the inline will prevent gcc from unrolling the fix from commit
>> 50aee97d1511, so we don't end up increasing udpX_lib_lookup2 at all.
>>
>> I haven't recollected the results myself, as I don't have access to the
>> machine at the moment.  But the same colleague reported a 4.67%
>> improvement with this patch in the loopback benchmark, solving the
>> regression report within noise margins.
>
> You could include scripts/bloat-o-meter results, so that we can sense
> the cost of such a change.
>
> $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
> add/remove: 0/2 grow/shrink: 6/1 up/down: 622/-410 (212)
> Function                                     old     new   delta
> __udp6_lib_lookup                            797    1007    +210
> __udp4_lib_lookup                            838     984    +146
> udp6_lib_lookup2                             404     536    +132
> udp4_lib_lookup2                             396     498    +102
> udpv6_rcv                                   3018    3034     +16
> udp_init_sock                                244     260     +16
> bpf_iter_udp_batch                           953     937     -16
> __pfx_compute_score                           32       -     -32
> compute_score                                362       -    -362
> Total: Before=30269687, After=30269899, chg +0.00%
>
> No change for clang.
>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
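
For readers following the thread, here is a minimal user-space sketch of the pattern the patch relies on: a scoring helper marked always-inline and called from exactly two lookup sites. It is not the kernel code. The macro sketch_always_inline stands in for the kernel's __always_inline and assumes a GCC- or Clang-compatible compiler; struct sketch_sock, the scoring rule, and both lookup functions are hypothetical.

```c
#include <stddef.h>

/* User-space stand-in for the kernel's __always_inline macro
 * (assumes a GCC- or Clang-compatible compiler). */
#define sketch_always_inline inline __attribute__((__always_inline__))

/* Hypothetical, heavily simplified socket; not the kernel's struct sock. */
struct sketch_sock {
	unsigned short port;
	int connected;
};

/* Hypothetical scoring rule: -1 is "no match", a higher score is better. */
static sketch_always_inline int
sketch_compute_score(const struct sketch_sock *sk, unsigned short port)
{
	if (sk->port != port)
		return -1;
	return sk->connected ? 2 : 1;
}

/* First call site: index of the first matching socket, or -1. */
static int sketch_lookup1(const struct sketch_sock *socks, size_t n,
			  unsigned short port)
{
	for (size_t i = 0; i < n; i++)
		if (sketch_compute_score(&socks[i], port) >= 0)
			return (int)i;
	return -1;
}

/* Second call site: index of the best-scoring socket, or -1. */
static int sketch_lookup2(const struct sketch_sock *socks, size_t n,
			  unsigned short port)
{
	int best = -1;
	int best_score = -1;

	for (size_t i = 0; i < n; i++) {
		int score = sketch_compute_score(&socks[i], port);

		if (score > best_score) {
			best_score = score;
			best = (int)i;
		}
	}
	return best;
}
```

With the attribute in place the compiler copies the helper's body into both callers instead of emitting a call, so the callers grow and the standalone symbol disappears, which is the shape of the bloat-o-meter output in this thread.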

Apologies, I wasn't aware of that tool. I did some calculations by hand
and found something like 200 bytes extra in udp6_lib_lookup2.

For gcc-13:

scripts/bloat-o-meter vmlinux vmlinux-inline
add/remove: 0/2 grow/shrink: 4/0 up/down: 616/-416 (200)
Function                                     old     new   delta
udp6_lib_lookup2                             762     949    +187
__udp6_lib_lookup                            810     975    +165
udp4_lib_lookup2                             757     906    +149
__udp4_lib_lookup                            871     986    +115
__pfx_compute_score                           32       -     -32
compute_score                                384       -    -384
Total: Before=35011784, After=35011984, chg +0.00%
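
As a sanity check on the gcc-13 table above, the summary line can be recomputed from the per-function rows. The sketch below only re-adds the old/new sizes copied from that table; the struct and function names are made up for illustration and have nothing to do with how scripts/bloat-o-meter is implemented.

```c
#include <stddef.h>

/* One row of the table: symbol name, old size, new size (0 if removed). */
struct delta_row {
	const char *name;
	long old_size;
	long new_size;
};

/* Sizes copied from the gcc-13 bloat-o-meter output above. */
static const struct delta_row gcc13_rows[] = {
	{ "udp6_lib_lookup2",    762, 949 },
	{ "__udp6_lib_lookup",   810, 975 },
	{ "udp4_lib_lookup2",    757, 906 },
	{ "__udp4_lib_lookup",   871, 986 },
	{ "__pfx_compute_score",  32,   0 },
	{ "compute_score",       384,   0 },
};

/* Sum of the positive deltas (the "up" figure). */
static long sum_growth(const struct delta_row *rows, size_t n)
{
	long total = 0;

	for (size_t i = 0; i < n; i++)
		if (rows[i].new_size > rows[i].old_size)
			total += rows[i].new_size - rows[i].old_size;
	return total;
}

/* Sum of the negative deltas (the "down" figure), returned as a negative number. */
static long sum_shrink(const struct delta_row *rows, size_t n)
{
	long total = 0;

	for (size_t i = 0; i < n; i++)
		if (rows[i].new_size < rows[i].old_size)
			total += rows[i].new_size - rows[i].old_size;
	return total;
}

/* Net change in bytes across all rows. */
static long net_delta(const struct delta_row *rows, size_t n)
{
	return sum_growth(rows, n) + sum_shrink(rows, n);
}
```

The rows add up to 616 bytes of growth and 416 bytes removed, a net of 200 bytes, which matches both the "up/down: 616/-416 (200)" summary and the difference between the Before and After totals.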



-- 
Gabriel Krisman Bertazi


Thread overview: 6+ messages
2026-04-09 22:15 [PATCH] udp: Force compute_score to always inline Gabriel Krisman Bertazi
2026-04-09 22:36 ` Eric Dumazet
2026-04-09 22:50   ` Gabriel Krisman Bertazi [this message]
2026-04-10 13:02 ` Willem de Bruijn
2026-04-10 13:04 ` Willem de Bruijn
2026-04-10 16:01   ` Gabriel Krisman Bertazi
