From: Ihor Solodrai <ihor.solodrai@linux.dev>
To: Nathan Chancellor <nathan@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Eduard Zingerman <eddyz87@gmail.com>,
	Yonghong Song <yonghong.song@linux.dev>,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	llvm@lists.linux.dev
Subject: Re: [PATCH bpf-next] scripts/gen-btf.sh: Disable LTO when generating initial .o file
Date: Mon, 5 Jan 2026 17:06:49 -0800
Message-ID: <6908562f-4a99-44ea-bffb-19f33fcffe83@linux.dev>
In-Reply-To: <20260105234605.GB1276749@ax162>

On 1/5/26 3:46 PM, Nathan Chancellor wrote:
> On Mon, Jan 05, 2026 at 02:01:36PM -0800, Ihor Solodrai wrote:
>> Hi Nathan, thank you for the patch.
>>
>> I'm starting to think it wasn't a good idea to do
>>
>>     echo "" | ${CC} ...
>>
>> here, given the number of associated bugs.
>
> Yeah, I was wondering if a lack of KBUILD_CPPFLAGS would also be a
> problem since that contains the endianness flag for some targets. I
> cannot imagine any more issues than that but I can understand wanting to
> back out of it.
>
>> Before gen-btf.sh was introduced, the .btf.o binary was generated with this [1]:
>>
>>     ${OBJCOPY} --only-section=.BTF --set-section-flags .BTF=alloc,readonly \
>>         --strip-all ${1} "${btf_data}" 2>/dev/null
>>
>> I changed to ${CC} on the assumption it's a quicker operation than
>> stripping entire vmlinux. But maybe it's not worth it and we should
>> change back to --strip-all? wdyt?
>
> That certainly seems more robust to me. I see the logic but with
> '--only-section' and no glob, I would expect that to be a rather quick
> operation but I am running out of time today to test and benchmark such
> a change. I will try to do it tomorrow unless someone beats me to it.

I got curious and did a little experiment. Basically, I ran perf stat
on this part of gen-btf.sh:

    echo "" | ${CC} ${CLANG_FLAGS} ${KBUILD_CFLAGS} -c -x c -o ${btf_data} -
    ${OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF \
        --set-section-flags .BTF=alloc,readonly ${btf_data}
    ${OBJCOPY} --only-section=.BTF --strip-all ${btf_data}

For comparison, I replaced the ${CC} command with:

    ${OBJCOPY} --strip-all "${ELF_FILE}" ${btf_data} 2>/dev/null

TL;DR is that using ${CC} is:
  * about 1.5x faster than GNU objcopy --strip-all .tmp_vmlinux1
  * about 16x (!) faster than llvm-objcopy --strip-all .tmp_vmlinux1

With obvious caveats that this is a particular machine (Threadripper
PRO 3975WX), toolchain etc:
  * clang version 21.1.7
  * gcc (GCC) 15.2.1 20251211

This is bpf-next (a069190b590e) with BPF CI-like kconfig.

Pasting perf stat output below.

# llvm-objcopy --strip-all
$ perf stat -r 31 -- ./gen-btf.o_strip.sh

 Performance counter stats for './gen-btf.o_strip.sh' (31 runs):

  1,300,945,256  task-clock:u              #  0.962 CPUs utilized          ( +- 0.10% )
              0  context-switches:u        #  0.000 /sec
              0  cpu-migrations:u          #  0.000 /sec
        327,311  page-faults:u             #  251.595 K/sec                ( +- 0.00% )
  1,532,927,570  instructions:u            #  1.33 insn per cycle
                                           #  0.03 stalled cycles per insn ( +- 0.00% )
  1,155,639,083  cycles:u                  #  0.888 GHz                    ( +- 0.18% )
     53,144,866  stalled-cycles-frontend:u #  4.60% frontend cycles idle   ( +- 0.99% )
    297,229,466  branches:u                #  228.472 M/sec                ( +- 0.00% )
        903,337  branch-misses:u           #  0.30% of all branches        ( +- 0.02% )

        1.35200 +- 0.00137 seconds time elapsed  ( +- 0.10% )

# GNU objcopy --strip-all
$ perf stat -r 31 -- ./gen-btf.o_strip.sh

 Performance counter stats for './gen-btf.o_strip.sh' (31 runs):

    119,747,488  task-clock:u              #  0.970 CPUs utilized          ( +- 0.41% )
              0  context-switches:u        #  0.000 /sec
              0  cpu-migrations:u          #  0.000 /sec
          9,186  page-faults:u             #  76.711 K/sec                 ( +- 0.01% )
    132,651,881  instructions:u            #  1.68 insn per cycle
                                           #  0.08 stalled cycles per insn ( +- 0.00% )
     79,191,259  cycles:u                  #  0.661 GHz                    ( +- 1.06% )
     10,136,981  stalled-cycles-frontend:u #  12.80% frontend cycles idle  ( +- 2.58% )
     28,422,807  branches:u                #  237.356 M/sec                ( +- 0.00% )
        354,981  branch-misses:u           #  1.25% of all branches        ( +- 0.02% )

       0.123415 +- 0.000564 seconds time elapsed  ( +- 0.46% )

# echo "" | clang ...
$ perf stat -r 31 -- ./gen-btf.o_llvm.sh

 Performance counter stats for './gen-btf.o_llvm.sh' (31 runs):

     62,107,490  task-clock:u              #  0.774 CPUs utilized          ( +- 0.31% )
              0  context-switches:u        #  0.000 /sec
              0  cpu-migrations:u          #  0.000 /sec
          9,755  page-faults:u             #  157.066 K/sec                ( +- 0.01% )
     88,196,854  instructions:u            #  1.18 insn per cycle
                                           #  0.19 stalled cycles per insn ( +- 0.00% )
     74,944,793  cycles:u                  #  1.207 GHz                    ( +- 0.50% )
     16,494,448  stalled-cycles-frontend:u #  22.01% frontend cycles idle  ( +- 0.48% )
     17,914,949  branches:u                #  288.451 M/sec                ( +- 0.00% )
        459,548  branch-misses:u           #  2.57% of all branches        ( +- 0.10% )

       0.080237 +- 0.000313 seconds time elapsed  ( +- 0.39% )

# echo "" | gcc ...
$ perf stat -r 31 -- ./gen-btf.o_gnu.sh

 Performance counter stats for './gen-btf.o_gnu.sh' (31 runs):

     53,683,797  task-clock:u              #  0.770 CPUs utilized          ( +- 0.33% )
              0  context-switches:u        #  0.000 /sec
              0  cpu-migrations:u          #  0.000 /sec
          8,390  page-faults:u             #  156.286 K/sec                ( +- 0.01% )
     69,398,474  instructions:u            #  1.22 insn per cycle
                                           #  0.17 stalled cycles per insn ( +- 0.00% )
     56,763,954  cycles:u                  #  1.057 GHz                    ( +- 0.39% )
     12,103,546  stalled-cycles-frontend:u #  21.32% frontend cycles idle  ( +- 0.47% )
     14,064,366  branches:u                #  261.985 M/sec                ( +- 0.00% )
        347,383  branch-misses:u           #  2.47% of all branches        ( +- 0.09% )

       0.069735 +- 0.000253 seconds time elapsed  ( +- 0.36% )
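
For anyone who wants to re-derive the ratios in the TL;DR, the mean
elapsed times above can be divided directly. This is only a minimal
sketch: the variable names and the ratio() helper are made up for
illustration, and the four timings are the only values taken from the
perf stat output in this message.

```shell
#!/bin/sh
# Mean "seconds time elapsed" values copied from the perf stat runs above.
llvm_objcopy=1.35200   # llvm-objcopy --strip-all
gnu_objcopy=0.123415   # GNU objcopy --strip-all
clang_cc=0.080237      # echo "" | clang ...
gcc_cc=0.069735        # echo "" | gcc ...

# ratio A B: print A / B with two decimals (hypothetical helper).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", a / b }'
}

echo "GNU objcopy  / clang: $(ratio "$gnu_objcopy" "$clang_cc")x"   # 1.54x
echo "llvm-objcopy / clang: $(ratio "$llvm_objcopy" "$clang_cc")x"  # 16.85x
echo "GNU objcopy  / gcc:   $(ratio "$gnu_objcopy" "$gcc_cc")x"     # 1.77x
echo "llvm-objcopy / gcc:   $(ratio "$llvm_objcopy" "$gcc_cc")x"    # 19.39x
```

These are ratios of wall-clock means from a single machine and
toolchain, so they carry the same caveats as the measurements
themselves.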
>
> Cheers,
> Nathan