public inbox for llvm@lists.linux.dev
From: Ihor Solodrai <ihor.solodrai@linux.dev>
To: Nathan Chancellor <nathan@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Eduard Zingerman <eddyz87@gmail.com>,
	Yonghong Song <yonghong.song@linux.dev>,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	llvm@lists.linux.dev
Subject: Re: [PATCH bpf-next] scripts/gen-btf.sh: Disable LTO when generating initial .o file
Date: Mon, 5 Jan 2026 17:06:49 -0800
Message-ID: <6908562f-4a99-44ea-bffb-19f33fcffe83@linux.dev>
In-Reply-To: <20260105234605.GB1276749@ax162>

On 1/5/26 3:46 PM, Nathan Chancellor wrote:
> On Mon, Jan 05, 2026 at 02:01:36PM -0800, Ihor Solodrai wrote:
>> Hi Nathan, thank you for the patch.
>>
>> I'm starting to think it wasn't a good idea to do
>>
>> 	echo "" | ${CC} ...
>>
>> here, given the number of associated bugs.
> 
> Yeah, I was wondering if a lack of KBUILD_CPPFLAGS would also be a
> problem since that contains the endianness flag for some targets. I
> cannot imagine any more issues than that but I can understand wanting to
> back out of it.
> 
>> Before gen-btf.sh was introduced, the .btf.o binary was generated with this [1]:
>>
>> 	${OBJCOPY} --only-section=.BTF --set-section-flags .BTF=alloc,readonly \
>> 		--strip-all ${1} "${btf_data}" 2>/dev/null
>>
>> I changed to ${CC} on the assumption it's a quicker operation than
>> stripping entire vmlinux. But maybe it's not worth it and we should
>> change back to --strip-all? wdyt?
> 
> That certainly seems more robust to me. I see the logic but with
> '--only-section' and no glob, I would expect that to be a rather quick
> operation but I am running out of time today to test and benchmark such
> a change. I will try to do it tomorrow unless someone beats me to it.

I got curious and did a little experiment. Basically, I ran perf stat
on this part of gen-btf.sh:

	echo "" | ${CC} ${CLANG_FLAGS} ${KBUILD_CFLAGS} -c -x c -o ${btf_data} -
	${OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF \
		--set-section-flags .BTF=alloc,readonly ${btf_data}
	${OBJCOPY} --only-section=.BTF --strip-all ${btf_data}

For comparison, I replaced the ${CC} command with:

	${OBJCOPY} --strip-all "${ELF_FILE}" ${btf_data} 2>/dev/null

TL;DR is that using ${CC} is:
  * about 1.5x faster than GNU objcopy --strip-all .tmp_vmlinux1
  * about 16x (!) faster than llvm-objcopy --strip-all .tmp_vmlinux1

With obvious caveats that this is a particular machine (Threadripper
PRO 3975WX), toolchain etc:
  * clang version 21.1.7
  * gcc (GCC) 15.2.1 20251211

This is bpf-next (a069190b590e) with BPF CI-like kconfig.

Pasting perf stat output below.


# llvm-objcopy --strip-all
$ perf stat -r 31 -- ./gen-btf.o_strip.sh

 Performance counter stats for './gen-btf.o_strip.sh' (31 runs):

     1,300,945,256      task-clock:u                     #    0.962 CPUs utilized               ( +-  0.10% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
           327,311      page-faults:u                    #  251.595 K/sec                       ( +-  0.00% )
     1,532,927,570      instructions:u                   #    1.33  insn per cycle            
                                                  #    0.03  stalled cycles per insn     ( +-  0.00% )
     1,155,639,083      cycles:u                         #    0.888 GHz                         ( +-  0.18% )
        53,144,866      stalled-cycles-frontend:u        #    4.60% frontend cycles idle        ( +-  0.99% )
       297,229,466      branches:u                       #  228.472 M/sec                       ( +-  0.00% )
           903,337      branch-misses:u                  #    0.30% of all branches             ( +-  0.02% )

           1.35200 +- 0.00137 seconds time elapsed  ( +-  0.10% )


# GNU objcopy --strip-all
$ perf stat -r 31 -- ./gen-btf.o_strip.sh

 Performance counter stats for './gen-btf.o_strip.sh' (31 runs):

       119,747,488      task-clock:u                     #    0.970 CPUs utilized               ( +-  0.41% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
             9,186      page-faults:u                    #   76.711 K/sec                       ( +-  0.01% )
       132,651,881      instructions:u                   #    1.68  insn per cycle            
                                                  #    0.08  stalled cycles per insn     ( +-  0.00% )
        79,191,259      cycles:u                         #    0.661 GHz                         ( +-  1.06% )
        10,136,981      stalled-cycles-frontend:u        #   12.80% frontend cycles idle        ( +-  2.58% )
        28,422,807      branches:u                       #  237.356 M/sec                       ( +-  0.00% )
           354,981      branch-misses:u                  #    1.25% of all branches             ( +-  0.02% )

          0.123415 +- 0.000564 seconds time elapsed  ( +-  0.46% )


# echo "" | clang ...
$ perf stat -r 31 -- ./gen-btf.o_llvm.sh

 Performance counter stats for './gen-btf.o_llvm.sh' (31 runs):

        62,107,490      task-clock:u                     #    0.774 CPUs utilized               ( +-  0.31% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
             9,755      page-faults:u                    #  157.066 K/sec                       ( +-  0.01% )
        88,196,854      instructions:u                   #    1.18  insn per cycle            
                                                  #    0.19  stalled cycles per insn     ( +-  0.00% )
        74,944,793      cycles:u                         #    1.207 GHz                         ( +-  0.50% )
        16,494,448      stalled-cycles-frontend:u        #   22.01% frontend cycles idle        ( +-  0.48% )
        17,914,949      branches:u                       #  288.451 M/sec                       ( +-  0.00% )
           459,548      branch-misses:u                  #    2.57% of all branches             ( +-  0.10% )

          0.080237 +- 0.000313 seconds time elapsed  ( +-  0.39% )


# echo "" | gcc ...
$ perf stat -r 31 -- ./gen-btf.o_gnu.sh

 Performance counter stats for './gen-btf.o_gnu.sh' (31 runs):

        53,683,797      task-clock:u                     #    0.770 CPUs utilized               ( +-  0.33% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
             8,390      page-faults:u                    #  156.286 K/sec                       ( +-  0.01% )
        69,398,474      instructions:u                   #    1.22  insn per cycle            
                                                  #    0.17  stalled cycles per insn     ( +-  0.00% )
        56,763,954      cycles:u                         #    1.057 GHz                         ( +-  0.39% )
        12,103,546      stalled-cycles-frontend:u        #   21.32% frontend cycles idle        ( +-  0.47% )
        14,064,366      branches:u                       #  261.985 M/sec                       ( +-  0.00% )
           347,383      branch-misses:u                  #    2.47% of all branches             ( +-  0.09% )

          0.069735 +- 0.000253 seconds time elapsed  ( +-  0.36% )
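
As a sanity check on the TL;DR ratios, here is a minimal sketch that pulls
the mean elapsed time out of perf stat text and divides two of them. The
helper names (elapsed_seconds, ratio) are made up for illustration and are
not part of gen-btf.sh. Pairing each objcopy with the compiler from the same
toolchain is an assumption about how to read the four runs above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only, not part of gen-btf.sh.

# elapsed_seconds: read perf stat text on stdin, print the mean elapsed time.
elapsed_seconds() {
	awk '/seconds time elapsed/ { print $1 }'
}

# ratio A B: print A / B rounded to two decimals.
ratio() {
	awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Mean elapsed times (seconds) copied from the four runs above.
llvm_objcopy=1.35200
gnu_objcopy=0.123415
clang_cc=0.080237
gcc_cc=0.069735

echo "llvm-objcopy / clang: $(ratio "$llvm_objcopy" "$clang_cc")"
echo "GNU objcopy / gcc:    $(ratio "$gnu_objcopy" "$gcc_cc")"
echo "GNU objcopy / clang:  $(ratio "$gnu_objcopy" "$clang_cc")"
```

With these inputs the ratios come out at roughly 16.9, 1.8 and 1.5. Since
perf stat prints its summary to stderr, something like
`perf stat -r 31 -- ./gen-btf.o_llvm.sh 2>&1 | elapsed_seconds` should feed
the first helper, assuming the script itself prints nothing that matches.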


> 
> Cheers,
> Nathan

