From: Ihor Solodrai <ihor.solodrai@linux.dev>
To: Nathan Chancellor <nathan@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Eduard Zingerman <eddyz87@gmail.com>,
	Yonghong Song <yonghong.song@linux.dev>,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	llvm@lists.linux.dev
Subject: Re: [PATCH bpf-next] scripts/gen-btf.sh: Disable LTO when generating initial .o file
Date: Mon, 5 Jan 2026 17:06:49 -0800
Message-ID: <6908562f-4a99-44ea-bffb-19f33fcffe83@linux.dev>
In-Reply-To: <20260105234605.GB1276749@ax162>

On 1/5/26 3:46 PM, Nathan Chancellor wrote:
> On Mon, Jan 05, 2026 at 02:01:36PM -0800, Ihor Solodrai wrote:
>> Hi Nathan, thank you for the patch.
>>
>> I'm starting to think it wasn't a good idea to do
>>
>> 	echo "" | ${CC} ...
>>
>> here, given the number of associated bugs.
> 
> Yeah, I was wondering if a lack of KBUILD_CPPFLAGS would also be a
> problem since that contains the endianness flag for some targets. I
> cannot imagine any more issues than that but I can understand wanting to
> back out of it.
> 
>> Before gen-btf.sh was introduced, the .btf.o binary was generated with this [1]:
>>
>> 	${OBJCOPY} --only-section=.BTF --set-section-flags .BTF=alloc,readonly \
>> 		--strip-all ${1} "${btf_data}" 2>/dev/null
>>
>> I changed to ${CC} on the assumption it's a quicker operation than
>> stripping entire vmlinux. But maybe it's not worth it and we should
>> change back to --strip-all? wdyt?
> 
> That certainly seems more robust to me. I see the logic but with
> '--only-section' and no glob, I would expect that to be a rather quick
> operation but I am running out of time today to test and benchmark such
> a change. I will try to do it tomorrow unless someone beats me to it.

I got curious and did a little experiment. Basically, I ran perf stat
on this part of gen-btf.sh:

	echo "" | ${CC} ${CLANG_FLAGS} ${KBUILD_CFLAGS} -c -x c -o ${btf_data} -
	${OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF \
		--set-section-flags .BTF=alloc,readonly ${btf_data}
	${OBJCOPY} --only-section=.BTF --strip-all ${btf_data}

For comparison, I replaced the ${CC} command with:

	${OBJCOPY} --strip-all "${ELF_FILE}" ${btf_data} 2>/dev/null

TL;DR is that using ${CC} is:
  * about 1.5x faster than GNU objcopy --strip-all .tmp_vmlinux1
  * about 16x (!) faster than llvm-objcopy --strip-all .tmp_vmlinux1

With the obvious caveat that this is one particular machine
(Threadripper PRO 3975WX) and toolchain:
  * clang version 21.1.7
  * gcc (GCC) 15.2.1 20251211

This is bpf-next (a069190b590e) with BPF CI-like kconfig.

Pasting perf stat output below.


# llvm-objcopy --strip-all
$ perf stat -r 31 -- ./gen-btf.o_strip.sh

 Performance counter stats for './gen-btf.o_strip.sh' (31 runs):

     1,300,945,256      task-clock:u                     #    0.962 CPUs utilized               ( +-  0.10% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
           327,311      page-faults:u                    #  251.595 K/sec                       ( +-  0.00% )
     1,532,927,570      instructions:u                   #    1.33  insn per cycle            
                                                  #    0.03  stalled cycles per insn     ( +-  0.00% )
     1,155,639,083      cycles:u                         #    0.888 GHz                         ( +-  0.18% )
        53,144,866      stalled-cycles-frontend:u        #    4.60% frontend cycles idle        ( +-  0.99% )
       297,229,466      branches:u                       #  228.472 M/sec                       ( +-  0.00% )
           903,337      branch-misses:u                  #    0.30% of all branches             ( +-  0.02% )

           1.35200 +- 0.00137 seconds time elapsed  ( +-  0.10% )


# GNU objcopy --strip-all
$ perf stat -r 31 -- ./gen-btf.o_strip.sh

 Performance counter stats for './gen-btf.o_strip.sh' (31 runs):

       119,747,488      task-clock:u                     #    0.970 CPUs utilized               ( +-  0.41% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
             9,186      page-faults:u                    #   76.711 K/sec                       ( +-  0.01% )
       132,651,881      instructions:u                   #    1.68  insn per cycle            
                                                  #    0.08  stalled cycles per insn     ( +-  0.00% )
        79,191,259      cycles:u                         #    0.661 GHz                         ( +-  1.06% )
        10,136,981      stalled-cycles-frontend:u        #   12.80% frontend cycles idle        ( +-  2.58% )
        28,422,807      branches:u                       #  237.356 M/sec                       ( +-  0.00% )
           354,981      branch-misses:u                  #    1.25% of all branches             ( +-  0.02% )

          0.123415 +- 0.000564 seconds time elapsed  ( +-  0.46% )


# echo "" | clang ...
$ perf stat -r 31 -- ./gen-btf.o_llvm.sh

 Performance counter stats for './gen-btf.o_llvm.sh' (31 runs):

        62,107,490      task-clock:u                     #    0.774 CPUs utilized               ( +-  0.31% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
             9,755      page-faults:u                    #  157.066 K/sec                       ( +-  0.01% )
        88,196,854      instructions:u                   #    1.18  insn per cycle            
                                                  #    0.19  stalled cycles per insn     ( +-  0.00% )
        74,944,793      cycles:u                         #    1.207 GHz                         ( +-  0.50% )
        16,494,448      stalled-cycles-frontend:u        #   22.01% frontend cycles idle        ( +-  0.48% )
        17,914,949      branches:u                       #  288.451 M/sec                       ( +-  0.00% )
           459,548      branch-misses:u                  #    2.57% of all branches             ( +-  0.10% )

          0.080237 +- 0.000313 seconds time elapsed  ( +-  0.39% )


# echo "" | gcc ...
$ perf stat -r 31 -- ./gen-btf.o_gnu.sh

 Performance counter stats for './gen-btf.o_gnu.sh' (31 runs):

        53,683,797      task-clock:u                     #    0.770 CPUs utilized               ( +-  0.33% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
             8,390      page-faults:u                    #  156.286 K/sec                       ( +-  0.01% )
        69,398,474      instructions:u                   #    1.22  insn per cycle            
                                                  #    0.17  stalled cycles per insn     ( +-  0.00% )
        56,763,954      cycles:u                         #    1.057 GHz                         ( +-  0.39% )
        12,103,546      stalled-cycles-frontend:u        #   21.32% frontend cycles idle        ( +-  0.47% )
        14,064,366      branches:u                       #  261.985 M/sec                       ( +-  0.00% )
           347,383      branch-misses:u                  #    2.47% of all branches             ( +-  0.09% )

          0.069735 +- 0.000253 seconds time elapsed  ( +-  0.36% )
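
For reference, the ratios behind the TL;DR can be recomputed from the
"seconds time elapsed" lines above. This is only a rough sketch: the
speedup helper is a hypothetical name, not something in gen-btf.sh, and
the four values are the mean elapsed times copied from the perf stat
runs.

```shell
#!/bin/sh
# Hypothetical helper (not part of gen-btf.sh): print slow/fast
# rounded to two decimals. awk is used for the floating-point division.
speedup() {
	awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

# Mean elapsed seconds copied from the perf stat output above.
llvm_objcopy=1.35200
gnu_objcopy=0.123415
clang_cc=0.080237
gcc_cc=0.069735

echo "llvm-objcopy vs clang: $(speedup "$llvm_objcopy" "$clang_cc")x"
echo "GNU objcopy  vs clang: $(speedup "$gnu_objcopy" "$clang_cc")x"
echo "llvm-objcopy vs gcc:   $(speedup "$llvm_objcopy" "$gcc_cc")x"
echo "GNU objcopy  vs gcc:   $(speedup "$gnu_objcopy" "$gcc_cc")x"
```

Against clang this gives roughly 1.5x and 17x; against gcc the gap is a
bit wider, roughly 1.8x and 19x.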


> 
> Cheers,
> Nathan
