* [PATCH dwarves v2 00/10] pahole: shared ELF and faster reproducible BTF encoding
From: Ihor Solodrai @ 2024-12-13 22:36 UTC
  To: dwarves; +Cc: acme, alan.maguire, eddyz87, andrii, mykolal, bpf

This is v2 of the patchset that aims to speed up parallel BTF encoding
when the reproducible_build flag is set (see link [1]).

In comparison to v1:
  * patch #2 adding section-relative addresses to elf_functions is
    removed as unrelated [2]
  * patch #9 [3] is replaced with patches #8, #9 and #10 (the biggest
    and most important in this series)

Patch #10 rewrites the multithreading implementation to use a
job/worker model. See the commit message for details.
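
As a rough illustration of what a job/worker model generally looks
like (this is not code from the patch; every name below is
hypothetical, and the "work" is a placeholder): a fixed set of worker
threads repeatedly claims the next job from a shared, mutex-protected
queue until it is drained. Each result is stored in its job's own
slot, so the final output order does not depend on thread scheduling.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_JOBS    16
#define NR_WORKERS 4

/* Hypothetical job: stands in for one unit of work, e.g. one CU. */
struct job {
	int input;
	int output;
};

struct job_queue {
	pthread_mutex_t lock;
	struct job *jobs;
	int nr_jobs;
	int next;	/* index of the next unclaimed job */
};

/* Claim the next job under the lock; NULL once the queue is drained. */
static struct job *job_queue__pop(struct job_queue *q)
{
	struct job *job = NULL;

	pthread_mutex_lock(&q->lock);
	if (q->next < q->nr_jobs)
		job = &q->jobs[q->next++];
	pthread_mutex_unlock(&q->lock);
	return job;
}

/* Worker loop: keep claiming jobs until none are left. */
static void *worker_fn(void *arg)
{
	struct job_queue *q = arg;
	struct job *job;

	while ((job = job_queue__pop(q)) != NULL)
		job->output = job->input * job->input;	/* placeholder work */
	return NULL;
}

/* Run all jobs on up to NR_WORKERS threads; returns 0 on success. */
static int run_jobs(struct job *jobs, int nr_jobs)
{
	struct job_queue q = { .jobs = jobs, .nr_jobs = nr_jobs, .next = 0 };
	pthread_t workers[NR_WORKERS];
	int i, started = 0, err = 0;

	assert(jobs != NULL && nr_jobs >= 0);
	pthread_mutex_init(&q.lock, NULL);
	for (i = 0; i < NR_WORKERS; i++) {
		if (pthread_create(&workers[i], NULL, worker_fn, &q) != 0) {
			err = -1;
			break;
		}
		started++;
	}
	/* Join only the threads that were actually created. */
	for (i = 0; i < started; i++)
		pthread_join(workers[i], NULL);
	pthread_mutex_destroy(&q.lock);
	return err;
}
```

Because workers pull jobs on demand rather than being assigned a
fixed share up front, a slow job does not leave other threads idle.
The sketch needs to be built with -pthread.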

./tests/tests passes with a vmlinux built on bpf-next.

I also confirmed that the reproducible bpftool dump of BTF produced
for vmlinux is identical between this patch series and pahole/next.

With this patch series, the performance of parallel BTF encoding is
comparable to non-reproducible runs on pahole/next. Depending on the
number of threads and the allowed memory usage (indirectly controlled
by the max_decoded_cus parameter of the queue in dwarf_loader.c), it
may be a little slower or a little faster.

Note that at -j24 this series uses significantly fewer CPU cycles than
non-reproducible pahole/next (about 46.8G vs. 60.4G cycles:u), although
its wall-clock time is somewhat greater (about 2.37 s vs. 2.20 s), as
reported by perf.

See sample measurements below (host nproc=24).

This patch series (always reproducible)

    -j1 mem 842020 Kb, time 6.31 sec
    -j3 mem 864604 Kb, time 2.90 sec
    -j6 mem 927760 Kb, time 2.21 sec
    -j12 mem 1026616 Kb, time 2.29 sec
    -j24 mem 1188448 Kb, time 2.36 sec
    -j48 mem 1462656 Kb, time 2.48 sec

     Performance counter stats for '/home/theihor/dev/dwarves/build/pahole -J -j24 --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs,reproducible_build --btf_encode_detached=/dev/null --lang_exclude=rust /home/theihor/git/kernel.org/bpf-next/kbuild-output/.tmp_vmlinux1' (13 runs):

        46,771,092,586      cycles:u                                                                ( +-  0.17% )

               2.36785 +- 0.00503 seconds time elapsed  ( +-  0.21% )

pahole/next (1cb4202) non-reproducible

    -j1 mem 834004 Kb, time 6.25 sec
    -j3 mem 976480 Kb, time 3.21 sec
    -j6 mem 1081432 Kb, time 2.36 sec
    -j12 mem 1161252 Kb, time 2.07 sec
    -j24 mem 1303060 Kb, time 2.13 sec
    -j48 mem 1537800 Kb, time 2.39 sec

     Performance counter stats for '/home/theihor/dev/dwarves/build/pahole -J -j24 --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_encode_detached=/dev/null --lang_exclude=rust /home/theihor/git/kernel.org/bpf-next/kbuild-output/.tmp_vmlinux1' (13 runs):

        60,436,382,442      cycles:u                                                                ( +-  0.22% )

                2.2024 +- 0.0151 seconds time elapsed  ( +-  0.68% )

pahole/next (1cb4202) reproducible

    -j1 mem 4745764 Kb, time 7.64 sec
    -j3 mem 4744556 Kb, time 3.95 sec
    -j6 mem 4744592 Kb, time 2.98 sec
    -j12 mem 4744680 Kb, time 2.99 sec
    -j24 mem 4745252 Kb, time 2.99 sec
    -j48 mem 4744520 Kb, time 2.98 sec

     Performance counter stats for '/home/theihor/dev/dwarves/build/pahole -J -j24 --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs,reproducible_build --btf_encode_detached=/dev/null --lang_exclude=rust /home/theihor/git/kernel.org/bpf-next/kbuild-output/.tmp_vmlinux1' (13 runs):

        38,155,725,721      cycles:u                                                                ( +-  0.29% )

               3.00290 +- 0.00501 seconds time elapsed  ( +-  0.17% )

[1] https://lore.kernel.org/dwarves/20241128012341.4081072-1-ihor.solodrai@pm.me/
[2] https://lore.kernel.org/dwarves/20241128012341.4081072-3-ihor.solodrai@pm.me/
[3] https://lore.kernel.org/dwarves/20241128012341.4081072-10-ihor.solodrai@pm.me/

Alan Maguire (2):
  btf_encoder: simplify function encoding
  btf_encoder: separate elf function, saved function representations

Ihor Solodrai (8):
  dwarf_loader: introduce pre_load_module hook to conf_load
  btf_encoder: introduce elf_functions struct type
  btf_encoder: collect elf_functions in btf_encoder__pre_load_module
  btf_encoder: switch to shared elf_functions table
  btf_encoder: introduce btf_encoding_context
  btf_encoder: remove skip_encoding_inconsistent_proto
  dwarf_loader: introduce cu->id
  dwarf_loader: multithreading with a job/worker model

 btf_encoder.c               | 639 +++++++++++++++++++++---------------
 btf_encoder.h               |   8 +-
 btf_loader.c                |   2 +-
 ctf_loader.c                |   2 +-
 dwarf_loader.c              | 352 ++++++++++++++------
 dwarves.c                   |  44 ---
 dwarves.h                   |  21 +-
 pahole.c                    | 237 +++----------
 pdwtags.c                   |   3 +-
 pfunct.c                    |   3 +-
 tests/reproducible_build.sh |   5 +-
 11 files changed, 685 insertions(+), 631 deletions(-)

-- 
2.47.1



Thread overview: 30+ messages
2024-12-13 22:36 [PATCH dwarves v2 00/10] pahole: shared ELF and faster reproducible BTF encoding Ihor Solodrai
2024-12-13 22:36 ` [PATCH dwarves v2 01/10] btf_encoder: simplify function encoding Ihor Solodrai
2024-12-13 22:36 ` [PATCH dwarves v2 02/10] btf_encoder: separate elf function, saved function representations Ihor Solodrai
2024-12-19 14:59   ` Jiri Olsa
2024-12-13 22:36 ` [PATCH dwarves v2 03/10] dwarf_loader: introduce pre_load_module hook to conf_load Ihor Solodrai
2024-12-13 22:37 ` [PATCH dwarves v2 04/10] btf_encoder: introduce elf_functions struct type Ihor Solodrai
2024-12-13 22:37 ` [PATCH dwarves v2 05/10] btf_encoder: collect elf_functions in btf_encoder__pre_load_module Ihor Solodrai
2024-12-13 22:37 ` [PATCH dwarves v2 06/10] btf_encoder: switch to shared elf_functions table Ihor Solodrai
2024-12-19 14:58   ` Jiri Olsa
2024-12-19 19:06     ` Ihor Solodrai
2024-12-13 22:37 ` [PATCH dwarves v2 07/10] btf_encoder: introduce btf_encoding_context Ihor Solodrai
2024-12-17  2:39   ` Eduard Zingerman
2024-12-17  3:15     ` Eduard Zingerman
2024-12-17 18:06       ` Ihor Solodrai
2024-12-18  0:03         ` Andrii Nakryiko
2024-12-18  0:40           ` Eduard Zingerman
2024-12-18 20:07             ` Ihor Solodrai
2024-12-19 14:59             ` Jiri Olsa
2024-12-13 22:37 ` [PATCH dwarves v2 08/10] btf_encoder: remove skip_encoding_inconsistent_proto Ihor Solodrai
2024-12-13 22:37 ` [PATCH dwarves v2 09/10] dwarf_loader: introduce cu->id Ihor Solodrai
2024-12-13 22:37 ` [PATCH dwarves v2 10/10] dwarf_loader: multithreading with a job/worker model Ihor Solodrai
2024-12-17  0:57   ` Eduard Zingerman
2024-12-17 18:12     ` Ihor Solodrai
2024-12-19 14:59     ` Jiri Olsa
2024-12-17  2:14   ` Eduard Zingerman
2024-12-19 14:59   ` Jiri Olsa
2024-12-19 19:31     ` Ihor Solodrai
2024-12-20  9:25       ` Jiri Olsa
2024-12-17  7:00 ` [PATCH dwarves v2 00/10] pahole: shared ELF and faster reproducible BTF encoding Eduard Zingerman
2024-12-20 12:31   ` Jiri Olsa
