public inbox for dwarves@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/9] pahole: shared ELF and faster reproducible BTF encoding
@ 2024-11-28  1:23 Ihor Solodrai
  2024-11-28  1:23 ` [RFC PATCH 1/9] btf_encoder: simplify function encoding Ihor Solodrai
                   ` (9 more replies)
  0 siblings, 10 replies; 26+ messages in thread
From: Ihor Solodrai @ 2024-11-28  1:23 UTC (permalink / raw)
  To: dwarves, acme; +Cc: bpf, alan.maguire, eddyz87, andrii, mykolal

This patch series continues the work previously focused on sharing of
the ELF functions table between BTF encoders [1].

A recap:
* I've been looking into ways to improve the performance of
  reproducible (--btf_features=reproducible_build) DWARF->BTF
  encoding in pahole [2].
* Originally, Eduard Zingerman suggested to me an approach of creating
  btf_encoder for each compilation unit, and then merging resulting
  BTFs in a particular order.
* This idea required significant changes in how BTF encoders access
  ELF information, which led to the "shared elf_functions" patch
  series [1].
* During review Alan Maguire (and Eduard too, off-list) pointed out
  [3] that shared elf_functions implementation can be simplified, if
  infividual function information is split into immutable (ELF part)
  and mutable (BTF encoding part). Alan also shared a draft of this
  change [4]. Three commits from that draft are included in this patch
  series unchanged.

At that point I had all the pre-requisites to try "btf_encoder per CU"
idea, and so I did [5]. Unfortunately, it turned out that this
approach significantly slows down the encoding, although it reduces
memory usage for parallel reproducible encoding.

Simple measurement script (courtesy of Eduard):

    #!/bin/bash
    for j in 1 4 16 64; do
        /usr/bin/time -f "jobs ${j}, mem %M Kb, time %e sec" \
                      ./build/pahole -J -j$j \
                             --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs,reproducible_build \
                             --btf_encode_detached=/dev/null \
                             --lang_exclude=rust \
                             $vmlinux > /dev/null
    done

Output for [5]:
    jobs 1, mem 1225652 Kb, time 10.37 sec
    jobs 4, mem 1242000 Kb, time 5.44 sec
    jobs 16, mem 1271692 Kb, time 4.33 sec
    jobs 64, mem 1410928 Kb, time 4.32 sec

Output for pahole/next (98d1f01):
    jobs 1, mem 3378104 Kb, time 8.76 sec
    jobs 4, mem 3378716 Kb, time 4.17 sec
    jobs 16, mem 3378144 Kb, time 4.03 sec
    jobs 64, mem 3378628 Kb, time 4.05 sec

The main reason for this, as far as I understand, is that each
individual encoder now needs to do more work. And on top of that,
there are more encoders to merge, which increases load on the
sequential part of a pahole run.

For example, mutable BTF maintains a string set to avoid string
duplication, but when an encoder is only concerned with a single CU
out of 2k+, this optimization becomes moot.

While disappointing, these observations prompted me to look closer on
how exacly is pahole spending the time when BTF encoding in parallel
[6]. I looked at the version with shared elf_functions table [7] on
the assumption that it will land eventually.

Here is a very rough high level breakdown (for vmlinux as input):
  * 81% multithreaded part
    * 64% loading DWARF
    * 17% encoding BTF
  * 20% sequential part
    * 14% BTF dedup
    *  6% BTF merge and everything else

The current encoding algorithm of each thread goes roughly like this:
    * each thread has it's own btf_encoder object
    * a thread reads a CU from DWARF and converts it into the internal
      representation (struct cu), this is what takes 60%+ of time
    * created CU is then passed to btf_encoder__encode_cu
    * when the encoding is complete, CU is deleted (this is important
      for reducing memory footprint)
    * when all CUs are processed, exit and proceed to single-threaded
      BTF merge and dedup

Overall, this is pretty fast. Except, when we need the resulting BTF
to be deterministic (aka reproducible).

Parallel reproducible encoding is implemented by enforcing the order
of CU processing between threads. How? Well, a thread is only allowed
to encode a CU in CU__LOADED state, and BTF encoding happens under a
lock. Which means that the 17% part in the breakdown above is not
actually multithreaded with reproducible_build flag.

To summarize the observations:
 * BTF encoding is actually a minor part of the multithreaded work
   (DWARF loading is about 4x bigger)
 * BTF encoding requires sequential post-processing (merge and dedup)
 * BTF processing in libbpf is kinda faster when it's one huge BTF
 * BTF encoding is sequential with reproducible_build anyway

So why try so hard making BTF encoding multithreaded?..

This led me to an idea: what if we leave BTF encoding part sequential,
but implement it in a way that does not stall DWARF loading? In other
words: let's have a single CU consumer and N-1 CU producers.

Producers are slower anyway, and we can also now get rid of the BTF
merge almost entirely (except for "add saved functions").

This is implemented by the last patch in this series, and the
measurements look promising (see the patch commit message).

Test results for this patch series:

  1: Validation of BTF encoding of functions; this may take some time: Ok
  2: Default BTF on a system without BTF: Ok
  3: Flexible arrays accounting: WARNING: still unsuported BTF_KIND_DECL_TAG(bpf_fastcall) for bpf_cast_to_kern_ctx already with attribute (bpf_kfunc), ignoring
WARNING: still unsuported BTF_KIND_DECL_TAG(bpf_fastcall) for bpf_rdonly_cast already with attribute (bpf_kfunc), ignoring
pahole: type 'nft_pipapo_elem' not found
pahole: type 'ip6t_standard' not found
pahole: type 'ip6t_error' not found
pahole: type 'nft_rbtree_elem' not found
pahole: type 'nft_rule_dp_last' not found
pahole: type 'nft_bitmap_elem' not found
pahole: type 'fuse_direntplus' not found
pahole: type 'ipt_standard' not found
pahole: type 'ipt_error' not found
pahole: type 'tls_rec' not found
pahole: type 'nft_rhash_elem' not found
pahole: type 'nft_hash_elem' not found
Ok
  4: Pretty printing of files using DWARF type information: Ok
  5: Parallel reproducible DWARF Loading/Serial BTF encoding: Ok


[1]: https://lore.kernel.org/dwarves/20241016001025.857970-1-ihor.solodrai@pm.me/
[2]: https://github.com/theihor/dwarves/pull/3
[3]: https://lore.kernel.org/dwarves/8678ce40-3ce2-4ece-985b-a40427386d57@oracle.com/
[4]: https://github.com/acmel/dwarves/compare/master...alan-maguire:dwarves:elf-prep
[5]: https://github.com/theihor/dwarves/pull/8
[6]: https://gist.github.com/theihor/f000ce89427828e61fdaa567b332649b
[7]: https://github.com/theihor/dwarves/pull/8/commits/a7bc67d79d90f98776c6dc5fdaf9f088eb09909d


Alan Maguire (3):
  btf_encoder: simplify function encoding
  btf_encoder: store,use section-relative addresses in ELF function
    representation
  btf_encoder: separate elf function, saved function representations

Ihor Solodrai (6):
  dwarf_loader: introduce pre_load_module hook to conf_load
  btf_encoder: introduce elf_functions struct type
  btf_encoder: collect elf_functions in btf_encoder__pre_load_module
  btf_encoder: switch to shared elf_functions table
  btf_encoder: introduce btf_encoding_context
  pahole: faster reproducible BTF encoding

 btf_encoder.c  | 661 ++++++++++++++++++++++++++++++-------------------
 btf_encoder.h  |   6 +
 dwarf_loader.c |  18 +-
 dwarves.c      |  47 ++--
 dwarves.h      |  16 +-
 pahole.c       | 265 +++++++++-----------
 6 files changed, 567 insertions(+), 446 deletions(-)

-- 
2.47.0



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFC PATCH 1/9] btf_encoder: simplify function encoding
  2024-11-28  1:23 [RFC PATCH 0/9] pahole: shared ELF and faster reproducible BTF encoding Ihor Solodrai
@ 2024-11-28  1:23 ` Ihor Solodrai
  2024-11-29  8:16   ` Eduard Zingerman
  2024-12-02 13:55   ` Jiri Olsa
  2024-11-28  1:23 ` [RFC PATCH 2/9] btf_encoder: store,use section-relative addresses in ELF function representation Ihor Solodrai
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 26+ messages in thread
From: Ihor Solodrai @ 2024-11-28  1:23 UTC (permalink / raw)
  To: dwarves, acme; +Cc: bpf, alan.maguire, eddyz87, andrii, mykolal

From: Alan Maguire <alan.maguire@oracle.com>

Currently we have two modes of function encoding; one adds functions
based upon the first instance found and ignores inconsistent
representations.  The second saves function representations and later
finds inconsistencies.  The mode chosen is determined by
conf_load->skip_encoding_btf_inconsistent_proto.

The knock-on effect is that we need to support two modes in
btf_encoder__add_func(); one for each case.  Simplify by using
the "save function" approach for both cases; only difference is
that we allow inconsistent representations if
skip_encoding_btf_inconsistent_proto is not set (it is set by default
for upstream kernels and has been for a while).

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
---
 btf_encoder.c | 79 +++++++++++++++++----------------------------------
 1 file changed, 26 insertions(+), 53 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index e1adddf..98e4d7d 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -88,7 +88,6 @@ struct btf_encoder_func_state {
 struct elf_function {
 	const char	*name;
 	char		*alias;
-	bool		 generated;
 	size_t		prefixlen;
 	struct btf_encoder_func_state state;
 };
@@ -120,6 +119,7 @@ struct btf_encoder {
 			  force,
 			  gen_floats,
 			  skip_encoding_decl_tag,
+			  skip_encoding_inconsistent_proto,
 			  tag_kfuncs,
 			  gen_distilled_base;
 	uint32_t	  array_index_id;
@@ -1165,18 +1165,18 @@ out:
 	return err;
 }
 
-static int32_t btf_encoder__add_func(struct btf_encoder *encoder, struct function *fn,
+static int32_t btf_encoder__add_func(struct btf_encoder *encoder,
 				     struct elf_function *func)
 {
+	struct btf_encoder_func_state *state = &func->state;
 	int btf_fnproto_id, btf_fn_id, tag_type_id = 0;
 	int16_t component_idx = -1;
 	const char *name;
 	const char *value;
 	char tmp_value[KSYM_NAME_LEN];
+	uint16_t idx;
 
-	assert(fn != NULL || func != NULL);
-
-	btf_fnproto_id = btf_encoder__add_func_proto(encoder, fn ? &fn->proto : NULL, func);
+	btf_fnproto_id = btf_encoder__add_func_proto(encoder, NULL, func);
 	name = func->alias ?: func->name;
 	if (btf_fnproto_id >= 0)
 		btf_fn_id = btf_encoder__add_ref_type(encoder, BTF_KIND_FUNC, btf_fnproto_id,
@@ -1186,40 +1186,23 @@ static int32_t btf_encoder__add_func(struct btf_encoder *encoder, struct functio
 		       name, btf_fnproto_id < 0 ? "proto" : "func");
 		return -1;
 	}
-	if (!fn) {
-		struct btf_encoder_func_state *state = &func->state;
-		uint16_t idx;
-
-		if (state->nr_annots == 0)
-			return 0;
+	if (state->nr_annots == 0)
+		return 0;
 
-		for (idx = 0; idx < state->nr_annots; idx++) {
-			struct btf_encoder_func_annot *a = &state->annots[idx];
+	for (idx = 0; idx < state->nr_annots; idx++) {
+		struct btf_encoder_func_annot *a = &state->annots[idx];
 
-			value = btf__str_by_offset(encoder->btf, a->value);
-			/* adding BTF data may result in a mode of the
-			 * value string memory, so make a temporary copy.
-			 */
-			strncpy(tmp_value, value, sizeof(tmp_value) - 1);
-			component_idx = a->component_idx;
-
-			tag_type_id = btf_encoder__add_decl_tag(encoder, tmp_value,
-								btf_fn_id, component_idx);
-			if (tag_type_id < 0)
-				break;
-		}
-	} else {
-		struct llvm_annotation *annot;
-
-		list_for_each_entry(annot, &fn->annots, node) {
-			value = annot->value;
-			component_idx = annot->component_idx;
+		value = btf__str_by_offset(encoder->btf, a->value);
+		/* adding BTF data may result in a mode of the
+		 * value string memory, so make a temporary copy.
+		 */
+		strncpy(tmp_value, value, sizeof(tmp_value) - 1);
+		component_idx = a->component_idx;
 
-			tag_type_id = btf_encoder__add_decl_tag(encoder, value, btf_fn_id,
-								component_idx);
-			if (tag_type_id < 0)
-				break;
-		}
+		tag_type_id = btf_encoder__add_decl_tag(encoder, tmp_value,
+							btf_fn_id, component_idx);
+		if (tag_type_id < 0)
+			break;
 	}
 	if (tag_type_id < 0) {
 		fprintf(stderr,
@@ -1277,8 +1260,9 @@ static int btf_encoder__add_saved_funcs(struct btf_encoder *encoder)
 		 * just do not _use_ them.  Only exclude functions with
 		 * unexpected register use or multiple inconsistent prototypes.
 		 */
-		if (!state->unexpected_reg && !state->inconsistent_proto) {
-			if (btf_encoder__add_func(encoder, NULL, func))
+		if (!encoder->skip_encoding_inconsistent_proto ||
+		    (!state->unexpected_reg && !state->inconsistent_proto)) {
+			if (btf_encoder__add_func(encoder, func))
 				return -1;
 		}
 		state->processed = 1;
@@ -2339,6 +2323,7 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
 		encoder->force		 = conf_load->btf_encode_force;
 		encoder->gen_floats	 = conf_load->btf_gen_floats;
 		encoder->skip_encoding_decl_tag	 = conf_load->skip_encoding_btf_decl_tag;
+		encoder->skip_encoding_inconsistent_proto = conf_load->skip_encoding_btf_inconsistent_proto;
 		encoder->tag_kfuncs	 = conf_load->btf_decl_tag_kfuncs;
 		encoder->gen_distilled_base = conf_load->btf_gen_distilled_base;
 		encoder->verbose	 = verbose;
@@ -2544,7 +2529,6 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 
 	cu__for_each_function(cu, core_id, fn) {
 		struct elf_function *func = NULL;
-		bool save = false;
 
 		/*
 		 * Skip functions that:
@@ -2566,15 +2550,8 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 
 			/* prefer exact function name match... */
 			func = btf_encoder__find_function(encoder, name, 0);
-			if (func) {
-				if (func->generated)
-					continue;
-				if (conf_load->skip_encoding_btf_inconsistent_proto)
-					save = true;
-				else
-					func->generated = true;
-			} else if (encoder->functions.suffix_cnt &&
-				   conf_load->btf_gen_optimized) {
+			if (!func && encoder->functions.suffix_cnt &&
+			    conf_load->btf_gen_optimized) {
 				/* falling back to name.isra.0 match if no exact
 				 * match is found; only bother if we found any
 				 * .suffix function names.  The function
@@ -2585,7 +2562,6 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 				func = btf_encoder__find_function(encoder, name,
 								  strlen(name));
 				if (func) {
-					save = true;
 					if (encoder->verbose)
 						printf("matched function '%s' with '%s'%s\n",
 						       name, func->name,
@@ -2603,10 +2579,7 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 		if (!func)
 			continue;
 
-		if (save)
-			err = btf_encoder__save_func(encoder, fn, func);
-		else
-			err = btf_encoder__add_func(encoder, fn, func);
+		err = btf_encoder__save_func(encoder, fn, func);
 		if (err)
 			goto out;
 	}
-- 
2.47.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 2/9] btf_encoder: store,use section-relative addresses in ELF function representation
  2024-11-28  1:23 [RFC PATCH 0/9] pahole: shared ELF and faster reproducible BTF encoding Ihor Solodrai
  2024-11-28  1:23 ` [RFC PATCH 1/9] btf_encoder: simplify function encoding Ihor Solodrai
@ 2024-11-28  1:23 ` Ihor Solodrai
  2024-11-29  9:07   ` Eduard Zingerman
  2024-11-28  1:23 ` [RFC PATCH 3/9] btf_encoder: separate elf function, saved function representations Ihor Solodrai
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Ihor Solodrai @ 2024-11-28  1:23 UTC (permalink / raw)
  To: dwarves, acme; +Cc: bpf, alan.maguire, eddyz87, andrii, mykolal

From: Alan Maguire <alan.maguire@oracle.com>

This will help us do more accurate DWARF/ELF matching.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
---
 btf_encoder.c | 37 +++++++++++++++++++++++++++++++------
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index 98e4d7d..01d7094 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -88,6 +88,7 @@ struct btf_encoder_func_state {
 struct elf_function {
 	const char	*name;
 	char		*alias;
+	uint32_t	addr;
 	size_t		prefixlen;
 	struct btf_encoder_func_state state;
 };
@@ -131,6 +132,7 @@ struct btf_encoder {
 		int		    allocated;
 		int		    cnt;
 		int		    suffix_cnt; /* number of .isra, .part etc */
+		uint64_t	    base_addr;
 	} functions;
 };
 
@@ -1274,13 +1276,23 @@ static int functions_cmp(const void *_a, const void *_b)
 {
 	const struct elf_function *a = _a;
 	const struct elf_function *b = _b;
+	int ret;
 
 	/* if search key allows prefix match, verify target has matching
 	 * prefix len and prefix matches.
 	 */
 	if (a->prefixlen && a->prefixlen == b->prefixlen)
-		return strncmp(a->name, b->name, b->prefixlen);
-	return strcmp(a->name, b->name);
+		ret = strncmp(a->name, b->name, b->prefixlen);
+	else
+		ret = strcmp(a->name, b->name);
+	if (ret != 0)
+		return ret;
+	/* avoid address mismatch */
+        if (a->addr != 0 && b->addr != 0) {
+                if (a->addr != b->addr)
+                        return a->addr > b->addr ? 1 : -1;
+        }
+	return 0;
 }
 
 #ifndef max
@@ -1301,6 +1313,7 @@ static int btf_encoder__collect_function(struct btf_encoder *encoder, GElf_Sym *
 {
 	struct elf_function *new;
 	const char *name;
+	uint64_t addr;
 
 	if (elf_sym__type(sym) != STT_FUNC)
 		return 0;
@@ -1325,6 +1338,9 @@ static int btf_encoder__collect_function(struct btf_encoder *encoder, GElf_Sym *
 	memset(&encoder->functions.entries[encoder->functions.cnt], 0,
 	       sizeof(*new));
 	encoder->functions.entries[encoder->functions.cnt].name = name;
+	/* convert to absoulte address for DWARF/ELF matching. */
+	addr = elf_sym__value(sym);
+	encoder->functions.entries[encoder->functions.cnt].addr = (uint32_t)addr;
 	if (strchr(name, '.')) {
 		const char *suffix = strchr(name, '.');
 
@@ -1336,9 +1352,10 @@ static int btf_encoder__collect_function(struct btf_encoder *encoder, GElf_Sym *
 }
 
 static struct elf_function *btf_encoder__find_function(const struct btf_encoder *encoder,
-						       const char *name, size_t prefixlen)
+						       const char *name, size_t prefixlen,
+						       uint32_t addr)
 {
-	struct elf_function key = { .name = name, .prefixlen = prefixlen };
+	struct elf_function key = { .name = name, .prefixlen = prefixlen, .addr = addr };
 
 	return bsearch(&key, encoder->functions.entries, encoder->functions.cnt, sizeof(key), functions_cmp);
 }
@@ -2086,11 +2103,16 @@ int btf_encoder__encode(struct btf_encoder *encoder)
 
 static int btf_encoder__collect_symbols(struct btf_encoder *encoder)
 {
+	bool base_addr_set = false;
 	uint32_t sym_sec_idx;
 	uint32_t core_id;
 	GElf_Sym sym;
 
 	elf_symtab__for_each_symbol_index(encoder->symtab, core_id, sym, sym_sec_idx) {
+		if (!base_addr_set && sym_sec_idx && sym_sec_idx < encoder->seccnt) {
+			encoder->functions.base_addr = encoder->secinfo[sym_sec_idx].addr;
+			base_addr_set = true;
+		}
 		if (btf_encoder__collect_function(encoder, &sym))
 			return -1;
 	}
@@ -2543,13 +2565,16 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 			continue;
 		if (encoder->functions.cnt) {
 			const char *name;
+			uint64_t addr;
 
 			name = function__name(fn);
 			if (!name)
 				continue;
 
+			addr = (uint32_t)function__addr(fn);
+
 			/* prefer exact function name match... */
-			func = btf_encoder__find_function(encoder, name, 0);
+			func = btf_encoder__find_function(encoder, name, 0, addr);
 			if (!func && encoder->functions.suffix_cnt &&
 			    conf_load->btf_gen_optimized) {
 				/* falling back to name.isra.0 match if no exact
@@ -2560,7 +2585,7 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 				 * in any cu.
 				 */
 				func = btf_encoder__find_function(encoder, name,
-								  strlen(name));
+								  strlen(name), addr);
 				if (func) {
 					if (encoder->verbose)
 						printf("matched function '%s' with '%s'%s\n",
-- 
2.47.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 3/9] btf_encoder: separate elf function, saved function representations
  2024-11-28  1:23 [RFC PATCH 0/9] pahole: shared ELF and faster reproducible BTF encoding Ihor Solodrai
  2024-11-28  1:23 ` [RFC PATCH 1/9] btf_encoder: simplify function encoding Ihor Solodrai
  2024-11-28  1:23 ` [RFC PATCH 2/9] btf_encoder: store,use section-relative addresses in ELF function representation Ihor Solodrai
@ 2024-11-28  1:23 ` Ihor Solodrai
  2024-11-29 20:37   ` Eduard Zingerman
  2024-11-28  1:24 ` [RFC PATCH 4/9] dwarf_loader: introduce pre_load_module hook to conf_load Ihor Solodrai
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Ihor Solodrai @ 2024-11-28  1:23 UTC (permalink / raw)
  To: dwarves, acme; +Cc: bpf, alan.maguire, eddyz87, andrii, mykolal

From: Alan Maguire <alan.maguire@oracle.com>

Have saved function representation point back at immutable ELF function
table.  This will make sharing the ELF function table across encoders
easier.  Simply accumulate saved functions for each encoder, and on
completion combine them into a name-sorted list.  Then carry out
comparisons to check for inconsistent representations, skipping functions
that are inconsistent in their representation.

Thre is a small growth in maximum resident set size due to saving
more functions; it grows from

	Maximum resident set size (kbytes): 701888

to:

	Maximum resident set size (kbytes): 704168

...with this patch for -j1.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
---
 btf_encoder.c | 297 +++++++++++++++++++++++++++-----------------------
 1 file changed, 161 insertions(+), 136 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index 01d7094..6114cc8 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -72,14 +72,15 @@ struct btf_encoder_func_annot {
 
 /* state used to do later encoding of saved functions */
 struct btf_encoder_func_state {
+	struct list_head node;
+	struct btf_encoder *encoder;
+	struct elf_function *elf;
 	uint32_t type_id_off;
 	uint16_t nr_parms;
 	uint16_t nr_annots;
-	uint8_t initialized:1;
 	uint8_t optimized_parms:1;
 	uint8_t unexpected_reg:1;
 	uint8_t inconsistent_proto:1;
-	uint8_t processed:1;
 	int ret_type_id;
 	struct btf_encoder_func_parm *parms;
 	struct btf_encoder_func_annot *annots;
@@ -90,7 +91,6 @@ struct elf_function {
 	char		*alias;
 	uint32_t	addr;
 	size_t		prefixlen;
-	struct btf_encoder_func_state state;
 };
 
 struct elf_secinfo {
@@ -127,6 +127,7 @@ struct btf_encoder {
 	struct elf_secinfo *secinfo;
 	size_t             seccnt;
 	int                encode_vars;
+	struct list_head   func_states;
 	struct {
 		struct elf_function *entries;
 		int		    allocated;
@@ -695,25 +696,26 @@ static int32_t btf_encoder__tag_type(struct btf_encoder *encoder, uint32_t tag_t
 }
 
 static int32_t btf_encoder__add_func_proto(struct btf_encoder *encoder, struct ftype *ftype,
-					   struct elf_function *func)
+					   struct btf_encoder_func_state *state)
 {
-	struct btf *btf = encoder->btf;
 	const struct btf_type *t;
+	struct btf *btf;
 	struct parameter *param;
 	uint16_t nr_params, param_idx;
 	int32_t id, type_id;
 	char tmp_name[KSYM_NAME_LEN];
 	const char *name;
-	struct btf_encoder_func_state *state;
 
-	assert(ftype != NULL || func != NULL);
+	assert(ftype != NULL || state != NULL);
 
 	/* add btf_type for func_proto */
 	if (ftype) {
+		btf = encoder->btf;
 		nr_params = ftype->nr_parms + (ftype->unspec_parms ? 1 : 0);
 		type_id = btf_encoder__tag_type(encoder, ftype->tag.type);
-	} else if (func) {
-		state = &func->state;
+	} else if (state) {
+		encoder = state->encoder;
+		btf = state->encoder->btf;
 		nr_params = state->nr_parms;
 		type_id = state->ret_type_id;
 	} else {
@@ -1033,10 +1035,13 @@ static bool types__match(struct btf_encoder *encoder,
 	return false;
 }
 
-static bool funcs__match(struct btf_encoder *encoder, struct elf_function *func,
-			 struct btf *btf1, struct btf_encoder_func_state *s1,
-			 struct btf *btf2, struct btf_encoder_func_state *s2)
+static bool funcs__match(struct btf_encoder_func_state *s1,
+			 struct btf_encoder_func_state *s2)
 {
+	struct btf_encoder *encoder = s1->encoder;
+	struct elf_function *func = s1->elf;
+	struct btf *btf1 = s1->encoder->btf;
+	struct btf *btf2 = s2->encoder->btf;
 	uint8_t i;
 
 	if (s1->nr_parms != s2->nr_parms) {
@@ -1074,8 +1079,7 @@ static bool funcs__match(struct btf_encoder *encoder, struct elf_function *func,
 
 static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct function *fn, struct elf_function *func)
 {
-	struct btf_encoder_func_state *existing = &func->state;
-	struct btf_encoder_func_state state = { 0 };
+	struct btf_encoder_func_state *state = zalloc(sizeof(*state));
 	struct ftype *ftype = &fn->proto;
 	struct btf *btf = encoder->btf;
 	struct llvm_annotation *annot;
@@ -1083,22 +1087,23 @@ static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct functi
 	uint8_t param_idx = 0;
 	int str_off, err = 0;
 
-	/* if already skipping this function, no need to proceed. */
-	if (existing->unexpected_reg || existing->inconsistent_proto)
-		return 0;
+	if (!state)
+		return -ENOMEM;
 
-	state.nr_parms = ftype->nr_parms + (ftype->unspec_parms ? 1 : 0);
-	state.ret_type_id = ftype->tag.type == 0 ? 0 : encoder->type_id_off + ftype->tag.type;
-	if (state.nr_parms > 0) {
-		state.parms = zalloc(state.nr_parms * sizeof(*state.parms));
-		if (!state.parms) {
+	state->encoder = encoder;
+	state->elf = func;
+	state->nr_parms = ftype->nr_parms + (ftype->unspec_parms ? 1 : 0);
+	state->ret_type_id = ftype->tag.type == 0 ? 0 : encoder->type_id_off + ftype->tag.type;
+	if (state->nr_parms > 0) {
+		state->parms = zalloc(state->nr_parms * sizeof(*state->parms));
+		if (!state->parms) {
 			err = -ENOMEM;
 			goto out;
 		}
 	}
-	state.inconsistent_proto = ftype->inconsistent_proto;
-	state.unexpected_reg = ftype->unexpected_reg;
-	state.optimized_parms = ftype->optimized_parms;
+	state->inconsistent_proto = ftype->inconsistent_proto;
+	state->unexpected_reg = ftype->unexpected_reg;
+	state->optimized_parms = ftype->optimized_parms;
 	ftype__for_each_parameter(ftype, param) {
 		const char *name = parameter__name(param) ?: "";
 
@@ -1107,21 +1112,21 @@ static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct functi
 			err = str_off;
 			goto out;
 		}
-		state.parms[param_idx].name_off = str_off;
-		state.parms[param_idx].type_id = param->tag.type == 0 ? 0 :
-						encoder->type_id_off + param->tag.type;
+		state->parms[param_idx].name_off = str_off;
+		state->parms[param_idx].type_id = param->tag.type == 0 ? 0 :
+						  encoder->type_id_off + param->tag.type;
 		param_idx++;
 	}
 	if (ftype->unspec_parms)
-		state.parms[param_idx].type_id = 0;
+		state->parms[param_idx].type_id = 0;
 
 	list_for_each_entry(annot, &fn->annots, node)
-		state.nr_annots++;
-	if (state.nr_annots) {
+		state->nr_annots++;
+	if (state->nr_annots) {
 		uint8_t idx = 0;
 
-		state.annots = zalloc(state.nr_annots * sizeof(*state.annots));
-		if (!state.annots) {
+		state->annots = zalloc(state->nr_annots * sizeof(*state->annots));
+		if (!state->annots) {
 			err = -ENOMEM;
 			goto out;
 		}
@@ -1131,46 +1136,24 @@ static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct functi
 				err = str_off;
 				goto out;
 			}
-			state.annots[idx].value = str_off;
-			state.annots[idx].component_idx = annot->component_idx;
+			state->annots[idx].value = str_off;
+			state->annots[idx].component_idx = annot->component_idx;
 			idx++;
 		}
 	}
-	state.initialized = 1;
-
-	if (state.unexpected_reg)
-		btf_encoder__log_func_skip(encoder, func,
-					   "unexpected register used for parameter\n");
-	if (!existing->initialized) {
-		memcpy(existing, &state, sizeof(*existing));
-		return 0;
-	}
-
-	/* If saving and we find an existing entry, we want to merge
-	 * observations across both functions, checking that the
-	 * "seen optimized parameters", "inconsistent prototype"
-	 * and "unexpected register" status is reflected in the
-	 * func entry.
-	 * If the entry is new, record encoder state required
-	 * to add the local function later (encoder + type_id_off)
-	 * such that we can add the function later.
-	 */
-	existing->optimized_parms |= state.optimized_parms;
-	existing->unexpected_reg |= state.unexpected_reg;
-	if (!existing->unexpected_reg &&
-	    !funcs__match(encoder, func, encoder->btf, &state,
-			   encoder->btf, existing))
-		existing->inconsistent_proto = 1;
+	list_add_tail(&state->node, &encoder->func_states);
+	return 0;
 out:
-	zfree(&state.annots);
-	zfree(&state.parms);
+	zfree(&state->annots);
+	zfree(&state->parms);
+	free(state);
 	return err;
 }
 
 static int32_t btf_encoder__add_func(struct btf_encoder *encoder,
-				     struct elf_function *func)
+				     struct btf_encoder_func_state *state)
 {
-	struct btf_encoder_func_state *state = &func->state;
+	struct elf_function *func = state->elf;
 	int btf_fnproto_id, btf_fn_id, tag_type_id = 0;
 	int16_t component_idx = -1;
 	const char *name;
@@ -1178,7 +1161,7 @@ static int32_t btf_encoder__add_func(struct btf_encoder *encoder,
 	char tmp_value[KSYM_NAME_LEN];
 	uint16_t idx;
 
-	btf_fnproto_id = btf_encoder__add_func_proto(encoder, NULL, func);
+	btf_fnproto_id = btf_encoder__add_func_proto(encoder, NULL, state);
 	name = func->alias ?: func->name;
 	if (btf_fnproto_id >= 0)
 		btf_fn_id = btf_encoder__add_ref_type(encoder, BTF_KIND_FUNC, btf_fnproto_id,
@@ -1216,62 +1199,6 @@ static int32_t btf_encoder__add_func(struct btf_encoder *encoder,
 	return 0;
 }
 
-static int btf_encoder__add_saved_funcs(struct btf_encoder *encoder)
-{
-	int i;
-
-	for (i = 0; i < encoder->functions.cnt; i++) {
-		struct elf_function *func = &encoder->functions.entries[i];
-		struct btf_encoder_func_state *state = &func->state;
-		struct btf_encoder *other_encoder = NULL;
-
-		if (!state->initialized || state->processed)
-			continue;
-		/* merge optimized-out status across encoders; since each
-		 * encoder has the same elf symbol table we can use the
-		 * same index to access the same elf symbol.
-		 */
-		btf_encoders__for_each_encoder(other_encoder) {
-			struct elf_function *other_func;
-			struct btf_encoder_func_state *other_state;
-			uint8_t optimized, unexpected, inconsistent;
-
-			if (other_encoder == encoder)
-				continue;
-
-			other_func = &other_encoder->functions.entries[i];
-			other_state = &other_func->state;
-			if (!other_state->initialized)
-				continue;
-			optimized = state->optimized_parms | other_state->optimized_parms;
-			unexpected = state->unexpected_reg | other_state->unexpected_reg;
-			inconsistent = state->inconsistent_proto | other_state->inconsistent_proto;
-			if (!unexpected && !inconsistent &&
-			    !funcs__match(encoder, func,
-					  encoder->btf, state,
-					  other_encoder->btf, other_state))
-				inconsistent = 1;
-			state->optimized_parms = other_state->optimized_parms = optimized;
-			state->unexpected_reg = other_state->unexpected_reg = unexpected;
-			state->inconsistent_proto = other_state->inconsistent_proto = inconsistent;
-
-			other_state->processed = 1;
-		}
-		/* do not exclude functions with optimized-out parameters; they
-		 * may still be _called_ with the right parameter values, they
-		 * just do not _use_ them.  Only exclude functions with
-		 * unexpected register use or multiple inconsistent prototypes.
-		 */
-		if (!encoder->skip_encoding_inconsistent_proto ||
-		    (!state->unexpected_reg && !state->inconsistent_proto)) {
-			if (btf_encoder__add_func(encoder, func))
-				return -1;
-		}
-		state->processed = 1;
-	}
-	return 0;
-}
-
 static int functions_cmp(const void *_a, const void *_b)
 {
 	const struct elf_function *a = _a;
@@ -1282,16 +1209,16 @@ static int functions_cmp(const void *_a, const void *_b)
 	 * prefix len and prefix matches.
 	 */
 	if (a->prefixlen && a->prefixlen == b->prefixlen)
-		ret = strncmp(a->name, b->name, b->prefixlen);
+		ret = strncmp(a->name, b->name, a->prefixlen);
 	else
 		ret = strcmp(a->name, b->name);
 	if (ret != 0)
 		return ret;
 	/* avoid address mismatch */
-        if (a->addr != 0 && b->addr != 0) {
-                if (a->addr != b->addr)
-                        return a->addr > b->addr ? 1 : -1;
-        }
+	if (a->addr != 0 && b->addr != 0) {
+		if (a->addr != b->addr)
+			return a->addr > b->addr ? 1 : -1;
+	}
 	return 0;
 }
 
@@ -1309,6 +1236,109 @@ static void *reallocarray_grow(void *ptr, int *nmemb, size_t size)
 	return new;
 }
 
+static int saved_functions_cmp(const void *_a, const void *_b)
+{
+	struct btf_encoder_func_state * const *a = _a;
+	struct btf_encoder_func_state * const *b = _b;
+
+	return functions_cmp((*a)->elf, (*b)->elf);
+}
+
+static int saved_functions_combine(void *_a, void *_b)
+{
+	uint8_t optimized, unexpected, inconsistent;
+	struct btf_encoder_func_state *a = _a;
+	struct btf_encoder_func_state *b = _b;
+	int ret;
+
+	ret = strncmp(a->elf->name, b->elf->name,
+		      max(a->elf->prefixlen, b->elf->prefixlen));
+	if (ret != 0)
+		return ret;
+	optimized = a->optimized_parms | b->optimized_parms;
+	unexpected = a->unexpected_reg | b->unexpected_reg;
+	inconsistent = a->inconsistent_proto | b->inconsistent_proto;
+	if (!unexpected && !inconsistent && !funcs__match(a, b))
+		inconsistent = 1;
+	a->optimized_parms = b->optimized_parms = optimized;
+	a->unexpected_reg = b->unexpected_reg = unexpected;
+	a->inconsistent_proto = b->inconsistent_proto = inconsistent;
+
+	return 0;
+}
+
+static void btf_encoder__delete_saved_funcs(struct btf_encoder *encoder)
+{
+	struct btf_encoder_func_state *pos, *s;
+
+	list_for_each_entry_safe(pos, s, &encoder->func_states, node) {
+		list_del(&pos->node);
+		free(pos->parms);
+		free(pos->annots);
+		free(pos);
+	}
+}
+
+static int btf_encoder__add_saved_funcs(struct btf_encoder *encoder)
+{
+	struct btf_encoder_func_state **saved_fns, *s;
+	struct btf_encoder *e = NULL;
+	int i = 0, j, nr_saved_fns = 0;
+
+	/* Retrieve function states from each encoder, combine them
+	 * and sort by name, addr.
+	 */
+	btf_encoders__for_each_encoder(e) {
+		list_for_each_entry(s, &e->func_states, node)
+			nr_saved_fns++;
+	}
+	/* Another thread already did this work */
+	if (nr_saved_fns == 0) {
+		printf("nothing to do for encoder...\n");
+		return 0;
+	}
+
+	printf("got %d saved functions...\n", nr_saved_fns);
+	saved_fns = calloc(nr_saved_fns, sizeof(*saved_fns));
+	btf_encoders__for_each_encoder(e) {
+		list_for_each_entry(s, &e->func_states, node)
+			saved_fns[i++] = s;
+	}
+	printf("added %d saved fns\n", i);
+	qsort(saved_fns, nr_saved_fns, sizeof(*saved_fns), saved_functions_cmp);
+
+	for (i = 0; i < nr_saved_fns; i = j) {
+		struct btf_encoder_func_state *state = saved_fns[i];
+
+		/* Compare across sorted functions that match by name/prefix;
+		 * share inconsistent/unexpected reg state between them.
+		 */
+		j = i + 1;
+
+		while (j < nr_saved_fns &&
+		       saved_functions_combine(saved_fns[i], saved_fns[j]) == 0)
+				j++;
+
+		/* do not exclude functions with optimized-out parameters; they
+		 * may still be _called_ with the right parameter values, they
+		 * just do not _use_ them.  Only exclude functions with
+		 * unexpected register use or multiple inconsistent prototypes.
+		 */
+		if (!encoder->skip_encoding_inconsistent_proto ||
+		    (!state->unexpected_reg && !state->inconsistent_proto)) {
+			if (btf_encoder__add_func(state->encoder, state)) {
+				free(saved_fns);
+				return -1;
+			}
+		}
+	}
+	/* Now that we are done with function states, free them. */
+	free(saved_fns);
+	btf_encoders__for_each_encoder(e)
+		btf_encoder__delete_saved_funcs(e);
+	return 0;
+}
+
 static int btf_encoder__collect_function(struct btf_encoder *encoder, GElf_Sym *sym)
 {
 	struct elf_function *new;
@@ -1346,6 +1376,8 @@ static int btf_encoder__collect_function(struct btf_encoder *encoder, GElf_Sym *
 
 		encoder->functions.suffix_cnt++;
 		encoder->functions.entries[encoder->functions.cnt].prefixlen = suffix - name;
+	} else {
+		encoder->functions.entries[encoder->functions.cnt].prefixlen = strlen(name);
 	}
 	encoder->functions.cnt++;
 	return 0;
@@ -2353,6 +2385,7 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
 		encoder->need_index_type = false;
 		encoder->array_index_id  = 0;
 		encoder->encode_vars = 0;
+		INIT_LIST_HEAD(&encoder->func_states);
 		if (!conf_load->skip_encoding_btf_vars)
 			encoder->encode_vars |= BTF_VAR_PERCPU;
 		if (conf_load->encode_btf_global_vars)
@@ -2437,16 +2470,8 @@ out_delete:
 	return NULL;
 }
 
-void btf_encoder__delete_func(struct elf_function *func)
-{
-	free(func->alias);
-	zfree(&func->state.annots);
-	zfree(&func->state.parms);
-}
-
 void btf_encoder__delete(struct btf_encoder *encoder)
 {
-	int i;
 	size_t shndx;
 
 	if (encoder == NULL)
@@ -2461,12 +2486,12 @@ void btf_encoder__delete(struct btf_encoder *encoder)
 	encoder->btf = NULL;
 	elf_symtab__delete(encoder->symtab);
 
-	for (i = 0; i < encoder->functions.cnt; i++)
-		btf_encoder__delete_func(&encoder->functions.entries[i]);
 	encoder->functions.allocated = encoder->functions.cnt = 0;
 	free(encoder->functions.entries);
 	encoder->functions.entries = NULL;
 
+	btf_encoder__delete_saved_funcs(encoder);
+
 	free(encoder);
 }
 
-- 
2.47.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 4/9] dwarf_loader: introduce pre_load_module hook to conf_load
  2024-11-28  1:23 [RFC PATCH 0/9] pahole: shared ELF and faster reproducible BTF encoding Ihor Solodrai
                   ` (2 preceding siblings ...)
  2024-11-28  1:23 ` [RFC PATCH 3/9] btf_encoder: separate elf function, saved function representations Ihor Solodrai
@ 2024-11-28  1:24 ` Ihor Solodrai
  2024-11-29 21:03   ` Eduard Zingerman
  2024-11-28  1:24 ` [RFC PATCH 5/9] btf_encoder: introduce elf_functions struct type Ihor Solodrai
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Ihor Solodrai @ 2024-11-28  1:24 UTC (permalink / raw)
  To: dwarves, acme; +Cc: bpf, alan.maguire, eddyz87, andrii, mykolal

Add a function pointer to conf_load, which is called immediately after
Elf is extracted from Dwfl_Module in cus__proces_dwflmod.

This is a preparation for making elf_functions table shared between
encoders. Shared table can be built as soon as the relevant Elf is
available.

Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
---
 dwarf_loader.c | 14 +++++++-------
 dwarves.h      | 11 +++++++++--
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index 598fde4..5d55649 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3796,13 +3796,6 @@ static int cus__load_module(struct cus *cus, struct conf_load *conf,
 	return DWARF_CB_OK;
 }
 
-struct process_dwflmod_parms {
-	struct cus	 *cus;
-	struct conf_load *conf;
-	const char	 *filename;
-	uint32_t	 nr_dwarf_sections_found;
-};
-
 static int cus__process_dwflmod(Dwfl_Module *dwflmod,
 				void **userdata __maybe_unused,
 				const char *name __maybe_unused,
@@ -3826,11 +3819,18 @@ static int cus__process_dwflmod(Dwfl_Module *dwflmod,
 	Dwarf *dw = dwfl_module_getdwarf(dwflmod, &dwbias);
 
 	int err = DWARF_CB_OK;
+	if (parms->conf->pre_load_module) {
+		err = parms->conf->pre_load_module(dwflmod, elf);
+		if (err)
+			return DWARF_CB_ABORT;
+	}
+
 	if (dw != NULL) {
 		++parms->nr_dwarf_sections_found;
 		err = cus__load_module(cus, parms->conf, dwflmod, dw, elf,
 				       parms->filename);
 	}
+
 	/*
 	 * XXX We will fall back to try finding other debugging
 	 * formats (CTF), so no point in telling this to the user
diff --git a/dwarves.h b/dwarves.h
index 1cb0d62..1a0bd4b 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -37,6 +37,7 @@
 #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
 
 struct cu;
+struct cus;
 
 enum load_steal_kind {
 	LSK__KEEPIT,
@@ -59,6 +60,13 @@ typedef uint32_t type_id_t;
 struct btf;
 struct conf_fprintf;
 
+struct process_dwflmod_parms {
+	struct cus	 *cus;
+	struct conf_load *conf;
+	const char	 *filename;
+	uint32_t	 nr_dwarf_sections_found;
+};
+
 /** struct conf_load - load configuration
  * @thread_exit - called at the end of a thread, 1st user: BTF encoder dedup
  * @extra_dbg_info - keep original debugging format extra info
@@ -107,6 +115,7 @@ struct conf_load {
 	struct conf_fprintf	*conf_fprintf;
 	int			(*threads_prepare)(struct conf_load *conf, int nr_threads, void **thr_data);
 	int			(*threads_collect)(struct conf_load *conf, int nr_threads, void **thr_data, int error);
+	int			(*pre_load_module)(Dwfl_Module *mod, Elf *elf);
 };
 
 /** struct conf_fprintf - hints to the __fprintf routines
@@ -168,8 +177,6 @@ struct conf_fprintf {
 	uint8_t    skip_emitting_modifier:1;
 };
 
-struct cus;
-
 struct cus *cus__new(void);
 void cus__delete(struct cus *cus);
 
-- 
2.47.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 5/9] btf_encoder: introduce elf_functions struct type
  2024-11-28  1:23 [RFC PATCH 0/9] pahole: shared ELF and faster reproducible BTF encoding Ihor Solodrai
                   ` (3 preceding siblings ...)
  2024-11-28  1:24 ` [RFC PATCH 4/9] dwarf_loader: introduce pre_load_module hook to conf_load Ihor Solodrai
@ 2024-11-28  1:24 ` Ihor Solodrai
  2024-11-29 21:50   ` Eduard Zingerman
  2024-11-28  1:24 ` [RFC PATCH 6/9] btf_encoder: collect elf_functions in btf_encoder__pre_load_module Ihor Solodrai
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Ihor Solodrai @ 2024-11-28  1:24 UTC (permalink / raw)
  To: dwarves, acme; +Cc: bpf, alan.maguire, eddyz87, andrii, mykolal

Extract elf_functions struct type from btf_encoder.

Replace methods operating functions table in btf_encoder by methods
operating on elf_functions:
- btf_encoder__collect_function -> elf_functions__collect_function
- btf_encoder__collect_symbols -> elf_functions__collect

Now these functions do not depend on btf_encoder being passed to them
as a parameter.

Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
---
 btf_encoder.c | 125 ++++++++++++++++++++++++++------------------------
 1 file changed, 66 insertions(+), 59 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index 6114cc8..8331efe 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -102,6 +102,13 @@ struct elf_secinfo {
 	struct gobuffer secinfo;
 };
 
+struct elf_functions {
+	struct elf_symtab *symtab;
+	struct elf_function *entries;
+	int cnt;
+	int suffix_cnt; /* number of .isra, .part etc */
+};
+
 /*
  * cu: cu being processed.
  */
@@ -128,13 +135,7 @@ struct btf_encoder {
 	size_t             seccnt;
 	int                encode_vars;
 	struct list_head   func_states;
-	struct {
-		struct elf_function *entries;
-		int		    allocated;
-		int		    cnt;
-		int		    suffix_cnt; /* number of .isra, .part etc */
-		uint64_t	    base_addr;
-	} functions;
+	struct elf_functions functions;
 };
 
 struct btf_func {
@@ -1226,16 +1227,6 @@ static int functions_cmp(const void *_a, const void *_b)
 #define max(x, y) ((x) < (y) ? (y) : (x))
 #endif
 
-static void *reallocarray_grow(void *ptr, int *nmemb, size_t size)
-{
-	int new_nmemb = max(1000, *nmemb * 3 / 2);
-	void *new = realloc(ptr, new_nmemb * size);
-
-	if (new)
-		*nmemb = new_nmemb;
-	return new;
-}
-
 static int saved_functions_cmp(const void *_a, const void *_b)
 {
 	struct btf_encoder_func_state * const *a = _a;
@@ -1339,47 +1330,34 @@ static int btf_encoder__add_saved_funcs(struct btf_encoder *encoder)
 	return 0;
 }
 
-static int btf_encoder__collect_function(struct btf_encoder *encoder, GElf_Sym *sym)
+static int elf_functions__collect_function(struct elf_functions *functions, GElf_Sym *sym)
 {
-	struct elf_function *new;
+	struct elf_function *func;
 	const char *name;
 	uint64_t addr;
 
 	if (elf_sym__type(sym) != STT_FUNC)
 		return 0;
-	name = elf_sym__name(sym, encoder->symtab);
+
+	name = elf_sym__name(sym, functions->symtab);
 	if (!name)
 		return 0;
 
-	if (encoder->functions.cnt == encoder->functions.allocated) {
-		new = reallocarray_grow(encoder->functions.entries,
-					&encoder->functions.allocated,
-					sizeof(*encoder->functions.entries));
-		if (!new) {
-			/*
-			 * The cleanup - delete_functions is called
-			 * in btf_encoder__encode_cu error path.
-			 */
-			return -1;
-		}
-		encoder->functions.entries = new;
-	}
-
-	memset(&encoder->functions.entries[encoder->functions.cnt], 0,
-	       sizeof(*new));
-	encoder->functions.entries[encoder->functions.cnt].name = name;
+	func = &functions->entries[functions->cnt];
+	func->name = name;
 	/* convert to absoulte address for DWARF/ELF matching. */
 	addr = elf_sym__value(sym);
-	encoder->functions.entries[encoder->functions.cnt].addr = (uint32_t)addr;
+	func->addr = (uint32_t)addr;
 	if (strchr(name, '.')) {
 		const char *suffix = strchr(name, '.');
-
-		encoder->functions.suffix_cnt++;
-		encoder->functions.entries[encoder->functions.cnt].prefixlen = suffix - name;
+		functions->suffix_cnt++;
+		func->prefixlen = suffix - name;
 	} else {
-		encoder->functions.entries[encoder->functions.cnt].prefixlen = strlen(name);
+		func->prefixlen = strlen(name);
 	}
-	encoder->functions.cnt++;
+
+	functions->cnt++;
+
 	return 0;
 }
 
@@ -2132,31 +2110,59 @@ int btf_encoder__encode(struct btf_encoder *encoder)
 	return err;
 }
 
-
-static int btf_encoder__collect_symbols(struct btf_encoder *encoder)
+static int elf_functions__collect(struct elf_functions *functions)
 {
-	bool base_addr_set = false;
-	uint32_t sym_sec_idx;
+	uint32_t nr_symbols = elf_symtab__nr_symbols(functions->symtab);
+	struct elf_function *tmp;
+	Elf32_Word sym_sec_idx;
 	uint32_t core_id;
 	GElf_Sym sym;
+	int err;
+
+	/* We know that number of functions is less than number of symbols,
+	 * so we can overallocate temporarily.
+	 */
+	functions->entries = calloc(nr_symbols, sizeof(struct elf_function));
+	if (!functions->entries) {
+		err = -ENOMEM;
+		goto out_free;
+	}
 
-	elf_symtab__for_each_symbol_index(encoder->symtab, core_id, sym, sym_sec_idx) {
-		if (!base_addr_set && sym_sec_idx && sym_sec_idx < encoder->seccnt) {
-			encoder->functions.base_addr = encoder->secinfo[sym_sec_idx].addr;
-			base_addr_set = true;
+	functions->cnt = 0;
+	elf_symtab__for_each_symbol_index(functions->symtab, core_id, sym, sym_sec_idx) {
+		if (elf_functions__collect_function(functions, &sym)) {
+			err = -1;
+			goto out_free;
 		}
-		if (btf_encoder__collect_function(encoder, &sym))
-			return -1;
 	}
 
-	if (encoder->functions.cnt) {
-		qsort(encoder->functions.entries, encoder->functions.cnt, sizeof(encoder->functions.entries[0]),
+	if (functions->cnt) {
+		qsort(functions->entries,
+		      functions->cnt,
+		      sizeof(functions->entries[0]),
 		      functions_cmp);
-		if (encoder->verbose)
-			printf("Found %d functions!\n", encoder->functions.cnt);
+	} else {
+		err = 0;
+		goto out_free;
+	}
+
+	/* Reallocate to the exact size */
+	tmp = realloc(functions->entries, functions->cnt * sizeof(struct elf_function));
+	if (tmp) {
+		functions->entries = tmp;
+	} else {
+		fprintf(stderr, "could not reallocate memory for elf_functions table\n");
+		err = -ENOMEM;
+		goto out_free;
 	}
 
 	return 0;
+
+out_free:
+	free(functions->entries);
+	functions->entries = NULL;
+	functions->cnt = 0;
+	return err;
 }
 
 static bool ftype__has_arg_names(const struct ftype *ftype)
@@ -2417,6 +2423,7 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
 				printf("%s: '%s' doesn't have symtab.\n", __func__, cu->filename);
 			goto out;
 		}
+		encoder->functions.symtab = encoder->symtab;
 
 		/* index the ELF sections for later lookup */
 
@@ -2455,7 +2462,7 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
 		if (!found_percpu && encoder->verbose)
 			printf("%s: '%s' doesn't have '%s' section\n", __func__, cu->filename, PERCPU_SECTION);
 
-		if (btf_encoder__collect_symbols(encoder))
+		if (elf_functions__collect(&encoder->functions))
 			goto out_delete;
 
 		if (encoder->verbose)
@@ -2486,7 +2493,7 @@ void btf_encoder__delete(struct btf_encoder *encoder)
 	encoder->btf = NULL;
 	elf_symtab__delete(encoder->symtab);
 
-	encoder->functions.allocated = encoder->functions.cnt = 0;
+	encoder->functions.cnt = 0;
 	free(encoder->functions.entries);
 	encoder->functions.entries = NULL;
 
-- 
2.47.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 6/9] btf_encoder: collect elf_functions in btf_encoder__pre_load_module
  2024-11-28  1:23 [RFC PATCH 0/9] pahole: shared ELF and faster reproducible BTF encoding Ihor Solodrai
                   ` (4 preceding siblings ...)
  2024-11-28  1:24 ` [RFC PATCH 5/9] btf_encoder: introduce elf_functions struct type Ihor Solodrai
@ 2024-11-28  1:24 ` Ihor Solodrai
  2024-11-29 22:27   ` Eduard Zingerman
  2024-11-28  1:24 ` [RFC PATCH 7/9] btf_encoder: switch to shared elf_functions table Ihor Solodrai
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Ihor Solodrai @ 2024-11-28  1:24 UTC (permalink / raw)
  To: dwarves, acme; +Cc: bpf, alan.maguire, eddyz87, andrii, mykolal

Introduce a global elf_functions_list variable in btf_encoder.c that
contains an elf_functions per ELF.

An elf_functions structure is allocated and filled out by
btf_encoder__pre_load_module() hook, and the list is cleared after
btf_encoder__encode() is done.

At this point btf_encoders don't use shared elf_functions yet (each
maintains their own copy as before), but it is built before encoders
are initialized.

Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
---
 btf_encoder.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++
 btf_encoder.h |  2 ++
 pahole.c      |  3 +++
 3 files changed, 71 insertions(+)

diff --git a/btf_encoder.c b/btf_encoder.c
index 8331efe..8b1db5b 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -103,6 +103,8 @@ struct elf_secinfo {
 };
 
 struct elf_functions {
+	struct list_head node; /* for elf_functions_list */
+	Elf *elf; /* source ELF */
 	struct elf_symtab *symtab;
 	struct elf_function *entries;
 	int cnt;
@@ -149,6 +151,67 @@ struct btf_kfunc_set_range {
 	uint64_t end;
 };
 
+
+/* In principle, multiple ELFs can be processed in one pahole run,
+ * so we have to store elf_functions table per ELF.
+ * An element is added to the list on btf_encoder__pre_load_module,
+ * and removed after btf_encoder__encode is done.
+ */
+static LIST_HEAD(elf_functions_list);
+
+static inline void elf_functions__delete(struct elf_functions *funcs)
+{
+	free(funcs->entries);
+	elf_symtab__delete(funcs->symtab);
+	list_del(&funcs->node);
+	free(funcs);
+}
+
+static inline void elf_functions__delete_all(void)
+{
+	struct list_head *pos, *tmp;
+
+	list_for_each_safe(pos, tmp, &elf_functions_list) {
+		struct elf_functions *funcs = list_entry(pos, struct elf_functions, node);
+
+		elf_functions__delete(funcs);
+	}
+}
+
+static int elf_functions__collect(struct elf_functions *functions);
+
+int btf_encoder__pre_load_module(Dwfl_Module *mod, Elf *elf)
+{
+	struct elf_functions *funcs;
+	int err;
+
+	funcs = calloc(1, sizeof(*funcs));
+	if (!funcs) {
+		err = -ENOMEM;
+		goto out_delete;
+	}
+
+	funcs->symtab = elf_symtab__new(NULL, elf);
+	if (!funcs->symtab) {
+		err = -1;
+		goto out_delete;
+	}
+
+	funcs->elf = elf;
+	err = elf_functions__collect(funcs);
+	if (err)
+		goto out_delete;
+
+	list_add_tail(&funcs->node, &elf_functions_list);
+
+	return 0;
+
+out_delete:
+	elf_functions__delete(funcs);
+	return err;
+}
+
+
 static LIST_HEAD(encoders);
 static pthread_mutex_t encoders__lock = PTHREAD_MUTEX_INITIALIZER;
 
@@ -2107,6 +2170,8 @@ int btf_encoder__encode(struct btf_encoder *encoder)
 #endif
 		err = btf_encoder__write_elf(encoder, encoder->btf, BTF_ELF_SEC);
 	}
+
+	elf_functions__delete_all();
 	return err;
 }
 
@@ -2424,6 +2489,7 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
 			goto out;
 		}
 		encoder->functions.symtab = encoder->symtab;
+		encoder->functions.elf = cu->elf;
 
 		/* index the ELF sections for later lookup */
 
diff --git a/btf_encoder.h b/btf_encoder.h
index 824963b..7debd67 100644
--- a/btf_encoder.h
+++ b/btf_encoder.h
@@ -34,4 +34,6 @@ struct btf *btf_encoder__btf(struct btf_encoder *encoder);
 
 int btf_encoder__add_encoder(struct btf_encoder *encoder, struct btf_encoder *other);
 
+int btf_encoder__pre_load_module(Dwfl_Module *mod, Elf *elf);
+
 #endif /* _BTF_ENCODER_H_ */
diff --git a/pahole.c b/pahole.c
index fa5d8c7..1f8cf4b 100644
--- a/pahole.c
+++ b/pahole.c
@@ -3737,6 +3737,9 @@ int main(int argc, char *argv[])
 		conf_load.threads_collect = pahole_threads_collect;
 	}
 
+	if (btf_encode)
+		conf_load.pre_load_module = btf_encoder__pre_load_module;
+
 	// Make 'pahole --header type < file' a shorter form of 'pahole -C type --count 1 < file'
 	if (conf.header_type && !class_name && prettify_input) {
 		conf.count = 1;
-- 
2.47.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 7/9] btf_encoder: switch to shared elf_functions table
  2024-11-28  1:23 [RFC PATCH 0/9] pahole: shared ELF and faster reproducible BTF encoding Ihor Solodrai
                   ` (5 preceding siblings ...)
  2024-11-28  1:24 ` [RFC PATCH 6/9] btf_encoder: collect elf_functions in btf_encoder__pre_load_module Ihor Solodrai
@ 2024-11-28  1:24 ` Ihor Solodrai
  2024-11-29 22:35   ` Eduard Zingerman
  2024-11-28  1:24 ` [RFC PATCH 8/9] btf_encoder: introduce btf_encoding_context Ihor Solodrai
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Ihor Solodrai @ 2024-11-28  1:24 UTC (permalink / raw)
  To: dwarves, acme; +Cc: bpf, alan.maguire, eddyz87, andrii, mykolal

Do not collect functions from ELF for each new btf_encoder, and
instead set a pointer to a shared elf_functions table, built
beforehand by btf_encoder__pre_cus__load_module().

Do not call btf_encoder__add_saved_funcs() on every
btf_encoder__add_encoder(). Instead, for non-reproducible
multi-threaded case do that in pahole_threads_collect(), and for
single-threaded or reproducible_build do that right before
btf_encoder__encode().

Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
---
 btf_encoder.c | 42 +++++++++++++++++++-----------------------
 btf_encoder.h |  1 +
 pahole.c      |  9 ++++++++-
 3 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index 8b1db5b..778481d 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -137,7 +137,7 @@ struct btf_encoder {
 	size_t             seccnt;
 	int                encode_vars;
 	struct list_head   func_states;
-	struct elf_functions functions;
+	struct elf_functions *functions;
 };
 
 struct btf_func {
@@ -159,6 +159,19 @@ struct btf_kfunc_set_range {
  */
 static LIST_HEAD(elf_functions_list);
 
+static struct elf_functions *elf_functions__get(Elf *elf)
+{
+	struct list_head *pos;
+
+	list_for_each(pos, &elf_functions_list) {
+		struct elf_functions *funcs = list_entry(pos, struct elf_functions, node);
+
+		if (funcs->elf == elf)
+			return funcs;
+	}
+	return NULL;
+}
+
 static inline void elf_functions__delete(struct elf_functions *funcs)
 {
 	free(funcs->entries);
@@ -215,8 +228,6 @@ out_delete:
 static LIST_HEAD(encoders);
 static pthread_mutex_t encoders__lock = PTHREAD_MUTEX_INITIALIZER;
 
-static int btf_encoder__add_saved_funcs(struct btf_encoder *encoder);
-
 /* mutex only needed for add/delete, as this can happen in multiple encoding
  * threads.  Traversal of the list is currently confined to thread collection.
  */
@@ -869,8 +880,6 @@ int32_t btf_encoder__add_encoder(struct btf_encoder *encoder, struct btf_encoder
 	if (encoder == other)
 		return 0;
 
-	btf_encoder__add_saved_funcs(other);
-
 	for (shndx = 1; shndx < other->seccnt; shndx++) {
 		struct gobuffer *var_secinfo_buf = &other->secinfo[shndx].secinfo;
 		size_t sz = gobuffer__size(var_secinfo_buf);
@@ -1333,7 +1342,7 @@ static void btf_encoder__delete_saved_funcs(struct btf_encoder *encoder)
 	}
 }
 
-static int btf_encoder__add_saved_funcs(struct btf_encoder *encoder)
+int btf_encoder__add_saved_funcs(struct btf_encoder *encoder)
 {
 	struct btf_encoder_func_state **saved_fns, *s;
 	struct btf_encoder *e = NULL;
@@ -1430,7 +1439,7 @@ static struct elf_function *btf_encoder__find_function(const struct btf_encoder
 {
 	struct elf_function key = { .name = name, .prefixlen = prefixlen, .addr = addr };
 
-	return bsearch(&key, encoder->functions.entries, encoder->functions.cnt, sizeof(key), functions_cmp);
+	return bsearch(&key, encoder->functions->entries, encoder->functions->cnt, sizeof(key), functions_cmp);
 }
 
 static bool btf_name_char_ok(char c, bool first)
@@ -2118,9 +2127,6 @@ int btf_encoder__encode(struct btf_encoder *encoder)
 	int err;
 	size_t shndx;
 
-	/* for single-threaded case, saved funcs are added here */
-	btf_encoder__add_saved_funcs(encoder);
-
 	for (shndx = 1; shndx < encoder->seccnt; shndx++)
 		if (gobuffer__size(&encoder->secinfo[shndx].secinfo))
 			btf_encoder__add_datasec(encoder, shndx);
@@ -2488,8 +2494,7 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
 				printf("%s: '%s' doesn't have symtab.\n", __func__, cu->filename);
 			goto out;
 		}
-		encoder->functions.symtab = encoder->symtab;
-		encoder->functions.elf = cu->elf;
+		encoder->functions = elf_functions__get(cu->elf);
 
 		/* index the ELF sections for later lookup */
 
@@ -2528,9 +2533,6 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
 		if (!found_percpu && encoder->verbose)
 			printf("%s: '%s' doesn't have '%s' section\n", __func__, cu->filename, PERCPU_SECTION);
 
-		if (elf_functions__collect(&encoder->functions))
-			goto out_delete;
-
 		if (encoder->verbose)
 			printf("File %s:\n", cu->filename);
 		btf_encoders__add(encoder);
@@ -2559,12 +2561,6 @@ void btf_encoder__delete(struct btf_encoder *encoder)
 	encoder->btf = NULL;
 	elf_symtab__delete(encoder->symtab);
 
-	encoder->functions.cnt = 0;
-	free(encoder->functions.entries);
-	encoder->functions.entries = NULL;
-
-	btf_encoder__delete_saved_funcs(encoder);
-
 	free(encoder);
 }
 
@@ -2661,7 +2657,7 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 			continue;
 		if (!ftype__has_arg_names(&fn->proto))
 			continue;
-		if (encoder->functions.cnt) {
+		if (encoder->functions->cnt) {
 			const char *name;
 			uint64_t addr;
 
@@ -2673,7 +2669,7 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 
 			/* prefer exact function name match... */
 			func = btf_encoder__find_function(encoder, name, 0, addr);
-			if (!func && encoder->functions.suffix_cnt &&
+			if (!func && encoder->functions->suffix_cnt &&
 			    conf_load->btf_gen_optimized) {
 				/* falling back to name.isra.0 match if no exact
 				 * match is found; only bother if we found any
diff --git a/btf_encoder.h b/btf_encoder.h
index 7debd67..29f652a 100644
--- a/btf_encoder.h
+++ b/btf_encoder.h
@@ -33,6 +33,7 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 struct btf *btf_encoder__btf(struct btf_encoder *encoder);
 
 int btf_encoder__add_encoder(struct btf_encoder *encoder, struct btf_encoder *other);
+int btf_encoder__add_saved_funcs(struct btf_encoder *base_encoder);
 
 int btf_encoder__pre_load_module(Dwfl_Module *mod, Elf *elf);
 
diff --git a/pahole.c b/pahole.c
index 1f8cf4b..b5aea56 100644
--- a/pahole.c
+++ b/pahole.c
@@ -3185,12 +3185,16 @@ static int pahole_threads_collect(struct conf_load *conf, int nr_threads, void *
 	if (error)
 		goto out;
 
+	err = btf_encoder__add_saved_funcs(btf_encoder);
+	if (err < 0)
+		goto out;
+
 	for (i = 0; i < nr_threads; i++) {
 		/*
 		 * Merge content of the btf instances of worker threads to the btf
 		 * instance of the primary btf_encoder.
                 */
-		if (!threads[i]->btf)
+		if (!threads[i]->encoder || !threads[i]->btf)
 			continue;
 		err = btf_encoder__add_encoder(btf_encoder, threads[i]->encoder);
 		if (err < 0)
@@ -3845,6 +3849,9 @@ try_sole_arg_as_class_names:
 			exit(1);
 		}
 
+		if (conf_load.nr_jobs <= 1 || conf_load.reproducible_build)
+			btf_encoder__add_saved_funcs(btf_encoder);
+
 		err = btf_encoder__encode(btf_encoder);
 		if (err) {
 			fputs("Failed to encode BTF\n", stderr);
-- 
2.47.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 8/9] btf_encoder: introduce btf_encoding_context
  2024-11-28  1:23 [RFC PATCH 0/9] pahole: shared ELF and faster reproducible BTF encoding Ihor Solodrai
                   ` (6 preceding siblings ...)
  2024-11-28  1:24 ` [RFC PATCH 7/9] btf_encoder: switch to shared elf_functions table Ihor Solodrai
@ 2024-11-28  1:24 ` Ihor Solodrai
  2024-11-29 23:12   ` Eduard Zingerman
  2024-11-28  1:24 ` [RFC PATCH 9/9] pahole: faster reproducible BTF encoding Ihor Solodrai
  2024-12-02 13:55 ` [RFC PATCH 0/9] pahole: shared ELF and " Jiri Olsa
  9 siblings, 1 reply; 26+ messages in thread
From: Ihor Solodrai @ 2024-11-28  1:24 UTC (permalink / raw)
  To: dwarves, acme; +Cc: bpf, alan.maguire, eddyz87, andrii, mykolal

Introduce a static struct holding global data necessary for BTF
encoding: elf_functions tables and btf_encoder structs.

The context has init()/exit() interface that should be used to
indicate when BTF encoding work has started and ended.

I considered freeing everything contained in the context exclusively
on exit(), however it turns out this unnecessarily increases max
RSS. Probably because the work done in btf_encoder__encode() requires
relatively more memory, and if encoders and tables are freed earlier,
that space is reused.

Compare:
    -j4: 	Maximum resident set size (kbytes): 868484
    -j8: 	Maximum resident set size (kbytes): 1003040
    -j16: 	Maximum resident set size (kbytes): 1039416
    -j32: 	Maximum resident set size (kbytes): 1145312
vs
    -j4: 	Maximum resident set size (kbytes): 972692
    -j8: 	Maximum resident set size (kbytes): 1043184
    -j16: 	Maximum resident set size (kbytes): 1081156
    -j32: 	Maximum resident set size (kbytes): 1218184

Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
---
 btf_encoder.c | 117 +++++++++++++++++++++++++++++++++++++-------------
 btf_encoder.h |   3 ++
 pahole.c      |  10 ++++-
 3 files changed, 98 insertions(+), 32 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index 778481d..bd5a370 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -151,19 +151,73 @@ struct btf_kfunc_set_range {
 	uint64_t end;
 };
 
+static struct {
+	bool initialized;
+
+	/* In principle, multiple ELFs can be processed in one pahole
+	 * run, so we have to store elf_functions table per ELF.
+	 * An elf_functions struct is added to the list in
+	 * btf_encoder__pre_load_module().
+	 * The list is cleared at the end of btf_encoder__add_saved_funcs().
+	 */
+	struct list_head elf_functions_list;
+
+	/* The mutex only needed for add/delete, as this can happen in
+	 * multiple encoding threads.  A btf_encoder is added to this
+	 * list in btf_encoder__new(), and removed in btf_encoder__delete().
+	 * All encoders except the main one (`btf_encoder` in pahole.c)
+	 * are deleted in pahole_threads_collect().
+	 */
+	pthread_mutex_t btf_encoder_list_lock;
+	struct list_head btf_encoder_list;
+
+} btf_encoding_context;
+
+int btf_encoding_context__init(void)
+{
+	int err = 0;
+
+	if (btf_encoding_context.initialized) {
+		fprintf(stderr, "%s was called while context is already initialized\n", __func__);
+		err = -1;
+		goto out;
+	}
+
+	INIT_LIST_HEAD(&btf_encoding_context.elf_functions_list);
+	INIT_LIST_HEAD(&btf_encoding_context.btf_encoder_list);
+	pthread_mutex_init(&btf_encoding_context.btf_encoder_list_lock, NULL);
+	btf_encoding_context.initialized = true;
+
+out:
+	return err;
+}
+
+static inline void btf_encoder__delete_all(void);
+static inline void elf_functions__delete_all(void);
+
+void btf_encoding_context__exit(void)
+{
+	if (!btf_encoding_context.initialized) {
+		fprintf(stderr, "%s was called while context is not initialized\n", __func__);
+		return;
+	}
+
+	if (!list_empty(&btf_encoding_context.elf_functions_list))
+		elf_functions__delete_all();
+
+	if (!list_empty(&btf_encoding_context.btf_encoder_list))
+		btf_encoder__delete_all();
+
+	pthread_mutex_destroy(&btf_encoding_context.btf_encoder_list_lock);
+	btf_encoding_context.initialized = false;
+}
 
-/* In principle, multiple ELFs can be processed in one pahole run,
- * so we have to store elf_functions table per ELF.
- * An element is added to the list on btf_encoder__pre_load_module,
- * and removed after btf_encoder__encode is done.
- */
-static LIST_HEAD(elf_functions_list);
 
 static struct elf_functions *elf_functions__get(Elf *elf)
 {
 	struct list_head *pos;
 
-	list_for_each(pos, &elf_functions_list) {
+	list_for_each(pos, &btf_encoding_context.elf_functions_list) {
 		struct elf_functions *funcs = list_entry(pos, struct elf_functions, node);
 
 		if (funcs->elf == elf)
@@ -184,9 +238,8 @@ static inline void elf_functions__delete_all(void)
 {
 	struct list_head *pos, *tmp;
 
-	list_for_each_safe(pos, tmp, &elf_functions_list) {
+	list_for_each_safe(pos, tmp, &btf_encoding_context.elf_functions_list) {
 		struct elf_functions *funcs = list_entry(pos, struct elf_functions, node);
-
 		elf_functions__delete(funcs);
 	}
 }
@@ -215,7 +268,7 @@ int btf_encoder__pre_load_module(Dwfl_Module *mod, Elf *elf)
 	if (err)
 		goto out_delete;
 
-	list_add_tail(&funcs->node, &elf_functions_list);
+	list_add_tail(&funcs->node, &btf_encoding_context.elf_functions_list);
 
 	return 0;
 
@@ -224,29 +277,21 @@ out_delete:
 	return err;
 }
 
-
-static LIST_HEAD(encoders);
-static pthread_mutex_t encoders__lock = PTHREAD_MUTEX_INITIALIZER;
-
-/* mutex only needed for add/delete, as this can happen in multiple encoding
- * threads.  Traversal of the list is currently confined to thread collection.
- */
-
 #define btf_encoders__for_each_encoder(encoder)		\
-	list_for_each_entry(encoder, &encoders, node)
+	list_for_each_entry(encoder, &btf_encoding_context.btf_encoder_list, node)
 
 static void btf_encoders__add(struct btf_encoder *encoder)
 {
-	pthread_mutex_lock(&encoders__lock);
-	list_add_tail(&encoder->node, &encoders);
-	pthread_mutex_unlock(&encoders__lock);
+	pthread_mutex_lock(&btf_encoding_context.btf_encoder_list_lock);
+	list_add_tail(&encoder->node, &btf_encoding_context.btf_encoder_list);
+	pthread_mutex_unlock(&btf_encoding_context.btf_encoder_list_lock);
 }
 
 static void btf_encoders__delete(struct btf_encoder *encoder)
 {
 	struct btf_encoder *existing = NULL;
 
-	pthread_mutex_lock(&encoders__lock);
+	pthread_mutex_lock(&btf_encoding_context.btf_encoder_list_lock);
 	/* encoder may not have been added to list yet; check. */
 	btf_encoders__for_each_encoder(existing) {
 		if (encoder == existing)
@@ -254,7 +299,7 @@ static void btf_encoders__delete(struct btf_encoder *encoder)
 	}
 	if (encoder == existing)
 		list_del(&encoder->node);
-	pthread_mutex_unlock(&encoders__lock);
+	pthread_mutex_unlock(&btf_encoding_context.btf_encoder_list_lock);
 }
 
 #define PERCPU_SECTION ".data..percpu"
@@ -1355,19 +1400,15 @@ int btf_encoder__add_saved_funcs(struct btf_encoder *encoder)
 		list_for_each_entry(s, &e->func_states, node)
 			nr_saved_fns++;
 	}
-	/* Another thread already did this work */
-	if (nr_saved_fns == 0) {
-		printf("nothing to do for encoder...\n");
+
+	if (nr_saved_fns == 0)
 		return 0;
-	}
 
-	printf("got %d saved functions...\n", nr_saved_fns);
 	saved_fns = calloc(nr_saved_fns, sizeof(*saved_fns));
 	btf_encoders__for_each_encoder(e) {
 		list_for_each_entry(s, &e->func_states, node)
 			saved_fns[i++] = s;
 	}
-	printf("added %d saved fns\n", i);
 	qsort(saved_fns, nr_saved_fns, sizeof(*saved_fns), saved_functions_cmp);
 
 	for (i = 0; i < nr_saved_fns; i = j) {
@@ -1399,6 +1440,9 @@ int btf_encoder__add_saved_funcs(struct btf_encoder *encoder)
 	free(saved_fns);
 	btf_encoders__for_each_encoder(e)
 		btf_encoder__delete_saved_funcs(e);
+
+	elf_functions__delete_all();
+
 	return 0;
 }
 
@@ -2177,7 +2221,6 @@ int btf_encoder__encode(struct btf_encoder *encoder)
 		err = btf_encoder__write_elf(encoder, encoder->btf, BTF_ELF_SEC);
 	}
 
-	elf_functions__delete_all();
 	return err;
 }
 
@@ -2564,6 +2607,18 @@ void btf_encoder__delete(struct btf_encoder *encoder)
 	free(encoder);
 }
 
+static inline void btf_encoder__delete_all(void)
+{
+	struct btf_encoder *encoder;
+	struct list_head *pos, *tmp;
+
+	list_for_each_safe(pos, tmp, &btf_encoding_context.btf_encoder_list) {
+		encoder = list_entry(pos, struct btf_encoder, node);
+		btf_encoder__delete(encoder);
+	}
+}
+
+
 int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct conf_load *conf_load)
 {
 	struct llvm_annotation *annot;
diff --git a/btf_encoder.h b/btf_encoder.h
index 29f652a..73d6e13 100644
--- a/btf_encoder.h
+++ b/btf_encoder.h
@@ -23,6 +23,9 @@ enum btf_var_option {
 	BTF_VAR_GLOBAL = 2,
 };
 
+int btf_encoding_context__init(void);
+void btf_encoding_context__exit(void);
+
 struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filename, struct btf *base_btf, bool verbose, struct conf_load *conf_load);
 void btf_encoder__delete(struct btf_encoder *encoder);
 
diff --git a/pahole.c b/pahole.c
index b5aea56..6b46399 100644
--- a/pahole.c
+++ b/pahole.c
@@ -3741,8 +3741,12 @@ int main(int argc, char *argv[])
 		conf_load.threads_collect = pahole_threads_collect;
 	}
 
-	if (btf_encode)
+	if (btf_encode) {
 		conf_load.pre_load_module = btf_encoder__pre_load_module;
+		err = btf_encoding_context__init();
+		if (err < 0)
+			goto out;
+	}
 
 	// Make 'pahole --header type < file' a shorter form of 'pahole -C type --count 1 < file'
 	if (conf.header_type && !class_name && prettify_input) {
@@ -3858,7 +3862,11 @@ try_sole_arg_as_class_names:
 			goto out_cus_delete;
 		}
 	}
+
 out_ok:
+	if (btf_encode)
+		btf_encoding_context__exit();
+
 	if (stats_formatter != NULL)
 		print_stats();
 
-- 
2.47.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 9/9] pahole: faster reproducible BTF encoding
  2024-11-28  1:23 [RFC PATCH 0/9] pahole: shared ELF and faster reproducible BTF encoding Ihor Solodrai
                   ` (7 preceding siblings ...)
  2024-11-28  1:24 ` [RFC PATCH 8/9] btf_encoder: introduce btf_encoding_context Ihor Solodrai
@ 2024-11-28  1:24 ` Ihor Solodrai
  2024-11-30  0:17   ` Eduard Zingerman
  2024-12-02 13:55 ` [RFC PATCH 0/9] pahole: shared ELF and " Jiri Olsa
  9 siblings, 1 reply; 26+ messages in thread
From: Ihor Solodrai @ 2024-11-28  1:24 UTC (permalink / raw)
  To: dwarves, acme; +Cc: bpf, alan.maguire, eddyz87, andrii, mykolal

Change multithreaded implementation of BTF encoding:

  * Use a single btf_encoder accumulating BTF for all compilation units
  * Make BTF encoding routine exclusive: only one thread at a time may
    execute btf_encoder__encode_cu
  * Introduce CU ids: an id is an index of a CU, in order they are
    created in dwarf_loader.c
  * Introduce CU__PROCESSED cu_state to inidicate what CUs have been
    processed by the encoder
  * Enforce encoding order of compilation units (struct cu) loaded
    from DWARF by utilizing global struct cus as a queue
  * reproducible_build option is now moot: BTF encoding is always
    reproducible with this change
  * Most of the code that merged the results of multiple BTF encoders
    into one BTF after CU processing is removed

Motivation behind this change and analysis that led to it are in the
cover letter to the patch series.

In short, this implementation of BTF encoding makes it reproducible
without sacrificing the performance gains from parallel
processing. The speed in terms of wall-clock time is comparable to
non-reproducible runs on pahole/next [1]. The memory footprint is
lower with increased number of threads.

pahole/next (12ca112):

            Performance counter stats for '/home/theihor/dev/dwarves/build/pahole -J -j24 --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_encode_detached=/dev/null --lang_exclude=rust /home/theihor/git/kernel.org/bpf-next/kbuild-output/.tmp_vmlinux1' (13 runs):

    50,493,244,369      cycles                                                                  ( +-  0.26% )

            1.6863 +- 0.0150 seconds time elapsed  ( +-  0.89% )

jobs 1, mem 546556 Kb, time 4.53 sec
jobs 2, mem 599776 Kb, time 2.81 sec
jobs 4, mem 661756 Kb, time 2.05 sec
jobs 8, mem 764584 Kb, time 1.58 sec
jobs 16, mem 844856 Kb, time 1.59 sec
jobs 32, mem 1047880 Kb, time 1.69 sec

This patchset on top of pahole/next:

 Performance counter stats for '/home/theihor/dev/dwarves/build/pahole -J -j24 --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_encode_detached=/dev/null --lang_exclude=rust /home/theihor/git/kernel.org/bpf-next/kbuild-output/.tmp_vmlinux1' (13 runs):

    31,175,635,417      cycles                                                                  ( +-  0.22% )

           1.58644 +- 0.00501 seconds time elapsed  ( +-  0.32% )

jobs 1, mem 544780 Kb, time 4.47 sec
jobs 2, mem 553944 Kb, time 4.68 sec
jobs 4, mem 563352 Kb, time 2.36 sec
jobs 8, mem 585508 Kb, time 1.73 sec
jobs 16, mem 635212 Kb, time 1.61 sec
jobs 32, mem 772752 Kb, time 1.59 sec

[1]: https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=12ca11281912c272f931e836b9160ee827250716

Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
---
 dwarf_loader.c |   4 +
 dwarves.c      |  47 ++++-----
 dwarves.h      |   5 +-
 pahole.c       | 255 +++++++++++++++++++------------------------------
 4 files changed, 127 insertions(+), 184 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index 5d55649..86f8b92 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3440,6 +3440,7 @@ struct dwarf_cus {
 	int		    build_id_len;
 	int		    error;
 	struct dwarf_cu	    *type_dcu;
+	uint32_t	nr_cus_created;
 };
 
 struct dwarf_thread {
@@ -3472,6 +3473,9 @@ static struct dwarf_cu *dwarf_cus__create_cu(struct dwarf_cus *dcus, Dwarf_Die *
 	cu->priv = dcu;
 	cu->dfops = &dwarf__ops;
 
+	cu->id = dcus->nr_cus_created;
+	dcus->nr_cus_created++;
+
 	return dcu;
 }
 
diff --git a/dwarves.c b/dwarves.c
index ae512b9..e8701ba 100644
--- a/dwarves.c
+++ b/dwarves.c
@@ -537,39 +537,18 @@ void cus__set_cu_state(struct cus *cus, struct cu *cu, enum cu_state state)
 	cus__unlock(cus);
 }
 
-// Used only when reproducible builds are desired
-struct cu *cus__get_next_processable_cu(struct cus *cus)
+struct cu *cus__get_cu_by_id(struct cus *cus, uint32_t id)
 {
-	struct cu *cu;
-
+	struct cu *cu, *result = NULL;
 	cus__lock(cus);
-
 	list_for_each_entry(cu, &cus->cus, node) {
-		switch (cu->state) {
-		case CU__LOADED:
-			cu->state = CU__PROCESSING;
-			goto found;
-		case CU__PROCESSING:
-			// This will happen when we get to parallel
-			// reproducible BTF encoding, libbpf dedup work needed
-			// here. The other possibility is when we're flushing
-			// the DWARF processed CUs when the parallel DWARF
-			// loading stoped and we still have CUs to encode to
-			// BTF because of ordering requirements.
-			continue;
-		case CU__UNPROCESSED:
-			// The first entry isn't loaded, signal the
-			// caller to return and try another day, as we
-			// need to respect the original DWARF CU ordering.
-			goto out;
+		if (cu->id == id) {
+			result = cu;
+			break;
 		}
 	}
-out:
-	cu = NULL;
-found:
 	cus__unlock(cus);
-
-	return cu;
+	return result;
 }
 
 bool cus__empty(const struct cus *cus)
@@ -610,6 +589,20 @@ void cus__add(struct cus *cus, struct cu *cu)
 	cu__find_class_holes(cu);
 }
 
+void cus__remove_processed_cus(struct cus *cus)
+{
+	struct cu *cu, *tmp;
+
+	cus__lock(cus);
+	list_for_each_entry_safe(cu, tmp, &cus->cus, node) {
+		if (cu->state == CU__PROCESSED) {
+			__cus__remove(cus, cu);
+			cu__delete(cu);
+		}
+	}
+	cus__unlock(cus);
+}
+
 static void ptr_table__init(struct ptr_table *pt)
 {
 	pt->entries = NULL;
diff --git a/dwarves.h b/dwarves.h
index 1a0bd4b..8af4045 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -49,6 +49,7 @@ enum cu_state {
 	CU__UNPROCESSED,
 	CU__LOADED,
 	CU__PROCESSING,
+	CU__PROCESSED,
 };
 
 /*
@@ -194,8 +195,9 @@ void cus__add(struct cus *cus, struct cu *cu);
 
 void __cus__remove(struct cus *cus, struct cu *cu);
 void cus__remove(struct cus *cus, struct cu *cu);
+void cus__remove_processed_cus(struct cus *cus);
 
-struct cu *cus__get_next_processable_cu(struct cus *cus);
+struct cu *cus__get_cu_by_id(struct cus *cus, uint32_t id);
 
 void cus__set_cu_state(struct cus *cus, struct cu *cu, enum cu_state state);
 
@@ -297,6 +299,7 @@ struct cu {
 	struct ptr_table functions_table;
 	struct ptr_table tags_table;
 	struct rb_root	 functions;
+	uint32_t	 id;
 	const char	 *name;
 	char		 *filename;
 	void 		 *priv;
diff --git a/pahole.c b/pahole.c
index 6b46399..f56c66f 100644
--- a/pahole.c
+++ b/pahole.c
@@ -94,6 +94,7 @@ static struct conf_fprintf conf = {
 
 static struct conf_load conf_load = {
 	.conf_fprintf = &conf,
+	.nr_jobs = 1,
 };
 
 struct structure {
@@ -3136,12 +3137,10 @@ static bool print_enumeration_with_enumerator(struct cu *cu, const char *name)
 	return false;
 }
 
-struct thread_data {
-	struct btf *btf;
-	struct btf_encoder *encoder;
-};
+// not used yet
+struct thread_data { int dummy; };
 
-static int pahole_threads_prepare_reproducible_build(struct conf_load *conf, int nr_threads, void **thr_data)
+static int pahole_threads_prepare(struct conf_load *conf, int nr_threads, void **thr_data)
 {
 	for (int i = 0; i < nr_threads; i++)
 		thr_data[i] = NULL;
@@ -3149,17 +3148,6 @@ static int pahole_threads_prepare_reproducible_build(struct conf_load *conf, int
 	return 0;
 }
 
-static int pahole_threads_prepare(struct conf_load *conf, int nr_threads, void **thr_data)
-{
-	int i;
-	struct thread_data *threads = calloc(sizeof(struct thread_data), nr_threads);
-
-	for (i = 0; i < nr_threads; i++)
-		thr_data[i] = threads + i;
-
-	return 0;
-}
-
 static int pahole_thread_exit(struct conf_load *conf, void *thr_data)
 {
 	struct thread_data *thread = thr_data;
@@ -3179,7 +3167,6 @@ static int pahole_threads_collect(struct conf_load *conf, int nr_threads, void *
 				  int error)
 {
 	struct thread_data **threads = (struct thread_data **)thr_data;
-	int i;
 	int err = 0;
 
 	if (error)
@@ -3189,29 +3176,97 @@ static int pahole_threads_collect(struct conf_load *conf, int nr_threads, void *
 	if (err < 0)
 		goto out;
 
-	for (i = 0; i < nr_threads; i++) {
-		/*
-		 * Merge content of the btf instances of worker threads to the btf
-		 * instance of the primary btf_encoder.
-                */
-		if (!threads[i]->encoder || !threads[i]->btf)
-			continue;
-		err = btf_encoder__add_encoder(btf_encoder, threads[i]->encoder);
-		if (err < 0)
-			goto out;
+out:
+	free(threads[0]);
+
+	return err;
+}
+
+static inline void btf_encoder__init(struct cu *cu, struct conf_load *conf_load)
+{
+	btf_encoder = btf_encoder__new(cu,
+				       detached_btf_filename,
+				       conf_load->base_btf,
+				       global_verbose,
+				       conf_load);
+	if (!btf_encoder) {
+		fprintf(stderr, "Error creating BTF encoder.\n");
+		exit(1);
 	}
-	err = 0;
+}
 
-out:
-	for (i = 0; i < nr_threads; i++) {
-		if (threads[i]->encoder && threads[i]->encoder != btf_encoder) {
-			btf_encoder__delete(threads[i]->encoder);
-			threads[i]->encoder = NULL;
+static enum load_steal_kind pahole_stealer__threaded_btf_encode(struct cu *cu, struct conf_load *conf_load)
+{
+	static pthread_mutex_t btf_encode_lock = PTHREAD_MUTEX_INITIALIZER;
+	static uint32_t next_cu_id_to_encode;
+
+	struct cu *cu_to_encode = NULL;
+	int encoding_ret;
+
+	if (pthread_mutex_trylock(&btf_encode_lock) != 0) {
+		/* Another thread is busy encoding.
+		 * Check the number of CUs in queue, and if it's too high, wait.
+		 */
+		while (cus__nr_entries(cus) >= conf_load->nr_jobs - 1) {
+			usleep(1000);
+			cus__remove_processed_cus(cus);
 		}
+		goto out;
 	}
-	free(threads[0]);
 
-	return err;
+	/* Got the lock: check whether btf_encoder has already been created */
+	if (!btf_encoder) {
+		if (cu->id == 0)
+			btf_encoder__init(cu, conf_load);
+		goto out_unlock;
+	}
+
+	/* Proceed to encoding in order of cu->id
+	 * cus->cus serves as a queue of loaded cu-s
+	 * Keep encoding until a cu with expected id is not yet loaded
+	 */
+	do {
+		cu_to_encode = cus__get_cu_by_id(cus, next_cu_id_to_encode);
+		if (!cu_to_encode || cu_to_encode->state != CU__LOADED)
+			break;
+
+		encoding_ret = btf_encoder__encode_cu(btf_encoder,
+						      cu_to_encode,
+						      conf_load);
+		if (encoding_ret < 0) {
+			fprintf(stderr, "Encountered error while encoding BTF.\n");
+			exit(1);
+		}
+
+		cus__set_cu_state(cus, cu_to_encode, CU__PROCESSED);
+		next_cu_id_to_encode++;
+	} while (true);
+
+out_unlock:
+	pthread_mutex_unlock(&btf_encode_lock);
+out:
+	// always keep incoming CUs
+	// they will be deleted by cus__remove_processed_cus()
+	return LSK__KEEPIT;
+}
+
+static enum load_steal_kind pahole_stealer__btf_encode(struct cu *cu, struct conf_load *conf_load)
+{
+	int ret;
+
+	if (!btf_encoder)
+		btf_encoder__init(cu, conf_load);
+
+	ret = btf_encoder__encode_cu(btf_encoder, cu, conf_load);
+	if (ret < 0) {
+		fprintf(stderr, "Encountered error while encoding BTF.\n");
+		exit(1);
+	}
+
+	// This has no meaning for a single thread, but let's be consistent
+	cus__set_cu_state(cus, cu, CU__PROCESSED);
+
+	return ret;
 }
 
 static enum load_steal_kind pahole_stealer(struct cu *cu,
@@ -3238,94 +3293,10 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 		return LSK__DELETE; // Maybe we can find this in several CUs, so don't stop it
 
 	if (btf_encode) {
-		static pthread_mutex_t btf_lock = PTHREAD_MUTEX_INITIALIZER;
-		struct btf_encoder *encoder;
-
-		pthread_mutex_lock(&btf_lock);
-		/*
-		 * FIXME:
-		 *
-		 * This should be really done at main(), but since in the current codebase only at this
-		 * point we'll have cu->elf setup...
-		 */
-		if (!btf_encoder) {
-			/*
-			 * btf_encoder is the primary encoder.
-			 * And, it is used by the thread
-			 * create it.
-			 */
-			btf_encoder = btf_encoder__new(cu, detached_btf_filename, conf_load->base_btf,
-						       global_verbose, conf_load);
-			if (btf_encoder && thr_data) {
-				struct thread_data *thread = thr_data;
-
-				thread->encoder = btf_encoder;
-				thread->btf = btf_encoder__btf(btf_encoder);
-			}
-		}
-
-		// Reproducible builds don't have multiple btf_encoders, so we need to keep the lock until we encode BTF for this CU.
-		if (thr_data)
-			pthread_mutex_unlock(&btf_lock);
-
-		if (!btf_encoder) {
-			ret = LSK__STOP_LOADING;
-			goto out_btf;
-		}
-
-		/*
-		 * thr_data keeps per-thread data for worker threads.  Each worker thread
-		 * has an encoder.  The main thread will merge the data collected by all
-		 * these encoders to btf_encoder.  However, the first thread reaching this
-		 * function creates btf_encoder and reuses it as its local encoder.  It
-		 * avoids copying the data collected by the first thread.
-		 */
-		if (thr_data) {
-			struct thread_data *thread = thr_data;
-
-			if (thread->encoder == NULL) {
-				thread->encoder =
-					btf_encoder__new(cu, detached_btf_filename,
-							 NULL,
-							 global_verbose,
-							 conf_load);
-				thread->btf = btf_encoder__btf(thread->encoder);
-			}
-			encoder = thread->encoder;
-		} else {
-			encoder = btf_encoder;
-		}
-
-		// Since we don't have yet a way to parallelize the BTF encoding, we
-		// need to ask the loader for the next CU that we can process, one
-		// that is loaded and is in order, if the next one isn't yet loaded,
-		// then return to let the DWARF loader thread to load the next one,
-		// eventually all will get processed, even if when all DWARF loading
-		// threads finish.
-		if (conf_load->reproducible_build) {
-			ret = LSK__KEEPIT; // we're not processing the cu passed to this
-					  // function, so keep it.
-			cu = cus__get_next_processable_cu(cus);
-			if (cu == NULL)
-				goto out_btf;
-		}
-
-		ret = btf_encoder__encode_cu(encoder, cu, conf_load);
-		if (ret < 0) {
-			fprintf(stderr, "Encountered error while encoding BTF.\n");
-			exit(1);
-		}
-
-		if (conf_load->reproducible_build) {
-			ret = LSK__KEEPIT; // we're not processing the cu passed to this function, so keep it.
-			// Kinda equivalent to LSK__DELETE since we processed this, but we can't delete it
-			// as we stash references to entries in CUs for 'struct function' in btf_encoder__add_saved_funcs()
-			// and btf_encoder__save_func(), so we can't delete them here. - Alan Maguire
-		}
-out_btf:
-		if (!thr_data) // See comment about reproducibe_build above
-			pthread_mutex_unlock(&btf_lock);
-		return ret;
+		if (conf_load->nr_jobs > 1)
+			return pahole_stealer__threaded_btf_encode(cu, conf_load);
+		else
+			return pahole_stealer__btf_encode(cu, conf_load);
 	}
 #if 0
 	if (ctf_encode) {
@@ -3625,24 +3596,6 @@ out_free:
 	return ret;
 }
 
-static int cus__flush_reproducible_build(struct cus *cus, struct btf_encoder *encoder, struct conf_load *conf_load)
-{
-	int err = 0;
-
-	while (true) {
-		struct cu *cu = cus__get_next_processable_cu(cus);
-
-		if (cu == NULL)
-			break;
-
-		err = btf_encoder__encode_cu(encoder, cu, conf_load);
-		if (err < 0)
-			break;
-	}
-
-	return err;
-}
-
 int main(int argc, char *argv[])
 {
 	int err, remaining, rc = EXIT_FAILURE;
@@ -3731,18 +3684,12 @@ int main(int argc, char *argv[])
 	if (languages.exclude)
 		conf_load.early_cu_filter = cu__filter;
 
-	conf_load.thread_exit = pahole_thread_exit;
-
-	if (conf_load.reproducible_build) {
-		conf_load.threads_prepare = pahole_threads_prepare_reproducible_build;
-		conf_load.threads_collect = NULL;
-	} else {
+	if (btf_encode) {
+		conf_load.pre_load_module = btf_encoder__pre_load_module;
 		conf_load.threads_prepare = pahole_threads_prepare;
 		conf_load.threads_collect = pahole_threads_collect;
-	}
+		conf_load.thread_exit = pahole_thread_exit;
 
-	if (btf_encode) {
-		conf_load.pre_load_module = btf_encoder__pre_load_module;
 		err = btf_encoding_context__init();
 		if (err < 0)
 			goto out;
@@ -3847,13 +3794,9 @@ try_sole_arg_as_class_names:
 	header = NULL;
 
 	if (btf_encode && btf_encoder) { // maybe all CUs were filtered out and thus we don't have an encoder?
-		if (conf_load.reproducible_build &&
-		    cus__flush_reproducible_build(cus, btf_encoder, &conf_load) < 0) {
-			fprintf(stderr, "Encountered error while encoding BTF.\n");
-			exit(1);
-		}
 
-		if (conf_load.nr_jobs <= 1 || conf_load.reproducible_build)
+		// pahole_threads_collect() is not called in single-threaded runs
+		if (conf_load.nr_jobs <= 1)
 			btf_encoder__add_saved_funcs(btf_encoder);
 
 		err = btf_encoder__encode(btf_encoder);
-- 
2.47.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 1/9] btf_encoder: simplify function encoding
  2024-11-28  1:23 ` [RFC PATCH 1/9] btf_encoder: simplify function encoding Ihor Solodrai
@ 2024-11-29  8:16   ` Eduard Zingerman
  2024-12-02 13:55   ` Jiri Olsa
  1 sibling, 0 replies; 26+ messages in thread
From: Eduard Zingerman @ 2024-11-29  8:16 UTC (permalink / raw)
  To: Ihor Solodrai, dwarves, acme; +Cc: bpf, alan.maguire, andrii, mykolal

On Thu, 2024-11-28 at 01:23 +0000, Ihor Solodrai wrote:
> From: Alan Maguire <alan.maguire@oracle.com>
> 
> Currently we have two modes of function encoding; one adds functions
> based upon the first instance found and ignores inconsistent
> representations.  The second saves function representations and later
> finds inconsistencies.  The mode chosen is determined by
> conf_load->skip_encoding_btf_inconsistent_proto.
> 
> The knock-on effect is that we need to support two modes in
> btf_encoder__add_func(); one for each case.  Simplify by using
> the "save function" approach for both cases; only difference is
> that we allow inconsistent representations if
> skip_encoding_btf_inconsistent_proto is not set (it is set by default
> for upstream kernels and has been for a while).
> 
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/9] btf_encoder: store,use section-relative addresses in ELF function representation
  2024-11-28  1:23 ` [RFC PATCH 2/9] btf_encoder: store,use section-relative addresses in ELF function representation Ihor Solodrai
@ 2024-11-29  9:07   ` Eduard Zingerman
  2024-12-02 14:34     ` Alan Maguire
  0 siblings, 1 reply; 26+ messages in thread
From: Eduard Zingerman @ 2024-11-29  9:07 UTC (permalink / raw)
  To: Ihor Solodrai, dwarves, acme; +Cc: bpf, alan.maguire, andrii, mykolal

On Thu, 2024-11-28 at 01:23 +0000, Ihor Solodrai wrote:
> From: Alan Maguire <alan.maguire@oracle.com>
> 
> This will help us do more accurate DWARF/ELF matching.

It would be good to have a more detailed explanation here.
E.g. number of generated functions differs with this patch:

# without this patch
$ bpftool btf dump file /home/eddy/work/tmp/old.btf | grep "\] FUNC '" | wc -l
48056
# with this patch
$ bpftool btf dump file /home/eddy/work/tmp/new.btf | grep "\] FUNC '" | wc -l
48189

It would be helpful to peek one of newly added functions and explain
why it was previously excluded.

> 
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
> ---
>  btf_encoder.c | 37 +++++++++++++++++++++++++++++++------
>  1 file changed, 31 insertions(+), 6 deletions(-)
> 
> diff --git a/btf_encoder.c b/btf_encoder.c
> index 98e4d7d..01d7094 100644
> --- a/btf_encoder.c
> +++ b/btf_encoder.c
> @@ -88,6 +88,7 @@ struct btf_encoder_func_state {
>  struct elf_function {
>  	const char	*name;
>  	char		*alias;
> +	uint32_t	addr;
>  	size_t		prefixlen;
>  	struct btf_encoder_func_state state;
>  };
> @@ -131,6 +132,7 @@ struct btf_encoder {
>  		int		    allocated;
>  		int		    cnt;
>  		int		    suffix_cnt; /* number of .isra, .part etc */
> +		uint64_t	    base_addr;

This field is set, but never read.

>  	} functions;
>  };
>  

[...]


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/9] btf_encoder: separate elf function, saved function representations
  2024-11-28  1:23 ` [RFC PATCH 3/9] btf_encoder: separate elf function, saved function representations Ihor Solodrai
@ 2024-11-29 20:37   ` Eduard Zingerman
  2024-12-09 21:19     ` Ihor Solodrai
  0 siblings, 1 reply; 26+ messages in thread
From: Eduard Zingerman @ 2024-11-29 20:37 UTC (permalink / raw)
  To: Ihor Solodrai, dwarves, acme; +Cc: bpf, alan.maguire, andrii, mykolal

On Thu, 2024-11-28 at 01:23 +0000, Ihor Solodrai wrote:
> From: Alan Maguire <alan.maguire@oracle.com>
> 
> Have saved function representation point back at immutable ELF function
> table.  This will make sharing the ELF function table across encoders
> easier.  Simply accumulate saved functions for each encoder, and on
> completion combine them into a name-sorted list.  Then carry out
> comparisons to check for inconsistent representations, skipping functions
> that are inconsistent in their representation.
> 
> Thre is a small growth in maximum resident set size due to saving
> more functions; it grows from
> 
> 	Maximum resident set size (kbytes): 701888
> 
> to:
> 
> 	Maximum resident set size (kbytes): 704168
> 
> ...with this patch for -j1.
> 
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

I like what this patch does, a few nits below.

Note:
this patch leads to 58 less functions being generated,
compared to a previous patch, for my test configuration.
For example, functions like:
- hid_map_usage_clear
- jhash
- nlmsg_parse_deprecated_strict
Are not in the BTF anymore. It would be good if patch message could
explain why this happens.

[...]

> +static int btf_encoder__add_saved_funcs(struct btf_encoder *encoder)
> +{
> +	struct btf_encoder_func_state **saved_fns, *s;
> +	struct btf_encoder *e = NULL;
> +	int i = 0, j, nr_saved_fns = 0;
> +
> +	/* Retrieve function states from each encoder, combine them
> +	 * and sort by name, addr.
> +	 */
> +	btf_encoders__for_each_encoder(e) {
> +		list_for_each_entry(s, &e->func_states, node)
> +			nr_saved_fns++;
> +	}
> +	/* Another thread already did this work */
> +	if (nr_saved_fns == 0) {
> +		printf("nothing to do for encoder...\n");
> +		return 0;
> +	}

Nit: this function is called from pahole_threads_collect():

	static int pahole_threads_collect(...)
		for (i = 0; i < nr_threads; i++)
			...
			err = btf_encoder__add_encoder(btf_encoder, threads[i]->encoder);
			...

	int32_t btf_encoder__add_encoder(struct btf_encoder *encoder, struct btf_encoder *other)
		...
		btf_encoder__add_saved_funcs(other);
		...
	
      maybe move call to btf_encoder__add_saved_funcs() to pahole_threads_collect()
      outside of the loop? So that comment about another thread won't be necessary.

> +
> +	printf("got %d saved functions...\n", nr_saved_fns);
> +	saved_fns = calloc(nr_saved_fns, sizeof(*saved_fns));
> +	btf_encoders__for_each_encoder(e) {
> +		list_for_each_entry(s, &e->func_states, node)
> +			saved_fns[i++] = s;
> +	}
> +	printf("added %d saved fns\n", i);
> +	qsort(saved_fns, nr_saved_fns, sizeof(*saved_fns), saved_functions_cmp);
> +
> +	for (i = 0; i < nr_saved_fns; i = j) {
> +		struct btf_encoder_func_state *state = saved_fns[i];
> +
> +		/* Compare across sorted functions that match by name/prefix;
> +		 * share inconsistent/unexpected reg state between them.
> +		 */
> +		j = i + 1;
> +
> +		while (j < nr_saved_fns &&
> +		       saved_functions_combine(saved_fns[i], saved_fns[j]) == 0)
> +				j++;
> +
> +		/* do not exclude functions with optimized-out parameters; they
> +		 * may still be _called_ with the right parameter values, they
> +		 * just do not _use_ them.  Only exclude functions with
> +		 * unexpected register use or multiple inconsistent prototypes.
> +		 */
> +		if (!encoder->skip_encoding_inconsistent_proto ||
> +		    (!state->unexpected_reg && !state->inconsistent_proto)) {
> +			if (btf_encoder__add_func(state->encoder, state)) {
> +				free(saved_fns);
> +				return -1;
> +			}
> +		}
> +	}
> +	/* Now that we are done with function states, free them. */
> +	free(saved_fns);
> +	btf_encoders__for_each_encoder(e)
> +		btf_encoder__delete_saved_funcs(e);
> +	return 0;
> +}

[...]

> @@ -2437,16 +2470,8 @@ out_delete:
>  	return NULL;
>  }
>  
> -void btf_encoder__delete_func(struct elf_function *func)
> -{
> -	free(func->alias);

Nit: it looks like func->alias is never freed after this change.

> -	zfree(&func->state.annots);
> -	zfree(&func->state.parms);
> -}

[...]


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 4/9] dwarf_loader: introduce pre_load_module hook to conf_load
  2024-11-28  1:24 ` [RFC PATCH 4/9] dwarf_loader: introduce pre_load_module hook to conf_load Ihor Solodrai
@ 2024-11-29 21:03   ` Eduard Zingerman
  0 siblings, 0 replies; 26+ messages in thread
From: Eduard Zingerman @ 2024-11-29 21:03 UTC (permalink / raw)
  To: Ihor Solodrai, dwarves, acme; +Cc: bpf, alan.maguire, andrii, mykolal

On Thu, 2024-11-28 at 01:24 +0000, Ihor Solodrai wrote:
> Add a function pointer to conf_load, which is called immediately after
> Elf is extracted from Dwfl_Module in cus__proces_dwflmod.
> 
> This is a preparation for making elf_functions table shared between
> encoders. Shared table can be built as soon as the relevant Elf is
> available.
> 
> Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
> ---
>  dwarf_loader.c | 14 +++++++-------
>  dwarves.h      | 11 +++++++++--
>  2 files changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/dwarf_loader.c b/dwarf_loader.c
> index 598fde4..5d55649 100644
> --- a/dwarf_loader.c
> +++ b/dwarf_loader.c
> @@ -3796,13 +3796,6 @@ static int cus__load_module(struct cus *cus, struct conf_load *conf,
>  	return DWARF_CB_OK;
>  }
>  
> -struct process_dwflmod_parms {
> -	struct cus	 *cus;
> -	struct conf_load *conf;
> -	const char	 *filename;
> -	uint32_t	 nr_dwarf_sections_found;
> -};
> -

This is probably a leftover, moving this structure is not necessary.

>  static int cus__process_dwflmod(Dwfl_Module *dwflmod,
>  				void **userdata __maybe_unused,
>  				const char *name __maybe_unused,
> @@ -3826,11 +3819,18 @@ static int cus__process_dwflmod(Dwfl_Module *dwflmod,
>  	Dwarf *dw = dwfl_module_getdwarf(dwflmod, &dwbias);
>  
>  	int err = DWARF_CB_OK;
> +	if (parms->conf->pre_load_module) {
> +		err = parms->conf->pre_load_module(dwflmod, elf);
> +		if (err)
> +			return DWARF_CB_ABORT;
> +	}
> +
>  	if (dw != NULL) {
>  		++parms->nr_dwarf_sections_found;
>  		err = cus__load_module(cus, parms->conf, dwflmod, dw, elf,
>  				       parms->filename);
>  	}
> +

Nit: useless white space change.

>  	/*
>  	 * XXX We will fall back to try finding other debugging
>  	 * formats (CTF), so no point in telling this to the user

[...]


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 5/9] btf_encoder: introduce elf_functions struct type
  2024-11-28  1:24 ` [RFC PATCH 5/9] btf_encoder: introduce elf_functions struct type Ihor Solodrai
@ 2024-11-29 21:50   ` Eduard Zingerman
  0 siblings, 0 replies; 26+ messages in thread
From: Eduard Zingerman @ 2024-11-29 21:50 UTC (permalink / raw)
  To: Ihor Solodrai, dwarves, acme; +Cc: bpf, alan.maguire, andrii, mykolal

On Thu, 2024-11-28 at 01:24 +0000, Ihor Solodrai wrote:
> Extract elf_functions struct type from btf_encoder.
> 
> Replace methods operating functions table in btf_encoder by methods
> operating on elf_functions:
> - btf_encoder__collect_function -> elf_functions__collect_function
> - btf_encoder__collect_symbols -> elf_functions__collect
> 
> Now these functions do not depend on btf_encoder being passed to them
> as a parameter.
> 
> Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

> @@ -2132,31 +2110,59 @@ int btf_encoder__encode(struct btf_encoder *encoder)
>  	return err;
>  }
>  
> -
> -static int btf_encoder__collect_symbols(struct btf_encoder *encoder)
> +static int elf_functions__collect(struct elf_functions *functions)
>  {
> -	bool base_addr_set = false;
> -	uint32_t sym_sec_idx;
> +	uint32_t nr_symbols = elf_symtab__nr_symbols(functions->symtab);
> +	struct elf_function *tmp;
> +	Elf32_Word sym_sec_idx;
>  	uint32_t core_id;
>  	GElf_Sym sym;
> +	int err;
> +
> +	/* We know that number of functions is less than number of symbols,
> +	 * so we can overallocate temporarily.
> +	 */
> +	functions->entries = calloc(nr_symbols, sizeof(struct elf_function));

Nit: use sizeof(*functions->entries) instead of sizeof(struct elf_function)
     here and elsewhere.

> +	if (!functions->entries) {
> +		err = -ENOMEM;
> +		goto out_free;
> +	}
>  
> -	elf_symtab__for_each_symbol_index(encoder->symtab, core_id, sym, sym_sec_idx) {
> -		if (!base_addr_set && sym_sec_idx && sym_sec_idx < encoder->seccnt) {
> -			encoder->functions.base_addr = encoder->secinfo[sym_sec_idx].addr;
> -			base_addr_set = true;
> +	functions->cnt = 0;
> +	elf_symtab__for_each_symbol_index(functions->symtab, core_id, sym, sym_sec_idx) {
> +		if (elf_functions__collect_function(functions, &sym)) {
> +			err = -1;
> +			goto out_free;

Nit: elf_functions__collect_function() never fails now (make it void?).

>  		}
> -		if (btf_encoder__collect_function(encoder, &sym))
> -			return -1;
>  	}
>  
> -	if (encoder->functions.cnt) {
> -		qsort(encoder->functions.entries, encoder->functions.cnt, sizeof(encoder->functions.entries[0]),
> +	if (functions->cnt) {
> +		qsort(functions->entries,
> +		      functions->cnt,
> +		      sizeof(functions->entries[0]),
>  		      functions_cmp);
> -		if (encoder->verbose)
> -			printf("Found %d functions!\n", encoder->functions.cnt);
> +	} else {
> +		err = 0;
> +		goto out_free;
> +	}
> +
> +	/* Reallocate to the exact size */
> +	tmp = realloc(functions->entries, functions->cnt * sizeof(struct elf_function));
> +	if (tmp) {
> +		functions->entries = tmp;
> +	} else {
> +		fprintf(stderr, "could not reallocate memory for elf_functions table\n");
> +		err = -ENOMEM;
> +		goto out_free;
>  	}
>  
>  	return 0;
> +
> +out_free:
> +	free(functions->entries);
> +	functions->entries = NULL;
> +	functions->cnt = 0;
> +	return err;
>  }
>  
>  static bool ftype__has_arg_names(const struct ftype *ftype)
> @@ -2417,6 +2423,7 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
>  				printf("%s: '%s' doesn't have symtab.\n", __func__, cu->filename);
>  			goto out;
>  		}
> +		encoder->functions.symtab = encoder->symtab;
>  
>  		/* index the ELF sections for later lookup */
>  
> @@ -2455,7 +2462,7 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
>  		if (!found_percpu && encoder->verbose)
>  			printf("%s: '%s' doesn't have '%s' section\n", __func__, cu->filename, PERCPU_SECTION);
>  
> -		if (btf_encoder__collect_symbols(encoder))
> +		if (elf_functions__collect(&encoder->functions))
>  			goto out_delete;
>  
>  		if (encoder->verbose)
> @@ -2486,7 +2493,7 @@ void btf_encoder__delete(struct btf_encoder *encoder)
>  	encoder->btf = NULL;
>  	elf_symtab__delete(encoder->symtab);
>  
> -	encoder->functions.allocated = encoder->functions.cnt = 0;
> +	encoder->functions.cnt = 0;
>  	free(encoder->functions.entries);
>  	encoder->functions.entries = NULL;

Nit: this cleanup code is repeated two times.




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 6/9] btf_encoder: collect elf_functions in btf_encoder__pre_load_module
  2024-11-28  1:24 ` [RFC PATCH 6/9] btf_encoder: collect elf_functions in btf_encoder__pre_load_module Ihor Solodrai
@ 2024-11-29 22:27   ` Eduard Zingerman
  0 siblings, 0 replies; 26+ messages in thread
From: Eduard Zingerman @ 2024-11-29 22:27 UTC (permalink / raw)
  To: Ihor Solodrai, dwarves, acme; +Cc: bpf, alan.maguire, andrii, mykolal

On Thu, 2024-11-28 at 01:24 +0000, Ihor Solodrai wrote:
> Introduce a global elf_functions_list variable in btf_encoder.c that
> contains an elf_functions per ELF.
> 
> An elf_functions structure is allocated and filled out by
> btf_encoder__pre_load_module() hook, and the list is cleared after
> btf_encoder__encode() is done.
> 
> At this point btf_encoders don't use shared elf_functions yet (each
> maintains their own copy as before), but it is built before encoders
> are initialized.
> 
> Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 7/9] btf_encoder: switch to shared elf_functions table
  2024-11-28  1:24 ` [RFC PATCH 7/9] btf_encoder: switch to shared elf_functions table Ihor Solodrai
@ 2024-11-29 22:35   ` Eduard Zingerman
  2024-12-09 23:55     ` Ihor Solodrai
  0 siblings, 1 reply; 26+ messages in thread
From: Eduard Zingerman @ 2024-11-29 22:35 UTC (permalink / raw)
  To: Ihor Solodrai, dwarves, acme; +Cc: bpf, alan.maguire, andrii, mykolal

On Thu, 2024-11-28 at 01:24 +0000, Ihor Solodrai wrote:
> Do not collect functions from ELF for each new btf_encoder, and
> instead set a pointer to a shared elf_functions table, built
> beforehand by btf_encoder__pre_cus__load_module().
> 
> Do not call btf_encoder__add_saved_funcs() on every
> btf_encoder__add_encoder(). Instead, for non-reproducible
> multi-threaded case do that in pahole_threads_collect(), and for
> single-threaded or reproducible_build do that right before
> btf_encoder__encode().
> 
> Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

> diff --git a/pahole.c b/pahole.c
> index 1f8cf4b..b5aea56 100644
> --- a/pahole.c
> +++ b/pahole.c
> @@ -3185,12 +3185,16 @@ static int pahole_threads_collect(struct conf_load *conf, int nr_threads, void *
>  	if (error)
>  		goto out;
>  
> +	err = btf_encoder__add_saved_funcs(btf_encoder);
> +	if (err < 0)
> +		goto out;
> +
>  	for (i = 0; i < nr_threads; i++) {
>  		/*
>  		 * Merge content of the btf instances of worker threads to the btf
>  		 * instance of the primary btf_encoder.
>                  */
> -		if (!threads[i]->btf)
> +		if (!threads[i]->encoder || !threads[i]->btf)
>  			continue;

When can 'threads[i]->encoder' be NULL? In case if worker thread exits with error?

>  		err = btf_encoder__add_encoder(btf_encoder, threads[i]->encoder);
>  		if (err < 0)

[...]


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 8/9] btf_encoder: introduce btf_encoding_context
  2024-11-28  1:24 ` [RFC PATCH 8/9] btf_encoder: introduce btf_encoding_context Ihor Solodrai
@ 2024-11-29 23:12   ` Eduard Zingerman
  0 siblings, 0 replies; 26+ messages in thread
From: Eduard Zingerman @ 2024-11-29 23:12 UTC (permalink / raw)
  To: Ihor Solodrai, dwarves, acme; +Cc: bpf, alan.maguire, andrii, mykolal

On Thu, 2024-11-28 at 01:24 +0000, Ihor Solodrai wrote:

Note, after applying this patch I see the following report from
address sanitizer:

Direct leak of 3640 byte(s) in 1 object(s) allocated from:
    #0 0x7f12d2ac2290 in calloc (/lib64/libasan.so.8+0xc2290) (BuildId: ...)
    #1 0x478775 in btf_encoder__new /home/eddy/work/dwarves-fork/btf_encoder.c:2562
    #2 0x416f98 in pahole_stealer /home/eddy/work/dwarves-fork/pahole.c:3257
    #3 0x49f9b9 in cu__finalize /home/eddy/work/dwarves-fork/dwarf_loader.c:3263

Which points to line:

    encoder->secinfo = calloc(encoder->seccnt, sizeof(*encoder->secinfo));

(when compiled with the following flags:
 -fsanitize=undefined,address -fsanitize-recover=address -fno-omit-frame-pointer)


[...]

> @@ -1355,19 +1400,15 @@ int btf_encoder__add_saved_funcs(struct btf_encoder *encoder)

Nit: should probably remove 'encoder' parameter and replace it with
     bool skip_encoding_inconsistent_proto
     (in one of the earlier patches, just noticed this).

>  		list_for_each_entry(s, &e->func_states, node)
>  			nr_saved_fns++;
>  	}
> -	/* Another thread already did this work */
> -	if (nr_saved_fns == 0) {
> -		printf("nothing to do for encoder...\n");
> +
> +	if (nr_saved_fns == 0)
>  		return 0;
> -	}
>  
> -	printf("got %d saved functions...\n", nr_saved_fns);
>  	saved_fns = calloc(nr_saved_fns, sizeof(*saved_fns));
>  	btf_encoders__for_each_encoder(e) {
>  		list_for_each_entry(s, &e->func_states, node)
>  			saved_fns[i++] = s;
>  	}
> -	printf("added %d saved fns\n", i);
>  	qsort(saved_fns, nr_saved_fns, sizeof(*saved_fns), saved_functions_cmp);
>  
>  	for (i = 0; i < nr_saved_fns; i = j) {

[...]



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 9/9] pahole: faster reproducible BTF encoding
  2024-11-28  1:24 ` [RFC PATCH 9/9] pahole: faster reproducible BTF encoding Ihor Solodrai
@ 2024-11-30  0:17   ` Eduard Zingerman
  0 siblings, 0 replies; 26+ messages in thread
From: Eduard Zingerman @ 2024-11-30  0:17 UTC (permalink / raw)
  To: Ihor Solodrai, dwarves, acme; +Cc: bpf, alan.maguire, andrii, mykolal

On Thu, 2024-11-28 at 01:24 +0000, Ihor Solodrai wrote:
> Change multithreaded implementation of BTF encoding:
> 
>   * Use a single btf_encoder accumulating BTF for all compilation units
>   * Make BTF encoding routine exclusive: only one thread at a time may
>     execute btf_encoder__encode_cu
>   * Introduce CU ids: an id is an index of a CU, in order they are
>     created in dwarf_loader.c
>   * Introduce CU__PROCESSED cu_state to inidicate what CUs have been
>     processed by the encoder
>   * Enforce encoding order of compilation units (struct cu) loaded
>     from DWARF by utilizing global struct cus as a queue
>   * reproducible_build option is now moot: BTF encoding is always
>     reproducible with this change
>   * Most of the code that merged the results of multiple BTF encoders
>     into one BTF after CU processing is removed
> 
> Motivation behind this change and analysis that led to it are in the
> cover letter to the patch series.
> 
> In short, this implementation of BTF encoding makes it reproducible
> without sacrificing the performance gains from parallel
> processing. The speed in terms of wall-clock time is comparable to
> non-reproducible runs on pahole/next [1]. The memory footprint is
> lower with increased number of threads.
> 
> pahole/next (12ca112):
> 
>             Performance counter stats for '/home/theihor/dev/dwarves/build/pahole -J -j24 --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_encode_detached=/dev/null --lang_exclude=rust /home/theihor/git/kernel.org/bpf-next/kbuild-output/.tmp_vmlinux1' (13 runs):
> 
>     50,493,244,369      cycles                                                                  ( +-  0.26% )
> 
>             1.6863 +- 0.0150 seconds time elapsed  ( +-  0.89% )
> 
> jobs 1, mem 546556 Kb, time 4.53 sec
> jobs 2, mem 599776 Kb, time 2.81 sec
> jobs 4, mem 661756 Kb, time 2.05 sec
> jobs 8, mem 764584 Kb, time 1.58 sec
> jobs 16, mem 844856 Kb, time 1.59 sec
> jobs 32, mem 1047880 Kb, time 1.69 sec
> 
> This patchset on top of pahole/next:
> 
>  Performance counter stats for '/home/theihor/dev/dwarves/build/pahole -J -j24 --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs --btf_encode_detached=/dev/null --lang_exclude=rust /home/theihor/git/kernel.org/bpf-next/kbuild-output/.tmp_vmlinux1' (13 runs):
> 
>     31,175,635,417      cycles                                                                  ( +-  0.22% )
> 
>            1.58644 +- 0.00501 seconds time elapsed  ( +-  0.32% )
> 
> jobs 1, mem 544780 Kb, time 4.47 sec
> jobs 2, mem 553944 Kb, time 4.68 sec
> jobs 4, mem 563352 Kb, time 2.36 sec
> jobs 8, mem 585508 Kb, time 1.73 sec
> jobs 16, mem 635212 Kb, time 1.61 sec
> jobs 32, mem 772752 Kb, time 1.59 sec
> 
> [1]: https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=12ca11281912c272f931e836b9160ee827250716
> 
> Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
> ---

I think this is a solid idea and a good observation,
but implementation inherits unnecessary complexity from the previous design.
There is no real need to keep single-threaded and multi-threaded modes
separate, instead:
- main thread can serve as a dedicated "collector" thread,
  waiting sequentially for CUs with ids ranging from 0 to number of cus;
- configurable number of worker threads can parse DWARF concurrently
  and put CU objects to the processing queue;
- the queue size has to be bounded to keep memory consumption within
  certain limits (but be careful, a simple bounded queue protected by
  semaphore won't do, as the queue might get fully filled with ids
  different from expected, in case when first CU takes a very long time
  to process and N CUs after it take very short time to process).

[...]


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 0/9] pahole: shared ELF and faster reproducible BTF encoding
  2024-11-28  1:23 [RFC PATCH 0/9] pahole: shared ELF and faster reproducible BTF encoding Ihor Solodrai
                   ` (8 preceding siblings ...)
  2024-11-28  1:24 ` [RFC PATCH 9/9] pahole: faster reproducible BTF encoding Ihor Solodrai
@ 2024-12-02 13:55 ` Jiri Olsa
  2024-12-06 18:19   ` Ihor Solodrai
  9 siblings, 1 reply; 26+ messages in thread
From: Jiri Olsa @ 2024-12-02 13:55 UTC (permalink / raw)
  To: Ihor Solodrai; +Cc: dwarves, acme, bpf, alan.maguire, eddyz87, andrii, mykolal

On Thu, Nov 28, 2024 at 01:23:44AM +0000, Ihor Solodrai wrote:

SNIP

> Test results for this patch series:
> 
>   1: Validation of BTF encoding of functions; this may take some time: Ok
>   2: Default BTF on a system without BTF: Ok
>   3: Flexible arrays accounting: WARNING: still unsuported BTF_KIND_DECL_TAG(bpf_fastcall) for bpf_cast_to_kern_ctx already with attribute (bpf_kfunc), ignoring
> WARNING: still unsuported BTF_KIND_DECL_TAG(bpf_fastcall) for bpf_rdonly_cast already with attribute (bpf_kfunc), ignoring
> pahole: type 'nft_pipapo_elem' not found
> pahole: type 'ip6t_standard' not found
> pahole: type 'ip6t_error' not found
> pahole: type 'nft_rbtree_elem' not found
> pahole: type 'nft_rule_dp_last' not found
> pahole: type 'nft_bitmap_elem' not found
> pahole: type 'fuse_direntplus' not found
> pahole: type 'ipt_standard' not found
> pahole: type 'ipt_error' not found
> pahole: type 'tls_rec' not found
> pahole: type 'nft_rhash_elem' not found
> pahole: type 'nft_hash_elem' not found
> Ok
>   4: Pretty printing of files using DWARF type information: Ok
>   5: Parallel reproducible DWARF Loading/Serial BTF encoding: Ok

hi,
when trying selftests with this change, I'm getting wrong .BTF
on bpf selftest bpf_testmod.ko module

$ bpftool btf dump file ./bpf_testmod.ko
Error: failed to load BTF from ./bpf_testmod.ko: Invalid argument

jirka

> 
> 
> [1]: https://lore.kernel.org/dwarves/20241016001025.857970-1-ihor.solodrai@pm.me/
> [2]: https://github.com/theihor/dwarves/pull/3
> [3]: https://lore.kernel.org/dwarves/8678ce40-3ce2-4ece-985b-a40427386d57@oracle.com/
> [4]: https://github.com/acmel/dwarves/compare/master...alan-maguire:dwarves:elf-prep
> [5]: https://github.com/theihor/dwarves/pull/8
> [6]: https://gist.github.com/theihor/f000ce89427828e61fdaa567b332649b
> [7]: https://github.com/theihor/dwarves/pull/8/commits/a7bc67d79d90f98776c6dc5fdaf9f088eb09909d
> 
> 
> Alan Maguire (3):
>   btf_encoder: simplify function encoding
>   btf_encoder: store,use section-relative addresses in ELF function
>     representation
>   btf_encoder: separate elf function, saved function representations
> 
> Ihor Solodrai (6):
>   dwarf_loader: introduce pre_load_module hook to conf_load
>   btf_encoder: introduce elf_functions struct type
>   btf_encoder: collect elf_functions in btf_encoder__pre_load_module
>   btf_encoder: switch to shared elf_functions table
>   btf_encoder: introduce btf_encoding_context
>   pahole: faster reproducible BTF encoding
> 
>  btf_encoder.c  | 661 ++++++++++++++++++++++++++++++-------------------
>  btf_encoder.h  |   6 +
>  dwarf_loader.c |  18 +-
>  dwarves.c      |  47 ++--
>  dwarves.h      |  16 +-
>  pahole.c       | 265 +++++++++-----------
>  6 files changed, 567 insertions(+), 446 deletions(-)
> 
> -- 
> 2.47.0
> 
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 1/9] btf_encoder: simplify function encoding
  2024-11-28  1:23 ` [RFC PATCH 1/9] btf_encoder: simplify function encoding Ihor Solodrai
  2024-11-29  8:16   ` Eduard Zingerman
@ 2024-12-02 13:55   ` Jiri Olsa
  1 sibling, 0 replies; 26+ messages in thread
From: Jiri Olsa @ 2024-12-02 13:55 UTC (permalink / raw)
  To: Ihor Solodrai; +Cc: dwarves, acme, bpf, alan.maguire, eddyz87, andrii, mykolal

On Thu, Nov 28, 2024 at 01:23:49AM +0000, Ihor Solodrai wrote:
> From: Alan Maguire <alan.maguire@oracle.com>
> 
> Currently we have two modes of function encoding; one adds functions
> based upon the first instance found and ignores inconsistent
> representations.  The second saves function representations and later
> finds inconsistencies.  The mode chosen is determined by
> conf_load->skip_encoding_btf_inconsistent_proto.
> 
> The knock-on effect is that we need to support two modes in
> btf_encoder__add_func(); one for each case.  Simplify by using
> the "save function" approach for both cases; only difference is
> that we allow inconsistent representations if
> skip_encoding_btf_inconsistent_proto is not set (it is set by default
> for upstream kernels and has been for a while).
> 
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>

Acked-by: Jiri Olsa <jolsa@kernel.org>

jirka

> ---
>  btf_encoder.c | 79 +++++++++++++++++----------------------------------
>  1 file changed, 26 insertions(+), 53 deletions(-)
> 
> diff --git a/btf_encoder.c b/btf_encoder.c
> index e1adddf..98e4d7d 100644
> --- a/btf_encoder.c
> +++ b/btf_encoder.c
> @@ -88,7 +88,6 @@ struct btf_encoder_func_state {
>  struct elf_function {
>  	const char	*name;
>  	char		*alias;
> -	bool		 generated;
>  	size_t		prefixlen;
>  	struct btf_encoder_func_state state;
>  };
> @@ -120,6 +119,7 @@ struct btf_encoder {
>  			  force,
>  			  gen_floats,
>  			  skip_encoding_decl_tag,
> +			  skip_encoding_inconsistent_proto,
>  			  tag_kfuncs,
>  			  gen_distilled_base;
>  	uint32_t	  array_index_id;
> @@ -1165,18 +1165,18 @@ out:
>  	return err;
>  }
>  
> -static int32_t btf_encoder__add_func(struct btf_encoder *encoder, struct function *fn,
> +static int32_t btf_encoder__add_func(struct btf_encoder *encoder,
>  				     struct elf_function *func)
>  {
> +	struct btf_encoder_func_state *state = &func->state;
>  	int btf_fnproto_id, btf_fn_id, tag_type_id = 0;
>  	int16_t component_idx = -1;
>  	const char *name;
>  	const char *value;
>  	char tmp_value[KSYM_NAME_LEN];
> +	uint16_t idx;
>  
> -	assert(fn != NULL || func != NULL);
> -
> -	btf_fnproto_id = btf_encoder__add_func_proto(encoder, fn ? &fn->proto : NULL, func);
> +	btf_fnproto_id = btf_encoder__add_func_proto(encoder, NULL, func);
>  	name = func->alias ?: func->name;
>  	if (btf_fnproto_id >= 0)
>  		btf_fn_id = btf_encoder__add_ref_type(encoder, BTF_KIND_FUNC, btf_fnproto_id,
> @@ -1186,40 +1186,23 @@ static int32_t btf_encoder__add_func(struct btf_encoder *encoder, struct functio
>  		       name, btf_fnproto_id < 0 ? "proto" : "func");
>  		return -1;
>  	}
> -	if (!fn) {
> -		struct btf_encoder_func_state *state = &func->state;
> -		uint16_t idx;
> -
> -		if (state->nr_annots == 0)
> -			return 0;
> +	if (state->nr_annots == 0)
> +		return 0;
>  
> -		for (idx = 0; idx < state->nr_annots; idx++) {
> -			struct btf_encoder_func_annot *a = &state->annots[idx];
> +	for (idx = 0; idx < state->nr_annots; idx++) {
> +		struct btf_encoder_func_annot *a = &state->annots[idx];
>  
> -			value = btf__str_by_offset(encoder->btf, a->value);
> -			/* adding BTF data may result in a mode of the
> -			 * value string memory, so make a temporary copy.
> -			 */
> -			strncpy(tmp_value, value, sizeof(tmp_value) - 1);
> -			component_idx = a->component_idx;
> -
> -			tag_type_id = btf_encoder__add_decl_tag(encoder, tmp_value,
> -								btf_fn_id, component_idx);
> -			if (tag_type_id < 0)
> -				break;
> -		}
> -	} else {
> -		struct llvm_annotation *annot;
> -
> -		list_for_each_entry(annot, &fn->annots, node) {
> -			value = annot->value;
> -			component_idx = annot->component_idx;
> +		value = btf__str_by_offset(encoder->btf, a->value);
> +		/* adding BTF data may result in a mode of the
> +		 * value string memory, so make a temporary copy.
> +		 */
> +		strncpy(tmp_value, value, sizeof(tmp_value) - 1);
> +		component_idx = a->component_idx;
>  
> -			tag_type_id = btf_encoder__add_decl_tag(encoder, value, btf_fn_id,
> -								component_idx);
> -			if (tag_type_id < 0)
> -				break;
> -		}
> +		tag_type_id = btf_encoder__add_decl_tag(encoder, tmp_value,
> +							btf_fn_id, component_idx);
> +		if (tag_type_id < 0)
> +			break;
>  	}
>  	if (tag_type_id < 0) {
>  		fprintf(stderr,
> @@ -1277,8 +1260,9 @@ static int btf_encoder__add_saved_funcs(struct btf_encoder *encoder)
>  		 * just do not _use_ them.  Only exclude functions with
>  		 * unexpected register use or multiple inconsistent prototypes.
>  		 */
> -		if (!state->unexpected_reg && !state->inconsistent_proto) {
> -			if (btf_encoder__add_func(encoder, NULL, func))
> +		if (!encoder->skip_encoding_inconsistent_proto ||
> +		    (!state->unexpected_reg && !state->inconsistent_proto)) {
> +			if (btf_encoder__add_func(encoder, func))
>  				return -1;
>  		}
>  		state->processed = 1;
> @@ -2339,6 +2323,7 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
>  		encoder->force		 = conf_load->btf_encode_force;
>  		encoder->gen_floats	 = conf_load->btf_gen_floats;
>  		encoder->skip_encoding_decl_tag	 = conf_load->skip_encoding_btf_decl_tag;
> +		encoder->skip_encoding_inconsistent_proto = conf_load->skip_encoding_btf_inconsistent_proto;
>  		encoder->tag_kfuncs	 = conf_load->btf_decl_tag_kfuncs;
>  		encoder->gen_distilled_base = conf_load->btf_gen_distilled_base;
>  		encoder->verbose	 = verbose;
> @@ -2544,7 +2529,6 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
>  
>  	cu__for_each_function(cu, core_id, fn) {
>  		struct elf_function *func = NULL;
> -		bool save = false;
>  
>  		/*
>  		 * Skip functions that:
> @@ -2566,15 +2550,8 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
>  
>  			/* prefer exact function name match... */
>  			func = btf_encoder__find_function(encoder, name, 0);
> -			if (func) {
> -				if (func->generated)
> -					continue;
> -				if (conf_load->skip_encoding_btf_inconsistent_proto)
> -					save = true;
> -				else
> -					func->generated = true;
> -			} else if (encoder->functions.suffix_cnt &&
> -				   conf_load->btf_gen_optimized) {
> +			if (!func && encoder->functions.suffix_cnt &&
> +			    conf_load->btf_gen_optimized) {
>  				/* falling back to name.isra.0 match if no exact
>  				 * match is found; only bother if we found any
>  				 * .suffix function names.  The function
> @@ -2585,7 +2562,6 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
>  				func = btf_encoder__find_function(encoder, name,
>  								  strlen(name));
>  				if (func) {
> -					save = true;
>  					if (encoder->verbose)
>  						printf("matched function '%s' with '%s'%s\n",
>  						       name, func->name,
> @@ -2603,10 +2579,7 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
>  		if (!func)
>  			continue;
>  
> -		if (save)
> -			err = btf_encoder__save_func(encoder, fn, func);
> -		else
> -			err = btf_encoder__add_func(encoder, fn, func);
> +		err = btf_encoder__save_func(encoder, fn, func);
>  		if (err)
>  			goto out;
>  	}
> -- 
> 2.47.0
> 
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 2/9] btf_encoder: store,use section-relative addresses in ELF function representation
  2024-11-29  9:07   ` Eduard Zingerman
@ 2024-12-02 14:34     ` Alan Maguire
  0 siblings, 0 replies; 26+ messages in thread
From: Alan Maguire @ 2024-12-02 14:34 UTC (permalink / raw)
  To: Eduard Zingerman, Ihor Solodrai, dwarves, acme; +Cc: bpf, andrii, mykolal

On 29/11/2024 09:07, Eduard Zingerman wrote:
> On Thu, 2024-11-28 at 01:23 +0000, Ihor Solodrai wrote:
>> From: Alan Maguire <alan.maguire@oracle.com>
>>
>> This will help us do more accurate DWARF/ELF matching.
> 
> It would be good to have a more detailed explanation here.
> E.g. number of generated functions differs with this patch:
> 
> # without this patch
> $ bpftool btf dump file /home/eddy/work/tmp/old.btf | grep "\] FUNC '" | wc -l
> 48056
> # with this patch
> $ bpftool btf dump file /home/eddy/work/tmp/new.btf | grep "\] FUNC '" | wc -l
> 48189
> 
> It would be helpful to peek one of newly added functions and explain
> why it was previously excluded.
>

I've also found the opposite situation; that some functions are not
present after this patch that were before. In your case - where more
were matched - I suspect the extra address hint allowed us to do a
better job of matching ELF and DWARF such that we matched .isra.0
functions that represent actual function boundaries rather than .cold
subfunctions that had different signatures. I'm working on a respin of
the initial patches that ensures we get the same (or more) number of
functions, but it shouldn't change substantially anything that follows
that Ihor has done.

>>
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>> Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
>> ---
>>  btf_encoder.c | 37 +++++++++++++++++++++++++++++++------
>>  1 file changed, 31 insertions(+), 6 deletions(-)
>>
>> diff --git a/btf_encoder.c b/btf_encoder.c
>> index 98e4d7d..01d7094 100644
>> --- a/btf_encoder.c
>> +++ b/btf_encoder.c
>> @@ -88,6 +88,7 @@ struct btf_encoder_func_state {
>>  struct elf_function {
>>  	const char	*name;
>>  	char		*alias;
>> +	uint32_t	addr;
>>  	size_t		prefixlen;
>>  	struct btf_encoder_func_state state;
>>  };
>> @@ -131,6 +132,7 @@ struct btf_encoder {
>>  		int		    allocated;
>>  		int		    cnt;
>>  		int		    suffix_cnt; /* number of .isra, .part etc */
>> +		uint64_t	    base_addr;
> 
> This field is set, but never read.
> 
>>  	} functions;
>>  };
>>  
> 
> [...]
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 0/9] pahole: shared ELF and faster reproducible BTF encoding
  2024-12-02 13:55 ` [RFC PATCH 0/9] pahole: shared ELF and " Jiri Olsa
@ 2024-12-06 18:19   ` Ihor Solodrai
  2024-12-06 18:30     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 26+ messages in thread
From: Ihor Solodrai @ 2024-12-06 18:19 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: dwarves, acme, bpf, alan.maguire, eddyz87, andrii, mykolal

On Monday, December 2nd, 2024 at 5:55 AM, Jiri Olsa <olsajiri@gmail.com> wrote:

> 
> 
> On Thu, Nov 28, 2024 at 01:23:44AM +0000, Ihor Solodrai wrote:
> 
> SNIP
> 
> > Test results for this patch series:
> > 
> > 1: Validation of BTF encoding of functions; this may take some time: Ok
> > 2: Default BTF on a system without BTF: Ok
> > 3: Flexible arrays accounting: WARNING: still unsuported BTF_KIND_DECL_TAG(bpf_fastcall) for bpf_cast_to_kern_ctx already with attribute (bpf_kfunc), ignoring
> > WARNING: still unsuported BTF_KIND_DECL_TAG(bpf_fastcall) for bpf_rdonly_cast already with attribute (bpf_kfunc), ignoring
> > pahole: type 'nft_pipapo_elem' not found
> > pahole: type 'ip6t_standard' not found
> > pahole: type 'ip6t_error' not found
> > pahole: type 'nft_rbtree_elem' not found
> > pahole: type 'nft_rule_dp_last' not found
> > pahole: type 'nft_bitmap_elem' not found
> > pahole: type 'fuse_direntplus' not found
> > pahole: type 'ipt_standard' not found
> > pahole: type 'ipt_error' not found
> > pahole: type 'tls_rec' not found
> > pahole: type 'nft_rhash_elem' not found
> > pahole: type 'nft_hash_elem' not found
> > Ok
> > 4: Pretty printing of files using DWARF type information: Ok
> > 5: Parallel reproducible DWARF Loading/Serial BTF encoding: Ok
> 
> 
> hi,
> when trying selftests with this change, I'm getting wrong .BTF
> on bpf selftest bpf_testmod.ko module
> 
> $ bpftool btf dump file ./bpf_testmod.ko
> Error: failed to load BTF from ./bpf_testmod.ko: Invalid argument

Hi Jiri, thank you for testing.

I think the reason for this failure is that changes in the last patch
of the series [1] don't handle correctly a situation when the number
of CUs is lesser than the number of jobs. I was too focused on trying
to speed up vmlinux encoding.

I am going to try implementing a clear queueing interface between
dwarf_loader and pahole_stealer. Hopefully it will make it harder to
introduce bugs like this.

I've started working on the v2 of this series which I hope to submit
sometime next week.

[1] https://lore.kernel.org/dwarves/20241128012341.4081072-10-ihor.solodrai@pm.me/

> 
> jirka
> 
> > Alan Maguire (3):
> > btf_encoder: simplify function encoding
> > btf_encoder: store,use section-relative addresses in ELF function
> > representation
> > btf_encoder: separate elf function, saved function representations
> > 
> > Ihor Solodrai (6):
> > dwarf_loader: introduce pre_load_module hook to conf_load
> > btf_encoder: introduce elf_functions struct type
> > btf_encoder: collect elf_functions in btf_encoder__pre_load_module
> > btf_encoder: switch to shared elf_functions table
> > btf_encoder: introduce btf_encoding_context
> > pahole: faster reproducible BTF encoding
> > 
> > btf_encoder.c | 661 ++++++++++++++++++++++++++++++-------------------
> > btf_encoder.h | 6 +
> > dwarf_loader.c | 18 +-
> > dwarves.c | 47 ++--
> > dwarves.h | 16 +-
> > pahole.c | 265 +++++++++-----------
> > 6 files changed, 567 insertions(+), 446 deletions(-)
> > 
> > --
> > 2.47.0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 0/9] pahole: shared ELF and faster reproducible BTF encoding
  2024-12-06 18:19   ` Ihor Solodrai
@ 2024-12-06 18:30     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 26+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-12-06 18:30 UTC (permalink / raw)
  To: Ihor Solodrai
  Cc: Jiri Olsa, dwarves, bpf, alan.maguire, eddyz87, andrii, mykolal

On Fri, Dec 06, 2024 at 06:19:24PM +0000, Ihor Solodrai wrote:
> On Monday, December 2nd, 2024 at 5:55 AM, Jiri Olsa <olsajiri@gmail.com> wrote:
> 
> > 
> > 
> > On Thu, Nov 28, 2024 at 01:23:44AM +0000, Ihor Solodrai wrote:
> > 
> > SNIP
> > 
> > > Test results for this patch series:
> > > 
> > > 1: Validation of BTF encoding of functions; this may take some time: Ok
> > > 2: Default BTF on a system without BTF: Ok
> > > 3: Flexible arrays accounting: WARNING: still unsuported BTF_KIND_DECL_TAG(bpf_fastcall) for bpf_cast_to_kern_ctx already with attribute (bpf_kfunc), ignoring
> > > WARNING: still unsuported BTF_KIND_DECL_TAG(bpf_fastcall) for bpf_rdonly_cast already with attribute (bpf_kfunc), ignoring
> > > pahole: type 'nft_pipapo_elem' not found
> > > pahole: type 'ip6t_standard' not found
> > > pahole: type 'ip6t_error' not found
> > > pahole: type 'nft_rbtree_elem' not found
> > > pahole: type 'nft_rule_dp_last' not found
> > > pahole: type 'nft_bitmap_elem' not found
> > > pahole: type 'fuse_direntplus' not found
> > > pahole: type 'ipt_standard' not found
> > > pahole: type 'ipt_error' not found
> > > pahole: type 'tls_rec' not found
> > > pahole: type 'nft_rhash_elem' not found
> > > pahole: type 'nft_hash_elem' not found
> > > Ok
> > > 4: Pretty printing of files using DWARF type information: Ok
> > > 5: Parallel reproducible DWARF Loading/Serial BTF encoding: Ok
> > 
> > 
> > hi,
> > when trying selftests with this change, I'm getting wrong .BTF
> > on bpf selftest bpf_testmod.ko module
> > 
> > $ bpftool btf dump file ./bpf_testmod.ko
> > Error: failed to load BTF from ./bpf_testmod.ko: Invalid argument
> 
> Hi Jiri, thank you for testing.
> 
> I think the reason for this failure is that changes in the last patch
> of the series [1] don't handle correctly a situation when the number
> of CUs is lesser than the number of jobs. I was too focused on trying
> to speed up vmlinux encoding.
> 
> I am going to try implementing a clear queueing interface between
> dwarf_loader and pahole_stealer. Hopefully it will make it harder to
> introduce bugs like this.
> 
> I've started working on the v2 of this series which I hope to submit
> sometime next week.

Thanks for working on this! I'll try to join the reviewers team soon.

- Arnaldo
 
> [1] https://lore.kernel.org/dwarves/20241128012341.4081072-10-ihor.solodrai@pm.me/
> 
> > 
> > jirka
> > 
> > > Alan Maguire (3):
> > > btf_encoder: simplify function encoding
> > > btf_encoder: store,use section-relative addresses in ELF function
> > > representation
> > > btf_encoder: separate elf function, saved function representations
> > > 
> > > Ihor Solodrai (6):
> > > dwarf_loader: introduce pre_load_module hook to conf_load
> > > btf_encoder: introduce elf_functions struct type
> > > btf_encoder: collect elf_functions in btf_encoder__pre_load_module
> > > btf_encoder: switch to shared elf_functions table
> > > btf_encoder: introduce btf_encoding_context
> > > pahole: faster reproducible BTF encoding
> > > 
> > > btf_encoder.c | 661 ++++++++++++++++++++++++++++++-------------------
> > > btf_encoder.h | 6 +
> > > dwarf_loader.c | 18 +-
> > > dwarves.c | 47 ++--
> > > dwarves.h | 16 +-
> > > pahole.c | 265 +++++++++-----------
> > > 6 files changed, 567 insertions(+), 446 deletions(-)
> > > 
> > > --
> > > 2.47.0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/9] btf_encoder: separate elf function, saved function representations
  2024-11-29 20:37   ` Eduard Zingerman
@ 2024-12-09 21:19     ` Ihor Solodrai
  0 siblings, 0 replies; 26+ messages in thread
From: Ihor Solodrai @ 2024-12-09 21:19 UTC (permalink / raw)
  To: Eduard Zingerman; +Cc: dwarves, acme, bpf, alan.maguire, andrii, mykolal

On Friday, November 29th, 2024 at 12:37 PM, Eduard Zingerman <eddyz87@gmail.com> wrote:

> On Thu, 2024-11-28 at 01:23 +0000, Ihor Solodrai wrote:
> 
> [...]
> 
> Acked-by: Eduard Zingerman eddyz87@gmail.com
> 
> 
> I like what this patch does, a few nits below.
> 
> Note:
> this patch leads to 58 less functions being generated,
> compared to a previous patch, for my test configuration.
> For example, functions like:
> - hid_map_usage_clear
> - jhash
> - nlmsg_parse_deprecated_strict
> Are not in the BTF anymore. It would be good if patch message could
> explain why this happens.

Hey Eduard.

I decided to remove patch 2/9 [1] for the v2 of this series, as it is
largely unrelated.

Applying 1/9 and 3/9 (this patch) to next, I couldn't reproduce the
difference you described. With reproducible_build I get identical text
dump of BTFs between next [2] and the changes [3].

It looks to me the difference you noted here is caused by introduction
of section-relative addresses in the patch 2/9.

> 
> [...]
> 
> > +static int btf_encoder__add_saved_funcs(struct btf_encoder *encoder)
> > +{
> > + struct btf_encoder_func_state **saved_fns, *s;
> > + struct btf_encoder e = NULL;
> > + int i = 0, j, nr_saved_fns = 0;
> > +
> > + / Retrieve function states from each encoder, combine them
> > + * and sort by name, addr.
> > + /
> > + btf_encoders__for_each_encoder(e) {
> > + list_for_each_entry(s, &e->func_states, node)
> > + nr_saved_fns++;
> > + }
> > + / Another thread already did this work */
> > + if (nr_saved_fns == 0) {
> > + printf("nothing to do for encoder...\n");
> > + return 0;
> > + }
> 
> 
> Nit: this function is called from pahole_threads_collect():
> 
> static int pahole_threads_collect(...)
> for (i = 0; i < nr_threads; i++)
> ...
> err = btf_encoder__add_encoder(btf_encoder, threads[i]->encoder);
> 
> ...
> 
> int32_t btf_encoder__add_encoder(struct btf_encoder *encoder, struct btf_encoder *other)
> ...
> btf_encoder__add_saved_funcs(other);
> ...
> 
> maybe move call to btf_encoder__add_saved_funcs() to pahole_threads_collect()
> outside of the loop? So that comment about another thread won't be necessary.

Done in [3].

> 
> [...]
> 
> > @@ -2437,16 +2470,8 @@ out_delete:
> > return NULL;
> > }
> > 
> > -void btf_encoder__delete_func(struct elf_function *func)
> > -{
> > - free(func->alias);
> 
> 
> Nit: it looks like func->alias is never freed after this change.

Yeap. Fixed in [3].

Sanitizer also caught leaked secinfo here, the one you noted in a
different thread [4]. Fixed in [3] as well.

[1]: https://lore.kernel.org/dwarves/20241128012341.4081072-3-ihor.solodrai@pm.me/
[2]: https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=3ddadc131586d6f3aa68775636adff5f7e7ff0f0
[3]: https://github.com/acmel/dwarves/commit/20ef1cd5131d340caaa4719e980b4da77c345579
[4]: https://lore.kernel.org/dwarves/e1df45360963d265ea5e0b3634f0a3dae0c9c343.camel@gmail.com/

> 
> > - zfree(&func->state.annots);
> > - zfree(&func->state.parms);
> > -}
> 
> 
> [...]



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 7/9] btf_encoder: switch to shared elf_functions table
  2024-11-29 22:35   ` Eduard Zingerman
@ 2024-12-09 23:55     ` Ihor Solodrai
  0 siblings, 0 replies; 26+ messages in thread
From: Ihor Solodrai @ 2024-12-09 23:55 UTC (permalink / raw)
  To: Eduard Zingerman; +Cc: dwarves, acme, bpf, alan.maguire, andrii, mykolal

On Friday, November 29th, 2024 at 2:35 PM, Eduard Zingerman <eddyz87@gmail.com> wrote:

> 
> 
> On Thu, 2024-11-28 at 01:24 +0000, Ihor Solodrai wrote:
> 
> > Do not collect functions from ELF for each new btf_encoder, and
> > instead set a pointer to a shared elf_functions table, built
> > beforehand by btf_encoder__pre_cus__load_module().
> > 
> > Do not call btf_encoder__add_saved_funcs() on every
> > btf_encoder__add_encoder(). Instead, for non-reproducible
> > multi-threaded case do that in pahole_threads_collect(), and for
> > single-threaded or reproducible_build do that right before
> > btf_encoder__encode().
> > 
> > Signed-off-by: Ihor Solodrai ihor.solodrai@pm.me
> > ---
> 
> 
> Acked-by: Eduard Zingerman eddyz87@gmail.com
> 
> 
> [...]
> 
> > diff --git a/pahole.c b/pahole.c
> > index 1f8cf4b..b5aea56 100644
> > --- a/pahole.c
> > +++ b/pahole.c
> > @@ -3185,12 +3185,16 @@ static int pahole_threads_collect(struct conf_load *conf, int nr_threads, void *
> > if (error)
> > goto out;
> > 
> > + err = btf_encoder__add_saved_funcs(btf_encoder);
> > + if (err < 0)
> > + goto out;
> > +
> > for (i = 0; i < nr_threads; i++) {
> > /*
> > * Merge content of the btf instances of worker threads to the btf
> > * instance of the primary btf_encoder.
> > */
> > - if (!threads[i]->btf)
> > + if (!threads[i]->encoder || !threads[i]->btf)
> > continue;
> 
> 
> When can 'threads[i]->encoder' be NULL? In case if worker thread exits with error?

IIRC if the number of CUs is less than number of jobs, the threads
array may contain NULL elements. I believe this was caught by testing
earlier versions of the patch.

> 
> > err = btf_encoder__add_encoder(btf_encoder, threads[i]->encoder);
> > if (err < 0)
> 
> 
> [...]

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2024-12-09 23:55 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-28  1:23 [RFC PATCH 0/9] pahole: shared ELF and faster reproducible BTF encoding Ihor Solodrai
2024-11-28  1:23 ` [RFC PATCH 1/9] btf_encoder: simplify function encoding Ihor Solodrai
2024-11-29  8:16   ` Eduard Zingerman
2024-12-02 13:55   ` Jiri Olsa
2024-11-28  1:23 ` [RFC PATCH 2/9] btf_encoder: store,use section-relative addresses in ELF function representation Ihor Solodrai
2024-11-29  9:07   ` Eduard Zingerman
2024-12-02 14:34     ` Alan Maguire
2024-11-28  1:23 ` [RFC PATCH 3/9] btf_encoder: separate elf function, saved function representations Ihor Solodrai
2024-11-29 20:37   ` Eduard Zingerman
2024-12-09 21:19     ` Ihor Solodrai
2024-11-28  1:24 ` [RFC PATCH 4/9] dwarf_loader: introduce pre_load_module hook to conf_load Ihor Solodrai
2024-11-29 21:03   ` Eduard Zingerman
2024-11-28  1:24 ` [RFC PATCH 5/9] btf_encoder: introduce elf_functions struct type Ihor Solodrai
2024-11-29 21:50   ` Eduard Zingerman
2024-11-28  1:24 ` [RFC PATCH 6/9] btf_encoder: collect elf_functions in btf_encoder__pre_load_module Ihor Solodrai
2024-11-29 22:27   ` Eduard Zingerman
2024-11-28  1:24 ` [RFC PATCH 7/9] btf_encoder: switch to shared elf_functions table Ihor Solodrai
2024-11-29 22:35   ` Eduard Zingerman
2024-12-09 23:55     ` Ihor Solodrai
2024-11-28  1:24 ` [RFC PATCH 8/9] btf_encoder: introduce btf_encoding_context Ihor Solodrai
2024-11-29 23:12   ` Eduard Zingerman
2024-11-28  1:24 ` [RFC PATCH 9/9] pahole: faster reproducible BTF encoding Ihor Solodrai
2024-11-30  0:17   ` Eduard Zingerman
2024-12-02 13:55 ` [RFC PATCH 0/9] pahole: shared ELF and " Jiri Olsa
2024-12-06 18:19   ` Ihor Solodrai
2024-12-06 18:30     ` Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox