Dwarves debugging tools
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Alan Maguire <alan.maguire@oracle.com>
Cc: Jiri Olsa <jolsa@kernel.org>,
	Clark Williams <williams@redhat.com>,
	dwarves@vger.kernel.org,
	Arnaldo Carvalho de Melo <acme@redhat.com>
Subject: [PATCH 12/16] dwarf_loader: Support DW_TAG_imported_unit for same-file partial units
Date: Mon, 22 Jun 2026 17:24:35 -0300
Message-ID: <20260622202441.14799-13-acme@kernel.org>
In-Reply-To: <20260622202441.14799-1-acme@kernel.org>

From: Arnaldo Carvalho de Melo <acme@redhat.com>

Binaries processed by the dwz(1) tool have their DWARF type information
deduplicated into DW_TAG_partial_unit entries that are then referenced
via DW_TAG_imported_unit from each DW_TAG_compile_unit that uses those
types. This is the standard DWARF mechanism for cross-CU type sharing.

On Fedora/RHEL, most debuginfo packages are built with dwz, making this
a common pattern. For instance, bash-debuginfo has 10,486
DW_TAG_partial_unit, 384 DW_TAG_compile_unit, and 8,572
DW_TAG_imported_unit entries — all using same-file references (no .dwz
alternate DWARF file involved).

Before this patch, pahole skipped DW_TAG_partial_unit with a warning:

  WARNING: DW_TAG_partial_unit used, some types will not be considered!
           Probably this was optimized using a tool like 'dwz'
           A future version of pahole will support this.

And DW_TAG_imported_unit was silently ignored (returned NULL), causing
pahole to report "file has no dwarf type information" for binaries like
bash and glibc.

The fix adds die__process_imported_unit(), called from die__process_unit()
when encountering DW_TAG_imported_unit. It follows DW_AT_import to the
referenced DW_TAG_partial_unit DIE and processes its children inline into
the importing compile unit's type tables. This works because
dwarf_formref_die() already handles all DWARF reference forms, and each
CU maintains its own independent hash tables — so the same partial unit
can be safely imported by multiple CUs, each getting its own copy of the
types.
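The per-CU bookkeeping can be pictured with a small standalone sketch. It
does not use libdw: struct toy_cu and the toy_cu__*() helpers are
hypothetical stand-ins, and "processing the children" is reduced to
bumping a counter. It mirrors the visited-offset array that the diff
below adds (dwarf_cu__imported_unit_visited() and
dwarf_cu__mark_imported_unit()), so that each CU gets its own copy of a
partial unit's types but imports a given unit only once.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-CU state, not pahole's struct dwarf_cu. */
struct toy_cu {
	uint64_t *imported;	/* offsets of partial units already imported */
	uint32_t  nr_imported;
	uint32_t  allocated;
	uint32_t  nr_types;	/* types copied into this CU so far */
};

static bool toy_cu__visited(const struct toy_cu *cu, uint64_t offset)
{
	for (uint32_t i = 0; i < cu->nr_imported; i++)
		if (cu->imported[i] == offset)
			return true;
	return false;
}

/* Append offset, doubling the array (starting at 16 entries) when full. */
static int toy_cu__mark(struct toy_cu *cu, uint64_t offset)
{
	if (cu->nr_imported == cu->allocated) {
		uint32_t new_size = cu->allocated ? cu->allocated * 2 : 16;
		uint64_t *new_array = realloc(cu->imported, new_size * sizeof(*new_array));

		if (new_array == NULL)
			return -ENOMEM;
		cu->imported = new_array;
		cu->allocated = new_size;
	}
	cu->imported[cu->nr_imported++] = offset;
	return 0;
}

/* Import the partial unit at 'offset', which holds 'nr_types' types. */
static int toy_cu__import(struct toy_cu *cu, uint64_t offset, uint32_t nr_types)
{
	assert(cu != NULL);

	if (toy_cu__visited(cu, offset))
		return 0;	/* already imported into this CU */

	int err = toy_cu__mark(cu, offset);
	if (err)
		return err;

	cu->nr_types += nr_types;	/* stands in for processing the children */
	return 0;
}
```

Two toy CUs importing the same offset each end up with their own copy of
the types, while a second import of that offset into the same CU is a
no-op. The linear scan is a simplification that is only reasonable while
the number of imports per CU stays small.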

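Since imported units can themselves contain DW_TAG_imported_unit entries
(nested imports), a depth limit of 64 is enforced to prevent stack
overflow from pathological or corrupted DWARF. A warning is emitted once
if the limit is reached. Each CU also records the offsets of the partial
units it has already imported, so a unit imported more than once by the
same CU is processed only once.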
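As a rough illustration of the depth limit, here is a standalone sketch
that does not use libdw. struct toy_unit, toy_unit__collect() and
TOY_MAX_IMPORT_DEPTH are hypothetical names, and each toy unit imports
at most one other unit, which is a simplification. The check sits at the
point where an import would be followed, as in
die__process_imported_unit() in the diff below.

```c
#include <stdbool.h>
#include <stdio.h>

#define TOY_MAX_IMPORT_DEPTH 64

/* Hypothetical toy unit: holds some types and optionally imports one unit. */
struct toy_unit {
	const struct toy_unit *import;	/* NULL when nothing is imported */
	int nr_types;
};

/*
 * Count the types reachable from 'unit', refusing to follow an import
 * once 'import_depth' has reached the limit. Warns only the first time.
 */
static int toy_unit__collect(const struct toy_unit *unit, int import_depth)
{
	int total = unit->nr_types;

	if (unit->import == NULL)
		return total;

	if (import_depth >= TOY_MAX_IMPORT_DEPTH) {
		static bool warned;

		if (!warned) {
			fprintf(stderr,
				"WARNING: import nesting too deep (>%d), some types skipped.\n",
				TOY_MAX_IMPORT_DEPTH);
			warned = true;
		}
		return total;
	}

	return total + toy_unit__collect(unit->import, import_depth + 1);
}
```

A unit that imports itself terminates in this sketch only because of the
depth limit. In the patch itself, the per-CU record of already imported
offsets would stop such a cycle earlier.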

Some binaries (e.g. chromium-browser on Fedora 44, built with Rust
components) also have DW_TAG_imported_unit entries that reference partial
units in an alternate debug file via DW_FORM_GNU_ref_alt (the
.gnu_debugaltlink mechanism). When elfutils resolves such a reference, it
returns DIEs from the alternate file whose offsets are in a different
address space — processing these into the main CU's hash tables corrupts
type references and causes a crash during type recoding.

The same DW_FORM_GNU_ref_alt form can also appear on regular type
attributes (DW_AT_type, DW_AT_abstract_origin, DW_AT_specification,
etc.), not just on DW_TAG_imported_unit's DW_AT_import. Guard all paths
via attr_form_is_ref_alt(), which skips the reference and warns once, so
users know why some types are missing rather than getting a crash.
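The guard follows a common "skip and warn once" pattern. The sketch
below is standalone and does not use libdw: enum toy_form,
toy_form_is_ref_alt(), toy_resolve() and the toy_nr_warnings counter are
all hypothetical and only model the control flow of
attr_form_is_ref_alt() and its use in attr_type().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical form codes, not the real DW_FORM_* values. */
enum toy_form { TOY_FORM_REF4, TOY_FORM_REF_ADDR, TOY_FORM_REF_ALT };

/* Exposed only so the behaviour can be observed; not part of the patch. */
static unsigned int toy_nr_warnings;

/* Return true for the unsupported form, warning the first time only. */
static bool toy_form_is_ref_alt(enum toy_form form)
{
	if (form != TOY_FORM_REF_ALT)
		return false;

	static bool warned;

	if (!warned) {
		fprintf(stderr, "WARNING: alternate-file reference not supported, "
				"some types will not be available.\n");
		toy_nr_warnings++;
		warned = true;
	}
	return true;
}

/*
 * Resolve a reference to an offset in the current file. Unsupported
 * references yield offset 0 and 'false' instead of a foreign offset.
 */
static bool toy_resolve(enum toy_form form, uint64_t target, uint64_t *offset)
{
	if (toy_form_is_ref_alt(form)) {
		*offset = 0;
		return false;
	}
	*offset = target;
	return true;
}
```

The check runs before the reference is resolved, so an offset from the
alternate file never reaches the caller's tables.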

The korg/alt_dwarf branch had a previous attempt at this that also
handled the .dwz alternate DWARF file case (DW_FORM_GNU_ref_alt), but it
was never merged and is now 294 commits behind master. This patch takes a
simpler approach focused on the same-file case first, which covers dwz
output on Fedora/RHEL where all partial units are within the same .debug
file.

Before (bash-5.3.9-3.fc44.x86_64 debuginfo):
  $ pahole -F dwarf /usr/lib/debug/usr/bin/bash-5.3.9-3.fc44.x86_64.debug
  WARNING: DW_TAG_partial_unit used, some types will not be considered!
  pahole: /usr/lib/debug/usr/bin/bash-5.3.9-3.fc44.x86_64.debug: file has no dwarf type information

After:
  $ pahole -F dwarf /usr/lib/debug/usr/bin/bash-5.3.9-3.fc44.x86_64.debug | wc -l
  1605
  $ pahole -F dwarf -C variable /usr/lib/debug/usr/bin/bash-5.3.9-3.fc44.x86_64.debug
  struct variable {
  	char *                     name;                 /*     0     8 */
  	char *                     value;                /*     8     8 */
  	...
  	/* size: 48, cachelines: 1, members: 7 */
  };

Before (chromium-browser debuginfo, Fedora 44):
  $ pahole /usr/lib/debug/.../chromium-browser-149.0.7827.155-1.fc44.x86_64.debug
  Segmentation fault

After:
  $ pahole /usr/lib/debug/.../chromium-browser-149.0.7827.155-1.fc44.x86_64.debug
  WARNING: DW_FORM_GNU_ref_alt (dwz alternate debug file) not yet supported,
           some types will not be available.

Reported-by: Sashiko:gemini-3-1-pro-preview # Running on a local machine
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarf_loader.c | 153 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 131 insertions(+), 22 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index 9f7d2bd23359191b..7091655588cd8b4d 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -136,6 +136,9 @@ struct dwarf_cu {
 	struct dwarf_tag *last_type_lookup;
 	struct cu *cu;
 	struct dwarf_cu *type_unit;
+	Dwarf_Off *imported_units;
+	uint32_t  nr_imported_units;
+	uint32_t  allocated_imported_units;
 };
 
 static int dwarf_cu__init(struct dwarf_cu *dcu, struct cu *cu)
@@ -161,6 +164,9 @@ static int dwarf_cu__init(struct dwarf_cu *dcu, struct cu *cu)
 		INIT_HLIST_HEAD(&dcu->hash_types[i]);
 	}
 	dcu->type_unit = NULL;
+	dcu->imported_units = NULL;
+	dcu->nr_imported_units = 0;
+	dcu->allocated_imported_units = 0;
 	// To avoid a per-lookup check against NULL in dwarf_cu__find_type_by_ref()
 	dcu->last_type_lookup = &sentinel_dtag;
 	return 0;
@@ -185,6 +191,7 @@ static void dwarf_cu__delete(struct cu *cu)
 
 	struct dwarf_cu *dcu = cu->priv;
 
+	free(dcu->imported_units);
 	// dcu->hash_tags & dcu->hash_types are on cu->obstack
 	cu__free(cu, dcu);
 	cu->priv = NULL;
@@ -446,12 +453,32 @@ static const char *attr_string(Dwarf_Die *die, uint32_t name, struct conf_load *
 	return str;
 }
 
+static bool attr_form_is_ref_alt(Dwarf_Attribute *attr)
+{
+	if (attr->form == DW_FORM_GNU_ref_alt) {
+		static bool warned;
+
+		if (!warned) {
+			fprintf(stderr,
+				"WARNING: DW_FORM_GNU_ref_alt (dwz alternate debug file) not yet supported,\n"
+				"         some types will not be available.\n");
+			warned = true;
+		}
+		return true;
+	}
+	return false;
+}
+
 static bool attr_type(Dwarf_Die *die, uint32_t attr_name, Dwarf_Off *offset)
 {
 	Dwarf_Attribute attr;
 
 	if (dwarf_attr(die, attr_name, &attr) != NULL) {
 		Dwarf_Die type_die;
+		if (attr_form_is_ref_alt(&attr)) {
+			*offset = 0;
+			return 0;
+		}
 		if (dwarf_formref_die(&attr, &type_die) != NULL) {
 			*offset = dwarf_dieoffset(&type_die);
 			return attr.form == DW_FORM_ref_sig8;
@@ -679,7 +706,8 @@ static void type__init(struct type *type, Dwarf_Die *die, struct cu *cu, struct
 	Dwarf_Attribute attr;
 	if (dwarf_attr(die, DW_AT_type, &attr) != NULL) {
 		Dwarf_Die type_die;
-		if (dwarf_formref_die(&attr, &type_die) != NULL) {
+		if (!attr_form_is_ref_alt(&attr) &&
+		    dwarf_formref_die(&attr, &type_die) != NULL) {
 			uint64_t encoding = attr_numeric(&type_die, DW_AT_encoding);
 
 			if (encoding == DW_ATE_signed || encoding == DW_ATE_signed_char)
@@ -993,9 +1021,14 @@ static int add_gnu_annotation_chain(Dwarf_Die *die, int component_idx,
 	Dwarf_Attribute attr;
 	Dwarf_Die annot_die;
 
-	while (dwarf_attr(die, DW_AT_GNU_annotation, &attr) != NULL &&
-	       dwarf_formref_die(&attr, &annot_die) != NULL &&
-	       dwarf_tag(&annot_die) == DW_TAG_GNU_annotation) {
+	while (dwarf_attr(die, DW_AT_GNU_annotation, &attr) != NULL) {
+		if (attr_form_is_ref_alt(&attr))
+			break;
+		if (dwarf_formref_die(&attr, &annot_die) == NULL)
+			break;
+		if (dwarf_tag(&annot_die) != DW_TAG_GNU_annotation)
+			break;
+
 		int ret = add_tag_annotation(&annot_die, component_idx, conf, head);
 		if (ret)
 			return ret;
@@ -1791,9 +1824,13 @@ check_gnu_attr:
 		goto out;
 
 	/* Handle GCC-style DW_AT_GNU_annotation attribute */
-	while (dwarf_attr(die, DW_AT_GNU_annotation, &attr) != NULL &&
-	       dwarf_formref_die(&attr, &annot_die) != NULL &&
-	       dwarf_tag(&annot_die) == DW_TAG_GNU_annotation) {
+	while (dwarf_attr(die, DW_AT_GNU_annotation, &attr) != NULL) {
+		if (attr_form_is_ref_alt(&attr))
+			break;
+		if (dwarf_formref_die(&attr, &annot_die) == NULL)
+			break;
+		if (dwarf_tag(&annot_die) != DW_TAG_GNU_annotation)
+			break;
 		name = attr_string(&annot_die, DW_AT_name, conf);
 		if (strcmp(name, "btf_type_tag") != 0)
 			break;
@@ -2614,7 +2651,7 @@ static struct tag *__die__process_tag(Dwarf_Die *die, struct cu *cu,
 
 	switch (dwarf_tag(die)) {
 	case DW_TAG_imported_unit:
-		return NULL; // We don't support imported units yet, so to avoid segfaults
+		return &unsupported_tag; // Handled in die__process_unit()
 	case DW_TAG_array_type:
 		tag = die__create_new_array(die, cu);		break;
 	case DW_TAG_string_type: // FORTRAN stuff, looks like an array
@@ -2682,9 +2719,90 @@ static struct tag *__die__process_tag(Dwarf_Die *die, struct cu *cu,
 	return tag;
 }
 
-static int die__process_unit(Dwarf_Die *die, struct cu *cu, struct conf_load *conf)
+#define MAX_IMPORTED_UNIT_DEPTH 64
+
+static int die__process_unit(Dwarf_Die *die, struct cu *cu, struct conf_load *conf, int import_depth);
+
+static bool dwarf_cu__imported_unit_visited(struct dwarf_cu *dcu, Dwarf_Off offset)
+{
+	for (uint32_t i = 0; i < dcu->nr_imported_units; i++)
+		if (dcu->imported_units[i] == offset)
+			return true;
+	return false;
+}
+
+static int dwarf_cu__mark_imported_unit(struct dwarf_cu *dcu, struct cu *cu, Dwarf_Off offset)
+{
+	if (dcu->nr_imported_units == dcu->allocated_imported_units) {
+		uint32_t new_size = dcu->allocated_imported_units ? dcu->allocated_imported_units * 2 : 16;
+		Dwarf_Off *new_array = realloc(dcu->imported_units, new_size * sizeof(Dwarf_Off));
+		if (new_array == NULL)
+			return -ENOMEM;
+		dcu->imported_units = new_array;
+		dcu->allocated_imported_units = new_size;
+	}
+	dcu->imported_units[dcu->nr_imported_units++] = offset;
+	return 0;
+}
+
+static int die__process_imported_unit(Dwarf_Die *die, struct cu *cu, struct conf_load *conf, int import_depth)
+{
+	Dwarf_Attribute attr;
+
+	if (dwarf_attr(die, DW_AT_import, &attr) == NULL)
+		return 0;
+
+	if (attr_form_is_ref_alt(&attr))
+		return 0;
+
+	Dwarf_Die imported_die;
+
+	if (dwarf_formref_die(&attr, &imported_die) == NULL)
+		return 0;
+
+	if (dwarf_tag(&imported_die) != DW_TAG_partial_unit)
+		return 0;
+
+	if (import_depth >= MAX_IMPORTED_UNIT_DEPTH) {
+		static bool warned;
+
+		if (!warned) {
+			fprintf(stderr,
+				"WARNING: DW_TAG_imported_unit nesting too deep (>%d), "
+				"some types will not be available.\n",
+				MAX_IMPORTED_UNIT_DEPTH);
+			warned = true;
+		}
+		return 0;
+	}
+
+	Dwarf_Off offset = dwarf_dieoffset(&imported_die);
+	struct dwarf_cu *dcu = cu->priv;
+
+	if (dwarf_cu__imported_unit_visited(dcu, offset))
+		return 0;
+
+	if (dwarf_cu__mark_imported_unit(dcu, cu, offset))
+		return -ENOMEM;
+
+	Dwarf_Die child;
+
+	if (dwarf_child(&imported_die, &child) == 0)
+		return die__process_unit(&child, cu, conf, import_depth + 1);
+
+	return 0;
+}
+
+static int die__process_unit(Dwarf_Die *die, struct cu *cu, struct conf_load *conf, int import_depth)
 {
 	do {
+		if (dwarf_tag(die) == DW_TAG_imported_unit) {
+			int err = die__process_imported_unit(die, cu, conf, import_depth);
+			if (err)
+				return err;
+			continue;
+		}
+
 		struct tag *tag = die__process_tag(die, cu, 1, conf);
 		if (tag == NULL)
 			return -ENOMEM;
@@ -3305,17 +3423,8 @@ static int die__process(Dwarf_Die *die, struct cu *cu, struct conf_load *conf)
 		return 0; // so that other units can be processed
 	}
 
-	if (tag == DW_TAG_partial_unit) {
-		static bool warned;
-
-		if (!warned) {
-			fprintf(stderr, "WARNING: DW_TAG_partial_unit used, some types will not be considered!\n"
-					"         Probably this was optimized using a tool like 'dwz'\n"
-					"         A future version of pahole will support this.\n");
-			warned = true;
-		}
-		return 0; // so that other units can be processed
-	}
+	if (tag == DW_TAG_partial_unit)
+		return 0; // Processed inline when reached via DW_TAG_imported_unit
 
 	if (tag != DW_TAG_compile_unit && tag != DW_TAG_type_unit) {
 		fprintf(stderr, "%s: DW_TAG_compile_unit, DW_TAG_type_unit, DW_TAG_partial_unit or DW_TAG_skeleton_unit expected got %s (0x%x) @ %llx!\n",
@@ -3336,7 +3445,7 @@ static int die__process(Dwarf_Die *die, struct cu *cu, struct conf_load *conf)
 		return DWARF_CB_OK;
 
 	if (dwarf_child(die, &child) == 0) {
-		int err = die__process_unit(&child, cu, conf);
+		int err = die__process_unit(&child, cu, conf, 0);
 		if (err)
 			return err;
 	}
@@ -4099,7 +4208,7 @@ static int cus__merge_and_process_cu(struct cus *cus, struct conf_load *conf,
 				filtered = conf->early_cu_filter(&unmerged_cu) == NULL;
 			}
 
-			if (!filtered && die__process_unit(&child, cu, conf) != 0)
+			if (!filtered && die__process_unit(&child, cu, conf, 0) != 0)
 				goto out_abort;
 		}
 
-- 
2.54.0

