* [PATCH dwarves] dwarf_loader: Handle DW_AT_location attrs containing DW_OP_plus_uconst
@ 2025-11-30 3:21 Yao Zi
2025-12-04 0:46 ` Yonghong Song
0 siblings, 1 reply; 4+ messages in thread
From: Yao Zi @ 2025-11-30 3:21 UTC (permalink / raw)
To: Alan Maguire; +Cc: dwarves, bpf, Yao Zi, q66
LLVM has a GlobalMerge pass, which tries to group multiple global
variables together and address them through a single register with
offsets encoded in instructions, to reduce register pressure. The address
of a symbol transformed by the pass may be represented by a DWARF
expression consisting of DW_OP_addrx and DW_OP_plus_uconst, which
naturally matches the way a merged variable is addressed.
However, our dwarf_loader currently ignores everything but the first
operation in the location expression, including the DW_OP_plus_uconst
atom, which appears as the second operation in this case. This can result
in broken BTF information produced by pahole, where several merged symbols
are given the same offset even though they don't in fact overlap.
LLVM has enabled the GlobalMerge pass for PowerPC[1] and RISC-V[2] by
default since version 20. Let's handle DW_OP_plus_uconst operations in
DW_AT_location attributes so that correct BTF can be produced for
LLVM-built kernels.
Fixes: a6ea527aab91 ("variable: Add ->addr member")
Reported-by: q66 <me@q66.moe>
Closes: https://github.com/ClangBuiltLinux/linux/issues/2089
Link: https://github.com/llvm/llvm-project/commit/aaa37d6755e6 # [1]
Link: https://github.com/llvm/llvm-project/commit/9d02264b03ea # [2]
Signed-off-by: Yao Zi <ziyao@disroot.org>
---
The problem was found by several distros building the Linux kernel with
LLVM and BTF enabled. After upgrading to LLVM 20 or later, kernels built
for RISC-V and PowerPC print errors like
[ 1.296358] BPF: type_id=4457 offset=4224 size=8
[ 1.296767] BPF:
[ 1.296919] BPF: Invalid offset
on startup, and loading any module fails with -EINVAL unless
CONFIG_MODULE_ALLOW_BTF_MISMATCH is turned on,
# insmod tun.ko
[ 12.892421] failed to validate module [tun] BTF: -22
[ 12.936971] failed to validate module [tun] BTF: -22
insmod: can't insert 'tun.ko': Invalid argument
Comparing the DWARF dump with the BTF dump shows that the BTF contains
symbols with the same offset,
type_id=4148 offset=4208 size=8 (VAR 'vector_misaligned_access')
type_id=4147 offset=4208 size=8 (VAR 'misaligned_access_speed')
while the same symbols are described with different DW_AT_location
attributes,
0x0011ade7: DW_TAG_variable
DW_AT_name ("misaligned_access_speed")
DW_AT_type (0x0011adf2 "long")
DW_AT_decl_file ("...")
DW_AT_external (true)
DW_AT_decl_line (24)
DW_AT_location (DW_OP_addrx 0x0)
...
0x0011adf6: DW_TAG_variable
DW_AT_name ("vector_misaligned_access")
DW_AT_type (0x0011adf2 "long")
DW_AT_external (true)
DW_AT_decl_file ("...")
DW_AT_decl_line (25)
DW_AT_location (DW_OP_addrx 0x0, DW_OP_plus_uconst 0x8)
For more detailed analysis and kernel config for reproducing the issue,
please refer to the Closes link. Thanks for your time and review.
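To make the symptom concrete, here is a minimal sketch of an overlap check
over (offset, size) pairs like the ones in the BTF dump above. The struct
and function names are hypothetical and this is not the kernel's
validation code. It only illustrates why two 8-byte variables recorded at
the same offset cannot both be valid, and why recording the second one 0x8
further along (as the DW_OP_plus_uconst operand in the DWARF dump suggests)
would avoid the collision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one variable inside a data section. */
struct sketch_var {
	uint32_t offset;	/* offset of the variable within the section */
	uint32_t size;		/* size of the variable in bytes */
};

/*
 * Return the index of the first entry that starts before the previous
 * entry ends, or -1 if no such entry exists. The entries are assumed to
 * be sorted by offset.
 */
int sketch_first_overlap(const struct sketch_var *vars, size_t n)
{
	assert(vars != NULL || n == 0);

	for (size_t i = 1; i < n; i++) {
		uint64_t prev_end = (uint64_t)vars[i - 1].offset +
				    vars[i - 1].size;

		if (vars[i].offset < prev_end)
			return (int)i;
	}

	return -1;
}
```

With the two entries from the dump, { 4208, 8 } and { 4208, 8 }, the
sketch reports an overlap at index 1. With { 4208, 8 } and { 4216, 8 } it
reports none. The value 4216 is simply 4208 + 0x8 and assumes that
misaligned_access_speed is the variable located at 4208.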
dwarf_loader.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/dwarf_loader.c b/dwarf_loader.c
index 79be3f516a26..635015676389 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -708,6 +708,11 @@ static enum vscope dwarf__location(Dwarf_Die *die, uint64_t *addr, struct locati
case DW_OP_addrx:
scope = VSCOPE_GLOBAL;
*addr = expr[0].number;
+
+ if (location->exprlen == 2 &&
+ expr[1].atom == DW_OP_plus_uconst)
+ addr += expr[1].number;
+
break;
case DW_OP_reg1 ... DW_OP_reg31:
case DW_OP_breg0 ... DW_OP_breg31:
--
2.51.2
* Re: [PATCH dwarves] dwarf_loader: Handle DW_AT_location attrs containing DW_OP_plus_uconst
2025-11-30 3:21 [PATCH dwarves] dwarf_loader: Handle DW_AT_location attrs containing DW_OP_plus_uconst Yao Zi
@ 2025-12-04 0:46 ` Yonghong Song
2025-12-13 8:20 ` Yao Zi
0 siblings, 1 reply; 4+ messages in thread
From: Yonghong Song @ 2025-12-04 0:46 UTC (permalink / raw)
To: Yao Zi, Alan Maguire; +Cc: dwarves, bpf, q66
On 11/29/25 7:21 PM, Yao Zi wrote:
> LLVM has a GlobalMerge pass, which tries to group multiple global
> variables together and address them with through a single register with
> offsets coded in instructions, to reduce register pressure. Address of
> symbols transformed by the pass may be represented by an DWARF
> expression consisting of DW_OP_addrx and DW_OP_plus_uconst, which
> naturally matches the way a merged variable is addressed.
>
> However, our dwarf_loader currently ignores anything but the first in
> the location expression, including the DW_OP_plus_uconst atom, which
> appears the second operation in this case. This could result in broken
> BTF information produced by pahole, where several merged symbols are
> given the same offset, even though in fact they don't overlap.
>
> LLVM has enabled MergeGlobal pass for PowerPC[1] and RISC-V[2] by
> default since version 20, let's handle DW_OP_plus_uconst operations in
> DW_AT_location attributes correctly to ensure correct BTF could be
> produced for LLVM-built kernels.
>
> Fixes: a6ea527aab91 ("variable: Add ->addr member")
> Reported-by: q66 <me@q66.moe>
> Closes: https://github.com/ClangBuiltLinux/linux/issues/2089
> Link: https://github.com/llvm/llvm-project/commit/aaa37d6755e6 # [1]
> Link: https://github.com/llvm/llvm-project/commit/9d02264b03ea # [2]
> Signed-off-by: Yao Zi <ziyao@disroot.org>
> ---
>
> The problem is found by several distros building Linux kernel with LLVM
> and BTF enabled, after upgrading to LLVM 20 or later, kernels built for
> RISC-V and PowerPC issue errors like
>
> [ 1.296358] BPF: type_id=4457 offset=4224 size=8
> [ 1.296767] BPF:
> [ 1.296919] BPF: Invalid offset
>
> on startup, and loading any modules fails with -EINVAL unless
> CONFIG_MODULE_ALLOW_BTF_MISMATCH is turned on,
>
> # insmod tun.ko
> [ 12.892421] failed to validate module [tun] BTF: -22
> [ 12.936971] failed to validate module [tun] BTF: -22
> insmod: can't insert 'tun.ko': Invalid argument
>
> By comparing DWARF dump and BTF dump, it's found BTF contains symbols
> with the same offset,
>
> type_id=4148 offset=4208 size=8 (VAR 'vector_misaligned_access')
> type_id=4147 offset=4208 size=8 (VAR 'misaligned_access_speed')
>
> while the same symbols are described with different DW_AT_location
> attributes,
>
> 0x0011ade7: DW_TAG_variable
> DW_AT_name ("misaligned_access_speed")
> DW_AT_type (0x0011adf2 "long")
> DW_AT_decl_file ("...")
> DW_AT_external (true)
> DW_AT_decl_line (24)
> DW_AT_location (DW_OP_addrx 0x0)
>
> ...
>
> 0x0011adf6: DW_TAG_variable
> DW_AT_name ("vector_misaligned_access")
> DW_AT_type (0x0011adf2 "long")
> DW_AT_external (true)
> DW_AT_decl_file ("...")
> DW_AT_decl_line (25)
> DW_AT_location (DW_OP_addrx 0x0, DW_OP_plus_uconst 0x8)
>
> For more detailed analysis and kernel config for reproducing the issue,
> please refer to the Closes link. Thanks for your time and review.
>
> dwarf_loader.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/dwarf_loader.c b/dwarf_loader.c
> index 79be3f516a26..635015676389 100644
> --- a/dwarf_loader.c
> +++ b/dwarf_loader.c
> @@ -708,6 +708,11 @@ static enum vscope dwarf__location(Dwarf_Die *die, uint64_t *addr, struct locati
> case DW_OP_addrx:
> scope = VSCOPE_GLOBAL;
> *addr = expr[0].number;
> +
> + if (location->exprlen == 2 &&
> + expr[1].atom == DW_OP_plus_uconst)
> + addr += expr[1].number;
This does not work. 'addr' is a parameter, so the new 'addr' value above is
not passed back to the caller and the change is effectively a no-op.
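A standalone sketch of the difference, outside of pahole. Both function
names are made up for illustration. The first variant mirrors the posted
hunk and only advances the local copy of the pointer. The second one
modifies the value the caller sees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Mirrors the posted hunk: pointer arithmetic on the parameter itself.
 * Only the local copy of the pointer changes, so the caller's value is
 * untouched. (For offsets that move the pointer past the end of the
 * object, merely forming the pointer is also undefined behaviour.)
 */
void add_offset_buggy(uint64_t *addr, uint64_t offset)
{
	assert(addr != NULL);
	addr += offset;		/* moves the local pointer only */
	(void)addr;		/* silence "set but not used" warnings */
}

/* Dereferences the pointer, so the caller's value is updated. */
void add_offset_fixed(uint64_t *addr, uint64_t offset)
{
	assert(addr != NULL);
	*addr += offset;
}
```

Calling add_offset_buggy(&a, 1) leaves a unchanged, while
add_offset_fixed(&a, 8) increases a by 8.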
I think we need to add an additional parameter to pass 'expr[1].number' back
to the caller, e.g.,
static enum vscope dwarf__location(Dwarf_Die *die, uint64_t *addr, uint32_t *offset, struct location *location) { ... }
and, in the hunk above,
*offset = expr[1].number;
Now the caller has the following information:
. Dereferencing addr gives the index into .debug_addr
. offset gives the offset from the address stored in .debug_addr
and the final address will be debug_addr[*addr] + offset.
> +
> break;
> case DW_OP_reg1 ... DW_OP_reg31:
> case DW_OP_breg0 ... DW_OP_breg31:
* Re: [PATCH dwarves] dwarf_loader: Handle DW_AT_location attrs containing DW_OP_plus_uconst
2025-12-04 0:46 ` Yonghong Song
@ 2025-12-13 8:20 ` Yao Zi
2025-12-17 4:12 ` Yonghong Song
0 siblings, 1 reply; 4+ messages in thread
From: Yao Zi @ 2025-12-13 8:20 UTC (permalink / raw)
To: Yonghong Song, Alan Maguire; +Cc: dwarves, bpf, q66
Hi Yonghong,
Sorry for the late reply,
On Wed, Dec 03, 2025 at 04:46:20PM -0800, Yonghong Song wrote:
>
>
> On 11/29/25 7:21 PM, Yao Zi wrote:
...
> > diff --git a/dwarf_loader.c b/dwarf_loader.c
> > index 79be3f516a26..635015676389 100644
> > --- a/dwarf_loader.c
> > +++ b/dwarf_loader.c
> > @@ -708,6 +708,11 @@ static enum vscope dwarf__location(Dwarf_Die *die, uint64_t *addr, struct locati
> > case DW_OP_addrx:
> > scope = VSCOPE_GLOBAL;
> > *addr = expr[0].number;
> > +
> > + if (location->exprlen == 2 &&
> > + expr[1].atom == DW_OP_plus_uconst)
> > + addr += expr[1].number;
>
> This does not work. 'addr' is the parameter and the above new 'addr' value won't
> pass back to caller so the above is effectively a noop.
Oops, this is a silly problem.
> I think we need to add an additional parameter to pass the 'expr[1].number' back
> to the caller, e.g.,
However, I don't think it's necessary. See my explanation below,
> static enum vscope dwarf__location(Dwarf_Die *die, uint64_t *addr, uint32_t *offset, struct location *location) { ... }
>
> and
>
> in the above
> *offset = expr[1].number.
>
> Now the caller has the following information:
> . The deference of *addr stores the index to .debug_addr
No, dwarf__location() invokes attr_location(), which calls
dwarf_getlocation() and dwarf_formaddr(). The latter already performs the
lookup in .debug_addr[1], so what is stored in *addr is already the symbol
address.
Thus I think it's enough to keep the signature and add the offset to
*addr.
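Roughly, the idea looks like the sketch below. It is not the actual
dwarf_loader code. struct sketch_op is a simplified stand-in for the
decoded operations (only the atom and number fields are modelled), and
sketch_global_addr() and the SKETCH_ constants are hypothetical names. The
opcode values are the ones assigned by DWARF 5. It assumes, as explained
above, that the number of the first operation already holds the resolved
address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values as assigned by DWARF 5; the SKETCH_ names are made up. */
enum {
	SKETCH_DW_OP_plus_uconst = 0x23,
	SKETCH_DW_OP_addrx	 = 0xa1,
};

/* Simplified stand-in for one decoded location operation. */
struct sketch_op {
	uint8_t	 atom;		/* operation code */
	uint64_t number;	/* first operand */
};

/*
 * Compute the address described by "DW_OP_addrx" optionally followed by
 * "DW_OP_plus_uconst". Returns 0 on success and -1 for any expression
 * whose first operation is not DW_OP_addrx.
 */
int sketch_global_addr(const struct sketch_op *expr, size_t exprlen,
		       uint64_t *addr)
{
	assert(expr != NULL && addr != NULL);

	if (exprlen == 0 || expr[0].atom != SKETCH_DW_OP_addrx)
		return -1;

	/* Assumed to be the already resolved symbol address. */
	*addr = expr[0].number;

	/* Note the dereference: the offset is added to the value. */
	if (exprlen == 2 && expr[1].atom == SKETCH_DW_OP_plus_uconst)
		*addr += expr[1].number;

	return 0;
}
```

The signature keeps a single output parameter, which is the point: the
caller does not need to know whether the expression carried an offset.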
> . The offset to the address in .debug_addr
> and the final address will be debug_addr[*addr] + offset.
>
> > +
> > break;
> > case DW_OP_reg1 ... DW_OP_reg31:
> > case DW_OP_breg0 ... DW_OP_breg31:
>
Thanks for your review, I'll soon send a patch with the missing pointer
dereference to addr added.
Best regards,
Yao Zi
[1]: https://github.com/sourceware-org/elfutils/blob/67199e1c974db37f2bd200dcca7d7103f42ed06e/libdw/dwarf_formaddr.c#L37-L77
* Re: [PATCH dwarves] dwarf_loader: Handle DW_AT_location attrs containing DW_OP_plus_uconst
2025-12-13 8:20 ` Yao Zi
@ 2025-12-17 4:12 ` Yonghong Song
0 siblings, 0 replies; 4+ messages in thread
From: Yonghong Song @ 2025-12-17 4:12 UTC (permalink / raw)
To: Yao Zi, Alan Maguire; +Cc: dwarves, bpf, q66
On 12/13/25 12:20 AM, Yao Zi wrote:
> Hi Yonghong,
>
> Sorry for the late reply,
>
> On Wed, Dec 03, 2025 at 04:46:20PM -0800, Yonghong Song wrote:
>>
>> On 11/29/25 7:21 PM, Yao Zi wrote:
> ...
>
>>> diff --git a/dwarf_loader.c b/dwarf_loader.c
>>> index 79be3f516a26..635015676389 100644
>>> --- a/dwarf_loader.c
>>> +++ b/dwarf_loader.c
>>> @@ -708,6 +708,11 @@ static enum vscope dwarf__location(Dwarf_Die *die, uint64_t *addr, struct locati
>>> case DW_OP_addrx:
>>> scope = VSCOPE_GLOBAL;
>>> *addr = expr[0].number;
>>> +
>>> + if (location->exprlen == 2 &&
>>> + expr[1].atom == DW_OP_plus_uconst)
>>> + addr += expr[1].number;
>> This does not work. 'addr' is the parameter and the above new 'addr' value won't
>> pass back to caller so the above is effectively a noop.
> Oops, this is a silly problem.
>
>> I think we need to add an additional parameter to pass the 'expr[1].number' back
>> to the caller, e.g.,
> However, I don't think it's necessary. See my explanation below,
>
>> static enum vscope dwarf__location(Dwarf_Die *die, uint64_t *addr, uint32_t *offset, struct location *location) { ... }
>>
>> and
>>
>> in the above
>> *offset = expr[1].number.
>>
>> Now the caller has the following information:
>> . The deference of *addr stores the index to .debug_addr
> No, dwarf__location() invokes attr_location(), which calls
> dwarf_getlocation() and dwarf_formaddr(), the latter already performs a
> lookup in .debug_addr[1], so what is stored in *addr is right the symbol
> address.
>
> Thus I think it's enough to keep the signature, but add the offset to
> *addr.
Indeed, this does make sense. Thanks for the explanation.
>
>> . The offset to the address in .debug_addr
>> and the final address will be debug_addr[*addr] + offset.
>>
>>> +
>>> break;
>>> case DW_OP_reg1 ... DW_OP_reg31:
>>> case DW_OP_breg0 ... DW_OP_breg31:
> Thanks for your review, I'll soon send a patch with the missing pointer
> dereference to addr added.
>
> Best regards,
> Yao Zi
>
> [1]: https://github.com/sourceware-org/elfutils/blob/67199e1c974db37f2bd200dcca7d7103f42ed06e/libdw/dwarf_formaddr.c#L37-L77