* Encoding BTF information from DWARF causes "has void type" error.
@ 2024-02-06 12:02 Sebastian Andrzej Siewior
2024-02-06 13:42 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-02-06 12:02 UTC (permalink / raw)
To: dwarves; +Cc: Thomas Gleixner
Hi,
with linux kernel v6.8-rc2-RT and the following file as testcase
----->8--------
#include <linux/spinlock.h>
#include <linux/local_lock.h>
struct per_cpu_struct {
local_lock_t lock;
unsigned int something;
};
static DEFINE_PER_CPU(struct per_cpu_struct, per_cpu_struct) = {
.lock = INIT_LOCAL_LOCK(lock),
};
DEFINE_GUARD(ll_lock, local_lock_t __percpu*,
local_lock(_T),
local_unlock(_T))
void function(void);
void function(void)
{
struct per_cpu_struct *pcs = this_cpu_ptr(&per_cpu_struct);
guard(ll_lock)(&per_cpu_struct.lock);
pcs->something++;
}
-----8<--------
compiling and running pahole afterwards:
| make kernel/pahole-tc.o && pahole -J --btf_gen_floats -j --lang_exclude=rust kernel/pahole-tc.o
| CC kernel/pahole-tc.o
| error: found variable 'per_cpu_struct' in CU 'kernel/pahole-tc.c' that has void type
This doesn't look good. If I swap the order of "lock" and "something"
within per_cpu_struct then it goes away.
The dwarf/die object it complains about has only DW_AT_abstract_origin
and DW_AT_location set.
At this point I am not sure if gcc wrongly created the dwarf information
or of pahole accidentally matches the internal local_lock_t member as
per_cpu_struct because it starts the same address.
From dumpwarf the problematic entry is:
| 0x0000085b: DW_TAG_variable
| DW_AT_abstract_origin (0x0000099e "t")
| DW_AT_location (DW_OP_addr 0x0, DW_OP_stack_value)
which looks slightly different in the good case:
| 0x00000852: DW_TAG_variable
| DW_AT_abstract_origin (0x00000980 "t")
| DW_AT_location (0x00000040:
| [0x0000000000000046, 0x000000000000005b): DW_OP_reg3 RBX
| [0x000000000000005b, 0x0000000000000061): DW_OP_addr 0x0, DW_OP_stack_value)
| DW_AT_GNU_locviews (0x0000003c)
I made an archive with c file, the compiled version in bad and good case
(swapped the first two members), the dumpwarf output and uploaded to
https://breakpoint.cc/pahole-tc.tar.xz
Sebastian
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Encoding BTF information from DWARF causes "has void type" error.
2024-02-06 12:02 Encoding BTF information from DWARF causes "has void type" error Sebastian Andrzej Siewior
@ 2024-02-06 13:42 ` Arnaldo Carvalho de Melo
2024-02-06 14:23 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 8+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-02-06 13:42 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: dwarves, Thomas Gleixner
On Tue, Feb 06, 2024 at 01:02:26PM +0100, Sebastian Andrzej Siewior wrote:
> with linux kernel v6.8-rc2-RT and the following file as testcase
>
> ----->8--------
>
> #include <linux/spinlock.h>
> #include <linux/local_lock.h>
>
> struct per_cpu_struct {
> local_lock_t lock;
> unsigned int something;
> };
>
> static DEFINE_PER_CPU(struct per_cpu_struct, per_cpu_struct) = {
> .lock = INIT_LOCAL_LOCK(lock),
> };
>
> DEFINE_GUARD(ll_lock, local_lock_t __percpu*,
> local_lock(_T),
> local_unlock(_T))
>
> void function(void);
> void function(void)
> {
> struct per_cpu_struct *pcs = this_cpu_ptr(&per_cpu_struct);
>
> guard(ll_lock)(&per_cpu_struct.lock);
> pcs->something++;
> }
>
> -----8<--------
>
> compiling and running pahole afterwards:
> | make kernel/pahole-tc.o && pahole -J --btf_gen_floats -j --lang_exclude=rust kernel/pahole-tc.o
> | CC kernel/pahole-tc.o
> | error: found variable 'per_cpu_struct' in CU 'kernel/pahole-tc.c' that has void type
>
> This doesn't look good. If I swap the order of "lock" and "something"
> within per_cpu_struct then it goes away.
> The dwarf/die object it complains about has only DW_AT_abstract_origin
> and DW_AT_location set.
> At this point I am not sure if gcc wrongly created the dwarf information
> or of pahole accidentally matches the internal local_lock_t member as
> per_cpu_struct because it starts the same address.
>
> >From dumpwarf the problematic entry is:
> | 0x0000085b: DW_TAG_variable
> | DW_AT_abstract_origin (0x0000099e "t")
> | DW_AT_location (DW_OP_addr 0x0, DW_OP_stack_value)
>
> which looks slightly different in the good case:
> | 0x00000852: DW_TAG_variable
> | DW_AT_abstract_origin (0x00000980 "t")
> | DW_AT_location (0x00000040:
> | [0x0000000000000046, 0x000000000000005b): DW_OP_reg3 RBX
> | [0x000000000000005b, 0x0000000000000061): DW_OP_addr 0x0, DW_OP_stack_value)
> | DW_AT_GNU_locviews (0x0000003c)
>
> I made an archive with c file, the compiled version in bad and good case
> (swapped the first two members), the dumpwarf output and uploaded to
> https://breakpoint.cc/pahole-tc.tar.xz
Lots of great info, I'm on it.
- Arnaldo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Encoding BTF information from DWARF causes "has void type" error.
2024-02-06 13:42 ` Arnaldo Carvalho de Melo
@ 2024-02-06 14:23 ` Arnaldo Carvalho de Melo
2024-02-06 14:40 ` Sebastian Andrzej Siewior
2024-02-07 10:01 ` Alan Maguire
0 siblings, 2 replies; 8+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-02-06 14:23 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, Jiri Olsa, Alan Maguire
Cc: dwarves, Thomas Gleixner
On Tue, Feb 06, 2024 at 10:42:44AM -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Feb 06, 2024 at 01:02:26PM +0100, Sebastian Andrzej Siewior wrote:
> > with linux kernel v6.8-rc2-RT and the following file as testcase
> >
> > ----->8--------
> >
> > #include <linux/spinlock.h>
> > #include <linux/local_lock.h>
> >
> > struct per_cpu_struct {
> > local_lock_t lock;
> > unsigned int something;
> > };
> >
> > static DEFINE_PER_CPU(struct per_cpu_struct, per_cpu_struct) = {
> > .lock = INIT_LOCAL_LOCK(lock),
> > };
> >
> > DEFINE_GUARD(ll_lock, local_lock_t __percpu*,
> > local_lock(_T),
> > local_unlock(_T))
> >
> > void function(void);
> > void function(void)
> > {
> > struct per_cpu_struct *pcs = this_cpu_ptr(&per_cpu_struct);
> >
> > guard(ll_lock)(&per_cpu_struct.lock);
> > pcs->something++;
> > }
> >
> > -----8<--------
> >
> > compiling and running pahole afterwards:
> > | make kernel/pahole-tc.o && pahole -J --btf_gen_floats -j --lang_exclude=rust kernel/pahole-tc.o
> > | CC kernel/pahole-tc.o
> > | error: found variable 'per_cpu_struct' in CU 'kernel/pahole-tc.c' that has void type
> >
> > This doesn't look good. If I swap the order of "lock" and "something"
> > within per_cpu_struct then it goes away.
> > The dwarf/die object it complains about has only DW_AT_abstract_origin
> > and DW_AT_location set.
Are you sure?
What I see:
⬢[acme@toolbox pahole]$ cp sebastian-rt-void-variable-btf/pahole-tc/pahole-tc-bad.o .
⬢[acme@toolbox pahole]$ pahole --btf_encode pahole-tc-bad.o
error: found variable 'per_cpu_struct' in CU 'kernel/pahole-tc.c' that has void type
⬢[acme@toolbox pahole]$
⬢[acme@toolbox pahole]$ readelf -wi sebastian-rt-void-variable-btf/pahole-tc/pahole-tc-bad.o | grep DW_TAG_variable -A5 | grep -w per_cpu_struct -B1 -A4
<1><764>: Abbrev Number: 29 (DW_TAG_variable)
<765> DW_AT_name : (indirect string, offset: 0x184): per_cpu_struct
<769> DW_AT_decl_file : 1
<769> DW_AT_decl_line : 9
<76a> DW_AT_decl_column : 8
<76b> DW_AT_type : <0x73c>
⬢[acme@toolbox pahole]$
⬢[acme@toolbox pahole]$ readelf -wi sebastian-rt-void-variable-btf/pahole-tc/pahole-tc-bad.o | grep '<73c>.*DW_TAG_structure_type' -A7
<1><73c>: Abbrev Number: 4 (DW_TAG_structure_type)
<73d> DW_AT_name : (indirect string, offset: 0x184): per_cpu_struct
<741> DW_AT_byte_size : 136
<742> DW_AT_decl_file : 1
<743> DW_AT_decl_line : 4
<744> DW_AT_decl_column : 8
<745> DW_AT_sibling : <0x764>
<2><749>: Abbrev Number: 1 (DW_TAG_member)
⬢[acme@toolbox pahole]$
But when using verbose mode:
[95] FUNC function type_id=94
search cu 'kernel/pahole-tc.c' for percpu global variables.
Variable 'per_cpu_struct' from CU 'kernel/pahole-tc.c' at address 0x0 encoded
[96] VAR per_cpu_struct type=90 linkage=0
Variable 'per_cpu_struct' from CU 'kernel/pahole-tc.c' at address 0x0 encoded
[97] VAR per_cpu_struct type=91 linkage=0
error: found variable 'per_cpu_struct' in CU 'kernel/pahole-tc.c' that has void type
[98] DATASEC .data..percpu size=136 vlen=2
type=96 offset=0 size=136
type=97 offset=0 size=136
⬢[acme@toolbox pahole]$ pahole -V --btf_encode pahole-tc-bad.o
It ends up considering two variables with that name?
⬢[acme@toolbox pahole]$ readelf -wi sebastian-rt-void-variable-btf/pahole-tc/pahole-tc-bad.o | grep per_cpu_struct -B1 -A5
<1><73c>: Abbrev Number: 4 (DW_TAG_structure_type)
<73d> DW_AT_name : (indirect string, offset: 0x184): per_cpu_struct
<741> DW_AT_byte_size : 136
<742> DW_AT_decl_file : 1
<743> DW_AT_decl_line : 4
<744> DW_AT_decl_column : 8
<745> DW_AT_sibling : <0x764>
--
<1><764>: Abbrev Number: 29 (DW_TAG_variable)
<765> DW_AT_name : (indirect string, offset: 0x184): per_cpu_struct
<769> DW_AT_decl_file : 1
<769> DW_AT_decl_line : 9
<76a> DW_AT_decl_column : 8
<76b> DW_AT_type : <0x73c>
<76f> DW_AT_location : 9 byte block: 3 0 0 0 0 0 0 0 0 (DW_OP_addr: 0)
⬢[acme@toolbox pahole]$
Just dumping some more info in case Jiri or Alan thinks about something.
- Arnaldo
> > At this point I am not sure if gcc wrongly created the dwarf information
> > or of pahole accidentally matches the internal local_lock_t member as
> > per_cpu_struct because it starts the same address.
> >
> > >From dumpwarf the problematic entry is:
> > | 0x0000085b: DW_TAG_variable
> > | DW_AT_abstract_origin (0x0000099e "t")
> > | DW_AT_location (DW_OP_addr 0x0, DW_OP_stack_value)
> >
> > which looks slightly different in the good case:
> > | 0x00000852: DW_TAG_variable
> > | DW_AT_abstract_origin (0x00000980 "t")
> > | DW_AT_location (0x00000040:
> > | [0x0000000000000046, 0x000000000000005b): DW_OP_reg3 RBX
> > | [0x000000000000005b, 0x0000000000000061): DW_OP_addr 0x0, DW_OP_stack_value)
> > | DW_AT_GNU_locviews (0x0000003c)
> >
> > I made an archive with c file, the compiled version in bad and good case
> > (swapped the first two members), the dumpwarf output and uploaded to
> > https://breakpoint.cc/pahole-tc.tar.xz
>
> Lots of great info, I'm on it.
>
> - Arnaldo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Encoding BTF information from DWARF causes "has void type" error.
2024-02-06 14:23 ` Arnaldo Carvalho de Melo
@ 2024-02-06 14:40 ` Sebastian Andrzej Siewior
2024-02-07 10:01 ` Alan Maguire
1 sibling, 0 replies; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-02-06 14:40 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Jiri Olsa, Alan Maguire, dwarves, Thomas Gleixner
On 2024-02-06 11:23:54 [-0300], Arnaldo Carvalho de Melo wrote:
> > > compiling and running pahole afterwards:
> > > | make kernel/pahole-tc.o && pahole -J --btf_gen_floats -j --lang_exclude=rust kernel/pahole-tc.o
> > > | CC kernel/pahole-tc.o
> > > | error: found variable 'per_cpu_struct' in CU 'kernel/pahole-tc.c' that has void type
> > >
> > > This doesn't look good. If I swap the order of "lock" and "something"
> > > within per_cpu_struct then it goes away.
> > > The dwarf/die object it complains about has only DW_AT_abstract_origin
> > > and DW_AT_location set.
>
> Are you sure?
I added to variable__new():
| for (i = 0; i < DW_AT_hi_user; i++) {
| if (dwarf_hasattr(die, i))
| fprintf(stderr, " => ATTR %x\n", i);
| }
and I get:
| => ATTR 2
| => ATTR 31
so this what I have to say in my defence ;)
> What I see:
>
> ⬢[acme@toolbox pahole]$ cp sebastian-rt-void-variable-btf/pahole-tc/pahole-tc-bad.o .
> ⬢[acme@toolbox pahole]$ pahole --btf_encode pahole-tc-bad.o
> error: found variable 'per_cpu_struct' in CU 'kernel/pahole-tc.c' that has void type
> ⬢[acme@toolbox pahole]$
>
> ⬢[acme@toolbox pahole]$ readelf -wi sebastian-rt-void-variable-btf/pahole-tc/pahole-tc-bad.o | grep DW_TAG_variable -A5 | grep -w per_cpu_struct -B1 -A4
> <1><764>: Abbrev Number: 29 (DW_TAG_variable)
> <765> DW_AT_name : (indirect string, offset: 0x184): per_cpu_struct
> <769> DW_AT_decl_file : 1
> <769> DW_AT_decl_line : 9
> <76a> DW_AT_decl_column : 8
> <76b> DW_AT_type : <0x73c>
> ⬢[acme@toolbox pahole]$
>
> ⬢[acme@toolbox pahole]$ readelf -wi sebastian-rt-void-variable-btf/pahole-tc/pahole-tc-bad.o | grep '<73c>.*DW_TAG_structure_type' -A7
> <1><73c>: Abbrev Number: 4 (DW_TAG_structure_type)
> <73d> DW_AT_name : (indirect string, offset: 0x184): per_cpu_struct
> <741> DW_AT_byte_size : 136
> <742> DW_AT_decl_file : 1
> <743> DW_AT_decl_line : 4
> <744> DW_AT_decl_column : 8
> <745> DW_AT_sibling : <0x764>
> <2><749>: Abbrev Number: 1 (DW_TAG_member)
> ⬢[acme@toolbox pahole]$
>
> But when using verbose mode:
>
> [95] FUNC function type_id=94
> search cu 'kernel/pahole-tc.c' for percpu global variables.
> Variable 'per_cpu_struct' from CU 'kernel/pahole-tc.c' at address 0x0 encoded
> [96] VAR per_cpu_struct type=90 linkage=0
> Variable 'per_cpu_struct' from CU 'kernel/pahole-tc.c' at address 0x0 encoded
> [97] VAR per_cpu_struct type=91 linkage=0
> error: found variable 'per_cpu_struct' in CU 'kernel/pahole-tc.c' that has void type
> [98] DATASEC .data..percpu size=136 vlen=2
> type=96 offset=0 size=136
> type=97 offset=0 size=136
> ⬢[acme@toolbox pahole]$ pahole -V --btf_encode pahole-tc-bad.o
>
> It ends up considering two variables with that name?
Yes. That is the odd part. The second one comes from the guard()
construct which has also a this_cpu_ptr() as part of local_lock(). After
a while I figured out that dwarf_tag->id is the same as in dwarfdump
output and this matched the "t".
> ⬢[acme@toolbox pahole]$ readelf -wi sebastian-rt-void-variable-btf/pahole-tc/pahole-tc-bad.o | grep per_cpu_struct -B1 -A5
> <1><73c>: Abbrev Number: 4 (DW_TAG_structure_type)
> <73d> DW_AT_name : (indirect string, offset: 0x184): per_cpu_struct
> <741> DW_AT_byte_size : 136
> <742> DW_AT_decl_file : 1
> <743> DW_AT_decl_line : 4
> <744> DW_AT_decl_column : 8
> <745> DW_AT_sibling : <0x764>
> --
> <1><764>: Abbrev Number: 29 (DW_TAG_variable)
> <765> DW_AT_name : (indirect string, offset: 0x184): per_cpu_struct
> <769> DW_AT_decl_file : 1
> <769> DW_AT_decl_line : 9
> <76a> DW_AT_decl_column : 8
> <76b> DW_AT_type : <0x73c>
> <76f> DW_AT_location : 9 byte block: 3 0 0 0 0 0 0 0 0 (DW_OP_addr: 0)
> ⬢[acme@toolbox pahole]$
>
> Just dumping some more info in case Jiri or Alan thinks about something.
Thank you for looking at this Arnaldo.
> - Arnaldo
Sebastian
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Encoding BTF information from DWARF causes "has void type" error.
2024-02-06 14:23 ` Arnaldo Carvalho de Melo
2024-02-06 14:40 ` Sebastian Andrzej Siewior
@ 2024-02-07 10:01 ` Alan Maguire
2024-02-07 11:03 ` Alan Maguire
1 sibling, 1 reply; 8+ messages in thread
From: Alan Maguire @ 2024-02-07 10:01 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Sebastian Andrzej Siewior, Jiri Olsa
Cc: dwarves, Thomas Gleixner
On 06/02/2024 14:23, Arnaldo Carvalho de Melo wrote:
> On Tue, Feb 06, 2024 at 10:42:44AM -0300, Arnaldo Carvalho de Melo wrote:
>> On Tue, Feb 06, 2024 at 01:02:26PM +0100, Sebastian Andrzej Siewior wrote:
>>> with linux kernel v6.8-rc2-RT and the following file as testcase
>>>
>>> ----->8--------
>>>
>>> #include <linux/spinlock.h>
>>> #include <linux/local_lock.h>
>>>
>>> struct per_cpu_struct {
>>> local_lock_t lock;
>>> unsigned int something;
>>> };
>>>
>>> static DEFINE_PER_CPU(struct per_cpu_struct, per_cpu_struct) = {
>>> .lock = INIT_LOCAL_LOCK(lock),
>>> };
>>>
>>> DEFINE_GUARD(ll_lock, local_lock_t __percpu*,
>>> local_lock(_T),
>>> local_unlock(_T))
>>>
>>> void function(void);
>>> void function(void)
>>> {
>>> struct per_cpu_struct *pcs = this_cpu_ptr(&per_cpu_struct);
>>>
>>> guard(ll_lock)(&per_cpu_struct.lock);
>>> pcs->something++;
>>> }
>>>
>>> -----8<--------
>>>
>>> compiling and running pahole afterwards:
>>> | make kernel/pahole-tc.o && pahole -J --btf_gen_floats -j --lang_exclude=rust kernel/pahole-tc.o
>>> | CC kernel/pahole-tc.o
>>> | error: found variable 'per_cpu_struct' in CU 'kernel/pahole-tc.c' that has void type
>>>
>>> This doesn't look good. If I swap the order of "lock" and "something"
>>> within per_cpu_struct then it goes away.
>>> The dwarf/die object it complains about has only DW_AT_abstract_origin
>>> and DW_AT_location set.
>
Thanks for reporting and providing all this data!
The eventual generated BTF might give us some clues, running
$ pahole -J pahole-tc-good-swap.o
$ pahole -J pahole-tc-bad.o
...and dumping raw BTF we see that in the bad case, we've got 3
variables of different types called "per_cpu_ptr":
$ bpftool btf dump file pahole-tc-bad.o |grep per_cpu_ptr
[88] STRUCT 'per_cpu_struct' size=136 vlen=2
[94] VAR 'per_cpu_struct' type_id=88, linkage=static
[95] VAR 'per_cpu_struct' type_id=89, linkage=static
[96] VAR 'per_cpu_struct' type_id=87, linkage=static
type_id=94 offset=0 size=136 (VAR 'per_cpu_pe struct')
type_id=95 offset=0 size=136 (VAR 'per_cpu_struct')
type_id=96 offset=0 size=136 (VAR 'per_cpu_struct')
The first variable [94] is of type "struct per_cpu_struct", the second is
[89] TYPEDEF 'class_ll_lock_t' type_id=87
...and the third is type id 87:
[86] TYPEDEF 'local_lock_t' type_id=73
[87] PTR '(anon)' type_id=86
which is a pointer to a 'typedef local_local_t'.
Contrast this to the good case, where only 1 is seen:
$ bpftool btf dump file pahole-tc-good-swap.o format raw |grep per_cpu
[88] STRUCT 'per_cpu_struct' size=136 vlen=2
[94] VAR 'per_cpu_struct' type_id=88, linkage=static
type_id=94 offset=0 size=136 (VAR 'per_cpu_struct')
So I think as you suspected we oddly seem to be conflating the
per_cpu_struct variable with its first member somehow, and
worse doing it twice; once for the typedef class_ll_lock_t and
again for the ptr to local_lock_t.
The DWARF looks right to me:
0x00000764: DW_TAG_variable
DW_AT_name ("per_cpu_struct")
DW_AT_decl_file
("/home/bigeasy/linux-stable-intree/kernel/pahole-tc.c")
DW_AT_decl_line (9)
DW_AT_decl_column (0x08)
DW_AT_type (0x0000073c "per_cpu_struct")
DW_AT_location (DW_OP_addr 0x0)
Putting additional debugging in the BTF encoder:
diff --git a/btf_encoder.c b/btf_encoder.c
index fd04008..73ee6f7 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -1520,6 +1520,9 @@ static int btf_encoder__encode_cu_variables(struct
btf_encoder *encoder)
uint64_t addr;
int id;
+ if (encoder->verbose)
+ printf("processing var '%s' at pos %p, tag
0x%x\n", variable__name(var), pos, pos->tag);
+
if (var->declaration && !var->spec)
continue;
...we see
search cu 'kernel/pahole-tc.c' for percpu global variables.
processing var 'current_stack_pointer' at pos 0xa14be0, tag 0x34
processing var 'this_cpu_off' at pos 0xa14cb0, tag 0x34
processing var 'per_cpu_struct' at pos 0xa201f0, tag 0x34
Variable 'per_cpu_struct' from CU 'kernel/pahole-tc.c' at address 0x0
encoded
[96] VAR per_cpu_struct type=90 linkage=0
processing var 'pcs' at pos 0xa20c20, tag 0x34
processing var '__UNIQUE_ID_guard44' at pos 0xa20cf0, tag 0x34
Variable 'per_cpu_struct' from CU 'kernel/pahole-tc.c' at address 0x0
encoded
[97] VAR per_cpu_struct type=91 linkage=0
processing var '__vpp_verify' at pos 0xa20e60, tag 0x34
processing var 'tcp_ptr__' at pos 0xa20fd0, tag 0x34
processing var '(null)' at pos 0xa21130, tag 0x34
error: found variable 'per_cpu_struct' in CU 'kernel/pahole-tc.c' that
has void type
Looking at the above, we see it's the __UNIQUE_ID prefixed variables
that cause the trouble; for each of these we wind up encoding an
instance of the per_cpu_struct variable.
We have a special case meant to handle this, but it is triggering for
the __UNIQUE_ID variables:
/* A lot of "special" DWARF variables (e.g, __UNIQUE_ID___xxx)
* have addr == 0, which is the same as, say, valid
* fixed_percpu_data per-CPU variable. To distinguish
between
* them, additionally compare DWARF and ELF symbol names. If
* DWARF doesn't provide proper name, pessimistically assume
* bad variable.
*
* Examples of such special variables are:
*
* 1. __ADDRESSABLE(sym), which are forcely emitted as
symbols.
* 2. __UNIQUE_ID(prefix), which are introduced to
generate unique ids.
* 3. __exitcall(fn), functions which are labeled as
exit calls.
*
* This is relevant only for vmlinux image, as for kernel
* modules per-CPU data section has non-zero offset so all
* per-CPU symbols have non-zero values.
*/
if (var->ip.addr == 0) {
if (!dwarf_name || strcmp(dwarf_name, name))
continue;
}
So I suspect the problem is
if (!btf_encoder__percpu_var_exists(encoder, addr, &size, &name)
...which uses the associated address to look up the name, and then goes
on to use that name to encode.
The problem is the conditional check above; I can't see why it needs to
be guarded here. We never want to do an address-based lookup for a
variable and not have it be the same variable name as we have from DWARF
- which is the one we're trying to encode, right? The following fixes
the issue for me, can you try this out if you get a chance? I might be
missing something about that check so other folks please do weigh in if
something looks broken. Thanks!
diff --git a/btf_encoder.c b/btf_encoder.c
index fd04008..aecb311 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -1559,10 +1559,8 @@ static int
btf_encoder__encode_cu_variables(struct btf_encoder *encoder)
* modules per-CPU data section has non-zero offset so all
* per-CPU symbols have non-zero values.
*/
- if (var->ip.addr == 0) {
- if (!dwarf_name || strcmp(dwarf_name, name))
- continue;
- }
+ if (!dwarf_name || strcmp(dwarf_name, name))
+ continue;
if (var->spec)
var = var->spec;
> Are you sure?
>
> What I see:
>
> ⬢[acme@toolbox pahole]$ cp sebastian-rt-void-variable-btf/pahole-tc/pahole-tc-bad.o .
> ⬢[acme@toolbox pahole]$ pahole --btf_encode pahole-tc-bad.o
> error: found variable 'per_cpu_struct' in CU 'kernel/pahole-tc.c' that has void type
> ⬢[acme@toolbox pahole]$
>
> ⬢[acme@toolbox pahole]$ readelf -wi sebastian-rt-void-variable-btf/pahole-tc/pahole-tc-bad.o | grep DW_TAG_variable -A5 | grep -w per_cpu_struct -B1 -A4
> <1><764>: Abbrev Number: 29 (DW_TAG_variable)
> <765> DW_AT_name : (indirect string, offset: 0x184): per_cpu_struct
> <769> DW_AT_decl_file : 1
> <769> DW_AT_decl_line : 9
> <76a> DW_AT_decl_column : 8
> <76b> DW_AT_type : <0x73c>
> ⬢[acme@toolbox pahole]$
>
> ⬢[acme@toolbox pahole]$ readelf -wi sebastian-rt-void-variable-btf/pahole-tc/pahole-tc-bad.o | grep '<73c>.*DW_TAG_structure_type' -A7
> <1><73c>: Abbrev Number: 4 (DW_TAG_structure_type)
> <73d> DW_AT_name : (indirect string, offset: 0x184): per_cpu_struct
> <741> DW_AT_byte_size : 136
> <742> DW_AT_decl_file : 1
> <743> DW_AT_decl_line : 4
> <744> DW_AT_decl_column : 8
> <745> DW_AT_sibling : <0x764>
> <2><749>: Abbrev Number: 1 (DW_TAG_member)
> ⬢[acme@toolbox pahole]$
>
> But when using verbose mode:
>
> [95] FUNC function type_id=94
> search cu 'kernel/pahole-tc.c' for percpu global variables.
> Variable 'per_cpu_struct' from CU 'kernel/pahole-tc.c' at address 0x0 encoded
> [96] VAR per_cpu_struct type=90 linkage=0
> Variable 'per_cpu_struct' from CU 'kernel/pahole-tc.c' at address 0x0 encoded
> [97] VAR per_cpu_struct type=91 linkage=0
> error: found variable 'per_cpu_struct' in CU 'kernel/pahole-tc.c' that has void type
> [98] DATASEC .data..percpu size=136 vlen=2
> type=96 offset=0 size=136
> type=97 offset=0 size=136
> ⬢[acme@toolbox pahole]$ pahole -V --btf_encode pahole-tc-bad.o
>
> It ends up considering two variables with that name?
>
> ⬢[acme@toolbox pahole]$ readelf -wi sebastian-rt-void-variable-btf/pahole-tc/pahole-tc-bad.o | grep per_cpu_struct -B1 -A5
> <1><73c>: Abbrev Number: 4 (DW_TAG_structure_type)
> <73d> DW_AT_name : (indirect string, offset: 0x184): per_cpu_struct
> <741> DW_AT_byte_size : 136
> <742> DW_AT_decl_file : 1
> <743> DW_AT_decl_line : 4
> <744> DW_AT_decl_column : 8
> <745> DW_AT_sibling : <0x764>
> --
> <1><764>: Abbrev Number: 29 (DW_TAG_variable)
> <765> DW_AT_name : (indirect string, offset: 0x184): per_cpu_struct
> <769> DW_AT_decl_file : 1
> <769> DW_AT_decl_line : 9
> <76a> DW_AT_decl_column : 8
> <76b> DW_AT_type : <0x73c>
> <76f> DW_AT_location : 9 byte block: 3 0 0 0 0 0 0 0 0 (DW_OP_addr: 0)
> ⬢[acme@toolbox pahole]$
>
> Just dumping some more info in case Jiri or Alan thinks about something.
>
> - Arnaldo
>
>>> At this point I am not sure if gcc wrongly created the dwarf information
>>> or of pahole accidentally matches the internal local_lock_t member as
>>> per_cpu_struct because it starts the same address.
>>>
>>> >From dumpwarf the problematic entry is:
>>> | 0x0000085b: DW_TAG_variable
>>> | DW_AT_abstract_origin (0x0000099e "t")
>>> | DW_AT_location (DW_OP_addr 0x0, DW_OP_stack_value)
>>>
>>> which looks slightly different in the good case:
>>> | 0x00000852: DW_TAG_variable
>>> | DW_AT_abstract_origin (0x00000980 "t")
>>> | DW_AT_location (0x00000040:
>>> | [0x0000000000000046, 0x000000000000005b): DW_OP_reg3 RBX
>>> | [0x000000000000005b, 0x0000000000000061): DW_OP_addr 0x0, DW_OP_stack_value)
>>> | DW_AT_GNU_locviews (0x0000003c)
>>>
>>> I made an archive with c file, the compiled version in bad and good case
>>> (swapped the first two members), the dumpwarf output and uploaded to
>>> https://breakpoint.cc/pahole-tc.tar.xz
>>
>> Lots of great info, I'm on it.
>>
>> - Arnaldo
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Encoding BTF information from DWARF causes "has void type" error.
2024-02-07 10:01 ` Alan Maguire
@ 2024-02-07 11:03 ` Alan Maguire
2024-02-07 12:30 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 8+ messages in thread
From: Alan Maguire @ 2024-02-07 11:03 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Sebastian Andrzej Siewior, Jiri Olsa
Cc: dwarves, Thomas Gleixner
On 07/02/2024 10:01, Alan Maguire wrote:
> The problem is the conditional check above; I can't see why it needs to
> be guarded here. We never want to do an address-based lookup for a
> variable and not have it be the same variable name as we have from DWARF
> - which is the one we're trying to encode, right? The following fixes
> the issue for me, can you try this out if you get a chance? I might be
> missing something about that check so other folks please do weigh in if
> something looks broken. Thanks!
>
apologies, but my suggested fix is wrong I believe. When tested with
vmlinux BTF generation, around 100 per-cpu variables disappeared.
I believe the fix should instead be (deliberately leaving in
verbose output to help debug future issues):
diff --git a/btf_encoder.c b/btf_encoder.c
index fd04008..ec081ce 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -1527,6 +1527,10 @@ static int
btf_encoder__encode_cu_variables(struct btf_encoder *encoder)
if (variable__scope(var) != VSCOPE_GLOBAL && !var->spec)
continue;
+ if (encoder->verbose)
+ printf("processing var '%s' at addr 0x%lx\n",
+ variable__name(var), var->ip.addr);
+
/* addr has to be recorded before we follow spec */
addr = var->ip.addr;
dwarf_name = variable__name(var);
@@ -1559,7 +1563,7 @@ static int btf_encoder__encode_cu_variables(struct
btf_encoder *encoder)
* modules per-CPU data section has non-zero offset so all
* per-CPU symbols have non-zero values.
*/
- if (var->ip.addr == 0) {
+ if (addr == 0) {
if (!dwarf_name || strcmp(dwarf_name, name))
continue;
}
This works for both vmlinux generation and the problem you ran into.
The explanation is that prior to this, we get an adjusted value for
"addr" to do per-cpu variable lookup, where we subtract the per-cpu base
address. We should also use that adjusted address to check for a zero
address in the test to see if we need to use names to resolve per-cpu
variable identity. The problem was the __UNIQUE_ID variable had a
non-zero var->ip.addr value but its value relative to the per-cpu base
was 0. With the adjusted value, we do the name matching and skip the
encoding as intended I believe.
Alan
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Encoding BTF information from DWARF causes "has void type" error.
2024-02-07 11:03 ` Alan Maguire
@ 2024-02-07 12:30 ` Sebastian Andrzej Siewior
2024-02-13 18:33 ` Alan Maguire
0 siblings, 1 reply; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-02-07 12:30 UTC (permalink / raw)
To: Alan Maguire
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, dwarves, Thomas Gleixner
On 2024-02-07 11:03:15 [+0000], Alan Maguire wrote:
> On 07/02/2024 10:01, Alan Maguire wrote:
> > The problem is the conditional check above; I can't see why it needs to
> > be guarded here. We never want to do an address-based lookup for a
> > variable and not have it be the same variable name as we have from DWARF
> > - which is the one we're trying to encode, right? The following fixes
> > the issue for me, can you try this out if you get a chance? I might be
> > missing something about that check so other folks please do weigh in if
> > something looks broken. Thanks!
So the former patch worked.
> apologies, but my suggested fix is wrong I believe. When tested with
> vmlinux BTF generation, around 100 per-cpu variables disappeared.
> I believe the fix should instead be (deliberately leaving in
> verbose output to help debug future issues):
no worries.
…
> This works for both vmlinux generation and the problem you ran into.
the .o file was a reduced testcase. It works with the "complete" file
here, too. However once that file is merged into vmlinux.o I get the error
again:
| $ pahole -J --btf_gen_floats -j --lang_exclude=rust pahole-tc2/skbuff.o
| $ pahole -J --btf_gen_floats -j --lang_exclude=rust pahole-tc2/vmlinux.o --btf_encode_force
| error: found variable 'napi_alloc_cache' in CU 'net/core/skbuff.c' that has void type
| error: found variable 'napi_alloc_cache' in CU 'net/core/skbuff.c' that has void type
| error: found variable 'napi_alloc_cache' in CU 'net/core/skbuff.c' that has void type
| error: found variable 'napi_alloc_cache' in CU 'net/core/skbuff.c' that has void type
The -V output is huge and I don't really know what to look at. The
skbuff.o skipped __UNIQUE_ID_guard826 and a few "NULLs" and the address
for __UNIQUE_ID_guard826 was 0. In the vmlinux.o the address for
__UNIQUE_ID_guard826 is 19440.
I uploaded the two .o at
https://breakpoint.cc/pahole-tc2.tar.xz
it decompresses to 1.1GiB.
> The explanation is that prior to this, we get an adjusted value for
> "addr" to do per-cpu variable lookup, where we subtract the per-cpu base
> address. We should also use that adjusted address to check for a zero
> address in the test to see if we need to use names to resolve per-cpu
> variable identity. The problem was the __UNIQUE_ID variable had a
> non-zero var->ip.addr value but its value relative to the per-cpu base
> was 0. With the adjusted value, we do the name matching and skip the
> encoding as intended I believe.
>
> Alan
Sebastian
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Encoding BTF information from DWARF causes "has void type" error.
2024-02-07 12:30 ` Sebastian Andrzej Siewior
@ 2024-02-13 18:33 ` Alan Maguire
0 siblings, 0 replies; 8+ messages in thread
From: Alan Maguire @ 2024-02-13 18:33 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, dwarves, Thomas Gleixner
On 07/02/2024 12:30, Sebastian Andrzej Siewior wrote:
> On 2024-02-07 11:03:15 [+0000], Alan Maguire wrote:
>> On 07/02/2024 10:01, Alan Maguire wrote:
>>> The problem is the conditional check above; I can't see why it needs to
>>> be guarded here. We never want to do an address-based lookup for a
>>> variable and not have it be the same variable name as we have from DWARF
>>> - which is the one we're trying to encode, right? The following fixes
>>> the issue for me, can you try this out if you get a chance? I might be
>>> missing something about that check so other folks please do weigh in if
>>> something looks broken. Thanks!
>
> So the former patch worked.
>
>> apologies, but my suggested fix is wrong I believe. When tested with
>> vmlinux BTF generation, around 100 per-cpu variables disappeared.
>> I believe the fix should instead be (deliberately leaving in
>> verbose output to help debug future issues):
>
> no worries.
>
> …
>> This works for both vmlinux generation and the problem you ran into.
>
> the .o file was a reduced testcase. It works with the "complete" file
> here, too. However once that file is merged into vmlinux.o I get the error
> again:
> | $ pahole -J --btf_gen_floats -j --lang_exclude=rust pahole-tc2/skbuff.o
> | $ pahole -J --btf_gen_floats -j --lang_exclude=rust pahole-tc2/vmlinux.o --btf_encode_force
> | error: found variable 'napi_alloc_cache' in CU 'net/core/skbuff.c' that has void type
> | error: found variable 'napi_alloc_cache' in CU 'net/core/skbuff.c' that has void type
> | error: found variable 'napi_alloc_cache' in CU 'net/core/skbuff.c' that has void type
> | error: found variable 'napi_alloc_cache' in CU 'net/core/skbuff.c' that has void type
>
> The -V output is huge and I don't really know what to look at. The
> skbuff.o skipped __UNIQUE_ID_guard826 and a few "NULLs" and the address
> for __UNIQUE_ID_guard826 was 0. In the vmlinux.o the address for
> __UNIQUE_ID_guard826 is 19440.
>
Thanks for trying this out! From what I can see, when the individual
objects are linked we no longer have the offset 0 that triggers the name
comparison. On the other side though, if we are not selective in doing
these name comparisons, we lose encoding of a fair few legitimate
per-cpu variables in vmlinux BTF. So the question is how to thread the
needle in ensuring that the matches are accurate for cases like the
above while also ensuring we don't leave any legitimate variables out.
I don't have a good answer to that yet I'm afraid.
I think the problem of missing variables when the check is unselective
_may_ boil down to cases where the DWARF name is NULL (due to it using
an abstract origin reference); in such cases we cannot name-match and
such variables would be skipped. We have some code in dwarf_loader.c
that attempts to merge abstract origin references and their referents
such that the name is filled in for the abstract origin-based variables,
but it may be that we are missing a corner case somewhere. I'll dig
into this further on Friday, but if anyone else has time to do some
analysis here that would be great! Thanks!
Alan
> I uploaded the two .o at
> https://breakpoint.cc/pahole-tc2.tar.xz
>
> it decompresses to 1.1GiB.
>
>> The explanation is that prior to this, we get an adjusted value for
>> "addr" to do per-cpu variable lookup, where we subtract the per-cpu base
>> address. We should also use that adjusted address to check for a zero
>> address in the test to see if we need to use names to resolve per-cpu
>> variable identity. The problem was the __UNIQUE_ID variable had a
>> non-zero var->ip.addr value but its value relative to the per-cpu base
>> was 0. With the adjusted value, we do the name matching and skip the
>> encoding as intended I believe.
>>
>> Alan
>
> Sebastian
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-02-13 18:33 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-06 12:02 Encoding BTF information from DWARF causes "has void type" error Sebastian Andrzej Siewior
2024-02-06 13:42 ` Arnaldo Carvalho de Melo
2024-02-06 14:23 ` Arnaldo Carvalho de Melo
2024-02-06 14:40 ` Sebastian Andrzej Siewior
2024-02-07 10:01 ` Alan Maguire
2024-02-07 11:03 ` Alan Maguire
2024-02-07 12:30 ` Sebastian Andrzej Siewior
2024-02-13 18:33 ` Alan Maguire
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.