Re: [REGRESSION] module BTF validation failure (Error -22) on next

public inbox for dwarves@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [REGRESSION] module BTF validation failure (Error -22) on next
       [not found]     ` <Z10MkXtzyY9RDqSp@pop-os.localdomain>
@ 2024-12-14 12:15       ` Alan Maguire
  2024-12-16 15:19         ` Alan Maguire
  0 siblings, 1 reply; 6+ messages in thread
From: Alan Maguire @ 2024-12-14 12:15 UTC (permalink / raw)
  To: Cong Wang, Uros Bizjak
  Cc: Laura Nao, bpf, chrome-platform, kernel, linux-kernel,
	regressions, Stephen Brennan, dwarves@vger.kernel.org

On 14/12/2024 04:41, Cong Wang wrote:
> On Thu, Dec 05, 2024 at 08:36:33AM +0100, Uros Bizjak wrote:
>> On Wed, Dec 4, 2024 at 4:52 PM Laura Nao <laura.nao@collabora.com> wrote:
>>>
>>> On 11/15/24 18:17, Laura Nao wrote:
>>>> I managed to reproduce the issue locally and I've uploaded the vmlinux[1]
>>>> (stripped of DWARF data) and vmlinux.raw[2] files, as well as one of the
>>>> modules[3] and its btf data[4] extracted with:
>>>>
>>>> bpftool -B vmlinux btf dump file cros_kbd_led_backlight.ko > cros_kbd_led_backlight.ko.raw
>>>>
>>>> Looking again at the logs[5], I've noticed the following is reported:
>>>>
>>>> [    0.415885] BPF:    type_id=115803 offset=177920 size=1152
>>>> [    0.416029] BPF:
>>>> [    0.416083] BPF: Invalid offset
>>>> [    0.416165] BPF:
>>>>
>>>> There are two different definitions of rcu_data in '.data..percpu', one
>>>> is a struct and the other is an integer:
>>>>
>>>> type_id=115801 offset=177920 size=1152 (VAR 'rcu_data')
>>>> type_id=115803 offset=177920 size=1152 (VAR 'rcu_data')
>>>>
>>>> [115801] VAR 'rcu_data' type_id=115572, linkage=static
>>>> [115803] VAR 'rcu_data' type_id=1, linkage=static
>>>>
>>>> [115572] STRUCT 'rcu_data' size=1152 vlen=69
>>>> [1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
>>>>
>>>> I assume that's not expected, correct?
>>>>
>>>> I'll dig a bit deeper and report back if I can find anything else.
>>>
>>> I ran a bisection, and it appears the culprit commit is:
>>> https://lore.kernel.org/all/20241021080856.48746-2-ubizjak@gmail.com/
>>>
>>> Hi Uros, do you have any suggestions or insights on resolving this issue?
>>
>> There is a stray ";" at the end of the #define, perhaps this makes a difference:
>>
>> +#define PERCPU_PTR(__p) \
>> + (typeof(*(__p)) __force __kernel *)(__p);
>> +
>>
>> and SHIFT_PERCPU_PTR macro now expands to:
>>
>> RELOC_HIDE((typeof(*(p)) __force __kernel *)(p);, (offset))
>>
>> A follow-up patch in the series changes PERCPU_PTR macro to:
>>
>> #define PERCPU_PTR(__p) \
>> ({ \
>> unsigned long __pcpu_ptr = (__force unsigned long)(__p); \
>> (typeof(*(__p)) __force __kernel *)(__pcpu_ptr); \
>> })
>>
>> so this should again correctly cast the value.
> 
> Hm, I saw a similar bug but with pahole 1.28. My kernel complains about
> BTF invalid offset:
> 
> [    7.785788] BPF: 	 type_id=2394 offset=0 size=1
> [    7.786411] BPF:
> [    7.786703] BPF: Invalid offset
> [    7.787119] BPF:
> 
> Dumping the vmlinux (there is no module invovled), I saw it is related to
> percpu pointer too:
> 
> [2394] VAR '__pcpu_unique_cpu_hw_events' type_id=2, linkage=global
> ...
> [163643] DATASEC '.data..percpu' size=2123280 vlen=808
>         type_id=2393 offset=0 size=1 (VAR '__pcpu_scope_cpu_hw_events')
>         type_id=2394 offset=0 size=1 (VAR '__pcpu_unique_cpu_hw_events')
> ...
> 
> I compiled and installed the latest pahole from its git repo:
> 
> $ pahole --version
> v1.28
> 
> Thanks.

Thanks for the report! Looking at percpu-defs.h it looks like the
existence of such variables requires either

#if defined(ARCH_NEEDS_WEAK_PER_CPU) ||
defined(CONFIG_DEBUG_FORCE_WEAK_PER_CPU)

...

#define DEFINE_PER_CPU_SECTION(type, name, sec)                         \
        __PCPU_DUMMY_ATTRS char __pcpu_scope_##name;                    \
        extern __PCPU_DUMMY_ATTRS char __pcpu_unique_##name;            \
        __PCPU_DUMMY_ATTRS char __pcpu_unique_##name;                   \
        extern __PCPU_ATTRS(sec) __typeof__(type) name;                 \
        __PCPU_ATTRS(sec) __weak __typeof__(type) name


I'm guessing your .config has CONFIG_DEBUG_FORCE_WEAK_PER_CPU, or are
you building on s390/alpha?

I've reproduced this on bpf-next with CONFIG_DEBUG_FORCE_WEAK_PER_CPU=y,
pahole v1.28 and gcc-12; I see ~900 __pcpu_ variables and get the same
BTF errors since multipe __pcpu_ vars share the offset 0.

A simple workaround in dwarves - and I verified this resolved the issue
for me - would be

diff --git a/btf_encoder.c b/btf_encoder.c
index 3754884..4a1799a 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -2174,7 +2174,8 @@ static bool filter_variable_name(const char *name)
                X("__UNIQUE_ID"),
                X("__tpstrtab_"),
                X("__exitcall_"),
-               X("__func_stack_frame_non_standard_")
+               X("__func_stack_frame_non_standard_"),
+               X("__pcpu_")
                #undef X
        };
        int i;

...but I'd like us to understand further why variables which were
supposed to be in a .discard section end up being encoded as there may
be other problems lurking here aside from this one. More soon hopefully...

Alan

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [REGRESSION] module BTF validation failure (Error -22) on next
  2024-12-14 12:15       ` [REGRESSION] module BTF validation failure (Error -22) on next Alan Maguire
@ 2024-12-16 15:19         ` Alan Maguire
  2024-12-16 21:28           ` Stephen Brennan
                             ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Alan Maguire @ 2024-12-16 15:19 UTC (permalink / raw)
  To: Cong Wang, Uros Bizjak
  Cc: Laura Nao, bpf, chrome-platform, kernel, linux-kernel,
	regressions, Stephen Brennan, dwarves@vger.kernel.org

On 14/12/2024 12:15, Alan Maguire wrote:
> On 14/12/2024 04:41, Cong Wang wrote:
>> On Thu, Dec 05, 2024 at 08:36:33AM +0100, Uros Bizjak wrote:
>>> On Wed, Dec 4, 2024 at 4:52 PM Laura Nao <laura.nao@collabora.com> wrote:
>>>>
>>>> On 11/15/24 18:17, Laura Nao wrote:
>>>>> I managed to reproduce the issue locally and I've uploaded the vmlinux[1]
>>>>> (stripped of DWARF data) and vmlinux.raw[2] files, as well as one of the
>>>>> modules[3] and its btf data[4] extracted with:
>>>>>
>>>>> bpftool -B vmlinux btf dump file cros_kbd_led_backlight.ko > cros_kbd_led_backlight.ko.raw
>>>>>
>>>>> Looking again at the logs[5], I've noticed the following is reported:
>>>>>
>>>>> [    0.415885] BPF:    type_id=115803 offset=177920 size=1152
>>>>> [    0.416029] BPF:
>>>>> [    0.416083] BPF: Invalid offset
>>>>> [    0.416165] BPF:
>>>>>
>>>>> There are two different definitions of rcu_data in '.data..percpu', one
>>>>> is a struct and the other is an integer:
>>>>>
>>>>> type_id=115801 offset=177920 size=1152 (VAR 'rcu_data')
>>>>> type_id=115803 offset=177920 size=1152 (VAR 'rcu_data')
>>>>>
>>>>> [115801] VAR 'rcu_data' type_id=115572, linkage=static
>>>>> [115803] VAR 'rcu_data' type_id=1, linkage=static
>>>>>
>>>>> [115572] STRUCT 'rcu_data' size=1152 vlen=69
>>>>> [1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
>>>>>
>>>>> I assume that's not expected, correct?
>>>>>
>>>>> I'll dig a bit deeper and report back if I can find anything else.
>>>>
>>>> I ran a bisection, and it appears the culprit commit is:
>>>> https://lore.kernel.org/all/20241021080856.48746-2-ubizjak@gmail.com/
>>>>
>>>> Hi Uros, do you have any suggestions or insights on resolving this issue?
>>>
>>> There is a stray ";" at the end of the #define, perhaps this makes a difference:
>>>
>>> +#define PERCPU_PTR(__p) \
>>> + (typeof(*(__p)) __force __kernel *)(__p);
>>> +
>>>
>>> and SHIFT_PERCPU_PTR macro now expands to:
>>>
>>> RELOC_HIDE((typeof(*(p)) __force __kernel *)(p);, (offset))
>>>
>>> A follow-up patch in the series changes PERCPU_PTR macro to:
>>>
>>> #define PERCPU_PTR(__p) \
>>> ({ \
>>> unsigned long __pcpu_ptr = (__force unsigned long)(__p); \
>>> (typeof(*(__p)) __force __kernel *)(__pcpu_ptr); \
>>> })
>>>
>>> so this should again correctly cast the value.
>>
>> Hm, I saw a similar bug but with pahole 1.28. My kernel complains about
>> BTF invalid offset:
>>
>> [    7.785788] BPF: 	 type_id=2394 offset=0 size=1
>> [    7.786411] BPF:
>> [    7.786703] BPF: Invalid offset
>> [    7.787119] BPF:
>>
>> Dumping the vmlinux (there is no module invovled), I saw it is related to
>> percpu pointer too:
>>
>> [2394] VAR '__pcpu_unique_cpu_hw_events' type_id=2, linkage=global
>> ...
>> [163643] DATASEC '.data..percpu' size=2123280 vlen=808
>>         type_id=2393 offset=0 size=1 (VAR '__pcpu_scope_cpu_hw_events')
>>         type_id=2394 offset=0 size=1 (VAR '__pcpu_unique_cpu_hw_events')
>> ...
>>
>> I compiled and installed the latest pahole from its git repo:
>>
>> $ pahole --version
>> v1.28
>>
>> Thanks.
> 
> Thanks for the report! Looking at percpu-defs.h it looks like the
> existence of such variables requires either
> 
> #if defined(ARCH_NEEDS_WEAK_PER_CPU) ||
> defined(CONFIG_DEBUG_FORCE_WEAK_PER_CPU)
> 
> ...
> 
> #define DEFINE_PER_CPU_SECTION(type, name, sec)                         \
>         __PCPU_DUMMY_ATTRS char __pcpu_scope_##name;                    \
>         extern __PCPU_DUMMY_ATTRS char __pcpu_unique_##name;            \
>         __PCPU_DUMMY_ATTRS char __pcpu_unique_##name;                   \
>         extern __PCPU_ATTRS(sec) __typeof__(type) name;                 \
>         __PCPU_ATTRS(sec) __weak __typeof__(type) name
> 
> 
> I'm guessing your .config has CONFIG_DEBUG_FORCE_WEAK_PER_CPU, or are
> you building on s390/alpha?
> 
> I've reproduced this on bpf-next with CONFIG_DEBUG_FORCE_WEAK_PER_CPU=y,
> pahole v1.28 and gcc-12; I see ~900 __pcpu_ variables and get the same
> BTF errors since multipe __pcpu_ vars share the offset 0.
> 
> A simple workaround in dwarves - and I verified this resolved the issue
> for me - would be
> 
> diff --git a/btf_encoder.c b/btf_encoder.c
> index 3754884..4a1799a 100644
> --- a/btf_encoder.c
> +++ b/btf_encoder.c
> @@ -2174,7 +2174,8 @@ static bool filter_variable_name(const char *name)
>                 X("__UNIQUE_ID"),
>                 X("__tpstrtab_"),
>                 X("__exitcall_"),
> -               X("__func_stack_frame_non_standard_")
> +               X("__func_stack_frame_non_standard_"),
> +               X("__pcpu_")
>                 #undef X
>         };
>         int i;
> 
> ...but I'd like us to understand further why variables which were
> supposed to be in a .discard section end up being encoded as there may
> be other problems lurking here aside from this one. More soon hopefully...
>


A bit more context here - variable encoding takes the address of the
variable from DWARF to locate the associated ELF section. Because we
insist on having a variable specification - with a location - this
usually works fine. However the problem is that because these dummy
__pcpu_ variables specify a .discard section, their addresses are 0, so
we get for example:

 <1><1e535>: Abbrev Number: 114 (DW_TAG_variable)
    <1e536>   DW_AT_name        : (indirect string, offset: 0x5e97):
__pcpu_unique_kstack_offset
    <1e53a>   DW_AT_decl_file   : 1
    <1e53b>   DW_AT_decl_line   : 823
    <1e53d>   DW_AT_decl_column : 1
    <1e53e>   DW_AT_type        : <0x57>
    <1e542>   DW_AT_external    : 1
    <1e542>   DW_AT_declaration : 1
 <1><1e542>: Abbrev Number: 156 (DW_TAG_variable)
    <1e544>   DW_AT_specification: <0x1e535>
    <1e548>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0
(DW_OP_addr: 0)


You can see the same thing for a simple program like this:

#include <stdio.h>

#define SEC(name) __attribute__((section(name)))

SEC("/DISCARD/") int d1;
extern int d1;
SEC("/DISCARD/") int d2;
extern int d2;

int main(int argc, char *argv[])
{
	return 0;
}


If you compile it with -g, the DWARF shows that d1 and d2 both have
address 0:

 <1><72>: Abbrev Number: 5 (DW_TAG_variable)
    <73>   DW_AT_name        : d1
    <76>   DW_AT_decl_file   : 1
    <77>   DW_AT_decl_line   : 5
    <78>   DW_AT_decl_column : 22
    <79>   DW_AT_type        : <0x57>
    <7d>   DW_AT_external    : 1
    <7d>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0
(DW_OP_addr: 0)
 <1><87>: Abbrev Number: 5 (DW_TAG_variable)
    <88>   DW_AT_name        : d2
    <8b>   DW_AT_decl_file   : 1
    <8c>   DW_AT_decl_line   : 7
    <8d>   DW_AT_decl_column : 22
    <8e>   DW_AT_type        : <0x57>
    <92>   DW_AT_external    : 1
    <92>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0
(DW_OP_addr: 0)


So the reason this happens for dwarves v1.28 in particular is - as I
understand it - we moved away from recording ELF section information for
each variable and matching that with DWARF info, instead relying on the
address to locate the associated ELF section. In cases like the above
the address information unfortunately leads us astray.

Seems like there's a few approaches we can take in fixing this:

1. designate "__pcpu_" prefix as a variable prefix to filter out. This
resolves the immediate problem but is too narrowly focused IMO and we
may end up playing whack-a-mole with other dummy variable prefixes.
2. resurrect ELF section variable information fully; i.e. record a list
of variables per ELF section (or at least per ELF section we care
about). If variable is not on the list for the ELF section, do not
encode it.
3. midway between the two; for the 0 address case specifically, verify
that the variable name really _is_ in the associated ELF section. No
need to create a local ELF table variable representation, we could just
walk the table in the case of the 0 addresses.

Diff for approach 3 is as follows

diff --git a/btf_encoder.c b/btf_encoder.c
index 3754884..21a0ab6 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -2189,6 +2189,26 @@ static bool filter_variable_name(const char *name)
        return false;
 }

+bool variable_in_sec(struct btf_encoder *encoder, const char *name,
size_t shndx)
+{
+       uint32_t sym_sec_idx;
+       uint32_t core_id;
+       GElf_Sym sym;
+
+       elf_symtab__for_each_symbol_index(encoder->symtab, core_id, sym,
sym_sec_idx) {
+               const char *sym_name;
+
+               if (sym_sec_idx != shndx || elf_sym__type(&sym) !=
STT_OBJECT)
+                       continue;
+               sym_name = elf_sym__name(&sym, encoder->symtab);
+               if (!sym_name)
+                       continue;
+               if (strcmp(name, sym_name) == 0)
+                       return true;
+       }
+       return false;
+}
+
 static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder)
 {
        struct cu *cu = encoder->cu;
@@ -2258,6 +2278,11 @@ static int
btf_encoder__encode_cu_variables(struct btf_encoder *encoder)
                if (filter_variable_name(name))
                        continue;

+               /* A 0 address may be in a .discard section; ensure the
+                * variable really is in this section by checking ELF
symtab.
+                */
+               if (addr == 0 && !variable_in_sec(encoder, name, shndx))
+                       continue;
                /* Check for invalid BTF names */
                if (!btf_name_valid(name)) {
                        dump_invalid_symbol("Found invalid variable name
when encoding btf",


...so slightly more complex than option 1, but a bit more general in its
applicability to .discard section variables.

For the pahole folks, what do we think? Which option (or indeed other
ones I haven't thought of) makes sense for a fix for this? Thanks!

Alan

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [REGRESSION] module BTF validation failure (Error -22) on next
  2024-12-16 15:19         ` Alan Maguire
@ 2024-12-16 21:28           ` Stephen Brennan
  2024-12-17  8:02           ` Jiri Olsa
  2025-01-15 17:38           ` Cong Wang
  2 siblings, 0 replies; 6+ messages in thread
From: Stephen Brennan @ 2024-12-16 21:28 UTC (permalink / raw)
  To: Alan Maguire, Cong Wang, Uros Bizjak
  Cc: Laura Nao, bpf, chrome-platform, kernel, linux-kernel,
	regressions, dwarves@vger.kernel.org

Alan Maguire <alan.maguire@oracle.com> writes:
> On 14/12/2024 12:15, Alan Maguire wrote:
>> On 14/12/2024 04:41, Cong Wang wrote:
>>> On Thu, Dec 05, 2024 at 08:36:33AM +0100, Uros Bizjak wrote:
>>>> On Wed, Dec 4, 2024 at 4:52 PM Laura Nao <laura.nao@collabora.com> wrote:
>>>>>
>>>>> On 11/15/24 18:17, Laura Nao wrote:
>>>>>> I managed to reproduce the issue locally and I've uploaded the vmlinux[1]
>>>>>> (stripped of DWARF data) and vmlinux.raw[2] files, as well as one of the
>>>>>> modules[3] and its btf data[4] extracted with:
>>>>>>
>>>>>> bpftool -B vmlinux btf dump file cros_kbd_led_backlight.ko > cros_kbd_led_backlight.ko.raw
>>>>>>
>>>>>> Looking again at the logs[5], I've noticed the following is reported:
>>>>>>
>>>>>> [    0.415885] BPF:    type_id=115803 offset=177920 size=1152
>>>>>> [    0.416029] BPF:
>>>>>> [    0.416083] BPF: Invalid offset
>>>>>> [    0.416165] BPF:
>>>>>>
>>>>>> There are two different definitions of rcu_data in '.data..percpu', one
>>>>>> is a struct and the other is an integer:
>>>>>>
>>>>>> type_id=115801 offset=177920 size=1152 (VAR 'rcu_data')
>>>>>> type_id=115803 offset=177920 size=1152 (VAR 'rcu_data')
>>>>>>
>>>>>> [115801] VAR 'rcu_data' type_id=115572, linkage=static
>>>>>> [115803] VAR 'rcu_data' type_id=1, linkage=static
>>>>>>
>>>>>> [115572] STRUCT 'rcu_data' size=1152 vlen=69
>>>>>> [1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
>>>>>>
>>>>>> I assume that's not expected, correct?
>>>>>>
>>>>>> I'll dig a bit deeper and report back if I can find anything else.
>>>>>
>>>>> I ran a bisection, and it appears the culprit commit is:
>>>>> https://lore.kernel.org/all/20241021080856.48746-2-ubizjak@gmail.com/
>>>>>
>>>>> Hi Uros, do you have any suggestions or insights on resolving this issue?
>>>>
>>>> There is a stray ";" at the end of the #define, perhaps this makes a difference:
>>>>
>>>> +#define PERCPU_PTR(__p) \
>>>> + (typeof(*(__p)) __force __kernel *)(__p);
>>>> +
>>>>
>>>> and SHIFT_PERCPU_PTR macro now expands to:
>>>>
>>>> RELOC_HIDE((typeof(*(p)) __force __kernel *)(p);, (offset))
>>>>
>>>> A follow-up patch in the series changes PERCPU_PTR macro to:
>>>>
>>>> #define PERCPU_PTR(__p) \
>>>> ({ \
>>>> unsigned long __pcpu_ptr = (__force unsigned long)(__p); \
>>>> (typeof(*(__p)) __force __kernel *)(__pcpu_ptr); \
>>>> })
>>>>
>>>> so this should again correctly cast the value.
>>>
>>> Hm, I saw a similar bug but with pahole 1.28. My kernel complains about
>>> BTF invalid offset:
>>>
>>> [    7.785788] BPF: 	 type_id=2394 offset=0 size=1
>>> [    7.786411] BPF:
>>> [    7.786703] BPF: Invalid offset
>>> [    7.787119] BPF:
>>>
>>> Dumping the vmlinux (there is no module invovled), I saw it is related to
>>> percpu pointer too:
>>>
>>> [2394] VAR '__pcpu_unique_cpu_hw_events' type_id=2, linkage=global
>>> ...
>>> [163643] DATASEC '.data..percpu' size=2123280 vlen=808
>>>         type_id=2393 offset=0 size=1 (VAR '__pcpu_scope_cpu_hw_events')
>>>         type_id=2394 offset=0 size=1 (VAR '__pcpu_unique_cpu_hw_events')
>>> ...
>>>
>>> I compiled and installed the latest pahole from its git repo:
>>>
>>> $ pahole --version
>>> v1.28
>>>
>>> Thanks.
>> 
>> Thanks for the report! Looking at percpu-defs.h it looks like the
>> existence of such variables requires either
>> 
>> #if defined(ARCH_NEEDS_WEAK_PER_CPU) ||
>> defined(CONFIG_DEBUG_FORCE_WEAK_PER_CPU)
>> 
>> ...
>> 
>> #define DEFINE_PER_CPU_SECTION(type, name, sec)                         \
>>         __PCPU_DUMMY_ATTRS char __pcpu_scope_##name;                    \
>>         extern __PCPU_DUMMY_ATTRS char __pcpu_unique_##name;            \
>>         __PCPU_DUMMY_ATTRS char __pcpu_unique_##name;                   \
>>         extern __PCPU_ATTRS(sec) __typeof__(type) name;                 \
>>         __PCPU_ATTRS(sec) __weak __typeof__(type) name
>> 
>> 
>> I'm guessing your .config has CONFIG_DEBUG_FORCE_WEAK_PER_CPU, or are
>> you building on s390/alpha?
>> 
>> I've reproduced this on bpf-next with CONFIG_DEBUG_FORCE_WEAK_PER_CPU=y,
>> pahole v1.28 and gcc-12; I see ~900 __pcpu_ variables and get the same
>> BTF errors since multipe __pcpu_ vars share the offset 0.
>> 
>> A simple workaround in dwarves - and I verified this resolved the issue
>> for me - would be
>> 
>> diff --git a/btf_encoder.c b/btf_encoder.c
>> index 3754884..4a1799a 100644
>> --- a/btf_encoder.c
>> +++ b/btf_encoder.c
>> @@ -2174,7 +2174,8 @@ static bool filter_variable_name(const char *name)
>>                 X("__UNIQUE_ID"),
>>                 X("__tpstrtab_"),
>>                 X("__exitcall_"),
>> -               X("__func_stack_frame_non_standard_")
>> +               X("__func_stack_frame_non_standard_"),
>> +               X("__pcpu_")
>>                 #undef X
>>         };
>>         int i;
>> 
>> ...but I'd like us to understand further why variables which were
>> supposed to be in a .discard section end up being encoded as there may
>> be other problems lurking here aside from this one. More soon hopefully...
>>
>
>
> A bit more context here - variable encoding takes the address of the
> variable from DWARF to locate the associated ELF section. Because we
> insist on having a variable specification - with a location - this
> usually works fine. However the problem is that because these dummy
> __pcpu_ variables specify a .discard section, their addresses are 0, so
> we get for example:
>
>  <1><1e535>: Abbrev Number: 114 (DW_TAG_variable)
>     <1e536>   DW_AT_name        : (indirect string, offset: 0x5e97):
> __pcpu_unique_kstack_offset
>     <1e53a>   DW_AT_decl_file   : 1
>     <1e53b>   DW_AT_decl_line   : 823
>     <1e53d>   DW_AT_decl_column : 1
>     <1e53e>   DW_AT_type        : <0x57>
>     <1e542>   DW_AT_external    : 1
>     <1e542>   DW_AT_declaration : 1
>  <1><1e542>: Abbrev Number: 156 (DW_TAG_variable)
>     <1e544>   DW_AT_specification: <0x1e535>
>     <1e548>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0
> (DW_OP_addr: 0)
>
>
> You can see the same thing for a simple program like this:
>
> #include <stdio.h>
>
> #define SEC(name) __attribute__((section(name)))
>
> SEC("/DISCARD/") int d1;
> extern int d1;
> SEC("/DISCARD/") int d2;
> extern int d2;
>
> int main(int argc, char *argv[])
> {
> 	return 0;
> }
>
>
> If you compile it with -g, the DWARF shows that d1 and d2 both have
> address 0:
>
>  <1><72>: Abbrev Number: 5 (DW_TAG_variable)
>     <73>   DW_AT_name        : d1
>     <76>   DW_AT_decl_file   : 1
>     <77>   DW_AT_decl_line   : 5
>     <78>   DW_AT_decl_column : 22
>     <79>   DW_AT_type        : <0x57>
>     <7d>   DW_AT_external    : 1
>     <7d>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0
> (DW_OP_addr: 0)
>  <1><87>: Abbrev Number: 5 (DW_TAG_variable)
>     <88>   DW_AT_name        : d2
>     <8b>   DW_AT_decl_file   : 1
>     <8c>   DW_AT_decl_line   : 7
>     <8d>   DW_AT_decl_column : 22
>     <8e>   DW_AT_type        : <0x57>
>     <92>   DW_AT_external    : 1
>     <92>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0
> (DW_OP_addr: 0)
>
>
> So the reason this happens for dwarves v1.28 in particular is - as I
> understand it - we moved away from recording ELF section information for
> each variable and matching that with DWARF info, instead relying on the
> address to locate the associated ELF section. In cases like the above
> the address information unfortunately leads us astray.
>
> Seems like there's a few approaches we can take in fixing this:
>
> 1. designate "__pcpu_" prefix as a variable prefix to filter out. This
> resolves the immediate problem but is too narrowly focused IMO and we
> may end up playing whack-a-mole with other dummy variable prefixes.
> 2. resurrect ELF section variable information fully; i.e. record a list
> of variables per ELF section (or at least per ELF section we care
> about). If variable is not on the list for the ELF section, do not
> encode it.
> 3. midway between the two; for the 0 address case specifically, verify
> that the variable name really _is_ in the associated ELF section. No
> need to create a local ELF table variable representation, we could just
> walk the table in the case of the 0 addresses.
>
> Diff for approach 3 is as follows
>
> diff --git a/btf_encoder.c b/btf_encoder.c
> index 3754884..21a0ab6 100644
> --- a/btf_encoder.c
> +++ b/btf_encoder.c
> @@ -2189,6 +2189,26 @@ static bool filter_variable_name(const char *name)
>         return false;
>  }
>
> +bool variable_in_sec(struct btf_encoder *encoder, const char *name,
> size_t shndx)
> +{
> +       uint32_t sym_sec_idx;
> +       uint32_t core_id;
> +       GElf_Sym sym;
> +
> +       elf_symtab__for_each_symbol_index(encoder->symtab, core_id, sym,
> sym_sec_idx) {
> +               const char *sym_name;
> +
> +               if (sym_sec_idx != shndx || elf_sym__type(&sym) !=
> STT_OBJECT)
> +                       continue;
> +               sym_name = elf_sym__name(&sym, encoder->symtab);
> +               if (!sym_name)
> +                       continue;
> +               if (strcmp(name, sym_name) == 0)
> +                       return true;
> +       }
> +       return false;
> +}
> +
>  static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder)
>  {
>         struct cu *cu = encoder->cu;
> @@ -2258,6 +2278,11 @@ static int
> btf_encoder__encode_cu_variables(struct btf_encoder *encoder)
>                 if (filter_variable_name(name))
>                         continue;
>
> +               /* A 0 address may be in a .discard section; ensure the
> +                * variable really is in this section by checking ELF
> symtab.
> +                */
> +               if (addr == 0 && !variable_in_sec(encoder, name, shndx))
> +                       continue;
>                 /* Check for invalid BTF names */
>                 if (!btf_name_valid(name)) {
>                         dump_invalid_symbol("Found invalid variable name
> when encoding btf",
>
>
> ...so slightly more complex than option 1, but a bit more general in its
> applicability to .discard section variables.
>
> For the pahole folks, what do we think? Which option (or indeed other
> ones I haven't thought of) makes sense for a fix for this? Thanks!

Hi Alan,

Thanks so much for the analysis. It strikes me that it would be very
nice if the compiler could somehow annotate DIEs that are discarded. But
maybe the discarding happens far enough along that the DWARF can't be
modified?

In any case, while we wish for better compilers, we need a solution now.
I think Option 3 is a really great compromise. Reading it in context
with the code, it makes intuitive sense and it works when we're
outputting just the percpu globals, or when we're outputting all
globals. And there's little risk of issues, given that up until 1.28,
we've been using the ELF name for all variables anyway, so we know that
there have always been ELF symbols for all the variables we care about.

Stephen

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [REGRESSION] module BTF validation failure (Error -22) on next
  2024-12-16 15:19         ` Alan Maguire
  2024-12-16 21:28           ` Stephen Brennan
@ 2024-12-17  8:02           ` Jiri Olsa
  2025-01-15 17:38           ` Cong Wang
  2 siblings, 0 replies; 6+ messages in thread
From: Jiri Olsa @ 2024-12-17  8:02 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Cong Wang, Uros Bizjak, Laura Nao, bpf, chrome-platform, kernel,
	linux-kernel, regressions, Stephen Brennan,
	dwarves@vger.kernel.org

On Mon, Dec 16, 2024 at 03:19:01PM +0000, Alan Maguire wrote:
> On 14/12/2024 12:15, Alan Maguire wrote:
> > On 14/12/2024 04:41, Cong Wang wrote:
> >> On Thu, Dec 05, 2024 at 08:36:33AM +0100, Uros Bizjak wrote:
> >>> On Wed, Dec 4, 2024 at 4:52 PM Laura Nao <laura.nao@collabora.com> wrote:
> >>>>
> >>>> On 11/15/24 18:17, Laura Nao wrote:
> >>>>> I managed to reproduce the issue locally and I've uploaded the vmlinux[1]
> >>>>> (stripped of DWARF data) and vmlinux.raw[2] files, as well as one of the
> >>>>> modules[3] and its btf data[4] extracted with:
> >>>>>
> >>>>> bpftool -B vmlinux btf dump file cros_kbd_led_backlight.ko > cros_kbd_led_backlight.ko.raw
> >>>>>
> >>>>> Looking again at the logs[5], I've noticed the following is reported:
> >>>>>
> >>>>> [    0.415885] BPF:    type_id=115803 offset=177920 size=1152
> >>>>> [    0.416029] BPF:
> >>>>> [    0.416083] BPF: Invalid offset
> >>>>> [    0.416165] BPF:
> >>>>>
> >>>>> There are two different definitions of rcu_data in '.data..percpu', one
> >>>>> is a struct and the other is an integer:
> >>>>>
> >>>>> type_id=115801 offset=177920 size=1152 (VAR 'rcu_data')
> >>>>> type_id=115803 offset=177920 size=1152 (VAR 'rcu_data')
> >>>>>
> >>>>> [115801] VAR 'rcu_data' type_id=115572, linkage=static
> >>>>> [115803] VAR 'rcu_data' type_id=1, linkage=static
> >>>>>
> >>>>> [115572] STRUCT 'rcu_data' size=1152 vlen=69
> >>>>> [1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
> >>>>>
> >>>>> I assume that's not expected, correct?
> >>>>>
> >>>>> I'll dig a bit deeper and report back if I can find anything else.
> >>>>
> >>>> I ran a bisection, and it appears the culprit commit is:
> >>>> https://lore.kernel.org/all/20241021080856.48746-2-ubizjak@gmail.com/
> >>>>
> >>>> Hi Uros, do you have any suggestions or insights on resolving this issue?
> >>>
> >>> There is a stray ";" at the end of the #define, perhaps this makes a difference:
> >>>
> >>> +#define PERCPU_PTR(__p) \
> >>> + (typeof(*(__p)) __force __kernel *)(__p);
> >>> +
> >>>
> >>> and SHIFT_PERCPU_PTR macro now expands to:
> >>>
> >>> RELOC_HIDE((typeof(*(p)) __force __kernel *)(p);, (offset))
> >>>
> >>> A follow-up patch in the series changes PERCPU_PTR macro to:
> >>>
> >>> #define PERCPU_PTR(__p) \
> >>> ({ \
> >>> unsigned long __pcpu_ptr = (__force unsigned long)(__p); \
> >>> (typeof(*(__p)) __force __kernel *)(__pcpu_ptr); \
> >>> })
> >>>
> >>> so this should again correctly cast the value.
> >>
> >> Hm, I saw a similar bug but with pahole 1.28. My kernel complains about
> >> BTF invalid offset:
> >>
> >> [    7.785788] BPF: 	 type_id=2394 offset=0 size=1
> >> [    7.786411] BPF:
> >> [    7.786703] BPF: Invalid offset
> >> [    7.787119] BPF:
> >>
> >> Dumping the vmlinux (there is no module invovled), I saw it is related to
> >> percpu pointer too:
> >>
> >> [2394] VAR '__pcpu_unique_cpu_hw_events' type_id=2, linkage=global
> >> ...
> >> [163643] DATASEC '.data..percpu' size=2123280 vlen=808
> >>         type_id=2393 offset=0 size=1 (VAR '__pcpu_scope_cpu_hw_events')
> >>         type_id=2394 offset=0 size=1 (VAR '__pcpu_unique_cpu_hw_events')
> >> ...
> >>
> >> I compiled and installed the latest pahole from its git repo:
> >>
> >> $ pahole --version
> >> v1.28
> >>
> >> Thanks.
> > 
> > Thanks for the report! Looking at percpu-defs.h it looks like the
> > existence of such variables requires either
> > 
> > #if defined(ARCH_NEEDS_WEAK_PER_CPU) ||
> > defined(CONFIG_DEBUG_FORCE_WEAK_PER_CPU)
> > 
> > ...
> > 
> > #define DEFINE_PER_CPU_SECTION(type, name, sec)                         \
> >         __PCPU_DUMMY_ATTRS char __pcpu_scope_##name;                    \
> >         extern __PCPU_DUMMY_ATTRS char __pcpu_unique_##name;            \
> >         __PCPU_DUMMY_ATTRS char __pcpu_unique_##name;                   \
> >         extern __PCPU_ATTRS(sec) __typeof__(type) name;                 \
> >         __PCPU_ATTRS(sec) __weak __typeof__(type) name
> > 
> > 
> > I'm guessing your .config has CONFIG_DEBUG_FORCE_WEAK_PER_CPU, or are
> > you building on s390/alpha?
> > 
> > I've reproduced this on bpf-next with CONFIG_DEBUG_FORCE_WEAK_PER_CPU=y,
> > pahole v1.28 and gcc-12; I see ~900 __pcpu_ variables and get the same
> > BTF errors since multipe __pcpu_ vars share the offset 0.
> > 
> > A simple workaround in dwarves - and I verified this resolved the issue
> > for me - would be
> > 
> > diff --git a/btf_encoder.c b/btf_encoder.c
> > index 3754884..4a1799a 100644
> > --- a/btf_encoder.c
> > +++ b/btf_encoder.c
> > @@ -2174,7 +2174,8 @@ static bool filter_variable_name(const char *name)
> >                 X("__UNIQUE_ID"),
> >                 X("__tpstrtab_"),
> >                 X("__exitcall_"),
> > -               X("__func_stack_frame_non_standard_")
> > +               X("__func_stack_frame_non_standard_"),
> > +               X("__pcpu_")
> >                 #undef X
> >         };
> >         int i;
> > 
> > ...but I'd like us to understand further why variables which were
> > supposed to be in a .discard section end up being encoded as there may
> > be other problems lurking here aside from this one. More soon hopefully...
> >
> 
> 
> A bit more context here - variable encoding takes the address of the
> variable from DWARF to locate the associated ELF section. Because we
> insist on having a variable specification - with a location - this
> usually works fine. However the problem is that because these dummy
> __pcpu_ variables specify a .discard section, their addresses are 0, so
> we get for example:
> 
>  <1><1e535>: Abbrev Number: 114 (DW_TAG_variable)
>     <1e536>   DW_AT_name        : (indirect string, offset: 0x5e97):
> __pcpu_unique_kstack_offset
>     <1e53a>   DW_AT_decl_file   : 1
>     <1e53b>   DW_AT_decl_line   : 823
>     <1e53d>   DW_AT_decl_column : 1
>     <1e53e>   DW_AT_type        : <0x57>
>     <1e542>   DW_AT_external    : 1
>     <1e542>   DW_AT_declaration : 1
>  <1><1e542>: Abbrev Number: 156 (DW_TAG_variable)
>     <1e544>   DW_AT_specification: <0x1e535>
>     <1e548>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0
> (DW_OP_addr: 0)
> 
> 
> You can see the same thing for a simple program like this:
> 
> #include <stdio.h>
> 
> #define SEC(name) __attribute__((section(name)))
> 
> SEC("/DISCARD/") int d1;
> extern int d1;
> SEC("/DISCARD/") int d2;
> extern int d2;
> 
> int main(int argc, char *argv[])
> {
> 	return 0;
> }
> 
> 
> If you compile it with -g, the DWARF shows that d1 and d2 both have
> address 0:
> 
>  <1><72>: Abbrev Number: 5 (DW_TAG_variable)
>     <73>   DW_AT_name        : d1
>     <76>   DW_AT_decl_file   : 1
>     <77>   DW_AT_decl_line   : 5
>     <78>   DW_AT_decl_column : 22
>     <79>   DW_AT_type        : <0x57>
>     <7d>   DW_AT_external    : 1
>     <7d>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0
> (DW_OP_addr: 0)
>  <1><87>: Abbrev Number: 5 (DW_TAG_variable)
>     <88>   DW_AT_name        : d2
>     <8b>   DW_AT_decl_file   : 1
>     <8c>   DW_AT_decl_line   : 7
>     <8d>   DW_AT_decl_column : 22
>     <8e>   DW_AT_type        : <0x57>
>     <92>   DW_AT_external    : 1
>     <92>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0
> (DW_OP_addr: 0)
> 
> 
> So the reason this happens for dwarves v1.28 in particular is - as I
> understand it - we moved away from recording ELF section information for
> each variable and matching that with DWARF info, instead relying on the
> address to locate the associated ELF section. In cases like the above
> the address information unfortunately leads us astray.
> 
> Seems like there's a few approaches we can take in fixing this:
> 
> 1. designate "__pcpu_" prefix as a variable prefix to filter out. This
> resolves the immediate problem but is too narrowly focused IMO and we
> may end up playing whack-a-mole with other dummy variable prefixes.
> 2. resurrect ELF section variable information fully; i.e. record a list
> of variables per ELF section (or at least per ELF section we care
> about). If variable is not on the list for the ELF section, do not
> encode it.
> 3. midway between the two; for the 0 address case specifically, verify
> that the variable name really _is_ in the associated ELF section. No
> need to create a local ELF table variable representation, we could just
> walk the table in the case of the 0 addresses.
> 
> Diff for approach 3 is as follows
> 
> diff --git a/btf_encoder.c b/btf_encoder.c
> index 3754884..21a0ab6 100644
> --- a/btf_encoder.c
> +++ b/btf_encoder.c
> @@ -2189,6 +2189,26 @@ static bool filter_variable_name(const char *name)
>         return false;
>  }
> 
> +bool variable_in_sec(struct btf_encoder *encoder, const char *name,
> size_t shndx)
> +{
> +       uint32_t sym_sec_idx;
> +       uint32_t core_id;
> +       GElf_Sym sym;
> +
> +       elf_symtab__for_each_symbol_index(encoder->symtab, core_id, sym,
> sym_sec_idx) {
> +               const char *sym_name;
> +
> +               if (sym_sec_idx != shndx || elf_sym__type(&sym) !=
> STT_OBJECT)
> +                       continue;
> +               sym_name = elf_sym__name(&sym, encoder->symtab);
> +               if (!sym_name)
> +                       continue;
> +               if (strcmp(name, sym_name) == 0)
> +                       return true;
> +       }
> +       return false;
> +}
> +
>  static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder)
>  {
>         struct cu *cu = encoder->cu;
> @@ -2258,6 +2278,11 @@ static int
> btf_encoder__encode_cu_variables(struct btf_encoder *encoder)
>                 if (filter_variable_name(name))
>                         continue;
> 
> +               /* A 0 address may be in a .discard section; ensure the
> +                * variable really is in this section by checking ELF
> symtab.
> +                */
> +               if (addr == 0 && !variable_in_sec(encoder, name, shndx))
> +                       continue;
>                 /* Check for invalid BTF names */
>                 if (!btf_name_valid(name)) {
>                         dump_invalid_symbol("Found invalid variable name
> when encoding btf",
> 
> 
> ...so slightly more complex than option 1, but a bit more general in its
> applicability to .discard section variables.
> 
> For the pahole folks, what do we think? Which option (or indeed other
> ones I haven't thought of) makes sense for a fix for this? Thanks!

I can reproduce this with the CONFIG_DEBUG_FORCE_WEAK_PER_CPU enable,
the fix looks fine, could you send formal patch?

thanks,
jirka

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [REGRESSION] module BTF validation failure (Error -22) on next
  2024-12-16 15:19         ` Alan Maguire
  2024-12-16 21:28           ` Stephen Brennan
  2024-12-17  8:02           ` Jiri Olsa
@ 2025-01-15 17:38           ` Cong Wang
  2025-01-16  9:51             ` Alan Maguire
  2 siblings, 1 reply; 6+ messages in thread
From: Cong Wang @ 2025-01-15 17:38 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Uros Bizjak, Laura Nao, bpf, chrome-platform, kernel,
	linux-kernel, regressions, Stephen Brennan,
	dwarves@vger.kernel.org

On Mon, Dec 16, 2024 at 03:19:01PM +0000, Alan Maguire wrote:
> Seems like there's a few approaches we can take in fixing this:
> 
> 1. designate "__pcpu_" prefix as a variable prefix to filter out. This
> resolves the immediate problem but is too narrowly focused IMO and we
> may end up playing whack-a-mole with other dummy variable prefixes.
> 2. resurrect ELF section variable information fully; i.e. record a list
> of variables per ELF section (or at least per ELF section we care
> about). If variable is not on the list for the ELF section, do not
> encode it.
> 3. midway between the two; for the 0 address case specifically, verify
> that the variable name really _is_ in the associated ELF section. No
> need to create a local ELF table variable representation, we could just
> walk the table in the case of the 0 addresses.
> 
> Diff for approach 3 is as follows

Hi Alan,

Thanks for your detailed analysis.

Is your patch formally submitted and merged? A quick code search with
variable_in_sec() tells me no, but I could very easily miss things here,
hence I am asking you. :)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [REGRESSION] module BTF validation failure (Error -22) on next
  2025-01-15 17:38           ` Cong Wang
@ 2025-01-16  9:51             ` Alan Maguire
  0 siblings, 0 replies; 6+ messages in thread
From: Alan Maguire @ 2025-01-16  9:51 UTC (permalink / raw)
  To: Cong Wang
  Cc: Uros Bizjak, Laura Nao, bpf, chrome-platform, kernel,
	linux-kernel, regressions, Stephen Brennan,
	dwarves@vger.kernel.org

On 15/01/2025 17:38, Cong Wang wrote:
> On Mon, Dec 16, 2024 at 03:19:01PM +0000, Alan Maguire wrote:
>> Seems like there's a few approaches we can take in fixing this:
>>
>> 1. designate "__pcpu_" prefix as a variable prefix to filter out. This
>> resolves the immediate problem but is too narrowly focused IMO and we
>> may end up playing whack-a-mole with other dummy variable prefixes.
>> 2. resurrect ELF section variable information fully; i.e. record a list
>> of variables per ELF section (or at least per ELF section we care
>> about). If variable is not on the list for the ELF section, do not
>> encode it.
>> 3. midway between the two; for the 0 address case specifically, verify
>> that the variable name really _is_ in the associated ELF section. No
>> need to create a local ELF table variable representation, we could just
>> walk the table in the case of the 0 addresses.
>>
>> Diff for approach 3 is as follows
> 
> Hi Alan,
> 
> Thanks for your detailed analysis.
> 
> Is your patch formally submitted and merged? A quick code search with
> variable_in_sec() tells me no, but I could very easily miss things here,
> hence I am asking you. :)

hi Cong, I submitted the patch here:

https://lore.kernel.org/dwarves/20241217103629.2383809-1-alan.maguire@oracle.com/

..but it looks like it hasn't landed officially yet, apologies about
that. Would you mind testing the master branch of dwarves with the patch
applied to confirm the problem is fixed? It worked at my end but would
be great to confirm it fixes the issue on your side too. Thanks!

Alan

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-01-16  9:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20241115171712.427535-1-laura.nao@collabora.com>
     [not found] ` <20241204155305.444280-1-laura.nao@collabora.com>
     [not found]   ` <CAFULd4a+GjfN5EgPM-utJNfwo5vQ9Sq+uqXJ62eP9ed7bBJ50w@mail.gmail.com>
     [not found]     ` <Z10MkXtzyY9RDqSp@pop-os.localdomain>
2024-12-14 12:15       ` [REGRESSION] module BTF validation failure (Error -22) on next Alan Maguire
2024-12-16 15:19         ` Alan Maguire
2024-12-16 21:28           ` Stephen Brennan
2024-12-17  8:02           ` Jiri Olsa
2025-01-15 17:38           ` Cong Wang
2025-01-16  9:51             ` Alan Maguire

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox