[PATCH CRASH] Fix module section load address when sh

Linux debuggers
 help / color / mirror / Atom feed

* [PATCH CRASH] Fix module section load address when sh_addr != 0
@ 2025-04-01 21:01 Stephen Brennan
  2025-04-02  7:25 ` [Crash-utility] " Tao Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Stephen Brennan @ 2025-04-01 21:01 UTC (permalink / raw)
  To: devel; +Cc: linux-debuggers, Stephen Brennan

A user reported that crash was reporting the address for certain module
variables incorrectly. I was able to track it down specifically to
variables which were located in the .data section of a kernel module.
While the "sym" command gave the correct value, printing the address of
the variable or expressions based on it with "p" would give an incorrect
value. For example, the variable "ata_dummy_port_ops" variable is
included in the .data section of libata.ko when built as a module:

    $ sudo grep '\bata_dummy_port_ops\b' /proc/kallsyms
    ffffffffc0a71580 d ata_dummy_port_ops   [libata]
    $ sudo crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /proc/kcore
    crash> sym ata_dummy_port_ops
    ffffffffc0a71580 (?) ata_dummy_port_ops [libata]
    crash> mod -s libata
         MODULE       NAME                      TEXT_BASE         SIZE  OBJECT FILE
    ffffffffc0a7b640  libata                 ffffffffc0a47000   520192
    /usr/lib/debug/lib/modules/6.12.0-0.11.8.el9uek.x86_64/kernel/drivers/ata/libata.ko.debug
    crash> sym ata_dummy_port_ops
    ffffffffc0a71580 (B) ata_dummy_port_ops [libata]
    crash> p/x &ata_dummy_port_ops
    $1 = 0xffffffffc0a6fe80

The symbol value (from kallsyms) is correct, but its address provided by
GDB is incorrect. It turns out that the .data section has an sh_addr
which is non-zero. The result of this is that calculate_load_order_6_4()
incorrectly calculates the base address for the .data section. This
patch fixes the base address which is later provided to GDB via
add-symbol-file.

The impact here is interesting. Only variables within sections that have
a non-zero sh_addr are impacted. It turns out that this is relatively
common since Linux kernel commit 22d407b164ff7 ("lib: add allocation
tagging support for memory allocation profiling"), which was merged in
6.10. That commit added an entry to the scripts/module.lds.S linker
script, without specifying a base address of zero. I believe that is the
reason for the non-zero base addresses.

I was able to verify that, in addition to the Oracle Linux kernel where
we initially noticed the issue, kernel modules on Arch Linux and Fedora
also have non-zero .data sh_addr values. This is likely the case for
most non-clang kernels since 6.10, but those were the only two distros I
checked. While my reading of the module.lds.S seems to indicate that
kernels built with CONFIG_LTO_CLANG=y should also have non-zero .data,
.bss, and .rodata section addresses, I haven't been able to reproduce
this with clang LTO kernels. Regardless, crash should properly handle
non-zero sh_addr since it exists in the real world now.

The core of the issue is that the symbol value returned by BFD includes
the sh_addr of the section containing the symbol. For example, suppose
a symbol with address 0 is located within a section with virtual address
0xa00. Then, the resulting symbol value will be 0xa00, not 0.

calculate_load_order_6_4() computes the base address of each section by
using a kallsyms symbol known to be within that section, and then
subtracting the value of the symbol from the object file. This
implicitly assumes that the section sh_addr is zero, and thus the symbol
value is just an offset. To fix the computation, add in the section base
address, to account for cases where it is non-zero.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 symbols.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/symbols.c b/symbols.c
index 5adbc30..e30fafe 100644
--- a/symbols.c
+++ b/symbols.c
@@ -12808,6 +12808,7 @@ calculate_load_order_6_4(struct load_module *lm, bfd *bfd, int dynamic,
 	asymbol *store;
 	asymbol *sym;
 	symbol_info syminfo;
+	bfd_vma secaddr;
 	char *secname;
 	int i, t;

@@ -12860,6 +12861,7 @@ calculate_load_order_6_4(struct load_module *lm, bfd *bfd, int dynamic,
 				}
 				if (strcmp(syminfo.name, s1->name) == 0) {
 					secname = (char *)bfd_section_name(sym->section);
+					secaddr = bfd_section_vma(sym->section);
 					break;
 				}

@@ -12890,14 +12892,14 @@ calculate_load_order_6_4(struct load_module *lm, bfd *bfd, int dynamic,
 			}

 			/* Update the offset information for the section */
-			sec_start = s1->value - syminfo.value;
+			sec_start = s1->value - syminfo.value + secaddr;
 			/* keep the address instead of offset */
 			lm->mod_section_data[i].addr = sec_start;
 			lm->mod_section_data[i].flags |= SEC_FOUND;

 			if (CRASHDEBUG(2))
-				fprintf(fp, "update sec offset sym %s @ %lx  val %lx  section %s\n",
-					s1->name, s1->value, (ulong)syminfo.value, secname);
+				fprintf(fp, "update sec offset sym %s @ %lx  val %lx  section %s @ %lx\n",
+					s1->name, s1->value, (ulong)syminfo.value, secname, secaddr);

 			if (strcmp(secname, ".text") == 0)
 				lm->mod_text_start = sec_start;
-- 
2.43.5

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Crash-utility] [PATCH CRASH] Fix module section load address when sh_addr != 0
  2025-04-01 21:01 [PATCH CRASH] Fix module section load address when sh_addr != 0 Stephen Brennan
@ 2025-04-02  7:25 ` Tao Liu
  2025-04-02 23:18   ` Stephen Brennan
  0 siblings, 1 reply; 9+ messages in thread
From: Tao Liu @ 2025-04-02  7:25 UTC (permalink / raw)
  To: Stephen Brennan; +Cc: devel, linux-debuggers

Hi Stephen,

Thanks for reporting the issue and patch. Please check if I understood
you correctly. The correct output you are expecting in your case is:

    crash> sym ata_dummy_port_ops
    ffffffffc0a71580 (?) ata_dummy_port_ops [libata]
    crash> mod -s libata
         MODULE       NAME                      TEXT_BASE         SIZE
 OBJECT FILE
    ffffffffc0a7b640  libata                 ffffffffc0a47000   520192
    /usr/lib/debug/lib/modules/6.12.0-0.11.8.el9uek.x86_64/kernel/drivers/ata/libata.ko.debug
    crash> sym ata_dummy_port_ops
    ffffffffc0a71580 (B) ata_dummy_port_ops [libata]
    crash> p/x &ata_dummy_port_ops
    $1 = 0xffffffffc0a6fe80 <<-------- should be 0xffffffffc0a71580,
same as sym's output, right?

If that is the case, then after applied your patch, the issue still
exists on my machine:

crash> sym fuse_miscdevice
ffffffffc05f7fe0 (?) fuse_miscdevice [fuse]
crash> mod -s fuse
     MODULE       NAME                             TEXT_BASE
SIZE  OBJECT FILE
ffffffffc05f8dc0  fuse                          ffffffffc05da000
233472  /usr/lib/debug/lib/modules/6.11.5-300.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
crash> sym fuse_miscdevice
ffffffffc05f7fe0 (b) fuse_miscdevice [fuse]
crash> p/x &fuse_miscdevice
$1 = 0xffffffffc05daf20 << ------ unmatch with the previous value.

The .data section of fuse.ko.debug have non-zero address:

$ readelf -S /usr/lib/debug/lib/modules/6.11.5-300.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
...
[50] .data             NOBITS           0000000000000e20  00000100
       000000000000080f  0000000000000000  WA       0     0     32

Could you please re-check your patch for this, or is there something
I'm missing?

Thanks,
Tao Liu

On Wed, Apr 2, 2025 at 11:41 AM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> A user reported that crash was reporting the address for certain module
> variables incorrectly. I was able to track it down specifically to
> variables which were located in the .data section of a kernel module.
> While the "sym" command gave the correct value, printing the address of
> the variable or expressions based on it with "p" would give an incorrect
> value. For example, the variable "ata_dummy_port_ops" variable is
> included in the .data section of libata.ko when built as a module:
>
>     $ sudo grep '\bata_dummy_port_ops\b' /proc/kallsyms
>     ffffffffc0a71580 d ata_dummy_port_ops   [libata]
>     $ sudo crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /proc/kcore
>     crash> sym ata_dummy_port_ops
>     ffffffffc0a71580 (?) ata_dummy_port_ops [libata]
>     crash> mod -s libata
>          MODULE       NAME                      TEXT_BASE         SIZE  OBJECT FILE
>     ffffffffc0a7b640  libata                 ffffffffc0a47000   520192
>     /usr/lib/debug/lib/modules/6.12.0-0.11.8.el9uek.x86_64/kernel/drivers/ata/libata.ko.debug
>     crash> sym ata_dummy_port_ops
>     ffffffffc0a71580 (B) ata_dummy_port_ops [libata]
>     crash> p/x &ata_dummy_port_ops
>     $1 = 0xffffffffc0a6fe80
>
> The symbol value (from kallsyms) is correct, but its address provided by
> GDB is incorrect. It turns out that the .data section has an sh_addr
> which is non-zero. The result of this is that calculate_load_order_6_4()
> incorrectly calculates the base address for the .data section. This
> patch fixes the base address which is later provided to GDB via
> add-symbol-file.
>
> The impact here is interesting. Only variables within sections that have
> a non-zero sh_addr are impacted. It turns out that this is relatively
> common since Linux kernel commit 22d407b164ff7 ("lib: add allocation
> tagging support for memory allocation profiling"), which was merged in
> 6.10. That commit added an entry to the scripts/module.lds.S linker
> script, without specifying a base address of zero. I believe that is the
> reason for the non-zero base addresses.
>
> I was able to verify that, in addition to the Oracle Linux kernel where
> we initially noticed the issue, kernel modules on Arch Linux and Fedora
> also have non-zero .data sh_addr values. This is likely the case for
> most non-clang kernels since 6.10, but those were the only two distros I
> checked. While my reading of the module.lds.S seems to indicate that
> kernels built with CONFIG_LTO_CLANG=y should also have non-zero .data,
> .bss, and .rodata section addresses, I haven't been able to reproduce
> this with clang LTO kernels. Regardless, crash should properly handle
> non-zero sh_addr since it exists in the real world now.
>
> The core of the issue is that the symbol value returned by BFD includes
> the sh_addr of the section containing the symbol. For example, suppose
> a symbol with address 0 is located within a section with virtual address
> 0xa00. Then, the resulting symbol value will be 0xa00, not 0.
>
> calculate_load_order_6_4() computes the base address of each section by
> using a kallsyms symbol known to be within that section, and then
> subtracting the value of the symbol from the object file. This
> implicitly assumes that the section sh_addr is zero, and thus the symbol
> value is just an offset. To fix the computation, add in the section base
> address, to account for cases where it is non-zero.
>
> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> ---
>  symbols.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/symbols.c b/symbols.c
> index 5adbc30..e30fafe 100644
> --- a/symbols.c
> +++ b/symbols.c
> @@ -12808,6 +12808,7 @@ calculate_load_order_6_4(struct load_module *lm, bfd *bfd, int dynamic,
>         asymbol *store;
>         asymbol *sym;
>         symbol_info syminfo;
> +       bfd_vma secaddr;
>         char *secname;
>         int i, t;
>
> @@ -12860,6 +12861,7 @@ calculate_load_order_6_4(struct load_module *lm, bfd *bfd, int dynamic,
>                                 }
>                                 if (strcmp(syminfo.name, s1->name) == 0) {
>                                         secname = (char *)bfd_section_name(sym->section);
> +                                       secaddr = bfd_section_vma(sym->section);
>                                         break;
>                                 }
>
> @@ -12890,14 +12892,14 @@ calculate_load_order_6_4(struct load_module *lm, bfd *bfd, int dynamic,
>                         }
>
>                         /* Update the offset information for the section */
> -                       sec_start = s1->value - syminfo.value;
> +                       sec_start = s1->value - syminfo.value + secaddr;
>                         /* keep the address instead of offset */
>                         lm->mod_section_data[i].addr = sec_start;
>                         lm->mod_section_data[i].flags |= SEC_FOUND;
>
>                         if (CRASHDEBUG(2))
> -                               fprintf(fp, "update sec offset sym %s @ %lx  val %lx  section %s\n",
> -                                       s1->name, s1->value, (ulong)syminfo.value, secname);
> +                               fprintf(fp, "update sec offset sym %s @ %lx  val %lx  section %s @ %lx\n",
> +                                       s1->name, s1->value, (ulong)syminfo.value, secname, secaddr);
>
>                         if (strcmp(secname, ".text") == 0)
>                                 lm->mod_text_start = sec_start;
> --
> 2.43.5
> --
> Crash-utility mailing list -- devel@lists.crash-utility.osci.io
> To unsubscribe send an email to devel-leave@lists.crash-utility.osci.io
> https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
> Contribution Guidelines: https://github.com/crash-utility/crash/wiki


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Crash-utility] [PATCH CRASH] Fix module section load address when sh_addr != 0
  2025-04-02  7:25 ` [Crash-utility] " Tao Liu
@ 2025-04-02 23:18   ` Stephen Brennan
  2025-04-03 22:43     ` Stephen Brennan
  0 siblings, 1 reply; 9+ messages in thread
From: Stephen Brennan @ 2025-04-02 23:18 UTC (permalink / raw)
  To: Tao Liu; +Cc: devel, linux-debuggers

Tao Liu <ltao@redhat.com> writes:
> Hi Stephen,
>
> Thanks for reporting the issue and patch. Please check if I understood
> you correctly. The correct output you are expecting in your case is:
>
>     crash> sym ata_dummy_port_ops
>     ffffffffc0a71580 (?) ata_dummy_port_ops [libata]
>     crash> mod -s libata
>          MODULE       NAME                      TEXT_BASE         SIZE
>  OBJECT FILE
>     ffffffffc0a7b640  libata                 ffffffffc0a47000   520192
>     /usr/lib/debug/lib/modules/6.12.0-0.11.8.el9uek.x86_64/kernel/drivers/ata/libata.ko.debug
>     crash> sym ata_dummy_port_ops
>     ffffffffc0a71580 (B) ata_dummy_port_ops [libata]
>     crash> p/x &ata_dummy_port_ops
>     $1 = 0xffffffffc0a6fe80 <<-------- should be 0xffffffffc0a71580,
> same as sym's output, right?

Yes, that's correct.

> If that is the case, then after applied your patch, the issue still
> exists on my machine:
>
> crash> sym fuse_miscdevice
> ffffffffc05f7fe0 (?) fuse_miscdevice [fuse]
> crash> mod -s fuse
>      MODULE       NAME                             TEXT_BASE
> SIZE  OBJECT FILE
> ffffffffc05f8dc0  fuse                          ffffffffc05da000
> 233472  /usr/lib/debug/lib/modules/6.11.5-300.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> crash> sym fuse_miscdevice
> ffffffffc05f7fe0 (b) fuse_miscdevice [fuse]
> crash> p/x &fuse_miscdevice
> $1 = 0xffffffffc05daf20 << ------ unmatch with the previous value.

Indeed, I checked with fuse on Fedora (6.11.4-301.fc41.x86_64, so almost
the same version). And I see the same behavior with fuse_miscdevice.

In your case and mine, the offset is 0x1d0c0, which could be a clue to
how this mismatch happens.

> The .data section of fuse.ko.debug have non-zero address:
>
> $ readelf -S /usr/lib/debug/lib/modules/6.11.5-300.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> Section Headers:
>   [Nr] Name              Type             Address           Offset
>        Size              EntSize          Flags  Link  Info  Align
> ...
> [50] .data             NOBITS           0000000000000e20  00000100
>        000000000000080f  0000000000000000  WA       0     0     32
>
> Could you please re-check your patch for this, or is there something
> I'm missing?

From what I can tell, the important difference here may be that the
symbol you selected is a static variable (which becomes a local ELF
symbol). Apparently, my patch only resolves the issue for *global
variables*, but not for local variables.

$ eu-readelf -s /usr/lib/debug/lib/modules/$(uname -r)/kernel/fs/fuse/fuse.ko.debug | grep -P "fuse_(miscdevice|mutex)"
   85: 0000000000000100     80 OBJECT  LOCAL  DEFAULT       50 fuse_miscdevice
 1041: 00000000000005c0     32 OBJECT  GLOBAL DEFAULT       50 fuse_mutex

Here is the behavior of crash's master branch on these two symbols:

crash> sym fuse_miscdevice
ffffffffc02fafe0 (?) fuse_miscdevice [fuse]
crash> sym fuse_mutex
ffffffffc02fb4a0 (?) fuse_mutex [fuse]
crash> mod -s fuse
     MODULE       NAME                                  TEXT_BASE         SIZE  OBJECT FILE
ffffffffc02fbdc0  fuse                               ffffffffc02dd000   233472  /usr/lib/debug/lib/modules/6.11.4-301.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
crash> sym fuse_miscdevice
ffffffffc02fafe0 (b) fuse_miscdevice [fuse]
crash> sym fuse_mutex
ffffffffc02fb4a0 (B) fuse_mutex [fuse]
crash> p/x &fuse_miscdevice
$1 = 0xffffffffc02ddf20
crash> p/x &fuse_mutex
$2 = 0xffffffffc02fa680

Both fuse_miscdevice, and fuse_mutex are mismatched (with different
offsets). And now, here's the behavior of crash with my patch:

crash> sym fuse_miscdevice
ffffffffc02fafe0 (?) fuse_miscdevice [fuse]
crash> sym fuse_mutex
ffffffffc02fb4a0 (?) fuse_mutex [fuse]
crash> mod -s fuse
     MODULE       NAME                                  TEXT_BASE         SIZE  OBJECT FILE
ffffffffc02fbdc0  fuse                               ffffffffc02dd000   233472  /usr/lib/debug/lib/modules/6.11.4-301.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
crash> sym fuse_miscdevice
ffffffffc02fafe0 (b) fuse_miscdevice [fuse]
crash> sym fuse_mutex
ffffffffc02fb4a0 (B) fuse_mutex [fuse]
crash> p/x &fuse_miscdevice
$1 = 0xffffffffc02ddf20
crash> p/x &fuse_mutex
$2 = 0xffffffffc02fb4a0

Here the global fuse_mutex is correct, but the static/local
fuse_miscdevice is still incorrect. This indicates that there's more yet
to fix. (But at least my patch does part of the work!)

I'm wondering whether this has anything to do with ELF relocations. From
what I can tell using "readelf -r" and "llvm-dwarfdump", the relocations
for the location of these variables are different. Relocations for
global variables seem to be done via its corresponding symbol, e.g:

  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000008bb6b  041100000001 R_X86_64_64       00000000000005c0 fuse_mutex + 0

Whereas relocations for the local variable seem to be done relative to
the ".data" section symbol:

  Offset          Info           Type           Sym. Value    Sym. Name + Addend
0000000133f4  001e00000001 R_X86_64_64       0000000000000000 .data + 100

These offsets correspond roughly to what you'd see for the DWARF
offsets of the location expressions for each variable from
llvm-dwarfdump. So one theory here could be that GDB is not using the
provided ".data" address to perform the relocation correctly.

I've been playing with the internals of GDB to try to get more
information. It seems like "fuse_mutex" is described as "unresolved"
whereas "fuse_miscdevice" is shown as being part of the .text section
with the same incorrect address.

crash> gdb maint print symbols -objfile fuse.ko.debug | grep fuse_mutex
crash> p/x &fuse_mutex
$1 = 0xffffffffc02fb4a0
crash> gdb maint print symbols -objfile fuse.ko.debug | grep fuse_mutex
 } fuse_mutex; unresolved
crash> gdb maint print symbols -objfile fuse.ko.debug | grep fuse_miscdevice
   } fuse_miscdevice; static at 0xffffffffc02ddf20 section .text

I'm not 100% certain if this is relevant, at this point I'm just sharing
debugging information in case it helps you or anybody else. I'll
continue looking into the root cause of the issue for static variables.

Thanks,
Stephen

> Thanks,
> Tao Liu
>
> On Wed, Apr 2, 2025 at 11:41 AM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
>>
>> A user reported that crash was reporting the address for certain module
>> variables incorrectly. I was able to track it down specifically to
>> variables which were located in the .data section of a kernel module.
>> While the "sym" command gave the correct value, printing the address of
>> the variable or expressions based on it with "p" would give an incorrect
>> value. For example, the variable "ata_dummy_port_ops" variable is
>> included in the .data section of libata.ko when built as a module:
>>
>>     $ sudo grep '\bata_dummy_port_ops\b' /proc/kallsyms
>>     ffffffffc0a71580 d ata_dummy_port_ops   [libata]
>>     $ sudo crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /proc/kcore
>>     crash> sym ata_dummy_port_ops
>>     ffffffffc0a71580 (?) ata_dummy_port_ops [libata]
>>     crash> mod -s libata
>>          MODULE       NAME                      TEXT_BASE         SIZE  OBJECT FILE
>>     ffffffffc0a7b640  libata                 ffffffffc0a47000   520192
>>     /usr/lib/debug/lib/modules/6.12.0-0.11.8.el9uek.x86_64/kernel/drivers/ata/libata.ko.debug
>>     crash> sym ata_dummy_port_ops
>>     ffffffffc0a71580 (B) ata_dummy_port_ops [libata]
>>     crash> p/x &ata_dummy_port_ops
>>     $1 = 0xffffffffc0a6fe80
>>
>> The symbol value (from kallsyms) is correct, but its address provided by
>> GDB is incorrect. It turns out that the .data section has an sh_addr
>> which is non-zero. The result of this is that calculate_load_order_6_4()
>> incorrectly calculates the base address for the .data section. This
>> patch fixes the base address which is later provided to GDB via
>> add-symbol-file.
>>
>> The impact here is interesting. Only variables within sections that have
>> a non-zero sh_addr are impacted. It turns out that this is relatively
>> common since Linux kernel commit 22d407b164ff7 ("lib: add allocation
>> tagging support for memory allocation profiling"), which was merged in
>> 6.10. That commit added an entry to the scripts/module.lds.S linker
>> script, without specifying a base address of zero. I believe that is the
>> reason for the non-zero base addresses.
>>
>> I was able to verify that, in addition to the Oracle Linux kernel where
>> we initially noticed the issue, kernel modules on Arch Linux and Fedora
>> also have non-zero .data sh_addr values. This is likely the case for
>> most non-clang kernels since 6.10, but those were the only two distros I
>> checked. While my reading of the module.lds.S seems to indicate that
>> kernels built with CONFIG_LTO_CLANG=y should also have non-zero .data,
>> .bss, and .rodata section addresses, I haven't been able to reproduce
>> this with clang LTO kernels. Regardless, crash should properly handle
>> non-zero sh_addr since it exists in the real world now.
>>
>> The core of the issue is that the symbol value returned by BFD includes
>> the sh_addr of the section containing the symbol. For example, suppose
>> a symbol with address 0 is located within a section with virtual address
>> 0xa00. Then, the resulting symbol value will be 0xa00, not 0.
>>
>> calculate_load_order_6_4() computes the base address of each section by
>> using a kallsyms symbol known to be within that section, and then
>> subtracting the value of the symbol from the object file. This
>> implicitly assumes that the section sh_addr is zero, and thus the symbol
>> value is just an offset. To fix the computation, add in the section base
>> address, to account for cases where it is non-zero.
>>
>> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
>> ---
>>  symbols.c | 8 +++++---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/symbols.c b/symbols.c
>> index 5adbc30..e30fafe 100644
>> --- a/symbols.c
>> +++ b/symbols.c
>> @@ -12808,6 +12808,7 @@ calculate_load_order_6_4(struct load_module *lm, bfd *bfd, int dynamic,
>>         asymbol *store;
>>         asymbol *sym;
>>         symbol_info syminfo;
>> +       bfd_vma secaddr;
>>         char *secname;
>>         int i, t;
>>
>> @@ -12860,6 +12861,7 @@ calculate_load_order_6_4(struct load_module *lm, bfd *bfd, int dynamic,
>>                                 }
>>                                 if (strcmp(syminfo.name, s1->name) == 0) {
>>                                         secname = (char *)bfd_section_name(sym->section);
>> +                                       secaddr = bfd_section_vma(sym->section);
>>                                         break;
>>                                 }
>>
>> @@ -12890,14 +12892,14 @@ calculate_load_order_6_4(struct load_module *lm, bfd *bfd, int dynamic,
>>                         }
>>
>>                         /* Update the offset information for the section */
>> -                       sec_start = s1->value - syminfo.value;
>> +                       sec_start = s1->value - syminfo.value + secaddr;
>>                         /* keep the address instead of offset */
>>                         lm->mod_section_data[i].addr = sec_start;
>>                         lm->mod_section_data[i].flags |= SEC_FOUND;
>>
>>                         if (CRASHDEBUG(2))
>> -                               fprintf(fp, "update sec offset sym %s @ %lx  val %lx  section %s\n",
>> -                                       s1->name, s1->value, (ulong)syminfo.value, secname);
>> +                               fprintf(fp, "update sec offset sym %s @ %lx  val %lx  section %s @ %lx\n",
>> +                                       s1->name, s1->value, (ulong)syminfo.value, secname, secaddr);
>>
>>                         if (strcmp(secname, ".text") == 0)
>>                                 lm->mod_text_start = sec_start;
>> --
>> 2.43.5
>> --
>> Crash-utility mailing list -- devel@lists.crash-utility.osci.io
>> To unsubscribe send an email to devel-leave@lists.crash-utility.osci.io
>> https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
>> Contribution Guidelines: https://github.com/crash-utility/crash/wiki

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Crash-utility] [PATCH CRASH] Fix module section load address when sh_addr != 0
  2025-04-02 23:18   ` Stephen Brennan
@ 2025-04-03 22:43     ` Stephen Brennan
  2025-04-04  6:41       ` Tao Liu
  2025-04-04 10:27       ` Tao Liu
  0 siblings, 2 replies; 9+ messages in thread
From: Stephen Brennan @ 2025-04-03 22:43 UTC (permalink / raw)
  To: Tao Liu; +Cc: devel, linux-debuggers

[-- Attachment #1: Type: text/plain, Size: 8109 bytes --]

Stephen Brennan <stephen.s.brennan@oracle.com> writes:
> Tao Liu <ltao@redhat.com> writes:
>> Hi Stephen,
>>
>> Thanks for reporting the issue and patch. Please check if I understood
>> you correctly. The correct output you are expecting in your case is:
>>
>>     crash> sym ata_dummy_port_ops
>>     ffffffffc0a71580 (?) ata_dummy_port_ops [libata]
>>     crash> mod -s libata
>>          MODULE       NAME                      TEXT_BASE         SIZE
>>  OBJECT FILE
>>     ffffffffc0a7b640  libata                 ffffffffc0a47000   520192
>>     /usr/lib/debug/lib/modules/6.12.0-0.11.8.el9uek.x86_64/kernel/drivers/ata/libata.ko.debug
>>     crash> sym ata_dummy_port_ops
>>     ffffffffc0a71580 (B) ata_dummy_port_ops [libata]
>>     crash> p/x &ata_dummy_port_ops
>>     $1 = 0xffffffffc0a6fe80 <<-------- should be 0xffffffffc0a71580,
>> same as sym's output, right?
>
> Yes, that's correct.
>
>> If that is the case, then after applied your patch, the issue still
>> exists on my machine:
>>
>> crash> sym fuse_miscdevice
>> ffffffffc05f7fe0 (?) fuse_miscdevice [fuse]
>> crash> mod -s fuse
>>      MODULE       NAME                             TEXT_BASE
>> SIZE  OBJECT FILE
>> ffffffffc05f8dc0  fuse                          ffffffffc05da000
>> 233472  /usr/lib/debug/lib/modules/6.11.5-300.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
>> crash> sym fuse_miscdevice
>> ffffffffc05f7fe0 (b) fuse_miscdevice [fuse]
>> crash> p/x &fuse_miscdevice
>> $1 = 0xffffffffc05daf20 << ------ unmatch with the previous value.
>
> Indeed, I checked with fuse on Fedora (6.11.4-301.fc41.x86_64, so almost
> the same version). And I see the same behavior with fuse_miscdevice.
>
> In your case and mine, the offset is 0x1d0c0, which could be a clue to
> how this mismatch happens.
>
>> The .data section of fuse.ko.debug have non-zero address:
>>
>> $ readelf -S /usr/lib/debug/lib/modules/6.11.5-300.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
>> Section Headers:
>>   [Nr] Name              Type             Address           Offset
>>        Size              EntSize          Flags  Link  Info  Align
>> ...
>> [50] .data             NOBITS           0000000000000e20  00000100
>>        000000000000080f  0000000000000000  WA       0     0     32
>>
>> Could you please re-check your patch for this, or is there something
>> I'm missing?
>
> From what I can tell, the important difference here may be that the
> symbol you selected is a static variable (which becomes a local ELF
> symbol). Apparently, my patch only resolves the issue for *global
> variables*, but not for local variables.
>
> $ eu-readelf -s /usr/lib/debug/lib/modules/$(uname -r)/kernel/fs/fuse/fuse.ko.debug | grep -P "fuse_(miscdevice|mutex)"
>    85: 0000000000000100     80 OBJECT  LOCAL  DEFAULT       50 fuse_miscdevice
>  1041: 00000000000005c0     32 OBJECT  GLOBAL DEFAULT       50 fuse_mutex
>
> Here is the behavior of crash's master branch on these two symbols:
>
> crash> sym fuse_miscdevice
> ffffffffc02fafe0 (?) fuse_miscdevice [fuse]
> crash> sym fuse_mutex
> ffffffffc02fb4a0 (?) fuse_mutex [fuse]
> crash> mod -s fuse
>      MODULE       NAME                                  TEXT_BASE         SIZE  OBJECT FILE
> ffffffffc02fbdc0  fuse                               ffffffffc02dd000   233472  /usr/lib/debug/lib/modules/6.11.4-301.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> crash> sym fuse_miscdevice
> ffffffffc02fafe0 (b) fuse_miscdevice [fuse]
> crash> sym fuse_mutex
> ffffffffc02fb4a0 (B) fuse_mutex [fuse]
> crash> p/x &fuse_miscdevice
> $1 = 0xffffffffc02ddf20
> crash> p/x &fuse_mutex
> $2 = 0xffffffffc02fa680
>
> Both fuse_miscdevice, and fuse_mutex are mismatched (with different
> offsets). And now, here's the behavior of crash with my patch:
>
> crash> sym fuse_miscdevice
> ffffffffc02fafe0 (?) fuse_miscdevice [fuse]
> crash> sym fuse_mutex
> ffffffffc02fb4a0 (?) fuse_mutex [fuse]
> crash> mod -s fuse
>      MODULE       NAME                                  TEXT_BASE         SIZE  OBJECT FILE
> ffffffffc02fbdc0  fuse                               ffffffffc02dd000   233472  /usr/lib/debug/lib/modules/6.11.4-301.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> crash> sym fuse_miscdevice
> ffffffffc02fafe0 (b) fuse_miscdevice [fuse]
> crash> sym fuse_mutex
> ffffffffc02fb4a0 (B) fuse_mutex [fuse]
> crash> p/x &fuse_miscdevice
> $1 = 0xffffffffc02ddf20
> crash> p/x &fuse_mutex
> $2 = 0xffffffffc02fb4a0
>
> Here the global fuse_mutex is correct, but the static/local
> fuse_miscdevice is still incorrect. This indicates that there's more yet
> to fix. (But at least my patch does part of the work!)
>
> I'm wondering whether this has anything to do with ELF relocations.

Hello Tao,

I've continued debugging and verified that the root cause for this was
GDB performing relocations to the DWARF info incorrectly. At the bottom
of this email I've attached my patch, which you can apply to the master
branch to test out my proposed fix. The patch file also contains a
longer explanation of what I believe the issue is.

However, since it updates the gdb-16.2.patch file, it's quite difficult
to understand. Thus, I'll also provide a diff here of the relevant GDB
source file.

I'm not certain how the GDB patches are normally updated & reviewed. I
also recognize that this change is a bit messy, so I'd appreciate your
feedback as well.

Thank you,
Stephen

--- tmp/gdb-16.2/gdb/symfile.c	2025-04-03 15:38:26.093760270 -0700
+++ gdb-16.2/gdb/symfile.c	2025-04-03 15:07:43.239691104 -0700
@@ -341,8 +341,13 @@ place_section (bfd *abfd, asection *sect
     return;
 
   /* If the user specified an offset, honor it.  */
-  if (offsets[gdb_bfd_section_index (abfd, sect)] != 0)
+  if (offsets[gdb_bfd_section_index (abfd, sect)] != 0) {
+    /* addr_info_make_relative() subtracts out the section VMA. But if the user
+     * specified an offset, they have already taken this into account. Add it
+     * back in */
+    offsets[gdb_bfd_section_index (abfd, sect)] += bfd_section_vma(sect);
     return;
+  }
 
   /* Otherwise, let's try to find a place for the section.  */
   start_addr = (lowest + align - 1) & -align;
@@ -628,35 +633,8 @@ default_symfile_offsets (struct objfile
   if ((bfd_get_file_flags (objfile->obfd.get ()) & (EXEC_P | DYNAMIC)) == 0)
     {
       bfd *abfd = objfile->obfd.get ();
-      asection *cur_sec;
+      asection *cur_sec = NULL;
 
-      for (cur_sec = abfd->sections; cur_sec != NULL; cur_sec = cur_sec->next)
-	/* We do not expect this to happen; just skip this step if the
-	   relocatable file has a section with an assigned VMA.  */
-        if (bfd_section_vma (cur_sec) != 0
-           /*
-            *  Kernel modules may have some non-zero VMAs, i.e., like the
-            *  __ksymtab and __ksymtab_gpl sections in this example:
-            *
-            *    Section Headers:
-            *      [Nr] Name              Type             Address           Offset
-            *           Size              EntSize          Flags  Link  Info  Align
-            *      ...
-            *      [ 8] __ksymtab         PROGBITS         0000000000000060  0000ad90
-            *           0000000000000010  0000000000000000   A       0     0     16
-            *      [ 9] .rela__ksymtab    RELA             0000000000000000  0000ada0
-            *           0000000000000030  0000000000000018          43     8     8
-            *      [10] __ksymtab_gpl     PROGBITS         0000000000000070  0000add0
-            *           00000000000001a0  0000000000000000   A       0     0     16
-            *      ...
-            *
-            *  but they should be treated as if they are NULL.
-            */
-            && strncmp (bfd_section_name (cur_sec), "__k", 3) != 0)
-	  break;
-
-      if (cur_sec == NULL)
-	{
 	  section_offsets &offsets = objfile->section_offsets;
 
 	  /* Pick non-overlapping offsets for sections the user did not
@@ -704,7 +682,6 @@ default_symfile_offsets (struct objfile
 					offsets[cur_sec->index]);
 	      offsets[cur_sec->index] = 0;
 	    }
-	}
     }
 
   /* Remember the bfd indexes for the .text, .data, .bss and



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Full commit --]
[-- Type: text/x-patch, Size: 7988 bytes --]

From 729b89edabc1c67c0b6272c22a7ab9e4f4c4f957 Mon Sep 17 00:00:00 2001
From: Stephen Brennan <stephen.s.brennan@oracle.com>
Date: Thu, 3 Apr 2025 15:17:42 -0700
Subject: [PATCH] Fix bad relocations when module sh_addr is nonzero

In the previous commit, we fixed crash's logic which determines the
module load addresses in the presence of sections with nonzero sh_addr.
This allowed GDB to correctly locate public variables (i.e. GLOBAL
symbols) via the ELF symbol table (msymbols). However, LOCAL symbols
still had incorrect addresses.

The root cause for those issues is not in crash. Instead, GDB does not
expect sections with nonzero sh_addr, and so it slipped up in multiple
places:

1. In default_symfile_offsets(), GDB detects the nonzero sh_addr field
   and fails to apply the user-supplied section offsets. The result is
   that later, in symfile_relocate_debug_section(), relocations are
   applied to the DWARF info using the wrong section addresses, which
   results in invalid addresses for variables. Clearly, this has
   happened before, because crash has special-cased the "__ksymtab*"
   section names to avoid this condition. To resolve this, we simply
   drop the check for nonzero sh_addr altogether: the only ET_REL files
   we encounter should be kernel modules, so there's no real reason to
   be picky.

2. Even with that fixed, the user-supplied section addresses are
   clobbered by the addr_info_make_relative(), which subtracts out
   section offsets during its operation. To resolve this, undo the
   operation for ET_REL files where a section address was provided by
   the user (i.e. crash).

With these two fixes, both local and global variables from a module
section with nonzero sh_addr are correctly reported. Behavior is
unchanged for modules with a zero sh_addr.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 gdb-16.2.patch | 87 +++++++++++++++++++++++++++++---------------------
 1 file changed, 50 insertions(+), 37 deletions(-)

diff --git a/gdb-16.2.patch b/gdb-16.2.patch
index ee9bd2b..a6edba2 100644
--- a/gdb-16.2.patch
+++ b/gdb-16.2.patch
@@ -623,37 +623,50 @@
  	    gdb_printf (_("Backtrace stopped: %s\n"),
  			frame_stop_reason_string (trailing));
  	}
---- gdb-16.2/gdb/symfile.c.orig
-+++ gdb-16.2/gdb/symfile.c
-@@ -633,7 +633,26 @@ default_symfile_offsets (struct objfile *objfile,
-       for (cur_sec = abfd->sections; cur_sec != NULL; cur_sec = cur_sec->next)
- 	/* We do not expect this to happen; just skip this step if the
- 	   relocatable file has a section with an assigned VMA.  */
+--- gdb-16.2/gdb/symfile.c.orig	2025-04-03 14:56:36.640330084 -0700
++++ gdb-16.2/gdb/symfile.c	2025-04-03 15:07:43.239691104 -0700
+@@ -341,8 +341,13 @@ place_section (bfd *abfd, asection *sect
+     return;
+ 
+   /* If the user specified an offset, honor it.  */
+-  if (offsets[gdb_bfd_section_index (abfd, sect)] != 0)
++  if (offsets[gdb_bfd_section_index (abfd, sect)] != 0) {
++    /* addr_info_make_relative() subtracts out the section VMA. But if the user
++     * specified an offset, they have already taken this into account. Add it
++     * back in */
++    offsets[gdb_bfd_section_index (abfd, sect)] += bfd_section_vma(sect);
+     return;
++  }
+ 
+   /* Otherwise, let's try to find a place for the section.  */
+   start_addr = (lowest + align - 1) & -align;
+@@ -628,16 +633,8 @@ default_symfile_offsets (struct objfile
+   if ((bfd_get_file_flags (objfile->obfd.get ()) & (EXEC_P | DYNAMIC)) == 0)
+     {
+       bfd *abfd = objfile->obfd.get ();
+-      asection *cur_sec;
+-
+-      for (cur_sec = abfd->sections; cur_sec != NULL; cur_sec = cur_sec->next)
+-	/* We do not expect this to happen; just skip this step if the
+-	   relocatable file has a section with an assigned VMA.  */
 -	if (bfd_section_vma (cur_sec) != 0)
-+        if (bfd_section_vma (cur_sec) != 0
-+           /*
-+            *  Kernel modules may have some non-zero VMAs, i.e., like the
-+            *  __ksymtab and __ksymtab_gpl sections in this example:
-+            *
-+            *    Section Headers:
-+            *      [Nr] Name              Type             Address           Offset
-+            *           Size              EntSize          Flags  Link  Info  Align
-+            *      ...
-+            *      [ 8] __ksymtab         PROGBITS         0000000000000060  0000ad90
-+            *           0000000000000010  0000000000000000   A       0     0     16
-+            *      [ 9] .rela__ksymtab    RELA             0000000000000000  0000ada0
-+            *           0000000000000030  0000000000000018          43     8     8
-+            *      [10] __ksymtab_gpl     PROGBITS         0000000000000070  0000add0
-+            *           00000000000001a0  0000000000000000   A       0     0     16
-+            *      ...
-+            *
-+            *  but they should be treated as if they are NULL.
-+            */
-+            && strncmp (bfd_section_name (cur_sec), "__k", 3) != 0)
- 	  break;
- 
-       if (cur_sec == NULL)
-@@ -1069,6 +1088,12 @@ symbol_file_add_with_addrs (const gdb_bfd_ref_ptr &abfd, const char *name,
+-	  break;
++      asection *cur_sec = NULL;
+ 
+-      if (cur_sec == NULL)
+-	{
+ 	  section_offsets &offsets = objfile->section_offsets;
+ 
+ 	  /* Pick non-overlapping offsets for sections the user did not
+@@ -685,7 +682,6 @@ default_symfile_offsets (struct objfile
+ 					offsets[cur_sec->index]);
+ 	      offsets[cur_sec->index] = 0;
+ 	    }
+-	}
+     }
+ 
+   /* Remember the bfd indexes for the .text, .data, .bss and
+@@ -1069,6 +1065,12 @@ symbol_file_add_with_addrs (const gdb_bf
  
    objfile *objfile
      = objfile::make (abfd, current_program_space, name, flags, parent);
@@ -666,7 +679,7 @@
  
    /* We either created a new mapped symbol table, mapped an existing
       symbol table file which has not had initial symbol reading
-@@ -1095,6 +1120,7 @@ symbol_file_add_with_addrs (const gdb_bfd_ref_ptr &abfd, const char *name,
+@@ -1095,6 +1097,7 @@ symbol_file_add_with_addrs (const gdb_bf
  		    styled_string (file_name_style.style (), name));
  
        objfile->expand_all_symtabs ();
@@ -674,7 +687,7 @@
      }
  
    /* Note that we only print a message if we have no symbols and have
-@@ -1352,6 +1378,10 @@ show_debug_file_directory (struct ui_file *file, int from_tty,
+@@ -1352,6 +1355,10 @@ show_debug_file_directory (struct ui_fil
  #if ! defined (DEBUG_SUBDIRECTORY)
  #define DEBUG_SUBDIRECTORY ".debug"
  #endif
@@ -685,7 +698,7 @@
  
  /* Find a separate debuginfo file for OBJFILE, using DIR as the directory
     where the original file resides (may not be the same as
-@@ -1390,6 +1420,15 @@ find_separate_debug_file (const char *dir,
+@@ -1390,6 +1397,15 @@ find_separate_debug_file (const char *di
    if (separate_debug_file_exists (debugfile, crc32, objfile, warnings))
      return debugfile;
  
@@ -701,7 +714,7 @@
    /* Then try in the global debugfile directories.
  
       Keep backward compatibility so that DEBUG_FILE_DIRECTORY being "" will
-@@ -1545,6 +1584,14 @@ find_separate_debug_file_by_debuglink
+@@ -1545,6 +1561,14 @@ find_separate_debug_file_by_debuglink
  	}
      }
  
@@ -716,7 +729,7 @@
    return debugfile;
  }
  
-@@ -2318,8 +2365,10 @@ add_symbol_file_command (const char *args, int from_tty)
+@@ -2318,8 +2342,10 @@ add_symbol_file_command (const char *arg
    else if (section_addrs.empty ())
      gdb_printf ("\n");
  
@@ -727,7 +740,7 @@
  
    objf = symbol_file_add (filename.get (), add_flags, &section_addrs,
  			  flags);
-@@ -2660,6 +2709,7 @@ reread_symbols (int from_tty)
+@@ -2660,6 +2686,7 @@ reread_symbols (int from_tty)
  					   objfile_name (objfile)));
  
  	      objfile->expand_all_symtabs ();
@@ -735,7 +748,7 @@
  	    }
  
  	  if (!objfile_has_symbols (objfile))
-@@ -3638,6 +3688,15 @@ bfd_byte *
+@@ -3638,6 +3665,15 @@ bfd_byte *
  symfile_relocate_debug_section (struct objfile *objfile,
  				asection *sectp, bfd_byte *buf)
  {
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Crash-utility] [PATCH CRASH] Fix module section load address when sh_addr != 0
  2025-04-03 22:43     ` Stephen Brennan
@ 2025-04-04  6:41       ` Tao Liu
  2025-04-04 10:27       ` Tao Liu
  1 sibling, 0 replies; 9+ messages in thread
From: Tao Liu @ 2025-04-04  6:41 UTC (permalink / raw)
  To: Stephen Brennan; +Cc: devel, linux-debuggers

Hi Stephen,


On Fri, Apr 4, 2025 at 11:43 AM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Stephen Brennan <stephen.s.brennan@oracle.com> writes:
> > Tao Liu <ltao@redhat.com> writes:
> >> Hi Stephen,
> >>
> >> Thanks for reporting the issue and patch. Please check if I understood
> >> you correctly. The correct output you are expecting in your case is:
> >>
> >>     crash> sym ata_dummy_port_ops
> >>     ffffffffc0a71580 (?) ata_dummy_port_ops [libata]
> >>     crash> mod -s libata
> >>          MODULE       NAME                      TEXT_BASE         SIZE
> >>  OBJECT FILE
> >>     ffffffffc0a7b640  libata                 ffffffffc0a47000   520192
> >>     /usr/lib/debug/lib/modules/6.12.0-0.11.8.el9uek.x86_64/kernel/drivers/ata/libata.ko.debug
> >>     crash> sym ata_dummy_port_ops
> >>     ffffffffc0a71580 (B) ata_dummy_port_ops [libata]
> >>     crash> p/x &ata_dummy_port_ops
> >>     $1 = 0xffffffffc0a6fe80 <<-------- should be 0xffffffffc0a71580,
> >> same as sym's output, right?
> >
> > Yes, that's correct.
> >
> >> If that is the case, then after applied your patch, the issue still
> >> exists on my machine:
> >>
> >> crash> sym fuse_miscdevice
> >> ffffffffc05f7fe0 (?) fuse_miscdevice [fuse]
> >> crash> mod -s fuse
> >>      MODULE       NAME                             TEXT_BASE
> >> SIZE  OBJECT FILE
> >> ffffffffc05f8dc0  fuse                          ffffffffc05da000
> >> 233472  /usr/lib/debug/lib/modules/6.11.5-300.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> >> crash> sym fuse_miscdevice
> >> ffffffffc05f7fe0 (b) fuse_miscdevice [fuse]
> >> crash> p/x &fuse_miscdevice
> >> $1 = 0xffffffffc05daf20 << ------ unmatch with the previous value.
> >
> > Indeed, I checked with fuse on Fedora (6.11.4-301.fc41.x86_64, so almost
> > the same version). And I see the same behavior with fuse_miscdevice.
> >
> > In your case and mine, the offset is 0x1d0c0, which could be a clue to
> > how this mismatch happens.
> >
> >> The .data section of fuse.ko.debug have non-zero address:
> >>
> >> $ readelf -S /usr/lib/debug/lib/modules/6.11.5-300.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> >> Section Headers:
> >>   [Nr] Name              Type             Address           Offset
> >>        Size              EntSize          Flags  Link  Info  Align
> >> ...
> >> [50] .data             NOBITS           0000000000000e20  00000100
> >>        000000000000080f  0000000000000000  WA       0     0     32
> >>
> >> Could you please re-check your patch for this, or is there something
> >> I'm missing?
> >
> > From what I can tell, the important difference here may be that the
> > symbol you selected is a static variable (which becomes a local ELF
> > symbol). Apparently, my patch only resolves the issue for *global
> > variables*, but not for local variables.
> >
> > $ eu-readelf -s /usr/lib/debug/lib/modules/$(uname -r)/kernel/fs/fuse/fuse.ko.debug | grep -P "fuse_(miscdevice|mutex)"
> >    85: 0000000000000100     80 OBJECT  LOCAL  DEFAULT       50 fuse_miscdevice
> >  1041: 00000000000005c0     32 OBJECT  GLOBAL DEFAULT       50 fuse_mutex
> >
> > Here is the behavior of crash's master branch on these two symbols:
> >
> > crash> sym fuse_miscdevice
> > ffffffffc02fafe0 (?) fuse_miscdevice [fuse]
> > crash> sym fuse_mutex
> > ffffffffc02fb4a0 (?) fuse_mutex [fuse]
> > crash> mod -s fuse
> >      MODULE       NAME                                  TEXT_BASE         SIZE  OBJECT FILE
> > ffffffffc02fbdc0  fuse                               ffffffffc02dd000   233472  /usr/lib/debug/lib/modules/6.11.4-301.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> > crash> sym fuse_miscdevice
> > ffffffffc02fafe0 (b) fuse_miscdevice [fuse]
> > crash> sym fuse_mutex
> > ffffffffc02fb4a0 (B) fuse_mutex [fuse]
> > crash> p/x &fuse_miscdevice
> > $1 = 0xffffffffc02ddf20
> > crash> p/x &fuse_mutex
> > $2 = 0xffffffffc02fa680
> >
> > Both fuse_miscdevice, and fuse_mutex are mismatched (with different
> > offsets). And now, here's the behavior of crash with my patch:
> >
> > crash> sym fuse_miscdevice
> > ffffffffc02fafe0 (?) fuse_miscdevice [fuse]
> > crash> sym fuse_mutex
> > ffffffffc02fb4a0 (?) fuse_mutex [fuse]
> > crash> mod -s fuse
> >      MODULE       NAME                                  TEXT_BASE         SIZE  OBJECT FILE
> > ffffffffc02fbdc0  fuse                               ffffffffc02dd000   233472  /usr/lib/debug/lib/modules/6.11.4-301.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> > crash> sym fuse_miscdevice
> > ffffffffc02fafe0 (b) fuse_miscdevice [fuse]
> > crash> sym fuse_mutex
> > ffffffffc02fb4a0 (B) fuse_mutex [fuse]
> > crash> p/x &fuse_miscdevice
> > $1 = 0xffffffffc02ddf20
> > crash> p/x &fuse_mutex
> > $2 = 0xffffffffc02fb4a0
> >
> > Here the global fuse_mutex is correct, but the static/local
> > fuse_miscdevice is still incorrect. This indicates that there's more yet
> > to fix. (But at least my patch does part of the work!)
> >
> > I'm wondering whether this has anything to do with ELF relocations.
>
> Hello Tao,
>
> I've continued debugging and verified that the root cause for this was
> GDB performing relocations to the DWARF info incorrectly. At the bottom
> of this email I've attached my patch, which you can apply to the master
> branch to test out my proposed fix. The patch file also contains a
> longer explanation of what I believe the issue is.
>
> However, since it updates the gdb-16.2.patch file, it's quite difficult
> to understand. Thus, I'll also provide a diff here of the relevant GDB
> source file.
>
> I'm not certain how the GDB patches are normally updated & reviewed. I
> also recognize that this change is a bit messy, so I'd appreciate your
> feedback as well.

Thanks a lot for your efforts on this. I'm looking into your patch,
please wait a while.

Thanks,
Tao Liu

>
> Thank you,
> Stephen
>
> --- tmp/gdb-16.2/gdb/symfile.c  2025-04-03 15:38:26.093760270 -0700
> +++ gdb-16.2/gdb/symfile.c      2025-04-03 15:07:43.239691104 -0700
> @@ -341,8 +341,13 @@ place_section (bfd *abfd, asection *sect
>      return;
>
>    /* If the user specified an offset, honor it.  */
> -  if (offsets[gdb_bfd_section_index (abfd, sect)] != 0)
> +  if (offsets[gdb_bfd_section_index (abfd, sect)] != 0) {
> +    /* addr_info_make_relative() subtracts out the section VMA. But if the user
> +     * specified an offset, they have already taken this into account. Add it
> +     * back in */
> +    offsets[gdb_bfd_section_index (abfd, sect)] += bfd_section_vma(sect);
>      return;
> +  }
>
>    /* Otherwise, let's try to find a place for the section.  */
>    start_addr = (lowest + align - 1) & -align;
> @@ -628,35 +633,8 @@ default_symfile_offsets (struct objfile
>    if ((bfd_get_file_flags (objfile->obfd.get ()) & (EXEC_P | DYNAMIC)) == 0)
>      {
>        bfd *abfd = objfile->obfd.get ();
> -      asection *cur_sec;
> +      asection *cur_sec = NULL;
>
> -      for (cur_sec = abfd->sections; cur_sec != NULL; cur_sec = cur_sec->next)
> -       /* We do not expect this to happen; just skip this step if the
> -          relocatable file has a section with an assigned VMA.  */
> -        if (bfd_section_vma (cur_sec) != 0
> -           /*
> -            *  Kernel modules may have some non-zero VMAs, i.e., like the
> -            *  __ksymtab and __ksymtab_gpl sections in this example:
> -            *
> -            *    Section Headers:
> -            *      [Nr] Name              Type             Address           Offset
> -            *           Size              EntSize          Flags  Link  Info  Align
> -            *      ...
> -            *      [ 8] __ksymtab         PROGBITS         0000000000000060  0000ad90
> -            *           0000000000000010  0000000000000000   A       0     0     16
> -            *      [ 9] .rela__ksymtab    RELA             0000000000000000  0000ada0
> -            *           0000000000000030  0000000000000018          43     8     8
> -            *      [10] __ksymtab_gpl     PROGBITS         0000000000000070  0000add0
> -            *           00000000000001a0  0000000000000000   A       0     0     16
> -            *      ...
> -            *
> -            *  but they should be treated as if they are NULL.
> -            */
> -            && strncmp (bfd_section_name (cur_sec), "__k", 3) != 0)
> -         break;
> -
> -      if (cur_sec == NULL)
> -       {
>           section_offsets &offsets = objfile->section_offsets;
>
>           /* Pick non-overlapping offsets for sections the user did not
> @@ -704,7 +682,6 @@ default_symfile_offsets (struct objfile
>                                         offsets[cur_sec->index]);
>               offsets[cur_sec->index] = 0;
>             }
> -       }
>      }
>
>    /* Remember the bfd indexes for the .text, .data, .bss and
>
>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Crash-utility] [PATCH CRASH] Fix module section load address when sh_addr != 0
  2025-04-03 22:43     ` Stephen Brennan
  2025-04-04  6:41       ` Tao Liu
@ 2025-04-04 10:27       ` Tao Liu
  2025-04-04 23:30         ` Stephen Brennan
  1 sibling, 1 reply; 9+ messages in thread
From: Tao Liu @ 2025-04-04 10:27 UTC (permalink / raw)
  To: Stephen Brennan; +Cc: devel, linux-debuggers

Hi Stephen,

On Fri, Apr 4, 2025 at 11:43 AM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Stephen Brennan <stephen.s.brennan@oracle.com> writes:
> > Tao Liu <ltao@redhat.com> writes:
> >> Hi Stephen,
> >>
> >> Thanks for reporting the issue and patch. Please check if I understood
> >> you correctly. The correct output you are expecting in your case is:
> >>
> >>     crash> sym ata_dummy_port_ops
> >>     ffffffffc0a71580 (?) ata_dummy_port_ops [libata]
> >>     crash> mod -s libata
> >>          MODULE       NAME                      TEXT_BASE         SIZE
> >>  OBJECT FILE
> >>     ffffffffc0a7b640  libata                 ffffffffc0a47000   520192
> >>     /usr/lib/debug/lib/modules/6.12.0-0.11.8.el9uek.x86_64/kernel/drivers/ata/libata.ko.debug
> >>     crash> sym ata_dummy_port_ops
> >>     ffffffffc0a71580 (B) ata_dummy_port_ops [libata]
> >>     crash> p/x &ata_dummy_port_ops
> >>     $1 = 0xffffffffc0a6fe80 <<-------- should be 0xffffffffc0a71580,
> >> same as sym's output, right?
> >
> > Yes, that's correct.
> >
> >> If that is the case, then after applied your patch, the issue still
> >> exists on my machine:
> >>
> >> crash> sym fuse_miscdevice
> >> ffffffffc05f7fe0 (?) fuse_miscdevice [fuse]
> >> crash> mod -s fuse
> >>      MODULE       NAME                             TEXT_BASE
> >> SIZE  OBJECT FILE
> >> ffffffffc05f8dc0  fuse                          ffffffffc05da000
> >> 233472  /usr/lib/debug/lib/modules/6.11.5-300.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> >> crash> sym fuse_miscdevice
> >> ffffffffc05f7fe0 (b) fuse_miscdevice [fuse]
> >> crash> p/x &fuse_miscdevice
> >> $1 = 0xffffffffc05daf20 << ------ unmatch with the previous value.
> >
> > Indeed, I checked with fuse on Fedora (6.11.4-301.fc41.x86_64, so almost
> > the same version). And I see the same behavior with fuse_miscdevice.
> >
> > In your case and mine, the offset is 0x1d0c0, which could be a clue to
> > how this mismatch happens.
> >
> >> The .data section of fuse.ko.debug have non-zero address:
> >>
> >> $ readelf -S /usr/lib/debug/lib/modules/6.11.5-300.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> >> Section Headers:
> >>   [Nr] Name              Type             Address           Offset
> >>        Size              EntSize          Flags  Link  Info  Align
> >> ...
> >> [50] .data             NOBITS           0000000000000e20  00000100
> >>        000000000000080f  0000000000000000  WA       0     0     32
> >>
> >> Could you please re-check your patch for this, or is there something
> >> I'm missing?
> >
> > From what I can tell, the important difference here may be that the
> > symbol you selected is a static variable (which becomes a local ELF
> > symbol). Apparently, my patch only resolves the issue for *global
> > variables*, but not for local variables.
> >
> > $ eu-readelf -s /usr/lib/debug/lib/modules/$(uname -r)/kernel/fs/fuse/fuse.ko.debug | grep -P "fuse_(miscdevice|mutex)"
> >    85: 0000000000000100     80 OBJECT  LOCAL  DEFAULT       50 fuse_miscdevice
> >  1041: 00000000000005c0     32 OBJECT  GLOBAL DEFAULT       50 fuse_mutex
> >
> > Here is the behavior of crash's master branch on these two symbols:
> >
> > crash> sym fuse_miscdevice
> > ffffffffc02fafe0 (?) fuse_miscdevice [fuse]
> > crash> sym fuse_mutex
> > ffffffffc02fb4a0 (?) fuse_mutex [fuse]
> > crash> mod -s fuse
> >      MODULE       NAME                                  TEXT_BASE         SIZE  OBJECT FILE
> > ffffffffc02fbdc0  fuse                               ffffffffc02dd000   233472  /usr/lib/debug/lib/modules/6.11.4-301.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> > crash> sym fuse_miscdevice
> > ffffffffc02fafe0 (b) fuse_miscdevice [fuse]
> > crash> sym fuse_mutex
> > ffffffffc02fb4a0 (B) fuse_mutex [fuse]
> > crash> p/x &fuse_miscdevice
> > $1 = 0xffffffffc02ddf20
> > crash> p/x &fuse_mutex
> > $2 = 0xffffffffc02fa680
> >
> > Both fuse_miscdevice, and fuse_mutex are mismatched (with different
> > offsets). And now, here's the behavior of crash with my patch:
> >
> > crash> sym fuse_miscdevice
> > ffffffffc02fafe0 (?) fuse_miscdevice [fuse]
> > crash> sym fuse_mutex
> > ffffffffc02fb4a0 (?) fuse_mutex [fuse]
> > crash> mod -s fuse
> >      MODULE       NAME                                  TEXT_BASE         SIZE  OBJECT FILE
> > ffffffffc02fbdc0  fuse                               ffffffffc02dd000   233472  /usr/lib/debug/lib/modules/6.11.4-301.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> > crash> sym fuse_miscdevice
> > ffffffffc02fafe0 (b) fuse_miscdevice [fuse]
> > crash> sym fuse_mutex
> > ffffffffc02fb4a0 (B) fuse_mutex [fuse]
> > crash> p/x &fuse_miscdevice
> > $1 = 0xffffffffc02ddf20
> > crash> p/x &fuse_mutex
> > $2 = 0xffffffffc02fb4a0
> >
> > Here the global fuse_mutex is correct, but the static/local
> > fuse_miscdevice is still incorrect. This indicates that there's more yet
> > to fix. (But at least my patch does part of the work!)
> >
> > I'm wondering whether this has anything to do with ELF relocations.
>
> Hello Tao,

Thanks for your detailed debug info, which is quite helpful! For your
No.1 patch which fixes the global variable relocation, it looks good.
However, for your No.2 patch in the attached file, I have some
questions.
>
> I've continued debugging and verified that the root cause for this was
> GDB performing relocations to the DWARF info incorrectly. At the bottom

1) Is this a gdb bug? If it does, then the proper way is to post No.2
patch to gdb mailing list and get it fixed there, then we can do gdb
patch backporting to crash. Frankly I haven't figured out whether it
is gdb or crash bug so far by myself...

2) I'm quoting part of the commit log in your No.2 patch has follows
(Please don't send patch as attachment next time, because I cannot
comment directly on it)

...
1. In default_symfile_offsets(), GDB detects the nonzero sh_addr field
   and fails to apply the user-supplied section offsets. The result is
   that later, in symfile_relocate_debug_section(), relocations are
   applied to the DWARF info using the wrong section addresses, which

Didn't we have corrected the .data section address in No.1 patch of
the .ko? Then why the "wrong section addresses"?

   results in invalid addresses for variables. Clearly, this has
   happened before, because crash has special-cased the "__ksymtab*"

Which line of code do you mean the special-cased __ksymtab? I didn't
find one in gdb/symfile.c.

   section names to avoid this condition. To resolve this, we simply
   drop the check for nonzero sh_addr altogether: the only ET_REL files
   we encounter should be kernel modules, so there's no real reason to
   be picky.

2. Even with that fixed, the user-supplied section addresses are
   clobbered by the addr_info_make_relative(), which subtracts out
   section offsets during its operation. To resolve this, undo the
   operation for ET_REL files where a section address was provided by
   the user (i.e. crash).
...

According to your description, it seems to me the gdb cannot handle
the nonzero sh_addr in the .ko case, but .ko isn't the only one whose
.data VMA is non-zero, see this case:

$ readelf -S /lib64/libc.so.6
...
[30] .data             PROGBITS         00000000001e8000  001e8000
       00000000000016c8  0000000000000000  WA       0     0     32
...

I don't know if gdb also fails in the userspace .so case? I mean, if
gdb can handle the user case with no problem, then probably we don't
need to hack gdb like this.

Thanks,
Tao Liu


> of this email I've attached my patch, which you can apply to the master
> branch to test out my proposed fix. The patch file also contains a
> longer explanation of what I believe the issue is.
>
> However, since it updates the gdb-16.2.patch file, it's quite difficult
> to understand. Thus, I'll also provide a diff here of the relevant GDB
> source file.
>
> I'm not certain how the GDB patches are normally updated & reviewed. I
> also recognize that this change is a bit messy, so I'd appreciate your
> feedback as well.
>
> Thank you,
> Stephen
>
> --- tmp/gdb-16.2/gdb/symfile.c  2025-04-03 15:38:26.093760270 -0700
> +++ gdb-16.2/gdb/symfile.c      2025-04-03 15:07:43.239691104 -0700
> @@ -341,8 +341,13 @@ place_section (bfd *abfd, asection *sect
>      return;
>
>    /* If the user specified an offset, honor it.  */
> -  if (offsets[gdb_bfd_section_index (abfd, sect)] != 0)
> +  if (offsets[gdb_bfd_section_index (abfd, sect)] != 0) {
> +    /* addr_info_make_relative() subtracts out the section VMA. But if the user
> +     * specified an offset, they have already taken this into account. Add it
> +     * back in */
> +    offsets[gdb_bfd_section_index (abfd, sect)] += bfd_section_vma(sect);
>      return;
> +  }
>
>    /* Otherwise, let's try to find a place for the section.  */
>    start_addr = (lowest + align - 1) & -align;
> @@ -628,35 +633,8 @@ default_symfile_offsets (struct objfile
>    if ((bfd_get_file_flags (objfile->obfd.get ()) & (EXEC_P | DYNAMIC)) == 0)
>      {
>        bfd *abfd = objfile->obfd.get ();
> -      asection *cur_sec;
> +      asection *cur_sec = NULL;
>
> -      for (cur_sec = abfd->sections; cur_sec != NULL; cur_sec = cur_sec->next)
> -       /* We do not expect this to happen; just skip this step if the
> -          relocatable file has a section with an assigned VMA.  */
> -        if (bfd_section_vma (cur_sec) != 0
> -           /*
> -            *  Kernel modules may have some non-zero VMAs, i.e., like the
> -            *  __ksymtab and __ksymtab_gpl sections in this example:
> -            *
> -            *    Section Headers:
> -            *      [Nr] Name              Type             Address           Offset
> -            *           Size              EntSize          Flags  Link  Info  Align
> -            *      ...
> -            *      [ 8] __ksymtab         PROGBITS         0000000000000060  0000ad90
> -            *           0000000000000010  0000000000000000   A       0     0     16
> -            *      [ 9] .rela__ksymtab    RELA             0000000000000000  0000ada0
> -            *           0000000000000030  0000000000000018          43     8     8
> -            *      [10] __ksymtab_gpl     PROGBITS         0000000000000070  0000add0
> -            *           00000000000001a0  0000000000000000   A       0     0     16
> -            *      ...
> -            *
> -            *  but they should be treated as if they are NULL.
> -            */
> -            && strncmp (bfd_section_name (cur_sec), "__k", 3) != 0)
> -         break;
> -
> -      if (cur_sec == NULL)
> -       {
>           section_offsets &offsets = objfile->section_offsets;
>
>           /* Pick non-overlapping offsets for sections the user did not
> @@ -704,7 +682,6 @@ default_symfile_offsets (struct objfile
>                                         offsets[cur_sec->index]);
>               offsets[cur_sec->index] = 0;
>             }
> -       }
>      }
>
>    /* Remember the bfd indexes for the .text, .data, .bss and
>
>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Crash-utility] [PATCH CRASH] Fix module section load address when sh_addr != 0
  2025-04-04 10:27       ` Tao Liu
@ 2025-04-04 23:30         ` Stephen Brennan
  2025-04-07  8:15           ` Tao Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Stephen Brennan @ 2025-04-04 23:30 UTC (permalink / raw)
  To: Tao Liu; +Cc: devel, linux-debuggers

Hi Tao,

Thanks for digging through the code with me. I'm not very confident or
familiar with it, so I don't have 100% confident answers, but I'll do my
best to answer:

Tao Liu <ltao@redhat.com> writes:
> Hi Stephen,
>
> On Fri, Apr 4, 2025 at 11:43 AM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
>>
>> Stephen Brennan <stephen.s.brennan@oracle.com> writes:
>> > Tao Liu <ltao@redhat.com> writes:
>> >> Hi Stephen,
>> >>
>> >> Thanks for reporting the issue and patch. Please check if I understood
>> >> you correctly. The correct output you are expecting in your case is:
>> >>
>> >>     crash> sym ata_dummy_port_ops
>> >>     ffffffffc0a71580 (?) ata_dummy_port_ops [libata]
>> >>     crash> mod -s libata
>> >>          MODULE       NAME                      TEXT_BASE         SIZE
>> >>  OBJECT FILE
>> >>     ffffffffc0a7b640  libata                 ffffffffc0a47000   520192
>> >>     /usr/lib/debug/lib/modules/6.12.0-0.11.8.el9uek.x86_64/kernel/drivers/ata/libata.ko.debug
>> >>     crash> sym ata_dummy_port_ops
>> >>     ffffffffc0a71580 (B) ata_dummy_port_ops [libata]
>> >>     crash> p/x &ata_dummy_port_ops
>> >>     $1 = 0xffffffffc0a6fe80 <<-------- should be 0xffffffffc0a71580,
>> >> same as sym's output, right?
>> >
>> > Yes, that's correct.
>> >
>> >> If that is the case, then after applied your patch, the issue still
>> >> exists on my machine:
>> >>
>> >> crash> sym fuse_miscdevice
>> >> ffffffffc05f7fe0 (?) fuse_miscdevice [fuse]
>> >> crash> mod -s fuse
>> >>      MODULE       NAME                             TEXT_BASE
>> >> SIZE  OBJECT FILE
>> >> ffffffffc05f8dc0  fuse                          ffffffffc05da000
>> >> 233472  /usr/lib/debug/lib/modules/6.11.5-300.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
>> >> crash> sym fuse_miscdevice
>> >> ffffffffc05f7fe0 (b) fuse_miscdevice [fuse]
>> >> crash> p/x &fuse_miscdevice
>> >> $1 = 0xffffffffc05daf20 << ------ unmatch with the previous value.
>> >
>> > Indeed, I checked with fuse on Fedora (6.11.4-301.fc41.x86_64, so almost
>> > the same version). And I see the same behavior with fuse_miscdevice.
>> >
>> > In your case and mine, the offset is 0x1d0c0, which could be a clue to
>> > how this mismatch happens.
>> >
>> >> The .data section of fuse.ko.debug have non-zero address:
>> >>
>> >> $ readelf -S /usr/lib/debug/lib/modules/6.11.5-300.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
>> >> Section Headers:
>> >>   [Nr] Name              Type             Address           Offset
>> >>        Size              EntSize          Flags  Link  Info  Align
>> >> ...
>> >> [50] .data             NOBITS           0000000000000e20  00000100
>> >>        000000000000080f  0000000000000000  WA       0     0     32
>> >>
>> >> Could you please re-check your patch for this, or is there something
>> >> I'm missing?
>> >
>> > From what I can tell, the important difference here may be that the
>> > symbol you selected is a static variable (which becomes a local ELF
>> > symbol). Apparently, my patch only resolves the issue for *global
>> > variables*, but not for local variables.
>> >
>> > $ eu-readelf -s /usr/lib/debug/lib/modules/$(uname -r)/kernel/fs/fuse/fuse.ko.debug | grep -P "fuse_(miscdevice|mutex)"
>> >    85: 0000000000000100     80 OBJECT  LOCAL  DEFAULT       50 fuse_miscdevice
>> >  1041: 00000000000005c0     32 OBJECT  GLOBAL DEFAULT       50 fuse_mutex
>> >
>> > Here is the behavior of crash's master branch on these two symbols:
>> >
>> > crash> sym fuse_miscdevice
>> > ffffffffc02fafe0 (?) fuse_miscdevice [fuse]
>> > crash> sym fuse_mutex
>> > ffffffffc02fb4a0 (?) fuse_mutex [fuse]
>> > crash> mod -s fuse
>> >      MODULE       NAME                                  TEXT_BASE         SIZE  OBJECT FILE
>> > ffffffffc02fbdc0  fuse                               ffffffffc02dd000   233472  /usr/lib/debug/lib/modules/6.11.4-301.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
>> > crash> sym fuse_miscdevice
>> > ffffffffc02fafe0 (b) fuse_miscdevice [fuse]
>> > crash> sym fuse_mutex
>> > ffffffffc02fb4a0 (B) fuse_mutex [fuse]
>> > crash> p/x &fuse_miscdevice
>> > $1 = 0xffffffffc02ddf20
>> > crash> p/x &fuse_mutex
>> > $2 = 0xffffffffc02fa680
>> >
>> > Both fuse_miscdevice, and fuse_mutex are mismatched (with different
>> > offsets). And now, here's the behavior of crash with my patch:
>> >
>> > crash> sym fuse_miscdevice
>> > ffffffffc02fafe0 (?) fuse_miscdevice [fuse]
>> > crash> sym fuse_mutex
>> > ffffffffc02fb4a0 (?) fuse_mutex [fuse]
>> > crash> mod -s fuse
>> >      MODULE       NAME                                  TEXT_BASE         SIZE  OBJECT FILE
>> > ffffffffc02fbdc0  fuse                               ffffffffc02dd000   233472  /usr/lib/debug/lib/modules/6.11.4-301.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
>> > crash> sym fuse_miscdevice
>> > ffffffffc02fafe0 (b) fuse_miscdevice [fuse]
>> > crash> sym fuse_mutex
>> > ffffffffc02fb4a0 (B) fuse_mutex [fuse]
>> > crash> p/x &fuse_miscdevice
>> > $1 = 0xffffffffc02ddf20
>> > crash> p/x &fuse_mutex
>> > $2 = 0xffffffffc02fb4a0
>> >
>> > Here the global fuse_mutex is correct, but the static/local
>> > fuse_miscdevice is still incorrect. This indicates that there's more yet
>> > to fix. (But at least my patch does part of the work!)
>> >
>> > I'm wondering whether this has anything to do with ELF relocations.
>>
>> Hello Tao,
>
> Thanks for your detailed debug info, which is quite helpful! For your
> No.1 patch which fixes the global variable relocation, it looks good.
> However, for your No.2 patch in the attached file, I have some
> questions.
>>
>> I've continued debugging and verified that the root cause for this was
>> GDB performing relocations to the DWARF info incorrectly. At the bottom
>
> 1) Is this a gdb bug? If it does, then the proper way is to post No.2
> patch to gdb mailing list and get it fixed there, then we can do gdb
> patch backporting to crash. Frankly I haven't figured out whether it
> is gdb or crash bug so far by myself...

I'm also not sure whether it's a GDB or Crash bug.

Honestly, I think the "bug" is in Linux for producing non-zero section
addresses in a relocatable file. It seems to be universal that section
addresses are zero for relocatble ELF files.

> 2) I'm quoting part of the commit log in your No.2 patch has follows
> (Please don't send patch as attachment next time, because I cannot
> comment directly on it)

Understood, sorry.

> ...
> 1. In default_symfile_offsets(), GDB detects the nonzero sh_addr field
>    and fails to apply the user-supplied section offsets. The result is
>    that later, in symfile_relocate_debug_section(), relocations are
>    applied to the DWARF info using the wrong section addresses, which
>
> Didn't we have corrected the .data section address in No.1 patch of
> the .ko? Then why the "wrong section addresses"?

The flow is like this:

1. crash executes "add-symbol-file [...] -s .data ADDRESS"
2. add_symbol_file_command() parses these and stores them in a list
3. We eventually make our way down to syms_from_objfile_1(), which seems
   to be the main logic.
   a. addr_info_make_relative() takes this list of section offsets,
      along with the BFD file, and matches each one against the BFD
      section. The function description indicates that the purpose is to
      make the addresses provided be relative to the addresses in the
      BFD section, which I suppose is why this number is subtracted.

   b. Then, default_symfile_offsets() is called from
      syms_from_objfile_1(). This function only cares about ET_REL
      relocatable files. It verifies that all sections have a zero
      sh_addr, and assuming that is so, it updates the load address and
      sets various fields on the BFD file to allow relocations to work
      correctly.
4. Relocations get applied using the values set in 3.b. Since the values
   in 3.b were tampered with in 3.a, relocations that are based on the
   address of the affected address are done incorrectly.

I'm still not certain who is at fault, though!

>    results in invalid addresses for variables. Clearly, this has
>    happened before, because crash has special-cased the "__ksymtab*"
>
> Which line of code do you mean the special-cased __ksymtab? I didn't
> find one in gdb/symfile.c.

It is this part of the diff. I've removed it in my commit and also
removed the check for nen-zero section addresses.

--- gdb-16.2/gdb/symfile.c.orig
+++ gdb-16.2/gdb/symfile.c
@@ -633,7 +633,26 @@ default_symfile_offsets (struct objfile *objfile,
       for (cur_sec = abfd->sections; cur_sec != NULL; cur_sec = cur_sec->next)
 	/* We do not expect this to happen; just skip this step if the
 	   relocatable file has a section with an assigned VMA.  */
-	if (bfd_section_vma (cur_sec) != 0)
+        if (bfd_section_vma (cur_sec) != 0
+           /*
+            *  Kernel modules may have some non-zero VMAs, i.e., like the
+            *  __ksymtab and __ksymtab_gpl sections in this example:
+            *
+            *    Section Headers:
+            *      [Nr] Name              Type             Address           Offset
+            *           Size              EntSize          Flags  Link  Info  Align
+            *      ...
+            *      [ 8] __ksymtab         PROGBITS         0000000000000060  0000ad90
+            *           0000000000000010  0000000000000000   A       0     0     16
+            *      [ 9] .rela__ksymtab    RELA             0000000000000000  0000ada0
+            *           0000000000000030  0000000000000018          43     8     8
+            *      [10] __ksymtab_gpl     PROGBITS         0000000000000070  0000add0
+            *           00000000000001a0  0000000000000000   A       0     0     16
+            *      ...
+            *
+            *  but they should be treated as if they are NULL.
+            */
+            && strncmp (bfd_section_name (cur_sec), "__k", 3) != 0)
 	  break;
 
       if (cur_sec == NULL)


>    section names to avoid this condition. To resolve this, we simply
>    drop the check for nonzero sh_addr altogether: the only ET_REL files
>    we encounter should be kernel modules, so there's no real reason to
>    be picky.
>
> 2. Even with that fixed, the user-supplied section addresses are
>    clobbered by the addr_info_make_relative(), which subtracts out
>    section offsets during its operation. To resolve this, undo the
>    operation for ET_REL files where a section address was provided by
>    the user (i.e. crash).
> ...
>
> According to your description, it seems to me the gdb cannot handle
> the nonzero sh_addr in the .ko case, but .ko isn't the only one whose
> .data VMA is non-zero, see this case:
>
> $ readelf -S /lib64/libc.so.6
> ...
> [30] .data             PROGBITS         00000000001e8000  001e8000
>        00000000000016c8  0000000000000000  WA       0     0     32
> ...
>
> I don't know if gdb also fails in the userspace .so case? I mean, if
> gdb can handle the user case with no problem, then probably we don't
> need to hack gdb like this.

One important difference is that libc.so.6 is a ET_DYN dynamic object
file. A key difference between dynamic executables and relocatable files
is the symbol table: symbols in dynamic executables have an absolute
virtual address, and they only change based on the load address. Symbols
in relocatable files have an address which is an offset from the
beginning of their section. So if a relocation is done based on the
section address, and the section address is incorrect, then the
relocation will be wrong.

Stephen

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Crash-utility] [PATCH CRASH] Fix module section load address when sh_addr != 0
  2025-04-04 23:30         ` Stephen Brennan
@ 2025-04-07  8:15           ` Tao Liu
       [not found]             ` <CANU+ZyeHSoJXUu5m9SaNEy0gQi+HDCzA+NMzp-w=XngGyFMADA@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Tao Liu @ 2025-04-07  8:15 UTC (permalink / raw)
  To: Stephen Brennan; +Cc: devel, linux-debuggers

Hi Stephen,

On Sat, Apr 5, 2025 at 12:30 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Hi Tao,
>
> Thanks for digging through the code with me. I'm not very confident or
> familiar with it, so I don't have 100% confident answers, but I'll do my
> best to answer:

Thanks for your explanation! I think your modification is reasonable
and I have run tests against your patch, no regression is found.
Though I still haven't figured out whether it is a gdb issue or crash
specific, however I think it is OK to accept your patch for now. So I
will ack your No.1 and No.2 patch. Though No.2 patch needs some
cleanup on the code, but I think that is just a minor change, I have
no objection to its major part.

So ack.

Thanks,
Tao Liu

>
> Tao Liu <ltao@redhat.com> writes:
> > Hi Stephen,
> >
> > On Fri, Apr 4, 2025 at 11:43 AM Stephen Brennan
> > <stephen.s.brennan@oracle.com> wrote:
> >>
> >> Stephen Brennan <stephen.s.brennan@oracle.com> writes:
> >> > Tao Liu <ltao@redhat.com> writes:
> >> >> Hi Stephen,
> >> >>
> >> >> Thanks for reporting the issue and patch. Please check if I understood
> >> >> you correctly. The correct output you are expecting in your case is:
> >> >>
> >> >>     crash> sym ata_dummy_port_ops
> >> >>     ffffffffc0a71580 (?) ata_dummy_port_ops [libata]
> >> >>     crash> mod -s libata
> >> >>          MODULE       NAME                      TEXT_BASE         SIZE
> >> >>  OBJECT FILE
> >> >>     ffffffffc0a7b640  libata                 ffffffffc0a47000   520192
> >> >>     /usr/lib/debug/lib/modules/6.12.0-0.11.8.el9uek.x86_64/kernel/drivers/ata/libata.ko.debug
> >> >>     crash> sym ata_dummy_port_ops
> >> >>     ffffffffc0a71580 (B) ata_dummy_port_ops [libata]
> >> >>     crash> p/x &ata_dummy_port_ops
> >> >>     $1 = 0xffffffffc0a6fe80 <<-------- should be 0xffffffffc0a71580,
> >> >> same as sym's output, right?
> >> >
> >> > Yes, that's correct.
> >> >
> >> >> If that is the case, then after applied your patch, the issue still
> >> >> exists on my machine:
> >> >>
> >> >> crash> sym fuse_miscdevice
> >> >> ffffffffc05f7fe0 (?) fuse_miscdevice [fuse]
> >> >> crash> mod -s fuse
> >> >>      MODULE       NAME                             TEXT_BASE
> >> >> SIZE  OBJECT FILE
> >> >> ffffffffc05f8dc0  fuse                          ffffffffc05da000
> >> >> 233472  /usr/lib/debug/lib/modules/6.11.5-300.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> >> >> crash> sym fuse_miscdevice
> >> >> ffffffffc05f7fe0 (b) fuse_miscdevice [fuse]
> >> >> crash> p/x &fuse_miscdevice
> >> >> $1 = 0xffffffffc05daf20 << ------ unmatch with the previous value.
> >> >
> >> > Indeed, I checked with fuse on Fedora (6.11.4-301.fc41.x86_64, so almost
> >> > the same version). And I see the same behavior with fuse_miscdevice.
> >> >
> >> > In your case and mine, the offset is 0x1d0c0, which could be a clue to
> >> > how this mismatch happens.
> >> >
> >> >> The .data section of fuse.ko.debug have non-zero address:
> >> >>
> >> >> $ readelf -S /usr/lib/debug/lib/modules/6.11.5-300.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> >> >> Section Headers:
> >> >>   [Nr] Name              Type             Address           Offset
> >> >>        Size              EntSize          Flags  Link  Info  Align
> >> >> ...
> >> >> [50] .data             NOBITS           0000000000000e20  00000100
> >> >>        000000000000080f  0000000000000000  WA       0     0     32
> >> >>
> >> >> Could you please re-check your patch for this, or is there something
> >> >> I'm missing?
> >> >
> >> > From what I can tell, the important difference here may be that the
> >> > symbol you selected is a static variable (which becomes a local ELF
> >> > symbol). Apparently, my patch only resolves the issue for *global
> >> > variables*, but not for local variables.
> >> >
> >> > $ eu-readelf -s /usr/lib/debug/lib/modules/$(uname -r)/kernel/fs/fuse/fuse.ko.debug | grep -P "fuse_(miscdevice|mutex)"
> >> >    85: 0000000000000100     80 OBJECT  LOCAL  DEFAULT       50 fuse_miscdevice
> >> >  1041: 00000000000005c0     32 OBJECT  GLOBAL DEFAULT       50 fuse_mutex
> >> >
> >> > Here is the behavior of crash's master branch on these two symbols:
> >> >
> >> > crash> sym fuse_miscdevice
> >> > ffffffffc02fafe0 (?) fuse_miscdevice [fuse]
> >> > crash> sym fuse_mutex
> >> > ffffffffc02fb4a0 (?) fuse_mutex [fuse]
> >> > crash> mod -s fuse
> >> >      MODULE       NAME                                  TEXT_BASE         SIZE  OBJECT FILE
> >> > ffffffffc02fbdc0  fuse                               ffffffffc02dd000   233472  /usr/lib/debug/lib/modules/6.11.4-301.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> >> > crash> sym fuse_miscdevice
> >> > ffffffffc02fafe0 (b) fuse_miscdevice [fuse]
> >> > crash> sym fuse_mutex
> >> > ffffffffc02fb4a0 (B) fuse_mutex [fuse]
> >> > crash> p/x &fuse_miscdevice
> >> > $1 = 0xffffffffc02ddf20
> >> > crash> p/x &fuse_mutex
> >> > $2 = 0xffffffffc02fa680
> >> >
> >> > Both fuse_miscdevice, and fuse_mutex are mismatched (with different
> >> > offsets). And now, here's the behavior of crash with my patch:
> >> >
> >> > crash> sym fuse_miscdevice
> >> > ffffffffc02fafe0 (?) fuse_miscdevice [fuse]
> >> > crash> sym fuse_mutex
> >> > ffffffffc02fb4a0 (?) fuse_mutex [fuse]
> >> > crash> mod -s fuse
> >> >      MODULE       NAME                                  TEXT_BASE         SIZE  OBJECT FILE
> >> > ffffffffc02fbdc0  fuse                               ffffffffc02dd000   233472  /usr/lib/debug/lib/modules/6.11.4-301.fc41.x86_64/kernel/fs/fuse/fuse.ko.debug
> >> > crash> sym fuse_miscdevice
> >> > ffffffffc02fafe0 (b) fuse_miscdevice [fuse]
> >> > crash> sym fuse_mutex
> >> > ffffffffc02fb4a0 (B) fuse_mutex [fuse]
> >> > crash> p/x &fuse_miscdevice
> >> > $1 = 0xffffffffc02ddf20
> >> > crash> p/x &fuse_mutex
> >> > $2 = 0xffffffffc02fb4a0
> >> >
> >> > Here the global fuse_mutex is correct, but the static/local
> >> > fuse_miscdevice is still incorrect. This indicates that there's more yet
> >> > to fix. (But at least my patch does part of the work!)
> >> >
> >> > I'm wondering whether this has anything to do with ELF relocations.
> >>
> >> Hello Tao,
> >
> > Thanks for your detailed debug info, which is quite helpful! For your
> > No.1 patch which fixes the global variable relocation, it looks good.
> > However, for your No.2 patch in the attached file, I have some
> > questions.
> >>
> >> I've continued debugging and verified that the root cause for this was
> >> GDB performing relocations to the DWARF info incorrectly. At the bottom
> >
> > 1) Is this a gdb bug? If it does, then the proper way is to post No.2
> > patch to gdb mailing list and get it fixed there, then we can do gdb
> > patch backporting to crash. Frankly I haven't figured out whether it
> > is gdb or crash bug so far by myself...
>
> I'm also not sure whether it's a GDB or Crash bug.
>
> Honestly, I think the "bug" is in Linux for producing non-zero section
> addresses in a relocatable file. It seems to be universal that section
> addresses are zero for relocatble ELF files.
>
> > 2) I'm quoting part of the commit log in your No.2 patch has follows
> > (Please don't send patch as attachment next time, because I cannot
> > comment directly on it)
>
> Understood, sorry.
>
> > ...
> > 1. In default_symfile_offsets(), GDB detects the nonzero sh_addr field
> >    and fails to apply the user-supplied section offsets. The result is
> >    that later, in symfile_relocate_debug_section(), relocations are
> >    applied to the DWARF info using the wrong section addresses, which
> >
> > Didn't we have corrected the .data section address in No.1 patch of
> > the .ko? Then why the "wrong section addresses"?
>
> The flow is like this:
>
> 1. crash executes "add-symbol-file [...] -s .data ADDRESS"
> 2. add_symbol_file_command() parses these and stores them in a list
> 3. We eventually make our way down to syms_from_objfile_1(), which seems
>    to be the main logic.
>    a. addr_info_make_relative() takes this list of section offsets,
>       along with the BFD file, and matches each one against the BFD
>       section. The function description indicates that the purpose is to
>       make the addresses provided be relative to the addresses in the
>       BFD section, which I suppose is why this number is subtracted.
>
>    b. Then, default_symfile_offsets() is called from
>       syms_from_objfile_1(). This function only cares about ET_REL
>       relocatable files. It verifies that all sections have a zero
>       sh_addr, and assuming that is so, it updates the load address and
>       sets various fields on the BFD file to allow relocations to work
>       correctly.
> 4. Relocations get applied using the values set in 3.b. Since the values
>    in 3.b were tampered with in 3.a, relocations that are based on the
>    address of the affected address are done incorrectly.
>
> I'm still not certain who is at fault, though!
>
> >    results in invalid addresses for variables. Clearly, this has
> >    happened before, because crash has special-cased the "__ksymtab*"
> >
> > Which line of code do you mean the special-cased __ksymtab? I didn't
> > find one in gdb/symfile.c.
>
> It is this part of the diff. I've removed it in my commit and also
> removed the check for nen-zero section addresses.
>
> --- gdb-16.2/gdb/symfile.c.orig
> +++ gdb-16.2/gdb/symfile.c
> @@ -633,7 +633,26 @@ default_symfile_offsets (struct objfile *objfile,
>        for (cur_sec = abfd->sections; cur_sec != NULL; cur_sec = cur_sec->next)
>         /* We do not expect this to happen; just skip this step if the
>            relocatable file has a section with an assigned VMA.  */
> -       if (bfd_section_vma (cur_sec) != 0)
> +        if (bfd_section_vma (cur_sec) != 0
> +           /*
> +            *  Kernel modules may have some non-zero VMAs, i.e., like the
> +            *  __ksymtab and __ksymtab_gpl sections in this example:
> +            *
> +            *    Section Headers:
> +            *      [Nr] Name              Type             Address           Offset
> +            *           Size              EntSize          Flags  Link  Info  Align
> +            *      ...
> +            *      [ 8] __ksymtab         PROGBITS         0000000000000060  0000ad90
> +            *           0000000000000010  0000000000000000   A       0     0     16
> +            *      [ 9] .rela__ksymtab    RELA             0000000000000000  0000ada0
> +            *           0000000000000030  0000000000000018          43     8     8
> +            *      [10] __ksymtab_gpl     PROGBITS         0000000000000070  0000add0
> +            *           00000000000001a0  0000000000000000   A       0     0     16
> +            *      ...
> +            *
> +            *  but they should be treated as if they are NULL.
> +            */
> +            && strncmp (bfd_section_name (cur_sec), "__k", 3) != 0)
>           break;
>
>        if (cur_sec == NULL)
>
>
> >    section names to avoid this condition. To resolve this, we simply
> >    drop the check for nonzero sh_addr altogether: the only ET_REL files
> >    we encounter should be kernel modules, so there's no real reason to
> >    be picky.
> >
> > 2. Even with that fixed, the user-supplied section addresses are
> >    clobbered by the addr_info_make_relative(), which subtracts out
> >    section offsets during its operation. To resolve this, undo the
> >    operation for ET_REL files where a section address was provided by
> >    the user (i.e. crash).
> > ...
> >
> > According to your description, it seems to me the gdb cannot handle
> > the nonzero sh_addr in the .ko case, but .ko isn't the only one whose
> > .data VMA is non-zero, see this case:
> >
> > $ readelf -S /lib64/libc.so.6
> > ...
> > [30] .data             PROGBITS         00000000001e8000  001e8000
> >        00000000000016c8  0000000000000000  WA       0     0     32
> > ...
> >
> > I don't know if gdb also fails in the userspace .so case? I mean, if
> > gdb can handle the user case with no problem, then probably we don't
> > need to hack gdb like this.
>
> One important difference is that libc.so.6 is a ET_DYN dynamic object
> file. A key difference between dynamic executables and relocatable files
> is the symbol table: symbols in dynamic executables have an absolute
> virtual address, and they only change based on the load address. Symbols
> in relocatable files have an address which is an offset from the
> beginning of their section. So if a relocation is done based on the
> section address, and the section address is incorrect, then the
> relocation will be wrong.
>
> Stephen
>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH CRASH] Fix module section load address when sh_addr != 0
       [not found]             ` <CANU+ZyeHSoJXUu5m9SaNEy0gQi+HDCzA+NMzp-w=XngGyFMADA@mail.gmail.com>
@ 2025-04-18  3:11               ` Stephen Brennan
  0 siblings, 0 replies; 9+ messages in thread
From: Stephen Brennan @ 2025-04-18  3:11 UTC (permalink / raw)
  To: lijiang, Tao Liu; +Cc: devel, linux-debuggers

On 4/17/25 18:43, lijiang wrote:
> On Mon, Apr 7, 2025 at 4:18 PM Tao Liu <ltao@redhat.com
> <mailto:ltao@redhat.com>> wrote:
> 
>     Hi Stephen,
> 
>     On Sat, Apr 5, 2025 at 12:30 PM Stephen Brennan
>     <stephen.s.brennan@oracle.com <mailto:stephen.s.brennan@oracle.com>>
>     wrote:
>     >
>     > Hi Tao,
>     >
>     > Thanks for digging through the code with me. I'm not very confident or
>     > familiar with it, so I don't have 100% confident answers, but I'll
>     do my
>     > best to answer:
> 
>     Thanks for your explanation! I think your modification is reasonable
>     and I have run tests against your patch, no regression is found.
>     Though I still haven't figured out whether it is a gdb issue or crash
>     specific, however I think it is OK to accept your patch for now. So I
>     will ack your No.1 and No.2 patch. Though No.2 patch needs some
>     cleanup on the code, but I think that is just a minor change, I have
>     no objection to its major part.
> 
>     So ack.
> 
> 
>  Applied:
> [1] https://github.com/crash-utility/crash/
> commit/2cf1a93805a8df6a2a4b7fde8d57b6d873c1bbeb <https://github.com/
> crash-utility/crash/commit/2cf1a93805a8df6a2a4b7fde8d57b6d873c1bbeb>
> [2] https://github.com/crash-utility/crash/commit/
> b982ddc4d66dcd59e567201428eb6191e4a24695 <https://github.com/crash-
> utility/crash/commit/b982ddc4d66dcd59e567201428eb6191e4a24695>
> 
> BTW: regenerated the patch for [2].

Thank you!
Stephen

> 
> Thanks
> Lianbo
> 
> 
>     Thanks,
>     Tao Liu
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-04-18  3:11 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-01 21:01 [PATCH CRASH] Fix module section load address when sh_addr != 0 Stephen Brennan
2025-04-02  7:25 ` [Crash-utility] " Tao Liu
2025-04-02 23:18   ` Stephen Brennan
2025-04-03 22:43     ` Stephen Brennan
2025-04-04  6:41       ` Tao Liu
2025-04-04 10:27       ` Tao Liu
2025-04-04 23:30         ` Stephen Brennan
2025-04-07  8:15           ` Tao Liu
     [not found]             ` <CANU+ZyeHSoJXUu5m9SaNEy0gQi+HDCzA+NMzp-w=XngGyFMADA@mail.gmail.com>
2025-04-18  3:11               ` Stephen Brennan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox