* [linus:master] [rds] c50d295c37: BUG:unable_to_handle_page_fault_for_address
@ 2025-06-04 8:42 kernel test robot
2025-06-04 11:04 ` Sebastian Andrzej Siewior
2025-06-04 15:27 ` [PATCH] module: Make sure relocations are applied to the per-CPU section Sebastian Andrzej Siewior
0 siblings, 2 replies; 9+ messages in thread
From: kernel test robot @ 2025-06-04 8:42 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: oe-lkp, lkp, linux-kernel, Paolo Abeni, Allison Henderson, netdev,
linux-rdma, rds-devel, oliver.sang
Hello,
kernel test robot noticed "BUG:unable_to_handle_page_fault_for_address" on:
commit: c50d295c37f2648a8d9e8a572fedaad027d134bb ("rds: Use nested-BH locking for rds_page_remainder")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[test failed on linus/master dee264c16a6334dcdbea5c186f5ff35f98b1df42]
[test failed on linux-next/master 3a83b350b5be4b4f6bd895eecf9a92080200ee5d]
in testcase: trinity
version: trinity-i386-abe9de86-1_20230429
with following parameters:
runtime: 300s
group: group-01
nr_groups: 5
config: i386-randconfig-017-20250530
compiler: gcc-12
test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G
(please refer to attached dmesg/kmsg for entire log/backtrace)
the issue does not always happen, 45 times out of 200 runs as below. but parent
keeps clean.
=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/runtime/group/nr_groups:
vm-snb-i386/trinity/debian-11.1-i386-20220923.cgz/i386-randconfig-017-20250530/gcc-12/300s/group-01/5
0af5928f358c40c1 c50d295c37f2648a8d9e8a572fe
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:200 22% 45:200 dmesg.BUG:unable_to_handle_page_fault_for_address
:200 22% 45:200 dmesg.EIP:strcmp
:200 22% 45:200 dmesg.Kernel_panic-not_syncing:Fatal_exception_in_interrupt
:200 22% 45:200 dmesg.Oops
:200 22% 45:200 dmesg.boot_failures
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202506041623.e45e4f7d-lkp@intel.com
[ 66.659921][ T3569] BUG: unable to handle page fault for address: 00001010
[ 66.660296][ T3569] #PF: supervisor read access in kernel mode
[ 66.660593][ T3569] #PF: error_code(0x0000) - not-present page
[ 66.660880][ T3569] *pde = 00000000
[ 66.661062][ T3569] Oops: Oops: 0000 [#1] SMP
[ 66.661283][ T3569] CPU: 0 UID: 65534 PID: 3569 Comm: trinity-c6 Not tainted 6.15.0-rc5-01128-gc50d295c37f2 #1 PREEMPT(full) 36e7369f99e2cec5fc7af69ab3b5e48162ffa3ce
[ 66.661987][ T3569] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 66.662476][ T3569] EIP: strcmp (kbuild/obj/consumer/i386-randconfig-017-20250530/arch/x86/lib/string_32.c:100)
[ 66.662689][ T3569] Code: c9 ff f2 ae 4f 8b 4d f0 49 78 06 ac aa 84 c0 75 f7 31 c0 aa 5e 89 d8 5b 5e 5f 5d 31 d2 31 c9 c3 55 89 e5 57 89 d7 56 89 c6 ac <ae> 75 08 84 c0 75 f8 31 c0 eb 04 19 c0 0c 01 5e 5f 5d 31 d2 c3 55
All code
========
0: c9 leave
1: ff f2 push %rdx
3: ae scas %es:(%rdi),%al
4: 4f 8b 4d f0 rex.WRXB mov -0x10(%r13),%r9
8: 49 78 06 rex.WB js 0x11
b: ac lods %ds:(%rsi),%al
c: aa stos %al,%es:(%rdi)
d: 84 c0 test %al,%al
f: 75 f7 jne 0x8
11: 31 c0 xor %eax,%eax
13: aa stos %al,%es:(%rdi)
14: 5e pop %rsi
15: 89 d8 mov %ebx,%eax
17: 5b pop %rbx
18: 5e pop %rsi
19: 5f pop %rdi
1a: 5d pop %rbp
1b: 31 d2 xor %edx,%edx
1d: 31 c9 xor %ecx,%ecx
1f: c3 ret
20: 55 push %rbp
21: 89 e5 mov %esp,%ebp
23: 57 push %rdi
24: 89 d7 mov %edx,%edi
26: 56 push %rsi
27: 89 c6 mov %eax,%esi
29: ac lods %ds:(%rsi),%al
2a:* ae scas %es:(%rdi),%al <-- trapping instruction
2b: 75 08 jne 0x35
2d: 84 c0 test %al,%al
2f: 75 f8 jne 0x29
31: 31 c0 xor %eax,%eax
33: eb 04 jmp 0x39
35: 19 c0 sbb %eax,%eax
37: 0c 01 or $0x1,%al
39: 5e pop %rsi
3a: 5f pop %rdi
3b: 5d pop %rbp
3c: 31 d2 xor %edx,%edx
3e: c3 ret
3f: 55 push %rbp
Code starting with the faulting instruction
===========================================
0: ae scas %es:(%rdi),%al
1: 75 08 jne 0xb
3: 84 c0 test %al,%al
5: 75 f8 jne 0xffffffffffffffff
7: 31 c0 xor %eax,%eax
9: eb 04 jmp 0xf
b: 19 c0 sbb %eax,%eax
d: 0c 01 or $0x1,%al
f: 5e pop %rsi
10: 5f pop %rdi
11: 5d pop %rbp
12: 31 d2 xor %edx,%edx
14: c3 ret
15: 55 push %rbp
[ 66.663604][ T3569] EAX: c6326063 EBX: e336dc08 ECX: c6d03c10 EDX: 00001010
[ 66.663941][ T3569] ESI: c63260c3 EDI: 00001010 EBP: ed5b7c4c ESP: ed5b7c44
[ 66.664278][ T3569] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010082
[ 66.664650][ T3569] CR0: 80050033 CR2: 00001010 CR3: 3c528000 CR4: 000406d0
[ 66.664987][ T3569] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 66.665323][ T3569] DR6: fffe0ff0 DR7: 00000400
[ 66.665548][ T3569] Call Trace:
[ 66.665709][ T3569] register_lock_class (kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:880 kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:1345)
[ 66.665957][ T3569] __lock_acquire (kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:5111)
[ 66.666178][ T3569] ? unknown_module_param_cb (kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/rcupdate.h:1155)
[ 66.666439][ T3569] ? lock_acquire (kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:472 kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:5868 kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:5823)
[ 66.666661][ T3569] ? unknown_module_param_cb (kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/rcupdate.h:1155)
[ 66.666921][ T3569] ? mem_alloc_profiling_enabled (kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/list.h:83 kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/list.h:150) rds
[ 66.667383][ T3569] lock_acquire (kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:472 kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:5868 kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:5823)
[ 66.667598][ T3569] ? mem_alloc_profiling_enabled (kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/list.h:83 kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/list.h:150) rds
[ 66.668058][ T3569] ? lock_release (kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:472 kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:5889)
[ 66.668275][ T3569] ? class_rcu_destructor+0x5a/0x69
[ 66.668562][ T3569] local_lock_acquire (kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/local_lock_internal.h:39) rds
[ 66.668991][ T3569] ? mem_alloc_profiling_enabled (kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/list.h:83 kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/list.h:150) rds
[ 66.669453][ T3569] rds_page_remainder_alloc (kbuild/obj/consumer/i386-randconfig-017-20250530/net/rds/page.c:93 (discriminator 34)) rds
[ 66.669907][ T3569] ? __init_waitqueue_head (kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/sched/wait.c:12)
[ 66.670162][ T3569] rds_message_copy_from_user (kbuild/obj/consumer/i386-randconfig-017-20250530/net/rds/message.c:440) rds
[ 66.670625][ T3569] ? rds_message_alloc_sgs (kbuild/obj/consumer/i386-randconfig-017-20250530/net/rds/message.c:329) rds
[ 66.671072][ T3569] rds_sendmsg (kbuild/obj/consumer/i386-randconfig-017-20250530/net/rds/send.c:1280) rds
[ 66.671480][ T3569] ? __import_iovec (kbuild/obj/consumer/i386-randconfig-017-20250530/lib/iov_iter.c:1445 kbuild/obj/consumer/i386-randconfig-017-20250530/lib/iov_iter.c:1459)
[ 66.671712][ T3569] sock_sendmsg_nosec (kbuild/obj/consumer/i386-randconfig-017-20250530/net/socket.c:715)
[ 66.671949][ T3569] ____sys_sendmsg (kbuild/obj/consumer/i386-randconfig-017-20250530/net/socket.c:727 kbuild/obj/consumer/i386-randconfig-017-20250530/net/socket.c:2566)
[ 66.672178][ T3569] ___sys_sendmsg (kbuild/obj/consumer/i386-randconfig-017-20250530/net/socket.c:2620)
[ 66.672413][ T3569] ? unlock_hrtimer_base+0xa/0x10
[ 66.672693][ T3569] ? __lock_release+0x49/0x105
[ 66.672951][ T3569] ? unlock_hrtimer_base+0xa/0x10
[ 66.673221][ T3569] ? mark_lock (kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:4732 (discriminator 3))
[ 66.673430][ T3569] ? __lock_acquire (kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:5235)
[ 66.673664][ T3569] ? rcu_read_unlock (kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/rcupdate.h:329)
[ 66.673897][ T3569] ? lock_acquire (kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:472 kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:5868 kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:5823)
[ 66.674119][ T3569] ? __fget_light (kbuild/obj/consumer/i386-randconfig-017-20250530/fs/file.c:1154)
[ 66.674339][ T3569] __sys_sendmsg (kbuild/obj/consumer/i386-randconfig-017-20250530/net/socket.c:2652)
[ 66.674556][ T3569] __ia32_sys_sendmsg (kbuild/obj/consumer/i386-randconfig-017-20250530/net/socket.c:2655)
[ 66.674791][ T3569] ia32_sys_call (kbuild/obj/consumer/i386-randconfig-017-20250530/./arch/x86/include/generated/asm/syscalls_32.h:371)
[ 66.675017][ T3569] do_int80_syscall_32 (kbuild/obj/consumer/i386-randconfig-017-20250530/arch/x86/entry/syscall_32.c:83 kbuild/obj/consumer/i386-randconfig-017-20250530/arch/x86/entry/syscall_32.c:259)
[ 66.675256][ T3569] entry_INT80_32 (kbuild/obj/consumer/i386-randconfig-017-20250530/arch/x86/entry/entry_32.S:945)
[ 66.675482][ T3569] EIP: 0xa7edd092
[ 66.675660][ T3569] Code: 00 00 00 e9 90 ff ff ff ff a3 24 00 00 00 68 30 00 00 00 e9 80 ff ff ff ff a3 f8 ff ff ff 66 90 00 00 00 00 00 00 00 00 cd 80 <c3> 8d b4 26 00 00 00 00 8d b6 00 00 00 00 8b 1c 24 c3 8d b4 26 00
All code
========
0: 00 00 add %al,(%rax)
2: 00 e9 add %ch,%cl
4: 90 nop
5: ff (bad)
6: ff (bad)
7: ff (bad)
8: ff a3 24 00 00 00 jmp *0x24(%rbx)
e: 68 30 00 00 00 push $0x30
13: e9 80 ff ff ff jmp 0xffffffffffffff98
18: ff a3 f8 ff ff ff jmp *-0x8(%rbx)
1e: 66 90 xchg %ax,%ax
...
28: cd 80 int $0x80
2a:* c3 ret <-- trapping instruction
2b: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
32: 8d b6 00 00 00 00 lea 0x0(%rsi),%esi
38: 8b 1c 24 mov (%rsp),%ebx
3b: c3 ret
3c: 8d .byte 0x8d
3d: b4 26 mov $0x26,%ah
...
Code starting with the faulting instruction
===========================================
0: c3 ret
1: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
8: 8d b6 00 00 00 00 lea 0x0(%rsi),%esi
e: 8b 1c 24 mov (%rsp),%ebx
11: c3 ret
12: 8d .byte 0x8d
13: b4 26 mov $0x26,%ah
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250604/202506041623.e45e4f7d-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linus:master] [rds] c50d295c37: BUG:unable_to_handle_page_fault_for_address
2025-06-04 8:42 [linus:master] [rds] c50d295c37: BUG:unable_to_handle_page_fault_for_address kernel test robot
@ 2025-06-04 11:04 ` Sebastian Andrzej Siewior
2025-06-04 15:27 ` [PATCH] module: Make sure relocations are applied to the per-CPU section Sebastian Andrzej Siewior
1 sibling, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-04 11:04 UTC (permalink / raw)
To: kernel test robot
Cc: oe-lkp, lkp, linux-kernel, Paolo Abeni, Allison Henderson, netdev,
linux-rdma, rds-devel
On 2025-06-04 16:42:37 [+0800], kernel test robot wrote:
>
> Hello,
Hi,
> kernel test robot noticed "BUG:unable_to_handle_page_fault_for_address" on:
>
> commit: c50d295c37f2648a8d9e8a572fedaad027d134bb ("rds: Use nested-BH locking for rds_page_remainder")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
…
> the issue does not always happen, 45 times out of 200 runs as below. but parent
> keeps clean.
I can reproduce this quite reliably. Looking…
Sebastian
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] module: Make sure relocations are applied to the per-CPU section
2025-06-04 8:42 [linus:master] [rds] c50d295c37: BUG:unable_to_handle_page_fault_for_address kernel test robot
2025-06-04 11:04 ` Sebastian Andrzej Siewior
@ 2025-06-04 15:27 ` Sebastian Andrzej Siewior
2025-06-05 6:07 ` [PATCH v2] " Sebastian Andrzej Siewior
1 sibling, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-04 15:27 UTC (permalink / raw)
To: linux-modules
Cc: oe-lkp, lkp, linux-kernel, kernel test robot, Paolo Abeni,
Allison Henderson, netdev, linux-rdma, rds-devel,
Luis Chamberlain, Petr Pavlu, Sami Tolvanen, Daniel Gomez,
Peter Zijlstra, Thomas Gleixner
The per-CPU data section is handled differently than the other sections.
The memory allocations requires a special __percpu pointer and then the
section is copied into the view of each CPU. Therefore the SHF_ALLOC
flag is removed to ensure move_module() skips it.
Later, relocations are applied and apply_relocations() skips sections
without SHF_ALLOC because they have not been copied. This also skips the
per-CPU data section.
The missing relocations result in a NULL pointer on x86-64 and very
small values on x86-32. This results in a crash because it is not
skipped like NULL pointer would and it can't be dereferenced.
Such an assignment happens during compile time per-CPU lock
initialisation with lockdep enabled.
Add the SHF_ALLOC flag back for the per-CPU section after move_module().
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506041623.e45e4f7d-lkp@intel.com
Fixes: 8d8022e8aba85 ("module: do percpu allocation after uniqueness check. No, really!")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
kernel/module/main.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 5c6ab20240a6d..35abb5f13d7dc 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2816,6 +2816,9 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
if (err)
return ERR_PTR(err);
+ /* Add SHF_ALLOC back so that relocations are applied. */
+ info->sechdrs[info->index.pcpu].sh_flags |= SHF_ALLOC;
+
/* Module has been copied to its final place now: return it. */
mod = (void *)info->sechdrs[info->index.mod].sh_addr;
kmemleak_load_module(mod, info);
--
2.49.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v2] module: Make sure relocations are applied to the per-CPU section
2025-06-04 15:27 ` [PATCH] module: Make sure relocations are applied to the per-CPU section Sebastian Andrzej Siewior
@ 2025-06-05 6:07 ` Sebastian Andrzej Siewior
2025-06-05 13:44 ` Petr Pavlu
0 siblings, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-05 6:07 UTC (permalink / raw)
To: linux-modules
Cc: oe-lkp, lkp, linux-kernel, kernel test robot, Paolo Abeni,
Allison Henderson, netdev, linux-rdma, rds-devel,
Luis Chamberlain, Petr Pavlu, Sami Tolvanen, Daniel Gomez,
Peter Zijlstra, Thomas Gleixner
The per-CPU data section is handled differently than the other sections.
The memory allocations requires a special __percpu pointer and then the
section is copied into the view of each CPU. Therefore the SHF_ALLOC
flag is removed to ensure move_module() skips it.
Later, relocations are applied and apply_relocations() skips sections
without SHF_ALLOC because they have not been copied. This also skips the
per-CPU data section.
The missing relocations result in a NULL pointer on x86-64 and very
small values on x86-32. This results in a crash because it is not
skipped like NULL pointer would and can't be dereferenced.
Such an assignment happens during static per-CPU lock initialisation
with lockdep enabled.
Add the SHF_ALLOC flag back for the per-CPU section (if found) after
move_module().
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506041623.e45e4f7d-lkp@intel.com
Fixes: 8d8022e8aba85 ("module: do percpu allocation after uniqueness check. No, really!")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1…v2: https://lore.kernel.org/all/20250604152707.CieD9tN0@linutronix.de/
- Add the flag back only on SMP if the per-CPU section was found.
kernel/module/main.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 5c6ab20240a6d..4f6554dedf8ea 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2816,6 +2816,10 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
if (err)
return ERR_PTR(err);
+ /* Add SHF_ALLOC back so that relocations are applied. */
+ if (IS_ENABLED(CONFIG_SMP) && info->index.pcpu)
+ info->sechdrs[info->index.pcpu].sh_flags |= SHF_ALLOC;
+
/* Module has been copied to its final place now: return it. */
mod = (void *)info->sechdrs[info->index.mod].sh_addr;
kmemleak_load_module(mod, info);
--
2.49.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2] module: Make sure relocations are applied to the per-CPU section
2025-06-05 6:07 ` [PATCH v2] " Sebastian Andrzej Siewior
@ 2025-06-05 13:44 ` Petr Pavlu
2025-06-05 14:39 ` Peter Zijlstra
2025-06-05 15:54 ` Sebastian Andrzej Siewior
0 siblings, 2 replies; 9+ messages in thread
From: Petr Pavlu @ 2025-06-05 13:44 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-modules, oe-lkp, lkp, linux-kernel, kernel test robot,
Paolo Abeni, Allison Henderson, netdev, linux-rdma, rds-devel,
Luis Chamberlain, Sami Tolvanen, Daniel Gomez, Peter Zijlstra,
Thomas Gleixner
On 6/5/25 8:07 AM, Sebastian Andrzej Siewior wrote:
> The per-CPU data section is handled differently than the other sections.
> The memory allocations requires a special __percpu pointer and then the
> section is copied into the view of each CPU. Therefore the SHF_ALLOC
> flag is removed to ensure move_module() skips it.
>
> Later, relocations are applied and apply_relocations() skips sections
> without SHF_ALLOC because they have not been copied. This also skips the
> per-CPU data section.
> The missing relocations result in a NULL pointer on x86-64 and very
> small values on x86-32. This results in a crash because it is not
> skipped like NULL pointer would and can't be dereferenced.
>
> Such an assignment happens during static per-CPU lock initialisation
> with lockdep enabled.
>
> Add the SHF_ALLOC flag back for the per-CPU section (if found) after
> move_module().
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202506041623.e45e4f7d-lkp@intel.com
> Fixes: 8d8022e8aba85 ("module: do percpu allocation after uniqueness check. No, really!")
Isn't this broken earlier by "Don't relocate non-allocated regions in modules."
(pre-Git, [1])?
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> v1…v2: https://lore.kernel.org/all/20250604152707.CieD9tN0@linutronix.de/
> - Add the flag back only on SMP if the per-CPU section was found.
>
> kernel/module/main.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 5c6ab20240a6d..4f6554dedf8ea 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -2816,6 +2816,10 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
> if (err)
> return ERR_PTR(err);
>
> + /* Add SHF_ALLOC back so that relocations are applied. */
> + if (IS_ENABLED(CONFIG_SMP) && info->index.pcpu)
> + info->sechdrs[info->index.pcpu].sh_flags |= SHF_ALLOC;
> +
> /* Module has been copied to its final place now: return it. */
> mod = (void *)info->sechdrs[info->index.mod].sh_addr;
> kmemleak_load_module(mod, info);
This looks like a valid fix. The info->sechdrs[info->index.pcpu].sh_addr
is set by rewrite_section_headers() to point to the percpu data in the
userspace-passed ELF copy. The section has SHF_ALLOC reset, so it
doesn't move and the sh_addr isn't adjusted by move_module(). The
function apply_relocations() then applies the relocations in the initial
ELF copy. Finally, post_relocation() copies the relocated percpu data to
their final per-CPU destinations.
However, I'm not sure if it is best to manipulate the SHF_ALLOC flag in
this way. It is ok to reset it once, but if we need to set it back again
then I would reconsider this.
An alternative approach could be to teach apply_relocations() that the
percpu section is special and should be relocated even though it doesn't
have SHF_ALLOC set. This would also allow adding a comment explaining
that we're relocating the data in the original ELF copy, which I find
useful to mention as it is different to other relocation processing.
For instance:
/*
* Don't bother with non-allocated sections.
*
* An exception is the percpu section, which has separate allocations
* for individual CPUs. We relocate the percpu section in the initial
* ELF template and subsequently copy it to the per-CPU destinations.
*/
if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC) &&
infosec != info->index.pcpu)
continue;
[1] https://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux-fullhistory.git/commit/?id=b3b91325f3c77ace041f769ada7039ebc7aab8de
--
Thanks,
Petr
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] module: Make sure relocations are applied to the per-CPU section
2025-06-05 13:44 ` Petr Pavlu
@ 2025-06-05 14:39 ` Peter Zijlstra
2025-06-05 15:54 ` Sebastian Andrzej Siewior
1 sibling, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2025-06-05 14:39 UTC (permalink / raw)
To: Petr Pavlu
Cc: Sebastian Andrzej Siewior, linux-modules, oe-lkp, lkp,
linux-kernel, kernel test robot, Paolo Abeni, Allison Henderson,
netdev, linux-rdma, rds-devel, Luis Chamberlain, Sami Tolvanen,
Daniel Gomez, Thomas Gleixner
On Thu, Jun 05, 2025 at 03:44:23PM +0200, Petr Pavlu wrote:
> For instance:
>
> /*
> * Don't bother with non-allocated sections.
> *
> * An exception is the percpu section, which has separate allocations
> * for individual CPUs. We relocate the percpu section in the initial
> * ELF template and subsequently copy it to the per-CPU destinations.
> */
> if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC) &&
> infosec != info->index.pcpu)
> continue;
Right, and pcpu is a data section and should not have relative
relocations, only absolute.
So copying things should not be a problem.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] module: Make sure relocations are applied to the per-CPU section
2025-06-05 13:44 ` Petr Pavlu
2025-06-05 14:39 ` Peter Zijlstra
@ 2025-06-05 15:54 ` Sebastian Andrzej Siewior
2025-06-05 16:50 ` Petr Pavlu
1 sibling, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-05 15:54 UTC (permalink / raw)
To: Petr Pavlu
Cc: linux-modules, oe-lkp, lkp, linux-kernel, kernel test robot,
Paolo Abeni, Allison Henderson, netdev, linux-rdma, rds-devel,
Luis Chamberlain, Sami Tolvanen, Daniel Gomez, Peter Zijlstra,
Thomas Gleixner
On 2025-06-05 15:44:23 [+0200], Petr Pavlu wrote:
> Isn't this broken earlier by "Don't relocate non-allocated regions in modules."
> (pre-Git, [1])?
Looking further back into the history, we have
21af2f0289dea ("[PATCH] per-cpu support inside modules (minimal)")
which does
+ if (pcpuindex) {
+ /* We have a special allocation for this section. */
+ mod->percpu = percpu_modalloc(sechdrs[pcpuindex].sh_size,
+ sechdrs[pcpuindex].sh_addralign);
+ if (!mod->percpu) {
+ err = -ENOMEM;
+ goto free_mod;
+ }
+ sechdrs[pcpuindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
+ }
so this looks like the origin.
…
> > --- a/kernel/module/main.c
> > +++ b/kernel/module/main.c
> > @@ -2816,6 +2816,10 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
> > if (err)
> > return ERR_PTR(err);
> >
> > + /* Add SHF_ALLOC back so that relocations are applied. */
> > + if (IS_ENABLED(CONFIG_SMP) && info->index.pcpu)
> > + info->sechdrs[info->index.pcpu].sh_flags |= SHF_ALLOC;
> > +
> > /* Module has been copied to its final place now: return it. */
> > mod = (void *)info->sechdrs[info->index.mod].sh_addr;
> > kmemleak_load_module(mod, info);
>
> This looks like a valid fix. The info->sechdrs[info->index.pcpu].sh_addr
> is set by rewrite_section_headers() to point to the percpu data in the
> userspace-passed ELF copy. The section has SHF_ALLOC reset, so it
> doesn't move and the sh_addr isn't adjusted by move_module(). The
> function apply_relocations() then applies the relocations in the initial
> ELF copy. Finally, post_relocation() copies the relocated percpu data to
> their final per-CPU destinations.
>
> However, I'm not sure if it is best to manipulate the SHF_ALLOC flag in
> this way. It is ok to reset it once, but if we need to set it back again
> then I would reconsider this.
I had the other way around but this flag is not considered anywhere
else other than the functions called here. So I decided to add back what
was taken once.
> An alternative approach could be to teach apply_relocations() that the
> percpu section is special and should be relocated even though it doesn't
> have SHF_ALLOC set. This would also allow adding a comment explaining
> that we're relocating the data in the original ELF copy, which I find
> useful to mention as it is different to other relocation processing.
Not sure if this makes it better. It looks like it continues a
workaround…
The only reason why it has been removed in the first place is to skip
the copy process.
We could also keep the flag and skip the section during the copy
process based on its id. This was the original intention.
> For instance:
>
> /*
> * Don't bother with non-allocated sections.
> *
> * An exception is the percpu section, which has separate allocations
> * for individual CPUs. We relocate the percpu section in the initial
> * ELF template and subsequently copy it to the per-CPU destinations.
> */
> if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC) &&
> infosec != info->index.pcpu)
> continue;
>
If you insist but…
Sebastian
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] module: Make sure relocations are applied to the per-CPU section
2025-06-05 15:54 ` Sebastian Andrzej Siewior
@ 2025-06-05 16:50 ` Petr Pavlu
2025-06-10 14:55 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 9+ messages in thread
From: Petr Pavlu @ 2025-06-05 16:50 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-modules, oe-lkp, lkp, linux-kernel, kernel test robot,
Paolo Abeni, Allison Henderson, netdev, linux-rdma, rds-devel,
Luis Chamberlain, Sami Tolvanen, Daniel Gomez, Peter Zijlstra,
Thomas Gleixner
On 6/5/25 5:54 PM, Sebastian Andrzej Siewior wrote:
> On 2025-06-05 15:44:23 [+0200], Petr Pavlu wrote:
>> Isn't this broken earlier by "Don't relocate non-allocated regions in modules."
>> (pre-Git, [1])?
>
> Looking further back into the history, we have
> 21af2f0289dea ("[PATCH] per-cpu support inside modules (minimal)")
>
> which does
>
> + if (pcpuindex) {
> + /* We have a special allocation for this section. */
> + mod->percpu = percpu_modalloc(sechdrs[pcpuindex].sh_size,
> + sechdrs[pcpuindex].sh_addralign);
> + if (!mod->percpu) {
> + err = -ENOMEM;
> + goto free_mod;
> + }
> + sechdrs[pcpuindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
> + }
>
> so this looks like the origin.
This patch added the initial per-cpu support for modules. The relocation
handling at that point appears correct to me. I think it's the mentioned patch
"Don't relocate non-allocated regions in modules" that broke it.
>
> …
>>> --- a/kernel/module/main.c
>>> +++ b/kernel/module/main.c
>>> @@ -2816,6 +2816,10 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
>>> if (err)
>>> return ERR_PTR(err);
>>>
>>> + /* Add SHF_ALLOC back so that relocations are applied. */
>>> + if (IS_ENABLED(CONFIG_SMP) && info->index.pcpu)
>>> + info->sechdrs[info->index.pcpu].sh_flags |= SHF_ALLOC;
>>> +
>>> /* Module has been copied to its final place now: return it. */
>>> mod = (void *)info->sechdrs[info->index.mod].sh_addr;
>>> kmemleak_load_module(mod, info);
>>
>> This looks like a valid fix. The info->sechdrs[info->index.pcpu].sh_addr
>> is set by rewrite_section_headers() to point to the percpu data in the
>> userspace-passed ELF copy. The section has SHF_ALLOC reset, so it
>> doesn't move and the sh_addr isn't adjusted by move_module(). The
>> function apply_relocations() then applies the relocations in the initial
>> ELF copy. Finally, post_relocation() copies the relocated percpu data to
>> their final per-CPU destinations.
>>
>> However, I'm not sure if it is best to manipulate the SHF_ALLOC flag in
>> this way. It is ok to reset it once, but if we need to set it back again
>> then I would reconsider this.
>
> I had the other way around but this flag is not considered anywhere
> else other than the functions called here. So I decided to add back what
> was taken once.
>
>> An alternative approach could be to teach apply_relocations() that the
>> percpu section is special and should be relocated even though it doesn't
>> have SHF_ALLOC set. This would also allow adding a comment explaining
>> that we're relocating the data in the original ELF copy, which I find
>> useful to mention as it is different to other relocation processing.
>
> Not sure if this makes it better. It looks like it continues a
> workaround…
> The only reason why it has been removed in the first place is to skip
> the copy process.
The SHF_ALLOC flag is also removed to prevent the section from being allocated
by layout_sections().
> We could also keep the flag and skip the section during the copy
> process based on its id. This was the original intention.
>
>> For instance:
>>
>> /*
>> * Don't bother with non-allocated sections.
>> *
>> * An exception is the percpu section, which has separate allocations
>> * for individual CPUs. We relocate the percpu section in the initial
>> * ELF template and subsequently copy it to the per-CPU destinations.
>> */
>> if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC) &&
>> infosec != info->index.pcpu)
>> continue;
>>
>
> If you insist but…
It seems logical to me that the SHF_ALLOC flag is removed for the percpu section
since it isn't directly allocated by the regular process. This is consistent
with what the module loader does in other similar cases. I could also understand
keeping the flag and explicitly skipping the layout and allocate process for the
section. However, adjusting the flag back and forth to trigger the right code
paths in between seems fragile to me and harder to maintain if we need to
shuffle things around in the future.
--
Cheers,
Petr
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] module: Make sure relocations are applied to the per-CPU section
2025-06-05 16:50 ` Petr Pavlu
@ 2025-06-10 14:55 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-10 14:55 UTC (permalink / raw)
To: Petr Pavlu
Cc: linux-modules, oe-lkp, lkp, linux-kernel, kernel test robot,
Paolo Abeni, Allison Henderson, netdev, linux-rdma, rds-devel,
Luis Chamberlain, Sami Tolvanen, Daniel Gomez, Peter Zijlstra,
Thomas Gleixner
On 2025-06-05 18:50:27 [+0200], Petr Pavlu wrote:
> On 6/5/25 5:54 PM, Sebastian Andrzej Siewior wrote:
> > On 2025-06-05 15:44:23 [+0200], Petr Pavlu wrote:
> >> Isn't this broken earlier by "Don't relocate non-allocated regions in modules."
> >> (pre-Git, [1])?
> >
> > Looking further back into the history, we have
> > 21af2f0289dea ("[PATCH] per-cpu support inside modules (minimal)")
> >
> > which does
> >
> > + if (pcpuindex) {
> > + /* We have a special allocation for this section. */
> > + mod->percpu = percpu_modalloc(sechdrs[pcpuindex].sh_size,
> > + sechdrs[pcpuindex].sh_addralign);
> > + if (!mod->percpu) {
> > + err = -ENOMEM;
> > + goto free_mod;
> > + }
> > + sechdrs[pcpuindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
> > + }
> >
> > so this looks like the origin.
>
> This patch added the initial per-cpu support for modules. The relocation
> handling at that point appears correct to me. I think it's the mentioned patch
> "Don't relocate non-allocated regions in modules" that broke it.
Ach, it ignores that bit. Okay then.
> It seems logical to me that the SHF_ALLOC flag is removed for the percpu section
> since it isn't directly allocated by the regular process. This is consistent
> with what the module loader does in other similar cases. I could also understand
> keeping the flag and explicitly skipping the layout and allocate process for the
> section. However, adjusting the flag back and forth to trigger the right code
> paths in between seems fragile to me and harder to maintain if we need to
> shuffle things around in the future.
Okay. Let me add this exception later on instead of adding the bit back.
Sebastian
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-06-10 14:55 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-04 8:42 [linus:master] [rds] c50d295c37: BUG:unable_to_handle_page_fault_for_address kernel test robot
2025-06-04 11:04 ` Sebastian Andrzej Siewior
2025-06-04 15:27 ` [PATCH] module: Make sure relocations are applied to the per-CPU section Sebastian Andrzej Siewior
2025-06-05 6:07 ` [PATCH v2] " Sebastian Andrzej Siewior
2025-06-05 13:44 ` Petr Pavlu
2025-06-05 14:39 ` Peter Zijlstra
2025-06-05 15:54 ` Sebastian Andrzej Siewior
2025-06-05 16:50 ` Petr Pavlu
2025-06-10 14:55 ` Sebastian Andrzej Siewior
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).