[linus:master] [rds] c50d295c37: BUG:unable_to_handle_page_fault_for


* [linus:master] [rds]  c50d295c37: BUG:unable_to_handle_page_fault_for_address
@ 2025-06-04  8:42 kernel test robot
  2025-06-04 11:04 ` Sebastian Andrzej Siewior
  2025-06-04 15:27 ` [PATCH] module: Make sure relocations are applied to the per-CPU section Sebastian Andrzej Siewior
  0 siblings, 2 replies; 9+ messages in thread
From: kernel test robot @ 2025-06-04  8:42 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: oe-lkp, lkp, linux-kernel, Paolo Abeni, Allison Henderson, netdev,
	linux-rdma, rds-devel, oliver.sang


Hello,

kernel test robot noticed "BUG:unable_to_handle_page_fault_for_address" on:

commit: c50d295c37f2648a8d9e8a572fedaad027d134bb ("rds: Use nested-BH locking for rds_page_remainder")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master      dee264c16a6334dcdbea5c186f5ff35f98b1df42]
[test failed on linux-next/master 3a83b350b5be4b4f6bd895eecf9a92080200ee5d]

in testcase: trinity
version: trinity-i386-abe9de86-1_20230429
with following parameters:

	runtime: 300s
	group: group-01
	nr_groups: 5


config: i386-randconfig-017-20250530
compiler: gcc-12
test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G

(please refer to attached dmesg/kmsg for entire log/backtrace)


The issue does not always happen: it reproduced in 45 out of 200 runs, as shown
below, while the parent commit stays clean.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/runtime/group/nr_groups:
  vm-snb-i386/trinity/debian-11.1-i386-20220923.cgz/i386-randconfig-017-20250530/gcc-12/300s/group-01/5

0af5928f358c40c1 c50d295c37f2648a8d9e8a572fe
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :200         22%          45:200   dmesg.BUG:unable_to_handle_page_fault_for_address
           :200         22%          45:200   dmesg.EIP:strcmp
           :200         22%          45:200   dmesg.Kernel_panic-not_syncing:Fatal_exception_in_interrupt
           :200         22%          45:200   dmesg.Oops
           :200         22%          45:200   dmesg.boot_failures



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202506041623.e45e4f7d-lkp@intel.com


[   66.659921][ T3569] BUG: unable to handle page fault for address: 00001010
[   66.660296][ T3569] #PF: supervisor read access in kernel mode
[   66.660593][ T3569] #PF: error_code(0x0000) - not-present page
[   66.660880][ T3569] *pde = 00000000
[   66.661062][ T3569] Oops: Oops: 0000 [#1] SMP
[   66.661283][ T3569] CPU: 0 UID: 65534 PID: 3569 Comm: trinity-c6 Not tainted 6.15.0-rc5-01128-gc50d295c37f2 #1 PREEMPT(full)  36e7369f99e2cec5fc7af69ab3b5e48162ffa3ce
[   66.661987][ T3569] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 66.662476][ T3569] EIP: strcmp (kbuild/obj/consumer/i386-randconfig-017-20250530/arch/x86/lib/string_32.c:100) 
[ 66.662689][ T3569] Code: c9 ff f2 ae 4f 8b 4d f0 49 78 06 ac aa 84 c0 75 f7 31 c0 aa 5e 89 d8 5b 5e 5f 5d 31 d2 31 c9 c3 55 89 e5 57 89 d7 56 89 c6 ac <ae> 75 08 84 c0 75 f8 31 c0 eb 04 19 c0 0c 01 5e 5f 5d 31 d2 c3 55
All code
========
   0:	c9                   	leave
   1:	ff f2                	push   %rdx
   3:	ae                   	scas   %es:(%rdi),%al
   4:	4f 8b 4d f0          	rex.WRXB mov -0x10(%r13),%r9
   8:	49 78 06             	rex.WB js 0x11
   b:	ac                   	lods   %ds:(%rsi),%al
   c:	aa                   	stos   %al,%es:(%rdi)
   d:	84 c0                	test   %al,%al
   f:	75 f7                	jne    0x8
  11:	31 c0                	xor    %eax,%eax
  13:	aa                   	stos   %al,%es:(%rdi)
  14:	5e                   	pop    %rsi
  15:	89 d8                	mov    %ebx,%eax
  17:	5b                   	pop    %rbx
  18:	5e                   	pop    %rsi
  19:	5f                   	pop    %rdi
  1a:	5d                   	pop    %rbp
  1b:	31 d2                	xor    %edx,%edx
  1d:	31 c9                	xor    %ecx,%ecx
  1f:	c3                   	ret
  20:	55                   	push   %rbp
  21:	89 e5                	mov    %esp,%ebp
  23:	57                   	push   %rdi
  24:	89 d7                	mov    %edx,%edi
  26:	56                   	push   %rsi
  27:	89 c6                	mov    %eax,%esi
  29:	ac                   	lods   %ds:(%rsi),%al
  2a:*	ae                   	scas   %es:(%rdi),%al		<-- trapping instruction
  2b:	75 08                	jne    0x35
  2d:	84 c0                	test   %al,%al
  2f:	75 f8                	jne    0x29
  31:	31 c0                	xor    %eax,%eax
  33:	eb 04                	jmp    0x39
  35:	19 c0                	sbb    %eax,%eax
  37:	0c 01                	or     $0x1,%al
  39:	5e                   	pop    %rsi
  3a:	5f                   	pop    %rdi
  3b:	5d                   	pop    %rbp
  3c:	31 d2                	xor    %edx,%edx
  3e:	c3                   	ret
  3f:	55                   	push   %rbp

Code starting with the faulting instruction
===========================================
   0:	ae                   	scas   %es:(%rdi),%al
   1:	75 08                	jne    0xb
   3:	84 c0                	test   %al,%al
   5:	75 f8                	jne    0xffffffffffffffff
   7:	31 c0                	xor    %eax,%eax
   9:	eb 04                	jmp    0xf
   b:	19 c0                	sbb    %eax,%eax
   d:	0c 01                	or     $0x1,%al
   f:	5e                   	pop    %rsi
  10:	5f                   	pop    %rdi
  11:	5d                   	pop    %rbp
  12:	31 d2                	xor    %edx,%edx
  14:	c3                   	ret
  15:	55                   	push   %rbp
[   66.663604][ T3569] EAX: c6326063 EBX: e336dc08 ECX: c6d03c10 EDX: 00001010
[   66.663941][ T3569] ESI: c63260c3 EDI: 00001010 EBP: ed5b7c4c ESP: ed5b7c44
[   66.664278][ T3569] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010082
[   66.664650][ T3569] CR0: 80050033 CR2: 00001010 CR3: 3c528000 CR4: 000406d0
[   66.664987][ T3569] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[   66.665323][ T3569] DR6: fffe0ff0 DR7: 00000400
[   66.665548][ T3569] Call Trace:
[ 66.665709][ T3569] register_lock_class (kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:880 kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:1345) 
[ 66.665957][ T3569] __lock_acquire (kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:5111) 
[ 66.666178][ T3569] ? unknown_module_param_cb (kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/rcupdate.h:1155) 
[ 66.666439][ T3569] ? lock_acquire (kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:472 kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:5868 kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:5823) 
[ 66.666661][ T3569] ? unknown_module_param_cb (kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/rcupdate.h:1155) 
[ 66.666921][ T3569] ? mem_alloc_profiling_enabled (kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/list.h:83 kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/list.h:150) rds 
[ 66.667383][ T3569] lock_acquire (kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:472 kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:5868 kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:5823) 
[ 66.667598][ T3569] ? mem_alloc_profiling_enabled (kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/list.h:83 kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/list.h:150) rds 
[ 66.668058][ T3569] ? lock_release (kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:472 kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:5889) 
[ 66.668275][ T3569] ? class_rcu_destructor+0x5a/0x69 
[ 66.668562][ T3569] local_lock_acquire (kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/local_lock_internal.h:39) rds 
[ 66.668991][ T3569] ? mem_alloc_profiling_enabled (kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/list.h:83 kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/list.h:150) rds 
[ 66.669453][ T3569] rds_page_remainder_alloc (kbuild/obj/consumer/i386-randconfig-017-20250530/net/rds/page.c:93 (discriminator 34)) rds 
[ 66.669907][ T3569] ? __init_waitqueue_head (kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/sched/wait.c:12) 
[ 66.670162][ T3569] rds_message_copy_from_user (kbuild/obj/consumer/i386-randconfig-017-20250530/net/rds/message.c:440) rds 
[ 66.670625][ T3569] ? rds_message_alloc_sgs (kbuild/obj/consumer/i386-randconfig-017-20250530/net/rds/message.c:329) rds 
[ 66.671072][ T3569] rds_sendmsg (kbuild/obj/consumer/i386-randconfig-017-20250530/net/rds/send.c:1280) rds 
[ 66.671480][ T3569] ? __import_iovec (kbuild/obj/consumer/i386-randconfig-017-20250530/lib/iov_iter.c:1445 kbuild/obj/consumer/i386-randconfig-017-20250530/lib/iov_iter.c:1459) 
[ 66.671712][ T3569] sock_sendmsg_nosec (kbuild/obj/consumer/i386-randconfig-017-20250530/net/socket.c:715) 
[ 66.671949][ T3569] ____sys_sendmsg (kbuild/obj/consumer/i386-randconfig-017-20250530/net/socket.c:727 kbuild/obj/consumer/i386-randconfig-017-20250530/net/socket.c:2566) 
[ 66.672178][ T3569] ___sys_sendmsg (kbuild/obj/consumer/i386-randconfig-017-20250530/net/socket.c:2620) 
[ 66.672413][ T3569] ? unlock_hrtimer_base+0xa/0x10 
[ 66.672693][ T3569] ? __lock_release+0x49/0x105 
[ 66.672951][ T3569] ? unlock_hrtimer_base+0xa/0x10 
[ 66.673221][ T3569] ? mark_lock (kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:4732 (discriminator 3)) 
[ 66.673430][ T3569] ? __lock_acquire (kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:5235) 
[ 66.673664][ T3569] ? rcu_read_unlock (kbuild/obj/consumer/i386-randconfig-017-20250530/include/linux/rcupdate.h:329) 
[ 66.673897][ T3569] ? lock_acquire (kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:472 kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:5868 kbuild/obj/consumer/i386-randconfig-017-20250530/kernel/locking/lockdep.c:5823) 
[ 66.674119][ T3569] ? __fget_light (kbuild/obj/consumer/i386-randconfig-017-20250530/fs/file.c:1154) 
[ 66.674339][ T3569] __sys_sendmsg (kbuild/obj/consumer/i386-randconfig-017-20250530/net/socket.c:2652) 
[ 66.674556][ T3569] __ia32_sys_sendmsg (kbuild/obj/consumer/i386-randconfig-017-20250530/net/socket.c:2655) 
[ 66.674791][ T3569] ia32_sys_call (kbuild/obj/consumer/i386-randconfig-017-20250530/./arch/x86/include/generated/asm/syscalls_32.h:371) 
[ 66.675017][ T3569] do_int80_syscall_32 (kbuild/obj/consumer/i386-randconfig-017-20250530/arch/x86/entry/syscall_32.c:83 kbuild/obj/consumer/i386-randconfig-017-20250530/arch/x86/entry/syscall_32.c:259) 
[ 66.675256][ T3569] entry_INT80_32 (kbuild/obj/consumer/i386-randconfig-017-20250530/arch/x86/entry/entry_32.S:945) 
[   66.675482][ T3569] EIP: 0xa7edd092
[ 66.675660][ T3569] Code: 00 00 00 e9 90 ff ff ff ff a3 24 00 00 00 68 30 00 00 00 e9 80 ff ff ff ff a3 f8 ff ff ff 66 90 00 00 00 00 00 00 00 00 cd 80 <c3> 8d b4 26 00 00 00 00 8d b6 00 00 00 00 8b 1c 24 c3 8d b4 26 00
All code
========
   0:	00 00                	add    %al,(%rax)
   2:	00 e9                	add    %ch,%cl
   4:	90                   	nop
   5:	ff                   	(bad)
   6:	ff                   	(bad)
   7:	ff                   	(bad)
   8:	ff a3 24 00 00 00    	jmp    *0x24(%rbx)
   e:	68 30 00 00 00       	push   $0x30
  13:	e9 80 ff ff ff       	jmp    0xffffffffffffff98
  18:	ff a3 f8 ff ff ff    	jmp    *-0x8(%rbx)
  1e:	66 90                	xchg   %ax,%ax
	...
  28:	cd 80                	int    $0x80
  2a:*	c3                   	ret		<-- trapping instruction
  2b:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi
  32:	8d b6 00 00 00 00    	lea    0x0(%rsi),%esi
  38:	8b 1c 24             	mov    (%rsp),%ebx
  3b:	c3                   	ret
  3c:	8d                   	.byte 0x8d
  3d:	b4 26                	mov    $0x26,%ah
	...

Code starting with the faulting instruction
===========================================
   0:	c3                   	ret
   1:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi
   8:	8d b6 00 00 00 00    	lea    0x0(%rsi),%esi
   e:	8b 1c 24             	mov    (%rsp),%ebx
  11:	c3                   	ret
  12:	8d                   	.byte 0x8d
  13:	b4 26                	mov    $0x26,%ah


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250604/202506041623.e45e4f7d-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [linus:master] [rds]  c50d295c37: BUG:unable_to_handle_page_fault_for_address
  2025-06-04  8:42 [linus:master] [rds] c50d295c37: BUG:unable_to_handle_page_fault_for_address kernel test robot
@ 2025-06-04 11:04 ` Sebastian Andrzej Siewior
  2025-06-04 15:27 ` [PATCH] module: Make sure relocations are applied to the per-CPU section Sebastian Andrzej Siewior
  1 sibling, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-04 11:04 UTC (permalink / raw)
  To: kernel test robot
  Cc: oe-lkp, lkp, linux-kernel, Paolo Abeni, Allison Henderson, netdev,
	linux-rdma, rds-devel

On 2025-06-04 16:42:37 [+0800], kernel test robot wrote:
> 
> Hello,
Hi,

> kernel test robot noticed "BUG:unable_to_handle_page_fault_for_address" on:
> 
> commit: c50d295c37f2648a8d9e8a572fedaad027d134bb ("rds: Use nested-BH locking for rds_page_remainder")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
…
> the issue does not always happen, 45 times out of 200 runs as below. but parent
> keeps clean.

I can reproduce this quite reliably. Looking…

Sebastian


* [PATCH] module: Make sure relocations are applied to the per-CPU section
  2025-06-04  8:42 [linus:master] [rds] c50d295c37: BUG:unable_to_handle_page_fault_for_address kernel test robot
  2025-06-04 11:04 ` Sebastian Andrzej Siewior
@ 2025-06-04 15:27 ` Sebastian Andrzej Siewior
  2025-06-05  6:07   ` [PATCH v2] " Sebastian Andrzej Siewior
  1 sibling, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-04 15:27 UTC (permalink / raw)
  To: linux-modules
  Cc: oe-lkp, lkp, linux-kernel, kernel test robot, Paolo Abeni,
	Allison Henderson, netdev, linux-rdma, rds-devel,
	Luis Chamberlain, Petr Pavlu, Sami Tolvanen, Daniel Gomez,
	Peter Zijlstra, Thomas Gleixner

The per-CPU data section is handled differently than the other sections.
The memory allocation requires a special __percpu pointer and then the
section is copied into the view of each CPU. Therefore the SHF_ALLOC
flag is removed to ensure move_module() skips it.

Later, relocations are applied and apply_relocations() skips sections
without SHF_ALLOC because they have not been copied. This also skips the
per-CPU data section.

The missing relocations result in a NULL pointer on x86-64 and very
small values on x86-32. Unlike a NULL pointer, such a small value is not
skipped, and dereferencing it crashes.

Such an assignment happens during compile-time per-CPU lock
initialisation with lockdep enabled.

Add the SHF_ALLOC flag back for the per-CPU section after move_module().
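To make the sequence above concrete, here is a minimal userspace sketch of the flag handling. It is not kernel code: `struct model_sec`, `model_move_module()`, `model_apply_relocations()` and `model_load()` are hypothetical stand-ins that only record which sections the copy and relocation passes would touch, assuming both passes filter on SHF_ALLOC as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* SHF_ALLOC has the same value as in <elf.h>; the struct is a stand-in. */
#define SHF_ALLOC 0x2UL

struct model_sec {
	unsigned long sh_flags;
	bool copied;    /* set by the move_module() stand-in */
	bool relocated; /* set by the apply_relocations() stand-in */
};

/* Stand-in for move_module(): only SHF_ALLOC sections are copied. */
static void model_move_module(struct model_sec *s, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (s[i].sh_flags & SHF_ALLOC)
			s[i].copied = true;
}

/* Stand-in for apply_relocations(): sections without SHF_ALLOC are skipped. */
static void model_apply_relocations(struct model_sec *s, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (s[i].sh_flags & SHF_ALLOC)
			s[i].relocated = true;
}

/*
 * Run the sequence for a tiny module with one ordinary data section and
 * one per-CPU section. restore_flag selects whether the fix (adding
 * SHF_ALLOC back after the copy) is applied. Returns whether the per-CPU
 * section ends up relocated.
 */
static bool model_load(bool restore_flag)
{
	struct model_sec s[3] = {
		[1] = { .sh_flags = SHF_ALLOC }, /* ordinary data section */
		[2] = { .sh_flags = SHF_ALLOC }, /* per-CPU section */
	};
	size_t pcpu = 2; /* index 0 would mean "no per-CPU section" */

	s[pcpu].sh_flags &= ~SHF_ALLOC;  /* hide it from the generic copy */
	model_move_module(s, 3);
	assert(s[1].copied && !s[pcpu].copied);

	if (restore_flag && pcpu)        /* the fix, minus the CONFIG_SMP test */
		s[pcpu].sh_flags |= SHF_ALLOC;

	model_apply_relocations(s, 3);
	assert(s[1].relocated);
	return s[pcpu].relocated;
}
```

`model_load(false)` reports the per-CPU section as unrelocated, which is the bug; `model_load(true)` reports it as relocated while it still was not copied by the generic pass.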

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506041623.e45e4f7d-lkp@intel.com
Fixes: 8d8022e8aba85 ("module: do percpu allocation after uniqueness check.  No, really!")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/module/main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index 5c6ab20240a6d..35abb5f13d7dc 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2816,6 +2816,9 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
 	if (err)
 		return ERR_PTR(err);

+	/* Add SHF_ALLOC back so that relocations are applied. */
+	info->sechdrs[info->index.pcpu].sh_flags |= SHF_ALLOC;
+
 	/* Module has been copied to its final place now: return it. */
 	mod = (void *)info->sechdrs[info->index.mod].sh_addr;
 	kmemleak_load_module(mod, info);
-- 
2.49.0


* [PATCH v2] module: Make sure relocations are applied to the per-CPU section
  2025-06-04 15:27 ` [PATCH] module: Make sure relocations are applied to the per-CPU section Sebastian Andrzej Siewior
@ 2025-06-05  6:07   ` Sebastian Andrzej Siewior
  2025-06-05 13:44     ` Petr Pavlu
  0 siblings, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-05  6:07 UTC (permalink / raw)
  To: linux-modules
  Cc: oe-lkp, lkp, linux-kernel, kernel test robot, Paolo Abeni,
	Allison Henderson, netdev, linux-rdma, rds-devel,
	Luis Chamberlain, Petr Pavlu, Sami Tolvanen, Daniel Gomez,
	Peter Zijlstra, Thomas Gleixner

The per-CPU data section is handled differently than the other sections.
The memory allocation requires a special __percpu pointer and then the
section is copied into the view of each CPU. Therefore the SHF_ALLOC
flag is removed to ensure move_module() skips it.

Later, relocations are applied and apply_relocations() skips sections
without SHF_ALLOC because they have not been copied. This also skips the
per-CPU data section.

The missing relocations result in a NULL pointer on x86-64 and very
small values on x86-32. Unlike a NULL pointer, such a small value is not
skipped, and dereferencing it crashes.

Such an assignment happens during static per-CPU lock initialisation
with lockdep enabled.
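One plausible explanation for the different symptoms, which the thread does not spell out: i386 modules use REL relocations, where the addend is stored in the relocated field itself, while x86-64 uses RELA, where the addend lives in the relocation entry and the field is typically left as zero. Until the relocation is applied, an i386 pointer therefore holds a small offset and an x86-64 pointer holds NULL. The sketch below models R_386_32 (`*loc += S`) and R_X86_64_64 (`*loc = S + A`) in userspace. The helper names are hypothetical, `model_name_is_skipped()` only mimics a generic "skip if NULL" test, and reading 0x1010 (the CR2 value in the oops) as the unrelocated addend is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* REL style, like R_386_32: the addend A is already stored in the field. */
static void model_apply_rel32(uint32_t *loc, uint32_t sym_value)
{
	*loc += sym_value;                    /* S + A */
}

/* RELA style, like R_X86_64_64: the addend comes from the relocation entry. */
static void model_apply_rela64(uint64_t *loc, uint64_t sym_value, int64_t addend)
{
	*loc = sym_value + (uint64_t)addend;  /* S + A */
}

/* Hypothetical "skip if NULL" check, as a name lookup might do. */
static bool model_name_is_skipped(uintptr_t name)
{
	return name == 0;
}
```

Before relocation, the i386-style field is non-NULL, passes the NULL check and would be dereferenced, which is consistent with the `strcmp` fault at address 0x1010 in the report; the x86-64-style field is NULL and would simply be skipped.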

Add the SHF_ALLOC flag back for the per-CPU section (if found) after
move_module().

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506041623.e45e4f7d-lkp@intel.com
Fixes: 8d8022e8aba85 ("module: do percpu allocation after uniqueness check.  No, really!")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1…v2: https://lore.kernel.org/all/20250604152707.CieD9tN0@linutronix.de/
  - Add the flag back only on SMP if the per-CPU section was found.

 kernel/module/main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index 5c6ab20240a6d..4f6554dedf8ea 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2816,6 +2816,10 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
 	if (err)
 		return ERR_PTR(err);

+	/* Add SHF_ALLOC back so that relocations are applied. */
+	if (IS_ENABLED(CONFIG_SMP) && info->index.pcpu)
+		info->sechdrs[info->index.pcpu].sh_flags |= SHF_ALLOC;
+
 	/* Module has been copied to its final place now: return it. */
 	mod = (void *)info->sechdrs[info->index.mod].sh_addr;
 	kmemleak_load_module(mod, info);
-- 
2.49.0


* Re: [PATCH v2] module: Make sure relocations are applied to the per-CPU section
  2025-06-05  6:07   ` [PATCH v2] " Sebastian Andrzej Siewior
@ 2025-06-05 13:44     ` Petr Pavlu
  2025-06-05 14:39       ` Peter Zijlstra
  2025-06-05 15:54       ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 9+ messages in thread
From: Petr Pavlu @ 2025-06-05 13:44 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-modules, oe-lkp, lkp, linux-kernel, kernel test robot,
	Paolo Abeni, Allison Henderson, netdev, linux-rdma, rds-devel,
	Luis Chamberlain, Sami Tolvanen, Daniel Gomez, Peter Zijlstra,
	Thomas Gleixner

On 6/5/25 8:07 AM, Sebastian Andrzej Siewior wrote:
> The per-CPU data section is handled differently than the other sections.
> The memory allocations requires a special __percpu pointer and then the
> section is copied into the view of each CPU. Therefore the SHF_ALLOC
> flag is removed to ensure move_module() skips it.
> 
> Later, relocations are applied and apply_relocations() skips sections
> without SHF_ALLOC because they have not been copied. This also skips the
> per-CPU data section.
> The missing relocations result in a NULL pointer on x86-64 and very
> small values on x86-32. This results in a crash because it is not
> skipped like NULL pointer would and can't be dereferenced.
> 
> Such an assignment happens during static per-CPU lock initialisation
> with lockdep enabled.
> 
> Add the SHF_ALLOC flag back for the per-CPU section (if found) after
> move_module().
> 
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202506041623.e45e4f7d-lkp@intel.com
> Fixes: 8d8022e8aba85 ("module: do percpu allocation after uniqueness check.  No, really!")

Wasn't this already broken earlier by "Don't relocate non-allocated regions in modules."
(pre-Git, [1])?

> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> v1…v2: https://lore.kernel.org/all/20250604152707.CieD9tN0@linutronix.de/
>   - Add the flag back only on SMP if the per-CPU section was found.
> 
>  kernel/module/main.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 5c6ab20240a6d..4f6554dedf8ea 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -2816,6 +2816,10 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
>  	if (err)
>  		return ERR_PTR(err);
>  
> +	/* Add SHF_ALLOC back so that relocations are applied. */
> +	if (IS_ENABLED(CONFIG_SMP) && info->index.pcpu)
> +		info->sechdrs[info->index.pcpu].sh_flags |= SHF_ALLOC;
> +
>  	/* Module has been copied to its final place now: return it. */
>  	mod = (void *)info->sechdrs[info->index.mod].sh_addr;
>  	kmemleak_load_module(mod, info);

This looks like a valid fix. The info->sechdrs[info->index.pcpu].sh_addr
is set by rewrite_section_headers() to point to the percpu data in the
userspace-passed ELF copy. The section has SHF_ALLOC reset, so it
doesn't move and the sh_addr isn't adjusted by move_module(). The
function apply_relocations() then applies the relocations in the initial
ELF copy. Finally, post_relocation() copies the relocated percpu data to
their final per-CPU destinations.
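The flow described above (relocate the per-CPU section in place in the ELF copy, then copy the result to every CPU) can be sketched in userspace as follows. This is a simplified model, not the loader's code: the functions, the two-CPU array standing in for the per-CPU allocation, and the single REL-style 32-bit field at offset 0 are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NR_CPUS 2
#define MODEL_PCPU_SIZE 8

/* Stand-in for the per-CPU allocation: one copy of the section per CPU. */
static unsigned char model_pcpu_area[MODEL_NR_CPUS][MODEL_PCPU_SIZE];

/*
 * Relocate the per-CPU section in place in the loader's ELF copy.
 * Assumes one absolute, REL-style 32-bit field at offset 0 (S + A, with
 * the addend A already stored in the field).
 */
static void model_relocate_tmpl(unsigned char *tmpl, uint32_t sym_value)
{
	uint32_t v;

	memcpy(&v, tmpl, sizeof(v));  /* memcpy avoids unaligned access */
	v += sym_value;
	memcpy(tmpl, &v, sizeof(v));
}

/* Stand-in for the copy done after relocation: template -> every CPU. */
static void model_copy_to_cpus(const unsigned char *tmpl, size_t size)
{
	assert(size <= MODEL_PCPU_SIZE);
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		memcpy(model_pcpu_area[cpu], tmpl, size);
}

/* Read back the relocated field as seen by one CPU. */
static uint32_t model_read_field(int cpu)
{
	uint32_t v;

	memcpy(&v, model_pcpu_area[cpu], sizeof(v));
	return v;
}
```

If the relocation step is skipped, every CPU receives the raw addend instead of a usable address, which is the failure mode this thread is about.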

However, I'm not sure if it is best to manipulate the SHF_ALLOC flag in
this way. It is ok to reset it once, but if we need to set it back again
then I would reconsider this.

An alternative approach could be to teach apply_relocations() that the
percpu section is special and should be relocated even though it doesn't
have SHF_ALLOC set. This would also allow adding a comment explaining
that we're relocating the data in the original ELF copy, which I find
worth mentioning because it differs from other relocation processing.

For instance:

	/*
	 * Don't bother with non-allocated sections.
	 *
	 * An exception is the percpu section, which has separate allocations
	 * for individual CPUs. We relocate the percpu section in the initial
	 * ELF template and subsequently copy it to the per-CPU destinations.
	 */
	if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC) &&
	    infosec != info->index.pcpu)
		continue;
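For experimenting with the proposed condition outside the kernel, here is a hypothetical userspace mirror of it as a predicate. `model_should_relocate()` is not an existing kernel function; `infosec` is the index of the section a relocation section applies to, and `pcpu_index` stands in for `info->index.pcpu`, where 0 means no per-CPU section was found.

```c
#include <assert.h>
#include <stdbool.h>

#define SHF_ALLOC 0x2UL /* same value as in <elf.h> */

/*
 * Hypothetical userspace mirror of the proposed check in apply_relocations():
 * returns false where the kernel loop would "continue".
 */
static bool model_should_relocate(unsigned long target_flags,
				  unsigned int infosec,
				  unsigned int pcpu_index)
{
	/* Non-allocated targets are skipped, except the per-CPU section. */
	if (!(target_flags & SHF_ALLOC) && infosec != pcpu_index)
		return false;
	return true;
}
```

One edge case a real patch may want to rule out: with no per-CPU section (`pcpu_index == 0`), a relocation section whose target index is also 0 would pass the check; whether earlier ELF validation already rejects that is not something this sketch establishes.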

[1] https://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux-fullhistory.git/commit/?id=b3b91325f3c77ace041f769ada7039ebc7aab8de

-- 
Thanks,
Petr


* Re: [PATCH v2] module: Make sure relocations are applied to the per-CPU section
  2025-06-05 13:44     ` Petr Pavlu
@ 2025-06-05 14:39       ` Peter Zijlstra
  2025-06-05 15:54       ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2025-06-05 14:39 UTC (permalink / raw)
  To: Petr Pavlu
  Cc: Sebastian Andrzej Siewior, linux-modules, oe-lkp, lkp,
	linux-kernel, kernel test robot, Paolo Abeni, Allison Henderson,
	netdev, linux-rdma, rds-devel, Luis Chamberlain, Sami Tolvanen,
	Daniel Gomez, Thomas Gleixner

On Thu, Jun 05, 2025 at 03:44:23PM +0200, Petr Pavlu wrote:

> For instance:
> 
> 	/*
> 	 * Don't bother with non-allocated sections.
> 	 *
> 	 * An exception is the percpu section, which has separate allocations
> 	 * for individual CPUs. We relocate the percpu section in the initial
> 	 * ELF template and subsequently copy it to the per-CPU destinations.
> 	 */
> 	if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC) &&
> 	    infosec != info->index.pcpu)
> 		continue;

Right, and pcpu is a data section, so it should only have absolute
relocations, not relative ones.

So copying the relocated data afterwards should not be a problem.
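The point about absolute versus relative relocations can be checked with a little arithmetic. For an absolute relocation such as R_386_32 the stored value is S + A, which does not depend on where the field lives, so copying it elsewhere keeps it correct. A PC-relative one such as R_386_PC32 stores S + A - P, where P is the field's own address, so it is only valid at the address it was relocated for. The helpers below are a hypothetical userspace model, and the addresses used with them are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute (R_386_32-like): value = S + A, independent of the field's address. */
static uint32_t model_reloc_abs(uint32_t S, uint32_t A)
{
	return S + A;
}

/* PC-relative (R_386_PC32-like): value = S + A - P, P = address of the field. */
static uint32_t model_reloc_pcrel(uint32_t S, uint32_t A, uint32_t P)
{
	return S + A - P;
}

/* Address a PC-relative field located at P resolves to (inverse of the above). */
static uint32_t model_pcrel_target(uint32_t field, uint32_t A, uint32_t P)
{
	return field + P - A;
}
```

A PC-relative value computed for the field's address in the ELF copy resolves to S only at that address; after copying it to a per-CPU area at a different address it points somewhere else, while the absolute value is unaffected.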


* Re: [PATCH v2] module: Make sure relocations are applied to the per-CPU section
  2025-06-05 13:44     ` Petr Pavlu
  2025-06-05 14:39       ` Peter Zijlstra
@ 2025-06-05 15:54       ` Sebastian Andrzej Siewior
  2025-06-05 16:50         ` Petr Pavlu
  1 sibling, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-05 15:54 UTC (permalink / raw)
  To: Petr Pavlu
  Cc: linux-modules, oe-lkp, lkp, linux-kernel, kernel test robot,
	Paolo Abeni, Allison Henderson, netdev, linux-rdma, rds-devel,
	Luis Chamberlain, Sami Tolvanen, Daniel Gomez, Peter Zijlstra,
	Thomas Gleixner

On 2025-06-05 15:44:23 [+0200], Petr Pavlu wrote:
> Isn't this broken earlier by "Don't relocate non-allocated regions in modules."
> (pre-Git, [1])?

Looking further back into the history, we have
	21af2f0289dea ("[PATCH] per-cpu support inside modules (minimal)")

which does

+       if (pcpuindex) {
+               /* We have a special allocation for this section. */
+               mod->percpu = percpu_modalloc(sechdrs[pcpuindex].sh_size,
+                                             sechdrs[pcpuindex].sh_addralign);
+               if (!mod->percpu) {
+                       err = -ENOMEM;
+                       goto free_mod;
+               }
+               sechdrs[pcpuindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
+       }

so this looks like the origin.

…
> > --- a/kernel/module/main.c
> > +++ b/kernel/module/main.c
> > @@ -2816,6 +2816,10 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
> >  	if (err)
> >  		return ERR_PTR(err);
> >  
> > +	/* Add SHF_ALLOC back so that relocations are applied. */
> > +	if (IS_ENABLED(CONFIG_SMP) && info->index.pcpu)
> > +		info->sechdrs[info->index.pcpu].sh_flags |= SHF_ALLOC;
> > +
> >  	/* Module has been copied to its final place now: return it. */
> >  	mod = (void *)info->sechdrs[info->index.mod].sh_addr;
> >  	kmemleak_load_module(mod, info);
> 
> This looks like a valid fix. The info->sechdrs[info->index.pcpu].sh_addr
> is set by rewrite_section_headers() to point to the percpu data in the
> userspace-passed ELF copy. The section has SHF_ALLOC reset, so it
> doesn't move and the sh_addr isn't adjusted by move_module(). The
> function apply_relocations() then applies the relocations in the initial
> ELF copy. Finally, post_relocation() copies the relocated percpu data to
> their final per-CPU destinations.
> 
> However, I'm not sure if it is best to manipulate the SHF_ALLOC flag in
> this way. It is ok to reset it once, but if we need to set it back again
> then I would reconsider this.

I had it the other way around at first, but this flag is not considered
anywhere other than in the functions called here. So I decided to add
back what had been taken away.

> An alternative approach could be to teach apply_relocations() that the
> percpu section is special and should be relocated even though it doesn't
> have SHF_ALLOC set. This would also allow adding a comment explaining
> that we're relocating the data in the original ELF copy, which I find
> useful to mention as it is different to other relocation processing.

I'm not sure this makes it better. It looks like it continues a
workaround…
The only reason the flag was removed in the first place is to skip
the copy process.
We could also keep the flag and skip the section during the copy
process based on its index. This was the original intention.
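The alternative mentioned here, keeping SHF_ALLOC and recognising the per-CPU section by its index instead, might look roughly like the sketch below. It is a hypothetical userspace model, not a proposed patch: `struct model_sec` and the helpers are invented for illustration, the index test is applied in a layout pass because the flag also decides which sections get space in the module image, and alignment is ignored. The relocation pass can then keep its plain SHF_ALLOC check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SHF_ALLOC 0x2UL /* same value as in <elf.h> */

struct model_sec {
	unsigned long sh_flags;
	unsigned long sh_size;
};

/*
 * Hypothetical layout/copy filter: SHF_ALLOC sections go into the module
 * image, except the per-CPU section, which is recognised by its index
 * (pcpu == 0 means there is no per-CPU section).
 */
static bool model_goes_into_image(const struct model_sec *s, size_t idx,
				  size_t pcpu)
{
	if (!(s[idx].sh_flags & SHF_ALLOC))
		return false;
	return !(pcpu && idx == pcpu);
}

/* Bytes a layout pass would reserve in the module image (no alignment). */
static unsigned long model_image_size(const struct model_sec *s, size_t n,
				      size_t pcpu)
{
	unsigned long total = 0;

	for (size_t i = 1; i < n; i++)
		if (model_goes_into_image(s, i, pcpu))
			total += s[i].sh_size;
	return total;
}

/* The relocation pass keeps filtering on SHF_ALLOC alone. */
static bool model_gets_relocated(const struct model_sec *s, size_t idx)
{
	return (s[idx].sh_flags & SHF_ALLOC) != 0;
}
```

With this shape the flag is never toggled; the cost is that every pass that must treat the per-CPU section specially has to know its index.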

> For instance:
> 
> 	/*
> 	 * Don't bother with non-allocated sections.
> 	 *
> 	 * An exception is the percpu section, which has separate allocations
> 	 * for individual CPUs. We relocate the percpu section in the initial
> 	 * ELF template and subsequently copy it to the per-CPU destinations.
> 	 */
> 	if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC) &&
> 	    infosec != info->index.pcpu)
> 		continue;
> 

If you insist but…

Sebastian


* Re: [PATCH v2] module: Make sure relocations are applied to the per-CPU section
  2025-06-05 15:54       ` Sebastian Andrzej Siewior
@ 2025-06-05 16:50         ` Petr Pavlu
  2025-06-10 14:55           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 9+ messages in thread
From: Petr Pavlu @ 2025-06-05 16:50 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-modules, oe-lkp, lkp, linux-kernel, kernel test robot,
	Paolo Abeni, Allison Henderson, netdev, linux-rdma, rds-devel,
	Luis Chamberlain, Sami Tolvanen, Daniel Gomez, Peter Zijlstra,
	Thomas Gleixner

On 6/5/25 5:54 PM, Sebastian Andrzej Siewior wrote:
> On 2025-06-05 15:44:23 [+0200], Petr Pavlu wrote:
>> Isn't this broken earlier by "Don't relocate non-allocated regions in modules."
>> (pre-Git, [1])?
> 
> Looking further back into the history, we have
> 	21af2f0289dea ("[PATCH] per-cpu support inside modules (minimal)")
> 
> which does
> 
> +       if (pcpuindex) {
> +               /* We have a special allocation for this section. */
> +               mod->percpu = percpu_modalloc(sechdrs[pcpuindex].sh_size,
> +                                             sechdrs[pcpuindex].sh_addralign);
> +               if (!mod->percpu) {
> +                       err = -ENOMEM;
> +                       goto free_mod;
> +               }
> +               sechdrs[pcpuindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
> +       }
> 
> so this looks like the origin.

This patch added the initial per-cpu support for modules. The relocation
handling at that point appears correct to me. I think it's the mentioned patch
"Don't relocate non-allocated regions in modules" that broke it.

> 
> …
>>> --- a/kernel/module/main.c
>>> +++ b/kernel/module/main.c
>>> @@ -2816,6 +2816,10 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
>>>  	if (err)
>>>  		return ERR_PTR(err);
>>>  
>>> +	/* Add SHF_ALLOC back so that relocations are applied. */
>>> +	if (IS_ENABLED(CONFIG_SMP) && info->index.pcpu)
>>> +		info->sechdrs[info->index.pcpu].sh_flags |= SHF_ALLOC;
>>> +
>>>  	/* Module has been copied to its final place now: return it. */
>>>  	mod = (void *)info->sechdrs[info->index.mod].sh_addr;
>>>  	kmemleak_load_module(mod, info);
>>
>> This looks like a valid fix. The info->sechdrs[info->index.pcpu].sh_addr
>> is set by rewrite_section_headers() to point to the percpu data in the
>> userspace-passed ELF copy. The section has SHF_ALLOC reset, so it
>> doesn't move and the sh_addr isn't adjusted by move_module(). The
>> function apply_relocations() then applies the relocations in the initial
>> ELF copy. Finally, post_relocation() copies the relocated percpu data to
>> their final per-CPU destinations.
>>
>> However, I'm not sure if it is best to manipulate the SHF_ALLOC flag in
>> this way. It is ok to reset it once, but if we need to set it back again
>> then I would reconsider this.
> 
> I had it the other way around, but this flag is not considered anywhere
> except in the functions called here. So I decided to add back what had
> been taken away.
> 
>> An alternative approach could be to teach apply_relocations() that the
>> percpu section is special and should be relocated even though it doesn't
>> have SHF_ALLOC set. This would also allow adding a comment explaining
>> that we're relocating the data in the original ELF copy, which I find
>> useful to mention as it is different to other relocation processing.
> 
> Not sure if this makes it better. It looks like it continues a
> workaround…
> The only reason it was removed in the first place was to skip
> the copy process.

The SHF_ALLOC flag is also removed to prevent the section from being allocated
by layout_sections().
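A simplified stand-in for the layout step shows why clearing SHF_ALLOC keeps the percpu section out of the module image. It assumes a flat layout in section order with power-of-two alignment. The real layout_sections() groups sections by flags into several memory regions, and `model_layout_sections()`, `struct model_sec`, and `MODEL_NOT_PLACED` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#ifndef SHF_ALLOC
#define SHF_ALLOC 0x2u /* ELF spec value */
#endif

#define MODEL_NOT_PLACED ((size_t)-1)

/* Hypothetical, simplified section header for the layout model. */
struct model_sec {
	unsigned long sh_flags;
	size_t sh_size;
	size_t sh_addralign; /* power of two, or 0/1 for no alignment */
	size_t offset;       /* assigned by the model; MODEL_NOT_PLACED if skipped */
};

/* Round x up to a multiple of align (align must be a power of two). */
static size_t model_align(size_t x, size_t align)
{
	if (align <= 1)
		return x;
	return (x + align - 1) & ~(align - 1);
}

/*
 * Give each SHF_ALLOC section an offset in the module image and return the
 * total image size. Sections without SHF_ALLOC, such as the percpu section
 * once its flag is cleared, get no space in the image.
 */
size_t model_layout_sections(struct model_sec *sec, size_t n)
{
	size_t size = 0;

	for (size_t i = 0; i < n; i++) {
		if (!(sec[i].sh_flags & SHF_ALLOC)) {
			sec[i].offset = MODEL_NOT_PLACED;
			continue;
		}
		size = model_align(size, sec[i].sh_addralign);
		sec[i].offset = size;
		size += sec[i].sh_size;
	}
	return size;
}
```

With a 10-byte section (align 4), a cleared percpu section, and a 4-byte section (align 8), the model places the sections at offsets 0 and 16 and returns a total image size of 20.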

> We could also keep the flag and skip the section during the copy
> process based on its id. This was the original intention.
> 
>> For instance:
>>
>> 	/*
>> 	 * Don't bother with non-allocated sections.
>> 	 *
>> 	 * An exception is the percpu section, which has separate allocations
>> 	 * for individual CPUs. We relocate the percpu section in the initial
>> 	 * ELF template and subsequently copy it to the per-CPU destinations.
>> 	 */
>> 	if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC) &&
>> 	    infosec != info->index.pcpu)
>> 		continue;
>>
> 
> If you insist but…

It seems logical to me that the SHF_ALLOC flag is removed for the percpu section
since it isn't directly allocated by the regular process. This is consistent
with what the module loader does in other similar cases. I could also understand
keeping the flag and explicitly skipping the layout and allocate process for the
section. However, adjusting the flag back and forth to trigger the right code
paths in between seems fragile to me and harder to maintain if we need to
shuffle things around in the future.
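The alternative of keeping the flag and skipping the percpu section explicitly could look roughly like the following copy-selection step. It is a hypothetical sketch, not code from the thread or the kernel. `model_copy_skip_pcpu()` is an invented name, and the percpu section is identified only by its index. Under this scheme the layout and allocation steps would need the same index check. Because the section keeps SHF_ALLOC, a relocation pass that filters on SHF_ALLOC would process it without a special case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef SHF_ALLOC
#define SHF_ALLOC 0x2u /* ELF spec value */
#endif

/*
 * Decide which sections would be copied into the module image if the percpu
 * section kept SHF_ALLOC and were skipped by its index instead. copied[i] is
 * set for every section that would be copied; the return value is their count.
 */
size_t model_copy_skip_pcpu(const unsigned long *sh_flags, size_t n,
			    size_t pcpu_idx, bool *copied)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		copied[i] = false;
		if (!(sh_flags[i] & SHF_ALLOC))
			continue; /* not part of the image at all */
		if (i == pcpu_idx)
			continue; /* percpu has its own allocation: skip by index */
		copied[i] = true;
		count++;
	}
	return count;
}
```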

-- 
Cheers,
Petr


* Re: [PATCH v2] module: Make sure relocations are applied to the per-CPU section
  2025-06-05 16:50         ` Petr Pavlu
@ 2025-06-10 14:55           ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-06-10 14:55 UTC (permalink / raw)
  To: Petr Pavlu
  Cc: linux-modules, oe-lkp, lkp, linux-kernel, kernel test robot,
	Paolo Abeni, Allison Henderson, netdev, linux-rdma, rds-devel,
	Luis Chamberlain, Sami Tolvanen, Daniel Gomez, Peter Zijlstra,
	Thomas Gleixner

On 2025-06-05 18:50:27 [+0200], Petr Pavlu wrote:
> On 6/5/25 5:54 PM, Sebastian Andrzej Siewior wrote:
> > On 2025-06-05 15:44:23 [+0200], Petr Pavlu wrote:
> >> Isn't this broken earlier by "Don't relocate non-allocated regions in modules."
> >> (pre-Git, [1])?
> > 
> > Looking further back into the history, we have
> > 	21af2f0289dea ("[PATCH] per-cpu support inside modules (minimal)")
> > 
> > which does
> > 
> > +       if (pcpuindex) {
> > +               /* We have a special allocation for this section. */
> > +               mod->percpu = percpu_modalloc(sechdrs[pcpuindex].sh_size,
> > +                                             sechdrs[pcpuindex].sh_addralign);
> > +               if (!mod->percpu) {
> > +                       err = -ENOMEM;
> > +                       goto free_mod;
> > +               }
> > +               sechdrs[pcpuindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
> > +       }
> > 
> > so this looks like the origin.
> 
> This patch added the initial per-cpu support for modules. The relocation
> handling at that point appears correct to me. I think it's the mentioned patch
> "Don't relocate non-allocated regions in modules" that broke it.

Ach, it ignores that bit. Okay then.

> It seems logical to me that the SHF_ALLOC flag is removed for the percpu section
> since it isn't directly allocated by the regular process. This is consistent
> with what the module loader does in other similar cases. I could also understand
> keeping the flag and explicitly skipping the layout and allocate process for the
> section. However, adjusting the flag back and forth to trigger the right code
> paths in between seems fragile to me and harder to maintain if we need to
> shuffle things around in the future.

Okay. Let me add this exception later on instead of adding the bit back.
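The exception described here, relocating the percpu section even though SHF_ALLOC has been cleared on it, reduces to a small predicate that mirrors the condition in Petr's suggested snippet. `model_should_relocate()` is a hypothetical helper. In the suggested change the check stays inline in apply_relocations(), with `infosec` and `info->index.pcpu` as its inputs.

```c
#include <assert.h>
#include <stdbool.h>

#ifndef SHF_ALLOC
#define SHF_ALLOC 0x2u /* ELF spec value */
#endif

/*
 * Decide whether relocations targeting section infosec should be applied.
 * flags is that section's sh_flags; pcpu_idx is the percpu section index.
 */
bool model_should_relocate(unsigned long flags, unsigned int infosec,
			   unsigned int pcpu_idx)
{
	/* Allocated sections are always relocated. */
	if (flags & SHF_ALLOC)
		return true;
	/*
	 * Exception: the percpu section lost SHF_ALLOC, but its template in the
	 * initial ELF copy must still be relocated before it is copied per CPU.
	 */
	return infosec == pcpu_idx;
}
```

This is the negation of the snippet's `continue` condition: a section is skipped only when it lacks SHF_ALLOC and is not the percpu section.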

Sebastian



Thread overview: 9+ messages
2025-06-04  8:42 [linus:master] [rds] c50d295c37: BUG:unable_to_handle_page_fault_for_address kernel test robot
2025-06-04 11:04 ` Sebastian Andrzej Siewior
2025-06-04 15:27 ` [PATCH] module: Make sure relocations are applied to the per-CPU section Sebastian Andrzej Siewior
2025-06-05  6:07   ` [PATCH v2] " Sebastian Andrzej Siewior
2025-06-05 13:44     ` Petr Pavlu
2025-06-05 14:39       ` Peter Zijlstra
2025-06-05 15:54       ` Sebastian Andrzej Siewior
2025-06-05 16:50         ` Petr Pavlu
2025-06-10 14:55           ` Sebastian Andrzej Siewior
