Linux Trace Kernel
* [PATCH bpf-next] x86/ftrace: relocate %rip-relative percpu refs in dynamic trampolines
From: Alexis Lothoré (eBPF Foundation) @ 2026-05-27 19:12 UTC
  To: Steven Rostedt, Masami Hiramatsu, Mark Rutland, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Uros Bizjak
  Cc: Thomas Petazzoni, Ingo Molnar, linux-kernel, linux-trace-kernel,
	bpf, ebpf, Bastien Curutchet,
	Alexis Lothoré (eBPF Foundation)

With CONFIG_CALL_DEPTH_TRACKING enabled on a retbleed-affected x86
platform (e.g. Skylake) booted with retbleed=stuff, registering a
dynamic ftrace trampoline crashes on the first call into the traced
function:

  [    9.630365] BUG: unable to handle page fault for address: ffff88817ae18880
  [    9.630365] #PF: supervisor write access in kernel mode
  [    9.630365] #PF: error_code(0x0002) - not-present page
  [    9.630365] PGD 4b53067 P4D 4b53067 PUD 0
  [    9.630365] Oops: Oops: 0002 [#1] SMP PTI
  [    9.630365] CPU: 3 UID: 0 PID: 187 Comm: usleep Not tainted 7.0.10 #243 PREEMPT(full)
  [    9.630365] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.17.0-2-2 04/01/2014
  [    9.630365] RIP: 0010:0xffffffffc0400058
  [    9.630365] Code: 24 78 00 00 00 00 48 89 ea 48 89 54 24 20 48 8b b4 24 b8 00 00 00 48 8b bc 24 b0 00 00 00 48 89 bc 24 80 00 00 00 48 83 ef 05 <65> 48 c1 3d 1f a8 b6 02 05 48 8b 15 f6 00 00 00 4c 89 3c 24 4c 89
  [    9.630365] RSP: 0018:ffffc90000a3fe60 EFLAGS: 00010382
  [    9.630365] RAX: ffffffffffffffff RBX: ffffc90000a3ff58 RCX: 0000000000000000
  [    9.630365] RDX: ffffc90000a3ff48 RSI: ffffffff82198e40 RDI: ffffffff813f5654
  [    9.630365] RBP: ffffc90000a3ff48 R08: 0000000000000000 R09: 0000000000000000
  [    9.630365] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8881030d3780
  [    9.630365] R13: 00000000000000e6 R14: 0000000000000000 R15: 0000000000000000
  [    9.630365] FS:  00007f081d131740(0000) GS:ffff8881b7eae000(0000) knlGS:0000000000000000
  [    9.630365] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [    9.630365] CR2: ffff88817ae18880 CR3: 00000001033dc006 CR4: 00000000003706f0
  [    9.630365] Call Trace:
  [    9.630365]  <TASK>
  [    9.630365]  ? find_held_lock+0x2b/0x80
  [    9.630365]  ? exc_page_fault+0x74/0x220
  [    9.630365]  ? lock_release+0xe1/0x320
  [    9.630365]  ? __x64_sys_clock_nanosleep+0x9/0x1a0
  [    9.630365]  ? lockdep_hardirqs_on_prepare+0xd9/0x190
  [    9.630365]  ? trace_hardirqs_on+0x18/0x100
  [    9.630365]  __x64_sys_clock_nanosleep+0x9/0x1a0
  [    9.630365]  do_syscall_64+0x100/0x5f0
  [    9.630365]  ? exc_page_fault+0x1e0/0x220
  [    9.630365]  ? call_depth_return_thunk+0x2a/0xd0
  [    9.630365]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
  [    9.630365] RIP: 0033:0x7f081d20ad83
  [    9.630365] Code: ff ff c3 0f 1f 40 00 83 ff 03 74 7b 83 ff 02 b8 fa ff ff ff 49 89 ca 0f 44 f8 80 3d c6 d2 10 00 00 74 14 b8 e6 00 00 00 0f 05 <f7> d8 c3 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 28 48 89 54 24 10
  [    9.630365] RSP: 002b:00007ffd539e1328 EFLAGS: 00000202 ORIG_RAX: 00000000000000e6
  [    9.630365] RAX: ffffffffffffffda RBX: 0000000000000103 RCX: 00007f081d20ad83
  [    9.630365] RDX: 00007ffd539e1340 RSI: 0000000000000000 RDI: 0000000000000000
  [    9.630365] RBP: 00007ffd539e14f8 R08: 0000000000000000 R09: 0000000000000000
  [    9.630365] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
  [    9.630365] R13: 00007ffd539e1510 R14: 00007f081d370000 R15: 00005619c9d78338
  [    9.630365]  </TASK>
  [    9.630365] Modules linked in:
  [    9.630365] CR2: ffff88817ae18880
  [    9.630365] ---[ end trace 0000000000000000 ]---
  [    9.630365] RIP: 0010:0xffffffffc0400058
  [    9.630365] Code: 24 78 00 00 00 00 48 89 ea 48 89 54 24 20 48 8b b4 24 b8 00 00 00 48 8b bc 24 b0 00 00 00 48 89 bc 24 80 00 00 00 48 83 ef 05 <65> 48 c1 3d 1f a8 b6 02 05 48 8b 15 f6 00 00 00 4c 89 3c 24 4c 89
  [    9.630365] RSP: 0018:ffffc90000a3fe60 EFLAGS: 00010382
  [    9.630365] RAX: ffffffffffffffff RBX: ffffc90000a3ff58 RCX: 0000000000000000
  [    9.630365] RDX: ffffc90000a3ff48 RSI: ffffffff82198e40 RDI: ffffffff813f5654
  [    9.630365] RBP: ffffc90000a3ff48 R08: 0000000000000000 R09: 0000000000000000
  [    9.630365] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8881030d3780
  [    9.630365] R13: 00000000000000e6 R14: 0000000000000000 R15: 0000000000000000
  [    9.630365] FS:  00007f081d131740(0000) GS:ffff8881b7eae000(0000) knlGS:0000000000000000
  [    9.630365] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [    9.630365] CR2: ffff88817ae18880 CR3: 00000001033dc006 CR4: 00000000003706f0
  [    9.630365] Kernel panic - not syncing: Fatal exception

The following small reproducer triggers the crash:

  # echo 'p __x64_sys_clock_nanosleep' > /sys/kernel/tracing/kprobe_events
  # echo 1 > /sys/kernel/tracing/events/kprobes/p___x64_sys_clock_nanosleep_0/enable
  # usleep 1

Monitoring the crash under GDB points to the exact instruction in charge
of incrementing the call depth:

  sarq $5, %gs:__x86_call_depth(%rip)

This instruction matches the one emitted in ftrace_regs_caller from
ftrace_64.S. This code most likely worked until commit 59bec00ace28
("x86/percpu: Introduce %rip-relative addressing to PER_CPU_VAR()"),
which made the call depth accounting address %rip-relative instead of
absolute. Because the exact location of this code depends on where the
trampoline lives in memory, the displacement encoded in the instruction
must be adjusted at runtime so that it still resolves to the per-cpu
__x86_call_depth value. Without that adjustment the access targets a
wrong address, which leads to the page fault seen above.
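
The faulting address can be cross-checked against the oops itself. The
sketch below is only an illustration (the helper names are made up for
this example and are not kernel APIs): it decodes the disp32 of the
faulting instruction from the "Code:" line and adds it to the address of
the next instruction and to the GS base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Faulting instruction bytes from the "Code:" line, starting at <65>. */
static const uint8_t oops_insn[] = {
	0x65,                   /* %gs segment override */
	0x48,                   /* REX.W */
	0xc1,                   /* shift group with imm8 count */
	0x3d,                   /* ModRM: mod=00, reg=7 (sar), rm=101 (%rip + disp32) */
	0x1f, 0xa8, 0xb6, 0x02, /* disp32, little-endian */
	0x05,                   /* imm8 shift count */
};

/* Hypothetical helper: extract disp32 from the 9-byte encoding above. */
static int32_t read_disp32(const uint8_t *insn)
{
	uint32_t v;

	/* ModRM must select %rip-relative addressing (mod=00, rm=101). */
	assert((insn[3] & 0xc7) == 0x05);

	v = (uint32_t)insn[4] | ((uint32_t)insn[5] << 8) |
	    ((uint32_t)insn[6] << 16) | ((uint32_t)insn[7] << 24);
	return (int32_t)v;
}

/*
 * Hypothetical helper: linear address accessed by a %gs-prefixed,
 * %rip-relative operand, i.e. gs_base + next instruction address + disp32
 * (arithmetic wraps modulo 2^64).
 */
static uint64_t percpu_rip_target(uint64_t insn_addr, size_t insn_len,
				  int32_t disp, uint64_t gs_base)
{
	uint64_t next_ip = insn_addr + insn_len;

	return gs_base + next_ip + (uint64_t)(int64_t)disp;
}
```

With RIP = 0xffffffffc0400058 and the GS base 0xffff8881b7eae000 taken
from the register dump, percpu_rip_target() returns 0xffff88817ae18880,
which is exactly the CR2 value reported in the oops: the stale
displacement is applied relative to the trampoline address.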

Fix the %rip-relative displacement of the copied CALL_DEPTH_ACCOUNT
instruction by calling text_poke_apply_relocation() on the trampoline,
as the x86 BPF JIT compiler already does, for example, through
x86_call_depth_emit_accounting(). This corrects both CALL_DEPTH_ACCOUNT
slots, in ftrace_caller and ftrace_regs_caller.
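
For illustration only, the adjustment needed for a single instruction
can be sketched as follows. This is not the kernel implementation:
text_poke_apply_relocation() decodes every instruction in the buffer,
whereas this simplified, hypothetical helper assumes the caller already
knows where the disp32 field sits. All names and addresses used with it
are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit signed value. */
static int32_t read_le32(const uint8_t *p)
{
	uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
		     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
	return (int32_t)v;
}

/* Write a little-endian 32-bit signed value. */
static void write_le32(uint8_t *p, int32_t val)
{
	uint32_t v = (uint32_t)val;

	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/* Target of a %rip-relative operand: next instruction address + disp32. */
static uint64_t rip_target(uint64_t insn_addr, size_t insn_len, int32_t disp)
{
	return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}

/*
 * Hypothetical helper: after an instruction was copied from orig_addr to
 * copy_addr, add (orig_addr - copy_addr) to its disp32 so that the copy
 * still points at the same target as the original.
 */
static void relocate_rip_disp32(uint8_t *copy, size_t disp_off,
				uint64_t orig_addr, uint64_t copy_addr)
{
	int64_t delta = (int64_t)(orig_addr - copy_addr);
	int64_t fixed = (int64_t)read_le32(copy + disp_off) + delta;

	/* The target must remain reachable with a signed 32-bit offset. */
	assert(fixed >= INT32_MIN && fixed <= INT32_MAX);
	write_le32(copy + disp_off, (int32_t)fixed);
}
```

After relocate_rip_disp32() has run, rip_target() evaluated at the copy
address yields the same value as rip_target() evaluated at the original
address with the original displacement, which is the property the
trampoline needs for its per-cpu access.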

Fixes: 59bec00ace28 ("x86/percpu: Introduce %rip-relative addressing to PER_CPU_VAR()")
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
---
 arch/x86/kernel/ftrace.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 0543b57f54ee..357df1b2922c 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -375,6 +375,13 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 			goto fail;
 	}
 
+	/*
+	 * The copied trampoline may contain %rip-relative operands whose
+	 * displacements must be adjusted for the trampoline's address.
+	 */
+	text_poke_apply_relocation(trampoline, trampoline, size,
+				   (void *)start_offset, size);
+
 	/*
 	 * The address of the ftrace_ops that is used for this trampoline
 	 * is stored at the end of the trampoline. This will be used to

---
base-commit: aef70d0806e39b83f1fbecc32c72cc328751292a
change-id: 20260527-fix_call_depth_in_trampoline-80bc56930c8f

Best regards,
--  
Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>

