* [PATCH] static_call: use CFI-compliant return0 stubs
[not found] <20260309223156.GA73501@google.com>
@ 2026-03-11 22:57 ` Carlos Llamas
2026-03-11 23:14 ` Peter Zijlstra
0 siblings, 1 reply; 6+ messages in thread
From: Carlos Llamas @ 2026-03-11 22:57 UTC (permalink / raw)
To: Sami Tolvanen, Catalin Marinas, Will Deacon, Peter Zijlstra,
Josh Poimboeuf, Jason Baron, Alice Ryhl, Steven Rostedt,
Ard Biesheuvel, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, James Clark, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
Valentin Schneider, Kees Cook, Linus Walleij,
Borislav Petkov (AMD), Nathan Chancellor, Thomas Gleixner,
Mathieu Desnoyers, Shaopeng Tan, Jens Remus, Juergen Gross,
Carlos Llamas, Conor Dooley, David Kaplan, Lukas Bulwahn,
Jinjie Ruan, James Morse, Thomas Huth, Sean Christopherson,
Paolo Bonzini
Cc: kernel-team, linux-kernel, Will McVicker, Thomas Weißschuh,
moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
open list:PERFORMANCE EVENTS SUBSYSTEM
Architectures with !HAVE_STATIC_CALL (such as arm64) rely on the generic
static_call implementation via indirect calls. In particular, users of
DEFINE_STATIC_CALL_RET0 default to the generic __static_call_return0
stub to optimize the unset path.
However, __static_call_return0 has a fixed signature of "long (*)(void)"
which may not match the expected prototype at callsites. This triggers
CFI failures when CONFIG_CFI is enabled. A trivial linux-perf command
does it:
$ perf record -a sleep 1
CFI failure at perf_prepare_sample+0x98/0x7f8 (target: __static_call_return0+0x0/0x10; expected type: 0x837de525)
Internal error: Oops - CFI: 00000000f2008228 [#1] SMP
Modules linked in:
CPU: 0 UID: 0 PID: 638 Comm: perf Not tainted 7.0.0-rc3 #25 PREEMPT
Hardware name: linux,dummy-virt (DT)
pstate: 900000c5 (NzcV daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : perf_prepare_sample+0x98/0x7f8
lr : perf_prepare_sample+0x70/0x7f8
sp : ffff80008289bc20
x29: ffff80008289bc30 x28: 000000000000001f x27: 0000000000000018
x26: 0000000000000100 x25: ffffffffffffffff x24: 0000000000000000
x23: 0000000000010187 x22: ffff8000851eba40 x21: 0000000000010087
x20: ffff0000098c9ea0 x19: ffff80008289bdc0 x18: 0000000000000000
x17: 00000000837de525 x16: 0000000072923c8f x15: 7fffffffffffffff
x14: 00007fffffffffff x13: 00000000ffffffea x12: 0000000000000000
x11: 0000000000000015 x10: 0000000000000000 x9 : ffff8000822f2240
x8 : ffff800080276e4c x7 : 0000000000000000 x6 : 0000000000000000
x5 : 0000000000000000 x4 : ffff8000851eba10 x3 : ffff8000851eba40
x2 : ffff8000822f2240 x1 : 0000000000000000 x0 : 00000009d377c3a0
Call trace:
perf_prepare_sample+0x98/0x7f8 (P)
perf_event_output_forward+0x5c/0x17c
__perf_event_overflow+0x2fc/0x460
perf_event_overflow+0x1c/0x28
armv8pmu_handle_irq+0x134/0x210
[...]
To fix this, let architectures provide an ARCH_DEFINE_TYPED_STUB_RET0
implementation that generates individual signature-matching stubs for
users of DEFINE_STATIC_CALL_RET0. This ensures the CFI type hash of the
call target matches the one expected at the callsite.
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will McVicker <willmcvicker@google.com>
Fixes: 87b940a0675e ("perf/core: Use static_call to optimize perf_guest_info_callbacks")
Closes: https://lore.kernel.org/all/YfrQzoIWyv9lNljh@google.com/
Suggested-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
arch/Kconfig | 4 ++++
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/linkage.h | 3 ++-
arch/arm64/include/asm/static_call.h | 23 +++++++++++++++++++++++
include/linux/static_call.h | 19 ++++++++++++++++++-
kernel/events/core.c | 11 +++++++----
kernel/sched/core.c | 4 ++--
7 files changed, 57 insertions(+), 8 deletions(-)
create mode 100644 arch/arm64/include/asm/static_call.h
diff --git a/arch/Kconfig b/arch/Kconfig
index 102ddbd4298e..7735d548f02e 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1678,6 +1678,10 @@ config HAVE_STATIC_CALL_INLINE
depends on HAVE_STATIC_CALL
select OBJTOOL
+config HAVE_STATIC_CALL_TYPED_STUBS
+ bool
+ depends on !HAVE_STATIC_CALL
+
config HAVE_PREEMPT_DYNAMIC
bool
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 38dba5f7e4d2..b370c31a23cf 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -252,6 +252,7 @@ config ARM64
select HAVE_RSEQ
select HAVE_RUST if RUSTC_SUPPORTS_ARM64
select HAVE_STACKPROTECTOR
+ select HAVE_STATIC_CALL_TYPED_STUBS if CFI
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_KPROBES
select HAVE_KRETPROBES
diff --git a/arch/arm64/include/asm/linkage.h b/arch/arm64/include/asm/linkage.h
index 40bd17add539..5625ea365d27 100644
--- a/arch/arm64/include/asm/linkage.h
+++ b/arch/arm64/include/asm/linkage.h
@@ -4,9 +4,10 @@
#ifdef __ASSEMBLER__
#include <asm/assembler.h>
#endif
+#include <linux/stringify.h>
#define __ALIGN .balign CONFIG_FUNCTION_ALIGNMENT
-#define __ALIGN_STR ".balign " #CONFIG_FUNCTION_ALIGNMENT
+#define __ALIGN_STR __stringify(__ALIGN)
/*
* When using in-kernel BTI we need to ensure that PCS-conformant
diff --git a/arch/arm64/include/asm/static_call.h b/arch/arm64/include/asm/static_call.h
new file mode 100644
index 000000000000..ef754b58b1c9
--- /dev/null
+++ b/arch/arm64/include/asm/static_call.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM64_STATIC_CALL_H
+#define _ASM_ARM64_STATIC_CALL_H
+
+#include <linux/compiler.h>
+#include <asm/linkage.h>
+
+/* Generates a CFI-compliant "return 0" stub matching @reffunc signature */
+#define __ARCH_DEFINE_TYPED_STUB_RET0(name, reffunc) \
+ typeof(reffunc) name; \
+ __ADDRESSABLE(name); \
+ asm( \
+ " " __ALIGN_STR " \n" \
+ " .4byte __kcfi_typeid_" #name "\n" \
+ #name ": \n" \
+ " bti c \n" \
+ " mov x0, xzr \n" \
+ " ret" \
+ );
+#define ARCH_DEFINE_TYPED_STUB_RET0(name, reffunc) \
+ __ARCH_DEFINE_TYPED_STUB_RET0(name, reffunc)
+
+#endif /* _ASM_ARM64_STATIC_CALL_H */
diff --git a/include/linux/static_call.h b/include/linux/static_call.h
index 78a77a4ae0ea..6cb44441dfe0 100644
--- a/include/linux/static_call.h
+++ b/include/linux/static_call.h
@@ -184,6 +184,8 @@ extern int static_call_text_reserved(void *start, void *end);
extern long __static_call_return0(void);
+#define STATIC_CALL_STUB_RET0(...) ((void *)&__static_call_return0)
+
#define DEFINE_STATIC_CALL(name, _func) \
DECLARE_STATIC_CALL(name, _func); \
struct static_call_key STATIC_CALL_KEY(name) = { \
@@ -270,6 +272,8 @@ static inline int static_call_text_reserved(void *start, void *end)
extern long __static_call_return0(void);
+#define STATIC_CALL_STUB_RET0(...) ((void *)&__static_call_return0)
+
#define EXPORT_STATIC_CALL(name) \
EXPORT_SYMBOL(STATIC_CALL_KEY(name)); \
EXPORT_SYMBOL(STATIC_CALL_TRAMP(name))
@@ -294,6 +298,18 @@ static inline long __static_call_return0(void)
return 0;
}
+#ifdef CONFIG_HAVE_STATIC_CALL_TYPED_STUBS
+#include <asm/static_call.h>
+
+#define STATIC_CALL_STUB_RET0(name) __static_call_##name
+#define DEFINE_STATIC_CALL_STUB_RET0(name, _func) \
+ ARCH_DEFINE_TYPED_STUB_RET0(STATIC_CALL_STUB_RET0(name), _func)
+#else
+/* Fall back to the generic __static_call_return0 stub */
+#define STATIC_CALL_STUB_RET0(...) ((void *)&__static_call_return0)
+#define DEFINE_STATIC_CALL_STUB_RET0(...)
+#endif
+
#define __DEFINE_STATIC_CALL(name, _func, _func_init) \
DECLARE_STATIC_CALL(name, _func); \
struct static_call_key STATIC_CALL_KEY(name) = { \
@@ -307,7 +323,8 @@ static inline long __static_call_return0(void)
__DEFINE_STATIC_CALL(name, _func, NULL)
#define DEFINE_STATIC_CALL_RET0(name, _func) \
- __DEFINE_STATIC_CALL(name, _func, __static_call_return0)
+ DEFINE_STATIC_CALL_STUB_RET0(name, _func) \
+ __DEFINE_STATIC_CALL(name, _func, STATIC_CALL_STUB_RET0(name))
static inline void __static_call_nop(void) { }
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1f5699b339ec..6ac00e89d320 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7695,16 +7695,19 @@ void perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
}
EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
+#define static_call_disable(name) \
+ static_call_update(name, STATIC_CALL_STUB_RET0(name))
+
void perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
{
if (WARN_ON_ONCE(rcu_access_pointer(perf_guest_cbs) != cbs))
return;
rcu_assign_pointer(perf_guest_cbs, NULL);
- static_call_update(__perf_guest_state, (void *)&__static_call_return0);
- static_call_update(__perf_guest_get_ip, (void *)&__static_call_return0);
- static_call_update(__perf_guest_handle_intel_pt_intr, (void *)&__static_call_return0);
- static_call_update(__perf_guest_handle_mediated_pmi, (void *)&__static_call_return0);
+ static_call_disable(__perf_guest_state);
+ static_call_disable(__perf_guest_get_ip);
+ static_call_disable(__perf_guest_handle_intel_pt_intr);
+ static_call_disable(__perf_guest_handle_mediated_pmi);
synchronize_rcu();
}
EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b7f77c165a6e..57c441d01564 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7443,12 +7443,12 @@ EXPORT_SYMBOL(__cond_resched);
#ifdef CONFIG_PREEMPT_DYNAMIC
# ifdef CONFIG_HAVE_PREEMPT_DYNAMIC_CALL
# define cond_resched_dynamic_enabled __cond_resched
-# define cond_resched_dynamic_disabled ((void *)&__static_call_return0)
+# define cond_resched_dynamic_disabled STATIC_CALL_STUB_RET0(cond_resched)
DEFINE_STATIC_CALL_RET0(cond_resched, __cond_resched);
EXPORT_STATIC_CALL_TRAMP(cond_resched);
# define might_resched_dynamic_enabled __cond_resched
-# define might_resched_dynamic_disabled ((void *)&__static_call_return0)
+# define might_resched_dynamic_disabled STATIC_CALL_STUB_RET0(might_resched)
DEFINE_STATIC_CALL_RET0(might_resched, __cond_resched);
EXPORT_STATIC_CALL_TRAMP(might_resched);
# elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
--
2.53.0.473.g4a7958ca14-goog
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] static_call: use CFI-compliant return0 stubs
2026-03-11 22:57 ` [PATCH] static_call: use CFI-compliant return0 stubs Carlos Llamas
@ 2026-03-11 23:14 ` Peter Zijlstra
2026-03-12 0:16 ` Carlos Llamas
0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2026-03-11 23:14 UTC (permalink / raw)
To: Carlos Llamas
Cc: Sami Tolvanen, Catalin Marinas, Will Deacon, Josh Poimboeuf,
Jason Baron, Alice Ryhl, Steven Rostedt, Ard Biesheuvel,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
James Clark, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Ben Segall, Mel Gorman, Valentin Schneider, Kees Cook,
Linus Walleij, Borislav Petkov (AMD), Nathan Chancellor,
Thomas Gleixner, Mathieu Desnoyers, Shaopeng Tan, Jens Remus,
Juergen Gross, Conor Dooley, David Kaplan, Lukas Bulwahn,
Jinjie Ruan, James Morse, Thomas Huth, Sean Christopherson,
Paolo Bonzini, kernel-team, linux-kernel, Will McVicker,
Thomas Weißschuh,
moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
open list:PERFORMANCE EVENTS SUBSYSTEM
On Wed, Mar 11, 2026 at 10:57:40PM +0000, Carlos Llamas wrote:
> Architectures with !HAVE_STATIC_CALL (such as arm64) rely on the generic
> static_call implementation via indirect calls. In particular, users of
> DEFINE_STATIC_CALL_RET0 default to the generic __static_call_return0
> stub to optimize the unset path.
>
> However, __static_call_return0 has a fixed signature of "long (*)(void)"
> which may not match the expected prototype at callsites. This triggers
> CFI failures when CONFIG_CFI is enabled. A trivial linux-perf command
> does it:
*sigh*...
And ARM64 can't really do the inline thing because its immediate range
is too small and it all turns into a mess constructing the address in a
register and doing an indirect call anyway, right?
I'll stare at it in more detail tomorrow.
* Re: [PATCH] static_call: use CFI-compliant return0 stubs
2026-03-11 23:14 ` Peter Zijlstra
@ 2026-03-12 0:16 ` Carlos Llamas
2026-03-12 7:40 ` Ard Biesheuvel
0 siblings, 1 reply; 6+ messages in thread
From: Carlos Llamas @ 2026-03-12 0:16 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Sami Tolvanen, Catalin Marinas, Will Deacon, Josh Poimboeuf,
Jason Baron, Alice Ryhl, Steven Rostedt, Ard Biesheuvel,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
James Clark, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Ben Segall, Mel Gorman, Valentin Schneider, Kees Cook,
Linus Walleij, Borislav Petkov (AMD), Nathan Chancellor,
Thomas Gleixner, Mathieu Desnoyers, Shaopeng Tan, Jens Remus,
Juergen Gross, Conor Dooley, David Kaplan, Lukas Bulwahn,
Jinjie Ruan, James Morse, Thomas Huth, Sean Christopherson,
Paolo Bonzini, kernel-team, linux-kernel, Will McVicker,
Thomas Weißschuh,
moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
open list:PERFORMANCE EVENTS SUBSYSTEM
On Thu, Mar 12, 2026 at 12:14:06AM +0100, Peter Zijlstra wrote:
> On Wed, Mar 11, 2026 at 10:57:40PM +0000, Carlos Llamas wrote:
> > Architectures with !HAVE_STATIC_CALL (such as arm64) rely on the generic
> > static_call implementation via indirect calls. In particular, users of
> > DEFINE_STATIC_CALL_RET0 default to the generic __static_call_return0
> > stub to optimize the unset path.
> >
> > However, __static_call_return0 has a fixed signature of "long (*)(void)"
> > which may not match the expected prototype at callsites. This triggers
> > CFI failures when CONFIG_CFI is enabled. A trivial linux-perf command
> > does it:
>
> *sigh*...
>
> And ARM64 can't really do the inline thing because its immediate range
> is too small and it all turns into a mess constructing the address in a
> register and doing an indirect call anyway, right?
>
Right, the range for the jump is very limited. I _think_ tracepoints
have managed to implement the trampoline work-around:
arch/arm64/kernel/ftrace.c
So it looks do-able I think, but via a much more complex route.
--
Carlos Llamas
* Re: [PATCH] static_call: use CFI-compliant return0 stubs
2026-03-12 0:16 ` Carlos Llamas
@ 2026-03-12 7:40 ` Ard Biesheuvel
2026-03-12 8:07 ` Peter Zijlstra
0 siblings, 1 reply; 6+ messages in thread
From: Ard Biesheuvel @ 2026-03-12 7:40 UTC (permalink / raw)
To: Carlos Llamas, Peter Zijlstra
Cc: Sami Tolvanen, Catalin Marinas, Will Deacon, Josh Poimboeuf,
Jason Baron, Alice Ryhl, Steven Rostedt, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
James Clark, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Ben Segall, Mel Gorman, Valentin Schneider, Kees Cook,
Linus Walleij, Borislav Petkov, Nathan Chancellor,
Thomas Gleixner, Mathieu Desnoyers, Shaopeng Tan, Jens Remus,
Juergen Gross, Conor Dooley, David Kaplan, Lukas Bulwahn,
Jinjie Ruan, James Morse, Thomas Huth, Sean Christopherson,
Paolo Bonzini, kernel-team, linux-kernel, Will McVicker,
Thomas Weißschuh,
moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
open list:PERFORMANCE EVENTS SUBSYSTEM
Hi Carlos,
You've cc'ed around 50 people on this patch, which is a bit excessive. Better to take get_maintainer.pl with a grain of salt if it proposes a cc list like that.
On Thu, 12 Mar 2026, at 01:16, Carlos Llamas wrote:
> On Thu, Mar 12, 2026 at 12:14:06AM +0100, Peter Zijlstra wrote:
>> On Wed, Mar 11, 2026 at 10:57:40PM +0000, Carlos Llamas wrote:
>> > Architectures with !HAVE_STATIC_CALL (such as arm64) rely on the generic
>> > static_call implementation via indirect calls. In particular, users of
>> > DEFINE_STATIC_CALL_RET0 default to the generic __static_call_return0
>> > stub to optimize the unset path.
>> >
>> > However, __static_call_return0 has a fixed signature of "long (*)(void)"
>> > which may not match the expected prototype at callsites. This triggers
>> > CFI failures when CONFIG_CFI is enabled. A trivial linux-perf command
>> > does it:
>>
>> *sigh*...
>>
>> And ARM64 can't really do the inline thing because its immediate range
>> is too small and it all turns into a mess constructing the address in a
>> register and doing an indirect call anyway, right?
>>
>
> Right, the range for the jump is very limited. I _think_ tracepoints
> have managed to implement the trampoline work-around:
> arch/arm64/kernel/ftrace.c
>
> So it looks do-able I think, but via a much more complex route.
>
So far, we have managed to avoid the blessings of objtool on arm64, and the complexity associated with the inline patching is not really justified, given that on arm64, there is not really a need to avoid indirect calls (and as Peter says, we might end up with them anyway).
A while ago, I had a stab at implementing the out-of-line variety [0], but nobody cared enough to even respond. It is rather concise, and localised to arm64, so it is something we might consider for CONFIG_CFI builds. It is essentially the same sequence that arm64 uses for trampolines between modules and the kernel if they are out of direct branching range, with some .rodata patching to change the target. (arm64 basically only permits code patching without stopping the machine when it involves patching branch opcodes into NOPS or vice versa).
Doing so for only CONFIG_CFI makes sense because it removes the CFI overhead for all static calls, although it adds back some overhead for the trampoline. But there is currently no need to do this unconditionally.
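For context, the out-of-line trampoline Ard describes has roughly the following shape (an illustrative sketch of the mechanism, not a quote of the linked patch; the labels and literal layout are made up):

```
	.align	3
tramp:				// one trampoline per static call key
	bti	c
	ldr	x16, 0f		// load the current target from the literal
	cbz	x16, 1f		// NULL target: degrade to a plain return
	br	x16		// tail-call in hand-written asm, so outside
				// the compiler-generated kCFI check sequence
1:	ret
	.align	3
0:	.quad	func		// static_call_update() patches this word,
				// not the code, so no stop-machine needed
```

Because the indirect branch lives in the trampoline rather than at a compiler-instrumented callsite, no CFI type hash is checked against the target, which is why this sidesteps the return0 mismatch entirely.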
[0] https://lore.kernel.org/linux-arm-kernel/20201120082103.4840-1-ardb@kernel.org/
* Re: [PATCH] static_call: use CFI-compliant return0 stubs
2026-03-12 7:40 ` Ard Biesheuvel
@ 2026-03-12 8:07 ` Peter Zijlstra
2026-03-12 17:18 ` Carlos Llamas
0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2026-03-12 8:07 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Carlos Llamas, Sami Tolvanen, Catalin Marinas, Will Deacon,
Josh Poimboeuf, Jason Baron, Alice Ryhl, Steven Rostedt,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
James Clark, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Ben Segall, Mel Gorman, Valentin Schneider, Kees Cook,
Linus Walleij, Borislav Petkov, Nathan Chancellor,
Thomas Gleixner, Mathieu Desnoyers, Shaopeng Tan, Jens Remus,
Juergen Gross, Conor Dooley, David Kaplan, Lukas Bulwahn,
Jinjie Ruan, James Morse, Thomas Huth, Sean Christopherson,
Paolo Bonzini, kernel-team, linux-kernel, Will McVicker,
Thomas Weißschuh,
moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
open list:PERFORMANCE EVENTS SUBSYSTEM
On Thu, Mar 12, 2026 at 08:40:11AM +0100, Ard Biesheuvel wrote:
> So far, we have managed to avoid the blessings of objtool on arm64,
> and the complexity associated with the inline patching is not really
> justified, given that on arm64, there is not really a need to avoid
> indirect calls (and as Peter says, we might end up with them anyway)
>
> A while ago, I had a stab at implementing the out-of-line variety [0],
> but nobody cared enough to even respond. It is rather concise, and
> localised to arm64, so it is something we might consider for
> CONFIG_CFI builds. It is essentially the same sequence that arm64 uses
> for trampolines between modules and the kernel if they are out of
> direct branching range, with some .rodata patching to change the
> target. (arm64 basically only permits code patching without stopping
> the machine when it involves patching branch opcodes into NOPS or vice
> versa).
>
> Doing so for only CONFIG_CFI makes sense because it removes the CFI
> overhead for all static calls, although it adds back some overhead for
> the trampoline. But there is currently no need to do this
> unconditionally.
Right, so your v3 is very simple and straightforward, and should work
as an end run around the CFI issue, by effectively doing that indirect
tail call in the trampoline outside of the compiler generated software
cfi things.
And I think I like your thing better because it handles all possible
cases, not just the ret0 oddity, and isn't in fact much larger.
* Re: [PATCH] static_call: use CFI-compliant return0 stubs
2026-03-12 8:07 ` Peter Zijlstra
@ 2026-03-12 17:18 ` Carlos Llamas
0 siblings, 0 replies; 6+ messages in thread
From: Carlos Llamas @ 2026-03-12 17:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ard Biesheuvel, Sami Tolvanen, Catalin Marinas, Will Deacon,
Josh Poimboeuf, Jason Baron, Alice Ryhl, Steven Rostedt,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
James Clark, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Ben Segall, Mel Gorman, Valentin Schneider, Kees Cook,
Linus Walleij, Borislav Petkov, Nathan Chancellor,
Thomas Gleixner, Mathieu Desnoyers, Shaopeng Tan, Jens Remus,
Juergen Gross, Conor Dooley, David Kaplan, Lukas Bulwahn,
Jinjie Ruan, James Morse, Thomas Huth, Sean Christopherson,
Paolo Bonzini, kernel-team, linux-kernel, Will McVicker,
Thomas Weißschuh,
moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
open list:PERFORMANCE EVENTS SUBSYSTEM
On Thu, Mar 12, 2026 at 09:07:40AM +0100, Peter Zijlstra wrote:
> On Thu, Mar 12, 2026 at 08:40:11AM +0100, Ard Biesheuvel wrote:
> > So far, we have managed to avoid the blessings of objtool on arm64,
> > and the complexity associated with the inline patching is not really
> > justified, given that on arm64, there is not really a need to avoid
> > indirect calls (and as Peter says, we might end up with them anyway)
> >
> > A while ago, I had a stab at implementing the out-of-line variety [0],
> > but nobody cared enough to even respond. It is rather concise, and
> > localised to arm64, so it is something we might consider for
> > CONFIG_CFI builds. It is essentially the same sequence that arm64 uses
> > for trampolines between modules and the kernel if they are out of
> > direct branching range, with some .rodata patching to change the
> > target. (arm64 basically only permits code patching without stopping
> > the machine when it involves patching branch opcodes into NOPS or vice
> > versa).
Great! I'll go read your implementation then.
> > Doing so for only CONFIG_CFI makes sense because it removes the CFI
> > overhead for all static calls, although it adds back some overhead for
> > the trampoline. But there is currently no need to do this
> > unconditionally.
>
> Right, so your v3 is very simple and straightforward, and should work
> as an end run around the CFI issue, by effectively doing that indirect
> tail call in the trampoline outside of the compiler generated software
> cfi things.
>
> And I think I like your thing better because it handles all possible
> cases, not just the ret0 oddity, and isn't in fact much larger.
SGTM, I'll switch over to testing Ard's patch.
Thanks,
Carlos Llamas