Date: Thu, 12 Mar 2026 18:02:34 +0000
From: Carlos Llamas
To: Ard Biesheuvel
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	Mark Rutland, Quentin Perret, Catalin Marinas, James Morse,
	Will Deacon, Frederic Weisbecker, Peter Zijlstra, Kees Cook,
	Sami Tolvanen, Andy Lutomirski, Josh Poimboeuf, Steven Rostedt
Subject: Re: [PATCH v6 2/2] arm64: implement support for static call trampolines
References: <20211105145917.2828911-1-ardb@kernel.org>
	<20211105145917.2828911-3-ardb@kernel.org>
In-Reply-To: <20211105145917.2828911-3-ardb@kernel.org>

On Fri, Nov 05, 2021 at 03:59:17PM +0100, Ard Biesheuvel wrote:
> Implement arm64 support for the 'unoptimized' static call variety, which
> routes all calls through a single trampoline that is patched to perform a
> tail call to the selected function.
>
> It is expected that the direct branch instruction will be able to cover
> the common case. However, given that static call targets may be located
> in modules loaded out of direct branching range, we need a fallback path
> that loads the address into R16 and uses a branch-to-register (BR)
> instruction to perform an indirect call.
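As a sanity check while reading the range-fallback logic, I modeled the
reachability test in userspace C. This is just my own sketch (the helper name
encode_b is made up; the kernel uses aarch64_insn_gen_branch_imm): an AArch64
"B <target>" carries a signed 26-bit word offset, i.e. +/-128 MiB from the
branch instruction itself, and anything outside that range has to take the
literal + BR path:

```c
#include <stdint.h>

/*
 * Hypothetical userspace model, not kernel code: encode an AArch64
 * "B <target>" instruction, or return 0 when the target is outside the
 * signed 26-bit word-offset (+/-128 MiB) range, in which case the
 * trampoline would be patched with a NOP and fall through to the
 * "ldr x16, 0b; br x16" fallback instead.
 */
static uint32_t encode_b(uint64_t pc, uint64_t target)
{
	int64_t off = (int64_t)(target - pc);

	/* must be 4-byte aligned and fit in a signed 28-bit byte offset */
	if ((off & 3) || off < -(1LL << 27) || off >= (1LL << 27))
		return 0;	/* out of range: take the indirect path */

	/* opcode 0b000101 in bits [31:26], word offset in bits [25:0] */
	return 0x14000000u | (((uint64_t)off >> 2) & 0x03ffffffu);
}
```

For example, encode_b(0x0, 0x4) gives 0x14000001 ("b .+4"), while a module
target gigabytes away returns 0 and would leave the NOP + literal load in
place.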
>
> Unlike on x86, there is no pressing need on arm64 to avoid indirect
> calls at all cost, but hiding it from the compiler as is done here does
> have some benefits:
> - the literal is located in .text, which gives us the same robustness
>   advantage that code patching does;
> - no performance hit on CFI enabled Clang builds that decorate compiler
>   emitted indirect calls with branch target validity checks.
>
> Acked-by: Peter Zijlstra
> Signed-off-by: Ard Biesheuvel
> ---

I'm starting to test this out on top of 7.0-rc3...

>  arch/arm64/Kconfig                   |  2 +
>  arch/arm64/include/asm/static_call.h | 40 ++++++++++
>  arch/arm64/kernel/patching.c         | 77 +++++++++++++++++++-
>  arch/arm64/kernel/vmlinux.lds.S      |  1 +
>  4 files changed, 117 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 176d6fddc4f2..ccc33b85769c 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -193,6 +193,8 @@ config ARM64
>  	select HAVE_PERF_USER_STACK_DUMP
>  	select HAVE_REGS_AND_STACK_ACCESS_API
>  	select HAVE_POSIX_CPU_TIMERS_TASK_WORK
> +	# https://github.com/ClangBuiltLinux/linux/issues/1354
> +	select HAVE_STATIC_CALL if !LTO_CLANG_THIN || CLANG_VERSION >= 130000

I got a circular dependency error on this...

error: recursive dependency detected!
	symbol GCOV_KERNEL depends on DEBUG_FS
	symbol DEBUG_FS is selected by GPIO_VIRTUSER
	symbol GPIO_VIRTUSER depends on GPIOLIB
	symbol GPIOLIB is selected by CEC_GPIO
	symbol CEC_GPIO depends on PREEMPTION
	symbol PREEMPTION is selected by PREEMPT_BUILD
	symbol PREEMPT_BUILD is selected by PREEMPT_DYNAMIC
	symbol PREEMPT_DYNAMIC depends on HAVE_PREEMPT_DYNAMIC
	symbol HAVE_PREEMPT_DYNAMIC is selected by HAVE_PREEMPT_DYNAMIC_CALL
	symbol HAVE_PREEMPT_DYNAMIC_CALL depends on HAVE_STATIC_CALL
	symbol HAVE_STATIC_CALL is selected by LTO_CLANG_THIN
	symbol LTO_CLANG_THIN is part of choice block at arch/Kconfig:817
	symbol unknown is visible depending on LTO_CLANG_FULL
	symbol LTO_CLANG_FULL prompt is visible depending on HAS_LTO_CLANG
	symbol HAS_LTO_CLANG depends on GCOV_KERNEL

...so I just dropped the checks altogether for now.

>  	select HAVE_FUNCTION_ARG_ACCESS_API
>  	select HAVE_FUTEX_CMPXCHG if FUTEX
>  	select MMU_GATHER_RCU_TABLE_FREE
> diff --git a/arch/arm64/include/asm/static_call.h b/arch/arm64/include/asm/static_call.h
> new file mode 100644
> index 000000000000..6ee918991510
> --- /dev/null
> +++ b/arch/arm64/include/asm/static_call.h
> @@ -0,0 +1,40 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_STATIC_CALL_H
> +#define _ASM_STATIC_CALL_H
> +
> +/*
> + * The sequence below is laid out in a way that guarantees that the literal and
> + * the instruction are always covered by the same cacheline, and can be updated
> + * using a single store-pair instruction (provided that we rewrite the BTI C
> + * instruction as well). This means the literal and the instruction are always
> + * in sync when observed via the D-side.
> + *
> + * However, this does not guarantee that the I-side will catch up immediately
> + * as well: until the I-cache maintenance completes, CPUs may branch to the old
> + * target, or execute a stale NOP or RET. We deal with this by writing the
> + * literal unconditionally, even if it is 0x0 or the branch is in range.
> + * That way, a stale NOP will fall through and call the new target via an
> + * indirect call. Stale RETs or Bs will be taken as before, and branch to
> + * the old target.
> + */
> +#define __ARCH_DEFINE_STATIC_CALL_TRAMP(name, insn)			\
> +	asm("	.pushsection	.static_call.text, \"ax\"	\n"	\
> +	    "	.align		4				\n"	\
> +	    "	.globl		" STATIC_CALL_TRAMP_STR(name) "	\n"	\
> +	    "0:	.quad	0x0					\n"	\
> +	    STATIC_CALL_TRAMP_STR(name) ":			\n"	\
> +	    "	hint	34	/* BTI C */			\n"	\
> +	    insn "						\n"	\
> +	    "	ldr	x16, 0b					\n"	\
> +	    "	cbz	x16, 1f					\n"	\
> +	    "	br	x16					\n"	\
> +	    "1:	ret						\n"	\
> +	    "	.popsection					\n")
> +
> +#define ARCH_DEFINE_STATIC_CALL_TRAMP(name, func)			\
> +	__ARCH_DEFINE_STATIC_CALL_TRAMP(name, "b " #func)
> +
> +#define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name)			\
> +	__ARCH_DEFINE_STATIC_CALL_TRAMP(name, "ret")
> +

Now that we have the RET0 implementation, I added:

--- a/arch/arm64/include/asm/static_call.h
+++ b/arch/arm64/include/asm/static_call.h
@@ -37,4 +37,7 @@
 #define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name)			\
 	__ARCH_DEFINE_STATIC_CALL_TRAMP(name, "ret")

+#define ARCH_DEFINE_STATIC_CALL_RET0_TRAMP(name)			\
+	__ARCH_DEFINE_STATIC_CALL_TRAMP(name, "mov x0, xzr")
+
 #endif /* _ASM_STATIC_CALL_H */

> +#endif /* _ASM_STATIC_CALL_H */
> diff --git a/arch/arm64/kernel/patching.c b/arch/arm64/kernel/patching.c
> index 771f543464e0..a265a87d4d9e 100644
> --- a/arch/arm64/kernel/patching.c
> +++ b/arch/arm64/kernel/patching.c
> @@ -3,6 +3,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>
> @@ -66,7 +67,7 @@ int __kprobes aarch64_insn_read(void *addr, u32 *insnp)
>  	return ret;
>  }
>
> -static int __kprobes __aarch64_insn_write(void *addr, __le32 insn)
> +static int __kprobes __aarch64_insn_write(void *addr, void *insn, int size)
>  {
>  	void *waddr = addr;
>  	unsigned long flags = 0;
> @@ -75,7 +76,7 @@ static int __kprobes __aarch64_insn_write(void *addr, __le32 insn)
>  	raw_spin_lock_irqsave(&patch_lock, flags);
>  	waddr = patch_map(addr, FIX_TEXT_POKE0);
>
> -	ret = copy_to_kernel_nofault(waddr, &insn, AARCH64_INSN_SIZE);
> +	ret = copy_to_kernel_nofault(waddr, insn, size);
>
>  	patch_unmap(FIX_TEXT_POKE0);
>  	raw_spin_unlock_irqrestore(&patch_lock, flags);
> @@ -85,7 +86,77 @@ static int __kprobes __aarch64_insn_write(void *addr, __le32 insn)
>
>  int __kprobes aarch64_insn_write(void *addr, u32 insn)
>  {
> -	return __aarch64_insn_write(addr, cpu_to_le32(insn));
> +	__le32 i = cpu_to_le32(insn);
> +
> +	return __aarch64_insn_write(addr, &i, AARCH64_INSN_SIZE);
> +}
> +
> +static void *strip_cfi_jt(void *addr)
> +{
> +	if (IS_ENABLED(CONFIG_CFI_CLANG)) {

I believe this is now just "CONFIG_CFI".

> +		void *p = addr;
> +		u32 insn;
> +
> +		/*
> +		 * Taking the address of a function produces the address of the
> +		 * jump table entry when Clang CFI is enabled. Such entries are
> +		 * ordinary jump instructions, preceded by a BTI C instruction
> +		 * if BTI is enabled for the kernel.
> +		 */
> +		if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
> +			p += 4;
> +
> +		insn = le32_to_cpup(p);
> +		if (aarch64_insn_is_b(insn))
> +			return p + aarch64_get_branch_offset(insn);
> +
> +		WARN_ON(1);
> +	}
> +	return addr;
> +}
> +
> +void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
> +{
> +	/*
> +	 * -0x8	<literal>
> +	 *  0x0	bti c	<--- trampoline entry point
> +	 *  0x4	<branch or nop>
> +	 *  0x8	ldr x16, <literal>
> +	 *  0xc	cbz x16, 20
> +	 * 0x10	br x16
> +	 * 0x14	ret
> +	 */
> +	struct {
> +		u64	literal;
> +		__le32	insn[2];
> +	} insns;
> +	u32 insn;
> +	int ret;
> +
> +	insn = aarch64_insn_gen_hint(AARCH64_INSN_HINT_BTIC);
> +	insns.literal = (u64)func;
> +	insns.insn[0] = cpu_to_le32(insn);
> +
> +	if (!func) {
> +		insn = aarch64_insn_gen_branch_reg(AARCH64_INSN_REG_LR,
> +						   AARCH64_INSN_BRANCH_RETURN);
> +	} else {
> +		insn = aarch64_insn_gen_branch_imm((u64)tramp + 4,
> +						   (u64)strip_cfi_jt(func),
> +						   AARCH64_INSN_BRANCH_NOLINK);
> +
> +		/*
> +		 * Use a NOP if the branch target is out of range, and rely on
> +		 * the indirect call instead.
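To convince myself about the 16-byte update (literal plus two instructions,
always writing the literal), I modeled arch_static_call_transform()'s patching
decision in userspace. Again just a sketch: make_update and tramp_head are my
own names, and the fixed opcodes come from the A64 ISA rather than the
kernel's insn generators:

```c
#include <stdint.h>

/*
 * Userspace model (assumed simplification, not kernel code) of the 16-byte
 * block rewritten by the patch: an 8-byte literal followed by BTI C and the
 * patched slot, which becomes RET (NULL target), a direct B when the target
 * is reachable, or a NOP that falls through to the literal load + BR x16.
 */
struct tramp_head {
	uint64_t literal;	/* target address, read by "ldr x16, 0b" */
	uint32_t insn[2];	/* [0] = BTI C, [1] = B <func> / NOP / RET */
};

#define INSN_BTI_C	0xd503245fu	/* hint #34 */
#define INSN_NOP	0xd503201fu
#define INSN_RET	0xd65f03c0u

static struct tramp_head make_update(uint64_t slot_pc, uint64_t func)
{
	struct tramp_head h = { .literal = func, .insn = { INSN_BTI_C, 0 } };
	int64_t off = (int64_t)(func - slot_pc);

	/* literal is written unconditionally, even when it is 0x0 */
	if (!func)
		h.insn[1] = INSN_RET;
	else if (!(off & 3) && off >= -(1LL << 27) && off < (1LL << 27))
		h.insn[1] = 0x14000000u | (((uint64_t)off >> 2) & 0x03ffffffu);
	else
		h.insn[1] = INSN_NOP;	/* fall through to ldr x16 / br x16 */
	return h;
}
```

The struct has no padding (8 + 4 + 4 bytes), which is what makes the
single 16-byte write to "tramp - 8" line up with the layout comment above.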
> +		 */
> +		if (insn == AARCH64_BREAK_FAULT)
> +			insn = aarch64_insn_gen_hint(AARCH64_INSN_HINT_NOP);
> +	}
> +	insns.insn[1] = cpu_to_le32(insn);
> +
> +	ret = __aarch64_insn_write(tramp - 8, &insns, sizeof(insns));
> +	if (!WARN_ON(ret))
> +		caches_clean_inval_pou((u64)tramp - 8, sizeof(insns));
>  }
>
>  int __kprobes aarch64_insn_patch_text_nosync(void *addr, u32 insn)
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 50bab186c49b..e16860a14eaf 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -173,6 +173,7 @@ SECTIONS
>  			HIBERNATE_TEXT
>  			KEXEC_TEXT
>  			TRAMP_TEXT
> +			STATIC_CALL_TEXT
>  			*(.gnu.warning)
>  		. = ALIGN(16);
>  		*(.got)			/* Global offset table */
> --
> 2.30.2
>

A CFI crash was reported in [1]. Also, I was able to reproduce a CFI
failure with just a simple "linux-perf" command. With this patchset I no
longer see these issues. Awesome!

[1] https://lore.kernel.org/all/YfrQzoIWyv9lNljh@google.com/

--
Carlos Llamas