linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 0/4] arm64: Add return address protection to asm code
@ 2022-11-29 14:17 Ard Biesheuvel
  2022-11-29 14:18 ` [PATCH 1/4] arm64: assembler: Force error on misuse of .Lframe_local_offset Ard Biesheuvel
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2022-11-29 14:17 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown

Control flow integrity features such as shadow call stack or PAC work by
placing special instructions between the reload of the link register
from the stack and the function return. The point of this is not only to
protect the control flow through that particular function, but also to
ensure that the instruction sequence at the end of the function cannot be
subverted and reused as a gadget in a ROP/JOP style attack.
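
For illustration, this is roughly the shape such a protected
prologue/epilogue takes in compiled code (a sketch, not taken from this
series); depending on the configuration, the PACIASP/AUTIASP pair may be
replaced with (or complemented by) shadow call stack pushes and pops via
x18:

	paciasp				// sign x30 against SP before it is spilled
	stp	x29, x30, [sp, #-16]!
	// ... function body, x30 lives on the stack ...
	ldp	x29, x30, [sp], #16	// reload the return address from the stack
	autiasp				// authenticate x30 right before the return
	ret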

This means that it is generally a bad idea to incorporate code that lacks
such protections, even if that code is rarely or never used: it still
provides gadgets to an attacker. So add some macros that we can invoke in
assembler code to protect the return address while it is stored on the
stack, and wire them up in the ftrace code and the EFI runtime service
wrapper code, both of which are often built into production kernels even
when not used.

Another example of this is crypto code, and I will be sending some fixes
via the crypto tree that ensure that these protections are enabled there
as well.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Brown <broonie@kernel.org>

Ard Biesheuvel (4):
  arm64: assembler: Force error on misuse of .Lframe_local_offset
  arm64: assembler: Add macros for return address protection
  arm64: efi: Add return address protection to runtime wrapper
  arm64: ftrace: Add return address protection

 arch/arm64/include/asm/assembler.h | 82 ++++++++++++++++++++
 arch/arm64/kernel/efi-rt-wrapper.S | 12 ++-
 arch/arm64/kernel/entry-ftrace.S   | 28 ++++++-
 3 files changed, 117 insertions(+), 5 deletions(-)

-- 
2.35.1



* [PATCH 1/4] arm64: assembler: Force error on misuse of .Lframe_local_offset
  2022-11-29 14:17 [PATCH 0/4] arm64: Add return address protection to asm code Ard Biesheuvel
@ 2022-11-29 14:18 ` Ard Biesheuvel
  2022-11-29 14:18 ` [PATCH 2/4] arm64: assembler: Add macros for return address protection Ard Biesheuvel
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2022-11-29 14:18 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown

The frame_push macro sets a local symbol .Lframe_local_offset to the
offset where the local variable area resides in the stack frame.
However, while we take care not to nest frame_push and frame_pop
sequences, .Lframe_local_offset retains its most recent value after
frame_pop, allowing it to be referenced erroneously from outside a
frame_push/frame_pop pair. So have frame_pop set it to an obviously wrong
value that is guaranteed to trigger a link error if it is used again
before the next frame_push.
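
As a hypothetical illustration of the misuse this catches: a stale
reference to the local area after frame_pop now resolves to the undefined
frame_local_offset_error symbol and fails the build, instead of silently
reusing the previous frame's offset:

	frame_push	2, 16				// locals at .Lframe_local_offset
	// ... code using the local area ...
	frame_pop
	str	x0, [sp, #.Lframe_local_offset]		// stale use outside the pair: build error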

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/assembler.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 30eee6473cf0c0ea..3d1714a7eb6411ba 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -752,6 +752,7 @@ alternative_endif
 	.endif
 	ldp		x29, x30, [sp], #.Lframe_local_offset + .Lframe_extra
 	.set		.Lframe_regcount, -1
+	.set		.Lframe_local_offset, frame_local_offset_error
 	.endif
 	.endm
 
-- 
2.35.1



* [PATCH 2/4] arm64: assembler: Add macros for return address protection
  2022-11-29 14:17 [PATCH 0/4] arm64: Add return address protection to asm code Ard Biesheuvel
  2022-11-29 14:18 ` [PATCH 1/4] arm64: assembler: Force error on misuse of .Lframe_local_offset Ard Biesheuvel
@ 2022-11-29 14:18 ` Ard Biesheuvel
  2022-11-30 14:15   ` Mark Rutland
  2022-11-29 14:18 ` [PATCH 3/4] arm64: efi: Add return address protection to runtime wrapper Ard Biesheuvel
  2022-11-29 14:18 ` [PATCH 4/4] arm64: ftrace: Add return address protection Ard Biesheuvel
  3 siblings, 1 reply; 14+ messages in thread
From: Ard Biesheuvel @ 2022-11-29 14:18 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown

When in-kernel pointer authentication is configured, emit PACIASP and
AUTIASP instructions as well as shadow call stack pushes and pops,
depending on the configuration.

Note that dynamic shadow call stack makes this slightly tricky, as it
depends on in-kernel BTI as well. In all cases, the resulting code will
never perform both PAC and shadow call stack operations on the same
return address, even if shadow call stack support is not configured as
dynamic.
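
As a usage sketch (with a hypothetical caller, not part of this patch):
the macros are meant to bracket the window in which the return address
lives on the stack, and frame_push/frame_pop users get this for free:

SYM_FUNC_START(example_asm_helper)
	protect_return_address			// sign x30 and/or push it to the SCS
	stp	x29, x30, [sp, #-16]!
	mov	x29, sp
	bl	some_c_helper			// hypothetical callee
	ldp	x29, x30, [sp], #16
	restore_return_address			// authenticate x30 and/or pop it from the SCS
	ret
SYM_FUNC_END(example_asm_helper)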

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/assembler.h | 81 ++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 3d1714a7eb6411ba..99d74c29ab3cbe05 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -692,6 +692,85 @@ alternative_endif
 #endif
 	.endm
 
+	/*
+	 * protect_return_address - protect the return address value in
+	 * register @reg, either by signing it using PAC and/or by storing it
+	 * on the shadow call stack.
+	 *
+	 * The sequence below emits a shadow call stack push if the feature is
+	 * enabled, and if in-kernel PAC is enabled as well, the instruction
+	 * will be patched into a PACIA instruction involving the same
+	 * register (with SP as the modifier) if PAC is detected at runtime.
+	 *
+	 * If in-kernel BTI and dynamic shadow call stacks are also configured,
+	 * it becomes a bit more tricky, because then, shadow call stacks will
+	 * only be enabled on non-BTI hardware, regardless of the PAUTH state.
+	 * In that case, we emit one of the following sequences.
+	 *
+	 *     PAC+BTI enabled  No PAC or BTI	BTI without PAC	PAC without BTI
+	 *
+	 *     B   0f		NOP		B   0f		NOP
+	 *     NOP		SCS push	SCS push	NOP
+	 *  0: PACIA		NOP		NOP		PACIA
+	 *
+	 * Note that, due to the code patching occurring at function entry and
+	 * exit, these macros must not be used in code that may execute before
+	 * the boot CPU feature based code patching has completed.
+	 */
+	.macro		protect_return_address, reg=x30
+#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
+#if defined(CONFIG_DYNAMIC_SCS) && defined(CONFIG_ARM64_BTI_KERNEL)
+alternative_if ARM64_BTI
+	b		.L0_\@
+alternative_else_nop_endif
+#endif
+alternative_if_not ARM64_HAS_ADDRESS_AUTH
+#endif
+#ifdef CONFIG_SHADOW_CALL_STACK
+	str		\reg, [x18], #8
+#endif
+#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
+#if !defined(CONFIG_SHADOW_CALL_STACK) || \
+    (defined(CONFIG_DYNAMIC_SCS) && defined(CONFIG_ARM64_BTI_KERNEL))
+.L0_\@:	nop
+#endif
+alternative_else
+#if defined(CONFIG_DYNAMIC_SCS) && defined(CONFIG_ARM64_BTI_KERNEL)
+	nop
+#endif
+	.arch_extension	pauth
+	pacia		\reg, sp
+alternative_endif
+#endif
+	.endm
+
+	/*
+	 * restore_return_address - restore the return address value in
+	 * register @reg, either by authenticating it using PAC and/or
+	 * reloading it from the shadow call stack.
+	 */
+	.macro		restore_return_address, reg=x30
+#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
+alternative_if ARM64_HAS_ADDRESS_AUTH
+	.arch_extension	pauth
+	autia		\reg, sp
+alternative_else_nop_endif
+#if defined(CONFIG_DYNAMIC_SCS) && defined(CONFIG_ARM64_BTI_KERNEL)
+alternative_if ARM64_BTI
+	b		.L0_\@
+alternative_else_nop_endif
+#endif
+alternative_if_not ARM64_HAS_ADDRESS_AUTH
+#endif
+#ifdef CONFIG_SHADOW_CALL_STACK
+	ldr		\reg, [x18, #-8]!
+#endif
+#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
+alternative_else_nop_endif
+.L0_\@:
+#endif
+	.endm
+
 	/*
 	 * frame_push - Push @regcount callee saved registers to the stack,
 	 *              starting at x19, as well as x29/x30, and set x29 to
@@ -699,6 +778,7 @@ alternative_endif
 	 *              for locals.
 	 */
 	.macro		frame_push, regcount:req, extra
+	protect_return_address
 	__frame		st, \regcount, \extra
 	.endm
 
@@ -710,6 +790,7 @@ alternative_endif
 	 */
 	.macro		frame_pop
 	__frame		ld
+	restore_return_address
 	.endm
 
 	.macro		__frame_regs, reg1, reg2, op, num
-- 
2.35.1



* [PATCH 3/4] arm64: efi: Add return address protection to runtime wrapper
  2022-11-29 14:17 [PATCH 0/4] arm64: Add return address protection to asm code Ard Biesheuvel
  2022-11-29 14:18 ` [PATCH 1/4] arm64: assembler: Force error on misuse of .Lframe_local_offset Ard Biesheuvel
  2022-11-29 14:18 ` [PATCH 2/4] arm64: assembler: Add macros for return address protection Ard Biesheuvel
@ 2022-11-29 14:18 ` Ard Biesheuvel
  2022-11-29 14:18 ` [PATCH 4/4] arm64: ftrace: Add return address protection Ard Biesheuvel
  3 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2022-11-29 14:18 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown

Add return address protection to the EFI runtime wrapper so that this
code is less likely to be taken advantage of in ROP/JOP style attacks.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/efi-rt-wrapper.S | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/efi-rt-wrapper.S b/arch/arm64/kernel/efi-rt-wrapper.S
index afd3e81e1b627b87..874da02f3a1664c3 100644
--- a/arch/arm64/kernel/efi-rt-wrapper.S
+++ b/arch/arm64/kernel/efi-rt-wrapper.S
@@ -6,6 +6,7 @@
 #include <linux/linkage.h>
 
 SYM_FUNC_START(__efi_rt_asm_wrapper)
+	protect_return_address
 	stp	x29, x30, [sp, #-112]!
 	mov	x29, sp
 
@@ -46,9 +47,7 @@ SYM_FUNC_START(__efi_rt_asm_wrapper)
 	ldp	x1, x2, [sp, #16]
 	cmp	x2, x18
 	ldp	x29, x30, [sp], #112
-	b.ne	0f
-	ret
-0:
+
 	/*
 	 * With CONFIG_SHADOW_CALL_STACK, the kernel uses x18 to store a
 	 * shadow stack pointer, which we need to restore before returning to
@@ -59,7 +58,10 @@ SYM_FUNC_START(__efi_rt_asm_wrapper)
 #ifdef CONFIG_SHADOW_CALL_STACK
 	ldr_this_cpu	x18, __efi_rt_asm_recover_sp + 8, x9
 #endif
-
+	b.ne	0f
+	restore_return_address
+	ret
+0:
 	b	efi_handle_corrupted_x18	// tail call
 SYM_FUNC_END(__efi_rt_asm_wrapper)
 
@@ -74,5 +76,7 @@ SYM_CODE_START(__efi_rt_asm_recover)
 	ldp	x27, x28, [sp, #96]
 	ldp	x29, x30, [sp], #112
 
+	restore_return_address
+
 	b	efi_handle_runtime_exception
 SYM_CODE_END(__efi_rt_asm_recover)
-- 
2.35.1



* [PATCH 4/4] arm64: ftrace: Add return address protection
  2022-11-29 14:17 [PATCH 0/4] arm64: Add return address protection to asm code Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2022-11-29 14:18 ` [PATCH 3/4] arm64: efi: Add return address protection to runtime wrapper Ard Biesheuvel
@ 2022-11-29 14:18 ` Ard Biesheuvel
  2022-11-30 14:04   ` Mark Rutland
  3 siblings, 1 reply; 14+ messages in thread
From: Ard Biesheuvel @ 2022-11-29 14:18 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown

Use the newly added asm macros to protect and restore the return address
in the ftrace call wrappers, based on whichever method is active (PAC
and/or shadow call stack).

If the graph tracer is in use, this covers both the return address *to*
the ftrace call site and the return address *at* the call site, and the
latter is restored either in return_to_handler() or before returning to
the call site.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/entry-ftrace.S | 28 +++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
index 795344ab4ec45889..c744e4dd8c90a352 100644
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -35,6 +35,11 @@
  * is missing from the LR and existing chain of frame records.
  */
 	.macro  ftrace_regs_entry, allregs=0
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+	protect_return_address x9
+#endif
+	protect_return_address x30
+
 	/* Make room for pt_regs, plus a callee frame */
 	sub	sp, sp, #(PT_REGS_SIZE + 16)
 
@@ -89,7 +94,9 @@ SYM_CODE_START(ftrace_caller)
 	b	ftrace_common
 SYM_CODE_END(ftrace_caller)
 
-SYM_CODE_START(ftrace_common)
+SYM_CODE_START_LOCAL(ftrace_common)
+	alternative_insn  nop, "xpaci x30", ARM64_HAS_ADDRESS_AUTH, IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL)
+
 	sub	x0, x30, #AARCH64_INSN_SIZE	// ip (callsite's BL insn)
 	mov	x1, x9				// parent_ip (callsite's LR)
 	ldr_l	x2, function_trace_op		// op
@@ -115,9 +122,27 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
 	ldr	x30, [sp, #S_LR]
 	ldr	x9, [sp, #S_PC]
 
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+	/* grab the original return address from the stack */
+	ldr	x10, [sp, #PT_REGS_SIZE + 8]
+#endif
+
 	/* Restore the callsite's SP */
 	add	sp, sp, #PT_REGS_SIZE + 16
 
+	restore_return_address x9
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+	/* compare the original return address with the actual one */
+	cmp	x10, x30
+	b.ne	0f
+
+	/*
+	 * If they are the same, unprotect it now. If it was modified, it will
+	 * be dealt with in return_to_handler() below.
+	 */
+	restore_return_address x30
+0:
+#endif
 	ret	x9
 SYM_CODE_END(ftrace_common)
 
@@ -329,6 +354,7 @@ SYM_CODE_START(return_to_handler)
 	ldp x6, x7, [sp, #48]
 	add sp, sp, #64
 
+	restore_return_address x30
 	ret
 SYM_CODE_END(return_to_handler)
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
-- 
2.35.1



* Re: [PATCH 4/4] arm64: ftrace: Add return address protection
  2022-11-29 14:18 ` [PATCH 4/4] arm64: ftrace: Add return address protection Ard Biesheuvel
@ 2022-11-30 14:04   ` Mark Rutland
  2022-11-30 14:26     ` Ard Biesheuvel
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Rutland @ 2022-11-30 14:04 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Kees Cook,
	Catalin Marinas, Mark Brown

Hi Ard,

On Tue, Nov 29, 2022 at 03:18:03PM +0100, Ard Biesheuvel wrote:
> Use the newly added asm macros to protect and restore the return address
> in the ftrace call wrappers, based on whichever method is active (PAC
> and/or shadow call stack).
> 
> If the graph tracer is in use, this covers both the return address *to*
> the ftrace call site as well as the return address *at* the call site,
> and the latter will either be restored in return_to_handler(), or before
> returning to the call site.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/kernel/entry-ftrace.S | 28 +++++++++++++++++++-
>  1 file changed, 27 insertions(+), 1 deletion(-)

As a heads-up, this code has recently changed quite significantly, and this
won't apply to the version queued in arm64's for-next/{ftrace,core} branches.

I had a direction of travel in mind with some changes for better stacktracing,
which won't work with the approach here, so I'd prefer we do this a bit
differently; more on that below.

> diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
> index 795344ab4ec45889..c744e4dd8c90a352 100644
> --- a/arch/arm64/kernel/entry-ftrace.S
> +++ b/arch/arm64/kernel/entry-ftrace.S
> @@ -35,6 +35,11 @@
>   * is missing from the LR and existing chain of frame records.
>   */
>  	.macro  ftrace_regs_entry, allregs=0
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +	protect_return_address x9
> +#endif
> +	protect_return_address x30

I think if we're going to protect the callsite's original LR (x9 here), we
should do that regardless of CONFIG_FUNCTION_GRAPH_TRACER; what matters is
whether that's vulnerable rather than whether we intend to modify it, so I
don't think it makes sense to protect it conditionally based on
CONFIG_FUNCTION_GRAPH_TRACER.

I'm a bit worried this might confuse some ftrace code manipulating the return
address (e.g. manipulation of the ftrace graph return stack), as I don't think
that's all PAC-clean, and might need some modification.

> +
>  	/* Make room for pt_regs, plus a callee frame */
>  	sub	sp, sp, #(PT_REGS_SIZE + 16)
>  
> @@ -89,7 +94,9 @@ SYM_CODE_START(ftrace_caller)
>  	b	ftrace_common
>  SYM_CODE_END(ftrace_caller)
>  
> -SYM_CODE_START(ftrace_common)
> +SYM_CODE_START_LOCAL(ftrace_common)
> +	alternative_insn  nop, "xpaci x30", ARM64_HAS_ADDRESS_AUTH, IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL)
> +
>  	sub	x0, x30, #AARCH64_INSN_SIZE	// ip (callsite's BL insn)
>  	mov	x1, x9				// parent_ip (callsite's LR)
>  	ldr_l	x2, function_trace_op		// op
> @@ -115,9 +122,27 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
>  	ldr	x30, [sp, #S_LR]
>  	ldr	x9, [sp, #S_PC]
>  
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +	/* grab the original return address from the stack */
> +	ldr	x10, [sp, #PT_REGS_SIZE + 8]
> +#endif

I'm planning to teach the stack unwinder how to unwind through ftrace_regs,
such that we wouldn't need to duplicate the LR in a frame record here, and so
we'd *only* have the copy inside the struct ftrace_regs.

I think we don't need the copy here if we sign the callsite's LR against the
base of the struct ftrace_regs. That way ftrace_graph_func() can sign the
updated return address, and this code wouldn't need to care. The ftrace_regs
have a copy of x18 that we can use to manipulate the SCS.

> +
>  	/* Restore the callsite's SP */
>  	add	sp, sp, #PT_REGS_SIZE + 16
>  
> +	restore_return_address x9
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +	/* compare the original return address with the actual one */
> +	cmp	x10, x30
> +	b.ne	0f
> +
> +	/*
> +	 * If they are the same, unprotect it now. If it was modified, it will
> +	 * be dealt with in return_to_handler() below.
> +	 */
> +	restore_return_address x30
> +0:
> +#endif
>  	ret	x9

This means if the return address is clobbered, we'll blindly trust it without
authentication, which IMO undermines the point of signing it in the first
place.

As above, I'd prefer that we had ftrace_graph_func() fix things up so that we
can unconditionally authenticate things here, which would be a bit stronger and
simpler to reason about.

Thanks,
Mark.

>  SYM_CODE_END(ftrace_common)
>  
> @@ -329,6 +354,7 @@ SYM_CODE_START(return_to_handler)
>  	ldp x6, x7, [sp, #48]
>  	add sp, sp, #64
>  
> +	restore_return_address x30
>  	ret
>  SYM_CODE_END(return_to_handler)
>  #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> -- 
> 2.35.1
> 


* Re: [PATCH 2/4] arm64: assembler: Add macros for return address protection
  2022-11-29 14:18 ` [PATCH 2/4] arm64: assembler: Add macros for return address protection Ard Biesheuvel
@ 2022-11-30 14:15   ` Mark Rutland
  2022-11-30 14:33     ` Ard Biesheuvel
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Rutland @ 2022-11-30 14:15 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Kees Cook,
	Catalin Marinas, Mark Brown

On Tue, Nov 29, 2022 at 03:18:01PM +0100, Ard Biesheuvel wrote:
> When in-kernel pointer authentication is configured, emit PACIASP and
> AUTIASP instructions as well as shadow call stack pushes and pops,
> depending on the configuration.
> 
> Note that dynamic shadow call stack makes this slightly tricky, as it
> depends on in-kernel BTI as well. The resulting code will never contain
> both PAC and shadow call stack operations, even if shadow call stack
> support is not configured as dynamic.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/include/asm/assembler.h | 81 ++++++++++++++++++++
>  1 file changed, 81 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index 3d1714a7eb6411ba..99d74c29ab3cbe05 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -692,6 +692,85 @@ alternative_endif
>  #endif
>  	.endm
>  
> +	/*
> +	 * protect_return_address - protect the return address value in
> +	 * register @reg, either by signing it using PAC and/or by storing it
> +	 * on the shadow call stack.
> +	 *
> +	 * The sequence below emits a shadow call stack push if the feature is
> +	 * enabled, and if in-kernel PAC is enabled as well, the instruction
> +	 * will be patched into a PACIA instruction involving the same register
> +	 * address (and SP as the modifier) if PAC is detected at runtime.
> +	 *
> +	 * If in-kernel BTI and dynamic shadow call stacks are also configured,
> +	 * it becomes a bit more tricky, because then, shadow call stacks will
> +	 * only be enabled on non-BTI hardware, regardless of the PAUTH state.
> +	 * In that case, we emit one of the following sequences.
> +	 *
> +	 *     PAC+BTI enabled  No PAC or BTI	BTI without PAC	PAC without BTI
> +	 *
> +	 *     B   0f		NOP		B   0f		NOP
> +	 *     NOP		SCS push	SCS push	NOP
> +	 *  0: PACIA		NOP		NOP		PACIA
> +	 *
> +	 * Note that, due to the code patching occuring at function entry and
> +	 * exit, these macros must not be used in code that may execute before
> +	 * the boot CPU feature based code patching has completed.

I'm a bit worried about that, since there's some stuff like early ftrace that
might set up some state before this runs.

Is there no way we can have scs_patch() handle this the same as other PACIASP /
AUTIASP sequences? That would mean we do all SCS patching in one go, so there's
less risk of error, and we'd only require a single instruction rather than
three.

If we're happy doing this late, I think we could instead use a callback
alternative to align with the regular SCS patching logic -- default to
{PAC,AUT}IASP, and have the callback have the same checks as the SCS patching
to determine when to patch with LDR/STR.

Thanks,
Mark.

> +	 */
> +	.macro		protect_return_address, reg=x30
> +#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
> +#if defined(CONFIG_DYNAMIC_SCS) && defined(CONFIG_ARM64_BTI_KERNEL)
> +alternative_if ARM64_BTI
> +	b		.L0_\@
> +alternative_else_nop_endif
> +#endif
> +alternative_if_not ARM64_HAS_ADDRESS_AUTH
> +#endif
> +#ifdef CONFIG_SHADOW_CALL_STACK
> +	str		\reg, [x18], #8
> +#endif
> +#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
> +#if !defined(CONFIG_SHADOW_CALL_STACK) || \
> +    (defined(CONFIG_DYNAMIC_SCS) && defined(CONFIG_ARM64_BTI_KERNEL))
> +.L0_\@:	nop
> +#endif
> +alternative_else
> +#if defined(CONFIG_DYNAMIC_SCS) && defined(CONFIG_ARM64_BTI_KERNEL)
> +	nop
> +#endif
> +	.arch_extension	pauth
> +	pacia		\reg, sp
> +alternative_endif
> +#endif
> +	.endm
> +
> +	/*
> +	 * restore_return_address - restore the return address value in
> +	 * register @reg, either by authenticating it using PAC and/or
> +	 * reloading it from the shadow call stack.
> +	 */
> +	.macro		restore_return_address, reg=x30
> +#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
> +alternative_if ARM64_HAS_ADDRESS_AUTH
> +	.arch_extension	pauth
> +	autia		\reg, sp
> +alternative_else_nop_endif
> +#if defined(CONFIG_DYNAMIC_SCS) && defined(CONFIG_ARM64_BTI_KERNEL)
> +alternative_if ARM64_BTI
> +	b		.L0_\@
> +alternative_else_nop_endif
> +#endif
> +alternative_if_not ARM64_HAS_ADDRESS_AUTH
> +#endif
> +#ifdef CONFIG_SHADOW_CALL_STACK
> +	ldr		\reg, [x18, #-8]!
> +#endif
> +#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
> +alternative_else_nop_endif
> +.L0_\@:
> +#endif
> +	.endm
> +
>  	/*
>  	 * frame_push - Push @regcount callee saved registers to the stack,
>  	 *              starting at x19, as well as x29/x30, and set x29 to
> @@ -699,6 +778,7 @@ alternative_endif
>  	 *              for locals.
>  	 */
>  	.macro		frame_push, regcount:req, extra
> +	protect_return_address
>  	__frame		st, \regcount, \extra
>  	.endm
>  
> @@ -710,6 +790,7 @@ alternative_endif
>  	 */
>  	.macro		frame_pop
>  	__frame		ld
> +	restore_return_address
>  	.endm
>  
>  	.macro		__frame_regs, reg1, reg2, op, num
> -- 
> 2.35.1
> 


* Re: [PATCH 4/4] arm64: ftrace: Add return address protection
  2022-11-30 14:04   ` Mark Rutland
@ 2022-11-30 14:26     ` Ard Biesheuvel
  2022-11-30 17:45       ` Mark Rutland
  0 siblings, 1 reply; 14+ messages in thread
From: Ard Biesheuvel @ 2022-11-30 14:26 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Kees Cook,
	Catalin Marinas, Mark Brown

Hi Mark,

On Wed, 30 Nov 2022 at 15:04, Mark Rutland <mark.rutland@arm.com> wrote:
>
> Hi Ard,
>
> On Tue, Nov 29, 2022 at 03:18:03PM +0100, Ard Biesheuvel wrote:
> > Use the newly added asm macros to protect and restore the return address
> > in the ftrace call wrappers, based on whichever method is active (PAC
> > and/or shadow call stack).
> >
> > If the graph tracer is in use, this covers both the return address *to*
> > the ftrace call site as well as the return address *at* the call site,
> > and the latter will either be restored in return_to_handler(), or before
> > returning to the call site.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/arm64/kernel/entry-ftrace.S | 28 +++++++++++++++++++-
> >  1 file changed, 27 insertions(+), 1 deletion(-)
>
> As a heads-up, this code has recently changed quite significantly, and this
> won't apply to the version queued in arm64's for-next/{ftrace,core} branches.
>
> I had a direction of travel in mind with some changes for better stacktracing,
> which won't work with the approach here, so I'd prefer we do this a bit
> differently; more on that below.
>
> > diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
> > index 795344ab4ec45889..c744e4dd8c90a352 100644
> > --- a/arch/arm64/kernel/entry-ftrace.S
> > +++ b/arch/arm64/kernel/entry-ftrace.S
> > @@ -35,6 +35,11 @@
> >   * is missing from the LR and existing chain of frame records.
> >   */
> >       .macro  ftrace_regs_entry, allregs=0
> > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > +     protect_return_address x9
> > +#endif
> > +     protect_return_address x30
>
> I think if we're going to protect the callsite's original LR (x9 here), we
> should do that regardless of CONFIG_FUNCTION_GRAPH_TRACER; what matters is
> whether that's vulnerable rather than whether we intend to modify it, so I
> don't think it makes sene to protect it conditionally based on
> CONFIG_FUNCTION_GRAPH_TRACER.
>

My reasoning was that if we are not going to return from it (in
return_to_handler()), we can rely on the interrupted function to
sign/authenticate it as usual. So the only reason for signing it here
is so that we can authenticate it in return_to_handler() if that
exists on the call path, removing a potentially vulnerable sequence
from that function.

> I'm a bit worried this might confuse some ftrace code manipulating the return
> address (e.g. manipulation of the ftrace graph return stack), as I don't think
> that's all PAC-clean, and might need some modification.
>

This is the reason for the xpaci instruction below.

> > +
> >       /* Make room for pt_regs, plus a callee frame */
> >       sub     sp, sp, #(PT_REGS_SIZE + 16)
> >
> > @@ -89,7 +94,9 @@ SYM_CODE_START(ftrace_caller)
> >       b       ftrace_common
> >  SYM_CODE_END(ftrace_caller)
> >
> > -SYM_CODE_START(ftrace_common)
> > +SYM_CODE_START_LOCAL(ftrace_common)
> > +     alternative_insn  nop, "xpaci x30", ARM64_HAS_ADDRESS_AUTH, IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL)
> > +
> >       sub     x0, x30, #AARCH64_INSN_SIZE     // ip (callsite's BL insn)
> >       mov     x1, x9                          // parent_ip (callsite's LR)
> >       ldr_l   x2, function_trace_op           // op
> > @@ -115,9 +122,27 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
> >       ldr     x30, [sp, #S_LR]
> >       ldr     x9, [sp, #S_PC]
> >
> > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > +     /* grab the original return address from the stack */
> > +     ldr     x10, [sp, #PT_REGS_SIZE + 8]
> > +#endif
>
> I'm planning to teach the stack unwinder how to unwind through ftrace_regs,
> such that we wouldn't need to duplicate the LR in a frame record here, and so
> we'd *only* have the copy inside the struct ftrace_regs.
>

Does doing so solve anything beyond reducing the stack footprint by 16 bytes?

> I think we don't need the copy here if we sign the callsite's LR against the
> base of the struct ftrace_regs. That way ftrace_graph_func() can sign the
> updated return address, and this code wouldn't need to care. The ftrace_regs
> have a copy of x18 that we can use to manipulate the SCS.
>

The updated return address will be signed when returning to the call
site, and we never return from it here or anywhere else, so I don't
think we need to sign it to begin with.

What we need to sign here is the LR value that return_to_handler()
will use, so ideally, we'd only sign the callsite's LR if we know we
will be returning via return_to_handler().

> > +
> >       /* Restore the callsite's SP */
> >       add     sp, sp, #PT_REGS_SIZE + 16
> >
> > +     restore_return_address x9
> > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > +     /* compare the original return address with the actual one */
> > +     cmp     x10, x30
> > +     b.ne    0f
> > +
> > +     /*
> > +      * If they are the same, unprotect it now. If it was modified, it will
> > +      * be dealt with in return_to_handler() below.
> > +      */
> > +     restore_return_address x30
> > +0:
> > +#endif
> >       ret     x9
>
> This means if the return address is clobbered, we'll blindly trust it without
> authentication, which IMO undermines the point of signing it in the first
> place.
>

How do you mean? x9 is authenticated here, and x30 will either be
authenticated here or in return_to_handler()

> As above, I'd prefer that we had ftrace_graph_func() fix things up so that we
> can unconditionally authenticate things here, which would be a bit stronger and
> simpler to reason about.
>

I think having it all in one place makes it much easier to reason about,
tbh. Adding additional handling of the PAC state as well as the shadow
call stack in ftrace_graph_func() seems much more fiddly to me.


* Re: [PATCH 2/4] arm64: assembler: Add macros for return address protection
  2022-11-30 14:15   ` Mark Rutland
@ 2022-11-30 14:33     ` Ard Biesheuvel
  0 siblings, 0 replies; 14+ messages in thread
From: Ard Biesheuvel @ 2022-11-30 14:33 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Kees Cook,
	Catalin Marinas, Mark Brown

On Wed, 30 Nov 2022 at 15:16, Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, Nov 29, 2022 at 03:18:01PM +0100, Ard Biesheuvel wrote:
> > When in-kernel pointer authentication is configured, emit PACIASP and
> > AUTIASP instructions as well as shadow call stack pushes and pops,
> > depending on the configuration.
> >
> > Note that dynamic shadow call stack makes this slightly tricky, as it
> > depends on in-kernel BTI as well. The resulting code will never contain
> > both PAC and shadow call stack operations, even if shadow call stack
> > support is not configured as dynamic.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/arm64/include/asm/assembler.h | 81 ++++++++++++++++++++
> >  1 file changed, 81 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> > index 3d1714a7eb6411ba..99d74c29ab3cbe05 100644
> > --- a/arch/arm64/include/asm/assembler.h
> > +++ b/arch/arm64/include/asm/assembler.h
> > @@ -692,6 +692,85 @@ alternative_endif
> >  #endif
> >       .endm
> >
> > +     /*
> > +      * protect_return_address - protect the return address value in
> > +      * register @reg, either by signing it using PAC and/or by storing it
> > +      * on the shadow call stack.
> > +      *
> > +      * The sequence below emits a shadow call stack push if the feature is
> > +      * enabled, and if in-kernel PAC is enabled as well, the instruction
> > +      * will be patched into a PACIA instruction involving the same register
> > +      * address (and SP as the modifier) if PAC is detected at runtime.
> > +      *
> > +      * If in-kernel BTI and dynamic shadow call stacks are also configured,
> > +      * it becomes a bit more tricky, because then, shadow call stacks will
> > +      * only be enabled on non-BTI hardware, regardless of the PAUTH state.
> > +      * In that case, we emit one of the following sequences.
> > +      *
> > +      *     PAC+BTI enabled  No PAC or BTI   BTI without PAC PAC without BTI
> > +      *
> > +      *     B   0f           NOP             B   0f          NOP
> > +      *     NOP              SCS push        SCS push        NOP
> > +      *  0: PACIA            NOP             NOP             PACIA
> > +      *
> > +      * Note that, due to the code patching occuring at function entry and
> > +      * exit, these macros must not be used in code that may execute before
> > +      * the boot CPU feature based code patching has completed.
>
> I'm a bit worried about that, since there's some stuff like early ftrace that
> might set up some state before this runs.
>
> Is there no way we can have scs_patch() handle this the same as other PACIASP /
> AUTIASP sequences? That would mean we do all SCS patching in one go, so there's
> less risk of error, and we'd only require a single instruction rather than
> three.
>

Yes, we can. I actually have the patches ready to go to emit the CFI
directives etc.

The only problem is that it won't apply to PACIA instructions with
different registers, so we won't be able to use it for signing x9 in
the ftrace code, for instance. Also, it is a bit fiddly to get the CFI
directives right in general (so that unwinding will work through
functions that manipulate the stack), and I am reluctant to emit CFI
metadata with only PAC annotations, which will confuse the unwinder
(either at runtime, if we end up using it for any reason, or maybe just
GDB when people use it on a vmlinux that has dynamic SCS enabled).
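
(For reference, a rough sketch of the annotations in question, i.e. the
DWARF RA-state markers emitted alongside PACIASP/AUTIASP inside a
.cfi_startproc/.cfi_endproc region:

	paciasp
	.cfi_negate_ra_state		// the return address is now signed
	// ... function body ...
	autiasp
	.cfi_negate_ra_state		// ... and plain again
	ret

and getting those to pair up correctly across code that manipulates the
stack is the fiddly part.)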


> If we're happy doing this late, I think we could instead use a callback
> alternative to align with the regular SCS patching logic -- default to
> {PAC,AUT}IASP, and have the callback have the same checks as the SCS patching
> to determine when to patch with LDR/STR.
>

Agreed. Happy to code this up as a callback alternative but I didn't
want to burn too much time on that before giving the series some
review exposure.

> > +      */
> > +     .macro          protect_return_address, reg=x30
> > +#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
> > +#if defined(CONFIG_DYNAMIC_SCS) && defined(CONFIG_ARM64_BTI_KERNEL)
> > +alternative_if ARM64_BTI
> > +     b               .L0_\@
> > +alternative_else_nop_endif
> > +#endif
> > +alternative_if_not ARM64_HAS_ADDRESS_AUTH
> > +#endif
> > +#ifdef CONFIG_SHADOW_CALL_STACK
> > +     str             \reg, [x18], #8
> > +#endif
> > +#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
> > +#if !defined(CONFIG_SHADOW_CALL_STACK) || \
> > +    (defined(CONFIG_DYNAMIC_SCS) && defined(CONFIG_ARM64_BTI_KERNEL))
> > +.L0_\@:      nop
> > +#endif
> > +alternative_else
> > +#if defined(CONFIG_DYNAMIC_SCS) && defined(CONFIG_ARM64_BTI_KERNEL)
> > +     nop
> > +#endif
> > +     .arch_extension pauth
> > +     pacia           \reg, sp
> > +alternative_endif
> > +#endif
> > +     .endm
> > +
> > +     /*
> > +      * restore_return_address - restore the return address value in
> > +      * register @reg, either by authenticating it using PAC and/or
> > +      * reloading it from the shadow call stack.
> > +      */
> > +     .macro          restore_return_address, reg=x30
> > +#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
> > +alternative_if ARM64_HAS_ADDRESS_AUTH
> > +     .arch_extension pauth
> > +     autia           \reg, sp
> > +alternative_else_nop_endif
> > +#if defined(CONFIG_DYNAMIC_SCS) && defined(CONFIG_ARM64_BTI_KERNEL)
> > +alternative_if ARM64_BTI
> > +     b               .L0_\@
> > +alternative_else_nop_endif
> > +#endif
> > +alternative_if_not ARM64_HAS_ADDRESS_AUTH
> > +#endif
> > +#ifdef CONFIG_SHADOW_CALL_STACK
> > +     ldr             \reg, [x18, #-8]!
> > +#endif
> > +#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
> > +alternative_else_nop_endif
> > +.L0_\@:
> > +#endif
> > +     .endm
> > +
> >       /*
> >        * frame_push - Push @regcount callee saved registers to the stack,
> >        *              starting at x19, as well as x29/x30, and set x29 to
> > @@ -699,6 +778,7 @@ alternative_endif
> >        *              for locals.
> >        */
> >       .macro          frame_push, regcount:req, extra
> > +     protect_return_address
> >       __frame         st, \regcount, \extra
> >       .endm
> >
> > @@ -710,6 +790,7 @@ alternative_endif
> >        */
> >       .macro          frame_pop
> >       __frame         ld
> > +     restore_return_address
> >       .endm
> >
> >       .macro          __frame_regs, reg1, reg2, op, num
> > --
> > 2.35.1
> >


* Re: [PATCH 4/4] arm64: ftrace: Add return address protection
  2022-11-30 14:26     ` Ard Biesheuvel
@ 2022-11-30 17:45       ` Mark Rutland
  2022-12-01 13:09         ` Ard Biesheuvel
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Rutland @ 2022-11-30 17:45 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Kees Cook,
	Catalin Marinas, Mark Brown

On Wed, Nov 30, 2022 at 03:26:19PM +0100, Ard Biesheuvel wrote:
> On Wed, 30 Nov 2022 at 15:04, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Tue, Nov 29, 2022 at 03:18:03PM +0100, Ard Biesheuvel wrote:
> > > Use the newly added asm macros to protect and restore the return address
> > > in the ftrace call wrappers, based on whichever method is active (PAC
> > > and/or shadow call stack).
> > >
> > > If the graph tracer is in use, this covers both the return address *to*
> > > the ftrace call site as well as the return address *at* the call site,
> > > and the latter will either be restored in return_to_handler(), or before
> > > returning to the call site.
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > ---
> > >  arch/arm64/kernel/entry-ftrace.S | 28 +++++++++++++++++++-
> > >  1 file changed, 27 insertions(+), 1 deletion(-)
> >
> > As a heads-up, this code has recently changed quite significantly, and this
> > won't apply to the version queued in arm64's for-next/{ftrace,core} branches.
> >
> > I had a direction of travel in mind with some changes for better stacktracing,
> > which won't work with the approach here, so I'd prefer we do this a bit
> > differently; more on that below.
> >
> > > diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
> > > index 795344ab4ec45889..c744e4dd8c90a352 100644
> > > --- a/arch/arm64/kernel/entry-ftrace.S
> > > +++ b/arch/arm64/kernel/entry-ftrace.S
> > > @@ -35,6 +35,11 @@
> > >   * is missing from the LR and existing chain of frame records.
> > >   */
> > >       .macro  ftrace_regs_entry, allregs=0
> > > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > > +     protect_return_address x9
> > > +#endif
> > > +     protect_return_address x30
> >
> > I think if we're going to protect the callsite's original LR (x9 here), we
> > should do that regardless of CONFIG_FUNCTION_GRAPH_TRACER; what matters is
> > whether that's vulnerable rather than whether we intend to modify it, so I
> > don't think it makes sene to protect it conditionally based on
> > CONFIG_FUNCTION_GRAPH_TRACER.
> 
> My reasoning was that if we are not going to return from it (in
> return_to_handler()), we can rely on the interrupted function to
> sign/authenticate it as usual. So the only reason for signing it here
> is so that we can authenticate it in return_to_handler() if that
> exists on the call path, removing a potentially vulnerable sequence
> from that function.

What I was trying to point out is that there is a window where this is spilled
to the stack (and hence is potentially vulnerable) between
ftrace_{caller,regs_caller}() and the end of ftrace_common().

So if we don't protect this when CONFIG_FUNCTION_GRAPH_TRACER=n, it could be
clobbered during that window (e.g. while function tracers are invoked),
*before* we return back into the instrumented function and sign the
(potentially already clobbered) value.

Hence, my thinking is that we should sign this regardless of
CONFIG_FUNCTION_GRAPH_TRACER to mitigate that case. I agree that we also want
it to be signed while it's in the graph return stack (i.e. until the
instrumented function returns back to return_to_handler()). In general, we
should sign the value if it's going to be spilled to the stack.

> > I'm a bit worried this might confuse some ftrace code manipulating the return
> > address (e.g. manipulation of the ftrace graph return stack), as I don't think
> > that's all PAC-clean, and might need some modification.
> 
> This is the reason for the xpaci instruction below.

Unfortunately, that alone isn't sufficient.

What I was alluding to is that this change means the ftrace graph return stack
contains signed addresses, and other code doesn't expect that. For example,
arm64's stacktrace code currently depends on the graph return stack containing
plain pointers, and so that gets broken as of this patch when function graph
tracing is enabled:

| # uname -a
| # Linux buildroot 6.1.0-rc7-00003-g44a67f0b8ac7 #1 SMP PREEMPT Wed Nov 30 17:19:38 GMT 2022 aarch64 GNU/Linux
| # cat /proc/self/stack
| [<0>] proc_pid_stack+0xc0/0x130
| [<0>] proc_single_show+0x68/0x120
| [<0>] seq_read_iter+0x16c/0x45c
| [<0>] seq_read+0x98/0xd0
| [<0>] vfs_read+0xc8/0x2c0
| [<0>] ksys_read+0x78/0x110
| [<0>] __arm64_sys_read+0x24/0x30
| [<0>] invoke_syscall+0x50/0x120
| [<0>] el0_svc_common.constprop.0+0xd4/0xf4
| [<0>] do_el0_svc+0x34/0xd0
| [<0>] el0_svc+0x2c/0x84
| [<0>] el0t_64_sync_handler+0xf4/0x120
| [<0>] el0t_64_sync+0x18c/0x190
| # echo function_graph > /sys/kernel/debug/tracing/current_tracer 
| # cat /proc/self/stack
| [<0>] 0xf5f98000083dff40
| [<0>] 0xd6b88000083e0f68
| [<0>] 0x21ac800008381ad0
| [<0>] 0xd0bc800008381e58
| [<0>] 0x22b280000834bc28
| [<0>] 0xf0ca80000834c5c8
| [<0>] 0x299080000834c684
| [<0>] 0xb1a1800008029cf0
| [<0>] 0x9bd0800008029e94
| [<0>] 0x1788800008029ee8
| [<0>] 0xa08680000916dd5c
| [<0>] el0t_64_sync_handler+0xf4/0x120
| [<0>] el0t_64_sync+0x18c/0x190

That's unfortunate (and would break RELIABLE_STACKTRACE, which we're slowly
getting towards being able to implement), but it's simple enough to account for
in the stacktrace code.

I have a fear that there are other cases where code tries to consume the graph
return stack (or to match against entries within it), which would be similarly
broken. I vaguely recall that we had issues of that shape in the past when we
tried to adjust the reported PC value, and would need to go page that in to
check that we don't open a similar issue here.

> > > +
> > >       /* Make room for pt_regs, plus a callee frame */
> > >       sub     sp, sp, #(PT_REGS_SIZE + 16)
> > >
> > > @@ -89,7 +94,9 @@ SYM_CODE_START(ftrace_caller)
> > >       b       ftrace_common
> > >  SYM_CODE_END(ftrace_caller)
> > >
> > > -SYM_CODE_START(ftrace_common)
> > > +SYM_CODE_START_LOCAL(ftrace_common)
> > > +     alternative_insn  nop, "xpaci x30", ARM64_HAS_ADDRESS_AUTH, IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL)
> > > +
> > >       sub     x0, x30, #AARCH64_INSN_SIZE     // ip (callsite's BL insn)
> > >       mov     x1, x9                          // parent_ip (callsite's LR)
> > >       ldr_l   x2, function_trace_op           // op
> > > @@ -115,9 +122,27 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
> > >       ldr     x30, [sp, #S_LR]
> > >       ldr     x9, [sp, #S_PC]
> > >
> > > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > > +     /* grab the original return address from the stack */
> > > +     ldr     x10, [sp, #PT_REGS_SIZE + 8]
> > > +#endif
> >
> > I'm planning to teach the stack unwinder how to unwind through ftrace_regs,
> > such that we wouldn't need to duplicate the LR in a frame record here, and so
> > we'd *only* have the copy inside the struct ftrace_regs.
> 
> Does doing so solve anything beyond reducing the stack footprint by 16 bytes?

My concern is functional rather than stack space. Having the single copy means
that it's not possible for the two copies to become out-of-sync, and so the
unwinder will always return the actual return address even when it has been
rewritten. That's important for livepatching, where that may be changed for
function redirection rather than tracing (and so there's not a return path to
balance against), and similarly for ftrace direct calls / trampolines, which we
might need to implement a variant of.

I've already implemented similar logic for unwinding through the pt_regs, and
I'm planning to clean that up and get it out after v6.2-rc1:

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/stacktrace/metadata

> > I think we don't need the copy here if we sign the callsite's LR against the
> > base of the struct ftrace_regs. That way ftrace_graph_func() can sign the
> > updated return address, and this code wouldn't need to care. The ftrace_regs
> > have a copy of x18 that we can use to manipulate the SCS.
> 
> The updated return address will be signed when returning to the call
> site, and we never return from it here or anywhere else, so I don't
> think we need to sign it to begin with.

As above, there's a window where it's spilled to the stack, and I think we
should protect it for that window. Otherwise it can
be clobbered prior to being signed.

> What we need to sign here is the LR value that return_to_handler()
> will use, so ideally, we'd only sign the callsite's LR if we know we
> will be returning via return_to_handler().
> 
> > > +
> > >       /* Restore the callsite's SP */
> > >       add     sp, sp, #PT_REGS_SIZE + 16
> > >
> > > +     restore_return_address x9
> > > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > > +     /* compare the original return address with the actual one */
> > > +     cmp     x10, x30
> > > +     b.ne    0f
> > > +
> > > +     /*
> > > +      * If they are the same, unprotect it now. If it was modified, it will
> > > +      * be dealt with in return_to_handler() below.
> > > +      */
> > > +     restore_return_address x30
> > > +0:
> > > +#endif
> > >       ret     x9
> >
> > This means if the return address is clobbered, we'll blindly trust it without
> > authentication, which IMO undermines the point of signing it in the first
> > place.
> 
> How do you mean? x9 is authenticated here, and x30 will either be
> authenticated here or in return_to_handler()

I had confused myself here; sorry for the noise.

> > As above, I'd prefer that we had ftrace_graph_func() fix things up so that we
> > can unconditionally authenticate things here, which would be a bit stronger and
> > simpler to reason about.
> >
> 
> I think having all in one place makes it much easier to reason about,
> tbh. Adding additional handling of the PAC state as well as the shadow
> call stack in ftrace_graph_func() seems much more fiddly to me.

I appreciate that concern, but my intuition here is the inverse; I'd like to
avoid the conditionality in the regular tracing path to make that clearly
balanced and (from my perspective) easier to reason about.

I'm happy if we have to do a bit more work in ftrace_graph_func() and
return_to_handler() since those are already more special anyway.

Thanks,
Mark.


* Re: [PATCH 4/4] arm64: ftrace: Add return address protection
  2022-11-30 17:45       ` Mark Rutland
@ 2022-12-01 13:09         ` Ard Biesheuvel
  2022-12-01 14:40           ` Mark Rutland
  0 siblings, 1 reply; 14+ messages in thread
From: Ard Biesheuvel @ 2022-12-01 13:09 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Kees Cook,
	Catalin Marinas, Mark Brown

On Wed, 30 Nov 2022 at 18:45, Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Wed, Nov 30, 2022 at 03:26:19PM +0100, Ard Biesheuvel wrote:
> > On Wed, 30 Nov 2022 at 15:04, Mark Rutland <mark.rutland@arm.com> wrote:
> > > On Tue, Nov 29, 2022 at 03:18:03PM +0100, Ard Biesheuvel wrote:
> > > > Use the newly added asm macros to protect and restore the return address
> > > > in the ftrace call wrappers, based on whichever method is active (PAC
> > > > and/or shadow call stack).
> > > >
> > > > If the graph tracer is in use, this covers both the return address *to*
> > > > the ftrace call site as well as the return address *at* the call site,
> > > > and the latter will either be restored in return_to_handler(), or before
> > > > returning to the call site.
> > > >
> > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > ---
> > > >  arch/arm64/kernel/entry-ftrace.S | 28 +++++++++++++++++++-
> > > >  1 file changed, 27 insertions(+), 1 deletion(-)
> > >
> > > As a heads-up, this code has recently changed quite significantly, and this
> > > won't apply to the version queued in arm64's for-next/{ftrace,core} branches.
> > >
> > > I had a direction of travel in mind with some changes for better stacktracing,
> > > which won't work with the approach here, so I'd prefer we do this a bit
> > > differently; more on that below.
> > >
> > > > diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
> > > > index 795344ab4ec45889..c744e4dd8c90a352 100644
> > > > --- a/arch/arm64/kernel/entry-ftrace.S
> > > > +++ b/arch/arm64/kernel/entry-ftrace.S
> > > > @@ -35,6 +35,11 @@
> > > >   * is missing from the LR and existing chain of frame records.
> > > >   */
> > > >       .macro  ftrace_regs_entry, allregs=0
> > > > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > > > +     protect_return_address x9
> > > > +#endif
> > > > +     protect_return_address x30
> > >
> > > I think if we're going to protect the callsite's original LR (x9 here), we
> > > should do that regardless of CONFIG_FUNCTION_GRAPH_TRACER; what matters is
> > > whether that's vulnerable rather than whether we intend to modify it, so I
> > > don't think it makes sene to protect it conditionally based on
> > > CONFIG_FUNCTION_GRAPH_TRACER.
> >
> > My reasoning was that if we are not going to return from it (in
> > return_to_handler()), we can rely on the interrupted function to
> > sign/authenticate it as usual. So the only reason for signing it here
> > is so that we can authenticate it in return_to_handler() if that
> > exists on the call path, removing a potentially vulnerable sequence
> > from that function.
>
> What I was trying to point out is that there is a window where this is spilled
> to the stack (and hence is potentially vulnerable) between
> ftrace_{caller,regs_caller}() and the end of ftrace_common().
>
> So if we don't protect this when CONFIG_FUNCTION_GRAPH_TRACER=n, it could be
> clobbered during that window (e.g. while function tracers are invoked),
> *before* we return back into the instrumented function and sign the
> (potentially already clobbered) value.
>

Agreed.

But to clarify, the intent of this series is not to add protection to
ftrace; the intent is to get rid of the gadgets in the ftrace code
that can be abused even if you don't use ftrace at all.

> Hence, my thinking is that we should sign this regardless of
> CONFIG_FUNCTION_GRAPH_TRACER to mitigate that case. I agree that we also want
> it to be signed while it's in the graph return stack (i.e. until the
> instrumented function returns back to return_to_handler()). In general, we
> should sign the value if it's going to be spilled to the stack.
>

Sure, but it solves a different problem.

> > > I'm a bit worried this might confuse some ftrace code manipulating the return
> > > address (e.g. manipulation of the ftrace graph return stack), as I don't think
> > > that's all PAC-clean, and might need some modification.
> >
> > This is the reason for the xpaci instruction below.
>
> Unfortunately, that alone isn't sufficient.
>
> What I was alluding to is that this change means the ftrace graph return stack
> contains signed addresses, and other code doesn't expect that. For example,
> arm64's stacktrace code currently depends on the graph return stack containing
> plain pointers, and so that gets broken as of this patch when function graph
> tracing is enabled:
>
> | # uname -a
> | # Linux buildroot 6.1.0-rc7-00003-g44a67f0b8ac7 #1 SMP PREEMPT Wed Nov 30 17:19:38 GMT 2022 aarch64 GNU/Linux
> | # cat /proc/self/stack
> | [<0>] proc_pid_stack+0xc0/0x130
> | [<0>] proc_single_show+0x68/0x120
> | [<0>] seq_read_iter+0x16c/0x45c
> | [<0>] seq_read+0x98/0xd0
> | [<0>] vfs_read+0xc8/0x2c0
> | [<0>] ksys_read+0x78/0x110
> | [<0>] __arm64_sys_read+0x24/0x30
> | [<0>] invoke_syscall+0x50/0x120
> | [<0>] el0_svc_common.constprop.0+0xd4/0xf4
> | [<0>] do_el0_svc+0x34/0xd0
> | [<0>] el0_svc+0x2c/0x84
> | [<0>] el0t_64_sync_handler+0xf4/0x120
> | [<0>] el0t_64_sync+0x18c/0x190
> | # echo function_graph > /sys/kernel/debug/tracing/current_tracer
> | # cat /proc/self/stack
> | [<0>] 0xf5f98000083dff40
> | [<0>] 0xd6b88000083e0f68
> | [<0>] 0x21ac800008381ad0
> | [<0>] 0xd0bc800008381e58
> | [<0>] 0x22b280000834bc28
> | [<0>] 0xf0ca80000834c5c8
> | [<0>] 0x299080000834c684
> | [<0>] 0xb1a1800008029cf0
> | [<0>] 0x9bd0800008029e94
> | [<0>] 0x1788800008029ee8
> | [<0>] 0xa08680000916dd5c
> | [<0>] el0t_64_sync_handler+0xf4/0x120
> | [<0>] el0t_64_sync+0x18c/0x190
>
> That's unfortunate (and would break RELIABLE_STACKTRACE, which we're slowly
> getting towards being able to implement), but it's simple enough to account for
> in the stacktrace code.
>

Indeed. Those functions should just strip the PAC bits, no?
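
Something along these lines when consuming graph return stack entries,
perhaps (untested sketch; assuming the existing ptrauth_strip_insn_pac()
helper is the right tool for kernel pointers here, and graph_ret_addr()
is just a placeholder name):

	#include <asm/pointer_auth.h>

	/* strip the PAC before comparing/reporting a graph return stack entry */
	static inline unsigned long graph_ret_addr(unsigned long lr)
	{
		return ptrauth_strip_insn_pac(lr);
	}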

> I have a fear that there are other cases where code tries to consume the graph
> return stack (or to match against entries within it), which would be similarly
> broken. I vaguely recall that we had issues of that shape in the past when we
> tried to adjust the reported PC value, and would need to go page that in to
> check that we don't open a similar issue here.
>

OK

> > > > +
> > > >       /* Make room for pt_regs, plus a callee frame */
> > > >       sub     sp, sp, #(PT_REGS_SIZE + 16)
> > > >
> > > > @@ -89,7 +94,9 @@ SYM_CODE_START(ftrace_caller)
> > > >       b       ftrace_common
> > > >  SYM_CODE_END(ftrace_caller)
> > > >
> > > > -SYM_CODE_START(ftrace_common)
> > > > +SYM_CODE_START_LOCAL(ftrace_common)
> > > > +     alternative_insn  nop, "xpaci x30", ARM64_HAS_ADDRESS_AUTH, IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL)
> > > > +
> > > >       sub     x0, x30, #AARCH64_INSN_SIZE     // ip (callsite's BL insn)
> > > >       mov     x1, x9                          // parent_ip (callsite's LR)
> > > >       ldr_l   x2, function_trace_op           // op
> > > > @@ -115,9 +122,27 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
> > > >       ldr     x30, [sp, #S_LR]
> > > >       ldr     x9, [sp, #S_PC]
> > > >
> > > > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > > > +     /* grab the original return address from the stack */
> > > > +     ldr     x10, [sp, #PT_REGS_SIZE + 8]
> > > > +#endif
> > >
> > > I'm planning to teach the stack unwinder how to unwind through ftrace_regs,
> > > such that we wouldn't need to duplicate the LR in a frame record here, and so
> > > we'd *only* have the copy inside the struct ftrace_regs.
> >
> > Does doing so solve anything beyond reducing the stack footprint by 16 bytes?
>
> My concern is functional rather than stack space. Having the single copy means
> that it's not possible for the two copies to become out-of-sync, and so the
> unwinder will always return the actual return address even when it has been
> rewritten. That's important for livepatching, where that may be changed for
> function redirection rather than tracing (and so there's not a return path to
> balance against), and similarly for ftrace direct calls / trampolines, which we
> might need to implement a variant of.
>
> I've already implemented similar logic for unwinding through the pt_regs, and
> I'm planning to clean that up and get it out after v6.2-rc1:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/stacktrace/metadata
>
> > > I think we don't need the copy here if we sign the callsite's LR against the
> > > base of the struct ftrace_regs. That way ftrace_graph_func() can sign the
> > > updated return address, and this code wouldn't need to care. The ftrace_regs
> > > have a copy of x18 that we can use to manipulate the SCS.
> >
> > The updated return address will be signed when returning to the call
> > site, and we never return from it here or anywhere else, so I don't
> > think we need to sign it to begin with.
>
> As above, there's a window where it's spilled to the stack, and I think we
> should protect it for that window where it has been spilled. Otherwise it can
> be clobbered prior to being signed.
>

Yes, if ftrace is in use.

> > What we need to sign here is the LR value that return_to_handler()
> > will use, so ideally, we'd only sign the callsite's LR if we know we
> > will be returning via return_to_handler().
> >
> > > > +
> > > >       /* Restore the callsite's SP */
> > > >       add     sp, sp, #PT_REGS_SIZE + 16
> > > >
> > > > +     restore_return_address x9
> > > > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > > > +     /* compare the original return address with the actual one */
> > > > +     cmp     x10, x30
> > > > +     b.ne    0f
> > > > +
> > > > +     /*
> > > > +      * If they are the same, unprotect it now. If it was modified, it will
> > > > +      * be dealt with in return_to_handler() below.
> > > > +      */
> > > > +     restore_return_address x30
> > > > +0:
> > > > +#endif
> > > >       ret     x9
> > >
> > > This means if the return address is clobbered, we'll blindly trust it without
> > > authentication, which IMO undermines the point of signing it in the first
> > > place.
> >
> > How do you mean? x9 is authenticated here, and x30 will either be
> > authenticated here or in return_to_handler()
>
> I had confused myself here; sorry for the noise.
>
> > > As above, I'd prefer that we had ftrace_graph_func() fix things up so that we
> > > can unconditionally authenticate things here, which would be a bit stronger and
> > > simpler to reason about.
> > >
> >
> > I think having all in one place makes it much easier to reason about,
> > tbh. Adding additional handling of the PAC state as well as the shadow
> > call stack in ftrace_graph_func() seems much more fiddly to me.
>
> I appreciate that concern, but my intuition here is the inverse; I'd like to
> avoid the conditionality in the regular tracing path to make that clearly
> balanced and (from my perspective) easier to reason about.
>
> I'm happy if we have to do a bit more work in ftrace_graph_func() and
> return_to_handler() since those are already more special anyway.
>

Fair enough. As long as the asm routines have a SCS pop or AUTIASP
between reloading x30 and returning to it, I don't have any problems
with that.
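
(In other words, an exit path of roughly one of the following shapes - a
sketch only, reusing the S_LR offset from the existing code, not the final
macros:)

        /* PAC: authenticate x30 after reloading it, before the return */
        ldr     x30, [sp, #S_LR]
        autiasp
        ret

        /* shadow call stack: take the return address from the SCS instead */
        ldr     x30, [x18, #-8]!
        ret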


* Re: [PATCH 4/4] arm64: ftrace: Add return address protection
  2022-12-01 13:09         ` Ard Biesheuvel
@ 2022-12-01 14:40           ` Mark Rutland
  2022-12-01 15:05             ` Ard Biesheuvel
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Rutland @ 2022-12-01 14:40 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Kees Cook,
	Catalin Marinas, Mark Brown

On Thu, Dec 01, 2022 at 02:09:41PM +0100, Ard Biesheuvel wrote:
> On Wed, 30 Nov 2022 at 18:45, Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Wed, Nov 30, 2022 at 03:26:19PM +0100, Ard Biesheuvel wrote:
> > > On Wed, 30 Nov 2022 at 15:04, Mark Rutland <mark.rutland@arm.com> wrote:
> > > > On Tue, Nov 29, 2022 at 03:18:03PM +0100, Ard Biesheuvel wrote:
> > > > > Use the newly added asm macros to protect and restore the return address
> > > > > in the ftrace call wrappers, based on whichever method is active (PAC
> > > > > and/or shadow call stack).
> > > > >
> > > > > If the graph tracer is in use, this covers both the return address *to*
> > > > > the ftrace call site as well as the return address *at* the call site,
> > > > > and the latter will either be restored in return_to_handler(), or before
> > > > > returning to the call site.
> > > > >
> > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > > ---
> > > > >  arch/arm64/kernel/entry-ftrace.S | 28 +++++++++++++++++++-
> > > > >  1 file changed, 27 insertions(+), 1 deletion(-)
> > > >
> > > > As a heads-up, this code has recently changed quite significantly, and this
> > > > won't apply to the version queued in arm64's for-next/{ftrace,core} branches.
> > > >
> > > > I had a direction of travel in mind with some changes for better stacktracing,
> > > > which won't work with the approach here, so I'd prefer we do this a bit
> > > > differently; more on that below.
> > > >
> > > > > diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
> > > > > index 795344ab4ec45889..c744e4dd8c90a352 100644
> > > > > --- a/arch/arm64/kernel/entry-ftrace.S
> > > > > +++ b/arch/arm64/kernel/entry-ftrace.S
> > > > > @@ -35,6 +35,11 @@
> > > > >   * is missing from the LR and existing chain of frame records.
> > > > >   */
> > > > >       .macro  ftrace_regs_entry, allregs=0
> > > > > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > > > > +     protect_return_address x9
> > > > > +#endif
> > > > > +     protect_return_address x30
> > > >
> > > > I think if we're going to protect the callsite's original LR (x9 here), we
> > > > should do that regardless of CONFIG_FUNCTION_GRAPH_TRACER; what matters is
> > > > whether that's vulnerable rather than whether we intend to modify it, so I
> > > > don't think it makes sense to protect it conditionally based on
> > > > CONFIG_FUNCTION_GRAPH_TRACER.
> > >
> > > My reasoning was that if we are not going to return from it (in
> > > return_to_handler()), we can rely on the interrupted function to
> > > sign/authenticate it as usual. So the only reason for signing it here
> > > is so that we can authenticate it in return_to_handler() if that
> > > exists on the call path, removing a potentially vulnerable sequence
> > > from that function.
> >
> > What I was trying to point out is that there is a window where this is spilled
> > to the stack (and hence is potentially vulnerable) between
> > ftrace_{caller,regs_caller}() and the end of ftrace_common().
> >
> > So if we don't protect this when CONFIG_FUNCTION_GRAPH_TRACER=n, it could be
> > clobbered during that window (e.g. while function tracers are invoked),
> > *before* we return back into the instrumented function and sign the
> > (potentially already clobbered) value.
> 
> Agreed.
> 
> But to clarify, the intent of this series is not to add protection to
> ftrace, the intent is to get rid of the gadgets from the ftrace code
> that can be abused even if you don't use ftrace at all.

Ok; sorry for missing that; I'll need to think a little harder.

> > Hence, my thinking is that we should sign this regardless of
> > CONFIG_FUNCTION_GRAPH_TRACER to mitigate that case. I agree that we also want
> > it to be signed while it's in the graph return stack (i.e. until the
> > instrumented function returns back to return_to_handler()). In general, we
> > should sign the value if it's going to be spilled to the stack.
> 
> Sure, but it solves a different problem.

Fair enough!

I think we're agreed that something which solves both issues makes sense, even
if that's not necessary for the gadgetisation issue specifically?

> > > > I'm a bit worried this might confuse some ftrace code manipulating the return
> > > > address (e.g. manipulation of the ftrace graph return stack), as I don't think
> > > > that's all PAC-clean, and might need some modification.
> > >
> > > This is the reason for the xpaci instruction below.
> >
> > Unfortunately, that alone isn't sufficient.
> >
> > What I was alluding to is that this change means the ftrace graph return stack
> > contains signed addresses, and other code doesn't expect that. For example,
> > arm64's stacktrace code currently depends on the graph return stack containing
> > plain pointers, and so that gets broken as of this patch when function graph
> > tracing is enabled:
> >
> > | # uname -a
> > | # Linux buildroot 6.1.0-rc7-00003-g44a67f0b8ac7 #1 SMP PREEMPT Wed Nov 30 17:19:38 GMT 2022 aarch64 GNU/Linux
> > | # cat /proc/self/stack
> > | [<0>] proc_pid_stack+0xc0/0x130
> > | [<0>] proc_single_show+0x68/0x120
> > | [<0>] seq_read_iter+0x16c/0x45c
> > | [<0>] seq_read+0x98/0xd0
> > | [<0>] vfs_read+0xc8/0x2c0
> > | [<0>] ksys_read+0x78/0x110
> > | [<0>] __arm64_sys_read+0x24/0x30
> > | [<0>] invoke_syscall+0x50/0x120
> > | [<0>] el0_svc_common.constprop.0+0xd4/0xf4
> > | [<0>] do_el0_svc+0x34/0xd0
> > | [<0>] el0_svc+0x2c/0x84
> > | [<0>] el0t_64_sync_handler+0xf4/0x120
> > | [<0>] el0t_64_sync+0x18c/0x190
> > | # echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > | # cat /proc/self/stack
> > | [<0>] 0xf5f98000083dff40
> > | [<0>] 0xd6b88000083e0f68
> > | [<0>] 0x21ac800008381ad0
> > | [<0>] 0xd0bc800008381e58
> > | [<0>] 0x22b280000834bc28
> > | [<0>] 0xf0ca80000834c5c8
> > | [<0>] 0x299080000834c684
> > | [<0>] 0xb1a1800008029cf0
> > | [<0>] 0x9bd0800008029e94
> > | [<0>] 0x1788800008029ee8
> > | [<0>] 0xa08680000916dd5c
> > | [<0>] el0t_64_sync_handler+0xf4/0x120
> > | [<0>] el0t_64_sync+0x18c/0x190
> >
> > That's unfortunate (and would break RELIABLE_STACKTRACE, which we're slowly
> > getting towards being able to implement), but it's simple enough to account for
> > in the stacktrace code.
> >
> 
> Indeed. Those functions should just strip the PAC bits, no?

For that case, yup. That was roughly what I meant about it being simple to deal
with in the stacktrace code. :)

> > I have a fear that there are other cases where code tries to consume the graph
> > return stack (or to match against entries within it), which would be similarly
> > broken. I vaguely recall that we had issues of that shape in the past when we
> > tried to adjust the reported PC value, and would need to go page that in to
> > check that we don't open a similar issue here.
> 
> OK

FWIW, I'm happy to go audit that, I just wanted to make sure we didn't forget
to do so, since it's not obvious that there are potential issues there.

[...]

> > > > >       /* Restore the callsite's SP */
> > > > >       add     sp, sp, #PT_REGS_SIZE + 16
> > > > >
> > > > > +     restore_return_address x9
> > > > > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > > > > +     /* compare the original return address with the actual one */
> > > > > +     cmp     x10, x30
> > > > > +     b.ne    0f
> > > > > +
> > > > > +     /*
> > > > > +      * If they are the same, unprotect it now. If it was modified, it will
> > > > > +      * be dealt with in return_to_handler() below.
> > > > > +      */
> > > > > +     restore_return_address x30
> > > > > +0:
> > > > > +#endif
> > > > >       ret     x9

> > > > As above, I'd prefer that we had ftrace_graph_func() fix things up so that we
> > > > can unconditionally authenticate things here, which would be a bit stronger and
> > > > simpler to reason about.
> > > >
> > >
> > > I think having all in one place makes it much easier to reason about,
> > > tbh. Adding additional handling of the PAC state as well as the shadow
> > > call stack in ftrace_graph_func() seems much more fiddly to me.
> >
> > I appreciate that concern, but my intuition here is the inverse; I'd like to
> > avoid the conditionality in the regular tracing path to make that clearly
> > balanced and (from my perspective) easier to reason about.
> >
> > I'm happy if we have to do a bit more work in ftrace_graph_func() and
> > return_to_handler() since those are already more special anyway.
> >
> 
> Fair enough. As long as the asm routines have a SCS pop or AUTIASP
> between reloading x30 and returning to it, I don't have any problems
> with that.

Sure; I think that's workable. I have a rough shape in mind, so I'll have a go
at that as an example and try to get back to you shortly.

With that in mind, I think we should also fix up
qcom_link_stack_sanitisation(), since that ends up creating a gadget of the form:

	MOV	X30, Xn
	RET

... and that can be fixed by leaving it to the compiler to save/restore x30,
whereupon it should create a frame record and all the usual PAC goodness.
Example patch below (reformatted into the usual arm64 inline asm style).

Thanks,
Mark.

---->8----
diff --git a/arch/arm64/kernel/proton-pack.c b/arch/arm64/kernel/proton-pack.c
index bfce41c2a53b3..9fc54facf1ccb 100644
--- a/arch/arm64/kernel/proton-pack.c
+++ b/arch/arm64/kernel/proton-pack.c
@@ -250,12 +250,13 @@ static noinstr void qcom_link_stack_sanitisation(void)
 {
        u64 tmp;
 
-       asm volatile("mov       %0, x30         \n"
-                    ".rept     16              \n"
-                    "bl        . + 4           \n"
-                    ".endr                     \n"
-                    "mov       x30, %0         \n"
-                    : "=&r" (tmp));
+       asm volatile(
+       "       .rept   16              \n"
+       "       bl      . + 4           \n"
+       "       .endr                   \n"
+       : "=&r" (tmp)
+       :
+       : "x30");
 }
 
 static bp_hardening_cb_t spectre_v2_get_sw_mitigation_cb(void)



* Re: [PATCH 4/4] arm64: ftrace: Add return address protection
  2022-12-01 14:40           ` Mark Rutland
@ 2022-12-01 15:05             ` Ard Biesheuvel
  2022-12-01 15:48               ` Mark Rutland
  0 siblings, 1 reply; 14+ messages in thread
From: Ard Biesheuvel @ 2022-12-01 15:05 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Kees Cook,
	Catalin Marinas, Mark Brown

On Thu, 1 Dec 2022 at 15:40, Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Thu, Dec 01, 2022 at 02:09:41PM +0100, Ard Biesheuvel wrote:
> > On Wed, 30 Nov 2022 at 18:45, Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > On Wed, Nov 30, 2022 at 03:26:19PM +0100, Ard Biesheuvel wrote:
> > > > On Wed, 30 Nov 2022 at 15:04, Mark Rutland <mark.rutland@arm.com> wrote:
> > > > > On Tue, Nov 29, 2022 at 03:18:03PM +0100, Ard Biesheuvel wrote:
> > > > > > Use the newly added asm macros to protect and restore the return address
> > > > > > in the ftrace call wrappers, based on whichever method is active (PAC
> > > > > > and/or shadow call stack).
> > > > > >
> > > > > > If the graph tracer is in use, this covers both the return address *to*
> > > > > > the ftrace call site as well as the return address *at* the call site,
> > > > > > and the latter will either be restored in return_to_handler(), or before
> > > > > > returning to the call site.
> > > > > >
> > > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > > > ---
> > > > > >  arch/arm64/kernel/entry-ftrace.S | 28 +++++++++++++++++++-
> > > > > >  1 file changed, 27 insertions(+), 1 deletion(-)
> > > > >
> > > > > As a heads-up, this code has recently changed quite significantly, and this
> > > > > won't apply to the version queued in arm64's for-next/{ftrace,core} branches.
> > > > >
> > > > > I had a direction of travel in mind with some changes for better stacktracing,
> > > > > which won't work with the approach here, so I'd prefer we do this a bit
> > > > > differently; more on that below.
> > > > >
> > > > > > diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
> > > > > > index 795344ab4ec45889..c744e4dd8c90a352 100644
> > > > > > --- a/arch/arm64/kernel/entry-ftrace.S
> > > > > > +++ b/arch/arm64/kernel/entry-ftrace.S
> > > > > > @@ -35,6 +35,11 @@
> > > > > >   * is missing from the LR and existing chain of frame records.
> > > > > >   */
> > > > > >       .macro  ftrace_regs_entry, allregs=0
> > > > > > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > > > > > +     protect_return_address x9
> > > > > > +#endif
> > > > > > +     protect_return_address x30
> > > > >
> > > > > I think if we're going to protect the callsite's original LR (x9 here), we
> > > > > should do that regardless of CONFIG_FUNCTION_GRAPH_TRACER; what matters is
> > > > > whether that's vulnerable rather than whether we intend to modify it, so I
> > > > > don't think it makes sense to protect it conditionally based on
> > > > > CONFIG_FUNCTION_GRAPH_TRACER.
> > > >
> > > > My reasoning was that if we are not going to return from it (in
> > > > return_to_handler()), we can rely on the interrupted function to
> > > > sign/authenticate it as usual. So the only reason for signing it here
> > > > is so that we can authenticate it in return_to_handler() if that
> > > > exists on the call path, removing a potentially vulnerable sequence
> > > > from that function.
> > >
> > > What I was trying to point out is that there is a window where this is spilled
> > > to the stack (and hence is potentially vulnerable) between
> > > ftrace_{caller,regs_caller}() and the end of ftrace_common().
> > >
> > > So if we don't protect this when CONFIG_FUNCTION_GRAPH_TRACER=n, it could be
> > > clobbered during that window (e.g. while function tracers are invoked),
> > > *before* we return back into the instrumented function and sign the
> > > (potentially already clobbered) value.
> >
> > Agreed.
> >
> > But to clarify, the intent of this series is not to add protection to
> > ftrace, the intent is to get rid of the gadgets from the ftrace code
> > that can be abused even if you don't use ftrace at all.
>
> Ok; sorry for missing that; I'll need to think a little harder.
>

You said it :-)

> > > Hence, my thinking is that we should sign this regardless of
> > > CONFIG_FUNCTION_GRAPH_TRACER to mitigate that case. I agree that we also want
> > > it to be signed while it's in the graph return stack (i.e. until the
> > > instrumented function returns back to return_to_handler()). In general, we
> > > should sign the value if it's going to be spilled to the stack.
> >
> > Sure, but it solves a different problem.
>
> Fair enough!
>
> I think we're agreed that something which solves both issues makes sense, even
> if that's not necessary for the gadgetisation issue specifically?
>

Of course.

So the issue we are talking about here is the fact that you might be
able to attack the ftrace infrastructure while it is being used so
that the function return from ftrace_common() is made to point
somewhere else. I agree that this is something we might want to
harden, and I also wonder whether we should perhaps insert three NOPs
instead of two, or teach the compiler to put its PACIASP right after
the patch site so that we can use BR instead of RET to perform the return.
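
(To visualise the second idea - a sketch under the assumption that the
callee's PACIASP could be made to sit exactly where the trampoline returns
to, which also relies on PACIASP remaining an acceptable BR target:)

        /* today: ftrace_common exits with RET, which is exempt from BTI checks */
        ret     x9                      // x9 = insn after the callee's patch site

        /* idea: land on the callee's paciasp and exit with BR instead, so the
         * destination must be a BTI-compatible landing pad */
        br      x9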

But again, this is ground that I am currently not attempting to cover.

> > > > > I'm a bit worried this might confuse some ftrace code manipulating the return
> > > > > address (e.g. manipulation of the ftrace graph return stack), as I don't think
> > > > > that's all PAC-clean, and might need some modification.
> > > >
> > > > This is the reason for the xpaci instruction below.
> > >
> > > Unfortunately, that alone isn't sufficient.
> > >
> > > What I was alluding to is that this change means the ftrace graph return stack
> > > contains signed addresses, and other code doesn't expect that. For example,
> > > arm64's stacktrace code currently depends on the graph return stack containing
> > > plain pointers, and so that gets broken as of this patch when function graph
> > > tracing is enabled:
> > >
> > > | # uname -a
> > > | # Linux buildroot 6.1.0-rc7-00003-g44a67f0b8ac7 #1 SMP PREEMPT Wed Nov 30 17:19:38 GMT 2022 aarch64 GNU/Linux
> > > | # cat /proc/self/stack
> > > | [<0>] proc_pid_stack+0xc0/0x130
> > > | [<0>] proc_single_show+0x68/0x120
> > > | [<0>] seq_read_iter+0x16c/0x45c
> > > | [<0>] seq_read+0x98/0xd0
> > > | [<0>] vfs_read+0xc8/0x2c0
> > > | [<0>] ksys_read+0x78/0x110
> > > | [<0>] __arm64_sys_read+0x24/0x30
> > > | [<0>] invoke_syscall+0x50/0x120
> > > | [<0>] el0_svc_common.constprop.0+0xd4/0xf4
> > > | [<0>] do_el0_svc+0x34/0xd0
> > > | [<0>] el0_svc+0x2c/0x84
> > > | [<0>] el0t_64_sync_handler+0xf4/0x120
> > > | [<0>] el0t_64_sync+0x18c/0x190
> > > | # echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > > | # cat /proc/self/stack
> > > | [<0>] 0xf5f98000083dff40
> > > | [<0>] 0xd6b88000083e0f68
> > > | [<0>] 0x21ac800008381ad0
> > > | [<0>] 0xd0bc800008381e58
> > > | [<0>] 0x22b280000834bc28
> > > | [<0>] 0xf0ca80000834c5c8
> > > | [<0>] 0x299080000834c684
> > > | [<0>] 0xb1a1800008029cf0
> > > | [<0>] 0x9bd0800008029e94
> > > | [<0>] 0x1788800008029ee8
> > > | [<0>] 0xa08680000916dd5c
> > > | [<0>] el0t_64_sync_handler+0xf4/0x120
> > > | [<0>] el0t_64_sync+0x18c/0x190
> > >
> > > That's unfortunate (and would break RELIABLE_STACKTRACE, which we're slowly
> > > getting towards being able to implement), but it's simple enough to account for
> > > in the stacktrace code.
> > >
> >
> > Indeed. Those functions should just strip the PAC bits, no?
>
> For that case, yup. That was roughly what I meant about it being simple to deal
> with in the stacktrace code. :)
>

Right. So given that this is an issue for PAC but not for shadow call
stack, we might consider a shorter term fix where we push/pop these
addresses to the shadow call stack, and address the PAC clearing more
comprehensively once we get around to it.
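
(For reference, the shadow call stack side is just the usual x18-based
push/pop, sketched here:)

        /* push the (plain) return address onto the shadow call stack */
        str     x30, [x18], #8

        /* ...and pop it back immediately before the return */
        ldr     x30, [x18, #-8]!
        ret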

> > > I have a fear that there are other cases where code tries to consume the graph
> > > return stack (or to match against entries within it), which would be similarly
> > > broken. I vaguely recall that we had issues of that shape in the past when we
> > > tried to adjust the reported PC value, and would need to go page that in to
> > > check that we don't open a similar issue here.
> >
> > OK
>
> FWIW, I'm happy to go audit that, I just wanted to make sure we didn't forget
> to do so, since it's not obvious that there are potential issues there.
>

Great.

> > > > > >       /* Restore the callsite's SP */
> > > > > >       add     sp, sp, #PT_REGS_SIZE + 16
> > > > > >
> > > > > > +     restore_return_address x9
> > > > > > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > > > > > +     /* compare the original return address with the actual one */
> > > > > > +     cmp     x10, x30
> > > > > > +     b.ne    0f
> > > > > > +
> > > > > > +     /*
> > > > > > +      * If they are the same, unprotect it now. If it was modified, it will
> > > > > > +      * be dealt with in return_to_handler() below.
> > > > > > +      */
> > > > > > +     restore_return_address x30
> > > > > > +0:
> > > > > > +#endif
> > > > > >       ret     x9
>
> > > > > As above, I'd prefer that we had ftrace_graph_func() fix things up so that we
> > > > > can unconditionally authenticate things here, which would be a bit stronger and
> > > > > simpler to reason about.
> > > > >
> > > >
> > > > I think having all in one place makes it much easier to reason about,
> > > > tbh. Adding additional handling of the PAC state as well as the shadow
> > > > call stack in ftrace_graph_func() seems much more fiddly to me.
> > >
> > > I appreciate that concern, but my intuition here is the inverse; I'd like to
> > > avoid the conditionality in the regular tracing path to make that clearly
> > > balanced and (from my perspective) easier to reason about.
> > >
> > > I'm happy if we have to do a bit more work in ftrace_graph_func() and
> > > return_to_handler() since those are already more special anyway.
> > >
> >
> > Fair enough. As long as the asm routines have a SCS pop or AUTIASP
> > between reloading x30 and returning to it, I don't have any problems
> > with that.
>
> Sure; I think that's workable. I have a rough shape in mind, so I'll have a go
> at that as an example and try to get back to you shortly.
>

Thanks.

> With that in mind, I think we should also fix up
> qcom_link_stack_sanitisation(), since that ends up creating a gadget of the form:
>
>         MOV     X30, Xn
>         RET
>
> ... and that can be fixed by leaving it to the compiler to save/restore x30,
> whereupon it should create a frame record and all the usual PAC goodness.
> Example patch below (reformatted into the usual arm64 inline asm style).
>
> Thanks,
> Mark.
>
> ---->8----
> diff --git a/arch/arm64/kernel/proton-pack.c b/arch/arm64/kernel/proton-pack.c
> index bfce41c2a53b3..9fc54facf1ccb 100644
> --- a/arch/arm64/kernel/proton-pack.c
> +++ b/arch/arm64/kernel/proton-pack.c
> @@ -250,12 +250,13 @@ static noinstr void qcom_link_stack_sanitisation(void)
>  {
>         u64 tmp;
>
> -       asm volatile("mov       %0, x30         \n"
> -                    ".rept     16              \n"
> -                    "bl        . + 4           \n"
> -                    ".endr                     \n"
> -                    "mov       x30, %0         \n"
> -                    : "=&r" (tmp));
> +       asm volatile(
> +       "       .rept   16              \n"
> +       "       bl      . + 4           \n"
> +       "       .endr                   \n"
> +       : "=&r" (tmp)
> +       :
> +       : "x30");
>  }
>

Yeah, I'm sure that's the last one :-)


* Re: [PATCH 4/4] arm64: ftrace: Add return address protection
  2022-12-01 15:05             ` Ard Biesheuvel
@ 2022-12-01 15:48               ` Mark Rutland
  0 siblings, 0 replies; 14+ messages in thread
From: Mark Rutland @ 2022-12-01 15:48 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Kees Cook,
	Catalin Marinas, Mark Brown

On Thu, Dec 01, 2022 at 04:05:35PM +0100, Ard Biesheuvel wrote:
> On Thu, 1 Dec 2022 at 15:40, Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Thu, Dec 01, 2022 at 02:09:41PM +0100, Ard Biesheuvel wrote:
> > > On Wed, 30 Nov 2022 at 18:45, Mark Rutland <mark.rutland@arm.com> wrote:
> > > >
> > > > On Wed, Nov 30, 2022 at 03:26:19PM +0100, Ard Biesheuvel wrote:
> > > > > On Wed, 30 Nov 2022 at 15:04, Mark Rutland <mark.rutland@arm.com> wrote:
> > > > > > On Tue, Nov 29, 2022 at 03:18:03PM +0100, Ard Biesheuvel wrote:
> > > > > > > Use the newly added asm macros to protect and restore the return address
> > > > > > > in the ftrace call wrappers, based on whichever method is active (PAC
> > > > > > > and/or shadow call stack).
> > > > > > >
> > > > > > > If the graph tracer is in use, this covers both the return address *to*
> > > > > > > the ftrace call site as well as the return address *at* the call site,
> > > > > > > and the latter will either be restored in return_to_handler(), or before
> > > > > > > returning to the call site.
> > > > > > >
> > > > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > > > > ---
> > > > > > >  arch/arm64/kernel/entry-ftrace.S | 28 +++++++++++++++++++-
> > > > > > >  1 file changed, 27 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > As a heads-up, this code has recently changed quite significantly, and this
> > > > > > won't apply to the version queued in arm64's for-next/{ftrace,core} branches.
> > > > > >
> > > > > > I had a direction of travel in mind with some changes for better stacktracing,
> > > > > > which won't work with the approach here, so I'd prefer we do this a bit
> > > > > > differently; more on that below.
> > > > > >
> > > > > > > diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
> > > > > > > index 795344ab4ec45889..c744e4dd8c90a352 100644
> > > > > > > --- a/arch/arm64/kernel/entry-ftrace.S
> > > > > > > +++ b/arch/arm64/kernel/entry-ftrace.S
> > > > > > > @@ -35,6 +35,11 @@
> > > > > > >   * is missing from the LR and existing chain of frame records.
> > > > > > >   */
> > > > > > >       .macro  ftrace_regs_entry, allregs=0
> > > > > > > +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > > > > > > +     protect_return_address x9
> > > > > > > +#endif
> > > > > > > +     protect_return_address x30
> > > > > >
> > > > > > I think if we're going to protect the callsite's original LR (x9 here), we
> > > > > > should do that regardless of CONFIG_FUNCTION_GRAPH_TRACER; what matters is
> > > > > > whether that's vulnerable rather than whether we intend to modify it, so I
> > > > > > don't think it makes sense to protect it conditionally based on
> > > > > > CONFIG_FUNCTION_GRAPH_TRACER.
> > > > >
> > > > > My reasoning was that if we are not going to return from it (in
> > > > > return_to_handler()), we can rely on the interrupted function to
> > > > > sign/authenticate it as usual. So the only reason for signing it here
> > > > > is so that we can authenticate it in return_to_handler() if that
> > > > > exists on the call path, removing a potentially vulnerable sequence
> > > > > from that function.
> > > >
> > > > What I was trying to point out is that there is a window where this is spilled
> > > > to the stack (and hence is potentially vulnerable) between
> > > > ftrace_{caller,regs_caller}() and the end of ftrace_common().
> > > >
> > > > So if we don't protect this when CONFIG_FUNCTION_GRAPH_TRACER=n, it could be
> > > > clobbered during that window (e.g. while function tracers are invoked),
> > > > *before* we return back into the instrumented function and sign the
> > > > (potentially already clobbered) value.
> > >
> > > Agreed.
> > >
> > > But to clarify, the intent of this series is not to add protection to
> > > ftrace, the intent is to get rid of the gadgets from the ftrace code
> > > that can be abused even if you don't use ftrace at all.
> >
> > Ok; sorry for missing that; I'll need to think a little harder.
> >
> 
> You said it :-)
> 
> > > > Hence, my thinking is that we should sign this regardless of
> > > > CONFIG_FUNCTION_GRAPH_TRACER to mitigate that case. I agree that we also want
> > > > it to be signed while it's in the graph return stack (i.e. until the
> > > > instrumented function returns back to return_to_handler()). In general, we
> > > > should sign the value if it's going to be spilled to the stack.
> > >
> > > Sure, but it solves a different problem.
> >
> > Fair enough!
> >
> > I think we're agreed that something which solves both issues makes sense, even
> > if that's not necessary for the gadgetisation issue specifically?
> 
> Of course.

Great -- just wanted to check there wasn't an inverse problem I'd missed!

> So the issue we are talking about here is the fact that you might be
> able to attack the ftrace infrastructure while it is being used so
> that the function return from ftrace_common() is made to point
> somewhere else.

Yup. I suspect the risk is much lower due to the smaller amount of code there,
but given things like fprobe and BPF hooks, there might be code that gets
injected there which isn't as careful as we'd like, so it would be nice to
protect that path as well.

> I agree that this is something we might want to
> harden, and I also wonder whether we should perhaps insert three NOPs
> instead of two, or teach the compiler to put its PACIASP right after
> so that we can use BR instead of RET to perform the return.

I think that approach is a mixed bag :/

I was hoping that we could reduce the set of BTI-compatible instructions we
have, and I'd like to get to a point where we can set SCTLR_ELx.BT1=1 so that
PACIASP isn't an implicit BTI in the kernel. That way we'd be in a similar boat
to x86 after redundant ENDBRs are removed, with forward-edge protection being
strengthened and EXPORT-control being strengthened. That needs new compiler
help to output separate BTI and PACIASP instructions, but otherwise that's
relatively simple, and could significantly reduce the set of gadgetizable
functions regardless of whether ftrace is in use.

Given that, I'm not keen on adding an extra BTI-compatible instruction into
function prologues.

> But again, this is ground that I am currently not attempting to cover.
> 
> > > > > > I'm a bit worried this might confuse some ftrace code manipulating the return
> > > > > > address (e.g. manipulation of the ftrace graph return stack), as I don't think
> > > > > > that's all PAC-clean, and might need some modification.
> > > > >
> > > > > This is the reason for the xpaci instruction below.
> > > >
> > > > Unfortunately, that alone isn't sufficient.
> > > >
> > > > What I was alluding to is that this change means the ftrace graph return stack
> > > > contains signed addresses, and other code doesn't expect that. For example,
> > > > arm64's stacktrace code currently depends on the graph return stack containing
> > > > plain pointers, and so that gets broken as of this patch when function graph
> > > > tracing is enabled:
> > > >
> > > > | # uname -a
> > > > | # Linux buildroot 6.1.0-rc7-00003-g44a67f0b8ac7 #1 SMP PREEMPT Wed Nov 30 17:19:38 GMT 2022 aarch64 GNU/Linux
> > > > | # cat /proc/self/stack
> > > > | [<0>] proc_pid_stack+0xc0/0x130
> > > > | [<0>] proc_single_show+0x68/0x120
> > > > | [<0>] seq_read_iter+0x16c/0x45c
> > > > | [<0>] seq_read+0x98/0xd0
> > > > | [<0>] vfs_read+0xc8/0x2c0
> > > > | [<0>] ksys_read+0x78/0x110
> > > > | [<0>] __arm64_sys_read+0x24/0x30
> > > > | [<0>] invoke_syscall+0x50/0x120
> > > > | [<0>] el0_svc_common.constprop.0+0xd4/0xf4
> > > > | [<0>] do_el0_svc+0x34/0xd0
> > > > | [<0>] el0_svc+0x2c/0x84
> > > > | [<0>] el0t_64_sync_handler+0xf4/0x120
> > > > | [<0>] el0t_64_sync+0x18c/0x190
> > > > | # echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > > > | # cat /proc/self/stack
> > > > | [<0>] 0xf5f98000083dff40
> > > > | [<0>] 0xd6b88000083e0f68
> > > > | [<0>] 0x21ac800008381ad0
> > > > | [<0>] 0xd0bc800008381e58
> > > > | [<0>] 0x22b280000834bc28
> > > > | [<0>] 0xf0ca80000834c5c8
> > > > | [<0>] 0x299080000834c684
> > > > | [<0>] 0xb1a1800008029cf0
> > > > | [<0>] 0x9bd0800008029e94
> > > > | [<0>] 0x1788800008029ee8
> > > > | [<0>] 0xa08680000916dd5c
> > > > | [<0>] el0t_64_sync_handler+0xf4/0x120
> > > > | [<0>] el0t_64_sync+0x18c/0x190
> > > >
> > > > That's unfortunate (and would break RELIABLE_STACKTRACE, which we're slowly
> > > > getting towards being able to implement), but it's simple enough to account for
> > > > in the stacktrace code.
> > > >
> > >
> > > Indeed. Those functions should just strip the PAC bits, no?
> >
> > For that case, yup. That was roughly what I meant about it being simple to deal
> > with in the stacktrace code. :)
> 
> Right. So given that this is an issue for PAC but not for shadow call
> stack, we might consider a shorter term fix where we push/pop these
> addresses to the shadow call stack, and address the PAC clearing more
> comprehensively once we get around to it.

I'm not necessarily opposed to that, and TBH we might not need the address in
the graph return stack to be signed, since the graph return stack itself is a
shadow stack.

I think we can restructure things such that the values on the graph return
stack would remain unsigned, but we'd still always protect spills to the
regular stack *AND* the assembly would be structured to ensure we remove the
return gadgets.
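
(A rough sketch of that shape - names and offsets are illustrative only, and
the signing modifier is assumed to be the SP value at the time of the spill:)

        pacia   x9, sp                          // sign the callsite's LR before it is
        sub     sp, sp, #(PT_REGS_SIZE + 16)    // spilled into the ftrace_regs area

        // ftrace_graph_func() would push a *stripped* copy onto the graph
        // return stack, so its entries remain plain pointers

        add     sp, sp, #PT_REGS_SIZE + 16      // back to the SP used as modifier
        autia   x9, sp                          // authenticate what came off the stack
        ret     x9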

As before, I'll have a go at that and try to get it out shortly.

[...]

> > With that in mind, I think we should also fix up
> > qcom_link_stack_sanitisation(), since that ends up creating a gadget of the form:
> >
> >         MOV     X30, Xn
> >         RET

> Yeah, I'm sure that's the last one :-)

:)

Mark.


end of thread

Thread overview: 14+ messages
2022-11-29 14:17 [PATCH 0/4] arm64: Add return address protection to asm code Ard Biesheuvel
2022-11-29 14:18 ` [PATCH 1/4] arm64: assembler: Force error on misuse of .Lframe_local_offset Ard Biesheuvel
2022-11-29 14:18 ` [PATCH 2/4] arm64: assembler: Add macros for return address protection Ard Biesheuvel
2022-11-30 14:15   ` Mark Rutland
2022-11-30 14:33     ` Ard Biesheuvel
2022-11-29 14:18 ` [PATCH 3/4] arm64: efi: Add return address protection to runtime wrapper Ard Biesheuvel
2022-11-29 14:18 ` [PATCH 4/4] arm64: ftrace: Add return address protection Ard Biesheuvel
2022-11-30 14:04   ` Mark Rutland
2022-11-30 14:26     ` Ard Biesheuvel
2022-11-30 17:45       ` Mark Rutland
2022-12-01 13:09         ` Ard Biesheuvel
2022-12-01 14:40           ` Mark Rutland
2022-12-01 15:05             ` Ard Biesheuvel
2022-12-01 15:48               ` Mark Rutland
