linux-toolchains.vger.kernel.org archive mirror
* [PATCH 0/2] arm64: livepatch: Enable livepatch without sframe
@ 2025-03-08  1:27 Song Liu
  2025-03-08  1:27 ` [PATCH 1/2] arm64: Implement arch_stack_walk_reliable Song Liu
  2025-03-08  1:27 ` [PATCH 2/2] arm64: Implement HAVE_LIVEPATCH Song Liu
  0 siblings, 2 replies; 16+ messages in thread
From: Song Liu @ 2025-03-08  1:27 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel, linux-toolchains, live-patching
  Cc: indu.bhagat, puranjay, wnliu, irogers, joe.lawrence, jpoimboe,
	mark.rutland, peterz, roman.gushchin, rostedt, will, kernel-team,
	song

There are recent efforts to enable livepatch for arm64, with sframe [1] or
without sframe [2]. This set tries to enable livepatch without sframe. Some
of the code, however, is from [1].

Although the sframe implementation is more promising in the longer term, it
suffers from the following issues:

  1. sframe is not yet supported in LLVM;
  2. There is still a bug in binutils [3], so we cannot yet use sframe
     with GCC;
  3. The sframe unwinder hasn't been fully verified in the kernel.

On the other hand, arm64 processors have become increasingly important in
the data center world. Therefore, it is becoming critical to support
livepatching of arm64 kernels.

With a recent change to the arm64 unwinder [4], it is possible to reliably
livepatch arm64 kernels without sframe. This is because we do not need
arch_stack_walk_reliable() to produce a reliable stack trace in all
scenarios. Instead, we only need arch_stack_walk_reliable() to detect when
the stack trace is not reliable; the livepatch logic can then retry the
patch transition at a later time.

Given the increasing need for livepatching, and the relatively long time
before sframe is fully ready (for both GCC and Clang), we would like to
enable livepatch without sframe.

Thanks!

[1] https://lore.kernel.org/live-patching/20250127213310.2496133-1-wnliu@google.com/
[2] https://lore.kernel.org/live-patching/20250129232936.1795412-1-song@kernel.org/
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=32589
[4] https://lore.kernel.org/linux-arm-kernel/20241017092538.1859841-1-mark.rutland@arm.com/

Song Liu (2):
  arm64: Implement arch_stack_walk_reliable
  arm64: Implement HAVE_LIVEPATCH

 arch/arm64/Kconfig                         |  3 ++
 arch/arm64/include/asm/stacktrace/common.h |  1 +
 arch/arm64/include/asm/thread_info.h       |  4 +-
 arch/arm64/kernel/entry-common.c           |  4 ++
 arch/arm64/kernel/stacktrace.c             | 44 +++++++++++++++++++++-
 5 files changed, 54 insertions(+), 2 deletions(-)

--
2.43.5

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/2] arm64: Implement arch_stack_walk_reliable
  2025-03-08  1:27 [PATCH 0/2] arm64: livepatch: Enable livepatch without sframe Song Liu
@ 2025-03-08  1:27 ` Song Liu
  2025-03-13 18:12   ` Breno Leitao
  2025-03-18 18:45   ` Josh Poimboeuf
  2025-03-08  1:27 ` [PATCH 2/2] arm64: Implement HAVE_LIVEPATCH Song Liu
  1 sibling, 2 replies; 16+ messages in thread
From: Song Liu @ 2025-03-08  1:27 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel, linux-toolchains, live-patching
  Cc: indu.bhagat, puranjay, wnliu, irogers, joe.lawrence, jpoimboe,
	mark.rutland, peterz, roman.gushchin, rostedt, will, kernel-team,
	song

With proper exception boundary detection, it is possible to implement
arch_stack_walk_reliable() without sframe.

Note that arch_stack_walk_reliable() does not guarantee a reliable stack
trace in all scenarios. Instead, it can reliably detect when the stack
trace is not reliable, which is enough to provide reliable livepatching.

This version has been inspired by Weinan Liu's patch [1].

[1] https://lore.kernel.org/live-patching/20250127213310.2496133-7-wnliu@google.com/
Signed-off-by: Song Liu <song@kernel.org>
---
 arch/arm64/Kconfig                         |  2 +-
 arch/arm64/include/asm/stacktrace/common.h |  1 +
 arch/arm64/kernel/stacktrace.c             | 44 +++++++++++++++++++++-
 3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 940343beb3d4..ed4f7bf4a879 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -275,6 +275,7 @@ config ARM64
 	select HAVE_SOFTIRQ_ON_OWN_STACK
 	select USER_STACKTRACE_SUPPORT
 	select VDSO_GETRANDOM
+	select HAVE_RELIABLE_STACKTRACE
 	help
 	  ARM 64-bit (AArch64) Linux support.
 
@@ -2499,4 +2500,3 @@ endmenu # "CPU Power Management"
 source "drivers/acpi/Kconfig"
 
 source "arch/arm64/kvm/Kconfig"
-
diff --git a/arch/arm64/include/asm/stacktrace/common.h b/arch/arm64/include/asm/stacktrace/common.h
index 821a8fdd31af..072469fd91b7 100644
--- a/arch/arm64/include/asm/stacktrace/common.h
+++ b/arch/arm64/include/asm/stacktrace/common.h
@@ -33,6 +33,7 @@ struct unwind_state {
 	struct stack_info stack;
 	struct stack_info *stacks;
 	int nr_stacks;
+	bool unreliable;
 };
 
 static inline struct stack_info stackinfo_get_unknown(void)
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 1d9d51d7627f..69d0567a0c38 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -230,8 +230,14 @@ kunwind_next_frame_record(struct kunwind_state *state)
 	new_fp = READ_ONCE(record->fp);
 	new_pc = READ_ONCE(record->lr);
 
-	if (!new_fp && !new_pc)
+	if (!new_fp && !new_pc) {
+		/*
+		 * Searching across exception boundaries. The stack is now
+		 * unreliable.
+		 */
+		state->common.unreliable = true;
 		return kunwind_next_frame_record_meta(state);
+	}
 
 	unwind_consume_stack(&state->common, info, fp, sizeof(*record));
 
@@ -347,6 +353,7 @@ kunwind_stack_walk(kunwind_consume_fn consume_state,
 		.common = {
 			.stacks = stacks,
 			.nr_stacks = ARRAY_SIZE(stacks),
+			.unreliable = false,
 		},
 	};
 
@@ -387,6 +394,41 @@ noinline noinstr void arch_stack_walk(stack_trace_consume_fn consume_entry,
 	kunwind_stack_walk(arch_kunwind_consume_entry, &data, task, regs);
 }
 
+struct kunwind_reliable_consume_entry_data {
+	stack_trace_consume_fn consume_entry;
+	void *cookie;
+	bool unreliable;
+};
+
+static __always_inline bool
+arch_kunwind_reliable_consume_entry(const struct kunwind_state *state, void *cookie)
+{
+	struct kunwind_reliable_consume_entry_data *data = cookie;
+
+	if (state->common.unreliable) {
+		data->unreliable = true;
+		return false;
+	}
+	return data->consume_entry(data->cookie, state->common.pc);
+}
+
+noinline noinstr int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
+			void *cookie, struct task_struct *task)
+{
+	struct kunwind_reliable_consume_entry_data data = {
+		.consume_entry = consume_entry,
+		.cookie = cookie,
+		.unreliable = false,
+	};
+
+	kunwind_stack_walk(arch_kunwind_reliable_consume_entry, &data, task, NULL);
+
+	if (data.unreliable)
+		return -EINVAL;
+
+	return 0;
+}
+
 struct bpf_unwind_consume_entry_data {
 	bool (*consume_entry)(void *cookie, u64 ip, u64 sp, u64 fp);
 	void *cookie;
-- 
2.43.5



* [PATCH 2/2] arm64: Implement HAVE_LIVEPATCH
  2025-03-08  1:27 [PATCH 0/2] arm64: livepatch: Enable livepatch without sframe Song Liu
  2025-03-08  1:27 ` [PATCH 1/2] arm64: Implement arch_stack_walk_reliable Song Liu
@ 2025-03-08  1:27 ` Song Liu
  2025-03-12 14:05   ` Miroslav Benes
  2025-03-13 18:09   ` Breno Leitao
  1 sibling, 2 replies; 16+ messages in thread
From: Song Liu @ 2025-03-08  1:27 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel, linux-toolchains, live-patching
  Cc: indu.bhagat, puranjay, wnliu, irogers, joe.lawrence, jpoimboe,
	mark.rutland, peterz, roman.gushchin, rostedt, will, kernel-team,
	song, Suraj Jitindar Singh, Torsten Duwe

This is largely based on [1] by Suraj Jitindar Singh.

Test coverage:

- Passed manual tests with samples/livepatch.
- Passed all but test-kprobe.sh in selftests/livepatch.
  test-kprobe.sh is expected to fail, because arm64 doesn't have
  KPROBES_ON_FTRACE.
- Passed tests with kpatch-build [2]. (This version includes commits that
  are not merged to upstream kpatch yet).

[1] https://lore.kernel.org/all/20210604235930.603-1-surajjs@amazon.com/
[2] https://github.com/liu-song-6/kpatch/tree/fb-6.13
Cc: Suraj Jitindar Singh <surajjs@amazon.com>
Cc: Torsten Duwe <duwe@suse.de>
Signed-off-by: Song Liu <song@kernel.org>
---
 arch/arm64/Kconfig                   | 3 +++
 arch/arm64/include/asm/thread_info.h | 4 +++-
 arch/arm64/kernel/entry-common.c     | 4 ++++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index ed4f7bf4a879..bc9edba6171a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -276,6 +276,7 @@ config ARM64
 	select USER_STACKTRACE_SUPPORT
 	select VDSO_GETRANDOM
 	select HAVE_RELIABLE_STACKTRACE
+	select HAVE_LIVEPATCH
 	help
 	  ARM 64-bit (AArch64) Linux support.
 
@@ -2500,3 +2501,5 @@ endmenu # "CPU Power Management"
 source "drivers/acpi/Kconfig"
 
 source "arch/arm64/kvm/Kconfig"
+
+source "kernel/livepatch/Kconfig"
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 1114c1c3300a..4ac42e13032b 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -64,6 +64,7 @@ void arch_setup_new_exec(void);
 #define TIF_UPROBE		4	/* uprobe breakpoint or singlestep */
 #define TIF_MTE_ASYNC_FAULT	5	/* MTE Asynchronous Tag Check Fault */
 #define TIF_NOTIFY_SIGNAL	6	/* signal notifications exist */
+#define TIF_PATCH_PENDING	7	/* pending live patching update */
 #define TIF_SYSCALL_TRACE	8	/* syscall trace active */
 #define TIF_SYSCALL_AUDIT	9	/* syscall auditing */
 #define TIF_SYSCALL_TRACEPOINT	10	/* syscall tracepoint for ftrace */
@@ -92,6 +93,7 @@ void arch_setup_new_exec(void);
 #define _TIF_SYSCALL_TRACEPOINT	(1 << TIF_SYSCALL_TRACEPOINT)
 #define _TIF_SECCOMP		(1 << TIF_SECCOMP)
 #define _TIF_SYSCALL_EMU	(1 << TIF_SYSCALL_EMU)
+#define _TIF_PATCH_PENDING	(1 << TIF_PATCH_PENDING)
 #define _TIF_UPROBE		(1 << TIF_UPROBE)
 #define _TIF_SINGLESTEP		(1 << TIF_SINGLESTEP)
 #define _TIF_32BIT		(1 << TIF_32BIT)
@@ -103,7 +105,7 @@ void arch_setup_new_exec(void);
 #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
 				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
 				 _TIF_UPROBE | _TIF_MTE_ASYNC_FAULT | \
-				 _TIF_NOTIFY_SIGNAL)
+				 _TIF_NOTIFY_SIGNAL | _TIF_PATCH_PENDING)
 
 #define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index b260ddc4d3e9..b537af333b42 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -8,6 +8,7 @@
 #include <linux/context_tracking.h>
 #include <linux/kasan.h>
 #include <linux/linkage.h>
+#include <linux/livepatch.h>
 #include <linux/lockdep.h>
 #include <linux/ptrace.h>
 #include <linux/resume_user_mode.h>
@@ -144,6 +145,9 @@ static void do_notify_resume(struct pt_regs *regs, unsigned long thread_flags)
 				       (void __user *)NULL, current);
 		}
 
+		if (thread_flags & _TIF_PATCH_PENDING)
+			klp_update_patch_state(current);
+
 		if (thread_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
 			do_signal(regs);
 
-- 
2.43.5



* Re: [PATCH 2/2] arm64: Implement HAVE_LIVEPATCH
  2025-03-08  1:27 ` [PATCH 2/2] arm64: Implement HAVE_LIVEPATCH Song Liu
@ 2025-03-12 14:05   ` Miroslav Benes
  2025-03-13 18:09   ` Breno Leitao
  1 sibling, 0 replies; 16+ messages in thread
From: Miroslav Benes @ 2025-03-12 14:05 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-arm-kernel, linux-kernel, linux-toolchains, live-patching,
	indu.bhagat, puranjay, wnliu, irogers, joe.lawrence, jpoimboe,
	mark.rutland, peterz, roman.gushchin, rostedt, will, kernel-team,
	Suraj Jitindar Singh, Torsten Duwe

Hi,

On Fri, 7 Mar 2025, Song Liu wrote:

> This is largely based on [1] by Suraj Jitindar Singh.
> 
> Test coverage:
> 
> - Passed manual tests with samples/livepatch.
> - Passed all but test-kprobe.sh in selftests/livepatch.
>   test-kprobe.sh is expected to fail, because arm64 doesn't have
>   KPROBES_ON_FTRACE.

Correct. The test should take this into account and bail out.

> - Passed tests with kpatch-build [2]. (This version includes commits that
>   are not merged to upstream kpatch yet).
> 
> [1] https://lore.kernel.org/all/20210604235930.603-1-surajjs@amazon.com/
> [2] https://github.com/liu-song-6/kpatch/tree/fb-6.13
> Cc: Suraj Jitindar Singh <surajjs@amazon.com>
> Cc: Torsten Duwe <duwe@suse.de>
> Signed-off-by: Song Liu <song@kernel.org>

Acked-by: Miroslav Benes <mbenes@suse.cz>

Also, as mentioned in the other thread, parts of this patch will go away
once arm64 is converted to the generic entry infrastructure.

Thank you,
Miroslav


* Re: [PATCH 2/2] arm64: Implement HAVE_LIVEPATCH
  2025-03-08  1:27 ` [PATCH 2/2] arm64: Implement HAVE_LIVEPATCH Song Liu
  2025-03-12 14:05   ` Miroslav Benes
@ 2025-03-13 18:09   ` Breno Leitao
  1 sibling, 0 replies; 16+ messages in thread
From: Breno Leitao @ 2025-03-13 18:09 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-arm-kernel, linux-kernel, linux-toolchains, live-patching,
	indu.bhagat, puranjay, wnliu, irogers, joe.lawrence, jpoimboe,
	mark.rutland, peterz, roman.gushchin, rostedt, will, kernel-team,
	Suraj Jitindar Singh, Torsten Duwe

On Fri, Mar 07, 2025 at 05:27:42PM -0800, Song Liu wrote:
> This is largely based on [1] by Suraj Jitindar Singh.
> 
> Test coverage:
> 
> - Passed manual tests with samples/livepatch.
> - Passed all but test-kprobe.sh in selftests/livepatch.
>   test-kprobe.sh is expected to fail, because arm64 doesn't have
>   KPROBES_ON_FTRACE.
> - Passed tests with kpatch-build [2]. (This version includes commits that
>   are not merged to upstream kpatch yet).
> 
> [1] https://lore.kernel.org/all/20210604235930.603-1-surajjs@amazon.com/
> [2] https://github.com/liu-song-6/kpatch/tree/fb-6.13
> Cc: Suraj Jitindar Singh <surajjs@amazon.com>
> Cc: Torsten Duwe <duwe@suse.de>
> Signed-off-by: Song Liu <song@kernel.org>

Tested-by: Breno Leitao <leitao@debian.org>

PS: I've tested this patchset with the examples from samples/ on an arm64
host, and they worked as expected.

Thanks
--breno


* Re: [PATCH 1/2] arm64: Implement arch_stack_walk_reliable
  2025-03-08  1:27 ` [PATCH 1/2] arm64: Implement arch_stack_walk_reliable Song Liu
@ 2025-03-13 18:12   ` Breno Leitao
  2025-03-13 19:10     ` Song Liu
  2025-03-18 18:45   ` Josh Poimboeuf
  1 sibling, 1 reply; 16+ messages in thread
From: Breno Leitao @ 2025-03-13 18:12 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-arm-kernel, linux-kernel, linux-toolchains, live-patching,
	indu.bhagat, puranjay, wnliu, irogers, joe.lawrence, jpoimboe,
	mark.rutland, peterz, roman.gushchin, rostedt, will, kernel-team

On Fri, Mar 07, 2025 at 05:27:41PM -0800, Song Liu wrote:
> With proper exception boundary detection, it is possible to implement
> arch_stack_walk_reliable() without sframe.
> 
> Note that arch_stack_walk_reliable() does not guarantee a reliable stack
> trace in all scenarios. Instead, it can reliably detect when the stack
> trace is not reliable, which is enough to provide reliable livepatching.
> 
> This version has been inspired by Weinan Liu's patch [1].
> 
> [1] https://lore.kernel.org/live-patching/20250127213310.2496133-7-wnliu@google.com/
> Signed-off-by: Song Liu <song@kernel.org>

Tested-by: Breno Leitao <leitao@debian.org>

>  arch/arm64/Kconfig                         |  2 +-
>  arch/arm64/include/asm/stacktrace/common.h |  1 +
>  arch/arm64/kernel/stacktrace.c             | 44 +++++++++++++++++++++-
>  3 files changed, 45 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 940343beb3d4..ed4f7bf4a879 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -275,6 +275,7 @@ config ARM64
>  	select HAVE_SOFTIRQ_ON_OWN_STACK
>  	select USER_STACKTRACE_SUPPORT
>  	select VDSO_GETRANDOM
> +	select HAVE_RELIABLE_STACKTRACE

Can we really mark this as reliable stacktrace?  I am wondering
if we need an intermediate state (potentially reliable stacktrace?)
until we have a fully reliable stack unwinder.

Thanks for working on it.
--breno


* Re: [PATCH 1/2] arm64: Implement arch_stack_walk_reliable
  2025-03-13 18:12   ` Breno Leitao
@ 2025-03-13 19:10     ` Song Liu
  0 siblings, 0 replies; 16+ messages in thread
From: Song Liu @ 2025-03-13 19:10 UTC (permalink / raw)
  To: Breno Leitao
  Cc: linux-arm-kernel, linux-kernel, linux-toolchains, live-patching,
	indu.bhagat, puranjay, wnliu, irogers, joe.lawrence, jpoimboe,
	mark.rutland, peterz, roman.gushchin, rostedt, will, kernel-team

On Thu, Mar 13, 2025 at 11:12 AM Breno Leitao <leitao@debian.org> wrote:
>
> On Fri, Mar 07, 2025 at 05:27:41PM -0800, Song Liu wrote:
> > With proper exception boundary detection, it is possible to implement
> > arch_stack_walk_reliable() without sframe.
> >
> > Note that arch_stack_walk_reliable() does not guarantee a reliable stack
> > trace in all scenarios. Instead, it can reliably detect when the stack
> > trace is not reliable, which is enough to provide reliable livepatching.
> >
> > This version has been inspired by Weinan Liu's patch [1].
> >
> > [1] https://lore.kernel.org/live-patching/20250127213310.2496133-7-wnliu@google.com/
> > Signed-off-by: Song Liu <song@kernel.org>
>
> Tested-by: Breno Leitao <leitao@debian.org>

Thanks for the testing!

>
> >  arch/arm64/Kconfig                         |  2 +-
> >  arch/arm64/include/asm/stacktrace/common.h |  1 +
> >  arch/arm64/kernel/stacktrace.c             | 44 +++++++++++++++++++++-
> >  3 files changed, 45 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index 940343beb3d4..ed4f7bf4a879 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -275,6 +275,7 @@ config ARM64
> >       select HAVE_SOFTIRQ_ON_OWN_STACK
> >       select USER_STACKTRACE_SUPPORT
> >       select VDSO_GETRANDOM
> > +     select HAVE_RELIABLE_STACKTRACE
>
> Can we really mark this is reliable stacktrace?  I am wondering
> if we need an intermediate state (potentially reliable stacktrace?)
> until we have a fully reliable stack unwinder.

AFAICT, we do not expect arch_stack_walk_reliable() to always
return a reliable stack trace. Instead, it is expected to return -EINVAL
if the stack trace is not reliable. OTOH, arch_stack_walk() doesn't
warn the caller when the stack trace is not reliable. This is exactly
what we need for livepatch: we just need to make the patch
transition when the stack trace is reliable and none of the functions
on the stack is being patched. If the stack trace is not reliable, we
will retry the transition at a later time.

Thanks,
Song


* Re: [PATCH 1/2] arm64: Implement arch_stack_walk_reliable
  2025-03-08  1:27 ` [PATCH 1/2] arm64: Implement arch_stack_walk_reliable Song Liu
  2025-03-13 18:12   ` Breno Leitao
@ 2025-03-18 18:45   ` Josh Poimboeuf
  2025-03-18 20:14     ` Song Liu
  1 sibling, 1 reply; 16+ messages in thread
From: Josh Poimboeuf @ 2025-03-18 18:45 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-arm-kernel, linux-kernel, linux-toolchains, live-patching,
	indu.bhagat, puranjay, wnliu, irogers, joe.lawrence, mark.rutland,
	peterz, roman.gushchin, rostedt, will, kernel-team

On Fri, Mar 07, 2025 at 05:27:41PM -0800, Song Liu wrote:
> With proper exception boundary detection, it is possible to implement
> arch_stack_walk_reliable() without sframe.
> 
> Note that arch_stack_walk_reliable() does not guarantee a reliable stack
> trace in all scenarios. Instead, it can reliably detect when the stack
> trace is not reliable, which is enough to provide reliable livepatching.
> 
> This version has been inspired by Weinan Liu's patch [1].
> 
> [1] https://lore.kernel.org/live-patching/20250127213310.2496133-7-wnliu@google.com/
> Signed-off-by: Song Liu <song@kernel.org>

This looks incomplete.  The reliable unwinder needs to be extra
paranoid.  There are several already-checked-for errors in the unwinder
that don't actually set the unreliable bit.

There are likely other failure modes it should also be checking for.
For example I don't see where it confirms that the unwind completed to
the end of the stack (which is typically at a certain offset).

See for example all the error conditions in the x86 version of
arch_stack_walk_reliable() and in arch/x86/kernel/unwind_frame.c.

-- 
Josh


* Re: [PATCH 1/2] arm64: Implement arch_stack_walk_reliable
  2025-03-18 18:45   ` Josh Poimboeuf
@ 2025-03-18 20:14     ` Song Liu
  2025-03-18 23:00       ` Josh Poimboeuf
  0 siblings, 1 reply; 16+ messages in thread
From: Song Liu @ 2025-03-18 20:14 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-arm-kernel, linux-kernel, linux-toolchains, live-patching,
	indu.bhagat, puranjay, wnliu, irogers, joe.lawrence, mark.rutland,
	peterz, roman.gushchin, rostedt, will, kernel-team

Hi Josh,

Thanks for the review!

On Tue, Mar 18, 2025 at 11:45 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Fri, Mar 07, 2025 at 05:27:41PM -0800, Song Liu wrote:
> > With proper exception boundary detection, it is possible to implement
> > arch_stack_walk_reliable() without sframe.
> >
> > Note that arch_stack_walk_reliable() does not guarantee a reliable stack
> > trace in all scenarios. Instead, it can reliably detect when the stack
> > trace is not reliable, which is enough to provide reliable livepatching.
> >
> > This version has been inspired by Weinan Liu's patch [1].
> >
> > [1] https://lore.kernel.org/live-patching/20250127213310.2496133-7-wnliu@google.com/
> > Signed-off-by: Song Liu <song@kernel.org>
>
> This looks incomplete.  The reliable unwinder needs to be extra
> paranoid.  There are several already-checked-for errors in the unwinder
> that don't actually set the unreliable bit.
>
> There are likely other failure modes it should also be checking for.
> For example I don't see where it confirms that the unwind completed to
> the end of the stack (which is typically at a certain offset).

If I understand the comment correctly, this should be handled by the
metadata type FRAME_META_TYPE_FINAL.

>
> See for example all the error conditions in the x86 version of
> arch_stack_walk_reliable() and in arch/x86/kernel/unwind_frame.c.

I guess I missed this part:

diff --git i/arch/arm64/kernel/stacktrace.c w/arch/arm64/kernel/stacktrace.c
index 69d0567a0c38..3bb8e3ea4c4b 100644
--- i/arch/arm64/kernel/stacktrace.c
+++ w/arch/arm64/kernel/stacktrace.c
@@ -268,6 +268,8 @@ kunwind_next(struct kunwind_state *state)
        case KUNWIND_SOURCE_TASK:
        case KUNWIND_SOURCE_REGS_PC:
                err = kunwind_next_frame_record(state);
+               if (err && err != -ENOENT)
+                       state->common.unreliable = true;
                break;
        default:
                err = -EINVAL;


With this part, we should cover all these cases? Did I miss something
else?

Thanks,
Song


* Re: [PATCH 1/2] arm64: Implement arch_stack_walk_reliable
  2025-03-18 20:14     ` Song Liu
@ 2025-03-18 23:00       ` Josh Poimboeuf
  2025-03-18 23:38         ` Song Liu
  0 siblings, 1 reply; 16+ messages in thread
From: Josh Poimboeuf @ 2025-03-18 23:00 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-arm-kernel, linux-kernel, linux-toolchains, live-patching,
	indu.bhagat, puranjay, wnliu, irogers, joe.lawrence, mark.rutland,
	peterz, roman.gushchin, rostedt, will, kernel-team

On Tue, Mar 18, 2025 at 01:14:40PM -0700, Song Liu wrote:
> >
> > See for example all the error conditions in the x86 version of
> > arch_stack_walk_reliable() and in arch/x86/kernel/unwind_frame.c.
> 
> I guess I missed this part:
> 
> diff --git i/arch/arm64/kernel/stacktrace.c w/arch/arm64/kernel/stacktrace.c
> index 69d0567a0c38..3bb8e3ea4c4b 100644
> --- i/arch/arm64/kernel/stacktrace.c
> +++ w/arch/arm64/kernel/stacktrace.c
> @@ -268,6 +268,8 @@ kunwind_next(struct kunwind_state *state)
>         case KUNWIND_SOURCE_TASK:
>         case KUNWIND_SOURCE_REGS_PC:
>                 err = kunwind_next_frame_record(state);
> +               if (err && err != -ENOENT)
> +                       state->common.unreliable = true;
>                 break;
>         default:
>                 err = -EINVAL;

I still see some issues:

  - do_kunwind() -> kunwind_recover_return_address() can fail

  - do_kunwind() -> consume_state() can fail

  - even in the -ENOENT case the unreliable bit has already been set
    right before the call to kunwind_next_frame_record_meta().

-- 
Josh


* Re: [PATCH 1/2] arm64: Implement arch_stack_walk_reliable
  2025-03-18 23:00       ` Josh Poimboeuf
@ 2025-03-18 23:38         ` Song Liu
  2025-03-19  1:03           ` Josh Poimboeuf
  0 siblings, 1 reply; 16+ messages in thread
From: Song Liu @ 2025-03-18 23:38 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-arm-kernel, linux-kernel, linux-toolchains, live-patching,
	indu.bhagat, puranjay, wnliu, irogers, joe.lawrence, mark.rutland,
	peterz, roman.gushchin, rostedt, will, kernel-team

On Tue, Mar 18, 2025 at 4:00 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Tue, Mar 18, 2025 at 01:14:40PM -0700, Song Liu wrote:
> > >
> > > See for example all the error conditions in the x86 version of
> > > arch_stack_walk_reliable() and in arch/x86/kernel/unwind_frame.c.
> >
> > I guess I missed this part:
> >
> > diff --git i/arch/arm64/kernel/stacktrace.c w/arch/arm64/kernel/stacktrace.c
> > index 69d0567a0c38..3bb8e3ea4c4b 100644
> > --- i/arch/arm64/kernel/stacktrace.c
> > +++ w/arch/arm64/kernel/stacktrace.c
> > @@ -268,6 +268,8 @@ kunwind_next(struct kunwind_state *state)
> >         case KUNWIND_SOURCE_TASK:
> >         case KUNWIND_SOURCE_REGS_PC:
> >                 err = kunwind_next_frame_record(state);
> > +               if (err && err != -ENOENT)
> > +                       state->common.unreliable = true;
> >                 break;
> >         default:
> >                 err = -EINVAL;
>
> I still see some issues:
>
>   - do_kunwind() -> kunwind_recover_return_address() can fail
>
>   - do_kunwind() -> consume_state() can fail

Great points! I have got the following:

diff --git i/arch/arm64/kernel/stacktrace.c w/arch/arm64/kernel/stacktrace.c
index 69d0567a0c38..852e4b322209 100644
--- i/arch/arm64/kernel/stacktrace.c
+++ w/arch/arm64/kernel/stacktrace.c
@@ -139,6 +139,7 @@ kunwind_recover_return_address(struct kunwind_state *state)
                                                (void *)state->common.fp);
                if (state->common.pc == orig_pc) {
                        WARN_ON_ONCE(state->task == current);
+                       state->common.unreliable = true;
                        return -EINVAL;
                }
                state->common.pc = orig_pc;
@@ -268,6 +269,8 @@ kunwind_next(struct kunwind_state *state)
        case KUNWIND_SOURCE_TASK:
        case KUNWIND_SOURCE_REGS_PC:
                err = kunwind_next_frame_record(state);
+               if (err && err != -ENOENT)
+                       state->common.unreliable = true;
                break;
        default:
                err = -EINVAL;
@@ -404,12 +407,16 @@ static __always_inline bool
 arch_kunwind_reliable_consume_entry(const struct kunwind_state
*state, void *cookie)
 {
        struct kunwind_reliable_consume_entry_data *data = cookie;
+       bool ret;

        if (state->common.unreliable) {
                data->unreliable = true;
                return false;
        }
-       return data->consume_entry(data->cookie, state->common.pc);
+       ret = data->consume_entry(data->cookie, state->common.pc);
+       if (!ret)
+               data->unreliable = true;
+       return ret;
 }

 noinline noinstr int arch_stack_walk_reliable(stack_trace_consume_fn
consume_entry,


>   - even in the -ENOENT case the unreliable bit has already been set
>     right before the call to kunwind_next_frame_record_meta().

For this one, do you mean we set state->common.unreliable, but
failed to propagate it to data.unreliable?

Thanks,
Song


* Re: [PATCH 1/2] arm64: Implement arch_stack_walk_reliable
  2025-03-18 23:38         ` Song Liu
@ 2025-03-19  1:03           ` Josh Poimboeuf
  2025-03-19  3:58             ` Song Liu
  0 siblings, 1 reply; 16+ messages in thread
From: Josh Poimboeuf @ 2025-03-19  1:03 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-arm-kernel, linux-kernel, linux-toolchains, live-patching,
	indu.bhagat, puranjay, wnliu, irogers, joe.lawrence, mark.rutland,
	peterz, roman.gushchin, rostedt, will, kernel-team

On Tue, Mar 18, 2025 at 04:38:20PM -0700, Song Liu wrote:
> On Tue, Mar 18, 2025 at 4:00 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >   - even in the -ENOENT case the unreliable bit has already been set
> >     right before the call to kunwind_next_frame_record_meta().
> 
> For this one, do you mean we set state->common.unreliable, but
> failed to propagate it to data.unreliable?

Hm, I hadn't noticed that.  That code is quite the maze.

It's unfortunate there are two separate 'unreliable' variables.  It
looks like consume_state() is the only way they get synced?

How does that work if kunwind_next() returns an error and skips
consume_state()?  Or if kunwind_recover_return_address() returns an
error to kunwind_next()?

What I actually meant was the following:

  do_kunwind()
    kunwind_next()
      kunwind_next_frame_record()
        state->common.unreliable = true;
	kunwind_next_frame_record_meta()
	  return -ENOENT;

Notice that in the success case (-ENOENT), unreliable has already been
set.

Actually I think it would be much simpler to just propagate -ENOENT down
the call chain.  Then no 'unreliable' bits needed.

Like so (instead of original patch):

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c9fe3e7566a6..5713fad567c5 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -276,6 +276,7 @@ config ARM64
 	select HAVE_SOFTIRQ_ON_OWN_STACK
 	select USER_STACKTRACE_SUPPORT
 	select VDSO_GETRANDOM
+	select HAVE_RELIABLE_STACKTRACE
 	help
 	  ARM 64-bit (AArch64) Linux support.
 
@@ -2509,4 +2510,3 @@ endmenu # "CPU Power Management"
 source "drivers/acpi/Kconfig"
 
 source "arch/arm64/kvm/Kconfig"
-
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 1d9d51d7627f..e227da842bc3 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -277,22 +277,28 @@ kunwind_next(struct kunwind_state *state)
 
 typedef bool (*kunwind_consume_fn)(const struct kunwind_state *state, void *cookie);
 
-static __always_inline void
+static __always_inline int
 do_kunwind(struct kunwind_state *state, kunwind_consume_fn consume_state,
 	   void *cookie)
 {
-	if (kunwind_recover_return_address(state))
-		return;
+	int ret;
+
+	ret = kunwind_recover_return_address(state);
+	if (ret)
+		return ret;
 
 	while (1) {
 		int ret;
 
 		if (!consume_state(state, cookie))
-			break;
+			return -EINVAL;
+
 		ret = kunwind_next(state);
-		if (ret < 0)
-			break;
+		if (ret)
+			return ret;
 	}
+
+	return -EINVAL;
 }
 
 /*
@@ -324,7 +330,7 @@ do_kunwind(struct kunwind_state *state, kunwind_consume_fn consume_state,
 			: stackinfo_get_unknown();		\
 	})
 
-static __always_inline void
+static __always_inline int
 kunwind_stack_walk(kunwind_consume_fn consume_state,
 		   void *cookie, struct task_struct *task,
 		   struct pt_regs *regs)
@@ -352,7 +358,7 @@ kunwind_stack_walk(kunwind_consume_fn consume_state,
 
 	if (regs) {
 		if (task != current)
-			return;
+			return -EINVAL;
 		kunwind_init_from_regs(&state, regs);
 	} else if (task == current) {
 		kunwind_init_from_caller(&state);
@@ -360,7 +366,7 @@ kunwind_stack_walk(kunwind_consume_fn consume_state,
 		kunwind_init_from_task(&state, task);
 	}
 
-	do_kunwind(&state, consume_state, cookie);
+	return do_kunwind(&state, consume_state, cookie);
 }
 
 struct kunwind_consume_entry_data {
@@ -387,6 +393,25 @@ noinline noinstr void arch_stack_walk(stack_trace_consume_fn consume_entry,
 	kunwind_stack_walk(arch_kunwind_consume_entry, &data, task, regs);
 }
 
+noinline noinstr int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
+			void *cookie, struct task_struct *task)
+{
+	int ret;
+	struct kunwind_consume_entry_data data = {
+		.consume_entry = consume_entry,
+		.cookie = cookie,
+	};
+
+	ret = kunwind_stack_walk(arch_kunwind_consume_entry, &data, task, NULL);
+	if (ret) {
+		if (ret == -ENOENT)
+			return 0;
+		return ret;
+	}
+
+	return -EINVAL;
+}
+
 struct bpf_unwind_consume_entry_data {
 	bool (*consume_entry)(void *cookie, u64 ip, u64 sp, u64 fp);
 	void *cookie;

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] arm64: Implement arch_stack_walk_reliable
  2025-03-19  1:03           ` Josh Poimboeuf
@ 2025-03-19  3:58             ` Song Liu
  2025-03-19  5:39               ` Josh Poimboeuf
  0 siblings, 1 reply; 16+ messages in thread
From: Song Liu @ 2025-03-19  3:58 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-arm-kernel, linux-kernel, linux-toolchains, live-patching,
	indu.bhagat, puranjay, wnliu, irogers, joe.lawrence, mark.rutland,
	peterz, roman.gushchin, rostedt, will, kernel-team

On Tue, Mar 18, 2025 at 6:03 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Tue, Mar 18, 2025 at 04:38:20PM -0700, Song Liu wrote:
> > On Tue, Mar 18, 2025 at 4:00 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > >   - even in the -ENOENT case the unreliable bit has already been set
> > >     right before the call to kunwind_next_frame_record_meta().
> >
> > For this one, do you mean we set state->common.unreliable, but
> > failed to propagate it to data.unreliable?
>
> Hm, I hadn't noticed that.  That code is quite the maze.
>
> It's unfortunate there are two separate 'unreliable' variables.  It
> looks like consume_state() is the only way they get synced?
>
> How does that work if kunwind_next() returns an error and skips
> consume_state()?  Or if kunwind_recover_return_address() returns an
> error to kunwind_next()?
>
> What I actually meant was the following:
>
>   do_kunwind()
>     kunwind_next()
>       kunwind_next_frame_record()
>         state->common.unreliable = true;
>         kunwind_next_frame_record_meta()
>           return -ENOENT;
>
> Notice that in the success case (-ENOENT), unreliable has already been
> set.
>
> Actually I think it would be much simpler to just propagate -ENOENT down
> the call chain.  Then no 'unreliable' bits needed.

Yeah, I was thinking about something like this. This is actually quite
similar to my original RFC version.

On a closer look, I think we also need some logic in unwind_find_stack()
so that we can see when the unwinder hits the exception boundary. For
this reason, we may still need unwind_state.unreliable. I will look into
fixing this and send v2.

Thanks,
Song

>
> Like so (instead of original patch):
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index c9fe3e7566a6..5713fad567c5 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -276,6 +276,7 @@ config ARM64
>         select HAVE_SOFTIRQ_ON_OWN_STACK
>         select USER_STACKTRACE_SUPPORT
>         select VDSO_GETRANDOM
> +       select HAVE_RELIABLE_STACKTRACE
>         help
>           ARM 64-bit (AArch64) Linux support.
>
> @@ -2509,4 +2510,3 @@ endmenu # "CPU Power Management"
>  source "drivers/acpi/Kconfig"
>
>  source "arch/arm64/kvm/Kconfig"
> -
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 1d9d51d7627f..e227da842bc3 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -277,22 +277,28 @@ kunwind_next(struct kunwind_state *state)
>
>  typedef bool (*kunwind_consume_fn)(const struct kunwind_state *state, void *cookie);
>
> -static __always_inline void
> +static __always_inline int
>  do_kunwind(struct kunwind_state *state, kunwind_consume_fn consume_state,
>            void *cookie)
>  {
> -       if (kunwind_recover_return_address(state))
> -               return;
> +       int ret;
> +
> +       ret = kunwind_recover_return_address(state);
> +       if (ret)
> +               return ret;
>
>         while (1) {
>                 int ret;
>
>                 if (!consume_state(state, cookie))
> -                       break;
> +                       return -EINVAL;
> +
>                 ret = kunwind_next(state);
> -               if (ret < 0)
> -                       break;
> +               if (ret)
> +                       return ret;
>         }
> +
> +       return -EINVAL;
>  }
>
>  /*
> @@ -324,7 +330,7 @@ do_kunwind(struct kunwind_state *state, kunwind_consume_fn consume_state,
>                         : stackinfo_get_unknown();              \
>         })
>
> -static __always_inline void
> +static __always_inline int
>  kunwind_stack_walk(kunwind_consume_fn consume_state,
>                    void *cookie, struct task_struct *task,
>                    struct pt_regs *regs)
> @@ -352,7 +358,7 @@ kunwind_stack_walk(kunwind_consume_fn consume_state,
>
>         if (regs) {
>                 if (task != current)
> -                       return;
> +                       return -EINVAL;
>                 kunwind_init_from_regs(&state, regs);
>         } else if (task == current) {
>                 kunwind_init_from_caller(&state);
> @@ -360,7 +366,7 @@ kunwind_stack_walk(kunwind_consume_fn consume_state,
>                 kunwind_init_from_task(&state, task);
>         }
>
> -       do_kunwind(&state, consume_state, cookie);
> +       return do_kunwind(&state, consume_state, cookie);
>  }
>
>  struct kunwind_consume_entry_data {
> @@ -387,6 +393,25 @@ noinline noinstr void arch_stack_walk(stack_trace_consume_fn consume_entry,
>         kunwind_stack_walk(arch_kunwind_consume_entry, &data, task, regs);
>  }
>
> +noinline noinstr int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
> +                       void *cookie, struct task_struct *task)
> +{
> +       int ret;
> +       struct kunwind_consume_entry_data data = {
> +               .consume_entry = consume_entry,
> +               .cookie = cookie,
> +       };
> +
> +       ret = kunwind_stack_walk(arch_kunwind_consume_entry, &data, task, NULL);
> +       if (ret) {
> +               if (ret == -ENOENT)
> +                       return 0;
> +               return ret;
> +       }
> +
> +       return -EINVAL;
> +}
> +
>  struct bpf_unwind_consume_entry_data {
>         bool (*consume_entry)(void *cookie, u64 ip, u64 sp, u64 fp);
>         void *cookie;


* Re: [PATCH 1/2] arm64: Implement arch_stack_walk_reliable
  2025-03-19  3:58             ` Song Liu
@ 2025-03-19  5:39               ` Josh Poimboeuf
  2025-03-19 18:37                 ` Weinan Liu
  0 siblings, 1 reply; 16+ messages in thread
From: Josh Poimboeuf @ 2025-03-19  5:39 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-arm-kernel, linux-kernel, linux-toolchains, live-patching,
	indu.bhagat, puranjay, wnliu, irogers, joe.lawrence, mark.rutland,
	peterz, roman.gushchin, rostedt, will, kernel-team

On Tue, Mar 18, 2025 at 08:58:52PM -0700, Song Liu wrote:
> On a closer look, I think we also need some logic in unwind_find_stack()
> so that we can see when the unwinder hits the exception boundary. For
> this reason, we may still need unwind_state.unreliable. I will look into
> fixing this and send v2.

Isn't that what FRAME_META_TYPE_PT_REGS is for?

Maybe it can just tell kunwind_stack_walk() to set a bit in
kunwind_state which tells kunwind_next_frame_record_meta() to quit the
unwind early for the FRAME_META_TYPE_PT_REGS case.  That also has the
benefit of stopping the unwind as soon as the exception is encountered.

-- 
Josh


* Re: [PATCH 1/2] arm64: Implement arch_stack_walk_reliable
  2025-03-19  5:39               ` Josh Poimboeuf
@ 2025-03-19 18:37                 ` Weinan Liu
  2025-03-19 19:09                   ` Song Liu
  0 siblings, 1 reply; 16+ messages in thread
From: Weinan Liu @ 2025-03-19 18:37 UTC (permalink / raw)
  To: jpoimboe
  Cc: indu.bhagat, irogers, joe.lawrence, kernel-team, linux-arm-kernel,
	linux-kernel, linux-toolchains, live-patching, mark.rutland,
	peterz, puranjay, roman.gushchin, rostedt, song, will, wnliu

On Tue, Mar 18, 2025 at 10:39 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Tue, Mar 18, 2025 at 08:58:52PM -0700, Song Liu wrote:
> > On a closer look, I think we also need some logic in unwind_find_stack()
> > so that we can see when the unwinder hits the exception boundary. For
> > this reason, we may still need unwind_state.unreliable. I will look into
> > fixing this and send v2.
>
> Isn't that what FRAME_META_TYPE_PT_REGS is for?
>
> Maybe it can just tell kunwind_stack_walk() to set a bit in
> kunwind_state which tells kunwind_next_frame_record_meta() to quit the
> unwind early for the FRAME_META_TYPE_PT_REGS case.  That also has the
> benefit of stopping the unwind as soon as the exception is encountered.
>

After reviewing the code flow, it seems like we should treat all -EINVAL
cases or `FRAME_META_TYPE_PT_REGS` cases as unreliable unwinds.

Would a simplification like the one below work?
Or we can return a special value for success cases in kunwind_next_regs_pc() 

```
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 69d0567a0c38..0eb69fa6161a 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -296,7 +296,9 @@ do_kunwind(struct kunwind_state *state, kunwind_consume_fn consume_state,
 		if (!consume_state(state, cookie))
 			break;
 		ret = kunwind_next(state);
-		if (ret < 0)
+		if (ret < 0 || state->source == KUNWIND_SOURCE_REGS_PC) {
+			state->common.unreliable = true;
 			break;
+		}
 	}
 }
```

--
Weinan


* Re: [PATCH 1/2] arm64: Implement arch_stack_walk_reliable
  2025-03-19 18:37                 ` Weinan Liu
@ 2025-03-19 19:09                   ` Song Liu
  0 siblings, 0 replies; 16+ messages in thread
From: Song Liu @ 2025-03-19 19:09 UTC (permalink / raw)
  To: Weinan Liu
  Cc: jpoimboe, indu.bhagat, irogers, joe.lawrence, kernel-team,
	linux-arm-kernel, linux-kernel, linux-toolchains, live-patching,
	mark.rutland, peterz, puranjay, roman.gushchin, rostedt, will

On Wed, Mar 19, 2025 at 11:38 AM Weinan Liu <wnliu@google.com> wrote:
>
> On Tue, Mar 18, 2025 at 10:39 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > On Tue, Mar 18, 2025 at 08:58:52PM -0700, Song Liu wrote:
> > > On a closer look, I think we also need some logic in unwind_find_stack()
> > > so that we can see when the unwinder hits the exception boundary. For
> > > this reason, we may still need unwind_state.unreliable. I will look into
> > > fixing this and send v2.
> >
> > Isn't that what FRAME_META_TYPE_PT_REGS is for?
> >
> > Maybe it can just tell kunwind_stack_walk() to set a bit in
> > kunwind_state which tells kunwind_next_frame_record_meta() to quit the
> > unwind early for the FRAME_META_TYPE_PT_REGS case.  That also has the
> > benefit of stopping the unwind as soon as the exception is encountered.
> >
>
> After reviewing the code flow, it seems like we should treat all -EINVAL
> cases or `FRAME_META_TYPE_PT_REGS` cases as unreliable unwinds.

Agreed with this: all -EINVAL cases or `FRAME_META_TYPE_PT_REGS`
cases should be unreliable, IIUC.

>
> Would a simplification like the one below work?
> Or we can return a special value for success cases in kunwind_next_regs_pc()
>
> ```
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 69d0567a0c38..0eb69fa6161a 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -296,7 +296,8 @@ do_kunwind(struct kunwind_state *state, kunwind_consume_fn consume_state,
>                 if (!consume_state(state, cookie))
>                         break;
>                 ret = kunwind_next(state);
> -               if (ret < 0)
> +               if (ret < 0 || state->source == KUNWIND_SOURCE_REGS_PC)
> +                       state->common.unreliable = true;

I am currently leaning toward not using common.unreliable. It seems to add
unnecessary complexity here. But I may change my mind later on.

Thanks,
Song

>                         break;
>         }
>  }
> ```
>
> --
> Weinan


end of thread, other threads:[~2025-03-19 19:09 UTC | newest]

Thread overview: 16+ messages
2025-03-08  1:27 [PATCH 0/2] arm64: livepatch: Enable livepatch without sframe Song Liu
2025-03-08  1:27 ` [PATCH 1/2] arm64: Implement arch_stack_walk_reliable Song Liu
2025-03-13 18:12   ` Breno Leitao
2025-03-13 19:10     ` Song Liu
2025-03-18 18:45   ` Josh Poimboeuf
2025-03-18 20:14     ` Song Liu
2025-03-18 23:00       ` Josh Poimboeuf
2025-03-18 23:38         ` Song Liu
2025-03-19  1:03           ` Josh Poimboeuf
2025-03-19  3:58             ` Song Liu
2025-03-19  5:39               ` Josh Poimboeuf
2025-03-19 18:37                 ` Weinan Liu
2025-03-19 19:09                   ` Song Liu
2025-03-08  1:27 ` [PATCH 2/2] arm64: Implement HAVE_LIVEPATCH Song Liu
2025-03-12 14:05   ` Miroslav Benes
2025-03-13 18:09   ` Breno Leitao
