From: Masami Hiramatsu <mhiramat@kernel.org>
To: Daniel Xu <dxu@dxuuu.xyz>
Cc: rostedt@goodmis.org, jpoimboe@redhat.com, kuba@kernel.org,
ast@kernel.org, tglx@linutronix.de, mingo@redhat.com,
x86@kernel.org, linux-kernel@vger.kernel.org,
bpf@vger.kernel.org, kernel-team@fb.com, yhs@fb.com
Subject: Re: [PATCH] x86: kprobes: orc: Fix ORC walks in kretprobes
Date: Fri, 5 Mar 2021 18:28:06 +0900
Message-ID: <20210305182806.df403dec398875c2c1b2c62d@kernel.org>
In-Reply-To: <d72c62498ea0514e7b81a3eab5e8c1671137b9a0.1614902828.git.dxu@dxuuu.xyz>
Hi Daniel,
On Thu, 4 Mar 2021 16:07:52 -0800
Daniel Xu <dxu@dxuuu.xyz> wrote:
> Getting a stack trace from inside a kretprobe used to work with frame
> pointer stack walks. After the default unwinder was switched to ORC,
> stack traces broke because ORC did not know how to skip the
> `kretprobe_trampoline` "frame".
>
> Frame based stack walks used to work with kretprobes because
> `kretprobe_trampoline` does not set up a new call frame. Thus, the frame
> pointer based unwinder could walk directly to the kretprobe'd caller.
>
> For example, this stack is walked incorrectly with ORC + kretprobe:
>
> # bpftrace -e 'kretprobe:do_nanosleep { @[kstack] = count() }'
> Attaching 1 probe...
> ^C
>
> @[
> kretprobe_trampoline+0
> ]: 1
>
> After this patch, the stack is walked correctly:
>
> # bpftrace -e 'kretprobe:do_nanosleep { @[kstack] = count() }'
> Attaching 1 probe...
> ^C
>
> @[
> kretprobe_trampoline+0
> __x64_sys_nanosleep+150
> do_syscall_64+51
> entry_SYSCALL_64_after_hwframe+68
> ]: 12
>
> Fixes: fc72ae40e303 ("x86/unwind: Make CONFIG_UNWINDER_ORC=y the default in kconfig for 64-bit")
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
OK, this is basically good, but it is messy and does much more than fix the issue.
> ---
> arch/x86/kernel/unwind_orc.c | 53 +++++++++++++++++++++++++++++++++++-
> kernel/kprobes.c | 8 +++---
> 2 files changed, 56 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> index 2a1d47f47eee..1b88d75e2e9e 100644
> --- a/arch/x86/kernel/unwind_orc.c
> +++ b/arch/x86/kernel/unwind_orc.c
> @@ -1,7 +1,9 @@
> // SPDX-License-Identifier: GPL-2.0-only
> +#include <linux/kprobes.h>
> #include <linux/objtool.h>
> #include <linux/module.h>
> #include <linux/sort.h>
> +#include <asm/kprobes.h>
> #include <asm/ptrace.h>
> #include <asm/stacktrace.h>
> #include <asm/unwind.h>
> @@ -77,9 +79,11 @@ static struct orc_entry *orc_module_find(unsigned long ip)
> }
> #endif
>
> -#ifdef CONFIG_DYNAMIC_FTRACE
> +#if defined(CONFIG_DYNAMIC_FTRACE) || defined(CONFIG_KRETPROBES)
> static struct orc_entry *orc_find(unsigned long ip);
> +#endif
>
> +#ifdef CONFIG_DYNAMIC_FTRACE
> /*
> * Ftrace dynamic trampolines do not have orc entries of their own.
> * But they are copies of the ftrace entries that are static and
> @@ -117,6 +121,43 @@ static struct orc_entry *orc_ftrace_find(unsigned long ip)
> }
> #endif
>
> +#ifdef CONFIG_KRETPROBES
> +static struct orc_entry *orc_kretprobe_find(void)
> +{
> +	kprobe_opcode_t *correct_ret_addr = NULL;
> +	struct kretprobe_instance *ri = NULL;
> +	struct llist_node *node;
> +
> +	node = current->kretprobe_instances.first;
> +	while (node) {
> +		ri = container_of(node, struct kretprobe_instance, llist);
> +
> +		if ((void *)ri->ret_addr != &kretprobe_trampoline) {
> +			/*
> +			 * This is the real return address. Any other
> +			 * instances associated with this task are for
> +			 * other calls deeper on the call stack
> +			 */
> +			correct_ret_addr = ri->ret_addr;
> +			break;
> +		}
> +
> +
> +		node = node->next;
> +	}
> +
> +	if (!correct_ret_addr)
> +		return NULL;
> +
> +	return orc_find((unsigned long)correct_ret_addr);
> +}
> +#else
> +static struct orc_entry *orc_kretprobe_find(void)
> +{
> +	return NULL;
> +}
> +#endif
This code depends too much on the kretprobe internal implementation.
This should be provided by kretprobe.
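
To illustrate the kind of interface meant here, below is a minimal userspace sketch of a lookup helper that the kretprobe side could own, so that the unwinder never touches the instance list directly. Everything in it is an assumption made for illustration: struct kri_sketch is a simplified stand-in for struct kretprobe_instance, and kretprobe_real_ret_addr_sketch() is a hypothetical name, not an existing kernel function.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct kretprobe_instance. */
struct kri_sketch {
	struct kri_sketch *next;	/* next instance on the task's list */
	void *ret_addr;			/* return address saved for this instance */
};

/*
 * Hypothetical helper owned by the kretprobe code: return the first saved
 * return address that is not the trampoline itself, or NULL if there is none.
 */
static void *kretprobe_real_ret_addr_sketch(const struct kri_sketch *head,
					    const void *trampoline)
{
	const struct kri_sketch *ri;

	for (ri = head; ri; ri = ri->next) {
		if (ri->ret_addr != trampoline)
			return ri->ret_addr;
	}
	return NULL;
}
```

With something like this, the ORC code would only pass the result to orc_find() and would not need to know how the instances are linked.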
> /*
> * If we crash with IP==0, the last successfully executed instruction
> * was probably an indirect function call with a NULL function pointer,
> @@ -148,6 +189,16 @@ static struct orc_entry *orc_find(unsigned long ip)
> if (ip == 0)
> return &null_orc_entry;
>
> +	/*
> +	 * Kretprobe lookup -- must occur before vmlinux addresses as
> +	 * kretprobe_trampoline is in the symbol table.
> +	 */
> +	if (ip == (unsigned long) &kretprobe_trampoline) {
> +		orc = orc_kretprobe_find();
> +		if (orc)
> +			return orc;
> +	}
Here too. At least the "ip == (unsigned long) &kretprobe_trampoline" check
should be hidden behind an inline function...
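
A minimal sketch of such a wrapper, written as a userspace model so it can be compiled on its own. The names kretprobe_trampoline_sketch() and is_kretprobe_trampoline_sketch() are hypothetical; the empty function only stands in for the real assembly trampoline, whose address is all that matters for the comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the real trampoline; only its address is used. */
static void kretprobe_trampoline_sketch(void)
{
}

/*
 * Hypothetical predicate exported by the kretprobe side, so that callers
 * do not open-code the address comparison.
 */
static inline bool is_kretprobe_trampoline_sketch(uintptr_t ip)
{
	return ip == (uintptr_t)&kretprobe_trampoline_sketch;
}
```

The caller in orc_find() would then read as a single predicate call, and a build without kretprobes could supply a stub that always returns false.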
> +
> /* For non-init vmlinux addresses, use the fast lookup table: */
> if (ip >= LOOKUP_START_IP && ip < LOOKUP_STOP_IP) {
> unsigned int idx, start, stop;
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 745f08fdd7a6..334c23d33451 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1895,10 +1895,6 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
> BUG_ON(1);
>
> found:
> - /* Unlink all nodes for this frame. */
> - current->kretprobe_instances.first = node->next;
> - node->next = NULL;
> -
> /* Run them.. */
> while (first) {
> ri = container_of(first, struct kretprobe_instance, llist);
> @@ -1917,6 +1913,10 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
> recycle_rp_inst(ri);
> }
>
> + /* Unlink all nodes for this frame. */
> + current->kretprobe_instances.first = node->next;
> + node->next = NULL;
Nack, this is a bit dangerous. We should unlink the chunk of kretprobe instances
and recycle it, as I did in my patch below:
https://lore.kernel.org/bpf/20210304221947.5a177ce2e1e94314e57c38a4@kernel.org/
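
As a rough illustration of "unlink the chunk, then recycle it", here is a userspace sketch on a plain singly linked list. It is not taken from the linked patch: struct node_sketch and both function names are hypothetical, and the sketch assumes the chunk's last node is reachable from the list head. It only shows the ordering, in which the chunk is detached before any node in it is recycled, so a recycled node is never reachable from the task's list.

```c
#include <stddef.h>

/* Simplified stand-in for a kretprobe instance on a per-task list. */
struct node_sketch {
	struct node_sketch *next;
	int recycled;			/* set once the node has been recycled */
};

/*
 * Detach the chunk [*head .. last] and return its first node. Afterwards
 * *head points past the chunk and the chunk itself is NULL-terminated.
 */
static struct node_sketch *unlink_chunk_sketch(struct node_sketch **head,
					       struct node_sketch *last)
{
	struct node_sketch *first = *head;

	*head = last->next;
	last->next = NULL;
	return first;
}

/* Recycle every node of a detached chunk; returns how many were handled. */
static int recycle_chunk_sketch(struct node_sketch *first)
{
	int n = 0;

	while (first) {
		/* Read the link before the node is handed back. */
		struct node_sketch *next = first->next;

		first->recycled = 1;
		first = next;
		n++;
	}
	return n;
}
```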
I would like to fix this issue in the generic part, not only for x86.
Let me refresh my series to fix it.
Thank you,
--
Masami Hiramatsu <mhiramat@kernel.org>