From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
To: Jiri Olsa <olsajiri@gmail.com>
Cc: oleg@redhat.com, Aleksa Sarai <cyphar@cyphar.com>,
Eyal Birger <eyal.birger@gmail.com>,
mhiramat@kernel.org, linux-kernel <linux-kernel@vger.kernel.org>,
linux-trace-kernel@vger.kernel.org,
BPF-dev-list <bpf@vger.kernel.org>,
Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
John Fastabend <john.fastabend@gmail.com>,
peterz@infradead.org, tglx@linutronix.de, bp@alien8.de,
x86@kernel.org, linux-api@vger.kernel.org,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii.nakryiko@gmail.com>,
"rostedt@goodmis.org" <rostedt@goodmis.org>,
rafi@rbk.io, Shmulik Ladkani <shmulik.ladkani@gmail.com>
Subject: Re: Crash when attaching uretprobes to processes running in Docker
Date: Tue, 14 Jan 2025 19:05:21 +0900
Message-ID: <20250114190521.0b69a1af64cac41106101154@kernel.org>
In-Reply-To: <Z4YszJfOvFEAaKjF@krava>
On Tue, 14 Jan 2025 10:22:20 +0100
Jiri Olsa <olsajiri@gmail.com> wrote:
> On Sat, Jan 11, 2025 at 07:40:15PM +0100, Jiri Olsa wrote:
> > On Sat, Jan 11, 2025 at 02:25:37AM +1100, Aleksa Sarai wrote:
> > > On 2025-01-10, Eyal Birger <eyal.birger@gmail.com> wrote:
> > > > Hi,
> > > >
> > > > When attaching uretprobes to processes running inside docker, the attached
> > > > process is segfaulted when encountering the retprobe. The offending commit
> > > > is:
> > > >
> > > > ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe")
> > > >
> > > > To my understanding, the reason is that now that uretprobe is a system call,
> > > > the default seccomp filters in docker block it as they only allow a specific
> > > > set of known syscalls.
> > >
> > > FWIW, the default seccomp profile of Docker _should_ return -ENOSYS for
> > > uretprobe (runc has a bunch of ugly logic to try to guarantee this if
> > > Docker hasn't updated their profile to include it). Though I guess that
> > > isn't sufficient for the magic that uretprobe(2) does...
> > >
> > > > This behavior can be reproduced by the below bash script, which works before
> > > > this commit.
> > > >
> > > > Reported-by: Rafael Buchbinder <rafi@rbk.io>
> >
> > hi,
> > nice ;-) thanks for the report, the problem seems to be that uretprobe syscall
> > is blocked and uretprobe trampoline does not expect that
> >
> > I think we could add code to the uretprobe trampoline to detect this and
> > execute standard int3 as fallback to process uretprobe, I'm checking on that
>
> hack below seems to fix the issue, it's using rbx to signal that uretprobe
> syscall got executed, if not, trampoline does int3 and executes uretprobe
> handler in the old way
>
> unfortunately now the uretprobe trampoline size crosses the xol slot limit so
> will need to come up with some generic/arch code solution for that, code below
> is neglecting that for now
>
> jirka
>
>
> ---
> arch/x86/kernel/uprobes.c | 24 ++++++++++++++++++++++++
> include/linux/uprobes.h | 1 +
> kernel/events/uprobes.c | 10 ++++++++--
> 3 files changed, 33 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 5a952c5ea66b..b54863f6fa25 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -315,14 +315,25 @@ asm (
> ".global uretprobe_trampoline_entry\n"
> "uretprobe_trampoline_entry:\n"
> "pushq %rax\n"
> + "pushq %rbx\n"
> "pushq %rcx\n"
> "pushq %r11\n"
> + "movq $1, %rbx\n"
> "movq $" __stringify(__NR_uretprobe) ", %rax\n"
> "syscall\n"
> ".global uretprobe_syscall_check\n"
> "uretprobe_syscall_check:\n"
> + "or %rbx,%rbx\n"
> + "jz uretprobe_syscall_return\n"
> "popq %r11\n"
> "popq %rcx\n"
> + "popq %rbx\n"
> + "popq %rax\n"
> + "int3\n"
> + "uretprobe_syscall_return:\n"
> + "popq %r11\n"
> + "popq %rcx\n"
> + "popq %rbx\n"
>
> /* The uretprobe syscall replaces stored %rax value with final
> * return address, so we don't restore %rax in here and just
> @@ -338,6 +349,16 @@ extern u8 uretprobe_trampoline_entry[];
> extern u8 uretprobe_trampoline_end[];
> extern u8 uretprobe_syscall_check[];
>
> +#define UINSNS_PER_PAGE (PAGE_SIZE/UPROBE_XOL_SLOT_BYTES)
> +
> +bool arch_is_uretprobe_trampoline(unsigned long vaddr)
> +{
> + unsigned long start = uprobe_get_trampoline_vaddr();
> + unsigned long end = start + 2*UINSNS_PER_PAGE;
> +
> + return vaddr >= start && vaddr < end;
> +}
> +
> void *arch_uprobe_trampoline(unsigned long *psize)
> {
> static uprobe_opcode_t insn = UPROBE_SWBP_INSN;
> @@ -418,6 +439,9 @@ SYSCALL_DEFINE0(uretprobe)
> regs->r11 = regs->flags;
> regs->cx = regs->ip;
>
> + /* zero rbx to signal trampoline that uretprobe syscall was executed */
> + regs->bx = 0;
Can we just return -ENOSYS, as other syscalls do, instead of
using rbx as a side channel?
We can carefully check at setup time that the return address is not
an -ERRNO value and reserve -ENOSYS for this use case.
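To make the idea concrete, here is a rough userspace sketch of the two checks, not a tested kernel change. All sketch_* names are hypothetical, and the 4095 bound is an assumption that mirrors the kernel's MAX_ERRNO convention. The real check would live wherever the return address is hijacked, and the real trampoline would do the comparison in asm.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumption: mirrors the kernel's MAX_ERRNO, so -4095..-1 are error codes. */
#define SKETCH_MAX_ERRNO 4095UL

/* True if the value, read as a signed long, lies in [-SKETCH_MAX_ERRNO, -1]. */
static bool sketch_is_err_value(unsigned long value)
{
	return value >= (unsigned long)-SKETCH_MAX_ERRNO;
}

/*
 * Setup-time check (hypothetical helper): refuse to install the return
 * probe if the original return address could be mistaken for an errno.
 */
static bool sketch_return_addr_ok(unsigned long ret_addr)
{
	return !sketch_is_err_value(ret_addr);
}

/*
 * Trampoline-side decision (hypothetical helper): after the syscall
 * instruction, rax holds either the final return address or -ENOSYS
 * when the syscall was not executed, e.g. rejected by a seccomp filter.
 */
static bool sketch_needs_int3_fallback(unsigned long rax_after_syscall)
{
	return rax_after_syscall == (unsigned long)-ENOSYS;
}
```

With the setup-time check in place, -ENOSYS in rax can only mean that the syscall did not run, so the trampoline could take the int3 path without needing rbx.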
Thank you,
> +
> return regs->ax;
>
> sigill:
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index e0a4c2082245..dbde57a68a1b 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -213,6 +213,7 @@ extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
> extern void uprobe_handle_trampoline(struct pt_regs *regs);
> extern void *arch_uprobe_trampoline(unsigned long *psize);
> extern unsigned long uprobe_get_trampoline_vaddr(void);
> +bool arch_is_uretprobe_trampoline(unsigned long vaddr);
> #else /* !CONFIG_UPROBES */
> struct uprobes_state {
> };
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index fa04b14a7d72..73df64109f38 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1703,6 +1703,11 @@ void * __weak arch_uprobe_trampoline(unsigned long *psize)
> return &insn;
> }
>
> +bool __weak arch_is_uretprobe_trampoline(unsigned long vaddr)
> +{
> + return vaddr == uprobe_get_trampoline_vaddr();
> +}
> +
> static struct xol_area *__create_xol_area(unsigned long vaddr)
> {
> struct mm_struct *mm = current->mm;
> @@ -1725,8 +1730,9 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
>
> area->vaddr = vaddr;
> init_waitqueue_head(&area->wq);
> - /* Reserve the 1st slot for get_trampoline_vaddr() */
> + /* Reserve the first two slots for get_trampoline_vaddr() */
> set_bit(0, area->bitmap);
> + set_bit(1, area->bitmap);
> insns = arch_uprobe_trampoline(&insns_size);
> arch_uprobe_copy_ixol(area->page, 0, insns, insns_size);
>
> @@ -2536,7 +2542,7 @@ static void handle_swbp(struct pt_regs *regs)
> int is_swbp;
>
> bp_vaddr = uprobe_get_swbp_addr(regs);
> - if (bp_vaddr == uprobe_get_trampoline_vaddr())
> + if (arch_is_uretprobe_trampoline(bp_vaddr))
> return uprobe_handle_trampoline(regs);
>
> rcu_read_lock_trace();
> --
> 2.47.1
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>