From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
To: Jiri Olsa <olsajiri@gmail.com>
Cc: oleg@redhat.com, Aleksa Sarai <cyphar@cyphar.com>,
	Eyal Birger <eyal.birger@gmail.com>,
	mhiramat@kernel.org, linux-kernel <linux-kernel@vger.kernel.org>,
	linux-trace-kernel@vger.kernel.org,
	BPF-dev-list <bpf@vger.kernel.org>,
	Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	peterz@infradead.org, tglx@linutronix.de, bp@alien8.de,
	x86@kernel.org, linux-api@vger.kernel.org,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	rafi@rbk.io, Shmulik Ladkani <shmulik.ladkani@gmail.com>
Subject: Re: Crash when attaching uretprobes to processes running in Docker
Date: Tue, 14 Jan 2025 19:05:21 +0900
Message-ID: <20250114190521.0b69a1af64cac41106101154@kernel.org>
In-Reply-To: <Z4YszJfOvFEAaKjF@krava>

On Tue, 14 Jan 2025 10:22:20 +0100
Jiri Olsa <olsajiri@gmail.com> wrote:

> On Sat, Jan 11, 2025 at 07:40:15PM +0100, Jiri Olsa wrote:
> > On Sat, Jan 11, 2025 at 02:25:37AM +1100, Aleksa Sarai wrote:
> > > On 2025-01-10, Eyal Birger <eyal.birger@gmail.com> wrote:
> > > > Hi,
> > > > 
> > > > When attaching uretprobes to processes running inside docker, the attached
> > > > process is segfaulted when encountering the retprobe. The offending commit
> > > > is:
> > > > 
> > > > ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe")
> > > > 
> > > > To my understanding, the reason is that now that uretprobe is a system call,
> > > > the default seccomp filters in docker block it as they only allow a specific
> > > > set of known syscalls.
> > > 
> > > FWIW, the default seccomp profile of Docker _should_ return -ENOSYS for
> > > uretprobe (runc has a bunch of ugly logic to try to guarantee this if
> > > Docker hasn't updated their profile to include it). Though I guess that
> > > isn't sufficient for the magic that uretprobe(2) does...
> > > 
> > > > This behavior can be reproduced by the below bash script, which works before
> > > > this commit.
> > > > 
> > > > Reported-by: Rafael Buchbinder <rafi@rbk.io>
> > 
> > hi,
> > nice ;-) thanks for the report, the problem seems to be that uretprobe syscall
> > is blocked and uretprobe trampoline does not expect that
> > 
> > I think we could add code to the uretprobe trampoline to detect this and
> > execute standard int3 as fallback to process uretprobe, I'm checking on that
> 
> hack below seems to fix the issue, it's using rbx to signal that uretprobe
> syscall got executed, if not, trampoline does int3 and executes uretprobe
> handler in the old way
> 
> unfortunately now the uretprobe trampoline size crosses the xol slot limit so
> will need to come up with some generic/arch code solution for that, code below
> is neglecting that for now
> 
> jirka
> 
> 
> ---
>  arch/x86/kernel/uprobes.c | 24 ++++++++++++++++++++++++
>  include/linux/uprobes.h   |  1 +
>  kernel/events/uprobes.c   | 10 ++++++++--
>  3 files changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 5a952c5ea66b..b54863f6fa25 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -315,14 +315,25 @@ asm (
>  	".global uretprobe_trampoline_entry\n"
>  	"uretprobe_trampoline_entry:\n"
>  	"pushq %rax\n"
> +	"pushq %rbx\n"
>  	"pushq %rcx\n"
>  	"pushq %r11\n"
> +	"movq $1, %rbx\n"
>  	"movq $" __stringify(__NR_uretprobe) ", %rax\n"
>  	"syscall\n"
>  	".global uretprobe_syscall_check\n"
>  	"uretprobe_syscall_check:\n"
> +	"or %rbx,%rbx\n"
> +	"jz uretprobe_syscall_return\n"
>  	"popq %r11\n"
>  	"popq %rcx\n"
> +	"popq %rbx\n"
> +	"popq %rax\n"
> +	"int3\n"
> +	"uretprobe_syscall_return:\n"
> +	"popq %r11\n"
> +	"popq %rcx\n"
> +	"popq %rbx\n"
>  
>  	/* The uretprobe syscall replaces stored %rax value with final
>  	 * return address, so we don't restore %rax in here and just
> @@ -338,6 +349,16 @@ extern u8 uretprobe_trampoline_entry[];
>  extern u8 uretprobe_trampoline_end[];
>  extern u8 uretprobe_syscall_check[];
>  
> +#define UINSNS_PER_PAGE                 (PAGE_SIZE/UPROBE_XOL_SLOT_BYTES)
> +
> +bool arch_is_uretprobe_trampoline(unsigned long vaddr)
> +{
> +	unsigned long start = uprobe_get_trampoline_vaddr();
> +	unsigned long end = start + 2*UINSNS_PER_PAGE;
> +
> +	return vaddr >= start && vaddr < end;
> +}
> +
>  void *arch_uprobe_trampoline(unsigned long *psize)
>  {
>  	static uprobe_opcode_t insn = UPROBE_SWBP_INSN;
> @@ -418,6 +439,9 @@ SYSCALL_DEFINE0(uretprobe)
>  	regs->r11 = regs->flags;
>  	regs->cx  = regs->ip;
>  
> +	/* zero rbx to signal trampoline that uretprobe syscall was executed */
> +	regs->bx  = 0;

Can we just return -ENOSYS, like any other syscall does, instead of
using rbx as a side channel?
We can carefully check at setup time that the return address is not an
-ERRNO value, and reserve -ENOSYS for this use case.
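
A rough userspace sketch of that check, only to illustrate the idea. The
helper names are hypothetical, MAX_ERRNO is assumed to match the kernel's
value in include/linux/err.h, and none of this is taken from the patch:

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed to match the kernel's MAX_ERRNO (include/linux/err.h). */
#define MAX_ERRNO 4095UL

/* True if x lies in the range used to encode -ERRNO values,
 * i.e. the top MAX_ERRNO values of an unsigned long. */
static bool is_errno_value(unsigned long x)
{
	return x >= -MAX_ERRNO;
}

/* Hypothetical setup-time check: reject a return address that the
 * trampoline could mistake for a syscall error code. */
static bool ret_addr_is_safe(unsigned long ret_addr)
{
	return !is_errno_value(ret_addr);
}

/* Hypothetical trampoline-side decision: fall back to int3 only when
 * the syscall came back with -ENOSYS in rax. */
static bool need_int3_fallback(unsigned long rax)
{
	return rax == (unsigned long)-ENOSYS;
}
```

With something like this, the trampoline would only need to compare rax
against -ENOSYS after the syscall instruction, and no extra register
would have to be saved and restored.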

Thank you,

> +
>  	return regs->ax;
>  
>  sigill:
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index e0a4c2082245..dbde57a68a1b 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -213,6 +213,7 @@ extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
>  extern void uprobe_handle_trampoline(struct pt_regs *regs);
>  extern void *arch_uprobe_trampoline(unsigned long *psize);
>  extern unsigned long uprobe_get_trampoline_vaddr(void);
> +bool arch_is_uretprobe_trampoline(unsigned long vaddr);
>  #else /* !CONFIG_UPROBES */
>  struct uprobes_state {
>  };
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index fa04b14a7d72..73df64109f38 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1703,6 +1703,11 @@ void * __weak arch_uprobe_trampoline(unsigned long *psize)
>  	return &insn;
>  }
>  
> +bool __weak arch_is_uretprobe_trampoline(unsigned long vaddr)
> +{
> +	return vaddr == uprobe_get_trampoline_vaddr();
> +}
> +
>  static struct xol_area *__create_xol_area(unsigned long vaddr)
>  {
>  	struct mm_struct *mm = current->mm;
> @@ -1725,8 +1730,9 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
>  
>  	area->vaddr = vaddr;
>  	init_waitqueue_head(&area->wq);
> -	/* Reserve the 1st slot for get_trampoline_vaddr() */
> +	/* Reserve the first two slots for get_trampoline_vaddr() */
>  	set_bit(0, area->bitmap);
> +	set_bit(1, area->bitmap);
>  	insns = arch_uprobe_trampoline(&insns_size);
>  	arch_uprobe_copy_ixol(area->page, 0, insns, insns_size);
>  
> @@ -2536,7 +2542,7 @@ static void handle_swbp(struct pt_regs *regs)
>  	int is_swbp;
>  
>  	bp_vaddr = uprobe_get_swbp_addr(regs);
> -	if (bp_vaddr == uprobe_get_trampoline_vaddr())
> +	if (arch_is_uretprobe_trampoline(bp_vaddr))
>  		return uprobe_handle_trampoline(regs);
>  
>  	rcu_read_lock_trace();
> -- 
> 2.47.1
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>
