From: Jiri Olsa <olsajiri@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	bpf@vger.kernel.org, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Hao Luo <haoluo@google.com>, Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Alan Maguire <alan.maguire@oracle.com>,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: Re: [PATCH bpf-next 00/13] uprobes: Add support to optimize usdt probes on x86_64
Date: Fri, 13 Dec 2024 10:46:51 +0100
Message-ID: <Z1wCi6zb2pto55gn@krava>
In-Reply-To: <CAEf4BzaqFJw5wR5V7HCOf_31k+BXY7_hovNB=S7nurYez2ckcg@mail.gmail.com>

On Thu, Dec 12, 2024 at 04:43:02PM -0800, Andrii Nakryiko wrote:
> On Wed, Dec 11, 2024 at 5:34 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > hi,
> > this patchset adds support to optimize usdt probes on top of 5-byte
> > nop instruction.
> >
> > The generic approach (optimize all uprobes) is hard due to emulating
> > possible multiple original instructions and its related issues. The
> > usdt case, which stores 5-byte nop seems much easier, so starting
> > with that.
> >
> > The basic idea is to replace breakpoint exception with syscall which
> > is faster on x86_64. For more details please see changelog of patch 8.
> >
> > The run_bench_uprobes.sh benchmark triggers uprobe (on top of different
> > original instructions) in a loop and counts how many of those happened
> > per second (the unit below is million loops).
> >
> > There's big speed up if you consider current usdt implementation
> > (uprobe-nop) compared to proposed usdt (uprobe-nop5):
> >
> >   # ./benchs/run_bench_uprobes.sh
> >
> >       usermode-count :  233.831 ± 0.257M/s
> >       syscall-count  :   12.107 ± 0.038M/s
> >   --> uprobe-nop     :    3.246 ± 0.004M/s
> >       uprobe-push    :    3.057 ± 0.000M/s
> >       uprobe-ret     :    1.113 ± 0.003M/s
> >   --> uprobe-nop5    :    6.751 ± 0.037M/s
> >       uretprobe-nop  :    1.740 ± 0.015M/s
> >       uretprobe-push :    1.677 ± 0.018M/s
> >       uretprobe-ret  :    0.852 ± 0.005M/s
> >       uretprobe-nop5 :    6.769 ± 0.040M/s
> 
> uretprobe-nop5 throughput is the same as uprobe-nop5?..

ok, there's a bug in the uretprobe bench setup, so that number is wrong, sorry.
I'll send new numbers.

jirka
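
For reference, the ratios behind the question above can be checked with a
small sketch. It only uses the throughput figures quoted from the cover
letter; the dictionary layout and the helper name ratio() are illustrative
assumptions, not part of the benchmark scripts. It shows that uprobe-nop5 is
about twice as fast as uprobe-nop, and that the uretprobe-nop5 figure sits
at parity with uprobe-nop5, whereas the existing uretprobe-nop runs at
roughly half the uprobe-nop rate. That parity is what makes the posted
uretprobe-nop5 number look suspicious.

```python
# Throughput in million loops per second, copied from the quoted
# run_bench_uprobes.sh output (mean values only, error bars omitted).
results = {
    "uprobe-nop": 3.246,
    "uprobe-nop5": 6.751,
    "uretprobe-nop": 1.740,
    "uretprobe-nop5": 6.769,  # acknowledged as wrong in the reply above
}


def ratio(numerator: str, denominator: str) -> float:
    """Return the throughput of one benchmark relative to another."""
    return results[numerator] / results[denominator]


# Speed-up of the proposed usdt probe (nop5) over the current one (nop).
speedup = ratio("uprobe-nop5", "uprobe-nop")

# How much slower return probes are than entry probes in each variant.
ret_vs_entry_nop = ratio("uretprobe-nop", "uprobe-nop")
ret_vs_entry_nop5 = ratio("uretprobe-nop5", "uprobe-nop5")

print(f"uprobe-nop5 / uprobe-nop        = {speedup:.2f}")
print(f"uretprobe-nop / uprobe-nop      = {ret_vs_entry_nop:.2f}")
print(f"uretprobe-nop5 / uprobe-nop5    = {ret_vs_entry_nop5:.2f}")
```

The last ratio should change once corrected uretprobe numbers are posted.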

> 
> 
> >
> >
> > v1 changes:
> > - rebased on top of bpf-next/master
> > - couple of function/variable renames [Andrii]
> > - added nop5 emulation [Andrii]
> > - added checks to arch_uprobe_verify_opcode [Andrii]
> > - fixed arch_uprobe_is_callable/find_nearest_page [Andrii]
> > - used CALL_INSN_OPCODE [Masami]
> > - added uprobe-nop5 benchmark [Andrii]
> > - using atomic64_t in tramp_area [Andrii]
> > - using single page for all uprobe trampoline mappings
> >
> > thanks,
> > jirka
> >
> >
> > ---
> > Jiri Olsa (13):
> >       uprobes: Rename arch_uretprobe_trampoline function
> >       uprobes: Make copy_from_page global
> >       uprobes: Add nbytes argument to uprobe_write_opcode
> >       uprobes: Add arch_uprobe_verify_opcode function
> >       uprobes: Add mapping for optimized uprobe trampolines
> >       uprobes/x86: Add uprobe syscall to speed up uprobe
> >       uprobes/x86: Add support to emulate nop5 instruction
> >       uprobes/x86: Add support to optimize uprobes
> >       selftests/bpf: Use 5-byte nop for x86 usdt probes
> >       selftests/bpf: Add uprobe/usdt optimized test
> >       selftests/bpf: Add hit/attach/detach race optimized uprobe test
> >       selftests/bpf: Add uprobe syscall sigill signal test
> >       selftests/bpf: Add 5-byte nop uprobe trigger bench
> >
> >  arch/x86/entry/syscalls/syscall_64.tbl                  |   1 +
> >  arch/x86/include/asm/uprobes.h                          |   7 +++
> >  arch/x86/kernel/uprobes.c                               | 255 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >  include/linux/syscalls.h                                |   2 +
> >  include/linux/uprobes.h                                 |  25 +++++++-
> >  kernel/events/uprobes.c                                 | 191 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
> >  kernel/fork.c                                           |   1 +
> >  kernel/sys_ni.c                                         |   1 +
> >  tools/testing/selftests/bpf/bench.c                     |  12 ++++
> >  tools/testing/selftests/bpf/benchs/bench_trigger.c      |  42 +++++++++++++
> >  tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh |   2 +-
> >  tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c | 326 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  tools/testing/selftests/bpf/progs/uprobe_optimized.c    |  29 +++++++++
> >  tools/testing/selftests/bpf/sdt.h                       |   9 ++-
> >  14 files changed, 880 insertions(+), 23 deletions(-)
> >  create mode 100644 tools/testing/selftests/bpf/progs/uprobe_optimized.c
