From: Ingo Molnar <mingo@kernel.org>
To: Jiri Olsa <jolsa@kernel.org>
Cc: "Oleg Nesterov" <oleg@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Andrii Nakryiko" <andrii@kernel.org>,
bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, x86@kernel.org,
"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
"John Fastabend" <john.fastabend@gmail.com>,
"Hao Luo" <haoluo@google.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Masami Hiramatsu" <mhiramat@kernel.org>,
"Alan Maguire" <alan.maguire@oracle.com>,
"David Laight" <David.Laight@aculab.com>,
"Thomas Weißschuh" <thomas@t-8ch.de>
Subject: Re: [PATCH RFCv2 00/18] uprobes: Add support to optimize usdt probes on x86_64
Date: Mon, 24 Feb 2025 19:46:43 +0100
Message-ID: <Z7y-kwkXZzbv-CQs@gmail.com>
In-Reply-To: <20250224140151.667679-1-jolsa@kernel.org>
* Jiri Olsa <jolsa@kernel.org> wrote:
> hi,
> this patchset adds support to optimize usdt probes on top of 5-byte
> nop instruction.
>
> The generic approach (optimizing all uprobes) is hard because it may
> require emulating multiple original instructions, with all the issues
> that brings. The usdt case, which stores a 5-byte nop, seems much
> easier, so I'm starting with that.
>
> The basic idea is to replace breakpoint exception with syscall which
> is faster on x86_64. For more details please see changelog of patch 8.
>
> The run_bench_uprobes.sh benchmark triggers uprobe (on top of different
> original instructions) in a loop and counts how many of those happened
> per second (the unit below is million loops).
>
> There's a big speedup when comparing the current usdt implementation
> (uprobe-nop) with the proposed one (uprobe-nop5):
>
> # ./benchs/run_bench_uprobes.sh
>
>         usermode-count :  818.386 ± 1.886M/s
>         syscall-count  :    8.923 ± 0.003M/s
>   -->   uprobe-nop     :    3.086 ± 0.005M/s
>         uprobe-push    :    2.751 ± 0.001M/s
>         uprobe-ret     :    1.481 ± 0.000M/s
>   -->   uprobe-nop5    :    4.016 ± 0.002M/s
>         uretprobe-nop  :    1.712 ± 0.008M/s
>         uretprobe-push :    1.616 ± 0.001M/s
>         uretprobe-ret  :    1.052 ± 0.000M/s
>         uretprobe-nop5 :    2.015 ± 0.000M/s
So I had to dig into patch #12 to see the magnitude of the speedup:
  # current:
  #         usermode-count :  818.836 ± 2.842M/s
  #         syscall-count  :    8.917 ± 0.003M/s
  #         uprobe-nop     :    3.056 ± 0.013M/s
  #         uprobe-push    :    2.903 ± 0.002M/s
  #         uprobe-ret     :    1.533 ± 0.001M/s
  #   -->   uprobe-nop5    :    1.492 ± 0.000M/s
  #         uretprobe-nop  :    1.783 ± 0.000M/s
  #         uretprobe-push :    1.672 ± 0.001M/s
  #         uretprobe-ret  :    1.067 ± 0.002M/s
  #   -->   uretprobe-nop5 :    1.052 ± 0.000M/s
  #
  # after the change:
  #
  #         usermode-count :  818.386 ± 1.886M/s
  #         syscall-count  :    8.923 ± 0.003M/s
  #         uprobe-nop     :    3.086 ± 0.005M/s
  #         uprobe-push    :    2.751 ± 0.001M/s
  #         uprobe-ret     :    1.481 ± 0.000M/s
  #   -->   uprobe-nop5    :    4.016 ± 0.002M/s
  #         uretprobe-nop  :    1.712 ± 0.008M/s
  #         uretprobe-push :    1.616 ± 0.001M/s
  #         uretprobe-ret  :    1.052 ± 0.000M/s
  #   -->   uretprobe-nop5 :    2.015 ± 0.000M/s
That's a +169% speedup for uprobe-nop5 and a +91% speedup for
uretprobe-nop5 - pretty darn impressive!
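
For anyone wanting to re-derive those percentages, here is a minimal sketch. The throughput values are copied from the "current" and "after the change" tables above; the helper name speedup_percent is just an illustration and not part of the patchset or the benchmark scripts.

```python
# Throughput in million loops per second, copied from the two tables above.
before = {"uprobe-nop5": 1.492, "uretprobe-nop5": 1.052}
after = {"uprobe-nop5": 4.016, "uretprobe-nop5": 2.015}


def speedup_percent(old, new):
    """Relative throughput gain of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Hypothetical helper applied to both nop5 rows.
speedups = {name: speedup_percent(before[name], after[name]) for name in before}

for name, pct in speedups.items():
    print(f"{name}: +{pct:.1f}%")
```

The results land at roughly 169.2% and 91.5%, which matches the figures quoted above once the fractional part is dropped.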
Thanks,
Ingo
Thread overview: 33+ messages
2025-02-24 14:01 [PATCH RFCv2 00/18] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
2025-02-24 14:01 ` [PATCH RFCv2 01/18] uprobes: Rename arch_uretprobe_trampoline function Jiri Olsa
2025-02-24 14:01 ` [PATCH RFCv2 02/18] uprobes: Make copy_from_page global Jiri Olsa
2025-02-24 14:01 ` [PATCH RFCv2 03/18] uprobes: Move ref_ctr_offset update out of uprobe_write_opcode Jiri Olsa
2025-02-24 14:01 ` [PATCH RFCv2 04/18] uprobes: Add uprobe_write function Jiri Olsa
2025-02-24 14:01 ` [PATCH RFCv2 05/18] uprobes: Add nbytes argument to uprobe_write_opcode Jiri Olsa
2025-02-24 14:01 ` [PATCH RFCv2 06/18] uprobes: Add orig argument to uprobe_write and uprobe_write_opcode Jiri Olsa
2025-02-28 19:07 ` Andrii Nakryiko
2025-02-28 23:12 ` Jiri Olsa
2025-02-24 14:01 ` [PATCH RFCv2 07/18] uprobes: Add swbp argument to arch_uretprobe_hijack_return_addr Jiri Olsa
2025-02-24 14:01 ` [PATCH RFCv2 08/18] uprobes/x86: Add uprobe syscall to speed up uprobe Jiri Olsa
2025-02-24 19:22 ` Alexei Starovoitov
2025-02-25 13:35 ` Jiri Olsa
2025-02-25 17:10 ` Andrii Nakryiko
2025-02-25 18:06 ` Alexei Starovoitov
2025-02-26 2:36 ` Alexei Starovoitov
2025-02-24 14:01 ` [PATCH RFCv2 09/18] uprobes/x86: Add mapping for optimized uprobe trampolines Jiri Olsa
2025-02-24 14:01 ` [PATCH RFCv2 10/18] uprobes/x86: Add mm_uprobe objects to track uprobes within mm Jiri Olsa
2025-02-24 14:01 ` [PATCH RFCv2 11/18] uprobes/x86: Add support to emulate nop5 instruction Jiri Olsa
2025-02-24 14:01 ` [PATCH RFCv2 12/18] uprobes/x86: Add support to optimize uprobes Jiri Olsa
2025-02-28 18:55 ` Andrii Nakryiko
2025-02-28 22:55 ` Jiri Olsa
2025-02-28 23:00 ` Andrii Nakryiko
2025-02-28 23:18 ` Jiri Olsa
2025-02-28 23:27 ` Andrii Nakryiko
2025-02-28 23:00 ` Jiri Olsa
2025-02-24 14:01 ` [PATCH RFCv2 13/18] selftests/bpf: Reorg the uprobe_syscall test function Jiri Olsa
2025-02-24 14:01 ` [PATCH RFCv2 14/18] selftests/bpf: Use 5-byte nop for x86 usdt probes Jiri Olsa
2025-02-24 14:01 ` [PATCH RFCv2 15/18] selftests/bpf: Add uprobe/usdt syscall tests Jiri Olsa
2025-02-24 14:01 ` [PATCH RFCv2 16/18] selftests/bpf: Add hit/attach/detach race optimized uprobe test Jiri Olsa
2025-02-24 14:01 ` [PATCH RFCv2 17/18] selftests/bpf: Add uprobe syscall sigill signal test Jiri Olsa
2025-02-24 14:01 ` [PATCH RFCv2 18/18] selftests/bpf: Add 5-byte nop uprobe trigger bench Jiri Olsa
2025-02-24 18:46 ` Ingo Molnar [this message]