* [PATCHv3 perf/core 1/2] uprobes/x86: Add support to emulate nop instructions
@ 2025-04-14 8:36 Jiri Olsa
2025-04-14 8:36 ` [PATCHv3 perf/core 2/2] selftests/bpf: Add 5-byte nop uprobe trigger bench Jiri Olsa
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Jiri Olsa @ 2025-04-14 8:36 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire
Add support to emulate all nop instructions when one of them is the
original instruction at the uprobe site.
This change speeds up uprobes placed on top of any nop instruction and
prepares for the USDT probe optimization, which will be done on top of
the nop5 instruction.
With this change, a USDT probe on top of nop5 no longer takes a
performance hit compared to a USDT probe on top of the standard nop
instruction.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
v3 changes:
- use insn->length as index to x86_nops [Andrii]
arch/x86/kernel/uprobes.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 9194695662b2..6d383839e839 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -840,6 +840,11 @@ static int branch_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
insn_byte_t p;
int i;
+ /* x86_nops[insn->length]; same as jmp with .offs = 0 */
+ if (insn->length <= ASM_NOP_MAX &&
+ !memcmp(insn->kaddr, x86_nops[insn->length], insn->length))
+ goto setup;
+
switch (opc1) {
case 0xeb: /* jmp 8 */
case 0xe9: /* jmp 32 */
--
2.49.0
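The check added above can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the kernel implementation: `sketch_nops`, `SKETCH_NOP_MAX`, `is_canonical_nop()`, `emulated_next_ip()` and `sketch_self_check()` are hypothetical names, and the table only lists the common 1- to 5-byte encodings, whereas the patch relies on the kernel's own `x86_nops[]` and `ASM_NOP_MAX`. It shows the two ideas in the hunk: indexing a nop table by instruction length and comparing bytes, and treating a matched nop like a jmp whose offset is 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: illustrative table only, covering 1..5 byte nops. */
#define SKETCH_NOP_MAX 5

static const unsigned char sketch_nops[SKETCH_NOP_MAX + 1][SKETCH_NOP_MAX] = {
	[1] = { 0x90 },
	[2] = { 0x66, 0x90 },
	[3] = { 0x0f, 0x1f, 0x00 },
	[4] = { 0x0f, 0x1f, 0x40, 0x00 },
	[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 },	/* nop5, as used by the bench */
};

/*
 * Return true if the len bytes at insn are exactly the table's nop of
 * that length. The len == 0 guard is specific to this sketch.
 */
bool is_canonical_nop(const unsigned char *insn, size_t len)
{
	if (len == 0 || len > SKETCH_NOP_MAX)
		return false;
	return memcmp(insn, sketch_nops[len], len) == 0;
}

/*
 * Emulating a relative jmp moves the instruction pointer past the
 * instruction and then adds the offset. With offs == 0 this only
 * skips the instruction, which is what a nop does.
 */
unsigned long emulated_next_ip(unsigned long ip, size_t ilen, long offs)
{
	return ip + ilen + (unsigned long)offs;
}

/* Usage example: the nop5 bytes match and emulate as a 5-byte skip. */
void sketch_self_check(void)
{
	const unsigned char nop5[] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

	assert(is_canonical_nop(nop5, sizeof(nop5)));
	assert(emulated_next_ip(0x1000, sizeof(nop5), 0) == 0x1005);
}
```

Indexing by length keeps the check to a single memcmp() per instruction, which appears to be the point of the v3 change noted above.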
* [PATCHv3 perf/core 2/2] selftests/bpf: Add 5-byte nop uprobe trigger bench
2025-04-14 8:36 [PATCHv3 perf/core 1/2] uprobes/x86: Add support to emulate nop instructions Jiri Olsa
@ 2025-04-14 8:36 ` Jiri Olsa
2025-04-14 16:13 ` Andrii Nakryiko
2025-04-14 12:55 ` [PATCHv3 perf/core 1/2] uprobes/x86: Add support to emulate nop instructions Oleg Nesterov
2025-04-14 16:13 ` Andrii Nakryiko
2 siblings, 1 reply; 6+ messages in thread
From: Jiri Olsa @ 2025-04-14 8:36 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire
Add 5-byte nop uprobe trigger bench (x86_64 specific) to measure
uprobes/uretprobes on top of nop5 instruction.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
tools/testing/selftests/bpf/bench.c | 12 ++++++
.../selftests/bpf/benchs/bench_trigger.c | 42 +++++++++++++++++++
.../selftests/bpf/benchs/run_bench_uprobes.sh | 2 +-
3 files changed, 55 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index 1bd403a5ef7b..0fd8c9b0d38f 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -526,6 +526,12 @@ extern const struct bench bench_trig_uprobe_multi_push;
extern const struct bench bench_trig_uretprobe_multi_push;
extern const struct bench bench_trig_uprobe_multi_ret;
extern const struct bench bench_trig_uretprobe_multi_ret;
+#ifdef __x86_64__
+extern const struct bench bench_trig_uprobe_nop5;
+extern const struct bench bench_trig_uretprobe_nop5;
+extern const struct bench bench_trig_uprobe_multi_nop5;
+extern const struct bench bench_trig_uretprobe_multi_nop5;
+#endif
extern const struct bench bench_rb_libbpf;
extern const struct bench bench_rb_custom;
@@ -586,6 +592,12 @@ static const struct bench *benchs[] = {
&bench_trig_uretprobe_multi_push,
&bench_trig_uprobe_multi_ret,
&bench_trig_uretprobe_multi_ret,
+#ifdef __x86_64__
+ &bench_trig_uprobe_nop5,
+ &bench_trig_uretprobe_nop5,
+ &bench_trig_uprobe_multi_nop5,
+ &bench_trig_uretprobe_multi_nop5,
+#endif
/* ringbuf/perfbuf benchmarks */
&bench_rb_libbpf,
&bench_rb_custom,
diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c
index 32e9f194d449..82327657846e 100644
--- a/tools/testing/selftests/bpf/benchs/bench_trigger.c
+++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c
@@ -333,6 +333,20 @@ static void *uprobe_producer_ret(void *input)
return NULL;
}
+#ifdef __x86_64__
+__nocf_check __weak void uprobe_target_nop5(void)
+{
+ asm volatile (".byte 0x0f, 0x1f, 0x44, 0x00, 0x00");
+}
+
+static void *uprobe_producer_nop5(void *input)
+{
+ while (true)
+ uprobe_target_nop5();
+ return NULL;
+}
+#endif
+
static void usetup(bool use_retprobe, bool use_multi, void *target_addr)
{
size_t uprobe_offset;
@@ -448,6 +462,28 @@ static void uretprobe_multi_ret_setup(void)
usetup(true, true /* use_multi */, &uprobe_target_ret);
}
+#ifdef __x86_64__
+static void uprobe_nop5_setup(void)
+{
+ usetup(false, false /* !use_multi */, &uprobe_target_nop5);
+}
+
+static void uretprobe_nop5_setup(void)
+{
+ usetup(true, false /* !use_multi */, &uprobe_target_nop5);
+}
+
+static void uprobe_multi_nop5_setup(void)
+{
+ usetup(false, true /* use_multi */, &uprobe_target_nop5);
+}
+
+static void uretprobe_multi_nop5_setup(void)
+{
+ usetup(true, true /* use_multi */, &uprobe_target_nop5);
+}
+#endif
+
const struct bench bench_trig_syscall_count = {
.name = "trig-syscall-count",
.validate = trigger_validate,
@@ -506,3 +542,9 @@ BENCH_TRIG_USERMODE(uprobe_multi_ret, ret, "uprobe-multi-ret");
BENCH_TRIG_USERMODE(uretprobe_multi_nop, nop, "uretprobe-multi-nop");
BENCH_TRIG_USERMODE(uretprobe_multi_push, push, "uretprobe-multi-push");
BENCH_TRIG_USERMODE(uretprobe_multi_ret, ret, "uretprobe-multi-ret");
+#ifdef __x86_64__
+BENCH_TRIG_USERMODE(uprobe_nop5, nop5, "uprobe-nop5");
+BENCH_TRIG_USERMODE(uretprobe_nop5, nop5, "uretprobe-nop5");
+BENCH_TRIG_USERMODE(uprobe_multi_nop5, nop5, "uprobe-multi-nop5");
+BENCH_TRIG_USERMODE(uretprobe_multi_nop5, nop5, "uretprobe-multi-nop5");
+#endif
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh b/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
index af169f831f2f..03f55405484b 100755
--- a/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
+++ b/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
@@ -2,7 +2,7 @@
set -eufo pipefail
-for i in usermode-count syscall-count {uprobe,uretprobe}-{nop,push,ret}
+for i in usermode-count syscall-count {uprobe,uretprobe}-{nop,push,ret,nop5}
do
summary=$(sudo ./bench -w2 -d5 -a trig-$i | tail -n1 | cut -d'(' -f1 | cut -d' ' -f3-)
printf "%-15s: %s\n" $i "$summary"
--
2.49.0
* Re: [PATCHv3 perf/core 1/2] uprobes/x86: Add support to emulate nop instructions
2025-04-14 8:36 [PATCHv3 perf/core 1/2] uprobes/x86: Add support to emulate nop instructions Jiri Olsa
2025-04-14 8:36 ` [PATCHv3 perf/core 2/2] selftests/bpf: Add 5-byte nop uprobe trigger bench Jiri Olsa
@ 2025-04-14 12:55 ` Oleg Nesterov
2025-04-14 16:13 ` Andrii Nakryiko
2 siblings, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2025-04-14 12:55 UTC (permalink / raw)
To: Jiri Olsa
Cc: Peter Zijlstra, Ingo Molnar, Andrii Nakryiko, bpf, linux-kernel,
linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire
On 04/14, Jiri Olsa wrote:
>
> @@ -840,6 +840,11 @@ static int branch_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
> insn_byte_t p;
> int i;
>
> + /* x86_nops[insn->length]; same as jmp with .offs = 0 */
> + if (insn->length <= ASM_NOP_MAX &&
> + !memcmp(insn->kaddr, x86_nops[insn->length], insn->length))
> + goto setup;
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
* Re: [PATCHv3 perf/core 2/2] selftests/bpf: Add 5-byte nop uprobe trigger bench
2025-04-14 8:36 ` [PATCHv3 perf/core 2/2] selftests/bpf: Add 5-byte nop uprobe trigger bench Jiri Olsa
@ 2025-04-14 16:13 ` Andrii Nakryiko
2025-04-18 7:04 ` Ingo Molnar
0 siblings, 1 reply; 6+ messages in thread
From: Andrii Nakryiko @ 2025-04-14 16:13 UTC (permalink / raw)
To: Jiri Olsa
Cc: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Andrii Nakryiko, bpf,
linux-kernel, linux-trace-kernel, x86, Song Liu, Yonghong Song,
John Fastabend, Hao Luo, Steven Rostedt, Masami Hiramatsu,
Alan Maguire
On Mon, Apr 14, 2025 at 1:37 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Add 5-byte nop uprobe trigger bench (x86_64 specific) to measure
> uprobes/uretprobes on top of nop5 instruction.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
> tools/testing/selftests/bpf/bench.c | 12 ++++++
> .../selftests/bpf/benchs/bench_trigger.c | 42 +++++++++++++++++++
> .../selftests/bpf/benchs/run_bench_uprobes.sh | 2 +-
> 3 files changed, 55 insertions(+), 1 deletion(-)
>
LGTM. Should we land this benchmark patch through the bpf-next tree?
It won't break anything, just will be slower until patch #1 gets into
bpf-next as well, which is fine.
Ingo or Peter, any objections to me routing this patch separately
through bpf-next?
But either way:
Acked-by: Andrii Nakryiko <andrii@kernel.org>
[...]
* Re: [PATCHv3 perf/core 1/2] uprobes/x86: Add support to emulate nop instructions
2025-04-14 8:36 [PATCHv3 perf/core 1/2] uprobes/x86: Add support to emulate nop instructions Jiri Olsa
2025-04-14 8:36 ` [PATCHv3 perf/core 2/2] selftests/bpf: Add 5-byte nop uprobe trigger bench Jiri Olsa
2025-04-14 12:55 ` [PATCHv3 perf/core 1/2] uprobes/x86: Add support to emulate nop instructions Oleg Nesterov
@ 2025-04-14 16:13 ` Andrii Nakryiko
2 siblings, 0 replies; 6+ messages in thread
From: Andrii Nakryiko @ 2025-04-14 16:13 UTC (permalink / raw)
To: Jiri Olsa
Cc: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Andrii Nakryiko, bpf,
linux-kernel, linux-trace-kernel, x86, Song Liu, Yonghong Song,
John Fastabend, Hao Luo, Steven Rostedt, Masami Hiramatsu,
Alan Maguire
On Mon, Apr 14, 2025 at 1:36 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding support to emulate all nop instructions as the original uprobe
> instruction.
>
> This change speeds up uprobe on top of all nop instructions and is a
> preparation for usdt probe optimization, that will be done on top of
> nop5 instruction.
>
> With this change the usdt probe on top of nop5 won't take the performance
> hit compared to usdt probe on top of standard nop instruction.
>
> Suggested-by: Oleg Nesterov <oleg@redhat.com>
> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
> v3 changes:
> - use insn->length as index to x86_nops [Andrii]
>
> arch/x86/kernel/uprobes.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 9194695662b2..6d383839e839 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -840,6 +840,11 @@ static int branch_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
> insn_byte_t p;
> int i;
>
> + /* x86_nops[insn->length]; same as jmp with .offs = 0 */
> + if (insn->length <= ASM_NOP_MAX &&
> + !memcmp(insn->kaddr, x86_nops[insn->length], insn->length))
> + goto setup;
> +
LGTM, thanks!
Acked-by: Andrii Nakryiko <andrii@kernel.org>
> switch (opc1) {
> case 0xeb: /* jmp 8 */
> case 0xe9: /* jmp 32 */
> --
> 2.49.0
>
* Re: [PATCHv3 perf/core 2/2] selftests/bpf: Add 5-byte nop uprobe trigger bench
2025-04-14 16:13 ` Andrii Nakryiko
@ 2025-04-18 7:04 ` Ingo Molnar
0 siblings, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2025-04-18 7:04 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Jiri Olsa, Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf,
linux-kernel, linux-trace-kernel, x86, Song Liu, Yonghong Song,
John Fastabend, Hao Luo, Steven Rostedt, Masami Hiramatsu,
Alan Maguire
* Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> On Mon, Apr 14, 2025 at 1:37 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Add 5-byte nop uprobe trigger bench (x86_64 specific) to measure
> > uprobes/uretprobes on top of nop5 instruction.
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> > tools/testing/selftests/bpf/bench.c | 12 ++++++
> > .../selftests/bpf/benchs/bench_trigger.c | 42 +++++++++++++++++++
> > .../selftests/bpf/benchs/run_bench_uprobes.sh | 2 +-
> > 3 files changed, 55 insertions(+), 1 deletion(-)
> >
>
> LGTM. Should we land this benchmark patch through the bpf-next tree?
> It won't break anything, just will be slower until patch #1 gets into
> bpf-next as well, which is fine.
>
> Ingo or Peter, any objections to me routing this patch separately
> through bpf-next?
>
> But either way:
>
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
I've applied this to the perf tree with a few readability edits to the
changelogs and the new tags added in.
Thanks,
Ingo