* [PATCHv2 perf/core 1/2] uprobes/x86: Add support to emulate nop instructions
@ 2025-04-11 12:17 Jiri Olsa
2025-04-11 12:17 ` [PATCHv2 perf/core 2/2] selftests/bpf: Add 5-byte nop uprobe trigger bench Jiri Olsa
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Jiri Olsa @ 2025-04-11 12:17 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire
Add support for emulating all nop instructions when one of them is the
original instruction at the uprobe site.
This change speeds up uprobes placed on top of any nop instruction and
prepares for the usdt probe optimization, which will be done on top of
the nop5 instruction.
With this change, a usdt probe on top of nop5 won't take a performance
hit compared to a usdt probe on top of the standard nop instruction.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
v2 changes:
- follow Andrii/Oleg's suggestion and emulate all the nops
arch/x86/kernel/uprobes.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 9194695662b2..262960189a1c 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -840,6 +840,12 @@ static int branch_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
insn_byte_t p;
int i;
+ /* x86_nops[i]; same as jmp with .offs = 0 */
+ for (i = 1; i <= ASM_NOP_MAX; ++i) {
+ if (!memcmp(insn->kaddr, x86_nops[i], i))
+ goto setup;
+ }
+
switch (opc1) {
case 0xeb: /* jmp 8 */
case 0xe9: /* jmp 32 */
--
2.49.0
* [PATCHv2 perf/core 2/2] selftests/bpf: Add 5-byte nop uprobe trigger bench
2025-04-11 12:17 [PATCHv2 perf/core 1/2] uprobes/x86: Add support to emulate nop instructions Jiri Olsa
@ 2025-04-11 12:17 ` Jiri Olsa
2025-04-11 12:48 ` [PATCHv2 perf/core 1/2] uprobes/x86: Add support to emulate nop instructions Oleg Nesterov
2025-04-11 16:02 ` Andrii Nakryiko
2 siblings, 0 replies; 6+ messages in thread
From: Jiri Olsa @ 2025-04-11 12:17 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire
Add a 5-byte nop uprobe trigger bench (x86_64 specific) to measure
uprobes/uretprobes on top of the nop5 instruction.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
tools/testing/selftests/bpf/bench.c | 12 ++++++
.../selftests/bpf/benchs/bench_trigger.c | 42 +++++++++++++++++++
.../selftests/bpf/benchs/run_bench_uprobes.sh | 2 +-
3 files changed, 55 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index 1bd403a5ef7b..0fd8c9b0d38f 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -526,6 +526,12 @@ extern const struct bench bench_trig_uprobe_multi_push;
extern const struct bench bench_trig_uretprobe_multi_push;
extern const struct bench bench_trig_uprobe_multi_ret;
extern const struct bench bench_trig_uretprobe_multi_ret;
+#ifdef __x86_64__
+extern const struct bench bench_trig_uprobe_nop5;
+extern const struct bench bench_trig_uretprobe_nop5;
+extern const struct bench bench_trig_uprobe_multi_nop5;
+extern const struct bench bench_trig_uretprobe_multi_nop5;
+#endif
extern const struct bench bench_rb_libbpf;
extern const struct bench bench_rb_custom;
@@ -586,6 +592,12 @@ static const struct bench *benchs[] = {
&bench_trig_uretprobe_multi_push,
&bench_trig_uprobe_multi_ret,
&bench_trig_uretprobe_multi_ret,
+#ifdef __x86_64__
+ &bench_trig_uprobe_nop5,
+ &bench_trig_uretprobe_nop5,
+ &bench_trig_uprobe_multi_nop5,
+ &bench_trig_uretprobe_multi_nop5,
+#endif
/* ringbuf/perfbuf benchmarks */
&bench_rb_libbpf,
&bench_rb_custom,
diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c
index 32e9f194d449..82327657846e 100644
--- a/tools/testing/selftests/bpf/benchs/bench_trigger.c
+++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c
@@ -333,6 +333,20 @@ static void *uprobe_producer_ret(void *input)
return NULL;
}
+#ifdef __x86_64__
+__nocf_check __weak void uprobe_target_nop5(void)
+{
+ asm volatile (".byte 0x0f, 0x1f, 0x44, 0x00, 0x00");
+}
+
+static void *uprobe_producer_nop5(void *input)
+{
+ while (true)
+ uprobe_target_nop5();
+ return NULL;
+}
+#endif
+
static void usetup(bool use_retprobe, bool use_multi, void *target_addr)
{
size_t uprobe_offset;
@@ -448,6 +462,28 @@ static void uretprobe_multi_ret_setup(void)
usetup(true, true /* use_multi */, &uprobe_target_ret);
}
+#ifdef __x86_64__
+static void uprobe_nop5_setup(void)
+{
+ usetup(false, false /* !use_multi */, &uprobe_target_nop5);
+}
+
+static void uretprobe_nop5_setup(void)
+{
+ usetup(true, false /* !use_multi */, &uprobe_target_nop5);
+}
+
+static void uprobe_multi_nop5_setup(void)
+{
+ usetup(false, true /* use_multi */, &uprobe_target_nop5);
+}
+
+static void uretprobe_multi_nop5_setup(void)
+{
+ usetup(true, true /* use_multi */, &uprobe_target_nop5);
+}
+#endif
+
const struct bench bench_trig_syscall_count = {
.name = "trig-syscall-count",
.validate = trigger_validate,
@@ -506,3 +542,9 @@ BENCH_TRIG_USERMODE(uprobe_multi_ret, ret, "uprobe-multi-ret");
BENCH_TRIG_USERMODE(uretprobe_multi_nop, nop, "uretprobe-multi-nop");
BENCH_TRIG_USERMODE(uretprobe_multi_push, push, "uretprobe-multi-push");
BENCH_TRIG_USERMODE(uretprobe_multi_ret, ret, "uretprobe-multi-ret");
+#ifdef __x86_64__
+BENCH_TRIG_USERMODE(uprobe_nop5, nop5, "uprobe-nop5");
+BENCH_TRIG_USERMODE(uretprobe_nop5, nop5, "uretprobe-nop5");
+BENCH_TRIG_USERMODE(uprobe_multi_nop5, nop5, "uprobe-multi-nop5");
+BENCH_TRIG_USERMODE(uretprobe_multi_nop5, nop5, "uretprobe-multi-nop5");
+#endif
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh b/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
index af169f831f2f..03f55405484b 100755
--- a/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
+++ b/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
@@ -2,7 +2,7 @@
set -eufo pipefail
-for i in usermode-count syscall-count {uprobe,uretprobe}-{nop,push,ret}
+for i in usermode-count syscall-count {uprobe,uretprobe}-{nop,push,ret,nop5}
do
summary=$(sudo ./bench -w2 -d5 -a trig-$i | tail -n1 | cut -d'(' -f1 | cut -d' ' -f3-)
printf "%-15s: %s\n" $i "$summary"
--
2.49.0
* Re: [PATCHv2 perf/core 1/2] uprobes/x86: Add support to emulate nop instructions
2025-04-11 12:17 [PATCHv2 perf/core 1/2] uprobes/x86: Add support to emulate nop instructions Jiri Olsa
2025-04-11 12:17 ` [PATCHv2 perf/core 2/2] selftests/bpf: Add 5-byte nop uprobe trigger bench Jiri Olsa
@ 2025-04-11 12:48 ` Oleg Nesterov
2025-04-11 16:02 ` Andrii Nakryiko
2 siblings, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2025-04-11 12:48 UTC (permalink / raw)
To: Jiri Olsa
Cc: Peter Zijlstra, Ingo Molnar, Andrii Nakryiko, bpf, linux-kernel,
linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire
On 04/11, Jiri Olsa wrote:
>
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -840,6 +840,12 @@ static int branch_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
> insn_byte_t p;
> int i;
>
> + /* x86_nops[i]; same as jmp with .offs = 0 */
> + for (i = 1; i <= ASM_NOP_MAX; ++i) {
> + if (!memcmp(insn->kaddr, x86_nops[i], i))
> + goto setup;
> + }
Acked-by: Oleg Nesterov <oleg@redhat.com>
* Re: [PATCHv2 perf/core 1/2] uprobes/x86: Add support to emulate nop instructions
2025-04-11 12:17 [PATCHv2 perf/core 1/2] uprobes/x86: Add support to emulate nop instructions Jiri Olsa
2025-04-11 12:17 ` [PATCHv2 perf/core 2/2] selftests/bpf: Add 5-byte nop uprobe trigger bench Jiri Olsa
2025-04-11 12:48 ` [PATCHv2 perf/core 1/2] uprobes/x86: Add support to emulate nop instructions Oleg Nesterov
@ 2025-04-11 16:02 ` Andrii Nakryiko
2025-04-11 16:32 ` Oleg Nesterov
2 siblings, 1 reply; 6+ messages in thread
From: Andrii Nakryiko @ 2025-04-11 16:02 UTC (permalink / raw)
To: Jiri Olsa
Cc: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Andrii Nakryiko, bpf,
linux-kernel, linux-trace-kernel, x86, Song Liu, Yonghong Song,
John Fastabend, Hao Luo, Steven Rostedt, Masami Hiramatsu,
Alan Maguire
On Fri, Apr 11, 2025 at 5:18 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Add support for emulating all nop instructions when one of them is the
> original instruction at the uprobe site.
>
> This change speeds up uprobes placed on top of any nop instruction and
> prepares for the usdt probe optimization, which will be done on top of
> the nop5 instruction.
>
> With this change, a usdt probe on top of nop5 won't take a performance
> hit compared to a usdt probe on top of the standard nop instruction.
>
> Suggested-by: Oleg Nesterov <oleg@redhat.com>
> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
> v2 changes:
> - follow Andrii/Oleg's suggestion and emulate all the nops
>
> arch/x86/kernel/uprobes.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 9194695662b2..262960189a1c 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -840,6 +840,12 @@ static int branch_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
> insn_byte_t p;
> int i;
>
> + /* x86_nops[i]; same as jmp with .offs = 0 */
> + for (i = 1; i <= ASM_NOP_MAX; ++i) {
i <= ASM_NOP_MAX && i <= insn->length
?
otherwise what prevents us from reading past the actual instruction bytes?
or, actually, shouldn't we just check memcmp(x86_nops[insn->length])
if insn->length < ASM_NOP_MAX ?
> + if (!memcmp(insn->kaddr, x86_nops[i], i))
> + goto setup;
> + }
> +
> switch (opc1) {
> case 0xeb: /* jmp 8 */
> case 0xe9: /* jmp 32 */
> --
> 2.49.0
>
* Re: [PATCHv2 perf/core 1/2] uprobes/x86: Add support to emulate nop instructions
2025-04-11 16:02 ` Andrii Nakryiko
@ 2025-04-11 16:32 ` Oleg Nesterov
2025-04-13 19:05 ` Jiri Olsa
0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2025-04-11 16:32 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, Andrii Nakryiko, bpf,
linux-kernel, linux-trace-kernel, x86, Song Liu, Yonghong Song,
John Fastabend, Hao Luo, Steven Rostedt, Masami Hiramatsu,
Alan Maguire
On 04/11, Andrii Nakryiko wrote:
>
> > --- a/arch/x86/kernel/uprobes.c
> > +++ b/arch/x86/kernel/uprobes.c
> > @@ -840,6 +840,12 @@ static int branch_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
> > insn_byte_t p;
> > int i;
> >
> > + /* x86_nops[i]; same as jmp with .offs = 0 */
> > + for (i = 1; i <= ASM_NOP_MAX; ++i) {
>
> i <= ASM_NOP_MAX && i <= insn->length
>
> ?
>
> otherwise what prevents us from reading past the actual instruction bytes?
Well, copy_insn() just copies MAX_UINSN_BYTES into arch_uprobe.insn[].
If, say, the 1st 11 bytes of arch_uprobe.insn (or insn->kaddr) match
x86_nops[11] then insn->length must be 11, or insn_decode() is buggy?
> or, actually, shouldn't we just check memcmp(x86_nops[insn->length])
> if insn->length < ASM_NOP_MAX ?
Hmm... agreed.
Either way this check can't (doesn't even try to) detect, say,
"rep; BYTES_NOP5", so we do not care if insn->length == 6 in this case.
Good point!
Oleg.
* Re: [PATCHv2 perf/core 1/2] uprobes/x86: Add support to emulate nop instructions
2025-04-11 16:32 ` Oleg Nesterov
@ 2025-04-13 19:05 ` Jiri Olsa
0 siblings, 0 replies; 6+ messages in thread
From: Jiri Olsa @ 2025-04-13 19:05 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrii Nakryiko, Peter Zijlstra, Ingo Molnar, Andrii Nakryiko,
bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire
On Fri, Apr 11, 2025 at 06:32:43PM +0200, Oleg Nesterov wrote:
> On 04/11, Andrii Nakryiko wrote:
> >
> > > --- a/arch/x86/kernel/uprobes.c
> > > +++ b/arch/x86/kernel/uprobes.c
> > > @@ -840,6 +840,12 @@ static int branch_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
> > > insn_byte_t p;
> > > int i;
> > >
> > > + /* x86_nops[i]; same as jmp with .offs = 0 */
> > > + for (i = 1; i <= ASM_NOP_MAX; ++i) {
> >
> > i <= ASM_NOP_MAX && i <= insn->length
> >
> > ?
> >
> > otherwise what prevents us from reading past the actual instruction bytes?
>
> Well, copy_insn() just copies MAX_UINSN_BYTES into arch_uprobe.insn[].
> If, say, the 1st 11 bytes of arch_uprobe.insn (or insn->kaddr) match
> x86_nops[11] then insn->length must be 11, or insn_decode() is buggy?
>
> > or, actually, shouldn't we just check memcmp(x86_nops[insn->length])
> > if insn->length < ASM_NOP_MAX ?
nice, did not think of that
>
> Hmm... agreed.
>
> Either way this check can't (doesn't even try to) detect, say,
> "rep; BYTES_NOP5", so we do not care if insn->length == 6 in this case.
>
> Good point!
I'll run tests and send a formal patch for the change below
thanks,
jirka
---
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 9194695662b2..6d383839e839 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -840,6 +840,11 @@ static int branch_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
insn_byte_t p;
int i;
+ /* x86_nops[insn->length]; same as jmp with .offs = 0 */
+ if (insn->length <= ASM_NOP_MAX &&
+ !memcmp(insn->kaddr, x86_nops[insn->length], insn->length))
+ goto setup;
+
switch (opc1) {
case 0xeb: /* jmp 8 */
case 0xe9: /* jmp 32 */
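
For readers following along outside the kernel tree, the length-indexed check above can be sketched in plain userspace C. This is only an illustration under stated assumptions: sketch_nops[], SKETCH_NOP_MAX and sketch_is_nop() are hypothetical names, the table is a hand-written copy of the commonly recommended 1 to 8 byte nop encodings rather than the kernel's x86_nops[], and the length argument stands in for insn->length as reported by the decoder.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ASM_NOP_MAX; the kernel's value may differ. */
#define SKETCH_NOP_MAX 8

/* Hand-written copies of the usual 1..8 byte nop encodings (assumption:
 * these are not taken from the kernel's x86_nops[] table). */
static const unsigned char nop1[] = { 0x90 };
static const unsigned char nop2[] = { 0x66, 0x90 };
static const unsigned char nop3[] = { 0x0f, 0x1f, 0x00 };
static const unsigned char nop4[] = { 0x0f, 0x1f, 0x40, 0x00 };
static const unsigned char nop5[] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
static const unsigned char nop6[] = { 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00 };
static const unsigned char nop7[] = { 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00 };
static const unsigned char nop8[] = { 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 };

/* Indexed by instruction length; slot 0 is unused. */
static const unsigned char *const sketch_nops[SKETCH_NOP_MAX + 1] = {
	NULL, nop1, nop2, nop3, nop4, nop5, nop6, nop7, nop8,
};

/*
 * Return 1 if the 'length' bytes at 'bytes' are exactly the table's nop
 * of that length, 0 otherwise. One memcmp, selected by the decoded
 * length, replaces the loop over every table entry. The length == 0
 * guard is an addition for this sketch only.
 */
int sketch_is_nop(const unsigned char *bytes, size_t length)
{
	if (length == 0 || length > SKETCH_NOP_MAX)
		return 0;
	return memcmp(bytes, sketch_nops[length], length) == 0;
}
```

Called with the nop5 bytes from the bench target (0x0f, 0x1f, 0x44, 0x00, 0x00) and length 5, the sketch reports a match. A prefixed form such as "rep; BYTES_NOP5" has length 6 and is compared against the 6-byte entry, so it is not matched, which is the behavior Oleg describes above.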