public inbox for bpf@vger.kernel.org
* [PATCHv3 bpf-next 0/5] libbpf: Make optimized uprobes backward compatible
@ 2026-02-11  8:48 Jiri Olsa
  2026-02-11  8:48 ` [PATCHv3 bpf-next 1/5] selftests/bpf: Emit nop,nop5 instructions combo for x86_64 arch Jiri Olsa
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Jiri Olsa @ 2026-02-11  8:48 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, linux-kernel, Song Liu, Yonghong Song, John Fastabend

hi,
uprobes placed on top of a nop5 instruction can currently be optimized,
so an application can define USDT_NOP as nop5 and use the USDT macro
to define optimized usdt probes.

This works fine on new kernels, but it can carry a performance penalty
on older kernels, which support neither the optimization nor the
emulation of the nop5 instruction.

This patchset adds support to work around the performance penalty
on older kernels that do not support uprobe optimization, please
see the detailed description in patch 2.

v1: https://lore.kernel.org/bpf/20251117083551.517393-1-jolsa@kernel.org/
v2: https://lore.kernel.org/bpf/20260210133649.524292-1-jolsa@kernel.org/

v3 changes:
- fix __x86_64 define and other typos [CI]
- add missing '?' to usdt trigger program [CI]

v2 changes:
- after more investigation we realized there are some versions of
  bpftrace and stap that do not work with the solution suggested in
  version 1, so we decided to switch to the following solution:

  - change the USDT macro [1] to emit the nop,nop5 instructions combo
    by default
  - libbpf detects the nop,nop5 instructions combo for a USDT probe;
    if the combo is present and the uprobe syscall is detected, libbpf
    installs the usdt probe on top of the nop5 instruction to get it
    optimized

- added usdt trigger benchmarks [Andrii]
- several small fixes on uprobe syscall detection, tests and other places [Andrii]
- true usdt.h source [1] updated [Andrii]
- compile usdt_* objects unconditionally [Andrii]
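
The detection step described in the list above can be sketched in plain C. This is a minimal illustration, not the libbpf code: `usdt_probe_offset` and its parameters are hypothetical names, and the byte pattern is the nop,nop5 encoding used by the patches in this series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* nop (0x90) followed by the 5-byte nop (0f 1f 44 00 00) */
static const unsigned char nop_combo[6] = { 0x90, 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/*
 * Hypothetical helper: return the offset, relative to the USDT site,
 * at which the uprobe should be installed.
 */
static size_t usdt_probe_offset(const unsigned char *site, size_t len,
				bool has_uprobe_syscall)
{
	bool combo = len >= sizeof(nop_combo) &&
		     memcmp(site, nop_combo, sizeof(nop_combo)) == 0;

	/* Old kernel or plain nop: stay on the first byte. */
	if (!has_uprobe_syscall || !combo)
		return 0;
	/* New kernel and combo present: skip the 1-byte nop, land on nop5. */
	return 1;
}
```

With this shape, an older kernel still sees the probe on the single-byte nop, which is the backward-compatible case the series is after.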


thanks,
jirka


[1] https://github.com/libbpf/usdt
---
Jiri Olsa (5):
      selftests/bpf: Emit nop,nop5 instructions combo for x86_64 arch
      libbpf: Add uprobe syscall feature detection
      libbpf: Add support to detect nop,nop5 instructions combo for usdt probe
      selftests/bpf: Add test for checking correct nop of optimized usdt
      selftests/bpf: Add usdt trigger bench

 tools/lib/bpf/features.c                                | 24 ++++++++++++++++++++++++
 tools/lib/bpf/libbpf_internal.h                         |  2 ++
 tools/lib/bpf/usdt.c                                    | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++----
 tools/testing/selftests/bpf/.gitignore                  |  2 ++
 tools/testing/selftests/bpf/Makefile                    |  5 ++++-
 tools/testing/selftests/bpf/bench.c                     |  4 ++++
 tools/testing/selftests/bpf/benchs/bench_trigger.c      | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh |  2 +-
 tools/testing/selftests/bpf/prog_tests/usdt.c           | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/progs/test_usdt.c           |  9 +++++++++
 tools/testing/selftests/bpf/progs/trigger_bench.c       | 10 +++++++++-
 tools/testing/selftests/bpf/usdt.h                      |  2 ++
 tools/testing/selftests/bpf/usdt_1.c                    | 18 ++++++++++++++++++
 tools/testing/selftests/bpf/usdt_2.c                    | 16 ++++++++++++++++
 14 files changed, 287 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/usdt_1.c
 create mode 100644 tools/testing/selftests/bpf/usdt_2.c


* [PATCHv3 bpf-next 1/5] selftests/bpf: Emit nop,nop5 instructions combo for x86_64 arch
  2026-02-11  8:48 [PATCHv3 bpf-next 0/5] libbpf: Make optimized uprobes backward compatible Jiri Olsa
@ 2026-02-11  8:48 ` Jiri Olsa
  2026-02-11  8:48 ` [PATCHv3 bpf-next 2/5] libbpf: Add uprobe syscall feature detection Jiri Olsa
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Jiri Olsa @ 2026-02-11  8:48 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, linux-kernel, Song Liu, Yonghong Song, John Fastabend

Syncing latest usdt.h change [1].

Now that we have nop5 optimization support in the kernel, let's emit
nop,nop5 for usdt probes. We leave it up to the library to pick the
desired nop instruction.

[1] https://github.com/libbpf/usdt/commit/c9865d158984fb2b73e3cbbdcdfb4f583ad36a73
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/testing/selftests/bpf/usdt.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/bpf/usdt.h b/tools/testing/selftests/bpf/usdt.h
index 549d1f774810..c71e21df38b3 100644
--- a/tools/testing/selftests/bpf/usdt.h
+++ b/tools/testing/selftests/bpf/usdt.h
@@ -312,6 +312,8 @@ struct usdt_sema { volatile unsigned short active; };
 #ifndef USDT_NOP
 #if defined(__ia64__) || defined(__s390__) || defined(__s390x__)
 #define USDT_NOP			nop 0
+#elif defined(__x86_64__)
+#define USDT_NOP                       .byte 0x90, 0x0f, 0x1f, 0x44, 0x00, 0x0 /* nop, nop5 */
 #else
 #define USDT_NOP			nop
 #endif
-- 
2.53.0
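
For reference, the six bytes in the new USDT_NOP definition are a 1-byte nop (0x90) followed by the 5-byte nop (0f 1f 44 00 00). The sketch below only illustrates that they execute as no-ops; it assumes a GCC or Clang toolchain, and `run_nop_combo` is a made-up name that is not part of usdt.h.

```c
#include <stddef.h>

/*
 * Illustration only: execute the same six bytes that USDT_NOP expands to
 * on x86_64 and return their total length. On other architectures the
 * function emits nothing and returns 0.
 */
static size_t run_nop_combo(void)
{
#if defined(__x86_64__)
	__asm__ volatile(".byte 0x90, 0x0f, 0x1f, 0x44, 0x00, 0x00");
	return 6;
#else
	return 0;
#endif
}
```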



* [PATCHv3 bpf-next 2/5] libbpf: Add uprobe syscall feature detection
  2026-02-11  8:48 [PATCHv3 bpf-next 0/5] libbpf: Make optimized uprobes backward compatible Jiri Olsa
  2026-02-11  8:48 ` [PATCHv3 bpf-next 1/5] selftests/bpf: Emit nop,nop5 instructions combo for x86_64 arch Jiri Olsa
@ 2026-02-11  8:48 ` Jiri Olsa
  2026-02-11 21:45   ` Andrii Nakryiko
  2026-02-11  8:48 ` [PATCHv3 bpf-next 3/5] libbpf: Add support to detect nop,nop5 instructions combo for usdt probe Jiri Olsa
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Jiri Olsa @ 2026-02-11  8:48 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, linux-kernel, Song Liu, Yonghong Song, John Fastabend

Adding uprobe syscall feature detection that will be used
in the following changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/lib/bpf/features.c        | 24 ++++++++++++++++++++++++
 tools/lib/bpf/libbpf_internal.h |  2 ++
 2 files changed, 26 insertions(+)

diff --git a/tools/lib/bpf/features.c b/tools/lib/bpf/features.c
index b842b83e2480..04a225c7f2c0 100644
--- a/tools/lib/bpf/features.c
+++ b/tools/lib/bpf/features.c
@@ -506,6 +506,27 @@ static int probe_kern_arg_ctx_tag(int token_fd)
 	return probe_fd(prog_fd);
 }
 
+#ifdef __x86_64__
+
+#ifndef __NR_uprobe
+#define __NR_uprobe 336
+#endif
+
+static int probe_uprobe_syscall(int token_fd)
+{
+	/*
+	 * If kernel supports uprobe() syscall, it will return -ENXIO when called
+	 * from the outside of a kernel-generated uprobe trampoline.
+	 */
+	return syscall(__NR_uprobe) < 0 && errno == ENXIO;
+}
+#else
+static int probe_uprobe_syscall(int token_fd)
+{
+	return 0;
+}
+#endif
+
 typedef int (*feature_probe_fn)(int /* token_fd */);
 
 static struct kern_feature_cache feature_cache;
@@ -581,6 +602,9 @@ static struct kern_feature_desc {
 	[FEAT_BTF_QMARK_DATASEC] = {
 		"BTF DATASEC names starting from '?'", probe_kern_btf_qmark_datasec,
 	},
+	[FEAT_UPROBE_SYSCALL] = {
+		"Kernel supports uprobe syscall", probe_uprobe_syscall,
+	},
 };
 
 bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_id)
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index fc59b21b51b5..69aa61c038a9 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -392,6 +392,8 @@ enum kern_feature_id {
 	FEAT_ARG_CTX_TAG,
 	/* Kernel supports '?' at the front of datasec names */
 	FEAT_BTF_QMARK_DATASEC,
+	/* Kernel supports uprobe syscall */
+	FEAT_UPROBE_SYSCALL,
 	__FEAT_CNT,
 };
 
-- 
2.53.0
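
The same check can be tried outside libbpf with a small standalone helper. This is a sketch that mirrors the probe in the patch above; `kernel_has_uprobe_syscall` is a hypothetical name, and the syscall number 336 is the x86_64 value the patch falls back to when the headers do not define `__NR_uprobe`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>
#include <sys/syscall.h>

#if defined(__x86_64__) && !defined(__NR_uprobe)
#define __NR_uprobe 336	/* x86_64 number used by the patch */
#endif

/*
 * Returns true when the kernel implements the uprobe() syscall.
 * Called from outside a uprobe trampoline, a supporting kernel is
 * expected to fail with ENXIO; an older kernel fails with ENOSYS.
 */
static bool kernel_has_uprobe_syscall(void)
{
#if defined(__x86_64__)
	errno = 0;
	return syscall(__NR_uprobe) < 0 && errno == ENXIO;
#else
	return false;
#endif
}
```

The helper has no side effects beyond the failed syscall, so calling it repeatedly should give the same answer on a given kernel.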



* [PATCHv3 bpf-next 3/5] libbpf: Add support to detect nop,nop5 instructions combo for usdt probe
  2026-02-11  8:48 [PATCHv3 bpf-next 0/5] libbpf: Make optimized uprobes backward compatible Jiri Olsa
  2026-02-11  8:48 ` [PATCHv3 bpf-next 1/5] selftests/bpf: Emit nop,nop5 instructions combo for x86_64 arch Jiri Olsa
  2026-02-11  8:48 ` [PATCHv3 bpf-next 2/5] libbpf: Add uprobe syscall feature detection Jiri Olsa
@ 2026-02-11  8:48 ` Jiri Olsa
  2026-02-11 21:45   ` Andrii Nakryiko
  2026-02-11  8:48 ` [PATCHv3 bpf-next 4/5] selftests/bpf: Add test for checking correct nop of optimized usdt Jiri Olsa
  2026-02-11  8:48 ` [PATCHv3 bpf-next 5/5] selftests/bpf: Add usdt trigger bench Jiri Olsa
  4 siblings, 1 reply; 15+ messages in thread
From: Jiri Olsa @ 2026-02-11  8:48 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, linux-kernel, Song Liu, Yonghong Song, John Fastabend

Adding support to detect the nop,nop5 instructions combo for a usdt
probe by checking whether the probe's nop is followed by a nop5
instruction.

When the nop,nop5 combo is detected together with the uprobe syscall,
we can place the probe on top of nop5 and get it optimized.

[1] https://github.com/libbpf/usdt
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/lib/bpf/usdt.c | 55 ++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 51 insertions(+), 4 deletions(-)

diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
index d1524f6f54ae..4e5f70bb4c31 100644
--- a/tools/lib/bpf/usdt.c
+++ b/tools/lib/bpf/usdt.c
@@ -262,6 +262,7 @@ struct usdt_manager {
 	bool has_bpf_cookie;
 	bool has_sema_refcnt;
 	bool has_uprobe_multi;
+	bool has_uprobe_syscall;
 };
 
 struct usdt_manager *usdt_manager_new(struct bpf_object *obj)
@@ -301,6 +302,13 @@ struct usdt_manager *usdt_manager_new(struct bpf_object *obj)
 	 * usdt probes.
 	 */
 	man->has_uprobe_multi = kernel_supports(obj, FEAT_UPROBE_MULTI_LINK);
+
+	/*
+	 * Detect kernel support for uprobe() syscall, its presence means we can
+	 * take advantage of faster nop5 uprobe handling.
+	 * Added in: 56101b69c919 ("uprobes/x86: Add uprobe syscall to speed up uprobe")
+	 */
+	man->has_uprobe_syscall = kernel_supports(obj, FEAT_UPROBE_SYSCALL);
 	return man;
 }
 
@@ -585,13 +593,42 @@ static int parse_usdt_note(GElf_Nhdr *nhdr, const char *data, size_t name_off,
 
 static int parse_usdt_spec(struct usdt_spec *spec, const struct usdt_note *note, __u64 usdt_cookie);
 
-static int collect_usdt_targets(struct usdt_manager *man, Elf *elf, const char *path, pid_t pid,
-				const char *usdt_provider, const char *usdt_name, __u64 usdt_cookie,
-				struct usdt_target **out_targets, size_t *out_target_cnt)
+#if defined(__x86_64__)
+static bool has_nop_combo(int fd, long off)
+{
+	static unsigned char nop_combo[6] = {
+		0x90, 0x0f, 0x1f, 0x44, 0x00, 0x00 /* nop,nop5 */
+	};
+	unsigned char buf[6] = {};
+
+	/*
+	 * We are using file descriptor that backs Elf object,
+	 * let's dup it to be on the safe side.
+	 */
+	fd = dup(fd);
+	if (fd < 0)
+		return false;
+	if (lseek(fd, off, SEEK_SET) == off)
+		read(fd, buf, 6);
+	close(fd);
+	return memcmp(buf, nop_combo, 6) == 0;
+}
+#else
+static bool has_nop_combo(int fd, long off)
+{
+	return false;
+}
+#endif
+
+static int collect_usdt_targets(struct usdt_manager *man, struct elf_fd *elf_fd, const char *path,
+				pid_t pid, const char *usdt_provider, const char *usdt_name,
+				__u64 usdt_cookie, struct usdt_target **out_targets,
+				size_t *out_target_cnt)
 {
 	size_t off, name_off, desc_off, seg_cnt = 0, vma_seg_cnt = 0, target_cnt = 0;
 	struct elf_seg *segs = NULL, *vma_segs = NULL;
 	struct usdt_target *targets = NULL, *target;
+	Elf *elf = elf_fd->elf;
 	long base_addr = 0;
 	Elf_Scn *notes_scn, *base_scn;
 	GElf_Shdr base_shdr, notes_shdr;
@@ -784,6 +821,16 @@ static int collect_usdt_targets(struct usdt_manager *man, Elf *elf, const char *
 		target = &targets[target_cnt];
 		memset(target, 0, sizeof(*target));
 
+		/*
+		 * We have uprobe syscall and usdt with nop,nop5 instructions combo,
+		 * so we can place the uprobe directly on nop5 (+1) and get this probe
+		 * optimized.
+		 */
+		if (man->has_uprobe_syscall && has_nop_combo(elf_fd->fd, usdt_rel_ip)) {
+			usdt_abs_ip++;
+			usdt_rel_ip++;
+		}
+
 		target->abs_ip = usdt_abs_ip;
 		target->rel_ip = usdt_rel_ip;
 		target->sema_off = usdt_sema_off;
@@ -998,7 +1045,7 @@ struct bpf_link *usdt_manager_attach_usdt(struct usdt_manager *man, const struct
 	/* discover USDT in given binary, optionally limiting
 	 * activations to a given PID, if pid > 0
 	 */
-	err = collect_usdt_targets(man, elf_fd.elf, path, pid, usdt_provider, usdt_name,
+	err = collect_usdt_targets(man, &elf_fd, path, pid, usdt_provider, usdt_name,
 				   usdt_cookie, &targets, &target_cnt);
 	if (err <= 0) {
 		err = (err == 0) ? -ENOENT : err;
-- 
2.53.0
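
A variation on the combo check in the patch above, shown as a sketch under different assumptions: it uses pread() instead of dup() plus lseek() plus read(), so the descriptor's file offset is never moved, and it treats a short read as "no combo". `file_has_nop_combo` is a hypothetical name and is not what the patch adds.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static bool file_has_nop_combo(int fd, off_t off)
{
	static const unsigned char nop_combo[6] = {
		0x90, 0x0f, 0x1f, 0x44, 0x00, 0x00 /* nop,nop5 */
	};
	unsigned char buf[sizeof(nop_combo)] = { 0 };

	/* pread() leaves the descriptor's file offset untouched. */
	if (pread(fd, buf, sizeof(buf), off) != (ssize_t)sizeof(buf))
		return false;
	return memcmp(buf, nop_combo, sizeof(buf)) == 0;
}
```

Whether this is preferable to the dup() approach depends on how the ELF library uses the descriptor; the sketch makes no claim about that.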



* [PATCHv3 bpf-next 4/5] selftests/bpf: Add test for checking correct nop of optimized usdt
  2026-02-11  8:48 [PATCHv3 bpf-next 0/5] libbpf: Make optimized uprobes backward compatible Jiri Olsa
                   ` (2 preceding siblings ...)
  2026-02-11  8:48 ` [PATCHv3 bpf-next 3/5] libbpf: Add support to detect nop,nop5 instructions combo for usdt probe Jiri Olsa
@ 2026-02-11  8:48 ` Jiri Olsa
  2026-02-11  9:13   ` bot+bpf-ci
  2026-02-11 21:45   ` Andrii Nakryiko
  2026-02-11  8:48 ` [PATCHv3 bpf-next 5/5] selftests/bpf: Add usdt trigger bench Jiri Olsa
  4 siblings, 2 replies; 15+ messages in thread
From: Jiri Olsa @ 2026-02-11  8:48 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, linux-kernel, Song Liu, Yonghong Song, John Fastabend

Adding a test that attaches a bpf program on a usdt probe in 2 scenarios:

- attach the program on top of usdt_1, which is a single nop instruction,
  so the probe stays on the nop instruction and is not optimized.

- attach the program on top of usdt_2, which is a probe defined on top
  of the nop,nop5 combo, so the probe is placed on top of nop5 and
  is optimized.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/testing/selftests/bpf/.gitignore        |  2 +
 tools/testing/selftests/bpf/Makefile          |  3 +-
 tools/testing/selftests/bpf/prog_tests/usdt.c | 85 +++++++++++++++++++
 tools/testing/selftests/bpf/progs/test_usdt.c |  9 ++
 tools/testing/selftests/bpf/usdt_1.c          | 18 ++++
 tools/testing/selftests/bpf/usdt_2.c          | 16 ++++
 6 files changed, 132 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/usdt_1.c
 create mode 100644 tools/testing/selftests/bpf/usdt_2.c

diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index a3ea98211ea6..bfdc5518ecc8 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -47,3 +47,5 @@ verification_cert.h
 *.BTF
 *.BTF_ids
 *.BTF.base
+usdt_1
+usdt_2
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index c6bf4dfb1495..306949162a5b 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -749,7 +749,8 @@ TRUNNER_EXTRA_SOURCES := test_progs.c		\
 			 $(VERIFY_SIG_HDR)		\
 			 flow_dissector_load.h	\
 			 ip_check_defrag_frags.h	\
-			 bpftool_helpers.c
+			 bpftool_helpers.c	\
+			 usdt_1.c usdt_2.c
 TRUNNER_LIB_SOURCES := find_bit.c
 TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read				\
 		       $(OUTPUT)/liburandom_read.so			\
diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
index f4be5269fa90..6daed3dfa75b 100644
--- a/tools/testing/selftests/bpf/prog_tests/usdt.c
+++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
@@ -247,6 +247,89 @@ static void subtest_basic_usdt(bool optimized)
 #undef TRIGGER
 }
 
+#ifdef __x86_64__
+extern void usdt_1(void);
+extern void usdt_2(void);
+
+/* nop, nop5 */
+static unsigned char nop1_nop5_combo[6] = { 0x90, 0x0f, 0x1f, 0x44, 0x00, 0x00 };
+static unsigned char nop1[6] = { 0x90 };
+
+static void *find_instr(void *fn, unsigned char *instr, size_t cnt)
+{
+	int i;
+
+	for (i = 0; i < 10; i++) {
+		if (!memcmp(instr, fn + i, cnt))
+			return fn + i;
+	}
+	return NULL;
+}
+
+static void subtest_optimized_attach(void)
+{
+	struct test_usdt *skel;
+	__u8 *addr_1, *addr_2;
+
+	/* usdt_1 USDT probe has single nop instruction */
+	addr_1 = find_instr(usdt_1, nop1_nop5_combo, 6);
+	if (!ASSERT_NULL(addr_1, "usdt_1_find_nop1_nop5_combo"))
+		return;
+
+	addr_1 = find_instr(usdt_1, nop1, 1);
+	if (!ASSERT_OK_PTR(addr_1, "usdt_1_find_nop1"))
+		return;
+
+	/* usdt_1 USDT probe has nop,nop5 instructions combo */
+	addr_2 = find_instr(usdt_2, nop1_nop5_combo, 6);
+	if (!ASSERT_OK_PTR(addr_2, "usdt_2_find_nop1_nop5_combo"))
+		return;
+
+	skel = test_usdt__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_usdt__open_and_load"))
+		return;
+
+	/*
+	 * Attach program on top of usdt_1 which is single nop probe,
+	 * so the probe won't get optimized.
+	 */
+	skel->links.usdt_executed = bpf_program__attach_usdt(skel->progs.usdt_executed,
+						     0 /*self*/, "/proc/self/exe",
+						     "optimized_attach", "usdt_1", NULL);
+	if (!ASSERT_OK_PTR(skel->links.usdt_executed, "bpf_program__attach_usdt"))
+		goto cleanup;
+
+	usdt_1();
+	usdt_1();
+
+	/* nop is on addr_1 address */
+	ASSERT_EQ(*addr_1, 0xcc, "int3");
+	ASSERT_EQ(skel->bss->executed, 2, "executed");
+
+	bpf_link__destroy(skel->links.usdt_executed);
+
+	/*
+	 * Attach program on top of usdt_2 which is probe defined on top
+	 * of nop1,nop5 combo, so the probe gets optimized on top of nop5.
+	 */
+	skel->links.usdt_executed = bpf_program__attach_usdt(skel->progs.usdt_executed,
+						     0 /*self*/, "/proc/self/exe",
+						     "optimized_attach", "usdt_2", NULL);
+	if (!ASSERT_OK_PTR(skel->links.usdt_executed, "bpf_program__attach_usdt"))
+		goto cleanup;
+
+	usdt_2();
+	usdt_2();
+
+	/* nop5 is on addr_2 + 1 address */
+	ASSERT_EQ(*(addr_2 + 1), 0xe8, "call");
+	ASSERT_EQ(skel->bss->executed, 4, "executed");
+
+cleanup:
+	test_usdt__destroy(skel);
+}
+#endif
+
 unsigned short test_usdt_100_semaphore SEC(".probes");
 unsigned short test_usdt_300_semaphore SEC(".probes");
 unsigned short test_usdt_400_semaphore SEC(".probes");
@@ -516,6 +599,8 @@ void test_usdt(void)
 #ifdef __x86_64__
 	if (test__start_subtest("basic_optimized"))
 		subtest_basic_usdt(true);
+	if (test__start_subtest("optimized_attach"))
+		subtest_optimized_attach();
 #endif
 	if (test__start_subtest("multispec"))
 		subtest_multispec_usdt();
diff --git a/tools/testing/selftests/bpf/progs/test_usdt.c b/tools/testing/selftests/bpf/progs/test_usdt.c
index a78c87537b07..6911868cdf67 100644
--- a/tools/testing/selftests/bpf/progs/test_usdt.c
+++ b/tools/testing/selftests/bpf/progs/test_usdt.c
@@ -138,4 +138,13 @@ int usdt_sib(struct pt_regs *ctx)
 	return 0;
 }
 
+int executed;
+
+SEC("usdt")
+int usdt_executed(struct pt_regs *ctx)
+{
+	executed++;
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/usdt_1.c b/tools/testing/selftests/bpf/usdt_1.c
new file mode 100644
index 000000000000..4f06e8bcf58b
--- /dev/null
+++ b/tools/testing/selftests/bpf/usdt_1.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#if defined(__x86_64__)
+
+/*
+ * Include usdt.h with defined USDT_NOP macro to use single
+ * nop instruction.
+ */
+#define USDT_NOP .byte 0x90
+#include "usdt.h"
+
+__attribute__((aligned(16)))
+void usdt_1(void)
+{
+	USDT(optimized_attach, usdt_1);
+}
+
+#endif
diff --git a/tools/testing/selftests/bpf/usdt_2.c b/tools/testing/selftests/bpf/usdt_2.c
new file mode 100644
index 000000000000..789883aaca4c
--- /dev/null
+++ b/tools/testing/selftests/bpf/usdt_2.c
@@ -0,0 +1,16 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#if defined(__x86_64__)
+
+/*
+ * Include usdt.h with default nop,nop5 instructions combo.
+ */
+#include "usdt.h"
+
+__attribute__((aligned(16)))
+void usdt_2(void)
+{
+	USDT(optimized_attach, usdt_2);
+}
+
+#endif
-- 
2.53.0
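
The two assertions in the test above look at the first byte at the probe address: 0xcc (int3) for the regular uprobe and 0xe8 (call) for the optimized one. A small sketch of that classification follows; the enum and function names are made up for illustration and do not exist in the selftests.

```c
#include <stdint.h>

enum probe_state {
	PROBE_NOT_ATTACHED,	/* original nop byte still in place */
	PROBE_INT3,		/* 0xcc: breakpoint-based uprobe */
	PROBE_CALL,		/* 0xe8: call instruction, optimized uprobe */
	PROBE_UNKNOWN,
};

static enum probe_state classify_probe_byte(uint8_t byte)
{
	switch (byte) {
	case 0x90:	/* 1-byte nop */
	case 0x0f:	/* first byte of nop5 */
		return PROBE_NOT_ATTACHED;
	case 0xcc:
		return PROBE_INT3;
	case 0xe8:
		return PROBE_CALL;
	default:
		return PROBE_UNKNOWN;
	}
}
```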



* [PATCHv3 bpf-next 5/5] selftests/bpf: Add usdt trigger bench
  2026-02-11  8:48 [PATCHv3 bpf-next 0/5] libbpf: Make optimized uprobes backward compatible Jiri Olsa
                   ` (3 preceding siblings ...)
  2026-02-11  8:48 ` [PATCHv3 bpf-next 4/5] selftests/bpf: Add test for checking correct nop of optimized usdt Jiri Olsa
@ 2026-02-11  8:48 ` Jiri Olsa
  2026-02-11 21:45   ` Andrii Nakryiko
  4 siblings, 1 reply; 15+ messages in thread
From: Jiri Olsa @ 2026-02-11  8:48 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, linux-kernel, Song Liu, Yonghong Song, John Fastabend

Adding usdt trigger benchmarks:
 trig-usdt_nop - usdt on top of nop1 instruction
 trig-usdt_nop_combo - usdt on top of nop1/nop5 combo

Adding it to benchs/run_bench_uprobes.sh script.

Example run on x86_64 kernel with uprobe syscall:

  # ./benchs/run_bench_uprobes.sh
  usermode-count :  152.507 ± 0.098M/s
  syscall-count  :   14.309 ± 0.093M/s
  uprobe-nop     :    3.190 ± 0.012M/s
  uprobe-push    :    3.057 ± 0.004M/s
  uprobe-ret     :    1.095 ± 0.009M/s
  uprobe-nop5    :    7.305 ± 0.034M/s
  uretprobe-nop  :    2.175 ± 0.005M/s
  uretprobe-push :    2.109 ± 0.003M/s
  uretprobe-ret  :    0.945 ± 0.002M/s
  uretprobe-nop5 :    3.530 ± 0.006M/s
  usdt_nop       :    3.235 ± 0.008M/s   <-- added
  usdt_nop_combo :    7.511 ± 0.045M/s   <-- added

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/testing/selftests/bpf/Makefile          |  2 +
 tools/testing/selftests/bpf/bench.c           |  4 ++
 .../selftests/bpf/benchs/bench_trigger.c      | 60 +++++++++++++++++++
 .../selftests/bpf/benchs/run_bench_uprobes.sh |  2 +-
 .../selftests/bpf/progs/trigger_bench.c       | 10 +++-
 5 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 306949162a5b..9b2ca0028322 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -874,6 +874,8 @@ $(OUTPUT)/bench: $(OUTPUT)/bench.o \
 		 $(OUTPUT)/bench_bpf_crypto.o \
 		 $(OUTPUT)/bench_sockmap.o \
 		 $(OUTPUT)/bench_lpm_trie_map.o \
+		 $(OUTPUT)/usdt_1.o \
+		 $(OUTPUT)/usdt_2.o \
 		 #
 	$(call msg,BINARY,,$@)
 	$(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@
diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index 8368bd3a0665..4dacb87e464e 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -541,6 +541,8 @@ extern const struct bench bench_trig_uprobe_nop5;
 extern const struct bench bench_trig_uretprobe_nop5;
 extern const struct bench bench_trig_uprobe_multi_nop5;
 extern const struct bench bench_trig_uretprobe_multi_nop5;
+extern const struct bench bench_trig_usdt_nop;
+extern const struct bench bench_trig_usdt_nop_combo;
 #endif
 
 extern const struct bench bench_rb_libbpf;
@@ -617,6 +619,8 @@ static const struct bench *benchs[] = {
 	&bench_trig_uretprobe_nop5,
 	&bench_trig_uprobe_multi_nop5,
 	&bench_trig_uretprobe_multi_nop5,
+	&bench_trig_usdt_nop,
+	&bench_trig_usdt_nop_combo,
 #endif
 	/* ringbuf/perfbuf benchmarks */
 	&bench_rb_libbpf,
diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c
index aeec9edd3851..b4b03fe1f61d 100644
--- a/tools/testing/selftests/bpf/benchs/bench_trigger.c
+++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c
@@ -405,6 +405,23 @@ static void *uprobe_producer_nop5(void *input)
 		uprobe_target_nop5();
 	return NULL;
 }
+
+void usdt_1(void);
+void usdt_2(void);
+
+static void *uprobe_producer_usdt_nop(void *input)
+{
+	while (true)
+		usdt_1();
+	return NULL;
+}
+
+static void *uprobe_producer_usdt_nop_combo(void *input)
+{
+	while (true)
+		usdt_2();
+	return NULL;
+}
 #endif
 
 static void usetup(bool use_retprobe, bool use_multi, void *target_addr)
@@ -542,6 +559,47 @@ static void uretprobe_multi_nop5_setup(void)
 {
 	usetup(true, true /* use_multi */, &uprobe_target_nop5);
 }
+
+static void usdt_setup(const char *name)
+{
+	struct bpf_link *link;
+	int err;
+
+	setup_libbpf();
+
+	ctx.skel = trigger_bench__open();
+	if (!ctx.skel) {
+		fprintf(stderr, "failed to open skeleton\n");
+		exit(1);
+	}
+
+	bpf_program__set_autoload(ctx.skel->progs.bench_trigger_usdt, true);
+
+	err = trigger_bench__load(ctx.skel);
+	if (err) {
+		fprintf(stderr, "failed to load skeleton\n");
+		exit(1);
+	}
+
+	link = bpf_program__attach_usdt(ctx.skel->progs.bench_trigger_usdt,
+					0 /*self*/, "/proc/self/exe",
+					"optimized_attach", name, NULL);
+	if (libbpf_get_error(link)) {
+		fprintf(stderr, "failed to attach optimized_attach:%s usdt probe\n", name);
+		exit(1);
+	}
+	ctx.skel->links.bench_trigger_usdt = link;
+}
+
+static void usdt_nop_setup(void)
+{
+	usdt_setup("usdt_1");
+}
+
+static void usdt_nop_combo_setup(void)
+{
+	usdt_setup("usdt_2");
+}
 #endif
 
 const struct bench bench_trig_syscall_count = {
@@ -609,4 +667,6 @@ BENCH_TRIG_USERMODE(uprobe_nop5, nop5, "uprobe-nop5");
 BENCH_TRIG_USERMODE(uretprobe_nop5, nop5, "uretprobe-nop5");
 BENCH_TRIG_USERMODE(uprobe_multi_nop5, nop5, "uprobe-multi-nop5");
 BENCH_TRIG_USERMODE(uretprobe_multi_nop5, nop5, "uretprobe-multi-nop5");
+BENCH_TRIG_USERMODE(usdt_nop, usdt_nop, "usdt_nop");
+BENCH_TRIG_USERMODE(usdt_nop_combo, usdt_nop_combo, "usdt_nop_combo");
 #endif
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh b/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
index 03f55405484b..3656676d99d2 100755
--- a/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
+++ b/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
@@ -2,7 +2,7 @@
 
 set -eufo pipefail
 
-for i in usermode-count syscall-count {uprobe,uretprobe}-{nop,push,ret,nop5}
+for i in usermode-count syscall-count {uprobe,uretprobe}-{nop,push,ret,nop5} usdt_nop usdt_nop_combo
 do
 	summary=$(sudo ./bench -w2 -d5 -a trig-$i | tail -n1 | cut -d'(' -f1 | cut -d' ' -f3-)
 	printf "%-15s: %s\n" $i "$summary"
diff --git a/tools/testing/selftests/bpf/progs/trigger_bench.c b/tools/testing/selftests/bpf/progs/trigger_bench.c
index 4ea0422d1042..3225b4aee8ff 100644
--- a/tools/testing/selftests/bpf/progs/trigger_bench.c
+++ b/tools/testing/selftests/bpf/progs/trigger_bench.c
@@ -1,10 +1,11 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2020 Facebook
-#include <linux/bpf.h>
+#include "vmlinux.h"
 #include <asm/unistd.h>
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_tracing.h>
 #include "bpf_misc.h"
+#include "bpf/usdt.bpf.h"
 
 char _license[] SEC("license") = "GPL";
 
@@ -180,3 +181,10 @@ int bench_trigger_rawtp(void *ctx)
 	handle(ctx);
 	return 0;
 }
+
+SEC("?usdt")
+int bench_trigger_usdt(void *ctx)
+{
+	inc_counter();
+	return 0;
+}
-- 
2.53.0
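
To put the two added benchmark rows from the example run in the commit message side by side, the ratio can be computed directly. `speedup` is a hypothetical helper; the inputs are the usdt_nop and usdt_nop_combo figures quoted above.

```c
/* Ratio of two throughput figures, both in M/s. */
static double speedup(double optimized, double baseline)
{
	return optimized / baseline;
}
```

With the quoted numbers, `speedup(7.511, 3.235)` comes out at roughly 2.3, so in that run the nop,nop5 combo was a bit more than twice as fast as the single nop.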



* Re: [PATCHv3 bpf-next 4/5] selftests/bpf: Add test for checking correct nop of optimized usdt
  2026-02-11  8:48 ` [PATCHv3 bpf-next 4/5] selftests/bpf: Add test for checking correct nop of optimized usdt Jiri Olsa
@ 2026-02-11  9:13   ` bot+bpf-ci
  2026-02-11 21:45   ` Andrii Nakryiko
  1 sibling, 0 replies; 15+ messages in thread
From: bot+bpf-ci @ 2026-02-11  9:13 UTC (permalink / raw)
  To: jolsa, andrii
  Cc: bpf, linux-kernel, songliubraving, yhs, john.fastabend, ast,
	andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
	ihor.solodrai


> diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
> --- a/tools/testing/selftests/bpf/prog_tests/usdt.c
> +++ b/tools/testing/selftests/bpf/prog_tests/usdt.c

[ ... ]

> +static void subtest_optimized_attach(void)
> +{
> +	struct test_usdt *skel;
> +	__u8 *addr_1, *addr_2;
> +
> +	/* usdt_1 USDT probe has single nop instruction */
> +	addr_1 = find_instr(usdt_1, nop1_nop5_combo, 6);
> +	if (!ASSERT_NULL(addr_1, "usdt_1_find_nop1_nop5_combo"))
> +		return;
> +
> +	addr_1 = find_instr(usdt_1, nop1, 1);
> +	if (!ASSERT_OK_PTR(addr_1, "usdt_1_find_nop1"))
> +		return;
> +
> +	/* usdt_1 USDT probe has nop,nop5 instructions combo */
> +	addr_2 = find_instr(usdt_2, nop1_nop5_combo, 6);

This comment says "usdt_1" but the code operates on "usdt_2". Should this
say "usdt_2 USDT probe has nop,nop5 instructions combo" to match the code?


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/21898612624

AI-authorship-score: low
AI-authorship-explanation: The code follows standard kernel selftest patterns with natural commit message style, and the copy-paste comment error is a characteristically human mistake.
issues-found: 1
issue-severity-score: low
issue-severity-explanation: Comment/code mismatch in test file where comment says usdt_1 but code operates on usdt_2; does not affect test functionality.


* Re: [PATCHv3 bpf-next 2/5] libbpf: Add uprobe syscall feature detection
  2026-02-11  8:48 ` [PATCHv3 bpf-next 2/5] libbpf: Add uprobe syscall feature detection Jiri Olsa
@ 2026-02-11 21:45   ` Andrii Nakryiko
  2026-02-12 14:08     ` Jiri Olsa
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2026-02-11 21:45 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, bpf, linux-kernel, Song Liu, Yonghong Song,
	John Fastabend

On Wed, Feb 11, 2026 at 12:49 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding uprobe syscall feature detection that will be used
> in following changes.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/lib/bpf/features.c        | 24 ++++++++++++++++++++++++
>  tools/lib/bpf/libbpf_internal.h |  2 ++
>  2 files changed, 26 insertions(+)
>
> diff --git a/tools/lib/bpf/features.c b/tools/lib/bpf/features.c
> index b842b83e2480..04a225c7f2c0 100644
> --- a/tools/lib/bpf/features.c
> +++ b/tools/lib/bpf/features.c
> @@ -506,6 +506,27 @@ static int probe_kern_arg_ctx_tag(int token_fd)
>         return probe_fd(prog_fd);
>  }
>
> +#ifdef __x86_64__
> +
> +#ifndef __NR_uprobe
> +#define __NR_uprobe 336
> +#endif
> +
> +static int probe_uprobe_syscall(int token_fd)
> +{
> +       /*
> +        * If kernel supports uprobe() syscall, it will return -ENXIO when called
> +        * from the outside of a kernel-generated uprobe trampoline.
> +        */
> +       return syscall(__NR_uprobe) < 0 && errno == ENXIO;
> +}
> +#else
> +static int probe_uprobe_syscall(int token_fd)
> +{
> +       return 0;
> +}
> +#endif
> +
>  typedef int (*feature_probe_fn)(int /* token_fd */);
>
>  static struct kern_feature_cache feature_cache;
> @@ -581,6 +602,9 @@ static struct kern_feature_desc {
>         [FEAT_BTF_QMARK_DATASEC] = {
>                 "BTF DATASEC names starting from '?'", probe_kern_btf_qmark_datasec,
>         },
> +       [FEAT_UPROBE_SYSCALL] = {
> +               "Kernel supports uprobe syscall", probe_uprobe_syscall,

this will conflict with the libbpf arena relocation fix that landed in
bpf, so let's wait until the trees merge.

But also, this description goes into the middle of a sentence, so
start it with a lower-case letter

pw-bot: cr



> +       },
>  };
>
>  bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_id)
> diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
> index fc59b21b51b5..69aa61c038a9 100644
> --- a/tools/lib/bpf/libbpf_internal.h
> +++ b/tools/lib/bpf/libbpf_internal.h
> @@ -392,6 +392,8 @@ enum kern_feature_id {
>         FEAT_ARG_CTX_TAG,
>         /* Kernel supports '?' at the front of datasec names */
>         FEAT_BTF_QMARK_DATASEC,
> +       /* Kernel supports uprobe syscall */
> +       FEAT_UPROBE_SYSCALL,
>         __FEAT_CNT,
>  };
>
> --
> 2.53.0
>


* Re: [PATCHv3 bpf-next 3/5] libbpf: Add support to detect nop,nop5 instructions combo for usdt probe
  2026-02-11  8:48 ` [PATCHv3 bpf-next 3/5] libbpf: Add support to detect nop,nop5 instructions combo for usdt probe Jiri Olsa
@ 2026-02-11 21:45   ` Andrii Nakryiko
  2026-02-12 14:08     ` Jiri Olsa
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2026-02-11 21:45 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, bpf, linux-kernel, Song Liu, Yonghong Song,
	John Fastabend

On Wed, Feb 11, 2026 at 12:49 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding support to detect nop,nop5 instructions combo for usdt probe
> by checking on probe's following nop5 instruction.
>
> When the nop,nop5 combo is detected together with uprobe syscall,
> we can place the probe on top of nop5 and get it optimized.
>
> [1] https://github.com/libbpf/usdt
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/lib/bpf/usdt.c | 55 ++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 51 insertions(+), 4 deletions(-)
>
> diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
> index d1524f6f54ae..4e5f70bb4c31 100644
> --- a/tools/lib/bpf/usdt.c
> +++ b/tools/lib/bpf/usdt.c
> @@ -262,6 +262,7 @@ struct usdt_manager {
>         bool has_bpf_cookie;
>         bool has_sema_refcnt;
>         bool has_uprobe_multi;
> +       bool has_uprobe_syscall;
>  };
>
>  struct usdt_manager *usdt_manager_new(struct bpf_object *obj)
> @@ -301,6 +302,13 @@ struct usdt_manager *usdt_manager_new(struct bpf_object *obj)
>          * usdt probes.
>          */
>         man->has_uprobe_multi = kernel_supports(obj, FEAT_UPROBE_MULTI_LINK);
> +
> +       /*
> +        * Detect kernel support for uprobe() syscall, its presence means we can
> +        * take advantage of faster nop5 uprobe handling.
> +        * Added in: 56101b69c919 ("uprobes/x86: Add uprobe syscall to speed up uprobe")
> +        */
> +       man->has_uprobe_syscall = kernel_supports(obj, FEAT_UPROBE_SYSCALL);
>         return man;
>  }
>
> @@ -585,13 +593,42 @@ static int parse_usdt_note(GElf_Nhdr *nhdr, const char *data, size_t name_off,
>
>  static int parse_usdt_spec(struct usdt_spec *spec, const struct usdt_note *note, __u64 usdt_cookie);
>
> -static int collect_usdt_targets(struct usdt_manager *man, Elf *elf, const char *path, pid_t pid,
> -                               const char *usdt_provider, const char *usdt_name, __u64 usdt_cookie,
> -                               struct usdt_target **out_targets, size_t *out_target_cnt)
> +#if defined(__x86_64__)
> +static bool has_nop_combo(int fd, long off)
> +{
> +       static unsigned char nop_combo[6] = {
> +               0x90, 0x0f, 0x1f, 0x44, 0x00, 0x00 /* nop,nop5 */
> +       };
> +       unsigned char buf[6] = {};
> +
> +       /*
> +        * We are using file descriptor that backs Elf object,
> +        * let's dup it to be on the safe side.
> +        */
> +       fd = dup(fd);
> +       if (fd < 0)
> +               return false;
> +       if (lseek(fd, off, SEEK_SET) == off)
> +               read(fd, buf, 6);
> +       close(fd);

ugh, use pread() instead of all this? I wouldn't bother with short
read handling: if we didn't get 6 bytes, so be it, no nop5.
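
A minimal sketch of what the pread() variant could look like, assuming the same 6-byte pattern as in the patch. This is not the code that was eventually posted; the off_t parameter type and the exact error handling are assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* nop (0x90) followed by the 5-byte nop (0f 1f 44 00 00), as in the patch */
static const unsigned char nop_combo[6] = {
	0x90, 0x0f, 0x1f, 0x44, 0x00, 0x00
};
static_assert(sizeof(nop_combo) == 6, "nop,nop5 combo is 6 bytes");

static bool has_nop_combo(int fd, off_t off)
{
	unsigned char buf[sizeof(nop_combo)] = {0};

	/*
	 * pread() reads at an explicit offset and leaves the file position
	 * untouched, so the dup()/lseek()/close() sequence is not needed.
	 * A short read or an error is simply treated as "no combo".
	 */
	if (pread(fd, buf, sizeof(buf), off) != (ssize_t)sizeof(buf))
		return false;
	return memcmp(buf, nop_combo, sizeof(buf)) == 0;
}
```

Because the file offset of the descriptor is never moved, the Elf object that shares the descriptor is not affected by the check.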

> +       return memcmp(buf, nop_combo, 6) == 0;
> +}
> +#else
> +static bool has_nop_combo(int fd, long off)
> +{
> +       return false;
> +}
> +#endif
> +

[...]
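
The placement rule from the commit message (use the nop5 only when both the nop,nop5 combo and the uprobe syscall are detected) can be sketched as below. This is not the elided hunk; usdt_probe_off() and its parameters are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: note_off is the file offset recorded in the USDT
 * note, which points at the single-byte nop. If the nop is followed by a
 * nop5 and the kernel has the uprobe syscall, the probe goes one byte
 * further, on top of the nop5, so it can be optimized. Otherwise it stays
 * on the original nop, which keeps older kernels on the cheap nop path.
 */
static uint64_t usdt_probe_off(uint64_t note_off, bool has_combo,
			       bool has_uprobe_syscall)
{
	assert(note_off != UINT64_MAX); /* guard against wrap-around */

	if (has_combo && has_uprobe_syscall)
		return note_off + 1; /* skip the 1-byte nop */
	return note_off;
}
```

With this rule a binary built with the combo behaves the same as a plain nop binary on kernels without the uprobe syscall.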


* Re: [PATCHv3 bpf-next 5/5] selftests/bpf: Add usdt trigger bench
  2026-02-11  8:48 ` [PATCHv3 bpf-next 5/5] selftests/bpf: Add usdt trigger bench Jiri Olsa
@ 2026-02-11 21:45   ` Andrii Nakryiko
  2026-02-12 14:09     ` Jiri Olsa
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2026-02-11 21:45 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, bpf, linux-kernel, Song Liu, Yonghong Song,
	John Fastabend

On Wed, Feb 11, 2026 at 12:49 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding usdt trigger bench for usdt:
>  trig-usdt_nop - usdt on top of nop1 instruction
>  trig-usdt_nop_combo - usdt on top of nop1/nop5 combo
>
> Adding it to benchs/run_bench_uprobes.sh script.
>
> Example run on x86_64 kernel with uprobe syscall:
>
>   # ./benchs/run_bench_uprobes.sh
>   usermode-count :  152.507 ± 0.098M/s
>   syscall-count  :   14.309 ± 0.093M/s
>   uprobe-nop     :    3.190 ± 0.012M/s
>   uprobe-push    :    3.057 ± 0.004M/s
>   uprobe-ret     :    1.095 ± 0.009M/s
>   uprobe-nop5    :    7.305 ± 0.034M/s
>   uretprobe-nop  :    2.175 ± 0.005M/s
>   uretprobe-push :    2.109 ± 0.003M/s
>   uretprobe-ret  :    0.945 ± 0.002M/s
>   uretprobe-nop5 :    3.530 ± 0.006M/s
>   usdt_nop       :    3.235 ± 0.008M/s   <-- added
>   usdt_nop_combo :    7.511 ± 0.045M/s   <-- added

for consistency, usdt-nop. And for nop_combo I'd use usdt-nop5; the combo
doesn't matter for performance beyond the fact that we have nop5 there

>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/testing/selftests/bpf/Makefile          |  2 +
>  tools/testing/selftests/bpf/bench.c           |  4 ++
>  .../selftests/bpf/benchs/bench_trigger.c      | 60 +++++++++++++++++++
>  .../selftests/bpf/benchs/run_bench_uprobes.sh |  2 +-
>  .../selftests/bpf/progs/trigger_bench.c       | 10 +++-
>  5 files changed, 76 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 306949162a5b..9b2ca0028322 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -874,6 +874,8 @@ $(OUTPUT)/bench: $(OUTPUT)/bench.o \
>                  $(OUTPUT)/bench_bpf_crypto.o \
>                  $(OUTPUT)/bench_sockmap.o \
>                  $(OUTPUT)/bench_lpm_trie_map.o \
> +                $(OUTPUT)/usdt_1.o \
> +                $(OUTPUT)/usdt_2.o \
>                  #
>         $(call msg,BINARY,,$@)
>         $(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@
> diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
> index 8368bd3a0665..4dacb87e464e 100644
> --- a/tools/testing/selftests/bpf/bench.c
> +++ b/tools/testing/selftests/bpf/bench.c
> @@ -541,6 +541,8 @@ extern const struct bench bench_trig_uprobe_nop5;
>  extern const struct bench bench_trig_uretprobe_nop5;
>  extern const struct bench bench_trig_uprobe_multi_nop5;
>  extern const struct bench bench_trig_uretprobe_multi_nop5;
> +extern const struct bench bench_trig_usdt_nop;
> +extern const struct bench bench_trig_usdt_nop_combo;
>  #endif
>
>  extern const struct bench bench_rb_libbpf;
> @@ -617,6 +619,8 @@ static const struct bench *benchs[] = {
>         &bench_trig_uretprobe_nop5,
>         &bench_trig_uprobe_multi_nop5,
>         &bench_trig_uretprobe_multi_nop5,
> +       &bench_trig_usdt_nop,
> +       &bench_trig_usdt_nop_combo,
>  #endif
>         /* ringbuf/perfbuf benchmarks */
>         &bench_rb_libbpf,
> diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c
> index aeec9edd3851..b4b03fe1f61d 100644
> --- a/tools/testing/selftests/bpf/benchs/bench_trigger.c
> +++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c
> @@ -405,6 +405,23 @@ static void *uprobe_producer_nop5(void *input)
>                 uprobe_target_nop5();
>         return NULL;
>  }
> +
> +void usdt_1(void);
> +void usdt_2(void);
> +
> +static void *uprobe_producer_usdt_nop(void *input)
> +{
> +       while (true)
> +               usdt_1();
> +       return NULL;
> +}
> +
> +static void *uprobe_producer_usdt_nop_combo(void *input)
> +{
> +       while (true)
> +               usdt_2();
> +       return NULL;
> +}
>  #endif
>
>  static void usetup(bool use_retprobe, bool use_multi, void *target_addr)
> @@ -542,6 +559,47 @@ static void uretprobe_multi_nop5_setup(void)
>  {
>         usetup(true, true /* use_multi */, &uprobe_target_nop5);
>  }
> +
> +static void usdt_setup(const char *name)
> +{
> +       struct bpf_link *link;
> +       int err;
> +
> +       setup_libbpf();
> +
> +       ctx.skel = trigger_bench__open();
> +       if (!ctx.skel) {
> +               fprintf(stderr, "failed to open skeleton\n");
> +               exit(1);
> +       }
> +
> +       bpf_program__set_autoload(ctx.skel->progs.bench_trigger_usdt, true);
> +
> +       err = trigger_bench__load(ctx.skel);
> +       if (err) {
> +               fprintf(stderr, "failed to load skeleton\n");
> +               exit(1);
> +       }
> +
> +       link = bpf_program__attach_usdt(ctx.skel->progs.bench_trigger_usdt,
> +                                       0 /*self*/, "/proc/self/exe",
> +                                       "optimized_attach", name, NULL);
> +       if (libbpf_get_error(link)) {
> +               fprintf(stderr, "failed to attach optimized_attach:%s usdt probe\n", name);
> +               exit(1);
> +       }
> +       ctx.skel->links.bench_trigger_usdt = link;
> +}
> +
> +static void usdt_nop_setup(void)
> +{
> +       usdt_setup("usdt_1");
> +}
> +
> +static void usdt_nop_combo_setup(void)
> +{
> +       usdt_setup("usdt_2");
> +}
>  #endif
>
>  const struct bench bench_trig_syscall_count = {
> @@ -609,4 +667,6 @@ BENCH_TRIG_USERMODE(uprobe_nop5, nop5, "uprobe-nop5");
>  BENCH_TRIG_USERMODE(uretprobe_nop5, nop5, "uretprobe-nop5");
>  BENCH_TRIG_USERMODE(uprobe_multi_nop5, nop5, "uprobe-multi-nop5");
>  BENCH_TRIG_USERMODE(uretprobe_multi_nop5, nop5, "uretprobe-multi-nop5");
> +BENCH_TRIG_USERMODE(usdt_nop, usdt_nop, "usdt_nop");
> +BENCH_TRIG_USERMODE(usdt_nop_combo, usdt_nop_combo, "usdt_nop_combo");
>  #endif
> diff --git a/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh b/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
> index 03f55405484b..3656676d99d2 100755
> --- a/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
> +++ b/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
> @@ -2,7 +2,7 @@
>
>  set -eufo pipefail
>
> -for i in usermode-count syscall-count {uprobe,uretprobe}-{nop,push,ret,nop5}
> +for i in usermode-count syscall-count {uprobe,uretprobe}-{nop,push,ret,nop5} usdt_nop usdt_nop_combo

usdt_{nop,nop5}, consistency ;)


>  do
>         summary=$(sudo ./bench -w2 -d5 -a trig-$i | tail -n1 | cut -d'(' -f1 | cut -d' ' -f3-)
>         printf "%-15s: %s\n" $i "$summary"
> diff --git a/tools/testing/selftests/bpf/progs/trigger_bench.c b/tools/testing/selftests/bpf/progs/trigger_bench.c
> index 4ea0422d1042..3225b4aee8ff 100644
> --- a/tools/testing/selftests/bpf/progs/trigger_bench.c
> +++ b/tools/testing/selftests/bpf/progs/trigger_bench.c
> @@ -1,10 +1,11 @@
>  // SPDX-License-Identifier: GPL-2.0
>  // Copyright (c) 2020 Facebook
> -#include <linux/bpf.h>
> +#include "vmlinux.h"
>  #include <asm/unistd.h>
>  #include <bpf/bpf_helpers.h>
>  #include <bpf/bpf_tracing.h>
>  #include "bpf_misc.h"
> +#include "bpf/usdt.bpf.h"
>
>  char _license[] SEC("license") = "GPL";
>
> @@ -180,3 +181,10 @@ int bench_trigger_rawtp(void *ctx)
>         handle(ctx);
>         return 0;
>  }
> +
> +SEC("?usdt")
> +int bench_trigger_usdt(void *ctx)
> +{
> +       inc_counter();
> +       return 0;
> +}
> --
> 2.53.0
>


* Re: [PATCHv3 bpf-next 4/5] selftests/bpf: Add test for checking correct nop of optimized usdt
  2026-02-11  8:48 ` [PATCHv3 bpf-next 4/5] selftests/bpf: Add test for checking correct nop of optimized usdt Jiri Olsa
  2026-02-11  9:13   ` bot+bpf-ci
@ 2026-02-11 21:45   ` Andrii Nakryiko
  2026-02-12 14:10     ` Jiri Olsa
  1 sibling, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2026-02-11 21:45 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, bpf, linux-kernel, Song Liu, Yonghong Song,
	John Fastabend

On Wed, Feb 11, 2026 at 12:49 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding test that attaches bpf program on usdt probe in 2 scenarios:
>
> - attach program on top of usdt_1, which is single nop instruction,
>   so the probe stays on nop instruction and is not optimized.
>
> - attach program on top of usdt_2 which is probe defined on top
>   of nop,nop5 combo, so the probe is placed on top of nop5 and
>   is optimized.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/testing/selftests/bpf/.gitignore        |  2 +
>  tools/testing/selftests/bpf/Makefile          |  3 +-
>  tools/testing/selftests/bpf/prog_tests/usdt.c | 85 +++++++++++++++++++
>  tools/testing/selftests/bpf/progs/test_usdt.c |  9 ++
>  tools/testing/selftests/bpf/usdt_1.c          | 18 ++++
>  tools/testing/selftests/bpf/usdt_2.c          | 16 ++++
>  6 files changed, 132 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/bpf/usdt_1.c
>  create mode 100644 tools/testing/selftests/bpf/usdt_2.c
>
> diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
> index a3ea98211ea6..bfdc5518ecc8 100644
> --- a/tools/testing/selftests/bpf/.gitignore
> +++ b/tools/testing/selftests/bpf/.gitignore
> @@ -47,3 +47,5 @@ verification_cert.h
>  *.BTF
>  *.BTF_ids
>  *.BTF.base
> +usdt_1
> +usdt_2
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index c6bf4dfb1495..306949162a5b 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -749,7 +749,8 @@ TRUNNER_EXTRA_SOURCES := test_progs.c               \
>                          $(VERIFY_SIG_HDR)              \
>                          flow_dissector_load.h  \
>                          ip_check_defrag_frags.h        \
> -                        bpftool_helpers.c
> +                        bpftool_helpers.c      \
> +                        usdt_1.c usdt_2.c
>  TRUNNER_LIB_SOURCES := find_bit.c
>  TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read                          \
>                        $(OUTPUT)/liburandom_read.so                     \
> diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
> index f4be5269fa90..6daed3dfa75b 100644
> --- a/tools/testing/selftests/bpf/prog_tests/usdt.c
> +++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
> @@ -247,6 +247,89 @@ static void subtest_basic_usdt(bool optimized)
>  #undef TRIGGER
>  }
>
> +#ifdef __x86_64__
> +extern void usdt_1(void);
> +extern void usdt_2(void);
> +
> +/* nop, nop5 */
> +static unsigned char nop1_nop5_combo[6] = { 0x90, 0x0f, 0x1f, 0x44, 0x00, 0x00 };
> +static unsigned char nop1[6] = { 0x90 };
> +
> +static void *find_instr(void *fn, unsigned char *instr, size_t cnt)
> +{
> +       int i;
> +
> +       for (i = 0; i < 10; i++) {
> +               if (!memcmp(instr, fn + i, cnt))
> +                       return fn + i;
> +       }
> +       return NULL;
> +}
> +
> +static void subtest_optimized_attach(void)
> +{
> +       struct test_usdt *skel;
> +       __u8 *addr_1, *addr_2;
> +
> +       /* usdt_1 USDT probe has single nop instruction */
> +       addr_1 = find_instr(usdt_1, nop1_nop5_combo, 6);
> +       if (!ASSERT_NULL(addr_1, "usdt_1_find_nop1_nop5_combo"))
> +               return;
> +
> +       addr_1 = find_instr(usdt_1, nop1, 1);
> +       if (!ASSERT_OK_PTR(addr_1, "usdt_1_find_nop1"))
> +               return;
> +
> +       /* usdt_2 USDT probe has nop,nop5 instructions combo */
> +       addr_2 = find_instr(usdt_2, nop1_nop5_combo, 6);
> +       if (!ASSERT_OK_PTR(addr_2, "usdt_2_find_nop1_nop5_combo"))
> +               return;
> +
> +       skel = test_usdt__open_and_load();
> +       if (!ASSERT_OK_PTR(skel, "test_usdt__open_and_load"))
> +               return;
> +
> +       /*
> +        * Attach program on top of usdt_1 which is single nop probe,
> +        * so the probe won't get optimized.
> +        */
> +       skel->links.usdt_executed = bpf_program__attach_usdt(skel->progs.usdt_executed,
> +                                                    0 /*self*/, "/proc/self/exe",
> +                                                    "optimized_attach", "usdt_1", NULL);
> +       if (!ASSERT_OK_PTR(skel->links.usdt_executed, "bpf_program__attach_usdt"))
> +               goto cleanup;
> +
> +       usdt_1();
> +       usdt_1();
> +
> +       /* nop is on addr_1 address */
> +       ASSERT_EQ(*addr_1, 0xcc, "int3");
> +       ASSERT_EQ(skel->bss->executed, 2, "executed");
> +
> +       bpf_link__destroy(skel->links.usdt_executed);
> +
> +       /*
> +        * Attach program on top of usdt_2 which is probe defined on top
> +        * of nop1,nop5 combo, so the probe gets optimized on top of nop5.
> +        */
> +       skel->links.usdt_executed = bpf_program__attach_usdt(skel->progs.usdt_executed,
> +                                                    0 /*self*/, "/proc/self/exe",
> +                                                    "optimized_attach", "usdt_2", NULL);
> +       if (!ASSERT_OK_PTR(skel->links.usdt_executed, "bpf_program__attach_usdt"))
> +               goto cleanup;
> +
> +       usdt_2();
> +       usdt_2();
> +
> +       /* nop5 is on addr_2 + 1 address */
> +       ASSERT_EQ(*(addr_2 + 1), 0xe8, "call");
> +       ASSERT_EQ(skel->bss->executed, 4, "executed");
> +
> +cleanup:
> +       test_usdt__destroy(skel);
> +}
> +#endif
> +
>  unsigned short test_usdt_100_semaphore SEC(".probes");
>  unsigned short test_usdt_300_semaphore SEC(".probes");
>  unsigned short test_usdt_400_semaphore SEC(".probes");
> @@ -516,6 +599,8 @@ void test_usdt(void)
>  #ifdef __x86_64__
>         if (test__start_subtest("basic_optimized"))
>                 subtest_basic_usdt(true);
> +       if (test__start_subtest("optimized_attach"))
> +               subtest_optimized_attach();
>  #endif
>         if (test__start_subtest("multispec"))
>                 subtest_multispec_usdt();
> diff --git a/tools/testing/selftests/bpf/progs/test_usdt.c b/tools/testing/selftests/bpf/progs/test_usdt.c
> index a78c87537b07..6911868cdf67 100644
> --- a/tools/testing/selftests/bpf/progs/test_usdt.c
> +++ b/tools/testing/selftests/bpf/progs/test_usdt.c
> @@ -138,4 +138,13 @@ int usdt_sib(struct pt_regs *ctx)
>         return 0;
>  }
>
> +int executed;
> +
> +SEC("usdt")
> +int usdt_executed(struct pt_regs *ctx)
> +{
> +       executed++;

did you try capturing the ip value from pt_regs and validating it?


> +       return 0;
> +}
> +
>  char _license[] SEC("license") = "GPL";
> diff --git a/tools/testing/selftests/bpf/usdt_1.c b/tools/testing/selftests/bpf/usdt_1.c
> new file mode 100644
> index 000000000000..4f06e8bcf58b
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/usdt_1.c
> @@ -0,0 +1,18 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#if defined(__x86_64__)
> +
> +/*
> + * Include usdt.h with defined USDT_NOP macro to use single
> + * nop instruction.
> + */
> +#define USDT_NOP .byte 0x90
> +#include "usdt.h"
> +
> +__attribute__((aligned(16)))
> +void usdt_1(void)
> +{
> +       USDT(optimized_attach, usdt_1);
> +}
> +
> +#endif
> diff --git a/tools/testing/selftests/bpf/usdt_2.c b/tools/testing/selftests/bpf/usdt_2.c
> new file mode 100644
> index 000000000000..789883aaca4c
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/usdt_2.c
> @@ -0,0 +1,16 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#if defined(__x86_64__)
> +
> +/*
> + * Include usdt.h with default nop,nop5 instructions combo.
> + */
> +#include "usdt.h"
> +
> +__attribute__((aligned(16)))
> +void usdt_2(void)
> +{
> +       USDT(optimized_attach, usdt_2);
> +}
> +
> +#endif
> --
> 2.53.0
>


* Re: [PATCHv3 bpf-next 3/5] libbpf: Add support to detect nop,nop5 instructions combo for usdt probe
  2026-02-11 21:45   ` Andrii Nakryiko
@ 2026-02-12 14:08     ` Jiri Olsa
  0 siblings, 0 replies; 15+ messages in thread
From: Jiri Olsa @ 2026-02-12 14:08 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, linux-kernel, Song Liu, Yonghong Song,
	John Fastabend

On Wed, Feb 11, 2026 at 01:45:12PM -0800, Andrii Nakryiko wrote:
> On Wed, Feb 11, 2026 at 12:49 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Adding support to detect nop,nop5 instructions combo for usdt probe
> > by checking on probe's following nop5 instruction.
> >
> > When the nop,nop5 combo is detected together with uprobe syscall,
> > we can place the probe on top of nop5 and get it optimized.
> >
> > [1] https://github.com/libbpf/usdt
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  tools/lib/bpf/usdt.c | 55 ++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 51 insertions(+), 4 deletions(-)
> >
> > diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
> > index d1524f6f54ae..4e5f70bb4c31 100644
> > --- a/tools/lib/bpf/usdt.c
> > +++ b/tools/lib/bpf/usdt.c
> > @@ -262,6 +262,7 @@ struct usdt_manager {
> >         bool has_bpf_cookie;
> >         bool has_sema_refcnt;
> >         bool has_uprobe_multi;
> > +       bool has_uprobe_syscall;
> >  };
> >
> >  struct usdt_manager *usdt_manager_new(struct bpf_object *obj)
> > @@ -301,6 +302,13 @@ struct usdt_manager *usdt_manager_new(struct bpf_object *obj)
> >          * usdt probes.
> >          */
> >         man->has_uprobe_multi = kernel_supports(obj, FEAT_UPROBE_MULTI_LINK);
> > +
> > +       /*
> > +        * Detect kernel support for uprobe() syscall, its presence means we can
> > +        * take advantage of faster nop5 uprobe handling.
> > +        * Added in: 56101b69c919 ("uprobes/x86: Add uprobe syscall to speed up uprobe")
> > +        */
> > +       man->has_uprobe_syscall = kernel_supports(obj, FEAT_UPROBE_SYSCALL);
> >         return man;
> >  }
> >
> > @@ -585,13 +593,42 @@ static int parse_usdt_note(GElf_Nhdr *nhdr, const char *data, size_t name_off,
> >
> >  static int parse_usdt_spec(struct usdt_spec *spec, const struct usdt_note *note, __u64 usdt_cookie);
> >
> > -static int collect_usdt_targets(struct usdt_manager *man, Elf *elf, const char *path, pid_t pid,
> > -                               const char *usdt_provider, const char *usdt_name, __u64 usdt_cookie,
> > -                               struct usdt_target **out_targets, size_t *out_target_cnt)
> > +#if defined(__x86_64__)
> > +static bool has_nop_combo(int fd, long off)
> > +{
> > +       static unsigned char nop_combo[6] = {
> > +               0x90, 0x0f, 0x1f, 0x44, 0x00, 0x00 /* nop,nop5 */
> > +       };
> > +       unsigned char buf[6] = {};
> > +
> > +       /*
> > +        * We are using file descriptor that backs Elf object,
> > +        * let's dup it to be on the safe side.
> > +        */
> > +       fd = dup(fd);
> > +       if (fd < 0)
> > +               return false;
> > +       if (lseek(fd, off, SEEK_SET) == off)
> > +               read(fd, buf, 6);
> > +       close(fd);
> 
> ugh, use pread() instead of all this? I wouldn't bother with short
> read handling: if we didn't get 6 bytes, so be it, no nop5.

ok, that's simpler, will change

thanks,
jirka

> 
> > +       return memcmp(buf, nop_combo, 6) == 0;
> > +}
> > +#else
> > +static bool has_nop_combo(int fd, long off)
> > +{
> > +       return false;
> > +}
> > +#endif
> > +
> 
> [...]


* Re: [PATCHv3 bpf-next 2/5] libbpf: Add uprobe syscall feature detection
  2026-02-11 21:45   ` Andrii Nakryiko
@ 2026-02-12 14:08     ` Jiri Olsa
  0 siblings, 0 replies; 15+ messages in thread
From: Jiri Olsa @ 2026-02-12 14:08 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, linux-kernel, Song Liu, Yonghong Song,
	John Fastabend

On Wed, Feb 11, 2026 at 01:45:08PM -0800, Andrii Nakryiko wrote:
> On Wed, Feb 11, 2026 at 12:49 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Adding uprobe syscall feature detection that will be used
> > in following changes.
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  tools/lib/bpf/features.c        | 24 ++++++++++++++++++++++++
> >  tools/lib/bpf/libbpf_internal.h |  2 ++
> >  2 files changed, 26 insertions(+)
> >
> > diff --git a/tools/lib/bpf/features.c b/tools/lib/bpf/features.c
> > index b842b83e2480..04a225c7f2c0 100644
> > --- a/tools/lib/bpf/features.c
> > +++ b/tools/lib/bpf/features.c
> > @@ -506,6 +506,27 @@ static int probe_kern_arg_ctx_tag(int token_fd)
> >         return probe_fd(prog_fd);
> >  }
> >
> > +#ifdef __x86_64__
> > +
> > +#ifndef __NR_uprobe
> > +#define __NR_uprobe 336
> > +#endif
> > +
> > +static int probe_uprobe_syscall(int token_fd)
> > +{
> > +       /*
> > +        * If kernel supports uprobe() syscall, it will return -ENXIO when called
> > +        * from the outside of a kernel-generated uprobe trampoline.
> > +        */
> > +       return syscall(__NR_uprobe) < 0 && errno == ENXIO;
> > +}
> > +#else
> > +static int probe_uprobe_syscall(int token_fd)
> > +{
> > +       return 0;
> > +}
> > +#endif
> > +
> >  typedef int (*feature_probe_fn)(int /* token_fd */);
> >
> >  static struct kern_feature_cache feature_cache;
> > @@ -581,6 +602,9 @@ static struct kern_feature_desc {
> >         [FEAT_BTF_QMARK_DATASEC] = {
> >                 "BTF DATASEC names starting from '?'", probe_kern_btf_qmark_datasec,
> >         },
> > +       [FEAT_UPROBE_SYSCALL] = {
> > +               "Kernel supports uprobe syscall", probe_uprobe_syscall,
> 
> this will conflict with the libbpf arena relocation fix that landed in the bpf tree,
> so let's wait until the trees merge.
> 
> Also, this description ends up in the middle of a sentence,
> so start it with a lower-case letter

ok, will change

jirka


* Re: [PATCHv3 bpf-next 5/5] selftests/bpf: Add usdt trigger bench
  2026-02-11 21:45   ` Andrii Nakryiko
@ 2026-02-12 14:09     ` Jiri Olsa
  0 siblings, 0 replies; 15+ messages in thread
From: Jiri Olsa @ 2026-02-12 14:09 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, linux-kernel, Song Liu, Yonghong Song,
	John Fastabend

On Wed, Feb 11, 2026 at 01:45:15PM -0800, Andrii Nakryiko wrote:
> On Wed, Feb 11, 2026 at 12:49 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Adding usdt trigger bench for usdt:
> >  trig-usdt_nop - usdt on top of nop1 instruction
> >  trig-usdt_nop_combo - usdt on top of nop1/nop5 combo
> >
> > Adding it to benchs/run_bench_uprobes.sh script.
> >
> > Example run on x86_64 kernel with uprobe syscall:
> >
> >   # ./benchs/run_bench_uprobes.sh
> >   usermode-count :  152.507 ± 0.098M/s
> >   syscall-count  :   14.309 ± 0.093M/s
> >   uprobe-nop     :    3.190 ± 0.012M/s
> >   uprobe-push    :    3.057 ± 0.004M/s
> >   uprobe-ret     :    1.095 ± 0.009M/s
> >   uprobe-nop5    :    7.305 ± 0.034M/s
> >   uretprobe-nop  :    2.175 ± 0.005M/s
> >   uretprobe-push :    2.109 ± 0.003M/s
> >   uretprobe-ret  :    0.945 ± 0.002M/s
> >   uretprobe-nop5 :    3.530 ± 0.006M/s
> >   usdt_nop       :    3.235 ± 0.008M/s   <-- added
> >   usdt_nop_combo :    7.511 ± 0.045M/s   <-- added
> 
> for consistency, usdt-nop. And for nop_combo I'd use usdt-nop5; the combo
> doesn't matter for performance beyond the fact that we have nop5 there

ok

SNIP

> > +static void usdt_setup(const char *name)
> > +{
> > +       struct bpf_link *link;
> > +       int err;
> > +
> > +       setup_libbpf();
> > +
> > +       ctx.skel = trigger_bench__open();
> > +       if (!ctx.skel) {
> > +               fprintf(stderr, "failed to open skeleton\n");
> > +               exit(1);
> > +       }
> > +
> > +       bpf_program__set_autoload(ctx.skel->progs.bench_trigger_usdt, true);
> > +
> > +       err = trigger_bench__load(ctx.skel);
> > +       if (err) {
> > +               fprintf(stderr, "failed to load skeleton\n");
> > +               exit(1);
> > +       }
> > +
> > +       link = bpf_program__attach_usdt(ctx.skel->progs.bench_trigger_usdt,
> > +                                       0 /*self*/, "/proc/self/exe",
> > +                                       "optimized_attach", name, NULL);
> > +       if (libbpf_get_error(link)) {
> > +               fprintf(stderr, "failed to attach optimized_attach:%s usdt probe\n", name);
> > +               exit(1);
> > +       }
> > +       ctx.skel->links.bench_trigger_usdt = link;
> > +}
> > +
> > +static void usdt_nop_setup(void)
> > +{
> > +       usdt_setup("usdt_1");
> > +}
> > +
> > +static void usdt_nop_combo_setup(void)
> > +{
> > +       usdt_setup("usdt_2");
> > +}
> >  #endif
> >
> >  const struct bench bench_trig_syscall_count = {
> > @@ -609,4 +667,6 @@ BENCH_TRIG_USERMODE(uprobe_nop5, nop5, "uprobe-nop5");
> >  BENCH_TRIG_USERMODE(uretprobe_nop5, nop5, "uretprobe-nop5");
> >  BENCH_TRIG_USERMODE(uprobe_multi_nop5, nop5, "uprobe-multi-nop5");
> >  BENCH_TRIG_USERMODE(uretprobe_multi_nop5, nop5, "uretprobe-multi-nop5");
> > +BENCH_TRIG_USERMODE(usdt_nop, usdt_nop, "usdt_nop");
> > +BENCH_TRIG_USERMODE(usdt_nop_combo, usdt_nop_combo, "usdt_nop_combo");
> >  #endif
> > diff --git a/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh b/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
> > index 03f55405484b..3656676d99d2 100755
> > --- a/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
> > +++ b/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
> > @@ -2,7 +2,7 @@
> >
> >  set -eufo pipefail
> >
> > -for i in usermode-count syscall-count {uprobe,uretprobe}-{nop,push,ret,nop5}
> > +for i in usermode-count syscall-count {uprobe,uretprobe}-{nop,push,ret,nop5} usdt_nop usdt_nop_combo
> 
> usdt_{nop,nop5}, consistency ;)

ok, will change

thanks,
jirka


* Re: [PATCHv3 bpf-next 4/5] selftests/bpf: Add test for checking correct nop of optimized usdt
  2026-02-11 21:45   ` Andrii Nakryiko
@ 2026-02-12 14:10     ` Jiri Olsa
  0 siblings, 0 replies; 15+ messages in thread
From: Jiri Olsa @ 2026-02-12 14:10 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, linux-kernel, Song Liu, Yonghong Song,
	John Fastabend

On Wed, Feb 11, 2026 at 01:45:24PM -0800, Andrii Nakryiko wrote:

SNIP

> > diff --git a/tools/testing/selftests/bpf/progs/test_usdt.c b/tools/testing/selftests/bpf/progs/test_usdt.c
> > index a78c87537b07..6911868cdf67 100644
> > --- a/tools/testing/selftests/bpf/progs/test_usdt.c
> > +++ b/tools/testing/selftests/bpf/progs/test_usdt.c
> > @@ -138,4 +138,13 @@ int usdt_sib(struct pt_regs *ctx)
> >         return 0;
> >  }
> >
> > +int executed;
> > +
> > +SEC("usdt")
> > +int usdt_executed(struct pt_regs *ctx)
> > +{
> > +       executed++;
> 
> > did you try capturing the ip value from pt_regs and validating it?

ok, it's easy to add that check

jirka


Thread overview (15+ messages):
2026-02-11  8:48 [PATCHv3 bpf-next 0/5] libbpf: Make optimized uprobes backward compatible Jiri Olsa
2026-02-11  8:48 ` [PATCHv3 bpf-next 1/5] selftests/bpf: Emit nop,nop5 instructions combo for x86_64 arch Jiri Olsa
2026-02-11  8:48 ` [PATCHv3 bpf-next 2/5] libbpf: Add uprobe syscall feature detection Jiri Olsa
2026-02-11 21:45   ` Andrii Nakryiko
2026-02-12 14:08     ` Jiri Olsa
2026-02-11  8:48 ` [PATCHv3 bpf-next 3/5] libbpf: Add support to detect nop,nop5 instructions combo for usdt probe Jiri Olsa
2026-02-11 21:45   ` Andrii Nakryiko
2026-02-12 14:08     ` Jiri Olsa
2026-02-11  8:48 ` [PATCHv3 bpf-next 4/5] selftests/bpf: Add test for checking correct nop of optimized usdt Jiri Olsa
2026-02-11  9:13   ` bot+bpf-ci
2026-02-11 21:45   ` Andrii Nakryiko
2026-02-12 14:10     ` Jiri Olsa
2026-02-11  8:48 ` [PATCHv3 bpf-next 5/5] selftests/bpf: Add usdt trigger bench Jiri Olsa
2026-02-11 21:45   ` Andrii Nakryiko
2026-02-12 14:09     ` Jiri Olsa
