public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jiri Olsa <jolsa@kernel.org>
To: Andrii Nakryiko <andrii@kernel.org>
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>
Subject: [PATCH bpf-next 1/4] selftests/bpf: Emit nop,nop5 instructions for x86_64 usdt probe
Date: Mon, 17 Nov 2025 09:35:48 +0100	[thread overview]
Message-ID: <20251117083551.517393-2-jolsa@kernel.org> (raw)
In-Reply-To: <20251117083551.517393-1-jolsa@kernel.org>

We can currently optimize uprobes on top of nop5 instructions,
so application can define USDT_NOP to nop5 and use USDT macro
to define optimized usdt probes.

This works fine on new kernels, but could have performance penalty
on older kernels, that do not have the support to optimize and to
emulate nop5 instruction.

  execution of the usdt probe on top of nop:
  - nop -> trigger usdt -> emulate nop -> continue

  execution of the usdt probe on top of nop5:
  - nop5 -> trigger usdt -> single step nop5 -> continue

Note the 'single step nop5' as the source of performance regression.

To workaround that we change the USDT macro to emit nop,nop5 for
the probe (instead of default nop) and make record of that in
USDT record (more on that below).

This can be detected by application (libbpf) and it can place the
uprobe either on nop or nop5 based on the optimization support in
the kernel.

We make record of using the nop,nop5 instructions in the USDT ELF
note data.

Current elf note format is as follows:

  namesz (4B) | descsz (4B) | type (4B) | name | desc

And current usdt record (with "stapsdt" name) placed in the note's
desc data look like:

  loc_addr  | 8 bytes
  base_addr | 8 bytes
  sema_addr | 8 bytes
  provider  | zero terminated string
  name      | zero terminated string
  args      | zero terminated string

None of the tested parsers (bpftrace-bcc, libbpf) checked that the args
zero terminated byte is the actual end of the 'desc' data. As Andrii
suggested we could use this and place extra zero byte right there as an
indication for the parser we use the nop,nop5 instructions.

It's bit tricky, but the other way would be to introduce new elf note type
or note name and change all existing parsers to recognize it. With the change
above the existing parsers would still recognize such usdt probes.

Note we do not emit this extra byte if app defined its own nop through
USDT_NOP macro.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/testing/selftests/bpf/usdt.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/tools/testing/selftests/bpf/usdt.h b/tools/testing/selftests/bpf/usdt.h
index 549d1f774810..57fa2902136c 100644
--- a/tools/testing/selftests/bpf/usdt.h
+++ b/tools/testing/selftests/bpf/usdt.h
@@ -312,9 +312,16 @@ struct usdt_sema { volatile unsigned short active; };
 #ifndef USDT_NOP
 #if defined(__ia64__) || defined(__s390__) || defined(__s390x__)
 #define USDT_NOP			nop 0
+#elif defined(__x86_64__)
+#define USDT_NOP			.byte 0x90, 0x0f, 0x1f, 0x44, 0x00, 0x0 /* nop, nop5 */
 #else
 #define USDT_NOP			nop
 #endif
+#else
+/*
+ * User define its own nop instruction, do not emit extra note data.
+ */
+#define __usdt_asm_extra
 #endif /* USDT_NOP */
 
 /*
@@ -403,6 +410,15 @@ struct usdt_sema { volatile unsigned short active; };
 	__asm__ __volatile__ ("" :: "m" (sema));
 #endif
 
+#ifndef __usdt_asm_extra
+#ifdef __x86_64__
+#define __usdt_asm_extra									\
+	__usdt_asm1(            .ascii "\0")
+#else
+#define __usdt_asm_extra
+#endif
+#endif
+
 /* main USDT definition (nop and .note.stapsdt metadata) */
 #define __usdt_probe(group, name, sema_def, sema, ...) do {					\
 	sema_def(sema)										\
@@ -420,6 +436,7 @@ struct usdt_sema { volatile unsigned short active; };
 	__usdt_asm_strz(name)									\
 	__usdt_asm_args(__VA_ARGS__)								\
 	__usdt_asm1(		.ascii "\0")							\
+	__usdt_asm_extra									\
 	__usdt_asm1(994:	.balign 4)							\
 	__usdt_asm1(		.popsection)							\
 	__usdt_asm1(.ifndef _.stapsdt.base)							\
-- 
2.51.1


  reply	other threads:[~2025-11-17  8:36 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-17  8:35 [PATCH bpf-next 0/4] libbpf: Make optimized uprobes backward compatible Jiri Olsa
2025-11-17  8:35 ` Jiri Olsa [this message]
2025-11-24 17:29   ` [PATCH bpf-next 1/4] selftests/bpf: Emit nop,nop5 instructions for x86_64 usdt probe Andrii Nakryiko
2025-11-26  8:29     ` Jiri Olsa
2025-11-17  8:35 ` [PATCH bpf-next 2/4] libbpf: Add uprobe syscall feature detection Jiri Olsa
2025-11-24 17:29   ` Andrii Nakryiko
2025-11-26  8:29     ` Jiri Olsa
2025-11-17  8:35 ` [PATCH bpf-next 3/4] libbpf: Add support to parse extra info in usdt note record Jiri Olsa
2025-11-24 17:29   ` Andrii Nakryiko
2025-11-26  8:30     ` Jiri Olsa
2025-11-17  8:35 ` [PATCH bpf-next 4/4] selftests/bpf: Add test for checking correct nop of optimized usdt Jiri Olsa
2025-11-24 17:34   ` Andrii Nakryiko
2025-11-26  8:29     ` Jiri Olsa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251117083551.517393-2-jolsa@kernel.org \
    --to=jolsa@kernel.org \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=songliubraving@fb.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox