From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Andrii Nakryiko <andrii@kernel.org>, Jiri Olsa <jolsa@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Sasha Levin <sashal@kernel.org>,
daniel@iogearbox.net, eddyz87@gmail.com, bpf@vger.kernel.org
Subject: [PATCH AUTOSEL 6.9 08/23] libbpf: detect broken PID filtering logic for multi-uprobe
Date: Wed, 5 Jun 2024 08:01:51 -0400
Message-ID: <20240605120220.2966127-8-sashal@kernel.org>
In-Reply-To: <20240605120220.2966127-1-sashal@kernel.org>
From: Andrii Nakryiko <andrii@kernel.org>
[ Upstream commit 04d939a2ab229a3821f04fc81f7c027842f501f1 ]
Libbpf automatically (and transparently to the user) detects
multi-uprobe support in the kernel and, if it is supported, uses
multi-uprobes to improve USDT attachment speed.
USDTs can be attached system-wide or for a specific process by PID. In
the latter case, we rely on the kernel correctly not triggering the USDT
for unrelated processes.
As such, on older kernels that do support multi-uprobes, but still have
broken PID filtering logic, we need to fall back to singular uprobes.
Unfortunately, whether the user is using PID filtering is known only at
attachment time, which happens after the relevant BPF programs have been
loaded into the kernel. Also unfortunately, we need to decide
whether to use multi-uprobes or singular uprobes for SEC("usdt") programs
at BPF object load time, at which point we have no information about
possible PID filtering.
The distinction between single and multi-uprobes is small, but important
for the kernel. Multi-uprobes get the BPF_TRACE_UPROBE_MULTI attach type,
and the kernel internally substitutes different implementations of some
BPF helpers (e.g., bpf_get_attach_cookie()) depending on whether the uprobe
is multi or singular. So, multi-uprobes and singular uprobes cannot be
intermixed.
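To make the ordering constraint concrete, here is a minimal sketch of the
load-time versus attach-time split. All names in it (uprobe_mode,
usdt_prog_model, model_load, model_attach) are hypothetical and exist only
for illustration; they are not libbpf internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical model types; libbpf's real data structures differ. */
enum uprobe_mode { UPROBE_SINGLE, UPROBE_MULTI };

struct usdt_prog_model {
	enum uprobe_mode mode;	/* fixed once the program is loaded */
};

/* Load time: only kernel feature-probe results are known, not the PID. */
static inline void model_load(struct usdt_prog_model *p,
			      bool multi_supported, bool pid_filter_fixed)
{
	/* Be conservative: use multi-uprobes only if PID filtering also works. */
	p->mode = (multi_supported && pid_filter_fixed) ? UPROBE_MULTI
							: UPROBE_SINGLE;
}

/* Attach time: the PID is finally known, but the mode cannot change. */
static inline enum uprobe_mode model_attach(const struct usdt_prog_model *p,
					    pid_t pid)
{
	(void)pid;	/* the PID arrives too late to influence the mode */
	return p->mode;
}

/* Small usage demonstration of the model above. */
int usdt_model_demo(void)
{
	struct usdt_prog_model p;

	model_load(&p, true, false);	/* multi-uprobes exist, filtering broken */
	assert(model_attach(&p, 1234) == UPROBE_SINGLE);
	return 0;
}
```

The point of the sketch is only that the mode is chosen before any PID is
seen, which is why the choice has to be conservative.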
All the above implies that we have to make an early and conservative
call about the use of multi-uprobes. And so this patch modifies libbpf's
existing feature detector for multi-uprobe support to also check for
correct PID filtering. If PID filtering is not yet fixed, we fall back to
singular uprobes for USDTs.
This extension to feature detection is simple thanks to the kernel's
-EINVAL addition for pid < 0.
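As a rough illustration, the two-step decision made by the extended probe
can be modelled as a pure function of the two bpf_link_create() results.
The helper name multi_uprobe_usable() and its parameters are hypothetical;
the sketch mirrors the checks in the patch but makes no system calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of libbpf. Each fd/err pair stands for the
 * outcome of one bpf_link_create() attempt: fd < 0 on failure, err = -errno.
 */
static inline bool multi_uprobe_usable(int fd1, int err1, int fd2, int err2)
{
	/* Step 1: the first attempt must fail with -EBADF; any other
	 * outcome is treated as "multi-uprobes not supported".
	 */
	if (fd1 >= 0 || err1 != -EBADF)
		return false;

	/* Step 2: the retry with pid == -1 must fail with -EINVAL, which
	 * the patch takes as the sign of fixed PID filtering.
	 */
	return fd2 < 0 && err2 == -EINVAL;
}

/* Small usage demonstration of the model above. */
int multi_uprobe_model_demo(void)
{
	/* Kernel with fixed PID filtering: -EBADF, then -EINVAL. */
	assert(multi_uprobe_usable(-1, -EBADF, -1, -EINVAL));
	/* Kernel with broken PID filtering: second attempt is still -EBADF. */
	assert(!multi_uprobe_usable(-1, -EBADF, -1, -EBADF));
	return 0;
}
```

Keeping the decision separate from the system calls is only a convenience
for this sketch; the patch itself performs both attempts inline in
probe_uprobe_multi_link().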
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240521163401.3005045-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/lib/bpf/features.c | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/tools/lib/bpf/features.c b/tools/lib/bpf/features.c
index a336786a22a38..3df0125ed5fa7 100644
--- a/tools/lib/bpf/features.c
+++ b/tools/lib/bpf/features.c
@@ -392,11 +392,40 @@ static int probe_uprobe_multi_link(int token_fd)
link_fd = bpf_link_create(prog_fd, -1, BPF_TRACE_UPROBE_MULTI, &link_opts);
err = -errno; /* close() can clobber errno */
+ if (link_fd >= 0 || err != -EBADF) {
+ close(link_fd);
+ close(prog_fd);
+ return 0;
+ }
+
+ /* Initial multi-uprobe support in kernel didn't handle PID filtering
+ * correctly (it was doing thread filtering, not process filtering).
+ * So now we'll detect if PID filtering logic was fixed, and, if not,
+ * we'll pretend multi-uprobes are not supported.
+ * Multi-uprobes are used in USDT attachment logic, and we need to be
+ * conservative here, because multi-uprobe selection happens early at
+ * load time, while the use of PID filtering is known late at
+ * attachment time, at which point it's too late to undo multi-uprobe
+ * selection.
+ *
+ * Creating uprobe with pid == -1 for (invalid) '/' binary will fail
+ * early with -EINVAL on kernels with fixed PID filtering logic;
+ * otherwise -ESRCH would be returned if passed correct binary path
+ * (but we'll just get -EBADF, of course).
+ */
+ link_opts.uprobe_multi.pid = -1; /* invalid PID */
+ link_opts.uprobe_multi.path = "/"; /* invalid path */
+ link_opts.uprobe_multi.offsets = &offset;
+ link_opts.uprobe_multi.cnt = 1;
+
+ link_fd = bpf_link_create(prog_fd, -1, BPF_TRACE_UPROBE_MULTI, &link_opts);
+ err = -errno; /* close() can clobber errno */
+
if (link_fd >= 0)
close(link_fd);
close(prog_fd);
- return link_fd < 0 && err == -EBADF;
+ return link_fd < 0 && err == -EINVAL;
}
static int probe_kern_bpf_cookie(int token_fd)
--
2.43.0