* [PATCH bpf v4 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable()
@ 2026-05-14 18:47 Ihor Solodrai
2026-05-14 18:47 ` [PATCH bpf v4 1/3] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c Ihor Solodrai
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Ihor Solodrai @ 2026-05-14 18:47 UTC (permalink / raw)
To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Kumar Kartikeya Dwivedi
Cc: Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel,
kernel-team
I noticed the following deadlock splat on bpf-next when running a
subset of selftests/bpf:
#526 uprobe_multi_test:OK
#28 bpf_obj_id:OK
[ 27.561465]
[ 27.561603] ======================================================
[ 27.561899] WARNING: possible circular locking dependency detected
[ 27.562193] 7.0.0-rc5-g0eeb0094ba03 #3 Tainted: G OE
[ 27.562523] ------------------------------------------------------
[ 27.562820] uprobe_multi/561 is trying to acquire lock:
[ 27.563071] ff1100010768ba18 (&sb->s_type->i_mutex_key#8){++++}-{4:4}, at: netfs_start_io_read+0x24/0x110
[ 27.563556]
[ 27.563556] but task is already holding lock:
[ 27.563845] ff1100010b12a640 (&mm->mmap_lock){++++}-{4:4}, at: stack_map_get_build_id_offset+0xfa/0x650
[ 27.564296]
[ 27.564296] which lock already depends on the new lock.
[ 27.564296]
[ 27.564696]
[ 27.564696] the existing dependency chain (in reverse order) is:
[ 27.565044]
[ 27.565044] -> #1 (&mm->mmap_lock){++++}-{4:4}:
[ 27.565340] __might_fault+0xa3/0x100
[ 27.565572] _copy_to_iter+0xdf/0x1520
[ 27.565786] copy_page_to_iter+0x1c6/0x2d0
[ 27.566011] filemap_read+0x5f7/0xbf0
[ 27.566218] netfs_file_read_iter+0x1ca/0x240
[ 27.566453] vfs_read+0x482/0x9e0
[ 27.566658] ksys_read+0x115/0x1f0
[ 27.566852] do_syscall_64+0xde/0x390
[ 27.567057] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 27.567322]
[ 27.567322] -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}:
[ 27.567701] __lock_acquire+0x1529/0x2a10
[ 27.567923] lock_acquire+0xd7/0x280
[ 27.568124] down_read_interruptible+0x55/0x340
[ 27.568368] netfs_start_io_read+0x24/0x110
[ 27.568622] netfs_file_read_iter+0x191/0x240
[ 27.568858] __kernel_read+0x401/0x850
[ 27.569068] freader_fetch+0x19b/0x790
[ 27.569279] __build_id_parse+0xde/0x580
[ 27.569528] stack_map_get_build_id_offset+0x248/0x650
[ 27.569802] __bpf_get_stack+0x653/0x890
[ 27.570019] bpf_get_stack_sleepable+0x1d/0x30
[ 27.570257] bpf_prog_51bb4c0f31ae471b_uprobe_sleepable+0x2d/0x42
[ 27.570594] uprobe_prog_run+0x425/0x870
[ 27.570816] uprobe_multi_link_handler+0x58/0xb0
[ 27.571062] handler_chain+0x234/0x1270
[ 27.571275] uprobe_notify_resume+0x4de/0xd60
[ 27.571542] exit_to_user_mode_loop+0xe0/0x480
[ 27.571785] exc_int3+0x1bd/0x260
[ 27.571973] asm_exc_int3+0x39/0x40
[ 27.572169]
[ 27.572169] other info that might help us debug this:
[ 27.572169]
[ 27.572571] Possible unsafe locking scenario:
[ 27.572571]
[ 27.572854] CPU0 CPU1
[ 27.573074] ---- ----
[ 27.573293] rlock(&mm->mmap_lock);
[ 27.573502] lock(&sb->s_type->i_mutex_key#8);
[ 27.573846] lock(&mm->mmap_lock);
[ 27.574135] rlock(&sb->s_type->i_mutex_key#8);
[ 27.574364]
[ 27.574364] *** DEADLOCK ***
[ 27.574364]
[ 27.574671] 3 locks held by uprobe_multi/561:
[ 27.574884] #0: ffffffff852e0038 (rcu_tasks_trace_srcu_struct){....}-{0:0}, at: uprobe_notify_resume+0x229/0xd60
[ 27.575371] #1: ffffffff852e0038 (rcu_tasks_trace_srcu_struct){....}-{0:0}, at: uprobe_prog_run+0x239/0x870
[ 27.575874] #2: ff1100010b12a640 (&mm->mmap_lock){++++}-{4:4}, at: stack_map_get_build_id_offset+0xfa/0x650
[ 27.576342]
[ 27.576342] stack backtrace:
[ 27.576578] CPU: 7 UID: 0 PID: 561 Comm: uprobe_multi Tainted: G OE 7.0.0-rc5-g0eeb0094ba03 #3 PREEMPT(full)
[ 27.576586] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 27.576588] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-5.el9 11/05/2023
[ 27.576591] Call Trace:
[ 27.576595] <TASK>
[ 27.576598] dump_stack_lvl+0x54/0x70
[ 27.576606] print_circular_bug+0x2e8/0x300
[ 27.576616] check_noncircular+0x12e/0x150
[ 27.576628] __lock_acquire+0x1529/0x2a10
[ 27.576642] ? __pfx_walk_pgd_range+0x10/0x10
[ 27.576651] ? srso_return_thunk+0x5/0x5f
[ 27.576656] ? lock_acquire+0xd7/0x280
[ 27.576661] ? avc_has_perm_noaudit+0x38/0x1f0
[ 27.576672] lock_acquire+0xd7/0x280
[ 27.576677] ? netfs_start_io_read+0x24/0x110
[ 27.576685] ? srso_return_thunk+0x5/0x5f
[ 27.576693] ? netfs_start_io_read+0x24/0x110
[ 27.576698] down_read_interruptible+0x55/0x340
[ 27.576702] ? netfs_start_io_read+0x24/0x110
[ 27.576707] ? __pfx_cmp_ex_search+0x10/0x10
[ 27.576716] netfs_start_io_read+0x24/0x110
[ 27.576723] netfs_file_read_iter+0x191/0x240
[ 27.576731] __kernel_read+0x401/0x850
[ 27.576741] ? __pfx___kernel_read+0x10/0x10
[ 27.576752] ? __pfx_fixup_exception+0x10/0x10
[ 27.576761] ? srso_return_thunk+0x5/0x5f
[ 27.576766] ? lock_is_held_type+0x76/0x100
[ 27.576773] ? srso_return_thunk+0x5/0x5f
[ 27.576777] ? mtree_range_walk+0x421/0x6c0
[ 27.576785] freader_fetch+0x19b/0x790
[ 27.576793] ? mt_find+0x14b/0x340
[ 27.576801] ? __pfx_freader_fetch+0x10/0x10
[ 27.576805] ? mt_find+0x29a/0x340
[ 27.576810] ? mt_find+0x14b/0x340
[ 27.576820] __build_id_parse+0xde/0x580
[ 27.576832] ? __pfx___build_id_parse+0x10/0x10
[ 27.576850] stack_map_get_build_id_offset+0x248/0x650
[ 27.576863] __bpf_get_stack+0x653/0x890
[ 27.576867] ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 27.576880] ? __pfx___bpf_get_stack+0x10/0x10
[ 27.576884] ? lock_acquire+0xd7/0x280
[ 27.576889] ? uprobe_prog_run+0x239/0x870
[ 27.576895] ? srso_return_thunk+0x5/0x5f
[ 27.576899] ? stack_trace_save+0xae/0x100
[ 27.576908] bpf_get_stack_sleepable+0x1d/0x30
[ 27.576913] bpf_prog_51bb4c0f31ae471b_uprobe_sleepable+0x2d/0x42
[ 27.576919] uprobe_prog_run+0x425/0x870
[ 27.576927] ? srso_return_thunk+0x5/0x5f
[ 27.576933] ? __pfx_uprobe_prog_run+0x10/0x10
[ 27.576937] ? uprobe_notify_resume+0x413/0xd60
[ 27.576952] uprobe_multi_link_handler+0x58/0xb0
[ 27.576960] handler_chain+0x234/0x1270
[ 27.576976] ? __pfx_handler_chain+0x10/0x10
[ 27.576988] ? srso_return_thunk+0x5/0x5f
[ 27.576992] ? __kasan_kmalloc+0x72/0x90
[ 27.576998] ? __pfx_ri_timer+0x10/0x10
[ 27.577003] ? timer_init_key+0xf5/0x200
[ 27.577013] uprobe_notify_resume+0x4de/0xd60
[ 27.577025] ? uprobe_notify_resume+0x229/0xd60
[ 27.577030] ? __pfx_uprobe_notify_resume+0x10/0x10
[ 27.577035] ? srso_return_thunk+0x5/0x5f
[ 27.577039] ? atomic_notifier_call_chain+0xfc/0x110
[ 27.577047] ? srso_return_thunk+0x5/0x5f
[ 27.577051] ? notify_die+0xe5/0x130
[ 27.577056] ? __pfx_notify_die+0x10/0x10
[ 27.577063] ? srso_return_thunk+0x5/0x5f
[ 27.577071] exit_to_user_mode_loop+0xe0/0x480
[ 27.577076] ? asm_exc_int3+0x39/0x40
[ 27.577083] exc_int3+0x1bd/0x260
[ 27.577090] asm_exc_int3+0x39/0x40
[ 27.577095] RIP: 0033:0x6c4180
[ 27.577100] Code: c6 05 0b 6f 32 00 01 5d c3 90 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa eb 8a 66 2e 0f 1f 84 00 00 00 00 00 <cc> 48 89 e5 31 c0 5d c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 31 c0
[ 27.577104] RSP: 002b:00007ffcbf71ada8 EFLAGS: 00000246
[ 27.577109] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007fec9163db7b
[ 27.577112] RDX: 00007ffcbf71adbf RSI: 0000000000001000 RDI: 00000000007f0000
[ 27.577115] RBP: 00007ffcbf71add0 R08: 00007fec91731330 R09: 00007fec9178d060
[ 27.577118] R10: 00007fec9153ad98 R11: 0000000000000202 R12: 00007ffcbf71af08
[ 27.577120] R13: 0000000000787760 R14: 00000000009eade8 R15: 00007fec917bf000
[ 27.577135] </TASK>
#54/1 build_id/nofault-paged-out:OK
The splat was reproducible, which allowed me to bisect it to commit
777a8560fd29 ("lib/buildid: use __kernel_read() for sleepable context").
I then used Codex to analyze the splat, the offending patch and the
relevant lore discussions, and asked it to produce a draft patch, which
served as the starting point for this series.
I verified that the patch series fixes the deadlock splat.
---
v3->v4:
* Change Fixes tag in patch #2 (AI)
* Nit in caching implementation (Mykyta)
v3: https://lore.kernel.org/bpf/20260512032906.2670326-1-ihor.solodrai@linux.dev/
v2->v3:
* Split patch #2 in two: stack_map_get_build_id_offset_sleepable()
implementation, and then introduce caching
* Drop taking mmap_lock if CONFIG_PER_VMA_LOCK=n, fall back to raw
IPs instead
* Cache vm_{start,end} in addition to prev_file (Mykyta)
v2: https://lore.kernel.org/bpf/20260409010604.1439087-1-ihor.solodrai@linux.dev/
v1->v2:
* Addressed feedback from Puranjay:
* split out a small refactoring patch
* use mmap_read_trylock()
* take into account CONFIG_PER_VMA_LOCK
* replace find_vma() with vma_lookup()
* cache prev_build_id to avoid re-parsing the same file
* Snapshot vm_pgoff and vm_start before unlocking (AI)
* To avoid repetitive unlocking statements, introduce struct
stack_map_vma_lock to hold relevant lock state info and add an
unlock helper
v1: https://lore.kernel.org/bpf/20260407223003.720428-1-ihor.solodrai@linux.dev/
---
Ihor Solodrai (3):
bpf: Factor out stack_map_build_id_set_ip() in stackmap.c
bpf: Avoid faultable build ID reads under mm locks
bpf: Cache build IDs in sleepable stackmap path
kernel/bpf/stackmap.c | 176 +++++++++++++++++++++++++++++++++++++++---
1 file changed, 167 insertions(+), 9 deletions(-)
--
2.54.0
* [PATCH bpf v4 1/3] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c
2026-05-14 18:47 [PATCH bpf v4 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai
@ 2026-05-14 18:47 ` Ihor Solodrai
2026-05-14 18:47 ` [PATCH bpf v4 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai
2026-05-14 18:47 ` [PATCH bpf v4 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai
2 siblings, 0 replies; 10+ messages in thread
From: Ihor Solodrai @ 2026-05-14 18:47 UTC (permalink / raw)
To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Kumar Kartikeya Dwivedi
Cc: Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel,
kernel-team
Factor out a small helper from stack_map_get_build_id_offset() in
preparation for adding a sleepable build ID resolution path.
No functional changes.
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
---
kernel/bpf/stackmap.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index da3d328f5c15..4ef0fd06cea5 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -152,6 +152,12 @@ static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, b
: build_id_parse_nofault(vma, build_id, NULL);
}
+static inline void stack_map_build_id_set_ip(struct bpf_stack_build_id *id)
+{
+ id->status = BPF_STACK_BUILD_ID_IP;
+ memset(id->build_id, 0, BUILD_ID_SIZE_MAX);
+}
+
/*
* Expects all id_offs[i].ip values to be set to correct initial IPs.
* They will be subsequently:
@@ -165,23 +171,21 @@ static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, b
static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
u32 trace_nr, bool user, bool may_fault)
{
- int i;
struct mmap_unlock_irq_work *work = NULL;
bool irq_work_busy = bpf_mmap_unlock_get_irq_work(&work);
+ bool has_user_ctx = user && current && current->mm;
struct vm_area_struct *vma, *prev_vma = NULL;
const char *prev_build_id;
+ int i;
/* If the irq_work is in use, fall back to report ips. Same
* fallback is used for kernel stack (!user) on a stackmap with
* build_id.
*/
- if (!user || !current || !current->mm || irq_work_busy ||
- !mmap_read_trylock(current->mm)) {
+ if (!has_user_ctx || irq_work_busy || !mmap_read_trylock(current->mm)) {
/* cannot access current->mm, fall back to ips */
- for (i = 0; i < trace_nr; i++) {
- id_offs[i].status = BPF_STACK_BUILD_ID_IP;
- memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX);
- }
+ for (i = 0; i < trace_nr; i++)
+ stack_map_build_id_set_ip(&id_offs[i]);
return;
}
@@ -196,8 +200,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
vma = find_vma(current->mm, ip);
if (!vma || fetch_build_id(vma, id_offs[i].build_id, may_fault)) {
/* per entry fall back to ips */
- id_offs[i].status = BPF_STACK_BUILD_ID_IP;
- memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX);
+ stack_map_build_id_set_ip(&id_offs[i]);
continue;
}
build_id_valid:
--
2.54.0
* [PATCH bpf v4 2/3] bpf: Avoid faultable build ID reads under mm locks
2026-05-14 18:47 [PATCH bpf v4 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai
2026-05-14 18:47 ` [PATCH bpf v4 1/3] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c Ihor Solodrai
@ 2026-05-14 18:47 ` Ihor Solodrai
2026-05-14 19:37 ` bot+bpf-ci
` (2 more replies)
2026-05-14 18:47 ` [PATCH bpf v4 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai
2 siblings, 3 replies; 10+ messages in thread
From: Ihor Solodrai @ 2026-05-14 18:47 UTC (permalink / raw)
To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Kumar Kartikeya Dwivedi
Cc: Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel,
kernel-team
Sleepable build ID parsing can block in __kernel_read() [1], so the
stackmap sleepable path must not call it while holding mmap_lock or a
per-VMA read lock.
The issue and the fix are conceptually similar to a recent procfs
patch [2].
Resolve the VMA covering each IP with a stable read-side reference, preferring
lock_vma_under_rcu() and falling back to mmap_read_trylock() only long
enough to acquire the VMA read lock. Take a reference to the backing
file, drop the VMA lock, and then parse the build ID through
(sleepable) build_id_parse_file().
[1]: https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/
[2]: https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/
Fixes: d4dd9775ec24 ("bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers")
Suggested-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
---
kernel/bpf/stackmap.c | 105 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 105 insertions(+)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 4ef0fd06cea5..c1e96df360c3 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -9,6 +9,7 @@
#include <linux/perf_event.h>
#include <linux/btf_ids.h>
#include <linux/buildid.h>
+#include <linux/mmap_lock.h>
#include "percpu_freelist.h"
#include "mmap_unlock_work.h"
@@ -158,6 +159,105 @@ static inline void stack_map_build_id_set_ip(struct bpf_stack_build_id *id)
memset(id->build_id, 0, BUILD_ID_SIZE_MAX);
}
+struct stack_map_vma_lock {
+ bool vma_locked;
+ struct vm_area_struct *vma;
+ struct mm_struct *mm;
+};
+
+static struct vm_area_struct *
+stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip)
+{
+ struct mm_struct *mm = lock->mm;
+ struct vm_area_struct *vma;
+
+ if (WARN_ON_ONCE(!mm))
+ return NULL;
+
+ vma = lock_vma_under_rcu(mm, ip);
+ if (vma)
+ goto vma_locked;
+
+ if (!mmap_read_trylock(mm))
+ return NULL;
+
+ vma = vma_lookup(mm, ip);
+ if (!vma) {
+ mmap_read_unlock(mm);
+ return NULL;
+ }
+
+#ifdef CONFIG_PER_VMA_LOCK
+ if (!vma_start_read_locked(vma)) {
+ mmap_read_unlock(mm);
+ return NULL;
+ }
+ mmap_read_unlock(mm);
+#else
+ mmap_read_unlock(mm);
+ return NULL;
+#endif
+vma_locked:
+ lock->vma_locked = true;
+ lock->vma = vma;
+ return vma;
+}
+
+static void stack_map_unlock_vma(struct stack_map_vma_lock *lock)
+{
+ struct vm_area_struct *vma = lock->vma;
+
+ if (lock->vma_locked) {
+ if (WARN_ON_ONCE(!vma))
+ goto out;
+ vma_end_read(vma);
+ }
+out:
+ lock->vma_locked = false;
+ lock->vma = NULL;
+}
+
+static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs,
+ u32 trace_nr)
+{
+ struct mm_struct *mm = current->mm;
+ struct stack_map_vma_lock lock = {
+ .vma_locked = false,
+ .vma = NULL,
+ .mm = mm,
+ };
+ unsigned long vm_pgoff, vm_start;
+ struct vm_area_struct *vma;
+ struct file *file;
+ u64 ip;
+
+ for (u32 i = 0; i < trace_nr; i++) {
+ ip = READ_ONCE(id_offs[i].ip);
+ vma = stack_map_lock_vma(&lock, ip);
+ if (!vma || !vma->vm_file) {
+ stack_map_build_id_set_ip(&id_offs[i]);
+ stack_map_unlock_vma(&lock);
+ continue;
+ }
+
+ file = get_file(vma->vm_file);
+ vm_pgoff = vma->vm_pgoff;
+ vm_start = vma->vm_start;
+ stack_map_unlock_vma(&lock);
+
+ /* build_id_parse_file() may block on filesystem reads */
+ if (build_id_parse_file(file, id_offs[i].build_id, NULL)) {
+ stack_map_build_id_set_ip(&id_offs[i]);
+ fput(file);
+ continue;
+ }
+ fput(file);
+
+ id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start;
+ id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
+ }
+}
+
/*
* Expects all id_offs[i].ip values to be set to correct initial IPs.
* They will be subsequently:
@@ -178,6 +278,11 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
const char *prev_build_id;
int i;
+ if (may_fault && has_user_ctx) {
+ stack_map_get_build_id_offset_sleepable(id_offs, trace_nr);
+ return;
+ }
+
/* If the irq_work is in use, fall back to report ips. Same
* fallback is used for kernel stack (!user) on a stackmap with
* build_id.
--
2.54.0
* [PATCH bpf v4 3/3] bpf: Cache build IDs in sleepable stackmap path
2026-05-14 18:47 [PATCH bpf v4 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai
2026-05-14 18:47 ` [PATCH bpf v4 1/3] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c Ihor Solodrai
2026-05-14 18:47 ` [PATCH bpf v4 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai
@ 2026-05-14 18:47 ` Ihor Solodrai
2026-05-14 19:37 ` bot+bpf-ci
2 siblings, 1 reply; 10+ messages in thread
From: Ihor Solodrai @ 2026-05-14 18:47 UTC (permalink / raw)
To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Kumar Kartikeya Dwivedi
Cc: Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel,
kernel-team
Stack traces often contain adjacent IPs from the same VMA or from
different VMAs backed by the same ELF file. Cache the last successfully
parsed build ID together with the resolved VMA range and backing file
so the sleepable build-ID path can avoid repeated VMA locking and file
parsing in common cases.
Suggested-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
---
kernel/bpf/stackmap.c | 56 ++++++++++++++++++++++++++++++++++++++++---
1 file changed, 53 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index c1e96df360c3..8a18c9645933 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -226,13 +226,33 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i
.vma = NULL,
.mm = mm,
};
- unsigned long vm_pgoff, vm_start;
+ struct {
+ struct file *file;
+ const unsigned char *build_id;
+ unsigned long vm_start;
+ unsigned long vm_end;
+ unsigned long vm_pgoff;
+ } cache = {};
+ unsigned long vm_pgoff, vm_start, vm_end;
struct vm_area_struct *vma;
struct file *file;
u64 ip;
for (u32 i = 0; i < trace_nr; i++) {
ip = READ_ONCE(id_offs[i].ip);
+
+ /*
+ * Range cache fast path: if ip falls within the previously
+	 * resolved VMA range, reuse the cached build_id without
+ * re-acquiring the VMA lock.
+ */
+ if (cache.build_id && ip >= cache.vm_start && ip < cache.vm_end) {
+ vm_start = cache.vm_start;
+ vm_end = cache.vm_end;
+ vm_pgoff = cache.vm_pgoff;
+ goto build_id_valid;
+ }
+
vma = stack_map_lock_vma(&lock, ip);
if (!vma || !vma->vm_file) {
stack_map_build_id_set_ip(&id_offs[i]);
@@ -240,9 +260,21 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i
continue;
}
- file = get_file(vma->vm_file);
+ file = vma->vm_file;
vm_pgoff = vma->vm_pgoff;
vm_start = vma->vm_start;
+ vm_end = vma->vm_end;
+
+ if (file == cache.file) {
+ /*
+ * Same backing file as previous (e.g. different VMAs
+			 * of the same ELF binary). Reuse the cached build_id.
+ */
+ stack_map_unlock_vma(&lock);
+ goto build_id_valid;
+ }
+
+ file = get_file(file);
stack_map_unlock_vma(&lock);
/* build_id_parse_file() may block on filesystem reads */
@@ -251,11 +283,29 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i
fput(file);
continue;
}
- fput(file);
+ if (cache.file)
+ fput(cache.file);
+ cache.file = file;
+ cache.build_id = id_offs[i].build_id;
+
+build_id_valid:
+ /*
+ * In the slow path cache.build_id points to id_offs[i].build_id.
+ * Cache hits leave cache.build_id pointing at a prior slot,
+ * triggering the memcpy here.
+ */
+ if (cache.build_id != id_offs[i].build_id)
+ memcpy(id_offs[i].build_id, cache.build_id, BUILD_ID_SIZE_MAX);
+ cache.vm_start = vm_start;
+ cache.vm_end = vm_end;
+ cache.vm_pgoff = vm_pgoff;
id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start;
id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
}
+
+ if (cache.file)
+ fput(cache.file);
}
/*
--
2.54.0
* Re: [PATCH bpf v4 2/3] bpf: Avoid faultable build ID reads under mm locks
2026-05-14 18:47 ` [PATCH bpf v4 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai
@ 2026-05-14 19:37 ` bot+bpf-ci
2026-05-14 20:37 ` Ihor Solodrai
2026-05-14 20:47 ` Shakeel Butt
2026-05-14 21:31 ` sashiko-bot
2 siblings, 1 reply; 10+ messages in thread
From: bot+bpf-ci @ 2026-05-14 19:37 UTC (permalink / raw)
To: ihor.solodrai, ast, andrii, daniel, eddyz87, memxor
Cc: puranjay, shakeel.butt, mykyta.yatsenko5, bpf, linux-kernel,
kernel-team, ast, andrii, daniel, martin.lau, eddyz87,
yonghong.song, clm, ihor.solodrai
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 4ef0fd06cea5..c1e96df360c3 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -9,6 +9,7 @@
> #include <linux/perf_event.h>
> #include <linux/btf_ids.h>
> #include <linux/buildid.h>
> +#include <linux/mmap_lock.h>
> #include "percpu_freelist.h"
> #include "mmap_unlock_work.h"
[ ... ]
> @@ -158,6 +159,105 @@ static inline void stack_map_build_id_set_ip(struct bpf_stack_build_id *id)
> memset(id->build_id, 0, BUILD_ID_SIZE_MAX);
> }
>
> +struct stack_map_vma_lock {
> + bool vma_locked;
> + struct vm_area_struct *vma;
> + struct mm_struct *mm;
> +};
> +
> +static struct vm_area_struct *
> +stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip)
> +{
> + struct mm_struct *mm = lock->mm;
> + struct vm_area_struct *vma;
> +
> + if (WARN_ON_ONCE(!mm))
> + return NULL;
> +
> + vma = lock_vma_under_rcu(mm, ip);
> + if (vma)
> + goto vma_locked;
> +
> + if (!mmap_read_trylock(mm))
> + return NULL;
> +
> + vma = vma_lookup(mm, ip);
> + if (!vma) {
> + mmap_read_unlock(mm);
> + return NULL;
> + }
> +
> +#ifdef CONFIG_PER_VMA_LOCK
> + if (!vma_start_read_locked(vma)) {
> + mmap_read_unlock(mm);
> + return NULL;
> + }
> + mmap_read_unlock(mm);
> +#else
> + mmap_read_unlock(mm);
> + return NULL;
> +#endif
> +vma_locked:
> + lock->vma_locked = true;
> + lock->vma = vma;
> + return vma;
> +}
> +
> +static void stack_map_unlock_vma(struct stack_map_vma_lock *lock)
> +{
> + struct vm_area_struct *vma = lock->vma;
> +
> + if (lock->vma_locked) {
> + if (WARN_ON_ONCE(!vma))
> + goto out;
> + vma_end_read(vma);
> + }
> +out:
> + lock->vma_locked = false;
> + lock->vma = NULL;
> +}
> +
> +static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs,
> + u32 trace_nr)
> +{
> + struct mm_struct *mm = current->mm;
> + struct stack_map_vma_lock lock = {
> + .vma_locked = false,
> + .vma = NULL,
> + .mm = mm,
> + };
> + unsigned long vm_pgoff, vm_start;
> + struct vm_area_struct *vma;
> + struct file *file;
> + u64 ip;
> +
> + for (u32 i = 0; i < trace_nr; i++) {
> + ip = READ_ONCE(id_offs[i].ip);
> + vma = stack_map_lock_vma(&lock, ip);
> + if (!vma || !vma->vm_file) {
> + stack_map_build_id_set_ip(&id_offs[i]);
> + stack_map_unlock_vma(&lock);
> + continue;
> + }
Is the vm_file check here correct for classifying anonymous VMAs?
Per mm-vma.md, vma->vm_file is not a reliable indicator of file-backed
status. Some anonymous VMAs carry a vm_file reference, such as private
mappings of /dev/zero set via vma_set_anonymous() which sets vm_ops
= NULL but leaves vm_file populated.
The canonical test is vma_is_anonymous(vma) (checking !vma->vm_ops).
For a private /dev/zero mapping, this code would pass /dev/zero to
build_id_parse_file() instead of immediately falling back to the raw IP.
This was flagged in v3 review by bot+bpf-ci@kernel.org and acknowledged
by the author as a minor edge case but left unchanged for consistency
with existing code elsewhere in stackmap.c:
https://lore.kernel.org/bpf/1f275575e3c2309dc7b7f099c7e5cd47bb25182d94883ca0893cf4aaf03c954c@mail.kernel.org/
Should this be switched to vma_is_anonymous(), or is it preferable to
keep the current pattern and address it later across all of stackmap.c?
> +
> + file = get_file(vma->vm_file);
> + vm_pgoff = vma->vm_pgoff;
> + vm_start = vma->vm_start;
> + stack_map_unlock_vma(&lock);
> +
> + /* build_id_parse_file() may block on filesystem reads */
> + if (build_id_parse_file(file, id_offs[i].build_id, NULL)) {
> + stack_map_build_id_set_ip(&id_offs[i]);
> + fput(file);
> + continue;
> + }
> + fput(file);
> +
> + id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start;
> + id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
> + }
> +}
[ ... ]
> @@ -178,6 +278,11 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
> const char *prev_build_id;
> int i;
>
> + if (may_fault && has_user_ctx) {
> + stack_map_get_build_id_offset_sleepable(id_offs, trace_nr);
> + return;
> + }
> +
> /* If the irq_work is in use, fall back to report ips. Same
> * fallback is used for kernel stack (!user) on a stackmap with
> * build_id.
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25879521172
* Re: [PATCH bpf v4 3/3] bpf: Cache build IDs in sleepable stackmap path
2026-05-14 18:47 ` [PATCH bpf v4 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai
@ 2026-05-14 19:37 ` bot+bpf-ci
0 siblings, 0 replies; 10+ messages in thread
From: bot+bpf-ci @ 2026-05-14 19:37 UTC (permalink / raw)
To: ihor.solodrai, ast, andrii, daniel, eddyz87, memxor
Cc: puranjay, shakeel.butt, mykyta.yatsenko5, bpf, linux-kernel,
kernel-team, ast, andrii, daniel, martin.lau, eddyz87,
yonghong.song, clm, ihor.solodrai
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index c1e96df360c3..8a18c9645933 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -226,13 +226,33 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i
> .vma = NULL,
> .mm = mm,
> };
> - unsigned long vm_pgoff, vm_start;
> + struct {
> + struct file *file;
> + const unsigned char *build_id;
> + unsigned long vm_start;
> + unsigned long vm_end;
> + unsigned long vm_pgoff;
> + } cache = {};
> + unsigned long vm_pgoff, vm_start, vm_end;
> struct vm_area_struct *vma;
> struct file *file;
> u64 ip;
>
> for (u32 i = 0; i < trace_nr; i++) {
> ip = READ_ONCE(id_offs[i].ip);
> +
> + /*
> + * Range cache fast path: if ip falls within the previously
> + * resolved VMA range, reuse the cached build_id without
> + * re-acquiring the VMA lock.
> + */
> + if (cache.build_id && ip >= cache.vm_start && ip < cache.vm_end) {
> + vm_start = cache.vm_start;
> + vm_end = cache.vm_end;
> + vm_pgoff = cache.vm_pgoff;
> + goto build_id_valid;
> + }
> +
> vma = stack_map_lock_vma(&lock, ip);
> if (!vma || !vma->vm_file) {
> stack_map_build_id_set_ip(&id_offs[i]);
During v3 review, sashiko-bot@kernel.org raised a concern about VMA
classification that remains unaddressed. While the !vma->vm_file check
predates this patch, could this cause issues with anonymous VMAs that
still have a backing file?
For example, private mappings of /dev/zero or driver memory can have a
non-NULL vma->vm_file but are still considered anonymous. If the
stackmap path proceeds to parse the ELF build ID from these files,
could it trigger unintended side effects, such as consuming FIFO data
or altering hardware state when reading from character devices?
Would it be safer to check vma_is_anonymous(vma) here to properly
classify the VMA?
Reference: https://lore.kernel.org/bpf/20260513043154.24483C2BCB7@smtp.kernel.org/
> @@ -240,9 +260,21 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i
> continue;
> }
>
> - file = get_file(vma->vm_file);
> + file = vma->vm_file;
> vm_pgoff = vma->vm_pgoff;
> vm_start = vma->vm_start;
> + vm_end = vma->vm_end;
> +
> + if (file == cache.file) {
> + /*
> + * Same backing file as previous (e.g. different VMAs
> + * of the same ELF binary). Reuse the cached build_id.
> + */
> + stack_map_unlock_vma(&lock);
> + goto build_id_valid;
> + }
> +
> + file = get_file(file);
> stack_map_unlock_vma(&lock);
>
> /* build_id_parse_file() may block on filesystem reads */
> @@ -251,11 +283,29 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i
> fput(file);
> continue;
> }
> - fput(file);
>
> + if (cache.file)
> + fput(cache.file);
> + cache.file = file;
> + cache.build_id = id_offs[i].build_id;
> +
> +build_id_valid:
> + /*
> + * In the slow path cache.build_id points to id_offs[i].build_id.
> + * Cache hits leave cache.build_id pointing at a prior slot,
> + * triggering the memcpy here.
> + */
> + if (cache.build_id != id_offs[i].build_id)
> + memcpy(id_offs[i].build_id, cache.build_id, BUILD_ID_SIZE_MAX);
> + cache.vm_start = vm_start;
> + cache.vm_end = vm_end;
> + cache.vm_pgoff = vm_pgoff;
> id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start;
> id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
> }
> +
> + if (cache.file)
> + fput(cache.file);
> }
>
> /*
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25879521172
* Re: [PATCH bpf v4 2/3] bpf: Avoid faultable build ID reads under mm locks
2026-05-14 19:37 ` bot+bpf-ci
@ 2026-05-14 20:37 ` Ihor Solodrai
0 siblings, 0 replies; 10+ messages in thread
From: Ihor Solodrai @ 2026-05-14 20:37 UTC (permalink / raw)
To: bot+bpf-ci, ast, andrii, daniel, eddyz87, memxor
Cc: puranjay, shakeel.butt, mykyta.yatsenko5, bpf, linux-kernel,
kernel-team, martin.lau, yonghong.song, clm
On 5/14/26 12:37 PM, bot+bpf-ci@kernel.org wrote:
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index 4ef0fd06cea5..c1e96df360c3 100644
>> --- a/kernel/bpf/stackmap.c
>> +++ b/kernel/bpf/stackmap.c
>> @@ -9,6 +9,7 @@
>> #include <linux/perf_event.h>
>> #include <linux/btf_ids.h>
>> #include <linux/buildid.h>
>> +#include <linux/mmap_lock.h>
>> #include "percpu_freelist.h"
>> #include "mmap_unlock_work.h"
>
> [ ... ]
>
>> @@ -158,6 +159,105 @@ static inline void stack_map_build_id_set_ip(struct bpf_stack_build_id *id)
>> memset(id->build_id, 0, BUILD_ID_SIZE_MAX);
>> }
>>
>> +struct stack_map_vma_lock {
>> + bool vma_locked;
>> + struct vm_area_struct *vma;
>> + struct mm_struct *mm;
>> +};
>> +
>> +static struct vm_area_struct *
>> +stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip)
>> +{
>> + struct mm_struct *mm = lock->mm;
>> + struct vm_area_struct *vma;
>> +
>> + if (WARN_ON_ONCE(!mm))
>> + return NULL;
>> +
>> + vma = lock_vma_under_rcu(mm, ip);
>> + if (vma)
>> + goto vma_locked;
>> +
>> + if (!mmap_read_trylock(mm))
>> + return NULL;
>> +
>> + vma = vma_lookup(mm, ip);
>> + if (!vma) {
>> + mmap_read_unlock(mm);
>> + return NULL;
>> + }
>> +
>> +#ifdef CONFIG_PER_VMA_LOCK
>> + if (!vma_start_read_locked(vma)) {
>> + mmap_read_unlock(mm);
>> + return NULL;
>> + }
>> + mmap_read_unlock(mm);
>> +#else
>> + mmap_read_unlock(mm);
>> + return NULL;
>> +#endif
>> +vma_locked:
>> + lock->vma_locked = true;
>> + lock->vma = vma;
>> + return vma;
>> +}
>> +
>> +static void stack_map_unlock_vma(struct stack_map_vma_lock *lock)
>> +{
>> + struct vm_area_struct *vma = lock->vma;
>> +
>> + if (lock->vma_locked) {
>> + if (WARN_ON_ONCE(!vma))
>> + goto out;
>> + vma_end_read(vma);
>> + }
>> +out:
>> + lock->vma_locked = false;
>> + lock->vma = NULL;
>> +}
>> +
>> +static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs,
>> + u32 trace_nr)
>> +{
>> + struct mm_struct *mm = current->mm;
>> + struct stack_map_vma_lock lock = {
>> + .vma_locked = false,
>> + .vma = NULL,
>> + .mm = mm,
>> + };
>> + unsigned long vm_pgoff, vm_start;
>> + struct vm_area_struct *vma;
>> + struct file *file;
>> + u64 ip;
>> +
>> + for (u32 i = 0; i < trace_nr; i++) {
>> + ip = READ_ONCE(id_offs[i].ip);
>> + vma = stack_map_lock_vma(&lock, ip);
>> + if (!vma || !vma->vm_file) {
>> + stack_map_build_id_set_ip(&id_offs[i]);
>> + stack_map_unlock_vma(&lock);
>> + continue;
>> + }
>
> Is the vm_file check here correct for classifying anonymous VMAs?
> Per mm-vma.md, vma->vm_file is not a reliable indicator of file-backed
> status. Some anonymous VMAs carry a vm_file reference, such as private
> mappings of /dev/zero set via vma_set_anonymous() which sets vm_ops
> = NULL but leaves vm_file populated.
>
> The canonical test is vma_is_anonymous(vma) (checking !vma->vm_ops).
> For a private /dev/zero mapping, this code would pass /dev/zero to
> build_id_parse_file() instead of immediately falling back to the raw IP.
>
> This was flagged in v3 review by bot+bpf-ci@kernel.org and acknowledged
> by the author as a minor edge case but left unchanged for consistency
> with existing code elsewhere in stackmap.c:
>
> https://lore.kernel.org/bpf/1f275575e3c2309dc7b7f099c7e5cd47bb25182d94883ca0893cf4aaf03c954c@mail.kernel.org/
>
> Should this be switched to vma_is_anonymous(), or is it preferable to
> keep the current pattern and address it later across all of stackmap.c?
I wonder why the bots are so concerned.
If they get support from at least one human, I'll respin with an
additional patch adding vma_is_anonymous() checks.
>
>> +
>> + file = get_file(vma->vm_file);
>> + vm_pgoff = vma->vm_pgoff;
>> + vm_start = vma->vm_start;
>> + stack_map_unlock_vma(&lock);
>> +
>> + /* build_id_parse_file() may block on filesystem reads */
>> + if (build_id_parse_file(file, id_offs[i].build_id, NULL)) {
>> + stack_map_build_id_set_ip(&id_offs[i]);
>> + fput(file);
>> + continue;
>> + }
>> + fput(file);
>> +
>> + id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start;
>> + id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
>> + }
>> +}
>
> [ ... ]
>
>> @@ -178,6 +278,11 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
>> const char *prev_build_id;
>> int i;
>>
>> + if (may_fault && has_user_ctx) {
>> + stack_map_get_build_id_offset_sleepable(id_offs, trace_nr);
>> + return;
>> + }
>> +
>> /* If the irq_work is in use, fall back to report ips. Same
>> * fallback is used for kernel stack (!user) on a stackmap with
>> * build_id.
>
>
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25879521172
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH bpf v4 2/3] bpf: Avoid faultable build ID reads under mm locks
2026-05-14 18:47 ` [PATCH bpf v4 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai
2026-05-14 19:37 ` bot+bpf-ci
@ 2026-05-14 20:47 ` Shakeel Butt
2026-05-14 21:31 ` Ihor Solodrai
2026-05-14 21:31 ` sashiko-bot
2 siblings, 1 reply; 10+ messages in thread
From: Shakeel Butt @ 2026-05-14 20:47 UTC (permalink / raw)
To: Ihor Solodrai
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Puranjay Mohan,
Mykyta Yatsenko, bpf, linux-kernel, kernel-team
On Thu, May 14, 2026 at 11:47:26AM -0700, Ihor Solodrai wrote:
> Sleepable build ID parsing can block in __kernel_read() [1], so the
> stackmap sleepable path must not call it while holding mmap_lock or a
> per-VMA read lock.
>
> The issue and the fix are conceptually similar to a recent procfs
> patch [2].
>
> Resolve each covered VMA with a stable read-side reference, preferring
> lock_vma_under_rcu() and falling back to mmap_read_trylock()
Why trylock()? Why not just reuse the mechanism introduced in [2]? That is,
abstract out the mechanism introduced in [2] into mm core and reuse it.
> only long
> enough to acquire the VMA read lock. Take a reference to the backing
> file, drop the VMA lock, and then parse the build ID through
> (sleepable) build_id_parse_file().
>
> [1]: https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/
> [2]: https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH bpf v4 2/3] bpf: Avoid faultable build ID reads under mm locks
2026-05-14 18:47 ` [PATCH bpf v4 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai
2026-05-14 19:37 ` bot+bpf-ci
2026-05-14 20:47 ` Shakeel Butt
@ 2026-05-14 21:31 ` sashiko-bot
2 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-05-14 21:31 UTC (permalink / raw)
To: Ihor Solodrai; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Sleepable build ID parsing unconditionally fails on platforms without CONFIG_PER_VMA_LOCK.
- [High] Incorrect file-backed VMA classification using !vma->vm_file instead of vma_is_anonymous(vma).
- [Medium] Severe I/O performance regression due to removal of VMA and build ID caching.
--
commit 5e68cb2c06b22387698d796c574ee7addecfea18
Author: Ihor Solodrai <ihor.solodrai@linux.dev>
bpf: Avoid faultable build ID reads under mm locks
This commit introduces a sleepable path for build ID parsing to avoid
blocking in __kernel_read() while holding mmap_lock or a per-VMA read lock.
Link: https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/
Link: https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 4ef0fd06cea58..c1e96df360c3f 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
[ ... ]
> +static struct vm_area_struct *
> +stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip)
> +{
[ ... ]
> + vma = vma_lookup(mm, ip);
> + if (!vma) {
> + mmap_read_unlock(mm);
> + return NULL;
> + }
> +
> +#ifdef CONFIG_PER_VMA_LOCK
> + if (!vma_start_read_locked(vma)) {
> + mmap_read_unlock(mm);
> + return NULL;
> + }
> + mmap_read_unlock(mm);
> +#else
> + mmap_read_unlock(mm);
> + return NULL;
> +#endif
Does this unconditionally fail on platforms without CONFIG_PER_VMA_LOCK? If
the fallback path succeeds in mmap_read_trylock() and locates the VMA, it
looks like it will still drop the lock and return NULL in the #else block.
Could the locking abstraction be extended to hold the mmap_read_lock across
the get_file() call when per-VMA locks are unavailable?
> +vma_locked:
> + lock->vma_locked = true;
> + lock->vma = vma;
> + return vma;
> +}
[ ... ]
> +static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs,
> + u32 trace_nr)
> +{
> + struct mm_struct *mm = current->mm;
> + struct stack_map_vma_lock lock = {
> + .vma_locked = false,
> + .vma = NULL,
> + .mm = mm,
> + };
> + unsigned long vm_pgoff, vm_start;
> + struct vm_area_struct *vma;
> + struct file *file;
> + u64 ip;
> +
> + for (u32 i = 0; i < trace_nr; i++) {
> + ip = READ_ONCE(id_offs[i].ip);
> + vma = stack_map_lock_vma(&lock, ip);
> + if (!vma || !vma->vm_file) {
Can anonymous VMAs (like private mappings of /dev/zero) have a valid vm_file
pointer? If so, using !vma->vm_file as a proxy for file-backed
classification might dispatch pseudo-files to build_id_parse_file().
Would using !vma_is_anonymous(vma) be more robust here?
> + stack_map_build_id_set_ip(&id_offs[i]);
> + stack_map_unlock_vma(&lock);
> + continue;
> + }
> +
> + file = get_file(vma->vm_file);
> + vm_pgoff = vma->vm_pgoff;
> + vm_start = vma->vm_start;
> + stack_map_unlock_vma(&lock);
> +
> + /* build_id_parse_file() may block on filesystem reads */
> + if (build_id_parse_file(file, id_offs[i].build_id, NULL)) {
> + stack_map_build_id_set_ip(&id_offs[i]);
> + fput(file);
> + continue;
> + }
> + fput(file);
Does this cause an I/O performance drop by removing the VMA and build ID
caching that exists in the non-sleepable path? Without caching prev_vma and
prev_build_id, this loop appears to perform get_file(), build_id_parse_file()
(which reads from the filesystem), and fput() for every single frame in the
stack trace.
I note that this caching is added back later in the series in commit
38cfe2cebe4d604860680459fcb33a68258f0654 ("bpf: Cache build IDs in
sleepable stackmap path"). Should this patch include the caching to avoid
introducing a temporary regression when bisecting?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260514184727.1067141-1-ihor.solodrai@linux.dev?part=2
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH bpf v4 2/3] bpf: Avoid faultable build ID reads under mm locks
2026-05-14 20:47 ` Shakeel Butt
@ 2026-05-14 21:31 ` Ihor Solodrai
0 siblings, 0 replies; 10+ messages in thread
From: Ihor Solodrai @ 2026-05-14 21:31 UTC (permalink / raw)
To: Shakeel Butt, Puranjay Mohan
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Mykyta Yatsenko, bpf,
linux-kernel, kernel-team
On 5/14/26 1:47 PM, Shakeel Butt wrote:
> On Thu, May 14, 2026 at 11:47:26AM -0700, Ihor Solodrai wrote:
>> Sleepable build ID parsing can block in __kernel_read() [1], so the
>> stackmap sleepable path must not call it while holding mmap_lock or a
>> per-VMA read lock.
>>
>> The issue and the fix are conceptually similar to a recent procfs
>> patch [2].
>>
>> Resolve each covered VMA with a stable read-side reference, preferring
>> lock_vma_under_rcu() and falling back to mmap_read_trylock()
>
> Why trylock()? Why not just reuse the mechanism introduced in [2]? That is
> abstract out the mechanism introduced in [2] in mm core and reuse it.
v1 used mmap_read_lock() as the fallback, but Puranjay pointed out [1]
that stackmap can be called when the caller already holds mmap_lock,
so since v2 I switched to trylock, similar to the non-sleepable path.
AFAIU this means that the common mechanism needs to support trylock
behavior as in stack_map_lock_vma() in this patch.
Do you think this is worth factoring out of stackmap.c?
[1] https://lore.kernel.org/bpf/m25x611s17.fsf@kernel.org/
>
>
>> only long
>> enough to acquire the VMA read lock. Take a reference to the backing
>> file, drop the VMA lock, and then parse the build ID through
>> (sleepable) build_id_parse_file().
>>
>> [1]: https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/
>> [2]: https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/
>>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-05-14 21:32 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-05-14 18:47 [PATCH bpf v4 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai
2026-05-14 18:47 ` [PATCH bpf v4 1/3] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c Ihor Solodrai
2026-05-14 18:47 ` [PATCH bpf v4 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai
2026-05-14 19:37 ` bot+bpf-ci
2026-05-14 20:37 ` Ihor Solodrai
2026-05-14 20:47 ` Shakeel Butt
2026-05-14 21:31 ` Ihor Solodrai
2026-05-14 21:31 ` sashiko-bot
2026-05-14 18:47 ` [PATCH bpf v4 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai
2026-05-14 19:37 ` bot+bpf-ci
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox