BPF List
* [PATCH bpf-next v5 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable()
@ 2026-05-15  0:52 Ihor Solodrai
  2026-05-15  0:52 ` [PATCH bpf-next v5 1/3] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c Ihor Solodrai
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Ihor Solodrai @ 2026-05-15  0:52 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Eduard Zingerman, Kumar Kartikeya Dwivedi
  Cc: Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel,
	kernel-team

The series introduces stack_map_get_build_id_offset_sleepable(),
fixing a gap in build_id parsing in sleepable context in stackmap.c.

I noticed the following deadlock splat on bpf-next when running a
subset of selftests/bpf:

  #526     uprobe_multi_test:OK
  #28      bpf_obj_id:OK
  [   27.561465]
  [   27.561603] ======================================================
  [   27.561899] WARNING: possible circular locking dependency detected
  [   27.562193] 7.0.0-rc5-g0eeb0094ba03 #3 Tainted: G           OE
  [   27.562523] ------------------------------------------------------
  [   27.562820] uprobe_multi/561 is trying to acquire lock:
  [   27.563071] ff1100010768ba18 (&sb->s_type->i_mutex_key#8){++++}-{4:4}, at: netfs_start_io_read+0x24/0x110
  [   27.563556]
  [   27.563556] but task is already holding lock:
  [   27.563845] ff1100010b12a640 (&mm->mmap_lock){++++}-{4:4}, at: stack_map_get_build_id_offset+0xfa/0x650
  [   27.564296]
  [   27.564296] which lock already depends on the new lock.
  [   27.564296]
  [   27.564696]
  [   27.564696] the existing dependency chain (in reverse order) is:
  [   27.565044]
  [   27.565044] -> #1 (&mm->mmap_lock){++++}-{4:4}:
  [   27.565340]        __might_fault+0xa3/0x100
  [   27.565572]        _copy_to_iter+0xdf/0x1520
  [   27.565786]        copy_page_to_iter+0x1c6/0x2d0
  [   27.566011]        filemap_read+0x5f7/0xbf0
  [   27.566218]        netfs_file_read_iter+0x1ca/0x240
  [   27.566453]        vfs_read+0x482/0x9e0
  [   27.566658]        ksys_read+0x115/0x1f0
  [   27.566852]        do_syscall_64+0xde/0x390
  [   27.567057]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
  [   27.567322]
  [   27.567322] -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}:
  [   27.567701]        __lock_acquire+0x1529/0x2a10
  [   27.567923]        lock_acquire+0xd7/0x280
  [   27.568124]        down_read_interruptible+0x55/0x340
  [   27.568368]        netfs_start_io_read+0x24/0x110
  [   27.568622]        netfs_file_read_iter+0x191/0x240
  [   27.568858]        __kernel_read+0x401/0x850
  [   27.569068]        freader_fetch+0x19b/0x790
  [   27.569279]        __build_id_parse+0xde/0x580
  [   27.569528]        stack_map_get_build_id_offset+0x248/0x650
  [   27.569802]        __bpf_get_stack+0x653/0x890
  [   27.570019]        bpf_get_stack_sleepable+0x1d/0x30
  [   27.570257]        bpf_prog_51bb4c0f31ae471b_uprobe_sleepable+0x2d/0x42
  [   27.570594]        uprobe_prog_run+0x425/0x870
  [   27.570816]        uprobe_multi_link_handler+0x58/0xb0
  [   27.571062]        handler_chain+0x234/0x1270
  [   27.571275]        uprobe_notify_resume+0x4de/0xd60
  [   27.571542]        exit_to_user_mode_loop+0xe0/0x480
  [   27.571785]        exc_int3+0x1bd/0x260
  [   27.571973]        asm_exc_int3+0x39/0x40
  [   27.572169]
  [   27.572169] other info that might help us debug this:
  [   27.572169]
  [   27.572571]  Possible unsafe locking scenario:
  [   27.572571]
  [   27.572854]        CPU0                    CPU1
  [   27.573074]        ----                    ----
  [   27.573293]   rlock(&mm->mmap_lock);
  [   27.573502]                                lock(&sb->s_type->i_mutex_key#8);
  [   27.573846]                                lock(&mm->mmap_lock);
  [   27.574135]   rlock(&sb->s_type->i_mutex_key#8);
  [   27.574364]
  [   27.574364]  *** DEADLOCK ***
  [   27.574364]
  [   27.574671] 3 locks held by uprobe_multi/561:
  [   27.574884]  #0: ffffffff852e0038 (rcu_tasks_trace_srcu_struct){....}-{0:0}, at: uprobe_notify_resume+0x229/0xd60
  [   27.575371]  #1: ffffffff852e0038 (rcu_tasks_trace_srcu_struct){....}-{0:0}, at: uprobe_prog_run+0x239/0x870
  [   27.575874]  #2: ff1100010b12a640 (&mm->mmap_lock){++++}-{4:4}, at: stack_map_get_build_id_offset+0xfa/0x650
  [   27.576342]
  [   27.576342] stack backtrace:
  [   27.576578] CPU: 7 UID: 0 PID: 561 Comm: uprobe_multi Tainted: G           OE       7.0.0-rc5-g0eeb0094ba03 #3 PREEMPT(full)
  [   27.576586] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
  [   27.576588] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-5.el9 11/05/2023
  [   27.576591] Call Trace:
  [   27.576595]  <TASK>
  [   27.576598]  dump_stack_lvl+0x54/0x70
  [   27.576606]  print_circular_bug+0x2e8/0x300
  [   27.576616]  check_noncircular+0x12e/0x150
  [   27.576628]  __lock_acquire+0x1529/0x2a10
  [   27.576642]  ? __pfx_walk_pgd_range+0x10/0x10
  [   27.576651]  ? srso_return_thunk+0x5/0x5f
  [   27.576656]  ? lock_acquire+0xd7/0x280
  [   27.576661]  ? avc_has_perm_noaudit+0x38/0x1f0
  [   27.576672]  lock_acquire+0xd7/0x280
  [   27.576677]  ? netfs_start_io_read+0x24/0x110
  [   27.576685]  ? srso_return_thunk+0x5/0x5f
  [   27.576693]  ? netfs_start_io_read+0x24/0x110
  [   27.576698]  down_read_interruptible+0x55/0x340
  [   27.576702]  ? netfs_start_io_read+0x24/0x110
  [   27.576707]  ? __pfx_cmp_ex_search+0x10/0x10
  [   27.576716]  netfs_start_io_read+0x24/0x110
  [   27.576723]  netfs_file_read_iter+0x191/0x240
  [   27.576731]  __kernel_read+0x401/0x850
  [   27.576741]  ? __pfx___kernel_read+0x10/0x10
  [   27.576752]  ? __pfx_fixup_exception+0x10/0x10
  [   27.576761]  ? srso_return_thunk+0x5/0x5f
  [   27.576766]  ? lock_is_held_type+0x76/0x100
  [   27.576773]  ? srso_return_thunk+0x5/0x5f
  [   27.576777]  ? mtree_range_walk+0x421/0x6c0
  [   27.576785]  freader_fetch+0x19b/0x790
  [   27.576793]  ? mt_find+0x14b/0x340
  [   27.576801]  ? __pfx_freader_fetch+0x10/0x10
  [   27.576805]  ? mt_find+0x29a/0x340
  [   27.576810]  ? mt_find+0x14b/0x340
  [   27.576820]  __build_id_parse+0xde/0x580
  [   27.576832]  ? __pfx___build_id_parse+0x10/0x10
  [   27.576850]  stack_map_get_build_id_offset+0x248/0x650
  [   27.576863]  __bpf_get_stack+0x653/0x890
  [   27.576867]  ? __pfx_stack_trace_consume_entry+0x10/0x10
  [   27.576880]  ? __pfx___bpf_get_stack+0x10/0x10
  [   27.576884]  ? lock_acquire+0xd7/0x280
  [   27.576889]  ? uprobe_prog_run+0x239/0x870
  [   27.576895]  ? srso_return_thunk+0x5/0x5f
  [   27.576899]  ? stack_trace_save+0xae/0x100
  [   27.576908]  bpf_get_stack_sleepable+0x1d/0x30
  [   27.576913]  bpf_prog_51bb4c0f31ae471b_uprobe_sleepable+0x2d/0x42
  [   27.576919]  uprobe_prog_run+0x425/0x870
  [   27.576927]  ? srso_return_thunk+0x5/0x5f
  [   27.576933]  ? __pfx_uprobe_prog_run+0x10/0x10
  [   27.576937]  ? uprobe_notify_resume+0x413/0xd60
  [   27.576952]  uprobe_multi_link_handler+0x58/0xb0
  [   27.576960]  handler_chain+0x234/0x1270
  [   27.576976]  ? __pfx_handler_chain+0x10/0x10
  [   27.576988]  ? srso_return_thunk+0x5/0x5f
  [   27.576992]  ? __kasan_kmalloc+0x72/0x90
  [   27.576998]  ? __pfx_ri_timer+0x10/0x10
  [   27.577003]  ? timer_init_key+0xf5/0x200
  [   27.577013]  uprobe_notify_resume+0x4de/0xd60
  [   27.577025]  ? uprobe_notify_resume+0x229/0xd60
  [   27.577030]  ? __pfx_uprobe_notify_resume+0x10/0x10
  [   27.577035]  ? srso_return_thunk+0x5/0x5f
  [   27.577039]  ? atomic_notifier_call_chain+0xfc/0x110
  [   27.577047]  ? srso_return_thunk+0x5/0x5f
  [   27.577051]  ? notify_die+0xe5/0x130
  [   27.577056]  ? __pfx_notify_die+0x10/0x10
  [   27.577063]  ? srso_return_thunk+0x5/0x5f
  [   27.577071]  exit_to_user_mode_loop+0xe0/0x480
  [   27.577076]  ? asm_exc_int3+0x39/0x40
  [   27.577083]  exc_int3+0x1bd/0x260
  [   27.577090]  asm_exc_int3+0x39/0x40
  [   27.577095] RIP: 0033:0x6c4180
  [   27.577100] Code: c6 05 0b 6f 32 00 01 5d c3 90 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa eb 8a 66 2e 0f 1f 84 00 00 00 00 00 <cc> 48 89 e5 31 c0 5d c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 31 c0
  [   27.577104] RSP: 002b:00007ffcbf71ada8 EFLAGS: 00000246
  [   27.577109] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007fec9163db7b
  [   27.577112] RDX: 00007ffcbf71adbf RSI: 0000000000001000 RDI: 00000000007f0000
  [   27.577115] RBP: 00007ffcbf71add0 R08: 00007fec91731330 R09: 00007fec9178d060
  [   27.577118] R10: 00007fec9153ad98 R11: 0000000000000202 R12: 00007ffcbf71af08
  [   27.577120] R13: 0000000000787760 R14: 00000000009eade8 R15: 00007fec917bf000
  [   27.577135]  </TASK>
  #54/1    build_id/nofault-paged-out:OK

It was reproducible, which allowed me to bisect to commit 777a8560fd29
("lib/buildid: use __kernel_read() for sleepable context").

I then used Codex to analyze the splat, the patch and the relevant
lore discussions, and asked it to produce a patch draft, which served
as a starting point for this series.

I verified that the patch series fixes the deadlock splat.

---

v4->v5:
  * Add comments explaining mmap_read_trylock() (Shakeel)
  * Rebase on bpf-next (Alexei)

v4: https://lore.kernel.org/bpf/20260514184727.1067141-1-ihor.solodrai@linux.dev/

v3->v4:
  * Change Fixes tag in patch #2 (AI)
  * Nit in caching implementation (Mykyta)
v3: https://lore.kernel.org/bpf/20260512032906.2670326-1-ihor.solodrai@linux.dev/

v2->v3:
  * Split patch #2 in two: stack_map_get_build_id_offset_sleepable()
    implementation, and then introduce caching
  * Drop taking mmap_lock if CONFIG_PER_VMA_LOCK=n, fall back to raw
    IPs instead
  * Cache vm_{start,end} in addition to prev_file (Mykyta)
v2: https://lore.kernel.org/bpf/20260409010604.1439087-1-ihor.solodrai@linux.dev/

v1->v2:
  * Addressed feedback from Puranjay:
    * split out a small refactoring patch
    * use mmap_read_trylock()
    * take into account CONFIG_PER_VMA_LOCK
    * replace find_vma() with vma_lookup()
    * cache prev_build_id to avoid re-parsing the same file
  * Snapshot vm_pgoff and vm_start before unlocking (AI)
  * To avoid repetitive unlocking statements, introduce struct
    stack_map_vma_lock to hold relevant lock state info and add an
    unlock helper
v1: https://lore.kernel.org/bpf/20260407223003.720428-1-ihor.solodrai@linux.dev/

---

Ihor Solodrai (3):
  bpf: Factor out stack_map_build_id_set_ip() in stackmap.c
  bpf: Avoid faultable build ID reads under mm locks
  bpf: Cache build IDs in sleepable stackmap path

 kernel/bpf/stackmap.c | 180 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 171 insertions(+), 9 deletions(-)

-- 
2.54.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH bpf-next v5 1/3] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c
  2026-05-15  0:52 [PATCH bpf-next v5 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai
@ 2026-05-15  0:52 ` Ihor Solodrai
  2026-05-15  0:52 ` [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai
  2026-05-15  0:52 ` [PATCH bpf-next v5 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai
  2 siblings, 0 replies; 7+ messages in thread
From: Ihor Solodrai @ 2026-05-15  0:52 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Eduard Zingerman, Kumar Kartikeya Dwivedi
  Cc: Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel,
	kernel-team

Factor out a small helper from stack_map_get_build_id_offset() in
preparation for adding a sleepable build ID resolution path.

No functional changes.

Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
---
 kernel/bpf/stackmap.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index da3d328f5c15..4ef0fd06cea5 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -152,6 +152,12 @@ static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, b
 			 : build_id_parse_nofault(vma, build_id, NULL);
 }
 
+static inline void stack_map_build_id_set_ip(struct bpf_stack_build_id *id)
+{
+	id->status = BPF_STACK_BUILD_ID_IP;
+	memset(id->build_id, 0, BUILD_ID_SIZE_MAX);
+}
+
 /*
  * Expects all id_offs[i].ip values to be set to correct initial IPs.
  * They will be subsequently:
@@ -165,23 +171,21 @@ static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, b
 static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 					  u32 trace_nr, bool user, bool may_fault)
 {
-	int i;
 	struct mmap_unlock_irq_work *work = NULL;
 	bool irq_work_busy = bpf_mmap_unlock_get_irq_work(&work);
+	bool has_user_ctx = user && current && current->mm;
 	struct vm_area_struct *vma, *prev_vma = NULL;
 	const char *prev_build_id;
+	int i;
 
 	/* If the irq_work is in use, fall back to report ips. Same
 	 * fallback is used for kernel stack (!user) on a stackmap with
 	 * build_id.
 	 */
-	if (!user || !current || !current->mm || irq_work_busy ||
-	    !mmap_read_trylock(current->mm)) {
+	if (!has_user_ctx || irq_work_busy || !mmap_read_trylock(current->mm)) {
 		/* cannot access current->mm, fall back to ips */
-		for (i = 0; i < trace_nr; i++) {
-			id_offs[i].status = BPF_STACK_BUILD_ID_IP;
-			memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX);
-		}
+		for (i = 0; i < trace_nr; i++)
+			stack_map_build_id_set_ip(&id_offs[i]);
 		return;
 	}
 
@@ -196,8 +200,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 		vma = find_vma(current->mm, ip);
 		if (!vma || fetch_build_id(vma, id_offs[i].build_id, may_fault)) {
 			/* per entry fall back to ips */
-			id_offs[i].status = BPF_STACK_BUILD_ID_IP;
-			memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX);
+			stack_map_build_id_set_ip(&id_offs[i]);
 			continue;
 		}
 build_id_valid:
-- 
2.54.0



* [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks
  2026-05-15  0:52 [PATCH bpf-next v5 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai
  2026-05-15  0:52 ` [PATCH bpf-next v5 1/3] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c Ihor Solodrai
@ 2026-05-15  0:52 ` Ihor Solodrai
  2026-05-15  1:25   ` sashiko-bot
  2026-05-15  1:33   ` bot+bpf-ci
  2026-05-15  0:52 ` [PATCH bpf-next v5 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai
  2 siblings, 2 replies; 7+ messages in thread
From: Ihor Solodrai @ 2026-05-15  0:52 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Eduard Zingerman, Kumar Kartikeya Dwivedi
  Cc: Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel,
	kernel-team

Sleepable build ID parsing can block in __kernel_read() [1], so the
stackmap sleepable path must not call it while holding mmap_lock or a
per-VMA read lock.

The issue and the fix are conceptually similar to a recent procfs
patch [2].

Resolve each covered VMA with a stable read-side reference, preferring
lock_vma_under_rcu() and falling back to mmap_read_trylock() only long
enough to acquire the VMA read lock. Take a reference to the backing
file, drop the VMA lock, and then parse the build ID through
(sleepable) build_id_parse_file().

We have to use mmap_read_trylock() (and give up on failure) in this
context because taking mmap_read_lock() is generally unsafe on code
paths reachable from BPF programs [3], and may lead to deadlocks.
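The lock discipline above (try to take the lock without blocking, snapshot the fields needed later, drop the lock, then do the slow work) can be sketched as a small userspace analogy. All names below are illustrative stand-ins for the kernel objects, not the kernel API:

```c
#include <assert.h>
#include <string.h>

/* Userspace sketch of the lock discipline described above: try to
 * take the lock (never block waiting for it), snapshot the fields
 * needed later, drop the lock, and only then do the slow work.
 * All names here are illustrative, not the kernel API. */

struct region {                 /* stands in for vm_area_struct */
	unsigned long start, pgoff;
	char path[64];          /* stands in for vma->vm_file */
};

static int maps_locked;         /* stands in for mmap_lock */
static struct region maps = { 0x400000, 0, "/usr/bin/demo" };

static int try_read_lock(void)  /* like mmap_read_trylock() */
{
	if (maps_locked)
		return 0;
	maps_locked = 1;
	return 1;
}

static void read_unlock(void)
{
	maps_locked = 0;
}

/* Returns 0 and fills *out on success, -1 if the lock is contended:
 * like the patch, the caller must fall back rather than block. */
static int snapshot_region(struct region *out)
{
	if (!try_read_lock())
		return -1;
	*out = maps;            /* copy fields while the lock is held */
	read_unlock();          /* drop the lock before any slow work */
	return 0;
}

/* Slow, potentially blocking work runs lock-free on the snapshot,
 * analogous to calling build_id_parse_file() after the VMA lock
 * has been released. */
static unsigned long resolve_offset(const struct region *snap,
				    unsigned long ip)
{
	return (snap->pgoff << 12) + ip - snap->start;
}
```

The key property mirrored here is that nothing sleepable ever runs between lock and unlock; only a plain memory copy does.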

[1] https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/
[2] https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/
[3] https://lore.kernel.org/bpf/2895ecd8-df1e-4cc0-b9f9-aef893dc2360@linux.dev/

Fixes: d4dd9775ec24 ("bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers")
Suggested-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
---
 kernel/bpf/stackmap.c | 109 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 109 insertions(+)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 4ef0fd06cea5..08f7659505d1 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -9,6 +9,7 @@
 #include <linux/perf_event.h>
 #include <linux/btf_ids.h>
 #include <linux/buildid.h>
+#include <linux/mmap_lock.h>
 #include "percpu_freelist.h"
 #include "mmap_unlock_work.h"
 
@@ -158,6 +159,109 @@ static inline void stack_map_build_id_set_ip(struct bpf_stack_build_id *id)
 	memset(id->build_id, 0, BUILD_ID_SIZE_MAX);
 }
 
+struct stack_map_vma_lock {
+	bool vma_locked;
+	struct vm_area_struct *vma;
+	struct mm_struct *mm;
+};
+
+static struct vm_area_struct *
+stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip)
+{
+	struct mm_struct *mm = lock->mm;
+	struct vm_area_struct *vma;
+
+	if (WARN_ON_ONCE(!mm))
+		return NULL;
+
+	vma = lock_vma_under_rcu(mm, ip);
+	if (vma)
+		goto vma_locked;
+
+	/*
+	 * Taking mmap_read_lock() is unsafe here, because the caller
+	 * BPF program might already hold it, causing a deadlock.
+	 */
+	if (!mmap_read_trylock(mm))
+		return NULL;
+
+	vma = vma_lookup(mm, ip);
+	if (!vma) {
+		mmap_read_unlock(mm);
+		return NULL;
+	}
+
+#ifdef CONFIG_PER_VMA_LOCK
+	if (!vma_start_read_locked(vma)) {
+		mmap_read_unlock(mm);
+		return NULL;
+	}
+	mmap_read_unlock(mm);
+#else
+	mmap_read_unlock(mm);
+	return NULL;
+#endif
+vma_locked:
+	lock->vma_locked = true;
+	lock->vma = vma;
+	return vma;
+}
+
+static void stack_map_unlock_vma(struct stack_map_vma_lock *lock)
+{
+	struct vm_area_struct *vma = lock->vma;
+
+	if (lock->vma_locked) {
+		if (WARN_ON_ONCE(!vma))
+			goto out;
+		vma_end_read(vma);
+	}
+out:
+	lock->vma_locked = false;
+	lock->vma = NULL;
+}
+
+static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs,
+						    u32 trace_nr)
+{
+	struct mm_struct *mm = current->mm;
+	struct stack_map_vma_lock lock = {
+		.vma_locked = false,
+		.vma = NULL,
+		.mm = mm,
+	};
+	unsigned long vm_pgoff, vm_start;
+	struct vm_area_struct *vma;
+	struct file *file;
+	u64 ip;
+
+	for (u32 i = 0; i < trace_nr; i++) {
+		ip = READ_ONCE(id_offs[i].ip);
+		vma = stack_map_lock_vma(&lock, ip);
+		if (!vma || !vma->vm_file) {
+			stack_map_build_id_set_ip(&id_offs[i]);
+			stack_map_unlock_vma(&lock);
+			continue;
+		}
+
+		file = get_file(vma->vm_file);
+		vm_pgoff = vma->vm_pgoff;
+		vm_start = vma->vm_start;
+		stack_map_unlock_vma(&lock);
+
+		/* build_id_parse_file() may block on filesystem reads */
+		if (build_id_parse_file(file, id_offs[i].build_id, NULL)) {
+			stack_map_build_id_set_ip(&id_offs[i]);
+			fput(file);
+			continue;
+		}
+		fput(file);
+
+		id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start;
+		id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
+	}
+}
+
 /*
  * Expects all id_offs[i].ip values to be set to correct initial IPs.
  * They will be subsequently:
@@ -178,6 +282,11 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 	const char *prev_build_id;
 	int i;
 
+	if (may_fault && has_user_ctx) {
+		stack_map_get_build_id_offset_sleepable(id_offs, trace_nr);
+		return;
+	}
+
 	/* If the irq_work is in use, fall back to report ips. Same
 	 * fallback is used for kernel stack (!user) on a stackmap with
 	 * build_id.
-- 
2.54.0



* [PATCH bpf-next v5 3/3] bpf: Cache build IDs in sleepable stackmap path
  2026-05-15  0:52 [PATCH bpf-next v5 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai
  2026-05-15  0:52 ` [PATCH bpf-next v5 1/3] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c Ihor Solodrai
  2026-05-15  0:52 ` [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai
@ 2026-05-15  0:52 ` Ihor Solodrai
  2026-05-15  1:21   ` bot+bpf-ci
  2 siblings, 1 reply; 7+ messages in thread
From: Ihor Solodrai @ 2026-05-15  0:52 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Eduard Zingerman, Kumar Kartikeya Dwivedi
  Cc: Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel,
	kernel-team

Stack traces often contain adjacent IPs from the same VMA or from
different VMAs backed by the same ELF file. Cache the last successfully
parsed build ID together with the resolved VMA range and backing file
so the sleepable build-ID path can avoid repeated VMA locking and file
parsing in common cases.
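The single-entry range cache described above can be sketched in isolation. This is a hedged userspace illustration of the idea, not the kernel code; ID_SZ and the function names are made up for the example:

```c
#include <assert.h>
#include <string.h>

/* Minimal sketch of a single-entry cache: remember the last
 * resolved [start, end) range and its build ID, so consecutive
 * IPs from the same mapping skip the slow lookup entirely. */

#define ID_SZ 20                /* build IDs are at most 20 bytes */

struct range_cache {
	unsigned long start, end;   /* last resolved range */
	unsigned char id[ID_SZ];    /* last parsed build ID */
	int valid;
};

/* Returns 1 on a cache hit and copies the cached build ID into out. */
static int cache_lookup(const struct range_cache *c, unsigned long ip,
			unsigned char *out)
{
	if (c->valid && ip >= c->start && ip < c->end) {
		memcpy(out, c->id, ID_SZ);
		return 1;
	}
	return 0;
}

static void cache_store(struct range_cache *c, unsigned long start,
			unsigned long end, const unsigned char *id)
{
	c->start = start;
	c->end = end;
	memcpy(c->id, id, ID_SZ);
	c->valid = 1;
}
```

Because stack traces are typically dominated by a handful of mappings, even this one-slot cache avoids most of the VMA locking and ELF parsing on the slow path.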

Suggested-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
---
 kernel/bpf/stackmap.c | 56 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 53 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 08f7659505d1..7336fd55c856 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -230,13 +230,33 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i
 		.vma = NULL,
 		.mm = mm,
 	};
-	unsigned long vm_pgoff, vm_start;
+	struct {
+		struct file *file;
+		const unsigned char *build_id;
+		unsigned long vm_start;
+		unsigned long vm_end;
+		unsigned long vm_pgoff;
+	} cache = {};
+	unsigned long vm_pgoff, vm_start, vm_end;
 	struct vm_area_struct *vma;
 	struct file *file;
 	u64 ip;
 
 	for (u32 i = 0; i < trace_nr; i++) {
 		ip = READ_ONCE(id_offs[i].ip);
+
+		/*
+		 * Range cache fast path: if ip falls within the previously
+		 * resolved VMA range, reuse the cache build_id without
+		 * re-acquiring the VMA lock.
+		 */
+		if (cache.build_id && ip >= cache.vm_start && ip < cache.vm_end) {
+			vm_start = cache.vm_start;
+			vm_end = cache.vm_end;
+			vm_pgoff = cache.vm_pgoff;
+			goto build_id_valid;
+		}
+
 		vma = stack_map_lock_vma(&lock, ip);
 		if (!vma || !vma->vm_file) {
 			stack_map_build_id_set_ip(&id_offs[i]);
@@ -244,9 +264,21 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i
 			continue;
 		}
 
-		file = get_file(vma->vm_file);
+		file = vma->vm_file;
 		vm_pgoff = vma->vm_pgoff;
 		vm_start = vma->vm_start;
+		vm_end = vma->vm_end;
+
+		if (file == cache.file) {
+			/*
+			 * Same backing file as previous (e.g. different VMAs
+			 * of the same ELF binary). Reuse the cache build_id.
+			 */
+			stack_map_unlock_vma(&lock);
+			goto build_id_valid;
+		}
+
+		file = get_file(file);
 		stack_map_unlock_vma(&lock);
 
 		/* build_id_parse_file() may block on filesystem reads */
@@ -255,11 +287,29 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i
 			fput(file);
 			continue;
 		}
-		fput(file);
 
+		if (cache.file)
+			fput(cache.file);
+		cache.file = file;
+		cache.build_id = id_offs[i].build_id;
+
+build_id_valid:
+		/*
+		 * In the slow path cache.build_id points to id_offs[i].build_id.
+		 * Cache hits leave cache.build_id pointing at a prior slot,
+		 * triggering the memcpy here.
+		 */
+		if (cache.build_id != id_offs[i].build_id)
+			memcpy(id_offs[i].build_id, cache.build_id, BUILD_ID_SIZE_MAX);
+		cache.vm_start = vm_start;
+		cache.vm_end = vm_end;
+		cache.vm_pgoff = vm_pgoff;
 		id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start;
 		id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
 	}
+
+	if (cache.file)
+		fput(cache.file);
 }
 
 /*
-- 
2.54.0



* Re: [PATCH bpf-next v5 3/3] bpf: Cache build IDs in sleepable stackmap path
  2026-05-15  0:52 ` [PATCH bpf-next v5 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai
@ 2026-05-15  1:21   ` bot+bpf-ci
  0 siblings, 0 replies; 7+ messages in thread
From: bot+bpf-ci @ 2026-05-15  1:21 UTC (permalink / raw)
  To: ihor.solodrai, ast, andrii, daniel, eddyz87, memxor
  Cc: puranjay, shakeel.butt, mykyta.yatsenko5, bpf, linux-kernel,
	kernel-team, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai


> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 08f7659505d1..7336fd55c856 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -230,13 +230,33 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i
>  		.vma = NULL,
>  		.mm = mm,
>  	};
> -	unsigned long vm_pgoff, vm_start;
> +	struct {
> +		struct file *file;
> +		const unsigned char *build_id;
> +		unsigned long vm_start;
> +		unsigned long vm_end;
> +		unsigned long vm_pgoff;
> +	} cache = {};
> +	unsigned long vm_pgoff, vm_start, vm_end;
>  	struct vm_area_struct *vma;
>  	struct file *file;
>  	u64 ip;
>
>  	for (u32 i = 0; i < trace_nr; i++) {
>  		ip = READ_ONCE(id_offs[i].ip);
> +
> +		/*
> +		 * Range cache fast path: if ip falls within the previously
> +		 * resolved VMA range, reuse the cache build_id without
> +		 * re-acquiring the VMA lock.
> +		 */
> +		if (cache.build_id && ip >= cache.vm_start && ip < cache.vm_end) {
> +			vm_start = cache.vm_start;
> +			vm_end = cache.vm_end;
> +			vm_pgoff = cache.vm_pgoff;
> +			goto build_id_valid;
> +		}
> +
>  		vma = stack_map_lock_vma(&lock, ip);
>  		if (!vma || !vma->vm_file) {
>  			stack_map_build_id_set_ip(&id_offs[i]);

In v3 and v4, sashiko-bot@kernel.org and bot+bpf-ci@kernel.org raised
a concern about VMA classification here. The code uses !vma->vm_file to
determine if a VMA is file-backed, but this could cause issues with
anonymous VMAs that still have a backing file.

Private mappings of /dev/zero or driver memory can have a non-NULL
vma->vm_file but are still considered anonymous. If the stackmap path
proceeds to parse the ELF build ID from these files, could it trigger
unintended side effects, such as consuming FIFO data or altering
hardware state when reading from character devices?

Would it be safer to check vma_is_anonymous(vma) here to properly
classify the VMA?

Original comment from sashiko-bot@kernel.org (v3):
https://lore.kernel.org/bpf/20260513043154.24483C2BCB7@smtp.kernel.org/

> @@ -244,9 +264,21 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i
>  			continue;
>  		}
>
> -		file = get_file(vma->vm_file);
> +		file = vma->vm_file;
>  		vm_pgoff = vma->vm_pgoff;
>  		vm_start = vma->vm_start;
> +		vm_end = vma->vm_end;
> +
> +		if (file == cache.file) {
> +			/*
> +			 * Same backing file as previous (e.g. different VMAs
> +			 * of the same ELF binary). Reuse the cache build_id.
> +			 */
> +			stack_map_unlock_vma(&lock);
> +			goto build_id_valid;
> +		}
> +
> +		file = get_file(file);
>  		stack_map_unlock_vma(&lock);
>
>  		/* build_id_parse_file() may block on filesystem reads */
> @@ -255,11 +287,29 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i
>  			fput(file);
>  			continue;
>  		}
> -		fput(file);
>
> +		if (cache.file)
> +			fput(cache.file);
> +		cache.file = file;
> +		cache.build_id = id_offs[i].build_id;
> +
> +build_id_valid:
> +		/*
> +		 * In the slow path cache.build_id points to id_offs[i].build_id.
> +		 * Cache hits leave cache.build_id pointing at a prior slot,
> +		 * triggering the memcpy here.
> +		 */
> +		if (cache.build_id != id_offs[i].build_id)
> +			memcpy(id_offs[i].build_id, cache.build_id, BUILD_ID_SIZE_MAX);
> +		cache.vm_start = vm_start;
> +		cache.vm_end = vm_end;
> +		cache.vm_pgoff = vm_pgoff;
>  		id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start;
>  		id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
>  	}
> +
> +	if (cache.file)
> +		fput(cache.file);
>  }
>
>  /*

---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25894427199


* Re: [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks
  2026-05-15  0:52 ` [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai
@ 2026-05-15  1:25   ` sashiko-bot
  2026-05-15  1:33   ` bot+bpf-ci
  1 sibling, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-05-15  1:25 UTC (permalink / raw)
  To: Ihor Solodrai; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Sleepable build ID parsing silently fails on kernels built without CONFIG_PER_VMA_LOCK.
- [Medium] Incorrect VMA classification using `!vma->vm_file` instead of `vma_is_anonymous()`.
--

commit 6ca3c963b80bfbdca796bfbc82d5d51cf0333054
Author: Ihor Solodrai <ihor.solodrai@linux.dev>

bpf: Avoid faultable build ID reads under mm locks

This commit modifies the sleepable stackmap path to parse build IDs without
holding the mmap_lock or per-VMA read locks. It locks the VMA via RCU or
mmap_read_trylock, captures the file reference, releases the lock, and then
performs the sleepable file read.

Link: https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/
Link: https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/
Link: https://lore.kernel.org/bpf/2895ecd8-df1e-4cc0-b9f9-aef893dc2360@linux.dev/

> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c

[ ... ]

> +static struct vm_area_struct *
> +stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip)
> +{

[ ... ]

> +#ifdef CONFIG_PER_VMA_LOCK
> +	if (!vma_start_read_locked(vma)) {
> +		mmap_read_unlock(mm);
> +		return NULL;
> +	}
> +	mmap_read_unlock(mm);
> +#else
> +	mmap_read_unlock(mm);
> +	return NULL;
> +#endif

Does the fallback for kernels built without CONFIG_PER_VMA_LOCK completely
disable the build ID parsing feature?

Looking at the #else branch, it unconditionally drops the mmap_read_lock and
returns NULL. This causes the caller to receive a NULL VMA for every IP,
which skips parsing and falls back to returning IP-only traces.

Could this instead keep the mmap_read_lock held, return the VMA, and allow
the caller to extract the file reference, vm_pgoff, and vm_start before
safely dropping the lock?

> +vma_locked:
> +	lock->vma_locked = true;
> +	lock->vma = vma;
> +	return vma;
> +}

[ ... ]

> +static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs,
> +						    u32 trace_nr)
> +{

[ ... ]

> +	for (u32 i = 0; i < trace_nr; i++) {
> +		ip = READ_ONCE(id_offs[i].ip);
> +		vma = stack_map_lock_vma(&lock, ip);
> +		if (!vma || !vma->vm_file) {

Does checking !vma->vm_file properly identify file-backed VMAs?

Some anonymous VMAs, such as private mappings of /dev/zero, can have a
non-NULL vm_file but are conceptually anonymous since their vm_ops are NULL.
Allowing these through would result in taking a reference to the device file
and unnecessarily attempting to parse an ELF build ID from it.

Would it be safer to use vma_is_anonymous(vma) to accurately verify if the
VMA is genuinely file-backed?

> +			stack_map_build_id_set_ip(&id_offs[i]);
> +			stack_map_unlock_vma(&lock);
> +			continue;
> +		}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260515005244.1333013-1-ihor.solodrai@linux.dev?part=2


* Re: [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks
  2026-05-15  0:52 ` [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai
  2026-05-15  1:25   ` sashiko-bot
@ 2026-05-15  1:33   ` bot+bpf-ci
  1 sibling, 0 replies; 7+ messages in thread
From: bot+bpf-ci @ 2026-05-15  1:33 UTC (permalink / raw)
  To: ihor.solodrai, ast, andrii, daniel, eddyz87, memxor
  Cc: puranjay, shakeel.butt, mykyta.yatsenko5, bpf, linux-kernel,
	kernel-team, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai


> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 4ef0fd06cea5..08f7659505d1 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -9,6 +9,7 @@
>  #include <linux/perf_event.h>
>  #include <linux/btf_ids.h>
>  #include <linux/buildid.h>
> +#include <linux/mmap_lock.h>
>  #include "percpu_freelist.h"
>  #include "mmap_unlock_work.h"
>
> @@ -158,6 +159,109 @@ static inline void stack_map_build_id_set_ip(struct bpf_stack_build_id *id)
>  	memset(id->build_id, 0, BUILD_ID_SIZE_MAX);
>  }
>
> +struct stack_map_vma_lock {
> +	bool vma_locked;
> +	struct vm_area_struct *vma;
> +	struct mm_struct *mm;
> +};
> +
> +static struct vm_area_struct *
> +stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip)
> +{
> +	struct mm_struct *mm = lock->mm;
> +	struct vm_area_struct *vma;
> +
> +	if (WARN_ON_ONCE(!mm))
> +		return NULL;
> +
> +	vma = lock_vma_under_rcu(mm, ip);
> +	if (vma)
> +		goto vma_locked;
> +
> +	/*
> +	 * Taking mmap_read_lock() is unsafe here, because the caller
> +	 * BPF program might already hold it, causing a deadlock.
> +	 */
> +	if (!mmap_read_trylock(mm))
> +		return NULL;
> +
> +	vma = vma_lookup(mm, ip);
> +	if (!vma) {
> +		mmap_read_unlock(mm);
> +		return NULL;
> +	}
> +
> +#ifdef CONFIG_PER_VMA_LOCK
> +	if (!vma_start_read_locked(vma)) {
> +		mmap_read_unlock(mm);
> +		return NULL;
> +	}
> +	mmap_read_unlock(mm);
> +#else
> +	mmap_read_unlock(mm);
> +	return NULL;
> +#endif
> +vma_locked:
> +	lock->vma_locked = true;
> +	lock->vma = vma;
> +	return vma;
> +}
> +
> +static void stack_map_unlock_vma(struct stack_map_vma_lock *lock)
> +{
> +	struct vm_area_struct *vma = lock->vma;
> +
> +	if (lock->vma_locked) {
> +		if (WARN_ON_ONCE(!vma))
> +			goto out;
> +		vma_end_read(vma);
> +	}
> +out:
> +	lock->vma_locked = false;
> +	lock->vma = NULL;
> +}
> +
> +static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs,
> +						    u32 trace_nr)
> +{
> +	struct mm_struct *mm = current->mm;
> +	struct stack_map_vma_lock lock = {
> +		.vma_locked = false,
> +		.vma = NULL,
> +		.mm = mm,
> +	};
> +	unsigned long vm_pgoff, vm_start;
> +	struct vm_area_struct *vma;
> +	struct file *file;
> +	u64 ip;
> +
> +	for (u32 i = 0; i < trace_nr; i++) {
> +		ip = READ_ONCE(id_offs[i].ip);
> +		vma = stack_map_lock_vma(&lock, ip);
> +		if (!vma || !vma->vm_file) {
                        ^^^^^^^^^^^^^^^^
This isn't a bug, but it matches a pattern the mm subsystem documentation
flags as potentially concerning: the code uses !vma->vm_file to classify the
VMA as anonymous. Per mm-vma.md's VMA classification rules, the correct test
for an anonymous VMA is vma_is_anonymous() (i.e. !vma->vm_ops).

Some anonymous VMAs carry a vm_file reference. For example, private mappings
of /dev/zero set via vma_set_anonymous() leave vm_file populated while
clearing vm_ops. On such VMAs, this code would proceed down the file-backed
path and hand /dev/zero (or any other vma_set_anonymous()-style driver file)
to build_id_parse_file():

stack_map_get_build_id_offset_sleepable() {
    if (!vma || !vma->vm_file) {
        stack_map_build_id_set_ip(&id_offs[i]);
        ...
        continue;
    }
    file = get_file(vma->vm_file);   /* gets /dev/zero */
    ...
    build_id_parse_file(file, ...)   /* tries to parse /dev/zero as ELF */
}

instead of immediately falling back to reporting the raw IP.

In practice, build_id_parse_file() would fail the ELF magic check with
-EINVAL and fall through to stack_map_build_id_set_ip(), so no crash
occurs. The code matches the pattern in the existing build_id_parse() and
build_id_parse_nofault() API in lib/buildid.c, so the patch is consistent
with established kernel build-ID handling. However, is it worth switching
to vma_is_anonymous(vma) to avoid the unnecessary sleepable read attempt on
driver files?

AI reviewers bot+bpf-ci@kernel.org and sashiko-bot@kernel.org raised this
across v3 and v4, and it remains unaddressed in the current version:
https://lore.kernel.org/bpf/1f275575e3c2309dc7b7f099c7e5cd47bb25182d94883ca0893cf4aaf03c954c@mail.kernel.org/

> +			stack_map_build_id_set_ip(&id_offs[i]);
> +			stack_map_unlock_vma(&lock);
> +			continue;
> +		}
> +
> +		file = get_file(vma->vm_file);
> +		vm_pgoff = vma->vm_pgoff;
> +		vm_start = vma->vm_start;
> +		stack_map_unlock_vma(&lock);
> +
> +		/* build_id_parse_file() may block on filesystem reads */
> +		if (build_id_parse_file(file, id_offs[i].build_id, NULL)) {
> +			stack_map_build_id_set_ip(&id_offs[i]);
> +			fput(file);
> +			continue;
> +		}
> +		fput(file);
> +
> +		id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start;
> +		id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
> +	}
> +}
> +
>  /*
>   * Expects all id_offs[i].ip values to be set to correct initial IPs.
>   * They will be subsequently:
> @@ -178,6 +282,11 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
>  	const char *prev_build_id;
>  	int i;
>
> +	if (may_fault && has_user_ctx) {
> +		stack_map_get_build_id_offset_sleepable(id_offs, trace_nr);
> +		return;
> +	}
> +

[ ... ]


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25894427199

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-05-15  1:33 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-05-15  0:52 [PATCH bpf-next v5 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai
2026-05-15  0:52 ` [PATCH bpf-next v5 1/3] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c Ihor Solodrai
2026-05-15  0:52 ` [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai
2026-05-15  1:25   ` sashiko-bot
2026-05-15  1:33   ` bot+bpf-ci
2026-05-15  0:52 ` [PATCH bpf-next v5 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai
2026-05-15  1:21   ` bot+bpf-ci
