* [PATCH bpf-next v5 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable()
@ 2026-05-15 0:52 Ihor Solodrai
2026-05-15 0:52 ` [PATCH bpf-next v5 1/3] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c Ihor Solodrai
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Ihor Solodrai @ 2026-05-15 0:52 UTC (permalink / raw)
To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Kumar Kartikeya Dwivedi
Cc: Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel,
kernel-team
This series introduces stack_map_get_build_id_offset_sleepable(),
closing a gap in build ID parsing from sleepable context in stackmap.c.
I noticed the following deadlock splat on bpf-next when running a
subset of selftests/bpf:
#526 uprobe_multi_test:OK
#28 bpf_obj_id:OK
[ 27.561465]
[ 27.561603] ======================================================
[ 27.561899] WARNING: possible circular locking dependency detected
[ 27.562193] 7.0.0-rc5-g0eeb0094ba03 #3 Tainted: G OE
[ 27.562523] ------------------------------------------------------
[ 27.562820] uprobe_multi/561 is trying to acquire lock:
[ 27.563071] ff1100010768ba18 (&sb->s_type->i_mutex_key#8){++++}-{4:4}, at: netfs_start_io_read+0x24/0x110
[ 27.563556]
[ 27.563556] but task is already holding lock:
[ 27.563845] ff1100010b12a640 (&mm->mmap_lock){++++}-{4:4}, at: stack_map_get_build_id_offset+0xfa/0x650
[ 27.564296]
[ 27.564296] which lock already depends on the new lock.
[ 27.564296]
[ 27.564696]
[ 27.564696] the existing dependency chain (in reverse order) is:
[ 27.565044]
[ 27.565044] -> #1 (&mm->mmap_lock){++++}-{4:4}:
[ 27.565340] __might_fault+0xa3/0x100
[ 27.565572] _copy_to_iter+0xdf/0x1520
[ 27.565786] copy_page_to_iter+0x1c6/0x2d0
[ 27.566011] filemap_read+0x5f7/0xbf0
[ 27.566218] netfs_file_read_iter+0x1ca/0x240
[ 27.566453] vfs_read+0x482/0x9e0
[ 27.566658] ksys_read+0x115/0x1f0
[ 27.566852] do_syscall_64+0xde/0x390
[ 27.567057] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 27.567322]
[ 27.567322] -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}:
[ 27.567701] __lock_acquire+0x1529/0x2a10
[ 27.567923] lock_acquire+0xd7/0x280
[ 27.568124] down_read_interruptible+0x55/0x340
[ 27.568368] netfs_start_io_read+0x24/0x110
[ 27.568622] netfs_file_read_iter+0x191/0x240
[ 27.568858] __kernel_read+0x401/0x850
[ 27.569068] freader_fetch+0x19b/0x790
[ 27.569279] __build_id_parse+0xde/0x580
[ 27.569528] stack_map_get_build_id_offset+0x248/0x650
[ 27.569802] __bpf_get_stack+0x653/0x890
[ 27.570019] bpf_get_stack_sleepable+0x1d/0x30
[ 27.570257] bpf_prog_51bb4c0f31ae471b_uprobe_sleepable+0x2d/0x42
[ 27.570594] uprobe_prog_run+0x425/0x870
[ 27.570816] uprobe_multi_link_handler+0x58/0xb0
[ 27.571062] handler_chain+0x234/0x1270
[ 27.571275] uprobe_notify_resume+0x4de/0xd60
[ 27.571542] exit_to_user_mode_loop+0xe0/0x480
[ 27.571785] exc_int3+0x1bd/0x260
[ 27.571973] asm_exc_int3+0x39/0x40
[ 27.572169]
[ 27.572169] other info that might help us debug this:
[ 27.572169]
[ 27.572571] Possible unsafe locking scenario:
[ 27.572571]
[ 27.572854] CPU0 CPU1
[ 27.573074] ---- ----
[ 27.573293] rlock(&mm->mmap_lock);
[ 27.573502] lock(&sb->s_type->i_mutex_key#8);
[ 27.573846] lock(&mm->mmap_lock);
[ 27.574135] rlock(&sb->s_type->i_mutex_key#8);
[ 27.574364]
[ 27.574364] *** DEADLOCK ***
[ 27.574364]
[ 27.574671] 3 locks held by uprobe_multi/561:
[ 27.574884] #0: ffffffff852e0038 (rcu_tasks_trace_srcu_struct){....}-{0:0}, at: uprobe_notify_resume+0x229/0xd60
[ 27.575371] #1: ffffffff852e0038 (rcu_tasks_trace_srcu_struct){....}-{0:0}, at: uprobe_prog_run+0x239/0x870
[ 27.575874] #2: ff1100010b12a640 (&mm->mmap_lock){++++}-{4:4}, at: stack_map_get_build_id_offset+0xfa/0x650
[ 27.576342]
[ 27.576342] stack backtrace:
[ 27.576578] CPU: 7 UID: 0 PID: 561 Comm: uprobe_multi Tainted: G OE 7.0.0-rc5-g0eeb0094ba03 #3 PREEMPT(full)
[ 27.576586] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 27.576588] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-5.el9 11/05/2023
[ 27.576591] Call Trace:
[ 27.576595] <TASK>
[ 27.576598] dump_stack_lvl+0x54/0x70
[ 27.576606] print_circular_bug+0x2e8/0x300
[ 27.576616] check_noncircular+0x12e/0x150
[ 27.576628] __lock_acquire+0x1529/0x2a10
[ 27.576642] ? __pfx_walk_pgd_range+0x10/0x10
[ 27.576651] ? srso_return_thunk+0x5/0x5f
[ 27.576656] ? lock_acquire+0xd7/0x280
[ 27.576661] ? avc_has_perm_noaudit+0x38/0x1f0
[ 27.576672] lock_acquire+0xd7/0x280
[ 27.576677] ? netfs_start_io_read+0x24/0x110
[ 27.576685] ? srso_return_thunk+0x5/0x5f
[ 27.576693] ? netfs_start_io_read+0x24/0x110
[ 27.576698] down_read_interruptible+0x55/0x340
[ 27.576702] ? netfs_start_io_read+0x24/0x110
[ 27.576707] ? __pfx_cmp_ex_search+0x10/0x10
[ 27.576716] netfs_start_io_read+0x24/0x110
[ 27.576723] netfs_file_read_iter+0x191/0x240
[ 27.576731] __kernel_read+0x401/0x850
[ 27.576741] ? __pfx___kernel_read+0x10/0x10
[ 27.576752] ? __pfx_fixup_exception+0x10/0x10
[ 27.576761] ? srso_return_thunk+0x5/0x5f
[ 27.576766] ? lock_is_held_type+0x76/0x100
[ 27.576773] ? srso_return_thunk+0x5/0x5f
[ 27.576777] ? mtree_range_walk+0x421/0x6c0
[ 27.576785] freader_fetch+0x19b/0x790
[ 27.576793] ? mt_find+0x14b/0x340
[ 27.576801] ? __pfx_freader_fetch+0x10/0x10
[ 27.576805] ? mt_find+0x29a/0x340
[ 27.576810] ? mt_find+0x14b/0x340
[ 27.576820] __build_id_parse+0xde/0x580
[ 27.576832] ? __pfx___build_id_parse+0x10/0x10
[ 27.576850] stack_map_get_build_id_offset+0x248/0x650
[ 27.576863] __bpf_get_stack+0x653/0x890
[ 27.576867] ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 27.576880] ? __pfx___bpf_get_stack+0x10/0x10
[ 27.576884] ? lock_acquire+0xd7/0x280
[ 27.576889] ? uprobe_prog_run+0x239/0x870
[ 27.576895] ? srso_return_thunk+0x5/0x5f
[ 27.576899] ? stack_trace_save+0xae/0x100
[ 27.576908] bpf_get_stack_sleepable+0x1d/0x30
[ 27.576913] bpf_prog_51bb4c0f31ae471b_uprobe_sleepable+0x2d/0x42
[ 27.576919] uprobe_prog_run+0x425/0x870
[ 27.576927] ? srso_return_thunk+0x5/0x5f
[ 27.576933] ? __pfx_uprobe_prog_run+0x10/0x10
[ 27.576937] ? uprobe_notify_resume+0x413/0xd60
[ 27.576952] uprobe_multi_link_handler+0x58/0xb0
[ 27.576960] handler_chain+0x234/0x1270
[ 27.576976] ? __pfx_handler_chain+0x10/0x10
[ 27.576988] ? srso_return_thunk+0x5/0x5f
[ 27.576992] ? __kasan_kmalloc+0x72/0x90
[ 27.576998] ? __pfx_ri_timer+0x10/0x10
[ 27.577003] ? timer_init_key+0xf5/0x200
[ 27.577013] uprobe_notify_resume+0x4de/0xd60
[ 27.577025] ? uprobe_notify_resume+0x229/0xd60
[ 27.577030] ? __pfx_uprobe_notify_resume+0x10/0x10
[ 27.577035] ? srso_return_thunk+0x5/0x5f
[ 27.577039] ? atomic_notifier_call_chain+0xfc/0x110
[ 27.577047] ? srso_return_thunk+0x5/0x5f
[ 27.577051] ? notify_die+0xe5/0x130
[ 27.577056] ? __pfx_notify_die+0x10/0x10
[ 27.577063] ? srso_return_thunk+0x5/0x5f
[ 27.577071] exit_to_user_mode_loop+0xe0/0x480
[ 27.577076] ? asm_exc_int3+0x39/0x40
[ 27.577083] exc_int3+0x1bd/0x260
[ 27.577090] asm_exc_int3+0x39/0x40
[ 27.577095] RIP: 0033:0x6c4180
[ 27.577100] Code: c6 05 0b 6f 32 00 01 5d c3 90 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa eb 8a 66 2e 0f 1f 84 00 00 00 00 00 <cc> 48 89 e5 31 c0 5d c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 31 c0
[ 27.577104] RSP: 002b:00007ffcbf71ada8 EFLAGS: 00000246
[ 27.577109] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007fec9163db7b
[ 27.577112] RDX: 00007ffcbf71adbf RSI: 0000000000001000 RDI: 00000000007f0000
[ 27.577115] RBP: 00007ffcbf71add0 R08: 00007fec91731330 R09: 00007fec9178d060
[ 27.577118] R10: 00007fec9153ad98 R11: 0000000000000202 R12: 00007ffcbf71af08
[ 27.577120] R13: 0000000000787760 R14: 00000000009eade8 R15: 00007fec917bf000
[ 27.577135] </TASK>
#54/1 build_id/nofault-paged-out:OK
The splat was reproducible, which allowed me to bisect it to commit
777a8560fd29 ("lib/buildid: use __kernel_read() for sleepable context").
I then used Codex to analyze the splat, the patch and the relevant lore
discussions, and asked it to produce a patch draft, which was the
starting point for this series.
I verified that the patch series fixes the deadlock splat.
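For reference, below is a minimal sketch (my illustration, not the actual
selftest; the attach target, function name and buffer size are made up) of
the kind of sleepable program that reaches this path:

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  char LICENSE[] SEC("license") = "GPL";

  /* one struct bpf_stack_build_id per user stack frame */
  struct bpf_stack_build_id entries[64];

  SEC("uprobe.multi.s//proc/self/exe:trigger_func")
  int uprobe_buildid(struct pt_regs *ctx)
  {
          /* BPF_F_USER_BUILD_ID makes the helper resolve build IDs for
           * user frames; in a sleepable program this takes the faultable
           * parsing path that produced the splat above.
           */
          bpf_get_stack(ctx, entries, sizeof(entries),
                        BPF_F_USER_STACK | BPF_F_USER_BUILD_ID);
          return 0;
  }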
---
v4->v5:
* Add comments explaining mmap_read_trylock() (Shakeel)
* Rebase on bpf-next (Alexei)
v4: https://lore.kernel.org/bpf/20260514184727.1067141-1-ihor.solodrai@linux.dev/
v3->v4:
* Change Fixes tag in patch #2 (AI)
* Nit in caching implementation (Mykyta)
v3: https://lore.kernel.org/bpf/20260512032906.2670326-1-ihor.solodrai@linux.dev/
v2->v3:
* Split patch #2 into two: stack_map_get_build_id_offset_sleepable()
implementation, and then introduce caching
* Drop taking mmap_lock if CONFIG_PER_VMA_LOCK=n, fall back to raw
IPs instead
* Cache vm_{start,end} in addition to prev_file (Mykyta)
v2: https://lore.kernel.org/bpf/20260409010604.1439087-1-ihor.solodrai@linux.dev/
v1->v2:
* Addressed feedback from Puranjay:
* split out a small refactoring patch
* use mmap_read_trylock()
* take into account CONFIG_PER_VMA_LOCK
* replace find_vma() with vma_lookup()
* cache prev_build_id to avoid re-parsing the same file
* Snapshot vm_pgoff and vm_start before unlocking (AI)
* To avoid repetitive unlocking statements, introduce struct
stack_map_vma_lock to hold relevant lock state info and add an
unlock helper
v1: https://lore.kernel.org/bpf/20260407223003.720428-1-ihor.solodrai@linux.dev/
---
Ihor Solodrai (3):
bpf: Factor out stack_map_build_id_set_ip() in stackmap.c
bpf: Avoid faultable build ID reads under mm locks
bpf: Cache build IDs in sleepable stackmap path
kernel/bpf/stackmap.c | 180 +++++++++++++++++++++++++++++++++++++++---
1 file changed, 171 insertions(+), 9 deletions(-)
--
2.54.0
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH bpf-next v5 1/3] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c 2026-05-15 0:52 [PATCH bpf-next v5 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai @ 2026-05-15 0:52 ` Ihor Solodrai 2026-05-15 0:52 ` [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai 2026-05-15 0:52 ` [PATCH bpf-next v5 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai 2 siblings, 0 replies; 12+ messages in thread From: Ihor Solodrai @ 2026-05-15 0:52 UTC (permalink / raw) To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Kumar Kartikeya Dwivedi Cc: Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team Factor out a small helper from stack_map_get_build_id_offset() in preparation for adding a sleepable build ID resolution path. No functional changes. Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> --- kernel/bpf/stackmap.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index da3d328f5c15..4ef0fd06cea5 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -152,6 +152,12 @@ static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, b : build_id_parse_nofault(vma, build_id, NULL); } +static inline void stack_map_build_id_set_ip(struct bpf_stack_build_id *id) +{ + id->status = BPF_STACK_BUILD_ID_IP; + memset(id->build_id, 0, BUILD_ID_SIZE_MAX); +} + /* * Expects all id_offs[i].ip values to be set to correct initial IPs. * They will be subsequently: @@ -165,23 +171,21 @@ static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, b static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, u32 trace_nr, bool user, bool may_fault) { - int i; struct mmap_unlock_irq_work *work = NULL; bool irq_work_busy = bpf_mmap_unlock_get_irq_work(&work); + bool has_user_ctx = user && current && current->mm; struct vm_area_struct *vma, *prev_vma = NULL; const char *prev_build_id; + int i; /* If the irq_work is in use, fall back to report ips. Same * fallback is used for kernel stack (!user) on a stackmap with * build_id. */ - if (!user || !current || !current->mm || irq_work_busy || - !mmap_read_trylock(current->mm)) { + if (!has_user_ctx || irq_work_busy || !mmap_read_trylock(current->mm)) { /* cannot access current->mm, fall back to ips */ - for (i = 0; i < trace_nr; i++) { - id_offs[i].status = BPF_STACK_BUILD_ID_IP; - memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX); - } + for (i = 0; i < trace_nr; i++) + stack_map_build_id_set_ip(&id_offs[i]); return; } @@ -196,8 +200,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, vma = find_vma(current->mm, ip); if (!vma || fetch_build_id(vma, id_offs[i].build_id, may_fault)) { /* per entry fall back to ips */ - id_offs[i].status = BPF_STACK_BUILD_ID_IP; - memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX); + stack_map_build_id_set_ip(&id_offs[i]); continue; } build_id_valid: -- 2.54.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks 2026-05-15 0:52 [PATCH bpf-next v5 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai 2026-05-15 0:52 ` [PATCH bpf-next v5 1/3] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c Ihor Solodrai @ 2026-05-15 0:52 ` Ihor Solodrai 2026-05-15 1:33 ` bot+bpf-ci 2026-05-15 22:40 ` Andrii Nakryiko 2026-05-15 0:52 ` [PATCH bpf-next v5 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai 2 siblings, 2 replies; 12+ messages in thread From: Ihor Solodrai @ 2026-05-15 0:52 UTC (permalink / raw) To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Kumar Kartikeya Dwivedi Cc: Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team Sleepable build ID parsing can block in __kernel_read() [1], so the stackmap sleepable path must not call it while holding mmap_lock or a per-VMA read lock. The issue and the fix are conceptually similar to a recent procfs patch [2]. Resolve each covered VMA with a stable read-side reference, preferring lock_vma_under_rcu() and falling back to mmap_read_trylock() only long enough to acquire the VMA read lock. Take a reference to the backing file, drop the VMA lock, and then parse the build ID through (sleepable) build_id_parse_file(). We have to use mmap_read_trylock() (and give up on failure) in this context because taking mmap_read_lock() is generally unsafe on code paths reachable from BPF programs [3], and may lead to deadlocks. [1] https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/ [2] https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/ [3] https://lore.kernel.org/bpf/2895ecd8-df1e-4cc0-b9f9-aef893dc2360@linux.dev/ Fixes: d4dd9775ec24 ("bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers") Suggested-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> --- kernel/bpf/stackmap.c | 109 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 109 insertions(+) diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index 4ef0fd06cea5..08f7659505d1 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -9,6 +9,7 @@ #include <linux/perf_event.h> #include <linux/btf_ids.h> #include <linux/buildid.h> +#include <linux/mmap_lock.h> #include "percpu_freelist.h" #include "mmap_unlock_work.h" @@ -158,6 +159,109 @@ static inline void stack_map_build_id_set_ip(struct bpf_stack_build_id *id) memset(id->build_id, 0, BUILD_ID_SIZE_MAX); } +struct stack_map_vma_lock { + bool vma_locked; + struct vm_area_struct *vma; + struct mm_struct *mm; +}; + +static struct vm_area_struct * +stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip) +{ + struct mm_struct *mm = lock->mm; + struct vm_area_struct *vma; + + if (WARN_ON_ONCE(!mm)) + return NULL; + + vma = lock_vma_under_rcu(mm, ip); + if (vma) + goto vma_locked; + + /* + * Taking mmap_read_lock() is unsafe here, because the caller + * BPF program might already hold it, causing a deadlock. 
+ */ + if (!mmap_read_trylock(mm)) + return NULL; + + vma = vma_lookup(mm, ip); + if (!vma) { + mmap_read_unlock(mm); + return NULL; + } + +#ifdef CONFIG_PER_VMA_LOCK + if (!vma_start_read_locked(vma)) { + mmap_read_unlock(mm); + return NULL; + } + mmap_read_unlock(mm); +#else + mmap_read_unlock(mm); + return NULL; +#endif +vma_locked: + lock->vma_locked = true; + lock->vma = vma; + return vma; +} + +static void stack_map_unlock_vma(struct stack_map_vma_lock *lock) +{ + struct vm_area_struct *vma = lock->vma; + + if (lock->vma_locked) { + if (WARN_ON_ONCE(!vma)) + goto out; + vma_end_read(vma); + } +out: + lock->vma_locked = false; + lock->vma = NULL; +} + +static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs, + u32 trace_nr) +{ + struct mm_struct *mm = current->mm; + struct stack_map_vma_lock lock = { + .vma_locked = false, + .vma = NULL, + .mm = mm, + }; + unsigned long vm_pgoff, vm_start; + struct vm_area_struct *vma; + struct file *file; + u64 ip; + + for (u32 i = 0; i < trace_nr; i++) { + ip = READ_ONCE(id_offs[i].ip); + vma = stack_map_lock_vma(&lock, ip); + if (!vma || !vma->vm_file) { + stack_map_build_id_set_ip(&id_offs[i]); + stack_map_unlock_vma(&lock); + continue; + } + + file = get_file(vma->vm_file); + vm_pgoff = vma->vm_pgoff; + vm_start = vma->vm_start; + stack_map_unlock_vma(&lock); + + /* build_id_parse_file() may block on filesystem reads */ + if (build_id_parse_file(file, id_offs[i].build_id, NULL)) { + stack_map_build_id_set_ip(&id_offs[i]); + fput(file); + continue; + } + fput(file); + + id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start; + id_offs[i].status = BPF_STACK_BUILD_ID_VALID; + } +} + /* * Expects all id_offs[i].ip values to be set to correct initial IPs. * They will be subsequently: @@ -178,6 +282,11 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, const char *prev_build_id; int i; + if (may_fault && has_user_ctx) { + stack_map_get_build_id_offset_sleepable(id_offs, trace_nr); + return; + } + /* If the irq_work is in use, fall back to report ips. Same * fallback is used for kernel stack (!user) on a stackmap with * build_id. -- 2.54.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks 2026-05-15 0:52 ` [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai @ 2026-05-15 1:33 ` bot+bpf-ci 2026-05-15 22:40 ` Andrii Nakryiko 1 sibling, 0 replies; 12+ messages in thread From: bot+bpf-ci @ 2026-05-15 1:33 UTC (permalink / raw) To: ihor.solodrai, ast, andrii, daniel, eddyz87, memxor Cc: puranjay, shakeel.butt, mykyta.yatsenko5, bpf, linux-kernel, kernel-team, ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai [-- Attachment #1: Type: text/plain, Size: 5331 bytes --] > diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c > index 4ef0fd06cea5..08f7659505d1 100644 > --- a/kernel/bpf/stackmap.c > +++ b/kernel/bpf/stackmap.c > @@ -9,6 +9,7 @@ > #include <linux/perf_event.h> > #include <linux/btf_ids.h> > #include <linux/buildid.h> > +#include <linux/mmap_lock.h> > #include "percpu_freelist.h" > #include "mmap_unlock_work.h" > > @@ -158,6 +159,109 @@ static inline void stack_map_build_id_set_ip(struct bpf_stack_build_id *id) > memset(id->build_id, 0, BUILD_ID_SIZE_MAX); > } > > +struct stack_map_vma_lock { > + bool vma_locked; > + struct vm_area_struct *vma; > + struct mm_struct *mm; > +}; > + > +static struct vm_area_struct * > +stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip) > +{ > + struct mm_struct *mm = lock->mm; > + struct vm_area_struct *vma; > + > + if (WARN_ON_ONCE(!mm)) > + return NULL; > + > + vma = lock_vma_under_rcu(mm, ip); > + if (vma) > + goto vma_locked; > + > + /* > + * Taking mmap_read_lock() is unsafe here, because the caller > + * BPF program might already hold it, causing a deadlock. > + */ > + if (!mmap_read_trylock(mm)) > + return NULL; > + > + vma = vma_lookup(mm, ip); > + if (!vma) { > + mmap_read_unlock(mm); > + return NULL; > + } > + > +#ifdef CONFIG_PER_VMA_LOCK > + if (!vma_start_read_locked(vma)) { > + mmap_read_unlock(mm); > + return NULL; > + } > + mmap_read_unlock(mm); > +#else > + mmap_read_unlock(mm); > + return NULL; > +#endif > +vma_locked: > + lock->vma_locked = true; > + lock->vma = vma; > + return vma; > +} > + > +static void stack_map_unlock_vma(struct stack_map_vma_lock *lock) > +{ > + struct vm_area_struct *vma = lock->vma; > + > + if (lock->vma_locked) { > + if (WARN_ON_ONCE(!vma)) > + goto out; > + vma_end_read(vma); > + } > +out: > + lock->vma_locked = false; > + lock->vma = NULL; > +} > + > +static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs, > + u32 trace_nr) > +{ > + struct mm_struct *mm = current->mm; > + struct stack_map_vma_lock lock = { > + .vma_locked = false, > + .vma = NULL, > + .mm = mm, > + }; > + unsigned long vm_pgoff, vm_start; > + struct vm_area_struct *vma; > + struct file *file; > + u64 ip; > + > + for (u32 i = 0; i < trace_nr; i++) { > + ip = READ_ONCE(id_offs[i].ip); > + vma = stack_map_lock_vma(&lock, ip); > + if (!vma || !vma->vm_file) { ^^^^^^^^^^^^^^^^ This isn't a bug, but a subsystem pattern flags this as potentially concerning: the code uses !vma->vm_file to classify the VMA as anonymous. Per mm-vma.md's VMA classification rules, the correct test for anonymous VMAs is vma_is_anonymous() (i.e. !vma->vm_ops). Some anonymous VMAs carry a vm_file reference. For example, private mappings of /dev/zero set via vma_set_anonymous() leave vm_file populated while clearing vm_ops. 
On such VMAs, this code would proceed down the file-backed path and hand /dev/zero (or any other vma_set_anonymous()-style driver file) to build_id_parse_file(): stack_map_get_build_id_offset_sleepable() { if (!vma || !vma->vm_file) { stack_map_build_id_set_ip(&id_offs[i]); ... continue; } file = get_file(vma->vm_file); /* gets /dev/zero */ ... build_id_parse_file(file, ...) /* tries to parse /dev/zero as ELF */ } instead of immediately falling back to reporting the raw IP. In practice, build_id_parse_file() would fail the ELF magic check with -EINVAL and fall through to stack_map_build_id_set_ip(), so no crash occurs. The code matches the pattern in the existing build_id_parse() and build_id_parse_nofault() API in lib/buildid.c, so the patch is consistent with established kernel build-ID handling. However, is it worth switching to vma_is_anonymous(vma) to avoid the unnecessary sleepable read attempt on driver files? AI reviewers bot+bpf-ci@kernel.org and sashiko-bot@kernel.org raised this across v3 and v4, and it remains unaddressed in the current version: https://lore.kernel.org/bpf/1f275575e3c2309dc7b7f099c7e5cd47bb25182d94883ca0893cf4aaf03c954c@mail.kernel.org/ > + stack_map_build_id_set_ip(&id_offs[i]); > + stack_map_unlock_vma(&lock); > + continue; > + } > + > + file = get_file(vma->vm_file); > + vm_pgoff = vma->vm_pgoff; > + vm_start = vma->vm_start; > + stack_map_unlock_vma(&lock); > + > + /* build_id_parse_file() may block on filesystem reads */ > + if (build_id_parse_file(file, id_offs[i].build_id, NULL)) { > + stack_map_build_id_set_ip(&id_offs[i]); > + fput(file); > + continue; > + } > + fput(file); > + > + id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start; > + id_offs[i].status = BPF_STACK_BUILD_ID_VALID; > + } > +} > + > /* > * Expects all id_offs[i].ip values to be set to correct initial IPs. > * They will be subsequently: > @@ -178,6 +282,11 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, > const char *prev_build_id; > int i; > > + if (may_fault && has_user_ctx) { > + stack_map_get_build_id_offset_sleepable(id_offs, trace_nr); > + return; > + } > + [ ... ] --- AI reviewed your patch. Please fix the bug or email reply why it's not a bug. See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25894427199 ^ permalink raw reply [flat|nested] 12+ messages in thread
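To make the suggestion concrete, the stricter classification would look
roughly like this against the helper introduced in this patch (a sketch,
not a committed change; vma_is_anonymous() tests vma->vm_ops rather than
vma->vm_file):

  vma = stack_map_lock_vma(&lock, ip);
  if (!vma || vma_is_anonymous(vma) || !vma->vm_file) {
          /* anonymous or file-less mapping: fall back to the raw IP */
          stack_map_build_id_set_ip(&id_offs[i]);
          stack_map_unlock_vma(&lock);
          continue;
  }

With that test, mappings like a private /dev/zero mapping (vm_file set,
vm_ops cleared) would skip the sleepable parse attempt instead of failing
it inside build_id_parse_file().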
* Re: [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks 2026-05-15 0:52 ` [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai 2026-05-15 1:33 ` bot+bpf-ci @ 2026-05-15 22:40 ` Andrii Nakryiko 2026-05-15 23:44 ` Ihor Solodrai 1 sibling, 1 reply; 12+ messages in thread From: Andrii Nakryiko @ 2026-05-15 22:40 UTC (permalink / raw) To: Ihor Solodrai, Suren Baghdasaryan Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Kumar Kartikeya Dwivedi, Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team On Thu, May 14, 2026 at 5:53 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote: > > Sleepable build ID parsing can block in __kernel_read() [1], so the > stackmap sleepable path must not call it while holding mmap_lock or a > per-VMA read lock. > > The issue and the fix are conceptually similar to a recent procfs > patch [2]. > > Resolve each covered VMA with a stable read-side reference, preferring > lock_vma_under_rcu() and falling back to mmap_read_trylock() only long > enough to acquire the VMA read lock. Take a reference to the backing > file, drop the VMA lock, and then parse the build ID through > (sleepable) build_id_parse_file(). > > We have to use mmap_read_trylock() (and give up on failure) in this > context because taking mmap_read_lock() is generally unsafe on code > paths reachable from BPF programs [3], and may lead to deadlocks. > > [1] https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/ > [2] https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/ > [3] https://lore.kernel.org/bpf/2895ecd8-df1e-4cc0-b9f9-aef893dc2360@linux.dev/ > > Fixes: d4dd9775ec24 ("bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers") > Suggested-by: Puranjay Mohan <puranjay@kernel.org> > Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> > --- > kernel/bpf/stackmap.c | 109 ++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 109 insertions(+) > > diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c > index 4ef0fd06cea5..08f7659505d1 100644 > --- a/kernel/bpf/stackmap.c > +++ b/kernel/bpf/stackmap.c > @@ -9,6 +9,7 @@ > #include <linux/perf_event.h> > #include <linux/btf_ids.h> > #include <linux/buildid.h> > +#include <linux/mmap_lock.h> > #include "percpu_freelist.h" > #include "mmap_unlock_work.h" > > @@ -158,6 +159,109 @@ static inline void stack_map_build_id_set_ip(struct bpf_stack_build_id *id) > memset(id->build_id, 0, BUILD_ID_SIZE_MAX); > } > > +struct stack_map_vma_lock { > + bool vma_locked; > + struct vm_area_struct *vma; > + struct mm_struct *mm; > +}; > + tbh, it feels like this should be provided as some sort of primitive by vma/mm API given how common it becomes when one tries to work with VMAs efficiently (in terms of avoiding unnecessary mmap lock)... but that would be a question to Suren maybe > +static struct vm_area_struct * > +stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip) > +{ > + struct mm_struct *mm = lock->mm; > + struct vm_area_struct *vma; > + > + if (WARN_ON_ONCE(!mm)) > + return NULL; > + > + vma = lock_vma_under_rcu(mm, ip); > + if (vma) > + goto vma_locked; > + > + /* > + * Taking mmap_read_lock() is unsafe here, because the caller > + * BPF program might already hold it, causing a deadlock. 
> + */ > + if (!mmap_read_trylock(mm)) > + return NULL; > + > + vma = vma_lookup(mm, ip); > + if (!vma) { > + mmap_read_unlock(mm); > + return NULL; > + } > + > +#ifdef CONFIG_PER_VMA_LOCK > + if (!vma_start_read_locked(vma)) { > + mmap_read_unlock(mm); > + return NULL; > + } > + mmap_read_unlock(mm); > +#else > + mmap_read_unlock(mm); > + return NULL; > +#endif > +vma_locked: > + lock->vma_locked = true; > + lock->vma = vma; > + return vma; > +} > + cc'ing Suren to help check we didn't miss any of the per-VMA/mmap locking gotchas > +static void stack_map_unlock_vma(struct stack_map_vma_lock *lock) > +{ > + struct vm_area_struct *vma = lock->vma; > + > + if (lock->vma_locked) { > + if (WARN_ON_ONCE(!vma)) > + goto out; > + vma_end_read(vma); > + } > +out: > + lock->vma_locked = false; > + lock->vma = NULL; > +} > + > +static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs, > + u32 trace_nr) why is this only sleepable case? the only difference between sleepable and non-sleepable is the use of build_id_parse[_file] vs build_id_parse_nofault to fetch build ID, no? Other than that the algorithm of locking VMAs is the same, no? > +{ > + struct mm_struct *mm = current->mm; > + struct stack_map_vma_lock lock = { > + .vma_locked = false, > + .vma = NULL, > + .mm = mm, > + }; > + unsigned long vm_pgoff, vm_start; > + struct vm_area_struct *vma; > + struct file *file; > + u64 ip; > + > + for (u32 i = 0; i < trace_nr; i++) { > + ip = READ_ONCE(id_offs[i].ip); > + vma = stack_map_lock_vma(&lock, ip); > + if (!vma || !vma->vm_file) { > + stack_map_build_id_set_ip(&id_offs[i]); > + stack_map_unlock_vma(&lock); > + continue; > + } > + > + file = get_file(vma->vm_file); > + vm_pgoff = vma->vm_pgoff; > + vm_start = vma->vm_start; nit: we can calculate offset here instead of carrying over pgoff and start, offset formula is cheap, no big deal > + stack_map_unlock_vma(&lock); > + > + /* build_id_parse_file() may block on filesystem reads */ > + if (build_id_parse_file(file, id_offs[i].build_id, NULL)) { > + stack_map_build_id_set_ip(&id_offs[i]); > + fput(file); > + continue; > + } > + fput(file); > + > + id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start; > + id_offs[i].status = BPF_STACK_BUILD_ID_VALID; > + } > +} > + > /* > * Expects all id_offs[i].ip values to be set to correct initial IPs. > * They will be subsequently: > @@ -178,6 +282,11 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, > const char *prev_build_id; > int i; > > + if (may_fault && has_user_ctx) { > + stack_map_get_build_id_offset_sleepable(id_offs, trace_nr); > + return; > + } > + > /* If the irq_work is in use, fall back to report ips. Same > * fallback is used for kernel stack (!user) on a stackmap with > * build_id. > -- > 2.54.0 > ^ permalink raw reply [flat|nested] 12+ messages in thread
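A sketch of the offset nit above, reusing the names from patch 2 and
assuming a local u64 offset: compute the offset while the VMA is still
locked so only the final value is carried past the unlock:

  file = get_file(vma->vm_file);
  offset = (vma->vm_pgoff << PAGE_SHIFT) + ip - vma->vm_start;
  stack_map_unlock_vma(&lock);

  /* build_id_parse_file() may block on filesystem reads */
  if (build_id_parse_file(file, id_offs[i].build_id, NULL)) {
          stack_map_build_id_set_ip(&id_offs[i]);
          fput(file);
          continue;
  }
  fput(file);

  id_offs[i].offset = offset;
  id_offs[i].status = BPF_STACK_BUILD_ID_VALID;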
* Re: [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks 2026-05-15 22:40 ` Andrii Nakryiko @ 2026-05-15 23:44 ` Ihor Solodrai 2026-05-16 2:15 ` Andrii Nakryiko 0 siblings, 1 reply; 12+ messages in thread From: Ihor Solodrai @ 2026-05-15 23:44 UTC (permalink / raw) To: Andrii Nakryiko, Suren Baghdasaryan Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Kumar Kartikeya Dwivedi, Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team On 5/15/26 3:40 PM, Andrii Nakryiko wrote: > On Thu, May 14, 2026 at 5:53 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote: >> >> Sleepable build ID parsing can block in __kernel_read() [1], so the >> stackmap sleepable path must not call it while holding mmap_lock or a >> per-VMA read lock. >> >> The issue and the fix are conceptually similar to a recent procfs >> patch [2]. >> >> Resolve each covered VMA with a stable read-side reference, preferring >> lock_vma_under_rcu() and falling back to mmap_read_trylock() only long >> enough to acquire the VMA read lock. Take a reference to the backing >> file, drop the VMA lock, and then parse the build ID through >> (sleepable) build_id_parse_file(). >> >> We have to use mmap_read_trylock() (and give up on failure) in this >> context because taking mmap_read_lock() is generally unsafe on code >> paths reachable from BPF programs [3], and may lead to deadlocks. >> >> [1] https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/ >> [2] https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/ >> [3] https://lore.kernel.org/bpf/2895ecd8-df1e-4cc0-b9f9-aef893dc2360@linux.dev/ >> >> Fixes: d4dd9775ec24 ("bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers") >> Suggested-by: Puranjay Mohan <puranjay@kernel.org> >> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> >> --- >> kernel/bpf/stackmap.c | 109 ++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 109 insertions(+) >> >> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c >> index 4ef0fd06cea5..08f7659505d1 100644 >> --- a/kernel/bpf/stackmap.c >> +++ b/kernel/bpf/stackmap.c >> @@ -9,6 +9,7 @@ >> #include <linux/perf_event.h> >> #include <linux/btf_ids.h> >> #include <linux/buildid.h> >> +#include <linux/mmap_lock.h> >> #include "percpu_freelist.h" >> #include "mmap_unlock_work.h" >> >> @@ -158,6 +159,109 @@ static inline void stack_map_build_id_set_ip(struct bpf_stack_build_id *id) >> memset(id->build_id, 0, BUILD_ID_SIZE_MAX); >> } >> >> +struct stack_map_vma_lock { >> + bool vma_locked; >> + struct vm_area_struct *vma; >> + struct mm_struct *mm; >> +}; >> + > > tbh, it feels like this should be provided as some sort of primitive > by vma/mm API given how common it becomes when one tries to work with > VMAs efficiently (in terms of avoiding unnecessary mmap lock)... but > that would be a question to Suren maybe Shakeel raised the same point in v4 [1]. I don't know of any potential users of this pattern, and refactoring this into a generic mechanism now (with only stackmap using it) seems premature. Although I agree with the premise. Let's see if Suren has an opinion. 
[1] https://lore.kernel.org/bpf/agYzuDIJszA_7rp3@linux.dev/ > >> +static struct vm_area_struct * >> +stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip) >> +{ >> + struct mm_struct *mm = lock->mm; >> + struct vm_area_struct *vma; >> + >> + if (WARN_ON_ONCE(!mm)) >> + return NULL; >> + >> + vma = lock_vma_under_rcu(mm, ip); >> + if (vma) >> + goto vma_locked; >> + >> + /* >> + * Taking mmap_read_lock() is unsafe here, because the caller >> + * BPF program might already hold it, causing a deadlock. >> + */ >> + if (!mmap_read_trylock(mm)) >> + return NULL; >> + >> + vma = vma_lookup(mm, ip); >> + if (!vma) { >> + mmap_read_unlock(mm); >> + return NULL; >> + } >> + >> +#ifdef CONFIG_PER_VMA_LOCK >> + if (!vma_start_read_locked(vma)) { >> + mmap_read_unlock(mm); >> + return NULL; >> + } >> + mmap_read_unlock(mm); >> +#else >> + mmap_read_unlock(mm); >> + return NULL; >> +#endif >> +vma_locked: >> + lock->vma_locked = true; >> + lock->vma = vma; >> + return vma; >> +} >> + > > cc'ing Suren to help check we didn't miss any of the per-VMA/mmap > locking gotchas > >> +static void stack_map_unlock_vma(struct stack_map_vma_lock *lock) >> +{ >> + struct vm_area_struct *vma = lock->vma; >> + >> + if (lock->vma_locked) { >> + if (WARN_ON_ONCE(!vma)) >> + goto out; >> + vma_end_read(vma); >> + } >> +out: >> + lock->vma_locked = false; >> + lock->vma = NULL; >> +} >> + >> +static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs, >> + u32 trace_nr) > > why is this only sleepable case? the only difference between sleepable > and non-sleepable is the use of build_id_parse[_file] vs > build_id_parse_nofault to fetch build ID, no? Other than that the > algorithm of locking VMAs is the same, no? There seems to be a difference in lock lifetimes, because non-sleepable path may run with IRQs disabled. Right now it does: bool irq_work_busy = bpf_mmap_unlock_get_irq_work(&work); ... if (!has_user_ctx || irq_work_busy || !mmap_read_trylock(current->mm)) { // fallback to IPs... } for (i = 0; i < trace_nr; i++) { // fetch build_id ... } bpf_mmap_unlock_mm(work, current->mm); While on sleepable path we always release the lock before parsing with the new stack_map_unlock_vma(), because build_id_parse_file() can block. Not sure how this could be reconciled in a common lock/unlock pattern, if that's what you're suggesting. For me it's easier to reason about when sleepable / non-sleepable are on distinct codepaths, although we may be able to factor out some common helpers. 
> >> +{ >> + struct mm_struct *mm = current->mm; >> + struct stack_map_vma_lock lock = { >> + .vma_locked = false, >> + .vma = NULL, >> + .mm = mm, >> + }; >> + unsigned long vm_pgoff, vm_start; >> + struct vm_area_struct *vma; >> + struct file *file; >> + u64 ip; >> + >> + for (u32 i = 0; i < trace_nr; i++) { >> + ip = READ_ONCE(id_offs[i].ip); >> + vma = stack_map_lock_vma(&lock, ip); >> + if (!vma || !vma->vm_file) { >> + stack_map_build_id_set_ip(&id_offs[i]); >> + stack_map_unlock_vma(&lock); >> + continue; >> + } >> + >> + file = get_file(vma->vm_file); >> + vm_pgoff = vma->vm_pgoff; >> + vm_start = vma->vm_start; > > nit: we can calculate offset here instead of carrying over pgoff and > start, offset formula is cheap, no big deal > > >> + stack_map_unlock_vma(&lock); >> + >> + /* build_id_parse_file() may block on filesystem reads */ >> + if (build_id_parse_file(file, id_offs[i].build_id, NULL)) { >> + stack_map_build_id_set_ip(&id_offs[i]); >> + fput(file); >> + continue; >> + } >> + fput(file); >> + >> + id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start; >> + id_offs[i].status = BPF_STACK_BUILD_ID_VALID; >> + } >> +} >> + >> /* >> * Expects all id_offs[i].ip values to be set to correct initial IPs. >> * They will be subsequently: >> @@ -178,6 +282,11 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, >> const char *prev_build_id; >> int i; >> >> + if (may_fault && has_user_ctx) { >> + stack_map_get_build_id_offset_sleepable(id_offs, trace_nr); >> + return; >> + } >> + >> /* If the irq_work is in use, fall back to report ips. Same >> * fallback is used for kernel stack (!user) on a stackmap with >> * build_id. >> -- >> 2.54.0 >> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks 2026-05-15 23:44 ` Ihor Solodrai @ 2026-05-16 2:15 ` Andrii Nakryiko 0 siblings, 0 replies; 12+ messages in thread From: Andrii Nakryiko @ 2026-05-16 2:15 UTC (permalink / raw) To: Ihor Solodrai Cc: Suren Baghdasaryan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Kumar Kartikeya Dwivedi, Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team On Fri, May 15, 2026 at 4:44 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote: > > On 5/15/26 3:40 PM, Andrii Nakryiko wrote: > > On Thu, May 14, 2026 at 5:53 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote: > >> > >> Sleepable build ID parsing can block in __kernel_read() [1], so the > >> stackmap sleepable path must not call it while holding mmap_lock or a > >> per-VMA read lock. > >> > >> The issue and the fix are conceptually similar to a recent procfs > >> patch [2]. > >> > >> Resolve each covered VMA with a stable read-side reference, preferring > >> lock_vma_under_rcu() and falling back to mmap_read_trylock() only long > >> enough to acquire the VMA read lock. Take a reference to the backing > >> file, drop the VMA lock, and then parse the build ID through > >> (sleepable) build_id_parse_file(). > >> > >> We have to use mmap_read_trylock() (and give up on failure) in this > >> context because taking mmap_read_lock() is generally unsafe on code > >> paths reachable from BPF programs [3], and may lead to deadlocks. > >> > >> [1] https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/ > >> [2] https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/ > >> [3] https://lore.kernel.org/bpf/2895ecd8-df1e-4cc0-b9f9-aef893dc2360@linux.dev/ > >> > >> Fixes: d4dd9775ec24 ("bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers") > >> Suggested-by: Puranjay Mohan <puranjay@kernel.org> > >> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> > >> --- > >> kernel/bpf/stackmap.c | 109 ++++++++++++++++++++++++++++++++++++++++++ > >> 1 file changed, 109 insertions(+) > >> > >> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c > >> index 4ef0fd06cea5..08f7659505d1 100644 > >> --- a/kernel/bpf/stackmap.c > >> +++ b/kernel/bpf/stackmap.c > >> @@ -9,6 +9,7 @@ > >> #include <linux/perf_event.h> > >> #include <linux/btf_ids.h> > >> #include <linux/buildid.h> > >> +#include <linux/mmap_lock.h> > >> #include "percpu_freelist.h" > >> #include "mmap_unlock_work.h" > >> > >> @@ -158,6 +159,109 @@ static inline void stack_map_build_id_set_ip(struct bpf_stack_build_id *id) > >> memset(id->build_id, 0, BUILD_ID_SIZE_MAX); > >> } > >> > >> +struct stack_map_vma_lock { > >> + bool vma_locked; > >> + struct vm_area_struct *vma; > >> + struct mm_struct *mm; > >> +}; > >> + > > > > tbh, it feels like this should be provided as some sort of primitive > > by vma/mm API given how common it becomes when one tries to work with > > VMAs efficiently (in terms of avoiding unnecessary mmap lock)... but > > that would be a question to Suren maybe > > Shakeel raised the same point in v4 [1]. I don't know of any potential PROCMAP_QUERY and BPF's VMA iterator? > users of this pattern, and refactoring this into a generic mechanism > now (with only stackmap using it) seems premature. Although I agree > with the premise. Let's see if Suren has an opinion. 
> we shouldn't block on this generalization, just something to consider for the nearest future > [1] https://lore.kernel.org/bpf/agYzuDIJszA_7rp3@linux.dev/ > > > [...] > > why is this only sleepable case? the only difference between sleepable > > and non-sleepable is the use of build_id_parse[_file] vs > > build_id_parse_nofault to fetch build ID, no? Other than that the > > algorithm of locking VMAs is the same, no? > > There seems to be a difference in lock lifetimes, because > non-sleepable path may run with IRQs disabled. Right now it does: > > bool irq_work_busy = bpf_mmap_unlock_get_irq_work(&work); > ... > if (!has_user_ctx || irq_work_busy || !mmap_read_trylock(current->mm)) { > // fallback to IPs... > } > for (i = 0; i < trace_nr; i++) { > // fetch build_id ... > } > bpf_mmap_unlock_mm(work, current->mm); > > While on sleepable path we always release the lock before parsing with > the new stack_map_unlock_vma(), because build_id_parse_file() can > block. > > Not sure how this could be reconciled in a common lock/unlock pattern, > if that's what you're suggesting. > > For me it's easier to reason about when sleepable / non-sleepable are > on distinct codepaths, although we may be able to factor out some > common helpers. The problem is that the rest of the logic is not exactly short or trivial, so it's a bit of a liability to have multiple copies of it across sleepable/non-sleepable implementations. But never mind, we don't have to do it right now. > > > > [...] ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH bpf-next v5 3/3] bpf: Cache build IDs in sleepable stackmap path 2026-05-15 0:52 [PATCH bpf-next v5 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai 2026-05-15 0:52 ` [PATCH bpf-next v5 1/3] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c Ihor Solodrai 2026-05-15 0:52 ` [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai @ 2026-05-15 0:52 ` Ihor Solodrai 2026-05-15 1:21 ` bot+bpf-ci 2026-05-15 22:40 ` Andrii Nakryiko 2 siblings, 2 replies; 12+ messages in thread From: Ihor Solodrai @ 2026-05-15 0:52 UTC (permalink / raw) To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Kumar Kartikeya Dwivedi Cc: Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team Stack traces often contain adjacent IPs from the same VMA or from different VMAs backed by the same ELF file. Cache the last successfully parsed build ID together with the resolved VMA range and backing file so the sleepable build-ID path can avoid repeated VMA locking and file parsing in common cases. Suggested-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> --- kernel/bpf/stackmap.c | 56 ++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 53 insertions(+), 3 deletions(-) diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index 08f7659505d1..7336fd55c856 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -230,13 +230,33 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i .vma = NULL, .mm = mm, }; - unsigned long vm_pgoff, vm_start; + struct { + struct file *file; + const unsigned char *build_id; + unsigned long vm_start; + unsigned long vm_end; + unsigned long vm_pgoff; + } cache = {}; + unsigned long vm_pgoff, vm_start, vm_end; struct vm_area_struct *vma; struct file *file; u64 ip; for (u32 i = 0; i < trace_nr; i++) { ip = READ_ONCE(id_offs[i].ip); + + /* + * Range cache fast path: if ip falls within the previously + * resolved VMA range, reuse the cache build_id without + * re-acquiring the VMA lock. + */ + if (cache.build_id && ip >= cache.vm_start && ip < cache.vm_end) { + vm_start = cache.vm_start; + vm_end = cache.vm_end; + vm_pgoff = cache.vm_pgoff; + goto build_id_valid; + } + vma = stack_map_lock_vma(&lock, ip); if (!vma || !vma->vm_file) { stack_map_build_id_set_ip(&id_offs[i]); @@ -244,9 +264,21 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i continue; } - file = get_file(vma->vm_file); + file = vma->vm_file; vm_pgoff = vma->vm_pgoff; vm_start = vma->vm_start; + vm_end = vma->vm_end; + + if (file == cache.file) { + /* + * Same backing file as previous (e.g. different VMAs + * of the same ELF binary). Reuse the cache build_id. + */ + stack_map_unlock_vma(&lock); + goto build_id_valid; + } + + file = get_file(file); stack_map_unlock_vma(&lock); /* build_id_parse_file() may block on filesystem reads */ @@ -255,11 +287,29 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i fput(file); continue; } - fput(file); + if (cache.file) + fput(cache.file); + cache.file = file; + cache.build_id = id_offs[i].build_id; + +build_id_valid: + /* + * In the slow path cache.build_id points to id_offs[i].build_id. + * Cache hits leave cache.build_id pointing at a prior slot, + * triggering the memcpy here. 
+ */ + if (cache.build_id != id_offs[i].build_id) + memcpy(id_offs[i].build_id, cache.build_id, BUILD_ID_SIZE_MAX); + cache.vm_start = vm_start; + cache.vm_end = vm_end; + cache.vm_pgoff = vm_pgoff; id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start; id_offs[i].status = BPF_STACK_BUILD_ID_VALID; } + + if (cache.file) + fput(cache.file); } /* -- 2.54.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next v5 3/3] bpf: Cache build IDs in sleepable stackmap path 2026-05-15 0:52 ` [PATCH bpf-next v5 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai @ 2026-05-15 1:21 ` bot+bpf-ci 2026-05-15 22:40 ` Andrii Nakryiko 1 sibling, 0 replies; 12+ messages in thread From: bot+bpf-ci @ 2026-05-15 1:21 UTC (permalink / raw) To: ihor.solodrai, ast, andrii, daniel, eddyz87, memxor Cc: puranjay, shakeel.butt, mykyta.yatsenko5, bpf, linux-kernel, kernel-team, ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai [-- Attachment #1: Type: text/plain, Size: 3794 bytes --] > diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c > index 08f7659505d1..7336fd55c856 100644 > --- a/kernel/bpf/stackmap.c > +++ b/kernel/bpf/stackmap.c > @@ -230,13 +230,33 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i > .vma = NULL, > .mm = mm, > }; > - unsigned long vm_pgoff, vm_start; > + struct { > + struct file *file; > + const unsigned char *build_id; > + unsigned long vm_start; > + unsigned long vm_end; > + unsigned long vm_pgoff; > + } cache = {}; > + unsigned long vm_pgoff, vm_start, vm_end; > struct vm_area_struct *vma; > struct file *file; > u64 ip; > > for (u32 i = 0; i < trace_nr; i++) { > ip = READ_ONCE(id_offs[i].ip); > + > + /* > + * Range cache fast path: if ip falls within the previously > + * resolved VMA range, reuse the cache build_id without > + * re-acquiring the VMA lock. > + */ > + if (cache.build_id && ip >= cache.vm_start && ip < cache.vm_end) { > + vm_start = cache.vm_start; > + vm_end = cache.vm_end; > + vm_pgoff = cache.vm_pgoff; > + goto build_id_valid; > + } > + > vma = stack_map_lock_vma(&lock, ip); > if (!vma || !vma->vm_file) { > stack_map_build_id_set_ip(&id_offs[i]); In v3 and v4, sashiko-bot@kernel.org and bot+bpf-ci@kernel.org raised a concern about VMA classification here. The code uses !vma->vm_file to determine if a VMA is file-backed, but this could cause issues with anonymous VMAs that still have a backing file. Private mappings of /dev/zero or driver memory can have a non-NULL vma->vm_file but are still considered anonymous. If the stackmap path proceeds to parse the ELF build ID from these files, could it trigger unintended side effects, such as consuming FIFO data or altering hardware state when reading from character devices? Would it be safer to check vma_is_anonymous(vma) here to properly classify the VMA? Original comment from sashiko-bot@kernel.org (v3): https://lore.kernel.org/bpf/20260513043154.24483C2BCB7@smtp.kernel.org/ > @@ -244,9 +264,21 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i > continue; > } > > - file = get_file(vma->vm_file); > + file = vma->vm_file; > vm_pgoff = vma->vm_pgoff; > vm_start = vma->vm_start; > + vm_end = vma->vm_end; > + > + if (file == cache.file) { > + /* > + * Same backing file as previous (e.g. different VMAs > + * of the same ELF binary). Reuse the cache build_id. > + */ > + stack_map_unlock_vma(&lock); > + goto build_id_valid; > + } > + > + file = get_file(file); > stack_map_unlock_vma(&lock); > > /* build_id_parse_file() may block on filesystem reads */ > @@ -255,11 +287,29 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i > fput(file); > continue; > } > - fput(file); > > + if (cache.file) > + fput(cache.file); > + cache.file = file; > + cache.build_id = id_offs[i].build_id; > + > +build_id_valid: > + /* > + * In the slow path cache.build_id points to id_offs[i].build_id. 
> + * Cache hits leave cache.build_id pointing at a prior slot, > + * triggering the memcpy here. > + */ > + if (cache.build_id != id_offs[i].build_id) > + memcpy(id_offs[i].build_id, cache.build_id, BUILD_ID_SIZE_MAX); > + cache.vm_start = vm_start; > + cache.vm_end = vm_end; > + cache.vm_pgoff = vm_pgoff; > id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start; > id_offs[i].status = BPF_STACK_BUILD_ID_VALID; > } > + > + if (cache.file) > + fput(cache.file); > } > > /* --- AI reviewed your patch. Please fix the bug or email reply why it's not a bug. See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25894427199 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next v5 3/3] bpf: Cache build IDs in sleepable stackmap path 2026-05-15 0:52 ` [PATCH bpf-next v5 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai 2026-05-15 1:21 ` bot+bpf-ci @ 2026-05-15 22:40 ` Andrii Nakryiko 2026-05-15 23:59 ` Ihor Solodrai 1 sibling, 1 reply; 12+ messages in thread From: Andrii Nakryiko @ 2026-05-15 22:40 UTC (permalink / raw) To: Ihor Solodrai Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Kumar Kartikeya Dwivedi, Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team On Thu, May 14, 2026 at 5:53 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote: > > Stack traces often contain adjacent IPs from the same VMA or from > different VMAs backed by the same ELF file. Cache the last successfully > parsed build ID together with the resolved VMA range and backing file > so the sleepable build-ID path can avoid repeated VMA locking and file > parsing in common cases. > > Suggested-by: Mykyta Yatsenko <yatsenko@meta.com> > Acked-by: Mykyta Yatsenko <yatsenko@meta.com> > Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> > --- > kernel/bpf/stackmap.c | 56 ++++++++++++++++++++++++++++++++++++++++--- > 1 file changed, 53 insertions(+), 3 deletions(-) > > diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c > index 08f7659505d1..7336fd55c856 100644 > --- a/kernel/bpf/stackmap.c > +++ b/kernel/bpf/stackmap.c > @@ -230,13 +230,33 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i > .vma = NULL, > .mm = mm, > }; > - unsigned long vm_pgoff, vm_start; > + struct { > + struct file *file; > + const unsigned char *build_id; > + unsigned long vm_start; > + unsigned long vm_end; > + unsigned long vm_pgoff; > + } cache = {}; > + unsigned long vm_pgoff, vm_start, vm_end; > struct vm_area_struct *vma; > struct file *file; > u64 ip; > > for (u32 i = 0; i < trace_nr; i++) { > ip = READ_ONCE(id_offs[i].ip); > + > + /* > + * Range cache fast path: if ip falls within the previously > + * resolved VMA range, reuse the cache build_id without > + * re-acquiring the VMA lock. > + */ > + if (cache.build_id && ip >= cache.vm_start && ip < cache.vm_end) { > + vm_start = cache.vm_start; > + vm_end = cache.vm_end; > + vm_pgoff = cache.vm_pgoff; isn't this unnecessarily convoluted way of doing things? If we are here, we know for sure a) how to calculate offset and b) we need to memcpy. So just do that here and just then maybe jump all the way to id_offs[i].offset setting. Or just fill offset, status and memcpy build id here and continue. > + goto build_id_valid; > + } > + > vma = stack_map_lock_vma(&lock, ip); > if (!vma || !vma->vm_file) { > stack_map_build_id_set_ip(&id_offs[i]); > @@ -244,9 +264,21 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i > continue; > } > > - file = get_file(vma->vm_file); > + file = vma->vm_file; > vm_pgoff = vma->vm_pgoff; > vm_start = vma->vm_start; > + vm_end = vma->vm_end; > + > + if (file == cache.file) { > + /* > + * Same backing file as previous (e.g. different VMAs > + * of the same ELF binary). Reuse the cache build_id. > + */ > + stack_map_unlock_vma(&lock); > + goto build_id_valid; > + } same thing here, file matched, so memcpy(build_id), calculate correct offset, set offset and status, unlock and continue. This logic is pretty straightforward, IMO it's cleaner to copy-paste these 4 lines than dealing with the goto maze... 
> + > + file = get_file(file); > stack_map_unlock_vma(&lock); > > /* build_id_parse_file() may block on filesystem reads */ > @@ -255,11 +287,29 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i > fput(file); > continue; > } > - fput(file); > > + if (cache.file) > + fput(cache.file); > + cache.file = file; > + cache.build_id = id_offs[i].build_id; > + > +build_id_valid: > + /* > + * In the slow path cache.build_id points to id_offs[i].build_id. > + * Cache hits leave cache.build_id pointing at a prior slot, > + * triggering the memcpy here. > + */ > + if (cache.build_id != id_offs[i].build_id) > + memcpy(id_offs[i].build_id, cache.build_id, BUILD_ID_SIZE_MAX); > + cache.vm_start = vm_start; > + cache.vm_end = vm_end; > + cache.vm_pgoff = vm_pgoff; > id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start; > id_offs[i].status = BPF_STACK_BUILD_ID_VALID; > } > + > + if (cache.file) > + fput(cache.file); > } > > /* > -- > 2.54.0 > ^ permalink raw reply [flat|nested] 12+ messages in thread
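To make the restructuring suggestion concrete, the range-cache hit could
be handled fully in place (a sketch using the cache fields from this
patch; whether it reads better than the single build_id_valid label is
the style question discussed in the follow-up below):

  if (cache.build_id && ip >= cache.vm_start && ip < cache.vm_end) {
          memcpy(id_offs[i].build_id, cache.build_id, BUILD_ID_SIZE_MAX);
          id_offs[i].offset = (cache.vm_pgoff << PAGE_SHIFT) + ip - cache.vm_start;
          id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
          continue;
  }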
* Re: [PATCH bpf-next v5 3/3] bpf: Cache build IDs in sleepable stackmap path 2026-05-15 22:40 ` Andrii Nakryiko @ 2026-05-15 23:59 ` Ihor Solodrai 2026-05-16 2:20 ` Andrii Nakryiko 0 siblings, 1 reply; 12+ messages in thread From: Ihor Solodrai @ 2026-05-15 23:59 UTC (permalink / raw) To: Andrii Nakryiko Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Kumar Kartikeya Dwivedi, Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team On 5/15/26 3:40 PM, Andrii Nakryiko wrote: > On Thu, May 14, 2026 at 5:53 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote: >> >> Stack traces often contain adjacent IPs from the same VMA or from >> different VMAs backed by the same ELF file. Cache the last successfully >> parsed build ID together with the resolved VMA range and backing file >> so the sleepable build-ID path can avoid repeated VMA locking and file >> parsing in common cases. >> >> Suggested-by: Mykyta Yatsenko <yatsenko@meta.com> >> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> >> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> >> --- >> kernel/bpf/stackmap.c | 56 ++++++++++++++++++++++++++++++++++++++++--- >> 1 file changed, 53 insertions(+), 3 deletions(-) >> >> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c >> index 08f7659505d1..7336fd55c856 100644 >> --- a/kernel/bpf/stackmap.c >> +++ b/kernel/bpf/stackmap.c >> @@ -230,13 +230,33 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i >> .vma = NULL, >> .mm = mm, >> }; >> - unsigned long vm_pgoff, vm_start; >> + struct { >> + struct file *file; >> + const unsigned char *build_id; >> + unsigned long vm_start; >> + unsigned long vm_end; >> + unsigned long vm_pgoff; >> + } cache = {}; >> + unsigned long vm_pgoff, vm_start, vm_end; >> struct vm_area_struct *vma; >> struct file *file; >> u64 ip; >> >> for (u32 i = 0; i < trace_nr; i++) { >> ip = READ_ONCE(id_offs[i].ip); >> + >> + /* >> + * Range cache fast path: if ip falls within the previously >> + * resolved VMA range, reuse the cache build_id without >> + * re-acquiring the VMA lock. >> + */ >> + if (cache.build_id && ip >= cache.vm_start && ip < cache.vm_end) { >> + vm_start = cache.vm_start; >> + vm_end = cache.vm_end; >> + vm_pgoff = cache.vm_pgoff; > > isn't this unnecessarily convoluted way of doing things? If we are > here, we know for sure a) how to calculate offset and b) we need to > memcpy. So just do that here and just then maybe jump all the way to > id_offs[i].offset setting. Or just fill offset, status and memcpy > build id here and continue. > >> + goto build_id_valid; >> + } >> + >> vma = stack_map_lock_vma(&lock, ip); >> if (!vma || !vma->vm_file) { >> stack_map_build_id_set_ip(&id_offs[i]); >> @@ -244,9 +264,21 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i >> continue; >> } >> >> - file = get_file(vma->vm_file); >> + file = vma->vm_file; >> vm_pgoff = vma->vm_pgoff; >> vm_start = vma->vm_start; >> + vm_end = vma->vm_end; >> + >> + if (file == cache.file) { >> + /* >> + * Same backing file as previous (e.g. different VMAs >> + * of the same ELF binary). Reuse the cache build_id. >> + */ >> + stack_map_unlock_vma(&lock); >> + goto build_id_valid; >> + } > > same thing here, file matched, so memcpy(build_id), calculate correct > offset, set offset and status, unlock and continue. This logic is > pretty straightforward, IMO it's cleaner to copy-paste these 4 lines > than dealing with the goto maze... Eeh... it's a style preference I guess. 
I had two memcpy() points originally, and then Mykyta suggested to
consolidate them, so I did.

Having a single "build_id_valid" point with a common "ok" sequence
makes sense to me, but at the same time I am not allergic to 4-line
copy pastes.

AFAIU with your suggestion we'll still have a label at
id_offs[i].offset = ... so it's not like we'll get rid of goto right?

>
>
>> +
>> +               file = get_file(file);
>>                 stack_map_unlock_vma(&lock);
>>
>>                 /* build_id_parse_file() may block on filesystem reads */
>> @@ -255,11 +287,29 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i
>>                         fput(file);
>>                         continue;
>>                 }
>> -               fput(file);
>>
>> +               if (cache.file)
>> +                       fput(cache.file);
>> +               cache.file = file;
>> +               cache.build_id = id_offs[i].build_id;
>> +
>> +build_id_valid:
>> +               /*
>> +                * In the slow path cache.build_id points to id_offs[i].build_id.
>> +                * Cache hits leave cache.build_id pointing at a prior slot,
>> +                * triggering the memcpy here.
>> +                */
>> +               if (cache.build_id != id_offs[i].build_id)
>> +                       memcpy(id_offs[i].build_id, cache.build_id, BUILD_ID_SIZE_MAX);
>> +               cache.vm_start = vm_start;
>> +               cache.vm_end = vm_end;
>> +               cache.vm_pgoff = vm_pgoff;
>>                 id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start;
>>                 id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
>>         }
>> +
>> +       if (cache.file)
>> +               fput(cache.file);
>>  }
>>
>>  /*
>> --
>> 2.54.0
>>

^ permalink raw reply [flat|nested] 12+ messages in thread
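For illustration only (not the kernel code), the reference ownership the patch relies on can be modeled in a few lines of standalone userspace C: the loop holds at most one cached reference, swaps it when the backing file changes, and releases it once after the loop. file_get()/file_put() below are toy stand-ins for get_file()/fput(), and the names are invented for the sketch.

#include <stdio.h>

struct file { int refcount; };

static struct file *file_get(struct file *f) { f->refcount++; return f; }
static void file_put(struct file *f) { f->refcount--; }

int main(void)
{
        struct file a = { .refcount = 1 }, b = { .refcount = 1 };
        struct file *per_ip[] = { &a, &a, &b }; /* backing file per stack entry */
        struct file *cache_file = NULL;

        for (size_t i = 0; i < 3; i++) {
                struct file *f = per_ip[i];

                if (f == cache_file)
                        continue;               /* cache hit: no new reference taken */

                f = file_get(f);                /* slow path: pin the file while parsing */
                if (cache_file)
                        file_put(cache_file);   /* drop the previously cached reference */
                cache_file = f;
        }
        if (cache_file)
                file_put(cache_file);           /* final release after the loop */

        printf("a.refcount=%d b.refcount=%d\n", a.refcount, b.refcount);
        return 0;
}

The point of the sketch is only that every file_get() is balanced by exactly one file_put(), whichever path an iteration takes.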
* Re: [PATCH bpf-next v5 3/3] bpf: Cache build IDs in sleepable stackmap path
2026-05-15 23:59 ` Ihor Solodrai
@ 2026-05-16 2:20 ` Andrii Nakryiko
0 siblings, 0 replies; 12+ messages in thread
From: Andrii Nakryiko @ 2026-05-16 2:20 UTC (permalink / raw)
To: Ihor Solodrai
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Puranjay Mohan,
Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team

On Fri, May 15, 2026 at 4:59 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote:
>
> On 5/15/26 3:40 PM, Andrii Nakryiko wrote:
> > On Thu, May 14, 2026 at 5:53 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote:
> >>
> >> Stack traces often contain adjacent IPs from the same VMA or from
> >> different VMAs backed by the same ELF file. Cache the last successfully
> >> parsed build ID together with the resolved VMA range and backing file
> >> so the sleepable build-ID path can avoid repeated VMA locking and file
> >> parsing in common cases.
> >>
> >> Suggested-by: Mykyta Yatsenko <yatsenko@meta.com>
> >> Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
> >> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
> >> ---
> >>  kernel/bpf/stackmap.c | 56 ++++++++++++++++++++++++++++++++++++++++---
> >>  1 file changed, 53 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> >> index 08f7659505d1..7336fd55c856 100644
> >> --- a/kernel/bpf/stackmap.c
> >> +++ b/kernel/bpf/stackmap.c
> >> @@ -230,13 +230,33 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i
> >>                 .vma = NULL,
> >>                 .mm = mm,
> >>         };
> >> -       unsigned long vm_pgoff, vm_start;
> >> +       struct {
> >> +               struct file *file;
> >> +               const unsigned char *build_id;
> >> +               unsigned long vm_start;
> >> +               unsigned long vm_end;
> >> +               unsigned long vm_pgoff;
> >> +       } cache = {};
> >> +       unsigned long vm_pgoff, vm_start, vm_end;
> >>         struct vm_area_struct *vma;
> >>         struct file *file;
> >>         u64 ip;
> >>
> >>         for (u32 i = 0; i < trace_nr; i++) {
> >>                 ip = READ_ONCE(id_offs[i].ip);
> >> +
> >> +               /*
> >> +                * Range cache fast path: if ip falls within the previously
> >> +                * resolved VMA range, reuse the cache build_id without
> >> +                * re-acquiring the VMA lock.
> >> +                */
> >> +               if (cache.build_id && ip >= cache.vm_start && ip < cache.vm_end) {
> >> +                       vm_start = cache.vm_start;
> >> +                       vm_end = cache.vm_end;
> >> +                       vm_pgoff = cache.vm_pgoff;
> >
> > isn't this unnecessarily convoluted way of doing things? If we are
> > here, we know for sure a) how to calculate offset and b) we need to
> > memcpy. So just do that here and just then maybe jump all the way to
> > id_offs[i].offset setting. Or just fill offset, status and memcpy
> > build id here and continue.
> >
> >> +                       goto build_id_valid;
> >> +               }
> >> +
> >>                 vma = stack_map_lock_vma(&lock, ip);
> >>                 if (!vma || !vma->vm_file) {
> >>                         stack_map_build_id_set_ip(&id_offs[i]);
> >> @@ -244,9 +264,21 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i
> >>                         continue;
> >>                 }
> >>
> >> -               file = get_file(vma->vm_file);
> >> +               file = vma->vm_file;
> >>                 vm_pgoff = vma->vm_pgoff;
> >>                 vm_start = vma->vm_start;
> >> +               vm_end = vma->vm_end;
> >> +
> >> +               if (file == cache.file) {
> >> +                       /*
> >> +                        * Same backing file as previous (e.g. different VMAs
> >> +                        * of the same ELF binary). Reuse the cache build_id.
> >> +                        */
> >> +                       stack_map_unlock_vma(&lock);
> >> +                       goto build_id_valid;
> >> +               }
> >
> > same thing here, file matched, so memcpy(build_id), calculate correct
> > offset, set offset and status, unlock and continue. This logic is
> > pretty straightforward, IMO it's cleaner to copy-paste these 4 lines
> > than dealing with the goto maze...
>
>
> Eeh... it's a style preference I guess. I had two memcpy() points
> originally, and then Mykyta suggested to consolidate them, so I did.
>
> Having a single "build_id_valid" point with a common "ok" sequence
> makes sense to me, but at the same time I am not allergic to 4-line
> copy pastes.
>
> AFAIU with your suggestion we'll still have a label at
> id_offs[i].offset = ... so it's not like we'll get rid of goto right?

I'd forego goto altogether. You extracted stack_map_build_id_set_ip()
for absolute IP case, why not add stack_map_build_id_set_offset() (or
something a bit more reasonably named):

void stack_map_build_id_set_offset(struct bpf_stack_build_id *dst,
                                   unsigned long off, void *build_id)
{
        memcpy(dst->build_id, build_id, BUILD_ID_SIZE_MAX);
        dst->offset = off;
        dst->status = BPF_STACK_BUILD_ID_VALID;
}

and then `stack_map_build_id_set_offset() + continue` for each of
three possible build id cases

>
> >
> >
> >> +
> >> +               file = get_file(file);
> >>                 stack_map_unlock_vma(&lock);
> >>
> >>                 /* build_id_parse_file() may block on filesystem reads */
> >> @@ -255,11 +287,29 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i
> >>                         fput(file);
> >>                         continue;
> >>                 }
> >> -               fput(file);
> >>
> >> +               if (cache.file)
> >> +                       fput(cache.file);
> >> +               cache.file = file;
> >> +               cache.build_id = id_offs[i].build_id;
> >> +
> >> +build_id_valid:
> >> +               /*
> >> +                * In the slow path cache.build_id points to id_offs[i].build_id.
> >> +                * Cache hits leave cache.build_id pointing at a prior slot,
> >> +                * triggering the memcpy here.
> >> +                */
> >> +               if (cache.build_id != id_offs[i].build_id)
> >> +                       memcpy(id_offs[i].build_id, cache.build_id, BUILD_ID_SIZE_MAX);
> >> +               cache.vm_start = vm_start;
> >> +               cache.vm_end = vm_end;
> >> +               cache.vm_pgoff = vm_pgoff;
> >>                 id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start;
> >>                 id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
> >>         }
> >> +
> >> +       if (cache.file)
> >> +               fput(cache.file);
> >>  }
> >>
> >>  /*
> >> --
> >> 2.54.0
> >>
>

^ permalink raw reply [flat|nested] 12+ messages in thread
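To make the proposed control flow concrete, here is a small standalone userspace sketch of the loop shape being suggested: every "valid" case ends in a set_offset()-style call plus continue, so no label or goto remains. The types and names below (struct id_off, set_offset, the sample addresses) are simplified stand-ins invented for the sketch, not the kernel definitions from the patch.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BUILD_ID_SIZE_MAX 20
#define PAGE_SHIFT 12

struct id_off {
        unsigned long ip;
        unsigned long offset;
        unsigned char build_id[BUILD_ID_SIZE_MAX];
        bool valid;
};

/* userspace counterpart of the proposed stack_map_build_id_set_offset() helper */
static void set_offset(struct id_off *dst, unsigned long off,
                       const unsigned char *build_id)
{
        memcpy(dst->build_id, build_id, BUILD_ID_SIZE_MAX);
        dst->offset = off;
        dst->valid = true;
}

int main(void)
{
        /* cached state left over from a previous iteration (values are made up) */
        unsigned long vm_start = 0x400000, vm_end = 0x500000, vm_pgoff = 0;
        unsigned char cached_id[BUILD_ID_SIZE_MAX];
        struct id_off entries[] = { { .ip = 0x400080 }, { .ip = 0x600000 } };

        memset(cached_id, 0xcd, sizeof(cached_id));

        for (size_t i = 0; i < 2; i++) {
                struct id_off *e = &entries[i];

                /* range-cache hit: helper + continue, no goto needed */
                if (e->ip >= vm_start && e->ip < vm_end) {
                        set_offset(e, (vm_pgoff << PAGE_SHIFT) + e->ip - vm_start,
                                   cached_id);
                        continue;
                }
                /*
                 * The same-file hit and the slow parse path would likewise end
                 * in set_offset() + continue in the real function.
                 */
        }

        printf("entry0: valid=%d offset=%#lx\n", entries[0].valid, entries[0].offset);
        printf("entry1: valid=%d\n", entries[1].valid);
        return 0;
}

The trade-off under discussion is purely structural: one shared epilogue reached by goto versus a tiny helper duplicated at each call site.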
end of thread, other threads:[~2026-05-16 2:20 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-15 0:52 [PATCH bpf-next v5 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai
2026-05-15 0:52 ` [PATCH bpf-next v5 1/3] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c Ihor Solodrai
2026-05-15 0:52 ` [PATCH bpf-next v5 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai
2026-05-15 1:33 ` bot+bpf-ci
2026-05-15 22:40 ` Andrii Nakryiko
2026-05-15 23:44 ` Ihor Solodrai
2026-05-16 2:15 ` Andrii Nakryiko
2026-05-15 0:52 ` [PATCH bpf-next v5 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai
2026-05-15 1:21 ` bot+bpf-ci
2026-05-15 22:40 ` Andrii Nakryiko
2026-05-15 23:59 ` Ihor Solodrai
2026-05-16 2:20 ` Andrii Nakryiko
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox