* [PATCH bpf-next v6 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable()
@ 2026-05-21 22:50 Ihor Solodrai
2026-05-21 22:50 ` [PATCH bpf-next v6 1/3] bpf: Factor out stack_map build ID helpers Ihor Solodrai
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Ihor Solodrai @ 2026-05-21 22:50 UTC (permalink / raw)
To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Eduard Zingerman, Kumar Kartikeya Dwivedi
Cc: Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel,
kernel-team
The series introduces stack_map_get_build_id_offset_sleepable(),
fixing a gap with parsing build_id in sleepable context in stackmap.c
I noticed a following deadlock splat on bpf-next, when running a
subset of selftests/bpf:
#526 uprobe_multi_test:OK
#28 bpf_obj_id:OK
[ 27.561465]
[ 27.561603] ======================================================
[ 27.561899] WARNING: possible circular locking dependency detected
[ 27.562193] 7.0.0-rc5-g0eeb0094ba03 #3 Tainted: G OE
[ 27.562523] ------------------------------------------------------
[ 27.562820] uprobe_multi/561 is trying to acquire lock:
[ 27.563071] ff1100010768ba18 (&sb->s_type->i_mutex_key#8){++++}-{4:4}, at: netfs_start_io_read+0x24/0x110
[ 27.563556]
[ 27.563556] but task is already holding lock:
[ 27.563845] ff1100010b12a640 (&mm->mmap_lock){++++}-{4:4}, at: stack_map_get_build_id_offset+0xfa/0x650
[ 27.564296]
[ 27.564296] which lock already depends on the new lock.
[ 27.564296]
[ 27.564696]
[ 27.564696] the existing dependency chain (in reverse order) is:
[ 27.565044]
[ 27.565044] -> #1 (&mm->mmap_lock){++++}-{4:4}:
[ 27.565340] __might_fault+0xa3/0x100
[ 27.565572] _copy_to_iter+0xdf/0x1520
[ 27.565786] copy_page_to_iter+0x1c6/0x2d0
[ 27.566011] filemap_read+0x5f7/0xbf0
[ 27.566218] netfs_file_read_iter+0x1ca/0x240
[ 27.566453] vfs_read+0x482/0x9e0
[ 27.566658] ksys_read+0x115/0x1f0
[ 27.566852] do_syscall_64+0xde/0x390
[ 27.567057] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 27.567322]
[ 27.567322] -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}:
[ 27.567701] __lock_acquire+0x1529/0x2a10
[ 27.567923] lock_acquire+0xd7/0x280
[ 27.568124] down_read_interruptible+0x55/0x340
[ 27.568368] netfs_start_io_read+0x24/0x110
[ 27.568622] netfs_file_read_iter+0x191/0x240
[ 27.568858] __kernel_read+0x401/0x850
[ 27.569068] freader_fetch+0x19b/0x790
[ 27.569279] __build_id_parse+0xde/0x580
[ 27.569528] stack_map_get_build_id_offset+0x248/0x650
[ 27.569802] __bpf_get_stack+0x653/0x890
[ 27.570019] bpf_get_stack_sleepable+0x1d/0x30
[ 27.570257] bpf_prog_51bb4c0f31ae471b_uprobe_sleepable+0x2d/0x42
[ 27.570594] uprobe_prog_run+0x425/0x870
[ 27.570816] uprobe_multi_link_handler+0x58/0xb0
[ 27.571062] handler_chain+0x234/0x1270
[ 27.571275] uprobe_notify_resume+0x4de/0xd60
[ 27.571542] exit_to_user_mode_loop+0xe0/0x480
[ 27.571785] exc_int3+0x1bd/0x260
[ 27.571973] asm_exc_int3+0x39/0x40
[ 27.572169]
[ 27.572169] other info that might help us debug this:
[ 27.572169]
[ 27.572571] Possible unsafe locking scenario:
[ 27.572571]
[ 27.572854] CPU0 CPU1
[ 27.573074] ---- ----
[ 27.573293] rlock(&mm->mmap_lock);
[ 27.573502] lock(&sb->s_type->i_mutex_key#8);
[ 27.573846] lock(&mm->mmap_lock);
[ 27.574135] rlock(&sb->s_type->i_mutex_key#8);
[ 27.574364]
[ 27.574364] *** DEADLOCK ***
[ 27.574364]
[ 27.574671] 3 locks held by uprobe_multi/561:
[ 27.574884] #0: ffffffff852e0038 (rcu_tasks_trace_srcu_struct){....}-{0:0}, at: uprobe_notify_resume+0x229/0xd60
[ 27.575371] #1: ffffffff852e0038 (rcu_tasks_trace_srcu_struct){....}-{0:0}, at: uprobe_prog_run+0x239/0x870
[ 27.575874] #2: ff1100010b12a640 (&mm->mmap_lock){++++}-{4:4}, at: stack_map_get_build_id_offset+0xfa/0x650
[ 27.576342]
[ 27.576342] stack backtrace:
[ 27.576578] CPU: 7 UID: 0 PID: 561 Comm: uprobe_multi Tainted: G OE 7.0.0-rc5-g0eeb0094ba03 #3 PREEMPT(full)
[ 27.576586] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 27.576588] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-5.el9 11/05/2023
[ 27.576591] Call Trace:
[ 27.576595] <TASK>
[ 27.576598] dump_stack_lvl+0x54/0x70
[ 27.576606] print_circular_bug+0x2e8/0x300
[ 27.576616] check_noncircular+0x12e/0x150
[ 27.576628] __lock_acquire+0x1529/0x2a10
[ 27.576642] ? __pfx_walk_pgd_range+0x10/0x10
[ 27.576651] ? srso_return_thunk+0x5/0x5f
[ 27.576656] ? lock_acquire+0xd7/0x280
[ 27.576661] ? avc_has_perm_noaudit+0x38/0x1f0
[ 27.576672] lock_acquire+0xd7/0x280
[ 27.576677] ? netfs_start_io_read+0x24/0x110
[ 27.576685] ? srso_return_thunk+0x5/0x5f
[ 27.576693] ? netfs_start_io_read+0x24/0x110
[ 27.576698] down_read_interruptible+0x55/0x340
[ 27.576702] ? netfs_start_io_read+0x24/0x110
[ 27.576707] ? __pfx_cmp_ex_search+0x10/0x10
[ 27.576716] netfs_start_io_read+0x24/0x110
[ 27.576723] netfs_file_read_iter+0x191/0x240
[ 27.576731] __kernel_read+0x401/0x850
[ 27.576741] ? __pfx___kernel_read+0x10/0x10
[ 27.576752] ? __pfx_fixup_exception+0x10/0x10
[ 27.576761] ? srso_return_thunk+0x5/0x5f
[ 27.576766] ? lock_is_held_type+0x76/0x100
[ 27.576773] ? srso_return_thunk+0x5/0x5f
[ 27.576777] ? mtree_range_walk+0x421/0x6c0
[ 27.576785] freader_fetch+0x19b/0x790
[ 27.576793] ? mt_find+0x14b/0x340
[ 27.576801] ? __pfx_freader_fetch+0x10/0x10
[ 27.576805] ? mt_find+0x29a/0x340
[ 27.576810] ? mt_find+0x14b/0x340
[ 27.576820] __build_id_parse+0xde/0x580
[ 27.576832] ? __pfx___build_id_parse+0x10/0x10
[ 27.576850] stack_map_get_build_id_offset+0x248/0x650
[ 27.576863] __bpf_get_stack+0x653/0x890
[ 27.576867] ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 27.576880] ? __pfx___bpf_get_stack+0x10/0x10
[ 27.576884] ? lock_acquire+0xd7/0x280
[ 27.576889] ? uprobe_prog_run+0x239/0x870
[ 27.576895] ? srso_return_thunk+0x5/0x5f
[ 27.576899] ? stack_trace_save+0xae/0x100
[ 27.576908] bpf_get_stack_sleepable+0x1d/0x30
[ 27.576913] bpf_prog_51bb4c0f31ae471b_uprobe_sleepable+0x2d/0x42
[ 27.576919] uprobe_prog_run+0x425/0x870
[ 27.576927] ? srso_return_thunk+0x5/0x5f
[ 27.576933] ? __pfx_uprobe_prog_run+0x10/0x10
[ 27.576937] ? uprobe_notify_resume+0x413/0xd60
[ 27.576952] uprobe_multi_link_handler+0x58/0xb0
[ 27.576960] handler_chain+0x234/0x1270
[ 27.576976] ? __pfx_handler_chain+0x10/0x10
[ 27.576988] ? srso_return_thunk+0x5/0x5f
[ 27.576992] ? __kasan_kmalloc+0x72/0x90
[ 27.576998] ? __pfx_ri_timer+0x10/0x10
[ 27.577003] ? timer_init_key+0xf5/0x200
[ 27.577013] uprobe_notify_resume+0x4de/0xd60
[ 27.577025] ? uprobe_notify_resume+0x229/0xd60
[ 27.577030] ? __pfx_uprobe_notify_resume+0x10/0x10
[ 27.577035] ? srso_return_thunk+0x5/0x5f
[ 27.577039] ? atomic_notifier_call_chain+0xfc/0x110
[ 27.577047] ? srso_return_thunk+0x5/0x5f
[ 27.577051] ? notify_die+0xe5/0x130
[ 27.577056] ? __pfx_notify_die+0x10/0x10
[ 27.577063] ? srso_return_thunk+0x5/0x5f
[ 27.577071] exit_to_user_mode_loop+0xe0/0x480
[ 27.577076] ? asm_exc_int3+0x39/0x40
[ 27.577083] exc_int3+0x1bd/0x260
[ 27.577090] asm_exc_int3+0x39/0x40
[ 27.577095] RIP: 0033:0x6c4180
[ 27.577100] Code: c6 05 0b 6f 32 00 01 5d c3 90 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa eb 8a 66 2e 0f 1f 84 00 00 00 00 00 <cc> 48 89 e5 31 c0 5d c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 31 c0
[ 27.577104] RSP: 002b:00007ffcbf71ada8 EFLAGS: 00000246
[ 27.577109] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007fec9163db7b
[ 27.577112] RDX: 00007ffcbf71adbf RSI: 0000000000001000 RDI: 00000000007f0000
[ 27.577115] RBP: 00007ffcbf71add0 R08: 00007fec91731330 R09: 00007fec9178d060
[ 27.577118] R10: 00007fec9153ad98 R11: 0000000000000202 R12: 00007ffcbf71af08
[ 27.577120] R13: 0000000000787760 R14: 00000000009eade8 R15: 00007fec917bf000
[ 27.577135] </TASK>
#54/1 build_id/nofault-paged-out:OK
It was reproducable, which allowed me to bisect to commit 777a8560fd29
("lib/buildid: use __kernel_read() for sleepable context")
Then I used Codex for analysis of the splat, patch and relevant lore
discussions, and then asked it to produce a patch draft, which was a
starting point for this series.
I verified that the patch series fixes the deadlock splat.
---
v5->v6:
* Misc refactoring (Andrii):
* add stack_map_build_id_set_valid() helper
* simplify control flow in stack_map_get_build_id_offset_sleepable()
v5: https://lore.kernel.org/bpf/20260515005244.1333013-1-ihor.solodrai@linux.dev/
v4->v5:
* Add comments explaining mmap_read_trylock() (Shakeel)
* Rebase on bpf-next (Alexei)
v4: https://lore.kernel.org/bpf/20260514184727.1067141-1-ihor.solodrai@linux.dev/
v3->v4:
* Change Fixes tag in patch #2 (AI)
* Nit in caching implementation (Mykyta)
v3: https://lore.kernel.org/bpf/20260512032906.2670326-1-ihor.solodrai@linux.dev/
v2->v3:
* Split patch #2 in two: stack_map_get_build_id_offset_sleepable()
implementation, and then introduce caching
* Drop taking mmap_lock if CONFIG_PER_VMA_LOCK=n, fall back to raw
IPs instead
* Cache vm_{start,end} in addition to prev_file (Mykyta)
v2: https://lore.kernel.org/bpf/20260409010604.1439087-1-ihor.solodrai@linux.dev/
v1->v2:
* Addressed feedback from Puranjay:
* split out a small refactoring patch
* use mmap_read_trylock()
* take into account CONFIG_PER_VMA_LOCK
* replace find_vma() with vma_lookup()
* cache prev_build_id to avoid re-parsing the same file
* Snapshot vm_pgoff and vm_start before unlocking (AI)
* To avoid repetitive unlocking statements, introduce struct
stack_map_vma_lock to hold relevant lock state info and add an
unlock helper
v1: https://lore.kernel.org/bpf/20260407223003.720428-1-ihor.solodrai@linux.dev/
---
Ihor Solodrai (3):
bpf: Factor out stack_map build ID helpers
bpf: Avoid faultable build ID reads under mm locks
bpf: Cache build IDs in sleepable stackmap path
kernel/bpf/stackmap.c | 204 ++++++++++++++++++++++++++++++++++++++----
1 file changed, 189 insertions(+), 15 deletions(-)
--
2.54.0
^ permalink raw reply [flat|nested] 18+ messages in thread* [PATCH bpf-next v6 1/3] bpf: Factor out stack_map build ID helpers 2026-05-21 22:50 [PATCH bpf-next v6 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai @ 2026-05-21 22:50 ` Ihor Solodrai 2026-05-21 23:16 ` sashiko-bot 2026-05-22 17:16 ` Andrii Nakryiko 2026-05-21 22:50 ` [PATCH bpf-next v6 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai 2026-05-21 22:50 ` [PATCH bpf-next v6 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai 2 siblings, 2 replies; 18+ messages in thread From: Ihor Solodrai @ 2026-05-21 22:50 UTC (permalink / raw) To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Kumar Kartikeya Dwivedi Cc: Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team Factor out helpers from stack_map_get_build_id_offset() in preparation for adding a sleepable build ID resolution path. No functional changes. Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> --- kernel/bpf/stackmap.c | 51 ++++++++++++++++++++++++++++++------------- 1 file changed, 36 insertions(+), 15 deletions(-) diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index da3d328f5c15..4c753e02c415 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -152,6 +152,28 @@ static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, b : build_id_parse_nofault(vma, build_id, NULL); } +static inline void stack_map_build_id_set_ip(struct bpf_stack_build_id *id) +{ + id->status = BPF_STACK_BUILD_ID_IP; + memset(id->build_id, 0, BUILD_ID_SIZE_MAX); +} + +static inline u64 stack_map_build_id_offset(unsigned long vm_pgoff, + unsigned long vm_start, u64 ip) +{ + return (vm_pgoff << PAGE_SHIFT) + ip - vm_start; +} + +static inline void stack_map_build_id_set_valid(struct bpf_stack_build_id *id, + u64 offset, + const unsigned char *build_id) +{ + id->status = BPF_STACK_BUILD_ID_VALID; + id->offset = offset; + if (id->build_id != build_id) + memcpy(id->build_id, build_id, BUILD_ID_SIZE_MAX); +} + /* * Expects all id_offs[i].ip values to be set to correct initial IPs. * They will be subsequently: @@ -165,44 +187,43 @@ static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, b static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, u32 trace_nr, bool user, bool may_fault) { - int i; struct mmap_unlock_irq_work *work = NULL; bool irq_work_busy = bpf_mmap_unlock_get_irq_work(&work); + bool has_user_ctx = user && current && current->mm; struct vm_area_struct *vma, *prev_vma = NULL; - const char *prev_build_id; + const unsigned char *prev_build_id; + int i; /* If the irq_work is in use, fall back to report ips. Same * fallback is used for kernel stack (!user) on a stackmap with * build_id. */ - if (!user || !current || !current->mm || irq_work_busy || - !mmap_read_trylock(current->mm)) { + if (!has_user_ctx || irq_work_busy || !mmap_read_trylock(current->mm)) { /* cannot access current->mm, fall back to ips */ - for (i = 0; i < trace_nr; i++) { - id_offs[i].status = BPF_STACK_BUILD_ID_IP; - memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX); - } + for (i = 0; i < trace_nr; i++) + stack_map_build_id_set_ip(&id_offs[i]); return; } for (i = 0; i < trace_nr; i++) { u64 ip = READ_ONCE(id_offs[i].ip); + u64 offset; if (range_in_vma(prev_vma, ip, ip)) { vma = prev_vma; - memcpy(id_offs[i].build_id, prev_build_id, BUILD_ID_SIZE_MAX); - goto build_id_valid; + offset = stack_map_build_id_offset(vma->vm_pgoff, vma->vm_start, ip); + stack_map_build_id_set_valid(&id_offs[i], offset, prev_build_id); + prev_build_id = id_offs[i].build_id; + continue; } vma = find_vma(current->mm, ip); if (!vma || fetch_build_id(vma, id_offs[i].build_id, may_fault)) { /* per entry fall back to ips */ - id_offs[i].status = BPF_STACK_BUILD_ID_IP; - memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX); + stack_map_build_id_set_ip(&id_offs[i]); continue; } -build_id_valid: - id_offs[i].offset = (vma->vm_pgoff << PAGE_SHIFT) + ip - vma->vm_start; - id_offs[i].status = BPF_STACK_BUILD_ID_VALID; + offset = stack_map_build_id_offset(vma->vm_pgoff, vma->vm_start, ip); + stack_map_build_id_set_valid(&id_offs[i], offset, id_offs[i].build_id); prev_vma = vma; prev_build_id = id_offs[i].build_id; } -- 2.54.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v6 1/3] bpf: Factor out stack_map build ID helpers 2026-05-21 22:50 ` [PATCH bpf-next v6 1/3] bpf: Factor out stack_map build ID helpers Ihor Solodrai @ 2026-05-21 23:16 ` sashiko-bot 2026-05-21 23:32 ` Ihor Solodrai 2026-05-22 17:16 ` Andrii Nakryiko 1 sibling, 1 reply; 18+ messages in thread From: sashiko-bot @ 2026-05-21 23:16 UTC (permalink / raw) To: Ihor Solodrai; +Cc: bpf Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider: - [Medium] Data race and incorrect ordering in stack_map_build_id_set_valid causing stale reads -- commit 4eb2cc8713ca7f31dcbc75f5166e94dd60761e7c Author: Ihor Solodrai <ihor.solodrai@linux.dev> bpf: Factor out stack_map build ID helpers This commit factors out helpers from stack_map_get_build_id_offset() to prepare for adding a sleepable build ID resolution path. The commit message states there are no functional changes. > diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c > index da3d328f5c15a..4c753e02c4159 100644 > --- a/kernel/bpf/stackmap.c > +++ b/kernel/bpf/stackmap.c [ ... ] > @@ -152,6 +152,28 @@ static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, b [ ... ] > +static inline void stack_map_build_id_set_valid(struct bpf_stack_build_id *id, > + u64 offset, > + const unsigned char *build_id) > +{ > + id->status = BPF_STACK_BUILD_ID_VALID; [Severity: Medium] In bpf_get_stack(), the destination buffer can be a BPF map value that is concurrently readable locklessly by user-space. Since the buffer initially contains the raw instruction pointer (ip), does setting status to VALID before writing the offset create a data race? A concurrent lockless reader polling the status field might observe the VALID status while the union still holds the raw ip instead of the correct file offset, and before the build_id string is fully copied. Prior to this patch, the offset was written before the status. Could this reordering of assignments cause concurrent readers to observe stale data? > + id->offset = offset; > + if (id->build_id != build_id) > + memcpy(id->build_id, build_id, BUILD_ID_SIZE_MAX); > +} -- Sashiko AI review · https://sashiko.dev/#/patchset/20260521225022.2695755-1-ihor.solodrai@linux.dev?part=1 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v6 1/3] bpf: Factor out stack_map build ID helpers 2026-05-21 23:16 ` sashiko-bot @ 2026-05-21 23:32 ` Ihor Solodrai 2026-05-22 17:17 ` Andrii Nakryiko 0 siblings, 1 reply; 18+ messages in thread From: Ihor Solodrai @ 2026-05-21 23:32 UTC (permalink / raw) To: sashiko-reviews; +Cc: bpf On 2026-05-21 4:16 p.m., sashiko-bot@kernel.org wrote: > Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider: > - [Medium] Data race and incorrect ordering in stack_map_build_id_set_valid causing stale reads > -- > > commit 4eb2cc8713ca7f31dcbc75f5166e94dd60761e7c > Author: Ihor Solodrai <ihor.solodrai@linux.dev> > > bpf: Factor out stack_map build ID helpers > > This commit factors out helpers from stack_map_get_build_id_offset() to > prepare for adding a sleepable build ID resolution path. The commit > message states there are no functional changes. > >> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c >> index da3d328f5c15a..4c753e02c4159 100644 >> --- a/kernel/bpf/stackmap.c >> +++ b/kernel/bpf/stackmap.c > [ ... ] >> @@ -152,6 +152,28 @@ static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, b > [ ... ] >> +static inline void stack_map_build_id_set_valid(struct bpf_stack_build_id *id, >> + u64 offset, >> + const unsigned char *build_id) >> +{ >> + id->status = BPF_STACK_BUILD_ID_VALID; > > [Severity: Medium] > In bpf_get_stack(), the destination buffer can be a BPF map value that is > concurrently readable locklessly by user-space. Since the buffer initially > contains the raw instruction pointer (ip), does setting status to VALID > before writing the offset create a data race? This is a good point. Will fix in the next version. > > A concurrent lockless reader polling the status field might observe the > VALID status while the union still holds the raw ip instead of the correct > file offset, and before the build_id string is fully copied. > > Prior to this patch, the offset was written before the status. Could this > reordering of assignments cause concurrent readers to observe stale data? > >> + id->offset = offset; >> + if (id->build_id != build_id) >> + memcpy(id->build_id, build_id, BUILD_ID_SIZE_MAX); >> +} > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v6 1/3] bpf: Factor out stack_map build ID helpers 2026-05-21 23:32 ` Ihor Solodrai @ 2026-05-22 17:17 ` Andrii Nakryiko 0 siblings, 0 replies; 18+ messages in thread From: Andrii Nakryiko @ 2026-05-22 17:17 UTC (permalink / raw) To: Ihor Solodrai; +Cc: sashiko-reviews, bpf On Thu, May 21, 2026 at 4:33 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote: > > On 2026-05-21 4:16 p.m., sashiko-bot@kernel.org wrote: > > Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider: > > - [Medium] Data race and incorrect ordering in stack_map_build_id_set_valid causing stale reads > > -- > > > > commit 4eb2cc8713ca7f31dcbc75f5166e94dd60761e7c > > Author: Ihor Solodrai <ihor.solodrai@linux.dev> > > > > bpf: Factor out stack_map build ID helpers > > > > This commit factors out helpers from stack_map_get_build_id_offset() to > > prepare for adding a sleepable build ID resolution path. The commit > > message states there are no functional changes. > > > >> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c > >> index da3d328f5c15a..4c753e02c4159 100644 > >> --- a/kernel/bpf/stackmap.c > >> +++ b/kernel/bpf/stackmap.c > > [ ... ] > >> @@ -152,6 +152,28 @@ static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, b > > [ ... ] > >> +static inline void stack_map_build_id_set_valid(struct bpf_stack_build_id *id, > >> + u64 offset, > >> + const unsigned char *build_id) > >> +{ > >> + id->status = BPF_STACK_BUILD_ID_VALID; > > > > [Severity: Medium] > > In bpf_get_stack(), the destination buffer can be a BPF map value that is > > concurrently readable locklessly by user-space. Since the buffer initially > > contains the raw instruction pointer (ip), does setting status to VALID > > before writing the offset create a data race? > > This is a good point. Will fix in the next version. I honestly wouldn't bother, this is a misuse of the API to concurrently write to the same buffer from two independent bpf_get_stack() calls. We shouldn't guarantee any sane behavior in this case anyways. > > > > > A concurrent lockless reader polling the status field might observe the > > VALID status while the union still holds the raw ip instead of the correct > > file offset, and before the build_id string is fully copied. > > > > Prior to this patch, the offset was written before the status. Could this > > reordering of assignments cause concurrent readers to observe stale data? > > > >> + id->offset = offset; > >> + if (id->build_id != build_id) > >> + memcpy(id->build_id, build_id, BUILD_ID_SIZE_MAX); > >> +} > > > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v6 1/3] bpf: Factor out stack_map build ID helpers 2026-05-21 22:50 ` [PATCH bpf-next v6 1/3] bpf: Factor out stack_map build ID helpers Ihor Solodrai 2026-05-21 23:16 ` sashiko-bot @ 2026-05-22 17:16 ` Andrii Nakryiko 2026-05-22 17:33 ` Ihor Solodrai 1 sibling, 1 reply; 18+ messages in thread From: Andrii Nakryiko @ 2026-05-22 17:16 UTC (permalink / raw) To: Ihor Solodrai Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Kumar Kartikeya Dwivedi, Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team On Thu, May 21, 2026 at 3:51 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote: > > Factor out helpers from stack_map_get_build_id_offset() in > preparation for adding a sleepable build ID resolution path. > > No functional changes. > > Acked-by: Mykyta Yatsenko <yatsenko@meta.com> > Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> > --- > kernel/bpf/stackmap.c | 51 ++++++++++++++++++++++++++++++------------- > 1 file changed, 36 insertions(+), 15 deletions(-) > [...] > /* > * Expects all id_offs[i].ip values to be set to correct initial IPs. > * They will be subsequently: > @@ -165,44 +187,43 @@ static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, b > static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, > u32 trace_nr, bool user, bool may_fault) > { > - int i; > struct mmap_unlock_irq_work *work = NULL; > bool irq_work_busy = bpf_mmap_unlock_get_irq_work(&work); > + bool has_user_ctx = user && current && current->mm; > struct vm_area_struct *vma, *prev_vma = NULL; > - const char *prev_build_id; > + const unsigned char *prev_build_id; = NULL ? > + int i; > > /* If the irq_work is in use, fall back to report ips. Same > * fallback is used for kernel stack (!user) on a stackmap with > * build_id. > */ > - if (!user || !current || !current->mm || irq_work_busy || > - !mmap_read_trylock(current->mm)) { > + if (!has_user_ctx || irq_work_busy || !mmap_read_trylock(current->mm)) { > /* cannot access current->mm, fall back to ips */ > - for (i = 0; i < trace_nr; i++) { > - id_offs[i].status = BPF_STACK_BUILD_ID_IP; > - memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX); > - } > + for (i = 0; i < trace_nr; i++) > + stack_map_build_id_set_ip(&id_offs[i]); > return; > } > > for (i = 0; i < trace_nr; i++) { > u64 ip = READ_ONCE(id_offs[i].ip); > + u64 offset; > > if (range_in_vma(prev_vma, ip, ip)) { > vma = prev_vma; > - memcpy(id_offs[i].build_id, prev_build_id, BUILD_ID_SIZE_MAX); > - goto build_id_valid; > + offset = stack_map_build_id_offset(vma->vm_pgoff, vma->vm_start, ip); > + stack_map_build_id_set_valid(&id_offs[i], offset, prev_build_id); > + prev_build_id = id_offs[i].build_id; you are not updating prev_vma, so why updating prev_build_id... it's confusing because this serves no purpose, yet the code is there, so whoever is reading this should be asking the question of "what am I missing". Am I missing something? > + continue; > } > vma = find_vma(current->mm, ip); > if (!vma || fetch_build_id(vma, id_offs[i].build_id, may_fault)) { > /* per entry fall back to ips */ > - id_offs[i].status = BPF_STACK_BUILD_ID_IP; > - memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX); > + stack_map_build_id_set_ip(&id_offs[i]); > continue; > } > -build_id_valid: > - id_offs[i].offset = (vma->vm_pgoff << PAGE_SHIFT) + ip - vma->vm_start; > - id_offs[i].status = BPF_STACK_BUILD_ID_VALID; > + offset = stack_map_build_id_offset(vma->vm_pgoff, vma->vm_start, ip); > + stack_map_build_id_set_valid(&id_offs[i], offset, id_offs[i].build_id); > prev_vma = vma; > prev_build_id = id_offs[i].build_id; > } > -- > 2.54.0 > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v6 1/3] bpf: Factor out stack_map build ID helpers 2026-05-22 17:16 ` Andrii Nakryiko @ 2026-05-22 17:33 ` Ihor Solodrai 2026-05-22 17:50 ` Andrii Nakryiko 0 siblings, 1 reply; 18+ messages in thread From: Ihor Solodrai @ 2026-05-22 17:33 UTC (permalink / raw) To: Andrii Nakryiko Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Kumar Kartikeya Dwivedi, Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team On 5/22/26 10:16 AM, Andrii Nakryiko wrote: > On Thu, May 21, 2026 at 3:51 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote: >> >> Factor out helpers from stack_map_get_build_id_offset() in >> preparation for adding a sleepable build ID resolution path. >> >> No functional changes. >> >> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> >> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> >> --- >> kernel/bpf/stackmap.c | 51 ++++++++++++++++++++++++++++++------------- >> 1 file changed, 36 insertions(+), 15 deletions(-) >> > > [...] > >> /* >> * Expects all id_offs[i].ip values to be set to correct initial IPs. >> * They will be subsequently: >> @@ -165,44 +187,43 @@ static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, b >> static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, >> u32 trace_nr, bool user, bool may_fault) >> { >> - int i; >> struct mmap_unlock_irq_work *work = NULL; >> bool irq_work_busy = bpf_mmap_unlock_get_irq_work(&work); >> + bool has_user_ctx = user && current && current->mm; >> struct vm_area_struct *vma, *prev_vma = NULL; >> - const char *prev_build_id; >> + const unsigned char *prev_build_id; > > = NULL ? Sure, why not. > >> + int i; >> >> /* If the irq_work is in use, fall back to report ips. Same >> * fallback is used for kernel stack (!user) on a stackmap with >> * build_id. >> */ >> - if (!user || !current || !current->mm || irq_work_busy || >> - !mmap_read_trylock(current->mm)) { >> + if (!has_user_ctx || irq_work_busy || !mmap_read_trylock(current->mm)) { >> /* cannot access current->mm, fall back to ips */ >> - for (i = 0; i < trace_nr; i++) { >> - id_offs[i].status = BPF_STACK_BUILD_ID_IP; >> - memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX); >> - } >> + for (i = 0; i < trace_nr; i++) >> + stack_map_build_id_set_ip(&id_offs[i]); >> return; >> } >> >> for (i = 0; i < trace_nr; i++) { >> u64 ip = READ_ONCE(id_offs[i].ip); >> + u64 offset; >> >> if (range_in_vma(prev_vma, ip, ip)) { >> vma = prev_vma; >> - memcpy(id_offs[i].build_id, prev_build_id, BUILD_ID_SIZE_MAX); >> - goto build_id_valid; >> + offset = stack_map_build_id_offset(vma->vm_pgoff, vma->vm_start, ip); >> + stack_map_build_id_set_valid(&id_offs[i], offset, prev_build_id); >> + prev_build_id = id_offs[i].build_id; > > you are not updating prev_vma, so why updating prev_build_id... it's > confusing because this serves no purpose, yet the code is there, so > whoever is reading this should be asking the question of "what am I > missing". Am I missing something? No, you're right, it doesn't look necessary. I think this is AI slop: it took too literally the "no functional changes" part and essentially copy-pasted a piece of code under build_id_valid label before continue. My oversight, will remove. > >> + continue; >> } >> vma = find_vma(current->mm, ip); >> if (!vma || fetch_build_id(vma, id_offs[i].build_id, may_fault)) { >> /* per entry fall back to ips */ >> - id_offs[i].status = BPF_STACK_BUILD_ID_IP; >> - memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX); >> + stack_map_build_id_set_ip(&id_offs[i]); >> continue; >> } >> -build_id_valid: >> - id_offs[i].offset = (vma->vm_pgoff << PAGE_SHIFT) + ip - vma->vm_start; >> - id_offs[i].status = BPF_STACK_BUILD_ID_VALID; >> + offset = stack_map_build_id_offset(vma->vm_pgoff, vma->vm_start, ip); >> + stack_map_build_id_set_valid(&id_offs[i], offset, id_offs[i].build_id); >> prev_vma = vma; >> prev_build_id = id_offs[i].build_id; >> } >> -- >> 2.54.0 >> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v6 1/3] bpf: Factor out stack_map build ID helpers 2026-05-22 17:33 ` Ihor Solodrai @ 2026-05-22 17:50 ` Andrii Nakryiko 0 siblings, 0 replies; 18+ messages in thread From: Andrii Nakryiko @ 2026-05-22 17:50 UTC (permalink / raw) To: Ihor Solodrai Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Kumar Kartikeya Dwivedi, Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team On Fri, May 22, 2026 at 10:33 AM Ihor Solodrai <ihor.solodrai@linux.dev> wrote: > > On 5/22/26 10:16 AM, Andrii Nakryiko wrote: > > On Thu, May 21, 2026 at 3:51 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote: > >> > >> Factor out helpers from stack_map_get_build_id_offset() in > >> preparation for adding a sleepable build ID resolution path. > >> > >> No functional changes. > >> > >> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> > >> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> > >> --- > >> kernel/bpf/stackmap.c | 51 ++++++++++++++++++++++++++++++------------- > >> 1 file changed, 36 insertions(+), 15 deletions(-) > >> > > > > [...] > > > >> /* > >> * Expects all id_offs[i].ip values to be set to correct initial IPs. > >> * They will be subsequently: > >> @@ -165,44 +187,43 @@ static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, b > >> static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, > >> u32 trace_nr, bool user, bool may_fault) > >> { > >> - int i; > >> struct mmap_unlock_irq_work *work = NULL; > >> bool irq_work_busy = bpf_mmap_unlock_get_irq_work(&work); > >> + bool has_user_ctx = user && current && current->mm; > >> struct vm_area_struct *vma, *prev_vma = NULL; > >> - const char *prev_build_id; > >> + const unsigned char *prev_build_id; > > > > = NULL ? > > Sure, why not. so that compiler doesn't complain about potentially reading uninitialized variable > > > > >> + int i; > >> > >> /* If the irq_work is in use, fall back to report ips. Same > >> * fallback is used for kernel stack (!user) on a stackmap with > >> * build_id. > >> */ > >> - if (!user || !current || !current->mm || irq_work_busy || > >> - !mmap_read_trylock(current->mm)) { > >> + if (!has_user_ctx || irq_work_busy || !mmap_read_trylock(current->mm)) { > >> /* cannot access current->mm, fall back to ips */ > >> - for (i = 0; i < trace_nr; i++) { > >> - id_offs[i].status = BPF_STACK_BUILD_ID_IP; > >> - memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX); > >> - } > >> + for (i = 0; i < trace_nr; i++) > >> + stack_map_build_id_set_ip(&id_offs[i]); > >> return; > >> } > >> > >> for (i = 0; i < trace_nr; i++) { > >> u64 ip = READ_ONCE(id_offs[i].ip); > >> + u64 offset; > >> > >> if (range_in_vma(prev_vma, ip, ip)) { > >> vma = prev_vma; > >> - memcpy(id_offs[i].build_id, prev_build_id, BUILD_ID_SIZE_MAX); > >> - goto build_id_valid; > >> + offset = stack_map_build_id_offset(vma->vm_pgoff, vma->vm_start, ip); > >> + stack_map_build_id_set_valid(&id_offs[i], offset, prev_build_id); > >> + prev_build_id = id_offs[i].build_id; > > > > you are not updating prev_vma, so why updating prev_build_id... it's > > confusing because this serves no purpose, yet the code is there, so > > whoever is reading this should be asking the question of "what am I > > missing". Am I missing something? > > No, you're right, it doesn't look necessary. > > I think this is AI slop: it took too literally the "no functional > changes" part and essentially copy-pasted a piece of code under > build_id_valid label before continue. My oversight, will remove. > > > > >> + continue; > >> } > >> vma = find_vma(current->mm, ip); > >> if (!vma || fetch_build_id(vma, id_offs[i].build_id, may_fault)) { > >> /* per entry fall back to ips */ > >> - id_offs[i].status = BPF_STACK_BUILD_ID_IP; > >> - memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX); > >> + stack_map_build_id_set_ip(&id_offs[i]); you should probably set prev_vma = vma, prev_build_id = NULL here? (give in all other case we try to point prev_vma to the actual vma that we saw for the previous ip?) > >> continue; > >> } > >> -build_id_valid: > >> - id_offs[i].offset = (vma->vm_pgoff << PAGE_SHIFT) + ip - vma->vm_start; > >> - id_offs[i].status = BPF_STACK_BUILD_ID_VALID; > >> + offset = stack_map_build_id_offset(vma->vm_pgoff, vma->vm_start, ip); > >> + stack_map_build_id_set_valid(&id_offs[i], offset, id_offs[i].build_id); > >> prev_vma = vma; > >> prev_build_id = id_offs[i].build_id; > >> } > >> -- > >> 2.54.0 > >> > ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH bpf-next v6 2/3] bpf: Avoid faultable build ID reads under mm locks 2026-05-21 22:50 [PATCH bpf-next v6 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai 2026-05-21 22:50 ` [PATCH bpf-next v6 1/3] bpf: Factor out stack_map build ID helpers Ihor Solodrai @ 2026-05-21 22:50 ` Ihor Solodrai 2026-05-21 23:33 ` bot+bpf-ci ` (2 more replies) 2026-05-21 22:50 ` [PATCH bpf-next v6 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai 2 siblings, 3 replies; 18+ messages in thread From: Ihor Solodrai @ 2026-05-21 22:50 UTC (permalink / raw) To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Kumar Kartikeya Dwivedi Cc: Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team Sleepable build ID parsing can block in __kernel_read() [1], so the stackmap sleepable path must not call it while holding mmap_lock or a per-VMA read lock. The issue and the fix are conceptually similar to a recent procfs patch [2]. Resolve each covered VMA with a stable read-side reference, preferring lock_vma_under_rcu() and falling back to mmap_read_trylock() only long enough to acquire the VMA read lock. Take a reference to the backing file, drop the VMA lock, and then parse the build ID through (sleepable) build_id_parse_file(). We have to use mmap_read_trylock() (and give up on failure) in this context because taking mmap_read_lock() is generally unsafe on code paths reachable from BPF programs [3], and may lead to deadlocks. [1] https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/ [2] https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/ [3] https://lore.kernel.org/bpf/2895ecd8-df1e-4cc0-b9f9-aef893dc2360@linux.dev/ Fixes: d4dd9775ec24 ("bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers") Suggested-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> --- kernel/bpf/stackmap.c | 107 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 107 insertions(+) diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index 4c753e02c415..95336c0e8b56 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -9,6 +9,7 @@ #include <linux/perf_event.h> #include <linux/btf_ids.h> #include <linux/buildid.h> +#include <linux/mmap_lock.h> #include "percpu_freelist.h" #include "mmap_unlock_work.h" @@ -174,6 +175,107 @@ static inline void stack_map_build_id_set_valid(struct bpf_stack_build_id *id, memcpy(id->build_id, build_id, BUILD_ID_SIZE_MAX); } +struct stack_map_vma_lock { + bool vma_locked; + struct vm_area_struct *vma; + struct mm_struct *mm; +}; + +static struct vm_area_struct *stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip) +{ + struct mm_struct *mm = lock->mm; + struct vm_area_struct *vma; + + if (WARN_ON_ONCE(!mm)) + return NULL; + + vma = lock_vma_under_rcu(mm, ip); + if (vma) + goto vma_locked; + + /* + * Taking mmap_read_lock() is unsafe here, because the caller + * BPF program might already hold it, causing a deadlock. + */ + if (!mmap_read_trylock(mm)) + return NULL; + + vma = vma_lookup(mm, ip); + if (!vma) { + mmap_read_unlock(mm); + return NULL; + } + +#ifdef CONFIG_PER_VMA_LOCK + if (!vma_start_read_locked(vma)) { + mmap_read_unlock(mm); + return NULL; + } + mmap_read_unlock(mm); +#else + mmap_read_unlock(mm); + return NULL; +#endif +vma_locked: + lock->vma_locked = true; + lock->vma = vma; + return vma; +} + +static void stack_map_unlock_vma(struct stack_map_vma_lock *lock) +{ + struct vm_area_struct *vma = lock->vma; + + if (lock->vma_locked) { + if (WARN_ON_ONCE(!vma)) + goto out; + vma_end_read(vma); + } +out: + lock->vma_locked = false; + lock->vma = NULL; +} + +static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs, + u32 trace_nr) +{ + struct mm_struct *mm = current->mm; + struct stack_map_vma_lock lock = { + .vma_locked = false, + .vma = NULL, + .mm = mm, + }; + struct vm_area_struct *vma; + struct file *file; + u64 offset; + u64 ip; + + for (u32 i = 0; i < trace_nr; i++) { + ip = READ_ONCE(id_offs[i].ip); + + vma = stack_map_lock_vma(&lock, ip); + if (!vma || !vma->vm_file) { + stack_map_build_id_set_ip(&id_offs[i]); + stack_map_unlock_vma(&lock); + continue; + } + + file = get_file(vma->vm_file); + offset = stack_map_build_id_offset(vma->vm_pgoff, vma->vm_start, ip); + stack_map_unlock_vma(&lock); + + /* build_id_parse_file() may block on filesystem reads */ + if (build_id_parse_file(file, id_offs[i].build_id, NULL)) { + stack_map_build_id_set_ip(&id_offs[i]); + fput(file); + continue; + } + fput(file); + + stack_map_build_id_set_valid(&id_offs[i], offset, id_offs[i].build_id); + } +} + /* * Expects all id_offs[i].ip values to be set to correct initial IPs. * They will be subsequently: @@ -194,6 +296,11 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, const unsigned char *prev_build_id; int i; + if (may_fault && has_user_ctx) { + stack_map_get_build_id_offset_sleepable(id_offs, trace_nr); + return; + } + /* If the irq_work is in use, fall back to report ips. Same * fallback is used for kernel stack (!user) on a stackmap with * build_id. -- 2.54.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v6 2/3] bpf: Avoid faultable build ID reads under mm locks 2026-05-21 22:50 ` [PATCH bpf-next v6 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai @ 2026-05-21 23:33 ` bot+bpf-ci 2026-05-21 23:41 ` sashiko-bot 2026-05-22 17:42 ` Andrii Nakryiko 2 siblings, 0 replies; 18+ messages in thread From: bot+bpf-ci @ 2026-05-21 23:33 UTC (permalink / raw) To: ihor.solodrai, ast, andrii, daniel, eddyz87, memxor Cc: puranjay, shakeel.butt, mykyta.yatsenko5, bpf, linux-kernel, kernel-team, ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai [-- Attachment #1: Type: text/plain, Size: 4316 bytes --] > diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c > index 4c753e02c415..95336c0e8b56 100644 > --- a/kernel/bpf/stackmap.c > +++ b/kernel/bpf/stackmap.c [ ... ] > @@ -174,6 +175,107 @@ static inline void stack_map_build_id_set_valid(struct bpf_stack_build_id *id, > memcpy(id->build_id, build_id, BUILD_ID_SIZE_MAX); > } > > +struct stack_map_vma_lock { > + bool vma_locked; > + struct vm_area_struct *vma; > + struct mm_struct *mm; > +}; > + > +static struct vm_area_struct *stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip) > +{ > + struct mm_struct *mm = lock->mm; > + struct vm_area_struct *vma; > + > + if (WARN_ON_ONCE(!mm)) > + return NULL; > + > + vma = lock_vma_under_rcu(mm, ip); > + if (vma) > + goto vma_locked; > + > + /* > + * Taking mmap_read_lock() is unsafe here, because the caller > + * BPF program might already hold it, causing a deadlock. > + */ > + if (!mmap_read_trylock(mm)) > + return NULL; > + > + vma = vma_lookup(mm, ip); > + if (!vma) { > + mmap_read_unlock(mm); > + return NULL; > + } > + > +#ifdef CONFIG_PER_VMA_LOCK > + if (!vma_start_read_locked(vma)) { > + mmap_read_unlock(mm); > + return NULL; > + } > + mmap_read_unlock(mm); > +#else > + mmap_read_unlock(mm); > + return NULL; > +#endif Does the fallback for kernels built without CONFIG_PER_VMA_LOCK completely disable build ID parsing? Looking at the #else branch, it unconditionally drops mmap_read_lock and returns NULL. This causes the caller to receive a NULL VMA for every IP, which skips parsing and falls back to returning IP-only traces. Could this instead keep mmap_read_lock held, return the VMA, and allow the caller to extract the file reference, vm_pgoff, and vm_start before safely dropping the lock? This was raised by sashiko-bot@kernel.org in v4 and v5 at: https://lore.kernel.org/bpf/20260515012547.EFF40C2BCB3@smtp.kernel.org/ > +vma_locked: > + lock->vma_locked = true; > + lock->vma = vma; > + return vma; > +} > + > +static void stack_map_unlock_vma(struct stack_map_vma_lock *lock) > +{ > + struct vm_area_struct *vma = lock->vma; > + > + if (lock->vma_locked) { > + if (WARN_ON_ONCE(!vma)) > + goto out; > + vma_end_read(vma); > + } > +out: > + lock->vma_locked = false; > + lock->vma = NULL; > +} > + > +static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs, > + u32 trace_nr) > +{ > + struct mm_struct *mm = current->mm; > + struct stack_map_vma_lock lock = { > + .vma_locked = false, > + .vma = NULL, > + .mm = mm, > + }; > + struct vm_area_struct *vma; > + struct file *file; > + u64 offset; > + u64 ip; > + > + for (u32 i = 0; i < trace_nr; i++) { > + ip = READ_ONCE(id_offs[i].ip); > + > + vma = stack_map_lock_vma(&lock, ip); > + if (!vma || !vma->vm_file) { > + stack_map_build_id_set_ip(&id_offs[i]); > + stack_map_unlock_vma(&lock); > + continue; > + } Is using !vma->vm_file to classify anonymous VMAs fully correct? Per mm-vma.md's VMA classification rules, the canonical test for anonymous VMAs is vma_is_anonymous() (i.e. !vma->vm_ops). Some anonymous VMAs carry a vm_file reference. For example, private mappings of /dev/zero set via vma_set_anonymous() leave vm_file populated while clearing vm_ops. While this won't cause a crash (build_id_parse_file would fail gracefully), it causes unnecessary sleepable read attempts on pseudo-files. This was raised by bot+bpf-ci@kernel.org and sashiko-bot@kernel.org across v3, v4, and v5 at: https://lore.kernel.org/bpf/0e2084173f17252ff227aba064e534682b52a7a451280c0df4041825ebae30b6@mail.kernel.org/ > + > + file = get_file(vma->vm_file); > + offset = stack_map_build_id_offset(vma->vm_pgoff, vma->vm_start, ip); > + stack_map_unlock_vma(&lock); > + > + /* build_id_parse_file() may block on filesystem reads */ > + if (build_id_parse_file(file, id_offs[i].build_id, NULL)) { > + stack_map_build_id_set_ip(&id_offs[i]); > + fput(file); > + continue; > + } > + fput(file); > + > + stack_map_build_id_set_valid(&id_offs[i], offset, id_offs[i].build_id); > + } > +} [ ... ] --- AI reviewed your patch. Please fix the bug or email reply why it's not a bug. See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26258400944 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v6 2/3] bpf: Avoid faultable build ID reads under mm locks 2026-05-21 22:50 ` [PATCH bpf-next v6 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai 2026-05-21 23:33 ` bot+bpf-ci @ 2026-05-21 23:41 ` sashiko-bot 2026-05-22 17:42 ` Andrii Nakryiko 2 siblings, 0 replies; 18+ messages in thread From: sashiko-bot @ 2026-05-21 23:41 UTC (permalink / raw) To: Ihor Solodrai; +Cc: bpf Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider: - [High] The lock fallback path unconditionally fails when CONFIG_PER_VMA_LOCK is not set, silently breaking build ID parsing in sleepable BPF programs. - [Medium] Incorrect check for file-backed VMAs using `vma->vm_file` instead of `!vma_is_anonymous(vma)`. -- commit 84d83effb87030cad9aaab2d51c0df334f8c48df Author: Ihor Solodrai <ihor.solodrai@linux.dev> bpf: Avoid faultable build ID reads under mm locks This commit avoids calling sleepable build ID parsing while holding mmap_lock or a per-VMA read lock to prevent blocking. It introduces a new function to resolve VMAs and acquire references, dropping the VMA lock before parsing. Link: https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/ Link: https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/ Link: https://lore.kernel.org/bpf/2895ecd8-df1e-4cc0-b9f9-aef893dc2360@linux.dev/ > diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c > --- a/kernel/bpf/stackmap.c > +++ b/kernel/bpf/stackmap.c [ ... ] > @@ -174,6 +175,107 @@ static inline void stack_map_build_id_set_valid(struct bpf_stack_build_id *id, [ ... ] > +static struct vm_area_struct *stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip) > +{ > + struct mm_struct *mm = lock->mm; > + struct vm_area_struct *vma; > + > + if (WARN_ON_ONCE(!mm)) > + return NULL; > + > + vma = lock_vma_under_rcu(mm, ip); > + if (vma) > + goto vma_locked; > + > + /* > + * Taking mmap_read_lock() is unsafe here, because the caller > + * BPF program might already hold it, causing a deadlock. > + */ > + if (!mmap_read_trylock(mm)) > + return NULL; > + > + vma = vma_lookup(mm, ip); > + if (!vma) { > + mmap_read_unlock(mm); > + return NULL; > + } > + > +#ifdef CONFIG_PER_VMA_LOCK > + if (!vma_start_read_locked(vma)) { > + mmap_read_unlock(mm); > + return NULL; > + } > + mmap_read_unlock(mm); > +#else > + mmap_read_unlock(mm); > + return NULL; > +#endif [Severity: High] Does this unconditionally break sleepable build ID parsing when CONFIG_PER_VMA_LOCK is not enabled? Because lock_vma_under_rcu() returns NULL without per-VMA locks, the code falls back to mmap_read_trylock() and vma_lookup(). However, when it reaches the #else block here, it always unlocks and returns NULL, causing the sleepable path to fail and degrade to raw IPs. Prior to this patch, mmap_read_trylock() was used successfully regardless of CONFIG_PER_VMA_LOCK. Could we retain the mmap_read_lock protected fallback for kernels without per-VMA locks, or is there another way to keep this feature working? > +vma_locked: > + lock->vma_locked = true; > + lock->vma = vma; > + return vma; > +} [ ... ] > +static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs, > + u32 trace_nr) > +{ > + struct mm_struct *mm = current->mm; > + struct stack_map_vma_lock lock = { > + .vma_locked = false, > + .vma = NULL, > + .mm = mm, > + }; > + struct vm_area_struct *vma; > + struct file *file; > + u64 offset; > + u64 ip; > + > + for (u32 i = 0; i < trace_nr; i++) { > + ip = READ_ONCE(id_offs[i].ip); > + > + vma = stack_map_lock_vma(&lock, ip); > + if (!vma || !vma->vm_file) { [Severity: Medium] Is it possible for anonymous VMAs (such as private mappings of /dev/zero) to bypass this check? According to the MM guidelines, vma->vm_file can be non-NULL even for anonymous VMAs, which might lead to taking unnecessary file references and calling build_id_parse_file() on them. Should this use !vma_is_anonymous(vma) instead of checking vma->vm_file directly to ensure we only process properly file-backed VMAs? > + stack_map_build_id_set_ip(&id_offs[i]); > + stack_map_unlock_vma(&lock); > + continue; > + } > + > + file = get_file(vma->vm_file); > + offset = stack_map_build_id_offset(vma->vm_pgoff, vma->vm_start, ip); > + stack_map_unlock_vma(&lock); -- Sashiko AI review · https://sashiko.dev/#/patchset/20260521225022.2695755-1-ihor.solodrai@linux.dev?part=2 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v6 2/3] bpf: Avoid faultable build ID reads under mm locks 2026-05-21 22:50 ` [PATCH bpf-next v6 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai 2026-05-21 23:33 ` bot+bpf-ci 2026-05-21 23:41 ` sashiko-bot @ 2026-05-22 17:42 ` Andrii Nakryiko 2026-05-22 18:04 ` Ihor Solodrai 2 siblings, 1 reply; 18+ messages in thread From: Andrii Nakryiko @ 2026-05-22 17:42 UTC (permalink / raw) To: Ihor Solodrai Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Kumar Kartikeya Dwivedi, Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team On Thu, May 21, 2026 at 3:51 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote: > > Sleepable build ID parsing can block in __kernel_read() [1], so the > stackmap sleepable path must not call it while holding mmap_lock or a > per-VMA read lock. > > The issue and the fix are conceptually similar to a recent procfs > patch [2]. > > Resolve each covered VMA with a stable read-side reference, preferring > lock_vma_under_rcu() and falling back to mmap_read_trylock() only long > enough to acquire the VMA read lock. Take a reference to the backing > file, drop the VMA lock, and then parse the build ID through > (sleepable) build_id_parse_file(). > > We have to use mmap_read_trylock() (and give up on failure) in this > context because taking mmap_read_lock() is generally unsafe on code > paths reachable from BPF programs [3], and may lead to deadlocks. > > [1] https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/ > [2] https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/ > [3] https://lore.kernel.org/bpf/2895ecd8-df1e-4cc0-b9f9-aef893dc2360@linux.dev/ > > Fixes: d4dd9775ec24 ("bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers") > Suggested-by: Puranjay Mohan <puranjay@kernel.org> > Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> > --- > kernel/bpf/stackmap.c | 107 ++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 107 insertions(+) > > diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c > index 4c753e02c415..95336c0e8b56 100644 > --- a/kernel/bpf/stackmap.c > +++ b/kernel/bpf/stackmap.c > @@ -9,6 +9,7 @@ > #include <linux/perf_event.h> > #include <linux/btf_ids.h> > #include <linux/buildid.h> > +#include <linux/mmap_lock.h> > #include "percpu_freelist.h" > #include "mmap_unlock_work.h" > > @@ -174,6 +175,107 @@ static inline void stack_map_build_id_set_valid(struct bpf_stack_build_id *id, > memcpy(id->build_id, build_id, BUILD_ID_SIZE_MAX); > } > > +struct stack_map_vma_lock { > + bool vma_locked; > + struct vm_area_struct *vma; > + struct mm_struct *mm; > +}; > + > +static struct vm_area_struct *stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip) > +{ > + struct mm_struct *mm = lock->mm; > + struct vm_area_struct *vma; > + > + if (WARN_ON_ONCE(!mm)) > + return NULL; > + > + vma = lock_vma_under_rcu(mm, ip); > + if (vma) > + goto vma_locked; > + > + /* > + * Taking mmap_read_lock() is unsafe here, because the caller > + * BPF program might already hold it, causing a deadlock. > + */ > + if (!mmap_read_trylock(mm)) > + return NULL; > + > + vma = vma_lookup(mm, ip); > + if (!vma) { > + mmap_read_unlock(mm); > + return NULL; > + } As the code is written right now, this vma_lookup is futile if we don't have CONFIG_PER_VMA_LOCK, no? We'll just unlock mmap and return NULL regardless of this operation succeeding. So if that's the intent, then starting from mmap_read_trylock() all the way to mmap_read_unlock + return NULL should be under #ifdef CONFIG_PER_VMA_LOCK. But could it be that the intent was to set lock->vma_lock = false, lock->vma=vma + return vma here if vma_lookup under mmap_lock succeeded?... (oh, now scrolling further down the thread, seems like AIs picked up on this as well, oh well) > + > +#ifdef CONFIG_PER_VMA_LOCK > + if (!vma_start_read_locked(vma)) { > + mmap_read_unlock(mm); > + return NULL; > + } > + mmap_read_unlock(mm); > +#else > + mmap_read_unlock(mm); > + return NULL; see above, was this meant to be a "return vma"? This whole function is quite confusing, please help me understand how it's supposed to work both with and without CONFIG_PER_VMA_LOCK. > +#endif > +vma_locked: > + lock->vma_locked = true; > + lock->vma = vma; > + return vma; > +} > + > +static void stack_map_unlock_vma(struct stack_map_vma_lock *lock) > +{ > + struct vm_area_struct *vma = lock->vma; > + > + if (lock->vma_locked) { > + if (WARN_ON_ONCE(!vma)) > + goto out; this should never ever happen, it's a bug, we shouldn't warn on it, we should make sure we never ever have vma_locked set to true if there is no vma the whole pattern of returning NULL from stack_map_lock_vma() and still holding lock (i.e., allowing and expecting to call stack_map_unlock_vma) seems confusing and error-prone, tbh why not simplify it to "if we got vma and locked it -> we return non-NULL vma and lock is held, otherwise no lock is held (because why?)" pw-bot: cr > + vma_end_read(vma); > + } > +out: > + lock->vma_locked = false; > + lock->vma = NULL; > +} > + > +static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs, > + u32 trace_nr) > +{ > + struct mm_struct *mm = current->mm; > + struct stack_map_vma_lock lock = { > + .vma_locked = false, > + .vma = NULL, > + .mm = mm, > + }; > + struct vm_area_struct *vma; > + struct file *file; > + u64 offset; > + u64 ip; > + > + for (u32 i = 0; i < trace_nr; i++) { > + ip = READ_ONCE(id_offs[i].ip); > + > + vma = stack_map_lock_vma(&lock, ip); > + if (!vma || !vma->vm_file) { > + stack_map_build_id_set_ip(&id_offs[i]); > + stack_map_unlock_vma(&lock); see above, I find this confusing to need to call unlock_vma if there is no vma in the first place... > + continue; > + } > + > + file = get_file(vma->vm_file); > + offset = stack_map_build_id_offset(vma->vm_pgoff, vma->vm_start, ip); > + stack_map_unlock_vma(&lock); > + > + /* build_id_parse_file() may block on filesystem reads */ > + if (build_id_parse_file(file, id_offs[i].build_id, NULL)) { > + stack_map_build_id_set_ip(&id_offs[i]); > + fput(file); > + continue; > + } > + fput(file); > + > + stack_map_build_id_set_valid(&id_offs[i], offset, id_offs[i].build_id); what about if (build_id_parse_file(...)) stack_map_build_id_set_ip(...); /* no build id, parsing failed */ else stack_map_build_id_set_valid(...); fput(file); ? > + } > +} > + > /* > * Expects all id_offs[i].ip values to be set to correct initial IPs. > * They will be subsequently: > @@ -194,6 +296,11 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, > const unsigned char *prev_build_id; > int i; > > + if (may_fault && has_user_ctx) { > + stack_map_get_build_id_offset_sleepable(id_offs, trace_nr); > + return; > + } > + > /* If the irq_work is in use, fall back to report ips. Same > * fallback is used for kernel stack (!user) on a stackmap with > * build_id. > -- > 2.54.0 > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v6 2/3] bpf: Avoid faultable build ID reads under mm locks 2026-05-22 17:42 ` Andrii Nakryiko @ 2026-05-22 18:04 ` Ihor Solodrai 2026-05-22 18:14 ` Andrii Nakryiko 0 siblings, 1 reply; 18+ messages in thread From: Ihor Solodrai @ 2026-05-22 18:04 UTC (permalink / raw) To: Andrii Nakryiko Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Kumar Kartikeya Dwivedi, Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team On 5/22/26 10:42 AM, Andrii Nakryiko wrote: > On Thu, May 21, 2026 at 3:51 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote: >> >> Sleepable build ID parsing can block in __kernel_read() [1], so the >> stackmap sleepable path must not call it while holding mmap_lock or a >> per-VMA read lock. >> >> The issue and the fix are conceptually similar to a recent procfs >> patch [2]. >> >> Resolve each covered VMA with a stable read-side reference, preferring >> lock_vma_under_rcu() and falling back to mmap_read_trylock() only long >> enough to acquire the VMA read lock. Take a reference to the backing >> file, drop the VMA lock, and then parse the build ID through >> (sleepable) build_id_parse_file(). >> >> We have to use mmap_read_trylock() (and give up on failure) in this >> context because taking mmap_read_lock() is generally unsafe on code >> paths reachable from BPF programs [3], and may lead to deadlocks. >> >> [1] https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/ >> [2] https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/ >> [3] https://lore.kernel.org/bpf/2895ecd8-df1e-4cc0-b9f9-aef893dc2360@linux.dev/ >> >> Fixes: d4dd9775ec24 ("bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers") >> Suggested-by: Puranjay Mohan <puranjay@kernel.org> >> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> >> --- >> kernel/bpf/stackmap.c | 107 ++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 107 insertions(+) >> >> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c >> index 4c753e02c415..95336c0e8b56 100644 >> --- a/kernel/bpf/stackmap.c >> +++ b/kernel/bpf/stackmap.c >> @@ -9,6 +9,7 @@ >> #include <linux/perf_event.h> >> #include <linux/btf_ids.h> >> #include <linux/buildid.h> >> +#include <linux/mmap_lock.h> >> #include "percpu_freelist.h" >> #include "mmap_unlock_work.h" >> >> @@ -174,6 +175,107 @@ static inline void stack_map_build_id_set_valid(struct bpf_stack_build_id *id, >> memcpy(id->build_id, build_id, BUILD_ID_SIZE_MAX); >> } >> >> +struct stack_map_vma_lock { >> + bool vma_locked; >> + struct vm_area_struct *vma; >> + struct mm_struct *mm; >> +}; >> + >> +static struct vm_area_struct *stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip) >> +{ >> + struct mm_struct *mm = lock->mm; >> + struct vm_area_struct *vma; >> + >> + if (WARN_ON_ONCE(!mm)) >> + return NULL; >> + >> + vma = lock_vma_under_rcu(mm, ip); >> + if (vma) >> + goto vma_locked; >> + >> + /* >> + * Taking mmap_read_lock() is unsafe here, because the caller >> + * BPF program might already hold it, causing a deadlock. >> + */ >> + if (!mmap_read_trylock(mm)) >> + return NULL; >> + >> + vma = vma_lookup(mm, ip); >> + if (!vma) { >> + mmap_read_unlock(mm); >> + return NULL; >> + } > > As the code is written right now, this vma_lookup is futile if we > don't have CONFIG_PER_VMA_LOCK, no? We'll just unlock mmap and return > NULL regardless of this operation succeeding. So if that's the intent, > then starting from mmap_read_trylock() all the way to mmap_read_unlock > + return NULL should be under #ifdef CONFIG_PER_VMA_LOCK. > > But could it be that the intent was to set lock->vma_lock = false, > lock->vma=vma + return vma here if vma_lookup under mmap_lock > succeeded?... > > (oh, now scrolling further down the thread, seems like AIs picked up > on this as well, oh well) > >> + >> +#ifdef CONFIG_PER_VMA_LOCK >> + if (!vma_start_read_locked(vma)) { >> + mmap_read_unlock(mm); >> + return NULL; >> + } >> + mmap_read_unlock(mm); >> +#else >> + mmap_read_unlock(mm); >> + return NULL; > > see above, was this meant to be a "return vma"? > > This whole function is quite confusing, please help me understand how > it's supposed to work both with and without CONFIG_PER_VMA_LOCK. In v2 the stack_map_vma_lock could be holding either vma lock or mmap lock, in order to support CONFIG_PER_VMA_LOCK=n Then I tried testing it, and it turned out to set CONFIG_PER_VMA_LOCK=n you need to disable SMP, which seems like an unlikely kconfig: https://lore.kernel.org/bpf/bf91eb90-1b6a-4bad-a0e5-072f9dce7daa@linux.dev/ So since v3 I dropped attempt to support CONFIG_PER_VMA_LOCK=n by treating it the same way as failed mmap_read_trylock(), which is expressed by returning NULL from stack_map_lock_vma(). I guess you have a point in that the stackmap lock/unlock helpers can be refactored further. But with respect to CONFIG_PER_VMA_LOCK=n I don't see why we'd want to maintain additional complexity of mmap lock bookkeeping. > >> +#endif > >> +vma_locked: >> + lock->vma_locked = true; >> + lock->vma = vma; >> + return vma; >> +} >> + >> +static void stack_map_unlock_vma(struct stack_map_vma_lock *lock) >> +{ >> + struct vm_area_struct *vma = lock->vma; >> + >> + if (lock->vma_locked) { >> + if (WARN_ON_ONCE(!vma)) >> + goto out; > > this should never ever happen, it's a bug, we shouldn't warn on it, we > should make sure we never ever have vma_locked set to true if there is > no vma > > the whole pattern of returning NULL from stack_map_lock_vma() and > still holding lock (i.e., allowing and expecting to call > stack_map_unlock_vma) seems confusing and error-prone, tbh The idea of stack_map_lock_vma() was that it may fail like trylock, and it is safe to call corresponding unlock(), even if lock isn't held. This allows to have an unlock() on all paths after a lock() in the code, instead of "if locked" or "if null" checks. > > why not simplify it to "if we got vma and locked it -> we return > non-NULL vma and lock is held, otherwise no lock is held (because > why?)" I'll try that. > > pw-bot: cr > > >> + vma_end_read(vma); >> + } >> +out: >> + lock->vma_locked = false; >> + lock->vma = NULL; >> +} >> + >> +static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs, >> + u32 trace_nr) >> +{ >> + struct mm_struct *mm = current->mm; >> + struct stack_map_vma_lock lock = { >> + .vma_locked = false, >> + .vma = NULL, >> + .mm = mm, >> + }; >> + struct vm_area_struct *vma; >> + struct file *file; >> + u64 offset; >> + u64 ip; >> + >> + for (u32 i = 0; i < trace_nr; i++) { >> + ip = READ_ONCE(id_offs[i].ip); >> + >> + vma = stack_map_lock_vma(&lock, ip); >> + if (!vma || !vma->vm_file) { >> + stack_map_build_id_set_ip(&id_offs[i]); >> + stack_map_unlock_vma(&lock); > > see above, I find this confusing to need to call unlock_vma if there > is no vma in the first place... > >> + continue; >> + } >> + >> + file = get_file(vma->vm_file); >> + offset = stack_map_build_id_offset(vma->vm_pgoff, vma->vm_start, ip); >> + stack_map_unlock_vma(&lock); >> + >> + /* build_id_parse_file() may block on filesystem reads */ >> + if (build_id_parse_file(file, id_offs[i].build_id, NULL)) { >> + stack_map_build_id_set_ip(&id_offs[i]); >> + fput(file); >> + continue; >> + } >> + fput(file); >> + >> + stack_map_build_id_set_valid(&id_offs[i], offset, id_offs[i].build_id); > > what about > > if (build_id_parse_file(...)) > stack_map_build_id_set_ip(...); /* no build id, parsing failed */ > else > stack_map_build_id_set_valid(...); > > fput(file); > > ? > >> + } >> +} >> + >> /* >> * Expects all id_offs[i].ip values to be set to correct initial IPs. >> * They will be subsequently: >> @@ -194,6 +296,11 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, >> const unsigned char *prev_build_id; >> int i; >> >> + if (may_fault && has_user_ctx) { >> + stack_map_get_build_id_offset_sleepable(id_offs, trace_nr); >> + return; >> + } >> + >> /* If the irq_work is in use, fall back to report ips. Same >> * fallback is used for kernel stack (!user) on a stackmap with >> * build_id. >> -- >> 2.54.0 >> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v6 2/3] bpf: Avoid faultable build ID reads under mm locks 2026-05-22 18:04 ` Ihor Solodrai @ 2026-05-22 18:14 ` Andrii Nakryiko 0 siblings, 0 replies; 18+ messages in thread From: Andrii Nakryiko @ 2026-05-22 18:14 UTC (permalink / raw) To: Ihor Solodrai Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Kumar Kartikeya Dwivedi, Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team On Fri, May 22, 2026 at 11:05 AM Ihor Solodrai <ihor.solodrai@linux.dev> wrote: > > On 5/22/26 10:42 AM, Andrii Nakryiko wrote: > > On Thu, May 21, 2026 at 3:51 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote: > >> > >> Sleepable build ID parsing can block in __kernel_read() [1], so the > >> stackmap sleepable path must not call it while holding mmap_lock or a > >> per-VMA read lock. > >> > >> The issue and the fix are conceptually similar to a recent procfs > >> patch [2]. > >> > >> Resolve each covered VMA with a stable read-side reference, preferring > >> lock_vma_under_rcu() and falling back to mmap_read_trylock() only long > >> enough to acquire the VMA read lock. Take a reference to the backing > >> file, drop the VMA lock, and then parse the build ID through > >> (sleepable) build_id_parse_file(). > >> > >> We have to use mmap_read_trylock() (and give up on failure) in this > >> context because taking mmap_read_lock() is generally unsafe on code > >> paths reachable from BPF programs [3], and may lead to deadlocks. > >> > >> [1] https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/ > >> [2] https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/ > >> [3] https://lore.kernel.org/bpf/2895ecd8-df1e-4cc0-b9f9-aef893dc2360@linux.dev/ > >> > >> Fixes: d4dd9775ec24 ("bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers") > >> Suggested-by: Puranjay Mohan <puranjay@kernel.org> > >> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> > >> --- > >> kernel/bpf/stackmap.c | 107 ++++++++++++++++++++++++++++++++++++++++++ > >> 1 file changed, 107 insertions(+) > >> > >> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c > >> index 4c753e02c415..95336c0e8b56 100644 > >> --- a/kernel/bpf/stackmap.c > >> +++ b/kernel/bpf/stackmap.c > >> @@ -9,6 +9,7 @@ > >> #include <linux/perf_event.h> > >> #include <linux/btf_ids.h> > >> #include <linux/buildid.h> > >> +#include <linux/mmap_lock.h> > >> #include "percpu_freelist.h" > >> #include "mmap_unlock_work.h" > >> > >> @@ -174,6 +175,107 @@ static inline void stack_map_build_id_set_valid(struct bpf_stack_build_id *id, > >> memcpy(id->build_id, build_id, BUILD_ID_SIZE_MAX); > >> } > >> > >> +struct stack_map_vma_lock { > >> + bool vma_locked; > >> + struct vm_area_struct *vma; > >> + struct mm_struct *mm; > >> +}; > >> + > >> +static struct vm_area_struct *stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip) > >> +{ > >> + struct mm_struct *mm = lock->mm; > >> + struct vm_area_struct *vma; > >> + > >> + if (WARN_ON_ONCE(!mm)) > >> + return NULL; > >> + > >> + vma = lock_vma_under_rcu(mm, ip); > >> + if (vma) > >> + goto vma_locked; > >> + > >> + /* > >> + * Taking mmap_read_lock() is unsafe here, because the caller > >> + * BPF program might already hold it, causing a deadlock. > >> + */ > >> + if (!mmap_read_trylock(mm)) > >> + return NULL; > >> + > >> + vma = vma_lookup(mm, ip); > >> + if (!vma) { > >> + mmap_read_unlock(mm); > >> + return NULL; > >> + } > > > > As the code is written right now, this vma_lookup is futile if we > > don't have CONFIG_PER_VMA_LOCK, no? We'll just unlock mmap and return > > NULL regardless of this operation succeeding. So if that's the intent, > > then starting from mmap_read_trylock() all the way to mmap_read_unlock > > + return NULL should be under #ifdef CONFIG_PER_VMA_LOCK. > > > > But could it be that the intent was to set lock->vma_lock = false, > > lock->vma=vma + return vma here if vma_lookup under mmap_lock > > succeeded?... > > > > (oh, now scrolling further down the thread, seems like AIs picked up > > on this as well, oh well) > > > >> + > >> +#ifdef CONFIG_PER_VMA_LOCK > >> + if (!vma_start_read_locked(vma)) { > >> + mmap_read_unlock(mm); > >> + return NULL; > >> + } > >> + mmap_read_unlock(mm); > >> +#else > >> + mmap_read_unlock(mm); > >> + return NULL; > > > > see above, was this meant to be a "return vma"? > > > > This whole function is quite confusing, please help me understand how > > it's supposed to work both with and without CONFIG_PER_VMA_LOCK. > > In v2 the stack_map_vma_lock could be holding either vma lock or mmap > lock, in order to support CONFIG_PER_VMA_LOCK=n > > Then I tried testing it, and it turned out to set CONFIG_PER_VMA_LOCK=n > you need to disable SMP, which seems like an unlikely kconfig: > https://lore.kernel.org/bpf/bf91eb90-1b6a-4bad-a0e5-072f9dce7daa@linux.dev/ > > So since v3 I dropped attempt to support CONFIG_PER_VMA_LOCK=n by > treating it the same way as failed mmap_read_trylock(), which is > expressed by returning NULL from stack_map_lock_vma(). > > I guess you have a point in that the stackmap lock/unlock helpers can > be refactored further. But with respect to CONFIG_PER_VMA_LOCK=n I > don't see why we'd want to maintain additional complexity of mmap lock > bookkeeping. > Some architectures do not (yet? ever?) support PER_VMA_LOCK at all. But even for the common arches it should be possible to turn off PER_VMA_LOCK: config PER_VMA_LOCK def_bool y depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP help Allow per-vma locking during page fault handling. This feature allows locking each virtual memory area separately when handling page faults instead of taking mmap_lock. We used to not require PER_VMA_LOCK for this functionality, so if it's not undue complexity and maintenance overhead, I'd try to make it work. > > > > >> +#endif > > > >> +vma_locked: > >> + lock->vma_locked = true; > >> + lock->vma = vma; > >> + return vma; > >> +} > >> + > >> +static void stack_map_unlock_vma(struct stack_map_vma_lock *lock) > >> +{ > >> + struct vm_area_struct *vma = lock->vma; > >> + > >> + if (lock->vma_locked) { > >> + if (WARN_ON_ONCE(!vma)) > >> + goto out; > > > > this should never ever happen, it's a bug, we shouldn't warn on it, we > > should make sure we never ever have vma_locked set to true if there is > > no vma > > > > the whole pattern of returning NULL from stack_map_lock_vma() and > > still holding lock (i.e., allowing and expecting to call > > stack_map_unlock_vma) seems confusing and error-prone, tbh > > The idea of stack_map_lock_vma() was that it may fail like trylock, > and it is safe to call corresponding unlock(), even if lock isn't > held. This allows to have an unlock() on all paths after a lock() in > the code, instead of "if locked" or "if null" checks. > I got this, but it's a bit of unusual pattern, so easy to get wrong. Normally, if some lookup function fails to find whatever needs to be found, it should release resources (i.e., not keep lock). Sure, it might require slightly more nuanced error handling in the caller, but that's nothing new. > > > > why not simplify it to "if we got vma and locked it -> we return > > non-NULL vma and lock is held, otherwise no lock is held (because > > why?)" > > I'll try that. > > > > > pw-bot: cr > > > > > >> + vma_end_read(vma); > >> + } > >> +out: > >> + lock->vma_locked = false; > >> + lock->vma = NULL; > >> +} > >> + > >> +static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs, > >> + u32 trace_nr) > >> +{ > >> + struct mm_struct *mm = current->mm; > >> + struct stack_map_vma_lock lock = { > >> + .vma_locked = false, > >> + .vma = NULL, > >> + .mm = mm, > >> + }; > >> + struct vm_area_struct *vma; > >> + struct file *file; > >> + u64 offset; > >> + u64 ip; > >> + > >> + for (u32 i = 0; i < trace_nr; i++) { > >> + ip = READ_ONCE(id_offs[i].ip); > >> + > >> + vma = stack_map_lock_vma(&lock, ip); > >> + if (!vma || !vma->vm_file) { > >> + stack_map_build_id_set_ip(&id_offs[i]); > >> + stack_map_unlock_vma(&lock); > > > > see above, I find this confusing to need to call unlock_vma if there > > is no vma in the first place... > > > >> + continue; > >> + } > >> + > >> + file = get_file(vma->vm_file); > >> + offset = stack_map_build_id_offset(vma->vm_pgoff, vma->vm_start, ip); > >> + stack_map_unlock_vma(&lock); > >> + > >> + /* build_id_parse_file() may block on filesystem reads */ > >> + if (build_id_parse_file(file, id_offs[i].build_id, NULL)) { > >> + stack_map_build_id_set_ip(&id_offs[i]); > >> + fput(file); > >> + continue; > >> + } > >> + fput(file); > >> + > >> + stack_map_build_id_set_valid(&id_offs[i], offset, id_offs[i].build_id); > > > > what about > > > > if (build_id_parse_file(...)) > > stack_map_build_id_set_ip(...); /* no build id, parsing failed */ > > else > > stack_map_build_id_set_valid(...); > > > > fput(file); > > > > ? > > > >> + } > >> +} > >> + > >> /* > >> * Expects all id_offs[i].ip values to be set to correct initial IPs. > >> * They will be subsequently: > >> @@ -194,6 +296,11 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, > >> const unsigned char *prev_build_id; > >> int i; > >> > >> + if (may_fault && has_user_ctx) { > >> + stack_map_get_build_id_offset_sleepable(id_offs, trace_nr); > >> + return; > >> + } > >> + > >> /* If the irq_work is in use, fall back to report ips. Same > >> * fallback is used for kernel stack (!user) on a stackmap with > >> * build_id. > >> -- > >> 2.54.0 > >> > ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH bpf-next v6 3/3] bpf: Cache build IDs in sleepable stackmap path 2026-05-21 22:50 [PATCH bpf-next v6 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai 2026-05-21 22:50 ` [PATCH bpf-next v6 1/3] bpf: Factor out stack_map build ID helpers Ihor Solodrai 2026-05-21 22:50 ` [PATCH bpf-next v6 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai @ 2026-05-21 22:50 ` Ihor Solodrai 2026-05-21 23:33 ` bot+bpf-ci ` (2 more replies) 2 siblings, 3 replies; 18+ messages in thread From: Ihor Solodrai @ 2026-05-21 22:50 UTC (permalink / raw) To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Kumar Kartikeya Dwivedi Cc: Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team Stack traces often contain adjacent IPs from the same VMA or from different VMAs backed by the same ELF file. Cache the last successfully parsed build id together with the resolved VMA range and backing file so the sleepable build id path can avoid repeated VMA locking and file parsing in common cases. Suggested-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> --- kernel/bpf/stackmap.c | 52 ++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 49 insertions(+), 3 deletions(-) diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index 95336c0e8b56..0d641ac39227 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -245,6 +245,14 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i .vma = NULL, .mm = mm, }; + struct { + struct file *file; + const unsigned char *build_id; + unsigned long vm_start; + unsigned long vm_end; + unsigned long vm_pgoff; + } cache = {}; + unsigned long vm_pgoff, vm_start, vm_end; struct vm_area_struct *vma; struct file *file; u64 offset; @@ -253,6 +261,17 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i for (u32 i = 0; i < trace_nr; i++) { ip = READ_ONCE(id_offs[i].ip); + /* + * Range cache fast path: if ip falls within the previously + * resolved VMA range, reuse the cache build_id without + * re-acquiring the VMA lock. + */ + if (cache.build_id && ip >= cache.vm_start && ip < cache.vm_end) { + offset = stack_map_build_id_offset(cache.vm_pgoff, cache.vm_start, ip); + stack_map_build_id_set_valid(&id_offs[i], offset, cache.build_id); + continue; + } + vma = stack_map_lock_vma(&lock, ip); if (!vma || !vma->vm_file) { stack_map_build_id_set_ip(&id_offs[i]); @@ -260,8 +279,26 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i continue; } - file = get_file(vma->vm_file); - offset = stack_map_build_id_offset(vma->vm_pgoff, vma->vm_start, ip); + file = vma->vm_file; + vm_pgoff = vma->vm_pgoff; + vm_start = vma->vm_start; + vm_end = vma->vm_end; + offset = stack_map_build_id_offset(vm_pgoff, vm_start, ip); + + if (file == cache.file) { + /* + * Same backing file as previous (e.g. different VMAs + * of the same ELF binary). Reuse the cache build_id. + */ + stack_map_unlock_vma(&lock); + stack_map_build_id_set_valid(&id_offs[i], offset, cache.build_id); + cache.vm_start = vm_start; + cache.vm_end = vm_end; + cache.vm_pgoff = vm_pgoff; + continue; + } + + file = get_file(file); stack_map_unlock_vma(&lock); /* build_id_parse_file() may block on filesystem reads */ @@ -270,10 +307,19 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i fput(file); continue; } - fput(file); stack_map_build_id_set_valid(&id_offs[i], offset, id_offs[i].build_id); + if (cache.file) + fput(cache.file); + cache.file = file; + cache.build_id = id_offs[i].build_id; + cache.vm_start = vm_start; + cache.vm_end = vm_end; + cache.vm_pgoff = vm_pgoff; } + + if (cache.file) + fput(cache.file); } /* -- 2.54.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v6 3/3] bpf: Cache build IDs in sleepable stackmap path 2026-05-21 22:50 ` [PATCH bpf-next v6 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai @ 2026-05-21 23:33 ` bot+bpf-ci 2026-05-22 0:13 ` sashiko-bot 2026-05-22 17:46 ` Andrii Nakryiko 2 siblings, 0 replies; 18+ messages in thread From: bot+bpf-ci @ 2026-05-21 23:33 UTC (permalink / raw) To: ihor.solodrai, ast, andrii, daniel, eddyz87, memxor Cc: puranjay, shakeel.butt, mykyta.yatsenko5, bpf, linux-kernel, kernel-team, ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai [-- Attachment #1: Type: text/plain, Size: 1951 bytes --] > diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c > index 95336c0e8b56..0d641ac39227 100644 > --- a/kernel/bpf/stackmap.c > +++ b/kernel/bpf/stackmap.c [ ... ] > @@ -253,6 +261,17 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i > for (u32 i = 0; i < trace_nr; i++) { > ip = READ_ONCE(id_offs[i].ip); > > + /* > + * Range cache fast path: if ip falls within the previously > + * resolved VMA range, reuse the cache build_id without > + * re-acquiring the VMA lock. > + */ > + if (cache.build_id && ip >= cache.vm_start && ip < cache.vm_end) { > + offset = stack_map_build_id_offset(cache.vm_pgoff, cache.vm_start, ip); > + stack_map_build_id_set_valid(&id_offs[i], offset, cache.build_id); > + continue; > + } > + > vma = stack_map_lock_vma(&lock, ip); > if (!vma || !vma->vm_file) { > stack_map_build_id_set_ip(&id_offs[i]); sashiko-bot@kernel.org raised a concern in v3 (repeated in v4 and v5 by bpf-ci) about the VMA classification here using !vma->vm_file. While this check predates your patch series, it's worth addressing for the context this code now operates in. Can private mappings of /dev/zero or driver memory cause issues? These mappings can have a non-NULL vma->vm_file but are still considered anonymous. If the stackmap path proceeds to parse the ELF build ID from these files, could it trigger unintended side effects? For example, reading from a character device might alter hardware state, or reading from a FIFO could consume data. Would checking vma_is_anonymous(vma) here be safer to properly classify the VMA? Reference: https://lore.kernel.org/bpf/20260513043154.24483C2BCB7@smtp.kernel.org/ [ ... ] --- AI reviewed your patch. Please fix the bug or email reply why it's not a bug. See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26258400944 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v6 3/3] bpf: Cache build IDs in sleepable stackmap path 2026-05-21 22:50 ` [PATCH bpf-next v6 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai 2026-05-21 23:33 ` bot+bpf-ci @ 2026-05-22 0:13 ` sashiko-bot 2026-05-22 17:46 ` Andrii Nakryiko 2 siblings, 0 replies; 18+ messages in thread From: sashiko-bot @ 2026-05-22 0:13 UTC (permalink / raw) To: Ihor Solodrai; +Cc: bpf Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider: Pre-existing issues: - [High] The code incorrectly uses `!vma->vm_file` to determine if a VMA has a backing file instead of using `vma_is_anonymous()`. -- commit 064bc7134cce521fa9fe6d8d8e8a26a362f8ccdb Author: Ihor Solodrai <ihor.solodrai@linux.dev> bpf: Cache build IDs in sleepable stackmap path This commit adds caching for the last successfully parsed build ID, VMA range, and backing file in the sleepable stackmap path to avoid repeated VMA locking and file parsing. > diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c > --- a/kernel/bpf/stackmap.c > +++ b/kernel/bpf/stackmap.c [ ... ] > @@ -253,6 +261,17 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i > for (u32 i = 0; i < trace_nr; i++) { > ip = READ_ONCE(id_offs[i].ip); [ ... ] > vma = stack_map_lock_vma(&lock, ip); > if (!vma || !vma->vm_file) { [Severity: High] This is a pre-existing issue, but using vma->vm_file as a proxy to check if a VMA is file-backed or anonymous can be problematic. Anonymous VMAs (like private mappings of /dev/zero or driver mmaps that call vma_set_anonymous()) can leave vma->vm_file set. If this code treats such VMAs as standard file-backed VMAs, could build_id_parse_file() end up invoking kernel_read() on character devices like TTYs or input devices? This might block indefinitely waiting for input, potentially leading to a local denial of service in a sleepable BPF tracing program. Should this check use vma_is_anonymous(vma) to accurately classify the VMA instead of checking !vma->vm_file? > stack_map_build_id_set_ip(&id_offs[i]); > stack_map_unlock_vma(&lock); > continue; > } -- Sashiko AI review · https://sashiko.dev/#/patchset/20260521225022.2695755-1-ihor.solodrai@linux.dev?part=3 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v6 3/3] bpf: Cache build IDs in sleepable stackmap path 2026-05-21 22:50 ` [PATCH bpf-next v6 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai 2026-05-21 23:33 ` bot+bpf-ci 2026-05-22 0:13 ` sashiko-bot @ 2026-05-22 17:46 ` Andrii Nakryiko 2 siblings, 0 replies; 18+ messages in thread From: Andrii Nakryiko @ 2026-05-22 17:46 UTC (permalink / raw) To: Ihor Solodrai Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Kumar Kartikeya Dwivedi, Puranjay Mohan, Shakeel Butt, Mykyta Yatsenko, bpf, linux-kernel, kernel-team On Thu, May 21, 2026 at 3:51 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote: > > Stack traces often contain adjacent IPs from the same VMA or from > different VMAs backed by the same ELF file. Cache the last successfully > parsed build id together with the resolved VMA range and backing file > so the sleepable build id path can avoid repeated VMA locking and file > parsing in common cases. > > Suggested-by: Mykyta Yatsenko <yatsenko@meta.com> > Acked-by: Mykyta Yatsenko <yatsenko@meta.com> > Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> > --- > kernel/bpf/stackmap.c | 52 ++++++++++++++++++++++++++++++++++++++++--- > 1 file changed, 49 insertions(+), 3 deletions(-) > LGTM, one nit below Acked-by: Andrii Nakryiko <andrii@kernel.org> > diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c > index 95336c0e8b56..0d641ac39227 100644 > --- a/kernel/bpf/stackmap.c > +++ b/kernel/bpf/stackmap.c > @@ -245,6 +245,14 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i > .vma = NULL, > .mm = mm, > }; > + struct { > + struct file *file; > + const unsigned char *build_id; > + unsigned long vm_start; > + unsigned long vm_end; > + unsigned long vm_pgoff; > + } cache = {}; > + unsigned long vm_pgoff, vm_start, vm_end; > struct vm_area_struct *vma; > struct file *file; > u64 offset; > @@ -253,6 +261,17 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i > for (u32 i = 0; i < trace_nr; i++) { > ip = READ_ONCE(id_offs[i].ip); > > + /* > + * Range cache fast path: if ip falls within the previously > + * resolved VMA range, reuse the cache build_id without > + * re-acquiring the VMA lock. > + */ > + if (cache.build_id && ip >= cache.vm_start && ip < cache.vm_end) { > + offset = stack_map_build_id_offset(cache.vm_pgoff, cache.vm_start, ip); > + stack_map_build_id_set_valid(&id_offs[i], offset, cache.build_id); > + continue; > + } > + > vma = stack_map_lock_vma(&lock, ip); > if (!vma || !vma->vm_file) { > stack_map_build_id_set_ip(&id_offs[i]); > @@ -260,8 +279,26 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i > continue; > } > > - file = get_file(vma->vm_file); > - offset = stack_map_build_id_offset(vma->vm_pgoff, vma->vm_start, ip); > + file = vma->vm_file; > + vm_pgoff = vma->vm_pgoff; > + vm_start = vma->vm_start; > + vm_end = vma->vm_end; > + offset = stack_map_build_id_offset(vm_pgoff, vm_start, ip); > + > + if (file == cache.file) { > + /* > + * Same backing file as previous (e.g. different VMAs > + * of the same ELF binary). Reuse the cache build_id. > + */ nit, for fast path comment you put it before if() condition, while here you put it inside, I'd move this one outside as well > + stack_map_unlock_vma(&lock); > + stack_map_build_id_set_valid(&id_offs[i], offset, cache.build_id); > + cache.vm_start = vm_start; > + cache.vm_end = vm_end; > + cache.vm_pgoff = vm_pgoff; > + continue; > + } > + > + file = get_file(file); > stack_map_unlock_vma(&lock); > > /* build_id_parse_file() may block on filesystem reads */ > @@ -270,10 +307,19 @@ static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *i > fput(file); > continue; > } > - fput(file); > > stack_map_build_id_set_valid(&id_offs[i], offset, id_offs[i].build_id); > + if (cache.file) > + fput(cache.file); > + cache.file = file; > + cache.build_id = id_offs[i].build_id; > + cache.vm_start = vm_start; > + cache.vm_end = vm_end; > + cache.vm_pgoff = vm_pgoff; > } > + > + if (cache.file) > + fput(cache.file); > } > > /* > -- > 2.54.0 > ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2026-05-22 18:14 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-21 22:50 [PATCH bpf-next v6 0/3] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai 2026-05-21 22:50 ` [PATCH bpf-next v6 1/3] bpf: Factor out stack_map build ID helpers Ihor Solodrai 2026-05-21 23:16 ` sashiko-bot 2026-05-21 23:32 ` Ihor Solodrai 2026-05-22 17:17 ` Andrii Nakryiko 2026-05-22 17:16 ` Andrii Nakryiko 2026-05-22 17:33 ` Ihor Solodrai 2026-05-22 17:50 ` Andrii Nakryiko 2026-05-21 22:50 ` [PATCH bpf-next v6 2/3] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai 2026-05-21 23:33 ` bot+bpf-ci 2026-05-21 23:41 ` sashiko-bot 2026-05-22 17:42 ` Andrii Nakryiko 2026-05-22 18:04 ` Ihor Solodrai 2026-05-22 18:14 ` Andrii Nakryiko 2026-05-21 22:50 ` [PATCH bpf-next v6 3/3] bpf: Cache build IDs in sleepable stackmap path Ihor Solodrai 2026-05-21 23:33 ` bot+bpf-ci 2026-05-22 0:13 ` sashiko-bot 2026-05-22 17:46 ` Andrii Nakryiko
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.