* [PATCH bpf v2 0/2] bpf: Implement stack_map_get_build_id_offset_sleepable()
@ 2026-04-09 1:06 Ihor Solodrai
2026-04-09 1:06 ` [PATCH bpf v2 1/2] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c Ihor Solodrai
2026-04-09 1:06 ` [PATCH bpf v2 2/2] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai
0 siblings, 2 replies; 7+ messages in thread
From: Ihor Solodrai @ 2026-04-09 1:06 UTC (permalink / raw)
To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Song Liu
Cc: Puranjay Mohan, Shakeel Butt, bpf, linux-kernel, kernel-team
I noticed the following deadlock splat on bpf-next when running a
subset of selftests/bpf:
#526 uprobe_multi_test:OK
#28 bpf_obj_id:OK
[ 27.561465]
[ 27.561603] ======================================================
[ 27.561899] WARNING: possible circular locking dependency detected
[ 27.562193] 7.0.0-rc5-g0eeb0094ba03 #3 Tainted: G OE
[ 27.562523] ------------------------------------------------------
[ 27.562820] uprobe_multi/561 is trying to acquire lock:
[ 27.563071] ff1100010768ba18 (&sb->s_type->i_mutex_key#8){++++}-{4:4}, at: netfs_start_io_read+0x24/0x110
[ 27.563556]
[ 27.563556] but task is already holding lock:
[ 27.563845] ff1100010b12a640 (&mm->mmap_lock){++++}-{4:4}, at: stack_map_get_build_id_offset+0xfa/0x650
[ 27.564296]
[ 27.564296] which lock already depends on the new lock.
[ 27.564296]
[ 27.564696]
[ 27.564696] the existing dependency chain (in reverse order) is:
[ 27.565044]
[ 27.565044] -> #1 (&mm->mmap_lock){++++}-{4:4}:
[ 27.565340] __might_fault+0xa3/0x100
[ 27.565572] _copy_to_iter+0xdf/0x1520
[ 27.565786] copy_page_to_iter+0x1c6/0x2d0
[ 27.566011] filemap_read+0x5f7/0xbf0
[ 27.566218] netfs_file_read_iter+0x1ca/0x240
[ 27.566453] vfs_read+0x482/0x9e0
[ 27.566658] ksys_read+0x115/0x1f0
[ 27.566852] do_syscall_64+0xde/0x390
[ 27.567057] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 27.567322]
[ 27.567322] -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}:
[ 27.567701] __lock_acquire+0x1529/0x2a10
[ 27.567923] lock_acquire+0xd7/0x280
[ 27.568124] down_read_interruptible+0x55/0x340
[ 27.568368] netfs_start_io_read+0x24/0x110
[ 27.568622] netfs_file_read_iter+0x191/0x240
[ 27.568858] __kernel_read+0x401/0x850
[ 27.569068] freader_fetch+0x19b/0x790
[ 27.569279] __build_id_parse+0xde/0x580
[ 27.569528] stack_map_get_build_id_offset+0x248/0x650
[ 27.569802] __bpf_get_stack+0x653/0x890
[ 27.570019] bpf_get_stack_sleepable+0x1d/0x30
[ 27.570257] bpf_prog_51bb4c0f31ae471b_uprobe_sleepable+0x2d/0x42
[ 27.570594] uprobe_prog_run+0x425/0x870
[ 27.570816] uprobe_multi_link_handler+0x58/0xb0
[ 27.571062] handler_chain+0x234/0x1270
[ 27.571275] uprobe_notify_resume+0x4de/0xd60
[ 27.571542] exit_to_user_mode_loop+0xe0/0x480
[ 27.571785] exc_int3+0x1bd/0x260
[ 27.571973] asm_exc_int3+0x39/0x40
[ 27.572169]
[ 27.572169] other info that might help us debug this:
[ 27.572169]
[ 27.572571] Possible unsafe locking scenario:
[ 27.572571]
[ 27.572854] CPU0 CPU1
[ 27.573074] ---- ----
[ 27.573293] rlock(&mm->mmap_lock);
[ 27.573502] lock(&sb->s_type->i_mutex_key#8);
[ 27.573846] lock(&mm->mmap_lock);
[ 27.574135] rlock(&sb->s_type->i_mutex_key#8);
[ 27.574364]
[ 27.574364] *** DEADLOCK ***
[ 27.574364]
[ 27.574671] 3 locks held by uprobe_multi/561:
[ 27.574884] #0: ffffffff852e0038 (rcu_tasks_trace_srcu_struct){....}-{0:0}, at: uprobe_notify_resume+0x229/0xd60
[ 27.575371] #1: ffffffff852e0038 (rcu_tasks_trace_srcu_struct){....}-{0:0}, at: uprobe_prog_run+0x239/0x870
[ 27.575874] #2: ff1100010b12a640 (&mm->mmap_lock){++++}-{4:4}, at: stack_map_get_build_id_offset+0xfa/0x650
[ 27.576342]
[ 27.576342] stack backtrace:
[ 27.576578] CPU: 7 UID: 0 PID: 561 Comm: uprobe_multi Tainted: G OE 7.0.0-rc5-g0eeb0094ba03 #3 PREEMPT(full)
[ 27.576586] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 27.576588] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-5.el9 11/05/2023
[ 27.576591] Call Trace:
[ 27.576595] <TASK>
[ 27.576598] dump_stack_lvl+0x54/0x70
[ 27.576606] print_circular_bug+0x2e8/0x300
[ 27.576616] check_noncircular+0x12e/0x150
[ 27.576628] __lock_acquire+0x1529/0x2a10
[ 27.576642] ? __pfx_walk_pgd_range+0x10/0x10
[ 27.576651] ? srso_return_thunk+0x5/0x5f
[ 27.576656] ? lock_acquire+0xd7/0x280
[ 27.576661] ? avc_has_perm_noaudit+0x38/0x1f0
[ 27.576672] lock_acquire+0xd7/0x280
[ 27.576677] ? netfs_start_io_read+0x24/0x110
[ 27.576685] ? srso_return_thunk+0x5/0x5f
[ 27.576693] ? netfs_start_io_read+0x24/0x110
[ 27.576698] down_read_interruptible+0x55/0x340
[ 27.576702] ? netfs_start_io_read+0x24/0x110
[ 27.576707] ? __pfx_cmp_ex_search+0x10/0x10
[ 27.576716] netfs_start_io_read+0x24/0x110
[ 27.576723] netfs_file_read_iter+0x191/0x240
[ 27.576731] __kernel_read+0x401/0x850
[ 27.576741] ? __pfx___kernel_read+0x10/0x10
[ 27.576752] ? __pfx_fixup_exception+0x10/0x10
[ 27.576761] ? srso_return_thunk+0x5/0x5f
[ 27.576766] ? lock_is_held_type+0x76/0x100
[ 27.576773] ? srso_return_thunk+0x5/0x5f
[ 27.576777] ? mtree_range_walk+0x421/0x6c0
[ 27.576785] freader_fetch+0x19b/0x790
[ 27.576793] ? mt_find+0x14b/0x340
[ 27.576801] ? __pfx_freader_fetch+0x10/0x10
[ 27.576805] ? mt_find+0x29a/0x340
[ 27.576810] ? mt_find+0x14b/0x340
[ 27.576820] __build_id_parse+0xde/0x580
[ 27.576832] ? __pfx___build_id_parse+0x10/0x10
[ 27.576850] stack_map_get_build_id_offset+0x248/0x650
[ 27.576863] __bpf_get_stack+0x653/0x890
[ 27.576867] ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 27.576880] ? __pfx___bpf_get_stack+0x10/0x10
[ 27.576884] ? lock_acquire+0xd7/0x280
[ 27.576889] ? uprobe_prog_run+0x239/0x870
[ 27.576895] ? srso_return_thunk+0x5/0x5f
[ 27.576899] ? stack_trace_save+0xae/0x100
[ 27.576908] bpf_get_stack_sleepable+0x1d/0x30
[ 27.576913] bpf_prog_51bb4c0f31ae471b_uprobe_sleepable+0x2d/0x42
[ 27.576919] uprobe_prog_run+0x425/0x870
[ 27.576927] ? srso_return_thunk+0x5/0x5f
[ 27.576933] ? __pfx_uprobe_prog_run+0x10/0x10
[ 27.576937] ? uprobe_notify_resume+0x413/0xd60
[ 27.576952] uprobe_multi_link_handler+0x58/0xb0
[ 27.576960] handler_chain+0x234/0x1270
[ 27.576976] ? __pfx_handler_chain+0x10/0x10
[ 27.576988] ? srso_return_thunk+0x5/0x5f
[ 27.576992] ? __kasan_kmalloc+0x72/0x90
[ 27.576998] ? __pfx_ri_timer+0x10/0x10
[ 27.577003] ? timer_init_key+0xf5/0x200
[ 27.577013] uprobe_notify_resume+0x4de/0xd60
[ 27.577025] ? uprobe_notify_resume+0x229/0xd60
[ 27.577030] ? __pfx_uprobe_notify_resume+0x10/0x10
[ 27.577035] ? srso_return_thunk+0x5/0x5f
[ 27.577039] ? atomic_notifier_call_chain+0xfc/0x110
[ 27.577047] ? srso_return_thunk+0x5/0x5f
[ 27.577051] ? notify_die+0xe5/0x130
[ 27.577056] ? __pfx_notify_die+0x10/0x10
[ 27.577063] ? srso_return_thunk+0x5/0x5f
[ 27.577071] exit_to_user_mode_loop+0xe0/0x480
[ 27.577076] ? asm_exc_int3+0x39/0x40
[ 27.577083] exc_int3+0x1bd/0x260
[ 27.577090] asm_exc_int3+0x39/0x40
[ 27.577095] RIP: 0033:0x6c4180
[ 27.577100] Code: c6 05 0b 6f 32 00 01 5d c3 90 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa eb 8a 66 2e 0f 1f 84 00 00 00 00 00 <cc> 48 89 e5 31 c0 5d c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 31 c0
[ 27.577104] RSP: 002b:00007ffcbf71ada8 EFLAGS: 00000246
[ 27.577109] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007fec9163db7b
[ 27.577112] RDX: 00007ffcbf71adbf RSI: 0000000000001000 RDI: 00000000007f0000
[ 27.577115] RBP: 00007ffcbf71add0 R08: 00007fec91731330 R09: 00007fec9178d060
[ 27.577118] R10: 00007fec9153ad98 R11: 0000000000000202 R12: 00007ffcbf71af08
[ 27.577120] R13: 0000000000787760 R14: 00000000009eade8 R15: 00007fec917bf000
[ 27.577135] </TASK>
#54/1 build_id/nofault-paged-out:OK
It was reproducible, which allowed me to bisect to commit 777a8560fd29
("lib/buildid: use __kernel_read() for sleepable context").
Then I used Codex to analyze the splat, the patch, and the relevant
lore discussions, and asked it to produce a patch draft, which was the
starting point for this series.
I verified that the patch series fixes the deadlock splat.
---
v1->v2:
* Addressed feedback from Puranjay:
* split out a small refactoring patch
* use mmap_read_trylock()
* take into account CONFIG_PER_VMA_LOCK
* replace find_vma() with vma_lookup()
* cache prev_build_id to avoid re-parsing the same file
* Snapshot vm_pgoff and vm_start before unlocking (AI)
* To avoid repetitive unlocking statements, introduce struct
stack_map_vma_lock to hold relevant lock state info and add an
unlock helper
v1: https://lore.kernel.org/bpf/20260407223003.720428-1-ihor.solodrai@linux.dev/
---
Ihor Solodrai (2):
bpf: Factor out stack_map_build_id_set_ip() in stackmap.c
bpf: Avoid faultable build ID reads under mm locks
kernel/bpf/stackmap.c | 160 +++++++++++++++++++++++++++++++++++++++---
1 file changed, 151 insertions(+), 9 deletions(-)
--
2.53.0
^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH bpf v2 1/2] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c
  2026-04-09  1:06 [PATCH bpf v2 0/2] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai
@ 2026-04-09  1:06 ` Ihor Solodrai
  2026-04-09 14:01   ` Mykyta Yatsenko
  2026-04-09  1:06 ` [PATCH bpf v2 2/2] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai
  1 sibling, 1 reply; 7+ messages in thread
From: Ihor Solodrai @ 2026-04-09  1:06 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Song Liu
  Cc: Puranjay Mohan, Shakeel Butt, bpf, linux-kernel, kernel-team

Factor out a small helper from stack_map_get_build_id_offset() in
preparation for adding a sleepable build ID resolution path.

No functional changes.

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
---
 kernel/bpf/stackmap.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index da3d328f5c15..4ef0fd06cea5 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -152,6 +152,12 @@ static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, b
 		: build_id_parse_nofault(vma, build_id, NULL);
 }
 
+static inline void stack_map_build_id_set_ip(struct bpf_stack_build_id *id)
+{
+	id->status = BPF_STACK_BUILD_ID_IP;
+	memset(id->build_id, 0, BUILD_ID_SIZE_MAX);
+}
+
 /*
  * Expects all id_offs[i].ip values to be set to correct initial IPs.
  * They will be subsequently:
@@ -165,23 +171,21 @@ static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, b
 static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 					  u32 trace_nr, bool user, bool may_fault)
 {
-	int i;
 	struct mmap_unlock_irq_work *work = NULL;
 	bool irq_work_busy = bpf_mmap_unlock_get_irq_work(&work);
+	bool has_user_ctx = user && current && current->mm;
 	struct vm_area_struct *vma, *prev_vma = NULL;
 	const char *prev_build_id;
+	int i;
 
 	/* If the irq_work is in use, fall back to report ips. Same
	 * fallback is used for kernel stack (!user) on a stackmap with
	 * build_id.
	 */
-	if (!user || !current || !current->mm || irq_work_busy ||
-	    !mmap_read_trylock(current->mm)) {
+	if (!has_user_ctx || irq_work_busy || !mmap_read_trylock(current->mm)) {
 		/* cannot access current->mm, fall back to ips */
-		for (i = 0; i < trace_nr; i++) {
-			id_offs[i].status = BPF_STACK_BUILD_ID_IP;
-			memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX);
-		}
+		for (i = 0; i < trace_nr; i++)
+			stack_map_build_id_set_ip(&id_offs[i]);
 		return;
 	}
 
@@ -196,8 +200,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 		vma = find_vma(current->mm, ip);
 		if (!vma || fetch_build_id(vma, id_offs[i].build_id, may_fault)) {
 			/* per entry fall back to ips */
-			id_offs[i].status = BPF_STACK_BUILD_ID_IP;
-			memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX);
+			stack_map_build_id_set_ip(&id_offs[i]);
 			continue;
 		}
 build_id_valid:
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c
  2026-04-09  1:06 ` [PATCH bpf v2 1/2] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c Ihor Solodrai
@ 2026-04-09 14:01   ` Mykyta Yatsenko
  0 siblings, 0 replies; 7+ messages in thread
From: Mykyta Yatsenko @ 2026-04-09 14:01 UTC (permalink / raw)
  To: Ihor Solodrai, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Song Liu
  Cc: Puranjay Mohan, Shakeel Butt, bpf, linux-kernel, kernel-team

On 4/9/26 2:06 AM, Ihor Solodrai wrote:
> Factor out a small helper from stack_map_get_build_id_offset() in
> preparation for adding a sleepable build ID resolution path.
>
> No functional changes.
>
> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
> ---
The refactoring looks correct.

Acked-by: Mykyta Yatsenko <yatsenko@meta.com>

> [...]

^ permalink raw reply	[flat|nested] 7+ messages in thread
* [PATCH bpf v2 2/2] bpf: Avoid faultable build ID reads under mm locks
  2026-04-09  1:06 [PATCH bpf v2 0/2] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai
  2026-04-09  1:06 ` [PATCH bpf v2 1/2] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c Ihor Solodrai
@ 2026-04-09  1:06 ` Ihor Solodrai
  2026-04-09  4:20   ` Ihor Solodrai
  2026-04-09 16:34   ` Mykyta Yatsenko
  1 sibling, 2 replies; 7+ messages in thread
From: Ihor Solodrai @ 2026-04-09  1:06 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Song Liu
  Cc: Puranjay Mohan, Shakeel Butt, bpf, linux-kernel, kernel-team

Sleepable build ID parsing can block in __kernel_read() [1], so the
stackmap sleepable path must not call it while holding mmap_lock or a
per-VMA read lock.

The issue and the fix are conceptually similar to a recent procfs
patch [2].

Resolve each covered VMA with a stable read-side reference, preferring
lock_vma_under_rcu() and falling back to mmap_read_lock() only long
enough to acquire the VMA read lock. Take a reference to the backing
file, drop the VMA lock, and then parse the build ID through
(sleepable) build_id_parse_file().

[1]: https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/
[2]: https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/

Fixes: 777a8560fd29 ("lib/buildid: use __kernel_read() for sleepable context")
Assisted-by: Codex:gpt-5.4
Suggested-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
---
 kernel/bpf/stackmap.c | 139 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 139 insertions(+)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 4ef0fd06cea5..de3d89e20a1e 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -9,6 +9,7 @@
 #include <linux/perf_event.h>
 #include <linux/btf_ids.h>
 #include <linux/buildid.h>
+#include <linux/mmap_lock.h>
 #include "percpu_freelist.h"
 #include "mmap_unlock_work.h"
 
@@ -158,6 +159,139 @@ static inline void stack_map_build_id_set_ip(struct bpf_stack_build_id *id)
 	memset(id->build_id, 0, BUILD_ID_SIZE_MAX);
 }
 
+enum stack_map_vma_lock_state {
+	STACK_MAP_LOCKED_NONE = 0,
+	STACK_MAP_LOCKED_VMA,
+	STACK_MAP_LOCKED_MMAP,
+};
+
+struct stack_map_vma_lock {
+	enum stack_map_vma_lock_state state;
+	struct vm_area_struct *vma;
+	struct mm_struct *mm;
+};
+
+static struct vm_area_struct *stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip)
+{
+	struct mm_struct *mm = lock->mm;
+	struct vm_area_struct *vma;
+
+	if (WARN_ON_ONCE(!mm))
+		return NULL;
+
+	vma = lock_vma_under_rcu(mm, ip);
+	if (vma)
+		goto vma_locked;
+
+	if (!mmap_read_trylock(mm))
+		return NULL;
+
+	vma = vma_lookup(mm, ip);
+	if (!vma) {
+		mmap_read_unlock(mm);
+		return NULL;
+	}
+
+#ifdef CONFIG_PER_VMA_LOCK
+	if (!vma_start_read_locked(vma)) {
+		mmap_read_unlock(mm);
+		return NULL;
+	}
+	mmap_read_unlock(mm);
+#else
+	lock->state = STACK_MAP_LOCKED_MMAP;
+	lock->vma = vma;
+	return vma;
+#endif
+
+vma_locked:
+	lock->state = STACK_MAP_LOCKED_VMA;
+	lock->vma = vma;
+	return vma;
+}
+
+static void stack_map_unlock_vma(struct stack_map_vma_lock *lock)
+{
+	struct vm_area_struct *vma = lock->vma;
+	struct mm_struct *mm = lock->mm;
+
+	switch (lock->state) {
+	case STACK_MAP_LOCKED_VMA:
+		if (WARN_ON_ONCE(!vma))
+			break;
+		vma_end_read(vma);
+		break;
+	case STACK_MAP_LOCKED_MMAP:
+		if (WARN_ON_ONCE(!mm))
+			break;
+		mmap_read_unlock(mm);
+		break;
+	default:
+		break;
+	}
+
+	lock->state = STACK_MAP_LOCKED_NONE;
+	lock->vma = NULL;
+}
+
+static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs,
+						    u32 trace_nr)
+{
+	struct mm_struct *mm = current->mm;
+	struct stack_map_vma_lock lock = {
+		.state = STACK_MAP_LOCKED_NONE,
+		.vma = NULL,
+		.mm = mm,
+	};
+	struct file *file, *prev_file = NULL;
+	unsigned long vm_pgoff, vm_start;
+	struct vm_area_struct *vma;
+	const char *prev_build_id;
+	u64 ip;
+
+	for (u32 i = 0; i < trace_nr; i++) {
+		ip = READ_ONCE(id_offs[i].ip);
+		vma = stack_map_lock_vma(&lock, ip);
+		if (!vma || !vma->vm_file) {
+			stack_map_build_id_set_ip(&id_offs[i]);
+			stack_map_unlock_vma(&lock);
+			continue;
+		}
+
+		file = vma->vm_file;
+		vm_pgoff = vma->vm_pgoff;
+		vm_start = vma->vm_start;
+
+		if (file == prev_file) {
+			memcpy(id_offs[i].build_id, prev_build_id, BUILD_ID_SIZE_MAX);
+			stack_map_unlock_vma(&lock);
+			goto build_id_valid;
+		}
+
+		file = get_file(file);
+		stack_map_unlock_vma(&lock);
+
+		/* build_id_parse_file() may block on filesystem reads */
+		if (build_id_parse_file(file, id_offs[i].build_id, NULL)) {
+			stack_map_build_id_set_ip(&id_offs[i]);
+			fput(file);
+			continue;
+		}
+
+		if (prev_file)
+			fput(prev_file);
+		prev_file = file;
+		prev_build_id = id_offs[i].build_id;
+
+build_id_valid:
+		id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start;
+		id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
+	}
+
+	if (prev_file)
+		fput(prev_file);
+}
+
 /*
  * Expects all id_offs[i].ip values to be set to correct initial IPs.
  * They will be subsequently:
@@ -178,6 +312,11 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 	const char *prev_build_id;
 	int i;
 
+	if (may_fault && has_user_ctx) {
+		stack_map_get_build_id_offset_sleepable(id_offs, trace_nr);
+		return;
+	}
+
 	/* If the irq_work is in use, fall back to report ips. Same
	 * fallback is used for kernel stack (!user) on a stackmap with
	 * build_id.
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread
* Re: [PATCH bpf v2 2/2] bpf: Avoid faultable build ID reads under mm locks
  2026-04-09  1:06 ` [PATCH bpf v2 2/2] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai
@ 2026-04-09  4:20   ` Ihor Solodrai
  2026-04-09 14:51     ` Mykyta Yatsenko
  0 siblings, 1 reply; 7+ messages in thread
From: Ihor Solodrai @ 2026-04-09  4:20 UTC (permalink / raw)
  To: Puranjay Mohan
  Cc: Andrii Nakryiko, Daniel Borkmann, Alexei Starovoitov, Song Liu,
	Shakeel Butt, bpf, linux-kernel, kernel-team

On 4/8/26 6:06 PM, Ihor Solodrai wrote:
> Sleepable build ID parsing can block in __kernel_read() [1], so the
> stackmap sleepable path must not call it while holding mmap_lock or a
> per-VMA read lock.
>
> [...]
>
> +	vma = vma_lookup(mm, ip);
> +	if (!vma) {
> +		mmap_read_unlock(mm);
> +		return NULL;
> +	}
> +
> +#ifdef CONFIG_PER_VMA_LOCK

Puranjay and others,

I tried to test this patch with CONFIG_PER_VMA_LOCK=n, and it turns
out it's not easy to turn off.

config PER_VMA_LOCK
	def_bool y
	depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP

AFAIU this definition means that PER_VMA_LOCK is automatically true if
its dependencies are satisfied. The only practical way to do it
appears to be disabling SMP, which is probably very unusual.

So I am thinking, maybe instead of having to manage both mmap and vma
lock in stackmap.c, we can assume CONFIG_PER_VMA_LOCK=y, and have

#ifndef CONFIG_PER_VMA_LOCK
	return -ENOTSUPP
#endif

somewhere in the stack_map_get_build_id_offset() callstack?
Say in __bpf_get_stackid()?

Thoughts?

> [...]

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH bpf v2 2/2] bpf: Avoid faultable build ID reads under mm locks
  2026-04-09  4:20   ` Ihor Solodrai
@ 2026-04-09 14:51     ` Mykyta Yatsenko
  0 siblings, 0 replies; 7+ messages in thread
From: Mykyta Yatsenko @ 2026-04-09 14:51 UTC (permalink / raw)
  To: Ihor Solodrai, Puranjay Mohan
  Cc: Andrii Nakryiko, Daniel Borkmann, Alexei Starovoitov, Song Liu,
	Shakeel Butt, bpf, linux-kernel, kernel-team

On 4/9/26 5:20 AM, Ihor Solodrai wrote:
>
> On 4/8/26 6:06 PM, Ihor Solodrai wrote:
>> [...]
>
> Puranjay and others,
>
> I tried to test this patch with CONFIG_PER_VMA_LOCK=n, and it turns
> out it's not easy to turn off.
>
> config PER_VMA_LOCK
> 	def_bool y
> 	depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
>
> AFAIU this definition means that PER_VMA_LOCK is automatically true if
> its dependencies are satisfied. The only practical way to do it
> appears to be disabling SMP, which is probably very unusual.
>
> So I am thinking, maybe instead of having to manage both mmap and vma
> lock in stackmap.c, we can assume CONFIG_PER_VMA_LOCK=y, and have
>
> #ifndef CONFIG_PER_VMA_LOCK
> 	return -ENOTSUPP
> #endif
>
> somewhere in the stack_map_get_build_id_offset() callstack?
> Say in __bpf_get_stackid()?
>
> Thoughts?
>
It looks like you are right: it should be safe to drop the
CONFIG_PER_VMA_LOCK=n case. I understand all platforms that care about
BPF are going to have it enabled.

> [...]

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH bpf v2 2/2] bpf: Avoid faultable build ID reads under mm locks
  2026-04-09  1:06 ` [PATCH bpf v2 2/2] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai
  2026-04-09  4:20   ` Ihor Solodrai
@ 2026-04-09 16:34   ` Mykyta Yatsenko
  1 sibling, 0 replies; 7+ messages in thread
From: Mykyta Yatsenko @ 2026-04-09 16:34 UTC (permalink / raw)
  To: Ihor Solodrai, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Song Liu
  Cc: Puranjay Mohan, Shakeel Butt, bpf, linux-kernel, kernel-team

On 4/9/26 2:06 AM, Ihor Solodrai wrote:
> Sleepable build ID parsing can block in __kernel_read() [1], so the
> stackmap sleepable path must not call it while holding mmap_lock or a
> per-VMA read lock.
>
> The issue and the fix are conceptually similar to a recent procfs
> patch [2].
>
> Resolve each covered VMA with a stable read-side reference, preferring
> lock_vma_under_rcu() and falling back to mmap_read_lock() only long

meganit: falling back to mmap_read_trylock()?

> enough to acquire the VMA read lock. Take a reference to the backing
> file, drop the VMA lock, and then parse the build ID through
> (sleepable) build_id_parse_file().
>
> [1]: https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/
> [2]: https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/
>
> Fixes: 777a8560fd29 ("lib/buildid: use __kernel_read() for sleepable context")
> Assisted-by: Codex:gpt-5.4
> Suggested-by: Puranjay Mohan <puranjay@kernel.org>
> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
> ---
>  kernel/bpf/stackmap.c | 139 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 139 insertions(+)
>
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 4ef0fd06cea5..de3d89e20a1e 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -9,6 +9,7 @@
>  #include <linux/perf_event.h>
>  #include <linux/btf_ids.h>
>  #include <linux/buildid.h>
> +#include <linux/mmap_lock.h>
>  #include "percpu_freelist.h"
>  #include "mmap_unlock_work.h"
>
> @@ -158,6 +159,139 @@ static inline void stack_map_build_id_set_ip(struct bpf_stack_build_id *id)
>  	memset(id->build_id, 0, BUILD_ID_SIZE_MAX);
>  }
>
> +enum stack_map_vma_lock_state {
> +        STACK_MAP_LOCKED_NONE = 0,
> +        STACK_MAP_LOCKED_VMA,
> +        STACK_MAP_LOCKED_MMAP,
> +};
> +
> +struct stack_map_vma_lock {
> +        enum stack_map_vma_lock_state state;
> +        struct vm_area_struct *vma;
> +        struct mm_struct *mm;
> +};
> +
> +static struct vm_area_struct *stack_map_lock_vma(struct stack_map_vma_lock *lock, unsigned long ip)
> +{
> +        struct mm_struct *mm = lock->mm;
> +        struct vm_area_struct *vma;
> +
> +        if (WARN_ON_ONCE(!mm))
> +                return NULL;
> +
> +        vma = lock_vma_under_rcu(mm, ip);
> +        if (vma)
> +                goto vma_locked;
> +
> +        if (!mmap_read_trylock(mm))
> +                return NULL;
> +
> +        vma = vma_lookup(mm, ip);
> +        if (!vma) {
> +                mmap_read_unlock(mm);
> +                return NULL;
> +        }
> +
> +#ifdef CONFIG_PER_VMA_LOCK
> +        if (!vma_start_read_locked(vma)) {
> +                mmap_read_unlock(mm);
> +                return NULL;
> +        }
> +        mmap_read_unlock(mm);
> +#else
> +        lock->state = STACK_MAP_LOCKED_MMAP;
> +        lock->vma = vma;
> +        return vma;
> +#endif
> +
> +vma_locked:
> +        lock->state = STACK_MAP_LOCKED_VMA;
> +        lock->vma = vma;
> +        return vma;
> +}
> +
> +static void stack_map_unlock_vma(struct stack_map_vma_lock *lock)
> +{
> +        struct vm_area_struct *vma = lock->vma;
> +        struct mm_struct *mm = lock->mm;
> +
> +        switch (lock->state) {
> +        case STACK_MAP_LOCKED_VMA:
> +                if (WARN_ON_ONCE(!vma))
> +                        break;
> +                vma_end_read(vma);
> +                break;
> +        case STACK_MAP_LOCKED_MMAP:
> +                if (WARN_ON_ONCE(!mm))
> +                        break;
> +                mmap_read_unlock(mm);
> +                break;
> +        default:
> +                break;
> +        }
> +
> +        lock->state = STACK_MAP_LOCKED_NONE;
> +        lock->vma = NULL;
> +}
> +
> +static void stack_map_get_build_id_offset_sleepable(struct bpf_stack_build_id *id_offs,
> +                                                    u32 trace_nr)
> +{
> +        struct mm_struct *mm = current->mm;
> +        struct stack_map_vma_lock lock = {
> +                .state = STACK_MAP_LOCKED_NONE,
> +                .vma = NULL,
> +                .mm = mm,
> +        };
> +        struct file *file, *prev_file = NULL;
> +        unsigned long vm_pgoff, vm_start;
> +        struct vm_area_struct *vma;
> +        const char *prev_build_id;
> +        u64 ip;
> +
> +        for (u32 i = 0; i < trace_nr; i++) {
> +                ip = READ_ONCE(id_offs[i].ip);

I'm not sure if I understand why READ_ONCE is necessary here.

> +                vma = stack_map_lock_vma(&lock, ip);
> +                if (!vma || !vma->vm_file) {
> +                        stack_map_build_id_set_ip(&id_offs[i]);
> +                        stack_map_unlock_vma(&lock);
> +                        continue;
> +                }
> +
> +                file = vma->vm_file;
> +                vm_pgoff = vma->vm_pgoff;
> +                vm_start = vma->vm_start;
> +
> +                if (file == prev_file) {

What if instead of caching prev_file, we cache vm_start and vm_end, and
if the next IP is in range, reuse the previous build ID. This should
optimize this code further, avoiding locks on the vma used on the
previous iteration.

> +                        memcpy(id_offs[i].build_id, prev_build_id, BUILD_ID_SIZE_MAX);
> +                        stack_map_unlock_vma(&lock);
> +                        goto build_id_valid;
> +                }
> +
> +                file = get_file(file);
> +                stack_map_unlock_vma(&lock);
> +
> +                /* build_id_parse_file() may block on filesystem reads */
> +                if (build_id_parse_file(file, id_offs[i].build_id, NULL)) {
> +                        stack_map_build_id_set_ip(&id_offs[i]);
> +                        fput(file);
> +                        continue;
> +                }
> +
> +                if (prev_file)
> +                        fput(prev_file);
> +                prev_file = file;
> +                prev_build_id = id_offs[i].build_id;
> +
> +build_id_valid:
> +                id_offs[i].offset = (vm_pgoff << PAGE_SHIFT) + ip - vm_start;
> +                id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
> +        }
> +
> +        if (prev_file)
> +                fput(prev_file);
> +}
> +
>  /*
>   * Expects all id_offs[i].ip values to be set to correct initial IPs.
>   * They will be subsequently:
> @@ -178,6 +312,11 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
>  	const char *prev_build_id;
>  	int i;
>
> +	if (may_fault && has_user_ctx) {
> +		stack_map_get_build_id_offset_sleepable(id_offs, trace_nr);
> +		return;
> +	}
> +
>  	/* If the irq_work is in use, fall back to report ips. Same
>  	 * fallback is used for kernel stack (!user) on a stackmap with
>  	 * build_id.

^ permalink raw reply	[flat|nested] 7+ messages in thread
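[Editor's note] The range-caching idea suggested in the review above can be sketched as a standalone userspace model. Everything here is a made-up stand-in (struct names, `build_id_from_cache`, `cache_update` are not from the patch); only the control flow matters: if the next IP falls inside the previously resolved [vm_start, vm_end) range, the cached build ID is reused without taking any VMA or mmap lock.

```c
#include <string.h>

#define BUILD_ID_SIZE_MAX 20

/* Mock of the VMA fields the stackmap path reads. */
struct mock_vma {
	unsigned long vm_start, vm_end, vm_pgoff;
};

/* Per-call cache of the last resolved VMA range and its build ID. */
struct range_cache {
	unsigned long start, end;	/* [start, end) of the last VMA */
	char build_id[BUILD_ID_SIZE_MAX];
	int valid;
};

/* Returns 1 and copies the cached build ID if ip is in the cached
 * range; returns 0 if the caller must lock the VMA, parse the build
 * ID, and refresh the cache. */
static int build_id_from_cache(struct range_cache *c, unsigned long ip,
			       char *out)
{
	if (c->valid && ip >= c->start && ip < c->end) {
		memcpy(out, c->build_id, BUILD_ID_SIZE_MAX);
		return 1;
	}
	return 0;
}

/* Refresh the cache after a successful (slow-path) resolution. */
static void cache_update(struct range_cache *c, const struct mock_vma *vma,
			 const char *build_id)
{
	c->start = vma->vm_start;
	c->end = vma->vm_end;
	memcpy(c->build_id, build_id, BUILD_ID_SIZE_MAX);
	c->valid = 1;
}
```

Compared with the prev_file comparison in the patch, this check runs before any locking, so consecutive frames inside one mapping skip lock_vma_under_rcu() entirely; the trade-off is that the cached offset base (vm_pgoff, vm_start) must be kept alongside the build ID.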
end of thread, other threads:[~2026-04-09 16:34 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-09  1:06 [PATCH bpf v2 0/2] bpf: Implement stack_map_get_build_id_offset_sleepable() Ihor Solodrai
2026-04-09  1:06 ` [PATCH bpf v2 1/2] bpf: Factor out stack_map_build_id_set_ip() in stackmap.c Ihor Solodrai
2026-04-09 14:01   ` Mykyta Yatsenko
2026-04-09  1:06 ` [PATCH bpf v2 2/2] bpf: Avoid faultable build ID reads under mm locks Ihor Solodrai
2026-04-09  4:20   ` Ihor Solodrai
2026-04-09 14:51     ` Mykyta Yatsenko
2026-04-09 16:34   ` Mykyta Yatsenko