BPF List
* [PATCH bpf-next 0/4] Remove KF_SLEEPABLE from arena kfuncs
@ 2025-11-11 16:34 Puranjay Mohan
  2025-11-11 16:34 ` [PATCH bpf-next 1/4] bpf: arena: populate vm_area without allocating memory Puranjay Mohan
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Puranjay Mohan @ 2025-11-11 16:34 UTC
  To: bpf
  Cc: Puranjay Mohan, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, kernel-team

This set allows arena kfuncs to be called from non-sleepable contexts.
It is achieved by the following changes:

The range_tree is now protected by an rqspinlock instead of a mutex.
This change alone is enough to make bpf_arena_reserve_pages() safe in any
context.
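
As an illustration of why callers must handle lock failure once the mutex
is gone, here is a minimal userspace sketch. It is not the kernel code: the
toy_* names, the bitmap standing in for the range_tree, the spin bound, and
the error codes chosen are all assumptions made for the example. The only
property borrowed from the series is that acquiring the lock can fail
instead of blocking.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES  64	/* hypothetical arena size in pages */
#define TOY_MAX_SPINS 1000	/* hypothetical bound before giving up */

struct toy_arena {
	atomic_flag lock;		/* stand-in for the rqspinlock */
	bool used[TOY_NR_PAGES];	/* stand-in for the range_tree */
};

static void toy_arena_init(struct toy_arena *a)
{
	assert(a);
	atomic_flag_clear(&a->lock);
	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		a->used[i] = false;
}

/* Bounded acquire: returns -ETIMEDOUT instead of spinning forever. */
static int toy_lock(struct toy_arena *a)
{
	for (int i = 0; i < TOY_MAX_SPINS; i++)
		if (!atomic_flag_test_and_set_explicit(&a->lock,
						       memory_order_acquire))
			return 0;
	return -ETIMEDOUT;
}

static void toy_unlock(struct toy_arena *a)
{
	atomic_flag_clear_explicit(&a->lock, memory_order_release);
}

/* Reserve [pgoff, pgoff + cnt); never sleeps, may fail on the lock. */
static int toy_reserve_pages(struct toy_arena *a, size_t pgoff, size_t cnt)
{
	int ret;

	if (cnt == 0 || pgoff >= TOY_NR_PAGES || cnt > TOY_NR_PAGES - pgoff)
		return -EINVAL;

	ret = toy_lock(a);
	if (ret)
		return ret;	/* caller sees the failure, nothing blocks */

	for (size_t i = pgoff; i < pgoff + cnt; i++) {
		if (a->used[i]) {
			ret = -EBUSY;
			goto out;
		}
	}
	for (size_t i = pgoff; i < pgoff + cnt; i++)
		a->used[i] = true;
out:
	toy_unlock(a);
	return ret;
}
```

In this model a reservation attempted while the lock is held returns an
error after the spin bound rather than waiting, which is the behaviour a
non-sleepable caller needs.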

bpf_arena_alloc_pages() had four points where it could sleep:

1. Mutex to protect range_tree: now replaced with rqspinlock

2. kvcalloc() for allocations: now replaced with kmalloc_nolock()

3. Allocating pages with bpf_map_alloc_pages(): this already calls
   alloc_pages_nolock() in non-sleepable contexts and therefore is safe.

4. Setting up kernel page tables with vm_area_map_pages():
   vm_area_map_pages() may allocate memory while inserting pages into the
   bpf arena's vm_area. Now, all page table levels except the last are
   populated at arena creation time. When new pages need to be inserted,
   apply_to_page_range() is called again; it only does set_pte_at() for
   those pages and does not allocate memory.

The above four changes make bpf_arena_alloc_pages() any context safe.

bpf_arena_free_pages() has to do the following steps:

1. Update the range_tree
2. vm_area_unmap_pages(): to unmap pages from kernel vm_area
3. flush the tlb: already done by step 2.
4. zap_pages(): to unmap pages from user page tables
5. free pages.

The third patch in this set makes bpf_arena_free_pages() polymorphic using
the specialize_kfunc() mechanism. When called from a sleepable context,
arena_free_pages() remains mostly unchanged except the following:
1. rqspinlock is taken now instead of the mutex for the range tree
2. Instead of using vm_area_unmap_pages() that can free intermediate page
   table levels, apply_to_existing_page_range() with a callback is used
   that only does pte_clear() on the last level and leaves the intermediate
   page table levels intact. This is needed to make sure that
   bpf_arena_alloc_pages() can safely do set_pte_at() without allocating
   intermediate page tables.
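
The interplay between pre-populated page tables and PTE-only clearing can
be modelled in userspace. The sketch below is a toy two-level table, not
the kernel implementation: struct toy_pgtable, TOY_ENTRIES, and the toy_*
helpers are hypothetical names, and calloc() stands in for whatever the
kernel allocates at arena creation time. It only shows that, once the
intermediate level exists and is never torn down, mapping and unmapping a
page needs no allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_ENTRIES 16	/* entries per level in this toy model */

/* Toy two-level table: a directory of leaf tables holding "PTEs". */
struct toy_pgtable {
	uintptr_t *leaf[TOY_ENTRIES];
	size_t nr_allocs;	/* counts allocations made for the table */
};

static void toy_destroy(struct toy_pgtable *pt)
{
	for (size_t i = 0; i < TOY_ENTRIES; i++) {
		free(pt->leaf[i]);
		pt->leaf[i] = NULL;
	}
}

/* Creation time (may sleep): allocate every leaf table up front. */
static int toy_populate(struct toy_pgtable *pt)
{
	pt->nr_allocs = 0;
	for (size_t i = 0; i < TOY_ENTRIES; i++)
		pt->leaf[i] = NULL;

	for (size_t i = 0; i < TOY_ENTRIES; i++) {
		pt->leaf[i] = calloc(TOY_ENTRIES, sizeof(uintptr_t));
		if (!pt->leaf[i]) {
			toy_destroy(pt);
			return -1;
		}
		pt->nr_allocs++;
	}
	return 0;
}

/* Any-context path: only writes the last-level entry, never allocates. */
static int toy_set_pte(struct toy_pgtable *pt, size_t pgoff, uintptr_t page)
{
	size_t dir = pgoff / TOY_ENTRIES;
	size_t idx = pgoff % TOY_ENTRIES;

	assert(page != 0);	/* 0 means "not mapped" in this model */
	if (dir >= TOY_ENTRIES || !pt->leaf[dir])
		return -1;	/* out of range or not pre-populated */
	if (pt->leaf[dir][idx])
		return -2;	/* already mapped */
	pt->leaf[dir][idx] = page;
	return 0;
}

/* Free path: clear the entry but keep the leaf table for later reuse. */
static uintptr_t toy_clear_pte(struct toy_pgtable *pt, size_t pgoff)
{
	size_t dir = pgoff / TOY_ENTRIES;
	size_t idx = pgoff % TOY_ENTRIES;
	uintptr_t old;

	if (dir >= TOY_ENTRIES || !pt->leaf[dir])
		return 0;
	old = pt->leaf[dir][idx];
	pt->leaf[dir][idx] = 0;
	return old;
}
```

If the free path released a leaf table instead, a later toy_set_pte() on
the same range would have to allocate again, which is exactly what the
non-sleepable path cannot do.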

When arena_free_pages() is called from a non-sleepable context, or when it
fails to acquire the rqspinlock in the sleepable case, a lock-less list of
struct arena_free_span is used to queue the uaddr and page cnt.
kmalloc_nolock() is used to allocate this arena_free_span. This allocation
can fail, but that is a trade-off we need to make for frees done from
non-sleepable context.

arena_free_pages() then raises an irq_work whose handler in turn schedules
work that iterates this list and clears PTEs, flushes TLBs, zaps pages, and
frees pages for the queued uaddr and page cnts.
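
The queue-then-drain pattern described above can be sketched in userspace
with C11 atomics. This is only an approximation of the idea, not the
patch: struct toy_free_span and the toy_* functions are hypothetical,
malloc() stands in for kmalloc_nolock(), and the drain function merely
sums the page counts where the real worker would clear PTEs, flush TLBs,
zap pages, and free them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* One deferred free request: a user address and a page count. */
struct toy_free_span {
	struct toy_free_span *next;
	unsigned long uaddr;
	unsigned long page_cnt;
};

struct toy_free_list {
	_Atomic(struct toy_free_span *) first;
};

static void toy_free_list_init(struct toy_free_list *l)
{
	assert(l);
	atomic_init(&l->first, (struct toy_free_span *)NULL);
}

/*
 * Producer side, callable without taking any lock: push one span with a
 * compare-and-swap loop. Returns -1 if the allocation fails, in which
 * case the request is dropped (the trade-off mentioned in the text).
 */
static int toy_defer_free(struct toy_free_list *l, unsigned long uaddr,
			  unsigned long page_cnt)
{
	struct toy_free_span *s = malloc(sizeof(*s));

	if (!s)
		return -1;
	s->uaddr = uaddr;
	s->page_cnt = page_cnt;
	s->next = atomic_load_explicit(&l->first, memory_order_relaxed);
	/* On failure the CAS reloads the current head into s->next. */
	while (!atomic_compare_exchange_weak_explicit(&l->first, &s->next, s,
						      memory_order_release,
						      memory_order_relaxed))
		;
	return 0;
}

/*
 * Consumer side (the deferred worker): detach the whole list in one
 * atomic exchange, then process it privately. Returns the number of
 * pages covered by the drained spans.
 */
static unsigned long toy_drain(struct toy_free_list *l)
{
	struct toy_free_span *s;
	unsigned long total = 0;

	s = atomic_exchange_explicit(&l->first, (struct toy_free_span *)NULL,
				     memory_order_acquire);
	while (s) {
		struct toy_free_span *next = s->next;

		total += s->page_cnt;	/* real code would unmap and free */
		free(s);
		s = next;
	}
	return total;
}
```

Detaching the whole list at once keeps the consumer simple: producers can
keep pushing while the worker walks its private copy.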

apply_range_clear_cb() with apply_to_existing_page_range() is used to
clear PTEs and collect the pages to be freed. The struct llist_node
pcp_llist member of struct page is used to link the collected pages.
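
Collecting pages through a node embedded in the page itself means the
clear path needs no extra memory. The following userspace sketch shows
that shape under stated assumptions: struct toy_page, its free_node
member, toy_apply_range(), and toy_clear_cb() are hypothetical stand-ins
and do not reproduce the kernel's struct page, pcp_llist, or the
apply_to_existing_page_range() signature.

```c
#include <assert.h>
#include <stddef.h>

struct toy_node {
	struct toy_node *next;
};

struct toy_page {
	int id;
	struct toy_node free_node;	/* embedded link, no allocation needed */
};

/* Recover the page from its embedded node (container_of in spirit). */
#define toy_page_of(n) \
	((struct toy_page *)((char *)(n) - offsetof(struct toy_page, free_node)))

struct toy_clear_ctx {
	struct toy_node *head;	/* pages collected so far */
};

/* Per-entry callback: clear the entry and chain the page for freeing. */
static int toy_clear_cb(struct toy_page **pte, void *data)
{
	struct toy_clear_ctx *ctx = data;
	struct toy_page *page = *pte;

	if (!page)
		return 0;	/* nothing mapped here */
	*pte = NULL;
	page->free_node.next = ctx->head;
	ctx->head = &page->free_node;
	return 0;
}

/* Walk existing entries only and call fn on each; never allocates. */
static int toy_apply_range(struct toy_page **ptes, size_t nr,
			   int (*fn)(struct toy_page **, void *), void *data)
{
	assert(ptes && fn);
	for (size_t i = 0; i < nr; i++) {
		int ret = fn(&ptes[i], data);

		if (ret)
			return ret;
	}
	return 0;
}

/* Release everything collected; here "free" just marks id as -1. */
static size_t toy_free_collected(struct toy_clear_ctx *ctx)
{
	struct toy_node *node = ctx->head;
	size_t n = 0;

	while (node) {
		struct toy_node *next = node->next;

		toy_page_of(node)->id = -1;
		n++;
		node = next;
	}
	ctx->head = NULL;
	return n;
}
```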

The arena selftest fails to load on s390x. This is due to an unrelated
bug in the verifier that is exposed by the selftest that I add here. I
have already sent a patch[1] to fix this.


[1] https://lore.kernel.org/all/20251111160949.45623-1-puranjay@kernel.org/

Puranjay Mohan (4):
  bpf: arena: populate vm_area without allocating memory
  bpf: arena: use kmalloc_nolock() in place of kvcalloc()
  bpf: arena: make arena kfuncs any context safe
  selftests: bpf: test non-sleepable arena allocations

 include/linux/bpf.h                           |   2 +
 kernel/bpf/arena.c                            | 290 +++++++++++++++---
 kernel/bpf/verifier.c                         |   5 +
 .../selftests/bpf/prog_tests/arena_list.c     |  20 +-
 .../testing/selftests/bpf/progs/arena_list.c  |  11 +
 .../selftests/bpf/progs/verifier_arena.c      | 185 +++++++++++
 6 files changed, 472 insertions(+), 41 deletions(-)

-- 
2.47.3



end of thread, other threads:[~2025-11-13  4:53 UTC | newest]

Thread overview: 13+ messages
2025-11-11 16:34 [PATCH bpf-next 0/4] Remove KF_SLEEPABLE from arena kfuncs Puranjay Mohan
2025-11-11 16:34 ` [PATCH bpf-next 1/4] bpf: arena: populate vm_area without allocating memory Puranjay Mohan
2025-11-11 17:01   ` bot+bpf-ci
2025-11-13  4:49   ` kernel test robot
2025-11-13  4:51   ` kernel test robot
2025-11-13  4:52   ` kernel test robot
2025-11-11 16:34 ` [PATCH bpf-next 2/4] bpf: arena: use kmalloc_nolock() in place of kvcalloc() Puranjay Mohan
2025-11-11 17:01   ` bot+bpf-ci
2025-11-11 17:47     ` Alexei Starovoitov
2025-11-11 16:34 ` [PATCH bpf-next 3/4] bpf: arena: make arena kfuncs any context safe Puranjay Mohan
2025-11-11 17:01   ` bot+bpf-ci
2025-11-11 17:53     ` Alexei Starovoitov
2025-11-11 16:34 ` [PATCH bpf-next 4/4] selftests: bpf: test non-sleepable arena allocations Puranjay Mohan
