* [PATCH bpf-next 0/2] bpf: range_tree for bpf arena
@ 2024-11-08 2:56 Alexei Starovoitov
2024-11-08 2:56 ` [PATCH bpf-next 1/2] bpf: Introduce range_tree data structure and use it in " Alexei Starovoitov
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Alexei Starovoitov @ 2024-11-08 2:56 UTC (permalink / raw)
To: bpf; +Cc: daniel, andrii, martin.lau, memxor, eddyz87, djwong, kernel-team
From: Alexei Starovoitov <ast@kernel.org>
Introduce range_tree (interval tree plus rbtree) to track
unallocated ranges in bpf arena and replace maple_tree with it.
This is a step towards making bpf_arena_alloc/free_pages non-sleepable.
The previous approach to reuse drm_mm to replace maple_tree reached
dead end, since sizeof(struct drm_mm_node) = 168 and
sizeof(struct maple_node) = 256 while
sizeof(struct range_node) = 64 introduced in this patch.
Not only is it smaller, but the algorithm also splits and merges
adjacent ranges. Ultimate performance doesn't matter.
The main objective of range_tree is to work in contexts
where kmalloc/kfree are not safe. It achieves that via bpf_mem_alloc.
Alexei Starovoitov (2):
bpf: Introduce range_tree data structure and use it in bpf arena
selftests/bpf: Add a test for arena range tree algorithm
kernel/bpf/Makefile | 2 +-
kernel/bpf/arena.c | 34 ++-
kernel/bpf/range_tree.c | 262 ++++++++++++++++++
kernel/bpf/range_tree.h | 21 ++
.../bpf/progs/verifier_arena_large.c | 110 +++++++-
5 files changed, 412 insertions(+), 17 deletions(-)
create mode 100644 kernel/bpf/range_tree.c
create mode 100644 kernel/bpf/range_tree.h
--
2.43.5
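The allocation pattern the cover letter describes (set everything free at init, find+clear to allocate, set to free) can be sketched with a toy userspace model. This is only an illustration: it backs the same five calls named in kernel/bpf/range_tree.h with a flat bitmap and first-fit search, whereas the kernel implementation uses an interval tree plus a size-sorted rbtree with best fit, and MAX_PAGES here is an arbitrary assumption for the demo.

```c
#include <assert.h>
#include <string.h>

/* Toy userspace model of the range_tree API. A bit value of 1 means
 * the page is unallocated, matching the kernel's convention. */
#define MAX_PAGES 1024

struct range_tree { unsigned char bit[MAX_PAGES]; };

static void range_tree_init(struct range_tree *rt)
{
	memset(rt->bit, 0, sizeof(rt->bit));
}

static void range_tree_set(struct range_tree *rt, unsigned start, unsigned len)
{
	for (unsigned i = start; i < start + len; i++)
		rt->bit[i] = 1;
}

static void range_tree_clear(struct range_tree *rt, unsigned start, unsigned len)
{
	for (unsigned i = start; i < start + len; i++)
		rt->bit[i] = 0;
}

/* Return the start of some free range of @len pages, or -1 if none.
 * (The kernel version returns -ENOENT and picks the best-fit range;
 * first fit is enough to show the semantics.) */
static long range_tree_find(struct range_tree *rt, unsigned len)
{
	unsigned run = 0;

	for (unsigned i = 0; i < MAX_PAGES; i++) {
		run = rt->bit[i] ? run + 1 : 0;
		if (run == len)
			return i - len + 1;
	}
	return -1;
}

/* The allocation pattern from the cover letter */
static long arena_alloc(struct range_tree *rt, unsigned len)
{
	long start = range_tree_find(rt, len);

	if (start >= 0)
		range_tree_clear(rt, start, len); /* mark pages allocated */
	return start;
}
```

Freeing is then just `range_tree_set(rt, start, len)`, which in the kernel version also merges with adjacent free ranges.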
^ permalink raw reply [flat|nested] 14+ messages in thread* [PATCH bpf-next 1/2] bpf: Introduce range_tree data structure and use it in bpf arena 2024-11-08 2:56 [PATCH bpf-next 0/2] bpf: range_tree for bpf arena Alexei Starovoitov @ 2024-11-08 2:56 ` Alexei Starovoitov 2024-11-13 16:02 ` Kumar Kartikeya Dwivedi 2025-01-06 16:12 ` Barret Rhoden 2024-11-08 2:56 ` [PATCH bpf-next 2/2] selftests/bpf: Add a test for arena range tree algorithm Alexei Starovoitov ` (2 subsequent siblings) 3 siblings, 2 replies; 14+ messages in thread From: Alexei Starovoitov @ 2024-11-08 2:56 UTC (permalink / raw) To: bpf; +Cc: daniel, andrii, martin.lau, memxor, eddyz87, djwong, kernel-team From: Alexei Starovoitov <ast@kernel.org> Introduce range_tree data structure and use it in bpf arena to track ranges of allocated pages. range_tree is a large bitmap that is implemented as interval tree plus rbtree. The contiguous sequence of bits represents unallocated pages. Signed-off-by: Alexei Starovoitov <ast@kernel.org> --- kernel/bpf/Makefile | 2 +- kernel/bpf/arena.c | 34 +++--- kernel/bpf/range_tree.c | 262 ++++++++++++++++++++++++++++++++++++++++ kernel/bpf/range_tree.h | 21 ++++ 4 files changed, 304 insertions(+), 15 deletions(-) create mode 100644 kernel/bpf/range_tree.c create mode 100644 kernel/bpf/range_tree.h diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index 105328f0b9c0..9762bdddf1de 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -16,7 +16,7 @@ obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o obj-$(CONFIG_BPF_JIT) += trampoline.o obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o ifeq ($(CONFIG_MMU)$(CONFIG_64BIT),yy) -obj-$(CONFIG_BPF_SYSCALL) += arena.o +obj-$(CONFIG_BPF_SYSCALL) += arena.o range_tree.o endif obj-$(CONFIG_BPF_JIT) += dispatcher.o ifeq ($(CONFIG_NET),y) diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c index e52b3ad231b9..3e1dfe349ced 100644 --- a/kernel/bpf/arena.c +++ b/kernel/bpf/arena.c @@ -6,6 +6,7 @@ #include <linux/btf_ids.h> #include 
<linux/vmalloc.h> #include <linux/pagemap.h> +#include "range_tree.h" /* * bpf_arena is a sparsely populated shared memory region between bpf program and @@ -45,7 +46,7 @@ struct bpf_arena { u64 user_vm_start; u64 user_vm_end; struct vm_struct *kern_vm; - struct maple_tree mt; + struct range_tree rt; struct list_head vma_list; struct mutex lock; }; @@ -132,7 +133,8 @@ static struct bpf_map *arena_map_alloc(union bpf_attr *attr) INIT_LIST_HEAD(&arena->vma_list); bpf_map_init_from_attr(&arena->map, attr); - mt_init_flags(&arena->mt, MT_FLAGS_ALLOC_RANGE); + range_tree_init(&arena->rt); + range_tree_set(&arena->rt, 0, attr->max_entries); mutex_init(&arena->lock); return &arena->map; @@ -183,7 +185,7 @@ static void arena_map_free(struct bpf_map *map) apply_to_existing_page_range(&init_mm, bpf_arena_get_kern_vm_start(arena), KERN_VM_SZ - GUARD_SZ, existing_page_cb, NULL); free_vm_area(arena->kern_vm); - mtree_destroy(&arena->mt); + range_tree_destroy(&arena->rt); bpf_map_area_free(arena); } @@ -274,20 +276,20 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf) /* User space requested to segfault when page is not allocated by bpf prog */ return VM_FAULT_SIGSEGV; - ret = mtree_insert(&arena->mt, vmf->pgoff, MT_ENTRY, GFP_KERNEL); + ret = range_tree_clear(&arena->rt, vmf->pgoff, 1); if (ret) return VM_FAULT_SIGSEGV; /* Account into memcg of the process that created bpf_arena */ ret = bpf_map_alloc_pages(map, GFP_KERNEL | __GFP_ZERO, NUMA_NO_NODE, 1, &page); if (ret) { - mtree_erase(&arena->mt, vmf->pgoff); + range_tree_set(&arena->rt, vmf->pgoff, 1); return VM_FAULT_SIGSEGV; } ret = vm_area_map_pages(arena->kern_vm, kaddr, kaddr + PAGE_SIZE, &page); if (ret) { - mtree_erase(&arena->mt, vmf->pgoff); + range_tree_set(&arena->rt, vmf->pgoff, 1); __free_page(page); return VM_FAULT_SIGSEGV; } @@ -444,12 +446,16 @@ static long arena_alloc_pages(struct bpf_arena *arena, long uaddr, long page_cnt guard(mutex)(&arena->lock); - if (uaddr) - ret = mtree_insert_range(&arena->mt, 
pgoff, pgoff + page_cnt - 1, - MT_ENTRY, GFP_KERNEL); - else - ret = mtree_alloc_range(&arena->mt, &pgoff, MT_ENTRY, - page_cnt, 0, page_cnt_max - 1, GFP_KERNEL); + if (uaddr) { + ret = is_range_tree_set(&arena->rt, pgoff, page_cnt); + if (ret) + goto out_free_pages; + ret = range_tree_clear(&arena->rt, pgoff, page_cnt); + } else { + ret = pgoff = range_tree_find(&arena->rt, page_cnt); + if (pgoff >= 0) + ret = range_tree_clear(&arena->rt, pgoff, page_cnt); + } if (ret) goto out_free_pages; @@ -476,7 +482,7 @@ static long arena_alloc_pages(struct bpf_arena *arena, long uaddr, long page_cnt kvfree(pages); return clear_lo32(arena->user_vm_start) + uaddr32; out: - mtree_erase(&arena->mt, pgoff); + range_tree_set(&arena->rt, pgoff, page_cnt); out_free_pages: kvfree(pages); return 0; @@ -516,7 +522,7 @@ static void arena_free_pages(struct bpf_arena *arena, long uaddr, long page_cnt) pgoff = compute_pgoff(arena, uaddr); /* clear range */ - mtree_store_range(&arena->mt, pgoff, pgoff + page_cnt - 1, NULL, GFP_KERNEL); + range_tree_set(&arena->rt, pgoff, page_cnt); if (page_cnt > 1) /* bulk zap if multiple pages being freed */ diff --git a/kernel/bpf/range_tree.c b/kernel/bpf/range_tree.c new file mode 100644 index 000000000000..ca92e1e4441b --- /dev/null +++ b/kernel/bpf/range_tree.c @@ -0,0 +1,262 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ +#include <linux/interval_tree_generic.h> +#include <linux/slab.h> +#include <linux/bpf_mem_alloc.h> +#include <linux/bpf.h> +#include "range_tree.h" + +/* + * struct range_tree is a data structure used to allocate contiguous memory + * ranges in bpf arena. It's a large bitmap. The contiguous sequence of bits is + * represented by struct range_node or 'rn' for short. + * rn->rn_rbnode links it into an interval tree while + * rn->rb_range_size links it into a second rbtree sorted by size of the range. 
+ * __find_range() performs binary search and best fit algorithm to find the + * range less or equal requested size. + * range_tree_clear/set() clears or sets a range of bits in this bitmap. The + * adjacent ranges are merged or split at the same time. + * + * The split/merge logic is based/borrowed from XFS's xbitmap32 added + * in commit 6772fcc8890a ("xfs: convert xbitmap to interval tree"). + * + * The implementation relies on external lock to protect rbtree-s. + * The alloc/free of range_node-s is done via bpf_mem_alloc. + * + * bpf arena is using range_tree to representat unallocated slots. + * At init time: + * range_tree_set(rt, 0, max); + * Then: + * start = range_tree_find(rt, len); + * if (start >= 0) + * range_tree_clear(rt, start, len); + * to find free range and mark slots as allocated and later: + * range_tree_set(rt, start, len); + * to mark as unallocated after use. + */ +struct range_node { + struct rb_node rn_rbnode; + struct rb_node rb_range_size; + u32 rn_start; + u32 rn_last; /* inclusive */ + u32 __rn_subtree_last; +}; + +static struct range_node *rb_to_range_node(struct rb_node *rb) +{ + return rb_entry(rb, struct range_node, rb_range_size); +} + +static u32 rn_size(struct range_node *rn) +{ + return rn->rn_last - rn->rn_start + 1; +} + +/* Find range that fits best to requested size */ +static inline struct range_node *__find_range(struct range_tree *rt, u32 len) +{ + struct rb_node *rb = rt->range_size_root.rb_root.rb_node; + struct range_node *best = NULL; + + while (rb) { + struct range_node *rn = rb_to_range_node(rb); + + if (len <= rn_size(rn)) { + best = rn; + rb = rb->rb_right; + } else { + rb = rb->rb_left; + } + } + + return best; +} + +s64 range_tree_find(struct range_tree *rt, u32 len) +{ + struct range_node *rn; + + rn = __find_range(rt, len); + if (!rn) + return -ENOENT; + return rn->rn_start; +} + +/* Insert the range into rbtree sorted by the range size */ +static inline void __range_size_insert(struct range_node *rn, + 
struct rb_root_cached *root) +{ + struct rb_node **link = &root->rb_root.rb_node, *rb = NULL; + u64 size = rn_size(rn); + bool leftmost = true; + + while (*link) { + rb = *link; + if (size > rn_size(rb_to_range_node(rb))) { + link = &rb->rb_left; + } else { + link = &rb->rb_right; + leftmost = false; + } + } + + rb_link_node(&rn->rb_range_size, rb, link); + rb_insert_color_cached(&rn->rb_range_size, root, leftmost); +} + +#define START(node) ((node)->rn_start) +#define LAST(node) ((node)->rn_last) + +INTERVAL_TREE_DEFINE(struct range_node, rn_rbnode, u32, + __rn_subtree_last, START, LAST, + static inline __maybe_unused, + __range_it) + +static inline __maybe_unused void +range_it_insert(struct range_node *rn, struct range_tree *rt) +{ + __range_size_insert(rn, &rt->range_size_root); + __range_it_insert(rn, &rt->it_root); +} + +static inline __maybe_unused void +range_it_remove(struct range_node *rn, struct range_tree *rt) +{ + rb_erase_cached(&rn->rb_range_size, &rt->range_size_root); + RB_CLEAR_NODE(&rn->rb_range_size); + __range_it_remove(rn, &rt->it_root); +} + +static inline __maybe_unused struct range_node * +range_it_iter_first(struct range_tree *rt, u32 start, u32 last) +{ + return __range_it_iter_first(&rt->it_root, start, last); +} + +/* Clear the range in this range tree */ +int range_tree_clear(struct range_tree *rt, u32 start, u32 len) +{ + u32 last = start + len - 1; + struct range_node *new_rn; + struct range_node *rn; + + while ((rn = range_it_iter_first(rt, start, last))) { + if (rn->rn_start < start && rn->rn_last > last) { + u32 old_last = rn->rn_last; + + /* Overlaps with the entire clearing range */ + range_it_remove(rn, rt); + rn->rn_last = start - 1; + range_it_insert(rn, rt); + + /* Add a range */ + new_rn = bpf_mem_alloc(&bpf_global_ma, sizeof(struct range_node)); + if (!new_rn) + return -ENOMEM; + new_rn->rn_start = last + 1; + new_rn->rn_last = old_last; + range_it_insert(new_rn, rt); + } else if (rn->rn_start < start) { + /* Overlaps with 
the left side of the clearing range */ + range_it_remove(rn, rt); + rn->rn_last = start - 1; + range_it_insert(rn, rt); + } else if (rn->rn_last > last) { + /* Overlaps with the right side of the clearing range */ + range_it_remove(rn, rt); + rn->rn_start = last + 1; + range_it_insert(rn, rt); + break; + } else { + /* in the middle of the clearing range */ + range_it_remove(rn, rt); + bpf_mem_free(&bpf_global_ma, rn); + } + } + return 0; +} + +/* Is the whole range set ? */ +int is_range_tree_set(struct range_tree *rt, u32 start, u32 len) +{ + u32 last = start + len - 1; + struct range_node *left; + + /* Is this whole range set ? */ + left = range_it_iter_first(rt, start, last); + if (left && left->rn_start <= start && left->rn_last >= last) + return 0; + return -ESRCH; +} + +/* Set the range in this range tree */ +int range_tree_set(struct range_tree *rt, u32 start, u32 len) +{ + u32 last = start + len - 1; + struct range_node *right; + struct range_node *left; + int err; + + /* Is this whole range already set ? */ + left = range_it_iter_first(rt, start, last); + if (left && left->rn_start <= start && left->rn_last >= last) + return 0; + + /* Clear out everything in the range we want to set. */ + err = range_tree_clear(rt, start, len); + if (err) + return err; + + /* Do we have a left-adjacent range ? */ + left = range_it_iter_first(rt, start - 1, start - 1); + if (left && left->rn_last + 1 != start) + return -EFAULT; + + /* Do we have a right-adjacent range ? 
*/ + right = range_it_iter_first(rt, last + 1, last + 1); + if (right && right->rn_start != last + 1) + return -EFAULT; + + if (left && right) { + /* Combine left and right adjacent ranges */ + range_it_remove(left, rt); + range_it_remove(right, rt); + left->rn_last = right->rn_last; + range_it_insert(left, rt); + bpf_mem_free(&bpf_global_ma, right); + } else if (left) { + /* Combine with the left range */ + range_it_remove(left, rt); + left->rn_last = last; + range_it_insert(left, rt); + } else if (right) { + /* Combine with the right range */ + range_it_remove(right, rt); + right->rn_start = start; + range_it_insert(right, rt); + } else { + left = bpf_mem_alloc(&bpf_global_ma, sizeof(struct range_node)); + if (!left) + return -ENOMEM; + left->rn_start = start; + left->rn_last = last; + range_it_insert(left, rt); + } + return 0; +} + +void range_tree_destroy(struct range_tree *rt) +{ + struct range_node *rn; + + while ((rn = range_it_iter_first(rt, 0, -1U))) { + range_it_remove(rn, rt); + bpf_mem_free(&bpf_global_ma, rn); + } +} + +void range_tree_init(struct range_tree *rt) +{ + rt->it_root = RB_ROOT_CACHED; + rt->range_size_root = RB_ROOT_CACHED; +} diff --git a/kernel/bpf/range_tree.h b/kernel/bpf/range_tree.h new file mode 100644 index 000000000000..ff0b9110eb71 --- /dev/null +++ b/kernel/bpf/range_tree.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. 
*/ +#ifndef _RANGE_TREE_H +#define _RANGE_TREE_H 1 + +struct range_tree { + /* root of interval tree */ + struct rb_root_cached it_root; + /* root of rbtree of interval sizes */ + struct rb_root_cached range_size_root; +}; + +void range_tree_init(struct range_tree *rt); +void range_tree_destroy(struct range_tree *rt); + +int range_tree_clear(struct range_tree *rt, u32 start, u32 len); +int range_tree_set(struct range_tree *rt, u32 start, u32 len); +int is_range_tree_set(struct range_tree *rt, u32 start, u32 len); +s64 range_tree_find(struct range_tree *rt, u32 len); + +#endif -- 2.43.5 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: Introduce range_tree data structure and use it in bpf arena 2024-11-08 2:56 ` [PATCH bpf-next 1/2] bpf: Introduce range_tree data structure and use it in " Alexei Starovoitov @ 2024-11-13 16:02 ` Kumar Kartikeya Dwivedi 2025-01-06 16:12 ` Barret Rhoden 1 sibling, 0 replies; 14+ messages in thread From: Kumar Kartikeya Dwivedi @ 2024-11-13 16:02 UTC (permalink / raw) To: Alexei Starovoitov Cc: bpf, daniel, andrii, martin.lau, eddyz87, djwong, kernel-team On Fri, 8 Nov 2024 at 03:56, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > From: Alexei Starovoitov <ast@kernel.org> > > Introduce range_tree data structure and use it in bpf arena to track > ranges of allocated pages. range_tree is a large bitmap that is > implemented as interval tree plus rbtree. The contiguous sequence of > bits represents unallocated pages. > > Signed-off-by: Alexei Starovoitov <ast@kernel.org> > --- Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: Introduce range_tree data structure and use it in bpf arena 2024-11-08 2:56 ` [PATCH bpf-next 1/2] bpf: Introduce range_tree data structure and use it in " Alexei Starovoitov 2024-11-13 16:02 ` Kumar Kartikeya Dwivedi @ 2025-01-06 16:12 ` Barret Rhoden 2025-01-06 17:45 ` Alexei Starovoitov 1 sibling, 1 reply; 14+ messages in thread From: Barret Rhoden @ 2025-01-06 16:12 UTC (permalink / raw) To: Alexei Starovoitov Cc: bpf, daniel, andrii, martin.lau, memxor, eddyz87, djwong, kernel-team On 11/7/24 9:56 PM, Alexei Starovoitov wrote: > +/* Clear the range in this range tree */ > +int range_tree_clear(struct range_tree *rt, u32 start, u32 len) > +{ > + u32 last = start + len - 1; > + struct range_node *new_rn; > + struct range_node *rn; > + > + while ((rn = range_it_iter_first(rt, start, last))) { > + if (rn->rn_start < start && rn->rn_last > last) { > + u32 old_last = rn->rn_last; > + > + /* Overlaps with the entire clearing range */ > + range_it_remove(rn, rt); > + rn->rn_last = start - 1; > + range_it_insert(rn, rt); > + > + /* Add a range */ > + new_rn = bpf_mem_alloc(&bpf_global_ma, sizeof(struct range_node)); > + if (!new_rn) > + return -ENOMEM; > + new_rn->rn_start = last + 1; > + new_rn->rn_last = old_last; > + range_it_insert(new_rn, rt); > + } else if (rn->rn_start < start) { > + /* Overlaps with the left side of the clearing range */ > + range_it_remove(rn, rt); > + rn->rn_last = start - 1; > + range_it_insert(rn, rt); > + } else if (rn->rn_last > last) { > + /* Overlaps with the right side of the clearing range */ > + range_it_remove(rn, rt); > + rn->rn_start = last + 1; > + range_it_insert(rn, rt); > + break; ^^^ did you mean to have the break here, but not in the "contains entire range" case? for the arena use case, i think you never have overlapping intervals, so once you hit the last one, you can break. (in both cases). though TBH, i'd just never break in case you ever have intervals that overlap (i.e. 
two intervals containing 'last') - either for arenas or for someone who copies this code for another use of interval trees. barret > + } else { > + /* in the middle of the clearing range */ > + range_it_remove(rn, rt); > + bpf_mem_free(&bpf_global_ma, rn); > + } > + } > + return 0; > +} ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: Introduce range_tree data structure and use it in bpf arena 2025-01-06 16:12 ` Barret Rhoden @ 2025-01-06 17:45 ` Alexei Starovoitov 0 siblings, 0 replies; 14+ messages in thread From: Alexei Starovoitov @ 2025-01-06 17:45 UTC (permalink / raw) To: Barret Rhoden Cc: bpf, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Kumar Kartikeya Dwivedi, Eddy Z, djwong, Kernel Team On Mon, Jan 6, 2025 at 8:12 AM Barret Rhoden <brho@google.com> wrote: > > On 11/7/24 9:56 PM, Alexei Starovoitov wrote: > > +/* Clear the range in this range tree */ > > +int range_tree_clear(struct range_tree *rt, u32 start, u32 len) > > +{ > > + u32 last = start + len - 1; > > + struct range_node *new_rn; > > + struct range_node *rn; > > + > > + while ((rn = range_it_iter_first(rt, start, last))) { > > + if (rn->rn_start < start && rn->rn_last > last) { > > + u32 old_last = rn->rn_last; > > + > > + /* Overlaps with the entire clearing range */ > > + range_it_remove(rn, rt); > > + rn->rn_last = start - 1; > > + range_it_insert(rn, rt); > > + > > + /* Add a range */ > > + new_rn = bpf_mem_alloc(&bpf_global_ma, sizeof(struct range_node)); > > + if (!new_rn) > > + return -ENOMEM; > > + new_rn->rn_start = last + 1; > > + new_rn->rn_last = old_last; > > + range_it_insert(new_rn, rt); > > + } else if (rn->rn_start < start) { > > + /* Overlaps with the left side of the clearing range */ > > + range_it_remove(rn, rt); > > + rn->rn_last = start - 1; > > + range_it_insert(rn, rt); > > + } else if (rn->rn_last > last) { > > + /* Overlaps with the right side of the clearing range */ > > + range_it_remove(rn, rt); > > + rn->rn_start = last + 1; > > + range_it_insert(rn, rt); > > + break; > ^^^ > did you mean to have the break here, but not in the "contains entire > range" case? for the arena use case, i think you never have overlapping > intervals, so once you hit the last one, you can break. (in both > cases). 
though TBH, i'd just never break in case you ever have > intervals that overlap (i.e. two intervals containing 'last') - either > for arenas or for someone who copies this code for another use of > interval trees. I copy pasted that 'break' from the original Darrick's commit 6772fcc8890a ("xfs: convert xbitmap to interval tree") This asymmetrical loop termination bothered me as well, but as far as I understood the code it's correct, so I kept it as-is. In bpf arena we won't have overlaps and the way the range tree is used with preceding is_range_tree_set() or range_tree_find() even the 'while' part is not necessary, but somebody might copy paste the code, as you said. This particular 'break' is more of the optimization, I think, since next call to range_it_iter_first() is supposed to return NULL anyway. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH bpf-next 2/2] selftests/bpf: Add a test for arena range tree algorithm 2024-11-08 2:56 [PATCH bpf-next 0/2] bpf: range_tree for bpf arena Alexei Starovoitov 2024-11-08 2:56 ` [PATCH bpf-next 1/2] bpf: Introduce range_tree data structure and use it in " Alexei Starovoitov @ 2024-11-08 2:56 ` Alexei Starovoitov 2024-11-13 16:04 ` Kumar Kartikeya Dwivedi 2024-11-15 12:20 ` Jiri Olsa 2024-11-13 21:59 ` [PATCH bpf-next 0/2] bpf: range_tree for bpf arena Andrii Nakryiko 2024-11-13 22:10 ` patchwork-bot+netdevbpf 3 siblings, 2 replies; 14+ messages in thread From: Alexei Starovoitov @ 2024-11-08 2:56 UTC (permalink / raw) To: bpf; +Cc: daniel, andrii, martin.lau, memxor, eddyz87, djwong, kernel-team From: Alexei Starovoitov <ast@kernel.org> Add a test that verifies specific behavior of arena range tree algorithm and just existing bif_alloc1 test due to use of global data in arena. Signed-off-by: Alexei Starovoitov <ast@kernel.org> --- .../bpf/progs/verifier_arena_large.c | 110 +++++++++++++++++- 1 file changed, 108 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/progs/verifier_arena_large.c b/tools/testing/selftests/bpf/progs/verifier_arena_large.c index 6065f862d964..8a9af79db884 100644 --- a/tools/testing/selftests/bpf/progs/verifier_arena_large.c +++ b/tools/testing/selftests/bpf/progs/verifier_arena_large.c @@ -29,12 +29,12 @@ int big_alloc1(void *ctx) if (!page1) return 1; *page1 = 1; - page2 = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE, + page2 = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE * 2, 1, NUMA_NO_NODE, 0); if (!page2) return 2; *page2 = 2; - no_page = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE, + no_page = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE, 1, NUMA_NO_NODE, 0); if (no_page) return 3; @@ -66,4 +66,110 @@ int big_alloc1(void *ctx) #endif return 0; } + +#if defined(__BPF_FEATURE_ADDR_SPACE_CAST) +#define PAGE_CNT 100 +__u8 __arena * __arena page[PAGE_CNT]; /* 
occupies the first page */ +__u8 __arena *base; + +/* + * Check that arena's range_tree algorithm allocates pages sequentially + * on the first pass and then fills in all gaps on the second pass. + */ +__noinline int alloc_pages(int page_cnt, int pages_atonce, bool first_pass, + int max_idx, int step) +{ + __u8 __arena *pg; + int i, pg_idx; + + for (i = 0; i < page_cnt; i++) { + pg = bpf_arena_alloc_pages(&arena, NULL, pages_atonce, + NUMA_NO_NODE, 0); + if (!pg) + return step; + pg_idx = (pg - base) / PAGE_SIZE; + if (first_pass) { + /* Pages must be allocated sequentially */ + if (pg_idx != i) + return step + 100; + } else { + /* Allocator must fill into gaps */ + if (pg_idx >= max_idx || (pg_idx & 1)) + return step + 200; + } + *pg = pg_idx; + page[pg_idx] = pg; + cond_break; + } + return 0; +} + +SEC("syscall") +__success __retval(0) +int big_alloc2(void *ctx) +{ + __u8 __arena *pg; + int i, err; + + base = bpf_arena_alloc_pages(&arena, NULL, 1, NUMA_NO_NODE, 0); + if (!base) + return 1; + bpf_arena_free_pages(&arena, (void __arena *)base, 1); + + err = alloc_pages(PAGE_CNT, 1, true, PAGE_CNT, 2); + if (err) + return err; + + /* Clear all even pages */ + for (i = 0; i < PAGE_CNT; i += 2) { + pg = page[i]; + if (*pg != i) + return 3; + bpf_arena_free_pages(&arena, (void __arena *)pg, 1); + page[i] = NULL; + cond_break; + } + + /* Allocate into freed gaps */ + err = alloc_pages(PAGE_CNT / 2, 1, false, PAGE_CNT, 4); + if (err) + return err; + + /* Free pairs of pages */ + for (i = 0; i < PAGE_CNT; i += 4) { + pg = page[i]; + if (*pg != i) + return 5; + bpf_arena_free_pages(&arena, (void __arena *)pg, 2); + page[i] = NULL; + page[i + 1] = NULL; + cond_break; + } + + /* Allocate 2 pages at a time into freed gaps */ + err = alloc_pages(PAGE_CNT / 4, 2, false, PAGE_CNT, 6); + if (err) + return err; + + /* Check pages without freeing */ + for (i = 0; i < PAGE_CNT; i += 2) { + pg = page[i]; + if (*pg != i) + return 7; + cond_break; + } + + pg = 
bpf_arena_alloc_pages(&arena, NULL, 1, NUMA_NO_NODE, 0); + + if (!pg) + return 8; + /* + * The first PAGE_CNT pages are occupied. The new page + * must be above. + */ + if ((pg - base) / PAGE_SIZE < PAGE_CNT) + return 9; + return 0; +} +#endif char _license[] SEC("license") = "GPL"; -- 2.43.5 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 2/2] selftests/bpf: Add a test for arena range tree algorithm 2024-11-08 2:56 ` [PATCH bpf-next 2/2] selftests/bpf: Add a test for arena range tree algorithm Alexei Starovoitov @ 2024-11-13 16:04 ` Kumar Kartikeya Dwivedi 2024-11-15 12:20 ` Jiri Olsa 1 sibling, 0 replies; 14+ messages in thread From: Kumar Kartikeya Dwivedi @ 2024-11-13 16:04 UTC (permalink / raw) To: Alexei Starovoitov Cc: bpf, daniel, andrii, martin.lau, eddyz87, djwong, kernel-team On Fri, 8 Nov 2024 at 03:56, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > From: Alexei Starovoitov <ast@kernel.org> > > Add a test that verifies specific behavior of arena range tree > algorithm and just existing bif_alloc1 test due to use > of global data in arena. > > Signed-off-by: Alexei Starovoitov <ast@kernel.org> > --- Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 2/2] selftests/bpf: Add a test for arena range tree algorithm 2024-11-08 2:56 ` [PATCH bpf-next 2/2] selftests/bpf: Add a test for arena range tree algorithm Alexei Starovoitov 2024-11-13 16:04 ` Kumar Kartikeya Dwivedi @ 2024-11-15 12:20 ` Jiri Olsa 2024-11-15 16:30 ` Yonghong Song 1 sibling, 1 reply; 14+ messages in thread From: Jiri Olsa @ 2024-11-15 12:20 UTC (permalink / raw) To: Alexei Starovoitov Cc: bpf, daniel, andrii, martin.lau, memxor, eddyz87, djwong, kernel-team On Thu, Nov 07, 2024 at 06:56:16PM -0800, Alexei Starovoitov wrote: > From: Alexei Starovoitov <ast@kernel.org> > > Add a test that verifies specific behavior of arena range tree > algorithm and just existing bif_alloc1 test due to use > of global data in arena. > > Signed-off-by: Alexei Starovoitov <ast@kernel.org> > --- > .../bpf/progs/verifier_arena_large.c | 110 +++++++++++++++++- > 1 file changed, 108 insertions(+), 2 deletions(-) > > diff --git a/tools/testing/selftests/bpf/progs/verifier_arena_large.c b/tools/testing/selftests/bpf/progs/verifier_arena_large.c > index 6065f862d964..8a9af79db884 100644 > --- a/tools/testing/selftests/bpf/progs/verifier_arena_large.c > +++ b/tools/testing/selftests/bpf/progs/verifier_arena_large.c > @@ -29,12 +29,12 @@ int big_alloc1(void *ctx) > if (!page1) > return 1; > *page1 = 1; > - page2 = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE, > + page2 = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE * 2, > 1, NUMA_NO_NODE, 0); > if (!page2) > return 2; > *page2 = 2; > - no_page = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE, > + no_page = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE, > 1, NUMA_NO_NODE, 0); > if (no_page) > return 3; > @@ -66,4 +66,110 @@ int big_alloc1(void *ctx) > #endif > return 0; > } > + > +#if defined(__BPF_FEATURE_ADDR_SPACE_CAST) > +#define PAGE_CNT 100 > +__u8 __arena * __arena page[PAGE_CNT]; /* occupies the first page */ > +__u8 __arena *base; > + > +/* > + * 
Check that arena's range_tree algorithm allocates pages sequentially > + * on the first pass and then fills in all gaps on the second pass. > + */ > +__noinline int alloc_pages(int page_cnt, int pages_atonce, bool first_pass, > + int max_idx, int step) > +{ > + __u8 __arena *pg; > + int i, pg_idx; > + > + for (i = 0; i < page_cnt; i++) { > + pg = bpf_arena_alloc_pages(&arena, NULL, pages_atonce, > + NUMA_NO_NODE, 0); > + if (!pg) > + return step; > + pg_idx = (pg - base) / PAGE_SIZE; hi, I'm getting compile error below with clang 20.0.0: CLNG-BPF [test_progs] verifier_arena_large.bpf.o progs/verifier_arena_large.c:90:24: error: unsupported signed division, please convert to unsigned div/mod. 90 | pg_idx = (pg - base) / PAGE_SIZE; should we just convert it to unsigned div like below? also I saw recent llvm change [1] that might help, I'll give it a try jirka [1] 38a8000f30aa [BPF] Use mul for certain div/mod operations (#110712) --- diff --git a/tools/testing/selftests/bpf/progs/verifier_arena_large.c b/tools/testing/selftests/bpf/progs/verifier_arena_large.c index 8a9af79db884..e743d008697e 100644 --- a/tools/testing/selftests/bpf/progs/verifier_arena_large.c +++ b/tools/testing/selftests/bpf/progs/verifier_arena_large.c @@ -87,7 +87,7 @@ __noinline int alloc_pages(int page_cnt, int pages_atonce, bool first_pass, NUMA_NO_NODE, 0); if (!pg) return step; - pg_idx = (pg - base) / PAGE_SIZE; + pg_idx = (unsigned int) (pg - base) / PAGE_SIZE; if (first_pass) { /* Pages must be allocated sequentially */ if (pg_idx != i) ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 2/2] selftests/bpf: Add a test for arena range tree algorithm 2024-11-15 12:20 ` Jiri Olsa @ 2024-11-15 16:30 ` Yonghong Song 2024-11-16 19:00 ` Alexei Starovoitov 0 siblings, 1 reply; 14+ messages in thread From: Yonghong Song @ 2024-11-15 16:30 UTC (permalink / raw) To: Jiri Olsa, Alexei Starovoitov Cc: bpf, daniel, andrii, martin.lau, memxor, eddyz87, djwong, kernel-team On 11/15/24 4:20 AM, Jiri Olsa wrote: > On Thu, Nov 07, 2024 at 06:56:16PM -0800, Alexei Starovoitov wrote: >> From: Alexei Starovoitov <ast@kernel.org> >> >> Add a test that verifies specific behavior of arena range tree >> algorithm and just existing bif_alloc1 test due to use >> of global data in arena. >> >> Signed-off-by: Alexei Starovoitov <ast@kernel.org> >> --- >> .../bpf/progs/verifier_arena_large.c | 110 +++++++++++++++++- >> 1 file changed, 108 insertions(+), 2 deletions(-) >> >> diff --git a/tools/testing/selftests/bpf/progs/verifier_arena_large.c b/tools/testing/selftests/bpf/progs/verifier_arena_large.c >> index 6065f862d964..8a9af79db884 100644 >> --- a/tools/testing/selftests/bpf/progs/verifier_arena_large.c >> +++ b/tools/testing/selftests/bpf/progs/verifier_arena_large.c >> @@ -29,12 +29,12 @@ int big_alloc1(void *ctx) >> if (!page1) >> return 1; >> *page1 = 1; >> - page2 = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE, >> + page2 = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE * 2, >> 1, NUMA_NO_NODE, 0); >> if (!page2) >> return 2; >> *page2 = 2; >> - no_page = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE, >> + no_page = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE, >> 1, NUMA_NO_NODE, 0); >> if (no_page) >> return 3; >> @@ -66,4 +66,110 @@ int big_alloc1(void *ctx) >> #endif >> return 0; >> } >> + >> +#if defined(__BPF_FEATURE_ADDR_SPACE_CAST) >> +#define PAGE_CNT 100 >> +__u8 __arena * __arena page[PAGE_CNT]; /* occupies the first page */ >> +__u8 __arena *base; >> + >> +/* >> + * Check that arena's 
range_tree algorithm allocates pages sequentially >> + * on the first pass and then fills in all gaps on the second pass. >> + */ >> +__noinline int alloc_pages(int page_cnt, int pages_atonce, bool first_pass, >> + int max_idx, int step) >> +{ >> + __u8 __arena *pg; >> + int i, pg_idx; >> + >> + for (i = 0; i < page_cnt; i++) { >> + pg = bpf_arena_alloc_pages(&arena, NULL, pages_atonce, >> + NUMA_NO_NODE, 0); >> + if (!pg) >> + return step; >> + pg_idx = (pg - base) / PAGE_SIZE; > hi, > I'm getting compile error below with clang 20.0.0: > > CLNG-BPF [test_progs] verifier_arena_large.bpf.o > progs/verifier_arena_large.c:90:24: error: unsupported signed division, please convert to unsigned div/mod. > 90 | pg_idx = (pg - base) / PAGE_SIZE; > > should we just convert it to unsigned div like below? > > also I saw recent llvm change [1] that might help, I'll give it a try I am using latest clang 20 and compilation is successful due to the llvm change [1]. > > jirka > > > [1] 38a8000f30aa [BPF] Use mul for certain div/mod operations (#110712) > --- > diff --git a/tools/testing/selftests/bpf/progs/verifier_arena_large.c b/tools/testing/selftests/bpf/progs/verifier_arena_large.c > index 8a9af79db884..e743d008697e 100644 > --- a/tools/testing/selftests/bpf/progs/verifier_arena_large.c > +++ b/tools/testing/selftests/bpf/progs/verifier_arena_large.c > @@ -87,7 +87,7 @@ __noinline int alloc_pages(int page_cnt, int pages_atonce, bool first_pass, > NUMA_NO_NODE, 0); > if (!pg) > return step; > - pg_idx = (pg - base) / PAGE_SIZE; > + pg_idx = (unsigned int) (pg - base) / PAGE_SIZE; > if (first_pass) { > /* Pages must be allocated sequentially */ > if (pg_idx != i) I think this patch is still good. Compiling the current verifier_arena_large.c will be okay for llvm <= 18 and >= 20. But llvm19 will have compilation failure as you mentioned in the above. So once bpf ci upgrades compiler to llvm19 we will see the above compilation failure. Please verifify it as well. 
If this is the case on your side, please submit a patch. Thanks!
* Re: [PATCH bpf-next 2/2] selftests/bpf: Add a test for arena range tree algorithm 2024-11-15 16:30 ` Yonghong Song @ 2024-11-16 19:00 ` Alexei Starovoitov 2024-11-16 20:35 ` Jiri Olsa 0 siblings, 1 reply; 14+ messages in thread From: Alexei Starovoitov @ 2024-11-16 19:00 UTC (permalink / raw) To: Yonghong Song Cc: Jiri Olsa, bpf, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Kumar Kartikeya Dwivedi, Eddy Z, djwong, Kernel Team On Fri, Nov 15, 2024 at 8:30 AM Yonghong Song <yonghong.song@linux.dev> wrote: > > > > > On 11/15/24 4:20 AM, Jiri Olsa wrote: > > On Thu, Nov 07, 2024 at 06:56:16PM -0800, Alexei Starovoitov wrote: > >> From: Alexei Starovoitov <ast@kernel.org> > >> > >> Add a test that verifies specific behavior of arena range tree > >> algorithm and just existing bif_alloc1 test due to use > >> of global data in arena. > >> > >> Signed-off-by: Alexei Starovoitov <ast@kernel.org> > >> --- > >> .../bpf/progs/verifier_arena_large.c | 110 +++++++++++++++++- > >> 1 file changed, 108 insertions(+), 2 deletions(-) > >> > >> diff --git a/tools/testing/selftests/bpf/progs/verifier_arena_large.c b/tools/testing/selftests/bpf/progs/verifier_arena_large.c > >> index 6065f862d964..8a9af79db884 100644 > >> --- a/tools/testing/selftests/bpf/progs/verifier_arena_large.c > >> +++ b/tools/testing/selftests/bpf/progs/verifier_arena_large.c > >> @@ -29,12 +29,12 @@ int big_alloc1(void *ctx) > >> if (!page1) > >> return 1; > >> *page1 = 1; > >> - page2 = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE, > >> + page2 = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE * 2, > >> 1, NUMA_NO_NODE, 0); > >> if (!page2) > >> return 2; > >> *page2 = 2; > >> - no_page = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE, > >> + no_page = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE, > >> 1, NUMA_NO_NODE, 0); > >> if (no_page) > >> return 3; > >> @@ -66,4 +66,110 @@ int big_alloc1(void *ctx) > >> #endif > >> return 0; > >> } > >> + > >> +#if 
defined(__BPF_FEATURE_ADDR_SPACE_CAST) > >> +#define PAGE_CNT 100 > >> +__u8 __arena * __arena page[PAGE_CNT]; /* occupies the first page */ > >> +__u8 __arena *base; > >> + > >> +/* > >> + * Check that arena's range_tree algorithm allocates pages sequentially > >> + * on the first pass and then fills in all gaps on the second pass. > >> + */ > >> +__noinline int alloc_pages(int page_cnt, int pages_atonce, bool first_pass, > >> + int max_idx, int step) > >> +{ > >> + __u8 __arena *pg; > >> + int i, pg_idx; > >> + > >> + for (i = 0; i < page_cnt; i++) { > >> + pg = bpf_arena_alloc_pages(&arena, NULL, pages_atonce, > >> + NUMA_NO_NODE, 0); > >> + if (!pg) > >> + return step; > >> + pg_idx = (pg - base) / PAGE_SIZE; > > hi, > > I'm getting compile error below with clang 20.0.0: > > > > CLNG-BPF [test_progs] verifier_arena_large.bpf.o > > progs/verifier_arena_large.c:90:24: error: unsupported signed division, please convert to unsigned div/mod. > > 90 | pg_idx = (pg - base) / PAGE_SIZE; > > > > should we just convert it to unsigned div like below? > > > > also I saw recent llvm change [1] that might help, I'll give it a try > > I am using latest clang 20 and compilation is successful due to the llvm change [1]. 
> > > > > jirka > > > > > > [1] 38a8000f30aa [BPF] Use mul for certain div/mod operations (#110712) > > --- > > diff --git a/tools/testing/selftests/bpf/progs/verifier_arena_large.c b/tools/testing/selftests/bpf/progs/verifier_arena_large.c > > index 8a9af79db884..e743d008697e 100644 > > --- a/tools/testing/selftests/bpf/progs/verifier_arena_large.c > > +++ b/tools/testing/selftests/bpf/progs/verifier_arena_large.c > > @@ -87,7 +87,7 @@ __noinline int alloc_pages(int page_cnt, int pages_atonce, bool first_pass, > > NUMA_NO_NODE, 0); > > if (!pg) > > return step; > > - pg_idx = (pg - base) / PAGE_SIZE; > > + pg_idx = (unsigned int) (pg - base) / PAGE_SIZE; > > if (first_pass) { > > /* Pages must be allocated sequentially */ > > if (pg_idx != i) > > I think this patch is still good. > Compiling the current verifier_arena_large.c will be okay for llvm <= 18 and >= 20. > But llvm19 will have compilation failure as you mentioned in the above. > > So once bpf ci upgrades compiler to llvm19 we will see the above compilation failure. > > Please verifify it as well. If this is the case in your side, please submit a patch. The merge window is about to open, so I pushed the fix myself. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 2/2] selftests/bpf: Add a test for arena range tree algorithm 2024-11-16 19:00 ` Alexei Starovoitov @ 2024-11-16 20:35 ` Jiri Olsa 0 siblings, 0 replies; 14+ messages in thread From: Jiri Olsa @ 2024-11-16 20:35 UTC (permalink / raw) To: Alexei Starovoitov Cc: Yonghong Song, Jiri Olsa, bpf, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Kumar Kartikeya Dwivedi, Eddy Z, djwong, Kernel Team On Sat, Nov 16, 2024 at 11:00:21AM -0800, Alexei Starovoitov wrote: > On Fri, Nov 15, 2024 at 8:30 AM Yonghong Song <yonghong.song@linux.dev> wrote: > > > > > > > > > > On 11/15/24 4:20 AM, Jiri Olsa wrote: > > > On Thu, Nov 07, 2024 at 06:56:16PM -0800, Alexei Starovoitov wrote: > > >> From: Alexei Starovoitov <ast@kernel.org> > > >> > > >> Add a test that verifies specific behavior of arena range tree > > >> algorithm and just existing bif_alloc1 test due to use > > >> of global data in arena. > > >> > > >> Signed-off-by: Alexei Starovoitov <ast@kernel.org> > > >> --- > > >> .../bpf/progs/verifier_arena_large.c | 110 +++++++++++++++++- > > >> 1 file changed, 108 insertions(+), 2 deletions(-) > > >> > > >> diff --git a/tools/testing/selftests/bpf/progs/verifier_arena_large.c b/tools/testing/selftests/bpf/progs/verifier_arena_large.c > > >> index 6065f862d964..8a9af79db884 100644 > > >> --- a/tools/testing/selftests/bpf/progs/verifier_arena_large.c > > >> +++ b/tools/testing/selftests/bpf/progs/verifier_arena_large.c > > >> @@ -29,12 +29,12 @@ int big_alloc1(void *ctx) > > >> if (!page1) > > >> return 1; > > >> *page1 = 1; > > >> - page2 = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE, > > >> + page2 = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE * 2, > > >> 1, NUMA_NO_NODE, 0); > > >> if (!page2) > > >> return 2; > > >> *page2 = 2; > > >> - no_page = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE, > > >> + no_page = bpf_arena_alloc_pages(&arena, base + ARENA_SIZE - PAGE_SIZE, > > >> 1, NUMA_NO_NODE, 0); > > >> if (no_page) 
> > >> return 3; > > >> @@ -66,4 +66,110 @@ int big_alloc1(void *ctx) > > >> #endif > > >> return 0; > > >> } > > >> + > > >> +#if defined(__BPF_FEATURE_ADDR_SPACE_CAST) > > >> +#define PAGE_CNT 100 > > >> +__u8 __arena * __arena page[PAGE_CNT]; /* occupies the first page */ > > >> +__u8 __arena *base; > > >> + > > >> +/* > > >> + * Check that arena's range_tree algorithm allocates pages sequentially > > >> + * on the first pass and then fills in all gaps on the second pass. > > >> + */ > > >> +__noinline int alloc_pages(int page_cnt, int pages_atonce, bool first_pass, > > >> + int max_idx, int step) > > >> +{ > > >> + __u8 __arena *pg; > > >> + int i, pg_idx; > > >> + > > >> + for (i = 0; i < page_cnt; i++) { > > >> + pg = bpf_arena_alloc_pages(&arena, NULL, pages_atonce, > > >> + NUMA_NO_NODE, 0); > > >> + if (!pg) > > >> + return step; > > >> + pg_idx = (pg - base) / PAGE_SIZE; > > > hi, > > > I'm getting compile error below with clang 20.0.0: > > > > > > CLNG-BPF [test_progs] verifier_arena_large.bpf.o > > > progs/verifier_arena_large.c:90:24: error: unsupported signed division, please convert to unsigned div/mod. > > > 90 | pg_idx = (pg - base) / PAGE_SIZE; > > > > > > should we just convert it to unsigned div like below? > > > > > > also I saw recent llvm change [1] that might help, I'll give it a try > > > > I am using latest clang 20 and compilation is successful due to the llvm change [1]. 
> > > > > > > > jirka > > > > > > > > > [1] 38a8000f30aa [BPF] Use mul for certain div/mod operations (#110712) > > > --- > > > diff --git a/tools/testing/selftests/bpf/progs/verifier_arena_large.c b/tools/testing/selftests/bpf/progs/verifier_arena_large.c > > > index 8a9af79db884..e743d008697e 100644 > > > --- a/tools/testing/selftests/bpf/progs/verifier_arena_large.c > > > +++ b/tools/testing/selftests/bpf/progs/verifier_arena_large.c > > > @@ -87,7 +87,7 @@ __noinline int alloc_pages(int page_cnt, int pages_atonce, bool first_pass, > > > NUMA_NO_NODE, 0); > > > if (!pg) > > > return step; > > > - pg_idx = (pg - base) / PAGE_SIZE; > > > + pg_idx = (unsigned int) (pg - base) / PAGE_SIZE; > > > if (first_pass) { > > > /* Pages must be allocated sequentially */ > > > if (pg_idx != i) > > > > I think this patch is still good. > > Compiling the current verifier_arena_large.c will be okay for llvm <= 18 and >= 20. > > But llvm19 will have compilation failure as you mentioned in the above. > > > > So once bpf ci upgrades compiler to llvm19 we will see the above compilation failure. > > > > Please verifify it as well. If this is the case in your side, please submit a patch. > > The merge window is about to open, so I pushed the fix myself. thanks, also the [1] llvm change fixes the build for me jirka ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 0/2] bpf: range_tree for bpf arena 2024-11-08 2:56 [PATCH bpf-next 0/2] bpf: range_tree for bpf arena Alexei Starovoitov 2024-11-08 2:56 ` [PATCH bpf-next 1/2] bpf: Introduce range_tree data structure and use it in " Alexei Starovoitov 2024-11-08 2:56 ` [PATCH bpf-next 2/2] selftests/bpf: Add a test for arena range tree algorithm Alexei Starovoitov @ 2024-11-13 21:59 ` Andrii Nakryiko 2024-11-14 0:48 ` Alexei Starovoitov 2024-11-13 22:10 ` patchwork-bot+netdevbpf 3 siblings, 1 reply; 14+ messages in thread From: Andrii Nakryiko @ 2024-11-13 21:59 UTC (permalink / raw) To: Alexei Starovoitov Cc: bpf, daniel, andrii, martin.lau, memxor, eddyz87, djwong, kernel-team On Thu, Nov 7, 2024 at 6:56 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > From: Alexei Starovoitov <ast@kernel.org> > > Introduce range_tree (internval tree plus rbtree) to track > unallocated ranges in bpf arena and replace maple_tree with it. > This is a step towards making bpf_arena|free_alloc_pages non-sleepable. > The previous approach to reuse drm_mm to replace maple_tree reached > dead end, since sizeof(struct drm_mm_node) = 168 and > sizeof(struct maple_node) = 256 while > sizeof(struct range_node) = 64 introduced in this patch. > Not only it's smaller, but the algorithm splits and merges > adjacent ranges. Ultimate performance doesn't matter. > The main objective of range_tree is to work in context > where kmalloc/kfree are not safe. It achieves that via bpf_mem_alloc. 
> > Alexei Starovoitov (2): > bpf: Introduce range_tree data structure and use it in bpf arena > selftests/bpf: Add a test for arena range tree algorithm > > kernel/bpf/Makefile | 2 +- > kernel/bpf/arena.c | 34 ++- > kernel/bpf/range_tree.c | 262 ++++++++++++++++++ > kernel/bpf/range_tree.h | 21 ++ > .../bpf/progs/verifier_arena_large.c | 110 +++++++- > 5 files changed, 412 insertions(+), 17 deletions(-) > create mode 100644 kernel/bpf/range_tree.c > create mode 100644 kernel/bpf/range_tree.h > > -- > 2.43.5 > I skimmed through just to familiarize myself, superficially the range addition logic seems correct. I'll just bikeshed a bit, take it for what it's worth. I found some naming choices a bit weird. rn_start and rn_last, just doesn't match in my head. If it's "start", then it's "end" (or "finish", but it's weird for this case). If it's "last", then it should have "first". "start"/"end" sounds best in my head, fwiw. As for an API, is_range_tree_set() caught my eye as well. I'd expect to see a consistent "range_tree_" prefix for the internal API for this data structure. So "range_tree_is_set()" was what I expected. But all minor, feel free to follow up if you agree. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 0/2] bpf: range_tree for bpf arena 2024-11-13 21:59 ` [PATCH bpf-next 0/2] bpf: range_tree for bpf arena Andrii Nakryiko @ 2024-11-14 0:48 ` Alexei Starovoitov 0 siblings, 0 replies; 14+ messages in thread From: Alexei Starovoitov @ 2024-11-14 0:48 UTC (permalink / raw) To: Andrii Nakryiko Cc: bpf, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Kumar Kartikeya Dwivedi, Eddy Z, djwong, Kernel Team On Wed, Nov 13, 2024 at 1:59 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Thu, Nov 7, 2024 at 6:56 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > From: Alexei Starovoitov <ast@kernel.org> > > > > Introduce range_tree (internval tree plus rbtree) to track > > unallocated ranges in bpf arena and replace maple_tree with it. > > This is a step towards making bpf_arena|free_alloc_pages non-sleepable. > > The previous approach to reuse drm_mm to replace maple_tree reached > > dead end, since sizeof(struct drm_mm_node) = 168 and > > sizeof(struct maple_node) = 256 while > > sizeof(struct range_node) = 64 introduced in this patch. > > Not only it's smaller, but the algorithm splits and merges > > adjacent ranges. Ultimate performance doesn't matter. > > The main objective of range_tree is to work in context > > where kmalloc/kfree are not safe. It achieves that via bpf_mem_alloc. 
> > > > Alexei Starovoitov (2): > > bpf: Introduce range_tree data structure and use it in bpf arena > > selftests/bpf: Add a test for arena range tree algorithm > > > > kernel/bpf/Makefile | 2 +- > > kernel/bpf/arena.c | 34 ++- > > kernel/bpf/range_tree.c | 262 ++++++++++++++++++ > > kernel/bpf/range_tree.h | 21 ++ > > .../bpf/progs/verifier_arena_large.c | 110 +++++++- > > 5 files changed, 412 insertions(+), 17 deletions(-) > > create mode 100644 kernel/bpf/range_tree.c > > create mode 100644 kernel/bpf/range_tree.h > > > > -- > > 2.43.5 > > > > I skimmed through just to familiarize myself, superficially the range > addition logic seems correct. > > I'll just bikeshed a bit, take it for what it's worth. I found some > naming choices a bit weird. > > rn_start and rn_last, just doesn't match in my head. If it's "start", > then it's "end" (or "finish", but it's weird for this case). If it's > "last", then it should have "first". "start"/"end" sounds best in my > head, fwiw. Agree. It bothered me too a bit, but I kept it as-is to be consistent with xbitmap. So prefer to keep it this way. > > As for an API, is_range_tree_set() caught my eye as well. I'd expect > to see a consistent "range_tree_" prefix for the internal API for this > data structure. So "range_tree_is_set()" was what I expected. This is what I tried first, but looking at how it can be used the "_is_" part in the middle is too easy to misread. if (!range_tree_is_set(rt, pgoff, page_cnt)) range_tree_set(rt, pgoff, page_cnt); // not so bad here if (!range_tree_is_set(rt, pgoff, page_cnt)) // is above "_set" or "_is_set" range_tree_clear(rt, pgoff, page_cnt); Hence I moved "is_" to the beginning to make it more visually different: if (!is_range_tree_set(rt, pgoff, page_cnt)) range_tree_clear(rt, pgoff, page_cnt); Not sure whether the consistent "range_tree_" prefix is a better trade off. No strong opinion. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next 0/2] bpf: range_tree for bpf arena 2024-11-08 2:56 [PATCH bpf-next 0/2] bpf: range_tree for bpf arena Alexei Starovoitov ` (2 preceding siblings ...) 2024-11-13 21:59 ` [PATCH bpf-next 0/2] bpf: range_tree for bpf arena Andrii Nakryiko @ 2024-11-13 22:10 ` patchwork-bot+netdevbpf 3 siblings, 0 replies; 14+ messages in thread From: patchwork-bot+netdevbpf @ 2024-11-13 22:10 UTC (permalink / raw) To: Alexei Starovoitov Cc: bpf, daniel, andrii, martin.lau, memxor, eddyz87, djwong, kernel-team Hello: This series was applied to bpf/bpf-next.git (master) by Andrii Nakryiko <andrii@kernel.org>: On Thu, 7 Nov 2024 18:56:14 -0800 you wrote: > From: Alexei Starovoitov <ast@kernel.org> > > Introduce range_tree (internval tree plus rbtree) to track > unallocated ranges in bpf arena and replace maple_tree with it. > This is a step towards making bpf_arena|free_alloc_pages non-sleepable. > The previous approach to reuse drm_mm to replace maple_tree reached > dead end, since sizeof(struct drm_mm_node) = 168 and > sizeof(struct maple_node) = 256 while > sizeof(struct range_node) = 64 introduced in this patch. > Not only it's smaller, but the algorithm splits and merges > adjacent ranges. Ultimate performance doesn't matter. > The main objective of range_tree is to work in context > where kmalloc/kfree are not safe. It achieves that via bpf_mem_alloc. > > [...] Here is the summary with links: - [bpf-next,1/2] bpf: Introduce range_tree data structure and use it in bpf arena https://git.kernel.org/bpf/bpf-next/c/b795379757eb - [bpf-next,2/2] selftests/bpf: Add a test for arena range tree algorithm https://git.kernel.org/bpf/bpf-next/c/e58358afa84e You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-01-06 17:46 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-08  2:56 [PATCH bpf-next 0/2] bpf: range_tree for bpf arena Alexei Starovoitov
2024-11-08  2:56 ` [PATCH bpf-next 1/2] bpf: Introduce range_tree data structure and use it in " Alexei Starovoitov
2024-11-13 16:02   ` Kumar Kartikeya Dwivedi
2025-01-06 16:12   ` Barret Rhoden
2025-01-06 17:45     ` Alexei Starovoitov
2024-11-08  2:56 ` [PATCH bpf-next 2/2] selftests/bpf: Add a test for arena range tree algorithm Alexei Starovoitov
2024-11-13 16:04   ` Kumar Kartikeya Dwivedi
2024-11-15 12:20   ` Jiri Olsa
2024-11-15 16:30     ` Yonghong Song
2024-11-16 19:00       ` Alexei Starovoitov
2024-11-16 20:35         ` Jiri Olsa
2024-11-13 21:59 ` [PATCH bpf-next 0/2] bpf: range_tree for bpf arena Andrii Nakryiko
2024-11-14  0:48   ` Alexei Starovoitov
2024-11-13 22:10 ` patchwork-bot+netdevbpf