* [PATCH bpf-next v2] bpf: Use kmalloc_nolock() in range tree
@ 2025-11-06 17:06 Puranjay Mohan
2025-11-06 17:30 ` bot+bpf-ci
2025-11-07 0:00 ` patchwork-bot+netdevbpf
0 siblings, 2 replies; 4+ messages in thread
From: Puranjay Mohan @ 2025-11-06 17:06 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, kernel-team
The range tree uses bpf_mem_alloc(), which is safe to call from all
contexts and serves these allocations from a pre-allocated pool of
memory.
Replace bpf_mem_alloc() with kmalloc_nolock() as it can be called safely
from all contexts and is more scalable than bpf_mem_alloc().
Remove the migrate_disable()/migrate_enable() pairs: they were only
needed because bpf_mem_alloc() performs per-cpu operations;
kmalloc_nolock() does not.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
v1: https://lore.kernel.org/all/20251106162935.7146-1-puranjay@kernel.org/
Changes in v1->v2:
- Drop __GFP_ACCOUNT from kmalloc_nolock();
---
kernel/bpf/range_tree.c | 21 ++++++---------------
1 file changed, 6 insertions(+), 15 deletions(-)
diff --git a/kernel/bpf/range_tree.c b/kernel/bpf/range_tree.c
index 37b80a23ae1a..99c63d982c5d 100644
--- a/kernel/bpf/range_tree.c
+++ b/kernel/bpf/range_tree.c
@@ -2,7 +2,6 @@
/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
#include <linux/interval_tree_generic.h>
#include <linux/slab.h>
-#include <linux/bpf_mem_alloc.h>
#include <linux/bpf.h>
#include "range_tree.h"
@@ -21,7 +20,7 @@
* in commit 6772fcc8890a ("xfs: convert xbitmap to interval tree").
*
* The implementation relies on external lock to protect rbtree-s.
- * The alloc/free of range_node-s is done via bpf_mem_alloc.
+ * The alloc/free of range_node-s is done via kmalloc_nolock().
*
* bpf arena is using range_tree to represent unallocated slots.
* At init time:
@@ -150,9 +149,7 @@ int range_tree_clear(struct range_tree *rt, u32 start, u32 len)
range_it_insert(rn, rt);
/* Add a range */
- migrate_disable();
- new_rn = bpf_mem_alloc(&bpf_global_ma, sizeof(struct range_node));
- migrate_enable();
+ new_rn = kmalloc_nolock(sizeof(struct range_node), 0, NUMA_NO_NODE);
if (!new_rn)
return -ENOMEM;
new_rn->rn_start = last + 1;
@@ -172,9 +169,7 @@ int range_tree_clear(struct range_tree *rt, u32 start, u32 len)
} else {
/* in the middle of the clearing range */
range_it_remove(rn, rt);
- migrate_disable();
- bpf_mem_free(&bpf_global_ma, rn);
- migrate_enable();
+ kfree_nolock(rn);
}
}
return 0;
@@ -227,9 +222,7 @@ int range_tree_set(struct range_tree *rt, u32 start, u32 len)
range_it_remove(right, rt);
left->rn_last = right->rn_last;
range_it_insert(left, rt);
- migrate_disable();
- bpf_mem_free(&bpf_global_ma, right);
- migrate_enable();
+ kfree_nolock(right);
} else if (left) {
/* Combine with the left range */
range_it_remove(left, rt);
@@ -241,9 +234,7 @@ int range_tree_set(struct range_tree *rt, u32 start, u32 len)
right->rn_start = start;
range_it_insert(right, rt);
} else {
- migrate_disable();
- left = bpf_mem_alloc(&bpf_global_ma, sizeof(struct range_node));
- migrate_enable();
+ left = kmalloc_nolock(sizeof(struct range_node), 0, NUMA_NO_NODE);
if (!left)
return -ENOMEM;
left->rn_start = start;
@@ -259,7 +250,7 @@ void range_tree_destroy(struct range_tree *rt)
while ((rn = range_it_iter_first(rt, 0, -1U))) {
range_it_remove(rn, rt);
- bpf_mem_free(&bpf_global_ma, rn);
+ kfree_nolock(rn);
}
}
--
2.47.3
^ permalink raw reply related [flat|nested] 4+ messages in thread

* Re: [PATCH bpf-next v2] bpf: Use kmalloc_nolock() in range tree
2025-11-06 17:06 [PATCH bpf-next v2] bpf: Use kmalloc_nolock() in range tree Puranjay Mohan
@ 2025-11-06 17:30 ` bot+bpf-ci
2025-11-06 19:32 ` Alexei Starovoitov
2025-11-07 0:00 ` patchwork-bot+netdevbpf
1 sibling, 1 reply; 4+ messages in thread
From: bot+bpf-ci @ 2025-11-06 17:30 UTC (permalink / raw)
To: puranjay, bpf
Cc: puranjay, puranjay12, ast, andrii, daniel, martin.lau, eddyz87,
memxor, kernel-team, ast, andrii, daniel, martin.lau, eddyz87,
yonghong.song, clm, ihor.solodrai
> diff --git a/kernel/bpf/range_tree.c b/kernel/bpf/range_tree.c
> index 37b80a23a..99c63d982 100644
> --- a/kernel/bpf/range_tree.c
> +++ b/kernel/bpf/range_tree.c
[ ... ]
> @@ -150,9 +149,7 @@ int range_tree_clear(struct range_tree *rt, u32 start, u32 len)
> range_it_insert(rn, rt);
>
> /* Add a range */
> - migrate_disable();
> - new_rn = bpf_mem_alloc(&bpf_global_ma, sizeof(struct range_node));
> - migrate_enable();
> + new_rn = kmalloc_nolock(sizeof(struct range_node), 0, NUMA_NO_NODE);
> if (!new_rn)
> return -ENOMEM;
Was the change in memcg accounting behavior intentional here?
The old code used bpf_mem_alloc(&bpf_global_ma, ...) which charges
allocations to the root memcg (bpf_global_ma is initialized at
late_initcall with get_obj_cgroup_from_current()).
The new code uses kmalloc_nolock() with gfp_flags=0, which does not
include __GFP_ACCOUNT, so these allocations are not charged to any
memcg.
While these are small metadata structures (40 bytes each), and the
actual arena pages are still properly charged via bpf_map_alloc_pages(),
this changes the accounting behavior. Looking at other BPF code, there's
a bpf_map_kmalloc_nolock() wrapper in kernel/bpf/syscall.c that handles
memcg accounting properly with set_active_memcg() and __GFP_ACCOUNT.
Should the range_node allocations use __GFP_ACCOUNT to maintain
consistent accounting, or was the removal of memcg charging intentional?
> @@ -172,9 +169,7 @@ int range_tree_clear(struct range_tree *rt, u32 start, u32 len)
> } else {
> /* in the middle of the clearing range */
> range_it_remove(rn, rt);
> - migrate_disable();
> - bpf_mem_free(&bpf_global_ma, rn);
> - migrate_enable();
> + kfree_nolock(rn);
> }
> }
> return 0;
[ ... ]
> @@ -227,9 +222,7 @@ int range_tree_set(struct range_tree *rt, u32 start, u32 len)
> range_it_remove(right, rt);
> left->rn_last = right->rn_last;
> range_it_insert(left, rt);
> - migrate_disable();
> - bpf_mem_free(&bpf_global_ma, right);
> - migrate_enable();
> + kfree_nolock(right);
Same memcg accounting question applies to the kfree_nolock() calls
as well.
[ ... ]
> @@ -241,9 +234,7 @@ int range_tree_set(struct range_tree *rt, u32 start, u32 len)
> right->rn_start = start;
> range_it_insert(right, rt);
> } else {
> - migrate_disable();
> - left = bpf_mem_alloc(&bpf_global_ma, sizeof(struct range_node));
> - migrate_enable();
> + left = kmalloc_nolock(sizeof(struct range_node), 0, NUMA_NO_NODE);
> if (!left)
> return -ENOMEM;
Same memcg accounting question applies here as well.
[ ... ]
> @@ -259,7 +250,7 @@ void range_tree_destroy(struct range_tree *rt)
>
> while ((rn = range_it_iter_first(rt, 0, -1U))) {
> range_it_remove(rn, rt);
> - bpf_mem_free(&bpf_global_ma, rn);
> + kfree_nolock(rn);
> }
> }
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/19143736343
^ permalink raw reply [flat|nested] 4+ messages in thread

* Re: [PATCH bpf-next v2] bpf: Use kmalloc_nolock() in range tree
2025-11-06 17:30 ` bot+bpf-ci
@ 2025-11-06 19:32 ` Alexei Starovoitov
0 siblings, 0 replies; 4+ messages in thread
From: Alexei Starovoitov @ 2025-11-06 19:32 UTC (permalink / raw)
To: bot+bpf-ci
Cc: Puranjay Mohan, bpf, Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard,
Kumar Kartikeya Dwivedi, Kernel Team, Yonghong Song, Chris Mason,
Ihor Solodrai
On Thu, Nov 6, 2025 at 9:30 AM <bot+bpf-ci@kernel.org> wrote:
>
> > diff --git a/kernel/bpf/range_tree.c b/kernel/bpf/range_tree.c
> > index 37b80a23a..99c63d982 100644
> > --- a/kernel/bpf/range_tree.c
> > +++ b/kernel/bpf/range_tree.c
>
> [ ... ]
>
> > @@ -150,9 +149,7 @@ int range_tree_clear(struct range_tree *rt, u32 start, u32 len)
> > range_it_insert(rn, rt);
> >
> > /* Add a range */
> > - migrate_disable();
> > - new_rn = bpf_mem_alloc(&bpf_global_ma, sizeof(struct range_node));
> > - migrate_enable();
> > + new_rn = kmalloc_nolock(sizeof(struct range_node), 0, NUMA_NO_NODE);
> > if (!new_rn)
> > return -ENOMEM;
>
> Was the change in memcg accounting behavior intentional here?
>
> The old code used bpf_mem_alloc(&bpf_global_ma, ...) which charges
> allocations to the root memcg (bpf_global_ma is initialized at
> late_initcall with get_obj_cgroup_from_current()).
>
> The new code uses kmalloc_nolock() with gfp_flags=0, which does not
> include __GFP_ACCOUNT, so these allocations are not charged to any
> memcg.
Glad that AI caught this. We're going to revisit this when
non-sleepable arena allocations land.
At that time we can set_active_memcg() early on in arena_alloc/free
paths and all subsequent page_alloc_nolock() and kmalloc_nolock()
will charge correct memcg, and __GFP_ACCOUNT will return to all of them.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH bpf-next v2] bpf: Use kmalloc_nolock() in range tree
2025-11-06 17:06 [PATCH bpf-next v2] bpf: Use kmalloc_nolock() in range tree Puranjay Mohan
2025-11-06 17:30 ` bot+bpf-ci
@ 2025-11-07 0:00 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 4+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-11-07 0:00 UTC (permalink / raw)
To: Puranjay Mohan
Cc: bpf, puranjay12, ast, andrii, daniel, martin.lau, eddyz87, memxor,
kernel-team
Hello:
This patch was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:
On Thu, 6 Nov 2025 17:06:07 +0000 you wrote:
> The range tree uses bpf_mem_alloc(), which is safe to call from all
> contexts and serves these allocations from a pre-allocated pool of
> memory.
>
> Replace bpf_mem_alloc() with kmalloc_nolock() as it can be called safely
> from all contexts and is more scalable than bpf_mem_alloc().
>
> [...]
Here is the summary with links:
- [bpf-next,v2] bpf: Use kmalloc_nolock() in range tree
https://git.kernel.org/bpf/bpf-next/c/f8c67d8550ee
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-11-07 0:00 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-06 17:06 [PATCH bpf-next v2] bpf: Use kmalloc_nolock() in range tree Puranjay Mohan
2025-11-06 17:30 ` bot+bpf-ci
2025-11-06 19:32 ` Alexei Starovoitov
2025-11-07 0:00 ` patchwork-bot+netdevbpf