* [PATCH RFC bpf-next 0/2] Switch to kmalloc_nolock() in BPF local storage
@ 2025-11-12 17:59 Amery Hung
2025-11-12 17:59 ` [PATCH RFC bpf-next 1/2] bpf: Always charge/uncharge memory when allocating/unlinking storage elements Amery Hung
2025-11-12 17:59 ` [PATCH RFC bpf-next 2/2] bpf: Use kmalloc_nolock() in local storage unconditionally Amery Hung
0 siblings, 2 replies; 9+ messages in thread
From: Amery Hung @ 2025-11-12 17:59 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, martin.lau, memxor,
kpsingh, yonghong.song, song, ameryhung, kernel-team
Hi,
This patchset tries to simplify bpf_local_storage.c by switching to
kmalloc_nolock() unconditionally. Currently, local storage uses the BPF
memory allocator for task and cgroup local storage, or when PREEMPT_RT
is enabled, so that memory can be allocated in any context without
deadlocking. However, socket local storage did not switch, for
performance reasons. Supporting different memory allocators added a
decent amount of complexity. Therefore, to make [1] and other future
work in local storage simpler, this patchset consolidates the memory
allocation/deallocation paths by switching to kmalloc_nolock()
unconditionally.
Benchmark
./bench -p 1 local-storage-create --storage-type <socket,task> \
--batch-size <16,32,64>
The benchmark is a microbenchmark that stress-tests how fast local
storage can be created. For task local storage, switching from the BPF
memory allocator to kmalloc_nolock() yields a small improvement. For
socket local storage, it loses some performance when switching from
kzalloc() to kmalloc_nolock().
Socket local storage
memory alloc    batch  creation speed      kmallocs/create     diff
--------------- -----  ------------------  ------------------  ----
kzalloc 16 104.217 ± 0.974k/s 4.15 kmallocs/create
(before) 32 104.355 ± 0.606k/s 4.13 kmallocs/create
64 103.611 ± 0.707k/s 4.15 kmallocs/create
kmalloc_nolock 16 100.566 ± 0.560k/s 1.13 kmallocs/create -3.5%
(after) 32 99.708 ± 0.684k/s 1.15 kmallocs/create -4.5%
64 98.375 ± 1.757k/s 1.13 kmallocs/create -5.1%
Task local storage
memory alloc    batch  creation speed      kmallocs/create     diff
--------------- -----  ------------------  ------------------  ----
BPF memory 16 24.668 ± 0.121k/s 2.54 kmallocs/create
allocator 32 22.899 ± 0.097k/s 2.67 kmallocs/create
(before) 64 22.559 ± 0.076k/s 2.56 kmallocs/create
kmalloc_nolock 16 25.399 ± 0.142k/s 2.51 kmallocs/create +3.0%
(after) 32 23.495 ± 1.285k/s 2.66 kmallocs/create +2.6%
64 23.701 ± 0.207k/s 2.63 kmallocs/create +5.1%
[1] https://lore.kernel.org/bpf/20251002225356.1505480-1-ameryhung@gmail.com/
---
Amery Hung (2):
bpf: Always charge/uncharge memory when allocating/unlinking storage
elements
bpf: Use kmalloc_nolock() in local storage unconditionally
include/linux/bpf_local_storage.h | 12 +-
kernel/bpf/bpf_cgrp_storage.c | 2 +-
kernel/bpf/bpf_inode_storage.c | 2 +-
kernel/bpf/bpf_local_storage.c | 283 +++++-------------------------
kernel/bpf/bpf_task_storage.c | 2 +-
net/core/bpf_sk_storage.c | 6 +-
6 files changed, 53 insertions(+), 254 deletions(-)
--
2.47.3
* [PATCH RFC bpf-next 1/2] bpf: Always charge/uncharge memory when allocating/unlinking storage elements
2025-11-12 17:59 [PATCH RFC bpf-next 0/2] Switch to kmalloc_nolock() in BPF local storage Amery Hung
@ 2025-11-12 17:59 ` Amery Hung
2025-11-12 17:59 ` [PATCH RFC bpf-next 2/2] bpf: Use kmalloc_nolock() in local storage unconditionally Amery Hung
1 sibling, 0 replies; 9+ messages in thread
From: Amery Hung @ 2025-11-12 17:59 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, martin.lau, memxor,
kpsingh, yonghong.song, song, ameryhung, kernel-team
Since commit a96a44aba556 ("bpf: bpf_sk_storage: Fix invalid wait
context lockdep report"), {charge,uncharge}_mem are always true when
allocating a bpf_local_storage_elem or unlinking a bpf_local_storage_elem
from local storage, so drop these arguments. No functional change.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
include/linux/bpf_local_storage.h | 2 +-
kernel/bpf/bpf_local_storage.c | 22 ++++++++++------------
net/core/bpf_sk_storage.c | 2 +-
3 files changed, 12 insertions(+), 14 deletions(-)
diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index 782f58feea35..3663eabcc3ff 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -184,7 +184,7 @@ void bpf_selem_link_map(struct bpf_local_storage_map *smap,
struct bpf_local_storage_elem *
bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value,
- bool charge_mem, bool swap_uptrs, gfp_t gfp_flags);
+ bool swap_uptrs, gfp_t gfp_flags);
void bpf_selem_free(struct bpf_local_storage_elem *selem,
struct bpf_local_storage_map *smap,
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index b931fbceb54d..400bdf8a3eb2 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -73,11 +73,11 @@ static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
struct bpf_local_storage_elem *
bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
- void *value, bool charge_mem, bool swap_uptrs, gfp_t gfp_flags)
+ void *value, bool swap_uptrs, gfp_t gfp_flags)
{
struct bpf_local_storage_elem *selem;
- if (charge_mem && mem_charge(smap, owner, smap->elem_size))
+ if (mem_charge(smap, owner, smap->elem_size))
return NULL;
if (smap->bpf_ma) {
@@ -106,8 +106,7 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
return selem;
}
- if (charge_mem)
- mem_uncharge(smap, owner, smap->elem_size);
+ mem_uncharge(smap, owner, smap->elem_size);
return NULL;
}
@@ -284,7 +283,7 @@ static void bpf_selem_free_list(struct hlist_head *list, bool reuse_now)
*/
static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
struct bpf_local_storage_elem *selem,
- bool uncharge_mem, struct hlist_head *free_selem_list)
+ struct hlist_head *free_selem_list)
{
struct bpf_local_storage_map *smap;
bool free_local_storage;
@@ -297,8 +296,7 @@ static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_stor
* The owner may be freed once the last selem is unlinked
* from local_storage.
*/
- if (uncharge_mem)
- mem_uncharge(smap, owner, smap->elem_size);
+ mem_uncharge(smap, owner, smap->elem_size);
free_local_storage = hlist_is_singular_node(&selem->snode,
&local_storage->list);
@@ -393,7 +391,7 @@ static void bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
raw_spin_lock_irqsave(&local_storage->lock, flags);
if (likely(selem_linked_to_storage(selem)))
free_local_storage = bpf_selem_unlink_storage_nolock(
- local_storage, selem, true, &selem_free_list);
+ local_storage, selem, &selem_free_list);
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
bpf_selem_free_list(&selem_free_list, reuse_now);
@@ -582,7 +580,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
if (err)
return ERR_PTR(err);
- selem = bpf_selem_alloc(smap, owner, value, true, swap_uptrs, gfp_flags);
+ selem = bpf_selem_alloc(smap, owner, value, swap_uptrs, gfp_flags);
if (!selem)
return ERR_PTR(-ENOMEM);
@@ -616,7 +614,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
/* A lookup has just been done before and concluded a new selem is
* needed. The chance of an unnecessary alloc is unlikely.
*/
- alloc_selem = selem = bpf_selem_alloc(smap, owner, value, true, swap_uptrs, gfp_flags);
+ alloc_selem = selem = bpf_selem_alloc(smap, owner, value, swap_uptrs, gfp_flags);
if (!alloc_selem)
return ERR_PTR(-ENOMEM);
@@ -656,7 +654,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
if (old_sdata) {
bpf_selem_unlink_map(SELEM(old_sdata));
bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
- true, &old_selem_free_list);
+ &old_selem_free_list);
}
unlock:
@@ -762,7 +760,7 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
* of the loop will set the free_cgroup_storage to true.
*/
free_storage = bpf_selem_unlink_storage_nolock(
- local_storage, selem, true, &free_selem_list);
+ local_storage, selem, &free_selem_list);
}
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index d3fbaf89a698..bd3c686edc0b 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -136,7 +136,7 @@ bpf_sk_storage_clone_elem(struct sock *newsk,
{
struct bpf_local_storage_elem *copy_selem;
- copy_selem = bpf_selem_alloc(smap, newsk, NULL, true, false, GFP_ATOMIC);
+ copy_selem = bpf_selem_alloc(smap, newsk, NULL, false, GFP_ATOMIC);
if (!copy_selem)
return NULL;
--
2.47.3
* [PATCH RFC bpf-next 2/2] bpf: Use kmalloc_nolock() in local storage unconditionally
2025-11-12 17:59 [PATCH RFC bpf-next 0/2] Switch to kmalloc_nolock() in BPF local storage Amery Hung
2025-11-12 17:59 ` [PATCH RFC bpf-next 1/2] bpf: Always charge/uncharge memory when allocating/unlinking storage elements Amery Hung
@ 2025-11-12 17:59 ` Amery Hung
2025-11-12 19:35 ` Alexei Starovoitov
1 sibling, 1 reply; 9+ messages in thread
From: Amery Hung @ 2025-11-12 17:59 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, martin.lau, memxor,
kpsingh, yonghong.song, song, ameryhung, kernel-team
Simplify the local storage and selem memory alloc/free paths by
switching to kmalloc_nolock() unconditionally.
While it is okay to call bpf_obj_free_fields() immediately in
bpf_selem_free() when reuse_now == true, the call is kept only in
bpf_selem_free_rcu() to simplify the code. This requires rcu_barrier()
to be called unconditionally in bpf_local_storage_map_free(), since
bpf_obj_free_fields() may also be called from RCU callbacks.
In addition, remove the smap argument from bpf_selem_free() and rely
on SDATA(selem)->smap for bpf_obj_free_fields(). This requires
initializing SDATA(selem)->smap earlier, during bpf_selem_alloc(), as
bpf_local_storage_update() can allocate an selem and free it without
ever linking it to a map.
Finally, clean up and update comments. Note that we already free the
selem after an RCU grace period in bpf_local_storage_update() when
bpf_local_storage_alloc() fails the cmpxchg, ever since commit
c0d63f309186 ("bpf: Add bpf_selem_free()").
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
include/linux/bpf_local_storage.h | 10 +-
kernel/bpf/bpf_cgrp_storage.c | 2 +-
kernel/bpf/bpf_inode_storage.c | 2 +-
kernel/bpf/bpf_local_storage.c | 261 ++++--------------------------
kernel/bpf/bpf_task_storage.c | 2 +-
net/core/bpf_sk_storage.c | 4 +-
6 files changed, 41 insertions(+), 240 deletions(-)
diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index 3663eabcc3ff..3c35f0b9b86c 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -53,9 +53,6 @@ struct bpf_local_storage_map {
u32 bucket_log;
u16 elem_size;
u16 cache_idx;
- struct bpf_mem_alloc selem_ma;
- struct bpf_mem_alloc storage_ma;
- bool bpf_ma;
};
struct bpf_local_storage_data {
@@ -129,8 +126,7 @@ int bpf_local_storage_map_alloc_check(union bpf_attr *attr);
struct bpf_map *
bpf_local_storage_map_alloc(union bpf_attr *attr,
- struct bpf_local_storage_cache *cache,
- bool bpf_ma);
+ struct bpf_local_storage_cache *cache);
void __bpf_local_storage_insert_cache(struct bpf_local_storage *local_storage,
struct bpf_local_storage_map *smap,
@@ -186,9 +182,7 @@ struct bpf_local_storage_elem *
bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value,
bool swap_uptrs, gfp_t gfp_flags);
-void bpf_selem_free(struct bpf_local_storage_elem *selem,
- struct bpf_local_storage_map *smap,
- bool reuse_now);
+void bpf_selem_free(struct bpf_local_storage_elem *selem, bool reuse_now);
int
bpf_local_storage_alloc(void *owner,
diff --git a/kernel/bpf/bpf_cgrp_storage.c b/kernel/bpf/bpf_cgrp_storage.c
index 0687a760974a..87d50b6b3673 100644
--- a/kernel/bpf/bpf_cgrp_storage.c
+++ b/kernel/bpf/bpf_cgrp_storage.c
@@ -146,7 +146,7 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
{
- return bpf_local_storage_map_alloc(attr, &cgroup_cache, true);
+ return bpf_local_storage_map_alloc(attr, &cgroup_cache);
}
static void cgroup_storage_map_free(struct bpf_map *map)
diff --git a/kernel/bpf/bpf_inode_storage.c b/kernel/bpf/bpf_inode_storage.c
index e54cce2b9175..30934f6a66a4 100644
--- a/kernel/bpf/bpf_inode_storage.c
+++ b/kernel/bpf/bpf_inode_storage.c
@@ -181,7 +181,7 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key,
static struct bpf_map *inode_storage_map_alloc(union bpf_attr *attr)
{
- return bpf_local_storage_map_alloc(attr, &inode_cache, false);
+ return bpf_local_storage_map_alloc(attr, &inode_cache);
}
static void inode_storage_map_free(struct bpf_map *map)
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 400bdf8a3eb2..678bbf45eb57 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -80,23 +80,12 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
if (mem_charge(smap, owner, smap->elem_size))
return NULL;
- if (smap->bpf_ma) {
- selem = bpf_mem_cache_alloc_flags(&smap->selem_ma, gfp_flags);
- if (selem)
- /* Keep the original bpf_map_kzalloc behavior
- * before started using the bpf_mem_cache_alloc.
- *
- * No need to use zero_map_value. The bpf_selem_free()
- * only does bpf_mem_cache_free when there is
- * no other bpf prog is using the selem.
- */
- memset(SDATA(selem)->data, 0, smap->map.value_size);
- } else {
- selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
- gfp_flags | __GFP_NOWARN);
- }
+ selem = bpf_map_kmalloc_nolock(&smap->map, smap->elem_size, gfp_flags, NUMA_NO_NODE);
if (selem) {
+ memset(selem, 0, smap->elem_size);
+ RCU_INIT_POINTER(SDATA(selem)->smap, smap);
+
if (value) {
/* No need to call check_and_init_map_value as memory is zero init */
copy_map_value(&smap->map, SDATA(selem)->data, value);
@@ -111,96 +100,35 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
return NULL;
}
-/* rcu tasks trace callback for bpf_ma == false */
-static void __bpf_local_storage_free_trace_rcu(struct rcu_head *rcu)
+static void bpf_local_storage_free_rcu(struct rcu_head *rcu)
{
struct bpf_local_storage *local_storage;
- /* If RCU Tasks Trace grace period implies RCU grace period, do
- * kfree(), else do kfree_rcu().
- */
local_storage = container_of(rcu, struct bpf_local_storage, rcu);
- if (rcu_trace_implies_rcu_gp())
- kfree(local_storage);
- else
- kfree_rcu(local_storage, rcu);
+ kfree_nolock(local_storage);
}
-static void bpf_local_storage_free_rcu(struct rcu_head *rcu)
+static void bpf_local_storage_free_trace_rcu(struct rcu_head *rcu)
{
struct bpf_local_storage *local_storage;
local_storage = container_of(rcu, struct bpf_local_storage, rcu);
- bpf_mem_cache_raw_free(local_storage);
-}
-
-static void bpf_local_storage_free_trace_rcu(struct rcu_head *rcu)
-{
if (rcu_trace_implies_rcu_gp())
bpf_local_storage_free_rcu(rcu);
else
call_rcu(rcu, bpf_local_storage_free_rcu);
}
-/* Handle bpf_ma == false */
-static void __bpf_local_storage_free(struct bpf_local_storage *local_storage,
- bool vanilla_rcu)
-{
- if (vanilla_rcu)
- kfree_rcu(local_storage, rcu);
- else
- call_rcu_tasks_trace(&local_storage->rcu,
- __bpf_local_storage_free_trace_rcu);
-}
-
static void bpf_local_storage_free(struct bpf_local_storage *local_storage,
- struct bpf_local_storage_map *smap,
- bool bpf_ma, bool reuse_now)
+ bool reuse_now)
{
if (!local_storage)
return;
- if (!bpf_ma) {
- __bpf_local_storage_free(local_storage, reuse_now);
- return;
- }
-
- if (!reuse_now) {
- call_rcu_tasks_trace(&local_storage->rcu,
- bpf_local_storage_free_trace_rcu);
- return;
- }
-
- if (smap)
- bpf_mem_cache_free(&smap->storage_ma, local_storage);
- else
- /* smap could be NULL if the selem that triggered
- * this 'local_storage' creation had been long gone.
- * In this case, directly do call_rcu().
- */
+ if (reuse_now)
call_rcu(&local_storage->rcu, bpf_local_storage_free_rcu);
-}
-
-/* rcu tasks trace callback for bpf_ma == false */
-static void __bpf_selem_free_trace_rcu(struct rcu_head *rcu)
-{
- struct bpf_local_storage_elem *selem;
-
- selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
- if (rcu_trace_implies_rcu_gp())
- kfree(selem);
- else
- kfree_rcu(selem, rcu);
-}
-
-/* Handle bpf_ma == false */
-static void __bpf_selem_free(struct bpf_local_storage_elem *selem,
- bool vanilla_rcu)
-{
- if (vanilla_rcu)
- kfree_rcu(selem, rcu);
else
- call_rcu_tasks_trace(&selem->rcu, __bpf_selem_free_trace_rcu);
+ call_rcu_tasks_trace(&local_storage->rcu, bpf_local_storage_free_trace_rcu);
}
static void bpf_selem_free_rcu(struct rcu_head *rcu)
@@ -215,7 +143,7 @@ static void bpf_selem_free_rcu(struct rcu_head *rcu)
migrate_disable();
bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
migrate_enable();
- bpf_mem_cache_raw_free(selem);
+ kfree_nolock(selem);
}
static void bpf_selem_free_trace_rcu(struct rcu_head *rcu)
@@ -227,43 +155,17 @@ static void bpf_selem_free_trace_rcu(struct rcu_head *rcu)
}
void bpf_selem_free(struct bpf_local_storage_elem *selem,
- struct bpf_local_storage_map *smap,
bool reuse_now)
{
- if (!smap->bpf_ma) {
- /* Only task storage has uptrs and task storage
- * has moved to bpf_mem_alloc. Meaning smap->bpf_ma == true
- * for task storage, so this bpf_obj_free_fields() won't unpin
- * any uptr.
- */
- bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
- __bpf_selem_free(selem, reuse_now);
- return;
- }
-
- if (reuse_now) {
- /* reuse_now == true only happens when the storage owner
- * (e.g. task_struct) is being destructed or the map itself
- * is being destructed (ie map_free). In both cases,
- * no bpf prog can have a hold on the selem. It is
- * safe to unpin the uptrs and free the selem now.
- */
- bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
- /* Instead of using the vanilla call_rcu(),
- * bpf_mem_cache_free will be able to reuse selem
- * immediately.
- */
- bpf_mem_cache_free(&smap->selem_ma, selem);
- return;
- }
-
- call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu);
+ if (reuse_now)
+ call_rcu(&selem->rcu, bpf_selem_free_rcu);
+ else
+ call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu);
}
static void bpf_selem_free_list(struct hlist_head *list, bool reuse_now)
{
struct bpf_local_storage_elem *selem;
- struct bpf_local_storage_map *smap;
struct hlist_node *n;
/* The "_safe" iteration is needed.
@@ -271,10 +173,8 @@ static void bpf_selem_free_list(struct hlist_head *list, bool reuse_now)
* but bpf_selem_free will use the selem->rcu_head
* which is union-ized with the selem->free_node.
*/
- hlist_for_each_entry_safe(selem, n, list, free_node) {
- smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
- bpf_selem_free(selem, smap, reuse_now);
- }
+ hlist_for_each_entry_safe(selem, n, list, free_node)
+ bpf_selem_free(selem, reuse_now);
}
/* local_storage->lock must be held and selem->local_storage == local_storage.
@@ -334,47 +234,11 @@ static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_stor
return free_local_storage;
}
-static bool check_storage_bpf_ma(struct bpf_local_storage *local_storage,
- struct bpf_local_storage_map *storage_smap,
- struct bpf_local_storage_elem *selem)
-{
-
- struct bpf_local_storage_map *selem_smap;
-
- /* local_storage->smap may be NULL. If it is, get the bpf_ma
- * from any selem in the local_storage->list. The bpf_ma of all
- * local_storage and selem should have the same value
- * for the same map type.
- *
- * If the local_storage->list is already empty, the caller will not
- * care about the bpf_ma value also because the caller is not
- * responsible to free the local_storage.
- */
-
- if (storage_smap)
- return storage_smap->bpf_ma;
-
- if (!selem) {
- struct hlist_node *n;
-
- n = rcu_dereference_check(hlist_first_rcu(&local_storage->list),
- bpf_rcu_lock_held());
- if (!n)
- return false;
-
- selem = hlist_entry(n, struct bpf_local_storage_elem, snode);
- }
- selem_smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
-
- return selem_smap->bpf_ma;
-}
-
static void bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
bool reuse_now)
{
- struct bpf_local_storage_map *storage_smap;
struct bpf_local_storage *local_storage;
- bool bpf_ma, free_local_storage = false;
+ bool free_local_storage = false;
HLIST_HEAD(selem_free_list);
unsigned long flags;
@@ -384,9 +248,6 @@ static void bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
local_storage = rcu_dereference_check(selem->local_storage,
bpf_rcu_lock_held());
- storage_smap = rcu_dereference_check(local_storage->smap,
- bpf_rcu_lock_held());
- bpf_ma = check_storage_bpf_ma(local_storage, storage_smap, selem);
raw_spin_lock_irqsave(&local_storage->lock, flags);
if (likely(selem_linked_to_storage(selem)))
@@ -397,7 +258,7 @@ static void bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
bpf_selem_free_list(&selem_free_list, reuse_now);
if (free_local_storage)
- bpf_local_storage_free(local_storage, storage_smap, bpf_ma, reuse_now);
+ bpf_local_storage_free(local_storage, reuse_now);
}
void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
@@ -432,7 +293,6 @@ void bpf_selem_link_map(struct bpf_local_storage_map *smap,
unsigned long flags;
raw_spin_lock_irqsave(&b->lock, flags);
- RCU_INIT_POINTER(SDATA(selem)->smap, smap);
hlist_add_head_rcu(&selem->map_node, &b->list);
raw_spin_unlock_irqrestore(&b->lock, flags);
}
@@ -491,16 +351,14 @@ int bpf_local_storage_alloc(void *owner,
if (err)
return err;
- if (smap->bpf_ma)
- storage = bpf_mem_cache_alloc_flags(&smap->storage_ma, gfp_flags);
- else
- storage = bpf_map_kzalloc(&smap->map, sizeof(*storage),
- gfp_flags | __GFP_NOWARN);
+ storage = bpf_map_kmalloc_nolock(&smap->map, sizeof(*storage),
+ gfp_flags, NUMA_NO_NODE);
if (!storage) {
err = -ENOMEM;
goto uncharge;
}
+ memset(storage->cache, 0, BPF_LOCAL_STORAGE_CACHE_SIZE * sizeof(storage->cache[0]));
RCU_INIT_POINTER(storage->smap, smap);
INIT_HLIST_HEAD(&storage->list);
raw_spin_lock_init(&storage->lock);
@@ -526,22 +384,12 @@ int bpf_local_storage_alloc(void *owner,
bpf_selem_unlink_map(first_selem);
err = -EAGAIN;
goto uncharge;
-
- /* Note that even first_selem was linked to smap's
- * bucket->list, first_selem can be freed immediately
- * (instead of kfree_rcu) because
- * bpf_local_storage_map_free() does a
- * synchronize_rcu_mult (waiting for both sleepable and
- * normal programs) before walking the bucket->list.
- * Hence, no one is accessing selem from the
- * bucket->list under rcu_read_lock().
- */
}
return 0;
uncharge:
- bpf_local_storage_free(storage, smap, smap->bpf_ma, true);
+ bpf_local_storage_free(storage, true);
mem_uncharge(smap, owner, sizeof(*storage));
return err;
}
@@ -586,7 +434,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
err = bpf_local_storage_alloc(owner, smap, selem, gfp_flags);
if (err) {
- bpf_selem_free(selem, smap, true);
+ bpf_selem_free(selem, true);
mem_uncharge(smap, owner, smap->elem_size);
return ERR_PTR(err);
}
@@ -662,7 +510,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
bpf_selem_free_list(&old_selem_free_list, false);
if (alloc_selem) {
mem_uncharge(smap, owner, smap->elem_size);
- bpf_selem_free(alloc_selem, smap, true);
+ bpf_selem_free(alloc_selem, true);
}
return err ? ERR_PTR(err) : SDATA(selem);
}
@@ -728,16 +576,12 @@ int bpf_local_storage_map_check_btf(const struct bpf_map *map,
void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
{
- struct bpf_local_storage_map *storage_smap;
struct bpf_local_storage_elem *selem;
- bool bpf_ma, free_storage = false;
+ bool free_storage = false;
HLIST_HEAD(free_selem_list);
struct hlist_node *n;
unsigned long flags;
- storage_smap = rcu_dereference_check(local_storage->smap, bpf_rcu_lock_held());
- bpf_ma = check_storage_bpf_ma(local_storage, storage_smap, NULL);
-
/* Neither the bpf_prog nor the bpf_map's syscall
* could be modifying the local_storage->list now.
* Thus, no elem can be added to or deleted from the
@@ -767,7 +611,7 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
bpf_selem_free_list(&free_selem_list, true);
if (free_storage)
- bpf_local_storage_free(local_storage, storage_smap, bpf_ma, true);
+ bpf_local_storage_free(local_storage, true);
}
u64 bpf_local_storage_map_mem_usage(const struct bpf_map *map)
@@ -780,25 +624,13 @@ u64 bpf_local_storage_map_mem_usage(const struct bpf_map *map)
return usage;
}
-/* When bpf_ma == true, the bpf_mem_alloc is used to allocate and free memory.
- * A deadlock free allocator is useful for storage that the bpf prog can easily
- * get a hold of the owner PTR_TO_BTF_ID in any context. eg. bpf_get_current_task_btf.
- * The task and cgroup storage fall into this case. The bpf_mem_alloc reuses
- * memory immediately. To be reuse-immediate safe, the owner destruction
- * code path needs to go through a rcu grace period before calling
- * bpf_local_storage_destroy().
- *
- * When bpf_ma == false, the kmalloc and kfree are used.
- */
struct bpf_map *
bpf_local_storage_map_alloc(union bpf_attr *attr,
- struct bpf_local_storage_cache *cache,
- bool bpf_ma)
+ struct bpf_local_storage_cache *cache)
{
struct bpf_local_storage_map *smap;
unsigned int i;
u32 nbuckets;
- int err;
smap = bpf_map_area_alloc(sizeof(*smap), NUMA_NO_NODE);
if (!smap)
@@ -813,8 +645,8 @@ bpf_local_storage_map_alloc(union bpf_attr *attr,
smap->buckets = bpf_map_kvcalloc(&smap->map, nbuckets,
sizeof(*smap->buckets), GFP_USER | __GFP_NOWARN);
if (!smap->buckets) {
- err = -ENOMEM;
- goto free_smap;
+ bpf_map_area_free(smap);
+ return ERR_PTR(-ENOMEM);
}
for (i = 0; i < nbuckets; i++) {
@@ -825,30 +657,8 @@ bpf_local_storage_map_alloc(union bpf_attr *attr,
smap->elem_size = offsetof(struct bpf_local_storage_elem,
sdata.data[attr->value_size]);
- /* In PREEMPT_RT, kmalloc(GFP_ATOMIC) is still not safe in non
- * preemptible context. Thus, enforce all storages to use
- * bpf_mem_alloc when CONFIG_PREEMPT_RT is enabled.
- */
- smap->bpf_ma = IS_ENABLED(CONFIG_PREEMPT_RT) ? true : bpf_ma;
- if (smap->bpf_ma) {
- err = bpf_mem_alloc_init(&smap->selem_ma, smap->elem_size, false);
- if (err)
- goto free_smap;
-
- err = bpf_mem_alloc_init(&smap->storage_ma, sizeof(struct bpf_local_storage), false);
- if (err) {
- bpf_mem_alloc_destroy(&smap->selem_ma);
- goto free_smap;
- }
- }
-
smap->cache_idx = bpf_local_storage_cache_idx_get(cache);
return &smap->map;
-
-free_smap:
- kvfree(smap->buckets);
- bpf_map_area_free(smap);
- return ERR_PTR(err);
}
void bpf_local_storage_map_free(struct bpf_map *map,
@@ -910,13 +720,10 @@ void bpf_local_storage_map_free(struct bpf_map *map,
*/
synchronize_rcu();
- if (smap->bpf_ma) {
- rcu_barrier_tasks_trace();
- if (!rcu_trace_implies_rcu_gp())
- rcu_barrier();
- bpf_mem_alloc_destroy(&smap->selem_ma);
- bpf_mem_alloc_destroy(&smap->storage_ma);
- }
+ /* Wait for in-flight bpf_obj_free_fields() */
+ rcu_barrier_tasks_trace();
+ rcu_barrier();
+
kvfree(smap->buckets);
bpf_map_area_free(smap);
}
diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
index a1dc1bf0848a..baf5de6104e2 100644
--- a/kernel/bpf/bpf_task_storage.c
+++ b/kernel/bpf/bpf_task_storage.c
@@ -308,7 +308,7 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
static struct bpf_map *task_storage_map_alloc(union bpf_attr *attr)
{
- return bpf_local_storage_map_alloc(attr, &task_cache, true);
+ return bpf_local_storage_map_alloc(attr, &task_cache);
}
static void task_storage_map_free(struct bpf_map *map)
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index bd3c686edc0b..92b548246f04 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -67,7 +67,7 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
{
- return bpf_local_storage_map_alloc(attr, &sk_cache, false);
+ return bpf_local_storage_map_alloc(attr, &sk_cache);
}
static int notsupp_get_next_key(struct bpf_map *map, void *key,
@@ -196,7 +196,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
} else {
ret = bpf_local_storage_alloc(newsk, smap, copy_selem, GFP_ATOMIC);
if (ret) {
- bpf_selem_free(copy_selem, smap, true);
+ bpf_selem_free(copy_selem, true);
atomic_sub(smap->elem_size,
&newsk->sk_omem_alloc);
bpf_map_put(map);
--
2.47.3
* Re: [PATCH RFC bpf-next 2/2] bpf: Use kmalloc_nolock() in local storage unconditionally
2025-11-12 17:59 ` [PATCH RFC bpf-next 2/2] bpf: Use kmalloc_nolock() in local storage unconditionally Amery Hung
@ 2025-11-12 19:35 ` Alexei Starovoitov
2025-11-12 19:51 ` Amery Hung
0 siblings, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2025-11-12 19:35 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kumar Kartikeya Dwivedi, KP Singh,
Yonghong Song, Song Liu, Kernel Team
On Wed, Nov 12, 2025 at 9:59 AM Amery Hung <ameryhung@gmail.com> wrote:
>
> @@ -80,23 +80,12 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
> if (mem_charge(smap, owner, smap->elem_size))
> return NULL;
>
> - if (smap->bpf_ma) {
> - selem = bpf_mem_cache_alloc_flags(&smap->selem_ma, gfp_flags);
> - if (selem)
> - /* Keep the original bpf_map_kzalloc behavior
> - * before started using the bpf_mem_cache_alloc.
> - *
> - * No need to use zero_map_value. The bpf_selem_free()
> - * only does bpf_mem_cache_free when there is
> - * no other bpf prog is using the selem.
> - */
> - memset(SDATA(selem)->data, 0, smap->map.value_size);
> - } else {
> - selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
> - gfp_flags | __GFP_NOWARN);
> - }
> + selem = bpf_map_kmalloc_nolock(&smap->map, smap->elem_size, gfp_flags, NUMA_NO_NODE);
Pls enable CONFIG_DEBUG_VM=y then you'll see that the above triggers:
void *kmalloc_nolock_noprof(size_t size, gfp_t gfp_flags, int node)
{
gfp_t alloc_gfp = __GFP_NOWARN | __GFP_NOMEMALLOC | gfp_flags;
...
VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO |
__GFP_NO_OBJ_EXT));
and benchmarking numbers have to be redone, since with
unsupported gfp flags kmalloc_nolock() is likely doing something wrong.
* Re: [PATCH RFC bpf-next 2/2] bpf: Use kmalloc_nolock() in local storage unconditionally
2025-11-12 19:35 ` Alexei Starovoitov
@ 2025-11-12 19:51 ` Amery Hung
2025-11-12 20:04 ` Alexei Starovoitov
0 siblings, 1 reply; 9+ messages in thread
From: Amery Hung @ 2025-11-12 19:51 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kumar Kartikeya Dwivedi, KP Singh,
Yonghong Song, Song Liu, Kernel Team
On Wed, Nov 12, 2025 at 11:35 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Nov 12, 2025 at 9:59 AM Amery Hung <ameryhung@gmail.com> wrote:
> >
> > @@ -80,23 +80,12 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
> > if (mem_charge(smap, owner, smap->elem_size))
> > return NULL;
> >
> > - if (smap->bpf_ma) {
> > - selem = bpf_mem_cache_alloc_flags(&smap->selem_ma, gfp_flags);
> > - if (selem)
> > - /* Keep the original bpf_map_kzalloc behavior
> > - * before started using the bpf_mem_cache_alloc.
> > - *
> > - * No need to use zero_map_value. The bpf_selem_free()
> > - * only does bpf_mem_cache_free when there is
> > - * no other bpf prog is using the selem.
> > - */
> > - memset(SDATA(selem)->data, 0, smap->map.value_size);
> > - } else {
> > - selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
> > - gfp_flags | __GFP_NOWARN);
> > - }
> > + selem = bpf_map_kmalloc_nolock(&smap->map, smap->elem_size, gfp_flags, NUMA_NO_NODE);
>
>
> Pls enable CONFIG_DEBUG_VM=y then you'll see that the above triggers:
> void *kmalloc_nolock_noprof(size_t size, gfp_t gfp_flags, int node)
> {
> gfp_t alloc_gfp = __GFP_NOWARN | __GFP_NOMEMALLOC | gfp_flags;
> ...
> VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO |
> __GFP_NO_OBJ_EXT));
>
> and benchmarking numbers have to be redone, since with
> unsupported gfp flags kmalloc_nolock() is likely doing something wrong.
I see. Thanks for pointing it out. Currently the verifier determines
the flag and rewrites the program based on whether the caller of the
storage_get helpers is sleepable. I will remove it and redo the
benchmark.
Thanks,
Amery
* Re: [PATCH RFC bpf-next 2/2] bpf: Use kmalloc_nolock() in local storage unconditionally
2025-11-12 19:51 ` Amery Hung
@ 2025-11-12 20:04 ` Alexei Starovoitov
2025-11-12 21:14 ` Amery Hung
0 siblings, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2025-11-12 20:04 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kumar Kartikeya Dwivedi, KP Singh,
Yonghong Song, Song Liu, Kernel Team
On Wed, Nov 12, 2025 at 11:51 AM Amery Hung <ameryhung@gmail.com> wrote:
>
> On Wed, Nov 12, 2025 at 11:35 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Nov 12, 2025 at 9:59 AM Amery Hung <ameryhung@gmail.com> wrote:
> > >
> > > @@ -80,23 +80,12 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
> > > if (mem_charge(smap, owner, smap->elem_size))
> > > return NULL;
> > >
> > > - if (smap->bpf_ma) {
> > > - selem = bpf_mem_cache_alloc_flags(&smap->selem_ma, gfp_flags);
> > > - if (selem)
> > > - /* Keep the original bpf_map_kzalloc behavior
> > > - * before started using the bpf_mem_cache_alloc.
> > > - *
> > > - * No need to use zero_map_value. The bpf_selem_free()
> > > - * only does bpf_mem_cache_free when there is
> > > - * no other bpf prog is using the selem.
> > > - */
> > > - memset(SDATA(selem)->data, 0, smap->map.value_size);
> > > - } else {
> > > - selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
> > > - gfp_flags | __GFP_NOWARN);
> > > - }
> > > + selem = bpf_map_kmalloc_nolock(&smap->map, smap->elem_size, gfp_flags, NUMA_NO_NODE);
> >
> >
> > Pls enable CONFIG_DEBUG_VM=y then you'll see that the above triggers:
> > void *kmalloc_nolock_noprof(size_t size, gfp_t gfp_flags, int node)
> > {
> > gfp_t alloc_gfp = __GFP_NOWARN | __GFP_NOMEMALLOC | gfp_flags;
> > ...
> > VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO |
> > __GFP_NO_OBJ_EXT));
> >
> > and benchmarking numbers have to be redone, since with
> > unsupported gfp flags kmalloc_nolock() is likely doing something wrong.
>
> I see. Thanks for pointing it out. Currently the verifier determines
> the flag and rewrites the program based on if the caller of
> storage_get helpers is sleepable. I will remove it and redo the
> benchmark.
yes. that part of the verifier can be removed too.
First I would redo the benchmark numbers with s/gfp_flags/0 in the above line.
* Re: [PATCH RFC bpf-next 2/2] bpf: Use kmalloc_nolock() in local storage unconditionally
2025-11-12 20:04 ` Alexei Starovoitov
@ 2025-11-12 21:14 ` Amery Hung
2025-11-13 1:08 ` Alexei Starovoitov
0 siblings, 1 reply; 9+ messages in thread
From: Amery Hung @ 2025-11-12 21:14 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kumar Kartikeya Dwivedi, KP Singh,
Yonghong Song, Song Liu, Kernel Team
On Wed, Nov 12, 2025 at 12:05 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Nov 12, 2025 at 11:51 AM Amery Hung <ameryhung@gmail.com> wrote:
> >
> > On Wed, Nov 12, 2025 at 11:35 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Wed, Nov 12, 2025 at 9:59 AM Amery Hung <ameryhung@gmail.com> wrote:
> > > >
> > > > @@ -80,23 +80,12 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
> > > > if (mem_charge(smap, owner, smap->elem_size))
> > > > return NULL;
> > > >
> > > > - if (smap->bpf_ma) {
> > > > - selem = bpf_mem_cache_alloc_flags(&smap->selem_ma, gfp_flags);
> > > > - if (selem)
> > > > - /* Keep the original bpf_map_kzalloc behavior
> > > > - * before started using the bpf_mem_cache_alloc.
> > > > - *
> > > > - * No need to use zero_map_value. The bpf_selem_free()
> > > > - * only does bpf_mem_cache_free when there is
> > > > - * no other bpf prog is using the selem.
> > > > - */
> > > > - memset(SDATA(selem)->data, 0, smap->map.value_size);
> > > > - } else {
> > > > - selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
> > > > - gfp_flags | __GFP_NOWARN);
> > > > - }
> > > > + selem = bpf_map_kmalloc_nolock(&smap->map, smap->elem_size, gfp_flags, NUMA_NO_NODE);
> > >
> > >
> > > Pls enable CONFIG_DEBUG_VM=y then you'll see that the above triggers:
> > > void *kmalloc_nolock_noprof(size_t size, gfp_t gfp_flags, int node)
> > > {
> > > gfp_t alloc_gfp = __GFP_NOWARN | __GFP_NOMEMALLOC | gfp_flags;
> > > ...
> > > VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO |
> > > __GFP_NO_OBJ_EXT));
> > >
> > > and benchmarking numbers have to be redone, since with
> > > unsupported gfp flags kmalloc_nolock() is likely doing something wrong.
> >
> > I see. Thanks for pointing it out. Currently the verifier determines
> > the flag and rewrites the program based on if the caller of
> > storage_get helpers is sleepable. I will remove it and redo the
> > benchmark.
>
> yes. that part of the verifier can be removed too.
> First I would redo the benchmark numbers with s/gfp_flags/0 in the above line.
Here are the new numbers after setting gfp_flags = __GFP_ZERO and
removing the memset(0). BTW, the test is done on a physical machine. The
numbers for sk local storage can vary by ±1-2k/s. I will try to increase
the test iterations or kill other unnecessary processes running on the
machine to see if the numbers fluctuate less.
Socket local storage
memory alloc batch creation speed creation speed diff
--------------- ---- ------------------ ----
kzalloc 16 104.217 ± 0.974k/s 4.15 kmallocs/create
(before) 32 104.355 ± 0.606k/s 4.13 kmallocs/create
64 103.611 ± 0.707k/s 4.15 kmallocs/create
kmalloc_nolock 16 100.402 ± 1.282k/s 1.11 kmallocs/create -3.7%
(after) 32 101.592 ± 0.861k/s 1.07 kmallocs/create -2.6%
64 98.995 ± 0.868k/s 1.07 kmallocs/create -4.6%
Task local storage
memory alloc batch creation speed creation speed diff
--------------- ---- ------------------ ----
BPF memory 16 24.668 ± 0.121k/s 2.54 kmallocs/create
allocator 32 22.899 ± 0.097k/s 2.67 kmallocs/create
(before) 64 22.559 ± 0.076k/s 2.56 kmallocs/create
kmalloc_nolock 16 25.659 ± 0.108k/s 2.51 kmallocs/create +4.0%
(after) 32 23.723 ± 0.082k/s 2.54 kmallocs/create +3.6%
64 23.359 ± 0.236k/s 2.48 kmallocs/create +3.5%
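[Editorial note: the diff column can be sanity-checked from the means in the two tables above; a quick script follows. Small rounding differences against the quoted percentages are possible.]

```python
# Mean creation speeds (k creations/s) copied from the tables above,
# as {batch: (before, after)} pairs.
socket = {16: (104.217, 100.402), 32: (104.355, 101.592), 64: (103.611, 98.995)}
task   = {16: (24.668, 25.659),   32: (22.899, 23.723),   64: (22.559, 23.359)}

def pct_diff(before: float, after: float) -> float:
    """Percentage change of the after-mean relative to the before-mean."""
    return (after - before) / before * 100

for name, table in (("socket", socket), ("task", task)):
    for batch, (before, after) in table.items():
        print(f"{name} batch {batch}: {pct_diff(before, after):+.1f}%")
```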
* Re: [PATCH RFC bpf-next 2/2] bpf: Use kmalloc_nolock() in local storage unconditionally
2025-11-12 21:14 ` Amery Hung
@ 2025-11-13 1:08 ` Alexei Starovoitov
2025-11-13 18:58 ` Amery Hung
0 siblings, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2025-11-13 1:08 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kumar Kartikeya Dwivedi, KP Singh,
Yonghong Song, Song Liu, Kernel Team
On Wed, Nov 12, 2025 at 1:15 PM Amery Hung <ameryhung@gmail.com> wrote:
>
> On Wed, Nov 12, 2025 at 12:05 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Nov 12, 2025 at 11:51 AM Amery Hung <ameryhung@gmail.com> wrote:
> > >
> > > On Wed, Nov 12, 2025 at 11:35 AM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Wed, Nov 12, 2025 at 9:59 AM Amery Hung <ameryhung@gmail.com> wrote:
> > > > >
> > > > > @@ -80,23 +80,12 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
> > > > > if (mem_charge(smap, owner, smap->elem_size))
> > > > > return NULL;
> > > > >
> > > > > - if (smap->bpf_ma) {
> > > > > - selem = bpf_mem_cache_alloc_flags(&smap->selem_ma, gfp_flags);
> > > > > - if (selem)
> > > > > - /* Keep the original bpf_map_kzalloc behavior
> > > > > - * before started using the bpf_mem_cache_alloc.
> > > > > - *
> > > > > - * No need to use zero_map_value. The bpf_selem_free()
> > > > > - * only does bpf_mem_cache_free when there is
> > > > > - * no other bpf prog is using the selem.
> > > > > - */
> > > > > - memset(SDATA(selem)->data, 0, smap->map.value_size);
> > > > > - } else {
> > > > > - selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
> > > > > - gfp_flags | __GFP_NOWARN);
> > > > > - }
> > > > > + selem = bpf_map_kmalloc_nolock(&smap->map, smap->elem_size, gfp_flags, NUMA_NO_NODE);
> > > >
> > > >
> > > > Pls enable CONFIG_DEBUG_VM=y then you'll see that the above triggers:
> > > > void *kmalloc_nolock_noprof(size_t size, gfp_t gfp_flags, int node)
> > > > {
> > > > gfp_t alloc_gfp = __GFP_NOWARN | __GFP_NOMEMALLOC | gfp_flags;
> > > > ...
> > > > VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO |
> > > > __GFP_NO_OBJ_EXT));
> > > >
> > > > and benchmarking numbers have to be redone, since with
> > > > unsupported gfp flags kmalloc_nolock() is likely doing something wrong.
> > >
> > > I see. Thanks for pointing it out. Currently the verifier determines
> > > the flag and rewrites the program based on if the caller of
> > > storage_get helpers is sleepable. I will remove it and redo the
> > > benchmark.
> >
> > yes. that part of the verifier can be removed too.
> > First I would redo the benchmark numbers with s/gfp_flags/0 in the above line.
>
> Here are the new numbers after setting gfp_flags = __GFP_ZERO and
> removing memset(0). BTW, the test is done on a physical machine. The
> numbers for sk local storage can change ±1-2. I will try to increase
> the test iteration or kill other unnecessary things running on the
> machine to see if the number fluctuates less.
FYI __GFP_ZERO is pretty much the same as a memset().
There shouldn't be any difference between __GFP_ZERO and a manual memset() after alloc.
> Socket local storage
> memory alloc batch creation speed creation speed diff
> --------------- ---- ------------------ ----
> kzalloc 16 104.217 ± 0.974k/s 4.15 kmallocs/create
> (before) 32 104.355 ± 0.606k/s 4.13 kmallocs/create
> 64 103.611 ± 0.707k/s 4.15 kmallocs/create
>
> kmalloc_nolock 16 100.402 ± 1.282k/s 1.11 kmallocs/create -3.7%
> (after) 32 101.592 ± 0.861k/s 1.07 kmallocs/create -2.6%
> 64 98.995 ± 0.868k/s 1.07 kmallocs/create -4.6%
could you perf record/report both to see where the difference comes from?
The only explanation I can think of for the difference is that kfree_nolock()
is hitting the defer_free() case, but that shouldn't happen
in this microbenchmark. If you see free_deferred_objects()
in the perf report, that would explain it.
* Re: [PATCH RFC bpf-next 2/2] bpf: Use kmalloc_nolock() in local storage unconditionally
2025-11-13 1:08 ` Alexei Starovoitov
@ 2025-11-13 18:58 ` Amery Hung
0 siblings, 0 replies; 9+ messages in thread
From: Amery Hung @ 2025-11-13 18:58 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kumar Kartikeya Dwivedi, KP Singh,
Yonghong Song, Song Liu, Kernel Team
On Wed, Nov 12, 2025 at 5:08 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Nov 12, 2025 at 1:15 PM Amery Hung <ameryhung@gmail.com> wrote:
> >
> > On Wed, Nov 12, 2025 at 12:05 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Wed, Nov 12, 2025 at 11:51 AM Amery Hung <ameryhung@gmail.com> wrote:
> > > >
> > > > On Wed, Nov 12, 2025 at 11:35 AM Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > On Wed, Nov 12, 2025 at 9:59 AM Amery Hung <ameryhung@gmail.com> wrote:
> > > > > >
> > > > > > @@ -80,23 +80,12 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
> > > > > > if (mem_charge(smap, owner, smap->elem_size))
> > > > > > return NULL;
> > > > > >
> > > > > > - if (smap->bpf_ma) {
> > > > > > - selem = bpf_mem_cache_alloc_flags(&smap->selem_ma, gfp_flags);
> > > > > > - if (selem)
> > > > > > - /* Keep the original bpf_map_kzalloc behavior
> > > > > > - * before started using the bpf_mem_cache_alloc.
> > > > > > - *
> > > > > > - * No need to use zero_map_value. The bpf_selem_free()
> > > > > > - * only does bpf_mem_cache_free when there is
> > > > > > - * no other bpf prog is using the selem.
> > > > > > - */
> > > > > > - memset(SDATA(selem)->data, 0, smap->map.value_size);
> > > > > > - } else {
> > > > > > - selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
> > > > > > - gfp_flags | __GFP_NOWARN);
> > > > > > - }
> > > > > > + selem = bpf_map_kmalloc_nolock(&smap->map, smap->elem_size, gfp_flags, NUMA_NO_NODE);
> > > > >
> > > > >
> > > > > Pls enable CONFIG_DEBUG_VM=y then you'll see that the above triggers:
> > > > > void *kmalloc_nolock_noprof(size_t size, gfp_t gfp_flags, int node)
> > > > > {
> > > > > gfp_t alloc_gfp = __GFP_NOWARN | __GFP_NOMEMALLOC | gfp_flags;
> > > > > ...
> > > > > VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO |
> > > > > __GFP_NO_OBJ_EXT));
> > > > >
> > > > > and benchmarking numbers have to be redone, since with
> > > > > unsupported gfp flags kmalloc_nolock() is likely doing something wrong.
> > > >
> > > > I see. Thanks for pointing it out. Currently the verifier determines
> > > > the flag and rewrites the program based on if the caller of
> > > > storage_get helpers is sleepable. I will remove it and redo the
> > > > benchmark.
> > >
> > > yes. that part of the verifier can be removed too.
> > > First I would redo the benchmark numbers with s/gfp_flags/0 in the above line.
> >
> > Here are the new numbers after setting gfp_flags = __GFP_ZERO and
> > removing memset(0). BTW, the test is done on a physical machine. The
> > numbers for sk local storage can change ±1-2. I will try to increase
> > the test iteration or kill other unnecessary things running on the
> > machine to see if the number fluctuates less.
>
> fyi gfp_zero is pretty much the same memset().
> Shouldn't be any difference between __GFP_ZERO vs manual memset() after alloc.
>
> > Socket local storage
> > memory alloc batch creation speed creation speed diff
> > --------------- ---- ------------------ ----
> > kzalloc 16 104.217 ± 0.974k/s 4.15 kmallocs/create
> > (before) 32 104.355 ± 0.606k/s 4.13 kmallocs/create
> > 64 103.611 ± 0.707k/s 4.15 kmallocs/create
> >
> > kmalloc_nolock 16 100.402 ± 1.282k/s 1.11 kmallocs/create -3.7%
> > (after) 32 101.592 ± 0.861k/s 1.07 kmallocs/create -2.6%
> > 64 98.995 ± 0.868k/s 1.07 kmallocs/create -4.6%
>
> could you perf record/report both to see where the difference comes from?
> The only reason I could explain the difference is that kfree_nolock()
> is hitting defer_free() case, but that shouldn't happen
> in this microbenchmark. But if you see free_deferred_objects()
> in perf report that would explain it.
Updated numbers and next step:
A new perf result shows a ~10% creation-speed decrease (130k/s vs
142k/s) for socket local storage after stopping a service that was also
creating socket local storage for every connection.
The performance hit did come from defer_free() and the loss of
batching with kfree_rcu().
Hence, kmalloc_nolock() is not yet ready to replace kzalloc() in this
case. The next respin will only replace the BPF memory allocator in
local storage with kmalloc_nolock().
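[Editorial note: the cited decrease follows from the two throughputs quoted above; a rough check.]

```python
# Throughputs quoted above, in k creations/s.
kzalloc_kps = 142.0         # baseline, batched kfree_rcu() freeing
kmalloc_nolock_kps = 130.0  # hitting the defer_free() path

decrease = (kzalloc_kps - kmalloc_nolock_kps) / kzalloc_kps * 100
print(f"{decrease:.1f}% slower")  # ~8.5%; the email rounds this to "~10%"
```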
Thread overview: 9+ messages
2025-11-12 17:59 [PATCH RFC bpf-next 0/2] Switch to kmalloc_nolock() in BPF local storage Amery Hung
2025-11-12 17:59 ` [PATCH RFC bpf-next 1/2] bpf: Always charge/uncharge memory when allocating/unlinking storage elements Amery Hung
2025-11-12 17:59 ` [PATCH RFC bpf-next 2/2] bpf: Use kmalloc_nolock() in local storage unconditionally Amery Hung
2025-11-12 19:35 ` Alexei Starovoitov
2025-11-12 19:51 ` Amery Hung
2025-11-12 20:04 ` Alexei Starovoitov
2025-11-12 21:14 ` Amery Hung
2025-11-13 1:08 ` Alexei Starovoitov
2025-11-13 18:58 ` Amery Hung