* [PATCH v2 bpf-next 0/4] Replace BPF memory allocator with kmalloc_nolock() in local storage
@ 2025-11-14 20:13 Amery Hung
2025-11-14 20:13 ` [PATCH v2 bpf-next 1/4] bpf: Always charge/uncharge memory when allocating/unlinking storage elements Amery Hung
` (4 more replies)
0 siblings, 5 replies; 16+ messages in thread
From: Amery Hung @ 2025-11-14 20:13 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, martin.lau, memxor,
kpsingh, yonghong.song, song, ameryhung, kernel-team
Hi,
This patchset tries to simplify bpf_local_storage.c by adopting
kmalloc_nolock(). This removes memory preallocation and reduces the
dependency on smap in bpf_selem_free() and bpf_local_storage_free().
The latter will simplify a future refactor that replaces
local_storage->lock and b->lock [1].
RFC v1 tried to switch to kmalloc_nolock() unconditionally. However,
since it caused a substantial performance loss in socket local storage
due to 1) defer_free() in kfree_nolock() and 2) the lack of kfree_rcu()
batching, replacing kzalloc() is postponed until the necessary
improvements land in mm.
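For context, a minimal sketch of the allocation path after patch 4,
condensed from the diff there (bpf_map_kmalloc_nolock() is the
map-charged wrapper around kmalloc_nolock() used by the patch; the
pairing kfree_nolock() call is shown only for illustration):

	/* Allocate a storage element without preallocated caches.
	 * kmalloc_nolock() never sleeps and takes no sleeping locks,
	 * so it is usable in any context and no bpf_mem_alloc pool
	 * needs to be kept around.
	 */
	selem = bpf_map_kmalloc_nolock(&smap->map, smap->elem_size,
				       __GFP_ZERO, NUMA_NO_NODE);
	if (!selem)
		return NULL;		/* fails instead of blocking */
	...
	kfree_nolock(selem);		/* any-context-safe free */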
Benchmark
./bench -p 1 local-storage-create --storage-type <socket,task> \
--batch-size <16,32,64>
The benchmark is a microbenchmark stress-testing how fast local storage
can be created. For task local storage, switching from the BPF memory
allocator to kmalloc_nolock() yields a small improvement. For socket
local storage, performance remains roughly the same, as expected, since
nothing in its allocation path has changed.
Socket local storage

memory alloc     batch  creation speed      kmallocs/create  diff
---------------  -----  ------------------  ---------------  -----
kzalloc          16     144.149 ± 0.642k/s  3.10
(before)         32     144.379 ± 1.070k/s  3.08
                 64     144.491 ± 0.818k/s  3.13
kzalloc          16     146.180 ± 1.403k/s  3.10             +1.4%
(not changed)    32     146.245 ± 1.272k/s  3.10             +1.3%
                 64     145.012 ± 1.545k/s  3.10             +0.4%
Task local storage

memory alloc     batch  creation speed      kmallocs/create  diff
---------------  -----  ------------------  ---------------  -----
BPF memory       16     24.668 ± 0.121k/s   2.54
allocator        32     22.899 ± 0.097k/s   2.67
(before)         64     22.559 ± 0.076k/s   2.56
kmalloc_nolock   16     25.796 ± 0.059k/s   2.52             +4.6%
(after)          32     23.412 ± 0.069k/s   2.50             +2.2%
                 64     23.717 ± 0.108k/s   2.60             +5.1%
[1] https://lore.kernel.org/bpf/20251002225356.1505480-1-ameryhung@gmail.com/
v1 -> v2
- Only replace BPF memory allocator with kmalloc_nolock()
Link: https://lore.kernel.org/bpf/20251112175939.2365295-1-ameryhung@gmail.com/
---
Amery Hung (4):
bpf: Always charge/uncharge memory when allocating/unlinking storage
elements
bpf: Remove smap argument from bpf_selem_free()
bpf: Save memory allocation info in bpf_local_storage
bpf: Replace bpf memory allocator with kmalloc_nolock() in local
storage
include/linux/bpf_local_storage.h | 10 +-
kernel/bpf/bpf_local_storage.c | 235 +++++++++---------------------
net/core/bpf_sk_storage.c | 4 +-
3 files changed, 74 insertions(+), 175 deletions(-)
--
2.47.3
* [PATCH v2 bpf-next 1/4] bpf: Always charge/uncharge memory when allocating/unlinking storage elements
2025-11-14 20:13 [PATCH v2 bpf-next 0/4] Replace BPF memory allocator with kmalloc_nolock() in local storage Amery Hung
@ 2025-11-14 20:13 ` Amery Hung
2025-11-17 18:25 ` Martin KaFai Lau
2025-11-14 20:13 ` [PATCH v2 bpf-next 2/4] bpf: Remove smap argument from bpf_selem_free() Amery Hung
` (3 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Amery Hung @ 2025-11-14 20:13 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, martin.lau, memxor,
kpsingh, yonghong.song, song, ameryhung, kernel-team
Since commit a96a44aba556 ("bpf: bpf_sk_storage: Fix invalid wait
context lockdep report"), {charge,uncharge}_mem are always true when
allocating a bpf_local_storage_elem or unlinking a bpf_local_storage_elem
from local storage, so drop these arguments. No functional change.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
include/linux/bpf_local_storage.h | 2 +-
kernel/bpf/bpf_local_storage.c | 22 ++++++++++------------
net/core/bpf_sk_storage.c | 2 +-
3 files changed, 12 insertions(+), 14 deletions(-)
diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index 782f58feea35..3663eabcc3ff 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -184,7 +184,7 @@ void bpf_selem_link_map(struct bpf_local_storage_map *smap,
struct bpf_local_storage_elem *
bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value,
- bool charge_mem, bool swap_uptrs, gfp_t gfp_flags);
+ bool swap_uptrs, gfp_t gfp_flags);
void bpf_selem_free(struct bpf_local_storage_elem *selem,
struct bpf_local_storage_map *smap,
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index b931fbceb54d..400bdf8a3eb2 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -73,11 +73,11 @@ static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
struct bpf_local_storage_elem *
bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
- void *value, bool charge_mem, bool swap_uptrs, gfp_t gfp_flags)
+ void *value, bool swap_uptrs, gfp_t gfp_flags)
{
struct bpf_local_storage_elem *selem;
- if (charge_mem && mem_charge(smap, owner, smap->elem_size))
+ if (mem_charge(smap, owner, smap->elem_size))
return NULL;
if (smap->bpf_ma) {
@@ -106,8 +106,7 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
return selem;
}
- if (charge_mem)
- mem_uncharge(smap, owner, smap->elem_size);
+ mem_uncharge(smap, owner, smap->elem_size);
return NULL;
}
@@ -284,7 +283,7 @@ static void bpf_selem_free_list(struct hlist_head *list, bool reuse_now)
*/
static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
struct bpf_local_storage_elem *selem,
- bool uncharge_mem, struct hlist_head *free_selem_list)
+ struct hlist_head *free_selem_list)
{
struct bpf_local_storage_map *smap;
bool free_local_storage;
@@ -297,8 +296,7 @@ static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_stor
* The owner may be freed once the last selem is unlinked
* from local_storage.
*/
- if (uncharge_mem)
- mem_uncharge(smap, owner, smap->elem_size);
+ mem_uncharge(smap, owner, smap->elem_size);
free_local_storage = hlist_is_singular_node(&selem->snode,
&local_storage->list);
@@ -393,7 +391,7 @@ static void bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
raw_spin_lock_irqsave(&local_storage->lock, flags);
if (likely(selem_linked_to_storage(selem)))
free_local_storage = bpf_selem_unlink_storage_nolock(
- local_storage, selem, true, &selem_free_list);
+ local_storage, selem, &selem_free_list);
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
bpf_selem_free_list(&selem_free_list, reuse_now);
@@ -582,7 +580,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
if (err)
return ERR_PTR(err);
- selem = bpf_selem_alloc(smap, owner, value, true, swap_uptrs, gfp_flags);
+ selem = bpf_selem_alloc(smap, owner, value, swap_uptrs, gfp_flags);
if (!selem)
return ERR_PTR(-ENOMEM);
@@ -616,7 +614,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
/* A lookup has just been done before and concluded a new selem is
* needed. The chance of an unnecessary alloc is unlikely.
*/
- alloc_selem = selem = bpf_selem_alloc(smap, owner, value, true, swap_uptrs, gfp_flags);
+ alloc_selem = selem = bpf_selem_alloc(smap, owner, value, swap_uptrs, gfp_flags);
if (!alloc_selem)
return ERR_PTR(-ENOMEM);
@@ -656,7 +654,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
if (old_sdata) {
bpf_selem_unlink_map(SELEM(old_sdata));
bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
- true, &old_selem_free_list);
+ &old_selem_free_list);
}
unlock:
@@ -762,7 +760,7 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
* of the loop will set the free_cgroup_storage to true.
*/
free_storage = bpf_selem_unlink_storage_nolock(
- local_storage, selem, true, &free_selem_list);
+ local_storage, selem, &free_selem_list);
}
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index d3fbaf89a698..bd3c686edc0b 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -136,7 +136,7 @@ bpf_sk_storage_clone_elem(struct sock *newsk,
{
struct bpf_local_storage_elem *copy_selem;
- copy_selem = bpf_selem_alloc(smap, newsk, NULL, true, false, GFP_ATOMIC);
+ copy_selem = bpf_selem_alloc(smap, newsk, NULL, false, GFP_ATOMIC);
if (!copy_selem)
return NULL;
--
2.47.3
* [PATCH v2 bpf-next 2/4] bpf: Remove smap argument from bpf_selem_free()
2025-11-14 20:13 [PATCH v2 bpf-next 0/4] Replace BPF memory allocator with kmalloc_nolock() in local storage Amery Hung
2025-11-14 20:13 ` [PATCH v2 bpf-next 1/4] bpf: Always charge/uncharge memory when allocating/unlinking storage elements Amery Hung
@ 2025-11-14 20:13 ` Amery Hung
2025-11-17 18:32 ` Martin KaFai Lau
2025-11-14 20:13 ` [PATCH v2 bpf-next 3/4] bpf: Save memory allocation info in bpf_local_storage Amery Hung
` (2 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Amery Hung @ 2025-11-14 20:13 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, martin.lau, memxor,
kpsingh, yonghong.song, song, ameryhung, kernel-team
Since selem already saves a pointer to smap, use it instead of an
additional argument in bpf_selem_free(). This requires moving the
SDATA(selem)->smap assignment from bpf_selem_link_map() to
bpf_selem_alloc() since bpf_selem_free() may be called without the
selem being linked to smap in bpf_local_storage_update().
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
include/linux/bpf_local_storage.h | 1 -
kernel/bpf/bpf_local_storage.c | 19 ++++++++++---------
net/core/bpf_sk_storage.c | 2 +-
3 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index 3663eabcc3ff..4ab137e75f33 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -187,7 +187,6 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value,
bool swap_uptrs, gfp_t gfp_flags);
void bpf_selem_free(struct bpf_local_storage_elem *selem,
- struct bpf_local_storage_map *smap,
bool reuse_now);
int
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 400bdf8a3eb2..95a5ea618cc5 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -97,6 +97,8 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
}
if (selem) {
+ RCU_INIT_POINTER(SDATA(selem)->smap, smap);
+
if (value) {
/* No need to call check_and_init_map_value as memory is zero init */
copy_map_value(&smap->map, SDATA(selem)->data, value);
@@ -227,9 +229,12 @@ static void bpf_selem_free_trace_rcu(struct rcu_head *rcu)
}
void bpf_selem_free(struct bpf_local_storage_elem *selem,
- struct bpf_local_storage_map *smap,
bool reuse_now)
{
+ struct bpf_local_storage_map *smap;
+
+ smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
+
if (!smap->bpf_ma) {
/* Only task storage has uptrs and task storage
* has moved to bpf_mem_alloc. Meaning smap->bpf_ma == true
@@ -263,7 +268,6 @@ void bpf_selem_free(struct bpf_local_storage_elem *selem,
static void bpf_selem_free_list(struct hlist_head *list, bool reuse_now)
{
struct bpf_local_storage_elem *selem;
- struct bpf_local_storage_map *smap;
struct hlist_node *n;
/* The "_safe" iteration is needed.
@@ -271,10 +275,8 @@ static void bpf_selem_free_list(struct hlist_head *list, bool reuse_now)
* but bpf_selem_free will use the selem->rcu_head
* which is union-ized with the selem->free_node.
*/
- hlist_for_each_entry_safe(selem, n, list, free_node) {
- smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
- bpf_selem_free(selem, smap, reuse_now);
- }
+ hlist_for_each_entry_safe(selem, n, list, free_node)
+ bpf_selem_free(selem, reuse_now);
}
/* local_storage->lock must be held and selem->local_storage == local_storage.
@@ -432,7 +434,6 @@ void bpf_selem_link_map(struct bpf_local_storage_map *smap,
unsigned long flags;
raw_spin_lock_irqsave(&b->lock, flags);
- RCU_INIT_POINTER(SDATA(selem)->smap, smap);
hlist_add_head_rcu(&selem->map_node, &b->list);
raw_spin_unlock_irqrestore(&b->lock, flags);
}
@@ -586,7 +587,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
err = bpf_local_storage_alloc(owner, smap, selem, gfp_flags);
if (err) {
- bpf_selem_free(selem, smap, true);
+ bpf_selem_free(selem, true);
mem_uncharge(smap, owner, smap->elem_size);
return ERR_PTR(err);
}
@@ -662,7 +663,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
bpf_selem_free_list(&old_selem_free_list, false);
if (alloc_selem) {
mem_uncharge(smap, owner, smap->elem_size);
- bpf_selem_free(alloc_selem, smap, true);
+ bpf_selem_free(alloc_selem, true);
}
return err ? ERR_PTR(err) : SDATA(selem);
}
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index bd3c686edc0b..850dd736ccd1 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -196,7 +196,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
} else {
ret = bpf_local_storage_alloc(newsk, smap, copy_selem, GFP_ATOMIC);
if (ret) {
- bpf_selem_free(copy_selem, smap, true);
+ bpf_selem_free(copy_selem, true);
atomic_sub(smap->elem_size,
&newsk->sk_omem_alloc);
bpf_map_put(map);
--
2.47.3
* [PATCH v2 bpf-next 3/4] bpf: Save memory allocation info in bpf_local_storage
2025-11-14 20:13 [PATCH v2 bpf-next 0/4] Replace BPF memory allocator with kmalloc_nolock() in local storage Amery Hung
2025-11-14 20:13 ` [PATCH v2 bpf-next 1/4] bpf: Always charge/uncharge memory when allocating/unlinking storage elements Amery Hung
2025-11-14 20:13 ` [PATCH v2 bpf-next 2/4] bpf: Remove smap argument from bpf_selem_free() Amery Hung
@ 2025-11-14 20:13 ` Amery Hung
2025-11-17 18:36 ` Martin KaFai Lau
2025-11-14 20:13 ` [PATCH v2 bpf-next 4/4] bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage Amery Hung
2025-11-19 0:30 ` [PATCH v2 bpf-next 0/4] Replace BPF " patchwork-bot+netdevbpf
4 siblings, 1 reply; 16+ messages in thread
From: Amery Hung @ 2025-11-14 20:13 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, martin.lau, memxor,
kpsingh, yonghong.song, song, ameryhung, kernel-team
Save the memory allocation method used for bpf_local_storage in the
struct explicitly so that we don't need to go through the hassle of
finding out the info. When a later patch replaces the BPF memory
allocator with kmalloc_nolock(), bpf_local_storage_free() will no
longer need smap->storage_ma to return the memory, completely removing
the dependency on smap in bpf_local_storage_free().
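In short, condensed from the diff below: the flag is captured once at
allocation time and consulted at free time, instead of being re-derived
from smap or a selem:

	/* bpf_local_storage_alloc(): record the method once */
	storage->bpf_ma = smap->bpf_ma;

	/* bpf_local_storage_free(): consult the recorded flag */
	if (!local_storage->bpf_ma) {
		__bpf_local_storage_free(local_storage, reuse_now);
		return;
	}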
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
include/linux/bpf_local_storage.h | 1 +
kernel/bpf/bpf_local_storage.c | 52 +++++--------------------------
2 files changed, 9 insertions(+), 44 deletions(-)
diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index 4ab137e75f33..7fef0cec8340 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -97,6 +97,7 @@ struct bpf_local_storage {
*/
struct rcu_head rcu;
raw_spinlock_t lock; /* Protect adding/removing from the "list" */
+ bool bpf_ma;
};
/* U16_MAX is much more than enough for sk local storage
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 95a5ea618cc5..3c04b9d85860 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -157,12 +157,12 @@ static void __bpf_local_storage_free(struct bpf_local_storage *local_storage,
static void bpf_local_storage_free(struct bpf_local_storage *local_storage,
struct bpf_local_storage_map *smap,
- bool bpf_ma, bool reuse_now)
+ bool reuse_now)
{
if (!local_storage)
return;
- if (!bpf_ma) {
+ if (!local_storage->bpf_ma) {
__bpf_local_storage_free(local_storage, reuse_now);
return;
}
@@ -336,47 +336,12 @@ static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_stor
return free_local_storage;
}
-static bool check_storage_bpf_ma(struct bpf_local_storage *local_storage,
- struct bpf_local_storage_map *storage_smap,
- struct bpf_local_storage_elem *selem)
-{
-
- struct bpf_local_storage_map *selem_smap;
-
- /* local_storage->smap may be NULL. If it is, get the bpf_ma
- * from any selem in the local_storage->list. The bpf_ma of all
- * local_storage and selem should have the same value
- * for the same map type.
- *
- * If the local_storage->list is already empty, the caller will not
- * care about the bpf_ma value also because the caller is not
- * responsible to free the local_storage.
- */
-
- if (storage_smap)
- return storage_smap->bpf_ma;
-
- if (!selem) {
- struct hlist_node *n;
-
- n = rcu_dereference_check(hlist_first_rcu(&local_storage->list),
- bpf_rcu_lock_held());
- if (!n)
- return false;
-
- selem = hlist_entry(n, struct bpf_local_storage_elem, snode);
- }
- selem_smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
-
- return selem_smap->bpf_ma;
-}
-
static void bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
bool reuse_now)
{
struct bpf_local_storage_map *storage_smap;
struct bpf_local_storage *local_storage;
- bool bpf_ma, free_local_storage = false;
+ bool free_local_storage = false;
HLIST_HEAD(selem_free_list);
unsigned long flags;
@@ -388,7 +353,6 @@ static void bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
bpf_rcu_lock_held());
storage_smap = rcu_dereference_check(local_storage->smap,
bpf_rcu_lock_held());
- bpf_ma = check_storage_bpf_ma(local_storage, storage_smap, selem);
raw_spin_lock_irqsave(&local_storage->lock, flags);
if (likely(selem_linked_to_storage(selem)))
@@ -399,7 +363,7 @@ static void bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
bpf_selem_free_list(&selem_free_list, reuse_now);
if (free_local_storage)
- bpf_local_storage_free(local_storage, storage_smap, bpf_ma, reuse_now);
+ bpf_local_storage_free(local_storage, storage_smap, reuse_now);
}
void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
@@ -506,6 +470,7 @@ int bpf_local_storage_alloc(void *owner,
INIT_HLIST_HEAD(&storage->list);
raw_spin_lock_init(&storage->lock);
storage->owner = owner;
+ storage->bpf_ma = smap->bpf_ma;
bpf_selem_link_storage_nolock(storage, first_selem);
bpf_selem_link_map(smap, first_selem);
@@ -542,7 +507,7 @@ int bpf_local_storage_alloc(void *owner,
return 0;
uncharge:
- bpf_local_storage_free(storage, smap, smap->bpf_ma, true);
+ bpf_local_storage_free(storage, smap, true);
mem_uncharge(smap, owner, sizeof(*storage));
return err;
}
@@ -731,13 +696,12 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
{
struct bpf_local_storage_map *storage_smap;
struct bpf_local_storage_elem *selem;
- bool bpf_ma, free_storage = false;
+ bool free_storage = false;
HLIST_HEAD(free_selem_list);
struct hlist_node *n;
unsigned long flags;
storage_smap = rcu_dereference_check(local_storage->smap, bpf_rcu_lock_held());
- bpf_ma = check_storage_bpf_ma(local_storage, storage_smap, NULL);
/* Neither the bpf_prog nor the bpf_map's syscall
* could be modifying the local_storage->list now.
@@ -768,7 +732,7 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
bpf_selem_free_list(&free_selem_list, true);
if (free_storage)
- bpf_local_storage_free(local_storage, storage_smap, bpf_ma, true);
+ bpf_local_storage_free(local_storage, storage_smap, true);
}
u64 bpf_local_storage_map_mem_usage(const struct bpf_map *map)
--
2.47.3
* [PATCH v2 bpf-next 4/4] bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage
2025-11-14 20:13 [PATCH v2 bpf-next 0/4] Replace BPF memory allocator with kmalloc_nolock() in local storage Amery Hung
` (2 preceding siblings ...)
2025-11-14 20:13 ` [PATCH v2 bpf-next 3/4] bpf: Save memory allocation info in bpf_local_storage Amery Hung
@ 2025-11-14 20:13 ` Amery Hung
2025-11-15 2:01 ` Alexei Starovoitov
2025-11-19 0:30 ` [PATCH v2 bpf-next 0/4] Replace BPF " patchwork-bot+netdevbpf
4 siblings, 1 reply; 16+ messages in thread
From: Amery Hung @ 2025-11-14 20:13 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, martin.lau, memxor,
kpsingh, yonghong.song, song, ameryhung, kernel-team
Replace the BPF memory allocator with kmalloc_nolock() to reduce memory
wastage due to preallocation.
In bpf_selem_free(), a selem now needs to wait for an RCU grace period
before being freed even when reuse_now == true. Therefore, rcu_barrier()
should always be called in bpf_local_storage_map_free().
In bpf_local_storage_free(), since smap->storage_ma is no longer needed
to return the memory, the function is now independent of smap.
Remove the outdated comment in bpf_local_storage_alloc(). Since commit
c0d63f309186 ("bpf: Add bpf_selem_free()"), bpf_local_storage_update()
already frees the selem after an RCU grace period when
bpf_local_storage_alloc() fails the cmpxchg.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
include/linux/bpf_local_storage.h | 8 +-
kernel/bpf/bpf_local_storage.c | 152 +++++++++---------------------
2 files changed, 48 insertions(+), 112 deletions(-)
diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index 7fef0cec8340..66432248cd81 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -53,9 +53,7 @@ struct bpf_local_storage_map {
u32 bucket_log;
u16 elem_size;
u16 cache_idx;
- struct bpf_mem_alloc selem_ma;
- struct bpf_mem_alloc storage_ma;
- bool bpf_ma;
+ bool use_kmalloc_nolock;
};
struct bpf_local_storage_data {
@@ -97,7 +95,7 @@ struct bpf_local_storage {
*/
struct rcu_head rcu;
raw_spinlock_t lock; /* Protect adding/removing from the "list" */
- bool bpf_ma;
+ bool use_kmalloc_nolock;
};
/* U16_MAX is much more than enough for sk local storage
@@ -131,7 +129,7 @@ int bpf_local_storage_map_alloc_check(union bpf_attr *attr);
struct bpf_map *
bpf_local_storage_map_alloc(union bpf_attr *attr,
struct bpf_local_storage_cache *cache,
- bool bpf_ma);
+ bool use_kmalloc_nolock);
void __bpf_local_storage_insert_cache(struct bpf_local_storage *local_storage,
struct bpf_local_storage_map *smap,
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 3c04b9d85860..e2fe6c32822b 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -80,17 +80,9 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
if (mem_charge(smap, owner, smap->elem_size))
return NULL;
- if (smap->bpf_ma) {
- selem = bpf_mem_cache_alloc_flags(&smap->selem_ma, gfp_flags);
- if (selem)
- /* Keep the original bpf_map_kzalloc behavior
- * before started using the bpf_mem_cache_alloc.
- *
- * No need to use zero_map_value. The bpf_selem_free()
- * only does bpf_mem_cache_free when there is
- * no other bpf prog is using the selem.
- */
- memset(SDATA(selem)->data, 0, smap->map.value_size);
+ if (smap->use_kmalloc_nolock) {
+ selem = bpf_map_kmalloc_nolock(&smap->map, smap->elem_size,
+ __GFP_ZERO, NUMA_NO_NODE);
} else {
selem = bpf_map_kzalloc(&smap->map, smap->elem_size,
gfp_flags | __GFP_NOWARN);
@@ -113,7 +105,7 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
return NULL;
}
-/* rcu tasks trace callback for bpf_ma == false */
+/* rcu tasks trace callback for use_kmalloc_nolock == false */
static void __bpf_local_storage_free_trace_rcu(struct rcu_head *rcu)
{
struct bpf_local_storage *local_storage;
@@ -128,12 +120,23 @@ static void __bpf_local_storage_free_trace_rcu(struct rcu_head *rcu)
kfree_rcu(local_storage, rcu);
}
+/* Handle use_kmalloc_nolock == false */
+static void __bpf_local_storage_free(struct bpf_local_storage *local_storage,
+ bool vanilla_rcu)
+{
+ if (vanilla_rcu)
+ kfree_rcu(local_storage, rcu);
+ else
+ call_rcu_tasks_trace(&local_storage->rcu,
+ __bpf_local_storage_free_trace_rcu);
+}
+
static void bpf_local_storage_free_rcu(struct rcu_head *rcu)
{
struct bpf_local_storage *local_storage;
local_storage = container_of(rcu, struct bpf_local_storage, rcu);
- bpf_mem_cache_raw_free(local_storage);
+ kfree_nolock(local_storage);
}
static void bpf_local_storage_free_trace_rcu(struct rcu_head *rcu)
@@ -144,46 +147,27 @@ static void bpf_local_storage_free_trace_rcu(struct rcu_head *rcu)
call_rcu(rcu, bpf_local_storage_free_rcu);
}
-/* Handle bpf_ma == false */
-static void __bpf_local_storage_free(struct bpf_local_storage *local_storage,
- bool vanilla_rcu)
-{
- if (vanilla_rcu)
- kfree_rcu(local_storage, rcu);
- else
- call_rcu_tasks_trace(&local_storage->rcu,
- __bpf_local_storage_free_trace_rcu);
-}
-
static void bpf_local_storage_free(struct bpf_local_storage *local_storage,
- struct bpf_local_storage_map *smap,
bool reuse_now)
{
if (!local_storage)
return;
- if (!local_storage->bpf_ma) {
+ if (!local_storage->use_kmalloc_nolock) {
__bpf_local_storage_free(local_storage, reuse_now);
return;
}
- if (!reuse_now) {
- call_rcu_tasks_trace(&local_storage->rcu,
- bpf_local_storage_free_trace_rcu);
+ if (reuse_now) {
+ call_rcu(&local_storage->rcu, bpf_local_storage_free_rcu);
return;
}
- if (smap)
- bpf_mem_cache_free(&smap->storage_ma, local_storage);
- else
- /* smap could be NULL if the selem that triggered
- * this 'local_storage' creation had been long gone.
- * In this case, directly do call_rcu().
- */
- call_rcu(&local_storage->rcu, bpf_local_storage_free_rcu);
+ call_rcu_tasks_trace(&local_storage->rcu,
+ bpf_local_storage_free_trace_rcu);
}
-/* rcu tasks trace callback for bpf_ma == false */
+/* rcu tasks trace callback for use_kmalloc_nolock == false */
static void __bpf_selem_free_trace_rcu(struct rcu_head *rcu)
{
struct bpf_local_storage_elem *selem;
@@ -195,7 +179,7 @@ static void __bpf_selem_free_trace_rcu(struct rcu_head *rcu)
kfree_rcu(selem, rcu);
}
-/* Handle bpf_ma == false */
+/* Handle use_kmalloc_nolock == false */
static void __bpf_selem_free(struct bpf_local_storage_elem *selem,
bool vanilla_rcu)
{
@@ -217,7 +201,7 @@ static void bpf_selem_free_rcu(struct rcu_head *rcu)
migrate_disable();
bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
migrate_enable();
- bpf_mem_cache_raw_free(selem);
+ kfree_nolock(selem);
}
static void bpf_selem_free_trace_rcu(struct rcu_head *rcu)
@@ -235,11 +219,11 @@ void bpf_selem_free(struct bpf_local_storage_elem *selem,
smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
- if (!smap->bpf_ma) {
- /* Only task storage has uptrs and task storage
- * has moved to bpf_mem_alloc. Meaning smap->bpf_ma == true
- * for task storage, so this bpf_obj_free_fields() won't unpin
- * any uptr.
+ if (!smap->use_kmalloc_nolock) {
+ /*
+ * No uptr will be unpinned even when reuse_now == false since uptr
+ * is only supported in task local storage, where
+ * smap->use_kmalloc_nolock == true.
*/
bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
__bpf_selem_free(selem, reuse_now);
@@ -247,18 +231,11 @@ void bpf_selem_free(struct bpf_local_storage_elem *selem,
}
if (reuse_now) {
- /* reuse_now == true only happens when the storage owner
- * (e.g. task_struct) is being destructed or the map itself
- * is being destructed (ie map_free). In both cases,
- * no bpf prog can have a hold on the selem. It is
- * safe to unpin the uptrs and free the selem now.
- */
- bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
- /* Instead of using the vanilla call_rcu(),
- * bpf_mem_cache_free will be able to reuse selem
- * immediately.
+ /*
+ * While it is okay to call bpf_obj_free_fields() that unpins uptr when
+ * reuse_now == true, keep it in bpf_selem_free_rcu() for simplicity.
*/
- bpf_mem_cache_free(&smap->selem_ma, selem);
+ call_rcu(&selem->rcu, bpf_selem_free_rcu);
return;
}
@@ -339,7 +316,6 @@ static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_stor
static void bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
bool reuse_now)
{
- struct bpf_local_storage_map *storage_smap;
struct bpf_local_storage *local_storage;
bool free_local_storage = false;
HLIST_HEAD(selem_free_list);
@@ -351,8 +327,6 @@ static void bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
local_storage = rcu_dereference_check(selem->local_storage,
bpf_rcu_lock_held());
- storage_smap = rcu_dereference_check(local_storage->smap,
- bpf_rcu_lock_held());
raw_spin_lock_irqsave(&local_storage->lock, flags);
if (likely(selem_linked_to_storage(selem)))
@@ -363,7 +337,7 @@ static void bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
bpf_selem_free_list(&selem_free_list, reuse_now);
if (free_local_storage)
- bpf_local_storage_free(local_storage, storage_smap, reuse_now);
+ bpf_local_storage_free(local_storage, reuse_now);
}
void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
@@ -456,8 +430,9 @@ int bpf_local_storage_alloc(void *owner,
if (err)
return err;
- if (smap->bpf_ma)
- storage = bpf_mem_cache_alloc_flags(&smap->storage_ma, gfp_flags);
+ if (smap->use_kmalloc_nolock)
+ storage = bpf_map_kmalloc_nolock(&smap->map, sizeof(*storage),
+ __GFP_ZERO, NUMA_NO_NODE);
else
storage = bpf_map_kzalloc(&smap->map, sizeof(*storage),
gfp_flags | __GFP_NOWARN);
@@ -470,7 +445,7 @@ int bpf_local_storage_alloc(void *owner,
INIT_HLIST_HEAD(&storage->list);
raw_spin_lock_init(&storage->lock);
storage->owner = owner;
- storage->bpf_ma = smap->bpf_ma;
+ storage->use_kmalloc_nolock = smap->use_kmalloc_nolock;
bpf_selem_link_storage_nolock(storage, first_selem);
bpf_selem_link_map(smap, first_selem);
@@ -492,22 +467,12 @@ int bpf_local_storage_alloc(void *owner,
bpf_selem_unlink_map(first_selem);
err = -EAGAIN;
goto uncharge;
-
- /* Note that even first_selem was linked to smap's
- * bucket->list, first_selem can be freed immediately
- * (instead of kfree_rcu) because
- * bpf_local_storage_map_free() does a
- * synchronize_rcu_mult (waiting for both sleepable and
- * normal programs) before walking the bucket->list.
- * Hence, no one is accessing selem from the
- * bucket->list under rcu_read_lock().
- */
}
return 0;
uncharge:
- bpf_local_storage_free(storage, smap, true);
+ bpf_local_storage_free(storage, true);
mem_uncharge(smap, owner, sizeof(*storage));
return err;
}
@@ -694,15 +659,12 @@ int bpf_local_storage_map_check_btf(const struct bpf_map *map,
void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
{
- struct bpf_local_storage_map *storage_smap;
struct bpf_local_storage_elem *selem;
bool free_storage = false;
HLIST_HEAD(free_selem_list);
struct hlist_node *n;
unsigned long flags;
- storage_smap = rcu_dereference_check(local_storage->smap, bpf_rcu_lock_held());
-
/* Neither the bpf_prog nor the bpf_map's syscall
* could be modifying the local_storage->list now.
* Thus, no elem can be added to or deleted from the
@@ -732,7 +694,7 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
bpf_selem_free_list(&free_selem_list, true);
if (free_storage)
- bpf_local_storage_free(local_storage, storage_smap, true);
+ bpf_local_storage_free(local_storage, true);
}
u64 bpf_local_storage_map_mem_usage(const struct bpf_map *map)
@@ -745,20 +707,10 @@ u64 bpf_local_storage_map_mem_usage(const struct bpf_map *map)
return usage;
}
-/* When bpf_ma == true, the bpf_mem_alloc is used to allocate and free memory.
- * A deadlock free allocator is useful for storage that the bpf prog can easily
- * get a hold of the owner PTR_TO_BTF_ID in any context. eg. bpf_get_current_task_btf.
- * The task and cgroup storage fall into this case. The bpf_mem_alloc reuses
- * memory immediately. To be reuse-immediate safe, the owner destruction
- * code path needs to go through a rcu grace period before calling
- * bpf_local_storage_destroy().
- *
- * When bpf_ma == false, the kmalloc and kfree are used.
- */
struct bpf_map *
bpf_local_storage_map_alloc(union bpf_attr *attr,
struct bpf_local_storage_cache *cache,
- bool bpf_ma)
+ bool use_kmalloc_nolock)
{
struct bpf_local_storage_map *smap;
unsigned int i;
@@ -792,20 +744,9 @@ bpf_local_storage_map_alloc(union bpf_attr *attr,
/* In PREEMPT_RT, kmalloc(GFP_ATOMIC) is still not safe in non
* preemptible context. Thus, enforce all storages to use
- * bpf_mem_alloc when CONFIG_PREEMPT_RT is enabled.
+ * kmalloc_nolock() when CONFIG_PREEMPT_RT is enabled.
*/
- smap->bpf_ma = IS_ENABLED(CONFIG_PREEMPT_RT) ? true : bpf_ma;
- if (smap->bpf_ma) {
- err = bpf_mem_alloc_init(&smap->selem_ma, smap->elem_size, false);
- if (err)
- goto free_smap;
-
- err = bpf_mem_alloc_init(&smap->storage_ma, sizeof(struct bpf_local_storage), false);
- if (err) {
- bpf_mem_alloc_destroy(&smap->selem_ma);
- goto free_smap;
- }
- }
+ smap->use_kmalloc_nolock = IS_ENABLED(CONFIG_PREEMPT_RT) ? true : use_kmalloc_nolock;
smap->cache_idx = bpf_local_storage_cache_idx_get(cache);
return &smap->map;
@@ -875,12 +816,9 @@ void bpf_local_storage_map_free(struct bpf_map *map,
*/
synchronize_rcu();
- if (smap->bpf_ma) {
+ if (smap->use_kmalloc_nolock) {
rcu_barrier_tasks_trace();
- if (!rcu_trace_implies_rcu_gp())
- rcu_barrier();
- bpf_mem_alloc_destroy(&smap->selem_ma);
- bpf_mem_alloc_destroy(&smap->storage_ma);
+ rcu_barrier();
}
kvfree(smap->buckets);
bpf_map_area_free(smap);
--
2.47.3
* Re: [PATCH v2 bpf-next 4/4] bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage
2025-11-14 20:13 ` [PATCH v2 bpf-next 4/4] bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage Amery Hung
@ 2025-11-15 2:01 ` Alexei Starovoitov
2025-11-17 19:21 ` Martin KaFai Lau
2025-11-17 20:37 ` Amery Hung
0 siblings, 2 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2025-11-15 2:01 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kumar Kartikeya Dwivedi, KP Singh,
Yonghong Song, Song Liu, Kernel Team
On Fri, Nov 14, 2025 at 12:13 PM Amery Hung <ameryhung@gmail.com> wrote:
>
>
> - if (smap->bpf_ma) {
> + if (smap->use_kmalloc_nolock) {
> rcu_barrier_tasks_trace();
> - if (!rcu_trace_implies_rcu_gp())
> - rcu_barrier();
> - bpf_mem_alloc_destroy(&smap->selem_ma);
> - bpf_mem_alloc_destroy(&smap->storage_ma);
> + rcu_barrier();
Why unconditional rcu_barrier() ?
It's implied in rcu_barrier_tasks_trace().
What am I missing?
The rest looks good.
If that's the only issue, I can fix it up while applying.
* Re: [PATCH v2 bpf-next 1/4] bpf: Always charge/uncharge memory when allocating/unlinking storage elements
2025-11-14 20:13 ` [PATCH v2 bpf-next 1/4] bpf: Always charge/uncharge memory when allocating/unlinking storage elements Amery Hung
@ 2025-11-17 18:25 ` Martin KaFai Lau
0 siblings, 0 replies; 16+ messages in thread
From: Martin KaFai Lau @ 2025-11-17 18:25 UTC (permalink / raw)
To: Amery Hung
Cc: netdev, alexei.starovoitov, andrii, daniel, martin.lau, memxor,
kpsingh, yonghong.song, song, kernel-team, bpf
On 11/14/25 12:13 PM, Amery Hung wrote:
> Since commit a96a44aba556 ("bpf: bpf_sk_storage: Fix invalid wait
> context lockdep report"), {charge,uncharge}_mem are always true when
> allocating a bpf_local_storage_elem or unlinking a bpf_local_storage_elem
> from local storage, so drop these arguments. No functional change.
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
* Re: [PATCH v2 bpf-next 2/4] bpf: Remove smap argument from bpf_selem_free()
2025-11-14 20:13 ` [PATCH v2 bpf-next 2/4] bpf: Remove smap argument from bpf_selem_free() Amery Hung
@ 2025-11-17 18:32 ` Martin KaFai Lau
0 siblings, 0 replies; 16+ messages in thread
From: Martin KaFai Lau @ 2025-11-17 18:32 UTC (permalink / raw)
To: Amery Hung
Cc: netdev, alexei.starovoitov, andrii, daniel, martin.lau, memxor,
kpsingh, yonghong.song, song, kernel-team, bpf
On 11/14/25 12:13 PM, Amery Hung wrote:
> Since selem already saves a pointer to smap, use it instead of an
> additional argument in bpf_selem_free(). This requires moving the
> SDATA(selem)->smap assignment from bpf_selem_link_map() to
> bpf_selem_alloc() since bpf_selem_free() may be called without the
> selem being linked to smap in bpf_local_storage_update().
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
* Re: [PATCH v2 bpf-next 3/4] bpf: Save memory allocation info in bpf_local_storage
2025-11-14 20:13 ` [PATCH v2 bpf-next 3/4] bpf: Save memory allocation info in bpf_local_storage Amery Hung
@ 2025-11-17 18:36 ` Martin KaFai Lau
0 siblings, 0 replies; 16+ messages in thread
From: Martin KaFai Lau @ 2025-11-17 18:36 UTC (permalink / raw)
To: Amery Hung
Cc: netdev, alexei.starovoitov, andrii, daniel, martin.lau, memxor,
kpsingh, yonghong.song, song, kernel-team, bpf
On 11/14/25 12:13 PM, Amery Hung wrote:
> Save the memory allocation method used for bpf_local_storage in the
> struct explicitly so that we don't need to go through the hassle of
> finding out the info. When a later patch replaces the BPF memory
> allocator with kmalloc_nolock(), bpf_local_storage_free() will no
> longer need smap->storage_ma to return the memory, completely removing
> the dependency on smap in bpf_local_storage_free().
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
* Re: [PATCH v2 bpf-next 4/4] bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage
2025-11-15 2:01 ` Alexei Starovoitov
@ 2025-11-17 19:21 ` Martin KaFai Lau
2025-11-17 20:37 ` Amery Hung
1 sibling, 0 replies; 16+ messages in thread
From: Martin KaFai Lau @ 2025-11-17 19:21 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Amery Hung, bpf, Network Development, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
KP Singh, Yonghong Song, Song Liu, Kernel Team
On 11/14/25 6:01 PM, Alexei Starovoitov wrote:
> On Fri, Nov 14, 2025 at 12:13 PM Amery Hung <ameryhung@gmail.com> wrote:
>>
>>
>> - if (smap->bpf_ma) {
>> + if (smap->use_kmalloc_nolock) {
>> rcu_barrier_tasks_trace();
>> - if (!rcu_trace_implies_rcu_gp())
>> - rcu_barrier();
>> - bpf_mem_alloc_destroy(&smap->selem_ma);
>> - bpf_mem_alloc_destroy(&smap->storage_ma);
>> + rcu_barrier();
>
> Why unconditional rcu_barrier() ?
> It's implied in rcu_barrier_tasks_trace().
> What am I missing?
Amery probably can confirm. I think bpf_obj_free_fields() may only need
to wait for an RCU gp without going through an RCU tasks trace gp and the
tasks_trace cb, so it needs to ensure all RCU callbacks have finished.
@@ -247,18 +231,11 @@ void bpf_selem_free(struct bpf_local_storage_elem *selem,
}
if (reuse_now) {
- /* reuse_now == true only happens when the storage owner
- * (e.g. task_struct) is being destructed or the map itself
- * is being destructed (ie map_free). In both cases,
- * no bpf prog can have a hold on the selem. It is
- * safe to unpin the uptrs and free the selem now.
- */
- bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
- /* Instead of using the vanilla call_rcu(),
- * bpf_mem_cache_free will be able to reuse selem
- * immediately.
+ /*
+ * While it is okay to call bpf_obj_free_fields() that unpins uptr when
+ * reuse_now == true, keep it in bpf_selem_free_rcu() for simplicity.
*/
- bpf_mem_cache_free(&smap->selem_ma, selem);
+ call_rcu(&selem->rcu, bpf_selem_free_rcu);
return;
}
Others lgtm also,
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
* Re: [PATCH v2 bpf-next 4/4] bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage
2025-11-15 2:01 ` Alexei Starovoitov
2025-11-17 19:21 ` Martin KaFai Lau
@ 2025-11-17 20:37 ` Amery Hung
2025-11-17 23:36 ` Alexei Starovoitov
1 sibling, 1 reply; 16+ messages in thread
From: Amery Hung @ 2025-11-17 20:37 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kumar Kartikeya Dwivedi, KP Singh,
Yonghong Song, Song Liu, Kernel Team
On Fri, Nov 14, 2025 at 6:01 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Nov 14, 2025 at 12:13 PM Amery Hung <ameryhung@gmail.com> wrote:
> >
> >
> > - if (smap->bpf_ma) {
> > + if (smap->use_kmalloc_nolock) {
> > rcu_barrier_tasks_trace();
> > - if (!rcu_trace_implies_rcu_gp())
> > - rcu_barrier();
> > - bpf_mem_alloc_destroy(&smap->selem_ma);
> > - bpf_mem_alloc_destroy(&smap->storage_ma);
> > + rcu_barrier();
>
> Why unconditional rcu_barrier() ?
> It's implied in rcu_barrier_tasks_trace().
Hmm, I am not sure.
> What am I missing?
I hit a UAF in v1 in bpf_selem_free_rcu() when running selftests and
making rcu_barrier() unconditional addressed it. I think the bug was
due to map_free() not waiting for bpf_selem_free_rcu() (an RCU
callback) to finish.
Looking at rcu_barrier() and rcu_barrier_tasks_trace(), they pass
different rtp to rcu_barrier_tasks_generic() so I think both are
needed to make sure in-flight RCU and RCU tasks trace callbacks are
done.
Not an expert in RCU so I might be wrong and it was something else.
>
> The rest looks good.
> If that's the only issue, I can fix it up while applying.
* Re: [PATCH v2 bpf-next 4/4] bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage
2025-11-17 20:37 ` Amery Hung
@ 2025-11-17 23:36 ` Alexei Starovoitov
2025-11-17 23:46 ` Paul E. McKenney
0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2025-11-17 23:36 UTC (permalink / raw)
To: Amery Hung, Paul E. McKenney
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kumar Kartikeya Dwivedi, KP Singh,
Yonghong Song, Song Liu, Kernel Team
On Mon, Nov 17, 2025 at 12:37 PM Amery Hung <ameryhung@gmail.com> wrote:
>
> On Fri, Nov 14, 2025 at 6:01 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Nov 14, 2025 at 12:13 PM Amery Hung <ameryhung@gmail.com> wrote:
> > >
> > >
> > > - if (smap->bpf_ma) {
> > > + if (smap->use_kmalloc_nolock) {
> > > rcu_barrier_tasks_trace();
> > > - if (!rcu_trace_implies_rcu_gp())
> > > - rcu_barrier();
> > > - bpf_mem_alloc_destroy(&smap->selem_ma);
> > > - bpf_mem_alloc_destroy(&smap->storage_ma);
> > > + rcu_barrier();
> >
> > Why unconditional rcu_barrier() ?
> > It's implied in rcu_barrier_tasks_trace().
>
> Hmm, I am not sure.
>
> > What am I missing?
>
> I hit a UAF in v1 in bpf_selem_free_rcu() when running selftests and
> making rcu_barrier() unconditional addressed it. I think the bug was
> due to map_free() not waiting for bpf_selem_free_rcu() (an RCU
> callback) to finish.
>
> Looking at rcu_barrier() and rcu_barrier_tasks_trace(), they pass
> different rtp to rcu_barrier_tasks_generic() so I think both are
> needed to make sure in-flight RCU and RCU tasks trace callbacks are
> done.
>
> Not an expert in RCU so I might be wrong and it was something else.
Paul,
Please help us here.
Does rcu_barrier_tasks_trace() imply rcu_barrier() ?
* Re: [PATCH v2 bpf-next 4/4] bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage
2025-11-17 23:36 ` Alexei Starovoitov
@ 2025-11-17 23:46 ` Paul E. McKenney
2025-11-18 0:24 ` Amery Hung
0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2025-11-17 23:46 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Amery Hung, bpf, Network Development, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
KP Singh, Yonghong Song, Song Liu, Kernel Team
On Mon, Nov 17, 2025 at 03:36:08PM -0800, Alexei Starovoitov wrote:
> On Mon, Nov 17, 2025 at 12:37 PM Amery Hung <ameryhung@gmail.com> wrote:
> >
> > On Fri, Nov 14, 2025 at 6:01 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Fri, Nov 14, 2025 at 12:13 PM Amery Hung <ameryhung@gmail.com> wrote:
> > > >
> > > >
> > > > - if (smap->bpf_ma) {
> > > > + if (smap->use_kmalloc_nolock) {
> > > > rcu_barrier_tasks_trace();
> > > > - if (!rcu_trace_implies_rcu_gp())
> > > > - rcu_barrier();
> > > > - bpf_mem_alloc_destroy(&smap->selem_ma);
> > > > - bpf_mem_alloc_destroy(&smap->storage_ma);
> > > > + rcu_barrier();
> > >
> > > Why unconditional rcu_barrier() ?
> > > It's implied in rcu_barrier_tasks_trace().
> >
> > Hmm, I am not sure.
> >
> > > What am I missing?
> >
> > I hit a UAF in v1 in bpf_selem_free_rcu() when running selftests and
> > making rcu_barrier() unconditional addressed it. I think the bug was
> > due to map_free() not waiting for bpf_selem_free_rcu() (an RCU
> > callback) to finish.
> >
> > Looking at rcu_barrier() and rcu_barrier_tasks_trace(), they pass
> > different rtp to rcu_barrier_tasks_generic() so I think both are
> > needed to make sure in-flight RCU and RCU tasks trace callbacks are
> > done.
> >
> > Not an expert in RCU so I might be wrong and it was something else.
>
> Paul,
>
> Please help us here.
> Does rcu_barrier_tasks_trace() imply rcu_barrier() ?
I am sorry, but no, it does not.
If latency proves to be an issue, one approach is to invoke rcu_barrier()
and rcu_barrier_tasks_trace() each in its own workqueue handler. But as
always, I suggest invoking them one after the other to see if a latency
problem really exists before adding complexity.
Except that rcu_barrier_tasks_generic() is never invoked by rcu_barrier(),
only by rcu_barrier_tasks() and rcu_barrier_tasks_trace(). So do you really
mean rcu_barrier()? Or rcu_barrier_tasks()?
Either way, rcu_barrier_tasks() and rcu_barrier_tasks_trace() are also
independent of each other in the sense that if you need to wait on
callbacks from both call_rcu_tasks() and call_rcu_tasks_trace(), you
need both rcu_barrier_tasks() and rcu_barrier_tasks_trace() to be invoked.
Thanx, Paul
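Concretely, this is why bpf_local_storage_map_free() in this patchset
ends up with both barriers; a sketch condensed from the diff in patch 4,
illustrating the ordering only:

	if (smap->use_kmalloc_nolock) {
		/* selems are queued with call_rcu_tasks_trace() when
		 * reuse_now == false and with call_rcu() when
		 * reuse_now == true; the two callback flavors must be
		 * flushed independently before freeing the map.
		 */
		rcu_barrier_tasks_trace();
		rcu_barrier();
	}
	kvfree(smap->buckets);
	bpf_map_area_free(smap);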
* Re: [PATCH v2 bpf-next 4/4] bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage
2025-11-17 23:46 ` Paul E. McKenney
@ 2025-11-18 0:24 ` Amery Hung
2025-11-18 0:41 ` Paul E. McKenney
0 siblings, 1 reply; 16+ messages in thread
From: Amery Hung @ 2025-11-18 0:24 UTC (permalink / raw)
To: paulmck
Cc: Alexei Starovoitov, bpf, Network Development, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
KP Singh, Yonghong Song, Song Liu, Kernel Team
On Mon, Nov 17, 2025 at 3:46 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Mon, Nov 17, 2025 at 03:36:08PM -0800, Alexei Starovoitov wrote:
> > On Mon, Nov 17, 2025 at 12:37 PM Amery Hung <ameryhung@gmail.com> wrote:
> > >
> > > On Fri, Nov 14, 2025 at 6:01 PM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Fri, Nov 14, 2025 at 12:13 PM Amery Hung <ameryhung@gmail.com> wrote:
> > > > >
> > > > >
> > > > > - if (smap->bpf_ma) {
> > > > > + if (smap->use_kmalloc_nolock) {
> > > > > rcu_barrier_tasks_trace();
> > > > > - if (!rcu_trace_implies_rcu_gp())
> > > > > - rcu_barrier();
> > > > > - bpf_mem_alloc_destroy(&smap->selem_ma);
> > > > > - bpf_mem_alloc_destroy(&smap->storage_ma);
> > > > > + rcu_barrier();
> > > >
> > > > Why unconditional rcu_barrier() ?
> > > > It's implied in rcu_barrier_tasks_trace().
> > >
> > > Hmm, I am not sure.
> > >
> > > > What am I missing?
> > >
> > > I hit a UAF in v1 in bpf_selem_free_rcu() when running selftests and
> > > making rcu_barrier() unconditional addressed it. I think the bug was
> > > due to map_free() not waiting for bpf_selem_free_rcu() (an RCU
> > > callback) to finish.
> > >
> > > Looking at rcu_barrier() and rcu_barrier_tasks_trace(), they pass
> > > different rtp to rcu_barrier_tasks_generic() so I think both are
> > > needed to make sure in-flight RCU and RCU tasks trace callbacks are
> > > done.
> > >
> > > Not an expert in RCU so I might be wrong and it was something else.
> >
> > Paul,
> >
> > Please help us here.
> > Does rcu_barrier_tasks_trace() imply rcu_barrier() ?
>
> I am sorry, but no, it does not.
Thanks for the clarification, Paul!
>
> If latency proves to be an issue, one approach is to invoke rcu_barrier()
> and rcu_barrier_tasks_trace() each in its own workqueue handler. But as
> always, I suggest invoking them one after the other to see if a latency
> problem really exists before adding complexity.
>
> Except that rcu_barrier_tasks_trace() is never invoked by rcu_barrier(),
> only rcu_barrier_tasks() and rcu_barrier_tasks_trace(). So do you really
> mean rcu_barrier()? Or rcu_barrier_tasks()?
Sorry for the confusion. I misread the code. I was trying to say that
rcu_barrier() and rcu_barrier_tasks_trace() seem to wait on different
callbacks, but then wrongly referred to the rcu_barrier_tasks()
implementation.
>
> Either way, rcu_barrier_tasks() and rcu_barrier_tasks_trace() are also
> independent of each other in the sense that if you need to wait on
> callbacks from both call_rcu_tasks() and call_rcu_tasks_trace(), you
> need both rcu_barrier_tasks() and rcu_barrier_tasks_trace() to be invoked.
>
> Thanx, Paul
* Re: [PATCH v2 bpf-next 4/4] bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage
2025-11-18 0:24 ` Amery Hung
@ 2025-11-18 0:41 ` Paul E. McKenney
0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2025-11-18 0:41 UTC (permalink / raw)
To: Amery Hung
Cc: Alexei Starovoitov, bpf, Network Development, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
KP Singh, Yonghong Song, Song Liu, Kernel Team
On Mon, Nov 17, 2025 at 04:24:56PM -0800, Amery Hung wrote:
> On Mon, Nov 17, 2025 at 3:46 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Mon, Nov 17, 2025 at 03:36:08PM -0800, Alexei Starovoitov wrote:
> > > On Mon, Nov 17, 2025 at 12:37 PM Amery Hung <ameryhung@gmail.com> wrote:
> > > >
> > > > On Fri, Nov 14, 2025 at 6:01 PM Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > On Fri, Nov 14, 2025 at 12:13 PM Amery Hung <ameryhung@gmail.com> wrote:
> > > > > >
> > > > > >
> > > > > > - if (smap->bpf_ma) {
> > > > > > + if (smap->use_kmalloc_nolock) {
> > > > > > rcu_barrier_tasks_trace();
> > > > > > - if (!rcu_trace_implies_rcu_gp())
> > > > > > - rcu_barrier();
> > > > > > - bpf_mem_alloc_destroy(&smap->selem_ma);
> > > > > > - bpf_mem_alloc_destroy(&smap->storage_ma);
> > > > > > + rcu_barrier();
> > > > >
> > > > > Why unconditional rcu_barrier() ?
> > > > > It's implied in rcu_barrier_tasks_trace().
> > > >
> > > > Hmm, I am not sure.
> > > >
> > > > > What am I missing?
> > > >
> > > > I hit a UAF in v1 in bpf_selem_free_rcu() when running selftests and
> > > > making rcu_barrier() unconditional addressed it. I think the bug was
> > > > due to map_free() not waiting for bpf_selem_free_rcu() (an RCU
> > > > callback) to finish.
> > > >
> > > > Looking at rcu_barrier() and rcu_barrier_tasks_trace(), they pass
> > > > different rtp to rcu_barrier_tasks_generic() so I think both are
> > > > needed to make sure in-flight RCU and RCU tasks trace callbacks are
> > > > done.
> > > >
> > > > Not an expert in RCU so I might be wrong and it was something else.
> > >
> > > Paul,
> > >
> > > Please help us here.
> > > Does rcu_barrier_tasks_trace() imply rcu_barrier() ?
> >
> > I am sorry, but no, it does not.
>
> Thanks for the clarification, Paul!
No problem!
> > If latency proves to be an issue, one approach is to invoke rcu_barrier()
> > and rcu_barrier_tasks_trace() each in its own workqueue handler. But as
> > always, I suggest invoking them one after the other to see if a latency
> > problem really exists before adding complexity.
> >
> > Except that rcu_barrier_tasks_generic() is never invoked by rcu_barrier(),
> > only by rcu_barrier_tasks() and rcu_barrier_tasks_trace(). So do you really
> > mean rcu_barrier()? Or rcu_barrier_tasks()?
>
> Sorry for the confusion. I misread the code. I was trying to say that
> rcu_barrier() and rcu_barrier_tasks_trace() seem to wait on different
> callbacks, but then wrongly referred to the rcu_barrier_tasks()
> implementation.
Well, you did reach the correct conclusion, even if by dubious means. ;-)
Thanx, Paul
> > Either way, rcu_barrier_tasks() and rcu_barrier_tasks_trace() are also
> > independent of each other in the sense that if you need to wait on
> > callbacks from both call_rcu_tasks() and call_rcu_tasks_trace(), you
> > need both rcu_barrier_tasks() and rcu_barrier_tasks_trace() to be invoked.
> >
> > Thanx, Paul
* Re: [PATCH v2 bpf-next 0/4] Replace BPF memory allocator with kmalloc_nolock() in local storage
2025-11-14 20:13 [PATCH v2 bpf-next 0/4] Replace BPF memory allocator with kmalloc_nolock() in local storage Amery Hung
` (3 preceding siblings ...)
2025-11-14 20:13 ` [PATCH v2 bpf-next 4/4] bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage Amery Hung
@ 2025-11-19 0:30 ` patchwork-bot+netdevbpf
4 siblings, 0 replies; 16+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-11-19 0:30 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, martin.lau,
memxor, kpsingh, yonghong.song, song, kernel-team
Hello:
This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:
On Fri, 14 Nov 2025 12:13:22 -0800 you wrote:
> Hi,
>
> This patchset tries to simplify bpf_local_storage.c by adopting
> kmalloc_nolock(). This removes memory preallocation and reduces the
> dependency on smap in bpf_selem_free() and bpf_local_storage_free().
> The latter will simplify a future refactor that replaces
> local_storage->lock and b->lock [1].
>
> [...]
Here is the summary with links:
- [v2,bpf-next,1/4] bpf: Always charge/uncharge memory when allocating/unlinking storage elements
https://git.kernel.org/bpf/bpf-next/c/0e854e553569
- [v2,bpf-next,2/4] bpf: Remove smap argument from bpf_selem_free()
https://git.kernel.org/bpf/bpf-next/c/e76a33e1c718
- [v2,bpf-next,3/4] bpf: Save memory allocation info in bpf_local_storage
https://git.kernel.org/bpf/bpf-next/c/39a460c4253e
- [v2,bpf-next,4/4] bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage
https://git.kernel.org/bpf/bpf-next/c/f484f4a3e058
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html