* [RFC PATCH bpf-next v1 00/11] Remove task and cgroup local
@ 2025-07-29 18:25 Amery Hung
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 01/11] bpf: Convert bpf_selem_unlink_map to failable Amery Hung
` (12 more replies)
0 siblings, 13 replies; 19+ messages in thread
From: Amery Hung @ 2025-07-29 18:25 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, memxor, kpsingh,
martin.lau, yonghong.song, song, haoluo, ameryhung, kernel-team
* Motivation *
The goal of this patchset is to make the bpf syscalls and helpers that
update task and cgroup local storage more robust by removing the percpu
counters in them. Task local storage and cgroup local storage each
employ a percpu counter to prevent deadlock caused by recursion. Since
the underlying bpf local storage takes spinlocks in various operations,
bpf programs running recursively may try to take a spinlock that is
already taken. For example, if a tracing bpf program invoked during
bpf_task_storage_get(..., F_CREATE) calls
bpf_task_storage_get(..., F_CREATE) again, it will cause an AA deadlock
if the percpu counter is not in place.
However, the percpu counter can also cause bpf syscalls or helpers to
fail spuriously whenever another thread happens to be updating the
local storage or the local storage map at the same time. Ideally, the
two threads could simply take turns acquiring the locks and perform
their work. Due to the percpu counter, though, the syscalls and helpers
can return -EBUSY even when neither of them runs recursively inside the
other. All it takes is for the two threads to run on the same CPU. This
happened when BPF-CI ran the task local data selftest: since CI runs
the test on a VM with 2 CPUs, bpf_task_storage_get(..., F_CREATE) can
easily fail.
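For reference, the recursion guard being removed is a per-CPU busy
counter taken around every storage operation. A simplified sketch of
the pattern (mirroring bpf_task_storage_trylock(), which this series
removes) shows why two unrelated operations on the same CPU make one of
them fail:

static DEFINE_PER_CPU(int, bpf_task_storage_busy);

static bool bpf_task_storage_trylock(void)
{
	cant_migrate();
	/* Any storage operation already in flight on this CPU, recursive
	 * or not, makes the counter non-zero and fails this attempt.
	 */
	if (unlikely(this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
		this_cpu_dec(bpf_task_storage_busy);
		return false;	/* callers turn this into -EBUSY or NULL */
	}
	return true;
}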
This failure mode is not good for users, who need to add retry logic in
user space or in bpf programs to work around it. Even with retries,
there is no guaranteed upper bound on the number of attempts before a
call succeeds. Therefore, this patchset removes the percpu counter and
makes the related bpf syscalls and helpers more reliable, while still
making sure recursion deadlock cannot happen, with the help of the
resilient queued spinlock (rqspinlock).
* Implementation *
To remove the percpu counter without introducing deadlock,
bpf_local_storage is refactored by changing its locks from
raw_spin_lock to rqspinlock, which prevents deadlock through deadlock
detection and a timeout mechanism.
There are two locks in bpf_local_storage due to the ownership model
illustrated in the figure below. A map value, which consists of a
pointer to the map and the data, is a bpf_local_storage_data (sdata)
stored in a bpf_local_storage_elem (selem). A selem belongs to a
bpf_local_storage and a bpf_local_storage_map at the same time.
bpf_local_storage::lock (local_storage->lock for short) protects the
list in a bpf_local_storage, and bpf_local_storage_map_bucket::lock
(b->lock) protects the hash bucket in a bpf_local_storage_map.
task_struct
┌ task1 ───────┐ bpf_local_storage
│ *bpf_storage │---->┌─────────┐
└──────────────┘<----│ *owner │ bpf_local_storage_elem
│ *cache[16] (selem) selem
│ *smap │ ┌──────────┐ ┌──────────┐
│ list │------->│ snode │<------->│ snode │
│ lock │ ┌---->│ map_node │<--┐ ┌-->│ map_node │
└─────────┘ │ │ sdata = │ │ │ │ sdata = │
task_struct │ │ {&mapA,} │ │ │ │ {&mapB,} │
┌ task2 ───────┐ bpf_local_storage └──────────┘ │ │ └──────────┘
│ *bpf_storage │---->┌─────────┐ │ │ │
└──────────────┘<----│ *owner │ │ │ │
│ *cache[16] │ selem │ │ selem
│ *smap │ │ ┌──────────┐ │ │ ┌──────────┐
│ list │--│---->│ snode │<--│-│-->│ snode │
│ lock │ │ ┌-->│ map_node │ └-│-->│ map_node │
└─────────┘ │ │ │ sdata = │ │ │ sdata = │
bpf_local_storage_map │ │ │ {&mapB,} │ │ │ {&mapA,} │
(smap) │ │ └──────────┘ │ └──────────┘
┌ mapA ───────┐ │ │ │
│ bpf_map map │ bpf_local_storage_map_bucket │
│ *buckets │---->┌ b[0] ┐ │ │ │
└─────────────┘ │ list │------┘ │ │
│ lock │ │ │
└──────┘ │ │
smap ... │ │
┌ mapB ───────┐ │ │
│ bpf_map map │ bpf_local_storage_map_bucket │
│ *buckets │---->┌ b[0] ┐ │ │
└─────────────┘ │ list │--------┘ │
│ lock │ │
└──────┘ │
┌ b[1] ┐ │
│ list │-----------------------------┘
│ lock │
└──────┘
...
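In code, the two locks live in the following structures. Below is an
abridged sketch of include/linux/bpf_local_storage.h before this
series, with fields not relevant to the locking discussion omitted:

struct bpf_local_storage_map_bucket {
	struct hlist_head list;		/* selems hashed into this bucket */
	raw_spinlock_t lock;		/* b->lock, becomes rqspinlock_t in patch 5 */
};

struct bpf_local_storage {
	struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE];
	struct bpf_local_storage_map __rcu *smap;
	struct hlist_head list;		/* selems owned by this local storage */
	void *owner;			/* task_struct, cgroup, sk, inode, ... */
	struct rcu_head rcu;
	raw_spinlock_t lock;		/* local_storage->lock, becomes rqspinlock_t */
};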
The refactoring is divided into three steps.
First, in patches 1-4, the local storage helpers that take locks are
converted to failable. The functions change from returning void to
returning an int error code, with the return value temporarily fixed
at 0. In callers where the helpers must not fail in the middle of an
update, the helper is open coded. In callers that are not allowed to
fail (i.e., bpf_local_storage_destroy() and
bpf_local_storage_map_free()), we make sure the functions cannot be
called recursively, which would cause deadlock, and assert that the
return value is 0.
Then, in patch 5, the locks are changed to rqspinlock, and the error
returned from raw_res_spin_lock_irqsave() is propagated to the syscalls
and helpers.
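After the conversion, every lock acquisition in bpf_local_storage.c
follows the same pattern. Below is a sketch of what
bpf_selem_unlink_map() looks like after patches 1 and 5, reconstructed
from the hunks in this series; the error codes come from rqspinlock's
deadlock and timeout detection:

static int bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
{
	struct bpf_local_storage_map *smap;
	struct bpf_local_storage_map_bucket *b;
	unsigned long flags;
	int ret;

	if (unlikely(!selem_linked_to_map_lockless(selem)))
		return 0;	/* already unlinked from smap */

	smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
	b = select_bucket(smap, selem);

	ret = raw_res_spin_lock_irqsave(&b->lock, flags);
	if (ret)		/* deadlock or timeout detected by rqspinlock */
		return ret;	/* propagated up to the syscall or helper */

	if (likely(selem_linked_to_map(selem)))
		hlist_del_init_rcu(&selem->map_node);

	raw_res_spin_unlock_irqrestore(&b->lock, flags);
	return 0;
}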
Finally, in patches 6-8, the percpu counters in task and cgroup local
storage are removed.
Question:
- In bpf_local_storage_destroy() and bpf_local_storage_map_free(), where
failure is not allowed, I assert that the lock acquisition always
succeeds based on the fact that 1) these paths cannot run recursively
and cause AA deadlock and 2) local_storage->lock and b->lock are always
acquired in the same order. However, I also notice that rqspinlock has
a timeout fallback. Is this assertion an okay thing to do?
* Test *
Task and cgroup local storage selftests already cover deadlock caused
by recursion. Patch 9 updates the expected results of the task local
storage selftests, as task local storage bpf helpers running on the
same CPU can now succeed when they do not actually cause deadlock.
* Patch organization *
E(exposed) L(local storage lock) B(bucket lock)
EL __bpf_local_storage_insert_cache (skip cache update)
ELB bpf_local_storage_destroy (cannot recur)
ELB bpf_local_storage_map_free (cannot recur)
ELB bpf_selem_unlink --> Patch 4
E B bpf_selem_link_map --> Patch 2
B bpf_selem_unlink_map --> Patch 1
L bpf_selem_unlink_storage --> Patch 3
During the refactoring, to make sure all exposed functions
handle the error returned by raw_res_spin_lock_irqsave(),
__must_check is added locally to catch all callers.
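As an illustration, a converted declaration with that locally-added
audit aid would look roughly like this (bpf_selem_link_map() is
converted to return int in patch 2; the __must_check marker itself is
only the temporary annotation mentioned above):

int __must_check bpf_selem_link_map(struct bpf_local_storage_map *smap,
				    struct bpf_local_storage_elem *selem);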
Patch 1-4
Convert local storage helpers to failable, or open-code
the helpers
Patch 5
Change local_storage->lock and b->lock from
raw_spin_lock to rqspinlock
Patch 6
Remove percpu lock in task local storage and remove
bpf_task_storage_{get,delete}_recur()
Patch 7
Remove percpu lock in cgroup local storage
Patch 8
Remove percpu lock in bpf_local_storage
Patch 9
Update task local storage recursion test
Patch 10
Remove task local storage stress test
Patch 11
Update btf_dump to use another percpu variable
----
Amery Hung (11):
bpf: Convert bpf_selem_unlink_map to failable
bpf: Convert bpf_selem_link_map to failable
bpf: Open code bpf_selem_unlink_storage in bpf_selem_unlink
bpf: Convert bpf_selem_unlink to failable
bpf: Change local_storage->lock and b->lock to rqspinlock
bpf: Remove task local storage percpu counter
bpf: Remove cgroup local storage percpu counter
bpf: Remove unused percpu counter from bpf_local_storage_map_free
selftests/bpf: Update task_local_storage/recursion test
selftests/bpf: Remove test_task_storage_map_stress_lookup
selftests/bpf: Choose another percpu variable in bpf for btf_dump test
include/linux/bpf_local_storage.h | 14 +-
kernel/bpf/bpf_cgrp_storage.c | 60 +-----
kernel/bpf/bpf_inode_storage.c | 6 +-
kernel/bpf/bpf_local_storage.c | 202 ++++++++++++------
kernel/bpf/bpf_task_storage.c | 153 ++-----------
kernel/bpf/helpers.c | 4 -
net/core/bpf_sk_storage.c | 10 +-
.../bpf/map_tests/task_storage_map.c | 128 -----------
.../selftests/bpf/prog_tests/btf_dump.c | 4 +-
.../bpf/prog_tests/task_local_storage.c | 8 +-
.../bpf/progs/read_bpf_task_storage_busy.c | 38 ----
11 files changed, 184 insertions(+), 443 deletions(-)
delete mode 100644 tools/testing/selftests/bpf/map_tests/task_storage_map.c
delete mode 100644 tools/testing/selftests/bpf/progs/read_bpf_task_storage_busy.c
--
2.47.3
* [RFC PATCH bpf-next v1 01/11] bpf: Convert bpf_selem_unlink_map to failable
2025-07-29 18:25 [RFC PATCH bpf-next v1 00/11] Remove task and cgroup local Amery Hung
@ 2025-07-29 18:25 ` Amery Hung
2025-08-01 1:05 ` Martin KaFai Lau
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 02/11] bpf: Convert bpf_selem_link_map " Amery Hung
` (11 subsequent siblings)
12 siblings, 1 reply; 19+ messages in thread
From: Amery Hung @ 2025-07-29 18:25 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, memxor, kpsingh,
martin.lau, yonghong.song, song, haoluo, ameryhung, kernel-team
To prepare for changing bpf_local_storage_map_bucket::lock to rqspinlock,
convert bpf_selem_unlink_map() to failable. It still always succeeds and
returns 0 for now.
Since some operations that update local storage must not fail in the
middle, open-code bpf_selem_unlink_map() so that b->lock is taken
before the operation starts. This is done in the following places:
- bpf_local_storage_alloc()
If the cmpxchg of owner_storage_ptr fails, the first selem is unlinked
from smap again, and that unlink must not fail. Therefore, hold b->lock
from linking until the allocation completes. Helpers that assume
b->lock is held by the caller are introduced:
bpf_selem_link_map_nolock() and bpf_selem_unlink_map_nolock().
- bpf_local_storage_update()
The three-step update process, link_map(new_selem),
link_storage(new_selem), and unlink_map(old_selem), should not fail in
the middle. Hence, take both buckets' b->lock before the update process
starts. While locking two different buckets chosen by the hash function
may introduce different locking orders, this cannot cause an ABBA
deadlock since it is always done under local_storage->lock.
- bpf_selem_unlink()
bpf_selem_unlink_map() and bpf_selem_unlink_storage() should either
both succeed or fail as a whole, instead of failing in the middle.
As the first step, open code bpf_selem_unlink_map(). A later patch
will open code bpf_selem_unlink_storage(). Then, unlink_map and
unlink_storage will be done after successfully acquiring both
local_storage->lock and b->lock.
One caller of bpf_selem_unlink_map() cannot run recursively (e.g., be
re-entered by helpers in tracing bpf programs) and therefore cannot
deadlock. Assert that its call cannot fail instead of handling the
error:
- bpf_local_storage_destroy()
Called by owner (e.g., task_struct, sk, ...). Will not recur and
cause AA deadlock.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
kernel/bpf/bpf_local_storage.c | 75 ++++++++++++++++++++++++++++------
1 file changed, 62 insertions(+), 13 deletions(-)
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index b931fbceb54d..7e39b88ef795 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -409,7 +409,7 @@ void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
hlist_add_head_rcu(&selem->snode, &local_storage->list);
}
-static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
+static int bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
{
struct bpf_local_storage_map *smap;
struct bpf_local_storage_map_bucket *b;
@@ -417,7 +417,7 @@ static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
if (unlikely(!selem_linked_to_map_lockless(selem)))
/* selem has already be unlinked from smap */
- return;
+ return 0;
smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
b = select_bucket(smap, selem);
@@ -425,6 +425,14 @@ static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
if (likely(selem_linked_to_map(selem)))
hlist_del_init_rcu(&selem->map_node);
raw_spin_unlock_irqrestore(&b->lock, flags);
+
+ return 0;
+}
+
+static void bpf_selem_unlink_map_nolock(struct bpf_local_storage_elem *selem)
+{
+ if (likely(selem_linked_to_map(selem)))
+ hlist_del_init_rcu(&selem->map_node);
}
void bpf_selem_link_map(struct bpf_local_storage_map *smap,
@@ -439,13 +447,33 @@ void bpf_selem_link_map(struct bpf_local_storage_map *smap,
raw_spin_unlock_irqrestore(&b->lock, flags);
}
+static void bpf_selem_link_map_nolock(struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *selem,
+ struct bpf_local_storage_map_bucket *b)
+{
+ RCU_INIT_POINTER(SDATA(selem)->smap, smap);
+ hlist_add_head_rcu(&selem->map_node, &b->list);
+}
+
void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
{
- /* Always unlink from map before unlinking from local_storage
- * because selem will be freed after successfully unlinked from
- * the local_storage.
- */
- bpf_selem_unlink_map(selem);
+ struct bpf_local_storage_map_bucket *b;
+ struct bpf_local_storage_map *smap;
+ unsigned long flags;
+
+ if (likely(selem_linked_to_map_lockless(selem))) {
+ smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
+ b = select_bucket(smap, selem);
+ raw_spin_lock_irqsave(&b->lock, flags);
+
+ /* Always unlink from map before unlinking from local_storage
+ * because selem will be freed after successfully unlinked from
+ * the local_storage.
+ */
+ bpf_selem_unlink_map_nolock(selem);
+ raw_spin_unlock_irqrestore(&b->lock, flags);
+ }
+
bpf_selem_unlink_storage(selem, reuse_now);
}
@@ -487,6 +515,8 @@ int bpf_local_storage_alloc(void *owner,
{
struct bpf_local_storage *prev_storage, *storage;
struct bpf_local_storage **owner_storage_ptr;
+ struct bpf_local_storage_map_bucket *b;
+ unsigned long flags;
int err;
err = mem_charge(smap, owner, sizeof(*storage));
@@ -509,7 +539,10 @@ int bpf_local_storage_alloc(void *owner,
storage->owner = owner;
bpf_selem_link_storage_nolock(storage, first_selem);
- bpf_selem_link_map(smap, first_selem);
+
+ b = select_bucket(smap, first_selem);
+ raw_spin_lock_irqsave(&b->lock, flags);
+ bpf_selem_link_map_nolock(smap, first_selem, b);
owner_storage_ptr =
(struct bpf_local_storage **)owner_storage(smap, owner);
@@ -525,7 +558,8 @@ int bpf_local_storage_alloc(void *owner,
*/
prev_storage = cmpxchg(owner_storage_ptr, NULL, storage);
if (unlikely(prev_storage)) {
- bpf_selem_unlink_map(first_selem);
+ bpf_selem_unlink_map_nolock(first_selem);
+ raw_spin_unlock_irqrestore(&b->lock, flags);
err = -EAGAIN;
goto uncharge;
@@ -539,6 +573,7 @@ int bpf_local_storage_alloc(void *owner,
* bucket->list under rcu_read_lock().
*/
}
+ raw_spin_unlock_irqrestore(&b->lock, flags);
return 0;
@@ -560,8 +595,9 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
struct bpf_local_storage_data *old_sdata = NULL;
struct bpf_local_storage_elem *alloc_selem, *selem = NULL;
struct bpf_local_storage *local_storage;
+ struct bpf_local_storage_map_bucket *b, *old_b;
HLIST_HEAD(old_selem_free_list);
- unsigned long flags;
+ unsigned long flags, b_flags, old_b_flags;
int err;
/* BPF_EXIST and BPF_NOEXIST cannot be both set */
@@ -645,20 +681,31 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
goto unlock;
}
+ b = select_bucket(smap, selem);
+ old_b = old_sdata ? select_bucket(smap, SELEM(old_sdata)) : b;
+
+ raw_spin_lock_irqsave(&b->lock, b_flags);
+ if (b != old_b)
+ raw_spin_lock_irqsave(&old_b->lock, old_b_flags);
+
alloc_selem = NULL;
/* First, link the new selem to the map */
- bpf_selem_link_map(smap, selem);
+ bpf_selem_link_map_nolock(smap, selem, b);
/* Second, link (and publish) the new selem to local_storage */
bpf_selem_link_storage_nolock(local_storage, selem);
/* Third, remove old selem, SELEM(old_sdata) */
if (old_sdata) {
- bpf_selem_unlink_map(SELEM(old_sdata));
+ bpf_selem_unlink_map_nolock(SELEM(old_sdata));
bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
true, &old_selem_free_list);
}
+ if (b != old_b)
+ raw_spin_unlock_irqrestore(&old_b->lock, old_b_flags);
+ raw_spin_unlock_irqrestore(&b->lock, b_flags);
+
unlock:
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
bpf_selem_free_list(&old_selem_free_list, false);
@@ -736,6 +783,7 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
HLIST_HEAD(free_selem_list);
struct hlist_node *n;
unsigned long flags;
+ int err;
storage_smap = rcu_dereference_check(local_storage->smap, bpf_rcu_lock_held());
bpf_ma = check_storage_bpf_ma(local_storage, storage_smap, NULL);
@@ -754,7 +802,8 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
/* Always unlink from map before unlinking from
* local_storage.
*/
- bpf_selem_unlink_map(selem);
+ err = bpf_selem_unlink_map(selem);
+ WARN_ON(err);
/* If local_storage list has only one element, the
* bpf_selem_unlink_storage_nolock() will return true.
* Otherwise, it will return false. The current loop iteration
--
2.47.3
* [RFC PATCH bpf-next v1 02/11] bpf: Convert bpf_selem_link_map to failable
2025-07-29 18:25 [RFC PATCH bpf-next v1 00/11] Remove task and cgroup local Amery Hung
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 01/11] bpf: Convert bpf_selem_unlink_map to failable Amery Hung
@ 2025-07-29 18:25 ` Amery Hung
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 03/11] bpf: Open code bpf_selem_unlink_storage in bpf_selem_unlink Amery Hung
` (10 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Amery Hung @ 2025-07-29 18:25 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, memxor, kpsingh,
martin.lau, yonghong.song, song, haoluo, ameryhung, kernel-team
To prepare for changing bpf_local_storage_map_bucket::lock to rqspinlock,
convert bpf_selem_link_map to failable. It still always succeeds and
returns 0 until the change happens. No functional change.
__must_check is added to the function declaration locally to make sure
all the callers are accounted for during the conversion.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
include/linux/bpf_local_storage.h | 4 ++--
kernel/bpf/bpf_local_storage.c | 6 ++++--
net/core/bpf_sk_storage.c | 4 +++-
3 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index ab7244d8108f..dc56fa459ac9 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -182,8 +182,8 @@ void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now);
-void bpf_selem_link_map(struct bpf_local_storage_map *smap,
- struct bpf_local_storage_elem *selem);
+int bpf_selem_link_map(struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *selem);
struct bpf_local_storage_elem *
bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value,
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 7e39b88ef795..95e2dcf919ac 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -435,8 +435,8 @@ static void bpf_selem_unlink_map_nolock(struct bpf_local_storage_elem *selem)
hlist_del_init_rcu(&selem->map_node);
}
-void bpf_selem_link_map(struct bpf_local_storage_map *smap,
- struct bpf_local_storage_elem *selem)
+int bpf_selem_link_map(struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *selem)
{
struct bpf_local_storage_map_bucket *b = select_bucket(smap, selem);
unsigned long flags;
@@ -445,6 +445,8 @@ void bpf_selem_link_map(struct bpf_local_storage_map *smap,
RCU_INIT_POINTER(SDATA(selem)->smap, smap);
hlist_add_head_rcu(&selem->map_node, &b->list);
raw_spin_unlock_irqrestore(&b->lock, flags);
+
+ return 0;
}
static void bpf_selem_link_map_nolock(struct bpf_local_storage_map *smap,
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 2e538399757f..fac5cf385785 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -194,7 +194,9 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
}
if (new_sk_storage) {
- bpf_selem_link_map(smap, copy_selem);
+ ret = bpf_selem_link_map(smap, copy_selem);
+ if (ret)
+ goto out;
bpf_selem_link_storage_nolock(new_sk_storage, copy_selem);
} else {
ret = bpf_local_storage_alloc(newsk, smap, copy_selem, GFP_ATOMIC);
--
2.47.3
* [RFC PATCH bpf-next v1 03/11] bpf: Open code bpf_selem_unlink_storage in bpf_selem_unlink
2025-07-29 18:25 [RFC PATCH bpf-next v1 00/11] Remove task and cgroup local Amery Hung
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 01/11] bpf: Convert bpf_selem_unlink_map to failable Amery Hung
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 02/11] bpf: Convert bpf_selem_link_map " Amery Hung
@ 2025-07-29 18:25 ` Amery Hung
2025-08-02 0:58 ` Martin KaFai Lau
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 04/11] bpf: Convert bpf_selem_unlink to failable Amery Hung
` (9 subsequent siblings)
12 siblings, 1 reply; 19+ messages in thread
From: Amery Hung @ 2025-07-29 18:25 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, memxor, kpsingh,
martin.lau, yonghong.song, song, haoluo, ameryhung, kernel-team
To prepare for changing bpf_local_storage::lock to rqspinlock, open code
bpf_selem_unlink_storage() in the only caller, bpf_selem_unlink(), since
unlink_map and unlink_storage must be done together after all the
necessary locks are acquired.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
kernel/bpf/bpf_local_storage.c | 78 ++++++++++++++++------------------
1 file changed, 37 insertions(+), 41 deletions(-)
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 95e2dcf919ac..7501227c8974 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -371,37 +371,6 @@ static bool check_storage_bpf_ma(struct bpf_local_storage *local_storage,
return selem_smap->bpf_ma;
}
-static void bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
- bool reuse_now)
-{
- struct bpf_local_storage_map *storage_smap;
- struct bpf_local_storage *local_storage;
- bool bpf_ma, free_local_storage = false;
- HLIST_HEAD(selem_free_list);
- unsigned long flags;
-
- if (unlikely(!selem_linked_to_storage_lockless(selem)))
- /* selem has already been unlinked from sk */
- return;
-
- local_storage = rcu_dereference_check(selem->local_storage,
- bpf_rcu_lock_held());
- storage_smap = rcu_dereference_check(local_storage->smap,
- bpf_rcu_lock_held());
- bpf_ma = check_storage_bpf_ma(local_storage, storage_smap, selem);
-
- raw_spin_lock_irqsave(&local_storage->lock, flags);
- if (likely(selem_linked_to_storage(selem)))
- free_local_storage = bpf_selem_unlink_storage_nolock(
- local_storage, selem, true, &selem_free_list);
- raw_spin_unlock_irqrestore(&local_storage->lock, flags);
-
- bpf_selem_free_list(&selem_free_list, reuse_now);
-
- if (free_local_storage)
- bpf_local_storage_free(local_storage, storage_smap, bpf_ma, reuse_now);
-}
-
void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
struct bpf_local_storage_elem *selem)
{
@@ -459,24 +428,51 @@ static void bpf_selem_link_map_nolock(struct bpf_local_storage_map *smap,
void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
{
+ struct bpf_local_storage_map *storage_smap;
+ struct bpf_local_storage *local_storage = NULL;
+ bool bpf_ma, free_local_storage = false;
+ HLIST_HEAD(selem_free_list);
struct bpf_local_storage_map_bucket *b;
- struct bpf_local_storage_map *smap;
- unsigned long flags;
+ struct bpf_local_storage_map *smap = NULL;
+ unsigned long flags, b_flags;
if (likely(selem_linked_to_map_lockless(selem))) {
smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
b = select_bucket(smap, selem);
- raw_spin_lock_irqsave(&b->lock, flags);
+ }
- /* Always unlink from map before unlinking from local_storage
- * because selem will be freed after successfully unlinked from
- * the local_storage.
- */
- bpf_selem_unlink_map_nolock(selem);
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ if (likely(selem_linked_to_storage_lockless(selem))) {
+ local_storage = rcu_dereference_check(selem->local_storage,
+ bpf_rcu_lock_held());
+ storage_smap = rcu_dereference_check(local_storage->smap,
+ bpf_rcu_lock_held());
+ bpf_ma = check_storage_bpf_ma(local_storage, storage_smap, selem);
}
- bpf_selem_unlink_storage(selem, reuse_now);
+ if (local_storage)
+ raw_spin_lock_irqsave(&local_storage->lock, flags);
+ if (smap)
+ raw_spin_lock_irqsave(&b->lock, b_flags);
+
+ /* Always unlink from map before unlinking from local_storage
+ * because selem will be freed after successfully unlinked from
+ * the local_storage.
+ */
+ if (smap)
+ bpf_selem_unlink_map_nolock(selem);
+ if (local_storage && likely(selem_linked_to_storage(selem)))
+ free_local_storage = bpf_selem_unlink_storage_nolock(
+ local_storage, selem, true, &selem_free_list);
+
+ if (smap)
+ raw_spin_unlock_irqrestore(&b->lock, b_flags);
+ if (local_storage)
+ raw_spin_unlock_irqrestore(&local_storage->lock, flags);
+
+ bpf_selem_free_list(&selem_free_list, reuse_now);
+
+ if (free_local_storage)
+ bpf_local_storage_free(local_storage, storage_smap, bpf_ma, reuse_now);
}
void __bpf_local_storage_insert_cache(struct bpf_local_storage *local_storage,
--
2.47.3
* [RFC PATCH bpf-next v1 04/11] bpf: Convert bpf_selem_unlink to failable
2025-07-29 18:25 [RFC PATCH bpf-next v1 00/11] Remove task and cgroup local Amery Hung
` (2 preceding siblings ...)
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 03/11] bpf: Open code bpf_selem_unlink_storage in bpf_selem_unlink Amery Hung
@ 2025-07-29 18:25 ` Amery Hung
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 05/11] bpf: Change local_storage->lock and b->lock to rqspinlock Amery Hung
` (8 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Amery Hung @ 2025-07-29 18:25 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, memxor, kpsingh,
martin.lau, yonghong.song, song, haoluo, ameryhung, kernel-team
To prepare for changing both bpf_local_storage_map_bucket::lock and
bpf_local_storage::lock to rqspinlock, convert bpf_selem_unlink() to
failable. It still always succeeds and returns 0 until the change
happens. No functional change.
For bpf_local_storage_map_free(), since it cannot run recursively and
local_storage->lock is always acquired before b->lock, assert that the
lock acquisition always succeeds.
__must_check is added to the function declaration locally to make sure
all the callers are accounted for during the conversion.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
include/linux/bpf_local_storage.h | 2 +-
kernel/bpf/bpf_cgrp_storage.c | 3 +--
kernel/bpf/bpf_inode_storage.c | 4 +---
kernel/bpf/bpf_local_storage.c | 6 ++++--
kernel/bpf/bpf_task_storage.c | 4 +---
net/core/bpf_sk_storage.c | 4 +---
6 files changed, 9 insertions(+), 14 deletions(-)
diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index dc56fa459ac9..26b7f53dad33 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -180,7 +180,7 @@ int bpf_local_storage_map_check_btf(const struct bpf_map *map,
void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
struct bpf_local_storage_elem *selem);
-void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now);
+int bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now);
int bpf_selem_link_map(struct bpf_local_storage_map *smap,
struct bpf_local_storage_elem *selem);
diff --git a/kernel/bpf/bpf_cgrp_storage.c b/kernel/bpf/bpf_cgrp_storage.c
index 148da8f7ff36..68a2ad1fc3d4 100644
--- a/kernel/bpf/bpf_cgrp_storage.c
+++ b/kernel/bpf/bpf_cgrp_storage.c
@@ -120,8 +120,7 @@ static int cgroup_storage_delete(struct cgroup *cgroup, struct bpf_map *map)
if (!sdata)
return -ENOENT;
- bpf_selem_unlink(SELEM(sdata), false);
- return 0;
+ return bpf_selem_unlink(SELEM(sdata), false);
}
static long bpf_cgrp_storage_delete_elem(struct bpf_map *map, void *key)
diff --git a/kernel/bpf/bpf_inode_storage.c b/kernel/bpf/bpf_inode_storage.c
index 15a3eb9b02d9..e55c3e0f5bb9 100644
--- a/kernel/bpf/bpf_inode_storage.c
+++ b/kernel/bpf/bpf_inode_storage.c
@@ -112,9 +112,7 @@ static int inode_storage_delete(struct inode *inode, struct bpf_map *map)
if (!sdata)
return -ENOENT;
- bpf_selem_unlink(SELEM(sdata), false);
-
- return 0;
+ return bpf_selem_unlink(SELEM(sdata), false);
}
static long bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 7501227c8974..dda106f76491 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -426,7 +426,7 @@ static void bpf_selem_link_map_nolock(struct bpf_local_storage_map *smap,
hlist_add_head_rcu(&selem->map_node, &b->list);
}
-void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
+int bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
{
struct bpf_local_storage_map *storage_smap;
struct bpf_local_storage *local_storage = NULL;
@@ -473,6 +473,8 @@ void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
if (free_local_storage)
bpf_local_storage_free(local_storage, storage_smap, bpf_ma, reuse_now);
+
+ return 0;
}
void __bpf_local_storage_insert_cache(struct bpf_local_storage *local_storage,
@@ -937,7 +939,7 @@ void bpf_local_storage_map_free(struct bpf_map *map,
struct bpf_local_storage_elem, map_node))) {
if (busy_counter)
this_cpu_inc(*busy_counter);
- bpf_selem_unlink(selem, true);
+ WARN_ON(bpf_selem_unlink(selem, true));
if (busy_counter)
this_cpu_dec(*busy_counter);
cond_resched_rcu();
diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
index 1109475953c0..c0199baba9a7 100644
--- a/kernel/bpf/bpf_task_storage.c
+++ b/kernel/bpf/bpf_task_storage.c
@@ -169,9 +169,7 @@ static int task_storage_delete(struct task_struct *task, struct bpf_map *map,
if (!nobusy)
return -EBUSY;
- bpf_selem_unlink(SELEM(sdata), false);
-
- return 0;
+ return bpf_selem_unlink(SELEM(sdata), false);
}
static long bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key)
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index fac5cf385785..7b3d44667cee 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -40,9 +40,7 @@ static int bpf_sk_storage_del(struct sock *sk, struct bpf_map *map)
if (!sdata)
return -ENOENT;
- bpf_selem_unlink(SELEM(sdata), false);
-
- return 0;
+ return bpf_selem_unlink(SELEM(sdata), false);
}
/* Called by __sk_destruct() & bpf_sk_storage_clone() */
--
2.47.3
* [RFC PATCH bpf-next v1 05/11] bpf: Change local_storage->lock and b->lock to rqspinlock
2025-07-29 18:25 [RFC PATCH bpf-next v1 00/11] Remove task and cgroup local Amery Hung
` (3 preceding siblings ...)
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 04/11] bpf: Convert bpf_selem_unlink to failable Amery Hung
@ 2025-07-29 18:25 ` Amery Hung
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 06/11] bpf: Remove task local storage percpu counter Amery Hung
` (7 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Amery Hung @ 2025-07-29 18:25 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, memxor, kpsingh,
martin.lau, yonghong.song, song, haoluo, ameryhung, kernel-team
Change bpf_local_storage::lock and bpf_local_storage_map_bucket::lock
from raw_spin_lock to rqspinlock.
Then, propagate the errors returned from raw_res_spin_lock_irqsave() to
the syscall and bpf helper return values.
For __bpf_local_storage_insert_cache(), instead of handling the error,
simply skip updating the cache.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
include/linux/bpf_local_storage.h | 5 +-
kernel/bpf/bpf_local_storage.c | 86 +++++++++++++++++++++----------
2 files changed, 62 insertions(+), 29 deletions(-)
diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index 26b7f53dad33..2a0aae5168fa 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -15,6 +15,7 @@
#include <linux/types.h>
#include <linux/bpf_mem_alloc.h>
#include <uapi/linux/btf.h>
+#include <asm/rqspinlock.h>
#define BPF_LOCAL_STORAGE_CACHE_SIZE 16
@@ -23,7 +24,7 @@
rcu_read_lock_bh_held())
struct bpf_local_storage_map_bucket {
struct hlist_head list;
- raw_spinlock_t lock;
+ rqspinlock_t lock;
};
/* Thp map is not the primary owner of a bpf_local_storage_elem.
@@ -99,7 +100,7 @@ struct bpf_local_storage {
* bpf_local_storage_elem.
*/
struct rcu_head rcu;
- raw_spinlock_t lock; /* Protect adding/removing from the "list" */
+ rqspinlock_t lock; /* Protect adding/removing from the "list" */
};
/* U16_MAX is much more than enough for sk local storage
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index dda106f76491..5faa1df4fc50 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -383,6 +383,7 @@ static int bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
struct bpf_local_storage_map *smap;
struct bpf_local_storage_map_bucket *b;
unsigned long flags;
+ int ret;
if (unlikely(!selem_linked_to_map_lockless(selem)))
/* selem has already be unlinked from smap */
@@ -390,10 +391,13 @@ static int bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
b = select_bucket(smap, selem);
- raw_spin_lock_irqsave(&b->lock, flags);
+ ret = raw_res_spin_lock_irqsave(&b->lock, flags);
+ if (ret)
+ return ret;
+
if (likely(selem_linked_to_map(selem)))
hlist_del_init_rcu(&selem->map_node);
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ raw_res_spin_unlock_irqrestore(&b->lock, flags);
return 0;
}
@@ -409,11 +413,15 @@ int bpf_selem_link_map(struct bpf_local_storage_map *smap,
{
struct bpf_local_storage_map_bucket *b = select_bucket(smap, selem);
unsigned long flags;
+ int ret;
+
+ ret = raw_res_spin_lock_irqsave(&b->lock, flags);
+ if (ret)
+ return ret;
- raw_spin_lock_irqsave(&b->lock, flags);
RCU_INIT_POINTER(SDATA(selem)->smap, smap);
hlist_add_head_rcu(&selem->map_node, &b->list);
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ raw_res_spin_unlock_irqrestore(&b->lock, flags);
return 0;
}
@@ -435,6 +443,7 @@ int bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
struct bpf_local_storage_map_bucket *b;
struct bpf_local_storage_map *smap = NULL;
unsigned long flags, b_flags;
+ int ret = 0;
if (likely(selem_linked_to_map_lockless(selem))) {
smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
@@ -449,10 +458,16 @@ int bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
bpf_ma = check_storage_bpf_ma(local_storage, storage_smap, selem);
}
- if (local_storage)
- raw_spin_lock_irqsave(&local_storage->lock, flags);
- if (smap)
- raw_spin_lock_irqsave(&b->lock, b_flags);
+ if (local_storage) {
+ ret = raw_res_spin_lock_irqsave(&local_storage->lock, flags);
+ if (ret)
+ return ret;
+ }
+ if (smap) {
+ ret = raw_res_spin_lock_irqsave(&b->lock, b_flags);
+ if (ret)
+ goto unlock_storage;
+ }
/* Always unlink from map before unlinking from local_storage
* because selem will be freed after successfully unlinked from
@@ -465,16 +480,17 @@ int bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
local_storage, selem, true, &selem_free_list);
if (smap)
- raw_spin_unlock_irqrestore(&b->lock, b_flags);
+ raw_res_spin_unlock_irqrestore(&b->lock, b_flags);
+unlock_storage:
if (local_storage)
- raw_spin_unlock_irqrestore(&local_storage->lock, flags);
+ raw_res_spin_unlock_irqrestore(&local_storage->lock, flags);
bpf_selem_free_list(&selem_free_list, reuse_now);
if (free_local_storage)
bpf_local_storage_free(local_storage, storage_smap, bpf_ma, reuse_now);
- return 0;
+ return ret;
}
void __bpf_local_storage_insert_cache(struct bpf_local_storage *local_storage,
@@ -482,16 +498,20 @@ void __bpf_local_storage_insert_cache(struct bpf_local_storage *local_storage,
struct bpf_local_storage_elem *selem)
{
unsigned long flags;
+ int err;
/* spinlock is needed to avoid racing with the
* parallel delete. Otherwise, publishing an already
* deleted sdata to the cache will become a use-after-free
* problem in the next bpf_local_storage_lookup().
*/
- raw_spin_lock_irqsave(&local_storage->lock, flags);
+ err = raw_res_spin_lock_irqsave(&local_storage->lock, flags);
+ if (err)
+ return;
+
if (selem_linked_to_storage(selem))
rcu_assign_pointer(local_storage->cache[smap->cache_idx], SDATA(selem));
- raw_spin_unlock_irqrestore(&local_storage->lock, flags);
+ raw_res_spin_unlock_irqrestore(&local_storage->lock, flags);
}
static int check_flags(const struct bpf_local_storage_data *old_sdata,
@@ -535,13 +555,16 @@ int bpf_local_storage_alloc(void *owner,
RCU_INIT_POINTER(storage->smap, smap);
INIT_HLIST_HEAD(&storage->list);
- raw_spin_lock_init(&storage->lock);
+ raw_res_spin_lock_init(&storage->lock);
storage->owner = owner;
bpf_selem_link_storage_nolock(storage, first_selem);
b = select_bucket(smap, first_selem);
- raw_spin_lock_irqsave(&b->lock, flags);
+ err = raw_res_spin_lock_irqsave(&b->lock, flags);
+ if (err)
+ goto uncharge;
+
bpf_selem_link_map_nolock(smap, first_selem, b);
owner_storage_ptr =
@@ -559,7 +582,7 @@ int bpf_local_storage_alloc(void *owner,
prev_storage = cmpxchg(owner_storage_ptr, NULL, storage);
if (unlikely(prev_storage)) {
bpf_selem_unlink_map_nolock(first_selem);
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ raw_res_spin_unlock_irqrestore(&b->lock, flags);
err = -EAGAIN;
goto uncharge;
@@ -573,7 +596,7 @@ int bpf_local_storage_alloc(void *owner,
* bucket->list under rcu_read_lock().
*/
}
- raw_spin_unlock_irqrestore(&b->lock, flags);
+ raw_res_spin_unlock_irqrestore(&b->lock, flags);
return 0;
@@ -656,7 +679,9 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
if (!alloc_selem)
return ERR_PTR(-ENOMEM);
- raw_spin_lock_irqsave(&local_storage->lock, flags);
+ err = raw_res_spin_lock_irqsave(&local_storage->lock, flags);
+ if (err)
+ return ERR_PTR(err);
/* Recheck local_storage->list under local_storage->lock */
if (unlikely(hlist_empty(&local_storage->list))) {
@@ -684,9 +709,15 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
b = select_bucket(smap, selem);
old_b = old_sdata ? select_bucket(smap, SELEM(old_sdata)) : b;
- raw_spin_lock_irqsave(&b->lock, b_flags);
- if (b != old_b)
- raw_spin_lock_irqsave(&old_b->lock, old_b_flags);
+ err = raw_res_spin_lock_irqsave(&b->lock, b_flags);
+ if (err)
+ goto unlock;
+
+ if (b != old_b) {
+ err = raw_res_spin_lock_irqsave(&old_b->lock, old_b_flags);
+ if (err)
+ goto unlock_bucket;
+ }
alloc_selem = NULL;
/* First, link the new selem to the map */
@@ -703,11 +734,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
}
if (b != old_b)
- raw_spin_unlock_irqrestore(&old_b->lock, old_b_flags);
- raw_spin_unlock_irqrestore(&b->lock, b_flags);
+ raw_res_spin_unlock_irqrestore(&old_b->lock, old_b_flags);
+unlock_bucket:
+ raw_res_spin_unlock_irqrestore(&b->lock, b_flags);
unlock:
- raw_spin_unlock_irqrestore(&local_storage->lock, flags);
+ raw_res_spin_unlock_irqrestore(&local_storage->lock, flags);
bpf_selem_free_list(&old_selem_free_list, false);
if (alloc_selem) {
mem_uncharge(smap, owner, smap->elem_size);
@@ -797,7 +829,7 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
* when unlinking elem from the local_storage->list and
* the map's bucket->list.
*/
- raw_spin_lock_irqsave(&local_storage->lock, flags);
+ WARN_ON(raw_res_spin_lock_irqsave(&local_storage->lock, flags));
hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
/* Always unlink from map before unlinking from
* local_storage.
@@ -813,7 +845,7 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
free_storage = bpf_selem_unlink_storage_nolock(
local_storage, selem, true, &free_selem_list);
}
- raw_spin_unlock_irqrestore(&local_storage->lock, flags);
+ raw_res_spin_unlock_irqrestore(&local_storage->lock, flags);
bpf_selem_free_list(&free_selem_list, true);
@@ -870,7 +902,7 @@ bpf_local_storage_map_alloc(union bpf_attr *attr,
for (i = 0; i < nbuckets; i++) {
INIT_HLIST_HEAD(&smap->buckets[i].list);
- raw_spin_lock_init(&smap->buckets[i].lock);
+ raw_res_spin_lock_init(&smap->buckets[i].lock);
}
smap->elem_size = offsetof(struct bpf_local_storage_elem,
--
2.47.3
* [RFC PATCH bpf-next v1 06/11] bpf: Remove task local storage percpu counter
2025-07-29 18:25 [RFC PATCH bpf-next v1 00/11] Remove task and cgroup local Amery Hung
` (4 preceding siblings ...)
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 05/11] bpf: Change local_storage->lock and b->lock to rqspinlock Amery Hung
@ 2025-07-29 18:25 ` Amery Hung
2025-08-02 1:15 ` Martin KaFai Lau
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 07/11] bpf: Remove cgroup " Amery Hung
` (6 subsequent siblings)
12 siblings, 1 reply; 19+ messages in thread
From: Amery Hung @ 2025-07-29 18:25 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, memxor, kpsingh,
martin.lau, yonghong.song, song, haoluo, ameryhung, kernel-team
The percpu counter in task local storage is no longer needed as the
underlying bpf_local_storage can now handle deadlock with the help of
rqspinlock. Remove the percpu counter and related migrate_{disable,
enable}.
Since the percpu counter is removed, merge bpf_task_storage_get_recur()
back into bpf_task_storage_get(). This allows the bpf syscalls and
helpers to run concurrently on the same CPU, removing the spurious
-EBUSY error. bpf_task_storage_get(..., F_CREATE) will now always
succeed, given enough free memory, unless it is called recursively.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
kernel/bpf/bpf_task_storage.c | 149 ++++------------------------------
kernel/bpf/helpers.c | 4 -
2 files changed, 17 insertions(+), 136 deletions(-)
diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
index c0199baba9a7..ff1e2e4cc5b5 100644
--- a/kernel/bpf/bpf_task_storage.c
+++ b/kernel/bpf/bpf_task_storage.c
@@ -20,29 +20,6 @@
DEFINE_BPF_STORAGE_CACHE(task_cache);
-static DEFINE_PER_CPU(int, bpf_task_storage_busy);
-
-static void bpf_task_storage_lock(void)
-{
- cant_migrate();
- this_cpu_inc(bpf_task_storage_busy);
-}
-
-static void bpf_task_storage_unlock(void)
-{
- this_cpu_dec(bpf_task_storage_busy);
-}
-
-static bool bpf_task_storage_trylock(void)
-{
- cant_migrate();
- if (unlikely(this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
- this_cpu_dec(bpf_task_storage_busy);
- return false;
- }
- return true;
-}
-
static struct bpf_local_storage __rcu **task_storage_ptr(void *owner)
{
struct task_struct *task = owner;
@@ -70,19 +47,15 @@ void bpf_task_storage_free(struct task_struct *task)
{
struct bpf_local_storage *local_storage;
- migrate_disable();
rcu_read_lock();
local_storage = rcu_dereference(task->bpf_storage);
if (!local_storage)
goto out;
- bpf_task_storage_lock();
bpf_local_storage_destroy(local_storage);
- bpf_task_storage_unlock();
out:
rcu_read_unlock();
- migrate_enable();
}
static void *bpf_pid_task_storage_lookup_elem(struct bpf_map *map, void *key)
@@ -108,9 +81,7 @@ static void *bpf_pid_task_storage_lookup_elem(struct bpf_map *map, void *key)
goto out;
}
- bpf_task_storage_lock();
sdata = task_storage_lookup(task, map, true);
- bpf_task_storage_unlock();
put_pid(pid);
return sdata ? sdata->data : NULL;
out:
@@ -145,11 +116,9 @@ static long bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
goto out;
}
- bpf_task_storage_lock();
sdata = bpf_local_storage_update(
task, (struct bpf_local_storage_map *)map, value, map_flags,
true, GFP_ATOMIC);
- bpf_task_storage_unlock();
err = PTR_ERR_OR_ZERO(sdata);
out:
@@ -157,8 +126,7 @@ static long bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
return err;
}
-static int task_storage_delete(struct task_struct *task, struct bpf_map *map,
- bool nobusy)
+static int task_storage_delete(struct task_struct *task, struct bpf_map *map)
{
struct bpf_local_storage_data *sdata;
@@ -166,9 +134,6 @@ static int task_storage_delete(struct task_struct *task, struct bpf_map *map,
if (!sdata)
return -ENOENT;
- if (!nobusy)
- return -EBUSY;
-
return bpf_selem_unlink(SELEM(sdata), false);
}
@@ -194,111 +159,51 @@ static long bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key)
goto out;
}
- bpf_task_storage_lock();
- err = task_storage_delete(task, map, true);
- bpf_task_storage_unlock();
+ err = task_storage_delete(task, map);
out:
put_pid(pid);
return err;
}
-/* Called by bpf_task_storage_get*() helpers */
-static void *__bpf_task_storage_get(struct bpf_map *map,
- struct task_struct *task, void *value,
- u64 flags, gfp_t gfp_flags, bool nobusy)
+/* *gfp_flags* is a hidden argument provided by the verifier */
+BPF_CALL_5(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
+ task, void *, value, u64, flags, gfp_t, gfp_flags)
{
struct bpf_local_storage_data *sdata;
- sdata = task_storage_lookup(task, map, nobusy);
+ WARN_ON_ONCE(!bpf_rcu_lock_held());
+ if (flags & ~BPF_LOCAL_STORAGE_GET_F_CREATE || !task)
+ return (unsigned long)NULL;
+
+ sdata = task_storage_lookup(task, map, true);
if (sdata)
- return sdata->data;
+ return (unsigned long)sdata->data;
/* only allocate new storage, when the task is refcounted */
if (refcount_read(&task->usage) &&
- (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) && nobusy) {
+ (flags & BPF_LOCAL_STORAGE_GET_F_CREATE)) {
sdata = bpf_local_storage_update(
task, (struct bpf_local_storage_map *)map, value,
BPF_NOEXIST, false, gfp_flags);
- return IS_ERR(sdata) ? NULL : sdata->data;
+ WARN_ON(IS_ERR(sdata));
+ return IS_ERR(sdata) ? (unsigned long)NULL : (unsigned long)sdata->data;
}
- return NULL;
-}
-
-/* *gfp_flags* is a hidden argument provided by the verifier */
-BPF_CALL_5(bpf_task_storage_get_recur, struct bpf_map *, map, struct task_struct *,
- task, void *, value, u64, flags, gfp_t, gfp_flags)
-{
- bool nobusy;
- void *data;
-
- WARN_ON_ONCE(!bpf_rcu_lock_held());
- if (flags & ~BPF_LOCAL_STORAGE_GET_F_CREATE || !task)
- return (unsigned long)NULL;
-
- nobusy = bpf_task_storage_trylock();
- data = __bpf_task_storage_get(map, task, value, flags,
- gfp_flags, nobusy);
- if (nobusy)
- bpf_task_storage_unlock();
- return (unsigned long)data;
-}
-
-/* *gfp_flags* is a hidden argument provided by the verifier */
-BPF_CALL_5(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
- task, void *, value, u64, flags, gfp_t, gfp_flags)
-{
- void *data;
-
- WARN_ON_ONCE(!bpf_rcu_lock_held());
- if (flags & ~BPF_LOCAL_STORAGE_GET_F_CREATE || !task)
- return (unsigned long)NULL;
-
- bpf_task_storage_lock();
- data = __bpf_task_storage_get(map, task, value, flags,
- gfp_flags, true);
- bpf_task_storage_unlock();
- return (unsigned long)data;
-}
-
-BPF_CALL_2(bpf_task_storage_delete_recur, struct bpf_map *, map, struct task_struct *,
- task)
-{
- bool nobusy;
- int ret;
-
- WARN_ON_ONCE(!bpf_rcu_lock_held());
- if (!task)
- return -EINVAL;
-
- nobusy = bpf_task_storage_trylock();
- /* This helper must only be called from places where the lifetime of the task
- * is guaranteed. Either by being refcounted or by being protected
- * by an RCU read-side critical section.
- */
- ret = task_storage_delete(task, map, nobusy);
- if (nobusy)
- bpf_task_storage_unlock();
- return ret;
+ return (unsigned long)NULL;
}
BPF_CALL_2(bpf_task_storage_delete, struct bpf_map *, map, struct task_struct *,
task)
{
- int ret;
-
WARN_ON_ONCE(!bpf_rcu_lock_held());
if (!task)
return -EINVAL;
- bpf_task_storage_lock();
/* This helper must only be called from places where the lifetime of the task
* is guaranteed. Either by being refcounted or by being protected
* by an RCU read-side critical section.
*/
- ret = task_storage_delete(task, map, true);
- bpf_task_storage_unlock();
- return ret;
+ return task_storage_delete(task, map);
}
static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
@@ -313,7 +218,7 @@ static struct bpf_map *task_storage_map_alloc(union bpf_attr *attr)
static void task_storage_map_free(struct bpf_map *map)
{
- bpf_local_storage_map_free(map, &task_cache, &bpf_task_storage_busy);
+ bpf_local_storage_map_free(map, &task_cache, NULL);
}
BTF_ID_LIST_GLOBAL_SINGLE(bpf_local_storage_map_btf_id, struct, bpf_local_storage_map)
@@ -332,17 +237,6 @@ const struct bpf_map_ops task_storage_map_ops = {
.map_owner_storage_ptr = task_storage_ptr,
};
-const struct bpf_func_proto bpf_task_storage_get_recur_proto = {
- .func = bpf_task_storage_get_recur,
- .gpl_only = false,
- .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
- .arg1_type = ARG_CONST_MAP_PTR,
- .arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
- .arg2_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
- .arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
- .arg4_type = ARG_ANYTHING,
-};
-
const struct bpf_func_proto bpf_task_storage_get_proto = {
.func = bpf_task_storage_get,
.gpl_only = false,
@@ -354,15 +248,6 @@ const struct bpf_func_proto bpf_task_storage_get_proto = {
.arg4_type = ARG_ANYTHING,
};
-const struct bpf_func_proto bpf_task_storage_delete_recur_proto = {
- .func = bpf_task_storage_delete_recur,
- .gpl_only = false,
- .ret_type = RET_INTEGER,
- .arg1_type = ARG_CONST_MAP_PTR,
- .arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
- .arg2_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
-};
-
const struct bpf_func_proto bpf_task_storage_delete_proto = {
.func = bpf_task_storage_delete,
.gpl_only = false,
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 6b4877e85a68..d142fbb25665 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2032,12 +2032,8 @@ bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_get_cgroup_classid_curr_proto;
#endif
case BPF_FUNC_task_storage_get:
- if (bpf_prog_check_recur(prog))
- return &bpf_task_storage_get_recur_proto;
return &bpf_task_storage_get_proto;
case BPF_FUNC_task_storage_delete:
- if (bpf_prog_check_recur(prog))
- return &bpf_task_storage_delete_recur_proto;
return &bpf_task_storage_delete_proto;
default:
break;
--
2.47.3
* [RFC PATCH bpf-next v1 07/11] bpf: Remove cgroup local storage percpu counter
2025-07-29 18:25 [RFC PATCH bpf-next v1 00/11] Remove task and cgroup local Amery Hung
` (5 preceding siblings ...)
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 06/11] bpf: Remove task local storage percpu counter Amery Hung
@ 2025-07-29 18:25 ` Amery Hung
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 08/11] bpf: Remove unused percpu counter from bpf_local_storage_map_free Amery Hung
` (5 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Amery Hung @ 2025-07-29 18:25 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, memxor, kpsingh,
martin.lau, yonghong.song, song, haoluo, ameryhung, kernel-team
The percpu counter in cgroup local storage is no longer needed as the
underlying bpf_local_storage can now handle deadlock with the help of
rqspinlock. Remove the percpu counter and related migrate_{disable,
enable}.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
kernel/bpf/bpf_cgrp_storage.c | 57 ++++-------------------------------
1 file changed, 6 insertions(+), 51 deletions(-)
diff --git a/kernel/bpf/bpf_cgrp_storage.c b/kernel/bpf/bpf_cgrp_storage.c
index 68a2ad1fc3d4..4f9cfa032870 100644
--- a/kernel/bpf/bpf_cgrp_storage.c
+++ b/kernel/bpf/bpf_cgrp_storage.c
@@ -11,29 +11,6 @@
DEFINE_BPF_STORAGE_CACHE(cgroup_cache);
-static DEFINE_PER_CPU(int, bpf_cgrp_storage_busy);
-
-static void bpf_cgrp_storage_lock(void)
-{
- cant_migrate();
- this_cpu_inc(bpf_cgrp_storage_busy);
-}
-
-static void bpf_cgrp_storage_unlock(void)
-{
- this_cpu_dec(bpf_cgrp_storage_busy);
-}
-
-static bool bpf_cgrp_storage_trylock(void)
-{
- cant_migrate();
- if (unlikely(this_cpu_inc_return(bpf_cgrp_storage_busy) != 1)) {
- this_cpu_dec(bpf_cgrp_storage_busy);
- return false;
- }
- return true;
-}
-
static struct bpf_local_storage __rcu **cgroup_storage_ptr(void *owner)
{
struct cgroup *cg = owner;
@@ -45,18 +22,14 @@ void bpf_cgrp_storage_free(struct cgroup *cgroup)
{
struct bpf_local_storage *local_storage;
- migrate_disable();
rcu_read_lock();
local_storage = rcu_dereference(cgroup->bpf_cgrp_storage);
if (!local_storage)
goto out;
- bpf_cgrp_storage_lock();
bpf_local_storage_destroy(local_storage);
- bpf_cgrp_storage_unlock();
out:
rcu_read_unlock();
- migrate_enable();
}
static struct bpf_local_storage_data *
@@ -85,9 +58,7 @@ static void *bpf_cgrp_storage_lookup_elem(struct bpf_map *map, void *key)
if (IS_ERR(cgroup))
return ERR_CAST(cgroup);
- bpf_cgrp_storage_lock();
sdata = cgroup_storage_lookup(cgroup, map, true);
- bpf_cgrp_storage_unlock();
cgroup_put(cgroup);
return sdata ? sdata->data : NULL;
}
@@ -104,10 +75,8 @@ static long bpf_cgrp_storage_update_elem(struct bpf_map *map, void *key,
if (IS_ERR(cgroup))
return PTR_ERR(cgroup);
- bpf_cgrp_storage_lock();
sdata = bpf_local_storage_update(cgroup, (struct bpf_local_storage_map *)map,
value, map_flags, false, GFP_ATOMIC);
- bpf_cgrp_storage_unlock();
cgroup_put(cgroup);
return PTR_ERR_OR_ZERO(sdata);
}
@@ -133,9 +102,7 @@ static long bpf_cgrp_storage_delete_elem(struct bpf_map *map, void *key)
if (IS_ERR(cgroup))
return PTR_ERR(cgroup);
- bpf_cgrp_storage_lock();
err = cgroup_storage_delete(cgroup, map);
- bpf_cgrp_storage_unlock();
cgroup_put(cgroup);
return err;
}
@@ -152,7 +119,7 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
static void cgroup_storage_map_free(struct bpf_map *map)
{
- bpf_local_storage_map_free(map, &cgroup_cache, &bpf_cgrp_storage_busy);
+ bpf_local_storage_map_free(map, &cgroup_cache, NULL);
}
/* *gfp_flags* is a hidden argument provided by the verifier */
@@ -160,7 +127,6 @@ BPF_CALL_5(bpf_cgrp_storage_get, struct bpf_map *, map, struct cgroup *, cgroup,
void *, value, u64, flags, gfp_t, gfp_flags)
{
struct bpf_local_storage_data *sdata;
- bool nobusy;
WARN_ON_ONCE(!bpf_rcu_lock_held());
if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE))
@@ -169,38 +135,27 @@ BPF_CALL_5(bpf_cgrp_storage_get, struct bpf_map *, map, struct cgroup *, cgroup,
if (!cgroup)
return (unsigned long)NULL;
- nobusy = bpf_cgrp_storage_trylock();
-
- sdata = cgroup_storage_lookup(cgroup, map, nobusy);
+ sdata = cgroup_storage_lookup(cgroup, map, NULL);
if (sdata)
- goto unlock;
+ goto out;
/* only allocate new storage, when the cgroup is refcounted */
if (!percpu_ref_is_dying(&cgroup->self.refcnt) &&
- (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) && nobusy)
+ (flags & BPF_LOCAL_STORAGE_GET_F_CREATE))
sdata = bpf_local_storage_update(cgroup, (struct bpf_local_storage_map *)map,
value, BPF_NOEXIST, false, gfp_flags);
-unlock:
- if (nobusy)
- bpf_cgrp_storage_unlock();
+out:
return IS_ERR_OR_NULL(sdata) ? (unsigned long)NULL : (unsigned long)sdata->data;
}
BPF_CALL_2(bpf_cgrp_storage_delete, struct bpf_map *, map, struct cgroup *, cgroup)
{
- int ret;
-
WARN_ON_ONCE(!bpf_rcu_lock_held());
if (!cgroup)
return -EINVAL;
- if (!bpf_cgrp_storage_trylock())
- return -EBUSY;
-
- ret = cgroup_storage_delete(cgroup, map);
- bpf_cgrp_storage_unlock();
- return ret;
+ return cgroup_storage_delete(cgroup, map);
}
const struct bpf_map_ops cgrp_storage_map_ops = {
--
2.47.3
* [RFC PATCH bpf-next v1 08/11] bpf: Remove unused percpu counter from bpf_local_storage_map_free
2025-07-29 18:25 [RFC PATCH bpf-next v1 00/11] Remove task and cgroup local Amery Hung
` (6 preceding siblings ...)
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 07/11] bpf: Remove cgroup " Amery Hung
@ 2025-07-29 18:25 ` Amery Hung
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 09/11] selftests/bpf: Update task_local_storage/recursion test Amery Hung
` (4 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Amery Hung @ 2025-07-29 18:25 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, memxor, kpsingh,
martin.lau, yonghong.song, song, haoluo, ameryhung, kernel-team
Percpu locks have been removed from cgroup and task local storage. Now
that no local storage uses percpu variables as locks to prevent
recursion, there is no need to pass them to
bpf_local_storage_map_free(). Remove the argument from the function.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
include/linux/bpf_local_storage.h | 3 +--
kernel/bpf/bpf_cgrp_storage.c | 2 +-
kernel/bpf/bpf_inode_storage.c | 2 +-
kernel/bpf/bpf_local_storage.c | 7 +------
kernel/bpf/bpf_task_storage.c | 2 +-
net/core/bpf_sk_storage.c | 2 +-
6 files changed, 6 insertions(+), 12 deletions(-)
diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index 2a0aae5168fa..5888e012dfe3 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -170,8 +170,7 @@ bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
void bpf_local_storage_destroy(struct bpf_local_storage *local_storage);
void bpf_local_storage_map_free(struct bpf_map *map,
- struct bpf_local_storage_cache *cache,
- int __percpu *busy_counter);
+ struct bpf_local_storage_cache *cache);
int bpf_local_storage_map_check_btf(const struct bpf_map *map,
const struct btf *btf,
diff --git a/kernel/bpf/bpf_cgrp_storage.c b/kernel/bpf/bpf_cgrp_storage.c
index 4f9cfa032870..a57abb2956d5 100644
--- a/kernel/bpf/bpf_cgrp_storage.c
+++ b/kernel/bpf/bpf_cgrp_storage.c
@@ -119,7 +119,7 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
static void cgroup_storage_map_free(struct bpf_map *map)
{
- bpf_local_storage_map_free(map, &cgroup_cache, NULL);
+ bpf_local_storage_map_free(map, &cgroup_cache);
}
/* *gfp_flags* is a hidden argument provided by the verifier */
diff --git a/kernel/bpf/bpf_inode_storage.c b/kernel/bpf/bpf_inode_storage.c
index e55c3e0f5bb9..46985acff0dd 100644
--- a/kernel/bpf/bpf_inode_storage.c
+++ b/kernel/bpf/bpf_inode_storage.c
@@ -186,7 +186,7 @@ static struct bpf_map *inode_storage_map_alloc(union bpf_attr *attr)
static void inode_storage_map_free(struct bpf_map *map)
{
- bpf_local_storage_map_free(map, &inode_cache, NULL);
+ bpf_local_storage_map_free(map, &inode_cache);
}
const struct bpf_map_ops inode_storage_map_ops = {
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 5faa1df4fc50..94d7e7f88572 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -935,8 +935,7 @@ bpf_local_storage_map_alloc(union bpf_attr *attr,
}
void bpf_local_storage_map_free(struct bpf_map *map,
- struct bpf_local_storage_cache *cache,
- int __percpu *busy_counter)
+ struct bpf_local_storage_cache *cache)
{
struct bpf_local_storage_map_bucket *b;
struct bpf_local_storage_elem *selem;
@@ -969,11 +968,7 @@ void bpf_local_storage_map_free(struct bpf_map *map,
while ((selem = hlist_entry_safe(
rcu_dereference_raw(hlist_first_rcu(&b->list)),
struct bpf_local_storage_elem, map_node))) {
- if (busy_counter)
- this_cpu_inc(*busy_counter);
WARN_ON(bpf_selem_unlink(selem, true));
- if (busy_counter)
- this_cpu_dec(*busy_counter);
cond_resched_rcu();
}
rcu_read_unlock();
diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
index ff1e2e4cc5b5..b9d8c029c0e5 100644
--- a/kernel/bpf/bpf_task_storage.c
+++ b/kernel/bpf/bpf_task_storage.c
@@ -218,7 +218,7 @@ static struct bpf_map *task_storage_map_alloc(union bpf_attr *attr)
static void task_storage_map_free(struct bpf_map *map)
{
- bpf_local_storage_map_free(map, &task_cache, NULL);
+ bpf_local_storage_map_free(map, &task_cache);
}
BTF_ID_LIST_GLOBAL_SINGLE(bpf_local_storage_map_btf_id, struct, bpf_local_storage_map)
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 7b3d44667cee..7037b841cf11 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -62,7 +62,7 @@ void bpf_sk_storage_free(struct sock *sk)
static void bpf_sk_storage_map_free(struct bpf_map *map)
{
- bpf_local_storage_map_free(map, &sk_cache, NULL);
+ bpf_local_storage_map_free(map, &sk_cache);
}
static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
--
2.47.3
* [RFC PATCH bpf-next v1 09/11] selftests/bpf: Update task_local_storage/recursion test
2025-07-29 18:25 [RFC PATCH bpf-next v1 00/11] Remove task and cgroup local Amery Hung
` (7 preceding siblings ...)
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 08/11] bpf: Remove unused percpu counter from bpf_local_storage_map_free Amery Hung
@ 2025-07-29 18:25 ` Amery Hung
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 10/11] selftests/bpf: Remove test_task_storage_map_stress_lookup Amery Hung
` (3 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Amery Hung @ 2025-07-29 18:25 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, memxor, kpsingh,
martin.lau, yonghong.song, song, haoluo, ameryhung, kernel-team
Update the expected result of the selftest as the recursion handling of
task local storage syscalls and helpers has been relaxed. Now that the
percpu counter is removed, the task local storage helpers
bpf_task_storage_get() and bpf_task_storage_delete() can run on the same
CPU at the same time, as long as they do not cause a deadlock.
Note that since there is no longer a percpu counter preventing recursion in
the task local storage helpers, the bpf trampoline now catches the recursion
of on_update, as reported by recursion_misses.
on_enter: tp_btf/sys_enter
on_update: fentry/bpf_local_storage_update

Old behavior:

  on_enter
    bpf_task_storage_get(&map_a)
      bpf_task_storage_trylock succeed
      bpf_local_storage_update(&map_a)
        on_update
          bpf_task_storage_get(&map_a)
            bpf_task_storage_trylock fail
            return NULL
          bpf_task_storage_get(&map_b)
            bpf_task_storage_trylock fail
            return NULL
      create and return map_a::ptr
    map_a::ptr = 200
    bpf_task_storage_get(&map_b)
      bpf_task_storage_trylock succeed
      bpf_local_storage_update(&map_b)
        on_update
          bpf_task_storage_get(&map_a)
            bpf_task_storage_trylock fail
            lockless lookup succeed
            return map_a::ptr
          map_a::ptr += 1 (201)
          bpf_task_storage_delete(&map_a)
            bpf_task_storage_trylock fail
            return -EBUSY
          nr_del_errs++ (1)
          bpf_task_storage_get(&map_b)
            bpf_task_storage_trylock fail
            return NULL
      create and return map_b::ptr
    map_b::ptr = 100

New behavior:

  on_enter
    bpf_task_storage_get(&map_a)
      bpf_local_storage_update(&map_a)
        on_update
          bpf_task_storage_get(&map_a)
            on_update::misses++ (1)
            create and return map_a::ptr
          map_a::ptr += 1 (1)
          bpf_task_storage_delete(&map_a)
            return 0
          bpf_task_storage_get(&map_b)
            on_update::misses++ (2)
            create and return map_b::ptr
          map_b::ptr += 1 (1)
      create and return map_a::ptr
    map_a::ptr = 200
    bpf_task_storage_get(&map_b)
      lockless lookup succeed
      return map_b::ptr

Expected result:
                                 Old behavior   New behavior
  map_a::ptr                     201            200
  map_b::ptr                     100            1
  nr_del_errs                    1              0
  on_update::recursion_misses    0              2
  on_enter::recursion_misses     0              0
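To make the trace above concrete, here is a minimal sketch of the two
programs (illustrative only, not the actual task_ls_recursion selftest
source; the real test also filters on a test pid and handles errors more
carefully):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char _license[] SEC("license") = "GPL";

  int nr_del_errs = 0;

  struct {
          __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
          __uint(map_flags, BPF_F_NO_PREALLOC);
          __type(key, int);
          __type(value, long);
  } map_a SEC(".maps");

  struct {
          __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
          __uint(map_flags, BPF_F_NO_PREALLOC);
          __type(key, int);
          __type(value, long);
  } map_b SEC(".maps");

  SEC("fentry/bpf_local_storage_update")
  int BPF_PROG(on_update)
  {
          struct task_struct *task = bpf_get_current_task_btf();
          long *ptr;

          /* With the percpu counter gone, this nested get can create the
           * element; the nested bpf_local_storage_update() it triggers is
           * caught by the trampoline and counted in recursion_misses.
           */
          ptr = bpf_task_storage_get(&map_a, task, 0,
                                     BPF_LOCAL_STORAGE_GET_F_CREATE);
          if (ptr) {
                  *ptr += 1;
                  if (bpf_task_storage_delete(&map_a, task))
                          nr_del_errs++;
          }

          ptr = bpf_task_storage_get(&map_b, task, 0,
                                     BPF_LOCAL_STORAGE_GET_F_CREATE);
          if (ptr)
                  *ptr += 1;
          return 0;
  }

  SEC("tp_btf/sys_enter")
  int BPF_PROG(on_enter, struct pt_regs *regs, long id)
  {
          struct task_struct *task = bpf_get_current_task_btf();
          long *ptr;

          ptr = bpf_task_storage_get(&map_a, task, 0,
                                     BPF_LOCAL_STORAGE_GET_F_CREATE);
          if (ptr && !*ptr)       /* only set on a freshly created element */
                  *ptr = 200;

          ptr = bpf_task_storage_get(&map_b, task, 0,
                                     BPF_LOCAL_STORAGE_GET_F_CREATE);
          if (ptr && !*ptr)
                  *ptr = 100;
          return 0;
  }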
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
.../testing/selftests/bpf/prog_tests/task_local_storage.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
index 42e822ea352f..559727b05e08 100644
--- a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
+++ b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
@@ -117,19 +117,19 @@ static void test_recursion(void)
map_fd = bpf_map__fd(skel->maps.map_a);
err = bpf_map_lookup_elem(map_fd, &task_fd, &value);
ASSERT_OK(err, "lookup map_a");
- ASSERT_EQ(value, 201, "map_a value");
- ASSERT_EQ(skel->bss->nr_del_errs, 1, "bpf_task_storage_delete busy");
+ ASSERT_EQ(value, 200, "map_a value");
+ ASSERT_EQ(skel->bss->nr_del_errs, 0, "bpf_task_storage_delete busy");
map_fd = bpf_map__fd(skel->maps.map_b);
err = bpf_map_lookup_elem(map_fd, &task_fd, &value);
ASSERT_OK(err, "lookup map_b");
- ASSERT_EQ(value, 100, "map_b value");
+ ASSERT_EQ(value, 1, "map_b value");
prog_fd = bpf_program__fd(skel->progs.on_update);
memset(&info, 0, sizeof(info));
err = bpf_prog_get_info_by_fd(prog_fd, &info, &info_len);
ASSERT_OK(err, "get prog info");
- ASSERT_EQ(info.recursion_misses, 0, "on_update prog recursion");
+ ASSERT_EQ(info.recursion_misses, 2, "on_update prog recursion");
prog_fd = bpf_program__fd(skel->progs.on_enter);
memset(&info, 0, sizeof(info));
--
2.47.3
* [RFC PATCH bpf-next v1 10/11] selftests/bpf: Remove test_task_storage_map_stress_lookup
2025-07-29 18:25 [RFC PATCH bpf-next v1 00/11] Remove task and cgroup local Amery Hung
` (8 preceding siblings ...)
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 09/11] selftests/bpf: Update task_local_storage/recursion test Amery Hung
@ 2025-07-29 18:25 ` Amery Hung
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 11/11] selftests/bpf: Choose another percpu variable in bpf for btf_dump test Amery Hung
` (2 subsequent siblings)
12 siblings, 0 replies; 19+ messages in thread
From: Amery Hung @ 2025-07-29 18:25 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, memxor, kpsingh,
martin.lau, yonghong.song, song, haoluo, ameryhung, kernel-team
Remove the test in test_maps that checks whether updating the percpu counter
in the task local storage map is preemption safe, as the percpu counter has
now been removed.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
.../bpf/map_tests/task_storage_map.c | 128 ------------------
.../bpf/progs/read_bpf_task_storage_busy.c | 38 ------
2 files changed, 166 deletions(-)
delete mode 100644 tools/testing/selftests/bpf/map_tests/task_storage_map.c
delete mode 100644 tools/testing/selftests/bpf/progs/read_bpf_task_storage_busy.c
diff --git a/tools/testing/selftests/bpf/map_tests/task_storage_map.c b/tools/testing/selftests/bpf/map_tests/task_storage_map.c
deleted file mode 100644
index a4121d2248ac..000000000000
--- a/tools/testing/selftests/bpf/map_tests/task_storage_map.c
+++ /dev/null
@@ -1,128 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
-#define _GNU_SOURCE
-#include <sched.h>
-#include <unistd.h>
-#include <stdlib.h>
-#include <stdbool.h>
-#include <errno.h>
-#include <string.h>
-#include <pthread.h>
-
-#include <bpf/bpf.h>
-#include <bpf/libbpf.h>
-
-#include "bpf_util.h"
-#include "test_maps.h"
-#include "task_local_storage_helpers.h"
-#include "read_bpf_task_storage_busy.skel.h"
-
-struct lookup_ctx {
- bool start;
- bool stop;
- int pid_fd;
- int map_fd;
- int loop;
-};
-
-static void *lookup_fn(void *arg)
-{
- struct lookup_ctx *ctx = arg;
- long value;
- int i = 0;
-
- while (!ctx->start)
- usleep(1);
-
- while (!ctx->stop && i++ < ctx->loop)
- bpf_map_lookup_elem(ctx->map_fd, &ctx->pid_fd, &value);
- return NULL;
-}
-
-static void abort_lookup(struct lookup_ctx *ctx, pthread_t *tids, unsigned int nr)
-{
- unsigned int i;
-
- ctx->stop = true;
- ctx->start = true;
- for (i = 0; i < nr; i++)
- pthread_join(tids[i], NULL);
-}
-
-void test_task_storage_map_stress_lookup(void)
-{
-#define MAX_NR_THREAD 4096
- unsigned int i, nr = 256, loop = 8192, cpu = 0;
- struct read_bpf_task_storage_busy *skel;
- pthread_t tids[MAX_NR_THREAD];
- struct lookup_ctx ctx;
- cpu_set_t old, new;
- const char *cfg;
- int err;
-
- cfg = getenv("TASK_STORAGE_MAP_NR_THREAD");
- if (cfg) {
- nr = atoi(cfg);
- if (nr > MAX_NR_THREAD)
- nr = MAX_NR_THREAD;
- }
- cfg = getenv("TASK_STORAGE_MAP_NR_LOOP");
- if (cfg)
- loop = atoi(cfg);
- cfg = getenv("TASK_STORAGE_MAP_PIN_CPU");
- if (cfg)
- cpu = atoi(cfg);
-
- skel = read_bpf_task_storage_busy__open_and_load();
- err = libbpf_get_error(skel);
- CHECK(err, "open_and_load", "error %d\n", err);
-
- /* Only for a fully preemptible kernel */
- if (!skel->kconfig->CONFIG_PREEMPTION) {
- printf("%s SKIP (no CONFIG_PREEMPTION)\n", __func__);
- read_bpf_task_storage_busy__destroy(skel);
- skips++;
- return;
- }
-
- /* Save the old affinity setting */
- sched_getaffinity(getpid(), sizeof(old), &old);
-
- /* Pinned on a specific CPU */
- CPU_ZERO(&new);
- CPU_SET(cpu, &new);
- sched_setaffinity(getpid(), sizeof(new), &new);
-
- ctx.start = false;
- ctx.stop = false;
- ctx.pid_fd = sys_pidfd_open(getpid(), 0);
- ctx.map_fd = bpf_map__fd(skel->maps.task);
- ctx.loop = loop;
- for (i = 0; i < nr; i++) {
- err = pthread_create(&tids[i], NULL, lookup_fn, &ctx);
- if (err) {
- abort_lookup(&ctx, tids, i);
- CHECK(err, "pthread_create", "error %d\n", err);
- goto out;
- }
- }
-
- ctx.start = true;
- for (i = 0; i < nr; i++)
- pthread_join(tids[i], NULL);
-
- skel->bss->pid = getpid();
- err = read_bpf_task_storage_busy__attach(skel);
- CHECK(err, "attach", "error %d\n", err);
-
- /* Trigger program */
- sys_gettid();
- skel->bss->pid = 0;
-
- CHECK(skel->bss->busy != 0, "bad bpf_task_storage_busy", "got %d\n", skel->bss->busy);
-out:
- read_bpf_task_storage_busy__destroy(skel);
- /* Restore affinity setting */
- sched_setaffinity(getpid(), sizeof(old), &old);
- printf("%s:PASS\n", __func__);
-}
diff --git a/tools/testing/selftests/bpf/progs/read_bpf_task_storage_busy.c b/tools/testing/selftests/bpf/progs/read_bpf_task_storage_busy.c
deleted file mode 100644
index 69da05bb6c63..000000000000
--- a/tools/testing/selftests/bpf/progs/read_bpf_task_storage_busy.c
+++ /dev/null
@@ -1,38 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
-#include "vmlinux.h"
-#include <bpf/bpf_helpers.h>
-#include <bpf/bpf_tracing.h>
-
-extern bool CONFIG_PREEMPTION __kconfig __weak;
-extern const int bpf_task_storage_busy __ksym;
-
-char _license[] SEC("license") = "GPL";
-
-int pid = 0;
-int busy = 0;
-
-struct {
- __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
- __uint(map_flags, BPF_F_NO_PREALLOC);
- __type(key, int);
- __type(value, long);
-} task SEC(".maps");
-
-SEC("raw_tp/sys_enter")
-int BPF_PROG(read_bpf_task_storage_busy)
-{
- int *value;
-
- if (!CONFIG_PREEMPTION)
- return 0;
-
- if (bpf_get_current_pid_tgid() >> 32 != pid)
- return 0;
-
- value = bpf_this_cpu_ptr(&bpf_task_storage_busy);
- if (value)
- busy = *value;
-
- return 0;
-}
--
2.47.3
* [RFC PATCH bpf-next v1 11/11] selftests/bpf: Choose another percpu variable in bpf for btf_dump test
2025-07-29 18:25 [RFC PATCH bpf-next v1 00/11] Remove task and cgroup local Amery Hung
` (9 preceding siblings ...)
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 10/11] selftests/bpf: Remove test_task_storage_map_stress_lookup Amery Hung
@ 2025-07-29 18:25 ` Amery Hung
2025-07-29 18:28 ` [RFC PATCH bpf-next v1 00/11] Remove task and cgroup local Amery Hung
2025-08-02 1:46 ` Martin KaFai Lau
12 siblings, 0 replies; 19+ messages in thread
From: Amery Hung @ 2025-07-29 18:25 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, memxor, kpsingh,
martin.lau, yonghong.song, song, haoluo, ameryhung, kernel-team
bpf_cgrp_storage_busy has been removed. Use bpf_bprintf_nest_level instead.
This percpu variable also lives in the bpf subsystem, so if it is ever removed
in the future, BPF-CI will catch this type of CI-breaking change.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
tools/testing/selftests/bpf/prog_tests/btf_dump.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/btf_dump.c b/tools/testing/selftests/bpf/prog_tests/btf_dump.c
index 82903585c870..e08b6e6475df 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf_dump.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf_dump.c
@@ -875,8 +875,8 @@ static void test_btf_dump_var_data(struct btf *btf, struct btf_dump *d,
TEST_BTF_DUMP_VAR(btf, d, NULL, str, "cpu_number", int, BTF_F_COMPACT,
"int cpu_number = (int)100", 100);
#endif
- TEST_BTF_DUMP_VAR(btf, d, NULL, str, "bpf_cgrp_storage_busy", int, BTF_F_COMPACT,
- "static int bpf_cgrp_storage_busy = (int)2", 2);
+ TEST_BTF_DUMP_VAR(btf, d, NULL, str, "bpf_bprintf_nest_level", int, BTF_F_COMPACT,
+ "static int bpf_bprintf_nest_level = (int)2", 2);
}
struct btf_dump_string_ctx {
--
2.47.3
* Re: [RFC PATCH bpf-next v1 00/11] Remove task and cgroup local
2025-07-29 18:25 [RFC PATCH bpf-next v1 00/11] Remove task and cgroup local Amery Hung
` (10 preceding siblings ...)
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 11/11] selftests/bpf: Choose another percpu variable in bpf for btf_dump test Amery Hung
@ 2025-07-29 18:28 ` Amery Hung
2025-08-02 1:46 ` Martin KaFai Lau
12 siblings, 0 replies; 19+ messages in thread
From: Amery Hung @ 2025-07-29 18:28 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, memxor, kpsingh,
martin.lau, yonghong.song, song, haoluo, kernel-team
The title is "Remove task and cgroup local storage percpu counters" (it got truncated in the subject line).
On Tue, Jul 29, 2025 at 11:25 AM Amery Hung <ameryhung@gmail.com> wrote:
>
> * Motivation *
>
> The goal of this patchset is to make bpf syscalls and helpers updating
> task and cgroup local storage more robust by removing percpu counters
> in them. Task local storage and cgroup storage each employs a percpu
> counter to prevent deadlock caused by recursion. Since the underlying
> bpf local storage takes spinlocks in various operations, bpf programs
> running recursively may try to take a spinlock which is already taken.
> For example, when a tracing bpf program called recursively during
> bpf_task_storage_get(..., F_CREATE) tries to call
> bpf_task_storage_get(..., F_CREATE) again, it will cause AA deadlock
> if the percpu variable is not in place.
>
> However, sometimes, the percpu counter may cause bpf syscalls or helpers
> to return errors spuriously, as soon as another threads is also updating
> the local storage or the local storage map. Ideally, the two threads
> could have taken turn to take the locks and perform their jobs
> respectively. However, due to the percpu counter, the syscalls and
> helpers can return -EBUSY even if one of them does not run recursively
> in another one. All it takes for this to happen is if the two threads run
> on the same CPU. This happened when BPF-CI ran the selftest of task local
> data. Since CI runs the test on VM with 2 CPUs, bpf_task_storage_get(...,
> F_CREATE) can easily fail.
>
> The failure mode is not good for users as they need to add retry logic
> in user space or bpf programs to avoid it. Even with retry, there
> is no guaranteed upper bound of the loop for a succeess call. Therefore,
> this patchset seeks to remove the percpu counter and makes the related
> bpf syscalls and helpers more reliable, while still make sure recursion
> deadlock will not happen, with the help of resilient queued spinlock
> (rqspinlock).
>
>
> * Implementation *
>
> To remove the percpu counter without introducing deadlock,
> bpf_local_storage is refactored by changing the locks from raw_spin_lock
> to rqspinlock, which prevents deadlock with deadlock detection and a
> timeout mechanism.
>
> There are two locks in bpf_local_storage due to the ownership model as
> illustrated in the figure below. A map value, which consists of a
> pointer to the map and the data, is a bpf_local_storage_map_data (sdata)
> stored in a bpf_local_storage_elem (selem). A selem belongs to a
> bpf_local_storage and bpf_local_storage_map at the same time.
> bpf_local_storage::lock (lock_storage->lock in short) protects the list
> in a bpf_local_storage and bpf_local_storage_map_bucket::lock (b->lock)
> protects the hash bucket in a bpf_local_storage_map.
>
>
> task_struct
> ┌ task1 ───────┐ bpf_local_storage
> │ *bpf_storage │---->┌─────────┐
> └──────────────┘<----│ *owner │ bpf_local_storage_elem
> │ *cache[16] (selem) selem
> │ *smap │ ┌──────────┐ ┌──────────┐
> │ list │------->│ snode │<------->│ snode │
> │ lock │ ┌---->│ map_node │<--┐ ┌-->│ map_node │
> └─────────┘ │ │ sdata = │ │ │ │ sdata = │
> task_struct │ │ {&mapA,} │ │ │ │ {&mapB,} │
> ┌ task2 ───────┐ bpf_local_storage └──────────┘ │ │ └──────────┘
> │ *bpf_storage │---->┌─────────┐ │ │ │
> └──────────────┘<----│ *owner │ │ │ │
> │ *cache[16] │ selem │ │ selem
> │ *smap │ │ ┌──────────┐ │ │ ┌──────────┐
> │ list │--│---->│ snode │<--│-│-->│ snode │
> │ lock │ │ ┌-->│ map_node │ └-│-->│ map_node │
> └─────────┘ │ │ │ sdata = │ │ │ sdata = │
> bpf_local_storage_map │ │ │ {&mapB,} │ │ │ {&mapA,} │
> (smap) │ │ └──────────┘ │ └──────────┘
> ┌ mapA ───────┐ │ │ │
> │ bpf_map map │ bpf_local_storage_map_bucket │
> │ *buckets │---->┌ b[0] ┐ │ │ │
> └─────────────┘ │ list │------┘ │ │
> │ lock │ │ │
> └──────┘ │ │
> smap ... │ │
> ┌ mapB ───────┐ │ │
> │ bpf_map map │ bpf_local_storage_map_bucket │
> │ *buckets │---->┌ b[0] ┐ │ │
> └─────────────┘ │ list │--------┘ │
> │ lock │ │
> └──────┘ │
> ┌ b[1] ┐ │
> │ list │-----------------------------┘
> │ lock │
> └──────┘
> ...
>
>
> The refactoring is divided into three steps.
>
> First, in patch 1-4, local storage helpers that take locks are being
> converted to failable. The functions are changed from returning void to
> returning an int error code with the return value temporarily set to 0.
> In callers where the helpers cannot fail in the middle of an update,
> the helper is open coded. In callers that are not allowed to fail (i.e.,
> bpf_local_storage_destroy() and bpf_local_storage_map_free()), we make
> sure the functions cannot be called recursively, which would cause a
> deadlock, and assert that the return value is 0.
>
> Then, in patch 5, the locks are changed to rqspinlock, and the error
> returned from raw_res_spin_lock_irqsave() is propagated to the syscalls
> and helpers.
>
> Finally, in patch 7-8, the percpu counters in task and cgroup local
> storage are removed.
>
> Question:
>
> - In bpf_local_storage_destroy() and bpf_local_storage_map_free(), where
> it is not allowed to fail, I assert that the lock acquisition always
> succeeds based on the fact that 1) these paths cannot run recursively
> causing AA deadlock and 2) local_storage->lock and b->lock are always
> acquired in the same order, but I also notice that rqspinlock has
> a timeout fallback. Is this assertion an okay thing to do?
>
>
> * Test *
>
> Task and cgroup local storage selftests have already covered deadlock
> caused by recursion. Patch 9 updates the expected result of task local
> storage selftests as task local storage bpf helpers can now run on the
> same CPU as they don't cause deadlock.
>
> * Patch organization *
>
> E(exposed) L(local storage lock) B(bucket lock)
> EL __bpf_local_storage_insert_cache (skip cache update)
> ELB bpf_local_storage_destroy (cannot recur)
> ELB bpf_local_storage_map_free (cannot recur)
> ELB bpf_selem_unlink --> Patch 4
> E B bpf_selem_link_map --> Patch 2
> B bpf_selem_unlink_map --> Patch 1
> L bpf_selem_unlink_storage --> Patch 3
>
> During the refactoring, to make sure all exposed functions
> handle the error returned by raw_res_spin_lock_irqsave(),
> __must_check is added locally to catch all callers.
>
> Patch 1-4
> Convert local storage helpers to failable, or open-code
> the helpers
>
> Patch 5
> Change local_storage->lock and b->lock from
> raw_spin_lock to rqspinlock
>
> Patch 6
> Remove percpu lock in task local storage and remove
> bpf_task_storage_{get,delete}_recur()
>
> Patch 7
> Remove percpu lock in cgroup local storage
>
> Patch 8
> Remove percpu lock in bpf_local_storage
>
> Patch 9
> Update task local storage recursion test
>
> Patch 10
> Remove task local storage stress test
>
> Patch 11
> Update btf_dump to use another percpu variable
>
> ----
>
> Amery Hung (11):
> bpf: Convert bpf_selem_unlink_map to failable
> bpf: Convert bpf_selem_link_map to failable
> bpf: Open code bpf_selem_unlink_storage in bpf_selem_unlink
> bpf: Convert bpf_selem_unlink to failable
> bpf: Change local_storage->lock and b->lock to rqspinlock
> bpf: Remove task local storage percpu counter
> bpf: Remove cgroup local storage percpu counter
> bpf: Remove unused percpu counter from bpf_local_storage_map_free
> selftests/bpf: Update task_local_storage/recursion test
> selftests/bpf: Remove test_task_storage_map_stress_lookup
> selftests/bpf: Choose another percpu variable in bpf for btf_dump test
>
> include/linux/bpf_local_storage.h | 14 +-
> kernel/bpf/bpf_cgrp_storage.c | 60 +-----
> kernel/bpf/bpf_inode_storage.c | 6 +-
> kernel/bpf/bpf_local_storage.c | 202 ++++++++++++------
> kernel/bpf/bpf_task_storage.c | 153 ++-----------
> kernel/bpf/helpers.c | 4 -
> net/core/bpf_sk_storage.c | 10 +-
> .../bpf/map_tests/task_storage_map.c | 128 -----------
> .../selftests/bpf/prog_tests/btf_dump.c | 4 +-
> .../bpf/prog_tests/task_local_storage.c | 8 +-
> .../bpf/progs/read_bpf_task_storage_busy.c | 38 ----
> 11 files changed, 184 insertions(+), 443 deletions(-)
> delete mode 100644 tools/testing/selftests/bpf/map_tests/task_storage_map.c
> delete mode 100644 tools/testing/selftests/bpf/progs/read_bpf_task_storage_busy.c
>
> --
> 2.47.3
>
* Re: [RFC PATCH bpf-next v1 01/11] bpf: Convert bpf_selem_unlink_map to failable
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 01/11] bpf: Convert bpf_selem_unlink_map to failable Amery Hung
@ 2025-08-01 1:05 ` Martin KaFai Lau
0 siblings, 0 replies; 19+ messages in thread
From: Martin KaFai Lau @ 2025-08-01 1:05 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, memxor, kpsingh,
martin.lau, yonghong.song, song, haoluo, kernel-team
On 7/29/25 11:25 AM, Amery Hung wrote:
> - bpf_local_storage_update()
>
> The three step update process: link_map(new_selem),
> link_storage(new_selem), and unlink_map(old_selem) should not fail in
> the middle. Hence, lock both b->lock before the update process starts.
>
> While locking two different buckets decided by the hash function
> introduces different locking order, this will not cause ABBA deadlock
> since this is performed under local_storage->lock.
I am not sure it is always true, e.g. two threads running on different cores can
do bpf_local_storage_update() for two different sk, and then there will be two
different local_storage->lock instances.
My current thought is to change select_bucket() to depend on the owner
pointer (e.g. *sk, *task...) or the local_storage pointer instead. The
intuition is that the owner pointer is easier to understand than the
local_storage pointer. Then the same owner always hashes to the same bucket
of a map.
I am not sure the owner pointer is always available in the current setup
during delete. This needs to be checked. IIRC, the current setup is that
local_storage->lock and the bucket lock are not always acquired together. It
seems the patch set now needs to acquire both of them together if possible.
With this, I suspect something else can be simplified here, which may also
make the owner pointer available during delete (if it is indeed missing in
some cases now). Not very sure yet. I need a bit more time to take a closer
look.
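A rough sketch of the owner-keyed bucket selection described above (the new
select_bucket() signature is an assumption for illustration, not code from
this series):

  static struct bpf_local_storage_map_bucket *
  select_bucket(struct bpf_local_storage_map *smap, void *owner)
  {
          /* Hash the owner pointer (e.g. *sk, *task) instead of the selem,
           * so that all selems of the same owner land in the same bucket of
           * a given map, making the bucket deterministic on update/delete.
           */
          return &smap->buckets[hash_ptr(owner, smap->bucket_log)];
  }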
Thanks for working on this! I think it can simplify the local storage.
[ ... ]
> @@ -560,8 +595,9 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> struct bpf_local_storage_data *old_sdata = NULL;
> struct bpf_local_storage_elem *alloc_selem, *selem = NULL;
> struct bpf_local_storage *local_storage;
> + struct bpf_local_storage_map_bucket *b, *old_b;
> HLIST_HEAD(old_selem_free_list);
> - unsigned long flags;
> + unsigned long flags, b_flags, old_b_flags;
> int err;
>
> /* BPF_EXIST and BPF_NOEXIST cannot be both set */
> @@ -645,20 +681,31 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> goto unlock;
> }
>
> + b = select_bucket(smap, selem);
> + old_b = old_sdata ? select_bucket(smap, SELEM(old_sdata)) : b;
> +
> + raw_spin_lock_irqsave(&b->lock, b_flags);
> + if (b != old_b)
> + raw_spin_lock_irqsave(&old_b->lock, old_b_flags);
* Re: [RFC PATCH bpf-next v1 03/11] bpf: Open code bpf_selem_unlink_storage in bpf_selem_unlink
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 03/11] bpf: Open code bpf_selem_unlink_storage in bpf_selem_unlink Amery Hung
@ 2025-08-02 0:58 ` Martin KaFai Lau
2025-08-05 16:25 ` Amery Hung
0 siblings, 1 reply; 19+ messages in thread
From: Martin KaFai Lau @ 2025-08-02 0:58 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, memxor, kpsingh,
martin.lau, yonghong.song, song, haoluo, kernel-team
On 7/29/25 11:25 AM, Amery Hung wrote:
> void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
> {
> + struct bpf_local_storage_map *storage_smap;
> + struct bpf_local_storage *local_storage = NULL;
> + bool bpf_ma, free_local_storage = false;
> + HLIST_HEAD(selem_free_list);
> struct bpf_local_storage_map_bucket *b;
> - struct bpf_local_storage_map *smap;
> - unsigned long flags;
> + struct bpf_local_storage_map *smap = NULL;
> + unsigned long flags, b_flags;
>
> if (likely(selem_linked_to_map_lockless(selem))) {
Can we simplify the bpf_selem_unlink() function by skipping this map_lockless
check,
> smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
> b = select_bucket(smap, selem);
> - raw_spin_lock_irqsave(&b->lock, flags);
> + }
>
> - /* Always unlink from map before unlinking from local_storage
> - * because selem will be freed after successfully unlinked from
> - * the local_storage.
> - */
> - bpf_selem_unlink_map_nolock(selem);
> - raw_spin_unlock_irqrestore(&b->lock, flags);
> + if (likely(selem_linked_to_storage_lockless(selem))) {
and only depend on this, then proceed to take the local_storage->lock? Then
recheck selem_linked_to_storage(selem), do bpf_selem_unlink_map(selem) first,
and then bpf_selem_unlink_storage_nolock(selem) last.
Then bpf_selem_unlink_map() can use selem->local_storage->owner for select_bucket().
> + local_storage = rcu_dereference_check(selem->local_storage,
> + bpf_rcu_lock_held());
> + storage_smap = rcu_dereference_check(local_storage->smap,
> + bpf_rcu_lock_held());
> + bpf_ma = check_storage_bpf_ma(local_storage, storage_smap, selem);
> }
>
> - bpf_selem_unlink_storage(selem, reuse_now);
> + if (local_storage)
> + raw_spin_lock_irqsave(&local_storage->lock, flags);
> + if (smap)
> + raw_spin_lock_irqsave(&b->lock, b_flags);
> +
> + /* Always unlink from map before unlinking from local_storage
> + * because selem will be freed after successfully unlinked from
> + * the local_storage.
> + */
> + if (smap)
> + bpf_selem_unlink_map_nolock(selem);
> + if (local_storage && likely(selem_linked_to_storage(selem)))
> + free_local_storage = bpf_selem_unlink_storage_nolock(
> + local_storage, selem, true, &selem_free_list);
> +
> + if (smap)
> + raw_spin_unlock_irqrestore(&b->lock, b_flags);
> + if (local_storage)
> + raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> +
> + bpf_selem_free_list(&selem_free_list, reuse_now);
> +
> + if (free_local_storage)
> + bpf_local_storage_free(local_storage, storage_smap, bpf_ma, reuse_now);
> }
>
> void __bpf_local_storage_insert_cache(struct bpf_local_storage *local_storage,
* Re: [RFC PATCH bpf-next v1 06/11] bpf: Remove task local storage percpu counter
2025-07-29 18:25 ` [RFC PATCH bpf-next v1 06/11] bpf: Remove task local storage percpu counter Amery Hung
@ 2025-08-02 1:15 ` Martin KaFai Lau
0 siblings, 0 replies; 19+ messages in thread
From: Martin KaFai Lau @ 2025-08-02 1:15 UTC (permalink / raw)
To: Amery Hung
Cc: netdev, alexei.starovoitov, andrii, daniel, memxor, kpsingh,
martin.lau, yonghong.song, song, haoluo, kernel-team, bpf
On 7/29/25 11:25 AM, Amery Hung wrote:
> kernel/bpf/bpf_task_storage.c | 149 ++++------------------------------
nice.
> +BPF_CALL_5(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
> + task, void *, value, u64, flags, gfp_t, gfp_flags)
> {
> struct bpf_local_storage_data *sdata;
>
> - sdata = task_storage_lookup(task, map, nobusy);
> + WARN_ON_ONCE(!bpf_rcu_lock_held());
> + if (flags & ~BPF_LOCAL_STORAGE_GET_F_CREATE || !task)
> + return (unsigned long)NULL;
> +
> + sdata = task_storage_lookup(task, map, true);
> if (sdata)
> - return sdata->data;
> + return (unsigned long)sdata->data;
>
> /* only allocate new storage, when the task is refcounted */
> if (refcount_read(&task->usage) &&
> - (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) && nobusy) {
> + (flags & BPF_LOCAL_STORAGE_GET_F_CREATE)) {
> sdata = bpf_local_storage_update(
> task, (struct bpf_local_storage_map *)map, value,
> BPF_NOEXIST, false, gfp_flags);
> - return IS_ERR(sdata) ? NULL : sdata->data;
> + WARN_ON(IS_ERR(sdata));
A nit for now; it is ok during development/RFC, but this will eventually need
to be removed, e.g. it should not WARN_ON an -ENOMEM.
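For example, the task variant could mirror what the cgroup helper earlier in
the series does and simply translate the error into a NULL return (a sketch,
not necessarily the final error handling):

  sdata = bpf_local_storage_update(task, (struct bpf_local_storage_map *)map,
                                   value, BPF_NOEXIST, false, gfp_flags);
  return IS_ERR_OR_NULL(sdata) ? (unsigned long)NULL : (unsigned long)sdata->data;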
* Re: [RFC PATCH bpf-next v1 00/11] Remove task and cgroup local
2025-07-29 18:25 [RFC PATCH bpf-next v1 00/11] Remove task and cgroup local Amery Hung
` (11 preceding siblings ...)
2025-07-29 18:28 ` [RFC PATCH bpf-next v1 00/11] Remove task and cgroup local Amery Hung
@ 2025-08-02 1:46 ` Martin KaFai Lau
12 siblings, 0 replies; 19+ messages in thread
From: Martin KaFai Lau @ 2025-08-02 1:46 UTC (permalink / raw)
To: Amery Hung, memxor
Cc: netdev, alexei.starovoitov, andrii, daniel, kpsingh, martin.lau,
yonghong.song, song, haoluo, kernel-team, bpf
On 7/29/25 11:25 AM, Amery Hung wrote:
> Question:
>
> - In bpf_local_storage_destroy() and bpf_local_storage_map_free(), where
> it is not allowed to fail, I assert that the lock acquisition always
> succeeds based on the fact that 1) these paths cannot run recursively
> causing AA deadlock and 2) local_storage->lock and b->lock are always
> acquired in the same order, but I also notice that rqspinlock has
> a timeout fallback. Is this assertion an okay thing to do?
At bpf_local_storage_destroy, the task is going away.
At bpf_local_storage_map_free, the map is going away.
A bpf prog needs to have both task ptr and map ptr to be able to do
bpf_task_storage_get(+create) and bpf_task_storage_delete().
The bpf_local_storage_destroy and bpf_local_storage_map_free can run in
parallel, and you mentioned there is a consistent lock ordering. I am not sure
how the timeout fallback behaves (Kumar?), but I don't think either of the two
functions will hold a lock for very long before releasing it.
I also think bpf_local_storage_destroy and bpf_local_storage_map_free should not
fail. It is good to keep the WARN_ON but I would change it to WARN_ON_ONCE.
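As a sketch, the assertion on those paths could look like this after the
rqspinlock conversion (illustrative only; whether to assert here is exactly
the open question above):

  /* destroy/map_free paths: no recursion and consistent lock order, so the
   * acquisition is expected to always succeed.
   */
  WARN_ON_ONCE(raw_res_spin_lock_irqsave(&local_storage->lock, flags));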
* Re: [RFC PATCH bpf-next v1 03/11] bpf: Open code bpf_selem_unlink_storage in bpf_selem_unlink
2025-08-02 0:58 ` Martin KaFai Lau
@ 2025-08-05 16:25 ` Amery Hung
2025-08-05 23:10 ` Martin KaFai Lau
0 siblings, 1 reply; 19+ messages in thread
From: Amery Hung @ 2025-08-05 16:25 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, memxor, kpsingh,
martin.lau, yonghong.song, song, haoluo, kernel-team
On Fri, Aug 1, 2025 at 5:58 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 7/29/25 11:25 AM, Amery Hung wrote:
> > void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
> > {
> > + struct bpf_local_storage_map *storage_smap;
> > + struct bpf_local_storage *local_storage = NULL;
> > + bool bpf_ma, free_local_storage = false;
> > + HLIST_HEAD(selem_free_list);
> > struct bpf_local_storage_map_bucket *b;
> > - struct bpf_local_storage_map *smap;
> > - unsigned long flags;
> > + struct bpf_local_storage_map *smap = NULL;
> > + unsigned long flags, b_flags;
> >
> > if (likely(selem_linked_to_map_lockless(selem))) {
>
> Can we simplify the bpf_selem_unlink() function by skipping this map_lockless
> check,
>
> > smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
> > b = select_bucket(smap, selem);
> > - raw_spin_lock_irqsave(&b->lock, flags);
> > + }
> >
> > - /* Always unlink from map before unlinking from local_storage
> > - * because selem will be freed after successfully unlinked from
> > - * the local_storage.
> > - */
> > - bpf_selem_unlink_map_nolock(selem);
> > - raw_spin_unlock_irqrestore(&b->lock, flags);
> > + if (likely(selem_linked_to_storage_lockless(selem))) {
>
> only depends on this and then proceed to take the lock_storage->lock. Then
> recheck selem_linked_to_storage(selem), bpf_selem_unlink_map(selem) first, and
> then bpf_selem_unlink_storage_nolock(selem) last.
Thanks for the suggestion. I think it will simplify the function. Just
making sure I am getting you right: you mean instead of open coding both
unlink_map and unlink_storage, only open code unlink_storage. First,
grab local_storage->lock and call bpf_selem_unlink_map(). Then, proceed
to unlink_storage only if bpf_selem_unlink_map() succeeds.
>
> Then bpf_selem_unlink_map can use selem->local_storage->owner to select_bucket().
Not sure what this part means. Could you elaborate?
>
> > + local_storage = rcu_dereference_check(selem->local_storage,
> > + bpf_rcu_lock_held());
> > + storage_smap = rcu_dereference_check(local_storage->smap,
> > + bpf_rcu_lock_held());
> > + bpf_ma = check_storage_bpf_ma(local_storage, storage_smap, selem);
> > }
> >
> > - bpf_selem_unlink_storage(selem, reuse_now);
> > + if (local_storage)
> > + raw_spin_lock_irqsave(&local_storage->lock, flags);
> > + if (smap)
> > + raw_spin_lock_irqsave(&b->lock, b_flags);
> > +
> > + /* Always unlink from map before unlinking from local_storage
> > + * because selem will be freed after successfully unlinked from
> > + * the local_storage.
> > + */
> > + if (smap)
> > + bpf_selem_unlink_map_nolock(selem);
> > + if (local_storage && likely(selem_linked_to_storage(selem)))
> > + free_local_storage = bpf_selem_unlink_storage_nolock(
> > + local_storage, selem, true, &selem_free_list);
> > +
> > + if (smap)
> > + raw_spin_unlock_irqrestore(&b->lock, b_flags);
> > + if (local_storage)
> > + raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> > +
> > + bpf_selem_free_list(&selem_free_list, reuse_now);
> > +
> > + if (free_local_storage)
> > + bpf_local_storage_free(local_storage, storage_smap, bpf_ma, reuse_now);
> > }
> >
> > void __bpf_local_storage_insert_cache(struct bpf_local_storage *local_storage,
>
* Re: [RFC PATCH bpf-next v1 03/11] bpf: Open code bpf_selem_unlink_storage in bpf_selem_unlink
2025-08-05 16:25 ` Amery Hung
@ 2025-08-05 23:10 ` Martin KaFai Lau
0 siblings, 0 replies; 19+ messages in thread
From: Martin KaFai Lau @ 2025-08-05 23:10 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, memxor, kpsingh,
martin.lau, yonghong.song, song, haoluo, kernel-team
On 8/5/25 9:25 AM, Amery Hung wrote:
> On Fri, Aug 1, 2025 at 5:58 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 7/29/25 11:25 AM, Amery Hung wrote:
>>> void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
>>> {
>>> + struct bpf_local_storage_map *storage_smap;
>>> + struct bpf_local_storage *local_storage = NULL;
>>> + bool bpf_ma, free_local_storage = false;
>>> + HLIST_HEAD(selem_free_list);
>>> struct bpf_local_storage_map_bucket *b;
>>> - struct bpf_local_storage_map *smap;
>>> - unsigned long flags;
>>> + struct bpf_local_storage_map *smap = NULL;
>>> + unsigned long flags, b_flags;
>>>
>>> if (likely(selem_linked_to_map_lockless(selem))) {
>>
>> Can we simplify the bpf_selem_unlink() function by skipping this map_lockless
>> check,
>>
>>> smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
>>> b = select_bucket(smap, selem);
>>> - raw_spin_lock_irqsave(&b->lock, flags);
>>> + }
>>>
>>> - /* Always unlink from map before unlinking from local_storage
>>> - * because selem will be freed after successfully unlinked from
>>> - * the local_storage.
>>> - */
>>> - bpf_selem_unlink_map_nolock(selem);
>>> - raw_spin_unlock_irqrestore(&b->lock, flags);
>>> + if (likely(selem_linked_to_storage_lockless(selem))) {
>>
>> only depends on this and then proceed to take the lock_storage->lock. Then
>> recheck selem_linked_to_storage(selem), bpf_selem_unlink_map(selem) first, and
>> then bpf_selem_unlink_storage_nolock(selem) last.
>
> Thanks for the suggestion. I think it will simplify the function. Just
> making sure I am getting you right, you mean instead of open code both
> unlink_map and unlink_storage, only open code unlink_storage. First,
> grab local_storage->lock and call bpf_selem_unlink_map(). Then, only
After grabbing the local_storage->lock, re-check selem_linked_to_storage()
first, before calling bpf_selem_unlink_map().
> proceed to unlink_storage only If bpf_selem_unlink_map() succeeds.
No strong opinion on open coding bpf_selem_unlink_map() or not; I think they
are the same. I reused bpf_selem_unlink_map() because I think it can be used
as is. The logic of bpf_selem_unlink() here should now be very similar to
bpf_local_storage_destroy(), except that it needs to recheck
selem_linked_to_storage():
1. Grab both locks.
2. If selem_linked_to_storage() is true, then selem_linked_to_map() should
also be true, since we now need to grab both locks before moving forward to
unlink.
Meaning either a selem will not be unlinked at all, or it will be unlinked
from both the local_storage and the map. Am I thinking about this correctly,
or is there a hole?
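A rough sketch of bpf_selem_unlink() with that simplification (written
against the patch 3 stage, i.e. still raw_spin_lock before the rqspinlock
conversion; not the final code):

  void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
  {
          struct bpf_local_storage_map *smap, *storage_smap;
          struct bpf_local_storage_map_bucket *b;
          struct bpf_local_storage *local_storage;
          bool bpf_ma, free_local_storage = false;
          HLIST_HEAD(selem_free_list);
          unsigned long flags, b_flags;

          if (unlikely(!selem_linked_to_storage_lockless(selem)))
                  return;

          local_storage = rcu_dereference_check(selem->local_storage,
                                                bpf_rcu_lock_held());
          storage_smap = rcu_dereference_check(local_storage->smap,
                                               bpf_rcu_lock_held());
          bpf_ma = check_storage_bpf_ma(local_storage, storage_smap, selem);
          smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
          b = select_bucket(smap, selem);

          raw_spin_lock_irqsave(&local_storage->lock, flags);
          raw_spin_lock_irqsave(&b->lock, b_flags);

          if (likely(selem_linked_to_storage(selem))) {
                  /* Holding both locks implies the selem is still linked to
                   * the map as well: unlink from the map first, then from
                   * the local_storage.
                   */
                  bpf_selem_unlink_map_nolock(selem);
                  free_local_storage = bpf_selem_unlink_storage_nolock(
                          local_storage, selem, true, &selem_free_list);
          }

          raw_spin_unlock_irqrestore(&b->lock, b_flags);
          raw_spin_unlock_irqrestore(&local_storage->lock, flags);

          bpf_selem_free_list(&selem_free_list, reuse_now);

          if (free_local_storage)
                  bpf_local_storage_free(local_storage, storage_smap, bpf_ma, reuse_now);
  }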
>
>>
>> Then bpf_selem_unlink_map can use selem->local_storage->owner to select_bucket().
>
> Not sure what this part mean. Could you elaborate?
I meant to pass the owner pointer to select_bucket(), like
select_bucket(smap, selem->local_storage->owner). Of course, the owner pointer
should also be used on the update side, i.e. in bpf_local_storage_update().
Then it does not need to take the second bucket lock in the "if (b != old_b)"
case in bpf_local_storage_update() in patch 1.
>
>>
>>> + local_storage = rcu_dereference_check(selem->local_storage,
>>> + bpf_rcu_lock_held());
>>> + storage_smap = rcu_dereference_check(local_storage->smap,
>>> + bpf_rcu_lock_held());
>>> + bpf_ma = check_storage_bpf_ma(local_storage, storage_smap, selem);
>>> }
>>>
>>> - bpf_selem_unlink_storage(selem, reuse_now);
>>> + if (local_storage)
>>> + raw_spin_lock_irqsave(&local_storage->lock, flags);
>>> + if (smap)
>>> + raw_spin_lock_irqsave(&b->lock, b_flags);
>>> +
>>> + /* Always unlink from map before unlinking from local_storage
>>> + * because selem will be freed after successfully unlinked from
>>> + * the local_storage.
>>> + */
>>> + if (smap)
This "if (smap)" test
>>> + bpf_selem_unlink_map_nolock(selem);
>>> + if (local_storage && likely(selem_linked_to_storage(selem)))
and this orthogonal "if (local_storage &&
likely(selem_linked_to_storage(selem)))" test.
Understood, it comes from the existing code, where unlinking from the map and
unlinking from the local_storage can be done independently. It was there
because it did not need to grab both locks to move forward. Since it now needs
both locks, I think we can simplify things a bit, as mentioned above.
>>> + free_local_storage = bpf_selem_unlink_storage_nolock(
>>> + local_storage, selem, true, &selem_free_list);
>>> +
>>> + if (smap)
>>> + raw_spin_unlock_irqrestore(&b->lock, b_flags);
>>> + if (local_storage)
>>> + raw_spin_unlock_irqrestore(&local_storage->lock, flags);
>>> +
>>> + bpf_selem_free_list(&selem_free_list, reuse_now);
>>> +
>>> + if (free_local_storage)
>>> + bpf_local_storage_free(local_storage, storage_smap, bpf_ma, reuse_now);
>>> }
>>>
>>> void __bpf_local_storage_insert_cache(struct bpf_local_storage *local_storage,
>>