From: Amery Hung <ameryhung@gmail.com>
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, alexei.starovoitov@gmail.com,
andrii@kernel.org, daniel@iogearbox.net, memxor@gmail.com,
kpsingh@kernel.org, martin.lau@kernel.org,
yonghong.song@linux.dev, song@kernel.org, haoluo@google.com,
ameryhung@gmail.com, kernel-team@meta.com
Subject: [RFC PATCH bpf-next v1 01/11] bpf: Convert bpf_selem_unlink_map to failable
Date: Tue, 29 Jul 2025 11:25:39 -0700
Message-ID: <20250729182550.185356-2-ameryhung@gmail.com>
In-Reply-To: <20250729182550.185356-1-ameryhung@gmail.com>
To prepare for changing bpf_local_storage_map_bucket::lock to rqspinlock,
convert bpf_selem_unlink_map() to failable. It still always succeeds and
returns 0 for now.
Since some operations that update local storage cannot fail in the middle,
open-code bpf_selem_unlink_map() to take b->lock before the
operation. There are three such locations:
- bpf_local_storage_alloc()
If the cmpxchg on owner_storage_ptr fails, the first selem is unlinked
from smap, and that unlink must not fail. Therefore, hold b->lock from
linking until the allocation completes. Helpers that assume b->lock is
held by the caller are introduced: bpf_selem_link_map_nolock() and
bpf_selem_unlink_map_nolock().
- bpf_local_storage_update()
The three-step update process, link_map(new_selem),
link_storage(new_selem), and unlink_map(old_selem), should not fail in
the middle. Hence, take the b->lock of both buckets before the update
process starts. Although the two buckets chosen by the hash function may
be locked in different orders, this will not cause an ABBA deadlock
since the locking is performed under local_storage->lock.
- bpf_selem_unlink()
bpf_selem_unlink_map() and bpf_selem_unlink_storage() should either
both succeed or both fail, rather than fail in the middle.
As the first step, open-code bpf_selem_unlink_map(). A later patch
will open-code bpf_selem_unlink_storage(). After that, unlink_map and
unlink_storage will be done only after successfully acquiring both
local_storage->lock and b->lock.
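The nested locking described for bpf_local_storage_update() can be
modeled in userspace as follows. This is only a single-threaded sketch:
struct fake_lock, struct storage, struct map and replace_elem() are
hypothetical names that do not exist in the kernel, and the model
assumes every caller that locks a pair of buckets holds the same outer
lock.

```c
#include <assert.h>
#include <stdbool.h>

#define NBUCKETS 4

/* Single-threaded stand-in for a spinlock (hypothetical, not a kernel API). */
struct fake_lock { bool held; };

static void fake_lock_acquire(struct fake_lock *l)
{
	assert(!l->held);	/* a real spinlock would spin forever: AA deadlock */
	l->held = true;
}

static void fake_lock_release(struct fake_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct bucket {
	struct fake_lock lock;	/* models b->lock */
	int nr_elems;
};

struct map {
	struct bucket buckets[NBUCKETS];
};

struct storage {
	struct fake_lock lock;	/* models local_storage->lock */
};

/* Link a new element and unlink the old one with all locks held up front. */
static void replace_elem(struct storage *s, struct map *m,
			 unsigned int new_hash, unsigned int old_hash)
{
	struct bucket *b = &m->buckets[new_hash % NBUCKETS];
	struct bucket *old_b = &m->buckets[old_hash % NBUCKETS];

	fake_lock_acquire(&s->lock);		/* outer lock first */
	fake_lock_acquire(&b->lock);
	if (b != old_b)				/* same bucket: lock it only once */
		fake_lock_acquire(&old_b->lock);

	b->nr_elems++;				/* link the new element */
	old_b->nr_elems--;			/* unlink the old element */

	if (b != old_b)
		fake_lock_release(&old_b->lock);
	fake_lock_release(&b->lock);
	fake_lock_release(&s->lock);
}
```

The b != old_b check matters when both elements hash to the same
bucket: without it, the second acquire would trip the assertion in this
model, which corresponds to a self-deadlock on a real spinlock.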
One caller of bpf_selem_unlink_map() cannot run recursively (e.g., by
being called from helpers in tracing bpf programs) and therefore cannot
deadlock. Assert that this call cannot fail instead of handling the error.
- bpf_local_storage_destroy()
Called by the owner (e.g., task_struct, sk, ...). It will not recurse
and therefore cannot cause an AA deadlock.
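The split between a failable unlink and a caller that only asserts
success can be sketched in userspace like this. All names here
(struct try_lock, struct elem, elem_unlink(), destroy_all()) are
hypothetical, and returning -EDEADLK from the lock is an assumption made
for illustration; in this patch bpf_selem_unlink_map() still always
returns 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy lock whose acquire can fail instead of spinning forever. */
struct try_lock { bool held; };

static int try_lock_acquire(struct try_lock *l)
{
	if (l->held)
		return -EDEADLK;	/* assumed error code, for illustration only */
	l->held = true;
	return 0;
}

static void try_lock_release(struct try_lock *l)
{
	l->held = false;
}

struct elem {
	bool linked;
	struct try_lock *bucket_lock;
};

/* Failable unlink: reports the lock error and leaves the element untouched. */
static int elem_unlink(struct elem *e)
{
	int err;

	if (!e->linked)
		return 0;		/* already unlinked, nothing to do */

	err = try_lock_acquire(e->bucket_lock);
	if (err)
		return err;

	e->linked = false;
	try_lock_release(e->bucket_lock);
	return 0;
}

/* Non-reentrant caller: a failure would be a bug, so assert instead of handling. */
static void destroy_all(struct elem *elems, int n)
{
	for (int i = 0; i < n; i++) {
		int err = elem_unlink(&elems[i]);

		assert(err == 0);	/* stands in for WARN_ON(err) */
		(void)err;		/* keep builds with NDEBUG warning-free */
	}
}
```

A caller that can be re-entered would instead have to propagate the
error from elem_unlink() and leave the element linked.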
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
kernel/bpf/bpf_local_storage.c | 75 ++++++++++++++++++++++++++++------
1 file changed, 62 insertions(+), 13 deletions(-)
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index b931fbceb54d..7e39b88ef795 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -409,7 +409,7 @@ void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
hlist_add_head_rcu(&selem->snode, &local_storage->list);
}
-static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
+static int bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
{
struct bpf_local_storage_map *smap;
struct bpf_local_storage_map_bucket *b;
@@ -417,7 +417,7 @@ static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
if (unlikely(!selem_linked_to_map_lockless(selem)))
/* selem has already be unlinked from smap */
- return;
+ return 0;
smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
b = select_bucket(smap, selem);
@@ -425,6 +425,14 @@ static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
if (likely(selem_linked_to_map(selem)))
hlist_del_init_rcu(&selem->map_node);
raw_spin_unlock_irqrestore(&b->lock, flags);
+
+ return 0;
+}
+
+static void bpf_selem_unlink_map_nolock(struct bpf_local_storage_elem *selem)
+{
+ if (likely(selem_linked_to_map(selem)))
+ hlist_del_init_rcu(&selem->map_node);
}
void bpf_selem_link_map(struct bpf_local_storage_map *smap,
@@ -439,13 +447,33 @@ void bpf_selem_link_map(struct bpf_local_storage_map *smap,
raw_spin_unlock_irqrestore(&b->lock, flags);
}
+static void bpf_selem_link_map_nolock(struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *selem,
+ struct bpf_local_storage_map_bucket *b)
+{
+ RCU_INIT_POINTER(SDATA(selem)->smap, smap);
+ hlist_add_head_rcu(&selem->map_node, &b->list);
+}
+
void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
{
- /* Always unlink from map before unlinking from local_storage
- * because selem will be freed after successfully unlinked from
- * the local_storage.
- */
- bpf_selem_unlink_map(selem);
+ struct bpf_local_storage_map_bucket *b;
+ struct bpf_local_storage_map *smap;
+ unsigned long flags;
+
+ if (likely(selem_linked_to_map_lockless(selem))) {
+ smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
+ b = select_bucket(smap, selem);
+ raw_spin_lock_irqsave(&b->lock, flags);
+
+ /* Always unlink from map before unlinking from local_storage
+ * because selem will be freed after successfully unlinked from
+ * the local_storage.
+ */
+ bpf_selem_unlink_map_nolock(selem);
+ raw_spin_unlock_irqrestore(&b->lock, flags);
+ }
+
bpf_selem_unlink_storage(selem, reuse_now);
}
@@ -487,6 +515,8 @@ int bpf_local_storage_alloc(void *owner,
{
struct bpf_local_storage *prev_storage, *storage;
struct bpf_local_storage **owner_storage_ptr;
+ struct bpf_local_storage_map_bucket *b;
+ unsigned long flags;
int err;
err = mem_charge(smap, owner, sizeof(*storage));
@@ -509,7 +539,10 @@ int bpf_local_storage_alloc(void *owner,
storage->owner = owner;
bpf_selem_link_storage_nolock(storage, first_selem);
- bpf_selem_link_map(smap, first_selem);
+
+ b = select_bucket(smap, first_selem);
+ raw_spin_lock_irqsave(&b->lock, flags);
+ bpf_selem_link_map_nolock(smap, first_selem, b);
owner_storage_ptr =
(struct bpf_local_storage **)owner_storage(smap, owner);
@@ -525,7 +558,8 @@ int bpf_local_storage_alloc(void *owner,
*/
prev_storage = cmpxchg(owner_storage_ptr, NULL, storage);
if (unlikely(prev_storage)) {
- bpf_selem_unlink_map(first_selem);
+ bpf_selem_unlink_map_nolock(first_selem);
+ raw_spin_unlock_irqrestore(&b->lock, flags);
err = -EAGAIN;
goto uncharge;
@@ -539,6 +573,7 @@ int bpf_local_storage_alloc(void *owner,
* bucket->list under rcu_read_lock().
*/
}
+ raw_spin_unlock_irqrestore(&b->lock, flags);
return 0;
@@ -560,8 +595,9 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
struct bpf_local_storage_data *old_sdata = NULL;
struct bpf_local_storage_elem *alloc_selem, *selem = NULL;
struct bpf_local_storage *local_storage;
+ struct bpf_local_storage_map_bucket *b, *old_b;
HLIST_HEAD(old_selem_free_list);
- unsigned long flags;
+ unsigned long flags, b_flags, old_b_flags;
int err;
/* BPF_EXIST and BPF_NOEXIST cannot be both set */
@@ -645,20 +681,31 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
goto unlock;
}
+ b = select_bucket(smap, selem);
+ old_b = old_sdata ? select_bucket(smap, SELEM(old_sdata)) : b;
+
+ raw_spin_lock_irqsave(&b->lock, b_flags);
+ if (b != old_b)
+ raw_spin_lock_irqsave(&old_b->lock, old_b_flags);
+
alloc_selem = NULL;
/* First, link the new selem to the map */
- bpf_selem_link_map(smap, selem);
+ bpf_selem_link_map_nolock(smap, selem, b);
/* Second, link (and publish) the new selem to local_storage */
bpf_selem_link_storage_nolock(local_storage, selem);
/* Third, remove old selem, SELEM(old_sdata) */
if (old_sdata) {
- bpf_selem_unlink_map(SELEM(old_sdata));
+ bpf_selem_unlink_map_nolock(SELEM(old_sdata));
bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
true, &old_selem_free_list);
}
+ if (b != old_b)
+ raw_spin_unlock_irqrestore(&old_b->lock, old_b_flags);
+ raw_spin_unlock_irqrestore(&b->lock, b_flags);
+
unlock:
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
bpf_selem_free_list(&old_selem_free_list, false);
@@ -736,6 +783,7 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
HLIST_HEAD(free_selem_list);
struct hlist_node *n;
unsigned long flags;
+ int err;
storage_smap = rcu_dereference_check(local_storage->smap, bpf_rcu_lock_held());
bpf_ma = check_storage_bpf_ma(local_storage, storage_smap, NULL);
@@ -754,7 +802,8 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
/* Always unlink from map before unlinking from
* local_storage.
*/
- bpf_selem_unlink_map(selem);
+ err = bpf_selem_unlink_map(selem);
+ WARN_ON(err);
/* If local_storage list has only one element, the
* bpf_selem_unlink_storage_nolock() will return true.
* Otherwise, it will return false. The current loop iteration
--
2.47.3