From: Martin KaFai Lau <martin.lau@linux.dev>
To: Amery Hung <ameryhung@gmail.com>
Cc: netdev@vger.kernel.org, alexei.starovoitov@gmail.com,
andrii@kernel.org, daniel@iogearbox.net, memxor@gmail.com,
martin.lau@kernel.org, kpsingh@kernel.org,
yonghong.song@linux.dev, song@kernel.org, haoluo@google.com,
bpf@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH bpf-next v7 10/17] bpf: Support lockless unlink when freeing map or local storage
Date: Fri, 6 Feb 2026 15:25:06 -0800
Message-ID: <b51ca47f-46e2-457e-a152-2f7fbdeee1e2@linux.dev>
In-Reply-To: <20260205222916.1788211-11-ameryhung@gmail.com>
On 2/5/26 2:29 PM, Amery Hung wrote:
> +/*
> + * Unlink an selem from map and local storage with lockless fallback if callers
> + * are racing or rqspinlock returns error. It should only be called by
> + * bpf_local_storage_destroy() or bpf_local_storage_map_free().
> + */
> +static void bpf_selem_unlink_nofail(struct bpf_local_storage_elem *selem,
> +				    struct bpf_local_storage_map_bucket *b)
> +{
> +	bool in_map_free = !!b, free_storage = false;
> +	struct bpf_local_storage *local_storage;
> +	struct bpf_local_storage_map *smap;
> +	unsigned long flags;
> +	int err, unlink = 0;
> +
> +	local_storage = rcu_dereference_check(selem->local_storage, bpf_rcu_lock_held());
> +	smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
> +
> +	if (smap) {
> +		b = b ? : select_bucket(smap, local_storage);
> +		err = raw_res_spin_lock_irqsave(&b->lock, flags);
> +		if (!err) {
> +			/*
> +			 * Call bpf_obj_free_fields() under b->lock to make sure it is done
> +			 * exactly once for an selem. Safe to free special fields immediately
> +			 * as no BPF program should be referencing the selem.
> +			 */
> +			if (likely(selem_linked_to_map(selem))) {
> +				hlist_del_init_rcu(&selem->map_node);
> +				bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
> +				unlink++;
> +			}
> +			raw_res_spin_unlock_irqrestore(&b->lock, flags);
> +		}
> +		/*
> +		 * Highly unlikely scenario: resource leak
> +		 *
> +		 * When map_free(selem1), destroy(selem1) and destroy(selem2) are racing
> +		 * and both selem belong to the same bucket, if destroy(selem2) acquired
> +		 * b->lock and block for too long, neither map_free(selem1) and
> +		 * destroy(selem1) will be able to free the special field associated
> +		 * with selem1 as raw_res_spin_lock_irqsave() returns -ETIMEDOUT.
> +		 */
> +		WARN_ON_ONCE(err && in_map_free);
> +		if (!err || in_map_free)
> +			RCU_INIT_POINTER(SDATA(selem)->smap, NULL);
> +	}
> +
> +	if (local_storage) {
> +		err = raw_res_spin_lock_irqsave(&local_storage->lock, flags);
> +		if (!err) {
> +			if (likely(selem_linked_to_storage(selem))) {
> +				free_storage = hlist_is_singular_node(&selem->snode,
> +								      &local_storage->list);
> +				/*
> +				 * Okay to skip clearing owner_storage and storage->owner in
> +				 * destroy() since the owner is going away. No user or bpf
> +				 * programs should be able to reference it.
> +				 */
> +				if (smap && in_map_free)
> +					bpf_selem_unlink_storage_nolock_misc(
> +						selem, smap, local_storage,
> +						free_storage, true);
> +				hlist_del_init_rcu(&selem->snode);
> +				unlink++;
> +			}
> +			raw_res_spin_unlock_irqrestore(&local_storage->lock, flags);
> +		}
> +		if (!err || !in_map_free)
> +			RCU_INIT_POINTER(selem->local_storage, NULL);
> +	}
> +
> +	if (unlink != 2)
> +		atomic_or(in_map_free ? SELEM_MAP_UNLINKED : SELEM_STORAGE_UNLINKED, &selem->state);
> +
> +	/*
> +	 * Normally, an selem can be unlinked under local_storage->lock and b->lock, and
> +	 * then freed after an RCU grace period. However, if destroy() and map_free() are
> +	 * racing or rqspinlock returns errors in unlikely situations (unlink != 2), free
> +	 * the selem only after both map_free() and destroy() see the selem.
> +	 */
> +	if (unlink == 2 ||
> +	    atomic_cmpxchg(&selem->state, SELEM_UNLINKED, SELEM_TOFREE) == SELEM_UNLINKED)
> +		bpf_selem_free(selem, true);
> +
> +	if (free_storage)
> +		bpf_local_storage_free(local_storage, true);

I think there is a chance that selem->state reaches SELEM_UNLINKED while
free_storage is false, in which case local_storage is leaked.

AFAIK, this can happen when destroy() cannot take its own
local_storage->lock, which should be very unlikely. There is a similar
WARN_ON_ONCE in this function. If handling this unlikely case is not
worth the complexity, it may deserve a WARN_ON_ONCE here as well. This
can be followed up.

Thanks for working on this. It is a huge effort. The set is applied.

> +}
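
For readers following the thread, the scenario described above can be modeled in userspace. This is only a rough sketch, not kernel code: it uses C11 atomics in place of atomic_or()/atomic_cmpxchg(), plain booleans in place of the hlist linkage and RCU pointers, and flags passed by the caller in place of rqspinlock results. The SELEM_* bit values, struct model_selem, struct model_outcome and model_unlink_nofail() are all assumptions made for illustration, and the selem is assumed to be the only element on the storage list.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed bit layout; the real definitions live elsewhere in the series. */
enum {
	SELEM_MAP_UNLINKED     = 1 << 0,
	SELEM_STORAGE_UNLINKED = 1 << 1,
	SELEM_UNLINKED         = SELEM_MAP_UNLINKED | SELEM_STORAGE_UNLINKED,
	SELEM_TOFREE           = 1 << 2,
};

/* Hypothetical stand-in for struct bpf_local_storage_elem. */
struct model_selem {
	atomic_int state;
	bool on_map_list;     /* models selem_linked_to_map() */
	bool on_storage_list; /* models selem_linked_to_storage() */
	bool smap_set;        /* models SDATA(selem)->smap != NULL */
	bool storage_set;     /* models selem->local_storage != NULL */
};

/* What one caller ends up doing at the tail of the function. */
struct model_outcome {
	bool free_selem;
	bool free_storage;
};

/*
 * Model of bpf_selem_unlink_nofail() for a single caller.
 * bucket_lock_ok and storage_lock_ok stand in for the lock calls
 * returning 0; false models an rqspinlock error.
 */
static struct model_outcome
model_unlink_nofail(struct model_selem *s, bool in_map_free,
		    bool bucket_lock_ok, bool storage_lock_ok)
{
	struct model_outcome out = { false, false };
	bool smap = s->smap_set, storage = s->storage_set;
	int unlink = 0, expected = SELEM_UNLINKED;

	if (smap) {
		if (bucket_lock_ok && s->on_map_list) {
			s->on_map_list = false;
			unlink++;
		}
		if (bucket_lock_ok || in_map_free)
			s->smap_set = false;
	}

	if (storage) {
		if (storage_lock_ok && s->on_storage_list) {
			/* Assumption: selem is the last one on the list. */
			out.free_storage = true;
			s->on_storage_list = false;
			unlink++;
		}
		if (storage_lock_ok || !in_map_free)
			s->storage_set = false;
	}

	/* A caller that did not unlink from both lists records its visit. */
	if (unlink != 2)
		atomic_fetch_or(&s->state, in_map_free ? SELEM_MAP_UNLINKED
						       : SELEM_STORAGE_UNLINKED);

	/* The caller that observes both bits wins the right to free. */
	out.free_selem = unlink == 2 ||
		atomic_compare_exchange_strong(&s->state, &expected, SELEM_TOFREE);
	return out;
}
```

In this model, if destroy() runs first with both lock attempts failing and map_free() then runs with its bucket lock succeeding, the second caller moves the state from SELEM_UNLINKED to SELEM_TOFREE and frees the selem, yet free_storage is false for both callers. That matches the leak described above, under the stated assumptions.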
Thread overview: 20+ messages
2026-02-05 22:28 [PATCH bpf-next v7 00/17] Remove task and cgroup local storage percpu counters Amery Hung
2026-02-05 22:28 ` [PATCH bpf-next v7 01/17] bpf: Select bpf_local_storage_map_bucket based on bpf_local_storage Amery Hung
2026-02-05 22:29 ` [PATCH bpf-next v7 02/17] bpf: Convert bpf_selem_unlink_map to failable Amery Hung
2026-02-05 22:29 ` [PATCH bpf-next v7 03/17] bpf: Convert bpf_selem_link_map " Amery Hung
2026-02-05 22:29 ` [PATCH bpf-next v7 04/17] bpf: Convert bpf_selem_unlink " Amery Hung
2026-02-05 22:29 ` [PATCH bpf-next v7 05/17] bpf: Change local_storage->lock and b->lock to rqspinlock Amery Hung
2026-02-05 22:29 ` [PATCH bpf-next v7 06/17] bpf: Remove task local storage percpu counter Amery Hung
2026-02-05 22:29 ` [PATCH bpf-next v7 07/17] bpf: Remove cgroup " Amery Hung
2026-02-05 22:29 ` [PATCH bpf-next v7 08/17] bpf: Remove unused percpu counter from bpf_local_storage_map_free Amery Hung
2026-02-05 22:29 ` [PATCH bpf-next v7 09/17] bpf: Prepare for bpf_selem_unlink_nofail() Amery Hung
2026-02-05 22:29 ` [PATCH bpf-next v7 10/17] bpf: Support lockless unlink when freeing map or local storage Amery Hung
2026-02-06 23:25 ` Martin KaFai Lau [this message]
2026-02-05 22:29 ` [PATCH bpf-next v7 11/17] bpf: Switch to bpf_selem_unlink_nofail in bpf_local_storage_{map_free, destroy} Amery Hung
2026-02-05 22:29 ` [PATCH bpf-next v7 12/17] selftests/bpf: Update sk_storage_omem_uncharge test Amery Hung
2026-02-05 22:29 ` [PATCH bpf-next v7 13/17] selftests/bpf: Update task_local_storage/recursion test Amery Hung
2026-02-05 22:29 ` [PATCH bpf-next v7 14/17] selftests/bpf: Update task_local_storage/task_storage_nodeadlock test Amery Hung
2026-02-05 22:29 ` [PATCH bpf-next v7 15/17] selftests/bpf: Remove test_task_storage_map_stress_lookup Amery Hung
2026-02-05 22:29 ` [PATCH bpf-next v7 16/17] selftests/bpf: Choose another percpu variable in bpf for btf_dump test Amery Hung
2026-02-05 22:29 ` [PATCH bpf-next v7 17/17] selftests/bpf: Fix outdated test on storage->smap Amery Hung
2026-02-06 23:00 ` [PATCH bpf-next v7 00/17] Remove task and cgroup local storage percpu counters patchwork-bot+netdevbpf