From: Martin KaFai Lau <martin.lau@linux.dev>
To: Amery Hung <ameryhung@gmail.com>
Cc: netdev@vger.kernel.org, alexei.starovoitov@gmail.com,
andrii@kernel.org, daniel@iogearbox.net, memxor@gmail.com,
martin.lau@kernel.org, kpsingh@kernel.org,
yonghong.song@linux.dev, song@kernel.org, haoluo@google.com,
bpf@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH bpf-next v7 10/17] bpf: Support lockless unlink when freeing map or local storage
Date: Fri, 6 Feb 2026 15:25:06 -0800
Message-ID: <b51ca47f-46e2-457e-a152-2f7fbdeee1e2@linux.dev>
In-Reply-To: <20260205222916.1788211-11-ameryhung@gmail.com>
On 2/5/26 2:29 PM, Amery Hung wrote:
> +/*
> + * Unlink an selem from map and local storage with lockless fallback if callers
> + * are racing or rqspinlock returns error. It should only be called by
> + * bpf_local_storage_destroy() or bpf_local_storage_map_free().
> + */
> +static void bpf_selem_unlink_nofail(struct bpf_local_storage_elem *selem,
> + struct bpf_local_storage_map_bucket *b)
> +{
> + bool in_map_free = !!b, free_storage = false;
> + struct bpf_local_storage *local_storage;
> + struct bpf_local_storage_map *smap;
> + unsigned long flags;
> + int err, unlink = 0;
> +
> + local_storage = rcu_dereference_check(selem->local_storage, bpf_rcu_lock_held());
> + smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
> +
> + if (smap) {
> + b = b ? : select_bucket(smap, local_storage);
> + err = raw_res_spin_lock_irqsave(&b->lock, flags);
> + if (!err) {
> + /*
> + * Call bpf_obj_free_fields() under b->lock to make sure it is done
> + * exactly once for an selem. Safe to free special fields immediately
> + * as no BPF program should be referencing the selem.
> + */
> + if (likely(selem_linked_to_map(selem))) {
> + hlist_del_init_rcu(&selem->map_node);
> + bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
> + unlink++;
> + }
> + raw_res_spin_unlock_irqrestore(&b->lock, flags);
> + }
> + /*
> + * Highly unlikely scenario: resource leak
> + *
> + * When map_free(selem1), destroy(selem1) and destroy(selem2) are racing
> + * and both selem belong to the same bucket, if destroy(selem2) acquired
> + * b->lock and block for too long, neither map_free(selem1) and
> + * destroy(selem1) will be able to free the special field associated
> + * with selem1 as raw_res_spin_lock_irqsave() returns -ETIMEDOUT.
> + */
> + WARN_ON_ONCE(err && in_map_free);
> + if (!err || in_map_free)
> + RCU_INIT_POINTER(SDATA(selem)->smap, NULL);
> + }
> +
> + if (local_storage) {
> + err = raw_res_spin_lock_irqsave(&local_storage->lock, flags);
> + if (!err) {
> + if (likely(selem_linked_to_storage(selem))) {
> + free_storage = hlist_is_singular_node(&selem->snode,
> + &local_storage->list);
> + /*
> + * Okay to skip clearing owner_storage and storage->owner in
> + * destroy() since the owner is going away. No user or bpf
> + * programs should be able to reference it.
> + */
> + if (smap && in_map_free)
> + bpf_selem_unlink_storage_nolock_misc(
> + selem, smap, local_storage,
> + free_storage, true);
> + hlist_del_init_rcu(&selem->snode);
> + unlink++;
> + }
> + raw_res_spin_unlock_irqrestore(&local_storage->lock, flags);
> + }
> + if (!err || !in_map_free)
> + RCU_INIT_POINTER(selem->local_storage, NULL);
> + }
> +
> + if (unlink != 2)
> + atomic_or(in_map_free ? SELEM_MAP_UNLINKED : SELEM_STORAGE_UNLINKED, &selem->state);
> +
> + /*
> + * Normally, an selem can be unlinked under local_storage->lock and b->lock, and
> + * then freed after an RCU grace period. However, if destroy() and map_free() are
> + * racing or rqspinlock returns errors in unlikely situations (unlink != 2), free
> + * the selem only after both map_free() and destroy() see the selem.
> + */
> + if (unlink == 2 ||
> + atomic_cmpxchg(&selem->state, SELEM_UNLINKED, SELEM_TOFREE) == SELEM_UNLINKED)
> + bpf_selem_free(selem, true);
> +
> + if (free_storage)
> + bpf_local_storage_free(local_storage, true);
I think there is a chance that selem->state reaches SELEM_UNLINKED while
free_storage is false, in which case local_storage is leaked.
AFAIK, this can happen when destroy() fails to take its own
local_storage->lock, which should be very unlikely. There is already a
similar WARN_ON_ONCE in this function. If handling this unlikely case is
not worth the complexity, it may deserve a WARN_ON_ONCE here as well.
This can be done as a follow-up.
Thanks for working on this. It is a huge effort. The set is applied.
> +}
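To make the case above easier to reason about, here is a rough userspace
sketch of the tail of bpf_selem_unlink_nofail(), using C11 atomics in place
of the kernel's atomic_or()/atomic_cmpxchg(). It is only a model, not kernel
code: the flag values, struct selem_model, enum selem_free and
selem_model_unlink_tail() are all made up for illustration, and the real
SELEM_* definitions may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed flag values, chosen for illustration only. */
enum {
	SELEM_MAP_UNLINKED     = 1 << 0,
	SELEM_STORAGE_UNLINKED = 1 << 1,
	SELEM_UNLINKED         = SELEM_MAP_UNLINKED | SELEM_STORAGE_UNLINKED,
	SELEM_TOFREE           = 1 << 2,
};

/* Hypothetical stand-in for the state word of an selem. */
struct selem_model {
	atomic_int state;
};

/* Hypothetical result type: which path, if any, frees the selem. */
enum selem_free {
	SELEM_FREE_NONE,      /* the other caller has not seen the selem yet */
	SELEM_FREE_DIRECT,    /* this caller unlinked from both lists */
	SELEM_FREE_HANDSHAKE, /* both callers have seen it; this one won the cmpxchg */
};

/*
 * Model of the last steps of the quoted function. @unlink is the number
 * of lists (0..2) this caller removed the selem from under the locks.
 */
static enum selem_free selem_model_unlink_tail(struct selem_model *s,
					       bool in_map_free, int unlink)
{
	int expected = SELEM_UNLINKED;

	assert(unlink >= 0 && unlink <= 2);

	if (unlink == 2)
		return SELEM_FREE_DIRECT;

	/* Record that this side (map_free or destroy) has seen the selem. */
	atomic_fetch_or(&s->state, in_map_free ? SELEM_MAP_UNLINKED :
						 SELEM_STORAGE_UNLINKED);

	/* Only one caller can move SELEM_UNLINKED to SELEM_TOFREE. */
	if (atomic_compare_exchange_strong(&s->state, &expected, SELEM_TOFREE))
		return SELEM_FREE_HANDSHAKE;

	return SELEM_FREE_NONE;
}
```

In this model, SELEM_FREE_HANDSHAKE is the path the comment is about: the
selem gets freed, but the caller that wins the cmpxchg may never have set
free_storage, so nothing on that path frees local_storage.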