From: Martin KaFai Lau <martin.lau@linux.dev>
To: Amery Hung <ameryhung@gmail.com>
Cc: netdev@vger.kernel.org, alexei.starovoitov@gmail.com,
	andrii@kernel.org, daniel@iogearbox.net, memxor@gmail.com,
	martin.lau@kernel.org, kpsingh@kernel.org,
	yonghong.song@linux.dev, song@kernel.org, haoluo@google.com,
	kernel-team@meta.com, bpf@vger.kernel.org
Subject: Re: [PATCH bpf-next v5 10/16] bpf: Support lockless unlink when freeing map or local storage
Date: Tue, 3 Feb 2026 21:39:05 -0800
Message-ID: <d512e9fd-eb04-4194-ab75-b1d2e775461a@linux.dev>
In-Reply-To: <20260201175050.468601-11-ameryhung@gmail.com>

On 2/1/26 9:50 AM, Amery Hung wrote:
> +/*
> + * Unlink an selem from map and local storage with lockless fallback if callers
> + * are racing or rqspinlock returns error. It should only be called by
> + * bpf_local_storage_destroy() or bpf_local_storage_map_free().
> + */
> +static void bpf_selem_unlink_nofail(struct bpf_local_storage_elem *selem,
> +				    struct bpf_local_storage_map_bucket *b)
> +{
> +	struct bpf_local_storage *local_storage;
> +	struct bpf_local_storage_map *smap;
> +	bool in_map_free = !!b;
> +	unsigned long flags;
> +	int err, unlink = 0;
> +
> +	local_storage = rcu_dereference_check(selem->local_storage, bpf_rcu_lock_held());
> +	smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
> +
> +	/*
> +	 * Prevent being called twice from the same caller on the same selem.
> +	 * map_free() and destroy() each holds a link_cnt on an selem.
> +	 */
> +	if ((!smap && in_map_free) || (!local_storage && !in_map_free))

There is a chance that map_free() sees "!smap" on its very first call
to bpf_selem_unlink_nofail(). For example, destroy() may grab b->lock,
do hlist_del_init_rcu(&selem->map_node), and then clear
SDATA(selem)->smap. In the unlikely case that destroy() then cannot
grab local_storage->lock, it falls through to
atomic_dec_and_test(&selem->link_cnt). When map_free() hits "!smap" on
its first call, it returns early without ever reaching its own
atomic_dec_and_test(&selem->link_cnt), and the selem is leaked. This is
unlikely if we can assume destroy() is always able to take its own
local_storage->lock (no bpf prog should be holding it and no ETIMEDOUT).

I think the same goes for the "!local_storage" check when called from destroy().
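
To make the two orderings concrete, here is a rough user-space model of
the control flow in the quoted function. It is only a sketch: the names
selem_model, unlink_nofail_model() and replay() are made up for
illustration, the two calls run back to back instead of racing, lock
failure is passed in as a flag, and RCU, bpf_obj_free_fields() and the
memory accounting are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the selem state that matters here. */
struct selem_model {
	bool smap_set;        /* SDATA(selem)->smap != NULL */
	bool storage_set;     /* selem->local_storage != NULL */
	bool on_map_list;     /* selem_linked_to_map() */
	bool on_storage_list; /* selem_linked_to_storage() */
	int link_cnt;         /* one count each for map_free() and destroy() */
	bool freed;
};

/* Mirrors the branch structure of the quoted bpf_selem_unlink_nofail(). */
static void unlink_nofail_model(struct selem_model *s, bool in_map_free,
				bool b_lock_ok, bool storage_lock_ok)
{
	bool smap = s->smap_set, storage = s->storage_set;
	int unlink = 0;

	/* The early return under discussion. */
	if ((!smap && in_map_free) || (!storage && !in_map_free))
		return;

	if (smap) {
		if (b_lock_ok && s->on_map_list) {
			s->on_map_list = false;
			unlink++;
		}
		if (b_lock_ok || in_map_free)
			s->smap_set = false;
	}

	if (storage) {
		if (storage_lock_ok && s->on_storage_list) {
			s->on_storage_list = false;
			unlink++;
		}
		if (storage_lock_ok || !in_map_free)
			s->storage_set = false;
	}

	assert(s->link_cnt > 0);
	if (unlink == 2 || --s->link_cnt == 0)
		s->freed = true;
}

/* Replays one of the two orderings and returns the final state. */
static struct selem_model replay(bool destroy_first)
{
	struct selem_model s = {
		.smap_set = true, .storage_set = true,
		.on_map_list = true, .on_storage_list = true,
		.link_cnt = 2, .freed = false,
	};

	if (destroy_first) {
		/* destroy(): b->lock ok, local_storage->lock fails */
		unlink_nofail_model(&s, false, true, false);
		/* map_free(): first call already sees !smap */
		unlink_nofail_model(&s, true, true, true);
	} else {
		/* map_free(): b->lock fails, local_storage->lock ok */
		unlink_nofail_model(&s, true, false, true);
		/* destroy(): first call already sees !local_storage */
		unlink_nofail_model(&s, false, true, true);
	}
	return s;
}
```

In this model both replay(true) and replay(false) end with link_cnt
still at 1 and freed still false: the second caller returns at the
first check and never drops its count.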


> +		return;
> +
> +	if (smap) {
> +		b = b ? : select_bucket(smap, local_storage);
> +		err = raw_res_spin_lock_irqsave(&b->lock, flags);
> +		if (!err) {
> +			/*
> +			 * Call bpf_obj_free_fields() under b->lock to make sure it is done
> +			 * exactly once for an selem. Safe to free special fields immediately
> +			 * as no BPF program should be referencing the selem.
> +			 */
> +			if (likely(selem_linked_to_map(selem))) {
> +				hlist_del_init_rcu(&selem->map_node);
> +				bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
> +				unlink++;
> +			}
> +			raw_res_spin_unlock_irqrestore(&b->lock, flags);
> +		}
> +		/*
> +		 * Highly unlikely scenario: resource leak
> +		 *
> +		 * When map_free(selem1), destroy(selem1) and destroy(selem2) are racing
> +		 * and both selem belong to the same bucket, if destroy(selem2) acquired
> +		 * b->lock and block for too long, neither map_free(selem1) and
> +		 * destroy(selem1) will be able to free the special field associated
> +		 * with selem1 as raw_res_spin_lock_irqsave() returns -ETIMEDOUT.
> +		 */
> +		WARN_ON_ONCE(err && in_map_free);
> +		if (!err || in_map_free)
> +			RCU_INIT_POINTER(SDATA(selem)->smap, NULL);
> +	}
> +
> +	if (local_storage) {
> +		err = raw_res_spin_lock_irqsave(&local_storage->lock, flags);
> +		if (!err) {
> +			/*
> +			 * Normally, map_free() can call mem_uncharge() if destroy() is
> +			 * not about to return to the owner, which can then go away
> +			 * immediately. Otherwise, the charge of the selem will stay
> +			 * accounted in local_storage->selems_size and uncharged during
> +			 * destroy().
> +			 */
> +			if (likely(selem_linked_to_storage(selem))) {
> +				hlist_del_init_rcu(&selem->snode);
> +				if (smap && in_map_free &&

I think the smap non-null check is not needed.

> +				    refcount_inc_not_zero(&local_storage->owner_refcnt)) {
> +					mem_uncharge(smap, local_storage->owner, smap->elem_size);
> +					local_storage->selems_size -= smap->elem_size;
> +					refcount_dec(&local_storage->owner_refcnt);
> +				}
> +				unlink++;
> +			}
> +			raw_res_spin_unlock_irqrestore(&local_storage->lock, flags);
> +		}
> +		if (!err || !in_map_free)
> +			RCU_INIT_POINTER(selem->local_storage, NULL);
> +	}
> +
> +	/*
> +	 * Normally, an selem can be unlinked under local_storage->lock and b->lock, and
> +	 * then freed after an RCU grace period. However, if destroy() and map_free() are
> +	 * racing or rqspinlock returns errors in unlikely situations (unlink != 2), free
> +	 * the selem only after both map_free() and destroy() drop their link_cnt.
> +	 */
> +	if (unlink == 2 || atomic_dec_and_test(&selem->link_cnt))
> +		bpf_selem_free(selem, false);

This can be bpf_selem_free(..., true) here.


