All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin KaFai Lau <martin.lau@linux.dev>
To: Amery Hung <ameryhung@gmail.com>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
	alexei.starovoitov@gmail.com, andrii@kernel.org,
	daniel@iogearbox.net, memxor@gmail.com, martin.lau@kernel.org,
	kpsingh@kernel.org, yonghong.song@linux.dev, song@kernel.org,
	haoluo@google.com, kernel-team@meta.com
Subject: Re: [PATCH bpf-next v3 01/16] bpf: Convert bpf_selem_unlink_map to failable
Date: Thu, 8 Jan 2026 12:29:00 -0800	[thread overview]
Message-ID: <74fa8337-b0cb-42fb-af8a-fdf6877e558d@linux.dev> (raw)
In-Reply-To: <20251218175628.1460321-2-ameryhung@gmail.com>

On 12/18/25 9:56 AM, Amery Hung wrote:
> To prepare for changing bpf_local_storage_map_bucket::lock to rqspinlock,
> convert bpf_selem_unlink_map() to failable. It still always succeeds and
> returns 0 for now.
> 
> Since some operations updating local storage cannot fail in the middle,
> open-code bpf_selem_unlink_map() to take the b->lock before the
> operation. There are two such locations:
> 
> - bpf_local_storage_alloc()
> 
>    The first selem will be unlinked from smap if cmpxchg owner_storage_ptr
>    fails, which should not fail. Therefore, hold b->lock when linking
>    until allocation complete. Helpers that assume b->lock is held by
>    callers are introduced: bpf_selem_link_map_nolock() and
>    bpf_selem_unlink_map_nolock().
> 
> - bpf_local_storage_update()
> 
>    The three step update process: link_map(new_selem),
>    link_storage(new_selem), and unlink_map(old_selem) should not fail in
>    the middle.
> 
> In bpf_selem_unlink(), bpf_selem_unlink_map() and
> bpf_selem_unlink_storage() should either all succeed or fail as a whole
> instead of failing in the middle. So, return if unlink_map() failed.
> 
> In bpf_local_storage_destroy(), since it cannot deadlock with itself or
> bpf_local_storage_map_free() who the function might be racing with,
> retry if bpf_selem_unlink_map() fails due to rqspinlock returning
> errors.
> 
> Signed-off-by: Amery Hung <ameryhung@gmail.com>
> ---
>   kernel/bpf/bpf_local_storage.c | 64 +++++++++++++++++++++++++++++-----
>   1 file changed, 55 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> index e2fe6c32822b..4e3f227fd634 100644
> --- a/kernel/bpf/bpf_local_storage.c
> +++ b/kernel/bpf/bpf_local_storage.c
> @@ -347,7 +347,7 @@ void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
>   	hlist_add_head_rcu(&selem->snode, &local_storage->list);
>   }
>   
> -static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
> +static int bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)

This will end up only be used by bpf_selem_unlink(). It may as well 
remove this function and open code in the bpf_selem_unlink(). I think it 
may depend on how patch 10 goes and also if it makes sense to remove 
bpf_selem_"link"_map and bpf_selem_unlink_map_nolock also, so treat it 
as a nit note for now.

>   {
>   	struct bpf_local_storage_map *smap;
>   	struct bpf_local_storage_map_bucket *b;
> @@ -355,7 +355,7 @@ static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
>   
>   	if (unlikely(!selem_linked_to_map_lockless(selem)))

In the later patch where both local_storage's and map-bucket's locks 
must be acquired, will this check still be needed if there is an earlier 
check that ensures the selem is still linked to the local_storage? It 
does not matter in terms of perf, but I think it will help code reading 
in the future for the common code path (i.e. the code paths other than 
bpf_local_storage_destroy and bpf_local_storage_map_free).

>   		/* selem has already be unlinked from smap */
> -		return;
> +		return 0;
>   
>   	smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
>   	b = select_bucket(smap, selem);
> @@ -363,6 +363,14 @@ static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
>   	if (likely(selem_linked_to_map(selem)))
>   		hlist_del_init_rcu(&selem->map_node);
>   	raw_spin_unlock_irqrestore(&b->lock, flags);
> +
> +	return 0;
> +}
> +
> +static void bpf_selem_unlink_map_nolock(struct bpf_local_storage_elem *selem)
> +{
> +	if (likely(selem_linked_to_map(selem)))

Take this chance to remove the selem_linked_to_map() check. 
hlist_del_init_rcu has the same check.

> +		hlist_del_init_rcu(&selem->map_node);
>   }
>   
>   void bpf_selem_link_map(struct bpf_local_storage_map *smap,
> @@ -376,13 +384,26 @@ void bpf_selem_link_map(struct bpf_local_storage_map *smap,
>   	raw_spin_unlock_irqrestore(&b->lock, flags);
>   }
>   
> +static void bpf_selem_link_map_nolock(struct bpf_local_storage_map *smap,
> +				      struct bpf_local_storage_elem *selem,
> +				      struct bpf_local_storage_map_bucket *b)
> +{
> +	RCU_INIT_POINTER(SDATA(selem)->smap, smap);

Is it needed? bpf_selem_alloc should have init the SDATA(selem)->smap.

> +	hlist_add_head_rcu(&selem->map_node, &b->list);
> +}
> +

[ ... ]

> @@ -574,20 +603,37 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
>   		goto unlock;
>   	}
>   
> +	b = select_bucket(smap, selem);
> +
> +	if (old_sdata) {
> +		old_b = select_bucket(smap, SELEM(old_sdata));
> +		old_b = old_b == b ? NULL : old_b;
> +	}
> +
> +	raw_spin_lock_irqsave(&b->lock, b_flags);
> +
> +	if (old_b)
> +		raw_spin_lock_irqsave(&old_b->lock, old_b_flags);

This will deadlock because of the lock ordering of b and old_b. 
Replacing it with res_spin_lock in the later patch can detect it and 
break it more gracefully. imo, we should not introduce a known deadlock 
logic in the kernel code in the syscall code path and ask the current 
user to retry the map_update_elem syscall.

What happened to the patch in the earlier revision that uses the 
local_storage (or owner) for select_bucket?

[ will continue with the rest of the patches a bit later ]

> +
>   	alloc_selem = NULL;
>   	/* First, link the new selem to the map */
> -	bpf_selem_link_map(smap, selem);
> +	bpf_selem_link_map_nolock(smap, selem, b);
>   
>   	/* Second, link (and publish) the new selem to local_storage */
>   	bpf_selem_link_storage_nolock(local_storage, selem);
>   
>   	/* Third, remove old selem, SELEM(old_sdata) */
>   	if (old_sdata) {
> -		bpf_selem_unlink_map(SELEM(old_sdata));
> +		bpf_selem_unlink_map_nolock(SELEM(old_sdata));
>   		bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
>   						&old_selem_free_list);
>   	}
>   
> +	if (old_b)
> +		raw_spin_unlock_irqrestore(&old_b->lock, old_b_flags);
> +
> +	raw_spin_unlock_irqrestore(&b->lock, b_flags);
> +
>   unlock:
>   	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
>   	bpf_selem_free_list(&old_selem_free_list, false);
> @@ -679,7 +725,7 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
>   		/* Always unlink from map before unlinking from
>   		 * local_storage.
>   		 */
> -		bpf_selem_unlink_map(selem);
> +		WARN_ON(bpf_selem_unlink_map(selem));
>   		/* If local_storage list has only one element, the
>   		 * bpf_selem_unlink_storage_nolock() will return true.
>   		 * Otherwise, it will return false. The current loop iteration


  parent reply	other threads:[~2026-01-08 20:29 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-18 17:56 [PATCH bpf-next v3 00/16] Remove task and cgroup local storage percpu counters Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 01/16] bpf: Convert bpf_selem_unlink_map to failable Amery Hung
2025-12-18 18:27   ` bot+bpf-ci
2026-01-08 20:40     ` Martin KaFai Lau
2026-01-08 20:29   ` Martin KaFai Lau [this message]
2026-01-09 18:39     ` Amery Hung
2026-01-09 21:53       ` Martin KaFai Lau
2026-01-12 17:47         ` Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 02/16] bpf: Convert bpf_selem_link_map " Amery Hung
2025-12-18 18:19   ` bot+bpf-ci
2025-12-18 17:56 ` [PATCH bpf-next v3 03/16] bpf: Open code bpf_selem_unlink_storage in bpf_selem_unlink Amery Hung
2026-01-09 17:42   ` Martin KaFai Lau
2026-01-09 18:49     ` Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 04/16] bpf: Convert bpf_selem_unlink to failable Amery Hung
2025-12-18 18:27   ` bot+bpf-ci
2026-01-09 18:16   ` Martin KaFai Lau
2026-01-09 18:49     ` Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 05/16] bpf: Change local_storage->lock and b->lock to rqspinlock Amery Hung
2025-12-18 18:27   ` bot+bpf-ci
2025-12-18 17:56 ` [PATCH bpf-next v3 06/16] bpf: Remove task local storage percpu counter Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 07/16] bpf: Remove cgroup " Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 08/16] bpf: Remove unused percpu counter from bpf_local_storage_map_free Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 09/16] bpf: Save memory allocation method and size in bpf_local_storage_elem Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 10/16] bpf: Support lockless unlink when freeing map or local storage Amery Hung
2026-01-09 20:16   ` Martin KaFai Lau
2026-01-09 20:47     ` Amery Hung
2026-01-09 21:38       ` Martin KaFai Lau
2026-01-12 22:38         ` Amery Hung
2026-01-13  0:15           ` Martin KaFai Lau
2026-01-12 15:36   ` Kumar Kartikeya Dwivedi
2026-01-12 15:49     ` Kumar Kartikeya Dwivedi
2026-01-12 21:17       ` Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 11/16] bpf: Switch to bpf_selem_unlink_lockless in bpf_local_storage_{map_free, destroy} Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 12/16] selftests/bpf: Update sk_storage_omem_uncharge test Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 13/16] selftests/bpf: Update task_local_storage/recursion test Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 14/16] selftests/bpf: Update task_local_storage/task_storage_nodeadlock test Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 15/16] selftests/bpf: Remove test_task_storage_map_stress_lookup Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 16/16] selftests/bpf: Choose another percpu variable in bpf for btf_dump test Amery Hung

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=74fa8337-b0cb-42fb-af8a-fdf6877e558d@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=alexei.starovoitov@gmail.com \
    --cc=ameryhung@gmail.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=haoluo@google.com \
    --cc=kernel-team@meta.com \
    --cc=kpsingh@kernel.org \
    --cc=martin.lau@kernel.org \
    --cc=memxor@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=song@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.