From: Martin KaFai Lau <martin.lau@linux.dev>
To: Amery Hung <ameryhung@gmail.com>
Cc: netdev@vger.kernel.org, alexei.starovoitov@gmail.com,
andrii@kernel.org, daniel@iogearbox.net, memxor@gmail.com,
martin.lau@kernel.org, kpsingh@kernel.org,
yonghong.song@linux.dev, song@kernel.org, haoluo@google.com,
kernel-team@meta.com, bpf@vger.kernel.org
Subject: Re: [PATCH bpf-next v3 10/16] bpf: Support lockless unlink when freeing map or local storage
Date: Mon, 12 Jan 2026 16:15:17 -0800
Message-ID: <1eb1715a-c1b0-4741-8d2d-f66adffcf91d@linux.dev>
In-Reply-To: <CAMB2axNtrFEL0x+j6M3De-ezR38cv5s7LFtpuAeN7QCf_AyHaQ@mail.gmail.com>
On 1/12/26 2:38 PM, Amery Hung wrote:
>> [ btw, a nit, I think it can use a better function name instead of
>> "lockless". This function still takes the lock if it can. ]
> Does using a _nofail suffix make it clearer?
sgtm.
>
>>> sure how destroy() can hold b->lock in a way that causes map_free() to
>>> fail acquiring b->lock.
>> I recall ETIMEDOUT was mentioned to be the likely (only?) case here.
>> Assuming b->lock did fail in map_free() here: more than one selem can
>> use the same b->lock. Is it always true that the selem that failed at
>> the b->lock in map_free() here must race with the very same selem in
>> destroy()?
This is still an open issue/question.
>>
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + /*
>>>>> + * Only let destroy() unlink from local_storage->list and do mem_uncharge
>>>>> + * as owner is guaranteed to be valid in destroy().
>>>>> + */
>>>>> + if (local_storage && caller == BPF_LOCAL_STORAGE_DESTROY) {
>>>> If I read here correctly, only bpf_local_storage_destroy() can do the
>>>> bpf_selem_free(). For example, if a bpf_sk_storage_map is going away,
>>>> the selem (which is memcg charged) will stay in the sk until the sk is
>>>> closed?
>>> You read it correctly, and yes, there will be stale elements in
>>> local_storage->list.
>>>
>>> I would hope the unlink-from-local_storage part is doable from both
>>> map_free() and destroy(), but it is hard to guarantee that (1)
>>> mem_uncharge is done only once and (2) the owner is still valid, in a
>>> lockless way.
>> This needs to be addressed. It cannot leave the selem lingering. At
>> least the selem should be freed in the common case (i.e., both locks
>> succeed). A lingering selem is OK in case of lock failure. Many sk can
>> be long-lived connections. User space may want to recreate the map,
>> and it will be limited by the memcg.
> I think this is doable by maintaining a local memory charge in local storage.
>
> So, remove selem->size and keep a copy of the total selem memory charge
> in a new local_storage->selem_size. Updates will be protected by
> local_storage->lock in the common paths (not in bpf_selem_unlink_nofail).
> More specifically, charge in bpf_selem_link_storage_nolock() when a
> selem is about to be published, and uncharge in
> bpf_selem_unlink_storage_nolock() when a selem is being deleted. Then,
> in destroy(), we simply read the total amount to be uncharged from the
> owner out of local_storage->selem_size.
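To make the proposal above concrete, the accounting could be modeled roughly like this. This is a user-space sketch under lock, not the actual kernel code: the struct layout, the owner's omem_alloc field standing in for sk->sk_omem_alloc, and the function names are all illustrative of the idea, nothing more.

```c
#include <assert.h>
#include <stddef.h>

struct owner {
	long omem_alloc;          /* models sk->sk_omem_alloc */
};

struct local_storage {
	struct owner *owner;
	long selem_size;          /* total bytes charged for linked selems */
};

static void selem_link_charge(struct local_storage *ls, size_t size)
{
	/* charge once, when the selem becomes visible on ls->list */
	ls->owner->omem_alloc += size;
	ls->selem_size += size;
}

static void selem_unlink_uncharge(struct local_storage *ls, size_t size)
{
	/* common delete path: uncharge under local_storage->lock */
	ls->owner->omem_alloc -= size;
	ls->selem_size -= size;
}

static void storage_destroy(struct local_storage *ls)
{
	/* destroy(): owner is still valid, uncharge everything left in one shot */
	ls->owner->omem_alloc -= ls->selem_size;
	ls->selem_size = 0;
}
```

The point of the single counter is that destroy() no longer needs each selem's size to settle the owner's charge, so stale selems left by a failed unlink do not leave the owner permanently over-charged.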
Right, I had a similar thought before. Because of the nofail/lockless
consideration, I suspect local_storage->selem_size will need to be
atomic. I am not sure that is enough, though; e.g., there is a
debug/warn on sk->sk_omem_alloc in __sk_destruct to ensure it is 0, so
I stopped thinking about it for now, but it could be a direction to
explore.

If that is the case, it is another atomic in the destruct/map_free code
path. I am still open-minded on the nofail requirement for both locks,
but the complexity is building up. Kumar has also commented that b->lock
in map_free should not fail. Regardless, let's get the nofail code path
for map_free() sorted out first. Then we can move on to handle this
case.
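For the lockless paths, a minimal user-space sketch of what an atomic selem_size might look like, with C11 atomics standing in for the kernel's atomic helpers. All names here are hypothetical; the interesting bit is that destroy() claims the remaining charge with an exchange, so a racing uncharge can never be counted twice.

```c
#include <assert.h>
#include <stdatomic.h>

struct local_storage_sketch {
	atomic_long selem_size;   /* total bytes charged, updated atomically */
};

static void storage_uncharge(struct local_storage_sketch *ls, long size)
{
	/* nofail delete path: uncharge without holding local_storage->lock */
	atomic_fetch_sub(&ls->selem_size, size);
}

static long storage_take_all_charge(struct local_storage_sketch *ls)
{
	/*
	 * destroy(): atomically claim the whole remaining charge exactly
	 * once; any concurrent storage_uncharge() either lands before the
	 * exchange (and shrinks the claimed amount) or after it (and is a
	 * no-op on an already-zeroed counter plus its own -size, which the
	 * real code would have to avoid, e.g. by only uncharging selems it
	 * successfully unlinked).
	 */
	return atomic_exchange(&ls->selem_size, 0);
}
```

This does nothing about the second half of the problem discussed above, i.e. keeping the owner valid while the uncharge runs, and it adds one more atomic to the destruct/map_free path, which is exactly the cost being weighed in this thread.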
Thread overview: 38+ messages
2025-12-18 17:56 [PATCH bpf-next v3 00/16] Remove task and cgroup local storage percpu counters Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 01/16] bpf: Convert bpf_selem_unlink_map to failable Amery Hung
2025-12-18 18:27 ` bot+bpf-ci
2026-01-08 20:40 ` Martin KaFai Lau
2026-01-08 20:29 ` Martin KaFai Lau
2026-01-09 18:39 ` Amery Hung
2026-01-09 21:53 ` Martin KaFai Lau
2026-01-12 17:47 ` Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 02/16] bpf: Convert bpf_selem_link_map " Amery Hung
2025-12-18 18:19 ` bot+bpf-ci
2025-12-18 17:56 ` [PATCH bpf-next v3 03/16] bpf: Open code bpf_selem_unlink_storage in bpf_selem_unlink Amery Hung
2026-01-09 17:42 ` Martin KaFai Lau
2026-01-09 18:49 ` Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 04/16] bpf: Convert bpf_selem_unlink to failable Amery Hung
2025-12-18 18:27 ` bot+bpf-ci
2026-01-09 18:16 ` Martin KaFai Lau
2026-01-09 18:49 ` Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 05/16] bpf: Change local_storage->lock and b->lock to rqspinlock Amery Hung
2025-12-18 18:27 ` bot+bpf-ci
2025-12-18 17:56 ` [PATCH bpf-next v3 06/16] bpf: Remove task local storage percpu counter Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 07/16] bpf: Remove cgroup " Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 08/16] bpf: Remove unused percpu counter from bpf_local_storage_map_free Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 09/16] bpf: Save memory allocation method and size in bpf_local_storage_elem Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 10/16] bpf: Support lockless unlink when freeing map or local storage Amery Hung
2026-01-09 20:16 ` Martin KaFai Lau
2026-01-09 20:47 ` Amery Hung
2026-01-09 21:38 ` Martin KaFai Lau
2026-01-12 22:38 ` Amery Hung
2026-01-13 0:15 ` Martin KaFai Lau [this message]
2026-01-12 15:36 ` Kumar Kartikeya Dwivedi
2026-01-12 15:49 ` Kumar Kartikeya Dwivedi
2026-01-12 21:17 ` Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 11/16] bpf: Switch to bpf_selem_unlink_lockless in bpf_local_storage_{map_free, destroy} Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 12/16] selftests/bpf: Update sk_storage_omem_uncharge test Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 13/16] selftests/bpf: Update task_local_storage/recursion test Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 14/16] selftests/bpf: Update task_local_storage/task_storage_nodeadlock test Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 15/16] selftests/bpf: Remove test_task_storage_map_stress_lookup Amery Hung
2025-12-18 17:56 ` [PATCH bpf-next v3 16/16] selftests/bpf: Choose another percpu variable in bpf for btf_dump test Amery Hung