From: Yonghong Song <yonghong.song@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Abel Wu <wuyun.abel@bytedance.com>,
Martin KaFai Lau <martin.lau@linux.dev>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, David Vernet <void@manifault.com>,
"open list:BPF [STORAGE & CGROUPS]" <bpf@vger.kernel.org>,
open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH bpf] bpf: Fix deadlock when freeing cgroup storage
Date: Thu, 19 Dec 2024 10:57:21 -0800 [thread overview]
Message-ID: <00fa1c00-9b14-47d6-917f-17fcb3d5b18a@linux.dev> (raw)
In-Reply-To: <CAADnVQ+pYevQ9QsRB-oLu1ONtzZ31J=3ANqB+aFLLiU4VcGgNA@mail.gmail.com>
On 12/19/24 10:43 AM, Alexei Starovoitov wrote:
> On Thu, Dec 19, 2024 at 10:39 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>>
>>
>> On 12/19/24 4:38 AM, Abel Wu wrote:
>>> Hi Yonghong,
>>>
>>> On 12/19/24 10:45 AM, Yonghong Song Wrote:
>>>>
>>>>
>>>> On 12/18/24 1:21 AM, Abel Wu wrote:
>>>>> The following commit
>>>>> bc235cdb423a ("bpf: Prevent deadlock from recursive
>>>>> bpf_task_storage_[get|delete]")
>>>>> first introduced deadlock prevention for fentry/fexit programs
>>>>> attaching
>>>>> on bpf_task_storage helpers. That commit also employed the logic in map
>>>>> free path in its v6 version.
>>>>>
>>>>> Later bpf_cgrp_storage was first introduced in
>>>>> c4bcfb38a95e ("bpf: Implement cgroup storage available to
>>>>> non-cgroup-attached bpf progs")
>>>>> which faces the same issue as bpf_task_storage, instead of its busy
>>>>> counter, NULL was passed to bpf_local_storage_map_free() which opened
>>>>> a window to cause deadlock:
>>>>>
>>>>> <TASK>
>>>>> _raw_spin_lock_irqsave+0x3d/0x50
>>>>> bpf_local_storage_update+0xd1/0x460
>>>>> bpf_cgrp_storage_get+0x109/0x130
>>>>> bpf_prog_72026450ec387477_cgrp_ptr+0x38/0x5e
>>>>> bpf_trace_run1+0x84/0x100
>>>>> cgroup_storage_ptr+0x4c/0x60
>>>>> bpf_selem_unlink_storage_nolock.constprop.0+0x135/0x160
>>>>> bpf_selem_unlink_storage+0x6f/0x110
>>>>> bpf_local_storage_map_free+0xa2/0x110
>>>>> bpf_map_free_deferred+0x5b/0x90
>>>>> process_one_work+0x17c/0x390
>>>>> worker_thread+0x251/0x360
>>>>> kthread+0xd2/0x100
>>>>> ret_from_fork+0x34/0x50
>>>>> ret_from_fork_asm+0x1a/0x30
>>>>> </TASK>
>>>>>
>>>>> [ Since the verifier treats 'void *' as scalar which
>>>>> prevents me from getting a pointer to 'struct cgroup *',
>>>>> I added a raw tracepoint in cgroup_storage_ptr() to
>>>>> help reproducing this issue. ]
>>>>>
>>>>> Although it is tricky to reproduce, the risk of deadlock exists and
>>>>> worthy of a fix, by passing its busy counter to the free procedure so
>>>>> it can be properly incremented before storage/smap locking.
>>>> The above stack trace and explanation does not show that we will have
>>>> a potential dead lock here. You mentioned that it is tricky to
>>>> reproduce,
>>>> does it mean that you have done some analysis or coding to reproduce it?
>>>> Could you share the details on why you think we may have deadlock here?
>>> The stack is A-A deadlocked: cgroup_storage_ptr() is called with
>>> storage->lock held, while the bpf_prog attaching on this function
>>> also tries to acquire the same lock by calling bpf_cgrp_storage_get()
>>> thus leading to a AA deadlock.
>>>
>>> The tricky part is, instead of attaching on cgroup_storage_ptr()
>>> directly, I added a tracepoint inside it to hook:
>>>
>>> ------
>>> diff --git a/kernel/bpf/bpf_cgrp_storage.c
>>> b/kernel/bpf/bpf_cgrp_storage.c
>>> index 20f05de92e9c..679209d4f88f 100644
>>> --- a/kernel/bpf/bpf_cgrp_storage.c
>>> +++ b/kernel/bpf/bpf_cgrp_storage.c
>>> @@ -40,6 +40,8 @@ static struct bpf_local_storage __rcu
>>> **cgroup_storage_ptr(void *owner)
>>> {
>>> struct cgroup *cg = owner;
>>>
>>> + trace_cgroup_ptr(cg);
>>> +
>>> return &cg->bpf_cgrp_storage;
>>> }
>>>
>>> ------
>>>
>>> The reason doing so is that typecasting from 'void *owner' to
>>> 'struct cgroup *' will be rejected by the verifier. But there
>>> could be other ways to obtain a pointer to the @owner cgroup
>>> too, making the deadlock possible.
>> I checked the callstack and what you described indeed the case.
>> In function bpf_selem_unlink_storage(), local_storage->lock is
>> held before calling bpf_selem_unlink_storage_nolock/cgroup_storage_ptr.
>> If there is a fentry/tracepoint on the cgroup_storage_ptr and then we could
>> have a deadlock as you described in the above.
>>
>> As you mentioned, it is tricky to reproduce. fentry on cgroup_storage_ptr
>> does not work due to func signature:
>> struct bpf_local_storage __rcu **cgroup_storage_ptr(void *owner)
>> Even say we support 'void *' for fentry and we do bpf_rdonly_cast()
>> to cast 'void *owner' to 'struct cgroup *owner', and owner cannot be
>> passed to helper/kfunc.
>>
>> Your fix looks good but it would be great to have a reproducer.
>> One possibility is to find a function which can be fentried within
>> local_storage->lock. If you know the cgroup id, in bpf program you
>> can use bpf_cgroup_from_id() to get a trusted cgroup ptr from the id.
>> and then you can use that cgroup ptr to do bpf_cgrp_storage_get() etc.
>> which should be able to triger deadlock. Could you give a try?
> I'd rather mark a set of functions as notrace to avoid this situation
> or add:
> CFLAGS_REMOVE_bpf_cgrp_storage.o = $(CC_FLAGS_FTRACE)
If we go through CFLAGS_REMOVE path, we need to do
CFLAGS_REMOVE_bpf_local_storage.o = $(CC_FLAGS_FTRACE)
as well since bpf_selem_unlink_storage_nolock() calls a few functions
which, if fentry traced, could trigger similar issue (with bpf_cgroup_from_id() approach).
next prev parent reply other threads:[~2024-12-19 18:57 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-18 9:21 [PATCH bpf] bpf: Fix deadlock when freeing cgroup storage Abel Wu
2024-12-19 2:45 ` Yonghong Song
2024-12-19 12:38 ` Abel Wu
2024-12-19 18:39 ` Yonghong Song
2024-12-19 18:43 ` Alexei Starovoitov
2024-12-19 18:57 ` Yonghong Song [this message]
2025-01-26 9:19 ` Abel Wu
2024-12-21 2:26 ` Abel Wu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=00fa1c00-9b14-47d6-917f-17fcb3d5b18a@linux.dev \
--to=yonghong.song@linux.dev \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=sdf@fomichev.me \
--cc=song@kernel.org \
--cc=void@manifault.com \
--cc=wuyun.abel@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.