From: Tao Chen <chen.dylane@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
John Fastabend <john.fastabend@gmail.com>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, bpf <bpf@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH bpf-next v2 1/2] bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE
Date: Fri, 19 Sep 2025 10:08:12 +0800
Message-ID: <f2fd90a9-bc7d-43b8-ac5e-9d233219dcfb@linux.dev>
In-Reply-To: <CAADnVQLwV=fUkgLF3uTmevA97WX2FH4vG-7=97Px0H_WJOJieQ@mail.gmail.com>
On 2025/9/19 10:01, Alexei Starovoitov wrote:
> On Thu, Sep 18, 2025 at 6:35 AM Tao Chen <chen.dylane@linux.dev> wrote:
>>
>> On 2025/9/18 09:35, Alexei Starovoitov wrote:
>>> On Wed, Sep 17, 2025 at 3:16 PM Andrii Nakryiko
>>> <andrii.nakryiko@gmail.com> wrote:
>>>>
>>>>
>>>> P.S. It seems like a good idea to switch STACKMAP to open addressing
>>>> instead of the current kind-of-bucket-chain-but-not-really
>>>> implementation. It's fixed size and pre-allocated already, so open
>>>> addressing seems like a great approach here, IMO.
>>>
>>> That makes sense. It won't have backward compat issues.
>>> Just more reliable stack_id.
>>>
>>> Fixed value_size is another footgun there.
>>> Especially for collecting user stack traces.
>>> We can switch the whole stackmap to bpf_mem_alloc()
>>> or wait for kmalloc_nolock().
>>> But it's probably a diminishing return.
>>>
>>> bpf_get_stack() also isn't great with a copy into
>>> perf_callchain_entry, then 2nd copy into on stack/percpu buf/ringbuf,
>>> and 3rd copy of correct size into ringbuf (optional).
>>>
>>> Also, I just realized we have another nasty race there.
>>> In the past bpf progs were run in preempt disabled context,
>>> but we forgot to adjust bpf_get_stack[id]() helpers when everything
>>> switched to migrate disable.
>>>
>>> The return value from get_perf_callchain() may be reused
>>> if another task preempts and requests the stack.
>>> We have partially incorrect comment in __bpf_get_stack() too:
>>> if (may_fault)
>>> rcu_read_lock(); /* need RCU for perf's callchain below */
>>>
>>> rcu can be preemptable. so rcu_read_lock() makes
>>> trace = get_perf_callchain(...)
>>> accessible, but that per-cpu trace buffer can be overwritten.
>>> It's not an issue for CONFIG_PREEMPT_NONE=y, but that doesn't
>>> give much comfort.
>>
>> Hi Alexei,
>>
>> Can we fix it like this?
>>
>> - if (may_fault)
>> - rcu_read_lock(); /* need RCU for perf's callchain below */
>> + preempt_disable();
>>
>> if (trace_in)
>> trace = trace_in;
>> @@ -455,8 +454,7 @@ static long __bpf_get_stack(struct pt_regs *regs,
>> struct task_struct *task,
>> crosstask, false);
>>
>> if (unlikely(!trace) || trace->nr < skip) {
>> - if (may_fault)
>> - rcu_read_unlock();
>> + preempt_enable();
>> goto err_fault;
>> }
>>
>> @@ -475,9 +473,7 @@ static long __bpf_get_stack(struct pt_regs *regs,
>> struct task_struct *task,
>> memcpy(buf, ips, copy_len);
>> }
>>
>> - /* trace/ips should not be dereferenced after this point */
>> - if (may_fault)
>> - rcu_read_unlock();
>> + preempt_enable();
>
> That should do it. Don't see an issue at first glance.
OK, I will send a patch later, thanks.
--
Best Regards
Tao Chen