From: Tao Chen <chen.dylane@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Eduard <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
	Yonghong Song <yonghong.song@linux.dev>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>, bpf <bpf@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH bpf-next v2 1/2] bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE
Date: Thu, 18 Sep 2025 21:34:54 +0800
Message-ID: <457b805f-ea5c-460e-b93f-b7b63f3358af@linux.dev>
In-Reply-To: <CAADnVQ+s8B7-fvR1TNO-bniSyKv57cH_ihRszmZV7pQDyV=VDQ@mail.gmail.com>

On 2025/9/18 09:35, Alexei Starovoitov wrote:
> On Wed, Sep 17, 2025 at 3:16 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
>>
>>
>> P.S. It seems like a good idea to switch STACKMAP to open addressing
>> instead of the current kind-of-bucket-chain-but-not-really
>> implementation. It's fixed size and pre-allocated already, so open
>> addressing seems like a great approach here, IMO.
> 
> That makes sense. It won't have backward compat issues.
> Just more reliable stack_id.
> 
> Fixed value_size is another footgun there.
> Especially for collecting user stack traces.
> We can switch the whole stackmap to bpf_mem_alloc()
> or wait for kmalloc_nolock().
> But it's probably a diminishing return.
> 
> bpf_get_stack() also isn't great with a copy into
> perf_callchain_entry, then 2nd copy into on stack/percpu buf/ringbuf,
> and 3rd copy of correct size into ringbuf (optional).
> 
> Also, I just realized we have another nasty race there.
> In the past bpf progs were run in preempt disabled context,
> but we forgot to adjust bpf_get_stack[id]() helpers when everything
> switched to migrate disable.
> 
> The return value from get_perf_callchain() may be reused
> if another task preempts and requests the stack.
> We have partially incorrect comment in __bpf_get_stack() too:
>          if (may_fault)
>                  rcu_read_lock(); /* need RCU for perf's callchain below */
> 
> rcu can be preemptable. so rcu_read_lock() makes
> trace = get_perf_callchain(...)
> accessible, but that per-cpu trace buffer can be overwritten.
> It's not an issue for CONFIG_PREEMPT_NONE=y, but that doesn't
> give much comfort.

Hi Alexei,

Can we fix it like this?

-       if (may_fault)
-               rcu_read_lock(); /* need RCU for perf's callchain below */
+       preempt_disable();

         if (trace_in)
                 trace = trace_in;
@@ -455,8 +454,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
                                            crosstask, false);

         if (unlikely(!trace) || trace->nr < skip) {
-               if (may_fault)
-                       rcu_read_unlock();
+               preempt_enable();
                 goto err_fault;
         }

@@ -475,9 +473,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
                 memcpy(buf, ips, copy_len);
         }

-       /* trace/ips should not be dereferenced after this point */
-       if (may_fault)
-               rcu_read_unlock();
+       preempt_enable();
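
To illustrate the window the change above is meant to close, here is a
minimal user-space sketch. It is a single-threaded model, not kernel
code: fill_callchain(), maybe_preempt() and the preempt_off flag are
hypothetical stand-ins for get_perf_callchain(), another task being
scheduled on the same CPU, and preempt_disable()/preempt_enable().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 4

/* Stand-in for the per-CPU perf_callchain_entry (hypothetical layout). */
struct callchain {
	size_t nr;
	unsigned long ip[MAX_DEPTH];
};

static struct callchain percpu_entry;	/* one shared scratch buffer */
static bool preempt_off;		/* models preempt_disable() */

/* Fill the shared buffer, roughly as get_perf_callchain() would. */
static struct callchain *fill_callchain(unsigned long base)
{
	percpu_entry.nr = MAX_DEPTH;
	for (size_t i = 0; i < MAX_DEPTH; i++)
		percpu_entry.ip[i] = base + i;
	return &percpu_entry;
}

/*
 * Models another task running on this CPU and requesting its own stack.
 * It cannot run while "preemption" is off.
 */
static void maybe_preempt(void)
{
	if (!preempt_off)
		fill_callchain(0xdead0000UL);
}

/*
 * Copy a stack into out[]. Returns true if out[] holds the caller's
 * own stack, false if the shared buffer was overwritten in between.
 */
static bool get_stack(unsigned long base, unsigned long *out, bool guard)
{
	struct callchain *trace;

	preempt_off = guard;
	trace = fill_callchain(base);
	maybe_preempt();		/* window between fill and copy */
	assert(trace->nr <= MAX_DEPTH);
	memcpy(out, trace->ip, trace->nr * sizeof(out[0]));
	preempt_off = false;

	return out[0] == base;
}
```

In this model, get_stack(0x1000, buf, false) returns false because the
other user overwrote the shared buffer before the copy, while
get_stack(0x1000, buf, true) returns true.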

> 
> Modern day bpf api would probably be
> - get_callchain_entry()/put() kfuncs to expose low level mechanism
> with safe acq/rel of temp buffer.
> - then another kfuncs to perf_callchain_kernel/user into that buffer.
> 
> and with bpf_mem_alloc and hash kfuncs the bpf prog can
> implement either bpf_get_stack() equivalent or much better
> bpf_get_stackid() with variable length stack traces and so on.
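
One way to picture the get/put pairing suggested above is the small
user-space model below. It is only a sketch under stated assumptions:
callchain_buf_get(), callchain_buf_put(), NR_CTX and BUF_DEPTH are
hypothetical names, not existing kernel or kfunc APIs, and the model is
single-threaded, so it leaves out per-CPU handling and atomicity.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CTX    4	/* hypothetical number of buffers (nesting levels) */
#define BUF_DEPTH 8	/* hypothetical maximum stack depth */

struct callchain_buf {
	size_t nr;
	unsigned long ip[BUF_DEPTH];
};

static struct callchain_buf bufs[NR_CTX];
static int in_use[NR_CTX];

/*
 * Acquire a buffer that no other user holds. On success, *ctx records
 * which slot was taken so it can be released later. Returns NULL when
 * every slot is busy.
 */
static struct callchain_buf *callchain_buf_get(int *ctx)
{
	for (int i = 0; i < NR_CTX; i++) {
		if (!in_use[i]) {
			in_use[i] = 1;
			bufs[i].nr = 0;
			*ctx = i;
			return &bufs[i];
		}
	}
	return NULL;
}

/* Release a slot obtained from callchain_buf_get(). */
static void callchain_buf_put(int ctx)
{
	assert(ctx >= 0 && ctx < NR_CTX && in_use[ctx]);
	in_use[ctx] = 0;
}
```

Two nested users get two different buffers, so the second one cannot
overwrite the first one's trace; a buffer becomes reusable only after
its put.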


-- 
Best Regards
Tao Chen

