All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tao Chen <chen.dylane@linux.dev>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: ast@kernel.org, daniel@iogearbox.net, john.fastabend@gmail.com,
	andrii@kernel.org, martin.lau@linux.dev, eddyz87@gmail.com,
	song@kernel.org, yonghong.song@linux.dev, kpsingh@kernel.org,
	sdf@fomichev.me, haoluo@google.com, jolsa@kernel.org,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH bpf-next v2 1/2] bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE
Date: Thu, 18 Sep 2025 20:45:23 +0800	[thread overview]
Message-ID: <2d9320eb-6be9-43e4-a63e-08e7ab1427e3@linux.dev> (raw)
In-Reply-To: <CAEf4BzZ2Fg+AmFA-K3YODE27br+e0-rLJwn0M5XEwfEHqpPKgQ@mail.gmail.com>

在 2025/9/18 06:16, Andrii Nakryiko 写道:
> On Tue, Sep 9, 2025 at 9:32 AM Tao Chen <chen.dylane@linux.dev> wrote:
>>
>> The stacktrace map can be easily full, which will lead to failure in
>> obtaining the stack. In addition to increasing the size of the map,
>> another solution is to delete the stack_id after looking it up from
> 
> This is not a "solution", really. Just another partially broken
> workaround to an already broken STACKMAP map (and there is no fixing
> it, IMO).
>

Actually it is. But in our use case, we used continuous profiling with 
perf_event, the result looks better when we got the stack_id and deleted 
it to alleviate the data size pressure in the map. And there is no high 
requirement for the accuracy of stack information for us.

> When a user is doing lookup_and_delete for that element, another BPF
> program can reuse that slot, and user space will delete the stack that
> is, effectively, still in use.
> 
> <rant>
> 
> I also just looked at the bpf_stackmap_copy() implementation, and even
> lookup behavior appears broken. We temporarily remove the bucket while
> copying, just to put it back a bit later. Meanwhile another BPF
> program can put something into that bucket and we'll just silently
> discard that new value later. That was a new one for me, but whatever,

Yes, it is a problem.

> as I said STACKMAP cannot be used reliably anyways.
> 
> </rant>
> 
> But let's stay constructive here. Some vague proposals below.
> 
> Really, STACKMAP should have used some form of refcounting and let
> users put those refs, instead of just unconditionally removing the
> element. I wonder if we can retrofit this and treat lookup/delete as
> get/put instead? This would work well for a typical use pattern where
> we send stack_id through ringbuf of some sort and user space fetches
> stack trace by that ID. Each bpf_get_stackid() would be treated as
> refcount bump, and each lookup_and_delete or just delete would be
> treated as refcount put.
> 
> Also, it would be better for STACKMAP to be a proper hashmap and
> handle collisions properly.
> 
> The above two changes would probably make STACKMAP actually usable as
> "a stack trace bank" producing 4-byte IDs that are easily added to
> fixed-sized ringbuf samples as an extra field. This model sometimes is
> way more convenient than getting bpf_get_stack() and copying it into
> ringbuf (which is currently the only way to have reliable stack traces
> with BPF, IMO).
> 
> So, tl;dr. Maybe instead of pretending like we are fixing something
> about STACKMAP with slightly-more-atomic (but not really)
> lookup_and_delete support, maybe let's try to actually make STACKMAP
> usable?.. (it's way harder than this patch, but has more value, IMO)
> 

The idea looks great. I will try to make improvements in this direction, 
though there will be certain challenges for me right now.

> What does everyone think?
> 
> P.S. It seems like a good idea to switch STACKMAP to open addressing
> instead of the current kind-of-bucket-chain-but-not-really
> implementation. It's fixed size and pre-allocated already, so open
> addressing seems like a great approach here, IMO.
> 
>> the user, so extend the existing bpf_map_lookup_and_delete_elem()
>> functionality to stacktrace map types.
>>
>> Signed-off-by: Tao Chen <chen.dylane@linux.dev>
>> ---
>>   include/linux/bpf.h   |  2 +-
>>   kernel/bpf/stackmap.c | 16 ++++++++++++++--
>>   kernel/bpf/syscall.c  |  8 +++++---
>>   3 files changed, 20 insertions(+), 6 deletions(-)
>>
> 
> As for the patch in question, I think the logic is correct :) I find
> bpf_stackmap_copy_and_delete() name bad and misleading, though,
> because it's more of "maybe also delete". Maybe
> bpf_stackmap_extract()? Don't know, it's minor nit anyways.
> 
> [...]Will rename it in v2. The original idea in this patch was just to
make it convenient for users to delete the stackid when they obtain it.

-- 
Best Regards
Tao Chen

      parent reply	other threads:[~2025-09-18 12:45 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-09 16:32 [PATCH bpf-next v2 1/2] bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE Tao Chen
2025-09-09 16:32 ` [PATCH bpf-next v2 2/2] selftests/bpf: Add stacktrace map lookup_and_delete_elem test case Tao Chen
2025-09-17 22:16 ` [PATCH bpf-next v2 1/2] bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE Andrii Nakryiko
2025-09-18  1:35   ` Alexei Starovoitov
2025-09-18 13:34     ` Tao Chen
2025-09-19  2:01       ` Alexei Starovoitov
2025-09-19  2:08         ` Tao Chen
2025-09-18 12:45   ` Tao Chen [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2d9320eb-6be9-43e4-a63e-08e7ab1427e3@linux.dev \
    --to=chen.dylane@linux.dev \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=sdf@fomichev.me \
    --cc=song@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.