From: Philo Lu <lulie@linux.alibaba.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
Shung-Hsi Yu <shung-hsi.yu@suse.com>,
bpf@vger.kernel.org, song@kernel.org, andrii@kernel.org,
ast@kernel.org, Daniel Borkmann <daniel@iogearbox.net>,
xuanzhuo@linux.alibaba.com, dust.li@linux.alibaba.com,
guwen@linux.alibaba.com, alibuda@linux.alibaba.com,
hengqi@linux.alibaba.com, Nathan Slingerland <slinger@meta.com>,
"rihams@meta.com" <rihams@meta.com>,
Alan Maguire <alan.maguire@oracle.com>,
Masami Hiramatsu <mhiramat@kernel.org>
Subject: Re: Question about bpf perfbuf/ringbuf: pinned in backend with overwriting
Date: Thu, 21 Dec 2023 21:00:39 +0800
Message-ID: <cde8a134-8185-4387-a2f5-db2f1173b31b@linux.alibaba.com>
In-Reply-To: <20231219083851.0ec83349@gandalf.local.home>
Hi Steven,
Thanks for your explanation of the ftrace ring buffer. Thanks also to
Shung-Hsi for the discussion.
Below is my understanding of some properties of the ftrace ring buffer.
Could you please tell me whether it is correct?
(1) When reading and writing occur concurrently:
(a) If the reader is faster than the writer, the reader cannot take the
page that is still being written. In the worst case, up to one page of
data is not immediately available to the reader.
(b) If the writer is faster than the reader, the only race between them
occurs when the reader is swapping pages while the writer wraps in
overwrite mode. Once the reader has finished swapping, the writer can
wrap safely, because the reader page is already out of the buffer page
list.
(2) Because the per-cpu buffer page list changes whenever the reader page
is swapped, we cannot use mmap to expose the buffer to user space. Users
can consume at most one page at a time.
(3) The wake-up behavior is controllable. If there are no waiters at all,
waking up adds no overhead.
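As an illustration only, points (1) and (2) can be modeled with a toy
Python sketch. It is not the kernel code: the class ToyPerCpuBuffer and
its fields are invented for this sketch, and refusing to hand out the page
that is still being written simply follows the understanding stated in
(1)(a), which is exactly what is being asked about.

```python
class ToyPerCpuBuffer:
    """Toy model of one per-CPU buffer: a ring of pages plus a spare reader page.

    All names here are hypothetical; this is not kernel/trace/ring_buffer.c.
    """

    def __init__(self, nr_pages, page_size):
        assert nr_pages >= 2
        self.page_size = page_size
        self.ring = [[] for _ in range(nr_pages)]  # pages linked into the ring
        self.reader_page = []                      # spare page, outside the ring
        self.head = 0                              # oldest page with unread data
        self.tail = 0                              # page currently being written
        self.dropped = 0                           # events lost to overwriting

    def write(self, event):
        if len(self.ring[self.tail]) == self.page_size:
            nxt = (self.tail + 1) % len(self.ring)
            if nxt == self.head:
                # Overwrite mode: the writer wraps onto the oldest unread page.
                self.dropped += len(self.ring[nxt])
                self.ring[nxt].clear()
                self.head = (self.head + 1) % len(self.ring)
            self.tail = nxt
        self.ring[self.tail].append(event)

    def read_page(self):
        """Swap the spare page with the head page; at most one page per call."""
        if self.head == self.tail:
            # Assumption from (1)(a): the page being written is not handed out.
            return None
        self.reader_page.clear()
        self.ring[self.head], self.reader_page = (
            self.reader_page,
            self.ring[self.head],
        )
        self.head = (self.head + 1) % len(self.ring)
        return list(self.reader_page)


buf = ToyPerCpuBuffer(nr_pages=3, page_size=2)
for i in range(8):
    buf.write(i)
print(buf.dropped)      # 2: the page holding events 0 and 1 was overwritten
print(buf.read_page())  # [2, 3]
```

In this model, once read_page() has returned, further writes that wrap
again only touch pages still in the ring; the swapped-out page keeps its
contents until the next swap, which is the situation described in (1)(b).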
Thanks.
On 2023/12/19 21:38, Steven Rostedt wrote:
> On Tue, 19 Dec 2023 14:23:59 +0800
> Shung-Hsi Yu <shung-hsi.yu@suse.com> wrote:
>
>> Curious whether it is possible to reuse ftrace's trace buffer instead
>> (or its underlying ring buffer implementation at
>> kernel/trace/ring_buffer.c). AFAICT it satisfies both requirements that
>> Philo stated: (1) no need for user process as the buffer is accessible
>> through tracefs, and (2) has an overwrite mode.
>
> Yes, the ftrace ring-buffer was in fact designed for the above use case.
>
>>
>> Furthermore, a natural feature request that would come after
>> overwriting support would be snapshotting, and that has already been
>> covered in ftrace.
>
> Yes, it has that too.
>
>>
>> Note: technically BPF program could already write to ftrace's trace
>> buffer with the bpf_trace_vprintk() helper, but that goes through string
>> formatting and only allows writing into the global buffer.
>
> When eBPF was first being developed, Alexei told me he tried the ftrace
> ring buffer, and he said the filtering was too slow. That's because it
> would always write into the ring buffer and then try to discard it after
> the fact, which required a few cmpxchg to synchronize. He decided that the
> perf ring buffer was a better fit for this.
>
> That was solved with this: 0fc1b09ff1ff4 ("tracing: Use temp buffer when
> filtering events"), which makes the filtering similar to perf, as perf
> always copies events to a temporary buffer first.
>
> It still falls back to writing directly into the ring buffer if the temp
> buffer is currently being used by another event on the same CPU.
>
> Note that the perf ring buffer was designed for profiling (taking
> intermediate traces) and tightly coupled to have a reader. Whereas the
> ftrace ring buffer was designed for high speed constant tracing, with or
> without a reader.
>
> -- Steve
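
The filtering scheme described in the quoted message above (build the
event in a temporary buffer, filter it, and only then copy it into the
ring buffer, falling back to a direct write when the temporary buffer is
busy) can be sketched roughly as follows. This is a toy Python model, not
the code from commit 0fc1b09ff1ff4: ToyTracer, record(), and the "nested"
callback that stands in for an interrupting event on the same CPU are all
invented for illustration.

```python
class ToyTracer:
    """Toy model of filtering through a per-CPU temp buffer (hypothetical names)."""

    def __init__(self):
        self.ring = []            # stands in for the ring buffer
        self.temp_in_use = False  # is the per-CPU temp buffer busy?
        self.discards = 0         # events written to the ring, then discarded

    def record(self, event, predicate, nested=None):
        if not self.temp_in_use:
            # Fast path: build the event in the temp buffer and filter it there.
            self.temp_in_use = True
            try:
                temp = dict(event)
                if nested is not None:
                    nested()  # simulate another event arriving on the same CPU
                if predicate(temp):
                    self.ring.append(temp)  # copy only events that pass
            finally:
                self.temp_in_use = False
        else:
            # Fallback: write straight into the ring, then discard on mismatch.
            self.ring.append(dict(event))
            if not predicate(event):
                self.ring.pop()
                self.discards += 1


tracer = ToyTracer()
keep_pid_1 = lambda e: e["pid"] == 1
tracer.record({"pid": 1}, keep_pid_1)
tracer.record({"pid": 2}, keep_pid_1)  # filtered in the temp buffer
print(tracer.ring, tracer.discards)    # [{'pid': 1}] 0
```

In the model, a filtered-out event on the fast path never touches the
ring, while the fallback path pays for a write followed by a discard,
which mirrors the cost difference described in the quoted message.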