From: Yonghong Song <yhs@fb.com>
To: Jon Doron <arilou@gmail.com>,
	bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org,
	daniel@iogearbox.net
Cc: Jon Doron <jond@wiz.io>
Subject: Re: [PATCH bpf-next v3 1/1] libbpf: perfbuf: Add API to get the ring buffer
Date: Fri, 15 Jul 2022 09:58:39 -0700	[thread overview]
Message-ID: <36d47140-b144-0f72-d79c-18b8f3d3be5e@fb.com> (raw)
In-Reply-To: <20220715141835.93513-2-arilou@gmail.com>



On 7/15/22 7:18 AM, Jon Doron wrote:
> From: Jon Doron <jond@wiz.io>
> 
> Add support for writing a custom event reader by exposing the ring
> buffer.
> 
> A few simple examples where this kind of API is needed:
> 1. perf_event_read_simple allocates with malloc; perhaps you want to
>     handle the wrap-around in some other way.
> 2. Since the perf buffer is per-CPU, the order of events is not
>     guaranteed, for example:
>     Given 3 events where each event has a timestamp t0 < t1 < t2,
>     and the events are spread over more than 1 CPU, we can end
>     up with the following state in the ring buffers:
>     CPU[0] => [t0, t2]
>     CPU[1] => [t1]
>     When you consume the events from CPU[0], you can tell that t1 is
>     missing (assuming there are no drops, and your event data
>     contains a sequential index).
>     So for CPU[0] you can store the addresses of t0 and t2 in an
>     array (without moving the tail, so the data is not overwritten),
>     then move on to CPU[1] and record the address of t1 in the same
>     array. You end up with something like
>     void *arr[] = {&t0, &t1, &t2}, which you can consume in order,
>     moving the tails as you process each event (see the sketch
>     below the list).
> 3. Assuming there are multiple CPUs and we want to start draining
>     the messages from them, we can pick which one to start with
>     according to the remaining free space in its ring buffer.
> 
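To make example 2 concrete, here is a rough sketch of gathering one
pending event per CPU buffer without consuming it (this is not part of
the patch; peek_event() and struct my_event are hypothetical):

#include <linux/types.h>
#include <bpf/libbpf.h>

/* Hypothetical record layout: the BPF program emits a sequence no. */
struct my_event { __u64 seq; /* payload ... */ };

/* Hypothetical helper: return the oldest unread record in the ring
 * without advancing data_tail, or NULL if the ring is empty. */
extern struct my_event *peek_event(void *base, size_t size);

void collect_pending(struct perf_buffer *pb, struct my_event **pending)
{
	int i, n = (int)perf_buffer__buffer_cnt(pb);

	for (i = 0; i < n; i++) {
		void *base;
		size_t size;

		if (perf_buffer__buffer(pb, i, &base, &size))
			continue;
		/* Tails stay untouched, so nothing is overwritten yet. */
		pending[i] = peek_event(base, size);
	}
	/* The caller processes pending[] in seq order, moving each
	 * ring's data_tail only as its records are consumed. */
}
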
> Signed-off-by: Jon Doron <jond@wiz.io>
> ---
>   tools/lib/bpf/libbpf.c   | 26 ++++++++++++++++++++++++++
>   tools/lib/bpf/libbpf.h   |  2 ++
>   tools/lib/bpf/libbpf.map |  1 +
>   3 files changed, 29 insertions(+)
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index e89cc9c885b3..250263812194 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -12485,6 +12485,32 @@ int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx)
>   	return cpu_buf->fd;
>   }
>   
> +/*
> + * Return the memory region of the ring buffer in the *buf_idx* slot of
> + * the PERF_EVENT_ARRAY BPF map. This ring buffer can be used to
> + * implement a custom event consumer.
> + * The ring buffer starts with the *struct perf_event_mmap_page*, which
> + * holds the ring buffer management fields; when accessing the header
> + * structure it is important to be SMP aware.
> + * You can refer to *perf_event_read_simple* for a simple example.
> + */
> +int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf,
> +			size_t *buf_size)
> +{
> +	struct perf_cpu_buf *cpu_buf;
> +
> +	if (buf_idx >= pb->cpu_cnt)
> +		return libbpf_err(-EINVAL);
> +
> +	cpu_buf = pb->cpu_bufs[buf_idx];
> +	if (!cpu_buf)
> +		return libbpf_err(-ENOENT);
> +
> +	*buf = cpu_buf->base;
> +	*buf_size = pb->mmap_size;
> +	return 0;
> +}
> +
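
As a rough usage sketch (not part of the patch), a custom consumer
could pair this with acquire/release accesses on the header, assuming
the contiguous header-plus-data layout that perf_buffer mmaps per CPU:

#include <linux/perf_event.h>
#include <bpf/libbpf.h>

/* Drain every complete record currently in per-CPU buffer 0. */
static void drain_buf0(struct perf_buffer *pb)
{
	struct perf_event_mmap_page *header;
	void *base;
	size_t mmap_size;
	__u64 head, tail;

	if (perf_buffer__buffer(pb, 0, &base, &mmap_size))
		return;

	header = base;
	/* Acquire: see all event data written before data_head. */
	head = __atomic_load_n(&header->data_head, __ATOMIC_ACQUIRE);
	tail = header->data_tail;

	while (tail != head) {
		struct perf_event_header *ev;

		/* data_size is a power of two, so masking gives the
		 * offset within the ring; a single record may still
		 * wrap past the end of the data area, which is why
		 * perf_event_read_simple copies such records out.
		 */
		ev = base + header->data_offset +
		     (tail & (header->data_size - 1));
		/* ... process ev->size bytes at ev ... */
		tail += ev->size;
	}
	/* Release: publish progress only after all reads are done. */
	__atomic_store_n(&header->data_tail, tail, __ATOMIC_RELEASE);
}
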
>   /*
>    * Consume data from perf ring buffer corresponding to slot *buf_idx* in
>    * PERF_EVENT_ARRAY BPF map without waiting/polling. If there is no data to
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 9e9a3fd3edd8..78a7ab8f610a 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -1381,6 +1381,8 @@ LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb);
>   LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx);
>   LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb);
>   LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx);
> +LIBBPF_API int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf,
> +				   size_t *buf_size);
>   
>   typedef enum bpf_perf_event_ret
>   	(*bpf_perf_event_print_t)(struct perf_event_header *hdr,
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 52973cffc20c..971072c6dfd8 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -458,6 +458,7 @@ LIBBPF_0.8.0 {
>   		bpf_program__set_insns;
>   		libbpf_register_prog_handler;
>   		libbpf_unregister_prog_handler;
> +		perf_buffer__buffer;

You cannot add to LIBBPF_0.8.0, which has already been released.
Please add it to LIBBPF_1.0.0 instead.
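
I.e., something like the following in tools/lib/bpf/libbpf.map (a
sketch; the other 1.0.0 entries are elided, and this assumes the
1.0.0 section chains to 0.8.0 the same way 0.8.0 chains to 0.7.0
below):

LIBBPF_1.0.0 {
	global:
		perf_buffer__buffer;
		/* other 1.0.0 symbols */
} LIBBPF_0.8.0;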

>   } LIBBPF_0.7.0;
>   
>   LIBBPF_1.0.0 {
