* Re: [PATCH bpf-next v3 1/1] libbpf: perfbuf: Add API to get the ring buffer
2022-07-15 14:18 ` [PATCH bpf-next v3 1/1] libbpf: perfbuf: Add API to get the " Jon Doron
@ 2022-07-15 16:54 ` Andrii Nakryiko
2022-07-15 16:58 ` Yonghong Song
1 sibling, 0 replies; 5+ messages in thread
From: Andrii Nakryiko @ 2022-07-15 16:54 UTC (permalink / raw)
To: Jon Doron
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Jon Doron
On Fri, Jul 15, 2022 at 7:18 AM Jon Doron <arilou@gmail.com> wrote:
>
> From: Jon Doron <jond@wiz.io>
>
> Add support for writing a custom event reader, by exposing the ring
> buffer.
>
> A few simple examples where this is needed:
> 1. perf_event_read_simple is allocating using malloc, perhaps you want
> to handle the wrap-around in some other way.
> 2. Since the perf buffer is per-CPU, the order of the events is not
> guaranteed, for example:
> Given 3 events where each event has a timestamp t0 < t1 < t2,
> and the events are spread on more than 1 CPU, then we can end
> up with the following state in the ring buf:
> CPU[0] => [t0, t2]
> CPU[1] => [t1]
> When you consume the events from CPU[0], you can know that t1 is
> missing (assuming there are no drops, and your event data
> contains a sequential index).
> So now one can simply do the following: for CPU[0], store
> the address of t0 and t2 in an array (without moving the tail, so
> the data is not lost), then move on to CPU[1] and set the
> address of t1 in the same array.
> So you end up with something like:
> void *arr[] = {&t0, &t1, &t2}, now you can consume it in order
> and move the tails as you process.
> 3. Assuming there are multiple CPUs and we want to start draining the
> messages from them, we can "pick" which one to start with
> according to the remaining free space in the ring buffer.
It would be good to mention that with perf_buffer__buffer() you get
access to raw mmap()'ed per-CPU underlying memory region, which
contains both perf buffer data and header, which in turn contains
head/tail positions, so the above scenarios are possible to implement
now.
>
> Signed-off-by: Jon Doron <jond@wiz.io>
> ---
> tools/lib/bpf/libbpf.c | 26 ++++++++++++++++++++++++++
> tools/lib/bpf/libbpf.h | 2 ++
> tools/lib/bpf/libbpf.map | 1 +
> 3 files changed, 29 insertions(+)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index e89cc9c885b3..250263812194 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -12485,6 +12485,32 @@ int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx)
> return cpu_buf->fd;
> }
>
> +/*
> + * Return the memory region of a ring buffer in *buf_idx* slot of
> + * PERF_EVENT_ARRAY BPF map. This ring buffer can be used to implement
> + * a custom events consumer.
> + * The ring buffer starts with the *struct perf_event_mmap_page*, which
> + * holds the ring buffer management fields; when accessing the header
> + * structure it's important to be SMP aware.
> + * You can refer to *perf_event_read_simple* for a simple example.
> + */
Please move this comment into libbpf.h to the corresponding API
declaration site. And make sure it is a proper doc comment (add
descriptions for the parameters and the return value).
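For reference, the requested doc comment could take a shape along these lines (wording is illustrative, following the `/** ... @brief ... */` pattern libbpf.h already uses for other APIs):

```c
/**
 * @brief **perf_buffer__buffer()** returns the raw mmap()'ed per-CPU
 * memory region of the ring buffer in slot *buf_idx* of the
 * PERF_EVENT_ARRAY BPF map, allowing a custom events consumer.
 * The region starts with *struct perf_event_mmap_page*, which holds
 * the ring buffer management fields; accessing it must be SMP aware.
 * @param pb the perf buffer structure
 * @param buf_idx the buffer index to retrieve
 * @param buf (out) gets the base pointer of the mmap()'ed memory
 * @param buf_size (out) gets the size of the mmap()'ed region
 * @return 0 on success, negative error code for failure
 */
```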
> +int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf,
> + size_t *buf_size)
nit: if this fits under 100 characters, keep it as a single line
> +{
> + struct perf_cpu_buf *cpu_buf;
> +
> + if (buf_idx >= pb->cpu_cnt)
> + return libbpf_err(-EINVAL);
> +
> + cpu_buf = pb->cpu_bufs[buf_idx];
> + if (!cpu_buf)
> + return libbpf_err(-ENOENT);
> +
> + *buf = cpu_buf->base;
> + *buf_size = pb->mmap_size;
> + return 0;
> +}
> +
The code itself looks good and straightforward
> /*
> * Consume data from perf ring buffer corresponding to slot *buf_idx* in
> * PERF_EVENT_ARRAY BPF map without waiting/polling. If there is no data to
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 9e9a3fd3edd8..78a7ab8f610a 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -1381,6 +1381,8 @@ LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb);
> LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx);
> LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb);
> LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx);
here should be that doc comment
> +LIBBPF_API int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf,
> + size_t *buf_size);
>
> typedef enum bpf_perf_event_ret
> (*bpf_perf_event_print_t)(struct perf_event_header *hdr,
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 52973cffc20c..971072c6dfd8 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -458,6 +458,7 @@ LIBBPF_0.8.0 {
> bpf_program__set_insns;
> libbpf_register_prog_handler;
> libbpf_unregister_prog_handler;
> + perf_buffer__buffer;
The current development version is 1.0.0, please put the new API in the right section.
> } LIBBPF_0.7.0;
>
> LIBBPF_1.0.0 {
> --
> 2.36.1
>
^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH bpf-next v3 1/1] libbpf: perfbuf: Add API to get the ring buffer
2022-07-15 14:18 ` [PATCH bpf-next v3 1/1] libbpf: perfbuf: Add API to get the " Jon Doron
2022-07-15 16:54 ` Andrii Nakryiko
@ 2022-07-15 16:58 ` Yonghong Song
1 sibling, 0 replies; 5+ messages in thread
From: Yonghong Song @ 2022-07-15 16:58 UTC (permalink / raw)
To: Jon Doron, bpf, ast, andrii, daniel; +Cc: Jon Doron
On 7/15/22 7:18 AM, Jon Doron wrote:
> From: Jon Doron <jond@wiz.io>
>
> Add support for writing a custom event reader, by exposing the ring
> buffer.
>
> A few simple examples where this is needed:
> 1. perf_event_read_simple is allocating using malloc, perhaps you want
> to handle the wrap-around in some other way.
> 2. Since the perf buffer is per-CPU, the order of the events is not
> guaranteed, for example:
> Given 3 events where each event has a timestamp t0 < t1 < t2,
> and the events are spread on more than 1 CPU, then we can end
> up with the following state in the ring buf:
> CPU[0] => [t0, t2]
> CPU[1] => [t1]
> When you consume the events from CPU[0], you can know that t1 is
> missing (assuming there are no drops, and your event data
> contains a sequential index).
> So now one can simply do the following: for CPU[0], store
> the address of t0 and t2 in an array (without moving the tail, so
> the data is not lost), then move on to CPU[1] and set the
> address of t1 in the same array.
> So you end up with something like:
> void *arr[] = {&t0, &t1, &t2}, now you can consume it in order
> and move the tails as you process.
> 3. Assuming there are multiple CPUs and we want to start draining the
> messages from them, we can "pick" which one to start with
> according to the remaining free space in the ring buffer.
>
> Signed-off-by: Jon Doron <jond@wiz.io>
> ---
> tools/lib/bpf/libbpf.c | 26 ++++++++++++++++++++++++++
> tools/lib/bpf/libbpf.h | 2 ++
> tools/lib/bpf/libbpf.map | 1 +
> 3 files changed, 29 insertions(+)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index e89cc9c885b3..250263812194 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -12485,6 +12485,32 @@ int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx)
> return cpu_buf->fd;
> }
>
> +/*
> + * Return the memory region of a ring buffer in *buf_idx* slot of
> + * PERF_EVENT_ARRAY BPF map. This ring buffer can be used to implement
> + * a custom events consumer.
> + * The ring buffer starts with the *struct perf_event_mmap_page*, which
> + * holds the ring buffer management fields; when accessing the header
> + * structure it's important to be SMP aware.
> + * You can refer to *perf_event_read_simple* for a simple example.
> + */
> +int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf,
> + size_t *buf_size)
> +{
> + struct perf_cpu_buf *cpu_buf;
> +
> + if (buf_idx >= pb->cpu_cnt)
> + return libbpf_err(-EINVAL);
> +
> + cpu_buf = pb->cpu_bufs[buf_idx];
> + if (!cpu_buf)
> + return libbpf_err(-ENOENT);
> +
> + *buf = cpu_buf->base;
> + *buf_size = pb->mmap_size;
> + return 0;
> +}
> +
> /*
> * Consume data from perf ring buffer corresponding to slot *buf_idx* in
> * PERF_EVENT_ARRAY BPF map without waiting/polling. If there is no data to
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 9e9a3fd3edd8..78a7ab8f610a 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -1381,6 +1381,8 @@ LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb);
> LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx);
> LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb);
> LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx);
> +LIBBPF_API int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf,
> + size_t *buf_size);
>
> typedef enum bpf_perf_event_ret
> (*bpf_perf_event_print_t)(struct perf_event_header *hdr,
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 52973cffc20c..971072c6dfd8 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -458,6 +458,7 @@ LIBBPF_0.8.0 {
> bpf_program__set_insns;
> libbpf_register_prog_handler;
> libbpf_unregister_prog_handler;
> + perf_buffer__buffer;
You cannot add to LIBBPF_0.7.0, which has already been released.
Please add to LIBBPF_1.0.0.
> } LIBBPF_0.7.0;
>
> LIBBPF_1.0.0 {
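In other words, the fix amounts to moving the new symbol into the in-development version node of the linker version script, roughly like this (other symbols in the 1.0.0 section elided):

```
LIBBPF_1.0.0 {
	global:
		...
		perf_buffer__buffer;
} LIBBPF_0.8.0;
```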
^ permalink raw reply [flat|nested] 5+ messages in thread