* [PATCH bpf-next v3 0/1] libbpf: perfbuf expose ring buffer
@ 2022-07-15 14:18 Jon Doron
2022-07-15 14:18 ` [PATCH bpf-next v3 1/1] libbpf: perfbuf: Add API to get the " Jon Doron
2022-07-15 16:53 ` [PATCH bpf-next v3 0/1] libbpf: perfbuf expose " Yonghong Song
0 siblings, 2 replies; 5+ messages in thread
From: Jon Doron @ 2022-07-15 14:18 UTC (permalink / raw)
To: bpf, ast, andrii, daniel; +Cc: Jon Doron
From: Jon Doron <jond@wiz.io>
Add support for writing a custom event reader, by exposing the ring
buffer.
A few simple examples where this type of API is needed:
1. perf_event_read_simple allocates with malloc; perhaps you want
to handle the wrap-around in some other way.
2. Since the perf buffer is per-CPU, the order of the events is not
guaranteed, for example:
Given 3 events where each event has a timestamp t0 < t1 < t2,
and the events are spread over more than 1 CPU, we can end
up with the following state in the ring buf:
CPU[0] => [t0, t2]
CPU[1] => [t1]
When you consume the events from CPU[0], you can know that t1 is
missing (assuming there are no drops, and your event data
contains a sequential index).
So one can simply do the following: for CPU[0], store the
addresses of t0 and t2 in an array (without moving the tail, so
their data is not overwritten), then move on to CPU[1] and set the
address of t1 in the same array.
So you end up with something like:
void *arr[] = {&t0, &t1, &t2}; now you can consume it in order
and move the tails as you process.
3. Assuming there are multiple CPUs and we want to start draining the
messages from them, we can pick which one to start with
according to the remaining free space in its ring buffer.
Jon Doron (1):
libbpf: perfbuf: Add API to get the ring buffer
tools/lib/bpf/libbpf.c | 26 ++++++++++++++++++++++++++
tools/lib/bpf/libbpf.h | 2 ++
tools/lib/bpf/libbpf.map | 1 +
3 files changed, 29 insertions(+)
--
2.36.1
* [PATCH bpf-next v3 1/1] libbpf: perfbuf: Add API to get the ring buffer
2022-07-15 14:18 [PATCH bpf-next v3 0/1] libbpf: perfbuf expose ring buffer Jon Doron
@ 2022-07-15 14:18 ` Jon Doron
2022-07-15 16:54 ` Andrii Nakryiko
2022-07-15 16:58 ` Yonghong Song
2022-07-15 16:53 ` [PATCH bpf-next v3 0/1] libbpf: perfbuf expose " Yonghong Song
1 sibling, 2 replies; 5+ messages in thread
From: Jon Doron @ 2022-07-15 14:18 UTC (permalink / raw)
To: bpf, ast, andrii, daniel; +Cc: Jon Doron
From: Jon Doron <jond@wiz.io>
Add support for writing a custom event reader, by exposing the ring
buffer.
A few simple examples where this type of API is needed:
1. perf_event_read_simple allocates with malloc; perhaps you want
to handle the wrap-around in some other way.
2. Since the perf buffer is per-CPU, the order of the events is not
guaranteed, for example:
Given 3 events where each event has a timestamp t0 < t1 < t2,
and the events are spread over more than 1 CPU, we can end
up with the following state in the ring buf:
CPU[0] => [t0, t2]
CPU[1] => [t1]
When you consume the events from CPU[0], you can know that t1 is
missing (assuming there are no drops, and your event data
contains a sequential index).
So one can simply do the following: for CPU[0], store the
addresses of t0 and t2 in an array (without moving the tail, so
their data is not overwritten), then move on to CPU[1] and set the
address of t1 in the same array.
So you end up with something like:
void *arr[] = {&t0, &t1, &t2}; now you can consume it in order
and move the tails as you process.
3. Assuming there are multiple CPUs and we want to start draining the
messages from them, we can pick which one to start with
according to the remaining free space in its ring buffer.
Signed-off-by: Jon Doron <jond@wiz.io>
---
tools/lib/bpf/libbpf.c | 26 ++++++++++++++++++++++++++
tools/lib/bpf/libbpf.h | 2 ++
tools/lib/bpf/libbpf.map | 1 +
3 files changed, 29 insertions(+)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index e89cc9c885b3..250263812194 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -12485,6 +12485,32 @@ int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx)
return cpu_buf->fd;
}
+/*
+ * Return the memory region of a ring buffer in *buf_idx* slot of
+ * PERF_EVENT_ARRAY BPF map. This ring buffer can be used to implement
+ * a custom events consumer.
+ * The ring buffer starts with the *struct perf_event_mmap_page*, which
+ * holds the ring buffer management fields. When accessing the header
+ * structure it's important to be SMP aware.
+ * You can refer to *perf_event_read_simple* for a simple example.
+ */
+int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf,
+ size_t *buf_size)
+{
+ struct perf_cpu_buf *cpu_buf;
+
+ if (buf_idx >= pb->cpu_cnt)
+ return libbpf_err(-EINVAL);
+
+ cpu_buf = pb->cpu_bufs[buf_idx];
+ if (!cpu_buf)
+ return libbpf_err(-ENOENT);
+
+ *buf = cpu_buf->base;
+ *buf_size = pb->mmap_size;
+ return 0;
+}
+
/*
* Consume data from perf ring buffer corresponding to slot *buf_idx* in
* PERF_EVENT_ARRAY BPF map without waiting/polling. If there is no data to
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 9e9a3fd3edd8..78a7ab8f610a 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -1381,6 +1381,8 @@ LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb);
LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx);
LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb);
LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx);
+LIBBPF_API int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf,
+ size_t *buf_size);
typedef enum bpf_perf_event_ret
(*bpf_perf_event_print_t)(struct perf_event_header *hdr,
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 52973cffc20c..971072c6dfd8 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -458,6 +458,7 @@ LIBBPF_0.8.0 {
bpf_program__set_insns;
libbpf_register_prog_handler;
libbpf_unregister_prog_handler;
+ perf_buffer__buffer;
} LIBBPF_0.7.0;
LIBBPF_1.0.0 {
--
2.36.1
* Re: [PATCH bpf-next v3 0/1] libbpf: perfbuf expose ring buffer
2022-07-15 14:18 [PATCH bpf-next v3 0/1] libbpf: perfbuf expose ring buffer Jon Doron
2022-07-15 14:18 ` [PATCH bpf-next v3 1/1] libbpf: perfbuf: Add API to get the " Jon Doron
@ 2022-07-15 16:53 ` Yonghong Song
1 sibling, 0 replies; 5+ messages in thread
From: Yonghong Song @ 2022-07-15 16:53 UTC (permalink / raw)
To: Jon Doron, bpf, ast, andrii, daniel; +Cc: Jon Doron
On 7/15/22 7:18 AM, Jon Doron wrote:
> From: Jon Doron <jond@wiz.io>
>
> Add support for writing a custom event reader, by exposing the ring
> buffer.
>
> A few simple examples where this type of API is needed:
> 1. perf_event_read_simple allocates with malloc; perhaps you want
> to handle the wrap-around in some other way.
> 2. Since the perf buffer is per-CPU, the order of the events is not
> guaranteed, for example:
> Given 3 events where each event has a timestamp t0 < t1 < t2,
> and the events are spread over more than 1 CPU, we can end
> up with the following state in the ring buf:
> CPU[0] => [t0, t2]
> CPU[1] => [t1]
> When you consume the events from CPU[0], you can know that t1 is
> missing (assuming there are no drops, and your event data
> contains a sequential index).
> So one can simply do the following: for CPU[0], store the
> addresses of t0 and t2 in an array (without moving the tail, so
> their data is not overwritten), then move on to CPU[1] and set the
> address of t1 in the same array.
> So you end up with something like:
> void *arr[] = {&t0, &t1, &t2}; now you can consume it in order
> and move the tails as you process.
> 3. Assuming there are multiple CPUs and we want to start draining the
> messages from them, we can pick which one to start with
> according to the remaining free space in its ring buffer.
>
> Jon Doron (1):
> libbpf: perfbuf: Add API to get the ring buffer
>
> tools/lib/bpf/libbpf.c | 26 ++++++++++++++++++++++++++
> tools/lib/bpf/libbpf.h | 2 ++
> tools/lib/bpf/libbpf.map | 1 +
> 3 files changed, 29 insertions(+)
>
This is a patch set with a single patch, so there is no need for
the cover letter. Further, the cover letter description is
identical to the commit message of the first patch.
* Re: [PATCH bpf-next v3 1/1] libbpf: perfbuf: Add API to get the ring buffer
2022-07-15 14:18 ` [PATCH bpf-next v3 1/1] libbpf: perfbuf: Add API to get the " Jon Doron
@ 2022-07-15 16:54 ` Andrii Nakryiko
2022-07-15 16:58 ` Yonghong Song
1 sibling, 0 replies; 5+ messages in thread
From: Andrii Nakryiko @ 2022-07-15 16:54 UTC (permalink / raw)
To: Jon Doron
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Jon Doron
On Fri, Jul 15, 2022 at 7:18 AM Jon Doron <arilou@gmail.com> wrote:
>
> From: Jon Doron <jond@wiz.io>
>
> Add support for writing a custom event reader, by exposing the ring
> buffer.
>
> A few simple examples where this type of API is needed:
> 1. perf_event_read_simple allocates with malloc; perhaps you want
> to handle the wrap-around in some other way.
> 2. Since the perf buffer is per-CPU, the order of the events is not
> guaranteed, for example:
> Given 3 events where each event has a timestamp t0 < t1 < t2,
> and the events are spread over more than 1 CPU, we can end
> up with the following state in the ring buf:
> CPU[0] => [t0, t2]
> CPU[1] => [t1]
> When you consume the events from CPU[0], you can know that t1 is
> missing (assuming there are no drops, and your event data
> contains a sequential index).
> So one can simply do the following: for CPU[0], store the
> addresses of t0 and t2 in an array (without moving the tail, so
> their data is not overwritten), then move on to CPU[1] and set the
> address of t1 in the same array.
> So you end up with something like:
> void *arr[] = {&t0, &t1, &t2}; now you can consume it in order
> and move the tails as you process.
> 3. Assuming there are multiple CPUs and we want to start draining the
> messages from them, we can pick which one to start with
> according to the remaining free space in its ring buffer.
It would be good to mention that with perf_buffer__buffer() you get
access to the raw mmap()'ed underlying per-CPU memory region, which
contains both the perf buffer data and the header; the header in turn
holds the head/tail positions, so the above scenarios are now possible
to implement.
>
> Signed-off-by: Jon Doron <jond@wiz.io>
> ---
> tools/lib/bpf/libbpf.c | 26 ++++++++++++++++++++++++++
> tools/lib/bpf/libbpf.h | 2 ++
> tools/lib/bpf/libbpf.map | 1 +
> 3 files changed, 29 insertions(+)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index e89cc9c885b3..250263812194 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -12485,6 +12485,32 @@ int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx)
> return cpu_buf->fd;
> }
>
> +/*
> + * Return the memory region of a ring buffer in *buf_idx* slot of
> + * PERF_EVENT_ARRAY BPF map. This ring buffer can be used to implement
> + * a custom events consumer.
> + * The ring buffer starts with the *struct perf_event_mmap_page*, which
> + * holds the ring buffer management fields. When accessing the header
> + * structure it's important to be SMP aware.
> + * You can refer to *perf_event_read_simple* for a simple example.
> + */
Please move this comment into libbpf.h, to the corresponding API
declaration site. And make sure it is a proper doc comment (so add
descriptions for parameters and return results).
> +int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf,
> + size_t *buf_size)
nit: if this fits under 100 characters, keep it as a single line
> +{
> + struct perf_cpu_buf *cpu_buf;
> +
> + if (buf_idx >= pb->cpu_cnt)
> + return libbpf_err(-EINVAL);
> +
> + cpu_buf = pb->cpu_bufs[buf_idx];
> + if (!cpu_buf)
> + return libbpf_err(-ENOENT);
> +
> + *buf = cpu_buf->base;
> + *buf_size = pb->mmap_size;
> + return 0;
> +}
> +
The code itself looks good and straightforward
> /*
> * Consume data from perf ring buffer corresponding to slot *buf_idx* in
> * PERF_EVENT_ARRAY BPF map without waiting/polling. If there is no data to
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 9e9a3fd3edd8..78a7ab8f610a 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -1381,6 +1381,8 @@ LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb);
> LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx);
> LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb);
> LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx);
here should be that doc comment
> +LIBBPF_API int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf,
> + size_t *buf_size);
>
> typedef enum bpf_perf_event_ret
> (*bpf_perf_event_print_t)(struct perf_event_header *hdr,
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 52973cffc20c..971072c6dfd8 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -458,6 +458,7 @@ LIBBPF_0.8.0 {
> bpf_program__set_insns;
> libbpf_register_prog_handler;
> libbpf_unregister_prog_handler;
> + perf_buffer__buffer;
The current development version is 1.0.0; please put the new API in the right section.
> } LIBBPF_0.7.0;
>
> LIBBPF_1.0.0 {
> --
> 2.36.1
>
* Re: [PATCH bpf-next v3 1/1] libbpf: perfbuf: Add API to get the ring buffer
2022-07-15 14:18 ` [PATCH bpf-next v3 1/1] libbpf: perfbuf: Add API to get the " Jon Doron
2022-07-15 16:54 ` Andrii Nakryiko
@ 2022-07-15 16:58 ` Yonghong Song
1 sibling, 0 replies; 5+ messages in thread
From: Yonghong Song @ 2022-07-15 16:58 UTC (permalink / raw)
To: Jon Doron, bpf, ast, andrii, daniel; +Cc: Jon Doron
On 7/15/22 7:18 AM, Jon Doron wrote:
> From: Jon Doron <jond@wiz.io>
>
> Add support for writing a custom event reader, by exposing the ring
> buffer.
>
> A few simple examples where this type of API is needed:
> 1. perf_event_read_simple allocates with malloc; perhaps you want
> to handle the wrap-around in some other way.
> 2. Since the perf buffer is per-CPU, the order of the events is not
> guaranteed, for example:
> Given 3 events where each event has a timestamp t0 < t1 < t2,
> and the events are spread over more than 1 CPU, we can end
> up with the following state in the ring buf:
> CPU[0] => [t0, t2]
> CPU[1] => [t1]
> When you consume the events from CPU[0], you can know that t1 is
> missing (assuming there are no drops, and your event data
> contains a sequential index).
> So one can simply do the following: for CPU[0], store the
> addresses of t0 and t2 in an array (without moving the tail, so
> their data is not overwritten), then move on to CPU[1] and set the
> address of t1 in the same array.
> So you end up with something like:
> void *arr[] = {&t0, &t1, &t2}; now you can consume it in order
> and move the tails as you process.
> 3. Assuming there are multiple CPUs and we want to start draining the
> messages from them, we can pick which one to start with
> according to the remaining free space in its ring buffer.
>
> Signed-off-by: Jon Doron <jond@wiz.io>
> ---
> tools/lib/bpf/libbpf.c | 26 ++++++++++++++++++++++++++
> tools/lib/bpf/libbpf.h | 2 ++
> tools/lib/bpf/libbpf.map | 1 +
> 3 files changed, 29 insertions(+)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index e89cc9c885b3..250263812194 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -12485,6 +12485,32 @@ int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx)
> return cpu_buf->fd;
> }
>
> +/*
> + * Return the memory region of a ring buffer in *buf_idx* slot of
> + * PERF_EVENT_ARRAY BPF map. This ring buffer can be used to implement
> + * a custom events consumer.
> + * The ring buffer starts with the *struct perf_event_mmap_page*, which
> + * holds the ring buffer management fields. When accessing the header
> + * structure it's important to be SMP aware.
> + * You can refer to *perf_event_read_simple* for a simple example.
> + */
> +int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf,
> + size_t *buf_size)
> +{
> + struct perf_cpu_buf *cpu_buf;
> +
> + if (buf_idx >= pb->cpu_cnt)
> + return libbpf_err(-EINVAL);
> +
> + cpu_buf = pb->cpu_bufs[buf_idx];
> + if (!cpu_buf)
> + return libbpf_err(-ENOENT);
> +
> + *buf = cpu_buf->base;
> + *buf_size = pb->mmap_size;
> + return 0;
> +}
> +
> /*
> * Consume data from perf ring buffer corresponding to slot *buf_idx* in
> * PERF_EVENT_ARRAY BPF map without waiting/polling. If there is no data to
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 9e9a3fd3edd8..78a7ab8f610a 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -1381,6 +1381,8 @@ LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb);
> LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx);
> LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb);
> LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx);
> +LIBBPF_API int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf,
> + size_t *buf_size);
>
> typedef enum bpf_perf_event_ret
> (*bpf_perf_event_print_t)(struct perf_event_header *hdr,
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 52973cffc20c..971072c6dfd8 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -458,6 +458,7 @@ LIBBPF_0.8.0 {
> bpf_program__set_insns;
> libbpf_register_prog_handler;
> libbpf_unregister_prog_handler;
> + perf_buffer__buffer;
You cannot add to LIBBPF_0.7.0, which has already been released.
Please add to LIBBPF_1.0.0.
> } LIBBPF_0.7.0;
>
> LIBBPF_1.0.0 {