bpf.vger.kernel.org archive mirror
From: Eduard Zingerman <eddyz87@gmail.com>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: bpf@vger.kernel.org,  Alexei Starovoitov <ast@kernel.org>,
	 Andrii Nakryiko <andrii@kernel.org>,
	 Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	 Emil Tsalapatis <emil@etsalapatis.com>,
	 Barret Rhoden <brho@google.com>,
	 kkd@meta.com, kernel-team@meta.com
Subject: Re: [RFC PATCH bpf-next/net v1 07/13] bpf: Introduce per-prog stdout/stderr streams
Date: Wed, 16 Apr 2025 14:49:20 -0700	[thread overview]
Message-ID: <m2plhbu68v.fsf@gmail.com> (raw)
In-Reply-To: <20250414161443.1146103-8-memxor@gmail.com> (Kumar Kartikeya Dwivedi's message of "Mon, 14 Apr 2025 09:14:37 -0700")

Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:

> Introduce a set of kfuncs to implement per-BPF-program stdout and stderr
> streams. Each stream is a linked list of strings, dynamically allocated
> whenever a message is printed. Messages can be printed by the kernel or
> by the program itself using the bpf_prog_stream_vprintk kfunc. In-kernel
> wrappers are provided over streams to ease the process.
>
> The idea is that every time messages need to be dumped, the reader would
> pull out the whole batch of messages from the stream at once, and then
> pop each string one by one and start printing it out (how exactly is
> left up to the BPF program reading the log, but usually it will be
> streaming data back into a ring buffer that is consumed by user space).
>
> The use of a lockless list means we must be careful about the
> ordering of messages. Elements are added newest-first, so after
> detaching the list, we must reverse it before iterating and popping
> out elements.
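
For reference, the consume-and-reverse pattern described above can be
modeled in plain userspace C. The kernel side would use llist_add(),
llist_del_all() and llist_reverse_order(); the struct and function
names below are invented for illustration only:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model: producers push newest-first; the consumer detaches
 * the whole batch and reverses it so messages come back in submission
 * order. */
struct node {
	struct node *next;
	int seq; /* stands in for the message payload */
};

/* Push at the head, like llist_add(): after pushing 1, 2, 3 the list
 * reads 3 -> 2 -> 1. */
static void push(struct node **head, struct node *n)
{
	n->next = *head;
	*head = n;
}

/* Detach the entire batch and reverse it in place, mirroring
 * llist_del_all() followed by llist_reverse_order(). Returns the
 * oldest element. */
static struct node *pop_all_reversed(struct node **head)
{
	struct node *cur = *head, *prev = NULL;

	*head = NULL;
	while (cur) {
		struct node *next = cur->next;

		cur->next = prev;
		prev = cur;
		cur = next;
	}
	return prev;
}
```

The reversal costs one extra pass per batch, but keeps the producer
path a single lockless head insertion.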
>
> Overall, this infrastructure provides NMI-safe, any-context printing
> built on top of the NMI-safe, any-context bpf_mem_alloc() interface.
>
> Later patches will add support for printing splats in case of BPF arena
> page faults, rqspinlock deadlocks, and cond_break timeouts.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

[...]

> diff --git a/kernel/bpf/stream.c b/kernel/bpf/stream.c
> new file mode 100644
> index 000000000000..2019ce134310
> --- /dev/null
> +++ b/kernel/bpf/stream.c

[...]

> +static int bpf_stream_page_check_room(struct bpf_stream_page *stream_page, int len)
                                                                              ^^^^^^^
                                                                     len is never used
> +{
> +	int min = offsetof(struct bpf_stream_elem, str[0]);
> +	int consumed = stream_page->consumed;
> +	int total = BPF_STREAM_PAGE_SZ;
> +	int rem = max(0, total - consumed - min);
> +
> +	/* Let's give room of at least 8 bytes. */
> +	WARN_ON_ONCE(rem % 8 != 0);
> +	return rem < 8 ? 0 : rem;
> +}

[...]

> +static struct bpf_stream_elem *bpf_stream_elem_alloc(int len)
> +{
> +	const int max_len = ARRAY_SIZE((struct bpf_bprintf_buffers){}.buf);
> +	struct bpf_stream_elem *elem;
> +
> +	/*
> +	 * We may overflow, but we should never need more than one page size
> +	 * worth of memory. This can be lifted, but we'd need to adjust the
> +	 * other code to keep allocating more pages to overflow messages.
> +	 */
> +	BUILD_BUG_ON(max_len > BPF_STREAM_PAGE_SZ);
> +	/*
> +	 * Length denotes the amount of data to be written as part of stream element,
> +	 * thus includes '\0' byte. We're capped by how much bpf_bprintf_buffers can
> +	 * accommodate, therefore deny allocations that won't fit into them.
> +	 */
> +	if (len < 0 || len > max_len)
> +		return NULL;
> +
> +	elem = bpf_stream_elem_alloc_from_bpf_ma(len);
> +	if (!elem)
> +		elem = bpf_stream_elem_alloc_from_stream_page(len);

So, the stream page is a backup mechanism, right?
I'm curious, did you compare how many messages are dropped if there is
no such backup? Also, how much memory is wasted if there is no
"spillover" mechanism (BPF_STREAM_ELEM_F_NEXT)?
Are these complications absolutely necessary?

> +	return elem;
> +}
> +
> +__bpf_kfunc_start_defs();
> +
> +static int bpf_stream_push_str(struct bpf_stream *stream, const char *str, int len)
> +{

This function accumulates elements in &stream->log w/o a cap.
Why is this not a problem if e.g. user space never flushes streams for
the program?

> +	struct bpf_stream_elem *elem, *next = NULL;
> +	int room = 0, rem = 0;
> +
> +	/*
> +	 * Allocate a bpf_prog_stream_elem and push it to the bpf_prog_stream
> +	 * log; elements will be popped all at once and reversed to print the log.
> +	 */
> +	elem = bpf_stream_elem_alloc(len);
> +	if (!elem)
> +		return -ENOMEM;
> +	room = elem->mem_slice.len;
> +	if (elem->flags & BPF_STREAM_ELEM_F_NEXT) {
> +		next = (struct bpf_stream_elem *)((unsigned long)elem->next & ~BPF_STREAM_ELEM_F_MASK);
> +		rem = next->mem_slice.len;
> +	}
> +
> +	memcpy(elem->str, str, room);
> +	if (next)
> +		memcpy(next->str, str + room, rem);
> +
> +	if (next) {
> +		elem->node.next = &next->node;
> +		next->node.next = NULL;
> +
> +		llist_add_batch(&elem->node, &next->node, &stream->log);
> +	} else {
> +		llist_add(&elem->node, &stream->log);
> +	}
> +
> +	return 0;
> +}
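
On the unbounded accumulation noted above: one possible mitigation,
purely a sketch and not part of this patch, is a per-stream byte
budget checked on push. STREAM_MAX_BYTES and both function names
below are invented for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical cap: track the bytes queued on a stream and refuse
 * pushes once a budget is exhausted, so a stream that user space
 * never flushes cannot grow without bound. */
#define STREAM_MAX_BYTES 4096 /* invented limit for illustration */

struct capped_stream {
	size_t queued; /* bytes currently sitting in the log */
};

/* Call before allocating an element; returns 0 on success, -ENOSPC
 * if the message would exceed the budget. */
static int stream_reserve(struct capped_stream *s, size_t len)
{
	if (s->queued + len > STREAM_MAX_BYTES)
		return -ENOSPC;
	s->queued += len;
	return 0;
}

/* Call when the consumer frees a batch, returning its bytes to the
 * budget. */
static void stream_release(struct capped_stream *s, size_t len)
{
	s->queued -= len;
}
```

A kernel version would need atomic accounting, since pushes can
happen from any context.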

[...]

> +BTF_KFUNCS_START(stream_consumer_kfunc_set)
> +BTF_ID_FLAGS(func, bpf_stream_next_elem_batch, KF_ACQUIRE | KF_RET_NULL | KF_TRUSTED_ARGS)
> +BTF_ID_FLAGS(func, bpf_stream_free_elem_batch, KF_RELEASE)
> +BTF_ID_FLAGS(func, bpf_stream_next_elem, KF_ACQUIRE | KF_RET_NULL | KF_TRUSTED_ARGS)
> +BTF_ID_FLAGS(func, bpf_stream_free_elem, KF_RELEASE)
> +BTF_ID_FLAGS(func, bpf_prog_stream_get, KF_ACQUIRE | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_prog_stream_put, KF_RELEASE)
> +BTF_KFUNCS_END(stream_consumer_kfunc_set)

This is a complicated API.
If we anticipate that users intend to write this info to ring buffers,
maybe just provide a function doing that instead of exposing the
complete API?
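
As a sketch of what a narrower consumer API could look like, here is
a userspace model of a single drain call that detaches the log,
restores submission order, copies the text out, and frees the
elements in one step. All names are invented; a kernel version would
feed a BPF ring buffer rather than a flat buffer:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct elem {
	struct elem *next;
	char str[32]; /* NUL-terminated payload */
};

struct msg_stream {
	struct elem *log; /* head is the newest element */
};

/* Producer side: push newest-first, as llist_add() does. */
static void stream_push(struct msg_stream *s, const char *msg)
{
	struct elem *e = calloc(1, sizeof(*e));

	if (!e)
		return;
	strncpy(e->str, msg, sizeof(e->str) - 1);
	e->next = s->log;
	s->log = e;
}

/* Single consumer entry point: returns bytes written; silently
 * truncates messages that do not fit in buf. */
static size_t stream_drain(struct msg_stream *s, char *buf, size_t sz)
{
	struct elem *cur = s->log, *prev = NULL;
	size_t off = 0;

	s->log = NULL;
	while (cur) { /* reverse into submission order */
		struct elem *next = cur->next;

		cur->next = prev;
		prev = cur;
		cur = next;
	}
	while (prev) { /* copy oldest-first, freeing as we go */
		struct elem *next = prev->next;
		size_t len = strlen(prev->str);

		if (off + len < sz) {
			memcpy(buf + off, prev->str, len);
			off += len;
		}
		free(prev);
		prev = next;
	}
	if (off < sz)
		buf[off] = '\0';
	return off;
}
```

With something like this, the five get/next/free kfuncs collapse into
one call, at the cost of flexibility on the consumer side.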

[...]

I'll continue reading the patch-set tomorrow...

Thread overview: 44+ messages
2025-04-14 16:14 [RFC PATCH bpf-next/net v1 00/13] BPF Standard Streams Kumar Kartikeya Dwivedi
2025-04-14 16:14 ` [RFC PATCH bpf-next/net v1 01/13] bpf: Tie dynptrs to referenced source objects Kumar Kartikeya Dwivedi
2025-04-14 19:58   ` Amery Hung
2025-04-17 19:53     ` Kumar Kartikeya Dwivedi
2025-04-16  5:52   ` Eduard Zingerman
2025-04-17 19:52     ` Kumar Kartikeya Dwivedi
2025-04-16 21:04   ` Andrii Nakryiko
2025-04-17 19:51     ` Kumar Kartikeya Dwivedi
2025-04-22 21:09       ` Andrii Nakryiko
2025-04-14 16:14 ` [RFC PATCH bpf-next/net v1 02/13] bpf: Compare dynptr_id in regsafe Kumar Kartikeya Dwivedi
2025-04-16  7:58   ` Eduard Zingerman
2025-04-16 21:04   ` Andrii Nakryiko
2025-04-17 19:54     ` Kumar Kartikeya Dwivedi
2025-04-14 16:14 ` [RFC PATCH bpf-next/net v1 03/13] selftests/bpf: Convert dynptr_fail to use vmlinux.h Kumar Kartikeya Dwivedi
2025-04-16  8:41   ` Eduard Zingerman
2025-04-14 16:14 ` [RFC PATCH bpf-next/net v1 04/13] selftests/bpf: Add tests for dynptr source object interaction Kumar Kartikeya Dwivedi
2025-04-16  8:40   ` Eduard Zingerman
2025-04-17 20:09     ` Kumar Kartikeya Dwivedi
2025-04-17 20:44       ` Eduard Zingerman
2025-04-17 21:10         ` Kumar Kartikeya Dwivedi
2025-04-14 16:14 ` [RFC PATCH bpf-next/net v1 05/13] locking/local_lock, mm: Replace localtry_ helpers with local_trylock_t type Kumar Kartikeya Dwivedi
2025-04-14 16:14 ` [RFC PATCH bpf-next/net v1 06/13] bpf: Introduce bpf_dynptr_from_mem_slice Kumar Kartikeya Dwivedi
2025-04-16 17:36   ` Eduard Zingerman
2025-04-17 20:07     ` Kumar Kartikeya Dwivedi
2025-04-16 21:04   ` Andrii Nakryiko
2025-04-17 20:07     ` Kumar Kartikeya Dwivedi
2025-04-14 16:14 ` [RFC PATCH bpf-next/net v1 07/13] bpf: Introduce per-prog stdout/stderr streams Kumar Kartikeya Dwivedi
2025-04-16 21:49   ` Eduard Zingerman [this message]
2025-04-17 20:06     ` Kumar Kartikeya Dwivedi
2025-04-22  1:42       ` Alexei Starovoitov
2025-04-22  2:30         ` Kumar Kartikeya Dwivedi
2025-04-22 17:44           ` Alexei Starovoitov
2025-04-22 22:09       ` Eduard Zingerman
2025-04-14 16:14 ` [RFC PATCH bpf-next/net v1 08/13] bpf: Add dump_stack() analogue to print to BPF stderr Kumar Kartikeya Dwivedi
2025-04-22 21:07   ` Eduard Zingerman
2025-04-14 16:14 ` [RFC PATCH bpf-next/net v1 09/13] bpf: Report may_goto timeout " Kumar Kartikeya Dwivedi
2025-04-14 16:14 ` [RFC PATCH bpf-next/net v1 10/13] bpf: Report rqspinlock deadlocks/timeout " Kumar Kartikeya Dwivedi
2025-04-14 16:14 ` [RFC PATCH bpf-next/net v1 11/13] bpf: Report arena faults " Kumar Kartikeya Dwivedi
2025-04-14 16:14 ` [RFC PATCH bpf-next/net v1 12/13] bpftool: Add support for dumping streams Kumar Kartikeya Dwivedi
2025-04-22 20:30   ` Eduard Zingerman
2025-04-22 22:10   ` Quentin Monnet
2025-05-07 17:13     ` Kumar Kartikeya Dwivedi
2025-04-14 16:14 ` [RFC PATCH bpf-next/net v1 13/13] selftests/bpf: Add tests for prog streams Kumar Kartikeya Dwivedi
2025-04-22 19:12   ` Eduard Zingerman
