* [PATCH bpf-next v1 00/11] BPF Standard Streams
@ 2025-05-07 17:17 Kumar Kartikeya Dwivedi
  2025-05-07 17:17 ` [PATCH bpf-next v1 01/11] bpf: Introduce bpf_dynptr_from_mem_slice Kumar Kartikeya Dwivedi
                   ` (10 more replies)
  0 siblings, 11 replies; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-07 17:17 UTC
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, kernel-team

This set introduces a standard output interface with two streams, namely
stdout and stderr, for BPF programs. The idea is that these streams will
be written to by BPF programs and the kernel, and serve as standard
interfaces for informing user space of any BPF runtime violations. Users
can also use them to print ordinary debugging messages, as they do today
with bpf_printk() and the trace_pipe interface.

BPF programs and the kernel can use these streams to output messages.
User space can dump these messages using bpftool.

The stream interface itself is implemented using a lockless list, so
that we can queue messages from any context. Every printk into the
stream allocates memory. The allocation uses a bespoke bump allocator,
built on try_alloc_pages(), to carve out elements. If allocation fails,
we give up and drop the message.

See commit logs for more details.

Three scenarios are covered:
 - Deadlocks and timeouts in rqspinlock.
 - Arena page faults.
 - Timeouts for may_goto.

In each case, we provide the stack trace and source information for the
offending BPF programs. Both the C source line and the file and line
numbers are printed. The output format is as follows:

ERROR: AA or ABBA deadlock detected for bpf_res_spin_lock
Attempted lock   = 0xff11000108f3a5e0
Total held locks = 1
Held lock[ 0] = 0xff11000108f3a5e0
CPU: 48 UID: 0 PID: 786 Comm: test_progs
Call trace:
bpf_stream_stage_dump_stack+0xb0/0xd0
bpf_prog_report_rqspinlock_violation+0x10b/0x130
bpf_res_spin_lock+0x8c/0xa0
bpf_prog_3699ea119d1f6ed8_foo+0xe5/0x140
  if (!bpf_res_spin_lock(&v2->lock)) @ stream_bpftool.c:62
bpf_prog_9b324ec4a1b2a5c0_stream_bpftool_dump_prog_stream+0x7e/0x2d0
  foo(stream); @ stream_bpftool.c:93
bpf_prog_test_run_syscall+0x102/0x240
__sys_bpf+0xd68/0x2bf0
__x64_sys_bpf+0x1e/0x30
do_syscall_64+0x68/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e

ERROR: Arena READ access at unmapped address 0xdeadbeef
CPU: 48 UID: 0 PID: 786 Comm: test_progs
Call trace:
bpf_stream_stage_dump_stack+0xb0/0xd0
bpf_prog_report_arena_violation+0x90/0xb0
ex_handler_bpf+0x4a/0xa0
fixup_exception+0xde/0x310
kernelmode_fixup_or_oops.constprop.0+0x2f/0x70
exc_page_fault+0xdd/0x1d0
asm_exc_page_fault+0x26/0x30
bpf_prog_3699ea119d1f6ed8_foo+0x10c/0x140
  *(u64 __arena *)0xfaceb00c = *(u64 __arena *)0xdeadbeef; @ stream_bpftool.c:68
bpf_prog_9b324ec4a1b2a5c0_stream_bpftool_dump_prog_stream+0x7e/0x2d0
  foo(stream); @ stream_bpftool.c:93
bpf_prog_test_run_syscall+0x102/0x240
__sys_bpf+0xd68/0x2bf0
__x64_sys_bpf+0x1e/0x30
do_syscall_64+0x68/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e

ERROR: Arena WRITE access at unmapped address 0xfaceb00c
CPU: 48 UID: 0 PID: 786 Comm: test_progs
Call trace:
bpf_stream_stage_dump_stack+0xb0/0xd0
bpf_prog_report_arena_violation+0x90/0xb0
ex_handler_bpf+0x4a/0xa0
fixup_exception+0xde/0x310
kernelmode_fixup_or_oops.constprop.0+0x2f/0x70
exc_page_fault+0xdd/0x1d0
asm_exc_page_fault+0x26/0x30
bpf_prog_3699ea119d1f6ed8_foo+0x111/0x140
  *(u64 __arena *)0xfaceb00c = *(u64 __arena *)0xdeadbeef; @ stream_bpftool.c:68
bpf_prog_9b324ec4a1b2a5c0_stream_bpftool_dump_prog_stream+0x7e/0x2d0
  foo(stream); @ stream_bpftool.c:93
bpf_prog_test_run_syscall+0x102/0x240
__sys_bpf+0xd68/0x2bf0
__x64_sys_bpf+0x1e/0x30
do_syscall_64+0x68/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e

ERROR: Timeout detected for may_goto instruction
CPU: 48 UID: 0 PID: 786 Comm: test_progs
Call trace:
bpf_stream_stage_dump_stack+0xb0/0xd0
bpf_prog_report_may_goto_violation+0x6a/0x90
bpf_check_timed_may_goto+0x4d/0xa0
arch_bpf_timed_may_goto+0x21/0x40
bpf_prog_3699ea119d1f6ed8_foo+0x12f/0x140
  while (can_loop) @ stream_bpftool.c:71
bpf_prog_9b324ec4a1b2a5c0_stream_bpftool_dump_prog_stream+0x7e/0x2d0
  foo(stream); @ stream_bpftool.c:93
bpf_prog_test_run_syscall+0x102/0x240
__sys_bpf+0xd68/0x2bf0
__x64_sys_bpf+0x1e/0x30
do_syscall_64+0x68/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e

Changelog:
----------
RFC v1 -> v1
RFC v1: https://lore.kernel.org/bpf/20250414161443.1146103-1-memxor@gmail.com

 * Rebase on bpf-next/master.
 * Change output in dump_stack to also print source line. (Alexei)
 * Simplify API to single pop() operation. (Eduard, Alexei)
 * Add kdoc for bpf_dynptr_from_mem_slice.
 * Fix -EINVAL returned from prog_dump_stream. (Eduard)
 * Split dump_stack() patch into multiple commits.
 * Add macro wrapping stream staging API.
 * Change bpftool command from dump to tracelog. (Quentin)
 * Add bpftool documentation and bash completion. (Quentin)
 * Change license of bpftool to Dual BSD/GPL.
 * Simplify memory allocator (Alexei).
   * No overflow into second page.
   * Remove bpf_mem_alloc() fallback.
 * Symlink bpftool BPF program and exercise as selftest. (Eduard)
 * Verify output after dumping from ringbuf. (Eduard)
 * More failure cases to check API invariants.
 * Remove patches for dynptr lifetime fixes (split into separate set).
 * Limit maximum error messages, and add stream capacity (Eduard).

Kumar Kartikeya Dwivedi (11):
  bpf: Introduce bpf_dynptr_from_mem_slice
  bpf: Introduce BPF standard streams
  bpf: Add function to extract program source info
  bpf: Add function to find program from stack trace
  bpf: Add dump_stack() analogue to print to BPF stderr
  bpf: Report may_goto timeout to BPF stderr
  bpf: Report rqspinlock deadlocks/timeout to BPF stderr
  bpf: Report arena faults to BPF stderr
  libbpf: Add bpf_stream_printk() macro
  bpftool: Add support for dumping streams
  selftests/bpf: Add tests for prog streams

 arch/x86/net/bpf_jit_comp.c                   |  22 +-
 include/linux/bpf.h                           |  91 ++-
 kernel/bpf/Makefile                           |   2 +-
 kernel/bpf/arena.c                            |  14 +
 kernel/bpf/core.c                             |  95 ++-
 kernel/bpf/helpers.c                          |  63 +-
 kernel/bpf/rqspinlock.c                       |  22 +
 kernel/bpf/stream.c                           | 552 ++++++++++++++++++
 kernel/bpf/syscall.c                          |   2 +-
 kernel/bpf/verifier.c                         |  21 +-
 .../bpftool/Documentation/bpftool-prog.rst    |   6 +
 tools/bpf/bpftool/Makefile                    |   2 +-
 tools/bpf/bpftool/bash-completion/bpftool     |  16 +-
 tools/bpf/bpftool/prog.c                      |  88 ++-
 tools/bpf/bpftool/skeleton/stream.bpf.c       |  69 +++
 tools/lib/bpf/bpf_helpers.h                   |  44 +-
 .../testing/selftests/bpf/prog_tests/stream.c |  95 +++
 tools/testing/selftests/bpf/progs/stream.c    | 127 ++++
 .../selftests/bpf/progs/stream_bpftool.c      |   1 +
 .../testing/selftests/bpf/progs/stream_fail.c |  90 +++
 20 files changed, 1383 insertions(+), 39 deletions(-)
 create mode 100644 kernel/bpf/stream.c
 create mode 100644 tools/bpf/bpftool/skeleton/stream.bpf.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/stream.c
 create mode 100644 tools/testing/selftests/bpf/progs/stream.c
 create mode 120000 tools/testing/selftests/bpf/progs/stream_bpftool.c
 create mode 100644 tools/testing/selftests/bpf/progs/stream_fail.c


base-commit: 43745d11bfd9683abdf08ad7a5cc403d6a9ffd15
-- 
2.47.1



* [PATCH bpf-next v1 01/11] bpf: Introduce bpf_dynptr_from_mem_slice
  2025-05-07 17:17 [PATCH bpf-next v1 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
@ 2025-05-07 17:17 ` Kumar Kartikeya Dwivedi
  2025-05-09 17:19   ` Eduard Zingerman
  2025-05-09 21:11   ` Andrii Nakryiko
  2025-05-07 17:17 ` [PATCH bpf-next v1 02/11] bpf: Introduce BPF standard streams Kumar Kartikeya Dwivedi
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-07 17:17 UTC
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, kernel-team

Add a new bpf_dynptr_from_mem_slice kfunc to create a dynptr from a
PTR_TO_BTF_ID exposing a variable-length slice of memory, represented by
the new bpf_mem_slice type. This slice is read-only; a read-write slice
can be exposed through a distinct type in the future.

Since this is the first kfunc with potential local dynptr
initialization, add it to the if-else list in check_kfunc_call.
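
For illustration only, here is a minimal user-space sketch of the
fat-pointer idea: a slice is a pointer plus a length, and a read-only
view over it allows bounds-checked reads. This is not the kernel code.
The names mem_slice, ro_view, ro_view_from_slice and ro_view_read are
hypothetical; only the ptr/len/reserved layout and the "reject non-zero
flags, null out the destination on error" behaviour follow the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct bpf_mem_slice. */
struct mem_slice {
	void *ptr;
	uint32_t len;
	uint32_t reserved;
};

/* Hypothetical read-only view, loosely modelling a read-only dynptr. */
struct ro_view {
	const void *data;
	uint32_t size;
};

/* Build a view from a slice; reject flags and null the view on error. */
static int ro_view_from_slice(const struct mem_slice *s, uint64_t flags,
			      struct ro_view *v)
{
	assert(s && v);
	if (flags) {
		v->data = NULL;
		v->size = 0;
		return -EINVAL;
	}
	v->data = s->ptr;
	v->size = s->len;
	return 0;
}

/* Copy len bytes starting at offset off out of the view, bounds-checked. */
static int ro_view_read(void *dst, uint32_t len, const struct ro_view *v,
			uint32_t off)
{
	if (off > v->size || len > v->size - off)
		return -E2BIG;
	if (len)
		memcpy(dst, (const char *)v->data + off, len);
	return 0;
}
```

The view exposes no write operation at all, which is the simplest way
to model the read-only property in this sketch.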

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h   |  6 ++++++
 kernel/bpf/helpers.c  | 37 +++++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c |  6 +++++-
 3 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3f0cc89c0622..b0ea0b71df90 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1344,6 +1344,12 @@ enum bpf_dynptr_type {
 	BPF_DYNPTR_TYPE_XDP,
 };
 
+struct bpf_mem_slice {
+	void *ptr;
+	u32 len;
+	u32 reserved;
+};
+
 int bpf_dynptr_check_size(u32 size);
 u32 __bpf_dynptr_size(const struct bpf_dynptr_kern *ptr);
 const void *__bpf_dynptr_data(const struct bpf_dynptr_kern *ptr, u32 len);
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 78cefb41266a..89ab3481378d 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2873,6 +2873,42 @@ __bpf_kfunc int bpf_dynptr_copy(struct bpf_dynptr *dst_ptr, u32 dst_off,
 	return 0;
 }
 
+/**
+ * bpf_dynptr_from_mem_slice - Create a dynptr from a bpf_mem_slice
+ * @mem_slice: Source bpf_mem_slice, backing the underlying memory for dynptr
+ * @flags: Flags for dynptr construction, currently no supported flags.
+ * @dptr__uninit: Destination dynptr, which will be initialized.
+ *
+ * Creates a dynptr that points to variable-length read-only memory represented
+ * by a bpf_mem_slice fat pointer.
+ * Returns 0 on success; negative error, otherwise.
+ */
+__bpf_kfunc int bpf_dynptr_from_mem_slice(struct bpf_mem_slice *mem_slice, u64 flags, struct bpf_dynptr *dptr__uninit)
+{
+	struct bpf_dynptr_kern *dptr = (struct bpf_dynptr_kern *)dptr__uninit;
+	int err;
+
+	/* mem_slice is never NULL, as we use KF_TRUSTED_ARGS. */
+	err = bpf_dynptr_check_size(mem_slice->len);
+	if (err)
+		goto error;
+
+	/* flags is currently unsupported */
+	if (flags) {
+		err = -EINVAL;
+		goto error;
+	}
+
+	bpf_dynptr_init(dptr, mem_slice->ptr, BPF_DYNPTR_TYPE_LOCAL, 0, mem_slice->len);
+	bpf_dynptr_set_rdonly(dptr);
+
+	return 0;
+
+error:
+	bpf_dynptr_set_null(dptr);
+	return err;
+}
+
 __bpf_kfunc void *bpf_cast_to_kern_ctx(void *obj)
 {
 	return obj;
@@ -3327,6 +3363,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
 BTF_ID_FLAGS(func, bpf_dynptr_size)
 BTF_ID_FLAGS(func, bpf_dynptr_clone)
 BTF_ID_FLAGS(func, bpf_dynptr_copy)
+BTF_ID_FLAGS(func, bpf_dynptr_from_mem_slice, KF_TRUSTED_ARGS)
 #ifdef CONFIG_NET
 BTF_ID_FLAGS(func, bpf_modify_return_test_tp)
 #endif
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 99aa2c890e7b..ff34e68c9237 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12116,6 +12116,7 @@ enum special_kfunc_type {
 	KF_bpf_res_spin_unlock,
 	KF_bpf_res_spin_lock_irqsave,
 	KF_bpf_res_spin_unlock_irqrestore,
+	KF_bpf_dynptr_from_mem_slice,
 };
 
 BTF_SET_START(special_kfunc_set)
@@ -12219,6 +12220,7 @@ BTF_ID(func, bpf_res_spin_lock)
 BTF_ID(func, bpf_res_spin_unlock)
 BTF_ID(func, bpf_res_spin_lock_irqsave)
 BTF_ID(func, bpf_res_spin_unlock_irqrestore)
+BTF_ID(func, bpf_dynptr_from_mem_slice)
 
 static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
 {
@@ -13140,7 +13142,9 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			if (is_kfunc_arg_uninit(btf, &args[i]))
 				dynptr_arg_type |= MEM_UNINIT;
 
-			if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_skb]) {
+			if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_mem_slice]) {
+				dynptr_arg_type |= DYNPTR_TYPE_LOCAL;
+			} else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_skb]) {
 				dynptr_arg_type |= DYNPTR_TYPE_SKB;
 			} else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_xdp]) {
 				dynptr_arg_type |= DYNPTR_TYPE_XDP;
-- 
2.47.1



* [PATCH bpf-next v1 02/11] bpf: Introduce BPF standard streams
  2025-05-07 17:17 [PATCH bpf-next v1 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
  2025-05-07 17:17 ` [PATCH bpf-next v1 01/11] bpf: Introduce bpf_dynptr_from_mem_slice Kumar Kartikeya Dwivedi
@ 2025-05-07 17:17 ` Kumar Kartikeya Dwivedi
  2025-05-08 23:54   ` Eduard Zingerman
  2025-05-07 17:17 ` [PATCH bpf-next v1 03/11] bpf: Add function to extract program source info Kumar Kartikeya Dwivedi
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-07 17:17 UTC
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, kernel-team

Add support for a stream API to the kernel and expose related kfuncs to
BPF programs. Two streams are exposed, BPF_STDOUT and BPF_STDERR. These
can be used for printing messages that can be consumed from user space,
similar in spirit to the existing trace_pipe interface.

The kernel will use the BPF_STDERR stream to notify the program of any
errors encountered at runtime. BPF programs themselves may use both
streams for writing debug messages. BPF library-like code may use
BPF_STDERR to print warnings or errors on misuse at runtime.

The implementation of a stream is as follows. Every time a message is
emitted from the kernel (directly, or through a BPF program), a record
is bump allocated from a per-CPU region backed by a page obtained using
try_alloc_pages(). This ensures that we can allocate memory from any
context. The eventual plan is to discard this scheme in favor of
Alexei's kmalloc_nolock() [0].
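
The bump allocation scheme can be sketched in user space roughly as
follows. This is a single-threaded model under stated assumptions, not
the kernel implementation: aligned_alloc() stands in for
try_alloc_pages(), the refcount is a plain integer rather than a
refcount_t, there is one "current page" pointer instead of a per-CPU
variable, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical page header: refcount, bump offset, then record space. */
struct sketch_page {
	uint32_t ref;      /* one ref for the owner, one per live record */
	uint32_t consumed; /* bytes of buf[] already handed out */
	char buf[];
};

#define SKETCH_ROOM \
	(SKETCH_PAGE_SIZE - (uint32_t)offsetof(struct sketch_page, buf))

static struct sketch_page *sketch_page_new(void)
{
	struct sketch_page *p = aligned_alloc(SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);

	if (!p)
		return NULL;
	p->ref = 1;
	p->consumed = 0;
	return p;
}

/* Drop one reference; the page is freed when the last one goes away. */
static void sketch_page_put(struct sketch_page *p)
{
	assert(p->ref > 0);
	if (--p->ref == 0)
		free(p);
}

/* Carve len bytes (rounded up to 8) from *cur; swap pages when full. */
static void *sketch_bump_alloc(struct sketch_page **cur, uint32_t len)
{
	struct sketch_page *p = *cur;
	uint32_t need;
	void *rec;

	if (len == 0 || len > SKETCH_ROOM)
		return NULL;
	need = (len + 7u) & ~7u;
	if (!p || SKETCH_ROOM - p->consumed < need) {
		struct sketch_page *fresh = sketch_page_new();

		if (!fresh)
			return NULL;        /* allocation failed: drop message */
		if (p)
			sketch_page_put(p); /* owner lets go of the old page */
		*cur = p = fresh;
	}
	rec = &p->buf[p->consumed];
	p->consumed += need;
	p->ref++;                           /* each record pins its page */
	return rec;
}

/* Release a record; its page is found by masking the address down. */
static void sketch_bump_free(void *rec)
{
	uintptr_t mask = ~(uintptr_t)(SKETCH_PAGE_SIZE - 1);

	sketch_page_put((struct sketch_page *)((uintptr_t)rec & mask));
}
```

Because records from different consumers may share a page, the page can
only be returned once every record carved from it has been released,
which is why the reference count lives in the page itself.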

This record is then locklessly inserted into a list (llist_add()) so
that the printing side doesn't require holding any locks, and works in
any context. Each stream has a maximum capacity of 4MB of text, and each
printed message is accounted against this limit.
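
The capacity accounting can be modelled in user space with C11 atomics,
as in the sketch below. It follows the shape of
bpf_stream_consume_capacity() in this patch (charge first, undo the
charge if the limit is reached), but the cap_* names and the use of
<stdatomic.h> instead of the kernel's atomic_t are assumptions made for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

#define CAP_MAX_CAPACITY (4 * 1024 * 1024)

/* Hypothetical stream carrying only the accounting field. */
struct cap_stream {
	atomic_int capacity; /* bytes currently queued in the stream */
};

/* Charge len bytes; undo the charge and fail if the cap is reached. */
static int cap_consume(struct cap_stream *s, int len)
{
	assert(len >= 0);
	if (atomic_load(&s->capacity) >= CAP_MAX_CAPACITY)
		return -ENOSPC;
	/* atomic_fetch_add() returns the old value; add len for the total. */
	if (atomic_fetch_add(&s->capacity, len) + len >= CAP_MAX_CAPACITY) {
		atomic_fetch_sub(&s->capacity, len);
		return -ENOSPC;
	}
	return 0;
}

/* Return len bytes to the budget once a message has been consumed. */
static void cap_release(struct cap_stream *s, int len)
{
	atomic_fetch_sub(&s->capacity, len);
}
```

Charging optimistically and rolling back on failure avoids a
compare-and-swap loop, at the cost of the counter briefly overshooting
the limit when writers race.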

Messages from a program are emitted using the bpf_stream_vprintk kfunc,
which takes a stream argument and otherwise works like
bpf_trace_vprintk. The stream itself can be obtained using two kfuncs:
bpf_stream_get for the current program, and bpf_prog_stream_get for a
target program ID.

The bprintf buffer helpers are extracted so that they can be reused: the
string is first formatted into these buffers (subject to the defined
maximum length), so that its true length is known before a stream
element is allocated and the string is copied into it.

For consuming elements from a stream, bpf_stream_next_elem can be
called, which returns a bpf_stream_elem object that contains a
bpf_mem_slice struct representing the message contents. A dynptr can be
created from this memory slice object to access the contents of the
bpf_stream_elem. Once consumed, bpf_stream_free_elem can be used to
release the message back to the memory allocator.

The internals of bpf_stream_next_elem merit some discussion. First, the
lockless list bpf_stream::log is a LIFO stack. Elements obtained using a
llist_del_all() operation are in LIFO order, thus would break the
chronological ordering if printed directly. Hence, this batch of
messages is first reversed. Then, it is stashed into a separate list in
the stream, i.e. the backlog_log. The head of this list is the actual
message that should always be returned to the caller.
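
The ordering problem can be seen in a small user-space model of a
lockless LIFO list. The sketch below uses C11 atomics in place of the
kernel's llist primitives; order_push, order_del_all and order_reverse
are hypothetical names that only mimic what llist_add(),
llist_del_all() and a list reversal do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct order_node {
	struct order_node *next;
	int seq; /* order in which the message was written */
};

struct order_head {
	_Atomic(struct order_node *) first;
};

/* Lock-free push: the new node always becomes the head (LIFO). */
static void order_push(struct order_head *h, struct order_node *n)
{
	struct order_node *old = atomic_load(&h->first);

	assert(n);
	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak(&h->first, &old, n));
}

/* Detach the whole list in one atomic exchange; newest comes first. */
static struct order_node *order_del_all(struct order_head *h)
{
	return atomic_exchange(&h->first, (struct order_node *)NULL);
}

/* Reverse the detached batch so the oldest message comes first. */
static struct order_node *order_reverse(struct order_node *n)
{
	struct order_node *prev = NULL;

	while (n) {
		struct order_node *next = n->next;

		n->next = prev;
		prev = n;
		n = next;
	}
	return prev;
}
```

Pushing messages 1, 2, 3 and detaching them yields 3, 2, 1; only after
the reversal does the batch read in chronological order.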

For this purpose, we hold a lock around bpf_stream_backlog_pop(), as
llist_del_first() (if we maintained a second lockless list for the
backlog) wouldn't be safe from multiple threads anyway. Then, if we
fail to find something in the backlog log, we splice out everything from
the lockless log, and place it in the backlog log, and then return the
head of the backlog. Next time we pop a message, we should visit the
remaining elements in the backlog log first. We use rqspinlock for
protecting the backlog log, to ensure we can invoke bpf_stream_next_elem
in any context.
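
A simplified, single-threaded model of the pop path is sketched below.
It leaves out the locking (the patch takes an rqspinlock around the
backlog) and uses a plain pointer for the log instead of a lockless
list; the bl_* names are hypothetical and only the backlog head/tail
idea is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

struct bl_elem {
	struct bl_elem *next;
	int seq;
};

struct bl_stream {
	struct bl_elem *log;          /* newest first, filled by writers */
	struct bl_elem *backlog_head; /* oldest first, drained by readers */
	struct bl_elem *backlog_tail;
};

/* Writer side: plain LIFO push (the kernel uses llist_add() here). */
static void bl_write(struct bl_stream *s, struct bl_elem *e)
{
	e->next = s->log;
	s->log = e;
}

/* Move the whole log into the backlog in chronological order. */
static void bl_refill_backlog(struct bl_stream *s)
{
	struct bl_elem *n = s->log, *prev = NULL;

	assert(!s->backlog_head);
	s->log = NULL;
	s->backlog_tail = n; /* the newest element ends up last */
	while (n) {
		struct bl_elem *next = n->next;

		n->next = prev;
		prev = n;
		n = next;
	}
	s->backlog_head = prev;
}

/* Reader side: drain the backlog first, refill it only when empty. */
static struct bl_elem *bl_pop(struct bl_stream *s)
{
	struct bl_elem *e;

	if (!s->backlog_head)
		bl_refill_backlog(s);
	e = s->backlog_head;
	if (!e)
		return NULL;
	s->backlog_head = e->next;
	if (!s->backlog_head)
		s->backlog_tail = NULL;
	e->next = NULL;
	return e;
}
```

Messages written while the backlog is still being drained stay in the
log until the backlog is empty, so readers always see older messages
before newer ones.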

With the exception of bpf_prog_stream_get, these kfuncs are available to
all program types. bpf_prog_stream_get takes a spin_lock_bh, thus is
susceptible to deadlocks if invoked in random kernel contexts. Hence, it
is restricted to BPF_PROG_TYPE_SYSCALL. In the future, if the need
arises, we can use rqspinlock to make it callable in any context.

From the kernel side, writing into the stream is a bit more involved
than a typical printk. The kernel may typically print a group of
messages into the stream, and parallel writers may cause these messages
to interleave. To ensure each group of messages becomes visible
atomically, we can take advantage of using a lockless list for pushing
in messages.

To enable this, we add a bpf_stream_stage() macro, and require kernel
users to use bpf_stream_printk statements for the passed expression to
write into the stream. Underneath the macro, we have a message staging
API, where a bpf_stream_stage object on the stack accumulates the
messages being printed into a local llist_head, and then a commit
operation splices the whole batch into the stream's lockless log list.
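
The staging idea can be sketched in user space as follows: messages are
collected on a private list, and the commit publishes the whole batch
with a single compare-and-swap on the shared head. This is an
illustrative model using C11 atomics, not the kernel code; the stage_*
names and the explicit "last" pointer are assumptions made for the
sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct stage_node {
	struct stage_node *next;
	const char *msg;
};

/* Shared stream log: lock-free LIFO, newest message first. */
struct stage_log {
	_Atomic(struct stage_node *) first;
};

/* On-stack staging area: a private list and its oldest node. */
struct stage {
	struct stage_node *first; /* newest staged message */
	struct stage_node *last;  /* oldest staged message */
};

static void stage_init(struct stage *ss)
{
	ss->first = ss->last = NULL;
}

/* Stage a message privately; no other writer can observe it yet. */
static void stage_add(struct stage *ss, struct stage_node *n)
{
	n->next = ss->first;
	if (!ss->first)
		ss->last = n;
	ss->first = n;
}

/* Publish the whole batch with one compare-and-swap on the head. */
static void stage_commit(struct stage *ss, struct stage_log *log)
{
	struct stage_node *old;

	if (!ss->first)
		return;
	assert(ss->last);
	old = atomic_load(&log->first);
	do {
		ss->last->next = old; /* link batch in front of old head */
	} while (!atomic_compare_exchange_weak(&log->first, &old, ss->first));
	stage_init(ss);
}
```

Since other writers can only ever insert at the head, nothing can land
between two messages of a committed batch, which is what keeps a
multi-line report contiguous.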

This is especially pertinent for rqspinlock deadlock messages printed to
program streams. After this change, each deadlock report appears as a
single contiguous, non-interleaved message, which avoids confusing the
reader and makes the fault easier to debug.

While programs cannot benefit from this staged stream writing API, they
could just as well hold an rqspinlock around their print statements to
serialize messages, hence this is kept kernel-internal for now.

Overall, this infrastructure provides NMI-safe, any-context printing of
messages to two dedicated streams.

Later patches will add support for printing splats in case of BPF arena
page faults, rqspinlock deadlocks, and cond_break timeouts, and
integration of this facility into bpftool for dumping messages to user
space.

  [0]: https://lore.kernel.org/bpf/20250501032718.65476-1-alexei.starovoitov@gmail.com

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h   |  72 +++++-
 kernel/bpf/Makefile   |   2 +-
 kernel/bpf/core.c     |  12 +
 kernel/bpf/helpers.c  |  26 +--
 kernel/bpf/stream.c   | 499 ++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c  |   2 +-
 kernel/bpf/verifier.c |  15 +-
 7 files changed, 605 insertions(+), 23 deletions(-)
 create mode 100644 kernel/bpf/stream.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b0ea0b71df90..2c10ae62df2d 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1524,6 +1524,40 @@ struct btf_mod_pair {
 
 struct bpf_kfunc_desc_tab;
 
+enum bpf_stream_id {
+	BPF_STDOUT = 1,
+	BPF_STDERR = 2,
+};
+
+struct bpf_stream_elem {
+	struct llist_node node;
+	struct bpf_mem_slice mem_slice;
+	char str[];
+};
+
+struct bpf_stream_elem_batch {
+	struct llist_node *node;
+};
+
+enum {
+	BPF_STREAM_MAX_CAPACITY = (4 * 1024U * 1024U),
+};
+
+struct bpf_stream {
+	enum bpf_stream_id stream_id;
+	atomic_t capacity;
+	struct llist_head log;
+
+	rqspinlock_t lock;
+	struct llist_node *backlog_head;
+	struct llist_node *backlog_tail;
+};
+
+struct bpf_stream_stage {
+	struct llist_head log;
+	int len;
+};
+
 struct bpf_prog_aux {
 	atomic64_t refcnt;
 	u32 used_map_cnt;
@@ -1632,6 +1666,7 @@ struct bpf_prog_aux {
 		struct work_struct work;
 		struct rcu_head	rcu;
 	};
+	struct bpf_stream stream[2];
 };
 
 struct bpf_prog {
@@ -2391,6 +2426,8 @@ int  generic_map_delete_batch(struct bpf_map *map,
 struct bpf_map *bpf_map_get_curr_or_next(u32 *id);
 struct bpf_prog *bpf_prog_get_curr_or_next(u32 *id);
 
+
+struct page *__bpf_alloc_page(int nid);
 int bpf_map_alloc_pages(const struct bpf_map *map, int nid,
 			unsigned long nr_pages, struct page **page_array);
 #ifdef CONFIG_MEMCG
@@ -3529,6 +3566,16 @@ bool btf_id_set_contains(const struct btf_id_set *set, u32 id);
 #define MAX_BPRINTF_VARARGS		12
 #define MAX_BPRINTF_BUF			1024
 
+/* Per-cpu temp buffers used by printf-like helpers to store the bprintf binary
+ * arguments representation.
+ */
+#define MAX_BPRINTF_BIN_ARGS	512
+
+struct bpf_bprintf_buffers {
+	char bin_args[MAX_BPRINTF_BIN_ARGS];
+	char buf[MAX_BPRINTF_BUF];
+};
+
 struct bpf_bprintf_data {
 	u32 *bin_args;
 	char *buf;
@@ -3536,9 +3583,32 @@ struct bpf_bprintf_data {
 	bool get_buf;
 };
 
-int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
+int bpf_bprintf_prepare(const char *fmt, u32 fmt_size, const u64 *raw_args,
 			u32 num_args, struct bpf_bprintf_data *data);
 void bpf_bprintf_cleanup(struct bpf_bprintf_data *data);
+int bpf_try_get_buffers(struct bpf_bprintf_buffers **bufs);
+void bpf_put_buffers(void);
+
+void bpf_prog_stream_init(struct bpf_prog *prog);
+void bpf_prog_stream_free(struct bpf_prog *prog);
+
+void bpf_stream_stage_init(struct bpf_stream_stage *ss);
+void bpf_stream_stage_free(struct bpf_stream_stage *ss);
+__printf(2, 3)
+int bpf_stream_stage_printk(struct bpf_stream_stage *ss, const char *fmt, ...);
+int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
+			    enum bpf_stream_id stream_id);
+
+#define bpf_stream_printk(...) bpf_stream_stage_printk(&__ss, __VA_ARGS__)
+
+#define bpf_stream_stage(prog, stream_id, expr)                  \
+	({                                                       \
+		struct bpf_stream_stage __ss;                    \
+		bpf_stream_stage_init(&__ss);                    \
+		(expr);                                          \
+		bpf_stream_stage_commit(&__ss, prog, stream_id); \
+		bpf_stream_stage_free(&__ss);                    \
+	})
 
 #ifdef CONFIG_BPF_LSM
 void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 70502f038b92..a89575822b60 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -14,7 +14,7 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM}	  += bpf_inode_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
 obj-$(CONFIG_BPF_JIT) += trampoline.o
-obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o rqspinlock.o
+obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o rqspinlock.o stream.o
 ifeq ($(CONFIG_MMU)$(CONFIG_64BIT),yy)
 obj-$(CONFIG_BPF_SYSCALL) += arena.o range_tree.o
 endif
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index a3e571688421..22c278c008ce 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -134,6 +134,10 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
 	mutex_init(&fp->aux->ext_mutex);
 	mutex_init(&fp->aux->dst_mutex);
 
+#ifdef CONFIG_BPF_SYSCALL
+	bpf_prog_stream_init(fp);
+#endif
+
 	return fp;
 }
 
@@ -2861,6 +2865,7 @@ static void bpf_prog_free_deferred(struct work_struct *work)
 	aux = container_of(work, struct bpf_prog_aux, work);
 #ifdef CONFIG_BPF_SYSCALL
 	bpf_free_kfunc_btf_tab(aux->kfunc_btf_tab);
+	bpf_prog_stream_free(aux->prog);
 #endif
 #ifdef CONFIG_CGROUP_BPF
 	if (aux->cgroup_atype != CGROUP_BPF_ATTACH_TYPE_INVALID)
@@ -2877,6 +2882,13 @@ static void bpf_prog_free_deferred(struct work_struct *work)
 	if (aux->dst_trampoline)
 		bpf_trampoline_put(aux->dst_trampoline);
 	for (i = 0; i < aux->real_func_cnt; i++) {
+#ifdef CONFIG_BPF_SYSCALL
+		/* Ensure we don't push to subprog lists. */
+		if (bpf_is_subprog(aux->func[i])) {
+			WARN_ON_ONCE(!llist_empty(&aux->func[i]->aux->stream[0].log));
+			WARN_ON_ONCE(!llist_empty(&aux->func[i]->aux->stream[1].log));
+		}
+#endif
 		/* We can just unlink the subprog poke descriptor table as
 		 * it was originally linked to the main program and is also
 		 * released along with it.
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 89ab3481378d..98806368121e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -761,22 +761,13 @@ static int bpf_trace_copy_string(char *buf, void *unsafe_ptr, char fmt_ptype,
 	return -EINVAL;
 }
 
-/* Per-cpu temp buffers used by printf-like helpers to store the bprintf binary
- * arguments representation.
- */
-#define MAX_BPRINTF_BIN_ARGS	512
-
 /* Support executing three nested bprintf helper calls on a given CPU */
 #define MAX_BPRINTF_NEST_LEVEL	3
-struct bpf_bprintf_buffers {
-	char bin_args[MAX_BPRINTF_BIN_ARGS];
-	char buf[MAX_BPRINTF_BUF];
-};
 
 static DEFINE_PER_CPU(struct bpf_bprintf_buffers[MAX_BPRINTF_NEST_LEVEL], bpf_bprintf_bufs);
 static DEFINE_PER_CPU(int, bpf_bprintf_nest_level);
 
-static int try_get_buffers(struct bpf_bprintf_buffers **bufs)
+int bpf_try_get_buffers(struct bpf_bprintf_buffers **bufs)
 {
 	int nest_level;
 
@@ -792,16 +783,21 @@ static int try_get_buffers(struct bpf_bprintf_buffers **bufs)
 	return 0;
 }
 
-void bpf_bprintf_cleanup(struct bpf_bprintf_data *data)
+void bpf_put_buffers(void)
 {
-	if (!data->bin_args && !data->buf)
-		return;
 	if (WARN_ON_ONCE(this_cpu_read(bpf_bprintf_nest_level) == 0))
 		return;
 	this_cpu_dec(bpf_bprintf_nest_level);
 	preempt_enable();
 }
 
+void bpf_bprintf_cleanup(struct bpf_bprintf_data *data)
+{
+	if (!data->bin_args && !data->buf)
+		return;
+	bpf_put_buffers();
+}
+
 /*
  * bpf_bprintf_prepare - Generic pass on format strings for bprintf-like helpers
  *
@@ -816,7 +812,7 @@ void bpf_bprintf_cleanup(struct bpf_bprintf_data *data)
  * In argument preparation mode, if 0 is returned, safe temporary buffers are
  * allocated and bpf_bprintf_cleanup should be called to free them after use.
  */
-int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
+int bpf_bprintf_prepare(const char *fmt, u32 fmt_size, const u64 *raw_args,
 			u32 num_args, struct bpf_bprintf_data *data)
 {
 	bool get_buffers = (data->get_bin_args && num_args) || data->get_buf;
@@ -832,7 +828,7 @@ int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
 		return -EINVAL;
 	fmt_size = fmt_end - fmt;
 
-	if (get_buffers && try_get_buffers(&buffers))
+	if (get_buffers && bpf_try_get_buffers(&buffers))
 		return -EBUSY;
 
 	if (data->get_bin_args) {
diff --git a/kernel/bpf/stream.c b/kernel/bpf/stream.c
new file mode 100644
index 000000000000..a9151a8575ec
--- /dev/null
+++ b/kernel/bpf/stream.c
@@ -0,0 +1,499 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include <linux/bpf.h>
+#include <linux/bpf_mem_alloc.h>
+#include <linux/percpu.h>
+#include <linux/refcount.h>
+#include <linux/gfp.h>
+#include <linux/memory.h>
+#include <linux/local_lock.h>
+#include <asm/rqspinlock.h>
+
+/*
+ * Simple per-CPU NMI-safe bump allocation mechanism, backed by the NMI-safe
+ * try_alloc_pages()/free_pages_nolock() primitives. We allocate a page and
+ * stash it in a local per-CPU variable, and bump allocate from the page
+ * whenever items need to be printed to a stream. Each page holds a global
+ * atomic refcount in its first 4 bytes, and then records of variable length
+ * that describe the printed messages. Once the global refcount has dropped to
+ * zero, it is a signal to free the page back to the kernel's page allocator,
+ * given all the individual records in it have been consumed.
+ *
+ * It is possible the same page is used to serve allocations across different
+ * programs, which may be consumed at different times individually, hence
+ * maintaining a reference count per-page is critical for correct lifetime
+ * tracking.
+ *
+ * The bpf_stream_page code will be replaced to use kmalloc_nolock() once it
+ * lands.
+ */
+struct bpf_stream_page {
+	refcount_t ref;
+	u32 consumed;
+	char buf[];
+};
+
+/* Available room to add data to a refcounted page. */
+#define BPF_STREAM_PAGE_SZ (PAGE_SIZE - offsetofend(struct bpf_stream_page, consumed))
+
+static DEFINE_PER_CPU(local_trylock_t, stream_local_lock) = INIT_LOCAL_TRYLOCK(stream_local_lock);
+static DEFINE_PER_CPU(struct bpf_stream_page *, stream_pcpu_page);
+
+static bool bpf_stream_page_local_lock(unsigned long *flags)
+{
+	return local_trylock_irqsave(&stream_local_lock, *flags);
+}
+
+static void bpf_stream_page_local_unlock(unsigned long *flags)
+{
+	local_unlock_irqrestore(&stream_local_lock, *flags);
+}
+
+static void bpf_stream_page_free(struct bpf_stream_page *stream_page)
+{
+	struct page *p;
+
+	if (!stream_page)
+		return;
+	p = virt_to_page(stream_page);
+	free_pages_nolock(p, 0);
+}
+
+static void bpf_stream_page_get(struct bpf_stream_page *stream_page)
+{
+	refcount_inc(&stream_page->ref);
+}
+
+static void bpf_stream_page_put(struct bpf_stream_page *stream_page)
+{
+	if (refcount_dec_and_test(&stream_page->ref))
+		bpf_stream_page_free(stream_page);
+}
+
+static void bpf_stream_page_init(struct bpf_stream_page *stream_page)
+{
+	refcount_set(&stream_page->ref, 1);
+	stream_page->consumed = 0;
+}
+
+static struct bpf_stream_page *bpf_stream_page_replace(void)
+{
+	struct bpf_stream_page *stream_page, *old_stream_page;
+	struct page *page;
+
+	page = __bpf_alloc_page(NUMA_NO_NODE);
+	if (!page)
+		return NULL;
+	stream_page = page_address(page);
+	bpf_stream_page_init(stream_page);
+
+	old_stream_page = this_cpu_read(stream_pcpu_page);
+	if (old_stream_page)
+		bpf_stream_page_put(old_stream_page);
+	this_cpu_write(stream_pcpu_page, stream_page);
+	return stream_page;
+}
+
+static int bpf_stream_page_check_room(struct bpf_stream_page *stream_page, int len)
+{
+	int min = offsetof(struct bpf_stream_elem, str[0]);
+	int consumed = stream_page->consumed;
+	int total = BPF_STREAM_PAGE_SZ;
+	int rem = max(0, total - consumed - min);
+
+	/* Let's give room of at least 8 bytes. */
+	WARN_ON_ONCE(rem % 8 != 0);
+	rem = rem < 8 ? 0 : rem;
+	return min(len, rem);
+}
+
+static void bpf_stream_elem_init(struct bpf_stream_elem *elem, int len)
+{
+	init_llist_node(&elem->node);
+	elem->mem_slice.ptr = elem->str;
+	elem->mem_slice.len = len;
+}
+
+static struct bpf_stream_page *bpf_stream_page_from_elem(struct bpf_stream_elem *elem)
+{
+	unsigned long addr = (unsigned long)elem;
+
+	return (struct bpf_stream_page *)PAGE_ALIGN_DOWN(addr);
+}
+
+static struct bpf_stream_elem *bpf_stream_page_push_elem(struct bpf_stream_page *stream_page, int len)
+{
+	u32 consumed = stream_page->consumed;
+
+	stream_page->consumed += round_up(offsetof(struct bpf_stream_elem, str[len]), 8);
+	return (struct bpf_stream_elem *)&stream_page->buf[consumed];
+}
+
+static noinline struct bpf_stream_elem *bpf_stream_page_reserve_elem(int len)
+{
+	struct bpf_stream_elem *elem = NULL;
+	struct bpf_stream_page *page;
+	int room = 0;
+
+	page = this_cpu_read(stream_pcpu_page);
+	if (!page)
+		page = bpf_stream_page_replace();
+	if (!page)
+		return NULL;
+
+	room = bpf_stream_page_check_room(page, len);
+	if (room != len)
+		page = bpf_stream_page_replace();
+	if (!page)
+		return NULL;
+	bpf_stream_page_get(page);
+	room = bpf_stream_page_check_room(page, len);
+	WARN_ON_ONCE(room != len);
+
+	elem = bpf_stream_page_push_elem(page, room);
+	bpf_stream_elem_init(elem, room);
+	return elem;
+}
+
+static struct bpf_stream_elem *bpf_stream_elem_alloc(int len)
+{
+	const int max_len = ARRAY_SIZE((struct bpf_bprintf_buffers){}.buf);
+	struct bpf_stream_elem *elem;
+	unsigned long flags;
+
+	/*
+	 * We may overflow, but we should never need more than one page size
+	 * worth of memory. This can be lifted, but we'd need to adjust the
+	 * other code to keep allocating more pages to overflow messages.
+	 */
+	BUILD_BUG_ON(max_len > BPF_STREAM_PAGE_SZ);
+	/*
+	 * Length denotes the amount of data to be written as part of stream element,
+	 * thus includes '\0' byte. We're capped by how much bpf_bprintf_buffers can
+	 * accommodate, therefore deny allocations that won't fit into them.
+	 */
+	if (len < 0 || len > max_len)
+		return NULL;
+
+	if (!bpf_stream_page_local_lock(&flags))
+		return NULL;
+	elem = bpf_stream_page_reserve_elem(len);
+	bpf_stream_page_local_unlock(&flags);
+	return elem;
+}
+
+__bpf_kfunc_start_defs();
+
+static int __bpf_stream_push_str(struct llist_head *log, const char *str, int len)
+{
+	struct bpf_stream_elem *elem = NULL;
+
+	/*
+	 * Allocate a bpf_prog_stream_elem and push it to the bpf_prog_stream
+	 * log, elements will be popped at once and reversed to print the log.
+	 */
+	elem = bpf_stream_elem_alloc(len);
+	if (!elem)
+		return -ENOMEM;
+
+	memcpy(elem->str, str, len);
+	llist_add(&elem->node, log);
+
+	return 0;
+}
+
+static int bpf_stream_consume_capacity(struct bpf_stream *stream, int len)
+{
+	if (atomic_read(&stream->capacity) >= BPF_STREAM_MAX_CAPACITY)
+		return -ENOSPC;
+	if (atomic_add_return(len, &stream->capacity) >= BPF_STREAM_MAX_CAPACITY) {
+		atomic_sub(len, &stream->capacity);
+		return -ENOSPC;
+	}
+	return 0;
+}
+
+static void bpf_stream_release_capacity(struct bpf_stream *stream, struct bpf_stream_elem *elem)
+{
+	int len = elem->mem_slice.len;
+
+	atomic_sub(len, &stream->capacity);
+}
+
+static int bpf_stream_push_str(struct bpf_stream *stream, const char *str, int len)
+{
+	int ret = bpf_stream_consume_capacity(stream, len);
+
+	return ret ?: __bpf_stream_push_str(&stream->log, str, len);
+}
+
+__bpf_kfunc int bpf_stream_vprintk(struct bpf_stream *stream, const char *fmt__str, const void *args, u32 len__sz)
+{
+	struct bpf_bprintf_data data = {
+		.get_bin_args	= true,
+		.get_buf	= true,
+	};
+	u32 fmt_size = strlen(fmt__str) + 1;
+	u32 data_len = len__sz;
+	int ret, num_args;
+
+	if (data_len & 7 || data_len > MAX_BPRINTF_VARARGS * 8 ||
+	    (data_len && !args))
+		return -EINVAL;
+	num_args = data_len / 8;
+
+	ret = bpf_bprintf_prepare(fmt__str, fmt_size, args, num_args, &data);
+	if (ret < 0)
+		return ret;
+
+	ret = bstr_printf(data.buf, MAX_BPRINTF_BUF, fmt__str, data.bin_args);
+	/* If the string was truncated, we only wrote until the size of buffer. */
+	ret = min_t(u32, ret + 1, MAX_BPRINTF_BUF);
+	ret = bpf_stream_push_str(stream, data.buf, ret);
+	bpf_bprintf_cleanup(&data);
+
+	return ret;
+}
+
+__bpf_kfunc struct bpf_stream *bpf_stream_get(enum bpf_stream_id stream_id, void *aux__ign)
+{
+	struct bpf_prog_aux *aux = aux__ign;
+
+	if (stream_id != BPF_STDOUT && stream_id != BPF_STDERR)
+		return NULL;
+	return &aux->stream[stream_id - 1];
+}
+
+__bpf_kfunc void bpf_stream_free_elem(struct bpf_stream_elem *elem)
+{
+	struct bpf_stream_page *p;
+
+	p = bpf_stream_page_from_elem(elem);
+	bpf_stream_page_put(p);
+}
+
+static void bpf_stream_free_list(struct llist_node *list)
+{
+	struct bpf_stream_elem *elem, *tmp;
+
+	llist_for_each_entry_safe(elem, tmp, list, node)
+		bpf_stream_free_elem(elem);
+}
+
+static struct llist_node *bpf_stream_backlog_pop(struct bpf_stream *stream)
+{
+	struct llist_node *node;
+
+	node = stream->backlog_head;
+	if (stream->backlog_head == stream->backlog_tail)
+		stream->backlog_head = stream->backlog_tail = NULL;
+	else
+		stream->backlog_head = node->next;
+	return node;
+}
+
+static struct llist_node *bpf_stream_log_pop(struct bpf_stream *stream)
+{
+	struct llist_node *node, *head, *tail;
+	unsigned long flags;
+
+	if (llist_empty(&stream->log))
+		return NULL;
+	tail = llist_del_all(&stream->log);
+	if (!tail)
+		return NULL;
+	head = llist_reverse_order(tail);
+
+	if (raw_res_spin_lock_irqsave(&stream->lock, flags)) {
+		bpf_stream_free_list(head);
+		return NULL;
+	}
+
+	if (!stream->backlog_head) {
+		stream->backlog_head = head;
+		stream->backlog_tail = tail;
+	} else {
+		stream->backlog_tail->next = head;
+		stream->backlog_tail = tail;
+	}
+
+	node = bpf_stream_backlog_pop(stream);
+	raw_res_spin_unlock_irqrestore(&stream->lock, flags);
+
+	return node;
+}
+
+__bpf_kfunc struct bpf_stream_elem *bpf_stream_next_elem(struct bpf_stream *stream)
+{
+	struct bpf_stream_elem *elem = NULL;
+	struct llist_node *node;
+	unsigned long flags;
+
+	if (raw_res_spin_lock_irqsave(&stream->lock, flags))
+		return NULL;
+	node = bpf_stream_backlog_pop(stream);
+	if (!node)
+		goto unlock;
+unlock:
+	raw_res_spin_unlock_irqrestore(&stream->lock, flags);
+
+	if (node)
+		goto end;
+
+	node = bpf_stream_log_pop(stream);
+	if (!node)
+		return NULL;
+end:
+	elem = container_of(node, typeof(*elem), node);
+	bpf_stream_release_capacity(stream, elem);
+	return elem;
+}
+
+__bpf_kfunc struct bpf_stream *bpf_prog_stream_get(enum bpf_stream_id stream_id, u32 prog_id)
+{
+	struct bpf_stream *stream;
+	struct bpf_prog *prog;
+
+	prog = bpf_prog_by_id(prog_id);
+	if (IS_ERR_OR_NULL(prog))
+		return NULL;
+	stream = bpf_stream_get(stream_id, prog->aux);
+	if (!stream)
+		bpf_prog_put(prog);
+	return stream;
+}
+
+__bpf_kfunc void bpf_prog_stream_put(struct bpf_stream *stream)
+{
+	enum bpf_stream_id stream_id = stream->stream_id;
+	struct bpf_prog *prog;
+
+	prog = container_of(stream, struct bpf_prog_aux, stream[stream_id - 1])->prog;
+	bpf_prog_put(prog);
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(stream_kfunc_set)
+BTF_ID_FLAGS(func, bpf_stream_get, KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_stream_vprintk, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_stream_next_elem, KF_ACQUIRE | KF_RET_NULL | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_stream_free_elem, KF_RELEASE)
+BTF_KFUNCS_END(stream_kfunc_set)
+
+BTF_KFUNCS_START(stream_syscall_kfunc_set)
+BTF_ID_FLAGS(func, bpf_prog_stream_get, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_prog_stream_put, KF_RELEASE)
+BTF_KFUNCS_END(stream_syscall_kfunc_set)
+
+static const struct btf_kfunc_id_set bpf_stream_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set = &stream_kfunc_set,
+};
+
+static const struct btf_kfunc_id_set bpf_stream_syscall_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set = &stream_syscall_kfunc_set,
+};
+
+static int __init bpf_stream_kfunc_init(void)
+{
+	int ret;
+
+	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &bpf_stream_kfunc_set);
+	if (ret)
+		return ret;
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_stream_syscall_kfunc_set);
+}
+late_initcall(bpf_stream_kfunc_init);
+
+void bpf_prog_stream_init(struct bpf_prog *prog)
+{
+	int i;
+
+	prog->aux->stream[0].stream_id = BPF_STDOUT;
+	prog->aux->stream[1].stream_id = BPF_STDERR;
+
+	for (i = 0; i < ARRAY_SIZE(prog->aux->stream); i++) {
+		atomic_set(&prog->aux->stream[i].capacity, 0);
+		init_llist_head(&prog->aux->stream[i].log);
+		raw_res_spin_lock_init(&prog->aux->stream[i].lock);
+		prog->aux->stream[i].backlog_head = NULL;
+		prog->aux->stream[i].backlog_tail = NULL;
+	}
+}
+
+void bpf_prog_stream_free(struct bpf_prog *prog)
+{
+	struct llist_node *list;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(prog->aux->stream); i++) {
+		list = llist_del_all(&prog->aux->stream[i].log);
+		bpf_stream_free_list(list);
+		bpf_stream_free_list(prog->aux->stream[i].backlog_head);
+	}
+}
+
+void bpf_stream_stage_init(struct bpf_stream_stage *ss)
+{
+	init_llist_head(&ss->log);
+	ss->len = 0;
+}
+
+void bpf_stream_stage_free(struct bpf_stream_stage *ss)
+{
+	struct llist_node *node;
+
+	node = llist_del_all(&ss->log);
+	bpf_stream_free_list(node);
+}
+
+int bpf_stream_stage_printk(struct bpf_stream_stage *ss, const char *fmt, ...)
+{
+	struct bpf_bprintf_buffers *buf;
+	va_list args;
+	int ret;
+
+	if (bpf_try_get_buffers(&buf))
+		return -EBUSY;
+
+	va_start(args, fmt);
+	ret = vsnprintf(buf->buf, ARRAY_SIZE(buf->buf), fmt, args);
+	va_end(args);
+	/* If the string was truncated, we only wrote until the size of buffer. */
+	ret = min_t(u32, ret + 1, ARRAY_SIZE(buf->buf));
+	ss->len += ret;
+	ret = __bpf_stream_push_str(&ss->log, buf->buf, ret);
+	bpf_put_buffers();
+	return ret;
+}
+
+int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
+			    enum bpf_stream_id stream_id)
+{
+	struct llist_node *list, *head, *tail;
+	struct bpf_stream *stream;
+	int ret;
+
+	stream = bpf_stream_get(stream_id, prog->aux);
+	if (!stream)
+		return -EINVAL;
+
+	ret = bpf_stream_consume_capacity(stream, ss->len);
+	if (ret)
+		return ret;
+
+	list = llist_del_all(&ss->log);
+	head = tail = list;
+
+	if (!list)
+		return 0;
+	while (llist_next(list)) {
+		tail = llist_next(list);
+		list = tail;
+	}
+	llist_add_batch(head, tail, &stream->log);
+	return 0;
+}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index df33d19c5c3b..60778be870e3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -576,7 +576,7 @@ static bool can_alloc_pages(void)
 		!IS_ENABLED(CONFIG_PREEMPT_RT);
 }
 
-static struct page *__bpf_alloc_page(int nid)
+struct page *__bpf_alloc_page(int nid)
 {
 	if (!can_alloc_pages())
 		return try_alloc_pages(nid, 0);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ff34e68c9237..aba0b38733bc 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12117,6 +12117,7 @@ enum special_kfunc_type {
 	KF_bpf_res_spin_lock_irqsave,
 	KF_bpf_res_spin_unlock_irqrestore,
 	KF_bpf_dynptr_from_mem_slice,
+	KF_bpf_stream_get,
 };
 
 BTF_SET_START(special_kfunc_set)
@@ -12221,6 +12222,7 @@ BTF_ID(func, bpf_res_spin_unlock)
 BTF_ID(func, bpf_res_spin_lock_irqsave)
 BTF_ID(func, bpf_res_spin_unlock_irqrestore)
 BTF_ID(func, bpf_dynptr_from_mem_slice)
+BTF_ID(func, bpf_stream_get)
 
 static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
 {
@@ -13886,10 +13888,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			regs[BPF_REG_0].type = PTR_TO_BTF_ID;
 			regs[BPF_REG_0].btf_id = ptr_type_id;
 
-			if (meta.func_id == special_kfunc_list[KF_bpf_get_kmem_cache])
+			if (meta.func_id == special_kfunc_list[KF_bpf_get_kmem_cache]) {
 				regs[BPF_REG_0].type |= PTR_UNTRUSTED;
-
-			if (is_iter_next_kfunc(&meta)) {
+			} else if (meta.func_id == special_kfunc_list[KF_bpf_stream_get]) {
+				regs[BPF_REG_0].type |= PTR_TRUSTED;
+			} else if (is_iter_next_kfunc(&meta)) {
 				struct bpf_reg_state *cur_iter;
 
 				cur_iter = get_iter_from_state(env->cur_state, &meta);
@@ -21521,8 +21524,10 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		   desc->func_id == special_kfunc_list[KF_bpf_rdonly_cast]) {
 		insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_1);
 		*cnt = 1;
-	} else if (is_bpf_wq_set_callback_impl_kfunc(desc->func_id)) {
-		struct bpf_insn ld_addrs[2] = { BPF_LD_IMM64(BPF_REG_4, (long)env->prog->aux) };
+	} else if (is_bpf_wq_set_callback_impl_kfunc(desc->func_id) ||
+		   desc->func_id == special_kfunc_list[KF_bpf_stream_get]) {
+		u32 regno = is_bpf_wq_set_callback_impl_kfunc(desc->func_id) ? BPF_REG_4 : BPF_REG_2;
+		struct bpf_insn ld_addrs[2] = { BPF_LD_IMM64(regno, (long)env->prog->aux) };
 
 		insn_buf[0] = ld_addrs[0];
 		insn_buf[1] = ld_addrs[1];
-- 
2.47.1



* [PATCH bpf-next v1 03/11] bpf: Add function to extract program source info
  2025-05-07 17:17 [PATCH bpf-next v1 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
  2025-05-07 17:17 ` [PATCH bpf-next v1 01/11] bpf: Introduce bpf_dynptr_from_mem_slice Kumar Kartikeya Dwivedi
  2025-05-07 17:17 ` [PATCH bpf-next v1 02/11] bpf: Introduce BPF standard streams Kumar Kartikeya Dwivedi
@ 2025-05-07 17:17 ` Kumar Kartikeya Dwivedi
  2025-05-08 10:30   ` kernel test robot
                     ` (2 more replies)
  2025-05-07 17:17 ` [PATCH bpf-next v1 04/11] bpf: Add function to find program from stack trace Kumar Kartikeya Dwivedi
                   ` (7 subsequent siblings)
  10 siblings, 3 replies; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-07 17:17 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, kernel-team

Prepare a function for use in future patches that can extract the file
name, the source line text, and the source line number for a given BPF
program, given its program counter.

Only the basename of the file path is provided, given it can be
excessively long in some cases.

This will be used in later patches to print source info to the BPF
stream. The source line number is indicated by the return value, and the
file and line info are provided through out parameters.
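
As a rough user-space sketch of the lookup described above (not the
kernel code itself): given records sorted by JITed address, pick the
last record whose address lies strictly below the program counter. The
names struct line_rec and find_line_idx() are hypothetical and exist
only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one line-info record and its JITed address. */
struct line_rec {
	uintptr_t jited_addr;	/* start of native code for this source line */
	int line_num;		/* source line number */
};

/*
 * Return the index of the last record whose address is strictly below ip,
 * or -1 if no record qualifies. Records must be sorted by ascending address.
 */
static int find_line_idx(const struct line_rec *recs, size_t n, uintptr_t ip)
{
	int idx = -1;

	assert(recs || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (recs[i].jited_addr >= ip)
			break;
		idx = (int)i;
	}
	return idx;
}
```

A caller would treat -1 as "no source info" and otherwise read the file
name, line text, and line number from the record at the returned index.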

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h |  2 ++
 kernel/bpf/core.c   | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 2c10ae62df2d..f12a0bf536c0 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3644,4 +3644,6 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
 	return prog->aux->func_idx != 0;
 }
 
+int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep, const char **linep);
+
 #endif /* _LINUX_BPF_H */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 22c278c008ce..df1bae084abd 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -3204,3 +3204,43 @@ EXPORT_SYMBOL(bpf_stats_enabled_key);
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_exception);
 EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_bulk_tx);
+
+int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep, const char **linep)
+{
+	int idx = -1, insn_start, insn_end, len;
+	struct bpf_line_info *linfo;
+	void **jited_linfo;
+	struct btf *btf;
+
+	btf = prog->aux->btf;
+	linfo = prog->aux->linfo;
+	jited_linfo = prog->aux->jited_linfo;
+
+	if (!btf || !linfo || !prog->aux->jited_linfo)
+		return -EINVAL;
+	len = prog->aux->func ? prog->aux->func[prog->aux->func_idx]->len : prog->len;
+
+	linfo = &prog->aux->linfo[prog->aux->linfo_idx];
+	jited_linfo = &prog->aux->jited_linfo[prog->aux->linfo_idx];
+
+	insn_start = linfo[0].insn_off;
+	insn_end = insn_start + len;
+
+	for (int i = 0; linfo[i].insn_off >= insn_start && linfo[i].insn_off < insn_end; i++) {
+		if (jited_linfo[i] >= (void *)ip)
+			break;
+		idx = i;
+	}
+
+	if (idx == -1)
+		return -ENOENT;
+
+	/* Get base component of the file path. */
+	*filep = btf_name_by_offset(btf, linfo[idx].file_name_off);
+	*filep = kbasename(*filep);
+	/* Obtain the source line, and strip whitespace in prefix. */
+	*linep = btf_name_by_offset(btf, linfo[idx].line_off);
+	while (isspace(**linep))
+		*linep += 1;
+	return BPF_LINE_INFO_LINE_NUM(linfo[idx].line_col);
+}
-- 
2.47.1



* [PATCH bpf-next v1 04/11] bpf: Add function to find program from stack trace
  2025-05-07 17:17 [PATCH bpf-next v1 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
                   ` (2 preceding siblings ...)
  2025-05-07 17:17 ` [PATCH bpf-next v1 03/11] bpf: Add function to extract program source info Kumar Kartikeya Dwivedi
@ 2025-05-07 17:17 ` Kumar Kartikeya Dwivedi
  2025-05-08 23:07   ` Eduard Zingerman
  2025-05-07 17:17 ` [PATCH bpf-next v1 05/11] bpf: Add dump_stack() analogue to print to BPF stderr Kumar Kartikeya Dwivedi
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-07 17:17 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, kernel-team

In preparation for figuring out the closest program that led to the
current point in the kernel, implement a function that walks the stack
trace and returns the closest BPF program it finds along the way.

Special care needs to be taken to skip over kernel and BPF subprog
frames. We basically scan until we find a BPF main prog frame. The
assumption is that if a program calls into us transitively, we'll
hit it along the way. If not, we end up returning NULL.

Contextually the function will be used in places where we know the
program may have called into us.

Due to reliance on arch_bpf_stack_walk(), this function only works on
x86 with CONFIG_UNWINDER_ORC, arm64, and s390. Remove the warning from
arch_bpf_stack_walk as well since we call it outside bpf_throw()
context.
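
The scan described above can be sketched in plain C as a callback-driven
walk that skips kernel and subprog frames and stops at the first main
prog frame. This is only an illustration under simplifying assumptions:
struct frame, enum frame_kind, and walk_stack() are hypothetical
stand-ins, whereas the kernel classifies frames from the instruction
pointer and relies on arch_bpf_stack_walk().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame description; the kernel derives this from the ip. */
enum frame_kind { FRAME_KERNEL, FRAME_BPF_SUBPROG, FRAME_BPF_MAIN };

struct frame {
	enum frame_kind kind;
	int prog_id;		/* meaningful only for BPF frames */
};

struct walk_ctx {
	int prog_id;		/* 0 means "not found" */
};

/* Return true to keep walking, false to stop. */
static bool find_cb(void *cookie, const struct frame *f)
{
	struct walk_ctx *ctx = cookie;

	if (f->kind != FRAME_BPF_MAIN)
		return true;
	ctx->prog_id = f->prog_id;
	return false;
}

/* Toy walker: visit frames innermost first until the callback stops. */
static void walk_stack(const struct frame *frames, size_t n,
		       bool (*cb)(void *cookie, const struct frame *f),
		       void *cookie)
{
	assert(cb);
	for (size_t i = 0; i < n; i++)
		if (!cb(cookie, &frames[i]))
			break;
}

static int find_prog_from_stack(const struct frame *frames, size_t n)
{
	struct walk_ctx ctx = { 0 };

	walk_stack(frames, n, find_cb, &ctx);
	return ctx.prog_id;
}
```

If no main prog frame is on the stack, the context stays zeroed and the
caller sees "not found", which mirrors the NULL return mentioned above.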

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 arch/x86/net/bpf_jit_comp.c |  1 -
 include/linux/bpf.h         |  1 +
 kernel/bpf/core.c           | 26 ++++++++++++++++++++++++++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 9e5fe2ba858f..17693ee6bb1a 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -3791,7 +3791,6 @@ void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp
 	}
 	return;
 #endif
-	WARN(1, "verification of programs using bpf_throw should have failed\n");
 }
 
 void bpf_arch_poke_desc_update(struct bpf_jit_poke_descriptor *poke,
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f12a0bf536c0..b57d8a1a7758 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3645,5 +3645,6 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
 }
 
 int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep, const char **linep);
+struct bpf_prog *bpf_prog_find_from_stack(void);
 
 #endif /* _LINUX_BPF_H */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index df1bae084abd..dcb665bff22f 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -3244,3 +3244,29 @@ int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char *
 		*linep += 1;
 	return BPF_LINE_INFO_LINE_NUM(linfo[idx].line_col);
 }
+
+struct walk_stack_ctx {
+	struct bpf_prog *prog;
+};
+
+static bool find_from_stack_cb(void *cookie, u64 ip, u64 sp, u64 bp)
+{
+	struct walk_stack_ctx *ctxp = cookie;
+	struct bpf_prog *prog;
+
+	if (!is_bpf_text_address(ip))
+		return true;
+	prog = bpf_prog_ksym_find(ip);
+	if (bpf_is_subprog(prog))
+		return true;
+	ctxp->prog = prog;
+	return false;
+}
+
+struct bpf_prog *bpf_prog_find_from_stack(void)
+{
+	struct walk_stack_ctx ctx = {};
+
+	arch_bpf_stack_walk(find_from_stack_cb, &ctx);
+	return ctx.prog;
+}
-- 
2.47.1



* [PATCH bpf-next v1 05/11] bpf: Add dump_stack() analogue to print to BPF stderr
  2025-05-07 17:17 [PATCH bpf-next v1 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
                   ` (3 preceding siblings ...)
  2025-05-07 17:17 ` [PATCH bpf-next v1 04/11] bpf: Add function to find program from stack trace Kumar Kartikeya Dwivedi
@ 2025-05-07 17:17 ` Kumar Kartikeya Dwivedi
  2025-05-08 22:38   ` Eduard Zingerman
  2025-05-07 17:17 ` [PATCH bpf-next v1 06/11] bpf: Report may_goto timeout " Kumar Kartikeya Dwivedi
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-07 17:17 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, kernel-team

Introduce a kernel function which is the analogue of dump_stack()
printing some useful information and the stack trace. This is not
exposed to BPF programs yet, but can be made available in the future.

When the stack trace contains a program counter that belongs to a BPF
program, also output the source line, filename, and line number to make
the trace more helpful. The rest of the trace can be passed into
./decode_stacktrace.sh to obtain the line numbers for kernel symbols.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h |  2 ++
 kernel/bpf/stream.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b57d8a1a7758..46ce05aad0ed 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3598,8 +3598,10 @@ __printf(2, 3)
 int bpf_stream_stage_printk(struct bpf_stream_stage *ss, const char *fmt, ...);
 int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
 			    enum bpf_stream_id stream_id);
+int bpf_stream_stage_dump_stack(struct bpf_stream_stage *ss);
 
 #define bpf_stream_printk(...) bpf_stream_stage_printk(&__ss, __VA_ARGS__)
+#define bpf_stream_dump_stack() bpf_stream_stage_dump_stack(&__ss)
 
 #define bpf_stream_stage(prog, stream_id, expr)                  \
 	({                                                       \
diff --git a/kernel/bpf/stream.c b/kernel/bpf/stream.c
index a9151a8575ec..a921fb1de319 100644
--- a/kernel/bpf/stream.c
+++ b/kernel/bpf/stream.c
@@ -2,6 +2,7 @@
 /* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
 
 #include <linux/bpf.h>
+#include <linux/filter.h>
 #include <linux/bpf_mem_alloc.h>
 #include <linux/percpu.h>
 #include <linux/refcount.h>
@@ -497,3 +498,44 @@ int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
 	llist_add_batch(head, tail, &stream->log);
 	return 0;
 }
+
+struct dump_stack_ctx {
+	struct bpf_stream_stage *ss;
+	int err;
+};
+
+static bool dump_stack_cb(void *cookie, u64 ip, u64 sp, u64 bp)
+{
+	struct dump_stack_ctx *ctxp = cookie;
+	const char *file = "", *line = "";
+	struct bpf_prog *prog;
+	int num;
+
+	if (is_bpf_text_address(ip)) {
+		prog = bpf_prog_ksym_find(ip);
+		num = bpf_prog_get_file_line(prog, ip, &file, &line);
+		if (num < 0)
+			goto end;
+		ctxp->err = bpf_stream_stage_printk(ctxp->ss, "%pS\n  %s @ %s:%d\n",
+						    (void *)ip, line, file, num);
+		return !ctxp->err;
+	}
+end:
+	ctxp->err = bpf_stream_stage_printk(ctxp->ss, "%pS\n", (void *)ip);
+	return !ctxp->err;
+}
+
+int bpf_stream_stage_dump_stack(struct bpf_stream_stage *ss)
+{
+	struct dump_stack_ctx ctx = { .ss = ss };
+	int ret;
+
+	ret = bpf_stream_stage_printk(ss, "CPU: %d UID: %d PID: %d Comm: %s\n",
+				      raw_smp_processor_id(), __kuid_val(current_real_cred()->euid),
+				      current->pid, current->comm);
+	ret = ret ?: bpf_stream_stage_printk(ss, "Call trace:\n");
+	if (!ret)
+		arch_bpf_stack_walk(dump_stack_cb, &ctx);
+	ret = ret ?: ctx.err;
+	return ret ?: bpf_stream_stage_printk(ss, "\n");
+}
-- 
2.47.1



* [PATCH bpf-next v1 06/11] bpf: Report may_goto timeout to BPF stderr
  2025-05-07 17:17 [PATCH bpf-next v1 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
                   ` (4 preceding siblings ...)
  2025-05-07 17:17 ` [PATCH bpf-next v1 05/11] bpf: Add dump_stack() analogue to print to BPF stderr Kumar Kartikeya Dwivedi
@ 2025-05-07 17:17 ` Kumar Kartikeya Dwivedi
  2025-05-08 12:53   ` kernel test robot
                     ` (2 more replies)
  2025-05-07 17:17 ` [PATCH bpf-next v1 07/11] bpf: Report rqspinlock deadlocks/timeout " Kumar Kartikeya Dwivedi
                   ` (4 subsequent siblings)
  10 siblings, 3 replies; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-07 17:17 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, kernel-team

Begin reporting may_goto timeouts to the BPF program's stderr stream.
To avoid flooding the stream when a program keeps failing repeatedly,
emit at most 512 error messages (BPF_PROG_STREAM_ERROR_CNT) from the
kernel for a given program.
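
The limiting scheme can be sketched in user-space C11 as a counter that
is bumped on every attempt and compared against the cap. This is an
illustration only: struct prog_state, error_limit_reached(), and
count_emitted() are hypothetical names, and stdatomic stands in for the
kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define ERROR_LIMIT 512	/* mirrors BPF_PROG_STREAM_ERROR_CNT in the patch */

/* Hypothetical per-program state holding the attempt counter. */
struct prog_state {
	atomic_int error_cnt;
};

/* Returns true once the limit is reached; every call counts as an attempt. */
static bool error_limit_reached(struct prog_state *st)
{
	assert(st);
	return atomic_fetch_add(&st->error_cnt, 1) >= ERROR_LIMIT;
}

/* Count how many of `attempts` reports would actually be emitted. */
static int count_emitted(struct prog_state *st, int attempts)
{
	int emitted = 0;

	for (int i = 0; i < attempts; i++)
		if (!error_limit_reached(st))
			emitted++;
	return emitted;
}
```

Because the counter keeps incrementing after the cap is hit, a signed
counter could in principle wrap around after a very large number of
attempts; a saturating counter would be one way to rule that out.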

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h | 21 ++++++++++++++-------
 kernel/bpf/core.c   | 17 ++++++++++++++++-
 kernel/bpf/stream.c |  5 +++++
 3 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 46ce05aad0ed..daf95333be78 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1667,6 +1667,7 @@ struct bpf_prog_aux {
 		struct rcu_head	rcu;
 	};
 	struct bpf_stream stream[2];
+	atomic_t stream_error_cnt;
 };
 
 struct bpf_prog {
@@ -3589,6 +3590,8 @@ void bpf_bprintf_cleanup(struct bpf_bprintf_data *data);
 int bpf_try_get_buffers(struct bpf_bprintf_buffers **bufs);
 void bpf_put_buffers(void);
 
+#define BPF_PROG_STREAM_ERROR_CNT 512
+
 void bpf_prog_stream_init(struct bpf_prog *prog);
 void bpf_prog_stream_free(struct bpf_prog *prog);
 
@@ -3600,16 +3603,20 @@ int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
 			    enum bpf_stream_id stream_id);
 int bpf_stream_stage_dump_stack(struct bpf_stream_stage *ss);
 
+bool bpf_prog_stream_error_limit(struct bpf_prog *prog);
+
 #define bpf_stream_printk(...) bpf_stream_stage_printk(&__ss, __VA_ARGS__)
 #define bpf_stream_dump_stack() bpf_stream_stage_dump_stack(&__ss)
 
-#define bpf_stream_stage(prog, stream_id, expr)                  \
-	({                                                       \
-		struct bpf_stream_stage __ss;                    \
-		bpf_stream_stage_init(&__ss);                    \
-		(expr);                                          \
-		bpf_stream_stage_commit(&__ss, prog, stream_id); \
-		bpf_stream_stage_free(&__ss);                    \
+#define bpf_stream_stage(prog, stream_id, expr)                          \
+	({                                                               \
+		struct bpf_stream_stage __ss;                            \
+		if (!bpf_prog_stream_error_limit(prog)) {                \
+			bpf_stream_stage_init(&__ss);                    \
+			(expr);                                          \
+			bpf_stream_stage_commit(&__ss, prog, stream_id); \
+			bpf_stream_stage_free(&__ss);                    \
+		}                                                        \
 	})
 
 #ifdef CONFIG_BPF_LSM
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index dcb665bff22f..d21c304fe829 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -3156,6 +3156,19 @@ u64 __weak arch_bpf_timed_may_goto(void)
 	return 0;
 }
 
+static noinline void bpf_prog_report_may_goto_violation(void)
+{
+	struct bpf_prog *prog;
+
+	prog = bpf_prog_find_from_stack();
+	if (!prog)
+		return;
+	bpf_stream_stage(prog, BPF_STDERR, ({
+		bpf_stream_printk("ERROR: Timeout detected for may_goto instruction\n");
+		bpf_stream_dump_stack();
+	}));
+}
+
 u64 bpf_check_timed_may_goto(struct bpf_timed_may_goto *p)
 {
 	u64 time = ktime_get_mono_fast_ns();
@@ -3166,8 +3179,10 @@ u64 bpf_check_timed_may_goto(struct bpf_timed_may_goto *p)
 		return BPF_MAX_TIMED_LOOPS;
 	}
 	/* Check if we've exhausted our time slice, and zero count. */
-	if (time - p->timestamp >= (NSEC_PER_SEC / 4))
+	if (unlikely(time - p->timestamp >= (NSEC_PER_SEC / 4))) {
+		bpf_prog_report_may_goto_violation();
 		return 0;
+	}
 	/* Refresh the count for the stack frame. */
 	return BPF_MAX_TIMED_LOOPS;
 }
diff --git a/kernel/bpf/stream.c b/kernel/bpf/stream.c
index a921fb1de319..eaf0574866b1 100644
--- a/kernel/bpf/stream.c
+++ b/kernel/bpf/stream.c
@@ -539,3 +539,8 @@ int bpf_stream_stage_dump_stack(struct bpf_stream_stage *ss)
 	ret = ret ?: ctx.err;
 	return ret ?: bpf_stream_stage_printk(ss, "\n");
 }
+
+bool bpf_prog_stream_error_limit(struct bpf_prog *prog)
+{
+	return atomic_fetch_add(1, &prog->aux->stream_error_cnt) >= BPF_PROG_STREAM_ERROR_CNT;
+}
-- 
2.47.1



* [PATCH bpf-next v1 07/11] bpf: Report rqspinlock deadlocks/timeout to BPF stderr
  2025-05-07 17:17 [PATCH bpf-next v1 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
                   ` (5 preceding siblings ...)
  2025-05-07 17:17 ` [PATCH bpf-next v1 06/11] bpf: Report may_goto timeout " Kumar Kartikeya Dwivedi
@ 2025-05-07 17:17 ` Kumar Kartikeya Dwivedi
  2025-05-07 17:17 ` [PATCH bpf-next v1 08/11] bpf: Report arena faults " Kumar Kartikeya Dwivedi
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-07 17:17 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, kernel-team

Begin reporting rqspinlock deadlocks and timeouts to the BPF program's
stderr stream.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/rqspinlock.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c
index 338305c8852c..888c8e2f9061 100644
--- a/kernel/bpf/rqspinlock.c
+++ b/kernel/bpf/rqspinlock.c
@@ -666,6 +666,26 @@ EXPORT_SYMBOL_GPL(resilient_queued_spin_lock_slowpath);
 
 __bpf_kfunc_start_defs();
 
+static void bpf_prog_report_rqspinlock_violation(const char *str, void *lock, bool irqsave)
+{
+	struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
+	struct bpf_prog *prog;
+
+	prog = bpf_prog_find_from_stack();
+	if (!prog)
+		return;
+	bpf_stream_stage(prog, BPF_STDERR, ({
+		bpf_stream_printk("ERROR: %s for bpf_res_spin_lock%s\n", str, irqsave ? "_irqsave" : "");
+		bpf_stream_printk("Attempted lock   = 0x%px\n", lock);
+		bpf_stream_printk("Total held locks = %d\n", rqh->cnt);
+		for (int i = 0; i < min(RES_NR_HELD, rqh->cnt); i++)
+			bpf_stream_printk("Held lock[%2d] = 0x%px\n", i, rqh->locks[i]);
+		bpf_stream_dump_stack();
+	}));
+}
+
+#define REPORT_STR(ret) ({ (ret) == -ETIMEDOUT ? "Timeout detected" : "AA or ABBA deadlock detected"; })
+
 __bpf_kfunc int bpf_res_spin_lock(struct bpf_res_spin_lock *lock)
 {
 	int ret;
@@ -676,6 +696,7 @@ __bpf_kfunc int bpf_res_spin_lock(struct bpf_res_spin_lock *lock)
 	preempt_disable();
 	ret = res_spin_lock((rqspinlock_t *)lock);
 	if (unlikely(ret)) {
+		bpf_prog_report_rqspinlock_violation(REPORT_STR(ret), lock, false);
 		preempt_enable();
 		return ret;
 	}
@@ -698,6 +719,7 @@ __bpf_kfunc int bpf_res_spin_lock_irqsave(struct bpf_res_spin_lock *lock, unsign
 	local_irq_save(flags);
 	ret = res_spin_lock((rqspinlock_t *)lock);
 	if (unlikely(ret)) {
+		bpf_prog_report_rqspinlock_violation(REPORT_STR(ret), lock, true);
 		local_irq_restore(flags);
 		preempt_enable();
 		return ret;
-- 
2.47.1



* [PATCH bpf-next v1 08/11] bpf: Report arena faults to BPF stderr
  2025-05-07 17:17 [PATCH bpf-next v1 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
                   ` (6 preceding siblings ...)
  2025-05-07 17:17 ` [PATCH bpf-next v1 07/11] bpf: Report rqspinlock deadlocks/timeout " Kumar Kartikeya Dwivedi
@ 2025-05-07 17:17 ` Kumar Kartikeya Dwivedi
  2025-05-09 19:28   ` Eduard Zingerman
  2025-05-07 17:17 ` [PATCH bpf-next v1 09/11] libbpf: Add bpf_stream_printk() macro Kumar Kartikeya Dwivedi
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-07 17:17 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, kernel-team

Begin reporting arena page faults and the faulting address to the BPF
program's stderr stream. This is limited to x86 for now, but arm64
support should be easy to add.
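
For readers following the x86 changes in this patch, the exception
table fixup word carries three fields: the instruction length in the
low byte, the pt_regs offset of the address register in the next byte,
and the pt_regs offset of the register to clear above that. A
user-space sketch of that layout follows, assuming a 32-bit fixup
value; struct fixup_fields, fixup_pack(), and fixup_unpack() are
hypothetical helpers and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of the fixup word. */
struct fixup_fields {
	uint8_t insn_len;	/* bytes to skip over the faulting instruction */
	uint8_t addr_reg_off;	/* pt_regs offset of address register, 0 if none */
	uint16_t clear_reg_off;	/* pt_regs offset of the register to zero */
};

static uint32_t fixup_pack(struct fixup_fields f)
{
	/* A real instruction is never zero bytes long. */
	assert(f.insn_len != 0);
	return (uint32_t)f.insn_len |
	       ((uint32_t)f.addr_reg_off << 8) |
	       ((uint32_t)f.clear_reg_off << 16);
}

static struct fixup_fields fixup_unpack(uint32_t fixup)
{
	struct fixup_fields f = {
		.insn_len = fixup & 0xff,
		.addr_reg_off = (fixup >> 8) & 0xff,
		.clear_reg_off = fixup >> 16,
	};

	return f;
}
```

Note that in this scheme a zero in the middle byte means "not an arena
access", so an address register whose pt_regs offset happens to be 0
would decode as "no address register".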

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 arch/x86/net/bpf_jit_comp.c | 21 ++++++++++++++++++---
 include/linux/bpf.h         |  1 +
 kernel/bpf/arena.c          | 14 ++++++++++++++
 3 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 17693ee6bb1a..dbb0feeec701 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1384,15 +1384,27 @@ static int emit_atomic_ld_st_index(u8 **pprog, u32 atomic_op, u32 size,
 }
 
 #define DONT_CLEAR 1
+#define ARENA_FAULT (1 << 8)
 
 bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs)
 {
-	u32 reg = x->fixup >> 8;
+	u32 arena_reg = (x->fixup >> 8) & 0xff;
+	bool is_arena = !!arena_reg;
+	u32 reg = x->fixup >> 16;
+	unsigned long addr;
+
+	/* Read addr first: if src_reg is dst_reg for a load, we zero it below. */
+	if (is_arena)
+		addr = *(unsigned long *)((void *)regs + arena_reg);
 
 	/* jump over faulting load and clear dest register */
 	if (reg != DONT_CLEAR)
 		*(unsigned long *)((void *)regs + reg) = 0;
 	regs->ip += x->fixup & 0xff;
+
+	if (is_arena)
+		bpf_prog_report_arena_violation(reg == DONT_CLEAR, addr);
+
 	return true;
 }
 
@@ -2043,7 +2055,10 @@ st:			if (is_imm8(insn->off))
 				ex->data = EX_TYPE_BPF;
 
 				ex->fixup = (prog - start_of_ldx) |
-					((BPF_CLASS(insn->code) == BPF_LDX ? reg2pt_regs[dst_reg] : DONT_CLEAR) << 8);
+					((BPF_CLASS(insn->code) == BPF_LDX ? reg2pt_regs[dst_reg] : DONT_CLEAR) << 16)
+					| ((BPF_CLASS(insn->code) == BPF_LDX ? reg2pt_regs[src_reg] : reg2pt_regs[dst_reg]) << 8);
+				/* Ensure src_reg offset fits in 1 byte. */
+				BUILD_BUG_ON(sizeof(struct pt_regs) > U8_MAX);
 			}
 			break;
 
@@ -2161,7 +2176,7 @@ st:			if (is_imm8(insn->off))
 				 * End result: x86 insn "mov rbx, qword ptr [rax+0x14]"
 				 * of 4 bytes will be ignored and rbx will be zero inited.
 				 */
-				ex->fixup = (prog - start_of_ldx) | (reg2pt_regs[dst_reg] << 8);
+				ex->fixup = (prog - start_of_ldx) | (reg2pt_regs[dst_reg] << 16);
 			}
 			break;
 
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index daf95333be78..9e086ca16028 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3604,6 +3604,7 @@ int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
 int bpf_stream_stage_dump_stack(struct bpf_stream_stage *ss);
 
 bool bpf_prog_stream_error_limit(struct bpf_prog *prog);
+void bpf_prog_report_arena_violation(bool write, unsigned long addr);
 
 #define bpf_stream_printk(...) bpf_stream_stage_printk(&__ss, __VA_ARGS__)
 #define bpf_stream_dump_stack() bpf_stream_stage_dump_stack(&__ss)
diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
index 0d56cea71602..d4baa98de7d8 100644
--- a/kernel/bpf/arena.c
+++ b/kernel/bpf/arena.c
@@ -590,3 +590,17 @@ static int __init kfunc_init(void)
 	return register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &common_kfunc_set);
 }
 late_initcall(kfunc_init);
+
+void bpf_prog_report_arena_violation(bool write, unsigned long addr)
+{
+	struct bpf_prog *prog;
+
+	prog = bpf_prog_find_from_stack();
+	if (!prog)
+		return;
+	bpf_stream_stage(prog, BPF_STDERR, ({
+		bpf_stream_printk("ERROR: Arena %s access at unmapped address 0x%lx\n",
+				  write ? "WRITE" : "READ", addr);
+		bpf_stream_dump_stack();
+	}));
+}
-- 
2.47.1


* [PATCH bpf-next v1 09/11] libbpf: Add bpf_stream_printk() macro
  2025-05-07 17:17 [PATCH bpf-next v1 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
                   ` (7 preceding siblings ...)
  2025-05-07 17:17 ` [PATCH bpf-next v1 08/11] bpf: Report arena faults " Kumar Kartikeya Dwivedi
@ 2025-05-07 17:17 ` Kumar Kartikeya Dwivedi
  2025-05-08 23:31   ` Eduard Zingerman
                     ` (2 more replies)
  2025-05-07 17:17 ` [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams Kumar Kartikeya Dwivedi
  2025-05-07 17:17 ` [PATCH bpf-next v1 11/11] selftests/bpf: Add tests for prog streams Kumar Kartikeya Dwivedi
  10 siblings, 3 replies; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-07 17:17 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, kernel-team

Introduce a new macro that allows printing data similar to bpf_printk(),
but to BPF streams. The first argument is the stream ID; the remaining
arguments are the same as those one would pass to bpf_printk().
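
The macro reuses bpf_printk()'s argument-count dispatch, with both
choices pointing at __bpf_stream_vprintk. The counting trick itself can
be sketched in plain user-space C as follows. NTH and PICK mirror the
shape of ___bpf_nth and ___bpf_pick_printk; BACKEND_FEW, BACKEND_MANY,
demo_printk and demo_selfcheck are hypothetical names used only for this
illustration, and the sketch relies on the GNU `##__VA_ARGS__` extension
just like bpf_helpers.h does.

```c
#include <assert.h>
#include <string.h>

/* Select the 14th argument; same shape as libbpf's ___bpf_nth(). */
#define NTH(_, _1, _2, _3, _4, _5, _6, _7, _8, _9, _a, _b, _c, N, ...) N

/* Expand to `few` for 0-3 variadic arguments and to `many` for 4-12. */
#define PICK(many, few, ...)						\
	NTH(_, ##__VA_ARGS__, many, many, many, many, many, many,	\
	    many, many, many, few /*3*/, few /*2*/, few /*1*/, few /*0*/)

/* Hypothetical stand-ins for the real printk back ends. */
#define BACKEND_FEW(...)  "few"
#define BACKEND_MANY(...) "many"

/* Same call shape as bpf_printk(): format first, then the arguments. */
#define demo_printk(fmt, ...) \
	PICK(BACKEND_MANY, BACKEND_FEW, ##__VA_ARGS__)(fmt, ##__VA_ARGS__)

/* Check which back end each argument count selects. */
static void demo_selfcheck(void)
{
	assert(strcmp(demo_printk("no args"), "few") == 0);
	assert(strcmp(demo_printk("%d %d %d", 1, 2, 3), "few") == 0);
	assert(strcmp(demo_printk("%d %d %d %d", 1, 2, 3, 4), "many") == 0);
}
```

Each extra argument shifts the list of choices one slot to the right, so
the fixed 14th position lands on a different choice depending on how many
arguments were passed.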

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/stream.c         | 10 +++++++--
 tools/lib/bpf/bpf_helpers.h | 44 +++++++++++++++++++++++++++++++------
 2 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/stream.c b/kernel/bpf/stream.c
index eaf0574866b1..d64975486ad1 100644
--- a/kernel/bpf/stream.c
+++ b/kernel/bpf/stream.c
@@ -257,7 +257,12 @@ __bpf_kfunc int bpf_stream_vprintk(struct bpf_stream *stream, const char *fmt__s
 	return ret;
 }
 
-__bpf_kfunc struct bpf_stream *bpf_stream_get(enum bpf_stream_id stream_id, void *aux__ign)
+/* Use int instead of enum bpf_stream_id here. This kfunc is declared in
+ * bpf_helpers.h, and keeping the enum would require a complete definition
+ * there, but we can't copy it into the header as it may conflict with the
+ * definition in vmlinux.h.
+ */
+__bpf_kfunc struct bpf_stream *bpf_stream_get(int stream_id, void *aux__ign)
 {
 	struct bpf_prog_aux *aux = aux__ign;
 
@@ -351,7 +356,8 @@ __bpf_kfunc struct bpf_stream_elem *bpf_stream_next_elem(struct bpf_stream *stre
 	return elem;
 }
 
-__bpf_kfunc struct bpf_stream *bpf_prog_stream_get(enum bpf_stream_id stream_id, u32 prog_id)
+/* Use int vs enum bpf_stream_id for consistency with bpf_stream_get. */
+__bpf_kfunc struct bpf_stream *bpf_prog_stream_get(int stream_id, u32 prog_id)
 {
 	struct bpf_stream *stream;
 	struct bpf_prog *prog;
diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index a50773d4616e..1a748c21e358 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -314,17 +314,47 @@ enum libbpf_tristate {
 			  ___param, sizeof(___param));		\
 })
 
+struct bpf_stream;
+
+extern struct bpf_stream *bpf_stream_get(int stream_id, void *aux__ign) __weak __ksym;
+extern int bpf_stream_vprintk(struct bpf_stream *stream, const char *fmt__str, const void *args,
+			      __u32 len__sz) __weak __ksym;
+
+#define __bpf_stream_vprintk(stream, fmt, args...)				\
+({										\
+	static const char ___fmt[] = fmt;					\
+	unsigned long long ___param[___bpf_narg(args)];				\
+										\
+	_Pragma("GCC diagnostic push")						\
+	_Pragma("GCC diagnostic ignored \"-Wint-conversion\"")			\
+	___bpf_fill(___param, args);						\
+	_Pragma("GCC diagnostic pop")						\
+										\
+	int ___id = stream;							\
+	struct bpf_stream *___sptr = bpf_stream_get(___id, NULL);		\
+	if (___sptr)								\
+		bpf_stream_vprintk(___sptr, ___fmt, ___param, sizeof(___param));\
+})
+
 /* Use __bpf_printk when bpf_printk call has 3 or fewer fmt args
- * Otherwise use __bpf_vprintk
+ * Otherwise use __bpf_vprintk. Virtualize choices so stream printk
+ * can override it to bpf_stream_vprintk.
  */
-#define ___bpf_pick_printk(...) \
-	___bpf_nth(_, ##__VA_ARGS__, __bpf_vprintk, __bpf_vprintk, __bpf_vprintk,	\
-		   __bpf_vprintk, __bpf_vprintk, __bpf_vprintk, __bpf_vprintk,		\
-		   __bpf_vprintk, __bpf_vprintk, __bpf_printk /*3*/, __bpf_printk /*2*/,\
-		   __bpf_printk /*1*/, __bpf_printk /*0*/)
+#define ___bpf_pick_printk(choice, choice_3, ...)			\
+	___bpf_nth(_, ##__VA_ARGS__, choice, choice, choice,		\
+		   choice, choice, choice, choice,			\
+		   choice, choice, choice_3 /*3*/, choice_3 /*2*/,	\
+		   choice_3 /*1*/, choice_3 /*0*/)
 
 /* Helper macro to print out debug messages */
-#define bpf_printk(fmt, args...) ___bpf_pick_printk(args)(fmt, ##args)
+#define __bpf_trace_printk(fmt, args...) \
+	___bpf_pick_printk(__bpf_vprintk, __bpf_printk, args)(fmt, ##args)
+#define __bpf_stream_printk(stream, fmt, args...) \
+	___bpf_pick_printk(__bpf_stream_vprintk, __bpf_stream_vprintk, args)(stream, fmt, ##args)
+
+#define bpf_stream_printk(stream, fmt, args...) __bpf_stream_printk(stream, fmt, ##args)
+
+#define bpf_printk(arg, args...) __bpf_trace_printk(arg, ##args)
 
 struct bpf_iter_num;
 
-- 
2.47.1


* [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams
  2025-05-07 17:17 [PATCH bpf-next v1 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
                   ` (8 preceding siblings ...)
  2025-05-07 17:17 ` [PATCH bpf-next v1 09/11] libbpf: Add bpf_stream_printk() macro Kumar Kartikeya Dwivedi
@ 2025-05-07 17:17 ` Kumar Kartikeya Dwivedi
  2025-05-08 10:41   ` Quentin Monnet
  2025-05-09  6:21   ` Eduard Zingerman
  2025-05-07 17:17 ` [PATCH bpf-next v1 11/11] selftests/bpf: Add tests for prog streams Kumar Kartikeya Dwivedi
  10 siblings, 2 replies; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-07 17:17 UTC (permalink / raw)
  To: bpf
  Cc: Quentin Monnet, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
	Emil Tsalapatis, Barret Rhoden, Matt Bobrowski, kkd, kernel-team

Add bpftool support for dumping streams of a given BPF program.
The syntax is `bpftool prog tracelog { stdout | stderr } PROG`.
The program's stdout stream is dumped to bpftool's stdout, and its
stderr stream to bpftool's stderr.
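
bpftool drains a stream in rounds: the BPF dumper program copies
elements into a ringbuf until it runs low on space and returns EAGAIN,
and user space consumes what was written before running it again. The
loop can be sketched in plain user-space C as below. Everything prefixed
with mock_, as well as RING_SLOTS and drain_stream(), is a hypothetical
stand-in rather than bpftool code, and the real dumper probes for free
space instead of counting slots.

```c
#include <assert.h>
#include <errno.h>

#define RING_SLOTS 4	/* hypothetical capacity, far smaller than the real ringbuf */

struct mock_stream { int pending; };	/* messages still queued for the program */
struct mock_ring { int filled; };	/* samples waiting for the consumer */

/* Stand-in for the BPF dumper: move messages until the ring is full. */
static int mock_dump(struct mock_stream *s, struct mock_ring *r, int *written)
{
	*written = 0;
	while (s->pending > 0) {
		if (r->filled == RING_SLOTS)
			return EAGAIN;	/* more to dump, ask the caller to retry */
		r->filled++;
		s->pending--;
		(*written)++;
	}
	return 0;
}

/* Stand-in for ring_buffer__consume_n(): consume up to n samples. */
static int mock_consume_n(struct mock_ring *r, int n)
{
	int got = n < r->filled ? n : r->filled;

	r->filled -= got;
	return got;
}

/* Drain the whole stream; returns messages delivered or -EINVAL. */
static int drain_stream(struct mock_stream *s)
{
	struct mock_ring ring = { 0 };
	int retval, written, total = 0;

	do {
		retval = mock_dump(s, &ring, &written);
		/* Every sample written in this round must be consumed. */
		if (mock_consume_n(&ring, written) != written)
			return -EINVAL;
		total += written;
	} while (retval == EAGAIN);

	return retval ? -EINVAL : total;
}
```

Consuming exactly the number of samples written in each round keeps the
ring empty before the next run, so a bounded ring is enough to dump a
stream of any length.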

Cc: Quentin Monnet <qmo@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../bpftool/Documentation/bpftool-prog.rst    |  6 ++
 tools/bpf/bpftool/Makefile                    |  2 +-
 tools/bpf/bpftool/bash-completion/bpftool     | 16 +++-
 tools/bpf/bpftool/prog.c                      | 88 ++++++++++++++++++-
 tools/bpf/bpftool/skeleton/stream.bpf.c       | 69 +++++++++++++++
 5 files changed, 178 insertions(+), 3 deletions(-)
 create mode 100644 tools/bpf/bpftool/skeleton/stream.bpf.c

diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index d6304e01afe0..258e16ee8def 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -173,6 +173,12 @@ bpftool prog tracelog
     purposes. For streaming data from BPF programs to user space, one can use
     perf events (see also **bpftool-map**\ (8)).
 
+bpftool prog tracelog { stdout | stderr } *PROG*
+    Dump the BPF stream of the program. BPF programs can write to these streams
+    at runtime with the **bpf_stream_vprintk**\ () kfunc. The kernel may write
+    error messages to the standard error stream. This facility should be used
+    only for debugging purposes.
+
 bpftool prog run *PROG* data_in *FILE* [data_out *FILE* [data_size_out *L*]] [ctx_in *FILE* [ctx_out *FILE* [ctx_size_out *M*]]] [repeat *N*]
     Run BPF program *PROG* in the kernel testing infrastructure for BPF,
     meaning that the program works on the data and context provided by the
diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index 9e9a5f006cd2..eb908223c3bb 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -234,7 +234,7 @@ $(OUTPUT)%.bpf.o: skeleton/%.bpf.c $(OUTPUT)vmlinux.h $(LIBBPF_BOOTSTRAP)
 $(OUTPUT)%.skel.h: $(OUTPUT)%.bpf.o $(BPFTOOL_BOOTSTRAP)
 	$(QUIET_GEN)$(BPFTOOL_BOOTSTRAP) gen skeleton $< > $@
 
-$(OUTPUT)prog.o: $(OUTPUT)profiler.skel.h
+$(OUTPUT)prog.o: $(OUTPUT)profiler.skel.h $(OUTPUT)stream.skel.h
 
 $(OUTPUT)pids.o: $(OUTPUT)pid_iter.skel.h
 
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 1ce409a6cbd9..c7c0bf3aee24 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -518,7 +518,21 @@ _bpftool()
                     esac
                     ;;
                 tracelog)
-                    return 0
+                    case $prev in
+                        $command)
+                            COMPREPLY+=( $( compgen -W "stdout stderr" -- \
+                                "$cur" ) )
+                            return 0
+                            ;;
+                        stdout|stderr)
+                            COMPREPLY=( $( compgen -W "$PROG_TYPE" -- \
+                                "$cur" ) )
+                            return 0
+                            ;;
+                        *)
+                            return 0
+                            ;;
+                    esac
                     ;;
                 profile)
                     case $cword in
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index f010295350be..7abe4698c86c 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -35,6 +35,8 @@
 #include "main.h"
 #include "xlated_dumper.h"
 
+#include "stream.skel.h"
+
 #define BPF_METADATA_PREFIX "bpf_metadata_"
 #define BPF_METADATA_PREFIX_LEN (sizeof(BPF_METADATA_PREFIX) - 1)
 
@@ -697,6 +699,15 @@ static int do_show(int argc, char **argv)
 	return err;
 }
 
+static int process_stream_sample(void *ctx, void *data, size_t len)
+{
+	FILE *file = ctx;
+
+	fprintf(file, "%s", (char *)data);
+	fflush(file);
+	return 0;
+}
+
 static int
 prog_dump(struct bpf_prog_info *info, enum dump_mode mode,
 	  char *filepath, bool opcodes, bool visual, bool linum)
@@ -1113,6 +1124,80 @@ static int do_detach(int argc, char **argv)
 	return 0;
 }
 
+enum prog_tracelog_mode {
+	TRACE_STDOUT,
+	TRACE_STDERR,
+};
+
+static int
+prog_tracelog_stream(struct bpf_prog_info *info, enum prog_tracelog_mode mode)
+{
+	FILE *file = mode == TRACE_STDOUT ? stdout : stderr;
+	LIBBPF_OPTS(bpf_test_run_opts, opts);
+	struct ring_buffer *ringbuf;
+	struct stream_bpf *skel;
+	int map_fd, ret = -1;
+
+	__u32 prog_id = info->id;
+	__u32 stream_id = mode == TRACE_STDOUT ? 1 : 2;
+
+	skel = stream_bpf__open_and_load();
+	if (!skel)
+		return -errno;
+	skel->bss->prog_id = prog_id;
+	skel->bss->stream_id = stream_id;
+
+	map_fd = bpf_map__fd(skel->maps.ringbuf);
+	ringbuf = ring_buffer__new(map_fd, process_stream_sample, file, NULL);
+	if (!ringbuf) {
+		ret = -errno;
+		goto end;
+	}
+	do {
+		skel->bss->written_count = skel->bss->written_size = 0;
+		ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.bpftool_dump_prog_stream), &opts);
+		if (ring_buffer__consume_n(ringbuf, skel->bss->written_count) != skel->bss->written_count) {
+			ret = -EINVAL;
+			goto end;
+		}
+	} while (!ret && opts.retval == EAGAIN);
+
+	if (opts.retval != 0)
+		ret = -EINVAL;
+end:
+	stream_bpf__destroy(skel);
+	return ret;
+}
+
+
+static int do_tracelog_any(int argc, char **argv)
+{
+	enum prog_tracelog_mode mode;
+	struct bpf_prog_info info;
+	__u32 info_len = sizeof(info);
+	int fd, err;
+
+	if (argc == 0)
+		return do_tracelog(argc, argv);
+	if (!is_prefix(*argv, "stdout") && !is_prefix(*argv, "stderr"))
+		usage();
+	mode = is_prefix(*argv, "stdout") ? TRACE_STDOUT : TRACE_STDERR;
+	NEXT_ARG();
+
+	if (!REQ_ARGS(2))
+		return -1;
+
+	fd = prog_parse_fd(&argc, &argv);
+	if (fd < 0)
+		return -1;
+
+	err = bpf_prog_get_info_by_fd(fd, &info, &info_len);
+	if (err < 0)
+		return -1;
+
+	return prog_tracelog_stream(&info, mode);
+}
+
 static int check_single_stdin(char *file_data_in, char *file_ctx_in)
 {
 	if (file_data_in && file_ctx_in &&
@@ -2483,6 +2568,7 @@ static int do_help(int argc, char **argv)
 		"                         [repeat N]\n"
 		"       %1$s %2$s profile PROG [duration DURATION] METRICs\n"
 		"       %1$s %2$s tracelog\n"
+		"       %1$s %2$s tracelog { stdout | stderr } PROG\n"
 		"       %1$s %2$s help\n"
 		"\n"
 		"       " HELP_SPEC_MAP "\n"
@@ -2522,7 +2608,7 @@ static const struct cmd cmds[] = {
 	{ "loadall",	do_loadall },
 	{ "attach",	do_attach },
 	{ "detach",	do_detach },
-	{ "tracelog",	do_tracelog },
+	{ "tracelog",	do_tracelog_any },
 	{ "run",	do_run },
 	{ "profile",	do_profile },
 	{ 0 }
diff --git a/tools/bpf/bpftool/skeleton/stream.bpf.c b/tools/bpf/bpftool/skeleton/stream.bpf.c
new file mode 100644
index 000000000000..910315959144
--- /dev/null
+++ b/tools/bpf/bpftool/skeleton/stream.bpf.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_RINGBUF);
+	__uint(max_entries, 1024 * 1024);
+} ringbuf SEC(".maps");
+
+int written_size;
+int written_count;
+int stream_id;
+int prog_id;
+
+#define ENOENT 2
+#define EAGAIN 11
+#define EFAULT 14
+
+SEC("syscall")
+int bpftool_dump_prog_stream(void *ctx)
+{
+	struct bpf_stream_elem *elem;
+	struct bpf_stream *stream;
+	bool cont = false;
+	int ret = 0;
+
+	stream = bpf_prog_stream_get(stream_id, prog_id);
+	if (!stream)
+		return ENOENT;
+
+	bpf_repeat(BPF_MAX_LOOPS) {
+		struct bpf_dynptr dst_dptr, src_dptr;
+		int size;
+
+		elem = bpf_stream_next_elem(stream);
+		if (!elem)
+			break;
+		size = elem->mem_slice.len;
+
+		if (bpf_dynptr_from_mem_slice(&elem->mem_slice, 0, &src_dptr))
+			ret = EFAULT;
+		if (bpf_ringbuf_reserve_dynptr(&ringbuf, size, 0, &dst_dptr))
+			ret = EFAULT;
+		if (bpf_dynptr_copy(&dst_dptr, 0, &src_dptr, 0, size))
+			ret = EFAULT;
+		bpf_ringbuf_submit_dynptr(&dst_dptr, 0);
+
+		written_count++;
+		written_size += size;
+
+		bpf_stream_free_elem(elem);
+
+		/* Probe and exit if no more space, probe for twice the typical size. */
+		if (bpf_ringbuf_reserve_dynptr(&ringbuf, 2048, 0, &dst_dptr))
+			cont = true;
+		bpf_ringbuf_discard_dynptr(&dst_dptr, 0);
+
+		if (ret || cont)
+			break;
+	}
+
+	bpf_prog_stream_put(stream);
+
+	return ret ? ret : (cont ? EAGAIN : 0);
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.47.1


* [PATCH bpf-next v1 11/11] selftests/bpf: Add tests for prog streams
  2025-05-07 17:17 [PATCH bpf-next v1 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
                   ` (9 preceding siblings ...)
  2025-05-07 17:17 ` [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams Kumar Kartikeya Dwivedi
@ 2025-05-07 17:17 ` Kumar Kartikeya Dwivedi
  2025-05-09 17:18   ` Eduard Zingerman
  10 siblings, 1 reply; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-07 17:17 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, kernel-team

Add selftests that stress the various facets of the stream API and its
memory allocation pattern, and ensure that dumping support is tested and
functional. Create a symlink to bpftool's stream.bpf.c and use it to
test dumping messages through a ringbuf to user space, verifying the
output.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../testing/selftests/bpf/prog_tests/stream.c |  95 +++++++++++++
 tools/testing/selftests/bpf/progs/stream.c    | 127 ++++++++++++++++++
 .../selftests/bpf/progs/stream_bpftool.c      |   1 +
 .../testing/selftests/bpf/progs/stream_fail.c |  90 +++++++++++++
 4 files changed, 313 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/stream.c
 create mode 100644 tools/testing/selftests/bpf/progs/stream.c
 create mode 120000 tools/testing/selftests/bpf/progs/stream_bpftool.c
 create mode 100644 tools/testing/selftests/bpf/progs/stream_fail.c

diff --git a/tools/testing/selftests/bpf/prog_tests/stream.c b/tools/testing/selftests/bpf/prog_tests/stream.c
new file mode 100644
index 000000000000..7b97b783ff1f
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/stream.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+#include <test_progs.h>
+#include <sys/mman.h>
+
+#include "stream.skel.h"
+#include "stream_fail.skel.h"
+
+#include "stream_bpftool.skel.h"
+
+void test_stream_failure(void)
+{
+	RUN_TESTS(stream_fail);
+}
+
+void test_stream_success(void)
+{
+	RUN_TESTS(stream);
+	RUN_TESTS(stream_bpftool);
+	return;
+}
+
+typedef int (*sample_cb_t)(void *, void *, size_t);
+
+static void stream_ringbuf_output(int prog_id, sample_cb_t sample_cb)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts);
+	struct ring_buffer *ringbuf;
+	struct stream_bpftool *skel;
+	int fd, ret;
+
+	skel = stream_bpftool__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "stream_bpftool_open_and_load"))
+		return;
+
+	fd = bpf_map__fd(skel->maps.ringbuf);
+
+	ringbuf = ring_buffer__new(fd, sample_cb, NULL, NULL);
+	if (!ASSERT_OK_PTR(ringbuf, "ringbuf_new"))
+		goto end;
+
+	skel->bss->prog_id = prog_id;
+	skel->bss->stream_id = 1;
+	do {
+		skel->bss->written_count = skel->bss->written_size = 0;
+		ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.bpftool_dump_prog_stream), &opts);
+		if (ret)
+			break;
+		ret = ring_buffer__consume_n(ringbuf, skel->bss->written_count);
+		if (!ASSERT_EQ(ret, skel->bss->written_count, "consume"))
+			break;
+		ret = 0;
+	} while (opts.retval == EAGAIN);
+
+	ASSERT_OK(ret, "ret");
+	ASSERT_EQ(opts.retval, 0, "retval");
+
+end:
+	stream_bpftool__destroy(skel);
+}
+
+int cnt = 0;
+
+static int process_sample(void *ctx, void *data, size_t len)
+{
+	char buf[64];
+
+	snprintf(buf, sizeof(buf), "num=%d\n", cnt++);
+	ASSERT_TRUE(strcmp(buf, (char *)data) == 0, "sample strcmp");
+	return 0;
+}
+
+void test_stream_output(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts);
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
+	struct stream *skel;
+	int ret;
+
+	skel = stream__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "stream__open_and_load"))
+		return;
+
+	ASSERT_OK(bpf_prog_get_info_by_fd(bpf_program__fd(skel->progs.stream_test_output), &info, &info_len), "get info");
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.stream_test_output), &opts);
+	ASSERT_OK(ret, "ret");
+	ASSERT_OK(opts.retval, "retval");
+	stream_ringbuf_output(info.id, process_sample);
+
+	ASSERT_EQ(cnt, 1000, "cnt");
+
+	stream__destroy(skel);
+	return;
+}
diff --git a/tools/testing/selftests/bpf/progs/stream.c b/tools/testing/selftests/bpf/progs/stream.c
new file mode 100644
index 000000000000..14cb8690824f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/stream.c
@@ -0,0 +1,127 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+#define _STR "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
+
+#define STREAM_STR (u64)(_STR _STR _STR _STR)
+
+static __noinline int stream_exercise(int id, int N)
+{
+	struct bpf_stream_elem *elem, *earr[56] = {};
+	struct bpf_stream *stream;
+	int ret;
+	u32 i;
+
+	if (N > 56)
+		return 56;
+
+	stream = bpf_stream_get(id, NULL);
+	if (!stream)
+		return 1;
+	for (i = 0; i < N; i++)
+		if ((ret = bpf_stream_vprintk(stream, "%llu%s", &(u64[]){i, STREAM_STR}, 16)) < 0) {
+			bpf_printk("bpf_stream_vprintk ret=%d", ret);
+			return 2;
+		}
+	ret = 0;
+	for (i = 0; i < N; i++) {
+		elem = bpf_stream_next_elem(stream);
+		if (!elem) {
+			ret = 4;
+			break;
+		}
+		earr[i] = elem;
+	}
+	elem = bpf_stream_next_elem(stream);
+	if (elem) {
+		bpf_stream_free_elem(elem);
+		ret = 5;
+	}
+	for (i = 0; i < N; i++)
+		if (earr[i])
+			bpf_stream_free_elem(earr[i]);
+	return ret;
+}
+
+static __noinline int stream_exercise_nums(int id)
+{
+	int ret = 0;
+
+	ret = ret ?: stream_exercise(id, 56);
+	ret = ret ?: stream_exercise(id, 42);
+	ret = ret ?: stream_exercise(id, 28);
+	ret = ret ?: stream_exercise(id, 10);
+	ret = ret ?: stream_exercise(id, 1);
+
+	return ret;
+}
+
+SEC("syscall")
+__success __retval(0)
+int stream_test(void *ctx)
+{
+	unsigned long flags;
+	int ret;
+
+	bpf_local_irq_save(&flags);
+	bpf_repeat(50) {
+		ret = stream_exercise_nums(BPF_STDOUT);
+		if (ret)
+			break;
+	}
+	if (ret) {
+		bpf_local_irq_restore(&flags);
+		return ret;
+	}
+	bpf_repeat(100) {
+		ret = stream_exercise_nums(BPF_STDERR);
+		if (ret)
+			break;
+	}
+	bpf_local_irq_restore(&flags);
+
+	if (ret)
+		return ret;
+
+	ret = stream_exercise_nums(BPF_STDOUT);
+	if (ret)
+		return ret;
+	return stream_exercise_nums(BPF_STDERR);
+}
+
+SEC("syscall")
+__success __retval(0)
+int stream_test_output(void *ctx)
+{
+	for (int i = 0; i < 1000; i++)
+		bpf_stream_printk(BPF_STDOUT, "num=%d\n", i);
+	return 0;
+}
+
+SEC("syscall")
+__success __retval(0)
+int stream_test_limit(void *ctx)
+{
+	struct bpf_stream *stream;
+	bool failed = false;
+
+	stream = bpf_stream_get(BPF_STDOUT, NULL);
+	if (!stream)
+		return 2;
+
+	bpf_repeat(BPF_MAX_LOOPS) {
+		failed = bpf_stream_vprintk(stream, "%s%s%s", &(u64[]){STREAM_STR, STREAM_STR}, 16) != 0;
+		if (failed)
+			break;
+	}
+
+	if (failed)
+		return 0;
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/stream_bpftool.c b/tools/testing/selftests/bpf/progs/stream_bpftool.c
new file mode 120000
index 000000000000..5904c0d92edc
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/stream_bpftool.c
@@ -0,0 +1 @@
+../../../../bpf/bpftool/skeleton/stream.bpf.c
\ No newline at end of file
diff --git a/tools/testing/selftests/bpf/progs/stream_fail.c b/tools/testing/selftests/bpf/progs/stream_fail.c
new file mode 100644
index 000000000000..50f70b9878b8
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/stream_fail.c
@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_misc.h"
+
+SEC("syscall")
+__failure __msg("R1 type=trusted_ptr_or_null_ expected=")
+int stream_get_trusted(void *ctx) {
+	struct bpf_stream *stream;
+
+	stream = bpf_stream_get(BPF_STDOUT, NULL);
+	bpf_this_cpu_ptr(stream);
+	return 0;
+}
+
+SEC("tc")
+__failure __msg("calling kernel function bpf_prog_stream_get is not allowed")
+int stream_get_prog_fail(void *ctx) {
+	struct bpf_stream *stream;
+
+	stream = bpf_prog_stream_get(BPF_STDOUT, 0);
+	if (!stream)
+		return 0;
+	bpf_this_cpu_ptr(stream);
+	return 0;
+}
+
+SEC("syscall")
+__failure __msg("R1 type=ptr_or_null_ expected=")
+int stream_get_prog_trusted(void *ctx) {
+	struct bpf_stream *stream;
+
+	stream = bpf_prog_stream_get(BPF_STDOUT, 0);
+	bpf_this_cpu_ptr(stream);
+	return 0;
+}
+
+SEC("syscall")
+__failure __msg("Unreleased reference")
+int stream_get_put_missing(void *ctx) {
+	struct bpf_stream *stream;
+
+	stream = bpf_prog_stream_get(BPF_STDOUT, 0);
+	if (!stream)
+		return 0;
+	return 0;
+}
+
+SEC("syscall")
+__failure __msg("R1 must be referenced or trusted")
+int stream_next_untrusted_arg(void *ctx)
+{
+	struct bpf_stream *stream;
+
+	stream = bpf_core_cast((void *)0xdeadbeef, typeof(*stream));
+	bpf_stream_next_elem(stream);
+	return 0;
+}
+
+SEC("syscall")
+__failure __msg("Possibly NULL pointer passed")
+int stream_next_null_arg(void *ctx)
+{
+	bpf_stream_next_elem(NULL);
+	return 0;
+}
+
+SEC("syscall")
+__failure __msg("R1 must be referenced or trusted")
+int stream_vprintk_untrusted_arg(void *ctx)
+{
+	struct bpf_stream *stream;
+
+	stream = bpf_core_cast((void *)0xfaceb00c, typeof(*stream));
+	bpf_stream_vprintk(stream, "", NULL, 0);
+	return 0;
+}
+
+SEC("syscall")
+__failure __msg("Possibly NULL pointer passed")
+int stream_vprintk_null_arg(void *ctx)
+{
+	bpf_stream_vprintk(NULL, "", NULL, 0);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.47.1


* Re: [PATCH bpf-next v1 03/11] bpf: Add function to extract program source info
  2025-05-07 17:17 ` [PATCH bpf-next v1 03/11] bpf: Add function to extract program source info Kumar Kartikeya Dwivedi
@ 2025-05-08 10:30   ` kernel test robot
  2025-05-08 20:15   ` Eduard Zingerman
  2025-05-09 21:17   ` Andrii Nakryiko
  2 siblings, 0 replies; 55+ messages in thread
From: kernel test robot @ 2025-05-08 10:30 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi; +Cc: oe-kbuild-all

Hi Kumar,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 43745d11bfd9683abdf08ad7a5cc403d6a9ffd15]

url:    https://github.com/intel-lab-lkp/linux/commits/Kumar-Kartikeya-Dwivedi/bpf-Introduce-bpf_dynptr_from_mem_slice/20250508-012020
base:   43745d11bfd9683abdf08ad7a5cc403d6a9ffd15
patch link:    https://lore.kernel.org/r/20250507171720.1958296-4-memxor%40gmail.com
patch subject: [PATCH bpf-next v1 03/11] bpf: Add function to extract program source info
config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20250508/202505081842.Tgphc4hq-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250508/202505081842.Tgphc4hq-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505081842.Tgphc4hq-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from include/linux/bitmap.h:13,
                    from include/linux/cpumask.h:12,
                    from arch/x86/include/asm/cpumask.h:5,
                    from arch/x86/include/asm/msr.h:11,
                    from arch/x86/include/asm/tsc.h:10,
                    from arch/x86/include/asm/timex.h:6,
                    from include/linux/timex.h:67,
                    from include/linux/time32.h:13,
                    from include/linux/time.h:60,
                    from include/linux/jiffies.h:10,
                    from include/linux/ktime.h:25,
                    from include/linux/timer.h:6,
                    from include/linux/workqueue.h:9,
                    from include/linux/bpf.h:10,
                    from include/linux/filter.h:9,
                    from kernel/bpf/core.c:21:
   In function 'kbasename',
       inlined from 'bpf_prog_get_file_line' at kernel/bpf/core.c:3240:11:
>> include/linux/string.h:387:28: warning: argument 1 null where non-null expected [-Wnonnull]
     387 |         const char *tail = strrchr(path, '/');
         |                            ^~~~~~~~~~~~~~~~~~
   kernel/bpf/core.c: In function 'bpf_prog_get_file_line':
   include/linux/string.h:183:15: note: in a call to function 'strrchr' declared 'nonnull'
     183 | extern char * strrchr(const char *,int);
         |               ^~~~~~~


vim +387 include/linux/string.h

639b9e34f15e4b Akinobu Mita    2012-07-30  379  
b18888ab256f05 Andy Shevchenko 2012-12-17  380  /**
b18888ab256f05 Andy Shevchenko 2012-12-17  381   * kbasename - return the last part of a pathname.
b18888ab256f05 Andy Shevchenko 2012-12-17  382   *
b18888ab256f05 Andy Shevchenko 2012-12-17  383   * @path: path to extract the filename from.
b18888ab256f05 Andy Shevchenko 2012-12-17  384   */
b18888ab256f05 Andy Shevchenko 2012-12-17  385  static inline const char *kbasename(const char *path)
b18888ab256f05 Andy Shevchenko 2012-12-17  386  {
b18888ab256f05 Andy Shevchenko 2012-12-17 @387  	const char *tail = strrchr(path, '/');
b18888ab256f05 Andy Shevchenko 2012-12-17  388  	return tail ? tail + 1 : path;
b18888ab256f05 Andy Shevchenko 2012-12-17  389  }
b18888ab256f05 Andy Shevchenko 2012-12-17  390  
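
The warning says that GCC found at least one path in
bpf_prog_get_file_line() on which a NULL pointer reaches kbasename(),
and therefore strrchr(). One possible way to address it, sketched here
in user space, is to check the pointer before taking the basename. The
kbasename() below is a copy of the kernel helper with an added assert;
basename_or_null() is a hypothetical wrapper and not a proposed kernel
API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* User-space copy of the kernel's kbasename(), plus a contract check. */
static const char *kbasename(const char *path)
{
	const char *tail;

	assert(path);	/* strrchr() is declared nonnull */
	tail = strrchr(path, '/');
	return tail ? tail + 1 : path;
}

/* Hypothetical wrapper: refuse NULL instead of handing it to strrchr(). */
static const char *basename_or_null(const char *path)
{
	if (!path)
		return NULL;
	return kbasename(path);
}
```

A caller would then treat a NULL result as "no file name available"
rather than dereferencing it.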

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams
  2025-05-07 17:17 ` [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams Kumar Kartikeya Dwivedi
@ 2025-05-08 10:41   ` Quentin Monnet
  2025-05-08 23:41     ` Kumar Kartikeya Dwivedi
  2025-05-09  6:21   ` Eduard Zingerman
  1 sibling, 1 reply; 55+ messages in thread
From: Quentin Monnet @ 2025-05-08 10:41 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, kernel-team

On 07/05/2025 18:17, Kumar Kartikeya Dwivedi wrote:
> Add bpftool support for dumping streams of a given BPF program.
> The syntax is `bpftool prog tracelog { stdout | stderr } PROG`.
> The stdout is dumped to stdout, stderr is dumped to stderr.
> 
> Cc: Quentin Monnet <qmo@kernel.org>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  .../bpftool/Documentation/bpftool-prog.rst    |  6 ++
>  tools/bpf/bpftool/Makefile                    |  2 +-
>  tools/bpf/bpftool/bash-completion/bpftool     | 16 +++-
>  tools/bpf/bpftool/prog.c                      | 88 ++++++++++++++++++-
>  tools/bpf/bpftool/skeleton/stream.bpf.c       | 69 +++++++++++++++
>  5 files changed, 178 insertions(+), 3 deletions(-)
>  create mode 100644 tools/bpf/bpftool/skeleton/stream.bpf.c
> 
> diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> index d6304e01afe0..258e16ee8def 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> @@ -173,6 +173,12 @@ bpftool prog tracelog
>      purposes. For streaming data from BPF programs to user space, one can use
>      perf events (see also **bpftool-map**\ (8)).
>  
> +bpftool prog tracelog { stdout | stderr } *PROG*
> +    Dump the BPF stream of the program. BPF programs can write to these streams
> +    at runtime with the **bpf_stream_vprintk**\ () kfunc. The kernel may write
> +    error messages to the standard error stream. This facility should be used
> +    only for debugging purposes.


Thanks! The syntax "bpftool prog tracelog stdout/stderr <prog>" works
well for me.

Can you also update the short description line at the top of the file
too? Should be:

    | **bpftool** **prog tracelog** [ { **stdout** | **stderr** } *PROG* ]


> +
>  bpftool prog run *PROG* data_in *FILE* [data_out *FILE* [data_size_out *L*]] [ctx_in *FILE* [ctx_out *FILE* [ctx_size_out *M*]]] [repeat *N*]
>      Run BPF program *PROG* in the kernel testing infrastructure for BPF,
>      meaning that the program works on the data and context provided by the
> diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
> index 9e9a5f006cd2..eb908223c3bb 100644
> --- a/tools/bpf/bpftool/Makefile
> +++ b/tools/bpf/bpftool/Makefile
> @@ -234,7 +234,7 @@ $(OUTPUT)%.bpf.o: skeleton/%.bpf.c $(OUTPUT)vmlinux.h $(LIBBPF_BOOTSTRAP)
>  $(OUTPUT)%.skel.h: $(OUTPUT)%.bpf.o $(BPFTOOL_BOOTSTRAP)
>  	$(QUIET_GEN)$(BPFTOOL_BOOTSTRAP) gen skeleton $< > $@
>  
> -$(OUTPUT)prog.o: $(OUTPUT)profiler.skel.h
> +$(OUTPUT)prog.o: $(OUTPUT)profiler.skel.h $(OUTPUT)stream.skel.h
>  
>  $(OUTPUT)pids.o: $(OUTPUT)pid_iter.skel.h
>  
> diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
> index 1ce409a6cbd9..c7c0bf3aee24 100644
> --- a/tools/bpf/bpftool/bash-completion/bpftool
> +++ b/tools/bpf/bpftool/bash-completion/bpftool
> @@ -518,7 +518,21 @@ _bpftool()
>                      esac
>                      ;;
>                  tracelog)
> -                    return 0
> +                    case $prev in
> +                        $command)
> +                            COMPREPLY+=( $( compgen -W "stdout stderr" -- \
> +                                "$cur" ) )
> +                            return 0
> +                            ;;
> +                        stdout|stderr)
> +                            COMPREPLY=( $( compgen -W "$PROG_TYPE" -- \
> +                                "$cur" ) )
> +                            return 0
> +                            ;;
> +                        *)
> +                            return 0
> +                            ;;
> +                    esac


Works well, thanks for this!


>                      ;;
>                  profile)
>                      case $cword in
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index f010295350be..7abe4698c86c 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -35,6 +35,8 @@
>  #include "main.h"
>  #include "xlated_dumper.h"
>  
> +#include "stream.skel.h"
> +
>  #define BPF_METADATA_PREFIX "bpf_metadata_"
>  #define BPF_METADATA_PREFIX_LEN (sizeof(BPF_METADATA_PREFIX) - 1)
>  
> @@ -697,6 +699,15 @@ static int do_show(int argc, char **argv)
>  	return err;
>  }
>  
> +static int process_stream_sample(void *ctx, void *data, size_t len)
> +{
> +	FILE *file = ctx;
> +
> +	fprintf(file, "%s", (char *)data);
> +	fflush(file);
> +	return 0;
> +}
> +
>  static int
>  prog_dump(struct bpf_prog_info *info, enum dump_mode mode,
>  	  char *filepath, bool opcodes, bool visual, bool linum)
> @@ -1113,6 +1124,80 @@ static int do_detach(int argc, char **argv)
>  	return 0;
>  }
>  
> +enum prog_tracelog_mode {
> +	TRACE_STDOUT,
> +	TRACE_STDERR,
> +};
> +
> +static int
> +prog_tracelog_stream(struct bpf_prog_info *info, enum prog_tracelog_mode mode)
> +{
> +	FILE *file = mode == TRACE_STDOUT ? stdout : stderr;
> +	LIBBPF_OPTS(bpf_test_run_opts, opts);
> +	struct ring_buffer *ringbuf;
> +	struct stream_bpf *skel;
> +	int map_fd, ret = -1;
> +
> +	__u32 prog_id = info->id;
> +	__u32 stream_id = mode == TRACE_STDOUT ? 1 : 2;
> +
> +	skel = stream_bpf__open_and_load();
> +	if (!skel)
> +		return -errno;
> +	skel->bss->prog_id = prog_id;
> +	skel->bss->stream_id = stream_id;
> +
> +	map_fd = bpf_map__fd(skel->maps.ringbuf);
> +	ringbuf = ring_buffer__new(map_fd, process_stream_sample, file, NULL);
> +	if (!ringbuf) {
> +		ret = -errno;
> +		goto end;
> +	}
> +	do {
> +		skel->bss->written_count = skel->bss->written_size = 0;
> +		ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.bpftool_dump_prog_stream), &opts);
> +		if (ring_buffer__consume_n(ringbuf, skel->bss->written_count) != skel->bss->written_count) {
> +			ret = -EINVAL;
> +			goto end;
> +		}
> +	} while (!ret && opts.retval == EAGAIN);
> +
> +	if (opts.retval != 0)
> +		ret = -EINVAL;
> +end:
> +	stream_bpf__destroy(skel);
> +	return ret;
> +}
> +
> +
> +static int do_tracelog_any(int argc, char **argv)
> +{
> +	enum prog_tracelog_mode mode;
> +	struct bpf_prog_info info;
> +	__u32 info_len = sizeof(info);
> +	int fd, err;
> +
> +	if (argc == 0)
> +		return do_tracelog(argc, argv);
> +	if (!is_prefix(*argv, "stdout") && !is_prefix(*argv, "stderr"))
> +		usage();
> +	mode = is_prefix(*argv, "stdout") ? TRACE_STDOUT : TRACE_STDERR;
> +	NEXT_ARG();
> +
> +	if (!REQ_ARGS(2))
> +		return -1;
> +
> +	fd = prog_parse_fd(&argc, &argv);
> +	if (fd < 0)
> +		return -1;
> +
> +	err = bpf_prog_get_info_by_fd(fd, &info, &info_len);
> +	if (err < 0)
> +		return -1;
> +
> +	return prog_tracelog_stream(&info, mode);
> +}
> +
>  static int check_single_stdin(char *file_data_in, char *file_ctx_in)
>  {
>  	if (file_data_in && file_ctx_in &&
> @@ -2483,6 +2568,7 @@ static int do_help(int argc, char **argv)
>  		"                         [repeat N]\n"
>  		"       %1$s %2$s profile PROG [duration DURATION] METRICs\n"
>  		"       %1$s %2$s tracelog\n"
> +		"       %1$s %2$s tracelog { stdout | stderr } PROG\n"
>  		"       %1$s %2$s help\n"
>  		"\n"
>  		"       " HELP_SPEC_MAP "\n"
> @@ -2522,7 +2608,7 @@ static const struct cmd cmds[] = {
>  	{ "loadall",	do_loadall },
>  	{ "attach",	do_attach },
>  	{ "detach",	do_detach },
> -	{ "tracelog",	do_tracelog },
> +	{ "tracelog",	do_tracelog_any },
>  	{ "run",	do_run },
>  	{ "profile",	do_profile },
>  	{ 0 }
> diff --git a/tools/bpf/bpftool/skeleton/stream.bpf.c b/tools/bpf/bpftool/skeleton/stream.bpf.c
> new file mode 100644
> index 000000000000..910315959144
> --- /dev/null
> +++ b/tools/bpf/bpftool/skeleton/stream.bpf.c
> @@ -0,0 +1,69 @@
> +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
> +#include <vmlinux.h>
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_helpers.h>
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_RINGBUF);
> +	__uint(max_entries, 1024 * 1024);
> +} ringbuf SEC(".maps");
> +
> +int written_size;
> +int written_count;
> +int stream_id;
> +int prog_id;
> +
> +#define ENOENT 2
> +#define EAGAIN 11
> +#define EFAULT 14
> +
> +SEC("syscall")
> +int bpftool_dump_prog_stream(void *ctx)
> +{
> +	struct bpf_stream_elem *elem;
> +	struct bpf_stream *stream;
> +	bool cont = false;
> +	bool ret = 0;
> +
> +	stream = bpf_prog_stream_get(stream_id, prog_id);


Recalling discussion from RFC:

>> Calls to these new kfuncs will break compilation on older systems that
>> don't support them yet (and don't have the definition in their
>> vmlinux.h). We should provide fallback definitions to make sure that the
>> program compiles.
> 
> This is the only thing I haven't yet addressed in v2, because it
> seemed a bit ugly.
> I tried adding kfunc declarations, but those aren't enough.
> We rely on structs introduced and read in this patch.
> So I think vmlinux.h needs to be dropped, but it means adding a lot
> more than just the declarations, all types, plus any types they
> transitively depend on.
> Maybe there is a better way (like detecting compilation failure and skipping?).
> But if not, I will address like above in v3.

We do have to provide a workaround, or bpftool won't be able to compile 
on any machine that doesn't know the new kfuncs yet.

I don't think there are so many definitions to add (we don't need to
drop the vmlinux.h), CO-RE should help and if my understanding is
correct, we should be able to do something like this (on top of your
patch):

    diff --git a/tools/bpf/bpftool/skeleton/stream.bpf.c b/tools/bpf/bpftool/skeleton/stream.bpf.c
    index 910315959144..5e3d8f4f68a5 100644
    --- a/tools/bpf/bpftool/skeleton/stream.bpf.c
    +++ b/tools/bpf/bpftool/skeleton/stream.bpf.c
    @@ -1,6 +1,7 @@
     // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
     /* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
     #include <vmlinux.h>
    +#include <bpf/bpf_core_read.h>
     #include <bpf/bpf_tracing.h>
     #include <bpf/bpf_helpers.h>

    @@ -18,10 +19,31 @@ int prog_id;
     #define EAGAIN 11
     #define EFAULT 14

    +
    +struct bpf_mem_slice___local {
    +	u32 len;
    +} __attribute__((preserve_access_index));
    +struct bpf_stream_elem___local {
    +	struct bpf_mem_slice___local mem_slice;
    +} __attribute__((preserve_access_index));
    +
    +extern struct bpf_stream *bpf_prog_stream_get(int stream_id,
    +					      u32 prog_id) __ksym;
    +extern struct bpf_stream_elem___local *
    +bpf_stream_next_elem(struct bpf_stream *stream) __ksym;
    +extern int bpf_dynptr_from_mem_slice(struct bpf_mem_slice___local *mem_slice,
    +				     u64 flags,
    +				     struct bpf_dynptr *dptr__uninit) __ksym;
    +extern void bpf_stream_free_elem(struct bpf_stream_elem___local *elem) __ksym;
    +extern void bpf_prog_stream_put(struct bpf_stream *stream) __ksym;
    +extern int bpf_dynptr_copy(struct bpf_dynptr *dst_ptr, u32 dst_off,
    +			   struct bpf_dynptr *src_ptr, u32 src_off,
    +			   u32 size) __ksym;
    +
     SEC("syscall")
     int bpftool_dump_prog_stream(void *ctx)
     {
    -	struct bpf_stream_elem *elem;
    +	struct bpf_stream_elem___local *elem;
        struct bpf_stream *stream;
        bool cont = false;
        bool ret = 0;
    @@ -38,6 +60,7 @@ int bpftool_dump_prog_stream(void *ctx)
            if (!elem)
                break;
            size = elem->mem_slice.len;
    +		bpf_core_read(&size, sizeof(u32), &elem->mem_slice.len);

            if (bpf_dynptr_from_mem_slice(&elem->mem_slice, 0, &src_dptr))
                ret = EFAULT;


The diff above allowed me to compile on a box with a 6.10 kernel,
although I didn't check that the feature still works with a vmlinux
generated after applying your changes - please try it.

We should probably find workarounds for older struct and helpers too,
such as struct bpf_dynptr and bpf_ringbuf_(reserve|discard)_dynptr, but
I didn't look into it.


> +	if (!stream)
> +		return ENOENT;
> +
> +	bpf_repeat(BPF_MAX_LOOPS) {
> +		struct bpf_dynptr dst_dptr, src_dptr;
> +		int size;
> +
> +		elem = bpf_stream_next_elem(stream);
> +		if (!elem)
> +			break;
> +		size = elem->mem_slice.len;
> +
> +		if (bpf_dynptr_from_mem_slice(&elem->mem_slice, 0, &src_dptr))
> +			ret = EFAULT;
> +		if (bpf_ringbuf_reserve_dynptr(&ringbuf, size, 0, &dst_dptr))
> +			ret = EFAULT;
> +		if (bpf_dynptr_copy(&dst_dptr, 0, &src_dptr, 0, size))
> +			ret = EFAULT;
> +		bpf_ringbuf_submit_dynptr(&dst_dptr, 0);
> +
> +		written_count++;
> +		written_size += size;
> +
> +		bpf_stream_free_elem(elem);
> +
> +		/* Probe and exit if no more space, probe for twice the typical size. */
> +		if (bpf_ringbuf_reserve_dynptr(&ringbuf, 2048, 0, &dst_dptr))
> +			cont = true;
> +		bpf_ringbuf_discard_dynptr(&dst_dptr, 0);
> +
> +		if (ret || cont)
> +			break;
> +	}
> +
> +	bpf_prog_stream_put(stream);
> +
> +	return ret ? ret : (cont ? EAGAIN : 0);
> +}
> +
> +char _license[] SEC("license") = "Dual BSD/GPL";

Thanks!

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v1 06/11] bpf: Report may_goto timeout to BPF stderr
  2025-05-07 17:17 ` [PATCH bpf-next v1 06/11] bpf: Report may_goto timeout " Kumar Kartikeya Dwivedi
@ 2025-05-08 12:53   ` kernel test robot
  2025-05-09  6:22   ` Eduard Zingerman
  2025-05-09  9:19   ` Alan Maguire
  2 siblings, 0 replies; 55+ messages in thread
From: kernel test robot @ 2025-05-08 12:53 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi; +Cc: oe-kbuild-all

Hi Kumar,

kernel test robot noticed the following build errors:

[auto build test ERROR on 43745d11bfd9683abdf08ad7a5cc403d6a9ffd15]

url:    https://github.com/intel-lab-lkp/linux/commits/Kumar-Kartikeya-Dwivedi/bpf-Introduce-bpf_dynptr_from_mem_slice/20250508-012020
base:   43745d11bfd9683abdf08ad7a5cc403d6a9ffd15
patch link:    https://lore.kernel.org/r/20250507171720.1958296-7-memxor%40gmail.com
patch subject: [PATCH bpf-next v1 06/11] bpf: Report may_goto timeout to BPF stderr
config: microblaze-defconfig (https://download.01.org/0day-ci/archive/20250508/202505082019.euW3a7Kk-lkp@intel.com/config)
compiler: microblaze-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250508/202505082019.euW3a7Kk-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505082019.euW3a7Kk-lkp@intel.com/

All errors (new ones prefixed by >>):

   microblaze-linux-ld: kernel/bpf/core.o: in function `bpf_prog_report_may_goto_violation':
>> kernel/bpf/core.c:3166:(.text+0x6e08): undefined reference to `bpf_prog_stream_error_limit'
>> microblaze-linux-ld: kernel/bpf/core.c:3166:(.text+0x6e28): undefined reference to `bpf_stream_stage_init'
>> microblaze-linux-ld: kernel/bpf/core.c:3166:(.text+0x6e3c): undefined reference to `bpf_stream_stage_printk'
>> microblaze-linux-ld: kernel/bpf/core.c:3166:(.text+0x6e48): undefined reference to `bpf_stream_stage_dump_stack'
>> microblaze-linux-ld: kernel/bpf/core.c:3166:(.text+0x6e5c): undefined reference to `bpf_stream_stage_commit'
>> microblaze-linux-ld: kernel/bpf/core.c:3166:(.text+0x6e68): undefined reference to `bpf_stream_stage_free'


vim +3166 kernel/bpf/core.c

  3158	
  3159	static noinline void bpf_prog_report_may_goto_violation(void)
  3160	{
  3161		struct bpf_prog *prog;
  3162	
  3163		prog = bpf_prog_find_from_stack();
  3164		if (!prog)
  3165			return;
> 3166		bpf_stream_stage(prog, BPF_STDERR, ({
  3167			bpf_stream_printk("ERROR: Timeout detected for may_goto instruction\n");
  3168			bpf_stream_dump_stack();
  3169		}));
  3170	}
  3171	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v1 03/11] bpf: Add function to extract program source info
  2025-05-07 17:17 ` [PATCH bpf-next v1 03/11] bpf: Add function to extract program source info Kumar Kartikeya Dwivedi
  2025-05-08 10:30   ` kernel test robot
@ 2025-05-08 20:15   ` Eduard Zingerman
  2025-05-08 23:32     ` Kumar Kartikeya Dwivedi
  2025-05-09 21:17   ` Andrii Nakryiko
  2 siblings, 1 reply; 55+ messages in thread
From: Eduard Zingerman @ 2025-05-08 20:15 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Wed, 2025-05-07 at 10:17 -0700, Kumar Kartikeya Dwivedi wrote:
> Prepare a function for use in future patches that can extract the file
> info, line info, and the source line number for a given BPF program
> given its program counter.
> 
> Only the basename of the file path is provided, given it can be
> excessively long in some cases.
> 
> This will be used in later patches to print source info to the BPF
> stream. The source line number is indicated by the return value, and the
> file and line info are provided through out parameters.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Hi Kumar,

I did a silly test for this function by calling it for every ip in the
program at the end of program load. See the patch at the end of this
email. The goal was to compare its output with the output of `bpftool
prog dump jited`.

Next, I used pyperf600_iter.bpf.o as a guinea pig:

  bpftool prog load <kernel>/tools/testing/selftests/bpf/pyperf600_iter.bpf.o /sys/fs/bpf/dbg-prog
  bpftool prog dump jited pinned /sys/fs/bpf/dbg-prog

Overall, the bpftool output is consistent with what is shown by printk.
However, I see an off-by-one difference, e.g.:

  // bpftool output

  void * get_thread_state(void * tls_base, PidData * pidData):
  bpf_prog_2af5b1ca414a1163_get_thread_state:
  ; static void *get_thread_state(void *tls_base, PidData *pidData)
     0:	endbr64
     ...
  ; bpf_probe_read_user(&key, sizeof(key), (void*)(long)pidData->tls_key_addr);
    1f:	movl	4(%rsi), %edx
    ...
  ; bpf_probe_read_user(&key, sizeof(key), (void*)(long)pidData->tls_key_addr);
    29:	movl	$4, %esi
    ...
  ; tls_base + 0x310 + key * 0x10 + 0x08);
    33:	movl	-12(%rbp), %edi
    ...
  ; bpf_probe_read_user(&thread_state, sizeof(thread_state),
    52:	movl	$8, %esi
    ...
  ; return thread_state;
    5f:	movq	-8(%rbp), %rax
    ...
  
  // printk

  [  114.506237] func[2] jited_len=106
  [  114.506306] ip=0, file='(null)', line='(null)', line_num=-2
  [  114.506395] ip=1, file='pyperf.h', line='static void *get_thread_state(void *tls_base, PidData *pidData)', line_num=77
  [  114.506571] ip=20, file='pyperf.h', line='bpf_probe_read_user(&key, sizeof(key), (void*)(long)pidData->tls_key_addr);', line_num=82
  [  114.506765] ip=34, file='pyperf.h', line='tls_base + 0x310 + key * 0x10 + 0x08);', line_num=84
  [  114.506919] ip=53, file='pyperf.h', line='bpf_probe_read_user(&thread_state, sizeof(thread_state),', line_num=83
  [  114.507096] ip=60, file='pyperf.h', line='return thread_state;', line_num=85
  
Note that ip for each printk entry is +1 compared to bpftool output.

Also, there is a BUG splat from KASAN at the end:

  [    2.343160] ==================================================================
  [    2.343277] BUG: KASAN: slab-out-of-bounds in bpf_prog_get_file_line (kernel/bpf/core.c:3213) 
  [    2.343397] Read of size 4 at addr ffff88810b5ea810 by task veristat/145
  [    2.343496] 
  [    2.343542] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-3.fc41 04/01/2014
  [    2.343544] Call Trace:
  ...
  [    2.343592] bpf_prog_get_file_line (kernel/bpf/core.c:3213) 
  [    2.343598] ?bpf_prog_2af5b1ca414a1163_get_thread_state+0x64/0x6a 85 
  [    2.343602] bpf_prog_load (kernel/bpf/syscall.c:3014) 
  ...
  [    2.343686] 
  [    2.346851] Allocated by task 145:
  [    2.346912] kasan_save_track (mm/kasan/common.c:48 mm/kasan/common.c:68) 
  [    2.346974] __kasan_kmalloc (mm/kasan/common.c:398) 
  [    2.347036] __kvmalloc_node_noprof (mm/slub.c:4342 mm/slub.c:5026) 
  [    2.347117] check_btf_info (kernel/bpf/verifier.c:17908 kernel/bpf/verifier.c:18120) 
  [    2.347179] bpf_check (kernel/bpf/verifier.c:24004) 
  [    2.347240] bpf_prog_load (kernel/bpf/syscall.c:2971) 
  [    2.347301] __sys_bpf (kernel/bpf/syscall.c:5897) 
  [    2.347363] __x64_sys_bpf (kernel/bpf/syscall.c:5958 kernel/bpf/syscall.c:5956 kernel/bpf/syscall.c:5956) 
  [    2.347423] do_syscall_64 (arch/x86/entry/syscall_64.c:0) 
  [    2.347484] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) 
  [    2.347566] 
  [    2.347607] The buggy address belongs to the object at ffff88810b5ea000
  [    2.347607]  which belongs to the cache kmalloc-4k of size 4096
  [    2.347782] The buggy address is located 0 bytes to the right of
  [    2.347782]  allocated 2064-byte region [ffff88810b5ea000, ffff88810b5ea810)

Am I doing something stupid, or is there an issue?

--- 8< -------------------------------------------

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 4664ab5e8cc7..467ae79f77a1 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -3188,6 +3188,7 @@ EXPORT_SYMBOL(bpf_stats_enabled_key);
 EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_exception);
 EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_bulk_tx);
 
+__attribute__((optnone)) // to see line numbers after decode_stacktrace
 int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep, const char **linep)
 {
        int idx = -1, insn_start, insn_end, len;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 64c3393e8270..d1777b8c5558 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3001,6 +3001,23 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
        err = bpf_prog_new_fd(prog);
        if (err < 0)
                bpf_prog_put(prog);
+       for (int fidx = 0; fidx < prog->aux->func_cnt; ++fidx) {
+               struct bpf_prog *fprog = prog->aux->func[fidx];
+               int line_num, prev_line_num;
+               const char *filep, *linep;
+
+               prev_line_num = -1;
+               printk("func[%d] jited_len=%d\n", fidx, fprog->jited_len);
+               for (u32 ip = 0; ip < fprog->jited_len; ++ip) {
+                       filep = NULL;
+                       linep = NULL;
+                       line_num = bpf_prog_get_file_line(fprog, (u64)fprog->bpf_func + ip, &filep, &linep);
+                       if (line_num != prev_line_num)
+                               printk("ip=%x, file='%s', line='%s', line_num=%d\n",
+                                      ip, filep, linep, line_num);
+                       prev_line_num = line_num;
+               }
+       }
        return err;
------------------------------------------- >8 ---


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v1 05/11] bpf: Add dump_stack() analogue to print to BPF stderr
  2025-05-07 17:17 ` [PATCH bpf-next v1 05/11] bpf: Add dump_stack() analogue to print to BPF stderr Kumar Kartikeya Dwivedi
@ 2025-05-08 22:38   ` Eduard Zingerman
  2025-05-08 23:29     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 55+ messages in thread
From: Eduard Zingerman @ 2025-05-08 22:38 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Wed, 2025-05-07 at 10:17 -0700, Kumar Kartikeya Dwivedi wrote:

[...]

> +static bool dump_stack_cb(void *cookie, u64 ip, u64 sp, u64 bp)
> +{
> +	struct dump_stack_ctx *ctxp = cookie;
> +	const char *file = "", *line = "";
> +	struct bpf_prog *prog;
> +	int num;
> +
> +	if (is_bpf_text_address(ip)) {
> +		prog = bpf_prog_ksym_find(ip);
> +		num = bpf_prog_get_file_line(prog, ip, &file, &line);
> +		if (num == -1)
> +			goto end;

Should this be `num < 0` ?
bpf_prog_get_file_line() can return -EINVAL and -ENOENT.

> +		ctxp->err = bpf_stream_stage_printk(ctxp->ss, "%pS\n  %s @ %s:%d\n",
> +						    (void *)ip, line, file, num);
> +		return !ctxp->err;
> +	}
> +end:
> +	ctxp->err = bpf_stream_stage_printk(ctxp->ss, "%pS\n", (void *)ip);
> +	return !ctxp->err;
> +}

[...]


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v1 04/11] bpf: Add function to find program from stack trace
  2025-05-07 17:17 ` [PATCH bpf-next v1 04/11] bpf: Add function to find program from stack trace Kumar Kartikeya Dwivedi
@ 2025-05-08 23:07   ` Eduard Zingerman
  2025-05-08 23:29     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 55+ messages in thread
From: Eduard Zingerman @ 2025-05-08 23:07 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Wed, 2025-05-07 at 10:17 -0700, Kumar Kartikeya Dwivedi wrote:
> In preparation of figuring out the closest program that led to the
> current point in the kernel, implement a function that scans through the
> stack trace and finds out the closest BPF program when walking down the
> stack trace.
> 
> Special care needs to be taken to skip over kernel and BPF subprog
> frames. We basically scan until we find a BPF main prog frame. The
> assumption is that if a program calls into us transitively, we'll
> hit it along the way. If not, we end up returning NULL.
> 
> Contextually the function will be used in places where we know the
> program may have called into us.
> 
> Due to reliance on arch_bpf_stack_walk(), this function only works on
> x86 with CONFIG_UNWINDER_ORC, arm64, and s390. Remove the warning from
> arch_bpf_stack_walk as well since we call it outside bpf_throw()
> context.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index df1bae084abd..dcb665bff22f 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -3244,3 +3244,29 @@ int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char *
>  		*linep += 1;
>  	return BPF_LINE_INFO_LINE_NUM(linfo[idx].line_col);
>  }
> +
> +struct walk_stack_ctx {
> +	struct bpf_prog *prog;
> +};
> +
> +static bool find_from_stack_cb(void *cookie, u64 ip, u64 sp, u64 bp)
> +{
> +	struct walk_stack_ctx *ctxp = cookie;
> +	struct bpf_prog *prog;
> +
> +	if (!is_bpf_text_address(ip))
> +		return true;
> +	prog = bpf_prog_ksym_find(ip);

Nit: both bpf_prog_ksym_find() and is_bpf_text_address()
     use bpf_ksym_find(), so it ends up called twice.

> +	if (bpf_is_subprog(prog))
> +		return true;
> +	ctxp->prog = prog;
> +	return false;
> +}
> +
> +struct bpf_prog *bpf_prog_find_from_stack(void)
> +{
> +	struct walk_stack_ctx ctx = {};
> +
> +	arch_bpf_stack_walk(find_from_stack_cb, &ctx);
> +	return ctx.prog;
> +}



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v1 05/11] bpf: Add dump_stack() analogue to print to BPF stderr
  2025-05-08 22:38   ` Eduard Zingerman
@ 2025-05-08 23:29     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-08 23:29 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Fri, 9 May 2025 at 00:38, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2025-05-07 at 10:17 -0700, Kumar Kartikeya Dwivedi wrote:
>
> [...]
>
> > +static bool dump_stack_cb(void *cookie, u64 ip, u64 sp, u64 bp)
> > +{
> > +     struct dump_stack_ctx *ctxp = cookie;
> > +     const char *file = "", *line = "";
> > +     struct bpf_prog *prog;
> > +     int num;
> > +
> > +     if (is_bpf_text_address(ip)) {
> > +             prog = bpf_prog_ksym_find(ip);
> > +             num = bpf_prog_get_file_line(prog, ip, &file, &line);
> > +             if (num == -1)
> > +                     goto end;
>
> Should this be `num < 0` ?
> bpf_prog_get_file_line() can return -EINVAL and -ENOENT.

My bad, I modified the error code when cleaning it up but forgot to
change the caller. Thanks for catching it.

>
> > +             ctxp->err = bpf_stream_stage_printk(ctxp->ss, "%pS\n  %s @ %s:%d\n",
> > +                                                 (void *)ip, line, file, num);
> > +             return !ctxp->err;
> > +     }
> > +end:
> > +     ctxp->err = bpf_stream_stage_printk(ctxp->ss, "%pS\n", (void *)ip);
> > +     return !ctxp->err;
> > +}
>
> [...]
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v1 04/11] bpf: Add function to find program from stack trace
  2025-05-08 23:07   ` Eduard Zingerman
@ 2025-05-08 23:29     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-08 23:29 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Fri, 9 May 2025 at 01:07, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2025-05-07 at 10:17 -0700, Kumar Kartikeya Dwivedi wrote:
> > In preparation of figuring out the closest program that led to the
> > current point in the kernel, implement a function that scans through the
> > stack trace and finds out the closest BPF program when walking down the
> > stack trace.
> >
> > Special care needs to be taken to skip over kernel and BPF subprog
> > frames. We basically scan until we find a BPF main prog frame. The
> > assumption is that if a program calls into us transitively, we'll
> > hit it along the way. If not, we end up returning NULL.
> >
> > Contextually the function will be used in places where we know the
> > program may have called into us.
> >
> > Due to reliance on arch_bpf_stack_walk(), this function only works on
> > x86 with CONFIG_UNWINDER_ORC, arm64, and s390. Remove the warning from
> > arch_bpf_stack_walk as well since we call it outside bpf_throw()
> > context.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> [...]
>
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index df1bae084abd..dcb665bff22f 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -3244,3 +3244,29 @@ int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char *
> >               *linep += 1;
> >       return BPF_LINE_INFO_LINE_NUM(linfo[idx].line_col);
> >  }
> > +
> > +struct walk_stack_ctx {
> > +     struct bpf_prog *prog;
> > +};
> > +
> > +static bool find_from_stack_cb(void *cookie, u64 ip, u64 sp, u64 bp)
> > +{
> > +     struct walk_stack_ctx *ctxp = cookie;
> > +     struct bpf_prog *prog;
> > +
> > +     if (!is_bpf_text_address(ip))
> > +             return true;
> > +     prog = bpf_prog_ksym_find(ip);
>
> Nit: both bpf_prog_ksym_find() and is_bpf_text_address()
>      use bpf_ksym_find(), so it ends up called twice.
>

Good point, will fix.

> > +     if (bpf_is_subprog(prog))
> > +             return true;
> > +     ctxp->prog = prog;
> > +     return false;
> > +}
> > +
> > +struct bpf_prog *bpf_prog_find_from_stack(void)
> > +{
> > +     struct walk_stack_ctx ctx = {};
> > +
> > +     arch_bpf_stack_walk(find_from_stack_cb, &ctx);
> > +     return ctx.prog;
> > +}
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH bpf-next v1 09/11] libbpf: Add bpf_stream_printk() macro
  2025-05-07 17:17 ` [PATCH bpf-next v1 09/11] libbpf: Add bpf_stream_printk() macro Kumar Kartikeya Dwivedi
@ 2025-05-08 23:31   ` Eduard Zingerman
  2025-05-08 23:33     ` Kumar Kartikeya Dwivedi
  2025-05-08 23:41   ` Alexei Starovoitov
  2025-05-09 21:26   ` Andrii Nakryiko
  2 siblings, 1 reply; 55+ messages in thread
From: Eduard Zingerman @ 2025-05-08 23:31 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Wed, 2025-05-07 at 10:17 -0700, Kumar Kartikeya Dwivedi wrote:

[...]

> diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> index a50773d4616e..1a748c21e358 100644
> --- a/tools/lib/bpf/bpf_helpers.h
> +++ b/tools/lib/bpf/bpf_helpers.h

[...]

>  /* Use __bpf_printk when bpf_printk call has 3 or fewer fmt args
> - * Otherwise use __bpf_vprintk
> + * Otherwise use __bpf_vprintk. Virtualize choices so stream printk
> + * can override it to bpf_stream_vprintk.
>   */
> -#define ___bpf_pick_printk(...) \
> -	___bpf_nth(_, ##__VA_ARGS__, __bpf_vprintk, __bpf_vprintk, __bpf_vprintk,	\
> -		   __bpf_vprintk, __bpf_vprintk, __bpf_vprintk, __bpf_vprintk,		\
> -		   __bpf_vprintk, __bpf_vprintk, __bpf_printk /*3*/, __bpf_printk /*2*/,\
> -		   __bpf_printk /*1*/, __bpf_printk /*0*/)
> +#define ___bpf_pick_printk(choice, choice_3, ...)			\
> +	___bpf_nth(_, ##__VA_ARGS__, choice, choice, choice,		\
> +		   choice, choice, choice, choice,			\
> +		   choice, choice, choice_3 /*3*/, choice_3 /*2*/,	\
> +		   choice_3 /*1*/, choice_3 /*0*/)
>  
>  /* Helper macro to print out debug messages */
> -#define bpf_printk(fmt, args...) ___bpf_pick_printk(args)(fmt, ##args)
> +#define __bpf_trace_printk(fmt, args...) \
> +	___bpf_pick_printk(__bpf_vprintk, __bpf_printk, args)(fmt, ##args)
> +#define __bpf_stream_printk(stream, fmt, args...) \
> +	___bpf_pick_printk(__bpf_stream_vprintk, __bpf_stream_vprintk, args)(stream, fmt, ##args)
                           ^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^
                           These two parameters are identical,
                           why is ___bpf_pick_printk necessary in such a case?
> +
> +#define bpf_stream_printk(stream, fmt, args...) __bpf_stream_printk(stream, fmt, ##args)
> +
> +#define bpf_printk(arg, args...) __bpf_trace_printk(arg, ##args)
>  
>  struct bpf_iter_num;
>  




* Re: [PATCH bpf-next v1 03/11] bpf: Add function to extract program source info
  2025-05-08 20:15   ` Eduard Zingerman
@ 2025-05-08 23:32     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-08 23:32 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Thu, 8 May 2025 at 22:15, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2025-05-07 at 10:17 -0700, Kumar Kartikeya Dwivedi wrote:
> > Prepare a function for use in future patches that can extract the file
> > info, line info, and the source line number for a given BPF program
> > provided its program counter.
> >
> > Only the basename of the file path is provided, given it can be
> > excessively long in some cases.
> >
> > This will be used in later patches to print source info to the BPF
> > stream. The source line number is indicated by the return value, and the
> > file and line info are provided through out parameters.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
>
> Hi Kumar,
>
> I did a silly test for this function by calling it for every ip in the
> program at the end of the program load. See patch at the end of the
> email. The goal was to compare its output with output of the `bpftool
> prog dump jited`.
>
> Next, I used pyperf600_iter.bpf.o as a guinea pig:
>
>   bpftool prog load <kernel>/tools/testing/selftests/bpf/pyperf600_iter.bpf.o /sys/fs/bpf/dbg-prog
>   bpftool prog dump jited pinned /sys/fs/bpf/dbg-prog
>
> Overall, the bpftool output looks consistent with what is shown by printk.
> However, I see an off-by-one difference, e.g.:
>
>   // bpftool output
>
>   void * get_thread_state(void * tls_base, PidData * pidData):
>   bpf_prog_2af5b1ca414a1163_get_thread_state:
>   ; static void *get_thread_state(void *tls_base, PidData *pidData)
>      0: endbr64
>      ...
>   ; bpf_probe_read_user(&key, sizeof(key), (void*)(long)pidData->tls_key_addr);
>     1f: movl    4(%rsi), %edx
>     ...
>   ; bpf_probe_read_user(&key, sizeof(key), (void*)(long)pidData->tls_key_addr);
>     29: movl    $4, %esi
>     ...
>   ; tls_base + 0x310 + key * 0x10 + 0x08);
>     33: movl    -12(%rbp), %edi
>     ...
>   ; bpf_probe_read_user(&thread_state, sizeof(thread_state),
>     52: movl    $8, %esi
>     ...
>   ; return thread_state;
>     5f: movq    -8(%rbp), %rax
>     ...
>
>   // printk
>
>   [  114.506237] func[2] jited_len=106
>   [  114.506306] ip=0, file='(null)', line='(null)', line_num=-2
>   [  114.506395] ip=1, file='pyperf.h', line='static void *get_thread_state(void *tls_base, PidData *pidData)', line_num=77
>   [  114.506571] ip=20, file='pyperf.h', line='bpf_probe_read_user(&key, sizeof(key), (void*)(long)pidData->tls_key_addr);', line_num=82
>   [  114.506765] ip=34, file='pyperf.h', line='tls_base + 0x310 + key * 0x10 + 0x08);', line_num=84
>   [  114.506919] ip=53, file='pyperf.h', line='bpf_probe_read_user(&thread_state, sizeof(thread_state),', line_num=83
>   [  114.507096] ip=60, file='pyperf.h', line='return thread_state;', line_num=85
>
> Note that ip for each printk entry is +1 compared to bpftool output.
>
> Also, there is a BUG splat from KASAN in the end:
>
>   [    2.343160] ==================================================================
>   [    2.343277] BUG: KASAN: slab-out-of-bounds in bpf_prog_get_file_line (kernel/bpf/core.c:3213)
>   [    2.343397] Read of size 4 at addr ffff88810b5ea810 by task veristat/145
>   [    2.343496]
>   [    2.343542] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-3.fc41 04/01/2014
>   [    2.343544] Call Trace:
>   ...
>   [    2.343592] bpf_prog_get_file_line (kernel/bpf/core.c:3213)
>   [    2.343598] ?bpf_prog_2af5b1ca414a1163_get_thread_state+0x64/0x6a 85
>   [    2.343602] bpf_prog_load (kernel/bpf/syscall.c:3014)
>   ...
>   [    2.343686]
>   [    2.346851] Allocated by task 145:
>   [    2.346912] kasan_save_track (mm/kasan/common.c:48 mm/kasan/common.c:68)
>   [    2.346974] __kasan_kmalloc (mm/kasan/common.c:398)
>   [    2.347036] __kvmalloc_node_noprof (mm/slub.c:4342 mm/slub.c:5026)
>   [    2.347117] check_btf_info (kernel/bpf/verifier.c:17908 kernel/bpf/verifier.c:18120)
>   [    2.347179] bpf_check (kernel/bpf/verifier.c:24004)
>   [    2.347240] bpf_prog_load (kernel/bpf/syscall.c:2971)
>   [    2.347301] __sys_bpf (kernel/bpf/syscall.c:5897)
>   [    2.347363] __x64_sys_bpf (kernel/bpf/syscall.c:5958 kernel/bpf/syscall.c:5956 kernel/bpf/syscall.c:5956)
>   [    2.347423] do_syscall_64 (arch/x86/entry/syscall_64.c:0)
>   [    2.347484] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
>   [    2.347566]
>   [    2.347607] The buggy address belongs to the object at ffff88810b5ea000
>   [    2.347607]  which belongs to the cache kmalloc-4k of size 4096
>   [    2.347782] The buggy address is located 0 bytes to the right of
>   [    2.347782]  allocated 2064-byte region [ffff88810b5ea000, ffff88810b5ea810)
>
> Am I doing something stupid, or is there an issue?
>
> --- 8< -------------------------------------------
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 4664ab5e8cc7..467ae79f77a1 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -3188,6 +3188,7 @@ EXPORT_SYMBOL(bpf_stats_enabled_key);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_exception);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_bulk_tx);
>
> +__attribute__((optnone)) // to see line numbers after decode_stacktrace
>  int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep, const char **linep)
>  {
>         int idx = -1, insn_start, insn_end, len;
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 64c3393e8270..d1777b8c5558 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3001,6 +3001,23 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
>         err = bpf_prog_new_fd(prog);
>         if (err < 0)
>                 bpf_prog_put(prog);
> +       for (int fidx = 0; fidx < prog->aux->func_cnt; ++fidx) {
> +               struct bpf_prog *fprog = prog->aux->func[fidx];
> +               int line_num, prev_line_num;
> +               const char *filep, *linep;
> +
> +               prev_line_num = -1;
> +               printk("func[%d] jited_len=%d\n", fidx, fprog->jited_len);
> +               for (u32 ip = 0; ip < fprog->jited_len; ++ip) {
> +                       filep = NULL;
> +                       linep = NULL;
> +                       line_num = bpf_prog_get_file_line(fprog, (u64)fprog->bpf_func + ip, &filep, &linep);
> +                       if (line_num != prev_line_num)
> +                               printk("ip=%x, file='%s', line='%s', line_num=%d\n",
> +                                      ip, filep, linep, line_num);
> +                       prev_line_num = line_num;
> +               }
> +       }
>         return err;
> ------------------------------------------- >8 ---
>

Thanks for trying it out. The ip slip is probably because at runtime we
get the return address, which always trails the actual ip of the
instruction that called into us.
I will look into the KASAN error / off-by-one.
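
A minimal sketch of the effect described above, assuming the lookup
treats its ip argument as a return address and resolves the byte just
before it. That assumption is not confirmed in this thread. The table
pairs the jited offsets from the bpftool dump with the line numbers from
the printk output; line_for_insn(), line_for_return_address() and
NO_LINE are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NO_LINE (-1) /* hypothetical "no line info" marker */

struct line_entry {
	unsigned int start; /* first jited offset covered by this entry */
	int line;           /* source line number */
};

/* Offsets from the bpftool dump, line numbers from the printk output. */
static const struct line_entry table[] = {
	{ 0x00, 77 }, { 0x1f, 82 }, { 0x33, 84 }, { 0x52, 83 }, { 0x5f, 85 },
};

/* Exact lookup: the last entry whose start is <= off. */
static int line_for_insn(unsigned int off)
{
	int line = NO_LINE;
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (table[i].start > off)
			break;
		line = table[i].line;
	}
	return line;
}

/* A return address points just past the call, so resolve ret_off - 1. */
static int line_for_return_address(unsigned int ret_off)
{
	if (ret_off == 0)
		return NO_LINE;
	return line_for_insn(ret_off - 1);
}

/* Checks the sketch against the boundaries seen in the two dumps. */
static void check_against_dumps(void)
{
	assert(line_for_insn(0x1f) == 82);           /* bpftool view */
	assert(line_for_return_address(0x1f) == 77); /* still the previous line */
	assert(line_for_return_address(0x20) == 82); /* changes one byte later */
	assert(line_for_return_address(0x00) == NO_LINE);
}
```

Under this assumption, feeding raw instruction offsets into a lookup that
expects return addresses moves every line boundary one byte later, which
matches the ip=1, ip=20, ip=34, ip=53 and ip=60 transitions in the printk
output.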


* Re: [PATCH bpf-next v1 09/11] libbpf: Add bpf_stream_printk() macro
  2025-05-08 23:31   ` Eduard Zingerman
@ 2025-05-08 23:33     ` Kumar Kartikeya Dwivedi
  2025-05-09  6:16       ` Eduard Zingerman
  0 siblings, 1 reply; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-08 23:33 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Fri, 9 May 2025 at 01:31, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2025-05-07 at 10:17 -0700, Kumar Kartikeya Dwivedi wrote:
>
> [...]
>
> > diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> > index a50773d4616e..1a748c21e358 100644
> > --- a/tools/lib/bpf/bpf_helpers.h
> > +++ b/tools/lib/bpf/bpf_helpers.h
>
> [...]
>
> >  /* Use __bpf_printk when bpf_printk call has 3 or fewer fmt args
> > - * Otherwise use __bpf_vprintk
> > + * Otherwise use __bpf_vprintk. Virtualize choices so stream printk
> > + * can override it to bpf_stream_vprintk.
> >   */
> > -#define ___bpf_pick_printk(...) \
> > -     ___bpf_nth(_, ##__VA_ARGS__, __bpf_vprintk, __bpf_vprintk, __bpf_vprintk,       \
> > -                __bpf_vprintk, __bpf_vprintk, __bpf_vprintk, __bpf_vprintk,          \
> > -                __bpf_vprintk, __bpf_vprintk, __bpf_printk /*3*/, __bpf_printk /*2*/,\
> > -                __bpf_printk /*1*/, __bpf_printk /*0*/)
> > +#define ___bpf_pick_printk(choice, choice_3, ...)                    \
> > +     ___bpf_nth(_, ##__VA_ARGS__, choice, choice, choice,            \
> > +                choice, choice, choice, choice,                      \
> > +                choice, choice, choice_3 /*3*/, choice_3 /*2*/,      \
> > +                choice_3 /*1*/, choice_3 /*0*/)
> >
> >  /* Helper macro to print out debug messages */
> > -#define bpf_printk(fmt, args...) ___bpf_pick_printk(args)(fmt, ##args)
> > +#define __bpf_trace_printk(fmt, args...) \
> > +     ___bpf_pick_printk(__bpf_vprintk, __bpf_printk, args)(fmt, ##args)
> > +#define __bpf_stream_printk(stream, fmt, args...) \
> > +     ___bpf_pick_printk(__bpf_stream_vprintk, __bpf_stream_vprintk, args)(stream, fmt, ##args)
>                            ^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^
>                            These two parameters are identical,
>                            why is ___bpf_pick_printk necessary in such a case?

In our case choice and choice_3 are the same, but for bpf_printk
they're different. I was mostly trying to reuse the pick_printk
machinery for both (which dispatches correctly to the actual macro).
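
A self-contained sketch of that dispatch, assuming GNU C (named variadic
macros and comma-swallowing ##__VA_ARGS__, as bpf_helpers.h already
uses). NTH and PICK mirror the shape of ___bpf_nth and the reworked
___bpf_pick_printk; via_fixed(), via_array(), TRACE_ROUTE and
STREAM_ROUTE are hypothetical stand-ins, not libbpf names.

```c
#include <assert.h>

enum route { VIA_FIXED_ARGS, VIA_ARG_ARRAY };

/* Stand-ins for the fixed-argument and array-based printk paths. */
static enum route via_fixed(const char *fmt) { (void)fmt; return VIA_FIXED_ARGS; }
static enum route via_array(const char *fmt) { (void)fmt; return VIA_ARG_ARRAY; }

/* Select the 14th argument, whatever precedes it. */
#define NTH(_, _1, _2, _3, _4, _5, _6, _7, _8, _9, _a, _b, _c, N, ...) N

/* Each format argument shifts the choice list right by one slot, so
 * 0..3 arguments land on choice_3 and 4..12 arguments land on choice.
 */
#define PICK(choice, choice_3, ...)				\
	NTH(_, ##__VA_ARGS__, choice, choice, choice,		\
	    choice, choice, choice, choice,			\
	    choice, choice, choice_3 /*3*/, choice_3 /*2*/,	\
	    choice_3 /*1*/, choice_3 /*0*/)

/* Two different targets, as for bpf_printk. */
#define TRACE_ROUTE(fmt, args...) PICK(via_array, via_fixed, args)(fmt)
/* Same target in both slots, as for the stream variant. */
#define STREAM_ROUTE(fmt, args...) PICK(via_array, via_array, args)(fmt)

/* Returns 1 when every call is routed as described above. */
static int routes_ok(void)
{
	return TRACE_ROUTE("no args") == VIA_FIXED_ARGS &&
	       TRACE_ROUTE("%d %d %d", 1, 2, 3) == VIA_FIXED_ARGS &&
	       TRACE_ROUTE("%d %d %d %d", 1, 2, 3, 4) == VIA_ARG_ARRAY &&
	       STREAM_ROUTE("no args") == VIA_ARG_ARRAY &&
	       STREAM_ROUTE("%d %d %d %d", 1, 2, 3, 4) == VIA_ARG_ARRAY;
}
```

With identical choices the selection has no visible effect, which is the
redundancy the question points at; the shared macro only changes the
outcome when the two slots name different targets.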

> > +
> > +#define bpf_stream_printk(stream, fmt, args...) __bpf_stream_printk(stream, fmt, ##args)
> > +
> > +#define bpf_printk(arg, args...) __bpf_trace_printk(arg, ##args)
> >
> >  struct bpf_iter_num;
> >
>
>


* Re: [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams
  2025-05-08 10:41   ` Quentin Monnet
@ 2025-05-08 23:41     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-08 23:41 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, kernel-team

On Thu, 8 May 2025 at 12:41, Quentin Monnet <qmo@kernel.org> wrote:
>
> On 07/05/2025 18:17, Kumar Kartikeya Dwivedi wrote:
> > Add bpftool support for dumping streams of a given BPF program.
> > The syntax is `bpftool prog tracelog { stdout | stderr } PROG`.
> > The stdout is dumped to stdout, stderr is dumped to stderr.
> >
> > Cc: Quentin Monnet <qmo@kernel.org>
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  .../bpftool/Documentation/bpftool-prog.rst    |  6 ++
> >  tools/bpf/bpftool/Makefile                    |  2 +-
> >  tools/bpf/bpftool/bash-completion/bpftool     | 16 +++-
> >  tools/bpf/bpftool/prog.c                      | 88 ++++++++++++++++++-
> >  tools/bpf/bpftool/skeleton/stream.bpf.c       | 69 +++++++++++++++
> >  5 files changed, 178 insertions(+), 3 deletions(-)
> >  create mode 100644 tools/bpf/bpftool/skeleton/stream.bpf.c
> >
> > diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> > index d6304e01afe0..258e16ee8def 100644
> > --- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> > +++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> > @@ -173,6 +173,12 @@ bpftool prog tracelog
> >      purposes. For streaming data from BPF programs to user space, one can use
> >      perf events (see also **bpftool-map**\ (8)).
> >
> > +bpftool prog tracelog { stdout | stderr } *PROG*
> > +    Dump the BPF stream of the program. BPF programs can write to these streams
> > +    at runtime with the **bpf_stream_vprintk**\ () kfunc. The kernel may write
> > +    error messages to the standard error stream. This facility should be used
> > +    only for debugging purposes.
>
>
> Thanks! The syntax "bpftool prog tracelog stdout/stderr <prog>" works
> well for me.
>
> Can you also update the short description line at the top of the file
> too? Should be:
>
>     | **bpftool** **prog tracelog** [ { **stdout** | **stderr** } *PROG* ]
>

Will do.

>
> > +
> >  bpftool prog run *PROG* data_in *FILE* [data_out *FILE* [data_size_out *L*]] [ctx_in *FILE* [ctx_out *FILE* [ctx_size_out *M*]]] [repeat *N*]
> >      Run BPF program *PROG* in the kernel testing infrastructure for BPF,
> >      meaning that the program works on the data and context provided by the
> > diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
> > index 9e9a5f006cd2..eb908223c3bb 100644
> > --- a/tools/bpf/bpftool/Makefile
> > +++ b/tools/bpf/bpftool/Makefile
> > @@ -234,7 +234,7 @@ $(OUTPUT)%.bpf.o: skeleton/%.bpf.c $(OUTPUT)vmlinux.h $(LIBBPF_BOOTSTRAP)
> >  $(OUTPUT)%.skel.h: $(OUTPUT)%.bpf.o $(BPFTOOL_BOOTSTRAP)
> >       $(QUIET_GEN)$(BPFTOOL_BOOTSTRAP) gen skeleton $< > $@
> >
> > -$(OUTPUT)prog.o: $(OUTPUT)profiler.skel.h
> > +$(OUTPUT)prog.o: $(OUTPUT)profiler.skel.h $(OUTPUT)stream.skel.h
> >
> >  $(OUTPUT)pids.o: $(OUTPUT)pid_iter.skel.h
> >
> > diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
> > index 1ce409a6cbd9..c7c0bf3aee24 100644
> > --- a/tools/bpf/bpftool/bash-completion/bpftool
> > +++ b/tools/bpf/bpftool/bash-completion/bpftool
> > @@ -518,7 +518,21 @@ _bpftool()
> >                      esac
> >                      ;;
> >                  tracelog)
> > -                    return 0
> > +                    case $prev in
> > +                        $command)
> > +                            COMPREPLY+=( $( compgen -W "stdout stderr" -- \
> > +                                "$cur" ) )
> > +                            return 0
> > +                            ;;
> > +                        stdout|stderr)
> > +                            COMPREPLY=( $( compgen -W "$PROG_TYPE" -- \
> > +                                "$cur" ) )
> > +                            return 0
> > +                            ;;
> > +                        *)
> > +                            return 0
> > +                            ;;
> > +                    esac
>
>
> Works well, thanks for this!
>
>
> >                      ;;
> >                  profile)
> >                      case $cword in
> > diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> > index f010295350be..7abe4698c86c 100644
> > --- a/tools/bpf/bpftool/prog.c
> > +++ b/tools/bpf/bpftool/prog.c
> > @@ -35,6 +35,8 @@
> >  #include "main.h"
> >  #include "xlated_dumper.h"
> >
> > +#include "stream.skel.h"
> > +
> >  #define BPF_METADATA_PREFIX "bpf_metadata_"
> >  #define BPF_METADATA_PREFIX_LEN (sizeof(BPF_METADATA_PREFIX) - 1)
> >
> > @@ -697,6 +699,15 @@ static int do_show(int argc, char **argv)
> >       return err;
> >  }
> >
> > +static int process_stream_sample(void *ctx, void *data, size_t len)
> > +{
> > +     FILE *file = ctx;
> > +
> > +     fprintf(file, "%s", (char *)data);
> > +     fflush(file);
> > +     return 0;
> > +}
> > +
> >  static int
> >  prog_dump(struct bpf_prog_info *info, enum dump_mode mode,
> >         char *filepath, bool opcodes, bool visual, bool linum)
> > @@ -1113,6 +1124,80 @@ static int do_detach(int argc, char **argv)
> >       return 0;
> >  }
> >
> > +enum prog_tracelog_mode {
> > +     TRACE_STDOUT,
> > +     TRACE_STDERR,
> > +};
> > +
> > +static int
> > +prog_tracelog_stream(struct bpf_prog_info *info, enum prog_tracelog_mode mode)
> > +{
> > +     FILE *file = mode == TRACE_STDOUT ? stdout : stderr;
> > +     LIBBPF_OPTS(bpf_test_run_opts, opts);
> > +     struct ring_buffer *ringbuf;
> > +     struct stream_bpf *skel;
> > +     int map_fd, ret = -1;
> > +
> > +     __u32 prog_id = info->id;
> > +     __u32 stream_id = mode == TRACE_STDOUT ? 1 : 2;
> > +
> > +     skel = stream_bpf__open_and_load();
> > +     if (!skel)
> > +             return -errno;
> > +     skel->bss->prog_id = prog_id;
> > +     skel->bss->stream_id = stream_id;
> > +
> > +     map_fd = bpf_map__fd(skel->maps.ringbuf);
> > +     ringbuf = ring_buffer__new(map_fd, process_stream_sample, file, NULL);
> > +     if (!ringbuf) {
> > +             ret = -errno;
> > +             goto end;
> > +     }
> > +     do {
> > +             skel->bss->written_count = skel->bss->written_size = 0;
> > +             ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.bpftool_dump_prog_stream), &opts);
> > +             if (ring_buffer__consume_n(ringbuf, skel->bss->written_count) != skel->bss->written_count) {
> > +                     ret = -EINVAL;
> > +                     goto end;
> > +             }
> > +     } while (!ret && opts.retval == EAGAIN);
> > +
> > +     if (opts.retval != 0)
> > +             ret = -EINVAL;
> > +end:
> > +     stream_bpf__destroy(skel);
> > +     return ret;
> > +}
> > +
> > +
> > +static int do_tracelog_any(int argc, char **argv)
> > +{
> > +     enum prog_tracelog_mode mode;
> > +     struct bpf_prog_info info;
> > +     __u32 info_len = sizeof(info);
> > +     int fd, err;
> > +
> > +     if (argc == 0)
> > +             return do_tracelog(argc, argv);
> > +     if (!is_prefix(*argv, "stdout") && !is_prefix(*argv, "stderr"))
> > +             usage();
> > +     mode = is_prefix(*argv, "stdout") ? TRACE_STDOUT : TRACE_STDERR;
> > +     NEXT_ARG();
> > +
> > +     if (!REQ_ARGS(2))
> > +             return -1;
> > +
> > +     fd = prog_parse_fd(&argc, &argv);
> > +     if (fd < 0)
> > +             return -1;
> > +
> > +     err = bpf_prog_get_info_by_fd(fd, &info, &info_len);
> > +     if (err < 0)
> > +             return -1;
> > +
> > +     return prog_tracelog_stream(&info, mode);
> > +}
> > +
> >  static int check_single_stdin(char *file_data_in, char *file_ctx_in)
> >  {
> >       if (file_data_in && file_ctx_in &&
> > @@ -2483,6 +2568,7 @@ static int do_help(int argc, char **argv)
> >               "                         [repeat N]\n"
> >               "       %1$s %2$s profile PROG [duration DURATION] METRICs\n"
> >               "       %1$s %2$s tracelog\n"
> > +             "       %1$s %2$s tracelog { stdout | stderr } PROG\n"
> >               "       %1$s %2$s help\n"
> >               "\n"
> >               "       " HELP_SPEC_MAP "\n"
> > @@ -2522,7 +2608,7 @@ static const struct cmd cmds[] = {
> >       { "loadall",    do_loadall },
> >       { "attach",     do_attach },
> >       { "detach",     do_detach },
> > -     { "tracelog",   do_tracelog },
> > +     { "tracelog",   do_tracelog_any },
> >       { "run",        do_run },
> >       { "profile",    do_profile },
> >       { 0 }
> > diff --git a/tools/bpf/bpftool/skeleton/stream.bpf.c b/tools/bpf/bpftool/skeleton/stream.bpf.c
> > new file mode 100644
> > index 000000000000..910315959144
> > --- /dev/null
> > +++ b/tools/bpf/bpftool/skeleton/stream.bpf.c
> > @@ -0,0 +1,69 @@
> > +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
> > +#include <vmlinux.h>
> > +#include <bpf/bpf_tracing.h>
> > +#include <bpf/bpf_helpers.h>
> > +
> > +struct {
> > +     __uint(type, BPF_MAP_TYPE_RINGBUF);
> > +     __uint(max_entries, 1024 * 1024);
> > +} ringbuf SEC(".maps");
> > +
> > +int written_size;
> > +int written_count;
> > +int stream_id;
> > +int prog_id;
> > +
> > +#define ENOENT 2
> > +#define EAGAIN 11
> > +#define EFAULT 14
> > +
> > +SEC("syscall")
> > +int bpftool_dump_prog_stream(void *ctx)
> > +{
> > +     struct bpf_stream_elem *elem;
> > +     struct bpf_stream *stream;
> > +     bool cont = false;
> > +     bool ret = 0;
> > +
> > +     stream = bpf_prog_stream_get(stream_id, prog_id);
>
>
> Recalling discussion from RFC:
>
> >> Calls to these new kfuncs will break compilation on older systems that
> >> don't support them yet (and don't have the definition in their
> >> vmlinux.h). We should provide fallback definitions to make sure that the
> >> program compiles.
> >
> > This is the only thing I haven't yet addressed in v2, because it
> > seemed a bit ugly.
> > I tried adding kfunc declarations, but those aren't enough.
> > We rely on structs introduced and read in this patch.
> > So I think vmlinux.h needs to be dropped, but it means adding a lot
> > more than just the declarations, all types, plus any types they
> > transitively depend on.
> > Maybe there is a better way (like detecting compilation failure and skipping?).
> > But if not, I will address like above in v3.
>
> We do have to provide a workaround, or bpftool won't be able to compile
> on any machine that doesn't know the new kfuncs yet.
>
> I don't think there are so many definitions to add (we don't need to
> drop the vmlinux.h), CO-RE should help and if my understanding is
> correct, we should be able to do something like this (on top of your
> patch):
>
>     diff --git a/tools/bpf/bpftool/skeleton/stream.bpf.c b/tools/bpf/bpftool/skeleton/stream.bpf.c
>     index 910315959144..5e3d8f4f68a5 100644
>     --- a/tools/bpf/bpftool/skeleton/stream.bpf.c
>     +++ b/tools/bpf/bpftool/skeleton/stream.bpf.c
>     @@ -1,6 +1,7 @@
>      // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
>      /* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
>      #include <vmlinux.h>
>     +#include <bpf/bpf_core_read.h>
>      #include <bpf/bpf_tracing.h>
>      #include <bpf/bpf_helpers.h>
>
>     @@ -18,10 +19,31 @@ int prog_id;
>      #define EAGAIN 11
>      #define EFAULT 14
>
>     +
>     +struct bpf_mem_slice___local {
>     +   u32 len;
>     +} __attribute__((preserve_access_index));
>     +struct bpf_stream_elem___local {
>     +   struct bpf_mem_slice___local mem_slice;
>     +} __attribute__((preserve_access_index));
>     +
>     +extern struct bpf_stream *bpf_prog_stream_get(int stream_id,
>     +                                         u32 prog_id) __ksym;
>     +extern struct bpf_stream_elem___local *
>     +bpf_stream_next_elem(struct bpf_stream *stream) __ksym;
>     +extern int bpf_dynptr_from_mem_slice(struct bpf_mem_slice___local *mem_slice,
>     +                                u64 flags,
>     +                                struct bpf_dynptr *dptr__uninit) __ksym;
>     +extern void bpf_stream_free_elem(struct bpf_stream_elem___local *elem) __ksym;
>     +extern void bpf_prog_stream_put(struct bpf_stream *stream) __ksym;
>     +extern int bpf_dynptr_copy(struct bpf_dynptr *dst_ptr, u32 dst_off,
>     +                      struct bpf_dynptr *src_ptr, u32 src_off,
>     +                      u32 size) __ksym;
>     +
>      SEC("syscall")
>      int bpftool_dump_prog_stream(void *ctx)
>      {
>     -   struct bpf_stream_elem *elem;
>     +   struct bpf_stream_elem___local *elem;
>         struct bpf_stream *stream;
>         bool cont = false;
>         bool ret = 0;
>     @@ -38,6 +60,7 @@ int bpftool_dump_prog_stream(void *ctx)
>             if (!elem)
>                 break;
>             size = elem->mem_slice.len;
>     +           bpf_core_read(&size, sizeof(u32), &elem->mem_slice.len);
>
>             if (bpf_dynptr_from_mem_slice(&elem->mem_slice, 0, &src_dptr))
>                 ret = EFAULT;
>
>
> The diff above allowed me to compile on a box with a 6.10 kernel,
> although I didn't check that the feature still works with a vmlinux
> generated after applying your changes - please try it.
>
> We should probably find workarounds for older structs and helpers too,
> such as struct bpf_dynptr and bpf_ringbuf_(reserve|discard)_dynptr, but
> I didn't look into it.

That all makes sense. I realized after hitting send that CO-RE can work here.
Will update in v3.

>
>
> > +     if (!stream)
> > +             return ENOENT;
> > +
> > +     bpf_repeat(BPF_MAX_LOOPS) {
> > +             struct bpf_dynptr dst_dptr, src_dptr;
> > +             int size;
> > +
> > +             elem = bpf_stream_next_elem(stream);
> > +             if (!elem)
> > +                     break;
> > +             size = elem->mem_slice.len;
> > +
> > +             if (bpf_dynptr_from_mem_slice(&elem->mem_slice, 0, &src_dptr))
> > +                     ret = EFAULT;
> > +             if (bpf_ringbuf_reserve_dynptr(&ringbuf, size, 0, &dst_dptr))
> > +                     ret = EFAULT;
> > +             if (bpf_dynptr_copy(&dst_dptr, 0, &src_dptr, 0, size))
> > +                     ret = EFAULT;
> > +             bpf_ringbuf_submit_dynptr(&dst_dptr, 0);
> > +
> > +             written_count++;
> > +             written_size += size;
> > +
> > +             bpf_stream_free_elem(elem);
> > +
> > +             /* Probe and exit if no more space, probe for twice the typical size. */
> > +             if (bpf_ringbuf_reserve_dynptr(&ringbuf, 2048, 0, &dst_dptr))
> > +                     cont = true;
> > +             bpf_ringbuf_discard_dynptr(&dst_dptr, 0);
> > +
> > +             if (ret || cont)
> > +                     break;
> > +     }
> > +
> > +     bpf_prog_stream_put(stream);
> > +
> > +     return ret ? ret : (cont ? EAGAIN : 0);
> > +}
> > +
> > +char _license[] SEC("license") = "Dual BSD/GPL";
>
> Thanks!

Great, thanks for taking a look!


* Re: [PATCH bpf-next v1 09/11] libbpf: Add bpf_stream_printk() macro
  2025-05-07 17:17 ` [PATCH bpf-next v1 09/11] libbpf: Add bpf_stream_printk() macro Kumar Kartikeya Dwivedi
  2025-05-08 23:31   ` Eduard Zingerman
@ 2025-05-08 23:41   ` Alexei Starovoitov
  2025-05-08 23:48     ` Kumar Kartikeya Dwivedi
  2025-05-09 21:26   ` Andrii Nakryiko
  2 siblings, 1 reply; 55+ messages in thread
From: Alexei Starovoitov @ 2025-05-08 23:41 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, Kernel Team

On Wed, May 7, 2025 at 10:17 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> Introduce a new macro that allows printing data similar to bpf_printk(),
> but to BPF streams. The first argument is the stream ID, the rest of the
> arguments are same as what one would pass to bpf_printk().
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  kernel/bpf/stream.c         | 10 +++++++--
>  tools/lib/bpf/bpf_helpers.h | 44 +++++++++++++++++++++++++++++++------
>  2 files changed, 45 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/bpf/stream.c b/kernel/bpf/stream.c
> index eaf0574866b1..d64975486ad1 100644
> --- a/kernel/bpf/stream.c
> +++ b/kernel/bpf/stream.c
> @@ -257,7 +257,12 @@ __bpf_kfunc int bpf_stream_vprintk(struct bpf_stream *stream, const char *fmt__s
>         return ret;
>  }
>
> -__bpf_kfunc struct bpf_stream *bpf_stream_get(enum bpf_stream_id stream_id, void *aux__ign)
> +/* Use int vs enum stream_id here, we use this kfunc in bpf_helpers.h, and
> + * keeping enum stream_id necessitates a complete definition of enum, but we
> + * can't copy it in the header as it may conflict with the definition in
> + * vmlinux.h.
> + */
> +__bpf_kfunc struct bpf_stream *bpf_stream_get(int stream_id, void *aux__ign)
>  {
>         struct bpf_prog_aux *aux = aux__ign;
>
> @@ -351,7 +356,8 @@ __bpf_kfunc struct bpf_stream_elem *bpf_stream_next_elem(struct bpf_stream *stre
>         return elem;
>  }
>
> -__bpf_kfunc struct bpf_stream *bpf_prog_stream_get(enum bpf_stream_id stream_id, u32 prog_id)
> +/* Use int vs enum bpf_stream_id for consistency with bpf_stream_get. */
> +__bpf_kfunc struct bpf_stream *bpf_prog_stream_get(int stream_id, u32 prog_id)
>  {
>         struct bpf_stream *stream;
>         struct bpf_prog *prog;
> diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> index a50773d4616e..1a748c21e358 100644
> --- a/tools/lib/bpf/bpf_helpers.h
> +++ b/tools/lib/bpf/bpf_helpers.h
> @@ -314,17 +314,47 @@ enum libbpf_tristate {
>                           ___param, sizeof(___param));          \
>  })
>
> +struct bpf_stream;
> +
> +extern struct bpf_stream *bpf_stream_get(int stream_id, void *aux__ign) __weak __ksym;
> +extern int bpf_stream_vprintk(struct bpf_stream *stream, const char *fmt__str, const void *args,
> +                             __u32 len__sz) __weak __ksym;
> +
> +#define __bpf_stream_vprintk(stream, fmt, args...)                             \
> +({                                                                             \
> +       static const char ___fmt[] = fmt;                                       \
> +       unsigned long long ___param[___bpf_narg(args)];                         \
> +                                                                               \
> +       _Pragma("GCC diagnostic push")                                          \
> +       _Pragma("GCC diagnostic ignored \"-Wint-conversion\"")                  \
> +       ___bpf_fill(___param, args);                                            \
> +       _Pragma("GCC diagnostic pop")                                           \
> +                                                                               \
> +       int ___id = stream;                                                     \
> +       struct bpf_stream *___sptr = bpf_stream_get(___id, NULL);               \
> +       if (___sptr)                                                            \
> +               bpf_stream_vprintk(___sptr, ___fmt, ___param, sizeof(___param));\
> +})

Typically _get() is an acquire kfunc,
but here:

+BTF_ID_FLAGS(func, bpf_stream_get, KF_RET_NULL)
...
+BTF_ID_FLAGS(func, bpf_prog_stream_get, KF_ACQUIRE | KF_RET_NULL)

This is odd, and it makes the above sequence look weird too.

This is inconsistent as well:
bpf_stream_printk(int stream,
bpf_stream_vprintk(struct bpf_stream *stream,

Existing helpers bpf_trace_printk() and bpf_trace_vprintk()
are consistent.

Not sure why bpf_stream_get() is needed at all.

Maybe
#define BPF_STDOUT ((struct bpf_stream *)1)
#define BPF_STDERR ((struct bpf_stream *)2)

Not pretty, but at least the API will be consistent.

Other ideas?
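
As a rough illustration of the sentinel-handle idea, here is a minimal user-space sketch. It assumes the printing function would map the two magic pointer values onto the calling program's own streams and pass real pointers through unchanged. `struct prog_aux_model` and `stream_resolve()` are hypothetical names used only for this model; they are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; these are not the kernel's definitions. */
struct bpf_stream { int id; };
struct prog_aux_model { struct bpf_stream stream[2]; };

/* Sentinel handles as proposed in the mail above. */
#define BPF_STDOUT ((struct bpf_stream *)1)
#define BPF_STDERR ((struct bpf_stream *)2)

/*
 * Map a sentinel handle to the caller's own stream. Any other value is
 * treated as a real stream pointer (or NULL) and returned unchanged.
 */
static struct bpf_stream *stream_resolve(struct prog_aux_model *aux,
					 struct bpf_stream *handle)
{
	uintptr_t v = (uintptr_t)handle;

	assert(aux != NULL);
	if (v == 1 || v == 2)
		return &aux->stream[v - 1];
	return handle;
}
```

With this shape both the macro and the underlying function would take a `struct bpf_stream *`, at the cost of reserving small integer values as special pointers.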


* Re: [PATCH bpf-next v1 09/11] libbpf: Add bpf_stream_printk() macro
  2025-05-08 23:41   ` Alexei Starovoitov
@ 2025-05-08 23:48     ` Kumar Kartikeya Dwivedi
  2025-05-08 23:50       ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-08 23:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, Kernel Team

On Fri, 9 May 2025 at 01:42, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, May 7, 2025 at 10:17 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > Introduce a new macro that allows printing data similar to bpf_printk(),
> > but to BPF streams. The first argument is the stream ID, the rest of the
> > arguments are same as what one would pass to bpf_printk().
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  kernel/bpf/stream.c         | 10 +++++++--
> >  tools/lib/bpf/bpf_helpers.h | 44 +++++++++++++++++++++++++++++++------
> >  2 files changed, 45 insertions(+), 9 deletions(-)
> >
> > diff --git a/kernel/bpf/stream.c b/kernel/bpf/stream.c
> > index eaf0574866b1..d64975486ad1 100644
> > --- a/kernel/bpf/stream.c
> > +++ b/kernel/bpf/stream.c
> > @@ -257,7 +257,12 @@ __bpf_kfunc int bpf_stream_vprintk(struct bpf_stream *stream, const char *fmt__s
> >         return ret;
> >  }
> >
> > -__bpf_kfunc struct bpf_stream *bpf_stream_get(enum bpf_stream_id stream_id, void *aux__ign)
> > +/* Use int vs enum stream_id here, we use this kfunc in bpf_helpers.h, and
> > + * keeping enum stream_id necessitates a complete definition of enum, but we
> > + * can't copy it in the header as it may conflict with the definition in
> > + * vmlinux.h.
> > + */
> > +__bpf_kfunc struct bpf_stream *bpf_stream_get(int stream_id, void *aux__ign)
> >  {
> >         struct bpf_prog_aux *aux = aux__ign;
> >
> > @@ -351,7 +356,8 @@ __bpf_kfunc struct bpf_stream_elem *bpf_stream_next_elem(struct bpf_stream *stre
> >         return elem;
> >  }
> >
> > -__bpf_kfunc struct bpf_stream *bpf_prog_stream_get(enum bpf_stream_id stream_id, u32 prog_id)
> > +/* Use int vs enum bpf_stream_id for consistency with bpf_stream_get. */
> > +__bpf_kfunc struct bpf_stream *bpf_prog_stream_get(int stream_id, u32 prog_id)
> >  {
> >         struct bpf_stream *stream;
> >         struct bpf_prog *prog;
> > diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> > index a50773d4616e..1a748c21e358 100644
> > --- a/tools/lib/bpf/bpf_helpers.h
> > +++ b/tools/lib/bpf/bpf_helpers.h
> > @@ -314,17 +314,47 @@ enum libbpf_tristate {
> >                           ___param, sizeof(___param));          \
> >  })
> >
> > +struct bpf_stream;
> > +
> > +extern struct bpf_stream *bpf_stream_get(int stream_id, void *aux__ign) __weak __ksym;
> > +extern int bpf_stream_vprintk(struct bpf_stream *stream, const char *fmt__str, const void *args,
> > +                             __u32 len__sz) __weak __ksym;
> > +
> > +#define __bpf_stream_vprintk(stream, fmt, args...)                             \
> > +({                                                                             \
> > +       static const char ___fmt[] = fmt;                                       \
> > +       unsigned long long ___param[___bpf_narg(args)];                         \
> > +                                                                               \
> > +       _Pragma("GCC diagnostic push")                                          \
> > +       _Pragma("GCC diagnostic ignored \"-Wint-conversion\"")                  \
> > +       ___bpf_fill(___param, args);                                            \
> > +       _Pragma("GCC diagnostic pop")                                           \
> > +                                                                               \
> > +       int ___id = stream;                                                     \
> > +       struct bpf_stream *___sptr = bpf_stream_get(___id, NULL);               \
> > +       if (___sptr)                                                            \
> > +               bpf_stream_vprintk(___sptr, ___fmt, ___param, sizeof(___param));\
> > +})
>
> Typically _get() is an acquire kfunc,
> but here:
>
> +BTF_ID_FLAGS(func, bpf_stream_get, KF_RET_NULL)
> ...
> +BTF_ID_FLAGS(func, bpf_prog_stream_get, KF_ACQUIRE | KF_RET_NULL)
>
> This is odd and it makes above sequence look weird too.
>
> This is inconsistent as well:
> bpf_stream_printk(int stream,
> bpf_stream_vprintk(struct bpf_stream *stream,
>
> Existing helpers bpf_trace_printk() and bpf_trace_vprintk()
> are consistent.
>
> Not sure why bpf_stream_get() is needed at all.
>
> Maybe
> #define BPF_STDOUT ((struct bpf_stream *)1)
> #define BPF_STDERR ((struct bpf_stream *)2)
>
> not pretty, but at least api will be consistent.
>
> Other ideas ?

We can take the stream ID directly in bpf_stream_vprintk(). We have room
for one more argument, which can be the hidden prog->aux.
Then we can drop bpf_stream_get().

Alternatively, we could rename it to bpf_stream_self().
I wasn't concerned about the inconsistency since bpf_stream_vprintk() is
not something people will use directly: you have to stuff the arguments
into an array of u64, so it's unusable in practice. The main API exposed
is bpf_stream_printk(). But I get the concern.
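
To make the "array of u64" point concrete, here is a minimal user-space sketch of a vprintk-style entry point. It is an assumption-laden model, not the kernel implementation: `model_vprintk()` and `model_demo()` are hypothetical names, and only `%d`, `%u`, `%s` and `%%` are handled to keep it short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef unsigned long long u64;

/*
 * Every conversion in fmt consumes one 64-bit slot from args[].
 * Returns the number of bytes written (excluding the NUL) or -1 on
 * malformed input, too few arguments, or a full output buffer.
 */
static int model_vprintk(char *out, size_t out_sz, const char *fmt,
			 const u64 *args, unsigned int len_bytes)
{
	unsigned int nargs = len_bytes / sizeof(u64), used = 0;
	size_t pos = 0;

	assert(out && fmt);
	if (out_sz == 0 || len_bytes % sizeof(u64))
		return -1;
	out[0] = '\0';
	for (; *fmt; fmt++) {
		size_t room = out_sz - pos;
		int n;

		if (*fmt != '%') {
			n = snprintf(out + pos, room, "%c", *fmt);
		} else if (*++fmt == '%') {
			n = snprintf(out + pos, room, "%%");
		} else if (used >= nargs) {
			return -1;
		} else if (*fmt == 'd') {
			n = snprintf(out + pos, room, "%lld", (long long)args[used++]);
		} else if (*fmt == 'u') {
			n = snprintf(out + pos, room, "%llu", args[used++]);
		} else if (*fmt == 's') {
			n = snprintf(out + pos, room, "%s",
				     (const char *)(uintptr_t)args[used++]);
		} else {
			return -1; /* unsupported conversion or trailing '%' */
		}
		if (n < 0 || (size_t)n >= room)
			return -1;
		pos += (size_t)n;
	}
	return (int)pos;
}

/* Direct use: pointers and signed values are widened to u64 by hand. */
static int model_demo(char *buf, size_t sz)
{
	u64 args[] = { (u64)(uintptr_t)"eth0", (u64)-5, 42 };

	return model_vprintk(buf, sz, "%s: %d/%u", args, sizeof(args));
}
```

The manual packing in `model_demo()` is the part a wrapper macro normally hides from the caller.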


* Re: [PATCH bpf-next v1 09/11] libbpf: Add bpf_stream_printk() macro
  2025-05-08 23:48     ` Kumar Kartikeya Dwivedi
@ 2025-05-08 23:50       ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-08 23:50 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, Kernel Team

On Fri, 9 May 2025 at 01:48, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Fri, 9 May 2025 at 01:42, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, May 7, 2025 at 10:17 AM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > Introduce a new macro that allows printing data similar to bpf_printk(),
> > > but to BPF streams. The first argument is the stream ID, the rest of the
> > > arguments are same as what one would pass to bpf_printk().
> > >
> > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > ---
> > >  kernel/bpf/stream.c         | 10 +++++++--
> > >  tools/lib/bpf/bpf_helpers.h | 44 +++++++++++++++++++++++++++++++------
> > >  2 files changed, 45 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/kernel/bpf/stream.c b/kernel/bpf/stream.c
> > > index eaf0574866b1..d64975486ad1 100644
> > > --- a/kernel/bpf/stream.c
> > > +++ b/kernel/bpf/stream.c
> > > @@ -257,7 +257,12 @@ __bpf_kfunc int bpf_stream_vprintk(struct bpf_stream *stream, const char *fmt__s
> > >         return ret;
> > >  }
> > >
> > > -__bpf_kfunc struct bpf_stream *bpf_stream_get(enum bpf_stream_id stream_id, void *aux__ign)
> > > +/* Use int vs enum stream_id here, we use this kfunc in bpf_helpers.h, and
> > > + * keeping enum stream_id necessitates a complete definition of enum, but we
> > > + * can't copy it in the header as it may conflict with the definition in
> > > + * vmlinux.h.
> > > + */
> > > +__bpf_kfunc struct bpf_stream *bpf_stream_get(int stream_id, void *aux__ign)
> > >  {
> > >         struct bpf_prog_aux *aux = aux__ign;
> > >
> > > @@ -351,7 +356,8 @@ __bpf_kfunc struct bpf_stream_elem *bpf_stream_next_elem(struct bpf_stream *stre
> > >         return elem;
> > >  }
> > >
> > > -__bpf_kfunc struct bpf_stream *bpf_prog_stream_get(enum bpf_stream_id stream_id, u32 prog_id)
> > > +/* Use int vs enum bpf_stream_id for consistency with bpf_stream_get. */
> > > +__bpf_kfunc struct bpf_stream *bpf_prog_stream_get(int stream_id, u32 prog_id)
> > >  {
> > >         struct bpf_stream *stream;
> > >         struct bpf_prog *prog;
> > > diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> > > index a50773d4616e..1a748c21e358 100644
> > > --- a/tools/lib/bpf/bpf_helpers.h
> > > +++ b/tools/lib/bpf/bpf_helpers.h
> > > @@ -314,17 +314,47 @@ enum libbpf_tristate {
> > >                           ___param, sizeof(___param));          \
> > >  })
> > >
> > > +struct bpf_stream;
> > > +
> > > +extern struct bpf_stream *bpf_stream_get(int stream_id, void *aux__ign) __weak __ksym;
> > > +extern int bpf_stream_vprintk(struct bpf_stream *stream, const char *fmt__str, const void *args,
> > > +                             __u32 len__sz) __weak __ksym;
> > > +
> > > +#define __bpf_stream_vprintk(stream, fmt, args...)                             \
> > > +({                                                                             \
> > > +       static const char ___fmt[] = fmt;                                       \
> > > +       unsigned long long ___param[___bpf_narg(args)];                         \
> > > +                                                                               \
> > > +       _Pragma("GCC diagnostic push")                                          \
> > > +       _Pragma("GCC diagnostic ignored \"-Wint-conversion\"")                  \
> > > +       ___bpf_fill(___param, args);                                            \
> > > +       _Pragma("GCC diagnostic pop")                                           \
> > > +                                                                               \
> > > +       int ___id = stream;                                                     \
> > > +       struct bpf_stream *___sptr = bpf_stream_get(___id, NULL);               \
> > > +       if (___sptr)                                                            \
> > > +               bpf_stream_vprintk(___sptr, ___fmt, ___param, sizeof(___param));\
> > > +})
> >
> > Typically _get() is an acquire kfunc,
> > but here:
> >
> > +BTF_ID_FLAGS(func, bpf_stream_get, KF_RET_NULL)
> > ...
> > +BTF_ID_FLAGS(func, bpf_prog_stream_get, KF_ACQUIRE | KF_RET_NULL)
> >
> > This is odd and it makes above sequence look weird too.
> >
> > This is inconsistent as well:
> > bpf_stream_printk(int stream,
> > bpf_stream_vprintk(struct bpf_stream *stream,
> >
> > Existing helpers bpf_trace_printk() and bpf_trace_vprintk()
> > are consistent.
> >
> > Not sure why bpf_stream_get() is needed at all.
> >
> > Maybe
> > #define BPF_STDOUT ((struct bpf_stream *)1)
> > #define BPF_STDERR ((struct bpf_stream *)2)
> >
> > not pretty, but at least api will be consistent.
> >
> > Other ideas ?
>
> We can take the stream id directly in bpf_stream_vprintk, we have room
> for one more argument, that can be hidden prog->aux.
> Then we can drop bpf_stream_get.

Taking the ID directly does remove the ability to write into any
struct bpf_stream * one has access to, so there's that.

>
> Alternatively there's a way to call it bpf_stream_self.
> I wasn't concerned about inconsistency since bpf_stream_vprintk is not
> something people will use directly, you have to stuff arguments as array
> of u64 etc. so it's unusable in practice. The main API exposed is
> bpf_stream_printk. But I get the concern.


* Re: [PATCH bpf-next v1 02/11] bpf: Introduce BPF standard streams
  2025-05-07 17:17 ` [PATCH bpf-next v1 02/11] bpf: Introduce BPF standard streams Kumar Kartikeya Dwivedi
@ 2025-05-08 23:54   ` Eduard Zingerman
  2025-05-09  0:10     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 55+ messages in thread
From: Eduard Zingerman @ 2025-05-08 23:54 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Wed, 2025-05-07 at 10:17 -0700, Kumar Kartikeya Dwivedi wrote:
> Add support for a stream API to the kernel and expose related kfuncs to
> BPF programs. Two streams are exposed, BPF_STDOUT and BPF_STDERR. These
> can be used for printing messages that can be consumed from user space,
> thus it's similar in spirit to existing trace_pipe interface.

[...]

> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Read through the patch; the implementation looks solid,
but I'm no expert on multi-threading within the kernel.

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>

---

For the sake of discussion (and sorry, I'm repeating myself a bit),
the current API is still quite elaborate:
- bpf_prog_stream_get()
  - bpf_stream_next_elem()
  - bpf_stream_free_elem()
- bpf_prog_stream_put()

On the other hand, this sequence of function calls can be hidden
inside a single kfunc with prototype like:

  bpf_stream_read(int stream_id, int prog_id, struct bpf_dynptr *dst);

This would slightly complicate the stream element, as it would need to
track the number of bytes consumed from it, but it would completely hide
the implementation details.

I'm sure you thought about that; what is the reasoning behind the
more complicated API?
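
A minimal user-space sketch of what the single-call read could look like, assuming each element carries a consumed-byte cursor as described above. `struct stream_elem`, `struct stream_model` and `stream_read()` are hypothetical names for this model only; the model is single-threaded and omits the locking and freeing a real implementation would need.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: one queued message plus a read cursor into it. */
struct stream_elem {
	struct stream_elem *next;
	size_t len;       /* total bytes in data */
	size_t consumed;  /* bytes already handed to readers */
	const char *data;
};

struct stream_model {
	struct stream_elem *head;
};

/*
 * Copy up to dst_sz bytes into dst, spanning elements if needed, and
 * unlink every element that has been fully drained.
 * Returns the number of bytes copied; 0 means the stream is empty.
 */
static size_t stream_read(struct stream_model *s, char *dst, size_t dst_sz)
{
	size_t copied = 0;

	assert(s && dst);
	while (s->head && copied < dst_sz) {
		struct stream_elem *e = s->head;
		size_t n = e->len - e->consumed;

		if (n > dst_sz - copied)
			n = dst_sz - copied;
		memcpy(dst + copied, e->data + e->consumed, n);
		e->consumed += n;
		copied += n;
		if (e->consumed == e->len)
			s->head = e->next; /* a real implementation would free e */
	}
	return copied;
}
```

A reader with a small buffer simply calls `stream_read()` repeatedly; the per-element cursor is what lets a message be split across calls.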

[...]



* Re: [PATCH bpf-next v1 02/11] bpf: Introduce BPF standard streams
  2025-05-08 23:54   ` Eduard Zingerman
@ 2025-05-09  0:10     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-09  0:10 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Fri, 9 May 2025 at 01:54, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2025-05-07 at 10:17 -0700, Kumar Kartikeya Dwivedi wrote:
> > Add support for a stream API to the kernel and expose related kfuncs to
> > BPF programs. Two streams are exposed, BPF_STDOUT and BPF_STDERR. These
> > can be used for printing messages that can be consumed from user space,
> > thus it's similar in spirit to existing trace_pipe interface.
>
> [...]
>
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
>
> Read through the patch, implementation looks solid,
> but I'm no expert on multi-threading within kernel.
>
> Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
>
> ---
>
> For the sake of discussion and sorry, I'm repeating myself a bit.
> Current API is still quite elaborate:
> - bpf_prog_stream_get()
>   - bpf_stream_next_elem()
>   - bpf_stream_free_elem()
> - bpf_prog_stream_put()
>
> On the other hand, this sequence of function calls can be hidden
> inside a single kfunc with prototype like:
>
>   bpf_stream_read(int stream_id, int prog_id, struct bpf_dynptr *dst);
>
> Which would slightly complicate stream elem, as it would need to track
> amount of bytes consumed from it, but completely hide the
> implementation details.
>
> I'm sure you thought about that, what is the reasoning behind a
> more complicated API?

Mostly that I was not trying to reinvent read(2).

As you said, we're basically exposing a file with a persistent offset
underneath. bpf_stream_read() will need to hold a lock around the memcpy
into the dynptr, so in the end it's effectively like reading from a file.
stream_id is like an fd, and prog_id is just an extra identifier we need
to pass to locate the stream, but you could separate that out into the
equivalent of open(2), like we have now.

I don't have a particularly strong objection to bundling it inside a
single kfunc; I just decided that composing smaller building blocks in
the program might be better. FWIW, you could also provide bpf_stream_read
as a BPF function, but I guess that's harder to ship to people than a
kfunc.

Anyway, I don't have strong opinions here, so others should chime in
to shape the discussion.

>
> [...]
>


* Re: [PATCH bpf-next v1 09/11] libbpf: Add bpf_stream_printk() macro
  2025-05-08 23:33     ` Kumar Kartikeya Dwivedi
@ 2025-05-09  6:16       ` Eduard Zingerman
  2025-05-09 21:28         ` Andrii Nakryiko
  0 siblings, 1 reply; 55+ messages in thread
From: Eduard Zingerman @ 2025-05-09  6:16 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Fri, 2025-05-09 at 01:33 +0200, Kumar Kartikeya Dwivedi wrote:

[...]

> > > -#define ___bpf_pick_printk(...) \
> > > -     ___bpf_nth(_, ##__VA_ARGS__, __bpf_vprintk, __bpf_vprintk, __bpf_vprintk,       \
> > > -                __bpf_vprintk, __bpf_vprintk, __bpf_vprintk, __bpf_vprintk,          \
> > > -                __bpf_vprintk, __bpf_vprintk, __bpf_printk /*3*/, __bpf_printk /*2*/,\
> > > -                __bpf_printk /*1*/, __bpf_printk /*0*/)
> > > +#define ___bpf_pick_printk(choice, choice_3, ...)                    \
> > > +     ___bpf_nth(_, ##__VA_ARGS__, choice, choice, choice,            \
> > > +                choice, choice, choice, choice,                      \
> > > +                choice, choice, choice_3 /*3*/, choice_3 /*2*/,      \
> > > +                choice_3 /*1*/, choice_3 /*0*/)
> > > 
> > >  /* Helper macro to print out debug messages */
> > > -#define bpf_printk(fmt, args...) ___bpf_pick_printk(args)(fmt, ##args)
> > > +#define __bpf_trace_printk(fmt, args...) \
> > > +     ___bpf_pick_printk(__bpf_vprintk, __bpf_printk, args)(fmt, ##args)
> > > +#define __bpf_stream_printk(stream, fmt, args...) \
> > > +     ___bpf_pick_printk(__bpf_stream_vprintk, __bpf_stream_vprintk, args)(stream, fmt, ##args)
> >                            ^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^
> >                            These two parameters are identical,
> >                            why is ___bpf_pick_printk necessary in such a case?
> 
> In our case choice and choice_3 are the same, but for bpf_printk
> they're different, I was mostly trying to reuse the pick_printk
> machinery for both (which dispatches correctly to the actual macro).
>

But ___bpf_pick_printk is a no-op if two identical choices are supplied,
so there is nothing to reuse. E.g., nothing breaks after the following change:

   #define __bpf_trace_printk(fmt, args...) \
          ___bpf_pick_printk(__bpf_vprintk, __bpf_printk, args)(fmt, ##args)
  -#define __bpf_stream_printk(stream, fmt, args...) \
  -       ___bpf_pick_printk(__bpf_stream_vprintk, __bpf_stream_vprintk, args)(stream, fmt, ##args)
   
  -#define bpf_stream_printk(stream, fmt, args...) __bpf_stream_printk(stream, fmt, ##args)
  +#define bpf_stream_printk(stream, fmt, args...) __bpf_stream_vprintk(stream, fmt, ##args)
   
   #define bpf_printk(arg, args...) __bpf_trace_printk(arg, ##args)

This would shorten the patch.
Or am I missing something?
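
The point can be checked with a scaled-down model of the argument-count dispatch. This is a sketch under assumptions: `NTH5`, `PICK`, `impl_few()` and `impl_many()` are hypothetical stand-ins (the real macros handle up to 12 arguments and switch after 3), and it relies on the GNU `##` comma-swallowing extension, as bpf_helpers.h does.

```c
#include <assert.h>

/* Select the 5th argument; leading arguments shift the choices right. */
#define NTH5(_, _1, _2, _3, N, ...) N
/* 0 or 1 variadic arguments pick 'few', 2 or 3 pick 'many'. */
#define PICK(many, few, ...) \
	NTH5(_, ##__VA_ARGS__, many, many, few, few)

static int impl_few(const char *fmt, ...)  { (void)fmt; return 0; }
static int impl_many(const char *fmt, ...) { (void)fmt; return 1; }

/* Two distinct targets: the argument count decides which one is called. */
#define print_mixed(fmt, args...) PICK(impl_many, impl_few, args)(fmt, ##args)
/* Identical targets: the selection always yields the same function... */
#define print_same(fmt, args...) PICK(impl_many, impl_many, args)(fmt, ##args)
/* ...so calling it directly is equivalent. */
#define print_direct(fmt, args...) impl_many(fmt, ##args)

static void pick_selfcheck(void)
{
	assert(print_mixed("a") == 0);             /* 0 args -> few  */
	assert(print_mixed("a %d %d", 1, 2) == 1); /* 2 args -> many */
	assert(print_same("a") == print_direct("a"));
	assert(print_same("a %d %d", 1, 2) == print_direct("a %d %d", 1, 2));
}
```

When both choices name the same target, the dispatcher only adds preprocessing work, which is the basis for dropping it in the stream variant.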

[...]



* Re: [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams
  2025-05-07 17:17 ` [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams Kumar Kartikeya Dwivedi
  2025-05-08 10:41   ` Quentin Monnet
@ 2025-05-09  6:21   ` Eduard Zingerman
  2025-05-09 17:31     ` Alexei Starovoitov
  1 sibling, 1 reply; 55+ messages in thread
From: Eduard Zingerman @ 2025-05-09  6:21 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Quentin Monnet, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden,
	Matt Bobrowski, kkd, kernel-team

On Wed, 2025-05-07 at 10:17 -0700, Kumar Kartikeya Dwivedi wrote:
> Add bpftool support for dumping streams of a given BPF program.
> The syntax is `bpftool prog tracelog { stdout | stderr } PROG`.
> The stdout is dumped to stdout, stderr is dumped to stderr.
> 
> Cc: Quentin Monnet <qmo@kernel.org>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Do we want some utility functions for access to streams in libbpf?
I'd say that this would be useful, otherwise many applications
would need to reinvent their own bpftool_dump_prog_stream().

[...]



* Re: [PATCH bpf-next v1 06/11] bpf: Report may_goto timeout to BPF stderr
  2025-05-07 17:17 ` [PATCH bpf-next v1 06/11] bpf: Report may_goto timeout " Kumar Kartikeya Dwivedi
  2025-05-08 12:53   ` kernel test robot
@ 2025-05-09  6:22   ` Eduard Zingerman
  2025-05-09  9:19   ` Alan Maguire
  2 siblings, 0 replies; 55+ messages in thread
From: Eduard Zingerman @ 2025-05-09  6:22 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Wed, 2025-05-07 at 10:17 -0700, Kumar Kartikeya Dwivedi wrote:
> Begin reporting may_goto timeouts to BPF program's stderr stream.
> Make sure that we don't end up spamming too many errors if the
> program keeps failing repeatedly and filling up the stream, hence
> emit at most 512 error messages from the kernel for a given stream.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]



* Re: [PATCH bpf-next v1 06/11] bpf: Report may_goto timeout to BPF stderr
  2025-05-07 17:17 ` [PATCH bpf-next v1 06/11] bpf: Report may_goto timeout " Kumar Kartikeya Dwivedi
  2025-05-08 12:53   ` kernel test robot
  2025-05-09  6:22   ` Eduard Zingerman
@ 2025-05-09  9:19   ` Alan Maguire
  2 siblings, 0 replies; 55+ messages in thread
From: Alan Maguire @ 2025-05-09  9:19 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, kernel-team

On 07/05/2025 18:17, Kumar Kartikeya Dwivedi wrote:
> Begin reporting may_goto timeouts to BPF program's stderr stream.
> Make sure that we don't end up spamming too many errors if the
> program keeps failing repeatedly and filling up the stream, hence
> emit at most 512 error messages from the kernel for a given stream.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

This series is great; having runtime error reporting like this is hugely
valuable! One question below...

> ---
>  include/linux/bpf.h | 21 ++++++++++++++-------
>  kernel/bpf/core.c   | 17 ++++++++++++++++-
>  kernel/bpf/stream.c |  5 +++++
>  3 files changed, 35 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 46ce05aad0ed..daf95333be78 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1667,6 +1667,7 @@ struct bpf_prog_aux {
>  		struct rcu_head	rcu;
>  	};
>  	struct bpf_stream stream[2];
> +	atomic_t stream_error_cnt;
>  };
>  
>  struct bpf_prog {
> @@ -3589,6 +3590,8 @@ void bpf_bprintf_cleanup(struct bpf_bprintf_data *data);
>  int bpf_try_get_buffers(struct bpf_bprintf_buffers **bufs);
>  void bpf_put_buffers(void);
>  
> +#define BPF_PROG_STREAM_ERROR_CNT 512
> +
>  void bpf_prog_stream_init(struct bpf_prog *prog);
>  void bpf_prog_stream_free(struct bpf_prog *prog);
>  
> @@ -3600,16 +3603,20 @@ int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
>  			    enum bpf_stream_id stream_id);
>  int bpf_stream_stage_dump_stack(struct bpf_stream_stage *ss);
>  
> +bool bpf_prog_stream_error_limit(struct bpf_prog *prog);
> +
>  #define bpf_stream_printk(...) bpf_stream_stage_printk(&__ss, __VA_ARGS__)
>  #define bpf_stream_dump_stack() bpf_stream_stage_dump_stack(&__ss)
>  
> -#define bpf_stream_stage(prog, stream_id, expr)                  \
> -	({                                                       \
> -		struct bpf_stream_stage __ss;                    \
> -		bpf_stream_stage_init(&__ss);                    \
> -		(expr);                                          \
> -		bpf_stream_stage_commit(&__ss, prog, stream_id); \
> -		bpf_stream_stage_free(&__ss);                    \
> +#define bpf_stream_stage(prog, stream_id, expr)                          \
> +	({                                                               \
> +		struct bpf_stream_stage __ss;                            \
> +		if (!bpf_prog_stream_error_limit(prog)) {                \
> +			bpf_stream_stage_init(&__ss);                    \
> +			(expr);                                          \
> +			bpf_stream_stage_commit(&__ss, prog, stream_id); \
> +			bpf_stream_stage_free(&__ss);                    \
> +		}                                                        \
>  	})
>  
>  #ifdef CONFIG_BPF_LSM
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index dcb665bff22f..d21c304fe829 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -3156,6 +3156,19 @@ u64 __weak arch_bpf_timed_may_goto(void)
>  	return 0;
>  }
>  
> +static noinline void bpf_prog_report_may_goto_violation(void)
> +{
> +	struct bpf_prog *prog;
> +
> +	prog = bpf_prog_find_from_stack();
> +	if (!prog)
> +		return;
> +	bpf_stream_stage(prog, BPF_STDERR, ({
> +		bpf_stream_printk("ERROR: Timeout detected for may_goto instruction\n");
> +		bpf_stream_dump_stack();
> +	}));
> +}
> +

Given that we can hit a stream stage error limit, and that some users
might want a high-level picture before diving into stream output, is
there any scope here for adding error stats covering situations like
this? I can imagine some users (perhaps users of bpftool) might not want
to see the full error stream but rather get a summary of runtime error
stats first, so recording runtime error counts (perhaps contingent on
bpf_stats_enabled?) might be worthwhile too? Doesn't have to be this
series of course, but just wondering if others perceive a need here too.

A tracepoint for BPF runtime errors that is passed a bpf prog + an enum
representing the error encountered would be pretty handy for tracers I
suspect; that would allow them to tailor their output based upon their
needs when runtime errors are hit, with later dumping of the whole error
stream if required.
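
To sketch how a report cap and per-kind error statistics could coexist, here is a minimal user-space model using C11 atomics. Everything in it is an assumption for illustration: `enum rt_error_kind`, `struct prog_model`, `stream_error_limit()` and `report_error()` are hypothetical, and the limit check is not claimed to match the implementation of bpf_prog_stream_error_limit() in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define STREAM_ERROR_CNT 512 /* same value as BPF_PROG_STREAM_ERROR_CNT */

/* Hypothetical error kinds, one per scenario covered by the series. */
enum rt_error_kind {
	RT_ERR_LOCK,
	RT_ERR_ARENA_FAULT,
	RT_ERR_MAY_GOTO,
	RT_ERR_MAX,
};

struct prog_model {
	atomic_int stream_error_cnt;          /* reports emitted so far */
	atomic_ulong error_stats[RT_ERR_MAX]; /* every error, even if capped */
};

/* Claim one report slot; returns true once the cap has been reached. */
static bool stream_error_limit(struct prog_model *p)
{
	int cur = atomic_load(&p->stream_error_cnt);

	while (cur < STREAM_ERROR_CNT) {
		/* On failure, cur is reloaded and the bound is rechecked. */
		if (atomic_compare_exchange_weak(&p->stream_error_cnt, &cur, cur + 1))
			return false;
	}
	return true;
}

/* Always count the error; return true if a stream report should be emitted. */
static bool report_error(struct prog_model *p, enum rt_error_kind kind)
{
	assert(p && kind < RT_ERR_MAX);
	atomic_fetch_add(&p->error_stats[kind], 1);
	return !stream_error_limit(p);
}
```

Because the counters keep increasing after the stream stops accepting reports, a summary view stays accurate even when the detailed output has been capped.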

Thanks!

Alan


* Re: [PATCH bpf-next v1 11/11] selftests/bpf: Add tests for prog streams
  2025-05-07 17:17 ` [PATCH bpf-next v1 11/11] selftests/bpf: Add tests for prog streams Kumar Kartikeya Dwivedi
@ 2025-05-09 17:18   ` Eduard Zingerman
  0 siblings, 0 replies; 55+ messages in thread
From: Eduard Zingerman @ 2025-05-09 17:18 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Wed, 2025-05-07 at 10:17 -0700, Kumar Kartikeya Dwivedi wrote:
> Add selftests to stress test the various facets of the stream API,
> memory allocation pattern, and ensuring dumping support is tested and
> functional. Create symlink to bpftool stream.bpf.c and use it to test
> the support to dump messages to ringbuf in user space, and verify
> output.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]



* Re: [PATCH bpf-next v1 01/11] bpf: Introduce bpf_dynptr_from_mem_slice
  2025-05-07 17:17 ` [PATCH bpf-next v1 01/11] bpf: Introduce bpf_dynptr_from_mem_slice Kumar Kartikeya Dwivedi
@ 2025-05-09 17:19   ` Eduard Zingerman
  2025-05-09 21:11   ` Andrii Nakryiko
  1 sibling, 0 replies; 55+ messages in thread
From: Eduard Zingerman @ 2025-05-09 17:19 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Wed, 2025-05-07 at 10:17 -0700, Kumar Kartikeya Dwivedi wrote:
> Add a new bpf_dynptr_from_mem_slice kfunc to create a dynptr from a
> PTR_TO_BTF_ID exposing a variable-length slice of memory, represented by
> the new bpf_mem_slice type. This slice is read-only, for a read-write
> slice we can expose a distinct type in the future.
> 
> Since this is the first kfunc with potential local dynptr
> initialization, add it to the if-else list in check_kfunc_call.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>

[...]



* Re: [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams
  2025-05-09  6:21   ` Eduard Zingerman
@ 2025-05-09 17:31     ` Alexei Starovoitov
  2025-05-09 18:31       ` Eduard Zingerman
  0 siblings, 1 reply; 55+ messages in thread
From: Alexei Starovoitov @ 2025-05-09 17:31 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Kumar Kartikeya Dwivedi, bpf, Quentin Monnet, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Emil Tsalapatis, Barret Rhoden, Matt Bobrowski, kkd, Kernel Team

On Thu, May 8, 2025 at 11:21 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2025-05-07 at 10:17 -0700, Kumar Kartikeya Dwivedi wrote:
> > Add bpftool support for dumping streams of a given BPF program.
> > The syntax is `bpftool prog tracelog { stdout | stderr } PROG`.
> > The stdout is dumped to stdout, stderr is dumped to stderr.
> >
> > Cc: Quentin Monnet <qmo@kernel.org>
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
>
> Do we want some utility functions for access to streams in libbpf?
> I'd say that this would be useful, otherwise many applications
> would need to reinvent their own bpftool_dump_prog_stream().

Since we're positioning streams as analogous to stdout/stderr,
we have to expect that user space applications will be routinely
accessing them, so we need an easy way to read the streams.

I don't think we can ship a syscall BPF prog with libbpf.
When it's part of bpftool, it's fine, but being part of the library
is taking the non-UAPI stance too far.

How about we extend BPF_OBJ_GET_INFO_BY_FD to return stream data?
Or add a new command?

Then the question regarding:
- bpf_prog_stream_get()
  - bpf_stream_next_elem()
  - bpf_stream_free_elem()
- bpf_prog_stream_put()

vs

bpf_stream_read()

will disappear.
For now at least. The new command will copy from the stream into user space.

We wouldn't need to introduce mem_slice right now either.
It can be added later when needed.


* Re: [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams
  2025-05-09 17:31     ` Alexei Starovoitov
@ 2025-05-09 18:31       ` Eduard Zingerman
  2025-05-09 18:48         ` Alexei Starovoitov
  0 siblings, 1 reply; 55+ messages in thread
From: Eduard Zingerman @ 2025-05-09 18:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kumar Kartikeya Dwivedi, bpf, Quentin Monnet, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Emil Tsalapatis, Barret Rhoden, Matt Bobrowski, kkd, Kernel Team

On Fri, 2025-05-09 at 10:31 -0700, Alexei Starovoitov wrote:

[...]

> How about we extend BPF_OBJ_GET_INFO_BY_FD to return stream data?
> Or add a new command ?

You mean like this:

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 71d5ac83cf5d..25ac28d11af5 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6610,6 +6610,10 @@ struct bpf_prog_info {
        __u32 verified_insns;
        __u32 attach_btf_obj_id;
        __u32 attach_btf_id;
+       __u32 stdout_len; /* length of the buffer passed in 'stdout' */
+       __u32 stderr_len; /* length of the buffer passed in 'stderr' */
+       __aligned_u64 stdout;
+       __aligned_u64 stderr;
 } __attribute__((aligned(8)));

And return -EAGAIN if there is more data to read?
Imo, having this in syscall is more convenient for the end users.
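
A rough sketch of what a user space reader might look like under the
-EAGAIN convention above. Everything here is hypothetical:
mock_stream_read() only stands in for whatever syscall path would fill
the buffer, and struct mock_stream simulates kernel-side stream contents.
Neither is an existing kernel or libbpf API.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simulated kernel-side stream contents (hypothetical stand-in). */
struct mock_stream {
	const char *pending;	/* data not yet copied out */
	size_t left;		/* bytes remaining in 'pending' */
};

/*
 * Hypothetical stand-in for the proposed interface: copy up to 'len' bytes
 * into 'buf', store the number of bytes copied in '*copied', and return
 * -EAGAIN while more data remains, 0 once the stream is drained.
 */
static int mock_stream_read(struct mock_stream *s, char *buf, size_t len,
			    size_t *copied)
{
	size_t n = s->left < len ? s->left : len;

	memcpy(buf, s->pending, n);
	s->pending += n;
	s->left -= n;
	*copied = n;
	return s->left ? -EAGAIN : 0;
}

/*
 * Drain the stream into 'out' (capacity 'cap') in pieces of at most 'chunk'
 * bytes, looping for as long as the reader reports -EAGAIN. Returns the
 * total number of bytes collected, -EINVAL for a zero chunk size, or
 * -ENOSPC if 'out' fills up before the stream is drained.
 */
static long drain_stream(struct mock_stream *s, char *out, size_t cap,
			 size_t chunk)
{
	size_t total = 0, copied;
	int err;

	if (chunk == 0)
		return -EINVAL;

	do {
		size_t room = cap - total;

		if (room == 0)
			return -ENOSPC;
		err = mock_stream_read(s, out + total,
				       room < chunk ? room : chunk, &copied);
		total += copied;
	} while (err == -EAGAIN);

	return err ? err : (long)total;
}
```

The caller only has to retry on -EAGAIN; any other return value ends the
loop, which is roughly the convenience argument for doing this in a syscall.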

Alternatively, are files in bpffs considered to be stable API?
E.g. having something like /sys/fs/bpf/<prog-id>/std{err,out} .

[...]



* Re: [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams
  2025-05-09 18:31       ` Eduard Zingerman
@ 2025-05-09 18:48         ` Alexei Starovoitov
  2025-05-09 19:37           ` Eduard Zingerman
  2025-05-09 21:33           ` Andrii Nakryiko
  0 siblings, 2 replies; 55+ messages in thread
From: Alexei Starovoitov @ 2025-05-09 18:48 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Kumar Kartikeya Dwivedi, bpf, Quentin Monnet, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Emil Tsalapatis, Barret Rhoden, Matt Bobrowski, kkd, Kernel Team

On Fri, May 9, 2025 at 11:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2025-05-09 at 10:31 -0700, Alexei Starovoitov wrote:
>
> [...]
>
> > How about we extend BPF_OBJ_GET_INFO_BY_FD to return stream data?
> > Or add a new command ?
>
> You mean like this:
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 71d5ac83cf5d..25ac28d11af5 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -6610,6 +6610,10 @@ struct bpf_prog_info {
>         __u32 verified_insns;
>         __u32 attach_btf_obj_id;
>         __u32 attach_btf_id;
> +       __u32 stdout_len; /* length of the buffer passed in 'stdout' */
> +       __u32 stderr_len; /* length of the buffer passed in 'stderr' */
> +       __aligned_u64 stdout;
> +       __aligned_u64 stderr;
>  } __attribute__((aligned(8)));
>
> And return -EAGAIN if there is more data to read?

Exactly.
The only concern is that all the other __aligned_u64 fields will probably
be zero, but the kernel will still fill in all the other non-pointer
fields, and that information will be re-populated again and again,
so a new command might be cleaner.

> Imo, having this in syscall is more convenient for the end users.
>
> Alternatively, are files in bpffs considered to be stable API?
> E.g. having something like /sys/fs/bpf/<prog-id>/std{err,out} .

yeah. Ideally the user would just 'cat /sys/.../stdout',
but we don't auto create pseudo files when progs are loaded.
Maybe we should.
'bpftool prog show' will become 'ls' in some directory.


* Re: [PATCH bpf-next v1 08/11] bpf: Report arena faults to BPF stderr
  2025-05-07 17:17 ` [PATCH bpf-next v1 08/11] bpf: Report arena faults " Kumar Kartikeya Dwivedi
@ 2025-05-09 19:28   ` Eduard Zingerman
  2025-05-09 20:01     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 55+ messages in thread
From: Eduard Zingerman @ 2025-05-09 19:28 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Wed, 2025-05-07 at 10:17 -0700, Kumar Kartikeya Dwivedi wrote:
> Begin reporting arena page faults and the faulting address to BPF
> program's stderr, for now limited to x86, but arm64 support should
> be easy to add.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

I think this needs a corresponding test case that would check
backtrace structure and address in the message.

>  arch/x86/net/bpf_jit_comp.c | 21 ++++++++++++++++++---
>  include/linux/bpf.h         |  1 +
>  kernel/bpf/arena.c          | 14 ++++++++++++++
>  3 files changed, 33 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 17693ee6bb1a..dbb0feeec701 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -1384,15 +1384,27 @@ static int emit_atomic_ld_st_index(u8 **pprog, u32 atomic_op, u32 size,
>  }
>  
>  #define DONT_CLEAR 1
> +#define ARENA_FAULT (1 << 8)
>  
>  bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs)
>  {
> -	u32 reg = x->fixup >> 8;
> +	u32 arena_reg = (x->fixup >> 8) & 0xff;
> +	bool is_arena = !!arena_reg;
> +	u32 reg = x->fixup >> 16;
> +	unsigned long addr;
> +
> +	/* Read here, if src_reg is dst_reg for load, we'll write 0 to it. */
> +	if (is_arena)
> +		addr = *(unsigned long *)((void *)regs + arena_reg);

Is it necessary to also take offset into account when calculating address?

>  
>  	/* jump over faulting load and clear dest register */
>  	if (reg != DONT_CLEAR)
>  		*(unsigned long *)((void *)regs + reg) = 0;
>  	regs->ip += x->fixup & 0xff;
> +
> +	if (is_arena)
> +		bpf_prog_report_arena_violation(reg == DONT_CLEAR, addr);
> +
>  	return true;
>  }
>  
> @@ -2043,7 +2055,10 @@ st:			if (is_imm8(insn->off))
>  				ex->data = EX_TYPE_BPF;
>  
>  				ex->fixup = (prog - start_of_ldx) |
> -					((BPF_CLASS(insn->code) == BPF_LDX ? reg2pt_regs[dst_reg] : DONT_CLEAR) << 8);
> +					((BPF_CLASS(insn->code) == BPF_LDX ? reg2pt_regs[dst_reg] : DONT_CLEAR) << 16)
> +					| ((BPF_CLASS(insn->code) == BPF_LDX ? reg2pt_regs[src_reg] : reg2pt_regs[dst_reg])<< 8);
> +				/* Ensure src_reg offset fits in 1 byte. */
> +				BUILD_BUG_ON(sizeof(struct pt_regs) > U8_MAX);

The ex->fixup field structure should be better documented. At the
moment the docstring does not say anything about registers being encoded
within it. Also, maybe add a comment explaining why `prog - start_of_ldx`
is guaranteed to be small.
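
For illustration, the layout as I read it from this patch could be
written down along these lines. This is only a sketch: the macro and
function names are made up here, and treating fixup as a plain u32 is an
assumption, not what the kernel headers declare.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout read off the patch (names are hypothetical):
 *   bits  0-7 : length of the faulting instruction (prog - start_of_ldx)
 *   bits  8-15: pt_regs offset of the register holding the arena address
 *   bits 16+  : pt_regs offset of the register to clear, or DONT_CLEAR
 */
#define FIXUP_INSN_LEN_MASK	0xffu
#define FIXUP_ARENA_REG_SHIFT	8
#define FIXUP_ARENA_REG_MASK	0xffu
#define FIXUP_DST_REG_SHIFT	16
#define DONT_CLEAR		1u

static uint32_t fixup_encode(uint32_t insn_len, uint32_t arena_reg_off,
			     uint32_t dst_reg_off)
{
	/* Both low fields must fit in one byte each. */
	assert(insn_len <= FIXUP_INSN_LEN_MASK);
	assert(arena_reg_off <= FIXUP_ARENA_REG_MASK);

	return insn_len |
	       (arena_reg_off << FIXUP_ARENA_REG_SHIFT) |
	       (dst_reg_off << FIXUP_DST_REG_SHIFT);
}

/* Number of bytes to skip over the faulting instruction. */
static uint32_t fixup_insn_len(uint32_t fixup)
{
	return fixup & FIXUP_INSN_LEN_MASK;
}

/* pt_regs offset of the register that held the arena address. */
static uint32_t fixup_arena_reg(uint32_t fixup)
{
	return (fixup >> FIXUP_ARENA_REG_SHIFT) & FIXUP_ARENA_REG_MASK;
}

/* pt_regs offset of the register to zero, or DONT_CLEAR for stores. */
static uint32_t fixup_dst_reg(uint32_t fixup)
{
	return fixup >> FIXUP_DST_REG_SHIFT;
}
```

Named accessors like these (or just a comment block with the same table)
would make it obvious which byte the handler is reading.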

>  			}
>  			break;
>  

[...]



* Re: [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams
  2025-05-09 18:48         ` Alexei Starovoitov
@ 2025-05-09 19:37           ` Eduard Zingerman
  2025-05-09 19:50             ` Kumar Kartikeya Dwivedi
  2025-05-09 21:33           ` Andrii Nakryiko
  1 sibling, 1 reply; 55+ messages in thread
From: Eduard Zingerman @ 2025-05-09 19:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kumar Kartikeya Dwivedi, bpf, Quentin Monnet, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Emil Tsalapatis, Barret Rhoden, Matt Bobrowski, kkd, Kernel Team

On Fri, 2025-05-09 at 11:48 -0700, Alexei Starovoitov wrote:

[...]

> yeah. Ideally the user would just 'cat /sys/.../stdout',
> but we don't auto create pseudo files when progs are loaded.
> Maybe we should.
> 'bpftool prog show' will become 'ls' in some directory.

From the end user point of view, I think this is the simplest
interface possible.



* Re: [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams
  2025-05-09 19:37           ` Eduard Zingerman
@ 2025-05-09 19:50             ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-09 19:50 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, bpf, Quentin Monnet, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Emil Tsalapatis, Barret Rhoden, Matt Bobrowski, kkd, Kernel Team

On Fri, 9 May 2025 at 21:37, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2025-05-09 at 11:48 -0700, Alexei Starovoitov wrote:
>
> [...]
>
> > yeah. Ideally the user would just 'cat /sys/.../stdout',
> > but we don't auto create pseudo files when progs are loaded.
> > Maybe we should.
> > 'bpftool prog show' will become 'ls' in some directory.
>
> From the end user point of view, I think this is the simplest
> interface possible.

Alright, I will rework like this.
This will require a fair amount of reworking though, so it's going to
take some time.

>


* Re: [PATCH bpf-next v1 08/11] bpf: Report arena faults to BPF stderr
  2025-05-09 19:28   ` Eduard Zingerman
@ 2025-05-09 20:01     ` Kumar Kartikeya Dwivedi
  2025-05-09 20:07       ` Eduard Zingerman
  0 siblings, 1 reply; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-09 20:01 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Fri, 9 May 2025 at 21:28, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2025-05-07 at 10:17 -0700, Kumar Kartikeya Dwivedi wrote:
> > Begin reporting arena page faults and the faulting address to BPF
> > program's stderr, for now limited to x86, but arm64 support should
> > be easy to add.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
>
> I think this needs a corresponding test case that would check
> backtrace structure and address in the message.

Makes sense, will do.

>
> >  arch/x86/net/bpf_jit_comp.c | 21 ++++++++++++++++++---
> >  include/linux/bpf.h         |  1 +
> >  kernel/bpf/arena.c          | 14 ++++++++++++++
> >  3 files changed, 33 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 17693ee6bb1a..dbb0feeec701 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -1384,15 +1384,27 @@ static int emit_atomic_ld_st_index(u8 **pprog, u32 atomic_op, u32 size,
> >  }
> >
> >  #define DONT_CLEAR 1
> > +#define ARENA_FAULT (1 << 8)
> >
> >  bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs)
> >  {
> > -     u32 reg = x->fixup >> 8;
> > +     u32 arena_reg = (x->fixup >> 8) & 0xff;
> > +     bool is_arena = !!arena_reg;
> > +     u32 reg = x->fixup >> 16;
> > +     unsigned long addr;
> > +
> > +     /* Read here, if src_reg is dst_reg for load, we'll write 0 to it. */
> > +     if (is_arena)
> > +             addr = *(unsigned long *)((void *)regs + arena_reg);
>
> Is it necessary to also take offset into account when calculating address?
>

Not sure what you mean? "arena_reg" is basically the offset of the
register holding the arena address within pt_regs.

> >
> >       /* jump over faulting load and clear dest register */
> >       if (reg != DONT_CLEAR)
> >               *(unsigned long *)((void *)regs + reg) = 0;
> >       regs->ip += x->fixup & 0xff;
> > +
> > +     if (is_arena)
> > +             bpf_prog_report_arena_violation(reg == DONT_CLEAR, addr);
> > +
> >       return true;
> >  }
> >
> > @@ -2043,7 +2055,10 @@ st:                    if (is_imm8(insn->off))
> >                               ex->data = EX_TYPE_BPF;
> >
> >                               ex->fixup = (prog - start_of_ldx) |
> > -                                     ((BPF_CLASS(insn->code) == BPF_LDX ? reg2pt_regs[dst_reg] : DONT_CLEAR) << 8);
> > +                                     ((BPF_CLASS(insn->code) == BPF_LDX ? reg2pt_regs[dst_reg] : DONT_CLEAR) << 16)
> > +                                     | ((BPF_CLASS(insn->code) == BPF_LDX ? reg2pt_regs[src_reg] : reg2pt_regs[dst_reg])<< 8);
> > +                             /* Ensure src_reg offset fits in 1 byte. */
> > +                             BUILD_BUG_ON(sizeof(struct pt_regs) > U8_MAX);
>
> The ex->fixup field structure should be better documented, at the
> moment docstring does not say anything about registers being encoded
> within it. Also, maybe add a comment why `prog - start_of_ldx` is
> guaranteed to be small.

Ack, will add comments.

>
> >                       }
> >                       break;
> >
>
> [...]
>


* Re: [PATCH bpf-next v1 08/11] bpf: Report arena faults to BPF stderr
  2025-05-09 20:01     ` Kumar Kartikeya Dwivedi
@ 2025-05-09 20:07       ` Eduard Zingerman
  2025-05-09 20:10         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 55+ messages in thread
From: Eduard Zingerman @ 2025-05-09 20:07 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Fri, 2025-05-09 at 22:01 +0200, Kumar Kartikeya Dwivedi wrote:

[...]

> > >  bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs)
> > >  {
> > > -     u32 reg = x->fixup >> 8;
> > > +     u32 arena_reg = (x->fixup >> 8) & 0xff;
> > > +     bool is_arena = !!arena_reg;
> > > +     u32 reg = x->fixup >> 16;
> > > +     unsigned long addr;
> > > +
> > > +     /* Read here, if src_reg is dst_reg for load, we'll write 0 to it. */
> > > +     if (is_arena)
> > > +             addr = *(unsigned long *)((void *)regs + arena_reg);
> > 
> > Is it necessary to also take offset into account when calculating address?
> > 
> 
> Not sure what you mean? "arena_reg" is basically the offset of the
> register holding the arena address within pt_regs.

Arena access is translated as an instruction with three operands, e.g.:

  `movzx <dst>, byte ptr [<src> + r12 + <off>]`

As far as I understand the code, currently `addr` takes into account
`<src>` value, but not the `<off>` value.
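
To spell out the arithmetic, a tiny sketch under my reading of the
instruction above. The function names are made up for illustration, the
offset is assumed to be the instruction's signed 16-bit `off` field, and
the r12 value is passed in as an opaque `base`:

```c
#include <stdint.h>

/*
 * Address as the BPF program sees it: the source register value plus the
 * instruction's signed offset, i.e. <src> + <off>.
 */
static uint64_t arena_prog_addr(uint64_t src, int16_t off)
{
	return src + (uint64_t)(int64_t)off;
}

/*
 * Address the CPU actually dereferences: <src> + r12 + <off>, with the
 * r12 value supplied by the caller as 'base'.
 */
static uint64_t arena_cpu_addr(uint64_t src, uint64_t base, int16_t off)
{
	return arena_prog_addr(src, off) + base;
}
```

With `addr` read only from the register, a fault on `[<src> + r12 + 8]`
would be reported 8 bytes below the location that was really accessed.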

[...]



* Re: [PATCH bpf-next v1 08/11] bpf: Report arena faults to BPF stderr
  2025-05-09 20:07       ` Eduard Zingerman
@ 2025-05-09 20:10         ` Kumar Kartikeya Dwivedi
  2025-05-09 20:17           ` Eduard Zingerman
  0 siblings, 1 reply; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-09 20:10 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Fri, 9 May 2025 at 22:07, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2025-05-09 at 22:01 +0200, Kumar Kartikeya Dwivedi wrote:
>
> [...]
>
> > > >  bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs)
> > > >  {
> > > > -     u32 reg = x->fixup >> 8;
> > > > +     u32 arena_reg = (x->fixup >> 8) & 0xff;
> > > > +     bool is_arena = !!arena_reg;
> > > > +     u32 reg = x->fixup >> 16;
> > > > +     unsigned long addr;
> > > > +
> > > > +     /* Read here, if src_reg is dst_reg for load, we'll write 0 to it. */
> > > > +     if (is_arena)
> > > > +             addr = *(unsigned long *)((void *)regs + arena_reg);
> > >
> > > Is it necessary to also take offset into account when calculating address?
> > >
> >
> > Not sure what you mean? "arena_reg" is basically the offset of the
> > register holding the arena address within pt_regs.
>
> Arena access is translated as an instruction with three operands, e.g.:
>
>   `movzx <dst>, byte ptr [<src> + r12 + <off>]`
>
> As far as I understand the code, currently `addr` takes into account
> `<src>` value, but not the `<off>` value.

Ah, good point. We could certainly reconstruct it.
I'll look into it.
For prog authors I think giving them src + off in the output is the clearest?
IIUC that's what they'll see when they bpf_printk the pointer, too, right?
LLVM wouldn't insert cast insns unless the pointer is being loaded from.

>
> [...]
>


* Re: [PATCH bpf-next v1 08/11] bpf: Report arena faults to BPF stderr
  2025-05-09 20:10         ` Kumar Kartikeya Dwivedi
@ 2025-05-09 20:17           ` Eduard Zingerman
  0 siblings, 0 replies; 55+ messages in thread
From: Eduard Zingerman @ 2025-05-09 20:17 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, kernel-team

On Fri, 2025-05-09 at 22:10 +0200, Kumar Kartikeya Dwivedi wrote:

[...]

> > Arena access is translated as an instruction with three operands, e.g.:
> > 
> >   `movzx <dst>, byte ptr [<src> + r12 + <off>]`
> > 
> > As far as I understand the code, currently `addr` takes into account
> > `<src>` value, but not the `<off>` value.
> 
> Ah, good point. We could certainly reconstruct it.
> I'll look into it.
> For prog authors I think giving them src + off in the output is the clearest?
> IIUC that's what they'll see when they bpf_printk the pointer, too, right?

Yes, the final address where access occurred.

> LLVM wouldn't insert cast insns unless the pointer is being loaded from.

I think so; there is the following docstring in BPFCheckAndAdjustIR.cpp:

// Support for BPF address spaces:
// - for each function in the module M, update pointer operand of
//   each memory access instruction (load/store/cmpxchg/atomicrmw)
//   by casting it from non-zero address space to zero address space, e.g:
//
//   (load (ptr addrspace (N) %p) ...)
//     -> (load (addrspacecast ptr addrspace (N) %p to ptr))



* Re: [PATCH bpf-next v1 01/11] bpf: Introduce bpf_dynptr_from_mem_slice
  2025-05-07 17:17 ` [PATCH bpf-next v1 01/11] bpf: Introduce bpf_dynptr_from_mem_slice Kumar Kartikeya Dwivedi
  2025-05-09 17:19   ` Eduard Zingerman
@ 2025-05-09 21:11   ` Andrii Nakryiko
  1 sibling, 0 replies; 55+ messages in thread
From: Andrii Nakryiko @ 2025-05-09 21:11 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, kernel-team

On Wed, May 7, 2025 at 10:17 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> Add a new bpf_dynptr_from_mem_slice kfunc to create a dynptr from a
> PTR_TO_BTF_ID exposing a variable-length slice of memory, represented by
> the new bpf_mem_slice type. This slice is read-only, for a read-write
> slice we can expose a distinct type in the future.
>
> Since this is the first kfunc with potential local dynptr
> initialization, add it to the if-else list in check_kfunc_call.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf.h   |  6 ++++++
>  kernel/bpf/helpers.c  | 37 +++++++++++++++++++++++++++++++++++++
>  kernel/bpf/verifier.c |  6 +++++-
>  3 files changed, 48 insertions(+), 1 deletion(-)
>

LGTM

Acked-by: Andrii Nakryiko <andrii@kernel.org>


> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 3f0cc89c0622..b0ea0b71df90 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1344,6 +1344,12 @@ enum bpf_dynptr_type {
>         BPF_DYNPTR_TYPE_XDP,
>  };
>
> +struct bpf_mem_slice {
> +       void *ptr;
> +       u32 len;
> +       u32 reserved;
> +};
> +
>  int bpf_dynptr_check_size(u32 size);
>  u32 __bpf_dynptr_size(const struct bpf_dynptr_kern *ptr);
>  const void *__bpf_dynptr_data(const struct bpf_dynptr_kern *ptr, u32 len);
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 78cefb41266a..89ab3481378d 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2873,6 +2873,42 @@ __bpf_kfunc int bpf_dynptr_copy(struct bpf_dynptr *dst_ptr, u32 dst_off,
>         return 0;
>  }
>
> +/**
> + * bpf_dynptr_from_mem_slice - Create a dynptr from a bpf_mem_slice
> + * @mem_slice: Source bpf_mem_slice, backing the underlying memory for dynptr
> + * @flags: Flags for dynptr construction, currently no supported flags.
> + * @dptr__uninit: Destination dynptr, which will be initialized.
> + *
> + * Creates a dynptr that points to variable-length read-only memory represented
> + * by a bpf_mem_slice fat pointer.
> + * Returns 0 on success; negative error, otherwise.
> + */
> +__bpf_kfunc int bpf_dynptr_from_mem_slice(struct bpf_mem_slice *mem_slice, u64 flags, struct bpf_dynptr *dptr__uninit)
> +{
> +       struct bpf_dynptr_kern *dptr = (struct bpf_dynptr_kern *)dptr__uninit;
> +       int err;
> +
> +       /* mem_slice is never NULL, as we use KF_TRUSTED_ARGS. */
> +       err = bpf_dynptr_check_size(mem_slice->len);
> +       if (err)
> +               goto error;
> +
> +       /* flags is currently unsupported */
> +       if (flags) {
> +               err = -EINVAL;
> +               goto error;
> +       }
> +
> +       bpf_dynptr_init(dptr, mem_slice->ptr, BPF_DYNPTR_TYPE_LOCAL, 0, mem_slice->len);
> +       bpf_dynptr_set_rdonly(dptr);
> +
> +       return 0;
> +
> +error:
> +       bpf_dynptr_set_null(dptr);
> +       return err;
> +}
> +
>  __bpf_kfunc void *bpf_cast_to_kern_ctx(void *obj)
>  {
>         return obj;
> @@ -3327,6 +3363,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
>  BTF_ID_FLAGS(func, bpf_dynptr_size)
>  BTF_ID_FLAGS(func, bpf_dynptr_clone)
>  BTF_ID_FLAGS(func, bpf_dynptr_copy)
> +BTF_ID_FLAGS(func, bpf_dynptr_from_mem_slice, KF_TRUSTED_ARGS)
>  #ifdef CONFIG_NET
>  BTF_ID_FLAGS(func, bpf_modify_return_test_tp)
>  #endif
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 99aa2c890e7b..ff34e68c9237 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -12116,6 +12116,7 @@ enum special_kfunc_type {
>         KF_bpf_res_spin_unlock,
>         KF_bpf_res_spin_lock_irqsave,
>         KF_bpf_res_spin_unlock_irqrestore,
> +       KF_bpf_dynptr_from_mem_slice,
>  };
>
>  BTF_SET_START(special_kfunc_set)
> @@ -12219,6 +12220,7 @@ BTF_ID(func, bpf_res_spin_lock)
>  BTF_ID(func, bpf_res_spin_unlock)
>  BTF_ID(func, bpf_res_spin_lock_irqsave)
>  BTF_ID(func, bpf_res_spin_unlock_irqrestore)
> +BTF_ID(func, bpf_dynptr_from_mem_slice)
>
>  static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
>  {
> @@ -13140,7 +13142,9 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>                         if (is_kfunc_arg_uninit(btf, &args[i]))
>                                 dynptr_arg_type |= MEM_UNINIT;
>
> -                       if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_skb]) {
> +                       if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_mem_slice]) {
> +                               dynptr_arg_type |= DYNPTR_TYPE_LOCAL;
> +                       } else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_skb]) {
>                                 dynptr_arg_type |= DYNPTR_TYPE_SKB;
>                         } else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_xdp]) {
>                                 dynptr_arg_type |= DYNPTR_TYPE_XDP;
> --
> 2.47.1
>


* Re: [PATCH bpf-next v1 03/11] bpf: Add function to extract program source info
  2025-05-07 17:17 ` [PATCH bpf-next v1 03/11] bpf: Add function to extract program source info Kumar Kartikeya Dwivedi
  2025-05-08 10:30   ` kernel test robot
  2025-05-08 20:15   ` Eduard Zingerman
@ 2025-05-09 21:17   ` Andrii Nakryiko
  2 siblings, 0 replies; 55+ messages in thread
From: Andrii Nakryiko @ 2025-05-09 21:17 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, kernel-team

On Wed, May 7, 2025 at 10:17 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> Prepare a function for use in future patches that can extract the file
> info, line info, and the source line number for a given BPF program
> provided its program counter.
>
> Only the basename of the file path is provided, given it can be
> excessively long in some cases.
>
> This will be used in later patches to print source info to the BPF
> stream. The source line number is indicated by the return value, and the
> file and line info are provided through out parameters.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf.h |  2 ++
>  kernel/bpf/core.c   | 40 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 2c10ae62df2d..f12a0bf536c0 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -3644,4 +3644,6 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
>         return prog->aux->func_idx != 0;
>  }
>
> +int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep, const char **linep);
> +
>  #endif /* _LINUX_BPF_H */
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 22c278c008ce..df1bae084abd 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -3204,3 +3204,43 @@ EXPORT_SYMBOL(bpf_stats_enabled_key);
>
>  EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_exception);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_bulk_tx);
> +
> +int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep, const char **linep)
> +{
> +       int idx = -1, insn_start, insn_end, len;
> +       struct bpf_line_info *linfo;
> +       void **jited_linfo;
> +       struct btf *btf;
> +
> +       btf = prog->aux->btf;
> +       linfo = prog->aux->linfo;
> +       jited_linfo = prog->aux->jited_linfo;
> +
> +       if (!btf || !linfo || !prog->aux->jited_linfo)
> +               return -EINVAL;
> +       len = prog->aux->func ? prog->aux->func[prog->aux->func_idx]->len : prog->len;
> +
> +       linfo = &prog->aux->linfo[prog->aux->linfo_idx];
> +       jited_linfo = &prog->aux->jited_linfo[prog->aux->linfo_idx];
> +
> +       insn_start = linfo[0].insn_off;
> +       insn_end = insn_start + len;
> +
> +       for (int i = 0; linfo[i].insn_off >= insn_start && linfo[i].insn_off < insn_end; i++) {

Have you checked find_linfo() in kernel/bpf/log.c? It uses binary
search; why not do that here as well? Or, better yet, reuse the code to
find bpf_line_info, and then extract whatever derived data you need?
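
Something along these lines, as a standalone sketch only: this is not
the kernel's find_linfo(), the function name is made up, and it assumes
the jited addresses for the subprog are sorted in ascending order. It
keeps the same rule as the loop above, i.e. pick the last entry whose
address is strictly below ip.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the last entry in the ascending array 'addrs' that
 * is strictly below 'ip', or -1 if there is none. This matches a linear
 * scan that stops at the first entry with addrs[i] >= ip.
 */
static long last_addr_below(const uintptr_t *addrs, size_t n, uintptr_t ip)
{
	size_t lo = 0, hi = n;	/* search window is [lo, hi) */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addrs[mid] < ip)
			lo = mid + 1;	/* mid qualifies; look further right */
		else
			hi = mid;	/* mid is at or past ip; look left */
	}

	/* 'lo' is now the count of entries below ip. */
	return (long)lo - 1;
}
```

A -1 result corresponds to the -ENOENT case in the patch.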

> +               if (jited_linfo[i] >= (void *)ip)
> +                       break;
> +               idx = i;
> +       }
> +
> +       if (idx == -1)
> +               return -ENOENT;
> +
> +       /* Get base component of the file path. */
> +       *filep = btf_name_by_offset(btf, linfo[idx].file_name_off);
> +       *filep = kbasename(*filep);
> +       /* Obtain the source line, and strip whitespace in prefix. */
> +       *linep = btf_name_by_offset(btf, linfo[idx].line_off);
> +       while (isspace(**linep))
> +               *linep += 1;
> +       return BPF_LINE_INFO_LINE_NUM(linfo[idx].line_col);

we do a bunch of this in verbose_linfo(), maybe extract common code to
reuse? (no strong feeling about this, it's just a few lines of code
that are unlikely to change, after all)

> +}
> --
> 2.47.1
>


* Re: [PATCH bpf-next v1 09/11] libbpf: Add bpf_stream_printk() macro
  2025-05-07 17:17 ` [PATCH bpf-next v1 09/11] libbpf: Add bpf_stream_printk() macro Kumar Kartikeya Dwivedi
  2025-05-08 23:31   ` Eduard Zingerman
  2025-05-08 23:41   ` Alexei Starovoitov
@ 2025-05-09 21:26   ` Andrii Nakryiko
  2 siblings, 0 replies; 55+ messages in thread
From: Andrii Nakryiko @ 2025-05-09 21:26 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
	Barret Rhoden, Matt Bobrowski, kkd, kernel-team

On Wed, May 7, 2025 at 10:17 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> Introduce a new macro that allows printing data similar to bpf_printk(),
> but to BPF streams. The first argument is the stream ID, the rest of the
> arguments are same as what one would pass to bpf_printk().
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  kernel/bpf/stream.c         | 10 +++++++--
>  tools/lib/bpf/bpf_helpers.h | 44 +++++++++++++++++++++++++++++++------
>  2 files changed, 45 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/bpf/stream.c b/kernel/bpf/stream.c
> index eaf0574866b1..d64975486ad1 100644
> --- a/kernel/bpf/stream.c
> +++ b/kernel/bpf/stream.c
> @@ -257,7 +257,12 @@ __bpf_kfunc int bpf_stream_vprintk(struct bpf_stream *stream, const char *fmt__s
>         return ret;
>  }
>
> -__bpf_kfunc struct bpf_stream *bpf_stream_get(enum bpf_stream_id stream_id, void *aux__ign)
> +/* Use int vs enum stream_id here, we use this kfunc in bpf_helpers.h, and
> + * keeping enum stream_id necessitates a complete definition of enum, but we
> + * can't copy it in the header as it may conflict with the definition in
> + * vmlinux.h.
> + */
> +__bpf_kfunc struct bpf_stream *bpf_stream_get(int stream_id, void *aux__ign)
>  {
>         struct bpf_prog_aux *aux = aux__ign;
>
> @@ -351,7 +356,8 @@ __bpf_kfunc struct bpf_stream_elem *bpf_stream_next_elem(struct bpf_stream *stre
>         return elem;
>  }
>
> -__bpf_kfunc struct bpf_stream *bpf_prog_stream_get(enum bpf_stream_id stream_id, u32 prog_id)
> +/* Use int vs enum bpf_stream_id for consistency with bpf_stream_get. */
> +__bpf_kfunc struct bpf_stream *bpf_prog_stream_get(int stream_id, u32 prog_id)
>  {
>         struct bpf_stream *stream;
>         struct bpf_prog *prog;
> diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> index a50773d4616e..1a748c21e358 100644
> --- a/tools/lib/bpf/bpf_helpers.h
> +++ b/tools/lib/bpf/bpf_helpers.h
> @@ -314,17 +314,47 @@ enum libbpf_tristate {
>                           ___param, sizeof(___param));          \
>  })
>
> +struct bpf_stream;
> +
> +extern struct bpf_stream *bpf_stream_get(int stream_id, void *aux__ign) __weak __ksym;
> +extern int bpf_stream_vprintk(struct bpf_stream *stream, const char *fmt__str, const void *args,
> +                             __u32 len__sz) __weak __ksym;
> +
> +#define __bpf_stream_vprintk(stream, fmt, args...)                             \
> +({                                                                             \
> +       static const char ___fmt[] = fmt;                                       \
> +       unsigned long long ___param[___bpf_narg(args)];                         \
> +                                                                               \
> +       _Pragma("GCC diagnostic push")                                          \
> +       _Pragma("GCC diagnostic ignored \"-Wint-conversion\"")                  \
> +       ___bpf_fill(___param, args);                                            \
> +       _Pragma("GCC diagnostic pop")                                           \
> +                                                                               \
> +       int ___id = stream;                                                     \

What's the point of ___id variable? Just use `stream` in
bpf_stream_get() directly?

> +       struct bpf_stream *___sptr = bpf_stream_get(___id, NULL);               \
> +       if (___sptr)                                                            \
> +               bpf_stream_vprintk(___sptr, ___fmt, ___param, sizeof(___param));\
> +})
> +
>  /* Use __bpf_printk when bpf_printk call has 3 or fewer fmt args
> - * Otherwise use __bpf_vprintk
> + * Otherwise use __bpf_vprintk. Virtualize choices so stream printk
> + * can override it to bpf_stream_vprintk.
>   */
> -#define ___bpf_pick_printk(...) \
> -       ___bpf_nth(_, ##__VA_ARGS__, __bpf_vprintk, __bpf_vprintk, __bpf_vprintk,       \
> -                  __bpf_vprintk, __bpf_vprintk, __bpf_vprintk, __bpf_vprintk,          \
> -                  __bpf_vprintk, __bpf_vprintk, __bpf_printk /*3*/, __bpf_printk /*2*/,\
> -                  __bpf_printk /*1*/, __bpf_printk /*0*/)
> +#define ___bpf_pick_printk(choice, choice_3, ...)                      \
> +       ___bpf_nth(_, ##__VA_ARGS__, choice, choice, choice,            \
> +                  choice, choice, choice, choice,                      \
> +                  choice, choice, choice_3 /*3*/, choice_3 /*2*/,      \
> +                  choice_3 /*1*/, choice_3 /*0*/)
>
>  /* Helper macro to print out debug messages */
> -#define bpf_printk(fmt, args...) ___bpf_pick_printk(args)(fmt, ##args)
> +#define __bpf_trace_printk(fmt, args...) \
> +       ___bpf_pick_printk(__bpf_vprintk, __bpf_printk, args)(fmt, ##args)
> +#define __bpf_stream_printk(stream, fmt, args...) \
> +       ___bpf_pick_printk(__bpf_stream_vprintk, __bpf_stream_vprintk, args)(stream, fmt, ##args)
> +
> +#define bpf_stream_printk(stream, fmt, args...) __bpf_stream_printk(stream, fmt, ##args)
> +
> +#define bpf_printk(arg, args...) __bpf_trace_printk(arg, ##args)
>
>  struct bpf_iter_num;
>
> --
> 2.47.1
>


* Re: [PATCH bpf-next v1 09/11] libbpf: Add bpf_stream_printk() macro
  2025-05-09  6:16       ` Eduard Zingerman
@ 2025-05-09 21:28         ` Andrii Nakryiko
  0 siblings, 0 replies; 55+ messages in thread
From: Andrii Nakryiko @ 2025-05-09 21:28 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Kumar Kartikeya Dwivedi, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden,
	Matt Bobrowski, kkd, kernel-team

On Thu, May 8, 2025 at 11:16 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2025-05-09 at 01:33 +0200, Kumar Kartikeya Dwivedi wrote:
>
> [...]
>
> > > > -#define ___bpf_pick_printk(...) \
> > > > -     ___bpf_nth(_, ##__VA_ARGS__, __bpf_vprintk, __bpf_vprintk, __bpf_vprintk,       \
> > > > -                __bpf_vprintk, __bpf_vprintk, __bpf_vprintk, __bpf_vprintk,          \
> > > > -                __bpf_vprintk, __bpf_vprintk, __bpf_printk /*3*/, __bpf_printk /*2*/,\
> > > > -                __bpf_printk /*1*/, __bpf_printk /*0*/)
> > > > +#define ___bpf_pick_printk(choice, choice_3, ...)                    \
> > > > +     ___bpf_nth(_, ##__VA_ARGS__, choice, choice, choice,            \
> > > > +                choice, choice, choice, choice,                      \
> > > > +                choice, choice, choice_3 /*3*/, choice_3 /*2*/,      \
> > > > +                choice_3 /*1*/, choice_3 /*0*/)
> > > >
> > > >  /* Helper macro to print out debug messages */
> > > > -#define bpf_printk(fmt, args...) ___bpf_pick_printk(args)(fmt, ##args)
> > > > +#define __bpf_trace_printk(fmt, args...) \
> > > > +     ___bpf_pick_printk(__bpf_vprintk, __bpf_printk, args)(fmt, ##args)
> > > > +#define __bpf_stream_printk(stream, fmt, args...) \
> > > > +     ___bpf_pick_printk(__bpf_stream_vprintk, __bpf_stream_vprintk, args)(stream, fmt, ##args)
> > >                            ^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^
> > >                            These two parameters are identical,
> > >                            why is ___bpf_pick_printk necessary in such a case?
> >
> > In our case choice and choice_3 are the same, but for bpf_printk
> > they're different, I was mostly trying to reuse the pick_printk
> > machinery for both (which dispatches correctly to the actual macro).
> >
>
> But ___bpf_pick_printk is a noop if two identical choices are supplied,
> so there is nothing to reuse. E.g. nothing breaks after the following change:
>
>    #define __bpf_trace_printk(fmt, args...) \
>           ___bpf_pick_printk(__bpf_vprintk, __bpf_printk, args)(fmt, ##args)
>   -#define __bpf_stream_printk(stream, fmt, args...) \
>   -       ___bpf_pick_printk(__bpf_stream_vprintk, __bpf_stream_vprintk, args)(stream, fmt, ##args)
>
>   -#define bpf_stream_printk(stream, fmt, args...) __bpf_stream_printk(stream, fmt, ##args)
>   +#define bpf_stream_printk(stream, fmt, args...) __bpf_stream_vprintk(stream, fmt, ##args)
>
>    #define bpf_printk(arg, args...) __bpf_trace_printk(arg, ##args)
>
> This allows shortening the patch.
> Or am I missing something?
>

+1, we have this ___bpf_pick_printk business because we want to use the
older bpf_trace_printk(), which accepts values directly, whenever possible
(for best support of old kernels). bpf_stream_vprintk() always takes the
values-in-array approach, so there is no need for all this extra macro
machinery, IMO.
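
For illustration, here is a minimal standalone sketch of the
argument-count dispatch discussed above. It assumes a compiler with the
GNU `, ##__VA_ARGS__` extension (GCC or Clang in their default modes).
The names PICK_NTH, PICK, VIA_DIRECT, VIA_ARRAY, TRACE_STYLE and
STREAM_STYLE are made up for this example and only mirror the shape of
___bpf_nth and ___bpf_pick_printk; they are not libbpf API.

```c
/* Which path a call was routed to (illustrative values only). */
enum { DIRECT = 1, ARRAY = 2 };

/* Select the 14th argument, in the same way ___bpf_nth does. */
#define PICK_NTH(_, _1, _2, _3, _4, _5, _6, _7, _8, _9, _a, _b, _c, N, ...) N

/* 0..3 extra arguments select choice_3, 4..12 select choice. */
#define PICK(choice, choice_3, ...)                                   \
	PICK_NTH(_, ##__VA_ARGS__, choice, choice, choice,            \
		 choice, choice, choice, choice,                      \
		 choice, choice, choice_3 /*3*/, choice_3 /*2*/,      \
		 choice_3 /*1*/, choice_3 /*0*/)

/* Stand-ins for the two back ends; each just reports which one ran. */
#define VIA_DIRECT(fmt, args...) DIRECT
#define VIA_ARRAY(fmt, args...)  ARRAY

/* Two different choices: the argument count decides the back end. */
#define TRACE_STYLE(fmt, args...) \
	PICK(VIA_ARRAY, VIA_DIRECT, args)(fmt, ##args)

/* Two identical choices: the dispatch can never pick anything else. */
#define STREAM_STYLE(fmt, args...) \
	PICK(VIA_ARRAY, VIA_ARRAY, args)(fmt, ##args)
```

With two different choices the argument count matters: up to three
values go to the direct path and more go to the array path. With two
identical choices every call lands on the same macro, which is why
calling it directly is equivalent.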

> [...]
>


* Re: [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams
  2025-05-09 18:48         ` Alexei Starovoitov
  2025-05-09 19:37           ` Eduard Zingerman
@ 2025-05-09 21:33           ` Andrii Nakryiko
  2025-05-12 20:51             ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 55+ messages in thread
From: Andrii Nakryiko @ 2025-05-09 21:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eduard Zingerman, Kumar Kartikeya Dwivedi, bpf, Quentin Monnet,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, Kernel Team

On Fri, May 9, 2025 at 11:48 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, May 9, 2025 at 11:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Fri, 2025-05-09 at 10:31 -0700, Alexei Starovoitov wrote:
> >
> > [...]
> >
> > > How about we extend BPF_OBJ_GET_INFO_BY_FD to return stream data?
> > > Or add a new command ?
> >
> > You mean like this:
> >
> > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > index 71d5ac83cf5d..25ac28d11af5 100644
> > --- a/tools/include/uapi/linux/bpf.h
> > +++ b/tools/include/uapi/linux/bpf.h
> > @@ -6610,6 +6610,10 @@ struct bpf_prog_info {
> >         __u32 verified_insns;
> >         __u32 attach_btf_obj_id;
> >         __u32 attach_btf_id;
> > +       __u32 stdout_len; /* length of the buffer passed in 'stdout' */
> > +       __u32 stderr_len; /* length of the buffer passed in 'stderr' */
> > +       __aligned_u64 stdout;
> > +       __aligned_u64 stderr;
> >  } __attribute__((aligned(8)));
> >
> > And return -EAGAIN if there is more data to read?
>
> Exactly.
> The only concern that all other __aligned_u64 will probably be zero,
> but kernel will still fill in all other non-pointer fields and
> that information will be re-populated again and again,
> so new command might be cleaner.

+1, but I'd allow reading only one of stdout or stderr per command
invocation to keep things simple API-wise (e.g., which stream got
EAGAIN if you asked for both?). I haven't read carefully enough to
know if we'll allow creating custom streams beyond stderr/stdout, but
this would scale to that more naturally as well.
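
A rough sketch of what a one-stream-per-call read could look like from
the caller's side. Everything here is hypothetical: the demo_* names,
the stream id values and the in-memory mock stand in for a command that
does not exist in this series yet. The point is only that with a single
stream id per call, the return value is unambiguous.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stream ids; the real values are not defined in this thread. */
enum demo_stream_id { DEMO_STDOUT = 1, DEMO_STDERR = 2 };

/* In-memory stand-in for one stream's pending data. */
struct demo_stream {
	const char *data;
	size_t len;
	size_t pos;
};

/* In-memory stand-in for a program with its two streams. */
struct demo_prog {
	struct demo_stream streams[2];
};

/*
 * Mock of a single read call: copies up to buf_len pending bytes from
 * one stream. Returns the number of bytes copied, 0 once the stream is
 * drained, or -EINVAL for an unknown stream id.
 */
static long demo_stream_read(struct demo_prog *prog, int stream_id,
			     char *buf, size_t buf_len)
{
	struct demo_stream *s;
	size_t n;

	if (stream_id != DEMO_STDOUT && stream_id != DEMO_STDERR)
		return -EINVAL;
	s = &prog->streams[stream_id - 1];
	n = s->len - s->pos;
	if (n > buf_len)
		n = buf_len;
	memcpy(buf, s->data + s->pos, n);
	s->pos += n;
	return (long)n;
}

/* Drain one stream into out, at most out_len bytes in total. */
static long demo_drain(struct demo_prog *prog, int stream_id,
		       char *out, size_t out_len)
{
	size_t total = 0;

	while (total < out_len) {
		size_t want = out_len - total;
		long n;

		if (want > 8)
			want = 8; /* small chunks, to exercise the loop */
		n = demo_stream_read(prog, stream_id, out + total, want);
		if (n < 0)
			return n; /* the error refers to this stream only */
		if (n == 0)
			break; /* nothing left to read */
		total += (size_t)n;
	}
	return (long)total;
}
```

A custom stream would just be another id passed to the same call, so
the interface would not need to grow a field per stream.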



>
> > Imo, having this in syscall is more convenient for the end users.
> >
> > Alternatively, are files in bpffs considered to be stable API?
> > E.g. having something like /sys/fs/bpf/<prog-id>/std{err,out} .
>
> yeah. Ideally the user would just 'cat /sys/.../stdout',
> but we don't auto create pseudo files when progs are loaded.
> Maybe we should.
> 'bpftool prog show' will become 'ls' in some directory.


* Re: [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams
  2025-05-09 21:33           ` Andrii Nakryiko
@ 2025-05-12 20:51             ` Kumar Kartikeya Dwivedi
  2025-05-12 21:35               ` Andrii Nakryiko
  2025-05-12 21:50               ` Alexei Starovoitov
  0 siblings, 2 replies; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-12 20:51 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Eduard Zingerman, bpf, Quentin Monnet,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, Kernel Team

On Fri, 9 May 2025 at 17:33, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Fri, May 9, 2025 at 11:48 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, May 9, 2025 at 11:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > >
> > > On Fri, 2025-05-09 at 10:31 -0700, Alexei Starovoitov wrote:
> > >
> > > [...]
> > >
> > > > How about we extend BPF_OBJ_GET_INFO_BY_FD to return stream data?
> > > > Or add a new command ?
> > >
> > > You mean like this:
> > >
> > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > > index 71d5ac83cf5d..25ac28d11af5 100644
> > > --- a/tools/include/uapi/linux/bpf.h
> > > +++ b/tools/include/uapi/linux/bpf.h
> > > @@ -6610,6 +6610,10 @@ struct bpf_prog_info {
> > >         __u32 verified_insns;
> > >         __u32 attach_btf_obj_id;
> > >         __u32 attach_btf_id;
> > > +       __u32 stdout_len; /* length of the buffer passed in 'stdout' */
> > > +       __u32 stderr_len; /* length of the buffer passed in 'stderr' */
> > > +       __aligned_u64 stdout;
> > > +       __aligned_u64 stderr;
> > >  } __attribute__((aligned(8)));
> > >
> > > And return -EAGAIN if there is more data to read?
> >
> > Exactly.
> > The only concern that all other __aligned_u64 will probably be zero,
> > but kernel will still fill in all other non-pointer fields and
> > that information will be re-populated again and again,
> > so new command might be cleaner.
>
> +1, but I'd allow reading only either stdout or stderr per each
> command invocation to keep things simple API-wise (e.g., which stream
> got EAGAIN, if you asked for both?) I haven't read carefully enough to
> know if we'll allow creating custom streams beyond stderr/stdout, but
> this would scale to that more naturally as well.
>

What's your preference/concerns re: pseudo files in sysfs?
That does seem like it would be simplest for someone using this
(read() on a file vs special BPF syscall).

>
>
> >
> > > Imo, having this in syscall is more convenient for the end users.
> > >
> > > Alternatively, are files in bpffs considered to be stable API?
> > > E.g. having something like /sys/fs/bpf/<prog-id>/std{err,out} .
> >
> > yeah. Ideally the user would just 'cat /sys/.../stdout',
> > but we don't auto create pseudo files when progs are loaded.
> > Maybe we should.
> > 'bpftool prog show' will become 'ls' in some directory.


* Re: [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams
  2025-05-12 20:51             ` Kumar Kartikeya Dwivedi
@ 2025-05-12 21:35               ` Andrii Nakryiko
  2025-05-12 21:50               ` Alexei Starovoitov
  1 sibling, 0 replies; 55+ messages in thread
From: Andrii Nakryiko @ 2025-05-12 21:35 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Alexei Starovoitov, Eduard Zingerman, bpf, Quentin Monnet,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, Kernel Team

On Mon, May 12, 2025 at 1:51 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Fri, 9 May 2025 at 17:33, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Fri, May 9, 2025 at 11:48 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Fri, May 9, 2025 at 11:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > > >
> > > > On Fri, 2025-05-09 at 10:31 -0700, Alexei Starovoitov wrote:
> > > >
> > > > [...]
> > > >
> > > > > How about we extend BPF_OBJ_GET_INFO_BY_FD to return stream data?
> > > > > Or add a new command ?
> > > >
> > > > You mean like this:
> > > >
> > > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > > > index 71d5ac83cf5d..25ac28d11af5 100644
> > > > --- a/tools/include/uapi/linux/bpf.h
> > > > +++ b/tools/include/uapi/linux/bpf.h
> > > > @@ -6610,6 +6610,10 @@ struct bpf_prog_info {
> > > >         __u32 verified_insns;
> > > >         __u32 attach_btf_obj_id;
> > > >         __u32 attach_btf_id;
> > > > +       __u32 stdout_len; /* length of the buffer passed in 'stdout' */
> > > > +       __u32 stderr_len; /* length of the buffer passed in 'stderr' */
> > > > +       __aligned_u64 stdout;
> > > > +       __aligned_u64 stderr;
> > > >  } __attribute__((aligned(8)));
> > > >
> > > > And return -EAGAIN if there is more data to read?
> > >
> > > Exactly.
> > > The only concern that all other __aligned_u64 will probably be zero,
> > > but kernel will still fill in all other non-pointer fields and
> > > that information will be re-populated again and again,
> > > so new command might be cleaner.
> >
> > +1, but I'd allow reading only either stdout or stderr per each
> > command invocation to keep things simple API-wise (e.g., which stream
> > got EAGAIN, if you asked for both?) I haven't read carefully enough to
> > know if we'll allow creating custom streams beyond stderr/stdout, but
> > this would scale to that more naturally as well.
> >
>
> What's your preference/concerns re: pseudo files in sysfs?
> That does seem like it would be simplest for someone using this
> (read() on a file vs special BPF syscall).

The sysfs approach seems fine to me; I don't think I have any concerns.

>
> >
> >
> > >
> > > > Imo, having this in syscall is more convenient for the end users.
> > > >
> > > > Alternatively, are files in bpffs considered to be stable API?
> > > > E.g. having something like /sys/fs/bpf/<prog-id>/std{err,out} .
> > >
> > > yeah. Ideally the user would just 'cat /sys/.../stdout',
> > > but we don't auto create pseudo files when progs are loaded.
> > > Maybe we should.
> > > 'bpftool prog show' will become 'ls' in some directory.


* Re: [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams
  2025-05-12 20:51             ` Kumar Kartikeya Dwivedi
  2025-05-12 21:35               ` Andrii Nakryiko
@ 2025-05-12 21:50               ` Alexei Starovoitov
  2025-05-12 21:56                 ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 55+ messages in thread
From: Alexei Starovoitov @ 2025-05-12 21:50 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Andrii Nakryiko, Eduard Zingerman, bpf, Quentin Monnet,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, Kernel Team

On Mon, May 12, 2025 at 1:51 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Fri, 9 May 2025 at 17:33, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Fri, May 9, 2025 at 11:48 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Fri, May 9, 2025 at 11:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > > >
> > > > On Fri, 2025-05-09 at 10:31 -0700, Alexei Starovoitov wrote:
> > > >
> > > > [...]
> > > >
> > > > > How about we extend BPF_OBJ_GET_INFO_BY_FD to return stream data?
> > > > > Or add a new command ?
> > > >
> > > > You mean like this:
> > > >
> > > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > > > index 71d5ac83cf5d..25ac28d11af5 100644
> > > > --- a/tools/include/uapi/linux/bpf.h
> > > > +++ b/tools/include/uapi/linux/bpf.h
> > > > @@ -6610,6 +6610,10 @@ struct bpf_prog_info {
> > > >         __u32 verified_insns;
> > > >         __u32 attach_btf_obj_id;
> > > >         __u32 attach_btf_id;
> > > > +       __u32 stdout_len; /* length of the buffer passed in 'stdout' */
> > > > +       __u32 stderr_len; /* length of the buffer passed in 'stderr' */
> > > > +       __aligned_u64 stdout;
> > > > +       __aligned_u64 stderr;
> > > >  } __attribute__((aligned(8)));
> > > >
> > > > And return -EAGAIN if there is more data to read?
> > >
> > > Exactly.
> > > The only concern that all other __aligned_u64 will probably be zero,
> > > but kernel will still fill in all other non-pointer fields and
> > > that information will be re-populated again and again,
> > > so new command might be cleaner.
> >
> > +1, but I'd allow reading only either stdout or stderr per each
> > command invocation to keep things simple API-wise (e.g., which stream
> > got EAGAIN, if you asked for both?) I haven't read carefully enough to
> > know if we'll allow creating custom streams beyond stderr/stdout, but
> > this would scale to that more naturally as well.
> >
>
> What's your preference/concerns re: pseudo files in sysfs?
> That does seem like it would be simplest for someone using this
> (read() on a file vs special BPF syscall).

sysfs is ABI.
If we start creating directories like
/sys/kernel/bpf/<prog_id>/stdout,
they will be permanent.

Though I'd like to see it, I feel we're not quite ready
to cross that bridge.

Let's add a new sys_bpf command for now,
some trivial helper function in libbpf,
and corresponding bpftool support.


* Re: [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams
  2025-05-12 21:50               ` Alexei Starovoitov
@ 2025-05-12 21:56                 ` Kumar Kartikeya Dwivedi
  2025-05-12 22:07                   ` Alexei Starovoitov
  0 siblings, 1 reply; 55+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-12 21:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Eduard Zingerman, bpf, Quentin Monnet,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, Kernel Team

On Mon, 12 May 2025 at 17:50, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, May 12, 2025 at 1:51 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Fri, 9 May 2025 at 17:33, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Fri, May 9, 2025 at 11:48 AM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Fri, May 9, 2025 at 11:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > > > >
> > > > > On Fri, 2025-05-09 at 10:31 -0700, Alexei Starovoitov wrote:
> > > > >
> > > > > [...]
> > > > >
> > > > > > How about we extend BPF_OBJ_GET_INFO_BY_FD to return stream data?
> > > > > > Or add a new command ?
> > > > >
> > > > > You mean like this:
> > > > >
> > > > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > > > > index 71d5ac83cf5d..25ac28d11af5 100644
> > > > > --- a/tools/include/uapi/linux/bpf.h
> > > > > +++ b/tools/include/uapi/linux/bpf.h
> > > > > @@ -6610,6 +6610,10 @@ struct bpf_prog_info {
> > > > >         __u32 verified_insns;
> > > > >         __u32 attach_btf_obj_id;
> > > > >         __u32 attach_btf_id;
> > > > > +       __u32 stdout_len; /* length of the buffer passed in 'stdout' */
> > > > > +       __u32 stderr_len; /* length of the buffer passed in 'stderr' */
> > > > > +       __aligned_u64 stdout;
> > > > > +       __aligned_u64 stderr;
> > > > >  } __attribute__((aligned(8)));
> > > > >
> > > > > And return -EAGAIN if there is more data to read?
> > > >
> > > > Exactly.
> > > > The only concern that all other __aligned_u64 will probably be zero,
> > > > but kernel will still fill in all other non-pointer fields and
> > > > that information will be re-populated again and again,
> > > > so new command might be cleaner.
> > >
> > > +1, but I'd allow reading only either stdout or stderr per each
> > > command invocation to keep things simple API-wise (e.g., which stream
> > > got EAGAIN, if you asked for both?) I haven't read carefully enough to
> > > know if we'll allow creating custom streams beyond stderr/stdout, but
> > > this would scale to that more naturally as well.
> > >
> >
> > What's your preference/concerns re: pseudo files in sysfs?
> > That does seem like it would be simplest for someone using this
> > (read() on a file vs special BPF syscall).
>
> sysfs is abi.
> If we start creating directories:
> /sys/kernel/bpf/<prog_id>/stdout
> it will be permanent.
>
> Though I'd like to see it, I feel we're not quite ready
> to cross that bridge.
>
> Let's add a new sys_bpf command for now,
> some trivial helper function in libbpf,
> and corresponding bpftool support.

Ok, but the new sys_bpf command is also ABI, no?
I'm fine with either, but it seems both will be permanent.
Only difference is visibility.


* Re: [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams
  2025-05-12 21:56                 ` Kumar Kartikeya Dwivedi
@ 2025-05-12 22:07                   ` Alexei Starovoitov
  0 siblings, 0 replies; 55+ messages in thread
From: Alexei Starovoitov @ 2025-05-12 22:07 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Andrii Nakryiko, Eduard Zingerman, bpf, Quentin Monnet,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
	kkd, Kernel Team

On Mon, May 12, 2025 at 2:57 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Mon, 12 May 2025 at 17:50, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, May 12, 2025 at 1:51 PM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > On Fri, 9 May 2025 at 17:33, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Fri, May 9, 2025 at 11:48 AM Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > On Fri, May 9, 2025 at 11:31 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, 2025-05-09 at 10:31 -0700, Alexei Starovoitov wrote:
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > > > How about we extend BPF_OBJ_GET_INFO_BY_FD to return stream data?
> > > > > > > Or add a new command ?
> > > > > >
> > > > > > You mean like this:
> > > > > >
> > > > > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > > > > > index 71d5ac83cf5d..25ac28d11af5 100644
> > > > > > --- a/tools/include/uapi/linux/bpf.h
> > > > > > +++ b/tools/include/uapi/linux/bpf.h
> > > > > > @@ -6610,6 +6610,10 @@ struct bpf_prog_info {
> > > > > >         __u32 verified_insns;
> > > > > >         __u32 attach_btf_obj_id;
> > > > > >         __u32 attach_btf_id;
> > > > > > +       __u32 stdout_len; /* length of the buffer passed in 'stdout' */
> > > > > > +       __u32 stderr_len; /* length of the buffer passed in 'stderr' */
> > > > > > +       __aligned_u64 stdout;
> > > > > > +       __aligned_u64 stderr;
> > > > > >  } __attribute__((aligned(8)));
> > > > > >
> > > > > > And return -EAGAIN if there is more data to read?
> > > > >
> > > > > Exactly.
> > > > > The only concern that all other __aligned_u64 will probably be zero,
> > > > > but kernel will still fill in all other non-pointer fields and
> > > > > that information will be re-populated again and again,
> > > > > so new command might be cleaner.
> > > >
> > > > +1, but I'd allow reading only either stdout or stderr per each
> > > > command invocation to keep things simple API-wise (e.g., which stream
> > > > got EAGAIN, if you asked for both?) I haven't read carefully enough to
> > > > know if we'll allow creating custom streams beyond stderr/stdout, but
> > > > this would scale to that more naturally as well.
> > > >
> > >
> > > What's your preference/concerns re: pseudo files in sysfs?
> > > That does seem like it would be simplest for someone using this
> > > (read() on a file vs special BPF syscall).
> >
> > sysfs is abi.
> > If we start creating directories:
> > /sys/kernel/bpf/<prog_id>/stdout
> > it will be permanent.
> >
> > Though I'd like to see it, I feel we're not quite ready
> > to cross that bridge.
> >
> > Let's add a new sys_bpf command for now,
> > some trivial helper function in libbpf,
> > and corresponding bpftool support.
>
> Ok, but the new sys_bpf command is also ABI, no?
> I'm fine with either, but it seems both will be permanent.
> Only difference is visibility.

Right, but the blast radius is smaller.
cmd is easier to extend.
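
One way to read "easier to extend": sys_bpf commands take a sized
attribute block, and the usual convention is that bytes beyond what the
kernel knows about must be zero. A user-space sketch of that
convention, assuming a hypothetical attribute layout (demo_attr_v1,
demo_attr_v2 and demo_copy_attr are invented for this example and are
not the real UAPI):

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Layout the "kernel" side of this sketch knows about. */
struct demo_attr_v1 {
	unsigned int prog_fd;
	unsigned int stream_id;
};

/* Layout a newer caller might use, with one field appended. */
struct demo_attr_v2 {
	unsigned int prog_fd;
	unsigned int stream_id;
	unsigned int flags;
};

/*
 * Accept an attribute block of usize bytes from a caller.
 * - A larger block is fine as long as the unknown tail is all zero.
 * - A smaller block is fine; the missing fields read as zero.
 * Returns 0 on success, -E2BIG if the unknown tail is non-zero.
 */
static int demo_copy_attr(struct demo_attr_v1 *dst, const void *uattr,
			  size_t usize)
{
	const unsigned char *p = uattr;
	size_t i;

	for (i = sizeof(*dst); i < usize; i++)
		if (p[i])
			return -E2BIG;

	memset(dst, 0, sizeof(*dst));
	memcpy(dst, uattr, usize < sizeof(*dst) ? usize : sizeof(*dst));
	return 0;
}
```

Under this scheme a new field can be appended later without breaking
old callers or old kernels, whereas a path in sysfs stays fixed once it
has been exposed.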


Thread overview: 55+ messages
2025-05-07 17:17 [PATCH bpf-next v1 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
2025-05-07 17:17 ` [PATCH bpf-next v1 01/11] bpf: Introduce bpf_dynptr_from_mem_slice Kumar Kartikeya Dwivedi
2025-05-09 17:19   ` Eduard Zingerman
2025-05-09 21:11   ` Andrii Nakryiko
2025-05-07 17:17 ` [PATCH bpf-next v1 02/11] bpf: Introduce BPF standard streams Kumar Kartikeya Dwivedi
2025-05-08 23:54   ` Eduard Zingerman
2025-05-09  0:10     ` Kumar Kartikeya Dwivedi
2025-05-07 17:17 ` [PATCH bpf-next v1 03/11] bpf: Add function to extract program source info Kumar Kartikeya Dwivedi
2025-05-08 10:30   ` kernel test robot
2025-05-08 20:15   ` Eduard Zingerman
2025-05-08 23:32     ` Kumar Kartikeya Dwivedi
2025-05-09 21:17   ` Andrii Nakryiko
2025-05-07 17:17 ` [PATCH bpf-next v1 04/11] bpf: Add function to find program from stack trace Kumar Kartikeya Dwivedi
2025-05-08 23:07   ` Eduard Zingerman
2025-05-08 23:29     ` Kumar Kartikeya Dwivedi
2025-05-07 17:17 ` [PATCH bpf-next v1 05/11] bpf: Add dump_stack() analogue to print to BPF stderr Kumar Kartikeya Dwivedi
2025-05-08 22:38   ` Eduard Zingerman
2025-05-08 23:29     ` Kumar Kartikeya Dwivedi
2025-05-07 17:17 ` [PATCH bpf-next v1 06/11] bpf: Report may_goto timeout " Kumar Kartikeya Dwivedi
2025-05-08 12:53   ` kernel test robot
2025-05-09  6:22   ` Eduard Zingerman
2025-05-09  9:19   ` Alan Maguire
2025-05-07 17:17 ` [PATCH bpf-next v1 07/11] bpf: Report rqspinlock deadlocks/timeout " Kumar Kartikeya Dwivedi
2025-05-07 17:17 ` [PATCH bpf-next v1 08/11] bpf: Report arena faults " Kumar Kartikeya Dwivedi
2025-05-09 19:28   ` Eduard Zingerman
2025-05-09 20:01     ` Kumar Kartikeya Dwivedi
2025-05-09 20:07       ` Eduard Zingerman
2025-05-09 20:10         ` Kumar Kartikeya Dwivedi
2025-05-09 20:17           ` Eduard Zingerman
2025-05-07 17:17 ` [PATCH bpf-next v1 09/11] libbpf: Add bpf_stream_printk() macro Kumar Kartikeya Dwivedi
2025-05-08 23:31   ` Eduard Zingerman
2025-05-08 23:33     ` Kumar Kartikeya Dwivedi
2025-05-09  6:16       ` Eduard Zingerman
2025-05-09 21:28         ` Andrii Nakryiko
2025-05-08 23:41   ` Alexei Starovoitov
2025-05-08 23:48     ` Kumar Kartikeya Dwivedi
2025-05-08 23:50       ` Kumar Kartikeya Dwivedi
2025-05-09 21:26   ` Andrii Nakryiko
2025-05-07 17:17 ` [PATCH bpf-next v1 10/11] bpftool: Add support for dumping streams Kumar Kartikeya Dwivedi
2025-05-08 10:41   ` Quentin Monnet
2025-05-08 23:41     ` Kumar Kartikeya Dwivedi
2025-05-09  6:21   ` Eduard Zingerman
2025-05-09 17:31     ` Alexei Starovoitov
2025-05-09 18:31       ` Eduard Zingerman
2025-05-09 18:48         ` Alexei Starovoitov
2025-05-09 19:37           ` Eduard Zingerman
2025-05-09 19:50             ` Kumar Kartikeya Dwivedi
2025-05-09 21:33           ` Andrii Nakryiko
2025-05-12 20:51             ` Kumar Kartikeya Dwivedi
2025-05-12 21:35               ` Andrii Nakryiko
2025-05-12 21:50               ` Alexei Starovoitov
2025-05-12 21:56                 ` Kumar Kartikeya Dwivedi
2025-05-12 22:07                   ` Alexei Starovoitov
2025-05-07 17:17 ` [PATCH bpf-next v1 11/11] selftests/bpf: Add tests for prog streams Kumar Kartikeya Dwivedi
2025-05-09 17:18   ` Eduard Zingerman
