* [PATCH bpf-next v2 00/11] BPF Standard Streams
@ 2025-05-24 1:18 Kumar Kartikeya Dwivedi
2025-05-24 1:18 ` [PATCH bpf-next v2 01/11] bpf: Introduce BPF standard streams Kumar Kartikeya Dwivedi
` (10 more replies)
0 siblings, 11 replies; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-24 1:18 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
Barret Rhoden, Matt Bobrowski, kkd, kernel-team
This set introduces a standard output interface with two streams, namely
stdout and stderr, for BPF programs. The idea is that these streams are
written to by BPF programs and the kernel, and serve as standard
interfaces for informing user space of any BPF runtime violations. Users
can also utilize them to print normal messages for debugging, as is done
today with bpf_printk() and the trace pipe interface.
BPF programs and the kernel can use these streams to output messages.
User space can dump these messages using bpftool.
The stream interface itself is implemented using a lockless list, so
that messages can be queued from any context. Every printk statement
into the stream requires a memory allocation. Allocation relies on
try_alloc_pages() to construct a bespoke bump allocator that carves out
elements. If allocation fails, we give up and drop the message.
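To make the bump-allocation scheme concrete, here is a minimal user-space sketch of the idea (all names are illustrative, not the kernel's): variable-length records are carved out of a fixed-size page until it runs out of room, at which point real code would replace the page, or drop the message if no page can be obtained.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the per-CPU bump allocator: records of variable length
 * are carved out of a fixed-size "page". The ref field stands in for
 * the per-page refcount that governs when the page is freed. */
#define TOY_PAGE_SZ 4096

struct toy_page {
	int ref;          /* stands in for the page refcount */
	size_t consumed;  /* bump pointer into buf */
	char buf[TOY_PAGE_SZ];
};

/* Reserve len bytes, rounded up to 8 for alignment; NULL when the page
 * is exhausted (real code would then grab a fresh page). */
static char *toy_bump_alloc(struct toy_page *p, size_t len)
{
	size_t need = (len + 7) & ~(size_t)7;
	char *ret;

	if (p->consumed + need > sizeof(p->buf))
		return NULL;
	ret = &p->buf[p->consumed];
	p->consumed += need;
	p->ref++;
	return ret;
}
```

Each successful allocation bumps the refcount; consumers drop it per record, and the page is returned to the allocator once all records on it have been consumed.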
See commit logs for more details.
Two scenarios are covered:
- Deadlocks and timeouts in rqspinlock.
- Timeouts for may_goto.
In each case we provide the stack trace and source information for the
offending BPF program: both the C source line and the file and line
numbers are printed. The output format is as follows:
ERROR: AA or ABBA deadlock detected for bpf_res_spin_lock
Attempted lock = 0xff11000108f3a5e0
Total held locks = 1
Held lock[ 0] = 0xff11000108f3a5e0
CPU: 48 UID: 0 PID: 786 Comm: test_progs
Call trace:
bpf_stream_stage_dump_stack+0xb0/0xd0
bpf_prog_report_rqspinlock_violation+0x10b/0x130
bpf_res_spin_lock+0x8c/0xa0
bpf_prog_3699ea119d1f6ed8_foo+0xe5/0x140
if (!bpf_res_spin_lock(&v2->lock)) @ stream_bpftool.c:62
bpf_prog_9b324ec4a1b2a5c0_stream_bpftool_dump_prog_stream+0x7e/0x2d0
foo(stream); @ stream_bpftool.c:93
bpf_prog_test_run_syscall+0x102/0x240
__sys_bpf+0xd68/0x2bf0
__x64_sys_bpf+0x1e/0x30
do_syscall_64+0x68/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
ERROR: Timeout detected for may_goto instruction
CPU: 48 UID: 0 PID: 786 Comm: test_progs
Call trace:
bpf_stream_stage_dump_stack+0xb0/0xd0
bpf_prog_report_may_goto_violation+0x6a/0x90
bpf_check_timed_may_goto+0x4d/0xa0
arch_bpf_timed_may_goto+0x21/0x40
bpf_prog_3699ea119d1f6ed8_foo+0x12f/0x140
while (can_loop) @ stream_bpftool.c:71
bpf_prog_9b324ec4a1b2a5c0_stream_bpftool_dump_prog_stream+0x7e/0x2d0
foo(stream); @ stream_bpftool.c:93
bpf_prog_test_run_syscall+0x102/0x240
__sys_bpf+0xd68/0x2bf0
__x64_sys_bpf+0x1e/0x30
do_syscall_64+0x68/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Changelog:
----------
v1 -> v2
v1: https://lore.kernel.org/bpf/20250507171720.1958296-1-memxor@gmail.com
* Drop arena page fault prints, will be done as follow up. (Alexei)
* Defer Andrii's request to reuse code and Alan's suggestion of error
counts to a follow-up.
* Drop bpf_dynptr_from_mem_slice patch.
* Drop some acks due to heavy reworking.
* Fix KASAN splat in bpf_prog_get_file_line. (Eduard)
* Collapse bpf_prog_ksym_find and is_bpf_text_address into single
call. (Eduard)
* Add missing RCU read lock in bpf_prog_ksym_find.
* Fix incorrect error handling in dump_stack_cb.
* Simplify libbpf macro. (Eduard, Andrii)
* Introduce bpf_prog_stream_read() libbpf API. (Eduard, Alexei, Andrii)
* Drop BPF prog from the bpftool, use libbpf API.
* Rework selftests.
RFC v1 -> v1
RFC v1: https://lore.kernel.org/bpf/20250414161443.1146103-1-memxor@gmail.com
* Rebase on bpf-next/master.
* Change output in dump_stack to also print source line. (Alexei)
* Simplify API to single pop() operation. (Eduard, Alexei)
* Add kdoc for bpf_dynptr_from_mem_slice.
* Fix -EINVAL returned from prog_dump_stream. (Eduard)
* Split dump_stack() patch into multiple commits.
* Add macro wrapping stream staging API.
* Change bpftool command from dump to tracelog. (Quentin)
* Add bpftool documentation and bash completion. (Quentin)
* Change license of bpftool to Dual BSD/GPL.
* Simplify memory allocator. (Alexei)
* No overflow into second page.
* Remove bpf_mem_alloc() fallback.
* Symlink bpftool BPF program and exercise as selftest. (Eduard)
* Verify output after dumping from ringbuf. (Eduard)
* More failure cases to check API invariants.
* Remove patches for dynptr lifetime fixes (split into separate set).
* Limit maximum error messages, and add stream capacity. (Eduard)
Kumar Kartikeya Dwivedi (11):
bpf: Introduce BPF standard streams
bpf: Add function to extract program source info
bpf: Add function to find program from stack trace
bpf: Hold RCU read lock in bpf_prog_ksym_find
bpf: Add dump_stack() analogue to print to BPF stderr
bpf: Report may_goto timeout to BPF stderr
bpf: Report rqspinlock deadlocks/timeout to BPF stderr
libbpf: Add bpf_stream_printk() macro
libbpf: Introduce bpf_prog_stream_read() API
bpftool: Add support for dumping streams
selftests/bpf: Add tests for prog streams
arch/x86/net/bpf_jit_comp.c | 1 -
include/linux/bpf.h | 85 ++-
include/uapi/linux/bpf.h | 19 +
kernel/bpf/Makefile | 2 +-
kernel/bpf/core.c | 112 +++-
kernel/bpf/helpers.c | 26 +-
kernel/bpf/rqspinlock.c | 22 +
kernel/bpf/stream.c | 544 ++++++++++++++++++
kernel/bpf/syscall.c | 27 +-
kernel/bpf/verifier.c | 5 +-
.../bpftool/Documentation/bpftool-prog.rst | 7 +
tools/bpf/bpftool/bash-completion/bpftool | 16 +-
tools/bpf/bpftool/prog.c | 50 +-
tools/include/uapi/linux/bpf.h | 19 +
tools/lib/bpf/bpf.c | 16 +
tools/lib/bpf/bpf.h | 15 +
tools/lib/bpf/bpf_helpers.h | 16 +
tools/lib/bpf/libbpf.map | 1 +
.../testing/selftests/bpf/prog_tests/stream.c | 110 ++++
tools/testing/selftests/bpf/progs/stream.c | 75 +++
.../testing/selftests/bpf/progs/stream_fail.c | 17 +
21 files changed, 1159 insertions(+), 26 deletions(-)
create mode 100644 kernel/bpf/stream.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/stream.c
create mode 100644 tools/testing/selftests/bpf/progs/stream.c
create mode 100644 tools/testing/selftests/bpf/progs/stream_fail.c
base-commit: 079e5c56a5c41d285068939ff7b0041ab10386fa
--
2.47.1
* [PATCH bpf-next v2 01/11] bpf: Introduce BPF standard streams
2025-05-24 1:18 [PATCH bpf-next v2 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
@ 2025-05-24 1:18 ` Kumar Kartikeya Dwivedi
2025-05-27 23:47 ` Eduard Zingerman
2025-06-02 20:07 ` Alexei Starovoitov
2025-05-24 1:18 ` [PATCH bpf-next v2 02/11] bpf: Add function to extract program source info Kumar Kartikeya Dwivedi
` (9 subsequent siblings)
10 siblings, 2 replies; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-24 1:18 UTC (permalink / raw)
To: bpf
Cc: Eduard Zingerman, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden,
Matt Bobrowski, kkd, kernel-team
Add support for a stream API to the kernel and expose related kfuncs to
BPF programs. Two streams are exposed, BPF_STDOUT and BPF_STDERR. These
can be used for printing messages that can be consumed from user space,
and are thus similar in spirit to the existing trace_pipe interface.
The kernel will use the BPF_STDERR stream to notify the program of any
errors encountered at runtime. BPF programs themselves may use both
streams for writing debug messages. BPF library-like code may use
BPF_STDERR to print warnings or errors on misuse at runtime.
The implementation of a stream is as follows. Every time a message is
emitted from the kernel (directly, or through a BPF program), a record
is allocated by bump allocating from a per-CPU region backed by a page
obtained using try_alloc_pages(). This ensures that we can allocate
memory from any context. The eventual plan is to discard this scheme in
favor of Alexei's kmalloc_nolock() [0].
This record is then locklessly inserted into a list (llist_add()) so
that the printing side doesn't require holding any locks, and works in
any context. Each stream has a maximum capacity of 4MB of text, and each
printed message is accounted against this limit.
Messages from a program are emitted using the bpf_stream_vprintk kfunc,
which takes a stream_id argument in addition to otherwise working
similarly to bpf_trace_vprintk.
The bprintf buffer helpers are extracted out so they can be reused for
formatting the string before it is copied into the stream; this way we
can (within the defined max limit) format a string and know its true
length before allocating the stream element.
For consuming elements from a stream, we expose a bpf(2) syscall command
named BPF_PROG_STREAM_READ_BY_FD, which allows reading data from the
stream of a given prog_fd into a user space buffer. The main logic is
implemented in bpf_stream_read(). The log messages are queued in
bpf_stream::log by the bpf_stream_vprintk kfunc, and then pulled and
ordered correctly in the stream backlog.
For this purpose, we hold a lock around bpf_stream_backlog_peek(), as
llist_del_first() (even if we maintained a second lockless list for the
backlog) wouldn't be safe from multiple threads anyway. Then, if we
fail to find something in the backlog list, we splice out everything
from the lockless log, place it in the backlog list, and then return the
head of the backlog. Once the full length of an element is consumed, we
pop it and free it.
The lockless list bpf_stream::log is a LIFO stack. Elements obtained
using an llist_del_all() operation are in LIFO order, and thus would
break the chronological ordering if printed directly. Hence, this batch
of messages is first reversed, and then stashed into a separate list in
the stream, the backlog list. The head of this list is the actual
message that should always be returned to the caller. All of this is
done in bpf_stream_backlog_fill().
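The reversal step is the same in-place pointer flip that llist_reverse_order() performs; a toy singly-linked version (illustrative names, not the kernel's) shows why it restores chronological order after LIFO pushes:

```c
#include <assert.h>
#include <stddef.h>

/* Pushing like llist_add() makes the newest message the head (LIFO);
 * reversing the detached chain restores oldest-first order. */
struct toy_node {
	int seq;               /* stands in for a message */
	struct toy_node *next;
};

/* Push: new node becomes the head, as llist_add() does. */
static void toy_push(struct toy_node **head, struct toy_node *n)
{
	n->next = *head;
	*head = n;
}

/* Reverse the chain in place, like llist_reverse_order(). */
static struct toy_node *toy_reverse(struct toy_node *head)
{
	struct toy_node *prev = NULL;

	while (head) {
		struct toy_node *next = head->next;

		head->next = prev;
		prev = head;
		head = next;
	}
	return prev;
}
```

After reversal the old tail is the new head, which is why bpf_stream_backlog_fill() remembers both ends before appending the batch to the backlog.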
From the kernel side, writing into the stream is a bit more involved
than a typical printk. First, the kernel may print a collection of
messages into the stream, and parallel writers into the stream may
suffer from interleaving of messages. To ensure each group of messages
is visible atomically, we take advantage of the lockless list used for
pushing in messages.
To enable this, we add a bpf_stream_stage() macro, and require kernel
users to use bpf_stream_printk() statements within the passed expression
to write into the stream. Underneath the macro is a message staging
API: a bpf_stream_stage object on the stack accumulates the messages
being printed into a local llist_head, and a commit operation then
splices the whole batch into the stream's lockless log list.
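The commit step can be sketched as follows (a toy, non-atomic model with illustrative names; the kernel side would use an llist_add_batch()-style cmpxchg on the shared head): messages first accumulate on a private list, and the whole chain is then spliced onto the shared log with a single head update, so one writer's batch can never be interleaved with another's.

```c
#include <assert.h>
#include <stddef.h>

struct toy_msg {
	int id;
	struct toy_msg *next;
};

/* Stage a message on the private (stack-local) list, LIFO like llist_add(). */
static void toy_stage(struct toy_msg **stage, struct toy_msg *m)
{
	m->next = *stage;
	*stage = m;
}

/* Commit: splice the entire staged chain onto the shared log in one
 * head update, keeping the batch contiguous. */
static void toy_commit(struct toy_msg **log, struct toy_msg *first)
{
	struct toy_msg *last = first;

	if (!first)
		return;
	while (last->next)
		last = last->next;
	last->next = *log;
	*log = first;
}
```

Because only the final head update touches shared state, a concurrent writer's batch lands either entirely before or entirely after this one.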
This is especially pertinent for rqspinlock deadlock messages printed to
program streams. After this change, each deadlock invocation appears as
a contiguous, non-interleaved message without any confusion on the
reader's part, improving the experience of debugging the fault.
While programs cannot benefit from this staged stream writing API, they
could just as well hold an rqspinlock around their print statements to
serialize messages, hence this is kept kernel-internal for now.
Overall, this infrastructure provides NMI-safe, any-context printing of
messages to two dedicated streams.
Later patches will add support for printing splats in case of BPF arena
page faults, rqspinlock deadlocks, and cond_break timeouts, and
integration of this facility into bpftool for dumping messages to user
space.
[0]: https://lore.kernel.org/bpf/20250501032718.65476-1-alexei.starovoitov@gmail.com
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
include/linux/bpf.h | 73 ++++-
include/uapi/linux/bpf.h | 19 ++
kernel/bpf/Makefile | 2 +-
kernel/bpf/core.c | 12 +
kernel/bpf/helpers.c | 26 +-
kernel/bpf/stream.c | 497 +++++++++++++++++++++++++++++++++
kernel/bpf/syscall.c | 27 +-
kernel/bpf/verifier.c | 5 +-
tools/include/uapi/linux/bpf.h | 19 ++
9 files changed, 659 insertions(+), 21 deletions(-)
create mode 100644 kernel/bpf/stream.c
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5b25d278409b..d298746f4dcc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1538,6 +1538,41 @@ struct btf_mod_pair {
struct bpf_kfunc_desc_tab;
+enum bpf_stream_id {
+ BPF_STDOUT = 1,
+ BPF_STDERR = 2,
+};
+
+struct bpf_stream_elem {
+ struct llist_node node;
+ int total_len;
+ int consumed_len;
+ char str[];
+};
+
+struct bpf_stream_elem_batch {
+ struct llist_node *node;
+};
+
+enum {
+ BPF_STREAM_MAX_CAPACITY = (4 * 1024U * 1024U),
+};
+
+struct bpf_stream {
+ enum bpf_stream_id stream_id;
+ atomic_t capacity;
+ struct llist_head log;
+
+ rqspinlock_t lock;
+ struct llist_node *backlog_head;
+ struct llist_node *backlog_tail;
+};
+
+struct bpf_stream_stage {
+ struct llist_head log;
+ int len;
+};
+
struct bpf_prog_aux {
atomic64_t refcnt;
u32 used_map_cnt;
@@ -1646,6 +1681,7 @@ struct bpf_prog_aux {
struct work_struct work;
struct rcu_head rcu;
};
+ struct bpf_stream stream[2];
};
struct bpf_prog {
@@ -2405,6 +2441,8 @@ int generic_map_delete_batch(struct bpf_map *map,
struct bpf_map *bpf_map_get_curr_or_next(u32 *id);
struct bpf_prog *bpf_prog_get_curr_or_next(u32 *id);
+
+struct page *__bpf_alloc_page(int nid);
int bpf_map_alloc_pages(const struct bpf_map *map, int nid,
unsigned long nr_pages, struct page **page_array);
#ifdef CONFIG_MEMCG
@@ -3543,6 +3581,16 @@ bool btf_id_set_contains(const struct btf_id_set *set, u32 id);
#define MAX_BPRINTF_VARARGS 12
#define MAX_BPRINTF_BUF 1024
+/* Per-cpu temp buffers used by printf-like helpers to store the bprintf binary
+ * arguments representation.
+ */
+#define MAX_BPRINTF_BIN_ARGS 512
+
+struct bpf_bprintf_buffers {
+ char bin_args[MAX_BPRINTF_BIN_ARGS];
+ char buf[MAX_BPRINTF_BUF];
+};
+
struct bpf_bprintf_data {
u32 *bin_args;
char *buf;
@@ -3550,9 +3598,32 @@ struct bpf_bprintf_data {
bool get_buf;
};
-int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
+int bpf_bprintf_prepare(const char *fmt, u32 fmt_size, const u64 *raw_args,
u32 num_args, struct bpf_bprintf_data *data);
void bpf_bprintf_cleanup(struct bpf_bprintf_data *data);
+int bpf_try_get_buffers(struct bpf_bprintf_buffers **bufs);
+void bpf_put_buffers(void);
+
+void bpf_prog_stream_init(struct bpf_prog *prog);
+void bpf_prog_stream_free(struct bpf_prog *prog);
+int bpf_prog_stream_read(struct bpf_prog *prog, enum bpf_stream_id stream_id, void __user *buf, int len);
+void bpf_stream_stage_init(struct bpf_stream_stage *ss);
+void bpf_stream_stage_free(struct bpf_stream_stage *ss);
+__printf(2, 3)
+int bpf_stream_stage_printk(struct bpf_stream_stage *ss, const char *fmt, ...);
+int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
+ enum bpf_stream_id stream_id);
+
+#define bpf_stream_printk(...) bpf_stream_stage_printk(&__ss, __VA_ARGS__)
+
+#define bpf_stream_stage(prog, stream_id, expr) \
+ ({ \
+ struct bpf_stream_stage __ss; \
+ bpf_stream_stage_init(&__ss); \
+ (expr); \
+ bpf_stream_stage_commit(&__ss, prog, stream_id); \
+ bpf_stream_stage_free(&__ss); \
+ })
#ifdef CONFIG_BPF_LSM
void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 16e95398c91c..137993ff63bd 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -906,6 +906,17 @@ union bpf_iter_link_info {
* A new file descriptor (a nonnegative integer), or -1 if an
* error occurred (in which case, *errno* is set appropriately).
*
+ * BPF_PROG_STREAM_READ_BY_FD
+ * Description
+ * Read data of a program's BPF stream. The program is identified
+ * by *prog_fd*, and the stream is identified by the *stream_id*.
+ * The data is copied to a buffer pointed to by *stream_buf*, and
+ * filled less than or equal to *stream_buf_len* bytes.
+ *
+ * Return
+ * Number of bytes read from the stream on success, or -1 if an
+ * error occurred (in which case, *errno* is set appropriately).
+ *
* NOTES
* eBPF objects (maps and programs) can be shared between processes.
*
@@ -961,6 +972,7 @@ enum bpf_cmd {
BPF_LINK_DETACH,
BPF_PROG_BIND_MAP,
BPF_TOKEN_CREATE,
+ BPF_PROG_STREAM_READ_BY_FD,
__MAX_BPF_CMD,
};
@@ -1842,6 +1854,13 @@ union bpf_attr {
__u32 bpffs_fd;
} token_create;
+ struct {
+ __aligned_u64 stream_buf;
+ __u32 stream_buf_len;
+ __u32 stream_id;
+ __u32 prog_fd;
+ } prog_stream_read;
+
} __attribute__((aligned(8)));
/* The description below is an attempt at providing documentation to eBPF
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 70502f038b92..a89575822b60 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -14,7 +14,7 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
obj-$(CONFIG_BPF_JIT) += trampoline.o
-obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o rqspinlock.o
+obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o rqspinlock.o stream.o
ifeq ($(CONFIG_MMU)$(CONFIG_64BIT),yy)
obj-$(CONFIG_BPF_SYSCALL) += arena.o range_tree.o
endif
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index a3e571688421..22c278c008ce 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -134,6 +134,10 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
mutex_init(&fp->aux->ext_mutex);
mutex_init(&fp->aux->dst_mutex);
+#ifdef CONFIG_BPF_SYSCALL
+ bpf_prog_stream_init(fp);
+#endif
+
return fp;
}
@@ -2861,6 +2865,7 @@ static void bpf_prog_free_deferred(struct work_struct *work)
aux = container_of(work, struct bpf_prog_aux, work);
#ifdef CONFIG_BPF_SYSCALL
bpf_free_kfunc_btf_tab(aux->kfunc_btf_tab);
+ bpf_prog_stream_free(aux->prog);
#endif
#ifdef CONFIG_CGROUP_BPF
if (aux->cgroup_atype != CGROUP_BPF_ATTACH_TYPE_INVALID)
@@ -2877,6 +2882,13 @@ static void bpf_prog_free_deferred(struct work_struct *work)
if (aux->dst_trampoline)
bpf_trampoline_put(aux->dst_trampoline);
for (i = 0; i < aux->real_func_cnt; i++) {
+#ifdef CONFIG_BPF_SYSCALL
+ /* Ensure we don't push to subprog lists. */
+ if (bpf_is_subprog(aux->func[i])) {
+ WARN_ON_ONCE(!llist_empty(&aux->func[i]->aux->stream[0].log));
+ WARN_ON_ONCE(!llist_empty(&aux->func[i]->aux->stream[1].log));
+ }
+#endif
/* We can just unlink the subprog poke descriptor table as
* it was originally linked to the main program and is also
* released along with it.
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index c1113b74e1e2..8598fdad25a4 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -762,22 +762,13 @@ static int bpf_trace_copy_string(char *buf, void *unsafe_ptr, char fmt_ptype,
return -EINVAL;
}
-/* Per-cpu temp buffers used by printf-like helpers to store the bprintf binary
- * arguments representation.
- */
-#define MAX_BPRINTF_BIN_ARGS 512
-
/* Support executing three nested bprintf helper calls on a given CPU */
#define MAX_BPRINTF_NEST_LEVEL 3
-struct bpf_bprintf_buffers {
- char bin_args[MAX_BPRINTF_BIN_ARGS];
- char buf[MAX_BPRINTF_BUF];
-};
static DEFINE_PER_CPU(struct bpf_bprintf_buffers[MAX_BPRINTF_NEST_LEVEL], bpf_bprintf_bufs);
static DEFINE_PER_CPU(int, bpf_bprintf_nest_level);
-static int try_get_buffers(struct bpf_bprintf_buffers **bufs)
+int bpf_try_get_buffers(struct bpf_bprintf_buffers **bufs)
{
int nest_level;
@@ -793,16 +784,21 @@ static int try_get_buffers(struct bpf_bprintf_buffers **bufs)
return 0;
}
-void bpf_bprintf_cleanup(struct bpf_bprintf_data *data)
+void bpf_put_buffers(void)
{
- if (!data->bin_args && !data->buf)
- return;
if (WARN_ON_ONCE(this_cpu_read(bpf_bprintf_nest_level) == 0))
return;
this_cpu_dec(bpf_bprintf_nest_level);
preempt_enable();
}
+void bpf_bprintf_cleanup(struct bpf_bprintf_data *data)
+{
+ if (!data->bin_args && !data->buf)
+ return;
+ bpf_put_buffers();
+}
+
/*
* bpf_bprintf_prepare - Generic pass on format strings for bprintf-like helpers
*
@@ -817,7 +813,7 @@ void bpf_bprintf_cleanup(struct bpf_bprintf_data *data)
* In argument preparation mode, if 0 is returned, safe temporary buffers are
* allocated and bpf_bprintf_cleanup should be called to free them after use.
*/
-int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
+int bpf_bprintf_prepare(const char *fmt, u32 fmt_size, const u64 *raw_args,
u32 num_args, struct bpf_bprintf_data *data)
{
bool get_buffers = (data->get_bin_args && num_args) || data->get_buf;
@@ -833,7 +829,7 @@ int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
return -EINVAL;
fmt_size = fmt_end - fmt;
- if (get_buffers && try_get_buffers(&buffers))
+ if (get_buffers && bpf_try_get_buffers(&buffers))
return -EBUSY;
if (data->get_bin_args) {
diff --git a/kernel/bpf/stream.c b/kernel/bpf/stream.c
new file mode 100644
index 000000000000..b9e6f7a43b1b
--- /dev/null
+++ b/kernel/bpf/stream.c
@@ -0,0 +1,497 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include <linux/bpf.h>
+#include <linux/bpf_mem_alloc.h>
+#include <linux/percpu.h>
+#include <linux/refcount.h>
+#include <linux/gfp.h>
+#include <linux/memory.h>
+#include <linux/local_lock.h>
+#include <asm/rqspinlock.h>
+
+/*
+ * Simple per-CPU NMI-safe bump allocation mechanism, backed by the NMI-safe
+ * try_alloc_pages()/free_pages_nolock() primitives. We allocate a page and
+ * stash it in a local per-CPU variable, and bump allocate from the page
+ * whenever items need to be printed to a stream. Each page holds a global
+ * atomic refcount in its first 4 bytes, and then records of variable length
+ * that describe the printed messages. Once the global refcount has dropped to
+ * zero, it is a signal to free the page back to the kernel's page allocator,
+ * given all the individual records in it have been consumed.
+ *
+ * It is possible the same page is used to serve allocations across different
+ * programs, which may be consumed at different times individually, hence
+ * maintaining a reference count per-page is critical for correct lifetime
+ * tracking.
+ *
+ * The bpf_stream_page code will be replaced to use kmalloc_nolock() once it
+ * lands.
+ */
+struct bpf_stream_page {
+ refcount_t ref;
+ u32 consumed;
+ char buf[];
+};
+
+/* Available room to add data to a refcounted page. */
+#define BPF_STREAM_PAGE_SZ (PAGE_SIZE - offsetofend(struct bpf_stream_page, consumed))
+
+static DEFINE_PER_CPU(local_trylock_t, stream_local_lock) = INIT_LOCAL_TRYLOCK(stream_local_lock);
+static DEFINE_PER_CPU(struct bpf_stream_page *, stream_pcpu_page);
+
+static bool bpf_stream_page_local_lock(unsigned long *flags)
+{
+ return local_trylock_irqsave(&stream_local_lock, *flags);
+}
+
+static void bpf_stream_page_local_unlock(unsigned long *flags)
+{
+ local_unlock_irqrestore(&stream_local_lock, *flags);
+}
+
+static void bpf_stream_page_free(struct bpf_stream_page *stream_page)
+{
+ struct page *p;
+
+ if (!stream_page)
+ return;
+ p = virt_to_page(stream_page);
+ free_pages_nolock(p, 0);
+}
+
+static void bpf_stream_page_get(struct bpf_stream_page *stream_page)
+{
+ refcount_inc(&stream_page->ref);
+}
+
+static void bpf_stream_page_put(struct bpf_stream_page *stream_page)
+{
+ if (refcount_dec_and_test(&stream_page->ref))
+ bpf_stream_page_free(stream_page);
+}
+
+static void bpf_stream_page_init(struct bpf_stream_page *stream_page)
+{
+ refcount_set(&stream_page->ref, 1);
+ stream_page->consumed = 0;
+}
+
+static struct bpf_stream_page *bpf_stream_page_replace(void)
+{
+ struct bpf_stream_page *stream_page, *old_stream_page;
+ struct page *page;
+
+ page = __bpf_alloc_page(NUMA_NO_NODE);
+ if (!page)
+ return NULL;
+ stream_page = page_address(page);
+ bpf_stream_page_init(stream_page);
+
+ old_stream_page = this_cpu_read(stream_pcpu_page);
+ if (old_stream_page)
+ bpf_stream_page_put(old_stream_page);
+ this_cpu_write(stream_pcpu_page, stream_page);
+ return stream_page;
+}
+
+static int bpf_stream_page_check_room(struct bpf_stream_page *stream_page, int len)
+{
+ int min = offsetof(struct bpf_stream_elem, str[0]);
+ int consumed = stream_page->consumed;
+ int total = BPF_STREAM_PAGE_SZ;
+ int rem = max(0, total - consumed - min);
+
+ /* Let's give room of at least 8 bytes. */
+ WARN_ON_ONCE(rem % 8 != 0);
+ rem = rem < 8 ? 0 : rem;
+ return min(len, rem);
+}
+
+static void bpf_stream_elem_init(struct bpf_stream_elem *elem, int len)
+{
+ init_llist_node(&elem->node);
+ elem->total_len = len;
+ elem->consumed_len = 0;
+}
+
+static struct bpf_stream_page *bpf_stream_page_from_elem(struct bpf_stream_elem *elem)
+{
+ unsigned long addr = (unsigned long)elem;
+
+ return (struct bpf_stream_page *)PAGE_ALIGN_DOWN(addr);
+}
+
+static struct bpf_stream_elem *bpf_stream_page_push_elem(struct bpf_stream_page *stream_page, int len)
+{
+ u32 consumed = stream_page->consumed;
+
+ stream_page->consumed += round_up(offsetof(struct bpf_stream_elem, str[len]), 8);
+ return (struct bpf_stream_elem *)&stream_page->buf[consumed];
+}
+
+static noinline struct bpf_stream_elem *bpf_stream_page_reserve_elem(int len)
+{
+ struct bpf_stream_elem *elem = NULL;
+ struct bpf_stream_page *page;
+ int room = 0;
+
+ page = this_cpu_read(stream_pcpu_page);
+ if (!page)
+ page = bpf_stream_page_replace();
+ if (!page)
+ return NULL;
+
+ room = bpf_stream_page_check_room(page, len);
+ if (room != len)
+ page = bpf_stream_page_replace();
+ if (!page)
+ return NULL;
+ bpf_stream_page_get(page);
+ room = bpf_stream_page_check_room(page, len);
+ WARN_ON_ONCE(room != len);
+
+ elem = bpf_stream_page_push_elem(page, room);
+ bpf_stream_elem_init(elem, room);
+ return elem;
+}
+
+static struct bpf_stream_elem *bpf_stream_elem_alloc(int len)
+{
+ const int max_len = ARRAY_SIZE((struct bpf_bprintf_buffers){}.buf);
+ struct bpf_stream_elem *elem;
+ unsigned long flags;
+
+ BUILD_BUG_ON(max_len > BPF_STREAM_PAGE_SZ);
+ /*
+ * Length denotes the amount of data to be written as part of stream element,
+ * thus includes '\0' byte. We're capped by how much bpf_bprintf_buffers can
+ * accomodate, therefore deny allocations that won't fit into them.
+ */
+ if (len < 0 || len > max_len)
+ return NULL;
+
+ if (!bpf_stream_page_local_lock(&flags))
+ return NULL;
+ elem = bpf_stream_page_reserve_elem(len);
+ bpf_stream_page_local_unlock(&flags);
+ return elem;
+}
+
+static int __bpf_stream_push_str(struct llist_head *log, const char *str, int len)
+{
+ struct bpf_stream_elem *elem = NULL;
+
+ /*
+ * Allocate a bpf_prog_stream_elem and push it to the bpf_prog_stream
+ * log, elements will be popped at once and reversed to print the log.
+ */
+ elem = bpf_stream_elem_alloc(len);
+ if (!elem)
+ return -ENOMEM;
+
+ memcpy(elem->str, str, len);
+ llist_add(&elem->node, log);
+
+ return 0;
+}
+
+static int bpf_stream_consume_capacity(struct bpf_stream *stream, int len)
+{
+ if (atomic_read(&stream->capacity) >= BPF_STREAM_MAX_CAPACITY)
+ return -ENOSPC;
+ if (atomic_add_return(len, &stream->capacity) >= BPF_STREAM_MAX_CAPACITY) {
+ atomic_sub(len, &stream->capacity);
+ return -ENOSPC;
+ }
+ return 0;
+}
+
+static void bpf_stream_release_capacity(struct bpf_stream *stream, struct bpf_stream_elem *elem)
+{
+ int len = elem->total_len;
+
+ atomic_sub(len, &stream->capacity);
+}
+
+static int bpf_stream_push_str(struct bpf_stream *stream, const char *str, int len)
+{
+ int ret = bpf_stream_consume_capacity(stream, len);
+
+ return ret ?: __bpf_stream_push_str(&stream->log, str, len);
+}
+
+static struct bpf_stream *bpf_stream_get(enum bpf_stream_id stream_id, struct bpf_prog_aux *aux)
+{
+ if (stream_id != BPF_STDOUT && stream_id != BPF_STDERR)
+ return NULL;
+ return &aux->stream[stream_id - 1];
+}
+
+static void bpf_stream_free_elem(struct bpf_stream_elem *elem)
+{
+ struct bpf_stream_page *p;
+
+ p = bpf_stream_page_from_elem(elem);
+ bpf_stream_page_put(p);
+}
+
+static void bpf_stream_free_list(struct llist_node *list)
+{
+ struct bpf_stream_elem *elem, *tmp;
+
+ llist_for_each_entry_safe(elem, tmp, list, node)
+ bpf_stream_free_elem(elem);
+}
+
+static struct llist_node *bpf_stream_backlog_peek(struct bpf_stream *stream)
+{
+ return stream->backlog_head;
+}
+
+static struct llist_node *bpf_stream_backlog_pop(struct bpf_stream *stream)
+{
+ struct llist_node *node;
+
+ node = stream->backlog_head;
+ if (stream->backlog_head == stream->backlog_tail)
+ stream->backlog_head = stream->backlog_tail = NULL;
+ else
+ stream->backlog_head = node->next;
+ return node;
+}
+
+static void bpf_stream_backlog_fill(struct bpf_stream *stream)
+{
+ struct llist_node *head, *tail;
+
+ if (llist_empty(&stream->log))
+ return;
+ tail = llist_del_all(&stream->log);
+ if (!tail)
+ return;
+ head = llist_reverse_order(tail);
+
+ if (!stream->backlog_head) {
+ stream->backlog_head = head;
+ stream->backlog_tail = tail;
+ } else {
+ stream->backlog_tail->next = head;
+ stream->backlog_tail = tail;
+ }
+
+ return;
+}
+
+static bool bpf_stream_consume_elem(struct bpf_stream_elem *elem, int *len)
+{
+ int rem = elem->total_len - elem->consumed_len;
+ int used = min(rem, *len);
+
+ elem->consumed_len += used;
+ *len -= used;
+
+ return elem->consumed_len == elem->total_len;
+}
+
+static int bpf_stream_read(struct bpf_stream *stream, void __user *buf, int len)
+{
+ int rem_len = len, cons_len, ret = 0;
+ struct bpf_stream_elem *elem = NULL;
+ struct llist_node *node;
+ unsigned long flags;
+
+ if (raw_res_spin_lock_irqsave(&stream->lock, flags))
+ return -EDEADLK;
+
+ while (rem_len) {
+ int pos = len - rem_len;
+ bool cont;
+
+ node = bpf_stream_backlog_peek(stream);
+ if (!node) {
+ bpf_stream_backlog_fill(stream);
+ node = bpf_stream_backlog_peek(stream);
+ }
+ if (!node)
+ break;
+ elem = container_of(node, typeof(*elem), node);
+
+ cons_len = elem->consumed_len;
+ cont = bpf_stream_consume_elem(elem, &rem_len) == false;
+
+ ret = copy_to_user_nofault(buf + pos, elem->str + cons_len,
+ elem->consumed_len - cons_len);
+ /* Restore in case of error. */
+ if (ret) {
+ elem->consumed_len = cons_len;
+ break;
+ }
+
+ if (cont)
+ continue;
+ bpf_stream_backlog_pop(stream);
+ bpf_stream_release_capacity(stream, elem);
+ bpf_stream_free_elem(elem);
+ }
+
+ raw_res_spin_unlock_irqrestore(&stream->lock, flags);
+ return ret ? ret : len - rem_len;
+}
+
+int bpf_prog_stream_read(struct bpf_prog *prog, enum bpf_stream_id stream_id, void __user *buf, int len)
+{
+ struct bpf_stream *stream;
+
+ stream = bpf_stream_get(stream_id, prog->aux);
+ if (!stream)
+ return -ENOENT;
+ return bpf_stream_read(stream, buf, len);
+}
+
+__bpf_kfunc_start_defs();
+
+/*
+ * Avoid using enum bpf_stream_id so that kfunc users don't have to pull in the
+ * enum in headers.
+ */
+__bpf_kfunc int bpf_stream_vprintk(int stream_id, const char *fmt__str, const void *args, u32 len__sz, void *aux__prog)
+{
+ struct bpf_bprintf_data data = {
+ .get_bin_args = true,
+ .get_buf = true,
+ };
+ struct bpf_prog_aux *aux = aux__prog;
+ u32 fmt_size = strlen(fmt__str) + 1;
+ struct bpf_stream *stream;
+ u32 data_len = len__sz;
+ int ret, num_args;
+
+ stream = bpf_stream_get(stream_id, aux);
+ if (!stream)
+ return -ENOENT;
+
+ if (data_len & 7 || data_len > MAX_BPRINTF_VARARGS * 8 ||
+ (data_len && !args))
+ return -EINVAL;
+ num_args = data_len / 8;
+
+ ret = bpf_bprintf_prepare(fmt__str, fmt_size, args, num_args, &data);
+ if (ret < 0)
+ return ret;
+
+ ret = bstr_printf(data.buf, MAX_BPRINTF_BUF, fmt__str, data.bin_args);
+ /* If the string was truncated, we only wrote until the size of buffer. */
+ ret = min_t(u32, ret + 1, MAX_BPRINTF_BUF);
+ ret = bpf_stream_push_str(stream, data.buf, ret);
+ bpf_bprintf_cleanup(&data);
+
+ return ret;
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(stream_kfunc_set)
+BTF_ID_FLAGS(func, bpf_stream_vprintk, KF_TRUSTED_ARGS)
+BTF_KFUNCS_END(stream_kfunc_set)
+
+static const struct btf_kfunc_id_set bpf_stream_kfunc_set = {
+ .owner = THIS_MODULE,
+ .set = &stream_kfunc_set,
+};
+
+static int __init bpf_stream_kfunc_init(void)
+{
+ return register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &bpf_stream_kfunc_set);
+}
+late_initcall(bpf_stream_kfunc_init);
+
+void bpf_prog_stream_init(struct bpf_prog *prog)
+{
+ int i;
+
+ prog->aux->stream[0].stream_id = BPF_STDOUT;
+ prog->aux->stream[1].stream_id = BPF_STDERR;
+
+ for (i = 0; i < ARRAY_SIZE(prog->aux->stream); i++) {
+ atomic_set(&prog->aux->stream[i].capacity, 0);
+ init_llist_head(&prog->aux->stream[i].log);
+ raw_res_spin_lock_init(&prog->aux->stream[i].lock);
+ prog->aux->stream[i].backlog_head = NULL;
+ prog->aux->stream[i].backlog_tail = NULL;
+ }
+}
+
+void bpf_prog_stream_free(struct bpf_prog *prog)
+{
+ struct llist_node *list;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(prog->aux->stream); i++) {
+ list = llist_del_all(&prog->aux->stream[i].log);
+ bpf_stream_free_list(list);
+ bpf_stream_free_list(prog->aux->stream[i].backlog_head);
+ }
+}
+
+void bpf_stream_stage_init(struct bpf_stream_stage *ss)
+{
+ init_llist_head(&ss->log);
+ ss->len = 0;
+}
+
+void bpf_stream_stage_free(struct bpf_stream_stage *ss)
+{
+ struct llist_node *node;
+
+ node = llist_del_all(&ss->log);
+ bpf_stream_free_list(node);
+}
+
+int bpf_stream_stage_printk(struct bpf_stream_stage *ss, const char *fmt, ...)
+{
+ struct bpf_bprintf_buffers *buf;
+ va_list args;
+ int ret;
+
+ if (bpf_try_get_buffers(&buf))
+ return -EBUSY;
+
+ va_start(args, fmt);
+ ret = vsnprintf(buf->buf, ARRAY_SIZE(buf->buf), fmt, args);
+ va_end(args);
+ /* If the string was truncated, we only wrote up to the size of the buffer. */
+ ret = min_t(u32, ret + 1, ARRAY_SIZE(buf->buf));
+ ss->len += ret;
+ ret = __bpf_stream_push_str(&ss->log, buf->buf, ret);
+ bpf_put_buffers();
+ return ret;
+}
+
+int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
+ enum bpf_stream_id stream_id)
+{
+ struct llist_node *list, *head, *tail;
+ struct bpf_stream *stream;
+ int ret;
+
+ stream = bpf_stream_get(stream_id, prog->aux);
+ if (!stream)
+ return -EINVAL;
+
+ ret = bpf_stream_consume_capacity(stream, ss->len);
+ if (ret)
+ return ret;
+
+ list = llist_del_all(&ss->log);
+ head = list;
+
+ if (!list)
+ return 0;
+ while (llist_next(list)) {
+ tail = llist_next(list);
+ list = tail;
+ }
+ llist_add_batch(head, tail, &stream->log);
+ return 0;
+}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4b5f29168618..49868e2693c4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -576,7 +576,7 @@ static bool can_alloc_pages(void)
!IS_ENABLED(CONFIG_PREEMPT_RT);
}
-static struct page *__bpf_alloc_page(int nid)
+struct page *__bpf_alloc_page(int nid)
{
if (!can_alloc_pages())
return try_alloc_pages(nid, 0);
@@ -5794,6 +5794,28 @@ static int token_create(union bpf_attr *attr)
return bpf_token_create(attr);
}
+#define BPF_PROG_STREAM_READ_BY_FD_LAST_FIELD prog_stream_read.prog_fd
+
+static int prog_stream_read(union bpf_attr *attr)
+{
+ char __user *buf = u64_to_user_ptr(attr->prog_stream_read.stream_buf);
+ u32 len = attr->prog_stream_read.stream_buf_len;
+ struct bpf_prog *prog;
+ int ret;
+
+ if (CHECK_ATTR(BPF_PROG_STREAM_READ_BY_FD))
+ return -EINVAL;
+
+ prog = bpf_prog_get(attr->prog_stream_read.prog_fd);
+ if (IS_ERR(prog))
+ return PTR_ERR(prog);
+
+ ret = bpf_prog_stream_read(prog, attr->prog_stream_read.stream_id, buf, len);
+ bpf_prog_put(prog);
+
+ return ret;
+}
+
static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size)
{
union bpf_attr attr;
@@ -5930,6 +5952,9 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size)
case BPF_TOKEN_CREATE:
err = token_create(&attr);
break;
+ case BPF_PROG_STREAM_READ_BY_FD:
+ err = prog_stream_read(&attr);
+ break;
default:
err = -EINVAL;
break;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d5807d2efc92..5ab8742d2c00 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -13882,10 +13882,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
regs[BPF_REG_0].type = PTR_TO_BTF_ID;
regs[BPF_REG_0].btf_id = ptr_type_id;
- if (meta.func_id == special_kfunc_list[KF_bpf_get_kmem_cache])
+ if (meta.func_id == special_kfunc_list[KF_bpf_get_kmem_cache]) {
regs[BPF_REG_0].type |= PTR_UNTRUSTED;
-
- if (is_iter_next_kfunc(&meta)) {
+ } else if (is_iter_next_kfunc(&meta)) {
struct bpf_reg_state *cur_iter;
cur_iter = get_iter_from_state(env->cur_state, &meta);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 16e95398c91c..137993ff63bd 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -906,6 +906,17 @@ union bpf_iter_link_info {
* A new file descriptor (a nonnegative integer), or -1 if an
* error occurred (in which case, *errno* is set appropriately).
*
+ * BPF_PROG_STREAM_READ_BY_FD
+ * Description
+ * Read data from a program's BPF stream. The program is identified
+ * by *prog_fd*, and the stream by *stream_id*. The data is copied
+ * into the buffer pointed to by *stream_buf*; at most
+ * *stream_buf_len* bytes are filled.
+ *
+ * Return
+ * Number of bytes read from the stream on success, or -1 if an
+ * error occurred (in which case, *errno* is set appropriately).
+ *
* NOTES
* eBPF objects (maps and programs) can be shared between processes.
*
@@ -961,6 +972,7 @@ enum bpf_cmd {
BPF_LINK_DETACH,
BPF_PROG_BIND_MAP,
BPF_TOKEN_CREATE,
+ BPF_PROG_STREAM_READ_BY_FD,
__MAX_BPF_CMD,
};
@@ -1842,6 +1854,13 @@ union bpf_attr {
__u32 bpffs_fd;
} token_create;
+ struct {
+ __aligned_u64 stream_buf;
+ __u32 stream_buf_len;
+ __u32 stream_id;
+ __u32 prog_fd;
+ } prog_stream_read;
+
} __attribute__((aligned(8)));
/* The description below is an attempt at providing documentation to eBPF
--
2.47.1
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH bpf-next v2 02/11] bpf: Add function to extract program source info
2025-05-24 1:18 [PATCH bpf-next v2 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
2025-05-24 1:18 ` [PATCH bpf-next v2 01/11] bpf: Introduce BPF standard streams Kumar Kartikeya Dwivedi
@ 2025-05-24 1:18 ` Kumar Kartikeya Dwivedi
2025-06-02 20:18 ` Alexei Starovoitov
2025-05-24 1:18 ` [PATCH bpf-next v2 03/11] bpf: Add function to find program from stack trace Kumar Kartikeya Dwivedi
` (8 subsequent siblings)
10 siblings, 1 reply; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-24 1:18 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
Barret Rhoden, Matt Bobrowski, kkd, kernel-team
Prepare a function, for use in future patches, that can extract the file
info, line info, and the source line number for a given BPF program,
provided its program counter.
Only the basename of the file path is provided, since the full path can
be excessively long in some cases.
This will be used in later patches to print source info to the BPF
stream. The source line number is indicated by the return value, and the
file and line info are provided through out parameters.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
include/linux/bpf.h | 2 ++
kernel/bpf/core.c | 49 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 51 insertions(+)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index d298746f4dcc..4eb4f06f7219 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3659,4 +3659,6 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
return prog->aux->func_idx != 0;
}
+int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep, const char **linep);
+
#endif /* _LINUX_BPF_H */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 22c278c008ce..7e7fef095bca 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -3204,3 +3204,52 @@ EXPORT_SYMBOL(bpf_stats_enabled_key);
EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_exception);
EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_bulk_tx);
+
+#ifdef CONFIG_BPF_SYSCALL
+
+int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep, const char **linep)
+{
+ int idx = -1, insn_start, insn_end, len;
+ struct bpf_line_info *linfo;
+ void **jited_linfo;
+ struct btf *btf;
+
+ btf = prog->aux->btf;
+ linfo = prog->aux->linfo;
+ jited_linfo = prog->aux->jited_linfo;
+
+ if (!btf || !linfo || !prog->aux->jited_linfo)
+ return -EINVAL;
+ len = prog->aux->func ? prog->aux->func[prog->aux->func_idx]->len : prog->len;
+
+ linfo = &prog->aux->linfo[prog->aux->linfo_idx];
+ jited_linfo = &prog->aux->jited_linfo[prog->aux->linfo_idx];
+
+ insn_start = linfo[0].insn_off;
+ insn_end = insn_start + len;
+
+ for (int i = 0; i < prog->aux->nr_linfo &&
+ linfo[i].insn_off >= insn_start && linfo[i].insn_off < insn_end; i++) {
+ if (jited_linfo[i] >= (void *)ip)
+ break;
+ idx = i;
+ }
+
+ if (idx == -1)
+ return -ENOENT;
+
+ /* Get base component of the file path. */
+ *filep = btf_name_by_offset(btf, linfo[idx].file_name_off);
+ if (!*filep)
+ return -ENOENT;
+ *filep = kbasename(*filep);
+ /* Obtain the source line, and strip whitespace in prefix. */
+ *linep = btf_name_by_offset(btf, linfo[idx].line_off);
+ if (!*linep)
+ return -ENOENT;
+ while (isspace(**linep))
+ *linep += 1;
+ return BPF_LINE_INFO_LINE_NUM(linfo[idx].line_col);
+}
+
+#endif
--
2.47.1
* [PATCH bpf-next v2 03/11] bpf: Add function to find program from stack trace
2025-05-24 1:18 [PATCH bpf-next v2 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
2025-05-24 1:18 ` [PATCH bpf-next v2 01/11] bpf: Introduce BPF standard streams Kumar Kartikeya Dwivedi
2025-05-24 1:18 ` [PATCH bpf-next v2 02/11] bpf: Add function to extract program source info Kumar Kartikeya Dwivedi
@ 2025-05-24 1:18 ` Kumar Kartikeya Dwivedi
2025-05-24 1:18 ` [PATCH bpf-next v2 04/11] bpf: Hold RCU read lock in bpf_prog_ksym_find Kumar Kartikeya Dwivedi
` (7 subsequent siblings)
10 siblings, 0 replies; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-24 1:18 UTC (permalink / raw)
To: bpf
Cc: Eduard Zingerman, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden,
Matt Bobrowski, kkd, kernel-team
In preparation for figuring out the closest program that led to the
current point in the kernel, implement a function that walks down the
stack trace and finds the closest BPF program.
Special care needs to be taken to skip over kernel and BPF subprog
frames. We basically scan until we find a BPF main prog frame. The
assumption is that if a program calls into us transitively, we'll
hit it along the way. If not, we end up returning NULL.
The function will be used in contexts where we know the program may have
called into us.
Due to reliance on arch_bpf_stack_walk(), this function only works on
x86 with CONFIG_UNWINDER_ORC, arm64, and s390. Remove the warning from
arch_bpf_stack_walk as well since we call it outside bpf_throw()
context.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
arch/x86/net/bpf_jit_comp.c | 1 -
include/linux/bpf.h | 1 +
kernel/bpf/core.c | 26 ++++++++++++++++++++++++++
3 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 9e5fe2ba858f..17693ee6bb1a 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -3791,7 +3791,6 @@ void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp
}
return;
#endif
- WARN(1, "verification of programs using bpf_throw should have failed\n");
}
void bpf_arch_poke_desc_update(struct bpf_jit_poke_descriptor *poke,
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4eb4f06f7219..6985e793e927 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3660,5 +3660,6 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
}
int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep, const char **linep);
+struct bpf_prog *bpf_prog_find_from_stack(void);
#endif /* _LINUX_BPF_H */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 7e7fef095bca..8d381ca9f2fa 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -3252,4 +3252,30 @@ int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char *
return BPF_LINE_INFO_LINE_NUM(linfo[idx].line_col);
}
+struct walk_stack_ctx {
+ struct bpf_prog *prog;
+};
+
+static bool find_from_stack_cb(void *cookie, u64 ip, u64 sp, u64 bp)
+{
+ struct walk_stack_ctx *ctxp = cookie;
+ struct bpf_prog *prog;
+
+ prog = bpf_prog_ksym_find(ip);
+ if (!prog)
+ return true;
+ if (bpf_is_subprog(prog))
+ return true;
+ ctxp->prog = prog;
+ return false;
+}
+
+struct bpf_prog *bpf_prog_find_from_stack(void)
+{
+ struct walk_stack_ctx ctx = {};
+
+ arch_bpf_stack_walk(find_from_stack_cb, &ctx);
+ return ctx.prog;
+}
+
#endif
--
2.47.1
* [PATCH bpf-next v2 04/11] bpf: Hold RCU read lock in bpf_prog_ksym_find
2025-05-24 1:18 [PATCH bpf-next v2 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
` (2 preceding siblings ...)
2025-05-24 1:18 ` [PATCH bpf-next v2 03/11] bpf: Add function to find program from stack trace Kumar Kartikeya Dwivedi
@ 2025-05-24 1:18 ` Kumar Kartikeya Dwivedi
2025-05-24 4:41 ` Kumar Kartikeya Dwivedi
2025-05-24 1:18 ` [PATCH bpf-next v2 05/11] bpf: Add dump_stack() analogue to print to BPF stderr Kumar Kartikeya Dwivedi
` (6 subsequent siblings)
10 siblings, 1 reply; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-24 1:18 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
Barret Rhoden, Matt Bobrowski, kkd, kernel-team
bpf_ksym_find() must be called with RCU read protection. Wrap the call
to it in bpf_prog_ksym_find() with the RCU read lock so that callers do
not have to hold it themselves.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
kernel/bpf/core.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 8d381ca9f2fa..959538f91c60 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -782,7 +782,11 @@ bool is_bpf_text_address(unsigned long addr)
struct bpf_prog *bpf_prog_ksym_find(unsigned long addr)
{
- struct bpf_ksym *ksym = bpf_ksym_find(addr);
+ struct bpf_ksym *ksym;
+
+ rcu_read_lock();
+ ksym = bpf_ksym_find(addr);
+ rcu_read_unlock();
return ksym && ksym->prog ?
container_of(ksym, struct bpf_prog_aux, ksym)->prog :
--
2.47.1
* [PATCH bpf-next v2 05/11] bpf: Add dump_stack() analogue to print to BPF stderr
2025-05-24 1:18 [PATCH bpf-next v2 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
` (3 preceding siblings ...)
2025-05-24 1:18 ` [PATCH bpf-next v2 04/11] bpf: Hold RCU read lock in bpf_prog_ksym_find Kumar Kartikeya Dwivedi
@ 2025-05-24 1:18 ` Kumar Kartikeya Dwivedi
2025-05-28 0:45 ` Eduard Zingerman
2025-05-24 1:18 ` [PATCH bpf-next v2 06/11] bpf: Report may_goto timeout " Kumar Kartikeya Dwivedi
` (5 subsequent siblings)
10 siblings, 1 reply; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-24 1:18 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
Barret Rhoden, Matt Bobrowski, kkd, kernel-team
Introduce a kernel function that is the analogue of dump_stack(),
printing some useful information along with the stack trace. This is not
exposed to BPF programs yet, but can be made available in the future.
When a program counter in the stack trace belongs to a BPF program,
additionally output the file name and line number to make the trace more
helpful. The rest of the trace can be passed into ./decode_stacktrace.sh
to obtain the line numbers for kernel symbols.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
include/linux/bpf.h | 2 ++
kernel/bpf/stream.c | 42 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 44 insertions(+)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6985e793e927..aab5ea17a329 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3613,8 +3613,10 @@ __printf(2, 3)
int bpf_stream_stage_printk(struct bpf_stream_stage *ss, const char *fmt, ...);
int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
enum bpf_stream_id stream_id);
+int bpf_stream_stage_dump_stack(struct bpf_stream_stage *ss);
#define bpf_stream_printk(...) bpf_stream_stage_printk(&__ss, __VA_ARGS__)
+#define bpf_stream_dump_stack() bpf_stream_stage_dump_stack(&__ss)
#define bpf_stream_stage(prog, stream_id, expr) \
({ \
diff --git a/kernel/bpf/stream.c b/kernel/bpf/stream.c
index b9e6f7a43b1b..cebd596671cd 100644
--- a/kernel/bpf/stream.c
+++ b/kernel/bpf/stream.c
@@ -2,6 +2,7 @@
/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
#include <linux/bpf.h>
+#include <linux/filter.h>
#include <linux/bpf_mem_alloc.h>
#include <linux/percpu.h>
#include <linux/refcount.h>
@@ -495,3 +496,44 @@ int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
llist_add_batch(head, tail, &stream->log);
return 0;
}
+
+struct dump_stack_ctx {
+ struct bpf_stream_stage *ss;
+ int err;
+};
+
+static bool dump_stack_cb(void *cookie, u64 ip, u64 sp, u64 bp)
+{
+ struct dump_stack_ctx *ctxp = cookie;
+ const char *file = "", *line = "";
+ struct bpf_prog *prog;
+ int num;
+
+ if (is_bpf_text_address(ip)) {
+ prog = bpf_prog_ksym_find(ip);
+ num = bpf_prog_get_file_line(prog, ip, &file, &line);
+ if (num < 0)
+ goto end;
+ ctxp->err = bpf_stream_stage_printk(ctxp->ss, "%pS\n %s @ %s:%d\n",
+ (void *)ip, line, file, num);
+ return !ctxp->err;
+ }
+end:
+ ctxp->err = bpf_stream_stage_printk(ctxp->ss, "%pS\n", (void *)ip);
+ return !ctxp->err;
+}
+
+int bpf_stream_stage_dump_stack(struct bpf_stream_stage *ss)
+{
+ struct dump_stack_ctx ctx = { .ss = ss };
+ int ret;
+
+ ret = bpf_stream_stage_printk(ss, "CPU: %d UID: %d PID: %d Comm: %s\n",
+ raw_smp_processor_id(), __kuid_val(current_real_cred()->euid),
+ current->pid, current->comm);
+ ret = ret ?: bpf_stream_stage_printk(ss, "Call trace:\n");
+ if (!ret)
+ arch_bpf_stack_walk(dump_stack_cb, &ctx);
+ ret = ret ?: ctx.err;
+ return ret ?: bpf_stream_stage_printk(ss, "\n");
+}
--
2.47.1
* [PATCH bpf-next v2 06/11] bpf: Report may_goto timeout to BPF stderr
2025-05-24 1:18 [PATCH bpf-next v2 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
` (4 preceding siblings ...)
2025-05-24 1:18 ` [PATCH bpf-next v2 05/11] bpf: Add dump_stack() analogue to print to BPF stderr Kumar Kartikeya Dwivedi
@ 2025-05-24 1:18 ` Kumar Kartikeya Dwivedi
2025-06-02 22:27 ` Alexei Starovoitov
2025-05-24 1:18 ` [PATCH bpf-next v2 07/11] bpf: Report rqspinlock deadlocks/timeout " Kumar Kartikeya Dwivedi
` (4 subsequent siblings)
10 siblings, 1 reply; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-24 1:18 UTC (permalink / raw)
To: bpf
Cc: Eduard Zingerman, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden,
Matt Bobrowski, kkd, kernel-team
Begin reporting may_goto timeouts to the BPF program's stderr stream.
To avoid spamming the stream with errors when a program keeps failing
repeatedly, emit at most 512 such error messages from the kernel for a
given program.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
include/linux/bpf.h | 21 ++++++++++++++-------
kernel/bpf/core.c | 19 ++++++++++++++++++-
kernel/bpf/stream.c | 5 +++++
3 files changed, 37 insertions(+), 8 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index aab5ea17a329..3449a31e9f66 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1682,6 +1682,7 @@ struct bpf_prog_aux {
struct rcu_head rcu;
};
struct bpf_stream stream[2];
+ atomic_t stream_error_cnt;
};
struct bpf_prog {
@@ -3604,6 +3605,8 @@ void bpf_bprintf_cleanup(struct bpf_bprintf_data *data);
int bpf_try_get_buffers(struct bpf_bprintf_buffers **bufs);
void bpf_put_buffers(void);
+#define BPF_PROG_STREAM_ERROR_CNT 512
+
void bpf_prog_stream_init(struct bpf_prog *prog);
void bpf_prog_stream_free(struct bpf_prog *prog);
int bpf_prog_stream_read(struct bpf_prog *prog, enum bpf_stream_id stream_id, void __user *buf, int len);
@@ -3615,16 +3618,20 @@ int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
enum bpf_stream_id stream_id);
int bpf_stream_stage_dump_stack(struct bpf_stream_stage *ss);
+bool bpf_prog_stream_error_limit(struct bpf_prog *prog);
+
#define bpf_stream_printk(...) bpf_stream_stage_printk(&__ss, __VA_ARGS__)
#define bpf_stream_dump_stack() bpf_stream_stage_dump_stack(&__ss)
-#define bpf_stream_stage(prog, stream_id, expr) \
- ({ \
- struct bpf_stream_stage __ss; \
- bpf_stream_stage_init(&__ss); \
- (expr); \
- bpf_stream_stage_commit(&__ss, prog, stream_id); \
- bpf_stream_stage_free(&__ss); \
+#define bpf_stream_stage(prog, stream_id, expr) \
+ ({ \
+ struct bpf_stream_stage __ss; \
+ if (!bpf_prog_stream_error_limit(prog)) { \
+ bpf_stream_stage_init(&__ss); \
+ (expr); \
+ bpf_stream_stage_commit(&__ss, prog, stream_id); \
+ bpf_stream_stage_free(&__ss); \
+ } \
})
#ifdef CONFIG_BPF_LSM
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 959538f91c60..8e74562e4114 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -3160,6 +3160,21 @@ u64 __weak arch_bpf_timed_may_goto(void)
return 0;
}
+static noinline void bpf_prog_report_may_goto_violation(void)
+{
+#ifdef CONFIG_BPF_SYSCALL
+ struct bpf_prog *prog;
+
+ prog = bpf_prog_find_from_stack();
+ if (!prog)
+ return;
+ bpf_stream_stage(prog, BPF_STDERR, ({
+ bpf_stream_printk("ERROR: Timeout detected for may_goto instruction\n");
+ bpf_stream_dump_stack();
+ }));
+#endif
+}
+
u64 bpf_check_timed_may_goto(struct bpf_timed_may_goto *p)
{
u64 time = ktime_get_mono_fast_ns();
@@ -3170,8 +3185,10 @@ u64 bpf_check_timed_may_goto(struct bpf_timed_may_goto *p)
return BPF_MAX_TIMED_LOOPS;
}
/* Check if we've exhausted our time slice, and zero count. */
- if (time - p->timestamp >= (NSEC_PER_SEC / 4))
+ if (unlikely(time - p->timestamp >= (NSEC_PER_SEC / 4))) {
+ bpf_prog_report_may_goto_violation();
return 0;
+ }
/* Refresh the count for the stack frame. */
return BPF_MAX_TIMED_LOOPS;
}
diff --git a/kernel/bpf/stream.c b/kernel/bpf/stream.c
index cebd596671cd..c72b9480008c 100644
--- a/kernel/bpf/stream.c
+++ b/kernel/bpf/stream.c
@@ -537,3 +537,8 @@ int bpf_stream_stage_dump_stack(struct bpf_stream_stage *ss)
ret = ret ?: ctx.err;
return ret ?: bpf_stream_stage_printk(ss, "\n");
}
+
+bool bpf_prog_stream_error_limit(struct bpf_prog *prog)
+{
+ return atomic_fetch_add(1, &prog->aux->stream_error_cnt) >= BPF_PROG_STREAM_ERROR_CNT;
+}
--
2.47.1
* [PATCH bpf-next v2 07/11] bpf: Report rqspinlock deadlocks/timeout to BPF stderr
2025-05-24 1:18 [PATCH bpf-next v2 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
` (5 preceding siblings ...)
2025-05-24 1:18 ` [PATCH bpf-next v2 06/11] bpf: Report may_goto timeout " Kumar Kartikeya Dwivedi
@ 2025-05-24 1:18 ` Kumar Kartikeya Dwivedi
2025-06-02 22:32 ` Alexei Starovoitov
2025-05-24 1:18 ` [PATCH bpf-next v2 08/11] libbpf: Add bpf_stream_printk() macro Kumar Kartikeya Dwivedi
` (3 subsequent siblings)
10 siblings, 1 reply; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-24 1:18 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
Barret Rhoden, Matt Bobrowski, kkd, kernel-team
Begin reporting rqspinlock deadlocks and timeouts to the BPF program's
stderr stream.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
kernel/bpf/rqspinlock.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c
index 338305c8852c..888c8e2f9061 100644
--- a/kernel/bpf/rqspinlock.c
+++ b/kernel/bpf/rqspinlock.c
@@ -666,6 +666,26 @@ EXPORT_SYMBOL_GPL(resilient_queued_spin_lock_slowpath);
__bpf_kfunc_start_defs();
+static void bpf_prog_report_rqspinlock_violation(const char *str, void *lock, bool irqsave)
+{
+ struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
+ struct bpf_prog *prog;
+
+ prog = bpf_prog_find_from_stack();
+ if (!prog)
+ return;
+ bpf_stream_stage(prog, BPF_STDERR, ({
+ bpf_stream_printk("ERROR: %s for bpf_res_spin_lock%s\n", str, irqsave ? "_irqsave" : "");
+ bpf_stream_printk("Attempted lock = 0x%px\n", lock);
+ bpf_stream_printk("Total held locks = %d\n", rqh->cnt);
+ for (int i = 0; i < min(RES_NR_HELD, rqh->cnt); i++)
+ bpf_stream_printk("Held lock[%2d] = 0x%px\n", i, rqh->locks[i]);
+ bpf_stream_dump_stack();
+ }));
+}
+
+#define REPORT_STR(ret) ({ (ret) == -ETIMEDOUT ? "Timeout detected" : "AA or ABBA deadlock detected"; })
+
__bpf_kfunc int bpf_res_spin_lock(struct bpf_res_spin_lock *lock)
{
int ret;
@@ -676,6 +696,7 @@ __bpf_kfunc int bpf_res_spin_lock(struct bpf_res_spin_lock *lock)
preempt_disable();
ret = res_spin_lock((rqspinlock_t *)lock);
if (unlikely(ret)) {
+ bpf_prog_report_rqspinlock_violation(REPORT_STR(ret), lock, false);
preempt_enable();
return ret;
}
@@ -698,6 +719,7 @@ __bpf_kfunc int bpf_res_spin_lock_irqsave(struct bpf_res_spin_lock *lock, unsign
local_irq_save(flags);
ret = res_spin_lock((rqspinlock_t *)lock);
if (unlikely(ret)) {
+ bpf_prog_report_rqspinlock_violation(REPORT_STR(ret), lock, true);
local_irq_restore(flags);
preempt_enable();
return ret;
--
2.47.1
* [PATCH bpf-next v2 08/11] libbpf: Add bpf_stream_printk() macro
2025-05-24 1:18 [PATCH bpf-next v2 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
` (6 preceding siblings ...)
2025-05-24 1:18 ` [PATCH bpf-next v2 07/11] bpf: Report rqspinlock deadlocks/timeout " Kumar Kartikeya Dwivedi
@ 2025-05-24 1:18 ` Kumar Kartikeya Dwivedi
2025-05-28 6:12 ` Eduard Zingerman
2025-05-24 1:18 ` [PATCH bpf-next v2 09/11] libbpf: Introduce bpf_prog_stream_read() API Kumar Kartikeya Dwivedi
` (2 subsequent siblings)
10 siblings, 1 reply; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-24 1:18 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
Barret Rhoden, Matt Bobrowski, kkd, kernel-team
Add a convenience macro to print data to the BPF streams. The BPF_STDOUT
and BPF_STDERR stream IDs from vmlinux.h can be passed to the macro to
print to the respective streams.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
tools/lib/bpf/bpf_helpers.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index a50773d4616e..76b127a9f24d 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -314,6 +314,22 @@ enum libbpf_tristate {
___param, sizeof(___param)); \
})
+extern int bpf_stream_vprintk(int stream_id, const char *fmt__str, const void *args,
+ __u32 len__sz, void *aux__prog) __weak __ksym;
+
+#define bpf_stream_printk(stream_id, fmt, args...) \
+({ \
+ static const char ___fmt[] = fmt; \
+ unsigned long long ___param[___bpf_narg(args)]; \
+ \
+ _Pragma("GCC diagnostic push") \
+ _Pragma("GCC diagnostic ignored \"-Wint-conversion\"") \
+ ___bpf_fill(___param, args); \
+ _Pragma("GCC diagnostic pop") \
+ \
+ bpf_stream_vprintk(stream_id, ___fmt, ___param, sizeof(___param), NULL);\
+})
+
/* Use __bpf_printk when bpf_printk call has 3 or fewer fmt args
* Otherwise use __bpf_vprintk
*/
--
2.47.1
* [PATCH bpf-next v2 09/11] libbpf: Introduce bpf_prog_stream_read() API
2025-05-24 1:18 [PATCH bpf-next v2 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
` (7 preceding siblings ...)
2025-05-24 1:18 ` [PATCH bpf-next v2 08/11] libbpf: Add bpf_stream_printk() macro Kumar Kartikeya Dwivedi
@ 2025-05-24 1:18 ` Kumar Kartikeya Dwivedi
2025-05-28 18:02 ` Eduard Zingerman
2025-06-05 20:06 ` Andrii Nakryiko
2025-05-24 1:18 ` [PATCH bpf-next v2 10/11] bpftool: Add support for dumping streams Kumar Kartikeya Dwivedi
2025-05-24 1:18 ` [PATCH bpf-next v2 11/11] selftests/bpf: Add tests for prog streams Kumar Kartikeya Dwivedi
10 siblings, 2 replies; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-24 1:18 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
Barret Rhoden, Matt Bobrowski, kkd, kernel-team
Introduce a libbpf API so that users can read data from a given BPF
stream of a BPF prog fd. For now, only the low-level syscall wrapper is
provided; a bpf_program__* accessor can be added as a follow-up if
needed.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
tools/lib/bpf/bpf.c | 16 ++++++++++++++++
tools/lib/bpf/bpf.h | 15 +++++++++++++++
tools/lib/bpf/libbpf.map | 1 +
3 files changed, 32 insertions(+)
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index a9c3e33d0f8a..4b9f3a2096c8 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -1331,3 +1331,19 @@ int bpf_token_create(int bpffs_fd, struct bpf_token_create_opts *opts)
fd = sys_bpf_fd(BPF_TOKEN_CREATE, &attr, attr_sz);
return libbpf_err_errno(fd);
}
+
+int bpf_prog_stream_read(int prog_fd, __u32 stream_id, void *stream_buf, __u32 stream_buf_len)
+{
+ const size_t attr_sz = offsetofend(union bpf_attr, prog_stream_read);
+ union bpf_attr attr;
+ int err;
+
+ memset(&attr, 0, attr_sz);
+ attr.prog_stream_read.stream_buf = ptr_to_u64(stream_buf);
+ attr.prog_stream_read.stream_buf_len = stream_buf_len;
+ attr.prog_stream_read.stream_id = stream_id;
+ attr.prog_stream_read.prog_fd = prog_fd;
+
+ err = sys_bpf(BPF_PROG_STREAM_READ_BY_FD, &attr, attr_sz);
+ return libbpf_err_errno(err);
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 777627d33d25..9fa7be3d92d3 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -704,6 +704,21 @@ struct bpf_token_create_opts {
LIBBPF_API int bpf_token_create(int bpffs_fd,
struct bpf_token_create_opts *opts);
+/**
+ * @brief **bpf_prog_stream_read** reads data from the BPF stream of a given BPF
+ * program.
+ *
+ * @param prog_fd FD for the BPF program whose BPF stream is to be read.
+ * @param stream_id ID of the BPF stream to be read.
+ * @param stream_buf Buffer to read data into from the BPF stream.
+ * @param stream_buf_len Maximum number of bytes to read from the BPF stream.
+ *
+ * @return The number of bytes read, on success; negative error code, otherwise
+ * (errno is also set to the error code)
+ */
+LIBBPF_API int bpf_prog_stream_read(int prog_fd, __u32 stream_id, void *stream_buf,
+ __u32 stream_buf_len);
+
#ifdef __cplusplus
} /* extern "C" */
#endif
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 1205f9a4fe04..4359527c8442 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -437,6 +437,7 @@ LIBBPF_1.6.0 {
bpf_linker__add_fd;
bpf_linker__new_fd;
bpf_object__prepare;
+ bpf_prog_stream_read;
bpf_program__func_info;
bpf_program__func_info_cnt;
bpf_program__line_info;
--
2.47.1
* [PATCH bpf-next v2 10/11] bpftool: Add support for dumping streams
2025-05-24 1:18 [PATCH bpf-next v2 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
` (8 preceding siblings ...)
2025-05-24 1:18 ` [PATCH bpf-next v2 09/11] libbpf: Introduce bpf_prog_stream_read() API Kumar Kartikeya Dwivedi
@ 2025-05-24 1:18 ` Kumar Kartikeya Dwivedi
2025-05-27 10:20 ` Quentin Monnet
2025-05-24 1:18 ` [PATCH bpf-next v2 11/11] selftests/bpf: Add tests for prog streams Kumar Kartikeya Dwivedi
10 siblings, 1 reply; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-24 1:18 UTC (permalink / raw)
To: bpf
Cc: Quentin Monnet, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
Emil Tsalapatis, Barret Rhoden, Matt Bobrowski, kkd, kernel-team
Add support for printing the BPF stream contents of a program in
bpftool. The bpftool prog tracelog command is extended to optionally
take a stdout or stderr argument, followed by the prog specification.
The bpf_prog_stream_read() API added in the previous patch is reused to
grab data, which is then dumped to the respective file: data from the
stdout stream is written to stdout, and data from the stderr stream to
stderr.
Cc: Quentin Monnet <qmo@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
.../bpftool/Documentation/bpftool-prog.rst | 7 +++
tools/bpf/bpftool/bash-completion/bpftool | 16 +++++-
tools/bpf/bpftool/prog.c | 50 ++++++++++++++++++-
3 files changed, 71 insertions(+), 2 deletions(-)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index d6304e01afe0..d8aaac9363f3 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -35,6 +35,7 @@ PROG COMMANDS
| **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*]
| **bpftool** **prog detach** *PROG* *ATTACH_TYPE* [*MAP*]
| **bpftool** **prog tracelog**
+| **bpftool** **prog tracelog** [ { **stdout** | **stderr** } *PROG* ]
| **bpftool** **prog run** *PROG* **data_in** *FILE* [**data_out** *FILE* [**data_size_out** *L*]] [**ctx_in** *FILE* [**ctx_out** *FILE* [**ctx_size_out** *M*]]] [**repeat** *N*]
| **bpftool** **prog profile** *PROG* [**duration** *DURATION*] *METRICs*
| **bpftool** **prog help**
@@ -173,6 +174,12 @@ bpftool prog tracelog
purposes. For streaming data from BPF programs to user space, one can use
perf events (see also **bpftool-map**\ (8)).
+bpftool prog tracelog { stdout | stderr } *PROG*
+ Dump the BPF stream of the program. BPF programs can write to these streams
+ at runtime with the **bpf_stream_vprintk**\ () kfunc. The kernel may write
+ error messages to the standard error stream. This facility should be used
+ only for debugging purposes.
+
bpftool prog run *PROG* data_in *FILE* [data_out *FILE* [data_size_out *L*]] [ctx_in *FILE* [ctx_out *FILE* [ctx_size_out *M*]]] [repeat *N*]
Run BPF program *PROG* in the kernel testing infrastructure for BPF,
meaning that the program works on the data and context provided by the
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 1ce409a6cbd9..c7c0bf3aee24 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -518,7 +518,21 @@ _bpftool()
esac
;;
tracelog)
- return 0
+ case $prev in
+ $command)
+ COMPREPLY+=( $( compgen -W "stdout stderr" -- \
+ "$cur" ) )
+ return 0
+ ;;
+ stdout|stderr)
+ COMPREPLY=( $( compgen -W "$PROG_TYPE" -- \
+ "$cur" ) )
+ return 0
+ ;;
+ *)
+ return 0
+ ;;
+ esac
;;
profile)
case $cword in
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index f010295350be..3f31fbb8a99c 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -1113,6 +1113,53 @@ static int do_detach(int argc, char **argv)
return 0;
}
+enum prog_tracelog_mode {
+ TRACE_STDOUT,
+ TRACE_STDERR,
+};
+
+static int
+prog_tracelog_stream(int prog_fd, enum prog_tracelog_mode mode)
+{
+ FILE *file = mode == TRACE_STDOUT ? stdout : stderr;
+ int stream_id = mode == TRACE_STDOUT ? 1 : 2;
+ static char buf[512];
+ int ret;
+
+ ret = 0;
+ do {
+ ret = bpf_prog_stream_read(prog_fd, stream_id, buf, sizeof(buf));
+ if (ret > 0) {
+ fwrite(buf, sizeof(buf[0]), ret, file);
+ }
+ } while (ret > 0);
+
+ fflush(file);
+ return ret ? -1 : 0;
+}
+
+static int do_tracelog_any(int argc, char **argv)
+{
+ enum prog_tracelog_mode mode;
+ int fd;
+
+ if (argc == 0)
+ return do_tracelog(argc, argv);
+ if (!is_prefix(*argv, "stdout") && !is_prefix(*argv, "stderr"))
+ usage();
+ mode = is_prefix(*argv, "stdout") ? TRACE_STDOUT : TRACE_STDERR;
+ NEXT_ARG();
+
+ if (!REQ_ARGS(2))
+ return -1;
+
+ fd = prog_parse_fd(&argc, &argv);
+ if (fd < 0)
+ return -1;
+
+ return prog_tracelog_stream(fd, mode);
+}
+
static int check_single_stdin(char *file_data_in, char *file_ctx_in)
{
if (file_data_in && file_ctx_in &&
@@ -2483,6 +2530,7 @@ static int do_help(int argc, char **argv)
" [repeat N]\n"
" %1$s %2$s profile PROG [duration DURATION] METRICs\n"
" %1$s %2$s tracelog\n"
+ " %1$s %2$s tracelog { stdout | stderr } PROG\n"
" %1$s %2$s help\n"
"\n"
" " HELP_SPEC_MAP "\n"
@@ -2522,7 +2570,7 @@ static const struct cmd cmds[] = {
{ "loadall", do_loadall },
{ "attach", do_attach },
{ "detach", do_detach },
- { "tracelog", do_tracelog },
+ { "tracelog", do_tracelog_any },
{ "run", do_run },
{ "profile", do_profile },
{ 0 }
--
2.47.1
* [PATCH bpf-next v2 11/11] selftests/bpf: Add tests for prog streams
2025-05-24 1:18 [PATCH bpf-next v2 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
` (9 preceding siblings ...)
2025-05-24 1:18 ` [PATCH bpf-next v2 10/11] bpftool: Add support for dumping streams Kumar Kartikeya Dwivedi
@ 2025-05-24 1:18 ` Kumar Kartikeya Dwivedi
2025-05-28 17:04 ` Eduard Zingerman
10 siblings, 1 reply; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-24 1:18 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
Barret Rhoden, Matt Bobrowski, kkd, kernel-team
Add selftests to stress test the various facets of the stream API and
its memory allocation pattern, and to ensure that dumping support is
functional.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
.../testing/selftests/bpf/prog_tests/stream.c | 110 ++++++++++++++++++
tools/testing/selftests/bpf/progs/stream.c | 75 ++++++++++++
.../testing/selftests/bpf/progs/stream_fail.c | 17 +++
3 files changed, 202 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/stream.c
create mode 100644 tools/testing/selftests/bpf/progs/stream.c
create mode 100644 tools/testing/selftests/bpf/progs/stream_fail.c
diff --git a/tools/testing/selftests/bpf/prog_tests/stream.c b/tools/testing/selftests/bpf/prog_tests/stream.c
new file mode 100644
index 000000000000..3b04d3d499d0
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/stream.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+#include <test_progs.h>
+#include <sys/mman.h>
+
+#include "stream.skel.h"
+#include "stream_fail.skel.h"
+
+void test_stream_failure(void)
+{
+ RUN_TESTS(stream_fail);
+}
+
+void test_stream_success(void)
+{
+ RUN_TESTS(stream);
+ return;
+}
+
+struct {
+ int prog_off;
+ const char *errstr;
+} stream_error_arr[] = {
+ {
+ offsetof(struct stream, progs.stream_cond_break),
+ "ERROR: Timeout detected for may_goto instruction",
+ },
+ {
+ offsetof(struct stream, progs.stream_deadlock),
+ "ERROR: AA or ABBA deadlock detected",
+ },
+};
+
+void test_stream_errors(void)
+{
+ LIBBPF_OPTS(bpf_test_run_opts, opts);
+ struct stream *skel;
+ int ret, prog_fd;
+ char buf[64];
+
+ skel = stream__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "stream__open_and_load"))
+ return;
+
+ for (int i = 0; i < ARRAY_SIZE(stream_error_arr); i++) {
+ struct bpf_program **prog;
+
+ prog = (struct bpf_program **)(((char *)skel) + stream_error_arr[i].prog_off);
+ prog_fd = bpf_program__fd(*prog);
+ ret = bpf_prog_test_run_opts(prog_fd, &opts);
+ ASSERT_OK(ret, "ret");
+ ASSERT_OK(opts.retval, "retval");
+
+#if !defined(__x86_64__)
+ ASSERT_TRUE(1, "Timed may_goto unsupported, skip.");
+ if (i == 0) {
+ ret = bpf_prog_stream_read(prog_fd, 2, buf, sizeof(buf));
+ ASSERT_EQ(ret, 0, "stream read");
+ continue;
+ }
+#endif
+
+ ret = bpf_prog_stream_read(prog_fd, 2, buf, sizeof(buf));
+ ASSERT_EQ(ret, sizeof(buf), "stream read");
+ ASSERT_STRNEQ(stream_error_arr[i].errstr, buf, strlen(stream_error_arr[i].errstr),
+ "compare error msg");
+ }
+
+ stream__destroy(skel);
+}
+
+void test_stream_syscall(void)
+{
+ LIBBPF_OPTS(bpf_test_run_opts, opts);
+ struct stream *skel;
+ int ret, prog_fd;
+ char buf[64];
+
+ skel = stream__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "stream__open_and_load"))
+ return;
+
+ prog_fd = bpf_program__fd(skel->progs.stream_syscall);
+ ret = bpf_prog_test_run_opts(prog_fd, &opts);
+ ASSERT_OK(ret, "ret");
+ ASSERT_OK(opts.retval, "retval");
+
+ bpf_prog_stream_read(0, 1, buf, sizeof(buf));
+ ret = -errno;
+ ASSERT_EQ(ret, -EINVAL, "bad prog_fd");
+
+ bpf_prog_stream_read(prog_fd, 0, buf, sizeof(buf));
+ ret = -errno;
+ ASSERT_EQ(ret, -ENOENT, "bad stream id");
+
+ bpf_prog_stream_read(prog_fd, 1, NULL, sizeof(buf));
+ ret = -errno;
+ ASSERT_EQ(ret, -EFAULT, "bad stream buf");
+
+ ret = bpf_prog_stream_read(prog_fd, 1, buf, 2);
+ ASSERT_EQ(ret, 2, "bytes");
+ ret = bpf_prog_stream_read(prog_fd, 1, buf, 2);
+ ASSERT_EQ(ret, 2, "bytes");
+ ret = bpf_prog_stream_read(prog_fd, 1, buf, 1);
+ ASSERT_EQ(ret, 0, "no bytes stdout");
+ ret = bpf_prog_stream_read(prog_fd, 2, buf, 1);
+ ASSERT_EQ(ret, 0, "no bytes stderr");
+
+ stream__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/stream.c b/tools/testing/selftests/bpf/progs/stream.c
new file mode 100644
index 000000000000..1fb0e810afc6
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/stream.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+struct arr_elem {
+ struct bpf_res_spin_lock lock;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __uint(max_entries, 1);
+ __type(key, int);
+ __type(value, struct arr_elem);
+} arrmap SEC(".maps");
+
+#define ENOSPC 28
+#define _STR "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
+
+#define STREAM_STR (u64)(_STR _STR _STR _STR)
+
+SEC("syscall")
+__success __retval(0)
+int stream_exhaust(void *ctx)
+{
+ bpf_repeat(BPF_MAX_LOOPS)
+ if (bpf_stream_printk(BPF_STDOUT, _STR) == -ENOSPC)
+ return 0;
+ return 1;
+}
+
+SEC("syscall")
+__success __retval(0)
+int stream_cond_break(void *ctx)
+{
+ while (can_loop)
+ ;
+ return 0;
+}
+
+SEC("syscall")
+__success __retval(0)
+int stream_deadlock(void *ctx)
+{
+ struct bpf_res_spin_lock *lock, *nlock;
+
+ lock = bpf_map_lookup_elem(&arrmap, &(int){0});
+ if (!lock)
+ return 0;
+ nlock = bpf_map_lookup_elem(&arrmap, &(int){0});
+ if (!nlock)
+ return 0;
+ if (bpf_res_spin_lock(lock))
+ return 0;
+ if (bpf_res_spin_lock(nlock)) {
+ bpf_res_spin_unlock(lock);
+ return 0;
+ }
+ bpf_res_spin_unlock(nlock);
+ bpf_res_spin_unlock(lock);
+ return 0;
+}
+
+SEC("syscall")
+__success __retval(0)
+int stream_syscall(void *ctx)
+{
+ bpf_stream_printk(BPF_STDOUT, "foo");
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/stream_fail.c b/tools/testing/selftests/bpf/progs/stream_fail.c
new file mode 100644
index 000000000000..12004d5092b7
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/stream_fail.c
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_misc.h"
+
+SEC("syscall")
+__failure __msg("Possibly NULL pointer passed")
+int stream_vprintk_null_arg(void *ctx)
+{
+ bpf_stream_vprintk(BPF_STDOUT, "", NULL, 0, NULL);
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.47.1
* Re: [PATCH bpf-next v2 04/11] bpf: Hold RCU read lock in bpf_prog_ksym_find
2025-05-24 1:18 ` [PATCH bpf-next v2 04/11] bpf: Hold RCU read lock in bpf_prog_ksym_find Kumar Kartikeya Dwivedi
@ 2025-05-24 4:41 ` Kumar Kartikeya Dwivedi
2025-05-28 0:15 ` Eduard Zingerman
0 siblings, 1 reply; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-24 4:41 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
Barret Rhoden, Matt Bobrowski, kkd, kernel-team
On Sat, 24 May 2025 at 03:18, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> The bpf_ksym_find must be called with RCU read protection, wrap the call
> to bpf_ksym_find in bpf_prog_ksym_find with RCU read lock so that
> callers do not have to care about holding it specifically.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
> kernel/bpf/core.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 8d381ca9f2fa..959538f91c60 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -782,7 +782,11 @@ bool is_bpf_text_address(unsigned long addr)
>
> struct bpf_prog *bpf_prog_ksym_find(unsigned long addr)
> {
> - struct bpf_ksym *ksym = bpf_ksym_find(addr);
> + struct bpf_ksym *ksym;
> +
> + rcu_read_lock();
> + ksym = bpf_ksym_find(addr);
> + rcu_read_unlock();
>
> return ksym && ksym->prog ?
> container_of(ksym, struct bpf_prog_aux, ksym)->prog :
This isn't right; we need to keep the read section open around the
ksym access as well.
We can then end the section and return the prog pointer.
The caller is responsible for ensuring the prog outlives RCU
protection, or for otherwise holding a reference if necessary for the
prog's lifetime.
We're using this to pick programs that have an active stack frame, so
they aren't going away.
But the ksym access itself needs to happen under correct protection.
I can fix it in a respin, whatever is best.
> --
> 2.47.1
>
* Re: [PATCH bpf-next v2 10/11] bpftool: Add support for dumping streams
2025-05-24 1:18 ` [PATCH bpf-next v2 10/11] bpftool: Add support for dumping streams Kumar Kartikeya Dwivedi
@ 2025-05-27 10:20 ` Quentin Monnet
2025-05-27 14:51 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 43+ messages in thread
From: Quentin Monnet @ 2025-05-27 10:20 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi, bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
Barret Rhoden, Matt Bobrowski, kkd, kernel-team
2025-05-23 18:18 UTC-0700 ~ Kumar Kartikeya Dwivedi <memxor@gmail.com>
> Add support for printing the BPF stream contents of a program in
> bpftool. The new bpftool prog tracelog command is extended to take
> stdout and stderr arguments, and then the prog specification.
>
> The bpf_prog_stream_read() API added in previous patch is simply reused
> to grab data and then it is dumped to the respective file. The stdout
> data is sent to stdout, and stderr is printed to stderr.
>
> Cc: Quentin Monnet <qmo@kernel.org>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
> .../bpftool/Documentation/bpftool-prog.rst | 7 +++
> tools/bpf/bpftool/bash-completion/bpftool | 16 +++++-
> tools/bpf/bpftool/prog.c | 50 ++++++++++++++++++-
> 3 files changed, 71 insertions(+), 2 deletions(-)
>
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index f010295350be..3f31fbb8a99c 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -1113,6 +1113,53 @@ static int do_detach(int argc, char **argv)
> return 0;
> }
>
> +enum prog_tracelog_mode {
> + TRACE_STDOUT,
> + TRACE_STDERR,
> +};
You could have TRACE_STDOUT = 1 and TRACE_STDERR = 2 in this enum, and
later do "stream_id = mode". This would avoid passing "1" or "2" inside
of the prog_tracelog_stream() function. Although thinking again, it's
maybe confusing to use the same enum for the mode and the stream_id?
Your call.
> +
> +static int
> +prog_tracelog_stream(int prog_fd, enum prog_tracelog_mode mode)
> +{
> + FILE *file = mode == TRACE_STDOUT ? stdout : stderr;
> + int stream_id = mode == TRACE_STDOUT ? 1 : 2;
> + static char buf[512];
Why static?
> + int ret;
> +
> + ret = 0;
> + do {
> + ret = bpf_prog_stream_read(prog_fd, stream_id, buf, sizeof(buf));
> + if (ret > 0) {
> + fwrite(buf, sizeof(buf[0]), ret, file);
> + }
Nit: No brackets needed around fwrite()
Otherwise looks good, thanks!
Quentin
* Re: [PATCH bpf-next v2 10/11] bpftool: Add support for dumping streams
2025-05-27 10:20 ` Quentin Monnet
@ 2025-05-27 14:51 ` Kumar Kartikeya Dwivedi
0 siblings, 0 replies; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-27 14:51 UTC (permalink / raw)
To: Quentin Monnet
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
Barret Rhoden, Matt Bobrowski, kkd, kernel-team
On Tue, 27 May 2025 at 12:20, Quentin Monnet <qmo@kernel.org> wrote:
>
> 2025-05-23 18:18 UTC-0700 ~ Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > Add support for printing the BPF stream contents of a program in
> > bpftool. The new bpftool prog tracelog command is extended to take
> > stdout and stderr arguments, and then the prog specification.
> >
> > The bpf_prog_stream_read() API added in previous patch is simply reused
> > to grab data and then it is dumped to the respective file. The stdout
> > data is sent to stdout, and stderr is printed to stderr.
> >
> > Cc: Quentin Monnet <qmo@kernel.org>
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> > .../bpftool/Documentation/bpftool-prog.rst | 7 +++
> > tools/bpf/bpftool/bash-completion/bpftool | 16 +++++-
> > tools/bpf/bpftool/prog.c | 50 ++++++++++++++++++-
> > 3 files changed, 71 insertions(+), 2 deletions(-)
> >
>
> > diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> > index f010295350be..3f31fbb8a99c 100644
> > --- a/tools/bpf/bpftool/prog.c
> > +++ b/tools/bpf/bpftool/prog.c
> > @@ -1113,6 +1113,53 @@ static int do_detach(int argc, char **argv)
> > return 0;
> > }
> >
> > +enum prog_tracelog_mode {
> > + TRACE_STDOUT,
> > + TRACE_STDERR,
> > +};
>
>
> You could have TRACE_STDOUT = 1 and TRACE_STDERR = 2 in this enum, and
> later do "stream_id = mode". This would avoid passing "1" or "2" inside
> of the prog_tracelog_stream() function. Although thinking again, it's
> maybe confusing to use the same enum for the mode and the stream_id?
> Your call.
Yeah, that's why I kept them separate.
>
>
> > +
> > +static int
> > +prog_tracelog_stream(int prog_fd, enum prog_tracelog_mode mode)
> > +{
> > + FILE *file = mode == TRACE_STDOUT ? stdout : stderr;
> > + int stream_id = mode == TRACE_STDOUT ? 1 : 2;
> > + static char buf[512];
>
>
> Why static?
>
Will change, not really needed.
>
> > + int ret;
> > +
> > + ret = 0;
> > + do {
> > + ret = bpf_prog_stream_read(prog_fd, stream_id, buf, sizeof(buf));
> > + if (ret > 0) {
> > + fwrite(buf, sizeof(buf[0]), ret, file);
> > + }
>
>
> Nit: No brackets needed around fwrite()
Ack.
>
> Otherwise looks good, thanks!
>
> Quentin
Thanks!
* Re: [PATCH bpf-next v2 01/11] bpf: Introduce BPF standard streams
2025-05-24 1:18 ` [PATCH bpf-next v2 01/11] bpf: Introduce BPF standard streams Kumar Kartikeya Dwivedi
@ 2025-05-27 23:47 ` Eduard Zingerman
2025-05-27 23:55 ` Kumar Kartikeya Dwivedi
2025-06-02 20:07 ` Alexei Starovoitov
1 sibling, 1 reply; 43+ messages in thread
From: Eduard Zingerman @ 2025-05-27 23:47 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
kkd, kernel-team
Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
Overall logic seems right to me, a few comments below.
[...]
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 5b25d278409b..d298746f4dcc 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1538,6 +1538,41 @@ struct btf_mod_pair {
>
> struct bpf_kfunc_desc_tab;
>
> +enum bpf_stream_id {
> + BPF_STDOUT = 1,
> + BPF_STDERR = 2,
> +};
> +
> +struct bpf_stream_elem {
> + struct llist_node node;
> + int total_len;
> + int consumed_len;
> + char str[];
> +};
> +
> +struct bpf_stream_elem_batch {
> + struct llist_node *node;
> +};
This type is not used anymore.
> +
> +enum {
> + BPF_STREAM_MAX_CAPACITY = (4 * 1024U * 1024U),
> +};
> +
> +struct bpf_stream {
> + enum bpf_stream_id stream_id;
Nit: `stream_id` is never read, as streams are identified by a position
in the bpf_prog_aux->stream.
> + atomic_t capacity;
> + struct llist_head log;
> +
> + rqspinlock_t lock;
> + struct llist_node *backlog_head;
> + struct llist_node *backlog_tail;
Nit: maybe add comments describing what kind of data is in the llist_{head,node}? E.g.:
atomic_t capacity;
struct llist_head log; /* list of struct bpf_stream_elem in LIFO order */
rqspinlock_t lock; /* backlog_{head,tail} lock */
struct llist_node *backlog_head; /* list of struct bpf_stream_elem in FIFO order */
struct llist_node *backlog_tail;
> +};
> +
> +struct bpf_stream_stage {
> + struct llist_head log;
> + int len;
> +};
> +
[...]
> diff --git a/kernel/bpf/stream.c b/kernel/bpf/stream.c
> new file mode 100644
> index 000000000000..b9e6f7a43b1b
> --- /dev/null
> +++ b/kernel/bpf/stream.c
[...]
> +int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
> + enum bpf_stream_id stream_id)
> +{
> + struct llist_node *list, *head, *tail;
> + struct bpf_stream *stream;
> + int ret;
> +
> + stream = bpf_stream_get(stream_id, prog->aux);
> + if (!stream)
> + return -EINVAL;
> +
> + ret = bpf_stream_consume_capacity(stream, ss->len);
> + if (ret)
> + return ret;
> +
> + list = llist_del_all(&ss->log);
> + head = list;
> +
> + if (!list)
> + return 0;
> + while (llist_next(list)) {
> + tail = llist_next(list);
> + list = tail;
> + }
> + llist_add_batch(head, tail, &stream->log);
If `llist_next(list) == NULL` at entry, `tail` is never assigned?
> + return 0;
> +}
[...]
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d5807d2efc92..5ab8742d2c00 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -13882,10 +13882,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> regs[BPF_REG_0].type = PTR_TO_BTF_ID;
> regs[BPF_REG_0].btf_id = ptr_type_id;
>
> - if (meta.func_id == special_kfunc_list[KF_bpf_get_kmem_cache])
> + if (meta.func_id == special_kfunc_list[KF_bpf_get_kmem_cache]) {
> regs[BPF_REG_0].type |= PTR_UNTRUSTED;
> -
> - if (is_iter_next_kfunc(&meta)) {
> + } else if (is_iter_next_kfunc(&meta)) {
Nit: unrelated change?
> struct bpf_reg_state *cur_iter;
>
> cur_iter = get_iter_from_state(env->cur_state, &meta);
[...]
* Re: [PATCH bpf-next v2 01/11] bpf: Introduce BPF standard streams
2025-05-27 23:47 ` Eduard Zingerman
@ 2025-05-27 23:55 ` Kumar Kartikeya Dwivedi
2025-05-28 0:23 ` Eduard Zingerman
0 siblings, 1 reply; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-27 23:55 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
kkd, kernel-team
On Wed, 28 May 2025 at 01:47, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
>
> Overall logic seems right to me, a few comments below.
>
> [...]
>
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 5b25d278409b..d298746f4dcc 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1538,6 +1538,41 @@ struct btf_mod_pair {
> >
> > struct bpf_kfunc_desc_tab;
> >
> > +enum bpf_stream_id {
> > + BPF_STDOUT = 1,
> > + BPF_STDERR = 2,
> > +};
> > +
> > +struct bpf_stream_elem {
> > + struct llist_node node;
> > + int total_len;
> > + int consumed_len;
> > + char str[];
> > +};
> > +
> > +struct bpf_stream_elem_batch {
> > + struct llist_node *node;
> > +};
>
> This type is not used anymore.
>
Will drop.
> > +
> > +enum {
> > + BPF_STREAM_MAX_CAPACITY = (4 * 1024U * 1024U),
> > +};
> > +
> > +struct bpf_stream {
> > + enum bpf_stream_id stream_id;
>
> Nit: `stream_id` is never read, as streams are identified by a position
> in the bpf_prog_aux->stream.
Ack.
>
> > + atomic_t capacity;
> > + struct llist_head log;
> > +
> > + rqspinlock_t lock;
> > + struct llist_node *backlog_head;
> > + struct llist_node *backlog_tail;
>
> Nit: maybe add comments describing what kind of data is in the llist_{head,node}? E.g.:
>
> atomic_t capacity;
> struct llist_head log; /* list of struct bpf_stream_elem in LIFO order */
>
> rqspinlock_t lock; /* backlog_{head,tail} lock */
> struct llist_node *backlog_head; /* list of struct bpf_stream_elem in FIFO order */
> struct llist_node *backlog_tail;
>
>
Will do.
> > +};
> > +
> > +struct bpf_stream_stage {
> > + struct llist_head log;
> > + int len;
> > +};
> > +
>
> [...]
>
> > diff --git a/kernel/bpf/stream.c b/kernel/bpf/stream.c
> > new file mode 100644
> > index 000000000000..b9e6f7a43b1b
> > --- /dev/null
> > +++ b/kernel/bpf/stream.c
>
> [...]
>
> > +int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
> > + enum bpf_stream_id stream_id)
> > +{
> > + struct llist_node *list, *head, *tail;
> > + struct bpf_stream *stream;
> > + int ret;
> > +
> > + stream = bpf_stream_get(stream_id, prog->aux);
> > + if (!stream)
> > + return -EINVAL;
> > +
> > + ret = bpf_stream_consume_capacity(stream, ss->len);
> > + if (ret)
> > + return ret;
> > +
> > + list = llist_del_all(&ss->log);
> > + head = list;
> > +
> > + if (!list)
> > + return 0;
> > + while (llist_next(list)) {
> > + tail = llist_next(list);
> > + list = tail;
> > + }
> > + llist_add_batch(head, tail, &stream->log);
>
> If `llist_next(list) == NULL` at entry, `tail` is never assigned?
The assumption is that llist_del_all() returning non-NULL means
llist_next() will return a non-NULL value at least once.
Does that address your concern?
>
> > + return 0;
> > +}
>
> [...]
>
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index d5807d2efc92..5ab8742d2c00 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -13882,10 +13882,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> > regs[BPF_REG_0].type = PTR_TO_BTF_ID;
> > regs[BPF_REG_0].btf_id = ptr_type_id;
> >
> > - if (meta.func_id == special_kfunc_list[KF_bpf_get_kmem_cache])
> > + if (meta.func_id == special_kfunc_list[KF_bpf_get_kmem_cache]) {
> > regs[BPF_REG_0].type |= PTR_UNTRUSTED;
> > -
> > - if (is_iter_next_kfunc(&meta)) {
> > + } else if (is_iter_next_kfunc(&meta)) {
>
> Nit: unrelated change?
Yeah, will drop.
>
> > struct bpf_reg_state *cur_iter;
> >
> > cur_iter = get_iter_from_state(env->cur_state, &meta);
>
> [...]
Thanks!
* Re: [PATCH bpf-next v2 04/11] bpf: Hold RCU read lock in bpf_prog_ksym_find
2025-05-24 4:41 ` Kumar Kartikeya Dwivedi
@ 2025-05-28 0:15 ` Eduard Zingerman
2025-05-28 0:27 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 43+ messages in thread
From: Eduard Zingerman @ 2025-05-28 0:15 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
kkd, kernel-team
Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
[...]
>> --- a/kernel/bpf/core.c
>> +++ b/kernel/bpf/core.c
>> @@ -782,7 +782,11 @@ bool is_bpf_text_address(unsigned long addr)
>>
>> struct bpf_prog *bpf_prog_ksym_find(unsigned long addr)
>> {
>> - struct bpf_ksym *ksym = bpf_ksym_find(addr);
>> + struct bpf_ksym *ksym;
>> +
>> + rcu_read_lock();
>> + ksym = bpf_ksym_find(addr);
>> + rcu_read_unlock();
>>
>> return ksym && ksym->prog ?
>> container_of(ksym, struct bpf_prog_aux, ksym)->prog :
>
> This isn't right, we need to have the read section open around ksym
> access as well.
> We can end the section and return the prog pointer.
> The caller is responsible to ensure prog outlives RCU protection, or
> otherwise hold it if necessary for prog's lifetime.
>
> We're using this to pick programs who have an active stack frame, so
> they aren't going away.
> But the ksym access itself needs to happen under correct protection.
>
> I can fix it in a respin, whatever is best.
Are rcu_read_{lock,unlock} necessary in core.c:search_bpf_extables()
after this change?
Also, helpers.c:bpf_stack_walker() does not have lock/unlock in it;
does this patch need a Fixes tag for commit f18b03fabaa9 ("bpf: Implement BPF exceptions")?
* Re: [PATCH bpf-next v2 01/11] bpf: Introduce BPF standard streams
2025-05-27 23:55 ` Kumar Kartikeya Dwivedi
@ 2025-05-28 0:23 ` Eduard Zingerman
2025-05-28 0:30 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 43+ messages in thread
From: Eduard Zingerman @ 2025-05-28 0:23 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
kkd, kernel-team
Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
[...]
>> > diff --git a/kernel/bpf/stream.c b/kernel/bpf/stream.c
>> > new file mode 100644
>> > index 000000000000..b9e6f7a43b1b
>> > --- /dev/null
>> > +++ b/kernel/bpf/stream.c
>>
>> [...]
>>
>> > +int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
>> > + enum bpf_stream_id stream_id)
>> > +{
>> > + struct llist_node *list, *head, *tail;
>> > + struct bpf_stream *stream;
>> > + int ret;
>> > +
>> > + stream = bpf_stream_get(stream_id, prog->aux);
>> > + if (!stream)
>> > + return -EINVAL;
>> > +
>> > + ret = bpf_stream_consume_capacity(stream, ss->len);
>> > + if (ret)
>> > + return ret;
>> > +
>> > + list = llist_del_all(&ss->log);
>> > + head = list;
>> > +
>> > + if (!list)
>> > + return 0;
>> > + while (llist_next(list)) {
>> > + tail = llist_next(list);
>> > + list = tail;
>> > + }
>> > + llist_add_batch(head, tail, &stream->log);
>>
>> If `llist_next(list) == NULL` at entry, `tail` is never assigned?
>
> The assumption is llist_del_all being non-NULL means llist_next is
> going to return a non-NULL value at least once.
> Does that address your concern?
Sorry, maybe I don't understand something.
Suppose that at entry ss->log is a list with a single element:
ss->log -> 0xAA: { .next = NULL; ... payload ... }
then:
- list == 0xAA;
- llist_next(list) == 0x0;
- loop body never executes.
What do I miss?
>> > + return 0;
>> > +}
[...]
* Re: [PATCH bpf-next v2 04/11] bpf: Hold RCU read lock in bpf_prog_ksym_find
2025-05-28 0:15 ` Eduard Zingerman
@ 2025-05-28 0:27 ` Kumar Kartikeya Dwivedi
2025-05-28 0:33 ` Eduard Zingerman
0 siblings, 1 reply; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-28 0:27 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
kkd, kernel-team
On Wed, 28 May 2025 at 02:15, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
>
> [...]
>
> >> --- a/kernel/bpf/core.c
> >> +++ b/kernel/bpf/core.c
> >> @@ -782,7 +782,11 @@ bool is_bpf_text_address(unsigned long addr)
> >>
> >> struct bpf_prog *bpf_prog_ksym_find(unsigned long addr)
> >> {
> >> - struct bpf_ksym *ksym = bpf_ksym_find(addr);
> >> + struct bpf_ksym *ksym;
> >> +
> >> + rcu_read_lock();
> >> + ksym = bpf_ksym_find(addr);
> >> + rcu_read_unlock();
> >>
> >> return ksym && ksym->prog ?
> >> container_of(ksym, struct bpf_prog_aux, ksym)->prog :
> >
> > This isn't right, we need to have the read section open around ksym
> > access as well.
> > We can end the section and return the prog pointer.
> > The caller is responsible to ensure prog outlives RCU protection, or
> > otherwise hold it if necessary for prog's lifetime.
> >
> > We're using this to pick programs who have an active stack frame, so
> > they aren't going away.
> > But the ksym access itself needs to happen under correct protection.
> >
> > I can fix it in a respin, whatever is best.
>
> Are rcu_read_{lock,unlock} necessary in core.c:search_bpf_extables()
> after this change?
I don't think so; in general, it would be necessary to hold the RCU read
lock to make sure the program does not go away.
There is a requirement to hold it to traverse the latch tree itself,
and then accesses on the prog require its lifetime to remain valid.
Except in case of search_bpf_extables (and all other current users),
the prog being found should have an active frame on that CPU, so it's
not going away.
Alternatively, we can add WARN_ON_ONCE(!rcu_read_lock_held()) as a requirement.
> Also, helpers.c:bpf_stack_walker does not have lock/unlock in it;
> does this patch need a fixes tag for commit f18b03fabaa9 ("bpf: Implement BPF exceptions")?
So far this is probably not a bug because programs run in an RCU read
section already (perhaps it is a problem in case of sleepable ones).
It felt odd to me while looking at the code.
I feel like the rcu_read_lock being dropped and then returning a prog
pointer will be a bit confusing and a footgun, so it might be better
to go with WARN_ON_ONCE.
* Re: [PATCH bpf-next v2 01/11] bpf: Introduce BPF standard streams
2025-05-28 0:23 ` Eduard Zingerman
@ 2025-05-28 0:30 ` Kumar Kartikeya Dwivedi
0 siblings, 0 replies; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-28 0:30 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
kkd, kernel-team
On Wed, 28 May 2025 at 02:23, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
>
> [...]
>
> >> > diff --git a/kernel/bpf/stream.c b/kernel/bpf/stream.c
> >> > new file mode 100644
> >> > index 000000000000..b9e6f7a43b1b
> >> > --- /dev/null
> >> > +++ b/kernel/bpf/stream.c
> >>
> >> [...]
> >>
> >> > +int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
> >> > + enum bpf_stream_id stream_id)
> >> > +{
> >> > + struct llist_node *list, *head, *tail;
> >> > + struct bpf_stream *stream;
> >> > + int ret;
> >> > +
> >> > + stream = bpf_stream_get(stream_id, prog->aux);
> >> > + if (!stream)
> >> > + return -EINVAL;
> >> > +
> >> > + ret = bpf_stream_consume_capacity(stream, ss->len);
> >> > + if (ret)
> >> > + return ret;
> >> > +
> >> > + list = llist_del_all(&ss->log);
> >> > + head = list;
> >> > +
> >> > + if (!list)
> >> > + return 0;
> >> > + while (llist_next(list)) {
> >> > + tail = llist_next(list);
> >> > + list = tail;
> >> > + }
> >> > + llist_add_batch(head, tail, &stream->log);
> >>
> >> If `llist_next(list) == NULL` at entry, `tail` is never assigned?
> >
> > The assumption is that llist_del_all returning non-NULL means llist_next
> > will return a non-NULL value at least once.
> > Does that address your concern?
>
> Sorry, maybe I don't understand something.
> Suppose that at entry ss->log is a list with a single element:
>
> ss->log -> 0xAA: { .next = NULL; ... payload ... }
>
> then:
> - list == 0xAA;
> - llist_next(list) == 0x0;
> - loop body never executes.
>
> What do I miss?
Right, I see.
We need to do head = tail = list above.
Then it's equivalent to a single element llist_add.
>
>
> >> > + return 0;
> >> > +}
>
> [...]
* Re: [PATCH bpf-next v2 04/11] bpf: Hold RCU read lock in bpf_prog_ksym_find
2025-05-28 0:27 ` Kumar Kartikeya Dwivedi
@ 2025-05-28 0:33 ` Eduard Zingerman
0 siblings, 0 replies; 43+ messages in thread
From: Eduard Zingerman @ 2025-05-28 0:33 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
kkd, kernel-team
Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
[...]
> Alternatively, we can add WARN_ON_ONCE(!rcu_read_lock_held()) as a requirement.
I think this is a good option.
[...]
* Re: [PATCH bpf-next v2 05/11] bpf: Add dump_stack() analogue to print to BPF stderr
2025-05-24 1:18 ` [PATCH bpf-next v2 05/11] bpf: Add dump_stack() analogue to print to BPF stderr Kumar Kartikeya Dwivedi
@ 2025-05-28 0:45 ` Eduard Zingerman
2025-05-28 0:58 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 43+ messages in thread
From: Eduard Zingerman @ 2025-05-28 0:45 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
kkd, kernel-team
Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
Could you please modify one of the selftests to check lines reported by
dump stack?
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
[...]
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 6985e793e927..aab5ea17a329 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -3613,8 +3613,10 @@ __printf(2, 3)
> int bpf_stream_stage_printk(struct bpf_stream_stage *ss, const char *fmt, ...);
> int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
> enum bpf_stream_id stream_id);
> +int bpf_stream_stage_dump_stack(struct bpf_stream_stage *ss);
>
> #define bpf_stream_printk(...) bpf_stream_stage_printk(&__ss, __VA_ARGS__)
> +#define bpf_stream_dump_stack() bpf_stream_stage_dump_stack(&__ss)
I don't think we should add a macro with a hard-coded variable name (`__ss`)
in common headers.
[...]
* Re: [PATCH bpf-next v2 05/11] bpf: Add dump_stack() analogue to print to BPF stderr
2025-05-28 0:45 ` Eduard Zingerman
@ 2025-05-28 0:58 ` Kumar Kartikeya Dwivedi
2025-05-28 6:03 ` Eduard Zingerman
0 siblings, 1 reply; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-28 0:58 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
kkd, kernel-team
On Wed, 28 May 2025 at 02:45, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
>
> Could you please modify one of the selftests to check lines reported by
> dump stack?
Ok, will try to do it with a regex pattern to match the overall layout.
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> [...]
>
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 6985e793e927..aab5ea17a329 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -3613,8 +3613,10 @@ __printf(2, 3)
> > int bpf_stream_stage_printk(struct bpf_stream_stage *ss, const char *fmt, ...);
> > int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
> > enum bpf_stream_id stream_id);
> > +int bpf_stream_stage_dump_stack(struct bpf_stream_stage *ss);
> >
> > #define bpf_stream_printk(...) bpf_stream_stage_printk(&__ss, __VA_ARGS__)
> > +#define bpf_stream_dump_stack() bpf_stream_stage_dump_stack(&__ss)
>
> I don't think we should add macro with hard-coded variable names (`__ss`)
> in common headers.
Hm, right. But this is supposed to be used within the stream stage
block, and we already have __i variables in macros that wrap loops
etc., hence the double/triple underscore to avoid conflicts. Anyhow,
I'm open to other suggestions.
>
> [...]
* Re: [PATCH bpf-next v2 05/11] bpf: Add dump_stack() analogue to print to BPF stderr
2025-05-28 0:58 ` Kumar Kartikeya Dwivedi
@ 2025-05-28 6:03 ` Eduard Zingerman
0 siblings, 0 replies; 43+ messages in thread
From: Eduard Zingerman @ 2025-05-28 6:03 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
kkd, kernel-team
Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
[...]
>> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> > index 6985e793e927..aab5ea17a329 100644
>> > --- a/include/linux/bpf.h
>> > +++ b/include/linux/bpf.h
>> > @@ -3613,8 +3613,10 @@ __printf(2, 3)
>> > int bpf_stream_stage_printk(struct bpf_stream_stage *ss, const char *fmt, ...);
>> > int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
>> > enum bpf_stream_id stream_id);
>> > +int bpf_stream_stage_dump_stack(struct bpf_stream_stage *ss);
>> >
>> > #define bpf_stream_printk(...) bpf_stream_stage_printk(&__ss, __VA_ARGS__)
>> > +#define bpf_stream_dump_stack() bpf_stream_stage_dump_stack(&__ss)
>>
>> I don't think we should add a macro with a hard-coded variable name (`__ss`)
>> in common headers.
>
> Hm, right. But this is supposed to be used within the stream stage
> block, and we already have __i variables in macros that wrap loops
> etc., hence the double/triple underscore to avoid conflicts. Anyhow,
> I'm open to other suggestions.
I missed that this macro is supposed to be used only inside
`bpf_stream_stage`. Nothing pretty comes to mind, I thought about using
cleanup.h:DEFINE_CLASS, but that thing is ugly in its own way.
* Re: [PATCH bpf-next v2 08/11] libbpf: Add bpf_stream_printk() macro
2025-05-24 1:18 ` [PATCH bpf-next v2 08/11] libbpf: Add bpf_stream_printk() macro Kumar Kartikeya Dwivedi
@ 2025-05-28 6:12 ` Eduard Zingerman
2025-05-28 10:09 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 43+ messages in thread
From: Eduard Zingerman @ 2025-05-28 6:12 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
kkd, kernel-team
Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
> Add a convenience macro to print data to the BPF streams. The BPF_STDOUT
> and BPF_STDERR stream IDs in vmlinux.h can be passed to the macro to
> print to the respective streams.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
(Sorry if I'm being repetitive, but could you please extend
one of the tests so that the output of bpf_stream_printk
is compared with some reference values.)
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
[...]
* Re: [PATCH bpf-next v2 08/11] libbpf: Add bpf_stream_printk() macro
2025-05-28 6:12 ` Eduard Zingerman
@ 2025-05-28 10:09 ` Kumar Kartikeya Dwivedi
0 siblings, 0 replies; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-28 10:09 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
kkd, kernel-team
On Wed, 28 May 2025 at 08:13, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
>
> > Add a convenience macro to print data to the BPF streams. The BPF_STDOUT
> > and BPF_STDERR stream IDs in vmlinux.h can be passed to the macro to
> > print to the respective streams.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
>
> (Sorry if I'm being repetitive, but could you please extend
> one of the tests so that the output of bpf_stream_printk
> is compared with some reference values.)
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
IDK whether you saw the final selftest patch, which does some output matching.
I was matching arena fault addresses until I dropped the patch.
I will add something to match on the backtrace output as well.
>
> [...]
* Re: [PATCH bpf-next v2 11/11] selftests/bpf: Add tests for prog streams
2025-05-24 1:18 ` [PATCH bpf-next v2 11/11] selftests/bpf: Add tests for prog streams Kumar Kartikeya Dwivedi
@ 2025-05-28 17:04 ` Eduard Zingerman
2025-05-29 14:57 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 43+ messages in thread
From: Eduard Zingerman @ 2025-05-28 17:04 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
kkd, kernel-team
Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
[...]
> +struct {
> + int prog_off;
> + const char *errstr;
> +} stream_error_arr[] = {
> + {
> + offsetof(struct stream, progs.stream_cond_break),
> + "ERROR: Timeout detected for may_goto instruction",
> + },
> + {
> + offsetof(struct stream, progs.stream_deadlock),
> + "ERROR: AA or ABBA deadlock detected",
> + },
> +};
Wild idea: instead of hand-coding this for each test, maybe add
__bpf_stderr, __bpf_stdout annotations to test_loader.c?
With intent for them to operate like __msg.
[...]
* Re: [PATCH bpf-next v2 09/11] libbpf: Introduce bpf_prog_stream_read() API
2025-05-24 1:18 ` [PATCH bpf-next v2 09/11] libbpf: Introduce bpf_prog_stream_read() API Kumar Kartikeya Dwivedi
@ 2025-05-28 18:02 ` Eduard Zingerman
2025-05-29 15:36 ` Kumar Kartikeya Dwivedi
2025-06-05 20:06 ` Andrii Nakryiko
1 sibling, 1 reply; 43+ messages in thread
From: Eduard Zingerman @ 2025-05-28 18:02 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
kkd, kernel-team
Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
> Introduce a libbpf API so that users can read data from a given BPF
> stream for a BPF prog fd. For now, only the low-level syscall wrapper
> is provided, we can add a bpf_program__* accessor as a follow up if
> needed.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
[...]
> +int bpf_prog_stream_read(int prog_fd, __u32 stream_id, void *stream_buf, __u32 stream_buf_len)
Note: many such utility functions take an _opts parameter for future
extensibility. Imo in this case it would hinder usability a bit.
If need be, bpf_prog_stream_read_opts can be added later.
[...]
* Re: [PATCH bpf-next v2 11/11] selftests/bpf: Add tests for prog streams
2025-05-28 17:04 ` Eduard Zingerman
@ 2025-05-29 14:57 ` Kumar Kartikeya Dwivedi
2025-05-29 16:28 ` Eduard Zingerman
0 siblings, 1 reply; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-29 14:57 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
kkd, kernel-team
On Wed, 28 May 2025 at 19:04, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
>
> [...]
>
> > +struct {
> > + int prog_off;
> > + const char *errstr;
> > +} stream_error_arr[] = {
> > + {
> > + offsetof(struct stream, progs.stream_cond_break),
> > + "ERROR: Timeout detected for may_goto instruction",
> > + },
> > + {
> > + offsetof(struct stream, progs.stream_deadlock),
> > + "ERROR: AA or ABBA deadlock detected",
> > + },
> > +};
>
> Wild idea: instead of hand-coding this for each test, maybe add
> __bpf_stderr, __bpf_stdout annotations to test_loader.c?
> With intent for them to operate like __msg.
Good idea. But we'll have to run the program, which is slightly
different for each type.
I guess I can just support tc, syscall, etc. for now with some default
setup.
>
> [...]
* Re: [PATCH bpf-next v2 09/11] libbpf: Introduce bpf_prog_stream_read() API
2025-05-28 18:02 ` Eduard Zingerman
@ 2025-05-29 15:36 ` Kumar Kartikeya Dwivedi
0 siblings, 0 replies; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-05-29 15:36 UTC (permalink / raw)
To: Eduard Zingerman
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
kkd, kernel-team
On Wed, 28 May 2025 at 20:02, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
>
> > Introduce a libbpf API so that users can read data from a given BPF
> > stream for a BPF prog fd. For now, only the low-level syscall wrapper
> > is provided, we can add a bpf_program__* accessor as a follow up if
> > needed.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> [...]
>
> > +int bpf_prog_stream_read(int prog_fd, __u32 stream_id, void *stream_buf, __u32 stream_buf_len)
>
> Note: many such utility functions take an _opts parameter for future
> extensibility. Imo in this case it would hinder usability a bit.
> If need be, bpf_prog_stream_read_opts can be added later.
>
We may have more arguments in the future (like count of errors, as
Alan requested), so it may be a good idea to add the opts variant
directly now.
> [...]
* Re: [PATCH bpf-next v2 11/11] selftests/bpf: Add tests for prog streams
2025-05-29 14:57 ` Kumar Kartikeya Dwivedi
@ 2025-05-29 16:28 ` Eduard Zingerman
0 siblings, 0 replies; 43+ messages in thread
From: Eduard Zingerman @ 2025-05-29 16:28 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden, Matt Bobrowski,
kkd, kernel-team
Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
> On Wed, 28 May 2025 at 19:04, Eduard Zingerman <eddyz87@gmail.com> wrote:
>>
>> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
>>
>> [...]
>>
>> > +struct {
>> > + int prog_off;
>> > + const char *errstr;
>> > +} stream_error_arr[] = {
>> > + {
>> > + offsetof(struct stream, progs.stream_cond_break),
>> > + "ERROR: Timeout detected for may_goto instruction",
>> > + },
>> > + {
>> > + offsetof(struct stream, progs.stream_deadlock),
>> > + "ERROR: AA or ABBA deadlock detected",
>> > + },
>> > +};
>>
>> Wild idea: instead of hand-coding this for each test, maybe add
>> __bpf_stderr, __bpf_stdout annotations to test_loader.c?
>> With intent for them to operate like __msg.
>
> Good idea. But we'll have to run the program, which is slightly
> different for each type.
> I guess I can just support tc, syscall, etc. for now with some default
> setup.
We currently have a __retval annotation which does BPF_PROG_TEST_RUN.
BPF_PROG_TEST_RUN is supported for "syscall" program type that you use
in the tests, so that should be covered.
* Re: [PATCH bpf-next v2 01/11] bpf: Introduce BPF standard streams
2025-05-24 1:18 ` [PATCH bpf-next v2 01/11] bpf: Introduce BPF standard streams Kumar Kartikeya Dwivedi
2025-05-27 23:47 ` Eduard Zingerman
@ 2025-06-02 20:07 ` Alexei Starovoitov
2025-06-02 20:31 ` Kumar Kartikeya Dwivedi
1 sibling, 1 reply; 43+ messages in thread
From: Alexei Starovoitov @ 2025-06-02 20:07 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Eduard Zingerman, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden,
Matt Bobrowski, kkd, Kernel Team
On Fri, May 23, 2025 at 6:18 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> for (i = 0; i < aux->real_func_cnt; i++) {
> +#ifdef CONFIG_BPF_SYSCALL
> + /* Ensure we don't push to subprog lists. */
> + if (bpf_is_subprog(aux->func[i])) {
> + WARN_ON_ONCE(!llist_empty(&aux->func[i]->aux->stream[0].log));
> + WARN_ON_ONCE(!llist_empty(&aux->func[i]->aux->stream[1].log));
> + }
Why check for subprogs only ?
> -/* Per-cpu temp buffers used by printf-like helpers to store the bprintf binary
> - * arguments representation.
> - */
> -#define MAX_BPRINTF_BIN_ARGS 512
> -
> /* Support executing three nested bprintf helper calls on a given CPU */
> #define MAX_BPRINTF_NEST_LEVEL 3
> -struct bpf_bprintf_buffers {
> - char bin_args[MAX_BPRINTF_BIN_ARGS];
> - char buf[MAX_BPRINTF_BUF];
> -};
Pls split the refactor into another patch.
> +static int bpf_stream_read(struct bpf_stream *stream, void __user *buf, int len)
> +{
> + int rem_len = len, cons_len, ret = 0;
> + struct bpf_stream_elem *elem = NULL;
> + struct llist_node *node;
> + unsigned long flags;
> +
> + if (raw_res_spin_lock_irqsave(&stream->lock, flags))
> + return -EDEADLK;
> +
> + while (rem_len) {
> + int pos = len - rem_len;
> + bool cont;
> +
> + node = bpf_stream_backlog_peek(stream);
> + if (!node) {
> + bpf_stream_backlog_fill(stream);
> + node = bpf_stream_backlog_peek(stream);
> + }
> + if (!node)
> + break;
> + elem = container_of(node, typeof(*elem), node);
> +
> + cons_len = elem->consumed_len;
> + cont = bpf_stream_consume_elem(elem, &rem_len) == false;
> +
> + ret = copy_to_user_nofault(buf + pos, elem->str + cons_len,
> + elem->consumed_len - cons_len);
_nofault() because res_spin_lock() is held, right?
But this is only called from sys_bpf. It's sleepable and faultable.
In v1 bpf_prog_stream_read() was callable from bpf prog, iirc,
but not anymore.
Let's use proper mutex and copy_to_user() ?
> + /* Restore in case of error. */
> + if (ret) {
> + elem->consumed_len = cons_len;
> + break;
> + }
> +
> + if (cont)
> + continue;
> + bpf_stream_backlog_pop(stream);
> + bpf_stream_release_capacity(stream, elem);
> + bpf_stream_free_elem(elem);
> + }
> +
> + raw_res_spin_unlock_irqrestore(&stream->lock, flags);
> + return ret ? ret : len - rem_len;
> +}
> +
> +int bpf_prog_stream_read(struct bpf_prog *prog, enum bpf_stream_id stream_id, void __user *buf, int len)
> +{
> + struct bpf_stream *stream;
> +
> + stream = bpf_stream_get(stream_id, prog->aux);
> + if (!stream)
> + return -ENOENT;
> + return bpf_stream_read(stream, buf, len);
> +}
> +
> +__bpf_kfunc_start_defs();
> +
> +/*
> + * Avoid using enum bpf_stream_id so that kfunc users don't have to pull in the
> + * enum in headers.
> + */
> +__bpf_kfunc int bpf_stream_vprintk(int stream_id, const char *fmt__str, const void *args, u32 len__sz, void *aux__prog)
> +{
> + struct bpf_bprintf_data data = {
> + .get_bin_args = true,
> + .get_buf = true,
> + };
> + struct bpf_prog_aux *aux = aux__prog;
> + u32 fmt_size = strlen(fmt__str) + 1;
> + struct bpf_stream *stream;
> + u32 data_len = len__sz;
> + int ret, num_args;
> +
> + stream = bpf_stream_get(stream_id, aux);
> + if (!stream)
> + return -ENOENT;
> +
> + if (data_len & 7 || data_len > MAX_BPRINTF_VARARGS * 8 ||
> + (data_len && !args))
> + return -EINVAL;
> + num_args = data_len / 8;
> +
> + ret = bpf_bprintf_prepare(fmt__str, fmt_size, args, num_args, &data);
> + if (ret < 0)
> + return ret;
> +
> + ret = bstr_printf(data.buf, MAX_BPRINTF_BUF, fmt__str, data.bin_args);
> + /* If the string was truncated, we only wrote until the size of buffer. */
> + ret = min_t(u32, ret + 1, MAX_BPRINTF_BUF);
> + ret = bpf_stream_push_str(stream, data.buf, ret);
> + bpf_bprintf_cleanup(&data);
> +
> + return ret;
> +}
> +
> +__bpf_kfunc_end_defs();
> +
> +BTF_KFUNCS_START(stream_kfunc_set)
> +BTF_ID_FLAGS(func, bpf_stream_vprintk, KF_TRUSTED_ARGS)
> +BTF_KFUNCS_END(stream_kfunc_set)
> +
> +static const struct btf_kfunc_id_set bpf_stream_kfunc_set = {
> + .owner = THIS_MODULE,
> + .set = &stream_kfunc_set,
> +};
> +
> +static int __init bpf_stream_kfunc_init(void)
> +{
> + return register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &bpf_stream_kfunc_set);
> +}
> +late_initcall(bpf_stream_kfunc_init);
let's avoid all these little initcall-s and individual
stream_kfunc_set-s.
Add bpf_stream_vprintk() to common_btf_ids[].
* Re: [PATCH bpf-next v2 02/11] bpf: Add function to extract program source info
2025-05-24 1:18 ` [PATCH bpf-next v2 02/11] bpf: Add function to extract program source info Kumar Kartikeya Dwivedi
@ 2025-06-02 20:18 ` Alexei Starovoitov
2025-06-02 20:21 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 43+ messages in thread
From: Alexei Starovoitov @ 2025-06-02 20:18 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
Barret Rhoden, Matt Bobrowski, kkd, Kernel Team
On Fri, May 23, 2025 at 6:18 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> Prepare a function for use in future patches that can extract the file
> info, line info, and the source line number for a given BPF program
> provided its program counter.
>
> Only the basename of the file path is provided, given it can be
> excessively long in some cases.
>
> This will be used in later patches to print source info to the BPF
> stream. The source line number is indicated by the return value, and the
> file and line info are provided through out parameters.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
> include/linux/bpf.h | 2 ++
> kernel/bpf/core.c | 49 +++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 51 insertions(+)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index d298746f4dcc..4eb4f06f7219 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -3659,4 +3659,6 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
> return prog->aux->func_idx != 0;
> }
>
> +int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep, const char **linep);
> +
> #endif /* _LINUX_BPF_H */
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 22c278c008ce..7e7fef095bca 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -3204,3 +3204,52 @@ EXPORT_SYMBOL(bpf_stats_enabled_key);
>
> EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_exception);
> EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_bulk_tx);
> +
> +#ifdef CONFIG_BPF_SYSCALL
> +
> +int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep, const char **linep)
> +{
> + int idx = -1, insn_start, insn_end, len;
> + struct bpf_line_info *linfo;
> + void **jited_linfo;
> + struct btf *btf;
> +
> + btf = prog->aux->btf;
> + linfo = prog->aux->linfo;
> + jited_linfo = prog->aux->jited_linfo;
> +
> + if (!btf || !linfo || !prog->aux->jited_linfo)
> + return -EINVAL;
> + len = prog->aux->func ? prog->aux->func[prog->aux->func_idx]->len : prog->len;
> +
> + linfo = &prog->aux->linfo[prog->aux->linfo_idx];
> + jited_linfo = &prog->aux->jited_linfo[prog->aux->linfo_idx];
> +
> + insn_start = linfo[0].insn_off;
> + insn_end = insn_start + len;
> +
> + for (int i = 0; i < prog->aux->nr_linfo &&
> + linfo[i].insn_off >= insn_start && linfo[i].insn_off < insn_end; i++) {
> + if (jited_linfo[i] >= (void *)ip)
> + break;
> + idx = i;
> + }
> +
> + if (idx == -1)
> + return -ENOENT;
> +
> + /* Get base component of the file path. */
> + *filep = btf_name_by_offset(btf, linfo[idx].file_name_off);
> + if (!*filep)
> + return -ENOENT;
> + *filep = kbasename(*filep);
> + /* Obtain the source line, and strip whitespace in prefix. */
> + *linep = btf_name_by_offset(btf, linfo[idx].line_off);
> + if (!*linep)
> + return -ENOENT;
> + while (isspace(**linep))
> + *linep += 1;
The check_btf_line() in the verifier does:
if (!btf_name_by_offset(btf, linfo[i].line_off) ||
!btf_name_by_offset(btf, linfo[i].file_name_off)) {
verbose(env, "Invalid line_info[%u].line_off
or .file_name_off\n", i);
err = -EINVAL;
goto err_free;
}
and later in the verifier we do:
s = ltrim(btf_name_by_offset(btf, linfo->line_off));
verbose(env, "%s", s); /* source code line */
so please drop these two checks.
> + return BPF_LINE_INFO_LINE_NUM(linfo[idx].line_col);
I would return it by reference as well.
* Re: [PATCH bpf-next v2 02/11] bpf: Add function to extract program source info
2025-06-02 20:18 ` Alexei Starovoitov
@ 2025-06-02 20:21 ` Kumar Kartikeya Dwivedi
2025-06-02 20:25 ` Alexei Starovoitov
0 siblings, 1 reply; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-06-02 20:21 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
Barret Rhoden, Matt Bobrowski, kkd, Kernel Team
On Mon, 2 Jun 2025 at 22:18, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, May 23, 2025 at 6:18 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > Prepare a function for use in future patches that can extract the file
> > info, line info, and the source line number for a given BPF program
> > provided its program counter.
> >
> > Only the basename of the file path is provided, given it can be
> > excessively long in some cases.
> >
> > This will be used in later patches to print source info to the BPF
> > stream. The source line number is indicated by the return value, and the
> > file and line info are provided through out parameters.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> > include/linux/bpf.h | 2 ++
> > kernel/bpf/core.c | 49 +++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 51 insertions(+)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index d298746f4dcc..4eb4f06f7219 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -3659,4 +3659,6 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
> > return prog->aux->func_idx != 0;
> > }
> >
> > +int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep, const char **linep);
> > +
> > #endif /* _LINUX_BPF_H */
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 22c278c008ce..7e7fef095bca 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -3204,3 +3204,52 @@ EXPORT_SYMBOL(bpf_stats_enabled_key);
> >
> > EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_exception);
> > EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_bulk_tx);
> > +
> > +#ifdef CONFIG_BPF_SYSCALL
> > +
> > +int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep, const char **linep)
> > +{
> > + int idx = -1, insn_start, insn_end, len;
> > + struct bpf_line_info *linfo;
> > + void **jited_linfo;
> > + struct btf *btf;
> > +
> > + btf = prog->aux->btf;
> > + linfo = prog->aux->linfo;
> > + jited_linfo = prog->aux->jited_linfo;
> > +
> > + if (!btf || !linfo || !prog->aux->jited_linfo)
> > + return -EINVAL;
> > + len = prog->aux->func ? prog->aux->func[prog->aux->func_idx]->len : prog->len;
> > +
> > + linfo = &prog->aux->linfo[prog->aux->linfo_idx];
> > + jited_linfo = &prog->aux->jited_linfo[prog->aux->linfo_idx];
> > +
> > + insn_start = linfo[0].insn_off;
> > + insn_end = insn_start + len;
> > +
> > + for (int i = 0; i < prog->aux->nr_linfo &&
> > + linfo[i].insn_off >= insn_start && linfo[i].insn_off < insn_end; i++) {
> > + if (jited_linfo[i] >= (void *)ip)
> > + break;
> > + idx = i;
> > + }
> > +
> > + if (idx == -1)
> > + return -ENOENT;
> > +
> > + /* Get base component of the file path. */
> > + *filep = btf_name_by_offset(btf, linfo[idx].file_name_off);
> > + if (!*filep)
> > + return -ENOENT;
> > + *filep = kbasename(*filep);
> > + /* Obtain the source line, and strip whitespace in prefix. */
> > + *linep = btf_name_by_offset(btf, linfo[idx].line_off);
> > + if (!*linep)
> > + return -ENOENT;
> > + while (isspace(**linep))
> > + *linep += 1;
>
> The check_btf_line() in the verifier does:
> if (!btf_name_by_offset(btf, linfo[i].line_off) ||
> !btf_name_by_offset(btf, linfo[i].file_name_off)) {
> verbose(env, "Invalid line_info[%u].line_off
> or .file_name_off\n", i);
> err = -EINVAL;
> goto err_free;
> }
>
> and later in the verifier we do:
> s = ltrim(btf_name_by_offset(btf, linfo->line_off));
> verbose(env, "%s", s); /* source code line */
>
> so please drop these two checks.
There was a kernel crash in CI when these NULL checks were missing.
>
> > + return BPF_LINE_INFO_LINE_NUM(linfo[idx].line_col);
>
> I would return it by reference as well.
Ok.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH bpf-next v2 02/11] bpf: Add function to extract program source info
2025-06-02 20:21 ` Kumar Kartikeya Dwivedi
@ 2025-06-02 20:25 ` Alexei Starovoitov
2025-06-02 20:31 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 43+ messages in thread
From: Alexei Starovoitov @ 2025-06-02 20:25 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
Barret Rhoden, Matt Bobrowski, kkd, Kernel Team
On Mon, Jun 2, 2025 at 1:21 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Mon, 2 Jun 2025 at 22:18, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, May 23, 2025 at 6:18 PM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > Prepare a function for use in future patches that can extract the file
> > > info, line info, and the source line number for a given BPF program
> > > provided its program counter.
> > >
> > > Only the basename of the file path is provided, given it can be
> > > excessively long in some cases.
> > >
> > > This will be used in later patches to print source info to the BPF
> > > stream. The source line number is indicated by the return value, and the
> > > file and line info are provided through out parameters.
> > >
> > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > ---
> > > include/linux/bpf.h | 2 ++
> > > kernel/bpf/core.c | 49 +++++++++++++++++++++++++++++++++++++++++++++
> > > 2 files changed, 51 insertions(+)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index d298746f4dcc..4eb4f06f7219 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -3659,4 +3659,6 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
> > > return prog->aux->func_idx != 0;
> > > }
> > >
> > > +int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep, const char **linep);
> > > +
> > > #endif /* _LINUX_BPF_H */
> > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > index 22c278c008ce..7e7fef095bca 100644
> > > --- a/kernel/bpf/core.c
> > > +++ b/kernel/bpf/core.c
> > > @@ -3204,3 +3204,52 @@ EXPORT_SYMBOL(bpf_stats_enabled_key);
> > >
> > > EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_exception);
> > > EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_bulk_tx);
> > > +
> > > +#ifdef CONFIG_BPF_SYSCALL
> > > +
> > > +int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep, const char **linep)
> > > +{
> > > + int idx = -1, insn_start, insn_end, len;
> > > + struct bpf_line_info *linfo;
> > > + void **jited_linfo;
> > > + struct btf *btf;
> > > +
> > > + btf = prog->aux->btf;
> > > + linfo = prog->aux->linfo;
> > > + jited_linfo = prog->aux->jited_linfo;
> > > +
> > > + if (!btf || !linfo || !prog->aux->jited_linfo)
> > > + return -EINVAL;
> > > + len = prog->aux->func ? prog->aux->func[prog->aux->func_idx]->len : prog->len;
> > > +
> > > + linfo = &prog->aux->linfo[prog->aux->linfo_idx];
> > > + jited_linfo = &prog->aux->jited_linfo[prog->aux->linfo_idx];
> > > +
> > > + insn_start = linfo[0].insn_off;
> > > + insn_end = insn_start + len;
> > > +
> > > + for (int i = 0; i < prog->aux->nr_linfo &&
> > > + linfo[i].insn_off >= insn_start && linfo[i].insn_off < insn_end; i++) {
> > > + if (jited_linfo[i] >= (void *)ip)
> > > + break;
> > > + idx = i;
> > > + }
> > > +
> > > + if (idx == -1)
> > > + return -ENOENT;
> > > +
> > > + /* Get base component of the file path. */
> > > + *filep = btf_name_by_offset(btf, linfo[idx].file_name_off);
> > > + if (!*filep)
> > > + return -ENOENT;
> > > + *filep = kbasename(*filep);
> > > + /* Obtain the source line, and strip whitespace in prefix. */
> > > + *linep = btf_name_by_offset(btf, linfo[idx].line_off);
> > > + if (!*linep)
> > > + return -ENOENT;
> > > + while (isspace(**linep))
> > > + *linep += 1;
> >
> > The check_btf_line() in the verifier does:
> > if (!btf_name_by_offset(btf, linfo[i].line_off) ||
> > !btf_name_by_offset(btf, linfo[i].file_name_off)) {
> > verbose(env, "Invalid line_info[%u].line_off or .file_name_off\n", i);
> > err = -EINVAL;
> > goto err_free;
> > }
> >
> > and later in the verifier we do:
> > s = ltrim(btf_name_by_offset(btf, linfo->line_off));
> > verbose(env, "%s", s); /* source code line */
> >
> > so please drop these two checks.
>
> There was a kernel crash in CI when these NULL checks were missing.
The math before is probably wrong then.
idx is invalid ?
This needs more debugging.
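For reference, the index search under debate can be reduced to a standalone model: walk the jited line-info addresses in order and remember the last entry strictly below the faulting ip. This is only an illustrative sketch (the insn-window bounds and the real struct layout are elided; all names are made up for the example):

```c
#include <assert.h>

/* Model of the search in bpf_prog_get_file_line(): idx ends up as the
 * last line-info entry whose jited address is strictly below ip, or -1
 * when ip does not exceed the first recorded address. */
static const unsigned long demo_jited_addrs[3] = { 0x100, 0x200, 0x300 };

static int demo_find_linfo_idx(const unsigned long *addrs, int nr, unsigned long ip)
{
	int idx = -1;

	for (int i = 0; i < nr; i++) {
		if (addrs[i] >= ip)
			break;
		idx = i;
	}
	return idx;
}
```

Note the boundary behavior: an ip equal to the first entry's address yields -1 in this model, matching the quoted kernel loop's early break on `>=`.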
* Re: [PATCH bpf-next v2 01/11] bpf: Introduce BPF standard streams
2025-06-02 20:07 ` Alexei Starovoitov
@ 2025-06-02 20:31 ` Kumar Kartikeya Dwivedi
0 siblings, 0 replies; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-06-02 20:31 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Eduard Zingerman, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden,
Matt Bobrowski, kkd, Kernel Team
On Mon, 2 Jun 2025 at 22:07, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, May 23, 2025 at 6:18 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > for (i = 0; i < aux->real_func_cnt; i++) {
> > +#ifdef CONFIG_BPF_SYSCALL
> > + /* Ensure we don't push to subprog lists. */
> > + if (bpf_is_subprog(aux->func[i])) {
> > + WARN_ON_ONCE(!llist_empty(&aux->func[i]->aux->stream[0].log));
> > + WARN_ON_ONCE(!llist_empty(&aux->func[i]->aux->stream[1].log));
> > + }
>
> Why check for subprogs only ?
>
Stream elements should only be populated for main progs, so it would
be a bug if we did it for subprogs.
I guess I can drop it.
>
> > -/* Per-cpu temp buffers used by printf-like helpers to store the bprintf binary
> > - * arguments representation.
> > - */
> > -#define MAX_BPRINTF_BIN_ARGS 512
> > -
> > /* Support executing three nested bprintf helper calls on a given CPU */
> > #define MAX_BPRINTF_NEST_LEVEL 3
> > -struct bpf_bprintf_buffers {
> > - char bin_args[MAX_BPRINTF_BIN_ARGS];
> > - char buf[MAX_BPRINTF_BUF];
> > -};
>
> Pls split the refactor into another patch.
Ack.
>
> > +static int bpf_stream_read(struct bpf_stream *stream, void __user *buf, int len)
> > +{
> > + int rem_len = len, cons_len, ret = 0;
> > + struct bpf_stream_elem *elem = NULL;
> > + struct llist_node *node;
> > + unsigned long flags;
> > +
> > + if (raw_res_spin_lock_irqsave(&stream->lock, flags))
> > + return -EDEADLK;
> > +
> > + while (rem_len) {
> > + int pos = len - rem_len;
> > + bool cont;
> > +
> > + node = bpf_stream_backlog_peek(stream);
> > + if (!node) {
> > + bpf_stream_backlog_fill(stream);
> > + node = bpf_stream_backlog_peek(stream);
> > + }
> > + if (!node)
> > + break;
> > + elem = container_of(node, typeof(*elem), node);
> > +
> > + cons_len = elem->consumed_len;
> > + cont = bpf_stream_consume_elem(elem, &rem_len) == false;
> > +
> > + ret = copy_to_user_nofault(buf + pos, elem->str + cons_len,
> > + elem->consumed_len - cons_len);
>
> _nofault() because res_spin_lock() is held, right?
> But this is only called from sys_bpf. It's sleepable and faultable.
> In v1 bpf_prog_stream_read() was callable from bpf prog, iirc,
> but not anymore.
> Let's use proper mutex and copy_to_user() ?
Yes, will do.
>
> > + /* Restore in case of error. */
> > + if (ret) {
> > + elem->consumed_len = cons_len;
> > + break;
> > + }
> > +
> > + if (cont)
> > + continue;
> > + bpf_stream_backlog_pop(stream);
> > + bpf_stream_release_capacity(stream, elem);
> > + bpf_stream_free_elem(elem);
> > + }
> > +
> > + raw_res_spin_unlock_irqrestore(&stream->lock, flags);
> > + return ret ? ret : len - rem_len;
> > +}
> > +
> > +int bpf_prog_stream_read(struct bpf_prog *prog, enum bpf_stream_id stream_id, void __user *buf, int len)
> > +{
> > + struct bpf_stream *stream;
> > +
> > + stream = bpf_stream_get(stream_id, prog->aux);
> > + if (!stream)
> > + return -ENOENT;
> > + return bpf_stream_read(stream, buf, len);
> > +}
> > +
> > +__bpf_kfunc_start_defs();
> > +
> > +/*
> > + * Avoid using enum bpf_stream_id so that kfunc users don't have to pull in the
> > + * enum in headers.
> > + */
> > +__bpf_kfunc int bpf_stream_vprintk(int stream_id, const char *fmt__str, const void *args, u32 len__sz, void *aux__prog)
> > +{
> > + struct bpf_bprintf_data data = {
> > + .get_bin_args = true,
> > + .get_buf = true,
> > + };
> > + struct bpf_prog_aux *aux = aux__prog;
> > + u32 fmt_size = strlen(fmt__str) + 1;
> > + struct bpf_stream *stream;
> > + u32 data_len = len__sz;
> > + int ret, num_args;
> > +
> > + stream = bpf_stream_get(stream_id, aux);
> > + if (!stream)
> > + return -ENOENT;
> > +
> > + if (data_len & 7 || data_len > MAX_BPRINTF_VARARGS * 8 ||
> > + (data_len && !args))
> > + return -EINVAL;
> > + num_args = data_len / 8;
> > +
> > + ret = bpf_bprintf_prepare(fmt__str, fmt_size, args, num_args, &data);
> > + if (ret < 0)
> > + return ret;
> > +
> > + ret = bstr_printf(data.buf, MAX_BPRINTF_BUF, fmt__str, data.bin_args);
> > + /* If the string was truncated, we only wrote until the size of buffer. */
> > + ret = min_t(u32, ret + 1, MAX_BPRINTF_BUF);
> > + ret = bpf_stream_push_str(stream, data.buf, ret);
> > + bpf_bprintf_cleanup(&data);
> > +
> > + return ret;
> > +}
> > +
> > +__bpf_kfunc_end_defs();
> > +
> > +BTF_KFUNCS_START(stream_kfunc_set)
> > +BTF_ID_FLAGS(func, bpf_stream_vprintk, KF_TRUSTED_ARGS)
> > +BTF_KFUNCS_END(stream_kfunc_set)
> > +
> > +static const struct btf_kfunc_id_set bpf_stream_kfunc_set = {
> > + .owner = THIS_MODULE,
> > + .set = &stream_kfunc_set,
> > +};
> > +
> > +static int __init bpf_stream_kfunc_init(void)
> > +{
> > + return register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &bpf_stream_kfunc_set);
> > +}
> > +late_initcall(bpf_stream_kfunc_init);
>
> let's avoid all these little initcall-s and individual
> stream_kfunc_set-s.
> Add bpf_stream_vprintk() to common_btf_ids[].
Ok, I will move it there.
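For illustration, the per-element consume step of the bpf_stream_read() loop quoted above can be sketched in isolation (buffer-backed, no locking or backlog handling; all names are illustrative, not the kernel code):

```c
#include <assert.h>
#include <string.h>

/* Model of draining one stream element: copy from str starting at
 * consumed_len, cap at the remaining read budget, and advance both. */
struct demo_elem {
	const char *str;
	int total_len;
	int consumed_len;
};

static int demo_consume(struct demo_elem *e, char *buf, int *rem)
{
	int n = e->total_len - e->consumed_len;

	if (n > *rem)
		n = *rem;
	memcpy(buf, e->str + e->consumed_len, n);
	e->consumed_len += n;
	*rem -= n;
	return n;
}

/* A short read (3-byte budget) followed by a second read drains the
 * element across two calls, mirroring partial copy_to_user() progress. */
static int demo_scenario(void)
{
	struct demo_elem e = { "hello", 5, 0 };
	char buf[8];
	int rem = 3, got;

	got = demo_consume(&e, buf, &rem);        /* partial: 3 bytes */
	rem = 8;
	got += demo_consume(&e, buf + got, &rem); /* remainder: 2 bytes */
	return got;
}
```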
* Re: [PATCH bpf-next v2 02/11] bpf: Add function to extract program source info
2025-06-02 20:25 ` Alexei Starovoitov
@ 2025-06-02 20:31 ` Kumar Kartikeya Dwivedi
0 siblings, 0 replies; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-06-02 20:31 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
Barret Rhoden, Matt Bobrowski, kkd, Kernel Team
On Mon, 2 Jun 2025 at 22:25, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jun 2, 2025 at 1:21 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > On Mon, 2 Jun 2025 at 22:18, Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Fri, May 23, 2025 at 6:18 PM Kumar Kartikeya Dwivedi
> > > <memxor@gmail.com> wrote:
> > > >
> > > > Prepare a function for use in future patches that can extract the file
> > > > info, line info, and the source line number for a given BPF program
> > > > provided its program counter.
> > > >
> > > > Only the basename of the file path is provided, given it can be
> > > > excessively long in some cases.
> > > >
> > > > This will be used in later patches to print source info to the BPF
> > > > stream. The source line number is indicated by the return value, and the
> > > > file and line info are provided through out parameters.
> > > >
> > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > > ---
> > > > include/linux/bpf.h | 2 ++
> > > > kernel/bpf/core.c | 49 +++++++++++++++++++++++++++++++++++++++++++++
> > > > 2 files changed, 51 insertions(+)
> > > >
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index d298746f4dcc..4eb4f06f7219 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -3659,4 +3659,6 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
> > > > return prog->aux->func_idx != 0;
> > > > }
> > > >
> > > > +int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep, const char **linep);
> > > > +
> > > > #endif /* _LINUX_BPF_H */
> > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > > index 22c278c008ce..7e7fef095bca 100644
> > > > --- a/kernel/bpf/core.c
> > > > +++ b/kernel/bpf/core.c
> > > > @@ -3204,3 +3204,52 @@ EXPORT_SYMBOL(bpf_stats_enabled_key);
> > > >
> > > > EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_exception);
> > > > EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_bulk_tx);
> > > > +
> > > > +#ifdef CONFIG_BPF_SYSCALL
> > > > +
> > > > +int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep, const char **linep)
> > > > +{
> > > > + int idx = -1, insn_start, insn_end, len;
> > > > + struct bpf_line_info *linfo;
> > > > + void **jited_linfo;
> > > > + struct btf *btf;
> > > > +
> > > > + btf = prog->aux->btf;
> > > > + linfo = prog->aux->linfo;
> > > > + jited_linfo = prog->aux->jited_linfo;
> > > > +
> > > > + if (!btf || !linfo || !prog->aux->jited_linfo)
> > > > + return -EINVAL;
> > > > + len = prog->aux->func ? prog->aux->func[prog->aux->func_idx]->len : prog->len;
> > > > +
> > > > + linfo = &prog->aux->linfo[prog->aux->linfo_idx];
> > > > + jited_linfo = &prog->aux->jited_linfo[prog->aux->linfo_idx];
> > > > +
> > > > + insn_start = linfo[0].insn_off;
> > > > + insn_end = insn_start + len;
> > > > +
> > > > + for (int i = 0; i < prog->aux->nr_linfo &&
> > > > + linfo[i].insn_off >= insn_start && linfo[i].insn_off < insn_end; i++) {
> > > > + if (jited_linfo[i] >= (void *)ip)
> > > > + break;
> > > > + idx = i;
> > > > + }
> > > > +
> > > > + if (idx == -1)
> > > > + return -ENOENT;
> > > > +
> > > > + /* Get base component of the file path. */
> > > > + *filep = btf_name_by_offset(btf, linfo[idx].file_name_off);
> > > > + if (!*filep)
> > > > + return -ENOENT;
> > > > + *filep = kbasename(*filep);
> > > > + /* Obtain the source line, and strip whitespace in prefix. */
> > > > + *linep = btf_name_by_offset(btf, linfo[idx].line_off);
> > > > + if (!*linep)
> > > > + return -ENOENT;
> > > > + while (isspace(**linep))
> > > > + *linep += 1;
> > >
> > > The check_btf_line() in the verifier does:
> > > if (!btf_name_by_offset(btf, linfo[i].line_off) ||
> > > !btf_name_by_offset(btf, linfo[i].file_name_off)) {
> > > verbose(env, "Invalid line_info[%u].line_off or .file_name_off\n", i);
> > > err = -EINVAL;
> > > goto err_free;
> > > }
> > >
> > > and later in the verifier we do:
> > > s = ltrim(btf_name_by_offset(btf, linfo->line_off));
> > > verbose(env, "%s", s); /* source code line */
> > >
> > > so please drop these two checks.
> >
> > There was a kernel crash in CI when these NULL checks were missing.
>
> The math before is probably wrong then.
> idx is invalid ?
> This needs more debugging.
I will take a look.
* Re: [PATCH bpf-next v2 06/11] bpf: Report may_goto timeout to BPF stderr
2025-05-24 1:18 ` [PATCH bpf-next v2 06/11] bpf: Report may_goto timeout " Kumar Kartikeya Dwivedi
@ 2025-06-02 22:27 ` Alexei Starovoitov
2025-06-03 1:55 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 43+ messages in thread
From: Alexei Starovoitov @ 2025-06-02 22:27 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Eduard Zingerman, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden,
Matt Bobrowski, kkd, Kernel Team
On Fri, May 23, 2025 at 6:19 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> Begin reporting may_goto timeouts to the BPF program's stderr stream.
> Make sure that we don't end up spamming too many errors if the
> program keeps failing repeatedly and filling up the stream, hence
> emit at most 512 error messages from the kernel for a given stream.
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
> include/linux/bpf.h | 21 ++++++++++++++-------
> kernel/bpf/core.c | 19 ++++++++++++++++++-
> kernel/bpf/stream.c | 5 +++++
> 3 files changed, 37 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index aab5ea17a329..3449a31e9f66 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1682,6 +1682,7 @@ struct bpf_prog_aux {
> struct rcu_head rcu;
> };
> struct bpf_stream stream[2];
> + atomic_t stream_error_cnt;
> };
>
> struct bpf_prog {
> @@ -3604,6 +3605,8 @@ void bpf_bprintf_cleanup(struct bpf_bprintf_data *data);
> int bpf_try_get_buffers(struct bpf_bprintf_buffers **bufs);
> void bpf_put_buffers(void);
>
> +#define BPF_PROG_STREAM_ERROR_CNT 512
> +
> void bpf_prog_stream_init(struct bpf_prog *prog);
> void bpf_prog_stream_free(struct bpf_prog *prog);
> int bpf_prog_stream_read(struct bpf_prog *prog, enum bpf_stream_id stream_id, void __user *buf, int len);
> @@ -3615,16 +3618,20 @@ int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
> enum bpf_stream_id stream_id);
> int bpf_stream_stage_dump_stack(struct bpf_stream_stage *ss);
>
> +bool bpf_prog_stream_error_limit(struct bpf_prog *prog);
> +
> #define bpf_stream_printk(...) bpf_stream_stage_printk(&__ss, __VA_ARGS__)
> #define bpf_stream_dump_stack() bpf_stream_stage_dump_stack(&__ss)
>
> -#define bpf_stream_stage(prog, stream_id, expr) \
> - ({ \
> - struct bpf_stream_stage __ss; \
> - bpf_stream_stage_init(&__ss); \
> - (expr); \
> - bpf_stream_stage_commit(&__ss, prog, stream_id); \
> - bpf_stream_stage_free(&__ss); \
> +#define bpf_stream_stage(prog, stream_id, expr) \
> + ({ \
> + struct bpf_stream_stage __ss; \
> + if (!bpf_prog_stream_error_limit(prog)) { \
> + bpf_stream_stage_init(&__ss); \
> + (expr); \
> + bpf_stream_stage_commit(&__ss, prog, stream_id); \
> + bpf_stream_stage_free(&__ss); \
> + } \
> })
I think this part can be in the macro in some prior patch
from the start to avoid the churn.
To me it's easier to review all key things about it in one go.
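The cap itself is simple to model standalone: an atomic counter bumped per report, suppressing further output once the budget is exhausted. A hedged sketch using C11 atomics with a small cap for demonstration (the patch uses `BPF_PROG_STREAM_ERROR_CNT` = 512 and kernel `atomic_t`; all names here are illustrative):

```c
#include <assert.h>
#include <stdatomic.h>

/* Model of bpf_prog_stream_error_limit(): returns nonzero once the
 * per-program error budget is exhausted, so callers skip staging. */
#define DEMO_STREAM_ERROR_CNT 4	/* patch uses 512 */

static _Atomic int demo_error_cnt;

static int demo_error_limit(void)
{
	/* fetch_add returns the previous value, so exactly
	 * DEMO_STREAM_ERROR_CNT reports get through. */
	return atomic_fetch_add(&demo_error_cnt, 1) >= DEMO_STREAM_ERROR_CNT;
}
```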
* Re: [PATCH bpf-next v2 07/11] bpf: Report rqspinlock deadlocks/timeout to BPF stderr
2025-05-24 1:18 ` [PATCH bpf-next v2 07/11] bpf: Report rqspinlock deadlocks/timeout " Kumar Kartikeya Dwivedi
@ 2025-06-02 22:32 ` Alexei Starovoitov
2025-06-03 2:06 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 43+ messages in thread
From: Alexei Starovoitov @ 2025-06-02 22:32 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
Barret Rhoden, Matt Bobrowski, kkd, Kernel Team
On Fri, May 23, 2025 at 6:19 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> Begin reporting rqspinlock deadlocks and timeouts to the BPF
> program's stderr.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
> kernel/bpf/rqspinlock.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c
> index 338305c8852c..888c8e2f9061 100644
> --- a/kernel/bpf/rqspinlock.c
> +++ b/kernel/bpf/rqspinlock.c
> @@ -666,6 +666,26 @@ EXPORT_SYMBOL_GPL(resilient_queued_spin_lock_slowpath);
>
> __bpf_kfunc_start_defs();
>
> +static void bpf_prog_report_rqspinlock_violation(const char *str, void *lock, bool irqsave)
> +{
> + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
> + struct bpf_prog *prog;
> +
> + prog = bpf_prog_find_from_stack();
> + if (!prog)
> + return;
> + bpf_stream_stage(prog, BPF_STDERR, ({
> + bpf_stream_printk("ERROR: %s for bpf_res_spin_lock%s\n", str, irqsave ? "_irqsave" : "");
> + bpf_stream_printk("Attempted lock = 0x%px\n", lock);
> + bpf_stream_printk("Total held locks = %d\n", rqh->cnt);
> + for (int i = 0; i < min(RES_NR_HELD, rqh->cnt); i++)
> + bpf_stream_printk("Held lock[%2d] = 0x%px\n", i, rqh->locks[i]);
> + bpf_stream_dump_stack();
> + }));
I agree with Eduard that hiding __ss is kinda meh.
How about the following:
struct bpf_stream_stage ss;
bpf_stream_stage(ss, prog, BPF_STDERR, ({
bpf_stream_printk(ss, "ERROR: ...);
...
bpf_stream_dump_stack(ss);
}));
* Re: [PATCH bpf-next v2 06/11] bpf: Report may_goto timeout to BPF stderr
2025-06-02 22:27 ` Alexei Starovoitov
@ 2025-06-03 1:55 ` Kumar Kartikeya Dwivedi
0 siblings, 0 replies; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-06-03 1:55 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Eduard Zingerman, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Emil Tsalapatis, Barret Rhoden,
Matt Bobrowski, kkd, Kernel Team
On Tue, 3 Jun 2025 at 00:28, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, May 23, 2025 at 6:19 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > Begin reporting may_goto timeouts to the BPF program's stderr stream.
> > Make sure that we don't end up spamming too many errors if the
> > program keeps failing repeatedly and filling up the stream, hence
> > emit at most 512 error messages from the kernel for a given stream.
> >
> > Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> > include/linux/bpf.h | 21 ++++++++++++++-------
> > kernel/bpf/core.c | 19 ++++++++++++++++++-
> > kernel/bpf/stream.c | 5 +++++
> > 3 files changed, 37 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index aab5ea17a329..3449a31e9f66 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1682,6 +1682,7 @@ struct bpf_prog_aux {
> > struct rcu_head rcu;
> > };
> > struct bpf_stream stream[2];
> > + atomic_t stream_error_cnt;
> > };
> >
> > struct bpf_prog {
> > @@ -3604,6 +3605,8 @@ void bpf_bprintf_cleanup(struct bpf_bprintf_data *data);
> > int bpf_try_get_buffers(struct bpf_bprintf_buffers **bufs);
> > void bpf_put_buffers(void);
> >
> > +#define BPF_PROG_STREAM_ERROR_CNT 512
> > +
> > void bpf_prog_stream_init(struct bpf_prog *prog);
> > void bpf_prog_stream_free(struct bpf_prog *prog);
> > int bpf_prog_stream_read(struct bpf_prog *prog, enum bpf_stream_id stream_id, void __user *buf, int len);
> > @@ -3615,16 +3618,20 @@ int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
> > enum bpf_stream_id stream_id);
> > int bpf_stream_stage_dump_stack(struct bpf_stream_stage *ss);
> >
> > +bool bpf_prog_stream_error_limit(struct bpf_prog *prog);
> > +
> > #define bpf_stream_printk(...) bpf_stream_stage_printk(&__ss, __VA_ARGS__)
> > #define bpf_stream_dump_stack() bpf_stream_stage_dump_stack(&__ss)
> >
> > -#define bpf_stream_stage(prog, stream_id, expr) \
> > - ({ \
> > - struct bpf_stream_stage __ss; \
> > - bpf_stream_stage_init(&__ss); \
> > - (expr); \
> > - bpf_stream_stage_commit(&__ss, prog, stream_id); \
> > - bpf_stream_stage_free(&__ss); \
> > +#define bpf_stream_stage(prog, stream_id, expr) \
> > + ({ \
> > + struct bpf_stream_stage __ss; \
> > + if (!bpf_prog_stream_error_limit(prog)) { \
> > + bpf_stream_stage_init(&__ss); \
> > + (expr); \
> > + bpf_stream_stage_commit(&__ss, prog, stream_id); \
> > + bpf_stream_stage_free(&__ss); \
> > + } \
> > })
>
> I think this part can be in the macro in some prior patch
> from the start to avoid the churn.
> To me it's easier to review all key things about it in one go.
Ok, will move it.
* Re: [PATCH bpf-next v2 07/11] bpf: Report rqspinlock deadlocks/timeout to BPF stderr
2025-06-02 22:32 ` Alexei Starovoitov
@ 2025-06-03 2:06 ` Kumar Kartikeya Dwivedi
0 siblings, 0 replies; 43+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-06-03 2:06 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
Barret Rhoden, Matt Bobrowski, kkd, Kernel Team
On Tue, 3 Jun 2025 at 00:33, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, May 23, 2025 at 6:19 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > Begin reporting rqspinlock deadlocks and timeouts to the BPF
> > program's stderr.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> > kernel/bpf/rqspinlock.c | 22 ++++++++++++++++++++++
> > 1 file changed, 22 insertions(+)
> >
> > diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c
> > index 338305c8852c..888c8e2f9061 100644
> > --- a/kernel/bpf/rqspinlock.c
> > +++ b/kernel/bpf/rqspinlock.c
> > @@ -666,6 +666,26 @@ EXPORT_SYMBOL_GPL(resilient_queued_spin_lock_slowpath);
> >
> > __bpf_kfunc_start_defs();
> >
> > +static void bpf_prog_report_rqspinlock_violation(const char *str, void *lock, bool irqsave)
> > +{
> > + struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
> > + struct bpf_prog *prog;
> > +
> > + prog = bpf_prog_find_from_stack();
> > + if (!prog)
> > + return;
> > + bpf_stream_stage(prog, BPF_STDERR, ({
> > + bpf_stream_printk("ERROR: %s for bpf_res_spin_lock%s\n", str, irqsave ? "_irqsave" : "");
> > + bpf_stream_printk("Attempted lock = 0x%px\n", lock);
> > + bpf_stream_printk("Total held locks = %d\n", rqh->cnt);
> > + for (int i = 0; i < min(RES_NR_HELD, rqh->cnt); i++)
> > + bpf_stream_printk("Held lock[%2d] = 0x%px\n", i, rqh->locks[i]);
> > + bpf_stream_dump_stack();
> > + }));
>
> I agree with Eduard that hiding __ss is kinda meh.
> How about the following:
>
> struct bpf_stream_stage ss;
>
> bpf_stream_stage(ss, prog, BPF_STDERR, ({
> bpf_stream_printk(ss, "ERROR: ...);
> ...
> bpf_stream_dump_stack(ss);
> }));
Ok.
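A minimal userspace model of the explicit-ss shape suggested above, where the stage object is declared by the caller and passed into every macro instead of being a hidden `__ss` local. The buffer-backed stage and all names are purely illustrative, not the proposed kernel implementation (this relies on GNU statement expressions, as the kernel macros do):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stage: messages accumulate in a flat buffer instead of
 * a stream element list. */
struct demo_stage {
	char buf[256];
	int len;
};

#define demo_stage_printk(ss, fmt, ...) \
	((ss).len += snprintf((ss).buf + (ss).len, \
			      sizeof((ss).buf) - (ss).len, fmt, ##__VA_ARGS__))

/* The caller-visible stage object is threaded through explicitly. */
#define demo_stream_stage(ss, expr) \
	({ \
		(ss).len = 0; \
		(expr); \
		(ss).buf; \
	})

static const char *demo_report(void)
{
	static struct demo_stage ss;

	return demo_stream_stage(ss, ({
		demo_stage_printk(ss, "ERROR: %s\n", "deadlock");
		demo_stage_printk(ss, "Total held locks = %d\n", 1);
	}));
}
```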
* Re: [PATCH bpf-next v2 09/11] libbpf: Introduce bpf_prog_stream_read() API
2025-05-24 1:18 ` [PATCH bpf-next v2 09/11] libbpf: Introduce bpf_prog_stream_read() API Kumar Kartikeya Dwivedi
2025-05-28 18:02 ` Eduard Zingerman
@ 2025-06-05 20:06 ` Andrii Nakryiko
1 sibling, 0 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2025-06-05 20:06 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Emil Tsalapatis,
Barret Rhoden, Matt Bobrowski, kkd, kernel-team
On Fri, May 23, 2025 at 6:19 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> Introduce a libbpf API so that users can read data from a given BPF
> stream for a BPF prog fd. For now, only the low-level syscall wrapper
> is provided; we can add a bpf_program__* accessor as a follow-up if
> needed.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
> tools/lib/bpf/bpf.c | 16 ++++++++++++++++
> tools/lib/bpf/bpf.h | 15 +++++++++++++++
> tools/lib/bpf/libbpf.map | 1 +
> 3 files changed, 32 insertions(+)
>
LGTM from libbpf side of things, but please see nits and questions below.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index a9c3e33d0f8a..4b9f3a2096c8 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -1331,3 +1331,19 @@ int bpf_token_create(int bpffs_fd, struct bpf_token_create_opts *opts)
> fd = sys_bpf_fd(BPF_TOKEN_CREATE, &attr, attr_sz);
> return libbpf_err_errno(fd);
> }
> +
> +int bpf_prog_stream_read(int prog_fd, __u32 stream_id, void *stream_buf, __u32 stream_buf_len)
nit: I'd use shorter and no less clear "buf" and "buf_len". If
anything, the buffer itself isn't really "a stream buffer", it's just
a buffer into which we copy stream data.
> +{
> + const size_t attr_sz = offsetofend(union bpf_attr, prog_stream_read);
> + union bpf_attr attr;
> + int err;
> +
> + memset(&attr, 0, attr_sz);
> + attr.prog_stream_read.stream_buf = ptr_to_u64(stream_buf);
> + attr.prog_stream_read.stream_buf_len = stream_buf_len;
> + attr.prog_stream_read.stream_id = stream_id;
> + attr.prog_stream_read.prog_fd = prog_fd;
> +
> + err = sys_bpf(BPF_PROG_STREAM_READ_BY_FD, &attr, attr_sz);
> + return libbpf_err_errno(err);
> +}
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index 777627d33d25..9fa7be3d92d3 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -704,6 +704,21 @@ struct bpf_token_create_opts {
> LIBBPF_API int bpf_token_create(int bpffs_fd,
> struct bpf_token_create_opts *opts);
>
> +/**
> + * @brief **bpf_prog_stream_read** reads data from the BPF stream of a given BPF
> + * program.
> + *
> + * @param prog_fd FD for the BPF program whose BPF stream is to be read.
> + * @param stream_id ID of the BPF stream to be read.
Is it documented anywhere that 1 is BPF_STDOUT and 2 is BPF_STDERR?
That seems like an important user-visible thing, no? In patch #1 these
constants are kernel-internal, is that intentional? If yes, how are
users going to know which value to pass here?
> + * @param stream_buf Buffer to read data into from the BPF stream.
> + * @param stream_buf_len Maximum number of bytes to read from the BPF stream.
> + *
> + * @return The number of bytes read, on success; negative error code, otherwise
> + * (errno is also set to the error code)
> + */
> +LIBBPF_API int bpf_prog_stream_read(int prog_fd, __u32 stream_id, void *stream_buf,
> + __u32 stream_buf_len);
Do you anticipate adding some extra parameters, flags, etc? Should we
add empty opts struct from the very beginning instead of later having
bpf_prog_stream_read() and bpf_prog_stream_read_opts() APIs?
> +
> #ifdef __cplusplus
> } /* extern "C" */
> #endif
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 1205f9a4fe04..4359527c8442 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -437,6 +437,7 @@ LIBBPF_1.6.0 {
> bpf_linker__add_fd;
> bpf_linker__new_fd;
> bpf_object__prepare;
> + bpf_prog_stream_read;
> bpf_program__func_info;
> bpf_program__func_info_cnt;
> bpf_program__line_info;
> --
> 2.47.1
>
end of thread, other threads: [~2025-06-05 20:07 UTC | newest]
Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-05-24 1:18 [PATCH bpf-next v2 00/11] BPF Standard Streams Kumar Kartikeya Dwivedi
2025-05-24 1:18 ` [PATCH bpf-next v2 01/11] bpf: Introduce BPF standard streams Kumar Kartikeya Dwivedi
2025-05-27 23:47 ` Eduard Zingerman
2025-05-27 23:55 ` Kumar Kartikeya Dwivedi
2025-05-28 0:23 ` Eduard Zingerman
2025-05-28 0:30 ` Kumar Kartikeya Dwivedi
2025-06-02 20:07 ` Alexei Starovoitov
2025-06-02 20:31 ` Kumar Kartikeya Dwivedi
2025-05-24 1:18 ` [PATCH bpf-next v2 02/11] bpf: Add function to extract program source info Kumar Kartikeya Dwivedi
2025-06-02 20:18 ` Alexei Starovoitov
2025-06-02 20:21 ` Kumar Kartikeya Dwivedi
2025-06-02 20:25 ` Alexei Starovoitov
2025-06-02 20:31 ` Kumar Kartikeya Dwivedi
2025-05-24 1:18 ` [PATCH bpf-next v2 03/11] bpf: Add function to find program from stack trace Kumar Kartikeya Dwivedi
2025-05-24 1:18 ` [PATCH bpf-next v2 04/11] bpf: Hold RCU read lock in bpf_prog_ksym_find Kumar Kartikeya Dwivedi
2025-05-24 4:41 ` Kumar Kartikeya Dwivedi
2025-05-28 0:15 ` Eduard Zingerman
2025-05-28 0:27 ` Kumar Kartikeya Dwivedi
2025-05-28 0:33 ` Eduard Zingerman
2025-05-24 1:18 ` [PATCH bpf-next v2 05/11] bpf: Add dump_stack() analogue to print to BPF stderr Kumar Kartikeya Dwivedi
2025-05-28 0:45 ` Eduard Zingerman
2025-05-28 0:58 ` Kumar Kartikeya Dwivedi
2025-05-28 6:03 ` Eduard Zingerman
2025-05-24 1:18 ` [PATCH bpf-next v2 06/11] bpf: Report may_goto timeout " Kumar Kartikeya Dwivedi
2025-06-02 22:27 ` Alexei Starovoitov
2025-06-03 1:55 ` Kumar Kartikeya Dwivedi
2025-05-24 1:18 ` [PATCH bpf-next v2 07/11] bpf: Report rqspinlock deadlocks/timeout " Kumar Kartikeya Dwivedi
2025-06-02 22:32 ` Alexei Starovoitov
2025-06-03 2:06 ` Kumar Kartikeya Dwivedi
2025-05-24 1:18 ` [PATCH bpf-next v2 08/11] libbpf: Add bpf_stream_printk() macro Kumar Kartikeya Dwivedi
2025-05-28 6:12 ` Eduard Zingerman
2025-05-28 10:09 ` Kumar Kartikeya Dwivedi
2025-05-24 1:18 ` [PATCH bpf-next v2 09/11] libbpf: Introduce bpf_prog_stream_read() API Kumar Kartikeya Dwivedi
2025-05-28 18:02 ` Eduard Zingerman
2025-05-29 15:36 ` Kumar Kartikeya Dwivedi
2025-06-05 20:06 ` Andrii Nakryiko
2025-05-24 1:18 ` [PATCH bpf-next v2 10/11] bpftool: Add support for dumping streams Kumar Kartikeya Dwivedi
2025-05-27 10:20 ` Quentin Monnet
2025-05-27 14:51 ` Kumar Kartikeya Dwivedi
2025-05-24 1:18 ` [PATCH bpf-next v2 11/11] selftests/bpf: Add tests for prog streams Kumar Kartikeya Dwivedi
2025-05-28 17:04 ` Eduard Zingerman
2025-05-29 14:57 ` Kumar Kartikeya Dwivedi
2025-05-29 16:28 ` Eduard Zingerman