* [PATCH bpf-next v4 0/3] bpf: Report arena faults to BPF streams
@ 2025-08-27 15:37 Puranjay Mohan
2025-08-27 15:37 ` [PATCH bpf-next v4 1/3] bpf: arm64: simplify exception table handling Puranjay Mohan
` (3 more replies)
0 siblings, 4 replies; 22+ messages in thread
From: Puranjay Mohan @ 2025-08-27 15:37 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
Kumar Kartikeya Dwivedi, bpf
Changes in v3->v4:
v3: https://lore.kernel.org/all/20250827150113.15763-1-puranjay@kernel.org/
- Fixed a build issue when CONFIG_BPF_JIT=y and # CONFIG_BPF_SYSCALL is not set
Changes in v2->v3:
v2: https://lore.kernel.org/all/20250811111828.13836-1-puranjay@kernel.org/
- Improved the selftest to check the exact fault address
- Dropped BPF_NO_KFUNC_PROTOTYPES and bpf_arena_alloc/free_pages() usage
- Rebased on bpf-next/master
Changes in v1->v2:
v1: https://lore.kernel.org/all/20250806085847.18633-1-puranjay@kernel.org/
- Changed variable and mask names for consistency (Yonghong)
- Added Acked-by: Yonghong Song <yonghong.song@linux.dev> on two patches
This set adds the support of reporting page faults inside arena to BPF
stderr stream. The reported address is the one that a user would expect
to see if they pass it to bpf_printk();
Here is an example output from a stream and bpf_printk()
ERROR: Arena WRITE access at unmapped address 0xdeaddead0000
CPU: 9 UID: 0 PID: 502 Comm: test_progs
Call trace:
bpf_stream_stage_dump_stack+0xc0/0x150
bpf_prog_report_arena_violation+0x98/0xf0
ex_handler_bpf+0x5c/0x78
fixup_exception+0xf8/0x160
__do_kernel_fault+0x40/0x188
do_bad_area+0x70/0x88
do_translation_fault+0x54/0x98
do_mem_abort+0x4c/0xa8
el1_abort+0x44/0x70
el1h_64_sync_handler+0x50/0x108
el1h_64_sync+0x6c/0x70
bpf_prog_a64a9778d31b8e88_stream_arena_write_fault+0x84/0xc8
*(page) = 1; @ stream.c:100
bpf_prog_test_run_syscall+0x100/0x328
__sys_bpf+0x508/0xb98
__arm64_sys_bpf+0x2c/0x48
invoke_syscall+0x50/0x120
el0_svc_common.constprop.0+0x48/0xf8
do_el0_svc+0x28/0x40
el0_svc+0x48/0xf8
el0t_64_sync_handler+0xa0/0xe8
el0t_64_sync+0x198/0x1a0
Same address is seen by using bpf_printk():
1389.078831: bpf_trace_printk: Read Address: 0xdeaddead0000
To make this possible, some extra metadata has to be passed to the bpf
exception handler, so the bpf exception handling mechanism for both
x86-64 and arm64 have been improved in this set.
The streams selftest has been updated to also test this new feature.
Puranjay Mohan (3):
bpf: arm64: simplify exception table handling
bpf: Report arena faults to BPF stderr
selftests/bpf: Add tests for arena fault reporting
arch/arm64/net/bpf_jit_comp.c | 77 ++++++++++++------
arch/x86/net/bpf_jit_comp.c | 79 ++++++++++++++++++-
include/linux/bpf.h | 5 ++
kernel/bpf/arena.c | 20 +++++
.../testing/selftests/bpf/prog_tests/stream.c | 33 +++++++-
tools/testing/selftests/bpf/progs/stream.c | 39 +++++++++
6 files changed, 226 insertions(+), 27 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH bpf-next v4 1/3] bpf: arm64: simplify exception table handling
2025-08-27 15:37 [PATCH bpf-next v4 0/3] bpf: Report arena faults to BPF streams Puranjay Mohan
@ 2025-08-27 15:37 ` Puranjay Mohan
2025-08-28 0:19 ` Kumar Kartikeya Dwivedi
2025-08-29 10:06 ` Xu Kuohai
2025-08-27 15:37 ` [PATCH bpf-next v4 2/3] bpf: Report arena faults to BPF stderr Puranjay Mohan
` (2 subsequent siblings)
3 siblings, 2 replies; 22+ messages in thread
From: Puranjay Mohan @ 2025-08-27 15:37 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
Kumar Kartikeya Dwivedi, bpf
BPF loads with BPF_PROBE_MEM(SX) can load from unsafe pointers and the
JIT adds an exception table entry for the JITed instruction which allows
the exeption handler to set the destination register of the load to zero
and continue execution from the next instruction.
As all arm64 instructions are AARCH64_INSN_SIZE size, the exception
handler can just increment the pc by AARCH64_INSN_SIZE without needing
the exact address of the instruction following the the faulting
instruction.
Simplify the exception table usage in arm64 JIT by only saving the
destination register in ex->fixup and drop everything related to
the fixup_offset. The fault handler is modified to add AARCH64_INSN_SIZE
to the pc.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
---
arch/arm64/net/bpf_jit_comp.c | 25 +++----------------------
1 file changed, 3 insertions(+), 22 deletions(-)
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 52ffe115a8c47..42643fd9168fc 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1066,19 +1066,18 @@ static void build_epilogue(struct jit_ctx *ctx, bool was_classic)
emit(A64_RET(A64_LR), ctx);
}
-#define BPF_FIXUP_OFFSET_MASK GENMASK(26, 0)
#define BPF_FIXUP_REG_MASK GENMASK(31, 27)
#define DONT_CLEAR 5 /* Unused ARM64 register from BPF's POV */
bool ex_handler_bpf(const struct exception_table_entry *ex,
struct pt_regs *regs)
{
- off_t offset = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup);
int dst_reg = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup);
if (dst_reg != DONT_CLEAR)
regs->regs[dst_reg] = 0;
- regs->pc = (unsigned long)&ex->fixup - offset;
+ /* Skip the faulting instruction */
+ regs->pc += AARCH64_INSN_SIZE;
return true;
}
@@ -1088,7 +1087,6 @@ static int add_exception_handler(const struct bpf_insn *insn,
int dst_reg)
{
off_t ins_offset;
- off_t fixup_offset;
unsigned long pc;
struct exception_table_entry *ex;
@@ -1119,22 +1117,6 @@ static int add_exception_handler(const struct bpf_insn *insn,
if (WARN_ON_ONCE(ins_offset >= 0 || ins_offset < INT_MIN))
return -ERANGE;
- /*
- * Since the extable follows the program, the fixup offset is always
- * negative and limited to BPF_JIT_REGION_SIZE. Store a positive value
- * to keep things simple, and put the destination register in the upper
- * bits. We don't need to worry about buildtime or runtime sort
- * modifying the upper bits because the table is already sorted, and
- * isn't part of the main exception table.
- *
- * The fixup_offset is set to the next instruction from the instruction
- * that may fault. The execution will jump to this after handling the
- * fault.
- */
- fixup_offset = (long)&ex->fixup - (pc + AARCH64_INSN_SIZE);
- if (!FIELD_FIT(BPF_FIXUP_OFFSET_MASK, fixup_offset))
- return -ERANGE;
-
/*
* The offsets above have been calculated using the RO buffer but we
* need to use the R/W buffer for writes.
@@ -1147,8 +1129,7 @@ static int add_exception_handler(const struct bpf_insn *insn,
if (BPF_CLASS(insn->code) != BPF_LDX)
dst_reg = DONT_CLEAR;
- ex->fixup = FIELD_PREP(BPF_FIXUP_OFFSET_MASK, fixup_offset) |
- FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg);
+ ex->fixup = FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg);
ex->type = EX_TYPE_BPF;
--
2.47.3
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH bpf-next v4 2/3] bpf: Report arena faults to BPF stderr
2025-08-27 15:37 [PATCH bpf-next v4 0/3] bpf: Report arena faults to BPF streams Puranjay Mohan
2025-08-27 15:37 ` [PATCH bpf-next v4 1/3] bpf: arm64: simplify exception table handling Puranjay Mohan
@ 2025-08-27 15:37 ` Puranjay Mohan
2025-08-28 0:22 ` Kumar Kartikeya Dwivedi
2025-08-29 10:30 ` Xu Kuohai
2025-08-27 15:37 ` [PATCH bpf-next v4 3/3] selftests/bpf: Add tests for arena fault reporting Puranjay Mohan
2025-08-28 0:23 ` [PATCH bpf-next v4 0/3] bpf: Report arena faults to BPF streams Kumar Kartikeya Dwivedi
3 siblings, 2 replies; 22+ messages in thread
From: Puranjay Mohan @ 2025-08-27 15:37 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
Kumar Kartikeya Dwivedi, bpf
Begin reporting arena page faults and the faulting address to BPF
program's stderr, this patch adds support in the arm64 and x86-64 JITs,
support for other archs can be added later.
The fault handlers receive the 32 bit address in the arena region so
the upper 32 bits of user_vm_start is added to it before printing the
address. This is what the user would expect to see as this is what is
printed by bpf_printk() is you pass it an address returned by
bpf_arena_alloc_pages();
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
---
arch/arm64/net/bpf_jit_comp.c | 52 +++++++++++++++++++++++
arch/x86/net/bpf_jit_comp.c | 79 +++++++++++++++++++++++++++++++++--
include/linux/bpf.h | 5 +++
kernel/bpf/arena.c | 20 +++++++++
4 files changed, 152 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 42643fd9168fc..5083886d6e66b 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1066,6 +1066,30 @@ static void build_epilogue(struct jit_ctx *ctx, bool was_classic)
emit(A64_RET(A64_LR), ctx);
}
+/*
+ * Metadata encoding for exception handling in JITed code.
+ *
+ * Format of `fixup` field in `struct exception_table_entry`:
+ *
+ * Bit layout of `fixup` (32-bit):
+ *
+ * +-----------+--------+-----------+-----------+----------+
+ * | 31-27 | 26-22 | 21 | 20-16 | 15-0 |
+ * | | | | | |
+ * | FIXUP_REG | Unused | ARENA_ACC | ARENA_REG | OFFSET |
+ * +-----------+--------+-----------+-----------+----------+
+ *
+ * - OFFSET (16 bits): Offset used to compute address for Load/Store instruction.
+ * - ARENA_REG (5 bits): Register that is used to calculate the address for load/store when
+ * accessing the arena region.
+ * - ARENA_ACCESS (1 bit): This bit is set when the faulting instruction accessed the arena region.
+ * - FIXUP_REG (5 bits): Destination register for the load instruction (cleared on fault) or set to
+ * DONT_CLEAR if it is a store instruction.
+ */
+
+#define BPF_FIXUP_OFFSET_MASK GENMASK(15, 0)
+#define BPF_FIXUP_ARENA_REG_MASK GENMASK(20, 16)
+#define BPF_ARENA_ACCESS BIT(21)
#define BPF_FIXUP_REG_MASK GENMASK(31, 27)
#define DONT_CLEAR 5 /* Unused ARM64 register from BPF's POV */
@@ -1073,11 +1097,22 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
struct pt_regs *regs)
{
int dst_reg = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup);
+ s16 off = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup);
+ int arena_reg = FIELD_GET(BPF_FIXUP_ARENA_REG_MASK, ex->fixup);
+ bool is_arena = !!(ex->fixup & BPF_ARENA_ACCESS);
+ bool is_write = (dst_reg == DONT_CLEAR);
+ unsigned long addr;
if (dst_reg != DONT_CLEAR)
regs->regs[dst_reg] = 0;
/* Skip the faulting instruction */
regs->pc += AARCH64_INSN_SIZE;
+
+ if (is_arena) {
+ addr = regs->regs[arena_reg] + off;
+ bpf_prog_report_arena_violation(is_write, addr);
+ }
+
return true;
}
@@ -1087,6 +1122,9 @@ static int add_exception_handler(const struct bpf_insn *insn,
int dst_reg)
{
off_t ins_offset;
+ s16 off = insn->off;
+ bool is_arena;
+ int arena_reg;
unsigned long pc;
struct exception_table_entry *ex;
@@ -1100,6 +1138,9 @@ static int add_exception_handler(const struct bpf_insn *insn,
BPF_MODE(insn->code) != BPF_PROBE_ATOMIC)
return 0;
+ is_arena = (BPF_MODE(insn->code) == BPF_PROBE_MEM32) ||
+ (BPF_MODE(insn->code) == BPF_PROBE_ATOMIC);
+
if (!ctx->prog->aux->extable ||
WARN_ON_ONCE(ctx->exentry_idx >= ctx->prog->aux->num_exentries))
return -EINVAL;
@@ -1131,6 +1172,17 @@ static int add_exception_handler(const struct bpf_insn *insn,
ex->fixup = FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg);
+ if (is_arena) {
+ ex->fixup |= BPF_ARENA_ACCESS;
+ if (BPF_CLASS(insn->code) == BPF_LDX)
+ arena_reg = bpf2a64[insn->src_reg];
+ else
+ arena_reg = bpf2a64[insn->dst_reg];
+
+ ex->fixup |= FIELD_PREP(BPF_FIXUP_OFFSET_MASK, off) |
+ FIELD_PREP(BPF_FIXUP_ARENA_REG_MASK, arena_reg);
+ }
+
ex->type = EX_TYPE_BPF;
ctx->exentry_idx++;
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 7e3fca1646203..b75dea55df5a2 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -8,6 +8,7 @@
#include <linux/netdevice.h>
#include <linux/filter.h>
#include <linux/if_vlan.h>
+#include <linux/bitfield.h>
#include <linux/bpf.h>
#include <linux/memory.h>
#include <linux/sort.h>
@@ -1388,16 +1389,67 @@ static int emit_atomic_ld_st_index(u8 **pprog, u32 atomic_op, u32 size,
return 0;
}
+/*
+ * Metadata encoding for exception handling in JITed code.
+ *
+ * Format of `fixup` and `data` fields in `struct exception_table_entry`:
+ *
+ * Bit layout of `fixup` (32-bit):
+ *
+ * +-----------+--------+-----------+---------+----------+
+ * | 31 | 30-24 | 23-16 | 15-8 | 7-0 |
+ * | | | | | |
+ * | ARENA_ACC | Unused | ARENA_REG | DST_REG | INSN_LEN |
+ * +-----------+--------+-----------+---------+----------+
+ *
+ * - INSN_LEN (8 bits): Length of faulting insn (max x86 insn = 15 bytes (fits in 8 bits)).
+ * - DST_REG (8 bits): Offset of dst_reg from reg2pt_regs[] (max offset = 112 (fits in 8 bits)).
+ * This is set to DONT_CLEAR if the insn is a store.
+ * - ARENA_REG (8 bits): Offset of the register that is used to calculate the
+ * address for load/store when accessing the arena region.
+ * - ARENA_ACCESS (1 bit): This bit is set when the faulting instruction accessed the arena region.
+ *
+ * Bit layout of `data` (32-bit):
+ *
+ * +--------------+--------+--------------+
+ * | 31-16 | 15-8 | 7-0 |
+ * | | | |
+ * | ARENA_OFFSET | Unused | EX_TYPE_BPF |
+ * +--------------+--------+--------------+
+ *
+ * - ARENA_OFFSET (16 bits): Offset used to calculate the address for load/store when
+ * accessing the arena region.
+ */
+
#define DONT_CLEAR 1
+#define FIXUP_INSN_LEN_MASK GENMASK(7, 0)
+#define FIXUP_REG_MASK GENMASK(15, 8)
+#define FIXUP_ARENA_REG_MASK GENMASK(23, 16)
+#define FIXUP_ARENA_ACCESS BIT(31)
+#define DATA_ARENA_OFFSET_MASK GENMASK(31, 16)
bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs)
{
- u32 reg = x->fixup >> 8;
+ u32 reg = FIELD_GET(FIXUP_REG_MASK, x->fixup);
+ u32 insn_len = FIELD_GET(FIXUP_INSN_LEN_MASK, x->fixup);
+ bool is_arena = !!(x->fixup & FIXUP_ARENA_ACCESS);
+ bool is_write = (reg == DONT_CLEAR);
+ unsigned long addr;
+ s16 off;
+ u32 arena_reg;
/* jump over faulting load and clear dest register */
if (reg != DONT_CLEAR)
*(unsigned long *)((void *)regs + reg) = 0;
- regs->ip += x->fixup & 0xff;
+ regs->ip += insn_len;
+
+ if (is_arena) {
+ arena_reg = FIELD_GET(FIXUP_ARENA_REG_MASK, x->fixup);
+ off = FIELD_GET(DATA_ARENA_OFFSET_MASK, x->data);
+ addr = *(unsigned long *)((void *)regs + arena_reg) + off;
+ bpf_prog_report_arena_violation(is_write, addr);
+ }
+
return true;
}
@@ -2070,6 +2122,8 @@ st: if (is_imm8(insn->off))
{
struct exception_table_entry *ex;
u8 *_insn = image + proglen + (start_of_ldx - temp);
+ u32 arena_reg, fixup_reg;
+ bool is_arena;
s64 delta;
if (!bpf_prog->aux->extable)
@@ -2089,8 +2143,25 @@ st: if (is_imm8(insn->off))
ex->data = EX_TYPE_BPF;
- ex->fixup = (prog - start_of_ldx) |
- ((BPF_CLASS(insn->code) == BPF_LDX ? reg2pt_regs[dst_reg] : DONT_CLEAR) << 8);
+ is_arena = (BPF_MODE(insn->code) == BPF_PROBE_MEM32) ||
+ (BPF_MODE(insn->code) == BPF_PROBE_ATOMIC);
+
+ fixup_reg = (BPF_CLASS(insn->code) == BPF_LDX) ?
+ reg2pt_regs[dst_reg] : DONT_CLEAR;
+
+ ex->fixup = FIELD_PREP(FIXUP_INSN_LEN_MASK, prog - start_of_ldx) |
+ FIELD_PREP(FIXUP_REG_MASK, fixup_reg);
+
+ if (is_arena) {
+ ex->fixup |= FIXUP_ARENA_ACCESS;
+ if (BPF_CLASS(insn->code) == BPF_LDX)
+ arena_reg = reg2pt_regs[src_reg];
+ else
+ arena_reg = reg2pt_regs[dst_reg];
+
+ ex->fixup |= FIELD_PREP(FIXUP_ARENA_REG_MASK, arena_reg);
+ ex->data |= FIELD_PREP(DATA_ARENA_OFFSET_MASK, insn->off);
+ }
}
break;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8f6e87f0f3a89..9959e30f805b2 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2013,6 +2013,7 @@ int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc,
struct bpf_verifier_log *log);
void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map);
void bpf_struct_ops_desc_release(struct bpf_struct_ops_desc *st_ops_desc);
+void bpf_prog_report_arena_violation(bool write, unsigned long addr);
#else
#define register_bpf_struct_ops(st_ops, type) ({ (void *)(st_ops); 0; })
static inline bool bpf_try_module_get(const void *data, struct module *owner)
@@ -2045,6 +2046,10 @@ static inline void bpf_struct_ops_desc_release(struct bpf_struct_ops_desc *st_op
{
}
+static inline void bpf_prog_report_arena_violation(bool write, unsigned long addr)
+{
+}
+
#endif
int bpf_prog_ctx_arg_info_init(struct bpf_prog *prog,
diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
index 5b37753799d20..a1653d1c04ca5 100644
--- a/kernel/bpf/arena.c
+++ b/kernel/bpf/arena.c
@@ -633,3 +633,23 @@ static int __init kfunc_init(void)
return register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &common_kfunc_set);
}
late_initcall(kfunc_init);
+
+void bpf_prog_report_arena_violation(bool write, unsigned long addr)
+{
+ struct bpf_stream_stage ss;
+ struct bpf_prog *prog;
+ u64 user_vm_start;
+
+ prog = bpf_prog_find_from_stack();
+ if (!prog)
+ return;
+
+ user_vm_start = bpf_arena_get_user_vm_start(prog->aux->arena);
+ addr += (user_vm_start >> 32) << 32;
+
+ bpf_stream_stage(ss, prog, BPF_STDERR, ({
+ bpf_stream_printk(ss, "ERROR: Arena %s access at unmapped address 0x%lx\n",
+ write ? "WRITE" : "READ", addr);
+ bpf_stream_dump_stack(ss);
+ }));
+}
--
2.47.3
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH bpf-next v4 3/3] selftests/bpf: Add tests for arena fault reporting
2025-08-27 15:37 [PATCH bpf-next v4 0/3] bpf: Report arena faults to BPF streams Puranjay Mohan
2025-08-27 15:37 ` [PATCH bpf-next v4 1/3] bpf: arm64: simplify exception table handling Puranjay Mohan
2025-08-27 15:37 ` [PATCH bpf-next v4 2/3] bpf: Report arena faults to BPF stderr Puranjay Mohan
@ 2025-08-27 15:37 ` Puranjay Mohan
2025-08-27 19:54 ` Yonghong Song
2025-08-28 0:23 ` [PATCH bpf-next v4 0/3] bpf: Report arena faults to BPF streams Kumar Kartikeya Dwivedi
3 siblings, 1 reply; 22+ messages in thread
From: Puranjay Mohan @ 2025-08-27 15:37 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Puranjay Mohan, Xu Kuohai, Catalin Marinas, Will Deacon,
Kumar Kartikeya Dwivedi, bpf
Add selftests for testing the reporting of arena page faults through BPF
streams. Two new bpf programs are added that read and write to an
unmapped arena address and the fault reporting is verified in the
userspace through streams.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
.../testing/selftests/bpf/prog_tests/stream.c | 33 +++++++++++++++-
tools/testing/selftests/bpf/progs/stream.c | 39 +++++++++++++++++++
2 files changed, 71 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/stream.c b/tools/testing/selftests/bpf/prog_tests/stream.c
index 36a1a1ebde692..8fdc83260ea14 100644
--- a/tools/testing/selftests/bpf/prog_tests/stream.c
+++ b/tools/testing/selftests/bpf/prog_tests/stream.c
@@ -41,6 +41,22 @@ struct {
"([a-zA-Z_][a-zA-Z0-9_]*\\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+\n"
"|[ \t]+[^\n]+\n)*",
},
+ {
+ offsetof(struct stream, progs.stream_arena_read_fault),
+ "ERROR: Arena READ access at unmapped address 0x.*\n"
+ "CPU: [0-9]+ UID: 0 PID: [0-9]+ Comm: .*\n"
+ "Call trace:\n"
+ "([a-zA-Z_][a-zA-Z0-9_]*\\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+\n"
+ "|[ \t]+[^\n]+\n)*",
+ },
+ {
+ offsetof(struct stream, progs.stream_arena_write_fault),
+ "ERROR: Arena WRITE access at unmapped address 0x.*\n"
+ "CPU: [0-9]+ UID: 0 PID: [0-9]+ Comm: .*\n"
+ "Call trace:\n"
+ "([a-zA-Z_][a-zA-Z0-9_]*\\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+\n"
+ "|[ \t]+[^\n]+\n)*",
+ },
};
static int match_regex(const char *pattern, const char *string)
@@ -63,6 +79,7 @@ void test_stream_errors(void)
struct stream *skel;
int ret, prog_fd;
char buf[1024];
+ char fault_addr[64] = {0};
skel = stream__open_and_load();
if (!ASSERT_OK_PTR(skel, "stream__open_and_load"))
@@ -85,6 +102,14 @@ void test_stream_errors(void)
continue;
}
#endif
+#if !defined(__x86_64__) && !defined(__aarch64__)
+ ASSERT_TRUE(1, "Arena fault reporting unsupported, skip.");
+ if (i == 2 || i == 3) {
+ ret = bpf_prog_stream_read(prog_fd, 2, buf, sizeof(buf), &ropts);
+ ASSERT_EQ(ret, 0, "stream read");
+ continue;
+ }
+#endif
ret = bpf_prog_stream_read(prog_fd, BPF_STREAM_STDERR, buf, sizeof(buf), &ropts);
ASSERT_GT(ret, 0, "stream read");
@@ -92,8 +117,14 @@ void test_stream_errors(void)
buf[ret] = '\0';
ret = match_regex(stream_error_arr[i].errstr, buf);
- if (!ASSERT_TRUE(ret == 1, "regex match"))
+ if (ret && (i == 2 || i == 3)) {
+ sprintf(fault_addr, "0x%lx", skel->bss->fault_addr);
+ ret = match_regex(fault_addr, buf);
+ }
+ if (!ASSERT_TRUE(ret == 1, "regex match")) {
fprintf(stderr, "Output from stream:\n%s\n", buf);
+ fprintf(stderr, "Fault Addr: 0x%lx\n", skel->bss->fault_addr);
+ }
}
stream__destroy(skel);
diff --git a/tools/testing/selftests/bpf/progs/stream.c b/tools/testing/selftests/bpf/progs/stream.c
index 35790897dc879..9de015ac3ced5 100644
--- a/tools/testing/selftests/bpf/progs/stream.c
+++ b/tools/testing/selftests/bpf/progs/stream.c
@@ -5,6 +5,7 @@
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"
#include "bpf_experimental.h"
+#include "bpf_arena_common.h"
struct arr_elem {
struct bpf_res_spin_lock lock;
@@ -17,10 +18,17 @@ struct {
__type(value, struct arr_elem);
} arrmap SEC(".maps");
+struct {
+ __uint(type, BPF_MAP_TYPE_ARENA);
+ __uint(map_flags, BPF_F_MMAPABLE);
+ __uint(max_entries, 1); /* number of pages */
+} arena SEC(".maps");
+
#define ENOSPC 28
#define _STR "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
int size;
+u64 fault_addr;
SEC("syscall")
__success __retval(0)
@@ -76,4 +84,35 @@ int stream_syscall(void *ctx)
return 0;
}
+SEC("syscall")
+__success __retval(0)
+int stream_arena_write_fault(void *ctx)
+{
+ struct bpf_arena *ptr = (void *)&arena;
+ u64 user_vm_start;
+
+ barrier_var(ptr);
+ user_vm_start = ptr->user_vm_start;
+
+ fault_addr = user_vm_start + 0xbeef;
+ *(u32 __arena *)(user_vm_start + 0xbeef) = 1;
+
+ return 0;
+}
+
+SEC("syscall")
+__success __retval(0)
+int stream_arena_read_fault(void *ctx)
+{
+ struct bpf_arena *ptr = (void *)&arena;
+ u64 user_vm_start;
+
+ barrier_var(ptr);
+ user_vm_start = ptr->user_vm_start;
+
+ fault_addr = user_vm_start + 0xbeef;
+
+ return *(u32 __arena *)(user_vm_start + 0xbeef);
+}
+
char _license[] SEC("license") = "GPL";
--
2.47.3
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v4 3/3] selftests/bpf: Add tests for arena fault reporting
2025-08-27 15:37 ` [PATCH bpf-next v4 3/3] selftests/bpf: Add tests for arena fault reporting Puranjay Mohan
@ 2025-08-27 19:54 ` Yonghong Song
2025-08-27 23:49 ` Kumar Kartikeya Dwivedi
2025-08-28 12:25 ` Puranjay Mohan
0 siblings, 2 replies; 22+ messages in thread
From: Yonghong Song @ 2025-08-27 19:54 UTC (permalink / raw)
To: Puranjay Mohan, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Xu Kuohai, Catalin Marinas, Will Deacon, Kumar Kartikeya Dwivedi,
bpf
On 8/27/25 8:37 AM, Puranjay Mohan wrote:
> Add selftests for testing the reporting of arena page faults through BPF
> streams. Two new bpf programs are added that read and write to an
> unmapped arena address and the fault reporting is verified in the
> userspace through streams.
>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---
> .../testing/selftests/bpf/prog_tests/stream.c | 33 +++++++++++++++-
> tools/testing/selftests/bpf/progs/stream.c | 39 +++++++++++++++++++
> 2 files changed, 71 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/stream.c b/tools/testing/selftests/bpf/prog_tests/stream.c
> index 36a1a1ebde692..8fdc83260ea14 100644
> --- a/tools/testing/selftests/bpf/prog_tests/stream.c
> +++ b/tools/testing/selftests/bpf/prog_tests/stream.c
> @@ -41,6 +41,22 @@ struct {
> "([a-zA-Z_][a-zA-Z0-9_]*\\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+\n"
> "|[ \t]+[^\n]+\n)*",
> },
> + {
> + offsetof(struct stream, progs.stream_arena_read_fault),
> + "ERROR: Arena READ access at unmapped address 0x.*\n"
> + "CPU: [0-9]+ UID: 0 PID: [0-9]+ Comm: .*\n"
> + "Call trace:\n"
> + "([a-zA-Z_][a-zA-Z0-9_]*\\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+\n"
> + "|[ \t]+[^\n]+\n)*",
> + },
> + {
> + offsetof(struct stream, progs.stream_arena_write_fault),
> + "ERROR: Arena WRITE access at unmapped address 0x.*\n"
> + "CPU: [0-9]+ UID: 0 PID: [0-9]+ Comm: .*\n"
> + "Call trace:\n"
> + "([a-zA-Z_][a-zA-Z0-9_]*\\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+\n"
> + "|[ \t]+[^\n]+\n)*",
> + },
> };
>
> static int match_regex(const char *pattern, const char *string)
> @@ -63,6 +79,7 @@ void test_stream_errors(void)
> struct stream *skel;
> int ret, prog_fd;
> char buf[1024];
> + char fault_addr[64] = {0};
You can replace '{0}' to '{}' so the whole array will be initialized.
>
> skel = stream__open_and_load();
> if (!ASSERT_OK_PTR(skel, "stream__open_and_load"))
> @@ -85,6 +102,14 @@ void test_stream_errors(void)
> continue;
> }
> #endif
> +#if !defined(__x86_64__) && !defined(__aarch64__)
> + ASSERT_TRUE(1, "Arena fault reporting unsupported, skip.");
> + if (i == 2 || i == 3) {
> + ret = bpf_prog_stream_read(prog_fd, 2, buf, sizeof(buf), &ropts);
> + ASSERT_EQ(ret, 0, "stream read");
> + continue;
> + }
> +#endif
>
> ret = bpf_prog_stream_read(prog_fd, BPF_STREAM_STDERR, buf, sizeof(buf), &ropts);
> ASSERT_GT(ret, 0, "stream read");
> @@ -92,8 +117,14 @@ void test_stream_errors(void)
> buf[ret] = '\0';
>
> ret = match_regex(stream_error_arr[i].errstr, buf);
> - if (!ASSERT_TRUE(ret == 1, "regex match"))
> + if (ret && (i == 2 || i == 3)) {
> + sprintf(fault_addr, "0x%lx", skel->bss->fault_addr);
> + ret = match_regex(fault_addr, buf);
> + }
> + if (!ASSERT_TRUE(ret == 1, "regex match")) {
> fprintf(stderr, "Output from stream:\n%s\n", buf);
> + fprintf(stderr, "Fault Addr: 0x%lx\n", skel->bss->fault_addr);
This will fault addr even for i == 0 or i == 1 and those address may be confusing
for test 0/1.
> + }
> }
>
> stream__destroy(skel);
> diff --git a/tools/testing/selftests/bpf/progs/stream.c b/tools/testing/selftests/bpf/progs/stream.c
> index 35790897dc879..9de015ac3ced5 100644
> --- a/tools/testing/selftests/bpf/progs/stream.c
> +++ b/tools/testing/selftests/bpf/progs/stream.c
> @@ -5,6 +5,7 @@
> #include <bpf/bpf_helpers.h>
> #include "bpf_misc.h"
> #include "bpf_experimental.h"
> +#include "bpf_arena_common.h"
>
> struct arr_elem {
> struct bpf_res_spin_lock lock;
> @@ -17,10 +18,17 @@ struct {
> __type(value, struct arr_elem);
> } arrmap SEC(".maps");
>
> +struct {
> + __uint(type, BPF_MAP_TYPE_ARENA);
> + __uint(map_flags, BPF_F_MMAPABLE);
> + __uint(max_entries, 1); /* number of pages */
> +} arena SEC(".maps");
> +
> #define ENOSPC 28
> #define _STR "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
>
> int size;
> +u64 fault_addr;
>
> SEC("syscall")
> __success __retval(0)
> @@ -76,4 +84,35 @@ int stream_syscall(void *ctx)
> return 0;
> }
>
> +SEC("syscall")
> +__success __retval(0)
> +int stream_arena_write_fault(void *ctx)
> +{
> + struct bpf_arena *ptr = (void *)&arena;
> + u64 user_vm_start;
> +
> + barrier_var(ptr);
Do we need this barrier_var()? I tried llvm20 and it works fine without the
above barrier_var().
> + user_vm_start = ptr->user_vm_start;
> +
Remove this line.
> + fault_addr = user_vm_start + 0xbeef;
> + *(u32 __arena *)(user_vm_start + 0xbeef) = 1;
Simplify to *(u32 __arena *)fault = 1;
> +
Remove this line.
> + return 0;
> +}
> +
> +SEC("syscall")
> +__success __retval(0)
> +int stream_arena_read_fault(void *ctx)
> +{
> + struct bpf_arena *ptr = (void *)&arena;
> + u64 user_vm_start;
> +
> + barrier_var(ptr);
Is this necessary?
> + user_vm_start = ptr->user_vm_start;
> +
Remove this line.
> + fault_addr = user_vm_start + 0xbeef;
> +
Remove this line.
> + return *(u32 __arena *)(user_vm_start + 0xbeef);
return*(u32 __arena *)fault_addr.
> +}
> +
> char _license[] SEC("license") = "GPL";
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v4 3/3] selftests/bpf: Add tests for arena fault reporting
2025-08-27 19:54 ` Yonghong Song
@ 2025-08-27 23:49 ` Kumar Kartikeya Dwivedi
2025-08-28 12:25 ` Puranjay Mohan
1 sibling, 0 replies; 22+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-08-27 23:49 UTC (permalink / raw)
To: Yonghong Song
Cc: Puranjay Mohan, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Xu Kuohai, Catalin Marinas, Will Deacon, bpf
On Wed, 27 Aug 2025 at 21:55, Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
>
> On 8/27/25 8:37 AM, Puranjay Mohan wrote:
> > Add selftests for testing the reporting of arena page faults through BPF
> > streams. Two new bpf programs are added that read and write to an
> > unmapped arena address and the fault reporting is verified in the
> > userspace through streams.
> >
> > Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> > ---
> > .../testing/selftests/bpf/prog_tests/stream.c | 33 +++++++++++++++-
> > tools/testing/selftests/bpf/progs/stream.c | 39 +++++++++++++++++++
> > 2 files changed, 71 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/stream.c b/tools/testing/selftests/bpf/prog_tests/stream.c
> > index 36a1a1ebde692..8fdc83260ea14 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/stream.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/stream.c
> > @@ -41,6 +41,22 @@ struct {
> > "([a-zA-Z_][a-zA-Z0-9_]*\\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+\n"
> > "|[ \t]+[^\n]+\n)*",
> > },
> > + {
> > + offsetof(struct stream, progs.stream_arena_read_fault),
> > + "ERROR: Arena READ access at unmapped address 0x.*\n"
> > + "CPU: [0-9]+ UID: 0 PID: [0-9]+ Comm: .*\n"
> > + "Call trace:\n"
> > + "([a-zA-Z_][a-zA-Z0-9_]*\\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+\n"
> > + "|[ \t]+[^\n]+\n)*",
> > + },
> > + {
> > + offsetof(struct stream, progs.stream_arena_write_fault),
> > + "ERROR: Arena WRITE access at unmapped address 0x.*\n"
> > + "CPU: [0-9]+ UID: 0 PID: [0-9]+ Comm: .*\n"
> > + "Call trace:\n"
> > + "([a-zA-Z_][a-zA-Z0-9_]*\\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+\n"
> > + "|[ \t]+[^\n]+\n)*",
> > + },
> > };
> >
> > static int match_regex(const char *pattern, const char *string)
> > @@ -63,6 +79,7 @@ void test_stream_errors(void)
> > struct stream *skel;
> > int ret, prog_fd;
> > char buf[1024];
> > + char fault_addr[64] = {0};
>
> You can replace '{0}' to '{}' so the whole array will be initialized.
>
> >
> > skel = stream__open_and_load();
> > if (!ASSERT_OK_PTR(skel, "stream__open_and_load"))
> > @@ -85,6 +102,14 @@ void test_stream_errors(void)
> > continue;
> > }
> > #endif
> > +#if !defined(__x86_64__) && !defined(__aarch64__)
> > + ASSERT_TRUE(1, "Arena fault reporting unsupported, skip.");
> > + if (i == 2 || i == 3) {
> > + ret = bpf_prog_stream_read(prog_fd, 2, buf, sizeof(buf), &ropts);
> > + ASSERT_EQ(ret, 0, "stream read");
> > + continue;
> > + }
> > +#endif
> >
> > ret = bpf_prog_stream_read(prog_fd, BPF_STREAM_STDERR, buf, sizeof(buf), &ropts);
> > ASSERT_GT(ret, 0, "stream read");
> > @@ -92,8 +117,14 @@ void test_stream_errors(void)
> > buf[ret] = '\0';
> >
> > ret = match_regex(stream_error_arr[i].errstr, buf);
> > - if (!ASSERT_TRUE(ret == 1, "regex match"))
> > + if (ret && (i == 2 || i == 3)) {
> > + sprintf(fault_addr, "0x%lx", skel->bss->fault_addr);
> > + ret = match_regex(fault_addr, buf);
> > + }
> > + if (!ASSERT_TRUE(ret == 1, "regex match")) {
> > fprintf(stderr, "Output from stream:\n%s\n", buf);
> > + fprintf(stderr, "Fault Addr: 0x%lx\n", skel->bss->fault_addr);
>
> This will fault addr even for i == 0 or i == 1 and those address may be confusing
> for test 0/1.
>
> > + }
> > }
> >
> > stream__destroy(skel);
> > diff --git a/tools/testing/selftests/bpf/progs/stream.c b/tools/testing/selftests/bpf/progs/stream.c
> > index 35790897dc879..9de015ac3ced5 100644
> > --- a/tools/testing/selftests/bpf/progs/stream.c
> > +++ b/tools/testing/selftests/bpf/progs/stream.c
> > @@ -5,6 +5,7 @@
> > #include <bpf/bpf_helpers.h>
> > #include "bpf_misc.h"
> > #include "bpf_experimental.h"
> > +#include "bpf_arena_common.h"
> >
> > struct arr_elem {
> > struct bpf_res_spin_lock lock;
> > @@ -17,10 +18,17 @@ struct {
> > __type(value, struct arr_elem);
> > } arrmap SEC(".maps");
> >
> > +struct {
> > + __uint(type, BPF_MAP_TYPE_ARENA);
> > + __uint(map_flags, BPF_F_MMAPABLE);
> > + __uint(max_entries, 1); /* number of pages */
> > +} arena SEC(".maps");
> > +
> > #define ENOSPC 28
> > #define _STR "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
> >
> > int size;
> > +u64 fault_addr;
> >
> > SEC("syscall")
> > __success __retval(0)
> > @@ -76,4 +84,35 @@ int stream_syscall(void *ctx)
> > return 0;
> > }
> >
> > +SEC("syscall")
> > +__success __retval(0)
> > +int stream_arena_write_fault(void *ctx)
> > +{
> > + struct bpf_arena *ptr = (void *)&arena;
> > + u64 user_vm_start;
> > +
> > + barrier_var(ptr);
>
> Do we need this barrier_var()? I tried llvm20 and it works fine without the
> above barrier_var().
Puranjay should add some context in the commit log or comments for
this, but I think the reason was that we found that GCC would diagnose
accesses into ptr assuming it points into the actual typeof(arena)
object defined in the C file (which is the right thing to do from a
language standpoint), but that obviously leads to misdiagnosis once we
access something in struct bpf_arena that goes out of bounds for this
type. So laundering the pointer through barrier_var gets rid of the
provenance information and the warning/error is gone.
But it would be better to add a comment since it's not obvious why.
>
> > + user_vm_start = ptr->user_vm_start;
> > +
>
> Remove this line.
>
> > + fault_addr = user_vm_start + 0xbeef;
> > + *(u32 __arena *)(user_vm_start + 0xbeef) = 1;
>
> Simplify to *(u32 __arena *)fault = 1;
>
> > +
>
> Remove this line.
>
> > + return 0;
> > +}
> > +
> > +SEC("syscall")
> > +__success __retval(0)
> > +int stream_arena_read_fault(void *ctx)
> > +{
> > + struct bpf_arena *ptr = (void *)&arena;
> > + u64 user_vm_start;
> > +
> > + barrier_var(ptr);
>
> Is this necessary?
>
> > + user_vm_start = ptr->user_vm_start;
> > +
>
> Remove this line.
>
> > + fault_addr = user_vm_start + 0xbeef;
> > +
>
> Remove this line.
>
> > + return *(u32 __arena *)(user_vm_start + 0xbeef);
>
> return*(u32 __arena *)fault_addr.
>
> > +}
> > +
> > char _license[] SEC("license") = "GPL";
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v4 1/3] bpf: arm64: simplify exception table handling
2025-08-27 15:37 ` [PATCH bpf-next v4 1/3] bpf: arm64: simplify exception table handling Puranjay Mohan
@ 2025-08-28 0:19 ` Kumar Kartikeya Dwivedi
2025-08-29 10:06 ` Xu Kuohai
1 sibling, 0 replies; 22+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-08-28 0:19 UTC (permalink / raw)
To: Puranjay Mohan
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Catalin Marinas, Will Deacon, bpf, Xu Kuohai
On Wed, 27 Aug 2025 at 17:37, Puranjay Mohan <puranjay@kernel.org> wrote:
>
> BPF loads with BPF_PROBE_MEM(SX) can load from unsafe pointers and the
> JIT adds an exception table entry for the JITed instruction which allows
> the exeption handler to set the destination register of the load to zero
> and continue execution from the next instruction.
>
> As all arm64 instructions are AARCH64_INSN_SIZE size, the exception
> handler can just increment the pc by AARCH64_INSN_SIZE without needing
> the exact address of the instruction following the the faulting
> instruction.
>
> Simplify the exception table usage in arm64 JIT by only saving the
> destination register in ex->fixup and drop everything related to
> the fixup_offset. The fault handler is modified to add AARCH64_INSN_SIZE
> to the pc.
>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> ---
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v4 2/3] bpf: Report arena faults to BPF stderr
2025-08-27 15:37 ` [PATCH bpf-next v4 2/3] bpf: Report arena faults to BPF stderr Puranjay Mohan
@ 2025-08-28 0:22 ` Kumar Kartikeya Dwivedi
2025-08-28 0:27 ` Kumar Kartikeya Dwivedi
2025-08-28 12:14 ` Puranjay Mohan
2025-08-29 10:30 ` Xu Kuohai
1 sibling, 2 replies; 22+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-08-28 0:22 UTC (permalink / raw)
To: Puranjay Mohan
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Xu Kuohai, Catalin Marinas, Will Deacon, bpf
On Wed, 27 Aug 2025 at 17:37, Puranjay Mohan <puranjay@kernel.org> wrote:
>
> Begin reporting arena page faults and the faulting address to BPF
> program's stderr, this patch adds support in the arm64 and x86-64 JITs,
> support for other archs can be added later.
>
> The fault handlers receive the 32 bit address in the arena region so
> the upper 32 bits of user_vm_start is added to it before printing the
> address. This is what the user would expect to see as this is what is
> printed by bpf_printk() is you pass it an address returned by
> bpf_arena_alloc_pages();
>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> ---
> arch/arm64/net/bpf_jit_comp.c | 52 +++++++++++++++++++++++
> arch/x86/net/bpf_jit_comp.c | 79 +++++++++++++++++++++++++++++++++--
> include/linux/bpf.h | 5 +++
> kernel/bpf/arena.c | 20 +++++++++
> 4 files changed, 152 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> index 42643fd9168fc..5083886d6e66b 100644
> --- a/arch/arm64/net/bpf_jit_comp.c
> +++ b/arch/arm64/net/bpf_jit_comp.c
> @@ -1066,6 +1066,30 @@ static void build_epilogue(struct jit_ctx *ctx, bool was_classic)
> emit(A64_RET(A64_LR), ctx);
> }
>
> +/*
> + * Metadata encoding for exception handling in JITed code.
> + *
> + * Format of `fixup` field in `struct exception_table_entry`:
> + *
> + * Bit layout of `fixup` (32-bit):
> + *
> + * +-----------+--------+-----------+-----------+----------+
> + * | 31-27 | 26-22 | 21 | 20-16 | 15-0 |
> + * | | | | | |
> + * | FIXUP_REG | Unused | ARENA_ACC | ARENA_REG | OFFSET |
> + * +-----------+--------+-----------+-----------+----------+
> + *
> + * - OFFSET (16 bits): Offset used to compute address for Load/Store instruction.
> + * - ARENA_REG (5 bits): Register that is used to calculate the address for load/store when
> + * accessing the arena region.
> + * - ARENA_ACCESS (1 bit): This bit is set when the faulting instruction accessed the arena region.
> + * - FIXUP_REG (5 bits): Destination register for the load instruction (cleared on fault) or set to
> + * DONT_CLEAR if it is a store instruction.
> + */
> +
> +#define BPF_FIXUP_OFFSET_MASK GENMASK(15, 0)
> +#define BPF_FIXUP_ARENA_REG_MASK GENMASK(20, 16)
> +#define BPF_ARENA_ACCESS BIT(21)
> #define BPF_FIXUP_REG_MASK GENMASK(31, 27)
> #define DONT_CLEAR 5 /* Unused ARM64 register from BPF's POV */
>
> @@ -1073,11 +1097,22 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
> struct pt_regs *regs)
> {
> int dst_reg = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup);
> + s16 off = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup);
> + int arena_reg = FIELD_GET(BPF_FIXUP_ARENA_REG_MASK, ex->fixup);
> + bool is_arena = !!(ex->fixup & BPF_ARENA_ACCESS);
> + bool is_write = (dst_reg == DONT_CLEAR);
> + unsigned long addr;
>
> if (dst_reg != DONT_CLEAR)
> regs->regs[dst_reg] = 0;
> /* Skip the faulting instruction */
> regs->pc += AARCH64_INSN_SIZE;
> +
> + if (is_arena) {
> + addr = regs->regs[arena_reg] + off;
What happens when arena_reg == dst_reg?
> + bpf_prog_report_arena_violation(is_write, addr);
> + }
> +
> return true;
> }
>
> @@ -1087,6 +1122,9 @@ static int add_exception_handler(const struct bpf_insn *insn,
> int dst_reg)
> {
> off_t ins_offset;
> + s16 off = insn->off;
> + bool is_arena;
> + int arena_reg;
> unsigned long pc;
> struct exception_table_entry *ex;
>
> @@ -1100,6 +1138,9 @@ static int add_exception_handler(const struct bpf_insn *insn,
> BPF_MODE(insn->code) != BPF_PROBE_ATOMIC)
> return 0;
>
> + is_arena = (BPF_MODE(insn->code) == BPF_PROBE_MEM32) ||
> + (BPF_MODE(insn->code) == BPF_PROBE_ATOMIC);
> +
> if (!ctx->prog->aux->extable ||
> WARN_ON_ONCE(ctx->exentry_idx >= ctx->prog->aux->num_exentries))
> return -EINVAL;
> @@ -1131,6 +1172,17 @@ static int add_exception_handler(const struct bpf_insn *insn,
>
> ex->fixup = FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg);
>
> + if (is_arena) {
> + ex->fixup |= BPF_ARENA_ACCESS;
> + if (BPF_CLASS(insn->code) == BPF_LDX)
> + arena_reg = bpf2a64[insn->src_reg];
> + else
> + arena_reg = bpf2a64[insn->dst_reg];
> +
> + ex->fixup |= FIELD_PREP(BPF_FIXUP_OFFSET_MASK, off) |
> + FIELD_PREP(BPF_FIXUP_ARENA_REG_MASK, arena_reg);
> + }
> +
> ex->type = EX_TYPE_BPF;
>
> ctx->exentry_idx++;
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 7e3fca1646203..b75dea55df5a2 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -8,6 +8,7 @@
> #include <linux/netdevice.h>
> #include <linux/filter.h>
> #include <linux/if_vlan.h>
> +#include <linux/bitfield.h>
> #include <linux/bpf.h>
> #include <linux/memory.h>
> #include <linux/sort.h>
> @@ -1388,16 +1389,67 @@ static int emit_atomic_ld_st_index(u8 **pprog, u32 atomic_op, u32 size,
> return 0;
> }
>
> +/*
> + * Metadata encoding for exception handling in JITed code.
> + *
> + * Format of `fixup` and `data` fields in `struct exception_table_entry`:
> + *
> + * Bit layout of `fixup` (32-bit):
> + *
> + * +-----------+--------+-----------+---------+----------+
> + * | 31 | 30-24 | 23-16 | 15-8 | 7-0 |
> + * | | | | | |
> + * | ARENA_ACC | Unused | ARENA_REG | DST_REG | INSN_LEN |
> + * +-----------+--------+-----------+---------+----------+
> + *
> + * - INSN_LEN (8 bits): Length of faulting insn (max x86 insn = 15 bytes (fits in 8 bits)).
> + * - DST_REG (8 bits): Offset of dst_reg from reg2pt_regs[] (max offset = 112 (fits in 8 bits)).
> + * This is set to DONT_CLEAR if the insn is a store.
> + * - ARENA_REG (8 bits): Offset of the register that is used to calculate the
> + * address for load/store when accessing the arena region.
> + * - ARENA_ACCESS (1 bit): This bit is set when the faulting instruction accessed the arena region.
> + *
> + * Bit layout of `data` (32-bit):
> + *
> + * +--------------+--------+--------------+
> + * | 31-16 | 15-8 | 7-0 |
> + * | | | |
> + * | ARENA_OFFSET | Unused | EX_TYPE_BPF |
> + * +--------------+--------+--------------+
> + *
> + * - ARENA_OFFSET (16 bits): Offset used to calculate the address for load/store when
> + * accessing the arena region.
> + */
> +
> #define DONT_CLEAR 1
> +#define FIXUP_INSN_LEN_MASK GENMASK(7, 0)
> +#define FIXUP_REG_MASK GENMASK(15, 8)
> +#define FIXUP_ARENA_REG_MASK GENMASK(23, 16)
> +#define FIXUP_ARENA_ACCESS BIT(31)
> +#define DATA_ARENA_OFFSET_MASK GENMASK(31, 16)
>
> bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs)
> {
> - u32 reg = x->fixup >> 8;
> + u32 reg = FIELD_GET(FIXUP_REG_MASK, x->fixup);
> + u32 insn_len = FIELD_GET(FIXUP_INSN_LEN_MASK, x->fixup);
> + bool is_arena = !!(x->fixup & FIXUP_ARENA_ACCESS);
> + bool is_write = (reg == DONT_CLEAR);
> + unsigned long addr;
> + s16 off;
> + u32 arena_reg;
>
> /* jump over faulting load and clear dest register */
> if (reg != DONT_CLEAR)
> *(unsigned long *)((void *)regs + reg) = 0;
> - regs->ip += x->fixup & 0xff;
> + regs->ip += insn_len;
> +
> + if (is_arena) {
> + arena_reg = FIELD_GET(FIXUP_ARENA_REG_MASK, x->fixup);
> + off = FIELD_GET(DATA_ARENA_OFFSET_MASK, x->data);
> + addr = *(unsigned long *)((void *)regs + arena_reg) + off;
Same question. I faintly remember I spent a few hours when I
implemented this, wondering why the reported address was always zeroed
out for x86 before realizing they can be the same.
It would be good to add a test for this condition.
And also, to work around this, the address needs to be captured before
the destination register is cleared.
> [...]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v4 0/3] bpf: Report arena faults to BPF streams
2025-08-27 15:37 [PATCH bpf-next v4 0/3] bpf: Report arena faults to BPF streams Puranjay Mohan
` (2 preceding siblings ...)
2025-08-27 15:37 ` [PATCH bpf-next v4 3/3] selftests/bpf: Add tests for arena fault reporting Puranjay Mohan
@ 2025-08-28 0:23 ` Kumar Kartikeya Dwivedi
2025-08-28 12:13 ` Puranjay Mohan
3 siblings, 1 reply; 22+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-08-28 0:23 UTC (permalink / raw)
To: Puranjay Mohan, Xu Kuohai
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Catalin Marinas, Will Deacon, bpf
On Wed, 27 Aug 2025 at 17:37, Puranjay Mohan <puranjay@kernel.org> wrote:
>
> Changes in v3->v4:
> v3: https://lore.kernel.org/all/20250827150113.15763-1-puranjay@kernel.org/
> - Fixed a build issue when CONFIG_BPF_JIT=y and # CONFIG_BPF_SYSCALL is not set
>
> Changes in v2->v3:
> v2: https://lore.kernel.org/all/20250811111828.13836-1-puranjay@kernel.org/
> - Improved the selftest to check the exact fault address
> - Dropped BPF_NO_KFUNC_PROTOTYPES and bpf_arena_alloc/free_pages() usage
> - Rebased on bpf-next/master
>
> Changes in v1->v2:
> v1: https://lore.kernel.org/all/20250806085847.18633-1-puranjay@kernel.org/
> - Changed variable and mask names for consistency (Yonghong)
> - Added Acked-by: Yonghong Song <yonghong.song@linux.dev> on two patches
>
> This set adds the support of reporting page faults inside arena to BPF
> stderr stream. The reported address is the one that a user would expect
> to see if they pass it to bpf_printk();
>
> Here is an example output from a stream and bpf_printk()
>
> ERROR: Arena WRITE access at unmapped address 0xdeaddead0000
> CPU: 9 UID: 0 PID: 502 Comm: test_progs
> Call trace:
> bpf_stream_stage_dump_stack+0xc0/0x150
> bpf_prog_report_arena_violation+0x98/0xf0
> ex_handler_bpf+0x5c/0x78
> fixup_exception+0xf8/0x160
> __do_kernel_fault+0x40/0x188
> do_bad_area+0x70/0x88
> do_translation_fault+0x54/0x98
> do_mem_abort+0x4c/0xa8
> el1_abort+0x44/0x70
> el1h_64_sync_handler+0x50/0x108
> el1h_64_sync+0x6c/0x70
> bpf_prog_a64a9778d31b8e88_stream_arena_write_fault+0x84/0xc8
> *(page) = 1; @ stream.c:100
> bpf_prog_test_run_syscall+0x100/0x328
> __sys_bpf+0x508/0xb98
> __arm64_sys_bpf+0x2c/0x48
> invoke_syscall+0x50/0x120
> el0_svc_common.constprop.0+0x48/0xf8
> do_el0_svc+0x28/0x40
> el0_svc+0x48/0xf8
> el0t_64_sync_handler+0xa0/0xe8
> el0t_64_sync+0x198/0x1a0
>
> Same address is seen by using bpf_printk():
>
> 1389.078831: bpf_trace_printk: Read Address: 0xdeaddead0000
>
> To make this possible, some extra metadata has to be passed to the bpf
> exception handler, so the bpf exception handling mechanism for both
> x86-64 and arm64 have been improved in this set.
>
> The streams selftest has been updated to also test this new feature.
We also need arm64 experts to take a look before we land, since you'll
respin anyway now.
Xu, could you please provide acks on the patches?
Thanks a lot.
>
> Puranjay Mohan (3):
> bpf: arm64: simplify exception table handling
> bpf: Report arena faults to BPF stderr
> selftests/bpf: Add tests for arena fault reporting
>
> arch/arm64/net/bpf_jit_comp.c | 77 ++++++++++++------
> arch/x86/net/bpf_jit_comp.c | 79 ++++++++++++++++++-
> include/linux/bpf.h | 5 ++
> kernel/bpf/arena.c | 20 +++++
> .../testing/selftests/bpf/prog_tests/stream.c | 33 +++++++-
> tools/testing/selftests/bpf/progs/stream.c | 39 +++++++++
> 6 files changed, 226 insertions(+), 27 deletions(-)
>
> --
> 2.47.3
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v4 2/3] bpf: Report arena faults to BPF stderr
2025-08-28 0:22 ` Kumar Kartikeya Dwivedi
@ 2025-08-28 0:27 ` Kumar Kartikeya Dwivedi
2025-08-28 12:14 ` Puranjay Mohan
1 sibling, 0 replies; 22+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-08-28 0:27 UTC (permalink / raw)
To: Puranjay Mohan
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Xu Kuohai, Catalin Marinas, Will Deacon, bpf
On Thu, 28 Aug 2025 at 02:22, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Wed, 27 Aug 2025 at 17:37, Puranjay Mohan <puranjay@kernel.org> wrote:
> >
> > Begin reporting arena page faults and the faulting address to BPF
> > program's stderr, this patch adds support in the arm64 and x86-64 JITs,
> > support for other archs can be added later.
> >
> > The fault handlers receive the 32 bit address in the arena region so
> > the upper 32 bits of user_vm_start is added to it before printing the
> > address. This is what the user would expect to see as this is what is
> > printed by bpf_printk() is you pass it an address returned by
> > bpf_arena_alloc_pages();
> >
> > Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> > Acked-by: Yonghong Song <yonghong.song@linux.dev>
> > ---
> > [...]
> >
> > bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs)
> > {
> > - u32 reg = x->fixup >> 8;
> > + u32 reg = FIELD_GET(FIXUP_REG_MASK, x->fixup);
> > + u32 insn_len = FIELD_GET(FIXUP_INSN_LEN_MASK, x->fixup);
> > + bool is_arena = !!(x->fixup & FIXUP_ARENA_ACCESS);
> > + bool is_write = (reg == DONT_CLEAR);
> > + unsigned long addr;
> > + s16 off;
> > + u32 arena_reg;
> >
> > /* jump over faulting load and clear dest register */
> > if (reg != DONT_CLEAR)
> > *(unsigned long *)((void *)regs + reg) = 0;
> > - regs->ip += x->fixup & 0xff;
> > + regs->ip += insn_len;
> > +
> > + if (is_arena) {
> > + arena_reg = FIELD_GET(FIXUP_ARENA_REG_MASK, x->fixup);
> > + off = FIELD_GET(DATA_ARENA_OFFSET_MASK, x->data);
> > + addr = *(unsigned long *)((void *)regs + arena_reg) + off;
>
> Same question. I faintly remember I spent a few hours when I
> implemented this, wondering why the reported address was always zeroed
> out for x86 before realizing they can be the same.
> It would be good to add a test for this condition.
> And also, to work around this, the address needs to be captured before
> the destination register is cleared.
To be clear, to have such a test, you'd want to write it in inline
assembly to make sure compiler shenanigans don't screw up things.
>
> > [...]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v4 0/3] bpf: Report arena faults to BPF streams
2025-08-28 0:23 ` [PATCH bpf-next v4 0/3] bpf: Report arena faults to BPF streams Kumar Kartikeya Dwivedi
@ 2025-08-28 12:13 ` Puranjay Mohan
0 siblings, 0 replies; 22+ messages in thread
From: Puranjay Mohan @ 2025-08-28 12:13 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi, Xu Kuohai
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Catalin Marinas, Will Deacon, bpf
Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
> On Wed, 27 Aug 2025 at 17:37, Puranjay Mohan <puranjay@kernel.org> wrote:
>>
>> Changes in v3->v4:
>> v3: https://lore.kernel.org/all/20250827150113.15763-1-puranjay@kernel.org/
>> - Fixed a build issue when CONFIG_BPF_JIT=y and # CONFIG_BPF_SYSCALL is not set
>>
>> Changes in v2->v3:
>> v2: https://lore.kernel.org/all/20250811111828.13836-1-puranjay@kernel.org/
>> - Improved the selftest to check the exact fault address
>> - Dropped BPF_NO_KFUNC_PROTOTYPES and bpf_arena_alloc/free_pages() usage
>> - Rebased on bpf-next/master
>>
>> Changes in v1->v2:
>> v1: https://lore.kernel.org/all/20250806085847.18633-1-puranjay@kernel.org/
>> - Changed variable and mask names for consistency (Yonghong)
>> - Added Acked-by: Yonghong Song <yonghong.song@linux.dev> on two patches
>>
>> This set adds the support of reporting page faults inside arena to BPF
>> stderr stream. The reported address is the one that a user would expect
>> to see if they pass it to bpf_printk();
>>
>> Here is an example output from a stream and bpf_printk()
>>
>> ERROR: Arena WRITE access at unmapped address 0xdeaddead0000
>> CPU: 9 UID: 0 PID: 502 Comm: test_progs
>> Call trace:
>> bpf_stream_stage_dump_stack+0xc0/0x150
>> bpf_prog_report_arena_violation+0x98/0xf0
>> ex_handler_bpf+0x5c/0x78
>> fixup_exception+0xf8/0x160
>> __do_kernel_fault+0x40/0x188
>> do_bad_area+0x70/0x88
>> do_translation_fault+0x54/0x98
>> do_mem_abort+0x4c/0xa8
>> el1_abort+0x44/0x70
>> el1h_64_sync_handler+0x50/0x108
>> el1h_64_sync+0x6c/0x70
>> bpf_prog_a64a9778d31b8e88_stream_arena_write_fault+0x84/0xc8
>> *(page) = 1; @ stream.c:100
>> bpf_prog_test_run_syscall+0x100/0x328
>> __sys_bpf+0x508/0xb98
>> __arm64_sys_bpf+0x2c/0x48
>> invoke_syscall+0x50/0x120
>> el0_svc_common.constprop.0+0x48/0xf8
>> do_el0_svc+0x28/0x40
>> el0_svc+0x48/0xf8
>> el0t_64_sync_handler+0xa0/0xe8
>> el0t_64_sync+0x198/0x1a0
>>
>> Same address is seen by using bpf_printk():
>>
>> 1389.078831: bpf_trace_printk: Read Address: 0xdeaddead0000
>>
>> To make this possible, some extra metadata has to be passed to the bpf
>> exception handler, so the bpf exception handling mechanism for both
>> x86-64 and arm64 have been improved in this set.
>>
>> The streams selftest has been updated to also test this new feature.
>
> We also need arm64 experts to take a look before we land, since you'll
> respin anyway now.
> Xu, could you please provide acks on the patches?
>
> Thanks a lot.
Thanks for your review.
I will wait for Xu's feedback before respining.
Thanks,
Puranjay
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v4 2/3] bpf: Report arena faults to BPF stderr
2025-08-28 0:22 ` Kumar Kartikeya Dwivedi
2025-08-28 0:27 ` Kumar Kartikeya Dwivedi
@ 2025-08-28 12:14 ` Puranjay Mohan
1 sibling, 0 replies; 22+ messages in thread
From: Puranjay Mohan @ 2025-08-28 12:14 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Xu Kuohai, Catalin Marinas, Will Deacon, bpf
Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
> On Wed, 27 Aug 2025 at 17:37, Puranjay Mohan <puranjay@kernel.org> wrote:
>>
>> Begin reporting arena page faults and the faulting address to BPF
>> program's stderr, this patch adds support in the arm64 and x86-64 JITs,
>> support for other archs can be added later.
>>
>> The fault handlers receive the 32 bit address in the arena region so
>> the upper 32 bits of user_vm_start is added to it before printing the
>> address. This is what the user would expect to see as this is what is
>> printed by bpf_printk() is you pass it an address returned by
>> bpf_arena_alloc_pages();
>>
>> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
>> Acked-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>> arch/arm64/net/bpf_jit_comp.c | 52 +++++++++++++++++++++++
>> arch/x86/net/bpf_jit_comp.c | 79 +++++++++++++++++++++++++++++++++--
>> include/linux/bpf.h | 5 +++
>> kernel/bpf/arena.c | 20 +++++++++
>> 4 files changed, 152 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
>> index 42643fd9168fc..5083886d6e66b 100644
>> --- a/arch/arm64/net/bpf_jit_comp.c
>> +++ b/arch/arm64/net/bpf_jit_comp.c
>> @@ -1066,6 +1066,30 @@ static void build_epilogue(struct jit_ctx *ctx, bool was_classic)
>> emit(A64_RET(A64_LR), ctx);
>> }
>>
>> +/*
>> + * Metadata encoding for exception handling in JITed code.
>> + *
>> + * Format of `fixup` field in `struct exception_table_entry`:
>> + *
>> + * Bit layout of `fixup` (32-bit):
>> + *
>> + * +-----------+--------+-----------+-----------+----------+
>> + * | 31-27 | 26-22 | 21 | 20-16 | 15-0 |
>> + * | | | | | |
>> + * | FIXUP_REG | Unused | ARENA_ACC | ARENA_REG | OFFSET |
>> + * +-----------+--------+-----------+-----------+----------+
>> + *
>> + * - OFFSET (16 bits): Offset used to compute address for Load/Store instruction.
>> + * - ARENA_REG (5 bits): Register that is used to calculate the address for load/store when
>> + * accessing the arena region.
>> + * - ARENA_ACCESS (1 bit): This bit is set when the faulting instruction accessed the arena region.
>> + * - FIXUP_REG (5 bits): Destination register for the load instruction (cleared on fault) or set to
>> + * DONT_CLEAR if it is a store instruction.
>> + */
>> +
>> +#define BPF_FIXUP_OFFSET_MASK GENMASK(15, 0)
>> +#define BPF_FIXUP_ARENA_REG_MASK GENMASK(20, 16)
>> +#define BPF_ARENA_ACCESS BIT(21)
>> #define BPF_FIXUP_REG_MASK GENMASK(31, 27)
>> #define DONT_CLEAR 5 /* Unused ARM64 register from BPF's POV */
>>
>> @@ -1073,11 +1097,22 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
>> struct pt_regs *regs)
>> {
>> int dst_reg = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup);
>> + s16 off = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup);
>> + int arena_reg = FIELD_GET(BPF_FIXUP_ARENA_REG_MASK, ex->fixup);
>> + bool is_arena = !!(ex->fixup & BPF_ARENA_ACCESS);
>> + bool is_write = (dst_reg == DONT_CLEAR);
>> + unsigned long addr;
>>
>> if (dst_reg != DONT_CLEAR)
>> regs->regs[dst_reg] = 0;
>> /* Skip the faulting instruction */
>> regs->pc += AARCH64_INSN_SIZE;
>> +
>> + if (is_arena) {
>> + addr = regs->regs[arena_reg] + off;
>
> What happens when arena_reg == dst_reg?
>
>> + bpf_prog_report_arena_violation(is_write, addr);
>> + }
>> +
>> return true;
>> }
>>
>> @@ -1087,6 +1122,9 @@ static int add_exception_handler(const struct bpf_insn *insn,
>> int dst_reg)
>> {
>> off_t ins_offset;
>> + s16 off = insn->off;
>> + bool is_arena;
>> + int arena_reg;
>> unsigned long pc;
>> struct exception_table_entry *ex;
>>
>> @@ -1100,6 +1138,9 @@ static int add_exception_handler(const struct bpf_insn *insn,
>> BPF_MODE(insn->code) != BPF_PROBE_ATOMIC)
>> return 0;
>>
>> + is_arena = (BPF_MODE(insn->code) == BPF_PROBE_MEM32) ||
>> + (BPF_MODE(insn->code) == BPF_PROBE_ATOMIC);
>> +
>> if (!ctx->prog->aux->extable ||
>> WARN_ON_ONCE(ctx->exentry_idx >= ctx->prog->aux->num_exentries))
>> return -EINVAL;
>> @@ -1131,6 +1172,17 @@ static int add_exception_handler(const struct bpf_insn *insn,
>>
>> ex->fixup = FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg);
>>
>> + if (is_arena) {
>> + ex->fixup |= BPF_ARENA_ACCESS;
>> + if (BPF_CLASS(insn->code) == BPF_LDX)
>> + arena_reg = bpf2a64[insn->src_reg];
>> + else
>> + arena_reg = bpf2a64[insn->dst_reg];
>> +
>> + ex->fixup |= FIELD_PREP(BPF_FIXUP_OFFSET_MASK, off) |
>> + FIELD_PREP(BPF_FIXUP_ARENA_REG_MASK, arena_reg);
>> + }
>> +
>> ex->type = EX_TYPE_BPF;
>>
>> ctx->exentry_idx++;
>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>> index 7e3fca1646203..b75dea55df5a2 100644
>> --- a/arch/x86/net/bpf_jit_comp.c
>> +++ b/arch/x86/net/bpf_jit_comp.c
>> @@ -8,6 +8,7 @@
>> #include <linux/netdevice.h>
>> #include <linux/filter.h>
>> #include <linux/if_vlan.h>
>> +#include <linux/bitfield.h>
>> #include <linux/bpf.h>
>> #include <linux/memory.h>
>> #include <linux/sort.h>
>> @@ -1388,16 +1389,67 @@ static int emit_atomic_ld_st_index(u8 **pprog, u32 atomic_op, u32 size,
>> return 0;
>> }
>>
>> +/*
>> + * Metadata encoding for exception handling in JITed code.
>> + *
>> + * Format of `fixup` and `data` fields in `struct exception_table_entry`:
>> + *
>> + * Bit layout of `fixup` (32-bit):
>> + *
>> + * +-----------+--------+-----------+---------+----------+
>> + * | 31 | 30-24 | 23-16 | 15-8 | 7-0 |
>> + * | | | | | |
>> + * | ARENA_ACC | Unused | ARENA_REG | DST_REG | INSN_LEN |
>> + * +-----------+--------+-----------+---------+----------+
>> + *
>> + * - INSN_LEN (8 bits): Length of faulting insn (max x86 insn = 15 bytes (fits in 8 bits)).
>> + * - DST_REG (8 bits): Offset of dst_reg from reg2pt_regs[] (max offset = 112 (fits in 8 bits)).
>> + * This is set to DONT_CLEAR if the insn is a store.
>> + * - ARENA_REG (8 bits): Offset of the register that is used to calculate the
>> + * address for load/store when accessing the arena region.
>> + * - ARENA_ACCESS (1 bit): This bit is set when the faulting instruction accessed the arena region.
>> + *
>> + * Bit layout of `data` (32-bit):
>> + *
>> + * +--------------+--------+--------------+
>> + * | 31-16 | 15-8 | 7-0 |
>> + * | | | |
>> + * | ARENA_OFFSET | Unused | EX_TYPE_BPF |
>> + * +--------------+--------+--------------+
>> + *
>> + * - ARENA_OFFSET (16 bits): Offset used to calculate the address for load/store when
>> + * accessing the arena region.
>> + */
>> +
>> #define DONT_CLEAR 1
>> +#define FIXUP_INSN_LEN_MASK GENMASK(7, 0)
>> +#define FIXUP_REG_MASK GENMASK(15, 8)
>> +#define FIXUP_ARENA_REG_MASK GENMASK(23, 16)
>> +#define FIXUP_ARENA_ACCESS BIT(31)
>> +#define DATA_ARENA_OFFSET_MASK GENMASK(31, 16)
>>
>> bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs)
>> {
>> - u32 reg = x->fixup >> 8;
>> + u32 reg = FIELD_GET(FIXUP_REG_MASK, x->fixup);
>> + u32 insn_len = FIELD_GET(FIXUP_INSN_LEN_MASK, x->fixup);
>> + bool is_arena = !!(x->fixup & FIXUP_ARENA_ACCESS);
>> + bool is_write = (reg == DONT_CLEAR);
>> + unsigned long addr;
>> + s16 off;
>> + u32 arena_reg;
>>
>> /* jump over faulting load and clear dest register */
>> if (reg != DONT_CLEAR)
>> *(unsigned long *)((void *)regs + reg) = 0;
>> - regs->ip += x->fixup & 0xff;
>> + regs->ip += insn_len;
>> +
>> + if (is_arena) {
>> + arena_reg = FIELD_GET(FIXUP_ARENA_REG_MASK, x->fixup);
>> + off = FIELD_GET(DATA_ARENA_OFFSET_MASK, x->data);
>> + addr = *(unsigned long *)((void *)regs + arena_reg) + off;
>
> Same question. I faintly remember I spent a few hours when I
> implemented this, wondering why the reported address was always zeroed
> out for x86 before realizing they can be the same.
> It would be good to add a test for this condition.
> And also, to work around this, the address needs to be captured before
> the destination register is cleared.
Thanks for catching this!
I will capture the address and then zero out the destination register
int the next version.
Thanks,
Puranjay
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v4 3/3] selftests/bpf: Add tests for arena fault reporting
2025-08-27 19:54 ` Yonghong Song
2025-08-27 23:49 ` Kumar Kartikeya Dwivedi
@ 2025-08-28 12:25 ` Puranjay Mohan
2025-08-28 15:44 ` Yonghong Song
1 sibling, 1 reply; 22+ messages in thread
From: Puranjay Mohan @ 2025-08-28 12:25 UTC (permalink / raw)
To: Yonghong Song, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Xu Kuohai, Catalin Marinas, Will Deacon, Kumar Kartikeya Dwivedi,
bpf
Yonghong Song <yonghong.song@linux.dev> writes:
> On 8/27/25 8:37 AM, Puranjay Mohan wrote:
>> Add selftests for testing the reporting of arena page faults through BPF
>> streams. Two new bpf programs are added that read and write to an
>> unmapped arena address and the fault reporting is verified in the
>> userspace through streams.
>>
>> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
>> ---
>> .../testing/selftests/bpf/prog_tests/stream.c | 33 +++++++++++++++-
>> tools/testing/selftests/bpf/progs/stream.c | 39 +++++++++++++++++++
>> 2 files changed, 71 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/testing/selftests/bpf/prog_tests/stream.c b/tools/testing/selftests/bpf/prog_tests/stream.c
>> index 36a1a1ebde692..8fdc83260ea14 100644
>> --- a/tools/testing/selftests/bpf/prog_tests/stream.c
>> +++ b/tools/testing/selftests/bpf/prog_tests/stream.c
>> @@ -41,6 +41,22 @@ struct {
>> "([a-zA-Z_][a-zA-Z0-9_]*\\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+\n"
>> "|[ \t]+[^\n]+\n)*",
>> },
>> + {
>> + offsetof(struct stream, progs.stream_arena_read_fault),
>> + "ERROR: Arena READ access at unmapped address 0x.*\n"
>> + "CPU: [0-9]+ UID: 0 PID: [0-9]+ Comm: .*\n"
>> + "Call trace:\n"
>> + "([a-zA-Z_][a-zA-Z0-9_]*\\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+\n"
>> + "|[ \t]+[^\n]+\n)*",
>> + },
>> + {
>> + offsetof(struct stream, progs.stream_arena_write_fault),
>> + "ERROR: Arena WRITE access at unmapped address 0x.*\n"
>> + "CPU: [0-9]+ UID: 0 PID: [0-9]+ Comm: .*\n"
>> + "Call trace:\n"
>> + "([a-zA-Z_][a-zA-Z0-9_]*\\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+\n"
>> + "|[ \t]+[^\n]+\n)*",
>> + },
>> };
>>
>> static int match_regex(const char *pattern, const char *string)
>> @@ -63,6 +79,7 @@ void test_stream_errors(void)
>> struct stream *skel;
>> int ret, prog_fd;
>> char buf[1024];
>> + char fault_addr[64] = {0};
>
> You can replace '{0}' to '{}' so the whole array will be initialized.
Ack!
>>
>> skel = stream__open_and_load();
>> if (!ASSERT_OK_PTR(skel, "stream__open_and_load"))
>> @@ -85,6 +102,14 @@ void test_stream_errors(void)
>> continue;
>> }
>> #endif
>> +#if !defined(__x86_64__) && !defined(__aarch64__)
>> + ASSERT_TRUE(1, "Arena fault reporting unsupported, skip.");
>> + if (i == 2 || i == 3) {
>> + ret = bpf_prog_stream_read(prog_fd, 2, buf, sizeof(buf), &ropts);
>> + ASSERT_EQ(ret, 0, "stream read");
>> + continue;
>> + }
>> +#endif
>>
>> ret = bpf_prog_stream_read(prog_fd, BPF_STREAM_STDERR, buf, sizeof(buf), &ropts);
>> ASSERT_GT(ret, 0, "stream read");
>> @@ -92,8 +117,14 @@ void test_stream_errors(void)
>> buf[ret] = '\0';
>>
>> ret = match_regex(stream_error_arr[i].errstr, buf);
>> - if (!ASSERT_TRUE(ret == 1, "regex match"))
>> + if (ret && (i == 2 || i == 3)) {
>> + sprintf(fault_addr, "0x%lx", skel->bss->fault_addr);
>> + ret = match_regex(fault_addr, buf);
>> + }
>> + if (!ASSERT_TRUE(ret == 1, "regex match")) {
>> fprintf(stderr, "Output from stream:\n%s\n", buf);
>> + fprintf(stderr, "Fault Addr: 0x%lx\n", skel->bss->fault_addr);
>
> This will fault addr even for i == 0 or i == 1 and those address may be confusing
> for test 0/1.
Will add a check before printing this.
>> + }
>> }
>>
>> stream__destroy(skel);
>> diff --git a/tools/testing/selftests/bpf/progs/stream.c b/tools/testing/selftests/bpf/progs/stream.c
>> index 35790897dc879..9de015ac3ced5 100644
>> --- a/tools/testing/selftests/bpf/progs/stream.c
>> +++ b/tools/testing/selftests/bpf/progs/stream.c
>> @@ -5,6 +5,7 @@
>> #include <bpf/bpf_helpers.h>
>> #include "bpf_misc.h"
>> #include "bpf_experimental.h"
>> +#include "bpf_arena_common.h"
>>
>> struct arr_elem {
>> struct bpf_res_spin_lock lock;
>> @@ -17,10 +18,17 @@ struct {
>> __type(value, struct arr_elem);
>> } arrmap SEC(".maps");
>>
>> +struct {
>> + __uint(type, BPF_MAP_TYPE_ARENA);
>> + __uint(map_flags, BPF_F_MMAPABLE);
>> + __uint(max_entries, 1); /* number of pages */
>> +} arena SEC(".maps");
>> +
>> #define ENOSPC 28
>> #define _STR "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
>>
>> int size;
>> +u64 fault_addr;
>>
>> SEC("syscall")
>> __success __retval(0)
>> @@ -76,4 +84,35 @@ int stream_syscall(void *ctx)
>> return 0;
>> }
>>
>> +SEC("syscall")
>> +__success __retval(0)
>> +int stream_arena_write_fault(void *ctx)
>> +{
>> + struct bpf_arena *ptr = (void *)&arena;
>> + u64 user_vm_start;
>> +
>> + barrier_var(ptr);
>
> Do we need this barrier_var()? I tried llvm20 and it works fine without the
> above barrier_var().
As kumar explained in his reply, this is for making it build with GCC,
without the barrier_var() GCC fails with:
progs/stream.c: In function ‘stream_arena_write_fault’:
progs/stream.c:91:76: error: array subscript ‘struct bpf_arena[0]’ is partly outside array bounds of ‘struct <anonymous>[1]’ [-Werror=array-bounds=]
91 | u64 user_vm_start = ((struct bpf_arena *)(uintptr_t)(void *)&arena)->user_vm_start;
| ^~
progs/stream.c:25:3: note: object ‘arena’ of size 24
25 | } arena SEC(".maps");
| ^~~~~
GCC-BPF [test_progs-bpf_gcc] struct_ops_refcounted_fail__tail_call.bpf.o
progs/stream.c: In function ‘stream_arena_read_fault’:
progs/stream.c:103:85: error: array subscript ‘struct bpf_arena[0]’ is partly outside array bounds of ‘struct <anonymous>[1]’ [-Werror=array-bounds=]
103 | volatile u64 user_vm_start = ((struct bpf_arena *)(uintptr_t)(void *)&arena)->user_vm_start;
| ^~
progs/stream.c:25:3: note: object ‘arena’ of size 24
25 | } arena SEC(".maps");
| ^~~~~
cc1: all warnings being treated as errors
>> + user_vm_start = ptr->user_vm_start;
>> +
>
> Remove this line.
>
>> + fault_addr = user_vm_start + 0xbeef;
>> + *(u32 __arena *)(user_vm_start + 0xbeef) = 1;
>
> Simplify to *(u32 __arena *)fault = 1;
I wanted it to compile to an instruction with *(u32 *)src + offset = 1, which
was naive of me to think but now I will rewrite this in inline assembly to
also test dst_reg == src_reg case which Kumar found out.
Thanks,
Puranjay
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v4 3/3] selftests/bpf: Add tests for arena fault reporting
2025-08-28 12:25 ` Puranjay Mohan
@ 2025-08-28 15:44 ` Yonghong Song
0 siblings, 0 replies; 22+ messages in thread
From: Yonghong Song @ 2025-08-28 15:44 UTC (permalink / raw)
To: Puranjay Mohan, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Xu Kuohai, Catalin Marinas, Will Deacon, Kumar Kartikeya Dwivedi,
bpf
On 8/28/25 5:25 AM, Puranjay Mohan wrote:
> Yonghong Song <yonghong.song@linux.dev> writes:
>
>> On 8/27/25 8:37 AM, Puranjay Mohan wrote:
>>> Add selftests for testing the reporting of arena page faults through BPF
>>> streams. Two new bpf programs are added that read and write to an
>>> unmapped arena address and the fault reporting is verified in the
>>> userspace through streams.
>>>
>>> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
>>> ---
>>> .../testing/selftests/bpf/prog_tests/stream.c | 33 +++++++++++++++-
>>> tools/testing/selftests/bpf/progs/stream.c | 39 +++++++++++++++++++
>>> 2 files changed, 71 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/testing/selftests/bpf/prog_tests/stream.c b/tools/testing/selftests/bpf/prog_tests/stream.c
>>> index 36a1a1ebde692..8fdc83260ea14 100644
>>> --- a/tools/testing/selftests/bpf/prog_tests/stream.c
>>> +++ b/tools/testing/selftests/bpf/prog_tests/stream.c
>>> @@ -41,6 +41,22 @@ struct {
>>> "([a-zA-Z_][a-zA-Z0-9_]*\\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+\n"
>>> "|[ \t]+[^\n]+\n)*",
>>> },
>>> + {
>>> + offsetof(struct stream, progs.stream_arena_read_fault),
>>> + "ERROR: Arena READ access at unmapped address 0x.*\n"
>>> + "CPU: [0-9]+ UID: 0 PID: [0-9]+ Comm: .*\n"
>>> + "Call trace:\n"
>>> + "([a-zA-Z_][a-zA-Z0-9_]*\\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+\n"
>>> + "|[ \t]+[^\n]+\n)*",
>>> + },
>>> + {
>>> + offsetof(struct stream, progs.stream_arena_write_fault),
>>> + "ERROR: Arena WRITE access at unmapped address 0x.*\n"
>>> + "CPU: [0-9]+ UID: 0 PID: [0-9]+ Comm: .*\n"
>>> + "Call trace:\n"
>>> + "([a-zA-Z_][a-zA-Z0-9_]*\\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+\n"
>>> + "|[ \t]+[^\n]+\n)*",
>>> + },
>>> };
>>>
>>> static int match_regex(const char *pattern, const char *string)
>>> @@ -63,6 +79,7 @@ void test_stream_errors(void)
>>> struct stream *skel;
>>> int ret, prog_fd;
>>> char buf[1024];
>>> + char fault_addr[64] = {0};
>> You can replace '{0}' to '{}' so the whole array will be initialized.
> Ack!
>
>>>
>>> skel = stream__open_and_load();
>>> if (!ASSERT_OK_PTR(skel, "stream__open_and_load"))
>>> @@ -85,6 +102,14 @@ void test_stream_errors(void)
>>> continue;
>>> }
>>> #endif
>>> +#if !defined(__x86_64__) && !defined(__aarch64__)
>>> + ASSERT_TRUE(1, "Arena fault reporting unsupported, skip.");
>>> + if (i == 2 || i == 3) {
>>> + ret = bpf_prog_stream_read(prog_fd, 2, buf, sizeof(buf), &ropts);
>>> + ASSERT_EQ(ret, 0, "stream read");
>>> + continue;
>>> + }
>>> +#endif
>>>
>>> ret = bpf_prog_stream_read(prog_fd, BPF_STREAM_STDERR, buf, sizeof(buf), &ropts);
>>> ASSERT_GT(ret, 0, "stream read");
>>> @@ -92,8 +117,14 @@ void test_stream_errors(void)
>>> buf[ret] = '\0';
>>>
>>> ret = match_regex(stream_error_arr[i].errstr, buf);
>>> - if (!ASSERT_TRUE(ret == 1, "regex match"))
>>> + if (ret && (i == 2 || i == 3)) {
>>> + sprintf(fault_addr, "0x%lx", skel->bss->fault_addr);
>>> + ret = match_regex(fault_addr, buf);
>>> + }
>>> + if (!ASSERT_TRUE(ret == 1, "regex match")) {
>>> fprintf(stderr, "Output from stream:\n%s\n", buf);
>>> + fprintf(stderr, "Fault Addr: 0x%lx\n", skel->bss->fault_addr);
>> This will fault addr even for i == 0 or i == 1 and those address may be confusing
>> for test 0/1.
> Will add a check before printing this.
>
>>> + }
>>> }
>>>
>>> stream__destroy(skel);
>>> diff --git a/tools/testing/selftests/bpf/progs/stream.c b/tools/testing/selftests/bpf/progs/stream.c
>>> index 35790897dc879..9de015ac3ced5 100644
>>> --- a/tools/testing/selftests/bpf/progs/stream.c
>>> +++ b/tools/testing/selftests/bpf/progs/stream.c
>>> @@ -5,6 +5,7 @@
>>> #include <bpf/bpf_helpers.h>
>>> #include "bpf_misc.h"
>>> #include "bpf_experimental.h"
>>> +#include "bpf_arena_common.h"
>>>
>>> struct arr_elem {
>>> struct bpf_res_spin_lock lock;
>>> @@ -17,10 +18,17 @@ struct {
>>> __type(value, struct arr_elem);
>>> } arrmap SEC(".maps");
>>>
>>> +struct {
>>> + __uint(type, BPF_MAP_TYPE_ARENA);
>>> + __uint(map_flags, BPF_F_MMAPABLE);
>>> + __uint(max_entries, 1); /* number of pages */
>>> +} arena SEC(".maps");
>>> +
>>> #define ENOSPC 28
>>> #define _STR "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
>>>
>>> int size;
>>> +u64 fault_addr;
>>>
>>> SEC("syscall")
>>> __success __retval(0)
>>> @@ -76,4 +84,35 @@ int stream_syscall(void *ctx)
>>> return 0;
>>> }
>>>
>>> +SEC("syscall")
>>> +__success __retval(0)
>>> +int stream_arena_write_fault(void *ctx)
>>> +{
>>> + struct bpf_arena *ptr = (void *)&arena;
>>> + u64 user_vm_start;
>>> +
>>> + barrier_var(ptr);
>> Do we need this barrier_var()? I tried llvm20 and it works fine without the
>> above barrier_var().
> As kumar explained in his reply, this is for making it build with GCC,
> without the barrier_var() GCC fails with:
>
> progs/stream.c: In function ‘stream_arena_write_fault’:
> progs/stream.c:91:76: error: array subscript ‘struct bpf_arena[0]’ is partly outside array bounds of ‘struct <anonymous>[1]’ [-Werror=array-bounds=]
> 91 | u64 user_vm_start = ((struct bpf_arena *)(uintptr_t)(void *)&arena)->user_vm_start;
> | ^~
> progs/stream.c:25:3: note: object ‘arena’ of size 24
> 25 | } arena SEC(".maps");
> | ^~~~~
> GCC-BPF [test_progs-bpf_gcc] struct_ops_refcounted_fail__tail_call.bpf.o
> progs/stream.c: In function ‘stream_arena_read_fault’:
> progs/stream.c:103:85: error: array subscript ‘struct bpf_arena[0]’ is partly outside array bounds of ‘struct <anonymous>[1]’ [-Werror=array-bounds=]
> 103 | volatile u64 user_vm_start = ((struct bpf_arena *)(uintptr_t)(void *)&arena)->user_vm_start;
> | ^~
> progs/stream.c:25:3: note: object ‘arena’ of size 24
> 25 | } arena SEC(".maps");
> | ^~~~~
> cc1: all warnings being treated as errors
It would be great if you can have the above in the commit message.
Please also put a small comments in the code such that
without barrier_var(), gcc will do some transformation like
u64 user_vm_start = ((struct bpf_arena *)(uintptr_t)(void *)&arena)->user_vm_start;
and this will cause struct out-of-bound access due to the casting
from smaller map struct to larger "struct bpf_arena".
clang does not have this issue.
>
>>> + user_vm_start = ptr->user_vm_start;
>>> +
>> Remove this line.
>>
>>> + fault_addr = user_vm_start + 0xbeef;
>>> + *(u32 __arena *)(user_vm_start + 0xbeef) = 1;
>> Simplify to *(u32 __arena *)fault = 1;
> I wanted it to compile to an instruction with *(u32 *)src + offset = 1, which
> was naive of me to think but now I will rewrite this in inline assembly to
> also test dst_reg == src_reg case which Kumar found out.
>
> Thanks,
> Puranjay
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v4 1/3] bpf: arm64: simplify exception table handling
2025-08-27 15:37 ` [PATCH bpf-next v4 1/3] bpf: arm64: simplify exception table handling Puranjay Mohan
2025-08-28 0:19 ` Kumar Kartikeya Dwivedi
@ 2025-08-29 10:06 ` Xu Kuohai
1 sibling, 0 replies; 22+ messages in thread
From: Xu Kuohai @ 2025-08-29 10:06 UTC (permalink / raw)
To: Puranjay Mohan, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Catalin Marinas, Will Deacon,
Kumar Kartikeya Dwivedi, bpf
On 8/27/2025 11:37 PM, Puranjay Mohan wrote:
> BPF loads with BPF_PROBE_MEM(SX) can load from unsafe pointers and the
> JIT adds an exception table entry for the JITed instruction which allows
> the exeption handler to set the destination register of the load to zero
> and continue execution from the next instruction.
>
> As all arm64 instructions are AARCH64_INSN_SIZE size, the exception
> handler can just increment the pc by AARCH64_INSN_SIZE without needing
> the exact address of the instruction following the the faulting
> instruction.
>
> Simplify the exception table usage in arm64 JIT by only saving the
> destination register in ex->fixup and drop everything related to
> the fixup_offset. The fault handler is modified to add AARCH64_INSN_SIZE
> to the pc.
>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> ---
> arch/arm64/net/bpf_jit_comp.c | 25 +++----------------------
> 1 file changed, 3 insertions(+), 22 deletions(-)
>
> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> index 52ffe115a8c47..42643fd9168fc 100644
> --- a/arch/arm64/net/bpf_jit_comp.c
> +++ b/arch/arm64/net/bpf_jit_comp.c
> @@ -1066,19 +1066,18 @@ static void build_epilogue(struct jit_ctx *ctx, bool was_classic)
> emit(A64_RET(A64_LR), ctx);
> }
>
> -#define BPF_FIXUP_OFFSET_MASK GENMASK(26, 0)
> #define BPF_FIXUP_REG_MASK GENMASK(31, 27)
> #define DONT_CLEAR 5 /* Unused ARM64 register from BPF's POV */
>
> bool ex_handler_bpf(const struct exception_table_entry *ex,
> struct pt_regs *regs)
> {
> - off_t offset = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup);
> int dst_reg = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup);
>
> if (dst_reg != DONT_CLEAR)
> regs->regs[dst_reg] = 0;
> - regs->pc = (unsigned long)&ex->fixup - offset;
> + /* Skip the faulting instruction */
> + regs->pc += AARCH64_INSN_SIZE;
> return true;
> }
>
> @@ -1088,7 +1087,6 @@ static int add_exception_handler(const struct bpf_insn *insn,
> int dst_reg)
> {
> off_t ins_offset;
> - off_t fixup_offset;
> unsigned long pc;
> struct exception_table_entry *ex;
>
> @@ -1119,22 +1117,6 @@ static int add_exception_handler(const struct bpf_insn *insn,
> if (WARN_ON_ONCE(ins_offset >= 0 || ins_offset < INT_MIN))
> return -ERANGE;
>
> - /*
> - * Since the extable follows the program, the fixup offset is always
> - * negative and limited to BPF_JIT_REGION_SIZE. Store a positive value
> - * to keep things simple, and put the destination register in the upper
> - * bits. We don't need to worry about buildtime or runtime sort
> - * modifying the upper bits because the table is already sorted, and
> - * isn't part of the main exception table.
> - *
> - * The fixup_offset is set to the next instruction from the instruction
> - * that may fault. The execution will jump to this after handling the
> - * fault.
> - */
> - fixup_offset = (long)&ex->fixup - (pc + AARCH64_INSN_SIZE);
> - if (!FIELD_FIT(BPF_FIXUP_OFFSET_MASK, fixup_offset))
> - return -ERANGE;
> -
> /*
> * The offsets above have been calculated using the RO buffer but we
> * need to use the R/W buffer for writes.
> @@ -1147,8 +1129,7 @@ static int add_exception_handler(const struct bpf_insn *insn,
> if (BPF_CLASS(insn->code) != BPF_LDX)
> dst_reg = DONT_CLEAR;
>
> - ex->fixup = FIELD_PREP(BPF_FIXUP_OFFSET_MASK, fixup_offset) |
> - FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg);
> + ex->fixup = FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg);
>
> ex->type = EX_TYPE_BPF;
>
Nice refactor, looks good to me
Acked-by: Xu Kuohai <xukuohai@huawei.com>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v4 2/3] bpf: Report arena faults to BPF stderr
2025-08-27 15:37 ` [PATCH bpf-next v4 2/3] bpf: Report arena faults to BPF stderr Puranjay Mohan
2025-08-28 0:22 ` Kumar Kartikeya Dwivedi
@ 2025-08-29 10:30 ` Xu Kuohai
2025-08-29 20:28 ` Alexei Starovoitov
1 sibling, 1 reply; 22+ messages in thread
From: Xu Kuohai @ 2025-08-29 10:30 UTC (permalink / raw)
To: Puranjay Mohan, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Catalin Marinas, Will Deacon,
Kumar Kartikeya Dwivedi, bpf
On 8/27/2025 11:37 PM, Puranjay Mohan wrote:
> Begin reporting arena page faults and the faulting address to BPF
> program's stderr, this patch adds support in the arm64 and x86-64 JITs,
> support for other archs can be added later.
>
> The fault handlers receive the 32 bit address in the arena region so
> the upper 32 bits of user_vm_start is added to it before printing the
> address. This is what the user would expect to see as this is what is
> printed by bpf_printk() is you pass it an address returned by
> bpf_arena_alloc_pages();
>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> ---
> arch/arm64/net/bpf_jit_comp.c | 52 +++++++++++++++++++++++
> arch/x86/net/bpf_jit_comp.c | 79 +++++++++++++++++++++++++++++++++--
> include/linux/bpf.h | 5 +++
> kernel/bpf/arena.c | 20 +++++++++
> 4 files changed, 152 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> index 42643fd9168fc..5083886d6e66b 100644
> --- a/arch/arm64/net/bpf_jit_comp.c
> +++ b/arch/arm64/net/bpf_jit_comp.c
> @@ -1066,6 +1066,30 @@ static void build_epilogue(struct jit_ctx *ctx, bool was_classic)
> emit(A64_RET(A64_LR), ctx);
> }
>
> +/*
> + * Metadata encoding for exception handling in JITed code.
> + *
> + * Format of `fixup` field in `struct exception_table_entry`:
> + *
> + * Bit layout of `fixup` (32-bit):
> + *
> + * +-----------+--------+-----------+-----------+----------+
> + * | 31-27 | 26-22 | 21 | 20-16 | 15-0 |
> + * | | | | | |
> + * | FIXUP_REG | Unused | ARENA_ACC | ARENA_REG | OFFSET |
> + * +-----------+--------+-----------+-----------+----------+
> + *
> + * - OFFSET (16 bits): Offset used to compute address for Load/Store instruction.
> + * - ARENA_REG (5 bits): Register that is used to calculate the address for load/store when
> + * accessing the arena region.
> + * - ARENA_ACCESS (1 bit): This bit is set when the faulting instruction accessed the arena region.
> + * - FIXUP_REG (5 bits): Destination register for the load instruction (cleared on fault) or set to
> + * DONT_CLEAR if it is a store instruction.
> + */
> +
> +#define BPF_FIXUP_OFFSET_MASK GENMASK(15, 0)
> +#define BPF_FIXUP_ARENA_REG_MASK GENMASK(20, 16)
> +#define BPF_ARENA_ACCESS BIT(21)
> #define BPF_FIXUP_REG_MASK GENMASK(31, 27)
> #define DONT_CLEAR 5 /* Unused ARM64 register from BPF's POV */
>
> @@ -1073,11 +1097,22 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
> struct pt_regs *regs)
> {
> int dst_reg = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup);
> + s16 off = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup);
> + int arena_reg = FIELD_GET(BPF_FIXUP_ARENA_REG_MASK, ex->fixup);
> + bool is_arena = !!(ex->fixup & BPF_ARENA_ACCESS);
> + bool is_write = (dst_reg == DONT_CLEAR);
> + unsigned long addr;
>
> if (dst_reg != DONT_CLEAR)
> regs->regs[dst_reg] = 0;
> /* Skip the faulting instruction */
> regs->pc += AARCH64_INSN_SIZE;
> +
> + if (is_arena) {
> + addr = regs->regs[arena_reg] + off;
> + bpf_prog_report_arena_violation(is_write, addr);
> + }
> +
> return true;
> }
>
> @@ -1087,6 +1122,9 @@ static int add_exception_handler(const struct bpf_insn *insn,
> int dst_reg)
> {
> off_t ins_offset;
> + s16 off = insn->off;
> + bool is_arena;
> + int arena_reg;
> unsigned long pc;
> struct exception_table_entry *ex;
>
> @@ -1100,6 +1138,9 @@ static int add_exception_handler(const struct bpf_insn *insn,
> BPF_MODE(insn->code) != BPF_PROBE_ATOMIC)
> return 0;
>
> + is_arena = (BPF_MODE(insn->code) == BPF_PROBE_MEM32) ||
> + (BPF_MODE(insn->code) == BPF_PROBE_ATOMIC);
> +
> if (!ctx->prog->aux->extable ||
> WARN_ON_ONCE(ctx->exentry_idx >= ctx->prog->aux->num_exentries))
> return -EINVAL;
> @@ -1131,6 +1172,17 @@ static int add_exception_handler(const struct bpf_insn *insn,
>
> ex->fixup = FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg);
>
> + if (is_arena) {
> + ex->fixup |= BPF_ARENA_ACCESS;
> + if (BPF_CLASS(insn->code) == BPF_LDX)
> + arena_reg = bpf2a64[insn->src_reg];
> + else
> + arena_reg = bpf2a64[insn->dst_reg];
> +
> + ex->fixup |= FIELD_PREP(BPF_FIXUP_OFFSET_MASK, off) |
> + FIELD_PREP(BPF_FIXUP_ARENA_REG_MASK, arena_reg);
> + }
> +
> ex->type = EX_TYPE_BPF;
>
> ctx->exentry_idx++;
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 7e3fca1646203..b75dea55df5a2 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -8,6 +8,7 @@
> #include <linux/netdevice.h>
> #include <linux/filter.h>
> #include <linux/if_vlan.h>
> +#include <linux/bitfield.h>
> #include <linux/bpf.h>
> #include <linux/memory.h>
> #include <linux/sort.h>
> @@ -1388,16 +1389,67 @@ static int emit_atomic_ld_st_index(u8 **pprog, u32 atomic_op, u32 size,
> return 0;
> }
>
> +/*
> + * Metadata encoding for exception handling in JITed code.
> + *
> + * Format of `fixup` and `data` fields in `struct exception_table_entry`:
> + *
> + * Bit layout of `fixup` (32-bit):
> + *
> + * +-----------+--------+-----------+---------+----------+
> + * | 31 | 30-24 | 23-16 | 15-8 | 7-0 |
> + * | | | | | |
> + * | ARENA_ACC | Unused | ARENA_REG | DST_REG | INSN_LEN |
> + * +-----------+--------+-----------+---------+----------+
> + *
> + * - INSN_LEN (8 bits): Length of faulting insn (max x86 insn = 15 bytes (fits in 8 bits)).
> + * - DST_REG (8 bits): Offset of dst_reg from reg2pt_regs[] (max offset = 112 (fits in 8 bits)).
> + * This is set to DONT_CLEAR if the insn is a store.
> + * - ARENA_REG (8 bits): Offset of the register that is used to calculate the
> + * address for load/store when accessing the arena region.
> + * - ARENA_ACCESS (1 bit): This bit is set when the faulting instruction accessed the arena region.
> + *
> + * Bit layout of `data` (32-bit):
> + *
> + * +--------------+--------+--------------+
> + * | 31-16 | 15-8 | 7-0 |
> + * | | | |
> + * | ARENA_OFFSET | Unused | EX_TYPE_BPF |
> + * +--------------+--------+--------------+
> + *
> + * - ARENA_OFFSET (16 bits): Offset used to calculate the address for load/store when
> + * accessing the arena region.
> + */
> +
> #define DONT_CLEAR 1
> +#define FIXUP_INSN_LEN_MASK GENMASK(7, 0)
> +#define FIXUP_REG_MASK GENMASK(15, 8)
> +#define FIXUP_ARENA_REG_MASK GENMASK(23, 16)
> +#define FIXUP_ARENA_ACCESS BIT(31)
> +#define DATA_ARENA_OFFSET_MASK GENMASK(31, 16)
>
> bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs)
> {
> - u32 reg = x->fixup >> 8;
> + u32 reg = FIELD_GET(FIXUP_REG_MASK, x->fixup);
> + u32 insn_len = FIELD_GET(FIXUP_INSN_LEN_MASK, x->fixup);
> + bool is_arena = !!(x->fixup & FIXUP_ARENA_ACCESS);
> + bool is_write = (reg == DONT_CLEAR);
> + unsigned long addr;
> + s16 off;
> + u32 arena_reg;
>
> /* jump over faulting load and clear dest register */
> if (reg != DONT_CLEAR)
> *(unsigned long *)((void *)regs + reg) = 0;
> - regs->ip += x->fixup & 0xff;
> + regs->ip += insn_len;
> +
> + if (is_arena) {
> + arena_reg = FIELD_GET(FIXUP_ARENA_REG_MASK, x->fixup);
> + off = FIELD_GET(DATA_ARENA_OFFSET_MASK, x->data);
> + addr = *(unsigned long *)((void *)regs + arena_reg) + off;
> + bpf_prog_report_arena_violation(is_write, addr);
> + }
> +
> return true;
> }
>
> @@ -2070,6 +2122,8 @@ st: if (is_imm8(insn->off))
> {
> struct exception_table_entry *ex;
> u8 *_insn = image + proglen + (start_of_ldx - temp);
> + u32 arena_reg, fixup_reg;
> + bool is_arena;
> s64 delta;
>
> if (!bpf_prog->aux->extable)
> @@ -2089,8 +2143,25 @@ st: if (is_imm8(insn->off))
>
> ex->data = EX_TYPE_BPF;
>
> - ex->fixup = (prog - start_of_ldx) |
> - ((BPF_CLASS(insn->code) == BPF_LDX ? reg2pt_regs[dst_reg] : DONT_CLEAR) << 8);
> + is_arena = (BPF_MODE(insn->code) == BPF_PROBE_MEM32) ||
> + (BPF_MODE(insn->code) == BPF_PROBE_ATOMIC);
> +
> + fixup_reg = (BPF_CLASS(insn->code) == BPF_LDX) ?
> + reg2pt_regs[dst_reg] : DONT_CLEAR;
> +
> + ex->fixup = FIELD_PREP(FIXUP_INSN_LEN_MASK, prog - start_of_ldx) |
> + FIELD_PREP(FIXUP_REG_MASK, fixup_reg);
> +
> + if (is_arena) {
> + ex->fixup |= FIXUP_ARENA_ACCESS;
> + if (BPF_CLASS(insn->code) == BPF_LDX)
> + arena_reg = reg2pt_regs[src_reg];
> + else
> + arena_reg = reg2pt_regs[dst_reg];
> +
> + ex->fixup |= FIELD_PREP(FIXUP_ARENA_REG_MASK, arena_reg);
> + ex->data |= FIELD_PREP(DATA_ARENA_OFFSET_MASK, insn->off);
> + }
> }
> break;
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 8f6e87f0f3a89..9959e30f805b2 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2013,6 +2013,7 @@ int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc,
> struct bpf_verifier_log *log);
> void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map);
> void bpf_struct_ops_desc_release(struct bpf_struct_ops_desc *st_ops_desc);
> +void bpf_prog_report_arena_violation(bool write, unsigned long addr);
> #else
> #define register_bpf_struct_ops(st_ops, type) ({ (void *)(st_ops); 0; })
> static inline bool bpf_try_module_get(const void *data, struct module *owner)
> @@ -2045,6 +2046,10 @@ static inline void bpf_struct_ops_desc_release(struct bpf_struct_ops_desc *st_op
> {
> }
>
> +static inline void bpf_prog_report_arena_violation(bool write, unsigned long addr)
> +{
> +}
> +
> #endif
>
> int bpf_prog_ctx_arg_info_init(struct bpf_prog *prog,
> diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
> index 5b37753799d20..a1653d1c04ca5 100644
> --- a/kernel/bpf/arena.c
> +++ b/kernel/bpf/arena.c
> @@ -633,3 +633,23 @@ static int __init kfunc_init(void)
> return register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &common_kfunc_set);
> }
> late_initcall(kfunc_init);
> +
> +void bpf_prog_report_arena_violation(bool write, unsigned long addr)
> +{
> + struct bpf_stream_stage ss;
> + struct bpf_prog *prog;
> + u64 user_vm_start;
> +
> + prog = bpf_prog_find_from_stack();
bpf_prog_find_from_stack depends on arch_bpf_stack_walk, which isn't available
on all archs. How about switching to bpf_prog_ksym_find with the fault pc?
> + if (!prog)
> + return;
> +
> + user_vm_start = bpf_arena_get_user_vm_start(prog->aux->arena);
> + addr += (user_vm_start >> 32) << 32;
> +
> + bpf_stream_stage(ss, prog, BPF_STDERR, ({
> + bpf_stream_printk(ss, "ERROR: Arena %s access at unmapped address 0x%lx\n",
> + write ? "WRITE" : "READ", addr);
> + bpf_stream_dump_stack(ss);
> + }));
> +}
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v4 2/3] bpf: Report arena faults to BPF stderr
2025-08-29 10:30 ` Xu Kuohai
@ 2025-08-29 20:28 ` Alexei Starovoitov
2025-09-01 13:34 ` Puranjay Mohan
0 siblings, 1 reply; 22+ messages in thread
From: Alexei Starovoitov @ 2025-08-29 20:28 UTC (permalink / raw)
To: Xu Kuohai
Cc: Puranjay Mohan, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Catalin Marinas, Will Deacon,
Kumar Kartikeya Dwivedi, bpf
On Fri, Aug 29, 2025 at 3:30 AM Xu Kuohai <xukuohai@huaweicloud.com> wrote:
>
> > +
> > +void bpf_prog_report_arena_violation(bool write, unsigned long addr)
> > +{
> > + struct bpf_stream_stage ss;
> > + struct bpf_prog *prog;
> > + u64 user_vm_start;
> > +
> > + prog = bpf_prog_find_from_stack();
>
> bpf_prog_find_from_stack depends on arch_bpf_stack_walk, which isn't available
> on all archs. How about switching to bpf_prog_ksym_find with the fault pc?
Out of archs that support bpf arena only riscv doesn't
support arch_bpf_stack_walk(), which is probably fixable.
But I agree that direct bpf_prog_ksym_find() is cleaner here.
We need to make sure it works for subprogs, since streams[2] are
valid only for main prog.
I think we can add:
struct bpf_prog_aux {
...
struct bpf_prog_aux *main_prog;
};
init it during jit_subprogs() and use it for stream access.
We can also remove skipping of subprogs in find_from_stack_cb() then.
Kumar, wdyt?
In a bigger follow up maybe we can split bpf_prog_aux into two
structures for main prog and for subprogs, since we copy
a bunch of pointers from main into subprogs.
With 'main_prog' pointer all these pointers (like linfo, func_info,
kfunc_tab, arena, btf) can stay in 'struct bpf_main_prog_aux',
so 'struct bpf_subprog_aux' can be smaller.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v4 2/3] bpf: Report arena faults to BPF stderr
2025-08-29 20:28 ` Alexei Starovoitov
@ 2025-09-01 13:34 ` Puranjay Mohan
2025-09-01 16:39 ` Alexei Starovoitov
0 siblings, 1 reply; 22+ messages in thread
From: Puranjay Mohan @ 2025-09-01 13:34 UTC (permalink / raw)
To: Alexei Starovoitov, Xu Kuohai
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Catalin Marinas, Will Deacon, Kumar Kartikeya Dwivedi, bpf
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Fri, Aug 29, 2025 at 3:30 AM Xu Kuohai <xukuohai@huaweicloud.com> wrote:
>>
>> > +
>> > +void bpf_prog_report_arena_violation(bool write, unsigned long addr)
>> > +{
>> > + struct bpf_stream_stage ss;
>> > + struct bpf_prog *prog;
>> > + u64 user_vm_start;
>> > +
>> > + prog = bpf_prog_find_from_stack();
>>
>> bpf_prog_find_from_stack depends on arch_bpf_stack_walk, which isn't available
>> on all archs. How about switching to bpf_prog_ksym_find with the fault pc?
>
> Out of archs that support bpf arena only riscv doesn't
> support arch_bpf_stack_walk(), which is probably fixable.
> But I agree that direct bpf_prog_ksym_find() is cleaner here.
> We need to make sure it works for subprogs, since streams[2] are
> valid only for main prog.
> I think we can add:
> struct bpf_prog_aux {
> ...
> struct bpf_prog_aux *main_prog;
> };
> init it during jit_subprogs() and use it for stream access.
> We can also remove skipping of subprogs in find_from_stack_cb() then.
>
> Kumar, wdyt?
So, IIUC, after adding struct bpf_prog_aux *main_prog_aux in struct
bpf_prog_aux,
We can do in bpf_prog_alloc_no_stats():
fp->aux->main_prog_aux = aux;
and in jit_subprogs():
func[i]->aux->main_prog_aux = prog->aux;
and then all users of bpf_stream_get() can do
bpf_stream_get(stream_id, prog->aux->main_prog_aux);
with above we can allow find_from_stack_cb() to return subprogs.
and bpf_prog_ksym_find() can be used in
bpf_prog_report_arena_violation() without any other changes.
Thanks,
Puranjay
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v4 2/3] bpf: Report arena faults to BPF stderr
2025-09-01 13:34 ` Puranjay Mohan
@ 2025-09-01 16:39 ` Alexei Starovoitov
2025-09-01 19:22 ` Puranjay Mohan
0 siblings, 1 reply; 22+ messages in thread
From: Alexei Starovoitov @ 2025-09-01 16:39 UTC (permalink / raw)
To: Puranjay Mohan
Cc: Xu Kuohai, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Catalin Marinas, Will Deacon, Kumar Kartikeya Dwivedi, bpf
On Mon, Sep 1, 2025 at 6:34 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Fri, Aug 29, 2025 at 3:30 AM Xu Kuohai <xukuohai@huaweicloud.com> wrote:
> >>
> >> > +
> >> > +void bpf_prog_report_arena_violation(bool write, unsigned long addr)
> >> > +{
> >> > + struct bpf_stream_stage ss;
> >> > + struct bpf_prog *prog;
> >> > + u64 user_vm_start;
> >> > +
> >> > + prog = bpf_prog_find_from_stack();
> >>
> >> bpf_prog_find_from_stack depends on arch_bpf_stack_walk, which isn't available
> >> on all archs. How about switching to bpf_prog_ksym_find with the fault pc?
> >
> > Out of archs that support bpf arena only riscv doesn't
> > support arch_bpf_stack_walk(), which is probably fixable.
> > But I agree that direct bpf_prog_ksym_find() is cleaner here.
> > We need to make sure it works for subprogs, since streams[2] are
> > valid only for main prog.
> > I think we can add:
> > struct bpf_prog_aux {
> > ...
> > struct bpf_prog_aux *main_prog;
> > };
> > init it during jit_subprogs() and use it for stream access.
> > We can also remove skipping of subprogs in find_from_stack_cb() then.
> >
> > Kumar, wdyt?
>
> So, IIUC, after adding struct bpf_prog_aux *main_prog_aux in struct
> bpf_prog_aux,
>
> We can do in bpf_prog_alloc_no_stats():
> fp->aux->main_prog_aux = aux;
>
> and in jit_subprogs():
> func[i]->aux->main_prog_aux = prog->aux;
>
> and then all users of bpf_stream_get() can do
> bpf_stream_get(stream_id, prog->aux->main_prog_aux);
>
> with above we can allow find_from_stack_cb() to return subprogs.
> and bpf_prog_ksym_find() can be used in
> bpf_prog_report_arena_violation() without any other changes.
Yes. That's exactly the proposal.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v4 2/3] bpf: Report arena faults to BPF stderr
2025-09-01 16:39 ` Alexei Starovoitov
@ 2025-09-01 19:22 ` Puranjay Mohan
2025-09-01 22:44 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 22+ messages in thread
From: Puranjay Mohan @ 2025-09-01 19:22 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Xu Kuohai, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Catalin Marinas, Will Deacon, Kumar Kartikeya Dwivedi, bpf
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Mon, Sep 1, 2025 at 6:34 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>
>> > On Fri, Aug 29, 2025 at 3:30 AM Xu Kuohai <xukuohai@huaweicloud.com> wrote:
>> >>
>> >> > +
>> >> > +void bpf_prog_report_arena_violation(bool write, unsigned long addr)
>> >> > +{
>> >> > + struct bpf_stream_stage ss;
>> >> > + struct bpf_prog *prog;
>> >> > + u64 user_vm_start;
>> >> > +
>> >> > + prog = bpf_prog_find_from_stack();
>> >>
>> >> bpf_prog_find_from_stack depends on arch_bpf_stack_walk, which isn't available
>> >> on all archs. How about switching to bpf_prog_ksym_find with the fault pc?
>> >
>> > Out of archs that support bpf arena only riscv doesn't
>> > support arch_bpf_stack_walk(), which is probably fixable.
>> > But I agree that direct bpf_prog_ksym_find() is cleaner here.
>> > We need to make sure it works for subprogs, since streams[2] are
>> > valid only for main prog.
>> > I think we can add:
>> > struct bpf_prog_aux {
>> > ...
>> > struct bpf_prog_aux *main_prog;
>> > };
>> > init it during jit_subprogs() and use it for stream access.
>> > We can also remove skipping of subprogs in find_from_stack_cb() then.
>> >
>> > Kumar, wdyt?
>>
>> So, IIUC, after adding struct bpf_prog_aux *main_prog_aux in struct
>> bpf_prog_aux,
>>
>> We can do in bpf_prog_alloc_no_stats():
>> fp->aux->main_prog_aux = aux;
>>
>> and in jit_subprogs():
>> func[i]->aux->main_prog_aux = prog->aux;
>>
>> and then all users of bpf_stream_get() can do
>> bpf_stream_get(stream_id, prog->aux->main_prog_aux);
>>
>> with above we can allow find_from_stack_cb() to return subprogs.
>> and bpf_prog_ksym_find() can be used in
>> bpf_prog_report_arena_violation() without any other changes.
>
> Yes. That's exactly the proposal.
I think we should go ahead with this approach but also divide
bpf_prog_aux into two as you suggested. I will send a follow-up set for
that.
Thanks,
Puranjay
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v4 2/3] bpf: Report arena faults to BPF stderr
2025-09-01 19:22 ` Puranjay Mohan
@ 2025-09-01 22:44 ` Kumar Kartikeya Dwivedi
2025-09-02 2:18 ` Alexei Starovoitov
0 siblings, 1 reply; 22+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-09-01 22:44 UTC (permalink / raw)
To: Puranjay Mohan
Cc: Alexei Starovoitov, Xu Kuohai, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Catalin Marinas,
Will Deacon, bpf
On Mon, 1 Sept 2025 at 21:22, Puranjay Mohan <puranjay@kernel.org> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Mon, Sep 1, 2025 at 6:34 AM Puranjay Mohan <puranjay@kernel.org> wrote:
> >>
> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> >>
> >> > On Fri, Aug 29, 2025 at 3:30 AM Xu Kuohai <xukuohai@huaweicloud.com> wrote:
> >> >>
> >> >> > +
> >> >> > +void bpf_prog_report_arena_violation(bool write, unsigned long addr)
> >> >> > +{
> >> >> > + struct bpf_stream_stage ss;
> >> >> > + struct bpf_prog *prog;
> >> >> > + u64 user_vm_start;
> >> >> > +
> >> >> > + prog = bpf_prog_find_from_stack();
> >> >>
> >> >> bpf_prog_find_from_stack depends on arch_bpf_stack_walk, which isn't available
> >> >> on all archs. How about switching to bpf_prog_ksym_find with the fault pc?
> >> >
> >> > Out of archs that support bpf arena only riscv doesn't
> >> > support arch_bpf_stack_walk(), which is probably fixable.
> >> > But I agree that direct bpf_prog_ksym_find() is cleaner here.
> >> > We need to make sure it works for subprogs, since streams[2] are
> >> > valid only for main prog.
> >> > I think we can add:
> >> > struct bpf_prog_aux {
> >> > ...
> >> > struct bpf_prog_aux *main_prog;
> >> > };
> >> > init it during jit_subprogs() and use it for stream access.
> >> > We can also remove skipping of subprogs in find_from_stack_cb() then.
> >> >
> >> > Kumar, wdyt?
> >>
> >> So, IIUC, after adding struct bpf_prog_aux *main_prog_aux in struct
> >> bpf_prog_aux,
> >>
> >> We can do in bpf_prog_alloc_no_stats():
> >> fp->aux->main_prog_aux = aux;
> >>
> >> and in jit_subprogs():
> >> func[i]->aux->main_prog_aux = prog->aux;
> >>
> >> and then all users of bpf_stream_get() can do
> >> bpf_stream_get(stream_id, prog->aux->main_prog_aux);
> >>
> >> with above we can allow find_from_stack_cb() to return subprogs.
> >> and bpf_prog_ksym_find() can be used in
> >> bpf_prog_report_arena_violation() without any other changes.
> >
> > Yes. That's exactly the proposal.
>
> I think we should go ahead with this approach but also divide
> bpf_prog_aux into two as you suggested. I will send a follow-up set for
> that.
find_from_stack_cb shouldn't be returning subprogs IMO, I think what
was meant by Alexei was to use the first found subprog to jump to its
main prog.
I think this will also make stream related logic work in cases where
we only have subprog of the program, and not the main prog (e.g. timer
callbacks, etc.) in the stack trace, which was probably an oversight
on my part. It would be nice to test this, let me know if you can fold
it in your set / send as a follow up, otherwise I will.
Additionally, I feel the extra pointer is unnecessary. Instead, the
logic to jump to the main prog from the subprog can be (if I didn't
miss anything):
prog->aux->func ? prog->aux->func[0]->aux ? prog-aux
When prog->aux->func is not set, that means there are no subprogs. I
would simply wrap this logic in bpf_main_prog_aux or something.
>
> Thanks,
> Puranjay
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH bpf-next v4 2/3] bpf: Report arena faults to BPF stderr
2025-09-01 22:44 ` Kumar Kartikeya Dwivedi
@ 2025-09-02 2:18 ` Alexei Starovoitov
0 siblings, 0 replies; 22+ messages in thread
From: Alexei Starovoitov @ 2025-09-02 2:18 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: Puranjay Mohan, Xu Kuohai, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Catalin Marinas, Will Deacon, bpf
On Mon, Sep 1, 2025 at 3:45 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
>
> Additionally, I feel the extra pointer is unnecessary. Instead, the
> logic to jump to the main prog from the subprog can be (if I didn't
> miss anything):
>
> prog->aux->func ? prog->aux->func[0]->aux ? prog-aux
>
> When prog->aux->func is not set, that means there are no subprogs. I
> would simply wrap this logic in bpf_main_prog_aux or something.
Good point. Forgot about this one.
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2025-09-02 2:18 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-27 15:37 [PATCH bpf-next v4 0/3] bpf: Report arena faults to BPF streams Puranjay Mohan
2025-08-27 15:37 ` [PATCH bpf-next v4 1/3] bpf: arm64: simplify exception table handling Puranjay Mohan
2025-08-28 0:19 ` Kumar Kartikeya Dwivedi
2025-08-29 10:06 ` Xu Kuohai
2025-08-27 15:37 ` [PATCH bpf-next v4 2/3] bpf: Report arena faults to BPF stderr Puranjay Mohan
2025-08-28 0:22 ` Kumar Kartikeya Dwivedi
2025-08-28 0:27 ` Kumar Kartikeya Dwivedi
2025-08-28 12:14 ` Puranjay Mohan
2025-08-29 10:30 ` Xu Kuohai
2025-08-29 20:28 ` Alexei Starovoitov
2025-09-01 13:34 ` Puranjay Mohan
2025-09-01 16:39 ` Alexei Starovoitov
2025-09-01 19:22 ` Puranjay Mohan
2025-09-01 22:44 ` Kumar Kartikeya Dwivedi
2025-09-02 2:18 ` Alexei Starovoitov
2025-08-27 15:37 ` [PATCH bpf-next v4 3/3] selftests/bpf: Add tests for arena fault reporting Puranjay Mohan
2025-08-27 19:54 ` Yonghong Song
2025-08-27 23:49 ` Kumar Kartikeya Dwivedi
2025-08-28 12:25 ` Puranjay Mohan
2025-08-28 15:44 ` Yonghong Song
2025-08-28 0:23 ` [PATCH bpf-next v4 0/3] bpf: Report arena faults to BPF streams Kumar Kartikeya Dwivedi
2025-08-28 12:13 ` Puranjay Mohan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).