BPF List
* [PATCH 0/5] LoongArch: BPF: arena instruction gating, private stack and exceptions
@ 2026-06-18  3:38 George Guo
  2026-06-18  3:38 ` [PATCH 1/5] LoongArch: BPF: Gate unsupported arena instructions via bpf_jit_supports_insn() George Guo
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: George Guo @ 2026-06-18  3:38 UTC
  To: chenhuacai, yangtiezhu, hengqi.chen
  Cc: kernel, ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
	yonghong.song, jolsa, shuah, loongarch, linux-kernel, bpf,
	linux-kselftest, George Guo

From: George Guo <guodongtai@kylinos.cn>

This series adds three LoongArch BPF JIT features, plus the accompanying
selftest changes:

  1. Gate the arena instructions the JIT does not implement (atomics and
     sign-extending loads on arena pointers) via bpf_jit_supports_insn(),
     so the verifier rejects such programs early with a clear message
     instead of letting them fail late in the JIT.
  2. Per-program private stack (bpf_jit_supports_private_stack()): the BPF
     stack of deep/recursive tracing programs is moved off the kernel
     stack into a per-CPU allocation bracketed by overflow/underflow
     guards that are checked on teardown.
  3. Exceptions / bpf_throw (bpf_jit_supports_exceptions()): unwind to the
     exception boundary program via arch_bpf_stack_walk() (built on the
     ORC unwinder) and reuse its frame in the exception callback. Gated on
     CONFIG_UNWINDER_ORC.

Patches 4-5 are the selftests side: a LoongArch deny list (arena_atomics,
which patch 1 deliberately rejects) and enabling the struct_ops private
stack test on LoongArch. They touch tools/testing/selftests/bpf, hence the
bpf@vger / linux-kselftest Cc.

The series is independent of the earlier "LoongArch: BPF: Support
internal-only MOV to resolve per-CPU addrs" / "Add timed may_goto support"
patches [1] (no functional or apply dependency) and targets the LoongArch
tree.

Testing on a LoongArch board (test_progs):
  - exceptions:               117 subtests, 0 failed
  - struct_ops_private_stack: private_stack / _fail / _recur, 0 failed
  - arena list/htab/strsearch and verifier_arena*: pass
  - arena_atomics: rejected by the verifier as expected
      ("BPF_ATOMIC stores into Rn arena is not allowed"), hence the
      deny-list entry in patch 4

[1] <https://lore.kernel.org/all/20260609041407.122384-1-dongtai.guo@linux.dev/>

George Guo (5):
  LoongArch: BPF: Gate unsupported arena instructions via
    bpf_jit_supports_insn()
  LoongArch: BPF: Add private stack support
  LoongArch: BPF: Add exceptions (bpf_throw) support
  selftests/bpf: Add LoongArch deny list
  selftests/bpf: Enable struct_ops private stack test for LoongArch

 arch/loongarch/kernel/stacktrace.c            |  52 ++++++
 arch/loongarch/net/bpf_jit.c                  | 172 +++++++++++++++++-
 arch/loongarch/net/bpf_jit.h                  |   1 +
 .../testing/selftests/bpf/DENYLIST.loongarch  |   2 +
 .../bpf/prog_tests/struct_ops_private_stack.c |   2 +-
 5 files changed, 222 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/DENYLIST.loongarch


base-commit: 186d3c4e92242351afc24d9784f31cb4cd08a4b7
-- 
2.25.1



* [PATCH 1/5] LoongArch: BPF: Gate unsupported arena instructions via bpf_jit_supports_insn()
  2026-06-18  3:38 [PATCH 0/5] LoongArch: BPF: arena instruction gating, private stack and exceptions George Guo
@ 2026-06-18  3:38 ` George Guo
  2026-06-18  3:53   ` sashiko-bot
  2026-06-18  4:19   ` bot+bpf-ci
  2026-06-18  3:38 ` [PATCH 2/5] LoongArch: BPF: Add private stack support George Guo
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 11+ messages in thread
From: George Guo @ 2026-06-18  3:38 UTC
  To: chenhuacai, yangtiezhu, hengqi.chen
  Cc: kernel, ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
	yonghong.song, jolsa, shuah, loongarch, linux-kernel, bpf,
	linux-kselftest, George Guo

From: George Guo <guodongtai@kylinos.cn>

The JIT does not implement atomics on arena pointers (BPF_PROBE_ATOMIC)
nor sign-extending loads from the arena (BPF_PROBE_MEM32SX). Without a
bpf_jit_supports_insn() callback the verifier assumes both are available,
so such programs are accepted only to fail later in the JIT with a
confusing -EINVAL 'unknown opcode'.

Implement bpf_jit_supports_insn() to reject these instructions in the
arena case. The verifier then rejects the program early with a clear
message ('BPF_ATOMIC stores into R<n> ... is not allowed' / 'sign
extending loads from arena are not supported yet'). Regular arena
accesses (BPF_PROBE_MEM32 loads/stores of all sizes) remain supported.
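
For illustration, the gating decision can be modelled in userspace. This
is a sketch, not the kernel code: model_supports_insn() is a hypothetical
stand-in name, and the opcode macros are copied by value from the BPF
UAPI headers so that the block compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcode encoding values, copied from the BPF UAPI headers. */
#define BPF_LDX    0x01
#define BPF_STX    0x03
#define BPF_W      0x00
#define BPF_H      0x08
#define BPF_B      0x10
#define BPF_DW     0x18
#define BPF_MEM    0x60
#define BPF_MEMSX  0x80
#define BPF_ATOMIC 0xc0

/*
 * Hypothetical userspace model of the predicate: returns false for the
 * opcodes this patch gates when the access targets an arena pointer.
 */
static bool model_supports_insn(uint8_t code, bool in_arena)
{
	if (!in_arena)
		return true;

	switch (code) {
	case BPF_STX | BPF_ATOMIC | BPF_W:
	case BPF_STX | BPF_ATOMIC | BPF_DW:
		/* 32-bit and 64-bit atomics on arena pointers */
		return false;
	case BPF_LDX | BPF_MEMSX | BPF_B:
	case BPF_LDX | BPF_MEMSX | BPF_H:
	case BPF_LDX | BPF_MEMSX | BPF_W:
		/* sign-extending loads from the arena */
		return false;
	}

	return true;
}
```

With these definitions, model_supports_insn(0xdb, true) is false (0xdb
is BPF_STX | BPF_ATOMIC | BPF_DW), while a plain arena load such as
BPF_LDX | BPF_MEM | BPF_W is still accepted, and every opcode is
accepted when in_arena is false.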

Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
 arch/loongarch/net/bpf_jit.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
index 24913dc7f4e8..3f9ffdde2491 100644
--- a/arch/loongarch/net/bpf_jit.c
+++ b/arch/loongarch/net/bpf_jit.c
@@ -2357,6 +2357,26 @@ bool bpf_jit_supports_arena(void)
 	return true;
 }
 
+bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
+{
+	if (!in_arena)
+		return true;
+
+	switch (insn->code) {
+	case BPF_STX | BPF_ATOMIC | BPF_W:
+	case BPF_STX | BPF_ATOMIC | BPF_DW:
+		/* Atomics on arena pointers are not implemented yet. */
+		return false;
+	case BPF_LDX | BPF_MEMSX | BPF_B:
+	case BPF_LDX | BPF_MEMSX | BPF_H:
+	case BPF_LDX | BPF_MEMSX | BPF_W:
+		/* Sign-extending loads from arena are not implemented yet. */
+		return false;
+	}
+
+	return true;
+}
+
 bool bpf_jit_supports_fsession(void)
 {
 	return true;
-- 
2.25.1



* [PATCH 2/5] LoongArch: BPF: Add private stack support
  2026-06-18  3:38 [PATCH 0/5] LoongArch: BPF: arena instruction gating, private stack and exceptions George Guo
  2026-06-18  3:38 ` [PATCH 1/5] LoongArch: BPF: Gate unsupported arena instructions via bpf_jit_supports_insn() George Guo
@ 2026-06-18  3:38 ` George Guo
  2026-06-18  3:55   ` sashiko-bot
  2026-06-18  3:38 ` [PATCH 3/5] LoongArch: BPF: Add exceptions (bpf_throw) support George Guo
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: George Guo @ 2026-06-18  3:38 UTC
  To: chenhuacai, yangtiezhu, hengqi.chen
  Cc: kernel, ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
	yonghong.song, jolsa, shuah, loongarch, linux-kernel, bpf,
	linux-kselftest, George Guo

From: George Guo <guodongtai@kylinos.cn>

Support per-program private stacks, advertised via
bpf_jit_supports_private_stack(). When the verifier marks a program with
jits_use_priv_stack (e.g. a sufficiently deep, potentially recursive
tracing program), its BPF stack is moved off the kernel stack into a
per-CPU allocation, reducing kernel stack pressure.

The private stack is allocated in bpf_int_jit_compile() as the
verifier-computed stack depth plus two 16-byte guard regions used to
detect overflow and underflow; the guards are initialised at allocation
time and validated in bpf_jit_free(). S5 (otherwise saved/restored but
unused by the JIT) is reused to hold the private stack pointer, loaded
in the prologue with the current CPU's per-CPU offset ($r21). When a
private stack is in use the BPF frame pointer points into this per-CPU
region and the BPF stack is no longer reserved on the kernel stack.
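
The size and guard arithmetic described above can be sketched in
userspace as follows. This is a model under stated assumptions: a single
plain buffer stands in for the per-CPU allocation, and the helper names
(round_up16, priv_stack_alloc_size, guard_init, guard_intact) are
hypothetical. Only the two constants are taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PRIV_STACK_GUARD_SZ	16
#define PRIV_STACK_GUARD_VAL	0xEB9F12345678eb9fULL

/* Round n up to the next multiple of 16. */
static size_t round_up16(size_t n)
{
	return (n + 15) & ~(size_t)15;
}

/* Stack depth rounded to 16 bytes, plus one guard region at each end. */
static size_t priv_stack_alloc_size(size_t stack_depth)
{
	return round_up16(stack_depth) + 2 * PRIV_STACK_GUARD_SZ;
}

/* Fill both 16-byte guards (two u64 words each) with the magic value. */
static void guard_init(uint64_t *stack, size_t alloc_size)
{
	size_t underflow_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3;

	stack[0] = PRIV_STACK_GUARD_VAL;
	stack[1] = PRIV_STACK_GUARD_VAL;
	stack[underflow_idx] = PRIV_STACK_GUARD_VAL;
	stack[underflow_idx + 1] = PRIV_STACK_GUARD_VAL;
}

/* True if neither guard region has been overwritten. */
static bool guard_intact(const uint64_t *stack, size_t alloc_size)
{
	size_t underflow_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3;

	return stack[0] == PRIV_STACK_GUARD_VAL &&
	       stack[1] == PRIV_STACK_GUARD_VAL &&
	       stack[underflow_idx] == PRIV_STACK_GUARD_VAL &&
	       stack[underflow_idx + 1] == PRIV_STACK_GUARD_VAL;
}
```

For a stack depth of 100 bytes the model allocates 112 + 32 = 144 bytes;
the low guard occupies words 0 and 1 and the high guard words 16 and 17.
The usable BPF stack lies between them, which is why the prologue adds
PRIV_STACK_GUARD_SZ to the base before deriving the BPF frame pointer.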

Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
 arch/loongarch/net/bpf_jit.c | 111 ++++++++++++++++++++++++++++++++++-
 arch/loongarch/net/bpf_jit.h |   1 +
 2 files changed, 109 insertions(+), 3 deletions(-)

diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
index 3f9ffdde2491..c410b02e64be 100644
--- a/arch/loongarch/net/bpf_jit.c
+++ b/arch/loongarch/net/bpf_jit.c
@@ -18,8 +18,13 @@
 
 #define REG_TCC		LOONGARCH_GPR_A6
 #define REG_ARENA	LOONGARCH_GPR_S6 /* For storing arena_vm_start */
+#define REG_PRIV_SP	LOONGARCH_GPR_S5 /* For storing the private stack pointer */
 #define BPF_TAIL_CALL_CNT_PTR_STACK_OFF(stack) (round_up(stack, 16) - 80)
 
+/* Memory size/value to protect private stack overflow/underflow */
+#define PRIV_STACK_GUARD_SZ	16
+#define PRIV_STACK_GUARD_VAL	0xEB9F12345678eb9fULL
+
 static const int regmap[] = {
 	/* return value from in-kernel function, and exit value for eBPF program */
 	[BPF_REG_0] = LOONGARCH_GPR_A5,
@@ -40,6 +45,15 @@ static const int regmap[] = {
 	[BPF_REG_AX] = LOONGARCH_GPR_T0,
 };
 
+static void emit_percpu_ptr(struct jit_ctx *ctx, u8 dst, void __percpu *ptr)
+{
+	move_imm(ctx, dst, (__force long)ptr, false);
+#ifdef CONFIG_SMP
+	/* dst += __my_cpu_offset, held in $r21 */
+	emit_insn(ctx, addd, dst, dst, LOONGARCH_GPR_U0);
+#endif
+}
+
 static void prepare_bpf_tail_call_cnt(struct jit_ctx *ctx, int *store_offset)
 {
 	const struct bpf_prog *prog = ctx->prog;
@@ -141,7 +155,14 @@ static void build_prologue(struct jit_ctx *ctx)
 		stack_adjust += 8;
 
 	stack_adjust = round_up(stack_adjust, 16);
-	stack_adjust += bpf_stack_adjust;
+
+	/*
+	 * When a private stack is used the BPF stack lives in a per-CPU
+	 * allocation rather than on the kernel stack, so only the non-BPF
+	 * part is reserved here.
+	 */
+	if (!ctx->priv_sp_used)
+		stack_adjust += bpf_stack_adjust;
 
 	move_reg(ctx, LOONGARCH_GPR_T0, LOONGARCH_GPR_RA);
 	/* Reserve space for the move_imm + jirl instruction */
@@ -191,8 +212,16 @@ static void build_prologue(struct jit_ctx *ctx)
 
 	emit_insn(ctx, addid, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, stack_adjust);
 
-	if (bpf_stack_adjust)
+	if (ctx->priv_sp_used) {
+		/* Set up the private stack pointer and the BPF frame pointer */
+		void __percpu *priv_stack_ptr;
+
+		priv_stack_ptr = prog->aux->priv_stack_ptr + PRIV_STACK_GUARD_SZ;
+		emit_percpu_ptr(ctx, REG_PRIV_SP, priv_stack_ptr);
+		emit_insn(ctx, addid, regmap[BPF_REG_FP], REG_PRIV_SP, bpf_stack_adjust);
+	} else if (bpf_stack_adjust) {
 		emit_insn(ctx, addid, regmap[BPF_REG_FP], LOONGARCH_GPR_SP, bpf_stack_adjust);
+	}
 
 	ctx->stack_size = stack_adjust;
 
@@ -2166,6 +2195,39 @@ int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
 	return ret < 0 ? ret : ret * LOONGARCH_INSN_SIZE;
 }
 
+static void priv_stack_init_guard(void __percpu *priv_stack_ptr, int alloc_size)
+{
+	int cpu, underflow_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3;
+	u64 *stack_ptr;
+
+	for_each_possible_cpu(cpu) {
+		stack_ptr = per_cpu_ptr(priv_stack_ptr, cpu);
+		stack_ptr[0] = PRIV_STACK_GUARD_VAL;
+		stack_ptr[1] = PRIV_STACK_GUARD_VAL;
+		stack_ptr[underflow_idx] = PRIV_STACK_GUARD_VAL;
+		stack_ptr[underflow_idx + 1] = PRIV_STACK_GUARD_VAL;
+	}
+}
+
+static void priv_stack_check_guard(void __percpu *priv_stack_ptr, int alloc_size,
+				   struct bpf_prog *prog)
+{
+	int cpu, underflow_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3;
+	u64 *stack_ptr;
+
+	for_each_possible_cpu(cpu) {
+		stack_ptr = per_cpu_ptr(priv_stack_ptr, cpu);
+		if (stack_ptr[0] != PRIV_STACK_GUARD_VAL ||
+		    stack_ptr[1] != PRIV_STACK_GUARD_VAL ||
+		    stack_ptr[underflow_idx] != PRIV_STACK_GUARD_VAL ||
+		    stack_ptr[underflow_idx + 1] != PRIV_STACK_GUARD_VAL) {
+			pr_err("BPF private stack overflow/underflow detected for prog %sx\n",
+			       bpf_jit_get_prog_name(prog));
+			break;
+		}
+	}
+}
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_prog *prog)
 {
 	bool extra_pass = false;
@@ -2174,7 +2236,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
 	struct jit_ctx ctx;
 	struct jit_data *jit_data;
 	struct bpf_binary_header *header;
-	struct bpf_binary_header *ro_header;
+	struct bpf_binary_header *ro_header = NULL;
+	void __percpu *priv_stack_ptr = NULL;
+	int priv_stack_alloc_sz;
 
 	/*
 	 * If BPF JIT was not enabled then we must fall back to
@@ -2190,6 +2254,22 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
 			return prog;
 		prog->aux->jit_data = jit_data;
 	}
+	priv_stack_ptr = prog->aux->priv_stack_ptr;
+	if (!priv_stack_ptr && prog->aux->jits_use_priv_stack) {
+		/*
+		 * Allocate the actual private stack: the verifier-calculated
+		 * stack size plus two guard regions to detect overflow and
+		 * underflow.
+		 */
+		priv_stack_alloc_sz = round_up(prog->aux->stack_depth, 16) +
+				      2 * PRIV_STACK_GUARD_SZ;
+		priv_stack_ptr = __alloc_percpu_gfp(priv_stack_alloc_sz, 16, GFP_KERNEL);
+		if (!priv_stack_ptr)
+			goto out_priv_stack;
+
+		priv_stack_init_guard(priv_stack_ptr, priv_stack_alloc_sz);
+		prog->aux->priv_stack_ptr = priv_stack_ptr;
+	}
 	if (jit_data->ctx.offset) {
 		ctx = jit_data->ctx;
 		ro_header = jit_data->ro_header;
@@ -2205,6 +2285,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
 	ctx.prog = prog;
 	ctx.arena_vm_start = bpf_arena_get_kern_vm_start(prog->aux->arena);
 	ctx.user_vm_start = bpf_arena_get_user_vm_start(prog->aux->arena);
+	ctx.priv_sp_used = priv_stack_ptr ? true : false;
 
 	ctx.offset = kvcalloc(prog->len + 1, sizeof(u32), GFP_KERNEL);
 	if (ctx.offset == NULL)
@@ -2298,7 +2379,17 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
 		bpf_prog_fill_jited_linfo(prog, ctx.offset + 1);
 
 out_offset:
+		/*
+		 * A NULL ro_header here means the JIT failed, so release the
+		 * private stack that was allocated above; on success the
+		 * program keeps it until bpf_jit_free().
+		 */
+		if (!ro_header && priv_stack_ptr) {
+			free_percpu(priv_stack_ptr);
+			prog->aux->priv_stack_ptr = NULL;
+		}
 		kvfree(ctx.offset);
+out_priv_stack:
 		kfree(jit_data);
 		prog->aux->jit_data = NULL;
 	}
@@ -2324,6 +2415,8 @@ void bpf_jit_free(struct bpf_prog *prog)
 	if (prog->jited) {
 		struct jit_data *jit_data = prog->aux->jit_data;
 		struct bpf_binary_header *hdr;
+		void __percpu *priv_stack_ptr;
+		int priv_stack_alloc_sz;
 
 		/*
 		 * If we fail the final pass of JIT (from jit_subprogs), the
@@ -2336,6 +2429,13 @@ void bpf_jit_free(struct bpf_prog *prog)
 		}
 		hdr = bpf_jit_binary_pack_hdr(prog);
 		bpf_jit_binary_pack_free(hdr, NULL);
+		priv_stack_ptr = prog->aux->priv_stack_ptr;
+		if (priv_stack_ptr) {
+			priv_stack_alloc_sz = round_up(prog->aux->stack_depth, 16) +
+					      2 * PRIV_STACK_GUARD_SZ;
+			priv_stack_check_guard(priv_stack_ptr, priv_stack_alloc_sz, prog);
+			free_percpu(prog->aux->priv_stack_ptr);
+		}
 		WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(prog));
 	}
 
@@ -2382,6 +2482,11 @@ bool bpf_jit_supports_fsession(void)
 	return true;
 }
 
+bool bpf_jit_supports_private_stack(void)
+{
+	return true;
+}
+
 /* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
 bool bpf_jit_supports_subprog_tailcalls(void)
 {
diff --git a/arch/loongarch/net/bpf_jit.h b/arch/loongarch/net/bpf_jit.h
index a8e29be35fa8..01a7ea47e79b 100644
--- a/arch/loongarch/net/bpf_jit.h
+++ b/arch/loongarch/net/bpf_jit.h
@@ -22,6 +22,7 @@ struct jit_ctx {
 	u32 stack_size;
 	u64 arena_vm_start;
 	u64 user_vm_start;
+	bool priv_sp_used;
 };
 
 struct jit_data {
-- 
2.25.1



* [PATCH 3/5] LoongArch: BPF: Add exceptions (bpf_throw) support
  2026-06-18  3:38 [PATCH 0/5] LoongArch: BPF: arena instruction gating, private stack and exceptions George Guo
  2026-06-18  3:38 ` [PATCH 1/5] LoongArch: BPF: Gate unsupported arena instructions via bpf_jit_supports_insn() George Guo
  2026-06-18  3:38 ` [PATCH 2/5] LoongArch: BPF: Add private stack support George Guo
@ 2026-06-18  3:38 ` George Guo
  2026-06-18  3:55   ` sashiko-bot
  2026-06-18  3:38 ` [PATCH 4/5] selftests/bpf: Add LoongArch deny list George Guo
  2026-06-18  3:38 ` [PATCH 5/5] selftests/bpf: Enable struct_ops private stack test for LoongArch George Guo
  4 siblings, 1 reply; 11+ messages in thread
From: George Guo @ 2026-06-18  3:38 UTC
  To: chenhuacai, yangtiezhu, hengqi.chen
  Cc: kernel, ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
	yonghong.song, jolsa, shuah, loongarch, linux-kernel, bpf,
	linux-kselftest, George Guo

From: George Guo <guodongtai@kylinos.cn>

Implement BPF exception support, advertised via
bpf_jit_supports_exceptions(). bpf_throw() unwinds the stack to find the
exception boundary program's frame and then invokes its exception
callback with that frame's stack and frame pointers.

Finding the boundary frame needs arch_bpf_stack_walk(), which reports
each frame's (ip, sp, fp). This is implemented on top of the ORC
unwinder: ORC updates the frame pointer per frame and walks JITed BPF
code via its generated-code frame-pointer fallback, which expects the
frame record at fp-8 ($ra) and fp-16 (previous fp) -- exactly what the
LoongArch BPF prologue already lays down. The capability is therefore
gated on CONFIG_UNWINDER_ORC; with other unwinders it returns false.
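
The frame record layout the fallback relies on can be illustrated with a
small userspace model. This is only a sketch of following ($ra at fp-8,
previous fp at fp-16) records, not the ORC unwinder itself: the stack is
modelled as an array of 64-bit words addressed by byte offset, and
struct frame_rec and walk_frame_records() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One reported frame: its return address and its frame pointer. */
struct frame_rec {
	uint64_t ra;
	uint64_t fp;
};

/*
 * Follow the chain of frame records starting at fp. Addresses are byte
 * offsets into mem. The walk stops at a zero, unaligned or out-of-range
 * frame pointer, or once max frames have been recorded.
 * Returns the number of frames written to out.
 */
static size_t walk_frame_records(const uint64_t *mem, size_t mem_bytes,
				 uint64_t fp, struct frame_rec *out,
				 size_t max)
{
	size_t n = 0;

	while (n < max && fp >= 16 && fp <= mem_bytes && (fp & 7) == 0) {
		out[n].ra = mem[(fp - 8) / 8];	/* $ra lives at fp-8 */
		out[n].fp = fp;
		n++;
		fp = mem[(fp - 16) / 8];	/* previous fp at fp-16 */
	}

	return n;
}
```

In the model an inner frame with fp = 64 whose fp-16 slot holds 128
leads the walk to the outer frame at fp = 128, and a zero previous-fp
slot there terminates it after two frames.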

The walk is seeded with the live frame pointer ($r22). The kernel is
built with -fomit-frame-pointer, so $fp is an ordinary callee-saved
register preserved across the call from the JITed program into
bpf_throw() down to arch_bpf_stack_walk(), where it still points at the
innermost BPF frame for the ORC fallback to start from. It is captured
in a thin wrapper with no large stack locals, because the worker that
runs the unwind uses $r22 to address its own (pt_regs + unwind_state)
frame and would otherwise clobber the live $fp before it could be read.

On the JIT side, the exception callback does not build a normal frame:
it receives the boundary program's frame pointer as its third argument
(a2), sets FP to it and SP to FP - stack_size, and reuses the boundary's
frame. Because the callee-saved register saves are anchored at the top
of the frame (FP), the existing FP-relative epilogue restores the
boundary's registers and returns to the boundary's caller regardless of
the two programs' individual frame sizes. To keep the boundary and the
callback agreeing on the layout, the s6 slot is always reserved for
exception programs, mirroring the arena case.
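
The "regardless of frame size" property follows from simple arithmetic,
sketched here in userspace. The function name and the slot numbering
(slot 0 for $ra, slot 1 for the previous $fp) are illustrative
assumptions; the formulas mirror the SP-relative offsets the epilogue
uses, starting at stack_size - 8 and decreasing by 8 per register.

```c
#include <stdint.h>

/*
 * Address from which the epilogue reloads saved-register slot `slot`,
 * given that the callback prologue has set SP = FP - stack_size.
 */
static uint64_t epilogue_slot_addr(uint64_t fp, uint64_t stack_size,
				   unsigned int slot)
{
	uint64_t sp = fp - stack_size;
	uint64_t load_offset = stack_size - 8 * (uint64_t)(slot + 1);

	/* sp + load_offset == fp - 8 * (slot + 1): stack_size cancels. */
	return sp + load_offset;
}
```

With fp = 0x1000, slot 0 resolves to 0xff8 whether stack_size is 96 or
160, so a callback with a different frame size still reloads the
registers the boundary program saved.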

Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
 arch/loongarch/kernel/stacktrace.c | 52 ++++++++++++++++++++++++++++++
 arch/loongarch/net/bpf_jit.c       | 41 +++++++++++++++++++++--
 2 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/arch/loongarch/kernel/stacktrace.c b/arch/loongarch/kernel/stacktrace.c
index 387dc4d3c486..718c98b3f1fc 100644
--- a/arch/loongarch/kernel/stacktrace.c
+++ b/arch/loongarch/kernel/stacktrace.c
@@ -4,6 +4,7 @@
  *
  * Copyright (C) 2022 Loongson Technology Corporation Limited
  */
+#include <linux/filter.h>
 #include <linux/sched.h>
 #include <linux/stacktrace.h>
 #include <linux/uaccess.h>
@@ -40,6 +41,57 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
 	}
 }
 
+#ifdef CONFIG_UNWINDER_ORC
+/*
+ * Used by BPF exception support (bpf_throw) to find the exception boundary
+ * frame. The ORC unwinder reports the stack and frame pointer of each frame
+ * and, via its generated-code fallback, can walk JITed BPF frames, which set
+ * up the expected frame record ($ra at fp-8, previous fp at fp-16).
+ */
+static noinline void walk_stackframe_bpf(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp),
+					 void *cookie, unsigned long fp)
+{
+	unsigned long addr;
+	struct pt_regs dummyregs;
+	struct pt_regs *regs = &dummyregs;
+	struct unwind_state state;
+
+	regs->regs[3] = (unsigned long)__builtin_frame_address(0);
+	regs->csr_era = (unsigned long)__builtin_return_address(0);
+	regs->regs[1] = 0;
+	regs->regs[22] = fp;
+
+	for (unwind_start(&state, current, regs);
+	     !unwind_done(&state); unwind_next_frame(&state)) {
+		addr = unwind_get_return_address(&state);
+		if (!addr || !consume_fn(cookie, (u64)addr, (u64)state.sp, (u64)state.fp))
+			break;
+	}
+}
+
+void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp),
+			 void *cookie)
+{
+	unsigned long fp;
+
+	/*
+	 * Capture the live frame pointer ($r22/$fp) here, before handing off to
+	 * the worker. The kernel is built with -fomit-frame-pointer, so $fp is
+	 * an ordinary callee-saved register that is preserved across the call
+	 * from the JITed BPF program into bpf_throw() down to here, and thus
+	 * still points at the innermost BPF frame. The ORC frame-pointer
+	 * fallback walks the BPF frames up to the exception boundary from it.
+	 *
+	 * This must be a thin wrapper with no large stack locals: the worker
+	 * uses $r22 to address its frame, which would clobber the live $fp
+	 * before it could be read. __builtin_frame_address() cannot be used
+	 * either, as it is $sp-derived and would yield a kernel-stack frame.
+	 */
+	asm volatile("move %0, $r22" : "=r"(fp));
+	walk_stackframe_bpf(consume_fn, cookie, fp);
+}
+#endif /* CONFIG_UNWINDER_ORC */
+
 int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
 			     void *cookie, struct task_struct *task)
 {
diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
index c410b02e64be..22527428f0b3 100644
--- a/arch/loongarch/net/bpf_jit.c
+++ b/arch/loongarch/net/bpf_jit.c
@@ -142,6 +142,13 @@ static void build_prologue(struct jit_ctx *ctx)
 	int i, stack_adjust = 0, store_offset, bpf_stack_adjust;
 	const struct bpf_prog *prog = ctx->prog;
 	const bool is_main_prog = !bpf_is_subprog(prog);
+	/*
+	 * Exception boundary and callback programs must agree on the frame
+	 * layout: the callback reuses the boundary's frame to restore its
+	 * callee-saved registers, so the s6 slot is always reserved for them.
+	 */
+	const bool is_exception_prog = prog->aux->exception_boundary ||
+				       prog->aux->exception_cb;
 
 	bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
 
@@ -151,7 +158,7 @@ static void build_prologue(struct jit_ctx *ctx)
 	/* To store tcc and tcc_ptr */
 	stack_adjust += sizeof(long) * 2;
 
-	if (ctx->arena_vm_start)
+	if (ctx->arena_vm_start || is_exception_prog)
 		stack_adjust += 8;
 
 	stack_adjust = round_up(stack_adjust, 16);
@@ -177,6 +184,19 @@ static void build_prologue(struct jit_ctx *ctx)
 	if (is_main_prog)
 		emit_insn(ctx, addid, REG_TCC, LOONGARCH_GPR_ZERO, 0);
 
+	if (prog->aux->exception_cb) {
+		/*
+		 * The exception callback receives the boundary program's frame
+		 * pointer as its third argument (a2). Reuse that frame so the
+		 * (FP-anchored) epilogue restores the boundary's callee-saved
+		 * registers and returns to the boundary's caller. The boundary
+		 * already saved them, so nothing is pushed here.
+		 */
+		move_reg(ctx, LOONGARCH_GPR_FP, LOONGARCH_GPR_A2);
+		emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_FP, -stack_adjust);
+		goto setup_bpf_fp;
+	}
+
 	emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_SP, -stack_adjust);
 
 	store_offset = stack_adjust - sizeof(long);
@@ -203,7 +223,7 @@ static void build_prologue(struct jit_ctx *ctx)
 	store_offset -= sizeof(long);
 	emit_insn(ctx, std, LOONGARCH_GPR_S5, LOONGARCH_GPR_SP, store_offset);
 
-	if (ctx->arena_vm_start) {
+	if (ctx->arena_vm_start || is_exception_prog) {
 		store_offset -= sizeof(long);
 		emit_insn(ctx, std, REG_ARENA, LOONGARCH_GPR_SP, store_offset);
 	}
@@ -212,6 +232,7 @@ static void build_prologue(struct jit_ctx *ctx)
 
 	emit_insn(ctx, addid, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, stack_adjust);
 
+setup_bpf_fp:
 	if (ctx->priv_sp_used) {
 		/* Set up the private stack pointer and the BPF frame pointer */
 		void __percpu *priv_stack_ptr;
@@ -233,6 +254,9 @@ static void __build_epilogue(struct jit_ctx *ctx, bool is_tail_call)
 {
 	int stack_adjust = ctx->stack_size;
 	int load_offset;
+	const struct bpf_prog *prog = ctx->prog;
+	const bool is_exception_prog = prog->aux->exception_boundary ||
+				       prog->aux->exception_cb;
 
 	load_offset = stack_adjust - sizeof(long);
 	emit_insn(ctx, ldd, LOONGARCH_GPR_RA, LOONGARCH_GPR_SP, load_offset);
@@ -258,7 +282,7 @@ static void __build_epilogue(struct jit_ctx *ctx, bool is_tail_call)
 	load_offset -= sizeof(long);
 	emit_insn(ctx, ldd, LOONGARCH_GPR_S5, LOONGARCH_GPR_SP, load_offset);
 
-	if (ctx->arena_vm_start) {
+	if (ctx->arena_vm_start || is_exception_prog) {
 		load_offset -= sizeof(long);
 		emit_insn(ctx, ldd, REG_ARENA, LOONGARCH_GPR_SP, load_offset);
 	}
@@ -2487,6 +2511,17 @@ bool bpf_jit_supports_private_stack(void)
 	return true;
 }
 
+bool bpf_jit_supports_exceptions(void)
+{
+	/*
+	 * Walking kernel and BPF frames from within bpf_throw() relies on
+	 * arch_bpf_stack_walk(), which is only implemented for the ORC
+	 * unwinder. ORC reports each frame's stack and frame pointer and
+	 * walks JITed BPF frames via its frame-pointer fallback.
+	 */
+	return IS_ENABLED(CONFIG_UNWINDER_ORC);
+}
+
 /* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
 bool bpf_jit_supports_subprog_tailcalls(void)
 {
-- 
2.25.1



* [PATCH 4/5] selftests/bpf: Add LoongArch deny list
  2026-06-18  3:38 [PATCH 0/5] LoongArch: BPF: arena instruction gating, private stack and exceptions George Guo
                   ` (2 preceding siblings ...)
  2026-06-18  3:38 ` [PATCH 3/5] LoongArch: BPF: Add exceptions (bpf_throw) support George Guo
@ 2026-06-18  3:38 ` George Guo
  2026-06-18  3:52   ` sashiko-bot
  2026-06-18  3:38 ` [PATCH 5/5] selftests/bpf: Enable struct_ops private stack test for LoongArch George Guo
  4 siblings, 1 reply; 11+ messages in thread
From: George Guo @ 2026-06-18  3:38 UTC
  To: chenhuacai, yangtiezhu, hengqi.chen
  Cc: kernel, ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
	yonghong.song, jolsa, shuah, loongarch, linux-kernel, bpf,
	linux-kselftest, George Guo

From: George Guo <guodongtai@kylinos.cn>

Some test_progs cases cannot pass on LoongArch and otherwise fail the
run. Add a deny list mirroring the other architectures:

 - arena_atomics: the JIT gates atomic operations on arena pointers
   (bpf_jit_supports_insn() rejects BPF_ATOMIC in the arena case), so
   the verifier rejects these programs early and the skeleton fails to
   load. Observed on LoongArch:

       13: (db) r3 = atomic64_fetch_add((u64 *)(r2 +0), r3)
       BPF_ATOMIC stores into R2 arena is not allowed
       ... failed to load: -EACCES

Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
 tools/testing/selftests/bpf/DENYLIST.loongarch | 2 ++
 1 file changed, 2 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/DENYLIST.loongarch

diff --git a/tools/testing/selftests/bpf/DENYLIST.loongarch b/tools/testing/selftests/bpf/DENYLIST.loongarch
new file mode 100644
index 000000000000..925005e4298f
--- /dev/null
+++ b/tools/testing/selftests/bpf/DENYLIST.loongarch
@@ -0,0 +1,2 @@
+# loongarch64 deny list for BPF CI and local vmtest
+arena_atomics					# JIT does not support atomics in arena (verifier rejects BPF_ATOMIC on arena pointers)
-- 
2.25.1



* [PATCH 5/5] selftests/bpf: Enable struct_ops private stack test for LoongArch
  2026-06-18  3:38 [PATCH 0/5] LoongArch: BPF: arena instruction gating, private stack and exceptions George Guo
                   ` (3 preceding siblings ...)
  2026-06-18  3:38 ` [PATCH 4/5] selftests/bpf: Add LoongArch deny list George Guo
@ 2026-06-18  3:38 ` George Guo
  4 siblings, 0 replies; 11+ messages in thread
From: George Guo @ 2026-06-18  3:38 UTC
  To: chenhuacai, yangtiezhu, hengqi.chen
  Cc: kernel, ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
	yonghong.song, jolsa, shuah, loongarch, linux-kernel, bpf,
	linux-kselftest, George Guo

From: George Guo <guodongtai@kylinos.cn>

LoongArch now supports BPF private stacks via
bpf_jit_supports_private_stack(), so let the struct_ops private stack
runtime test (private_stack / private_stack_fail / private_stack_recur)
run there instead of being skipped.

Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
 .../testing/selftests/bpf/prog_tests/struct_ops_private_stack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/struct_ops_private_stack.c b/tools/testing/selftests/bpf/prog_tests/struct_ops_private_stack.c
index 98db9bafa44b..fcc86f1dfd15 100644
--- a/tools/testing/selftests/bpf/prog_tests/struct_ops_private_stack.c
+++ b/tools/testing/selftests/bpf/prog_tests/struct_ops_private_stack.c
@@ -5,7 +5,7 @@
 #include "struct_ops_private_stack_fail.skel.h"
 #include "struct_ops_private_stack_recur.skel.h"
 
-#if defined(__x86_64__) || defined(__aarch64__) || defined(__powerpc64__)
+#if defined(__x86_64__) || defined(__aarch64__) || defined(__powerpc64__) || defined(__loongarch__)
 static void test_private_stack(void)
 {
 	struct struct_ops_private_stack *skel;
-- 
2.25.1



* Re: [PATCH 4/5] selftests/bpf: Add LoongArch deny list
  2026-06-18  3:38 ` [PATCH 4/5] selftests/bpf: Add LoongArch deny list George Guo
@ 2026-06-18  3:52   ` sashiko-bot
  0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-06-18  3:52 UTC
  To: George Guo; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The `arena_spin_lock` test program relies on BPF atomic operations on arena pointers and will fail on LoongArch, but it was omitted from the newly created `DENYLIST.loongarch`.
--

commit 4b438bd9b260c0e4fb1672644286e66f7e59d089
Author: George Guo <guodongtai@kylinos.cn>

selftests/bpf: Add LoongArch deny list

This commit adds a new test denylist for LoongArch in the BPF selftests. It
disables the arena_atomics test because the LoongArch JIT does not currently
support BPF atomic operations on arena pointers, which causes the verifier
to reject the programs.

> diff --git a/tools/testing/selftests/bpf/DENYLIST.loongarch b/tools/testing/selftests/bpf/DENYLIST.loongarch
> new file mode 100644
> index 0000000000000..925005e4298fa
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/DENYLIST.loongarch
> @@ -0,0 +1,2 @@
> +# loongarch64 deny list for BPF CI and local vmtest
> +arena_atomics					# JIT does not support atomics in arena (verifier rejects BPF_ATOMIC on arena pointers)

[Severity: Medium]
Should the arena_spin_lock test be added to this list as well?

The arena_spin_lock test also relies on BPF atomic operations on arena
pointers through atomic_try_cmpxchg_acquire(). Since the verifier rejects
BPF_ATOMIC on arena pointers for LoongArch, will arena_spin_lock also fail
to load and cause CI runs to break?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260618033809.98253-1-dongtai.guo@linux.dev?part=4


* Re: [PATCH 1/5] LoongArch: BPF: Gate unsupported arena instructions via bpf_jit_supports_insn()
  2026-06-18  3:38 ` [PATCH 1/5] LoongArch: BPF: Gate unsupported arena instructions via bpf_jit_supports_insn() George Guo
@ 2026-06-18  3:53   ` sashiko-bot
  2026-06-18  4:19   ` bot+bpf-ci
  1 sibling, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-06-18  3:53 UTC
  To: George Guo; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] bpf_jit_supports_insn() fails to reject 1-byte and 2-byte atomic instructions on arena pointers, leaving the verifier/JIT mismatch bug open for these instruction sizes.
--

commit 862ea469523044315d6f8a9fd4faf30c77844856
Author: George Guo <guodongtai@kylinos.cn>

LoongArch: BPF: Gate unsupported arena instructions via bpf_jit_supports_insn()

This commit implements bpf_jit_supports_insn() for LoongArch to reject
unsupported BPF arena instructions during verification. This prevents
confusing failures later in the JIT compiler.

> diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
> --- a/arch/loongarch/net/bpf_jit.c
> +++ b/arch/loongarch/net/bpf_jit.c
> @@ -2357,6 +2357,26 @@ bool bpf_jit_supports_arena(void)

[ ... ]

> +bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
> +{
> +	if (!in_arena)
> +		return true;
> +
> +	switch (insn->code) {
> +	case BPF_STX | BPF_ATOMIC | BPF_W:
> +	case BPF_STX | BPF_ATOMIC | BPF_DW:

[Severity: Medium]
Are 8-bit and 16-bit atomics supported on arena pointers?

This switch handles the 32-bit and 64-bit atomic instructions, but it omits the
BPF_B and BPF_H sizes. If 1-byte and 2-byte atomics are used on an arena
pointer, will this return true and cause the verifier to accept them, leading
to the same JIT failure this patch is trying to fix?

> +		/* Atomics on arena pointers are not implemented yet. */
> +		return false;

[ ... ]
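
The BPF_B/BPF_H question above can be illustrated with a standalone
user-space model. The opcode values mirror the BPF UAPI encoding;
arena_insn_supported() is a hypothetical stand-in for the gating logic, not
the code from the patch, and it leaves out the sign-extending load case. It
matches on instruction class and mode only, so no atomic size slips through:

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcode fields, mirroring the BPF UAPI encoding. */
#define BPF_STX    0x03
#define BPF_W      0x00
#define BPF_H      0x08
#define BPF_B      0x10
#define BPF_DW     0x18
#define BPF_MEM    0x60
#define BPF_ATOMIC 0xc0

#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_MODE(code)  ((code) & 0xe0)

/*
 * Hypothetical helper (assumption, not the patch): reject every
 * BPF_STX | BPF_ATOMIC opcode on arena pointers, whatever its size field.
 */
static bool arena_insn_supported(uint8_t code, bool in_arena)
{
	if (!in_arena)
		return true;

	/* The size bits (0x18) are deliberately ignored here. */
	if (BPF_CLASS(code) == BPF_STX && BPF_MODE(code) == BPF_ATOMIC)
		return false;

	return true;
}
```

In this model, BPF_STX | BPF_ATOMIC | BPF_H on an arena pointer is rejected
just like the BPF_W and BPF_DW forms, while a plain BPF_STX | BPF_MEM store
is still accepted.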

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260618033809.98253-1-dongtai.guo@linux.dev?part=1


* Re: [PATCH 2/5] LoongArch: BPF: Add private stack support
  2026-06-18  3:38 ` [PATCH 2/5] LoongArch: BPF: Add private stack support George Guo
@ 2026-06-18  3:55   ` sashiko-bot
  0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-06-18  3:55 UTC (permalink / raw)
  To: George Guo; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [High] Per-CPU private stack memory leak upon JIT compilation failure.
- [Low] Undeclared identifier 'LOONGARCH_GPR_U0' causes build failure.
- [Low] Typo in the private stack guard check error message (`%sx` instead of `%s`).

Pre-existing issues:
- [Critical] Incorrect BPF_TAIL_CALL_CNT_PTR_STACK_OFF offset calculation when arena or exception boundary features are used.
--

commit ec3426d8bc4121c18182b9bd92e8adc7fa663dc5
Author: George Guo <guodongtai@kylinos.cn>

LoongArch: BPF: Add private stack support

Support per-program private stacks to reduce kernel stack pressure for deep
or recursive tracing programs. The stack depth is computed and allocated
per-CPU with guard regions. If a private stack is used, the BPF frame
pointer is pointed into this region instead of the kernel stack.

> diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
> --- a/arch/loongarch/net/bpf_jit.c
> +++ b/arch/loongarch/net/bpf_jit.c
> @@ -18,8 +18,13 @@
>  
>  #define REG_TCC		LOONGARCH_GPR_A6
>  #define REG_ARENA	LOONGARCH_GPR_S6 /* For storing arena_vm_start */
> +#define REG_PRIV_SP	LOONGARCH_GPR_S5 /* For storing the private stack pointer */
>  #define BPF_TAIL_CALL_CNT_PTR_STACK_OFF(stack) (round_up(stack, 16) - 80)

[Severity: Critical]
This isn't a bug introduced by this patch, but is the tail call count
offset calculation correct when arena or exception boundary features are
enabled? When ctx->arena_vm_start or is_exception_prog is true, 8 additional
bytes are reserved on the stack, which shifts the actual location of TCC_PTR.
Because the macro doesn't account for this shift, will tail calls load the
TCC counter value instead of the pointer address, causing a kernel panic
when it's dereferenced?
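
The arithmetic behind this concern can be sketched in user space. The frame
layout below is an assumption inferred from the review text and from the
"- 80" in the macro: eight 8-byte callee-saved slots, an optional extra
slot, then TCC, then TCC_PTR. The helper names are hypothetical and are not
taken from bpf_jit.c:

```c
#include <stdbool.h>

#define SLOT 8	/* sizeof(long) on LoongArch64 */
#define ROUND_UP16(x) (((x) + 15) & ~15)

/* Same arithmetic as BPF_TAIL_CALL_CNT_PTR_STACK_OFF() in the quoted patch. */
#define TCC_PTR_OFF_MACRO(stack) (ROUND_UP16(stack) - 80)

/*
 * Hypothetical model of where the prologue stores TCC_PTR, assuming
 * slots are handed out downwards from the top of the frame.
 */
static int tcc_ptr_off_actual(int stack, bool extra_slot)
{
	int off = ROUND_UP16(stack);

	off -= 8 * SLOT;	/* assumed: eight callee-saved registers */
	if (extra_slot)
		off -= SLOT;	/* arena / exception boundary slot */
	off -= SLOT;		/* TCC */
	off -= SLOT;		/* TCC_PTR */
	return off;
}

/* In this model TCC sits one slot above TCC_PTR. */
static int tcc_off_actual(int stack, bool extra_slot)
{
	return tcc_ptr_off_actual(stack, extra_slot) + SLOT;
}
```

Under these assumptions, a 96-byte frame without the extra slot puts TCC_PTR
at offset 16, which is what the macro yields. With the extra slot, TCC_PTR
moves to offset 8 and the macro's offset 16 lands on the TCC slot instead.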

[ ... ]

> +static void emit_percpu_ptr(struct jit_ctx *ctx, u8 dst, void __percpu *ptr)
> +{
> +	move_imm(ctx, dst, (__force long)ptr, false);
> +#ifdef CONFIG_SMP
> +	/* dst += __my_cpu_offset, held in $r21 */
> +	emit_insn(ctx, addd, dst, dst, LOONGARCH_GPR_U0);
> +#endif
> +}

[Severity: Low]
Does LOONGARCH_GPR_U0 exist in enum loongarch_gpr? It appears missing from
the loongarch include headers, which would cause a build failure when
CONFIG_SMP is enabled.

[ ... ]

> +static void priv_stack_check_guard(void __percpu *priv_stack_ptr, int alloc_size,
> +				   struct bpf_prog *prog)
> +{
[ ... ]
> +		if (stack_ptr[0] != PRIV_STACK_GUARD_VAL ||
> +		    stack_ptr[1] != PRIV_STACK_GUARD_VAL ||
> +		    stack_ptr[underflow_idx] != PRIV_STACK_GUARD_VAL ||
> +		    stack_ptr[underflow_idx + 1] != PRIV_STACK_GUARD_VAL) {
> +			pr_err("BPF private stack overflow/underflow detected for prog %sx\n",
> +			       bpf_jit_get_prog_name(prog));

[Severity: Low]
Is the "%sx" format specifier intentional here? This looks like a typo that
would append a literal 'x' to the end of the program name in the error log.
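
The effect of the stray character can be reproduced in user space with
snprintf(), which treats "%sx" the same way the kernel's printk formatting
does: "%s" is the conversion and "x" is literal text. format_guard_msg() is
a hypothetical helper written only for this illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the tail of the guard message either with
 * the "%sx" specifier from the quoted patch or with a plain "%s".
 */
static void format_guard_msg(char *buf, size_t len, const char *name,
			     bool with_typo)
{
	if (with_typo)
		snprintf(buf, len, "detected for prog %sx", name);
	else
		snprintf(buf, len, "detected for prog %s", name);
}
```

For a program named "my_prog", the first form produces "detected for prog
my_progx" and the second produces "detected for prog my_prog".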

[ ... ]

>  	if (!prog->is_func || extra_pass) {
>  		if (extra_pass && ctx.idx != jit_data->ctx.idx) {
>  			pr_err_once("multi-func JIT bug %d != %d\n",
>  				    ctx.idx, jit_data->ctx.idx);
>  			goto out_free;
>  		}
[ ... ]
>  out_offset:
> +		/*
> +		 * A NULL ro_header here means the JIT failed, so release the
> +		 * private stack that was allocated above; on success the
> +		 * program keeps it until bpf_jit_free().
> +		 */
> +		if (!ro_header && priv_stack_ptr) {
> +			free_percpu(priv_stack_ptr);
> +			prog->aux->priv_stack_ptr = NULL;
> +		}
>  		kvfree(ctx.offset);
> +out_priv_stack:
>  		kfree(jit_data);
>  		prog->aux->jit_data = NULL;
>  	}
>  
>  	return prog;
>  
>  out_free:
>  	if (extra_pass) {
>  		prog->bpf_func = NULL;
>  		prog->jited = 0;
>  		prog->jited_len = 0;
>  	}
>  
>  	if (header) {
>  		bpf_arch_text_copy(&ro_header->size, &header->size, sizeof(header->size));
>  		bpf_jit_binary_pack_free(ro_header, header);
>  	}
>  	goto out_offset;
>  }

[Severity: High]
Does this code leak the private stack allocation when JIT compilation fails?
When jumping to out_free, ro_header is freed but the local pointer is not set
to NULL.

Because ro_header is still non-NULL when execution falls through to
out_offset, the free_percpu() cleanup block is skipped. Furthermore, because
prog->jited is set to 0, bpf_jit_free() will also skip freeing it, resulting
in a permanent leak of the per-CPU allocation.
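
The control flow described above can be modelled in user space. This is a
simplified sketch with hypothetical names: booleans stand in for the
allocations, and the clear_ro_header flag models one possible remedy
(resetting the local pointer after the image is freed), which is an
assumption and not something the patch or the review prescribes:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Model of the failure path: returns true if the private stack is still
 * allocated once the out_free -> out_offset sequence has run.
 */
static bool priv_stack_leaks_on_failure(bool clear_ro_header)
{
	int image;			/* stands in for the JIT image */
	void *ro_header = &image;	/* non-NULL: image was allocated */
	bool priv_stack_live = true;	/* per-CPU stack was allocated */

	/* out_free: the image is released, the pointer value is stale. */
	if (clear_ro_header)
		ro_header = NULL;	/* hypothetical fix */

	/* out_offset: cleanup only runs when ro_header is NULL. */
	if (!ro_header && priv_stack_live)
		priv_stack_live = false;	/* models free_percpu() */

	return priv_stack_live;
}
```

In this model the private stack survives the failure path when the stale
pointer is left in place, and is released when the pointer is cleared before
reaching out_offset.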

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260618033809.98253-1-dongtai.guo@linux.dev?part=2


* Re: [PATCH 3/5] LoongArch: BPF: Add exceptions (bpf_throw) support
  2026-06-18  3:38 ` [PATCH 3/5] LoongArch: BPF: Add exceptions (bpf_throw) support George Guo
@ 2026-06-18  3:55   ` sashiko-bot
  0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-06-18  3:55 UTC (permalink / raw)
  To: George Guo; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [Critical] `arch_bpf_stack_walk()` uses inline assembly to capture the live `$r22` register, incorrectly assuming it holds the unmodified BPF frame pointer.

Pre-existing issues:
- [High] Tail calls within exception programs (or arena programs) cause a kernel panic (NULL pointer dereference) due to a hardcoded stack offset macro.
--

commit 7327176c407071584a340964ddba1f924e66533a
Author: George Guo <guodongtai@kylinos.cn>

LoongArch: BPF: Add exceptions (bpf_throw) support

Implement BPF exception support, advertised via
bpf_jit_supports_exceptions(). bpf_throw() unwinds the stack to find the
exception boundary program's frame and then invokes its exception
callback with that frame's stack and frame pointers.

> diff --git a/arch/loongarch/kernel/stacktrace.c b/arch/loongarch/kernel/stacktrace.c
> index 387dc4d3c4868..718c98b3f1fc4 100644
> --- a/arch/loongarch/kernel/stacktrace.c
> +++ b/arch/loongarch/kernel/stacktrace.c
[ ... ]
> @@ -40,6 +41,57 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
>  	}
>  }
>  
> +#ifdef CONFIG_UNWINDER_ORC
[ ... ]
> +void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp),
> +			 void *cookie)
> +{
> +	unsigned long fp;
> +
> +	/*
> +	 * Capture the live frame pointer ($r22/$fp) here, before handing off to
> +	 * the worker. The kernel is built with -fomit-frame-pointer, so $fp is
> +	 * an ordinary callee-saved register that is preserved across the call
> +	 * from the JITed BPF program into bpf_throw() down to here, and thus
> +	 * still points at the innermost BPF frame. The ORC frame-pointer
> +	 * fallback walks the BPF frames up to the exception boundary from it.
> +	 *
> +	 * This must be a thin wrapper with no large stack locals: the worker
> +	 * uses $r22 to address its frame, which would clobber the live $fp
> +	 * before it could be read. __builtin_frame_address() cannot be used
> +	 * either, as it is $sp-derived and would yield a kernel-stack frame.
> +	 */
> +	asm volatile("move %0, $r22" : "=r"(fp));

[Severity: Critical]
Could the compiler clobber the live $r22 register before we capture it
here?

Since the kernel is built with -fomit-frame-pointer, bpf_throw() is
compiled as a standard C function and the compiler might use $r22 as a
general-purpose callee-saved register for local variables (such as the
cookie argument).

If $r22 is clobbered during bpf_throw(), this inline assembly will capture
the clobbered value instead of the unmodified BPF frame pointer. The
exception callback then blindly trusts this propagated value (passed via
$a2) to anchor its stack frame:

arch/loongarch/net/bpf_jit.c:build_prologue() {
    ...
    move_reg(ctx, LOONGARCH_GPR_FP, LOONGARCH_GPR_A2);
    emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_FP, -stack_adjust);
    ...
}

Could this allow a forged chain of frame pointers to dictate the callback's
return address, leading to a severe regression where arbitrary kernel code
execution is possible?

> +	walk_stackframe_bpf(consume_fn, cookie, fp);
> +}
> +#endif /* CONFIG_UNWINDER_ORC */
> +
>  int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
>  			     void *cookie, struct task_struct *task)
>  {

> diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
> index c410b02e64bef..22527428f0b34 100644
> --- a/arch/loongarch/net/bpf_jit.c
> +++ b/arch/loongarch/net/bpf_jit.c
[ ... ]
> @@ -203,7 +223,7 @@ static void build_prologue(struct jit_ctx *ctx)
>  	store_offset -= sizeof(long);
>  	emit_insn(ctx, std, LOONGARCH_GPR_S5, LOONGARCH_GPR_SP, store_offset);
>  
> -	if (ctx->arena_vm_start) {
> +	if (ctx->arena_vm_start || is_exception_prog) {
>  		store_offset -= sizeof(long);
>  		emit_insn(ctx, std, REG_ARENA, LOONGARCH_GPR_SP, store_offset);
>  	}

[Severity: High]
This isn't a bug introduced by this patch, but does forcing REG_ARENA to be
pushed for exception programs expose an issue with the tail call layout?

When is_exception_prog is true, this adds an extra 8 bytes to the stack.
However, emit_bpf_tail_call() relies on a hardcoded stack offset macro:

arch/loongarch/net/bpf_jit.c:emit_bpf_tail_call() {
    ...
    int tcc_ptr_off = BPF_TAIL_CALL_CNT_PTR_STACK_OFF(ctx->stack_size);
    ...
    emit_insn(ctx, ldd, t3, REG_TCC, 0);
    ...
}

Since BPF_TAIL_CALL_CNT_PTR_STACK_OFF(ctx->stack_size) evaluates to a fixed
offset (stack_size - 80) regardless of whether REG_ARENA was pushed, does
this cause emit_bpf_tail_call() to incorrectly fetch the TCC integer
instead of TCC_PTR?

If it fetches the integer (often 0) and immediately dereferences it, would
tail calls within exception programs or arena programs cause a NULL pointer
dereference?

> @@ -212,6 +232,7 @@ static void build_prologue(struct jit_ctx *ctx)
>  
>  	emit_insn(ctx, addid, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, stack_adjust);
>  
> +setup_bpf_fp:
>  	if (ctx->priv_sp_used) {
>  		/* Set up the private stack pointer and the BPF frame pointer */
>  		void __percpu *priv_stack_ptr;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260618033809.98253-1-dongtai.guo@linux.dev?part=3


* Re: [PATCH 1/5] LoongArch: BPF: Gate unsupported arena instructions via bpf_jit_supports_insn()
  2026-06-18  3:38 ` [PATCH 1/5] LoongArch: BPF: Gate unsupported arena instructions via bpf_jit_supports_insn() George Guo
  2026-06-18  3:53   ` sashiko-bot
@ 2026-06-18  4:19   ` bot+bpf-ci
  1 sibling, 0 replies; 11+ messages in thread
From: bot+bpf-ci @ 2026-06-18  4:19 UTC (permalink / raw)
  To: dongtai.guo, chenhuacai, yangtiezhu, hengqi.chen
  Cc: kernel, ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
	yonghong.song, jolsa, shuah, loongarch, linux-kernel, bpf,
	linux-kselftest, guodongtai, ast, andrii, daniel, martin.lau,
	eddyz87, yonghong.song, clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 1273 bytes --]

>     The JIT does not implement atomics on arena pointers (BPF_PROBE_ATOMIC)
>     nor sign-extending loads from the arena (BPF_PROBE_MEM32SX). Without a
>     bpf_jit_supports_insn() callback the verifier assumes both are available,
>     so such programs are accepted only to fail later in the JIT with a
>     confusing -EINVAL 'unknown opcode'.
>     Implement bpf_jit_supports_insn() to reject these instructions in the
>     arena case. The verifier then rejects the program early with a clear
>     message ('BPF_ATOMIC stores into R<n> ... is not allowed' / 'sign
>     extending loads from arena are not supported yet'). Regular arena
>     accesses (BPF_PROBE_MEM32 loads/stores of all sizes) remain supported.
>     Signed-off-by: George Guo <guodongtai@kylinos.cn>

This looks like it fixes a gap that was introduced when arena support
was first advertised via bpf_jit_supports_arena() returning true.
Should this carry a Fixes: tag?

  Fixes: 4fdb5dd8aeba ("LoongArch: BPF: Implement bpf_addr_space_cast instruction")


---
AI reviewed your patch. Please fix the bug or reply by email explaining why it is not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27735575341


end of thread, other threads:[~2026-06-18  4:19 UTC | newest]

Thread overview: 11+ messages
2026-06-18  3:38 [PATCH 0/5] LoongArch: BPF: arena instruction gating, private stack and exceptions George Guo
2026-06-18  3:38 ` [PATCH 1/5] LoongArch: BPF: Gate unsupported arena instructions via bpf_jit_supports_insn() George Guo
2026-06-18  3:53   ` sashiko-bot
2026-06-18  4:19   ` bot+bpf-ci
2026-06-18  3:38 ` [PATCH 2/5] LoongArch: BPF: Add private stack support George Guo
2026-06-18  3:55   ` sashiko-bot
2026-06-18  3:38 ` [PATCH 3/5] LoongArch: BPF: Add exceptions (bpf_throw) support George Guo
2026-06-18  3:55   ` sashiko-bot
2026-06-18  3:38 ` [PATCH 4/5] selftests/bpf: Add LoongArch deny list George Guo
2026-06-18  3:52   ` sashiko-bot
2026-06-18  3:38 ` [PATCH 5/5] selftests/bpf: Enable struct_ops private stack test for LoongArch George Guo
