* [PATCH bpf-next v2 00/11] LoongArch: BPF: arena features, exceptions, private stack and may_goto
@ 2026-07-02 2:23 George Guo
2026-07-02 2:23 ` [PATCH bpf-next v2 01/11] LoongArch: BPF: Fix tail call count pointer offset for arena programs George Guo
` (11 more replies)
0 siblings, 12 replies; 17+ messages in thread
From: George Guo @ 2026-07-02 2:23 UTC
To: Huacai Chen, Tiezhu Yang, Hengqi Chen, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko
Cc: WANG Xuerui, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
George Guo, bpf, loongarch, linux-kernel
This series adds the remaining LoongArch BPF JIT features and enables the
matching selftests:
- internal-only MOV for per-CPU address resolution
- timed may_goto
- per-program private stacks
- exceptions (bpf_throw)
- sign-extending loads from arena (PROBE_MEM32SX)
- atomics on arena pointers (PROBE_ATOMIC)
- selftests: struct_ops private stack, arena LDSX, arena atomics, and a
LoongArch deny list
Patch 1 ("LoongArch: BPF: Fix tail call count pointer offset for arena
programs") is the same fix already posted separately for the bpf tree;
several patches in this series (notably exceptions, patch 5) build on
top of the helper it introduces, so it is included here, unchanged, to
keep the series self-contained and applicable on bpf-next. It can be
dropped once it lands via the bpf tree.
LoongArch: BPF: Fix tail call count pointer offset for arena programs
https://lore.kernel.org/all/20260629085511.359546-1-dongtai.guo@linux.dev
Based on 7.2-rc1.
v2:
- Included the tail call count pointer offset fix as patch 1/11 (see
above) instead of only referencing it, so the series applies cleanly
on bpf-next without waiting on the bpf tree; the empty prog_array
off-by-one was fixed independently upstream by commit 0379d10f09bc
("LoongArch: BPF: Fix off-by-one error in tail call"), now in 7.2-rc1.
- timed may_goto: store $ra at the ORC-mandated .ra_offset = -8 slot in
the trampoline (it was stored at the wrong slot in v1), per Tiezhu
Yang's review.
- Consolidated the earlier per-CPU MOV / timed may_goto and arena gating
postings into one series; rebased on 7.2-rc1.
- selftests: use the upstream __arch_loongarch / "LOONGARCH" test_loader
support now in 7.2-rc1 instead of introducing it.
- Prior postings:
https://lore.kernel.org/all/20260609041407.122384-1-dongtai.guo@linux.dev
https://lore.kernel.org/all/20260618033809.98253-1-dongtai.guo@linux.dev
George Guo (11):
LoongArch: BPF: Fix tail call count pointer offset for arena programs
LoongArch: BPF: Support internal-only MOV to resolve per-CPU addrs
LoongArch: BPF: Add timed may_goto support
LoongArch: BPF: Add private stack support
LoongArch: BPF: Add exceptions (bpf_throw) support
LoongArch: BPF: Support sign-extending loads from arena
LoongArch: BPF: Support atomics on arena pointers
selftests/bpf: Enable struct_ops private stack test for LoongArch
selftests/bpf: Enable arena LDSX tests on LoongArch
selftests/bpf: Enable arena atomics tests on LoongArch
selftests/bpf: Add LoongArch deny list
arch/loongarch/include/asm/inst.h | 1 +
arch/loongarch/kernel/stacktrace.c | 52 +++
arch/loongarch/net/Makefile | 2 +-
arch/loongarch/net/bpf_jit.c | 397 ++++++++++++++++--
arch/loongarch/net/bpf_jit.h | 1 +
arch/loongarch/net/bpf_timed_may_goto.S | 47 +++
.../testing/selftests/bpf/DENYLIST.loongarch | 1 +
.../bpf/prog_tests/struct_ops_private_stack.c | 2 +-
.../selftests/bpf/progs/arena_atomics.c | 3 +
.../selftests/bpf/progs/verifier_ldsx.c | 17 +
10 files changed, 485 insertions(+), 38 deletions(-)
create mode 100644 arch/loongarch/net/bpf_timed_may_goto.S
create mode 100644 tools/testing/selftests/bpf/DENYLIST.loongarch
--
2.25.1
* [PATCH bpf-next v2 01/11] LoongArch: BPF: Fix tail call count pointer offset for arena programs
2026-07-02 2:23 [PATCH bpf-next v2 00/11] LoongArch: BPF: arena features, exceptions, private stack and may_goto George Guo
@ 2026-07-02 2:23 ` George Guo
2026-07-02 2:35 ` sashiko-bot
2026-07-02 2:23 ` [PATCH bpf-next v2 02/11] LoongArch: BPF: Support internal-only MOV to resolve per-CPU addrs George Guo
` (10 subsequent siblings)
11 siblings, 1 reply; 17+ messages in thread
From: George Guo @ 2026-07-02 2:23 UTC
To: Huacai Chen, Tiezhu Yang, Hengqi Chen, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko
Cc: WANG Xuerui, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
George Guo, bpf, loongarch, linux-kernel
From: George Guo <guodongtai@kylinos.cn>
The tail call count (TCC) and its pointer occupy the two deepest slots of
the callee-saved area set up by build_prologue(). An arena program reserves
one extra word for REG_ARENA (arena_vm_start) right above them:
ra fp s0 s1 s2 s3 s4 s5 <- 8 words
[ REG_ARENA ] <- only if ctx->arena_vm_start
tail_call_cnt
tail_call_cnt_ptr <- loaded on tail call / bpf2bpf call
BPF_TAIL_CALL_CNT_PTR_STACK_OFF() hardcodes the pointer's offset as
round_up(stack, 16) - 80 (ten 8-byte slots below the top of the frame),
which is only correct when the REG_ARENA slot is absent.
For an arena program the extra word shifts every slot below it down by 8
bytes, so the macro resolves to the tail_call_cnt slot (the counter value)
instead of tail_call_cnt_ptr. The JIT then loads the counter value and
dereferences it as the TCC pointer, corrupting memory or panicking the
kernel whenever an arena program performs a tail call or a bpf2bpf call.
Replace the macro with a helper that accounts for the REG_ARENA slot,
mirroring the reservation logic in build_prologue().
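A minimal userspace sketch of the offset arithmetic, assuming 8-byte slots stored top-down in the order shown in the layout above. All names in it (model_stack_size(), slot_off(), old_macro(), new_helper(), check_tcc_ptr_offsets()) are hypothetical and only model the JIT; this is not the patch's code:
```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model (not kernel code) of the callee-saved area laid out by
 * build_prologue(). Assumptions: 8-byte words stored top-down from
 * stack_size - 8 in the order ra, fp, s0..s5, [s6 if arena],
 * tail_call_cnt, tail_call_cnt_ptr, with the BPF stack below them.
 */
#define WORD 8

static int round_up16(int x)
{
	return (x + 15) & ~15;
}

/* stack_size as computed by the prologue: aligned save area + BPF stack. */
static int model_stack_size(bool arena, int bpf_stack_depth)
{
	int saved = (10 + (arena ? 1 : 0)) * WORD;

	return round_up16(saved) + round_up16(bpf_stack_depth);
}

/* SP-relative offset of the slot stored idx words below the top. */
static int slot_off(int stack_size, int idx)
{
	return stack_size - (idx + 1) * WORD;
}

/* ra=0, fp=1, s0..s5=2..7, then the optional s6 slot, then tcc. */
static int tcc_idx(bool arena)
{
	return arena ? 9 : 8;
}

/* The removed macro: always assumes there is no s6 slot. */
static int old_macro(int stack_size)
{
	return round_up16(stack_size) - 80;
}

/* Same arithmetic as the new tail_call_cnt_ptr_stack_off() helper. */
static int new_helper(int stack_size, bool arena)
{
	int offset = WORD * 10;

	if (arena)
		offset += WORD;
	return round_up16(stack_size) - offset;
}

static void check_tcc_ptr_offsets(int bpf_stack_depth)
{
	int plain = model_stack_size(false, bpf_stack_depth);
	int arena = model_stack_size(true, bpf_stack_depth);

	/* Without s6, old and new agree and both hit tail_call_cnt_ptr. */
	assert(old_macro(plain) == slot_off(plain, tcc_idx(false) + 1));
	assert(new_helper(plain, false) == old_macro(plain));

	/* With s6, the old macro lands on tail_call_cnt (the bug)... */
	assert(old_macro(arena) == slot_off(arena, tcc_idx(true)));
	/* ...while the new helper finds tail_call_cnt_ptr. */
	assert(new_helper(arena, true) == slot_off(arena, tcc_idx(true) + 1));
}
```
check_tcc_ptr_offsets() holds for any BPF stack depth: without the s6 slot the two formulas coincide, and with it only the new one points at tail_call_cnt_ptr.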
This is the same fix already posted separately for the bpf tree; it is
included here, as patch 1/11, only so the rest of this series applies
and builds cleanly on top of it. It can be dropped from this series
once it lands via the bpf tree.
https://lore.kernel.org/all/20260629085511.359546-1-dongtai.guo@linux.dev
Fixes: ef54c517a937 ("LoongArch: BPF: Implement PROBE_MEM32 pseudo instructions")
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/net/bpf_jit.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
index ad7e28375aa9..5e34e9e3f508 100644
--- a/arch/loongarch/net/bpf_jit.c
+++ b/arch/loongarch/net/bpf_jit.c
@@ -25,7 +25,23 @@
#define REG_TCC LOONGARCH_GPR_A6
#define REG_ARENA LOONGARCH_GPR_S6 /* For storing arena_vm_start */
-#define BPF_TAIL_CALL_CNT_PTR_STACK_OFF(stack) (round_up(stack, 16) - 80)
+
+static int tail_call_cnt_ptr_stack_off(struct jit_ctx *ctx)
+{
+ /* Ten words are pushed below the BPF stack: ra, fp, s0-s5, and the
+ * tail call count plus its pointer, which occupy the two deepest
+ * slots of the callee-saved area.
+ */
+ int offset = sizeof(long) * 10;
+
+ /* An arena program reserves one extra word above them (REG_ARENA),
+ * which pushes the tail call count pointer down by one slot.
+ */
+ if (ctx->arena_vm_start)
+ offset += sizeof(long);
+
+ return round_up(ctx->stack_size, 16) - offset;
+}
static const int regmap[] = {
/* return value from in-kernel function, and exit value for eBPF program */
@@ -291,7 +307,7 @@ bool bpf_jit_supports_far_kfunc_call(void)
static int emit_bpf_tail_call(struct jit_ctx *ctx, int insn)
{
int off, tc_ninsn = 0;
- int tcc_ptr_off = BPF_TAIL_CALL_CNT_PTR_STACK_OFF(ctx->stack_size);
+ int tcc_ptr_off = tail_call_cnt_ptr_stack_off(ctx);
u8 a1 = LOONGARCH_GPR_A1;
u8 a2 = LOONGARCH_GPR_A2;
u8 t1 = LOONGARCH_GPR_T1;
@@ -1181,7 +1197,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, bool ext
return ret;
if (insn->src_reg == BPF_PSEUDO_CALL) {
- tcc_ptr_off = BPF_TAIL_CALL_CNT_PTR_STACK_OFF(ctx->stack_size);
+ tcc_ptr_off = tail_call_cnt_ptr_stack_off(ctx);
emit_insn(ctx, ldd, REG_TCC, LOONGARCH_GPR_SP, tcc_ptr_off);
}
--
2.25.1
* [PATCH bpf-next v2 02/11] LoongArch: BPF: Support internal-only MOV to resolve per-CPU addrs
2026-07-02 2:23 [PATCH bpf-next v2 00/11] LoongArch: BPF: arena features, exceptions, private stack and may_goto George Guo
2026-07-02 2:23 ` [PATCH bpf-next v2 01/11] LoongArch: BPF: Fix tail call count pointer offset for arena programs George Guo
@ 2026-07-02 2:23 ` George Guo
2026-07-02 2:23 ` [PATCH bpf-next v2 03/11] LoongArch: BPF: Add timed may_goto support George Guo
` (9 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: George Guo @ 2026-07-02 2:23 UTC
To: Huacai Chen, Tiezhu Yang, Hengqi Chen, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko
Cc: WANG Xuerui, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
George Guo, bpf, loongarch, linux-kernel
From: George Guo <guodongtai@kylinos.cn>
Support the internal-only BPF_MOV instruction that resolves the absolute
address of per-CPU data from its per-CPU offset. This instruction is used
only for internal inlining optimizations between the BPF verifier and the
JITs (e.g. inlining bpf_get_smp_processor_id() and per-CPU map lookups).
LoongArch keeps the per-CPU offset of the current CPU in $r21
(__my_cpu_offset), so resolving a per-CPU address only requires adding
$r21 to the source register, which holds the per-CPU pointer. On
!CONFIG_SMP kernels no addition is emitted and the instruction reduces
to a plain move. Advertise the capability via
bpf_jit_supports_percpu_insn().
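As an illustration only, the following userspace sketch models the resolution with a small array standing in for the per-CPU copies. MODEL_NR_CPUS, struct pcpu_data, cpu_offset() and mov_percpu_addr() are hypothetical names; cpu_offset() plays the role of __my_cpu_offset in $r21:
```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model (not kernel code). Each modelled CPU owns a copy of the
 * per-CPU data; a "per-CPU pointer" is the address of a variable in CPU 0's
 * copy, and cpu_offset(cpu) stands in for __my_cpu_offset ($r21).
 */
#define MODEL_NR_CPUS 4

struct pcpu_data {
	uint64_t hits;
	uint64_t misses;
};

static struct pcpu_data pcpu_copies[MODEL_NR_CPUS];

static uintptr_t cpu_offset(int cpu)
{
	return (uintptr_t)&pcpu_copies[cpu] - (uintptr_t)&pcpu_copies[0];
}

/* What the JIT emits for the internal MOV: dst = src + $r21 on SMP. */
static uint64_t *mov_percpu_addr(uintptr_t src, uintptr_t my_cpu_offset)
{
	uintptr_t dst = src;   /* move dst, src (skipped if dst == src) */

	dst += my_cpu_offset;  /* add.d dst, dst, $r21 */
	return (uint64_t *)dst;
}

static void check_percpu_resolution(void)
{
	uintptr_t misses_ptr = (uintptr_t)&pcpu_copies[0].misses;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		assert(mov_percpu_addr(misses_ptr, cpu_offset(cpu)) ==
		       &pcpu_copies[cpu].misses);
}
```
The !CONFIG_SMP case corresponds to a zero offset, where the resolved address is simply the input pointer.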
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/include/asm/inst.h | 1 +
arch/loongarch/net/bpf_jit.c | 14 ++++++++++++++
2 files changed, 15 insertions(+)
diff --git a/arch/loongarch/include/asm/inst.h b/arch/loongarch/include/asm/inst.h
index 76b723590023..44fb5ad26d1a 100644
--- a/arch/loongarch/include/asm/inst.h
+++ b/arch/loongarch/include/asm/inst.h
@@ -404,6 +404,7 @@ enum loongarch_gpr {
LOONGARCH_GPR_T6,
LOONGARCH_GPR_T7,
LOONGARCH_GPR_T8,
+ LOONGARCH_GPR_U0 = 21, /* Kernel per-CPU base register ($r21) */
LOONGARCH_GPR_FP = 22,
LOONGARCH_GPR_S0 = 23,
LOONGARCH_GPR_S1,
diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
index 5e34e9e3f508..b4208fa3a242 100644
--- a/arch/loongarch/net/bpf_jit.c
+++ b/arch/loongarch/net/bpf_jit.c
@@ -759,6 +759,15 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, bool ext
move_reg(ctx, dst, t1);
break;
}
+ if (insn_is_mov_percpu_addr(insn)) {
+ if (dst != src)
+ move_reg(ctx, dst, src);
+#ifdef CONFIG_SMP
+ /* dst += __my_cpu_offset, held in $r21 */
+ emit_insn(ctx, addd, dst, dst, LOONGARCH_GPR_U0);
+#endif
+ break;
+ }
switch (off) {
case 0:
move_reg(ctx, dst, src);
@@ -2406,6 +2415,11 @@ bool bpf_jit_supports_fsession(void)
return true;
}
+bool bpf_jit_supports_percpu_insn(void)
+{
+ return true;
+}
+
/* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
bool bpf_jit_supports_subprog_tailcalls(void)
{
--
2.25.1
* [PATCH bpf-next v2 03/11] LoongArch: BPF: Add timed may_goto support
2026-07-02 2:23 [PATCH bpf-next v2 00/11] LoongArch: BPF: arena features, exceptions, private stack and may_goto George Guo
2026-07-02 2:23 ` [PATCH bpf-next v2 01/11] LoongArch: BPF: Fix tail call count pointer offset for arena programs George Guo
2026-07-02 2:23 ` [PATCH bpf-next v2 02/11] LoongArch: BPF: Support internal-only MOV to resolve per-CPU addrs George Guo
@ 2026-07-02 2:23 ` George Guo
2026-07-02 2:23 ` [PATCH bpf-next v2 04/11] LoongArch: BPF: Add private stack support George Guo
` (8 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: George Guo @ 2026-07-02 2:23 UTC
To: Huacai Chen, Tiezhu Yang, Hengqi Chen, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko
Cc: WANG Xuerui, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
George Guo, bpf, loongarch, linux-kernel
From: George Guo <guodongtai@kylinos.cn>
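Implement arch_bpf_timed_may_goto() and advertise it through
bpf_jit_supports_timed_may_goto() so that the verifier lowers may_goto
into its timed variant: instead of being bounded by a fixed iteration
count alone, the loop counter is periodically refilled after a
wall-clock check, and the loop is cut off once its time budget expires.
The count and the start timestamp are kept in stack slots reserved in
the program's frame.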
arch_bpf_timed_may_goto() uses a custom calling convention: the verifier
passes the count/timestamp stack offset in BPF_REG_AX and expects the
updated count back in the same register. The JIT call path therefore skips
the usual 'BPF_REG_0 = C return value' move for this helper.
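A rough userspace model of the protocol. It assumes the two-slot {count, timestamp} layout, a 0xffff refill value and a 250 ms budget, modelled on the generic bpf_check_timed_may_goto(); the kernel source is authoritative for all three, and the model_* names are hypothetical:
```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model (not kernel code) of timed may_goto. Time is passed in
 * explicitly so the model is deterministic. Constants are assumptions.
 */
#define MODEL_MAX_TIMED_LOOPS 0xffffULL
#define MODEL_BUDGET_NS (1000000000ULL / 4)

struct model_timed_may_goto {
	uint64_t count;      /* iterations left before the next time check */
	uint64_t timestamp;  /* start time, 0 until the first check */
};

/* Stand-in for the C helper the trampoline calls: returns the new count. */
static uint64_t model_check_timed_may_goto(struct model_timed_may_goto *p,
					   uint64_t now_ns)
{
	if (!p->timestamp) {
		p->timestamp = now_ns;          /* first expiry: start the clock */
		return MODEL_MAX_TIMED_LOOPS;
	}
	if (now_ns - p->timestamp >= MODEL_BUDGET_NS)
		return 0;                       /* time budget exhausted */
	return MODEL_MAX_TIMED_LOOPS;           /* refill the counter */
}

/*
 * One lowered may_goto: returns 1 to keep looping, 0 to leave the loop.
 * 'ax' stands for BPF_REG_AX, which carries both the argument and the
 * helper's result in the custom calling convention.
 */
static int model_may_goto(struct model_timed_may_goto *slot, uint64_t now_ns)
{
	uint64_t ax = slot->count;

	if (ax == 0)
		return 0;
	if (--ax == 0)
		ax = model_check_timed_may_goto(slot, now_ns);
	slot->count = ax;
	return 1;
}

static void check_timed_may_goto(void)
{
	struct model_timed_may_goto s = { .count = 1, .timestamp = 0 };

	assert(model_may_goto(&s, 100) == 1);   /* clock starts, count refilled */
	assert(s.count == MODEL_MAX_TIMED_LOOPS && s.timestamp == 100);

	s.count = 1;
	assert(model_may_goto(&s, 100 + MODEL_BUDGET_NS) == 1); /* times out */
	assert(s.count == 0);
	assert(model_may_goto(&s, 100 + MODEL_BUDGET_NS) == 0); /* loop exits */
}
```
In the JITed code, the helper call inside model_may_goto() corresponds to the jirl to arch_bpf_timed_may_goto(). That is why the result must stay in BPF_REG_AX rather than being copied into BPF_REG_0.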
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/net/Makefile | 2 +-
arch/loongarch/net/bpf_jit.c | 13 ++++++-
arch/loongarch/net/bpf_timed_may_goto.S | 47 +++++++++++++++++++++++++
3 files changed, 60 insertions(+), 2 deletions(-)
create mode 100644 arch/loongarch/net/bpf_timed_may_goto.S
diff --git a/arch/loongarch/net/Makefile b/arch/loongarch/net/Makefile
index 1ec12a0c324a..8d9ddb48f9ea 100644
--- a/arch/loongarch/net/Makefile
+++ b/arch/loongarch/net/Makefile
@@ -4,4 +4,4 @@
#
# Copyright (C) 2022 Loongson Technology Corporation Limited
#
-obj-$(CONFIG_BPF_JIT) += bpf_jit.o
+obj-$(CONFIG_BPF_JIT) += bpf_jit.o bpf_timed_may_goto.o
diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
index b4208fa3a242..bb84b985cb45 100644
--- a/arch/loongarch/net/bpf_jit.c
+++ b/arch/loongarch/net/bpf_jit.c
@@ -1229,7 +1229,13 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, bool ext
move_addr(ctx, t1, func_addr);
emit_insn(ctx, jirl, LOONGARCH_GPR_RA, t1, 0);
- if (insn->src_reg != BPF_PSEUDO_CALL)
+ /*
+ * Call to arch_bpf_timed_may_goto() uses a custom calling
+ * convention with the argument and return value in BPF_REG_AX,
+ * so skip moving the C return value into BPF_REG_0.
+ */
+ if (insn->src_reg != BPF_PSEUDO_CALL &&
+ func_addr != (u64)arch_bpf_timed_may_goto)
move_reg(ctx, regmap[BPF_REG_0], LOONGARCH_GPR_A0);
break;
@@ -2420,6 +2426,11 @@ bool bpf_jit_supports_percpu_insn(void)
return true;
}
+bool bpf_jit_supports_timed_may_goto(void)
+{
+ return true;
+}
+
/* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
bool bpf_jit_supports_subprog_tailcalls(void)
{
diff --git a/arch/loongarch/net/bpf_timed_may_goto.S b/arch/loongarch/net/bpf_timed_may_goto.S
new file mode 100644
index 000000000000..8a4c15418998
--- /dev/null
+++ b/arch/loongarch/net/bpf_timed_may_goto.S
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Author: George Guo <guodongtai@kylinos.cn>
+ * Copyright (C) 2026 KylinSoft Corporation.
+ */
+
+#include <asm/asmmacro.h>
+#include <asm/regdef.h>
+#include <linux/export.h>
+#include <linux/linkage.h>
+
+SYM_FUNC_START(arch_bpf_timed_may_goto)
+ addi.d sp, sp, -64
+ st.d ra, sp, 56
+
+ /* Save BPF registers R0 - R5 (a5, a0 - a4) */
+ st.d a5, sp, 8
+ st.d a0, sp, 16
+ st.d a1, sp, 24
+ st.d a2, sp, 32
+ st.d a3, sp, 40
+ st.d a4, sp, 48
+
+ /*
+ * BPF_REG_AX (t0) holds the offset passed in by the verifier; add it
+ * to BPF_REG_FP (s4) to get the pointer to the count and timestamp,
+ * then pass it as the first argument in a0.
+ *
+ * The verifier emits a load using FP right before this call, so
+ * BPF_REG_FP (s4) is always set up by the JIT in this case.
+ */
+ add.d a0, t0, s4
+ bl bpf_check_timed_may_goto
+ /* BPF_REG_AX (t0) will be stored into count, so move the return value to it. */
+ move t0, a0
+
+ ld.d ra, sp, 56
+ ld.d a5, sp, 8
+ ld.d a0, sp, 16
+ ld.d a1, sp, 24
+ ld.d a2, sp, 32
+ ld.d a3, sp, 40
+ ld.d a4, sp, 48
+ addi.d sp, sp, 64
+
+ jr ra
+SYM_FUNC_END(arch_bpf_timed_may_goto)
--
2.25.1
* [PATCH bpf-next v2 04/11] LoongArch: BPF: Add private stack support
2026-07-02 2:23 [PATCH bpf-next v2 00/11] LoongArch: BPF: arena features, exceptions, private stack and may_goto George Guo
` (2 preceding siblings ...)
2026-07-02 2:23 ` [PATCH bpf-next v2 03/11] LoongArch: BPF: Add timed may_goto support George Guo
@ 2026-07-02 2:23 ` George Guo
2026-07-02 2:23 ` [PATCH bpf-next v2 05/11] LoongArch: BPF: Add exceptions (bpf_throw) support George Guo
` (7 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: George Guo @ 2026-07-02 2:23 UTC
To: Huacai Chen, Tiezhu Yang, Hengqi Chen, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko
Cc: WANG Xuerui, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
George Guo, bpf, loongarch, linux-kernel
From: George Guo <guodongtai@kylinos.cn>
Support per-program private stacks, advertised via
bpf_jit_supports_private_stack(). When the verifier marks a program with
jits_use_priv_stack (e.g. a tracing program whose stack depth is large
enough to qualify), its BPF stack is moved off the kernel stack into a
per-CPU allocation, reducing kernel stack pressure.
The private stack is sized as the verifier-computed stack depth, rounded
up to 16 bytes, plus two 16-byte guard regions for overflow/underflow
detection; the guards are initialised at allocation time and validated
in bpf_jit_free(). S5 (saved and restored, but otherwise unused by the
JIT) holds the private stack pointer, which the prologue computes from
the current CPU's per-CPU offset ($r21).
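The layout and the guard check can be sketched in userspace as follows. The allocation is modelled with calloc() rather than __alloc_percpu_gfp(), only one CPU's copy is shown, and the helper names are hypothetical; the guard size and value mirror PRIV_STACK_GUARD_SZ / PRIV_STACK_GUARD_VAL from the patch:
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace model (not kernel code) of one CPU's private stack:
 *   [16-byte overflow guard][round_up(depth, 16) BPF stack][16-byte underflow guard]
 * BPF r10 (FP) points just above the BPF stack, so a well-behaved program
 * only touches the bytes below it and never either guard.
 */
#define GUARD_SZ 16
#define GUARD_VAL 0xEB9F12345678eb9fULL

static size_t round_up16(size_t x)
{
	return (x + 15) & ~(size_t)15;
}

static size_t priv_stack_alloc_size(size_t stack_depth)
{
	return round_up16(stack_depth) + 2 * GUARD_SZ;
}

/* Word index of BPF r10 within the allocation. */
static size_t bpf_fp_word(size_t stack_depth)
{
	return (GUARD_SZ + round_up16(stack_depth)) / 8;
}

static void guard_init(uint64_t *stk, size_t alloc_size)
{
	size_t under = (alloc_size - GUARD_SZ) / 8;

	stk[0] = stk[1] = GUARD_VAL;
	stk[under] = stk[under + 1] = GUARD_VAL;
}

static int guard_ok(const uint64_t *stk, size_t alloc_size)
{
	size_t under = (alloc_size - GUARD_SZ) / 8;

	return stk[0] == GUARD_VAL && stk[1] == GUARD_VAL &&
	       stk[under] == GUARD_VAL && stk[under + 1] == GUARD_VAL;
}

static void check_priv_stack_guards(void)
{
	size_t depth = 40;
	size_t alloc = priv_stack_alloc_size(depth);   /* 48 + 32 = 80 bytes */
	uint64_t *stk = calloc(alloc / 8, sizeof(*stk));
	size_t fp = bpf_fp_word(depth);                /* word 8 */

	assert(stk != NULL);
	guard_init(stk, alloc);

	stk[fp - 1] = 42;   /* *(u64 *)(r10 - 8): inside the frame */
	assert(guard_ok(stk, alloc));

	stk[fp] = 0;        /* *(u64 *)(r10 + 0): hits the underflow guard */
	assert(!guard_ok(stk, alloc));
	free(stk);
}
```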
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/net/bpf_jit.c | 112 ++++++++++++++++++++++++++++++++++-
arch/loongarch/net/bpf_jit.h | 1 +
2 files changed, 110 insertions(+), 3 deletions(-)
diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
index bb84b985cb45..3822e05a0779 100644
--- a/arch/loongarch/net/bpf_jit.c
+++ b/arch/loongarch/net/bpf_jit.c
@@ -25,6 +25,7 @@
#define REG_TCC LOONGARCH_GPR_A6
#define REG_ARENA LOONGARCH_GPR_S6 /* For storing arena_vm_start */
+#define REG_PRIV_SP LOONGARCH_GPR_S5 /* For storing the private stack pointer */
static int tail_call_cnt_ptr_stack_off(struct jit_ctx *ctx)
{
@@ -43,6 +44,10 @@ static int tail_call_cnt_ptr_stack_off(struct jit_ctx *ctx)
return round_up(ctx->stack_size, 16) - offset;
}
+/* Memory size/value to protect private stack overflow/underflow */
+#define PRIV_STACK_GUARD_SZ 16
+#define PRIV_STACK_GUARD_VAL 0xEB9F12345678eb9fULL
+
static const int regmap[] = {
/* return value from in-kernel function, and exit value for eBPF program */
[BPF_REG_0] = LOONGARCH_GPR_A5,
@@ -63,6 +68,15 @@ static const int regmap[] = {
[BPF_REG_AX] = LOONGARCH_GPR_T0,
};
+static void emit_percpu_ptr(struct jit_ctx *ctx, u8 dst, void __percpu *ptr)
+{
+ move_imm(ctx, dst, (__force long)ptr, false);
+#ifdef CONFIG_SMP
+ /* dst += __my_cpu_offset, held in $r21 */
+ emit_insn(ctx, addd, dst, dst, LOONGARCH_GPR_U0);
+#endif
+}
+
static void prepare_bpf_tail_call_cnt(struct jit_ctx *ctx, int *store_offset)
{
const struct bpf_prog *prog = ctx->prog;
@@ -164,7 +178,14 @@ static void build_prologue(struct jit_ctx *ctx)
stack_adjust += 8;
stack_adjust = round_up(stack_adjust, 16);
- stack_adjust += bpf_stack_adjust;
+
+ /*
+ * When a private stack is used the BPF stack lives in a per-CPU
+ * allocation rather than on the kernel stack, so only the non-BPF
+ * part is reserved here.
+ */
+ if (!ctx->priv_sp_used)
+ stack_adjust += bpf_stack_adjust;
/*
* Save the original return address to a temporary register to prevent
@@ -219,8 +240,16 @@ static void build_prologue(struct jit_ctx *ctx)
emit_insn(ctx, addid, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, stack_adjust);
- if (bpf_stack_adjust)
+ if (ctx->priv_sp_used) {
+ /* Set up the private stack pointer and the BPF frame pointer */
+ void __percpu *priv_stack_ptr;
+
+ priv_stack_ptr = prog->aux->priv_stack_ptr + PRIV_STACK_GUARD_SZ;
+ emit_percpu_ptr(ctx, REG_PRIV_SP, priv_stack_ptr);
+ emit_insn(ctx, addid, regmap[BPF_REG_FP], REG_PRIV_SP, bpf_stack_adjust);
+ } else if (bpf_stack_adjust) {
emit_insn(ctx, addid, regmap[BPF_REG_FP], LOONGARCH_GPR_SP, bpf_stack_adjust);
+ }
ctx->stack_size = stack_adjust;
@@ -2225,6 +2254,39 @@ int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
return ret < 0 ? ret : ret * LOONGARCH_INSN_SIZE;
}
+static void priv_stack_init_guard(void __percpu *priv_stack_ptr, int alloc_size)
+{
+ int cpu, underflow_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3;
+ u64 *stack_ptr;
+
+ for_each_possible_cpu(cpu) {
+ stack_ptr = per_cpu_ptr(priv_stack_ptr, cpu);
+ stack_ptr[0] = PRIV_STACK_GUARD_VAL;
+ stack_ptr[1] = PRIV_STACK_GUARD_VAL;
+ stack_ptr[underflow_idx] = PRIV_STACK_GUARD_VAL;
+ stack_ptr[underflow_idx + 1] = PRIV_STACK_GUARD_VAL;
+ }
+}
+
+static void priv_stack_check_guard(void __percpu *priv_stack_ptr, int alloc_size,
+ struct bpf_prog *prog)
+{
+ int cpu, underflow_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3;
+ u64 *stack_ptr;
+
+ for_each_possible_cpu(cpu) {
+ stack_ptr = per_cpu_ptr(priv_stack_ptr, cpu);
+ if (stack_ptr[0] != PRIV_STACK_GUARD_VAL ||
+ stack_ptr[1] != PRIV_STACK_GUARD_VAL ||
+ stack_ptr[underflow_idx] != PRIV_STACK_GUARD_VAL ||
+ stack_ptr[underflow_idx + 1] != PRIV_STACK_GUARD_VAL) {
+ pr_err("BPF private stack overflow/underflow detected for prog %s\n",
+ bpf_jit_get_prog_name(prog));
+ break;
+ }
+ }
+}
+
struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_prog *prog)
{
bool extra_pass = false;
@@ -2233,7 +2295,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
struct jit_ctx ctx;
struct jit_data *jit_data;
struct bpf_binary_header *header;
- struct bpf_binary_header *ro_header;
+ struct bpf_binary_header *ro_header = NULL;
+ void __percpu *priv_stack_ptr = NULL;
+ int priv_stack_alloc_sz;
/*
* If BPF JIT was not enabled then we must fall back to
@@ -2249,6 +2313,22 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
return prog;
prog->aux->jit_data = jit_data;
}
+ priv_stack_ptr = prog->aux->priv_stack_ptr;
+ if (!priv_stack_ptr && prog->aux->jits_use_priv_stack) {
+ /*
+ * Allocate the actual private stack: the verifier-calculated
+ * stack size plus two guard regions to detect overflow and
+ * underflow.
+ */
+ priv_stack_alloc_sz = round_up(prog->aux->stack_depth, 16) +
+ 2 * PRIV_STACK_GUARD_SZ;
+ priv_stack_ptr = __alloc_percpu_gfp(priv_stack_alloc_sz, 16, GFP_KERNEL);
+ if (!priv_stack_ptr)
+ goto out_priv_stack;
+
+ priv_stack_init_guard(priv_stack_ptr, priv_stack_alloc_sz);
+ prog->aux->priv_stack_ptr = priv_stack_ptr;
+ }
if (jit_data->ctx.offset) {
ctx = jit_data->ctx;
ro_header = jit_data->ro_header;
@@ -2264,6 +2344,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
ctx.prog = prog;
ctx.arena_vm_start = bpf_arena_get_kern_vm_start(prog->aux->arena);
ctx.user_vm_start = bpf_arena_get_user_vm_start(prog->aux->arena);
+ ctx.priv_sp_used = priv_stack_ptr ? true : false;
ctx.offset = kvcalloc(prog->len + 1, sizeof(u32), GFP_KERNEL);
if (ctx.offset == NULL)
@@ -2357,7 +2438,17 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
bpf_prog_fill_jited_linfo(prog, ctx.offset + 1);
out_offset:
+ /*
+ * A NULL ro_header here means the JIT failed, so release the
+ * private stack that was allocated above; on success the
+ * program keeps it until bpf_jit_free().
+ */
+ if (!ro_header && priv_stack_ptr) {
+ free_percpu(priv_stack_ptr);
+ prog->aux->priv_stack_ptr = NULL;
+ }
kvfree(ctx.offset);
+out_priv_stack:
kfree(jit_data);
prog->aux->jit_data = NULL;
}
@@ -2374,6 +2465,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
if (header) {
bpf_arch_text_copy(&ro_header->size, &header->size, sizeof(header->size));
bpf_jit_binary_pack_free(ro_header, header);
+ ro_header = NULL;
}
goto out_offset;
}
@@ -2383,6 +2475,8 @@ void bpf_jit_free(struct bpf_prog *prog)
if (prog->jited) {
struct jit_data *jit_data = prog->aux->jit_data;
struct bpf_binary_header *hdr;
+ void __percpu *priv_stack_ptr;
+ int priv_stack_alloc_sz;
/*
* If we fail the final pass of JIT (from jit_subprogs), the
@@ -2395,6 +2489,13 @@ void bpf_jit_free(struct bpf_prog *prog)
}
hdr = bpf_jit_binary_pack_hdr(prog);
bpf_jit_binary_pack_free(hdr, NULL);
+ priv_stack_ptr = prog->aux->priv_stack_ptr;
+ if (priv_stack_ptr) {
+ priv_stack_alloc_sz = round_up(prog->aux->stack_depth, 16) +
+ 2 * PRIV_STACK_GUARD_SZ;
+ priv_stack_check_guard(priv_stack_ptr, priv_stack_alloc_sz, prog);
+ free_percpu(prog->aux->priv_stack_ptr);
+ }
WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(prog));
}
@@ -2431,6 +2532,11 @@ bool bpf_jit_supports_timed_may_goto(void)
return true;
}
+bool bpf_jit_supports_private_stack(void)
+{
+ return true;
+}
+
/* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
bool bpf_jit_supports_subprog_tailcalls(void)
{
diff --git a/arch/loongarch/net/bpf_jit.h b/arch/loongarch/net/bpf_jit.h
index a8e29be35fa8..01a7ea47e79b 100644
--- a/arch/loongarch/net/bpf_jit.h
+++ b/arch/loongarch/net/bpf_jit.h
@@ -22,6 +22,7 @@ struct jit_ctx {
u32 stack_size;
u64 arena_vm_start;
u64 user_vm_start;
+ bool priv_sp_used;
};
struct jit_data {
--
2.25.1
* [PATCH bpf-next v2 05/11] LoongArch: BPF: Add exceptions (bpf_throw) support
2026-07-02 2:23 [PATCH bpf-next v2 00/11] LoongArch: BPF: arena features, exceptions, private stack and may_goto George Guo
` (3 preceding siblings ...)
2026-07-02 2:23 ` [PATCH bpf-next v2 04/11] LoongArch: BPF: Add private stack support George Guo
@ 2026-07-02 2:23 ` George Guo
2026-07-02 2:39 ` sashiko-bot
2026-07-02 2:23 ` [PATCH bpf-next v2 06/11] LoongArch: BPF: Support sign-extending loads from arena George Guo
` (6 subsequent siblings)
11 siblings, 1 reply; 17+ messages in thread
From: George Guo @ 2026-07-02 2:23 UTC
To: Huacai Chen, Tiezhu Yang, Hengqi Chen, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko
Cc: WANG Xuerui, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
George Guo, bpf, loongarch, linux-kernel
From: George Guo <guodongtai@kylinos.cn>
Implement BPF exception support, advertised via
bpf_jit_supports_exceptions(). bpf_throw() unwinds the stack to find the
exception boundary program's frame and then invokes its exception
callback with that frame's stack and frame pointers.
Finding the boundary frame requires arch_bpf_stack_walk(), which reports
each frame's (ip, sp, fp). It is implemented on top of the ORC unwinder:
ORC updates the frame pointer per frame and walks JITed BPF code via its
generated-code frame-pointer fallback, which expects the frame record at
fp-8 ($ra) and fp-16 (previous fp). That is exactly what the LoongArch
BPF prologue already lays down. The capability is therefore gated on
CONFIG_UNWINDER_ORC: with any other unwinder,
bpf_jit_supports_exceptions() returns false.
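The frame-record walk that this fallback relies on can be modelled as below. This is a hypothetical userspace sketch, not the ORC code: fake_stack stands in for memory, and the boundary program's text range and the frame addresses are invented for the example:
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model (not kernel code): each frame record holds $ra at fp-8
 * and the caller's fp at fp-16. "Addresses" are byte offsets into
 * fake_stack.
 */
static uint64_t fake_stack[32];

static uint64_t load(uint64_t addr) { return fake_stack[addr / 8]; }
static void store(uint64_t addr, uint64_t val) { fake_stack[addr / 8] = val; }

typedef bool (*walk_consume_fn)(void *cookie, uint64_t ip, uint64_t fp);

static void fp_walk(uint64_t ip, uint64_t fp, walk_consume_fn consume, void *cookie)
{
	while (ip && fp) {
		if (!consume(cookie, ip, fp))
			return;              /* consumer found what it wanted */
		ip = load(fp - 8);           /* return address into the caller */
		fp = load(fp - 16);          /* caller's frame pointer */
	}
}

/* Stop at the first frame whose ip lies in the boundary program. */
struct boundary_search {
	uint64_t lo, hi;     /* hypothetical boundary program text range */
	uint64_t found_fp;
	int frames;
};

static bool find_boundary(void *cookie, uint64_t ip, uint64_t fp)
{
	struct boundary_search *b = cookie;

	b->frames++;
	if (ip >= b->lo && ip < b->hi) {
		b->found_fp = fp;
		return false;
	}
	return true;
}

static void check_boundary_walk(void)
{
	struct boundary_search b = { .lo = 0x1000, .hi = 0x1200 };

	/* Innermost (subprog) frame at fp 0x40 returns into the boundary at 0x1100. */
	store(0x40 - 8, 0x1100);
	store(0x40 - 16, 0x80);
	/* Boundary frame at fp 0x80 returns into kernel code; the outer frame ends the chain. */
	store(0x80 - 8, 0x9000);
	store(0x80 - 16, 0xc0);

	fp_walk(0x1500, 0x40, find_boundary, &b);
	assert(b.found_fp == 0x80 && b.frames == 2);
}
```
The consumer's early false return mirrors how arch_bpf_stack_walk() stops as soon as the callback has found the boundary frame.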
The walk is seeded with the live frame pointer ($r22). The kernel is
built with -fomit-frame-pointer, so $fp is an ordinary callee-saved
register preserved across the call from the JITed program into
bpf_throw() down to arch_bpf_stack_walk(), where it still points at the
innermost BPF frame for the ORC fallback to start from. It is captured
in a thin wrapper with no large stack locals, because the worker that
runs the unwind uses $r22 to address its own (pt_regs + unwind_state)
frame and would otherwise clobber the live $fp before it could be read.
On the JIT side, the exception callback does not build a normal frame:
it receives the boundary program's frame pointer as its third argument
(a2), sets FP to it and SP to FP - stack_size, and reuses the boundary's
frame. Because the callee-saved register saves are anchored at the top
of the frame (FP), the existing FP-relative epilogue restores the
boundary's registers and returns to the boundary's caller regardless of
the two programs' individual frame sizes. To keep the boundary and the
callback agreeing on the layout, the s6 slot is always reserved for
exception programs, mirroring the arena case.
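Why the two frame sizes do not matter can be checked with a small model. It assumes, as described above, that saved register slot i sits at SP + stack_adjust - 8 * (i + 1), i.e. FP - 8 * (i + 1). The frame addresses and register values are made up, and all names are hypothetical:
```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model (not kernel code) of the exception callback reusing the
 * boundary's frame.
 */
#define NSAVED 9   /* ra, fp, s0..s5, s6 (always reserved for exception progs) */

static uint64_t mem[64];   /* fake stack; index = address / 8 */

struct frame {
	uint64_t sp, fp;
	int adjust;        /* this program's stack_adjust */
};

static uint64_t *slot(const struct frame *f, int i)
{
	return &mem[(f->sp + f->adjust - 8 * (i + 1)) / 8];
}

/* Boundary: normal prologue, SP -= adjust, save registers, FP = entry SP. */
static struct frame boundary_prologue(uint64_t entry_sp, int adjust,
				      const uint64_t regs[NSAVED])
{
	struct frame f = { entry_sp - adjust, entry_sp, adjust };

	for (int i = 0; i < NSAVED; i++)
		*slot(&f, i) = regs[i];
	return f;
}

/* Callback: FP = a2 (boundary FP), SP = FP - its own adjust, nothing saved. */
static struct frame callback_prologue(uint64_t a2, int adjust)
{
	struct frame f = { a2 - adjust, a2, adjust };

	return f;
}

/* Shared epilogue: reload the saved registers, return the SP after it. */
static uint64_t epilogue(const struct frame *f, uint64_t out[NSAVED])
{
	for (int i = 0; i < NSAVED; i++)
		out[i] = *slot(f, i);
	return f->sp + f->adjust;
}

static void check_callback_reuses_frame(void)
{
	const uint64_t regs[NSAVED] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
	uint64_t out[NSAVED];
	struct frame boundary = boundary_prologue(0x200, 96, regs);
	struct frame cb = callback_prologue(boundary.fp, 48);

	assert(epilogue(&cb, out) == 0x200);   /* back at the boundary's entry SP */
	for (int i = 0; i < NSAVED; i++)
		assert(out[i] == regs[i]);     /* boundary's registers restored */
}
```
Even though the callback's stack_adjust (48) differs from the boundary's (96), both resolve every saved slot to the same FP-relative address.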
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/kernel/stacktrace.c | 52 ++++++++++++++++++++++++++++
arch/loongarch/net/bpf_jit.c | 54 ++++++++++++++++++++++++++----
2 files changed, 100 insertions(+), 6 deletions(-)
diff --git a/arch/loongarch/kernel/stacktrace.c b/arch/loongarch/kernel/stacktrace.c
index 387dc4d3c486..718c98b3f1fc 100644
--- a/arch/loongarch/kernel/stacktrace.c
+++ b/arch/loongarch/kernel/stacktrace.c
@@ -4,6 +4,7 @@
*
* Copyright (C) 2022 Loongson Technology Corporation Limited
*/
+#include <linux/filter.h>
#include <linux/sched.h>
#include <linux/stacktrace.h>
#include <linux/uaccess.h>
@@ -40,6 +41,57 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
}
}
+#ifdef CONFIG_UNWINDER_ORC
+/*
+ * Used by BPF exception support (bpf_throw) to find the exception boundary
+ * frame. The ORC unwinder reports the stack and frame pointer of each frame
+ * and, via its generated-code fallback, can walk JITed BPF frames, which set
+ * up the expected frame record ($ra at fp-8, previous fp at fp-16).
+ */
+static noinline void walk_stackframe_bpf(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp),
+ void *cookie, unsigned long fp)
+{
+ unsigned long addr;
+ struct pt_regs dummyregs;
+ struct pt_regs *regs = &dummyregs;
+ struct unwind_state state;
+
+ regs->regs[3] = (unsigned long)__builtin_frame_address(0);
+ regs->csr_era = (unsigned long)__builtin_return_address(0);
+ regs->regs[1] = 0;
+ regs->regs[22] = fp;
+
+ for (unwind_start(&state, current, regs);
+ !unwind_done(&state); unwind_next_frame(&state)) {
+ addr = unwind_get_return_address(&state);
+ if (!addr || !consume_fn(cookie, (u64)addr, (u64)state.sp, (u64)state.fp))
+ break;
+ }
+}
+
+void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp),
+ void *cookie)
+{
+ unsigned long fp;
+
+ /*
+ * Capture the live frame pointer ($r22/$fp) here, before handing off to
+ * the worker. The kernel is built with -fomit-frame-pointer, so $fp is
+ * an ordinary callee-saved register that is preserved across the call
+ * from the JITed BPF program into bpf_throw() down to here, and thus
+ * still points at the innermost BPF frame. The ORC frame-pointer
+ * fallback walks the BPF frames up to the exception boundary from it.
+ *
+ * This must be a thin wrapper with no large stack locals: the worker
+ * uses $r22 to address its frame, which would clobber the live $fp
+ * before it could be read. __builtin_frame_address() cannot be used
+ * either, as it is $sp-derived and would yield a kernel-stack frame.
+ */
+ asm volatile("move %0, $r22" : "=r"(fp));
+ walk_stackframe_bpf(consume_fn, cookie, fp);
+}
+#endif /* CONFIG_UNWINDER_ORC */
+
int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
void *cookie, struct task_struct *task)
{
diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
index 3822e05a0779..f172ffc2c011 100644
--- a/arch/loongarch/net/bpf_jit.c
+++ b/arch/loongarch/net/bpf_jit.c
@@ -29,16 +29,20 @@
static int tail_call_cnt_ptr_stack_off(struct jit_ctx *ctx)
{
+ const struct bpf_prog *prog = ctx->prog;
+ const bool is_exception_prog = prog->aux->exception_boundary ||
+ prog->aux->exception_cb;
/* Ten words are pushed below the BPF stack: ra, fp, s0-s5, and the
* tail call count plus its pointer, which occupy the two deepest
* slots of the callee-saved area.
*/
int offset = sizeof(long) * 10;
- /* An arena program reserves one extra word above them (REG_ARENA),
- * which pushes the tail call count pointer down by one slot.
+ /* An arena or exception program reserves one extra word above them
+ * ($s6, see build_prologue()), which pushes the tail call count
+ * pointer down by one slot.
*/
- if (ctx->arena_vm_start)
+ if (ctx->arena_vm_start || is_exception_prog)
offset += sizeof(long);
return round_up(ctx->stack_size, 16) - offset;
@@ -151,6 +155,9 @@ static void prepare_bpf_tail_call_cnt(struct jit_ctx *ctx, int *store_offset)
* +-------------------------+
* | $s5 |
* +-------------------------+
+ * | $s6 (arena/exception) |
+ * | (optional) |
+ * +-------------------------+
* | tcc |
* +-------------------------+
* | tcc_ptr |
@@ -165,6 +172,13 @@ static void build_prologue(struct jit_ctx *ctx)
int i, stack_adjust = 0, store_offset, bpf_stack_adjust;
const struct bpf_prog *prog = ctx->prog;
const bool is_main_prog = !bpf_is_subprog(prog);
+ /*
+ * Exception boundary and callback programs must agree on the frame
+ * layout: the callback reuses the boundary's frame to restore its
+ * callee-saved registers, so the s6 slot is always reserved for them.
+ */
+ const bool is_exception_prog = prog->aux->exception_boundary ||
+ prog->aux->exception_cb;
bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
@@ -174,7 +188,7 @@ static void build_prologue(struct jit_ctx *ctx)
/* To store tcc and tcc_ptr */
stack_adjust += sizeof(long) * 2;
- if (ctx->arena_vm_start)
+ if (ctx->arena_vm_start || is_exception_prog)
stack_adjust += 8;
stack_adjust = round_up(stack_adjust, 16);
@@ -205,6 +219,19 @@ static void build_prologue(struct jit_ctx *ctx)
if (is_main_prog)
emit_insn(ctx, addid, REG_TCC, LOONGARCH_GPR_ZERO, 0);
+ if (prog->aux->exception_cb) {
+ /*
+ * The exception callback receives the boundary program's frame
+ * pointer as its third argument (a2). Reuse that frame so the
+ * (FP-anchored) epilogue restores the boundary's callee-saved
+ * registers and returns to the boundary's caller. The boundary
+ * already saved them, so nothing is pushed here.
+ */
+ move_reg(ctx, LOONGARCH_GPR_FP, LOONGARCH_GPR_A2);
+ emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_FP, -stack_adjust);
+ goto setup_bpf_fp;
+ }
+
emit_insn(ctx, addid, LOONGARCH_GPR_SP, LOONGARCH_GPR_SP, -stack_adjust);
store_offset = stack_adjust - sizeof(long);
@@ -231,7 +258,7 @@ static void build_prologue(struct jit_ctx *ctx)
store_offset -= sizeof(long);
emit_insn(ctx, std, LOONGARCH_GPR_S5, LOONGARCH_GPR_SP, store_offset);
- if (ctx->arena_vm_start) {
+ if (ctx->arena_vm_start || is_exception_prog) {
store_offset -= sizeof(long);
emit_insn(ctx, std, REG_ARENA, LOONGARCH_GPR_SP, store_offset);
}
@@ -240,6 +267,7 @@ static void build_prologue(struct jit_ctx *ctx)
emit_insn(ctx, addid, LOONGARCH_GPR_FP, LOONGARCH_GPR_SP, stack_adjust);
+setup_bpf_fp:
if (ctx->priv_sp_used) {
/* Set up the private stack pointer and the BPF frame pointer */
void __percpu *priv_stack_ptr;
@@ -261,6 +289,9 @@ static void __build_epilogue(struct jit_ctx *ctx, bool is_tail_call)
{
int stack_adjust = ctx->stack_size;
int load_offset;
+ const struct bpf_prog *prog = ctx->prog;
+ const bool is_exception_prog = prog->aux->exception_boundary ||
+ prog->aux->exception_cb;
load_offset = stack_adjust - sizeof(long);
emit_insn(ctx, ldd, LOONGARCH_GPR_RA, LOONGARCH_GPR_SP, load_offset);
@@ -286,7 +317,7 @@ static void __build_epilogue(struct jit_ctx *ctx, bool is_tail_call)
load_offset -= sizeof(long);
emit_insn(ctx, ldd, LOONGARCH_GPR_S5, LOONGARCH_GPR_SP, load_offset);
- if (ctx->arena_vm_start) {
+ if (ctx->arena_vm_start || is_exception_prog) {
load_offset -= sizeof(long);
emit_insn(ctx, ldd, REG_ARENA, LOONGARCH_GPR_SP, load_offset);
}
@@ -2537,6 +2568,17 @@ bool bpf_jit_supports_private_stack(void)
return true;
}
+bool bpf_jit_supports_exceptions(void)
+{
+ /*
+ * Walking kernel and BPF frames from within bpf_throw() relies on
+ * arch_bpf_stack_walk(), which is only implemented for the ORC
+ * unwinder. ORC reports each frame's stack and frame pointer and
+ * walks JITed BPF frames via its frame-pointer fallback.
+ */
+ return IS_ENABLED(CONFIG_UNWINDER_ORC);
+}
+
/* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
bool bpf_jit_supports_subprog_tailcalls(void)
{
--
2.25.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH bpf-next v2 06/11] LoongArch: BPF: Support sign-extending loads from arena
2026-07-02 2:23 [PATCH bpf-next v2 00/11] LoongArch: BPF: arena features, exceptions, private stack and may_goto George Guo
` (4 preceding siblings ...)
2026-07-02 2:23 ` [PATCH bpf-next v2 05/11] LoongArch: BPF: Add exceptions (bpf_throw) support George Guo
@ 2026-07-02 2:23 ` George Guo
2026-07-02 2:23 ` [PATCH bpf-next v2 07/11] LoongArch: BPF: Support atomics on arena pointers George Guo
` (5 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: George Guo @ 2026-07-02 2:23 UTC (permalink / raw)
To: Huacai Chen, Tiezhu Yang, Hengqi Chen, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko
Cc: WANG Xuerui, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
George Guo, bpf, loongarch, linux-kernel
From: George Guo <guodongtai@kylinos.cn>
A sign-extending load from an arena pointer is rewritten by the verifier
to BPF_PROBE_MEM32SX only if the JIT advertises support for it through
bpf_jit_supports_insn(); otherwise the generic helper rejects arena
instructions by default. Add that callback and implement the load: route
BPF_PROBE_MEM32SX through the existing BPF_PROBE_MEM32 path, which adds
the arena base held in REG_ARENA, select the sign-extending load
variants (ld.b/ld.h/ld.w), and register an exception table entry so that
a fault on the access is handled like the other arena probes.
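To make the addressing concrete, here is a minimal userspace sketch of what the emitted `add.d t2, src, REG_ARENA` plus `ld.b/ld.h/ld.w` sequence computes. It assumes the arena can be modelled as a plain byte buffer whose start stands in for REG_ARENA; probe_mem32sx_load() and probe_mem32_load_u8() are hypothetical names, and the fault/exception-table path is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of BPF_LDX | BPF_PROBE_MEM32SX:
 *   dst = *(signed size *)(src + arena_base + off)
 * arena_base stands in for REG_ARENA ($s6); size is 1, 2 or 4 bytes
 * (ld.b, ld.h, ld.w). The result is sign-extended to 64 bits.
 */
static int64_t probe_mem32sx_load(const uint8_t *arena_base, uint64_t src,
				  int16_t off, int size)
{
	const uint8_t *addr = arena_base + src + off; /* add.d t2, src, REG_ARENA */

	assert(size == 1 || size == 2 || size == 4);
	switch (size) {
	case 1: {
		int8_t v;

		memcpy(&v, addr, sizeof(v)); /* ld.b */
		return v;
	}
	case 2: {
		int16_t v;

		memcpy(&v, addr, sizeof(v)); /* ld.h */
		return v;
	}
	case 4: {
		int32_t v;

		memcpy(&v, addr, sizeof(v)); /* ld.w */
		return v;
	}
	default:
		return 0; /* BPF_DW needs no sign extension */
	}
}

/* Zero-extending counterpart (ld.bu), as on the plain PROBE_MEM32 path. */
static uint64_t probe_mem32_load_u8(const uint8_t *arena_base, uint64_t src,
				    int16_t off)
{
	return arena_base[src + off];
}
```

Loading a 0x80 byte yields -128 through the sign-extending path and 128 through the zero-extending one, which is exactly the difference between ld.b and ld.bu.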
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/net/bpf_jit.c | 28 +++++++++++++++++++++++++---
1 file changed, 25 insertions(+), 3 deletions(-)
diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
index f172ffc2c011..4a3b632c1fde 100644
--- a/arch/loongarch/net/bpf_jit.c
+++ b/arch/loongarch/net/bpf_jit.c
@@ -739,7 +739,8 @@ static int add_exception_handler(const struct bpf_insn *insn,
if (BPF_MODE(insn->code) != BPF_PROBE_MEM &&
BPF_MODE(insn->code) != BPF_PROBE_MEMSX &&
- BPF_MODE(insn->code) != BPF_PROBE_MEM32)
+ BPF_MODE(insn->code) != BPF_PROBE_MEM32 &&
+ BPF_MODE(insn->code) != BPF_PROBE_MEM32SX)
return 0;
if (WARN_ON_ONCE(ctx->num_exentries >= ctx->prog->aux->num_exentries))
@@ -1349,10 +1350,16 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, bool ext
case BPF_LDX | BPF_PROBE_MEM32 | BPF_H:
case BPF_LDX | BPF_PROBE_MEM32 | BPF_W:
case BPF_LDX | BPF_PROBE_MEM32 | BPF_DW:
+ /* LDX | PROBE_MEM32SX: dst = *(signed size *)(src + REG_ARENA + off) */
+ case BPF_LDX | BPF_PROBE_MEM32SX | BPF_B:
+ case BPF_LDX | BPF_PROBE_MEM32SX | BPF_H:
+ case BPF_LDX | BPF_PROBE_MEM32SX | BPF_W:
sign_extend = BPF_MODE(code) == BPF_MEMSX ||
- BPF_MODE(code) == BPF_PROBE_MEMSX;
+ BPF_MODE(code) == BPF_PROBE_MEMSX ||
+ BPF_MODE(code) == BPF_PROBE_MEM32SX;
- if (BPF_MODE(code) == BPF_PROBE_MEM32) {
+ if (BPF_MODE(code) == BPF_PROBE_MEM32 ||
+ BPF_MODE(code) == BPF_PROBE_MEM32SX) {
emit_insn(ctx, addd, t2, src, REG_ARENA);
src = t2;
}
@@ -2548,6 +2555,21 @@ bool bpf_jit_supports_arena(void)
return true;
}
+bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
+{
+ if (!in_arena)
+ return true;
+
+ switch (insn->code) {
+ case BPF_STX | BPF_ATOMIC | BPF_W:
+ case BPF_STX | BPF_ATOMIC | BPF_DW:
+ /* Atomics on arena pointers are not implemented yet. */
+ return false;
+ }
+
+ return true;
+}
+
bool bpf_jit_supports_fsession(void)
{
return true;
--
2.25.1
* [PATCH bpf-next v2 07/11] LoongArch: BPF: Support atomics on arena pointers
2026-07-02 2:23 [PATCH bpf-next v2 00/11] LoongArch: BPF: arena features, exceptions, private stack and may_goto George Guo
` (5 preceding siblings ...)
2026-07-02 2:23 ` [PATCH bpf-next v2 06/11] LoongArch: BPF: Support sign-extending loads from arena George Guo
@ 2026-07-02 2:23 ` George Guo
2026-07-02 2:48 ` sashiko-bot
2026-07-02 2:23 ` [PATCH bpf-next v2 08/11] selftests/bpf: Enable struct_ops private stack test for LoongArch George Guo
` (4 subsequent siblings)
11 siblings, 1 reply; 17+ messages in thread
From: George Guo @ 2026-07-02 2:23 UTC (permalink / raw)
To: Huacai Chen, Tiezhu Yang, Hengqi Chen, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko
Cc: WANG Xuerui, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
George Guo, bpf, loongarch, linux-kernel
From: George Guo <guodongtai@kylinos.cn>
Implement atomic operations on arena pointers (BPF_PROBE_ATOMIC): the
read-modify-write ops, atomic_fetch_*, xchg, cmpxchg and
load-acquire/store-release. For each, the arena base held in REG_ARENA
is folded into the address and an exception table entry is registered on
the access so a fault is handled like the other arena probes.
The exception entry must point at the instruction that actually accesses
memory rather than at the last one emitted: the fetch variants append a
zero-extend after the am* op, and cmpxchg accesses memory through the ll
of an ll/sc loop. Generalise add_exception_handler() to take explicit
fault and resume instruction indices. A faulting ll resumes past the
whole ll/sc loop. A single entry on the ll suffices: if the ll faults,
the sc is never reached, and once the ll succeeds the page is mapped, so
the sc cannot fault.
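The fault/resume split can be illustrated with a small userspace model of the fixup word. It assumes the BPF_FIXUP_REG_MASK/BPF_FIXUP_OFFSET_MASK layout from this patch and 4-byte LoongArch instructions. All helper names are hypothetical, and decode_resume_pc() is only modelled on what ex_handler_bpf() has to do given how the offset is encoded here (resume at the fixup field's address minus the stored offset).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout as in the patch: GENMASK(31, 27) and GENMASK(26, 0). */
#define FIXUP_REG_SHIFT		27
#define FIXUP_OFFSET_MASK	((1u << FIXUP_REG_SHIFT) - 1)
#define INSN_SIZE		4u	/* LoongArch instructions are 4 bytes */

/* Hypothetical: address of instruction @idx in a JIT image at @image. */
static uint64_t insn_addr(uint64_t image, int idx)
{
	assert(idx >= 0);
	return image + (uint64_t)idx * INSN_SIZE;
}

/* Resume index for a single am* op at @am_idx, skipping an optional zext. */
static int am_resume_idx(int am_idx, bool zext)
{
	return am_idx + 1 + (zext ? 1 : 0);
}

/*
 * Model of the fixup word built by __add_exception_handler(): the offset
 * is measured backwards from the fixup field to the resume address, so
 * it must be non-negative and fit in 27 bits.
 */
static bool encode_fixup(uint64_t fixup_field, uint64_t resume_pc,
			 unsigned int dst_reg, uint32_t *fixup)
{
	if (fixup_field < resume_pc ||
	    fixup_field - resume_pc > FIXUP_OFFSET_MASK)
		return false;	/* the JIT returns -ERANGE here */
	*fixup = ((uint32_t)dst_reg << FIXUP_REG_SHIFT) |
		 (uint32_t)(fixup_field - resume_pc);
	return true;
}

/* Recover the resume address and the register to clear from the fixup word. */
static uint64_t decode_resume_pc(uint64_t fixup_field, uint32_t fixup,
				 unsigned int *dst_reg)
{
	*dst_reg = fixup >> FIXUP_REG_SHIFT;
	return fixup_field - (fixup & FIXUP_OFFSET_MASK);
}
```

For a fetch op whose am* sits at index 5 and is followed by a zero-extend, the entry resumes at index 7. For a cmpxchg, the fault index is the ll and the resume index is the first instruction after the loop.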
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
arch/loongarch/net/bpf_jit.c | 182 ++++++++++++++++++++++++++++-------
1 file changed, 148 insertions(+), 34 deletions(-)
diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
index 4a3b632c1fde..a7f2d45aef75 100644
--- a/arch/loongarch/net/bpf_jit.c
+++ b/arch/loongarch/net/bpf_jit.c
@@ -441,6 +441,16 @@ static void emit_store_stack_imm64(struct jit_ctx *ctx, int reg, int stack_off,
emit_insn(ctx, std, reg, LOONGARCH_GPR_FP, stack_off);
}
+#define BPF_FIXUP_REG_MASK GENMASK(31, 27)
+#define BPF_FIXUP_OFFSET_MASK GENMASK(26, 0)
+#define REG_DONT_CLEAR_MARKER 0
+
+static int add_exception_handler(const struct bpf_insn *insn,
+ struct jit_ctx *ctx, int dst_reg);
+static int __add_exception_handler(const struct bpf_insn *insn,
+ struct jit_ctx *ctx, int dst_reg,
+ int fault_idx, int resume_idx);
+
static int emit_atomic_rmw(const struct bpf_insn *insn, struct jit_ctx *ctx)
{
const u8 t1 = LOONGARCH_GPR_T1;
@@ -452,9 +462,14 @@ static int emit_atomic_rmw(const struct bpf_insn *insn, struct jit_ctx *ctx)
const s16 off = insn->off;
const s32 imm = insn->imm;
const bool isdw = BPF_SIZE(insn->code) == BPF_DW;
+ const bool arena = BPF_MODE(insn->code) == BPF_PROBE_ATOMIC;
+ bool zext = false;
+ int ret, ll_idx = 0;
move_imm(ctx, t1, off, false);
emit_insn(ctx, addd, t1, dst, t1);
+ if (arena)
+ emit_insn(ctx, addd, t1, t1, REG_ARENA);
move_reg(ctx, t3, src);
switch (imm) {
@@ -510,7 +525,7 @@ static int emit_atomic_rmw(const struct bpf_insn *insn, struct jit_ctx *ctx)
return -EINVAL;
}
emit_insn(ctx, amaddb, src, t1, t3);
- emit_zext_32(ctx, src, true);
+ zext = true;
break;
case BPF_H:
if (!cpu_has_lam_bh) {
@@ -518,11 +533,11 @@ static int emit_atomic_rmw(const struct bpf_insn *insn, struct jit_ctx *ctx)
return -EINVAL;
}
emit_insn(ctx, amaddh, src, t1, t3);
- emit_zext_32(ctx, src, true);
+ zext = true;
break;
case BPF_W:
emit_insn(ctx, amaddw, src, t1, t3);
- emit_zext_32(ctx, src, true);
+ zext = true;
break;
case BPF_DW:
emit_insn(ctx, amaddd, src, t1, t3);
@@ -534,7 +549,7 @@ static int emit_atomic_rmw(const struct bpf_insn *insn, struct jit_ctx *ctx)
emit_insn(ctx, amandd, src, t1, t3);
} else {
emit_insn(ctx, amandw, src, t1, t3);
- emit_zext_32(ctx, src, true);
+ zext = true;
}
break;
case BPF_OR | BPF_FETCH:
@@ -542,7 +557,7 @@ static int emit_atomic_rmw(const struct bpf_insn *insn, struct jit_ctx *ctx)
emit_insn(ctx, amord, src, t1, t3);
} else {
emit_insn(ctx, amorw, src, t1, t3);
- emit_zext_32(ctx, src, true);
+ zext = true;
}
break;
case BPF_XOR | BPF_FETCH:
@@ -550,7 +565,7 @@ static int emit_atomic_rmw(const struct bpf_insn *insn, struct jit_ctx *ctx)
emit_insn(ctx, amxord, src, t1, t3);
} else {
emit_insn(ctx, amxorw, src, t1, t3);
- emit_zext_32(ctx, src, true);
+ zext = true;
}
break;
/* src = atomic_xchg(dst + off, src); */
@@ -562,7 +577,7 @@ static int emit_atomic_rmw(const struct bpf_insn *insn, struct jit_ctx *ctx)
return -EINVAL;
}
emit_insn(ctx, amswapb, src, t1, t3);
- emit_zext_32(ctx, src, true);
+ zext = true;
break;
case BPF_H:
if (!cpu_has_lam_bh) {
@@ -570,11 +585,11 @@ static int emit_atomic_rmw(const struct bpf_insn *insn, struct jit_ctx *ctx)
return -EINVAL;
}
emit_insn(ctx, amswaph, src, t1, t3);
- emit_zext_32(ctx, src, true);
+ zext = true;
break;
case BPF_W:
emit_insn(ctx, amswapw, src, t1, t3);
- emit_zext_32(ctx, src, true);
+ zext = true;
break;
case BPF_DW:
emit_insn(ctx, amswapd, src, t1, t3);
@@ -585,12 +600,14 @@ static int emit_atomic_rmw(const struct bpf_insn *insn, struct jit_ctx *ctx)
case BPF_CMPXCHG:
move_reg(ctx, t2, r0);
if (isdw) {
+ ll_idx = ctx->idx;
emit_insn(ctx, lld, r0, t1, 0);
emit_insn(ctx, bne, t2, r0, 4);
move_reg(ctx, t3, src);
emit_insn(ctx, scd, t3, t1, 0);
emit_insn(ctx, beq, t3, LOONGARCH_GPR_ZERO, -4);
} else {
+ ll_idx = ctx->idx;
emit_insn(ctx, llw, r0, t1, 0);
emit_zext_32(ctx, t2, true);
emit_zext_32(ctx, r0, true);
@@ -600,12 +617,42 @@ static int emit_atomic_rmw(const struct bpf_insn *insn, struct jit_ctx *ctx)
emit_insn(ctx, beq, t3, LOONGARCH_GPR_ZERO, -6);
emit_zext_32(ctx, r0, true);
}
+ /*
+ * On arena the ll may fault (unmapped page); the fault handler then
+ * resumes execution at the fixup address. Only the ll needs an
+ * entry: if it faults the sc is never reached, and once the ll
+ * succeeds the page is mapped so the sc cannot fault. Resume
+ * past the whole ll/sc loop.
+ */
+ if (arena) {
+ ret = __add_exception_handler(insn, ctx,
+ REG_DONT_CLEAR_MARKER,
+ ll_idx, ctx->idx);
+ if (ret)
+ return ret;
+ }
break;
default:
pr_err_once("bpf-jit: invalid atomic read-modify-write opcode %02x\n", imm);
return -EINVAL;
}
+ /*
+ * For the single-instruction am* ops the memory access is the last
+ * emitted instruction; register its exception entry before emitting the
+ * deferred zero-extend so the fault resumes past it. cmpxchg handled
+ * its own entry above.
+ */
+ if (arena && imm != BPF_CMPXCHG) {
+ ret = __add_exception_handler(insn, ctx, REG_DONT_CLEAR_MARKER,
+ ctx->idx - 1, ctx->idx + (zext ? 1 : 0));
+ if (ret)
+ return ret;
+ }
+
+ if (zext)
+ emit_zext_32(ctx, src, true);
+
return 0;
}
@@ -616,10 +663,37 @@ static int emit_atomic_ld_st(const struct bpf_insn *insn, struct jit_ctx *ctx)
const u8 dst = regmap[insn->dst_reg];
const s16 off = insn->off;
const s32 imm = insn->imm;
+ const bool arena = BPF_MODE(insn->code) == BPF_PROBE_ATOMIC;
+ int ret;
switch (imm) {
/* dst_reg = load_acquire(src_reg + off16) */
case BPF_LOAD_ACQ:
+ if (arena) {
+ /* t1 = src + off + arena_vm_start; load from [t1]. */
+ move_imm(ctx, t1, off, false);
+ emit_insn(ctx, addd, t1, src, t1);
+ emit_insn(ctx, addd, t1, t1, REG_ARENA);
+ switch (BPF_SIZE(insn->code)) {
+ case BPF_B:
+ emit_insn(ctx, ldbu, dst, t1, 0);
+ break;
+ case BPF_H:
+ emit_insn(ctx, ldhu, dst, t1, 0);
+ break;
+ case BPF_W:
+ emit_insn(ctx, ldwu, dst, t1, 0);
+ break;
+ case BPF_DW:
+ emit_insn(ctx, ldd, dst, t1, 0);
+ break;
+ }
+ ret = add_exception_handler(insn, ctx, REG_DONT_CLEAR_MARKER);
+ if (ret)
+ return ret;
+ emit_insn(ctx, dbar, 0b10100);
+ break;
+ }
switch (BPF_SIZE(insn->code)) {
case BPF_B:
if (is_signed_imm12(off)) {
@@ -658,6 +732,31 @@ static int emit_atomic_ld_st(const struct bpf_insn *insn, struct jit_ctx *ctx)
break;
/* store_release(dst_reg + off16, src_reg) */
case BPF_STORE_REL:
+ if (arena) {
+ /* t1 = dst + off + arena_vm_start; store to [t1]. */
+ emit_insn(ctx, dbar, 0b10010);
+ move_imm(ctx, t1, off, false);
+ emit_insn(ctx, addd, t1, dst, t1);
+ emit_insn(ctx, addd, t1, t1, REG_ARENA);
+ switch (BPF_SIZE(insn->code)) {
+ case BPF_B:
+ emit_insn(ctx, stb, src, t1, 0);
+ break;
+ case BPF_H:
+ emit_insn(ctx, sth, src, t1, 0);
+ break;
+ case BPF_W:
+ emit_insn(ctx, stw, src, t1, 0);
+ break;
+ case BPF_DW:
+ emit_insn(ctx, std, src, t1, 0);
+ break;
+ }
+ ret = add_exception_handler(insn, ctx, REG_DONT_CLEAR_MARKER);
+ if (ret)
+ return ret;
+ break;
+ }
emit_insn(ctx, dbar, 0b10010);
switch (BPF_SIZE(insn->code)) {
case BPF_B:
@@ -708,10 +807,6 @@ static bool is_signed_bpf_cond(u8 cond)
cond == BPF_JSGE || cond == BPF_JSLE;
}
-#define BPF_FIXUP_REG_MASK GENMASK(31, 27)
-#define BPF_FIXUP_OFFSET_MASK GENMASK(26, 0)
-#define REG_DONT_CLEAR_MARKER 0
-
bool ex_handler_bpf(const struct exception_table_entry *ex,
struct pt_regs *regs)
{
@@ -725,12 +820,21 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
return true;
}
-/* For accesses to BTF pointers, add an entry to the exception table */
-static int add_exception_handler(const struct bpf_insn *insn,
- struct jit_ctx *ctx,
- int dst_reg)
+/*
+ * Register an exception table entry for a faulting instruction.
+ *
+ * @fault_idx is the ctx->image index of the instruction that may fault;
+ * @resume_idx is the index to resume execution at after the fault is handled.
+ * For a simple load/store these are the just-emitted instruction and the one
+ * right after it, but an atomic may need to fault on an instruction in the
+ * middle of a longer sequence (e.g. the ll of an ll/sc cmpxchg loop) and
+ * resume past the whole sequence, so both are passed explicitly.
+ */
+static int __add_exception_handler(const struct bpf_insn *insn,
+ struct jit_ctx *ctx, int dst_reg,
+ int fault_idx, int resume_idx)
{
- unsigned long pc;
+ unsigned long pc, resume_pc;
off_t ins_offset, fixup_offset;
struct exception_table_entry *ex;
@@ -740,20 +844,22 @@ static int add_exception_handler(const struct bpf_insn *insn,
if (BPF_MODE(insn->code) != BPF_PROBE_MEM &&
BPF_MODE(insn->code) != BPF_PROBE_MEMSX &&
BPF_MODE(insn->code) != BPF_PROBE_MEM32 &&
- BPF_MODE(insn->code) != BPF_PROBE_MEM32SX)
+ BPF_MODE(insn->code) != BPF_PROBE_MEM32SX &&
+ BPF_MODE(insn->code) != BPF_PROBE_ATOMIC)
return 0;
if (WARN_ON_ONCE(ctx->num_exentries >= ctx->prog->aux->num_exentries))
return -EINVAL;
ex = &ctx->prog->aux->extable[ctx->num_exentries];
- pc = (unsigned long)&ctx->ro_image[ctx->idx - 1];
+ pc = (unsigned long)&ctx->ro_image[fault_idx];
+ resume_pc = (unsigned long)&ctx->ro_image[resume_idx];
/*
* This is the relative offset of the instruction that may fault from
* the exception table itself. This will be written to the exception
* table and if this instruction faults, the destination register will
- * be set to '0' and the execution will jump to the next instruction.
+ * be set to '0' and the execution will jump to @resume_pc.
*/
ins_offset = pc - (long)&ex->insn;
if (WARN_ON_ONCE(ins_offset >= 0 || ins_offset < INT_MIN))
@@ -767,10 +873,10 @@ static int add_exception_handler(const struct bpf_insn *insn,
* modifying the upper bits because the table is already sorted, and
* isn't part of the main exception table.
*
- * The fixup_offset is set to the next instruction from the instruction
- * that may fault. The execution will jump to this after handling the fault.
+ * The fixup_offset is set to the resume instruction. The execution will
+ * jump to this after handling the fault.
*/
- fixup_offset = (long)&ex->fixup - (pc + LOONGARCH_INSN_SIZE);
+ fixup_offset = (long)&ex->fixup - resume_pc;
if (!FIELD_FIT(BPF_FIXUP_OFFSET_MASK, fixup_offset))
return -ERANGE;
@@ -789,6 +895,14 @@ static int add_exception_handler(const struct bpf_insn *insn,
return 0;
}
+/* The faulting instruction is the one just emitted; resume at the next. */
+static int add_exception_handler(const struct bpf_insn *insn,
+ struct jit_ctx *ctx, int dst_reg)
+{
+ return __add_exception_handler(insn, ctx, dst_reg,
+ ctx->idx - 1, ctx->idx);
+}
+
static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, bool extra_pass)
{
u8 tm = -1;
@@ -1545,6 +1659,10 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, bool ext
case BPF_STX | BPF_ATOMIC | BPF_H:
case BPF_STX | BPF_ATOMIC | BPF_W:
case BPF_STX | BPF_ATOMIC | BPF_DW:
+ case BPF_STX | BPF_PROBE_ATOMIC | BPF_B:
+ case BPF_STX | BPF_PROBE_ATOMIC | BPF_H:
+ case BPF_STX | BPF_PROBE_ATOMIC | BPF_W:
+ case BPF_STX | BPF_PROBE_ATOMIC | BPF_DW:
if (!bpf_atomic_is_load_store(insn))
ret = emit_atomic_rmw(insn, ctx);
else
@@ -2557,16 +2675,12 @@ bool bpf_jit_supports_arena(void)
bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
{
- if (!in_arena)
- return true;
-
- switch (insn->code) {
- case BPF_STX | BPF_ATOMIC | BPF_W:
- case BPF_STX | BPF_ATOMIC | BPF_DW:
- /* Atomics on arena pointers are not implemented yet. */
- return false;
- }
-
+ /*
+ * All arena access instructions are implemented: regular and
+ * sign-extending loads/stores (BPF_PROBE_MEM32 / BPF_PROBE_MEM32SX)
+ * and atomics (BPF_PROBE_ATOMIC). The default weak helper rejects
+ * everything, so the override is required to enable arena programs.
+ */
return true;
}
--
2.25.1
* [PATCH bpf-next v2 08/11] selftests/bpf: Enable struct_ops private stack test for LoongArch
2026-07-02 2:23 [PATCH bpf-next v2 00/11] LoongArch: BPF: arena features, exceptions, private stack and may_goto George Guo
` (6 preceding siblings ...)
2026-07-02 2:23 ` [PATCH bpf-next v2 07/11] LoongArch: BPF: Support atomics on arena pointers George Guo
@ 2026-07-02 2:23 ` George Guo
2026-07-02 2:23 ` [PATCH bpf-next v2 09/11] selftests/bpf: Enable arena LDSX tests on LoongArch George Guo
` (3 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: George Guo @ 2026-07-02 2:23 UTC (permalink / raw)
To: Huacai Chen, Tiezhu Yang, Hengqi Chen, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko
Cc: WANG Xuerui, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
George Guo, bpf, loongarch, linux-kernel
From: George Guo <guodongtai@kylinos.cn>
LoongArch now supports BPF private stacks via
bpf_jit_supports_private_stack(), so let the struct_ops private stack
runtime test (private_stack / private_stack_fail / private_stack_recur)
run there instead of being skipped.
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
.../testing/selftests/bpf/prog_tests/struct_ops_private_stack.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/struct_ops_private_stack.c b/tools/testing/selftests/bpf/prog_tests/struct_ops_private_stack.c
index 98db9bafa44b..fcc86f1dfd15 100644
--- a/tools/testing/selftests/bpf/prog_tests/struct_ops_private_stack.c
+++ b/tools/testing/selftests/bpf/prog_tests/struct_ops_private_stack.c
@@ -5,7 +5,7 @@
#include "struct_ops_private_stack_fail.skel.h"
#include "struct_ops_private_stack_recur.skel.h"
-#if defined(__x86_64__) || defined(__aarch64__) || defined(__powerpc64__)
+#if defined(__x86_64__) || defined(__aarch64__) || defined(__powerpc64__) || defined(__loongarch__)
static void test_private_stack(void)
{
struct struct_ops_private_stack *skel;
--
2.25.1
* [PATCH bpf-next v2 09/11] selftests/bpf: Enable arena LDSX tests on LoongArch
2026-07-02 2:23 [PATCH bpf-next v2 00/11] LoongArch: BPF: arena features, exceptions, private stack and may_goto George Guo
` (7 preceding siblings ...)
2026-07-02 2:23 ` [PATCH bpf-next v2 08/11] selftests/bpf: Enable struct_ops private stack test for LoongArch George Guo
@ 2026-07-02 2:23 ` George Guo
2026-07-02 2:23 ` [PATCH bpf-next v2 10/11] selftests/bpf: Enable arena atomics " George Guo
` (2 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: George Guo @ 2026-07-02 2:23 UTC (permalink / raw)
To: Huacai Chen, Tiezhu Yang, Hengqi Chen, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko
Cc: WANG Xuerui, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
George Guo, bpf, loongarch, linux-kernel
From: George Guo <guodongtai@kylinos.cn>
The verifier test_loader only runs an __arch_*-tagged test when the tag
matches the running architecture. The arena sign-extending load (LDSX)
subtests in verifier_ldsx are tagged for x86_64 and arm64 only, so they
are skipped on LoongArch even though the JIT now supports the
instruction.
Tag the arena LDSX subtests (disasm, exception, S8, S16, S32) with
__arch_loongarch, and add the expected LoongArch JIT disassembly to the
disasm subtest, so they run and are checked there.
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
.../testing/selftests/bpf/progs/verifier_ldsx.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/verifier_ldsx.c b/tools/testing/selftests/bpf/progs/verifier_ldsx.c
index 41340877dc9d..55039dde3dc5 100644
--- a/tools/testing/selftests/bpf/progs/verifier_ldsx.c
+++ b/tools/testing/selftests/bpf/progs/verifier_ldsx.c
@@ -286,6 +286,19 @@ __jited("add x11, x0, x28")
__jited("ldrsh x22, [x11, #0x18]")
__jited("add x11, x0, x28")
__jited("ldrsb x22, [x11, #0x20]")
+__arch_loongarch
+__jited("add.d $t2, $a5, $s6")
+__jited("ld.w $s2, $t2, 16")
+__jited("add.d $t2, $a5, $s6")
+__jited("ld.h $s2, $t2, 24")
+__jited("add.d $t2, $a5, $s6")
+__jited("ld.b $s2, $t2, 32")
+__jited("add.d $t2, $a0, $s6")
+__jited("ld.w $s3, $t2, 16")
+__jited("add.d $t2, $a0, $s6")
+__jited("ld.h $s3, $t2, 24")
+__jited("add.d $t2, $a0, $s6")
+__jited("ld.b $s3, $t2, 32")
__naked void arena_ldsx_disasm(void *ctx)
{
asm volatile (
@@ -317,6 +330,7 @@ __description("Arena LDSX Exception")
__success __retval(0)
__arch_x86_64
__arch_arm64
+__arch_loongarch
__naked void arena_ldsx_exception(void *ctx)
{
asm volatile (
@@ -338,6 +352,7 @@ __description("Arena LDSX, S8")
__success __retval(-1)
__arch_x86_64
__arch_arm64
+__arch_loongarch
__naked void arena_ldsx_s8(void *ctx)
{
asm volatile (
@@ -369,6 +384,7 @@ __description("Arena LDSX, S16")
__success __retval(-1)
__arch_x86_64
__arch_arm64
+__arch_loongarch
__naked void arena_ldsx_s16(void *ctx)
{
asm volatile (
@@ -400,6 +416,7 @@ __description("Arena LDSX, S32")
__success __retval(-1)
__arch_x86_64
__arch_arm64
+__arch_loongarch
__naked void arena_ldsx_s32(void *ctx)
{
asm volatile (
--
2.25.1
* [PATCH bpf-next v2 10/11] selftests/bpf: Enable arena atomics tests on LoongArch
2026-07-02 2:23 [PATCH bpf-next v2 00/11] LoongArch: BPF: arena features, exceptions, private stack and may_goto George Guo
` (8 preceding siblings ...)
2026-07-02 2:23 ` [PATCH bpf-next v2 09/11] selftests/bpf: Enable arena LDSX tests on LoongArch George Guo
@ 2026-07-02 2:23 ` George Guo
2026-07-02 2:49 ` sashiko-bot
2026-07-02 2:23 ` [PATCH bpf-next v2 11/11] selftests/bpf: Add LoongArch deny list George Guo
2026-07-03 10:11 ` [PATCH bpf-next v2 00/11] LoongArch: BPF: arena features, exceptions, private stack and may_goto Huacai Chen
11 siblings, 1 reply; 17+ messages in thread
From: George Guo @ 2026-07-02 2:23 UTC (permalink / raw)
To: Huacai Chen, Tiezhu Yang, Hengqi Chen, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko
Cc: WANG Xuerui, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
George Guo, bpf, loongarch, linux-kernel
From: George Guo <guodongtai@kylinos.cn>
Now that the LoongArch JIT implements atomic operations on arena
pointers, add LoongArch to the arena_atomics load-acquire/store-release
architecture guard so those subtests run on LoongArch.
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
tools/testing/selftests/bpf/progs/arena_atomics.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/arena_atomics.c b/tools/testing/selftests/bpf/progs/arena_atomics.c
index 2e7751a85399..38a628b4ee24 100644
--- a/tools/testing/selftests/bpf/progs/arena_atomics.c
+++ b/tools/testing/selftests/bpf/progs/arena_atomics.c
@@ -29,6 +29,7 @@ bool skip_all_tests = true;
#if defined(ENABLE_ATOMICS_TESTS) && \
defined(__BPF_FEATURE_ADDR_SPACE_CAST) && \
(defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_x86) || \
+ defined(__TARGET_ARCH_loongarch) || \
(defined(__TARGET_ARCH_riscv) && __riscv_xlen == 64))
bool skip_lacq_srel_tests __attribute((__section__(".data"))) = false;
#else
@@ -316,6 +317,7 @@ int load_acquire(const void *ctx)
#if defined(ENABLE_ATOMICS_TESTS) && \
defined(__BPF_FEATURE_ADDR_SPACE_CAST) && \
(defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_x86) || \
+ defined(__TARGET_ARCH_loongarch) || \
(defined(__TARGET_ARCH_riscv) && __riscv_xlen == 64))
#define LOAD_ACQUIRE_ARENA(SIZEOP, SIZE, SRC, DST) \
@@ -368,6 +370,7 @@ int store_release(const void *ctx)
#if defined(ENABLE_ATOMICS_TESTS) && \
defined(__BPF_FEATURE_ADDR_SPACE_CAST) && \
(defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_x86) || \
+ defined(__TARGET_ARCH_loongarch) || \
(defined(__TARGET_ARCH_riscv) && __riscv_xlen == 64))
#define STORE_RELEASE_ARENA(SIZEOP, DST, VAL) \
--
2.25.1
* [PATCH bpf-next v2 11/11] selftests/bpf: Add LoongArch deny list
2026-07-02 2:23 [PATCH bpf-next v2 00/11] LoongArch: BPF: arena features, exceptions, private stack and may_goto George Guo
` (9 preceding siblings ...)
2026-07-02 2:23 ` [PATCH bpf-next v2 10/11] selftests/bpf: Enable arena atomics " George Guo
@ 2026-07-02 2:23 ` George Guo
2026-07-03 10:11 ` [PATCH bpf-next v2 00/11] LoongArch: BPF: arena features, exceptions, private stack and may_goto Huacai Chen
11 siblings, 0 replies; 17+ messages in thread
From: George Guo @ 2026-07-02 2:23 UTC (permalink / raw)
To: Huacai Chen, Tiezhu Yang, Hengqi Chen, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko
Cc: WANG Xuerui, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
George Guo, bpf, loongarch, linux-kernel
From: George Guo <guodongtai@kylinos.cn>
Add an empty test_progs deny list for LoongArch, mirroring the other
architectures, so BPF CI and local vmtest have a place to record cases
that cannot pass on LoongArch.
Signed-off-by: George Guo <guodongtai@kylinos.cn>
---
tools/testing/selftests/bpf/DENYLIST.loongarch | 1 +
1 file changed, 1 insertion(+)
create mode 100644 tools/testing/selftests/bpf/DENYLIST.loongarch
diff --git a/tools/testing/selftests/bpf/DENYLIST.loongarch b/tools/testing/selftests/bpf/DENYLIST.loongarch
new file mode 100644
index 000000000000..70cc007e396e
--- /dev/null
+++ b/tools/testing/selftests/bpf/DENYLIST.loongarch
@@ -0,0 +1 @@
+# loongarch64 deny list for BPF CI and local vmtest
--
2.25.1
* Re: [PATCH bpf-next v2 01/11] LoongArch: BPF: Fix tail call count pointer offset for arena programs
2026-07-02 2:23 ` [PATCH bpf-next v2 01/11] LoongArch: BPF: Fix tail call count pointer offset for arena programs George Guo
@ 2026-07-02 2:35 ` sashiko-bot
0 siblings, 0 replies; 17+ messages in thread
From: sashiko-bot @ 2026-07-02 2:35 UTC (permalink / raw)
To: George Guo; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] The newly added multi-line comments violate the BPF subsystem's preferred comment style.
--
commit a7174bab96fc4c6edb488f83bc8733f74eac78c7
Author: George Guo <guodongtai@kylinos.cn>
LoongArch: BPF: Fix tail call count pointer offset for arena programs
This commit replaces the BPF_TAIL_CALL_CNT_PTR_STACK_OFF macro with a helper
function, tail_call_cnt_ptr_stack_off(). This function properly accounts for
the REG_ARENA slot in arena programs, resolving an issue where the JIT loaded
the counter value instead of the tail call count pointer.
Link: https://lore.kernel.org/all/20260629085511.359546-1-dongtai.guo@linux.dev
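The one-slot discrepancy can be checked with a little arithmetic. Below is a minimal userspace model of the frame size and the tail call count pointer offset, assuming WORD = 8 (sizeof(long) on LoongArch64) and offsets measured from $sp after the prologue's stack adjustment. frame_size(), tcc_ptr_off() and old_tcc_ptr_off() are hypothetical names that mirror build_prologue(), the new helper and the old macro respectively.

```c
#include <assert.h>
#include <stdbool.h>

#define WORD 8 /* sizeof(long) on LoongArch64 */

/* Round @x up to a multiple of @align (a power of two), like round_up(). */
static int round_up_pow2(int x, int align)
{
	assert(align > 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Frame size as computed by build_prologue() (simplified model). */
static int frame_size(int bpf_stack_depth, bool has_s6)
{
	int adjust = WORD * 8;		/* ra, fp, s0-s5 */

	adjust += WORD * 2;		/* tcc, tcc_ptr */
	if (has_s6)
		adjust += WORD;		/* REG_ARENA ($s6) */
	adjust = round_up_pow2(adjust, 16);
	return adjust + round_up_pow2(bpf_stack_depth, 16);
}

/* New helper: tcc_ptr is the deepest callee-saved slot. */
static int tcc_ptr_off(int stack_size, bool has_s6)
{
	int offset = WORD * 10;

	if (has_s6)
		offset += WORD;
	return round_up_pow2(stack_size, 16) - offset;
}

/* Old BPF_TAIL_CALL_CNT_PTR_STACK_OFF(): ignores the $s6 slot. */
static int old_tcc_ptr_off(int stack_size)
{
	return round_up_pow2(stack_size, 16) - 80;
}
```

With a 32-byte BPF stack and an arena, the frame is 128 bytes: ra at 120 down to s5 at 64, s6 at 56, tcc at 48 and tcc_ptr at 40. The old macro yields 48, i.e. the tcc slot itself, which matches the "loaded the counter value instead of the pointer" symptom described above.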
> diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
> index ad7e28375aa95..5e34e9e3f5086 100644
> --- a/arch/loongarch/net/bpf_jit.c
> +++ b/arch/loongarch/net/bpf_jit.c
> @@ -25,7 +25,23 @@
>
> #define REG_TCC LOONGARCH_GPR_A6
> #define REG_ARENA LOONGARCH_GPR_S6 /* For storing arena_vm_start */
> -#define BPF_TAIL_CALL_CNT_PTR_STACK_OFF(stack) (round_up(stack, 16) - 80)
> +
> +static int tail_call_cnt_ptr_stack_off(struct jit_ctx *ctx)
> +{
> + /* Ten words are pushed below the BPF stack: ra, fp, s0-s5, and the
> + * tail call count plus its pointer, which occupy the two deepest
> + * slots of the callee-saved area.
> + */
[Severity: Low]
This isn't a bug, but does this multi-line comment format adhere to the BPF
subsystem's preferred style? Multi-line comments in BPF-related paths usually
require the opening /* to be on its own line.
> + int offset = sizeof(long) * 10;
> +
> + /* An arena program reserves one extra word above them (REG_ARENA),
> + * which pushes the tail call count pointer down by one slot.
> + */
[Severity: Low]
This isn't a bug, but this comment also doesn't follow the preferred BPF
subsystem multi-line comment style of placing the opening /* on its own line.
> + if (ctx->arena_vm_start)
> + offset += sizeof(long);
> +
> + return round_up(ctx->stack_size, 16) - offset;
> +}
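The helper's arithmetic can be checked outside the kernel. The sketch below is a userspace model, not the kernel code: `struct model_jit_ctx`, `model_tail_call_cnt_ptr_stack_off()` and `ROUND_UP_POW2()` are hypothetical stand-ins, and the worked numbers assume an LP64 target where `sizeof(long)` is 8. Its multi-line comments use the style the review asks for, with the opening `/*` on its own line.

```c
/* Userspace stand-in for the kernel's round_up(); valid for power-of-two alignments only. */
#define ROUND_UP_POW2(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical reduced jit_ctx: only the two fields the helper reads. */
struct model_jit_ctx {
	int stack_size;
	unsigned long arena_vm_start;
};

static int model_tail_call_cnt_ptr_stack_off(const struct model_jit_ctx *ctx)
{
	/*
	 * Ten words are pushed below the BPF stack: ra, fp, s0-s5, and the
	 * tail call count plus its pointer.
	 */
	int offset = (int)sizeof(long) * 10;

	/*
	 * An arena program saves one extra word (REG_ARENA), which pushes
	 * the tail call count pointer down by one slot.
	 */
	if (ctx->arena_vm_start)
		offset += (int)sizeof(long);

	return ROUND_UP_POW2(ctx->stack_size, 16) - offset;
}
```

With `stack_size = 90` the non-arena result is 96 - 80 = 16, which is what the old `BPF_TAIL_CALL_CNT_PTR_STACK_OFF()` macro produced; an arena program gets 96 - 88 = 8, one word lower.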
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260702022322.51033-1-dongtai.guo@linux.dev?part=1
* Re: [PATCH bpf-next v2 05/11] LoongArch: BPF: Add exceptions (bpf_throw) support
2026-07-02 2:23 ` [PATCH bpf-next v2 05/11] LoongArch: BPF: Add exceptions (bpf_throw) support George Guo
@ 2026-07-02 2:39 ` sashiko-bot
0 siblings, 0 replies; 17+ messages in thread
From: sashiko-bot @ 2026-07-02 2:39 UTC
To: George Guo; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] The BPF frame pointer ($r22) can be clobbered by bpf_throw() before it is captured, causing stack unwinding to fail and crash.
- [Low] Multi-line comment does not follow the required BPF kernel comment style.
--
commit a22e7d51563237179d5ffdc8ffe6290c25f1ddba
Author: George Guo <guodongtai@kylinos.cn>
LoongArch: BPF: Add exceptions (bpf_throw) support
This patch implements BPF exception support for LoongArch. It introduces
arch_bpf_stack_walk() to find the exception boundary frame using the ORC
unwinder, allowing bpf_throw() to properly unwind the stack and invoke
the exception callback.
> diff --git a/arch/loongarch/kernel/stacktrace.c b/arch/loongarch/kernel/stacktrace.c
> index 387dc4d3c4868..718c98b3f1fc4 100644
> --- a/arch/loongarch/kernel/stacktrace.c
> +++ b/arch/loongarch/kernel/stacktrace.c
[ ... ]
> +void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp),
> + void *cookie)
> +{
> + unsigned long fp;
> +
> + /*
> + * Capture the live frame pointer ($r22/$fp) here, before handing off to
> + * the worker. The kernel is built with -fomit-frame-pointer, so $fp is
> + * an ordinary callee-saved register that is preserved across the call
> + * from the JITed BPF program into bpf_throw() down to here, and thus
> + * still points at the innermost BPF frame. The ORC frame-pointer
> + * fallback walks the BPF frames up to the exception boundary from it.
> + *
> + * This must be a thin wrapper with no large stack locals: the worker
> + * uses $r22 to address its frame, which would clobber the live $fp
> + * before it could be read. __builtin_frame_address() cannot be used
> + * either, as it is $sp-derived and would yield a kernel-stack frame.
> + */
> + asm volatile("move %0, $r22" : "=r"(fp));
> + walk_stackframe_bpf(consume_fn, cookie, fp);
> +}
[Severity: High]
Can the C compiler clobber the BPF frame pointer ($r22) before this inline
assembly captures it?
Because the kernel is built with -fomit-frame-pointer, the compiler is free to
use $r22 as a general-purpose callee-saved register. If the intermediate C
caller (bpf_throw) modifies $r22 during its execution, this assembly will
capture the clobbered value instead of the BPF frame pointer.
If the frame pointer is clobbered, the ORC unwinder will fail to find the
exception boundary, leaving ctx.aux as NULL. bpf_throw() then unconditionally
dereferences ctx.aux, which would cause a kernel panic.
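The failure mode can be modelled in userspace to show the consume_fn/cookie protocol of arch_bpf_stack_walk() and one possible defensive check. This is a sketch under stated assumptions, not the LoongArch ORC unwinder: frames are linked through a hypothetical `struct model_frame`, a clobbered `$r22` is modelled as a starting frame that never reaches the exception boundary (or as NULL), and `model_throw()` is a hypothetical stand-in for bpf_throw().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical frame record: what a frame-pointer walk would recover. */
struct model_frame {
	uint64_t ip;
	uint64_t sp;
	const struct model_frame *caller;	/* saved fp: link to the caller's frame */
	bool exception_boundary;		/* frame of the main BPF program */
};

/* Same shape as the consume_fn taken by arch_bpf_stack_walk(). */
typedef bool (*model_consume_fn)(void *cookie, uint64_t ip, uint64_t sp, uint64_t bp);

struct model_throw_ctx {
	const struct model_frame *boundary;	/* stays NULL if never found */
	int cnt;
};

/* Returns true to keep walking, false once the boundary frame is found. */
static bool model_walker(void *cookie, uint64_t ip, uint64_t sp, uint64_t bp)
{
	struct model_throw_ctx *ctx = cookie;
	const struct model_frame *f = (const struct model_frame *)(uintptr_t)bp;

	(void)ip;
	(void)sp;
	ctx->cnt++;
	if (!f->exception_boundary)
		return true;
	ctx->boundary = f;
	return false;
}

/* Follows the fp chain from @fp until the callback asks to stop. */
static void model_stack_walk(model_consume_fn consume, void *cookie,
			     const struct model_frame *fp)
{
	for (const struct model_frame *f = fp; f; f = f->caller)
		if (!consume(cookie, f->ip, f->sp, (uint64_t)(uintptr_t)f))
			break;
}

/* Returns 0 if the boundary was found, -1 instead of dereferencing NULL. */
static int model_throw(const struct model_frame *fp)
{
	struct model_throw_ctx ctx = { 0 };

	model_stack_walk(model_walker, &ctx, fp);
	if (!ctx.boundary)
		return -1;
	return 0;
}
```

A walk that starts from the innermost BPF frame finds the boundary; one that starts from an unrelated frame (the clobbered-fp case) ends with `boundary` still NULL, which is the state the review says bpf_throw() would then dereference.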
> diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
> index 3822e05a07790..f172ffc2c011c 100644
> --- a/arch/loongarch/net/bpf_jit.c
> +++ b/arch/loongarch/net/bpf_jit.c
> @@ -29,16 +29,20 @@
>
> static int tail_call_cnt_ptr_stack_off(struct jit_ctx *ctx)
> {
[ ... ]
> - /* An arena program reserves one extra word above them (REG_ARENA),
> - * which pushes the tail call count pointer down by one slot.
> + /* An arena or exception program reserves one extra word above them
> + * ($s6, see build_prologue()), which pushes the tail call count
> + * pointer down by one slot.
> */
[Severity: Low]
This isn't a bug, but does this multi-line comment follow the BPF subsystem
comment style? The BPF subsystem requires the opening /* of a multi-line
comment to be on its own line.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260702022322.51033-1-dongtai.guo@linux.dev?part=5
* Re: [PATCH bpf-next v2 07/11] LoongArch: BPF: Support atomics on arena pointers
2026-07-02 2:23 ` [PATCH bpf-next v2 07/11] LoongArch: BPF: Support atomics on arena pointers George Guo
@ 2026-07-02 2:48 ` sashiko-bot
0 siblings, 0 replies; 17+ messages in thread
From: sashiko-bot @ 2026-07-02 2:48 UTC
To: George Guo; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues:
- [Critical] Faulting arena atomic operations fail to clear their destination registers, defeating the verifier's zero-extension tracking and enabling arbitrary out-of-bounds memory access.
- [High] Missing exception table entry for the `sc` (Store Conditional) instruction in `BPF_PROBE_ATOMIC` `cmpxchg` allows trivial kernel panics.
Pre-existing issues:
- [High] LoongArch's `ex_handler_bpf` does not report arena violations, breaking standard BPF arena semantics.
--
commit 445332124b454806c0eb995caa9dad9e7668feba
Author: George Guo <guodongtai@kylinos.cn>
LoongArch: BPF: Support atomics on arena pointers
This commit implements atomic operations on arena pointers (BPF_PROBE_ATOMIC)
for LoongArch. It adds exception table entries to handle page faults for
these memory accesses.
> diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
> index 4a3b632c1fde5..a7f2d45aef751 100644
> --- a/arch/loongarch/net/bpf_jit.c
> +++ b/arch/loongarch/net/bpf_jit.c
[ ... ]
> @@ -585,12 +600,14 @@ static int emit_atomic_rmw(const struct bpf_insn *insn, struct jit_ctx *ctx)
> case BPF_CMPXCHG:
> move_reg(ctx, t2, r0);
> if (isdw) {
> + ll_idx = ctx->idx;
> emit_insn(ctx, lld, r0, t1, 0);
> emit_insn(ctx, bne, t2, r0, 4);
> move_reg(ctx, t3, src);
> emit_insn(ctx, scd, t3, t1, 0);
[Severity: High]
Does this Store Conditional (scd) instruction need an exception table entry?
If a BPF program executes a BPF_CMPXCHG operation on a read-only or
Copy-On-Write (COW) arena page, the Load Linked (lld) instruction will
succeed because it is a read, but the subsequent Store Conditional
instruction could trigger a page fault due to the write violation. Since
there is no exception table entry registered for this scd instruction,
would this result in an unhandled kernel page fault oops?
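Whether the scd is covered comes down to which instruction indices have table entries. The sketch below is a simplified userspace model, not the kernel's exception table: `struct model_ex_entry`, `model_fixup()` and the index enum are hypothetical, and the layout assumes the five-instruction 64-bit loop quoted above (lld, bne, move, scd, beq), with the bne branching forward to the first instruction after the loop.

```c
#include <stddef.h>

/* Hypothetical model of one exception-table entry, in JIT instruction indices. */
struct model_ex_entry {
	int fault_idx;	/* instruction that is allowed to fault */
	int resume_idx;	/* where execution continues after the fixup */
};

/* Returns the resume index for a fault at @idx, or -1 if nothing covers it (an oops). */
static int model_fixup(const struct model_ex_entry *tbl, size_t n, int idx)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].fault_idx == idx)
			return tbl[i].resume_idx;
	return -1;
}

/* Offsets of the 64-bit BPF_CMPXCHG loop relative to ll_idx. */
enum { LLD = 0, BNE = 1, MOVE = 2, SCD = 3, BEQ = 4, AFTER_LOOP = 5 };
```

With only the ll registered, a fault at `ll_idx + SCD` finds no entry; adding a second entry for the scd with the same resume point is one way to cover the write-fault case the review describes.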
> emit_insn(ctx, beq, t3, LOONGARCH_GPR_ZERO, -4);
> } else {
[ ... ]
> @@ -600,12 +617,42 @@ static int emit_atomic_rmw(const struct bpf_insn *insn, struct jit_ctx *ctx)
> emit_insn(ctx, beq, t3, LOONGARCH_GPR_ZERO, -6);
> emit_zext_32(ctx, r0, true);
> }
> + /*
> + * On arena the ll may fault (unmapped page); the page-fault
> + * handler restarts the program at @resume. Only the ll needs an
> + * entry: if it faults the sc is never reached, and once the ll
> + * succeeds the page is mapped so the sc cannot fault. Resume
> + * past the whole ll/sc loop.
> + */
> + if (arena) {
> + ret = __add_exception_handler(insn, ctx,
> + REG_DONT_CLEAR_MARKER,
> + ll_idx, ctx->idx);
[Severity: Critical]
Should the actual destination register (r0) be cleared here instead of
using REG_DONT_CLEAR_MARKER? If a fault occurs on the arena pointer, the
destination register retains its old value instead of being cleared.
Because the verifier assumes the register has been bounded (e.g., upper
32 bits zeroed), leaving the hardware register with an unbounded value
might allow out-of-bounds memory accesses.
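The difference between passing REG_DONT_CLEAR_MARKER and passing the destination register shows up in how the fixup word is packed. The sketch below is a userspace model built from the BPF_FIXUP_REG_MASK (bits 31:27), BPF_FIXUP_OFFSET_MASK (bits 26:0) and REG_DONT_CLEAR_MARKER (0) definitions quoted later in this message. `model_encode_fixup()` and `model_apply_fixup()` are hypothetical, and the assumption that the handler zeroes any register other than the marker follows the review's description rather than the full ex_handler_bpf() source.

```c
#include <stdint.h>

/* Field layout as quoted from bpf_jit.c: register in bits 31:27, offset in bits 26:0. */
#define MODEL_FIXUP_REG_SHIFT	27
#define MODEL_FIXUP_REG_MASK	(0x1fu << MODEL_FIXUP_REG_SHIFT)
#define MODEL_FIXUP_OFFSET_MASK	0x07ffffffu
#define MODEL_REG_DONT_CLEAR	0	/* $zero ($r0) is never a real destination */

static uint32_t model_encode_fixup(unsigned int reg, uint32_t offset)
{
	return ((reg << MODEL_FIXUP_REG_SHIFT) & MODEL_FIXUP_REG_MASK) |
	       (offset & MODEL_FIXUP_OFFSET_MASK);
}

/* Zeroes the encoded register unless it is the marker; returns the offset field. */
static uint32_t model_apply_fixup(uint64_t regs[32], uint32_t fixup)
{
	unsigned int reg = (fixup & MODEL_FIXUP_REG_MASK) >> MODEL_FIXUP_REG_SHIFT;

	if (reg != MODEL_REG_DONT_CLEAR)
		regs[reg] = 0;
	/* The real handler derives the resume address from this offset. */
	return fixup & MODEL_FIXUP_OFFSET_MASK;
}
```

Encoding the marker leaves a stale value in the destination after a fault, while encoding the destination register (for example `$r4`) leaves it zeroed.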
> + if (ret)
> + return ret;
> + }
> break;
> default:
> pr_err_once("bpf-jit: invalid atomic read-modify-write opcode %02x\n", imm);
> return -EINVAL;
> }
>
> + /*
> + * For the single-instruction am* ops the memory access is the last
> + * emitted instruction; register its exception entry before emitting the
> + * deferred zero-extend so the fault resumes past it. cmpxchg handled
> + * its own entry above.
> + */
> + if (arena && imm != BPF_CMPXCHG) {
> + ret = __add_exception_handler(insn, ctx, REG_DONT_CLEAR_MARKER,
> + ctx->idx - 1, ctx->idx + (zext ? 1 : 0));
[Severity: Critical]
Is it safe to pass REG_DONT_CLEAR_MARKER for fetch atomics? Similar to
the cmpxchg case, if an exception occurs on an invalid arena pointer,
the source register is not cleared. This can cause a divergence from
the verifier's zero extension tracking and potentially allow out-of-bounds
memory accesses.
> + if (ret)
> + return ret;
> + }
> +
> + if (zext)
> + emit_zext_32(ctx, src, true);
> +
> return 0;
> }
[ ... ]
> @@ -616,10 +663,37 @@ static int emit_atomic_ld_st(const struct bpf_insn *insn, struct jit_ctx *ctx)
> const u8 dst = regmap[insn->dst_reg];
> const s16 off = insn->off;
> const s32 imm = insn->imm;
> + const bool arena = BPF_MODE(insn->code) == BPF_PROBE_ATOMIC;
> + int ret;
>
> switch (imm) {
> /* dst_reg = load_acquire(src_reg + off16) */
> case BPF_LOAD_ACQ:
> + if (arena) {
> + /* t1 = src + off + arena_vm_start; load from [t1]. */
> + move_imm(ctx, t1, off, false);
> + emit_insn(ctx, addd, t1, src, t1);
> + emit_insn(ctx, addd, t1, t1, REG_ARENA);
> + switch (BPF_SIZE(insn->code)) {
> + case BPF_B:
> + emit_insn(ctx, ldbu, dst, t1, 0);
> + break;
> + case BPF_H:
> + emit_insn(ctx, ldhu, dst, t1, 0);
> + break;
> + case BPF_W:
> + emit_insn(ctx, ldwu, dst, t1, 0);
> + break;
> + case BPF_DW:
> + emit_insn(ctx, ldd, dst, t1, 0);
> + break;
> + }
> + ret = add_exception_handler(insn, ctx, REG_DONT_CLEAR_MARKER);
[Severity: Critical]
Could we pass the actual destination register rather than
REG_DONT_CLEAR_MARKER here? If the load-acquire faults on an unmapped
arena pointer, the destination register will not be cleared. The verifier
expects it to be cleared or bounded, which could allow out-of-bounds
memory reads if the register keeps its prior unbounded value.
> + if (ret)
> + return ret;
> + emit_insn(ctx, dbar, 0b10100);
> + break;
> + }
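The arena branch above forms the address in three steps before the load. A userspace model of that arithmetic follows; `model_arena_addr()` is hypothetical, and it assumes, as the `false` (64-bit) argument to move_imm() suggests, that the 16-bit `off` is sign-extended to 64 bits before the two addd instructions.

```c
#include <stdint.h>

/* Effective address of the arena load-acquire: src + sign-extended off + arena_vm_start. */
static uint64_t model_arena_addr(uint64_t src, int16_t off, uint64_t arena_vm_start)
{
	uint64_t t1 = (uint64_t)(int64_t)off;	/* move_imm(ctx, t1, off, false) */

	t1 = src + t1;				/* addd t1, src, t1 */
	return t1 + arena_vm_start;		/* addd t1, t1, REG_ARENA */
}
```

For example, `src = 0x1000` and `off = -8` against a base of `0xffff800000000000` give `0xffff800000000ff8`: a negative offset moves the access down within the arena rather than wrapping into the upper half.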
[ ... ]
> @@ -708,10 +807,6 @@ static bool is_signed_bpf_cond(u8 cond)
> cond == BPF_JSGE || cond == BPF_JSLE;
> }
>
> -#define BPF_FIXUP_REG_MASK GENMASK(31, 27)
> -#define BPF_FIXUP_OFFSET_MASK GENMASK(26, 0)
> -#define REG_DONT_CLEAR_MARKER 0
> -
> bool ex_handler_bpf(const struct exception_table_entry *ex,
> struct pt_regs *regs)
> {
> @@ -725,12 +820,21 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
> return true;
> }
[Severity: High]
This is a pre-existing issue, but does ex_handler_bpf() need to report
arena violations to user space? If a BPF program accesses an out-of-bounds
arena pointer (such as the guard bands), this handler clears the
destination register and resumes execution without delivering a SIGSEGV.
Standard BPF arena semantics usually require calling
bpf_prog_report_arena_violation() in this path to ensure the user space
process is properly signaled.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260702022322.51033-1-dongtai.guo@linux.dev?part=7
* Re: [PATCH bpf-next v2 10/11] selftests/bpf: Enable arena atomics tests on LoongArch
2026-07-02 2:23 ` [PATCH bpf-next v2 10/11] selftests/bpf: Enable arena atomics " George Guo
@ 2026-07-02 2:49 ` sashiko-bot
0 siblings, 0 replies; 17+ messages in thread
From: sashiko-bot @ 2026-07-02 2:49 UTC
To: George Guo; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The patch enables arena atomics tests for LoongArch in `arena_atomics.c` but misses the identical architecture guard in `test_parallel_spmc.bpf.c`, failing to enable the libarena SPMC parallel tests.
--
commit 6a78ab7798074b26b2ec91b91c15b4a58e0531a1
Author: George Guo <guodongtai@kylinos.cn>
selftests/bpf: Enable arena atomics tests on LoongArch
This patch adds LoongArch to the arena_atomics load-acquire/store-release
architecture guards so those subtests are executed.
> diff --git a/tools/testing/selftests/bpf/progs/arena_atomics.c b/tools/testing/selftests/bpf/progs/arena_atomics.c
> index 2e7751a853999..38a628b4ee241 100644
> --- a/tools/testing/selftests/bpf/progs/arena_atomics.c
> +++ b/tools/testing/selftests/bpf/progs/arena_atomics.c
> @@ -29,6 +29,7 @@ bool skip_all_tests = true;
> #if defined(ENABLE_ATOMICS_TESTS) && \
> defined(__BPF_FEATURE_ADDR_SPACE_CAST) && \
> (defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_x86) || \
> + defined(__TARGET_ARCH_loongarch) || \
> (defined(__TARGET_ARCH_riscv) && __riscv_xlen == 64))
[Severity: Medium]
Should this patch also update the architecture guard in
tools/testing/selftests/bpf/libarena/selftests/test_parallel_spmc.bpf.c?
It looks like spmc_tests_enabled() in that file has an identical architecture
guard for arena atomics that currently misses LoongArch. Without adding it
there as well, those parallel SPMC arena atomics tests will be skipped and
return -EOPNOTSUPP.
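Each test file that carries its own copy of the architecture list has to be updated separately. The sketch below isolates only the architecture part of the guard quoted above in a hypothetical helper, `model_arena_atomics_supported()`; the real spmc_tests_enabled() is not quoted in this thread, so its shape and the -EOPNOTSUPP return are assumptions taken from the review text.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for a per-file guard: a file whose list lacks
 * __TARGET_ARCH_loongarch takes the -EOPNOTSUPP branch on LoongArch.
 */
static int model_arena_atomics_supported(void)
{
#if defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_x86) || \
    defined(__TARGET_ARCH_loongarch) || \
    (defined(__TARGET_ARCH_riscv) && __riscv_xlen == 64)
	return 0;
#else
	return -EOPNOTSUPP;
#endif
}
```

Compiled for a host, where no `__TARGET_ARCH_*` macro is defined, it takes the unsupported branch, which is what a LoongArch BPF build does for any file whose copy of the list was not updated.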
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260702022322.51033-1-dongtai.guo@linux.dev?part=10
* Re: [PATCH bpf-next v2 00/11] LoongArch: BPF: arena features, exceptions, private stack and may_goto
2026-07-02 2:23 [PATCH bpf-next v2 00/11] LoongArch: BPF: arena features, exceptions, private stack and may_goto George Guo
` (10 preceding siblings ...)
2026-07-02 2:23 ` [PATCH bpf-next v2 11/11] selftests/bpf: Add LoongArch deny list George Guo
@ 2026-07-03 10:11 ` Huacai Chen
11 siblings, 0 replies; 17+ messages in thread
From: Huacai Chen @ 2026-07-03 10:11 UTC
To: George Guo
Cc: Tiezhu Yang, Hengqi Chen, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, WANG Xuerui, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
George Guo, bpf, loongarch, linux-kernel
Hi, Tiezhu and Hengqi,
Could you please spend some time reviewing this series?
Huacai
On Thu, Jul 2, 2026 at 10:23 AM George Guo <dongtai.guo@linux.dev> wrote:
>
> This series adds the remaining LoongArch BPF JIT features and enables the
> matching selftests:
>
> - internal-only MOV for per-CPU address resolution
> - timed may_goto
> - per-program private stacks
> - exceptions (bpf_throw)
> - sign-extending loads from arena (PROBE_MEM32SX)
> - atomics on arena pointers (PROBE_ATOMIC)
> - selftests: struct_ops private stack, arena LDSX, arena atomics, and a
> LoongArch deny list
>
> Patch 1 ("LoongArch: BPF: Fix tail call count pointer offset for arena
> programs") is the same fix already posted separately for the bpf tree;
> several patches in this series (notably exceptions, patch 5) build on
> top of the helper it introduces, so it is included here, unchanged, to
> keep the series self-contained and applicable on bpf-next. It can be
> dropped once it lands via the bpf tree.
>
> LoongArch: BPF: Fix tail call count pointer offset for arena programs
> https://lore.kernel.org/all/20260629085511.359546-1-dongtai.guo@linux.dev
>
> Based on 7.2-rc1.
>
> v2:
> - Included the tail call count pointer offset fix as patch 1/11 (see
> above) instead of only referencing it, so the series applies cleanly
> on bpf-next without waiting on the bpf tree; the empty prog_array
> off-by-one was fixed independently upstream by commit 0379d10f09bc
> ("LoongArch: BPF: Fix off-by-one error in tail call"), now in 7.2-rc1.
> - timed may_goto: store $ra at the ORC-mandated .ra_offset = -8 slot in
> the trampoline (it was stored at the wrong slot in v1), per Tiezhu
> Yang's review.
> - Consolidated the earlier per-CPU MOV / timed may_goto and arena gating
> postings into one series; rebased on 7.2-rc1.
> - selftests: use the upstream __arch_loongarch / "LOONGARCH" test_loader
> support now in 7.2-rc1 instead of introducing it.
> - Prior postings:
> https://lore.kernel.org/all/20260609041407.122384-1-dongtai.guo@linux.dev
> https://lore.kernel.org/all/20260618033809.98253-1-dongtai.guo@linux.dev
>
> George Guo (11):
> LoongArch: BPF: Fix tail call count pointer offset for arena programs
> LoongArch: BPF: Support internal-only MOV to resolve per-CPU addrs
> LoongArch: BPF: Add timed may_goto support
> LoongArch: BPF: Add private stack support
> LoongArch: BPF: Add exceptions (bpf_throw) support
> LoongArch: BPF: Support sign-extending loads from arena
> LoongArch: BPF: Support atomics on arena pointers
> selftests/bpf: Enable struct_ops private stack test for LoongArch
> selftests/bpf: Enable arena LDSX tests on LoongArch
> selftests/bpf: Enable arena atomics tests on LoongArch
> selftests/bpf: Add LoongArch deny list
>
> arch/loongarch/include/asm/inst.h | 1 +
> arch/loongarch/kernel/stacktrace.c | 52 +++
> arch/loongarch/net/Makefile | 2 +-
> arch/loongarch/net/bpf_jit.c | 397 ++++++++++++++++--
> arch/loongarch/net/bpf_jit.h | 1 +
> arch/loongarch/net/bpf_timed_may_goto.S | 47 +++
> .../testing/selftests/bpf/DENYLIST.loongarch | 1 +
> .../bpf/prog_tests/struct_ops_private_stack.c | 2 +-
> .../selftests/bpf/progs/arena_atomics.c | 3 +
> .../selftests/bpf/progs/verifier_ldsx.c | 17 +
> 10 files changed, 485 insertions(+), 38 deletions(-)
> create mode 100644 arch/loongarch/net/bpf_timed_may_goto.S
> create mode 100644 tools/testing/selftests/bpf/DENYLIST.loongarch
>
> --
> 2.25.1
>
>
end of thread
Thread overview: 17+ messages
2026-07-02 2:23 [PATCH bpf-next v2 00/11] LoongArch: BPF: arena features, exceptions, private stack and may_goto George Guo
2026-07-02 2:23 ` [PATCH bpf-next v2 01/11] LoongArch: BPF: Fix tail call count pointer offset for arena programs George Guo
2026-07-02 2:35 ` sashiko-bot
2026-07-02 2:23 ` [PATCH bpf-next v2 02/11] LoongArch: BPF: Support internal-only MOV to resolve per-CPU addrs George Guo
2026-07-02 2:23 ` [PATCH bpf-next v2 03/11] LoongArch: BPF: Add timed may_goto support George Guo
2026-07-02 2:23 ` [PATCH bpf-next v2 04/11] LoongArch: BPF: Add private stack support George Guo
2026-07-02 2:23 ` [PATCH bpf-next v2 05/11] LoongArch: BPF: Add exceptions (bpf_throw) support George Guo
2026-07-02 2:39 ` sashiko-bot
2026-07-02 2:23 ` [PATCH bpf-next v2 06/11] LoongArch: BPF: Support sign-extending loads from arena George Guo
2026-07-02 2:23 ` [PATCH bpf-next v2 07/11] LoongArch: BPF: Support atomics on arena pointers George Guo
2026-07-02 2:48 ` sashiko-bot
2026-07-02 2:23 ` [PATCH bpf-next v2 08/11] selftests/bpf: Enable struct_ops private stack test for LoongArch George Guo
2026-07-02 2:23 ` [PATCH bpf-next v2 09/11] selftests/bpf: Enable arena LDSX tests on LoongArch George Guo
2026-07-02 2:23 ` [PATCH bpf-next v2 10/11] selftests/bpf: Enable arena atomics " George Guo
2026-07-02 2:49 ` sashiko-bot
2026-07-02 2:23 ` [PATCH bpf-next v2 11/11] selftests/bpf: Add LoongArch deny list George Guo
2026-07-03 10:11 ` [PATCH bpf-next v2 00/11] LoongArch: BPF: arena features, exceptions, private stack and may_goto Huacai Chen