[PATCH 0/4] bpf/arm64: add BPF_ABS/BPF

DPDK-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support
@ 2026-06-08 20:28 Stephen Hemminger
  2026-06-08 20:28 ` [PATCH 1/4] bpf/arm64: fix zero-return branch in multi-exit programs Stephen Hemminger
                   ` (9 more replies)
  0 siblings, 10 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-08 20:28 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Discovered this while exploring packet filtering.
The arm64 BPF JIT did not implement BPF_LD | BPF_ABS or BPF_LD | BPF_IND,
so cBPF filters converted by rte_bpf_convert() could not be JIT compiled
and silently fell back to the interpreter on arm64.

The first patch fixes a latent bug in emit_return_zero_if_src_zero():
the offset of the branch to the epilogue was held in an unsigned.
A backward branch wrapped around. Existing JIT tests were never
being run on ARM.

The next two patches make the bpf tests assert that,
on an architecture with a JITbackend, code is actually generated,
so a missing or failed JIT is reported rather than skipped.

The final patch adds the ABS/IND opcodes, mirroring the x86 JIT
with a fast path for data in the first
mbuf segment and a __rte_pktmbuf_read() slow path for the rest.

Stephen Hemminger (4):
  bpf/arm64: fix zero-return branch in multi-exit programs
  test: bpf check that JIT was generated
  test: bpf check that bpf_convert can be JIT'd
  bpf/arm64: add BPF_ABS/BPF_IND packet load support

 app/test/test_bpf.c     |  23 ++++++-
 lib/bpf/bpf_jit_arm64.c | 149 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 169 insertions(+), 3 deletions(-)

-- 
2.53.0

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 1/4] bpf/arm64: fix zero-return branch in multi-exit programs
  2026-06-08 20:28 [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
@ 2026-06-08 20:28 ` Stephen Hemminger
  2026-06-17 18:03   ` Marat Khalili
  2026-06-08 20:28 ` [PATCH 2/4] test: bpf check that JIT was generated Stephen Hemminger
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-08 20:28 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, stable, Wathsala Vithanage, Konstantin Ananyev,
	Marat Khalili, Jerin Jacob

If a JIT'd BPF program has more than one exit,
the branch to the epilogue can be backwards.

The current code assumed it is always forward:
emit_return_zero_if_src_zero() held the offset in an unsigned uint16_t,
so a backward (negative) offset wrapped to a large positive value and
branch off the end of the program, faulting at run time.

This was masked until now: the only test with this shape, test_ld_mbuf,
needs BPF_ABS/BPF_IND which the arm64 JIT did not implement, so it never
ran under the JIT.  The x86 JIT is unaffected because emit_epilog() keeps a
single exit (st->exit.off) reached from later exits and the divide-by-zero
check via a signed absolute jump (emit_abs_jcc), so direction does not
matter.

Use a signed offset; emit_b() already sign-extends imm26 correctly.

Fixes: 111e2a747a4f ("bpf/arm: add basic arithmetic operations")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_jit_arm64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
index a04ef33a9c..099822e9f1 100644
--- a/lib/bpf/bpf_jit_arm64.c
+++ b/lib/bpf/bpf_jit_arm64.c
@@ -957,7 +957,7 @@ static void
 emit_return_zero_if_src_zero(struct a64_jit_ctx *ctx, bool is64, uint8_t src)
 {
 	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
-	uint16_t jump_to_epilogue;
+	int32_t jump_to_epilogue;

 	emit_cbnz(ctx, is64, src, 3);
 	emit_mov_imm(ctx, is64, r0, 0);
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 2/4] test: bpf check that JIT was generated
  2026-06-08 20:28 [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
  2026-06-08 20:28 ` [PATCH 1/4] bpf/arm64: fix zero-return branch in multi-exit programs Stephen Hemminger
@ 2026-06-08 20:28 ` Stephen Hemminger
  2026-06-17 18:09   ` Marat Khalili
  2026-06-08 20:28 ` [PATCH 3/4] test: bpf check that bpf_convert can be JIT'd Stephen Hemminger
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-08 20:28 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Marat Khalili

Avoid silently ignoring JIT failures. The test cases should
all succeed JIT compilation; if not it is a bug in the JIT
implementation and should be reported.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index dd24722450..79d547dc82 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -3508,6 +3508,14 @@ run_test(const struct bpf_test *tst)
 				rv, strerror(rv));
 		}
 	}
+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
+	else {
+		/* a JIT backend exists for this arch, so it must compile */
+		printf("%s@%d: %s: no JIT code generated;\n",
+			__func__, __LINE__, tst->name);
+		ret = -1;
+	}
+#endif
 
 	rte_bpf_destroy(bpf);
 	return ret;
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 3/4] test: bpf check that bpf_convert can be JIT'd
  2026-06-08 20:28 [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
  2026-06-08 20:28 ` [PATCH 1/4] bpf/arm64: fix zero-return branch in multi-exit programs Stephen Hemminger
  2026-06-08 20:28 ` [PATCH 2/4] test: bpf check that JIT was generated Stephen Hemminger
@ 2026-06-08 20:28 ` Stephen Hemminger
  2026-06-17 18:14   ` Marat Khalili
  2026-06-08 20:28 ` [PATCH 4/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-08 20:28 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Marat Khalili

Add followup in bpf conversion tests to make sure resulting
code was also run through JIT.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 79d547dc82..f5ab447ff6 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -4569,6 +4569,7 @@ test_bpf_filter(pcap_t *pcap, const char *s)
 	struct bpf_program fcode;
 	struct rte_bpf_prm *prm = NULL;
 	struct rte_bpf *bpf = NULL;
+	int ret = -1;
 
 	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
 		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
@@ -4592,6 +4593,18 @@ test_bpf_filter(pcap_t *pcap, const char *s)
 			__func__, __LINE__, rte_errno, strerror(rte_errno));
 		goto error;
 	}
+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
+	{
+		struct rte_bpf_jit jit;
+
+		rte_bpf_get_jit(bpf, &jit);
+		if (jit.func == NULL) {
+			printf("%s@%d: no JIT generated\n", __func__, __LINE__);
+			goto error;
+		}
+	}
+#endif
+	ret = 0;
 
 error:
 	if (bpf)
@@ -4603,7 +4616,7 @@ test_bpf_filter(pcap_t *pcap, const char *s)
 
 	rte_free(prm);
 	pcap_freecode(&fcode);
-	return (bpf == NULL) ? -1 : 0;
+	return ret;
 }
 
 static int
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 4/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support
  2026-06-08 20:28 [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
                   ` (2 preceding siblings ...)
  2026-06-08 20:28 ` [PATCH 3/4] test: bpf check that bpf_convert can be JIT'd Stephen Hemminger
@ 2026-06-08 20:28 ` Stephen Hemminger
  2026-06-17 19:35   ` Marat Khalili
  2026-06-17 17:37 ` [PATCH 0/4] " Marat Khalili
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-08 20:28 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Wathsala Vithanage, Konstantin Ananyev,
	Marat Khalili

The arm64 JIT rejected BPF_LD | BPF_ABS and BPF_LD | BPF_IND with
"invalid opcode", so cBPF programs converted by rte_bpf_convert() could
not be JITed. Add these opcodes, mirroring the x86 JIT: a fast path for
data held in the first mbuf segment and a __rte_pktmbuf_read() slow path
for everything else. Programs using these opcodes now use the call
register layout, since the slow path makes a function call.

Bugzilla ID: 1427

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_jit_arm64.c | 147 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 146 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
index 099822e9f1..6952c61806 100644
--- a/lib/bpf/bpf_jit_arm64.c
+++ b/lib/bpf/bpf_jit_arm64.c
@@ -1123,6 +1123,133 @@ emit_branch(struct a64_jit_ctx *ctx, uint8_t op, uint32_t i, int16_t off)
 	emit_b_cond(ctx, ebpf_to_a64_cond(op), jump_offset_get(ctx, i, off));
 }
 
+/* LD_ABS/LD_IND code block offsets (in arm64 instructions) */
+enum {
+	LDMB_FAST_OFS, /* fast path */
+	LDMB_SLOW_OFS, /* slow path */
+	LDMB_FIN_OFS,  /* common tail */
+	LDMB_OFS_NUM
+};
+
+/*
+ * Helper for emit_ld_mbuf(): fast path.
+ * Compute the packet offset; if it lies inside the first segment leave the
+ * data pointer in R0, otherwise branch to the slow path.
+ */
+static void
+emit_ldmb_fast_path(struct a64_jit_ctx *ctx, uint8_t src, uint8_t mode,
+		    uint32_t sz, int32_t imm, const uint32_t ofs[LDMB_OFS_NUM])
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
+	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
+	uint8_t tmp2 = ebpf_to_a64_reg(ctx, TMP_REG_2);
+	uint8_t tmp3 = ebpf_to_a64_reg(ctx, TMP_REG_3);
+
+	/* off = imm (+ src for BPF_IND) */
+	emit_mov_imm(ctx, 1, tmp1, imm);
+	if (mode == BPF_IND)
+		emit_add(ctx, 1, tmp1, src);
+
+	/* if ((int64_t)(mbuf->data_len - off) < sz) goto slow_path */
+	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_len));
+	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
+	emit_sub(ctx, 1, tmp2, tmp1);
+	emit_mov_imm(ctx, 1, tmp3, sz);
+	emit_cmp(ctx, 1, tmp2, tmp3);
+	emit_b_cond(ctx, A64_LT, (int32_t)(ofs[LDMB_SLOW_OFS] - ctx->idx));
+
+	/* R0 = mbuf->buf_addr + mbuf->data_off + off */
+	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_off));
+	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
+	emit_mov_imm(ctx, 1, r0, offsetof(struct rte_mbuf, buf_addr));
+	emit_ldr(ctx, EBPF_DW, r0, r6, r0);
+	emit_add(ctx, 1, r0, tmp2);
+	emit_add(ctx, 1, r0, tmp1);
+
+	emit_b(ctx, (int32_t)(ofs[LDMB_FIN_OFS] - ctx->idx));
+}
+
+/*
+ * Helper for emit_ld_mbuf(): slow path.
+ * R0 = __rte_pktmbuf_read(mbuf, off, sz, buf); return 0 if NULL.
+ * The scratch buffer is the space reserved by __rte_bpf_validate() at the
+ * bottom of the eBPF stack frame, i.e. (frame_pointer - stack_ofs).
+ */
+static void
+emit_ldmb_slow_path(struct a64_jit_ctx *ctx, uint32_t sz, uint32_t stack_ofs)
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
+	uint8_t fp = ebpf_to_a64_reg(ctx, EBPF_FP);
+	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
+
+	/* arguments of __rte_pktmbuf_read(mbuf, off, len, buf) */
+	emit_mov_64(ctx, A64_R(1), tmp1);		/* off (held in tmp1) */
+	emit_mov_64(ctx, A64_R(0), r6);			/* mbuf */
+	emit_mov_imm(ctx, 0, A64_R(2), sz);		/* len */
+	emit_sub_imm_64(ctx, A64_R(3), fp, stack_ofs);	/* buf */
+
+	emit_call(ctx, tmp1, (void *)(uintptr_t)__rte_pktmbuf_read);
+	emit_return_zero_if_src_zero(ctx, 1, r0);
+}
+
+/*
+ * Helper for emit_ld_mbuf(): common tail.
+ * Load the value pointed to by R0 and convert from network byte order.
+ */
+static void
+emit_ldmb_fin(struct a64_jit_ctx *ctx, uint8_t opsz, uint32_t sz)
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+
+	emit_ldr(ctx, opsz, r0, r0, A64_ZR);
+	if (opsz != BPF_B)
+		emit_be(ctx, r0, sz * 8);
+}
+
+/*
+ * Emit code for BPF_LD | BPF_ABS and BPF_LD | BPF_IND packet loads:
+ *
+ *	off = imm (+ src for BPF_IND)
+ *	if (mbuf->data_len - off >= sz)			    -- fast path
+ *		ptr = mbuf->buf_addr + mbuf->data_off + off;
+ *	else						    -- slow path
+ *		ptr = __rte_pktmbuf_read(mbuf, off, sz, buf);
+ *		if (ptr == NULL)
+ *			return 0;
+ *	R0 = ntoh(*(size *)ptr);			    -- common tail
+ *
+ * The three blocks are sized in a dry run so the forward branches can be
+ * resolved, then emitted for real (arm64 instructions are fixed width, so
+ * the dry run reproduces the real instruction count exactly).
+ */
+static void
+emit_ld_mbuf(struct a64_jit_ctx *ctx, uint8_t op, uint8_t src, int32_t imm,
+	     uint32_t stack_ofs)
+{
+	uint8_t mode = BPF_MODE(op);
+	uint8_t opsz = BPF_SIZE(op);
+	uint32_t sz = bpf_size(opsz);
+	uint32_t ofs[LDMB_OFS_NUM];
+
+	/* seed offsets so the dry-run branches stay in range */
+	ofs[LDMB_FAST_OFS] = ofs[LDMB_SLOW_OFS] = ofs[LDMB_FIN_OFS] = ctx->idx;
+
+	/* dry run to record block offsets */
+	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
+	ofs[LDMB_SLOW_OFS] = ctx->idx;
+	emit_ldmb_slow_path(ctx, sz, stack_ofs);
+	ofs[LDMB_FIN_OFS] = ctx->idx;
+	emit_ldmb_fin(ctx, opsz, sz);
+
+	/* rewind and emit for real with resolved offsets */
+	ctx->idx = ofs[LDMB_FAST_OFS];
+	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
+	emit_ldmb_slow_path(ctx, sz, stack_ofs);
+	emit_ldmb_fin(ctx, opsz, sz);
+}
+
 static void
 check_program_has_call(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 {
@@ -1135,8 +1262,17 @@ check_program_has_call(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 		op = ins->code;
 
 		switch (op) {
-		/* Call imm */
+		/*
+		 * BPF_ABS/BPF_IND can fall through to __rte_pktmbuf_read(),
+		 * so they need the call-clobbered register layout as well.
+		 */
 		case (BPF_JMP | EBPF_CALL):
+		case (BPF_LD | BPF_ABS | BPF_B):
+		case (BPF_LD | BPF_ABS | BPF_H):
+		case (BPF_LD | BPF_ABS | BPF_W):
+		case (BPF_LD | BPF_IND | BPF_B):
+		case (BPF_LD | BPF_IND | BPF_H):
+		case (BPF_LD | BPF_IND | BPF_W):
 			ctx->foundcall = 1;
 			return;
 		}
@@ -1338,6 +1474,15 @@ emit(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 			emit_mov_imm(ctx, 1, dst, u64);
 			i++;
 			break;
+		/* R0 = ntoh(*(size *)(mbuf data + (src) + imm)) */
+		case (BPF_LD | BPF_ABS | BPF_B):
+		case (BPF_LD | BPF_ABS | BPF_H):
+		case (BPF_LD | BPF_ABS | BPF_W):
+		case (BPF_LD | BPF_IND | BPF_B):
+		case (BPF_LD | BPF_IND | BPF_H):
+		case (BPF_LD | BPF_IND | BPF_W):
+			emit_ld_mbuf(ctx, op, src, imm, bpf->stack_sz);
+			break;
 		/* *(size *)(dst + off) = src */
 		case (BPF_STX | BPF_MEM | BPF_B):
 		case (BPF_STX | BPF_MEM | BPF_H):
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* RE: [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support
  2026-06-08 20:28 [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
                   ` (3 preceding siblings ...)
  2026-06-08 20:28 ` [PATCH 4/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
@ 2026-06-17 17:37 ` Marat Khalili
  2026-06-17 21:17   ` Stephen Hemminger
  2026-06-18 20:47 ` [PATCH v2 0/6] bpf: JIT related bug fixes Stephen Hemminger
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 73+ messages in thread
From: Marat Khalili @ 2026-06-17 17:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev@dpdk.org

I think this should CC participants of the previous discussion: 
https://inbox.dpdk.org/dev/20260319114500.9757-1-cfontain@redhat.com/

(Humans may think they submit patches independently, but weights of their LLMs
were already contaminated with the other effort. Hello brave new world.)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH 1/4] bpf/arm64: fix zero-return branch in multi-exit programs
  2026-06-08 20:28 ` [PATCH 1/4] bpf/arm64: fix zero-return branch in multi-exit programs Stephen Hemminger
@ 2026-06-17 18:03   ` Marat Khalili
  0 siblings, 0 replies; 73+ messages in thread
From: Marat Khalili @ 2026-06-17 18:03 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org
  Cc: stable@dpdk.org, Wathsala Vithanage, Konstantin Ananyev,
	Jerin Jacob

> If a JIT'd BPF program has more than one exit,
> the branch to the epilogue can be backwards.
> 
> The current code assumed it is always forward:
> emit_return_zero_if_src_zero() held the offset in an unsigned uint16_t,
> so a backward (negative) offset wrapped to a large positive value and
> branch off the end of the program, faulting at run time.
> 
> This was masked until now: the only test with this shape, test_ld_mbuf,
> needs BPF_ABS/BPF_IND which the arm64 JIT did not implement, so it never
> ran under the JIT.  The x86 JIT is unaffected because emit_epilog() keeps a
> single exit (st->exit.off) reached from later exits and the divide-by-zero
> check via a signed absolute jump (emit_abs_jcc), so direction does not
> matter.
> 
> Use a signed offset; emit_b() already sign-extends imm26 correctly.

This line is unclear to me, but that's not the main issue.

> Fixes: 111e2a747a4f ("bpf/arm: add basic arithmetic operations")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/bpf/bpf_jit_arm64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
> index a04ef33a9c..099822e9f1 100644
> --- a/lib/bpf/bpf_jit_arm64.c
> +++ b/lib/bpf/bpf_jit_arm64.c
> @@ -957,7 +957,7 @@ static void
>  emit_return_zero_if_src_zero(struct a64_jit_ctx *ctx, bool is64, uint8_t src)
>  {
>  	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
> -	uint16_t jump_to_epilogue;
> +	int32_t jump_to_epilogue;
> 
>  	emit_cbnz(ctx, is64, src, 3);
>  	emit_mov_imm(ctx, is64, r0, 0);
> --
> 2.53.0

In the very next line the offset is calculated as follows though:

    jump_to_epilogue = (ctx->program_start + ctx->program_sz) - ctx->idx;

From its appearance, it can never be negative. However, ctx->program_sz is set
in emit_epilogue which is issued on every BPF_EXIT, so mid-generation it
contains the location of the last issued epilogue (including possibly previous
pass), not the program size. With this interpretation the code technically
works, but I'd suggest either renaming program_sz to something like
epilogue_offset, or making sure it is only set to the actual program size (sans
prologue and epilogue). In the latter case the jump offset will never be
negative, although the change to int32_t is still helpful in extending the
range of supported programs from 2**16 instructions. Since only 26 signed bits
are actually available maybe some check or assert is also warranted.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH 2/4] test: bpf check that JIT was generated
  2026-06-08 20:28 ` [PATCH 2/4] test: bpf check that JIT was generated Stephen Hemminger
@ 2026-06-17 18:09   ` Marat Khalili
  0 siblings, 0 replies; 73+ messages in thread
From: Marat Khalili @ 2026-06-17 18:09 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org; +Cc: Konstantin Ananyev

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Monday 8 June 2026 21:29
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Konstantin Ananyev <konstantin.ananyev@huawei.com>;
> Marat Khalili <marat.khalili@huawei.com>
> Subject: [PATCH 2/4] test: bpf check that JIT was generated
> 
> Avoid silently ignoring JIT failures. The test cases should
> all succeed JIT compilation; if not it is a bug in the JIT
> implementation and should be reported.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  app/test/test_bpf.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
> index dd24722450..79d547dc82 100644
> --- a/app/test/test_bpf.c
> +++ b/app/test/test_bpf.c
> @@ -3508,6 +3508,14 @@ run_test(const struct bpf_test *tst)
>  				rv, strerror(rv));
>  		}
>  	}
> +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
> +	else {
> +		/* a JIT backend exists for this arch, so it must compile */
> +		printf("%s@%d: %s: no JIT code generated;\n",
> +			__func__, __LINE__, tst->name);
> +		ret = -1;
> +	}
> +#endif
> 
>  	rte_bpf_destroy(bpf);
>  	return ret;
> --
> 2.53.0

Acked-by: Marat Khalili <marat.khalili@huawei.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH 3/4] test: bpf check that bpf_convert can be JIT'd
  2026-06-08 20:28 ` [PATCH 3/4] test: bpf check that bpf_convert can be JIT'd Stephen Hemminger
@ 2026-06-17 18:14   ` Marat Khalili
  0 siblings, 0 replies; 73+ messages in thread
From: Marat Khalili @ 2026-06-17 18:14 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org; +Cc: Konstantin Ananyev

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Monday 8 June 2026 21:29
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Konstantin Ananyev <konstantin.ananyev@huawei.com>;
> Marat Khalili <marat.khalili@huawei.com>
> Subject: [PATCH 3/4] test: bpf check that bpf_convert can be JIT'd
> 
> Add followup in bpf conversion tests to make sure resulting
> code was also run through JIT.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  app/test/test_bpf.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
> index 79d547dc82..f5ab447ff6 100644
> --- a/app/test/test_bpf.c
> +++ b/app/test/test_bpf.c
> @@ -4569,6 +4569,7 @@ test_bpf_filter(pcap_t *pcap, const char *s)
>  	struct bpf_program fcode;
>  	struct rte_bpf_prm *prm = NULL;
>  	struct rte_bpf *bpf = NULL;
> +	int ret = -1;
> 
>  	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
>  		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
> @@ -4592,6 +4593,18 @@ test_bpf_filter(pcap_t *pcap, const char *s)
>  			__func__, __LINE__, rte_errno, strerror(rte_errno));
>  		goto error;
>  	}
> +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
> +	{
> +		struct rte_bpf_jit jit;
> +
> +		rte_bpf_get_jit(bpf, &jit);
> +		if (jit.func == NULL) {
> +			printf("%s@%d: no JIT generated\n", __func__, __LINE__);
> +			goto error;
> +		}
> +	}
> +#endif
> +	ret = 0;
> 
>  error:
>  	if (bpf)
> @@ -4603,7 +4616,7 @@ test_bpf_filter(pcap_t *pcap, const char *s)
> 
>  	rte_free(prm);
>  	pcap_freecode(&fcode);
> -	return (bpf == NULL) ? -1 : 0;
> +	return ret;
>  }
> 
>  static int
> --
> 2.53.0

Acked-by: Marat Khalili <marat.khalili@huawei.com>


^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH 4/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support
  2026-06-08 20:28 ` [PATCH 4/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
@ 2026-06-17 19:35   ` Marat Khalili
  0 siblings, 0 replies; 73+ messages in thread
From: Marat Khalili @ 2026-06-17 19:35 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org; +Cc: Wathsala Vithanage, Konstantin Ananyev

Thank you for doing this. I suggest comparing against the previous effort by Christophe Fontaine though.

Couple of comments inline, superficially looks correct otherwise.

> +/*
> + * Helper for emit_ld_mbuf(): fast path.
> + * Compute the packet offset; if it lies inside the first segment leave the
> + * data pointer in R0, otherwise branch to the slow path.
> + */
> +static void
> +emit_ldmb_fast_path(struct a64_jit_ctx *ctx, uint8_t src, uint8_t mode,
> +		    uint32_t sz, int32_t imm, const uint32_t ofs[LDMB_OFS_NUM])
> +{
> +	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
> +	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
> +	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
> +	uint8_t tmp2 = ebpf_to_a64_reg(ctx, TMP_REG_2);
> +	uint8_t tmp3 = ebpf_to_a64_reg(ctx, TMP_REG_3);
> +
> +	/* off = imm (+ src for BPF_IND) */
> +	emit_mov_imm(ctx, 1, tmp1, imm);
> +	if (mode == BPF_IND)
> +		emit_add(ctx, 1, tmp1, src);
> +
> +	/* if ((int64_t)(mbuf->data_len - off) < sz) goto slow_path */
> +	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_len));
> +	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
> +	emit_sub(ctx, 1, tmp2, tmp1);
> +	emit_mov_imm(ctx, 1, tmp3, sz);
> +	emit_cmp(ctx, 1, tmp2, tmp3);
> +	emit_b_cond(ctx, A64_LT, (int32_t)(ofs[LDMB_SLOW_OFS] - ctx->idx));

Are we checking that (int64_t)off >= 0 anywhere?

> +
> +	/* R0 = mbuf->buf_addr + mbuf->data_off + off */
> +	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_off));
> +	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
> +	emit_mov_imm(ctx, 1, r0, offsetof(struct rte_mbuf, buf_addr));
> +	emit_ldr(ctx, EBPF_DW, r0, r6, r0);
> +	emit_add(ctx, 1, r0, tmp2);
> +	emit_add(ctx, 1, r0, tmp1);
> +
> +	emit_b(ctx, (int32_t)(ofs[LDMB_FIN_OFS] - ctx->idx));
> +}

// snip

> +/*
> + * Emit code for BPF_LD | BPF_ABS and BPF_LD | BPF_IND packet loads:
> + *
> + *	off = imm (+ src for BPF_IND)
> + *	if (mbuf->data_len - off >= sz)			    -- fast path
> + *		ptr = mbuf->buf_addr + mbuf->data_off + off;
> + *	else						    -- slow path
> + *		ptr = __rte_pktmbuf_read(mbuf, off, sz, buf);
> + *		if (ptr == NULL)
> + *			return 0;
> + *	R0 = ntoh(*(size *)ptr);			    -- common tail

nit: this pseudo-code could probably be made more C-like.

> + *
> + * The three blocks are sized in a dry run so the forward branches can be
> + * resolved, then emitted for real (arm64 instructions are fixed width, so
> + * the dry run reproduces the real instruction count exactly).
> + */
> +static void
> +emit_ld_mbuf(struct a64_jit_ctx *ctx, uint8_t op, uint8_t src, int32_t imm,
> +	     uint32_t stack_ofs)
> +{
> +	uint8_t mode = BPF_MODE(op);
> +	uint8_t opsz = BPF_SIZE(op);
> +	uint32_t sz = bpf_size(opsz);
> +	uint32_t ofs[LDMB_OFS_NUM];
> +
> +	/* seed offsets so the dry-run branches stay in range */
> +	ofs[LDMB_FAST_OFS] = ofs[LDMB_SLOW_OFS] = ofs[LDMB_FIN_OFS] = ctx->idx;
> +
> +	/* dry run to record block offsets */
> +	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
> +	ofs[LDMB_SLOW_OFS] = ctx->idx;
> +	emit_ldmb_slow_path(ctx, sz, stack_ofs);
> +	ofs[LDMB_FIN_OFS] = ctx->idx;
> +	emit_ldmb_fin(ctx, opsz, sz);

nit: we already do two passes for the whole program, could avoid quadruple work here

> +
> +	/* rewind and emit for real with resolved offsets */
> +	ctx->idx = ofs[LDMB_FAST_OFS];
> +	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
> +	emit_ldmb_slow_path(ctx, sz, stack_ofs);
> +	emit_ldmb_fin(ctx, opsz, sz);
> +}

// snip the rest

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support
  2026-06-17 17:37 ` [PATCH 0/4] " Marat Khalili
@ 2026-06-17 21:17   ` Stephen Hemminger
  0 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-17 21:17 UTC (permalink / raw)
  To: Marat Khalili; +Cc: dev@dpdk.org

On Wed, 17 Jun 2026 17:37:39 +0000
Marat Khalili <marat.khalili@huawei.com> wrote:

> I think this should CC participants of the previous discussion: 
> https://inbox.dpdk.org/dev/20260319114500.9757-1-cfontain@redhat.com/
> 
> (Humans may think they submit patches independently, but weights of their LLMs
> were already contaminated with the other effort. Hello brave new world.)

Didn't see previous thread. This effort was more targeted at
why can't capture which uses pcap_compile -> bpf_convert flow be JIT'd?

Had not looked back at other overlaps.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v2 0/6] bpf: JIT related bug fixes
  2026-06-08 20:28 [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
                   ` (4 preceding siblings ...)
  2026-06-17 17:37 ` [PATCH 0/4] " Marat Khalili
@ 2026-06-18 20:47 ` Stephen Hemminger
  2026-06-18 20:47   ` [PATCH v2 1/6] bpf/x86: fix JIT encoding of BPF_JSET with immediate Stephen Hemminger
                     ` (5 more replies)
  2026-06-21 16:23 ` [PATCH v3 0/6] bpf: JIT related bug fixes Stephen Hemminger
                   ` (3 subsequent siblings)
  9 siblings, 6 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-18 20:47 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

While implementing JIT for packet capture ran into several issues:
  1. x86 JIT had pre-existing bug which would crash
  2. ARM64 BPF JIT was missing instructions for packet access.
     Which had been discovered previously [1]
  3. Tests related to JIT were not being run or missing coverage.

Fixed all of these. Patches are ordered so that most urgent fix
is first, follwed by the test that should have caught the problem.

The arm64 epilogue branch fix (patch 3) was originally posted by Christophe
Fontaine [1]; that series stalled, so it is carried here with his
authorship.

Changes since v1:
 - add x86 BPF_JSET encoding fix and a regression test (patches 1-2),
   found once the convert test ran generated code through the JIT
 - carry Christophe's arm64 epilogue fix with his sign-off (patch 3)
 - convert test now runs the converted filters through the JIT, not just
   loads them (patch 6)
 - kept Marat's ack (patch 4)
   Since tests change enough, decided to drop his ack for that part.

[1] https://inbox.dpdk.org/dev/20260319114500.9757-2-cfontain@redhat.com/

Christophe Fontaine (1):
  bpf/arm64: fix offset type to allow a negative jump

Stephen Hemminger (5):
  bpf/x86: fix JIT encoding of BPF_JSET with immediate
  test/bpf: add JSET test with small immediate
  test/bpf: check that JIT was generated
  bpf/arm64: add BPF_ABS/BPF_IND packet load support
  test/bpf: check that bpf_convert can be JIT'd

 app/test/test_bpf.c     | 184 ++++++++++++++++++++++++++++++++++++----
 lib/bpf/bpf_jit_arm64.c | 153 ++++++++++++++++++++++++++++++++-
 lib/bpf/bpf_jit_x86.c   |   2 +-
 3 files changed, 321 insertions(+), 18 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v2 1/6] bpf/x86: fix JIT encoding of BPF_JSET with immediate
  2026-06-18 20:47 ` [PATCH v2 0/6] bpf: JIT related bug fixes Stephen Hemminger
@ 2026-06-18 20:47   ` Stephen Hemminger
  2026-06-19  2:09     ` Stephen Hemminger
  2026-06-18 20:47   ` [PATCH v2 2/6] test/bpf: add JSET test with small immediate Stephen Hemminger
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-18 20:47 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, stable, Konstantin Ananyev, Marat Khalili,
	Ferruh Yigit

emit_tst_imm() emits TEST (0xF7 /0) but sized the immediate with
imm_size(), which can return 1 byte. TEST has no imm8 form; it always
takes imm32. A small mask like BPF_JSET | BPF_K #0x1 then produced a
4-byte instruction the CPU decodes as 7, swallowing the following Jcc
and crashing.

Always emit a 32-bit immediate for TEST.

Bugzilla ID: 1959
Fixes: cc752e43e079 ("bpf: add JIT compilation for x86_64 ISA")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_jit_x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_jit_x86.c b/lib/bpf/bpf_jit_x86.c
index 88b1b5aeab..0ffe3783ff 100644
--- a/lib/bpf/bpf_jit_x86.c
+++ b/lib/bpf/bpf_jit_x86.c
@@ -921,7 +921,7 @@ emit_tst_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
 	emit_rex(st, op, 0, dreg);
 	emit_bytes(st, &ops, sizeof(ops));
 	emit_modregrm(st, MOD_DIRECT, mods, dreg);
-	emit_imm(st, imm, imm_size(imm));
+	emit_imm(st, imm, sizeof(int32_t));
 }
 
 static void
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 2/6] test/bpf: add JSET test with small immediate
  2026-06-18 20:47 ` [PATCH v2 0/6] bpf: JIT related bug fixes Stephen Hemminger
  2026-06-18 20:47   ` [PATCH v2 1/6] bpf/x86: fix JIT encoding of BPF_JSET with immediate Stephen Hemminger
@ 2026-06-18 20:47   ` Stephen Hemminger
  2026-06-18 20:47   ` [PATCH v2 3/6] bpf/arm64: fix offset type to allow a negative jump Stephen Hemminger
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-18 20:47 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Marat Khalili

The existing jump test only used a 32-bit JSET mask,
so the broken imm8 encoding of TEST in the x86 JIT was never exercised.
Add a case with a byte-sized mask;
run_test() runs it through the interpreter and the JIT.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 82 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index dd24722450..e70dea736f 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -3158,7 +3158,89 @@ static const struct ebpf_insn test_ld_mbuf3_prog[] = {
 };
 
 /* all bpf test cases */
+/*
+ * JSET with a byte-sized mask: exercises the imm8 path of the TEST
+ * encoding in the x86 JIT (a 32-bit mask takes a different path).
+ */
+static const struct ebpf_insn test_jset1_prog[] = {
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0,
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_B),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u8),
+	},
+	/* bit 0 is set in the input: branch is taken */
+	{
+		.code = (BPF_JMP | BPF_JSET | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 0x1,
+		.off = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_JA),
+		.off = 1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x1,
+	},
+	/* bit 1 is clear in the input: branch is not taken */
+	{
+		.code = (BPF_JMP | BPF_JSET | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 0x2,
+		.off = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_JA),
+		.off = 1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x2,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static void
+test_jset1_prepare(void *arg)
+{
+	struct dummy_offset *df = arg;
+
+	memset(df, 0, sizeof(*df));
+	df->u8 = 0x1;	/* bit 0 set, bit 1 clear */
+}
+
+static int
+test_jset1_check(uint64_t rc, const void *arg)
+{
+	return cmp_res(__func__, 0x1, rc, arg, arg, 0);
+}
+
 static const struct bpf_test tests[] = {
+	{
+		.name = "test_jset1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_jset1_prog,
+			.nb_ins = RTE_DIM(test_jset1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_offset),
+			},
+		},
+		.prepare = test_jset1_prepare,
+		.check_result = test_jset1_check,
+	},
 	{
 		.name = "test_store1",
 		.arg_sz = sizeof(struct dummy_offset),
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 3/6] bpf/arm64: fix offset type to allow a negative jump
  2026-06-18 20:47 ` [PATCH v2 0/6] bpf: JIT related bug fixes Stephen Hemminger
  2026-06-18 20:47   ` [PATCH v2 1/6] bpf/x86: fix JIT encoding of BPF_JSET with immediate Stephen Hemminger
  2026-06-18 20:47   ` [PATCH v2 2/6] test/bpf: add JSET test with small immediate Stephen Hemminger
@ 2026-06-18 20:47   ` Stephen Hemminger
  2026-06-18 20:47   ` [PATCH v2 4/6] test/bpf: check that JIT was generated Stephen Hemminger
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-18 20:47 UTC (permalink / raw)
  To: dev
  Cc: Christophe Fontaine, stable, Stephen Hemminger,
	Wathsala Vithanage, Konstantin Ananyev, Marat Khalili,
	Jerin Jacob

From: Christophe Fontaine <cfontain@redhat.com>

The DPDK BPF JIT standalone test test_ld_mbuf1 fails on arm64.
It does:
	r6 = r1                    // mbuf
	r0 = *(u8 *)pkt[0]         // BPF_ABS
	if ((r0 & 0xf0) == 0x40)
		goto parse
	r0 = 0
	exit                       // epilogue E0
parse:
	r0 = *(u8 *)pkt[r0 + 3]    // BPF_IND
	...
	exit

emit_return_zero_if_src_zero() returns 0 by branching to a function
epilogue. The target maybe a previous epilogue so branch
might be backwards; therefore the offset needs to be negative.

The offset was stored in a uint16_t, so a negative value wrapped to a
large positive number; emit_b() then branched past the end of the
program and faulted at run time.

Fixes: 111e2a747a4f ("bpf/arm: add basic arithmetic operations")
Cc: stable@dpdk.org

Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_jit_arm64.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
index a04ef33a9c..67e42015de 100644
--- a/lib/bpf/bpf_jit_arm64.c
+++ b/lib/bpf/bpf_jit_arm64.c
@@ -957,10 +957,12 @@ static void
 emit_return_zero_if_src_zero(struct a64_jit_ctx *ctx, bool is64, uint8_t src)
 {
 	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
-	uint16_t jump_to_epilogue;
+	int32_t jump_to_epilogue;
 
 	emit_cbnz(ctx, is64, src, 3);
 	emit_mov_imm(ctx, is64, r0, 0);
+
+	/* maybe backwards branch to earlier epilogue */
 	jump_to_epilogue = (ctx->program_start + ctx->program_sz) - ctx->idx;
 	emit_b(ctx, jump_to_epilogue);
 }
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 4/6] test/bpf: check that JIT was generated
  2026-06-18 20:47 ` [PATCH v2 0/6] bpf: JIT related bug fixes Stephen Hemminger
                     ` (2 preceding siblings ...)
  2026-06-18 20:47   ` [PATCH v2 3/6] bpf/arm64: fix offset type to allow a negative jump Stephen Hemminger
@ 2026-06-18 20:47   ` Stephen Hemminger
  2026-06-18 20:47   ` [PATCH v2 5/6] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
  2026-06-18 20:47   ` [PATCH v2 6/6] test/bpf: check that bpf_convert can be JIT'd Stephen Hemminger
  5 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-18 20:47 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Marat Khalili, Konstantin Ananyev

Avoid silently ignoring JIT failures. The test cases should
all succeed JIT compilation; if not it is a bug in the JIT
implementation and should be reported.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
 app/test/test_bpf.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index e70dea736f..3a88434c3c 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -3590,6 +3590,14 @@ run_test(const struct bpf_test *tst)
 				rv, strerror(rv));
 		}
 	}
+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
+	else {
+		/* a JIT backend exists for this arch, so it must compile */
+		printf("%s@%d: %s: no JIT code generated;\n",
+			__func__, __LINE__, tst->name);
+		ret = -1;
+	}
+#endif
 
 	rte_bpf_destroy(bpf);
 	return ret;
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 5/6] bpf/arm64: add BPF_ABS/BPF_IND packet load support
  2026-06-18 20:47 ` [PATCH v2 0/6] bpf: JIT related bug fixes Stephen Hemminger
                     ` (3 preceding siblings ...)
  2026-06-18 20:47   ` [PATCH v2 4/6] test/bpf: check that JIT was generated Stephen Hemminger
@ 2026-06-18 20:47   ` Stephen Hemminger
  2026-06-18 20:47   ` [PATCH v2 6/6] test/bpf: check that bpf_convert can be JIT'd Stephen Hemminger
  5 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-18 20:47 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Wathsala Vithanage, Konstantin Ananyev,
	Marat Khalili

The arm64 JIT rejected BPF_LD | BPF_ABS and BPF_LD | BPF_IND with
"invalid opcode", so cBPF programs converted by rte_bpf_convert() could
not be JITed. Add these opcodes, mirroring the x86 JIT: a fast path for
data held in the first mbuf segment, and a __rte_pktmbuf_read() slow
path for everything else.

The forward branches over the call cannot use fixed distances:
emit_call() materializes the helper address with a variable number of
mov/movk instructions, so the block sizes are not known up front. Size
the three blocks (fast path, slow path, common tail) in a dry run, then
emit for real with the branches resolved from the measured offsets.

Programs using these opcodes use the call register layout, since the
slow path makes a function call.

Bugzilla ID: 1427

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_jit_arm64.c | 149 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 148 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
index 67e42015de..3fc16fd984 100644
--- a/lib/bpf/bpf_jit_arm64.c
+++ b/lib/bpf/bpf_jit_arm64.c
@@ -1125,6 +1125,135 @@ emit_branch(struct a64_jit_ctx *ctx, uint8_t op, uint32_t i, int16_t off)
 	emit_b_cond(ctx, ebpf_to_a64_cond(op), jump_offset_get(ctx, i, off));
 }
 
+/* LD_ABS/LD_IND code block offsets (in arm64 instructions) */
+enum {
+	LDMB_FAST_OFS, /* fast path */
+	LDMB_SLOW_OFS, /* slow path */
+	LDMB_FIN_OFS,  /* common tail */
+	LDMB_OFS_NUM
+};
+
+/*
+ * Helper for emit_ld_mbuf(): fast path.
+ * Compute the packet offset; if it lies inside the first segment leave the
+ * data pointer in R0, otherwise branch to the slow path.
+ */
+static void
+emit_ldmb_fast_path(struct a64_jit_ctx *ctx, uint8_t src, uint8_t mode,
+		    uint32_t sz, int32_t imm, const uint32_t ofs[LDMB_OFS_NUM])
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
+	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
+	uint8_t tmp2 = ebpf_to_a64_reg(ctx, TMP_REG_2);
+	uint8_t tmp3 = ebpf_to_a64_reg(ctx, TMP_REG_3);
+
+	/* off = imm (+ src for BPF_IND) */
+	emit_mov_imm(ctx, 1, tmp1, imm);
+	if (mode == BPF_IND)
+		emit_add(ctx, 1, tmp1, src);
+
+	/* if ((int64_t)(mbuf->data_len - off) < sz) goto slow_path */
+	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_len));
+	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
+	emit_sub(ctx, 1, tmp2, tmp1);
+	emit_mov_imm(ctx, 1, tmp3, sz);
+	emit_cmp(ctx, 1, tmp2, tmp3);
+	emit_b_cond(ctx, A64_LT, (int32_t)(ofs[LDMB_SLOW_OFS] - ctx->idx));
+
+	/* R0 = mbuf->buf_addr + mbuf->data_off + off */
+	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_off));
+	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
+	emit_mov_imm(ctx, 1, r0, offsetof(struct rte_mbuf, buf_addr));
+	emit_ldr(ctx, EBPF_DW, r0, r6, r0);
+	emit_add(ctx, 1, r0, tmp2);
+	emit_add(ctx, 1, r0, tmp1);
+
+	emit_b(ctx, (int32_t)(ofs[LDMB_FIN_OFS] - ctx->idx));
+}
+
+/*
+ * Helper for emit_ld_mbuf(): slow path.
+ * R0 = __rte_pktmbuf_read(mbuf, off, sz, buf); return 0 if NULL.
+ * The scratch buffer is the space reserved by __rte_bpf_validate() at the
+ * bottom of the eBPF stack frame, i.e. (frame_pointer - stack_ofs).
+ */
+static void
+emit_ldmb_slow_path(struct a64_jit_ctx *ctx, uint32_t sz, uint32_t stack_ofs)
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
+	uint8_t fp = ebpf_to_a64_reg(ctx, EBPF_FP);
+	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
+
+	/* arguments of __rte_pktmbuf_read(mbuf, off, len, buf) */
+	emit_mov_64(ctx, A64_R(1), tmp1);		/* off (held in tmp1) */
+	emit_mov_64(ctx, A64_R(0), r6);			/* mbuf */
+	emit_mov_imm(ctx, 0, A64_R(2), sz);		/* len */
+	emit_sub_imm_64(ctx, A64_R(3), fp, stack_ofs);	/* buf */
+
+	emit_call(ctx, tmp1, (void *)(uintptr_t)__rte_pktmbuf_read);
+	emit_return_zero_if_src_zero(ctx, 1, r0);
+}
+
+/*
+ * Helper for emit_ld_mbuf(): common tail.
+ * Load the value pointed to by R0 and convert from network byte order.
+ */
+static void
+emit_ldmb_fin(struct a64_jit_ctx *ctx, uint8_t opsz, uint32_t sz)
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+
+	emit_ldr(ctx, opsz, r0, r0, A64_ZR);
+	if (opsz != BPF_B)
+		emit_be(ctx, r0, sz * 8);
+}
+
+/*
+ * emit code for BPF_ABS/BPF_IND load.
+ * generates the following construction:
+ * fast_path:
+ *   off = src + imm
+ *   if (mbuf->data_len - off < sz)
+ *      goto slow_path;
+ *   ptr = mbuf->buf_addr + mbuf->data_off + off;
+ *   goto fin_part;
+ * slow_path:
+ *   typeof(sz) buf;  // scratch space reserved on the eBPF stack
+ *   ptr = __rte_pktmbuf_read(mbuf, off, sz, &buf);
+ *   if (ptr == NULL)
+ *      return 0;
+ * fin_part:
+ *   res = *(typeof(sz))ptr;
+ *   res = ntoh(res);
+ */
+static void
+emit_ld_mbuf(struct a64_jit_ctx *ctx, uint8_t op, uint8_t src, int32_t imm,
+	     uint32_t stack_ofs)
+{
+	uint8_t mode = BPF_MODE(op);
+	uint8_t opsz = BPF_SIZE(op);
+	uint32_t sz = bpf_size(opsz);
+	uint32_t ofs[LDMB_OFS_NUM];
+
+	/* seed offsets so the dry-run branches stay in range */
+	ofs[LDMB_FAST_OFS] = ofs[LDMB_SLOW_OFS] = ofs[LDMB_FIN_OFS] = ctx->idx;
+
+	/* dry run to record block offsets */
+	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
+	ofs[LDMB_SLOW_OFS] = ctx->idx;
+	emit_ldmb_slow_path(ctx, sz, stack_ofs);
+	ofs[LDMB_FIN_OFS] = ctx->idx;
+	emit_ldmb_fin(ctx, opsz, sz);
+
+	/* rewind and emit for real with resolved offsets */
+	ctx->idx = ofs[LDMB_FAST_OFS];
+	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
+	emit_ldmb_slow_path(ctx, sz, stack_ofs);
+	emit_ldmb_fin(ctx, opsz, sz);
+}
+
 static void
 check_program_has_call(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 {
@@ -1137,8 +1266,17 @@ check_program_has_call(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 		op = ins->code;
 
 		switch (op) {
-		/* Call imm */
+		/*
+		 * BPF_ABS/BPF_IND can fall through to __rte_pktmbuf_read(),
+		 * so they need the call-clobbered register layout as well.
+		 */
 		case (BPF_JMP | EBPF_CALL):
+		case (BPF_LD | BPF_ABS | BPF_B):
+		case (BPF_LD | BPF_ABS | BPF_H):
+		case (BPF_LD | BPF_ABS | BPF_W):
+		case (BPF_LD | BPF_IND | BPF_B):
+		case (BPF_LD | BPF_IND | BPF_H):
+		case (BPF_LD | BPF_IND | BPF_W):
 			ctx->foundcall = 1;
 			return;
 		}
@@ -1340,6 +1478,15 @@ emit(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 			emit_mov_imm(ctx, 1, dst, u64);
 			i++;
 			break;
+		/* R0 = ntoh(*(size *)(mbuf data + (src) + imm)) */
+		case (BPF_LD | BPF_ABS | BPF_B):
+		case (BPF_LD | BPF_ABS | BPF_H):
+		case (BPF_LD | BPF_ABS | BPF_W):
+		case (BPF_LD | BPF_IND | BPF_B):
+		case (BPF_LD | BPF_IND | BPF_H):
+		case (BPF_LD | BPF_IND | BPF_W):
+			emit_ld_mbuf(ctx, op, src, imm, bpf->stack_sz);
+			break;
 		/* *(size *)(dst + off) = src */
 		case (BPF_STX | BPF_MEM | BPF_B):
 		case (BPF_STX | BPF_MEM | BPF_H):
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v2 6/6] test/bpf: check that bpf_convert can be JIT'd
  2026-06-18 20:47 ` [PATCH v2 0/6] bpf: JIT related bug fixes Stephen Hemminger
                     ` (4 preceding siblings ...)
  2026-06-18 20:47   ` [PATCH v2 5/6] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
@ 2026-06-18 20:47   ` Stephen Hemminger
  5 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-18 20:47 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Marat Khalili

Add followup in bpf conversion tests to make sure resulting
code was also run through JIT and that JIT produces
same results as non-JIT.

Reduce log output to make it easier to match which
expression might be causing issues.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 94 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 79 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 3a88434c3c..973dd7d659 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -8,6 +8,7 @@
 #include <inttypes.h>
 #include <unistd.h>
 
+#include <rte_byteorder.h>
 #include <rte_memory.h>
 #include <rte_debug.h>
 #include <rte_hexdump.h>
@@ -32,6 +33,7 @@ test_bpf(void)
 #include <rte_bpf.h>
 #include <rte_ether.h>
 #include <rte_ip.h>
+#include <rte_udp.h>
 
 
 /* Tests of most simple BPF programs (no instructions, one instruction etc.) */
@@ -4529,6 +4531,7 @@ test_bpf_match(pcap_t *pcap, const char *str,
 	int ret = -1;
 	uint64_t rc;
 
+	printf("%s '%s'\n", __func__, str);
 	if (pcap_compile(pcap, &fcode, str, 1, PCAP_NETMASK_UNKNOWN)) {
 		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
 		       __func__, __LINE__,  str, pcap_geterr(pcap));
@@ -4550,6 +4553,24 @@ test_bpf_match(pcap_t *pcap, const char *str,
 	}
 
 	rc = rte_bpf_exec(bpf, mb);
+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
+	{
+		struct rte_bpf_jit jit;
+
+		rte_bpf_get_jit(bpf, &jit);
+		if (jit.func == NULL) {
+			printf("%s@%d: no JIT generated\n", __func__, __LINE__);
+			goto error;
+		}
+
+		fflush(stdout);
+		uint64_t rc_jit = jit.func(mb);
+		if (rc_jit != rc) {
+			printf("%s@%d: JIT return code does not match\n", __func__, __LINE__);
+			goto error;
+		}
+	}
+#endif
 	/* The return code from bpf capture filter is non-zero if matched */
 	ret = (rc == 0);
 error:
@@ -4560,23 +4581,16 @@ test_bpf_match(pcap_t *pcap, const char *str,
 	return ret;
 }
 
-/* Basic sanity test can we match a IP packet */
-static int
-test_bpf_filter_sanity(pcap_t *pcap)
+/* Setup mbuf for filter test */
+static void
+dummy_ip_prep(void *data, uint16_t plen)
 {
-	const uint32_t plen = 100;
-	struct rte_mbuf mb, *m;
-	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
 	struct {
 		struct rte_ether_hdr eth_hdr;
 		struct rte_ipv4_hdr ip_hdr;
-	} *hdr;
+		struct rte_udp_hdr udp_hdr;
+	} *hdr = data;
 
-	memset(&mb, 0, sizeof(mb));
-	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
-	m = &mb;
-
-	hdr = rte_pktmbuf_mtod(m, typeof(hdr));
 	hdr->eth_hdr = (struct rte_ether_hdr) {
 		.dst_addr.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
 		.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
@@ -4589,13 +4603,32 @@ test_bpf_filter_sanity(pcap_t *pcap)
 		.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
 		.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
 	};
+	hdr->udp_hdr = (struct rte_udp_hdr) {
+		.src_port = rte_rand_max(UINT16_MAX),
+		.dst_port = rte_cpu_to_be_16(9),	/* discard port */
+		.dgram_len = rte_cpu_to_be_16(plen - sizeof(struct rte_ipv4_hdr)),
+		.dgram_cksum = 0,
+	};
+}
+
+
+/* Basic sanity test can we match a IP packet */
+static int
+test_bpf_filter_sanity(pcap_t *pcap)
+{
+	struct rte_mbuf mb = { 0 };
+	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
+	const uint32_t plen = 100;
+
+	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
+	dummy_ip_prep(rte_pktmbuf_mtod(&mb, void *), plen);
 
-	if (test_bpf_match(pcap, "ip", m) != 0) {
+	if (test_bpf_match(pcap, "ip", &mb) != 0) {
 		printf("%s@%d: filter \"ip\" doesn't match test data\n",
 		       __func__, __LINE__);
 		return -1;
 	}
-	if (test_bpf_match(pcap, "not ip", m) == 0) {
+	if (test_bpf_match(pcap, "not ip", &mb) == 0) {
 		printf("%s@%d: filter \"not ip\" does match test data\n",
 		       __func__, __LINE__);
 		return -1;
@@ -4648,10 +4681,15 @@ static const char * const sample_filters[] = {
 static int
 test_bpf_filter(pcap_t *pcap, const char *s)
 {
+	struct rte_mbuf mb = { 0 };
+	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
+	const uint32_t plen = 100;
 	struct bpf_program fcode;
 	struct rte_bpf_prm *prm = NULL;
 	struct rte_bpf *bpf = NULL;
+	int ret = -1;
 
+	printf("%s '%s'\n", __func__, s);
 	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
 		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
 		       __func__, __LINE__, s, pcap_geterr(pcap));
@@ -4665,8 +4703,10 @@ test_bpf_filter(pcap_t *pcap, const char *s)
 		goto error;
 	}
 
+#ifdef DEBUG
 	printf("bpf convert for \"%s\" produced:\n", s);
 	rte_bpf_dump(stdout, prm->ins, prm->nb_ins);
+#endif
 
 	bpf = rte_bpf_load(prm);
 	if (bpf == NULL) {
@@ -4675,6 +4715,30 @@ test_bpf_filter(pcap_t *pcap, const char *s)
 		goto error;
 	}
 
+	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
+	dummy_ip_prep(rte_pktmbuf_mtod(&mb, void *), plen);
+
+	uint64_t rc = rte_bpf_exec(bpf, &mb);
+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
+	{
+		struct rte_bpf_jit jit;
+
+		rte_bpf_get_jit(bpf, &jit);
+		if (jit.func == NULL) {
+			printf("%s@%d: no JIT generated\n", __func__, __LINE__);
+			goto error;
+		}
+
+		fflush(stdout);
+		uint64_t rc_jit = jit.func(&mb);
+		if (rc_jit != rc) {
+			printf("%s@%d: JIT return code does not match\n", __func__, __LINE__);
+			goto error;
+		}
+	}
+#endif
+	ret = 0;
+
 error:
 	if (bpf)
 		rte_bpf_destroy(bpf);
@@ -4685,7 +4749,7 @@ test_bpf_filter(pcap_t *pcap, const char *s)
 
 	rte_free(prm);
 	pcap_freecode(&fcode);
-	return (bpf == NULL) ? -1 : 0;
+	return ret;
 }
 
 static int
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 1/6] bpf/x86: fix JIT encoding of BPF_JSET with immediate
  2026-06-18 20:47   ` [PATCH v2 1/6] bpf/x86: fix JIT encoding of BPF_JSET with immediate Stephen Hemminger
@ 2026-06-19  2:09     ` Stephen Hemminger
  0 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-19  2:09 UTC (permalink / raw)
  To: dev; +Cc: stable, Konstantin Ananyev, Marat Khalili, Ferruh Yigit

On Thu, 18 Jun 2026 13:47:05 -0700
Stephen Hemminger <stephen@networkplumber.org> wrote:

> emit_tst_imm() emits TEST (0xF7 /0) but sized the immediate with
> imm_size(), which can return 1 byte. TEST has no imm8 form; it always
> takes imm32. A small mask like BPF_JSET | BPF_K #0x1 then produced a
> 4-byte instruction the CPU decodes as 7, swallowing the following Jcc
> and crashing.
> 
> Always emit a 32-bit immediate for TEST.
> 
> Bugzilla ID: 1959
> Fixes: cc752e43e079 ("bpf: add JIT compilation for x86_64 ISA")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---

Turns out there are two more places with similar bugs (spotted with AI review).

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 0/6] bpf: JIT related bug fixes
  2026-06-08 20:28 [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
                   ` (5 preceding siblings ...)
  2026-06-18 20:47 ` [PATCH v2 0/6] bpf: JIT related bug fixes Stephen Hemminger
@ 2026-06-21 16:23 ` Stephen Hemminger
  2026-06-21 16:23   ` [PATCH v3 1/6] bpf/x86: fix JIT encoding of BPF_JSET with immediate Stephen Hemminger
                     ` (5 more replies)
  2026-06-23 23:23 ` [PATCH v4 0/7] bpf: JIT related bug fixes Stephen Hemminger
                   ` (2 subsequent siblings)
  9 siblings, 6 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-21 16:23 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

While implementing JIT for packet capture ran into several issues:
  1. x86 JIT had pre-existing bug which would crash
  2. ARM64 BPF JIT was missing instructions for packet access.
     Which had been discovered previously [1]
  3. Tests related to JIT were not being run or missing coverage.

Fixed all of these. Patches are ordered so that most urgent fix
is first, follwed by the test that should have caught the problem.

The arm64 epilogue branch fix (patch 3) was originally posted by Christophe
Fontaine [1]; that series stalled, so it is carried here with his
authorship.

Changes since v1:
 - add x86 BPF_JSET encoding fix and a regression test (patches 1-2),
   found once the convert test ran generated code through the JIT
 - carry Christophe's arm64 epilogue fix with his sign-off (patch 3)
 - convert test now runs the converted filters through the JIT, not just
   loads them (patch 6)
 - kept Marat's ack (patch 4)
   Since tests change enough, decided to drop his ack for that part.

[1] https://inbox.dpdk.org/dev/20260319114500.9757-2-cfontain@redhat.com/

v3 -- found a couple similar places where x86 JIT was generating
      invalid op codes.

Christophe Fontaine (1):
  bpf/arm64: fix offset type to allow a negative jump

Stephen Hemminger (5):
  bpf/x86: fix JIT encoding of BPF_JSET with immediate
  test/bpf: add JSET test with small immediate
  test/bpf: check that JIT was generated
  bpf/arm64: add BPF_ABS/BPF_IND packet load support
  test/bpf: check that bpf_convert can be JIT'd

 app/test/test_bpf.c     | 184 ++++++++++++++++++++++++++++++++++++----
 lib/bpf/bpf_jit_arm64.c | 153 ++++++++++++++++++++++++++++++++-
 lib/bpf/bpf_jit_x86.c   |   6 +-
 3 files changed, 323 insertions(+), 20 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 1/6] bpf/x86: fix JIT encoding of BPF_JSET with immediate
  2026-06-21 16:23 ` [PATCH v3 0/6] bpf: JIT related bug fixes Stephen Hemminger
@ 2026-06-21 16:23   ` Stephen Hemminger
  2026-06-23 10:11     ` Marat Khalili
  2026-06-21 16:23   ` [PATCH v3 2/6] test/bpf: add JSET test with small immediate Stephen Hemminger
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-21 16:23 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, stable, Konstantin Ananyev, Marat Khalili,
	Ferruh Yigit

Several place in x86 JIT code, it assumes that for small immediate
values the instruction size is one byte; but it is not.

The immddiate form of the instruction takes a 32 bit value.
The broken version of emit_tst_imm() emits TEST (0xF7 /0)
but sized the immediate with imm_size(), which can return 1 byte.

A small mask like BPF_JSET | BPF_K #0x1 then produced a
4-byte instruction the CPU decodes as 7,
swallowing the following Jcc and crashing.

Always emit a 32-bit immediate for TEST, ROR and SHIFT.

Bugzilla ID: 1959
Fixes: cc752e43e079 ("bpf: add JIT compilation for x86_64 ISA")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_jit_x86.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/bpf/bpf_jit_x86.c b/lib/bpf/bpf_jit_x86.c
index 88b1b5aeab..b14a574703 100644
--- a/lib/bpf/bpf_jit_x86.c
+++ b/lib/bpf/bpf_jit_x86.c
@@ -300,7 +300,7 @@ emit_ror_imm(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
 	emit_rex(st, BPF_ALU, 0, dreg);
 	emit_bytes(st, &ops, sizeof(ops));
 	emit_modregrm(st, MOD_DIRECT, mods, dreg);
-	emit_imm(st, imm, imm_size(imm));
+	emit_imm(st, imm, sizeof(uint8_t));
 }
 
 /*
@@ -441,7 +441,7 @@ emit_shift_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg,
 	uint32_t imm)
 {
 	emit_shift(st, op, dreg);
-	emit_imm(st, imm, imm_size(imm));
+	emit_imm(st, imm, sizeof(uint8_t));
 }
 
 /*
@@ -921,7 +921,7 @@ emit_tst_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
 	emit_rex(st, op, 0, dreg);
 	emit_bytes(st, &ops, sizeof(ops));
 	emit_modregrm(st, MOD_DIRECT, mods, dreg);
-	emit_imm(st, imm, imm_size(imm));
+	emit_imm(st, imm, sizeof(int32_t));
 }
 
 static void
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v3 2/6] test/bpf: add JSET test with small immediate
  2026-06-21 16:23 ` [PATCH v3 0/6] bpf: JIT related bug fixes Stephen Hemminger
  2026-06-21 16:23   ` [PATCH v3 1/6] bpf/x86: fix JIT encoding of BPF_JSET with immediate Stephen Hemminger
@ 2026-06-21 16:23   ` Stephen Hemminger
  2026-06-23 10:16     ` Marat Khalili
  2026-06-21 16:23   ` [PATCH v3 3/6] bpf/arm64: fix offset type to allow a negative jump Stephen Hemminger
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-21 16:23 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Marat Khalili

The existing jump test only used a 32-bit JSET mask,
so the broken imm8 encoding of TEST in the x86 JIT was never exercised.
Add a case with a byte-sized mask;
run_test() runs it through the interpreter and the JIT.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 82 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index dd24722450..e70dea736f 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -3158,7 +3158,89 @@ static const struct ebpf_insn test_ld_mbuf3_prog[] = {
 };
 
 /* all bpf test cases */
+/*
+ * JSET with a byte-sized mask: exercises the imm8 path of the TEST
+ * encoding in the x86 JIT (a 32-bit mask takes a different path).
+ */
+static const struct ebpf_insn test_jset1_prog[] = {
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0,
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_B),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u8),
+	},
+	/* bit 0 is set in the input: branch is taken */
+	{
+		.code = (BPF_JMP | BPF_JSET | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 0x1,
+		.off = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_JA),
+		.off = 1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x1,
+	},
+	/* bit 1 is clear in the input: branch is not taken */
+	{
+		.code = (BPF_JMP | BPF_JSET | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 0x2,
+		.off = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_JA),
+		.off = 1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x2,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static void
+test_jset1_prepare(void *arg)
+{
+	struct dummy_offset *df = arg;
+
+	memset(df, 0, sizeof(*df));
+	df->u8 = 0x1;	/* bit 0 set, bit 1 clear */
+}
+
+static int
+test_jset1_check(uint64_t rc, const void *arg)
+{
+	return cmp_res(__func__, 0x1, rc, arg, arg, 0);
+}
+
 static const struct bpf_test tests[] = {
+	{
+		.name = "test_jset1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_jset1_prog,
+			.nb_ins = RTE_DIM(test_jset1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_offset),
+			},
+		},
+		.prepare = test_jset1_prepare,
+		.check_result = test_jset1_check,
+	},
 	{
 		.name = "test_store1",
 		.arg_sz = sizeof(struct dummy_offset),
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v3 3/6] bpf/arm64: fix offset type to allow a negative jump
  2026-06-21 16:23 ` [PATCH v3 0/6] bpf: JIT related bug fixes Stephen Hemminger
  2026-06-21 16:23   ` [PATCH v3 1/6] bpf/x86: fix JIT encoding of BPF_JSET with immediate Stephen Hemminger
  2026-06-21 16:23   ` [PATCH v3 2/6] test/bpf: add JSET test with small immediate Stephen Hemminger
@ 2026-06-21 16:23   ` Stephen Hemminger
  2026-06-22 16:26     ` Marat Khalili
  2026-06-21 16:23   ` [PATCH v3 4/6] test/bpf: check that JIT was generated Stephen Hemminger
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-21 16:23 UTC (permalink / raw)
  To: dev
  Cc: Christophe Fontaine, stable, Stephen Hemminger,
	Wathsala Vithanage, Konstantin Ananyev, Marat Khalili,
	Jerin Jacob

From: Christophe Fontaine <cfontain@redhat.com>

The DPDK BPF JIT standalone test test_ld_mbuf1 fails on arm64.
It does:
	r6 = r1                    // mbuf
	r0 = *(u8 *)pkt[0]         // BPF_ABS
	if ((r0 & 0xf0) == 0x40)
		goto parse
	r0 = 0
	exit                       // epilogue E0
parse:
	r0 = *(u8 *)pkt[r0 + 3]    // BPF_IND
	...
	exit

emit_return_zero_if_src_zero() returns 0 by branching to a function
epilogue. The target maybe a previous epilogue so branch
might be backwards; therefore the offset needs to be negative.

The offset was stored in a uint16_t, so a negative value wrapped to a
large positive number; emit_b() then branched past the end of the
program and faulted at run time.

Fixes: 111e2a747a4f ("bpf/arm: add basic arithmetic operations")
Cc: stable@dpdk.org

Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_jit_arm64.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
index a04ef33a9c..67e42015de 100644
--- a/lib/bpf/bpf_jit_arm64.c
+++ b/lib/bpf/bpf_jit_arm64.c
@@ -957,10 +957,12 @@ static void
 emit_return_zero_if_src_zero(struct a64_jit_ctx *ctx, bool is64, uint8_t src)
 {
 	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
-	uint16_t jump_to_epilogue;
+	int32_t jump_to_epilogue;
 
 	emit_cbnz(ctx, is64, src, 3);
 	emit_mov_imm(ctx, is64, r0, 0);
+
+	/* maybe backwards branch to earlier epilogue */
 	jump_to_epilogue = (ctx->program_start + ctx->program_sz) - ctx->idx;
 	emit_b(ctx, jump_to_epilogue);
 }
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v3 4/6] test/bpf: check that JIT was generated
  2026-06-21 16:23 ` [PATCH v3 0/6] bpf: JIT related bug fixes Stephen Hemminger
                     ` (2 preceding siblings ...)
  2026-06-21 16:23   ` [PATCH v3 3/6] bpf/arm64: fix offset type to allow a negative jump Stephen Hemminger
@ 2026-06-21 16:23   ` Stephen Hemminger
  2026-06-21 16:23   ` [PATCH v3 5/6] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
  2026-06-21 16:23   ` [PATCH v3 6/6] test/bpf: check that bpf_convert can be JIT'd Stephen Hemminger
  5 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-21 16:23 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Marat Khalili, Konstantin Ananyev

Avoid silently ignoring JIT failures. The test cases should
all succeed JIT compilation; if not it is a bug in the JIT
implementation and should be reported.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
 app/test/test_bpf.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index e70dea736f..3a88434c3c 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -3590,6 +3590,14 @@ run_test(const struct bpf_test *tst)
 				rv, strerror(rv));
 		}
 	}
+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
+	else {
+		/* a JIT backend exists for this arch, so it must compile */
+		printf("%s@%d: %s: no JIT code generated;\n",
+			__func__, __LINE__, tst->name);
+		ret = -1;
+	}
+#endif
 
 	rte_bpf_destroy(bpf);
 	return ret;
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v3 5/6] bpf/arm64: add BPF_ABS/BPF_IND packet load support
  2026-06-21 16:23 ` [PATCH v3 0/6] bpf: JIT related bug fixes Stephen Hemminger
                     ` (3 preceding siblings ...)
  2026-06-21 16:23   ` [PATCH v3 4/6] test/bpf: check that JIT was generated Stephen Hemminger
@ 2026-06-21 16:23   ` Stephen Hemminger
  2026-06-21 16:23   ` [PATCH v3 6/6] test/bpf: check that bpf_convert can be JIT'd Stephen Hemminger
  5 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-21 16:23 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Wathsala Vithanage, Konstantin Ananyev,
	Marat Khalili

The arm64 JIT rejected BPF_LD | BPF_ABS and BPF_LD | BPF_IND with
"invalid opcode", so cBPF programs converted by rte_bpf_convert() could
not be JITed. Add these opcodes, mirroring the x86 JIT: a fast path for
data held in the first mbuf segment, and a __rte_pktmbuf_read() slow
path for everything else.

The forward branches over the call cannot use fixed distances:
emit_call() materializes the helper address with a variable number of
mov/movk instructions, so the block sizes are not known up front. Size
the three blocks (fast path, slow path, common tail) in a dry run, then
emit for real with the branches resolved from the measured offsets.

Programs using these opcodes use the call register layout, since the
slow path makes a function call.

Bugzilla ID: 1427

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_jit_arm64.c | 149 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 148 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
index 67e42015de..3fc16fd984 100644
--- a/lib/bpf/bpf_jit_arm64.c
+++ b/lib/bpf/bpf_jit_arm64.c
@@ -1125,6 +1125,135 @@ emit_branch(struct a64_jit_ctx *ctx, uint8_t op, uint32_t i, int16_t off)
 	emit_b_cond(ctx, ebpf_to_a64_cond(op), jump_offset_get(ctx, i, off));
 }
 
+/* LD_ABS/LD_IND code block offsets (in arm64 instructions) */
+enum {
+	LDMB_FAST_OFS, /* fast path */
+	LDMB_SLOW_OFS, /* slow path */
+	LDMB_FIN_OFS,  /* common tail */
+	LDMB_OFS_NUM
+};
+
+/*
+ * Helper for emit_ld_mbuf(): fast path.
+ * Compute the packet offset; if it lies inside the first segment leave the
+ * data pointer in R0, otherwise branch to the slow path.
+ */
+static void
+emit_ldmb_fast_path(struct a64_jit_ctx *ctx, uint8_t src, uint8_t mode,
+		    uint32_t sz, int32_t imm, const uint32_t ofs[LDMB_OFS_NUM])
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
+	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
+	uint8_t tmp2 = ebpf_to_a64_reg(ctx, TMP_REG_2);
+	uint8_t tmp3 = ebpf_to_a64_reg(ctx, TMP_REG_3);
+
+	/* off = imm (+ src for BPF_IND) */
+	emit_mov_imm(ctx, 1, tmp1, imm);
+	if (mode == BPF_IND)
+		emit_add(ctx, 1, tmp1, src);
+
+	/* if ((int64_t)(mbuf->data_len - off) < sz) goto slow_path */
+	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_len));
+	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
+	emit_sub(ctx, 1, tmp2, tmp1);
+	emit_mov_imm(ctx, 1, tmp3, sz);
+	emit_cmp(ctx, 1, tmp2, tmp3);
+	emit_b_cond(ctx, A64_LT, (int32_t)(ofs[LDMB_SLOW_OFS] - ctx->idx));
+
+	/* R0 = mbuf->buf_addr + mbuf->data_off + off */
+	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_off));
+	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
+	emit_mov_imm(ctx, 1, r0, offsetof(struct rte_mbuf, buf_addr));
+	emit_ldr(ctx, EBPF_DW, r0, r6, r0);
+	emit_add(ctx, 1, r0, tmp2);
+	emit_add(ctx, 1, r0, tmp1);
+
+	emit_b(ctx, (int32_t)(ofs[LDMB_FIN_OFS] - ctx->idx));
+}
+
+/*
+ * Helper for emit_ld_mbuf(): slow path.
+ * R0 = __rte_pktmbuf_read(mbuf, off, sz, buf); return 0 if NULL.
+ * The scratch buffer is the space reserved by __rte_bpf_validate() at the
+ * bottom of the eBPF stack frame, i.e. (frame_pointer - stack_ofs).
+ */
+static void
+emit_ldmb_slow_path(struct a64_jit_ctx *ctx, uint32_t sz, uint32_t stack_ofs)
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
+	uint8_t fp = ebpf_to_a64_reg(ctx, EBPF_FP);
+	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
+
+	/* arguments of __rte_pktmbuf_read(mbuf, off, len, buf) */
+	emit_mov_64(ctx, A64_R(1), tmp1);		/* off (held in tmp1) */
+	emit_mov_64(ctx, A64_R(0), r6);			/* mbuf */
+	emit_mov_imm(ctx, 0, A64_R(2), sz);		/* len */
+	emit_sub_imm_64(ctx, A64_R(3), fp, stack_ofs);	/* buf */
+
+	emit_call(ctx, tmp1, (void *)(uintptr_t)__rte_pktmbuf_read);
+	emit_return_zero_if_src_zero(ctx, 1, r0);
+}
+
+/*
+ * Helper for emit_ld_mbuf(): common tail.
+ * Load the value pointed to by R0 and convert from network byte order.
+ */
+static void
+emit_ldmb_fin(struct a64_jit_ctx *ctx, uint8_t opsz, uint32_t sz)
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+
+	emit_ldr(ctx, opsz, r0, r0, A64_ZR);
+	if (opsz != BPF_B)
+		emit_be(ctx, r0, sz * 8);
+}
+
+/*
+ * emit code for BPF_ABS/BPF_IND load.
+ * generates the following construction:
+ * fast_path:
+ *   off = src + imm
+ *   if (mbuf->data_len - off < sz)
+ *      goto slow_path;
+ *   ptr = mbuf->buf_addr + mbuf->data_off + off;
+ *   goto fin_part;
+ * slow_path:
+ *   typeof(sz) buf;  // scratch space reserved on the eBPF stack
+ *   ptr = __rte_pktmbuf_read(mbuf, off, sz, &buf);
+ *   if (ptr == NULL)
+ *      return 0;
+ * fin_part:
+ *   res = *(typeof(sz))ptr;
+ *   res = ntoh(res);
+ */
+static void
+emit_ld_mbuf(struct a64_jit_ctx *ctx, uint8_t op, uint8_t src, int32_t imm,
+	     uint32_t stack_ofs)
+{
+	uint8_t mode = BPF_MODE(op);
+	uint8_t opsz = BPF_SIZE(op);
+	uint32_t sz = bpf_size(opsz);
+	uint32_t ofs[LDMB_OFS_NUM];
+
+	/* seed offsets so the dry-run branches stay in range */
+	ofs[LDMB_FAST_OFS] = ofs[LDMB_SLOW_OFS] = ofs[LDMB_FIN_OFS] = ctx->idx;
+
+	/* dry run to record block offsets */
+	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
+	ofs[LDMB_SLOW_OFS] = ctx->idx;
+	emit_ldmb_slow_path(ctx, sz, stack_ofs);
+	ofs[LDMB_FIN_OFS] = ctx->idx;
+	emit_ldmb_fin(ctx, opsz, sz);
+
+	/* rewind and emit for real with resolved offsets */
+	ctx->idx = ofs[LDMB_FAST_OFS];
+	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
+	emit_ldmb_slow_path(ctx, sz, stack_ofs);
+	emit_ldmb_fin(ctx, opsz, sz);
+}
+
 static void
 check_program_has_call(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 {
@@ -1137,8 +1266,17 @@ check_program_has_call(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 		op = ins->code;
 
 		switch (op) {
-		/* Call imm */
+		/*
+		 * BPF_ABS/BPF_IND can fall through to __rte_pktmbuf_read(),
+		 * so they need the call-clobbered register layout as well.
+		 */
 		case (BPF_JMP | EBPF_CALL):
+		case (BPF_LD | BPF_ABS | BPF_B):
+		case (BPF_LD | BPF_ABS | BPF_H):
+		case (BPF_LD | BPF_ABS | BPF_W):
+		case (BPF_LD | BPF_IND | BPF_B):
+		case (BPF_LD | BPF_IND | BPF_H):
+		case (BPF_LD | BPF_IND | BPF_W):
 			ctx->foundcall = 1;
 			return;
 		}
@@ -1340,6 +1478,15 @@ emit(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 			emit_mov_imm(ctx, 1, dst, u64);
 			i++;
 			break;
+		/* R0 = ntoh(*(size *)(mbuf data + (src) + imm)) */
+		case (BPF_LD | BPF_ABS | BPF_B):
+		case (BPF_LD | BPF_ABS | BPF_H):
+		case (BPF_LD | BPF_ABS | BPF_W):
+		case (BPF_LD | BPF_IND | BPF_B):
+		case (BPF_LD | BPF_IND | BPF_H):
+		case (BPF_LD | BPF_IND | BPF_W):
+			emit_ld_mbuf(ctx, op, src, imm, bpf->stack_sz);
+			break;
 		/* *(size *)(dst + off) = src */
 		case (BPF_STX | BPF_MEM | BPF_B):
 		case (BPF_STX | BPF_MEM | BPF_H):
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v3 6/6] test/bpf: check that bpf_convert can be JIT'd
  2026-06-21 16:23 ` [PATCH v3 0/6] bpf: JIT related bug fixes Stephen Hemminger
                     ` (4 preceding siblings ...)
  2026-06-21 16:23   ` [PATCH v3 5/6] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
@ 2026-06-21 16:23   ` Stephen Hemminger
  2026-06-23 13:57     ` Marat Khalili
  5 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-21 16:23 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Marat Khalili

Add followup in bpf conversion tests to make sure resulting
code was also run through JIT and that JIT produces
same results as non-JIT.

Reduce log output to make it easier to match which
expression might be causing issues.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 94 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 79 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 3a88434c3c..973dd7d659 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -8,6 +8,7 @@
 #include <inttypes.h>
 #include <unistd.h>
 
+#include <rte_byteorder.h>
 #include <rte_memory.h>
 #include <rte_debug.h>
 #include <rte_hexdump.h>
@@ -32,6 +33,7 @@ test_bpf(void)
 #include <rte_bpf.h>
 #include <rte_ether.h>
 #include <rte_ip.h>
+#include <rte_udp.h>
 
 
 /* Tests of most simple BPF programs (no instructions, one instruction etc.) */
@@ -4529,6 +4531,7 @@ test_bpf_match(pcap_t *pcap, const char *str,
 	int ret = -1;
 	uint64_t rc;
 
+	printf("%s '%s'\n", __func__, str);
 	if (pcap_compile(pcap, &fcode, str, 1, PCAP_NETMASK_UNKNOWN)) {
 		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
 		       __func__, __LINE__,  str, pcap_geterr(pcap));
@@ -4550,6 +4553,24 @@ test_bpf_match(pcap_t *pcap, const char *str,
 	}
 
 	rc = rte_bpf_exec(bpf, mb);
+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
+	{
+		struct rte_bpf_jit jit;
+
+		rte_bpf_get_jit(bpf, &jit);
+		if (jit.func == NULL) {
+			printf("%s@%d: no JIT generated\n", __func__, __LINE__);
+			goto error;
+		}
+
+		fflush(stdout);
+		uint64_t rc_jit = jit.func(mb);
+		if (rc_jit != rc) {
+			printf("%s@%d: JIT return code does not match\n", __func__, __LINE__);
+			goto error;
+		}
+	}
+#endif
 	/* The return code from bpf capture filter is non-zero if matched */
 	ret = (rc == 0);
 error:
@@ -4560,23 +4581,16 @@ test_bpf_match(pcap_t *pcap, const char *str,
 	return ret;
 }
 
-/* Basic sanity test can we match a IP packet */
-static int
-test_bpf_filter_sanity(pcap_t *pcap)
+/* Setup mbuf for filter test */
+static void
+dummy_ip_prep(void *data, uint16_t plen)
 {
-	const uint32_t plen = 100;
-	struct rte_mbuf mb, *m;
-	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
 	struct {
 		struct rte_ether_hdr eth_hdr;
 		struct rte_ipv4_hdr ip_hdr;
-	} *hdr;
+		struct rte_udp_hdr udp_hdr;
+	} *hdr = data;
 
-	memset(&mb, 0, sizeof(mb));
-	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
-	m = &mb;
-
-	hdr = rte_pktmbuf_mtod(m, typeof(hdr));
 	hdr->eth_hdr = (struct rte_ether_hdr) {
 		.dst_addr.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
 		.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
@@ -4589,13 +4603,32 @@ test_bpf_filter_sanity(pcap_t *pcap)
 		.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
 		.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
 	};
+	hdr->udp_hdr = (struct rte_udp_hdr) {
+		.src_port = rte_rand_max(UINT16_MAX),
+		.dst_port = rte_cpu_to_be_16(9),	/* discard port */
+		.dgram_len = rte_cpu_to_be_16(plen - sizeof(struct rte_ipv4_hdr)),
+		.dgram_cksum = 0,
+	};
+}
+
+
+/* Basic sanity test can we match a IP packet */
+static int
+test_bpf_filter_sanity(pcap_t *pcap)
+{
+	struct rte_mbuf mb = { 0 };
+	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
+	const uint32_t plen = 100;
+
+	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
+	dummy_ip_prep(rte_pktmbuf_mtod(&mb, void *), plen);
 
-	if (test_bpf_match(pcap, "ip", m) != 0) {
+	if (test_bpf_match(pcap, "ip", &mb) != 0) {
 		printf("%s@%d: filter \"ip\" doesn't match test data\n",
 		       __func__, __LINE__);
 		return -1;
 	}
-	if (test_bpf_match(pcap, "not ip", m) == 0) {
+	if (test_bpf_match(pcap, "not ip", &mb) == 0) {
 		printf("%s@%d: filter \"not ip\" does match test data\n",
 		       __func__, __LINE__);
 		return -1;
@@ -4648,10 +4681,15 @@ static const char * const sample_filters[] = {
 static int
 test_bpf_filter(pcap_t *pcap, const char *s)
 {
+	struct rte_mbuf mb = { 0 };
+	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
+	const uint32_t plen = 100;
 	struct bpf_program fcode;
 	struct rte_bpf_prm *prm = NULL;
 	struct rte_bpf *bpf = NULL;
+	int ret = -1;
 
+	printf("%s '%s'\n", __func__, s);
 	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
 		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
 		       __func__, __LINE__, s, pcap_geterr(pcap));
@@ -4665,8 +4703,10 @@ test_bpf_filter(pcap_t *pcap, const char *s)
 		goto error;
 	}
 
+#ifdef DEBUG
 	printf("bpf convert for \"%s\" produced:\n", s);
 	rte_bpf_dump(stdout, prm->ins, prm->nb_ins);
+#endif
 
 	bpf = rte_bpf_load(prm);
 	if (bpf == NULL) {
@@ -4675,6 +4715,30 @@ test_bpf_filter(pcap_t *pcap, const char *s)
 		goto error;
 	}
 
+	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
+	dummy_ip_prep(rte_pktmbuf_mtod(&mb, void *), plen);
+
+	uint64_t rc = rte_bpf_exec(bpf, &mb);
+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
+	{
+		struct rte_bpf_jit jit;
+
+		rte_bpf_get_jit(bpf, &jit);
+		if (jit.func == NULL) {
+			printf("%s@%d: no JIT generated\n", __func__, __LINE__);
+			goto error;
+		}
+
+		fflush(stdout);
+		uint64_t rc_jit = jit.func(&mb);
+		if (rc_jit != rc) {
+			printf("%s@%d: JIT return code does not match\n", __func__, __LINE__);
+			goto error;
+		}
+	}
+#endif
+	ret = 0;
+
 error:
 	if (bpf)
 		rte_bpf_destroy(bpf);
@@ -4685,7 +4749,7 @@ test_bpf_filter(pcap_t *pcap, const char *s)
 
 	rte_free(prm);
 	pcap_freecode(&fcode);
-	return (bpf == NULL) ? -1 : 0;
+	return ret;
 }
 
 static int
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* RE: [PATCH v3 3/6] bpf/arm64: fix offset type to allow a negative jump
  2026-06-21 16:23   ` [PATCH v3 3/6] bpf/arm64: fix offset type to allow a negative jump Stephen Hemminger
@ 2026-06-22 16:26     ` Marat Khalili
  0 siblings, 0 replies; 73+ messages in thread
From: Marat Khalili @ 2026-06-22 16:26 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org
  Cc: Christophe Fontaine, stable@dpdk.org, Wathsala Vithanage,
	Konstantin Ananyev, Jerin Jacob

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Sunday 21 June 2026 17:24
> To: dev@dpdk.org
> Cc: Christophe Fontaine <cfontain@redhat.com>; stable@dpdk.org; Stephen Hemminger
> <stephen@networkplumber.org>; Wathsala Vithanage <wathsala.vithanage@arm.com>; Konstantin Ananyev
> <konstantin.ananyev@huawei.com>; Marat Khalili <marat.khalili@huawei.com>; Jerin Jacob
> <jerinj@marvell.com>
> Subject: [PATCH v3 3/6] bpf/arm64: fix offset type to allow a negative jump
> 
> From: Christophe Fontaine <cfontain@redhat.com>
> 
> The DPDK BPF JIT standalone test test_ld_mbuf1 fails on arm64.
> It does:
> 	r6 = r1                    // mbuf
> 	r0 = *(u8 *)pkt[0]         // BPF_ABS
> 	if ((r0 & 0xf0) == 0x40)
> 		goto parse
> 	r0 = 0
> 	exit                       // epilogue E0
> parse:
> 	r0 = *(u8 *)pkt[r0 + 3]    // BPF_IND
> 	...
> 	exit
> 
> emit_return_zero_if_src_zero() returns 0 by branching to a function
> epilogue. The target maybe a previous epilogue so branch
> might be backwards; therefore the offset needs to be negative.
> 
> The offset was stored in a uint16_t, so a negative value wrapped to a
> large positive number; emit_b() then branched past the end of the
> program and faulted at run time.
> 
> Fixes: 111e2a747a4f ("bpf/arm: add basic arithmetic operations")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/bpf/bpf_jit_arm64.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
> index a04ef33a9c..67e42015de 100644
> --- a/lib/bpf/bpf_jit_arm64.c
> +++ b/lib/bpf/bpf_jit_arm64.c
> @@ -957,10 +957,12 @@ static void
>  emit_return_zero_if_src_zero(struct a64_jit_ctx *ctx, bool is64, uint8_t src)
>  {
>  	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
> -	uint16_t jump_to_epilogue;
> +	int32_t jump_to_epilogue;
> 
>  	emit_cbnz(ctx, is64, src, 3);
>  	emit_mov_imm(ctx, is64, r0, 0);
> +
> +	/* maybe backwards branch to earlier epilogue */
>  	jump_to_epilogue = (ctx->program_start + ctx->program_sz) - ctx->idx;
>  	emit_b(ctx, jump_to_epilogue);
>  }
> --
> 2.53.0

I still wish it was not called program_sz here, but the fix is not wrong, so

Acked-by: Marat Khalili <marat.khalili@huawei.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v3 1/6] bpf/x86: fix JIT encoding of BPF_JSET with immediate
  2026-06-21 16:23   ` [PATCH v3 1/6] bpf/x86: fix JIT encoding of BPF_JSET with immediate Stephen Hemminger
@ 2026-06-23 10:11     ` Marat Khalili
  0 siblings, 0 replies; 73+ messages in thread
From: Marat Khalili @ 2026-06-23 10:11 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org
  Cc: stable@dpdk.org, Konstantin Ananyev, Ferruh Yigit

With the condition that the commit message is proofread,

Acked-by: Marat Khalili <marat.khalili@huawei.com>

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Sunday 21 June 2026 17:24
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; stable@dpdk.org; Konstantin Ananyev
> <konstantin.ananyev@huawei.com>; Marat Khalili <marat.khalili@huawei.com>; Ferruh Yigit
> <ferruh.yigit@amd.com>
> Subject: [PATCH v3 1/6] bpf/x86: fix JIT encoding of BPF_JSET with immediate
> 
> Several place in x86 JIT code, it assumes that for small immediate
> values the instruction size is one byte; but it is not.
> 
> The immddiate form of the instruction takes a 32 bit value.
> The broken version of emit_tst_imm() emits TEST (0xF7 /0)
> but sized the immediate with imm_size(), which can return 1 byte.
> 
> A small mask like BPF_JSET | BPF_K #0x1 then produced a
> 4-byte instruction the CPU decodes as 7,
> swallowing the following Jcc and crashing.
> 
> Always emit a 32-bit immediate for TEST, ROR and SHIFT.

The commit message needs to be LLMed for typos and factual mistakes.

> 
> Bugzilla ID: 1959
> Fixes: cc752e43e079 ("bpf: add JIT compilation for x86_64 ISA")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/bpf/bpf_jit_x86.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/bpf/bpf_jit_x86.c b/lib/bpf/bpf_jit_x86.c
> index 88b1b5aeab..b14a574703 100644
> --- a/lib/bpf/bpf_jit_x86.c
> +++ b/lib/bpf/bpf_jit_x86.c
> @@ -300,7 +300,7 @@ emit_ror_imm(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
>  	emit_rex(st, BPF_ALU, 0, dreg);
>  	emit_bytes(st, &ops, sizeof(ops));
>  	emit_modregrm(st, MOD_DIRECT, mods, dreg);
> -	emit_imm(st, imm, imm_size(imm));
> +	emit_imm(st, imm, sizeof(uint8_t));

The fix appears to be correct, although this function was only ever called with
imm == 8, so the problem was not reproducible.

>  }
> 
>  /*
> @@ -441,7 +441,7 @@ emit_shift_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg,
>  	uint32_t imm)
>  {
>  	emit_shift(st, op, dreg);
> -	emit_imm(st, imm, imm_size(imm));
> +	emit_imm(st, imm, sizeof(uint8_t));

The fix appears to be correct, I would welcome a test reproducing the problem.

>  }
> 
>  /*
> @@ -921,7 +921,7 @@ emit_tst_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
>  	emit_rex(st, op, 0, dreg);
>  	emit_bytes(st, &ops, sizeof(ops));
>  	emit_modregrm(st, MOD_DIRECT, mods, dreg);
> -	emit_imm(st, imm, imm_size(imm));
> +	emit_imm(st, imm, sizeof(int32_t));

The fix appears to be correct.

>  }
> 
>  static void
> --
> 2.53.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v3 2/6] test/bpf: add JSET test with small immediate
  2026-06-21 16:23   ` [PATCH v3 2/6] test/bpf: add JSET test with small immediate Stephen Hemminger
@ 2026-06-23 10:16     ` Marat Khalili
  0 siblings, 0 replies; 73+ messages in thread
From: Marat Khalili @ 2026-06-23 10:16 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org; +Cc: Konstantin Ananyev

This instruction has an interesting behavior for negative values of immediate,
to be thorough I would test them as well. Fixed x86 should pass but who knows.

I would also welcome a test reproducing the problem with shifts.

For the narrow scope of this test,

Acked-by: Marat Khalili <marat.khalili@huawei.com>

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Sunday 21 June 2026 17:24
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Konstantin Ananyev <konstantin.ananyev@huawei.com>;
> Marat Khalili <marat.khalili@huawei.com>
> Subject: [PATCH v3 2/6] test/bpf: add JSET test with small immediate
> 
> The existing jump test only used a 32-bit JSET mask,
> so the broken imm8 encoding of TEST in the x86 JIT was never exercised.
> Add a case with a byte-sized mask;
> run_test() runs it through the interpreter and the JIT.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  app/test/test_bpf.c | 82 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 82 insertions(+)
> 
> diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
> index dd24722450..e70dea736f 100644
> --- a/app/test/test_bpf.c
> +++ b/app/test/test_bpf.c
> @@ -3158,7 +3158,89 @@ static const struct ebpf_insn test_ld_mbuf3_prog[] = {
>  };
> 
>  /* all bpf test cases */
> +/*
> + * JSET with a byte-sized mask: exercises the imm8 path of the TEST
> + * encoding in the x86 JIT (a 32-bit mask takes a different path).
> + */
> +static const struct ebpf_insn test_jset1_prog[] = {
> +	{
> +		.code = (BPF_ALU | EBPF_MOV | BPF_K),
> +		.dst_reg = EBPF_REG_0,
> +		.imm = 0,
> +	},
> +	{
> +		.code = (BPF_LDX | BPF_MEM | BPF_B),
> +		.dst_reg = EBPF_REG_2,
> +		.src_reg = EBPF_REG_1,
> +		.off = offsetof(struct dummy_offset, u8),
> +	},
> +	/* bit 0 is set in the input: branch is taken */
> +	{
> +		.code = (BPF_JMP | BPF_JSET | BPF_K),
> +		.dst_reg = EBPF_REG_2,
> +		.imm = 0x1,
> +		.off = 1,
> +	},
> +	{
> +		.code = (BPF_JMP | BPF_JA),
> +		.off = 1,
> +	},
> +	{
> +		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
> +		.dst_reg = EBPF_REG_0,
> +		.imm = 0x1,
> +	},
> +	/* bit 1 is clear in the input: branch is not taken */
> +	{
> +		.code = (BPF_JMP | BPF_JSET | BPF_K),
> +		.dst_reg = EBPF_REG_2,
> +		.imm = 0x2,
> +		.off = 1,
> +	},
> +	{
> +		.code = (BPF_JMP | BPF_JA),
> +		.off = 1,
> +	},
> +	{
> +		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
> +		.dst_reg = EBPF_REG_0,
> +		.imm = 0x2,
> +	},
> +	{
> +		.code = (BPF_JMP | EBPF_EXIT),
> +	},
> +};
> +
> +static void
> +test_jset1_prepare(void *arg)
> +{
> +	struct dummy_offset *df = arg;
> +
> +	memset(df, 0, sizeof(*df));
> +	df->u8 = 0x1;	/* bit 0 set, bit 1 clear */
> +}
> +
> +static int
> +test_jset1_check(uint64_t rc, const void *arg)
> +{
> +	return cmp_res(__func__, 0x1, rc, arg, arg, 0);
> +}
> +
>  static const struct bpf_test tests[] = {
> +	{
> +		.name = "test_jset1",
> +		.arg_sz = sizeof(struct dummy_offset),
> +		.prm = {
> +			.ins = test_jset1_prog,
> +			.nb_ins = RTE_DIM(test_jset1_prog),
> +			.prog_arg = {
> +				.type = RTE_BPF_ARG_PTR,
> +				.size = sizeof(struct dummy_offset),
> +			},
> +		},
> +		.prepare = test_jset1_prepare,
> +		.check_result = test_jset1_check,
> +	},
>  	{
>  		.name = "test_store1",
>  		.arg_sz = sizeof(struct dummy_offset),
> --
> 2.53.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v3 6/6] test/bpf: check that bpf_convert can be JIT'd
  2026-06-21 16:23   ` [PATCH v3 6/6] test/bpf: check that bpf_convert can be JIT'd Stephen Hemminger
@ 2026-06-23 13:57     ` Marat Khalili
  2026-06-23 15:51       ` Stephen Hemminger
  2026-06-23 20:58       ` Stephen Hemminger
  0 siblings, 2 replies; 73+ messages in thread
From: Marat Khalili @ 2026-06-23 13:57 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org; +Cc: Konstantin Ananyev

Thank you for working on this, please see some comments inline.

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Sunday 21 June 2026 17:24
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Konstantin Ananyev <konstantin.ananyev@huawei.com>;
> Marat Khalili <marat.khalili@huawei.com>
> Subject: [PATCH v3 6/6] test/bpf: check that bpf_convert can be JIT'd
> 
> Add followup in bpf conversion tests to make sure resulting
> code was also run through JIT and that JIT produces
> same results as non-JIT.
> 
> Reduce log output to make it easier to match which
> expression might be causing issues.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  app/test/test_bpf.c | 94 +++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 79 insertions(+), 15 deletions(-)
> 
> diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
> index 3a88434c3c..973dd7d659 100644
> --- a/app/test/test_bpf.c
> +++ b/app/test/test_bpf.c
> @@ -8,6 +8,7 @@
>  #include <inttypes.h>
>  #include <unistd.h>
> 
> +#include <rte_byteorder.h>
>  #include <rte_memory.h>
>  #include <rte_debug.h>
>  #include <rte_hexdump.h>
> @@ -32,6 +33,7 @@ test_bpf(void)
>  #include <rte_bpf.h>
>  #include <rte_ether.h>
>  #include <rte_ip.h>
> +#include <rte_udp.h>
> 
> 
>  /* Tests of most simple BPF programs (no instructions, one instruction etc.) */
> @@ -4529,6 +4531,7 @@ test_bpf_match(pcap_t *pcap, const char *str,
>  	int ret = -1;
>  	uint64_t rc;
> 
> +	printf("%s '%s'\n", __func__, str);
>  	if (pcap_compile(pcap, &fcode, str, 1, PCAP_NETMASK_UNKNOWN)) {
>  		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
>  		       __func__, __LINE__,  str, pcap_geterr(pcap));
> @@ -4550,6 +4553,24 @@ test_bpf_match(pcap_t *pcap, const char *str,
>  	}
> 
>  	rc = rte_bpf_exec(bpf, mb);
> +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)

We are going to have a few of these lines, I think something like
RTE_BPF_JIT_SUPPORTED is warranted.

> +	{
> +		struct rte_bpf_jit jit;
> +
> +		rte_bpf_get_jit(bpf, &jit);

Out of abundance of caution I would also prefill jit with zeroes and check the
return code here.

> +		if (jit.func == NULL) {
> +			printf("%s@%d: no JIT generated\n", __func__, __LINE__);
> +			goto error;
> +		}
> +
> +		fflush(stdout);
> +		uint64_t rc_jit = jit.func(mb);
> +		if (rc_jit != rc) {
> +			printf("%s@%d: JIT return code does not match\n", __func__, __LINE__);
> +			goto error;
> +		}
> +	}
> +#endif
>  	/* The return code from bpf capture filter is non-zero if matched */
>  	ret = (rc == 0);
>  error:
> @@ -4560,23 +4581,16 @@ test_bpf_match(pcap_t *pcap, const char *str,
>  	return ret;
>  }
> 
> -/* Basic sanity test can we match a IP packet */
> -static int
> -test_bpf_filter_sanity(pcap_t *pcap)
> +/* Setup mbuf for filter test */
> +static void
> +dummy_ip_prep(void *data, uint16_t plen)
>  {
> -	const uint32_t plen = 100;
> -	struct rte_mbuf mb, *m;
> -	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
>  	struct {
>  		struct rte_ether_hdr eth_hdr;
>  		struct rte_ipv4_hdr ip_hdr;
> -	} *hdr;
> +		struct rte_udp_hdr udp_hdr;
> +	} *hdr = data;
> 
> -	memset(&mb, 0, sizeof(mb));
> -	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
> -	m = &mb;
> -
> -	hdr = rte_pktmbuf_mtod(m, typeof(hdr));
>  	hdr->eth_hdr = (struct rte_ether_hdr) {
>  		.dst_addr.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
>  		.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
> @@ -4589,13 +4603,32 @@ test_bpf_filter_sanity(pcap_t *pcap)
>  		.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
>  		.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
>  	};
> +	hdr->udp_hdr = (struct rte_udp_hdr) {
> +		.src_port = rte_rand_max(UINT16_MAX),
> +		.dst_port = rte_cpu_to_be_16(9),	/* discard port */
> +		.dgram_len = rte_cpu_to_be_16(plen - sizeof(struct rte_ipv4_hdr)),
> +		.dgram_cksum = 0,
> +	};
> +}
> +
> +
> +/* Basic sanity test can we match a IP packet */
> +static int
> +test_bpf_filter_sanity(pcap_t *pcap)
> +{
> +	struct rte_mbuf mb = { 0 };
> +	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
> +	const uint32_t plen = 100;
> +
> +	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
> +	dummy_ip_prep(rte_pktmbuf_mtod(&mb, void *), plen);
> 
> -	if (test_bpf_match(pcap, "ip", m) != 0) {
> +	if (test_bpf_match(pcap, "ip", &mb) != 0) {
>  		printf("%s@%d: filter \"ip\" doesn't match test data\n",
>  		       __func__, __LINE__);
>  		return -1;
>  	}
> -	if (test_bpf_match(pcap, "not ip", m) == 0) {
> +	if (test_bpf_match(pcap, "not ip", &mb) == 0) {

Not a new bug, but this condition should be for non-positive, not just zero.

>  		printf("%s@%d: filter \"not ip\" does match test data\n",
>  		       __func__, __LINE__);
>  		return -1;
> @@ -4648,10 +4681,15 @@ static const char * const sample_filters[] = {
>  static int
>  test_bpf_filter(pcap_t *pcap, const char *s)
>  {
> +	struct rte_mbuf mb = { 0 };
> +	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
> +	const uint32_t plen = 100;
>  	struct bpf_program fcode;
>  	struct rte_bpf_prm *prm = NULL;
>  	struct rte_bpf *bpf = NULL;
> +	int ret = -1;
> 
> +	printf("%s '%s'\n", __func__, s);
>  	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
>  		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
>  		       __func__, __LINE__, s, pcap_geterr(pcap));
> @@ -4665,8 +4703,10 @@ test_bpf_filter(pcap_t *pcap, const char *s)
>  		goto error;
>  	}
> 
> +#ifdef DEBUG
>  	printf("bpf convert for \"%s\" produced:\n", s);
>  	rte_bpf_dump(stdout, prm->ins, prm->nb_ins);
> +#endif
> 
>  	bpf = rte_bpf_load(prm);
>  	if (bpf == NULL) {
> @@ -4675,6 +4715,30 @@ test_bpf_filter(pcap_t *pcap, const char *s)
>  		goto error;
>  	}
> 
> +	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
> +	dummy_ip_prep(rte_pktmbuf_mtod(&mb, void *), plen);
> +
> +	uint64_t rc = rte_bpf_exec(bpf, &mb);

Would it be hard to check the result against a known correct answer?

> +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
> +	{
> +		struct rte_bpf_jit jit;
> +
> +		rte_bpf_get_jit(bpf, &jit);

Same suggestion regarding zeroing jit and check return code here.

> +		if (jit.func == NULL) {
> +			printf("%s@%d: no JIT generated\n", __func__, __LINE__);
> +			goto error;
> +		}
> +
> +		fflush(stdout);
> +		uint64_t rc_jit = jit.func(&mb);
> +		if (rc_jit != rc) {
> +			printf("%s@%d: JIT return code does not match\n", __func__, __LINE__);
> +			goto error;
> +		}
> +	}
> +#endif
> +	ret = 0;
> +

Are `test_bpf_filter` and `test_bpf_filter_sanity` substantially different any
more, or could they just be merged?

>  error:
>  	if (bpf)
>  		rte_bpf_destroy(bpf);
> @@ -4685,7 +4749,7 @@ test_bpf_filter(pcap_t *pcap, const char *s)
> 
>  	rte_free(prm);
>  	pcap_freecode(&fcode);
> -	return (bpf == NULL) ? -1 : 0;
> +	return ret;
>  }
> 
>  static int
> --
> 2.53.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 6/6] test/bpf: check that bpf_convert can be JIT'd
  2026-06-23 13:57     ` Marat Khalili
@ 2026-06-23 15:51       ` Stephen Hemminger
  2026-06-23 20:58       ` Stephen Hemminger
  1 sibling, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-23 15:51 UTC (permalink / raw)
  To: Marat Khalili; +Cc: dev@dpdk.org, Konstantin Ananyev

On Tue, 23 Jun 2026 13:57:35 +0000
Marat Khalili <marat.khalili@huawei.com> wrote:

> Thank you for working on this, please see some comments inline.

I think it is still worth keeping the tests split into two.
One test just make sure that some basic filters work as expected (match vs not match).

The other one is just doing its job by causing pcap_compile() to generate more complex
and different code. Since the packet we are feeding it is just a dummy packet, I suspect
all of the filters will be false. Let me recheck, if so then can look at return value.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 6/6] test/bpf: check that bpf_convert can be JIT'd
  2026-06-23 13:57     ` Marat Khalili
  2026-06-23 15:51       ` Stephen Hemminger
@ 2026-06-23 20:58       ` Stephen Hemminger
  1 sibling, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-23 20:58 UTC (permalink / raw)
  To: Marat Khalili; +Cc: dev@dpdk.org, Konstantin Ananyev

On Tue, 23 Jun 2026 13:57:35 +0000
Marat Khalili <marat.khalili@huawei.com> wrote:

> > +	{
> > +		struct rte_bpf_jit jit;
> > +
> > +		rte_bpf_get_jit(bpf, &jit);  
> 
> Out of abundance of caution I would also prefill jit with zeroes and check the
> return code here.

Makes sense, but the test already was just doing same thing elsewhere:

static int
run_test(const struct bpf_test *tst)
{
	int32_t ret, rv;
...
	bpf = rte_bpf_load(&tst->prm);
...
	/* repeat the same test with jit, when possible */
	rte_bpf_get_jit(bpf, &jit);
	if (jit.func != NULL) {

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v4 0/7] bpf: JIT related bug fixes
  2026-06-08 20:28 [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
                   ` (6 preceding siblings ...)
  2026-06-21 16:23 ` [PATCH v3 0/6] bpf: JIT related bug fixes Stephen Hemminger
@ 2026-06-23 23:23 ` Stephen Hemminger
  2026-06-23 23:23   ` [PATCH v4 1/7] bpf/x86: fix JIT encoding of fixed-width immediates Stephen Hemminger
                     ` (6 more replies)
  2026-06-24 17:54 ` [PATCH v5 0/9] bpf: JIT related bug fixes Stephen Hemminger
  2026-06-25 17:30 ` [PATCH v6 0/9] bpf: JIT related bug fixes Stephen Hemminger
  9 siblings, 7 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-23 23:23 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

While implementing JIT for packet capture ran into several issues:
  1. x86 JIT had pre-existing bug which would crash
  2. ARM64 BPF JIT was missing instructions for packet access.
     Which had been discovered previously [1]
  3. Tests related to JIT were not being run or missing coverage.

Fixed all of these. Patches are ordered so that most urgent fix
is first, followed by the test that should have caught the problem.

The arm64 epilogue branch fix (patch 4) was originally posted by
Christophe Fontaine [1]; that series stalled, so it is carried here
with his authorship.

Changes since v3:
 - incorporate review feedback
 - rebase to current main
 - extend the x86 fix to all fixed-width immediates, not just JSET:
   TEST is always imm32, while ROR and the shift group are always
   imm8; patch 1 retitled to match
 - add a regression test for a large shift count (patch 3)

Changes since v2:
 - found more places where the x86 JIT emitted invalid opcodes for
   fixed-width immediates

Changes since v1:
 - add the x86 JSET encoding fix and its regression test, found once
   the convert test ran generated code through the JIT
 - carry Christophe's arm64 epilogue fix with his sign-off
 - convert test now runs the converted filters through the JIT, not
   just loading them
 - kept Marat's ack on the "check JIT was generated" patch; dropped it
   on the convert test since that changed substantially

[1] https://inbox.dpdk.org/dev/20260319114500.9757-2-cfontain@redhat.com/

Christophe Fontaine (1):
  bpf/arm64: fix offset type to allow a negative jump

Stephen Hemminger (6):
  bpf/x86: fix JIT encoding of fixed-width immediates
  test/bpf: add JSET test with small immediate
  test/bpf: add test for large shift
  test/bpf: check that JIT was generated
  bpf/arm64: add BPF_ABS/BPF_IND packet load support
  test/bpf: check that bpf_convert can be JIT'd

 app/test/test_bpf.c     | 327 +++++++++++++++++++++++++++++++---------
 lib/bpf/bpf_jit_arm64.c | 153 ++++++++++++++++++-
 lib/bpf/bpf_jit_x86.c   |   6 +-
 lib/bpf/meson.build     |   2 +
 4 files changed, 412 insertions(+), 76 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v4 1/7] bpf/x86: fix JIT encoding of fixed-width immediates
  2026-06-23 23:23 ` [PATCH v4 0/7] bpf: JIT related bug fixes Stephen Hemminger
@ 2026-06-23 23:23   ` Stephen Hemminger
  2026-06-23 23:23   ` [PATCH v4 2/7] test/bpf: add JSET test with small immediate Stephen Hemminger
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-23 23:23 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, stable, Marat Khalili, Konstantin Ananyev,
	Ferruh Yigit

Several places in the x86 JIT size an immediate with imm_size(), which
returns 1 or 4 bytes depending on the value. That is wrong for opcodes
whose immediate width is fixed by the encoding, and it breaks in both
directions.

TEST (0xF7 /0, used for BPF_JSET) has no imm8 form; the immediate is
always 32 bits. For a small mask such as BPF_JSET | BPF_K #0x1,
imm_size() returns 1, so the JIT emits a 1-byte immediate. The CPU
still consumes 4, swallowing 3 bytes of the following Jcc. The
instruction stream desyncs and the program crashes.

ROR and the shifts (0xC1 group) have the opposite problem: their
immediate is always imm8. For a count >= 128, imm_size() returns 4 and
the JIT emits 3 stray bytes, again desyncing the stream.

Size each immediate by its encoding: 32 bits for TEST, 8 bits for ROR
and the shifts.

Bugzilla ID: 1959
Fixes: cc752e43e079 ("bpf: add JIT compilation for x86_64 ISA")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
 lib/bpf/bpf_jit_x86.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/bpf/bpf_jit_x86.c b/lib/bpf/bpf_jit_x86.c
index 54eb279643..912d3f69bc 100644
--- a/lib/bpf/bpf_jit_x86.c
+++ b/lib/bpf/bpf_jit_x86.c
@@ -300,7 +300,7 @@ emit_ror_imm(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
 	emit_rex(st, BPF_ALU, 0, dreg);
 	emit_bytes(st, &ops, sizeof(ops));
 	emit_modregrm(st, MOD_DIRECT, mods, dreg);
-	emit_imm(st, imm, imm_size(imm));
+	emit_imm(st, imm, sizeof(uint8_t));
 }

 /*
@@ -441,7 +441,7 @@ emit_shift_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg,
 	uint32_t imm)
 {
 	emit_shift(st, op, dreg);
-	emit_imm(st, imm, imm_size(imm));
+	emit_imm(st, imm, sizeof(uint8_t));
 }

 /*
@@ -921,7 +921,7 @@ emit_tst_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
 	emit_rex(st, op, 0, dreg);
 	emit_bytes(st, &ops, sizeof(ops));
 	emit_modregrm(st, MOD_DIRECT, mods, dreg);
-	emit_imm(st, imm, imm_size(imm));
+	emit_imm(st, imm, sizeof(int32_t));
 }

 static void
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v4 2/7] test/bpf: add JSET test with small immediate
  2026-06-23 23:23 ` [PATCH v4 0/7] bpf: JIT related bug fixes Stephen Hemminger
  2026-06-23 23:23   ` [PATCH v4 1/7] bpf/x86: fix JIT encoding of fixed-width immediates Stephen Hemminger
@ 2026-06-23 23:23   ` Stephen Hemminger
  2026-06-23 23:23   ` [PATCH v4 3/7] test/bpf: add test for large shift Stephen Hemminger
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-23 23:23 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Marat Khalili, Konstantin Ananyev

The existing jump test only used a 32-bit JSET mask,
so the broken imm8 encoding of TEST in the x86 JIT was never exercised.
Add a case with a byte-sized mask;
run_test() runs it through the interpreter and the JIT.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
 app/test/test_bpf.c | 82 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 6b07e72295..232e9e2a98 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -3158,7 +3158,89 @@ static const struct ebpf_insn test_ld_mbuf3_prog[] = {
 };
 
 /* all bpf test cases */
+/*
+ * JSET with a byte-sized mask: exercises the imm8 path of the TEST
+ * encoding in the x86 JIT (a 32-bit mask takes a different path).
+ */
+static const struct ebpf_insn test_jset1_prog[] = {
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0,
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_B),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u8),
+	},
+	/* bit 0 is set in the input: branch is taken */
+	{
+		.code = (BPF_JMP | BPF_JSET | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 0x1,
+		.off = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_JA),
+		.off = 1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x1,
+	},
+	/* bit 1 is clear in the input: branch is not taken */
+	{
+		.code = (BPF_JMP | BPF_JSET | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 0x2,
+		.off = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_JA),
+		.off = 1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x2,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static void
+test_jset1_prepare(void *arg)
+{
+	struct dummy_offset *df = arg;
+
+	memset(df, 0, sizeof(*df));
+	df->u8 = 0x1;	/* bit 0 set, bit 1 clear */
+}
+
+static int
+test_jset1_check(uint64_t rc, const void *arg)
+{
+	return cmp_res(__func__, 0x1, rc, arg, arg, 0);
+}
+
 static const struct bpf_test tests[] = {
+	{
+		.name = "test_jset1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_jset1_prog,
+			.nb_ins = RTE_DIM(test_jset1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_offset),
+			},
+		},
+		.prepare = test_jset1_prepare,
+		.check_result = test_jset1_check,
+	},
 	{
 		.name = "test_store1",
 		.arg_sz = sizeof(struct dummy_offset),
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v4 3/7] test/bpf: add test for large shift
  2026-06-23 23:23 ` [PATCH v4 0/7] bpf: JIT related bug fixes Stephen Hemminger
  2026-06-23 23:23   ` [PATCH v4 1/7] bpf/x86: fix JIT encoding of fixed-width immediates Stephen Hemminger
  2026-06-23 23:23   ` [PATCH v4 2/7] test/bpf: add JSET test with small immediate Stephen Hemminger
@ 2026-06-23 23:23   ` Stephen Hemminger
  2026-06-24  7:59     ` Marat Khalili
  2026-06-23 23:23   ` [PATCH v4 4/7] bpf/arm64: fix offset type to allow a negative jump Stephen Hemminger
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-23 23:23 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Marat Khalili

The JIT compiler had issues with immediate values on shift instructions
so add a new test to cover that case.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 66 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 232e9e2a98..b54e36910b 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -2005,6 +2005,58 @@ test_div1_check(uint64_t rc, const void *arg)
 	return cmp_res(__func__, 0, rc, dve.out, dvt->out, sizeof(dve.out));
 }
 
+/*
+ * Shift by an immediate that doesn't fit in a signed byte: the C1 shift
+ * group takes a fixed 1-byte immediate, but imm_size() returns 4 for
+ * counts >= 128, so the x86 JIT emits 3 stray bytes and desyncs the
+ * instruction stream. The shift results are discarded (a count >= 64 is
+ * UB in the interpreter); the test returns a known constant, which the
+ * corrupted stream fails to produce.
+ */
+static const struct ebpf_insn test_shift_big_imm_prog[] = {
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 0x1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_LSH | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 137,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_RSH | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 200,
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_ARSH | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 255,
+	},
+	/* known result; a desynced stream won't reproduce it */
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x55,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static void
+test_shift_big_imm_prepare(void *arg)
+{
+	memset(arg, 0, sizeof(struct dummy_offset));
+}
+
+static int
+test_shift_big_imm_check(uint64_t rc, const void *arg)
+{
+	return cmp_res(__func__, 0x55, rc, arg, arg, 0);
+}
+
 /* call test-cases */
 static const struct ebpf_insn test_call1_prog[] = {
 
@@ -3409,6 +3461,20 @@ static const struct bpf_test tests[] = {
 		.prepare = test_mul1_prepare,
 		.check_result = test_div1_check,
 	},
+	{
+		.name = "test_shift_big_imm",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_shift_big_imm_prog,
+			.nb_ins = RTE_DIM(test_shift_big_imm_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_offset),
+			},
+		},
+		.prepare = test_shift_big_imm_prepare,
+		.check_result = test_shift_big_imm_check,
+	},
 	{
 		.name = "test_call1",
 		.arg_sz = sizeof(struct dummy_offset),
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v4 4/7] bpf/arm64: fix offset type to allow a negative jump
  2026-06-23 23:23 ` [PATCH v4 0/7] bpf: JIT related bug fixes Stephen Hemminger
                     ` (2 preceding siblings ...)
  2026-06-23 23:23   ` [PATCH v4 3/7] test/bpf: add test for large shift Stephen Hemminger
@ 2026-06-23 23:23   ` Stephen Hemminger
  2026-06-24  8:43     ` Marat Khalili
  2026-06-23 23:23   ` [PATCH v4 5/7] test/bpf: check that JIT was generated Stephen Hemminger
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-23 23:23 UTC (permalink / raw)
  To: dev
  Cc: Christophe Fontaine, stable, Stephen Hemminger,
	Wathsala Vithanage, Konstantin Ananyev, Marat Khalili,
	Jerin Jacob

From: Christophe Fontaine <cfontain@redhat.com>

The DPDK BPF JIT standalone test test_ld_mbuf1 fails on arm64.
It does:
	r6 = r1                    // mbuf
	r0 = *(u8 *)pkt[0]         // BPF_ABS
	if ((r0 & 0xf0) == 0x40)
		goto parse
	r0 = 0
	exit                       // epilogue E0
parse:
	r0 = *(u8 *)pkt[r0 + 3]    // BPF_IND
	...
	exit

emit_return_zero_if_src_zero() returns 0 by branching to a function
epilogue. The target maybe a previous epilogue so branch
might be backwards; therefore the offset needs to be negative.

The offset was stored in a uint16_t, so a negative value wrapped to a
large positive number; emit_b() then branched past the end of the
program and faulted at run time.

Fixes: 111e2a747a4f ("bpf/arm: add basic arithmetic operations")
Cc: stable@dpdk.org

Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_jit_arm64.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
index ba7ae4d680..776d7c8e97 100644
--- a/lib/bpf/bpf_jit_arm64.c
+++ b/lib/bpf/bpf_jit_arm64.c
@@ -957,10 +957,12 @@ static void
 emit_return_zero_if_src_zero(struct a64_jit_ctx *ctx, bool is64, uint8_t src)
 {
 	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
-	uint16_t jump_to_epilogue;
+	int32_t jump_to_epilogue;
 
 	emit_cbnz(ctx, is64, src, 3);
 	emit_mov_imm(ctx, is64, r0, 0);
+
+	/* maybe backwards branch to earlier epilogue */
 	jump_to_epilogue = (ctx->program_start + ctx->program_sz) - ctx->idx;
 	emit_b(ctx, jump_to_epilogue);
 }
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v4 5/7] test/bpf: check that JIT was generated
  2026-06-23 23:23 ` [PATCH v4 0/7] bpf: JIT related bug fixes Stephen Hemminger
                     ` (3 preceding siblings ...)
  2026-06-23 23:23   ` [PATCH v4 4/7] bpf/arm64: fix offset type to allow a negative jump Stephen Hemminger
@ 2026-06-23 23:23   ` Stephen Hemminger
  2026-06-23 23:23   ` [PATCH v4 6/7] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
  2026-06-23 23:23   ` [PATCH v4 7/7] test/bpf: check that bpf_convert can be JIT'd Stephen Hemminger
  6 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-23 23:23 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Marat Khalili, Konstantin Ananyev

Avoid silently ignoring JIT failures. The test cases should
all succeed JIT compilation; if not it is a bug in the JIT
implementation and should be reported.

Introduce a configuration setting RTE_BPF_JIT_SUPPORTED
which is cleaner that adding ARCH specific #ifdef.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
 app/test/test_bpf.c | 8 ++++++++
 lib/bpf/meson.build | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index b54e36910b..9adffcce64 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -3656,6 +3656,14 @@ run_test(const struct bpf_test *tst)
 				rv, strerror(rv));
 		}
 	}
+#ifdef RTE_BPF_JIT_SUPPORTED
+	else {
+		/* a JIT backend exists for this arch, so it must compile */
+		printf("%s@%d: %s: no JIT code generated;\n",
+			__func__, __LINE__, tst->name);
+		ret = -1;
+	}
+#endif
 
 	rte_bpf_destroy(bpf);
 	return ret;
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 7e8a300e3f..04ede96689 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -27,8 +27,10 @@ sources = files(
 )
 
 if arch_subdir == 'x86' and dpdk_conf.get('RTE_ARCH_64')
+    dpdk_conf.set('RTE_BPF_JIT_SUPPORTED', 1)
     sources += files('bpf_jit_x86.c')
 elif dpdk_conf.has('RTE_ARCH_ARM64')
+    dpdk_conf.set('RTE_BPF_JIT_SUPPORTED', 1)
     sources += files('bpf_jit_arm64.c')
 endif
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v4 6/7] bpf/arm64: add BPF_ABS/BPF_IND packet load support
  2026-06-23 23:23 ` [PATCH v4 0/7] bpf: JIT related bug fixes Stephen Hemminger
                     ` (4 preceding siblings ...)
  2026-06-23 23:23   ` [PATCH v4 5/7] test/bpf: check that JIT was generated Stephen Hemminger
@ 2026-06-23 23:23   ` Stephen Hemminger
  2026-06-23 23:23   ` [PATCH v4 7/7] test/bpf: check that bpf_convert can be JIT'd Stephen Hemminger
  6 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-23 23:23 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Wathsala Vithanage, Konstantin Ananyev,
	Marat Khalili

The arm64 JIT rejected BPF_LD | BPF_ABS and BPF_LD | BPF_IND with
"invalid opcode", so cBPF programs converted by rte_bpf_convert() could
not be JITed. Add these opcodes, mirroring the x86 JIT: a fast path for
data held in the first mbuf segment, and a __rte_pktmbuf_read() slow
path for everything else.

The forward branches over the call cannot use fixed distances:
emit_call() materializes the helper address with a variable number of
mov/movk instructions, so the block sizes are not known up front. Size
the three blocks (fast path, slow path, common tail) in a dry run, then
emit for real with the branches resolved from the measured offsets.

Programs using these opcodes use the call register layout, since the
slow path makes a function call.

Bugzilla ID: 1427

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_jit_arm64.c | 149 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 148 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
index 776d7c8e97..7b2a1595e8 100644
--- a/lib/bpf/bpf_jit_arm64.c
+++ b/lib/bpf/bpf_jit_arm64.c
@@ -1125,6 +1125,135 @@ emit_branch(struct a64_jit_ctx *ctx, uint8_t op, uint32_t i, int16_t off)
 	emit_b_cond(ctx, ebpf_to_a64_cond(op), jump_offset_get(ctx, i, off));
 }
 
+/* LD_ABS/LD_IND code block offsets (in arm64 instructions) */
+enum {
+	LDMB_FAST_OFS, /* fast path */
+	LDMB_SLOW_OFS, /* slow path */
+	LDMB_FIN_OFS,  /* common tail */
+	LDMB_OFS_NUM
+};
+
+/*
+ * Helper for emit_ld_mbuf(): fast path.
+ * Compute the packet offset; if it lies inside the first segment leave the
+ * data pointer in R0, otherwise branch to the slow path.
+ */
+static void
+emit_ldmb_fast_path(struct a64_jit_ctx *ctx, uint8_t src, uint8_t mode,
+		    uint32_t sz, int32_t imm, const uint32_t ofs[LDMB_OFS_NUM])
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
+	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
+	uint8_t tmp2 = ebpf_to_a64_reg(ctx, TMP_REG_2);
+	uint8_t tmp3 = ebpf_to_a64_reg(ctx, TMP_REG_3);
+
+	/* off = imm (+ src for BPF_IND) */
+	emit_mov_imm(ctx, 1, tmp1, imm);
+	if (mode == BPF_IND)
+		emit_add(ctx, 1, tmp1, src);
+
+	/* if ((int64_t)(mbuf->data_len - off) < sz) goto slow_path */
+	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_len));
+	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
+	emit_sub(ctx, 1, tmp2, tmp1);
+	emit_mov_imm(ctx, 1, tmp3, sz);
+	emit_cmp(ctx, 1, tmp2, tmp3);
+	emit_b_cond(ctx, A64_LT, (int32_t)(ofs[LDMB_SLOW_OFS] - ctx->idx));
+
+	/* R0 = mbuf->buf_addr + mbuf->data_off + off */
+	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_off));
+	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
+	emit_mov_imm(ctx, 1, r0, offsetof(struct rte_mbuf, buf_addr));
+	emit_ldr(ctx, EBPF_DW, r0, r6, r0);
+	emit_add(ctx, 1, r0, tmp2);
+	emit_add(ctx, 1, r0, tmp1);
+
+	emit_b(ctx, (int32_t)(ofs[LDMB_FIN_OFS] - ctx->idx));
+}
+
+/*
+ * Helper for emit_ld_mbuf(): slow path.
+ * R0 = __rte_pktmbuf_read(mbuf, off, sz, buf); return 0 if NULL.
+ * The scratch buffer is the space reserved by __rte_bpf_validate() at the
+ * bottom of the eBPF stack frame, i.e. (frame_pointer - stack_ofs).
+ */
+static void
+emit_ldmb_slow_path(struct a64_jit_ctx *ctx, uint32_t sz, uint32_t stack_ofs)
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
+	uint8_t fp = ebpf_to_a64_reg(ctx, EBPF_FP);
+	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
+
+	/* arguments of __rte_pktmbuf_read(mbuf, off, len, buf) */
+	emit_mov_64(ctx, A64_R(1), tmp1);		/* off (held in tmp1) */
+	emit_mov_64(ctx, A64_R(0), r6);			/* mbuf */
+	emit_mov_imm(ctx, 0, A64_R(2), sz);		/* len */
+	emit_sub_imm_64(ctx, A64_R(3), fp, stack_ofs);	/* buf */
+
+	emit_call(ctx, tmp1, (void *)(uintptr_t)__rte_pktmbuf_read);
+	emit_return_zero_if_src_zero(ctx, 1, r0);
+}
+
+/*
+ * Helper for emit_ld_mbuf(): common tail.
+ * Load the value pointed to by R0 and convert from network byte order.
+ */
+static void
+emit_ldmb_fin(struct a64_jit_ctx *ctx, uint8_t opsz, uint32_t sz)
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+
+	emit_ldr(ctx, opsz, r0, r0, A64_ZR);
+	if (opsz != BPF_B)
+		emit_be(ctx, r0, sz * 8);
+}
+
+/*
+ * emit code for BPF_ABS/BPF_IND load.
+ * generates the following construction:
+ * fast_path:
+ *   off = src + imm
+ *   if (mbuf->data_len - off < sz)
+ *      goto slow_path;
+ *   ptr = mbuf->buf_addr + mbuf->data_off + off;
+ *   goto fin_part;
+ * slow_path:
+ *   typeof(sz) buf;  // scratch space reserved on the eBPF stack
+ *   ptr = __rte_pktmbuf_read(mbuf, off, sz, &buf);
+ *   if (ptr == NULL)
+ *      return 0;
+ * fin_part:
+ *   res = *(typeof(sz))ptr;
+ *   res = ntoh(res);
+ */
+static void
+emit_ld_mbuf(struct a64_jit_ctx *ctx, uint8_t op, uint8_t src, int32_t imm,
+	     uint32_t stack_ofs)
+{
+	uint8_t mode = BPF_MODE(op);
+	uint8_t opsz = BPF_SIZE(op);
+	uint32_t sz = bpf_size(opsz);
+	uint32_t ofs[LDMB_OFS_NUM];
+
+	/* seed offsets so the dry-run branches stay in range */
+	ofs[LDMB_FAST_OFS] = ofs[LDMB_SLOW_OFS] = ofs[LDMB_FIN_OFS] = ctx->idx;
+
+	/* dry run to record block offsets */
+	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
+	ofs[LDMB_SLOW_OFS] = ctx->idx;
+	emit_ldmb_slow_path(ctx, sz, stack_ofs);
+	ofs[LDMB_FIN_OFS] = ctx->idx;
+	emit_ldmb_fin(ctx, opsz, sz);
+
+	/* rewind and emit for real with resolved offsets */
+	ctx->idx = ofs[LDMB_FAST_OFS];
+	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
+	emit_ldmb_slow_path(ctx, sz, stack_ofs);
+	emit_ldmb_fin(ctx, opsz, sz);
+}
+
 static void
 check_program_has_call(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 {
@@ -1137,8 +1266,17 @@ check_program_has_call(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 		op = ins->code;
 
 		switch (op) {
-		/* Call imm */
+		/*
+		 * BPF_ABS/BPF_IND can fall through to __rte_pktmbuf_read(),
+		 * so they need the call-clobbered register layout as well.
+		 */
 		case (BPF_JMP | EBPF_CALL):
+		case (BPF_LD | BPF_ABS | BPF_B):
+		case (BPF_LD | BPF_ABS | BPF_H):
+		case (BPF_LD | BPF_ABS | BPF_W):
+		case (BPF_LD | BPF_IND | BPF_B):
+		case (BPF_LD | BPF_IND | BPF_H):
+		case (BPF_LD | BPF_IND | BPF_W):
 			ctx->foundcall = 1;
 			return;
 		}
@@ -1340,6 +1478,15 @@ emit(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 			emit_mov_imm(ctx, 1, dst, u64);
 			i++;
 			break;
+		/* R0 = ntoh(*(size *)(mbuf data + (src) + imm)) */
+		case (BPF_LD | BPF_ABS | BPF_B):
+		case (BPF_LD | BPF_ABS | BPF_H):
+		case (BPF_LD | BPF_ABS | BPF_W):
+		case (BPF_LD | BPF_IND | BPF_B):
+		case (BPF_LD | BPF_IND | BPF_H):
+		case (BPF_LD | BPF_IND | BPF_W):
+			emit_ld_mbuf(ctx, op, src, imm, bpf->stack_sz);
+			break;
 		/* *(size *)(dst + off) = src */
 		case (BPF_STX | BPF_MEM | BPF_B):
 		case (BPF_STX | BPF_MEM | BPF_H):
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v4 7/7] test/bpf: check that bpf_convert can be JIT'd
  2026-06-23 23:23 ` [PATCH v4 0/7] bpf: JIT related bug fixes Stephen Hemminger
                     ` (5 preceding siblings ...)
  2026-06-23 23:23   ` [PATCH v4 6/7] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
@ 2026-06-23 23:23   ` Stephen Hemminger
  2026-06-24  8:39     ` Marat Khalili
  6 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-23 23:23 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Marat Khalili

Run each converted filter through both the interpreter and the JIT and
check they agree, catching JIT miscompiles.

test_bpf_filter and test_bpf_match did nearly the same thing: compile,
load and run a filter against the dummy packet. Combine them into
test_bpf_match, which now builds the packet itself and returns whether
the filter matched. Callers run it for both load methods.

The dummy packet is a UDP packet to a fixed destination MAC, source
and destination ports, so the filter results are deterministic. None
of the sample filters should match it, so assert that; a convert or
JIT bug that flips a result is then caught. The destination MAC and
source port are chosen so the negative ethernet and port filters do
not match, and "port not 53 and not arp" is dropped as it matches
any non-ARP packet that lacks port 53.

Reduce log output to make it easier to match which expression might be
causing issues.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 171 ++++++++++++++++++++++++++------------------
 1 file changed, 100 insertions(+), 71 deletions(-)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 9adffcce64..8934c98c3c 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -32,6 +32,7 @@ test_bpf(void)
 #include <rte_bpf.h>
 #include <rte_ether.h>
 #include <rte_ip.h>
+#include <rte_udp.h>
 
 
 /* Tests of most simple BPF programs (no instructions, one instruction etc.) */
@@ -4763,11 +4764,13 @@ load_cbpf_program_convert(struct bpf_program *cbpf_program, const char *str)
 		return NULL;
 	}
 
+#ifdef DEBUG
 	printf("bpf convert(\"%s\") produced:\n", str);
 	rte_bpf_dump(stdout, prm->ins, prm->nb_ins);
 
 	printf("%s \"%s\"\n", __func__, str);
 	test_bpf_dump(cbpf_program, prm);
+#endif
 
 	bpf = rte_bpf_load(prm);
 	rte_free(prm);
@@ -4792,18 +4795,65 @@ load_cbpf_program_direct(struct bpf_program *cbpf_program, const char *str __rte
 	});
 }
 
+static const load_cbpf_program_t cbpf_program_loaders[] = {
+	load_cbpf_program_convert,
+	load_cbpf_program_direct,
+};
+
+/* Setup Ethernet/IP/UDP headers in a dummy packet buffer for filter tests */
+static void
+dummy_ip_prep(void *data, uint16_t plen)
+{
+	struct {
+		struct rte_ether_hdr eth_hdr;
+		struct rte_ipv4_hdr ip_hdr;
+		struct rte_udp_hdr udp_hdr;
+	} *hdr = data;
+
+	hdr->eth_hdr = (struct rte_ether_hdr) {
+		.dst_addr.addr_bytes = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x0e },
+		.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
+	};
+	hdr->ip_hdr = (struct rte_ipv4_hdr) {
+		.version_ihl = RTE_IPV4_VHL_DEF,
+		.total_length = rte_cpu_to_be_16(plen),
+		.time_to_live = IPDEFTTL,
+		.next_proto_id = IPPROTO_UDP,
+		.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+		.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+	};
+	hdr->udp_hdr = (struct rte_udp_hdr) {
+		.src_port = rte_cpu_to_be_16(49152),	/* fixed, avoids filter ports */
+		.dst_port = rte_cpu_to_be_16(9),	/* discard port */
+		.dgram_len = rte_cpu_to_be_16(plen - sizeof(struct rte_ipv4_hdr)),
+		.dgram_cksum = 0,
+	};
+}
+
+/*
+ * Compile a pcap filter, load it with the given loader, then run it against
+ * a standard dummy packet with both the interpreter and (when available) the
+ * JIT, checking the two agree.
+ *
+ * Returns 1 if the filter matched, 0 if it did not, and -1 on any error
+ * (compile, load, or interpreter/JIT mismatch).
+ */
 static int
-test_bpf_match(pcap_t *pcap, const char *str, struct rte_mbuf *mb,
+test_bpf_match(pcap_t *pcap, const char *str,
 	load_cbpf_program_t load_cbpf_program)
 {
+	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
+	const uint32_t plen = 100;
 	struct bpf_program fcode;
-	struct rte_bpf *bpf;
+	struct rte_mbuf mb = { 0 };
+	struct rte_bpf *bpf = NULL;
 	int ret = -1;
 	uint64_t rc;
 
+	printf("%s '%s'\n", __func__, str);
 	if (pcap_compile(pcap, &fcode, str, 1, PCAP_NETMASK_UNKNOWN)) {
 		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
-		       __func__, __LINE__,  str, pcap_geterr(pcap));
+		       __func__, __LINE__, str, pcap_geterr(pcap));
 		return -1;
 	}
 
@@ -4811,15 +4861,41 @@ test_bpf_match(pcap_t *pcap, const char *str, struct rte_mbuf *mb,
 	if (bpf == NULL) {
 		printf("%s@%d: failed to load cbpf program for \"%s\", error=%d(%s);\n",
 			__func__, __LINE__, str, rte_errno, strerror(rte_errno));
+		test_bpf_dump(&fcode, NULL);
 		goto error;
 	}
 
-	rc = rte_bpf_exec(bpf, mb);
-	/* The return code from bpf capture filter is non-zero if matched */
-	ret = (rc == 0);
+	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
+	dummy_ip_prep(rte_pktmbuf_mtod(&mb, void *), plen);
+
+	rc = rte_bpf_exec(bpf, &mb);
+
+	/* Verify the JIT, when available, produces the same result. */
+	{
+		struct rte_bpf_jit jit;
+
+		rte_bpf_get_jit(bpf, &jit);
+		if (jit.func != NULL) {
+			fflush(stdout);
+			if (jit.func(&mb) != rc) {
+				printf("%s@%d: JIT return code does not match\n",
+				       __func__, __LINE__);
+				goto error;
+			}
+		}
+#ifdef RTE_BPF_JIT_SUPPORTED
+		else {
+			printf("%s@%d: no JIT code generated\n",
+			       __func__, __LINE__);
+			goto error;
+		}
+#endif
+	}
+
+	/* The return code from a bpf capture filter is non-zero if matched. */
+	ret = (rc != 0);
 error:
-	if (bpf)
-		rte_bpf_destroy(bpf);
+	rte_bpf_destroy(bpf);
 	pcap_freecode(&fcode);
 	return ret;
 }
@@ -4828,44 +4904,13 @@ test_bpf_match(pcap_t *pcap, const char *str, struct rte_mbuf *mb,
 static int
 test_bpf_filter_sanity(pcap_t *pcap)
 {
-	static const load_cbpf_program_t cbpf_program_loaders[] = {
-		load_cbpf_program_convert,
-		load_cbpf_program_direct,
-	};
-
-	const uint32_t plen = 100;
-	struct rte_mbuf mb, *m;
-	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
-	struct {
-		struct rte_ether_hdr eth_hdr;
-		struct rte_ipv4_hdr ip_hdr;
-	} *hdr;
-
-	memset(&mb, 0, sizeof(mb));
-	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
-	m = &mb;
-
-	hdr = rte_pktmbuf_mtod(m, typeof(hdr));
-	hdr->eth_hdr = (struct rte_ether_hdr) {
-		.dst_addr.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
-		.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
-	};
-	hdr->ip_hdr = (struct rte_ipv4_hdr) {
-		.version_ihl = RTE_IPV4_VHL_DEF,
-		.total_length = rte_cpu_to_be_16(plen),
-		.time_to_live = IPDEFTTL,
-		.next_proto_id = IPPROTO_RAW,
-		.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
-		.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
-	};
-
-	for (int li = 0; li != RTE_DIM(cbpf_program_loaders); ++li) {
-		if (test_bpf_match(pcap, "ip", m, cbpf_program_loaders[li]) != 0) {
+	for (unsigned int li = 0; li != RTE_DIM(cbpf_program_loaders); ++li) {
+		if (test_bpf_match(pcap, "ip", cbpf_program_loaders[li]) != 1) {
 			printf("%s@%d: filter \"ip\" doesn't match test data\n",
 			       __func__, __LINE__);
 			return -1;
 		}
-		if (test_bpf_match(pcap, "not ip", m, cbpf_program_loaders[li]) == 0) {
+		if (test_bpf_match(pcap, "not ip", cbpf_program_loaders[li]) != 0) {
 			printf("%s@%d: filter \"not ip\" does match test data\n",
 			       __func__, __LINE__);
 			return -1;
@@ -4889,7 +4934,6 @@ static const char * const sample_filters[] = {
 	"port 53",
 	"host 192.0.2.1 and not (port 80 or port 25)",
 	"host 2001:4b98:db0::8 and not port 80 and not port 25",
-	"port not 53 and not arp",
 	"(tcp[0:2] > 1500 and tcp[0:2] < 1550) or (tcp[2:2] > 1500 and tcp[2:2] < 1550)",
 	"ether proto 0x888e",
 	"ether[0] & 1 = 0 and ip[16] >= 224",
@@ -4916,35 +4960,10 @@ static const char * const sample_filters[] = {
 	"or host 192.0.2.1 or host 192.0.2.100 or host 192.0.2.200"),
 };
 
-static int
-test_bpf_filter(pcap_t *pcap, const char *s, load_cbpf_program_t load_cbpf_program)
-{
-	struct bpf_program fcode;
-	struct rte_bpf *bpf;
-
-	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
-		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
-		       __func__, __LINE__, s, pcap_geterr(pcap));
-		return -1;
-	}
-
-	bpf = load_cbpf_program(&fcode, s);
-	if (bpf == NULL) {
-		printf("%s@%d: failed to load cbpf program for \"%s\", error=%d(%s);\n",
-			__func__, __LINE__, s, rte_errno, strerror(rte_errno));
-		test_bpf_dump(&fcode, NULL);
-	}
-
-	rte_bpf_destroy(bpf);
-
-	pcap_freecode(&fcode);
-	return (bpf == NULL) ? -1 : 0;
-}
-
 static int
 test_bpf_convert(void)
 {
-	unsigned int i;
+	unsigned int i, li;
 	pcap_t *pcap;
 	int rc;
 
@@ -4956,8 +4975,18 @@ test_bpf_convert(void)
 
 	rc = test_bpf_filter_sanity(pcap);
 	for (i = 0; i < RTE_DIM(sample_filters); i++) {
-		rc |= test_bpf_filter(pcap, sample_filters[i], load_cbpf_program_convert);
-		rc |= test_bpf_filter(pcap, sample_filters[i], load_cbpf_program_direct);
+		for (li = 0; li < RTE_DIM(cbpf_program_loaders); li++) {
+			int m = test_bpf_match(pcap, sample_filters[i],
+					       cbpf_program_loaders[li]);
+
+			/* None of the sample filters match the dummy packet. */
+			if (m != 0) {
+				if (m > 0)
+					printf("%s@%d: filter \"%s\" unexpectedly matched\n",
+					       __func__, __LINE__, sample_filters[i]);
+				rc = -1;
+			}
+		}
 	}
 
 	pcap_close(pcap);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* RE: [PATCH v4 3/7] test/bpf: add test for large shift
  2026-06-23 23:23   ` [PATCH v4 3/7] test/bpf: add test for large shift Stephen Hemminger
@ 2026-06-24  7:59     ` Marat Khalili
  2026-06-24 13:44       ` Marat Khalili
  0 siblings, 1 reply; 73+ messages in thread
From: Marat Khalili @ 2026-06-24  7:59 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org; +Cc: Konstantin Ananyev



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Wednesday 24 June 2026 00:23
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Konstantin Ananyev <konstantin.ananyev@huawei.com>;
> Marat Khalili <marat.khalili@huawei.com>
> Subject: [PATCH v4 3/7] test/bpf: add test for large shift
> 
> The JIT compiler had issues with immediate values on shift instructions
> so add a new test to cover that case.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  app/test/test_bpf.c | 66 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 66 insertions(+)
> 
> diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
> index 232e9e2a98..b54e36910b 100644
> --- a/app/test/test_bpf.c
> +++ b/app/test/test_bpf.c
> @@ -2005,6 +2005,58 @@ test_div1_check(uint64_t rc, const void *arg)
>  	return cmp_res(__func__, 0, rc, dve.out, dvt->out, sizeof(dve.out));
>  }
> 
> +/*
> + * Shift by an immediate that doesn't fit in a signed byte: the C1 shift
> + * group takes a fixed 1-byte immediate, but imm_size() returns 4 for
> + * counts >= 128, so the x86 JIT emits 3 stray bytes and desyncs the
> + * instruction stream. The shift results are discarded (a count >= 64 is
> + * UB in the interpreter); the test returns a known constant, which the
> + * corrupted stream fails to produce.
> + */
> +static const struct ebpf_insn test_shift_big_imm_prog[] = {
> +	{
> +		.code = (BPF_ALU | EBPF_MOV | BPF_K),
> +		.dst_reg = EBPF_REG_2,
> +		.imm = 0x1,
> +	},
> +	{
> +		.code = (EBPF_ALU64 | BPF_LSH | BPF_K),
> +		.dst_reg = EBPF_REG_2,
> +		.imm = 137,
> +	},
> +	{
> +		.code = (EBPF_ALU64 | BPF_RSH | BPF_K),
> +		.dst_reg = EBPF_REG_2,
> +		.imm = 200,
> +	},
> +	{
> +		.code = (EBPF_ALU64 | EBPF_ARSH | BPF_K),
> +		.dst_reg = EBPF_REG_2,
> +		.imm = 255,
> +	},
> +	/* known result; a desynced stream won't reproduce it */
> +	{
> +		.code = (BPF_ALU | EBPF_MOV | BPF_K),
> +		.dst_reg = EBPF_REG_0,
> +		.imm = 0x55,
> +	},
> +	{
> +		.code = (BPF_JMP | EBPF_EXIT),
> +	},
> +};

// snip the rest

Thanks a lot for adding this test. Can we use the shift results though, instead
of discarding them, maybe as another test case? If the interpreter is unable to
reproduce them or triggers sanitizer it needs to be fixed as well. (Apologies
for this scope creep but I hope we get to the bottom of it eventually.)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v4 7/7] test/bpf: check that bpf_convert can be JIT'd
  2026-06-23 23:23   ` [PATCH v4 7/7] test/bpf: check that bpf_convert can be JIT'd Stephen Hemminger
@ 2026-06-24  8:39     ` Marat Khalili
  0 siblings, 0 replies; 73+ messages in thread
From: Marat Khalili @ 2026-06-24  8:39 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org; +Cc: Konstantin Ananyev

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Wednesday 24 June 2026 00:23
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Konstantin Ananyev <konstantin.ananyev@huawei.com>;
> Marat Khalili <marat.khalili@huawei.com>
> Subject: [PATCH v4 7/7] test/bpf: check that bpf_convert can be JIT'd
> 
> Run each converted filter through both the interpreter and the JIT and
> check they agree, catching JIT miscompiles.
> 
> test_bpf_filter and test_bpf_match did nearly the same thing: compile,
> load and run a filter against the dummy packet. Combine them into
> test_bpf_match, which now builds the packet itself and returns whether
> the filter matched. Callers run it for both load methods.
> 
> The dummy packet is a UDP packet to a fixed destination MAC, source
> and destination ports, so the filter results are deterministic. None
> of the sample filters should match it, so assert that; a convert or
> JIT bug that flips a result is then caught. The destination MAC and
> source port are chosen so the negative ethernet and port filters do
> not match, and "port not 53 and not arp" is dropped as it matches
> any non-ARP packet that lacks port 53.
> 
> Reduce log output to make it easier to match which expression might be
> causing issues.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Acked-by: Marat Khalili <marat.khalili@huawei.com>


^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v4 4/7] bpf/arm64: fix offset type to allow a negative jump
  2026-06-23 23:23   ` [PATCH v4 4/7] bpf/arm64: fix offset type to allow a negative jump Stephen Hemminger
@ 2026-06-24  8:43     ` Marat Khalili
  0 siblings, 0 replies; 73+ messages in thread
From: Marat Khalili @ 2026-06-24  8:43 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org
  Cc: Christophe Fontaine, stable@dpdk.org, Wathsala Vithanage,
	Konstantin Ananyev, Jerin Jacob

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Wednesday 24 June 2026 00:23
> To: dev@dpdk.org
> Cc: Christophe Fontaine <cfontain@redhat.com>; stable@dpdk.org; Stephen Hemminger
> <stephen@networkplumber.org>; Wathsala Vithanage <wathsala.vithanage@arm.com>; Konstantin Ananyev
> <konstantin.ananyev@huawei.com>; Marat Khalili <marat.khalili@huawei.com>; Jerin Jacob
> <jerinj@marvell.com>
> Subject: [PATCH v4 4/7] bpf/arm64: fix offset type to allow a negative jump
> 
> From: Christophe Fontaine <cfontain@redhat.com>
> 
> The DPDK BPF JIT standalone test test_ld_mbuf1 fails on arm64.
> It does:
> 	r6 = r1                    // mbuf
> 	r0 = *(u8 *)pkt[0]         // BPF_ABS
> 	if ((r0 & 0xf0) == 0x40)
> 		goto parse
> 	r0 = 0
> 	exit                       // epilogue E0
> parse:
> 	r0 = *(u8 *)pkt[r0 + 3]    // BPF_IND
> 	...
> 	exit
> 
> emit_return_zero_if_src_zero() returns 0 by branching to a function
> epilogue. The target maybe a previous epilogue so branch
> might be backwards; therefore the offset needs to be negative.
> 
> The offset was stored in a uint16_t, so a negative value wrapped to a
> large positive number; emit_b() then branched past the end of the
> program and faulted at run time.
> 
> Fixes: 111e2a747a4f ("bpf/arm: add basic arithmetic operations")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/bpf/bpf_jit_arm64.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
> index ba7ae4d680..776d7c8e97 100644
> --- a/lib/bpf/bpf_jit_arm64.c
> +++ b/lib/bpf/bpf_jit_arm64.c
> @@ -957,10 +957,12 @@ static void
>  emit_return_zero_if_src_zero(struct a64_jit_ctx *ctx, bool is64, uint8_t src)
>  {
>  	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
> -	uint16_t jump_to_epilogue;
> +	int32_t jump_to_epilogue;
> 
>  	emit_cbnz(ctx, is64, src, 3);
>  	emit_mov_imm(ctx, is64, r0, 0);
> +
> +	/* maybe backwards branch to earlier epilogue */
>  	jump_to_epilogue = (ctx->program_start + ctx->program_sz) - ctx->idx;
>  	emit_b(ctx, jump_to_epilogue);
>  }
> --
> 2.53.0

Acked-by: Marat Khalili <marat.khalili@huawei.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v4 3/7] test/bpf: add test for large shift
  2026-06-24  7:59     ` Marat Khalili
@ 2026-06-24 13:44       ` Marat Khalili
  0 siblings, 0 replies; 73+ messages in thread
From: Marat Khalili @ 2026-06-24 13:44 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org; +Cc: Konstantin Ananyev

> > +/*
> > + * Shift by an immediate that doesn't fit in a signed byte: the C1 shift
> > + * group takes a fixed 1-byte immediate, but imm_size() returns 4 for
> > + * counts >= 128, so the x86 JIT emits 3 stray bytes and desyncs the
> > + * instruction stream. The shift results are discarded (a count >= 64 is
> > + * UB in the interpreter); the test returns a known constant, which the
> > + * corrupted stream fails to produce.
> > + */
> > +static const struct ebpf_insn test_shift_big_imm_prog[] = {
> > +	{
> > +		.code = (BPF_ALU | EBPF_MOV | BPF_K),
> > +		.dst_reg = EBPF_REG_2,
> > +		.imm = 0x1,
> > +	},
> > +	{
> > +		.code = (EBPF_ALU64 | BPF_LSH | BPF_K),
> > +		.dst_reg = EBPF_REG_2,
> > +		.imm = 137,
> > +	},
> > +	{
> > +		.code = (EBPF_ALU64 | BPF_RSH | BPF_K),
> > +		.dst_reg = EBPF_REG_2,
> > +		.imm = 200,
> > +	},
> > +	{
> > +		.code = (EBPF_ALU64 | EBPF_ARSH | BPF_K),
> > +		.dst_reg = EBPF_REG_2,
> > +		.imm = 255,
> > +	},
> > +	/* known result; a desynced stream won't reproduce it */
> > +	{
> > +		.code = (BPF_ALU | EBPF_MOV | BPF_K),
> > +		.dst_reg = EBPF_REG_0,
> > +		.imm = 0x55,
> > +	},
> > +	{
> > +		.code = (BPF_JMP | EBPF_EXIT),
> > +	},
> > +};
> 
> // snip the rest
> 
> Thanks a lot for adding this test. Can we use the shift results though, instead
> of discarding them, maybe as another test case? If the interpreter is unable to
> reproduce them or triggers sanitizer it needs to be fixed as well. (Apologies
> for this scope creep but I hope we get to the bottom of it eventually.)

Speaking of scope creep, this currently fails on ARM since emit_lsl (as well as 
emit_lsr, emit_asr) does not clear unused immediate bits and thus the value 
does not fit in the encoding.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v5 0/9] bpf: JIT related bug fixes
  2026-06-08 20:28 [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
                   ` (7 preceding siblings ...)
  2026-06-23 23:23 ` [PATCH v4 0/7] bpf: JIT related bug fixes Stephen Hemminger
@ 2026-06-24 17:54 ` Stephen Hemminger
  2026-06-24 17:55   ` [PATCH v5 1/9] bpf/x86: fix JIT encoding of fixed-width immediates Stephen Hemminger
                     ` (8 more replies)
  2026-06-25 17:30 ` [PATCH v6 0/9] bpf: JIT related bug fixes Stephen Hemminger
  9 siblings, 9 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-24 17:54 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

While implementing JIT for packet capture ran into several issues:
  1. x86 JIT had a pre-existing bug which would crash.
  2. The arm64 JIT was missing the packet-access instructions, found
     previously [1].
  3. Shift counts were not masked to the operand width as RFC 9669
     requires: undefined behavior in the interpreter and an encoding
     failure in the arm64 JIT.
  4. Tests related to JIT were not being run or were missing coverage.

Fixed all of these. Patches are ordered with the most urgent fix (the
x86 crash) first, each fix followed by the test that exercises it. The
arm64 packet-load support is kept ahead of the "check JIT was generated"
patch so the series bisects cleanly on arm64.

The arm64 epilogue branch fix (patch 6) was originally posted by
Christophe Fontaine [1]; that series stalled, so it is carried here with
his authorship.

Changes since v4:
 - address Marat's review of the large-shift test: it now checks the
   RFC 9669 masked result instead of a sentinel, which surfaced two more
   bugs fixed in this version
 - mask shift counts to the operand width: the interpreter shifted by
   the raw count (UB, trips UBSan; patch 3) and the arm64 JIT overflowed
   the UBFM/SBFM immediate (patch 4)
 - reorder so the arm64 BPF_ABS/BPF_IND support precedes the "check JIT
   was generated" patch, keeping the suite bisectable on arm64

[1] https://inbox.dpdk.org/dev/20260319114500.9757-2-cfontain@redhat.com/

Christophe Fontaine (1):
  bpf/arm64: fix offset type to allow a negative jump

Stephen Hemminger (8):
  bpf/x86: fix JIT encoding of fixed-width immediates
  test/bpf: add JSET test with small immediate
  bpf: mask shift count in interpreter per RFC 9669
  bpf/arm64: mask shift count per RFC 9669
  test/bpf: add test for large shift
  bpf/arm64: add BPF_ABS/BPF_IND packet load support
  test/bpf: check that JIT was generated
  test/bpf: check that bpf_convert can be JIT'd

 app/test/test_bpf.c     | 320 +++++++++++++++++++++++++++++++---------
 lib/bpf/bpf_exec.c      |  31 ++--
 lib/bpf/bpf_jit_arm64.c | 165 ++++++++++++++++++++-
 lib/bpf/bpf_jit_x86.c   |   6 +-
 lib/bpf/meson.build     |   2 +
 5 files changed, 436 insertions(+), 88 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v5 1/9] bpf/x86: fix JIT encoding of fixed-width immediates
  2026-06-24 17:54 ` [PATCH v5 0/9] bpf: JIT related bug fixes Stephen Hemminger
@ 2026-06-24 17:55   ` Stephen Hemminger
  2026-06-24 17:55   ` [PATCH v5 2/9] test/bpf: add JSET test with small immediate Stephen Hemminger
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-24 17:55 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, stable, Marat Khalili, Konstantin Ananyev,
	Ferruh Yigit

Several places in the x86 JIT size an immediate with imm_size(), which
returns 1 or 4 bytes depending on the value. That is wrong for opcodes
whose immediate width is fixed by the encoding, and it breaks in both
directions.

TEST (0xF7 /0, used for BPF_JSET) has no imm8 form; the immediate is
always 32 bits. For a small mask such as BPF_JSET | BPF_K #0x1,
imm_size() returns 1, so the JIT emits a 1-byte immediate. The CPU
still consumes 4, swallowing 3 bytes of the following Jcc. The
instruction stream desyncs and the program crashes.

ROR and the shifts (0xC1 group) have the opposite problem: their
immediate is always imm8. For a count >= 128, imm_size() returns 4 and
the JIT emits 3 stray bytes, again desyncing the stream.

Size each immediate by its encoding: 32 bits for TEST, 8 bits for ROR
and the shifts.

Bugzilla ID: 1959
Fixes: cc752e43e079 ("bpf: add JIT compilation for x86_64 ISA")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
 lib/bpf/bpf_jit_x86.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/bpf/bpf_jit_x86.c b/lib/bpf/bpf_jit_x86.c
index 54eb279643..912d3f69bc 100644
--- a/lib/bpf/bpf_jit_x86.c
+++ b/lib/bpf/bpf_jit_x86.c
@@ -300,7 +300,7 @@ emit_ror_imm(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
 	emit_rex(st, BPF_ALU, 0, dreg);
 	emit_bytes(st, &ops, sizeof(ops));
 	emit_modregrm(st, MOD_DIRECT, mods, dreg);
-	emit_imm(st, imm, imm_size(imm));
+	emit_imm(st, imm, sizeof(uint8_t));
 }

 /*
@@ -441,7 +441,7 @@ emit_shift_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg,
 	uint32_t imm)
 {
 	emit_shift(st, op, dreg);
-	emit_imm(st, imm, imm_size(imm));
+	emit_imm(st, imm, sizeof(uint8_t));
 }

 /*
@@ -921,7 +921,7 @@ emit_tst_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
 	emit_rex(st, op, 0, dreg);
 	emit_bytes(st, &ops, sizeof(ops));
 	emit_modregrm(st, MOD_DIRECT, mods, dreg);
-	emit_imm(st, imm, imm_size(imm));
+	emit_imm(st, imm, sizeof(int32_t));
 }

 static void
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v5 2/9] test/bpf: add JSET test with small immediate
  2026-06-24 17:54 ` [PATCH v5 0/9] bpf: JIT related bug fixes Stephen Hemminger
  2026-06-24 17:55   ` [PATCH v5 1/9] bpf/x86: fix JIT encoding of fixed-width immediates Stephen Hemminger
@ 2026-06-24 17:55   ` Stephen Hemminger
  2026-06-24 17:55   ` [PATCH v5 3/9] bpf: mask shift count in interpreter per RFC 9669 Stephen Hemminger
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-24 17:55 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Marat Khalili, Konstantin Ananyev

The existing jump test only used a 32-bit JSET mask,
so the broken imm8 encoding of TEST in the x86 JIT was never exercised.
Add a case with a byte-sized mask;
run_test() runs it through the interpreter and the JIT.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
 app/test/test_bpf.c | 82 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 6b07e72295..232e9e2a98 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -3158,7 +3158,89 @@ static const struct ebpf_insn test_ld_mbuf3_prog[] = {
 };
 
 /* all bpf test cases */
+/*
+ * JSET with a byte-sized mask: exercises the imm8 path of the TEST
+ * encoding in the x86 JIT (a 32-bit mask takes a different path).
+ */
+static const struct ebpf_insn test_jset1_prog[] = {
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0,
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_B),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u8),
+	},
+	/* bit 0 is set in the input: branch is taken */
+	{
+		.code = (BPF_JMP | BPF_JSET | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 0x1,
+		.off = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_JA),
+		.off = 1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x1,
+	},
+	/* bit 1 is clear in the input: branch is not taken */
+	{
+		.code = (BPF_JMP | BPF_JSET | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 0x2,
+		.off = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_JA),
+		.off = 1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x2,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static void
+test_jset1_prepare(void *arg)
+{
+	struct dummy_offset *df = arg;
+
+	memset(df, 0, sizeof(*df));
+	df->u8 = 0x1;	/* bit 0 set, bit 1 clear */
+}
+
+static int
+test_jset1_check(uint64_t rc, const void *arg)
+{
+	return cmp_res(__func__, 0x1, rc, arg, arg, 0);
+}
+
 static const struct bpf_test tests[] = {
+	{
+		.name = "test_jset1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_jset1_prog,
+			.nb_ins = RTE_DIM(test_jset1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_offset),
+			},
+		},
+		.prepare = test_jset1_prepare,
+		.check_result = test_jset1_check,
+	},
 	{
 		.name = "test_store1",
 		.arg_sz = sizeof(struct dummy_offset),
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v5 3/9] bpf: mask shift count in interpreter per RFC 9669
  2026-06-24 17:54 ` [PATCH v5 0/9] bpf: JIT related bug fixes Stephen Hemminger
  2026-06-24 17:55   ` [PATCH v5 1/9] bpf/x86: fix JIT encoding of fixed-width immediates Stephen Hemminger
  2026-06-24 17:55   ` [PATCH v5 2/9] test/bpf: add JSET test with small immediate Stephen Hemminger
@ 2026-06-24 17:55   ` Stephen Hemminger
  2026-06-25 15:35     ` Marat Khalili
  2026-06-24 17:55   ` [PATCH v5 4/9] bpf/arm64: mask shift count " Stephen Hemminger
                     ` (5 subsequent siblings)
  8 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-24 17:55 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, stable, Konstantin Ananyev, Marat Khalili,
	Ferruh Yigit

The interpreter shifted by the raw immediate or register value, which
is undefined behavior in C when the count is >= the operand width and
trips UBSan. RFC 9669 masks shift counts (0x3f for 64-bit, 0x1f for
32-bit); mask the count in the LSH/RSH/ARSH cases.

Fixes: 94972f35a02e ("bpf: add BPF loading and execution framework")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_exec.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/lib/bpf/bpf_exec.c b/lib/bpf/bpf_exec.c
index d423ef28f5..bb03c9cc2c 100644
--- a/lib/bpf/bpf_exec.c
+++ b/lib/bpf/bpf_exec.c
@@ -4,6 +4,7 @@
 
 #include <stdio.h>
 #include <stdint.h>
+#include <limits.h>
 
 #include <eal_export.h>
 #include <rte_common.h>
@@ -43,6 +44,16 @@
 	((reg)[(ins)->dst_reg] = \
 		(type)(reg)[(ins)->dst_reg] op (type)(ins)->imm)
 
+#define BPF_OP_SHIFT_IMM(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] =		\
+		(type)(reg)[(ins)->dst_reg] op	\
+		((ins)->imm & (sizeof(type) * CHAR_BIT - 1)))
+
+#define BPF_OP_SHIFT_REG(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] =		\
+		(type)(reg)[(ins)->dst_reg] op	\
+		((reg)[(ins)->src_reg] & (sizeof(type) * CHAR_BIT - 1)))
+
 #define BPF_DIV_ZERO_CHECK(bpf, reg, ins, type) do { \
 	if ((type)(reg)[(ins)->src_reg] == 0) { \
 		RTE_BPF_LOG_LINE(ERR, \
@@ -183,10 +194,10 @@ bpf_exec(const struct rte_bpf *bpf, uint64_t reg[EBPF_REG_NUM])
 			BPF_OP_ALU_IMM(reg, ins, |, uint32_t);
 			break;
 		case (BPF_ALU | BPF_LSH | BPF_K):
-			BPF_OP_ALU_IMM(reg, ins, <<, uint32_t);
+			BPF_OP_SHIFT_IMM(reg, ins, <<, uint32_t);
 			break;
 		case (BPF_ALU | BPF_RSH | BPF_K):
-			BPF_OP_ALU_IMM(reg, ins, >>, uint32_t);
+			BPF_OP_SHIFT_IMM(reg, ins, >>, uint32_t);
 			break;
 		case (BPF_ALU | BPF_XOR | BPF_K):
 			BPF_OP_ALU_IMM(reg, ins, ^, uint32_t);
@@ -217,10 +228,10 @@ bpf_exec(const struct rte_bpf *bpf, uint64_t reg[EBPF_REG_NUM])
 			BPF_OP_ALU_REG(reg, ins, |, uint32_t);
 			break;
 		case (BPF_ALU | BPF_LSH | BPF_X):
-			BPF_OP_ALU_REG(reg, ins, <<, uint32_t);
+			BPF_OP_SHIFT_REG(reg, ins, <<, uint32_t);
 			break;
 		case (BPF_ALU | BPF_RSH | BPF_X):
-			BPF_OP_ALU_REG(reg, ins, >>, uint32_t);
+			BPF_OP_SHIFT_REG(reg, ins, >>, uint32_t);
 			break;
 		case (BPF_ALU | BPF_XOR | BPF_X):
 			BPF_OP_ALU_REG(reg, ins, ^, uint32_t);
@@ -262,13 +273,13 @@ bpf_exec(const struct rte_bpf *bpf, uint64_t reg[EBPF_REG_NUM])
 			BPF_OP_ALU_IMM(reg, ins, |, uint64_t);
 			break;
 		case (EBPF_ALU64 | BPF_LSH | BPF_K):
-			BPF_OP_ALU_IMM(reg, ins, <<, uint64_t);
+			BPF_OP_SHIFT_IMM(reg, ins, <<, uint64_t);
 			break;
 		case (EBPF_ALU64 | BPF_RSH | BPF_K):
-			BPF_OP_ALU_IMM(reg, ins, >>, uint64_t);
+			BPF_OP_SHIFT_IMM(reg, ins, >>, uint64_t);
 			break;
 		case (EBPF_ALU64 | EBPF_ARSH | BPF_K):
-			BPF_OP_ALU_IMM(reg, ins, >>, int64_t);
+			BPF_OP_SHIFT_IMM(reg, ins, >>, int64_t);
 			break;
 		case (EBPF_ALU64 | BPF_XOR | BPF_K):
 			BPF_OP_ALU_IMM(reg, ins, ^, uint64_t);
@@ -299,13 +310,13 @@ bpf_exec(const struct rte_bpf *bpf, uint64_t reg[EBPF_REG_NUM])
 			BPF_OP_ALU_REG(reg, ins, |, uint64_t);
 			break;
 		case (EBPF_ALU64 | BPF_LSH | BPF_X):
-			BPF_OP_ALU_REG(reg, ins, <<, uint64_t);
+			BPF_OP_SHIFT_REG(reg, ins, <<, uint64_t);
 			break;
 		case (EBPF_ALU64 | BPF_RSH | BPF_X):
-			BPF_OP_ALU_REG(reg, ins, >>, uint64_t);
+			BPF_OP_SHIFT_REG(reg, ins, >>, uint64_t);
 			break;
 		case (EBPF_ALU64 | EBPF_ARSH | BPF_X):
-			BPF_OP_ALU_REG(reg, ins, >>, int64_t);
+			BPF_OP_SHIFT_REG(reg, ins, >>, int64_t);
 			break;
 		case (EBPF_ALU64 | BPF_XOR | BPF_X):
 			BPF_OP_ALU_REG(reg, ins, ^, uint64_t);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v5 4/9] bpf/arm64: mask shift count per RFC 9669
  2026-06-24 17:54 ` [PATCH v5 0/9] bpf: JIT related bug fixes Stephen Hemminger
                     ` (2 preceding siblings ...)
  2026-06-24 17:55   ` [PATCH v5 3/9] bpf: mask shift count in interpreter per RFC 9669 Stephen Hemminger
@ 2026-06-24 17:55   ` Stephen Hemminger
  2026-06-25 15:40     ` Marat Khalili
  2026-06-24 17:55   ` [PATCH v5 5/9] test/bpf: add test for large shift Stephen Hemminger
                     ` (4 subsequent siblings)
  8 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-24 17:55 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, stable, Wathsala Vithanage, Konstantin Ananyev,
	Marat Khalili, Jerin Jacob

The ARM JIT was not masking the shift count as required by RFC 9669
(0x3f for 64-bit, 0x1f for 32-bit), so large immediate shift counts
overflowed the UBFM/SBFM encoding and failed the JIT. Mask the
immediate in emit_lsl/emit_lsr/emit_asr.

Fixes: 9f4469d9e83a ("bpf/arm: add logical operations")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_jit_arm64.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
index ba7ae4d680..7582370062 100644
--- a/lib/bpf/bpf_jit_arm64.c
+++ b/lib/bpf/bpf_jit_arm64.c
@@ -545,12 +545,14 @@ emit_bitfield(struct a64_jit_ctx *ctx, bool is64, uint8_t rd, uint8_t rn,
 	emit_insn(ctx, insn, check_reg(rd) || check_reg(rn) ||
 		  check_immr_imms(is64, immr, imms));
 }
+
 static void
 emit_lsl(struct a64_jit_ctx *ctx, bool is64, uint8_t rd, uint8_t imm)
 {
 	const unsigned int width = is64 ? 64 : 32;
 	uint8_t imms, immr;
 
+	imm &= width - 1;
 	immr = (width - imm) & (width - 1);
 	imms = width - 1 - imm;
 
@@ -560,13 +562,19 @@ emit_lsl(struct a64_jit_ctx *ctx, bool is64, uint8_t rd, uint8_t imm)
 static void
 emit_lsr(struct a64_jit_ctx *ctx, bool is64, uint8_t rd, uint8_t imm)
 {
-	emit_bitfield(ctx, is64, rd, rd, imm, is64 ? 63 : 31, A64_UBFM);
+	const unsigned int width = is64 ? 64 : 32;
+
+	imm &= width - 1;
+	emit_bitfield(ctx, is64, rd, rd, imm, width - 1, A64_UBFM);
 }
 
 static void
 emit_asr(struct a64_jit_ctx *ctx, bool is64, uint8_t rd, uint8_t imm)
 {
-	emit_bitfield(ctx, is64, rd, rd, imm, is64 ? 63 : 31, A64_SBFM);
+	const unsigned int width = is64 ? 64 : 32;
+
+	imm &= width - 1;
+	emit_bitfield(ctx, is64, rd, rd, imm, width - 1, A64_SBFM);
 }
 
 #define A64_AND 0
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v5 5/9] test/bpf: add test for large shift
  2026-06-24 17:54 ` [PATCH v5 0/9] bpf: JIT related bug fixes Stephen Hemminger
                     ` (3 preceding siblings ...)
  2026-06-24 17:55   ` [PATCH v5 4/9] bpf/arm64: mask shift count " Stephen Hemminger
@ 2026-06-24 17:55   ` Stephen Hemminger
  2026-06-25 15:38     ` Marat Khalili
  2026-06-24 17:55   ` [PATCH v5 6/9] bpf/arm64: fix offset type to allow a negative jump Stephen Hemminger
                     ` (3 subsequent siblings)
  8 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-24 17:55 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Marat Khalili

There were multiple bugs with immediate values in shift instructions.
The code was not masking as required by RFC.

Add new tests that cover these instructions.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 59 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 232e9e2a98..0e5894a532 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -2005,6 +2005,51 @@ test_div1_check(uint64_t rc, const void *arg)
 	return cmp_res(__func__, 0, rc, dve.out, dvt->out, sizeof(dve.out));
 }
 
+/*
+ * Shift counts are masked to the operand width (RFC 9669: 0x3f for 64-bit,
+ * 0x1f for 32-bit). Counts >= 128 also exercise the x86 imm_size() path that
+ * used to desync the stream, and the arm64 UBFM/SBFM immediate encoding.
+ */
+static const struct ebpf_insn test_shift_big_imm_prog[] = {
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 1
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_LSH | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 191
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_ARSH | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 200
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_RSH | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 130
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT)
+	},
+};
+
+static void
+test_shift_big_imm_prepare(void *arg)
+{
+	memset(arg, 0, sizeof(struct dummy_offset));
+}
+
+static int
+test_shift_big_imm_check(uint64_t rc, const void *arg)
+{
+	uint64_t expect = 0x3FE0000000000000ULL;
+
+	return cmp_res(__func__, expect, rc, arg, arg, 0);
+}
+
 /* call test-cases */
 static const struct ebpf_insn test_call1_prog[] = {
 
@@ -3409,6 +3454,20 @@ static const struct bpf_test tests[] = {
 		.prepare = test_mul1_prepare,
 		.check_result = test_div1_check,
 	},
+	{
+		.name = "test_shift_big_imm",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_shift_big_imm_prog,
+			.nb_ins = RTE_DIM(test_shift_big_imm_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_offset),
+			},
+		},
+		.prepare = test_shift_big_imm_prepare,
+		.check_result = test_shift_big_imm_check,
+	},
 	{
 		.name = "test_call1",
 		.arg_sz = sizeof(struct dummy_offset),
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v5 6/9] bpf/arm64: fix offset type to allow a negative jump
  2026-06-24 17:54 ` [PATCH v5 0/9] bpf: JIT related bug fixes Stephen Hemminger
                     ` (4 preceding siblings ...)
  2026-06-24 17:55   ` [PATCH v5 5/9] test/bpf: add test for large shift Stephen Hemminger
@ 2026-06-24 17:55   ` Stephen Hemminger
  2026-06-24 17:55   ` [PATCH v5 7/9] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-24 17:55 UTC (permalink / raw)
  To: dev
  Cc: Christophe Fontaine, stable, Stephen Hemminger, Marat Khalili,
	Wathsala Vithanage, Konstantin Ananyev, Jerin Jacob

From: Christophe Fontaine <cfontain@redhat.com>

The DPDK BPF JIT standalone test test_ld_mbuf1 fails on arm64.
It does:
	r6 = r1                    // mbuf
	r0 = *(u8 *)pkt[0]         // BPF_ABS
	if ((r0 & 0xf0) == 0x40)
		goto parse
	r0 = 0
	exit                       // epilogue E0
parse:
	r0 = *(u8 *)pkt[r0 + 3]    // BPF_IND
	...
	exit

emit_return_zero_if_src_zero() returns 0 by branching to a function
epilogue. The target may be a previous epilogue so branch
might be backwards; therefore the offset needs to be negative.

The offset was stored in a uint16_t, so a negative value wrapped to a
large positive number; emit_b() then branched past the end of the
program and faulted at run time.

Fixes: 111e2a747a4f ("bpf/arm: add basic arithmetic operations")
Cc: stable@dpdk.org

Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
 lib/bpf/bpf_jit_arm64.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
index 7582370062..51906c7f0d 100644
--- a/lib/bpf/bpf_jit_arm64.c
+++ b/lib/bpf/bpf_jit_arm64.c
@@ -965,10 +965,12 @@ static void
 emit_return_zero_if_src_zero(struct a64_jit_ctx *ctx, bool is64, uint8_t src)
 {
 	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
-	uint16_t jump_to_epilogue;
+	int32_t jump_to_epilogue;
 
 	emit_cbnz(ctx, is64, src, 3);
 	emit_mov_imm(ctx, is64, r0, 0);
+
+	/* maybe backwards branch to earlier epilogue */
 	jump_to_epilogue = (ctx->program_start + ctx->program_sz) - ctx->idx;
 	emit_b(ctx, jump_to_epilogue);
 }
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v5 7/9] bpf/arm64: add BPF_ABS/BPF_IND packet load support
  2026-06-24 17:54 ` [PATCH v5 0/9] bpf: JIT related bug fixes Stephen Hemminger
                     ` (5 preceding siblings ...)
  2026-06-24 17:55   ` [PATCH v5 6/9] bpf/arm64: fix offset type to allow a negative jump Stephen Hemminger
@ 2026-06-24 17:55   ` Stephen Hemminger
  2026-06-25 13:59     ` Marat Khalili
  2026-06-24 17:55   ` [PATCH v5 8/9] test/bpf: check that JIT was generated Stephen Hemminger
  2026-06-24 17:55   ` [PATCH v5 9/9] test/bpf: check that bpf_convert can be JIT'd Stephen Hemminger
  8 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-24 17:55 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Wathsala Vithanage, Konstantin Ananyev,
	Marat Khalili

The arm64 JIT rejected BPF_LD | BPF_ABS and BPF_LD | BPF_IND with
"invalid opcode", so cBPF programs converted by rte_bpf_convert() could
not be JITed. Add these opcodes, mirroring the x86 JIT: a fast path for
data held in the first mbuf segment, and a __rte_pktmbuf_read() slow
path for everything else.

The forward branches over the call cannot use fixed distances:
emit_call() materializes the helper address with a variable number of
mov/movk instructions, so the block sizes are not known up front. Size
the three blocks (fast path, slow path, common tail) in a dry run, then
emit for real with the branches resolved from the measured offsets.

Programs using these opcodes use the call register layout, since the
slow path makes a function call.

Bugzilla ID: 1427

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_jit_arm64.c | 149 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 148 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
index 51906c7f0d..62fe5bf0dc 100644
--- a/lib/bpf/bpf_jit_arm64.c
+++ b/lib/bpf/bpf_jit_arm64.c
@@ -1133,6 +1133,135 @@ emit_branch(struct a64_jit_ctx *ctx, uint8_t op, uint32_t i, int16_t off)
 	emit_b_cond(ctx, ebpf_to_a64_cond(op), jump_offset_get(ctx, i, off));
 }
 
+/* LD_ABS/LD_IND code block offsets (in arm64 instructions) */
+enum {
+	LDMB_FAST_OFS, /* fast path */
+	LDMB_SLOW_OFS, /* slow path */
+	LDMB_FIN_OFS,  /* common tail */
+	LDMB_OFS_NUM
+};
+
+/*
+ * Helper for emit_ld_mbuf(): fast path.
+ * Compute the packet offset; if it lies inside the first segment leave the
+ * data pointer in R0, otherwise branch to the slow path.
+ */
+static void
+emit_ldmb_fast_path(struct a64_jit_ctx *ctx, uint8_t src, uint8_t mode,
+		    uint32_t sz, int32_t imm, const uint32_t ofs[LDMB_OFS_NUM])
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
+	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
+	uint8_t tmp2 = ebpf_to_a64_reg(ctx, TMP_REG_2);
+	uint8_t tmp3 = ebpf_to_a64_reg(ctx, TMP_REG_3);
+
+	/* off = imm (+ src for BPF_IND) */
+	emit_mov_imm(ctx, 1, tmp1, imm);
+	if (mode == BPF_IND)
+		emit_add(ctx, 1, tmp1, src);
+
+	/* if ((int64_t)(mbuf->data_len - off) < sz) goto slow_path */
+	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_len));
+	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
+	emit_sub(ctx, 1, tmp2, tmp1);
+	emit_mov_imm(ctx, 1, tmp3, sz);
+	emit_cmp(ctx, 1, tmp2, tmp3);
+	emit_b_cond(ctx, A64_LT, (int32_t)(ofs[LDMB_SLOW_OFS] - ctx->idx));
+
+	/* R0 = mbuf->buf_addr + mbuf->data_off + off */
+	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_off));
+	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
+	emit_mov_imm(ctx, 1, r0, offsetof(struct rte_mbuf, buf_addr));
+	emit_ldr(ctx, EBPF_DW, r0, r6, r0);
+	emit_add(ctx, 1, r0, tmp2);
+	emit_add(ctx, 1, r0, tmp1);
+
+	emit_b(ctx, (int32_t)(ofs[LDMB_FIN_OFS] - ctx->idx));
+}
+
+/*
+ * Helper for emit_ld_mbuf(): slow path.
+ * R0 = __rte_pktmbuf_read(mbuf, off, sz, buf); return 0 if NULL.
+ * The scratch buffer is the space reserved by __rte_bpf_validate() at the
+ * bottom of the eBPF stack frame, i.e. (frame_pointer - stack_ofs).
+ */
+static void
+emit_ldmb_slow_path(struct a64_jit_ctx *ctx, uint32_t sz, uint32_t stack_ofs)
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
+	uint8_t fp = ebpf_to_a64_reg(ctx, EBPF_FP);
+	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
+
+	/* arguments of __rte_pktmbuf_read(mbuf, off, len, buf) */
+	emit_mov_64(ctx, A64_R(1), tmp1);		/* off (held in tmp1) */
+	emit_mov_64(ctx, A64_R(0), r6);			/* mbuf */
+	emit_mov_imm(ctx, 0, A64_R(2), sz);		/* len */
+	emit_sub_imm_64(ctx, A64_R(3), fp, stack_ofs);	/* buf */
+
+	emit_call(ctx, tmp1, (void *)(uintptr_t)__rte_pktmbuf_read);
+	emit_return_zero_if_src_zero(ctx, 1, r0);
+}
+
+/*
+ * Helper for emit_ld_mbuf(): common tail.
+ * Load the value pointed to by R0 and convert from network byte order.
+ */
+static void
+emit_ldmb_fin(struct a64_jit_ctx *ctx, uint8_t opsz, uint32_t sz)
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+
+	emit_ldr(ctx, opsz, r0, r0, A64_ZR);
+	if (opsz != BPF_B)
+		emit_be(ctx, r0, sz * 8);
+}
+
+/*
+ * emit code for BPF_ABS/BPF_IND load.
+ * generates the following construction:
+ * fast_path:
+ *   off = src + imm
+ *   if (mbuf->data_len - off < sz)
+ *      goto slow_path;
+ *   ptr = mbuf->buf_addr + mbuf->data_off + off;
+ *   goto fin_part;
+ * slow_path:
+ *   typeof(sz) buf;  // scratch space reserved on the eBPF stack
+ *   ptr = __rte_pktmbuf_read(mbuf, off, sz, &buf);
+ *   if (ptr == NULL)
+ *      return 0;
+ * fin_part:
+ *   res = *(typeof(sz))ptr;
+ *   res = ntoh(res);
+ */
+static void
+emit_ld_mbuf(struct a64_jit_ctx *ctx, uint8_t op, uint8_t src, int32_t imm,
+	     uint32_t stack_ofs)
+{
+	uint8_t mode = BPF_MODE(op);
+	uint8_t opsz = BPF_SIZE(op);
+	uint32_t sz = bpf_size(opsz);
+	uint32_t ofs[LDMB_OFS_NUM];
+
+	/* seed offsets so the dry-run branches stay in range */
+	ofs[LDMB_FAST_OFS] = ofs[LDMB_SLOW_OFS] = ofs[LDMB_FIN_OFS] = ctx->idx;
+
+	/* dry run to record block offsets */
+	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
+	ofs[LDMB_SLOW_OFS] = ctx->idx;
+	emit_ldmb_slow_path(ctx, sz, stack_ofs);
+	ofs[LDMB_FIN_OFS] = ctx->idx;
+	emit_ldmb_fin(ctx, opsz, sz);
+
+	/* rewind and emit for real with resolved offsets */
+	ctx->idx = ofs[LDMB_FAST_OFS];
+	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
+	emit_ldmb_slow_path(ctx, sz, stack_ofs);
+	emit_ldmb_fin(ctx, opsz, sz);
+}
+
 static void
 check_program_has_call(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 {
@@ -1145,8 +1274,17 @@ check_program_has_call(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 		op = ins->code;
 
 		switch (op) {
-		/* Call imm */
+		/*
+		 * BPF_ABS/BPF_IND can fall through to __rte_pktmbuf_read(),
+		 * so they need the call-clobbered register layout as well.
+		 */
 		case (BPF_JMP | EBPF_CALL):
+		case (BPF_LD | BPF_ABS | BPF_B):
+		case (BPF_LD | BPF_ABS | BPF_H):
+		case (BPF_LD | BPF_ABS | BPF_W):
+		case (BPF_LD | BPF_IND | BPF_B):
+		case (BPF_LD | BPF_IND | BPF_H):
+		case (BPF_LD | BPF_IND | BPF_W):
 			ctx->foundcall = 1;
 			return;
 		}
@@ -1348,6 +1486,15 @@ emit(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 			emit_mov_imm(ctx, 1, dst, u64);
 			i++;
 			break;
+		/* R0 = ntoh(*(size *)(mbuf data + (src) + imm)) */
+		case (BPF_LD | BPF_ABS | BPF_B):
+		case (BPF_LD | BPF_ABS | BPF_H):
+		case (BPF_LD | BPF_ABS | BPF_W):
+		case (BPF_LD | BPF_IND | BPF_B):
+		case (BPF_LD | BPF_IND | BPF_H):
+		case (BPF_LD | BPF_IND | BPF_W):
+			emit_ld_mbuf(ctx, op, src, imm, bpf->stack_sz);
+			break;
 		/* *(size *)(dst + off) = src */
 		case (BPF_STX | BPF_MEM | BPF_B):
 		case (BPF_STX | BPF_MEM | BPF_H):
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v5 8/9] test/bpf: check that JIT was generated
  2026-06-24 17:54 ` [PATCH v5 0/9] bpf: JIT related bug fixes Stephen Hemminger
                     ` (6 preceding siblings ...)
  2026-06-24 17:55   ` [PATCH v5 7/9] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
@ 2026-06-24 17:55   ` Stephen Hemminger
  2026-06-24 17:55   ` [PATCH v5 9/9] test/bpf: check that bpf_convert can be JIT'd Stephen Hemminger
  8 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-24 17:55 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Marat Khalili, Konstantin Ananyev

Avoid silently ignoring JIT failures. The test cases should
all succeed JIT compilation; if not it is a bug in the JIT
implementation and should be reported.

Introduce a configuration setting RTE_BPF_JIT_SUPPORTED
which is cleaner than using an ARCH specific #ifdef.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
 app/test/test_bpf.c | 8 ++++++++
 lib/bpf/meson.build | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 0e5894a532..16a1004e51 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -3649,6 +3649,14 @@ run_test(const struct bpf_test *tst)
 				rv, strerror(rv));
 		}
 	}
+#ifdef RTE_BPF_JIT_SUPPORTED
+	else {
+		/* a JIT backend exists for this arch, so it must compile */
+		printf("%s@%d: %s: no JIT code generated;\n",
+			__func__, __LINE__, tst->name);
+		ret = -1;
+	}
+#endif
 
 	rte_bpf_destroy(bpf);
 	return ret;
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 7e8a300e3f..04ede96689 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -27,8 +27,10 @@ sources = files(
 )
 
 if arch_subdir == 'x86' and dpdk_conf.get('RTE_ARCH_64')
+    dpdk_conf.set('RTE_BPF_JIT_SUPPORTED', 1)
     sources += files('bpf_jit_x86.c')
 elif dpdk_conf.has('RTE_ARCH_ARM64')
+    dpdk_conf.set('RTE_BPF_JIT_SUPPORTED', 1)
     sources += files('bpf_jit_arm64.c')
 endif
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v5 9/9] test/bpf: check that bpf_convert can be JIT'd
  2026-06-24 17:54 ` [PATCH v5 0/9] bpf: JIT related bug fixes Stephen Hemminger
                     ` (7 preceding siblings ...)
  2026-06-24 17:55   ` [PATCH v5 8/9] test/bpf: check that JIT was generated Stephen Hemminger
@ 2026-06-24 17:55   ` Stephen Hemminger
  8 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-24 17:55 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Marat Khalili, Konstantin Ananyev

Run each converted filter through both the interpreter and the JIT and
check they agree, catching JIT miscompiles.

test_bpf_filter and test_bpf_match did nearly the same thing: compile,
load and run a filter against the dummy packet. Combine them into
test_bpf_match, which now builds the packet itself and returns whether
the filter matched. Callers run it for both load methods.

The dummy packet is a UDP packet to a fixed destination MAC, source
and destination ports, so the filter results are deterministic. None
of the sample filters should match it, so assert that; a convert or
JIT bug that flips a result is then caught. The destination MAC and
source port are chosen so the negative ethernet and port filters do
not match, and "port not 53 and not arp" is dropped as it matches
any non-ARP packet that lacks port 53.

Reduce log output to make it easier to match which expression might be
causing issues.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
 app/test/test_bpf.c | 171 ++++++++++++++++++++++++++------------------
 1 file changed, 100 insertions(+), 71 deletions(-)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 16a1004e51..f56f6c7d7e 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -32,6 +32,7 @@ test_bpf(void)
 #include <rte_bpf.h>
 #include <rte_ether.h>
 #include <rte_ip.h>
+#include <rte_udp.h>
 
 
 /* Tests of most simple BPF programs (no instructions, one instruction etc.) */
@@ -4756,11 +4757,13 @@ load_cbpf_program_convert(struct bpf_program *cbpf_program, const char *str)
 		return NULL;
 	}
 
+#ifdef DEBUG
 	printf("bpf convert(\"%s\") produced:\n", str);
 	rte_bpf_dump(stdout, prm->ins, prm->nb_ins);
 
 	printf("%s \"%s\"\n", __func__, str);
 	test_bpf_dump(cbpf_program, prm);
+#endif
 
 	bpf = rte_bpf_load(prm);
 	rte_free(prm);
@@ -4785,18 +4788,65 @@ load_cbpf_program_direct(struct bpf_program *cbpf_program, const char *str __rte
 	});
 }
 
+static const load_cbpf_program_t cbpf_program_loaders[] = {
+	load_cbpf_program_convert,
+	load_cbpf_program_direct,
+};
+
+/* Setup Ethernet/IP/UDP headers in a dummy packet buffer for filter tests */
+static void
+dummy_ip_prep(void *data, uint16_t plen)
+{
+	struct {
+		struct rte_ether_hdr eth_hdr;
+		struct rte_ipv4_hdr ip_hdr;
+		struct rte_udp_hdr udp_hdr;
+	} *hdr = data;
+
+	hdr->eth_hdr = (struct rte_ether_hdr) {
+		.dst_addr.addr_bytes = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x0e },
+		.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
+	};
+	hdr->ip_hdr = (struct rte_ipv4_hdr) {
+		.version_ihl = RTE_IPV4_VHL_DEF,
+		.total_length = rte_cpu_to_be_16(plen),
+		.time_to_live = IPDEFTTL,
+		.next_proto_id = IPPROTO_UDP,
+		.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+		.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+	};
+	hdr->udp_hdr = (struct rte_udp_hdr) {
+		.src_port = rte_cpu_to_be_16(49152),	/* fixed, avoids filter ports */
+		.dst_port = rte_cpu_to_be_16(9),	/* discard port */
+		.dgram_len = rte_cpu_to_be_16(plen - sizeof(struct rte_ipv4_hdr)),
+		.dgram_cksum = 0,
+	};
+}
+
+/*
+ * Compile a pcap filter, load it with the given loader, then run it against
+ * a standard dummy packet with both the interpreter and (when available) the
+ * JIT, checking the two agree.
+ *
+ * Returns 1 if the filter matched, 0 if it did not, and -1 on any error
+ * (compile, load, or interpreter/JIT mismatch).
+ */
 static int
-test_bpf_match(pcap_t *pcap, const char *str, struct rte_mbuf *mb,
+test_bpf_match(pcap_t *pcap, const char *str,
 	load_cbpf_program_t load_cbpf_program)
 {
+	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
+	const uint32_t plen = 100;
 	struct bpf_program fcode;
-	struct rte_bpf *bpf;
+	struct rte_mbuf mb = { 0 };
+	struct rte_bpf *bpf = NULL;
 	int ret = -1;
 	uint64_t rc;
 
+	printf("%s '%s'\n", __func__, str);
 	if (pcap_compile(pcap, &fcode, str, 1, PCAP_NETMASK_UNKNOWN)) {
 		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
-		       __func__, __LINE__,  str, pcap_geterr(pcap));
+		       __func__, __LINE__, str, pcap_geterr(pcap));
 		return -1;
 	}
 
@@ -4804,15 +4854,41 @@ test_bpf_match(pcap_t *pcap, const char *str, struct rte_mbuf *mb,
 	if (bpf == NULL) {
 		printf("%s@%d: failed to load cbpf program for \"%s\", error=%d(%s);\n",
 			__func__, __LINE__, str, rte_errno, strerror(rte_errno));
+		test_bpf_dump(&fcode, NULL);
 		goto error;
 	}
 
-	rc = rte_bpf_exec(bpf, mb);
-	/* The return code from bpf capture filter is non-zero if matched */
-	ret = (rc == 0);
+	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
+	dummy_ip_prep(rte_pktmbuf_mtod(&mb, void *), plen);
+
+	rc = rte_bpf_exec(bpf, &mb);
+
+	/* Verify the JIT, when available, produces the same result. */
+	{
+		struct rte_bpf_jit jit;
+
+		rte_bpf_get_jit(bpf, &jit);
+		if (jit.func != NULL) {
+			fflush(stdout);
+			if (jit.func(&mb) != rc) {
+				printf("%s@%d: JIT return code does not match\n",
+				       __func__, __LINE__);
+				goto error;
+			}
+		}
+#ifdef RTE_BPF_JIT_SUPPORTED
+		else {
+			printf("%s@%d: no JIT code generated\n",
+			       __func__, __LINE__);
+			goto error;
+		}
+#endif
+	}
+
+	/* The return code from a bpf capture filter is non-zero if matched. */
+	ret = (rc != 0);
 error:
-	if (bpf)
-		rte_bpf_destroy(bpf);
+	rte_bpf_destroy(bpf);
 	pcap_freecode(&fcode);
 	return ret;
 }
@@ -4821,44 +4897,13 @@ test_bpf_match(pcap_t *pcap, const char *str, struct rte_mbuf *mb,
 static int
 test_bpf_filter_sanity(pcap_t *pcap)
 {
-	static const load_cbpf_program_t cbpf_program_loaders[] = {
-		load_cbpf_program_convert,
-		load_cbpf_program_direct,
-	};
-
-	const uint32_t plen = 100;
-	struct rte_mbuf mb, *m;
-	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
-	struct {
-		struct rte_ether_hdr eth_hdr;
-		struct rte_ipv4_hdr ip_hdr;
-	} *hdr;
-
-	memset(&mb, 0, sizeof(mb));
-	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
-	m = &mb;
-
-	hdr = rte_pktmbuf_mtod(m, typeof(hdr));
-	hdr->eth_hdr = (struct rte_ether_hdr) {
-		.dst_addr.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
-		.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
-	};
-	hdr->ip_hdr = (struct rte_ipv4_hdr) {
-		.version_ihl = RTE_IPV4_VHL_DEF,
-		.total_length = rte_cpu_to_be_16(plen),
-		.time_to_live = IPDEFTTL,
-		.next_proto_id = IPPROTO_RAW,
-		.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
-		.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
-	};
-
-	for (int li = 0; li != RTE_DIM(cbpf_program_loaders); ++li) {
-		if (test_bpf_match(pcap, "ip", m, cbpf_program_loaders[li]) != 0) {
+	for (unsigned int li = 0; li != RTE_DIM(cbpf_program_loaders); ++li) {
+		if (test_bpf_match(pcap, "ip", cbpf_program_loaders[li]) != 1) {
 			printf("%s@%d: filter \"ip\" doesn't match test data\n",
 			       __func__, __LINE__);
 			return -1;
 		}
-		if (test_bpf_match(pcap, "not ip", m, cbpf_program_loaders[li]) == 0) {
+		if (test_bpf_match(pcap, "not ip", cbpf_program_loaders[li]) != 0) {
 			printf("%s@%d: filter \"not ip\" does match test data\n",
 			       __func__, __LINE__);
 			return -1;
@@ -4882,7 +4927,6 @@ static const char * const sample_filters[] = {
 	"port 53",
 	"host 192.0.2.1 and not (port 80 or port 25)",
 	"host 2001:4b98:db0::8 and not port 80 and not port 25",
-	"port not 53 and not arp",
 	"(tcp[0:2] > 1500 and tcp[0:2] < 1550) or (tcp[2:2] > 1500 and tcp[2:2] < 1550)",
 	"ether proto 0x888e",
 	"ether[0] & 1 = 0 and ip[16] >= 224",
@@ -4909,35 +4953,10 @@ static const char * const sample_filters[] = {
 	"or host 192.0.2.1 or host 192.0.2.100 or host 192.0.2.200"),
 };
 
-static int
-test_bpf_filter(pcap_t *pcap, const char *s, load_cbpf_program_t load_cbpf_program)
-{
-	struct bpf_program fcode;
-	struct rte_bpf *bpf;
-
-	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
-		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
-		       __func__, __LINE__, s, pcap_geterr(pcap));
-		return -1;
-	}
-
-	bpf = load_cbpf_program(&fcode, s);
-	if (bpf == NULL) {
-		printf("%s@%d: failed to load cbpf program for \"%s\", error=%d(%s);\n",
-			__func__, __LINE__, s, rte_errno, strerror(rte_errno));
-		test_bpf_dump(&fcode, NULL);
-	}
-
-	rte_bpf_destroy(bpf);
-
-	pcap_freecode(&fcode);
-	return (bpf == NULL) ? -1 : 0;
-}
-
 static int
 test_bpf_convert(void)
 {
-	unsigned int i;
+	unsigned int i, li;
 	pcap_t *pcap;
 	int rc;
 
@@ -4949,8 +4968,18 @@ test_bpf_convert(void)
 
 	rc = test_bpf_filter_sanity(pcap);
 	for (i = 0; i < RTE_DIM(sample_filters); i++) {
-		rc |= test_bpf_filter(pcap, sample_filters[i], load_cbpf_program_convert);
-		rc |= test_bpf_filter(pcap, sample_filters[i], load_cbpf_program_direct);
+		for (li = 0; li < RTE_DIM(cbpf_program_loaders); li++) {
+			int m = test_bpf_match(pcap, sample_filters[i],
+					       cbpf_program_loaders[li]);
+
+			/* None of the sample filters match the dummy packet. */
+			if (m != 0) {
+				if (m > 0)
+					printf("%s@%d: filter \"%s\" unexpectedly matched\n",
+					       __func__, __LINE__, sample_filters[i]);
+				rc = -1;
+			}
+		}
 	}
 
 	pcap_close(pcap);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* RE: [PATCH v5 7/9] bpf/arm64: add BPF_ABS/BPF_IND packet load support
  2026-06-24 17:55   ` [PATCH v5 7/9] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
@ 2026-06-25 13:59     ` Marat Khalili
  0 siblings, 0 replies; 73+ messages in thread
From: Marat Khalili @ 2026-06-25 13:59 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Wathsala Vithanage, Konstantin Ananyev, dev@dpdk.org

Below is what gdb shows actually generated for instruction 15 of
test_ld_mbuf1_prog (with minimal changes and comments for readability). I
suggest adding this to the comments or (if we don't feel like keeping it
updated) the commit message, it helps analyzing the code a bit.

(Also, stack drawings in the file do not include the buffer we use here.)

     0: 0x92800069      mov     x9, #-4         // mov x9, <imm>
     1: 0x8b150129      add     x9, x9, x21     // add x9, src_reg
     2: 0xd280050a      mov     x10, #40        // mov x9, <&::data_len>
     3: 0x786a6a6a      ldrh    w10, [x19, x10]
     4: 0xcb09014a      sub     x10, x10, x9
     5: 0xd280008b      mov     x11, #4         // mov x11, <sz>
     6: 0xeb0b014f      subs    x15, x10, x11
     7: 0x5400010b      b.lt    +8              // b.lt slow
     8: 0xd280020a      mov     x10, #16        // mov x10, <&::data_off>
     9: 0x786a6a6a      ldrh    w10, [x19, x10]
    10: 0xd2800007      mov     x7, #0          // mov x7, <&::buf_addr>
    11: 0xf8676a67      ldr     x7, [x19, x7]
    12: 0x8b0a00e7      add     x7, x7, x10
    13: 0x8b0900e7      add     x7, x7, x9
    14: 0x1400000c      b       +12             // b load
                        slow:
    15: 0x91000121      add     x1, x9, #0      // mov x1, x9
    16: 0x91000260      add     x0, x19, #0     // mov x0, x19
    17: 0x52800082      mov     w2, #4          // mov w2, <sz>
    18: 0xd1002323      sub     x3, x25, #8     // sub x3, x25, <stack_ofs>
    19: 0xd2a04d49      mov     x9, #0x26a0000  // mov x9,
    20: 0xf29d3409      movk    x9, #0xe9a0     //   __rte_pktmbuf_read
    21: 0xd63f0120      blr     x9
    22: 0x91000007      add     x7, x0, #0      // mov x7, x0
    23: 0xb5000067      cbnz    x7, +3          // cbnz load
    24: 0xd2800007      mov     x7, #0x0
    25: 0x17ffff88      b       -120            // b epilogue
                        load:
    26: 0xb87f68e7      ldr     w7, [x7, xzr]
    27: 0xdac008e7      rev32   x7, x7

Opcode variations:
* Instruction 1 is omitted for BPF_ABS.
* Instruction 26 varies depending on sz.
* Instruction 27 varies or is omitted depending on sz.

Some benign nits:
* Instruction 6 should probably be `subs xzr, x10, x11`, a slight 1-bit error in
  the existing code, though x15 is unused.
* Instructions 5 and 17 use different encoding for the same operation, would be
  nice to keep them consistent, though operand never exceeds INT32_MAX.
* Instruction 10 is redundant.

I see two problems:
* We never check that x9 is non-negative. We could either add one more check,
  or rearrange the code and use unsigned comparison at 7: (currently b.lt).
  (There was some discussion previously regarding the special meaning of
  negative BPF_ABS immediate, but I believe this is out of scope of this patch,
  here we should just fail on negative _effective_ offset regardless of opcode.)
* Second argument of __rte_pktmbuf_read is `uint32_t off`, and we are trying to
  pass 64-bit offset in x1. We need a check that it does not exceed UINT32_MAX.

Otherwise looks good to me.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v5 3/9] bpf: mask shift count in interpreter per RFC 9669
  2026-06-24 17:55   ` [PATCH v5 3/9] bpf: mask shift count in interpreter per RFC 9669 Stephen Hemminger
@ 2026-06-25 15:35     ` Marat Khalili
  0 siblings, 0 replies; 73+ messages in thread
From: Marat Khalili @ 2026-06-25 15:35 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org
  Cc: stable@dpdk.org, Konstantin Ananyev, Ferruh Yigit

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Wednesday 24 June 2026 18:55
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; stable@dpdk.org; Konstantin Ananyev
> <konstantin.ananyev@huawei.com>; Marat Khalili <marat.khalili@huawei.com>; Ferruh Yigit
> <ferruh.yigit@amd.com>
> Subject: [PATCH v5 3/9] bpf: mask shift count in interpreter per RFC 9669
> 
> The interpreter shifted by the raw immediate or register value, which
> is undefined behavior in C when the count is >= the operand width and
> trips UBSan. RFC 9669 masks shift counts (0x3f for 64-bit, 0x1f for
> 32-bit); mask the count in the LSH/RSH/ARSH cases.
> 
> Fixes: 94972f35a02e ("bpf: add BPF loading and execution framework")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Acked-by: Marat Khalili <marat.khalili@huawei.com>

> ---
>  lib/bpf/bpf_exec.c | 31 +++++++++++++++++++++----------
>  1 file changed, 21 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/bpf/bpf_exec.c b/lib/bpf/bpf_exec.c
> index d423ef28f5..bb03c9cc2c 100644
> --- a/lib/bpf/bpf_exec.c
> +++ b/lib/bpf/bpf_exec.c
> @@ -4,6 +4,7 @@
> 
>  #include <stdio.h>
>  #include <stdint.h>
> +#include <limits.h>
> 
>  #include <eal_export.h>
>  #include <rte_common.h>
> @@ -43,6 +44,16 @@
>  	((reg)[(ins)->dst_reg] = \
>  		(type)(reg)[(ins)->dst_reg] op (type)(ins)->imm)
> 
> +#define BPF_OP_SHIFT_IMM(reg, ins, op, type)	\
> +	((reg)[(ins)->dst_reg] =		\
> +		(type)(reg)[(ins)->dst_reg] op	\
> +		((ins)->imm & (sizeof(type) * CHAR_BIT - 1)))
> +
> +#define BPF_OP_SHIFT_REG(reg, ins, op, type)	\
> +	((reg)[(ins)->dst_reg] =		\
> +		(type)(reg)[(ins)->dst_reg] op	\
> +		((reg)[(ins)->src_reg] & (sizeof(type) * CHAR_BIT - 1)))
> +
>  #define BPF_DIV_ZERO_CHECK(bpf, reg, ins, type) do { \
>  	if ((type)(reg)[(ins)->src_reg] == 0) { \
>  		RTE_BPF_LOG_LINE(ERR, \
> @@ -183,10 +194,10 @@ bpf_exec(const struct rte_bpf *bpf, uint64_t reg[EBPF_REG_NUM])
>  			BPF_OP_ALU_IMM(reg, ins, |, uint32_t);
>  			break;
>  		case (BPF_ALU | BPF_LSH | BPF_K):
> -			BPF_OP_ALU_IMM(reg, ins, <<, uint32_t);
> +			BPF_OP_SHIFT_IMM(reg, ins, <<, uint32_t);
>  			break;
>  		case (BPF_ALU | BPF_RSH | BPF_K):
> -			BPF_OP_ALU_IMM(reg, ins, >>, uint32_t);
> +			BPF_OP_SHIFT_IMM(reg, ins, >>, uint32_t);
>  			break;
>  		case (BPF_ALU | BPF_XOR | BPF_K):
>  			BPF_OP_ALU_IMM(reg, ins, ^, uint32_t);
> @@ -217,10 +228,10 @@ bpf_exec(const struct rte_bpf *bpf, uint64_t reg[EBPF_REG_NUM])
>  			BPF_OP_ALU_REG(reg, ins, |, uint32_t);
>  			break;
>  		case (BPF_ALU | BPF_LSH | BPF_X):
> -			BPF_OP_ALU_REG(reg, ins, <<, uint32_t);
> +			BPF_OP_SHIFT_REG(reg, ins, <<, uint32_t);
>  			break;
>  		case (BPF_ALU | BPF_RSH | BPF_X):
> -			BPF_OP_ALU_REG(reg, ins, >>, uint32_t);
> +			BPF_OP_SHIFT_REG(reg, ins, >>, uint32_t);
>  			break;
>  		case (BPF_ALU | BPF_XOR | BPF_X):
>  			BPF_OP_ALU_REG(reg, ins, ^, uint32_t);
> @@ -262,13 +273,13 @@ bpf_exec(const struct rte_bpf *bpf, uint64_t reg[EBPF_REG_NUM])
>  			BPF_OP_ALU_IMM(reg, ins, |, uint64_t);
>  			break;
>  		case (EBPF_ALU64 | BPF_LSH | BPF_K):
> -			BPF_OP_ALU_IMM(reg, ins, <<, uint64_t);
> +			BPF_OP_SHIFT_IMM(reg, ins, <<, uint64_t);
>  			break;
>  		case (EBPF_ALU64 | BPF_RSH | BPF_K):
> -			BPF_OP_ALU_IMM(reg, ins, >>, uint64_t);
> +			BPF_OP_SHIFT_IMM(reg, ins, >>, uint64_t);
>  			break;
>  		case (EBPF_ALU64 | EBPF_ARSH | BPF_K):
> -			BPF_OP_ALU_IMM(reg, ins, >>, int64_t);
> +			BPF_OP_SHIFT_IMM(reg, ins, >>, int64_t);
>  			break;
>  		case (EBPF_ALU64 | BPF_XOR | BPF_K):
>  			BPF_OP_ALU_IMM(reg, ins, ^, uint64_t);
> @@ -299,13 +310,13 @@ bpf_exec(const struct rte_bpf *bpf, uint64_t reg[EBPF_REG_NUM])
>  			BPF_OP_ALU_REG(reg, ins, |, uint64_t);
>  			break;
>  		case (EBPF_ALU64 | BPF_LSH | BPF_X):
> -			BPF_OP_ALU_REG(reg, ins, <<, uint64_t);
> +			BPF_OP_SHIFT_REG(reg, ins, <<, uint64_t);
>  			break;
>  		case (EBPF_ALU64 | BPF_RSH | BPF_X):
> -			BPF_OP_ALU_REG(reg, ins, >>, uint64_t);
> +			BPF_OP_SHIFT_REG(reg, ins, >>, uint64_t);
>  			break;
>  		case (EBPF_ALU64 | EBPF_ARSH | BPF_X):
> -			BPF_OP_ALU_REG(reg, ins, >>, int64_t);
> +			BPF_OP_SHIFT_REG(reg, ins, >>, int64_t);
>  			break;
>  		case (EBPF_ALU64 | BPF_XOR | BPF_X):
>  			BPF_OP_ALU_REG(reg, ins, ^, uint64_t);
> --
> 2.53.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v5 5/9] test/bpf: add test for large shift
  2026-06-24 17:55   ` [PATCH v5 5/9] test/bpf: add test for large shift Stephen Hemminger
@ 2026-06-25 15:38     ` Marat Khalili
  0 siblings, 0 replies; 73+ messages in thread
From: Marat Khalili @ 2026-06-25 15:38 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org; +Cc: Konstantin Ananyev

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Wednesday 24 June 2026 18:55
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Konstantin Ananyev <konstantin.ananyev@huawei.com>;
> Marat Khalili <marat.khalili@huawei.com>
> Subject: [PATCH v5 5/9] test/bpf: add test for large shift
> 
> There were multiple bugs with immediate values in shift instructions.
> The code was not masking as required by RFC.
> 
> Add new tests that cover these instructions.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Acked-by: Marat Khalili <marat.khalili@huawei.com>

> ---
>  app/test/test_bpf.c | 59 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 59 insertions(+)
> 
> diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
> index 232e9e2a98..0e5894a532 100644
> --- a/app/test/test_bpf.c
> +++ b/app/test/test_bpf.c
> @@ -2005,6 +2005,51 @@ test_div1_check(uint64_t rc, const void *arg)
>  	return cmp_res(__func__, 0, rc, dve.out, dvt->out, sizeof(dve.out));
>  }
> 
> +/*
> + * Shift counts are masked to the operand width (RFC 9669: 0x3f for 64-bit,
> + * 0x1f for 32-bit). Counts >= 128 also exercise the x86 imm_size() path that
> + * used to desync the stream, and the arm64 UBFM/SBFM immediate encoding.
> + */
> +static const struct ebpf_insn test_shift_big_imm_prog[] = {
> +	{
> +		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
> +		.dst_reg = EBPF_REG_0,
> +		.imm = 1
> +	},
> +	{
> +		.code = (EBPF_ALU64 | BPF_LSH | BPF_K),
> +		.dst_reg = EBPF_REG_0,
> +		.imm = 191
> +	},
> +	{
> +		.code = (EBPF_ALU64 | EBPF_ARSH | BPF_K),
> +		.dst_reg = EBPF_REG_0,
> +		.imm = 200
> +	},
> +	{
> +		.code = (EBPF_ALU64 | BPF_RSH | BPF_K),
> +		.dst_reg = EBPF_REG_0,
> +		.imm = 130
> +	},
> +	{
> +		.code = (BPF_JMP | EBPF_EXIT)
> +	},
> +};
> +
> +static void
> +test_shift_big_imm_prepare(void *arg)
> +{
> +	memset(arg, 0, sizeof(struct dummy_offset));
> +}
> +
> +static int
> +test_shift_big_imm_check(uint64_t rc, const void *arg)
> +{
> +	uint64_t expect = 0x3FE0000000000000ULL;
> +
> +	return cmp_res(__func__, expect, rc, arg, arg, 0);
> +}
> +
>  /* call test-cases */
>  static const struct ebpf_insn test_call1_prog[] = {
> 
> @@ -3409,6 +3454,20 @@ static const struct bpf_test tests[] = {
>  		.prepare = test_mul1_prepare,
>  		.check_result = test_div1_check,
>  	},
> +	{
> +		.name = "test_shift_big_imm",
> +		.arg_sz = sizeof(struct dummy_offset),
> +		.prm = {
> +			.ins = test_shift_big_imm_prog,
> +			.nb_ins = RTE_DIM(test_shift_big_imm_prog),
> +			.prog_arg = {
> +				.type = RTE_BPF_ARG_PTR,
> +				.size = sizeof(struct dummy_offset),
> +			},
> +		},
> +		.prepare = test_shift_big_imm_prepare,
> +		.check_result = test_shift_big_imm_check,
> +	},
>  	{
>  		.name = "test_call1",
>  		.arg_sz = sizeof(struct dummy_offset),
> --
> 2.53.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v5 4/9] bpf/arm64: mask shift count per RFC 9669
  2026-06-24 17:55   ` [PATCH v5 4/9] bpf/arm64: mask shift count " Stephen Hemminger
@ 2026-06-25 15:40     ` Marat Khalili
  0 siblings, 0 replies; 73+ messages in thread
From: Marat Khalili @ 2026-06-25 15:40 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org
  Cc: stable@dpdk.org, Wathsala Vithanage, Konstantin Ananyev,
	Jerin Jacob

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Wednesday 24 June 2026 18:55
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; stable@dpdk.org; Wathsala Vithanage
> <wathsala.vithanage@arm.com>; Konstantin Ananyev <konstantin.ananyev@huawei.com>; Marat Khalili
> <marat.khalili@huawei.com>; Jerin Jacob <jerinj@marvell.com>
> Subject: [PATCH v5 4/9] bpf/arm64: mask shift count per RFC 9669
> 
> The ARM JIT was not masking the shift count as required by RFC 9669
> (0x3f for 64-bit, 0x1f for 32-bit), so large immediate shift counts
> overflowed the UBFM/SBFM encoding and failed the JIT. Mask the
> immediate in emit_lsl/emit_lsr/emit_asr.
> 
> Fixes: 9f4469d9e83a ("bpf/arm: add logical operations")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Acked-by: Marat Khalili <marat.khalili@huawei.com>

> ---
>  lib/bpf/bpf_jit_arm64.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
> index ba7ae4d680..7582370062 100644
> --- a/lib/bpf/bpf_jit_arm64.c
> +++ b/lib/bpf/bpf_jit_arm64.c
> @@ -545,12 +545,14 @@ emit_bitfield(struct a64_jit_ctx *ctx, bool is64, uint8_t rd, uint8_t rn,
>  	emit_insn(ctx, insn, check_reg(rd) || check_reg(rn) ||
>  		  check_immr_imms(is64, immr, imms));
>  }
> +
>  static void
>  emit_lsl(struct a64_jit_ctx *ctx, bool is64, uint8_t rd, uint8_t imm)
>  {
>  	const unsigned int width = is64 ? 64 : 32;
>  	uint8_t imms, immr;
> 
> +	imm &= width - 1;
>  	immr = (width - imm) & (width - 1);
>  	imms = width - 1 - imm;
> 
> @@ -560,13 +562,19 @@ emit_lsl(struct a64_jit_ctx *ctx, bool is64, uint8_t rd, uint8_t imm)
>  static void
>  emit_lsr(struct a64_jit_ctx *ctx, bool is64, uint8_t rd, uint8_t imm)
>  {
> -	emit_bitfield(ctx, is64, rd, rd, imm, is64 ? 63 : 31, A64_UBFM);
> +	const unsigned int width = is64 ? 64 : 32;
> +
> +	imm &= width - 1;
> +	emit_bitfield(ctx, is64, rd, rd, imm, width - 1, A64_UBFM);
>  }
> 
>  static void
>  emit_asr(struct a64_jit_ctx *ctx, bool is64, uint8_t rd, uint8_t imm)
>  {
> -	emit_bitfield(ctx, is64, rd, rd, imm, is64 ? 63 : 31, A64_SBFM);
> +	const unsigned int width = is64 ? 64 : 32;
> +
> +	imm &= width - 1;
> +	emit_bitfield(ctx, is64, rd, rd, imm, width - 1, A64_SBFM);
>  }
> 
>  #define A64_AND 0
> --
> 2.53.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v6 0/9] bpf: JIT related bug fixes
  2026-06-08 20:28 [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
                   ` (8 preceding siblings ...)
  2026-06-24 17:54 ` [PATCH v5 0/9] bpf: JIT related bug fixes Stephen Hemminger
@ 2026-06-25 17:30 ` Stephen Hemminger
  2026-06-25 17:30   ` [PATCH v6 1/9] bpf/x86: fix JIT encoding of fixed-width immediates Stephen Hemminger
                     ` (10 more replies)
  9 siblings, 11 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-25 17:30 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

While implementing JIT for packet capture ran into several issues:
  1. x86 JIT had a pre-existing bug which would crash.
  2. The arm64 JIT was missing the packet-access instructions, found
     previously [1].
  3. Shift counts were not masked to the operand width as RFC 9669
     requires: undefined behavior in the interpreter and an encoding
     failure in the arm64 JIT.
  4. Tests related to JIT were not being run or were missing coverage.

Fixed all of these. Patches are ordered with the most urgent fix (the
x86 crash) first, each fix followed by the test that exercises it. The
arm64 packet-load support is kept ahead of the "check JIT was generated"
patch so the series bisects cleanly on arm64.

The arm64 epilogue branch fix (patch 6) was originally posted by
Christophe Fontaine [1]; that series stalled, so it is carried here with
his authorship.

[1] https://inbox.dpdk.org/dev/20260319114500.9757-2-cfontain@redhat.com/

v6:
 - address Marat's review ARM64 JIT of LD IND instructions
   need to handle offsets outside of 32 bit range.


Christophe Fontaine (1):
  bpf/arm64: fix offset type to allow a negative jump

Stephen Hemminger (8):
  bpf/x86: fix JIT encoding of fixed-width immediates
  test/bpf: add JSET test with small immediate
  bpf: mask shift count in interpreter per RFC 9669
  bpf/arm64: mask shift count per RFC 9669
  test/bpf: add test for large shift
  bpf/arm64: add BPF_ABS/BPF_IND packet load support
  test/bpf: check that JIT was generated
  test/bpf: check that bpf_convert can be JIT'd

 app/test/test_bpf.c     | 320 +++++++++++++++++++++++++++++++---------
 lib/bpf/bpf_exec.c      |  31 ++--
 lib/bpf/bpf_jit_arm64.c | 185 ++++++++++++++++++++++-
 lib/bpf/bpf_jit_x86.c   |   6 +-
 lib/bpf/meson.build     |   2 +
 5 files changed, 456 insertions(+), 88 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v6 1/9] bpf/x86: fix JIT encoding of fixed-width immediates
  2026-06-25 17:30 ` [PATCH v6 0/9] bpf: JIT related bug fixes Stephen Hemminger
@ 2026-06-25 17:30   ` Stephen Hemminger
  2026-06-25 17:30   ` [PATCH v6 2/9] test/bpf: add JSET test with small immediate Stephen Hemminger
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-25 17:30 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, stable, Marat Khalili, Konstantin Ananyev,
	Ferruh Yigit

Several places in the x86 JIT size an immediate with imm_size(), which
returns 1 or 4 bytes depending on the value. That is wrong for opcodes
whose immediate width is fixed by the encoding, and it breaks in both
directions.

TEST (0xF7 /0, used for BPF_JSET) has no imm8 form; the immediate is
always 32 bits. For a small mask such as BPF_JSET | BPF_K #0x1,
imm_size() returns 1, so the JIT emits a 1-byte immediate. The CPU
still consumes 4, swallowing 3 bytes of the following Jcc. The
instruction stream desyncs and the program crashes.

ROR and the shifts (0xC1 group) have the opposite problem: their
immediate is always imm8. For a count >= 128, imm_size() returns 4 and
the JIT emits 3 stray bytes, again desyncing the stream.

Size each immediate by its encoding: 32 bits for TEST, 8 bits for ROR
and the shifts.

Bugzilla ID: 1959
Fixes: cc752e43e079 ("bpf: add JIT compilation for x86_64 ISA")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
 lib/bpf/bpf_jit_x86.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/bpf/bpf_jit_x86.c b/lib/bpf/bpf_jit_x86.c
index 54eb279643..912d3f69bc 100644
--- a/lib/bpf/bpf_jit_x86.c
+++ b/lib/bpf/bpf_jit_x86.c
@@ -300,7 +300,7 @@ emit_ror_imm(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
 	emit_rex(st, BPF_ALU, 0, dreg);
 	emit_bytes(st, &ops, sizeof(ops));
 	emit_modregrm(st, MOD_DIRECT, mods, dreg);
-	emit_imm(st, imm, imm_size(imm));
+	emit_imm(st, imm, sizeof(uint8_t));
 }

 /*
@@ -441,7 +441,7 @@ emit_shift_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg,
 	uint32_t imm)
 {
 	emit_shift(st, op, dreg);
-	emit_imm(st, imm, imm_size(imm));
+	emit_imm(st, imm, sizeof(uint8_t));
 }

 /*
@@ -921,7 +921,7 @@ emit_tst_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
 	emit_rex(st, op, 0, dreg);
 	emit_bytes(st, &ops, sizeof(ops));
 	emit_modregrm(st, MOD_DIRECT, mods, dreg);
-	emit_imm(st, imm, imm_size(imm));
+	emit_imm(st, imm, sizeof(int32_t));
 }

 static void
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v6 2/9] test/bpf: add JSET test with small immediate
  2026-06-25 17:30 ` [PATCH v6 0/9] bpf: JIT related bug fixes Stephen Hemminger
  2026-06-25 17:30   ` [PATCH v6 1/9] bpf/x86: fix JIT encoding of fixed-width immediates Stephen Hemminger
@ 2026-06-25 17:30   ` Stephen Hemminger
  2026-06-25 17:30   ` [PATCH v6 3/9] bpf: mask shift count in interpreter per RFC 9669 Stephen Hemminger
                     ` (8 subsequent siblings)
  10 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-25 17:30 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Marat Khalili, Konstantin Ananyev

The existing jump test only used a 32-bit JSET mask,
so the broken imm8 encoding of TEST in the x86 JIT was never exercised.
Add a case with a byte-sized mask;
run_test() runs it through the interpreter and the JIT.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
 app/test/test_bpf.c | 82 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 6b07e72295..232e9e2a98 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -3158,7 +3158,89 @@ static const struct ebpf_insn test_ld_mbuf3_prog[] = {
 };
 
 /* all bpf test cases */
+/*
+ * JSET with a byte-sized mask: exercises the imm8 path of the TEST
+ * encoding in the x86 JIT (a 32-bit mask takes a different path).
+ */
+static const struct ebpf_insn test_jset1_prog[] = {
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0,
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_B),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u8),
+	},
+	/* bit 0 is set in the input: branch is taken */
+	{
+		.code = (BPF_JMP | BPF_JSET | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 0x1,
+		.off = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_JA),
+		.off = 1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x1,
+	},
+	/* bit 1 is clear in the input: branch is not taken */
+	{
+		.code = (BPF_JMP | BPF_JSET | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 0x2,
+		.off = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_JA),
+		.off = 1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x2,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static void
+test_jset1_prepare(void *arg)
+{
+	struct dummy_offset *df = arg;
+
+	memset(df, 0, sizeof(*df));
+	df->u8 = 0x1;	/* bit 0 set, bit 1 clear */
+}
+
+static int
+test_jset1_check(uint64_t rc, const void *arg)
+{
+	return cmp_res(__func__, 0x1, rc, arg, arg, 0);
+}
+
 static const struct bpf_test tests[] = {
+	{
+		.name = "test_jset1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_jset1_prog,
+			.nb_ins = RTE_DIM(test_jset1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_offset),
+			},
+		},
+		.prepare = test_jset1_prepare,
+		.check_result = test_jset1_check,
+	},
 	{
 		.name = "test_store1",
 		.arg_sz = sizeof(struct dummy_offset),
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v6 3/9] bpf: mask shift count in interpreter per RFC 9669
  2026-06-25 17:30 ` [PATCH v6 0/9] bpf: JIT related bug fixes Stephen Hemminger
  2026-06-25 17:30   ` [PATCH v6 1/9] bpf/x86: fix JIT encoding of fixed-width immediates Stephen Hemminger
  2026-06-25 17:30   ` [PATCH v6 2/9] test/bpf: add JSET test with small immediate Stephen Hemminger
@ 2026-06-25 17:30   ` Stephen Hemminger
  2026-06-25 17:30   ` [PATCH v6 4/9] bpf/arm64: mask shift count " Stephen Hemminger
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-25 17:30 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, stable, Marat Khalili, Konstantin Ananyev,
	Ferruh Yigit

The interpreter shifted by the raw immediate or register value, which
is undefined behavior in C when the count is >= the operand width and
trips UBSan. RFC 9669 masks shift counts (0x3f for 64-bit, 0x1f for
32-bit); mask the count in the LSH/RSH/ARSH cases.

Fixes: 94972f35a02e ("bpf: add BPF loading and execution framework")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
 lib/bpf/bpf_exec.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/lib/bpf/bpf_exec.c b/lib/bpf/bpf_exec.c
index d423ef28f5..bb03c9cc2c 100644
--- a/lib/bpf/bpf_exec.c
+++ b/lib/bpf/bpf_exec.c
@@ -4,6 +4,7 @@
 
 #include <stdio.h>
 #include <stdint.h>
+#include <limits.h>
 
 #include <eal_export.h>
 #include <rte_common.h>
@@ -43,6 +44,16 @@
 	((reg)[(ins)->dst_reg] = \
 		(type)(reg)[(ins)->dst_reg] op (type)(ins)->imm)
 
+#define BPF_OP_SHIFT_IMM(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] =		\
+		(type)(reg)[(ins)->dst_reg] op	\
+		((ins)->imm & (sizeof(type) * CHAR_BIT - 1)))
+
+#define BPF_OP_SHIFT_REG(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] =		\
+		(type)(reg)[(ins)->dst_reg] op	\
+		((reg)[(ins)->src_reg] & (sizeof(type) * CHAR_BIT - 1)))
+
 #define BPF_DIV_ZERO_CHECK(bpf, reg, ins, type) do { \
 	if ((type)(reg)[(ins)->src_reg] == 0) { \
 		RTE_BPF_LOG_LINE(ERR, \
@@ -183,10 +194,10 @@ bpf_exec(const struct rte_bpf *bpf, uint64_t reg[EBPF_REG_NUM])
 			BPF_OP_ALU_IMM(reg, ins, |, uint32_t);
 			break;
 		case (BPF_ALU | BPF_LSH | BPF_K):
-			BPF_OP_ALU_IMM(reg, ins, <<, uint32_t);
+			BPF_OP_SHIFT_IMM(reg, ins, <<, uint32_t);
 			break;
 		case (BPF_ALU | BPF_RSH | BPF_K):
-			BPF_OP_ALU_IMM(reg, ins, >>, uint32_t);
+			BPF_OP_SHIFT_IMM(reg, ins, >>, uint32_t);
 			break;
 		case (BPF_ALU | BPF_XOR | BPF_K):
 			BPF_OP_ALU_IMM(reg, ins, ^, uint32_t);
@@ -217,10 +228,10 @@ bpf_exec(const struct rte_bpf *bpf, uint64_t reg[EBPF_REG_NUM])
 			BPF_OP_ALU_REG(reg, ins, |, uint32_t);
 			break;
 		case (BPF_ALU | BPF_LSH | BPF_X):
-			BPF_OP_ALU_REG(reg, ins, <<, uint32_t);
+			BPF_OP_SHIFT_REG(reg, ins, <<, uint32_t);
 			break;
 		case (BPF_ALU | BPF_RSH | BPF_X):
-			BPF_OP_ALU_REG(reg, ins, >>, uint32_t);
+			BPF_OP_SHIFT_REG(reg, ins, >>, uint32_t);
 			break;
 		case (BPF_ALU | BPF_XOR | BPF_X):
 			BPF_OP_ALU_REG(reg, ins, ^, uint32_t);
@@ -262,13 +273,13 @@ bpf_exec(const struct rte_bpf *bpf, uint64_t reg[EBPF_REG_NUM])
 			BPF_OP_ALU_IMM(reg, ins, |, uint64_t);
 			break;
 		case (EBPF_ALU64 | BPF_LSH | BPF_K):
-			BPF_OP_ALU_IMM(reg, ins, <<, uint64_t);
+			BPF_OP_SHIFT_IMM(reg, ins, <<, uint64_t);
 			break;
 		case (EBPF_ALU64 | BPF_RSH | BPF_K):
-			BPF_OP_ALU_IMM(reg, ins, >>, uint64_t);
+			BPF_OP_SHIFT_IMM(reg, ins, >>, uint64_t);
 			break;
 		case (EBPF_ALU64 | EBPF_ARSH | BPF_K):
-			BPF_OP_ALU_IMM(reg, ins, >>, int64_t);
+			BPF_OP_SHIFT_IMM(reg, ins, >>, int64_t);
 			break;
 		case (EBPF_ALU64 | BPF_XOR | BPF_K):
 			BPF_OP_ALU_IMM(reg, ins, ^, uint64_t);
@@ -299,13 +310,13 @@ bpf_exec(const struct rte_bpf *bpf, uint64_t reg[EBPF_REG_NUM])
 			BPF_OP_ALU_REG(reg, ins, |, uint64_t);
 			break;
 		case (EBPF_ALU64 | BPF_LSH | BPF_X):
-			BPF_OP_ALU_REG(reg, ins, <<, uint64_t);
+			BPF_OP_SHIFT_REG(reg, ins, <<, uint64_t);
 			break;
 		case (EBPF_ALU64 | BPF_RSH | BPF_X):
-			BPF_OP_ALU_REG(reg, ins, >>, uint64_t);
+			BPF_OP_SHIFT_REG(reg, ins, >>, uint64_t);
 			break;
 		case (EBPF_ALU64 | EBPF_ARSH | BPF_X):
-			BPF_OP_ALU_REG(reg, ins, >>, int64_t);
+			BPF_OP_SHIFT_REG(reg, ins, >>, int64_t);
 			break;
 		case (EBPF_ALU64 | BPF_XOR | BPF_X):
 			BPF_OP_ALU_REG(reg, ins, ^, uint64_t);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v6 4/9] bpf/arm64: mask shift count per RFC 9669
  2026-06-25 17:30 ` [PATCH v6 0/9] bpf: JIT related bug fixes Stephen Hemminger
                     ` (2 preceding siblings ...)
  2026-06-25 17:30   ` [PATCH v6 3/9] bpf: mask shift count in interpreter per RFC 9669 Stephen Hemminger
@ 2026-06-25 17:30   ` Stephen Hemminger
  2026-06-25 17:30   ` [PATCH v6 5/9] test/bpf: add test for large shift Stephen Hemminger
                     ` (6 subsequent siblings)
  10 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-25 17:30 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, stable, Marat Khalili, Wathsala Vithanage,
	Konstantin Ananyev, Jerin Jacob

The ARM JIT was not masking the shift count as required by RFC 9669
(0x3f for 64-bit, 0x1f for 32-bit), so large immediate shift counts
overflowed the UBFM/SBFM encoding and failed the JIT. Mask the
immediate in emit_lsl/emit_lsr/emit_asr.

Fixes: 9f4469d9e83a ("bpf/arm: add logical operations")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
 lib/bpf/bpf_jit_arm64.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
index ba7ae4d680..7582370062 100644
--- a/lib/bpf/bpf_jit_arm64.c
+++ b/lib/bpf/bpf_jit_arm64.c
@@ -545,12 +545,14 @@ emit_bitfield(struct a64_jit_ctx *ctx, bool is64, uint8_t rd, uint8_t rn,
 	emit_insn(ctx, insn, check_reg(rd) || check_reg(rn) ||
 		  check_immr_imms(is64, immr, imms));
 }
+
 static void
 emit_lsl(struct a64_jit_ctx *ctx, bool is64, uint8_t rd, uint8_t imm)
 {
 	const unsigned int width = is64 ? 64 : 32;
 	uint8_t imms, immr;
 
+	imm &= width - 1;
 	immr = (width - imm) & (width - 1);
 	imms = width - 1 - imm;
 
@@ -560,13 +562,19 @@ emit_lsl(struct a64_jit_ctx *ctx, bool is64, uint8_t rd, uint8_t imm)
 static void
 emit_lsr(struct a64_jit_ctx *ctx, bool is64, uint8_t rd, uint8_t imm)
 {
-	emit_bitfield(ctx, is64, rd, rd, imm, is64 ? 63 : 31, A64_UBFM);
+	const unsigned int width = is64 ? 64 : 32;
+
+	imm &= width - 1;
+	emit_bitfield(ctx, is64, rd, rd, imm, width - 1, A64_UBFM);
 }
 
 static void
 emit_asr(struct a64_jit_ctx *ctx, bool is64, uint8_t rd, uint8_t imm)
 {
-	emit_bitfield(ctx, is64, rd, rd, imm, is64 ? 63 : 31, A64_SBFM);
+	const unsigned int width = is64 ? 64 : 32;
+
+	imm &= width - 1;
+	emit_bitfield(ctx, is64, rd, rd, imm, width - 1, A64_SBFM);
 }
 
 #define A64_AND 0
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v6 5/9] test/bpf: add test for large shift
  2026-06-25 17:30 ` [PATCH v6 0/9] bpf: JIT related bug fixes Stephen Hemminger
                     ` (3 preceding siblings ...)
  2026-06-25 17:30   ` [PATCH v6 4/9] bpf/arm64: mask shift count " Stephen Hemminger
@ 2026-06-25 17:30   ` Stephen Hemminger
  2026-06-25 17:30   ` [PATCH v6 6/9] bpf/arm64: fix offset type to allow a negative jump Stephen Hemminger
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-25 17:30 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Marat Khalili, Konstantin Ananyev

There were multiple bugs with immediate values in shift instructions.
The code was not masking as required by RFC.

Add new tests that cover these instructions.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
 app/test/test_bpf.c | 59 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 232e9e2a98..0e5894a532 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -2005,6 +2005,51 @@ test_div1_check(uint64_t rc, const void *arg)
 	return cmp_res(__func__, 0, rc, dve.out, dvt->out, sizeof(dve.out));
 }
 
+/*
+ * Shift counts are masked to the operand width (RFC 9669: 0x3f for 64-bit,
+ * 0x1f for 32-bit). Counts >= 128 also exercise the x86 imm_size() path that
+ * used to desync the stream, and the arm64 UBFM/SBFM immediate encoding.
+ */
+static const struct ebpf_insn test_shift_big_imm_prog[] = {
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 1
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_LSH | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 191
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_ARSH | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 200
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_RSH | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 130
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT)
+	},
+};
+
+static void
+test_shift_big_imm_prepare(void *arg)
+{
+	memset(arg, 0, sizeof(struct dummy_offset));
+}
+
+static int
+test_shift_big_imm_check(uint64_t rc, const void *arg)
+{
+	uint64_t expect = 0x3FE0000000000000ULL;
+
+	return cmp_res(__func__, expect, rc, arg, arg, 0);
+}
+
 /* call test-cases */
 static const struct ebpf_insn test_call1_prog[] = {
 
@@ -3409,6 +3454,20 @@ static const struct bpf_test tests[] = {
 		.prepare = test_mul1_prepare,
 		.check_result = test_div1_check,
 	},
+	{
+		.name = "test_shift_big_imm",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_shift_big_imm_prog,
+			.nb_ins = RTE_DIM(test_shift_big_imm_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_offset),
+			},
+		},
+		.prepare = test_shift_big_imm_prepare,
+		.check_result = test_shift_big_imm_check,
+	},
 	{
 		.name = "test_call1",
 		.arg_sz = sizeof(struct dummy_offset),
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v6 6/9] bpf/arm64: fix offset type to allow a negative jump
  2026-06-25 17:30 ` [PATCH v6 0/9] bpf: JIT related bug fixes Stephen Hemminger
                     ` (4 preceding siblings ...)
  2026-06-25 17:30   ` [PATCH v6 5/9] test/bpf: add test for large shift Stephen Hemminger
@ 2026-06-25 17:30   ` Stephen Hemminger
  2026-06-25 17:30   ` [PATCH v6 7/9] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
                     ` (4 subsequent siblings)
  10 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-25 17:30 UTC (permalink / raw)
  To: dev
  Cc: Christophe Fontaine, stable, Stephen Hemminger, Marat Khalili,
	Wathsala Vithanage, Konstantin Ananyev, Jerin Jacob

From: Christophe Fontaine <cfontain@redhat.com>

The DPDK BPF JIT standalone test test_ld_mbuf1 fails on arm64.
It does:
	r6 = r1                    // mbuf
	r0 = *(u8 *)pkt[0]         // BPF_ABS
	if ((r0 & 0xf0) == 0x40)
		goto parse
	r0 = 0
	exit                       // epilogue E0
parse:
	r0 = *(u8 *)pkt[r0 + 3]    // BPF_IND
	...
	exit

emit_return_zero_if_src_zero() returns 0 by branching to a function
epilogue. The target may be a previous epilogue so branch
might be backwards; therefore the offset needs to be negative.

The offset was stored in a uint16_t, so a negative value wrapped to a
large positive number; emit_b() then branched past the end of the
program and faulted at run time.

Fixes: 111e2a747a4f ("bpf/arm: add basic arithmetic operations")
Cc: stable@dpdk.org

Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
 lib/bpf/bpf_jit_arm64.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
index 7582370062..51906c7f0d 100644
--- a/lib/bpf/bpf_jit_arm64.c
+++ b/lib/bpf/bpf_jit_arm64.c
@@ -965,10 +965,12 @@ static void
 emit_return_zero_if_src_zero(struct a64_jit_ctx *ctx, bool is64, uint8_t src)
 {
 	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
-	uint16_t jump_to_epilogue;
+	int32_t jump_to_epilogue;
 
 	emit_cbnz(ctx, is64, src, 3);
 	emit_mov_imm(ctx, is64, r0, 0);
+
+	/* maybe backwards branch to earlier epilogue */
 	jump_to_epilogue = (ctx->program_start + ctx->program_sz) - ctx->idx;
 	emit_b(ctx, jump_to_epilogue);
 }
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v6 7/9] bpf/arm64: add BPF_ABS/BPF_IND packet load support
  2026-06-25 17:30 ` [PATCH v6 0/9] bpf: JIT related bug fixes Stephen Hemminger
                     ` (5 preceding siblings ...)
  2026-06-25 17:30   ` [PATCH v6 6/9] bpf/arm64: fix offset type to allow a negative jump Stephen Hemminger
@ 2026-06-25 17:30   ` Stephen Hemminger
  2026-06-26 10:34     ` Marat Khalili
  2026-06-25 17:30   ` [PATCH v6 8/9] test/bpf: check that JIT was generated Stephen Hemminger
                     ` (3 subsequent siblings)
  10 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-25 17:30 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Wathsala Vithanage, Konstantin Ananyev,
	Marat Khalili

The arm64 JIT rejected BPF_LD | BPF_ABS and BPF_LD | BPF_IND with
"invalid opcode", so cBPF programs converted by rte_bpf_convert() could
not be JITed. Add these opcodes, mirroring the x86 JIT: a fast path for
data held in the first mbuf segment, and a __rte_pktmbuf_read() slow
path for everything else.

The forward branches over the call cannot use fixed distances:
emit_call() materializes the helper address with a variable number of
mov/movk instructions, so the block sizes are not known up front. Size
the three blocks (fast path, slow path, common tail) in a dry run, then
emit for real with the branches resolved from the measured offsets.

The effective offset is validated before use: src is a runtime value for
BPF_IND, so a negative offset is routed to the slow path rather than
read from the first segment, and the offset is bounded to UINT32_MAX
before __rte_pktmbuf_read(), whose off argument is uint32_t.

Programs using these opcodes use the call register layout, since the
slow path makes a function call.

For example, BPF_LD | BPF_IND | BPF_W (4-byte indirect load, mbuf in
R6/x19, effective offset kept in x9) emits:

	mov	x9, #imm		// off  = imm
	add	x9, x9, src		// off += src		(BPF_IND)
	cmp	x9, xzr			// reject negative
	b.mi	slow			//   effective offset
	mov	x10, #data_len_ofs
	ldrh	w10, [x19, x10]		// mbuf->data_len
	sub	x10, x10, x9		// data_len - off
	mov	x11, #sz
	cmp	x10, x11
	b.lt	slow			// not in first segment
	mov	x10, #data_off_ofs
	ldrh	w10, [x19, x10]		// mbuf->data_off
	mov	x7, #buf_addr_ofs
	ldr	x7, [x19, x7]		// mbuf->buf_addr
	add	x7, x7, x10
	add	x7, x7, x9		// ptr = buf_addr + data_off + off
	b	load
slow:
	mov	x10, #UINT32_MAX
	cmp	x9, x10
	b.ls	1f			// off fits uint32_t ...
	mov	x7, #0			//   else return 0
	b	epilogue
1:	mov	x1, x9			// __rte_pktmbuf_read(mbuf, off, sz, buf)
	mov	x0, x19
	mov	w2, #sz
	sub	x3, x25, #stack_ofs
	mov	x9, #<helper lo>
	movk	x9, #<helper hi>
	blr	x9
	mov	x7, x0			// ptr = return value
	cbnz	x7, load		// non-NULL -> common tail
	mov	x7, #0			//   else return 0
	b	epilogue
load:
	ldr	w7, [x7, xzr]		// *(uint32_t *)ptr	(size varies)
	rev32	x7, x7			// ntoh	(size varies; omitted for BPF_B)

For BPF_ABS the "add x9, x9, src" is omitted; the final load/byte-swap
vary with the access size.

Bugzilla ID: 1427

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_jit_arm64.c | 169 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 168 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
index 51906c7f0d..6d531dc83d 100644
--- a/lib/bpf/bpf_jit_arm64.c
+++ b/lib/bpf/bpf_jit_arm64.c
@@ -1133,6 +1133,155 @@ emit_branch(struct a64_jit_ctx *ctx, uint8_t op, uint32_t i, int16_t off)
 	emit_b_cond(ctx, ebpf_to_a64_cond(op), jump_offset_get(ctx, i, off));
 }
 
+/* LD_ABS/LD_IND code block offsets (in arm64 instructions) */
+enum {
+	LDMB_FAST_OFS, /* fast path */
+	LDMB_SLOW_OFS, /* slow path */
+	LDMB_FIN_OFS,  /* common tail */
+	LDMB_OFS_NUM
+};
+
+/*
+ * Helper for emit_ld_mbuf(): fast path.
+ * Compute the packet offset; if it lies inside the first segment leave the
+ * data pointer in R0, otherwise branch to the slow path.
+ */
+static void
+emit_ldmb_fast_path(struct a64_jit_ctx *ctx, uint8_t src, uint8_t mode,
+		    uint32_t sz, int32_t imm, const uint32_t ofs[LDMB_OFS_NUM])
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
+	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
+	uint8_t tmp2 = ebpf_to_a64_reg(ctx, TMP_REG_2);
+	uint8_t tmp3 = ebpf_to_a64_reg(ctx, TMP_REG_3);
+
+	/* off = imm (+ src for BPF_IND) */
+	emit_mov_imm(ctx, 1, tmp1, imm);
+	if (mode == BPF_IND)
+		emit_add(ctx, 1, tmp1, src);
+
+	/*
+	 * A negative effective offset (src can be < 0 for BPF_IND) would pass
+	 * the signed check below and read before the segment, so route it to
+	 * the slow path, which rejects it via the uint32_t bound on off.
+	 */
+	emit_cmp(ctx, 1, tmp1, A64_ZR);
+	emit_b_cond(ctx, A64_MI, (int32_t)(ofs[LDMB_SLOW_OFS] - ctx->idx));
+
+	/* if ((int64_t)(mbuf->data_len - off) < sz) goto slow_path */
+	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_len));
+	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
+	emit_sub(ctx, 1, tmp2, tmp1);
+	emit_mov_imm(ctx, 1, tmp3, sz);
+	emit_cmp(ctx, 1, tmp2, tmp3);
+	emit_b_cond(ctx, A64_LT, (int32_t)(ofs[LDMB_SLOW_OFS] - ctx->idx));
+
+	/* R0 = mbuf->buf_addr + mbuf->data_off + off */
+	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_off));
+	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
+	emit_mov_imm(ctx, 1, r0, offsetof(struct rte_mbuf, buf_addr));
+	emit_ldr(ctx, EBPF_DW, r0, r6, r0);
+	emit_add(ctx, 1, r0, tmp2);
+	emit_add(ctx, 1, r0, tmp1);
+
+	emit_b(ctx, (int32_t)(ofs[LDMB_FIN_OFS] - ctx->idx));
+}
+
+/*
+ * Helper for emit_ld_mbuf(): slow path.
+ * R0 = __rte_pktmbuf_read(mbuf, off, sz, buf); return 0 if NULL.
+ * The scratch buffer is the space reserved by __rte_bpf_validate() at the
+ * bottom of the eBPF stack frame, i.e. (frame_pointer - stack_ofs).
+ */
+static void
+emit_ldmb_slow_path(struct a64_jit_ctx *ctx, uint32_t sz, uint32_t stack_ofs)
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
+	uint8_t fp = ebpf_to_a64_reg(ctx, EBPF_FP);
+	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
+	uint8_t tmp2 = ebpf_to_a64_reg(ctx, TMP_REG_2);
+
+	/*
+	 * __rte_pktmbuf_read() takes a uint32_t off, so a 64-bit off that does
+	 * not fit would be silently truncated.  Return 0 if it is out of range;
+	 * this also catches the negative off routed here by the fast path.
+	 */
+	emit_mov_imm(ctx, 1, tmp2, UINT32_MAX);
+	emit_cmp(ctx, 1, tmp1, tmp2);
+	emit_b_cond(ctx, A64_LS, 3);		/* off <= UINT32_MAX: do the call */
+	emit_mov_imm(ctx, 1, r0, 0);
+	emit_b(ctx, (ctx->program_start + ctx->program_sz) - ctx->idx);
+
+	/* arguments of __rte_pktmbuf_read(mbuf, off, len, buf) */
+	emit_mov_64(ctx, A64_R(1), tmp1);		/* off (held in tmp1) */
+	emit_mov_64(ctx, A64_R(0), r6);			/* mbuf */
+	emit_mov_imm(ctx, 0, A64_R(2), sz);		/* len */
+	emit_sub_imm_64(ctx, A64_R(3), fp, stack_ofs);	/* buf */
+
+	emit_call(ctx, tmp1, (void *)(uintptr_t)__rte_pktmbuf_read);
+	emit_return_zero_if_src_zero(ctx, 1, r0);
+}
+
+/*
+ * Helper for emit_ld_mbuf(): common tail.
+ * Load the value pointed to by R0 and convert from network byte order.
+ */
+static void
+emit_ldmb_fin(struct a64_jit_ctx *ctx, uint8_t opsz, uint32_t sz)
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+
+	emit_ldr(ctx, opsz, r0, r0, A64_ZR);
+	if (opsz != BPF_B)
+		emit_be(ctx, r0, sz * 8);
+}
+
+/*
+ * Emit code for BPF_LD | BPF_ABS and BPF_LD | BPF_IND packet loads:
+ *
+ *	off = imm (+ src for BPF_IND)
+ *	if (off >= 0 && mbuf->data_len - off >= sz)	    -- fast path
+ *		ptr = mbuf->buf_addr + mbuf->data_off + off;
+ *	else						    -- slow path
+ *		if ((uint64_t)off > UINT32_MAX)
+ *			return 0;
+ *		ptr = __rte_pktmbuf_read(mbuf, off, sz, buf);
+ *		if (ptr == NULL)
+ *			return 0;
+ *	R0 = ntoh(*(size *)ptr);			    -- common tail
+ *
+ * The three blocks are sized in a dry run so the forward branches can be
+ * resolved, then emitted for real (arm64 instructions are fixed width, so
+ * the dry run reproduces the real instruction count exactly).
+ */
+static void
+emit_ld_mbuf(struct a64_jit_ctx *ctx, uint8_t op, uint8_t src, int32_t imm,
+	     uint32_t stack_ofs)
+{
+	uint8_t mode = BPF_MODE(op);
+	uint8_t opsz = BPF_SIZE(op);
+	uint32_t sz = bpf_size(opsz);
+	uint32_t ofs[LDMB_OFS_NUM];
+
+	/* seed offsets so the dry-run branches stay in range */
+	ofs[LDMB_FAST_OFS] = ofs[LDMB_SLOW_OFS] = ofs[LDMB_FIN_OFS] = ctx->idx;
+
+	/* dry run to record block offsets */
+	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
+	ofs[LDMB_SLOW_OFS] = ctx->idx;
+	emit_ldmb_slow_path(ctx, sz, stack_ofs);
+	ofs[LDMB_FIN_OFS] = ctx->idx;
+	emit_ldmb_fin(ctx, opsz, sz);
+
+	/* rewind and emit for real with resolved offsets */
+	ctx->idx = ofs[LDMB_FAST_OFS];
+	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
+	emit_ldmb_slow_path(ctx, sz, stack_ofs);
+	emit_ldmb_fin(ctx, opsz, sz);
+}
+
 static void
 check_program_has_call(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 {
@@ -1145,8 +1294,17 @@ check_program_has_call(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 		op = ins->code;
 
 		switch (op) {
-		/* Call imm */
+		/*
+		 * BPF_ABS/BPF_IND can fall through to __rte_pktmbuf_read(),
+		 * so they need the call-clobbered register layout as well.
+		 */
 		case (BPF_JMP | EBPF_CALL):
+		case (BPF_LD | BPF_ABS | BPF_B):
+		case (BPF_LD | BPF_ABS | BPF_H):
+		case (BPF_LD | BPF_ABS | BPF_W):
+		case (BPF_LD | BPF_IND | BPF_B):
+		case (BPF_LD | BPF_IND | BPF_H):
+		case (BPF_LD | BPF_IND | BPF_W):
 			ctx->foundcall = 1;
 			return;
 		}
@@ -1348,6 +1506,15 @@ emit(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 			emit_mov_imm(ctx, 1, dst, u64);
 			i++;
 			break;
+		/* R0 = ntoh(*(size *)(mbuf data + (src) + imm)) */
+		case (BPF_LD | BPF_ABS | BPF_B):
+		case (BPF_LD | BPF_ABS | BPF_H):
+		case (BPF_LD | BPF_ABS | BPF_W):
+		case (BPF_LD | BPF_IND | BPF_B):
+		case (BPF_LD | BPF_IND | BPF_H):
+		case (BPF_LD | BPF_IND | BPF_W):
+			emit_ld_mbuf(ctx, op, src, imm, bpf->stack_sz);
+			break;
 		/* *(size *)(dst + off) = src */
 		case (BPF_STX | BPF_MEM | BPF_B):
 		case (BPF_STX | BPF_MEM | BPF_H):
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v6 8/9] test/bpf: check that JIT was generated
  2026-06-25 17:30 ` [PATCH v6 0/9] bpf: JIT related bug fixes Stephen Hemminger
                     ` (6 preceding siblings ...)
  2026-06-25 17:30   ` [PATCH v6 7/9] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
@ 2026-06-25 17:30   ` Stephen Hemminger
  2026-06-25 17:30   ` [PATCH v6 9/9] test/bpf: check that bpf_convert can be JIT'd Stephen Hemminger
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-25 17:30 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Marat Khalili, Konstantin Ananyev

Avoid silently ignoring JIT failures. The test cases should
all succeed JIT compilation; if not it is a bug in the JIT
implementation and should be reported.

Introduce a configuration setting RTE_BPF_JIT_SUPPORTED
which is cleaner than using an ARCH specific #ifdef.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
 app/test/test_bpf.c | 8 ++++++++
 lib/bpf/meson.build | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 0e5894a532..16a1004e51 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -3649,6 +3649,14 @@ run_test(const struct bpf_test *tst)
 				rv, strerror(rv));
 		}
 	}
+#ifdef RTE_BPF_JIT_SUPPORTED
+	else {
+		/* a JIT backend exists for this arch, so it must compile */
+		printf("%s@%d: %s: no JIT code generated;\n",
+			__func__, __LINE__, tst->name);
+		ret = -1;
+	}
+#endif
 
 	rte_bpf_destroy(bpf);
 	return ret;
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 7e8a300e3f..04ede96689 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -27,8 +27,10 @@ sources = files(
 )
 
 if arch_subdir == 'x86' and dpdk_conf.get('RTE_ARCH_64')
+    dpdk_conf.set('RTE_BPF_JIT_SUPPORTED', 1)
     sources += files('bpf_jit_x86.c')
 elif dpdk_conf.has('RTE_ARCH_ARM64')
+    dpdk_conf.set('RTE_BPF_JIT_SUPPORTED', 1)
     sources += files('bpf_jit_arm64.c')
 endif
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v6 9/9] test/bpf: check that bpf_convert can be JIT'd
  2026-06-25 17:30 ` [PATCH v6 0/9] bpf: JIT related bug fixes Stephen Hemminger
                     ` (7 preceding siblings ...)
  2026-06-25 17:30   ` [PATCH v6 8/9] test/bpf: check that JIT was generated Stephen Hemminger
@ 2026-06-25 17:30   ` Stephen Hemminger
  2026-06-25 23:12     ` Stephen Hemminger
  2026-06-26 10:35   ` [PATCH v6 0/9] bpf: JIT related bug fixes Marat Khalili
  2026-06-30 12:45   ` Konstantin Ananyev
  10 siblings, 1 reply; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-25 17:30 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Marat Khalili, Konstantin Ananyev

Run each converted filter through both the interpreter and the JIT and
check they agree, catching JIT miscompiles.

test_bpf_filter and test_bpf_match did nearly the same thing: compile,
load and run a filter against the dummy packet. Combine them into
test_bpf_match, which now builds the packet itself and returns whether
the filter matched. Callers run it for both load methods.

The dummy packet is a UDP packet to a fixed destination MAC, source
and destination ports, so the filter results are deterministic. None
of the sample filters should match it, so assert that; a convert or
JIT bug that flips a result is then caught. The destination MAC and
source port are chosen so the negative ethernet and port filters do
not match, and "port not 53 and not arp" is dropped as it matches
any non-ARP packet that lacks port 53.

Reduce log output to make it easier to match which expression might be
causing issues.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
 app/test/test_bpf.c | 171 ++++++++++++++++++++++++++------------------
 1 file changed, 100 insertions(+), 71 deletions(-)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 16a1004e51..f56f6c7d7e 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -32,6 +32,7 @@ test_bpf(void)
 #include <rte_bpf.h>
 #include <rte_ether.h>
 #include <rte_ip.h>
+#include <rte_udp.h>
 
 
 /* Tests of most simple BPF programs (no instructions, one instruction etc.) */
@@ -4756,11 +4757,13 @@ load_cbpf_program_convert(struct bpf_program *cbpf_program, const char *str)
 		return NULL;
 	}
 
+#ifdef DEBUG
 	printf("bpf convert(\"%s\") produced:\n", str);
 	rte_bpf_dump(stdout, prm->ins, prm->nb_ins);
 
 	printf("%s \"%s\"\n", __func__, str);
 	test_bpf_dump(cbpf_program, prm);
+#endif
 
 	bpf = rte_bpf_load(prm);
 	rte_free(prm);
@@ -4785,18 +4788,65 @@ load_cbpf_program_direct(struct bpf_program *cbpf_program, const char *str __rte
 	});
 }
 
+static const load_cbpf_program_t cbpf_program_loaders[] = {
+	load_cbpf_program_convert,
+	load_cbpf_program_direct,
+};
+
+/* Setup Ethernet/IP/UDP headers in a dummy packet buffer for filter tests */
+static void
+dummy_ip_prep(void *data, uint16_t plen)
+{
+	struct {
+		struct rte_ether_hdr eth_hdr;
+		struct rte_ipv4_hdr ip_hdr;
+		struct rte_udp_hdr udp_hdr;
+	} *hdr = data;
+
+	hdr->eth_hdr = (struct rte_ether_hdr) {
+		.dst_addr.addr_bytes = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x0e },
+		.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
+	};
+	hdr->ip_hdr = (struct rte_ipv4_hdr) {
+		.version_ihl = RTE_IPV4_VHL_DEF,
+		.total_length = rte_cpu_to_be_16(plen),
+		.time_to_live = IPDEFTTL,
+		.next_proto_id = IPPROTO_UDP,
+		.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+		.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+	};
+	hdr->udp_hdr = (struct rte_udp_hdr) {
+		.src_port = rte_cpu_to_be_16(49152),	/* fixed, avoids filter ports */
+		.dst_port = rte_cpu_to_be_16(9),	/* discard port */
+		.dgram_len = rte_cpu_to_be_16(plen - sizeof(struct rte_ipv4_hdr)),
+		.dgram_cksum = 0,
+	};
+}
+
+/*
+ * Compile a pcap filter, load it with the given loader, then run it against
+ * a standard dummy packet with both the interpreter and (when available) the
+ * JIT, checking the two agree.
+ *
+ * Returns 1 if the filter matched, 0 if it did not, and -1 on any error
+ * (compile, load, or interpreter/JIT mismatch).
+ */
 static int
-test_bpf_match(pcap_t *pcap, const char *str, struct rte_mbuf *mb,
+test_bpf_match(pcap_t *pcap, const char *str,
 	load_cbpf_program_t load_cbpf_program)
 {
+	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
+	const uint32_t plen = 100;
 	struct bpf_program fcode;
-	struct rte_bpf *bpf;
+	struct rte_mbuf mb = { 0 };
+	struct rte_bpf *bpf = NULL;
 	int ret = -1;
 	uint64_t rc;
 
+	printf("%s '%s'\n", __func__, str);
 	if (pcap_compile(pcap, &fcode, str, 1, PCAP_NETMASK_UNKNOWN)) {
 		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
-		       __func__, __LINE__,  str, pcap_geterr(pcap));
+		       __func__, __LINE__, str, pcap_geterr(pcap));
 		return -1;
 	}
 
@@ -4804,15 +4854,41 @@ test_bpf_match(pcap_t *pcap, const char *str, struct rte_mbuf *mb,
 	if (bpf == NULL) {
 		printf("%s@%d: failed to load cbpf program for \"%s\", error=%d(%s);\n",
 			__func__, __LINE__, str, rte_errno, strerror(rte_errno));
+		test_bpf_dump(&fcode, NULL);
 		goto error;
 	}
 
-	rc = rte_bpf_exec(bpf, mb);
-	/* The return code from bpf capture filter is non-zero if matched */
-	ret = (rc == 0);
+	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
+	dummy_ip_prep(rte_pktmbuf_mtod(&mb, void *), plen);
+
+	rc = rte_bpf_exec(bpf, &mb);
+
+	/* Verify the JIT, when available, produces the same result. */
+	{
+		struct rte_bpf_jit jit;
+
+		rte_bpf_get_jit(bpf, &jit);
+		if (jit.func != NULL) {
+			fflush(stdout);
+			if (jit.func(&mb) != rc) {
+				printf("%s@%d: JIT return code does not match\n",
+				       __func__, __LINE__);
+				goto error;
+			}
+		}
+#ifdef RTE_BPF_JIT_SUPPORTED
+		else {
+			printf("%s@%d: no JIT code generated\n",
+			       __func__, __LINE__);
+			goto error;
+		}
+#endif
+	}
+
+	/* The return code from a bpf capture filter is non-zero if matched. */
+	ret = (rc != 0);
 error:
-	if (bpf)
-		rte_bpf_destroy(bpf);
+	rte_bpf_destroy(bpf);
 	pcap_freecode(&fcode);
 	return ret;
 }
@@ -4821,44 +4897,13 @@ test_bpf_match(pcap_t *pcap, const char *str, struct rte_mbuf *mb,
 static int
 test_bpf_filter_sanity(pcap_t *pcap)
 {
-	static const load_cbpf_program_t cbpf_program_loaders[] = {
-		load_cbpf_program_convert,
-		load_cbpf_program_direct,
-	};
-
-	const uint32_t plen = 100;
-	struct rte_mbuf mb, *m;
-	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
-	struct {
-		struct rte_ether_hdr eth_hdr;
-		struct rte_ipv4_hdr ip_hdr;
-	} *hdr;
-
-	memset(&mb, 0, sizeof(mb));
-	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
-	m = &mb;
-
-	hdr = rte_pktmbuf_mtod(m, typeof(hdr));
-	hdr->eth_hdr = (struct rte_ether_hdr) {
-		.dst_addr.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
-		.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
-	};
-	hdr->ip_hdr = (struct rte_ipv4_hdr) {
-		.version_ihl = RTE_IPV4_VHL_DEF,
-		.total_length = rte_cpu_to_be_16(plen),
-		.time_to_live = IPDEFTTL,
-		.next_proto_id = IPPROTO_RAW,
-		.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
-		.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
-	};
-
-	for (int li = 0; li != RTE_DIM(cbpf_program_loaders); ++li) {
-		if (test_bpf_match(pcap, "ip", m, cbpf_program_loaders[li]) != 0) {
+	for (unsigned int li = 0; li != RTE_DIM(cbpf_program_loaders); ++li) {
+		if (test_bpf_match(pcap, "ip", cbpf_program_loaders[li]) != 1) {
 			printf("%s@%d: filter \"ip\" doesn't match test data\n",
 			       __func__, __LINE__);
 			return -1;
 		}
-		if (test_bpf_match(pcap, "not ip", m, cbpf_program_loaders[li]) == 0) {
+		if (test_bpf_match(pcap, "not ip", cbpf_program_loaders[li]) != 0) {
 			printf("%s@%d: filter \"not ip\" does match test data\n",
 			       __func__, __LINE__);
 			return -1;
@@ -4882,7 +4927,6 @@ static const char * const sample_filters[] = {
 	"port 53",
 	"host 192.0.2.1 and not (port 80 or port 25)",
 	"host 2001:4b98:db0::8 and not port 80 and not port 25",
-	"port not 53 and not arp",
 	"(tcp[0:2] > 1500 and tcp[0:2] < 1550) or (tcp[2:2] > 1500 and tcp[2:2] < 1550)",
 	"ether proto 0x888e",
 	"ether[0] & 1 = 0 and ip[16] >= 224",
@@ -4909,35 +4953,10 @@ static const char * const sample_filters[] = {
 	"or host 192.0.2.1 or host 192.0.2.100 or host 192.0.2.200"),
 };
 
-static int
-test_bpf_filter(pcap_t *pcap, const char *s, load_cbpf_program_t load_cbpf_program)
-{
-	struct bpf_program fcode;
-	struct rte_bpf *bpf;
-
-	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
-		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
-		       __func__, __LINE__, s, pcap_geterr(pcap));
-		return -1;
-	}
-
-	bpf = load_cbpf_program(&fcode, s);
-	if (bpf == NULL) {
-		printf("%s@%d: failed to load cbpf program for \"%s\", error=%d(%s);\n",
-			__func__, __LINE__, s, rte_errno, strerror(rte_errno));
-		test_bpf_dump(&fcode, NULL);
-	}
-
-	rte_bpf_destroy(bpf);
-
-	pcap_freecode(&fcode);
-	return (bpf == NULL) ? -1 : 0;
-}
-
 static int
 test_bpf_convert(void)
 {
-	unsigned int i;
+	unsigned int i, li;
 	pcap_t *pcap;
 	int rc;
 
@@ -4949,8 +4968,18 @@ test_bpf_convert(void)
 
 	rc = test_bpf_filter_sanity(pcap);
 	for (i = 0; i < RTE_DIM(sample_filters); i++) {
-		rc |= test_bpf_filter(pcap, sample_filters[i], load_cbpf_program_convert);
-		rc |= test_bpf_filter(pcap, sample_filters[i], load_cbpf_program_direct);
+		for (li = 0; li < RTE_DIM(cbpf_program_loaders); li++) {
+			int m = test_bpf_match(pcap, sample_filters[i],
+					       cbpf_program_loaders[li]);
+
+			/* None of the sample filters match the dummy packet. */
+			if (m != 0) {
+				if (m > 0)
+					printf("%s@%d: filter \"%s\" unexpectedly matched\n",
+					       __func__, __LINE__, sample_filters[i]);
+				rc = -1;
+			}
+		}
 	}
 
 	pcap_close(pcap);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [PATCH v6 9/9] test/bpf: check that bpf_convert can be JIT'd
  2026-06-25 17:30   ` [PATCH v6 9/9] test/bpf: check that bpf_convert can be JIT'd Stephen Hemminger
@ 2026-06-25 23:12     ` Stephen Hemminger
  0 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-25 23:12 UTC (permalink / raw)
  To: dev; +Cc: Marat Khalili, Konstantin Ananyev


Recheck-request: github-robot: build

^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v6 7/9] bpf/arm64: add BPF_ABS/BPF_IND packet load support
  2026-06-25 17:30   ` [PATCH v6 7/9] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
@ 2026-06-26 10:34     ` Marat Khalili
  0 siblings, 0 replies; 73+ messages in thread
From: Marat Khalili @ 2026-06-26 10:34 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org; +Cc: Wathsala Vithanage, Konstantin Ananyev



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday 25 June 2026 18:30
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Wathsala Vithanage <wathsala.vithanage@arm.com>;
> Konstantin Ananyev <konstantin.ananyev@huawei.com>; Marat Khalili <marat.khalili@huawei.com>
> Subject: [PATCH v6 7/9] bpf/arm64: add BPF_ABS/BPF_IND packet load support
> 
> The arm64 JIT rejected BPF_LD | BPF_ABS and BPF_LD | BPF_IND with
> "invalid opcode", so cBPF programs converted by rte_bpf_convert() could
> not be JITed. Add these opcodes, mirroring the x86 JIT: a fast path for
> data held in the first mbuf segment, and a __rte_pktmbuf_read() slow
> path for everything else.
> 
> The forward branches over the call cannot use fixed distances:
> emit_call() materializes the helper address with a variable number of
> mov/movk instructions, so the block sizes are not known up front. Size
> the three blocks (fast path, slow path, common tail) in a dry run, then
> emit for real with the branches resolved from the measured offsets.
> 
> The effective offset is validated before use: src is a runtime value for
> BPF_IND, so a negative offset is routed to the slow path rather than
> read from the first segment, and the offset is bounded to UINT32_MAX
> before __rte_pktmbuf_read(), whose off argument is uint32_t.
> 
> Programs using these opcodes use the call register layout, since the
> slow path makes a function call.
> 
> For example, BPF_LD | BPF_IND | BPF_W (4-byte indirect load, mbuf in
> R6/x19, effective offset kept in x9) emits:
> 
> 	mov	x9, #imm		// off  = imm
> 	add	x9, x9, src		// off += src		(BPF_IND)
> 	cmp	x9, xzr			// reject negative
> 	b.mi	slow			//   effective offset
> 	mov	x10, #data_len_ofs
> 	ldrh	w10, [x19, x10]		// mbuf->data_len
> 	sub	x10, x10, x9		// data_len - off
> 	mov	x11, #sz
> 	cmp	x10, x11
> 	b.lt	slow			// not in first segment
> 	mov	x10, #data_off_ofs
> 	ldrh	w10, [x19, x10]		// mbuf->data_off
> 	mov	x7, #buf_addr_ofs
> 	ldr	x7, [x19, x7]		// mbuf->buf_addr
> 	add	x7, x7, x10
> 	add	x7, x7, x9		// ptr = buf_addr + data_off + off
> 	b	load
> slow:
> 	mov	x10, #UINT32_MAX
> 	cmp	x9, x10
> 	b.ls	1f			// off fits uint32_t ...
> 	mov	x7, #0			//   else return 0
> 	b	epilogue
> 1:	mov	x1, x9			// __rte_pktmbuf_read(mbuf, off, sz, buf)
> 	mov	x0, x19
> 	mov	w2, #sz
> 	sub	x3, x25, #stack_ofs
> 	mov	x9, #<helper lo>
> 	movk	x9, #<helper hi>
> 	blr	x9
> 	mov	x7, x0			// ptr = return value
> 	cbnz	x7, load		// non-NULL -> common tail
> 	mov	x7, #0			//   else return 0
> 	b	epilogue
> load:
> 	ldr	w7, [x7, xzr]		// *(uint32_t *)ptr	(size varies)
> 	rev32	x7, x7			// ntoh	(size varies; omitted for BPF_B)
> 
> For BPF_ABS the "add x9, x9, src" is omitted; the final load/byte-swap
> vary with the access size.
> 
> Bugzilla ID: 1427
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>


Acked-by: Marat Khalili <marat.khalili@huawei.com>


> ---
>  lib/bpf/bpf_jit_arm64.c | 169 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 168 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
> index 51906c7f0d..6d531dc83d 100644
> --- a/lib/bpf/bpf_jit_arm64.c
> +++ b/lib/bpf/bpf_jit_arm64.c
> @@ -1133,6 +1133,155 @@ emit_branch(struct a64_jit_ctx *ctx, uint8_t op, uint32_t i, int16_t off)
>  	emit_b_cond(ctx, ebpf_to_a64_cond(op), jump_offset_get(ctx, i, off));
>  }
> 
> +/* LD_ABS/LD_IND code block offsets (in arm64 instructions) */
> +enum {
> +	LDMB_FAST_OFS, /* fast path */
> +	LDMB_SLOW_OFS, /* slow path */
> +	LDMB_FIN_OFS,  /* common tail */
> +	LDMB_OFS_NUM
> +};
> +
> +/*
> + * Helper for emit_ld_mbuf(): fast path.
> + * Compute the packet offset; if it lies inside the first segment leave the
> + * data pointer in R0, otherwise branch to the slow path.
> + */
> +static void
> +emit_ldmb_fast_path(struct a64_jit_ctx *ctx, uint8_t src, uint8_t mode,
> +		    uint32_t sz, int32_t imm, const uint32_t ofs[LDMB_OFS_NUM])
> +{
> +	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
> +	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
> +	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
> +	uint8_t tmp2 = ebpf_to_a64_reg(ctx, TMP_REG_2);
> +	uint8_t tmp3 = ebpf_to_a64_reg(ctx, TMP_REG_3);
> +
> +	/* off = imm (+ src for BPF_IND) */
> +	emit_mov_imm(ctx, 1, tmp1, imm);
> +	if (mode == BPF_IND)
> +		emit_add(ctx, 1, tmp1, src);
> +
> +	/*
> +	 * A negative effective offset (src can be < 0 for BPF_IND) would pass
> +	 * the signed check below and read before the segment, so route it to
> +	 * the slow path, which rejects it via the uint32_t bound on off.
> +	 */
> +	emit_cmp(ctx, 1, tmp1, A64_ZR);
> +	emit_b_cond(ctx, A64_MI, (int32_t)(ofs[LDMB_SLOW_OFS] - ctx->idx));
> +
> +	/* if ((int64_t)(mbuf->data_len - off) < sz) goto slow_path */
> +	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_len));
> +	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
> +	emit_sub(ctx, 1, tmp2, tmp1);
> +	emit_mov_imm(ctx, 1, tmp3, sz);
> +	emit_cmp(ctx, 1, tmp2, tmp3);
> +	emit_b_cond(ctx, A64_LT, (int32_t)(ofs[LDMB_SLOW_OFS] - ctx->idx));
> +
> +	/* R0 = mbuf->buf_addr + mbuf->data_off + off */
> +	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_off));
> +	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
> +	emit_mov_imm(ctx, 1, r0, offsetof(struct rte_mbuf, buf_addr));
> +	emit_ldr(ctx, EBPF_DW, r0, r6, r0);
> +	emit_add(ctx, 1, r0, tmp2);
> +	emit_add(ctx, 1, r0, tmp1);
> +
> +	emit_b(ctx, (int32_t)(ofs[LDMB_FIN_OFS] - ctx->idx));
> +}
> +
> +/*
> + * Helper for emit_ld_mbuf(): slow path.
> + * R0 = __rte_pktmbuf_read(mbuf, off, sz, buf); return 0 if NULL.
> + * The scratch buffer is the space reserved by __rte_bpf_validate() at the
> + * bottom of the eBPF stack frame, i.e. (frame_pointer - stack_ofs).
> + */
> +static void
> +emit_ldmb_slow_path(struct a64_jit_ctx *ctx, uint32_t sz, uint32_t stack_ofs)
> +{
> +	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
> +	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
> +	uint8_t fp = ebpf_to_a64_reg(ctx, EBPF_FP);
> +	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
> +	uint8_t tmp2 = ebpf_to_a64_reg(ctx, TMP_REG_2);
> +
> +	/*
> +	 * __rte_pktmbuf_read() takes a uint32_t off, so a 64-bit off that does
> +	 * not fit would be silently truncated.  Return 0 if it is out of range;
> +	 * this also catches the negative off routed here by the fast path.
> +	 */
> +	emit_mov_imm(ctx, 1, tmp2, UINT32_MAX);
> +	emit_cmp(ctx, 1, tmp1, tmp2);
> +	emit_b_cond(ctx, A64_LS, 3);		/* off <= UINT32_MAX: do the call */
> +	emit_mov_imm(ctx, 1, r0, 0);
> +	emit_b(ctx, (ctx->program_start + ctx->program_sz) - ctx->idx);
> +
> +	/* arguments of __rte_pktmbuf_read(mbuf, off, len, buf) */
> +	emit_mov_64(ctx, A64_R(1), tmp1);		/* off (held in tmp1) */
> +	emit_mov_64(ctx, A64_R(0), r6);			/* mbuf */
> +	emit_mov_imm(ctx, 0, A64_R(2), sz);		/* len */
> +	emit_sub_imm_64(ctx, A64_R(3), fp, stack_ofs);	/* buf */
> +
> +	emit_call(ctx, tmp1, (void *)(uintptr_t)__rte_pktmbuf_read);
> +	emit_return_zero_if_src_zero(ctx, 1, r0);
> +}
> +
> +/*
> + * Helper for emit_ld_mbuf(): common tail.
> + * Load the value pointed to by R0 and convert from network byte order.
> + */
> +static void
> +emit_ldmb_fin(struct a64_jit_ctx *ctx, uint8_t opsz, uint32_t sz)
> +{
> +	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
> +
> +	emit_ldr(ctx, opsz, r0, r0, A64_ZR);
> +	if (opsz != BPF_B)
> +		emit_be(ctx, r0, sz * 8);
> +}
> +
> +/*
> + * Emit code for BPF_LD | BPF_ABS and BPF_LD | BPF_IND packet loads:
> + *
> + *	off = imm (+ src for BPF_IND)
> + *	if (off >= 0 && mbuf->data_len - off >= sz)	    -- fast path
> + *		ptr = mbuf->buf_addr + mbuf->data_off + off;
> + *	else						    -- slow path
> + *		if ((uint64_t)off > UINT32_MAX)
> + *			return 0;
> + *		ptr = __rte_pktmbuf_read(mbuf, off, sz, buf);
> + *		if (ptr == NULL)
> + *			return 0;
> + *	R0 = ntoh(*(size *)ptr);			    -- common tail
> + *
> + * The three blocks are sized in a dry run so the forward branches can be
> + * resolved, then emitted for real (arm64 instructions are fixed width, so
> + * the dry run reproduces the real instruction count exactly).
> + */
> +static void
> +emit_ld_mbuf(struct a64_jit_ctx *ctx, uint8_t op, uint8_t src, int32_t imm,
> +	     uint32_t stack_ofs)
> +{
> +	uint8_t mode = BPF_MODE(op);
> +	uint8_t opsz = BPF_SIZE(op);
> +	uint32_t sz = bpf_size(opsz);
> +	uint32_t ofs[LDMB_OFS_NUM];
> +
> +	/* seed offsets so the dry-run branches stay in range */
> +	ofs[LDMB_FAST_OFS] = ofs[LDMB_SLOW_OFS] = ofs[LDMB_FIN_OFS] = ctx->idx;
> +
> +	/* dry run to record block offsets */
> +	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
> +	ofs[LDMB_SLOW_OFS] = ctx->idx;
> +	emit_ldmb_slow_path(ctx, sz, stack_ofs);
> +	ofs[LDMB_FIN_OFS] = ctx->idx;
> +	emit_ldmb_fin(ctx, opsz, sz);
> +
> +	/* rewind and emit for real with resolved offsets */
> +	ctx->idx = ofs[LDMB_FAST_OFS];
> +	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
> +	emit_ldmb_slow_path(ctx, sz, stack_ofs);
> +	emit_ldmb_fin(ctx, opsz, sz);
> +}
> +
>  static void
>  check_program_has_call(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
>  {
> @@ -1145,8 +1294,17 @@ check_program_has_call(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
>  		op = ins->code;
> 
>  		switch (op) {
> -		/* Call imm */
> +		/*
> +		 * BPF_ABS/BPF_IND can fall through to __rte_pktmbuf_read(),
> +		 * so they need the call-clobbered register layout as well.
> +		 */
>  		case (BPF_JMP | EBPF_CALL):
> +		case (BPF_LD | BPF_ABS | BPF_B):
> +		case (BPF_LD | BPF_ABS | BPF_H):
> +		case (BPF_LD | BPF_ABS | BPF_W):
> +		case (BPF_LD | BPF_IND | BPF_B):
> +		case (BPF_LD | BPF_IND | BPF_H):
> +		case (BPF_LD | BPF_IND | BPF_W):
>  			ctx->foundcall = 1;
>  			return;
>  		}
> @@ -1348,6 +1506,15 @@ emit(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
>  			emit_mov_imm(ctx, 1, dst, u64);
>  			i++;
>  			break;
> +		/* R0 = ntoh(*(size *)(mbuf data + (src) + imm)) */
> +		case (BPF_LD | BPF_ABS | BPF_B):
> +		case (BPF_LD | BPF_ABS | BPF_H):
> +		case (BPF_LD | BPF_ABS | BPF_W):
> +		case (BPF_LD | BPF_IND | BPF_B):
> +		case (BPF_LD | BPF_IND | BPF_H):
> +		case (BPF_LD | BPF_IND | BPF_W):
> +			emit_ld_mbuf(ctx, op, src, imm, bpf->stack_sz);
> +			break;
>  		/* *(size *)(dst + off) = src */
>  		case (BPF_STX | BPF_MEM | BPF_B):
>  		case (BPF_STX | BPF_MEM | BPF_H):
> --
> 2.53.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v6 0/9] bpf: JIT related bug fixes
  2026-06-25 17:30 ` [PATCH v6 0/9] bpf: JIT related bug fixes Stephen Hemminger
                     ` (8 preceding siblings ...)
  2026-06-25 17:30   ` [PATCH v6 9/9] test/bpf: check that bpf_convert can be JIT'd Stephen Hemminger
@ 2026-06-26 10:35   ` Marat Khalili
  2026-06-26 15:01     ` Stephen Hemminger
  2026-06-30 12:45   ` Konstantin Ananyev
  10 siblings, 1 reply; 73+ messages in thread
From: Marat Khalili @ 2026-06-26 10:35 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday 25 June 2026 18:30
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>
> Subject: [PATCH v6 0/9] bpf: JIT related bug fixes
> 
> While implementing JIT for packet capture ran into several issues:
>   1. x86 JIT had a pre-existing bug which would crash.
>   2. The arm64 JIT was missing the packet-access instructions, found
>      previously [1].
>   3. Shift counts were not masked to the operand width as RFC 9669
>      requires: undefined behavior in the interpreter and an encoding
>      failure in the arm64 JIT.
>   4. Tests related to JIT were not being run or were missing coverage.
> 
> Fixed all of these. Patches are ordered with the most urgent fix (the
> x86 crash) first, each fix followed by the test that exercises it. The
> arm64 packet-load support is kept ahead of the "check JIT was generated"
> patch so the series bisects cleanly on arm64.
> 
> The arm64 epilogue branch fix (patch 6) was originally posted by
> Christophe Fontaine [1]; that series stalled, so it is carried here with
> his authorship.
> 
> [1] https://inbox.dpdk.org/dev/20260319114500.9757-2-cfontain@redhat.com/
> 
> v6:
>  - address Marat's review ARM64 JIT of LD IND instructions
>    need to handle offsets outside of 32 bit range.
> 
> 
> Christophe Fontaine (1):
>   bpf/arm64: fix offset type to allow a negative jump
> 
> Stephen Hemminger (8):
>   bpf/x86: fix JIT encoding of fixed-width immediates
>   test/bpf: add JSET test with small immediate
>   bpf: mask shift count in interpreter per RFC 9669
>   bpf/arm64: mask shift count per RFC 9669
>   test/bpf: add test for large shift
>   bpf/arm64: add BPF_ABS/BPF_IND packet load support
>   test/bpf: check that JIT was generated
>   test/bpf: check that bpf_convert can be JIT'd
> 
>  app/test/test_bpf.c     | 320 +++++++++++++++++++++++++++++++---------
>  lib/bpf/bpf_exec.c      |  31 ++--
>  lib/bpf/bpf_jit_arm64.c | 185 ++++++++++++++++++++++-
>  lib/bpf/bpf_jit_x86.c   |   6 +-
>  lib/bpf/meson.build     |   2 +
>  5 files changed, 456 insertions(+), 88 deletions(-)
> 
> --
> 2.53.0

Series-tested-by: Marat Khalili <marat.khalili@huawei.com>

(Maybe it actually deserves a release note at this point?)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v6 0/9] bpf: JIT related bug fixes
  2026-06-26 10:35   ` [PATCH v6 0/9] bpf: JIT related bug fixes Marat Khalili
@ 2026-06-26 15:01     ` Stephen Hemminger
  0 siblings, 0 replies; 73+ messages in thread
From: Stephen Hemminger @ 2026-06-26 15:01 UTC (permalink / raw)
  To: Marat Khalili; +Cc: dev@dpdk.org

On Fri, 26 Jun 2026 10:35:47 +0000
Marat Khalili <marat.khalili@huawei.com> wrote:

> Series-tested-by: Marat Khalili <marat.khalili@huawei.com>
> 
> (Maybe it actually deserves a release note at this point?)

Maybe just a single release note about BPF enhancements/fixes
and add sub bullets for your loader and verifier changes.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v6 0/9] bpf: JIT related bug fixes
  2026-06-25 17:30 ` [PATCH v6 0/9] bpf: JIT related bug fixes Stephen Hemminger
                     ` (9 preceding siblings ...)
  2026-06-26 10:35   ` [PATCH v6 0/9] bpf: JIT related bug fixes Marat Khalili
@ 2026-06-30 12:45   ` Konstantin Ananyev
  10 siblings, 0 replies; 73+ messages in thread
From: Konstantin Ananyev @ 2026-06-30 12:45 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org



> While implementing JIT for packet capture ran into several issues:
>   1. x86 JIT had a pre-existing bug which would crash.
>   2. The arm64 JIT was missing the packet-access instructions, found
>      previously [1].
>   3. Shift counts were not masked to the operand width as RFC 9669
>      requires: undefined behavior in the interpreter and an encoding
>      failure in the arm64 JIT.
>   4. Tests related to JIT were not being run or were missing coverage.
> 
> Fixed all of these. Patches are ordered with the most urgent fix (the
> x86 crash) first, each fix followed by the test that exercises it. The
> arm64 packet-load support is kept ahead of the "check JIT was generated"
> patch so the series bisects cleanly on arm64.
> 
> The arm64 epilogue branch fix (patch 6) was originally posted by
> Christophe Fontaine [1]; that series stalled, so it is carried here with
> his authorship.
> 
> [1] https://inbox.dpdk.org/dev/20260319114500.9757-2-cfontain@redhat.com/
> 
> v6:
>  - address Marat's review ARM64 JIT of LD IND instructions
>    need to handle offsets outside of 32 bit range.
> 
> 
> Christophe Fontaine (1):
>   bpf/arm64: fix offset type to allow a negative jump
> 
> Stephen Hemminger (8):
>   bpf/x86: fix JIT encoding of fixed-width immediates
>   test/bpf: add JSET test with small immediate
>   bpf: mask shift count in interpreter per RFC 9669
>   bpf/arm64: mask shift count per RFC 9669
>   test/bpf: add test for large shift
>   bpf/arm64: add BPF_ABS/BPF_IND packet load support
>   test/bpf: check that JIT was generated
>   test/bpf: check that bpf_convert can be JIT'd
> 
>  app/test/test_bpf.c     | 320 +++++++++++++++++++++++++++++++---------
>  lib/bpf/bpf_exec.c      |  31 ++--
>  lib/bpf/bpf_jit_arm64.c | 185 ++++++++++++++++++++++-
>  lib/bpf/bpf_jit_x86.c   |   6 +-
>  lib/bpf/meson.build     |   2 +
>  5 files changed, 456 insertions(+), 88 deletions(-)
> 
> --

Series-Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>

> 2.53.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

end of thread, other threads:[~2026-06-30 12:45 UTC | newest]

Thread overview: 73+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-08 20:28 [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
2026-06-08 20:28 ` [PATCH 1/4] bpf/arm64: fix zero-return branch in multi-exit programs Stephen Hemminger
2026-06-17 18:03   ` Marat Khalili
2026-06-08 20:28 ` [PATCH 2/4] test: bpf check that JIT was generated Stephen Hemminger
2026-06-17 18:09   ` Marat Khalili
2026-06-08 20:28 ` [PATCH 3/4] test: bpf check that bpf_convert can be JIT'd Stephen Hemminger
2026-06-17 18:14   ` Marat Khalili
2026-06-08 20:28 ` [PATCH 4/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
2026-06-17 19:35   ` Marat Khalili
2026-06-17 17:37 ` [PATCH 0/4] " Marat Khalili
2026-06-17 21:17   ` Stephen Hemminger
2026-06-18 20:47 ` [PATCH v2 0/6] bpf: JIT related bug fixes Stephen Hemminger
2026-06-18 20:47   ` [PATCH v2 1/6] bpf/x86: fix JIT encoding of BPF_JSET with immediate Stephen Hemminger
2026-06-19  2:09     ` Stephen Hemminger
2026-06-18 20:47   ` [PATCH v2 2/6] test/bpf: add JSET test with small immediate Stephen Hemminger
2026-06-18 20:47   ` [PATCH v2 3/6] bpf/arm64: fix offset type to allow a negative jump Stephen Hemminger
2026-06-18 20:47   ` [PATCH v2 4/6] test/bpf: check that JIT was generated Stephen Hemminger
2026-06-18 20:47   ` [PATCH v2 5/6] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
2026-06-18 20:47   ` [PATCH v2 6/6] test/bpf: check that bpf_convert can be JIT'd Stephen Hemminger
2026-06-21 16:23 ` [PATCH v3 0/6] bpf: JIT related bug fixes Stephen Hemminger
2026-06-21 16:23   ` [PATCH v3 1/6] bpf/x86: fix JIT encoding of BPF_JSET with immediate Stephen Hemminger
2026-06-23 10:11     ` Marat Khalili
2026-06-21 16:23   ` [PATCH v3 2/6] test/bpf: add JSET test with small immediate Stephen Hemminger
2026-06-23 10:16     ` Marat Khalili
2026-06-21 16:23   ` [PATCH v3 3/6] bpf/arm64: fix offset type to allow a negative jump Stephen Hemminger
2026-06-22 16:26     ` Marat Khalili
2026-06-21 16:23   ` [PATCH v3 4/6] test/bpf: check that JIT was generated Stephen Hemminger
2026-06-21 16:23   ` [PATCH v3 5/6] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
2026-06-21 16:23   ` [PATCH v3 6/6] test/bpf: check that bpf_convert can be JIT'd Stephen Hemminger
2026-06-23 13:57     ` Marat Khalili
2026-06-23 15:51       ` Stephen Hemminger
2026-06-23 20:58       ` Stephen Hemminger
2026-06-23 23:23 ` [PATCH v4 0/7] bpf: JIT related bug fixes Stephen Hemminger
2026-06-23 23:23   ` [PATCH v4 1/7] bpf/x86: fix JIT encoding of fixed-width immediates Stephen Hemminger
2026-06-23 23:23   ` [PATCH v4 2/7] test/bpf: add JSET test with small immediate Stephen Hemminger
2026-06-23 23:23   ` [PATCH v4 3/7] test/bpf: add test for large shift Stephen Hemminger
2026-06-24  7:59     ` Marat Khalili
2026-06-24 13:44       ` Marat Khalili
2026-06-23 23:23   ` [PATCH v4 4/7] bpf/arm64: fix offset type to allow a negative jump Stephen Hemminger
2026-06-24  8:43     ` Marat Khalili
2026-06-23 23:23   ` [PATCH v4 5/7] test/bpf: check that JIT was generated Stephen Hemminger
2026-06-23 23:23   ` [PATCH v4 6/7] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
2026-06-23 23:23   ` [PATCH v4 7/7] test/bpf: check that bpf_convert can be JIT'd Stephen Hemminger
2026-06-24  8:39     ` Marat Khalili
2026-06-24 17:54 ` [PATCH v5 0/9] bpf: JIT related bug fixes Stephen Hemminger
2026-06-24 17:55   ` [PATCH v5 1/9] bpf/x86: fix JIT encoding of fixed-width immediates Stephen Hemminger
2026-06-24 17:55   ` [PATCH v5 2/9] test/bpf: add JSET test with small immediate Stephen Hemminger
2026-06-24 17:55   ` [PATCH v5 3/9] bpf: mask shift count in interpreter per RFC 9669 Stephen Hemminger
2026-06-25 15:35     ` Marat Khalili
2026-06-24 17:55   ` [PATCH v5 4/9] bpf/arm64: mask shift count " Stephen Hemminger
2026-06-25 15:40     ` Marat Khalili
2026-06-24 17:55   ` [PATCH v5 5/9] test/bpf: add test for large shift Stephen Hemminger
2026-06-25 15:38     ` Marat Khalili
2026-06-24 17:55   ` [PATCH v5 6/9] bpf/arm64: fix offset type to allow a negative jump Stephen Hemminger
2026-06-24 17:55   ` [PATCH v5 7/9] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
2026-06-25 13:59     ` Marat Khalili
2026-06-24 17:55   ` [PATCH v5 8/9] test/bpf: check that JIT was generated Stephen Hemminger
2026-06-24 17:55   ` [PATCH v5 9/9] test/bpf: check that bpf_convert can be JIT'd Stephen Hemminger
2026-06-25 17:30 ` [PATCH v6 0/9] bpf: JIT related bug fixes Stephen Hemminger
2026-06-25 17:30   ` [PATCH v6 1/9] bpf/x86: fix JIT encoding of fixed-width immediates Stephen Hemminger
2026-06-25 17:30   ` [PATCH v6 2/9] test/bpf: add JSET test with small immediate Stephen Hemminger
2026-06-25 17:30   ` [PATCH v6 3/9] bpf: mask shift count in interpreter per RFC 9669 Stephen Hemminger
2026-06-25 17:30   ` [PATCH v6 4/9] bpf/arm64: mask shift count " Stephen Hemminger
2026-06-25 17:30   ` [PATCH v6 5/9] test/bpf: add test for large shift Stephen Hemminger
2026-06-25 17:30   ` [PATCH v6 6/9] bpf/arm64: fix offset type to allow a negative jump Stephen Hemminger
2026-06-25 17:30   ` [PATCH v6 7/9] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
2026-06-26 10:34     ` Marat Khalili
2026-06-25 17:30   ` [PATCH v6 8/9] test/bpf: check that JIT was generated Stephen Hemminger
2026-06-25 17:30   ` [PATCH v6 9/9] test/bpf: check that bpf_convert can be JIT'd Stephen Hemminger
2026-06-25 23:12     ` Stephen Hemminger
2026-06-26 10:35   ` [PATCH v6 0/9] bpf: JIT related bug fixes Marat Khalili
2026-06-26 15:01     ` Stephen Hemminger
2026-06-30 12:45   ` Konstantin Ananyev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox