[PATCH 0/4] bpf/arm64: add BPF_ABS/BPF

DPDK-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support
@ 2026-06-08 20:28 Stephen Hemminger
  2026-06-08 20:28 ` [PATCH 1/4] bpf/arm64: fix zero-return branch in multi-exit programs Stephen Hemminger
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Stephen Hemminger @ 2026-06-08 20:28 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Discovered this while exploring packet filtering.
The arm64 BPF JIT did not implement BPF_LD | BPF_ABS or BPF_LD | BPF_IND,
so cBPF filters converted by rte_bpf_convert() could not be JIT compiled
and silently fell back to the interpreter on arm64.

The first patch fixes a latent bug in emit_return_zero_if_src_zero():
the offset of the branch to the epilogue was held in an unsigned.
A backward branch wrapped around. Existing JIT tests were never
being run on ARM.

The next two patches make the bpf tests assert that,
on an architecture with a JITbackend, code is actually generated,
so a missing or failed JIT is reported rather than skipped.

The final patch adds the ABS/IND opcodes, mirroring the x86 JIT
with a fast path for data in the first
mbuf segment and a __rte_pktmbuf_read() slow path for the rest.

Stephen Hemminger (4):
  bpf/arm64: fix zero-return branch in multi-exit programs
  test: bpf check that JIT was generated
  test: bpf check that bpf_convert can be JIT'd
  bpf/arm64: add BPF_ABS/BPF_IND packet load support

 app/test/test_bpf.c     |  23 ++++++-
 lib/bpf/bpf_jit_arm64.c | 149 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 169 insertions(+), 3 deletions(-)

-- 
2.53.0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/4] bpf/arm64: fix zero-return branch in multi-exit programs
  2026-06-08 20:28 [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
@ 2026-06-08 20:28 ` Stephen Hemminger
  2026-06-17 18:03   ` Marat Khalili
  2026-06-08 20:28 ` [PATCH 2/4] test: bpf check that JIT was generated Stephen Hemminger
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Stephen Hemminger @ 2026-06-08 20:28 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, stable, Wathsala Vithanage, Konstantin Ananyev,
	Marat Khalili, Jerin Jacob

If a JIT'd BPF program has more than one exit,
the branch to the epilogue can be backwards.

The current code assumed it is always forward:
emit_return_zero_if_src_zero() held the offset in an unsigned uint16_t,
so a backward (negative) offset wrapped to a large positive value and
branch off the end of the program, faulting at run time.

This was masked until now: the only test with this shape, test_ld_mbuf,
needs BPF_ABS/BPF_IND which the arm64 JIT did not implement, so it never
ran under the JIT.  The x86 JIT is unaffected because emit_epilog() keeps a
single exit (st->exit.off) reached from later exits and the divide-by-zero
check via a signed absolute jump (emit_abs_jcc), so direction does not
matter.

Use a signed offset; emit_b() already sign-extends imm26 correctly.

Fixes: 111e2a747a4f ("bpf/arm: add basic arithmetic operations")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_jit_arm64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
index a04ef33a9c..099822e9f1 100644
--- a/lib/bpf/bpf_jit_arm64.c
+++ b/lib/bpf/bpf_jit_arm64.c
@@ -957,7 +957,7 @@ static void
 emit_return_zero_if_src_zero(struct a64_jit_ctx *ctx, bool is64, uint8_t src)
 {
 	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
-	uint16_t jump_to_epilogue;
+	int32_t jump_to_epilogue;

 	emit_cbnz(ctx, is64, src, 3);
 	emit_mov_imm(ctx, is64, r0, 0);
-- 
2.53.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: [PATCH 1/4] bpf/arm64: fix zero-return branch in multi-exit programs
  2026-06-08 20:28 ` [PATCH 1/4] bpf/arm64: fix zero-return branch in multi-exit programs Stephen Hemminger
@ 2026-06-17 18:03   ` Marat Khalili
  0 siblings, 0 replies; 11+ messages in thread
From: Marat Khalili @ 2026-06-17 18:03 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org
  Cc: stable@dpdk.org, Wathsala Vithanage, Konstantin Ananyev,
	Jerin Jacob

> If a JIT'd BPF program has more than one exit,
> the branch to the epilogue can be backwards.
> 
> The current code assumed it is always forward:
> emit_return_zero_if_src_zero() held the offset in an unsigned uint16_t,
> so a backward (negative) offset wrapped to a large positive value and
> branch off the end of the program, faulting at run time.
> 
> This was masked until now: the only test with this shape, test_ld_mbuf,
> needs BPF_ABS/BPF_IND which the arm64 JIT did not implement, so it never
> ran under the JIT.  The x86 JIT is unaffected because emit_epilog() keeps a
> single exit (st->exit.off) reached from later exits and the divide-by-zero
> check via a signed absolute jump (emit_abs_jcc), so direction does not
> matter.
> 
> Use a signed offset; emit_b() already sign-extends imm26 correctly.

This line is unclear to me, but that's not the main issue.

> Fixes: 111e2a747a4f ("bpf/arm: add basic arithmetic operations")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/bpf/bpf_jit_arm64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
> index a04ef33a9c..099822e9f1 100644
> --- a/lib/bpf/bpf_jit_arm64.c
> +++ b/lib/bpf/bpf_jit_arm64.c
> @@ -957,7 +957,7 @@ static void
>  emit_return_zero_if_src_zero(struct a64_jit_ctx *ctx, bool is64, uint8_t src)
>  {
>  	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
> -	uint16_t jump_to_epilogue;
> +	int32_t jump_to_epilogue;
> 
>  	emit_cbnz(ctx, is64, src, 3);
>  	emit_mov_imm(ctx, is64, r0, 0);
> --
> 2.53.0

In the very next line the offset is calculated as follows though:

    jump_to_epilogue = (ctx->program_start + ctx->program_sz) - ctx->idx;

From its appearance, it can never be negative. However, ctx->program_sz is set
in emit_epilogue which is issued on every BPF_EXIT, so mid-generation it
contains the location of the last issued epilogue (including possibly previous
pass), not the program size. With this interpretation the code technically
works, but I'd suggest either renaming program_sz to something like
epilogue_offset, or making sure it is only set to the actual program size (sans
prologue and epilogue). In the latter case the jump offset will never be
negative, although the change to int32_t is still helpful in extending the
range of supported programs from 2**16 instructions. Since only 26 signed bits
are actually available maybe some check or assert is also warranted.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 2/4] test: bpf check that JIT was generated
  2026-06-08 20:28 [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
  2026-06-08 20:28 ` [PATCH 1/4] bpf/arm64: fix zero-return branch in multi-exit programs Stephen Hemminger
@ 2026-06-08 20:28 ` Stephen Hemminger
  2026-06-17 18:09   ` Marat Khalili
  2026-06-08 20:28 ` [PATCH 3/4] test: bpf check that bpf_convert can be JIT'd Stephen Hemminger
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Stephen Hemminger @ 2026-06-08 20:28 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Marat Khalili

Avoid silently ignoring JIT failures. The test cases should
all succeed JIT compilation; if not it is a bug in the JIT
implementation and should be reported.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index dd24722450..79d547dc82 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -3508,6 +3508,14 @@ run_test(const struct bpf_test *tst)
 				rv, strerror(rv));
 		}
 	}
+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
+	else {
+		/* a JIT backend exists for this arch, so it must compile */
+		printf("%s@%d: %s: no JIT code generated;\n",
+			__func__, __LINE__, tst->name);
+		ret = -1;
+	}
+#endif
 
 	rte_bpf_destroy(bpf);
 	return ret;
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: [PATCH 2/4] test: bpf check that JIT was generated
  2026-06-08 20:28 ` [PATCH 2/4] test: bpf check that JIT was generated Stephen Hemminger
@ 2026-06-17 18:09   ` Marat Khalili
  0 siblings, 0 replies; 11+ messages in thread
From: Marat Khalili @ 2026-06-17 18:09 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org; +Cc: Konstantin Ananyev

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Monday 8 June 2026 21:29
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Konstantin Ananyev <konstantin.ananyev@huawei.com>;
> Marat Khalili <marat.khalili@huawei.com>
> Subject: [PATCH 2/4] test: bpf check that JIT was generated
> 
> Avoid silently ignoring JIT failures. The test cases should
> all succeed JIT compilation; if not it is a bug in the JIT
> implementation and should be reported.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  app/test/test_bpf.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
> index dd24722450..79d547dc82 100644
> --- a/app/test/test_bpf.c
> +++ b/app/test/test_bpf.c
> @@ -3508,6 +3508,14 @@ run_test(const struct bpf_test *tst)
>  				rv, strerror(rv));
>  		}
>  	}
> +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
> +	else {
> +		/* a JIT backend exists for this arch, so it must compile */
> +		printf("%s@%d: %s: no JIT code generated;\n",
> +			__func__, __LINE__, tst->name);
> +		ret = -1;
> +	}
> +#endif
> 
>  	rte_bpf_destroy(bpf);
>  	return ret;
> --
> 2.53.0

Acked-by: Marat Khalili <marat.khalili@huawei.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 3/4] test: bpf check that bpf_convert can be JIT'd
  2026-06-08 20:28 [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
  2026-06-08 20:28 ` [PATCH 1/4] bpf/arm64: fix zero-return branch in multi-exit programs Stephen Hemminger
  2026-06-08 20:28 ` [PATCH 2/4] test: bpf check that JIT was generated Stephen Hemminger
@ 2026-06-08 20:28 ` Stephen Hemminger
  2026-06-17 18:14   ` Marat Khalili
  2026-06-08 20:28 ` [PATCH 4/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
  2026-06-17 17:37 ` [PATCH 0/4] " Marat Khalili
  4 siblings, 1 reply; 11+ messages in thread
From: Stephen Hemminger @ 2026-06-08 20:28 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Marat Khalili

Add followup in bpf conversion tests to make sure resulting
code was also run through JIT.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 79d547dc82..f5ab447ff6 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -4569,6 +4569,7 @@ test_bpf_filter(pcap_t *pcap, const char *s)
 	struct bpf_program fcode;
 	struct rte_bpf_prm *prm = NULL;
 	struct rte_bpf *bpf = NULL;
+	int ret = -1;
 
 	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
 		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
@@ -4592,6 +4593,18 @@ test_bpf_filter(pcap_t *pcap, const char *s)
 			__func__, __LINE__, rte_errno, strerror(rte_errno));
 		goto error;
 	}
+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
+	{
+		struct rte_bpf_jit jit;
+
+		rte_bpf_get_jit(bpf, &jit);
+		if (jit.func == NULL) {
+			printf("%s@%d: no JIT generated\n", __func__, __LINE__);
+			goto error;
+		}
+	}
+#endif
+	ret = 0;
 
 error:
 	if (bpf)
@@ -4603,7 +4616,7 @@ test_bpf_filter(pcap_t *pcap, const char *s)
 
 	rte_free(prm);
 	pcap_freecode(&fcode);
-	return (bpf == NULL) ? -1 : 0;
+	return ret;
 }
 
 static int
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: [PATCH 3/4] test: bpf check that bpf_convert can be JIT'd
  2026-06-08 20:28 ` [PATCH 3/4] test: bpf check that bpf_convert can be JIT'd Stephen Hemminger
@ 2026-06-17 18:14   ` Marat Khalili
  0 siblings, 0 replies; 11+ messages in thread
From: Marat Khalili @ 2026-06-17 18:14 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org; +Cc: Konstantin Ananyev

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Monday 8 June 2026 21:29
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Konstantin Ananyev <konstantin.ananyev@huawei.com>;
> Marat Khalili <marat.khalili@huawei.com>
> Subject: [PATCH 3/4] test: bpf check that bpf_convert can be JIT'd
> 
> Add followup in bpf conversion tests to make sure resulting
> code was also run through JIT.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  app/test/test_bpf.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
> index 79d547dc82..f5ab447ff6 100644
> --- a/app/test/test_bpf.c
> +++ b/app/test/test_bpf.c
> @@ -4569,6 +4569,7 @@ test_bpf_filter(pcap_t *pcap, const char *s)
>  	struct bpf_program fcode;
>  	struct rte_bpf_prm *prm = NULL;
>  	struct rte_bpf *bpf = NULL;
> +	int ret = -1;
> 
>  	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
>  		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
> @@ -4592,6 +4593,18 @@ test_bpf_filter(pcap_t *pcap, const char *s)
>  			__func__, __LINE__, rte_errno, strerror(rte_errno));
>  		goto error;
>  	}
> +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
> +	{
> +		struct rte_bpf_jit jit;
> +
> +		rte_bpf_get_jit(bpf, &jit);
> +		if (jit.func == NULL) {
> +			printf("%s@%d: no JIT generated\n", __func__, __LINE__);
> +			goto error;
> +		}
> +	}
> +#endif
> +	ret = 0;
> 
>  error:
>  	if (bpf)
> @@ -4603,7 +4616,7 @@ test_bpf_filter(pcap_t *pcap, const char *s)
> 
>  	rte_free(prm);
>  	pcap_freecode(&fcode);
> -	return (bpf == NULL) ? -1 : 0;
> +	return ret;
>  }
> 
>  static int
> --
> 2.53.0

Acked-by: Marat Khalili <marat.khalili@huawei.com>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 4/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support
  2026-06-08 20:28 [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
                   ` (2 preceding siblings ...)
  2026-06-08 20:28 ` [PATCH 3/4] test: bpf check that bpf_convert can be JIT'd Stephen Hemminger
@ 2026-06-08 20:28 ` Stephen Hemminger
  2026-06-17 19:35   ` Marat Khalili
  2026-06-17 17:37 ` [PATCH 0/4] " Marat Khalili
  4 siblings, 1 reply; 11+ messages in thread
From: Stephen Hemminger @ 2026-06-08 20:28 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Wathsala Vithanage, Konstantin Ananyev,
	Marat Khalili

The arm64 JIT rejected BPF_LD | BPF_ABS and BPF_LD | BPF_IND with
"invalid opcode", so cBPF programs converted by rte_bpf_convert() could
not be JITed. Add these opcodes, mirroring the x86 JIT: a fast path for
data held in the first mbuf segment and a __rte_pktmbuf_read() slow path
for everything else. Programs using these opcodes now use the call
register layout, since the slow path makes a function call.

Bugzilla ID: 1427

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_jit_arm64.c | 147 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 146 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
index 099822e9f1..6952c61806 100644
--- a/lib/bpf/bpf_jit_arm64.c
+++ b/lib/bpf/bpf_jit_arm64.c
@@ -1123,6 +1123,133 @@ emit_branch(struct a64_jit_ctx *ctx, uint8_t op, uint32_t i, int16_t off)
 	emit_b_cond(ctx, ebpf_to_a64_cond(op), jump_offset_get(ctx, i, off));
 }
 
+/* LD_ABS/LD_IND code block offsets (in arm64 instructions) */
+enum {
+	LDMB_FAST_OFS, /* fast path */
+	LDMB_SLOW_OFS, /* slow path */
+	LDMB_FIN_OFS,  /* common tail */
+	LDMB_OFS_NUM
+};
+
+/*
+ * Helper for emit_ld_mbuf(): fast path.
+ * Compute the packet offset; if it lies inside the first segment leave the
+ * data pointer in R0, otherwise branch to the slow path.
+ */
+static void
+emit_ldmb_fast_path(struct a64_jit_ctx *ctx, uint8_t src, uint8_t mode,
+		    uint32_t sz, int32_t imm, const uint32_t ofs[LDMB_OFS_NUM])
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
+	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
+	uint8_t tmp2 = ebpf_to_a64_reg(ctx, TMP_REG_2);
+	uint8_t tmp3 = ebpf_to_a64_reg(ctx, TMP_REG_3);
+
+	/* off = imm (+ src for BPF_IND) */
+	emit_mov_imm(ctx, 1, tmp1, imm);
+	if (mode == BPF_IND)
+		emit_add(ctx, 1, tmp1, src);
+
+	/* if ((int64_t)(mbuf->data_len - off) < sz) goto slow_path */
+	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_len));
+	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
+	emit_sub(ctx, 1, tmp2, tmp1);
+	emit_mov_imm(ctx, 1, tmp3, sz);
+	emit_cmp(ctx, 1, tmp2, tmp3);
+	emit_b_cond(ctx, A64_LT, (int32_t)(ofs[LDMB_SLOW_OFS] - ctx->idx));
+
+	/* R0 = mbuf->buf_addr + mbuf->data_off + off */
+	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_off));
+	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
+	emit_mov_imm(ctx, 1, r0, offsetof(struct rte_mbuf, buf_addr));
+	emit_ldr(ctx, EBPF_DW, r0, r6, r0);
+	emit_add(ctx, 1, r0, tmp2);
+	emit_add(ctx, 1, r0, tmp1);
+
+	emit_b(ctx, (int32_t)(ofs[LDMB_FIN_OFS] - ctx->idx));
+}
+
+/*
+ * Helper for emit_ld_mbuf(): slow path.
+ * R0 = __rte_pktmbuf_read(mbuf, off, sz, buf); return 0 if NULL.
+ * The scratch buffer is the space reserved by __rte_bpf_validate() at the
+ * bottom of the eBPF stack frame, i.e. (frame_pointer - stack_ofs).
+ */
+static void
+emit_ldmb_slow_path(struct a64_jit_ctx *ctx, uint32_t sz, uint32_t stack_ofs)
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
+	uint8_t fp = ebpf_to_a64_reg(ctx, EBPF_FP);
+	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
+
+	/* arguments of __rte_pktmbuf_read(mbuf, off, len, buf) */
+	emit_mov_64(ctx, A64_R(1), tmp1);		/* off (held in tmp1) */
+	emit_mov_64(ctx, A64_R(0), r6);			/* mbuf */
+	emit_mov_imm(ctx, 0, A64_R(2), sz);		/* len */
+	emit_sub_imm_64(ctx, A64_R(3), fp, stack_ofs);	/* buf */
+
+	emit_call(ctx, tmp1, (void *)(uintptr_t)__rte_pktmbuf_read);
+	emit_return_zero_if_src_zero(ctx, 1, r0);
+}
+
+/*
+ * Helper for emit_ld_mbuf(): common tail.
+ * Load the value pointed to by R0 and convert from network byte order.
+ */
+static void
+emit_ldmb_fin(struct a64_jit_ctx *ctx, uint8_t opsz, uint32_t sz)
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+
+	emit_ldr(ctx, opsz, r0, r0, A64_ZR);
+	if (opsz != BPF_B)
+		emit_be(ctx, r0, sz * 8);
+}
+
+/*
+ * Emit code for BPF_LD | BPF_ABS and BPF_LD | BPF_IND packet loads:
+ *
+ *	off = imm (+ src for BPF_IND)
+ *	if (mbuf->data_len - off >= sz)			    -- fast path
+ *		ptr = mbuf->buf_addr + mbuf->data_off + off;
+ *	else						    -- slow path
+ *		ptr = __rte_pktmbuf_read(mbuf, off, sz, buf);
+ *		if (ptr == NULL)
+ *			return 0;
+ *	R0 = ntoh(*(size *)ptr);			    -- common tail
+ *
+ * The three blocks are sized in a dry run so the forward branches can be
+ * resolved, then emitted for real (arm64 instructions are fixed width, so
+ * the dry run reproduces the real instruction count exactly).
+ */
+static void
+emit_ld_mbuf(struct a64_jit_ctx *ctx, uint8_t op, uint8_t src, int32_t imm,
+	     uint32_t stack_ofs)
+{
+	uint8_t mode = BPF_MODE(op);
+	uint8_t opsz = BPF_SIZE(op);
+	uint32_t sz = bpf_size(opsz);
+	uint32_t ofs[LDMB_OFS_NUM];
+
+	/* seed offsets so the dry-run branches stay in range */
+	ofs[LDMB_FAST_OFS] = ofs[LDMB_SLOW_OFS] = ofs[LDMB_FIN_OFS] = ctx->idx;
+
+	/* dry run to record block offsets */
+	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
+	ofs[LDMB_SLOW_OFS] = ctx->idx;
+	emit_ldmb_slow_path(ctx, sz, stack_ofs);
+	ofs[LDMB_FIN_OFS] = ctx->idx;
+	emit_ldmb_fin(ctx, opsz, sz);
+
+	/* rewind and emit for real with resolved offsets */
+	ctx->idx = ofs[LDMB_FAST_OFS];
+	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
+	emit_ldmb_slow_path(ctx, sz, stack_ofs);
+	emit_ldmb_fin(ctx, opsz, sz);
+}
+
 static void
 check_program_has_call(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 {
@@ -1135,8 +1262,17 @@ check_program_has_call(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 		op = ins->code;
 
 		switch (op) {
-		/* Call imm */
+		/*
+		 * BPF_ABS/BPF_IND can fall through to __rte_pktmbuf_read(),
+		 * so they need the call-clobbered register layout as well.
+		 */
 		case (BPF_JMP | EBPF_CALL):
+		case (BPF_LD | BPF_ABS | BPF_B):
+		case (BPF_LD | BPF_ABS | BPF_H):
+		case (BPF_LD | BPF_ABS | BPF_W):
+		case (BPF_LD | BPF_IND | BPF_B):
+		case (BPF_LD | BPF_IND | BPF_H):
+		case (BPF_LD | BPF_IND | BPF_W):
 			ctx->foundcall = 1;
 			return;
 		}
@@ -1338,6 +1474,15 @@ emit(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 			emit_mov_imm(ctx, 1, dst, u64);
 			i++;
 			break;
+		/* R0 = ntoh(*(size *)(mbuf data + (src) + imm)) */
+		case (BPF_LD | BPF_ABS | BPF_B):
+		case (BPF_LD | BPF_ABS | BPF_H):
+		case (BPF_LD | BPF_ABS | BPF_W):
+		case (BPF_LD | BPF_IND | BPF_B):
+		case (BPF_LD | BPF_IND | BPF_H):
+		case (BPF_LD | BPF_IND | BPF_W):
+			emit_ld_mbuf(ctx, op, src, imm, bpf->stack_sz);
+			break;
 		/* *(size *)(dst + off) = src */
 		case (BPF_STX | BPF_MEM | BPF_B):
 		case (BPF_STX | BPF_MEM | BPF_H):
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: [PATCH 4/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support
  2026-06-08 20:28 ` [PATCH 4/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
@ 2026-06-17 19:35   ` Marat Khalili
  0 siblings, 0 replies; 11+ messages in thread
From: Marat Khalili @ 2026-06-17 19:35 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org; +Cc: Wathsala Vithanage, Konstantin Ananyev

Thank you for doing this. I suggest comparing against the previous effort by Christophe Fontaine though.

Couple of comments inline, superficially looks correct otherwise.

> +/*
> + * Helper for emit_ld_mbuf(): fast path.
> + * Compute the packet offset; if it lies inside the first segment leave the
> + * data pointer in R0, otherwise branch to the slow path.
> + */
> +static void
> +emit_ldmb_fast_path(struct a64_jit_ctx *ctx, uint8_t src, uint8_t mode,
> +		    uint32_t sz, int32_t imm, const uint32_t ofs[LDMB_OFS_NUM])
> +{
> +	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
> +	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
> +	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
> +	uint8_t tmp2 = ebpf_to_a64_reg(ctx, TMP_REG_2);
> +	uint8_t tmp3 = ebpf_to_a64_reg(ctx, TMP_REG_3);
> +
> +	/* off = imm (+ src for BPF_IND) */
> +	emit_mov_imm(ctx, 1, tmp1, imm);
> +	if (mode == BPF_IND)
> +		emit_add(ctx, 1, tmp1, src);
> +
> +	/* if ((int64_t)(mbuf->data_len - off) < sz) goto slow_path */
> +	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_len));
> +	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
> +	emit_sub(ctx, 1, tmp2, tmp1);
> +	emit_mov_imm(ctx, 1, tmp3, sz);
> +	emit_cmp(ctx, 1, tmp2, tmp3);
> +	emit_b_cond(ctx, A64_LT, (int32_t)(ofs[LDMB_SLOW_OFS] - ctx->idx));

Are we checking that (int64_t)off >= 0 anywhere?

> +
> +	/* R0 = mbuf->buf_addr + mbuf->data_off + off */
> +	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_off));
> +	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
> +	emit_mov_imm(ctx, 1, r0, offsetof(struct rte_mbuf, buf_addr));
> +	emit_ldr(ctx, EBPF_DW, r0, r6, r0);
> +	emit_add(ctx, 1, r0, tmp2);
> +	emit_add(ctx, 1, r0, tmp1);
> +
> +	emit_b(ctx, (int32_t)(ofs[LDMB_FIN_OFS] - ctx->idx));
> +}

// snip

> +/*
> + * Emit code for BPF_LD | BPF_ABS and BPF_LD | BPF_IND packet loads:
> + *
> + *	off = imm (+ src for BPF_IND)
> + *	if (mbuf->data_len - off >= sz)			    -- fast path
> + *		ptr = mbuf->buf_addr + mbuf->data_off + off;
> + *	else						    -- slow path
> + *		ptr = __rte_pktmbuf_read(mbuf, off, sz, buf);
> + *		if (ptr == NULL)
> + *			return 0;
> + *	R0 = ntoh(*(size *)ptr);			    -- common tail

nit: this pseudo-code could probably be made more C-like.

> + *
> + * The three blocks are sized in a dry run so the forward branches can be
> + * resolved, then emitted for real (arm64 instructions are fixed width, so
> + * the dry run reproduces the real instruction count exactly).
> + */
> +static void
> +emit_ld_mbuf(struct a64_jit_ctx *ctx, uint8_t op, uint8_t src, int32_t imm,
> +	     uint32_t stack_ofs)
> +{
> +	uint8_t mode = BPF_MODE(op);
> +	uint8_t opsz = BPF_SIZE(op);
> +	uint32_t sz = bpf_size(opsz);
> +	uint32_t ofs[LDMB_OFS_NUM];
> +
> +	/* seed offsets so the dry-run branches stay in range */
> +	ofs[LDMB_FAST_OFS] = ofs[LDMB_SLOW_OFS] = ofs[LDMB_FIN_OFS] = ctx->idx;
> +
> +	/* dry run to record block offsets */
> +	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
> +	ofs[LDMB_SLOW_OFS] = ctx->idx;
> +	emit_ldmb_slow_path(ctx, sz, stack_ofs);
> +	ofs[LDMB_FIN_OFS] = ctx->idx;
> +	emit_ldmb_fin(ctx, opsz, sz);

nit: we already do two passes for the whole program, could avoid quadruple work here

> +
> +	/* rewind and emit for real with resolved offsets */
> +	ctx->idx = ofs[LDMB_FAST_OFS];
> +	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
> +	emit_ldmb_slow_path(ctx, sz, stack_ofs);
> +	emit_ldmb_fin(ctx, opsz, sz);
> +}

// snip the rest

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support
  2026-06-08 20:28 [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
                   ` (3 preceding siblings ...)
  2026-06-08 20:28 ` [PATCH 4/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
@ 2026-06-17 17:37 ` Marat Khalili
  2026-06-17 21:17   ` Stephen Hemminger
  4 siblings, 1 reply; 11+ messages in thread
From: Marat Khalili @ 2026-06-17 17:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev@dpdk.org

I think this should CC participants of the previous discussion: 
https://inbox.dpdk.org/dev/20260319114500.9757-1-cfontain@redhat.com/

(Humans may think they submit patches independently, but weights of their LLMs
were already contaminated with the other effort. Hello brave new world.)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support
  2026-06-17 17:37 ` [PATCH 0/4] " Marat Khalili
@ 2026-06-17 21:17   ` Stephen Hemminger
  0 siblings, 0 replies; 11+ messages in thread
From: Stephen Hemminger @ 2026-06-17 21:17 UTC (permalink / raw)
  To: Marat Khalili; +Cc: dev@dpdk.org

On Wed, 17 Jun 2026 17:37:39 +0000
Marat Khalili <marat.khalili@huawei.com> wrote:

> I think this should CC participants of the previous discussion: 
> https://inbox.dpdk.org/dev/20260319114500.9757-1-cfontain@redhat.com/
> 
> (Humans may think they submit patches independently, but weights of their LLMs
> were already contaminated with the other effort. Hello brave new world.)

Didn't see previous thread. This effort was more targeted at
why can't capture which uses pcap_compile -> bpf_convert flow be JIT'd?

Had not looked back at other overlaps.


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-06-17 21:17 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-08 20:28 [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
2026-06-08 20:28 ` [PATCH 1/4] bpf/arm64: fix zero-return branch in multi-exit programs Stephen Hemminger
2026-06-17 18:03   ` Marat Khalili
2026-06-08 20:28 ` [PATCH 2/4] test: bpf check that JIT was generated Stephen Hemminger
2026-06-17 18:09   ` Marat Khalili
2026-06-08 20:28 ` [PATCH 3/4] test: bpf check that bpf_convert can be JIT'd Stephen Hemminger
2026-06-17 18:14   ` Marat Khalili
2026-06-08 20:28 ` [PATCH 4/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support Stephen Hemminger
2026-06-17 19:35   ` Marat Khalili
2026-06-17 17:37 ` [PATCH 0/4] " Marat Khalili
2026-06-17 21:17   ` Stephen Hemminger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox