* [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support
From: Stephen Hemminger @ 2026-06-08 20:28 UTC
To: dev; +Cc: Stephen Hemminger
Discovered this while exploring packet filtering.
The arm64 BPF JIT did not implement BPF_LD | BPF_ABS or BPF_LD | BPF_IND,
so cBPF filters converted by rte_bpf_convert() could not be JIT compiled
and silently fell back to the interpreter on arm64.
The first patch fixes a latent bug in emit_return_zero_if_src_zero():
the offset of the branch to the epilogue was held in an unsigned.
A backward branch wrapped around. Existing JIT tests were never
being run on ARM.
The next two patches make the bpf tests assert that,
on an architecture with a JIT backend, code is actually generated,
so a missing or failed JIT is reported rather than skipped.
The final patch adds the ABS/IND opcodes, mirroring the x86 JIT
with a fast path for data in the first
mbuf segment and a __rte_pktmbuf_read() slow path for the rest.
Stephen Hemminger (4):
  bpf/arm64: fix zero-return branch in multi-exit programs
  test: bpf check that JIT was generated
  test: bpf check that bpf_convert can be JIT'd
  bpf/arm64: add BPF_ABS/BPF_IND packet load support

 app/test/test_bpf.c     |  23 ++++++-
 lib/bpf/bpf_jit_arm64.c | 149 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 169 insertions(+), 3 deletions(-)
--
2.53.0

* [PATCH 1/4] bpf/arm64: fix zero-return branch in multi-exit programs
From: Stephen Hemminger @ 2026-06-08 20:28 UTC
To: dev
Cc: Stephen Hemminger, stable, Wathsala Vithanage, Konstantin Ananyev, Marat Khalili, Jerin Jacob

If a JIT'd BPF program has more than one exit,
the branch to the epilogue can be backwards.

The current code assumed it is always forward:
emit_return_zero_if_src_zero() held the offset in an unsigned uint16_t,
so a backward (negative) offset wrapped to a large positive value and
branched off the end of the program, faulting at run time.

This was masked until now: the only test with this shape, test_ld_mbuf,
needs BPF_ABS/BPF_IND which the arm64 JIT did not implement, so it never
ran under the JIT. The x86 JIT is unaffected because emit_epilog() keeps a
single exit (st->exit.off) reached from later exits and the divide-by-zero
check via a signed absolute jump (emit_abs_jcc), so direction does not
matter.

Use a signed offset; emit_b() already sign-extends imm26 correctly.

Fixes: 111e2a747a4f ("bpf/arm: add basic arithmetic operations")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_jit_arm64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
index a04ef33a9c..099822e9f1 100644
--- a/lib/bpf/bpf_jit_arm64.c
+++ b/lib/bpf/bpf_jit_arm64.c
@@ -957,7 +957,7 @@ static void
 emit_return_zero_if_src_zero(struct a64_jit_ctx *ctx, bool is64, uint8_t src)
 {
 	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
-	uint16_t jump_to_epilogue;
+	int32_t jump_to_epilogue;
 
 	emit_cbnz(ctx, is64, src, 3);
 	emit_mov_imm(ctx, is64, r0, 0);
-- 
2.53.0
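
The wraparound described in this patch can be illustrated with a minimal sketch, assuming instruction positions are plain unsigned indices. The helper names (`offset_as_u16`, `offset_as_i32`, `demo_offset_wraparound`) are hypothetical and do not exist in lib/bpf; only the two storage types mirror the one-line change.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins: positions are counted in arm64 instructions.
 * Neither helper exists in lib/bpf.
 */

/* Pre-patch storage type: the sign of a backward offset is lost. */
static uint16_t
offset_as_u16(uint32_t epilogue_idx, uint32_t cur_idx)
{
	return (uint16_t)(epilogue_idx - cur_idx);
}

/* Post-patch storage type: a backward offset stays negative. */
static int32_t
offset_as_i32(uint32_t epilogue_idx, uint32_t cur_idx)
{
	return (int32_t)(epilogue_idx - cur_idx);
}

/* Walk through one forward and one backward case. */
void
demo_offset_wraparound(void)
{
	/* forward branch: epilogue at 40, branch at 10, both types agree */
	assert(offset_as_u16(40, 10) == 30);
	assert(offset_as_i32(40, 10) == 30);

	/* backward branch: epilogue at 10, branch at 40 */
	assert(offset_as_i32(10, 40) == -30);
	/* 0xFFFFFFE2 truncated to 16 bits is 0xFFE2, a large forward jump */
	assert(offset_as_u16(10, 40) == 0xFFE2);
}
```

With the 16-bit type, a branch meant to go 30 instructions back is encoded as one going 65506 instructions forward, which matches the "off the end of the program" symptom in the commit message.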

* RE: [PATCH 1/4] bpf/arm64: fix zero-return branch in multi-exit programs
From: Marat Khalili @ 2026-06-17 18:03 UTC
To: Stephen Hemminger, dev@dpdk.org
Cc: stable@dpdk.org, Wathsala Vithanage, Konstantin Ananyev, Jerin Jacob

> If a JIT'd BPF program has more than one exit,
> the branch to the epilogue can be backwards.
>
> The current code assumed it is always forward:
> emit_return_zero_if_src_zero() held the offset in an unsigned uint16_t,
> so a backward (negative) offset wrapped to a large positive value and
> branched off the end of the program, faulting at run time.
>
> This was masked until now: the only test with this shape, test_ld_mbuf,
> needs BPF_ABS/BPF_IND which the arm64 JIT did not implement, so it never
> ran under the JIT. The x86 JIT is unaffected because emit_epilog() keeps a
> single exit (st->exit.off) reached from later exits and the divide-by-zero
> check via a signed absolute jump (emit_abs_jcc), so direction does not
> matter.
>
> Use a signed offset; emit_b() already sign-extends imm26 correctly.

This line is unclear to me, but that's not the main issue.

> Fixes: 111e2a747a4f ("bpf/arm: add basic arithmetic operations")
> Cc: stable@dpdk.org
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/bpf/bpf_jit_arm64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
> index a04ef33a9c..099822e9f1 100644
> --- a/lib/bpf/bpf_jit_arm64.c
> +++ b/lib/bpf/bpf_jit_arm64.c
> @@ -957,7 +957,7 @@ static void
>  emit_return_zero_if_src_zero(struct a64_jit_ctx *ctx, bool is64, uint8_t src)
>  {
>  	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
> -	uint16_t jump_to_epilogue;
> +	int32_t jump_to_epilogue;
>
>  	emit_cbnz(ctx, is64, src, 3);
>  	emit_mov_imm(ctx, is64, r0, 0);
> --
> 2.53.0

In the very next line the offset is calculated as follows though:

	jump_to_epilogue = (ctx->program_start + ctx->program_sz) - ctx->idx;

From its appearance, it can never be negative. However, ctx->program_sz is
set in emit_epilogue which is issued on every BPF_EXIT, so mid-generation it
contains the location of the last issued epilogue (possibly including the
one from the previous pass), not the program size.

With this interpretation the code technically works, but I'd suggest either
renaming program_sz to something like epilogue_offset, or making sure it is
only set to the actual program size (sans prologue and epilogue). In the
latter case the jump offset will never be negative, although the change to
int32_t is still helpful in extending the range of supported programs from
2**16 instructions. Since only 26 signed bits are actually available maybe
some check or assert is also warranted.
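
The range check suggested at the end of this review could look roughly like the following. This is only a sketch under the assumption that the offset is counted in instructions and must fit a signed 26-bit immediate; `fits_simm26` and `simm26_field` are hypothetical names, not functions in lib/bpf, and the thread does not say where such a check would be placed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not part of lib/bpf. */

/* True when off (in instructions) fits a signed 26-bit immediate. */
static bool
fits_simm26(int32_t off)
{
	return off >= -(INT32_C(1) << 25) && off < (INT32_C(1) << 25);
}

/* Return the 26-bit field for off, asserting instead of truncating. */
static uint32_t
simm26_field(int32_t off)
{
	assert(fits_simm26(off));
	return (uint32_t)off & UINT32_C(0x03ffffff);
}
```

A real JIT would more likely report an error and fall back to the interpreter than assert, but the bound itself is the same either way.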

* [PATCH 2/4] test: bpf check that JIT was generated
From: Stephen Hemminger @ 2026-06-08 20:28 UTC
To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Marat Khalili

Avoid silently ignoring JIT failures. The test cases should
all succeed JIT compilation; if not it is a bug in the JIT
implementation and should be reported.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index dd24722450..79d547dc82 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -3508,6 +3508,14 @@ run_test(const struct bpf_test *tst)
 				rv, strerror(rv));
 		}
 	}
+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
+	else {
+		/* a JIT backend exists for this arch, so it must compile */
+		printf("%s@%d: %s: no JIT code generated;\n",
+			__func__, __LINE__, tst->name);
+		ret = -1;
+	}
+#endif
 
 	rte_bpf_destroy(bpf);
 	return ret;
-- 
2.53.0

* RE: [PATCH 2/4] test: bpf check that JIT was generated
From: Marat Khalili @ 2026-06-17 18:09 UTC
To: Stephen Hemminger, dev@dpdk.org; +Cc: Konstantin Ananyev

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Monday 8 June 2026 21:29
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Konstantin Ananyev <konstantin.ananyev@huawei.com>;
> Marat Khalili <marat.khalili@huawei.com>
> Subject: [PATCH 2/4] test: bpf check that JIT was generated
>
> Avoid silently ignoring JIT failures. The test cases should
> all succeed JIT compilation; if not it is a bug in the JIT
> implementation and should be reported.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  app/test/test_bpf.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
> index dd24722450..79d547dc82 100644
> --- a/app/test/test_bpf.c
> +++ b/app/test/test_bpf.c
> @@ -3508,6 +3508,14 @@ run_test(const struct bpf_test *tst)
>  				rv, strerror(rv));
>  		}
>  	}
> +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
> +	else {
> +		/* a JIT backend exists for this arch, so it must compile */
> +		printf("%s@%d: %s: no JIT code generated;\n",
> +			__func__, __LINE__, tst->name);
> +		ret = -1;
> +	}
> +#endif
>
>  	rte_bpf_destroy(bpf);
>  	return ret;
> --
> 2.53.0

Acked-by: Marat Khalili <marat.khalili@huawei.com>

* [PATCH 3/4] test: bpf check that bpf_convert can be JIT'd
From: Stephen Hemminger @ 2026-06-08 20:28 UTC
To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Marat Khalili

Add followup in bpf conversion tests to make sure resulting
code was also run through JIT.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 79d547dc82..f5ab447ff6 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -4569,6 +4569,7 @@ test_bpf_filter(pcap_t *pcap, const char *s)
 	struct bpf_program fcode;
 	struct rte_bpf_prm *prm = NULL;
 	struct rte_bpf *bpf = NULL;
+	int ret = -1;
 
 	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
 		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
@@ -4592,6 +4593,18 @@ test_bpf_filter(pcap_t *pcap, const char *s)
 			__func__, __LINE__, rte_errno, strerror(rte_errno));
 		goto error;
 	}
+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
+	{
+		struct rte_bpf_jit jit;
+
+		rte_bpf_get_jit(bpf, &jit);
+		if (jit.func == NULL) {
+			printf("%s@%d: no JIT generated\n", __func__, __LINE__);
+			goto error;
+		}
+	}
+#endif
+	ret = 0;
 
 error:
 	if (bpf)
@@ -4603,7 +4616,7 @@ test_bpf_filter(pcap_t *pcap, const char *s)
 
 	rte_free(prm);
 	pcap_freecode(&fcode);
-	return (bpf == NULL) ? -1 : 0;
+	return ret;
 }
 
 static int
-- 
2.53.0

* RE: [PATCH 3/4] test: bpf check that bpf_convert can be JIT'd
From: Marat Khalili @ 2026-06-17 18:14 UTC
To: Stephen Hemminger, dev@dpdk.org; +Cc: Konstantin Ananyev

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Monday 8 June 2026 21:29
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Konstantin Ananyev <konstantin.ananyev@huawei.com>;
> Marat Khalili <marat.khalili@huawei.com>
> Subject: [PATCH 3/4] test: bpf check that bpf_convert can be JIT'd
>
> Add followup in bpf conversion tests to make sure resulting
> code was also run through JIT.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  app/test/test_bpf.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
> index 79d547dc82..f5ab447ff6 100644
> --- a/app/test/test_bpf.c
> +++ b/app/test/test_bpf.c
> @@ -4569,6 +4569,7 @@ test_bpf_filter(pcap_t *pcap, const char *s)
>  	struct bpf_program fcode;
>  	struct rte_bpf_prm *prm = NULL;
>  	struct rte_bpf *bpf = NULL;
> +	int ret = -1;
>
>  	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
>  		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
> @@ -4592,6 +4593,18 @@ test_bpf_filter(pcap_t *pcap, const char *s)
>  			__func__, __LINE__, rte_errno, strerror(rte_errno));
>  		goto error;
>  	}
> +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
> +	{
> +		struct rte_bpf_jit jit;
> +
> +		rte_bpf_get_jit(bpf, &jit);
> +		if (jit.func == NULL) {
> +			printf("%s@%d: no JIT generated\n", __func__, __LINE__);
> +			goto error;
> +		}
> +	}
> +#endif
> +	ret = 0;
>
>  error:
>  	if (bpf)
> @@ -4603,7 +4616,7 @@ test_bpf_filter(pcap_t *pcap, const char *s)
>
>  	rte_free(prm);
>  	pcap_freecode(&fcode);
> -	return (bpf == NULL) ? -1 : 0;
> +	return ret;
>  }
>
>  static int
> --
> 2.53.0

Acked-by: Marat Khalili <marat.khalili@huawei.com>

* [PATCH 4/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support
From: Stephen Hemminger @ 2026-06-08 20:28 UTC
To: dev
Cc: Stephen Hemminger, Wathsala Vithanage, Konstantin Ananyev, Marat Khalili

The arm64 JIT rejected BPF_LD | BPF_ABS and BPF_LD | BPF_IND with
"invalid opcode", so cBPF programs converted by rte_bpf_convert()
could not be JITed.

Add these opcodes, mirroring the x86 JIT: a fast path for data held in
the first mbuf segment and a __rte_pktmbuf_read() slow path for
everything else. Programs using these opcodes now use the call register
layout, since the slow path makes a function call.

Bugzilla ID: 1427
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_jit_arm64.c | 147 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 146 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_jit_arm64.c b/lib/bpf/bpf_jit_arm64.c
index 099822e9f1..6952c61806 100644
--- a/lib/bpf/bpf_jit_arm64.c
+++ b/lib/bpf/bpf_jit_arm64.c
@@ -1123,6 +1123,133 @@ emit_branch(struct a64_jit_ctx *ctx, uint8_t op, uint32_t i, int16_t off)
 	emit_b_cond(ctx, ebpf_to_a64_cond(op), jump_offset_get(ctx, i, off));
 }
 
+/* LD_ABS/LD_IND code block offsets (in arm64 instructions) */
+enum {
+	LDMB_FAST_OFS, /* fast path */
+	LDMB_SLOW_OFS, /* slow path */
+	LDMB_FIN_OFS, /* common tail */
+	LDMB_OFS_NUM
+};
+
+/*
+ * Helper for emit_ld_mbuf(): fast path.
+ * Compute the packet offset; if it lies inside the first segment leave the
+ * data pointer in R0, otherwise branch to the slow path.
+ */
+static void
+emit_ldmb_fast_path(struct a64_jit_ctx *ctx, uint8_t src, uint8_t mode,
+	uint32_t sz, int32_t imm, const uint32_t ofs[LDMB_OFS_NUM])
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
+	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
+	uint8_t tmp2 = ebpf_to_a64_reg(ctx, TMP_REG_2);
+	uint8_t tmp3 = ebpf_to_a64_reg(ctx, TMP_REG_3);
+
+	/* off = imm (+ src for BPF_IND) */
+	emit_mov_imm(ctx, 1, tmp1, imm);
+	if (mode == BPF_IND)
+		emit_add(ctx, 1, tmp1, src);
+
+	/* if ((int64_t)(mbuf->data_len - off) < sz) goto slow_path */
+	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_len));
+	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
+	emit_sub(ctx, 1, tmp2, tmp1);
+	emit_mov_imm(ctx, 1, tmp3, sz);
+	emit_cmp(ctx, 1, tmp2, tmp3);
+	emit_b_cond(ctx, A64_LT, (int32_t)(ofs[LDMB_SLOW_OFS] - ctx->idx));
+
+	/* R0 = mbuf->buf_addr + mbuf->data_off + off */
+	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_off));
+	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
+	emit_mov_imm(ctx, 1, r0, offsetof(struct rte_mbuf, buf_addr));
+	emit_ldr(ctx, EBPF_DW, r0, r6, r0);
+	emit_add(ctx, 1, r0, tmp2);
+	emit_add(ctx, 1, r0, tmp1);
+
+	emit_b(ctx, (int32_t)(ofs[LDMB_FIN_OFS] - ctx->idx));
+}
+
+/*
+ * Helper for emit_ld_mbuf(): slow path.
+ * R0 = __rte_pktmbuf_read(mbuf, off, sz, buf); return 0 if NULL.
+ * The scratch buffer is the space reserved by __rte_bpf_validate() at the
+ * bottom of the eBPF stack frame, i.e. (frame_pointer - stack_ofs).
+ */
+static void
+emit_ldmb_slow_path(struct a64_jit_ctx *ctx, uint32_t sz, uint32_t stack_ofs)
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
+	uint8_t fp = ebpf_to_a64_reg(ctx, EBPF_FP);
+	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
+
+	/* arguments of __rte_pktmbuf_read(mbuf, off, len, buf) */
+	emit_mov_64(ctx, A64_R(1), tmp1); /* off (held in tmp1) */
+	emit_mov_64(ctx, A64_R(0), r6); /* mbuf */
+	emit_mov_imm(ctx, 0, A64_R(2), sz); /* len */
+	emit_sub_imm_64(ctx, A64_R(3), fp, stack_ofs); /* buf */
+
+	emit_call(ctx, tmp1, (void *)(uintptr_t)__rte_pktmbuf_read);
+	emit_return_zero_if_src_zero(ctx, 1, r0);
+}
+
+/*
+ * Helper for emit_ld_mbuf(): common tail.
+ * Load the value pointed to by R0 and convert from network byte order.
+ */
+static void
+emit_ldmb_fin(struct a64_jit_ctx *ctx, uint8_t opsz, uint32_t sz)
+{
+	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
+
+	emit_ldr(ctx, opsz, r0, r0, A64_ZR);
+	if (opsz != BPF_B)
+		emit_be(ctx, r0, sz * 8);
+}
+
+/*
+ * Emit code for BPF_LD | BPF_ABS and BPF_LD | BPF_IND packet loads:
+ *
+ *   off = imm (+ src for BPF_IND)
+ *   if (mbuf->data_len - off >= sz)        -- fast path
+ *     ptr = mbuf->buf_addr + mbuf->data_off + off;
+ *   else                                   -- slow path
+ *     ptr = __rte_pktmbuf_read(mbuf, off, sz, buf);
+ *     if (ptr == NULL)
+ *       return 0;
+ *   R0 = ntoh(*(size *)ptr);               -- common tail
+ *
+ * The three blocks are sized in a dry run so the forward branches can be
+ * resolved, then emitted for real (arm64 instructions are fixed width, so
+ * the dry run reproduces the real instruction count exactly).
+ */
+static void
+emit_ld_mbuf(struct a64_jit_ctx *ctx, uint8_t op, uint8_t src, int32_t imm,
+	uint32_t stack_ofs)
+{
+	uint8_t mode = BPF_MODE(op);
+	uint8_t opsz = BPF_SIZE(op);
+	uint32_t sz = bpf_size(opsz);
+	uint32_t ofs[LDMB_OFS_NUM];
+
+	/* seed offsets so the dry-run branches stay in range */
+	ofs[LDMB_FAST_OFS] = ofs[LDMB_SLOW_OFS] = ofs[LDMB_FIN_OFS] = ctx->idx;
+
+	/* dry run to record block offsets */
+	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
+	ofs[LDMB_SLOW_OFS] = ctx->idx;
+	emit_ldmb_slow_path(ctx, sz, stack_ofs);
+	ofs[LDMB_FIN_OFS] = ctx->idx;
+	emit_ldmb_fin(ctx, opsz, sz);
+
+	/* rewind and emit for real with resolved offsets */
+	ctx->idx = ofs[LDMB_FAST_OFS];
+	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
+	emit_ldmb_slow_path(ctx, sz, stack_ofs);
+	emit_ldmb_fin(ctx, opsz, sz);
+}
+
 static void
 check_program_has_call(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 {
@@ -1135,8 +1262,17 @@ check_program_has_call(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 		op = ins->code;
 
 		switch (op) {
-		/* Call imm */
+		/*
+		 * BPF_ABS/BPF_IND can fall through to __rte_pktmbuf_read(),
+		 * so they need the call-clobbered register layout as well.
+		 */
 		case (BPF_JMP | EBPF_CALL):
+		case (BPF_LD | BPF_ABS | BPF_B):
+		case (BPF_LD | BPF_ABS | BPF_H):
+		case (BPF_LD | BPF_ABS | BPF_W):
+		case (BPF_LD | BPF_IND | BPF_B):
+		case (BPF_LD | BPF_IND | BPF_H):
+		case (BPF_LD | BPF_IND | BPF_W):
 			ctx->foundcall = 1;
 			return;
 		}
@@ -1338,6 +1474,15 @@ emit(struct a64_jit_ctx *ctx, struct rte_bpf *bpf)
 			emit_mov_imm(ctx, 1, dst, u64);
 			i++;
 			break;
+		/* R0 = ntoh(*(size *)(mbuf data + (src) + imm)) */
+		case (BPF_LD | BPF_ABS | BPF_B):
+		case (BPF_LD | BPF_ABS | BPF_H):
+		case (BPF_LD | BPF_ABS | BPF_W):
+		case (BPF_LD | BPF_IND | BPF_B):
+		case (BPF_LD | BPF_IND | BPF_H):
+		case (BPF_LD | BPF_IND | BPF_W):
+			emit_ld_mbuf(ctx, op, src, imm, bpf->stack_sz);
+			break;
 		/* *(size *)(dst + off) = src */
 		case (BPF_STX | BPF_MEM | BPF_B):
 		case (BPF_STX | BPF_MEM | BPF_H):
-- 
2.53.0
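
The "size in a dry run, then rewind and emit for real" approach used by emit_ld_mbuf() can be shown in isolation. The following is a minimal sketch with a made-up miniature emitter: `mini_ctx`, `mini_emit` and the `BLK_*` names are hypothetical, the "instructions" are arbitrary words, and the branch word simply stores a relative distance rather than a real arm64 encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature emitter; none of these names exist in lib/bpf. */
struct mini_ctx {
	uint32_t *ins;	/* output buffer, or NULL to only count */
	uint32_t idx;	/* index of the next instruction slot */
};

enum { BLK_FAST, BLK_SLOW, BLK_FIN, BLK_NUM };

static void
mini_emit(struct mini_ctx *ctx, uint32_t word)
{
	if (ctx->ins != NULL)
		ctx->ins[ctx->idx] = word;
	ctx->idx++;
}

/* Fast block: one payload word, then a forward "branch" to the tail. */
static void
emit_fast(struct mini_ctx *ctx, const uint32_t ofs[BLK_NUM])
{
	mini_emit(ctx, 0x1000);
	/* the stored word is the distance from this slot to the tail */
	mini_emit(ctx, ofs[BLK_FIN] - ctx->idx);
}

/* Slow block: three payload words, falls through into the tail. */
static void
emit_slow(struct mini_ctx *ctx)
{
	mini_emit(ctx, 0x2000);
	mini_emit(ctx, 0x2001);
	mini_emit(ctx, 0x2002);
}

/* Tail: one payload word. */
static void
emit_fin(struct mini_ctx *ctx)
{
	mini_emit(ctx, 0x3000);
}

static void
emit_sized_then_real(struct mini_ctx *ctx)
{
	uint32_t ofs[BLK_NUM];
	uint32_t end;

	/* seed every offset with the start so the dry run has defined input */
	ofs[BLK_FAST] = ofs[BLK_SLOW] = ofs[BLK_FIN] = ctx->idx;

	/* dry run: the branch word is wrong here, only the sizes matter */
	emit_fast(ctx, ofs);
	ofs[BLK_SLOW] = ctx->idx;
	emit_slow(ctx);
	ofs[BLK_FIN] = ctx->idx;
	emit_fin(ctx);
	end = ctx->idx;

	/* rewind and emit again, now with the real tail offset */
	ctx->idx = ofs[BLK_FAST];
	emit_fast(ctx, ofs);
	emit_slow(ctx);
	emit_fin(ctx);

	/* fixed-width instructions: both passes must have the same length */
	assert(ctx->idx == end);
}
```

Starting at slot 2, the fast block occupies slots 2 and 3, the slow block slots 4 to 6 and the tail slot 7, so the branch word written at slot 3 on the second pass is 4. The technique relies on every emit helper producing the same number of words on both passes.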

* RE: [PATCH 4/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support
From: Marat Khalili @ 2026-06-17 19:35 UTC
To: Stephen Hemminger, dev@dpdk.org; +Cc: Wathsala Vithanage, Konstantin Ananyev

Thank you for doing this. I suggest comparing against the previous effort by
Christophe Fontaine though. Couple of comments inline, superficially looks
correct otherwise.

> +/*
> + * Helper for emit_ld_mbuf(): fast path.
> + * Compute the packet offset; if it lies inside the first segment leave the
> + * data pointer in R0, otherwise branch to the slow path.
> + */
> +static void
> +emit_ldmb_fast_path(struct a64_jit_ctx *ctx, uint8_t src, uint8_t mode,
> +	uint32_t sz, int32_t imm, const uint32_t ofs[LDMB_OFS_NUM])
> +{
> +	uint8_t r0 = ebpf_to_a64_reg(ctx, EBPF_REG_0);
> +	uint8_t r6 = ebpf_to_a64_reg(ctx, EBPF_REG_6);
> +	uint8_t tmp1 = ebpf_to_a64_reg(ctx, TMP_REG_1);
> +	uint8_t tmp2 = ebpf_to_a64_reg(ctx, TMP_REG_2);
> +	uint8_t tmp3 = ebpf_to_a64_reg(ctx, TMP_REG_3);
> +
> +	/* off = imm (+ src for BPF_IND) */
> +	emit_mov_imm(ctx, 1, tmp1, imm);
> +	if (mode == BPF_IND)
> +		emit_add(ctx, 1, tmp1, src);
> +
> +	/* if ((int64_t)(mbuf->data_len - off) < sz) goto slow_path */
> +	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_len));
> +	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
> +	emit_sub(ctx, 1, tmp2, tmp1);
> +	emit_mov_imm(ctx, 1, tmp3, sz);
> +	emit_cmp(ctx, 1, tmp2, tmp3);
> +	emit_b_cond(ctx, A64_LT, (int32_t)(ofs[LDMB_SLOW_OFS] - ctx->idx));

Are we checking that (int64_t)off >= 0 anywhere?

> +
> +	/* R0 = mbuf->buf_addr + mbuf->data_off + off */
> +	emit_mov_imm(ctx, 1, tmp2, offsetof(struct rte_mbuf, data_off));
> +	emit_ldr(ctx, BPF_H, tmp2, r6, tmp2);
> +	emit_mov_imm(ctx, 1, r0, offsetof(struct rte_mbuf, buf_addr));
> +	emit_ldr(ctx, EBPF_DW, r0, r6, r0);
> +	emit_add(ctx, 1, r0, tmp2);
> +	emit_add(ctx, 1, r0, tmp1);
> +
> +	emit_b(ctx, (int32_t)(ofs[LDMB_FIN_OFS] - ctx->idx));
> +}

// snip

> +/*
> + * Emit code for BPF_LD | BPF_ABS and BPF_LD | BPF_IND packet loads:
> + *
> + *   off = imm (+ src for BPF_IND)
> + *   if (mbuf->data_len - off >= sz)        -- fast path
> + *     ptr = mbuf->buf_addr + mbuf->data_off + off;
> + *   else                                   -- slow path
> + *     ptr = __rte_pktmbuf_read(mbuf, off, sz, buf);
> + *     if (ptr == NULL)
> + *       return 0;
> + *   R0 = ntoh(*(size *)ptr);               -- common tail

nit: this pseudo-code could probably be made more C-like.

> + *
> + * The three blocks are sized in a dry run so the forward branches can be
> + * resolved, then emitted for real (arm64 instructions are fixed width, so
> + * the dry run reproduces the real instruction count exactly).
> + */
> +static void
> +emit_ld_mbuf(struct a64_jit_ctx *ctx, uint8_t op, uint8_t src, int32_t imm,
> +	uint32_t stack_ofs)
> +{
> +	uint8_t mode = BPF_MODE(op);
> +	uint8_t opsz = BPF_SIZE(op);
> +	uint32_t sz = bpf_size(opsz);
> +	uint32_t ofs[LDMB_OFS_NUM];
> +
> +	/* seed offsets so the dry-run branches stay in range */
> +	ofs[LDMB_FAST_OFS] = ofs[LDMB_SLOW_OFS] = ofs[LDMB_FIN_OFS] = ctx->idx;
> +
> +	/* dry run to record block offsets */
> +	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
> +	ofs[LDMB_SLOW_OFS] = ctx->idx;
> +	emit_ldmb_slow_path(ctx, sz, stack_ofs);
> +	ofs[LDMB_FIN_OFS] = ctx->idx;
> +	emit_ldmb_fin(ctx, opsz, sz);

nit: we already do two passes for the whole program, could avoid quadruple
work here

> +
> +	/* rewind and emit for real with resolved offsets */
> +	ctx->idx = ofs[LDMB_FAST_OFS];
> +	emit_ldmb_fast_path(ctx, src, mode, sz, imm, ofs);
> +	emit_ldmb_slow_path(ctx, sz, stack_ofs);
> +	emit_ldmb_fin(ctx, opsz, sz);
> +}

// snip the rest
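
The question about negative offsets can be made concrete with a small C model of the fast-path test quoted above. This is a sketch only: `model_mbuf` is a hypothetical stand-in carrying just the three rte_mbuf fields involved, the function names are invented, and whether the validator or another layer already rejects negative offsets before the JIT'd code runs is not established in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the rte_mbuf fields used here. */
struct model_mbuf {
	uint8_t *buf_addr;
	uint16_t data_off;
	uint16_t data_len;
};

/* Fast-path test as written in the patch comment (signed 64-bit compare). */
static bool
model_fast_path(const struct model_mbuf *m, int64_t off, uint32_t sz)
{
	return (int64_t)m->data_len - off >= (int64_t)sz;
}

/* Same test with an explicit lower bound on off. */
static bool
model_fast_path_checked(const struct model_mbuf *m, int64_t off, uint32_t sz)
{
	return off >= 0 && model_fast_path(m, off, sz);
}

/* Pointer the fast path would hand to the common tail. */
static const uint8_t *
model_fast_ptr(const struct model_mbuf *m, int64_t off)
{
	assert(model_fast_path_checked(m, off, 1));
	return m->buf_addr + m->data_off + off;
}
```

In this model, with data_len = 64 and sz = 4, an offset of -4 passes the unchecked test because 64 - (-4) = 68 is at least 4, and the resulting pointer would sit four bytes before the start of packet data. The checked variant sends that case to the slow path instead.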

* RE: [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support
From: Marat Khalili @ 2026-06-17 17:37 UTC
To: Stephen Hemminger; +Cc: dev@dpdk.org

I think this should CC participants of the previous discussion:
https://inbox.dpdk.org/dev/20260319114500.9757-1-cfontain@redhat.com/

(Humans may think they submit patches independently, but weights of their LLMs
were already contaminated with the other effort. Hello brave new world.)

* Re: [PATCH 0/4] bpf/arm64: add BPF_ABS/BPF_IND packet load support
From: Stephen Hemminger @ 2026-06-17 21:17 UTC
To: Marat Khalili; +Cc: dev@dpdk.org

On Wed, 17 Jun 2026 17:37:39 +0000
Marat Khalili <marat.khalili@huawei.com> wrote:

> I think this should CC participants of the previous discussion:
> https://inbox.dpdk.org/dev/20260319114500.9757-1-cfontain@redhat.com/
>
> (Humans may think they submit patches independently, but weights of their LLMs
> were already contaminated with the other effort. Hello brave new world.)

Didn't see the previous thread. This effort was targeted more at the question
of why capture, which uses the pcap_compile -> bpf_convert flow, can't be
JIT'd. Had not looked back at other overlaps.