From: Hari Bathini <hbathini@linux.ibm.com>
To: adubey@linux.ibm.com, bpf@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org, maddy@linux.ibm.com,
ast@kernel.org, andrii@kernel.org, daniel@iogearbox.net,
shuah@kernel.org, linux-kselftest@vger.kernel.org,
stable@vger.kernel.org
Subject: Re: [PATCH v7 2/7] powerpc/bpf: Move out dummy_tramp_addr after Long branch stub
Date: Sat, 13 Jun 2026 18:07:59 +0530
Message-ID: <c1a8cf30-a229-48b7-93a5-02d7fc288fc8@linux.ibm.com>
In-Reply-To: <20260611153826.31187-3-adubey@linux.ibm.com>
On 11/06/26 9:08 pm, adubey@linux.ibm.com wrote:
> From: Abhishek Dubey <adubey@linux.ibm.com>
>
> Move the long branch address field to the bottom of the long
> branch stub. This allows uninterrupted disassembly until the
> last 8 bytes. The last bytes exclusion is logically necessary to
> prevent disassembly failure, otherwise the actual program layout
> is never altered. Hence no effect on overall program size.
> Also, align dummy_tramp_addr field with 8-byte boundary.
>
> Following is disassembler output for test program with moved down
> dummy_tramp_addr field:
> .....
> .....
> pc:68 left:44 a6 03 08 7c : mtlr 0
> pc:72 left:40 bc ff ff 4b : b .-68
> pc:76 left:36 a6 02 68 7d : mflr 11
> pc:80 left:32 05 00 9f 42 : bcl 20, 31, .+4
> pc:84 left:28 a6 02 88 7d : mflr 12
> pc:88 left:24 14 00 8c e9 : ld 12, 20(12)
> pc:92 left:20 a6 03 89 7d : mtctr 12
> pc:96 left:16 a6 03 68 7d : mtlr 11
> pc:100 left:12 20 04 80 4e : bctr
> pc:104 left:8 c0 34 1d 00 :
>
> Failure log:
> Can't disasm instruction at offset 104: c0 34 1d 00 00 00 00 c0
> Disassembly logic can truncate at 104, ignoring last 8 bytes.
>
> Update the dummy_tramp_addr field offset calculation from the end
> of the program to reflect its new location, for bpf_arch_text_poke()
> to update the actual trampoline's address in this field.
>
> All BPF trampoline selftests continue to pass with this patch applied.
>
> Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
> ---
> arch/powerpc/net/bpf_jit.h | 3 +-
> arch/powerpc/net/bpf_jit_comp.c | 51 ++++++++++++++++---------------
> arch/powerpc/net/bpf_jit_comp64.c | 3 +-
> 3 files changed, 31 insertions(+), 26 deletions(-)
>
> diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
> index 71e6e7d01057..6632de9871dd 100644
> --- a/arch/powerpc/net/bpf_jit.h
> +++ b/arch/powerpc/net/bpf_jit.h
> @@ -217,7 +217,8 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
> void bpf_jit_build_epilogue(u32 *image, u32 *fimage, struct codegen_context *ctx);
> void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context *ctx);
> void bpf_jit_realloc_regs(struct codegen_context *ctx);
> -int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr);
> +int bpf_jit_emit_exit_insn(u32 *image, u32 *fimage, struct codegen_context *ctx, int tmp_reg,
> + long exit_addr);
Yes, this does not compile on ppc32 without the corresponding
change to the bpf_jit_emit_exit_insn() caller there.
> void prepare_for_fsession_fentry(u32 *image, struct codegen_context *ctx, int cookie_cnt,
> int cookie_off, int retval_off);
> void store_func_meta(u32 *image, struct codegen_context *ctx, u64 func_meta, int func_meta_off);
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index 79288ff789b5..ebee23d33396 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -52,9 +52,10 @@ asm (
> void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context *ctx)
> {
> int ool_stub_idx, long_branch_stub_idx;
> - int ool_instrs;
> + int stubs_instrs;
>
> /*
> + * The dummy_tramp_addr field is placed at bottom of Long branch stub.
> * In the final pass, align the mis-aligned dummy_tramp_addr field
> * in the fimage. The alignment NOP must appear before OOL stub,
> * to make ool_stub_idx & long_branch_stub_idx constant from end.
> @@ -62,13 +63,10 @@ void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context
> * dummy_tramp_addr must be 8-byte aligned for load-register
> * compatibility. The fimage can be non 8-byte aligned, so final
> * alignment depends on start of fimage and the stub's instruction
> - * count offset. The OOL stub has 4 instructions (with
> - * CONFIG_PPC_FTRACE_OUT_OF_LINE) or 3 instructions (without)
> - * before dummy_tramp_addr.
> - *
> - * Emit a NOP here if (ctx->idx + ool_instrs) is odd, so that
> - * dummy_tramp_addr lands at an even instruction offset (== 8-byte
> - * aligned from an 8-byte aligned base).
> + * count. The stubs block has 11 instructions (with
> + * CONFIG_PPC_FTRACE_OUT_OF_LINE) or 10 instructions (without)
> + * before dummy_tramp_addr field. Emit a NOP if the address of
> + * dummy_tramp_addr is non aligned.
> *
> * In pass=0 when image==NULL, conservatively account for space
> * required to accommodate alignment NOP. In case final pass skips
> @@ -76,8 +74,8 @@ void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context
> * jited_len signifies correct program size.
> */
>
> - ool_instrs = IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 4*4 : 3*4;
> - if (!image || !IS_ALIGNED((unsigned long)fimage + ctx->idx*4 + ool_instrs, SZL))
> + stubs_instrs = IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 11*4 : 10*4;
This holds a size in bytes, not an instruction count, so it should
be named stubs_sz instead of stubs_instrs. So:

  stubs_sz = IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 44 : 40;
> + if (!image || !IS_ALIGNED((unsigned long)fimage + ctx->idx*4 + stubs_instrs, SZL))
> EMIT(PPC_RAW_NOP());
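To make the byte-size point concrete, the alignment check above can be
modelled in user space roughly as below. This is only a sketch under
stated assumptions: INSN_SZ, stubs_sz() and need_align_nop() are
illustrative names and not kernel symbols, SZL is assumed to be 8
(ppc64), and the 11/10 instruction counts are the ones quoted in the
comment of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INSN_SZ 4u	/* every powerpc instruction is 4 bytes */
#define SZL     8u	/* assumed: sizeof(unsigned long) on ppc64 */

/*
 * Hypothetical helper: bytes emitted between the optional alignment NOP
 * and the dummy_tramp_addr field (OOL stub + long branch stub).
 */
static unsigned int stubs_sz(bool ftrace_ool)
{
	return (ftrace_ool ? 11u : 10u) * INSN_SZ;	/* 44 or 40 bytes */
}

/*
 * Hypothetical helper: true when a NOP must be emitted before the stubs
 * so that the address field following them is SZL-aligned.
 */
static bool need_align_nop(uintptr_t fimage, unsigned int idx, bool ftrace_ool)
{
	uintptr_t field;

	assert((SZL & (SZL - 1)) == 0);	/* the mask trick needs a power of two */

	/* All three terms are byte quantities, hence "sz" and not "instrs". */
	field = fimage + (uintptr_t)idx * INSN_SZ + stubs_sz(ftrace_ool);

	return (field & (SZL - 1)) != 0;
}
```

With an 8-byte aligned base and CONFIG_PPC_FTRACE_OUT_OF_LINE enabled,
an even ctx->idx needs the NOP (44 is not a multiple of 8) and an odd
one does not.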
>
> /*
> @@ -98,35 +96,37 @@ void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context
>
> /*
> * Long branch stub:
> - * .long <dummy_tramp_addr> // 8-byte aligned
> * mflr r11
> * bcl 20,31,$+4
> - * mflr r12
> - * ld r12, -8-SZL(r12)
> + * mflr r12 // lr/r12 stores pc of current(this) inst.
> + * ld r12, 20(r12) // offset(dummy_tramp_addr) from prev inst. is 20
> * mtctr r12
> - * mtlr r11 // needed to retain ftrace ABI
> + * mtlr r11 // needed to retain ftrace ABI
> * bctr
> + * .long <dummy_tramp_addr> // 8-byte aligned
> */
> - if (image)
> - *((unsigned long *)&image[ctx->idx]) = (unsigned long)dummy_tramp;
> -
> - ctx->idx += SZL / 4;
> long_branch_stub_idx = ctx->idx;
> EMIT(PPC_RAW_MFLR(_R11));
> EMIT(PPC_RAW_BCL4());
> EMIT(PPC_RAW_MFLR(_R12));
> - EMIT(PPC_RAW_LL(_R12, _R12, -8-SZL));
> + EMIT(PPC_RAW_LL(_R12, _R12, 20));
> EMIT(PPC_RAW_MTCTR(_R12));
> EMIT(PPC_RAW_MTLR(_R11));
> EMIT(PPC_RAW_BCTR());
>
> + if (image)
> + *((unsigned long *)&image[ctx->idx]) = (unsigned long)dummy_tramp;
> +
> + ctx->idx += SZL / 4;
> +
> if (!bpf_jit_ool_stub) {
> bpf_jit_ool_stub = (ctx->idx - ool_stub_idx) * 4;
> bpf_jit_long_branch_stub = (ctx->idx - long_branch_stub_idx) * 4;
> }
> }
>
> -int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr)
> +int bpf_jit_emit_exit_insn(u32 *image, u32 *fimage, struct codegen_context *ctx,
> + int tmp_reg, long exit_addr)
> {
> if (!exit_addr || is_offset_in_branch_range(exit_addr - (ctx->idx * 4))) {
> PPC_JMP(exit_addr);
> @@ -136,7 +136,7 @@ int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg,
> PPC_JMP(ctx->alt_exit_addr);
> } else {
> ctx->alt_exit_addr = ctx->idx * 4;
> - bpf_jit_build_epilogue(image, NULL, ctx);
> + bpf_jit_build_epilogue(image, fimage, ctx);
> }
>
> return 0;
> @@ -1289,6 +1289,7 @@ static void do_isync(void *info __maybe_unused)
> * bpf_func:
> * [nop|b] ool_stub
> * 2. Out-of-line stub:
> + * nop // optional nop for alignment
> * ool_stub:
> * mflr r0
> * [b|bl] <bpf_prog>/<long_branch_stub>
> @@ -1296,14 +1297,14 @@ static void do_isync(void *info __maybe_unused)
> * b bpf_func + 4
> * 3. Long branch stub:
> * long_branch_stub:
> - * .long <branch_addr>/<dummy_tramp>
> * mflr r11
> * bcl 20,31,$+4
> * mflr r12
> - * ld r12, -16(r12)
> + * ld r12, 20(r12)
> * mtctr r12
> * mtlr r11 // needed to retain ftrace ABI
> * bctr
> + * .long <branch_addr>/<dummy_tramp>
> *
> * dummy_tramp is used to reduce synchronization requirements.
> *
> @@ -1405,10 +1406,12 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type old_t,
> * 1. Update the address in the long branch stub:
> * If new_addr is out of range, we will have to use the long branch stub, so patch new_addr
> * here. Otherwise, revert to dummy_tramp, but only if we had patched old_addr here.
> + *
> + * dummy_tramp_addr moved to bottom of long branch stub.
> */
> if ((new_addr && !is_offset_in_branch_range(new_addr - ip)) ||
> (old_addr && !is_offset_in_branch_range(old_addr - ip)))
> - ret = patch_ulong((void *)(bpf_func_end - bpf_jit_long_branch_stub - SZL),
> + ret = patch_ulong((void *)(bpf_func_end - SZL), /* SZL: dummy_tramp_addr offset */
> (new_addr && !is_offset_in_branch_range(new_addr - ip)) ?
> (unsigned long)new_addr : (unsigned long)dummy_tramp);
> if (ret)
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> index 885dc8cf55a2..eaf816a07f14 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -1726,7 +1726,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
> * we'll just fall through to the epilogue.
> */
> if (i != flen - 1) {
> - ret = bpf_jit_emit_exit_insn(image, ctx, tmp1_reg, exit_addr);
> + ret = bpf_jit_emit_exit_insn(image, fimage, ctx,
> + tmp1_reg, exit_addr);
> if (ret)
> return ret;
> }
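For reference, the two offsets this patch depends on (the 20 in the
ld and the bpf_func_end - SZL used by bpf_arch_text_poke()) can be
cross-checked against the disassembly in the changelog with a small
user-space sketch. The enum and helper names below are made up for
illustration, and SZL is assumed to be 8 (ppc64).

```c
#include <assert.h>
#include <stddef.h>

enum { INSN_SZ = 4, SZL = 8 };	/* SZL == 8 is an assumption (ppc64) */

/* Instruction slots of the long branch stub, in emission order. */
enum lb_slot {
	LB_MFLR_R11,
	LB_BCL,
	LB_MFLR_R12,	/* "bcl 20,31,$+4" leaves this slot's address in LR */
	LB_LD_R12,
	LB_MTCTR_R12,
	LB_MTLR_R11,
	LB_BCTR,
	LB_NR_INSNS,	/* dummy_tramp_addr follows the last instruction */
};

/* Byte offset of dummy_tramp_addr from the start of the long branch stub. */
static size_t lb_field_off(void)
{
	return (size_t)LB_NR_INSNS * INSN_SZ;
}

/* Displacement for the ld: field address minus the value r12 holds. */
static long lb_ld_disp(void)
{
	return (long)lb_field_off() - (long)LB_MFLR_R12 * INSN_SZ;
}

/* Offset that gets patched: the last SZL bytes of the program. */
static size_t lb_patch_off(size_t prog_len)
{
	assert(prog_len >= SZL);
	return prog_len - SZL;
}
```

In the quoted dump the stub starts at pc 76 and the program is 112
bytes long, so the field is at 76 + 28 = 104, r12 holds 84 and the
displacement is 104 - 84 = 20, matching both the ld and the patch
address.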
- Hari