* Re: [PATCH v4 1/5] powerpc/bpf: fix alignment of long branch trampoline address
2026-05-17 21:40 ` [PATCH v4 1/5] powerpc/bpf: fix alignment of long branch trampoline address adubey
@ 2026-05-17 18:02 ` sashiko-bot
2026-05-17 18:30 ` bot+bpf-ci
2026-05-18 7:18 ` Hari Bathini
2 siblings, 0 replies; 18+ messages in thread
From: sashiko-bot @ 2026-05-17 18:02 UTC
To: adubey; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Critical] JIT Buffer Overflow due to undersized pass 0 allocation.
- [High] Flawed alignment arithmetic actively misaligns the trampoline address.
- [Medium] Inline epilogues silently bypass the alignment logic.
--
commit ce3a7a39e286dcf5c4eaad0616ec652506a106fa
Author: Abhishek Dubey <adubey@linux.ibm.com>
Subject: powerpc/bpf: fix alignment of long branch trampoline address
This commit attempts to align the dummy trampoline address field between
the OOL stub and the long branch stub to 8 bytes for memory compatibility.
It does this by emitting an optional NOP depending on the alignment of the
instruction pointer.
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -49,11 +49,34 @@ asm (
> " .popsection ;"
> );
>
> -void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx)
> +void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context *ctx)
> {
> int ool_stub_idx, long_branch_stub_idx;
>
> /*
> + * In the final pass, align the mis-aligned dummy_tramp_addr field
> + * in the fimage. The alignment NOP must appear before OOL stub,
> + * to make ool_stub_idx & long_branch_stub_idx constant from end.
> + */
> +#ifdef CONFIG_PPC64
> + if (fimage && image) {
Since this condition evaluates to false during pass 0 (where fimage and
image are NULL), the alignment NOP is skipped and the required buffer size
in proglen is calculated without it.
During the final passes (1 and 2), if the NOP is emitted, could this increase
the instruction count beyond proglen and overflow into the subsequent
exception table or fixups? A later patch in the series may alter this area,
but does this patch introduce a self-contained memory corruption regression?
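The sizing concern above can be illustrated with a toy two-pass emitter. This is only a sketch under stated assumptions: toy_ctx, toy_emit(), toy_build_stubs() and toy_overrun() are hypothetical names, not kernel functions, and the stub body is modelled as a run of NOPs. The point is that an instruction emitted only when a buffer is present is never counted by the sizing pass.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NOP        0x60000000u  /* PowerPC "nop" (ori 0,0,0) encoding */
#define TOY_BUF_INSNS  64           /* deliberately oversized toy buffer */

/* Hypothetical context; only mirrors the idea of an instruction index. */
struct toy_ctx {
	size_t idx;
};

/* Store the instruction only when a buffer exists, but always count it. */
static void toy_emit(uint32_t *image, struct toy_ctx *ctx, uint32_t insn)
{
	if (image)
		image[ctx->idx] = insn;
	ctx->idx++;
}

/*
 * Toy stand-in for a stub builder: an optional alignment NOP that is
 * emitted only when a real buffer is present, followed by a fixed number
 * of stub instructions. Returns the number of instructions counted.
 */
static size_t toy_build_stubs(uint32_t *image, int need_nop, size_t stub_insns)
{
	struct toy_ctx ctx = { 0 };
	size_t i;

	if (image && need_nop)
		toy_emit(image, &ctx, TOY_NOP);
	for (i = 0; i < stub_insns; i++)
		toy_emit(image, &ctx, TOY_NOP);
	return ctx.idx;
}

/* Instructions emitted by the final pass beyond what the sizing pass counted. */
static size_t toy_overrun(int need_nop, size_t stub_insns)
{
	uint32_t buf[TOY_BUF_INSNS];
	size_t sized, emitted;

	if (stub_insns + 1 > TOY_BUF_INSNS)
		return 0;	/* keep the toy buffer itself safe */

	sized = toy_build_stubs(NULL, need_nop, stub_insns);	/* sizing pass */
	emitted = toy_build_stubs(buf, need_nop, stub_insns);	/* final pass */
	return emitted - sized;
}
```

In this model a non-zero return from toy_overrun() corresponds to code written past the length computed during sizing. Making the NOP decision independent of whether a buffer is present (or always reserving one slot while sizing) would keep the two passes in agreement.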
> + /*
> + * pc points to first instruction of OOL stub,
> + * dummy_tramp_addr is past 4/3 instructions depending on
> + * CONFIG_PPC_FTRACE_OUT_OF_LINE is enabled/not respectively.
> + *
> + * The decision to emit alignment NOP must depend on the alignment
> + * of dummy_tramp_addr field.
> + */
> + unsigned long pc = (unsigned long)fimage + CTX_NIA(ctx);
> + pc += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 4 : 3;
Because pc is an unsigned long byte address, adding 4 or 3 advances it by
4 or 3 bytes rather than the intended 16 or 12 bytes (PowerPC instructions
are 4 bytes long).
Adding 4 bytes to an 8-byte aligned address makes it unaligned. Does this
flawed arithmetic force the code to emit a NOP when the target address is
already aligned, thereby misaligning it?
> + if (!IS_ALIGNED(pc, 8))
> + EMIT(PPC_RAW_NOP());
> + }
> +#endif
[ ... ]
> @@ -107,7 +131,7 @@ int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg,
> PPC_JMP(ctx->alt_exit_addr);
> } else {
> ctx->alt_exit_addr = ctx->idx * 4;
> - bpf_jit_build_epilogue(image, ctx);
> + bpf_jit_build_epilogue(image, NULL, ctx);
> }
Because fimage is strictly passed as NULL here, the if (fimage && image)
check in bpf_jit_build_fentry_stubs() will always evaluate to false.
Will this cause the alignment NOP to be unconditionally skipped for all inline
epilogues, leaving their dummy_tramp_addr fields unaligned?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260517214043.12975-1-adubey@linux.ibm.com?part=1
* Re: [PATCH v4 4/5] selftest/bpf: Enable verifier selftest for powerpc64
2026-05-17 21:40 ` [PATCH v4 4/5] selftest/bpf: Enable verifier selftest for powerpc64 adubey
@ 2026-05-17 18:18 ` bot+bpf-ci
0 siblings, 0 replies; 18+ messages in thread
From: bot+bpf-ci @ 2026-05-17 18:18 UTC
To: adubey, bpf
Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
linux-kselftest, stable, adubey, ast, andrii, daniel, martin.lau,
eddyz87, yonghong.song, clm, ihor.solodrai
> diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h b/tools/testing/selftests/bpf/progs/bpf_misc.h
> index 9eeb5b0b63d6..cdc2a3de3054 100644
> --- a/tools/testing/selftests/bpf/progs/bpf_misc.h
> +++ b/tools/testing/selftests/bpf/progs/bpf_misc.h
> @@ -158,6 +158,7 @@
> #define __arch_arm64 __arch("ARM64")
> #define __arch_riscv64 __arch("RISCV64")
> #define __arch_s390x __arch("s390x")
> +#define __arch_powerpc64 __arch("POWERPC64")
> #define __caps_unpriv(caps) __test_tag("test_caps_unpriv=" EXPAND_QUOTE(caps))
> #define __load_if_JITed() __test_tag("load_mode=jited")
> #define __load_if_no_JITed() __test_tag("load_mode=no_jited")
> diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c
> index abdb9e6e3713..d5589355ed9e 100644
> --- a/tools/testing/selftests/bpf/test_loader.c
> +++ b/tools/testing/selftests/bpf/test_loader.c
> @@ -377,6 +377,7 @@ enum arch {
> ARCH_ARM64 = 0x4,
> ARCH_RISCV64 = 0x8,
> ARCH_S390X = 0x10,
> + ARCH_POWERPC64 = 0x20,
> };
>
> static int get_current_arch(void)
> @@ -389,6 +390,8 @@ static int get_current_arch(void)
> return ARCH_RISCV64;
> #elif defined(__s390x__)
> return ARCH_S390X;
> +#elif defined(__powerpc64__)
> + return ARCH_POWERPC64;
> #endif
> return ARCH_UNKNOWN;
> }
> @@ -580,6 +583,8 @@ static int parse_test_spec(struct test_loader *tester,
> arch = ARCH_RISCV64;
> } else if (strcmp(val, "s390x") == 0) {
> arch = ARCH_S390X;
> + } else if (strcmp(val, "POWERPC64") == 0) {
> + arch = ARCH_POWERPC64;
> } else {
> PRINT_FAIL("bad arch spec: '%s'\n", val);
> err = -EINVAL;
This patch looks clean. However, unaddressed review comments from earlier
versions of related patches in the series were identified:
https://lore.kernel.org/bpf/20260517180201.87B8AC2BCB0@smtp.kernel.org
Three issues were raised by sashiko-bot@kernel.org on Sun, 17 May 2026 about
alignment logic in arch/powerpc/net/bpf_jit_comp.c (from a related commit in
this series):
Issue 1: In bpf_jit_build_fentry_stubs(), the alignment NOP is only emitted
when (fimage && image) is true. During pass 0 sizing, both are NULL, so the
NOP is skipped and proglen is calculated without it.
During final passes 1 and 2, if the NOP is emitted, could this increase the
instruction count beyond proglen and overflow into the exception table or
fixups?
Issue 2: In the same function, the alignment calculation appears flawed:
unsigned long pc = (unsigned long)fimage + CTX_NIA(ctx);
pc += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 4 : 3;
The comment says "4/3 instructions" but pc is a byte address. Adding 4 or
3 adds bytes instead of the intended 16 or 12 bytes (since PowerPC
instructions are 4 bytes).
Adding 4 bytes to an 8-byte aligned address makes it unaligned. Does this
force the code to emit a NOP when the target address is already aligned?
Issue 3: In bpf_jit_emit_exit_insn(), the code passes NULL for fimage:
} else {
ctx->alt_exit_addr = ctx->idx * 4;
bpf_jit_build_epilogue(image, NULL, ctx);
Because fimage is strictly passed as NULL, the (fimage && image) check will
always evaluate to false. Will this cause the alignment NOP to be
unconditionally skipped for all inline epilogues, leaving their
dummy_tramp_addr fields unaligned?
These comments should be addressed in the related commits of this series.
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25998434343
* Re: [PATCH v4 3/5] selftest/bpf: Fixing powerpc JIT disassembly failure
2026-05-17 21:40 ` [PATCH v4 3/5] selftest/bpf: Fixing powerpc JIT disassembly failure adubey
@ 2026-05-17 18:18 ` bot+bpf-ci
2026-05-17 18:38 ` sashiko-bot
1 sibling, 0 replies; 18+ messages in thread
From: bot+bpf-ci @ 2026-05-17 18:18 UTC
To: adubey, bpf
Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
linux-kselftest, stable, adubey, ast, andrii, daniel, martin.lau,
eddyz87, yonghong.song, clm, ihor.solodrai
> diff --git a/tools/testing/selftests/bpf/jit_disasm_helpers.c b/tools/testing/selftests/bpf/jit_disasm_helpers.c
> index 364c557c5115..4c6bcbe08491 100644
> --- a/tools/testing/selftests/bpf/jit_disasm_helpers.c
> +++ b/tools/testing/selftests/bpf/jit_disasm_helpers.c
> @@ -170,9 +170,11 @@ int get_jited_program_text(int fd, char *text, size_t text_sz)
> struct bpf_prog_info info = {};
> __u32 info_len = sizeof(info);
> __u32 jited_funcs, len, pc;
> + __u32 trunc_len = 0;
> __u32 *func_lens = NULL;
> FILE *text_out = NULL;
> uint8_t *image = NULL;
> + char *triple = NULL;
> int i, err = 0;
>
> if (!llvm_initialized) {
[ ... ]
> @@ -216,9 +218,18 @@ int get_jited_program_text(int fd, char *text, size_t text_sz)
> if (!ASSERT_OK(err, "bpf_prog_get_info_by_fd #2"))
> goto out;
>
> + /*
> + * last 8 bytes contains dummy_trampoline address in JIT
> + * output for 64-bit and 32-bit powerpc, which can't
> + * disassemble a to valid instruction.
> + */
> + triple = LLVMGetDefaultTargetTriple();
> + if (strstr(triple, "powerpc"))
> + trunc_len = 8;
> +
Does this code leak the triple string? Looking at the LLVM C API,
LLVMGetDefaultTargetTriple() returns a string that must be freed with
LLVMDisposeMessage(). The sibling function disasm_one_func() in this
same file shows the correct pattern:
if (triple)
LLVMDisposeMessage(triple);
The cleanup section at the 'out:' label properly frees text_out, image,
and func_lens, but triple isn't included. While this is test code and
the leak is small, it violates the established cleanup pattern in the
file and can accumulate across test runs.
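The ownership pattern described above can be sketched in isolation. To keep the example buildable without LLVM, fake_get_triple() and fake_dispose() are hypothetical stand-ins for LLVMGetDefaultTargetTriple() and LLVMDisposeMessage(), and live_msgs exists only to make the allocation balance visible; none of these names come from the selftest.

```c
#include <stdlib.h>
#include <string.h>

static int live_msgs;	/* outstanding allocations, for illustration only */

/* Hypothetical stand-in for LLVMGetDefaultTargetTriple(): caller owns the result. */
static char *fake_get_triple(void)
{
	const char *s = "powerpc64le-unknown-linux-gnu";
	char *copy = malloc(strlen(s) + 1);

	if (copy) {
		strcpy(copy, s);
		live_msgs++;
	}
	return copy;
}

/* Hypothetical stand-in for LLVMDisposeMessage(). */
static void fake_dispose(char *msg)
{
	free(msg);
	live_msgs--;
}

/* Query the triple, derive the truncation length, then release the string. */
static unsigned int pick_trunc_len(void)
{
	unsigned int trunc_len = 0;
	char *triple = fake_get_triple();

	if (triple && strstr(triple, "powerpc"))
		trunc_len = 8;

	/* Mirrors the "if (triple) LLVMDisposeMessage(triple);" cleanup. */
	if (triple)
		fake_dispose(triple);
	return trunc_len;
}
```

In get_jited_program_text() the equivalent release would presumably sit with the other frees under the 'out:' label, so that every early exit is covered as well.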
> for (pc = 0, i = 0; i < jited_funcs; ++i) {
> fprintf(text_out, "func #%d:\n", i);
> - disasm_one_func(text_out, image + pc, func_lens[i]);
> + disasm_one_func(text_out, image + pc, func_lens[i] - trunc_len);
> fprintf(text_out, "\n");
> pc += func_lens[i];
> }
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25998434343
* Re: [PATCH v4 2/5] powerpc/bpf: Move out dummy_tramp_addr after Long branch stub
2026-05-17 21:40 ` [PATCH v4 2/5] powerpc/bpf: Move out dummy_tramp_addr after Long branch stub adubey
@ 2026-05-17 18:23 ` sashiko-bot
2026-05-17 18:30 ` bot+bpf-ci
2026-05-18 7:25 ` Hari Bathini
2 siblings, 0 replies; 18+ messages in thread
From: sashiko-bot @ 2026-05-17 18:23 UTC
To: adubey; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Critical] The JIT alignment logic for `dummy_tramp_addr` is skipped during the sizing pass but executed during code generation, causing buffer overflows and BPF trampoline corruption.
- [High] The alignment prediction logic for `dummy_tramp_addr` incorrectly adds unscaled bytes instead of instructions, breaking the alignment and causing `bpf_arch_text_poke()` to fail.
- [Medium] Commit message claims an implementation that does not exist in the code.
--
commit e4939a5c93f678ddd053f906d998b2151b87ea0f
Author: Abhishek Dubey <adubey@linux.ibm.com>
powerpc/bpf: Move out dummy_tramp_addr after Long branch stub
This commit moves the dummy_tramp_addr field to the bottom of the long
branch stub to allow uninterrupted disassembly. It also updates the offset
calculation for bpf_arch_text_poke() to reflect the new location.
> Move the long branch address space to the bottom of the long
> branch stub. This allows uninterrupted disassembly until the
> last 8 bytes. Exclude these last bytes from the overall
> program length to prevent failure in assembly generation.
The commit message states that the last bytes are excluded from the overall
program length. Is this description accurate?
Looking at the code, ctx->idx is not decremented, and the length calculation
still appears to include the 8 bytes of dummy_tramp_addr.
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index ef7614177cb1f..b73bc9295c31d 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -57,19 +57,21 @@ void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context
[ ... ]
> #ifdef CONFIG_PPC64
> if (fimage && image) {
> /*
> * pc points to first instruction of OOL stub,
> - * dummy_tramp_addr is past 4/3 instructions depending on
> + * dummy_tramp_addr is past 11/10 instructions depending on
> * CONFIG_PPC_FTRACE_OUT_OF_LINE is enabled/not respectively.
> *
> * The decision to emit alignment NOP must depend on the alignment
> * of dummy_tramp_addr field.
> */
> unsigned long pc = (unsigned long)fimage + CTX_NIA(ctx);
> - pc += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 4 : 3;
> + pc += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 11 : 10;
>
> if (!IS_ALIGNED(pc, 8))
> EMIT(PPC_RAW_NOP());
> }
> #endif
Does this alignment prediction logic add an unscaled instruction count to a
byte address?
Since pc is a byte address (and is itself 4-byte aligned), adding 11 or 10
directly, rather than 11 * 4 or 10 * 4 bytes, guarantees that the result is
never a multiple of 8. The NOP would therefore be emitted unconditionally,
merely shifting the layout by 4 bytes instead of padding it only when needed.
If the NOP is unconditionally emitted here, could dummy_tramp_addr remain
8-byte misaligned for many BPF programs, causing patch_ulong() to strictly
reject the address and BPF trampoline attachments to fail with -EINVAL?
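The "unconditionally emitted" claim can be checked with a few lines of C. This is a sketch, assuming only that pc is 4-byte aligned (so pc is congruent to 0 or 4 modulo 8); nop_always_emitted() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if "pc + offset" misses 8-byte alignment for every 4-byte-aligned pc. */
static bool nop_always_emitted(unsigned int offset)
{
	/* A 4-byte-aligned pc is congruent to 0 or 4 modulo 8. */
	const uintptr_t residues[] = { 0, 4 };

	for (unsigned int i = 0; i < 2; i++)
		if ((residues[i] + offset) % 8 == 0)
			return false;
	return true;
}
```

With the unscaled offsets 11 and 10 the helper returns true, so the NOP check always fires; with the byte offsets 44 and 40 it returns false, meaning the decision actually depends on pc.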
Additionally, does this conditional alignment logic cause buffer overflows?
During the sizing pass, fimage is NULL, so the NOP emission is skipped and
the program length is calculated without it. During the code generation
pass, the NOP is emitted, causing the generated code to exceed the allocated
length by 4 bytes. Could this silently overwrite JIT metadata allocated
immediately after the program?
[ ... ]
> @@ -1400,10 +1404,12 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type old_t,
> * 1. Update the address in the long branch stub:
> * If new_addr is out of range, we will have to use the long branch stub, so patch new_addr
> * here. Otherwise, revert to dummy_tramp, but only if we had patched old_addr here.
> + *
> + * dummy_tramp_addr moved to bottom of long branch stub.
> */
> if ((new_addr && !is_offset_in_branch_range(new_addr - ip)) ||
> (old_addr && !is_offset_in_branch_range(old_addr - ip)))
> - ret = patch_ulong((void *)(bpf_func_end - bpf_jit_long_branch_stub - SZL),
> + ret = patch_ulong((void *)(bpf_func_end - SZL), /* SZL: dummy_tramp_addr offset */
> (new_addr && !is_offset_in_branch_range(new_addr - ip)) ?
> (unsigned long)new_addr : (unsigned long)dummy_tramp);
> if (ret)
If the extra NOP is emitted during generation and the layout shifts by 4
bytes, dummy_tramp_addr would be physically located at bpf_func_end - 4.
Since bpf_arch_text_poke() patches bpf_func_end - SZL (which evaluates to
8 bytes), could patch_ulong() overwrite the bctr instruction with the new
target address instead of dummy_tramp_addr, leading to an illegal instruction
panic when the trampoline executes?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260517214043.12975-1-adubey@linux.ibm.com?part=2
* Re: [PATCH v4 1/5] powerpc/bpf: fix alignment of long branch trampoline address
2026-05-17 21:40 ` [PATCH v4 1/5] powerpc/bpf: fix alignment of long branch trampoline address adubey
2026-05-17 18:02 ` sashiko-bot
@ 2026-05-17 18:30 ` bot+bpf-ci
2026-05-18 7:18 ` Hari Bathini
2 siblings, 0 replies; 18+ messages in thread
From: bot+bpf-ci @ 2026-05-17 18:30 UTC
To: adubey, bpf
Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
linux-kselftest, stable, adubey, ast, andrii, daniel, martin.lau,
eddyz87, yonghong.song, clm, ihor.solodrai
> diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
> index f32de8704d4d..71e6e7d01057 100644
> --- a/arch/powerpc/net/bpf_jit.h
> +++ b/arch/powerpc/net/bpf_jit.h
> @@ -214,8 +214,8 @@ int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context *
> int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct codegen_context *ctx,
> u32 *addrs, int pass, bool extra_pass);
> void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
> -void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
> -void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx);
> +void bpf_jit_build_epilogue(u32 *image, u32 *fimage, struct codegen_context *ctx);
> +void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context *ctx);
> void bpf_jit_realloc_regs(struct codegen_context *ctx);
> int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr);
> void prepare_for_fsession_fentry(u32 *image, struct codegen_context *ctx, int cookie_cnt,
The header file arch/powerpc/net/bpf_jit.h is shared by both the 64-bit
(bpf_jit_comp64.c) and 32-bit (bpf_jit_comp32.c) BPF JIT compilers (per
arch/powerpc/net/Makefile). The patch updates these prototypes to add a
new 'u32 *fimage' parameter, and modifies the PPC64 implementation
(bpf_jit_comp64.c) and the shared file (bpf_jit_comp.c) accordingly.
However, arch/powerpc/net/bpf_jit_comp32.c is not updated. In that file,
bpf_jit_build_epilogue() still has the 2-argument signature:
void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
and its call to bpf_jit_build_fentry_stubs() passes only 2 arguments:
bpf_jit_build_fentry_stubs(image, ctx);
Will this cause a function signature conflict under CONFIG_PPC32 builds
(CONFIG_PPC_BOOK3S_32, CONFIG_PPC_8xx, etc.) since the header now
declares the 3-argument prototype while the .c file defines the
2-argument version?
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index 53ab97ad6074..ef7614177cb1 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -49,11 +49,34 @@ asm (
> " .popsection ;"
> );
>
> -void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx)
> +void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context *ctx)
> {
> int ool_stub_idx, long_branch_stub_idx;
>
> /*
> + * In the final pass, align the mis-aligned dummy_tramp_addr field
> + * in the fimage. The alignment NOP must appear before OOL stub,
> + * to make ool_stub_idx & long_branch_stub_idx constant from end.
> + */
> +#ifdef CONFIG_PPC64
> + if (fimage && image) {
During pass 0 (where fimage and image are NULL), this condition evaluates
to false. The alignment NOP is not counted in proglen during size
calculation. During the final passes (1 and 2), if the NOP is emitted,
could this increase the instruction count beyond proglen and overflow into
the subsequent exception table or fixups?
> + /*
> + * pc points to first instruction of OOL stub,
> + * dummy_tramp_addr is past 4/3 instructions depending on
> + * CONFIG_PPC_FTRACE_OUT_OF_LINE is enabled/not respectively.
> + *
> + * The decision to emit alignment NOP must depend on the alignment
> + * of dummy_tramp_addr field.
> + */
> + unsigned long pc = (unsigned long)fimage + CTX_NIA(ctx);
> + pc += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 4 : 3;
> +
> + if (!IS_ALIGNED(pc, 8))
> + EMIT(PPC_RAW_NOP());
> + }
The alignment calculation appears to be incorrect. The code adds 4 or 3 to
the pc value, which is a byte address. Since PowerPC instructions are 4
bytes each, adding 4 or 3 bytes gives the wrong offset.
Looking at the OOL stub instructions that follow this block, with
CONFIG_PPC_FTRACE_OUT_OF_LINE there are 4 instructions (mflr r0; nop;
mtlr r0; b bpf_func+4). This produces a 16-byte offset (4 instructions * 4
bytes). Without the config, there are 3 instructions producing a 12-byte
offset.
The dummy_tramp_addr field would therefore be at pc+16 or pc+12, not pc+4
or pc+3.
For example, with CONFIG_PPC_FTRACE_OUT_OF_LINE enabled and pc 8-byte
aligned (pc%8 == 0):
- True dummy_tramp_addr position: pc+16, which is (pc+16)%8 == 0, already
aligned.
- But the code calculates: pc += 4 -> pc+4, then !IS_ALIGNED(pc+4, 8) is
true -> emits NOP.
- With the NOP, dummy_tramp_addr ends up at pc+4+16 = pc+20, and (pc+20)%8
== 4, which is misaligned.
Similarly, when pc%8 == 4:
- True dummy_tramp_addr position: pc+16, which is (pc+16)%8 == 4, not
aligned.
- But the code calculates: pc += 4 -> pc+4, then IS_ALIGNED(pc+4, 8) is
true -> skips NOP.
- dummy_tramp_addr remains at pc+16, which is misaligned.
Does this flawed arithmetic force the code to emit a NOP when the target
address is already aligned, thereby misaligning it instead of aligning it?
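The two cases walked through above can be reproduced with a small model. It is a sketch, not the JIT code: nop_unscaled() mirrors the decision as written in the patch, nop_scaled() is the assumed intent (count multiplied by the 4-byte instruction size), and field_addr() gives the resulting byte address of the field that follows the stub. All three names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define INSN_SIZE 4u	/* every PowerPC instruction is 4 bytes */

/* NOP decision as written in the patch: adds the raw instruction count. */
static bool nop_unscaled(uintptr_t pc, unsigned int insns)
{
	return ((pc + insns) % 8) != 0;
}

/* NOP decision with the count scaled to bytes (assumed intent). */
static bool nop_scaled(uintptr_t pc, unsigned int insns)
{
	return ((pc + insns * INSN_SIZE) % 8) != 0;
}

/* Byte address of the field after the stub, given the NOP decision. */
static uintptr_t field_addr(uintptr_t pc, unsigned int insns, bool nop)
{
	return pc + (nop ? INSN_SIZE : 0) + insns * INSN_SIZE;
}
```

For a 4-instruction stub, pc = 0x1000 and pc = 0x1004 cover both residues: the unscaled decision leaves the field at an address with remainder 4 modulo 8 in both cases, while the scaled decision lands it on an 8-byte boundary in both.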
> @@ -107,7 +131,7 @@ int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg,
> PPC_JMP(ctx->alt_exit_addr);
> } else {
> ctx->alt_exit_addr = ctx->idx * 4;
> - bpf_jit_build_epilogue(image, ctx);
> + bpf_jit_build_epilogue(image, NULL, ctx);
> }
Because fimage is strictly passed as NULL here, the if (fimage && image)
check in bpf_jit_build_fentry_stubs() will always evaluate to false. Will
this cause the alignment NOP to be unconditionally skipped for all inline
epilogues, leaving their dummy_tramp_addr fields unaligned?
[ ... ]
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> index db364d9083e7..885dc8cf55a2 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -398,7 +398,7 @@ static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context *ctx
> }
> }
>
> -void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
> +void bpf_jit_build_epilogue(u32 *image, u32 *fimage, struct codegen_context *ctx)
> {
> bpf_jit_emit_common_epilogue(image, ctx);
>
> @@ -407,7 +407,7 @@ void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
>
> EMIT(PPC_RAW_BLR());
>
> - bpf_jit_build_fentry_stubs(image, ctx);
> + bpf_jit_build_fentry_stubs(image, fimage, ctx);
> }
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25998434343
* Re: [PATCH v4 2/5] powerpc/bpf: Move out dummy_tramp_addr after Long branch stub
2026-05-17 21:40 ` [PATCH v4 2/5] powerpc/bpf: Move out dummy_tramp_addr after Long branch stub adubey
2026-05-17 18:23 ` sashiko-bot
@ 2026-05-17 18:30 ` bot+bpf-ci
2026-05-18 7:25 ` Hari Bathini
2 siblings, 0 replies; 18+ messages in thread
From: bot+bpf-ci @ 2026-05-17 18:30 UTC
To: adubey, bpf
Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
linux-kselftest, stable, adubey, ast, andrii, daniel, martin.lau,
eddyz87, yonghong.song, clm, ihor.solodrai
> commit c1b9d3c3e6cf6547bc01395bd0576b1ea8dbd1d2
> Author: Abhishek Dubey <adubey@linux.ibm.com>
> Date: Sun May 17 17:40:40 2026 -0400
>
> powerpc/bpf: Move out dummy_tramp_addr after Long branch stub
>
> Move the long branch address space to the bottom of the long
> branch stub. This allows uninterrupted disassembly until the
> last 8 bytes. Exclude these last bytes from the overall
> program length to prevent failure in assembly generation.
> Also, align dummy_tramp_addr field with 8-byte boundary.
>
> Following is disassembler output for test program with moved down
> dummy_tramp_addr field:
> .....
> .....
> pc:68 left:44 a6 03 08 7c : mtlr 0
> pc:72 left:40 bc ff ff 4b : b .-68
> pc:76 left:36 a6 02 68 7d : mflr 11
> pc:80 left:32 05 00 9f 42 : bcl 20, 31, .+4
> pc:84 left:28 a6 02 88 7d : mflr 12
> pc:88 left:24 14 00 8c e9 : ld 12, 20(12)
> pc:92 left:20 a6 03 89 7d : mtctr 12
> pc:96 left:16 a6 03 68 7d : mtlr 11
> pc:100 left:12 20 04 80 4e : bctr
> pc:104 left:8 c0 34 1d 00 :
>
> Failure log:
> Can't disasm instruction at offset 104: c0 34 1d 00 00 00 00 c0
> Disassembly logic can truncate at 104, ignoring last 8 bytes.
>
> Update the dummy_tramp_addr field offset calculation from the end
> of the program to reflect its new location, for bpf_arch_text_poke()
> to update the actual trampoline's address in this field.
>
> All BPF trampoline selftests continue to pass with this patch applied.
>
> Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
This looks like a bug fix for the alignment issue introduced in the
previous commit. Should this include:
Fixes: 1fac3cfb82ab ("powerpc/bpf: fix alignment of long branch trampoline address")
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25998434343
* Re: [PATCH v4 3/5] selftest/bpf: Fixing powerpc JIT disassembly failure
2026-05-17 21:40 ` [PATCH v4 3/5] selftest/bpf: Fixing powerpc JIT disassembly failure adubey
2026-05-17 18:18 ` bot+bpf-ci
@ 2026-05-17 18:38 ` sashiko-bot
1 sibling, 0 replies; 18+ messages in thread
From: sashiko-bot @ 2026-05-17 18:38 UTC
To: adubey; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Truncating 8 bytes on 32-bit PowerPC incorrectly removes a valid instruction from the JIT disassembly output.
--
commit 1ee8cba34b15e2d991f046b6802fda4443b02cd0
Author: Abhishek Dubey <adubey@linux.ibm.com>
selftest/bpf: Fixing powerpc JIT disassembly failure
This commit truncates the last 8 bytes from JITed buffers on PowerPC
architectures prior to disassembly in the BPF selftests. This is intended
to prevent the disassembler from misinterpreting a trailing dummy
trampoline address as a valid instruction.
> diff --git a/tools/testing/selftests/bpf/jit_disasm_helpers.c b/tools/testing/selftests/bpf/jit_disasm_helpers.c
> --- a/tools/testing/selftests/bpf/jit_disasm_helpers.c
> +++ b/tools/testing/selftests/bpf/jit_disasm_helpers.c
[ ... ]
> @@ -216,9 +218,18 @@ int get_jited_program_text(int fd, char *text, size_t text_sz)
> if (!ASSERT_OK(err, "bpf_prog_get_info_by_fd #2"))
> goto out;
>
> + /*
> + * last 8 bytes contains dummy_trampoline address in JIT
> + * output for 64-bit and 32-bit powerpc, which can't
> + * disassemble a to valid instruction.
> + */
> + triple = LLVMGetDefaultTargetTriple();
> + if (strstr(triple, "powerpc"))
> + trunc_len = 8;
Will this remove a valid instruction on 32-bit PowerPC targets?
The dummy trampoline address is stored as an unsigned long, which occupies 8
bytes on 64-bit PowerPC, but only 4 bytes on 32-bit PowerPC.
Because the long branch stub ends with a 4-byte bctr instruction followed by
the dummy address, unconditionally truncating 8 bytes on 32-bit PowerPC would
remove both the 4-byte dummy address and the valid 4-byte bctr instruction
from the disassembly.
Does the truncation length need to be determined dynamically based on whether
the target is 32-bit or 64-bit?
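One possible shape for such a check is sketched below. It assumes the trailing address occupies sizeof(unsigned long) on the target (8 bytes on ppc64, 4 on ppc32) and that the default triple starts with "powerpc64" or "powerpc"; trunc_len_for_triple() is a hypothetical helper, not part of the selftest.

```c
#include <string.h>

/*
 * Bytes to trim from the end of each JITed function before disassembly,
 * derived from the target triple string. Returns 0 for non-powerpc targets.
 */
static unsigned int trunc_len_for_triple(const char *triple)
{
	if (!triple)
		return 0;
	/* Test the longer prefix first: "powerpc64" also contains "powerpc". */
	if (strstr(triple, "powerpc64"))
		return 8;
	if (strstr(triple, "powerpc"))
		return 4;
	return 0;
}
```

Checking "powerpc64" before "powerpc" matters because strstr() would otherwise match the shorter name on 64-bit targets too.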
> for (pc = 0, i = 0; i < jited_funcs; ++i) {
> fprintf(text_out, "func #%d:\n", i);
> - disasm_one_func(text_out, image + pc, func_lens[i]);
> + disasm_one_func(text_out, image + pc, func_lens[i] - trunc_len);
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260517214043.12975-1-adubey@linux.ibm.com?part=3
* Re: [PATCH v4 5/5] selftest/bpf: Add tailcall verifier selftest for powerpc64
2026-05-17 21:40 ` [PATCH v4 5/5] selftest/bpf: Add tailcall " adubey
@ 2026-05-17 19:14 ` sashiko-bot
0 siblings, 0 replies; 18+ messages in thread
From: sashiko-bot @ 2026-05-17 19:14 UTC
To: adubey; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] This problem wasn't introduced by this patch, but the selftest enforces the use of a 32-bit comparison (`cmplwi`) for distinguishing 64-bit pointers from scalar values, codifying a tail-call limit bypass vulnerability.
- [Medium] The expected powerpc64 JIT disassembly hardcodes config-dependent instructions and offsets, causing test failures on kernels with `CONFIG_PPC_KERNEL_PCREL` enabled.
--
commit 9136278f4ffaab3858500dc94c8a08d981e53500
Author: Abhishek Dubey <adubey@linux.ibm.com>
selftest/bpf: Add tailcall verifier selftest for powerpc64
This commit adds a verifier tailcall test case specifically for the powerpc64
architecture. It verifies the JIT compiler's code generation for BPF tail
calls by matching expected instructions.
> diff --git a/tools/testing/selftests/bpf/progs/verifier_tailcall_jit.c b/tools/testing/selftests/bpf/progs/verifier_tailcall_jit.c
> index 8d60c634a114f..17475ecb3207a 100644
> --- a/tools/testing/selftests/bpf/progs/verifier_tailcall_jit.c
> +++ b/tools/testing/selftests/bpf/progs/verifier_tailcall_jit.c
> @@ -90,6 +90,75 @@ __jited(" popq %rax")
> __jited(" jmp {{.*}}") /* jump to tail call tgt */
> __jited("L0: leave")
> __jited(" {{(retq|jmp 0x)}}") /* return or jump to rethunk */
> +__arch_powerpc64
> +/* program entry for main(), regular function prologue */
> +__jited(" nop")
> +__jited(" ld 2, 16(13)")
Are we sure this load is always emitted?
On modern PowerPC configurations where CONFIG_PPC_KERNEL_PCREL is enabled,
the TOC load might be omitted by the JIT compiler.
Could this use wildcards to prevent test failures on kernels with PC-relative
addressing enabled?
[ ... ]
> +/* tail call prologue for subprogram */
> +__jited(" ld 10, 0(1)")
> +__jited(" ld 9, -8(10)")
> +__jited(" cmplwi 9, 33")
This isn't a bug introduced by this patch, but does the use of cmplwi here
codify a vulnerability in the JIT compiler?
Since tail_call_info can contain either a scalar counter value or a 64-bit
pointer to the counter, using a 32-bit compare (cmplwi) only checks the lower
32 bits.
If a 64-bit pointer's lower 32 bits happen to be <= 33 (for example, a stack
pointer near a 4GB boundary), it might incorrectly bypass the tail call limit
and cause an infinite loop leading to a CPU hard lockup.
Should the JIT (and this selftest) be using cmpldi instead to properly handle
the 64-bit comparison?
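The difference between the two compares can be modelled in plain C. This is a sketch of the concern only: TAIL_CALL_LIMIT is an illustrative name for the immediate 33 seen in the quoted test, and the two helpers model the word and doubleword unsigned compares rather than any JIT code.

```c
#include <stdbool.h>
#include <stdint.h>

#define TAIL_CALL_LIMIT 33u	/* immediate used in the quoted cmplwi lines */

/* Models cmplwi: unsigned compare of the low 32 bits only. */
static bool looks_like_count_32(uint64_t v)
{
	return (uint32_t)v <= TAIL_CALL_LIMIT;
}

/* Models cmpldi: unsigned compare of the full 64-bit value. */
static bool looks_like_count_64(uint64_t v)
{
	return v <= TAIL_CALL_LIMIT;
}
```

A pointer-like value such as 0xc000000100000010 has low 32 bits equal to 16, so the 32-bit model classifies it as a counter while the 64-bit model does not. Whether such an address can actually occur for tail_call_info is the open question raised above.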
[ ... ]
> +/* bpf_tail_call implementation */
> +__jited(" ld 9, -8(1)")
> +__jited(" cmplwi 9, 33")
> +__jited(" bf {{.*}}, {{.*}}")
> +__jited(" ld 9, 0(9)")
> +__jited(" cmplwi 9, 33")
(Same question here regarding the use of cmplwi instead of cmpldi)
[ ... ]
> +__jited(" ld 10, {{.*}}(10)")
> +__jited(" cmpldi 10, 0")
> +__jited(" bt {{.*}}, {{.*}}")
> +__jited(" ld 10, {{.*}}(10)")
> +__jited(" addi 10, 10, 16")
Does this instruction offset also depend on the kernel configuration?
It looks like the prologue size calculation that produces this offset depends
on !IS_ENABLED(CONFIG_PPC_KERNEL_PCREL).
Could this fail on configurations where PC-relative addressing is enabled, and
would wildcards be more appropriate here as well?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260517214043.12975-1-adubey@linux.ibm.com?part=5
* [PATCH v4 0/5] powerpc/bpf: Add support for verifier selftest
@ 2026-05-17 21:40 adubey
2026-05-17 21:40 ` [PATCH v4 1/5] powerpc/bpf: fix alignment of long branch trampoline address adubey
` (5 more replies)
0 siblings, 6 replies; 18+ messages in thread
From: adubey @ 2026-05-17 21:40 UTC (permalink / raw)
To: bpf
Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
linux-kselftest, stable, Abhishek Dubey
From: Abhishek Dubey <adubey@linux.ibm.com>
The verifier selftest validates JITed instructions by matching expected
disassembly output. The first two patches fix issues in powerpc instruction
disassembly that were causing test flow failures. The fix is common for
64-bit & 32-bit powerpc. Add support for the powerpc-specific "__arch_powerpc64"
architecture tag in the fourth patch, enabling proper test filtering in
verifier test files. Introduce verifier testcases for tailcalls on powerpc64
in the final patch.
The first patch in the series is a fix that aligns the long branch
trampoline address to an 8-byte boundary. The subsequent
patches enable verifier selftests on powerpc.
Issue Details:
--------------
The long branch stub in the trampoline implementation[1] provides
the flexibility to handle short as well as long branch distances to
the actual trampoline. However, the 8-byte dummy_tramp_addr field
sitting before the long branch stub leads to failures when enabling
verifier-based selftests for ppc64.
The verifier selftests require disassembling the final JITed image
to get native instructions. The disassembled instruction
sequence is then matched against the sequence of instructions provided
in the test file under the __jited() wrapper. The final JITed image
contains the out-of-line stub and the long branch stub as part of
epilogue JITing for a bpf program. The 8 bytes reserved for
dummy_tramp_addr are sandwiched between these two stubs. These 8 bytes
hold the memory address of the dummy trampoline during trampoline
invocation and do not correspond to any powerpc instruction. So,
disassembly fails, resulting in failure of the verifier selftests.
The following code snippet shows the problem with the current
arrangement made for dummy_tramp_addr.
/* Out-of-line stub */
mflr r0
[b|bl] tramp
mtlr r0 //only with OOL
b bpf_func + 4
/* Long branch stub */
.long <dummy_tramp_addr> <---Invalid bytes sequence, disassembly fails
mflr r11
bcl 20,31,$+4
mflr r12
ld r12, -8-SZL(r12)
mtctr r12
mtlr r11 //retain ftrace ABI
bctr
Consider test program binary of size 112 bytes:
0: 00000060 10004de8 00002039 f8ff21f9 81ff21f8 7000e1fb 3000e13b
28: 3000e13b 2a006038 f8ff7ff8 00000039 7000e1eb 80002138 7843037d
56: 2000804e a602087c 00000060 a603087c bcffff4b c0341d00 000000c0
84: a602687d 05009f42 a602887d f0ff8ce9 a603897d a603687d 2004804e
Disassembly output of above binary for ppc64le:
pc:0 left:112 00 00 00 60 : nop
pc:4 left:108 10 00 4d e8 : ld 2, 16(13)
pc:8 left:104 00 00 20 39 : li 9, 0
pc:12 left:100 f8 ff 21 f9 : std 9, -8(1)
pc:16 left:96 81 ff 21 f8 : stdu 1, -128(1)
pc:20 left:92 70 00 e1 fb : std 31, 112(1)
pc:24 left:88 30 00 e1 3b : addi 31, 1, 48
pc:28 left:84 30 00 e1 3b : addi 31, 1, 48
pc:32 left:80 2a 00 60 38 : li 3, 42
pc:36 left:76 f8 ff 7f f8 : std 3, -8(31)
pc:40 left:72 00 00 00 39 : li 8, 0
pc:44 left:68 70 00 e1 eb : ld 31, 112(1)
pc:48 left:64 80 00 21 38 : addi 1, 1, 128
pc:52 left:60 78 43 03 7d : mr 3, 8
pc:56 left:56 20 00 80 4e : blr
pc:60 left:52 a6 02 08 7c : mflr 0
pc:64 left:48 00 00 00 60 : nop
pc:68 left:44 a6 03 08 7c : mtlr 0
pc:72 left:40 bc ff ff 4b : b .-68
pc:76 left:36 c0 34 1d 00 :
...
Failure log:
Can't disasm instruction at offset 76: c0 34 1d 00 00 00 00 c0 a6 02 68 7d 05 00 9f 42
--------------------------------------
Observation:
Can't disasm instruction at offset 76 as this address holds
".long <dummy_tramp_addr>" (bytes c0 34 1d 00 00 00 00 c0, i.e.
0xc0000000001d34c0 stored little-endian).
Valid instructions resume at offset 84.
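As a small illustration of why these 8 bytes are data rather than code, the sketch below reassembles them as a little-endian 64-bit value, as a ppc64le load would. The helper name load_le64 and the array name are hypothetical; the byte values are copied from the failure log above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble a 64-bit value from 8 bytes stored least-significant byte first. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (size_t i = 0; i < 8; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/* The 8 bytes reported at offset 76 in the failure log. */
static const uint8_t tramp_field[8] = {
	0xc0, 0x34, 0x1d, 0x00, 0x00, 0x00, 0x00, 0xc0
};
```

Calling load_le64(tramp_field) yields 0xc0000000001d34c0, which has the shape of a kernel virtual address and is not meant to be decoded as two 4-byte instructions.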
Move the long branch address space to the bottom of the long
branch stub. This allows uninterrupted disassembly until the
last 8 bytes. Exclude these last bytes from the overall
program length to prevent failure in assembly generation.
Following is disassembler output for same test program with moved down
dummy_tramp_addr field:
.....
.....
pc:68 left:44 a6 03 08 7c : mtlr 0
pc:72 left:40 bc ff ff 4b : b .-68
pc:76 left:36 a6 02 68 7d : mflr 11
pc:80 left:32 05 00 9f 42 : bcl 20, 31, .+4
pc:84 left:28 a6 02 88 7d : mflr 12
pc:88 left:24 14 00 8c e9 : ld 12, 20(12)
pc:92 left:20 a6 03 89 7d : mtctr 12
pc:96 left:16 a6 03 68 7d : mtlr 11
pc:100 left:12 20 04 80 4e : bctr
pc:104 left:8 c0 34 1d 00 :
Failure log:
Can't disasm instruction at offset 104: c0 34 1d 00 00 00 00 c0
---------------------------------------
Disassembly logic can truncate at 104, ignoring last 8 bytes.
Update the dummy_tramp_addr field offset calculation from the end
of the program to reflect its new location, for bpf_arch_text_poke()
to update the actual trampoline's address in this field.
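The offsets involved can be checked with a little arithmetic. The sketch below is not kernel code: the function names and INSN_SZ are made up for illustration, SZL is assumed to be 8 as on ppc64, and the layouts are the ones shown in the listings above. "bcl 20,31,$+4" leaves the address of the following "mflr r12" in LR, so the load displacement is measured from that instruction.

```c
#include <stddef.h>

#define INSN_SZ 4	/* each instruction in the stubs is 4 bytes */
#define SZL     8	/* assumed sizeof(long) on ppc64 */

/* Old layout: <field>, mflr r11, bcl, mflr r12, ld, ... */
static long old_field_off_from_lr(void)
{
	/* Two instructions and the field sit before "mflr r12". */
	return -(2 * INSN_SZ) - SZL;
}

/* New layout: mflr r11, bcl, mflr r12, ld, mtctr, mtlr, bctr, <field> */
static long new_field_off_from_lr(void)
{
	/* "mflr r12" itself plus the four instructions after it. */
	return 5 * INSN_SZ;
}

/* With the field placed last, it occupies the final SZL bytes of the image. */
static size_t new_field_off_from_start(size_t prog_len)
{
	return prog_len - SZL;
}
```

For the 112-byte test program this gives a field offset of 104, and "mflr 12" at pc 84 plus 20 also lands on 104, which agrees with the disassembly above.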
[1] https://lore.kernel.org/all/20241030070850.1361304-18-hbathini@linux.ibm.com
v3->v4:
Changed logic for emitting alignment NOP
v2->v3:
Removed fixed NOP from bottom of long branch stub
Rebased on top of bpf-next
v1->v2:
Added fix-patch to correct memory alignment in-place
Moved the optional alignment NOP before OOL stub
[v1]: https://lore.kernel.org/bpf/20260225013627.22098-1-adubey@linux.ibm.com
[v2]: https://lore.kernel.org/bpf/20260403004011.44417-1-adubey@linux.ibm.com
[v3]: https://lore.kernel.org/bpf/20260411221413.44304-1-adubey@linux.ibm.com
Abhishek Dubey (5):
powerpc/bpf: fix alignment of long branch trampoline address
powerpc/bpf: Move out dummy_tramp_addr after Long branch stub
selftest/bpf: Fixing powerpc JIT disassembly failure
selftest/bpf: Enable verifier selftest for powerpc64
selftest/bpf: Add tailcall verifier selftest for powerpc64
arch/powerpc/net/bpf_jit.h | 4 +-
arch/powerpc/net/bpf_jit_comp.c | 60 ++++++++++++----
arch/powerpc/net/bpf_jit_comp64.c | 4 +-
.../selftests/bpf/jit_disasm_helpers.c | 13 +++-
tools/testing/selftests/bpf/progs/bpf_misc.h | 1 +
.../bpf/progs/verifier_tailcall_jit.c | 69 +++++++++++++++++++
tools/testing/selftests/bpf/test_loader.c | 5 ++
7 files changed, 136 insertions(+), 20 deletions(-)
--
2.52.0
* [PATCH v4 1/5] powerpc/bpf: fix alignment of long branch trampoline address
2026-05-17 21:40 [PATCH v4 0/5] powerpc/bpf: Add support for verifier selftest adubey
@ 2026-05-17 21:40 ` adubey
2026-05-17 18:02 ` sashiko-bot
` (2 more replies)
2026-05-17 21:40 ` [PATCH v4 2/5] powerpc/bpf: Move out dummy_tramp_addr after Long branch stub adubey
` (4 subsequent siblings)
5 siblings, 3 replies; 18+ messages in thread
From: adubey @ 2026-05-17 21:40 UTC (permalink / raw)
To: bpf
Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
linux-kselftest, stable, Abhishek Dubey
From: Abhishek Dubey <adubey@linux.ibm.com>
Ensure the dummy trampoline address field present between the OOL stub
and the long branch stub is 8-byte aligned, for memory compatibility
when its content is loaded into a register.
Reported-by: Hari Bathini <hbathini@linux.ibm.com>
Fixes: d243b62b7bd3 ("powerpc64/bpf: Add support for bpf trampolines")
Cc: stable@vger.kernel.org
Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
---
arch/powerpc/net/bpf_jit.h | 4 ++--
arch/powerpc/net/bpf_jit_comp.c | 34 ++++++++++++++++++++++++++-----
arch/powerpc/net/bpf_jit_comp64.c | 4 ++--
3 files changed, 33 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index f32de8704d4d..71e6e7d01057 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -214,8 +214,8 @@ int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context *
int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct codegen_context *ctx,
u32 *addrs, int pass, bool extra_pass);
void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
-void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
-void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx);
+void bpf_jit_build_epilogue(u32 *image, u32 *fimage, struct codegen_context *ctx);
+void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context *ctx);
void bpf_jit_realloc_regs(struct codegen_context *ctx);
int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr);
void prepare_for_fsession_fentry(u32 *image, struct codegen_context *ctx, int cookie_cnt,
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 53ab97ad6074..ef7614177cb1 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -49,11 +49,34 @@ asm (
" .popsection ;"
);
-void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx)
+void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context *ctx)
{
int ool_stub_idx, long_branch_stub_idx;
/*
+ * In the final pass, align the mis-aligned dummy_tramp_addr field
+ * in the fimage. The alignment NOP must appear before OOL stub,
+ * to make ool_stub_idx & long_branch_stub_idx constant from end.
+ */
+#ifdef CONFIG_PPC64
+ if (fimage && image) {
+ /*
+ * pc points to first instruction of OOL stub,
+ * dummy_tramp_addr is past 4/3 instructions depending on
+ * CONFIG_PPC_FTRACE_OUT_OF_LINE is enabled/not respectively.
+ *
+ * The decision to emit alignment NOP must depend on the alignment
+ * of dummy_tramp_addr field.
+ */
+ unsigned long pc = (unsigned long)fimage + CTX_NIA(ctx);
+ pc += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 4 : 3;
+
+ if (!IS_ALIGNED(pc, 8))
+ EMIT(PPC_RAW_NOP());
+ }
+#endif
+
+ /* nop // optional, for alignment of dummy_tramp_addr
* Out-of-line stub:
* mflr r0
* [b|bl] tramp
@@ -70,7 +93,7 @@ void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx)
/*
* Long branch stub:
- * .long <dummy_tramp_addr>
+ * .long <dummy_tramp_addr> // 8-byte aligned
* mflr r11
* bcl 20,31,$+4
* mflr r12
@@ -81,6 +104,7 @@ void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx)
*/
if (image)
*((unsigned long *)&image[ctx->idx]) = (unsigned long)dummy_tramp;
+
ctx->idx += SZL / 4;
long_branch_stub_idx = ctx->idx;
EMIT(PPC_RAW_MFLR(_R11));
@@ -107,7 +131,7 @@ int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg,
PPC_JMP(ctx->alt_exit_addr);
} else {
ctx->alt_exit_addr = ctx->idx * 4;
- bpf_jit_build_epilogue(image, ctx);
+ bpf_jit_build_epilogue(image, NULL, ctx);
}
return 0;
@@ -286,7 +310,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
*/
bpf_jit_build_prologue(NULL, &cgctx);
addrs[fp->len] = cgctx.idx * 4;
- bpf_jit_build_epilogue(NULL, &cgctx);
+ bpf_jit_build_epilogue(NULL, NULL, &cgctx);
fixup_len = fp->aux->num_exentries * BPF_FIXUP_LEN * 4;
extable_len = fp->aux->num_exentries * sizeof(struct exception_table_entry);
@@ -318,7 +342,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
bpf_jit_binary_pack_free(fhdr, hdr);
goto out_err;
}
- bpf_jit_build_epilogue(code_base, &cgctx);
+ bpf_jit_build_epilogue(code_base, fcode_base, &cgctx);
if (bpf_jit_enable > 1)
pr_info("Pass %d: shrink = %d, seen = 0x%x\n", pass,
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index db364d9083e7..885dc8cf55a2 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -398,7 +398,7 @@ static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context *ctx
}
}
-void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
+void bpf_jit_build_epilogue(u32 *image, u32 *fimage, struct codegen_context *ctx)
{
bpf_jit_emit_common_epilogue(image, ctx);
@@ -407,7 +407,7 @@ void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
EMIT(PPC_RAW_BLR());
- bpf_jit_build_fentry_stubs(image, ctx);
+ bpf_jit_build_fentry_stubs(image, fimage, ctx);
}
/*
--
2.52.0
* [PATCH v4 2/5] powerpc/bpf: Move out dummy_tramp_addr after Long branch stub
2026-05-17 21:40 [PATCH v4 0/5] powerpc/bpf: Add support for verifier selftest adubey
2026-05-17 21:40 ` [PATCH v4 1/5] powerpc/bpf: fix alignment of long branch trampoline address adubey
@ 2026-05-17 21:40 ` adubey
2026-05-17 18:23 ` sashiko-bot
` (2 more replies)
2026-05-17 21:40 ` [PATCH v4 3/5] selftest/bpf: Fixing powerpc JIT disassembly failure adubey
` (3 subsequent siblings)
5 siblings, 3 replies; 18+ messages in thread
From: adubey @ 2026-05-17 21:40 UTC (permalink / raw)
To: bpf
Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
linux-kselftest, stable, Abhishek Dubey
From: Abhishek Dubey <adubey@linux.ibm.com>
Move the long branch address space to the bottom of the long
branch stub. This allows uninterrupted disassembly until the
last 8 bytes. Exclude these last bytes from the overall
program length to prevent failure in assembly generation.
Also, align dummy_tramp_addr field with 8-byte boundary.
Following is disassembler output for test program with moved down
dummy_tramp_addr field:
.....
.....
pc:68 left:44 a6 03 08 7c : mtlr 0
pc:72 left:40 bc ff ff 4b : b .-68
pc:76 left:36 a6 02 68 7d : mflr 11
pc:80 left:32 05 00 9f 42 : bcl 20, 31, .+4
pc:84 left:28 a6 02 88 7d : mflr 12
pc:88 left:24 14 00 8c e9 : ld 12, 20(12)
pc:92 left:20 a6 03 89 7d : mtctr 12
pc:96 left:16 a6 03 68 7d : mtlr 11
pc:100 left:12 20 04 80 4e : bctr
pc:104 left:8 c0 34 1d 00 :
Failure log:
Can't disasm instruction at offset 104: c0 34 1d 00 00 00 00 c0
Disassembly logic can truncate at 104, ignoring last 8 bytes.
Update the dummy_tramp_addr field offset calculation from the end
of the program to reflect its new location, for bpf_arch_text_poke()
to update the actual trampoline's address in this field.
All BPF trampoline selftests continue to pass with this patch applied.
Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
---
arch/powerpc/net/bpf_jit_comp.c | 34 +++++++++++++++++++--------------
1 file changed, 20 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index ef7614177cb1..b73bc9295c31 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -57,19 +57,21 @@ void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context
* In the final pass, align the mis-aligned dummy_tramp_addr field
* in the fimage. The alignment NOP must appear before OOL stub,
* to make ool_stub_idx & long_branch_stub_idx constant from end.
+ *
+ * The dummy_tramp_addr field is placed at bottom of Long branch stub.
*/
#ifdef CONFIG_PPC64
if (fimage && image) {
/*
* pc points to first instruction of OOL stub,
- * dummy_tramp_addr is past 4/3 instructions depending on
+ * dummy_tramp_addr is past 11/10 instructions depending on
* CONFIG_PPC_FTRACE_OUT_OF_LINE is enabled/not respectively.
*
* The decision to emit alignment NOP must depend on the alignment
* of dummy_tramp_addr field.
*/
unsigned long pc = (unsigned long)fimage + CTX_NIA(ctx);
- pc += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 4 : 3;
+ pc += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 11 : 10;
if (!IS_ALIGNED(pc, 8))
EMIT(PPC_RAW_NOP());
@@ -93,28 +95,29 @@ void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context
/*
* Long branch stub:
- * .long <dummy_tramp_addr> // 8-byte aligned
* mflr r11
* bcl 20,31,$+4
- * mflr r12
- * ld r12, -8-SZL(r12)
+ * mflr r12 // lr/r12 stores pc of current(this) inst.
+ * ld r12, 20(r12) // offset(dummy_tramp_addr) from prev inst. is 20
* mtctr r12
- * mtlr r11 // needed to retain ftrace ABI
+ * mtlr r11 // needed to retain ftrace ABI
* bctr
+ * .long <dummy_tramp_addr> // 8-byte aligned
*/
- if (image)
- *((unsigned long *)&image[ctx->idx]) = (unsigned long)dummy_tramp;
-
- ctx->idx += SZL / 4;
long_branch_stub_idx = ctx->idx;
EMIT(PPC_RAW_MFLR(_R11));
EMIT(PPC_RAW_BCL4());
EMIT(PPC_RAW_MFLR(_R12));
- EMIT(PPC_RAW_LL(_R12, _R12, -8-SZL));
+ EMIT(PPC_RAW_LL(_R12, _R12, 20));
EMIT(PPC_RAW_MTCTR(_R12));
EMIT(PPC_RAW_MTLR(_R11));
EMIT(PPC_RAW_BCTR());
+ if (image)
+ *((unsigned long *)&image[ctx->idx]) = (unsigned long)dummy_tramp;
+
+ ctx->idx += SZL / 4;
+
if (!bpf_jit_ool_stub) {
bpf_jit_ool_stub = (ctx->idx - ool_stub_idx) * 4;
bpf_jit_long_branch_stub = (ctx->idx - long_branch_stub_idx) * 4;
@@ -1284,6 +1287,7 @@ static void do_isync(void *info __maybe_unused)
* bpf_func:
* [nop|b] ool_stub
* 2. Out-of-line stub:
+ * nop // optional nop for alignment
* ool_stub:
* mflr r0
* [b|bl] <bpf_prog>/<long_branch_stub>
@@ -1291,14 +1295,14 @@ static void do_isync(void *info __maybe_unused)
* b bpf_func + 4
* 3. Long branch stub:
* long_branch_stub:
- * .long <branch_addr>/<dummy_tramp>
* mflr r11
* bcl 20,31,$+4
* mflr r12
- * ld r12, -16(r12)
+ * ld r12, 20(r12)
* mtctr r12
* mtlr r11 // needed to retain ftrace ABI
* bctr
+ * .long <branch_addr>/<dummy_tramp>
*
* dummy_tramp is used to reduce synchronization requirements.
*
@@ -1400,10 +1404,12 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type old_t,
* 1. Update the address in the long branch stub:
* If new_addr is out of range, we will have to use the long branch stub, so patch new_addr
* here. Otherwise, revert to dummy_tramp, but only if we had patched old_addr here.
+ *
+ * dummy_tramp_addr moved to bottom of long branch stub.
*/
if ((new_addr && !is_offset_in_branch_range(new_addr - ip)) ||
(old_addr && !is_offset_in_branch_range(old_addr - ip)))
- ret = patch_ulong((void *)(bpf_func_end - bpf_jit_long_branch_stub - SZL),
+ ret = patch_ulong((void *)(bpf_func_end - SZL), /* SZL: dummy_tramp_addr offset */
(new_addr && !is_offset_in_branch_range(new_addr - ip)) ?
(unsigned long)new_addr : (unsigned long)dummy_tramp);
if (ret)
--
2.52.0
* [PATCH v4 3/5] selftest/bpf: Fixing powerpc JIT disassembly failure
2026-05-17 21:40 [PATCH v4 0/5] powerpc/bpf: Add support for verifier selftest adubey
2026-05-17 21:40 ` [PATCH v4 1/5] powerpc/bpf: fix alignment of long branch trampoline address adubey
2026-05-17 21:40 ` [PATCH v4 2/5] powerpc/bpf: Move out dummy_tramp_addr after Long branch stub adubey
@ 2026-05-17 21:40 ` adubey
2026-05-17 18:18 ` bot+bpf-ci
2026-05-17 18:38 ` sashiko-bot
2026-05-17 21:40 ` [PATCH v4 4/5] selftest/bpf: Enable verifier selftest for powerpc64 adubey
` (2 subsequent siblings)
5 siblings, 2 replies; 18+ messages in thread
From: adubey @ 2026-05-17 21:40 UTC (permalink / raw)
To: bpf
Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
linux-kselftest, stable, Abhishek Dubey
From: Abhishek Dubey <adubey@linux.ibm.com>
Ensure that the trampoline stubs JITed at the tail of the
epilogue do not expose the dummy trampoline address stored
in the last 8 bytes (for both 64-bit and 32-bit PowerPC)
to the disassembly flow. Prevent the disassembler from
ingesting this memory address, as it may occasionally decode
into a seemingly valid but incorrect instruction. Fix this
issue by truncating the last 8 bytes from JITed buffers
before supplying them for disassembly.
Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
---
tools/testing/selftests/bpf/jit_disasm_helpers.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/jit_disasm_helpers.c b/tools/testing/selftests/bpf/jit_disasm_helpers.c
index 364c557c5115..4c6bcbe08491 100644
--- a/tools/testing/selftests/bpf/jit_disasm_helpers.c
+++ b/tools/testing/selftests/bpf/jit_disasm_helpers.c
@@ -170,9 +170,11 @@ int get_jited_program_text(int fd, char *text, size_t text_sz)
struct bpf_prog_info info = {};
__u32 info_len = sizeof(info);
__u32 jited_funcs, len, pc;
+ __u32 trunc_len = 0;
__u32 *func_lens = NULL;
FILE *text_out = NULL;
uint8_t *image = NULL;
+ char *triple = NULL;
int i, err = 0;
if (!llvm_initialized) {
@@ -216,9 +218,18 @@ int get_jited_program_text(int fd, char *text, size_t text_sz)
if (!ASSERT_OK(err, "bpf_prog_get_info_by_fd #2"))
goto out;
+ /*
+ * The last 8 bytes of the JIT output hold the dummy_tramp
+ * address on 64-bit and 32-bit powerpc; they do not
+ * disassemble to a valid instruction.
+ */
+ triple = LLVMGetDefaultTargetTriple();
+ if (strstr(triple, "powerpc"))
+ trunc_len = 8;
+
for (pc = 0, i = 0; i < jited_funcs; ++i) {
fprintf(text_out, "func #%d:\n", i);
- disasm_one_func(text_out, image + pc, func_lens[i]);
+ disasm_one_func(text_out, image + pc, func_lens[i] - trunc_len);
fprintf(text_out, "\n");
pc += func_lens[i];
}
--
2.52.0
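A note on the length arithmetic in the hunk above: func_lens[i] and trunc_len are both __u32, so the subtraction would wrap if a function were ever shorter than the truncated tail. The sketch below shows one way the length could be clamped. It is an illustration under that assumption and not part of the posted patch; the helper name disasm_len is hypothetical.

```c
#include <stdint.h>

/*
 * Number of bytes to hand to the disassembler after dropping the
 * trailing address field. Returns 0 instead of wrapping around when
 * the function is shorter than the truncated tail.
 */
static uint32_t disasm_len(uint32_t func_len, uint32_t trunc_len)
{
	return func_len > trunc_len ? func_len - trunc_len : 0;
}
```

With the 112-byte example from the cover letter and trunc_len of 8, this yields 104, matching the offset at which the disassembly is meant to stop.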
* [PATCH v4 4/5] selftest/bpf: Enable verifier selftest for powerpc64
2026-05-17 21:40 [PATCH v4 0/5] powerpc/bpf: Add support for verifier selftest adubey
` (2 preceding siblings ...)
2026-05-17 21:40 ` [PATCH v4 3/5] selftest/bpf: Fixing powerpc JIT disassembly failure adubey
@ 2026-05-17 21:40 ` adubey
2026-05-17 18:18 ` bot+bpf-ci
2026-05-17 21:40 ` [PATCH v4 5/5] selftest/bpf: Add tailcall " adubey
2026-05-18 11:44 ` [PATCH v4 0/5] powerpc/bpf: Add support for verifier selftest Christophe Leroy (CS GROUP)
5 siblings, 1 reply; 18+ messages in thread
From: adubey @ 2026-05-17 21:40 UTC (permalink / raw)
To: bpf
Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
linux-kselftest, stable, Abhishek Dubey
From: Abhishek Dubey <adubey@linux.ibm.com>
Enable the arch specifier "__arch_powerpc64" in the verifier
selftests for ppc64. 32-bit powerpc would require separate
handling. Changes are tested on 64-bit only.
Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
---
tools/testing/selftests/bpf/progs/bpf_misc.h | 1 +
tools/testing/selftests/bpf/test_loader.c | 5 +++++
2 files changed, 6 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h b/tools/testing/selftests/bpf/progs/bpf_misc.h
index 9eeb5b0b63d6..cdc2a3de3054 100644
--- a/tools/testing/selftests/bpf/progs/bpf_misc.h
+++ b/tools/testing/selftests/bpf/progs/bpf_misc.h
@@ -158,6 +158,7 @@
#define __arch_arm64 __arch("ARM64")
#define __arch_riscv64 __arch("RISCV64")
#define __arch_s390x __arch("s390x")
+#define __arch_powerpc64 __arch("POWERPC64")
#define __caps_unpriv(caps) __test_tag("test_caps_unpriv=" EXPAND_QUOTE(caps))
#define __load_if_JITed() __test_tag("load_mode=jited")
#define __load_if_no_JITed() __test_tag("load_mode=no_jited")
diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c
index abdb9e6e3713..d5589355ed9e 100644
--- a/tools/testing/selftests/bpf/test_loader.c
+++ b/tools/testing/selftests/bpf/test_loader.c
@@ -377,6 +377,7 @@ enum arch {
ARCH_ARM64 = 0x4,
ARCH_RISCV64 = 0x8,
ARCH_S390X = 0x10,
+ ARCH_POWERPC64 = 0x20,
};
static int get_current_arch(void)
@@ -389,6 +390,8 @@ static int get_current_arch(void)
return ARCH_RISCV64;
#elif defined(__s390x__)
return ARCH_S390X;
+#elif defined(__powerpc64__)
+ return ARCH_POWERPC64;
#endif
return ARCH_UNKNOWN;
}
@@ -580,6 +583,8 @@ static int parse_test_spec(struct test_loader *tester,
arch = ARCH_RISCV64;
} else if (strcmp(val, "s390x") == 0) {
arch = ARCH_S390X;
+ } else if (strcmp(val, "POWERPC64") == 0) {
+ arch = ARCH_POWERPC64;
} else {
PRINT_FAIL("bad arch spec: '%s'\n", val);
err = -EINVAL;
--
2.52.0
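The tag handling added above can be pictured as a string-to-flag lookup. The sketch below is a standalone approximation rather than the test_loader.c code: parse_arch and the table are hypothetical, and only the enum values and tag strings visible in the hunks above are reproduced.

```c
#include <stddef.h>
#include <string.h>

/* Flag values as they appear in the hunk above; other entries are omitted. */
enum arch {
	ARCH_ARM64     = 0x4,
	ARCH_RISCV64   = 0x8,
	ARCH_S390X     = 0x10,
	ARCH_POWERPC64 = 0x20,
};

/* Map an __arch("...") tag value to its flag, or return -1 if unknown. */
static int parse_arch(const char *val)
{
	static const struct {
		const char *name;
		int flag;
	} map[] = {
		{ "ARM64",     ARCH_ARM64 },
		{ "RISCV64",   ARCH_RISCV64 },
		{ "s390x",     ARCH_S390X },
		{ "POWERPC64", ARCH_POWERPC64 },
	};

	for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (strcmp(val, map[i].name) == 0)
			return map[i].flag;
	}
	return -1;
}
```

The comparison is case-sensitive, so the string passed to __arch() in bpf_misc.h has to match the one checked by the loader exactly. Because each flag is a distinct bit, a test can be restricted to several architectures by OR-ing flags together.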
* [PATCH v4 5/5] selftest/bpf: Add tailcall verifier selftest for powerpc64
2026-05-17 21:40 [PATCH v4 0/5] powerpc/bpf: Add support for verifier selftest adubey
` (3 preceding siblings ...)
2026-05-17 21:40 ` [PATCH v4 4/5] selftest/bpf: Enable verifier selftest for powerpc64 adubey
@ 2026-05-17 21:40 ` adubey
2026-05-17 19:14 ` sashiko-bot
2026-05-18 11:44 ` [PATCH v4 0/5] powerpc/bpf: Add support for verifier selftest Christophe Leroy (CS GROUP)
5 siblings, 1 reply; 18+ messages in thread
From: adubey @ 2026-05-17 21:40 UTC (permalink / raw)
To: bpf
Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
linux-kselftest, stable, Abhishek Dubey
From: Abhishek Dubey <adubey@linux.ibm.com>
Verifier testcase result for tailcalls:
# ./test_progs -t verifier_tailcall
#618/1 verifier_tailcall/invalid map type for tail call:OK
#618/2 verifier_tailcall/invalid map type for tail call @unpriv:OK
#618 verifier_tailcall:OK
#619/1 verifier_tailcall_jit/main:OK
#619 verifier_tailcall_jit:OK
Summary: 2/3 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
---
.../bpf/progs/verifier_tailcall_jit.c | 69 +++++++++++++++++++
1 file changed, 69 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/verifier_tailcall_jit.c b/tools/testing/selftests/bpf/progs/verifier_tailcall_jit.c
index 8d60c634a114..17475ecb3207 100644
--- a/tools/testing/selftests/bpf/progs/verifier_tailcall_jit.c
+++ b/tools/testing/selftests/bpf/progs/verifier_tailcall_jit.c
@@ -90,6 +90,75 @@ __jited(" popq %rax")
__jited(" jmp {{.*}}") /* jump to tail call tgt */
__jited("L0: leave")
__jited(" {{(retq|jmp 0x)}}") /* return or jump to rethunk */
+__arch_powerpc64
+/* program entry for main(), regular function prologue */
+__jited(" nop")
+__jited(" ld 2, 16(13)")
+__jited(" li 9, 0")
+__jited(" std 9, -8(1)")
+__jited(" mflr 0")
+__jited(" std 0, 16(1)")
+__jited(" stdu 1, {{.*}}(1)")
+/* load address and call sub() via count register */
+__jited(" lis 12, {{.*}}")
+__jited(" sldi 12, 12, 32")
+__jited(" oris 12, 12, {{.*}}")
+__jited(" ori 12, 12, {{.*}}")
+__jited(" mtctr 12")
+__jited(" bctrl")
+__jited(" mr 8, 3")
+__jited(" li 8, 0")
+__jited(" addi 1, 1, {{.*}}")
+__jited(" ld 0, 16(1)")
+__jited(" mtlr 0")
+__jited(" mr 3, 8")
+__jited(" blr")
+__jited("...")
+__jited("func #1")
+/* subprogram entry for sub() */
+__jited(" nop")
+__jited(" ld 2, 16(13)")
+/* tail call prologue for subprogram */
+__jited(" ld 10, 0(1)")
+__jited(" ld 9, -8(10)")
+__jited(" cmplwi 9, 33")
+__jited(" bt {{.*}}, {{.*}}")
+__jited(" addi 9, 10, -8")
+__jited(" std 9, -8(1)")
+__jited(" lis {{.*}}, {{.*}}")
+__jited(" sldi {{.*}}, {{.*}}, 32")
+__jited(" oris {{.*}}, {{.*}}, {{.*}}")
+__jited(" ori {{.*}}, {{.*}}, {{.*}}")
+__jited(" li {{.*}}, 0")
+__jited(" lwz 9, {{.*}}({{.*}})")
+__jited(" slwi {{.*}}, {{.*}}, 0")
+__jited(" cmplw {{.*}}, 9")
+__jited(" bf 0, {{.*}}")
+/* bpf_tail_call implementation */
+__jited(" ld 9, -8(1)")
+__jited(" cmplwi 9, 33")
+__jited(" bf {{.*}}, {{.*}}")
+__jited(" ld 9, 0(9)")
+__jited(" cmplwi 9, 33")
+__jited(" bt {{.*}}, {{.*}}")
+__jited(" addi 9, 9, 1")
+__jited(" mulli 10, {{.*}}, 8")
+__jited(" add 10, 10, {{.*}}")
+__jited(" ld 10, {{.*}}(10)")
+__jited(" cmpldi 10, 0")
+__jited(" bt {{.*}}, {{.*}}")
+__jited(" ld 10, {{.*}}(10)")
+__jited(" addi 10, 10, 16")
+__jited(" mtctr 10")
+__jited(" ld 10, -8(1)")
+__jited(" cmplwi 10, 33")
+__jited(" bt {{.*}}, {{.*}}")
+__jited(" addi 10, 1, -8")
+__jited(" std 9, 0(10)")
+__jited(" bctr")
+__jited(" mr 3, 8")
+__jited(" blr")
+
SEC("tc")
__naked int main(void)
{
--
2.52.0
* Re: [PATCH v4 1/5] powerpc/bpf: fix alignment of long branch trampoline address
2026-05-17 21:40 ` [PATCH v4 1/5] powerpc/bpf: fix alignment of long branch trampoline address adubey
2026-05-17 18:02 ` sashiko-bot
2026-05-17 18:30 ` bot+bpf-ci
@ 2026-05-18 7:18 ` Hari Bathini
2 siblings, 0 replies; 18+ messages in thread
From: Hari Bathini @ 2026-05-18 7:18 UTC (permalink / raw)
To: adubey, bpf
Cc: linuxppc-dev, maddy, ast, andrii, daniel, shuah, linux-kselftest,
stable
On 18/05/26 3:10 am, adubey@linux.ibm.com wrote:
> From: Abhishek Dubey <adubey@linux.ibm.com>
>
> Ensure the dummy trampoline address field present between the OOL stub
> and the long branch stub is 8-byte aligned, for memory compatibility
> when content loaded to a register.
>
> Reported-by: Hari Bathini <hbathini@linux.ibm.com>
> Fixes: d243b62b7bd3 ("powerpc64/bpf: Add support for bpf trampolines")
> Cc: stable@vger.kernel.org
> Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
> ---
> arch/powerpc/net/bpf_jit.h | 4 ++--
> arch/powerpc/net/bpf_jit_comp.c | 34 ++++++++++++++++++++++++++-----
> arch/powerpc/net/bpf_jit_comp64.c | 4 ++--
> 3 files changed, 33 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
> index f32de8704d4d..71e6e7d01057 100644
> --- a/arch/powerpc/net/bpf_jit.h
> +++ b/arch/powerpc/net/bpf_jit.h
> @@ -214,8 +214,8 @@ int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context *
> int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct codegen_context *ctx,
> u32 *addrs, int pass, bool extra_pass);
> void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
> -void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
> -void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx);
> +void bpf_jit_build_epilogue(u32 *image, u32 *fimage, struct codegen_context *ctx);
> +void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context *ctx);
> void bpf_jit_realloc_regs(struct codegen_context *ctx);
> int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr);
> void prepare_for_fsession_fentry(u32 *image, struct codegen_context *ctx, int cookie_cnt,
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index 53ab97ad6074..ef7614177cb1 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -49,11 +49,34 @@ asm (
> " .popsection ;"
> );
>
> -void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx)
> +void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context *ctx)
> {
> int ool_stub_idx, long_branch_stub_idx;
>
> /*
> + * In the final pass, align the mis-aligned dummy_tramp_addr field
> + * in the fimage. The alignment NOP must appear before OOL stub,
> + * to make ool_stub_idx & long_branch_stub_idx constant from end.
> + */
> +#ifdef CONFIG_PPC64
> + if (fimage && image) {
> + /*
> + * pc points to first instruction of OOL stub,
> + * dummy_tramp_addr is past 4/3 instructions depending on
> + * CONFIG_PPC_FTRACE_OUT_OF_LINE is enabled/not respectively.
> + *
> + * The decision to emit alignment NOP must depend on the alignment
> + * of dummy_tramp_addr field.
This is easier to read than the XOR matrix.
> + */
> + unsigned long pc = (unsigned long)fimage + CTX_NIA(ctx);
> + pc += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 4 : 3;
The above line should be:
pc += (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 4 : 3) * 4;
> +
> + if (!IS_ALIGNED(pc, 8))
> + EMIT(PPC_RAW_NOP());
> + }
> +#endif
> +
> + /* nop // optional, for alignment of dummy_tramp_addr
> * Out-of-line stub:
> * mflr r0
> * [b|bl] tramp
> @@ -70,7 +93,7 @@ void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx)
>
> /*
> * Long branch stub:
> - * .long <dummy_tramp_addr>
> + * .long <dummy_tramp_addr> // 8-byte aligned
> * mflr r11
> * bcl 20,31,$+4
> * mflr r12
> @@ -81,6 +104,7 @@ void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx)
> */
> if (image)
> *((unsigned long *)&image[ctx->idx]) = (unsigned long)dummy_tramp;
> +
> ctx->idx += SZL / 4;
> long_branch_stub_idx = ctx->idx;
> EMIT(PPC_RAW_MFLR(_R11));
> @@ -107,7 +131,7 @@ int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg,
> PPC_JMP(ctx->alt_exit_addr);
> } else {
> ctx->alt_exit_addr = ctx->idx * 4;
> - bpf_jit_build_epilogue(image, ctx);
> + bpf_jit_build_epilogue(image, NULL, ctx);
> }
>
> return 0;
> @@ -286,7 +310,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
> */
> bpf_jit_build_prologue(NULL, &cgctx);
> addrs[fp->len] = cgctx.idx * 4;
> - bpf_jit_build_epilogue(NULL, &cgctx);
> + bpf_jit_build_epilogue(NULL, NULL, &cgctx);
>
> fixup_len = fp->aux->num_exentries * BPF_FIXUP_LEN * 4;
> extable_len = fp->aux->num_exentries * sizeof(struct exception_table_entry);
> @@ -318,7 +342,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
> bpf_jit_binary_pack_free(fhdr, hdr);
> goto out_err;
> }
> - bpf_jit_build_epilogue(code_base, &cgctx);
> + bpf_jit_build_epilogue(code_base, fcode_base, &cgctx);
>
> if (bpf_jit_enable > 1)
> pr_info("Pass %d: shrink = %d, seen = 0x%x\n", pass,
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> index db364d9083e7..885dc8cf55a2 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -398,7 +398,7 @@ static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context *ctx
> }
> }
>
> -void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
> +void bpf_jit_build_epilogue(u32 *image, u32 *fimage, struct codegen_context *ctx)
> {
> bpf_jit_emit_common_epilogue(image, ctx);
>
> @@ -407,7 +407,7 @@ void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
>
> EMIT(PPC_RAW_BLR());
>
> - bpf_jit_build_fentry_stubs(image, ctx);
> + bpf_jit_build_fentry_stubs(image, fimage, ctx);
> }
>
> /*
* Re: [PATCH v4 2/5] powerpc/bpf: Move out dummy_tramp_addr after Long branch stub
2026-05-17 21:40 ` [PATCH v4 2/5] powerpc/bpf: Move out dummy_tramp_addr after Long branch stub adubey
2026-05-17 18:23 ` sashiko-bot
2026-05-17 18:30 ` bot+bpf-ci
@ 2026-05-18 7:25 ` Hari Bathini
2026-05-18 7:53 ` Hari Bathini
2 siblings, 1 reply; 18+ messages in thread
From: Hari Bathini @ 2026-05-18 7:25 UTC (permalink / raw)
To: adubey, bpf
Cc: linuxppc-dev, maddy, ast, andrii, daniel, shuah, linux-kselftest,
stable
On 18/05/26 3:10 am, adubey@linux.ibm.com wrote:
> From: Abhishek Dubey <adubey@linux.ibm.com>
>
> Move the long branch address space to the bottom of the long
> branch stub. This allows uninterrupted disassembly until the
> last 8 bytes. Exclude these last bytes from the overall
> program length to prevent failure in assembly generation.
> Also, align dummy_tramp_addr field with 8-byte boundary.
>
> Following is disassembler output for test program with moved down
> dummy_tramp_addr field:
> .....
> .....
> pc:68 left:44 a6 03 08 7c : mtlr 0
> pc:72 left:40 bc ff ff 4b : b .-68
> pc:76 left:36 a6 02 68 7d : mflr 11
> pc:80 left:32 05 00 9f 42 : bcl 20, 31, .+4
> pc:84 left:28 a6 02 88 7d : mflr 12
> pc:88 left:24 14 00 8c e9 : ld 12, 20(12)
> pc:92 left:20 a6 03 89 7d : mtctr 12
> pc:96 left:16 a6 03 68 7d : mtlr 11
> pc:100 left:12 20 04 80 4e : bctr
> pc:104 left:8 c0 34 1d 00 :
>
> Failure log:
> Can't disasm instruction at offset 104: c0 34 1d 00 00 00 00 c0
> Disassembly logic can truncate at 104, ignoring last 8 bytes.
>
> Update the dummy_tramp_addr field offset calculation from the end
> of the program to reflect its new location, for bpf_arch_text_poke()
> to update the actual trampoline's address in this field.
>
> All BPF trampoline selftests continue to pass with this patch applied.
>
> Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
> ---
> arch/powerpc/net/bpf_jit_comp.c | 34 +++++++++++++++++++--------------
> 1 file changed, 20 insertions(+), 14 deletions(-)
>
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index ef7614177cb1..b73bc9295c31 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -57,19 +57,21 @@ void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context
> * In the final pass, align the mis-aligned dummy_tramp_addr field
> * in the fimage. The alignment NOP must appear before OOL stub,
> * to make ool_stub_idx & long_branch_stub_idx constant from end.
> + *
> + * The dummy_tramp_addr field is placed at bottom of Long branch stub.
> */
> #ifdef CONFIG_PPC64
> if (fimage && image) {
> /*
> * pc points to first instruction of OOL stub,
> - * dummy_tramp_addr is past 4/3 instructions depending on
> + * dummy_tramp_addr is past 11/10 instructions depending on
> * CONFIG_PPC_FTRACE_OUT_OF_LINE is enabled/not respectively.
> *
> * The decision to emit alignment NOP must depend on the alignment
> * of dummy_tramp_addr field.
> */
> unsigned long pc = (unsigned long)fimage + CTX_NIA(ctx);
> - pc += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 4 : 3;
> + pc += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 11 : 10;
To get the address, the instruction count should be multiplied by 4:
pc += (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 11 : 10) * 4;
Also, pc may not be an appropriate name here. We are essentially
calculating the address of the dummy_tramp_addr field. `addrp` maybe?
- Hari
* Re: [PATCH v4 2/5] powerpc/bpf: Move out dummy_tramp_addr after Long branch stub
2026-05-18 7:25 ` Hari Bathini
@ 2026-05-18 7:53 ` Hari Bathini
0 siblings, 0 replies; 18+ messages in thread
From: Hari Bathini @ 2026-05-18 7:53 UTC (permalink / raw)
To: adubey, bpf
Cc: linuxppc-dev, maddy, ast, andrii, daniel, shuah, linux-kselftest,
stable
On 18/05/26 12:55 pm, Hari Bathini wrote:
>
>
> On 18/05/26 3:10 am, adubey@linux.ibm.com wrote:
>> From: Abhishek Dubey <adubey@linux.ibm.com>
>>
>> Move the long branch address space to the bottom of the long
>> branch stub. This allows uninterrupted disassembly until the
>> last 8 bytes. Exclude these last bytes from the overall
>> program length to prevent failure in assembly generation.
>> Also, align dummy_tramp_addr field with 8-byte boundary.
>>
>> Following is disassembler output for test program with moved down
>> dummy_tramp_addr field:
>> .....
>> .....
>> pc:68 left:44 a6 03 08 7c : mtlr 0
>> pc:72 left:40 bc ff ff 4b : b .-68
>> pc:76 left:36 a6 02 68 7d : mflr 11
>> pc:80 left:32 05 00 9f 42 : bcl 20, 31, .+4
>> pc:84 left:28 a6 02 88 7d : mflr 12
>> pc:88 left:24 14 00 8c e9 : ld 12, 20(12)
>> pc:92 left:20 a6 03 89 7d : mtctr 12
>> pc:96 left:16 a6 03 68 7d : mtlr 11
>> pc:100 left:12 20 04 80 4e : bctr
>> pc:104 left:8 c0 34 1d 00 :
>>
>> Failure log:
>> Can't disasm instruction at offset 104: c0 34 1d 00 00 00 00 c0
>> Disassembly logic can truncate at 104, ignoring last 8 bytes.
>>
>> Update the dummy_tramp_addr field offset calculation from the end
>> of the program to reflect its new location, for bpf_arch_text_poke()
>> to update the actual trampoline's address in this field.
>>
>> All BPF trampoline selftests continue to pass with this patch applied.
>>
>> Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
>> ---
>> arch/powerpc/net/bpf_jit_comp.c | 34 +++++++++++++++++++--------------
>> 1 file changed, 20 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/
>> bpf_jit_comp.c
>> index ef7614177cb1..b73bc9295c31 100644
>> --- a/arch/powerpc/net/bpf_jit_comp.c
>> +++ b/arch/powerpc/net/bpf_jit_comp.c
>> @@ -57,19 +57,21 @@ void bpf_jit_build_fentry_stubs(u32 *image, u32
>> *fimage, struct codegen_context
>> * In the final pass, align the mis-aligned dummy_tramp_addr field
>> * in the fimage. The alignment NOP must appear before OOL stub,
>> * to make ool_stub_idx & long_branch_stub_idx constant from end.
>> + *
>> + * The dummy_tramp_addr field is placed at bottom of Long branch
>> stub.
>> */
>> #ifdef CONFIG_PPC64
>> if (fimage && image) {
>> /*
>> * pc points to first instruction of OOL stub,
>> - * dummy_tramp_addr is past 4/3 instructions depending on
>> + * dummy_tramp_addr is past 11/10 instructions depending on
>> * CONFIG_PPC_FTRACE_OUT_OF_LINE is enabled/not respectively.
>> *
>> * The decision to emit alignment NOP must depend on the
>> alignment
>> * of dummy_tramp_addr field.
>> */
>> unsigned long pc = (unsigned long)fimage + CTX_NIA(ctx);
>
>> - pc += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 4 : 3;
>> + pc += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 11 : 10;
>
> To get the address, should multiply the instruction count with 4..
>
> pc += (IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 11 : 10) * 4;
>
> Also, pc may not be appropriate name here. We are essentially
> calculating the pointer address of dummy_tramp_addr. `addrp` maybe?
Something like this:
+ u32 *addrp = fimage + ctx->idx;
+
+ addrp += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 4 : 3;
+ if (!IS_ALIGNED((unsigned long)addrp, 8))
+ EMIT(PPC_RAW_NOP());
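
The reason the u32 pointer form needs no explicit "* 4" can be sketched in plain C: adding n to a u32 pointer advances it by n * sizeof(u32) bytes. The helper names below are hypothetical, and the instruction count is a parameter because it differs between patches (4/3 in patch 1, 11/10 in this one).

```c
#include <stdint.h>

/* Hypothetical helper: field address via u32 pointer arithmetic. */
static uintptr_t field_addr_via_ptr(const uint32_t *stub, unsigned int insns)
{
	/* The compiler scales insns by sizeof(uint32_t) implicitly. */
	return (uintptr_t)(stub + insns);
}

/* Hypothetical helper: the same address via explicit byte arithmetic. */
static uintptr_t field_addr_via_bytes(const uint32_t *stub, unsigned int insns)
{
	return (uintptr_t)stub + (uintptr_t)insns * sizeof(uint32_t);
}
```

Both helpers return the same address for any count that stays within the buffer, so the pointer form drops the multiplication without changing the result.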
- Hari
* Re: [PATCH v4 0/5] powerpc/bpf: Add support for verifier selftest
2026-05-17 21:40 [PATCH v4 0/5] powerpc/bpf: Add support for verifier selftest adubey
` (4 preceding siblings ...)
2026-05-17 21:40 ` [PATCH v4 5/5] selftest/bpf: Add tailcall " adubey
@ 2026-05-18 11:44 ` Christophe Leroy (CS GROUP)
5 siblings, 0 replies; 18+ messages in thread
From: Christophe Leroy (CS GROUP) @ 2026-05-18 11:44 UTC (permalink / raw)
To: adubey, bpf
Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
linux-kselftest, stable
On 17/05/2026 at 23:40, adubey@linux.ibm.com wrote:
> From: Abhishek Dubey <adubey@linux.ibm.com>
>
> The verifier selftest validates JITed instructions by matching expected
> disassembly output. The first two patches fix issues in powerpc instruction
> disassembly that were causing test flow failures. The fix is common for
> 64-bit & 32-bit powerpc. Add support for the powerpc-specific "__powerpc64"
> architecture tag in the third patch, enabling proper test filtering in
> verifier test files. Introduce verifier testcases for tailcalls on powerpc64
> in the final patch.
Build fails:
DESCEND objtool
INSTALL libsubcmd_headers
CC arch/powerpc/net/bpf_jit_comp32.o
arch/powerpc/net/bpf_jit_comp32.c:232:6: error: conflicting types for
'bpf_jit_build_epilogue'; have 'void(u32 *, struct codegen_context *)'
{aka 'void(unsigned int *, struct codegen_context *)'}
232 | void bpf_jit_build_epilogue(u32 *image, struct codegen_context
*ctx)
| ^~~~~~~~~~~~~~~~~~~~~~
In file included from arch/powerpc/net/bpf_jit_comp32.c:19:
arch/powerpc/net/bpf_jit.h:217:6: note: previous declaration of
'bpf_jit_build_epilogue' with type 'void(u32 *, u32 *, struct
codegen_context *)' {aka 'void(unsigned int *, unsigned int *, struct
codegen_context *)'}
217 | void bpf_jit_build_epilogue(u32 *image, u32 *fimage, struct
codegen_context *ctx);
| ^~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/net/bpf_jit_comp32.c: In function 'bpf_jit_build_epilogue':
arch/powerpc/net/bpf_jit_comp32.c:240:43: error: passing argument 2 of
'bpf_jit_build_fentry_stubs' from incompatible pointer type
[-Wincompatible-pointer-types]
240 | bpf_jit_build_fentry_stubs(image, ctx);
| ^~~
| |
| struct codegen_context *
arch/powerpc/net/bpf_jit.h:218:50: note: expected 'u32 *' {aka 'unsigned
int *'} but argument is of type 'struct codegen_context *'
218 | void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct
codegen_context *ctx);
| ~~~~~^~~~~~
arch/powerpc/net/bpf_jit_comp32.c:240:9: error: too few arguments to
function 'bpf_jit_build_fentry_stubs'; expected 3, have 2
240 | bpf_jit_build_fentry_stubs(image, ctx);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/net/bpf_jit.h:218:6: note: declared here
218 | void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct
codegen_context *ctx);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
make[4]: *** [scripts/Makefile.build:289:
arch/powerpc/net/bpf_jit_comp32.o] Error 1
make[3]: *** [scripts/Makefile.build:548: arch/powerpc/net] Error 2
make[2]: *** [scripts/Makefile.build:548: arch/powerpc] Error 2
make[1]: *** [/home/chleroy/linux-powerpc/Makefile:2143: .] Error 2
make: *** [Makefile:248: __sub-make] Error 2
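
The log suggests the prototypes in bpf_jit.h gained a fimage parameter while the 32-bit definition and its call site did not. Below is a minimal stand-alone sketch of the two staying in sync. The typedef, the struct layout and the function bodies are stand-ins assumed for illustration, not the kernel's definitions; only the two signatures are taken from the compiler notes above.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;				/* stand-in for the kernel type */
struct codegen_context { unsigned int idx; };	/* stand-in, real struct has more fields */

/* Prototypes as quoted in the compiler notes for bpf_jit.h. */
void bpf_jit_build_epilogue(u32 *image, u32 *fimage, struct codegen_context *ctx);
void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context *ctx);

/* Stand-in body: only counts one emitted instruction. */
void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context *ctx)
{
	(void)image;
	(void)fimage;
	ctx->idx++;
}

/* Definition with the extra parameter, forwarding it to the stub builder. */
void bpf_jit_build_epilogue(u32 *image, u32 *fimage, struct codegen_context *ctx)
{
	bpf_jit_build_fentry_stubs(image, fimage, ctx);
}
```

Whether the 32-bit path should use fimage or simply accept and ignore it is a decision for the patch author; the sketch only shows the signatures agreeing.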
Christophe
Thread overview: 18+ messages
2026-05-17 21:40 [PATCH v4 0/5] powerpc/bpf: Add support for verifier selftest adubey
2026-05-17 21:40 ` [PATCH v4 1/5] powerpc/bpf: fix alignment of long branch trampoline address adubey
2026-05-17 18:02 ` sashiko-bot
2026-05-17 18:30 ` bot+bpf-ci
2026-05-18 7:18 ` Hari Bathini
2026-05-17 21:40 ` [PATCH v4 2/5] powerpc/bpf: Move out dummy_tramp_addr after Long branch stub adubey
2026-05-17 18:23 ` sashiko-bot
2026-05-17 18:30 ` bot+bpf-ci
2026-05-18 7:25 ` Hari Bathini
2026-05-18 7:53 ` Hari Bathini
2026-05-17 21:40 ` [PATCH v4 3/5] selftest/bpf: Fixing powerpc JIT disassembly failure adubey
2026-05-17 18:18 ` bot+bpf-ci
2026-05-17 18:38 ` sashiko-bot
2026-05-17 21:40 ` [PATCH v4 4/5] selftest/bpf: Enable verifier selftest for powerpc64 adubey
2026-05-17 18:18 ` bot+bpf-ci
2026-05-17 21:40 ` [PATCH v4 5/5] selftest/bpf: Add tailcall " adubey
2026-05-17 19:14 ` sashiko-bot
2026-05-18 11:44 ` [PATCH v4 0/5] powerpc/bpf: Add support for verifier selftest Christophe Leroy (CS GROUP)