LinuxPPC-Dev Archive on lore.kernel.org


* Re: [PATCHv2] powerpc/44x: Set GPIO chip parent
From: Linus Walleij @ 2026-05-17 10:47 UTC (permalink / raw)
  To: Rosen Penev
  Cc: linuxppc-dev, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), open list
In-Reply-To: <20260517063754.21819-1-rosenp@gmail.com>

On Sun, May 17, 2026 at 8:38 AM Rosen Penev <rosenp@gmail.com> wrote:

> The PPC4xx GPIO driver stopped assigning an explicit parent
> to the gpio_chip when it moved away from of_mm_gpiochip_add_data().
>
> Restore that association from the platform device so OF GPIO lookup
> can match phandles to the registered gpiochip.
>
> Tested on: Cisco MX60W. No more probe deferral.
>
> Assisted-by: Codex:GPT-5.5
> Fixes: 1044dbaf2a77 ("powerpc/44x: Change GPIO driver to a proper platform driver")
> Signed-off-by: Rosen Penev <rosenp@gmail.com>

Reviewed-by: Linus Walleij <linusw@kernel.org>

Yours,
Linus Walleij



* Re: [PATCH] powerpc: define __LITTLE_ENDIAN and __BIG_ENDIAN for math-emu
From: David Laight @ 2026-05-17 13:54 UTC (permalink / raw)
  To: Mingcong Bai
  Cc: linux-kernel, Xi Ruoyao, Kexy Biscuit, stable, kernel test robot,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy (CS GROUP), linuxppc-dev
In-Reply-To: <20260517041423.71243-1-jeffbai@aosc.io>

On Sun, 17 May 2026 12:14:21 +0800
Mingcong Bai <jeffbai@aosc.io> wrote:

> Similar to commit b929926f01f2 ("sh: define __BIG_ENDIAN for math-emu"),
> define __LITTLE_ENDIAN and __BIG_ENDIAN as 0 to mitigate build-time
> warnings:
> 
>   ./include/math-emu/double.h:59:21: error: ‘__BIG_ENDIAN’ is not defined, evaluates to ‘0’ [-Werror=undef]
>      59 | #if __BYTE_ORDER == __BIG_ENDIAN
>         |
> 
> Cc: stable@vger.kernel.org
> Fixes: 13da9e200fe4 ("Revert "endian: #define __BYTE_ORDER"")
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202507301656.7FEX6J5W-lkp@intel.com/
> Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
> ---
>  arch/powerpc/include/asm/sfp-machine.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/sfp-machine.h b/arch/powerpc/include/asm/sfp-machine.h
> index 8b957aabb826d..db8525605c026 100644
> --- a/arch/powerpc/include/asm/sfp-machine.h
> +++ b/arch/powerpc/include/asm/sfp-machine.h
> @@ -319,10 +319,12 @@
>  #define abort()								\
>  	return 0
>  
> -#ifdef __BIG_ENDIAN
> +#ifdef __BIG_ENDIAN__
>  #define __BYTE_ORDER __BIG_ENDIAN
> +#define __LITTLE_ENDIAN 0
>  #else
>  #define __BYTE_ORDER __LITTLE_ENDIAN
> +#define __BIG_ENDIAN 0
>  #endif

I thought the expected/correct value for __BYTE_ORDER__ was either 1234 or 4321.
(apart from pdp11's 2143).

-- David

>  
>  /* Exception flags. */




* Re: [PATCH] powerpc: define __LITTLE_ENDIAN and __BIG_ENDIAN for math-emu
From: Xi Ruoyao @ 2026-05-17 15:40 UTC (permalink / raw)
  To: David Laight, Mingcong Bai
  Cc: linux-kernel, Kexy Biscuit, stable, kernel test robot,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy (CS GROUP), linuxppc-dev
In-Reply-To: <20260517145421.2d1ac77c@pumpkin>

On Sun, 2026-05-17 at 14:54 +0100, David Laight wrote:
> On Sun, 17 May 2026 12:14:21 +0800
> Mingcong Bai <jeffbai@aosc.io> wrote:
> 
> > Similar to commit b929926f01f2 ("sh: define __BIG_ENDIAN for math-emu"),
> > define __LITTLE_ENDIAN and __BIG_ENDIAN as 0 to mitigate build-time
> > warnings:
> > 
> >   ./include/math-emu/double.h:59:21: error: ‘__BIG_ENDIAN’ is not defined, evaluates to ‘0’ [-Werror=undef]
> >      59 | #if __BYTE_ORDER == __BIG_ENDIAN
> >         |
> > 
> > Cc: stable@vger.kernel.org
> > Fixes: 13da9e200fe4 ("Revert "endian: #define __BYTE_ORDER"")
> > Reported-by: kernel test robot <lkp@intel.com>
> > Closes: https://lore.kernel.org/oe-kbuild-all/202507301656.7FEX6J5W-lkp@intel.com/
> > Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
> > ---
> >  arch/powerpc/include/asm/sfp-machine.h | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/include/asm/sfp-machine.h b/arch/powerpc/include/asm/sfp-machine.h
> > index 8b957aabb826d..db8525605c026 100644
> > --- a/arch/powerpc/include/asm/sfp-machine.h
> > +++ b/arch/powerpc/include/asm/sfp-machine.h
> > @@ -319,10 +319,12 @@
> >  #define abort()								\
> >  	return 0
> >  
> > -#ifdef __BIG_ENDIAN
> > +#ifdef __BIG_ENDIAN__
> >  #define __BYTE_ORDER __BIG_ENDIAN
> > +#define __LITTLE_ENDIAN 0
> >  #else
> >  #define __BYTE_ORDER __LITTLE_ENDIAN
> > +#define __BIG_ENDIAN 0
> >  #endif
> 
> I thought the expected/correct value for __BYTE_ORDER__ was either 1234 or 4321.
> (apart from pdp11's 2143).

Should we just do

#define __BYTE_ORDER __BYTE_ORDER__
#define __LITTLE_ENDIAN __ORDER_LITTLE_ENDIAN__
#define __BIG_ENDIAN __ORDER_BIG_ENDIAN__

then?  __BYTE_ORDER__ etc. have been available since gcc 4.6, and we now
require gcc >= 8 to build the kernel.
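
As a rough illustration of the suggestion above, the sketch below maps the glibc-style names onto the compiler's predefined macros and checks them against a runtime probe. It assumes GCC or Clang, which predefine `__BYTE_ORDER__`, `__ORDER_LITTLE_ENDIAN__` and `__ORDER_BIG_ENDIAN__`. The `#ifndef` guards and the helpers `host_is_little_endian()` and `byte_order_macros_agree()` are hypothetical additions so the sketch also builds in userspace; they are not part of the proposed kernel change.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Map the glibc-style names onto the compiler's predefined macros.
 * The guards exist only because libc headers may already define them.
 */
#ifndef __BYTE_ORDER
#define __BYTE_ORDER	__BYTE_ORDER__
#endif
#ifndef __LITTLE_ENDIAN
#define __LITTLE_ENDIAN	__ORDER_LITTLE_ENDIAN__
#endif
#ifndef __BIG_ENDIAN
#define __BIG_ENDIAN	__ORDER_BIG_ENDIAN__
#endif

/* Both names are defined and distinct, so "#if __BYTE_ORDER == ..." is unambiguous. */
static_assert(__LITTLE_ENDIAN != __BIG_ENDIAN, "byte order constants must differ");

/* Runtime probe: returns 1 if the least significant byte is stored first. */
int host_is_little_endian(void)
{
	uint32_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 1;
}

/* Returns 1 if the compile-time macros agree with the runtime probe. */
int byte_order_macros_agree(void)
{
#if __BYTE_ORDER == __LITTLE_ENDIAN
	return host_is_little_endian();
#elif __BYTE_ORDER == __BIG_ENDIAN
	return !host_is_little_endian();
#else
	return 0;
#endif
}
```

Because neither name is left undefined, a comparison such as `#if __BYTE_ORDER == __BIG_ENDIAN` should no longer trip `-Wundef` on either endianness.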


-- 
Xi Ruoyao <xry111@xry111.site>



* [PATCH v4 1/5] powerpc/bpf: fix alignment of long branch trampoline address
From: adubey @ 2026-05-17 21:40 UTC (permalink / raw)
  To: bpf
  Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
	linux-kselftest, stable, Abhishek Dubey
In-Reply-To: <20260517214043.12975-1-adubey@linux.ibm.com>

From: Abhishek Dubey <adubey@linux.ibm.com>

Ensure the dummy trampoline address field present between the OOL stub
and the long branch stub is 8-byte aligned, for memory compatibility
when its content is loaded into a register.

Reported-by: Hari Bathini <hbathini@linux.ibm.com>
Fixes: d243b62b7bd3 ("powerpc64/bpf: Add support for bpf trampolines")
Cc: stable@vger.kernel.org
Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
---
 arch/powerpc/net/bpf_jit.h        |  4 ++--
 arch/powerpc/net/bpf_jit_comp.c   | 34 ++++++++++++++++++++++++++-----
 arch/powerpc/net/bpf_jit_comp64.c |  4 ++--
 3 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index f32de8704d4d..71e6e7d01057 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -214,8 +214,8 @@ int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context *
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct codegen_context *ctx,
 		       u32 *addrs, int pass, bool extra_pass);
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
-void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
-void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx);
+void bpf_jit_build_epilogue(u32 *image, u32 *fimage, struct codegen_context *ctx);
+void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
 int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr);
 void prepare_for_fsession_fentry(u32 *image, struct codegen_context *ctx, int cookie_cnt,
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 53ab97ad6074..ef7614177cb1 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -49,11 +49,34 @@ asm (
 "	.popsection				;"
 );
 
-void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx)
+void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context *ctx)
 {
 	int ool_stub_idx, long_branch_stub_idx;
 
 	/*
+	 * In the final pass, align the mis-aligned dummy_tramp_addr field
+	 * in the fimage. The alignment NOP must appear before OOL stub,
+	 * to make ool_stub_idx & long_branch_stub_idx constant from end.
+	 */
+#ifdef CONFIG_PPC64
+	if (fimage && image) {
+		/*
+		 * pc points to first instruction of OOL stub,
+		 * dummy_tramp_addr is past 4/3 instructions depending on
+		 * CONFIG_PPC_FTRACE_OUT_OF_LINE is enabled/not respectively.
+		 *
+		 * The decision to emit alignment NOP must depend on the alignment
+		 * of dummy_tramp_addr field.
+		 */
+		unsigned long pc = (unsigned long)fimage + CTX_NIA(ctx);
+		pc += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 4 : 3;
+
+		if (!IS_ALIGNED(pc, 8))
+			EMIT(PPC_RAW_NOP());
+	}
+#endif
+
+	/*      nop     // optional, for alignment of dummy_tramp_addr
 	 * Out-of-line stub:
 	 *	mflr	r0
 	 *	[b|bl]	tramp
@@ -70,7 +93,7 @@ void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx)
 
 	/*
 	 * Long branch stub:
-	 *	.long	<dummy_tramp_addr>
+	 *	.long	<dummy_tramp_addr>  // 8-byte aligned
 	 *	mflr	r11
 	 *	bcl	20,31,$+4
 	 *	mflr	r12
@@ -81,6 +104,7 @@ void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx)
 	 */
 	if (image)
 		*((unsigned long *)&image[ctx->idx]) = (unsigned long)dummy_tramp;
+
 	ctx->idx += SZL / 4;
 	long_branch_stub_idx = ctx->idx;
 	EMIT(PPC_RAW_MFLR(_R11));
@@ -107,7 +131,7 @@ int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg,
 		PPC_JMP(ctx->alt_exit_addr);
 	} else {
 		ctx->alt_exit_addr = ctx->idx * 4;
-		bpf_jit_build_epilogue(image, ctx);
+		bpf_jit_build_epilogue(image, NULL, ctx);
 	}
 
 	return 0;
@@ -286,7 +310,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
 	 */
 	bpf_jit_build_prologue(NULL, &cgctx);
 	addrs[fp->len] = cgctx.idx * 4;
-	bpf_jit_build_epilogue(NULL, &cgctx);
+	bpf_jit_build_epilogue(NULL, NULL, &cgctx);
 
 	fixup_len = fp->aux->num_exentries * BPF_FIXUP_LEN * 4;
 	extable_len = fp->aux->num_exentries * sizeof(struct exception_table_entry);
@@ -318,7 +342,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_verifier_env *env, struct bpf_pr
 			bpf_jit_binary_pack_free(fhdr, hdr);
 			goto out_err;
 		}
-		bpf_jit_build_epilogue(code_base, &cgctx);
+		bpf_jit_build_epilogue(code_base, fcode_base, &cgctx);
 
 		if (bpf_jit_enable > 1)
 			pr_info("Pass %d: shrink = %d, seen = 0x%x\n", pass,
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index db364d9083e7..885dc8cf55a2 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -398,7 +398,7 @@ static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context *ctx
 	}
 }
 
-void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
+void bpf_jit_build_epilogue(u32 *image, u32 *fimage, struct codegen_context *ctx)
 {
 	bpf_jit_emit_common_epilogue(image, ctx);
 
@@ -407,7 +407,7 @@ void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
 
 	EMIT(PPC_RAW_BLR());
 
-	bpf_jit_build_fentry_stubs(image, ctx);
+	bpf_jit_build_fentry_stubs(image, fimage, ctx);
 }
 
 /*
-- 
2.52.0
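
The padding decision described in this patch can be sketched in isolation as follows. This is a standalone illustration, not the kernel code: `needs_alignment_nop()` and `INSN_BYTES` are hypothetical names, and the sketch assumes fixed 4-byte instructions. It also scales the instruction count to a byte offset before testing alignment, which is an assumption about the intent; the hunk above adds the raw count (4 or 3) to `pc`.

```c
#include <assert.h>
#include <stdint.h>

#define INSN_BYTES 4u	/* fixed-width instruction size assumed by this sketch */

/*
 * Return 1 if one 4-byte NOP must be emitted at 'pc' so that a field
 * located 'insns_before_field' instructions later becomes 'align'-byte
 * aligned, and 0 if the field is already aligned.
 */
int needs_alignment_nop(uintptr_t pc, unsigned int insns_before_field,
			unsigned int align)
{
	uintptr_t field;

	assert(align != 0 && (align & (align - 1)) == 0);	/* power of two */
	assert(pc % INSN_BYTES == 0);				/* instruction aligned */

	/* Convert the instruction count to bytes before checking alignment. */
	field = pc + (uintptr_t)insns_before_field * INSN_BYTES;
	return (field & (align - 1)) != 0;
}
```

For example, with the OOL stub starting at 0x1000 and four instructions before the field, the field lands at 0x1010 and no NOP is needed; with three instructions it lands at 0x100c and one NOP shifts it to 0x1010.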




* [PATCH v4 0/5] powerpc/bpf: Add support for verifier selftest
From: adubey @ 2026-05-17 21:40 UTC (permalink / raw)
  To: bpf
  Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
	linux-kselftest, stable, Abhishek Dubey

From: Abhishek Dubey <adubey@linux.ibm.com>

The verifier selftest validates JITed instructions by matching expected
disassembly output. Patches 2 and 3 fix issues in powerpc instruction
disassembly that were causing test failures. The fix is common to
64-bit and 32-bit powerpc. Patch 4 adds support for the powerpc-specific
"__arch_powerpc64" architecture tag, enabling proper test filtering in
verifier test files. The final patch introduces verifier test cases for
tail calls on powerpc64.

The first patch in the series is a fix that aligns the long branch
trampoline address to an 8-byte boundary. The subsequent patches
enable verifier selftests on powerpc.

Issue Details:
--------------

    The long branch stub in the trampoline implementation[1] provides
    the flexibility to handle both short and long branch distances to
    the actual trampoline. However, the 8-byte dummy_tramp_addr field
    sitting before the long branch stub leads to failures when enabling
    the verifier-based selftests for ppc64.

    The verifier selftests require disassembling the final JITed image
    to get native instructions. The disassembled instruction sequence
    is then matched against the sequence of instructions provided in
    the test file under the __jited() wrapper. The final JITed image
    contains the out-of-line stub and the long branch stub as part of
    epilogue JITing for a BPF program. The 8 bytes reserved for the
    dummy_tramp address are sandwiched between these two stubs. They
    hold the memory address of the dummy trampoline during trampoline
    invocation and do not correspond to any powerpc instruction. So
    disassembly fails, which in turn fails the verifier selftests.

    The following code snippet shows the problem with the current
    arrangement of dummy_tramp_addr.
    
    /* Out-of-line stub */
    mflr    r0  
    [b|bl]  tramp
    mtlr    r0 //only with OOL 
    b       bpf_func + 4 
    /* Long branch stub */
    .long   <dummy_tramp_addr>  <---Invalid bytes sequence, disassembly fails
    mflr    r11 
    bcl     20,31,$+4
    mflr    r12 
    ld      r12, -8-SZL(r12)
    mtctr   r12 
    mtlr    r11 //retain ftrace ABI 
    bctr

    Consider test program binary of size 112 bytes:
    0:  00000060 10004de8 00002039 f8ff21f9 81ff21f8 7000e1fb 3000e13b
    28: 3000e13b 2a006038 f8ff7ff8 00000039 7000e1eb 80002138 7843037d
    56: 2000804e a602087c 00000060 a603087c bcffff4b c0341d00 000000c0
    84: a602687d 05009f42 a602887d f0ff8ce9 a603897d a603687d 2004804e

    Disassembly output of above binary for ppc64le:
    pc:0     left:112    00 00 00 60  :  nop
    pc:4     left:108    10 00 4d e8  :  ld 2, 16(13)
    pc:8     left:104    00 00 20 39  :  li 9, 0
    pc:12    left:100    f8 ff 21 f9  :  std 9, -8(1)
    pc:16    left:96     81 ff 21 f8  :  stdu 1, -128(1)
    pc:20    left:92     70 00 e1 fb  :  std 31, 112(1)
    pc:24    left:88     30 00 e1 3b  :  addi 31, 1, 48
    pc:28    left:84     30 00 e1 3b  :  addi 31, 1, 48
    pc:32    left:80     2a 00 60 38  :  li 3, 42
    pc:36    left:76     f8 ff 7f f8  :  std 3, -8(31)
    pc:40    left:72     00 00 00 39  :  li 8, 0
    pc:44    left:68     70 00 e1 eb  :  ld 31, 112(1)
    pc:48    left:64     80 00 21 38  :  addi 1, 1, 128
    pc:52    left:60     78 43 03 7d  :  mr    3, 8
    pc:56    left:56     20 00 80 4e  :  blr
    pc:60    left:52     a6 02 08 7c  :  mflr 0
    pc:64    left:48     00 00 00 60  :  nop
    pc:68    left:44     a6 03 08 7c  :  mtlr 0
    pc:72    left:40     bc ff ff 4b  :  b .-68
    pc:76    left:36     c0 34 1d 00  :
    ...

    Failure log:
    Can't disasm instruction at offset 76: c0 34 1d 00 00 00 00 c0 a6 02 68 7d 05 00 9f 42
    --------------------------------------

    Observation:
    Can't disasm instruction at offset 76 as this address has
    ".long <dummy_tramp_addr>" (0xc0341d00000000c0)
    But valid instructions follow at offset 84 onwards.

    Move the long branch address space to the bottom of the long
    branch stub. This allows uninterrupted disassembly until the
    last 8 bytes. Exclude these last bytes from the overall
    program length to prevent failure in assembly generation.

    The following is the disassembler output for the same test program
    with the dummy_tramp_addr field moved down:
    .....
    .....
    pc:68    left:44     a6 03 08 7c  :  mtlr 0
    pc:72    left:40     bc ff ff 4b  :  b .-68
    pc:76    left:36     a6 02 68 7d  :  mflr 11
    pc:80    left:32     05 00 9f 42  :  bcl 20, 31, .+4
    pc:84    left:28     a6 02 88 7d  :  mflr 12
    pc:88    left:24     14 00 8c e9  :  ld 12, 20(12)
    pc:92    left:20     a6 03 89 7d  :  mtctr 12
    pc:96    left:16     a6 03 68 7d  :  mtlr 11
    pc:100   left:12     20 04 80 4e  :  bctr
    pc:104   left:8      c0 34 1d 00  :

    Failure log:
    Can't disasm instruction at offset 104: c0 34 1d 00 00 00 00 c0
    ---------------------------------------
    Disassembly logic can truncate at 104, ignoring last 8 bytes.

    Update the dummy_tramp_addr field offset calculation from the end
    of the program to reflect its new location, for bpf_arch_text_poke()
    to update the actual trampoline's address in this field.

    [1] https://lore.kernel.org/all/20241030070850.1361304-18-hbathini@linux.ibm.com

v3->v4:
  Changed logic for emitting alignment NOP

v2->v3:
  Removed fixed NOP from bottom of long branch stub
  Rebased on top of bpf-next

v1->v2:
  Added fix-patch to correct memory alignment in-place
  Moved the optional alignment NOP before OOL stub

[v1]: https://lore.kernel.org/bpf/20260225013627.22098-1-adubey@linux.ibm.com
[v2]: https://lore.kernel.org/bpf/20260403004011.44417-1-adubey@linux.ibm.com
[v3]: https://lore.kernel.org/bpf/20260411221413.44304-1-adubey@linux.ibm.com

Abhishek Dubey (5):
  powerpc/bpf: fix alignment of long branch trampoline address
  powerpc/bpf: Move out dummy_tramp_addr after Long branch stub
  selftest/bpf: Fixing powerpc JIT disassembly failure
  selftest/bpf: Enable verifier selftest for powerpc64
  selftest/bpf: Add tailcall verifier selftest for powerpc64

 arch/powerpc/net/bpf_jit.h                    |  4 +-
 arch/powerpc/net/bpf_jit_comp.c               | 60 ++++++++++++----
 arch/powerpc/net/bpf_jit_comp64.c             |  4 +-
 .../selftests/bpf/jit_disasm_helpers.c        | 13 +++-
 tools/testing/selftests/bpf/progs/bpf_misc.h  |  1 +
 .../bpf/progs/verifier_tailcall_jit.c         | 69 +++++++++++++++++++
 tools/testing/selftests/bpf/test_loader.c     |  5 ++
 7 files changed, 136 insertions(+), 20 deletions(-)

-- 
2.52.0




* [PATCH v4 2/5] powerpc/bpf: Move out dummy_tramp_addr after Long branch stub
From: adubey @ 2026-05-17 21:40 UTC (permalink / raw)
  To: bpf
  Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
	linux-kselftest, stable, Abhishek Dubey
In-Reply-To: <20260517214043.12975-1-adubey@linux.ibm.com>

From: Abhishek Dubey <adubey@linux.ibm.com>

Move the long branch address space to the bottom of the long
branch stub. This allows uninterrupted disassembly until the
last 8 bytes. Exclude these last bytes from the overall
program length to prevent failure in assembly generation.
Also, align dummy_tramp_addr field with 8-byte boundary.

The following is the disassembler output for the test program with the
dummy_tramp_addr field moved down:
.....
.....
pc:68    left:44     a6 03 08 7c  :  mtlr 0
pc:72    left:40     bc ff ff 4b  :  b .-68
pc:76    left:36     a6 02 68 7d  :  mflr 11
pc:80    left:32     05 00 9f 42  :  bcl 20, 31, .+4
pc:84    left:28     a6 02 88 7d  :  mflr 12
pc:88    left:24     14 00 8c e9  :  ld 12, 20(12)
pc:92    left:20     a6 03 89 7d  :  mtctr 12
pc:96    left:16     a6 03 68 7d  :  mtlr 11
pc:100   left:12     20 04 80 4e  :  bctr
pc:104   left:8      c0 34 1d 00  :

Failure log:
Can't disasm instruction at offset 104: c0 34 1d 00 00 00 00 c0
Disassembly logic can truncate at 104, ignoring last 8 bytes.

Update the dummy_tramp_addr field offset calculation from the end
of the program to reflect its new location, for bpf_arch_text_poke()
to update the actual trampoline's address in this field.

All BPF trampoline selftests continue to pass with this patch applied.

Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
---
 arch/powerpc/net/bpf_jit_comp.c | 34 +++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index ef7614177cb1..b73bc9295c31 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -57,19 +57,21 @@ void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context
 	 * In the final pass, align the mis-aligned dummy_tramp_addr field
 	 * in the fimage. The alignment NOP must appear before OOL stub,
 	 * to make ool_stub_idx & long_branch_stub_idx constant from end.
+	 *
+	 * The dummy_tramp_addr field is placed at bottom of Long branch stub.
 	 */
 #ifdef CONFIG_PPC64
 	if (fimage && image) {
 		/*
 		 * pc points to first instruction of OOL stub,
-		 * dummy_tramp_addr is past 4/3 instructions depending on
+		 * dummy_tramp_addr is past 11/10 instructions depending on
 		 * CONFIG_PPC_FTRACE_OUT_OF_LINE is enabled/not respectively.
 		 *
 		 * The decision to emit alignment NOP must depend on the alignment
 		 * of dummy_tramp_addr field.
 		 */
 		unsigned long pc = (unsigned long)fimage + CTX_NIA(ctx);
-		pc += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 4 : 3;
+		pc += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 11 : 10;
 
 		if (!IS_ALIGNED(pc, 8))
 			EMIT(PPC_RAW_NOP());
@@ -93,28 +95,29 @@ void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context
 
 	/*
 	 * Long branch stub:
-	 *	.long	<dummy_tramp_addr>  // 8-byte aligned
 	 *	mflr	r11
 	 *	bcl	20,31,$+4
-	 *	mflr	r12
-	 *	ld	r12, -8-SZL(r12)
+	 *	mflr	r12	// lr/r12 stores pc of current(this) inst.
+	 *	ld	r12, 20(r12) // offset(dummy_tramp_addr) from prev inst. is 20
 	 *	mtctr	r12
-	 *	mtlr	r11 // needed to retain ftrace ABI
+	 *	mtlr	r11	// needed to retain ftrace ABI
 	 *	bctr
+	 *	.long	<dummy_tramp_addr>  // 8-byte aligned
 	 */
-	if (image)
-		*((unsigned long *)&image[ctx->idx]) = (unsigned long)dummy_tramp;
-
-	ctx->idx += SZL / 4;
 	long_branch_stub_idx = ctx->idx;
 	EMIT(PPC_RAW_MFLR(_R11));
 	EMIT(PPC_RAW_BCL4());
 	EMIT(PPC_RAW_MFLR(_R12));
-	EMIT(PPC_RAW_LL(_R12, _R12, -8-SZL));
+	EMIT(PPC_RAW_LL(_R12, _R12, 20));
 	EMIT(PPC_RAW_MTCTR(_R12));
 	EMIT(PPC_RAW_MTLR(_R11));
 	EMIT(PPC_RAW_BCTR());
 
+	if (image)
+		*((unsigned long *)&image[ctx->idx]) = (unsigned long)dummy_tramp;
+
+	ctx->idx += SZL / 4;
+
 	if (!bpf_jit_ool_stub) {
 		bpf_jit_ool_stub = (ctx->idx - ool_stub_idx) * 4;
 		bpf_jit_long_branch_stub = (ctx->idx - long_branch_stub_idx) * 4;
@@ -1284,6 +1287,7 @@ static void do_isync(void *info __maybe_unused)
  * bpf_func:
  *	[nop|b]	ool_stub
  * 2. Out-of-line stub:
+ *	nop	// optional nop for alignment
  * ool_stub:
  *	mflr	r0
  *	[b|bl]	<bpf_prog>/<long_branch_stub>
@@ -1291,14 +1295,14 @@ static void do_isync(void *info __maybe_unused)
  *	b	bpf_func + 4
  * 3. Long branch stub:
  * long_branch_stub:
- *	.long	<branch_addr>/<dummy_tramp>
  *	mflr	r11
  *	bcl	20,31,$+4
  *	mflr	r12
- *	ld	r12, -16(r12)
+ *	ld	r12, 20(r12)
  *	mtctr	r12
  *	mtlr	r11 // needed to retain ftrace ABI
  *	bctr
+ *	.long	<branch_addr>/<dummy_tramp>
  *
  * dummy_tramp is used to reduce synchronization requirements.
  *
@@ -1400,10 +1404,12 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type old_t,
 	 * 1. Update the address in the long branch stub:
 	 * If new_addr is out of range, we will have to use the long branch stub, so patch new_addr
 	 * here. Otherwise, revert to dummy_tramp, but only if we had patched old_addr here.
+	 *
+	 * dummy_tramp_addr moved to bottom of long branch stub.
 	 */
 	if ((new_addr && !is_offset_in_branch_range(new_addr - ip)) ||
 	    (old_addr && !is_offset_in_branch_range(old_addr - ip)))
-		ret = patch_ulong((void *)(bpf_func_end - bpf_jit_long_branch_stub - SZL),
+		ret = patch_ulong((void *)(bpf_func_end - SZL), /* SZL: dummy_tramp_addr offset */
 				  (new_addr && !is_offset_in_branch_range(new_addr - ip)) ?
 				  (unsigned long)new_addr : (unsigned long)dummy_tramp);
 	if (ret)
-- 
2.52.0
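
The change of the load displacement from -8-SZL to 20 follows from where `bcl 20,31,$+4` leaves the link register. The sketch below works through that arithmetic using the byte offsets from the disassembly listings in this thread. `lr_value()` and `ld_displacement()` are hypothetical helper names, and the sketch assumes 4-byte instructions and an 8-byte address field (SZL = 8 on ppc64).

```c
#include <assert.h>

#define INSN_BYTES 4	/* every instruction in the stub is 4 bytes */

/*
 * The stub begins with "mflr r11; bcl 20,31,$+4; mflr r12". The bcl
 * stores the address of the instruction that follows it (the
 * "mflr r12") in LR, so r12 ends up holding stub_start + 2 * INSN_BYTES.
 */
long lr_value(long stub_start)
{
	assert(stub_start % INSN_BYTES == 0);
	return stub_start + 2 * INSN_BYTES;
}

/* Displacement that "ld r12, disp(r12)" needs to reach the address field. */
long ld_displacement(long stub_start, long field_addr)
{
	return field_addr - lr_value(stub_start);
}
```

With the new layout from the listing (stub at offset 76, field at offset 104), `ld_displacement(76, 104)` gives 20. With the old layout (field at offset 76, stub at offset 84), `ld_displacement(84, 76)` gives -16, which is -8-SZL for SZL = 8.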




* [PATCH v4 3/5] selftest/bpf: Fixing powerpc JIT disassembly failure
From: adubey @ 2026-05-17 21:40 UTC (permalink / raw)
  To: bpf
  Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
	linux-kselftest, stable, Abhishek Dubey
In-Reply-To: <20260517214043.12975-1-adubey@linux.ibm.com>

From: Abhishek Dubey <adubey@linux.ibm.com>

Ensure that the trampoline stubs JITed at the tail of the
epilogue do not expose the dummy trampoline address stored
in the last 8 bytes (for both 64-bit and 32-bit PowerPC)
to the disassembly flow. Prevent the disassembler from
ingesting this memory address, as it may occasionally decode
into a seemingly valid but incorrect instruction. Fix this
issue by truncating the last 8 bytes from JITed buffers
before supplying them for disassembly.

Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
---
 tools/testing/selftests/bpf/jit_disasm_helpers.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/jit_disasm_helpers.c b/tools/testing/selftests/bpf/jit_disasm_helpers.c
index 364c557c5115..4c6bcbe08491 100644
--- a/tools/testing/selftests/bpf/jit_disasm_helpers.c
+++ b/tools/testing/selftests/bpf/jit_disasm_helpers.c
@@ -170,9 +170,11 @@ int get_jited_program_text(int fd, char *text, size_t text_sz)
 	struct bpf_prog_info info = {};
 	__u32 info_len = sizeof(info);
 	__u32 jited_funcs, len, pc;
+	__u32 trunc_len = 0;
 	__u32 *func_lens = NULL;
 	FILE *text_out = NULL;
 	uint8_t *image = NULL;
+	char *triple = NULL;
 	int i, err = 0;
 
 	if (!llvm_initialized) {
@@ -216,9 +218,18 @@ int get_jited_program_text(int fd, char *text, size_t text_sz)
 	if (!ASSERT_OK(err, "bpf_prog_get_info_by_fd #2"))
 		goto out;
 
+	/*
+	 * The last 8 bytes contain the dummy_trampoline address in JIT
+	 * output for 64-bit and 32-bit powerpc, which can't be
+	 * disassembled to a valid instruction.
+	 */
+	triple = LLVMGetDefaultTargetTriple();
+	if (strstr(triple, "powerpc"))
+		trunc_len = 8;
+
 	for (pc = 0, i = 0; i < jited_funcs; ++i) {
 		fprintf(text_out, "func #%d:\n", i);
-		disasm_one_func(text_out, image + pc, func_lens[i]);
+		disasm_one_func(text_out, image + pc, func_lens[i] - trunc_len);
 		fprintf(text_out, "\n");
 		pc += func_lens[i];
 	}
-- 
2.52.0
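
A minimal standalone sketch of the truncation rule in this patch is shown below. `disasm_len()` and `PPC_TRAMP_ADDR_BYTES` are hypothetical names introduced for illustration. The guard against a function shorter than 8 bytes is an addition of this sketch and is not present in the hunk above, where `func_lens[i] - trunc_len` is an unsigned subtraction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PPC_TRAMP_ADDR_BYTES 8u	/* trailing dummy trampoline address, per the patch */

/*
 * Return how many leading bytes of a JITed function should be handed
 * to the disassembler for the given target triple.
 */
size_t disasm_len(const char *triple, size_t func_len)
{
	assert(triple != NULL);

	/* Only powerpc images carry the trailing address field. */
	if (!strstr(triple, "powerpc"))
		return func_len;

	/* Avoid unsigned underflow if the function is shorter than the field. */
	return func_len > PPC_TRAMP_ADDR_BYTES ?
	       func_len - PPC_TRAMP_ADDR_BYTES : 0;
}
```

For the 112-byte test program from the cover letter, this yields 104 bytes on a powerpc triple and the full 112 bytes elsewhere. Separately, strings returned by `LLVMGetDefaultTargetTriple()` are normally released with `LLVMDisposeMessage()`, which may be worth considering for the `triple` variable in the patch.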




* [PATCH v4 4/5] selftest/bpf: Enable verifier selftest for powerpc64
From: adubey @ 2026-05-17 21:40 UTC (permalink / raw)
  To: bpf
  Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
	linux-kselftest, stable, Abhishek Dubey
In-Reply-To: <20260517214043.12975-1-adubey@linux.ibm.com>

From: Abhishek Dubey <adubey@linux.ibm.com>

Enable the arch specifier "__arch_powerpc64" in the verifier
selftests for ppc64. 32-bit powerpc would require separate
handling. The changes were tested on 64-bit only.

Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
---
 tools/testing/selftests/bpf/progs/bpf_misc.h | 1 +
 tools/testing/selftests/bpf/test_loader.c    | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h b/tools/testing/selftests/bpf/progs/bpf_misc.h
index 9eeb5b0b63d6..cdc2a3de3054 100644
--- a/tools/testing/selftests/bpf/progs/bpf_misc.h
+++ b/tools/testing/selftests/bpf/progs/bpf_misc.h
@@ -158,6 +158,7 @@
 #define __arch_arm64		__arch("ARM64")
 #define __arch_riscv64		__arch("RISCV64")
 #define __arch_s390x		__arch("s390x")
+#define __arch_powerpc64	__arch("POWERPC64")
 #define __caps_unpriv(caps)	__test_tag("test_caps_unpriv=" EXPAND_QUOTE(caps))
 #define __load_if_JITed()	__test_tag("load_mode=jited")
 #define __load_if_no_JITed()	__test_tag("load_mode=no_jited")
diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c
index abdb9e6e3713..d5589355ed9e 100644
--- a/tools/testing/selftests/bpf/test_loader.c
+++ b/tools/testing/selftests/bpf/test_loader.c
@@ -377,6 +377,7 @@ enum arch {
 	ARCH_ARM64	= 0x4,
 	ARCH_RISCV64	= 0x8,
 	ARCH_S390X	= 0x10,
+	ARCH_POWERPC64	= 0x20,
 };
 
 static int get_current_arch(void)
@@ -389,6 +390,8 @@ static int get_current_arch(void)
 	return ARCH_RISCV64;
 #elif defined(__s390x__)
 	return ARCH_S390X;
+#elif defined(__powerpc64__)
+	return ARCH_POWERPC64;
 #endif
 	return ARCH_UNKNOWN;
 }
@@ -580,6 +583,8 @@ static int parse_test_spec(struct test_loader *tester,
 				arch = ARCH_RISCV64;
 			} else if (strcmp(val, "s390x") == 0) {
 				arch = ARCH_S390X;
+			} else if (strcmp(val, "POWERPC64") == 0) {
+				arch = ARCH_POWERPC64;
 			} else {
 				PRINT_FAIL("bad arch spec: '%s'\n", val);
 				err = -EINVAL;
-- 
2.52.0
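
The arch-tag filtering that this patch extends can be sketched as a bitmask check. This is a simplified stand-in for the logic in test_loader.c, not a copy of it: `arch_from_tag()` and `test_runs_on()` are hypothetical helpers, the value of `ARCH_UNKNOWN` is assumed because it is not visible in the hunk, and the rule that a test with no arch tags runs everywhere is also an assumption.

```c
#include <assert.h>
#include <string.h>

enum arch {
	ARCH_UNKNOWN	= 0x1,	/* assumed value, not shown in the hunk */
	ARCH_ARM64	= 0x4,
	ARCH_RISCV64	= 0x8,
	ARCH_S390X	= 0x10,
	ARCH_POWERPC64	= 0x20,	/* added by this patch */
};

/* Map an __arch("...") tag value to its mask bit; 0 means "bad arch spec". */
int arch_from_tag(const char *val)
{
	assert(val != NULL);

	if (strcmp(val, "ARM64") == 0)
		return ARCH_ARM64;
	if (strcmp(val, "RISCV64") == 0)
		return ARCH_RISCV64;
	if (strcmp(val, "s390x") == 0)
		return ARCH_S390X;
	if (strcmp(val, "POWERPC64") == 0)
		return ARCH_POWERPC64;
	return 0;
}

/*
 * A test is selected if it carries no arch tags at all, or if the bit
 * for the current architecture is set in its accumulated mask.
 */
int test_runs_on(int arch_mask, int current_arch)
{
	return arch_mask == 0 || (arch_mask & current_arch) != 0;
}
```

Each flag occupies its own bit, so a test tagged for several architectures simply ORs the bits together, and 0x20 is the next free bit after `ARCH_S390X`.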




* [PATCH v4 5/5] selftest/bpf: Add tailcall verifier selftest for powerpc64
From: adubey @ 2026-05-17 21:40 UTC (permalink / raw)
  To: bpf
  Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
	linux-kselftest, stable, Abhishek Dubey
In-Reply-To: <20260517214043.12975-1-adubey@linux.ibm.com>

From: Abhishek Dubey <adubey@linux.ibm.com>

Verifier testcase result for tailcalls:

# ./test_progs -t verifier_tailcall
#618/1   verifier_tailcall/invalid map type for tail call:OK
#618/2   verifier_tailcall/invalid map type for tail call @unpriv:OK
#618     verifier_tailcall:OK
#619/1   verifier_tailcall_jit/main:OK
#619     verifier_tailcall_jit:OK
Summary: 2/3 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
---
 .../bpf/progs/verifier_tailcall_jit.c         | 69 +++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/verifier_tailcall_jit.c b/tools/testing/selftests/bpf/progs/verifier_tailcall_jit.c
index 8d60c634a114..17475ecb3207 100644
--- a/tools/testing/selftests/bpf/progs/verifier_tailcall_jit.c
+++ b/tools/testing/selftests/bpf/progs/verifier_tailcall_jit.c
@@ -90,6 +90,75 @@ __jited("	popq	%rax")
 __jited("	jmp	{{.*}}")		/* jump to tail call tgt   */
 __jited("L0:	leave")
 __jited("	{{(retq|jmp	0x)}}")		/* return or jump to rethunk */
+__arch_powerpc64
+/* program entry for main(), regular function prologue */
+__jited("	nop")
+__jited("	ld 2, 16(13)")
+__jited("	li 9, 0")
+__jited("	std 9, -8(1)")
+__jited("	mflr 0")
+__jited("	std 0, 16(1)")
+__jited("	stdu 1, {{.*}}(1)")
+/* load address and call sub() via count register */
+__jited("	lis 12, {{.*}}")
+__jited("	sldi 12, 12, 32")
+__jited("	oris 12, 12, {{.*}}")
+__jited("	ori 12, 12, {{.*}}")
+__jited("	mtctr 12")
+__jited("	bctrl")
+__jited("	mr	8, 3")
+__jited("	li 8, 0")
+__jited("	addi 1, 1, {{.*}}")
+__jited("	ld 0, 16(1)")
+__jited("	mtlr 0")
+__jited("	mr	3, 8")
+__jited("	blr")
+__jited("...")
+__jited("func #1")
+/* subprogram entry for sub() */
+__jited("	nop")
+__jited("	ld 2, 16(13)")
+/* tail call prologue for subprogram */
+__jited("	ld 10, 0(1)")
+__jited("	ld 9, -8(10)")
+__jited("	cmplwi	9, 33")
+__jited("	bt	{{.*}}, {{.*}}")
+__jited("	addi 9, 10, -8")
+__jited("	std 9, -8(1)")
+__jited("	lis {{.*}}, {{.*}}")
+__jited("	sldi {{.*}}, {{.*}}, 32")
+__jited("	oris {{.*}}, {{.*}}, {{.*}}")
+__jited("	ori {{.*}}, {{.*}}, {{.*}}")
+__jited("	li {{.*}}, 0")
+__jited("	lwz 9, {{.*}}({{.*}})")
+__jited("	slwi {{.*}}, {{.*}}, 0")
+__jited("	cmplw	{{.*}}, 9")
+__jited("	bf	0, {{.*}}")
+/* bpf_tail_call implementation */
+__jited("	ld 9, -8(1)")
+__jited("	cmplwi	9, 33")
+__jited("	bf	{{.*}}, {{.*}}")
+__jited("	ld 9, 0(9)")
+__jited("	cmplwi	9, 33")
+__jited("	bt	{{.*}}, {{.*}}")
+__jited("	addi 9, 9, 1")
+__jited("	mulli 10, {{.*}}, 8")
+__jited("	add 10, 10, {{.*}}")
+__jited("	ld 10, {{.*}}(10)")
+__jited("	cmpldi	10, 0")
+__jited("	bt	{{.*}}, {{.*}}")
+__jited("	ld 10, {{.*}}(10)")
+__jited("	addi 10, 10, 16")
+__jited("	mtctr 10")
+__jited("	ld 10, -8(1)")
+__jited("	cmplwi	10, 33")
+__jited("	bt	{{.*}}, {{.*}}")
+__jited("	addi 10, 1, -8")
+__jited("	std 9, 0(10)")
+__jited("	bctr")
+__jited("	mr	3, 8")
+__jited("	blr")
+
 SEC("tc")
 __naked int main(void)
 {
-- 
2.52.0




* Re: [PATCH v3 1/5] powerpc/bpf: fix alignment of long branch trampoline address
From: adubey @ 2026-05-17 17:45 UTC (permalink / raw)
  To: Hari Bathini
  Cc: bpf, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
	linux-kselftest, stable
In-Reply-To: <7d3bbb94-575f-4119-9ef0-62cff98795ce@linux.ibm.com>

On 2026-04-28 20:59, Hari Bathini wrote:
> On 12/04/26 3:44 am, adubey@linux.ibm.com wrote:
>> From: Abhishek Dubey <adubey@linux.ibm.com>
>> 
>> Ensure the dummy trampoline address field present between the OOL stub
>> and the long branch stub is 8-byte aligned, for memory compatibility
>> when content loaded to a register.
>> 
>> Reported-by: Hari Bathini <hbathini@linux.ibm.com>
>> Fixes: d243b62b7bd3 ("powerpc64/bpf: Add support for bpf trampolines")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
>> ---
>>   arch/powerpc/net/bpf_jit.h        |  4 ++--
>>   arch/powerpc/net/bpf_jit_comp.c   | 34 
>> ++++++++++++++++++++++++++-----
>>   arch/powerpc/net/bpf_jit_comp64.c |  4 ++--
>>   3 files changed, 33 insertions(+), 9 deletions(-)
>> 
>> diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
>> index 7354e1d72f79..1184ad15d5a4 100644
>> --- a/arch/powerpc/net/bpf_jit.h
>> +++ b/arch/powerpc/net/bpf_jit.h
>> @@ -208,8 +208,8 @@ int bpf_jit_emit_func_call_rel(u32 *image, u32 
>> *fimage, struct codegen_context *
>>   int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, 
>> struct codegen_context *ctx,
>>   		       u32 *addrs, int pass, bool extra_pass);
>>   void bpf_jit_build_prologue(u32 *image, struct codegen_context 
>> *ctx);
>> -void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
>> -void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context 
>> *ctx);
>> +void bpf_jit_build_epilogue(u32 *image, u32 *fimage, struct 
>> codegen_context *ctx);
>> +void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct 
>> codegen_context *ctx);
>>   void bpf_jit_realloc_regs(struct codegen_context *ctx);
>>   int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, 
>> int tmp_reg, long exit_addr);
>>   diff --git a/arch/powerpc/net/bpf_jit_comp.c 
>> b/arch/powerpc/net/bpf_jit_comp.c
>> index a62a9a92b7b5..c255b30a37b0 100644
>> --- a/arch/powerpc/net/bpf_jit_comp.c
>> +++ b/arch/powerpc/net/bpf_jit_comp.c
>> @@ -49,11 +49,34 @@ asm (
>>   "	.popsection				;"
>>   );
>>   -void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context 
>> *ctx)
>> +void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct 
>> codegen_context *ctx)
>>   {
>>   	int ool_stub_idx, long_branch_stub_idx;
>>     	/*
>> +	 * In the final pass, align the mis-aligned dummy_tramp_addr field
>> +	 * in the fimage. The alignment NOP must appear before OOL stub,
>> +	 * to make ool_stub_idx & long_branch_stub_idx constant from end.
>> +	 *
>> +	 * Need alignment NOP in following conditions:
>> +	 *
> 
> I don't think I follow this table..
I have dropped this table. The new logic emits the alignment NOP based on
the misalignment detected for the dummy_tramp_addr field.
Please follow the changes at
https://lore.kernel.org/bpf/20260517214043.12975-2-adubey@linux.ibm.com
> 
>> +	 * OOL stub aligned	CONFIG_PPC_FTRACE_OUT_OF_LINE	Alignment NOP
>> +	 *      Y                               Y                     N
> 
>> +	 *      Y                               N                     Y
> 
> Please help me understand why a NOP is needed here
> 
>> +	 *      N                               Y                     Y
> 
>> +	 *      N                               N                     N
> 
> and also, why a NOP isn't needed here..
> 
> - Hari
-Abhishek



* Re: [PATCH v4 3/5] selftest/bpf: Fixing powerpc JIT disassembly failure
From: bot+bpf-ci @ 2026-05-17 18:18 UTC (permalink / raw)
  To: adubey, bpf
  Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
	linux-kselftest, stable, adubey, ast, andrii, daniel, martin.lau,
	eddyz87, yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260517214043.12975-4-adubey@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 2162 bytes --]

> diff --git a/tools/testing/selftests/bpf/jit_disasm_helpers.c b/tools/testing/selftests/bpf/jit_disasm_helpers.c
> index 364c557c5115..4c6bcbe08491 100644
> --- a/tools/testing/selftests/bpf/jit_disasm_helpers.c
> +++ b/tools/testing/selftests/bpf/jit_disasm_helpers.c
> @@ -170,9 +170,11 @@ int get_jited_program_text(int fd, char *text, size_t text_sz)
>  	struct bpf_prog_info info = {};
>  	__u32 info_len = sizeof(info);
>  	__u32 jited_funcs, len, pc;
> +	__u32 trunc_len = 0;
>  	__u32 *func_lens = NULL;
>  	FILE *text_out = NULL;
>  	uint8_t *image = NULL;
> +	char *triple = NULL;
>  	int i, err = 0;
>
>  	if (!llvm_initialized) {

[ ... ]

> @@ -216,9 +218,18 @@ int get_jited_program_text(int fd, char *text, size_t text_sz)
>  	if (!ASSERT_OK(err, "bpf_prog_get_info_by_fd #2"))
>  		goto out;
>
> +	/*
> +	 * last 8 bytes contains dummy_trampoline address in JIT
> +	 * output for 64-bit and 32-bit powerpc, which can't
> +	 * disassemble a to valid instruction.
> +	 */
> +	triple = LLVMGetDefaultTargetTriple();
> +	if (strstr(triple, "powerpc"))
> +		trunc_len = 8;
> +

Does this code leak the triple string? Looking at the LLVM C API,
LLVMGetDefaultTargetTriple() returns a string that must be freed with
LLVMDisposeMessage(). The sibling function disasm_one_func() in this
same file shows the correct pattern:

    if (triple)
        LLVMDisposeMessage(triple);

The cleanup section at the 'out:' label properly frees text_out, image,
and func_lens, but triple isn't included. While this is test code and
the leak is small, it violates the established cleanup pattern in the
file and can accumulate across test runs.

>  	for (pc = 0, i = 0; i < jited_funcs; ++i) {
>  		fprintf(text_out, "func #%d:\n", i);
> -		disasm_one_func(text_out, image + pc, func_lens[i]);
> +		disasm_one_func(text_out, image + pc, func_lens[i] - trunc_len);
>  		fprintf(text_out, "\n");
>  		pc += func_lens[i];
>  	}


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25998434343


* Re: [PATCH v4 4/5] selftest/bpf: Enable verifier selftest for powerpc64
From: bot+bpf-ci @ 2026-05-17 18:18 UTC (permalink / raw)
  To: adubey, bpf
  Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
	linux-kselftest, stable, adubey, ast, andrii, daniel, martin.lau,
	eddyz87, yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260517214043.12975-5-adubey@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 3647 bytes --]

> diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h b/tools/testing/selftests/bpf/progs/bpf_misc.h
> index 9eeb5b0b63d6..cdc2a3de3054 100644
> --- a/tools/testing/selftests/bpf/progs/bpf_misc.h
> +++ b/tools/testing/selftests/bpf/progs/bpf_misc.h
> @@ -158,6 +158,7 @@
>  #define __arch_arm64		__arch("ARM64")
>  #define __arch_riscv64		__arch("RISCV64")
>  #define __arch_s390x		__arch("s390x")
> +#define __arch_powerpc64	__arch("POWERPC64")
>  #define __caps_unpriv(caps)	__test_tag("test_caps_unpriv=" EXPAND_QUOTE(caps))
>  #define __load_if_JITed()	__test_tag("load_mode=jited")
>  #define __load_if_no_JITed()	__test_tag("load_mode=no_jited")
> diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c
> index abdb9e6e3713..d5589355ed9e 100644
> --- a/tools/testing/selftests/bpf/test_loader.c
> +++ b/tools/testing/selftests/bpf/test_loader.c
> @@ -377,6 +377,7 @@ enum arch {
>  	ARCH_ARM64	= 0x4,
>  	ARCH_RISCV64	= 0x8,
>  	ARCH_S390X	= 0x10,
> +	ARCH_POWERPC64	= 0x20,
>  };
>
>  static int get_current_arch(void)
> @@ -389,6 +390,8 @@ static int get_current_arch(void)
>  	return ARCH_RISCV64;
>  #elif defined(__s390x__)
>  	return ARCH_S390X;
> +#elif defined(__powerpc64__)
> +	return ARCH_POWERPC64;
>  #endif
>  	return ARCH_UNKNOWN;
>  }
> @@ -580,6 +583,8 @@ static int parse_test_spec(struct test_loader *tester,
>  				arch = ARCH_RISCV64;
>  			} else if (strcmp(val, "s390x") == 0) {
>  				arch = ARCH_S390X;
> +			} else if (strcmp(val, "POWERPC64") == 0) {
> +				arch = ARCH_POWERPC64;
>  			} else {
>  				PRINT_FAIL("bad arch spec: '%s'\n", val);
>  				err = -EINVAL;

This patch looks clean. However, unaddressed review comments from earlier
versions of related patches in the series were identified:

https://lore.kernel.org/bpf/20260517180201.87B8AC2BCB0@smtp.kernel.org

Three issues were raised by sashiko-bot@kernel.org on Sun, 17 May 2026 about
alignment logic in arch/powerpc/net/bpf_jit_comp.c (from a related commit in
this series):

Issue 1: In bpf_jit_build_fentry_stubs(), the alignment NOP is only emitted
when (fimage && image) is true. During pass 0 sizing, both are NULL, so the
NOP is skipped and proglen is calculated without it.

During final passes 1 and 2, if the NOP is emitted, could this increase the
instruction count beyond proglen and overflow into the exception table or
fixups?

Issue 2: In the same function, the alignment calculation appears flawed:

    unsigned long pc = (unsigned long)fimage + CTX_NIA(ctx);
    pc += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 11 : 10;

The comment says "11/10 instructions" but pc is a byte address. Adding 11 or
10 adds bytes instead of the intended 44 or 40 bytes (since PowerPC
instructions are 4 bytes).

Adding 4 bytes to an 8-byte aligned address makes it unaligned. Does this
force the code to emit a NOP when the target address is already aligned?

Issue 3: In bpf_jit_emit_exit_insn(), the code passes NULL for fimage:

    } else {
        ctx->alt_exit_addr = ctx->idx * 4;
        bpf_jit_build_epilogue(image, NULL, ctx);

Because fimage is strictly passed as NULL, the (fimage && image) check will
always evaluate to false. Will this cause the alignment NOP to be
unconditionally skipped for all inline epilogues, leaving their
dummy_tramp_addr fields unaligned?

These comments should be addressed in the related commits of this series.

---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25998434343


* Re: [PATCH v4 1/5] powerpc/bpf: fix alignment of long branch trampoline address
From: bot+bpf-ci @ 2026-05-17 18:30 UTC (permalink / raw)
  To: adubey, bpf
  Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
	linux-kselftest, stable, adubey, ast, andrii, daniel, martin.lau,
	eddyz87, yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260517214043.12975-2-adubey@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 6519 bytes --]

> diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
> index f32de8704d4d..71e6e7d01057 100644
> --- a/arch/powerpc/net/bpf_jit.h
> +++ b/arch/powerpc/net/bpf_jit.h
> @@ -214,8 +214,8 @@ int bpf_jit_emit_func_call_rel(u32 *image, u32 *fimage, struct codegen_context *
>  int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct codegen_context *ctx,
>  		       u32 *addrs, int pass, bool extra_pass);
>  void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
> -void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
> -void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx);
> +void bpf_jit_build_epilogue(u32 *image, u32 *fimage, struct codegen_context *ctx);
> +void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context *ctx);
>  void bpf_jit_realloc_regs(struct codegen_context *ctx);
>  int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg, long exit_addr);
>  void prepare_for_fsession_fentry(u32 *image, struct codegen_context *ctx, int cookie_cnt,

The header file arch/powerpc/net/bpf_jit.h is shared by both the 64-bit
(bpf_jit_comp64.c) and 32-bit (bpf_jit_comp32.c) BPF JIT compilers (per
arch/powerpc/net/Makefile). The patch updates these prototypes to add a
new 'u32 *fimage' parameter, and modifies the PPC64 implementation
(bpf_jit_comp64.c) and the shared file (bpf_jit_comp.c) accordingly.

However, arch/powerpc/net/bpf_jit_comp32.c is not updated. In that file,
bpf_jit_build_epilogue() still has the 2-argument signature:

  void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)

and its call to bpf_jit_build_fentry_stubs() passes only 2 arguments:

  bpf_jit_build_fentry_stubs(image, ctx);

Will this cause a function signature conflict under CONFIG_PPC32 builds
(CONFIG_PPC_BOOK3S_32, CONFIG_PPC_8xx, etc.) since the header now
declares the 3-argument prototype while the .c file defines the
2-argument version?

> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index 53ab97ad6074..ef7614177cb1 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -49,11 +49,34 @@ asm (
>  "	.popsection				;"
>  );
>
> -void bpf_jit_build_fentry_stubs(u32 *image, struct codegen_context *ctx)
> +void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context *ctx)
>  {
>  	int ool_stub_idx, long_branch_stub_idx;
>
>  	/*
> +	 * In the final pass, align the mis-aligned dummy_tramp_addr field
> +	 * in the fimage. The alignment NOP must appear before OOL stub,
> +	 * to make ool_stub_idx & long_branch_stub_idx constant from end.
> +	 */
> +#ifdef CONFIG_PPC64
> +	if (fimage && image) {

During pass 0 (where fimage and image are NULL), this condition evaluates
to false. The alignment NOP is not counted in proglen during size
calculation. During the final passes (1 and 2), if the NOP is emitted,
could this increase the instruction count beyond proglen and overflow into
the subsequent exception table or fixups?

> +		/*
> +		 * pc points to first instruction of OOL stub,
> +		 * dummy_tramp_addr is past 4/3 instructions depending on
> +		 * CONFIG_PPC_FTRACE_OUT_OF_LINE is enabled/not respectively.
> +		 *
> +		 * The decision to emit alignment NOP must depend on the alignment
> +		 * of dummy_tramp_addr field.
> +		 */
> +		unsigned long pc = (unsigned long)fimage + CTX_NIA(ctx);
> +		pc += IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 4 : 3;
> +
> +		if (!IS_ALIGNED(pc, 8))
> +			EMIT(PPC_RAW_NOP());
> +	}

The alignment calculation appears to be incorrect. The code adds 4 or 3 to
the pc value, which is a byte address. Since PowerPC instructions are 4
bytes each, adding 4 or 3 bytes gives the wrong offset.

Looking at the OOL stub instructions that follow this block, with
CONFIG_PPC_FTRACE_OUT_OF_LINE there are 4 instructions (mflr r0; nop;
mtlr r0; b bpf_func+4). This produces a 16-byte offset (4 instructions * 4
bytes). Without the config, there are 3 instructions producing a 12-byte
offset.

The dummy_tramp_addr field would therefore be at pc+16 or pc+12, not pc+4
or pc+3.

For example, with CONFIG_PPC_FTRACE_OUT_OF_LINE enabled and pc 8-byte
aligned (pc%8 == 0):

- True dummy_tramp_addr position: pc+16, which is (pc+16)%8 == 0, already
  aligned.
- But the code calculates: pc += 4 -> pc+4, then !IS_ALIGNED(pc+4, 8) is
  true -> emits NOP.
- With the NOP, dummy_tramp_addr ends up at pc+4+16 = pc+20, and (pc+20)%8
  == 4, which is misaligned.

Similarly, when pc%8 == 4:

- True dummy_tramp_addr position: pc+16, which is (pc+16)%8 == 4, not
  aligned.
- But the code calculates: pc += 4 -> pc+4, then IS_ALIGNED(pc+4, 8) is
  true -> skips NOP.
- dummy_tramp_addr remains at pc+16, which is misaligned.

Does this flawed arithmetic force the code to emit a NOP when the target
address is already aligned, thereby misaligning it instead of aligning it?

> @@ -107,7 +131,7 @@ int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int tmp_reg,
>  		PPC_JMP(ctx->alt_exit_addr);
>  	} else {
>  		ctx->alt_exit_addr = ctx->idx * 4;
> -		bpf_jit_build_epilogue(image, ctx);
> +		bpf_jit_build_epilogue(image, NULL, ctx);
>  	}

Because fimage is strictly passed as NULL here, the if (fimage && image)
check in bpf_jit_build_fentry_stubs() will always evaluate to false. Will
this cause the alignment NOP to be unconditionally skipped for all inline
epilogues, leaving their dummy_tramp_addr fields unaligned?

[ ... ]

> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> index db364d9083e7..885dc8cf55a2 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -398,7 +398,7 @@ static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context *ctx
>  	}
>  }
>
> -void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
> +void bpf_jit_build_epilogue(u32 *image, u32 *fimage, struct codegen_context *ctx)
>  {
>  	bpf_jit_emit_common_epilogue(image, ctx);
>
> @@ -407,7 +407,7 @@ void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
>
>  	EMIT(PPC_RAW_BLR());
>
> -	bpf_jit_build_fentry_stubs(image, ctx);
> +	bpf_jit_build_fentry_stubs(image, fimage, ctx);
>  }

---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25998434343


* Re: [PATCH v4 2/5] powerpc/bpf: Move out dummy_tramp_addr after Long branch stub
From: bot+bpf-ci @ 2026-05-17 18:30 UTC (permalink / raw)
  To: adubey, bpf
  Cc: hbathini, linuxppc-dev, maddy, ast, andrii, daniel, shuah,
	linux-kselftest, stable, adubey, ast, andrii, daniel, martin.lau,
	eddyz87, yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260517214043.12975-3-adubey@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 2120 bytes --]

> commit c1b9d3c3e6cf6547bc01395bd0576b1ea8dbd1d2
> Author: Abhishek Dubey <adubey@linux.ibm.com>
> Date:   Sun May 17 17:40:40 2026 -0400
>
>     powerpc/bpf: Move out dummy_tramp_addr after Long branch stub
>
>     Move the long branch address space to the bottom of the long
>     branch stub. This allows uninterrupted disassembly until the
>     last 8 bytes. Exclude these last bytes from the overall
>     program length to prevent failure in assembly generation.
>     Also, align dummy_tramp_addr field with 8-byte boundary.
>
>     Following is disassembler output for test program with moved down
>     dummy_tramp_addr field:
>     .....
>     .....
>     pc:68    left:44     a6 03 08 7c  :  mtlr 0
>     pc:72    left:40     bc ff ff 4b  :  b .-68
>     pc:76    left:36     a6 02 68 7d  :  mflr 11
>     pc:80    left:32     05 00 9f 42  :  bcl 20, 31, .+4
>     pc:84    left:28     a6 02 88 7d  :  mflr 12
>     pc:88    left:24     14 00 8c e9  :  ld 12, 20(12)
>     pc:92    left:20     a6 03 89 7d  :  mtctr 12
>     pc:96    left:16     a6 03 68 7d  :  mtlr 11
>     pc:100   left:12     20 04 80 4e  :  bctr
>     pc:104   left:8      c0 34 1d 00  :
>
>     Failure log:
>     Can't disasm instruction at offset 104: c0 34 1d 00 00 00 00 c0
>     Disassembly logic can truncate at 104, ignoring last 8 bytes.
>
>     Update the dummy_tramp_addr field offset calculation from the end
>     of the program to reflect its new location, for bpf_arch_text_poke()
>     to update the actual trampoline's address in this field.
>
>     All BPF trampoline selftests continue to pass with this patch applied.
>
>     Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>

This looks like a bug fix for the alignment issue introduced in the
previous commit. Should this include:

  Fixes: 1fac3cfb82ab ("powerpc/bpf: fix alignment of long branch trampoline address")


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25998434343


* Re: [GIT PULL] Please pull powerpc/linux.git powerpc-7.1-3 tag
From: pr-tracker-bot @ 2026-05-17 19:28 UTC (permalink / raw)
  To: Madhavan Srinivasan
  Cc: Linus Torvalds, aboorvad, allyheev, bvanassche, christophe.leroy,
	julianbraha, linusw, linux-kernel, linuxppc-dev, make24, mpe,
	naveen, npiggin, sayalip, sshegde
In-Reply-To: <6d0bf34e-af80-4b1e-aae7-9d2dc27fcc25@linux.ibm.com>

The pull request you sent on Sat, 16 May 2026 11:25:28 +0530:

> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-7.1-3

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/fe6f8e913df9319db03fd107671ff02f104ca38c

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html



* [PATCH net-next] net: ucc_geth: Batch RX packets before stack handoff
From: Rosen Penev @ 2026-05-17 19:28 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, open list:FREESCALE QUICC ENGINE UCC ETHERNET DRIVER,
	open list

Collect received skbs on a local list during RX polling and pass the
completed batch to netif_receive_skb_list(). This lets the networking
stack process packets from a poll cycle in bulk instead of handing each
skb up individually.

Speedup tested with bidirectional iperf3.

Before:

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-10.00  sec   490 MBytes   411 Mbits/sec                  sender
[  5][TX-C]   0.00-10.01  sec   488 MBytes   409 Mbits/sec                  receiver
[  7][RX-C]   0.00-10.00  sec   176 MBytes   147 Mbits/sec  167            sender
[  7][RX-C]   0.00-10.01  sec   175 MBytes   146 Mbits/sec                  receiver

After:

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-10.00  sec   502 MBytes   421 Mbits/sec                  sender
[  5][TX-C]   0.00-10.01  sec   501 MBytes   420 Mbits/sec                  receiver
[  7][RX-C]   0.00-10.00  sec   212 MBytes   178 Mbits/sec  148            sender
[  7][RX-C]   0.00-10.01  sec   211 MBytes   177 Mbits/sec                  receiver

Assisted-by: Codex:GPT-5.5
Signed-off-by: Rosen Penev <rosenp@gmail.com>
---
 drivers/net/ethernet/freescale/ucc_geth.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index 7af4b5e3f38e..bce1079fc06a 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -2894,6 +2894,7 @@ static int ucc_geth_rx(struct ucc_geth_private *ugeth, u8 rxQ, int rx_work_limit
 	u32 bd_status;
 	u8 *bdBuffer;
 	struct net_device *dev;
+	LIST_HEAD(rx_list);
 
 	ugeth_vdbg("%s: IN", __func__);
 
@@ -2934,7 +2935,7 @@ static int ucc_geth_rx(struct ucc_geth_private *ugeth, u8 rxQ, int rx_work_limit
 
 			dev->stats.rx_bytes += length;
 			/* Send the packet up the stack */
-			netif_receive_skb(skb);
+			list_add_tail(&skb->list, &rx_list);
 		}
 
 		skb = get_new_skb(ugeth, bd);
@@ -2960,6 +2961,8 @@ static int ucc_geth_rx(struct ucc_geth_private *ugeth, u8 rxQ, int rx_work_limit
 		bd_status = in_be32((u32 __iomem *)bd);
 	}
 
+	netif_receive_skb_list(&rx_list);
+
 	ugeth->rxBd[rxQ] = bd;
 	return howmany;
 }
-- 
2.54.0




* Re: [PATCH net-next] net: ucc_geth: Batch RX packets before stack handoff
From: Rosen Penev @ 2026-05-17 20:44 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni,
	open list:FREESCALE QUICC ENGINE UCC ETHERNET DRIVER, open list
In-Reply-To: <5ac9f5ca-4706-4617-a5fd-d3e6dd3b254a@lunn.ch>

On Sun, May 17, 2026 at 1:24 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Sun, May 17, 2026 at 12:28:56PM -0700, Rosen Penev wrote:
> > Collect received skbs on a local list during RX polling and pass the
> > completed batch to netif_receive_skb_list(). This lets the networking
> > stack process packets from a poll cycle in bulk instead of handing each
> > skb up individually.
>
> So my first thought was, why is the core not doing this? The core NAPI
> poll code can initialise the list. netif_receive_skb() within the
> driver poll would see there is a list and append to it. And when the
> poll finished the NAPI core would pass the list up the stack? Maybe
> this already exists and this driver is just using the wrong API?
I do not know. I know several drivers are already using
netif_receive_skb_list, some of which even support hardware checksumming.
See 0a25d92c6f4facaf2852f1aac4cebfe01dd57a91

The core seems to use netif_receive_skb_list_internal. I do not know
the details.

Anyway, the performance difference is real.
>
>         Andrew



* Re: [PATCH net-next] net: ucc_geth: Batch RX packets before stack handoff
From: Andrew Lunn @ 2026-05-17 20:24 UTC (permalink / raw)
  To: Rosen Penev
  Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni,
	open list:FREESCALE QUICC ENGINE UCC ETHERNET DRIVER, open list
In-Reply-To: <20260517192856.3925-1-rosenp@gmail.com>

On Sun, May 17, 2026 at 12:28:56PM -0700, Rosen Penev wrote:
> Collect received skbs on a local list during RX polling and pass the
> completed batch to netif_receive_skb_list(). This lets the networking
> stack process packets from a poll cycle in bulk instead of handing each
> skb up individually.

So my first thought was, why is the core not doing this? The core NAPI
poll code can initialise the list. netif_receive_skb() within the
driver poll would see there is a list and append to it. And when the
poll finished the NAPI core would pass the list up the stack? Maybe
this already exists and this driver is just using the wrong API?

	Andrew


* Re: [PATCH net-next] net: ucc_geth: Batch RX packets before stack handoff
From: Andrew Lunn @ 2026-05-17 21:01 UTC (permalink / raw)
  To: Rosen Penev
  Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni,
	open list:FREESCALE QUICC ENGINE UCC ETHERNET DRIVER, open list
In-Reply-To: <CAKxU2N_SygFAKQuhwnZG7SP7jiRdkmwS_R+VwHap33UUfbQmAg@mail.gmail.com>

On Sun, May 17, 2026 at 01:44:40PM -0700, Rosen Penev wrote:
> On Sun, May 17, 2026 at 1:24 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > On Sun, May 17, 2026 at 12:28:56PM -0700, Rosen Penev wrote:
> > > Collect received skbs on a local list during RX polling and pass the
> > > completed batch to netif_receive_skb_list(). This lets the networking
> > > stack process packets from a poll cycle in bulk instead of handing each
> > > skb up individually.
> >
> > So my first thought was, why is the core not doing this? The core NAPI
> > poll code can initialise the list. netif_receive_skb() within the
> > driver poll would see there is a list and append to it. And when the
> > poll finished the NAPI core would pass the list up the stack? Maybe
> > this already exists and this driver is just using the wrong API?
> I do not know. I know several drivers are already using
> netif_receive_skb_list, some of which even support hardware checksumming.
> See 0a25d92c6f4facaf2852f1aac4cebfe01dd57a91
> 
> The core seems to use netif_receive_skb_list_internal. I do not know
> the details.
> 
> Anyway, the performance difference is real.

I'm not disagreeing with that. But can a similar performance
difference be made for all drivers by doing this in the core?

That is the interesting question.

     Andrew



* Re: [PATCH] powerpc/64s: Fix the vector number in comments for h_facility_unavailable
From: Vaibhav Jain @ 2026-05-18  3:17 UTC (permalink / raw)
  To: Gautam Menghani
  Cc: Gautam Menghani, maddy, mpe, npiggin, chleroy, linuxppc-dev,
	linux-kernel
In-Reply-To: <agcaxAANsk5gNzWX@Gautams-MacBook-Pro.local>

Gautam Menghani <gautam@linux.ibm.com> writes:

> On Wed, May 13, 2026 at 02:35:29PM +0530, Vaibhav Jain wrote:
>> Hey Gautam,
>> 
>> Thanks for the patch. Since this patch doesn't have any functional or
>> code change, can you please put a 'trivial' suffix to its patch title like
>> [1] or some other suffix indicating it's a non-functional change. That
>> way maintainers can easily pull the patch without worrying much about a
>> regression.
>> 
>> [1]
>> https://git.kernel.org/powerpc/c/d2827e5e2e0f0941a651f4b1ca5e9b778c4b5293
>
> Yeah I've mentioned "comments" in the title, so I guess that's fine?
>
> Thanks,
> Gautam

Bikeshedding a bit, but using the term 'trivial' might be better than
using the term 'comments'.

-- 
Cheers
~ Vaibhav



* [PATCH v2] KVM: PPC: Kconfig: Enable CONFIG_VPA_PMU with KVM
From: Gautam Menghani @ 2026-05-18  4:41 UTC (permalink / raw)
  To: maddy, npiggin, mpe, chleroy, atrajeev
  Cc: Gautam Menghani, linuxppc-dev, kvm, linux-kernel, stable

Enable CONFIG_VPA_PMU with KVM to enable its usage. Currently, the
vpa-pmu driver cannot be used since it is not enabled in distro configs.

On a Fedora kernel (6.19.12-200.fc43 below), the config option is disabled:
$ cat /boot/config-6.19.12-200.fc43.ppc64le  | grep VPA_PMU
 # CONFIG_VPA_PMU is not set

Fixes: 176cda0619b6c ("powerpc/perf: Add perf interface to expose vpa counters")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
---
v1 -> v2:
1. Rebased on latest master

 arch/powerpc/kvm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 9a0d1c1aca6c..56e86b46ff13 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -82,6 +82,7 @@ config KVM_BOOK3S_64_HV
 	select KVM_BOOK3S_HV_POSSIBLE
 	select KVM_BOOK3S_HV_PMU
 	select CMA
+	select VPA_PMU if HV_PERF_CTRS
 	help
 	  Support running unmodified book3s_64 guest kernels in
 	  virtual machines on POWER7 and newer processors that have
-- 
2.53.0




* Re: [PATCH v2 0/4] powerpc: A few misc cpumask changes
From: Shrikanth Hegde @ 2026-05-18  4:44 UTC (permalink / raw)
  To: maddy, chleroy; +Cc: linux-kernel, linux, linuxppc-dev, yury.norov
In-Reply-To: <20260427044715.559137-1-sshegde@linux.ibm.com>

Hi Maddy, Christophe,


> Based on tip/master at: (dffc5753ba4c "Merge branch into tip/master: 'timers/clocksource'")
>

Gentle Ping,
Still applies cleanly to today's tip master.
  
> Shrikanth Hegde (4):
>    powerpc: Use cpumask_next_wrap instead
>    powerpc: Simplify cpumask api usage for cpuinfo display
>    powerpc/perf: Use cpumask_intersects api for checking disable path
>    powerpc/xive: Add warning if target CPU not found
> 
>   arch/powerpc/kernel/irq.c             | 5 +----
>   arch/powerpc/kernel/setup-common.c    | 7 ++-----
>   arch/powerpc/mm/book3s64/hash_utils.c | 4 +---
>   arch/powerpc/perf/imc-pmu.c           | 6 ++----
>   arch/powerpc/sysdev/xive/common.c     | 1 +
>   5 files changed, 7 insertions(+), 16 deletions(-)
> 


Do you have any comments on this? Do you want me to resend with Yury's tag?



* [PATCH 0/3] powerpc: fix preempt_count imbalances in perf and kexec paths
From: Aboorva Devarajan @ 2026-05-18  5:08 UTC (permalink / raw)
  To: Madhavan Srinivasan, linuxppc-dev
  Cc: Athira Rajeev, Aboorva Devarajan, Christophe Leroy, linux-kernel,
	Sourabh Jain, Ritesh Harjani, Shrikanth Hegde

Hi all,

This patch series fixes some minor preempt_count bookkeeping issues in
arch/powerpc/ found during a preemption leak audit prompted by the
lazy/full preemption model changes. These are get_cpu/put_cpu and
get_cpu_var/put_cpu_var pairing errors that leave preempt_count
incorrectly elevated or underflowed.

Please let me know your comments.

Thanks,
Aboorva

Aboorva Devarajan (3):
  powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del
  powerpc/powernv: fix preempt count leak in
    pnv_kexec_wait_secondaries_down
  powerpc/kexec: fix double get_cpu() imbalance in kexec_prepare_cpus

 arch/powerpc/kexec/core_64.c           | 15 ++++++++-------
 arch/powerpc/perf/core-fsl-emb.c       |  3 ++-
 arch/powerpc/platforms/powernv/setup.c |  2 +-
 3 files changed, 11 insertions(+), 9 deletions(-)

-- 
2.54.0




* [PATCH 1/3] powerpc/perf: fix preempt count underflow in fsl_emb_pmu_del
From: Aboorva Devarajan @ 2026-05-18  5:08 UTC (permalink / raw)
  To: Madhavan Srinivasan, linuxppc-dev
  Cc: Athira Rajeev, Aboorva Devarajan, Christophe Leroy, linux-kernel,
	Sourabh Jain, Ritesh Harjani, Shrikanth Hegde
In-Reply-To: <20260518050855.1147242-1-aboorvad@linux.ibm.com>

fsl_emb_pmu_del() unconditionally calls put_cpu_var(cpu_hw_events) at
the 'out:' label, but only calls the matching get_cpu_var() after the
'i < 0' early-return check. When event->hw.idx is negative the
function jumps to 'out:' without having taken get_cpu_var(), and the
trailing put_cpu_var() then issues an unmatched preempt_enable(),
underflowing preempt_count.

On a CONFIG_PREEMPT=y kernel, the underflow would eventually present as
a 'scheduling while atomic' BUG.

Move put_cpu_var() to pair with get_cpu_var() so the percpu access is
correctly bracketed and the 'out:' label only handles perf_pmu_enable.

Fixes: a11106544f33c ("powerpc/perf: e500 support")
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
 arch/powerpc/perf/core-fsl-emb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
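
The imbalance described above can be reproduced in a minimal user-space
sketch. This is not the kernel code: preempt_count is modelled as a plain
int, and model_get_cpu_var(), model_put_cpu_var(), del_before(),
del_after() and count_after_call() are hypothetical names that stand in
only for the preempt_disable()/preempt_enable() side effect of
get_cpu_var()/put_cpu_var() and for the control flow around the 'out:'
label.

```c
/* User-space stand-in for the kernel's preempt_count (assumption). */
int preempt_count;

/*
 * Hypothetical models: only the preempt_disable()/preempt_enable()
 * side effect of get_cpu_var()/put_cpu_var() is represented.
 */
void model_get_cpu_var(void) { preempt_count++; }
void model_put_cpu_var(void) { preempt_count--; }

/* Pre-patch shape: the put at 'out:' also runs on the early exit. */
void del_before(int idx)
{
	if (idx < 0)
		goto out;

	model_get_cpu_var();
	/* per-CPU bookkeeping would happen here */
out:
	model_put_cpu_var();
}

/* Post-patch shape: the put sits before 'out:', next to the get. */
void del_after(int idx)
{
	if (idx < 0)
		goto out;

	model_get_cpu_var();
	/* per-CPU bookkeeping would happen here */
	model_put_cpu_var();
out:
	return;
}

/* Run one variant from a balanced state and report the final count. */
int count_after_call(void (*del)(int), int idx)
{
	preempt_count = 0;
	del(idx);
	return preempt_count;
}
```

In this model, count_after_call(del_before, -1) ends at -1 (the
underflow), while count_after_call(del_after, -1) stays at 0. Both
variants stay balanced for a non-negative idx.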

diff --git a/arch/powerpc/perf/core-fsl-emb.c b/arch/powerpc/perf/core-fsl-emb.c
index 7120ab20cbfec..02b5dd74c187a 100644
--- a/arch/powerpc/perf/core-fsl-emb.c
+++ b/arch/powerpc/perf/core-fsl-emb.c
@@ -366,9 +366,10 @@ static void fsl_emb_pmu_del(struct perf_event *event, int flags)

 	cpuhw->n_events--;

+	put_cpu_var(cpu_hw_events);
+
  out:
 	perf_pmu_enable(event->pmu);
-	put_cpu_var(cpu_hw_events);
 }

 static void fsl_emb_pmu_start(struct perf_event *event, int ef_flags)
-- 
2.54.0


* [PATCH 2/3] powerpc/powernv: fix preempt count leak in pnv_kexec_wait_secondaries_down
From: Aboorva Devarajan @ 2026-05-18  5:08 UTC
  To: Madhavan Srinivasan, linuxppc-dev
  Cc: Athira Rajeev, Aboorva Devarajan, Christophe Leroy, linux-kernel,
	Sourabh Jain, Ritesh Harjani, Shrikanth Hegde
In-Reply-To: <20260518050855.1147242-1-aboorvad@linux.ibm.com>

pnv_kexec_wait_secondaries_down() calls get_cpu() to obtain the current
CPU id but never calls the matching put_cpu(), leaking one
preempt_disable() nesting level on every invocation.

In practice the imbalance does not trigger a visible splat because the
kexec teardown path is a one-way trip: IRQs are already disabled, no
schedule() occurs after the leak, and default_machine_kexec() overwrites
preempt_count with HARDIRQ_OFFSET before jumping into kexec_sequence()
which never returns. However, the bookkeeping is still wrong.

In the kexec teardown path IRQs are already disabled and the CPU is
pinned, so get_cpu()'s preempt_disable() side-effect is unnecessary.
Replace get_cpu() with raw_smp_processor_id() which returns the CPU id
without touching preempt_count.

Fixes: 298b34d7d578 ("powerpc/powernv: Fix kexec races going back to OPAL")
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
 arch/powerpc/platforms/powernv/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
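
The leak described above can be illustrated with a minimal user-space
sketch. This is not the kernel code: model_preempt_count,
model_this_cpu and every model_*/wait_* function below are hypothetical
stand-ins. The sketch assumes only what the description states, namely
that get_cpu() disables preemption before returning the CPU id, and
that raw_smp_processor_id() returns the id without touching
preempt_count.

```c
/* User-space stand-ins (assumptions), not kernel state. */
int model_preempt_count;
int model_this_cpu;

/* Models get_cpu(): bump the count, then return the CPU id. */
int model_get_cpu(void)
{
	model_preempt_count++;
	return model_this_cpu;
}

/* Models raw_smp_processor_id(): return the id, count untouched. */
int model_raw_smp_processor_id(void)
{
	return model_this_cpu;
}

/* Pre-patch shape: get_cpu() with no matching put_cpu(). */
int wait_before(void)
{
	int my_cpu = model_get_cpu();

	/* the loop over online CPUs would use my_cpu here */
	return my_cpu;
}

/* Post-patch shape: same CPU id, no preempt_count side effect. */
int wait_after(void)
{
	int my_cpu = model_raw_smp_processor_id();

	/* the loop over online CPUs would use my_cpu here */
	return my_cpu;
}

/* Call one variant 'calls' times and report the leaked nesting. */
int leak_after_calls(int (*wait_fn)(void), int calls)
{
	model_preempt_count = 0;
	for (int i = 0; i < calls; i++)
		wait_fn();
	return model_preempt_count;
}
```

In this model, leak_after_calls(wait_before, 3) returns 3, one leaked
nesting level per invocation, while leak_after_calls(wait_after, 3)
returns 0.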

diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 4dbb47ddbdcc4..177da0defcb36 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -396,7 +396,7 @@ static void pnv_kexec_wait_secondaries_down(void)
 {
 	int my_cpu, i, notified = -1;

-	my_cpu = get_cpu();
+	my_cpu = raw_smp_processor_id();

 	for_each_online_cpu(i) {
 		uint8_t status;
-- 
2.54.0

