Linux-ARM-Kernel Archive on lore.kernel.org

* Re: [PATCH 2/4] KVM: arm64: timer: Kill the per-timer level cache
From: Marc Zyngier @ 2026-04-17 15:56 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Deepanshu Kartikey, Joey Gouly, Suzuki K Poulose, Oliver Upton,
	Zenghui Yu
In-Reply-To: <20260417124612.2770268-3-maz@kernel.org>

On Fri, 17 Apr 2026 13:46:10 +0100,
Marc Zyngier <maz@kernel.org> wrote:
> 
> The timer code makes use of a per-timer irq level cache, which
> looks like a very minor optimisation to avoid taking a lock upon
> updating the GIC view of the interrupt when it is unchanged from
> the previous state.
> 
> This is getting in the way of fixing more important correctness
> issues, so get rid of the cache, which simplifies a couple of minor
> things.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/arch_timer.c  | 18 +++++++++---------
>  include/kvm/arm_arch_timer.h |  5 -----
>  2 files changed, 9 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index d6802fc87e085..fdc1afff06340 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -446,9 +446,8 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
>  {
>  	kvm_timer_update_status(timer_ctx, new_level);
>  
> -	timer_ctx->irq.level = new_level;
>  	trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_irq(timer_ctx),
> -				   timer_ctx->irq.level);
> +				   new_level);
>  
>  	if (userspace_irqchip(vcpu->kvm))
>  		return;
> @@ -466,7 +465,7 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
>  
>  	kvm_vgic_inject_irq(vcpu->kvm, vcpu,
>  			    timer_irq(timer_ctx),
> -			    timer_ctx->irq.level,
> +			    new_level,
>  			    timer_ctx);
>  }
>  
> @@ -477,8 +476,7 @@ static void timer_emulate(struct arch_timer_context *ctx)
>  
>  	trace_kvm_timer_emulate(ctx, pending);
>  
> -	if (pending != ctx->irq.level)
> -		kvm_timer_update_irq(timer_context_to_vcpu(ctx), pending, ctx);
> +	kvm_timer_update_irq(timer_context_to_vcpu(ctx), pending, ctx);
>  
>  	kvm_timer_update_status(ctx, pending);

As my new best mate Sashiko pointed out, kvm_timer_update_status()
here becomes redundant, as the unconditional call to
kvm_timer_update_irq() already contains that.

I'll drop it from the patch when applying, unless there are more
comments.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.



* Re: [PATCH bpf-next v2] arm32, bpf: Reject BPF-to-BPF calls and callbacks in the JIT
From: Emil Tsalapatis @ 2026-04-17 15:48 UTC (permalink / raw)
  To: Puranjay Mohan, bpf, linux-arm-kernel
  Cc: Jonas Rebmann, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Russell King, kernel
In-Reply-To: <20260417143353.838911-1-puranjay@kernel.org>

On Fri Apr 17, 2026 at 10:33 AM EDT, Puranjay Mohan wrote:
> The ARM32 BPF JIT does not support BPF-to-BPF function calls
> (BPF_PSEUDO_CALL) or callbacks (BPF_PSEUDO_FUNC), but it does
> not reject them either.
>
> When a program with subprograms is loaded (e.g. libxdp's XDP
> dispatcher uses __noinline__ subprograms, or any program using
> callbacks like bpf_loop or bpf_for_each_map_elem), the verifier
> invokes bpf_jit_subprogs() which calls bpf_int_jit_compile()
> for each subprogram.
>
> For BPF_PSEUDO_CALL, since ARM32 does not reject it, the JIT
> silently emits code using the wrong address computation:
>
>     func = __bpf_call_base + imm
>
> where imm is a pc-relative subprogram offset, producing a bogus
> function pointer.
>
> For BPF_PSEUDO_FUNC, the ldimm64 handler ignores src_reg and
> loads the immediate as a normal 64-bit value without error.
>
> In both cases, build_body() reports success and a JIT image is
> allocated. ARM32 lacks the jit_data/extra_pass mechanism needed
> for the second JIT pass in bpf_jit_subprogs(). On the second
> pass, bpf_int_jit_compile() performs a full fresh compilation,
> allocating a new JIT binary and overwriting prog->bpf_func. The
> first allocation is never freed. bpf_jit_subprogs() then detects
> the function pointer changed and aborts with -ENOTSUPP, but the
> original JIT binary has already been leaked. Each program
> load/unload cycle leaks one JIT binary allocation, as reported
> by kmemleak:
>
>     unreferenced object 0xbf0a1000 (size 4096):
>       backtrace:
>         bpf_jit_binary_alloc+0x64/0xfc
>         bpf_int_jit_compile+0x14c/0x348
>         bpf_jit_subprogs+0x4fc/0xa60
>
> Fix this by rejecting both BPF_PSEUDO_CALL in the BPF_CALL
> handler and BPF_PSEUDO_FUNC in the BPF_LD_IMM64 handler, falling
> through to the existing 'notyet' path. This causes build_body()
> to fail before any JIT binary is allocated, so
> bpf_int_jit_compile() returns the original program unjitted.
> bpf_jit_subprogs() then sees !prog->jited and cleanly falls
> back to the interpreter with no leak.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>

The Fixes tag is a bit unrelated since it's for x64 but the original
commit that adds the file (ddecdfcea0ae8 ?) is so far back it probably
doesn't matter.

>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
> Reported-by: Jonas Rebmann <jre@pengutronix.de>
> Closes: https://lore.kernel.org/bpf/b63e9174-7a3d-4e22-8294-16df07a4af89@pengutronix.de
> Tested-by: Jonas Rebmann <jre@pengutronix.de>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---
>
> Changelog:
> v1: https://lore.kernel.org/all/20260417103004.3552500-1-puranjay@kernel.org/
> Changes in v2:
> - Add Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> - Reject BPF_PSEUDO_FUNC in the BPF_LD | BPF_IMM | BPF_DW handler
> - Move code below declarations
>
> ---
>  arch/arm/net/bpf_jit_32.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
> index deeb8f292454..a900aa973885 100644
> --- a/arch/arm/net/bpf_jit_32.c
> +++ b/arch/arm/net/bpf_jit_32.c
> @@ -1852,6 +1852,9 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
>  	{
>  		u64 val = (u32)imm | (u64)insn[1].imm << 32;
>  
> +		if (insn->src_reg == BPF_PSEUDO_FUNC)
> +			goto notyet;
> +
>  		emit_a32_mov_i64(dst, val, ctx);
>  
>  		return 1;
> @@ -2055,6 +2058,9 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
>  		const s8 *r5 = bpf2a32[BPF_REG_5];
>  		const u32 func = (u32)__bpf_call_base + (u32)imm;
>  
> +		if (insn->src_reg == BPF_PSEUDO_CALL)
> +			goto notyet;
> +
>  		emit_a32_mov_r64(true, r0, r1, ctx);
>  		emit_a32_mov_r64(true, r1, r2, ctx);
>  		emit_push_r64(r5, ctx);
>
> base-commit: 1f5ffc672165ff851063a5fd044b727ab2517ae3




* Re: [PATCH v3 8/8] unwind: arm64: Use sframe to unwind interrupt frames.
From: Jens Remus @ 2026-04-17 15:45 UTC (permalink / raw)
  To: Dylan Hatch, Roman Gushchin, Weinan Liu, Will Deacon,
	Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Steven Rostedt,
	Catalin Marinas, Jiri Kosina
  Cc: Mark Rutland, Prasanna Kumar T S M, Puranjay Mohan, Song Liu,
	joe.lawrence, linux-toolchains, linux-kernel, live-patching,
	linux-arm-kernel, Heiko Carstens
In-Reply-To: <20260406185000.1378082-9-dylanbhatch@google.com>

Hello Dylan and Weinan!

On 4/6/2026 8:50 PM, Dylan Hatch wrote:
> Add unwind_next_frame_sframe() function to unwind by sframe info if
> present. Use this method at exception boundaries, falling back to
> frame-pointer unwind only on failure. In such failure cases, the
> stacktrace is considered unreliable.
> 
> During normal unwind, prefer frame pointer unwind (for better
> performance) with sframe as a backup.
> 
> This change restores the LR behavior originally introduced in commit
> c2c6b27b5aa14fa2 ("arm64: stacktrace: unwind exception boundaries"),
> but later removed in commit 32ed1205682e ("arm64: stacktrace: Skip
> reporting LR at exception boundaries").
> 
> This can be done because the sframe data can be used to determine
> whether the LR is current for the PC value recovered from pt_regs at the
> exception boundary.
> 
> Signed-off-by: Weinan Liu <wnliu@google.com>
> Signed-off-by: Dylan Hatch <dylanbhatch@google.com>
> Reviewed-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>

> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c

> +/*
> + * Unwind to the next frame according to sframe.
> + */
> +static __always_inline int
> +unwind_next_frame_sframe(struct kunwind_state *state)
> +{
> +	struct unwind_frame frame;
> +	unsigned long cfa, fp, ra;
> +	enum kunwind_source source = KUNWIND_SOURCE_FRAME;
> +	struct pt_regs *regs = state->regs;
> +
> +	int err;
> +
> +	/* FP/SP alignment 8 bytes */
> +	if (state->common.fp & 0x7 || state->common.sp & 0x7)
> +		return -EINVAL;
> +
> +	/*
> +	 * Most/all outermost functions are not visible to sframe. So, check for
> +	 * a meta frame record if the sframe lookup fails.
> +	 */
> +	err = sframe_find_kernel(state->common.pc, &frame);
> +	if (err)
> +		return kunwind_next_frame_record_meta(state);
> +
> +	if (frame.outermost)
> +		return -ENOENT;
> +
> +	/* Get the Canonical Frame Address (CFA) */
> +	switch (frame.cfa.rule) {
> +	case UNWIND_CFA_RULE_SP_OFFSET:
> +		cfa = state->common.sp;
> +		break;
> +	case UNWIND_CFA_RULE_FP_OFFSET:
> +		if (state->common.fp < state->common.sp)
> +			return -EINVAL;

I wonder whether that check is valid in the kernel?  call_on_irq_stack()
saves SP in FP and then loads SP with the IRQ SP.  Is that condition
always true then?

> +		cfa = state->common.fp;
> +		break;
> +	case UNWIND_CFA_RULE_REG_OFFSET:
> +	case UNWIND_CFA_RULE_REG_OFFSET_DEREF:
> +		if (!regs)

		if (!regs || frame.cfa.regnum > 30)

> +			return -EINVAL;
> +		cfa = regs->regs[frame.cfa.regnum];

In unwind user this is guarded by a topmost frame check, as arbitrary
registers are otherwise not available.  Isn't this necessary in the
kernel case?

> +		break;
> +	default:
> +		WARN_ON_ONCE(1);
> +		return -EINVAL;
> +	}
> +	cfa += frame.cfa.offset;
> +
> +	/*
> +	 * CFA typically points to a higher address than RA or FP, so don't
> +	 * consume from the stack when we read it.
> +	 */
> +	if (frame.cfa.rule & UNWIND_RULE_DEREF &&
> +	    !get_word(&state->common, &cfa))
> +		return -EINVAL;
> +
> +	/* CFA alignment 8 bytes */
> +	if (cfa & 0x7)
> +		return -EINVAL;
> +
> +	/* Get the Return Address (RA) */
> +	switch (frame.ra.rule) {
> +	case UNWIND_RULE_RETAIN:
> +		if (!regs)
> +			return -EINVAL;
> +		ra = regs->regs[30];

Likewise: Topmost frame check not required to access arbitrary registers
(including RA/LR)?  Furthermore, provided I don't have a thinko, the
return address may only be in LR in the topmost frame.  In any other
frame it must have been saved.  Otherwise there would be an endless
return loop.

> +		source = KUNWIND_SOURCE_REGS_LR;
> +		break;
> +	/* UNWIND_USER_RULE_CFA_OFFSET not implemented on purpose */
> +	case UNWIND_RULE_CFA_OFFSET_DEREF:
> +		ra = cfa + frame.ra.offset;
> +		break;
> +	case UNWIND_RULE_REG_OFFSET:
> +	case UNWIND_RULE_REG_OFFSET_DEREF:
> +		if (!regs)

		if (!regs || frame.cfa.regnum > 30)

> +			return -EINVAL;
> +		ra = regs->regs[frame.cfa.regnum];

Likewise: Topmost frame check not required to access arbitrary registers?

> +		ra += frame.ra.offset;
> +		break;
> +	default:
> +		WARN_ON_ONCE(1);
> +		return -EINVAL;
> +	}
> +
> +	/* Get the Frame Pointer (FP) */
> +	switch (frame.fp.rule) {
> +	case UNWIND_RULE_RETAIN:
> +		fp = state->common.fp;
> +		break;
> +	/* UNWIND_USER_RULE_CFA_OFFSET not implemented on purpose */
> +	case UNWIND_RULE_CFA_OFFSET_DEREF:
> +		fp = cfa + frame.fp.offset;
> +		break;
> +	case UNWIND_RULE_REG_OFFSET:
> +	case UNWIND_RULE_REG_OFFSET_DEREF:
> +		if (!regs)

		if (!regs || frame.fp.regnum > 30)

> +			return -EINVAL;
> +		fp = regs->regs[frame.fp.regnum];

Likewise: Topmost frame check not required to access arbitrary registers?

> +		fp += frame.fp.offset;
> +		break;
> +	default:
> +		WARN_ON_ONCE(1);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Consume RA and FP from the stack. The frame record puts FP at a lower
> +	 * address than RA, so we always read FP first.
> +	 */
> +	if (frame.fp.rule & UNWIND_RULE_DEREF &&
> +	    !get_word(&state->common, &fp))
> +		return -EINVAL;
> +
> +	if (frame.ra.rule & UNWIND_RULE_DEREF &&
> +	    get_consume_word(&state->common, &ra))
> +		return -EINVAL;
> +
> +	state->common.pc = ra;
> +	state->common.sp = cfa;
> +	state->common.fp = fp;
> +
> +	state->source = source;
> +
> +	return 0;
> +}
Thanks and regards,
Jens
-- 
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com

IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/




* Re: [PATCH v2] raid6: arm64: add SVE optimized implementation for syndrome generation
From: Mark Brown @ 2026-04-17 15:36 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Robin Murphy, Demian Shulhan, Christoph Hellwig, Mark Rutland,
	Song Liu, Yu Kuai, Will Deacon, Catalin Marinas, linux-arm-kernel,
	Li Nan, linux-raid, linux-kernel
In-Reply-To: <c9362db6-1fef-4e70-9525-29b2936f4887@app.fastmail.com>

On Fri, Apr 17, 2026 at 04:43:06PM +0200, Ard Biesheuvel wrote:

> On arm64, kernel mode NEON is mostly used to gain access to AES and SHA
> instructions, and only to a lesser degree to speed up ordinary
> arithmetic, and so XOR is somewhat of an outlier here.

> Given that Neoverse V1 apparently already carves up ordinary arithmetic
> performed on 256-bit vectors and operates on 128 bits at a time, I am
> rather skeptical that we're likely to see any SVE implementations of the
> crypto extensions soon that are meaningfully faster, given that these
> are presumably much costlier to implement in terms of gate count, and
> therefore likely to be split up even on SVE implementations that can
> perform ordinary arithmetic on 256+ bit vectors in a single cycle. Note
> that even the arm64 SIMD accelerated CRC implementations rely heavily on
> 64x64->128 polynomial multiplication.

I'd not be surprised to see something that delivers useful benefits
using SVE at some point.

> IOW, before we consider kernel mode SVE, I'd like to see some benchmarks
> for other algorithms too.

Definitely, it needs a solid win to merge anything.  I do want to get
back to the situation where we've got out of tree infrastructure patches
so that people working on algorithms have something to base their work
on (and see the overheads using SVE incurs) but unless there's a
practical user they should stay out of tree.


* Re: [PATCH 15/18] Documentation: KVM: Fix typos in VGICv5 documentation
From: Joey Gouly @ 2026-04-17 15:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Suzuki K Poulose, Oliver Upton,
	Zenghui Yu, Sascha Bischoff
In-Reply-To: <20260415115559.2227718-16-maz@kernel.org>

On Wed, Apr 15, 2026 at 12:55:56PM +0100, Marc Zyngier wrote:
> From: Sascha Bischoff <sascha.bischoff@arm.com>
> 
> Fix two typos in the VGICv5 documentation.
> 
> Fixes: d51c978b7d3e ("KVM: arm64: gic-v5: Communicate userspace-driveable PPIs via a UAPI")
> Fixes: eb3c4d2c9a4d ("Documentation: KVM: Introduce documentation for VGICv5")
> Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com
> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  Documentation/virt/kvm/devices/arm-vgic-v5.rst | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/devices/arm-vgic-v5.rst b/Documentation/virt/kvm/devices/arm-vgic-v5.rst
> index 29335ea823fc5..1985b2d880322 100644
> --- a/Documentation/virt/kvm/devices/arm-vgic-v5.rst
> +++ b/Documentation/virt/kvm/devices/arm-vgic-v5.rst
> @@ -12,8 +12,8 @@ Only one VGIC instance may be instantiated through this API.  The created VGIC
>  will act as the VM interrupt controller, requiring emulated user-space devices
>  to inject interrupts to the VGIC instead of directly to CPUs.
>  
> -Creating a guest GICv5 device requires a host GICv5 host.  The current VGICv5
> -device only supports PPI interrupts.  These can either be injected from emulated
> +Creating a guest GICv5 device requires a GICv5 host.  The current VGICv5 device
> +only supports PPI interrupts.  These can either be injected from emulated
>  in-kernel devices (such as the Arch Timer, or PMU), or via the KVM_IRQ_LINE
>  ioctl.
>  
> @@ -25,7 +25,7 @@ Groups:
>        request the initialization of the VGIC, no additional parameter in
>        kvm_device_attr.addr. Must be called after all VCPUs have been created.
>  
> -   KVM_DEV_ARM_VGIC_USERPSPACE_PPIs
> +   KVM_DEV_ARM_VGIC_USERPSPACE_PPIS

This still has a typo! There's a P lurking between USER and SPACE!

>        request the mask of userspace-drivable PPIs. Only a subset of the PPIs can
>        be directly driven from userspace with GICv5, and the returned mask
>        informs userspace of which it is allowed to drive via KVM_IRQ_LINE.
> -- 
> 2.47.3
> 

Thanks,
Joey



* [PATCH net] net: airoha: Fix PPE cpu port configuration for GDM2 loopback path
From: Lorenzo Bianconi @ 2026-04-17 15:24 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Lorenzo Bianconi
  Cc: Simon Horman, linux-arm-kernel, linux-mediatek, netdev

When QoS loopback is enabled for GDM3 or GDM4, incoming packets are
forwarded to GDM2. However, the PPE cpu port for GDM2 is not configured
in this path, causing traffic originating from GDM3/GDM4, which may
be set up as WAN ports backed by QDMA1, to be incorrectly directed
to QDMA0 instead.

Configure the PPE cpu port for GDM2 when QoS loopback is active on
GDM3 or GDM4 to ensure traffic is routed to the correct QDMA instance.

Fixes: 9cd451d414f6 ("net: airoha: Add loopback support for GDM2")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 8 ++++++--
 drivers/net/ethernet/airoha/airoha_eth.h | 3 ++-
 drivers/net/ethernet/airoha/airoha_ppe.c | 6 +++---
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index e1ab15f1ee7d..d2b7c437a782 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -1727,7 +1727,7 @@ static int airoha_set_gdm2_loopback(struct airoha_gdm_port *port)
 {
 	struct airoha_eth *eth = port->qdma->eth;
 	u32 val, pse_port, chan;
-	int src_port;
+	int i, src_port;
 
 	/* Forward the traffic to the proper GDM port */
 	pse_port = port->id == AIROHA_GDM3_IDX ? FE_PSE_PORT_GDM3
@@ -1769,6 +1769,9 @@ static int airoha_set_gdm2_loopback(struct airoha_gdm_port *port)
 		      SP_CPORT_MASK(val),
 		      __field_prep(SP_CPORT_MASK(val), FE_PSE_PORT_CDM2));
 
+	for (i = 0; i < eth->soc->num_ppe; i++)
+		airoha_ppe_set_cpu_port(port, i, AIROHA_GDM2_IDX);
+
 	if (port->id == AIROHA_GDM4_IDX && airoha_is_7581(eth)) {
 		u32 mask = FC_ID_OF_SRC_PORT_MASK(port->nbq);
 
@@ -1807,7 +1810,8 @@ static int airoha_dev_init(struct net_device *dev)
 	}
 
 	for (i = 0; i < eth->soc->num_ppe; i++)
-		airoha_ppe_set_cpu_port(port, i);
+		airoha_ppe_set_cpu_port(port, i,
+					airoha_get_fe_port(port));
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h
index 95e557638617..715aa26cbac8 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.h
+++ b/drivers/net/ethernet/airoha/airoha_eth.h
@@ -653,7 +653,8 @@ int airoha_get_fe_port(struct airoha_gdm_port *port);
 bool airoha_is_valid_gdm_port(struct airoha_eth *eth,
 			      struct airoha_gdm_port *port);
 
-void airoha_ppe_set_cpu_port(struct airoha_gdm_port *port, u8 ppe_id);
+void airoha_ppe_set_cpu_port(struct airoha_gdm_port *port, u8 ppe_id,
+			     u8 fport);
 bool airoha_ppe_is_enabled(struct airoha_eth *eth, int index);
 void airoha_ppe_check_skb(struct airoha_ppe_dev *dev, struct sk_buff *skb,
 			  u16 hash, bool rx_wlan);
diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c
index 859818676b69..5c9dff6bccd1 100644
--- a/drivers/net/ethernet/airoha/airoha_ppe.c
+++ b/drivers/net/ethernet/airoha/airoha_ppe.c
@@ -85,10 +85,9 @@ static u32 airoha_ppe_get_timestamp(struct airoha_ppe *ppe)
 	return FIELD_GET(AIROHA_FOE_IB1_BIND_TIMESTAMP, timestamp);
 }
 
-void airoha_ppe_set_cpu_port(struct airoha_gdm_port *port, u8 ppe_id)
+void airoha_ppe_set_cpu_port(struct airoha_gdm_port *port, u8 ppe_id, u8 fport)
 {
 	struct airoha_qdma *qdma = port->qdma;
-	u8 fport = airoha_get_fe_port(port);
 	struct airoha_eth *eth = qdma->eth;
 	u8 qdma_id = qdma - &eth->qdma[0];
 	u32 fe_cpu_port;
@@ -182,7 +181,8 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
 			if (!port)
 				continue;
 
-			airoha_ppe_set_cpu_port(port, i);
+			airoha_ppe_set_cpu_port(port, i,
+						airoha_get_fe_port(port));
 		}
 	}
 }

---
base-commit: 82c21069028c5db3463f851ae8ac9cc2e38a3827
change-id: 20260417-airoha-ppe-cpu-port-for-gdm2-loopback-96b9b52179c1

Best regards,
-- 
Lorenzo Bianconi <lorenzo@kernel.org>




* Re: [PATCH 08/18] KVM: arm64: vgic: Rationalise per-CPU irq accessor
From: Joey Gouly @ 2026-04-17 15:21 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Suzuki K Poulose, Oliver Upton,
	Zenghui Yu, Sascha Bischoff
In-Reply-To: <20260415115559.2227718-9-maz@kernel.org>

On Wed, Apr 15, 2026 at 12:55:49PM +0100, Marc Zyngier wrote:
> Despite adding the necessary infrastructure to identify irq types,
> vgic_get_vcpu_irq() treats GICv5 PPIs in a special way, which
> impairs the readability of the code.
> 
> Use the existing irq classifiers to handle per-CPU irqs for all
> vgic types, and let the normal control flow reach global interrupt
> handling without any v5-specific path.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/vgic/vgic.c | 25 ++++++++++++-------------
>  1 file changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
> index 3ac6d49bc4876..b697678d68b01 100644
> --- a/arch/arm64/kvm/vgic/vgic.c
> +++ b/arch/arm64/kvm/vgic/vgic.c
> @@ -106,24 +106,23 @@ struct vgic_irq *vgic_get_irq(struct kvm *kvm, u32 intid)
>  
>  struct vgic_irq *vgic_get_vcpu_irq(struct kvm_vcpu *vcpu, u32 intid)
>  {
> +	enum kvm_device_type type;
> +
>  	if (WARN_ON(!vcpu))
>  		return NULL;
>  
> -	if (vgic_is_v5(vcpu->kvm)) {
> -		u32 int_num, hwirq_id;
> -
> -		if (!__irq_is_ppi(KVM_DEV_TYPE_ARM_VGIC_V5, intid))
> -			return NULL;
> -
> -		hwirq_id = FIELD_GET(GICV5_HWIRQ_ID, intid);
> -		int_num = array_index_nospec(hwirq_id, VGIC_V5_NR_PRIVATE_IRQS);
> +	type = vcpu->kvm->arch.vgic.vgic_model;
>  
> -		return &vcpu->arch.vgic_cpu.private_irqs[int_num];
> -	}
> +	if (__irq_is_sgi(type, intid) || __irq_is_ppi(type, intid)) {
> +		switch (type) {
> +		case KVM_DEV_TYPE_ARM_VGIC_V5:
> +			intid = vgic_v5_get_hwirq_id(intid);
> +			intid = array_index_nospec(intid, VGIC_V5_NR_PRIVATE_IRQS);
> +			break;
> +		default:
> +			intid = array_index_nospec(intid, VGIC_NR_PRIVATE_IRQS);
> +		}
>  
> -	/* SGIs and PPIs */
> -	if (intid < VGIC_NR_PRIVATE_IRQS) {
> -		intid = array_index_nospec(intid, VGIC_NR_PRIVATE_IRQS);
>  		return &vcpu->arch.vgic_cpu.private_irqs[intid];
>  	}
>  

It preserves the behaviour of returning NULL for anything other than PPI on
gic-v5, because the fallthrough to vgic_get_irq() returns NULL for gic-v5 and
__irq_is_sgi() is always false for gic-v5.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>

Thanks,
Joey



* [PATCH RFC 4/4] clk: rockchip: rk3576: add ROUND_CLOSEST to dclk_vp1_src divider
From: Alexey Charkov @ 2026-04-17 15:11 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Michael Turquette, Stephen Boyd
  Cc: Pavel Zhovner, Sebastian Reichel, Andy Yan, devicetree,
	linux-arm-kernel, linux-rockchip, linux-kernel, linux-clk,
	Alexey Charkov
In-Reply-To: <20260417-rk3576-dclk-v1-0-26a9d0dcb2de@flipper.net>

Without CLK_DIVIDER_ROUND_CLOSEST, the divider's _is_best_div() only
considers candidates where now <= target, rejecting any rate above the
target even when it is closer. Combined with the PLL round-nearest fix,
this causes the divider to still pick a suboptimal rate: with PLL
round-nearest alone, div=8 produces 249.0 MHz (0.048% over) but is
rejected because it exceeds the target, and div=3/248.0 MHz wins
(-0.354% error).

Add CLK_DIVIDER_ROUND_CLOSEST to dclk_vp1_src's div_flags so the
divider picks the rate closest to the target regardless of direction.
Together with the PLL round-nearest change, this yields:

  VPLL 1992 MHz / 8 = 249.0 MHz (+0.048% error)

instead of the previous:

  VPLL 1488 MHz / 6 = 248.0 MHz (-0.354% error)

This small difference appears to enable more monitors to lock to the VP1
clock when driving output at 2560x1440@60Hz via DisplayPort.

Signed-off-by: Alexey Charkov <alchark@flipper.net>
---
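For illustration only, the two selection policies described in the commit
message can be modelled in userspace roughly as below. This is a sketch, not
the CCF code: SKETCH_ROUND_CLOSEST, sketch_is_best_div() and sketch_pick() are
hypothetical names, and the candidate list is simply the two rates quoted
above (248.0 MHz and 249.0 MHz).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flag value for this sketch; the kernel defines its own. */
#define SKETCH_ROUND_CLOSEST 0x1UL

/* Does candidate rate `now` beat the current `best` for target `rate`? */
static bool sketch_is_best_div(long rate, long now, long best,
			       unsigned long flags)
{
	/* Closest policy: smallest absolute distance wins, over or under. */
	if (flags & SKETCH_ROUND_CLOSEST)
		return labs(rate - now) < labs(rate - best);

	/* Default policy: never overshoot the target. */
	return now <= rate && now > best;
}

/* Pick the best of `n` candidate output rates (in Hz) for `rate`. */
static long sketch_pick(long rate, const long *cand, int n,
			unsigned long flags)
{
	long best = 0;

	assert(n > 0);
	for (int i = 0; i < n; i++)
		if (sketch_is_best_div(rate, cand[i], best, flags))
			best = cand[i];

	return best;
}
```

With a 248880000 Hz target and the candidates { 248000000, 249000000 }, the
default policy settles on 248000000 because 249000000 overshoots, while the
closest policy settles on 249000000.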
 drivers/clk/rockchip/clk-rk3576.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/rockchip/clk-rk3576.c b/drivers/clk/rockchip/clk-rk3576.c
index 28eb5a802e83..9fc3264ef322 100644
--- a/drivers/clk/rockchip/clk-rk3576.c
+++ b/drivers/clk/rockchip/clk-rk3576.c
@@ -1106,7 +1106,7 @@ static struct rockchip_clk_branch rk3576_clk_branches[] __initdata = {
 			RK3576_CLKSEL_CON(145), 8, 3, MFLAGS, 0, 8, DFLAGS,
 			RK3576_CLKGATE_CON(61), 10, GFLAGS),
 	COMPOSITE(DCLK_VP1_SRC, "dclk_vp1_src", gpll_cpll_vpll_bpll_lpll_p, CLK_SET_RATE_NO_REPARENT | CLK_SET_RATE_PARENT,
-			RK3576_CLKSEL_CON(146), 8, 3, MFLAGS, 0, 8, DFLAGS,
+			RK3576_CLKSEL_CON(146), 8, 3, MFLAGS, 0, 8, DFLAGS | CLK_DIVIDER_ROUND_CLOSEST,
 			RK3576_CLKGATE_CON(61), 11, GFLAGS),
 	COMPOSITE(DCLK_VP2_SRC, "dclk_vp2_src", gpll_cpll_vpll_bpll_lpll_p, CLK_SET_RATE_NO_REPARENT,
 			RK3576_CLKSEL_CON(147), 8, 3, MFLAGS, 0, 8, DFLAGS,

-- 
2.52.0


* [PATCH RFC 3/4] clk: rockchip: rk3576: allow dclk_vp1_src to propagate rate to parent PLL
From: Alexey Charkov @ 2026-04-17 15:11 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Michael Turquette, Stephen Boyd
  Cc: Pavel Zhovner, Sebastian Reichel, Andy Yan, devicetree,
	linux-arm-kernel, linux-rockchip, linux-kernel, linux-clk,
	Alexey Charkov
In-Reply-To: <20260417-rk3576-dclk-v1-0-26a9d0dcb2de@flipper.net>

dclk_vp1_src feeds the display clock for Video Port 1. When parented to
the default GPLL (1188 MHz), the 8-bit divider cannot synthesize the
248.88 MHz pixel clock required for 2560x1440@60 which VP1 supports:
1188 / 5 = 237.6 MHz (-4.53% error). This exceeds DisplayPort's +/-0.5%
tolerance and causes black screens on strict sinks.

Add CLK_SET_RATE_PARENT so that when dclk_vp1_src is reparented to a
programmable PLL (e.g. VPLL via assigned-clock-parents), the CCF divider
can ask the PLL to retune. For example, VPLL at 1992 MHz / 8 = 249 MHz
(0.048% error).

This flag relies on reparenting the VP1 source clock to VPLL at DT level
to ensure no consumer calls clk_set_rate on dclk_vp1 while its parent is
set to the boot-time default of GPLL.

Signed-off-by: Alexey Charkov <alchark@flipper.net>
---
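As a cross-check of the error figures in the commit message, the deviation
can be computed in parts per million and compared against the +/-0.5%
(5000 ppm) tolerance mentioned above. This is only a sketch with hypothetical
helper names (sketch_error_ppm(), sketch_within_tolerance()); nothing like it
exists in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Signed deviation of `actual_hz` from `target_hz`, in parts per million. */
static long long sketch_error_ppm(long long actual_hz, long long target_hz)
{
	assert(target_hz > 0);
	return (actual_hz - target_hz) * 1000000LL / target_hz;
}

/* True if the deviation is within +/- tol_ppm of the target. */
static bool sketch_within_tolerance(long long actual_hz, long long target_hz,
				    long long tol_ppm)
{
	return llabs(sketch_error_ppm(actual_hz, target_hz)) <= tol_ppm;
}
```

For the 248880000 Hz target, GPLL at 1188 MHz / 5 gives 237600000 Hz, i.e.
-45323 ppm (about -4.53%), which fails a 5000 ppm limit. VPLL at
1992 MHz / 8 gives 249000000 Hz, i.e. 482 ppm (about 0.048%), which passes.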
 drivers/clk/rockchip/clk-rk3576.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/rockchip/clk-rk3576.c b/drivers/clk/rockchip/clk-rk3576.c
index 2557358e0b9d..28eb5a802e83 100644
--- a/drivers/clk/rockchip/clk-rk3576.c
+++ b/drivers/clk/rockchip/clk-rk3576.c
@@ -1105,7 +1105,7 @@ static struct rockchip_clk_branch rk3576_clk_branches[] __initdata = {
 	COMPOSITE(DCLK_VP0_SRC, "dclk_vp0_src", gpll_cpll_vpll_bpll_lpll_p, CLK_SET_RATE_NO_REPARENT,
 			RK3576_CLKSEL_CON(145), 8, 3, MFLAGS, 0, 8, DFLAGS,
 			RK3576_CLKGATE_CON(61), 10, GFLAGS),
-	COMPOSITE(DCLK_VP1_SRC, "dclk_vp1_src", gpll_cpll_vpll_bpll_lpll_p, CLK_SET_RATE_NO_REPARENT,
+	COMPOSITE(DCLK_VP1_SRC, "dclk_vp1_src", gpll_cpll_vpll_bpll_lpll_p, CLK_SET_RATE_NO_REPARENT | CLK_SET_RATE_PARENT,
 			RK3576_CLKSEL_CON(146), 8, 3, MFLAGS, 0, 8, DFLAGS,
 			RK3576_CLKGATE_CON(61), 11, GFLAGS),
 	COMPOSITE(DCLK_VP2_SRC, "dclk_vp2_src", gpll_cpll_vpll_bpll_lpll_p, CLK_SET_RATE_NO_REPARENT,

-- 
2.52.0


* [PATCH RFC 2/4] clk: rockchip: pll: use round-nearest in determine_rate
From: Alexey Charkov @ 2026-04-17 15:11 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Michael Turquette, Stephen Boyd
  Cc: Pavel Zhovner, Sebastian Reichel, Andy Yan, devicetree,
	linux-arm-kernel, linux-rockchip, linux-kernel, linux-clk,
	Alexey Charkov
In-Reply-To: <20260417-rk3576-dclk-v1-0-26a9d0dcb2de@flipper.net>

rockchip_pll_determine_rate() walks the rate table in descending order
and picks the first entry <= the requested rate. This floor-rounding
interacts poorly with consumers that use CLK_SET_RATE_PARENT: a divider
iterating candidates asks the PLL for rate*div, and a tiny undershoot
causes the PLL to snap to a much lower entry.

For example, requesting 1991.04 MHz (248.88 MHz * 8) causes the PLL to
return 1968 MHz instead of 1992 MHz, a 24 MHz table gap that produces
a 1.2% pixel clock error when divided back down.

Change to round-to-nearest: for each table entry compute the absolute
distance from the request, and pick the entry with the smallest delta.
The CCF's divider and composite logic handle over/undershoot preferences
via their own ROUND_CLOSEST flags.

Signed-off-by: Alexey Charkov <alchark@flipper.net>
---
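To make the difference between the two lookups concrete, here is a
standalone userspace sketch of floor versus round-to-nearest selection over
a descending rate table. It is not the driver code: the function names are
hypothetical, and only the 1992 MHz and 1968 MHz entries come from the
example above (any other table entries used with it are made up).

```c
#include <assert.h>
#include <stddef.h>

/* Floor lookup: first entry <= req in a descending table, else the minimum. */
static unsigned long sketch_floor_rate(const unsigned long *table, size_t n,
				       unsigned long req)
{
	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		if (req >= table[i])
			return table[i];

	return table[n - 1];
}

/* Nearest lookup: entry with the smallest absolute distance to req. */
static unsigned long sketch_nearest_rate(const unsigned long *table, size_t n,
					 unsigned long req)
{
	unsigned long best;

	assert(n > 0);
	best = table[0];
	for (size_t i = 1; i < n; i++) {
		/* Unsigned-safe absolute differences. */
		unsigned long d_new = req > table[i] ? req - table[i]
						     : table[i] - req;
		unsigned long d_best = req > best ? req - best : best - req;

		if (d_new < d_best)
			best = table[i];
	}

	return best;
}
```

With a table of { 2016000000, 1992000000, 1968000000, 1944000000 } Hz and a
request of 1991040000 Hz, the floor lookup returns 1968000000 while the
nearest lookup returns 1992000000.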
 drivers/clk/rockchip/clk-pll.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/clk/rockchip/clk-pll.c b/drivers/clk/rockchip/clk-pll.c
index 6b853800cb6b..c142f2c4fd99 100644
--- a/drivers/clk/rockchip/clk-pll.c
+++ b/drivers/clk/rockchip/clk-pll.c
@@ -66,19 +66,19 @@ static int rockchip_pll_determine_rate(struct clk_hw *hw,
 {
 	struct rockchip_clk_pll *pll = to_rockchip_clk_pll(hw);
 	const struct rockchip_pll_rate_table *rate_table = pll->rate_table;
+	unsigned long best = 0;
 	int i;

-	/* Assuming rate_table is in descending order */
 	for (i = 0; i < pll->rate_count; i++) {
-		if (req->rate >= rate_table[i].rate) {
-			req->rate = rate_table[i].rate;
-
-			return 0;
-		}
+		if (abs((long)req->rate - (long)rate_table[i].rate) <
+		    abs((long)req->rate - (long)best))
+			best = rate_table[i].rate;
 	}

-	/* return minimum supported value */
-	req->rate = rate_table[i - 1].rate;
+	if (best)
+		req->rate = best;
+	else
+		req->rate = rate_table[pll->rate_count - 1].rate;

 	return 0;
 }

-- 
2.52.0


* [PATCH RFC 1/4] arm64: dts: rockchip: rk3576: assign dclk_vp1_src to VPLL
From: Alexey Charkov @ 2026-04-17 15:11 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Michael Turquette, Stephen Boyd
  Cc: Pavel Zhovner, Sebastian Reichel, Andy Yan, devicetree,
	linux-arm-kernel, linux-rockchip, linux-kernel, linux-clk,
	Alexey Charkov
In-Reply-To: <20260417-rk3576-dclk-v1-0-26a9d0dcb2de@flipper.net>

Reparent dclk_vp1_src from GPLL to VPLL at the SoC level. VPLL is a
programmable PLL with no other consumers, allowing the CRU to synthesize
accurate pixel clocks for VP1's output with arbitrary display modes.

Signed-off-by: Alexey Charkov <alchark@flipper.net>
---
 arch/arm64/boot/dts/rockchip/rk3576.dtsi | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3576.dtsi b/arch/arm64/boot/dts/rockchip/rk3576.dtsi
index e12a2a0cfb89..2b05900c6c1c 100644
--- a/arch/arm64/boot/dts/rockchip/rk3576.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3576.dtsi
@@ -1338,6 +1338,8 @@ vop: vop@27d00000 {
 				      "dclk_vp1",
 				      "dclk_vp2",
 				      "pll_hdmiphy0";
+			assigned-clocks = <&cru DCLK_VP1_SRC>;
+			assigned-clock-parents = <&cru PLL_VPLL>;
 			iommus = <&vop_mmu>;
 			power-domains = <&power RK3576_PD_VOP>;
 			rockchip,grf = <&sys_grf>;

-- 
2.52.0




* [PATCH RFC 0/4] arm64: rockchip: The hunt for exact pixel clocks on RK3576
From: Alexey Charkov @ 2026-04-17 15:11 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Michael Turquette, Stephen Boyd
  Cc: Pavel Zhovner, Sebastian Reichel, Andy Yan, devicetree,
	linux-arm-kernel, linux-rockchip, linux-kernel, linux-clk,
	Alexey Charkov

Dear all,

I need the help of the collective wisdom of the community.

The problem I'm trying to solve is reliably obtaining the exact pixel
clock for arbitrary display modes supported by the RK3576 SoC.

Rockchip RK3576 has three display output processors VP0~VP2, each
supporting different ranges of display modes, roughly as follows:
- VP0: 4K 120Hz
- VP1: 2.5k 60Hz
- VP2: 1080p 60Hz

Each one obviously needs a pixel clock. The required frequencies for the
pixel clocks vary greatly depending on the display mode, and need to be
matched within a tight tolerance, or else many displays will refuse to
work. E.g. the preferred (maximum) display mode out of VP1 is particularly
awkward, because it requires a pixel clock of 248.88 MHz, which cannot
be obtained using integer dividers from its default clock source (GPLL
at 1188 MHz), and the nearest approximation is 237.6 MHz, which is well
outside the tolerance of e.g. DP specification, resulting in a blank
screen on most displays by default.
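
The divider arithmetic above can be checked with a small sketch. It is only an illustration in plain C, not CCF code: `nearest_div_rate_khz()` is a hypothetical helper, rates are in kHz, and the maximum divider value passed in is an assumption rather than the real width of the RK3576 divider field.

```c
#include <stddef.h>

/* Absolute difference of two unsigned values. */
static unsigned long abs_diff_ul(unsigned long a, unsigned long b)
{
	return a > b ? a - b : b - a;
}

/*
 * Hypothetical helper: closest rate reachable from parent_khz using a
 * plain integer divider in 1..max_div. Returns 0 if max_div is 0.
 */
static unsigned long nearest_div_rate_khz(unsigned long parent_khz,
					  unsigned long target_khz,
					  unsigned int max_div)
{
	unsigned long best = 0;
	unsigned int div;

	for (div = 1; div <= max_div; div++) {
		unsigned long rate = parent_khz / div;

		/* The first divider seeds the search; later ones must be closer. */
		if (div == 1 ||
		    abs_diff_ul(rate, target_khz) < abs_diff_ul(best, target_khz))
			best = rate;
	}

	return best;
}
```

For a 1188000 kHz parent and a 248880 kHz target this yields 237600 kHz (divider 5), about 4.5% below the request, while a 148500 kHz target is hit exactly with divider 8.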

The clock sources are of course configurable, in particular there are muxes
connected to each VP for selecting the source of the pixel clock:
- Each VP can take the clock either from the (single!) HDMI PHY or from
  its dedicated dclk_vpX_src mux
- The dclk_vpX_src mux can select the clock from a number of system PLLs
  (GPLL, CPLL, VPLL, BPLL, LPLL)

While the system PLLs can be configured to output a wide range of
frequencies, they are shared between many system components. E.g. on the
current mainline kernel on one of my RK3576 boards I've got the following:
GPLL: 1188 MHz, enable count 20
CPLL: 1000 MHz, enable count 17
VPLL: 594 MHz, enable count 0 (yaay!)
BPLL, LPLL: 816 MHz, enable count 0 (but these last ones don't have
            predividers, so are less flexible)

So ultimately there is exactly one free fractional PLL (VPLL) which can be
used to generate arbitrary pixel clocks, but we have up to three consumers
trying to drive different display modes from it (e.g. HDMI on VP0, DP on
VP1 and MIPI DSI on VP2). We also want to be able to adjust the PLL output
frequency on the fly to satisfy the requirements of the selected display
mode.

And this is where I'm stuck. Trying to satisfy the requirements of up to
three consumers while changing the PLL frequency on the fly sounds like
a poorly tractable mathematical problem (is it 3-SAT?). We can take the
HDMI output out of the equation, because it can be driven from the HDMI
PHY (which is capable of arbitrary rates) instead of the mux, but that
makes the decision of which dclk source to use for a VP block dependent on
which downstream consumer is connected to it (HDMI vs. something else).
Even then we somehow need two devices to cooperate in picking a PLL
frequency that satisfies the requirements of both of them, and change to it
without display corruption. I'm not even sure if the CCF has mechanisms
for that?..

What follows is a brief set of patches which illustrate a partial solution
for the case of "I just need 2.5k60Hz on VP1 via DP and don't care about
the rest". It switches the VP1 unconditionally to use VPLL as the source
for its dclk mux, allows changing the VPLL frequency on the fly, and also
changes the frequency calculation logic to allow for nearest-match
frequencies which are not necessarily rounded down. These are not meant
to be merged as-is, as I see the following issues:
- The flag allowing the PLL to change rate is in the clock driver, while
  the reparenting to an unused PLL is in the device tree. If these go out
  of sync, we might end up trying to change the frequency of a PLL which
  is used by other consumers (I presume that could be dangerous)
- If VP0 happens to be driving DP output, it won't be able to produce the
  2560x1440@60Hz mode for the same reasons as VP1 - then it must also be
  reparented to VPLL and allowed to change its frequency on the fly

It does bring me from a state of "always blank screen on DP output until
the mode is switched to something magically working" to a state of
"most monitors work at the default preferred mode" though.

It is tempting to just reparent both VP0 and VP1 to VPLL and allow both of
them to change its frequency, while leaving VP2 on the default (fixed)
GPLL and relying on the fact that 148.5 MHz (the required frequency for
its maximum supported mode of 1920x1080@60Hz) is conveniently 1188/8 MHz -
just what GPLL can provide. Then also force whichever VP is driving HDMI
output to use the HDMI PHY as its clock source. But we still have the
problem of DT vs. driver coordination, and I'm not sure how to define
the policy for "if you've got HDMI connected, you must use the HDMI PHY
clock for the respective VP, whichever VP that is".

I would very much appreciate any thoughts on how to approach this.

Signed-off-by: Alexey Charkov <alchark@flipper.net>
---
Alexey Charkov (4):
      arm64: dts: rockchip: rk3576: assign dclk_vp1_src to VPLL
      clk: rockchip: pll: use round-nearest in determine_rate
      clk: rockchip: rk3576: allow dclk_vp1_src to propagate rate to parent PLL
      clk: rockchip: rk3576: add ROUND_CLOSEST to dclk_vp1_src divider

 arch/arm64/boot/dts/rockchip/rk3576.dtsi |  2 ++
 drivers/clk/rockchip/clk-pll.c           | 16 ++++++++--------
 drivers/clk/rockchip/clk-rk3576.c        |  4 ++--
 3 files changed, 12 insertions(+), 10 deletions(-)
---
base-commit: c7275b05bc428c7373d97aa2da02d3a7fa6b9f66
change-id: 20260417-rk3576-dclk-4c95bbb67581

Best regards,
-- 
Alexey Charkov <alchark@flipper.net>


* Re: [PATCH RFC 1/2] arm64: vdso: Prepare for robust futex unlock support
From: Florian Weimer @ 2026-04-17 15:08 UTC (permalink / raw)
  To: André Almeida
  Cc: Catalin Marinas, Will Deacon, Thomas Gleixner, Mark Rutland,
	Mathieu Desnoyers, Sebastian Andrzej Siewior, Carlos O'Donell,
	Peter Zijlstra, Rich Felker, Torvald Riegel, Darren Hart,
	Ingo Molnar, Davidlohr Bueso, Arnd Bergmann, Liam R . Howlett,
	Uros Bizjak, Thomas Weißschuh, linux-arm-kernel,
	linux-kernel, linux-arch, kernel-dev
In-Reply-To: <20260417-tonyk-robust_arm-v1-1-03aa64e2ff1a@igalia.com>

* André Almeida:

> There will be a VDSO function to unlock non-contended robust futexes in
> user space. The unlock sequence is racy vs. clearing the list_pending_op
> pointer in the task's robust list head. To plug this race the kernel needs
> to know the critical section window so it can clear the pointer when the
> task is interrupted within that race window. The window is determined by
> labels in the inline assembly.
>
> Signed-off-by: André Almeida <andrealmeid@igalia.com>
> ---
> RFC: Those symbols can't be found by the linker after patch 2/2, it fails with:
>
> ld: arch/arm64/kernel/vdso.o: in function `vdso_futex_robust_unlock_update_ips':
> arch/arm64/kernel/vdso.c:72:(.text+0x200): undefined reference to `__futex_list64_try_unlock_cs_success'
> ld: arch/arm64/kernel/vdso.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `__futex_list64_try_unlock_cs_success' which may bind externally can not be used when making a shared object; recompile with -fPIC
> arch/arm64/kernel/vdso.c:72:(.text+0x200): dangerous relocation: unsupported relocation

I think your GLOBLS definition adds a 64 suffix.  That shouldn't be
necessary on AArch64.  It's not reflected in the references, so you end
up with an undefined symbol error.

Thanks,
Florian




* [PATCH v1 4/4] arm64/unwind_user/sframe: Enable sframe unwinding on arm64
From: Jens Remus @ 2026-04-17 15:08 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Steven Rostedt, Josh Poimboeuf,
	Indu Bhagat, Peter Zijlstra, Dylan Hatch, Weinan Liu
  Cc: Jens Remus, linux-arm-kernel, linux-kernel, Heiko Carstens,
	Ilya Leoshkevich
In-Reply-To: <20260417150827.1183376-1-jremus@linux.ibm.com>

Add arm64 support for unwinding of user space using SFrame.

This leverages the unwind user (sframe) support for s390 which
enables architectures that pass the return address in a register,
may not necessarily save the return address on the stack (for
instance in leaf functions), and have SP at call site equal
SP at entry.

For this purpose provide arm64-specific unwind_user_get_ra_reg() and
unwind_user_get_reg() implementations, which return the value of the
link register (LR) or an arbitrary register in the topmost user space
frame.  Define the arm64 SP and FP DWARF register numbers.

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---

Notes (jremus):
    Note:  An arm64 implementation of unwind_user_get_reg() is strictly
    needed only if SFrame V3 flexible FDEs were generated for aarch64,
    which is currently not the case with GNU Binutils 2.46.

 arch/arm64/Kconfig                          |  1 +
 arch/arm64/include/asm/unwind_user.h        | 23 +++++++++++++++++++++
 arch/arm64/include/asm/unwind_user_sframe.h |  8 +++++++
 3 files changed, 32 insertions(+)
 create mode 100644 arch/arm64/include/asm/unwind_user_sframe.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 994fd5162a1d..641a3a5fe5c9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -254,6 +254,7 @@ config ARM64
 	select HAVE_STACKPROTECTOR
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UNWIND_USER_FP
+	select HAVE_UNWIND_USER_SFRAME
 	select HAVE_KPROBES
 	select HAVE_KRETPROBES
 	select HAVE_GENERIC_VDSO
diff --git a/arch/arm64/include/asm/unwind_user.h b/arch/arm64/include/asm/unwind_user.h
index 0641d4d97b0f..3c7fd8c4ba5b 100644
--- a/arch/arm64/include/asm/unwind_user.h
+++ b/arch/arm64/include/asm/unwind_user.h
@@ -4,6 +4,7 @@
 
 #include <linux/sched/task_stack.h>
 #include <linux/types.h>
+#include <asm/insn.h>
 
 #ifdef CONFIG_UNWIND_USER
 
@@ -16,6 +17,28 @@ static inline int unwind_user_word_size(struct pt_regs *regs)
 	return sizeof(long);
 }
 
+static inline int unwind_user_get_ra_reg(unsigned long *val)
+{
+	struct pt_regs *regs = task_pt_regs(current);
+	*val = regs->regs[AARCH64_INSN_REG_LR];
+	return 0;
+}
+#define unwind_user_get_ra_reg unwind_user_get_ra_reg
+
+static inline int unwind_user_get_reg(unsigned long *val, unsigned int regnum)
+{
+	const struct pt_regs *regs = task_pt_regs(current);
+
+	if (regnum <= 30)
+		/* DWARF register numbers 0..30 */
+		*val = regs->regs[regnum];
+	else
+		return -EINVAL;
+
+	return 0;
+}
+#define unwind_user_get_reg unwind_user_get_reg
+
 #endif /* CONFIG_UNWIND_USER */
 
 #ifdef CONFIG_HAVE_UNWIND_USER_FP
diff --git a/arch/arm64/include/asm/unwind_user_sframe.h b/arch/arm64/include/asm/unwind_user_sframe.h
new file mode 100644
index 000000000000..65c0a6b6c835
--- /dev/null
+++ b/arch/arm64/include/asm/unwind_user_sframe.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM64_UNWIND_USER_SFRAME_H
+#define _ASM_ARM64_UNWIND_USER_SFRAME_H
+
+#define SFRAME_REG_SP	31
+#define SFRAME_REG_FP	29
+
+#endif /* _ASM_ARM64_UNWIND_USER_SFRAME_H */
-- 
2.51.0




* [PATCH v1 3/4] arm64/vdso: Enable SFrame generation in vDSO
From: Jens Remus @ 2026-04-17 15:08 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Steven Rostedt, Josh Poimboeuf,
	Indu Bhagat, Peter Zijlstra, Dylan Hatch, Weinan Liu
  Cc: Jens Remus, linux-arm-kernel, linux-kernel, Heiko Carstens,
	Ilya Leoshkevich
In-Reply-To: <20260417150827.1183376-1-jremus@linux.ibm.com>

This replicates Josh's x86 patch "x86/vdso: Enable sframe generation
in VDSO" [1] for arm64.

Enable .sframe generation in the vDSO library so kernel and user space
can unwind through it.  Keep all function symbols in the vDSO .symtab
for stack trace purposes.  This enables perf to look up these function
symbols in addition to those already exported in vDSO .dynsym.

Starting with binutils 2.46 both GNU assembler and GNU linker
exclusively support generating and merging .sframe in SFrame V3 format.
For vDSO, only if supported by the assembler, generate .sframe, collect
it, mark it as KEEP, and generate a GNU_SFRAME program table entry.
Otherwise explicitly discard any .sframe.

[1]: x86/vdso: Enable sframe generation in VDSO,
     https://lore.kernel.org/all/20260211141357.271402-7-jremus@linux.ibm.com/

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---

Notes (jremus):
    @Dylan:  Adding -Wa,--gsframe-3 to the VDSO CC_FLAGS_ADD_VDSO (and
    AS_FLAGS_ADD_VDSO) may clash with your patch [1] that adds likewise
    to the CC_FLAGS_REMOVE_VDSO.  Any idea how to resolve?
    
    [1]: [PATCH v3 2/8] arm64, unwind: build kernel with sframe V3 info,
         https://lore.kernel.org/all/20260406185000.1378082-3-dylanbhatch@google.com/

 arch/arm64/kernel/vdso/Makefile   | 14 ++++++++++++--
 arch/arm64/kernel/vdso/vdso.lds.S | 21 +++++++++++++++++++++
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index 7dec05dd33b7..1f2f01673397 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -15,6 +15,10 @@ obj-vdso := vgettimeofday.o note.o sigreturn.o vgetrandom.o vgetrandom-chacha.o
 targets := $(obj-vdso) vdso.so vdso.so.dbg
 obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
 
+ifeq ($(CONFIG_AS_SFRAME3),y)
+  SFRAME_CFLAGS := -Wa,--gsframe-3
+endif
+
 btildflags-$(CONFIG_ARM64_BTI_KERNEL) += -z force-bti
 
 # -Bsymbolic has been added for consistency with arm, the compat vDSO and
@@ -41,7 +45,9 @@ CC_FLAGS_REMOVE_VDSO := $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) \
 			$(CC_FLAGS_LTO) $(CC_FLAGS_CFI) \
 			-Wmissing-prototypes -Wmissing-declarations
 
-CC_FLAGS_ADD_VDSO := -O2 -mcmodel=tiny -fasynchronous-unwind-tables
+CC_FLAGS_ADD_VDSO := -O2 -mcmodel=tiny -fasynchronous-unwind-tables $(SFRAME_CFLAGS)
+
+AS_FLAGS_ADD_VDSO := $(SFRAME_CFLAGS)
 
 CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_REMOVE_VDSO)
 CFLAGS_REMOVE_vgetrandom.o = $(CC_FLAGS_REMOVE_VDSO)
@@ -49,6 +55,10 @@ CFLAGS_REMOVE_vgetrandom.o = $(CC_FLAGS_REMOVE_VDSO)
 CFLAGS_vgettimeofday.o = $(CC_FLAGS_ADD_VDSO)
 CFLAGS_vgetrandom.o = $(CC_FLAGS_ADD_VDSO)
 
+AFLAGS_sigreturn.o = $(AS_FLAGS_ADD_VDSO)
+
+AFLAGS_vgetrandom-chacha.o = $(AS_FLAGS_ADD_VDSO)
+
 ifneq ($(c-gettimeofday-y),)
   CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
 endif
@@ -65,7 +75,7 @@ $(obj)/vdso.so.dbg: $(obj)/vdso.lds $(obj-vdso) FORCE
 	$(call if_changed,vdsold_and_vdso_check)
 
 # Strip rule for the .so file
-$(obj)/%.so: OBJCOPYFLAGS := -S
+$(obj)/%.so: OBJCOPYFLAGS := -g
 $(obj)/%.so: $(obj)/%.so.dbg FORCE
 	$(call if_changed,objcopy)
 
diff --git a/arch/arm64/kernel/vdso/vdso.lds.S b/arch/arm64/kernel/vdso/vdso.lds.S
index 52314be29191..527e107ca4b5 100644
--- a/arch/arm64/kernel/vdso/vdso.lds.S
+++ b/arch/arm64/kernel/vdso/vdso.lds.S
@@ -15,6 +15,8 @@
 #include <asm-generic/vmlinux.lds.h>
 #include <vdso/datapage.h>
 
+#define KEEP_SFRAME	IS_ENABLED(CONFIG_AS_SFRAME)
+
 OUTPUT_FORMAT("elf64-littleaarch64", "elf64-bigaarch64", "elf64-littleaarch64")
 OUTPUT_ARCH(aarch64)
 
@@ -68,6 +70,13 @@ SECTIONS
 		*(.igot .igot.plt)
 	}						:text
 
+#if KEEP_SFRAME
+	.sframe		: {
+		KEEP (*(.sframe))
+		*(.sframe.*)
+	}						:text	:sframe
+#endif
+
 	_end = .;
 	PROVIDE(end = .);
 
@@ -78,9 +87,18 @@ SECTIONS
 		*(.data .data.* .gnu.linkonce.d.* .sdata*)
 		*(.bss .sbss .dynbss .dynsbss)
 		*(.eh_frame .eh_frame_hdr)
+#if !KEEP_SFRAME
+		*(.sframe)
+		*(.sframe.*)
+#endif
 	}
 }
 
+/*
+ * Very old versions of ld do not recognize this name token; use the constant.
+ */
+#define PT_GNU_SFRAME	0x6474e554
+
 /*
  * We must supply the ELF program headers explicitly to get just one
  * PT_LOAD segment, and set the flags explicitly to make segments read-only.
@@ -90,6 +108,9 @@ PHDRS
 	text		PT_LOAD		FLAGS(5) FILEHDR PHDRS; /* PF_R|PF_X */
 	dynamic		PT_DYNAMIC	FLAGS(4);		/* PF_R */
 	note		PT_NOTE		FLAGS(4);		/* PF_R */
+#if KEEP_SFRAME
+	sframe		PT_GNU_SFRAME	FLAGS(4);		/* PF_R */
+#endif
 }
 
 /*
-- 
2.51.0




* [PATCH v1 0/4] arm64: SFrame user space unwinding
From: Jens Remus @ 2026-04-17 15:08 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Steven Rostedt, Josh Poimboeuf,
	Indu Bhagat, Peter Zijlstra, Dylan Hatch, Weinan Liu
  Cc: Jens Remus, linux-arm-kernel, linux-kernel, Heiko Carstens,
	Ilya Leoshkevich

This series adds arm64 support for unwinding of user space using SFrame V3.
It is based on Josh's, Steven's, and my work.


Prerequisites:

This series applies on top of the latest unwind user sframe series
"[PATCH v13 00/18] unwind_deferred: Implement sframe handling":
https://lore.kernel.org/all/20260127150554.2760964-1-jremus@linux.ibm.com/

For which Steven Rostedt kindly maintains a branch:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git  sframe/core

Like the above series, it requires binutils 2.46 to build executables
and libraries (e.g. vDSO) with SFrame V3 on aarch64 (using the assembler
option --gsframe-3).

The unwind user sframe series depends on a Glibc patch from Josh, that
adds support for the prctls introduced in the Kernel:
https://lore.kernel.org/all/20250122023517.lmztuocecdjqzfhc@jpoimboe/
Note that Josh's Glibc patch needs to be adjusted for the updated prctl
numbers from "[PATCH v13 18/18] unwind_user/sframe: Add prctl() interface
for registering .sframe sections":
https://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git/diff/include/uapi/linux/prctl.h?h=sframe/core


Overview:

Patch 1 enables deferred FP-based unwinding of user space on arm64.
This is used by unwind user as fallback if SFrame is not available.

Patch 2 adds an unsafe_copy_from_user() implementation for arm64.
This is needed by unwind user sframe to access .sframe sections.

Patch 3 enables .sframe generation in vDSO on arm64.

Patch 4 enables deferred SFrame-based unwinding of user space on arm64.


Usage:

perf tools already support the deferred unwinding infrastructure by
using option "--call-graph fp,defer" (name subject to change):

  $ perf record -F 999 --call-graph fp,defer /path/to/executable
  $ perf script


Limitations:

Support for PAC is not yet implemented.  Note that SFrame V3 already
provides the required information though:

  SFRAME_V3_AARCH64_FDE_PAUTH_KEY(fde_info)
  SFRAME_V3_AARCH64_FRE_MANGLED_RA_P(fre_info)


Thanks and regards,
Jens

Jens Remus (4):
  arm64/unwind_user/fp: Enable HAVE_UNWIND_USER_FP
  arm64/uaccess: Add unsafe_copy_from_user() implementation
  arm64/vdso: Enable SFrame generation in vDSO
  arm64/unwind_user/sframe: Enable sframe unwinding on arm64

 arch/arm64/Kconfig                          |  2 +
 arch/arm64/include/asm/uaccess.h            | 39 +++++++++----
 arch/arm64/include/asm/unwind_user.h        | 65 +++++++++++++++++++++
 arch/arm64/include/asm/unwind_user_sframe.h |  8 +++
 arch/arm64/kernel/vdso/Makefile             | 14 ++++-
 arch/arm64/kernel/vdso/vdso.lds.S           | 21 +++++++
 6 files changed, 137 insertions(+), 12 deletions(-)
 create mode 100644 arch/arm64/include/asm/unwind_user.h
 create mode 100644 arch/arm64/include/asm/unwind_user_sframe.h

-- 
2.51.0




* [PATCH v1 1/4] arm64/unwind_user/fp: Enable HAVE_UNWIND_USER_FP
From: Jens Remus @ 2026-04-17 15:08 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Steven Rostedt, Josh Poimboeuf,
	Indu Bhagat, Peter Zijlstra, Dylan Hatch, Weinan Liu
  Cc: Jens Remus, linux-arm-kernel, linux-kernel, Heiko Carstens,
	Ilya Leoshkevich
In-Reply-To: <20260417150827.1183376-1-jremus@linux.ibm.com>

Add arm64 support for unwinding of user space using frame pointer (FP).

For this purpose enable the config option HAVE_UNWIND_USER_FP and
provide an arm64-specific ARCH_INIT_USER_FP_FRAME definition (specifying
the CFA offset from FP and the FP and RA offsets from CFA).  Unlike x86,
as there is no means to determine whether the user space IP in the
topmost frame is at function entry, rely on the common definition of
unwind_user_at_function_start(), which always returns false, and common
dummy definition of ARCH_INIT_USER_FP_ENTRY_FRAME.

For unwind user in general provide an arm64-specific implementation
of unwind_user_word_size().

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
 arch/arm64/Kconfig                   |  1 +
 arch/arm64/include/asm/unwind_user.h | 42 ++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)
 create mode 100644 arch/arm64/include/asm/unwind_user.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 38dba5f7e4d2..994fd5162a1d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -253,6 +253,7 @@ config ARM64
 	select HAVE_RUST if RUSTC_SUPPORTS_ARM64
 	select HAVE_STACKPROTECTOR
 	select HAVE_SYSCALL_TRACEPOINTS
+	select HAVE_UNWIND_USER_FP
 	select HAVE_KPROBES
 	select HAVE_KRETPROBES
 	select HAVE_GENERIC_VDSO
diff --git a/arch/arm64/include/asm/unwind_user.h b/arch/arm64/include/asm/unwind_user.h
new file mode 100644
index 000000000000..0641d4d97b0f
--- /dev/null
+++ b/arch/arm64/include/asm/unwind_user.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM64_UNWIND_USER_H
+#define _ASM_ARM64_UNWIND_USER_H
+
+#include <linux/sched/task_stack.h>
+#include <linux/types.h>
+
+#ifdef CONFIG_UNWIND_USER
+
+static inline int unwind_user_word_size(struct pt_regs *regs)
+{
+#ifdef COMPAT
+	if (compat_user_mode(regs))
+		return sizeof(int);
+#endif
+	return sizeof(long);
+}
+
+#endif /* CONFIG_UNWIND_USER */
+
+#ifdef CONFIG_HAVE_UNWIND_USER_FP
+
+#define ARCH_INIT_USER_FP_FRAME(ws)					\
+	.cfa		=  {						\
+		.rule		= UNWIND_USER_CFA_RULE_FP_OFFSET,	\
+		.offset		= 2*(ws),				\
+			},						\
+	.ra		= {						\
+		.rule		= UNWIND_USER_RULE_CFA_OFFSET_DEREF,	\
+		.offset		= -1*(ws),				\
+			},						\
+	.fp		= {						\
+		.rule		= UNWIND_USER_RULE_CFA_OFFSET_DEREF,	\
+		.offset		= -2*(ws),				\
+			},						\
+	.outermost	= false,
+
+#endif /* CONFIG_HAVE_UNWIND_USER_FP */
+
+#include <asm-generic/unwind_user.h>
+
+#endif /* _ASM_ARM64_UNWIND_USER_H */
-- 
2.51.0
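
The frame layout encoded by ARCH_INIT_USER_FP_FRAME (CFA = FP + 2*ws, RA at CFA - ws, saved FP at CFA - 2*ws) can be illustrated with a user space sketch. This is not the kernel unwinder: `struct fake_stack`, `read_word()` and `fp_unwind_step()` are hypothetical names, the stack is a plain array standing in for user memory, and the word size is fixed at 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for user memory: 8-byte words starting at base. */
struct fake_stack {
	uint64_t base;		/* address of words[0] */
	const uint64_t *words;
	size_t nwords;
};

/* Bounds- and alignment-checked read of one word; 0 on success, -1 on error. */
static int read_word(const struct fake_stack *s, uint64_t addr, uint64_t *val)
{
	size_t idx;

	if (addr < s->base || (addr - s->base) % 8 != 0)
		return -1;
	idx = (addr - s->base) / 8;
	if (idx >= s->nwords)
		return -1;
	*val = s->words[idx];
	return 0;
}

/* One FP-based unwind step using the offsets from the macro, with ws = 8. */
static int fp_unwind_step(const struct fake_stack *s, uint64_t fp,
			  uint64_t *cfa, uint64_t *ra, uint64_t *prev_fp)
{
	const uint64_t ws = 8;

	*cfa = fp + 2 * ws;
	if (read_word(s, *cfa - 1 * ws, ra))		/* RA at CFA - ws */
		return -1;
	return read_word(s, *cfa - 2 * ws, prev_fp);	/* FP at CFA - 2*ws */
}
```

Each step reads the frame record {saved FP, saved LR} that FP points to, so repeated calls follow the chain of frame records until the saved FP is zero or a read fails.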




* [PATCH v1 2/4] arm64/uaccess: Add unsafe_copy_from_user() implementation
From: Jens Remus @ 2026-04-17 15:08 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Steven Rostedt, Josh Poimboeuf,
	Indu Bhagat, Peter Zijlstra, Dylan Hatch, Weinan Liu
  Cc: Jens Remus, linux-arm-kernel, linux-kernel, Heiko Carstens,
	Ilya Leoshkevich
In-Reply-To: <20260417150827.1183376-1-jremus@linux.ibm.com>

This replicates Josh's x86 patch "x86/uaccess:
Add unsafe_copy_from_user() implementation" [1] for arm64.

Add an arm64 implementation of unsafe_copy_from_user() similar to the
existing unsafe_copy_to_user().

For this purpose rename the unsafe_copy_loop() helper to
unsafe_copy_to_user_loop() and introduce an unsafe_copy_from_user_loop()
helper.

While at it rename the unsafe_copy_to_user() local variables
__ucu_{dst|src|len} to __{dst|src|len} and change their pointer
type to void * to align to the x86 patch.

[1]: x86/uaccess: Add unsafe_copy_from_user() implementation,
     https://lore.kernel.org/all/20251119132323.1281768-4-jremus@linux.ibm.com/

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
---
 arch/arm64/include/asm/uaccess.h | 39 ++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 9810106a3f66..37d7d16b86a9 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -437,7 +437,7 @@ static inline void user_access_restore(unsigned long enabled) { }
  * We want the unsafe accessors to always be inlined and use
  * the error labels - thus the macro games.
  */
-#define unsafe_copy_loop(dst, src, len, type, label)				\
+#define unsafe_copy_to_user_loop(dst, src, len, type, label)			\
 	while (len >= sizeof(type)) {						\
 		unsafe_put_user(*(type *)(src),(type __user *)(dst),label);	\
 		dst += sizeof(type);						\
@@ -445,15 +445,34 @@ static inline void user_access_restore(unsigned long enabled) { }
 		len -= sizeof(type);						\
 	}
 
-#define unsafe_copy_to_user(_dst,_src,_len,label)			\
-do {									\
-	char __user *__ucu_dst = (_dst);				\
-	const char *__ucu_src = (_src);					\
-	size_t __ucu_len = (_len);					\
-	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u64, label);	\
-	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u32, label);	\
-	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u16, label);	\
-	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u8, label);	\
+#define unsafe_copy_to_user(_dst, _src, _len, label)				\
+do {										\
+	void __user *__dst = (_dst);						\
+	const void *__src = (_src);						\
+	size_t __len = (_len);							\
+	unsafe_copy_to_user_loop(__dst, __src, __len, u64, label);		\
+	unsafe_copy_to_user_loop(__dst, __src, __len, u32, label);		\
+	unsafe_copy_to_user_loop(__dst, __src, __len, u16, label);		\
+	unsafe_copy_to_user_loop(__dst, __src, __len, u8,  label);		\
+} while (0)
+
+#define unsafe_copy_from_user_loop(dst, src, len, type, label)			\
+	while (len >= sizeof(type)) {						\
+		unsafe_get_user(*(type *)(dst), (type __user *)(src), label);	\
+		dst += sizeof(type);						\
+		src += sizeof(type);						\
+		len -= sizeof(type);						\
+	}
+
+#define unsafe_copy_from_user(_dst, _src, _len, label)				\
+do {										\
+	void *__dst = (_dst);							\
+	void __user *__src = (_src);						\
+	size_t __len = (_len);							\
+	unsafe_copy_from_user_loop(__dst, __src, __len, u64, label);		\
+	unsafe_copy_from_user_loop(__dst, __src, __len, u32, label);		\
+	unsafe_copy_from_user_loop(__dst, __src, __len, u16, label);		\
+	unsafe_copy_from_user_loop(__dst, __src, __len, u8,  label);		\
 } while (0)
 
 #define INLINE_COPY_TO_USER
-- 
2.51.0
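
The cascading copy used by both macros (u64 chunks first, then u32, u16 and u8 for the tail) can be sketched in user space as follows. This is only an illustration: `COPY_LOOP()` and `cascade_copy()` are hypothetical names, memcpy() stands in for unsafe_get_user(), there is no fault label, and the cursors are unsigned char pointers rather than the void pointers used in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy as many whole 'type' chunks as fit, advancing both cursors. */
#define COPY_LOOP(dst, src, len, type, count)		\
	while ((len) >= sizeof(type)) {			\
		type tmp_;				\
		memcpy(&tmp_, (src), sizeof(type));	\
		memcpy((dst), &tmp_, sizeof(type));	\
		(dst) += sizeof(type);			\
		(src) += sizeof(type);			\
		(len) -= sizeof(type);			\
		(count)++;				\
	}

/* Copy len bytes in descending chunk sizes; returns the number of accesses. */
static unsigned int cascade_copy(void *to, const void *from, size_t len)
{
	unsigned char *dst = to;
	const unsigned char *src = from;
	unsigned int accesses = 0;

	COPY_LOOP(dst, src, len, uint64_t, accesses);
	COPY_LOOP(dst, src, len, uint32_t, accesses);
	COPY_LOOP(dst, src, len, uint16_t, accesses);
	COPY_LOOP(dst, src, len, uint8_t, accesses);

	return accesses;
}
```

A 15-byte copy therefore takes four accesses (8 + 4 + 2 + 1 bytes), and only the first loop can iterate more than once.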




* [PATCH RFC 2/2] arm64: vdso: Implement __vdso_futex_robust_try_unlock()
From: André Almeida @ 2026-04-17 14:56 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Thomas Gleixner, Mark Rutland,
	Mathieu Desnoyers, Sebastian Andrzej Siewior, Carlos O'Donell,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett, Uros Bizjak, Thomas Weißschuh
  Cc: linux-arm-kernel, linux-kernel, linux-arch, kernel-dev, LKML,
	André Almeida
In-Reply-To: <20260417-tonyk-robust_arm-v1-0-03aa64e2ff1a@igalia.com>

Based on the x86 implementation, implement the vDSO function for unlocking
a robust futex correctly.

Commit xxxxxxxxxxxx ("x86/vdso: Implement __vdso_futex_robust_try_unlock()") has
the full explanation about why this mechanism is needed.

The unlock assembly sequence for arm64 is:

	__futex_list64_try_unlock_cs_start:
		ldxr	x3, [x0] // Load the value at *futex
		cmp	x1, x3   // Compare with TID
		b.ne	__futex_list64_try_unlock_cs_end
		stlxr	w1, xzr, [x0] // Try to clear *futex
		cbnz	w1, __futex_list64_try_unlock_cs_start
	__futex_list64_try_unlock_cs_success:
		str	xzr, [x2] // After clearing *futex, clear *op_pending
	__futex_list64_try_unlock_cs_end:

Whether the pointer should be cleared is decided by checking the zero
condition flag (Z):

	return (regs->user_regs.pstate & PSR_Z_BIT) ?
		(void __user *) regs->user_regs.regs[2] : NULL;

If the masked value is not zero (Z is set), the comparison succeeded and the
kernel should clear op_pending, whose address is held in x2, if userspace
didn't manage to.
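
For readers less familiar with LL/SC, the intent of the sequence can be approximated with C11 atomics. This is only a sketch of the semantics, not a replacement for the assembly: `try_unlock_sketch()` is a hypothetical name, and portable C cannot express the cs_start/cs_success/cs_end labels that the kernel relies on to detect an interrupted critical section.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * If *lock still holds tid, release it by storing 0 and then clear the
 * pending-op pointer. Returns the value observed in *lock, which equals
 * tid when the unlock succeeded.
 */
static uint32_t try_unlock_sketch(_Atomic uint32_t *lock, uint32_t tid,
				  uint64_t *pop)
{
	uint32_t val = tid;

	/* On failure, val is overwritten with the value found in *lock. */
	if (atomic_compare_exchange_strong_explicit(lock, &val, 0,
						    memory_order_release,
						    memory_order_relaxed))
		*pop = 0;

	return val;
}
```

The window between the successful store to *lock and the store to *pop is exactly the race that the labelled critical section lets the kernel close on behalf of an interrupted task.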

Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
RFC:
 - Should I duplicate the explanation found in the x86 commit or can I just
 point to it?
 - Only LL/SC for now but I can add LSE later if this looks good
 - In the objdump I see that op_pending is stored in x2. But how stable is this?
 How can I write it so that it's always x2?
---
 arch/arm64/Kconfig                                 |  1 +
 arch/arm64/include/asm/futex_robust.h              | 35 +++++++++++++
 arch/arm64/kernel/vdso/Makefile                    |  9 +++-
 arch/arm64/kernel/vdso/vdso.lds.S                  |  4 ++
 .../kernel/vdso/vfutex_robust_list_try_unlock.c    | 59 ++++++++++++++++++++++
 5 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 427151a9db7f..e10cb97a51c7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -249,6 +249,7 @@ config ARM64
 	select HAVE_RELIABLE_STACKTRACE
 	select HAVE_POSIX_CPU_TIMERS_TASK_WORK
 	select HAVE_FUNCTION_ARG_ACCESS_API
+	select HAVE_FUTEX_ROBUST_UNLOCK
 	select MMU_GATHER_RCU_TABLE_FREE
 	select HAVE_RSEQ
 	select HAVE_RUST if RUSTC_SUPPORTS_ARM64
diff --git a/arch/arm64/include/asm/futex_robust.h b/arch/arm64/include/asm/futex_robust.h
new file mode 100644
index 000000000000..f2b7a2b15cb5
--- /dev/null
+++ b/arch/arm64/include/asm/futex_robust.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM64_FUTEX_ROBUST_H
+#define _ASM_ARM64_FUTEX_ROBUST_H
+
+#include <asm/ptrace.h>
+
+static __always_inline void __user *arm64_futex_robust_unlock_get_pop(struct pt_regs *regs)
+{
+	/*
+	 * RFC: According to the objdump below, x2 is the address of
+	 * op_pending. How stable is this?
+
+	 <__futex_list64_try_unlock_cs_start>:
+		ldxr	x3, [x0]
+		cmp	x1, x3
+		b.ne	d7c <__futex_list64_try_unlock_cs_end>  // b.any
+		stlxr	w1, xzr, [x0]
+		cbnz	w1, d64 <__futex_list64_try_unlock_cs_start>
+
+	<__futex_list64_try_unlock_cs_success>:
+		str	xzr, [x2]
+
+	<__futex_list64_try_unlock_cs_end>:
+		mov	w0, w3
+		ret
+	*/
+
+	return (regs->user_regs.pstate & PSR_Z_BIT) ? NULL
+		: (void __user *) regs->user_regs.regs[2];
+}
+
+#define arch_futex_robust_unlock_get_pop(regs)	\
+	arm64_futex_robust_unlock_get_pop(regs)
+
+#endif /* _ASM_ARM64_FUTEX_ROBUST_H */
diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index 7dec05dd33b7..a65893d8100e 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -9,7 +9,8 @@
 # Include the generic Makefile to check the built vdso.
 include $(srctree)/lib/vdso/Makefile.include
 
-obj-vdso := vgettimeofday.o note.o sigreturn.o vgetrandom.o vgetrandom-chacha.o
+obj-vdso := vgettimeofday.o note.o sigreturn.o vgetrandom.o vgetrandom-chacha.o \
+	    vfutex_robust_list_try_unlock.o
 
 # Build rules
 targets := $(obj-vdso) vdso.so vdso.so.dbg
@@ -45,9 +46,11 @@ CC_FLAGS_ADD_VDSO := -O2 -mcmodel=tiny -fasynchronous-unwind-tables
 
 CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_REMOVE_VDSO)
 CFLAGS_REMOVE_vgetrandom.o = $(CC_FLAGS_REMOVE_VDSO)
+CFLAGS_REMOVE_vfutex_robust_list_try_unlock.o = $(CC_FLAGS_REMOVE_VDSO)
 
 CFLAGS_vgettimeofday.o = $(CC_FLAGS_ADD_VDSO)
 CFLAGS_vgetrandom.o = $(CC_FLAGS_ADD_VDSO)
+CFLAGS_vfutex_robust_list_try_unlock.o = $(CC_FLAGS_ADD_VDSO)
 
 ifneq ($(c-gettimeofday-y),)
   CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
@@ -57,6 +60,10 @@ ifneq ($(c-getrandom-y),)
   CFLAGS_vgetrandom.o += -include $(c-getrandom-y)
 endif
 
+ifneq ($(c-vfutex_robust_list_try_unlock-y),)
+  CFLAGS_vfutex_robust_list_try_unlock.o += -include $(c-vfutex_robust_list_try_unlock-y)
+endif
+
 targets += vdso.lds
 CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
 
diff --git a/arch/arm64/kernel/vdso/vdso.lds.S b/arch/arm64/kernel/vdso/vdso.lds.S
index 52314be29191..33ce58516580 100644
--- a/arch/arm64/kernel/vdso/vdso.lds.S
+++ b/arch/arm64/kernel/vdso/vdso.lds.S
@@ -104,6 +104,10 @@ VERSION
 		__kernel_clock_gettime;
 		__kernel_clock_getres;
 		__kernel_getrandom;
+		__vdso_futex_robust_list64_try_unlock;
+#ifdef CONFIG_COMPAT
+		__vdso_futex_robust_list32_try_unlock;
+#endif
 	local: *;
 	};
 }
diff --git a/arch/arm64/kernel/vdso/vfutex_robust_list_try_unlock.c b/arch/arm64/kernel/vdso/vfutex_robust_list_try_unlock.c
new file mode 100644
index 000000000000..a9089d3cacfc
--- /dev/null
+++ b/arch/arm64/kernel/vdso/vfutex_robust_list_try_unlock.c
@@ -0,0 +1,59 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include <vdso/futex.h>
+#include <linux/stringify.h>
+
+#define LABEL(name, sz) __stringify(__futex_list##sz##_try_unlock_cs_##name)
+
+#define GLOBLS(sz) ".globl " LABEL(start, sz) ", " LABEL(success, sz) ", " LABEL(end, sz) "\n"
+
+__u32 __vdso_futex_robust_list64_try_unlock(__u32 *lock, __u32 tid, __u64 *pop)
+{
+	__u32 val, result;
+
+	asm volatile (
+		GLOBLS(64)
+		"	prfm pstl1strm, %[lock]			\n"
+		LABEL(start, 64)":				\n"
+		"	ldxr %[val], %[lock]			\n"
+		"	cmp %[tid], %[val]			\n"
+		"	bne " LABEL(end, 64)"			\n"
+		"	stlxr %w[result], xzr, %[lock]		\n"
+		"	cbnz %w[result], " LABEL(start, 64)"	\n"
+		LABEL(success, 64)":				\n"
+		"	str xzr, %[pop]				\n"
+		LABEL(end, 64)":				\n"
+
+		: [val] "=&r" (val), [result] "=r" (result)
+		: [tid] "r" (tid), [lock] "Q" (*lock), [pop] "Q" (*pop)
+		: "memory"
+	);
+
+	return val;
+}
+
+#ifdef CONFIG_COMPAT
+__u32 __vdso_futex_robust_list32_try_unlock(__u32 *lock, __u32 tid, __u32 *pop)
+{
+	__u32 val, result;
+
+	asm volatile (
+		GLOBLS(32)
+		"	prfm pstl1strm, %[lock]			\n"
+		LABEL(start, 32)":				\n"
+		"	ldxr %w[val], %[lock]			\n"
+		"	cmp %w[tid], %w[val]			\n"
+		"	bne " LABEL(end, 32)"			\n"
+		"	stlxr %w[result], wzr, %[lock]		\n"
+		"	cbnz %w[result], " LABEL(start, 32)"	\n"
+		LABEL(success, 32)":				\n"
+		"	str wzr, %[pop]				\n"
+		LABEL(end, 32)":				\n"
+
+		: [val] "=&r" (val), [result] "=r" (result)
+		: [tid] "r" (tid), [lock] "Q" (*lock), [pop] "Q" (*pop)
+		: "memory"
+	);
+
+	return val;
+}
+#endif

-- 
2.53.0



^ permalink raw reply related

* [PATCH RFC 0/2] arm64: vdso: Implement __vdso_futex_robust_try_unlock()
From: André Almeida @ 2026-04-17 14:56 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Thomas Gleixner, Mark Rutland,
	Mathieu Desnoyers, Sebastian Andrzej Siewior, Carlos O'Donell,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett, Uros Bizjak, Thomas Weißschuh
  Cc: linux-arm-kernel, linux-kernel, linux-arch, kernel-dev, LKML,
	André Almeida

Hi folks,

This is my take on implementing the new vDSO function for unlocking a robust
futex on arm64. If you don't know what that is, Thomas wrote a good summary,
including the motivation for this work and the x86 implementation:

   https://lore.kernel.org/lkml/878qb89g7b.ffs@tglx/

There are some loose ends in my patchset, so I'm sending it as an RFC to ask
some questions:

 - I haven't managed to expose the assembly labels correctly: the linker can't
 find them and the build fails. More info in patch 1/2.
 - If the process is interrupted between the labels, we need to check the
 condition flags and clear op_pending using the address held in a register.
 Using objdump I see that the op_pending address is being stored in x2, but I
 suspect that this isn't stable, so I need to figure out how to make sure that
 the address will always be stored in the same register.
 - So far I have implemented only the LL/SC version to make review easier, but I
 can do the LSE version as well.

This patchset works fine with the tests proposed at
https://lore.kernel.org/lkml/20260330120118.012924430@kernel.org/ (but of course
without the labels the complete mechanism doesn't work properly).

---
André Almeida (2):
      arm64: vdso: Prepare for robust futex unlock support
      arm64: vdso: Implement __vdso_futex_robust_try_unlock()

 arch/arm64/Kconfig                                 |  1 +
 arch/arm64/include/asm/futex_robust.h              | 35 +++++++++++++
 arch/arm64/include/asm/vdso.h                      |  4 ++
 arch/arm64/kernel/vdso.c                           | 29 +++++++++++
 arch/arm64/kernel/vdso/Makefile                    |  9 +++-
 arch/arm64/kernel/vdso/vdso.lds.S                  |  4 ++
 .../kernel/vdso/vfutex_robust_list_try_unlock.c    | 59 ++++++++++++++++++++++
 7 files changed, 140 insertions(+), 1 deletion(-)
---
base-commit: 0e8896e9899b607bb168c1cce340596b8c2e3e2b
change-id: 20260416-tonyk-robust_arm-54ff77d2c4e4

Best regards,
--  
André Almeida <andrealmeid@igalia.com>



^ permalink raw reply

* [PATCH RFC 1/2] arm64: vdso: Prepare for robust futex unlock support
From: André Almeida @ 2026-04-17 14:56 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Thomas Gleixner, Mark Rutland,
	Mathieu Desnoyers, Sebastian Andrzej Siewior, Carlos O'Donell,
	Peter Zijlstra, Florian Weimer, Rich Felker, Torvald Riegel,
	Darren Hart, Ingo Molnar, Davidlohr Bueso, Arnd Bergmann,
	Liam R . Howlett, Uros Bizjak, Thomas Weißschuh
  Cc: linux-arm-kernel, linux-kernel, linux-arch, kernel-dev, LKML,
	André Almeida
In-Reply-To: <20260417-tonyk-robust_arm-v1-0-03aa64e2ff1a@igalia.com>

There will be a VDSO function to unlock non-contended robust futexes in
user space. The unlock sequence is racy vs. clearing the list_pending_op
pointer in the task's robust list head. To plug this race the kernel needs
to know the critical section window so it can clear the pointer when the
task is interrupted within that race window. The window is determined by
labels in the inline assembly.
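
As a rough, portable illustration of the idea (not part of this patch), the
check can be modelled as below. All demo_* names are hypothetical, the window
is assumed to be half-open, [success label, end label), and it is assumed that
the labels are known as offsets from the vDSO base. The arm64 condition-flag
check is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a critical section window inside a mapped vDSO. */
struct demo_cs_range {
	uintptr_t start;	/* address of the "success" label */
	uintptr_t end;		/* address of the "end" label (exclusive) */
};

/* Assumption: labels are offsets, rebased onto the vDSO mapping address. */
static struct demo_cs_range demo_make_range(uintptr_t vdso_base,
					    uintptr_t off_success,
					    uintptr_t off_end)
{
	struct demo_cs_range r = {
		.start = vdso_base + off_success,
		.end = vdso_base + off_end,
	};
	return r;
}

/* True if the interrupted instruction pointer lies inside the window. */
static bool demo_ip_in_cs(const struct demo_cs_range *r, uintptr_t ip)
{
	return ip >= r->start && ip < r->end;
}

/*
 * Model of the fixup: if the task was interrupted inside the window, the
 * lock word is already released but the pending pointer is not yet
 * cleared, so clear it on the task's behalf. Returns true if it did so.
 */
static bool demo_fixup_pending_op(const struct demo_cs_range *r,
				  uintptr_t ip, uint64_t *pending_op)
{
	if (!demo_ip_in_cs(r, ip))
		return false;
	*pending_op = 0;
	return true;
}
```

An interrupt exactly at the end label needs no fixup in this model, because
the store that clears the pointer has already executed by then.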

Signed-off-by: André Almeida <andrealmeid@igalia.com>
---
RFC: The linker can't find these symbols after patch 2/2; it fails with:

ld: arch/arm64/kernel/vdso.o: in function `vdso_futex_robust_unlock_update_ips':
arch/arm64/kernel/vdso.c:72:(.text+0x200): undefined reference to `__futex_list64_try_unlock_cs_success'
ld: arch/arm64/kernel/vdso.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `__futex_list64_try_unlock_cs_success' which may bind externally can not be used when making a shared object; recompile with -fPIC
arch/arm64/kernel/vdso.c:72:(.text+0x200): dangerous relocation: unsupported relocation
---
 arch/arm64/include/asm/vdso.h |  4 ++++
 arch/arm64/kernel/vdso.c      | 29 +++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/arch/arm64/include/asm/vdso.h b/arch/arm64/include/asm/vdso.h
index 232b46969088..182fde1df3dd 100644
--- a/arch/arm64/include/asm/vdso.h
+++ b/arch/arm64/include/asm/vdso.h
@@ -18,6 +18,10 @@
 
 extern char vdso_start[], vdso_end[];
 extern char vdso32_start[], vdso32_end[];
+extern char __futex_list64_try_unlock_cs_success[], __futex_list64_try_unlock_cs_end[];
+#ifdef CONFIG_COMPAT
+extern char __futex_list32_try_unlock_cs_success[], __futex_list32_try_unlock_cs_end[];
+#endif
 
 #endif /* !__ASSEMBLER__ */
 
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 592dd8668de4..42a82e73a774 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -11,6 +11,7 @@
 #include <linux/clocksource.h>
 #include <linux/elf.h>
 #include <linux/err.h>
+#include <linux/futex.h>
 #include <linux/errno.h>
 #include <linux/gfp.h>
 #include <linux/kernel.h>
@@ -57,6 +58,32 @@ static struct vdso_abi_info vdso_info[] __ro_after_init = {
 #endif /* CONFIG_COMPAT_VDSO */
 };
 
+#ifdef CONFIG_FUTEX_ROBUST_UNLOCK
+static void vdso_futex_robust_unlock_update_ips(enum vdso_abi abi, struct mm_struct *mm)
+{
+	unsigned long vdso = (unsigned long) mm->context.vdso;
+	struct futex_mm_data *fd = &mm->futex;
+
+	/*
+	 * RFC: won't compile due to undefined reference to `__futex_list64_try_unlock_cs_...`
+
+	if (abi == VDSO_ABI_AA64) {
+		futex_set_vdso_cs_range(fd, 0, vdso, (uintptr_t) __futex_list64_try_unlock_cs_success,
+					(uintptr_t) __futex_list64_try_unlock_cs_end, false);
+	}
+
+#ifdef CONFIG_COMPAT
+	if (abi == VDSO_ABI_AA32) {
+		futex_set_vdso_cs_range(fd, 1, vdso, (uintptr_t) __futex_list32_try_unlock_cs_success,
+					(uintptr_t) __futex_list32_try_unlock_cs_end, true);
+	}
+#endif
+	*/
+}
+#else
+static inline void vdso_futex_robust_unlock_update_ips(enum vdso_abi abi, struct mm_struct *mm) { }
+#endif /* CONFIG_FUTEX_ROBUST_UNLOCK */
+
 static int vdso_mremap(const struct vm_special_mapping *sm,
 		struct vm_area_struct *new_vma)
 {
@@ -134,6 +161,8 @@ static int __setup_additional_pages(enum vdso_abi abi,
 	if (IS_ERR(ret))
 		goto up_fail;
 
+	vdso_futex_robust_unlock_update_ips(abi, mm);
+
 	return 0;
 
 up_fail:

-- 
2.53.0



^ permalink raw reply related

* Re: [PATCH v2] raid6: arm64: add SVE optimized implementation for syndrome generation
From: Ard Biesheuvel @ 2026-04-17 14:43 UTC (permalink / raw)
  To: Robin Murphy, Demian Shulhan
  Cc: Christoph Hellwig, Mark Rutland, Song Liu, Yu Kuai, Will Deacon,
	Catalin Marinas, Mark Brown, linux-arm-kernel, Li Nan, linux-raid,
	linux-kernel
In-Reply-To: <8db4defe-8b5e-4cc3-880b-72d46510b034@arm.com>

On Thu, 16 Apr 2026, at 18:26, Robin Murphy wrote:
> On 16/04/2026 3:59 pm, Demian Shulhan wrote:
>> Hi Ard!
...
>>> OK, so the takeaway here is that SVE is only worth the hassle if the
>>> vector length is at least 256 bits. This is not entirely surprising,
>>> but given that Graviton4 went back to 128 bit vectors from 256, I
>>> wonder what the future expectation is here.
>>
>> I agree. The results from the SnapRAID tests are not as impressive as
>> I hoped, and the fact that Neoverse-V2 went back to 128-bit is a red
>> flag. It suggests that wide SVE registers might not be a priority in
>> future architecture versions.
>
> If you look at the Neoverse V1 software optimisation guide[1], the SVE
> instructions generally have half the throughput of their ASIMD
> equivalents (i.e. presumably the vector pipes are still only 128 bits
> wide and SVE is just using them in pairs), so indeed the total
> instruction count is largely meaningless - IPC might be somewhat more
> relevant, but I'd say the only performance number that's really
> meaningful is the end-to-end MB/s measure of how fast the function
> implementation as a whole can process data.

On arm64, kernel mode NEON is mostly used to gain access to AES and SHA
instructions, and only to a lesser degree to speed up ordinary
arithmetic, and so XOR is somewhat of an outlier here.

Given that Neoverse V1 apparently already carves up ordinary arithmetic
performed on 256-bit vectors and operates on 128 bits at a time, I am
rather skeptical that we're likely to see any SVE implementations of the
crypto extensions soon that are meaningfully faster, given that these
are presumably much costlier to implement in terms of gate count, and
therefore likely to be split up even on SVE implementations that can
perform ordinary arithmetic on 256+ bit vectors in a single cycle. Note
that even the arm64 SIMD accelerated CRC implementations rely heavily on
64x64->128 polynomial multiplication.

IOW, before we consider kernel mode SVE, I'd like to see some benchmarks
for other algorithms too.

> It's probably also worth checking whether the current NEON routines
> themselves are actually optimal for modern big CPUs - things have
> moved on quite a bit since Cortex-A57 (whose ASIMD performance could
> also be described as "esoteric" at the best of times...)
>

Some of those crypto routines could definitely be made faster, but it
highly depends on the context whether that actually helps: for instance,
there was a proposal a while ago to incorporate the AES-GCM code from
the OpenSSL project (authored by ARM) but at the time, it slightly
regressed the ~1500 byte case and only gave a substantial improvement
for much larger block sizes, which aren't that common in the kernel for
this particular algorithm.

IOW, any contributions that improve the existing code (or outright
replace it with something faster, for all I care) are highly
appreciated, but they should be motivated by benchmarks that reflect
the use cases that we actually consider important for the algorithm
in question.

^ permalink raw reply

* Re: [PATCH v6 01/30] mm: Introduce kpkeys
From: David Hildenbrand (Arm) @ 2026-04-17 14:37 UTC (permalink / raw)
  To: Kevin Brodsky, linux-hardening
  Cc: linux-kernel, Andrew Morton, Andy Lutomirski, Catalin Marinas,
	Dave Hansen, Ira Weiny, Jann Horn, Jeff Xu, Joey Gouly, Kees Cook,
	Linus Walleij, Lorenzo Stoakes, Marc Zyngier, Mark Brown,
	Matthew Wilcox, Maxwell Bland, Mike Rapoport (IBM),
	Peter Zijlstra, Pierre Langlois, Quentin Perret, Rick Edgecombe,
	Ryan Roberts, Thomas Gleixner, Vlastimil Babka, Will Deacon,
	Yang Shi, Yeoreum Yun, linux-arm-kernel, linux-mm, x86
In-Reply-To: <20260227175518.3728055-2-kevin.brodsky@arm.com>

On 2/27/26 18:54, Kevin Brodsky wrote:
> kpkeys is a simple framework to enable the use of protection keys
> (pkeys) to harden the kernel itself. This patch introduces the basic
> API in <linux/kpkeys.h>: a couple of functions to set and restore
> the pkey register and macros to define guard objects.
> 
> kpkeys introduces a new concept on top of pkeys: the kpkeys level.
> Each level is associated to a set of permissions for the pkeys
> managed by the kpkeys framework. kpkeys_set_level(lvl) sets those
> permissions according to lvl, and returns the original pkey
> register, to be later restored by kpkeys_restore_pkey_reg(). To
> start with, only KPKEYS_LVL_DEFAULT is available, which is meant
> to grant RW access to KPKEYS_PKEY_DEFAULT (i.e. all memory since
> this is the only available pkey for now).
> 
> Because each architecture implementing pkeys uses a different
> representation for the pkey register, and may reserve certain pkeys
> for specific uses, support for kpkeys must be explicitly indicated
> by selecting ARCH_HAS_KPKEYS and defining the following functions in
> <asm/kpkeys.h>, in addition to the macros provided in
> <asm-generic/kpkeys.h>:
> 
> - arch_kpkeys_set_level()
> - arch_kpkeys_restore_pkey_reg()
> - arch_kpkeys_enabled()

Another thing: why not simply drop the "arch_" stuff from these helpers?
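
For readers new to the series, the set/restore pairing described in the quoted
commit message can be modelled in plain C roughly as follows. This is a
user-space sketch only: every demo_* name and the register encoding are made
up, and a global variable stands in for the hardware pkey register.

```c
#include <stdint.h>

/* Hypothetical stand-ins; the real names live in <linux/kpkeys.h>. */
enum demo_kpkeys_level {
	DEMO_KPKEYS_LVL_DEFAULT,
};

/* Made-up encoding: bit 0 set means RW access to the default pkey. */
#define DEMO_PKEY_REG_DEFAULT_RW	0x1u

/* Stand-in for the per-CPU hardware pkey register. */
static uint64_t demo_pkey_reg;

/* Map a level to the register value that grants its permissions. */
static uint64_t demo_level_to_reg(enum demo_kpkeys_level lvl)
{
	switch (lvl) {
	case DEMO_KPKEYS_LVL_DEFAULT:
	default:
		return DEMO_PKEY_REG_DEFAULT_RW;
	}
}

/* Switch to the given level and hand back the previous register value. */
static uint64_t demo_kpkeys_set_level(enum demo_kpkeys_level lvl)
{
	uint64_t old = demo_pkey_reg;

	demo_pkey_reg = demo_level_to_reg(lvl);
	return old;
}

/* Put back whatever demo_kpkeys_set_level() returned earlier. */
static void demo_kpkeys_restore_pkey_reg(uint64_t reg)
{
	demo_pkey_reg = reg;
}
```

A caller would save the return value of demo_kpkeys_set_level(), do its work,
and pass the saved value to demo_kpkeys_restore_pkey_reg() afterwards.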

-- 
Cheers,

David


^ permalink raw reply

* [PATCH bpf-next v2] arm32, bpf: Reject BPF-to-BPF calls and callbacks in the JIT
From: Puranjay Mohan @ 2026-04-17 14:33 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Puranjay Mohan, Jonas Rebmann, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Song Liu, Russell King,
	kernel

The ARM32 BPF JIT does not support BPF-to-BPF function calls
(BPF_PSEUDO_CALL) or callbacks (BPF_PSEUDO_FUNC), but it does
not reject them either.

When a program with subprograms is loaded (e.g. libxdp's XDP
dispatcher uses __noinline__ subprograms, or any program using
callbacks like bpf_loop or bpf_for_each_map_elem), the verifier
invokes bpf_jit_subprogs() which calls bpf_int_jit_compile()
for each subprogram.

For BPF_PSEUDO_CALL, since ARM32 does not reject it, the JIT
silently emits code using the wrong address computation:

    func = __bpf_call_base + imm

where imm is a pc-relative subprogram offset, producing a bogus
function pointer.
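
To make the mismatch concrete, here is a small stand-alone sketch (not JIT
code) of the two interpretations of imm. The demo_* names and the example
values are hypothetical; the subprogram rule used is the usual one where the
target is relative to the instruction following the call.

```c
#include <stdint.h>

/*
 * Helper-call interpretation: imm is an offset from a base function
 * address. This is what the ARM32 JIT applies to every BPF_CALL.
 */
static uint32_t demo_helper_addr(uint32_t call_base, int32_t imm)
{
	return call_base + (uint32_t)imm;
}

/*
 * Subprogram interpretation: imm is an instruction offset relative to
 * the instruction after the call, so the result is an instruction index,
 * not a machine address.
 */
static int demo_subprog_target(int insn_idx, int32_t imm)
{
	return insn_idx + imm + 1;
}
```

With a call at instruction 10 and imm = 5, the subprogram starts at
instruction 16, whereas the helper-style computation yields the base address
plus 5 bytes, which does not point at any function entry.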

For BPF_PSEUDO_FUNC, the ldimm64 handler ignores src_reg and
loads the immediate as a normal 64-bit value without error.

In both cases, build_body() reports success and a JIT image is
allocated. ARM32 lacks the jit_data/extra_pass mechanism needed
for the second JIT pass in bpf_jit_subprogs(). On the second
pass, bpf_int_jit_compile() performs a full fresh compilation,
allocating a new JIT binary and overwriting prog->bpf_func. The
first allocation is never freed. bpf_jit_subprogs() then detects
the function pointer changed and aborts with -ENOTSUPP, but the
original JIT binary has already been leaked. Each program
load/unload cycle leaks one JIT binary allocation, as reported
by kmemleak:

    unreferenced object 0xbf0a1000 (size 4096):
      backtrace:
        bpf_jit_binary_alloc+0x64/0xfc
        bpf_int_jit_compile+0x14c/0x348
        bpf_jit_subprogs+0x4fc/0xa60

Fix this by rejecting both BPF_PSEUDO_CALL in the BPF_CALL
handler and BPF_PSEUDO_FUNC in the BPF_LD_IMM64 handler, falling
through to the existing 'notyet' path. This causes build_body()
to fail before any JIT binary is allocated, so
bpf_int_jit_compile() returns the original program unjitted.
bpf_jit_subprogs() then sees !prog->jited and cleanly falls
back to the interpreter with no leak.
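
The ordering that makes this safe can be shown with a toy model. It only
loosely mirrors the JIT structure: the demo_* names are hypothetical, the
pseudo constants are local copies for illustration, and malloc() stands in for
the JIT binary allocation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Local illustration values for the src_reg pseudo markers. */
#define DEMO_PSEUDO_CALL	1
#define DEMO_PSEUDO_FUNC	4

enum demo_op { DEMO_OP_ALU, DEMO_OP_CALL, DEMO_OP_LD_IMM64 };

struct demo_insn {
	enum demo_op op;
	unsigned char src_reg;
};

/* Reject the two unsupported forms, accept everything else. */
static int demo_build_insn(const struct demo_insn *insn)
{
	if (insn->op == DEMO_OP_CALL && insn->src_reg == DEMO_PSEUDO_CALL)
		return -1;
	if (insn->op == DEMO_OP_LD_IMM64 && insn->src_reg == DEMO_PSEUDO_FUNC)
		return -1;
	return 0;
}

/* Walk the program; the first rejected instruction fails the whole body. */
static int demo_build_body(const struct demo_insn *prog, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (demo_build_insn(&prog[i]) < 0)
			return -1;
	}
	return 0;
}

/*
 * Allocate the image only after the body check has passed. Returns NULL
 * (meaning: stay on the interpreter) without allocating when it fails.
 */
static void *demo_jit_compile(const struct demo_insn *prog, size_t len,
			      int *allocs)
{
	if (demo_build_body(prog, len) < 0)
		return NULL;
	(*allocs)++;
	return malloc(64);
}
```

Because the rejection happens before the allocation, a program containing a
pseudo call never produces an image that a later pass could overwrite and
leak.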

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
Reported-by: Jonas Rebmann <jre@pengutronix.de>
Closes: https://lore.kernel.org/bpf/b63e9174-7a3d-4e22-8294-16df07a4af89@pengutronix.de
Tested-by: Jonas Rebmann <jre@pengutronix.de>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---

Changelog:
v1: https://lore.kernel.org/all/20260417103004.3552500-1-puranjay@kernel.org/
Changes in v2:
- Add Acked-by: Daniel Borkmann <daniel@iogearbox.net>
- Reject BPF_PSEUDO_FUNC in the BPF_LD | BPF_IMM | BPF_DW handler
- Move code below declarations

---
 arch/arm/net/bpf_jit_32.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index deeb8f292454..a900aa973885 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -1852,6 +1852,9 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
 	{
 		u64 val = (u32)imm | (u64)insn[1].imm << 32;

+		if (insn->src_reg == BPF_PSEUDO_FUNC)
+			goto notyet;
+
 		emit_a32_mov_i64(dst, val, ctx);

 		return 1;
@@ -2055,6 +2058,9 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
 		const s8 *r5 = bpf2a32[BPF_REG_5];
 		const u32 func = (u32)__bpf_call_base + (u32)imm;

+		if (insn->src_reg == BPF_PSEUDO_CALL)
+			goto notyet;
+
 		emit_a32_mov_r64(true, r0, r1, ctx);
 		emit_a32_mov_r64(true, r1, r2, ctx);
 		emit_push_r64(r5, ctx);

base-commit: 1f5ffc672165ff851063a5fd044b727ab2517ae3
-- 
2.52.0

^ permalink raw reply related

* Re: [PATCH] KVM: arm64: pkvm: Adopt MARKER() to define host hypercall ranges
From: Marc Zyngier @ 2026-04-17 14:23 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, Marc Zyngier
  Cc: Will Deacon, Vincent Donnefort, Fuad Tabba, Joey Gouly,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu
In-Reply-To: <20260414160528.2218858-1-maz@kernel.org>

On Tue, 14 Apr 2026 17:05:28 +0100, Marc Zyngier wrote:
> The EL2 code defines ranges of host hypercalls that are either
> enabled at boot-time only, used by [nh]VHE KVM, or reserved to pKVM.
> 
> The way these ranges are delineated is error prone, as the enum symbols
> defining the limits are expressed in terms of actual function symbols.
> This means that should a new function be added, special care must be
> taken to also update the limit symbol.
> 
> [...]
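
For context, the marker idea can be sketched like this. The macro shape is an
assumption about how such a marker can be built, and all DEMO_* names are made
up; the definition actually applied may differ.

```c
#include <stdbool.h>

/*
 * Assumed marker shape: the marker takes the next enum value, then a
 * hidden alias rewinds the counter so the following entry reuses that
 * value. Markers therefore consume no hypercall number.
 */
#define DEMO_MARKER(m)	m, after_##m = m - 1

enum demo_hcall {
	DEMO_HCALL_INIT_VECTORS,		/* 0 */
	DEMO_HCALL_FINALIZE,			/* 1 */
	DEMO_MARKER(DEMO_MARK_BOOT_END),	/* 2, shared with next entry */
	DEMO_HCALL_VCPU_RUN,			/* 2 */
	DEMO_HCALL_FLUSH_TLB,			/* 3 */
	DEMO_MARKER(DEMO_MARK_NVHE_END),	/* 4, shared with next entry */
	DEMO_HCALL_PKVM_INIT_VM,		/* 4 */
	DEMO_HCALL_COUNT			/* 5 */
};

/* Range checks refer to markers only, never to a specific function id. */
static bool demo_is_boot_only(enum demo_hcall id)
{
	return id < DEMO_MARK_BOOT_END;
}

static bool demo_is_pkvm_only(enum demo_hcall id)
{
	return id >= DEMO_MARK_NVHE_END && id < DEMO_HCALL_COUNT;
}
```

Adding a new entry before a marker shifts the marker automatically, so the
range checks do not need to be touched.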

Applied to fixes, thanks!

[1/1] KVM: arm64: pkvm: Adopt MARKER() to define host hypercall ranges
      commit: 9b72acd3f770517d3fd4bd7193fd60f3a81d1f69

Cheers,

	M.
-- 
Without deviation from the norm, progress is not possible.




^ permalink raw reply
