Linux-ARM-Kernel Archive on lore.kernel.org

* Re: [PATCH v7 12/28] media: rockchip: rga: avoid odd frame sizes for YUV formats
From: Michael Tretter @ 2026-05-21 14:11 UTC
  To: Sven Püschel
  Cc: Jacob Chen, Ezequiel Garcia, Mauro Carvalho Chehab,
	Heiko Stuebner, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Hans Verkuil, linux-media, linux-rockchip, linux-arm-kernel,
	linux-kernel, devicetree, kernel, nicolas, sebastian.reichel,
	p.zabel, Nicolas Dufresne
In-Reply-To: <20260521-spu-rga3-v7-12-3f33e8c7145f@pengutronix.de>

On Thu, 21 May 2026 00:44:17 +0200, Sven Püschel wrote:
> Avoid odd frame sizes for YUV formats, as they may cause undefined
> behavior. This is done in preparation for the RGA3, which hangs when the
> output format is set to 129x129 pixel YUV420 SP (NV12).
> 
> This requirement is documented explicitly for the RGA3 in section 5.6.3
> of the RK3588 TRM Part 2. For the RGA2 the RK3588 TRM Part 2
> (section 6.1.2) and RK3568 TRM Part 2 (section 14.2) only mention the
> x/y offsets and stride alignment requirements. But the vendor driver for
> the RGA2 also contains checks for the width and height to be aligned to
> 2 bytes.
> 
> Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> Signed-off-by: Sven Püschel <s.pueschel@pengutronix.de>
> ---
>  drivers/media/platform/rockchip/rga/rga.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/media/platform/rockchip/rga/rga.c b/drivers/media/platform/rockchip/rga/rga.c
> index f599c992829dd..77b8c7ab74274 100644
> --- a/drivers/media/platform/rockchip/rga/rga.c
> +++ b/drivers/media/platform/rockchip/rga/rga.c
> @@ -337,6 +337,19 @@ static int vidioc_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
>  	struct rga_ctx *ctx = file_to_rga_ctx(file);
>  	const struct rga_hw *hw = ctx->rga->hw;
>  	struct rga_fmt *fmt;
> +	struct v4l2_frmsize_stepwise frmsize = {
> +		.min_width = hw->min_width,
> +		.max_width = hw->max_width,
> +		.min_height = hw->min_height,
> +		.max_height = hw->max_height,
> +		.step_width = 1,
> +		.step_height = 1,
> +	};
> +
> +	if (v4l2_is_format_yuv(v4l2_format_info(pix_fmt->pixelformat))) {
> +		frmsize.step_width = 2;
> +		frmsize.step_height = 2;

If I understand correctly, this limitation may be a result of 4:2:0
chroma subsampling. Thus, formats with 4:2:2 subsampling would also work
with step_height = 1.

As this may be some hardware limitation, a comment that points to the
TRM (in addition to the commit message) may be beneficial, too.
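
A minimal sketch of that idea, assuming the step can be taken from the
format's chroma subsampling factors rather than from a blanket YUV check.
The kernel's struct v4l2_format_info carries hdiv and vdiv fields for
this; the fmt_info and frame_step types and both helpers below are
hypothetical user-space stand-ins. Whether the RGA hardware really
accepts odd heights for 4:2:2 is not confirmed by the TRM sections cited
in the commit message. align_down() only illustrates the rounding and is
not how v4l2_apply_frmsize_constraints() is implemented.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for struct v4l2_format_info. */
struct fmt_info {
	unsigned int hdiv;	/* horizontal chroma subsampling factor */
	unsigned int vdiv;	/* vertical chroma subsampling factor */
};

struct frame_step {
	unsigned int width;
	unsigned int height;
};

/* Derive the frame size step from the subsampling factors. */
static struct frame_step step_for_format(const struct fmt_info *info)
{
	struct frame_step step;

	assert(info->hdiv > 0 && info->vdiv > 0);
	step.width = info->hdiv;
	step.height = info->vdiv;
	return step;
}

/* Round a dimension down to a multiple of step, never below step itself. */
static unsigned int align_down(unsigned int value, unsigned int step)
{
	unsigned int aligned;

	assert(step > 0);
	aligned = value - (value % step);
	return aligned ? aligned : step;
}
```

With these factors NV12 (hdiv = 2, vdiv = 2) gets a 2x2 step, while NV16
(hdiv = 2, vdiv = 1) keeps a height step of 1.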

Michael

> +	}
>  
>  	if (V4L2_TYPE_IS_CAPTURE(f->type)) {
>  		const struct rga_frame *frm;
> @@ -358,11 +371,7 @@ static int vidioc_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
>  	if (!fmt)
>  		fmt = &hw->formats[0];
>  
> -	pix_fmt->width = clamp(pix_fmt->width,
> -			       hw->min_width, hw->max_width);
> -	pix_fmt->height = clamp(pix_fmt->height,
> -				hw->min_height, hw->max_height);
> -
> +	v4l2_apply_frmsize_constraints(&pix_fmt->width, &pix_fmt->height, &frmsize);
>  	v4l2_fill_pixfmt_mp(pix_fmt, fmt->fourcc, pix_fmt->width, pix_fmt->height);
>  	pix_fmt->field = V4L2_FIELD_NONE;
>  
> 
> -- 
> 2.54.0
> 
> 



* Re: [PATCH 0/4] firmware: arm_scmi: Fix protocol parsing and validation
From: Sudeep Holla @ 2026-05-21 14:09 UTC
  To: Cristian Marussi, arm-scmi, linux-arm-kernel, Sudeep Holla
In-Reply-To: <20260517-scmi_fixes-v1-0-d86daec4defd@kernel.org>

On Sun, 17 May 2026 20:02:39 +0100, Sudeep Holla wrote:
> This series fixes a few SCMI protocol parsing and validation issues found
> while checking the driver message layouts against the SCMI specification.
> 
> The first patch fixes a clear response width mismatch in SENSOR_CONFIG_GET,
> where the driver requested a 4-byte response but read it as a 64-bit value.
> 
> The next two patches harden notification parsing for variable-sized payloads.
> BASE_ERROR_EVENT and SENSOR_UPDATE both carry counted trailing arrays, so the
> received payload size must be validated before copying or parsing those
> entries.
> 
> [...]

Applied to sudeep.holla/linux (for-next/scmi/updates), thanks!


[1/4] firmware: arm_scmi: Read sensor config as 32-bit value
      https://git.kernel.org/sudeep.holla/c/f6fe7c3c007d
[2/4] firmware: arm_scmi: Validate BASE_ERROR_EVENT payload size
      https://git.kernel.org/sudeep.holla/c/56e7e64cdd0e
[3/4] firmware: arm_scmi: Validate SENSOR_UPDATE payload size
      https://git.kernel.org/sudeep.holla/c/32bc5496b481
[4/4] firmware: arm_scmi: Validate Powercap domains before state access
      https://git.kernel.org/sudeep.holla/c/fcca603c6a09
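
The size check described in the cover letter for counted trailing arrays
can be sketched as follows. This is an illustration only, not the code
from the applied patches: the header layout (two 32-bit words followed by
64-bit entries), HDR_SIZE, ENTRY_SIZE and payload_size_ok() are
assumptions, and the count is read in host byte order for brevity, where
a kernel driver would convert from little-endian.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: agent_id (u32), count (u32), then count u64 entries. */
#define HDR_SIZE	(2 * sizeof(uint32_t))
#define ENTRY_SIZE	sizeof(uint64_t)

/* Return true if a payload of len bytes can hold the entries it claims. */
static bool payload_size_ok(const void *buf, size_t len)
{
	uint32_t count;

	assert(buf != NULL);

	/* The fixed header must be present before count may be read. */
	if (len < HDR_SIZE)
		return false;

	memcpy(&count, (const unsigned char *)buf + sizeof(uint32_t),
	       sizeof(count));

	/* Dividing avoids an overflow in count * ENTRY_SIZE. */
	return count <= (len - HDR_SIZE) / ENTRY_SIZE;
}
```

Only after such a check passes would the entries be copied or parsed.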

--
Regards,
Sudeep




* Re: [PATCH 16/43] KVM: arm64: gic-v5: Initialise and teardown VMTEs & doorbells
From: Sascha Bischoff @ 2026-05-21 14:07 UTC
  To: maz@kernel.org
  Cc: yuzenghui@huawei.com, Timothy Hayes, Suzuki Poulose, nd,
	peter.maydell@linaro.org, kvmarm@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org,
	Joey Gouly, lpieralisi@kernel.org, oliver.upton@linux.dev
In-Reply-To: <86cxzgzlff.wl-maz@kernel.org>

On Thu, 2026-04-30 at 13:23 +0100, Marc Zyngier wrote:
> On Mon, 27 Apr 2026 17:11:30 +0100,
> Sascha Bischoff <Sascha.Bischoff@arm.com> wrote:
> > 
> > Each GICv5 VM requires a valid VM Table Entry (VMTE). The VM Table
> > itself is allocated during probe time, but a VM needs to provision a
> > VMTE before it is able to properly run (PPIs will work, but nothing
> > else will - and PPIs only are not useful!).
> > 
> > The correct time for setting up the VMTE is during VM
> > initialisation. For GICv5, this is vgic_v5_init(). Each VM needs a VM
> > ID - this is actually the index into the VM Table so it is how a
> > specific VMTE is selected too. As part of vgic_v5_init get a VM ID via
> > vgic_v5_allocate_vm_id(), which internally uses an IDA to select an
> > unused VM ID (and hence VMTE) within the range of allowed VM IDs.
> > 
> > Once the VM ID has been allocated, the doorbell domain for the VM is
> > allocated, and each of the doorbells itself is allocated and assigned
> > to a vcpu.
> > 
> > Assuming everything up until this point has succeeded, initialise the
> > VMTE. Internally this allocates the additional data structures
> > required by the hardware - the VM Descriptor, VPE Table, etc. This
> > VMTE is then made valid via the IRS's MMIO interface. Finally, all
> > VPEs are allocated within the VPET.
> > 
> > On teardown, this process is reversed again. The VMTE is made invalid,
> > the VPEs are freed, the doorbells are released and the domain torn
> > down, and finally the VM ID is released. The latter allows the VM ID
> > and VMTE to be reused for a future VM.
> > 
> > Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
> > ---
> >  arch/arm64/kvm/vgic/vgic-v5.c | 146 +++++++++++++++++++++++++++++-----
> >  1 file changed, 128 insertions(+), 18 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
> > index 2fc6fa4df034f..9347bc6895223 100644
> > --- a/arch/arm64/kvm/vgic/vgic-v5.c
> > +++ b/arch/arm64/kvm/vgic/vgic-v5.c
> > @@ -518,6 +518,18 @@ static int vgic_v5_irs_vpe_cr0_update(int vm_id, int vpe_id, u32 cr0)
> >   return 0;
> >  }
> >  
> > +static irqreturn_t db_handler(int irq, void *data)
> > +{
> > + struct kvm_vcpu *vcpu = data;
> > +
> > + WRITE_ONCE(vcpu->arch.vgic_cpu.vgic_v5.gicv5_vpe.db_fired, true);
> > +
> > + kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > + kvm_vcpu_kick(vcpu);
> > +
> > + return IRQ_HANDLED;
> > +}
> > +
> 
> I think it'd make more sense if the doorbell
> handling/requesting/freeing was one patch, or at least a set of
> consecutive patches in the series.
> 
> As it is now, it is very hard to keep track of things. You have part
> of it in the previous patch, the requesting and handling here, and
> probably the freeing in some other patch I haven't seen.
> 
> >  static int vgic_v5_send_command(struct kvm_vcpu *vcpu,
> >   enum gicv5_vcpu_info_cmd_type type)
> >  {
> > @@ -726,26 +738,46 @@ void vgic_v5_reset(struct kvm_vcpu *vcpu)
> >   }
> >  }
> >  
> > -int vgic_v5_init(struct kvm *kvm)
> > +int vgic_v5_map_resources(struct kvm *kvm)
> >  {
> > - struct kvm_vcpu *vcpu;
> > - unsigned long idx;
> > - int ret;
> > + if (!vgic_initialized(kvm))
> > + return -EBUSY;
> >  
> > - if (vgic_initialized(kvm))
> > - return 0;
> > + return 0;
> > +}
> 
> Pointless code movement?

Very pointless. Removed that.

> 
> > 
> > - ret = vgic_v5_create_per_vm_domain(&kvm->arch.vgic.gicv5_vm);
> > - if (ret)
> > - return ret;
> > +/*
> > + * Claim and populate a VMTE (optionally making a new L2 VMT valid), create VPE
> > + * doorbells, allocate VPET and populate for each VPE. Finally, we also init the
> > + * vIRS, which means allocating and making the virtual SPI IST valid.
> > + *
> > + * Note: We do need to put the cart before the horse here. The VPE doorbells are
> > + * our conduit for communication with the IRS, which means we need to have those
> > + * before making the VMTE valid.
> > + *
> > + * On failure, we clean up in the teardown path (vgic_v5_teardown()).
> > + */
> > +int vgic_v5_init(struct kvm *kvm)
> > +{
> > + int nr_vcpus, ret = 0;
> > + struct kvm_vcpu *vcpu, *vcpu0;
> > + unsigned long i;
> > + struct irq_data *d;
> > + unsigned int db_virq;
> > +
> > + nr_vcpus = atomic_read(&kvm->online_vcpus);
> > + if (nr_vcpus == 0)
> > + return -ENODEV;
> >  
> > - kvm_for_each_vcpu(idx, vcpu, kvm) {
> > + kvm_for_each_vcpu(i, vcpu, kvm) {
> >   if (vcpu_has_nv(vcpu)) {
> >   kvm_err("Nested GICv5 VMs are currently unsupported\n");
> >   return -EINVAL;
> >   }
> >   }
> >  
> > + kvm->arch.vgic.gicv5_vm.nr_vpes = nr_vcpus;
> 
> Why do we need to track the number of vcpus separately from what KVM
> already does? GICv4 does it because a lot of the state is managed by
> the irqchip driver, but that's not the case here. I hope we can come
> up with a slightly simpler model with GICv5.

Right, it isn't needed. I've removed it from here (and from the earlier
commits that introduced it/used it).

> 
> > +
> >   /* We only allow userspace to drive the SW_PPI, if it is implemented. */
> >   bitmap_zero(kvm->arch.vgic.gicv5_vm.userspace_ppis,
> >       VGIC_V5_NR_PRIVATE_IRQS);
> > @@ -754,20 +786,98 @@ int vgic_v5_init(struct kvm *kvm)
> >      kvm->arch.vgic.gicv5_vm.userspace_ppis,
> >      ppi_caps.impl_ppi_mask, VGIC_V5_NR_PRIVATE_IRQS);
> >  
> > - return 0;
> > + ret = vgic_v5_allocate_vm_id(kvm);
> > + if (ret) {
> > + kvm_err("Maximum number of GICv5 VMs reached!\n");
> > + return ret;
> > + }
> 
> I'd rather we don't scream on the console when running out of
> VMIDs. If we're at capacity, so be it. That's not an error worth
> spamming the console over.

Alright. Have removed the printing here.

> 
> > +
> > + ret = vgic_v5_create_per_vm_domain(&kvm->arch.vgic.gicv5_vm);
> > + if (ret)
> > + return ret;
> 
> Who is freeing the VMID?

We free it in the vgic_v5_teardown() path.

> 
> > +
> > + /*
> > + * Allocate VPE doorbells first - these are our conduit for
> > + * communicating with the host irqchip driver.
> > + */
> > + db_virq = irq_domain_alloc_irqs(kvm->arch.vgic.gicv5_vm.domain,
> > + nr_vcpus, NUMA_NO_NODE,
> > + &kvm->arch.vgic.gicv5_vm);
> > + if (db_virq < 0) {
> > + /* Simplify teardown by doing this early! */
> > + vgic_v5_teardown_per_vm_domain(&kvm->arch.vgic.gicv5_vm);
> > + return db_virq;
> > + }
> > +
> > + kvm->arch.vgic.gicv5_vm.vpe_db_base = db_virq;
> > +
> > + kvm_for_each_vcpu(i, vcpu, kvm) {
> > + d = irq_domain_get_irq_data(kvm->arch.vgic.gicv5_vm.domain,
> > +     db_virq + i);
> > + irq_set_status_flags(db_virq + i, IRQ_NOAUTOEN);
> > +
> > + ret = request_irq(db_virq + i, db_handler, 0, "vcpu", vcpu);
> > + if (ret)
> > + return ret;
> > +
> > + /* Stash it with the VCPU for easy retrieval */
> > + vcpu->arch.vgic_cpu.vgic_v5.gicv5_vpe.db = db_virq + i;
> > + }
> > +
> > + /* Populate VMTE (with VPET and VM descriptor) */
> > + ret = vgic_v5_vmte_init(kvm);
> > + if (ret)
> > + return ret;
> > +
> > + /* We pick the first vcpu to make the VMTE valid - any would do */
> > + vcpu0 = kvm_get_vcpu(kvm, 0);
> > + ret = vgic_v5_send_command(vcpu0, VMTE_MAKE_VALID);
> > + if (ret)
> > + return ret;
> > +
> > + /* Loop over all VPEs, allocate/populate their data structures */
> > + kvm_for_each_vcpu(i, vcpu, kvm) {
> > + ret = vgic_v5_vmte_alloc_vpe(vcpu);
> > + if (ret)
> > + return ret;
> > + }
> > +
> > + return ret;
> 
> I'm very worried about the error handling of that function. Who is
> responsible for cleaning up the mess when this fails?

I'd been working on the flawed assumption that vgic_v5_teardown() will
be called in response to an init failure. I've now reworked this to
explicitly roll back everything on a failure by proactively calling
vgic_v5_teardown(), which has also been made safe to call again in the
teardown path.
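
The rollback described above can be sketched as follows, assuming a
teardown routine that checks and clears per-resource flags so that it is
safe to call both from a failed init and again from the normal teardown
path. struct vm_state, its flags and the fail_at parameter are
hypothetical stand-ins for illustration; this is not the reworked
vgic_v5_init().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-VM state; one flag per resource that init acquires. */
struct vm_state {
	bool vm_id_valid;
	bool domain_created;
	bool vmte_allocated;
	int teardown_steps;	/* counts undo steps, for illustration only */
};

/* Undo only what was set up and clear each flag: a second call is a no-op. */
static void vm_teardown(struct vm_state *vm)
{
	assert(vm);

	if (vm->vmte_allocated) {
		vm->vmte_allocated = false;
		vm->teardown_steps++;
	}
	if (vm->domain_created) {
		vm->domain_created = false;
		vm->teardown_steps++;
	}
	if (vm->vm_id_valid) {
		vm->vm_id_valid = false;
		vm->teardown_steps++;
	}
}

/* fail_at selects which simulated init step fails; 0 means none. */
static int vm_init(struct vm_state *vm, int fail_at)
{
	assert(vm);

	if (fail_at == 1)
		goto err;
	vm->vm_id_valid = true;

	if (fail_at == 2)
		goto err;
	vm->domain_created = true;

	if (fail_at == 3)
		goto err;
	vm->vmte_allocated = true;

	return 0;

err:
	vm_teardown(vm);	/* roll back whatever succeeded so far */
	return -1;
}
```

Because every undo step is guarded by its flag, a later teardown of a VM
whose init already failed does no additional work.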

Thanks,
Sascha

> 
> >  }
> >  
> >  void vgic_v5_teardown(struct kvm *kvm)
> >  {
> > - vgic_v5_teardown_per_vm_domain(&kvm->arch.vgic.gicv5_vm);
> > -}
> > + struct kvm_vcpu *vcpu, *vcpu0;
> > + struct vgic_dist *dist = &kvm->arch.vgic;
> > + unsigned long i;
> > + int rc;
> >  
> > -int vgic_v5_map_resources(struct kvm *kvm)
> > -{
> > - if (!vgic_initialized(kvm))
> > - return -EBUSY;
> > + /*
> > + * If the VM's ID isn't valid, then we failed init very early. Nothing
> > + * to do here.
> > + */
> > + if (!kvm->arch.vgic.gicv5_vm.vm_id_valid)
> > + return;
> >  
> > - return 0;
> > + if (kvm->arch.vgic.gicv5_vm.vmte_allocated) {
> > + /* Make the VM invalid  */
> > + vcpu0 = kvm_get_vcpu(kvm, 0);
> > + rc = vgic_v5_send_command(vcpu0, VMTE_MAKE_INVALID);
> > + if (rc)
> > + kvm_err("could not make VMTE invalid\n");
> > +
> > + kvm_for_each_vcpu(i, vcpu, kvm) {
> > + if (vgic_v5_vmte_free_vpe(vcpu))
> > + kvm_err("Failed to free VPE\n");
> > + }
> > +
> > + if (vgic_v5_vmte_release(kvm))
> > + kvm_err("Failed to release VM 0x%x\n", dist->gicv5_vm.vm_id);
> > + }
> > +
> > + vgic_v5_teardown_per_vm_domain(&kvm->arch.vgic.gicv5_vm);
> > +
> > + vgic_v5_release_vm_id(kvm);
> >  }
> >  
> >  int vgic_v5_finalize_ppi_state(struct kvm *kvm)
> 
> Thanks,
> 
>  M.
> 



* Re: [PATCH 14/43] KVM: arm64: gic-v5: Request VPE doorbells when going non-resident
From: Sascha Bischoff @ 2026-05-21 14:06 UTC
  To: maz@kernel.org
  Cc: yuzenghui@huawei.com, Timothy Hayes, Suzuki Poulose, nd,
	peter.maydell@linaro.org, kvmarm@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org,
	Joey Gouly, lpieralisi@kernel.org, oliver.upton@linux.dev
In-Reply-To: <86ecjwzqcr.wl-maz@kernel.org>

On Thu, 2026-04-30 at 11:37 +0100, Marc Zyngier wrote:
> On Mon, 27 Apr 2026 17:10:49 +0100,
> Sascha Bischoff <Sascha.Bischoff@arm.com> wrote:
> > 
> > When a VPE is made non-resident and is entering WFI, a doorbell should
> > be requested for the VPE. This allows the VPE to be easily woken once
> > an SPI/LPI interrupt is pending for it. This is tracked by the IRS,
> > which will signal the specific VPE doorbell for the VPE once such an
> > interrupt arrives.
> > 
> > Requesting a doorbell involves calculating the DBPM - DoorBell
> > Priority Mask - which ensures that the DB is only signalled by the
> > hardware if the pending interrupt is of sufficient priority. This
> > avoids waking a VPE that can't process the incoming interrupt.
> > 
> > Doorbells are NOT requested if a VPE is not entering WFI as we expect
> > to enter again imminently.
> > 
> > Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
> > ---
> >  arch/arm64/kvm/vgic/vgic-v5.c | 28 ++++++++++++++++++++++++++++
> >  1 file changed, 28 insertions(+)
> > 
> > diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
> > index 11a1a491b7e0a..2fc6fa4df034f 100644
> > --- a/arch/arm64/kvm/vgic/vgic-v5.c
> > +++ b/arch/arm64/kvm/vgic/vgic-v5.c
> > @@ -1077,6 +1077,9 @@ void vgic_v5_load(struct kvm_vcpu *vcpu)
> >  void vgic_v5_put(struct kvm_vcpu *vcpu)
> >  {
> >   struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
> > + bool req_db = !!vcpu_get_flag(vcpu, IN_WFI);
> 
> Drop the spurious variable and move the check in the if () statement.
> This is way more readable than declaring a variable.

Done.

> 
> > + u32 priority_mask;
> > + int dbpm;
> 
> Move these in the inner block.

Done.

> 
> >  
> >   /*
> >   * Do nothing if we're not resident. This can happen in the WFI path
> > @@ -1090,6 +1093,31 @@ void vgic_v5_put(struct kvm_vcpu *vcpu)
> >   kvm_call_hyp(__vgic_v5_save_apr, cpu_if);
> >  
> >   cpu_if->vgic_contextr = 0;
> > + if (req_db) {
> > + /*
> > + * Find the virtual running priority and use this to calculate
> > + * the doorbell priority mask. We combine the highest active
> > + * priority and the CPU's priority mask. The guest can't handle
> > + * interrupts with priorities less than or equal to the virtual
> > + * running priority, so there's literally no point in waking the
> > + * guest for these.
> > + *
> > + * The priority needs to be higher than the mask to signal, so
> > + * pick the next higher priority (subtract 1).
> > + */
> > + priority_mask = vgic_v5_get_effective_priority_mask(vcpu);
> > +
> > + /* Don't request a doorbell if the max priority is masked */
> 
> This comment reads badly. I'd suggest something like "Request a
> doorbell *unless* the priority is 0, indicating that no interrupt can
> wake the vcpu up".

Done.

> 
> > + if (priority_mask) {
> > + dbpm = priority_mask - 1;
> > + cpu_if->vgic_contextr = FIELD_PREP(ICH_CONTEXTR_EL2_DB, 1) |
> > + FIELD_PREP(ICH_CONTEXTR_EL2_DBPM, dbpm);
> > + }
> > +
> > + /* Make the doorbell affine to this CPU */
> > + WARN_ON(irq_set_affinity(vgic_v5_vpe_db(vcpu),
> > + cpumask_of(smp_processor_id())));
> 
> Repeatedly setting the affinity is likely to be costly. It may be
> worth comparing with the current affinity somehow.

I've changed this to check the affinity first, and then only change it
if the CPU has changed since it was last set.
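
A minimal sketch of that check, assuming the last CPU is cached alongside
the doorbell. struct doorbell, NO_CPU and both helpers are hypothetical;
doorbell_set_affinity() merely stands in for the costly
irq_set_affinity() call and counts how often it runs.

```c
#include <assert.h>

#define NO_CPU	(-1)

struct doorbell {
	int affine_cpu;		/* CPU the doorbell was last made affine to */
	int affinity_updates;	/* counts expensive updates, for illustration */
};

/* Hypothetical stand-in for the costly irq_set_affinity() call. */
static void doorbell_set_affinity(struct doorbell *db, int cpu)
{
	db->affine_cpu = cpu;
	db->affinity_updates++;
}

/* Only touch the affinity when the CPU differs from the cached one. */
static void doorbell_make_affine(struct doorbell *db, int this_cpu)
{
	assert(db && this_cpu >= 0);

	if (db->affine_cpu == this_cpu)
		return;

	doorbell_set_affinity(db, this_cpu);
}
```

A vcpu that keeps being put on the same CPU then pays for the affinity
update once, and again only after it migrates.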

> 
> > + }
> >  
> >   kvm_call_hyp(__vgic_v5_make_non_resident, cpu_if);
> >  
> 
> Thanks,
> 
>  M.
> 

Thanks,
Sascha


* Re: [PATCH v14 09/44] arm64: RMI: Provide functions to delegate/undelegate ranges of memory
From: Marc Zyngier @ 2026-05-21 13:59 UTC
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
	Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-10-steven.price@arm.com>

On Wed, 13 May 2026 14:17:17 +0100,
Steven Price <steven.price@arm.com> wrote:
> 
> The RMM requires memory is 'delegated' to it so that it can be used
> either for a realm guest or for various tracking purposes within the RMM
> (e.g. for metadata or page tables). Memory that has been delegated
> cannot be accessed by the host (it will result in a Granule Protection
> Fault).
> 
> Undelegation may fail if the memory is still in use by the RMM. This
> shouldn't happen (Linux should ensure it has destroyed the RMM objects
> before attempting to undelegate). In the event that it does happen this
> points to a programming bug and the only reasonable approach is for the
> physical pages to be leaked - it is up to the caller of
> rmi_undelegate_range() to handle this.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> v14:
>  * Split into separate patch and moved out of KVM
> ---
>  arch/arm64/include/asm/rmi_cmds.h | 13 +++++++++++
>  arch/arm64/kernel/rmi.c           | 36 +++++++++++++++++++++++++++++++
>  2 files changed, 49 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/rmi_cmds.h b/arch/arm64/include/asm/rmi_cmds.h
> index 9078a2920a7c..eb213c8e6f26 100644
> --- a/arch/arm64/include/asm/rmi_cmds.h
> +++ b/arch/arm64/include/asm/rmi_cmds.h
> @@ -33,6 +33,19 @@ struct rmi_sro_state {
>  } while (RMI_RETURN_STATUS(res.a0) == RMI_BUSY ||			\
>  	 RMI_RETURN_STATUS(res.a0) == RMI_BLOCKED)
>  
> +int rmi_delegate_range(phys_addr_t phys, unsigned long size);
> +int rmi_undelegate_range(phys_addr_t phys, unsigned long size);
> +
> +static inline int rmi_delegate_page(phys_addr_t phys)
> +{
> +	return rmi_delegate_range(phys, PAGE_SIZE);
> +}
> +
> +static inline int rmi_undelegate_page(phys_addr_t phys)
> +{
> +	return rmi_undelegate_range(phys, PAGE_SIZE);
> +}
> +
>  bool rmi_is_available(void);
>  
>  unsigned long rmi_sro_execute(struct rmi_sro_state *sro, gfp_t gfp);
> diff --git a/arch/arm64/kernel/rmi.c b/arch/arm64/kernel/rmi.c
> index 52a415e99500..08cef54acadb 100644
> --- a/arch/arm64/kernel/rmi.c
> +++ b/arch/arm64/kernel/rmi.c
> @@ -12,6 +12,42 @@ static bool arm64_rmi_is_available;
>  unsigned long rmm_feat_reg0;
>  unsigned long rmm_feat_reg1;
>  
> +int rmi_delegate_range(phys_addr_t phys, unsigned long size)
> +{
> +	unsigned long ret = 0;
> +	unsigned long top = phys + size;
> +	unsigned long out_top;
> +
> +	while (phys < top) {
> +		ret = rmi_granule_range_delegate(phys, top, &out_top);
> +		if (ret == RMI_SUCCESS)
> +			phys = out_top;
> +		else if (ret != RMI_BUSY && ret != RMI_BLOCKED)
> +			return ret;
> +	}
> +
> +	return ret;
> +}
> +
> +int rmi_undelegate_range(phys_addr_t phys, unsigned long size)
> +{
> +	unsigned long ret = 0;
> +	unsigned long top = phys + size;
> +	unsigned long out_top;
> +
> +	WARN_ON(size == 0);

I find it odd to warn on size = 0. After all, free(NULL) is not an
error. But even then, you continue feeding this to the RMM.

You also don't seem to be bothered with that on the delegation side...

> +
> +	while (phys < top) {
> +		ret = rmi_granule_range_undelegate(phys, top, &out_top);
> +		if (ret == RMI_SUCCESS)
> +			phys = out_top;

and size==0 doesn't violate any of the failure conditions listed in
B4.5.18.2 (beta2). Will you end up looping around forever?

Same questions for the delegation, obviously.
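
One possible way to address both questions, sketched in user space: return
early for an empty range and bail out if a successful call makes no
forward progress. The SKETCH_* status codes, range_op_t, the two mock
callees and range_op_loop() are hypothetical. Whether the RMM can report
success without advancing out_top is not established here, so the
progress check is purely defensive, and a callee that returns busy
forever would still keep this loop spinning.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SUCCESS	0
#define SKETCH_BUSY	1
#define SKETCH_ERROR	2

/* Hypothetical stand-in for the RMM call; reports progress via *out_top. */
typedef unsigned long (*range_op_t)(uint64_t base, uint64_t top,
				    uint64_t *out_top);

static int range_op_loop(range_op_t op, uint64_t phys, uint64_t size)
{
	uint64_t top = phys + size;
	uint64_t out_top;
	unsigned long ret;

	assert(op);

	if (size == 0)
		return 0;	/* nothing to do, nothing handed to the callee */

	while (phys < top) {
		ret = op(phys, top, &out_top);
		if (ret == SKETCH_SUCCESS) {
			if (out_top <= phys)
				return -1;	/* success without progress */
			phys = out_top;
		} else if (ret != SKETCH_BUSY) {
			return -1;
		}
	}

	return 0;
}

/* Mock callee that handles one 4 KiB granule per call. */
static unsigned long mock_one_granule(uint64_t base, uint64_t top,
				      uint64_t *out_top)
{
	(void)top;
	*out_top = base + 4096;
	return SKETCH_SUCCESS;
}

/* Mock callee that claims success but never advances. */
static unsigned long mock_no_progress(uint64_t base, uint64_t top,
				      uint64_t *out_top)
{
	(void)top;
	*out_top = base;
	return SKETCH_SUCCESS;
}
```

The same helper shape would serve both the delegate and the undelegate
direction, which keeps the two paths consistent about empty ranges.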

	M.

-- 
Without deviation from the norm, progress is not possible.



* Re: (subset) [PATCH 0/4] power: sys-off: fix Pixel C shutdown via MAX77620
From: Diogo Ivo @ 2026-05-21 13:59 UTC
  To: Lee Jones
  Cc: Mark Rutland, Lorenzo Pieralisi, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Thierry Reding, Jonathan Hunter, linux-arm-kernel,
	linux-kernel, devicetree, linux-tegra
In-Reply-To: <20260521104136.GA2921053@google.com>



On 5/21/26 12:41, Lee Jones wrote:
> On Thu, 21 May 2026, Diogo Ivo wrote:
> 
>> Hi Lee,
>>
>> On 5/20/26 18:25, Lee Jones wrote:
>>> On Thu, 14 May 2026 16:47:18 +0200, Diogo Ivo wrote:
>>>> This series migrates PSCI and MAX77620 poweroff handling to the
>>>> sys-off framework and fixes shutdown on the Pixel C (Smaug).
>>>>
>>>> The first two patches replace legacy pm_power_off usage in the PSCI
>>>> and MAX77620 drivers with sys-off handlers. Besides aligning both
>>>> drivers with the modern poweroff infrastructure, this removes the
>>>> global callback dependency and allows multiple handlers to coexist
>>>> with explicit priorities.
>>>>
>>>> [...]
>>>
>>> Applied, thanks!
>>
>> Thanks for applying the patches! Just a question and an observation:
>>
>>   - I'm assuming you were ok with merging [2/4] despite the possible
>>     deadlock since this risk is already present in mainline in the same
>>     form so we're not actually making things worse, is that so?
> 
> Did you see the text below?

Yes, but patch 3 does not address the possible deadlock, hence my
question.

> Both patches 2 and 3 are applied.
> 
>>   - The observation is that the comment about overriding PSCI is only
>>     true after (and if) a reworked [1/4] is actually merged.
>>     If it isn't then patch [3/4] is actually working around another handler
>>     in soc/tegra/pmc.c where a handler that only does work for the Nexus
>>     7 is actually registered at FIRMWARE level for all platforms that
>>     probe that driver (I will send out a patch shortly to only register
>>     the handler on the Nexus 7).
> 
> I assume the other patches will be applied soon.
>
> If this causes some kind of issue - let me know later on in the cycle
> and I'll remove whatever patches you ask me to.

The PSCI patch [1/4] has a fundamental issue and needs a respin to be
applied.

In connection with this, it might then become easier to quirk the PSCI
driver rather than the PMIC driver. So for the moment I'll ask you to
drop [3/4] until I propose the changes to the PSCI maintainers and see
their feedback. At that point we can either drop [3/4] completely or
reapply it. Sorry for the noise.

Best regards,
Diogo



* Re: [PATCH v7 11/28] media: rockchip: rga: move hw specific parts to a dedicated struct
From: Michael Tretter @ 2026-05-21 13:56 UTC
  To: Sven Püschel
  Cc: Jacob Chen, Ezequiel Garcia, Mauro Carvalho Chehab,
	Heiko Stuebner, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Hans Verkuil, linux-media, linux-rockchip, linux-arm-kernel,
	linux-kernel, devicetree, kernel, nicolas, sebastian.reichel,
	p.zabel, Nicolas Dufresne
In-Reply-To: <20260521-spu-rga3-v7-11-3f33e8c7145f@pengutronix.de>

On Thu, 21 May 2026 00:44:16 +0200, Sven Püschel wrote:
> In preparation for the RGA3 unit, move RGA2 specific parts from rga.c
> to rga-hw.c and create a struct to reference the RGA2 specific functions
> and formats. This also allows to remove the rga-hw.h reference from the
> include list of the rga driver.
> 
> Also document the command finish interrupt with a dedicated define.
> 
> Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> Signed-off-by: Sven Püschel <s.pueschel@pengutronix.de>

A few nits below, but

Reviewed-by: Michael Tretter <m.tretter@pengutronix.de>

> ---
>  drivers/media/platform/rockchip/rga/rga-hw.c | 166 ++++++++++++++++++++-
>  drivers/media/platform/rockchip/rga/rga-hw.h |   5 +-
>  drivers/media/platform/rockchip/rga/rga.c    | 211 +++++----------------------
>  drivers/media/platform/rockchip/rga/rga.h    |  23 ++-
>  4 files changed, 227 insertions(+), 178 deletions(-)
> 
> diff --git a/drivers/media/platform/rockchip/rga/rga-hw.c b/drivers/media/platform/rockchip/rga/rga-hw.c
> index ec6c17504ca15..40498796507e0 100644
> --- a/drivers/media/platform/rockchip/rga/rga-hw.c
> +++ b/drivers/media/platform/rockchip/rga/rga-hw.c
> @@ -437,8 +437,8 @@ static void rga_cmd_set(struct rga_ctx *ctx,
>  		PAGE_SIZE, DMA_BIDIRECTIONAL);
>  }
>  
> -void rga_hw_start(struct rockchip_rga *rga,
> -		  struct rga_vb_buffer *src, struct rga_vb_buffer *dst)
> +static void rga_hw_start(struct rockchip_rga *rga,
> +			 struct rga_vb_buffer *src,  struct rga_vb_buffer *dst)
>  {
>  	struct rga_ctx *ctx = rga->curr;
>  
> @@ -452,3 +452,165 @@ void rga_hw_start(struct rockchip_rga *rga,
>  
>  	rga_write(rga, RGA_CMD_CTRL, 0x1);
>  }
> +
> +static bool rga_handle_irq(struct rockchip_rga *rga)

Returning a bool for success prevents reporting any error interrupts to
the core. I guess that's fine and can be changed later if it becomes
necessary.

> +{
> +	int intr;
> +
> +	intr = rga_read(rga, RGA_INT) & 0xf;
> +
> +	rga_mod(rga, RGA_INT, intr << 4, 0xf << 4);
> +
> +	return intr & RGA_INT_COMMAND_FINISHED;
> +}
> +
> +static void rga_get_version(struct rockchip_rga *rga)
> +{
> +	rga->version.major = (rga_read(rga, RGA_VERSION_INFO) >> 24) & 0xFF;
> +	rga->version.minor = (rga_read(rga, RGA_VERSION_INFO) >> 20) & 0x0F;
> +}
> +
> +static struct rga_fmt formats[] = {
> +	{
> +		.fourcc = V4L2_PIX_FMT_ARGB32,
> +		.color_swap = RGA_COLOR_ALPHA_SWAP,
> +		.hw_format = RGA_COLOR_FMT_ABGR8888,
> +		.depth = 32,
> +		.y_div = 1,
> +		.x_div = 1,
> +	},
> +	{
> +		.fourcc = V4L2_PIX_FMT_ABGR32,
> +		.color_swap = RGA_COLOR_RB_SWAP,
> +		.hw_format = RGA_COLOR_FMT_ABGR8888,
> +		.depth = 32,
> +		.y_div = 1,
> +		.x_div = 1,
> +	},
> +	{
> +		.fourcc = V4L2_PIX_FMT_XBGR32,
> +		.color_swap = RGA_COLOR_RB_SWAP,
> +		.hw_format = RGA_COLOR_FMT_XBGR8888,
> +		.depth = 32,
> +		.y_div = 1,
> +		.x_div = 1,
> +	},
> +	{
> +		.fourcc = V4L2_PIX_FMT_RGB24,
> +		.color_swap = RGA_COLOR_NONE_SWAP,
> +		.hw_format = RGA_COLOR_FMT_RGB888,
> +		.depth = 24,
> +		.y_div = 1,
> +		.x_div = 1,
> +	},
> +	{
> +		.fourcc = V4L2_PIX_FMT_BGR24,
> +		.color_swap = RGA_COLOR_RB_SWAP,
> +		.hw_format = RGA_COLOR_FMT_RGB888,
> +		.depth = 24,
> +		.y_div = 1,
> +		.x_div = 1,
> +	},
> +	{
> +		.fourcc = V4L2_PIX_FMT_ARGB444,
> +		.color_swap = RGA_COLOR_RB_SWAP,
> +		.hw_format = RGA_COLOR_FMT_ABGR4444,
> +		.depth = 16,
> +		.y_div = 1,
> +		.x_div = 1,
> +	},
> +	{
> +		.fourcc = V4L2_PIX_FMT_ARGB555,
> +		.color_swap = RGA_COLOR_RB_SWAP,
> +		.hw_format = RGA_COLOR_FMT_ABGR1555,
> +		.depth = 16,
> +		.y_div = 1,
> +		.x_div = 1,
> +	},
> +	{
> +		.fourcc = V4L2_PIX_FMT_RGB565,
> +		.color_swap = RGA_COLOR_RB_SWAP,
> +		.hw_format = RGA_COLOR_FMT_BGR565,
> +		.depth = 16,
> +		.y_div = 1,
> +		.x_div = 1,
> +	},
> +	{
> +		.fourcc = V4L2_PIX_FMT_NV21,
> +		.color_swap = RGA_COLOR_UV_SWAP,
> +		.hw_format = RGA_COLOR_FMT_YUV420SP,
> +		.depth = 12,
> +		.y_div = 2,
> +		.x_div = 1,
> +	},
> +	{
> +		.fourcc = V4L2_PIX_FMT_NV61,
> +		.color_swap = RGA_COLOR_UV_SWAP,
> +		.hw_format = RGA_COLOR_FMT_YUV422SP,
> +		.depth = 16,
> +		.y_div = 1,
> +		.x_div = 1,
> +	},
> +	{
> +		.fourcc = V4L2_PIX_FMT_NV12,
> +		.color_swap = RGA_COLOR_NONE_SWAP,
> +		.hw_format = RGA_COLOR_FMT_YUV420SP,
> +		.depth = 12,
> +		.y_div = 2,
> +		.x_div = 1,
> +	},
> +	{
> +		.fourcc = V4L2_PIX_FMT_NV12M,
> +		.color_swap = RGA_COLOR_NONE_SWAP,
> +		.hw_format = RGA_COLOR_FMT_YUV420SP,
> +		.depth = 12,
> +		.y_div = 2,
> +		.x_div = 1,
> +	},
> +	{
> +		.fourcc = V4L2_PIX_FMT_NV16,
> +		.color_swap = RGA_COLOR_NONE_SWAP,
> +		.hw_format = RGA_COLOR_FMT_YUV422SP,
> +		.depth = 16,
> +		.y_div = 1,
> +		.x_div = 1,
> +	},
> +	{
> +		.fourcc = V4L2_PIX_FMT_YUV420,
> +		.color_swap = RGA_COLOR_NONE_SWAP,
> +		.hw_format = RGA_COLOR_FMT_YUV420P,
> +		.depth = 12,
> +		.y_div = 2,
> +		.x_div = 2,
> +	},
> +	{
> +		.fourcc = V4L2_PIX_FMT_YUV422P,
> +		.color_swap = RGA_COLOR_NONE_SWAP,
> +		.hw_format = RGA_COLOR_FMT_YUV422P,
> +		.depth = 16,
> +		.y_div = 1,
> +		.x_div = 2,
> +	},
> +	{
> +		.fourcc = V4L2_PIX_FMT_YVU420,
> +		.color_swap = RGA_COLOR_UV_SWAP,
> +		.hw_format = RGA_COLOR_FMT_YUV420P,
> +		.depth = 12,
> +		.y_div = 2,
> +		.x_div = 2,
> +	},
> +};
> +
> +const struct rga_hw rga2_hw = {
> +	.formats = formats,
> +	.num_formats = ARRAY_SIZE(formats),
> +	.cmdbuf_size = RGA_CMDBUF_SIZE,
> +	.min_width = MIN_WIDTH,
> +	.max_width = MAX_WIDTH,
> +	.min_height = MIN_HEIGHT,
> +	.max_height = MAX_HEIGHT,
> +
> +	.start = rga_hw_start,
> +	.handle_irq = rga_handle_irq,
> +	.get_version = rga_get_version,
> +};
> diff --git a/drivers/media/platform/rockchip/rga/rga-hw.h b/drivers/media/platform/rockchip/rga/rga-hw.h
> index 2b8537a5fd0d7..c2e34be751939 100644
> --- a/drivers/media/platform/rockchip/rga/rga-hw.h
> +++ b/drivers/media/platform/rockchip/rga/rga-hw.h
> @@ -15,9 +15,6 @@
>  #define MIN_WIDTH 34
>  #define MIN_HEIGHT 34
>  
> -#define DEFAULT_WIDTH 100
> -#define DEFAULT_HEIGHT 100
> -
>  #define RGA_TIMEOUT 500
>  
>  /* Registers address */
> @@ -178,6 +175,8 @@
>  #define RGA_ALPHA_COLOR_NORMAL 0
>  #define RGA_ALPHA_COLOR_MULTIPLY_CAL 1
>  
> +#define RGA_INT_COMMAND_FINISHED 4

This is probably a bitfield in the interrupt register:

	#define RGA_INT_COMMAND_FINISHED 0x4

or

	#define RGA_INT_COMMAND_FINISHED BIT(2)
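
For illustration, a small sketch of how the status and clear nibbles
relate, inferred from the rga_handle_irq() code quoted above (status in
bits 3:0, clear bits written at bits 7:4). The two helper functions are
hypothetical, and BIT() is redefined locally so the sketch builds outside
the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)				(1U << (n))
#define RGA_INT_COMMAND_FINISHED	BIT(2)

/* Both spellings suggested above describe the same bit. */
_Static_assert(RGA_INT_COMMAND_FINISHED == 0x4, "BIT(2) equals 0x4");

/* Pending status lives in the low nibble; clear bits sit one nibble up. */
static uint32_t rga_int_clear_mask(uint32_t int_reg)
{
	return (int_reg & 0xf) << 4;
}

/* True if the command-finished status bit is pending. */
static bool rga_int_command_finished(uint32_t int_reg)
{
	return ((int_reg & 0xf) & RGA_INT_COMMAND_FINISHED) != 0;
}
```

Writing the define as a bit makes it obvious that further status bits in
the same nibble can be tested the same way.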

> +
>  /* Registers union */
>  union rga_mode_ctrl {
>  	unsigned int val;
> diff --git a/drivers/media/platform/rockchip/rga/rga.c b/drivers/media/platform/rockchip/rga/rga.c
> index 8c34f73d69764..f599c992829dd 100644
> --- a/drivers/media/platform/rockchip/rga/rga.c
> +++ b/drivers/media/platform/rockchip/rga/rga.c
> @@ -25,7 +25,6 @@
>  #include <media/videobuf2-dma-sg.h>
>  #include <media/videobuf2-v4l2.h>
>  
> -#include "rga-hw.h"
>  #include "rga.h"
>  
>  static int debug;
> @@ -47,7 +46,7 @@ static void device_run(void *prv)
>  
>  	dst = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
>  
> -	rga_hw_start(rga, vb_to_rga(src), vb_to_rga(dst));
> +	rga->hw->start(rga, vb_to_rga(src), vb_to_rga(dst));
>  
>  	spin_unlock_irqrestore(&rga->ctrl_lock, flags);
>  }
> @@ -55,13 +54,8 @@ static void device_run(void *prv)
>  static irqreturn_t rga_isr(int irq, void *prv)
>  {
>  	struct rockchip_rga *rga = prv;
> -	int intr;
>  
> -	intr = rga_read(rga, RGA_INT) & 0xf;
> -
> -	rga_mod(rga, RGA_INT, intr << 4, 0xf << 4);
> -
> -	if (intr & 0x04) {
> +	if (rga->hw->handle_irq(rga)) {
>  		struct vb2_v4l2_buffer *src, *dst;
>  		struct rga_ctx *ctx = rga->curr;
>  
> @@ -184,158 +178,17 @@ static int rga_setup_ctrls(struct rga_ctx *ctx)
>  	return 0;
>  }
>  
> -static struct rga_fmt formats[] = {
> -	{
> -		.fourcc = V4L2_PIX_FMT_ARGB32,
> -		.color_swap = RGA_COLOR_ALPHA_SWAP,
> -		.hw_format = RGA_COLOR_FMT_ABGR8888,
> -		.depth = 32,
> -		.y_div = 1,
> -		.x_div = 1,
> -	},
> -	{
> -		.fourcc = V4L2_PIX_FMT_ABGR32,
> -		.color_swap = RGA_COLOR_RB_SWAP,
> -		.hw_format = RGA_COLOR_FMT_ABGR8888,
> -		.depth = 32,
> -		.y_div = 1,
> -		.x_div = 1,
> -	},
> -	{
> -		.fourcc = V4L2_PIX_FMT_XBGR32,
> -		.color_swap = RGA_COLOR_RB_SWAP,
> -		.hw_format = RGA_COLOR_FMT_XBGR8888,
> -		.depth = 32,
> -		.y_div = 1,
> -		.x_div = 1,
> -	},
> -	{
> -		.fourcc = V4L2_PIX_FMT_RGB24,
> -		.color_swap = RGA_COLOR_NONE_SWAP,
> -		.hw_format = RGA_COLOR_FMT_RGB888,
> -		.depth = 24,
> -		.y_div = 1,
> -		.x_div = 1,
> -	},
> -	{
> -		.fourcc = V4L2_PIX_FMT_BGR24,
> -		.color_swap = RGA_COLOR_RB_SWAP,
> -		.hw_format = RGA_COLOR_FMT_RGB888,
> -		.depth = 24,
> -		.y_div = 1,
> -		.x_div = 1,
> -	},
> -	{
> -		.fourcc = V4L2_PIX_FMT_ARGB444,
> -		.color_swap = RGA_COLOR_RB_SWAP,
> -		.hw_format = RGA_COLOR_FMT_ABGR4444,
> -		.depth = 16,
> -		.y_div = 1,
> -		.x_div = 1,
> -	},
> -	{
> -		.fourcc = V4L2_PIX_FMT_ARGB555,
> -		.color_swap = RGA_COLOR_RB_SWAP,
> -		.hw_format = RGA_COLOR_FMT_ABGR1555,
> -		.depth = 16,
> -		.y_div = 1,
> -		.x_div = 1,
> -	},
> -	{
> -		.fourcc = V4L2_PIX_FMT_RGB565,
> -		.color_swap = RGA_COLOR_RB_SWAP,
> -		.hw_format = RGA_COLOR_FMT_BGR565,
> -		.depth = 16,
> -		.y_div = 1,
> -		.x_div = 1,
> -	},
> -	{
> -		.fourcc = V4L2_PIX_FMT_NV21,
> -		.color_swap = RGA_COLOR_UV_SWAP,
> -		.hw_format = RGA_COLOR_FMT_YUV420SP,
> -		.depth = 12,
> -		.y_div = 2,
> -		.x_div = 1,
> -	},
> -	{
> -		.fourcc = V4L2_PIX_FMT_NV61,
> -		.color_swap = RGA_COLOR_UV_SWAP,
> -		.hw_format = RGA_COLOR_FMT_YUV422SP,
> -		.depth = 16,
> -		.y_div = 1,
> -		.x_div = 1,
> -	},
> -	{
> -		.fourcc = V4L2_PIX_FMT_NV12,
> -		.color_swap = RGA_COLOR_NONE_SWAP,
> -		.hw_format = RGA_COLOR_FMT_YUV420SP,
> -		.depth = 12,
> -		.y_div = 2,
> -		.x_div = 1,
> -	},
> -	{
> -		.fourcc = V4L2_PIX_FMT_NV12M,
> -		.color_swap = RGA_COLOR_NONE_SWAP,
> -		.hw_format = RGA_COLOR_FMT_YUV420SP,
> -		.depth = 12,
> -		.y_div = 2,
> -		.x_div = 1,
> -	},
> -	{
> -		.fourcc = V4L2_PIX_FMT_NV16,
> -		.color_swap = RGA_COLOR_NONE_SWAP,
> -		.hw_format = RGA_COLOR_FMT_YUV422SP,
> -		.depth = 16,
> -		.y_div = 1,
> -		.x_div = 1,
> -	},
> -	{
> -		.fourcc = V4L2_PIX_FMT_YUV420,
> -		.color_swap = RGA_COLOR_NONE_SWAP,
> -		.hw_format = RGA_COLOR_FMT_YUV420P,
> -		.depth = 12,
> -		.y_div = 2,
> -		.x_div = 2,
> -	},
> -	{
> -		.fourcc = V4L2_PIX_FMT_YUV422P,
> -		.color_swap = RGA_COLOR_NONE_SWAP,
> -		.hw_format = RGA_COLOR_FMT_YUV422P,
> -		.depth = 16,
> -		.y_div = 1,
> -		.x_div = 2,
> -	},
> -	{
> -		.fourcc = V4L2_PIX_FMT_YVU420,
> -		.color_swap = RGA_COLOR_UV_SWAP,
> -		.hw_format = RGA_COLOR_FMT_YUV420P,
> -		.depth = 12,
> -		.y_div = 2,
> -		.x_div = 2,
> -	},
> -};
> -
> -#define NUM_FORMATS ARRAY_SIZE(formats)
> -
> -static struct rga_fmt *rga_fmt_find(u32 pixelformat)
> +static struct rga_fmt *rga_fmt_find(struct rockchip_rga *rga, u32 pixelformat)
>  {
>  	unsigned int i;
>  
> -	for (i = 0; i < NUM_FORMATS; i++) {
> -		if (formats[i].fourcc == pixelformat)
> -			return &formats[i];
> +	for (i = 0; i < rga->hw->num_formats; i++) {
> +		if (rga->hw->formats[i].fourcc == pixelformat)
> +			return &rga->hw->formats[i];
>  	}
>  	return NULL;
>  }
>  
> -static struct rga_frame def_frame = {
> -	.crop.left = 0,
> -	.crop.top = 0,
> -	.crop.width = DEFAULT_WIDTH,
> -	.crop.height = DEFAULT_HEIGHT,
> -	.fmt = &formats[0],
> -};
> -
>  struct rga_frame *rga_get_frame(struct rga_ctx *ctx, enum v4l2_buf_type type)
>  {
>  	if (V4L2_TYPE_IS_OUTPUT(type))
> @@ -350,6 +203,18 @@ static int rga_open(struct file *file)
>  	struct rockchip_rga *rga = video_drvdata(file);
>  	struct rga_ctx *ctx = NULL;
>  	int ret = 0;
> +	u32 def_width = clamp(DEFAULT_WIDTH, rga->hw->min_width, rga->hw->max_width);
> +	u32 def_height = clamp(DEFAULT_HEIGHT, rga->hw->min_height, rga->hw->max_height);
> +	struct rga_frame def_frame = {
> +		.crop.left = 0,
> +		.crop.top = 0,
> +		.crop.width = def_width,
> +		.crop.height = def_height,
> +		.fmt = &rga->hw->formats[0],
> +	};
> +
> +	def_frame.stride = (def_width * def_frame.fmt->depth) >> 3;
> +	def_frame.size = def_frame.stride * def_height;
>  
>  	ctx = kzalloc_obj(*ctx);
>  	if (!ctx)
> @@ -360,9 +225,9 @@ static int rga_open(struct file *file)
>  	ctx->out = def_frame;
>  
>  	v4l2_fill_pixfmt_mp(&ctx->in.pix,
> -			    ctx->in.fmt->fourcc, DEFAULT_WIDTH, DEFAULT_HEIGHT);
> +			    ctx->in.fmt->fourcc, def_width, def_height);
>  	v4l2_fill_pixfmt_mp(&ctx->out.pix,
> -			    ctx->out.fmt->fourcc, DEFAULT_WIDTH, DEFAULT_HEIGHT);
> +			    ctx->out.fmt->fourcc, def_width, def_height);
>  
>  	if (mutex_lock_interruptible(&rga->mutex)) {
>  		kfree(ctx);
> @@ -429,12 +294,13 @@ vidioc_querycap(struct file *file, void *priv, struct v4l2_capability *cap)
>  
>  static int vidioc_enum_fmt(struct file *file, void *priv, struct v4l2_fmtdesc *f)
>  {
> +	struct rockchip_rga *rga = video_drvdata(file);
>  	struct rga_fmt *fmt;
>  
> -	if (f->index >= NUM_FORMATS)
> +	if (f->index >= rga->hw->num_formats)
>  		return -EINVAL;
>  
> -	fmt = &formats[f->index];
> +	fmt = &rga->hw->formats[f->index];
>  	f->pixelformat = fmt->fourcc;
>  
>  	if (f->type != V4L2_BUF_TYPE_VIDEO_CAPTURE &&
> @@ -469,6 +335,7 @@ static int vidioc_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
>  {
>  	struct v4l2_pix_format_mplane *pix_fmt = &f->fmt.pix_mp;
>  	struct rga_ctx *ctx = file_to_rga_ctx(file);
> +	const struct rga_hw *hw = ctx->rga->hw;
>  	struct rga_fmt *fmt;
>  
>  	if (V4L2_TYPE_IS_CAPTURE(f->type)) {
> @@ -487,14 +354,14 @@ static int vidioc_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
>  		pix_fmt->xfer_func = frm->pix.xfer_func;
>  	}
>  
> -	fmt = rga_fmt_find(pix_fmt->pixelformat);
> +	fmt = rga_fmt_find(ctx->rga, pix_fmt->pixelformat);
>  	if (!fmt)
> -		fmt = &formats[0];
> +		fmt = &hw->formats[0];
>  
>  	pix_fmt->width = clamp(pix_fmt->width,
> -			       (u32)MIN_WIDTH, (u32)MAX_WIDTH);
> +			       hw->min_width, hw->max_width);
>  	pix_fmt->height = clamp(pix_fmt->height,
> -				(u32)MIN_HEIGHT, (u32)MAX_HEIGHT);
> +				hw->min_height, hw->max_height);
>  
>  	v4l2_fill_pixfmt_mp(pix_fmt, fmt->fourcc, pix_fmt->width, pix_fmt->height);
>  	pix_fmt->field = V4L2_FIELD_NONE;
> @@ -529,7 +396,7 @@ static int vidioc_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
>  	frm->size = 0;
>  	for (i = 0; i < pix_fmt->num_planes; i++)
>  		frm->size += pix_fmt->plane_fmt[i].sizeimage;
> -	frm->fmt = rga_fmt_find(pix_fmt->pixelformat);
> +	frm->fmt = rga_fmt_find(rga, pix_fmt->pixelformat);
>  	frm->stride = pix_fmt->plane_fmt[0].bytesperline;
>  
>  	/*
> @@ -660,7 +527,7 @@ static int vidioc_s_selection(struct file *file, void *priv,
>  
>  	if (s->r.left + s->r.width > f->pix.width ||
>  	    s->r.top + s->r.height > f->pix.height ||
> -	    s->r.width < MIN_WIDTH || s->r.height < MIN_HEIGHT) {
> +	    s->r.width < rga->hw->min_width || s->r.height < rga->hw->min_height) {
>  		v4l2_dbg(debug, 1, &rga->v4l2_dev, "unsupported crop value.\n");
>  		return -EINVAL;
>  	}
> @@ -770,6 +637,10 @@ static int rga_probe(struct platform_device *pdev)
>  	if (!rga)
>  		return -ENOMEM;
>  
> +	rga->hw = of_device_get_match_data(&pdev->dev);
> +	if (!rga->hw)
> +		return dev_err_probe(&pdev->dev, -ENODEV, "failed to get match data\n");
> +
>  	rga->dev = &pdev->dev;
>  	spin_lock_init(&rga->ctrl_lock);
>  	mutex_init(&rga->mutex);
> @@ -833,8 +704,7 @@ static int rga_probe(struct platform_device *pdev)
>  	if (ret < 0)
>  		goto rel_m2m;
>  
> -	rga->version.major = (rga_read(rga, RGA_VERSION_INFO) >> 24) & 0xFF;
> -	rga->version.minor = (rga_read(rga, RGA_VERSION_INFO) >> 20) & 0x0F;
> +	rga->hw->get_version(rga);
>  
>  	v4l2_info(&rga->v4l2_dev, "HW Version: 0x%02x.%02x\n",
>  		  rga->version.major, rga->version.minor);
> @@ -842,7 +712,7 @@ static int rga_probe(struct platform_device *pdev)
>  	pm_runtime_put(rga->dev);
>  
>  	/* Create CMD buffer */
> -	rga->cmdbuf_virt = dma_alloc_attrs(rga->dev, RGA_CMDBUF_SIZE,
> +	rga->cmdbuf_virt = dma_alloc_attrs(rga->dev, rga->hw->cmdbuf_size,
>  					   &rga->cmdbuf_phy, GFP_KERNEL,
>  					   DMA_ATTR_WRITE_COMBINE);
>  	if (!rga->cmdbuf_virt) {
> @@ -850,9 +720,6 @@ static int rga_probe(struct platform_device *pdev)
>  		goto rel_m2m;
>  	}
>  
> -	def_frame.stride = (DEFAULT_WIDTH * def_frame.fmt->depth) >> 3;
> -	def_frame.size = def_frame.stride * DEFAULT_HEIGHT;
> -
>  	ret = video_register_device(vfd, VFL_TYPE_VIDEO, -1);
>  	if (ret) {
>  		v4l2_err(&rga->v4l2_dev, "Failed to register video device\n");
> @@ -865,7 +732,7 @@ static int rga_probe(struct platform_device *pdev)
>  	return 0;
>  
>  free_dma:
> -	dma_free_attrs(rga->dev, RGA_CMDBUF_SIZE, rga->cmdbuf_virt,
> +	dma_free_attrs(rga->dev, rga->hw->cmdbuf_size, rga->cmdbuf_virt,
>  		       rga->cmdbuf_phy, DMA_ATTR_WRITE_COMBINE);
>  rel_m2m:
>  	v4l2_m2m_release(rga->m2m_dev);
> @@ -883,7 +750,7 @@ static void rga_remove(struct platform_device *pdev)
>  {
>  	struct rockchip_rga *rga = platform_get_drvdata(pdev);
>  
> -	dma_free_attrs(rga->dev, RGA_CMDBUF_SIZE, rga->cmdbuf_virt,
> +	dma_free_attrs(rga->dev, rga->hw->cmdbuf_size, rga->cmdbuf_virt,
>  		       rga->cmdbuf_phy, DMA_ATTR_WRITE_COMBINE);
>  
>  	v4l2_info(&rga->v4l2_dev, "Removing\n");
> @@ -919,9 +786,11 @@ static const struct dev_pm_ops rga_pm = {
>  static const struct of_device_id rockchip_rga_match[] = {
>  	{
>  		.compatible = "rockchip,rk3288-rga",
> +		.data = &rga2_hw,
>  	},
>  	{
>  		.compatible = "rockchip,rk3399-rga",
> +		.data = &rga2_hw,
>  	},
>  	{},
>  };
> diff --git a/drivers/media/platform/rockchip/rga/rga.h b/drivers/media/platform/rockchip/rga/rga.h
> index c4a3905a48f0d..640e510285341 100644
> --- a/drivers/media/platform/rockchip/rga/rga.h
> +++ b/drivers/media/platform/rockchip/rga/rga.h
> @@ -14,6 +14,9 @@
>  
>  #define RGA_NAME "rockchip-rga"
>  
> +#define DEFAULT_WIDTH 100
> +#define DEFAULT_HEIGHT 100
> +
>  struct rga_fmt {
>  	u32 fourcc;
>  	int depth;
> @@ -68,6 +71,8 @@ static inline struct rga_ctx *file_to_rga_ctx(struct file *filp)
>  	return container_of(file_to_v4l2_fh(filp), struct rga_ctx, fh);
>  }
>  
> +struct rga_hw;
> +
>  struct rockchip_rga {
>  	struct v4l2_device v4l2_dev;
>  	struct v4l2_m2m_dev *m2m_dev;
> @@ -88,6 +93,8 @@ struct rockchip_rga {
>  	struct rga_ctx *curr;
>  	dma_addr_t cmdbuf_phy;
>  	void *cmdbuf_virt;
> +
> +	const struct rga_hw *hw;
>  };
>  
>  struct rga_addr_offset {
> @@ -138,7 +145,19 @@ static inline void rga_mod(struct rockchip_rga *rga, u32 reg, u32 val, u32 mask)
>  	rga_write(rga, reg, temp);
>  };
>  
> -void rga_hw_start(struct rockchip_rga *rga,
> -		  struct rga_vb_buffer *src, struct rga_vb_buffer *dst);
> +struct rga_hw {
> +	struct rga_fmt *formats;
> +	u32 num_formats;
> +	size_t cmdbuf_size;
> +	u32 min_width, min_height;
> +	u32 max_width, max_height;
> +
> +	void (*start)(struct rockchip_rga *rga,
> +		      struct rga_vb_buffer *src, struct rga_vb_buffer *dst);
> +	bool (*handle_irq)(struct rockchip_rga *rga);
> +	void (*get_version)(struct rockchip_rga *rga);
> +};
> +
> +extern const struct rga_hw rga2_hw;
>  
>  #endif
> 
> -- 
> 2.54.0
> 
> 



* Re: [PATCH AUTOSEL 7.0-5.10] arm64: cputype: Add C1-Pro definitions
From: Sasha Levin @ 2026-05-21 13:50 UTC (permalink / raw)
  To: Mark Rutland
  Cc: patches, stable, Catalin Marinas, Will Deacon, James Morse,
	linux-arm-kernel, linux-kernel
In-Reply-To: <afCWPTqKxIqGPe1r@J2N7QTR9R3>

On Tue, Apr 28, 2026 at 12:13:01PM +0100, Mark Rutland wrote:
>On Tue, Apr 28, 2026 at 06:41:02AM -0400, Sasha Levin wrote:
>> From: Catalin Marinas <catalin.marinas@arm.com>
>>
>> [ Upstream commit 2c99561016c591f4c3d5ad7d22a61b8726e79735 ]
>>
>> Add cputype definitions for C1-Pro. These will be used for errata
>> detection in subsequent patches.
>
>This definition is only needed for a workaround which is only applicable
>to v6.18+ (and the downstream android16-6.12 tree).
>
>We needn't backport this patch to v5.10.y unless there's something that
>depends upon it.

I'll drop it, thanks.

-- 
Thanks,
Sasha



* Re: [PATCH v14 08/44] arm64: RMI: Ensure that the RMM has GPT entries for memory
From: Marc Zyngier @ 2026-05-21 13:47 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
	Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-9-steven.price@arm.com>

On Wed, 13 May 2026 14:17:16 +0100,
Steven Price <steven.price@arm.com> wrote:
> 
> The RMM maintains the state of all the granules in the system to make
> sure that the host is abiding by the rules. This state can be maintained
> at different granularity, per page (TRACKING_FINE) or per region
> (TRACKING_COARSE). The region size depends on the underlying
> "RMI_GRANULE_SIZE". For a "coarse" region all pages in the region must
> be of the same state, this implies we need to have "fine" tracking for
> DRAM, so that we can delegate individual pages.
> 
> For now we only support a statically carved out memory for tracking
> granules for the "fine" regions. This can be extended in the future to
> allow modifying the tracking granularity and remove the need for a
> static allocation.
> 
> Similarly, the firmware may create L0 GPT entries describing the total
> address space. But if we change the "PAS" (Physical Address Space) of a
> granule then the firmware may need to create L1 tables to track the PAS
> at a finer granularity.
> 
> Note: support is currently missing for SROs which means that if the RMM
> needs memory donating this will fail (and render CCA unusable in Linux).
> This effectively means that the L1 GPT tables must be created before
> Linux starts.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Changes since v13:
>  * Moved out of KVM
> ---
>  arch/arm64/include/asm/rmi_cmds.h |   2 +
>  arch/arm64/kernel/rmi.c           | 103 ++++++++++++++++++++++++++++++
>  2 files changed, 105 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/rmi_cmds.h b/arch/arm64/include/asm/rmi_cmds.h
> index 9179934925c5..9078a2920a7c 100644
> --- a/arch/arm64/include/asm/rmi_cmds.h
> +++ b/arch/arm64/include/asm/rmi_cmds.h
> @@ -33,6 +33,8 @@ struct rmi_sro_state {
>  } while (RMI_RETURN_STATUS(res.a0) == RMI_BUSY ||			\
>  	 RMI_RETURN_STATUS(res.a0) == RMI_BLOCKED)
>  
> +bool rmi_is_available(void);
> +
>  unsigned long rmi_sro_execute(struct rmi_sro_state *sro, gfp_t gfp);
>  void rmi_sro_free(struct rmi_sro_state *sro);
>  
> diff --git a/arch/arm64/kernel/rmi.c b/arch/arm64/kernel/rmi.c
> index a14ead5dedda..52a415e99500 100644
> --- a/arch/arm64/kernel/rmi.c
> +++ b/arch/arm64/kernel/rmi.c
> @@ -7,6 +7,8 @@
>  
>  #include <asm/rmi_cmds.h>
>  
> +static bool arm64_rmi_is_available;
> +
>  unsigned long rmm_feat_reg0;
>  unsigned long rmm_feat_reg1;
>  
> @@ -88,6 +90,102 @@ static int rmi_configure(void)
>  	return 0;
>  }
>  
> +/*
> + * For now we set the tracking_region_size to 0 for RMI_RMM_CONFIG_SET().
> + * TODO: Support other tracking sizes (via Kconfig option).
> + */
> +#ifdef CONFIG_PAGE_SIZE_4KB
> +#define RMM_GRANULE_TRACKING_SIZE	SZ_1G
> +#elif defined(CONFIG_PAGE_SIZE_16KB)
> +#define RMM_GRANULE_TRACKING_SIZE	SZ_32M
> +#elif defined(CONFIG_PAGE_SIZE_64KB)
> +#define RMM_GRANULE_TRACKING_SIZE	SZ_512M
> +#endif

Basically, a level 2 mapping. Which means this whole block really is:

#define RMM_GRANULE_TRACKING_SIZE	(2 * PAGE_SHIFT - 3)

(adjust for D128 as needed).
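
As a quick check of that arithmetic, here is a small user-space sketch. level2_block_size() is a hypothetical name, not part of the patch. It assumes 8-byte descriptors, so each table level resolves PAGE_SHIFT - 3 address bits and one level 2 entry covers 2 * PAGE_SHIFT - 3 bits. Under that assumption it yields 32M for 16K pages and 512M for 64K pages, matching the quoted block, but 2M for 4K pages, where the quoted block uses SZ_1G.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes covered by one level 2 entry.
 * A page holds 2^(page_shift - 3) eight-byte descriptors, so a level 2
 * entry spans page_shift + (page_shift - 3) address bits.
 */
static uint64_t level2_block_size(unsigned int page_shift)
{
	return UINT64_C(1) << (2 * page_shift - 3);
}
```

Note that the expression in the define is a shift count; the size in bytes is 1 shifted left by that amount.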

> +
> +/*
> + * Make sure the area is tracked by RMM at FINE granularity.
> + * We do not support changing the tracking yet.
> + */
> +static int rmi_verify_memory_tracking(phys_addr_t start, phys_addr_t end)
> +{
> +	while (start < end) {
> +		unsigned long ret, category, state, next;
> +
> +		ret = rmi_granule_tracking_get(start, end, &category, &state, &next);
> +		if (ret != RMI_SUCCESS ||
> +		    state != RMI_TRACKING_FINE ||
> +		    category != RMI_MEM_CATEGORY_CONVENTIONAL) {
> +			/* TODO: Set granule tracking in this case */
> +			pr_err("Granule tracking for region isn't fine/conventional: %llx",
> +			       start);
> +			return -ENODEV;

How is this triggered? Do we really need to spam the console with
this? A PA doesn't mean much, and there is no context (stack trace).

If that's not expected, turn this into a WARN_ONCE().

> +		}
> +		start = next;
> +	}
> +
> +	return 0;
> +}
> +
> +static unsigned long rmi_l0gpt_size(void)
> +{
> +	return 1UL << (30 + FIELD_GET(RMI_FEATURE_REGISTER_1_L0GPTSZ,
> +				      rmm_feat_reg1));
> +}
> +
> +static int rmi_create_gpts(phys_addr_t start, phys_addr_t end)
> +{
> +	unsigned long l0gpt_sz = rmi_l0gpt_size();
> +
> +	start = ALIGN_DOWN(start, l0gpt_sz);
> +	end = ALIGN(end, l0gpt_sz);
> +
> +	while (start < end) {
> +		int ret = rmi_gpt_l1_create(start);
> +
> +		/*
> +		 * Make sure the L1 GPT tables are created for the region.
> +		 * RMI_ERROR_GPT indicates the L1 table already exists.
> +		 */
> +		if (ret && ret != RMI_ERROR_GPT) {
> +			/*
> +			 * FIXME: Handle SRO so that memory can be donated for
> +			 * the tables.
> +			 */
> +			pr_err("GPT Level1 table missing for %llx\n", start);
> +			return -ENOMEM;

If any of this fails, where is the cleanup done? Is that part of the
missing SRO support that's indicated in the commit message?

> +		}
> +		start += l0gpt_sz;
> +	}
> +
> +	return 0;
> +}
> +
> +static int rmi_init_metadata(void)
> +{
> +	phys_addr_t start, end;
> +	const struct memblock_region *r;
> +
> +	for_each_mem_region(r) {
> +		int ret;
> +
> +		start = memblock_region_memory_base_pfn(r) << PAGE_SHIFT;
> +		end = memblock_region_memory_end_pfn(r) << PAGE_SHIFT;
> +		ret = rmi_verify_memory_tracking(start, end);
> +		if (ret)
> +			return ret;
> +		ret = rmi_create_gpts(start, end);
> +		if (ret)
> +			return ret;
> +	}

How does this work with, say, memory hotplug?

> +
> +	return 0;
> +}
> +
> +bool rmi_is_available(void)
> +{
> +	return arm64_rmi_is_available;
> +}
> +
>  static int __init arm64_init_rmi(void)
>  {
>  	/* Continue without realm support if we can't agree on a version */
> @@ -101,6 +199,11 @@ static int __init arm64_init_rmi(void)
>  
>  	if (rmi_configure())
>  		return 0;
> +	if (rmi_init_metadata())
> +		return 0;
> +
> +	arm64_rmi_is_available = true;
> +	pr_info("RMI configured");
>  
>  	return 0;
>  }

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.



* Re: [PATCH v5 3/3] iommu/arm-smmu-v3: Allow ATS to be always on
From: Jason Gunthorpe @ 2026-05-21 13:44 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: will, joro, bhelgaas, robin.murphy, praan, baolu.lu, kevin.tian,
	miko.lenczewski, linux-arm-kernel, iommu, linux-kernel, linux-pci,
	dan.j.williams, jonathan.cameron, vsethi, linux-cxl, nirmoyd
In-Reply-To: <ag43NP4UiS7Z9T6q@Asurada-Nvidia>

On Wed, May 20, 2026 at 03:35:32PM -0700, Nicolin Chen wrote:
> @@ -3870,13 +3870,15 @@ static int arm_smmu_blocking_set_dev_pasid(struct iommu_domain *new_domain,
>          * When the last user of the CD table goes away downgrade the STE back
>          * to a non-cd_table one, by re-attaching its sid_domain.
>          */
> -       if (!master->ats_always_on &&
> -           !arm_smmu_ssids_in_use(&master->cd_table)) {
> +       if (!arm_smmu_ssids_in_use(&master->cd_table)) {
>                 struct iommu_domain *sid_domain =
>                         iommu_driver_get_domain_for_dev(master->dev);
> +               bool ats_always_on = master->ats_always_on &&
> +                                    sid_domain->type != IOMMU_DOMAIN_BLOCKED;
> +               bool downgrade = sid_domain->type == IOMMU_DOMAIN_IDENTITY ||
> +                                sid_domain->type == IOMMU_DOMAIN_BLOCKED;
> 
> -               if (sid_domain->type == IOMMU_DOMAIN_IDENTITY ||
> -                   sid_domain->type == IOMMU_DOMAIN_BLOCKED)
> +               if (!ats_always_on && downgrade)
>                         sid_domain->ops->attach_dev(sid_domain, dev,
>                                                     sid_domain);

Only identity should remain with the CD S1DSS STE; BLOCKED should
still attach the normal blocking domain.
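
One reading of that rule, as a standalone sketch. The enum and should_reattach_sid_domain() are hypothetical stand-ins for illustration, not driver code, and the sketch only covers the case where no SSIDs are in use anymore.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the IOMMU domain types named in the thread. */
enum sid_domain_type { SID_IDENTITY, SID_BLOCKED, SID_OTHER };

/*
 * Hypothetical helper: once the last PASID user of the CD table is gone,
 * should the STE be re-attached to its sid_domain?
 */
static bool should_reattach_sid_domain(enum sid_domain_type type,
				       bool ats_always_on)
{
	switch (type) {
	case SID_BLOCKED:
		/* BLOCKED always goes back to the normal blocking domain. */
		return true;
	case SID_IDENTITY:
		/* Identity keeps the CD table (S1DSS) STE if ATS must stay on. */
		return !ats_always_on;
	default:
		/* Other types are not re-attached in the quoted code either. */
		return false;
	}
}
```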

Jason



* Re: [PATCH v7 10/28] media: rockchip: rga: announce and sync colorimetry
From: Michael Tretter @ 2026-05-21 13:44 UTC (permalink / raw)
  To: Sven Püschel
  Cc: Jacob Chen, Ezequiel Garcia, Mauro Carvalho Chehab,
	Heiko Stuebner, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Hans Verkuil, linux-media, linux-rockchip, linux-arm-kernel,
	linux-kernel, devicetree, kernel, nicolas, sebastian.reichel,
	p.zabel, Nicolas Dufresne
In-Reply-To: <20260521-spu-rga3-v7-10-3f33e8c7145f@pengutronix.de>

"announce colorimetry" in the subject is a bit strange. Maybe rephrase
the subject to

	media: rockchip: rga: announce CSC and sync colorimetry

On Thu, 21 May 2026 00:44:15 +0200, Sven Püschel wrote:
> Announce the capability to adjust the quantization and ycbcr_enc on the
> capture side and check if the SET_CSC flag is set when the colorimetry
> is changed. Furthermore copy the colorimetry from the output to the
> capture side to fix the currently failing v4l2-compliance tests, which
> expect exactly this behavior.
> 
> Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> Signed-off-by: Sven Püschel <s.pueschel@pengutronix.de>
> ---
>  drivers/media/platform/rockchip/rga/rga.c | 37 +++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
> 
> diff --git a/drivers/media/platform/rockchip/rga/rga.c b/drivers/media/platform/rockchip/rga/rga.c
> index ca8d8a53dc251..8c34f73d69764 100644
> --- a/drivers/media/platform/rockchip/rga/rga.c
> +++ b/drivers/media/platform/rockchip/rga/rga.c
> @@ -437,6 +437,15 @@ static int vidioc_enum_fmt(struct file *file, void *priv, struct v4l2_fmtdesc *f
>  	fmt = &formats[f->index];
>  	f->pixelformat = fmt->fourcc;
>  
> +	if (f->type != V4L2_BUF_TYPE_VIDEO_CAPTURE &&
> +	    f->type != V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE)

Is there a reason for not using V4L2_TYPE_IS_CAPTURE(f->type)? I'd also
invert the condition, set the flags in the branch and have a single exit
point of this function.
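
Roughly what I have in mind, as a user-space sketch of the control flow only. struct fmtdesc, the two FLAG_* values and enum_fmt_flags() are hypothetical stand-ins; the driver would keep using V4L2_TYPE_IS_CAPTURE(), v4l2_is_format_yuv() and the V4L2_FMT_FLAG_CSC_* definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two V4L2_FMT_FLAG_CSC_* flags. */
#define FLAG_CSC_YCBCR_ENC	0x1u
#define FLAG_CSC_QUANTIZATION	0x2u

/* Hypothetical, reduced view of the fields the function looks at. */
struct fmtdesc {
	bool is_capture;	/* result of V4L2_TYPE_IS_CAPTURE(f->type) */
	bool is_yuv;		/* result of v4l2_is_format_yuv(...) */
	uint32_t flags;
};

static int enum_fmt_flags(struct fmtdesc *f)
{
	/* Positive condition: flags are set inside the capture branch. */
	if (f->is_capture) {
		if (f->is_yuv)
			f->flags |= FLAG_CSC_QUANTIZATION |
				    FLAG_CSC_YCBCR_ENC;
	}

	/* Single exit point for capture and output alike. */
	return 0;
}
```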

> +		return 0;
> +
> +	/* allow changing the quantization and xfer func for YUV formats */
> +	if (v4l2_is_format_yuv(v4l2_format_info(f->pixelformat)))
> +		f->flags |= V4L2_FMT_FLAG_CSC_QUANTIZATION |
> +			    V4L2_FMT_FLAG_CSC_YCBCR_ENC;
> +
>  	return 0;
>  }
>  
> @@ -459,8 +468,25 @@ static int vidioc_g_fmt(struct file *file, void *priv, struct v4l2_format *f)
>  static int vidioc_try_fmt(struct file *file, void *priv, struct v4l2_format *f)
>  {
>  	struct v4l2_pix_format_mplane *pix_fmt = &f->fmt.pix_mp;
> +	struct rga_ctx *ctx = file_to_rga_ctx(file);
>  	struct rga_fmt *fmt;
>  
> +	if (V4L2_TYPE_IS_CAPTURE(f->type)) {
> +		const struct rga_frame *frm;
> +
> +		frm = rga_get_frame(ctx, f->type);
> +		if (IS_ERR(frm))
> +			return PTR_ERR(frm);
> +
> +		if (!(pix_fmt->flags & V4L2_PIX_FMT_FLAG_SET_CSC)) {
> +			pix_fmt->quantization = frm->pix.quantization;
> +			pix_fmt->ycbcr_enc = frm->pix.ycbcr_enc;
> +		}

Are there any limits on the colorspace conversion that the RGA can do?
If I understand correctly, user space may set an arbitrary
v4l2_ycbcr_encoding (for example V4L2_YCBCR_ENC_BT2020) and the driver
will happily accept it.

> +		/* disallow values not announced in vidioc_enum_fmt */

"disallow values" sounds strange. Maybe:

		/* The RGA cannot convert colorspace and xfer_func */

> +		pix_fmt->colorspace = frm->pix.colorspace;
> +		pix_fmt->xfer_func = frm->pix.xfer_func;
> +	}
> +
>  	fmt = rga_fmt_find(pix_fmt->pixelformat);
>  	if (!fmt)
>  		fmt = &formats[0];
> @@ -506,6 +532,17 @@ static int vidioc_s_fmt(struct file *file, void *priv, struct v4l2_format *f)
>  	frm->fmt = rga_fmt_find(pix_fmt->pixelformat);
>  	frm->stride = pix_fmt->plane_fmt[0].bytesperline;
>  
> +	/*
> +	 * Copy colorimetry from output to capture as required by the
> +	 * v4l2-compliance tests
> +	 */
> +	if (V4L2_TYPE_IS_OUTPUT(f->type)) {
> +		ctx->out.pix.colorspace = pix_fmt->colorspace;
> +		ctx->out.pix.ycbcr_enc = pix_fmt->ycbcr_enc;
> +		ctx->out.pix.quantization = pix_fmt->quantization;
> +		ctx->out.pix.xfer_func = pix_fmt->xfer_func;

I was very confused, because ctx->out is actually the format for
CAPTURE. The comment kind of helps to mark this trap. Not sure if there
is anything that can be done about this.

Michael

> +	}
> +
>  	/* Reset crop settings */
>  	frm->crop.left = 0;
>  	frm->crop.top = 0;
> 
> -- 
> 2.54.0
> 
> 



* Re: [PATCH 5/6] firmware: samsung: acpm: Add TMU protocol support
From: Alexey Klimov @ 2026-05-21 13:37 UTC (permalink / raw)
  To: Peter Griffin
  Cc: Tudor Ambarus, Krzysztof Kozlowski, Michael Turquette,
	Stephen Boyd, Lee Jones, Alim Akhtar, Sylwester Nawrocki,
	Chanwoo Choi, André Draszik, linux-kernel, linux-samsung-soc,
	linux-arm-kernel, linux-clk, jyescas, kernel-team,
	Krzysztof Kozlowski
In-Reply-To: <CADrjBPqiooFC9o56bOAg-j7908ssPtrzff1reNe6eXmu7hcA=w@mail.gmail.com>

On Thu May 21, 2026 at 9:25 AM BST, Peter Griffin wrote:
> Hi Alexey,
>
> On Wed, 20 May 2026 at 22:01, Alexey Klimov <alexey.klimov@linaro.org> wrote:
>>
>> Hi Tudor,
>>
>> On Tue May 19, 2026 at 4:46 PM BST, Tudor Ambarus wrote:
>> > Hi, Alexey,
>> >
>> > On 5/18/26 2:24 PM, Alexey Klimov wrote:
>> >> Thinking further about this I'd humbly suggest that even
>> >>
>> >>      if (fw_err >= 0)
>> >>              return 0;
>> >>
>> >>      pr_debug_ratelimited("ACPM tmu call returned: %x\n", fw_err);
>> >>      or pr_debug(...);
>> >>
>> >>      if (fw_err == -1)
>> >>              return -EACCES;
>> >>
>> >> some debug message would do.
>> >> Perhaps we need some conversion, for instance as it is done in scmi
>> >> code (scmi_to_linux_errno(), scmi_linux_errmap[]). But I don't have any
>> >> data for mapping acpm errors to some human meanings.
>> >
>> > I did that for the pmic helpers. I don't need any debug prints for
>> > gs101 TMU as I have clear instructions from firmware: 0 for success,
>> > -1 for error.
>>
>> This doesn't look like the right approach for upstreaming an ACPM TMU
>> framework.
>>
>> You are trying to submit a gs101-specific implementation masquerading
>> as a generic ACPM TMU framework, while explicitly pushing the
>> refactoring work onto the next developer to add support for other
>> SoCs in this generic ACPM code.
>>
>> The ACPM TMU protocol implementation on Exynos850 is different: it uses
>> different error codes, and half of the calls in this 'generic' driver
>> are not even implemented in the Exynos850 firmware. Relying on a
>> hardcoded if (fw_err == -1) in a driver named generic ACPM is broken
>> by design and may silently swallow critical firmware errors on other
>> SoCs.
>>
>> What about such options below?
>> - rename the driver to reflect reality: rename this specifically to
>> gs101-acpm-tmu-something to reflect that it is tailored for gs101-s;
>>
>> or
>> - abstract the firmware error handling paths through driver_data or
>> a dedicated ops structure now, so that other SoCs can cleanly hook into
>> it without having to rewrite the logic later.
>
> AFAIK it's pretty normal not to add new hooks before they are
> required. I think the approach taken in this series makes sense, as
> it's the developer adding support for SoC #2 who best knows what the
> differences are on their platform versus what exists upstream.
> Similarly, the developer adding support for SoC #3 may have different
> requirements to e850 and gs101 and that developer is best placed to
> refactor and add hooks or quirks that are required for that platform.
>
> Let's not try to boil the ocean with this series, it's targeting GS101
> support. We can evolve it for future SoCs as those requirements and
> differences become clear.

Peter, I agree we shouldn't bother about hypothetical SoCs. However,
Exynos850 (I guess SoC #2 in this text) is not hypothetical. It is
possible to take the set of patches from the mailing list, copy the acpm
DT node from gs101 (with a minimal compatible rename), and get an Exynos
SoC with ACPM enabled. I am also actively working on it. Hooks, or some
other way of handling firmware error codes, are required now.
We actually already know the differences: different error codes, and
only 4 ACPM TMU calls are implemented on e850: TMU init, TMU read
temperature, TMU suspend/resume.
Extra TMU calls are fine since we can simply not use them from the
thermal driver, but hiding the correct error codes is another story.
We already know this way of handling firmware codes is not generic to
ACPM TMU.

The ACPM TMU part of gs101 ACPM firmware might be a vendor-specific fork.
We shouldn't assume it strictly adheres to the reference Samsung ACPM TMU
design.
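
To make the ops-structure option concrete, a minimal sketch of what a per-SoC hook could look like. Every name here (struct acpm_tmu_variant, gs101_fw_err_to_errno(), acpm_tmu_xlate()) is hypothetical and not taken from the series. The gs101 mapping follows what was stated in this thread (non-negative means success, -1 maps to -EACCES); returning -EIO for other negative values is my assumption. No e850 table is shown because I have no verified mapping for its error codes.

```c
#include <errno.h>

/* Hypothetical per-SoC hook table; none of these names exist in the series. */
struct acpm_tmu_variant {
	int (*fw_err_to_errno)(int fw_err);
};

/* gs101 as described in the thread: 0 for success, -1 for error. */
static int gs101_fw_err_to_errno(int fw_err)
{
	if (fw_err >= 0)
		return 0;

	/* -EIO for anything other than -1 is an assumption. */
	return fw_err == -1 ? -EACCES : -EIO;
}

static const struct acpm_tmu_variant gs101_variant = {
	.fw_err_to_errno = gs101_fw_err_to_errno,
};

/* Generic code only ever calls through the variant selected at probe. */
static int acpm_tmu_xlate(const struct acpm_tmu_variant *v, int fw_err)
{
	return v->fw_err_to_errno(fw_err);
}
```

Another SoC would then supply its own variant through driver_data without touching the generic path.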

Alexey




* Re: [PATCH v22 08/13] mfd: core: Add firmware-node support to MFD cells
From: Bartosz Golaszewski @ 2026-05-21 13:36 UTC (permalink / raw)
  To: Lee Jones
  Cc: Shivendra Pratap, Sebastian Reichel, Mark Rutland,
	Lorenzo Pieralisi, Rafael J. Wysocki, Daniel Lezcano,
	Christian Loehle, Ulf Hansson, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Bjorn Andersson, Konrad Dybcio, Arnd Bergmann,
	Souvik Chakravarty, Andy Yan, Matthias Brugger, John Stultz,
	Moritz Fischer, Sudeep Holla, linux-pm, linux-kernel,
	linux-arm-msm, linux-arm-kernel, devicetree, Florian Fainelli,
	Krzysztof Kozlowski, Dmitry Baryshkov, Mukesh Ojha, Andre Draszik,
	Greg Kroah-Hartman, Kathiravan Thirumoorthy, Srinivas Kandagatla,
	Bartosz Golaszewski
In-Reply-To: <20260521132419.GA3591266@google.com>

On Thu, May 21, 2026 at 3:24 PM Lee Jones <lee@kernel.org> wrote:
>
> >
> > I suggested it because of its flexibility. The alternative I had in
> > mind is something like a new field in mfd_cell:
> >
> >     const char *cell_node_name;
> >
> > Which - if set - would tell MFD to look up an fwnode that's a child of
> > the parent device's node by name - as it may not have a compatible.
>
> Remind me why the child device can't look-up its own fwnode?
>

Oh sure it can, but should it? I'm not sure it's logically sound to
have the child device reach into the parent, look up the fwnode and
then assign it to itself after it's already attached to the driver.
This should be done at the subsystem level before the device is
registered.

Bart



* Re: [PATCH v2 1/3] KVM: arm64: Reset page order in pKVM hyp_pool_init
From: Fuad Tabba @ 2026-05-21 13:30 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will, linux-arm-kernel, kvmarm, kernel-team,
	qperret, Sashiko
In-Reply-To: <ag8GvtAonB6LNB5m@google.com>

On Thu, 21 May 2026 at 14:21, Vincent Donnefort <vdonnefort@google.com> wrote:
>
> On Thu, May 21, 2026 at 02:07:36PM +0100, Fuad Tabba wrote:
> > On Thu, 21 May 2026 at 11:22, Vincent Donnefort <vdonnefort@google.com> wrote:
> > >
> > > When a VM fails to initialise after its stage-2 hyp_pool has been
> > > initialised, that stage-2 must be torn down entirely. This requires
> > > resetting both the refcount and the order of its pages back to 0.
> > >
> > > Currently, reclaim_pgtable_pages() implicitly resets the page order by
> > > allocating the entire pool with order-0 granularity. However, in the VM
> > > initialisation error path, the addresses of the donated memory (the PGD)
> > > are already known, making it unnecessary to iterate over all pages in
> > > the pool.
> > >
> > > Since the vmemmap page order is a hyp_pool-specific field, leaving a
> > > non-zero order on hyp_pool destruction is harmless until another pool
> > > attempts to admit the page. Instead of resetting this field during
> > > destruction, reset it during pool initialization in hyp_pool_init().
> > > Note that pages added to the pool outside of the initial pool range
> > > (e.g., via guest_s2_zalloc_page()) must still have their order managed
> > > manually.
> > >
> > > While at it, add a WARN_ON() in the hyp_pool attach path to catch
> > > unexpected page orders that exceed the pool's max_order.
> > >
> > > Fixes: 256b4668cd89 ("KVM: arm64: Introduce separate hypercalls for pKVM VM reservation and initialization")
> > > Reported-by: Sashiko <sashiko-bot@kernel.org>
> > > Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> > >
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > index 25f04629014e..89eb20d4fee4 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > @@ -322,7 +322,6 @@ void reclaim_pgtable_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
> > >         while (addr) {
> > >                 page = hyp_virt_to_page(addr);
> > >                 page->refcount = 0;
> > > -               page->order = 0;
> > >                 push_hyp_memcache(mc, addr, hyp_virt_to_phys);
> > >                 WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
> > >                 addr = hyp_alloc_pages(&vm->pool, 0);
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> > > index a1eb27a1a747..c3b3dc5a8ea7 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> > > @@ -97,6 +97,8 @@ static void __hyp_attach_page(struct hyp_pool *pool,
> > >         u8 order = p->order;
> > >         struct hyp_page *buddy;
> > >
> > > +       WARN_ON(p->order > pool->max_order);
> > > +
> >
> > Could you add a brief comment? It took me a minute to figure out what this
> > catches. IIUC it's not attach's own input, it's a stale p->order from way back
> > when an external page was popped from a memcache (today only via
> > guest_s2_zalloc_page()). Right?
>
> I think it'd be self-explanatory if that was next to the page_add_to_list, but that
> wouldn't protect the memset (that's really best-effort though).
>
> How about?
>
> /*
>  * A page with an order bigger than the pool's max is an 'external' page
>  * whose order hasn't been reset before being added to the pool.
>  */
>
> But now I am thinking I can do way better: we can easily identify external
> pages, so I could just force the order to 0 in that case.
>
> WDYS?

Yeah, sounds better. The WARN's scope was actually narrower than the
real risk. Forcing order = 0 on entry covers all of that and removes
the implicit caller obligation the WARN was best-effort enforcing.

The memset is trivially safe then too (PAGE_SIZE << 0, regardless of
what was in the vmemmap).
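
To make the idea concrete, here is a rough userspace sketch, not the
actual hyp code. The struct layouts, the range check used to detect an
'external' page, and the helper names are all assumptions made for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL

/* Simplified, hypothetical models of the hyp structures. */
struct pool_model {
	uint64_t range_start;	/* first physical address owned by the pool */
	uint64_t range_end;	/* one past the last owned address */
};

struct page_model {
	uint64_t phys;
	uint8_t order;
};

/* Assumption: a page is 'external' when it lies outside the pool range. */
static bool page_is_external(const struct pool_model *pool,
			     const struct page_model *p)
{
	return p->phys < pool->range_start || p->phys >= pool->range_end;
}

/* Returns the number of bytes the attach path would then zero. */
static unsigned long attach_prepare(const struct pool_model *pool,
				    struct page_model *p)
{
	/* A stale order left by a previous owner is discarded, not trusted. */
	if (page_is_external(pool, p))
		p->order = 0;

	return PAGE_SIZE << p->order;
}
```

With this shape the caller no longer has to reset the order itself, and
the size passed to the memset is bounded for external pages.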

Cheers,
/fuad

>
> >
> > With that.
> >
> > Reviewed-by: Fuad Tabba <tabba@google.com>
> > Tested-by: Fuad Tabba <tabba@google.com>
> >
> > Cheers,
> > /fuad
> >
> >
> >
> >
> > >         memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
> > >
> > >         /* Skip coalescing for 'external' pages being freed into the pool. */
> > > @@ -237,8 +239,10 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
> > >
> > >         /* Init the vmemmap portion */
> > >         p = hyp_phys_to_page(phys);
> > > -       for (i = 0; i < nr_pages; i++)
> > > +       for (i = 0; i < nr_pages; i++) {
> > >                 hyp_set_page_refcounted(&p[i]);
> > > +               p[i].order = 0;
> > > +       }
> > >
> > >         /* Attach the unused pages to the buddy tree */
> > >         for (i = reserved_pages; i < nr_pages; i++)
> > > --
> > > 2.54.0.746.g67dd491aae-goog
> > >


^ permalink raw reply

* Re: [PATCH v14 07/44] arm64: RMI: Configure the RMM with the host's page size
From: Marc Zyngier @ 2026-05-21 13:30 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
	Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-8-steven.price@arm.com>

On Wed, 13 May 2026 14:17:15 +0100,
Steven Price <steven.price@arm.com> wrote:
> 
> RMM v2.0 brings the ability to set the RMM's granule size. Check the
> feature registers and configure the RMM so that it matches the host's
> page size. This means that operations can be done with a granularity
> equal to PAGE_SIZE.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Changes since v13:
>  * Moved out of KVM.
> ---
>  arch/arm64/kernel/rmi.c | 42 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/arch/arm64/kernel/rmi.c b/arch/arm64/kernel/rmi.c
> index 99c1ccc35c11..a14ead5dedda 100644
> --- a/arch/arm64/kernel/rmi.c
> +++ b/arch/arm64/kernel/rmi.c
> @@ -49,6 +49,45 @@ static int rmi_check_version(void)
>  	return 0;
>  }
>  
> +static int rmi_configure(void)
> +{
> +	struct rmm_config *config __free(free_page) = NULL;
> +	unsigned long ret;
> +
> +	config = (struct rmm_config *)get_zeroed_page(GFP_KERNEL);
> +	if (!config)
> +		return -ENOMEM;

This is the sort of buggy construct that is highlighted in
include/linux/cleanup.h: initialising the object for cleanup with
NULL, and only later assigning the expected value.

It may not matter here, but it will catch you (or more probably me) in
the future.
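
As a rough userspace illustration of the preferred shape (declaration
and allocation in a single statement), something like the sketch below.
The auto_free_buf macro, free_buf() and configure() are hypothetical
stand-ins built on the compiler's cleanup attribute, not the kernel's
__free() helpers.

```c
#include <stdlib.h>

/* Counts releases so the behaviour can be observed from outside. */
static int nr_freed;

/* Cleanup hook: the compiler passes the address of the variable. */
static void free_buf(void *p)
{
	void **bufp = p;

	if (*bufp) {
		free(*bufp);
		nr_freed++;
	}
}

/* Hypothetical stand-in for the kernel's __free(). */
#define auto_free_buf __attribute__((cleanup(free_buf)))

/* Returns 0 on success, -1 on allocation failure or when 'fail' is set. */
static int configure(int fail)
{
	/* Declared and allocated in one statement, no NULL placeholder. */
	unsigned char *config auto_free_buf = calloc(1, 4096);

	if (!config)
		return -1;

	if (fail)
		return -1;	/* buffer is released automatically */

	config[0] = 1;
	return 0;		/* released here as well */
}
```

Keeping the allocation on the declaration line means there is no window
in which the variable holds a placeholder value.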

> +
> +	switch (PAGE_SIZE) {
> +	case SZ_4K:
> +		config->rmi_granule_size = RMI_GRANULE_SIZE_4KB;
> +		break;
> +	case SZ_16K:
> +		config->rmi_granule_size = RMI_GRANULE_SIZE_16KB;
> +		break;
> +	case SZ_64K:
> +		config->rmi_granule_size = RMI_GRANULE_SIZE_64KB;
> +		break;
> +	default:
> +		pr_err("Unsupported PAGE_SIZE for RMM\n");

Do you really anticipate PAGE_SIZE being any other value? This is 100%
dead code. If you want to be extra cautious, have a BUILD_BUG_ON().
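
A minimal sketch of what a compile-time check could look like, written
as plain C11 so it builds outside the kernel. _Static_assert stands in
for BUILD_BUG_ON() here, the PAGE_SIZE definition is an assumption for
the example, and the enum values are placeholders rather than the real
RMI encodings.

```c
#define SZ_4K	0x1000UL
#define SZ_16K	0x4000UL
#define SZ_64K	0x10000UL

/* Assumption: stands in for the kernel's compile-time PAGE_SIZE. */
#define PAGE_SIZE	SZ_4K

/* Placeholder encodings; the real values come from the RMM spec. */
enum granule { GRANULE_4KB, GRANULE_16KB, GRANULE_64KB };

/* Any other page size breaks the build instead of failing at runtime. */
_Static_assert(PAGE_SIZE == SZ_4K || PAGE_SIZE == SZ_16K ||
	       PAGE_SIZE == SZ_64K, "Unsupported PAGE_SIZE for RMM");

static enum granule page_size_to_granule(void)
{
	switch (PAGE_SIZE) {
	case SZ_16K:
		return GRANULE_16KB;
	case SZ_64K:
		return GRANULE_64KB;
	default:	/* only SZ_4K can reach here, see the assertion above */
		return GRANULE_4KB;
	}
}
```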

> +		return -EINVAL;
> +	}
> +
> +	ret = rmi_rmm_config_set(virt_to_phys(config));
> +	if (ret) {
> +		pr_err("RMM config set failed\n");
> +		return -EINVAL;
> +	}

What is the life cycle of the page when the call succeeds? Is it
switched back to the NS PAS and allowed to be freed?

> +
> +	ret = rmi_rmm_activate();
> +	if (ret) {
> +		pr_err("RMM activate failed\n");
> +		return -ENXIO;
> +	}
> +
> +	return 0;
> +}
> +
>  static int __init arm64_init_rmi(void)
>  {
>  	/* Continue without realm support if we can't agree on a version */
> @@ -60,6 +99,9 @@ static int __init arm64_init_rmi(void)
>  	if (WARN_ON(rmi_features(1, &rmm_feat_reg1)))
>  		return 0;
>  
> +	if (rmi_configure())
> +		return 0;
> +
>  	return 0;
>  }
>  subsys_initcall(arm64_init_rmi);

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply

* [PATCH 14/18] arm64: fpsimd: Use opaque type for SME state
From: Mark Rutland @ 2026-05-21 13:25 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm
  Cc: broonie, catalin.marinas, james.morse, mark.rutland, maz, oupton,
	tabba, will
In-Reply-To: <20260521132556.584676-1-mark.rutland@arm.com>

As the SME state size can vary at runtime, we don't have a concrete type
for the in-memory SME state, and pass this around using a pointer to
void.

Using pointer to void means that it's very easy to introduce errors that
cannot be caught by the compiler (e.g. as 'void **' can be assigned to
'void *').

Improve this by adding an opaque 'struct sme_state', and consistently
passing a pointer to this.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h    | 8 ++++----
 arch/arm64/include/asm/processor.h | 3 ++-
 arch/arm64/kernel/fpsimd.c         | 4 ++--
 3 files changed, 8 insertions(+), 7 deletions(-)
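
For readers less familiar with the failure mode described above, a small
standalone sketch follows. The *_model types and the save_*() helpers
are hypothetical and exist only for this example.

```c
#include <stddef.h>

struct sme_state_model;		/* opaque: never defined, only pointed to */

struct thread_model {
	void *untyped;			/* old style */
	struct sme_state_model *typed;	/* new style */
};

/* Both helpers report whether they were handed a non-NULL buffer. */
static int save_untyped(void *state)
{
	return state != NULL;
}

static int save_typed(struct sme_state_model *state)
{
	return state != NULL;
}

/* Returns how many of the two calls believed they had valid state. */
static int demo(struct thread_model *t)
{
	int calls = 0;

	/* Bug: address of the pointer, yet 'void **' converts silently. */
	calls += save_untyped(&t->untyped);

	/*
	 * The same slip with the opaque type is an incompatible pointer
	 * conversion, which compilers diagnose:
	 *
	 *	save_typed(&t->typed);
	 */
	calls += save_typed(t->typed);

	return calls;
}
```

With both pointers NULL, the untyped call still sees a non-NULL argument
(the address of the field), while the typed call correctly sees NULL.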

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 19e670ae67598..560814acc60c0 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -163,7 +163,7 @@ extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
 struct cpu_fp_state {
 	struct user_fpsimd_state *st;
 	struct sve_state *sve_state;
-	void *sme_state;
+	struct sme_state *sme_state;
 	u64 *svcr;
 	u64 *fpmr;
 	unsigned int sve_vl;
@@ -199,7 +199,7 @@ static inline void *thread_zt_state(struct thread_struct *thread)
 {
 	/* The ZT register state is stored immediately after the ZA state */
 	unsigned int sme_vq = sve_vq_from_vl(thread_get_sme_vl(thread));
-	return thread->sme_state + ZA_SIG_REGS_SIZE(sme_vq);
+	return (void *)thread->sme_state + ZA_SIG_REGS_SIZE(sme_vq);
 }
 
 static inline unsigned int sve_get_vl(void)
@@ -218,8 +218,8 @@ static inline unsigned int sve_get_vl(void)
 extern void sve_save_state(struct sve_state *state, int save_ffr);
 extern void sve_load_state(const struct sve_state *state, int restore_ffr);
 extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
-extern void sme_save_state(void *state, int zt);
-extern void sme_load_state(void const *state, int zt);
+extern void sme_save_state(struct sme_state *state, int zt);
+extern void sme_load_state(const struct sme_state *state, int zt);
 
 struct arm64_cpu_capabilities;
 extern void cpu_enable_fpsimd(const struct arm64_cpu_capabilities *__unused);
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 1c2ffd063baa8..7304d9cca3e85 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -131,6 +131,7 @@ enum fp_type {
 };
 
 struct sve_state;		/* Opaque type */
+struct sme_state;		/* Opaque type */
 
 struct cpu_context {
 	unsigned long x19;
@@ -167,7 +168,7 @@ struct thread_struct {
 	enum fp_type		fp_type;	/* registers FPSIMD or SVE? */
 	unsigned int		fpsimd_cpu;
 	struct sve_state	*sve_state;	/* SVE registers, if any */
-	void			*sme_state;	/* ZA and ZT state, if any */
+	struct sme_state	*sme_state;	/* ZA and ZT state, if any */
 	unsigned int		vl[ARM64_VEC_MAX];	/* vector length */
 	unsigned int		vl_onexec[ARM64_VEC_MAX]; /* vl after next exec */
 	unsigned long		fault_address;	/* fault info */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 66d880d081671..f9b3eeacf130d 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -808,7 +808,7 @@ static int change_live_vector_length(struct task_struct *task,
 	unsigned int sve_vl = task_get_sve_vl(task);
 	unsigned int sme_vl = task_get_sme_vl(task);
 	struct sve_state *sve_state = NULL;
-	void *sme_state = NULL;
+	struct sme_state *sme_state = NULL;
 
 	if (type == ARM64_VEC_SME)
 		sme_vl = vl;
@@ -1645,7 +1645,7 @@ static void fpsimd_flush_thread_vl(enum vec_type type)
 void fpsimd_flush_thread(void)
 {
 	struct sve_state *sve_state = NULL;
-	void *sme_state = NULL;
+	struct sme_state *sme_state = NULL;
 
 	if (!system_supports_fpsimd())
 		return;
-- 
2.30.2



^ permalink raw reply related

* [PATCH 18/18] arm64: fpsimd: Remove <asm/fpsimdmacros.h>
From: Mark Rutland @ 2026-05-21 13:25 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm
  Cc: broonie, catalin.marinas, james.morse, mark.rutland, maz, oupton,
	tabba, will
In-Reply-To: <20260521132556.584676-1-mark.rutland@arm.com>

We no longer need any of the remaining macros in <asm/fpsimdmacros.h>.

Remove all of it.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/fpsimdmacros.h | 64 ---------------------------
 1 file changed, 64 deletions(-)
 delete mode 100644 arch/arm64/include/asm/fpsimdmacros.h

diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
deleted file mode 100644
index a763fd03ffef3..0000000000000
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ /dev/null
@@ -1,64 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * FP/SIMD state saving and restoring macros
- *
- * Copyright (C) 2012 ARM Ltd.
- * Author: Catalin Marinas <catalin.marinas@arm.com>
- */
-
-#include <asm/assembler.h>
-
-/* Sanity-check macros to help avoid encoding garbage instructions */
-
-.macro _check_general_reg nr
-	.if (\nr) < 0 || (\nr) > 30
-		.error "Bad register number \nr."
-	.endif
-.endm
-
-.macro _sve_check_zreg znr
-	.if (\znr) < 0 || (\znr) > 31
-		.error "Bad Scalable Vector Extension vector register number \znr."
-	.endif
-.endm
-
-.macro _sve_check_preg pnr
-	.if (\pnr) < 0 || (\pnr) > 15
-		.error "Bad Scalable Vector Extension predicate register number \pnr."
-	.endif
-.endm
-
-.macro _check_num n, min, max
-	.if (\n) < (\min) || (\n) > (\max)
-		.error "Number \n out of range [\min,\max]"
-	.endif
-.endm
-
-.macro _sme_check_wv v
-	.if (\v) < 12 || (\v) > 15
-		.error "Bad vector select register \v."
-	.endif
-.endm
-
-.macro __for from:req, to:req
-	.if (\from) == (\to)
-		_for__body %\from
-	.else
-		__for %\from, %((\from) + ((\to) - (\from)) / 2)
-		__for %((\from) + ((\to) - (\from)) / 2 + 1), %\to
-	.endif
-.endm
-
-.macro _for var:req, from:req, to:req, insn:vararg
-	.macro _for__body \var:req
-		.noaltmacro
-		\insn
-		.altmacro
-	.endm
-
-	.altmacro
-	__for \from, \to
-	.noaltmacro
-
-	.purgem _for__body
-.endm
-- 
2.30.2



^ permalink raw reply related

* [PATCH 15/18] arm64: fpsimd: Move SVE save/restore inline
From: Mark Rutland @ 2026-05-21 13:25 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm
  Cc: broonie, catalin.marinas, james.morse, mark.rutland, maz, oupton,
	tabba, will
In-Reply-To: <20260521132556.584676-1-mark.rutland@arm.com>

Currently the SVE register save/restore sequences are written in
out-of-line assembly routines. While this works, it's somewhat painful:

* As KVM needs to be able to use the sequences in hyp code, separate
  assembly files are used for the regular kernel and KVM code. While the
  common logic is shared in assembly macros, this still requires some
  duplication, and has led to some trivial divergence.

* As the SVE LDR/STR instructions have limited addressing modes, the
  assembly macros use an awkward pattern requiring negative offsets.
  This could be written more clearly with addresses being generated in C
  code.

* As the FFR does not always exist in streaming mode, some awkward
  conditional branching has been written in assembly which could be
  clearer in C (and would permit the compiler to optimize out
  unnecessary branches in some cases).

* For historical reasons, the assembly macros take some register
  arguments as numerical indices (e.g. "sve_save 0, x1" uses x0 and x1),
  which is simply confusing.

* For historical reasons, the SVE save/restore code and FPSIMD
  save/restore code have distinct sequences for FPSR and FPCR. Ideally
  this logic would be shared.

* The assembly sequences can't be instrumented, and so it's harder than
  necessary to catch memory safety issues.

To handle the above, move the SVE register save/restore sequences
to inline assembly.

Neither GCC nor LLVM instrument memory arguments to inline assembly, so
explicit instrumentation is added in the same manner as other assembly
routines. This instrumentation is implicitly disabled by Kbuild for nVHE
hyp code.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h         | 119 +++++++++++++++++++++++-
 arch/arm64/include/asm/fpsimdmacros.h   |  61 ------------
 arch/arm64/include/asm/kvm_hyp.h        |   3 -
 arch/arm64/kernel/entry-fpsimd.S        |  22 -----
 arch/arm64/kvm/hyp/fpsimd.S             |  21 -----
 arch/arm64/kvm/hyp/include/hyp/switch.h |   4 +-
 arch/arm64/kvm/hyp/nvhe/Makefile        |   2 +-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c      |   4 +-
 arch/arm64/kvm/hyp/vhe/Makefile         |   2 +-
 9 files changed, 123 insertions(+), 115 deletions(-)
 delete mode 100644 arch/arm64/kvm/hyp/fpsimd.S
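
As a worked example of the address generation that moves into C, the
sketch below computes the byte offsets that __sve_save_p() and
__sve_load_p() derive from the VL. It is a standalone model: struct
sve_layout and sve_layout_for_vl() are hypothetical names used only
here.

```c
#include <stddef.h>

#define SVE_NUM_ZREGS	32
#define SVE_NUM_PREGS	16

struct sve_layout {
	size_t pregs;	/* byte offset of P0 */
	size_t pffr;	/* byte offset of the FFR slot */
	size_t size;	/* total bytes covered */
};

/* 'vl' is the vector length in bytes, as returned by sve_get_vl(). */
static struct sve_layout sve_layout_for_vl(size_t vl)
{
	size_t pl = vl / 8;	/* one predicate bit per vector byte */
	struct sve_layout l;

	l.pregs = SVE_NUM_ZREGS * vl;		/* Z0..Z31 come first */
	l.pffr = l.pregs + SVE_NUM_PREGS * pl;	/* then P0..P15 */
	l.size = l.pffr + pl;			/* FFR is one predicate */
	return l;
}
```

For a 16-byte VL this gives pregs = 512 and pffr = 544, i.e. 34 * VL,
which is where the "\n - 34" offsets in the removed sve_save and
sve_load macros come from.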

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 560814acc60c0..d005324bbcf3e 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -215,8 +215,123 @@ static inline unsigned int sve_get_vl(void)
 	return vl;
 }
 
-extern void sve_save_state(struct sve_state *state, int save_ffr);
-extern void sve_load_state(const struct sve_state *state, int restore_ffr);
+#define FOR_EACH_Z_REG(idx_str, asm_str)											\
+	"	.irp " idx_str ",0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31\n"	\
+	asm_str	"\n"														\
+	"	.endr\n"
+
+#define FOR_EACH_P_REG(idx_str, asm_str)											\
+	"	.irp " idx_str ",0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15\n"	\
+	asm_str	"\n"								\
+	"	.endr\n"
+
+static inline void __sve_save_z(struct sve_state *state, unsigned long vl)
+{
+	instrument_write(state, SVE_NUM_ZREGS * vl);
+	asm volatile(
+	__SVE_PREAMBLE
+	FOR_EACH_Z_REG("n", "str	z\\n, [%[zregs], #\\n, MUL VL]")
+	:
+	: [zregs] "r" (state)
+	: "memory"
+	);
+}
+
+static inline void __sve_load_z(const struct sve_state *state, unsigned long vl)
+{
+	instrument_read(state, SVE_NUM_ZREGS * vl);
+	asm volatile(
+	__SVE_PREAMBLE
+	FOR_EACH_Z_REG("n", "ldr	z\\n, [%[zregs], #\\n, MUL VL]")
+	:
+	: [zregs] "r" (state)
+	: "memory"
+	);
+}
+
+static inline void __sve_save_p(struct sve_state *state, unsigned long vl, bool ffr)
+{
+	void *pregs = (void *)state + SVE_NUM_ZREGS * vl;
+	unsigned long pl = vl / 8;
+	void *pffr = pregs + SVE_NUM_PREGS * pl;
+
+	instrument_write(pregs, SVE_NUM_PREGS * pl);
+	asm volatile(
+	__SVE_PREAMBLE
+	FOR_EACH_P_REG("n", "str	p\\n, [%[pregs], #\\n, MUL VL]\n")
+	:
+	: [pregs] "r" (pregs)
+	: "memory"
+	);
+
+	instrument_write(pffr, pl);
+	if (ffr) {
+		asm volatile(
+		__SVE_PREAMBLE
+		"	rdffr	p0.b\n"
+		"	str	p0, [%[pffr]]\n"
+		"	ldr	p0, [%[pregs]]\n"
+		:
+		: [pregs] "r" (pregs),
+		  [pffr] "r" (pffr)
+		: "memory"
+		);
+	} else {
+		asm volatile(
+		__SVE_PREAMBLE
+		"	pfalse	p0.b\n"
+		"	str	p0, [%[pffr]]\n"
+		"	ldr	p0, [%[pregs]]\n"
+		:
+		: [pregs] "r" (pregs),
+		  [pffr] "r" (pffr)
+		: "memory"
+		);
+	}
+}
+
+static inline void __sve_load_p(const struct sve_state *state, unsigned long vl, bool ffr)
+{
+	const void *pregs = (const void *)state + SVE_NUM_ZREGS * vl;
+	unsigned long pl = vl / 8;
+	const void *pffr = pregs + SVE_NUM_PREGS * pl;
+
+	if (ffr) {
+		instrument_read(pffr, pl);
+		asm volatile(
+		__SVE_PREAMBLE
+		"	ldr	p0, [%[pffr]]\n"
+		"	wrffr	p0.b\n"
+		:
+		: [pffr] "r" (pffr)
+		: "memory"
+		);
+	}
+
+	instrument_read(pregs, SVE_NUM_PREGS * pl);
+	asm volatile(
+	__SVE_PREAMBLE
+	FOR_EACH_P_REG("n", "ldr	p\\n, [%[pregs], #\\n, MUL VL]\n")
+	:
+	: [pregs] "r" (pregs)
+	: "memory"
+	);
+}
+
+static inline void sve_save_state(struct sve_state *state, bool ffr)
+{
+	unsigned long vl = sve_get_vl();
+	__sve_save_z(state, vl);
+	__sve_save_p(state, vl, ffr);
+}
+
+static inline void sve_load_state(const struct sve_state *state, bool ffr)
+{
+	unsigned long vl = sve_get_vl();
+	__sve_load_z(state, vl);
+	__sve_load_p(state, vl, ffr);
+}
+
 extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
 extern void sme_save_state(struct sme_state *state, int zt);
 extern void sme_load_state(const struct sme_state *state, int zt);
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index 08f4863e67715..ebf8b47313e90 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -42,36 +42,6 @@
 
 /* Deprecated macros for SVE instructions */
 
-/* STR (vector): STR Z\nz, [X\nxbase, #\offset, MUL VL] */
-.macro _sve_str_v nz, nxbase, offset=0
-	.arch_extension sve
-	str	z\nz, [X\nxbase, #\offset, MUL VL]
-.endm
-
-/* LDR (vector): LDR Z\nz, [X\nxbase, #\offset, MUL VL] */
-.macro _sve_ldr_v nz, nxbase, offset=0
-	.arch_extension sve
-	ldr	z\nz, [X\nxbase, #\offset, MUL VL]
-.endm
-
-/* STR (predicate): STR P\np, [X\nxbase, #\offset, MUL VL] */
-.macro _sve_str_p np, nxbase, offset=0
-	.arch_extension sve
-	str	p\np, [X\nxbase, #\offset, MUL VL]
-.endm
-
-/* LDR (predicate): LDR P\np, [X\nxbase, #\offset, MUL VL] */
-.macro _sve_ldr_p np, nxbase, offset=0
-	.arch_extension sve
-	ldr p\np, [x\nxbase, #\offset, MUL VL]
-.endm
-
-/* RDFFR (unpredicated): RDFFR P\np.B */
-.macro _sve_rdffr np
-	.arch_extension sve
-	rdffr p\np\().b
-.endm
-
 /* WRFFR P\np.B */
 .macro _sve_wrffr np
 	wrffr p\np\().b
@@ -176,37 +146,6 @@
 		_sve_wrffr	0
 .endm
 
-.macro _sve_pffr ptr
-	.arch_extension sve
-	addvl	\ptr, \ptr, #16
-	addvl	\ptr, \ptr, #16
-	addpl	\ptr, \ptr, #16
-.endm
-
-.macro sve_save nxbase, save_ffr
-		_sve_pffr	x\nxbase
- _for n, 0, 31,	_sve_str_v	\n, \nxbase, \n - 34
- _for n, 0, 15,	_sve_str_p	\n, \nxbase, \n - 16
-		cbz		\save_ffr, 921f
-		_sve_rdffr	0
-		b		922f
-921:
-		_sve_pfalse	0			// Zero out FFR
-922:
-		_sve_str_p	0, \nxbase
-		_sve_ldr_p	0, \nxbase, -16
-.endm
-
-.macro sve_load nxbase, restore_ffr
-		_sve_pffr	x\nxbase
- _for n, 0, 31,	_sve_ldr_v	\n, \nxbase, \n - 34
-		cbz		\restore_ffr, 921f
-		_sve_ldr_p	0, \nxbase
-		_sve_wrffr	0
-921:
- _for n, 0, 15,	_sve_ldr_p	\n, \nxbase, \n - 16
-.endm
-
 .macro sme_save_za nxbase, xvl, nw
 	mov	w\nw, #0
 
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 38356eee592ad..ad19de1d0654f 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -121,9 +121,6 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu);
 void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu);
 #endif
 
-void __sve_save_state(struct sve_state *sve, int save_ffr);
-void __sve_restore_state(struct sve_state *sve, int restore_ffr);
-
 u64 __guest_enter(struct kvm_vcpu *vcpu);
 
 bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt, u32 func_id);
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 4fa00c94f28b7..0575d90e6dffb 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -13,28 +13,6 @@
 
 #ifdef CONFIG_ARM64_SVE
 
-/*
- * Save the SVE state
- *
- * x0 - pointer to buffer for state
- * x1 - Save FFR if non-zero
- */
-SYM_FUNC_START(sve_save_state)
-	sve_save 0, x1
-	ret
-SYM_FUNC_END(sve_save_state)
-
-/*
- * Load the SVE state
- *
- * x0 - pointer to buffer for state
- * x1 - Restore FFR if non-zero
- */
-SYM_FUNC_START(sve_load_state)
-	sve_load 0, x1
-	ret
-SYM_FUNC_END(sve_load_state)
-
 /*
  * Zero all SVE registers but the first 128-bits of each vector
  *
diff --git a/arch/arm64/kvm/hyp/fpsimd.S b/arch/arm64/kvm/hyp/fpsimd.S
deleted file mode 100644
index beacec33b2541..0000000000000
--- a/arch/arm64/kvm/hyp/fpsimd.S
+++ /dev/null
@@ -1,21 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2015 - ARM Ltd
- * Author: Marc Zyngier <marc.zyngier@arm.com>
- */
-
-#include <linux/linkage.h>
-
-#include <asm/fpsimdmacros.h>
-
-	.text
-
-SYM_FUNC_START(__sve_restore_state)
-	sve_load 0, x1
-	ret
-SYM_FUNC_END(__sve_restore_state)
-
-SYM_FUNC_START(__sve_save_state)
-	sve_save 0, x1
-	ret
-SYM_FUNC_END(__sve_save_state)
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 72e658255cda7..41c60c9eea423 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -467,7 +467,7 @@ static inline void __hyp_sve_restore_guest(struct kvm_vcpu *vcpu)
 	 * vCPU. Start off with the max VL so we can load the SVE state.
 	 */
 	sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, SYS_ZCR_EL2);
-	__sve_restore_state(kern_hyp_va(vcpu->arch.sve_state), true);
+	sve_load_state(kern_hyp_va(vcpu->arch.sve_state), true);
 	fpsimd_load_common(&vcpu->arch.ctxt.fp_regs);
 
 	/*
@@ -488,7 +488,7 @@ static inline void __hyp_sve_save_host(void)
 
 	ctxt_sys_reg(hctxt, ZCR_EL1) = read_sysreg_el1(SYS_ZCR);
 	write_sysreg_s(sve_vq_from_vl(kvm_host_sve_max_vl) - 1, SYS_ZCR_EL2);
-	__sve_save_state(sve_regs, true);
+	sve_save_state(sve_regs, true);
 	fpsimd_save_common(&hctxt->fp_regs);
 }
 
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 62cdfbff75625..f57450ebcb498 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -26,7 +26,7 @@ hyp-obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o
 	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o page_alloc.o \
 	 cache.o setup.o mm.o mem_protect.o sys_regs.o pkvm.o stacktrace.o ffa.o
 hyp-obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
-	 ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o ../vgic-v5-sr.o
+	 ../hyp-entry.o ../exception.o ../pgtable.o ../vgic-v5-sr.o
 hyp-obj-y += ../../../kernel/smccc-call.o
 hyp-obj-$(CONFIG_LIST_HARDENED) += list_debug.o
 hyp-obj-$(CONFIG_NVHE_EL2_TRACING) += clock.o trace.o events.o
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 72d025b2178a7..5c43943f24380 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -35,7 +35,7 @@ static void __hyp_sve_save_guest(struct kvm_vcpu *vcpu)
 	 * on the VL, so use a consistent (i.e., the maximum) guest VL.
 	 */
 	sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, SYS_ZCR_EL2);
-	__sve_save_state(kern_hyp_va(vcpu->arch.sve_state), true);
+	sve_save_state(kern_hyp_va(vcpu->arch.sve_state), true);
 	fpsimd_save_common(&vcpu->arch.ctxt.fp_regs);
 	write_sysreg_s(sve_vq_from_vl(kvm_host_sve_max_vl) - 1, SYS_ZCR_EL2);
 }
@@ -55,7 +55,7 @@ static void __hyp_sve_restore_host(void)
 	 * need to be revisited.
 	 */
 	write_sysreg_s(sve_vq_from_vl(kvm_host_sve_max_vl) - 1, SYS_ZCR_EL2);
-	__sve_restore_state(sve_regs, true);
+	sve_load_state(sve_regs, true);
 	fpsimd_load_common(&hctxt->fp_regs);
 	write_sysreg_el1(ctxt_sys_reg(hctxt, ZCR_EL1), SYS_ZCR);
 }
diff --git a/arch/arm64/kvm/hyp/vhe/Makefile b/arch/arm64/kvm/hyp/vhe/Makefile
index 9695328bbd96e..d6b3475145c0e 100644
--- a/arch/arm64/kvm/hyp/vhe/Makefile
+++ b/arch/arm64/kvm/hyp/vhe/Makefile
@@ -10,4 +10,4 @@ CFLAGS_switch.o += -Wno-override-init
 
 obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o
 obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
-	 ../fpsimd.o ../hyp-entry.o ../exception.o ../vgic-v5-sr.o
+	 ../hyp-entry.o ../exception.o ../vgic-v5-sr.o
-- 
2.30.2



^ permalink raw reply related

* [PATCH 16/18] arm64: fpsimd: Move sve_flush_live() inline
From: Mark Rutland @ 2026-05-21 13:25 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm
  Cc: broonie, catalin.marinas, james.morse, mark.rutland, maz, oupton,
	tabba, will
In-Reply-To: <20260521132556.584676-1-mark.rutland@arm.com>

Currently sve_flush_live() is written in out-of-line assembly. It would
be nice if we could move it inline such that control flow can be written
more clearly in C, and to permit the removal of otherwise unused
assembly macros.

The 'flush_ffr' argument is redundant as sve_flush_live() is always
called from non-streaming mode, and all callers pass 'true'. Remove the
argument and make it a requirement that the function is called from
non-streaming mode.

The 'vq_minus_1' argument is unnecessary, as sve_flush_live() can read
the live VL directly using the RDVL instruction (wrapped by the
sve_get_vl() helper function).

Move the function to C, with the simplifications above.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h       | 26 +++++++++++++++++++++++-
 arch/arm64/include/asm/fpsimdmacros.h | 29 ---------------------------
 arch/arm64/kernel/entry-common.c      |  8 ++------
 arch/arm64/kernel/entry-fpsimd.S      | 22 --------------------
 arch/arm64/kernel/fpsimd.c            |  2 +-
 5 files changed, 28 insertions(+), 59 deletions(-)
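
As a quick sanity check of the arithmetic only, the sketch below
compares the old "VQ - 1 is non-zero" test with the new "VL is larger
than 128 bits" test. It assumes the live VL matches the VL the old
callers passed in. The helper names are hypothetical, and the constant
16 stands in for sizeof(__uint128_t).

```c
#include <stdbool.h>

#define SVE_VQ_BYTES	16UL	/* one quadword, i.e. 128 bits */
#define SVE_VL_MAX	256UL	/* 2048 bits, the architectural maximum */

/* Old condition: callers passed VQ - 1, and 0 meant no extra Z state. */
static bool old_needs_z_flush(unsigned long vq_minus_1)
{
	return vq_minus_1 != 0;
}

/* New condition: compare the live VL in bytes against 128 bits. */
static bool new_needs_z_flush(unsigned long vl)
{
	return vl > SVE_VQ_BYTES;
}

/* Returns true if both conditions agree for every VL that is checked. */
static bool conditions_agree(void)
{
	unsigned long vl;

	for (vl = SVE_VQ_BYTES; vl <= SVE_VL_MAX; vl += SVE_VQ_BYTES) {
		unsigned long vq = vl / SVE_VQ_BYTES;

		if (old_needs_z_flush(vq - 1) != new_needs_z_flush(vl))
			return false;
	}
	return true;
}
```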

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index d005324bbcf3e..550987b36206a 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -332,7 +332,31 @@ static inline void sve_load_state(const struct sve_state *state, bool ffr)
 	__sve_load_p(state, vl, ffr);
 }
 
-extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
+
+/*
+ * Zero all SVE registers except for the first 128 bits of each vector.
+ *
+ * The caller must ensure that the VL has been configured and the CPU must be
+ * in non-streaming mode.
+ */
+static inline void sve_flush_live(void)
+{
+	unsigned long vl = sve_get_vl();
+
+	if (vl > sizeof(__uint128_t)) {
+		asm volatile(
+		__FPSIMD_PREAMBLE
+		FOR_EACH_Z_REG("n", "mov	v\\n\\().16b, v\\n\\().16b")
+		);
+	}
+
+	asm volatile(
+	__SVE_PREAMBLE
+	FOR_EACH_P_REG("n", "pfalse	p\\n\\().b")
+	"	wrffr	p0.b\n"
+	);
+}
+
 extern void sme_save_state(struct sme_state *state, int zt);
 extern void sme_load_state(const struct sme_state *state, int zt);
 
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index ebf8b47313e90..9e352b5c6b764 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -40,19 +40,6 @@
 	.endif
 .endm
 
-/* Deprecated macros for SVE instructions */
-
-/* WRFFR P\np.B */
-.macro _sve_wrffr np
-	wrffr p\np\().b
-.endm
-
-/* PFALSE P\np.B */
-.macro _sve_pfalse np
-	.arch_extension sve
-	pfalse	p\np\().b
-.endm
-
 /* Deprecated macros for SME instructions */
 
 /* RDSVL X\nx, #\imm */
@@ -130,22 +117,6 @@
 	.purgem _for__body
 .endm
 
-/* Preserve the first 128-bits of Znz and zero the rest. */
-.macro _sve_flush_z nz
-	_sve_check_zreg \nz
-	mov	v\nz\().16b, v\nz\().16b
-.endm
-
-.macro sve_flush_z
- _for n, 0, 31, _sve_flush_z	\n
-.endm
-.macro sve_flush_p
- _for n, 0, 15, _sve_pfalse	\n
-.endm
-.macro sve_flush_ffr
-		_sve_wrffr	0
-.endm
-
 .macro sme_save_za nxbase, xvl, nw
 	mov	w\nw, #0
 
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index cb54335465f66..2352297330e12 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -237,12 +237,8 @@ static inline void fpsimd_syscall_enter(void)
 	if (!system_supports_sve())
 		return;
 
-	if (test_thread_flag(TIF_SVE)) {
-		unsigned int sve_vq_minus_one;
-
-		sve_vq_minus_one = sve_vq_from_vl(task_get_sve_vl(current)) - 1;
-		sve_flush_live(true, sve_vq_minus_one);
-	}
+	if (test_thread_flag(TIF_SVE))
+		sve_flush_live();
 
 	/*
 	 * Any live non-FPSIMD SVE state has been zeroed. Allow
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 0575d90e6dffb..bff941eea9566 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -11,28 +11,6 @@
 #include <asm/assembler.h>
 #include <asm/fpsimdmacros.h>
 
-#ifdef CONFIG_ARM64_SVE
-
-/*
- * Zero all SVE registers but the first 128-bits of each vector
- *
- * VQ must already be configured by caller, any further updates of VQ
- * will need to ensure that the register state remains valid.
- *
- * x0 = include FFR?
- * x1 = VQ - 1
- */
-SYM_FUNC_START(sve_flush_live)
-	cbz		x1, 1f	// A VQ-1 of 0 is 128 bits so no extra Z state
-	sve_flush_z
-1:	sve_flush_p
-	tbz		x0, #0, 2f
-	sve_flush_ffr
-2:	ret
-SYM_FUNC_END(sve_flush_live)
-
-#endif /* CONFIG_ARM64_SVE */
-
 #ifdef CONFIG_ARM64_SME
 
 /*
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index f9b3eeacf130d..42177b439b3c7 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1338,7 +1338,7 @@ void do_sve_acc(unsigned long esr, struct pt_regs *regs)
 	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		unsigned long vq = sve_vq_from_vl(task_get_sve_vl(current));
 		sysreg_clear_set_s(SYS_ZCR_EL1, ZCR_ELx_LEN, vq - 1);
-		sve_flush_live(true, vq - 1);
+		sve_flush_live();
 		fpsimd_bind_task_to_cpu();
 	} else {
 		fpsimd_to_sve(current);
-- 
2.30.2




* [PATCH 13/18] arm64: fpsimd: Use opaque type for SVE state
From: Mark Rutland @ 2026-05-21 13:25 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm
  Cc: broonie, catalin.marinas, james.morse, mark.rutland, maz, oupton,
	tabba, will
In-Reply-To: <20260521132556.584676-1-mark.rutland@arm.com>

As the SVE state size can vary at runtime, we don't have a concrete type
for the in-memory SVE state, and pass this around using a pointer to
void. The functions which save/restore the SVE state have a very unusual
calling convention, expecting a pointer to the FFR *in the middle of*
the in-memory SVE state, which is also passed as a pointer to void.
Passing a pointer to the FFR also requires that callers find the live VL
and perform some arithmetic, which callers implement differently.

Using pointer to void means that it's very easy to introduce errors that
cannot be caught by the compiler (e.g. as 'void **' can be assigned to
'void *'). In general this is unnecessarily confusing and fragile.

Improve this by adding an opaque 'struct sve_state', and consistently
passing a pointer to this, performing the necessary offsetting *within*
the save/restore functions.
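
As a rough user-space illustration of the point above (not kernel code),
the sketch below contrasts a pointer-to-void interface with one that
takes a pointer to an opaque struct. The names 'struct demo_state',
demo_save_void() and demo_save_typed() are hypothetical and exist only
for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical opaque type: declared but never defined. */
struct demo_state;

/* Pointer-to-void interface: any object pointer converts silently. */
static void demo_save_void(void *state, const unsigned char *src, size_t len)
{
	memcpy(state, src, len);
}

/* Typed interface: other pointer types need an explicit cast. */
static void demo_save_typed(struct demo_state *state,
			    const unsigned char *src, size_t len)
{
	/* Any byte offsetting is done here, inside the callee. */
	memcpy((unsigned char *)state, src, len);
}
```

With these definitions, passing a 'void **' to demo_save_void() compiles
silently, whereas passing it to demo_save_typed() is diagnosed as an
incompatible pointer type (a warning or an error, depending on the
compiler and its version).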

For the moment, the offsetting is performed in a new '_sve_pffr'
assembly macro, using the ADDVL and ADDPL instructions. These add a
multiple of the live vector length and predicate length respectively.
The ADDVL immediate range cannot encode 32, so this is split into two
increments of 16.
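
The arithmetic can be checked with a small user-space sketch, assuming
the register dump holds 32 vectors of VL bytes followed by 16 predicates
of VL/8 bytes, with the FFR placed after them. demo_ffr_offset() is a
hypothetical name used only for illustration.

```c
#include <stddef.h>

/*
 * Byte offset of the FFR from the start of the register dump, for a
 * vector length of 'vl' bytes. A predicate holds one bit per vector
 * byte, so the predicate length is vl / 8.
 */
static size_t demo_ffr_offset(size_t vl)
{
	size_t pl = vl / 8;
	size_t off = 0;

	off += 16 * vl;		/* addvl ptr, ptr, #16 */
	off += 16 * vl;		/* addvl ptr, ptr, #16 */
	off += 16 * pl;		/* addpl ptr, ptr, #16 */

	return off;
}
```

For a 16-byte VL this gives 256 + 256 + 32 = 544 bytes, i.e. 34 * VL,
which matches the 'n - 34' vector offsets used by the sve_save and
sve_load macros.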

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h         | 24 +++---------------------
 arch/arm64/include/asm/fpsimdmacros.h   |  9 +++++++++
 arch/arm64/include/asm/kvm_host.h       |  8 ++------
 arch/arm64/include/asm/kvm_hyp.h        |  4 ++--
 arch/arm64/include/asm/processor.h      |  4 +++-
 arch/arm64/kernel/fpsimd.c              | 21 ++++++++++-----------
 arch/arm64/kvm/arm.c                    |  4 ++--
 arch/arm64/kvm/guest.c                  |  4 ++--
 arch/arm64/kvm/hyp/include/hyp/switch.h |  8 +++-----
 arch/arm64/kvm/hyp/nvhe/hyp-main.c      |  7 +++----
 arch/arm64/kvm/hyp/nvhe/setup.c         |  2 +-
 11 files changed, 40 insertions(+), 55 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 19b373ad0ebf7..19e670ae67598 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -162,7 +162,7 @@ extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
 
 struct cpu_fp_state {
 	struct user_fpsimd_state *st;
-	void *sve_state;
+	struct sve_state *sve_state;
 	void *sme_state;
 	u64 *svcr;
 	u64 *fpmr;
@@ -195,24 +195,6 @@ extern void task_smstop_sm(struct task_struct *task);
 /* Maximum VL that SVE/SME VL-agnostic software can transparently support */
 #define VL_ARCH_MAX 0x100
 
-/* Offset of FFR in the SVE register dump */
-static inline size_t sve_ffr_offset(int vl)
-{
-	return SVE_SIG_FFR_OFFSET(sve_vq_from_vl(vl)) - SVE_SIG_REGS_OFFSET;
-}
-
-static inline void *sve_pffr(struct thread_struct *thread)
-{
-	unsigned int vl;
-
-	if (system_supports_sme() && thread_sm_enabled(thread))
-		vl = thread_get_sme_vl(thread);
-	else
-		vl = thread_get_sve_vl(thread);
-
-	return (char *)thread->sve_state + sve_ffr_offset(vl);
-}
-
 static inline void *thread_zt_state(struct thread_struct *thread)
 {
 	/* The ZT register state is stored immediately after the ZA state */
@@ -233,8 +215,8 @@ static inline unsigned int sve_get_vl(void)
 	return vl;
 }
 
-extern void sve_save_state(void *state, int save_ffr);
-extern void sve_load_state(void const *state, int restore_ffr);
+extern void sve_save_state(struct sve_state *state, int save_ffr);
+extern void sve_load_state(const struct sve_state *state, int restore_ffr);
 extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
 extern void sme_save_state(void *state, int zt);
 extern void sme_load_state(void const *state, int zt);
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index 01b5e6d51ba79..08f4863e67715 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -176,7 +176,15 @@
 		_sve_wrffr	0
 .endm
 
+.macro _sve_pffr ptr
+	.arch_extension sve
+	addvl	\ptr, \ptr, #16
+	addvl	\ptr, \ptr, #16
+	addpl	\ptr, \ptr, #16
+.endm
+
 .macro sve_save nxbase, save_ffr
+		_sve_pffr	x\nxbase
  _for n, 0, 31,	_sve_str_v	\n, \nxbase, \n - 34
  _for n, 0, 15,	_sve_str_p	\n, \nxbase, \n - 16
 		cbz		\save_ffr, 921f
@@ -190,6 +198,7 @@
 .endm
 
 .macro sve_load nxbase, restore_ffr
+		_sve_pffr	x\nxbase
  _for n, 0, 31,	_sve_ldr_v	\n, \nxbase, \n - 34
 		cbz		\restore_ffr, 921f
 		_sve_ldr_p	0, \nxbase
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index ae24617380b8f..a366509c5944e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -759,7 +759,7 @@ struct kvm_host_data {
 	 * Hyp VA.
 	 * sve_regs is only used in pKVM and if system_supports_sve().
 	 */
-	u8	*sve_regs;
+	struct sve_state *sve_regs;
 
 	/* Ownership of the FP regs */
 	enum {
@@ -853,7 +853,7 @@ struct kvm_vcpu_arch {
 	 * floating point code saves the register state of a task it
 	 * records which view it saved in fp_type.
 	 */
-	void *sve_state;
+	struct sve_state *sve_state;
 	enum fp_type fp_type;
 	unsigned int sve_max_vl;
 
@@ -1097,10 +1097,6 @@ struct kvm_vcpu_arch {
 #define NESTED_SERROR_PENDING	__vcpu_single_flag(sflags, BIT(8))
 
 
-/* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
-#define vcpu_sve_pffr(vcpu) (kern_hyp_va((vcpu)->arch.sve_state) +	\
-			     sve_ffr_offset((vcpu)->arch.sve_max_vl))
-
 #define vcpu_sve_max_vq(vcpu)	sve_vq_from_vl((vcpu)->arch.sve_max_vl)
 
 #define vcpu_sve_zcr_elx(vcpu)						\
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 8c4602c8f4356..38356eee592ad 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -121,8 +121,8 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu);
 void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu);
 #endif
 
-void __sve_save_state(void *sve, int save_ffr);
-void __sve_restore_state(void *sve, int restore_ffr);
+void __sve_save_state(struct sve_state *sve, int save_ffr);
+void __sve_restore_state(struct sve_state *sve, int restore_ffr);
 
 u64 __guest_enter(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index e30c4c8e3a7a7..1c2ffd063baa8 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -130,6 +130,8 @@ enum fp_type {
 	FP_STATE_SVE,
 };
 
+struct sve_state;		/* Opaque type */
+
 struct cpu_context {
 	unsigned long x19;
 	unsigned long x20;
@@ -164,7 +166,7 @@ struct thread_struct {
 
 	enum fp_type		fp_type;	/* registers FPSIMD or SVE? */
 	unsigned int		fpsimd_cpu;
-	void			*sve_state;	/* SVE registers, if any */
+	struct sve_state	*sve_state;	/* SVE registers, if any */
 	void			*sme_state;	/* ZA and ZT state, if any */
 	unsigned int		vl[ARM64_VEC_MAX];	/* vector length */
 	unsigned int		vl_onexec[ARM64_VEC_MAX]; /* vl after next exec */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 9806fea8fea7c..66d880d081671 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -425,8 +425,7 @@ static void task_fpsimd_load(void)
 
 	if (restore_sve_regs) {
 		WARN_ON_ONCE(current->thread.fp_type != FP_STATE_SVE);
-		sve_load_state(sve_pffr(&current->thread),
-			       restore_ffr);
+		sve_load_state(current->thread.sve_state, restore_ffr);
 		fpsimd_load_common(&current->thread.uw.fpsimd_state);
 	} else {
 		WARN_ON_ONCE(current->thread.fp_type != FP_STATE_FPSIMD);
@@ -507,9 +506,7 @@ static void fpsimd_save_user_state(void)
 			return;
 		}
 
-		sve_save_state((char *)last->sve_state +
-					sve_ffr_offset(vl),
-			       save_ffr);
+		sve_save_state(last->sve_state, save_ffr);
 		fpsimd_save_common(last->st);
 		*last->fp_type = FP_STATE_SVE;
 	} else {
@@ -641,7 +638,8 @@ static __uint128_t arm64_cpu_to_le128(__uint128_t x)
 
 #define arm64_le128_to_cpu(x) arm64_cpu_to_le128(x)
 
-static void __fpsimd_to_sve(void *sst, struct user_fpsimd_state const *fst,
+static void __fpsimd_to_sve(struct sve_state *sst,
+			    struct user_fpsimd_state const *fst,
 			    unsigned int vq)
 {
 	unsigned int i;
@@ -668,7 +666,7 @@ static void __fpsimd_to_sve(void *sst, struct user_fpsimd_state const *fst,
 static inline void fpsimd_to_sve(struct task_struct *task)
 {
 	unsigned int vq;
-	void *sst = task->thread.sve_state;
+	struct sve_state *sst = task->thread.sve_state;
 	struct user_fpsimd_state const *fst = &task->thread.uw.fpsimd_state;
 
 	if (!system_supports_sve() && !system_supports_sme())
@@ -692,7 +690,7 @@ static inline void fpsimd_to_sve(struct task_struct *task)
 static inline void sve_to_fpsimd(struct task_struct *task)
 {
 	unsigned int vq, vl;
-	void const *sst = task->thread.sve_state;
+	const struct sve_state *sst = task->thread.sve_state;
 	struct user_fpsimd_state *fst = &task->thread.uw.fpsimd_state;
 	unsigned int i;
 	__uint128_t const *p;
@@ -791,7 +789,7 @@ void fpsimd_sync_from_effective_state(struct task_struct *task)
 void fpsimd_sync_to_effective_state_zeropad(struct task_struct *task)
 {
 	unsigned int vq;
-	void *sst = task->thread.sve_state;
+	struct sve_state *sst = task->thread.sve_state;
 	struct user_fpsimd_state const *fst = &task->thread.uw.fpsimd_state;
 
 	if (task->thread.fp_type != FP_STATE_SVE)
@@ -809,7 +807,8 @@ static int change_live_vector_length(struct task_struct *task,
 {
 	unsigned int sve_vl = task_get_sve_vl(task);
 	unsigned int sme_vl = task_get_sme_vl(task);
-	void *sve_state = NULL, *sme_state = NULL;
+	struct sve_state *sve_state = NULL;
+	void *sme_state = NULL;
 
 	if (type == ARM64_VEC_SME)
 		sme_vl = vl;
@@ -1645,7 +1644,7 @@ static void fpsimd_flush_thread_vl(enum vec_type type)
 
 void fpsimd_flush_thread(void)
 {
-	void *sve_state = NULL;
+	struct sve_state *sve_state = NULL;
 	void *sme_state = NULL;
 
 	if (!system_supports_fpsimd())
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index f9fc85a0344e1..7a3db4d7dcdef 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2499,7 +2499,7 @@ static void __init teardown_hyp_mode(void)
 			continue;
 
 		if (free_sve) {
-			u8 *sve_regs;
+			struct sve_state *sve_regs;
 
 			sve_regs = per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->sve_regs;
 			free_pages((unsigned long) sve_regs, pkvm_host_sve_state_order());
@@ -2648,7 +2648,7 @@ static void finalize_init_hyp_mode(void)
 
 	if (system_supports_sve() && is_protected_kvm_enabled()) {
 		for_each_possible_cpu(cpu) {
-			u8 *sve_regs;
+			struct sve_state *sve_regs;
 
 			sve_regs = per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->sve_regs;
 			per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->sve_regs =
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 332c453b87cf8..b01d6622b8720 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -500,7 +500,7 @@ static int get_sve_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	if (!kvm_arm_vcpu_sve_finalized(vcpu))
 		return -EPERM;
 
-	if (copy_to_user(uptr, vcpu->arch.sve_state + region.koffset,
+	if (copy_to_user(uptr, (void *)vcpu->arch.sve_state + region.koffset,
 			 region.klen) ||
 	    clear_user(uptr + region.klen, region.upad))
 		return -EFAULT;
@@ -526,7 +526,7 @@ static int set_sve_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	if (!kvm_arm_vcpu_sve_finalized(vcpu))
 		return -EPERM;
 
-	if (copy_from_user(vcpu->arch.sve_state + region.koffset, uptr,
+	if (copy_from_user((void *)vcpu->arch.sve_state + region.koffset, uptr,
 			   region.klen))
 		return -EFAULT;
 
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index aaa43554fd8e6..72e658255cda7 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -467,8 +467,7 @@ static inline void __hyp_sve_restore_guest(struct kvm_vcpu *vcpu)
 	 * vCPU. Start off with the max VL so we can load the SVE state.
 	 */
 	sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, SYS_ZCR_EL2);
-	__sve_restore_state(vcpu_sve_pffr(vcpu),
-			    true);
+	__sve_restore_state(kern_hyp_va(vcpu->arch.sve_state), true);
 	fpsimd_load_common(&vcpu->arch.ctxt.fp_regs);
 
 	/*
@@ -485,12 +484,11 @@ static inline void __hyp_sve_restore_guest(struct kvm_vcpu *vcpu)
 static inline void __hyp_sve_save_host(void)
 {
 	struct kvm_cpu_context *hctxt = host_data_ptr(host_ctxt);
-	u8 *sve_regs = *host_data_ptr(sve_regs);
+	struct sve_state *sve_regs = *host_data_ptr(sve_regs);
 
 	ctxt_sys_reg(hctxt, ZCR_EL1) = read_sysreg_el1(SYS_ZCR);
 	write_sysreg_s(sve_vq_from_vl(kvm_host_sve_max_vl) - 1, SYS_ZCR_EL2);
-	__sve_save_state(sve_regs + sve_ffr_offset(kvm_host_sve_max_vl),
-			 true);
+	__sve_save_state(sve_regs, true);
 	fpsimd_save_common(&hctxt->fp_regs);
 }
 
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 627762ed7327f..72d025b2178a7 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -35,7 +35,7 @@ static void __hyp_sve_save_guest(struct kvm_vcpu *vcpu)
 	 * on the VL, so use a consistent (i.e., the maximum) guest VL.
 	 */
 	sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, SYS_ZCR_EL2);
-	__sve_save_state(vcpu_sve_pffr(vcpu), true);
+	__sve_save_state(kern_hyp_va(vcpu->arch.sve_state), true);
 	fpsimd_save_common(&vcpu->arch.ctxt.fp_regs);
 	write_sysreg_s(sve_vq_from_vl(kvm_host_sve_max_vl) - 1, SYS_ZCR_EL2);
 }
@@ -43,7 +43,7 @@ static void __hyp_sve_save_guest(struct kvm_vcpu *vcpu)
 static void __hyp_sve_restore_host(void)
 {
 	struct kvm_cpu_context *hctxt = host_data_ptr(host_ctxt);
-	u8 *sve_regs = *host_data_ptr(sve_regs);
+	struct sve_state *sve_regs = *host_data_ptr(sve_regs);
 
 	/*
 	 * On saving/restoring host sve state, always use the maximum VL for
@@ -55,8 +55,7 @@ static void __hyp_sve_restore_host(void)
 	 * need to be revisited.
 	 */
 	write_sysreg_s(sve_vq_from_vl(kvm_host_sve_max_vl) - 1, SYS_ZCR_EL2);
-	__sve_restore_state(sve_regs + sve_ffr_offset(kvm_host_sve_max_vl),
-			    true);
+	__sve_restore_state(sve_regs, true);
 	fpsimd_load_common(&hctxt->fp_regs);
 	write_sysreg_el1(ctxt_sys_reg(hctxt, ZCR_EL1), SYS_ZCR);
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index cdaf53c833409..77dbcfed05486 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -82,7 +82,7 @@ static int pkvm_create_host_sve_mappings(void)
 
 	for (i = 0; i < hyp_nr_cpus; i++) {
 		struct kvm_host_data *host_data = per_cpu_ptr(&kvm_host_data, i);
-		u8 *sve_regs = host_data->sve_regs;
+		struct sve_state *sve_regs = host_data->sve_regs;
 
 		start = kern_hyp_va(sve_regs);
 		end = start + PAGE_ALIGN(pkvm_host_sve_state_size());
-- 
2.30.2



^ permalink raw reply related

* [PATCH 17/18] arm64: fpsimd: Move SME save/restore inline
From: Mark Rutland @ 2026-05-21 13:25 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm
  Cc: broonie, catalin.marinas, james.morse, mark.rutland, maz, oupton,
	tabba, will
In-Reply-To: <20260521132556.584676-1-mark.rutland@arm.com>

Currently the SME register save/restore sequences are written in
out-of-line assembly routines. While this works, it's somewhat painful:

* For KVM to use the sequences, portions of the logic will need to be
  duplicated in KVM hyp code. While the common logic can be shared in
  assembly macros, this is very likely to lead to unnecessary divergence
  and be a maintenance burden.

* For historical reasons, the assembly macros take some register
  arguments as numerical indices (e.g. "sme_save_za 0, x2, 12" uses x0,
  x2, and x12), which is simply confusing.

* Address generation and control flow are far clearer in C than in
  assembly.

* The assembly sequences can't be instrumented, and so it's harder than
  necessary to catch memory safety issues.

To handle the above, move the SME register save/restore sequences
to inline assembly.
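
To illustrate the address generation, here is a minimal user-space
sketch, assuming ZA is saved as SVL rows of SVL bytes each and that ZT0
is stored immediately after the ZA array. The 'za' buffer stands in for
the real register, and demo_save_za() and demo_zt_offset() are
hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy a simulated ZA array ('svl' rows of 'svl' bytes) into 'buf', one
 * row per iteration, using the same per-row address arithmetic as a
 * loop that stores one ZA vector at a time.
 */
static void demo_save_za(unsigned char *buf, const unsigned char *za,
			 size_t svl)
{
	for (size_t v = 0; v < svl; v++) {
		unsigned char *pav = buf + v * svl;	/* address of row v */

		memcpy(pav, za + v * svl, svl);
	}
}

/* Byte offset of ZT0 within the buffer: just past the ZA array. */
static size_t demo_zt_offset(size_t svl)
{
	return svl * svl;
}
```

For an SVL of 16 bytes, ZA occupies 256 bytes and ZT0 would start at
offset 256.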

Neither GCC nor LLVM instrument memory arguments to inline assembly, so
explicit instrumentation is added in the same manner as other assembly
routines. This instrumentation is implicitly disabled by Kbuild for nVHE
hyp code.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h       | 100 +++++++++++++++++++++++++-
 arch/arm64/include/asm/fpsimdmacros.h |  76 --------------------
 arch/arm64/kernel/Makefile            |   2 +-
 arch/arm64/kernel/entry-fpsimd.S      |  48 -------------
 4 files changed, 98 insertions(+), 128 deletions(-)
 delete mode 100644 arch/arm64/kernel/entry-fpsimd.S

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 550987b36206a..12f222f64b8d5 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -357,9 +357,6 @@ static inline void sve_flush_live(void)
 	);
 }
 
-extern void sme_save_state(struct sme_state *state, int zt);
-extern void sme_load_state(const struct sme_state *state, int zt);
-
 struct arm64_cpu_capabilities;
 extern void cpu_enable_fpsimd(const struct arm64_cpu_capabilities *__unused);
 extern void cpu_enable_sve(const struct arm64_cpu_capabilities *__unused);
@@ -639,6 +636,100 @@ static inline size_t __sme_state_size(unsigned int sme_vl)
 	return size;
 }
 
+static inline void __sme_save_za(struct sme_state *state, unsigned long svl)
+{
+	/* The <Wv> argument to STR (array vector) can only encode W12-W15 */
+	register unsigned long v asm ("12");
+
+	instrument_write(state, svl * svl);
+	for (v = 0; v < svl; v++) {
+		void *pav = (void *)state + v * svl;
+
+		asm volatile(
+		__SME_PREAMBLE
+		"	str	za[%w[v], #0], [%[pav]]\n"
+		:
+		: [v] "r" (v),
+		  [pav] "r" (pav)
+		: "memory"
+		);
+	}
+}
+
+static inline void __sme_load_za(struct sme_state *state, unsigned long svl)
+{
+	/* The <Wv> argument to LDR (array vector) can only encode W12-W15 */
+	register unsigned long v asm ("12");
+
+	instrument_read(state, svl * svl);
+	for (v = 0; v < svl; v++) {
+		void *pav = (void *)state + v * svl;
+
+		asm volatile(
+		__SME_PREAMBLE
+		"	ldr	za[%w[v], #0], [%[pav]]\n"
+		:
+		: [v] "r" (v),
+		  [pav] "r" (pav)
+		: "memory"
+		);
+	}
+}
+
+static inline void __sme_save_zt(struct sme_state *state, unsigned long svl)
+{
+	void *pzt = (void *)state + svl * svl;
+
+	instrument_write(pzt, svl);
+	asm volatile(
+	__DEFINE_ASM_GPR_NUMS
+	/*
+	 * STR ZT0, [<Xn|SP>]
+	 * Supported by binutils 2.41+.
+	 * Supported by LLVM 16+
+	 */
+	"	.inst	0xe13f8000 | ((.L__gpr_num_%[pzt]) << 5)\n"
+	:
+	: [pzt] "r" (pzt)
+	: "memory");
+}
+
+static inline void __sme_load_zt(const struct sme_state *state, unsigned long svl)
+{
+	void *pzt = (void *)state + svl * svl;
+
+	instrument_read(pzt, svl);
+	asm volatile(
+	__DEFINE_ASM_GPR_NUMS
+	/*
+	 * LDR ZT0, [<Xn|SP>]
+	 * Supported by binutils 2.41+.
+	 * Supported by LLVM 16+
+	 */
+	"	.inst	0xe11f8000 | ((.L__gpr_num_%[pzt]) << 5)\n"
+	:
+	: [pzt] "r" (pzt)
+	: "memory");
+}
+
+static inline void sme_save_state(struct sme_state *state, bool zt)
+{
+	unsigned long svl = sme_get_vl();
+
+	__sme_save_za(state, svl);
+	if (zt)
+		__sme_save_zt(state, svl);
+}
+
+static inline void sme_load_state(struct sme_state *state, bool zt)
+{
+	unsigned long svl = sme_get_vl();
+
+	__sme_load_za(state, svl);
+	if (zt)
+		__sme_load_zt(state, svl);
+}
+
 /*
  * Return how many bytes of memory are required to store the full SME
  * specific state for task, given task's currently configured vector
@@ -695,6 +786,9 @@ static inline size_t sme_state_size(struct task_struct const *task)
 	return 0;
 }
 
+static inline void sme_save_state(struct sme_state *state, bool zt) { BUILD_BUG(); }
+static inline void sme_load_state(const struct sme_state *state, bool zt) { BUILD_BUG(); }
+
 static inline void sme_enter_from_user_mode(void) { }
 static inline void sme_exit_to_user_mode(void) { }
 
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index 9e352b5c6b764..a763fd03ffef3 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -40,60 +40,6 @@
 	.endif
 .endm
 
-/* Deprecated macros for SME instructions */
-
-/* RDSVL X\nx, #\imm */
-.macro _sme_rdsvl nx, imm
-	.arch_extension sme
-	rdsvl x\nx, #\imm
-.endm
-
-/*
- * STR (vector from ZA array):
- *	STR ZA[W\nw, #\offset], [X\nxbase, #\offset, MUL VL]
- */
-.macro _sme_str_zav nw, nxbase, offset=0
-	.arch_extension sme
-	str	za[w\nw, #\offset], [x\nxbase, #\offset, MUL VL]
-.endm
-
-/*
- * LDR (vector to ZA array):
- *	LDR ZA[w\nw, #\offset], [X\nxbase, #\offset, MUL VL]
- */
-.macro _sme_ldr_zav nw, nxbase, offset=0
-	.arch_extension sme
-	ldr	za[w\nw, #\offset], [x\nxbase, #\offset, MUL VL]
-.endm
-
-/*
- * SME2 instruction encodings for older assemblers.
- * Supported by binutils 2.41+.
- * Supported by LLVM 16+
- */
-
-/*
- * LDR (ZT0)
- *
- *	LDR ZT0, nx
- */
-.macro _ldr_zt nx
-	_check_general_reg \nx
-	.inst	0xe11f8000	\
-		 | (\nx << 5)
-.endm
-
-/*
- * STR (ZT0)
- *
- *	STR ZT0, nx
- */
-.macro _str_zt nx
-	_check_general_reg \nx
-	.inst	0xe13f8000		\
-		| (\nx << 5)
-.endm
-
 .macro __for from:req, to:req
 	.if (\from) == (\to)
 		_for__body %\from
@@ -116,25 +62,3 @@
 
 	.purgem _for__body
 .endm
-
-.macro sme_save_za nxbase, xvl, nw
-	mov	w\nw, #0
-
-423:
-	_sme_str_zav \nw, \nxbase
-	add	x\nxbase, x\nxbase, \xvl
-	add	x\nw, x\nw, #1
-	cmp	\xvl, x\nw
-	bne	423b
-.endm
-
-.macro sme_load_za nxbase, xvl, nw
-	mov	w\nw, #0
-
-423:
-	_sme_ldr_zav \nw, \nxbase
-	add	x\nxbase, x\nxbase, \xvl
-	add	x\nw, x\nw, #1
-	cmp	\xvl, x\nw
-	bne	423b
-.endm
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 74b76bb704523..d2690c3ec5288 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -27,7 +27,7 @@ KCOV_INSTRUMENT_idle.o := n
 
 # Object file lists.
 obj-y			:= debug-monitors.o entry.o irq.o fpsimd.o		\
-			   entry-common.o entry-fpsimd.o process.o ptrace.o	\
+			   entry-common.o process.o ptrace.o			\
 			   setup.o signal.o sys.o stacktrace.o time.o traps.o	\
 			   io.o vdso.o hyp-stub.o psci.o cpu_ops.o		\
 			   return_address.o cpuinfo.o cpu_errata.o		\
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
deleted file mode 100644
index bff941eea9566..0000000000000
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ /dev/null
@@ -1,48 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * FP/SIMD state saving and restoring
- *
- * Copyright (C) 2012 ARM Ltd.
- * Author: Catalin Marinas <catalin.marinas@arm.com>
- */
-
-#include <linux/linkage.h>
-
-#include <asm/assembler.h>
-#include <asm/fpsimdmacros.h>
-
-#ifdef CONFIG_ARM64_SME
-
-/*
- * Save the ZA and ZT state
- *
- * x0 - pointer to buffer for state
- * x1 - number of ZT registers to save
- */
-SYM_FUNC_START(sme_save_state)
-	_sme_rdsvl	2, 1		// x2 = VL/8
-	sme_save_za 0, x2, 12		// Leaves x0 pointing to the end of ZA
-
-	cbz	x1, 1f
-	_str_zt 0
-1:
-	ret
-SYM_FUNC_END(sme_save_state)
-
-/*
- * Load the ZA and ZT state
- *
- * x0 - pointer to buffer for state
- * x1 - number of ZT registers to save
- */
-SYM_FUNC_START(sme_load_state)
-	_sme_rdsvl	2, 1		// x2 = VL/8
-	sme_load_za 0, x2, 12		// Leaves x0 pointing to the end of ZA
-
-	cbz	x1, 1f
-	_ldr_zt 0
-1:
-	ret
-SYM_FUNC_END(sme_load_state)
-
-#endif /* CONFIG_ARM64_SME */
-- 
2.30.2



^ permalink raw reply related

* [PATCH 11/18] arm64: fpsimd: Split FPSR/FPCR from SVE save/restore
From: Mark Rutland @ 2026-05-21 13:25 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm
  Cc: broonie, catalin.marinas, james.morse, mark.rutland, maz, oupton,
	tabba, will
In-Reply-To: <20260521132556.584676-1-mark.rutland@arm.com>

Regardless of whether the vector registers are saved in FPSIMD or SVE
format, we store FPSR and FPCR in user_fpsimd_state::{fpsr,fpcr}.

For historical reasons, the functions which save/restore SVE context
take a pointer to user_fpsimd_state::fpsr, and use this to access both
user_fpsimd_state::fpsr and user_fpsimd_state::fpcr. This is
unnecessarily fragile.

Move the save/restore of FPSR and FPCR into separate helper functions
which take a pointer to user_fpsimd_state. I've used read_sysreg_s() and
write_sysreg_s() as contemporary versions of LLVM will refuse to
directly assemble accesses to FPCR or FPSR unless the "fp" arch
extension is enabled.
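
The shape of the new helpers can be sketched in user space as follows,
with plain variables standing in for the system registers. The struct
and all 'demo_' names are hypothetical; the static assertion only
records the layout assumption that the old assembly made by accessing
FPCR at 4 bytes past the FPSR pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fpsr/fpcr fields of user_fpsimd_state. */
struct demo_fpsimd_state {
	uint32_t fpsr;
	uint32_t fpcr;
};

/* Simulated registers; not real system register accessors. */
static uint32_t demo_reg_fpsr, demo_reg_fpcr;

_Static_assert(offsetof(struct demo_fpsimd_state, fpcr) ==
	       offsetof(struct demo_fpsimd_state, fpsr) + 4,
	       "a pointer-to-fpsr interface assumes fpcr follows fpsr");

/* Save both registers through a pointer to the whole structure. */
static void demo_save_common(struct demo_fpsimd_state *state)
{
	state->fpsr = demo_reg_fpsr;
	state->fpcr = demo_reg_fpcr;
}

/* Restore both registers from the structure. */
static void demo_load_common(const struct demo_fpsimd_state *state)
{
	demo_reg_fpsr = state->fpsr;
	demo_reg_fpcr = state->fpcr;
}
```

Because the helpers name the fields directly, a change to the structure
layout would not silently break them.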

Note that the SVE assembly sequence for restoring FPCR uses an
unconditional write to FPCR. The plain FPSIMD assembly sequence has used
a conditional write to FPCR since 2014 in commit:

  5959e25729a5 ("arm64: fpsimd: avoid restoring fpcr if the contents haven't changed")

... but this was not followed for the SVE restore assembly implemented
in 2017 in commit:

  1fc5dce78ad1 ("arm64/sve: Low-level SVE architectural state manipulation functions")

... so I've assumed that this doesn't actually matter in practice, and
implemented the C version matching the existing SVE assembly.
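
For reference, the difference between the two restore styles can be
sketched in user space, with a plain variable and a write counter
standing in for the register. All 'demo_' names are hypothetical and
this is not the kernel implementation.

```c
#include <stdint.h>

static uint32_t demo_fpcr;		/* simulated register */
static unsigned int demo_fpcr_writes;	/* simulated register writes */

static void demo_write_fpcr(uint32_t val)
{
	demo_fpcr = val;
	demo_fpcr_writes++;
}

/* Unconditional restore: always writes the register. */
static void demo_restore_fpcr(uint32_t val)
{
	demo_write_fpcr(val);
}

/* Conditional restore: skips the write if the value is unchanged. */
static void demo_restore_fpcr_cond(uint32_t val)
{
	if (demo_fpcr != val)
		demo_write_fpcr(val);
}
```

Both variants leave the register holding the requested value; they
differ only in whether a redundant write is issued.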

For the moment, fpsimd_save_state() and fpsimd_load_state() are left
as-is with their own logic to save/restore FPSR and FPCR. This will be
unified in subsequent patches.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h         | 17 ++++++++++++++---
 arch/arm64/include/asm/fpsimdmacros.h   | 13 ++-----------
 arch/arm64/include/asm/kvm_hyp.h        |  4 ++--
 arch/arm64/kernel/entry-fpsimd.S        | 10 ++++------
 arch/arm64/kernel/fpsimd.c              |  5 +++--
 arch/arm64/kvm/hyp/fpsimd.S             |  4 ++--
 arch/arm64/kvm/hyp/include/hyp/switch.h |  4 ++--
 arch/arm64/kvm/hyp/nvhe/hyp-main.c      |  5 +++--
 8 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 36cf528e64971..6fd5cdf5e5f17 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -74,6 +74,18 @@ static inline void cpacr_restore(unsigned long cpacr)
 
 struct task_struct;
 
+static inline void fpsimd_save_common(struct user_fpsimd_state *state)
+{
+	state->fpsr = read_sysreg_s(SYS_FPSR);
+	state->fpcr = read_sysreg_s(SYS_FPCR);
+}
+
+static inline void fpsimd_load_common(const struct user_fpsimd_state *state)
+{
+	write_sysreg_s(state->fpsr, SYS_FPSR);
+	write_sysreg_s(state->fpcr, SYS_FPCR);
+}
+
 extern void fpsimd_save_state(struct user_fpsimd_state *state);
 extern void fpsimd_load_state(struct user_fpsimd_state *state);
 
@@ -157,9 +169,8 @@ static inline unsigned int sve_get_vl(void)
 	return vl;
 }
 
-extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr);
-extern void sve_load_state(void const *state, u32 const *pfpsr,
-			   int restore_ffr);
+extern void sve_save_state(void *state, int save_ffr);
+extern void sve_load_state(void const *state, int restore_ffr);
 extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
 extern void sme_save_state(void *state, int zt);
 extern void sme_load_state(void const *state, int zt);
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index d75c9d4c9989b..c79ae7ec1ff05 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -235,7 +235,7 @@
 		_sve_wrffr	0
 .endm
 
-.macro sve_save nxbase, xpfpsr, save_ffr, nxtmp
+.macro sve_save nxbase, save_ffr
  _for n, 0, 31,	_sve_str_v	\n, \nxbase, \n - 34
  _for n, 0, 15,	_sve_str_p	\n, \nxbase, \n - 16
 		cbz		\save_ffr, 921f
@@ -246,24 +246,15 @@
 922:
 		_sve_str_p	0, \nxbase
 		_sve_ldr_p	0, \nxbase, -16
-		mrs		x\nxtmp, fpsr
-		str		w\nxtmp, [\xpfpsr]
-		mrs		x\nxtmp, fpcr
-		str		w\nxtmp, [\xpfpsr, #4]
 .endm
 
-.macro sve_load nxbase, xpfpsr, restore_ffr, nxtmp
+.macro sve_load nxbase, restore_ffr
  _for n, 0, 31,	_sve_ldr_v	\n, \nxbase, \n - 34
 		cbz		\restore_ffr, 921f
 		_sve_ldr_p	0, \nxbase
 		_sve_wrffr	0
 921:
  _for n, 0, 15,	_sve_ldr_p	\n, \nxbase, \n - 16
-
-		ldr		w\nxtmp, [\xpfpsr]
-		msr		fpsr, x\nxtmp
-		ldr		w\nxtmp, [\xpfpsr, #4]
-		msr		fpcr, x\nxtmp
 .endm
 
 .macro sme_save_za nxbase, xvl, nw
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 8d06b62e7188c..0030cc1b52197 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -123,8 +123,8 @@ void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu);
 
 void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
 void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
-void __sve_save_state(void *sve_pffr, u32 *fpsr, int save_ffr);
-void __sve_restore_state(void *sve_pffr, u32 *fpsr, int restore_ffr);
+void __sve_save_state(void *sve, int save_ffr);
+void __sve_restore_state(void *sve, int restore_ffr);
 
 u64 __guest_enter(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 7f2d31dff8c17..83fe9c32bbd1c 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -37,11 +37,10 @@ SYM_FUNC_END(fpsimd_load_state)
  * Save the SVE state
  *
  * x0 - pointer to buffer for state
- * x1 - pointer to storage for FPSR
- * x2 - Save FFR if non-zero
+ * x1 - Save FFR if non-zero
  */
 SYM_FUNC_START(sve_save_state)
-	sve_save 0, x1, x2, 3
+	sve_save 0, x1
 	ret
 SYM_FUNC_END(sve_save_state)
 
@@ -49,11 +48,10 @@ SYM_FUNC_END(sve_save_state)
  * Load the SVE state
  *
  * x0 - pointer to buffer for state
- * x1 - pointer to storage for FPSR
- * x2 - Restore FFR if non-zero
+ * x1 - Restore FFR if non-zero
  */
 SYM_FUNC_START(sve_load_state)
-	sve_load 0, x1, x2, 4
+	sve_load 0, x1
 	ret
 SYM_FUNC_END(sve_load_state)
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 2578c2372c89e..9806fea8fea7c 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -426,8 +426,8 @@ static void task_fpsimd_load(void)
 	if (restore_sve_regs) {
 		WARN_ON_ONCE(current->thread.fp_type != FP_STATE_SVE);
 		sve_load_state(sve_pffr(&current->thread),
-			       &current->thread.uw.fpsimd_state.fpsr,
 			       restore_ffr);
+		fpsimd_load_common(&current->thread.uw.fpsimd_state);
 	} else {
 		WARN_ON_ONCE(current->thread.fp_type != FP_STATE_FPSIMD);
 		fpsimd_load_state(&current->thread.uw.fpsimd_state);
@@ -509,7 +509,8 @@ static void fpsimd_save_user_state(void)
 
 		sve_save_state((char *)last->sve_state +
 					sve_ffr_offset(vl),
-			       &last->st->fpsr, save_ffr);
+			       save_ffr);
+		fpsimd_save_common(last->st);
 		*last->fp_type = FP_STATE_SVE;
 	} else {
 		fpsimd_save_state(last->st);
diff --git a/arch/arm64/kvm/hyp/fpsimd.S b/arch/arm64/kvm/hyp/fpsimd.S
index 6e16cbfc5df27..8575e32977d19 100644
--- a/arch/arm64/kvm/hyp/fpsimd.S
+++ b/arch/arm64/kvm/hyp/fpsimd.S
@@ -21,11 +21,11 @@ SYM_FUNC_START(__fpsimd_restore_state)
 SYM_FUNC_END(__fpsimd_restore_state)
 
 SYM_FUNC_START(__sve_restore_state)
-	sve_load 0, x1, x2, 3
+	sve_load 0, x1
 	ret
 SYM_FUNC_END(__sve_restore_state)
 
 SYM_FUNC_START(__sve_save_state)
-	sve_save 0, x1, x2, 3
+	sve_save 0, x1
 	ret
 SYM_FUNC_END(__sve_save_state)
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 6512dd3f75ae4..eb76a863ebb84 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -468,8 +468,8 @@ static inline void __hyp_sve_restore_guest(struct kvm_vcpu *vcpu)
 	 */
 	sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, SYS_ZCR_EL2);
 	__sve_restore_state(vcpu_sve_pffr(vcpu),
-			    &vcpu->arch.ctxt.fp_regs.fpsr,
 			    true);
+	fpsimd_load_common(&vcpu->arch.ctxt.fp_regs);
 
 	/*
 	 * The effective VL for a VM could differ from the max VL when running a
@@ -490,8 +490,8 @@ static inline void __hyp_sve_save_host(void)
 	ctxt_sys_reg(hctxt, ZCR_EL1) = read_sysreg_el1(SYS_ZCR);
 	write_sysreg_s(sve_vq_from_vl(kvm_host_sve_max_vl) - 1, SYS_ZCR_EL2);
 	__sve_save_state(sve_regs + sve_ffr_offset(kvm_host_sve_max_vl),
-			 &hctxt->fp_regs.fpsr,
 			 true);
+	fpsimd_save_common(&hctxt->fp_regs);
 }
 
 static inline void fpsimd_lazy_switch_to_guest(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 04a6d2e0ea73f..0be4577a67e7b 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -35,7 +35,8 @@ static void __hyp_sve_save_guest(struct kvm_vcpu *vcpu)
 	 * on the VL, so use a consistent (i.e., the maximum) guest VL.
 	 */
 	sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, SYS_ZCR_EL2);
-	__sve_save_state(vcpu_sve_pffr(vcpu), &vcpu->arch.ctxt.fp_regs.fpsr, true);
+	__sve_save_state(vcpu_sve_pffr(vcpu), true);
+	fpsimd_save_common(&vcpu->arch.ctxt.fp_regs);
 	write_sysreg_s(sve_vq_from_vl(kvm_host_sve_max_vl) - 1, SYS_ZCR_EL2);
 }
 
@@ -55,8 +56,8 @@ static void __hyp_sve_restore_host(void)
 	 */
 	write_sysreg_s(sve_vq_from_vl(kvm_host_sve_max_vl) - 1, SYS_ZCR_EL2);
 	__sve_restore_state(sve_regs + sve_ffr_offset(kvm_host_sve_max_vl),
-			    &hctxt->fp_regs.fpsr,
 			    true);
+	fpsimd_load_common(&hctxt->fp_regs);
 	write_sysreg_el1(ctxt_sys_reg(hctxt, ZCR_EL1), SYS_ZCR);
 }
 
-- 
2.30.2




* [PATCH 10/18] arm64: sysreg: Add FPCR and FPSR
From: Mark Rutland @ 2026-05-21 13:25 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm
  Cc: broonie, catalin.marinas, james.morse, mark.rutland, maz, oupton,
	tabba, will
In-Reply-To: <20260521132556.584676-1-mark.rutland@arm.com>

Add sysreg definitions for FPCR and FPSR.

Some versions of LLVM will refuse to assemble accesses to FPCR and FPSR
unless the "fp" arch extension is enabled, which we don't currently do
for read_sysreg() and write_sysreg(). In general, handling feature
dependencies would complicate read_sysreg() and write_sysreg(), and it's
simpler to use read_sysreg_s() and write_sysreg_s() instead, requiring
sysreg definitions.

The values used can be found in ARM ARM issue M.b:

  https://developer.arm.com/documentation/ddi0487/mb/

... in sections:

* C5.2.8 ("FPCR, Floating-point Control Register")
* C5.2.10 ("FPSR, Floating-point Status Register")
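
As a rough illustration of what the new definitions encode, the sketch
below packs the (op0, op1, CRn, CRm, op2) tuples from the Sysreg lines in
this patch into a single value. It assumes the field shifts used by the
arm64 sys_reg() macro (19, 16, 12, 8, 5). All sketch_/SKETCH_ names are
hypothetical userspace stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Field shifts, assumed to match the arm64 sys_reg() macro. */
#define SKETCH_OP0_SHIFT	19
#define SKETCH_OP1_SHIFT	16
#define SKETCH_CRN_SHIFT	12
#define SKETCH_CRM_SHIFT	8
#define SKETCH_OP2_SHIFT	5

/* Pack a system register tuple into one encoding value. */
static inline uint32_t sketch_sys_reg(uint32_t op0, uint32_t op1,
				      uint32_t crn, uint32_t crm,
				      uint32_t op2)
{
	/* Each field must fit in its architectural width. */
	assert(op0 < 4 && op1 < 8 && crn < 16 && crm < 16 && op2 < 8);

	return (op0 << SKETCH_OP0_SHIFT) | (op1 << SKETCH_OP1_SHIFT) |
	       (crn << SKETCH_CRN_SHIFT) | (crm << SKETCH_CRM_SHIFT) |
	       (op2 << SKETCH_OP2_SHIFT);
}

/* Extract FPCR.RMode, which this patch describes as bits 23:22. */
static inline uint32_t sketch_fpcr_rmode(uint64_t fpcr)
{
	return (uint32_t)((fpcr >> 22) & 0x3);
}
```

With these shifts, "FPCR 3 3 4 4 0" and "FPSR 3 3 4 4 1" differ only in
the op2 field.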

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/tools/sysreg | 45 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 6c3ff14e561e6..fa155cd856a5b 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -3790,6 +3790,51 @@ Field	1	ZA
 Field	0	SM
 EndSysreg
 
+Sysreg	FPCR	3	3	4	4	0
+Res0	63:27
+Field	26	AHP
+Field	25	DN
+Field	24	FZ
+Enum	23:22	RMode
+	0b00	RN
+	0b01	RP
+	0b10	RM
+	0b11	RZ
+EndEnum
+Field	21:20	Stride
+Field	19	FZ16
+Field	18:16	Len
+Field	15	IDE
+Res0	14
+Field	13	EBF
+Field	12	IXE
+Field	11	UFE
+Field	10	OFE
+Field	9	DZE
+Field	8	IOE
+Res0	7:3
+Field	2	NEP
+Field	1	AH
+Field	0	FIZ
+EndSysreg
+
+Sysreg	FPSR	3	3	4	4	1
+Res0	63:32
+Field	31	N
+Field	30	Q
+Field	29	C
+Field	28	V
+Field	27	QC
+Res0	26:8
+Field	7	IDC
+Res0	6:5
+Field	4	IXC
+Field	3	UFC
+Field	2	OFC
+Field	1	DZC
+Field	0	IOC
+EndSysreg
+
 Sysreg	FPMR	3	3	4	4	2
 Res0	63:38
 Field	37:32	LSCALE2
-- 
2.30.2




* [PATCH 12/18] arm64: fpsimd: Move fpsimd save/restore inline
From: Mark Rutland @ 2026-05-21 13:25 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm
  Cc: broonie, catalin.marinas, james.morse, mark.rutland, maz, oupton,
	tabba, will
In-Reply-To: <20260521132556.584676-1-mark.rutland@arm.com>

Currently the FPSIMD register save/restore sequences are written in
out-of-line assembly routines. While this works, it's somewhat painful:

* As KVM needs to be able to use the sequences in hyp code, separate
  assembly files are used for the regular kernel and KVM code. While the
  common logic is shared in assembly macros, this still requires some
  duplication, and has led to some trivial divergence.

* For historical reasons, the assembly macros take some register
  arguments as numerical indices (e.g. "fpsimd_save x0, 8" uses x0 and
  x8), which is simply confusing.

* For historical reasons, the SVE save/restore code and FPSIMD
  save/restore code have distinct sequences for FPSR and FPCR. Ideally
  this logic would be shared.

* The assembly sequences can't be instrumented, and so it's harder than
  necessary to catch memory safety issues.

To handle the above, move the FPSIMD register save/restore sequences to
inline assembly, and share the FPSR+FPCR save/restore with SVE.
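
To show why the FPSR+FPCR handling can be shared, the sketch below
reproduces the offset arithmetic of the removed fpsimd_save macro, which
used a writeback of #16 * 30 followed by an offset of #16 * 2. It compares
the result with a userspace mirror of the state layout. The mirror struct
is an assumption based on struct user_fpsimd_state (32 128-bit vregs
followed by 32-bit fpsr and fpcr), and the sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the assumed user_fpsimd_state layout. */
struct sketch_fpsimd_state {
	uint8_t		vregs[32][16];	/* q0..q31, 16 bytes each */
	uint32_t	fpsr;
	uint32_t	fpcr;
	uint32_t	reserved[2];
};

/* Offset the old macro used for FPSR: writeback base plus #16 * 2. */
static inline size_t sketch_old_fpsr_offset(void)
{
	size_t base = 16 * 30;	/* from "[\state, #16 * 30]!" */

	return base + 16 * 2;
}

/* The old macro stored FPCR 4 bytes after FPSR. */
static inline size_t sketch_old_fpcr_offset(void)
{
	return sketch_old_fpsr_offset() + 4;
}

/* Check that the old offsets land on the struct fields. */
static inline void sketch_check_layout(void)
{
	assert(sketch_old_fpsr_offset() ==
	       offsetof(struct sketch_fpsimd_state, fpsr));
	assert(sketch_old_fpcr_offset() ==
	       offsetof(struct sketch_fpsimd_state, fpcr));
}
```

Under this assumed layout, both the old assembly and a C accessor such as
state->fpsr refer to the same bytes, so a common helper can serve the
FPSIMD and SVE paths.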

Neither GCC nor LLVM instruments memory arguments to inline assembly, so
explicit instrumentation is added in the same manner as other assembly
routines. This instrumentation is implicitly disabled by Kbuild for nVHE
hyp code.

Note that I've used the SVE sequence for restoring FPCR, which uses an
unconditional write to FPCR. The plain FPSIMD assembly sequence used a
conditional write to FPCR since 2014 in commit:

  5959e25729a5 ("arm64: fpsimd: avoid restoring fpcr if the contents haven't change")

... but this was not followed for the SVE assembly implemented in 2017
in commit:

  1fc5dce78ad1 ("arm64/sve: Low-level SVE architectural state manipulation functions")

... so I've assumed that this doesn't actually matter in practice, and
I've erred in favour of the simpler sequence.
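
The difference between the two restore strategies can be sketched in
plain C with a mock register that counts writes. This is only a
userspace model under the assumption that a register write is the
costly operation; struct sketch_reg and the sketch_ helpers are
hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Mock register: holds a value and counts how often it is written. */
struct sketch_reg {
	uint64_t	value;
	unsigned int	writes;
};

static inline void sketch_write(struct sketch_reg *reg, uint64_t val)
{
	assert(reg);
	reg->value = val;
	reg->writes++;
}

/* Old FPSIMD behaviour: skip the write if the value is unchanged. */
static inline void sketch_restore_conditional(struct sketch_reg *fpcr,
					      uint64_t saved)
{
	if (fpcr->value != saved)
		sketch_write(fpcr, saved);
}

/* SVE behaviour, used for both paths here: always write. */
static inline void sketch_restore_unconditional(struct sketch_reg *fpcr,
						uint64_t saved)
{
	sketch_write(fpcr, saved);
}
```

Both variants leave the register holding the saved value; they differ
only in whether a redundant write is issued.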

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h         | 68 ++++++++++++++++++++++++-
 arch/arm64/include/asm/fpsimdmacros.h   | 59 ---------------------
 arch/arm64/include/asm/kvm_hyp.h        |  2 -
 arch/arm64/kernel/entry-fpsimd.S        | 20 --------
 arch/arm64/kvm/hyp/fpsimd.S             | 10 ----
 arch/arm64/kvm/hyp/include/hyp/switch.h |  4 +-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c      |  4 +-
 7 files changed, 70 insertions(+), 97 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 6fd5cdf5e5f17..19b373ad0ebf7 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -22,6 +22,8 @@
 #include <linux/stddef.h>
 #include <linux/types.h>
 
+#define __FPSIMD_PREAMBLE	".arch_extension fp\n" \
+				".arch_extension simd\n"
 #define __SVE_PREAMBLE		".arch_extension sve\n"
 #define __SME_PREAMBLE		".arch_extension sme\n"
 
@@ -86,8 +88,70 @@ static inline void fpsimd_load_common(const struct user_fpsimd_state *state)
 	write_sysreg_s(state->fpcr, SYS_FPCR);
 }
 
-extern void fpsimd_save_state(struct user_fpsimd_state *state);
-extern void fpsimd_load_state(struct user_fpsimd_state *state);
+static inline void fpsimd_save_vregs(struct user_fpsimd_state *state)
+{
+	instrument_write(state->vregs, sizeof(state->vregs));
+	asm volatile(
+	__FPSIMD_PREAMBLE
+	"	stp	q0,  q1,  [%[vregs], #16 * 0]\n"
+	"	stp	q2,  q3,  [%[vregs], #16 * 2]\n"
+	"	stp	q4,  q5,  [%[vregs], #16 * 4]\n"
+	"	stp	q6,  q7,  [%[vregs], #16 * 6]\n"
+	"	stp	q8,  q9,  [%[vregs], #16 * 8]\n"
+	"	stp	q10, q11, [%[vregs], #16 * 10]\n"
+	"	stp	q12, q13, [%[vregs], #16 * 12]\n"
+	"	stp	q14, q15, [%[vregs], #16 * 14]\n"
+	"	stp	q16, q17, [%[vregs], #16 * 16]\n"
+	"	stp	q18, q19, [%[vregs], #16 * 18]\n"
+	"	stp	q20, q21, [%[vregs], #16 * 20]\n"
+	"	stp	q22, q23, [%[vregs], #16 * 22]\n"
+	"	stp	q24, q25, [%[vregs], #16 * 24]\n"
+	"	stp	q26, q27, [%[vregs], #16 * 26]\n"
+	"	stp	q28, q29, [%[vregs], #16 * 28]\n"
+	"	stp	q30, q31, [%[vregs], #16 * 30]\n"
+	: "=Q" (state->vregs)
+	: [vregs] "r" (state->vregs)
+	);
+}
+
+static inline void fpsimd_load_vregs(const struct user_fpsimd_state *state)
+{
+	instrument_read(state->vregs, sizeof(state->vregs));
+	asm volatile(
+	__FPSIMD_PREAMBLE
+	"	ldp	q0,  q1,  [%[vregs], #16 * 0]\n"
+	"	ldp	q2,  q3,  [%[vregs], #16 * 2]\n"
+	"	ldp	q4,  q5,  [%[vregs], #16 * 4]\n"
+	"	ldp	q6,  q7,  [%[vregs], #16 * 6]\n"
+	"	ldp	q8,  q9,  [%[vregs], #16 * 8]\n"
+	"	ldp	q10, q11, [%[vregs], #16 * 10]\n"
+	"	ldp	q12, q13, [%[vregs], #16 * 12]\n"
+	"	ldp	q14, q15, [%[vregs], #16 * 14]\n"
+	"	ldp	q16, q17, [%[vregs], #16 * 16]\n"
+	"	ldp	q18, q19, [%[vregs], #16 * 18]\n"
+	"	ldp	q20, q21, [%[vregs], #16 * 20]\n"
+	"	ldp	q22, q23, [%[vregs], #16 * 22]\n"
+	"	ldp	q24, q25, [%[vregs], #16 * 24]\n"
+	"	ldp	q26, q27, [%[vregs], #16 * 26]\n"
+	"	ldp	q28, q29, [%[vregs], #16 * 28]\n"
+	"	ldp	q30, q31, [%[vregs], #16 * 30]\n"
+	:
+	: "Q" (state->vregs),
+	  [vregs] "r" (state->vregs)
+	);
+}
+
+static inline void fpsimd_save_state(struct user_fpsimd_state *state)
+{
+	fpsimd_save_vregs(state);
+	fpsimd_save_common(state);
+}
+
+static inline void fpsimd_load_state(const struct user_fpsimd_state *state)
+{
+	fpsimd_load_vregs(state);
+	fpsimd_load_common(state);
+}
 
 extern void fpsimd_thread_switch(struct task_struct *next);
 extern void fpsimd_flush_thread(void);
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index c79ae7ec1ff05..01b5e6d51ba79 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -8,65 +8,6 @@
 
 #include <asm/assembler.h>
 
-.macro fpsimd_save state, tmpnr
-	stp	q0, q1, [\state, #16 * 0]
-	stp	q2, q3, [\state, #16 * 2]
-	stp	q4, q5, [\state, #16 * 4]
-	stp	q6, q7, [\state, #16 * 6]
-	stp	q8, q9, [\state, #16 * 8]
-	stp	q10, q11, [\state, #16 * 10]
-	stp	q12, q13, [\state, #16 * 12]
-	stp	q14, q15, [\state, #16 * 14]
-	stp	q16, q17, [\state, #16 * 16]
-	stp	q18, q19, [\state, #16 * 18]
-	stp	q20, q21, [\state, #16 * 20]
-	stp	q22, q23, [\state, #16 * 22]
-	stp	q24, q25, [\state, #16 * 24]
-	stp	q26, q27, [\state, #16 * 26]
-	stp	q28, q29, [\state, #16 * 28]
-	stp	q30, q31, [\state, #16 * 30]!
-	mrs	x\tmpnr, fpsr
-	str	w\tmpnr, [\state, #16 * 2]
-	mrs	x\tmpnr, fpcr
-	str	w\tmpnr, [\state, #16 * 2 + 4]
-.endm
-
-.macro fpsimd_restore_fpcr state, tmp
-	/*
-	 * Writes to fpcr may be self-synchronising, so avoid restoring
-	 * the register if it hasn't changed.
-	 */
-	mrs	\tmp, fpcr
-	cmp	\tmp, \state
-	b.eq	9999f
-	msr	fpcr, \state
-9999:
-.endm
-
-/* Clobbers \state */
-.macro fpsimd_restore state, tmpnr
-	ldp	q0, q1, [\state, #16 * 0]
-	ldp	q2, q3, [\state, #16 * 2]
-	ldp	q4, q5, [\state, #16 * 4]
-	ldp	q6, q7, [\state, #16 * 6]
-	ldp	q8, q9, [\state, #16 * 8]
-	ldp	q10, q11, [\state, #16 * 10]
-	ldp	q12, q13, [\state, #16 * 12]
-	ldp	q14, q15, [\state, #16 * 14]
-	ldp	q16, q17, [\state, #16 * 16]
-	ldp	q18, q19, [\state, #16 * 18]
-	ldp	q20, q21, [\state, #16 * 20]
-	ldp	q22, q23, [\state, #16 * 22]
-	ldp	q24, q25, [\state, #16 * 24]
-	ldp	q26, q27, [\state, #16 * 26]
-	ldp	q28, q29, [\state, #16 * 28]
-	ldp	q30, q31, [\state, #16 * 30]!
-	ldr	w\tmpnr, [\state, #16 * 2]
-	msr	fpsr, x\tmpnr
-	ldr	w\tmpnr, [\state, #16 * 2 + 4]
-	fpsimd_restore_fpcr x\tmpnr, \state
-.endm
-
 /* Sanity-check macros to help avoid encoding garbage instructions */
 
 .macro _check_general_reg nr
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 0030cc1b52197..8c4602c8f4356 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -121,8 +121,6 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu);
 void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu);
 #endif
 
-void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
-void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
 void __sve_save_state(void *sve, int save_ffr);
 void __sve_restore_state(void *sve, int restore_ffr);
 
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index 83fe9c32bbd1c..4fa00c94f28b7 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -11,26 +11,6 @@
 #include <asm/assembler.h>
 #include <asm/fpsimdmacros.h>
 
-/*
- * Save the FP registers.
- *
- * x0 - pointer to struct fpsimd_state
- */
-SYM_FUNC_START(fpsimd_save_state)
-	fpsimd_save x0, 8
-	ret
-SYM_FUNC_END(fpsimd_save_state)
-
-/*
- * Load the FP registers.
- *
- * x0 - pointer to struct fpsimd_state
- */
-SYM_FUNC_START(fpsimd_load_state)
-	fpsimd_restore x0, 8
-	ret
-SYM_FUNC_END(fpsimd_load_state)
-
 #ifdef CONFIG_ARM64_SVE
 
 /*
diff --git a/arch/arm64/kvm/hyp/fpsimd.S b/arch/arm64/kvm/hyp/fpsimd.S
index 8575e32977d19..beacec33b2541 100644
--- a/arch/arm64/kvm/hyp/fpsimd.S
+++ b/arch/arm64/kvm/hyp/fpsimd.S
@@ -10,16 +10,6 @@
 
 	.text
 
-SYM_FUNC_START(__fpsimd_save_state)
-	fpsimd_save	x0, 1
-	ret
-SYM_FUNC_END(__fpsimd_save_state)
-
-SYM_FUNC_START(__fpsimd_restore_state)
-	fpsimd_restore	x0, 1
-	ret
-SYM_FUNC_END(__fpsimd_restore_state)
-
 SYM_FUNC_START(__sve_restore_state)
 	sve_load 0, x1
 	ret
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index eb76a863ebb84..aaa43554fd8e6 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -565,7 +565,7 @@ static void kvm_hyp_save_fpsimd_host(struct kvm_vcpu *vcpu)
 	if (system_supports_sve()) {
 		__hyp_sve_save_host();
 	} else {
-		__fpsimd_save_state(&hctxt->fp_regs);
+		fpsimd_save_state(&hctxt->fp_regs);
 	}
 
 	if (kvm_has_fpmr(kern_hyp_va(vcpu->kvm)))
@@ -625,7 +625,7 @@ static inline bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
 	if (sve_guest)
 		__hyp_sve_restore_guest(vcpu);
 	else
-		__fpsimd_restore_state(&vcpu->arch.ctxt.fp_regs);
+		fpsimd_load_state(&vcpu->arch.ctxt.fp_regs);
 
 	if (kvm_has_fpmr(kern_hyp_va(vcpu->kvm)))
 		write_sysreg_s(__vcpu_sys_reg(vcpu, FPMR), SYS_FPMR);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 0be4577a67e7b..627762ed7327f 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -83,7 +83,7 @@ static void fpsimd_sve_sync(struct kvm_vcpu *vcpu)
 	if (vcpu_has_sve(vcpu))
 		__hyp_sve_save_guest(vcpu);
 	else
-		__fpsimd_save_state(&vcpu->arch.ctxt.fp_regs);
+		fpsimd_save_state(&vcpu->arch.ctxt.fp_regs);
 
 	has_fpmr = kvm_has_fpmr(kern_hyp_va(vcpu->kvm));
 	if (has_fpmr)
@@ -92,7 +92,7 @@ static void fpsimd_sve_sync(struct kvm_vcpu *vcpu)
 	if (system_supports_sve())
 		__hyp_sve_restore_host();
 	else
-		__fpsimd_restore_state(&hctxt->fp_regs);
+		fpsimd_load_state(&hctxt->fp_regs);
 
 	if (has_fpmr)
 		write_sysreg_s(ctxt_sys_reg(hctxt, FPMR), SYS_FPMR);
-- 
2.30.2




* [PATCH 08/18] arm64: fpsimd: Use assembler for baseline SME instructions
From: Mark Rutland @ 2026-05-21 13:25 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm
  Cc: broonie, catalin.marinas, james.morse, mark.rutland, maz, oupton,
	tabba, will
In-Reply-To: <20260521132556.584676-1-mark.rutland@arm.com>

We currently support assemblers which do not support SME instructions,
and have macros to manually encode SME instructions. This was
necessary historically as SME support was developed before assembler
support was widely available, but things have changed:

* All currently supported versions of LLVM support baseline SME
  instructions. Building the kernel requires LLVM 15+, while LLVM 13+
  supports SME.

* GNU binutils has supported baseline SME instructions since 2.38, which
  was released on 09 February 2022. Toolchains using this or later are
  widely available. For example Debian 12 (released on 10 June 2023)
  provides binutils 2.40. Toolchains provided by kernel.org have included
  binutils 2.38+ since the GCC 12.1.0 release (released between 06 May
  2022 and 17 August 2022).

* For various reasons, SME support was marked as BROKEN, and re-enabled
  in v6.16 (released on 27 July 2025). The earliest supported LTS kernel
  with SME support is v6.18.y. v6.18 was tagged on 30 November 2025, and
  contemporary toolchains (GCC 15.2 and binutils 2.45) supported
  baseline SME instructions.

* Any distribution which intends to support SME will presumably have a
  toolchain that supports baseline SME instructions such that userspace
  can be built.

Considering the above, there's no practical benefit to allowing SME to
be built when the toolchain doesn't support baseline SME instructions.

Make CONFIG_ARM64_SME depend on assembler support for SME, and remove
the manual encoding of SME instructions. The various _sme_<insn> macros
are kept for now, and will be cleaned up in subsequent patches.
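
For reference, the manual encodings being removed can be modelled in
userspace C as below. The constants and bit positions are copied from the
.inst expressions deleted by this patch. The sketch_enc_ function names
are hypothetical, and the range checks are an assumed equivalent of the
_check_general_reg, _check_num and _sme_check_wv macros.

```c
#include <assert.h>
#include <stdint.h>

/* RDSVL X<nx>, #<imm>, as encoded by the removed _sme_rdsvl macro. */
static inline uint32_t sketch_enc_rdsvl(unsigned int nx, int imm)
{
	assert(nx <= 30);			/* general register number */
	assert(imm >= -0x20 && imm <= 0x1f);	/* 6-bit signed immediate */

	return 0x04bf5800u | nx | (((uint32_t)imm & 0x3f) << 5);
}

/*
 * STR ZA[W<nw>, #<offset>], [X<nxbase>, #<offset>, MUL VL], as encoded
 * by the removed _sme_str_zav macro (including its "& 7" offset mask).
 */
static inline uint32_t sketch_enc_str_zav(unsigned int nw,
					  unsigned int nxbase,
					  unsigned int offset)
{
	assert(nw >= 12 && nw <= 15);		/* vector select register */
	assert(nxbase <= 30);

	return 0xe1200000u | ((nw & 3) << 13) | (nxbase << 5) | (offset & 7);
}
```

With assembler support, each of these bit manipulations is replaced by
the plain mnemonic, and the assembler performs the operand checks.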

A couple of SME2 instructions require a more recent toolchain, and are
left as-is for now. I've looked through releases of binutils and LLVM to
find when support was added, and noted this in a comment.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/Kconfig                    |  5 ++++
 arch/arm64/include/asm/fpsimdmacros.h | 38 +++++++++++----------------
 2 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fe60738e5943b..378e50fef247a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2247,10 +2247,15 @@ config ARM64_SVE
 	  booting the kernel.  If unsure and you are not observing these
 	  symptoms, you should assume that it is safe to say Y.

+config AS_HAS_SME
+	# Supported by LLVM 13+ and binutils 2.38+
+	def_bool $(as-instr,.arch_extension sme)
+
 config ARM64_SME
 	bool "ARM Scalable Matrix Extension support"
 	default y
 	depends on ARM64_SVE
+	depends on AS_HAS_SME
 	help
 	  The Scalable Matrix Extension (SME) is an extension to the AArch64
 	  execution state which utilises a substantial subset of the SVE
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index 1122eea6daacf..d0bdbbf2d44ad 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -148,46 +148,38 @@
 	pfalse	p\np\().b
 .endm

-/* SME instruction encodings for non-SME-capable assemblers */
-/* (pre binutils 2.38/LLVM 13) */
+/* Deprecated macros for SME instructions */

 /* RDSVL X\nx, #\imm */
 .macro _sme_rdsvl nx, imm
-	_check_general_reg \nx
-	_check_num (\imm), -0x20, 0x1f
-	.inst	0x04bf5800			\
-		| (\nx)				\
-		| (((\imm) & 0x3f) << 5)
+	.arch_extension sme
+	rdsvl x\nx, #\imm
 .endm

 /*
  * STR (vector from ZA array):
- *	STR ZA[\nw, #\offset], [X\nxbase, #\offset, MUL VL]
+ *	STR ZA[W\nw, #\offset], [X\nxbase, #\offset, MUL VL]
  */
 .macro _sme_str_zav nw, nxbase, offset=0
-	_sme_check_wv \nw
-	_check_general_reg \nxbase
-	_check_num (\offset), -0x100, 0xff
-	.inst	0xe1200000			\
-		| (((\nw) & 3) << 13)		\
-		| ((\nxbase) << 5)		\
-		| ((\offset) & 7)
+	.arch_extension sme
+	str	za[w\nw, #\offset], [x\nxbase, #\offset, MUL VL]
 .endm

 /*
  * LDR (vector to ZA array):
- *	LDR ZA[\nw, #\offset], [X\nxbase, #\offset, MUL VL]
+ *	LDR ZA[W\nw, #\offset], [X\nxbase, #\offset, MUL VL]
  */
 .macro _sme_ldr_zav nw, nxbase, offset=0
-	_sme_check_wv \nw
-	_check_general_reg \nxbase
-	_check_num (\offset), -0x100, 0xff
-	.inst	0xe1000000			\
-		| (((\nw) & 3) << 13)		\
-		| ((\nxbase) << 5)		\
-		| ((\offset) & 7)
+	.arch_extension sme
+	ldr	za[w\nw, #\offset], [x\nxbase, #\offset, MUL VL]
 .endm

+/*
+ * SME2 instruction encodings for older assemblers.
+ * Supported by binutils 2.41+.
+ * Supported by LLVM 16+
+ */
+
 /*
  * LDR (ZT0)
  *
-- 
2.30.2
