Linux-ARM-Kernel Archive on lore.kernel.org

Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 00/19] init: discoverable root partitions, a.k.a. an omittable "root=" cmdline option
From: Al Viro @ 2026-06-15 17:04 UTC (permalink / raw)
  To: Vincent Mailhol
  Cc: Jens Axboe, Davidlohr Bueso, Christian Brauner, Jan Kara,
	linux-kernel, linux-block, linux-efi, linux-fsdevel,
	Richard Henderson, Matt Turner, Magnus Lindholm, linux-alpha,
	Vineet Gupta, linux-snps-arc, Russell King, linux-arm-kernel,
	Catalin Marinas, Will Deacon, Huacai Chen, WANG Xuerui, loongarch,
	Thomas Bogendoerfer, linux-mips, James E.J. Bottomley,
	Helge Deller, linux-parisc, Madhavan Srinivasan, Michael Ellerman,
	linuxppc-dev, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	linux-riscv, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	linux-s390, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Jonathan Corbet, Shuah Khan, linux-doc
In-Reply-To: <20260615-discoverable-root_partitions-v1-0-39c78fac42e2@kernel.org>

On Mon, Jun 15, 2026 at 06:08:56PM +0200, Vincent Mailhol wrote:

> Tested with GRUB, which implements the LoaderDevicePartUUID EFI variable
> in its bli module [3]. With this, I was able to boot a kernel with a
> completely empty cmdline and no initrd.
> 
> [1] The Discoverable Partitions Specification (DPS)
> Link: https://uapi-group.org/specifications/specs/discoverable_partitions_specification/
> 
> [2] systemd-gpt-auto-generator
> Link: https://www.freedesktop.org/software/systemd/man/latest/systemd-gpt-auto-generator.html
> 
> [3] GRUB -- §16.2 bli
> Link: https://www.gnu.org/software/grub/manual/grub/html_node/bli_005fmodule.html

So what does that thing, tied to EFI as it is, have to do with architectures where
	* firmware is rather unlike EFI
	* firmware wouldn't know what to do with GPT
	* GRUB is *not* ported to, let alone used
such as, say it, the very first one mentioned at your [1]?

Or is that conditional upon "if anyone wants to design replacement firmware
for those, and if they agree to follow our wishlist"?


^ permalink raw reply

* Re: [PATCH v8 05/12] iommu/arm-smmu-v3: Cache and restore MSI config
From: Mostafa Saleh @ 2026-06-15 17:04 UTC (permalink / raw)
  To: Pranjal Shrivastava
  Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
	Nicolin Chen, Daniel Mentz, Ashish Mhetre, linux-arm-kernel
In-Reply-To: <20260601215909.3958732-6-praan@google.com>

On Mon, Jun 01, 2026 at 09:59:02PM +0000, Pranjal Shrivastava wrote:
> The SMMU's MSI configuration registers (*_IRQ_CFGn) containing target
> address, data and memory attributes lose their state when the SMMU is
> powered down. We'll need to cache and restore their contents to ensure
> that MSIs work after the system resumes.
> 
> To address this, cache the original `msi_msg` within the `msi_desc`
> when the configuration is first written by `arm_smmu_write_msi_msg`.
> This primarily includes the target address and data since the memory
> attributes are fixed.
> 
> Introduce a new helper `arm_smmu_resume_msis` which will later be called
> during the driver's resume callback. The helper would retrieve the
> cached MSI message for each relevant interrupt (evtq, gerr, priq) via
> get_cached_msi_msg & re-config the registers via arm_smmu_write_msi_msg.
> 
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Pranjal Shrivastava <praan@google.com>

Reviewed-by: Mostafa Saleh <smostafa@google.com>

Thanks,
Mostafa

> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 37 +++++++++++++++++++++
>  1 file changed, 37 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 8682be5060ed..93cee32f6c99 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -4551,6 +4551,9 @@ static void arm_smmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
>  	struct arm_smmu_device *smmu = dev_get_drvdata(dev);
>  	phys_addr_t *cfg = arm_smmu_msi_cfg[desc->msi_index];
>  
> +	/* Cache the msi_msg for resume */
> +	desc->msg = *msg;
> +
>  	doorbell = (((u64)msg->address_hi) << 32) | msg->address_lo;
>  	doorbell &= MSI_CFG0_ADDR_MASK;
>  
> @@ -4559,6 +4562,40 @@ static void arm_smmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
>  	writel_relaxed(ARM_SMMU_MEMATTR_DEVICE_nGnRE, smmu->base + cfg[2]);
>  }
>  
> +static void arm_smmu_resume_msi(struct arm_smmu_device *smmu,
> +				unsigned int irq, const char *name)
> +{
> +	struct msi_desc *desc;
> +	struct msi_msg msg;
> +
> +	if (!irq)
> +		return;
> +
> +	desc = irq_get_msi_desc(irq);
> +	if (!desc) {
> +		dev_err(smmu->dev, "Failed to resume msi: %s", name);
> +		return;
> +	}
> +
> +	get_cached_msi_msg(irq, &msg);
> +	arm_smmu_write_msi_msg(desc, &msg);
> +}
> +
> +static void arm_smmu_resume_msis(struct arm_smmu_device *smmu)
> +{
> +	if (!(smmu->features & ARM_SMMU_FEAT_MSI))
> +		return;
> +
> +	if (!dev_get_msi_domain(smmu->dev))
> +		return;
> +
> +	arm_smmu_resume_msi(smmu, smmu->gerr_irq, "gerror");
> +	arm_smmu_resume_msi(smmu, smmu->evtq.q.irq, "evtq");
> +
> +	if (smmu->features & ARM_SMMU_FEAT_PRI)
> +		arm_smmu_resume_msi(smmu, smmu->priq.q.irq, "priq");
> +}
> +
>  static void arm_smmu_setup_msis(struct arm_smmu_device *smmu)
>  {
>  	int ret, nvec = ARM_SMMU_MAX_MSIS;
> -- 
> 2.54.0.1013.g208068f2d8-goog
> 
> 


^ permalink raw reply

* Re: [PATCH v8 04/12] iommu/tegra241-cmdqv: Restore PROD and CONS after resume
From: Mostafa Saleh @ 2026-06-15 17:01 UTC (permalink / raw)
  To: Pranjal Shrivastava
  Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
	Nicolin Chen, Daniel Mentz, Ashish Mhetre, linux-arm-kernel
In-Reply-To: <20260601215909.3958732-5-praan@google.com>

On Mon, Jun 01, 2026 at 09:59:01PM +0000, Pranjal Shrivastava wrote:
> From: Ashish Mhetre <amhetre@nvidia.com>
> 
> PROD and CONS indices for vcmdqs are getting set to 0 after resume.
> Because of this the vcmdq is not consuming commands after resume.
> Fix this by restoring PROD and CONS indices after resume from
> saved pointers.

What commands are exisiting at resume? Won't
tegra241_cmdqv_drain_vintf0_lvcmdqs() drain the queues and make the
PROD and CONS equal each other anyway?

Thanks,
Mostafa

> 
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
> Signed-off-by: Pranjal Shrivastava <praan@google.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
> index cb1e75e4ee91..866cae7b73e5 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
> @@ -511,6 +511,8 @@ static int tegra241_vcmdq_hw_init(struct tegra241_vcmdq *vcmdq)
>  
>  	/* Configure and enable VCMDQ */
>  	writeq_relaxed(vcmdq->cmdq.q.q_base, REG_VCMDQ_PAGE1(vcmdq, BASE));
> +	writel_relaxed(vcmdq->cmdq.q.llq.prod, REG_VCMDQ_PAGE0(vcmdq, PROD));
> +	writel_relaxed(vcmdq->cmdq.q.llq.cons, REG_VCMDQ_PAGE0(vcmdq, CONS));
>  
>  	ret = vcmdq_write_config(vcmdq, VCMDQ_EN);
>  	if (ret) {
> -- 
> 2.54.0.1013.g208068f2d8-goog
> 
> 


^ permalink raw reply

* Re: [PATCH v8 03/12] iommu/tegra241-cmdqv: Add a helper to drain VCMDQs
From: Mostafa Saleh @ 2026-06-15 16:58 UTC (permalink / raw)
  To: Pranjal Shrivastava
  Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
	Nicolin Chen, Daniel Mentz, Ashish Mhetre, linux-arm-kernel
In-Reply-To: <20260601215909.3958732-4-praan@google.com>

On Mon, Jun 01, 2026 at 09:59:00PM +0000, Pranjal Shrivastava wrote:
> The tegra241-cmdqv driver supports vCMDQs which need to be drained
> before suspending the SMMU. The current driver implementation only uses
> VINTF0 for vCMDQs owned by the kernel which need to be drained. Add a
> helper that drains all the enabled vCMDQs under VINTF0.
> 
> Add another function ptr to arm_smmu_impl_ops to drain implementation
> specified queues and call it within `arm_smmu_drain_queues`.
> 
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Pranjal Shrivastava <praan@google.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  7 +++++
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  1 +
>  .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c    | 27 +++++++++++++++++++
>  3 files changed, 35 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 0e77ef1e4523..8682be5060ed 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -915,6 +915,13 @@ static int arm_smmu_drain_queues(struct arm_smmu_device *smmu)
>  	 */
>  	ret = arm_smmu_queue_poll_until_empty(smmu, &smmu->cmdq.q);
>  
> +	if (ret)
> +		goto out;
> +
> +	/* Drain all implementation-specific queues */
> +	if (smmu->impl_ops && smmu->impl_ops->drain_queues)
> +		ret = smmu->impl_ops->drain_queues(smmu);
> +out:
>  	return ret;
>  }
>  
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index c855ab4962ed..24d5e28eea88 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -885,6 +885,7 @@ struct arm_smmu_impl_ops {
>  	size_t (*get_viommu_size)(enum iommu_viommu_type viommu_type);
>  	int (*vsmmu_init)(struct arm_vsmmu *vsmmu,
>  			  const struct iommu_user_data *user_data);
> +	int (*drain_queues)(struct arm_smmu_device *smmu);
>  };
>  
>  /* An SMMUv3 instance */
> diff --git a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
> index 67be62a6e764..cb1e75e4ee91 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
> @@ -414,6 +414,32 @@ tegra241_cmdqv_get_cmdq(struct arm_smmu_device *smmu,
>  	return &vcmdq->cmdq;
>  }
>  
> +static int tegra241_cmdqv_drain_vintf0_lvcmdqs(struct arm_smmu_device *smmu)
> +{
> +	struct tegra241_cmdqv *cmdqv =
> +		container_of(smmu, struct tegra241_cmdqv, smmu);
> +	struct tegra241_vintf *vintf = cmdqv->vintfs[0];
> +	int ret = 0;
> +	u16 lidx;
> +
> +	/* Kernel only uses VINTF0. Return if it's disabled */
> +	if (!READ_ONCE(vintf->enabled))
> +		return 0;

I am not familiar with this driver, but the READ_ONCE() caught my eye,
I see that’s already what is the existing code is doing, but it is not
clear to me why, it seems to be an attempt to make this path lockless.

However, won’t we need some aquire/release semantics?

For example in tegra241_vintf_hw_deinit() it WRITE_ONCE() cmdq and then
vintf and finally writel() with a write memory barrier.

While in tegra241_cmdqv_drain_vintf0_lvcmdqs() (or in
tegra241_cmdqv_get_cmdq()) it checks READ_ONCE(vintf->enabled) then
READ_ONCE(vcmdq->enabled)

Now it is possible that this executes in any order, due to the lack
of barriers,which means you can see:
Thread#1: READ_ONCE(vintf->enabled) => TRUE
Thread#2: Writes both vintf->enabled and vcmdq->enabled to FALSE
Thread#1: Still sees vcmdq->enabled as TRUE because it was speculated.

Am I missing something?

Thanks,
Mostafa

> +
> +	for (lidx = 0; lidx < cmdqv->num_lvcmdqs_per_vintf; lidx++) {
> +		struct tegra241_vcmdq *vcmdq = vintf->lvcmdqs[lidx];
> +
> +		if (!vcmdq || !READ_ONCE(vcmdq->enabled))
> +			continue;
> +
> +		ret = arm_smmu_queue_poll_until_empty(smmu, &vcmdq->cmdq.q);
> +		if (ret)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
>  /* HW Reset Functions */
>  
>  /*
> @@ -845,6 +871,7 @@ static struct arm_smmu_impl_ops tegra241_cmdqv_impl_ops = {
>  	.get_secondary_cmdq = tegra241_cmdqv_get_cmdq,
>  	.device_reset = tegra241_cmdqv_hw_reset,
>  	.device_remove = tegra241_cmdqv_remove,
> +	.drain_queues = tegra241_cmdqv_drain_vintf0_lvcmdqs,
>  	/* For user-space use */
>  	.hw_info = tegra241_cmdqv_hw_info,
>  	.get_viommu_size = tegra241_cmdqv_get_vintf_size,
> -- 
> 2.54.0.1013.g208068f2d8-goog
> 
> 


^ permalink raw reply

* Re: [PATCH RFC 1/2] dt-bindings: pinctl: amlogic,pinctrl-a4: Add gpio irq property
From: Conor Dooley @ 2026-06-15 16:52 UTC (permalink / raw)
  To: xianwei.zhao
  Cc: Linus Walleij, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Neil Armstrong, Kevin Hilman, Jerome Brunet, Martin Blumenstingl,
	linux-amlogic, linux-gpio, devicetree, linux-kernel,
	linux-arm-kernel
In-Reply-To: <20260611-gpio-to-irq-v1-1-12201716f23f@amlogic.com>

[-- Attachment #1: Type: text/plain, Size: 85 bytes --]

Given Linus' comments on the cover letter,
pw-bot: changes-requested

Thanks,
Conor.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH v6 1/7] dt-bindings: mfd: mt6397: Add MT6392 PMIC
From: Conor Dooley @ 2026-06-15 16:50 UTC (permalink / raw)
  To: Luca Leonardo Scorcia
  Cc: linux-mediatek, Fabien Parent, Val Packett, Dmitry Torokhov,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Sen Chu,
	Sean Wang, Macpaul Lin, Lee Jones, Matthias Brugger,
	AngeloGioacchino Del Regno, Linus Walleij, Julien Massot,
	Louis-Alexis Eyraud, Akari Tsuyukusa, Chen Zhong, linux-input,
	devicetree, linux-kernel, linux-pm, linux-arm-kernel, linux-gpio
In-Reply-To: <20260612200717.361018-2-l.scorcia@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1612 bytes --]

On Fri, Jun 12, 2026 at 10:04:06PM +0200, Luca Leonardo Scorcia wrote:
> From: Fabien Parent <parent.f@gmail.com>
> 
> Add the initial bindings for the MT6392 PMIC and its RTC device.
> 
> Signed-off-by: Fabien Parent <parent.f@gmail.com>
> Signed-off-by: Val Packett <val@packett.cool>
> Signed-off-by: Luca Leonardo Scorcia <l.scorcia@gmail.com>

Sashiko complaint about missing regulators looks valid.
Is it?

Cheers,
Conor.

> ---
>  .../devicetree/bindings/mfd/mediatek,mt6397.yaml          | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/mfd/mediatek,mt6397.yaml b/Documentation/devicetree/bindings/mfd/mediatek,mt6397.yaml
> index 3cbc0dc12c31..e39e81aa9924 100644
> --- a/Documentation/devicetree/bindings/mfd/mediatek,mt6397.yaml
> +++ b/Documentation/devicetree/bindings/mfd/mediatek,mt6397.yaml
> @@ -40,6 +40,10 @@ properties:
>            - mediatek,mt6358
>            - mediatek,mt6359
>            - mediatek,mt6397
> +      - items:
> +          - enum:
> +              - mediatek,mt6392
> +          - const: mediatek,mt6323
>        - items:
>            - enum:
>                - mediatek,mt6366
> @@ -72,6 +76,10 @@ properties:
>                - mediatek,mt6331-rtc
>                - mediatek,mt6358-rtc
>                - mediatek,mt6397-rtc
> +          - items:
> +              - enum:
> +                  - mediatek,mt6392-rtc
> +              - const: mediatek,mt6323-rtc
>            - items:
>                - enum:
>                    - mediatek,mt6359-rtc
> -- 
> 2.43.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH RFC 3/9] net: stmmac: qcom-ethqos: fix RGMII_ID mode to use DLL bypass
From: Andrew Lunn @ 2026-06-15 16:48 UTC (permalink / raw)
  To: Mohd Ayaan Anwar
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Richard Cochran, Bjorn Andersson, Konrad Dybcio, Maxime Coquelin,
	Alexandre Torgue, Russell King, linux-arm-msm, netdev, devicetree,
	linux-kernel, linux-stm32, linux-arm-kernel
In-Reply-To: <ai93X/cNWHtEQsDt@oss.qualcomm.com>

On Mon, Jun 15, 2026 at 09:24:07AM +0530, Mohd Ayaan Anwar wrote:
> Hello Andrew,
> On Thu, Jun 11, 2026 at 10:54:37PM +0200, Andrew Lunn wrote:
> > On Fri, Jun 12, 2026 at 12:06:59AM +0530, Mohd Ayaan Anwar wrote:
> > > When "rgmii-id" is selected the PHY supplies both TX and RX delays, so
> > > the MAC must not add its own.  The driver currently falls through to the
> > > generic DLL initialisation path which programs it to add a delay.
> > > 
> > > Power down the DLL and set DDR bypass mode for RGMII_ID, then program
> > > the IO_MACRO via a new ethqos_rgmii_id_macro_init() helper.  Also fix
> > > ethqos_set_clk_tx_rate() to not double the clock rate in bypass mode at
> > > 100M/10M, and remove RGMII_ID from the phase-shift suppression in
> > > ethqos_rgmii_macro_init() since RGMII_ID no longer reaches that path.
> > 
> > I'm curious how this works at the moment? Do no boards make use of
> > RGMII ID? Are all current boards broken?
> 
> Searching through the DTS, I found that we have two boards using "rgmii"
> (qcs404-evb-4000.dts and sa8155-adp.dts) and another board using
> "rgmii-txid" (sa8540p-ride.dts). No board which uses RGMII ID.

So this causes problems. We cannot break existing boards, yet it would
be good to fix the current broken behaviour.

> I don't think any of these boards have extra long wires which would add
> PCB level delay. They are against the netdev definitions for "rgmii" and
> "rgmii-txid".
> 
> But the first two boards should still be working fine since the current
> driver programs the IO_MACRO to add the delay when operating in RGMII
> mode.

Which is wrong, given the current definition. No delays should be
added, by either the MAC or the PHY.

Please could you contact the Maintainers of these boards and find out
the real situation with the hardware.

It could be the best way forward is that you issue a warning when
"rgmii" is found and pass rgmii-id to the PHY. And you also change the
two boards to use rgmii-id. Lets think about the rgmii-txid case once
we better understand it.

	Andrew


^ permalink raw reply

* Re: [RFC PATCH v4 6/9] dt-bindings: npu: rockchip,rk3588-rknn-core: Add RK3568
From: Conor Dooley @ 2026-06-15 16:49 UTC (permalink / raw)
  To: MidG971
  Cc: tomeu, ogabbay, heiko, robh, krzk+dt, conor+dt, ulf.hansson,
	dri-devel, linux-rockchip, devicetree, linux-arm-kernel, linux-pm,
	iommu, linux-kernel, xxm, chaoyi.chen, finley.xiao, diederik,
	jonas
In-Reply-To: <20260613070116.438906-7-midgy971@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 75 bytes --]

Acked-by: Conor Dooley <conor.dooley@microchip.com>
pw-bot: not-applicable

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH v1 11/11] KVM: arm64: Implement lazy vCPU state sync for non-protected guests
From: Fuad Tabba @ 2026-06-15 16:44 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: Marc Zyngier, Oliver Upton, Will Deacon, Catalin Marinas,
	Quentin Perret, Sebastian Ene, Per Larsen, Suzuki K Poulose,
	Zenghui Yu, Joey Gouly, Steffen Eiden, Mark Rutland,
	Jonathan Cameron, Hyunwoo Kim, linux-arm-kernel, kvmarm,
	linux-kernel
In-Reply-To: <ajAncPp3nOGcWD1U@google.com>

On Mon, 15 Jun 2026 at 17:25, Vincent Donnefort <vdonnefort@google.com> wrote:
>
> On Fri, Jun 12, 2026 at 07:59:25AM +0100, tabba@google.com wrote:
> > pKVM copies a non-protected guest's register context between the host
> > and the hypervisor on every world switch, even when the host never
> > inspects it. Defer the copy: on entry, flush the host context into the
> > hyp vCPU only when the host marked it dirty (PKVM_HOST_STATE_DIRTY); on
> > exit, leave it in the hyp vCPU and copy it back only when the host needs
> > it, via a __pkvm_vcpu_sync_state hypercall on trap handling or at vcpu
> > put. A protected guest's context is copied as before, since lazy sync
> > only helps where the host is trusted to see the guest's registers.
> >
> > The PC is the exception: it is copied back on every exit so the
> > kvm_exit tracepoint reports the guest's real exit PC rather than the
> > value left by the previous sync.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_asm.h   |  1 +
> >  arch/arm64/include/asm/kvm_host.h  |  2 +
> >  arch/arm64/kvm/arm.c               |  7 +++
> >  arch/arm64/kvm/handle_exit.c       | 22 ++++++++
> >  arch/arm64/kvm/hyp/nvhe/hyp-main.c | 88 ++++++++++++++++++++++++++++--
> >  5 files changed, 115 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > index 043495f7fc78..6e1135b3ded4 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -113,6 +113,7 @@ enum __kvm_host_smccc_func {
> >       __KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
> >       __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
> >       __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
> > +     __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_sync_state,
> >       __KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid,
> >
> >       MARKER(__KVM_HOST_SMCCC_FUNC_MAX)
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index a49042bfa801..1ef660774adc 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -1113,6 +1113,8 @@ struct kvm_vcpu_arch {
> >  /* SError pending for nested guest */
> >  #define NESTED_SERROR_PENDING        __vcpu_single_flag(sflags, BIT(8))
> >
> > +/* pKVM host vcpu state is dirty, needs resync (nVHE-only) */
>
> nit: with hVHE, I guess we can just drop that nVHE-only?

Ack.


>
> > +#define PKVM_HOST_STATE_DIRTY        __vcpu_single_flag(iflags, BIT(4))
> >
> >  /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
> >  #define vcpu_sve_pffr(vcpu) (kern_hyp_va((vcpu)->arch.sve_state) +   \
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index c9f36932c980..a5c54e37778b 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -734,6 +734,10 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >       if (is_protected_kvm_enabled()) {
> >               kvm_call_hyp(__vgic_v3_save_aprs, &vcpu->arch.vgic_cpu.vgic_v3);
> >               kvm_call_hyp_nvhe(__pkvm_vcpu_put);
> > +
> > +             /* __pkvm_vcpu_put implies a sync of the state */
> > +             if (!kvm_vm_is_protected(vcpu->kvm))
> > +                     vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> >       }
> >
> >       kvm_vcpu_put_debug(vcpu);
> > @@ -961,6 +965,9 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
> >               return ret;
> >
> >       if (is_protected_kvm_enabled()) {
> > +             /* Start with the vcpu in a dirty state */
> > +             if (!kvm_vm_is_protected(vcpu->kvm))
> > +                     vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> >               ret = pkvm_create_hyp_vm(kvm);
> >               if (ret)
> >                       return ret;
> > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> > index 54aedf93c78b..dccc3786548b 100644
> > --- a/arch/arm64/kvm/handle_exit.c
> > +++ b/arch/arm64/kvm/handle_exit.c
> > @@ -422,6 +422,21 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu)
> >  {
> >       int handled;
> >
> > +     /*
> > +      * If we run a non-protected VM when protection is enabled
> > +      * system-wide, resync the state from the hypervisor and mark
> > +      * it as dirty on the host side if it wasn't dirty already
> > +      * (which could happen if preemption has taken place).
> > +      */
> > +     if (is_protected_kvm_enabled() && !kvm_vm_is_protected(vcpu->kvm)) {
> > +             preempt_disable();
>
> nit: since we are introducing guard() with that series, this one could be
> guard(preempt)().

Nice one :) Done.


>
> > +             if (!(vcpu_get_flag(vcpu, PKVM_HOST_STATE_DIRTY))) {
> > +                     kvm_call_hyp_nvhe(__pkvm_vcpu_sync_state);
> > +                     vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> > +             }
> > +             preempt_enable();
> > +     }
> > +
> >       /*
> >        * See ARM ARM B1.14.1: "Hyp traps on instructions
> >        * that fail their condition code check"
> > @@ -489,6 +504,13 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
> >  /* For exit types that need handling before we can be preempted */
> >  void handle_exit_early(struct kvm_vcpu *vcpu, int exception_index)
> >  {
> > +     /*
> > +      * We just exited, so the state is clean from a hypervisor
> > +      * perspective.
> > +      */
> > +     if (is_protected_kvm_enabled())
> > +             vcpu_clear_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> > +
> >       if (ARM_SERROR_PENDING(exception_index)) {
> >               if (this_cpu_has_cap(ARM64_HAS_RAS_EXTN)) {
> >                       u64 disr = kvm_vcpu_get_disr(vcpu);
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > index 23e644c24a03..02383b372258 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > @@ -139,6 +139,49 @@ static void sync_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
> >               host_cpu_if->vgic_lr[i] = hyp_cpu_if->vgic_lr[i];
> >  }
> >
> > +
> > +static void __copy_vcpu_state(const struct kvm_vcpu *from_vcpu,
> > +                           struct kvm_vcpu *to_vcpu)
> > +{
> > +     int i;
> > +
> > +     to_vcpu->arch.ctxt.regs         = from_vcpu->arch.ctxt.regs;
> > +     to_vcpu->arch.ctxt.spsr_abt     = from_vcpu->arch.ctxt.spsr_abt;
> > +     to_vcpu->arch.ctxt.spsr_und     = from_vcpu->arch.ctxt.spsr_und;
> > +     to_vcpu->arch.ctxt.spsr_irq     = from_vcpu->arch.ctxt.spsr_irq;
> > +     to_vcpu->arch.ctxt.spsr_fiq     = from_vcpu->arch.ctxt.spsr_fiq;
> > +     to_vcpu->arch.ctxt.fp_regs      = from_vcpu->arch.ctxt.fp_regs;
> > +
> > +     /*
> > +      * Copy the sysregs, but don't mess with the timer state which
> > +      * is directly handled by EL1 and is expected to be preserved.
> > +      * enum vcpu_sysreg is sparse: VNCR-mapped registers take values
> > +      * derived from their VNCR page offset, so the timer registers do
> > +      * not form a contiguous numeric range and must be skipped by name.
> > +      */
> > +     for (i = 1; i < NR_SYS_REGS; i++) {
> > +             switch (i) {
> > +             case CNTVOFF_EL2:
> > +             case CNTV_CVAL_EL0:
> > +             case CNTV_CTL_EL0:
> > +             case CNTP_CVAL_EL0:
> > +             case CNTP_CTL_EL0:
> > +                     continue;
> > +             }
> > +             to_vcpu->arch.ctxt.sys_regs[i] = from_vcpu->arch.ctxt.sys_regs[i];
> > +     }
> > +}
> > +
> > +static void __sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
> > +{
> > +     __copy_vcpu_state(&hyp_vcpu->vcpu, hyp_vcpu->host_vcpu);
> > +}
> > +
> > +static void __flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
> > +{
> > +     __copy_vcpu_state(hyp_vcpu->host_vcpu, &hyp_vcpu->vcpu);
> > +}
>
> nit: Could that be flush/sync_hyp_vcpu_state? as everything this is called
> "state" and we already have flush_debug_state() below ?

Good point, renamed to flush_hyp_vcpu_state()/sync_hyp_vcpu_state().

>
> > +
> >  static void flush_debug_state(struct pkvm_hyp_vcpu *hyp_vcpu)
> >  {
> >       struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
> > @@ -168,7 +211,17 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
> >       fpsimd_sve_flush();
> >       flush_debug_state(hyp_vcpu);
> >
> > -     hyp_vcpu->vcpu.arch.ctxt        = host_vcpu->arch.ctxt;
> > +     /*
> > +      * If we deal with a non-protected guest and the state is potentially
> > +      * dirty (from a host perspective), copy the state back into the hyp
> > +      * vcpu.
> > +      */
> > +     if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
> > +             if (vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY))
> > +                     __flush_hyp_vcpu(hyp_vcpu);
> > +     } else {
> > +             hyp_vcpu->vcpu.arch.ctxt = host_vcpu->arch.ctxt;
> > +     }
> >
> >       hyp_vcpu->vcpu.arch.mdcr_el2    = host_vcpu->arch.mdcr_el2;
> >       hyp_vcpu->vcpu.arch.hcr_el2 &= ~(HCR_TWI | HCR_TWE);
> > @@ -191,9 +244,11 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
> >       fpsimd_sve_sync(&hyp_vcpu->vcpu);
> >       sync_debug_state(hyp_vcpu);
> >
> > -     host_vcpu->arch.ctxt            = hyp_vcpu->vcpu.arch.ctxt;
> > -
> > -     host_vcpu->arch.hcr_el2         = hyp_vcpu->vcpu.arch.hcr_el2;
> > +     if (pkvm_hyp_vcpu_is_protected(hyp_vcpu))
> > +             host_vcpu->arch.ctxt = hyp_vcpu->vcpu.arch.ctxt;
> > +     else
> > +             /* Keep the PC current for the kvm_exit tracepoint (lazy ctxt sync). */
> > +             host_vcpu->arch.ctxt.regs.pc = hyp_vcpu->vcpu.arch.ctxt.regs.pc;
> >
> >       host_vcpu->arch.fault           = hyp_vcpu->vcpu.arch.fault;
> >
> > @@ -227,8 +282,30 @@ static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
> >  {
> >       struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
> >
> > -     if (hyp_vcpu)
> > +     if (hyp_vcpu) {
> > +             struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
> > +
> > +             if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu) &&
> > +                 !vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY)) {
> > +                     __sync_hyp_vcpu(hyp_vcpu);
> > +             }
> > +
> >               pkvm_put_hyp_vcpu(hyp_vcpu);
> > +     }
> > +}
> > +
> > +static void handle___pkvm_vcpu_sync_state(struct kvm_cpu_context *host_ctxt)
> > +{
> > +     struct pkvm_hyp_vcpu *hyp_vcpu;
> > +
> > +     if (!is_protected_kvm_enabled())
> > +             return;
>
> Since "KVM: arm64: Remove is_protected_kvm_enabled() checks from hypercalls" we
> got rid of those is_protected_kvm_enabled() for pKVM-only HVCs. (also, it is
> declared in the pKVM-only section of the HVCs)

Dropped.

Thanks a lot for the reviews!
/fuad
>
> > +
> > +     hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
> > +     if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
> > +             return;
> > +
> > +     __sync_hyp_vcpu(hyp_vcpu);
> >  }
> >
> >  static struct kvm_vcpu *__get_host_hyp_vcpus(struct kvm_vcpu *arg,
> > @@ -859,6 +936,7 @@ static const hcall_t host_hcall[] = {
> >       HANDLE_FUNC(__pkvm_finalize_teardown_vm),
> >       HANDLE_FUNC(__pkvm_vcpu_load),
> >       HANDLE_FUNC(__pkvm_vcpu_put),
> > +     HANDLE_FUNC(__pkvm_vcpu_sync_state),
> >       HANDLE_FUNC(__pkvm_tlb_flush_vmid),
> >  };
> >
> > --
> > 2.54.0.1136.gdb2ca164c4-goog
> >


^ permalink raw reply

* Re: [PATCH 3/8] dt-bindings: clock: clocking-wizard: Make s_axi_aclk optional for static-config
From: Conor Dooley @ 2026-06-15 16:44 UTC (permalink / raw)
  To: Shubhrajyoti Datta
  Cc: linux-clk, linux-kernel, git, Michael Turquette, Stephen Boyd,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Michal Simek,
	devicetree, linux-arm-kernel
In-Reply-To: <20260615-squid-showy-435c9cf780a0@spud>

[-- Attachment #1: Type: text/plain, Size: 391 bytes --]

On Mon, Jun 15, 2026 at 05:42:17PM +0100, Conor Dooley wrote:
> 
> 
> Acked-by: Conor Dooley <conor.dooley@microchip.com>
> pw-bot: not-applicable

Actually, I take this back. Patch 1 seems to be what's adding the static
configurations in the first place and then patches 2 and 3 complete that
effort. Instead, please add this static config support as one patch.

Thanks,
Conor.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH] clk: at91: Read "reg" with helper
From: Brian Masney @ 2026-06-15 16:42 UTC (permalink / raw)
  To: Rob Herring (Arm)
  Cc: Michael Turquette, Stephen Boyd, Nicolas Ferre, Alexandre Belloni,
	Claudiu Beznea, linux-clk, linux-arm-kernel, linux-kernel
In-Reply-To: <20260612215251.1888345-1-robh@kernel.org>

On Fri, Jun 12, 2026 at 04:52:51PM -0500, Rob Herring (Arm) wrote:
> The "reg" property is an address-sized DT cell property. The AT91
> compat clock parser only uses a small bus id from it, but reading it
> with the u8 helper does not match the property encoding.
> 
> Use of_property_read_reg() so the code goes through the helper for
> "reg" properties, then keep the existing range check before passing
> the bus id to the clock registration code.
> 
> Assisted-by: Codex:gpt-5-5
> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>

Reviewed-by: Brian Masney <bmasney@redhat.com>



^ permalink raw reply

* Re: [PATCH 3/8] dt-bindings: clock: clocking-wizard: Make s_axi_aclk optional for static-config
From: Conor Dooley @ 2026-06-15 16:42 UTC (permalink / raw)
  To: Shubhrajyoti Datta
  Cc: linux-clk, linux-kernel, git, Michael Turquette, Stephen Boyd,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Michal Simek,
	devicetree, linux-arm-kernel
In-Reply-To: <20260615034845.3320286-4-shubhrajyoti.datta@amd.com>

[-- Attachment #1: Type: text/plain, Size: 77 bytes --]



Acked-by: Conor Dooley <conor.dooley@microchip.com>
pw-bot: not-applicable

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH v10 4/6] clk: Add KUnit tests for assigned-clock-sscs
From: Brian Masney @ 2026-06-15 16:40 UTC (permalink / raw)
  To: Peng Fan (OSS)
  Cc: Michael Turquette, Stephen Boyd, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Sudeep Holla, Cristian Marussi, Sebin Francis,
	linux-kernel, linux-clk, devicetree, arm-scmi, linux-arm-kernel,
	Peng Fan
In-Reply-To: <20260612-clk-v10-v10-4-eb92484eda38@nxp.com>

On Fri, Jun 12, 2026 at 04:46:26PM +0800, Peng Fan (OSS) wrote:
> From: Peng Fan <peng.fan@nxp.com>
> 
> Add KUnit test coverage for the assigned-clock-sscs DT property that
> configures spread spectrum on clocks before they are used.
> 
> Extend the existing test infrastructure to support spread spectrum:
> - Add struct clk_spread_spectrum field to clk_dummy_context and a
>   clk_dummy_set_spread_spectrum callback
> - Wire set_spread_spectrum into all dummy clock ops
> - Extend clk_assigned_rates_register_clk and test parameter struct
>   to propagate initial SSCS values
> 
> Add a new separate test suite clk_assigned_sscs with three categories:
> 
>   1. clk_assigned_sscs_assigns_one — verifies that a single
>      assigned-clock-sscs entry correctly configures spread spectrum
>      on one clock, testing both provider and consumer paths
> 
>   2. clk_assigned_sscs_assigns_multiple — verifies that multiple
>      assigned-clock-sscs entries configure spread spectrum on two
>      clocks, testing both provider and consumer paths
> 
>   3. clk_assigned_sscs_skips — verifies that malformed DT properties
>      are correctly skipped without error: missing assigned-clocks,
>      zero-valued SSCS, and null phandles, tested for both provider
>      and consumer scenarios
> 
> New DT overlays are added for all test scenarios:
>   - kunit_clk_assigned_sscs_one{,consumer} — single valid entry
>   - kunit_clk_assigned_sscs_multiple{,consumer} — two valid entries
>   - kunit_clk_assigned_sscs_without{,consumer} — missing assigned-clocks
>   - kunit_clk_assigned_sscs_zero{,consumer} — all-zero SSCS values
>   - kunit_clk_assigned_sscs_null{,consumer} — null phandle
> 
> Co-developed-by: Brian Masney <bmasney@redhat.com>
> Signed-off-by: Brian Masney <bmasney@redhat.com>
> Signed-off-by: Peng Fan <peng.fan@nxp.com>

Looks good to me.

It's probably not appropriate for me to also put a Reviewed-by here.

Brian



^ permalink raw reply

* Re: [PATCH net-next v7 11/12] net: pcs: airoha: add PCS driver for Airoha AN7581 SoC
From: Benjamin Larsson @ 2026-06-15 16:31 UTC (permalink / raw)
  To: Christian Marangi, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Simon Horman, Jonathan Corbet, Shuah Khan,
	Lorenzo Bianconi, Heiner Kallweit, Russell King, Saravana Kannan,
	Philipp Zabel, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
	Justin Stitt, netdev, devicetree, linux-kernel, linux-doc,
	linux-arm-kernel, linux-mediatek, llvm
In-Reply-To: <20260615122950.22281-12-ansuelsmth@gmail.com>

Hi.

On 15/06/2026 14:29, Christian Marangi wrote:
> Add PCS driver for Airoha AN7581 SoC for Ethernet/PON/PCIe/USB SERDES
> and permit usage of external PHY or connected SFP cage. Supported modes
> are USXGMII, 10G-BASER, 2500BASE-X, 1000BASE-X and SGMII.
>
> The driver probe and register the various needed registers and register as
> a PCS provider for fwnode usage.
>
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> ---
>   drivers/net/pcs/Kconfig                    |    2 +
>   drivers/net/pcs/Makefile                   |    2 +
>   drivers/net/pcs/airoha/Kconfig             |   12 +
>   drivers/net/pcs/airoha/Makefile            |    7 +
>   drivers/net/pcs/airoha/pcs-airoha-common.c | 1318 ++++++++++++
>   drivers/net/pcs/airoha/pcs-airoha.h        | 1309 ++++++++++++
>   drivers/net/pcs/airoha/pcs-an7581.c        | 2093 ++++++++++++++++++++
>   7 files changed, 4743 insertions(+)
>   create mode 100644 drivers/net/pcs/airoha/Kconfig
>   create mode 100644 drivers/net/pcs/airoha/Makefile
>   create mode 100644 drivers/net/pcs/airoha/pcs-airoha-common.c
>   create mode 100644 drivers/net/pcs/airoha/pcs-airoha.h
>   create mode 100644 drivers/net/pcs/airoha/pcs-an7581.c

Most likely there will be pcs drivers for the EN7523 platform also. Can 
the common code for an7581 have an7581 in the name instead of airoha?

MvH

Benjamin Larsson



^ permalink raw reply

* Re: [PATCH v1 11/11] KVM: arm64: Implement lazy vCPU state sync for non-protected guests
From: Vincent Donnefort @ 2026-06-15 16:25 UTC (permalink / raw)
  To: tabba
  Cc: Marc Zyngier, Oliver Upton, Will Deacon, Catalin Marinas,
	Quentin Perret, Sebastian Ene, Per Larsen, Suzuki K Poulose,
	Zenghui Yu, Joey Gouly, Steffen Eiden, Mark Rutland,
	Jonathan Cameron, Hyunwoo Kim, linux-arm-kernel, kvmarm,
	linux-kernel
In-Reply-To: <20260612065925.755562-12-tabba@google.com>

On Fri, Jun 12, 2026 at 07:59:25AM +0100, tabba@google.com wrote:
> pKVM copies a non-protected guest's register context between the host
> and the hypervisor on every world switch, even when the host never
> inspects it. Defer the copy: on entry, flush the host context into the
> hyp vCPU only when the host marked it dirty (PKVM_HOST_STATE_DIRTY); on
> exit, leave it in the hyp vCPU and copy it back only when the host needs
> it, via a __pkvm_vcpu_sync_state hypercall on trap handling or at vcpu
> put. A protected guest's context is copied as before, since lazy sync
> only helps where the host is trusted to see the guest's registers.
> 
> The PC is the exception: it is copied back on every exit so the
> kvm_exit tracepoint reports the guest's real exit PC rather than the
> value left by the previous sync.
> 
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/arm64/include/asm/kvm_asm.h   |  1 +
>  arch/arm64/include/asm/kvm_host.h  |  2 +
>  arch/arm64/kvm/arm.c               |  7 +++
>  arch/arm64/kvm/handle_exit.c       | 22 ++++++++
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c | 88 ++++++++++++++++++++++++++++--
>  5 files changed, 115 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 043495f7fc78..6e1135b3ded4 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -113,6 +113,7 @@ enum __kvm_host_smccc_func {
>  	__KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
> +	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_sync_state,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid,
>  
>  	MARKER(__KVM_HOST_SMCCC_FUNC_MAX)
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index a49042bfa801..1ef660774adc 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1113,6 +1113,8 @@ struct kvm_vcpu_arch {
>  /* SError pending for nested guest */
>  #define NESTED_SERROR_PENDING	__vcpu_single_flag(sflags, BIT(8))
>  
> +/* pKVM host vcpu state is dirty, needs resync (nVHE-only) */

nit: with hVHE, I guess we can just drop that nVHE-only? 

> +#define PKVM_HOST_STATE_DIRTY	__vcpu_single_flag(iflags, BIT(4))
>  
>  /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
>  #define vcpu_sve_pffr(vcpu) (kern_hyp_va((vcpu)->arch.sve_state) +	\
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index c9f36932c980..a5c54e37778b 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -734,6 +734,10 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  	if (is_protected_kvm_enabled()) {
>  		kvm_call_hyp(__vgic_v3_save_aprs, &vcpu->arch.vgic_cpu.vgic_v3);
>  		kvm_call_hyp_nvhe(__pkvm_vcpu_put);
> +
> +		/* __pkvm_vcpu_put implies a sync of the state */
> +		if (!kvm_vm_is_protected(vcpu->kvm))
> +			vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
>  	}
>  
>  	kvm_vcpu_put_debug(vcpu);
> @@ -961,6 +965,9 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
>  		return ret;
>  
>  	if (is_protected_kvm_enabled()) {
> +		/* Start with the vcpu in a dirty state */
> +		if (!kvm_vm_is_protected(vcpu->kvm))
> +			vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
>  		ret = pkvm_create_hyp_vm(kvm);
>  		if (ret)
>  			return ret;
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 54aedf93c78b..dccc3786548b 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -422,6 +422,21 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu)
>  {
>  	int handled;
>  
> +	/*
> +	 * If we run a non-protected VM when protection is enabled
> +	 * system-wide, resync the state from the hypervisor and mark
> +	 * it as dirty on the host side if it wasn't dirty already
> +	 * (which could happen if preemption has taken place).
> +	 */
> +	if (is_protected_kvm_enabled() && !kvm_vm_is_protected(vcpu->kvm)) {
> +		preempt_disable();

nit: since we are introducing guard() with that series, this one could be
guard(preempt)().

> +		if (!(vcpu_get_flag(vcpu, PKVM_HOST_STATE_DIRTY))) {
> +			kvm_call_hyp_nvhe(__pkvm_vcpu_sync_state);
> +			vcpu_set_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> +		}
> +		preempt_enable();
> +	}
> +
>  	/*
>  	 * See ARM ARM B1.14.1: "Hyp traps on instructions
>  	 * that fail their condition code check"
> @@ -489,6 +504,13 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
>  /* For exit types that need handling before we can be preempted */
>  void handle_exit_early(struct kvm_vcpu *vcpu, int exception_index)
>  {
> +	/*
> +	 * We just exited, so the state is clean from a hypervisor
> +	 * perspective.
> +	 */
> +	if (is_protected_kvm_enabled())
> +		vcpu_clear_flag(vcpu, PKVM_HOST_STATE_DIRTY);
> +
>  	if (ARM_SERROR_PENDING(exception_index)) {
>  		if (this_cpu_has_cap(ARM64_HAS_RAS_EXTN)) {
>  			u64 disr = kvm_vcpu_get_disr(vcpu);
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 23e644c24a03..02383b372258 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -139,6 +139,49 @@ static void sync_hyp_vgic_state(struct pkvm_hyp_vcpu *hyp_vcpu)
>  		host_cpu_if->vgic_lr[i] = hyp_cpu_if->vgic_lr[i];
>  }
>  
> +
> +static void __copy_vcpu_state(const struct kvm_vcpu *from_vcpu,
> +			      struct kvm_vcpu *to_vcpu)
> +{
> +	int i;
> +
> +	to_vcpu->arch.ctxt.regs		= from_vcpu->arch.ctxt.regs;
> +	to_vcpu->arch.ctxt.spsr_abt	= from_vcpu->arch.ctxt.spsr_abt;
> +	to_vcpu->arch.ctxt.spsr_und	= from_vcpu->arch.ctxt.spsr_und;
> +	to_vcpu->arch.ctxt.spsr_irq	= from_vcpu->arch.ctxt.spsr_irq;
> +	to_vcpu->arch.ctxt.spsr_fiq	= from_vcpu->arch.ctxt.spsr_fiq;
> +	to_vcpu->arch.ctxt.fp_regs	= from_vcpu->arch.ctxt.fp_regs;
> +
> +	/*
> +	 * Copy the sysregs, but don't mess with the timer state which
> +	 * is directly handled by EL1 and is expected to be preserved.
> +	 * enum vcpu_sysreg is sparse: VNCR-mapped registers take values
> +	 * derived from their VNCR page offset, so the timer registers do
> +	 * not form a contiguous numeric range and must be skipped by name.
> +	 */
> +	for (i = 1; i < NR_SYS_REGS; i++) {
> +		switch (i) {
> +		case CNTVOFF_EL2:
> +		case CNTV_CVAL_EL0:
> +		case CNTV_CTL_EL0:
> +		case CNTP_CVAL_EL0:
> +		case CNTP_CTL_EL0:
> +			continue;
> +		}
> +		to_vcpu->arch.ctxt.sys_regs[i] = from_vcpu->arch.ctxt.sys_regs[i];
> +	}
> +}
> +
> +static void __sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
> +{
> +	__copy_vcpu_state(&hyp_vcpu->vcpu, hyp_vcpu->host_vcpu);
> +}
> +
> +static void __flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
> +{
> +	__copy_vcpu_state(hyp_vcpu->host_vcpu, &hyp_vcpu->vcpu);
> +}

nit: Could that be flush/sync_hyp_vcpu_state? as everything this is called
"state" and we already have flush_debug_state() below ?

> +
>  static void flush_debug_state(struct pkvm_hyp_vcpu *hyp_vcpu)
>  {
>  	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
> @@ -168,7 +211,17 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
>  	fpsimd_sve_flush();
>  	flush_debug_state(hyp_vcpu);
>  
> -	hyp_vcpu->vcpu.arch.ctxt	= host_vcpu->arch.ctxt;
> +	/*
> +	 * If we deal with a non-protected guest and the state is potentially
> +	 * dirty (from a host perspective), copy the state back into the hyp
> +	 * vcpu.
> +	 */
> +	if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu)) {
> +		if (vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY))
> +			__flush_hyp_vcpu(hyp_vcpu);
> +	} else {
> +		hyp_vcpu->vcpu.arch.ctxt = host_vcpu->arch.ctxt;
> +	}
>  
>  	hyp_vcpu->vcpu.arch.mdcr_el2	= host_vcpu->arch.mdcr_el2;
>  	hyp_vcpu->vcpu.arch.hcr_el2 &= ~(HCR_TWI | HCR_TWE);
> @@ -191,9 +244,11 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
>  	fpsimd_sve_sync(&hyp_vcpu->vcpu);
>  	sync_debug_state(hyp_vcpu);
>  
> -	host_vcpu->arch.ctxt		= hyp_vcpu->vcpu.arch.ctxt;
> -
> -	host_vcpu->arch.hcr_el2		= hyp_vcpu->vcpu.arch.hcr_el2;
> +	if (pkvm_hyp_vcpu_is_protected(hyp_vcpu))
> +		host_vcpu->arch.ctxt = hyp_vcpu->vcpu.arch.ctxt;
> +	else
> +		/* Keep the PC current for the kvm_exit tracepoint (lazy ctxt sync). */
> +		host_vcpu->arch.ctxt.regs.pc = hyp_vcpu->vcpu.arch.ctxt.regs.pc;
>  
>  	host_vcpu->arch.fault		= hyp_vcpu->vcpu.arch.fault;
>  
> @@ -227,8 +282,30 @@ static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
>  {
>  	struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
>  
> -	if (hyp_vcpu)
> +	if (hyp_vcpu) {
> +		struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
> +
> +		if (!pkvm_hyp_vcpu_is_protected(hyp_vcpu) &&
> +		    !vcpu_get_flag(host_vcpu, PKVM_HOST_STATE_DIRTY)) {
> +			__sync_hyp_vcpu(hyp_vcpu);
> +		}
> +
>  		pkvm_put_hyp_vcpu(hyp_vcpu);
> +	}
> +}
> +
> +static void handle___pkvm_vcpu_sync_state(struct kvm_cpu_context *host_ctxt)
> +{
> +	struct pkvm_hyp_vcpu *hyp_vcpu;
> +
> +	if (!is_protected_kvm_enabled())
> +		return;

Since "KVM: arm64: Remove is_protected_kvm_enabled() checks from hypercalls" we
got rid of those is_protected_kvm_enabled() for pKVM-only HVCs. (also, it is
declared in the pKVM-only section of the HVCs)

> +
> +	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
> +	if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
> +		return;
> +
> +	__sync_hyp_vcpu(hyp_vcpu);
>  }
>  
>  static struct kvm_vcpu *__get_host_hyp_vcpus(struct kvm_vcpu *arg,
> @@ -859,6 +936,7 @@ static const hcall_t host_hcall[] = {
>  	HANDLE_FUNC(__pkvm_finalize_teardown_vm),
>  	HANDLE_FUNC(__pkvm_vcpu_load),
>  	HANDLE_FUNC(__pkvm_vcpu_put),
> +	HANDLE_FUNC(__pkvm_vcpu_sync_state),
>  	HANDLE_FUNC(__pkvm_tlb_flush_vmid),
>  };
>  
> -- 
> 2.54.0.1136.gdb2ca164c4-goog
> 


^ permalink raw reply

* [PATCH 05/19] arm64: define DPS root partition type UUID
From: Vincent Mailhol @ 2026-06-15 16:09 UTC (permalink / raw)
  To: Jens Axboe, Davidlohr Bueso, Alexander Viro, Christian Brauner,
	Jan Kara
  Cc: linux-kernel, linux-block, linux-efi, linux-fsdevel,
	Vincent Mailhol, Catalin Marinas, Will Deacon, linux-arm-kernel
In-Reply-To: <20260615-discoverable-root_partitions-v1-0-39c78fac42e2@kernel.org>

DPS [1] assigns GPT partition type UUIDs to operating system partitions.
Root partitions use architecture-specific type UUIDs so the OS can
discover the intended root filesystem without relying on a root= cmdline
option.

Define DPS_ROOT_PARTITION_TYPE_UUID in asm/dps_root.h for arm64 and select
ARCH_HAS_DPS_ROOT_PARTITION_TYPE_UUID.

[1] The Discoverable Partitions Specification (DPS)
Link: https://uapi-group.org/specifications/specs/discoverable_partitions_specification/

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
---
 arch/arm64/Kconfig                | 1 +
 arch/arm64/include/asm/dps_root.h | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fe60738e5943..190f8dde63b2 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -26,6 +26,7 @@ config ARM64
 	select ARCH_HAS_DEBUG_VM_PGTABLE
 	select ARCH_HAS_DMA_OPS if XEN
 	select ARCH_HAS_DMA_PREP_COHERENT
+	select ARCH_HAS_DPS_ROOT_PARTITION_TYPE_UUID
 	select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
 	select ARCH_HAS_FAST_MULTIPLIER
 	select ARCH_HAS_FORTIFY_SOURCE
diff --git a/arch/arm64/include/asm/dps_root.h b/arch/arm64/include/asm/dps_root.h
new file mode 100644
index 000000000000..7344f9a52343
--- /dev/null
+++ b/arch/arm64/include/asm/dps_root.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_ARM64_DPS_ROOT_H
+#define _ASM_ARM64_DPS_ROOT_H
+
+#define DPS_ROOT_PARTITION_TYPE_UUID "b921b045-1df0-41c3-af44-4c6f280d3fae"
+
+#endif /* _ASM_ARM64_DPS_ROOT_H */

-- 
2.53.0



^ permalink raw reply related

* [PATCH 04/19] arm: define DPS root partition type UUID
From: Vincent Mailhol @ 2026-06-15 16:09 UTC (permalink / raw)
  To: Jens Axboe, Davidlohr Bueso, Alexander Viro, Christian Brauner,
	Jan Kara
  Cc: linux-kernel, linux-block, linux-efi, linux-fsdevel,
	Vincent Mailhol, Russell King, linux-arm-kernel
In-Reply-To: <20260615-discoverable-root_partitions-v1-0-39c78fac42e2@kernel.org>

DPS [1] assigns GPT partition type UUIDs to operating system partitions.
Root partitions use architecture-specific type UUIDs so the OS can
discover the intended root filesystem without relying on a root= cmdline
option.

Define DPS_ROOT_PARTITION_TYPE_UUID in asm/dps_root.h for ARM and select
ARCH_HAS_DPS_ROOT_PARTITION_TYPE_UUID.

[1] The Discoverable Partitions Specification (DPS)
Link: https://uapi-group.org/specifications/specs/discoverable_partitions_specification/

Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
---
 arch/arm/Kconfig                | 1 +
 arch/arm/include/asm/dps_root.h | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 73e6647bea46..deedb5d808fb 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -14,6 +14,7 @@ config ARM
 	select ARCH_HAS_DMA_ALLOC if MMU
 	select ARCH_HAS_DMA_OPS
 	select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE
+	select ARCH_HAS_DPS_ROOT_PARTITION_TYPE_UUID
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_KEEPINITRD
diff --git a/arch/arm/include/asm/dps_root.h b/arch/arm/include/asm/dps_root.h
new file mode 100644
index 000000000000..e9f0f24bcac2
--- /dev/null
+++ b/arch/arm/include/asm/dps_root.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_ARM_DPS_ROOT_H
+#define _ASM_ARM_DPS_ROOT_H
+
+#define DPS_ROOT_PARTITION_TYPE_UUID "69dad710-2ce4-4e3c-b16c-21a1d49abed3"
+
+#endif /* _ASM_ARM_DPS_ROOT_H */

-- 
2.53.0



^ permalink raw reply related

* [PATCH 00/19] init: discoverable root partitions, a.k.a. an omittable "root=" cmdline option
From: Vincent Mailhol @ 2026-06-15 16:08 UTC (permalink / raw)
  To: Jens Axboe, Davidlohr Bueso, Alexander Viro, Christian Brauner,
	Jan Kara
  Cc: linux-kernel, linux-block, linux-efi, linux-fsdevel,
	Vincent Mailhol, Richard Henderson, Matt Turner, Magnus Lindholm,
	linux-alpha, Vineet Gupta, linux-snps-arc, Russell King,
	linux-arm-kernel, Catalin Marinas, Will Deacon, Huacai Chen,
	WANG Xuerui, loongarch, Thomas Bogendoerfer, linux-mips,
	James E.J. Bottomley, Helge Deller, linux-parisc,
	Madhavan Srinivasan, Michael Ellerman, linuxppc-dev,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev, linux-s390,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	Jonathan Corbet, Shuah Khan, linux-doc

DPS [1] defines GPT partition type UUIDs for OS partitions and
attributes that control whether such partitions should be
automatically discovered. The specification states that:

  The OS can discover and mount the necessary file systems with a
  non-existent or incomplete /etc/fstab file and without the root=
  kernel command line option.

DPS is already implemented in systemd-gpt-auto-generator [2], which,
when embedded in an initrd, indeed allows automatic detection of the
root filesystem through its partition type UUID.

This series adds this discovery feature directly into the kernel so
that people who are not using systemd or not using an initrd can still
benefit from it. The implementation follows the same model as
systemd-gpt-auto-generator:

  - GPT partition type UUIDs are used for automatic discovery policy
    only. No root=PARTTYPEUUID=xxx cmdline option or similar syntax is
    added.

  - The root= cmdline option takes precedence. This prevents unexpected
    behaviour.

  - Only the disk with the active EFI System Partition is scanned, as
    required by DPS. The disk is identified through the Boot Loader
    Interface LoaderDevicePartUUID EFI variable.

The DPS no-auto attribute is also implemented, giving another option for
the user to disable this auto discovery. However, the DPS read-only
attribute is intentionally not enforced. The kernel already mounts the
root filesystem read-only by default unless the command line requests
rw, and user space remains responsible for deciding whether a discovered
root should later be remounted read-write based on DPS metadata and
local policy. The other partition type UUIDs (home, swap, var...) are
also out of scope for the same reason: user space remains responsible
for mounting anything other than the root partition.

Patch 1 adds the ARCH_HAS_DPS_ROOT_PARTITION_TYPE_UUID capability and
the hidden CONFIG_DPS_ROOT_AUTO_DISCOVERY Kconfig symbol used to signal
whether the feature is available. Patches 2 to 12 declare the
ARCH_HAS_DPS_ROOT_PARTITION_TYPE_UUID capability for the supported
architectures and define their architecture-specific root partition type
UUID values in asm/dps_root.h.

Patches 13 to 16 make the GPT partition type UUID and the no-auto
attribute available during early block lookup.

Patch 17 is a small code refactor that prepares for patch 18, which
updates the root mount path so that, when root= is omitted, the kernel
reads LoaderDevicePartUUID and uses the early block lookup
infrastructure to discover the DPS root partition on that disk.

Finally, patch 19 documents this automatic root discovery feature.

Tested with GRUB, which implements the LoaderDevicePartUUID EFI variable
in its bli module [3]. With this, I was able to boot a kernel with a
completely empty cmdline and no initrd.

[1] The Discoverable Partitions Specification (DPS)
Link: https://uapi-group.org/specifications/specs/discoverable_partitions_specification/

[2] systemd-gpt-auto-generator
Link: https://www.freedesktop.org/software/systemd/man/latest/systemd-gpt-auto-generator.html

[3] GRUB -- §16.2 bli
Link: https://www.gnu.org/software/grub/manual/grub/html_node/bli_005fmodule.html

Signed-off-by: Vincent Mailhol <mailhol@kernel.org>
---
Vincent Mailhol (19):
      init: add DPS root partition type UUID capability
      alpha: define DPS root partition type UUID
      arc: define DPS root partition type UUID
      arm: define DPS root partition type UUID
      arm64: define DPS root partition type UUID
      loongarch: define DPS root partition type UUID
      mips: define DPS root partition type UUIDs
      parisc: define DPS root partition type UUID
      powerpc: define DPS root partition type UUIDs
      riscv: define DPS root partition type UUIDs
      s390: define DPS root partition type UUIDs
      x86: define DPS root partition type UUIDs
      block: store GPT partition type UUID
      block: add early_lookup_bdev_by_type_uuid()
      block: store GPT attributes as a raw value
      block: don't discover partition with DPS no-auto GPT attribute
      init: factor out root device lookup into lookup_root_device()
      init: discover root by DPS partition type UUID
      docs: document discoverable root partitions

 Documentation/admin-guide/discoverable-root.rst | 33 +++++++++
 Documentation/admin-guide/index.rst             |  1 +
 Documentation/admin-guide/kernel-parameters.txt |  5 ++
 arch/alpha/Kconfig                              |  1 +
 arch/alpha/include/asm/dps_root.h               |  8 +++
 arch/arc/Kconfig                                |  1 +
 arch/arc/include/asm/dps_root.h                 |  8 +++
 arch/arm/Kconfig                                |  1 +
 arch/arm/include/asm/dps_root.h                 |  8 +++
 arch/arm64/Kconfig                              |  1 +
 arch/arm64/include/asm/dps_root.h               |  8 +++
 arch/loongarch/Kconfig                          |  1 +
 arch/loongarch/include/asm/dps_root.h           |  8 +++
 arch/mips/Kconfig                               |  1 +
 arch/mips/include/asm/dps_root.h                | 20 ++++++
 arch/parisc/Kconfig                             |  1 +
 arch/parisc/include/asm/dps_root.h              |  8 +++
 arch/powerpc/Kconfig                            |  1 +
 arch/powerpc/include/asm/dps_root.h             | 16 +++++
 arch/riscv/Kconfig                              |  1 +
 arch/riscv/include/asm/dps_root.h               | 12 ++++
 arch/s390/Kconfig                               |  1 +
 arch/s390/include/asm/dps_root.h                | 12 ++++
 arch/x86/Kconfig                                |  1 +
 arch/x86/include/asm/dps_root.h                 | 12 ++++
 block/blk.h                                     |  1 +
 block/early-lookup.c                            | 68 +++++++++++++++++-
 block/partitions/core.c                         |  2 +
 block/partitions/efi.c                          |  3 +
 block/partitions/efi.h                          | 11 ++-
 include/linux/blk_types.h                       |  1 +
 include/linux/blkdev.h                          |  5 ++
 include/linux/root_dev.h                        |  6 ++
 init/Kconfig                                    |  6 ++
 init/do_mounts.c                                | 94 ++++++++++++++++++++++++-
 35 files changed, 355 insertions(+), 12 deletions(-)
---
base-commit: 36808d5e983985bbda87e01059cccc071fe3ec8d
change-id: 20260611-discoverable-root_partitions-bdacbada570d

Best regards,
-- 
Vincent Mailhol <mailhol@kernel.org>

^ permalink raw reply

* Re: [PATCH] clk: zynq: handle kasprintf() failure in periph_clk registration
From: Brian Masney @ 2026-06-15 16:07 UTC (permalink / raw)
  To: William Theesfeld
  Cc: Michael Turquette, Stephen Boyd, Michal Simek, linux-clk,
	linux-arm-kernel, linux-kernel
In-Reply-To: <20260601203500.658135-1-william@theesfeld.net>

On Mon, Jun 01, 2026 at 04:35:00PM -0400, William Theesfeld wrote:
> zynq_clk_register_periph_clk() ignores the return value of the two
> kasprintf() calls used to build the mux and divider clock names, and
> passes the resulting (possibly NULL) pointers straight into
> clk_register_mux(), clk_register_divider() and clk_register_gate() as
> the clock '"'name'"' argument.  On allocation failure that name later
> gets dereferenced by the clock framework (e.g. in debugfs name
> formatting), causing a NULL-pointer dereference.
> 
> Check both kasprintf() returns.  On failure unwind any allocated name
> buffer and the spinlock, then fall through to the existing err label
> which sets clks[] to ERR_PTR(-ENOMEM).  Freeing the spinlock on the
> error path is correct here because no clk_register_*() call has had
> a chance to take ownership of it; the success path intentionally
> hands it off to the registered clocks.
> 
> The neighbouring zynq_clk_register_fclk() in the same file already
> uses this per-allocation goto-label cleanup pattern; this change
> brings periph_clk into line with it.
> 
> Signed-off-by: William Theesfeld <william@theesfeld.net>

Reviewed-by: Brian Masney <bmasney@redhat.com>



^ permalink raw reply

* Re: [PATCH net-next v7 12/12] net: airoha: add phylink support
From: Benjamin Larsson @ 2026-06-15 16:07 UTC (permalink / raw)
  To: Christian Marangi, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Simon Horman, Jonathan Corbet, Shuah Khan,
	Lorenzo Bianconi, Heiner Kallweit, Russell King, Saravana Kannan,
	Philipp Zabel, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
	Justin Stitt, netdev, devicetree, linux-kernel, linux-doc,
	linux-arm-kernel, linux-mediatek, llvm
In-Reply-To: <20260615122950.22281-13-ansuelsmth@gmail.com>

Hi.

On 15/06/2026 14:29, Christian Marangi wrote:
> Add phylink support for each GDM port. For GDM1 add the internal interface
> mode as the only supported mode. For GDM2/3/4 add the required
> configuration of the PCS to make the external PHY or attached SFP cage
> work.
>
> These needs to be defined in the GDM port node using the pcs-handle
> property.
>
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> ---
>   drivers/net/ethernet/airoha/Kconfig       |   1 +
>   drivers/net/ethernet/airoha/airoha_eth.c  | 161 +++++++++++++++++++++-
>   drivers/net/ethernet/airoha/airoha_eth.h  |   3 +
>   drivers/net/ethernet/airoha/airoha_regs.h |  12 ++
>   4 files changed, 176 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/airoha/Kconfig b/drivers/net/ethernet/airoha/Kconfig
> index ad3ce501e7a5..38dcc76e5998 100644
> --- a/drivers/net/ethernet/airoha/Kconfig
> +++ b/drivers/net/ethernet/airoha/Kconfig
> @@ -20,6 +20,7 @@ config NET_AIROHA
>   	depends on NET_DSA || !NET_DSA
>   	select NET_AIROHA_NPU
>   	select PAGE_POOL
> +	select PHYLINK
>   	help
>   	  This driver supports the gigabit ethernet MACs in the
>   	  Airoha SoC family.
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 5f1a118875fb..9a42fb991bd7 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -8,6 +8,7 @@
>   #include <linux/of_reserved_mem.h>
>   #include <linux/platform_device.h>
>   #include <linux/tcp.h>
> +#include <linux/pcs/pcs.h>
>   #include <linux/u64_stats_sync.h>
>   #include <net/dst_metadata.h>
>   #include <net/page_pool/helpers.h>
> @@ -1810,6 +1811,14 @@ static int airoha_dev_open(struct net_device *netdev)
>   	u32 cur_len, pse_port = FE_PSE_PORT_PPE1;
>   	struct airoha_qdma *qdma = dev->qdma;
>   
> +	err = phylink_of_phy_connect(dev->phylink, netdev->dev.of_node, 0);
> +	if (err) {
> +		netdev_err(netdev, "could not attach PHY: %d\n", err);
> +		return err;
> +	}
> +
> +	phylink_start(dev->phylink);
> +
>   	netif_tx_start_all_queues(netdev);
>   	err = airoha_set_vip_for_gdm_port(dev, true);
>   	if (err)
> @@ -1907,6 +1916,9 @@ static int airoha_dev_stop(struct net_device *netdev)
>   		}
>   	}
>   
> +	phylink_stop(dev->phylink);
> +	phylink_disconnect_phy(dev->phylink);
> +
>   	return 0;
>   }
>   
> @@ -3168,6 +3180,151 @@ bool airoha_is_valid_gdm_dev(struct airoha_eth *eth,
>   	return false;
>   }
>   
> +/* Nothing to do in MAC, everything is handled in PCS */
> +static void airoha_mac_config(struct phylink_config *config, unsigned int mode,
> +			      const struct phylink_link_state *state)
> +{
> +}
> +
> +static void airoha_mac_link_up(struct phylink_config *config, struct phy_device *phy,
> +			       unsigned int mode, phy_interface_t interface,
> +			       int speed, int duplex, bool tx_pause, bool rx_pause)
> +{
> +	struct airoha_gdm_dev *dev = container_of(config, struct airoha_gdm_dev,
> +						  phylink_config);
> +	struct airoha_gdm_port *port = dev->port;
> +	struct airoha_eth *eth = dev->eth;
> +	u32 frag_size_tx, frag_size_rx;
> +	u32 mask, val;
> +
> +	/* TX/RX frag is configured only for GDM4 */
> +	if (port->id != AIROHA_GDM4_IDX)
> +		return;
> +
> +	switch (speed) {
> +	case SPEED_10000:
> +	case SPEED_5000:
> +		frag_size_tx = 8;
> +		frag_size_rx = 8;
> +		break;
> +	case SPEED_2500:
> +		frag_size_tx = 2;
> +		frag_size_rx = 1;
> +		break;
> +	default:
> +		frag_size_tx = 1;
> +		frag_size_rx = 0;
> +	}
> +
> +	/* Configure TX/RX frag based on speed */
> +	if (dev->nbq == 1) {
> +		mask = GDMA4_SGMII1_TX_FRAG_SIZE_MASK;

Can the naming be consistently GDM4 without the A?

MvH

Benjamin Larsson



^ permalink raw reply

* [RFC PATCH] KVM: Ignore MMU notifiers for guest_memfd-only memslots
From: Alexandru Elisei @ 2026-06-15 15:52 UTC (permalink / raw)
  To: pbonzini, kvm, linux-kernel, maz, oupton, suzuki.poulose, kvmarm,
	linux-arm-kernel, seanjc, david.hildenbrand, mark.rutland

For guest_memfd-only memslots (kvm_memslot_is_gmem_only() is true), the
memory provider for the virtual machine is the guest_memfd file, not the
userspace mapping. Faults are resolved using the guest_memfd page cache,
and the permissions for the secondary MMU mapping depends exclusively on
the memslot (i.e, if the memslot is read-only). How userspace happens to
have the memory mmaped at fault time, or even if the memory is mapped at
all into userspace, is not taken into consideration.

guest_memfd memory is not evictable, is not movable and there's no backing
storage. Once memory is allocated for an offset in guest_memfd file, the
offset will not change, and that memory is not freed unless userspace
explicitly punches a hole in the file. As a result, memory reclaim, page
migration, page aging and dirty page tracking for the userspace mapping
serve little purpose.

Despite this, KVM's MMU notifiers still modify the secondary MMU page
tables, similar to ordinary memslots, only for the same memory to be
remapped next time a guest accesses it. Make the disconnect between the
user mapping and the secondary MMU page tables explicit by ignoring the MMU
notifiers for guest_memfd-only memslots.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
The only theoretical instance where the MMU notifiers are invoked for the
userspace mapping of a guest_memfd-only memslot that I was able to find was
automatic NUMA balancing with a non-NULL NUMA policy for the guest_memfd
file. I wasn't able to test it in practice. Also my knowledge of MM is very
limited, so there might be other cases where it happens, or I might be
wrong and today the MMU notifiers are never invoked.

Either way, when and if it happens, having memory unmapped from the
seconday MMU in the case of guest_memfd-only memslot is at most a
performance issue (it causes unnecessary guest faults), but I wanted to
start a conversation about this because having memory that stays mapped at
stage 2 (unless userspace explicitly unmaps it from the VM) is needed for a
Arm feature (called SPE, Statistical Profiling Extension) that I'm working
to upstream. This patch aims to provide the guarantee that memory won't be
unmapped from the secondary MMU behind the VMMs back, which is what happens
for non guest_memfd memslots.

 virt/kvm/kvm_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 881f92d7a469..8c4158996928 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -592,6 +592,10 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
 			unsigned long hva_start, hva_end;

 			slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]);
+
+			if (kvm_slot_has_gmem(slot) && kvm_memslot_is_gmem_only(slot))
+				continue;
+
 			hva_start = max_t(unsigned long, range->start, slot->userspace_addr);
 			hva_end = min_t(unsigned long, range->end,
 					slot->userspace_addr + (slot->npages << PAGE_SHIFT));

base-commit: 8cd9520d35a6c38db6567e97dd93b1f11f185dc6
-- 
2.54.0

^ permalink raw reply related

* Re: [PATCH v2 7/8] dt-bindings: display: allwinner: Split H616 DE33 layer reg space
From: Jernej Škrabec @ 2026-06-15 15:47 UTC (permalink / raw)
  To: wens, Krzysztof Kozlowski
  Cc: samuel, mripard, maarten.lankhorst, tzimmermann, airlied, simona,
	robh, krzk+dt, conor+dt, mturquette, sboyd, dri-devel, devicetree,
	linux-arm-kernel, linux-sunxi, linux-kernel, linux-clk
In-Reply-To: <86943057-f5b4-4fae-9172-45f13814494f@kernel.org>

Dne ponedeljek, 15. junij 2026 ob 06:28:54 Srednjeevropski poletni čas je Krzysztof Kozlowski napisal(a):
> On 14/06/2026 16:08, Jernej Škrabec wrote:
> > Dne ponedeljek, 25. maj 2026 ob 14:10:38 Srednjeevropski poletni čas je Krzysztof Kozlowski napisal(a):
> >> On 24/05/2026 23:33, Chen-Yu Tsai wrote:
> >>> Hi,
> >>>
> >>> (resent from new email)
> >>>
> >>> On Thu, May 14, 2026 at 2:04 PM Krzysztof Kozlowski <krzk@kernel.org> wrote:
> >>>>
> >>>> On Sat, May 09, 2026 at 09:00:14PM +0200, Jernej Skrabec wrote:
> >>>>> From: Jernej Skrabec <jernej.skrabec@gmail.com>
> >>>>>
> >>>>> As it turns out, current H616 DE33 binding was written based on
> >>>>> incomplete understanding of DE33 design. Namely, planes are shared
> >>>>> resource and not tied to specific mixer, which was the case for previous
> >>>>> generations of Display Engine (DE3 and earlier).
> >>>>>
> >>>>> This means that current DE33 binding doesn't properly reflect HW and
> >>>>> using it would mean that second mixer (used for second display output)
> >>>>> can't be supported.
> >>>>>
> >>>>> Remove layer register space, which will be represented with additional
> >>>>> node, and replace it with phandle, which will point to that new, shared
> >>>>> node. That way, all mixers can share same layers.
> >>>>>
> >>>>> There is no user of this binding yet, so changes can be made safely,
> >>>>> without breaking any backward compatibility.
> >>>>
> >>>> There is user. git grep gives me:
> >>>> drivers/gpu/drm/sun4i/sun8i_mixer.c
> >>>>
> >>>> which means this is a released ABI. As I understood, the old code was
> >>>
> >>> We held off on merging the DT changes so that we could rework this.
> >>> I can't find the actual request though. It was probably over IRC.
> >>>
> >>>> working fine but just did not support all use cases. Why this cannot be
> >>>> kept backwards compatible?
> >>>
> >>> AFAIK the "planes" block is shared between two display mixers. As the
> >>> commit message explains, this prevents using the second mixer, since
> >>> only one of them can claim and map the register space. And on the H700
> >>> (which is the same die as the H616 discussed here but with more exposed
> >>> interfaces), there could actually be a use case for the second mixer.
> >>
> >> It explains why you want to make the changes but not why you cannot keep
> >> it backwards compatible.
> > 
> > I guess it can be backward compatible, but I don't think it makes sense.
> > Yes, original driver implemented original DT bindings, but there is no node
> > which uses that binding. If there is no user of that, why would driver
> 
> Did you check all out of tree users of the ABI? All vendor kernels,
> forks and all of them for which the ABI was made for?

Since when do we care about out of tree users? I understand that drivers
must support old device tree files. Once they work, compatibility must
be carried forward. But that's not the case here.

In any case, vendor kernels have completely different DT structure. This
was developed independently from them. Take a look at [1] how BSP DT looks
like, specifically Display Engine node.

Of course there are some distros which grab WIP patches from mailing lists
soon after they are available. For example, I know that Armbian carried old
WIP patches which used old ABI. However, such distros generally don't care
about exact solution and ditch patches as soon as proper solution is merged
upstream or even when better WIP patches come around. DT files in such
distros get updated alongside kernel, they are not hidden in firmware. 

Best regards,
Jernej

[1] https://github.com/orangepi-xunlong/linux-orangepi/blob/orange-pi-4.9-sun50iw9/arch/arm64/boot/dts/sunxi/sun50iw9p1.dtsi#L1315-L1339

> 
> If there is no single downstream/out of tree kernel using this ABI, then
> of course you do not need to consider it. I don't know how would you
> prove that but I am open for suggestions.
> 
> > need to support it nevertheless? Supporting only actually used DT binding
> > allows for better code architecture, as there is no need to support second,
> > unused path. It also simplifies testing, since developer doesn't need to
> > test both paths if code is changed in that area.
> > 
> Best regards,
> Krzysztof
> 






^ permalink raw reply

* Re: [PATCH v2] arm64: tlbflush: Don't broadcast if mm was only active on local cpu
From: Ryan Roberts @ 2026-06-15 15:41 UTC (permalink / raw)
  To: Will Deacon
  Cc: Linu Cherian, Catalin Marinas, Kevin Brodsky, Anshuman Khandual,
	Yang Shi, Mark Rutland, Huang Ying, linux-arm-kernel,
	linux-kernel, shameerali.kolothum.thodi
In-Reply-To: <ajAPnahwTe1OHQDp@willie-the-truck>

On 15/06/2026 15:43, Will Deacon wrote:
> On Mon, Jun 15, 2026 at 12:21:19PM +0100, Ryan Roberts wrote:
>> On 14/06/2026 12:04, Will Deacon wrote:
>>> On Sat, May 23, 2026 at 07:17:10PM +0530, Linu Cherian wrote:
>>>> From: Ryan Roberts <ryan.roberts@arm.com>
>>>>
>>>> Testing with 7.1-rc4 :
>>>> +-----------------------+---------------------------------------------------+-------------+
>>>> | Benchmark             | Result Class                                      |  Improvement|  
>>>> +=======================+===================================================+=============+
>>>> | perf/syscall          | fork (ops/sec)                                    |   (I) 3.25% |
>>>> +-----------------------+---------------------------------------------------+-------------+
>>>> | pts/memtier-benchmark | Protocol: Redis Clients: 100 Ratio: 1:5 (Ops/sec) |   (I) 2.70% |
>>>> | 			| Protocol: Redis Clients: 100 Ratio: 5:1 (Ops/sec) |   (I) 2.13% |
>>>> +-----------------------+---------------------------------------------------+-------------+
>>>
>>> I think we need a much more comprehensive set of benchmarks before we can
>>> begin to consider a change like this.
>>
>> I believe that Linu ran a wider set of benchmarks and didn't find any
>> regressions. These are just the ones that show improvement (Linu, please correct
>> me and/or provide details).
> 
> I think it's important to show the ones that suffer as well... and also
> look at different configurations (e.g. preemptible settings) and different
> environments (e.g. native vs in a VM).
> 
>> Additionally Huang Ying did some testing against the RFC and reported 4.5%
>> improvement with Redis:
>>
>> https://lore.kernel.org/linux-arm-kernel/87segumv6w.fsf@DESKTOP-5N7EMDA 
> 
> To be clear: I'm not disputing that some benchmarks appear to show a small
> boost from this series. I'm just worried that's not the whole story.
> 
>>>>  arch/arm64/include/asm/mmu.h         |  12 +++
>>>>  arch/arm64/include/asm/mmu_context.h |   2 +
>>>>  arch/arm64/include/asm/tlbflush.h    | 127 +++++++++++++++++++++------
>>>>  arch/arm64/mm/context.c              |  30 ++++++-
>>>>  4 files changed, 141 insertions(+), 30 deletions(-)
>>>
>>> Doesn't this break BTM/SVM with the SMMU? I think that's a non-starter
>>> even if you can provide some more compelling numbers.
>>
>> AIUI, we don't support BTM upstream - the SMMU uses private ASIDs and implements
>> MMU notifiers to forward the TLBIs via its command queue interface.
>>
>> I was also under the impression that supporting BTM upsteam was not desired;
>> Please correct me if that's not accurate or if you're aware of plans to add
>> support. I've been (coincidentlly) looking at some other stuff that could
>> benefit from BTM but had concluded it wouldn't be an acceptable approach upstream.
>>
>> If we did ever want to add SMMU BTM support though, I think it would be simple
>> enough to add an interface to allow the SMMU to disable the optimization (i.e.
>> force active_cpu to ACTIVE_CPU_MULTIPLE)?
> 
> We used to have some initial BTM support in the SMMUv3 driver but the
> main problem was finding an upstream driver/soc that can use it properly
> and so it was ultimately removed in d38c28dbefee ("iommu/arm-smmu-v3: Put
> the SVA mmu notifier in the smmu_domain") because it was getting in the
> way of wider driver rework and we couldn't test it.
> 
> However, there *is* work to re-enable it on top of that rework (and other
> changes):
> 
>   https://lore.kernel.org/linux-iommu/20250319173202.78988-6-shameerali.kolothum.thodi@huawei.com/
> 
> although I don't know if Shameer intends to repost that...

Thanks for the pointers; That's very interesting feedback. I'll take a closer
look :)

> 
>>>> +static inline bool flush_tlb_user_pre(struct mm_struct *mm, tlbf_t flags)
>>>> +{
>>>> +	unsigned int self, active;
>>>> +	bool local;
>>>> +
>>>> +	migrate_disable();
>>>> +
>>>> +	if (flags & TLBF_NOBROADCAST) {
>>>> +		dsb(nshst);
>>>> +		return true;
>>>> +	}
>>>
>>> Why does the NOBROADCAST case need migration disabled? It didn't before...
>>
>> The existing semantic for TLBF_BOBROADCAST is that it emits a local TLBI on
>> whatever CPU we happen to be executing on. It's used for lazily fixing up
>> spurious faults (i.e. hitting RO TLB entries when the PTE has been relaxed to
>> RW). So it's still functionally correct if the thread migrates CPU between
>> taking the fault and issuing the local TLBI - in the worst case it just leads to
>> another spurious fault.
>>
>> For this new case, we need to ensure we don't get migrated between reading
>> active_cpu and issuing the local TLBI, otherwise we would only issue a local
>> TLBI when a broadcast was required.
> 
> Sounds like those two users probably need separating out, then?

Ahh, I see; I'll admit I hadn't actually reviewed the new integration part. I
agree - NOBROADCAST is different to to this. This is an optimization for the
"not NOBROADCAST" case. We need to avoid disabling migration in the NOBROADCAST
case.

> 
>>>> +	self = smp_processor_id();
>>>> +
>>>> +	/*
>>>> +	 * The load of mm->context.active_cpu must not be reordered before the
>>>> +	 * store to the pgtable that necessitated this flush. This ensures that
>>>> +	 * if the value read is our cpu id, then no other cpu can have seen the
>>>> +	 * old pgtable value and therefore does not need this old value to be
>>>> +	 * flushed from its tlb. But we don't want to upgrade the dsb(ishst),
>>>> +	 * needed to make the pgtable updates visible to the walker, to a
>>>> +	 * dsb(ish) by default. So speculatively load without a barrier and if
>>>> +	 * it indicates our cpu id, then upgrade the barrier and re-load.
>>>> +	 */
>>>> +	active = READ_ONCE(mm->context.active_cpu);
>>>> +	if (active == self) {
>>>> +		dsb(ish);
>>>> +		active = READ_ONCE(mm->context.active_cpu);
>>>> +	} else {
>>>> +		dsb(ishst);
>>>> +	}
>>>
>>> Why can't you just do:
>>>
>>> 	dsb(ishst);
>>> 	active = READ_ONCE(mm->context.active_cpu);
>>>
>>> ?
>>
>> Prior to this optimization, we always issued a dsb(ishst) here. Catalin
>> suggested the same simplification against the RFC. I believe Linu tried it but
>> saw regressions; Hopefully Linu can provide the details.
> 
> I don't follow...
> 
> The old code always did dsb(ishst). The proposed code here does either
> dsb(ish) or dsb(ishst). How can that possibly be faster?

Ugh, sorry - I read your suggestion as unconditionally issuing a dsb(ish).

Ignore my previous answer, and now I'll demonstrate my total lack of
understanding of barriers instead...

As the comment says, "The load of mm->context.active_cpu must not be reordered
before the store to the pgtable that necessitated this flush". I thought that a
dsb(ishst) would only provide ordering between stores. Don't we need the
dsb(ish) to prevent the load from being reordered before the store?

> 
>>>> +	local = active == self;
>>>> +	if (!local)
>>>> +		migrate_enable();
>>>> +
>>>> +	return local;
>>>> +}
>>>> +
>>>> +static inline void flush_tlb_user_post(bool local)
>>>> +{
>>>> +	if (local)
>>>> +		migrate_enable();
>>>> +}
>>>
>>> I was under the impression that disabling/enabling migration was an
>>> expensive thing to do, so I'd really want to see some more numbers to
>>> justify this (including from inside a VM) and allow us to consider the
>>> trade-offs properly. It's also not at all clear to me that it's safe
>>> from such a low-level TLB invalidation helper.
>>
>> I had assumed it wasn't very expensive, but perhaps I'm wrong. I know
>> preempt_enable() can be expensive because it has to test to see if it needs to
>> reschedule. But I assumed for disabling/enabling migration, it would just be a
>> counter and the scheduler would check that it's zero before considing moving the
>> task to another run queue. (But I have practically zero understanding of the
>> scheduler so I'll assume I'm wrong...).
> 
> I'm not an expert here either, but reading the code shows that it has
> a preempt guard along with additional book-keeping.
> 
>> Instead of disabling migration, perhaps we could re-check active_cpu after
>> issuing the local tlbi - if it's now reporting "multiple" we must have been
>> migrated and we need to upgrade to a braodcast TLBI?
> 
> That's an interesting idea, although I suppose it means the
> post-invalidation DSB needs to be ISH for the local case to check the
> active_cpu safely?

Another idea could be to use PeterZ's "kernel mode restartable sequence" thingy,
then we can detect migration and retry? There a thread briefly talking about
doing something similar to avoid disabling preemption in the this_cpu_ ops, but
not sure if it went anywhere.

> 
>>>>   *	TLB Invalidation
>>>>   *	================
>>>> @@ -408,12 +482,20 @@ static inline void flush_tlb_all(void)
>>>>  static inline void flush_tlb_mm(struct mm_struct *mm)
>>>>  {
>>>>  	unsigned long asid;
>>>> +	bool local;
>>>>  
>>>> -	dsb(ishst);
>>>> +	local = flush_tlb_user_pre(mm, TLBF_NONE);
>>>>  	asid = __TLBI_VADDR(0, ASID(mm));
>>>> -	__tlbi(aside1is, asid);
>>>> -	__tlbi_user(aside1is, asid);
>>>> -	__tlbi_sync_s1ish(mm);
>>>> +	if (local) {
>>>> +		__tlbi(aside1, asid);
>>>> +		__tlbi_user(aside1, asid);
>>>> +		dsb(nsh);
>>>> +	} else {
>>>> +		__tlbi(aside1is, asid);
>>>> +		__tlbi_user(aside1is, asid);
>>>> +		__tlbi_sync_s1ish(mm);
>>>> +	}
>>>> +	flush_tlb_user_post(local);
>>>
>>> I think you've changed this since Ryan's original patch, but why are you
>>> only calling __tlbi_sync_s1ish() for the !local case? Doesn't that break
>>> the erratum workaround when running as a VM if the vCPU is migrated?
>>
>> Hmm. So from the guest kernel's perspective, it has concluded that it only needs
>> to target the local (v)CPU. Since the errata only affect boardcast TLBIs, it
>> concludes there is no need to issue the workarounds. But since it's a VM, the HW
>> will upgrade the local TLBIs to broadcast TLBIs, but will not magically
>> re-instate the workarounds. I guess the simplest solution would be to disable
>> the optimization when either workaround is enabled.
> 
> That's what I was thinking, but Mark seems to think it's ok. I'll reply
> to him on the other part of the thread.
> 
>> Perhaps this is all getting a bit too complex for not enough benefit...
> 
> I don't think the complexity is unmanageable, but I'm not yet convinced
> that this offers any real benefit overall.

I'll talk with Linu and see if we can present a clearer view.

Thanks,
Ryan


> 
> Will



^ permalink raw reply

* Re: [PATCH v1 0/6] perf vendor events intel: update
From: Ian Rogers @ 2026-06-15 15:37 UTC (permalink / raw)
  To: Mi, Dapeng
  Cc: Chun-Tse Shao, peterz, mingo, acme, namhyung, alexander.shishkin,
	jolsa, adrian.hunter, james.clark, afaerber, mani,
	linux-perf-users, linux-kernel, linux-arm-kernel, linux-actions
In-Reply-To: <0b6d58d5-f802-47dc-ae71-31c45184b738@linux.intel.com>

On Sun, Jun 14, 2026 at 6:32 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>
> LGTM. Thanks.
>
> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>

Reviewed-by: Ian Rogers <irogers@google.com>

The tigerlake update is just the version number. On the perfmon git
the change was removing zero fields:
https://github.com/intel/perfmon/commit/8353ffb63efcad6b6fac1a8c05d76e2d6317ae23
but zero fields are dropped from the perf JSON, resulting in no differences.

Thanks,
Ian

> On 6/10/2026 5:50 AM, Chun-Tse Shao wrote:
> > Sync with the latest perfmon events from:
> > https://github.com/intel/perfmon
> > by running the script:
> > https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
> > and copying the resulting json and mapfile.csv changes into the perf
> > tree.
> >
> > Chun-Tse Shao (6):
> >   perf vendor events intel: Update arrowlake events from 1.17 to 1.19
> >   perf vendor events intel: Update emeraldrapids events from 1.23 to
> >     1.24
> >   perf vendor events intel: Update graniterapids events from 1.18 to
> >     1.19
> >   perf vendor events intel: Update lunarlake events from 1.22 to 1.25
> >   perf vendor events intel: Update pantherlake events from 1.05 to 1.06
> >   perf vendor events intel: Update tigerlake events from 1.18 to 1.19
> >
> >  .../pmu-events/arch/x86/arrowlake/cache.json  |  30 ++-
> >  .../arch/x86/arrowlake/floating-point.json    |  45 ++++
> >  .../pmu-events/arch/x86/arrowlake/memory.json |  18 ++
> >  .../arch/x86/arrowlake/pipeline.json          | 129 +++++++++-
> >  .../arch/x86/emeraldrapids/cache.json         |   9 +
> >  .../graniterapids/uncore-interconnect.json    |  10 +
> >  .../arch/x86/graniterapids/uncore-memory.json |   2 +-
> >  .../pmu-events/arch/x86/lunarlake/cache.json  |   2 +-
> >  .../arch/x86/lunarlake/pipeline.json          |  27 ++-
> >  .../arch/x86/lunarlake/uncore-memory.json     | 208 ++++++++++++++++-
> >  tools/perf/pmu-events/arch/x86/mapfile.csv    |  12 +-
> >  .../arch/x86/pantherlake/counter.json         |   5 +
> >  .../arch/x86/pantherlake/pipeline.json        |  29 ++-
> >  .../x86/pantherlake/uncore-interconnect.json  |  10 +
> >  .../arch/x86/pantherlake/uncore-memory.json   | 221 +++++++++++++++++-
> >  15 files changed, 728 insertions(+), 29 deletions(-)
> >  create mode 100644 tools/perf/pmu-events/arch/x86/pantherlake/uncore-interconnect.json
> >


^ permalink raw reply

* Re: [PATCH net] net: ethernet: mtk_eth_soc: fix supported_interface set after phylink_create
From: Daniel Golle @ 2026-06-15 15:33 UTC (permalink / raw)
  To: Christian Marangi
  Cc: Felix Fietkau, Lorenzo Bianconi, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Matthias Brugger,
	AngeloGioacchino Del Regno, Russell King, netdev, linux-kernel,
	linux-arm-kernel, linux-mediatek
In-Reply-To: <20260615151106.15438-1-ansuelsmth@gmail.com>

On Mon, Jun 15, 2026 at 05:11:00PM +0200, Christian Marangi wrote:
> Everything configured in phylink_config it's assumed to be set before
> calling phylink_create() to permit correct parsing of all the different
> modes and capabilities.
> 
> Commit 51cf06ddafc9 ("net: ethernet: mtk_eth_soc: add support for MT7988
> internal 2.5G PHY") while introducing support for 2.5G phy for MT7988,
> probably due to an auto-rebase, placed the configuration of the INTERNAL
> interface mode for the supported_interfaces for phylink_config right after
> phylink_create() introducing a possible problem with supported interfaces
> parsing.
> 
> While this doesn't currently create any problem/bug, move setting this bit
> before phylink_create() to prevent any possible regression in future code
> change in phylink core.
> 
> Fixes: 51cf06ddafc9 ("net: ethernet: mtk_eth_soc: add support for MT7988 internal 2.5G PHY")
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>

Reviewed-by: Daniel Golle <daniel@makrotopia.org>

As no user-visible bug surfaces because of that it is questionable though if
the Fixes:-tag is justified.


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox