Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] KVM: arm64: vgic: Skip vCPU trylock for pre-init register access
@ 2026-05-10 22:11 David Woodhouse
  2026-05-11 11:03 ` Yao Yuan
  0 siblings, 1 reply; 3+ messages in thread
From: David Woodhouse @ 2026-05-10 22:11 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Sascha Bischoff,
	Paolo Bonzini, Jonathan Cameron, Eric Auger,
	Raghavendra Rao Ananta, David Woodhouse, Maxim Levitsky,
	linux-arm-kernel, kvmarm, kvm

[-- Attachment #1: Type: text/plain, Size: 3676 bytes --]

From: David Woodhouse <dwmw@amazon.co.uk>

When creating a VM, userspace sets pre-init distributor registers such
as GICD_IIDR before the VGIC is initialized.

Strictly speaking there's no reason this couldn't be done at VM creation
time before any vCPUs exist at all, but the design of the userspace API
does require vCPU0 to have been created, as the system-wide registers
are effectively accessed via vCPU0.

So a VMM can't just set the IIDR at startup before spawning vCPU threads;
it has to be done once the vCPUs are being created.

However, kvm_trylock_all_vcpus() makes it unreliable by causing spurious
-EBUSY failures if any vCPU cannot be locked. So userspace is forced to
create the vCPUs (well, at least vCPU0), and is then forced to have a
full synchronization point and quiesce them all in order to reliably set
the IIDR.

To slightly reduce the pain of all this, skip the trylock when the VGIC
is not yet initialized. There is no need to lock the vCPUs if they can't
be accessing it anyway.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
Other options would include making it possible to set the IIDR before
creating any vCPUs, either by creating a new device-level attribute or
special-casing it not to require vCPU0 for DIST_REGS that aren't really
associated to a vCPU.

Deprecating kvm_trylock_all_vcpus() and killing every user of it with
fire would also be a good option, perhaps... :)

 arch/arm64/kvm/vgic/vgic-kvm-device.c | 38 ++++++++++++++++++++-------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index a96c77dccf35..e17ea9f07f5f 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -540,7 +540,7 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
 	struct vgic_reg_attr reg_attr;
 	gpa_t addr;
 	struct kvm_vcpu *vcpu;
-	bool uaccess;
+	bool uaccess, vcpus_locked = false;
 	u32 val;
 	int ret;
 
@@ -566,18 +566,37 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
 			return -EFAULT;
 	}
 
+	if (!vgic_initialized(dev->kvm) && !reg_allowed_pre_init(attr))
+		return -EBUSY;
+
 	mutex_lock(&dev->kvm->lock);
 
-	if (kvm_trylock_all_vcpus(dev->kvm)) {
-		mutex_unlock(&dev->kvm->lock);
-		return -EBUSY;
+	/*
+	 * Pre-init registers (e.g. GICD_IIDR) don't need vCPU quiescence
+	 * since the VGIC isn't live yet.  Skip the trylock to avoid spurious
+	 * -EBUSY when vCPU threads happen to be running.
+	 */
+	if (vgic_initialized(dev->kvm)) {
+		if (kvm_trylock_all_vcpus(dev->kvm)) {
+			mutex_unlock(&dev->kvm->lock);
+			return -EBUSY;
+		}
+		vcpus_locked = true;
 	}
-
 	mutex_lock(&dev->kvm->arch.config_lock);
 
-	if (!(vgic_initialized(dev->kvm) || reg_allowed_pre_init(attr))) {
-		ret = -EBUSY;
-		goto out;
+	/*
+	 * If the VGIC becomes initialized between the above check and taking
+	 * config_lock, drop config_lock to lock the VCPUS.
+	 */
+	if (vgic_initialized(dev->kvm) && !vcpus_locked) {
+		mutex_unlock(&dev->kvm->arch.config_lock);
+		if (kvm_trylock_all_vcpus(dev->kvm)) {
+			mutex_unlock(&dev->kvm->lock);
+			return -EBUSY;
+		}
+		mutex_lock(&dev->kvm->arch.config_lock);
+		vcpus_locked = true;
 	}
 
 	switch (attr->group) {
@@ -612,7 +631,8 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
 
 out:
 	mutex_unlock(&dev->kvm->arch.config_lock);
-	kvm_unlock_all_vcpus(dev->kvm);
+	if (vcpus_locked)
+		kvm_unlock_all_vcpus(dev->kvm);
 	mutex_unlock(&dev->kvm->lock);
 
 	if (!ret && uaccess && !is_write) {
-- 
2.43.0



[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [RFC PATCH] KVM: arm64: vgic: Skip vCPU trylock for pre-init register access
  2026-05-10 22:11 [RFC PATCH] KVM: arm64: vgic: Skip vCPU trylock for pre-init register access David Woodhouse
@ 2026-05-11 11:03 ` Yao Yuan
  2026-05-11 11:52   ` David Woodhouse
  0 siblings, 1 reply; 3+ messages in thread
From: Yao Yuan @ 2026-05-11 11:03 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Sascha Bischoff,
	Paolo Bonzini, Jonathan Cameron, Eric Auger,
	Raghavendra Rao Ananta, David Woodhouse, Maxim Levitsky,
	linux-arm-kernel, kvmarm, kvm

On Sun, May 10, 2026 at 11:11:43PM +0800, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> When creating a VM, userspace sets pre-init distributor registers such
> as GICD_IIDR before the VGIC is initialized.
>
> Strictly speaking there's no reason this couldn't be done at VM creation
> time before any vCPUs exist at all, but the design of the userspace API
> does require vCPU0 to have been created, as the system-wide registers
> are effectively accessed via vCPU0.
>
> So a VMM can't just set the IIDR at startup before spawning vCPU threads;
> it has to be done once the vCPUs are being created.
>
> However, kvm_trylock_all_vcpus() makes it unreliable by causing spurious
> -EBUSY failures if any vCPU cannot be locked. So userspace is forced to
> create the vCPUs (well, at least vCPU0), and is then forced to have a
> full synchronization point and quiesce them all in order to reliably set
> the IIDR.
>
> To slightly reduce the pain of all this, skip the trylock when the VGIC
> is not yet initialized. There is no need to lock the vCPUs if they can't
> be accessing it anyway.
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> Other options would include making it possible to set the IIDR before
> creating any vCPUs, either by creating a new device-level attribute or
> special-casing it not to require vCPU0 for DIST_REGS that aren't really
> associated to a vCPU.
>
> Deprecating kvm_trylock_all_vcpus() and killing every user of it with
> fire would also be a good option, perhaps... :)
>
>  arch/arm64/kvm/vgic/vgic-kvm-device.c | 38 ++++++++++++++++++++-------
>  1 file changed, 29 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
> index a96c77dccf35..e17ea9f07f5f 100644
> --- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
> +++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
> @@ -540,7 +540,7 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
>  	struct vgic_reg_attr reg_attr;
>  	gpa_t addr;
>  	struct kvm_vcpu *vcpu;
> -	bool uaccess;
> +	bool uaccess, vcpus_locked = false;
>  	u32 val;
>  	int ret;
>
> @@ -566,18 +566,37 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
>  			return -EFAULT;
>  	}
>
> +	if (!vgic_initialized(dev->kvm) && !reg_allowed_pre_init(attr))
> +		return -EBUSY;
> +
>  	mutex_lock(&dev->kvm->lock);
>
> -	if (kvm_trylock_all_vcpus(dev->kvm)) {
> -		mutex_unlock(&dev->kvm->lock);
> -		return -EBUSY;
> +	/*
> +	 * Pre-init registers (e.g. GICD_IIDR) don't need vCPU quiescence
> +	 * since the VGIC isn't live yet.  Skip the trylock to avoid spurious
> +	 * -EBUSY when vCPU threads happen to be running.
> +	 */
> +	if (vgic_initialized(dev->kvm)) {
> +		if (kvm_trylock_all_vcpus(dev->kvm)) {
> +			mutex_unlock(&dev->kvm->lock);
> +			return -EBUSY;
> +		}
> +		vcpus_locked = true;
>  	}
> -
>  	mutex_lock(&dev->kvm->arch.config_lock);
>
> -	if (!(vgic_initialized(dev->kvm) || reg_allowed_pre_init(attr))) {
> -		ret = -EBUSY;
> -		goto out;

Hi David,

> +	/*
> +	 * If the VGIC becomes initialized between the above check and taking
> +	 * config_lock, drop config_lock to lock the VCPUS.
> +	 */
> +	if (vgic_initialized(dev->kvm) && !vcpus_locked) {
> +		mutex_unlock(&dev->kvm->arch.config_lock);
> +		if (kvm_trylock_all_vcpus(dev->kvm)) {
> +			mutex_unlock(&dev->kvm->lock);
> +			return -EBUSY;
> +		}
> +		mutex_lock(&dev->kvm->arch.config_lock);
> +		vcpus_locked = true;

Question: Why not do vgic_initialized(dev->kvm) and
reg_allowed_pre_init(attr) checking after take config_lock,
and depends on the checking result to decide trylocak all
vcpus or not ?

>  	}
>
>  	switch (attr->group) {
> @@ -612,7 +631,8 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
>
>  out:
>  	mutex_unlock(&dev->kvm->arch.config_lock);
> -	kvm_unlock_all_vcpus(dev->kvm);
> +	if (vcpus_locked)
> +		kvm_unlock_all_vcpus(dev->kvm);
>  	mutex_unlock(&dev->kvm->lock);
>
>  	if (!ret && uaccess && !is_write) {
> --
> 2.43.0
>
>


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC PATCH] KVM: arm64: vgic: Skip vCPU trylock for pre-init register access
  2026-05-11 11:03 ` Yao Yuan
@ 2026-05-11 11:52   ` David Woodhouse
  0 siblings, 0 replies; 3+ messages in thread
From: David Woodhouse @ 2026-05-11 11:52 UTC (permalink / raw)
  To: Yao Yuan
  Cc: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Sascha Bischoff,
	Paolo Bonzini, Jonathan Cameron, Eric Auger,
	Raghavendra Rao Ananta, Maxim Levitsky, linux-arm-kernel, kvmarm,
	kvm

[-- Attachment #1: Type: text/plain, Size: 1454 bytes --]

On Mon, 2026-05-11 at 19:03 +0800, Yao Yuan wrote:
> 
> > +	/*
> > +	 * If the VGIC becomes initialized between the above check and taking
> > +	 * config_lock, drop config_lock to lock the VCPUS.
> > +	 */
> > +	if (vgic_initialized(dev->kvm) && !vcpus_locked) {
> > +		mutex_unlock(&dev->kvm->arch.config_lock);
> > +		if (kvm_trylock_all_vcpus(dev->kvm)) {
> > +			mutex_unlock(&dev->kvm->lock);
> > +			return -EBUSY;
> > +		}
> > +		mutex_lock(&dev->kvm->arch.config_lock);
> > +		vcpus_locked = true;
> 
> Question: Why not do vgic_initialized(dev->kvm) and
> reg_allowed_pre_init(attr) checking after take config_lock,
> and depends on the checking result to decide trylocak all
> vcpus or not ?

Yeah... I didn't like that either.

If we attempt to take the vCPU locks while *holding* config_lock, that
gives us a classic ABBA deadlock with the code paths that hold the vCPU
locks first.

Well, maybe it doesn't because we'd only be doing a *trylock*... but
lockdep wouldn't know that and it's fugly as hell. So I settled on the
above.

The long term fix, as I noted, is to kill kvm_trylock_all_vcpus() with
fire in *all* cases, not just the !vgic_initialized() case. On x86 we'd
do this with something like KVM_REQ_MCLOCK_INPROGRESS to kick vCPUs out
of guest mode to prevent concurrent access. Which isn't my favourite
code *either* but at least it works without imposing the awfulness on
the userspace API.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-05-11 11:52 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-10 22:11 [RFC PATCH] KVM: arm64: vgic: Skip vCPU trylock for pre-init register access David Woodhouse
2026-05-11 11:03 ` Yao Yuan
2026-05-11 11:52   ` David Woodhouse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox