* [PATCH v2 0/4] KVM: Speed up MMIO registrations
@ 2025-07-16 11:07 Keir Fraser
2025-07-16 11:07 ` [PATCH v2 1/4] KVM: arm64: vgic-init: Remove vgic_ready() macro Keir Fraser
` (3 more replies)
0 siblings, 4 replies; 22+ messages in thread
From: Keir Fraser @ 2025-07-16 11:07 UTC (permalink / raw)
To: linux-arm-kernel, linux-kernel, kvm
Cc: Sean Christopherson, Eric Auger, Oliver Upton, Marc Zyngier,
Will Deacon, Paolo Bonzini, Keir Fraser
This is version 2 of the patches I previously posted here:
https://lore.kernel.org/all/20250624092256.1105524-1-keirf@google.com/
Changes since v1:
* Memory barriers introduced (or implied and documented) on the
kvm->buses[] SRCU read paths
* Disallow kvm_get_bus() from SRCU readers
* Rebased to v6.16-rc6
Thanks to Sean for the review feedback and patch suggestions!
Keir Fraser (4):
KVM: arm64: vgic-init: Remove vgic_ready() macro
KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
KVM: Implement barriers before accessing kvm->buses[] on SRCU read
paths
KVM: Avoid synchronize_srcu() in kvm_io_bus_register_dev()
arch/arm64/kvm/vgic/vgic-init.c | 14 +++--------
arch/x86/kvm/vmx/vmx.c | 7 ++++++
include/kvm/arm_vgic.h | 1 -
include/linux/kvm_host.h | 11 ++++++---
virt/kvm/kvm_main.c | 43 +++++++++++++++++++++++++++------
5 files changed, 53 insertions(+), 23 deletions(-)
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v2 1/4] KVM: arm64: vgic-init: Remove vgic_ready() macro
2025-07-16 11:07 [PATCH v2 0/4] KVM: Speed up MMIO registrations Keir Fraser
@ 2025-07-16 11:07 ` Keir Fraser
2025-07-16 11:07 ` [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering Keir Fraser
` (2 subsequent siblings)
3 siblings, 0 replies; 22+ messages in thread
From: Keir Fraser @ 2025-07-16 11:07 UTC (permalink / raw)
To: linux-arm-kernel, linux-kernel, kvm
Cc: Sean Christopherson, Eric Auger, Oliver Upton, Marc Zyngier,
Will Deacon, Paolo Bonzini, Keir Fraser
The vgic_ready() macro is now used only within kvm_vgic_map_resources().
vgic_dist::ready is already written directly by that function, so it is
clearer to bypass the macro for reads as well.
Signed-off-by: Keir Fraser <keirf@google.com>
---
arch/arm64/kvm/vgic/vgic-init.c | 5 ++---
include/kvm/arm_vgic.h | 1 -
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index eb1205654ac8..502b65049703 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -559,7 +559,6 @@ int vgic_lazy_init(struct kvm *kvm)
* Also map the virtual CPU interface into the VM.
* v2 calls vgic_init() if not already done.
* v3 and derivatives return an error if the VGIC is not initialized.
- * vgic_ready() returns true if this function has succeeded.
*/
int kvm_vgic_map_resources(struct kvm *kvm)
{
@@ -568,12 +567,12 @@ int kvm_vgic_map_resources(struct kvm *kvm)
gpa_t dist_base;
int ret = 0;
- if (likely(vgic_ready(kvm)))
+ if (likely(dist->ready))
return 0;
mutex_lock(&kvm->slots_lock);
mutex_lock(&kvm->arch.config_lock);
- if (vgic_ready(kvm))
+ if (dist->ready)
goto out;
if (!irqchip_in_kernel(kvm))
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 4a34f7f0a864..233eaa6d1267 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -399,7 +399,6 @@ u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu);
#define irqchip_in_kernel(k) (!!((k)->arch.vgic.in_kernel))
#define vgic_initialized(k) ((k)->arch.vgic.initialized)
-#define vgic_ready(k) ((k)->arch.vgic.ready)
#define vgic_valid_spi(k, i) (((i) >= VGIC_NR_PRIVATE_IRQS) && \
((i) < (k)->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS))
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
2025-07-16 11:07 [PATCH v2 0/4] KVM: Speed up MMIO registrations Keir Fraser
2025-07-16 11:07 ` [PATCH v2 1/4] KVM: arm64: vgic-init: Remove vgic_ready() macro Keir Fraser
@ 2025-07-16 11:07 ` Keir Fraser
2025-07-17 5:44 ` Yao Yuan
2025-07-16 11:07 ` [PATCH v2 3/4] KVM: Implement barriers before accessing kvm->buses[] on SRCU read paths Keir Fraser
2025-07-16 11:07 ` [PATCH v2 4/4] KVM: Avoid synchronize_srcu() in kvm_io_bus_register_dev() Keir Fraser
3 siblings, 1 reply; 22+ messages in thread
From: Keir Fraser @ 2025-07-16 11:07 UTC (permalink / raw)
To: linux-arm-kernel, linux-kernel, kvm
Cc: Sean Christopherson, Eric Auger, Oliver Upton, Marc Zyngier,
Will Deacon, Paolo Bonzini, Keir Fraser
In preparation to remove synchronize_srcu() from MMIO registration,
remove the distributor's dependency on this implicit barrier by
direct acquire-release synchronization on the flag write and its
lock-free check.
Signed-off-by: Keir Fraser <keirf@google.com>
---
arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 502b65049703..bc83672e461b 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
gpa_t dist_base;
int ret = 0;
- if (likely(dist->ready))
+ if (likely(smp_load_acquire(&dist->ready)))
return 0;
mutex_lock(&kvm->slots_lock);
@@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
goto out_slots;
}
- /*
- * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
- * registration before returning through synchronize_srcu(), which also
- * implies a full memory barrier. As such, marking the distributor as
- * 'ready' here is guaranteed to be ordered after all vCPUs having seen
- * a completely configured distributor.
- */
- dist->ready = true;
+ smp_store_release(&dist->ready, true);
goto out_slots;
out:
mutex_unlock(&kvm->arch.config_lock);
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v2 3/4] KVM: Implement barriers before accessing kvm->buses[] on SRCU read paths
2025-07-16 11:07 [PATCH v2 0/4] KVM: Speed up MMIO registrations Keir Fraser
2025-07-16 11:07 ` [PATCH v2 1/4] KVM: arm64: vgic-init: Remove vgic_ready() macro Keir Fraser
2025-07-16 11:07 ` [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering Keir Fraser
@ 2025-07-16 11:07 ` Keir Fraser
2025-07-17 6:01 ` Yao Yuan
2025-07-16 11:07 ` [PATCH v2 4/4] KVM: Avoid synchronize_srcu() in kvm_io_bus_register_dev() Keir Fraser
3 siblings, 1 reply; 22+ messages in thread
From: Keir Fraser @ 2025-07-16 11:07 UTC (permalink / raw)
To: linux-arm-kernel, linux-kernel, kvm
Cc: Sean Christopherson, Eric Auger, Oliver Upton, Marc Zyngier,
Will Deacon, Paolo Bonzini, Keir Fraser
This ensures that, if a VCPU has "observed" that an IO registration has
occurred, the instruction currently being trapped or emulated will also
observe the IO registration.
At the same time, enforce that kvm_get_bus() is used only on the
update side, ensuring that a long-term reference cannot be obtained by
an SRCU reader.
Signed-off-by: Keir Fraser <keirf@google.com>
---
arch/x86/kvm/vmx/vmx.c | 7 +++++++
include/linux/kvm_host.h | 10 +++++++---
virt/kvm/kvm_main.c | 33 +++++++++++++++++++++++++++------
3 files changed, 41 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 191a9ed0da22..425e3d8074ab 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5861,6 +5861,13 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
if (kvm_test_request(KVM_REQ_EVENT, vcpu))
return 1;
+ /*
+ * Ensure that any updates to kvm->buses[] observed by the
+ * previous instruction (emulated or otherwise) are also
+ * visible to the instruction we are about to emulate.
+ */
+ smp_rmb();
+
if (!kvm_emulate_instruction(vcpu, 0))
return 0;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3bde4fb5c6aa..9132148fb467 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -965,11 +965,15 @@ static inline bool kvm_dirty_log_manual_protect_and_init_set(struct kvm *kvm)
return !!(kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET);
}
+/*
+ * Get a bus reference under the update-side lock. No long-term SRCU reader
+ * references are permitted, to avoid stale reads vs concurrent IO
+ * registrations.
+ */
static inline struct kvm_io_bus *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)
{
- return srcu_dereference_check(kvm->buses[idx], &kvm->srcu,
- lockdep_is_held(&kvm->slots_lock) ||
- !refcount_read(&kvm->users_count));
+ return rcu_dereference_protected(kvm->buses[idx],
+ lockdep_is_held(&kvm->slots_lock));
}
static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 222f0e894a0c..9ec3b96b9666 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1103,6 +1103,15 @@ void __weak kvm_arch_create_vm_debugfs(struct kvm *kvm)
{
}
+/* Called only on cleanup and destruction paths when there are no users. */
+static inline struct kvm_io_bus *kvm_get_bus_for_destruction(struct kvm *kvm,
+ enum kvm_bus idx)
+{
+ return rcu_dereference_protected(kvm->buses[idx],
+ !refcount_read(&kvm->users_count));
+}
+
+
static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
{
struct kvm *kvm = kvm_arch_alloc_vm();
@@ -1228,7 +1237,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
out_err_no_arch_destroy_vm:
WARN_ON_ONCE(!refcount_dec_and_test(&kvm->users_count));
for (i = 0; i < KVM_NR_BUSES; i++)
- kfree(kvm_get_bus(kvm, i));
+ kfree(kvm_get_bus_for_destruction(kvm, i));
kvm_free_irq_routing(kvm);
out_err_no_irq_routing:
cleanup_srcu_struct(&kvm->irq_srcu);
@@ -1276,7 +1285,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
kvm_free_irq_routing(kvm);
for (i = 0; i < KVM_NR_BUSES; i++) {
- struct kvm_io_bus *bus = kvm_get_bus(kvm, i);
+ struct kvm_io_bus *bus = kvm_get_bus_for_destruction(kvm, i);
if (bus)
kvm_io_bus_destroy(bus);
@@ -5838,6 +5847,18 @@ static int __kvm_io_bus_write(struct kvm_vcpu *vcpu, struct kvm_io_bus *bus,
return -EOPNOTSUPP;
}
+static struct kvm_io_bus *kvm_get_bus_srcu(struct kvm *kvm, enum kvm_bus idx)
+{
+ /*
+ * Ensure that any updates to kvm_buses[] observed by the previous VCPU
+ * machine instruction are also visible to the VCPU machine instruction
+ * that triggered this call.
+ */
+ smp_mb__after_srcu_read_lock();
+
+ return srcu_dereference(kvm->buses[idx], &kvm->srcu);
+}
+
int kvm_io_bus_write(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
int len, const void *val)
{
@@ -5850,7 +5871,7 @@ int kvm_io_bus_write(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
.len = len,
};
- bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
+ bus = kvm_get_bus_srcu(vcpu->kvm, bus_idx);
if (!bus)
return -ENOMEM;
r = __kvm_io_bus_write(vcpu, bus, &range, val);
@@ -5869,7 +5890,7 @@ int kvm_io_bus_write_cookie(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx,
.len = len,
};
- bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
+ bus = kvm_get_bus_srcu(vcpu->kvm, bus_idx);
if (!bus)
return -ENOMEM;
@@ -5919,7 +5940,7 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
.len = len,
};
- bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
+ bus = kvm_get_bus_srcu(vcpu->kvm, bus_idx);
if (!bus)
return -ENOMEM;
r = __kvm_io_bus_read(vcpu, bus, &range, val);
@@ -6028,7 +6049,7 @@ struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
srcu_idx = srcu_read_lock(&kvm->srcu);
- bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu);
+ bus = kvm_get_bus_srcu(kvm, bus_idx);
if (!bus)
goto out_unlock;
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v2 4/4] KVM: Avoid synchronize_srcu() in kvm_io_bus_register_dev()
2025-07-16 11:07 [PATCH v2 0/4] KVM: Speed up MMIO registrations Keir Fraser
` (2 preceding siblings ...)
2025-07-16 11:07 ` [PATCH v2 3/4] KVM: Implement barriers before accessing kvm->buses[] on SRCU read paths Keir Fraser
@ 2025-07-16 11:07 ` Keir Fraser
3 siblings, 0 replies; 22+ messages in thread
From: Keir Fraser @ 2025-07-16 11:07 UTC (permalink / raw)
To: linux-arm-kernel, linux-kernel, kvm
Cc: Sean Christopherson, Eric Auger, Oliver Upton, Marc Zyngier,
Will Deacon, Paolo Bonzini, Keir Fraser
Device MMIO registration may happen quite frequently during VM boot,
and the SRCU synchronization each time has a measurable effect
on VM startup time. In our experiments it can account for around 25%
of a VM's startup time.
Replace the synchronization with a deferred free of the old kvm_io_bus
structure.
Signed-off-by: Keir Fraser <keirf@google.com>
---
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 10 ++++++++--
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9132148fb467..802ca46f7537 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -205,6 +205,7 @@ struct kvm_io_range {
struct kvm_io_bus {
int dev_count;
int ioeventfd_count;
+ struct rcu_head rcu;
struct kvm_io_range range[];
};
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9ec3b96b9666..f690a4997a0f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5948,6 +5948,13 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
}
EXPORT_SYMBOL_GPL(kvm_io_bus_read);
+static void __free_bus(struct rcu_head *rcu)
+{
+ struct kvm_io_bus *bus = container_of(rcu, struct kvm_io_bus, rcu);
+
+ kfree(bus);
+}
+
int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
int len, struct kvm_io_device *dev)
{
@@ -5986,8 +5993,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
memcpy(new_bus->range + i + 1, bus->range + i,
(bus->dev_count - i) * sizeof(struct kvm_io_range));
rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
- synchronize_srcu_expedited(&kvm->srcu);
- kfree(bus);
+ call_srcu(&kvm->srcu, &bus->rcu, __free_bus);
return 0;
}
--
2.50.0.727.gbf7dc18ff4-goog
* Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
2025-07-16 11:07 ` [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering Keir Fraser
@ 2025-07-17 5:44 ` Yao Yuan
2025-07-18 14:53 ` Keir Fraser
2025-07-18 15:00 ` Sean Christopherson
0 siblings, 2 replies; 22+ messages in thread
From: Yao Yuan @ 2025-07-17 5:44 UTC (permalink / raw)
To: Keir Fraser
Cc: linux-arm-kernel, linux-kernel, kvm, Sean Christopherson,
Eric Auger, Oliver Upton, Marc Zyngier, Will Deacon,
Paolo Bonzini
On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> In preparation to remove synchronize_srcu() from MMIO registration,
> remove the distributor's dependency on this implicit barrier by
> direct acquire-release synchronization on the flag write and its
> lock-free check.
>
> Signed-off-by: Keir Fraser <keirf@google.com>
> ---
> arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> 1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> index 502b65049703..bc83672e461b 100644
> --- a/arch/arm64/kvm/vgic/vgic-init.c
> +++ b/arch/arm64/kvm/vgic/vgic-init.c
> @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> gpa_t dist_base;
> int ret = 0;
>
> - if (likely(dist->ready))
> + if (likely(smp_load_acquire(&dist->ready)))
> return 0;
>
> mutex_lock(&kvm->slots_lock);
> @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> goto out_slots;
> }
>
> - /*
> - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> - * registration before returning through synchronize_srcu(), which also
> - * implies a full memory barrier. As such, marking the distributor as
> - * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> - * a completely configured distributor.
> - */
> - dist->ready = true;
> + smp_store_release(&dist->ready, true);
No need the store-release and load-acquire for replacing
synchronize_srcu_expedited() w/ call_srcu() IIUC:
Tree SRCU on SMP:
call_srcu()
__call_srcu()
srcu_gp_start_if_needed()
__srcu_read_unlock_nmisafe()
#ifdef CONFIG_NEED_SRCU_NMI_SAFE
smp_mb__before_atomic() // __smp_mb() on ARM64, do nothing on x86.
#else
__srcu_read_unlock()
smp_mb()
#endif
TINY SRCU on UP:
Should have no memory ordering issue on UP.
> goto out_slots;
> out:
> mutex_unlock(&kvm->arch.config_lock);
> --
> 2.50.0.727.gbf7dc18ff4-goog
>
* Re: [PATCH v2 3/4] KVM: Implement barriers before accessing kvm->buses[] on SRCU read paths
2025-07-16 11:07 ` [PATCH v2 3/4] KVM: Implement barriers before accessing kvm->buses[] on SRCU read paths Keir Fraser
@ 2025-07-17 6:01 ` Yao Yuan
2025-07-18 14:54 ` Sean Christopherson
2025-07-18 14:56 ` Keir Fraser
0 siblings, 2 replies; 22+ messages in thread
From: Yao Yuan @ 2025-07-17 6:01 UTC (permalink / raw)
To: Keir Fraser
Cc: linux-arm-kernel, linux-kernel, kvm, Sean Christopherson,
Eric Auger, Oliver Upton, Marc Zyngier, Will Deacon,
Paolo Bonzini
On Wed, Jul 16, 2025 at 11:07:36AM +0800, Keir Fraser wrote:
> This ensures that, if a VCPU has "observed" that an IO registration has
> occurred, the instruction currently being trapped or emulated will also
> observe the IO registration.
>
> At the same time, enforce that kvm_get_bus() is used only on the
> update side, ensuring that a long-term reference cannot be obtained by
> an SRCU reader.
>
> Signed-off-by: Keir Fraser <keirf@google.com>
> ---
> arch/x86/kvm/vmx/vmx.c | 7 +++++++
> include/linux/kvm_host.h | 10 +++++++---
> virt/kvm/kvm_main.c | 33 +++++++++++++++++++++++++++------
> 3 files changed, 41 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 191a9ed0da22..425e3d8074ab 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -5861,6 +5861,13 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
> if (kvm_test_request(KVM_REQ_EVENT, vcpu))
> return 1;
>
> + /*
> + * Ensure that any updates to kvm->buses[] observed by the
> + * previous instruction (emulated or otherwise) are also
> + * visible to the instruction we are about to emulate.
> + */
> + smp_rmb();
> +
> if (!kvm_emulate_instruction(vcpu, 0))
> return 0;
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 3bde4fb5c6aa..9132148fb467 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -965,11 +965,15 @@ static inline bool kvm_dirty_log_manual_protect_and_init_set(struct kvm *kvm)
> return !!(kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET);
> }
>
> +/*
> + * Get a bus reference under the update-side lock. No long-term SRCU reader
> + * references are permitted, to avoid stale reads vs concurrent IO
> + * registrations.
> + */
> static inline struct kvm_io_bus *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)
> {
> - return srcu_dereference_check(kvm->buses[idx], &kvm->srcu,
> - lockdep_is_held(&kvm->slots_lock) ||
> - !refcount_read(&kvm->users_count));
> + return rcu_dereference_protected(kvm->buses[idx],
> + lockdep_is_held(&kvm->slots_lock));
I want to consult the true reason for using protected version here,
save unnecessary READ_ONCE() ?
> }
>
> static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 222f0e894a0c..9ec3b96b9666 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1103,6 +1103,15 @@ void __weak kvm_arch_create_vm_debugfs(struct kvm *kvm)
> {
> }
>
> +/* Called only on cleanup and destruction paths when there are no users. */
> +static inline struct kvm_io_bus *kvm_get_bus_for_destruction(struct kvm *kvm,
> + enum kvm_bus idx)
> +{
> + return rcu_dereference_protected(kvm->buses[idx],
> + !refcount_read(&kvm->users_count));
> +}
> +
> +
> static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
> {
> struct kvm *kvm = kvm_arch_alloc_vm();
> @@ -1228,7 +1237,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
> out_err_no_arch_destroy_vm:
> WARN_ON_ONCE(!refcount_dec_and_test(&kvm->users_count));
> for (i = 0; i < KVM_NR_BUSES; i++)
> - kfree(kvm_get_bus(kvm, i));
> + kfree(kvm_get_bus_for_destruction(kvm, i));
> kvm_free_irq_routing(kvm);
> out_err_no_irq_routing:
> cleanup_srcu_struct(&kvm->irq_srcu);
> @@ -1276,7 +1285,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
>
> kvm_free_irq_routing(kvm);
> for (i = 0; i < KVM_NR_BUSES; i++) {
> - struct kvm_io_bus *bus = kvm_get_bus(kvm, i);
> + struct kvm_io_bus *bus = kvm_get_bus_for_destruction(kvm, i);
>
> if (bus)
> kvm_io_bus_destroy(bus);
> @@ -5838,6 +5847,18 @@ static int __kvm_io_bus_write(struct kvm_vcpu *vcpu, struct kvm_io_bus *bus,
> return -EOPNOTSUPP;
> }
>
> +static struct kvm_io_bus *kvm_get_bus_srcu(struct kvm *kvm, enum kvm_bus idx)
> +{
> + /*
> + * Ensure that any updates to kvm_buses[] observed by the previous VCPU
> + * machine instruction are also visible to the VCPU machine instruction
> + * that triggered this call.
> + */
> + smp_mb__after_srcu_read_lock();
> +
> + return srcu_dereference(kvm->buses[idx], &kvm->srcu);
> +}
> +
> int kvm_io_bus_write(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> int len, const void *val)
> {
> @@ -5850,7 +5871,7 @@ int kvm_io_bus_write(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> .len = len,
> };
>
> - bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
> + bus = kvm_get_bus_srcu(vcpu->kvm, bus_idx);
> if (!bus)
> return -ENOMEM;
> r = __kvm_io_bus_write(vcpu, bus, &range, val);
> @@ -5869,7 +5890,7 @@ int kvm_io_bus_write_cookie(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx,
> .len = len,
> };
>
> - bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
> + bus = kvm_get_bus_srcu(vcpu->kvm, bus_idx);
> if (!bus)
> return -ENOMEM;
>
> @@ -5919,7 +5940,7 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> .len = len,
> };
>
> - bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
> + bus = kvm_get_bus_srcu(vcpu->kvm, bus_idx);
> if (!bus)
> return -ENOMEM;
> r = __kvm_io_bus_read(vcpu, bus, &range, val);
> @@ -6028,7 +6049,7 @@ struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
>
> srcu_idx = srcu_read_lock(&kvm->srcu);
>
> - bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu);
> + bus = kvm_get_bus_srcu(kvm, bus_idx);
> if (!bus)
> goto out_unlock;
>
> --
> 2.50.0.727.gbf7dc18ff4-goog
>
* Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
2025-07-17 5:44 ` Yao Yuan
@ 2025-07-18 14:53 ` Keir Fraser
2025-07-18 15:01 ` Sean Christopherson
2025-07-18 15:25 ` Keir Fraser
2025-07-18 15:00 ` Sean Christopherson
1 sibling, 2 replies; 22+ messages in thread
From: Keir Fraser @ 2025-07-18 14:53 UTC (permalink / raw)
To: Yao Yuan
Cc: linux-arm-kernel, linux-kernel, kvm, Sean Christopherson,
Eric Auger, Oliver Upton, Marc Zyngier, Will Deacon,
Paolo Bonzini
On Thu, Jul 17, 2025 at 01:44:48PM +0800, Yao Yuan wrote:
> On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > In preparation to remove synchronize_srcu() from MMIO registration,
> > remove the distributor's dependency on this implicit barrier by
> > direct acquire-release synchronization on the flag write and its
> > lock-free check.
> >
> > Signed-off-by: Keir Fraser <keirf@google.com>
> > ---
> > arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > 1 file changed, 2 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > index 502b65049703..bc83672e461b 100644
> > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > gpa_t dist_base;
> > int ret = 0;
> >
> > - if (likely(dist->ready))
> > + if (likely(smp_load_acquire(&dist->ready)))
> > return 0;
> >
> > mutex_lock(&kvm->slots_lock);
> > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > goto out_slots;
> > }
> >
> > - /*
> > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > - * registration before returning through synchronize_srcu(), which also
> > - * implies a full memory barrier. As such, marking the distributor as
> > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > - * a completely configured distributor.
> > - */
> > - dist->ready = true;
> > + smp_store_release(&dist->ready, true);
>
> No need the store-release and load-acquire for replacing
> synchronize_srcu_expedited() w/ call_srcu() IIUC:
>
> Tree SRCU on SMP:
> call_srcu()
> __call_srcu()
> srcu_gp_start_if_needed()
> __srcu_read_unlock_nmisafe()
> #ifdef CONFIG_NEED_SRCU_NMI_SAFE
> smp_mb__before_atomic() // __smp_mb() on ARM64, do nothing on x86.
> #else
> __srcu_read_unlock()
> smp_mb()
> #endif
I don't think it's nice to depend on an implementation detail of
kvm_io_bus_register_dev() and, transitively, on implementation details
of call_srcu().
kvm_vgic_map_resources() isn't called that often and can afford its
own synchronization.
-- Keir
> TINY SRCU on UP:
> Should have no memory ordering issue on UP.
>
> > goto out_slots;
> > out:
> > mutex_unlock(&kvm->arch.config_lock);
> > --
> > 2.50.0.727.gbf7dc18ff4-goog
> >
* Re: [PATCH v2 3/4] KVM: Implement barriers before accessing kvm->buses[] on SRCU read paths
2025-07-17 6:01 ` Yao Yuan
@ 2025-07-18 14:54 ` Sean Christopherson
2025-07-19 2:37 ` Yao Yuan
2025-07-18 14:56 ` Keir Fraser
1 sibling, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2025-07-18 14:54 UTC (permalink / raw)
To: Yao Yuan
Cc: Keir Fraser, linux-arm-kernel, linux-kernel, kvm, Eric Auger,
Oliver Upton, Marc Zyngier, Will Deacon, Paolo Bonzini
On Thu, Jul 17, 2025, Yao Yuan wrote:
> On Wed, Jul 16, 2025 at 11:07:36AM +0800, Keir Fraser wrote:
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 3bde4fb5c6aa..9132148fb467 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -965,11 +965,15 @@ static inline bool kvm_dirty_log_manual_protect_and_init_set(struct kvm *kvm)
> > return !!(kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET);
> > }
> >
> > +/*
> > + * Get a bus reference under the update-side lock. No long-term SRCU reader
> > + * references are permitted, to avoid stale reads vs concurrent IO
> > + * registrations.
> > + */
> > static inline struct kvm_io_bus *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)
> > {
> > - return srcu_dereference_check(kvm->buses[idx], &kvm->srcu,
> > - lockdep_is_held(&kvm->slots_lock) ||
> > - !refcount_read(&kvm->users_count));
> > + return rcu_dereference_protected(kvm->buses[idx],
> > + lockdep_is_held(&kvm->slots_lock));
>
> I want to consult the true reason for using protected version here,
> save unnecessary READ_ONCE() ?
Avoiding the READ_ONCE() is a happy bonus. The main goal is to help document
and enforce that kvm_get_bus() can only be used if slots_lock is held. Keeping
this as srcu_dereference_check() would result in PROVE_RCU getting a false negative
if the caller held kvm->srcu but not slots_lock.
From a documentation perspective, rcu_dereference_protected() (hopefully) helps
highlight that there's something "special" about this helper, e.g. gives the reader
a hint that they probably shouldn't be using kvm_get_bus().
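As a purely illustrative sketch (the function name is made up; only the
locking context matters), the intent is that this remains the one legal
pattern:

	static void example_update_side(struct kvm *kvm)
	{
		struct kvm_io_bus *bus;

		mutex_lock(&kvm->slots_lock);
		bus = kvm_get_bus(kvm, KVM_MMIO_BUS);	/* ok: slots_lock held */
		/* ... inspect or replace the bus ... */
		mutex_unlock(&kvm->slots_lock);
	}

whereas calling kvm_get_bus() with only srcu_read_lock(&kvm->srcu) held now
trips PROVE_RCU instead of being silently accepted via the
srcu_read_lock_held() leg of srcu_dereference_check().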
* Re: [PATCH v2 3/4] KVM: Implement barriers before accessing kvm->buses[] on SRCU read paths
2025-07-17 6:01 ` Yao Yuan
2025-07-18 14:54 ` Sean Christopherson
@ 2025-07-18 14:56 ` Keir Fraser
2025-07-19 2:33 ` Yao Yuan
1 sibling, 1 reply; 22+ messages in thread
From: Keir Fraser @ 2025-07-18 14:56 UTC (permalink / raw)
To: Yao Yuan
Cc: linux-arm-kernel, linux-kernel, kvm, Sean Christopherson,
Eric Auger, Oliver Upton, Marc Zyngier, Will Deacon,
Paolo Bonzini
On Thu, Jul 17, 2025 at 02:01:32PM +0800, Yao Yuan wrote:
> On Wed, Jul 16, 2025 at 11:07:36AM +0800, Keir Fraser wrote:
> > This ensures that, if a VCPU has "observed" that an IO registration has
> > occurred, the instruction currently being trapped or emulated will also
> > observe the IO registration.
> >
> > At the same time, enforce that kvm_get_bus() is used only on the
> > update side, ensuring that a long-term reference cannot be obtained by
> > an SRCU reader.
> >
> > Signed-off-by: Keir Fraser <keirf@google.com>
> > ---
> > arch/x86/kvm/vmx/vmx.c | 7 +++++++
> > include/linux/kvm_host.h | 10 +++++++---
> > virt/kvm/kvm_main.c | 33 +++++++++++++++++++++++++++------
> > 3 files changed, 41 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 191a9ed0da22..425e3d8074ab 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -5861,6 +5861,13 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
> > if (kvm_test_request(KVM_REQ_EVENT, vcpu))
> > return 1;
> >
> > + /*
> > + * Ensure that any updates to kvm->buses[] observed by the
> > + * previous instruction (emulated or otherwise) are also
> > + * visible to the instruction we are about to emulate.
> > + */
> > + smp_rmb();
> > +
> > if (!kvm_emulate_instruction(vcpu, 0))
> > return 0;
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 3bde4fb5c6aa..9132148fb467 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -965,11 +965,15 @@ static inline bool kvm_dirty_log_manual_protect_and_init_set(struct kvm *kvm)
> > return !!(kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET);
> > }
> >
> > +/*
> > + * Get a bus reference under the update-side lock. No long-term SRCU reader
> > + * references are permitted, to avoid stale reads vs concurrent IO
> > + * registrations.
> > + */
> > static inline struct kvm_io_bus *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)
> > {
> > - return srcu_dereference_check(kvm->buses[idx], &kvm->srcu,
> > - lockdep_is_held(&kvm->slots_lock) ||
> > - !refcount_read(&kvm->users_count));
> > + return rcu_dereference_protected(kvm->buses[idx],
> > + lockdep_is_held(&kvm->slots_lock));
>
> I want to consult the true reason for using protected version here,
> save unnecessary READ_ONCE() ?
We don't want this function to be callable from SRCU read section, but
*only* during teardown. Hence protected version provides a better,
stricter safety check (that there are no users).
-- Keir
> > }
> >
> > static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 222f0e894a0c..9ec3b96b9666 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -1103,6 +1103,15 @@ void __weak kvm_arch_create_vm_debugfs(struct kvm *kvm)
> > {
> > }
> >
> > +/* Called only on cleanup and destruction paths when there are no users. */
> > +static inline struct kvm_io_bus *kvm_get_bus_for_destruction(struct kvm *kvm,
> > + enum kvm_bus idx)
> > +{
> > + return rcu_dereference_protected(kvm->buses[idx],
> > + !refcount_read(&kvm->users_count));
> > +}
> > +
> > +
> > static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
> > {
> > struct kvm *kvm = kvm_arch_alloc_vm();
> > @@ -1228,7 +1237,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
> > out_err_no_arch_destroy_vm:
> > WARN_ON_ONCE(!refcount_dec_and_test(&kvm->users_count));
> > for (i = 0; i < KVM_NR_BUSES; i++)
> > - kfree(kvm_get_bus(kvm, i));
> > + kfree(kvm_get_bus_for_destruction(kvm, i));
> > kvm_free_irq_routing(kvm);
> > out_err_no_irq_routing:
> > cleanup_srcu_struct(&kvm->irq_srcu);
> > @@ -1276,7 +1285,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
> >
> > kvm_free_irq_routing(kvm);
> > for (i = 0; i < KVM_NR_BUSES; i++) {
> > - struct kvm_io_bus *bus = kvm_get_bus(kvm, i);
> > + struct kvm_io_bus *bus = kvm_get_bus_for_destruction(kvm, i);
> >
> > if (bus)
> > kvm_io_bus_destroy(bus);
> > @@ -5838,6 +5847,18 @@ static int __kvm_io_bus_write(struct kvm_vcpu *vcpu, struct kvm_io_bus *bus,
> > return -EOPNOTSUPP;
> > }
> >
> > +static struct kvm_io_bus *kvm_get_bus_srcu(struct kvm *kvm, enum kvm_bus idx)
> > +{
> > + /*
> > + * Ensure that any updates to kvm_buses[] observed by the previous VCPU
> > + * machine instruction are also visible to the VCPU machine instruction
> > + * that triggered this call.
> > + */
> > + smp_mb__after_srcu_read_lock();
> > +
> > + return srcu_dereference(kvm->buses[idx], &kvm->srcu);
> > +}
> > +
> > int kvm_io_bus_write(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> > int len, const void *val)
> > {
> > @@ -5850,7 +5871,7 @@ int kvm_io_bus_write(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> > .len = len,
> > };
> >
> > - bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
> > + bus = kvm_get_bus_srcu(vcpu->kvm, bus_idx);
> > if (!bus)
> > return -ENOMEM;
> > r = __kvm_io_bus_write(vcpu, bus, &range, val);
> > @@ -5869,7 +5890,7 @@ int kvm_io_bus_write_cookie(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx,
> > .len = len,
> > };
> >
> > - bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
> > + bus = kvm_get_bus_srcu(vcpu->kvm, bus_idx);
> > if (!bus)
> > return -ENOMEM;
> >
> > @@ -5919,7 +5940,7 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> > .len = len,
> > };
> >
> > - bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
> > + bus = kvm_get_bus_srcu(vcpu->kvm, bus_idx);
> > if (!bus)
> > return -ENOMEM;
> > r = __kvm_io_bus_read(vcpu, bus, &range, val);
> > @@ -6028,7 +6049,7 @@ struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
> >
> > srcu_idx = srcu_read_lock(&kvm->srcu);
> >
> > - bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu);
> > + bus = kvm_get_bus_srcu(kvm, bus_idx);
> > if (!bus)
> > goto out_unlock;
> >
> > --
> > 2.50.0.727.gbf7dc18ff4-goog
> >
* Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
2025-07-17 5:44 ` Yao Yuan
2025-07-18 14:53 ` Keir Fraser
@ 2025-07-18 15:00 ` Sean Christopherson
2025-07-19 2:15 ` Yao Yuan
1 sibling, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2025-07-18 15:00 UTC (permalink / raw)
To: Yao Yuan
Cc: Keir Fraser, linux-arm-kernel, linux-kernel, kvm, Eric Auger,
Oliver Upton, Marc Zyngier, Will Deacon, Paolo Bonzini
On Thu, Jul 17, 2025, Yao Yuan wrote:
> On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > In preparation to remove synchronize_srcu() from MMIO registration,
> > remove the distributor's dependency on this implicit barrier by
> > direct acquire-release synchronization on the flag write and its
> > lock-free check.
> >
> > Signed-off-by: Keir Fraser <keirf@google.com>
> > ---
> > arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > 1 file changed, 2 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > index 502b65049703..bc83672e461b 100644
> > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > gpa_t dist_base;
> > int ret = 0;
> >
> > - if (likely(dist->ready))
> > + if (likely(smp_load_acquire(&dist->ready)))
> > return 0;
> >
> > mutex_lock(&kvm->slots_lock);
> > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > goto out_slots;
> > }
> >
> > - /*
> > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > - * registration before returning through synchronize_srcu(), which also
> > - * implies a full memory barrier. As such, marking the distributor as
> > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > - * a completely configured distributor.
> > - */
> > - dist->ready = true;
> > + smp_store_release(&dist->ready, true);
>
> No need the store-release and load-acquire for replacing
> synchronize_srcu_expedited() w/ call_srcu() IIUC:
This isn't about using call_srcu(), because it's not actually about kvm->buses.
This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered
before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true"
with a half-baked distributor.
In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in
kvm_io_bus_register_dev() to provide the ordering guarantees. Switching to
smp_store_release() + smp_load_acquire() removes the dependency on the
synchronize_srcu() so that the synchronize_srcu() call can be safely removed.
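Roughly, the pairing is (sketch only; the helper names are made up and the
locking/error paths are elided):

	/* Writer: end of kvm_vgic_map_resources(), vgic fully configured. */
	static void vgic_mark_ready(struct vgic_dist *dist)
	{
		/* Order all prior vgic/bus stores before the flag is visible. */
		smp_store_release(&dist->ready, true);
	}

	/* Reader: the lock-free fast path. */
	static bool vgic_is_ready(struct vgic_dist *dist)
	{
		/* Seeing true guarantees all of the writer's prior stores. */
		return smp_load_acquire(&dist->ready);
	}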
* Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
2025-07-18 14:53 ` Keir Fraser
@ 2025-07-18 15:01 ` Sean Christopherson
2025-07-18 15:25 ` Keir Fraser
1 sibling, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-07-18 15:01 UTC (permalink / raw)
To: Keir Fraser
Cc: Yao Yuan, linux-arm-kernel, linux-kernel, kvm, Eric Auger,
Oliver Upton, Marc Zyngier, Will Deacon, Paolo Bonzini
On Fri, Jul 18, 2025, Keir Fraser wrote:
> On Thu, Jul 17, 2025 at 01:44:48PM +0800, Yao Yuan wrote:
> > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > remove the distributor's dependency on this implicit barrier by
> > > direct acquire-release synchronization on the flag write and its
> > > lock-free check.
> > >
> > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > ---
> > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > > 1 file changed, 2 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > index 502b65049703..bc83672e461b 100644
> > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > gpa_t dist_base;
> > > int ret = 0;
> > >
> > > - if (likely(dist->ready))
> > > + if (likely(smp_load_acquire(&dist->ready)))
> > > return 0;
> > >
> > > mutex_lock(&kvm->slots_lock);
> > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > goto out_slots;
> > > }
> > >
> > > - /*
> > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > - * registration before returning through synchronize_srcu(), which also
> > > - * implies a full memory barrier. As such, marking the distributor as
> > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > - * a completely configured distributor.
> > > - */
> > > - dist->ready = true;
> > > + smp_store_release(&dist->ready, true);
> >
> > No need the store-release and load-acquire for replacing
> > synchronize_srcu_expedited() w/ call_srcu() IIUC:
> >
> > Tree SRCU on SMP:
> > call_srcu()
> > __call_srcu()
> > srcu_gp_start_if_needed()
> > __srcu_read_unlock_nmisafe()
> > #ifdef CONFIG_NEED_SRCU_NMI_SAFE
> > smp_mb__before_atomic() // __smp_mb() on ARM64, do nothing on x86.
> > #else
> > __srcu_read_unlock()
> > smp_mb()
> > #endif
>
> I don't think it's nice to depend on an implementation detail of
> kvm_io_bus_register_dev() and, transitively, on implementation details
> of call_srcu().
>
> kvm_vgic_map_resources() isn't called that often and can afford its
> own synchronization.
+1
* Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
2025-07-18 14:53 ` Keir Fraser
2025-07-18 15:01 ` Sean Christopherson
@ 2025-07-18 15:25 ` Keir Fraser
2025-07-19 2:23 ` Yao Yuan
1 sibling, 1 reply; 22+ messages in thread
From: Keir Fraser @ 2025-07-18 15:25 UTC (permalink / raw)
To: Yao Yuan
Cc: linux-arm-kernel, linux-kernel, kvm, Sean Christopherson,
Eric Auger, Oliver Upton, Marc Zyngier, Will Deacon,
Paolo Bonzini
On Fri, Jul 18, 2025 at 02:53:42PM +0000, Keir Fraser wrote:
> On Thu, Jul 17, 2025 at 01:44:48PM +0800, Yao Yuan wrote:
> > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > remove the distributor's dependency on this implicit barrier by
> > > direct acquire-release synchronization on the flag write and its
> > > lock-free check.
> > >
> > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > ---
> > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > > 1 file changed, 2 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > index 502b65049703..bc83672e461b 100644
> > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > gpa_t dist_base;
> > > int ret = 0;
> > >
> > > - if (likely(dist->ready))
> > > + if (likely(smp_load_acquire(&dist->ready)))
> > > return 0;
> > >
> > > mutex_lock(&kvm->slots_lock);
> > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > goto out_slots;
> > > }
> > >
> > > - /*
> > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > - * registration before returning through synchronize_srcu(), which also
> > > - * implies a full memory barrier. As such, marking the distributor as
> > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > - * a completely configured distributor.
> > > - */
> > > - dist->ready = true;
> > > + smp_store_release(&dist->ready, true);
> >
> > No need the store-release and load-acquire for replacing
> > synchronize_srcu_expedited() w/ call_srcu() IIUC:
> >
> > Tree SRCU on SMP:
> > call_srcu()
> > __call_srcu()
> > srcu_gp_start_if_needed()
> > __srcu_read_unlock_nmisafe()
> > #ifdef CONFIG_NEED_SRCU_NMI_SAFE
> > smp_mb__before_atomic() // __smp_mb() on ARM64, do nothing on x86.
> > #else
> > __srcu_read_unlock()
> > smp_mb()
> > #endif
>
> I don't think it's nice to depend on an implementation detail of
> kvm_io_bus_register_dev() and, transitively, on implementation details
> of call_srcu().
Also I should note that this is moot because the smp_mb() would *not*
safely replace the load-acquire.
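To spell that out (helper names made up, only the ordering matters):

	static void writer_with_mb_only(struct vgic_dist *dist)
	{
		/* ... configure the distributor ... */
		smp_mb();		/* e.g. the barrier buried in call_srcu() */
		WRITE_ONCE(dist->ready, true);
	}

	static bool reader_with_plain_load(struct vgic_dist *dist)
	{
		/*
		 * Nothing here orders the reader's later accesses to the
		 * distributor against this load, so on a weakly ordered CPU
		 * they can be satisfied early and see a half-initialised
		 * vgic even though ready was observed as true.
		 */
		return READ_ONCE(dist->ready);
	}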
> kvm_vgic_map_resources() isn't called that often and can afford its
> own synchronization.
>
> -- Keir
>
> > TINY SRCU on UP:
> > Should have no memory ordering issue on UP.
> >
> > > goto out_slots;
> > > out:
> > > mutex_unlock(&kvm->arch.config_lock);
> > > --
> > > 2.50.0.727.gbf7dc18ff4-goog
> > >
* Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
2025-07-18 15:00 ` Sean Christopherson
@ 2025-07-19 2:15 ` Yao Yuan
2025-07-19 7:58 ` Keir Fraser
0 siblings, 1 reply; 22+ messages in thread
From: Yao Yuan @ 2025-07-19 2:15 UTC (permalink / raw)
To: Sean Christopherson
Cc: Yao Yuan, Keir Fraser, linux-arm-kernel, linux-kernel, kvm,
Eric Auger, Oliver Upton, Marc Zyngier, Will Deacon,
Paolo Bonzini
On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote:
> On Thu, Jul 17, 2025, Yao Yuan wrote:
> > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > remove the distributor's dependency on this implicit barrier by
> > > direct acquire-release synchronization on the flag write and its
> > > lock-free check.
> > >
> > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > ---
> > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > > 1 file changed, 2 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > index 502b65049703..bc83672e461b 100644
> > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > gpa_t dist_base;
> > > int ret = 0;
> > >
> > > - if (likely(dist->ready))
> > > + if (likely(smp_load_acquire(&dist->ready)))
> > > return 0;
> > >
> > > mutex_lock(&kvm->slots_lock);
> > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > goto out_slots;
> > > }
> > >
> > > - /*
> > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > - * registration before returning through synchronize_srcu(), which also
> > > - * implies a full memory barrier. As such, marking the distributor as
> > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > - * a completely configured distributor.
> > > - */
> > > - dist->ready = true;
> > > + smp_store_release(&dist->ready, true);
> >
> > No need the store-release and load-acquire for replacing
> > synchronize_srcu_expedited() w/ call_srcu() IIUC:
>
> This isn't about using call_srcu(), because it's not actually about kvm->buses.
> This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered
> before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true"
> with a half-baked distributor.
>
> In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in
> kvm_io_bus_register_dev() to provide the ordering guarantees. Switching to
> smp_store_release() + smp_load_acquire() removes the dependency on the
> synchronize_srcu() so that the synchronize_srcu() call can be safely removed.
Yes, I understand this and agree with your point.
Just for discussion: I thought it should also work even w/o
introducing the load-acquire + store-release after switching to
call_srcu(): the smp_mb() in call_srcu() orders all the stores
to kvm->arch.vgic before the store to kvm->arch.vgic.ready in
the current implementation.
>
* Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
2025-07-18 15:25 ` Keir Fraser
@ 2025-07-19 2:23 ` Yao Yuan
0 siblings, 0 replies; 22+ messages in thread
From: Yao Yuan @ 2025-07-19 2:23 UTC (permalink / raw)
To: Keir Fraser
Cc: Yao Yuan, linux-arm-kernel, linux-kernel, kvm,
Sean Christopherson, Eric Auger, Oliver Upton, Marc Zyngier,
Will Deacon, Paolo Bonzini
On Fri, Jul 18, 2025 at 03:25:46PM +0000, Keir Fraser wrote:
> On Fri, Jul 18, 2025 at 02:53:42PM +0000, Keir Fraser wrote:
> > On Thu, Jul 17, 2025 at 01:44:48PM +0800, Yao Yuan wrote:
> > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > > remove the distributor's dependency on this implicit barrier by
> > > > direct acquire-release synchronization on the flag write and its
> > > > lock-free check.
> > > >
> > > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > > ---
> > > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > > > 1 file changed, 2 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > > index 502b65049703..bc83672e461b 100644
> > > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > gpa_t dist_base;
> > > > int ret = 0;
> > > >
> > > > - if (likely(dist->ready))
> > > > + if (likely(smp_load_acquire(&dist->ready)))
> > > > return 0;
> > > >
> > > > mutex_lock(&kvm->slots_lock);
> > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > goto out_slots;
> > > > }
> > > >
> > > > - /*
> > > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > > - * registration before returning through synchronize_srcu(), which also
> > > > - * implies a full memory barrier. As such, marking the distributor as
> > > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > > - * a completely configured distributor.
> > > > - */
> > > > - dist->ready = true;
> > > > + smp_store_release(&dist->ready, true);
> > >
> > > No need the store-release and load-acquire for replacing
> > > synchronize_srcu_expedited() w/ call_srcu() IIUC:
> > >
> > > Tree SRCU on SMP:
> > > call_srcu()
> > > __call_srcu()
> > > srcu_gp_start_if_needed()
> > > __srcu_read_unlock_nmisafe()
> > > #ifdef CONFIG_NEED_SRCU_NMI_SAFE
> > > smp_mb__before_atomic() // __smp_mb() on ARM64, do nothing on x86.
> > > #else
> > > __srcu_read_unlock()
> > > smp_mb()
> > > #endif
> >
> > I don't think it's nice to depend on an implementation detail of
> > kvm_io_bus_register_dev() and, transitively, on implementation details
> > of call_srcu().
This is a good point, I agree with you.
>
> Also I should note that this is moot because the smp_mb() would *not*
> safely replace the load-acquire.
Hmm.. do you mean it can't order the write to dist->ready
here while it is read in another thread at the same time?
>
> > kvm_vgic_map_resources() isn't called that often and can afford its
> > own synchronization.
> >
> > -- Keir
> >
> > > TINY SRCU on UP:
> > > Should have no memory ordering issue on UP.
> > >
> > > > goto out_slots;
> > > > out:
> > > > mutex_unlock(&kvm->arch.config_lock);
> > > > --
> > > > 2.50.0.727.gbf7dc18ff4-goog
> > > >
>
* Re: [PATCH v2 3/4] KVM: Implement barriers before accessing kvm->buses[] on SRCU read paths
2025-07-18 14:56 ` Keir Fraser
@ 2025-07-19 2:33 ` Yao Yuan
0 siblings, 0 replies; 22+ messages in thread
From: Yao Yuan @ 2025-07-19 2:33 UTC (permalink / raw)
To: Keir Fraser
Cc: Yao Yuan, linux-arm-kernel, linux-kernel, kvm,
Sean Christopherson, Eric Auger, Oliver Upton, Marc Zyngier,
Will Deacon, Paolo Bonzini
On Fri, Jul 18, 2025 at 02:56:25PM +0000, Keir Fraser wrote:
> On Thu, Jul 17, 2025 at 02:01:32PM +0800, Yao Yuan wrote:
> > On Wed, Jul 16, 2025 at 11:07:36AM +0800, Keir Fraser wrote:
> > > This ensures that, if a VCPU has "observed" that an IO registration has
> > > occurred, the instruction currently being trapped or emulated will also
> > > observe the IO registration.
> > >
> > > At the same time, enforce that kvm_get_bus() is used only on the
> > > update side, ensuring that a long-term reference cannot be obtained by
> > > an SRCU reader.
> > >
> > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > ---
> > > arch/x86/kvm/vmx/vmx.c | 7 +++++++
> > > include/linux/kvm_host.h | 10 +++++++---
> > > virt/kvm/kvm_main.c | 33 +++++++++++++++++++++++++++------
> > > 3 files changed, 41 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 191a9ed0da22..425e3d8074ab 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -5861,6 +5861,13 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
> > > if (kvm_test_request(KVM_REQ_EVENT, vcpu))
> > > return 1;
> > >
> > > + /*
> > > + * Ensure that any updates to kvm->buses[] observed by the
> > > + * previous instruction (emulated or otherwise) are also
> > > + * visible to the instruction we are about to emulate.
> > > + */
> > > + smp_rmb();
> > > +
> > > if (!kvm_emulate_instruction(vcpu, 0))
> > > return 0;
> > >
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index 3bde4fb5c6aa..9132148fb467 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -965,11 +965,15 @@ static inline bool kvm_dirty_log_manual_protect_and_init_set(struct kvm *kvm)
> > > return !!(kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET);
> > > }
> > >
> > > +/*
> > > + * Get a bus reference under the update-side lock. No long-term SRCU reader
> > > + * references are permitted, to avoid stale reads vs concurrent IO
> > > + * registrations.
> > > + */
> > > static inline struct kvm_io_bus *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)
> > > {
> > > - return srcu_dereference_check(kvm->buses[idx], &kvm->srcu,
> > > - lockdep_is_held(&kvm->slots_lock) ||
> > > - !refcount_read(&kvm->users_count));
> > > + return rcu_dereference_protected(kvm->buses[idx],
> > > + lockdep_is_held(&kvm->slots_lock));
> >
> > I want to consult the true reason for using protected version here,
> > save unnecessary READ_ONCE() ?
>
> We don't want this function to be callable from SRCU read section, but
> *only* during teardown. Hence protected version provides a better,
> stricter safety check (that there are no users).
I see, thanks for your explanation!
>
> -- Keir
>
> > > }
> > >
> > > static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 222f0e894a0c..9ec3b96b9666 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -1103,6 +1103,15 @@ void __weak kvm_arch_create_vm_debugfs(struct kvm *kvm)
> > > {
> > > }
> > >
> > > +/* Called only on cleanup and destruction paths when there are no users. */
> > > +static inline struct kvm_io_bus *kvm_get_bus_for_destruction(struct kvm *kvm,
> > > + enum kvm_bus idx)
> > > +{
> > > + return rcu_dereference_protected(kvm->buses[idx],
> > > + !refcount_read(&kvm->users_count));
> > > +}
> > > +
> > > +
> > > static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
> > > {
> > > struct kvm *kvm = kvm_arch_alloc_vm();
> > > @@ -1228,7 +1237,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
> > > out_err_no_arch_destroy_vm:
> > > WARN_ON_ONCE(!refcount_dec_and_test(&kvm->users_count));
> > > for (i = 0; i < KVM_NR_BUSES; i++)
> > > - kfree(kvm_get_bus(kvm, i));
> > > + kfree(kvm_get_bus_for_destruction(kvm, i));
> > > kvm_free_irq_routing(kvm);
> > > out_err_no_irq_routing:
> > > cleanup_srcu_struct(&kvm->irq_srcu);
> > > @@ -1276,7 +1285,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
> > >
> > > kvm_free_irq_routing(kvm);
> > > for (i = 0; i < KVM_NR_BUSES; i++) {
> > > - struct kvm_io_bus *bus = kvm_get_bus(kvm, i);
> > > + struct kvm_io_bus *bus = kvm_get_bus_for_destruction(kvm, i);
> > >
> > > if (bus)
> > > kvm_io_bus_destroy(bus);
> > > @@ -5838,6 +5847,18 @@ static int __kvm_io_bus_write(struct kvm_vcpu *vcpu, struct kvm_io_bus *bus,
> > > return -EOPNOTSUPP;
> > > }
> > >
> > > +static struct kvm_io_bus *kvm_get_bus_srcu(struct kvm *kvm, enum kvm_bus idx)
> > > +{
> > > + /*
> > > + * Ensure that any updates to kvm_buses[] observed by the previous VCPU
> > > + * machine instruction are also visible to the VCPU machine instruction
> > > + * that triggered this call.
> > > + */
> > > + smp_mb__after_srcu_read_lock();
> > > +
> > > + return srcu_dereference(kvm->buses[idx], &kvm->srcu);
> > > +}
> > > +
> > > int kvm_io_bus_write(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> > > int len, const void *val)
> > > {
> > > @@ -5850,7 +5871,7 @@ int kvm_io_bus_write(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> > > .len = len,
> > > };
> > >
> > > - bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
> > > + bus = kvm_get_bus_srcu(vcpu->kvm, bus_idx);
> > > if (!bus)
> > > return -ENOMEM;
> > > r = __kvm_io_bus_write(vcpu, bus, &range, val);
> > > @@ -5869,7 +5890,7 @@ int kvm_io_bus_write_cookie(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx,
> > > .len = len,
> > > };
> > >
> > > - bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
> > > + bus = kvm_get_bus_srcu(vcpu->kvm, bus_idx);
> > > if (!bus)
> > > return -ENOMEM;
> > >
> > > @@ -5919,7 +5940,7 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
> > > .len = len,
> > > };
> > >
> > > - bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
> > > + bus = kvm_get_bus_srcu(vcpu->kvm, bus_idx);
> > > if (!bus)
> > > return -ENOMEM;
> > > r = __kvm_io_bus_read(vcpu, bus, &range, val);
> > > @@ -6028,7 +6049,7 @@ struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
> > >
> > > srcu_idx = srcu_read_lock(&kvm->srcu);
> > >
> > > - bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu);
> > > + bus = kvm_get_bus_srcu(kvm, bus_idx);
> > > if (!bus)
> > > goto out_unlock;
> > >
> > > --
> > > 2.50.0.727.gbf7dc18ff4-goog
> > >
>
* Re: [PATCH v2 3/4] KVM: Implement barriers before accessing kvm->buses[] on SRCU read paths
2025-07-18 14:54 ` Sean Christopherson
@ 2025-07-19 2:37 ` Yao Yuan
0 siblings, 0 replies; 22+ messages in thread
From: Yao Yuan @ 2025-07-19 2:37 UTC (permalink / raw)
To: Sean Christopherson
Cc: Yao Yuan, Keir Fraser, linux-arm-kernel, linux-kernel, kvm,
Eric Auger, Oliver Upton, Marc Zyngier, Will Deacon,
Paolo Bonzini
On Fri, Jul 18, 2025 at 07:54:38AM -0700, Sean Christopherson wrote:
> On Thu, Jul 17, 2025, Yao Yuan wrote:
> > On Wed, Jul 16, 2025 at 11:07:36AM +0800, Keir Fraser wrote:
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index 3bde4fb5c6aa..9132148fb467 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -965,11 +965,15 @@ static inline bool kvm_dirty_log_manual_protect_and_init_set(struct kvm *kvm)
> > > return !!(kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET);
> > > }
> > >
> > > +/*
> > > + * Get a bus reference under the update-side lock. No long-term SRCU reader
> > > + * references are permitted, to avoid stale reads vs concurrent IO
> > > + * registrations.
> > > + */
> > > static inline struct kvm_io_bus *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)
> > > {
> > > - return srcu_dereference_check(kvm->buses[idx], &kvm->srcu,
> > > - lockdep_is_held(&kvm->slots_lock) ||
> > > - !refcount_read(&kvm->users_count));
> > > + return rcu_dereference_protected(kvm->buses[idx],
> > > + lockdep_is_held(&kvm->slots_lock));
> >
> > I want to consult the true reason for using protected version here,
> > save unnecessary READ_ONCE() ?
>
> Avoiding the READ_ONCE() is a happy bonus. The main goal is to help document
> and enforce that kvm_get_bus() can only be used if slots_lock is held. Keeping
> this as srcu_dereference_check() would result in PROVE_RCU getting a false negative
> if the caller held kvm->srcu but not slots_lock.
Ah, I see the srcu_read_lock_held(ssp) check in srcu_dereference_check() now!
>
> From a documentation perspective, rcu_dereference_protected() (hopefully) helps
> highlight that there's something "special" about this helper, e.g. gives the reader
> a hint that they probably shouldn't be using kvm_get_bus().
Yes, I got it. Thanks for your nice explanation!
>
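
For illustration, a minimal sketch of the lockdep distinction being
discussed; the example_* names are placeholders rather than anything
from the series, and the conditions simply mirror the real helpers:

/* Update-side accessor: PROVE_RCU complains unless slots_lock is held. */
static inline struct kvm_io_bus *example_get_bus(struct kvm *kvm,
						 enum kvm_bus idx)
{
	return rcu_dereference_protected(kvm->buses[idx],
					 lockdep_is_held(&kvm->slots_lock));
}

/*
 * SRCU-checked accessor: the check is also satisfied by merely holding
 * kvm->srcu via srcu_read_lock(), so a caller that does not hold
 * slots_lock would not trip PROVE_RCU. That is the false negative
 * described above.
 */
static inline struct kvm_io_bus *example_get_bus_checked(struct kvm *kvm,
							  enum kvm_bus idx)
{
	return srcu_dereference_check(kvm->buses[idx], &kvm->srcu,
				      lockdep_is_held(&kvm->slots_lock));
}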
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
2025-07-19 2:15 ` Yao Yuan
@ 2025-07-19 7:58 ` Keir Fraser
2025-07-20 0:08 ` Yao Yuan
0 siblings, 1 reply; 22+ messages in thread
From: Keir Fraser @ 2025-07-19 7:58 UTC (permalink / raw)
To: Yao Yuan
Cc: Sean Christopherson, Yao Yuan, linux-arm-kernel, linux-kernel,
kvm, Eric Auger, Oliver Upton, Marc Zyngier, Will Deacon,
Paolo Bonzini
On Sat, Jul 19, 2025 at 10:15:56AM +0800, Yao Yuan wrote:
> On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote:
> > On Thu, Jul 17, 2025, Yao Yuan wrote:
> > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > > remove the distributor's dependency on this implicit barrier by
> > > > direct acquire-release synchronization on the flag write and its
> > > > lock-free check.
> > > >
> > > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > > ---
> > > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > > > 1 file changed, 2 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > > index 502b65049703..bc83672e461b 100644
> > > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > gpa_t dist_base;
> > > > int ret = 0;
> > > >
> > > > - if (likely(dist->ready))
> > > > + if (likely(smp_load_acquire(&dist->ready)))
> > > > return 0;
> > > >
> > > > mutex_lock(&kvm->slots_lock);
> > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > goto out_slots;
> > > > }
> > > >
> > > > - /*
> > > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > > - * registration before returning through synchronize_srcu(), which also
> > > > - * implies a full memory barrier. As such, marking the distributor as
> > > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > > - * a completely configured distributor.
> > > > - */
> > > > - dist->ready = true;
> > > > + smp_store_release(&dist->ready, true);
> > >
> > > No need the store-release and load-acquire for replacing
> > > synchronize_srcu_expedited() w/ call_srcu() IIUC:
> >
> > This isn't about using call_srcu(), because it's not actually about kvm->buses.
> > This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered
> > before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true"
> > with a half-baked distributor.
> >
> > In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in
> > kvm_io_bus_register_dev() to provide the ordering guarantees. Switching to
> > smp_store_release() + smp_load_acquire() removes the dependency on the
> > synchronize_srcu() so that the synchronize_srcu() call can be safely removed.
>
> Yes, I understand this and agree with your point.
>
> Just for discusstion: I thought it should also work even w/o
> introduce the load acqure + store release after switch to
> call_srcu(): The smp_mb() in call_srcu() order the all store
> to kvm->arch.vgic before store kvm->arch.vgic.ready in
> current implementation.
The load-acquire would still be required, to ensure that accesses to
kvm->arch.vgic do not get reordered earlier than the lock-free check
of kvm->arch.vgic.ready. Otherwise that CPU could see that the vgic is
initialised, but then use speculated reads of uninitialised vgic state.
> >
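
For illustration, a minimal reader-side sketch of that pairing; the
helper name is a placeholder and the final read is only an example,
neither is taken from the series:

/* Hypothetical reader; pairs with smp_store_release(&dist->ready, true). */
static bool example_vgic_dist_is_ready(struct kvm *kvm)
{
	struct vgic_dist *dist = &kvm->arch.vgic;

	/*
	 * Acquire: observing ready == true guarantees that every store the
	 * writer made to *dist before its store-release is visible here,
	 * and that none of our later reads of *dist can be hoisted above
	 * this check (the speculation hazard described above).
	 */
	if (!smp_load_acquire(&dist->ready))
		return false;

	/* Safe to consume distributor state from this point on. */
	return dist->vgic_dist_base != VGIC_ADDR_UNDEF;
}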
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
2025-07-19 7:58 ` Keir Fraser
@ 2025-07-20 0:08 ` Yao Yuan
2025-07-21 7:49 ` Keir Fraser
0 siblings, 1 reply; 22+ messages in thread
From: Yao Yuan @ 2025-07-20 0:08 UTC (permalink / raw)
To: Keir Fraser
Cc: Sean Christopherson, Yao Yuan, linux-arm-kernel, linux-kernel,
kvm, Eric Auger, Oliver Upton, Marc Zyngier, Will Deacon,
Paolo Bonzini
On Sat, Jul 19, 2025 at 07:58:19AM +0000, Keir Fraser wrote:
> On Sat, Jul 19, 2025 at 10:15:56AM +0800, Yao Yuan wrote:
> > On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote:
> > > On Thu, Jul 17, 2025, Yao Yuan wrote:
> > > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > > > remove the distributor's dependency on this implicit barrier by
> > > > > direct acquire-release synchronization on the flag write and its
> > > > > lock-free check.
> > > > >
> > > > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > > > ---
> > > > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > > > > 1 file changed, 2 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > index 502b65049703..bc83672e461b 100644
> > > > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > > gpa_t dist_base;
> > > > > int ret = 0;
> > > > >
> > > > > - if (likely(dist->ready))
> > > > > + if (likely(smp_load_acquire(&dist->ready)))
> > > > > return 0;
> > > > >
> > > > > mutex_lock(&kvm->slots_lock);
> > > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > > goto out_slots;
> > > > > }
> > > > >
> > > > > - /*
> > > > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > > > - * registration before returning through synchronize_srcu(), which also
> > > > > - * implies a full memory barrier. As such, marking the distributor as
> > > > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > > > - * a completely configured distributor.
> > > > > - */
> > > > > - dist->ready = true;
> > > > > + smp_store_release(&dist->ready, true);
> > > >
> > > > No need the store-release and load-acquire for replacing
> > > > synchronize_srcu_expedited() w/ call_srcu() IIUC:
> > >
> > > This isn't about using call_srcu(), because it's not actually about kvm->buses.
> > > This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered
> > > before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true"
> > > with a half-baked distributor.
> > >
> > > In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in
> > > kvm_io_bus_register_dev() to provide the ordering guarantees. Switching to
> > > smp_store_release() + smp_load_acquire() removes the dependency on the
> > > synchronize_srcu() so that the synchronize_srcu() call can be safely removed.
> >
> > Yes, I understand this and agree with your point.
> >
> > Just for discusstion: I thought it should also work even w/o
> > introduce the load acqure + store release after switch to
> > call_srcu(): The smp_mb() in call_srcu() order the all store
> > to kvm->arch.vgic before store kvm->arch.vgic.ready in
> > current implementation.
>
> The load-acquire would still be required, to ensure that accesses to
> kvm->arch.vgic do not get reordered earlier than the lock-free check
> of kvm->arch.vgic.ready. Otherwise that CPU could see that the vgic is
> initialised, but then use speculated reads of uninitialised vgic state.
>
Thanks for your explanation.
I see. But there's a "mutex_lock(&kvm->slots_lock);" before the later
accesses to kvm->arch.vgic, so I think the ordering can still be
guaranteed. Of course, as you said, an explicit load-acquire +
store-release is better than the previous implicit implementation.
> > >
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
2025-07-20 0:08 ` Yao Yuan
@ 2025-07-21 7:49 ` Keir Fraser
2025-07-22 2:01 ` Yao Yuan
0 siblings, 1 reply; 22+ messages in thread
From: Keir Fraser @ 2025-07-21 7:49 UTC (permalink / raw)
To: Yao Yuan
Cc: Sean Christopherson, Yao Yuan, linux-arm-kernel, linux-kernel,
kvm, Eric Auger, Oliver Upton, Marc Zyngier, Will Deacon,
Paolo Bonzini
On Sun, Jul 20, 2025 at 08:08:30AM +0800, Yao Yuan wrote:
> On Sat, Jul 19, 2025 at 07:58:19AM +0000, Keir Fraser wrote:
> > On Sat, Jul 19, 2025 at 10:15:56AM +0800, Yao Yuan wrote:
> > > On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote:
> > > > On Thu, Jul 17, 2025, Yao Yuan wrote:
> > > > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > > > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > > > > remove the distributor's dependency on this implicit barrier by
> > > > > > direct acquire-release synchronization on the flag write and its
> > > > > > lock-free check.
> > > > > >
> > > > > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > > > > ---
> > > > > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > > > > > 1 file changed, 2 insertions(+), 9 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > index 502b65049703..bc83672e461b 100644
> > > > > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > > > gpa_t dist_base;
> > > > > > int ret = 0;
> > > > > >
> > > > > > - if (likely(dist->ready))
> > > > > > + if (likely(smp_load_acquire(&dist->ready)))
> > > > > > return 0;
> > > > > >
> > > > > > mutex_lock(&kvm->slots_lock);
> > > > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > > > goto out_slots;
> > > > > > }
> > > > > >
> > > > > > - /*
> > > > > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > > > > - * registration before returning through synchronize_srcu(), which also
> > > > > > - * implies a full memory barrier. As such, marking the distributor as
> > > > > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > > > > - * a completely configured distributor.
> > > > > > - */
> > > > > > - dist->ready = true;
> > > > > > + smp_store_release(&dist->ready, true);
> > > > >
> > > > > No need the store-release and load-acquire for replacing
> > > > > synchronize_srcu_expedited() w/ call_srcu() IIUC:
> > > >
> > > > This isn't about using call_srcu(), because it's not actually about kvm->buses.
> > > > This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered
> > > > before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true"
> > > > with a half-baked distributor.
> > > >
> > > > In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in
> > > > kvm_io_bus_register_dev() to provide the ordering guarantees. Switching to
> > > > smp_store_release() + smp_load_acquire() removes the dependency on the
> > > > synchronize_srcu() so that the synchronize_srcu() call can be safely removed.
> > >
> > > Yes, I understand this and agree with your point.
> > >
> > > Just for discusstion: I thought it should also work even w/o
> > > introduce the load acqure + store release after switch to
> > > call_srcu(): The smp_mb() in call_srcu() order the all store
> > > to kvm->arch.vgic before store kvm->arch.vgic.ready in
> > > current implementation.
> >
> > The load-acquire would still be required, to ensure that accesses to
> > kvm->arch.vgic do not get reordered earlier than the lock-free check
> > of kvm->arch.vgic.ready. Otherwise that CPU could see that the vgic is
> > initialised, but then use speculated reads of uninitialised vgic state.
> >
>
> Thanks for your explanation.
>
> I see. But there's a "mutex_lock(&kvm->slots_lock);" before the later
> accesses to kvm->arch.vgic, so I think the ordering can still be
> guaranteed. Of course, as you said, an explicit load-acquire +
> store-release is better than the previous implicit implementation.
If vgic_dist::ready is observed true by the lock-free read (the one
which is turned into load-acquire by this patch) then the function
immediately returns with no mutex_lock() executed. It is reads of
vgic_dist *after* return from kvm_vgic_map_resources() that you have
to worry about, and which require load-acquire semantics.
>
> > > >
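
A hypothetical caller-side sketch of that point; example_vcpu_first_run()
and program_redistributor() are placeholders, not kernel functions:

static int example_vcpu_first_run(struct kvm_vcpu *vcpu)
{
	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
	int ret;

	/*
	 * On the fast path this returns 0 via the lock-free check without
	 * ever taking slots_lock, so the mutex cannot provide any ordering
	 * for this caller.
	 */
	ret = kvm_vgic_map_resources(vcpu->kvm);
	if (ret)
		return ret;

	/*
	 * Reads of vgic_dist after the return are ordered only by the
	 * load-acquire on dist->ready, paired with the writer's
	 * store-release, so they cannot observe a half-initialised
	 * distributor.
	 */
	return program_redistributor(vcpu, dist->vgic_dist_base);
}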
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
2025-07-21 7:49 ` Keir Fraser
@ 2025-07-22 2:01 ` Yao Yuan
2025-07-22 2:22 ` Yao Yuan
0 siblings, 1 reply; 22+ messages in thread
From: Yao Yuan @ 2025-07-22 2:01 UTC (permalink / raw)
To: Keir Fraser
Cc: Yao Yuan, Sean Christopherson, linux-arm-kernel, linux-kernel,
kvm, Eric Auger, Oliver Upton, Marc Zyngier, Will Deacon,
Paolo Bonzini
On Mon, Jul 21, 2025 at 07:49:10AM +0800, Keir Fraser wrote:
> On Sun, Jul 20, 2025 at 08:08:30AM +0800, Yao Yuan wrote:
> > On Sat, Jul 19, 2025 at 07:58:19AM +0000, Keir Fraser wrote:
> > > On Sat, Jul 19, 2025 at 10:15:56AM +0800, Yao Yuan wrote:
> > > > On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote:
> > > > > On Thu, Jul 17, 2025, Yao Yuan wrote:
> > > > > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > > > > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > > > > > remove the distributor's dependency on this implicit barrier by
> > > > > > > direct acquire-release synchronization on the flag write and its
> > > > > > > lock-free check.
> > > > > > >
> > > > > > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > > > > > ---
> > > > > > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > > > > > > 1 file changed, 2 insertions(+), 9 deletions(-)
> > > > > > >
> > > > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > > index 502b65049703..bc83672e461b 100644
> > > > > > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > > > > gpa_t dist_base;
> > > > > > > int ret = 0;
> > > > > > >
> > > > > > > - if (likely(dist->ready))
> > > > > > > + if (likely(smp_load_acquire(&dist->ready)))
> > > > > > > return 0;
> > > > > > >
> > > > > > > mutex_lock(&kvm->slots_lock);
> > > > > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > > > > goto out_slots;
> > > > > > > }
> > > > > > >
> > > > > > > - /*
> > > > > > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > > > > > - * registration before returning through synchronize_srcu(), which also
> > > > > > > - * implies a full memory barrier. As such, marking the distributor as
> > > > > > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > > > > > - * a completely configured distributor.
> > > > > > > - */
> > > > > > > - dist->ready = true;
> > > > > > > + smp_store_release(&dist->ready, true);
> > > > > >
> > > > > > No need the store-release and load-acquire for replacing
> > > > > > synchronize_srcu_expedited() w/ call_srcu() IIUC:
> > > > >
> > > > > This isn't about using call_srcu(), because it's not actually about kvm->buses.
> > > > > This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered
> > > > > before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true"
> > > > > with a half-baked distributor.
> > > > >
> > > > > In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in
> > > > > kvm_io_bus_register_dev() to provide the ordering guarantees. Switching to
> > > > > smp_store_release() + smp_load_acquire() removes the dependency on the
> > > > > synchronize_srcu() so that the synchronize_srcu() call can be safely removed.
> > > >
> > > > Yes, I understand this and agree with your point.
> > > >
> > > > Just for discusstion: I thought it should also work even w/o
> > > > introduce the load acqure + store release after switch to
> > > > call_srcu(): The smp_mb() in call_srcu() order the all store
> > > > to kvm->arch.vgic before store kvm->arch.vgic.ready in
> > > > current implementation.
> > >
> > > The load-acquire would still be required, to ensure that accesses to
> > > kvm->arch.vgic do not get reordered earlier than the lock-free check
> > > of kvm->arch.vgic.ready. Otherwise that CPU could see that the vgic is
> > > initialised, but then use speculated reads of uninitialised vgic state.
> > >
> >
> > Thanks for your explanation.
> >
> > I see. But there's a "mutex_lock(&kvm->slots_lock);" before the later
> > accesses to kvm->arch.vgic, so I think the ordering can still be
> > guaranteed. Of course, as you said, an explicit load-acquire +
> > store-release is better than the previous implicit implementation.
>
> If vgic_dist::ready is observed true by the lock-free read (the one
> which is turned into load-acquire by this patch) then the function
> immediately returns with no mutex_lock() executed. It is reads of
> vgic_dist *after* return from kvm_vgic_map_resources() that you have
> to worry about, and which require load-acquire semantics.
I think this is the main purpose of the lock-free read
here: to avoid lock contention on a VM with many vCPUs when
the vCPUs all run for the first time together.

The store-release makes sure the changes to vgic_dist become
visible before the change to vgic_dist::ready does, but it
doesn't guarantee that vgic_dist::ready becomes visible to a
reader on another CPU **immediately**; the load-acquire on
the reader side is what is required for that.

Is the above understanding correct?
>
> >
> > > > >
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
2025-07-22 2:01 ` Yao Yuan
@ 2025-07-22 2:22 ` Yao Yuan
0 siblings, 0 replies; 22+ messages in thread
From: Yao Yuan @ 2025-07-22 2:22 UTC (permalink / raw)
To: Keir Fraser
Cc: Yao Yuan, Sean Christopherson, linux-arm-kernel, linux-kernel,
kvm, Eric Auger, Oliver Upton, Marc Zyngier, Will Deacon,
Paolo Bonzini
On Tue, Jul 22, 2025 at 10:01:27AM +0800, Yao Yuan wrote:
> On Mon, Jul 21, 2025 at 07:49:10AM +0800, Keir Fraser wrote:
> > On Sun, Jul 20, 2025 at 08:08:30AM +0800, Yao Yuan wrote:
> > > On Sat, Jul 19, 2025 at 07:58:19AM +0000, Keir Fraser wrote:
> > > > On Sat, Jul 19, 2025 at 10:15:56AM +0800, Yao Yuan wrote:
> > > > > On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote:
> > > > > > On Thu, Jul 17, 2025, Yao Yuan wrote:
> > > > > > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > > > > > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > > > > > > remove the distributor's dependency on this implicit barrier by
> > > > > > > > direct acquire-release synchronization on the flag write and its
> > > > > > > > lock-free check.
> > > > > > > >
> > > > > > > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > > > > > > ---
> > > > > > > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > > > > > > > 1 file changed, 2 insertions(+), 9 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > > > index 502b65049703..bc83672e461b 100644
> > > > > > > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > > > > > gpa_t dist_base;
> > > > > > > > int ret = 0;
> > > > > > > >
> > > > > > > > - if (likely(dist->ready))
> > > > > > > > + if (likely(smp_load_acquire(&dist->ready)))
> > > > > > > > return 0;
> > > > > > > >
> > > > > > > > mutex_lock(&kvm->slots_lock);
> > > > > > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > > > > > goto out_slots;
> > > > > > > > }
> > > > > > > >
> > > > > > > > - /*
> > > > > > > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > > > > > > - * registration before returning through synchronize_srcu(), which also
> > > > > > > > - * implies a full memory barrier. As such, marking the distributor as
> > > > > > > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > > > > > > - * a completely configured distributor.
> > > > > > > > - */
> > > > > > > > - dist->ready = true;
> > > > > > > > + smp_store_release(&dist->ready, true);
> > > > > > >
> > > > > > > No need the store-release and load-acquire for replacing
> > > > > > > synchronize_srcu_expedited() w/ call_srcu() IIUC:
> > > > > >
> > > > > > This isn't about using call_srcu(), because it's not actually about kvm->buses.
> > > > > > This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered
> > > > > > before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true"
> > > > > > with a half-baked distributor.
> > > > > >
> > > > > > In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in
> > > > > > kvm_io_bus_register_dev() to provide the ordering guarantees. Switching to
> > > > > > smp_store_release() + smp_load_acquire() removes the dependency on the
> > > > > > synchronize_srcu() so that the synchronize_srcu() call can be safely removed.
> > > > >
> > > > > Yes, I understand this and agree with your point.
> > > > >
> > > > > Just for discusstion: I thought it should also work even w/o
> > > > > introduce the load acqure + store release after switch to
> > > > > call_srcu(): The smp_mb() in call_srcu() order the all store
> > > > > to kvm->arch.vgic before store kvm->arch.vgic.ready in
> > > > > current implementation.
> > > >
> > > > The load-acquire would still be required, to ensure that accesses to
> > > > kvm->arch.vgic do not get reordered earlier than the lock-free check
> > > > of kvm->arch.vgic.ready. Otherwise that CPU could see that the vgic is
> > > > initialised, but then use speculated reads of uninitialised vgic state.
> > > >
> > >
> > > Thanks for your explanation.
> > >
> > > I see. But there's a "mutex_lock(&kvm->slots_lock);" before the later
> > > accesses to kvm->arch.vgic, so I think the ordering can still be
> > > guaranteed. Of course, as you said, an explicit load-acquire +
> > > store-release is better than the previous implicit implementation.
> >
> > If vgic_dist::ready is observed true by the lock-free read (the one
> > which is turned into load-acquire by this patch) then the function
> > immediately returns with no mutex_lock() executed. It is reads of
> > vgic_dist *after* return from kvm_vgic_map_resources() that you have
> > to worry about, and which require load-acquire semantics.
>
> I think this is the main purpose of the lock-free read
> here: to avoid lock contention on a VM with many vCPUs when
> the vCPUs all run for the first time together.
>
> The store-release makes sure the changes to vgic_dist become
> visible before the change to vgic_dist::ready does, but it
> doesn't guarantee that vgic_dist::ready becomes visible to a
> reader on another CPU **immediately**; the load-acquire on
> the reader side is what is required for that.
>
> Is the above understanding correct?
No. I get your point now: the load-acquire is needed here because,
without it, there is a code path where the CPU's reads of vgic_dist
could be reordered before its read of vgic_dist::ready even when the
CPU observes vgic_dist::ready as true.
>
> >
> > >
> > > > > >
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread (newest message: 2025-07-22 2:22 UTC)
Thread overview: 22+ messages
2025-07-16 11:07 [PATCH v2 0/4] KVM: Speed up MMIO registrations Keir Fraser
2025-07-16 11:07 ` [PATCH v2 1/4] KVM: arm64: vgic-init: Remove vgic_ready() macro Keir Fraser
2025-07-16 11:07 ` [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering Keir Fraser
2025-07-17 5:44 ` Yao Yuan
2025-07-18 14:53 ` Keir Fraser
2025-07-18 15:01 ` Sean Christopherson
2025-07-18 15:25 ` Keir Fraser
2025-07-19 2:23 ` Yao Yuan
2025-07-18 15:00 ` Sean Christopherson
2025-07-19 2:15 ` Yao Yuan
2025-07-19 7:58 ` Keir Fraser
2025-07-20 0:08 ` Yao Yuan
2025-07-21 7:49 ` Keir Fraser
2025-07-22 2:01 ` Yao Yuan
2025-07-22 2:22 ` Yao Yuan
2025-07-16 11:07 ` [PATCH v2 3/4] KVM: Implement barriers before accessing kvm->buses[] on SRCU read paths Keir Fraser
2025-07-17 6:01 ` Yao Yuan
2025-07-18 14:54 ` Sean Christopherson
2025-07-19 2:37 ` Yao Yuan
2025-07-18 14:56 ` Keir Fraser
2025-07-19 2:33 ` Yao Yuan
2025-07-16 11:07 ` [PATCH v2 4/4] KVM: Avoid synchronize_srcu() in kvm_io_bus_register_dev() Keir Fraser