* [PATCH 0/5] KVM: arm64: Accelerate lookup of vcpus by MPIDR values
@ 2023-09-07 10:09 Marc Zyngier
  2023-09-07 10:09 ` [PATCH 1/5] KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff() Marc Zyngier
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Marc Zyngier @ 2023-09-07 10:09 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu, Xu Zhao

Xu Zhao recently reported[1] that sending SGIs on large VMs was slower
than expected, especially when targeting vcpus that have a high vcpu
index. They root-caused it to the way we walk the vcpu xarray in
search of the correct MPIDR, one vcpu at a time, which is of course
grossly inefficient.

The solution they proposed was, unfortunately, less than ideal, but I
was "nerd sniped" into doing something about it.

The main idea is to build a small hash table of MPIDR to vcpu
mappings, using the fact that most of the time, the MPIDR values only
use a small number of significant bits and that we can easily compute
a compact index from it. Once we have that, accelerating vcpu lookup
becomes pretty cheap, and we can in turn make SGIs great again.

It must be noted that since the MPIDR values are controlled by
userspace, it isn't always possible to allocate the hash table
(userspace could build a 32 vcpu VM and allocate one bit of affinity
to each of them, making all the bits significant). We thus always have
an iterative fallback -- if it hurts, don't do that.
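
To make the compaction concrete, here is a standalone userspace
sketch of the bit-gathering trick (the in-kernel helper is
kvm_mpidr_index() in patch 2; the 4-cluster/4-CPU topology and the
0x303 mask below are purely illustrative assumptions):

#include <stdint.h>
#include <stdio.h>

/*
 * Gather the bits of @aff selected by @mask into a contiguous,
 * LSB-justified index -- a poor man's PEXT.
 */
static uint16_t compact_index(uint64_t mask, uint64_t aff)
{
	uint16_t idx = 0;
	int bit, bit_idx = 0;

	for (bit = 0; bit < 64; bit++) {
		if (!(mask & (1ULL << bit)))
			continue;
		idx |= ((aff >> bit) & 1) << bit_idx;
		bit_idx++;
	}

	return idx;
}

int main(void)
{
	/* Only Aff1[1:0] (MPIDR bits [9:8]) and Aff0[1:0] (bits [1:0]) vary */
	uint64_t mask = 0x303, aff1, aff0;

	for (aff1 = 0; aff1 < 4; aff1++)
		for (aff0 = 0; aff0 < 4; aff0++) {
			uint64_t mpidr = (aff1 << 8) | aff0;
			printf("MPIDR %#06llx -> index %u\n",
			       (unsigned long long)mpidr,
			       compact_index(mask, mpidr));
		}

	return 0;
}

Each of the 16 MPIDR values maps to a unique slot in a 16-entry
table. In the degenerate case described above (one significant bit
per vcpu), the mask has as many set bits as there are vcpus, the
table size explodes, and we take the iterative fallback instead.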

Performance-wise, this is very significant: using the KUT micro-bench
test with the following patch (always IPI-ing the last vcpu of the VM)
and running it with a large number of vcpus shows a large improvement
(from 3832ns to 2593ns for a 64 vcpu VM, a 32% reduction, measured on
an Ampere Altra). I expect that IPI-happy workloads could benefit from
this.

Thanks,

	M.

[1] https://lore.kernel.org/r/20230825015811.5292-1-zhaoxu.35@bytedance.com

diff --git a/arm/micro-bench.c b/arm/micro-bench.c
index bfd181dc..f3ac3270 100644
--- a/arm/micro-bench.c
+++ b/arm/micro-bench.c
@@ -88,7 +88,7 @@ static bool test_init(void)
 
 	irq_ready = false;
 	gic_enable_defaults();
-	on_cpu_async(1, gic_secondary_entry, NULL);
+	on_cpu_async(nr_cpus - 1, gic_secondary_entry, NULL);
 
 	cntfrq = get_cntfrq();
 	printf("Timer Frequency %d Hz (Output in microseconds)\n", cntfrq);
@@ -157,7 +157,7 @@ static void ipi_exec(void)
 
 	irq_received = false;
 
-	gic_ipi_send_single(1, 1);
+	gic_ipi_send_single(1, nr_cpus - 1);
 
 	while (!irq_received && tries--)
 		cpu_relax();
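
For reference, the numbers above were gathered by running the hacked
test with a large vcpu count; with kvm-unit-tests that is typically
something along the lines of (the exact invocation depends on your
setup):

./arm/run arm/micro-bench.flat -smp 64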


Marc Zyngier (5):
  KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff()
  KVM: arm64: Build MPIDR to vcpu index cache at runtime
  KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is
    available
  KVM: arm64: vgic-v3: Refactor GICv3 SGI generation
  KVM: arm64: vgic-v3: Optimize affinity-based SGI injection

 arch/arm64/include/asm/kvm_emulate.h |   2 +-
 arch/arm64/include/asm/kvm_host.h    |  28 ++++++
 arch/arm64/kvm/arm.c                 |  66 +++++++++++++
 arch/arm64/kvm/vgic/vgic-mmio-v3.c   | 142 ++++++++++-----------------
 4 files changed, 148 insertions(+), 90 deletions(-)

-- 
2.34.1



* [PATCH 1/5] KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff()
  2023-09-07 10:09 [PATCH 0/5] KVM: arm64: Accelerate lookup of vcpus by MPIDR values Marc Zyngier
@ 2023-09-07 10:09 ` Marc Zyngier
  2023-09-07 15:28   ` Joey Gouly
  2023-09-07 10:09 ` [PATCH 2/5] KVM: arm64: Build MPIDR to vcpu index cache at runtime Marc Zyngier
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2023-09-07 10:09 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu, Xu Zhao

By definition, MPIDR_EL1 cannot be modified by the guest. This
means it is pointless to check whether this is loaded on the CPU.

Simplify the kvm_vcpu_get_mpidr_aff() helper to directly access
the in-memory value.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 3d6725ff0bf6..b66ef77cf49e 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -465,7 +465,7 @@ static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
 
 static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
 {
-	return vcpu_read_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
+	return __vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
 }
 
 static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
-- 
2.34.1



* [PATCH 2/5] KVM: arm64: Build MPIDR to vcpu index cache at runtime
  2023-09-07 10:09 [PATCH 0/5] KVM: arm64: Accelerate lookup of vcpus by MPIDR values Marc Zyngier
  2023-09-07 10:09 ` [PATCH 1/5] KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff() Marc Zyngier
@ 2023-09-07 10:09 ` Marc Zyngier
  2023-09-07 15:29   ` Joey Gouly
  2023-09-07 10:09 ` [PATCH 3/5] KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available Marc Zyngier
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2023-09-07 10:09 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu, Xu Zhao

The MPIDR_EL1 register contains a unique value that identifies
the CPU. The only problem with it is that it is stupidly large
(32 bits, once the useless stuff is removed).

Trying to obtain a vcpu from an MPIDR value is a fairly common,
yet costly operation: we iterate over all the vcpus until we
find the correct one. While this is cheap for small VMs, it is
pretty expensive on large ones, especially if you are trying to
get to the one that's at the end of the list...

In order to help with this, it is important to realise that
the MPIDR values are actually structured, and that implementations
tend to use a small number of significant bits in the 32-bit space.

We can use this fact to our advantage by computing a small hash
table that uses the "compression" of the significant MPIDR bits
as an index, giving us the vcpu index as a result.

Given that the MPIDR values can be supplied by userspace, and
that an evil VMM could decide to make *all* bits significant,
resulting in a 4G-entry table, we only use this method if the
resulting table fits in a single page. Otherwise, we fall back
to the good old iterative method.

Nothing uses that table just yet, but keep your eyes peeled.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h | 28 ++++++++++++++++
 arch/arm64/kvm/arm.c              | 54 +++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index af06ccb7ee34..361aaf66ac32 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -202,6 +202,31 @@ struct kvm_protected_vm {
 	struct kvm_hyp_memcache teardown_mc;
 };
 
+struct kvm_mpidr_data {
+	u64			mpidr_mask;
+	DECLARE_FLEX_ARRAY(u16, cmpidr_to_idx);
+};
+
+static inline u16 kvm_mpidr_index(struct kvm_mpidr_data *data, u64 mpidr)
+{
+	unsigned long mask = data->mpidr_mask;
+	u64 aff = mpidr & MPIDR_HWID_BITMASK;
+	int nbits, bit, bit_idx = 0;
+	u16 vcpu_idx = 0;
+
+	/*
+	 * If this looks like RISC-V's BEXT or x86's PEXT
+	 * instructions, it isn't by accident.
+	 */
+	nbits = fls(mask);
+	for_each_set_bit(bit, &mask, nbits) {
+		vcpu_idx |= (aff & BIT(bit)) >> (bit - bit_idx);
+		bit_idx++;
+	}
+
+	return vcpu_idx;
+}
+
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
@@ -248,6 +273,9 @@ struct kvm_arch {
 	/* VM-wide vCPU feature set */
 	DECLARE_BITMAP(vcpu_features, KVM_VCPU_MAX_FEATURES);
 
+	/* MPIDR to vcpu index mapping, optional */
+	struct kvm_mpidr_data *mpidr_data;
+
 	/*
 	 * VM-wide PMU filter, implemented as a bitmap and big enough for
 	 * up to 2^10 events (ARMv8.0) or 2^16 events (ARMv8.1+).
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 4866b3f7b4ea..30ce394c09d4 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -205,6 +205,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	if (is_protected_kvm_enabled())
 		pkvm_destroy_hyp_vm(kvm);
 
+	kfree(kvm->arch.mpidr_data);
 	kvm_destroy_vcpus(kvm);
 
 	kvm_unshare_hyp(kvm, kvm + 1);
@@ -578,6 +579,57 @@ static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
 	return vcpu_get_flag(vcpu, VCPU_INITIALIZED);
 }
 
+static void kvm_init_mpidr_data(struct kvm *kvm)
+{
+	struct kvm_mpidr_data *data = NULL;
+	unsigned long c, mask, nr_entries;
+	u64 aff_set = 0, aff_clr = ~0UL;
+	struct kvm_vcpu *vcpu;
+
+	mutex_lock(&kvm->arch.config_lock);
+
+	if (kvm->arch.mpidr_data || atomic_read(&kvm->online_vcpus) == 1)
+		goto out;
+
+	kvm_for_each_vcpu(c, vcpu, kvm) {
+		u64 aff = kvm_vcpu_get_mpidr_aff(vcpu);
+		aff_set |= aff;
+		aff_clr &= aff;
+	}
+
+	/*
+	 * A significant bit can be either 0 or 1, and will only appear in
+	 * aff_set. Use aff_clr to weed out the useless stuff.
+	 */
+	mask = aff_set ^ aff_clr;
+	nr_entries = BIT_ULL(hweight_long(mask));
+
+	/*
+	 * Don't let userspace fool us. If we need more than a single page
+	 * to describe the compressed MPIDR array, just fall back to the
+	 * iterative method. Single vcpu VMs do not need this either.
+	 */
+	if (struct_size(data, cmpidr_to_idx, nr_entries) <= PAGE_SIZE)
+		data = kzalloc(struct_size(data, cmpidr_to_idx, nr_entries),
+			       GFP_KERNEL_ACCOUNT);
+
+	if (!data)
+		goto out;
+
+	data->mpidr_mask = mask;
+
+	kvm_for_each_vcpu(c, vcpu, kvm) {
+		u64 aff = kvm_vcpu_get_mpidr_aff(vcpu);
+		u16 vcpu_idx = kvm_mpidr_index(data, aff);
+
+		data->cmpidr_to_idx[vcpu_idx] = c;
+	}
+
+	kvm->arch.mpidr_data = data;
+out:
+	mutex_unlock(&kvm->arch.config_lock);
+}
+
 /*
  * Handle both the initialisation that is being done when the vcpu is
  * run for the first time, as well as the updates that must be
@@ -601,6 +653,8 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 	if (likely(vcpu_has_run_once(vcpu)))
 		return 0;
 
+	kvm_init_mpidr_data(kvm);
+
 	kvm_arm_vcpu_init_debug(vcpu);
 
 	if (likely(irqchip_in_kernel(kvm))) {
-- 
2.34.1



* [PATCH 3/5] KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available
  2023-09-07 10:09 [PATCH 0/5] KVM: arm64: Accelerate lookup of vcpus by MPIDR values Marc Zyngier
  2023-09-07 10:09 ` [PATCH 1/5] KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff() Marc Zyngier
  2023-09-07 10:09 ` [PATCH 2/5] KVM: arm64: Build MPIDR to vcpu index cache at runtime Marc Zyngier
@ 2023-09-07 10:09 ` Marc Zyngier
  2023-09-07 15:29   ` Joey Gouly
  2023-09-07 10:09 ` [PATCH 4/5] KVM: arm64: vgic-v3: Refactor GICv3 SGI generation Marc Zyngier
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2023-09-07 10:09 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu, Xu Zhao

If our fancy little table is present when calling kvm_mpidr_to_vcpu(),
use it to recover the corresponding vcpu.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/arm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 30ce394c09d4..5b75b2db12be 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2395,6 +2395,18 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
 	unsigned long i;
 
 	mpidr &= MPIDR_HWID_BITMASK;
+
+	if (kvm->arch.mpidr_data) {
+		u16 idx = kvm_mpidr_index(kvm->arch.mpidr_data, mpidr);
+
+		vcpu = kvm_get_vcpu(kvm,
+				    kvm->arch.mpidr_data->cmpidr_to_idx[idx]);
+		if (mpidr != kvm_vcpu_get_mpidr_aff(vcpu))
+			vcpu = NULL;
+
+		return vcpu;
+	}
+
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		if (mpidr == kvm_vcpu_get_mpidr_aff(vcpu))
 			return vcpu;
-- 
2.34.1



* [PATCH 4/5] KVM: arm64: vgic-v3: Refactor GICv3 SGI generation
  2023-09-07 10:09 [PATCH 0/5] KVM: arm64: Accelerate lookup of vcpus by MPIDR values Marc Zyngier
                   ` (2 preceding siblings ...)
  2023-09-07 10:09 ` [PATCH 3/5] KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available Marc Zyngier
@ 2023-09-07 10:09 ` Marc Zyngier
  2023-09-10 16:25   ` Zenghui Yu
  2023-09-07 10:09 ` [PATCH 5/5] KVM: arm64: vgic-v3: Optimize affinity-based SGI injection Marc Zyngier
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2023-09-07 10:09 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu, Xu Zhao

As we're about to change the way SGIs are sent, start by splitting
out some of the basic functionnality: instead of intermingling
the broadcast and non-broadcast cases with the actual SGI generation,
perform the following cleanups:

- move the SGI queuing into its own helper
- split the broadcast code from the affinity-driven code
- replace the mask/shift combinations with FIELD_GET()

The result is much more readable, and paves the way for further
optimisations.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/vgic/vgic-mmio-v3.c | 110 ++++++++++++++++-------------
 1 file changed, 59 insertions(+), 51 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
index 188d2187eede..88b8d4524854 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
@@ -1052,6 +1052,38 @@ static int match_mpidr(u64 sgi_aff, u16 sgi_cpu_mask, struct kvm_vcpu *vcpu)
 	((((reg) & ICC_SGI1R_AFFINITY_## level ##_MASK) \
 	>> ICC_SGI1R_AFFINITY_## level ##_SHIFT) << MPIDR_LEVEL_SHIFT(level))
 
+static void vgic_v3_queue_sgi(struct kvm_vcpu *vcpu, u32 sgi, bool allow_group1)
+{
+	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, sgi);
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&irq->irq_lock, flags);
+
+	/*
+	 * An access targeting Group0 SGIs can only generate
+	 * those, while an access targeting Group1 SGIs can
+	 * generate interrupts of either group.
+	 */
+	if (!irq->group || allow_group1) {
+		if (!irq->hw) {
+			irq->pending_latch = true;
+			vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
+		} else {
+			/* HW SGI? Ask the GIC to inject it */
+			int err;
+			err = irq_set_irqchip_state(irq->host_irq,
+						    IRQCHIP_STATE_PENDING,
+						    true);
+			WARN_RATELIMIT(err, "IRQ %d", irq->host_irq);
+			raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+		}
+	} else {
+		raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+	}
+
+	vgic_put_irq(vcpu->kvm, irq);
+}
+
 /**
  * vgic_v3_dispatch_sgi - handle SGI requests from VCPUs
  * @vcpu: The VCPU requesting a SGI
@@ -1070,19 +1102,30 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg, bool allow_group1)
 {
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_vcpu *c_vcpu;
-	u16 target_cpus;
+	unsigned long target_cpus;
 	u64 mpidr;
-	int sgi;
-	int vcpu_id = vcpu->vcpu_id;
-	bool broadcast;
-	unsigned long c, flags;
-
-	sgi = (reg & ICC_SGI1R_SGI_ID_MASK) >> ICC_SGI1R_SGI_ID_SHIFT;
-	broadcast = reg & BIT_ULL(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
-	target_cpus = (reg & ICC_SGI1R_TARGET_LIST_MASK) >> ICC_SGI1R_TARGET_LIST_SHIFT;
+	u32 sgi;
+	unsigned long c;
+
+	sgi = FIELD_GET(ICC_SGI1R_SGI_ID_MASK, reg);
+
+	/* Broadcast */
+	if (unlikely(reg & BIT_ULL(ICC_SGI1R_IRQ_ROUTING_MODE_BIT))) {
+		kvm_for_each_vcpu(c, c_vcpu, kvm) {
+			/* Don't signal the calling VCPU */
+			if (c_vcpu == vcpu)
+				continue;
+
+			vgic_v3_queue_sgi(c_vcpu, sgi, allow_group1);
+		}
+
+		return;
+	}
+
 	mpidr = SGI_AFFINITY_LEVEL(reg, 3);
 	mpidr |= SGI_AFFINITY_LEVEL(reg, 2);
 	mpidr |= SGI_AFFINITY_LEVEL(reg, 1);
+	target_cpus = FIELD_GET(ICC_SGI1R_TARGET_LIST_MASK, reg);
 
 	/*
 	 * We iterate over all VCPUs to find the MPIDRs matching the request.
@@ -1091,54 +1134,19 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg, bool allow_group1)
 	 * VCPUs when most of the times we just signal a single VCPU.
 	 */
 	kvm_for_each_vcpu(c, c_vcpu, kvm) {
-		struct vgic_irq *irq;
+		int level0;
 
 		/* Exit early if we have dealt with all requested CPUs */
-		if (!broadcast && target_cpus == 0)
+		if (target_cpus == 0)
 			break;
-
-		/* Don't signal the calling VCPU */
-		if (broadcast && c == vcpu_id)
+		level0 = match_mpidr(mpidr, target_cpus, c_vcpu);
+		if (level0 == -1)
 			continue;
 
-		if (!broadcast) {
-			int level0;
-
-			level0 = match_mpidr(mpidr, target_cpus, c_vcpu);
-			if (level0 == -1)
-				continue;
-
-			/* remove this matching VCPU from the mask */
-			target_cpus &= ~BIT(level0);
-		}
+		/* remove this matching VCPU from the mask */
+		target_cpus &= ~BIT(level0);
 
-		irq = vgic_get_irq(vcpu->kvm, c_vcpu, sgi);
-
-		raw_spin_lock_irqsave(&irq->irq_lock, flags);
-
-		/*
-		 * An access targeting Group0 SGIs can only generate
-		 * those, while an access targeting Group1 SGIs can
-		 * generate interrupts of either group.
-		 */
-		if (!irq->group || allow_group1) {
-			if (!irq->hw) {
-				irq->pending_latch = true;
-				vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
-			} else {
-				/* HW SGI? Ask the GIC to inject it */
-				int err;
-				err = irq_set_irqchip_state(irq->host_irq,
-							    IRQCHIP_STATE_PENDING,
-							    true);
-				WARN_RATELIMIT(err, "IRQ %d", irq->host_irq);
-				raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
-			}
-		} else {
-			raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
-		}
-
-		vgic_put_irq(vcpu->kvm, irq);
+		vgic_v3_queue_sgi(c_vcpu, sgi, allow_group1);
 	}
 }
 
-- 
2.34.1



* [PATCH 5/5] KVM: arm64: vgic-v3: Optimize affinity-based SGI injection
  2023-09-07 10:09 [PATCH 0/5] KVM: arm64: Accelerate lookup of vcpus by MPIDR values Marc Zyngier
                   ` (3 preceding siblings ...)
  2023-09-07 10:09 ` [PATCH 4/5] KVM: arm64: vgic-v3: Refactor GICv3 SGI generation Marc Zyngier
@ 2023-09-07 10:09 ` Marc Zyngier
  2023-09-07 15:30 ` [PATCH 0/5] KVM: arm64: Accelerate lookup of vcpus by MPIDR values Joey Gouly
  2023-09-11 15:01 ` Shameerali Kolothum Thodi
  6 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2023-09-07 10:09 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu, Xu Zhao

Our affinity-based SGI injection code is a bit daft. We iterate
over all the CPUs trying to match the set of affinities that the
guest is trying to reach, leading to some very bad behaviours
if the selected targets are at a high vcpu index.

Instead, we can now use the fact that we have an optimised
MPIDR to vcpu mapping, and only look at the relevant values.

This results in a much faster injection for large VMs, and
in a near constant time, irrespective of the position in the
vcpu index space.

As a bonus, this is mostly deleting a lot of hard-to-read
code. Nobody will complain about that.

Suggested-by: Xu Zhao <zhaoxu.35@bytedance.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/vgic/vgic-mmio-v3.c | 56 ++++--------------------------
 1 file changed, 6 insertions(+), 50 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
index 88b8d4524854..c7337a0fd242 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
@@ -1013,35 +1013,6 @@ int vgic_v3_has_attr_regs(struct kvm_device *dev, struct kvm_device_attr *attr)
 
 	return 0;
 }
-/*
- * Compare a given affinity (level 1-3 and a level 0 mask, from the SGI
- * generation register ICC_SGI1R_EL1) with a given VCPU.
- * If the VCPU's MPIDR matches, return the level0 affinity, otherwise
- * return -1.
- */
-static int match_mpidr(u64 sgi_aff, u16 sgi_cpu_mask, struct kvm_vcpu *vcpu)
-{
-	unsigned long affinity;
-	int level0;
-
-	/*
-	 * Split the current VCPU's MPIDR into affinity level 0 and the
-	 * rest as this is what we have to compare against.
-	 */
-	affinity = kvm_vcpu_get_mpidr_aff(vcpu);
-	level0 = MPIDR_AFFINITY_LEVEL(affinity, 0);
-	affinity &= ~MPIDR_LEVEL_MASK;
-
-	/* bail out if the upper three levels don't match */
-	if (sgi_aff != affinity)
-		return -1;
-
-	/* Is this VCPU's bit set in the mask ? */
-	if (!(sgi_cpu_mask & BIT(level0)))
-		return -1;
-
-	return level0;
-}
 
 /*
  * The ICC_SGI* registers encode the affinity differently from the MPIDR,
@@ -1104,7 +1075,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg, bool allow_group1)
 	struct kvm_vcpu *c_vcpu;
 	unsigned long target_cpus;
 	u64 mpidr;
-	u32 sgi;
+	u32 sgi, aff0;
 	unsigned long c;
 
 	sgi = FIELD_GET(ICC_SGI1R_SGI_ID_MASK, reg);
@@ -1122,31 +1093,16 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg, bool allow_group1)
 		return;
 	}
 
+	/* We iterate over affinities to find the corresponding vcpus */
 	mpidr = SGI_AFFINITY_LEVEL(reg, 3);
 	mpidr |= SGI_AFFINITY_LEVEL(reg, 2);
 	mpidr |= SGI_AFFINITY_LEVEL(reg, 1);
 	target_cpus = FIELD_GET(ICC_SGI1R_TARGET_LIST_MASK, reg);
 
-	/*
-	 * We iterate over all VCPUs to find the MPIDRs matching the request.
-	 * If we have handled one CPU, we clear its bit to detect early
-	 * if we are already finished. This avoids iterating through all
-	 * VCPUs when most of the times we just signal a single VCPU.
-	 */
-	kvm_for_each_vcpu(c, c_vcpu, kvm) {
-		int level0;
-
-		/* Exit early if we have dealt with all requested CPUs */
-		if (target_cpus == 0)
-			break;
-		level0 = match_mpidr(mpidr, target_cpus, c_vcpu);
-		if (level0 == -1)
-			continue;
-
-		/* remove this matching VCPU from the mask */
-		target_cpus &= ~BIT(level0);
-
-		vgic_v3_queue_sgi(c_vcpu, sgi, allow_group1);
+	for_each_set_bit(aff0, &target_cpus, hweight_long(ICC_SGI1R_TARGET_LIST_MASK)) {
+		c_vcpu = kvm_mpidr_to_vcpu(kvm, mpidr | aff0);
+		if (c_vcpu)
+			vgic_v3_queue_sgi(c_vcpu, sgi, allow_group1);
 	}
 }
 
-- 
2.34.1



* Re: [PATCH 1/5] KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff()
  2023-09-07 10:09 ` [PATCH 1/5] KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff() Marc Zyngier
@ 2023-09-07 15:28   ` Joey Gouly
  0 siblings, 0 replies; 19+ messages in thread
From: Joey Gouly @ 2023-09-07 15:28 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Xu Zhao

On Thu, Sep 07, 2023 at 11:09:27AM +0100, Marc Zyngier wrote:
> By definition, MPIDR_EL1 cannot be modified by the guest. This
> means it is pointless to check whether this is loaded on the CPU.
> 
> Simplify the kvm_vcpu_get_mpidr_aff() helper to directly access
> the in-memory value.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_emulate.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 3d6725ff0bf6..b66ef77cf49e 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -465,7 +465,7 @@ static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>  
>  static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
>  {
> -	return vcpu_read_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
> +	return __vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
>  }
>  
>  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)

Reviewed-by: Joey Gouly <joey.gouly@arm.com>


* Re: [PATCH 2/5] KVM: arm64: Build MPIDR to vcpu index cache at runtime
  2023-09-07 10:09 ` [PATCH 2/5] KVM: arm64: Build MPIDR to vcpu index cache at runtime Marc Zyngier
@ 2023-09-07 15:29   ` Joey Gouly
  2023-09-07 18:15     ` Marc Zyngier
  0 siblings, 1 reply; 19+ messages in thread
From: Joey Gouly @ 2023-09-07 15:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Xu Zhao

On Thu, Sep 07, 2023 at 11:09:28AM +0100, Marc Zyngier wrote:
> The MPIDR_EL1 register contains a unique value that identifies
> the CPU. The only problem with it is that it is stupidly large
> (32 bits, once the useless stuff is removed).
> 
> Trying to obtain a vcpu from an MPIDR value is a fairly common,
> yet costly operation: we iterate over all the vcpus until we
> find the correct one. While this is cheap for small VMs, it is
> pretty expensive on large ones, specially if you are trying to
> get to the one that's at the end of the list...
> 
> In order to help with this, it is important to realise that
> the MPIDR values are actually structured, and that implementations
> tend to use a small number of significant bits in the 32bit space.
> 
> We can use this fact to our advantage by computing a small hash
> table that uses the "compression" of the significant MPIDR bits
> as an index, giving us the vcpu index as a result.
> 
> Given that the MPIDR values can be supplied by userspace, and
> that an evil VMM could decide to make *all* bits significant,
> resulting in a 4G-entry table, we only use this method if the
> resulting table fits in a single page. Otherwise, we fallback
> to the good old iterative method.
> 
> Nothing uses that table just yet, but keep your eyes peeled.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h | 28 ++++++++++++++++
>  arch/arm64/kvm/arm.c              | 54 +++++++++++++++++++++++++++++++
>  2 files changed, 82 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index af06ccb7ee34..361aaf66ac32 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -202,6 +202,31 @@ struct kvm_protected_vm {
>  	struct kvm_hyp_memcache teardown_mc;
>  };
>  
> +struct kvm_mpidr_data {
> +	u64			mpidr_mask;
> +	DECLARE_FLEX_ARRAY(u16, cmpidr_to_idx);
> +};
> +
> +static inline u16 kvm_mpidr_index(struct kvm_mpidr_data *data, u64 mpidr)
> +{
> +	unsigned long mask = data->mpidr_mask;
> +	u64 aff = mpidr & MPIDR_HWID_BITMASK;
> +	int nbits, bit, bit_idx = 0;
> +	u16 vcpu_idx = 0;
> +
> +	/*
> +	 * If this looks like RISC-V's BEXT or x86's PEXT
> +	 * instructions, it isn't by accident.
> +	 */
> +	nbits = fls(mask);
> +	for_each_set_bit(bit, &mask, nbits) {
> +		vcpu_idx |= (aff & BIT(bit)) >> (bit - bit_idx);
> +		bit_idx++;
> +	}
> +
> +	return vcpu_idx;
> +}
> +
>  struct kvm_arch {
>  	struct kvm_s2_mmu mmu;
>  
> @@ -248,6 +273,9 @@ struct kvm_arch {
>  	/* VM-wide vCPU feature set */
>  	DECLARE_BITMAP(vcpu_features, KVM_VCPU_MAX_FEATURES);
>  
> +	/* MPIDR to vcpu index mapping, optional */
> +	struct kvm_mpidr_data *mpidr_data;
> +
>  	/*
>  	 * VM-wide PMU filter, implemented as a bitmap and big enough for
>  	 * up to 2^10 events (ARMv8.0) or 2^16 events (ARMv8.1+).
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 4866b3f7b4ea..30ce394c09d4 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -205,6 +205,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>  	if (is_protected_kvm_enabled())
>  		pkvm_destroy_hyp_vm(kvm);
>  
> +	kfree(kvm->arch.mpidr_data);
>  	kvm_destroy_vcpus(kvm);
>  
>  	kvm_unshare_hyp(kvm, kvm + 1);
> @@ -578,6 +579,57 @@ static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
>  	return vcpu_get_flag(vcpu, VCPU_INITIALIZED);
>  }
>  
> +static void kvm_init_mpidr_data(struct kvm *kvm)
> +{
> +	struct kvm_mpidr_data *data = NULL;
> +	unsigned long c, mask, nr_entries;
> +	u64 aff_set = 0, aff_clr = ~0UL;
> +	struct kvm_vcpu *vcpu;
> +
> +	mutex_lock(&kvm->arch.config_lock);
> +
> +	if (kvm->arch.mpidr_data || atomic_read(&kvm->online_vcpus) == 1)
> +		goto out;
> +
> +	kvm_for_each_vcpu(c, vcpu, kvm) {
> +		u64 aff = kvm_vcpu_get_mpidr_aff(vcpu);
> +		aff_set |= aff;
> +		aff_clr &= aff;
> +	}
> +
> +	/*
> +	 * A significant bit can be either 0 or 1, and will only appear in
> +	 * aff_set. Use aff_clr to weed out the useless stuff.
> +	 */
> +	mask = aff_set ^ aff_clr;
> +	nr_entries = BIT_ULL(hweight_long(mask));
> +
> +	/*
> +	 * Don't let userspace fool us. If we need more than a single page
> +	 * to describe the compressed MPIDR array, just fall back to the
> +	 * iterative method. Single vcpu VMs do not need this either.
> +	 */
> +	if (struct_size(data, cmpidr_to_idx, nr_entries) <= PAGE_SIZE)
> +		data = kzalloc(struct_size(data, cmpidr_to_idx, nr_entries),
> +			       GFP_KERNEL_ACCOUNT);
> +
> +	if (!data)
> +		goto out;

Probably not a big deal, but if the data doesn't fit, every vCPU will run this
function up until this point (if the data fits, or there's only 1 vCPU, we bail
out earlier).

> +
> +	data->mpidr_mask = mask;
> +
> +	kvm_for_each_vcpu(c, vcpu, kvm) {
> +		u64 aff = kvm_vcpu_get_mpidr_aff(vcpu);
> +		u16 vcpu_idx = kvm_mpidr_index(data, aff);
> +
> +		data->cmpidr_to_idx[vcpu_idx] = c;
> +	}
> +
> +	kvm->arch.mpidr_data = data;
> +out:
> +	mutex_unlock(&kvm->arch.config_lock);
> +}
> +
>  /*
>   * Handle both the initialisation that is being done when the vcpu is
>   * run for the first time, as well as the updates that must be
> @@ -601,6 +653,8 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
>  	if (likely(vcpu_has_run_once(vcpu)))
>  		return 0;
>  
> +	kvm_init_mpidr_data(kvm);
> +
>  	kvm_arm_vcpu_init_debug(vcpu);
>  
>  	if (likely(irqchip_in_kernel(kvm))) {

Reviewed-by: Joey Gouly <joey.gouly@arm.com>


* Re: [PATCH 3/5] KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available
  2023-09-07 10:09 ` [PATCH 3/5] KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available Marc Zyngier
@ 2023-09-07 15:29   ` Joey Gouly
  0 siblings, 0 replies; 19+ messages in thread
From: Joey Gouly @ 2023-09-07 15:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Xu Zhao

On Thu, Sep 07, 2023 at 11:09:29AM +0100, Marc Zyngier wrote:
> If our fancy little table is present when calling kvm_mpidr_to_vcpu(),
> use it to recover the corresponding vcpu.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/arm.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 30ce394c09d4..5b75b2db12be 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -2395,6 +2395,18 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
>  	unsigned long i;
>  
>  	mpidr &= MPIDR_HWID_BITMASK;
> +
> +	if (kvm->arch.mpidr_data) {
> +		u16 idx = kvm_mpidr_index(kvm->arch.mpidr_data, mpidr);
> +
> +		vcpu = kvm_get_vcpu(kvm,
> +				    kvm->arch.mpidr_data->cmpidr_to_idx[idx]);
> +		if (mpidr != kvm_vcpu_get_mpidr_aff(vcpu))
> +			vcpu = NULL;
> +
> +		return vcpu;
> +	}
> +
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>  		if (mpidr == kvm_vcpu_get_mpidr_aff(vcpu))
>  			return vcpu;

Reviewed-by: Joey Gouly <joey.gouly@arm.com>


* Re: [PATCH 0/5] KVM: arm64: Accelerate lookup of vcpus by MPIDR values
  2023-09-07 10:09 [PATCH 0/5] KVM: arm64: Accelerate lookup of vcpus by MPIDR values Marc Zyngier
                   ` (4 preceding siblings ...)
  2023-09-07 10:09 ` [PATCH 5/5] KVM: arm64: vgic-v3: Optimize affinity-based SGI injection Marc Zyngier
@ 2023-09-07 15:30 ` Joey Gouly
  2023-09-07 18:17   ` Marc Zyngier
  2023-09-11 15:01 ` Shameerali Kolothum Thodi
  6 siblings, 1 reply; 19+ messages in thread
From: Joey Gouly @ 2023-09-07 15:30 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Xu Zhao

On Thu, Sep 07, 2023 at 11:09:26AM +0100, Marc Zyngier wrote:
> Xu Zhao recently reported[1] that sending SGIs on large VMs was slower
> than expected, specially if targeting vcpus that have a high vcpu
> index. They root-caused it to the way we walk the vcpu xarray in the
> search of the correct MPIDR, one vcpu at a time, which is of course
> grossly inefficient.
> 
> The solution they proposed was, unfortunately, less than ideal, but I
> was "nerd snipped" into doing something about it.
> 
> The main idea is to build a small hash table of MPIDR to vcpu
> mappings, using the fact that most of the time, the MPIDR values only
> use a small number of significant bits and that we can easily compute
> a compact index from it. Once we have that, accelerating vcpu lookup
> becomes pretty cheap, and we can in turn make SGIs great again.
> 
> It must be noted that since the MPIDR values are controlled by
> userspace, it isn't always possible to allocate the hash table
> (userspace could build a 32 vcpu VM and allocate one bit of affinity
> to each of them, making all the bits significant). We thus always have
> an iterative fallback -- if it hurts, don't do that.
> 
> Performance wise, this is very significant: using the KUT micro-bench
> test with the following patch (always IPI-ing the last vcpu of the VM)
> and running it with large number of vcpus shows a large improvement
> (from 3832ns to 2593ns for a 64 vcpu VM, a 32% reduction, measured on
> an Ampere Altra). I expect that IPI-happy workloads could benefit from
> this.
> 
> Thanks,
> 
> 	M.
> 
> [1] https://lore.kernel.org/r/20230825015811.5292-1-zhaoxu.35@bytedance.com
> 
> diff --git a/arm/micro-bench.c b/arm/micro-bench.c
> index bfd181dc..f3ac3270 100644
> --- a/arm/micro-bench.c
> +++ b/arm/micro-bench.c
> @@ -88,7 +88,7 @@ static bool test_init(void)
>  
>  	irq_ready = false;
>  	gic_enable_defaults();
> -	on_cpu_async(1, gic_secondary_entry, NULL);
> +	on_cpu_async(nr_cpus - 1, gic_secondary_entry, NULL);
>  
>  	cntfrq = get_cntfrq();
>  	printf("Timer Frequency %d Hz (Output in microseconds)\n", cntfrq);
> @@ -157,7 +157,7 @@ static void ipi_exec(void)
>  
>  	irq_received = false;
>  
> -	gic_ipi_send_single(1, 1);
> +	gic_ipi_send_single(1, nr_cpus - 1);
>  
>  	while (!irq_received && tries--)
>  		cpu_relax();
> 

Got a roughly similar perf improvement (about 28%).

Tested-by: Joey Gouly <joey.gouly@arm.com>

> 
> Marc Zyngier (5):
>   KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff()
>   KVM: arm64: Build MPIDR to vcpu index cache at runtime
>   KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is
>     available
>   KVM: arm64: vgic-v3: Refactor GICv3 SGI generation
>   KVM: arm64: vgic-v3: Optimize affinity-based SGI injection
> 
>  arch/arm64/include/asm/kvm_emulate.h |   2 +-
>  arch/arm64/include/asm/kvm_host.h    |  28 ++++++
>  arch/arm64/kvm/arm.c                 |  66 +++++++++++++
>  arch/arm64/kvm/vgic/vgic-mmio-v3.c   | 142 ++++++++++-----------------
>  4 files changed, 148 insertions(+), 90 deletions(-)


* Re: [PATCH 2/5] KVM: arm64: Build MPIDR to vcpu index cache at runtime
  2023-09-07 15:29   ` Joey Gouly
@ 2023-09-07 18:15     ` Marc Zyngier
  0 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2023-09-07 18:15 UTC (permalink / raw)
  To: Joey Gouly
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Xu Zhao

On Thu, 07 Sep 2023 16:29:18 +0100,
Joey Gouly <joey.gouly@arm.com> wrote:
> 
> On Thu, Sep 07, 2023 at 11:09:28AM +0100, Marc Zyngier wrote:

[...]

> > @@ -578,6 +579,57 @@ static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
> >  	return vcpu_get_flag(vcpu, VCPU_INITIALIZED);
> >  }
> >  
> > +static void kvm_init_mpidr_data(struct kvm *kvm)
> > +{
> > +	struct kvm_mpidr_data *data = NULL;
> > +	unsigned long c, mask, nr_entries;
> > +	u64 aff_set = 0, aff_clr = ~0UL;
> > +	struct kvm_vcpu *vcpu;
> > +
> > +	mutex_lock(&kvm->arch.config_lock);
> > +
> > +	if (kvm->arch.mpidr_data || atomic_read(&kvm->online_vcpus) == 1)
> > +		goto out;
> > +
> > +	kvm_for_each_vcpu(c, vcpu, kvm) {
> > +		u64 aff = kvm_vcpu_get_mpidr_aff(vcpu);
> > +		aff_set |= aff;
> > +		aff_clr &= aff;
> > +	}
> > +
> > +	/*
> > +	 * A significant bit can be either 0 or 1, and will only appear in
> > +	 * aff_set. Use aff_clr to weed out the useless stuff.
> > +	 */
> > +	mask = aff_set ^ aff_clr;
> > +	nr_entries = BIT_ULL(hweight_long(mask));
> > +
> > +	/*
> > +	 * Don't let userspace fool us. If we need more than a single page
> > +	 * to describe the compressed MPIDR array, just fall back to the
> > +	 * iterative method. Single vcpu VMs do not need this either.
> > +	 */
> > +	if (struct_size(data, cmpidr_to_idx, nr_entries) <= PAGE_SIZE)
> > +		data = kzalloc(struct_size(data, cmpidr_to_idx, nr_entries),
> > +			       GFP_KERNEL_ACCOUNT);
> > +
> > +	if (!data)
> > +		goto out;
> 
> Probably not a big deal, but if the data doesn't fit, every vCPU will run this
> function up until this point (if the data fits or there's only 1 vCPU we bail
> out earlier)

Yeah, I thought about that when writing this code, and applied the
following reasoning:

- this code is only run once per vcpu

- being able to remember that we cannot allocate the hash table
  requires at least an extra flag or a special value for the pointer

- this sequence is pretty quick (one read/or/and * nr_vcpu^2), and
  even if you have 512 vcpus, it isn't *that* much stuff given that it
  is spread across vcpus

Now, if someone can actually measure a significant boot-time speed-up,
I'll happily add that flag.

[...]

> Reviewed-by: Joey Gouly <joey.gouly@arm.com>

Thanks!

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 0/5] KVM: arm64: Accelerate lookup of vcpus by MPIDR values
  2023-09-07 15:30 ` [PATCH 0/5] KVM: arm64: Accelerate lookup of vcpus by MPIDR values Joey Gouly
@ 2023-09-07 18:17   ` Marc Zyngier
  2023-09-07 20:27     ` Joey Gouly
  0 siblings, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2023-09-07 18:17 UTC (permalink / raw)
  To: Joey Gouly
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Xu Zhao

On Thu, 07 Sep 2023 16:30:52 +0100,
Joey Gouly <joey.gouly@arm.com> wrote:

[...]

> Got a roughly similar perf improvement (about 28%).

Out of curiosity, on which HW?

> Tested-by: Joey Gouly <joey.gouly@arm.com>

Thanks!

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 0/5] KVM: arm64: Accelerate lookup of vcpus by MPIDR values
  2023-09-07 18:17   ` Marc Zyngier
@ 2023-09-07 20:27     ` Joey Gouly
  2023-09-08  7:21       ` Marc Zyngier
  0 siblings, 1 reply; 19+ messages in thread
From: Joey Gouly @ 2023-09-07 20:27 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Xu Zhao

On Thu, Sep 07, 2023 at 07:17:03PM +0100, Marc Zyngier wrote:
> On Thu, 07 Sep 2023 16:30:52 +0100,
> Joey Gouly <joey.gouly@arm.com> wrote:
> 
> [...]
> 
> > Got a roughly similar perf improvement (about 28%).
> 
> Out of curiosity, on which HW?

I used QEMU emulation, but was told after posting this that QEMU is probably
not suitable for perf measurements. Sorry about that. I can retry on Juno
tomorrow if that's of any use.

> 
> > Tested-by: Joey Gouly <joey.gouly@arm.com>
> 
> Thanks!
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
> 


* Re: [PATCH 0/5] KVM: arm64: Accelerate lookup of vcpus by MPIDR values
  2023-09-07 20:27     ` Joey Gouly
@ 2023-09-08  7:21       ` Marc Zyngier
  0 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2023-09-08  7:21 UTC (permalink / raw)
  To: Joey Gouly
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Xu Zhao

On Thu, 07 Sep 2023 21:27:12 +0100,
Joey Gouly <joey.gouly@arm.com> wrote:
> 
> On Thu, Sep 07, 2023 at 07:17:03PM +0100, Marc Zyngier wrote:
> > On Thu, 07 Sep 2023 16:30:52 +0100,
> > Joey Gouly <joey.gouly@arm.com> wrote:
> > 
> > [...]
> > 
> > > Got a roughly similar perf improvement (about 28%).
> > 
> > Out of curiosity, on which HW?
> 
> I used QEMU emulation, but was told after posting this that QEMU is probably
> not good to be used for perf measurments? Sorry about that. I can retry on Juno
> tomorrow if that's of any use.

Indeed, using models for performance measurement is pretty unreliable,
and will give you a rather different profile from the HW.

Unfortunately, Juno has only a GICv2, which won't see any significant
gain. You'll need something with a GICv3, as the whole interrupt
routing (SGI and SPI) is MPIDR based.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 4/5] KVM: arm64: vgic-v3: Refactor GICv3 SGI generation
  2023-09-07 10:09 ` [PATCH 4/5] KVM: arm64: vgic-v3: Refactor GICv3 SGI generation Marc Zyngier
@ 2023-09-10 16:25   ` Zenghui Yu
  2023-09-10 18:18     ` Marc Zyngier
  0 siblings, 1 reply; 19+ messages in thread
From: Zenghui Yu @ 2023-09-10 16:25 UTC (permalink / raw)
  To: Marc Zyngier, kvmarm, linux-arm-kernel, kvm
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu, Xu Zhao

Hi Marc,

On 2023/9/7 18:09, Marc Zyngier wrote:
> As we're about to change the way SGIs are sent, start by splitting
> out some of the basic functionnality: instead of intermingling

functionality

> the broadcast and non-broadcast cases with the actual SGI generation,
> perform the following cleanups:
> 
> - move the SGI queuing into its own helper
> - split the broadcast code from the affinity-driven code
> - replace the mask/shift combinations with FIELD_GET()
> 
> The result is much more readable, and paves the way for further
> optimisations.

Indeed!

> @@ -1070,19 +1102,30 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg, bool allow_group1)
>  {
>  	struct kvm *kvm = vcpu->kvm;
>  	struct kvm_vcpu *c_vcpu;
> -	u16 target_cpus;
> +	unsigned long target_cpus;
>  	u64 mpidr;
> -	int sgi;
> -	int vcpu_id = vcpu->vcpu_id;
> -	bool broadcast;
> -	unsigned long c, flags;
> -
> -	sgi = (reg & ICC_SGI1R_SGI_ID_MASK) >> ICC_SGI1R_SGI_ID_SHIFT;
> -	broadcast = reg & BIT_ULL(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
> -	target_cpus = (reg & ICC_SGI1R_TARGET_LIST_MASK) >> ICC_SGI1R_TARGET_LIST_SHIFT;
> +	u32 sgi;
> +	unsigned long c;
> +
> +	sgi = FIELD_GET(ICC_SGI1R_SGI_ID_MASK, reg);
> +
> +	/* Broadcast */
> +	if (unlikely(reg & BIT_ULL(ICC_SGI1R_IRQ_ROUTING_MODE_BIT))) {
> +		kvm_for_each_vcpu(c, c_vcpu, kvm) {
> +			/* Don't signal the calling VCPU */
> +			if (c_vcpu == vcpu)
> +				continue;
> +
> +			vgic_v3_queue_sgi(c_vcpu, sgi, allow_group1);
> +		}
> +
> +		return;
> +	}
> +
>  	mpidr = SGI_AFFINITY_LEVEL(reg, 3);
>  	mpidr |= SGI_AFFINITY_LEVEL(reg, 2);
>  	mpidr |= SGI_AFFINITY_LEVEL(reg, 1);
> +	target_cpus = FIELD_GET(ICC_SGI1R_TARGET_LIST_MASK, reg);
>  
>  	/*
>  	 * We iterate over all VCPUs to find the MPIDRs matching the request.
> @@ -1091,54 +1134,19 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg, bool allow_group1)
>  	 * VCPUs when most of the times we just signal a single VCPU.
>  	 */
>  	kvm_for_each_vcpu(c, c_vcpu, kvm) {
> -		struct vgic_irq *irq;
> +		int level0;
>  
>  		/* Exit early if we have dealt with all requested CPUs */
> -		if (!broadcast && target_cpus == 0)
> +		if (target_cpus == 0)
>  			break;
> -
> -		/* Don't signal the calling VCPU */
> -		if (broadcast && c == vcpu_id)

Unrelated to this patch, but it looks like we were comparing the value
of *vcpu_idx* against vcpu_id to skip the calling VCPU. Is there a rule
in KVM that userspace should invoke KVM_CREATE_VCPU with sequential
"vcpu id"s, starting at 0, so that the user-provided vcpu_id always
equals the KVM-internal vcpu_idx for a given VCPU?

I asked because it seems that in kvm/arm64 we always use
kvm_get_vcpu(kvm, i) to obtain the kvm_vcpu pointer, even if *i* is
sometimes essentially provided by userspace...
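
For context, and assuming I read the generic code correctly, the two
values come from the vcpu creation path, roughly (a simplified sketch
of kvm_vm_ioctl_create_vcpu() in virt/kvm/kvm_main.c):

	/* @id is the argument userspace passed to KVM_CREATE_VCPU */
	vcpu->vcpu_id  = id;                              /* userspace-chosen */
	vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus); /* creation order   */

so the two only coincide when userspace creates its vcpus with
sequential ids starting from 0.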

Besides, the refactor itself looks good to me.

Thanks,
Zenghui


* Re: [PATCH 4/5] KVM: arm64: vgic-v3: Refactor GICv3 SGI generation
  2023-09-10 16:25   ` Zenghui Yu
@ 2023-09-10 18:18     ` Marc Zyngier
  2023-09-11 15:57       ` Zenghui Yu
  0 siblings, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2023-09-10 18:18 UTC (permalink / raw)
  To: Zenghui Yu
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Xu Zhao

On Sun, 10 Sep 2023 17:25:36 +0100,
Zenghui Yu <zenghui.yu@linux.dev> wrote:
> 
> Hi Marc,
> 
> On 2023/9/7 18:09, Marc Zyngier wrote:
> > As we're about to change the way SGIs are sent, start by splitting
> > out some of the basic functionnality: instead of intermingling
> 
> functionality
> 
> > the broadcast and non-broadcast cases with the actual SGI generation,
> > perform the following cleanups:
> > 
> > - move the SGI queuing into its own helper
> > - split the broadcast code from the affinity-driven code
> > - replace the mask/shift combinations with FIELD_GET()
> > 
> > The result is much more readable, and paves the way for further
> > optimisations.
> 
> Indeed!
> 
> > @@ -1070,19 +1102,30 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg, bool allow_group1)
> >  {
> >  	struct kvm *kvm = vcpu->kvm;
> >  	struct kvm_vcpu *c_vcpu;
> > -	u16 target_cpus;
> > +	unsigned long target_cpus;
> >  	u64 mpidr;
> > -	int sgi;
> > -	int vcpu_id = vcpu->vcpu_id;
> > -	bool broadcast;
> > -	unsigned long c, flags;
> > -
> > -	sgi = (reg & ICC_SGI1R_SGI_ID_MASK) >> ICC_SGI1R_SGI_ID_SHIFT;
> > -	broadcast = reg & BIT_ULL(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
> > -	target_cpus = (reg & ICC_SGI1R_TARGET_LIST_MASK) >> ICC_SGI1R_TARGET_LIST_SHIFT;
> > +	u32 sgi;
> > +	unsigned long c;
> > +
> > +	sgi = FIELD_GET(ICC_SGI1R_SGI_ID_MASK, reg);
> > +
> > +	/* Broadcast */
> > +	if (unlikely(reg & BIT_ULL(ICC_SGI1R_IRQ_ROUTING_MODE_BIT))) {
> > +		kvm_for_each_vcpu(c, c_vcpu, kvm) {
> > +			/* Don't signal the calling VCPU */
> > +			if (c_vcpu == vcpu)
> > +				continue;
> > +
> > +			vgic_v3_queue_sgi(c_vcpu, sgi, allow_group1);
> > +		}
> > +
> > +		return;
> > +	}
> > +
> >  	mpidr = SGI_AFFINITY_LEVEL(reg, 3);
> >  	mpidr |= SGI_AFFINITY_LEVEL(reg, 2);
> >  	mpidr |= SGI_AFFINITY_LEVEL(reg, 1);
> > +	target_cpus = FIELD_GET(ICC_SGI1R_TARGET_LIST_MASK, reg);
> >   	/*
> >  	 * We iterate over all VCPUs to find the MPIDRs matching the request.
> > @@ -1091,54 +1134,19 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg, bool allow_group1)
> >  	 * VCPUs when most of the times we just signal a single VCPU.
> >  	 */
> >  	kvm_for_each_vcpu(c, c_vcpu, kvm) {
> > -		struct vgic_irq *irq;
> > +		int level0;
> >   		/* Exit early if we have dealt with all requested CPUs
> > */
> > -		if (!broadcast && target_cpus == 0)
> > +		if (target_cpus == 0)
> >  			break;
> > -
> > -		/* Don't signal the calling VCPU */
> > -		if (broadcast && c == vcpu_id)
> 
> Unrelated to this patch, but it looks that we were comparing the value
> of *vcpu_idx* and vcpu_id to skip the calling VCPU.

Huh, well caught. That was definitely a bug that was there for ever,
and only you spotted it. Guess I should flag it as a stable candidate.

> Is there a rule in KVM that userspace should invoke KVM_CREATE_VCPU
> with sequential "vcpu id"s, starting at 0, so that the user-provided
> vcpu_id always equals to the KVM-internal vcpu_idx for a given VCPU?

I don't think there is any such rule. As far as I can tell, any number
will do as long as it is within the range [0, max_vcpu_id). Of course,
max_vcpu_id doesn't even exist on arm64. From what I can tell, this is
just some random number between 0 and 511 for us (GICv2
notwithstanding).

> I asked because it seems that in kvm/arm64 we always use
> kvm_get_vcpu(kvm, i) to obtain the kvm_vcpu pointer, even if *i* is
> sometimes essentially provided by userspace..

Huh, this is incredibly dodgy. I had a go at a few occurrences (see
below), but this is hardly a complete list.

> Besides, the refactor itself looks good to me.

Cool, thanks!

	M.

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 6dcdae4d38cb..e32c867e7b48 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -458,7 +458,7 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
 				   timer_ctx->irq.level);
 
 	if (!userspace_irqchip(vcpu->kvm)) {
-		ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
+		ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_idx,
 					  timer_irq(timer_ctx),
 					  timer_ctx->irq.level,
 					  timer_ctx);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a3b13281d38a..1f7b074b81df 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -439,9 +439,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	 * We might get preempted before the vCPU actually runs, but
 	 * over-invalidation doesn't affect correctness.
 	 */
-	if (*last_ran != vcpu->vcpu_id) {
+	if (*last_ran != vcpu->vcpu_idx) {
 		kvm_call_hyp(__kvm_flush_cpu_context, mmu);
-		*last_ran = vcpu->vcpu_id;
+		*last_ran = vcpu->vcpu_idx;
 	}
 
 	vcpu->cpu = cpu;
@@ -1207,7 +1207,7 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
 		if (vcpu_idx >= nrcpus)
 			return -EINVAL;
 
-		vcpu = kvm_get_vcpu(kvm, vcpu_idx);
+		vcpu = kvm_get_vcpu_by_id(kvm, vcpu_idx);
 		if (!vcpu)
 			return -EINVAL;
 
@@ -1222,14 +1222,14 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
 		if (vcpu_idx >= nrcpus)
 			return -EINVAL;
 
-		vcpu = kvm_get_vcpu(kvm, vcpu_idx);
+		vcpu = kvm_get_vcpu_by_id(kvm, vcpu_idx);
 		if (!vcpu)
 			return -EINVAL;
 
 		if (irq_num < VGIC_NR_SGIS || irq_num >= VGIC_NR_PRIVATE_IRQS)
 			return -EINVAL;
 
-		return kvm_vgic_inject_irq(kvm, vcpu->vcpu_id, irq_num, level, NULL);
+		return kvm_vgic_inject_irq(kvm, vcpu->vcpu_idx, irq_num, level, NULL);
 	case KVM_ARM_IRQ_TYPE_SPI:
 		if (!irqchip_in_kernel(kvm))
 			return -ENXIO;
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 6b066e04dc5d..4448940b6d79 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -348,7 +348,7 @@ static void kvm_pmu_update_state(struct kvm_vcpu *vcpu)
 	pmu->irq_level = overflow;
 
 	if (likely(irqchip_in_kernel(vcpu->kvm))) {
-		int ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
+		int ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_idx,
 					      pmu->irq_num, overflow, pmu);
 		WARN_ON(ret);
 	}
diff --git a/arch/arm64/kvm/vgic/vgic-debug.c b/arch/arm64/kvm/vgic/vgic-debug.c
index 07aa0437125a..85606a531dc3 100644
--- a/arch/arm64/kvm/vgic/vgic-debug.c
+++ b/arch/arm64/kvm/vgic/vgic-debug.c
@@ -166,7 +166,7 @@ static void print_header(struct seq_file *s, struct vgic_irq *irq,
 
 	if (vcpu) {
 		hdr = "VCPU";
-		id = vcpu->vcpu_id;
+		id = vcpu->vcpu_idx;
 	}
 
 	seq_printf(s, "\n");
@@ -212,7 +212,7 @@ static void print_irq_state(struct seq_file *s, struct vgic_irq *irq,
 		      "     %2d "
 		      "\n",
 			type, irq->intid,
-			(irq->target_vcpu) ? irq->target_vcpu->vcpu_id : -1,
+			(irq->target_vcpu) ? irq->target_vcpu->vcpu_idx : -1,
 			pending,
 			irq->line_level,
 			irq->active,
@@ -224,7 +224,7 @@ static void print_irq_state(struct seq_file *s, struct vgic_irq *irq,
 			irq->mpidr,
 			irq->source,
 			irq->priority,
-			(irq->vcpu) ? irq->vcpu->vcpu_id : -1);
+			(irq->vcpu) ? irq->vcpu->vcpu_idx : -1);
 }
 
 static int vgic_debug_show(struct seq_file *s, void *v)
diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index 212b73a715c1..82b264ad68c4 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -345,7 +345,7 @@ int vgic_v2_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
 	if (cpuid >= atomic_read(&dev->kvm->online_vcpus))
 		return -EINVAL;
 
-	reg_attr->vcpu = kvm_get_vcpu(dev->kvm, cpuid);
+	reg_attr->vcpu = kvm_get_vcpu_by_id(dev->kvm, cpuid);
 	reg_attr->addr = attr->attr & KVM_DEV_ARM_VGIC_OFFSET_MASK;
 
 	return 0;

-- 
Without deviation from the norm, progress is not possible.


* RE: [PATCH 0/5] KVM: arm64: Accelerate lookup of vcpus by MPIDR values
  2023-09-07 10:09 [PATCH 0/5] KVM: arm64: Accelerate lookup of vcpus by MPIDR values Marc Zyngier
                   ` (5 preceding siblings ...)
  2023-09-07 15:30 ` [PATCH 0/5] KVM: arm64: Accelerate lookup of vcpus by MPIDR values Joey Gouly
@ 2023-09-11 15:01 ` Shameerali Kolothum Thodi
  6 siblings, 0 replies; 19+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-09-11 15:01 UTC (permalink / raw)
  To: Marc Zyngier, kvmarm@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, yuzenghui, Xu Zhao



> -----Original Message-----
> From: Marc Zyngier [mailto:maz@kernel.org]
> Sent: 07 September 2023 11:09
> To: kvmarm@lists.linux.dev; linux-arm-kernel@lists.infradead.org;
> kvm@vger.kernel.org
> Cc: James Morse <james.morse@arm.com>; Suzuki K Poulose
> <suzuki.poulose@arm.com>; Oliver Upton <oliver.upton@linux.dev>;
> yuzenghui <yuzenghui@huawei.com>; Xu Zhao <zhaoxu.35@bytedance.com>
> Subject: [PATCH 0/5] KVM: arm64: Accelerate lookup of vcpus by MPIDR
> values
> 
> Xu Zhao recently reported[1] that sending SGIs on large VMs was slower
> than expected, specially if targeting vcpus that have a high vcpu
> index. They root-caused it to the way we walk the vcpu xarray in the
> search of the correct MPIDR, one vcpu at a time, which is of course
> grossly inefficient.
> 
> The solution they proposed was, unfortunately, less than ideal, but I
> was "nerd snipped" into doing something about it.
> 
> The main idea is to build a small hash table of MPIDR to vcpu
> mappings, using the fact that most of the time, the MPIDR values only
> use a small number of significant bits and that we can easily compute
> a compact index from it. Once we have that, accelerating vcpu lookup
> becomes pretty cheap, and we can in turn make SGIs great again.
> 
> It must be noted that since the MPIDR values are controlled by
> userspace, it isn't always possible to allocate the hash table
> (userspace could build a 32 vcpu VM and allocate one bit of affinity
> to each of them, making all the bits significant). We thus always have
> an iterative fallback -- if it hurts, don't do that.
> 
> Performance wise, this is very significant: using the KUT micro-bench
> test with the following patch (always IPI-ing the last vcpu of the VM)
> and running it with large number of vcpus shows a large improvement
> (from 3832ns to 2593ns for a 64 vcpu VM, a 32% reduction, measured on
> an Ampere Altra). I expect that IPI-happy workloads could benefit from
> this.

Hi Marc,

Tested on a HiSilicon D06 test board using KUT micro-bench (+ the
changes) with a 64 vCPU VM. From an avg. of 5 runs, observed around
54% improvement for IPI (from 5309ns to 2413ns).

FWIW,
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

Thanks,
Shameer




* Re: [PATCH 4/5] KVM: arm64: vgic-v3: Refactor GICv3 SGI generation
  2023-09-10 18:18     ` Marc Zyngier
@ 2023-09-11 15:57       ` Zenghui Yu
  2023-09-12 13:07         ` Marc Zyngier
  0 siblings, 1 reply; 19+ messages in thread
From: Zenghui Yu @ 2023-09-11 15:57 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Xu Zhao

On 2023/9/11 02:18, Marc Zyngier wrote:
> On Sun, 10 Sep 2023 17:25:36 +0100,
> Zenghui Yu <zenghui.yu@linux.dev> wrote:
>>
>> Hi Marc,
>>
>> I asked because it seems that in kvm/arm64 we always use
>> kvm_get_vcpu(kvm, i) to obtain the kvm_vcpu pointer, even if *i* is
>> sometimes essentially provided by userspace..
> 
> Huh, this is incredibly dodgy. I had a go at a few occurrences (see
> below), but this is hardly a complete list.

Another case is all the kvm_get_vcpu(kvm, target_addr) calls in the
vgic-its emulation code. We expose GITS_TYPER.PTA=0 to the guest,
which indicates that the target address corresponds to the PE number
specified by GICR_TYPER.Processor_Number, which is in turn encoded as
vcpu->vcpu_id.

Thanks,
Zenghui


* Re: [PATCH 4/5] KVM: arm64: vgic-v3: Refactor GICv3 SGI generation
  2023-09-11 15:57       ` Zenghui Yu
@ 2023-09-12 13:07         ` Marc Zyngier
  0 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2023-09-12 13:07 UTC (permalink / raw)
  To: Zenghui Yu
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Xu Zhao

On Mon, 11 Sep 2023 16:57:39 +0100,
Zenghui Yu <zenghui.yu@linux.dev> wrote:
> 
> On 2023/9/11 02:18, Marc Zyngier wrote:
> > On Sun, 10 Sep 2023 17:25:36 +0100,
> > Zenghui Yu <zenghui.yu@linux.dev> wrote:
> >> 
> >> Hi Marc,
> >> 
> >> I asked because it seems that in kvm/arm64 we always use
> >> kvm_get_vcpu(kvm, i) to obtain the kvm_vcpu pointer, even if *i* is
> >> sometimes essentially provided by userspace..
> > 
> > Huh, this is incredibly dodgy. I had a go at a few occurrences (see
> > below), but this is hardly a complete list.
> 
> Another case is all kvm_get_vcpu(kvm, target_addr) in the vgic-its
> emulation code. As we expose GITS_TYPER.PTA=0 to guest, which indicates
> that the target address corresponds to the PE number specified by
> GICR_TYPER.Processor_Number, which is now encoded as vcpu->vcpu_id.

Yup, that's indeed missing. I'm going to hack kvmtool to generate
stupid vcpu_ids and see what explodes...

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

