[PATCH v4 0/3] LoongArch: KVM: Add PV TLB flush support


* [PATCH v4 0/3] LoongArch: KVM: Add PV TLB flush support
@ 2026-06-15  8:21 Tao Cui
  2026-06-15  8:21 ` [PATCH v4 1/3] LoongArch: KVM: Add PV TLB flush support via steal-time shared memory Tao Cui
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Tao Cui @ 2026-06-15  8:21 UTC (permalink / raw)
  To: maobibo, zhaotianrui, chenhuacai, loongarch; +Cc: kernel, kvm, Tao Cui

From: Tao Cui <cuitao@kylinos.cn>

This series implements paravirtualized TLB flush for LoongArch KVM
guests.

In multi-vCPU KVM guests, remote TLB flushes send IPIs to all target
vCPUs, including those preempted by the host. IPIs to preempted vCPUs
cause unnecessary VM exits, and the cost grows with vCPU count,
becoming severe in overcommitted deployments.

Reuse the existing steal-time shared memory page by adding a new
KVM_VCPU_FLUSH_TLB flag to the preempted byte. On the guest side,
skip IPIs to preempted vCPUs and set the flag via cmpxchg instead.
On the host side, when re-entering a vCPU in kvm_update_stolen_time(),
check and clear the flag; if set, drop the VPID to trigger a full TLB
flush on the next VM entry. No new shared memory page, hypercall, or
Kconfig option is needed.

The feature is advertised through the existing KVM PV feature
negotiation: the kernel exposes KVM_LOONGARCH_VM_FEAT_PV_TLB_FLUSH as
a VM-level capability, and userspace (QEMU) sets
KVM_FEATURE_PV_TLB_FLUSH in the guest's CPUCFG_KVM_FEATURE mask after
probing it. A corresponding QEMU patch ("target/loongarch: Enable PV
TLB flush advertisement to the guest") is posted alongside this
series.

Notes on this revision:
- Host side: only trace a PV TLB flush request when one is observed.
- (Carried over) Host uses amand_db.w to atomically read and clear the
  preempted byte; selftest gained input validation and failure cleanup.

Testing: the PV TLB flush path itself is unchanged from v3, so the
benchmark numbers below still hold.  Note that, unlike v3, the feature
must now be enabled by userspace: run a QEMU built with the companion
patch (or with -cpu kvm-pv-tlb-flush=on) so the guest actually observes
KVM_FEATURE_PV_TLB_FLUSH.  Boot a 32-vCPU guest and run the selftest
inside it with sleep-idle (PV helps) and busy-spin (PV cannot optimize)
modes respectively:

  qemu-system-loongarch64 \
    -m 4G -smp 32 --cpu la464 --machine virt \
    -bios .../QEMU_EFI.fd \
    -kernel .../vmlinuz-...-pvtlb-v4+ \
    -initrd /tmp/ramdisk_test.gz \
    -serial mon:stdio \
    -netdev tap,id=net0,ifname=tap0,script=no,downscript=no \
    -device virtio-net-pci,netdev=net0 \
    -append "root=/dev/ram rdinit=/sbin/init console=ttyS0,115200" -nographic

  # PV TLB flush enabled (idle threads sleep, vCPUs get preempted)
  guest# ./pv_tlb_flush_test 1 31 50000 0

  # Baseline (idle threads busy-spin, all vCPUs stay active)
  guest# ./pv_tlb_flush_test 1 31 50000 1

  Sleep idle (PV TLB flush skips IPIs):      ~152,285 ns/flush
  Busy-spin baseline (PV cannot optimize):   ~481,045 ns/flush

  Improvement: ~68% latency reduction (~3.2x throughput increase)
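
The two summary figures follow directly from the per-flush latencies
above. A minimal sketch of the arithmetic, with hypothetical helper names
that are not part of the series:

```c
/* Fractional latency reduction when going from base_ns to pv_ns per flush. */
static double latency_reduction(double base_ns, double pv_ns)
{
	return 1.0 - pv_ns / base_ns;
}

/* Throughput ratio: flushes per second scale as 1 / latency. */
static double throughput_gain(double base_ns, double pv_ns)
{
	return base_ns / pv_ns;
}
```

With base_ns = 481045 and pv_ns = 152285 this gives a reduction of about
0.683 and a throughput ratio of about 3.16, which the cover letter rounds
to ~68% and ~3.2x.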

Tao Cui (3):
  LoongArch: KVM: Add PV TLB flush support via steal-time shared memory
  LoongArch: KVM: Implement guest-side PV TLB flush
  KVM: selftests: loongarch: Add PV TLB flush performance test

 arch/loongarch/include/asm/kvm_host.h              |   1 +
 arch/loongarch/include/asm/kvm_para.h              |   9 +
 arch/loongarch/include/asm/paravirt.h              |  21 +++
 arch/loongarch/include/uapi/asm/kvm.h              |   1 +
 arch/loongarch/include/uapi/asm/kvm_para.h         |   1 +
 arch/loongarch/kernel/paravirt.c                   |  60 ++++++
 arch/loongarch/kernel/smp.c                        |  30 +++-
 arch/loongarch/kvm/trace.h                         |  15 ++
 arch/loongarch/kvm/vcpu.c                          |  34 +++-
 arch/loongarch/kvm/vm.c                            |   3 +
 .../selftests/kvm/loongarch/pv_tlb_flush_test.c    | 194 +++++++++++++++++++++
 11 files changed, 362 insertions(+), 7 deletions(-)

---
Changes in v4:
- Drop the "Preserve auto-enabled PV features on userspace override"
  patch: forcing auto-enabled features back on is migration-unsafe, and
  for PV TLB flush (which cannot degrade gracefully) a missed flush
  would corrupt the guest.  Enablement now follows the usual KVM model
  -- the kernel advertises KVM_LOONGARCH_VM_FEAT_PV_TLB_FLUSH and
  userspace (QEMU) explicitly sets KVM_FEATURE_PV_TLB_FLUSH after
  probing it; a companion QEMU patch is posted alongside, and the
  feature can be kept off for an older destination
  (-cpu kvm-pv-tlb-flush=off).

Changes in v3:
- Host side: replace amswap_db.w with amand_db.w to atomically read
  and clear only the preempted byte, preserving the pad bytes for
  future UAPI use.  Issue a normal load (unsafe_get_user) before the
  atomic amand_db.w to avoid operating on stale cache data.
- Host side: move the pv_auto_features OR operation before the
  compatibility check in kvm_loongarch_cpucfg_set_attr() so that
  userspace does not need updating for pure kernel-internal PV
  feature additions.
- Selftest: add input validation, error checking on pthread_create,
  and cleanup handling on failure.

Changes in v2:
- Host side: replace non-atomic unsafe_get_user + unsafe_put_user with
  atomic amswap_db.w inline assembly. This fixes two issues:
  1) unsafe_put_user failure could skip the TLB flush entirely
  2) non-atomic read+write race with guest-side try_cmpxchg could
     cause FLUSH_TLB requests to be lost
- Guest side: consolidate two separate READ_ONCE calls into a single
  READ_ONCE to eliminate a TOCTOU race where the host could clear
  preempted between the two reads. Also switch from byte-sized
  try_cmpxchg to 32-bit try_cmpxchg on the aligned word containing
  preempted.
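
The lost-request race fixed in v2 can be replayed deterministically in
user space. The sketch below is an illustration only: it uses C11 atomics
as stand-ins for the kernel primitives (atomic_exchange for amswap_db.w,
atomic_compare_exchange_strong for try_cmpxchg), and every function name
in it is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_VCPU_PREEMPTED	(1u << 0)
#define KVM_VCPU_FLUSH_TLB	(1u << 1)

enum outcome { FLUSH_SEEN_BY_HOST, GUEST_SENDS_IPI, FLUSH_LOST };

/* Guest step: set FLUSH_TLB only if the word still equals the snapshot. */
static bool guest_cas(_Atomic uint32_t *w, uint32_t snapshot)
{
	return atomic_compare_exchange_strong(w, &snapshot,
					      snapshot | KVM_VCPU_FLUSH_TLB);
}

/* A deferred flush that the host never observed is a lost flush. */
static enum outcome classify(bool deferred, uint32_t host_saw)
{
	if (!deferred)
		return GUEST_SENDS_IPI;
	return (host_saw & KVM_VCPU_FLUSH_TLB) ? FLUSH_SEEN_BY_HOST : FLUSH_LOST;
}

/* v1-style clear: separate read and write, guest CAS lands in between. */
static enum outcome replay_split_clear(void)
{
	_Atomic uint32_t w = KVM_VCPU_PREEMPTED;
	uint32_t snapshot = atomic_load(&w);	/* guest reads PREEMPTED */
	uint32_t host_saw = atomic_load(&w);	/* host read: no flush flag yet */
	bool deferred = guest_cas(&w, snapshot);	/* guest succeeds, skips IPI */

	atomic_store(&w, 0);			/* host write wipes the request */
	return classify(deferred, host_saw);
}

/* Atomic clear: the guest CAS is ordered wholly before or after it. */
static enum outcome replay_atomic_clear(bool guest_first)
{
	_Atomic uint32_t w = KVM_VCPU_PREEMPTED;
	uint32_t snapshot = atomic_load(&w);
	uint32_t host_saw;
	bool deferred;

	if (guest_first) {
		deferred = guest_cas(&w, snapshot);
		host_saw = atomic_exchange(&w, 0);
	} else {
		host_saw = atomic_exchange(&w, 0);
		deferred = guest_cas(&w, snapshot);	/* stale snapshot: fails */
	}
	return classify(deferred, host_saw);
}
```

In this model the split read/write yields FLUSH_LOST, while the atomic
clear yields either FLUSH_SEEN_BY_HOST or GUEST_SENDS_IPI depending on
ordering, so the request is never silently dropped.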

-- 
2.43.0



* [PATCH v4 1/3] LoongArch: KVM: Add PV TLB flush support via steal-time shared memory
  2026-06-15  8:21 [PATCH v4 0/3] LoongArch: KVM: Add PV TLB flush support Tao Cui
@ 2026-06-15  8:21 ` Tao Cui
  2026-06-15  8:35   ` sashiko-bot
  2026-06-15  8:21 ` [PATCH v4 2/3] LoongArch: KVM: Implement guest-side PV TLB flush Tao Cui
  2026-06-15  8:21 ` [PATCH v4 3/3] KVM: selftests: loongarch: Add PV TLB flush performance test Tao Cui
  2 siblings, 1 reply; 7+ messages in thread
From: Tao Cui @ 2026-06-15  8:21 UTC (permalink / raw)
  To: maobibo, zhaotianrui, chenhuacai, loongarch; +Cc: kernel, kvm, Tao Cui

From: Tao Cui <cuitao@kylinos.cn>

Implement paravirtualized TLB flush for LoongArch KVM guests using the
existing steal-time shared memory page.

The mechanism uses the preempted byte in struct kvm_steal_time with an
additional KVM_VCPU_FLUSH_TLB flag bit:

- When a guest vCPU needs remote TLB flush but the target vCPU is
  preempted (not running), it atomically sets KVM_VCPU_FLUSH_TLB in
  the target's steal-time preempted byte instead of sending an IPI.
- When the host re-enters the target vCPU (kvm_update_stolen_time()),
  it atomically reads and clears only the preempted byte via amand_db.w
  with mask ~0xff on the 32-bit aligned word.  amand_db.w provides
  full barrier semantics and avoids the race with guest-side
  try_cmpxchg().  LoongArch is little-endian, so the preempted byte
  occupies bits [7:0]; this preserves pad[0..2] for future UAPI
  extension.  If KVM_VCPU_FLUSH_TLB was set, the host drops the vCPU's
  VPID, which triggers a full TLB flush on the next VM entry via
  kvm_check_vpid().
- For non-preempted vCPUs, the guest falls back to the normal IPI-based
  flush; only the IPIs to preempted vCPUs, and the VM exits they would
  cause, are skipped.
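
As a rough user-space model of the host-side step (not the kernel code
itself), the byte-preserving clear can be sketched with a C11
atomic_fetch_and standing in for amand_db.w. The function name and the
plain uint32_t word are assumptions made for illustration; the layout
comment relies on the little-endian placement described above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_VCPU_PREEMPTED	(1u << 0)
#define KVM_VCPU_FLUSH_TLB	(1u << 1)

/*
 * Hypothetical stand-in for the aligned 32-bit word that holds the
 * preempted byte in bits [7:0] and pad[0..2] in bits [31:8].
 * Returns true if a deferred flush had been requested.
 */
static bool host_clear_preempted(_Atomic uint32_t *word)
{
	/* Clear only the low byte; the return value is the pre-AND word. */
	uint32_t old = atomic_fetch_and(word, ~0xffu);

	return (uint8_t)old & KVM_VCPU_FLUSH_TLB;
}
```

Because the mask keeps bits [31:8], whatever the pad bytes contain
survives the clear, which is the property the v3 switch from amswap_db.w
to amand_db.w was after.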

Issue a normal load (unsafe_get_user) before the atomic amand_db.w to
avoid operating on stale cache data when the cache line was last
modified by a different core.

This significantly reduces TLB flush overhead in multi-vCPU workloads
where target vCPUs are often idle/preempted.

Signed-off-by: Tao Cui <cuitao@kylinos.cn>
---
 arch/loongarch/include/asm/kvm_host.h      |  1 +
 arch/loongarch/include/asm/kvm_para.h      |  1 +
 arch/loongarch/include/uapi/asm/kvm.h      |  1 +
 arch/loongarch/include/uapi/asm/kvm_para.h |  1 +
 arch/loongarch/kvm/trace.h                 | 15 ++++++++++
 arch/loongarch/kvm/vcpu.c                  | 34 +++++++++++++++++++++-
 arch/loongarch/kvm/vm.c                    |  3 ++
 7 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/include/asm/kvm_host.h b/arch/loongarch/include/asm/kvm_host.h
index 776bc487a705..1750632699e0 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -168,6 +168,7 @@ enum emulation_result {
 #define LOONGARCH_PV_FEAT_MASK		(BIT(KVM_FEATURE_IPI) |		\
 					 BIT(KVM_FEATURE_PREEMPT) |	\
 					 BIT(KVM_FEATURE_STEAL_TIME) |	\
+					 BIT(KVM_FEATURE_PV_TLB_FLUSH) |\
 					 BIT(KVM_FEATURE_USER_HCALL) |	\
 					 BIT(KVM_FEATURE_VIRT_EXTIOI))
 
diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h
index fb17ba0fa101..28e3fa3b4c0e 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -41,6 +41,7 @@ struct kvm_steal_time {
 	__u8  pad[47];
 };
 #define KVM_VCPU_PREEMPTED		(1 << 0)
+#define KVM_VCPU_FLUSH_TLB		(1 << 1)
 
 /*
  * Hypercall interface for KVM hypervisor
diff --git a/arch/loongarch/include/uapi/asm/kvm.h b/arch/loongarch/include/uapi/asm/kvm.h
index cd0b5c11ca9c..e4cd4bbf8914 100644
--- a/arch/loongarch/include/uapi/asm/kvm.h
+++ b/arch/loongarch/include/uapi/asm/kvm.h
@@ -106,6 +106,7 @@ struct kvm_fpu {
 #define  KVM_LOONGARCH_VM_FEAT_PTW		8
 #define  KVM_LOONGARCH_VM_FEAT_MSGINT		9
 #define  KVM_LOONGARCH_VM_FEAT_PV_PREEMPT	10
+#define  KVM_LOONGARCH_VM_FEAT_PV_TLB_FLUSH	11
 
 /* Device Control API on vcpu fd */
 #define KVM_LOONGARCH_VCPU_CPUCFG	0
diff --git a/arch/loongarch/include/uapi/asm/kvm_para.h b/arch/loongarch/include/uapi/asm/kvm_para.h
index d28cbcadd276..8872839251cc 100644
--- a/arch/loongarch/include/uapi/asm/kvm_para.h
+++ b/arch/loongarch/include/uapi/asm/kvm_para.h
@@ -16,6 +16,7 @@
 #define  KVM_FEATURE_IPI		1
 #define  KVM_FEATURE_STEAL_TIME		2
 #define  KVM_FEATURE_PREEMPT		3
+#define  KVM_FEATURE_PV_TLB_FLUSH	4
 /* BIT 24 - 31 are features configurable by user space vmm */
 #define  KVM_FEATURE_VIRT_EXTIOI	24
 #define  KVM_FEATURE_USER_HCALL		25
diff --git a/arch/loongarch/kvm/trace.h b/arch/loongarch/kvm/trace.h
index 3467ee22b704..8556954fa196 100644
--- a/arch/loongarch/kvm/trace.h
+++ b/arch/loongarch/kvm/trace.h
@@ -210,6 +210,21 @@ TRACE_EVENT(kvm_vpid_change,
 	    TP_printk("VPID: 0x%08lx", __entry->vpid)
 );
 
+TRACE_EVENT(kvm_pv_tlb_flush,
+	TP_PROTO(struct kvm_vcpu *vcpu, bool need_flush),
+	TP_ARGS(vcpu, need_flush),
+	TP_STRUCT__entry(
+		__field(unsigned int, vcpu_id)
+		__field(bool, need_flush)
+	),
+	TP_fast_assign(
+		__entry->vcpu_id = vcpu->vcpu_id;
+		__entry->need_flush = need_flush;
+	),
+	TP_printk("vcpu %u need_flush %u", __entry->vcpu_id,
+		  __entry->need_flush)
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
index e28084c49e68..5230e95a7816 100644
--- a/arch/loongarch/kvm/vcpu.c
+++ b/arch/loongarch/kvm/vcpu.c
@@ -173,7 +173,39 @@ static void kvm_update_stolen_time(struct kvm_vcpu *vcpu)
 	}
 
 	st = (struct kvm_steal_time __user *)ghc->hva;
-	if (kvm_guest_has_pv_feature(vcpu, KVM_FEATURE_PREEMPT)) {
+	if (kvm_guest_has_pv_feature(vcpu, KVM_FEATURE_PV_TLB_FLUSH)) {
+		u32 old = 0;
+		int err = 0;
+
+		/*
+		 * Prime the cache line with a normal load before the coherent
+		 * atomic below; it was observed (when the line was last dirtied
+		 * by another core) to be needed for amand_db.w to see a current
+		 * value.  amand_db.w overwrites `old` with the real pre-AND value,
+		 * so this load contributes only its cache side-effect.
+		 */
+		unsafe_get_user(old, (u32 __user *)&st->preempted, out);
+
+		/* Atomically read and clear the preempted byte via amand_db.w. */
+		asm volatile(
+		"1: amand_db.w %1, %3, %2	\n"
+		"2:				\n"
+		_ASM_EXTABLE_UACCESS_ERR_ZERO(1b, 2b, %0, %1)
+		: "+r" (err), "+&r" (old),
+		  "+ZB" (*(u32 *)&st->preempted)
+		: "r" ((u32)~0xffu)
+		: "memory");
+
+		if (err)
+			goto out;
+
+		vcpu->arch.st.preempted = 0;
+
+		if ((u8)old & KVM_VCPU_FLUSH_TLB) {
+			vcpu->arch.vpid = 0;	/* Drop vpid to flush TLB */
+			trace_kvm_pv_tlb_flush(vcpu, true);
+		}
+	} else if (kvm_guest_has_pv_feature(vcpu, KVM_FEATURE_PREEMPT)) {
 		unsafe_put_user(0, &st->preempted, out);
 		vcpu->arch.st.preempted = 0;
 	}
diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
index 1317c718f896..cfba45a7343c 100644
--- a/arch/loongarch/kvm/vm.c
+++ b/arch/loongarch/kvm/vm.c
@@ -54,8 +54,10 @@ static void kvm_vm_init_features(struct kvm *kvm)
 	if (kvm_pvtime_supported()) {
 		kvm->arch.pv_features |= BIT(KVM_FEATURE_PREEMPT);
 		kvm->arch.pv_features |= BIT(KVM_FEATURE_STEAL_TIME);
+		kvm->arch.pv_features |= BIT(KVM_FEATURE_PV_TLB_FLUSH);
 		kvm->arch.kvm_features |= BIT(KVM_LOONGARCH_VM_FEAT_PV_PREEMPT);
 		kvm->arch.kvm_features |= BIT(KVM_LOONGARCH_VM_FEAT_PV_STEALTIME);
+		kvm->arch.kvm_features |= BIT(KVM_LOONGARCH_VM_FEAT_PV_TLB_FLUSH);
 	}
 }
 
@@ -158,6 +160,7 @@ static int kvm_vm_feature_has_attr(struct kvm *kvm, struct kvm_device_attr *attr
 	case KVM_LOONGARCH_VM_FEAT_PV_IPI:
 	case KVM_LOONGARCH_VM_FEAT_PV_PREEMPT:
 	case KVM_LOONGARCH_VM_FEAT_PV_STEALTIME:
+	case KVM_LOONGARCH_VM_FEAT_PV_TLB_FLUSH:
 		if (kvm_vm_support(&kvm->arch, attr->attr))
 			return 0;
 		return -ENXIO;
-- 
2.43.0



* [PATCH v4 2/3] LoongArch: KVM: Implement guest-side PV TLB flush
  2026-06-15  8:21 [PATCH v4 0/3] LoongArch: KVM: Add PV TLB flush support Tao Cui
  2026-06-15  8:21 ` [PATCH v4 1/3] LoongArch: KVM: Add PV TLB flush support via steal-time shared memory Tao Cui
@ 2026-06-15  8:21 ` Tao Cui
  2026-06-15  8:21 ` [PATCH v4 3/3] KVM: selftests: loongarch: Add PV TLB flush performance test Tao Cui
  2 siblings, 0 replies; 7+ messages in thread
From: Tao Cui @ 2026-06-15  8:21 UTC (permalink / raw)
  To: maobibo, zhaotianrui, chenhuacai, loongarch; +Cc: kernel, kvm, Tao Cui

From: Tao Cui <cuitao@kylinos.cn>

Add the guest-side implementation of PV TLB flush for LoongArch KVM
guests, complementing the host-side support added in the previous commit.

When running as a KVM guest, remote TLB flushes are optimized by
avoiding IPIs to preempted vCPUs:

- kvm_flush_tlb_mask() checks each target vCPU's steal-time
  preempted flag. If a vCPU is preempted, it atomically sets
  KVM_VCPU_FLUSH_TLB in the shared preempted byte via cmpxchg
  and removes that CPU from the IPI mask.
- Only non-preempted vCPUs receive IPIs via on_each_cpu_mask().
- When the host schedules a deferred-flush vCPU back in, it
  invalidates the VPID and flushes the TLB automatically.
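
The per-vCPU decision in the first bullet can be modelled in user space
as follows. This is a sketch under stated assumptions: C11 atomics
replace READ_ONCE() and try_cmpxchg(), and guest_try_defer_flush() is a
hypothetical name, not a function added by this patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_VCPU_PREEMPTED	(1u << 0)
#define KVM_VCPU_FLUSH_TLB	(1u << 1)

/*
 * Returns true if the flush was deferred (the vCPU can be dropped from
 * the IPI mask), false if the caller must still send it an IPI.
 */
static bool guest_try_defer_flush(_Atomic uint32_t *word)
{
	uint32_t old = atomic_load(word);	/* single snapshot of the word */

	if (!((uint8_t)old & KVM_VCPU_PREEMPTED))
		return false;

	/* Fails if the host cleared the byte after the snapshot was taken. */
	return atomic_compare_exchange_strong(word, &old,
					      old | KVM_VCPU_FLUSH_TLB);
}
```

A failed compare-exchange is treated the same as "not preempted": the
vCPU stays in the mask and receives a regular IPI, so no retry loop is
needed.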

All six SMP TLB flush functions (flush_tlb_all, flush_tlb_mm,
flush_tlb_range, flush_tlb_kernel_range, flush_tlb_page,
flush_tlb_one) are updated to use the PV path when the static
key pv_tlb_flush_key is enabled.

The feature is gated by KVM_FEATURE_PV_TLB_FLUSH and requires
KVM_FEATURE_STEAL_TIME (for the shared memory page). Per-CPU
cpumask buffers are allocated via arch_initcall.

Signed-off-by: Tao Cui <cuitao@kylinos.cn>
---
 arch/loongarch/include/asm/kvm_para.h |  8 ++++
 arch/loongarch/include/asm/paravirt.h | 21 ++++++++++
 arch/loongarch/kernel/paravirt.c      | 60 +++++++++++++++++++++++++++
 arch/loongarch/kernel/smp.c           | 30 +++++++++++---
 4 files changed, 113 insertions(+), 6 deletions(-)

diff --git a/arch/loongarch/include/asm/kvm_para.h b/arch/loongarch/include/asm/kvm_para.h
index 28e3fa3b4c0e..7956aeff2436 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -187,4 +187,12 @@ static inline bool kvm_check_and_clear_guest_paused(void)
 	return false;
 }
 
+static inline bool kvm_pv_tlb_flush_supported(void)
+{
+	unsigned int feat = kvm_arch_para_features();
+
+	return (feat & (1U << KVM_FEATURE_PV_TLB_FLUSH)) &&
+	       (feat & (1U << KVM_FEATURE_STEAL_TIME));
+}
+
 #endif /* _ASM_LOONGARCH_KVM_PARA_H */
diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h
index acae1c5e5f88..6ce62b555a5d 100644
--- a/arch/loongarch/include/asm/paravirt.h
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -5,15 +5,26 @@
 #ifdef CONFIG_PARAVIRT
 
 #include <linux/jump_label.h>
+#include <linux/cpumask.h>
+#include <linux/smp.h>
 
 DECLARE_STATIC_KEY_FALSE(virt_preempt_key);
 DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key);
+DECLARE_STATIC_KEY_FALSE(pv_tlb_flush_key);
 DECLARE_PER_CPU(struct kvm_steal_time, steal_time);
 
 int __init pv_ipi_init(void);
 int __init pv_time_init(void);
 int __init pv_spinlock_init(void);
 
+void kvm_flush_tlb_mask(const struct cpumask *cpumask,
+			smp_call_func_t func, void *info);
+
+static inline bool pv_tlb_flush_enabled(void)
+{
+	return static_branch_unlikely(&pv_tlb_flush_key);
+}
+
 #else
 
 static inline int pv_ipi_init(void)
@@ -31,5 +42,15 @@ static inline int pv_spinlock_init(void)
 	return 0;
 }
 
+static inline bool pv_tlb_flush_enabled(void)
+{
+	return false;
+}
+
+static inline void kvm_flush_tlb_mask(const struct cpumask *cpumask,
+				       smp_call_func_t func, void *info)
+{
+}
+
 #endif // CONFIG_PARAVIRT
 #endif
diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
index 10821cce554c..eab84c9b94f6 100644
--- a/arch/loongarch/kernel/paravirt.c
+++ b/arch/loongarch/kernel/paravirt.c
@@ -12,7 +12,9 @@
 static int has_steal_clock;
 DEFINE_STATIC_KEY_FALSE(virt_preempt_key);
 DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);
+DEFINE_STATIC_KEY_FALSE(pv_tlb_flush_key);
 DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
+static DEFINE_PER_CPU(cpumask_var_t, __pv_cpu_mask);
 
 static bool steal_acc = true;
 
@@ -208,6 +210,64 @@ int __init pv_ipi_init(void)
 	return 0;
 }
 
+#ifdef CONFIG_SMP
+void kvm_flush_tlb_mask(const struct cpumask *cpumask,
+			       smp_call_func_t func, void *info)
+{
+	int cpu;
+	struct kvm_steal_time *src;
+	struct cpumask *flushmask;
+
+	preempt_disable();
+	flushmask = this_cpu_cpumask_var_ptr(__pv_cpu_mask);
+	cpumask_copy(flushmask, cpumask);
+
+	/*
+	 * Send IPIs only to vCPUs that are currently running; for
+	 * preempted vCPUs, queue a flush to be done on the next VM entry.
+	 */
+	for_each_cpu(cpu, flushmask) {
+		u32 *ptr, old, new;
+
+		src = &per_cpu(steal_time, cpu);
+		ptr = (u32 *)&src->preempted;
+		old = READ_ONCE(*ptr);
+		if (!((u8)old & KVM_VCPU_PREEMPTED))
+			continue;
+
+		new = old | KVM_VCPU_FLUSH_TLB;
+		if (try_cmpxchg(ptr, &old, new))
+			__cpumask_clear_cpu(cpu, flushmask);
+	}
+
+	on_each_cpu_mask(flushmask, func, info, 1);
+	preempt_enable();
+}
+
+static int __init pv_tlb_flush_init(void)
+{
+	int cpu;
+
+	if (!kvm_pv_tlb_flush_supported())
+		return 0;
+
+	for_each_possible_cpu(cpu) {
+		if (!zalloc_cpumask_var_node(per_cpu_ptr(&__pv_cpu_mask, cpu),
+					    GFP_KERNEL, cpu_to_node(cpu))) {
+			while (--cpu >= 0)
+				free_cpumask_var(*per_cpu_ptr(&__pv_cpu_mask, cpu));
+			return -ENOMEM;
+		}
+	}
+
+	static_branch_enable(&pv_tlb_flush_key);
+	pr_info("KVM setup pv remote TLB flush\n");
+
+	return 0;
+}
+arch_initcall(pv_tlb_flush_init);
+#endif
+
 static int pv_enable_steal_time(void)
 {
 	int cpu = smp_processor_id();
diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
index 50922610758b..bb3451b057ed 100644
--- a/arch/loongarch/kernel/smp.c
+++ b/arch/loongarch/kernel/smp.c
@@ -727,7 +727,10 @@ static void flush_tlb_all_ipi(void *info)
 
 void flush_tlb_all(void)
 {
-	on_each_cpu(flush_tlb_all_ipi, NULL, 1);
+	if (pv_tlb_flush_enabled())
+		kvm_flush_tlb_mask(cpu_online_mask, flush_tlb_all_ipi, NULL);
+	else
+		on_each_cpu(flush_tlb_all_ipi, NULL, 1);
 }
 
 static void flush_tlb_mm_ipi(void *mm)
@@ -743,7 +746,10 @@ void flush_tlb_mm(struct mm_struct *mm)
 	preempt_disable();
 
 	if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
-		on_each_cpu_mask(mm_cpumask(mm), flush_tlb_mm_ipi, mm, 1);
+		if (pv_tlb_flush_enabled())
+			kvm_flush_tlb_mask(mm_cpumask(mm), flush_tlb_mm_ipi, mm);
+		else
+			on_each_cpu_mask(mm_cpumask(mm), flush_tlb_mm_ipi, mm, 1);
 	} else {
 		unsigned int cpu;
 
@@ -782,7 +788,10 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned long start, unsigned l
 			.addr2 = end,
 		};
 
-		on_each_cpu_mask(mm_cpumask(mm), flush_tlb_range_ipi, &fd, 1);
+		if (pv_tlb_flush_enabled())
+			kvm_flush_tlb_mask(mm_cpumask(mm), flush_tlb_range_ipi, &fd);
+		else
+			on_each_cpu_mask(mm_cpumask(mm), flush_tlb_range_ipi, &fd, 1);
 	} else {
 		unsigned int cpu;
 
@@ -809,7 +818,10 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 		.addr2 = end,
 	};
 
-	on_each_cpu(flush_tlb_kernel_range_ipi, &fd, 1);
+	if (pv_tlb_flush_enabled())
+		kvm_flush_tlb_mask(cpu_online_mask, flush_tlb_kernel_range_ipi, &fd);
+	else
+		on_each_cpu(flush_tlb_kernel_range_ipi, &fd, 1);
 }
 
 static void flush_tlb_page_ipi(void *info)
@@ -828,7 +840,10 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 			.addr1 = page,
 		};
 
-		on_each_cpu_mask(mm_cpumask(vma->vm_mm), flush_tlb_page_ipi, &fd, 1);
+		if (pv_tlb_flush_enabled())
+			kvm_flush_tlb_mask(mm_cpumask(vma->vm_mm), flush_tlb_page_ipi, &fd);
+		else
+			on_each_cpu_mask(mm_cpumask(vma->vm_mm), flush_tlb_page_ipi, &fd, 1);
 	} else {
 		unsigned int cpu;
 
@@ -851,6 +866,9 @@ static void flush_tlb_one_ipi(void *info)
 
 void flush_tlb_one(unsigned long vaddr)
 {
-	on_each_cpu(flush_tlb_one_ipi, (void *)vaddr, 1);
+	if (pv_tlb_flush_enabled())
+		kvm_flush_tlb_mask(cpu_online_mask, flush_tlb_one_ipi, (void *)vaddr);
+	else
+		on_each_cpu(flush_tlb_one_ipi, (void *)vaddr, 1);
 }
 EXPORT_SYMBOL(flush_tlb_one);
-- 
2.43.0



* [PATCH v4 3/3] KVM: selftests: loongarch: Add PV TLB flush performance test
  2026-06-15  8:21 [PATCH v4 0/3] LoongArch: KVM: Add PV TLB flush support Tao Cui
  2026-06-15  8:21 ` [PATCH v4 1/3] LoongArch: KVM: Add PV TLB flush support via steal-time shared memory Tao Cui
  2026-06-15  8:21 ` [PATCH v4 2/3] LoongArch: KVM: Implement guest-side PV TLB flush Tao Cui
@ 2026-06-15  8:21 ` Tao Cui
  2026-06-15  8:29   ` sashiko-bot
  2026-06-15  9:24   ` Bibo Mao
  2 siblings, 2 replies; 7+ messages in thread
From: Tao Cui @ 2026-06-15  8:21 UTC (permalink / raw)
  To: maobibo, zhaotianrui, chenhuacai, loongarch; +Cc: kernel, kvm, Tao Cui

From: Tao Cui <cuitao@kylinos.cn>

Add a multi-threaded benchmark to measure PV TLB flush performance
inside LoongArch KVM guests.

The test spawns flusher threads that repeatedly mmap/munmap to trigger
TLB shootdown IPIs, alongside idle threads that either sleep or
busy-spin. With PV TLB flush enabled, IPIs to preempted vCPUs are
replaced by deferred flags in the steal-time shared page.

Usage (inside guest):
  ./pv_tlb_flush_test <flushers> <idle> <iterations> <busy_idle>
  busy_idle=0: idle threads sleep (PV can skip IPIs to preempted vCPUs)
  busy_idle=1: idle threads spin (all vCPUs active, PV cannot optimize)

Signed-off-by: Tao Cui <cuitao@kylinos.cn>
---
 .../kvm/loongarch/pv_tlb_flush_test.c         | 194 ++++++++++++++++++
 1 file changed, 194 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/loongarch/pv_tlb_flush_test.c

diff --git a/tools/testing/selftests/kvm/loongarch/pv_tlb_flush_test.c b/tools/testing/selftests/kvm/loongarch/pv_tlb_flush_test.c
new file mode 100644
index 000000000000..63efaf9ef1cd
--- /dev/null
+++ b/tools/testing/selftests/kvm/loongarch/pv_tlb_flush_test.c
@@ -0,0 +1,194 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * LoongArch PV TLB Flush Performance Test
+ *
+ * Measure the overhead of remote TLB flushes in a KVM guest by spawning
+ * flusher threads that repeatedly mmap/munmap (triggering TLB shootdown
+ * IPIs) alongside idle threads that either sleep or busy-spin.
+ *
+ * With PV TLB flush enabled, IPIs to preempted vCPUs are replaced by
+ * deferred flags in the steal-time shared page, reducing flush latency.
+ *
+ * Usage:
+ *   Compile on LoongArch guest:
+ *     gcc -O2 -static -pthread -o pv_tlb_flush_test pv_tlb_flush_test.c
+ *   Run (inside KVM guest):
+ *     ./pv_tlb_flush_test <flushers> <idle> <iterations> <busy_idle>
+ *   Examples:
+ *     ./pv_tlb_flush_test 1 31 50000 0   # 1 flusher, 31 sleep, PV helps
+ *     ./pv_tlb_flush_test 1 31 50000 1   # 1 flusher, 31 busy-spin, no PV
+ *
+ *   busy_idle=0: idle threads sleep, vCPUs get preempted, PV TLB flush
+ *                can skip IPIs to them
+ *   busy_idle=1: idle threads spin, all vCPUs stay active, PV TLB flush
+ *                cannot optimize (baseline for comparison)
+ */
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/mman.h>
+#include <string.h>
+#include <time.h>
+#include <unistd.h>
+#include <sched.h>
+#include <pthread.h>
+
+#define MEM_SIZE (2*1024*1024)
+#define DEFAULT_ITERS 50000
+#define MAX_THREADS 64
+
+static int nr_iters = DEFAULT_ITERS;
+static volatile int start_barrier;
+static volatile int stop_flag;
+static int busy_idle = 0;
+
+struct thread_args {
+	int cpu;
+	unsigned long *result;
+	int *completed;
+};
+
+static inline unsigned long clock_ns(void) {
+    struct timespec ts;
+    clock_gettime(CLOCK_MONOTONIC, &ts);
+    return (unsigned long)ts.tv_sec * 1000000000UL + ts.tv_nsec;
+}
+
+static void pin_cpu(int cpu) {
+    cpu_set_t set;
+    if (cpu < 0)
+        return;
+    CPU_ZERO(&set);
+    CPU_SET(cpu, &set);
+    sched_setaffinity(0, sizeof(set), &set);
+}
+
+static void *idle_thread(void *arg) {
+    struct thread_args *ta = arg;
+    pin_cpu(ta->cpu);
+    while (!__atomic_load_n(&start_barrier, __ATOMIC_ACQUIRE));
+    if (busy_idle) {
+        volatile long sink = 0;
+        while (!__atomic_load_n(&stop_flag, __ATOMIC_ACQUIRE))
+            sink++;
+    } else {
+        while (!__atomic_load_n(&stop_flag, __ATOMIC_ACQUIRE))
+            usleep(1000);
+    }
+    return NULL;
+}
+
+static void *flush_thread(void *arg) {
+    struct thread_args *ta = arg;
+    unsigned long start, end;
+    int i;
+    size_t mem_size = MEM_SIZE;
+    pin_cpu(ta->cpu);
+    while (!__atomic_load_n(&start_barrier, __ATOMIC_ACQUIRE));
+    start = clock_ns();
+    for (i = 0; i < nr_iters && !__atomic_load_n(&stop_flag, __ATOMIC_ACQUIRE); i++) {
+        void *p = mmap(NULL, mem_size, PROT_READ|PROT_WRITE,
+                       MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
+        if (p == MAP_FAILED) break;
+        for (size_t off = 0; off < mem_size; off += 65536)
+            ((volatile char*)p)[off] = 0;
+        munmap(p, mem_size);
+    }
+    end = clock_ns();
+    *ta->result = end - start;
+    *ta->completed = i;
+    return NULL;
+}
+
+int main(int argc, char **argv) {
+    int nr_flush = 1, nr_idle = 3, i, run;
+    int ncpus = sysconf(_SC_NPROCESSORS_ONLN);
+    if (argc > 1) nr_flush = atoi(argv[1]);
+    if (argc > 2) nr_idle = atoi(argv[2]);
+    if (argc > 3) nr_iters = atoi(argv[3]);
+    if (argc > 4) busy_idle = atoi(argv[4]);
+
+    if (nr_flush < 1 || nr_idle < 0 || nr_flush + nr_idle > MAX_THREADS) {
+        fprintf(stderr, "Usage: %s [flushers(1-%d)] [idle(0-%d)] [iters] [busy_idle]\n",
+                argv[0], MAX_THREADS, MAX_THREADS);
+        return 1;
+    }
+    if (nr_iters <= 0) {
+        fprintf(stderr, "Error: iterations must be positive\n");
+        return 1;
+    }
+
+    printf("=== TLB Flush Benchmark ===\n");
+    printf("CPUs: %d  Flushers: %d  Idle: %d  Iters: %d  Mode: %s\n",
+           ncpus, nr_flush, nr_idle, nr_iters,
+           busy_idle ? "busy-spin" : "sleep");
+
+    for (run = 0; run < 3; run++) {
+        int total = nr_flush + nr_idle;
+        int do_pin = (total <= ncpus);
+        int created = 0;
+        pthread_t threads[MAX_THREADS];
+        unsigned long results[MAX_THREADS];
+        int completed[MAX_THREADS];
+        struct thread_args args[MAX_THREADS];
+        start_barrier = 0; stop_flag = 0;
+
+        for (i = 0; i < nr_idle; i++) {
+            args[i].cpu = do_pin ? nr_flush + i : -1;
+            args[i].result = NULL;
+            args[i].completed = NULL;
+            if (pthread_create(&threads[i], NULL, idle_thread, &args[i])) {
+                perror("pthread_create idle");
+                goto cleanup;
+            }
+            created++;
+        }
+        for (i = 0; i < nr_flush; i++) {
+            int idx = nr_idle + i;
+            results[idx] = 0;
+            completed[idx] = 0;
+            args[idx].cpu = do_pin ? i : -1;
+            args[idx].result = &results[idx];
+            args[idx].completed = &completed[idx];
+            if (pthread_create(&threads[idx], NULL, flush_thread, &args[idx])) {
+                perror("pthread_create flush");
+                goto cleanup;
+            }
+            created++;
+        }
+
+        usleep(10000);
+        __atomic_store_n(&start_barrier, 1, __ATOMIC_RELEASE);
+        for (i = 0; i < nr_flush; i++)
+            pthread_join(threads[nr_idle + i], NULL);
+        __atomic_store_n(&stop_flag, 1, __ATOMIC_RELEASE);
+        for (i = 0; i < nr_idle; i++)
+            pthread_join(threads[i], NULL);
+
+        unsigned long total_ns = 0;
+        unsigned long total_done = 0;
+        for (i = 0; i < nr_flush; i++) {
+            int idx = nr_idle + i;
+            unsigned long done = completed[idx];
+            if (done == 0) {
+                printf("  Run %d flusher %d: no iterations completed\n", run, i);
+                continue;
+            }
+            printf("  Run %d flusher %d: %lu ns/flush (%lu iters)\n",
+                   run, i, results[idx] / done, done);
+            total_ns += results[idx];
+            total_done += done;
+        }
+        if (total_done > 0)
+            printf("  Run %d Avg: %lu ns/flush\n", run, total_ns / total_done);
+        continue;
+
+cleanup:
+        __atomic_store_n(&start_barrier, 1, __ATOMIC_RELEASE);
+        __atomic_store_n(&stop_flag, 1, __ATOMIC_RELEASE);
+        for (i = 0; i < created; i++)
+            pthread_join(threads[i], NULL);
+        return 1;
+    }
+    return 0;
+}
-- 
2.43.0



* Re: [PATCH v4 3/3] KVM: selftests: loongarch: Add PV TLB flush performance test
  2026-06-15  8:21 ` [PATCH v4 3/3] KVM: selftests: loongarch: Add PV TLB flush performance test Tao Cui
@ 2026-06-15  8:29   ` sashiko-bot
  2026-06-15  9:24   ` Bibo Mao
  1 sibling, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-06-15  8:29 UTC (permalink / raw)
  To: Tao Cui; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Missing Makefile integration for the new test file `pv_tlb_flush_test.c`.
- [Medium] Signed integer overflow in thread bounds check leads to stack buffer overflow.
--

commit 56616f9df639ca6715014c26c5864780ced6ff41
Author: Tao Cui <cuitao@kylinos.cn>

KVM: selftests: loongarch: Add PV TLB flush performance test

Add a multi-threaded benchmark to measure PV TLB flush performance
inside LoongArch KVM guests.

> diff --git a/tools/testing/selftests/kvm/loongarch/pv_tlb_flush_test.c b/tools/testing/selftests/kvm/loongarch/pv_tlb_flush_test.c
> new file mode 100644
> index 0000000000000..63efaf9ef1cd0
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/loongarch/pv_tlb_flush_test.c
> @@ -0,0 +1,194 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * LoongArch PV TLB Flush Performance Test
> + *
> + * Measure the overhead of remote TLB flushes in a KVM guest by spawning
> + * flusher threads that repeatedly mmap/munmap (triggering TLB shootdown
> + * IPIs) alongside idle threads that either sleep or busy-spin.
> + *
> + * With PV TLB flush enabled, IPIs to preempted vCPUs are replaced by
> + * deferred flags in the steal-time shared page, reducing flush latency.
> + *
> + * Usage:
> + *   Compile on LoongArch guest:
> + *     gcc -O2 -static -pthread -o pv_tlb_flush_test pv_tlb_flush_test.c
> + *   Run (inside KVM guest):
> + *     ./pv_tlb_flush_test <flushers> <idle> <iterations> <busy_idle>

[Severity: Medium]
Since this test is provided as a standalone C program and is not added to
the selftests Makefile (e.g., TEST_PROGS or TEST_GEN_PROGS), will it be
missed during make install and automated test execution?

Also, is it intentional that this test bypasses the standard KVM selftests
framework (kvm_util) which typically provisions VMs and runs guest payloads
programmatically?

[ ... ]

> +int main(int argc, char **argv) {
> +    int nr_flush = 1, nr_idle = 3, i, run;
> +    int ncpus = sysconf(_SC_NPROCESSORS_ONLN);
> +    if (argc > 1) nr_flush = atoi(argv[1]);
> +    if (argc > 2) nr_idle = atoi(argv[2]);
> +    if (argc > 3) nr_iters = atoi(argv[3]);
> +    if (argc > 4) busy_idle = atoi(argv[4]);
> +
> +    if (nr_flush < 1 || nr_idle < 0 || nr_flush + nr_idle > MAX_THREADS) {

[Severity: Medium]
Could this bounds check be bypassed if a user provides values whose sum
exceeds the 32-bit signed integer limit (e.g., nr_flush=1 and
nr_idle=2147483647)? Signed overflow is undefined behavior in C, and in
practice the sum typically wraps to a negative number that passes the check.

If bypassed, would the program then loop nr_idle times and write out of
bounds to the stack-allocated arrays like threads and args in the loop below?

[ ... ]

> +        pthread_t threads[MAX_THREADS];
> +        unsigned long results[MAX_THREADS];
> +        int completed[MAX_THREADS];
> +        struct thread_args args[MAX_THREADS];
> +        start_barrier = 0; stop_flag = 0;
> +
> +        for (i = 0; i < nr_idle; i++) {
> +            args[i].cpu = do_pin ? nr_flush + i : -1;
> +            args[i].result = NULL;
> +            args[i].completed = NULL;
> +            if (pthread_create(&threads[i], NULL, idle_thread, &args[i])) {

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260615082154.42144-1-cui.tao@linux.dev?part=3


* Re: [PATCH v4 1/3] LoongArch: KVM: Add PV TLB flush support via steal-time shared memory
  2026-06-15  8:21 ` [PATCH v4 1/3] LoongArch: KVM: Add PV TLB flush support via steal-time shared memory Tao Cui
@ 2026-06-15  8:35   ` sashiko-bot
  0 siblings, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-06-15  8:35 UTC (permalink / raw)
  To: Tao Cui; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] KVM_FEATURE_PV_TLB_FLUSH is unconditionally enabled by default, breaking capability negotiation and live migration.
--

commit f9022e883153b129074818c0b5b2ff53a144e2f5
Author: Tao Cui <cuitao@kylinos.cn>

LoongArch: KVM: Add PV TLB flush support via steal-time shared memory

This commit implements paravirtualized TLB flushes for LoongArch guests using
the preempted byte in the steal-time shared memory page. It introduces the
KVM_VCPU_FLUSH_TLB flag to avoid sending IPIs when the target vCPU is
currently preempted.

[ ... ]

> diff --git a/arch/loongarch/kvm/vm.c b/arch/loongarch/kvm/vm.c
> index 1317c718f896a..cfba45a7343c5 100644
> --- a/arch/loongarch/kvm/vm.c
> +++ b/arch/loongarch/kvm/vm.c
> @@ -54,8 +54,10 @@ static void kvm_vm_init_features(struct kvm *kvm)
>  	if (kvm_pvtime_supported()) {
>  		kvm->arch.pv_features |= BIT(KVM_FEATURE_PREEMPT);
>  		kvm->arch.pv_features |= BIT(KVM_FEATURE_STEAL_TIME);
> +		kvm->arch.pv_features |= BIT(KVM_FEATURE_PV_TLB_FLUSH);
>  		kvm->arch.kvm_features |= BIT(KVM_LOONGARCH_VM_FEAT_PV_PREEMPT);
>  		kvm->arch.kvm_features |= BIT(KVM_LOONGARCH_VM_FEAT_PV_STEALTIME);
> +		kvm->arch.kvm_features |= BIT(KVM_LOONGARCH_VM_FEAT_PV_TLB_FLUSH);

[Severity: High]
Should this new guest-visible feature be disabled by default?

If a VM starts on a kernel with this feature enabled by default, the guest
will detect and use PV TLB flushes. If that VM is later live-migrated to an
older kernel that lacks PV TLB flush support, the destination host will
silently ignore the KVM_VCPU_FLUSH_TLB flag in the steal-time structure.

Could this lead to the guest assuming its TLB was flushed when it wasn't,
resulting in stale translations or memory corruption? Would it be safer to
require an explicit userspace capability enablement before exposing this
to the guest?

>  	}
>  }

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260615082154.42144-1-cui.tao@linux.dev?part=1


* Re: [PATCH v4 3/3] KVM: selftests: loongarch: Add PV TLB flush performance test
  2026-06-15  8:21 ` [PATCH v4 3/3] KVM: selftests: loongarch: Add PV TLB flush performance test Tao Cui
  2026-06-15  8:29   ` sashiko-bot
@ 2026-06-15  9:24   ` Bibo Mao
  1 sibling, 0 replies; 7+ messages in thread
From: Bibo Mao @ 2026-06-15  9:24 UTC (permalink / raw)
  To: Tao Cui, zhaotianrui, chenhuacai, loongarch; +Cc: kernel, kvm, Tao Cui

Hi Tao,

Thanks for your patch.

LoongArch KVM is only just workable at this point and is not yet used at
wide scale, so I am a little worried about the stability of this PV feature.

Do you have a real workload that shows the benefit of PV TLB flush? And
which TLB APIs should be hooked?

Regards
Bibo Mao

On 2026/6/15 4:21 PM, Tao Cui wrote:
> From: Tao Cui <cuitao@kylinos.cn>
> 
> Add a multi-threaded benchmark to measure PV TLB flush performance
> inside LoongArch KVM guests.
> 
> The test spawns flusher threads that repeatedly mmap/munmap to trigger
> TLB shootdown IPIs, alongside idle threads that either sleep or
> busy-spin. With PV TLB flush enabled, IPIs to preempted vCPUs are
> replaced by deferred flags in the steal-time shared page.
> 
> Usage (inside guest):
>    ./pv_tlb_flush_test <flushers> <idle> <iterations> <busy_idle>
>    busy_idle=0: idle threads sleep (PV can skip IPIs to preempted vCPUs)
>    busy_idle=1: idle threads spin (all vCPUs active, PV cannot optimize)
> 
> Signed-off-by: Tao Cui <cuitao@kylinos.cn>
> ---
>   .../kvm/loongarch/pv_tlb_flush_test.c         | 194 ++++++++++++++++++
>   1 file changed, 194 insertions(+)
>   create mode 100644 tools/testing/selftests/kvm/loongarch/pv_tlb_flush_test.c
> 
> diff --git a/tools/testing/selftests/kvm/loongarch/pv_tlb_flush_test.c b/tools/testing/selftests/kvm/loongarch/pv_tlb_flush_test.c
> new file mode 100644
> index 000000000000..63efaf9ef1cd
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/loongarch/pv_tlb_flush_test.c
> @@ -0,0 +1,194 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * LoongArch PV TLB Flush Performance Test
> + *
> + * Measure the overhead of remote TLB flushes in a KVM guest by spawning
> + * flusher threads that repeatedly mmap/munmap (triggering TLB shootdown
> + * IPIs) alongside idle threads that either sleep or busy-spin.
> + *
> + * With PV TLB flush enabled, IPIs to preempted vCPUs are replaced by
> + * deferred flags in the steal-time shared page, reducing flush latency.
> + *
> + * Usage:
> + *   Compile on LoongArch guest:
> + *     gcc -O2 -static -pthread -o pv_tlb_flush_test pv_tlb_flush_test.c
> + *   Run (inside KVM guest):
> + *     ./pv_tlb_flush_test <flushers> <idle> <iterations> <busy_idle>
> + *   Examples:
> + *     ./pv_tlb_flush_test 1 31 50000 0   # 1 flusher, 31 sleep, PV helps
> + *     ./pv_tlb_flush_test 1 31 50000 1   # 1 flusher, 31 busy-spin, no PV
> + *
> + *   busy_idle=0: idle threads sleep, vCPUs get preempted, PV TLB flush
> + *                can skip IPIs to them
> + *   busy_idle=1: idle threads spin, all vCPUs stay active, PV TLB flush
> + *                cannot optimize (baseline for comparison)
> + */
> +#define _GNU_SOURCE
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <sys/mman.h>
> +#include <string.h>
> +#include <time.h>
> +#include <unistd.h>
> +#include <sched.h>
> +#include <pthread.h>
> +
> +#define MEM_SIZE (2*1024*1024)
> +#define DEFAULT_ITERS 50000
> +#define MAX_THREADS 64
> +
> +static int nr_iters = DEFAULT_ITERS;
> +static volatile int start_barrier;
> +static volatile int stop_flag;
> +static int busy_idle = 0;
> +
> +struct thread_args {
> +	int cpu;
> +	unsigned long *result;
> +	int *completed;
> +};
> +
> +static inline unsigned long clock_ns(void) {
> +    struct timespec ts;
> +    clock_gettime(CLOCK_MONOTONIC, &ts);
> +    return (unsigned long)ts.tv_sec * 1000000000UL + ts.tv_nsec;
> +}
> +
> +static void pin_cpu(int cpu) {
> +    cpu_set_t set;
> +    if (cpu < 0)
> +        return;
> +    CPU_ZERO(&set);
> +    CPU_SET(cpu, &set);
> +    sched_setaffinity(0, sizeof(set), &set);
> +}
> +
> +static void *idle_thread(void *arg) {
> +    struct thread_args *ta = arg;
> +    pin_cpu(ta->cpu);
> +    while (!__atomic_load_n(&start_barrier, __ATOMIC_ACQUIRE));
> +    if (busy_idle) {
> +        volatile long sink = 0;
> +        while (!__atomic_load_n(&stop_flag, __ATOMIC_ACQUIRE))
> +            sink++;
> +    } else {
> +        while (!__atomic_load_n(&stop_flag, __ATOMIC_ACQUIRE))
> +            usleep(1000);
> +    }
> +    return NULL;
> +}
> +
> +static void *flush_thread(void *arg) {
> +    struct thread_args *ta = arg;
> +    unsigned long start, end;
> +    int i;
> +    size_t mem_size = MEM_SIZE;
> +    pin_cpu(ta->cpu);
> +    while (!__atomic_load_n(&start_barrier, __ATOMIC_ACQUIRE));
> +    start = clock_ns();
> +    for (i = 0; i < nr_iters && !__atomic_load_n(&stop_flag, __ATOMIC_ACQUIRE); i++) {
> +        void *p = mmap(NULL, mem_size, PROT_READ|PROT_WRITE,
> +                       MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
> +        if (p == MAP_FAILED) break;
> +        for (size_t off = 0; off < mem_size; off += 65536)
> +            ((volatile char*)p)[off] = 0;
> +        munmap(p, mem_size);
> +    }
> +    end = clock_ns();
> +    *ta->result = end - start;
> +    *ta->completed = i;
> +    return NULL;
> +}
> +
> +int main(int argc, char **argv) {
> +    int nr_flush = 1, nr_idle = 3, i, run;
> +    int ncpus = sysconf(_SC_NPROCESSORS_ONLN);
> +    if (argc > 1) nr_flush = atoi(argv[1]);
> +    if (argc > 2) nr_idle = atoi(argv[2]);
> +    if (argc > 3) nr_iters = atoi(argv[3]);
> +    if (argc > 4) busy_idle = atoi(argv[4]);
> +
> +    if (nr_flush < 1 || nr_idle < 0 || nr_flush + nr_idle > MAX_THREADS) {
> +        fprintf(stderr, "Usage: %s [flushers(1-%d)] [idle(0-%d)] [iters] [busy_idle]\n",
> +                argv[0], MAX_THREADS, MAX_THREADS);
> +        return 1;
> +    }
> +    if (nr_iters <= 0) {
> +        fprintf(stderr, "Error: iterations must be positive\n");
> +        return 1;
> +    }
> +
> +    printf("=== TLB Flush Benchmark ===\n");
> +    printf("CPUs: %d  Flushers: %d  Idle: %d  Iters: %d  Mode: %s\n",
> +           ncpus, nr_flush, nr_idle, nr_iters,
> +           busy_idle ? "busy-spin" : "sleep");
> +
> +    for (run = 0; run < 3; run++) {
> +        int total = nr_flush + nr_idle;
> +        int do_pin = (total <= ncpus);
> +        int created = 0;
> +        pthread_t threads[MAX_THREADS];
> +        unsigned long results[MAX_THREADS];
> +        int completed[MAX_THREADS];
> +        struct thread_args args[MAX_THREADS];
> +        start_barrier = 0; stop_flag = 0;
> +
> +        for (i = 0; i < nr_idle; i++) {
> +            args[i].cpu = do_pin ? nr_flush + i : -1;
> +            args[i].result = NULL;
> +            args[i].completed = NULL;
> +            if (pthread_create(&threads[i], NULL, idle_thread, &args[i])) {
> +                perror("pthread_create idle");
> +                goto cleanup;
> +            }
> +            created++;
> +        }
> +        for (i = 0; i < nr_flush; i++) {
> +            int idx = nr_idle + i;
> +            results[idx] = 0;
> +            completed[idx] = 0;
> +            args[idx].cpu = do_pin ? i : -1;
> +            args[idx].result = &results[idx];
> +            args[idx].completed = &completed[idx];
> +            if (pthread_create(&threads[idx], NULL, flush_thread, &args[idx])) {
> +                perror("pthread_create flush");
> +                goto cleanup;
> +            }
> +            created++;
> +        }
> +
> +        usleep(10000);
> +        __atomic_store_n(&start_barrier, 1, __ATOMIC_RELEASE);
> +        for (i = 0; i < nr_flush; i++)
> +            pthread_join(threads[nr_idle + i], NULL);
> +        __atomic_store_n(&stop_flag, 1, __ATOMIC_RELEASE);
> +        for (i = 0; i < nr_idle; i++)
> +            pthread_join(threads[i], NULL);
> +
> +        unsigned long total_ns = 0;
> +        unsigned long total_done = 0;
> +        for (i = 0; i < nr_flush; i++) {
> +            int idx = nr_idle + i;
> +            unsigned long done = completed[idx];
> +            if (done == 0) {
> +                printf("  Run %d flusher %d: no iterations completed\n", run, i);
> +                continue;
> +            }
> +            printf("  Run %d flusher %d: %lu ns/flush (%lu iters)\n",
> +                   run, i, results[idx] / done, done);
> +            total_ns += results[idx];
> +            total_done += done;
> +        }
> +        if (total_done > 0)
> +            printf("  Run %d Avg: %lu ns/flush\n", run, total_ns / total_done);
> +        continue;
> +
> +cleanup:
> +        __atomic_store_n(&start_barrier, 1, __ATOMIC_RELEASE);
> +        __atomic_store_n(&stop_flag, 1, __ATOMIC_RELEASE);
> +        for (i = 0; i < created; i++)
> +            pthread_join(threads[i], NULL);
> +        return 1;
> +    }
> +    return 0;
> +}
> 



end of thread, other threads:[~2026-06-15  9:27 UTC | newest]

Thread overview: 7+ messages
2026-06-15  8:21 [PATCH v4 0/3] LoongArch: KVM: Add PV TLB flush support Tao Cui
2026-06-15  8:21 ` [PATCH v4 1/3] LoongArch: KVM: Add PV TLB flush support via steal-time shared memory Tao Cui
2026-06-15  8:35   ` sashiko-bot
2026-06-15  8:21 ` [PATCH v4 2/3] LoongArch: KVM: Implement guest-side PV TLB flush Tao Cui
2026-06-15  8:21 ` [PATCH v4 3/3] KVM: selftests: loongarch: Add PV TLB flush performance test Tao Cui
2026-06-15  8:29   ` sashiko-bot
2026-06-15  9:24   ` Bibo Mao
