public inbox for kvm@vger.kernel.org
* [PATCH v2 0/3] KVM: X86: Make bus clock frequency for vapic timer configurable
@ 2023-11-14  4:35 isaku.yamahata
  2023-11-14  4:35 ` [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable isaku.yamahata
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: isaku.yamahata @ 2023-11-14  4:35 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Paolo Bonzini, erdemaktas,
	Sean Christopherson, Vishal Annapurve, Jim Mattson

From: Isaku Yamahata <isaku.yamahata@intel.com>

Changes from v1:
  https://lore.kernel.org/all/cover.1699383993.git.isaku.yamahata@intel.com/
- Added a test case
- Fix a build error for i386 platform
- Add check if vcpu isn't created.
- Add check if lapic chip is in-kernel emulation.
- Updated api.rst

Add KVM_CAP_X86_BUS_FREQUENCY_CONTROL capability to configure the core
crystal clock (or processor's bus clock) for APIC timer emulation.  Allow
KVM_ENABLE_CAP(KVM_CAP_X86_BUS_FREQUENCY_CONTROL) to set the
frequency.  When using this capability, the user space VMM should configure
CPUID[0x15] to advertise the frequency.

TDX virtualizes CPUID[0x15] for the core crystal clock to be 25MHz.  The
x86 KVM hardcodes its frequency for APIC timer to be 1GHz.  This mismatch
causes the vAPIC timer to fire earlier than the guest expects. [1] The KVM
APIC timer emulation uses hrtimer, whose unit is nanosecond.
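
To make the size of the mismatch concrete, here is a minimal userspace
sketch (not kernel code).  The helper name tmict_to_ns_at() is hypothetical;
it mirrors the TMICT * cycle_ns * divide_count conversion used for the hrtimer,
assuming a whole number of nanoseconds per bus cycle.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: nanoseconds until expiry for a TMICT count,
 * given the bus frequency that the emulation assumes.
 */
uint64_t tmict_to_ns_at(uint32_t tmict, uint32_t divide_count, uint64_t bus_hz)
{
	/* The sketch only handles frequencies of at most 1GHz. */
	assert(bus_hz > 0 && bus_hz <= NSEC_PER_SEC);

	/* Whole nanoseconds per bus cycle (1 for the 1GHz default). */
	return (uint64_t)tmict * (NSEC_PER_SEC / bus_hz) * divide_count;
}
```

A guest that believes the bus runs at 25MHz programs TMICT = 250000 for a
10 msec timeout.  With a 25MHz bus the helper returns 10000000 ns; with the
1GHz default it returns 250000 ns, so the timer fires 40 times too early.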

There are options to reconcile the mismatch.  1) Make the APIC bus clock frequency
configurable (this patch).  2) Have the TDX KVM code adjust the TMICT value.  This is
hacky and it loses the most significant bits, from 32 bit width to 30 bit width.  3) Make
the guest kernel use the TSC deadline timer instead of the APIC oneshot/periodic timer.
This is the guest kernel's choice.  It's out of the VMM's control.

[1] https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@google.com/

Isaku Yamahata (3):
  KVM: x86: Make the hardcoded APIC bus frequency vm variable
  KVM: X86: Add a capability to configure bus frequency for APIC timer
  KVM: selftests: Add test case for x86 apic_bus_clock_frequency

 Documentation/virt/kvm/api.rst                |  14 ++
 arch/x86/include/asm/kvm_host.h               |   2 +
 arch/x86/kvm/hyperv.c                         |   2 +-
 arch/x86/kvm/lapic.c                          |   6 +-
 arch/x86/kvm/lapic.h                          |   4 +-
 arch/x86/kvm/x86.c                            |  37 +++++
 include/uapi/linux/kvm.h                      |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/apic.h       |   7 +
 .../kvm/x86_64/apic_bus_clock_test.c          | 132 ++++++++++++++++++
 10 files changed, 201 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c


base-commit: be3ca57cfb777ad820c6659d52e60bbdd36bf5ff
-- 
2.25.1



* [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable
  2023-11-14  4:35 [PATCH v2 0/3] KVM: X86: Make bus clock frequency for vapic timer configurable isaku.yamahata
@ 2023-11-14  4:35 ` isaku.yamahata
  2023-12-13 22:39   ` Maxim Levitsky
  2023-11-14  4:35 ` [PATCH v2 2/3] KVM: X86: Add a capability to configure bus frequency for APIC timer isaku.yamahata
  2023-11-14  4:35 ` [PATCH v2 3/3] KVM: selftests: Add test case for x86 apic_bus_clock_frequency isaku.yamahata
  2 siblings, 1 reply; 20+ messages in thread
From: isaku.yamahata @ 2023-11-14  4:35 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Paolo Bonzini, erdemaktas,
	Sean Christopherson, Vishal Annapurve, Jim Mattson

From: Isaku Yamahata <isaku.yamahata@intel.com>

TDX virtualizes the advertised APIC bus frequency to be 25MHz.  KVM
hardcodes it to be 1GHz.  This mismatch causes the vAPIC timer to fire
earlier than the TDX guest expects.  In order to reconcile this mismatch,
make the frequency configurable for the user space VMM.  As the first step,
replace the constants with the VM value in struct kvm.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
Changes v2:
- no change
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 arch/x86/kvm/hyperv.c           | 2 +-
 arch/x86/kvm/lapic.c            | 6 ++++--
 arch/x86/kvm/lapic.h            | 4 ++--
 arch/x86/kvm/x86.c              | 2 ++
 5 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d7036982332e..f2b1c6b3fb11 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1334,6 +1334,8 @@ struct kvm_arch {
 
 	u32 default_tsc_khz;
 	bool user_set_tsc;
+	u64 apic_bus_cycle_ns;
+	u64 apic_bus_frequency;
 
 	seqcount_raw_spinlock_t pvclock_sc;
 	bool use_master_clock;
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 238afd7335e4..995ce2c74ce0 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1687,7 +1687,7 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata,
 		data = (u64)vcpu->arch.virtual_tsc_khz * 1000;
 		break;
 	case HV_X64_MSR_APIC_FREQUENCY:
-		data = APIC_BUS_FREQUENCY;
+		data = vcpu->kvm->arch.apic_bus_frequency;
 		break;
 	default:
 		kvm_pr_unimpl_rdmsr(vcpu, msr);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 245b20973cae..73956b0ac1f1 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1542,7 +1542,8 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
 		remaining = 0;
 
 	ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
-	return div64_u64(ns, (APIC_BUS_CYCLE_NS * apic->divide_count));
+	return div64_u64(ns, (apic->vcpu->kvm->arch.apic_bus_cycle_ns *
+			      apic->divide_count));
 }
 
 static void __report_tpr_access(struct kvm_lapic *apic, bool write)
@@ -1960,7 +1961,8 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
 
 static inline u64 tmict_to_ns(struct kvm_lapic *apic, u32 tmict)
 {
-	return (u64)tmict * APIC_BUS_CYCLE_NS * (u64)apic->divide_count;
+	return (u64)tmict * apic->vcpu->kvm->arch.apic_bus_cycle_ns *
+		(u64)apic->divide_count;
 }
 
 static void update_target_expiration(struct kvm_lapic *apic, uint32_t old_divisor)
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 0a0ea4b5dd8c..3a425ea2a515 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -16,8 +16,8 @@
 #define APIC_DEST_NOSHORT		0x0
 #define APIC_DEST_MASK			0x800
 
-#define APIC_BUS_CYCLE_NS       1
-#define APIC_BUS_FREQUENCY      (1000000000ULL / APIC_BUS_CYCLE_NS)
+#define APIC_BUS_CYCLE_NS_DEFAULT	1
+#define APIC_BUS_FREQUENCY_DEFAULT	(1000000000ULL / APIC_BUS_CYCLE_NS_DEFAULT)
 
 #define APIC_BROADCAST			0xFF
 #define X2APIC_BROADCAST		0xFFFFFFFFul
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2c924075f6f1..a9f4991b3e2e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12466,6 +12466,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 
 	kvm->arch.default_tsc_khz = max_tsc_khz ? : tsc_khz;
+	kvm->arch.apic_bus_cycle_ns = APIC_BUS_CYCLE_NS_DEFAULT;
+	kvm->arch.apic_bus_frequency = APIC_BUS_FREQUENCY_DEFAULT;
 	kvm->arch.guest_can_read_msr_platform_info = true;
 	kvm->arch.enable_pmu = enable_pmu;
 
-- 
2.25.1



* [PATCH v2 2/3] KVM: X86: Add a capability to configure bus frequency for APIC timer
  2023-11-14  4:35 [PATCH v2 0/3] KVM: X86: Make bus clock frequency for vapic timer configurable isaku.yamahata
  2023-11-14  4:35 ` [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable isaku.yamahata
@ 2023-11-14  4:35 ` isaku.yamahata
  2023-12-13 22:40   ` Maxim Levitsky
  2023-11-14  4:35 ` [PATCH v2 3/3] KVM: selftests: Add test case for x86 apic_bus_clock_frequency isaku.yamahata
  2 siblings, 1 reply; 20+ messages in thread
From: isaku.yamahata @ 2023-11-14  4:35 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Paolo Bonzini, erdemaktas,
	Sean Christopherson, Vishal Annapurve, Jim Mattson

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add KVM_CAP_X86_BUS_FREQUENCY_CONTROL capability to configure the core
crystal clock (or processor's bus clock) for APIC timer emulation.  Allow
KVM_ENABLE_CAP(KVM_CAP_X86_BUS_FREQUENCY_CONTROL) to set the
frequency.

TDX virtualizes CPUID[0x15] for the core crystal clock to be 25MHz.  The
x86 KVM hardcodes its frequency for APIC timer to be 1GHz.  This mismatch
causes the vAPIC timer to fire earlier than the guest expects. [1] The KVM
APIC timer emulation uses hrtimer, whose unit is nanosecond.  Make the
parameter configurable for conversion from the TMICT value to nanosecond.
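
As a rough illustration of the argument handling this implies, the following
userspace sketch converts a requested frequency into a cycle length in
nanoseconds and rejects values that cannot be represented.  The function name
bus_frequency_to_cycle_ns() is hypothetical; the checks follow the ones in the
diff below, but this is not the kernel code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace mirror of the checks: returns 0 and stores the
 * cycle length in *cycle_ns, or -EINVAL for an unusable frequency.
 */
int bus_frequency_to_cycle_ns(uint64_t bus_frequency, uint64_t *cycle_ns)
{
	uint64_t ns;

	assert(cycle_ns != NULL);

	/* Zero would divide by zero below. */
	if (!bus_frequency)
		return -EINVAL;
	/* CPUID[0x15] can only advertise a 32-bit frequency. */
	if (bus_frequency != (uint32_t)bus_frequency)
		return -EINVAL;

	/* Integer division: anything above 1GHz rounds down to 0 ns. */
	ns = 1000000000ULL / bus_frequency;
	if (!ns)
		return -EINVAL;

	*cycle_ns = ns;
	return 0;
}
```

With this sketch, 25000000 (25MHz) yields a 40 ns cycle and 1000000000 (1GHz)
yields 1 ns, while 0, 2000000000 and any value wider than 32 bits are rejected.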

This patch doesn't affect the TSC deadline timer emulation.  The TSC
deadline emulation path records its expiring TSC value and calculates the
expiring time in nanoseconds.  The APIC timer emulation path calculates the
TSC value from the TMICT register value and uses the TSC deadline timer
path.  This patch touches the APIC timer-specific code but doesn't touch
common logic.

[1] https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@google.com/
Reported-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
Changes v2:
- Add check if vcpu isn't created.
- Add check if lapic chip is in-kernel emulation.
- Fix build error for i386
- Add document to api.rst
- typo in the commit message
---
 Documentation/virt/kvm/api.rst | 14 ++++++++++++++
 arch/x86/kvm/x86.c             | 35 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h       |  1 +
 3 files changed, 50 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7025b3751027..cc976df2651e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7858,6 +7858,20 @@ This capability is aimed to mitigate the threat that malicious VMs can
 cause CPU stuck (due to event windows don't open up) and make the CPU
 unavailable to host or other VMs.
 
+7.34 KVM_CAP_X86_BUS_FREQUENCY_CONTROL
+--------------------------------------
+
+:Architectures: x86
+:Target: VM
+:Parameters: args[0] is the APIC bus clock frequency in Hz
+:Returns: 0 on success, -EINVAL if args[0] contains invalid value for the
+          frequency, or -ENXIO if virtual local APIC isn't enabled by
+          KVM_CREATE_IRQCHIP, or -EBUSY if any vcpu is created.
+
+This capability sets the APIC bus clock frequency (or core crystal clock
+frequency) for kvm to emulate APIC in the kernel.  The default value is
+1000000000 (1GHz).
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a9f4991b3e2e..a8fb862c4f8e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4625,6 +4625,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ENABLE_CAP:
 	case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
 	case KVM_CAP_IRQFD_RESAMPLE:
+	case KVM_CAP_X86_BUS_FREQUENCY_CONTROL:
 		r = 1;
 		break;
 	case KVM_CAP_EXIT_HYPERCALL:
@@ -6616,6 +6617,40 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		}
 		mutex_unlock(&kvm->lock);
 		break;
+	case KVM_CAP_X86_BUS_FREQUENCY_CONTROL: {
+		u64 bus_frequency = cap->args[0];
+		u64 bus_cycle_ns;
+
+		if (!bus_frequency)
+			return -EINVAL;
+		/* CPUID[0x15] only supports 32 bits. */
+		if (bus_frequency != (u32)bus_frequency)
+			return -EINVAL;
+
+		/* Cast to avoid 64bit division on 32bit platform. */
+		bus_cycle_ns = 1000000000UL / (u32)bus_frequency;
+		if (!bus_cycle_ns)
+			return -EINVAL;
+
+		r = 0;
+		mutex_lock(&kvm->lock);
+		/*
+		 * Don't allow the frequency to change once vcpus have been
+		 * created, to avoid potentially bizarre behavior.
+		 */
+		if (kvm->created_vcpus)
+			r = -EBUSY;
+		/* This is for in-kernel vAPIC emulation. */
+		else if (!irqchip_in_kernel(kvm))
+			r = -ENXIO;
+
+		if (!r) {
+			kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;
+			kvm->arch.apic_bus_frequency = bus_frequency;
+		}
+		mutex_unlock(&kvm->lock);
+		return r;
+	}
 	default:
 		r = -EINVAL;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 211b86de35ac..d74a057df173 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1201,6 +1201,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228
 #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229
 #define KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES 230
+#define KVM_CAP_X86_BUS_FREQUENCY_CONTROL 231
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.25.1



* [PATCH v2 3/3] KVM: selftests: Add test case for x86 apic_bus_clock_frequency
  2023-11-14  4:35 [PATCH v2 0/3] KVM: X86: Make bus clock frequency for vapic timer configurable isaku.yamahata
  2023-11-14  4:35 ` [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable isaku.yamahata
  2023-11-14  4:35 ` [PATCH v2 2/3] KVM: X86: Add a capability to configure bus frequency for APIC timer isaku.yamahata
@ 2023-11-14  4:35 ` isaku.yamahata
  2023-12-13 22:41   ` Maxim Levitsky
  2 siblings, 1 reply; 20+ messages in thread
From: isaku.yamahata @ 2023-11-14  4:35 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: isaku.yamahata, isaku.yamahata, Paolo Bonzini, erdemaktas,
	Sean Christopherson, Vishal Annapurve, Jim Mattson

From: Isaku Yamahata <isaku.yamahata@intel.com>

Test that the APIC bus clock frequency matches the configured value.
Set APIC TMICT to the maximum value, busy wait for 100 msec (any value
is okay) using the TSC, and read TMCCT.  Calculate the APIC bus clock
frequency based on the TSC frequency.
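
The calculation can be sketched on its own as follows.  This is a standalone
illustration, not the selftest itself; the function names are hypothetical,
and the 1% slack matches the tolerance used in the test below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: derive the APIC bus frequency in Hz from the number
 * of APIC timer ticks (TMICT - TMCCT) observed over tsc_cycles TSC cycles.
 */
uint64_t measured_apic_bus_hz(uint32_t apic_cycles, uint32_t divide_count,
			      uint64_t tsc_hz, uint64_t tsc_cycles)
{
	assert(tsc_cycles > 0);

	/* Widen first so the product is computed in 64 bits. */
	return (uint64_t)apic_cycles * divide_count * tsc_hz / tsc_cycles;
}

/* True if freq is strictly within 1% of expected. */
bool within_one_percent(uint64_t freq, uint64_t expected)
{
	return freq > expected * 99 / 100 && freq < expected * 101 / 100;
}
```

For example, with a 1GHz TSC, a 100 msec wait is 100000000 TSC cycles.  A 25MHz
bus with a divide count of 2 decrements the counter 1250000 times in that
window, which the helper turns back into 25000000 Hz.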

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
Changes v2:
- Newly added
---
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/apic.h       |   7 +
 .../kvm/x86_64/apic_bus_clock_test.c          | 132 ++++++++++++++++++
 3 files changed, 140 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index a5963ab9215b..74ed3f71b6e8 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -115,6 +115,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_invalid_nested_guest_state
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
+TEST_GEN_PROGS_x86_64 += x86_64/apic_bus_clock_test
 TEST_GEN_PROGS_x86_64 += x86_64/xapic_ipi_test
 TEST_GEN_PROGS_x86_64 += x86_64/xapic_state_test
 TEST_GEN_PROGS_x86_64 += x86_64/xcr0_cpuid_test
diff --git a/tools/testing/selftests/kvm/include/x86_64/apic.h b/tools/testing/selftests/kvm/include/x86_64/apic.h
index bed316fdecd5..866a58d5fa11 100644
--- a/tools/testing/selftests/kvm/include/x86_64/apic.h
+++ b/tools/testing/selftests/kvm/include/x86_64/apic.h
@@ -60,6 +60,13 @@
 #define		APIC_VECTOR_MASK	0x000FF
 #define	APIC_ICR2	0x310
 #define		SET_APIC_DEST_FIELD(x)	((x) << 24)
+#define APIC_LVT0       0x350
+#define         APIC_LVT_TIMER_ONESHOT          (0 << 17)
+#define         APIC_LVT_TIMER_PERIODIC         (1 << 17)
+#define         APIC_LVT_TIMER_TSCDEADLINE      (2 << 17)
+#define APIC_TMICT	0x380
+#define APIC_TMCCT	0x390
+#define APIC_TDCR	0x3E0
 
 void apic_disable(void);
 void xapic_enable(void);
diff --git a/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
new file mode 100644
index 000000000000..91f558d7c624
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE /* for program_invocation_short_name */
+
+#include "apic.h"
+#include "test_util.h"
+
+/* Pick one convenient value, 1GHz.  No special meaning. */
+#define TSC_HZ			(1 * 1000 * 1000 * 1000ULL)
+
+/* Wait for 100 msec: neither too long nor too short. */
+#define LOOP_MSEC		100ULL
+#define TSC_WAIT_DELTA		(TSC_HZ / 1000 * LOOP_MSEC)
+
+/* Pick a typical value, different enough from the default value, 1GHz. */
+#define APIC_BUS_CLOCK_FREQ	(25 * 1000 * 1000ULL)
+
+static void guest_code(void)
+{
+	/* Possible TDCR values and their divide counts. */
+	struct {
+		u32 tdcr;
+		u32 divide_count;
+	} tdcrs[] = {
+		{0x0, 2},
+		{0x1, 4},
+		{0x2, 8},
+		{0x3, 16},
+		{0x8, 32},
+		{0x9, 64},
+		{0xa, 128},
+		{0xb, 1},
+	};
+
+	u32 tmict, tmcct;
+	u64 tsc0, tsc1;
+	int i;
+
+	asm volatile("cli");
+
+	xapic_enable();
+
+	/*
+	 * Setup one-shot timer.  Because we don't fire the interrupt, the
+	 * vector doesn't matter.
+	 */
+	xapic_write_reg(APIC_LVT0, APIC_LVT_TIMER_ONESHOT);
+
+	for (i = 0; i < ARRAY_SIZE(tdcrs); i++) {
+		xapic_write_reg(APIC_TDCR, tdcrs[i].tdcr);
+
+		/* Set the largest value to not trigger the interrupt. */
+		tmict = ~0;
+		xapic_write_reg(APIC_TMICT, tmict);
+
+		/* Busy wait for LOOP_MSEC */
+		tsc0 = rdtsc();
+		tsc1 = tsc0;
+		while (tsc1 - tsc0 < TSC_WAIT_DELTA)
+			tsc1 = rdtsc();
+
+		/* Read apic timer and tsc */
+		tmcct = xapic_read_reg(APIC_TMCCT);
+		tsc1 = rdtsc();
+
+		/* Stop timer */
+		xapic_write_reg(APIC_TMICT, 0);
+
+		/* Report it. */
+		GUEST_SYNC_ARGS(tdcrs[i].divide_count, tmict - tmcct,
+				tsc1 - tsc0, 0, 0);
+	}
+
+	GUEST_DONE();
+}
+
+void test_apic_bus_clock(struct kvm_vcpu *vcpu)
+{
+	bool done = false;
+	struct ucall uc;
+
+	while (!done) {
+		vcpu_run(vcpu);
+		TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+		switch (get_ucall(vcpu, &uc)) {
+		case UCALL_DONE:
+			done = true;
+			break;
+		case UCALL_ABORT:
+			REPORT_GUEST_ASSERT(uc);
+			break;
+		case UCALL_SYNC: {
+			u32 divide_counter = uc.args[1];
+			u32 apic_cycles = uc.args[2];
+			u64 tsc_cycles = uc.args[3];
+			u64 freq;
+
+			TEST_ASSERT(tsc_cycles > 0,
+				    "tsc cycles must not be zero.");
+
+			/* Allow 1% slack. */
+			freq = apic_cycles * divide_counter * TSC_HZ / tsc_cycles;
+			TEST_ASSERT(freq < APIC_BUS_CLOCK_FREQ * 101 / 100,
+				    "APIC bus clock frequency is too large");
+			TEST_ASSERT(freq > APIC_BUS_CLOCK_FREQ * 99 / 100,
+				    "APIC bus clock frequency is too small");
+			break;
+		}
+		default:
+			TEST_FAIL("Unknown ucall %lu", uc.cmd);
+			break;
+		}
+	}
+}
+
+int main(int argc, char *argv[])
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+
+	vm = __vm_create(VM_MODE_DEFAULT, 1, 0);
+	vm_ioctl(vm, KVM_SET_TSC_KHZ, (void *) (TSC_HZ / 1000));
+	/* KVM_CAP_X86_BUS_FREQUENCY_CONTROL requires that no vcpu is created. */
+	vm_enable_cap(vm, KVM_CAP_X86_BUS_FREQUENCY_CONTROL,
+		      APIC_BUS_CLOCK_FREQ);
+	vcpu = vm_vcpu_add(vm, 0, guest_code);
+
+	virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
+
+	test_apic_bus_clock(vcpu);
+	kvm_vm_free(vm);
+}
-- 
2.25.1



* Re: [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable
  2023-11-14  4:35 ` [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable isaku.yamahata
@ 2023-12-13 22:39   ` Maxim Levitsky
  2023-12-13 23:10     ` Sean Christopherson
  0 siblings, 1 reply; 20+ messages in thread
From: Maxim Levitsky @ 2023-12-13 22:39 UTC (permalink / raw)
  To: isaku.yamahata, kvm, linux-kernel
  Cc: isaku.yamahata, Paolo Bonzini, erdemaktas, Sean Christopherson,
	Vishal Annapurve, Jim Mattson

On Mon, 2023-11-13 at 20:35 -0800, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> TDX virtualizes the advertised APIC bus frequency to be 25MHz. 

Can you explain a bit better why TDX needs this? I am not familiar
with TDX well enough yet to fully understand.

AFAIK, the guest writes TMICT, which makes KVM set up an hrtimer.
KVM is free to use any APIC frequency to determine the deadline of that timer,
and once the hrtimer fires, KVM injects an interrupt into the guest.

Are some parts of this process overridden by the TDX?

I am sure that there is a good reason to do this, but I would be very happy
to see a detailed explanation in the changelog for future readers who
might know nothing about TDX.


>  The KVM
> hardcodes it to be 1GHz.  This mismatch causes the vAPIC timer to fire
> earlier than the TDX guest expects.

Here too, what do you mean by "TDX guest expects"? Is the APIC frequency
given to the guest using some TDX specific way like HV_X64_MSR_APIC_FREQUENCY? 

>   In order to reconcile this mismatch,
> make the frequency configurable for the user space VMM.  As the first step,
> Replace the constants with the VM value in struct kvm.



> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
> Changes v2:
> - no change
> ---
>  arch/x86/include/asm/kvm_host.h | 2 ++
>  arch/x86/kvm/hyperv.c           | 2 +-
>  arch/x86/kvm/lapic.c            | 6 ++++--
>  arch/x86/kvm/lapic.h            | 4 ++--
>  arch/x86/kvm/x86.c              | 2 ++
>  5 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d7036982332e..f2b1c6b3fb11 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1334,6 +1334,8 @@ struct kvm_arch {
>  
>  	u32 default_tsc_khz;
>  	bool user_set_tsc;
> +	u64 apic_bus_cycle_ns;
> +	u64 apic_bus_frequency;
>  
>  	seqcount_raw_spinlock_t pvclock_sc;
>  	bool use_master_clock;
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 238afd7335e4..995ce2c74ce0 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1687,7 +1687,7 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata,
>  		data = (u64)vcpu->arch.virtual_tsc_khz * 1000;
>  		break;
>  	case HV_X64_MSR_APIC_FREQUENCY:
> -		data = APIC_BUS_FREQUENCY;
> +		data = vcpu->kvm->arch.apic_bus_frequency;
>  		break;
>  	default:
>  		kvm_pr_unimpl_rdmsr(vcpu, msr);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 245b20973cae..73956b0ac1f1 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1542,7 +1542,8 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
>  		remaining = 0;
>  
>  	ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
> -	return div64_u64(ns, (APIC_BUS_CYCLE_NS * apic->divide_count));
> +	return div64_u64(ns, (apic->vcpu->kvm->arch.apic_bus_cycle_ns *
> +			      apic->divide_count));
>  }
>  
>  static void __report_tpr_access(struct kvm_lapic *apic, bool write)
> @@ -1960,7 +1961,8 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
>  
>  static inline u64 tmict_to_ns(struct kvm_lapic *apic, u32 tmict)
>  {
> -	return (u64)tmict * APIC_BUS_CYCLE_NS * (u64)apic->divide_count;
> +	return (u64)tmict * apic->vcpu->kvm->arch.apic_bus_cycle_ns *
> +		(u64)apic->divide_count;
>  }
>  
>  static void update_target_expiration(struct kvm_lapic *apic, uint32_t old_divisor)
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 0a0ea4b5dd8c..3a425ea2a515 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -16,8 +16,8 @@
>  #define APIC_DEST_NOSHORT		0x0
>  #define APIC_DEST_MASK			0x800
>  
> -#define APIC_BUS_CYCLE_NS       1
> -#define APIC_BUS_FREQUENCY      (1000000000ULL / APIC_BUS_CYCLE_NS)
> +#define APIC_BUS_CYCLE_NS_DEFAULT	1
> +#define APIC_BUS_FREQUENCY_DEFAULT	(1000000000ULL / APIC_BUS_CYCLE_NS_DEFAULT)
>  
>  #define APIC_BROADCAST			0xFF
>  #define X2APIC_BROADCAST		0xFFFFFFFFul
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2c924075f6f1..a9f4991b3e2e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12466,6 +12466,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
>  
>  	kvm->arch.default_tsc_khz = max_tsc_khz ? : tsc_khz;
> +	kvm->arch.apic_bus_cycle_ns = APIC_BUS_CYCLE_NS_DEFAULT;
> +	kvm->arch.apic_bus_frequency = APIC_BUS_FREQUENCY_DEFAULT;
>  	kvm->arch.guest_can_read_msr_platform_info = true;
>  	kvm->arch.enable_pmu = enable_pmu;
>  

Only one minor nitpick: we might not need 'apic_bus_frequency' and could instead
calculate it from apic_bus_cycle_ns (to have a single source of truth).

Frequency is only used by HV_X64_MSR_APIC_FREQUENCY, and I don't think that HyperV guests read
this MSR often, nor that a division will make a dent in the emulation time of this msr,
even if they do.

But if you prefer, I won't mind either.
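
One thing worth keeping in mind with that approach, sketched below in plain
userspace C with hypothetical helper names: because the cycle length is a
whole number of nanoseconds, a frequency derived from it only matches the
requested one when that frequency divides 1000000000 evenly.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: whole nanoseconds per bus cycle. */
uint64_t cycle_ns_from_hz(uint64_t hz)
{
	assert(hz > 0 && hz <= NSEC_PER_SEC);
	return NSEC_PER_SEC / hz;
}

/* Hypothetical helper: frequency recomputed from the stored cycle length. */
uint64_t hz_from_cycle_ns(uint64_t cycle_ns)
{
	assert(cycle_ns > 0);
	return NSEC_PER_SEC / cycle_ns;
}
```

25MHz round-trips exactly (40 ns, then 25000000 Hz), but 30MHz becomes 33 ns
and comes back as 30303030 Hz.  Arguably the derived value is the more honest
one to report, since it is the rate the timer emulation really uses.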

Best regards,
	Maxim Levitsky






* Re: [PATCH v2 2/3] KVM: X86: Add a capability to configure bus frequency for APIC timer
  2023-11-14  4:35 ` [PATCH v2 2/3] KVM: X86: Add a capability to configure bus frequency for APIC timer isaku.yamahata
@ 2023-12-13 22:40   ` Maxim Levitsky
  0 siblings, 0 replies; 20+ messages in thread
From: Maxim Levitsky @ 2023-12-13 22:40 UTC (permalink / raw)
  To: isaku.yamahata, kvm, linux-kernel
  Cc: isaku.yamahata, Paolo Bonzini, erdemaktas, Sean Christopherson,
	Vishal Annapurve, Jim Mattson

On Mon, 2023-11-13 at 20:35 -0800, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Add KVM_CAP_X86_BUS_FREQUENCY_CONTROL capability to configure the core
> crystal clock (or processor's bus clock) for APIC timer emulation.  Allow
> KVM_ENABLE_CAPABILITY(KVM_CAP_X86_BUS_FREQUENCY_CONTROL) to set the
> frequency.
> 
> TDX virtualizes CPUID[0x15] for the core crystal clock to be 25MHz.  The
> x86 KVM hardcodes its frequency for APIC timer to be 1GHz.  This mismatch
> causes the vAPIC timer to fire earlier than the guest expects. [1] The KVM
> APIC timer emulation uses hrtimer, whose unit is nanosecond.  Make the
> parameter configurable for conversion from the TMICT value to nanosecond.
> 
> This patch doesn't affect the TSC deadline timer emulation.  The TSC
> deadline emulation path records its expiring TSC value and calculates the
> expiring time in nanoseconds.  The APIC timer emulation path calculates the
> TSC value from the TMICT register value and uses the TSC deadline timer
> path.  This patch touches the APIC timer-specific code but doesn't touch
> common logic.

Nitpick: To be honest IMHO the paragraph about tsc deadline is redundant, because by definition (x86 spec)
the tsc deadline timer doesn't use APIC bus frequency.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky

> 
> [1] https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@google.com/
> Reported-by: Vishal Annapurve <vannapurve@google.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
> Changes v2:
> - Add check if vcpu isn't created.
> - Add check if lapic chip is in-kernel emulation.
> - Fix build error for i386
> - Add document to api.rst
> - typo in the commit message
> ---
>  Documentation/virt/kvm/api.rst | 14 ++++++++++++++
>  arch/x86/kvm/x86.c             | 35 ++++++++++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h       |  1 +
>  3 files changed, 50 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 7025b3751027..cc976df2651e 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7858,6 +7858,20 @@ This capability is aimed to mitigate the threat that malicious VMs can
>  cause CPU stuck (due to event windows don't open up) and make the CPU
>  unavailable to host or other VMs.
>  
> +7.34 KVM_CAP_X86_BUS_FREQUENCY_CONTROL
> +--------------------------------------
> +
> +:Architectures: x86
> +:Target: VM
> +:Parameters: args[0] is the APIC bus clock frequency in Hz
> +:Returns: 0 on success, -EINVAL if args[0] contains invalid value for the
> +          frequency, or -ENXIO if virtual local APIC isn't enabled by
> +          KVM_CREATE_IRQCHIP, or -EBUSY if any vcpu is created.
> +
> +This capability sets the APIC bus clock frequency (or core crystal clock
> +frequency) for kvm to emulate APIC in the kernel.  The default value is
> +1000000000 (1GHz).
> +
>  8. Other capabilities.
>  ======================
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a9f4991b3e2e..a8fb862c4f8e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4625,6 +4625,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_ENABLE_CAP:
>  	case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
>  	case KVM_CAP_IRQFD_RESAMPLE:
> +	case KVM_CAP_X86_BUS_FREQUENCY_CONTROL:
>  		r = 1;
>  		break;
>  	case KVM_CAP_EXIT_HYPERCALL:
> @@ -6616,6 +6617,40 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		}
>  		mutex_unlock(&kvm->lock);
>  		break;
> +	case KVM_CAP_X86_BUS_FREQUENCY_CONTROL: {
> +		u64 bus_frequency = cap->args[0];
> +		u64 bus_cycle_ns;
> +
> +		if (!bus_frequency)
> +			return -EINVAL;
> +		/* CPUID[0x15] only support 32bits.  */
> +		if (bus_frequency != (u32)bus_frequency)
> +			return -EINVAL;
> +
> +		/* Cast to avoid 64bit division on 32bit platform. */
> +		bus_cycle_ns = 1000000000UL / (u32)bus_frequency;
> +		if (!bus_cycle_ns)
> +			return -EINVAL;
> +
> +		r = 0;
> +		mutex_lock(&kvm->lock);
> +		/*
> +		 * Don't allow to change the frequency dynamically during vcpu
> +		 * running to avoid potentially bizarre behavior.
> +		 */
> +		if (kvm->created_vcpus)
> +			r = -EBUSY;
> +		/* This is for in-kernel vAPIC emulation. */
> +		else if (!irqchip_in_kernel(kvm))
> +			r = -ENXIO;
> +
> +		if (!r) {
> +			kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;
> +			kvm->arch.apic_bus_frequency = bus_frequency;
> +		}
> +		mutex_unlock(&kvm->lock);
> +		return r;
> +	}
>  	default:
>  		r = -EINVAL;
>  		break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 211b86de35ac..d74a057df173 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1201,6 +1201,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228
>  #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229
>  #define KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES 230
> +#define KVM_CAP_X86_BUS_FREQUENCY_CONTROL 231
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  




* Re: [PATCH v2 3/3] KVM: selftests: Add test case for x86 apic_bus_clock_frequency
  2023-11-14  4:35 ` [PATCH v2 3/3] KVM: selftests: Add test case for x86 apic_bus_clock_frequency isaku.yamahata
@ 2023-12-13 22:41   ` Maxim Levitsky
  0 siblings, 0 replies; 20+ messages in thread
From: Maxim Levitsky @ 2023-12-13 22:41 UTC (permalink / raw)
  To: isaku.yamahata, kvm, linux-kernel
  Cc: isaku.yamahata, Paolo Bonzini, erdemaktas, Sean Christopherson,
	Vishal Annapurve, Jim Mattson

On Mon, 2023-11-13 at 20:35 -0800, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Test that the APIC bus clock frequency matches the configured value.
> Set APIC TMICT to the maximum value, busy wait for 100 msec (any value
> is okay) using the TSC, and read TMCCT.  Calculate the APIC bus clock
> frequency based on the TSC frequency.
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
> Changes v2:
> - Newly added
> ---
>  tools/testing/selftests/kvm/Makefile          |   1 +
>  .../selftests/kvm/include/x86_64/apic.h       |   7 +
>  .../kvm/x86_64/apic_bus_clock_test.c          | 132 ++++++++++++++++++
>  3 files changed, 140 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
> 
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index a5963ab9215b..74ed3f71b6e8 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -115,6 +115,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_invalid_nested_guest_state
>  TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
>  TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
>  TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
> +TEST_GEN_PROGS_x86_64 += x86_64/apic_bus_clock_test
>  TEST_GEN_PROGS_x86_64 += x86_64/xapic_ipi_test
>  TEST_GEN_PROGS_x86_64 += x86_64/xapic_state_test
>  TEST_GEN_PROGS_x86_64 += x86_64/xcr0_cpuid_test
> diff --git a/tools/testing/selftests/kvm/include/x86_64/apic.h b/tools/testing/selftests/kvm/include/x86_64/apic.h
> index bed316fdecd5..866a58d5fa11 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/apic.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/apic.h
> @@ -60,6 +60,13 @@
>  #define		APIC_VECTOR_MASK	0x000FF
>  #define	APIC_ICR2	0x310
>  #define		SET_APIC_DEST_FIELD(x)	((x) << 24)
> +#define APIC_LVT0       0x350
> +#define         APIC_LVT_TIMER_ONESHOT          (0 << 17)
> +#define         APIC_LVT_TIMER_PERIODIC         (1 << 17)
> +#define         APIC_LVT_TIMER_TSCDEADLINE      (2 << 17)
> +#define APIC_TMICT	0x380
> +#define APIC_TMCCT	0x390
> +#define APIC_TDCR	0x3E0
>  
>  void apic_disable(void);
>  void xapic_enable(void);
> diff --git a/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
> new file mode 100644
> index 000000000000..91f558d7c624
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/apic_bus_clock_test.c
> @@ -0,0 +1,132 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#define _GNU_SOURCE /* for program_invocation_short_name */
> +
> +#include "apic.h"
> +#include "test_util.h"
> +
> +/* Pick one convenient value, 1Ghz.  No special meaning. */
Tiny nitpick: to be 100% honest, 1GHz does have a special meaning: it's the default APIC bus frequency.
It might be slightly better to use, say, 1.5GHz or something like that.

> +#define TSC_HZ			(1 * 1000 * 1000 * 1000ULL)
> +
> +/* Wait for 100 msec, not too long, not too short value. */
> +#define LOOP_MSEC		100ULL
> +#define TSC_WAIT_DELTA		(TSC_HZ / 1000 * LOOP_MSEC)
> +
> +/* Pick up typical value.  Different enough from the default value, 1GHz.  */
> +#define APIC_BUS_CLOCK_FREQ	(25 * 1000 * 1000ULL)
> +
> +static void guest_code(void)
> +{
> +	/* Possible tdcr values and its divide count. */
> +	struct {
> +		u32 tdcr;
> +		u32 divide_count;
> +	} tdcrs[] = {
> +		{0x0, 2},
> +		{0x1, 4},
> +		{0x2, 8},
> +		{0x3, 16},
> +		{0x8, 32},
> +		{0x9, 64},
> +		{0xa, 128},
> +		{0xb, 1},
> +	};
> +
> +	u32 tmict, tmcct;
> +	u64 tsc0, tsc1;
> +	int i;
> +
> +	asm volatile("cli");
> +
> +	xapic_enable();
> +
> +	/*
> +	 * Setup one-shot timer.  Because we don't fire the interrupt, the
> +	 * vector doesn't matter.
> +	 */
> +	xapic_write_reg(APIC_LVT0, APIC_LVT_TIMER_ONESHOT);
> +
> +	for (i = 0; i < ARRAY_SIZE(tdcrs); i++) {
> +		xapic_write_reg(APIC_TDCR, tdcrs[i].tdcr);
> +
> +		/* Set the largest value to not trigger the interrupt. */
> +		tmict = ~0;
> +		xapic_write_reg(APIC_TMICT, tmict);
> +
> +		/* Busy wait for LOOP_MSEC */
> +		tsc0 = rdtsc();
> +		tsc1 = tsc0;
> +		while (tsc1 - tsc0 < TSC_WAIT_DELTA)
> +			tsc1 = rdtsc();
> +
> +		/* Read apic timer and tsc */
> +		tmcct = xapic_read_reg(APIC_TMCCT);
> +		tsc1 = rdtsc();
> +
> +		/* Stop timer */
> +		xapic_write_reg(APIC_TMICT, 0);
> +
> +		/* Report it. */
> +		GUEST_SYNC_ARGS(tdcrs[i].divide_count, tmict - tmcct,
> +				tsc1 - tsc0, 0, 0);
> +	}
> +
> +	GUEST_DONE();
> +}
> +
> +void test_apic_bus_clock(struct kvm_vcpu *vcpu)
> +{
> +	bool done = false;
> +	struct ucall uc;
> +
> +	while (!done) {
> +		vcpu_run(vcpu);
> +		TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
> +
> +		switch (get_ucall(vcpu, &uc)) {
> +		case UCALL_DONE:
> +			done = true;
> +			break;
> +		case UCALL_ABORT:
> +			REPORT_GUEST_ASSERT(uc);
> +			break;
> +		case UCALL_SYNC: {
> +			u32 divide_counter = uc.args[1];
> +			u32 apic_cycles = uc.args[2];
> +			u64 tsc_cycles = uc.args[3];
> +			u64 freq;
> +
> +			TEST_ASSERT(tsc_cycles > 0,
> +				    "tsc cycles must not be zero.");
> +
> +			/* Allow 1% slack. */
> +			freq = apic_cycles * divide_counter * TSC_HZ / tsc_cycles;
> +			TEST_ASSERT(freq < APIC_BUS_CLOCK_FREQ * 101 / 100,
> +				    "APIC bus clock frequency is too large");
> +			TEST_ASSERT(freq > APIC_BUS_CLOCK_FREQ * 99 / 100,
> +				    "APIC bus clock frequency is too small");
> +			break;
> +		}
> +		default:
> +			TEST_FAIL("Unknown ucall %lu", uc.cmd);
> +			break;
> +		}
> +	}
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	struct kvm_vm *vm;
> +	struct kvm_vcpu *vcpu;
> +
> +	vm = __vm_create(VM_MODE_DEFAULT, 1, 0);
> +	vm_ioctl(vm, KVM_SET_TSC_KHZ, (void *) (TSC_HZ / 1000));
> +	/*  KVM_CAP_X86_BUS_FREQUENCY_CONTROL requires that no vcpu is created. */
> +	vm_enable_cap(vm, KVM_CAP_X86_BUS_FREQUENCY_CONTROL,
> +		      APIC_BUS_CLOCK_FREQ);
> +	vcpu = vm_vcpu_add(vm, 0, guest_code);
> +
> +	virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
> +
> +	test_apic_bus_clock(vcpu);
> +	kvm_vm_free(vm);
> +}
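
For reference, the frequency check in test_apic_bus_clock() above boils down to
the arithmetic sketched below.  This is only an illustration of the calculation,
not code from the patch: the helper names estimate_apic_bus_hz() and
within_one_percent() are made up here, and the sketch assumes the measurement
window is short enough (about 100 msec) that the intermediate product fits in
64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Estimate the APIC bus clock from one measurement window.
 *
 * apic_ticks   - how far the APIC timer counted down (TMICT - TMCCT)
 * divide_count - divider selected via TDCR (1, 2, 4, ... 128)
 * tsc_cycles   - TSC cycles elapsed over the same window
 * tsc_hz       - TSC frequency the VM was configured with
 *
 * Hypothetical helper; assumes apic_ticks * divide_count * tsc_hz
 * does not overflow 64 bits, which holds for a ~100 msec window.
 */
static uint64_t estimate_apic_bus_hz(uint32_t apic_ticks, uint32_t divide_count,
				     uint64_t tsc_cycles, uint64_t tsc_hz)
{
	assert(tsc_cycles > 0);

	/* bus cycles = ticks * divider; scale by (tsc_hz / tsc_cycles). */
	return (uint64_t)apic_ticks * divide_count * tsc_hz / tsc_cycles;
}

/* Same 1% slack as the test: strictly between 99% and 101% of expected. */
static bool within_one_percent(uint64_t freq, uint64_t expected)
{
	return freq > expected * 99 / 100 && freq < expected * 101 / 100;
}
```

With a 1GHz TSC, a 100 msec window and a divide count of 2, a 25MHz bus clock
should count down about 1,250,000 ticks, whereas the default 1GHz bus clock
would count down about 50,000,000 ticks and fail the check.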

Looks good overall.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




* Re: [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable
  2023-12-13 22:39   ` Maxim Levitsky
@ 2023-12-13 23:10     ` Sean Christopherson
  2023-12-13 23:18       ` Jim Mattson
  2023-12-14  9:31       ` Maxim Levitsky
  0 siblings, 2 replies; 20+ messages in thread
From: Sean Christopherson @ 2023-12-13 23:10 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: isaku.yamahata, kvm, linux-kernel, isaku.yamahata, Paolo Bonzini,
	erdemaktas, Vishal Annapurve, Jim Mattson

On Thu, Dec 14, 2023, Maxim Levitsky wrote:
> On Mon, 2023-11-13 at 20:35 -0800, isaku.yamahata@intel.com wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > 
> > TDX virtualizes the advertised APIC bus frequency to be 25MHz. 
> 
> Can you explain a bit better why TDX needs this? I am not familiar
> with TDX well enough yet to fully understand.

TDX (the module/architecture) hardcodes the core crystal frequency to 25MHz,
whereas KVM hardcodes the APIC bus frequency to 1GHz.  And TDX (again, the module)
*unconditionally* enumerates CPUID 0x15 to TDX guests, i.e. _tells_ the guest that
the frequency is 25MHz regardless of what the VMM/hypervisor actually emulates.
And so the guest skips calibrating the APIC timer, which results in the guest
scheduling timer interrupts waaaaaaay too frequently, i.e. the guest ends up
getting interrupts at 40x the rate it wants.
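
To make the 40x concrete, here is a rough sketch of the arithmetic.  It is not
KVM code: the helper names are hypothetical, and it assumes realistic bus clocks
(MHz to GHz range) and delays of at most a few seconds so that the 64-bit
intermediates don't overflow.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Initial count a guest programs into TMICT for a delay of delay_ns,
 * given the bus frequency it *believes* in (e.g. from CPUID 0x15).
 * Hypothetical helper; assumes delay_ns * bus_hz fits in 64 bits.
 */
static uint32_t guest_tmict_for_ns(uint64_t delay_ns, uint32_t divide_count,
				   uint64_t bus_hz)
{
	assert(divide_count > 0);

	return (uint32_t)(delay_ns * bus_hz / NSEC_PER_SEC / divide_count);
}

/*
 * Time until the timer actually fires when the emulation counts down
 * at bus_hz.  Each timer tick takes divide_count bus cycles.
 */
static uint64_t apic_timer_ns(uint32_t tmict, uint32_t divide_count,
			      uint64_t bus_hz)
{
	uint64_t bus_cycles = (uint64_t)tmict * divide_count;

	assert(bus_hz > 0);

	/* Split the division to keep the intermediate products small. */
	return bus_cycles / bus_hz * NSEC_PER_SEC +
	       bus_cycles % bus_hz * NSEC_PER_SEC / bus_hz;
}
```

A guest that wants a 4 msec tick and believes the bus runs at 25MHz programs
TMICT = 100000 (divide count 1).  Emulated at 1GHz, that count expires after
100 usec instead of 4 msec, i.e. 40 times too early; emulated at 25MHz it
expires after the intended 4 msec.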

Upstream KVM's non-TDX behavior is fine, because KVM doesn't advertise support
for CPUID 0x15, i.e. doesn't announce to host userspace that it's safe to expose
CPUID 0x15 to the guest.  Because TDX makes exposing CPUID 0x15 mandatory, KVM
needs to be taught to correctly emulate the guest's APIC bus frequency, a.k.a.
the TDX guest core crystal frequency of 25Mhz.

I halfheartedly floated the idea of "fixing" the TDX module/architecture to either
use 1Ghz as the base frequency (off list), but it definitely isn't a hill worth
dying on since the KVM changes are relatively simple.

https://lore.kernel.org/all/ZSnIKQ4bUavAtBz6@google.com


* Re: [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable
  2023-12-13 23:10     ` Sean Christopherson
@ 2023-12-13 23:18       ` Jim Mattson
  2023-12-14  9:31       ` Maxim Levitsky
  1 sibling, 0 replies; 20+ messages in thread
From: Jim Mattson @ 2023-12-13 23:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Maxim Levitsky, isaku.yamahata, kvm, linux-kernel, isaku.yamahata,
	Paolo Bonzini, erdemaktas, Vishal Annapurve

On Wed, Dec 13, 2023 at 3:10 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Dec 14, 2023, Maxim Levitsky wrote:
> > On Mon, 2023-11-13 at 20:35 -0800, isaku.yamahata@intel.com wrote:
> > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > >
> > > TDX virtualizes the advertised APIC bus frequency to be 25MHz.
> >
> > Can you explain a bit better why TDX needs this? I am not familiar
> > with TDX well enough yet to fully understand.
>
> TDX (the module/architecture) hardcodes the core crystal frequency to 25Mhz,
> whereas KVM hardcodes the APIC bus frequency to 1Ghz.  And TDX (again, the module)
> *unconditionally* enumerates CPUID 0x15 to TDX guests, i.e. _tells_ the guest that
> the frequency is 25MHz regardless of what the VMM/hypervisor actually emulates.
> And so the guest skips calibrating the APIC timer, which results in the guest
> scheduling timer interrupts waaaaaaay too frequently, i.e. the guest ends up
> getting interrupts at 40x the rate it wants.
>
> Upstream KVM's non-TDX behavior is fine, because KVM doesn't advertise support
> for CPUID 0x15, i.e. doesn't announce to host userspace that it's safe to expose
> CPUID 0x15 to the guest.  Because TDX makes exposing CPUID 0x15 mandatory, KVM
> needs to be taught to correctly emulate the guest's APIC bus frequency, a.k.a.
> the TDX guest core crystal frequency of 25Mhz.

Aside from placating a broken guest infrastructure that ignores a
17-year old contract between KVM and its guests, what are the
advantages to supporting a range of APIC bus frequencies?

> I halfheartedly floated the idea of "fixing" the TDX module/architecture to either
> use 1Ghz as the base frequency (off list), but it definitely isn't a hill worth
> dying on since the KVM changes are relatively simple.

Not making the KVM changes is even simpler. :)


* Re: [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable
  2023-12-13 23:10     ` Sean Christopherson
  2023-12-13 23:18       ` Jim Mattson
@ 2023-12-14  9:31       ` Maxim Levitsky
  2023-12-14 16:41         ` Sean Christopherson
  1 sibling, 1 reply; 20+ messages in thread
From: Maxim Levitsky @ 2023-12-14  9:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: isaku.yamahata, kvm, linux-kernel, isaku.yamahata, Paolo Bonzini,
	erdemaktas, Vishal Annapurve, Jim Mattson

On Wed, 2023-12-13 at 15:10 -0800, Sean Christopherson wrote:
> On Thu, Dec 14, 2023, Maxim Levitsky wrote:
> > On Mon, 2023-11-13 at 20:35 -0800, isaku.yamahata@intel.com wrote:
> > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > > 
> > > TDX virtualizes the advertised APIC bus frequency to be 25MHz. 
> > 
> > Can you explain a bit better why TDX needs this? I am not familiar
> > with TDX well enough yet to fully understand.
> 
> TDX (the module/architecture) hardcodes the core crystal frequency to 25Mhz,
> whereas KVM hardcodes the APIC bus frequency to 1Ghz.  And TDX (again, the module)
> *unconditionally* enumerates CPUID 0x15 to TDX guests, i.e. _tells_ the guest that
> the frequency is 25MHz regardless of what the VMM/hypervisor actually emulates.
> And so the guest skips calibrating the APIC timer, which results in the guest
> scheduling timer interrupts waaaaaaay too frequently, i.e. the guest ends up
> getting interrupts at 40x the rate it wants.

That is what I wanted to hear without opening the PRM ;) - so there is a CPUID leaf,
but KVM just doesn't advertise it. Now it makes sense.

Please add something like that to the commit message:

"TDX guests have the APIC bus frequency hardcoded to 25MHz in CPUID leaf 0x15.
KVM doesn't expose this leaf, but TDX mandates that it be exposed,
and doesn't allow its value to be overridden either.

To ensure that the guest doesn't have a conflicting view of the APIC bus frequency,
allow userspace to tell KVM to use the same frequency that TDX mandates,
instead of the default 1GHz"

> 
> Upstream KVM's non-TDX behavior is fine, because KVM doesn't advertise support
> for CPUID 0x15, i.e. doesn't announce to host userspace that it's safe to expose
> CPUID 0x15 to the guest.  Because TDX makes exposing CPUID 0x15 mandatory, KVM
> needs to be taught to correctly emulate the guest's APIC bus frequency, a.k.a.
> the TDX guest core crystal frequency of 25Mhz.

I assume that TDX doesn't allow changing the CPUID 0x15 leaf.

> 
> I halfheartedly floated the idea of "fixing" the TDX module/architecture to either
> use 1Ghz as the base frequency (off list), but it definitely isn't a hill worth
> dying on since the KVM changes are relatively simple.
> 
> https://lore.kernel.org/all/ZSnIKQ4bUavAtBz6@google.com
> 

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable
  2023-12-14  9:31       ` Maxim Levitsky
@ 2023-12-14 16:41         ` Sean Christopherson
  2023-12-19  1:40           ` Isaku Yamahata
  0 siblings, 1 reply; 20+ messages in thread
From: Sean Christopherson @ 2023-12-14 16:41 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: isaku.yamahata, kvm, linux-kernel, isaku.yamahata, Paolo Bonzini,
	erdemaktas, Vishal Annapurve, Jim Mattson

On Thu, Dec 14, 2023, Maxim Levitsky wrote:
> On Wed, 2023-12-13 at 15:10 -0800, Sean Christopherson wrote:
> > Upstream KVM's non-TDX behavior is fine, because KVM doesn't advertise support
> > for CPUID 0x15, i.e. doesn't announce to host userspace that it's safe to expose
> > CPUID 0x15 to the guest.  Because TDX makes exposing CPUID 0x15 mandatory, KVM
> > needs to be taught to correctly emulate the guest's APIC bus frequency, a.k.a.
> > the TDX guest core crystal frequency of 25Mhz.
> 
> I assume that TDX doesn't allow to change the CPUID 0x15 leaf.

Correct.  I meant to call that out below, but left my sentence half-finished.  It
was supposed to say:

  I halfheartedly floated the idea of "fixing" the TDX module/architecture to either
  use 1Ghz as the base frequency or to allow configuring the base frequency
  advertised to the guest.

> > I halfheartedly floated the idea of "fixing" the TDX module/architecture to either
> > use 1Ghz as the base frequency (off list), but it definitely isn't a hill worth
> > dying on since the KVM changes are relatively simple.
> > 
> > https://lore.kernel.org/all/ZSnIKQ4bUavAtBz6@google.com
> > 
> 
> Best regards,
> 	Maxim Levitsky
> 


* Re: [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable
  2023-12-14 16:41         ` Sean Christopherson
@ 2023-12-19  1:40           ` Isaku Yamahata
  2023-12-19  3:53             ` Jim Mattson
  2023-12-21 17:01             ` Maxim Levitsky
  0 siblings, 2 replies; 20+ messages in thread
From: Isaku Yamahata @ 2023-12-19  1:40 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Maxim Levitsky, isaku.yamahata, kvm, linux-kernel, isaku.yamahata,
	Paolo Bonzini, erdemaktas, Vishal Annapurve, Jim Mattson,
	isaku.yamahata

On Thu, Dec 14, 2023 at 08:41:43AM -0800,
Sean Christopherson <seanjc@google.com> wrote:

> On Thu, Dec 14, 2023, Maxim Levitsky wrote:
> > On Wed, 2023-12-13 at 15:10 -0800, Sean Christopherson wrote:
> > > Upstream KVM's non-TDX behavior is fine, because KVM doesn't advertise support
> > > for CPUID 0x15, i.e. doesn't announce to host userspace that it's safe to expose
> > > CPUID 0x15 to the guest.  Because TDX makes exposing CPUID 0x15 mandatory, KVM
> > > needs to be taught to correctly emulate the guest's APIC bus frequency, a.k.a.
> > > the TDX guest core crystal frequency of 25Mhz.
> > 
> > I assume that TDX doesn't allow to change the CPUID 0x15 leaf.
> 
> Correct.  I meant to call that out below, but left my sentence half-finished.  It
> was supposed to say:
> 
>   I halfheartedly floated the idea of "fixing" the TDX module/architecture to either
>   use 1Ghz as the base frequency or to allow configuring the base frequency
>   advertised to the guest.
> 
> > > I halfheartedly floated the idea of "fixing" the TDX module/architecture to either
> > > use 1Ghz as the base frequency (off list), but it definitely isn't a hill worth
> > > dying on since the KVM changes are relatively simple.
> > > 
> > > https://lore.kernel.org/all/ZSnIKQ4bUavAtBz6@google.com
> > > 
> > 
> > Best regards,
> > 	Maxim Levitsky

The following is the updated version of the commit message.


KVM: x86: Make the hardcoded APIC bus frequency VM variable

The TDX architecture hard-codes the APIC bus frequency to 25MHz in
CPUID leaf 0x15.  TDX mandates that the leaf be exposed and doesn't
allow the VMM to override its value.  The KVM APIC timer emulation
hard-codes the frequency to 1GHz, and KVM doesn't enumerate CPUID leaf
0x15 to the guest unless the user space VMM sets it with KVM_SET_CPUID.

If CPUID leaf 0x15 is enumerated, the guest kernel uses it as the
APIC bus frequency.  If not, the guest kernel measures the frequency
against other known timers such as the ACPI timer or the legacy PIT.
As a result, a TDX guest kernel running on KVM gets timer interrupts
40 times (1GHz / 25MHz) more often than it requested.

To ensure that the guest doesn't have a conflicting view of the APIC
bus frequency, allow the userspace to tell KVM to use the same
frequency that TDX mandates instead of the default 1GHz.

There are several options to address this.
1. Make the APIC bus frequency configurable in KVM (this patch).
   Pros: It resembles existing hardware.  Recent Intel CPUs
   adopt 25MHz.
   Cons: Requires the VMM to emulate the APIC timer at 25MHz.
2. Make the TDX architecture enumerate a configurable frequency in
   CPUID 0x15, or not enumerate the leaf at all.
   Pros: Any APIC bus frequency is allowed.
   Cons: Deviation from the real hardware.
3. Make the TDX guest kernel use 1GHz when it's running on KVM.
   Cons: The kernel ignores CPUID leaf 0x15.


-- 
Isaku Yamahata <isaku.yamahata@linux.intel.com>


* Re: [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable
  2023-12-19  1:40           ` Isaku Yamahata
@ 2023-12-19  3:53             ` Jim Mattson
  2023-12-19  7:56               ` Xiaoyao Li
  2023-12-19  8:11               ` Isaku Yamahata
  2023-12-21 17:01             ` Maxim Levitsky
  1 sibling, 2 replies; 20+ messages in thread
From: Jim Mattson @ 2023-12-19  3:53 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: Sean Christopherson, Maxim Levitsky, isaku.yamahata, kvm,
	linux-kernel, isaku.yamahata, Paolo Bonzini, erdemaktas,
	Vishal Annapurve

On Mon, Dec 18, 2023 at 5:40 PM Isaku Yamahata
<isaku.yamahata@linux.intel.com> wrote:
>
> On Thu, Dec 14, 2023 at 08:41:43AM -0800,
> Sean Christopherson <seanjc@google.com> wrote:
>
> > On Thu, Dec 14, 2023, Maxim Levitsky wrote:
> > > On Wed, 2023-12-13 at 15:10 -0800, Sean Christopherson wrote:
> > > > Upstream KVM's non-TDX behavior is fine, because KVM doesn't advertise support
> > > > for CPUID 0x15, i.e. doesn't announce to host userspace that it's safe to expose
> > > > CPUID 0x15 to the guest.  Because TDX makes exposing CPUID 0x15 mandatory, KVM
> > > > needs to be taught to correctly emulate the guest's APIC bus frequency, a.k.a.
> > > > the TDX guest core crystal frequency of 25Mhz.
> > >
> > > I assume that TDX doesn't allow to change the CPUID 0x15 leaf.
> >
> > Correct.  I meant to call that out below, but left my sentence half-finished.  It
> > was supposed to say:
> >
> >   I halfheartedly floated the idea of "fixing" the TDX module/architecture to either
> >   use 1Ghz as the base frequency or to allow configuring the base frequency
> >   advertised to the guest.
> >
> > > > I halfheartedly floated the idea of "fixing" the TDX module/architecture to either
> > > > use 1Ghz as the base frequency (off list), but it definitely isn't a hill worth
> > > > dying on since the KVM changes are relatively simple.
> > > >
> > > > https://lore.kernel.org/all/ZSnIKQ4bUavAtBz6@google.com
> > > >
> > >
> > > Best regards,
> > >     Maxim Levitsky
>
> The followings are the updated version of the commit message.
>
>
> KVM: x86: Make the hardcoded APIC bus frequency VM variable
>
> The TDX architecture hard-codes the APIC bus frequency to 25MHz in the
> CPUID leaf 0x15.  The
> TDX mandates it to be exposed and doesn't allow the VMM to override
> its value.  The KVM APIC timer emulation hard-codes the frequency to
> 1GHz.  It doesn't unconditionally enumerate it to the guest unless the
> user space VMM sets the CPUID leaf 0x15 by KVM_SET_CPUID.
>
> If the CPUID leaf 0x15 is enumerated, the guest kernel uses it as the
> APIC bus frequency.  If not, the guest kernel measures the frequency
> based on other known timers like the ACPI timer or the legacy PIT.
> The TDX guest kernel gets timer interrupt more times by 1GHz / 25MHz.
>
> To ensure that the guest doesn't have a conflicting view of the APIC
> bus frequency, allow the userspace to tell KVM to use the same
> frequency that TDX mandates instead of the default 1Ghz.
>
> There are several options to address this.
> 1. Make the KVM able to configure APIC bus frequency (This patch).
>    Pros: It resembles the existing hardware.  The recent Intel CPUs
>    adapts 25MHz.
>    Cons: Require the VMM to emulate the APIC timer at 25MHz.
> 2. Make the TDX architecture enumerate CPUID 0x15 to configurable
>    frequency or not enumerate it.
>    Pros: Any APIC bus frequency is allowed.
>    Cons: Deviation from the real hardware.
> 3. Make the TDX guest kernel use 1GHz when it's running on KVM.
>    Cons: The kernel ignores CPUID leaf 0x15.

4. Change CPUID.15H under TDX to report the crystal clock frequency as 1 GHz.
Pro: This has been the virtual APIC frequency for KVM guests for 13 years.
Pro: This requires changing only one hard-coded constant in TDX.

I see no compelling reason to complicate KVM with support for
configurable APIC frequencies, and I see no advantages to doing so.


* Re: [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable
  2023-12-19  3:53             ` Jim Mattson
@ 2023-12-19  7:56               ` Xiaoyao Li
  2023-12-19  8:11               ` Isaku Yamahata
  1 sibling, 0 replies; 20+ messages in thread
From: Xiaoyao Li @ 2023-12-19  7:56 UTC (permalink / raw)
  To: Jim Mattson, Isaku Yamahata
  Cc: Sean Christopherson, Maxim Levitsky, isaku.yamahata, kvm,
	linux-kernel, isaku.yamahata, Paolo Bonzini, erdemaktas,
	Vishal Annapurve

On 12/19/2023 11:53 AM, Jim Mattson wrote:
> On Mon, Dec 18, 2023 at 5:40 PM Isaku Yamahata
> <isaku.yamahata@linux.intel.com> wrote:
>>
>> On Thu, Dec 14, 2023 at 08:41:43AM -0800,
>> Sean Christopherson <seanjc@google.com> wrote:
>>
>>> On Thu, Dec 14, 2023, Maxim Levitsky wrote:
>>>> On Wed, 2023-12-13 at 15:10 -0800, Sean Christopherson wrote:
>>>>> Upstream KVM's non-TDX behavior is fine, because KVM doesn't advertise support
>>>>> for CPUID 0x15, i.e. doesn't announce to host userspace that it's safe to expose
>>>>> CPUID 0x15 to the guest.  Because TDX makes exposing CPUID 0x15 mandatory, KVM
>>>>> needs to be taught to correctly emulate the guest's APIC bus frequency, a.k.a.
>>>>> the TDX guest core crystal frequency of 25Mhz.
>>>>
>>>> I assume that TDX doesn't allow to change the CPUID 0x15 leaf.
>>>
>>> Correct.  I meant to call that out below, but left my sentence half-finished.  It
>>> was supposed to say:
>>>
>>>    I halfheartedly floated the idea of "fixing" the TDX module/architecture to either
>>>    use 1Ghz as the base frequency or to allow configuring the base frequency
>>>    advertised to the guest.
>>>
>>>>> I halfheartedly floated the idea of "fixing" the TDX module/architecture to either
>>>>> use 1Ghz as the base frequency (off list), but it definitely isn't a hill worth
>>>>> dying on since the KVM changes are relatively simple.
>>>>>
>>>>> https://lore.kernel.org/all/ZSnIKQ4bUavAtBz6@google.com
>>>>>
>>>>
>>>> Best regards,
>>>>      Maxim Levitsky
>>
>> The followings are the updated version of the commit message.
>>
>>
>> KVM: x86: Make the hardcoded APIC bus frequency VM variable
>>
>> The TDX architecture hard-codes the APIC bus frequency to 25MHz in the
>> CPUID leaf 0x15.  The
>> TDX mandates it to be exposed and doesn't allow the VMM to override
>> its value.  The KVM APIC timer emulation hard-codes the frequency to
>> 1GHz.  It doesn't unconditionally enumerate it to the guest unless the
>> user space VMM sets the CPUID leaf 0x15 by KVM_SET_CPUID.
>>
>> If the CPUID leaf 0x15 is enumerated, the guest kernel uses it as the
>> APIC bus frequency.  If not, the guest kernel measures the frequency
>> based on other known timers like the ACPI timer or the legacy PIT.
>> The TDX guest kernel gets timer interrupt more times by 1GHz / 25MHz.
>>
>> To ensure that the guest doesn't have a conflicting view of the APIC
>> bus frequency, allow the userspace to tell KVM to use the same
>> frequency that TDX mandates instead of the default 1Ghz.
>>
>> There are several options to address this.
>> 1. Make the KVM able to configure APIC bus frequency (This patch).
>>     Pros: It resembles the existing hardware.  The recent Intel CPUs
>>     adapts 25MHz.
>>     Cons: Require the VMM to emulate the APIC timer at 25MHz.
>> 2. Make the TDX architecture enumerate CPUID 0x15 to configurable
>>     frequency or not enumerate it.
>>     Pros: Any APIC bus frequency is allowed.
>>     Cons: Deviation from the real hardware.
>> 3. Make the TDX guest kernel use 1GHz when it's running on KVM.
>>     Cons: The kernel ignores CPUID leaf 0x15.
> 
> 4. Change CPUID.15H under TDX to report the crystal clock frequency as 1 GHz.

This will have an impact on TSC frequency. Core crystal clock frequency 
is also used to calculate TSC frequency.
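
As a rough illustration of that dependency (a sketch only; the helper name and
the ratio values in the example are made up), the nominal TSC frequency derived
from CPUID.15H is the crystal frequency in ECX scaled by the EBX/EAX ratio:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Nominal TSC frequency in Hz as derived from CPUID.15H:
 *   EAX = denominator of the TSC / core crystal clock ratio
 *   EBX = numerator of that ratio
 *   ECX = core crystal clock frequency in Hz
 * Returns 0 when any field is 0, i.e. the value is not enumerated.
 * Hypothetical helper, for illustration only.
 */
static uint64_t tsc_hz_from_cpuid15(uint32_t eax_denominator,
				    uint32_t ebx_numerator,
				    uint32_t ecx_crystal_hz)
{
	if (!eax_denominator || !ebx_numerator || !ecx_crystal_hz)
		return 0;

	return (uint64_t)ecx_crystal_hz * ebx_numerator / eax_denominator;
}
```

For example, a 25MHz crystal with a ratio of 80/1 yields a 2GHz TSC.  Reporting
the crystal as 1GHz while leaving the ratio at 80/1 would make the guest compute
80GHz, so the ratio would have to change as well (2/1 in this example).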

> Pro: This has been the virtual APIC frequency for KVM guests for 13 years.
> Pro: This requires changing only one hard-coded constant in TDX.
> 
> I see no compelling reason to complicate KVM with support for
> configurable APIC frequencies, and I see no advantages to doing so.

I'm wondering what the KVM community's attitude is toward supporting
CPUID leaf 0x15.  Even if KVM decides to never advertise CPUID 0x15 in
GET_SUPPORTED_CPUID, the hard-coded APIC frequency puts an additional
limitation on userspace that wants to emulate CPUID 0x15.




* Re: [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable
  2023-12-19  3:53             ` Jim Mattson
  2023-12-19  7:56               ` Xiaoyao Li
@ 2023-12-19  8:11               ` Isaku Yamahata
  2023-12-20 22:07                 ` Sean Christopherson
  1 sibling, 1 reply; 20+ messages in thread
From: Isaku Yamahata @ 2023-12-19  8:11 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Isaku Yamahata, Sean Christopherson, Maxim Levitsky,
	isaku.yamahata, kvm, linux-kernel, isaku.yamahata, Paolo Bonzini,
	erdemaktas, Vishal Annapurve

On Mon, Dec 18, 2023 at 07:53:45PM -0800,
Jim Mattson <jmattson@google.com> wrote:

> On Mon, Dec 18, 2023 at 5:40 PM Isaku Yamahata
> <isaku.yamahata@linux.intel.com> wrote:
> >
> > On Thu, Dec 14, 2023 at 08:41:43AM -0800,
> > Sean Christopherson <seanjc@google.com> wrote:
> >
> > > On Thu, Dec 14, 2023, Maxim Levitsky wrote:
> > > > On Wed, 2023-12-13 at 15:10 -0800, Sean Christopherson wrote:
> > > > > Upstream KVM's non-TDX behavior is fine, because KVM doesn't advertise support
> > > > > for CPUID 0x15, i.e. doesn't announce to host userspace that it's safe to expose
> > > > > CPUID 0x15 to the guest.  Because TDX makes exposing CPUID 0x15 mandatory, KVM
> > > > > needs to be taught to correctly emulate the guest's APIC bus frequency, a.k.a.
> > > > > the TDX guest core crystal frequency of 25Mhz.
> > > >
> > > > I assume that TDX doesn't allow to change the CPUID 0x15 leaf.
> > >
> > > Correct.  I meant to call that out below, but left my sentence half-finished.  It
> > > was supposed to say:
> > >
> > >   I halfheartedly floated the idea of "fixing" the TDX module/architecture to either
> > >   use 1Ghz as the base frequency or to allow configuring the base frequency
> > >   advertised to the guest.
> > >
> > > > > I halfheartedly floated the idea of "fixing" the TDX module/architecture to either
> > > > > use 1Ghz as the base frequency (off list), but it definitely isn't a hill worth
> > > > > dying on since the KVM changes are relatively simple.
> > > > >
> > > > > https://lore.kernel.org/all/ZSnIKQ4bUavAtBz6@google.com
> > > > >
> > > >
> > > > Best regards,
> > > >     Maxim Levitsky
> >
> > The followings are the updated version of the commit message.
> >
> >
> > KVM: x86: Make the hardcoded APIC bus frequency VM variable
> >
> > The TDX architecture hard-codes the APIC bus frequency to 25MHz in the
> > CPUID leaf 0x15.  The
> > TDX mandates it to be exposed and doesn't allow the VMM to override
> > its value.  The KVM APIC timer emulation hard-codes the frequency to
> > 1GHz.  It doesn't unconditionally enumerate it to the guest unless the
> > user space VMM sets the CPUID leaf 0x15 by KVM_SET_CPUID.
> >
> > If the CPUID leaf 0x15 is enumerated, the guest kernel uses it as the
> > APIC bus frequency.  If not, the guest kernel measures the frequency
> > based on other known timers like the ACPI timer or the legacy PIT.
> > The TDX guest kernel gets timer interrupt more times by 1GHz / 25MHz.
> >
> > To ensure that the guest doesn't have a conflicting view of the APIC
> > bus frequency, allow the userspace to tell KVM to use the same
> > frequency that TDX mandates instead of the default 1Ghz.
> >
> > There are several options to address this.
> > 1. Make the KVM able to configure APIC bus frequency (This patch).
> >    Pros: It resembles the existing hardware.  The recent Intel CPUs
> >    adapts 25MHz.
> >    Cons: Require the VMM to emulate the APIC timer at 25MHz.
> > 2. Make the TDX architecture enumerate CPUID 0x15 to configurable
> >    frequency or not enumerate it.
> >    Pros: Any APIC bus frequency is allowed.
> >    Cons: Deviation from the real hardware.
> > 3. Make the TDX guest kernel use 1GHz when it's running on KVM.
> >    Cons: The kernel ignores CPUID leaf 0x15.
> 
> 4. Change CPUID.15H under TDX to report the crystal clock frequency as 1 GHz.
> Pro: This has been the virtual APIC frequency for KVM guests for 13 years.
> Pro: This requires changing only one hard-coded constant in TDX.
> 
> I see no compelling reason to complicate KVM with support for
> configurable APIC frequencies, and I see no advantages to doing so.

TDX isn't specific to KVM; it has to work with other VMM technologies as well.
If we go this route, the frequency would have to be configurable, and it is
unclear which frequencies could be accepted securely.  25MHz has a long history
with real hardware.
-- 
Isaku Yamahata <isaku.yamahata@linux.intel.com>


* Re: [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable
  2023-12-19  8:11               ` Isaku Yamahata
@ 2023-12-20 22:07                 ` Sean Christopherson
  2023-12-20 22:22                   ` Jim Mattson
  2023-12-21  5:44                   ` Xiaoyao Li
  0 siblings, 2 replies; 20+ messages in thread
From: Sean Christopherson @ 2023-12-20 22:07 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: Jim Mattson, Maxim Levitsky, isaku.yamahata, kvm, linux-kernel,
	isaku.yamahata, Paolo Bonzini, erdemaktas, Vishal Annapurve

On Tue, Dec 19, 2023, Isaku Yamahata wrote:
> On Mon, Dec 18, 2023 at 07:53:45PM -0800, Jim Mattson <jmattson@google.com> wrote:
> > > There are several options to address this.
> > > 1. Make the KVM able to configure APIC bus frequency (This patch).
> > >    Pros: It resembles the existing hardware.  The recent Intel CPUs
> > >    adapts 25MHz.
> > >    Cons: Require the VMM to emulate the APIC timer at 25MHz.
> > > 2. Make the TDX architecture enumerate CPUID 0x15 to configurable
> > >    frequency or not enumerate it.
> > >    Pros: Any APIC bus frequency is allowed.
> > >    Cons: Deviation from the real hardware.

I don't buy this as a valid Con.  TDX is one gigantic deviation from real hardware,
and since TDX obviously can't guarantee the APIC timer is emulated at the correct
frequency, there can't possibly be any security benefits.  If this were truly a
Con that anyone cared about, we would have gotten patches to "fix" KVM a long time
ago.

If the TDX module wasn't effectively hardware-defined software, i.e. was actually
able to adapt at the speed of software, then fixing this in TDX would be a complete
no-brainer.

The KVM uAPI required to play nice is relatively minor, so I'm not totally opposed
to adding it.  But I totally agree with Jim that forcing KVM to change 13+ years
of behavior just because someone at Intel decided that 25MHz was a good number is
ridiculous.

> > > 3. Make the TDX guest kernel use 1GHz when it's running on KVM.
> > >    Cons: The kernel ignores CPUID leaf 0x15.
> > 
> > 4. Change CPUID.15H under TDX to report the crystal clock frequency as 1 GHz.
> > Pro: This has been the virtual APIC frequency for KVM guests for 13 years.
> > Pro: This requires changing only one hard-coded constant in TDX.
> > 
> > I see no compelling reason to complicate KVM with support for
> > configurable APIC frequencies, and I see no advantages to doing so.
> 
> Because TDX isn't specific to KVM, it should work with other VMM technologies.
> If we go this route, the frequency would need to be configurable.  Which
> frequencies would be acceptable from a security standpoint is unclear.  25MHz
> has a long history on real hardware.


* Re: [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable
  2023-12-20 22:07                 ` Sean Christopherson
@ 2023-12-20 22:22                   ` Jim Mattson
  2023-12-21  5:44                   ` Xiaoyao Li
  1 sibling, 0 replies; 20+ messages in thread
From: Jim Mattson @ 2023-12-20 22:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Isaku Yamahata, Maxim Levitsky, isaku.yamahata, kvm, linux-kernel,
	isaku.yamahata, Paolo Bonzini, erdemaktas, Vishal Annapurve

On Wed, Dec 20, 2023 at 2:07 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Dec 19, 2023, Isaku Yamahata wrote:
> > On Mon, Dec 18, 2023 at 07:53:45PM -0800, Jim Mattson <jmattson@google.com> wrote:
> > > > There are several options to address this.
> > > > 1. Make the KVM able to configure APIC bus frequency (This patch).
> > > >    Pros: It resembles the existing hardware.  Recent Intel CPUs
> > > >    adopt 25MHz.
> > > >    Cons: Require the VMM to emulate the APIC timer at 25MHz.
> > > > 2. Make the TDX architecture enumerate CPUID 0x15 to configurable
> > > >    frequency or not enumerate it.
> > > >    Pros: Any APIC bus frequency is allowed.
> > > >    Cons: Deviation from the real hardware.
>
> I don't buy this as a valid Con.  TDX is one gigantic deviation from real hardware,
> and since TDX obviously can't guarantee the APIC timer is emulated at the correct
> frequency, there can't possibly be any security benefits.  If this were truly a
> Con that anyone cared about, we would have gotten patches to "fix" KVM a long time
> ago.
>
> If the TDX module wasn't effectively hardware-defined software, i.e. was actually
> able to adapt at the speed of software, then fixing this in TDX would be a complete
> no-brainer.
>
> The KVM uAPI required to play nice is relatively minor, so I'm not totally opposed
> to adding it.  But I totally agree with Jim that forcing KVM to change 13+ years
> of behavior just because someone at Intel decided that 25MHz was a good number is
> ridiculous.
>
> > > > 3. Make the TDX guest kernel use 1GHz when it's running on KVM.
> > > >    Cons: The kernel ignores CPUID leaf 0x15.
> > >
> > > 4. Change CPUID.15H under TDX to report the crystal clock frequency as 1 GHz.
> > > Pro: This has been the virtual APIC frequency for KVM guests for 13 years.
> > > Pro: This requires changing only one hard-coded constant in TDX.
> > >
> > > I see no compelling reason to complicate KVM with support for
> > > configurable APIC frequencies, and I see no advantages to doing so.
> >
> > Because TDX isn't specific to KVM, it should work with other VMM technologies.
> > If we go this route, the frequency would need to be configurable.  Which
> > frequencies would be acceptable from a security standpoint is unclear.  25MHz
> > has a long history on real hardware.

I am curious how many other hypervisors either offer a configurable
APIC frequency or happened to also land on 25 MHz.


* Re: [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable
  2023-12-20 22:07                 ` Sean Christopherson
  2023-12-20 22:22                   ` Jim Mattson
@ 2023-12-21  5:44                   ` Xiaoyao Li
  2023-12-21 14:39                     ` Jim Mattson
  1 sibling, 1 reply; 20+ messages in thread
From: Xiaoyao Li @ 2023-12-21  5:44 UTC (permalink / raw)
  To: Sean Christopherson, Isaku Yamahata
  Cc: Jim Mattson, Maxim Levitsky, isaku.yamahata, kvm, linux-kernel,
	isaku.yamahata, Paolo Bonzini, erdemaktas, Vishal Annapurve

On 12/21/2023 6:07 AM, Sean Christopherson wrote:
> On Tue, Dec 19, 2023, Isaku Yamahata wrote:
>> On Mon, Dec 18, 2023 at 07:53:45PM -0800, Jim Mattson <jmattson@google.com> wrote:
>>>> There are several options to address this.
>>>> 1. Make the KVM able to configure APIC bus frequency (This patch).
>>>>     Pros: It resembles the existing hardware.  Recent Intel CPUs
>>>>     adopt 25MHz.
>>>>     Cons: Require the VMM to emulate the APIC timer at 25MHz.
>>>> 2. Make the TDX architecture enumerate CPUID 0x15 to configurable
>>>>     frequency or not enumerate it.
>>>>     Pros: Any APIC bus frequency is allowed.
>>>>     Cons: Deviation from the real hardware.
> 
> I don't buy this as a valid Con.  TDX is one gigantic deviation from real hardware,
> and since TDX obviously can't guarantee the APIC timer is emulated at the correct
> frequency, there can't possibly be any security benefits.  If this were truly a
> Con that anyone cared about, we would have gotten patches to "fix" KVM a long time
> ago.
> 
> If the TDX module wasn't effectively hardware-defined software, i.e. was actually
> able to adapt at the speed of software, then fixing this in TDX would be a complete
> no-brainer.
> 
> The KVM uAPI required to play nice is relatively minor, so I'm not totally opposed
> to adding it.  But I totally agree with Jim that forcing KVM to change 13+ years
> of behavior just because someone at Intel decided that 25MHz was a good number is
> ridiculous.

I believe 25MHz was chosen because it's the value from hardware that 
supports TDX and it is not going to change for the following known 
generations that support TDX.

It's mainly the core crystal frequency. Yes, it also represents the APIC
frequency when it's enumerated in CPUID 0x15. However, it also relates
to other things, like the Intel PT MTC frequency. If it is configured to
a value different from the hardware's, I think it will break the
correctness of Intel PT MTC packets in TDs.

>>>> 3. Make the TDX guest kernel use 1GHz when it's running on KVM.
>>>>     Cons: The kernel ignores CPUID leaf 0x15.
>>>
>>> 4. Change CPUID.15H under TDX to report the crystal clock frequency as 1 GHz.
>>> Pro: This has been the virtual APIC frequency for KVM guests for 13 years.
>>> Pro: This requires changing only one hard-coded constant in TDX.
>>>
>>> I see no compelling reason to complicate KVM with support for
>>> configurable APIC frequencies, and I see no advantages to doing so.
>>
>> Because TDX isn't specific to KVM, it should work with other VMM technologies.
>> If we go this route, the frequency would need to be configurable.  Which
>> frequencies would be acceptable from a security standpoint is unclear.  25MHz
>> has a long history on real hardware.
> 



* Re: [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable
  2023-12-21  5:44                   ` Xiaoyao Li
@ 2023-12-21 14:39                     ` Jim Mattson
  0 siblings, 0 replies; 20+ messages in thread
From: Jim Mattson @ 2023-12-21 14:39 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Sean Christopherson, Isaku Yamahata, Maxim Levitsky,
	isaku.yamahata, kvm, linux-kernel, isaku.yamahata, Paolo Bonzini,
	erdemaktas, Vishal Annapurve

On Wed, Dec 20, 2023 at 9:44 PM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
> On 12/21/2023 6:07 AM, Sean Christopherson wrote:
> > On Tue, Dec 19, 2023, Isaku Yamahata wrote:
> >> On Mon, Dec 18, 2023 at 07:53:45PM -0800, Jim Mattson <jmattson@google.com> wrote:
> >>>> There are several options to address this.
> >>>> 1. Make the KVM able to configure APIC bus frequency (This patch).
> >>>>     Pros: It resembles the existing hardware.  Recent Intel CPUs
> >>>>     adopt 25MHz.
> >>>>     Cons: Require the VMM to emulate the APIC timer at 25MHz.
> >>>> 2. Make the TDX architecture enumerate CPUID 0x15 to configurable
> >>>>     frequency or not enumerate it.
> >>>>     Pros: Any APIC bus frequency is allowed.
> >>>>     Cons: Deviation from the real hardware.
> >
> > I don't buy this as a valid Con.  TDX is one gigantic deviation from real hardware,
> > and since TDX obviously can't guarantee the APIC timer is emulated at the correct
> > frequency, there can't possibly be any security benefits.  If this were truly a
> > Con that anyone cared about, we would have gotten patches to "fix" KVM a long time
> > ago.
> >
> > If the TDX module wasn't effectively hardware-defined software, i.e. was actually
> > able to adapt at the speed of software, then fixing this in TDX would be a complete
> > no-brainer.
> >
> > The KVM uAPI required to play nice is relatively minor, so I'm not totally opposed
> > to adding it.  But I totally agree with Jim that forcing KVM to change 13+ years
> > of behavior just because someone at Intel decided that 25MHz was a good number is
> > ridiculous.
>
> I believe 25MHz was chosen because it's the value from hardware that
> supports TDX and it is not going to change for the following known
> generations that support TDX.
>
> It's mainly the core crystal frequency. Yes, it also represents the APIC
> frequency when it's enumerated in CPUID 0x15. However, it also relates
> to other things, like the Intel PT MTC frequency. If it is configured to
> a value different from the hardware's, I think it will break the
> correctness of Intel PT MTC packets in TDs.

LOL! That suggests that no one is really using KVM's Intel PT virtualization.

This is certainly a compelling reason for having a variable frequency
virtual APIC. Thank you!

> >>>> 3. Make the TDX guest kernel use 1GHz when it's running on KVM.
> >>>>     Cons: The kernel ignores CPUID leaf 0x15.
> >>>
> >>> 4. Change CPUID.15H under TDX to report the crystal clock frequency as 1 GHz.
> >>> Pro: This has been the virtual APIC frequency for KVM guests for 13 years.
> >>> Pro: This requires changing only one hard-coded constant in TDX.
> >>>
> >>> I see no compelling reason to complicate KVM with support for
> >>> configurable APIC frequencies, and I see no advantages to doing so.
> >>
> >> Because TDX isn't specific to KVM, it should work with other VMM technologies.
> >> If we go this route, the frequency would need to be configurable.  Which
> >> frequencies would be acceptable from a security standpoint is unclear.  25MHz
> >> has a long history on real hardware.
> >
>


* Re: [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable
  2023-12-19  1:40           ` Isaku Yamahata
  2023-12-19  3:53             ` Jim Mattson
@ 2023-12-21 17:01             ` Maxim Levitsky
  1 sibling, 0 replies; 20+ messages in thread
From: Maxim Levitsky @ 2023-12-21 17:01 UTC (permalink / raw)
  To: Isaku Yamahata, Sean Christopherson
  Cc: isaku.yamahata, kvm, linux-kernel, isaku.yamahata, Paolo Bonzini,
	erdemaktas, Vishal Annapurve, Jim Mattson

On Mon, 2023-12-18 at 17:40 -0800, Isaku Yamahata wrote:
> On Thu, Dec 14, 2023 at 08:41:43AM -0800,
> Sean Christopherson <seanjc@google.com> wrote:
> 
> > On Thu, Dec 14, 2023, Maxim Levitsky wrote:
> > > On Wed, 2023-12-13 at 15:10 -0800, Sean Christopherson wrote:
> > > > Upstream KVM's non-TDX behavior is fine, because KVM doesn't advertise support
> > > > for CPUID 0x15, i.e. doesn't announce to host userspace that it's safe to expose
> > > > CPUID 0x15 to the guest.  Because TDX makes exposing CPUID 0x15 mandatory, KVM
> > > > needs to be taught to correctly emulate the guest's APIC bus frequency, a.k.a.
> > > > the TDX guest core crystal frequency of 25MHz.
> > > 
> > > I assume that TDX doesn't allow to change the CPUID 0x15 leaf.
> > 
> > Correct.  I meant to call that out below, but left my sentence half-finished.  It
> > was supposed to say:
> > 
> >   I halfheartedly floated the idea of "fixing" the TDX module/architecture to either
> >   use 1GHz as the base frequency or to allow configuring the base frequency
> >   advertised to the guest.
> > 
> > > > I halfheartedly floated the idea of "fixing" the TDX module/architecture to either
> > > > use 1GHz as the base frequency (off list), but it definitely isn't a hill worth
> > > > dying on since the KVM changes are relatively simple.
> > > > 
> > > > https://lore.kernel.org/all/ZSnIKQ4bUavAtBz6@google.com
> > > > 
> > > 
> > > Best regards,
> > > 	Maxim Levitsky
> 
> The following is the updated version of the commit message.
> 
> 
> KVM: x86: Make the hardcoded APIC bus frequency VM variable
> 
> The TDX architecture hard-codes the APIC bus frequency to 25MHz in
> CPUID leaf 0x15.  TDX mandates that the leaf be exposed and doesn't
> allow the VMM to override its value.  The KVM APIC timer emulation
> hard-codes the frequency to 1GHz.  KVM doesn't enumerate the leaf to
> the guest unless the user space VMM sets CPUID leaf 0x15 via
> KVM_SET_CPUID.
> 
> If CPUID leaf 0x15 is enumerated, the guest kernel uses it as the
> APIC bus frequency.  If not, the guest kernel measures the frequency
> based on other known timers like the ACPI timer or the legacy PIT.
> As a result, the TDX guest kernel receives timer interrupts
> 1GHz / 25MHz = 40 times more often than it expects.
> 
> To ensure that the guest doesn't have a conflicting view of the APIC
> bus frequency, allow userspace to tell KVM to use the same
> frequency that TDX mandates instead of the default 1GHz.

Looks great!

In theory this gives me an idea that KVM could parse the guest CPUID leaf
0x15 and deduce the frequency from it automatically instead of a new capability,
but I understand that this is (also in theory) not backward compatible assuming
that some hypervisors already expose this leaf for some reason,
thus a new capability will be needed anyway.
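
As a rough illustration of the idea above (not something proposed in this
series), deducing the bus frequency from the guest's CPUID leaf 0x15 might
look like the sketch below.  CPUID.15H:ECX reports the core crystal clock
frequency in Hz, or 0 when it is not enumerated.  The struct and function
names are hypothetical and are not KVM code; the 1GHz fallback mirrors the
hard-coded default discussed in this thread.

```c
#include <stdint.h>

/* Hypothetical container for the guest-visible CPUID.15H register values. */
struct cpuid15_regs {
	uint32_t eax;	/* denominator of the TSC/crystal clock ratio */
	uint32_t ebx;	/* numerator of the TSC/crystal clock ratio */
	uint32_t ecx;	/* core crystal clock in Hz, 0 if not enumerated */
};

#define DEFAULT_APIC_BUS_HZ	1000000000ULL	/* 1GHz default from this thread */
#define NSEC_PER_SEC		1000000000ULL

/* Assumption: treat ECX as the APIC bus frequency, else fall back to 1GHz. */
static uint64_t apic_bus_hz_from_cpuid15(const struct cpuid15_regs *regs)
{
	return regs->ecx ? regs->ecx : DEFAULT_APIC_BUS_HZ;
}

/*
 * Nanoseconds until expiry for an initial count (TMICT) and divide value.
 * Simplification: assumes bus_hz is nonzero, at most 1GHz, and divides
 * 1e9 evenly, so one bus cycle is a whole number of nanoseconds.
 */
static uint64_t apic_timer_ns(uint32_t tmict, uint32_t divide, uint64_t bus_hz)
{
	uint64_t cycle_ns = NSEC_PER_SEC / bus_hz;

	return (uint64_t)tmict * divide * cycle_ns;
}
```

With ECX = 25000000, an initial count of 1000 at divide-by-1 expires after
40000 ns, whereas the 1GHz assumption gives 1000 ns.  That is the 40x
discrepancy described in the commit message.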

Thus I have no more complaints, and thanks for addressing my feedback!

Best regards,
	Maxim Levitsky

> 
> There are several options to address this.
> 1. Make the KVM able to configure APIC bus frequency (This patch).
>    Pros: It resembles the existing hardware.  Recent Intel CPUs
>    adopt 25MHz.
>    Cons: Require the VMM to emulate the APIC timer at 25MHz.
> 2. Make the TDX architecture enumerate CPUID 0x15 to configurable
>    frequency or not enumerate it.
>    Pros: Any APIC bus frequency is allowed.
>    Cons: Deviation from the real hardware.
> 3. Make the TDX guest kernel use 1GHz when it's running on KVM.
>    Cons: The kernel ignores CPUID leaf 0x15.
> 
> 




end of thread, other threads:[~2023-12-21 17:01 UTC | newest]

Thread overview: 20+ messages
2023-11-14  4:35 [PATCH v2 0/3] KVM: X86: Make bus clock frequency for vapic timer configurable isaku.yamahata
2023-11-14  4:35 ` [PATCH v2 1/3] KVM: x86: Make the hardcoded APIC bus frequency vm variable isaku.yamahata
2023-12-13 22:39   ` Maxim Levitsky
2023-12-13 23:10     ` Sean Christopherson
2023-12-13 23:18       ` Jim Mattson
2023-12-14  9:31       ` Maxim Levitsky
2023-12-14 16:41         ` Sean Christopherson
2023-12-19  1:40           ` Isaku Yamahata
2023-12-19  3:53             ` Jim Mattson
2023-12-19  7:56               ` Xiaoyao Li
2023-12-19  8:11               ` Isaku Yamahata
2023-12-20 22:07                 ` Sean Christopherson
2023-12-20 22:22                   ` Jim Mattson
2023-12-21  5:44                   ` Xiaoyao Li
2023-12-21 14:39                     ` Jim Mattson
2023-12-21 17:01             ` Maxim Levitsky
2023-11-14  4:35 ` [PATCH v2 2/3] KVM: X86: Add a capability to configure bus frequency for APIC timer isaku.yamahata
2023-12-13 22:40   ` Maxim Levitsky
2023-11-14  4:35 ` [PATCH v2 3/3] KVM: selftests: Add test case for x86 apic_bus_clock_frequency isaku.yamahata
2023-12-13 22:41   ` Maxim Levitsky
