* [RFC 00/26] Intel Thread Director Virtualization
From: Zhao Liu @ 2024-02-03 9:11 UTC
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhao Liu <zhao1.liu@intel.com>
Hi list,
This is our RFC to virtualize the Intel Thread Director (ITD) feature for
Guest. It is based on Ricardo's patch series adding ITD-related support
to the HFI driver ("[PATCH 0/9] thermal: intel: hfi: Prework for the
virtualization of HFI" [1]).
In short, the purpose of this patch set is to enable the ITD-based
scheduling logic in Guest so that Guest can better schedule Guest tasks
on Intel hybrid platforms.
Currently, ITD is necessary for Windows VMs. With ITD virtualization
support, a Windows 11 Guest can see a significant performance
improvement (for example, up to 14%+ on 3DMARK on an i9-13900K).
Our ITD virtualization is not bound to VMs' hybrid topology or vCPUs'
CPU affinity. However, in our practice, the ITD scheduling optimization
for Win11 VMs works best when combined with hybrid topology and CPU
affinity (this is related to the specific implementation of Win11
scheduling). For more details, please see Section 1.2 "About hybrid
topology and vCPU pinning".
To enable ITD-related scheduling optimization in a Win11 VM, some other
thermal-related support is also needed (HWP, CPPC), but these can be
emulated with dummy values in the VMM (we will send extra patches for
these in the future).
Welcome your feedback!
1. Background and Motivation
============================
1.1. Background
^^^^^^^^^^^^^^^
Our use case is running games in a client Windows VM as a cloud gaming
solution.
Gaming VMs are performance-sensitive VMs on Client, so they usually
have two characteristics to ensure interactivity and performance:
i) The number of vCPUs is equal or close to the number of Host pCPUs.
ii) The vCPUs of a Gaming VM are often bound to pCPUs to get exclusive
resources and avoid the overhead of migration.
In this case, Host can't provide effective scheduling for Guest, so we
need to deliver more hardware-assisted scheduling capabilities to Guest
to enhance Guest's scheduling.
Windows 11 (and future Windows products) is heavily optimized for the
Intel hybrid platform. To get the best performance, we need to
virtualize hybrid scheduling features (HFI/ITD) for Windows Guest.
1.2. About hybrid topology and vCPU pinning
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Our ITD virtualization can support most vCPU topologies (except multiple
packages/dies, see details in 3.5 Restrictions on Guest Topology), and
can also support the case of non-pinning vCPUs (i.e. it can handle vCPU
thread migration).
The following is our performance measurement on an i9-13900K machine
(2995 MHz, 24 cores, 32 threads (8+16), RAM: 14 GB (16 GB physical)), with
iGPU passthrough, running 3DMARK in a Win11 Professional Guest:
compared with smp topo case       smp topo    smp topo    smp topo hybrid topo hybrid topo hybrid topo hybrid topo
                                + affinity       + ITD       + ITD              + affinity       + ITD       + ITD
                                                        + affinity                                      + affinity
Time Spy - Overall                  0.179%     -0.250%      0.179%     -0.107%      0.143%     -0.179%     -0.107%
  Graphics score                    0.124%     -0.249%      0.124%     -0.083%      0.124%     -0.166%     -0.249%
  CPU score                         0.916%     -0.485%      1.149%     -0.076%      0.722%     -0.324%     11.915%
Fire Strike Extreme - Overall       0.149%      0.000%      0.224%     -1.021%     -3.361%     -1.319%     -3.361%
  Graphics score                    0.100%      0.050%      0.150%     -1.376%     -3.427%     -1.676%     -3.652%
  Physics score                     5.060%      0.759%      0.518%     -2.907%    -10.914%     -0.897%     14.638%
  Combined score                    0.120%     -0.179%      0.418%      0.060%     -2.929%     -0.179%     -2.809%
Fire Strike - Overall               0.350%     -0.085%      0.193%     -1.377%     -1.365%     -1.509%     -1.787%
  Graphics score                    0.256%     -0.047%      0.210%     -1.527%     -1.376%     -1.504%     -2.320%
  Physics score                     3.695%     -2.180%      0.629%     -1.581%     -6.846%     -1.444%     14.100%
  Combined score                    0.415%     -0.128%      0.128%     -0.957%     -1.052%     -1.594%     -0.957%
CPU Profile Max Threads             1.836%      0.298%      1.786%     -0.069%      1.545%      0.025%      9.472%
  16 Threads                        4.290%      0.989%      3.588%      0.595%      1.580%      0.848%     11.295%
  8 Threads                       -22.632%     -0.602%    -23.167%     -0.988%     -1.345%     -1.340%      8.648%
  4 Threads                       -21.598%      0.449%    -21.429%     -0.817%      1.951%     -0.832%      2.084%
  2 Threads                       -12.912%     -0.014%    -12.006%     -0.481%     -0.609%     -0.595%      1.161%
  1 Threads                        -3.793%     -0.137%     -3.793%     -0.495%     -3.189%     -0.495%      1.154%
The results above show that exposing only HFI/ITD to Win11 VMs without
hybrid topology or CPU affinity (case "smp topo + ITD") does not hurt
performance, but does not bring any performance improvement either.
When both hybrid topology and CPU affinity are set for ITD, Win11 VMs
get a significant performance improvement (up to 14%+, compared with the
case of smp topology without CPU affinity).
Beyond the numerical results of 3DMARK, in practice there is also a
significant improvement in the frame rate of games.
Also, the more powerful the machine, the more significant the
performance gains!
Therefore, the best practice for enabling ITD scheduling optimization
is to set up both CPU affinity and hybrid topology for win11 Guest while
enabling our ITD virtualization.
Our earlier QEMU prototype RFC [2] presented the initial hybrid
topology support for VMs. Our other proposal, "QOM topology" [3], has
since been raised in the QEMU community; it is the first step towards a
hybrid topology implementation based on the QOM approach.
2. Introduction of HFI and ITD
==============================
Intel provides the Hardware Feedback Interface (HFI) feature to allow
hardware to guide the OS scheduler toward optimal workload scheduling
through a hardware feedback interface structure in memory [4]. This
structure is called the HFI table.
For now, the guidance includes performance and energy efficiency
hints, and it can be updated via thermal interrupt as the actual
operating conditions of the processor change during run time.
The Intel Thread Director (ITD) feature extends HFI to provide
performance and energy efficiency data for advanced classes of
instructions.
Since ITD is an extension of HFI, our ITD virtualization also
virtualizes the native HFI feature.
3. Dependencies of ITD
======================
ITD is a thermal feature that requires:
* PTM (Package Thermal Management, alias, PTS)
* HFI (Hardware Feedback Interface)
In order to support the notification mechanism of ITD/HFI dynamic
update, we also need to add thermal interrupt related support,
including the following two features:
* ACPI (Thermal Monitor and Software Controlled Clock Facilities)
* TM (Thermal Monitor, alias, TM1/ACC)
Therefore, we must also consider support for the emulation of all
the above dependencies.
3.1. ACPI emulation
^^^^^^^^^^^^^^^^^^^
ACPI can be supported by emulating RDMSR/WRMSR of the associated MSRs
and adding the ability to inject thermal interrupts. In practice, we do
not inject thermal interrupts into Guest for the thermal conditions
corresponding to ACPI; the thermal interrupt support is prepared for the
subsequent HFI/ITD emulation.
3.2. TM emulation
^^^^^^^^^^^^^^^^^
TM is a hardware feature and its CPUID bit only indicates the presence
of the automatic thermal monitoring facilities. For TM, there's no
interactive interface between OS and hardware, but its flag is one of
the prerequisites for the OS to enable thermal interrupt.
Therefore, to support TM, it is enough to expose its CPUID flag to
Guest.
3.3. PTM emulation
^^^^^^^^^^^^^^^^^^
PTM is a package-scope feature that includes package-level MSR and
package-level thermal interrupt. Unfortunately, KVM currently only
supports thread-scope MSR handling, and also doesn't care about the
specific Guest's topology.
But considering that our purpose of supporting PTM in KVM is to further
support ITD, and that all current platforms with ITD have only 1
package, we emulate the package-scope MSRs provided by PTM at the VM
level. As a result, the VMM is required to set only one package in the
Guest topology for PTM. To limit the impact of this restriction, we only
expose the PTM feature bit to Guest when ITD needs to be supported.
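A minimal userspace sketch of what "emulating package-scope MSRs at the VM level" means: a single copy of the MSR value lives in per-VM state, and every vCPU's RDMSR/WRMSR is routed to it. All names here (struct sketch_vm, struct sketch_vcpu, the sketch_* helpers) are hypothetical and are not the KVM structures used by this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-VM state: one copy of each package-scope MSR. */
struct sketch_vm {
	uint64_t pkg_therm_interrupt;	/* models MSR_IA32_PACKAGE_THERM_INTERRUPT */
};

/* Hypothetical per-vCPU state: every vCPU points back to its VM. */
struct sketch_vcpu {
	struct sketch_vm *vm;
};

/* A WRMSR from any vCPU updates the single VM-level value. */
void sketch_wrmsr_pkg_therm_interrupt(struct sketch_vcpu *vcpu, uint64_t val)
{
	assert(vcpu && vcpu->vm);
	vcpu->vm->pkg_therm_interrupt = val;
}

/* A RDMSR from any vCPU observes the same VM-level value. */
uint64_t sketch_rdmsr_pkg_therm_interrupt(const struct sketch_vcpu *vcpu)
{
	assert(vcpu && vcpu->vm);
	return vcpu->vm->pkg_therm_interrupt;
}
```

Because all vCPUs share one value, this model only matches hardware behaviour when the Guest has a single package. A real implementation would also need to serialize concurrent accesses from different vCPUs, which the sketch omits.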
3.4. HFI emulation
^^^^^^^^^^^^^^^^^^
ITD is the extension of HFI, so both HFI and ITD depend on HFI table.
HFI itself is used on the Host for power-related management control, so
we should only expose HFI to Guest when we need to enable ITD.
HFI also relies on PTM interrupt control, so it also has requirements
for package topology, and we also emulate HFI (including ITD) at the VM
level.
In addition, because the HFI driver allocates HFI instances per die,
which also affects HFI (and ITD), the Guest must be limited to a single
die.
3.5. Restrictions on Guest Topology
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Due to KVM's incomplete support for MSR topology and the requirement for
HFI instance management in the kernel, PTM, HFI, and ITD limit the
topology of the Guest (mainly restricting the topology types created on
the VMM side).
Therefore, we only expose PTM, HFI, and ITD to userspace when we need to
support ITD. At the same time, considering that currently, ITD is only
used on the client platform with 1 package and 1 die, such temporary
restrictions will not have too much impact.
4. Overview of ITD (and HFI) virtualization
===========================================
The main tasks of ITD (including HFI) virtualization are:
* maintain a virtual HFI table for VM.
* inject thermal interrupt when HFI table updates.
* handle related MSRs' emulation and adjust HFI table based on MSR's
control bits.
* expose ITD/HFI configuration info in related CPUID leaves.
The most important of these is the maintenance of the virtual HFI table.
Although the HFI table should also be per package, since ITD/HFI related
MSRs are treated as per VM in KVM, we also treat the virtual HFI table
as per VM.
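As an illustration of the MSR emulation task listed above, the thermal status MSRs mix read-only bits with R/WC0 bits (software may write 0 to clear them, while writing 1 leaves them unchanged). The sketch below shows one way to apply a Guest write under those rules. The function name and signature are hypothetical and are not the helpers added by this series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the new MSR value after a Guest write of @val.
 *  - bits in @ro_mask keep their old value (read-only),
 *  - bits in @rwc0_mask can only go from 1 to 0 (write 0 to clear),
 *  - all other bits take the written value.
 */
uint64_t sketch_msr_apply_write(uint64_t old, uint64_t val,
				uint64_t ro_mask, uint64_t rwc0_mask)
{
	uint64_t rw_mask;

	/* A bit cannot be both read-only and write-0-to-clear. */
	assert((ro_mask & rwc0_mask) == 0);

	rw_mask = ~(ro_mask | rwc0_mask);
	return (val & rw_mask) | (old & ro_mask) | (old & val & rwc0_mask);
}
```

For example, with bit 0 read-only and bit 1 R/WC0, writing 0 to an MSR holding 0x3 yields 0x1: the log bit is cleared and the read-only status bit is preserved.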
4.1. HFI table building
^^^^^^^^^^^^^^^^^^^^^^^
The HFI table contains a table header and many table entries. Each table
entry is identified by an HFI table index, and each CPU corresponds to
one of the HFI table indexes.
The ITD and HFI features both depend on the HFI table, but their HFI
tables are a little different. The HFI table provided by the ITD feature
has more classes (i.e. more columns in the table) than the HFI table of
the native HFI feature.
The virtual HFI table in KVM is built based on the actual HFI table,
which is maintained by the HFI instance in the HFI driver. We extract
the HFI data of the pCPUs that the vCPUs are running on to form a
virtual HFI table.
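To make the table layout concrete, here is a userspace sketch of the size arithmetic, mirroring what intel_hfi_build_virt_features() in patch 02 computes (header size and per-CPU stride rounded up to 8 bytes, then the number of pages). The struct and function names and the 4 KiB page size are assumptions made for the example; they are not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Round x up to the next multiple of 8 bytes. */
static unsigned int sketch_round_up_8(unsigned int x)
{
	return (x + 7u) / 8u * 8u;
}

/* Hypothetical userspace stand-in for struct hfi_features. */
struct sketch_hfi_layout {
	unsigned int nr_classes;	/* 1 for HFI only, 4 for HFI + ITD */
	unsigned int class_stride;	/* bytes of capability data per class */
	unsigned int hdr_size;		/* bytes of the table header */
	unsigned int cpu_stride;	/* bytes of one table entry (row) */
	unsigned int nr_table_pages;	/* pages needed for header + rows */
};

/* Returns 0 on success, -1 on invalid parameters. */
int sketch_hfi_layout_init(struct sketch_hfi_layout *l,
			   unsigned int nr_classes,
			   unsigned int class_stride,
			   unsigned int nr_entries)
{
	unsigned int data_size;

	if (!l || !nr_classes || !class_stride || !nr_entries)
		return -1;

	l->nr_classes = nr_classes;
	l->class_stride = class_stride;
	l->hdr_size = sketch_round_up_8(class_stride * nr_classes);
	l->cpu_stride = sketch_round_up_8(class_stride * nr_classes);

	data_size = l->hdr_size + nr_entries * l->cpu_stride;
	l->nr_table_pages = (data_size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	return 0;
}

/* Byte offset of (hfi_index, class_id) from the start of the data area. */
size_t sketch_hfi_entry_offset(const struct sketch_hfi_layout *l,
			       unsigned int hfi_index, unsigned int class_id)
{
	assert(l && class_id < l->nr_classes);
	return (size_t)hfi_index * l->cpu_stride +
	       (size_t)class_id * l->class_stride;
}
```

With 4 classes and 2 bytes of capability data per class, each row takes 8 bytes, so a 32-entry table plus its header fits in a single page.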
4.2. HFI table index
^^^^^^^^^^^^^^^^^^^^
There are many entries in the HFI table, and each vCPU is assigned an
HFI table index to specify the entry it maps to. KVM fills the HFI data
of the pCPU that the vCPU is running on into the entry of the virtual
HFI table that corresponds to the vCPU's HFI table index.
This index is set by VMM in CPUID.
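The index mapping can be sketched as follows, reduced to a single class for brevity. The types and the function name are hypothetical; in the series itself the copy is done by intel_hfi_build_virt_table() from patch 02.

```c
#include <assert.h>
#include <stdint.h>

/* One class worth of capability data, as in a single-class HFI table. */
struct sketch_hfi_caps {
	uint8_t perf_cap;
	uint8_t ee_cap;
};

/* Hypothetical vCPU state relevant to HFI. */
struct sketch_vcpu {
	unsigned int hfi_index;	/* row in the virtual table, set by the VMM */
	unsigned int pcpu;	/* physical CPU the vCPU currently runs on */
};

/*
 * Copy the HFI data of the pCPU that @vcpu runs on into the row of the
 * virtual table selected by the vCPU's HFI table index.
 */
void sketch_hfi_fill_entry(struct sketch_hfi_caps *virt_table,
			   unsigned int nr_entries,
			   const struct sketch_hfi_caps *host_data,
			   unsigned int nr_pcpus,
			   const struct sketch_vcpu *vcpu)
{
	assert(virt_table && host_data && vcpu);
	assert(vcpu->hfi_index < nr_entries);
	assert(vcpu->pcpu < nr_pcpus);

	virt_table[vcpu->hfi_index] = host_data[vcpu->pcpu];
}
```

Note that the row index and the pCPU number are independent: a vCPU keeps its row for its whole lifetime, while the data stored in that row follows whichever pCPU the vCPU is scheduled on.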
4.3. HFI table updating
^^^^^^^^^^^^^^^^^^^^^^^
On some platforms, the HFI table will be dynamically updated with
thermal interrupts. In order to update the virtual HFI table in time, we
added the per-VM notifier to the HFI driver to notify KVM to update the
virtual HFI table for the VM, and then inject thermal interrupt into the
VM to notify the Guest.
There is another case that requires updating the virtual HFI table:
when a vCPU is migrated, the pCPU it runs on changes, and the
corresponding virtual HFI data should be updated to the new pCPU's
data. In this case, to reduce overhead, we update only the data of that
single vCPU without traversing the entire virtual HFI table.
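The single-row update with change detection can be sketched like this. It is loosely modelled on the loop in intel_hfi_build_virt_table() from patch 02, but the types, names, and the fixed 4-class row are assumptions for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CLASSES 4	/* assumption: HFI + ITD table with 4 classes */

struct sketch_hfi_caps {
	uint8_t perf_cap;
	uint8_t ee_cap;
};

/* Per-class "capability changed" flags kept in the table header. */
struct sketch_class_hdr {
	bool perf_updated;
	bool ee_updated;
};

/* One table row: the data of one (v)CPU for every class. */
struct sketch_row {
	struct sketch_hfi_caps cls[SKETCH_NR_CLASSES];
};

/*
 * Refresh the first @nr_classes classes of one vCPU's row from the row
 * of the pCPU it now runs on. Returns true if anything changed, so the
 * caller can decide whether the Guest needs to be notified.
 */
bool sketch_update_row(struct sketch_row *dst, const struct sketch_row *src,
		       struct sketch_class_hdr *hdr, unsigned int nr_classes)
{
	bool changed = false;

	assert(dst && src && hdr && nr_classes <= SKETCH_NR_CLASSES);

	for (unsigned int i = 0; i < nr_classes; i++) {
		if (dst->cls[i].perf_cap != src->cls[i].perf_cap) {
			dst->cls[i].perf_cap = src->cls[i].perf_cap;
			hdr[i].perf_updated = true;
			changed = true;
		}
		if (dst->cls[i].ee_cap != src->cls[i].ee_cap) {
			dst->cls[i].ee_cap = src->cls[i].ee_cap;
			hdr[i].ee_updated = true;
			changed = true;
		}
	}
	return changed;
}
```

Passing nr_classes = 1 models a Guest that has only HFI enabled (or has disabled ITD at runtime): only class 0 is refreshed and the other classes are left untouched.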
5. Patch Summary
================
Patch 01-03: Prepare the bit definition, the hfi helpers and hfi data
structures that KVM needs.
Patch 04-05: Add the sched_out arch hook and reset the classification
history at sched_in()/sched_out().
Patch 06-10: Add emulations of ACPI, TM and PTM, mainly about CPUID and
related MSRs.
Patch 11-20: Add the emulation support for HFI, including maintaining
the HFI table for VM.
Patch 21-23: Add the emulation support for ITD, including extending HFI
to ITD and passing through the classification MSRs.
Patch 24-25: Add HRESET emulation support, which is also used by IPC
classes feature.
Patch 26: Add the brief doc about the per-VM lock - pkg_therm_lock.
6. References
=============
[1]: [PATCH 0/9] thermal: intel: hfi: Prework for the virtualization of HFI
https://lore.kernel.org/lkml/20240203040515.23947-1-ricardo.neri-calderon@linux.intel.com/
[2]: [RFC 00/52] Introduce hybrid CPU topology,
https://lore.kernel.org/qemu-devel/20230213095035.158240-1-zhao1.liu@linux.intel.com/
[3]: [RFC 00/41] qom-topo: Abstract Everything about CPU Topology,
https://lore.kernel.org/qemu-devel/20231130144203.2307629-1-zhao1.liu@linux.intel.com/
[4]: SDM, vol. 3B, section 15.6 HARDWARE FEEDBACK INTERFACE AND INTEL
THREAD DIRECTOR
Thanks and Best Regards,
Zhao
---
Zhao Liu (17):
thermal: Add bit definition for x86 thermal related MSRs
KVM: Add kvm_arch_sched_out() hook
KVM: x86: Reset hardware history at vCPU's sched_in/out
KVM: VMX: Add helpers to handle the writes to MSR's R/O and R/WC0 bits
KVM: x86: cpuid: Define CPUID 0x06.eax by kvm_cpu_cap_mask()
KVM: VMX: Introduce HFI description structure
KVM: VMX: Introduce HFI table index for vCPU
KVM: x86: Introduce the HFI dynamic update request and kvm_x86_ops
KVM: VMX: Allow to inject thermal interrupt without HFI update
KVM: VMX: Emulate HFI related bits in package thermal MSRs
KVM: VMX: Emulate the MSRs of HFI feature
KVM: x86: Expose HFI feature bit and HFI info in CPUID
KVM: VMX: Extend HFI table and MSR emulation to support ITD
KVM: VMX: Pass through ITD classification related MSRs to Guest
KVM: x86: Expose ITD feature bit and related info in CPUID
KVM: VMX: Emulate the MSR of HRESET feature
Documentation: KVM: Add description of pkg_therm_lock
Zhuocheng Ding (9):
thermal: intel: hfi: Add helpers to build HFI/ITD structures
thermal: intel: hfi: Add HFI notifier helpers to notify HFI update
KVM: VMX: Emulate ACPI (CPUID.0x01.edx[bit 22]) feature
KVM: x86: Expose TM/ACC (CPUID.0x01.edx[bit 29]) feature bit to VM
KVM: VMX: Emulate PTM/PTS (CPUID.0x06.eax[bit 6]) feature
KVM: VMX: Support virtual HFI table for VM
KVM: VMX: Sync update of Host HFI table to Guest
KVM: VMX: Update HFI table when vCPU migrates
KVM: x86: Expose HRESET feature's CPUID to Guest
Documentation/virt/kvm/locking.rst | 13 +-
arch/arm64/include/asm/kvm_host.h | 1 +
arch/mips/include/asm/kvm_host.h | 1 +
arch/powerpc/include/asm/kvm_host.h | 1 +
arch/riscv/include/asm/kvm_host.h | 1 +
arch/s390/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/hfi.h | 28 ++
arch/x86/include/asm/kvm-x86-ops.h | 3 +-
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/include/asm/msr-index.h | 54 +-
arch/x86/kvm/cpuid.c | 201 +++++++-
arch/x86/kvm/irq.h | 1 +
arch/x86/kvm/lapic.c | 9 +
arch/x86/kvm/svm/svm.c | 8 +
arch/x86/kvm/vmx/vmx.c | 751 +++++++++++++++++++++++++++-
arch/x86/kvm/vmx/vmx.h | 79 ++-
arch/x86/kvm/x86.c | 18 +
drivers/thermal/intel/intel_hfi.c | 212 +++++++-
drivers/thermal/intel/therm_throt.c | 1 -
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 1 +
21 files changed, 1343 insertions(+), 44 deletions(-)
--
2.34.1
* [RFC 01/26] thermal: Add bit definition for x86 thermal related MSRs
From: Zhao Liu @ 2024-02-03 9:11 UTC
From: Zhao Liu <zhao1.liu@intel.com>
Add the definition of more bits of these MSRs:
* MSR_IA32_THERM_CONTROL
* MSR_IA32_THERM_INTERRUPT
* MSR_IA32_THERM_STATUS
* MSR_IA32_PACKAGE_THERM_STATUS
* MSR_IA32_PACKAGE_THERM_INTERRUPT
The virtualization of thermal events needs these extra definitions.
While here, regroup the definitions and use the BIT_ULL() and
GENMASK_ULL() macros to improve readability.
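For readers unfamiliar with these macros, the following userspace sketch re-creates them under SKETCH_ names (assumed to be equivalent to the kernel's BIT_ULL()/GENMASK_ULL() for in-range arguments) and uses two of the new definitions to decode a package thermal status value. The helper function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's BIT_ULL() and GENMASK_ULL(). */
#define SKETCH_BIT_ULL(nr)		(1ULL << (nr))
#define SKETCH_GENMASK_ULL(h, l) \
	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

/* Same bit positions as the definitions added by this patch. */
#define PACKAGE_THERM_STATUS_DIG_READOUT_MASK	SKETCH_GENMASK_ULL(22, 16)
#define PACKAGE_THERM_STATUS_HFI_UPDATED	SKETCH_BIT_ULL(26)

/* Compile-time check that the mask covers exactly bits 22:16. */
static_assert(PACKAGE_THERM_STATUS_DIG_READOUT_MASK == 0x7f0000ULL,
	      "digital readout mask must cover bits 22:16");

/* Extract the digital readout field (bits 22:16) from the MSR value. */
unsigned int sketch_pkg_therm_dig_readout(uint64_t msr_val)
{
	return (unsigned int)((msr_val & PACKAGE_THERM_STATUS_DIG_READOUT_MASK) >> 16);
}

/* Return 1 if the HFI-updated status bit (bit 26) is set, else 0. */
int sketch_pkg_therm_hfi_updated(uint64_t msr_val)
{
	return !!(msr_val & PACKAGE_THERM_STATUS_HFI_UPDATED);
}
```

Using the _ULL variants keeps every mask a 64-bit unsigned constant, which matches the width of an MSR value and avoids sign or width surprises when masks are combined or inverted.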
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/include/asm/msr-index.h | 54 +++++++++++++++++++----------
drivers/thermal/intel/therm_throt.c | 1 -
2 files changed, 35 insertions(+), 20 deletions(-)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 65b1bfb9c304..4f7ebfafa46a 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -829,17 +829,26 @@
#define MSR_IA32_MPERF 0x000000e7
#define MSR_IA32_APERF 0x000000e8
-#define MSR_IA32_THERM_CONTROL 0x0000019a
-#define MSR_IA32_THERM_INTERRUPT 0x0000019b
-
-#define THERM_INT_HIGH_ENABLE (1 << 0)
-#define THERM_INT_LOW_ENABLE (1 << 1)
-#define THERM_INT_PLN_ENABLE (1 << 24)
-
-#define MSR_IA32_THERM_STATUS 0x0000019c
+#define MSR_IA32_THERM_CONTROL 0x0000019a
+#define THERM_ON_DEM_CLO_MOD_DUTY_CYC_MASK GENMASK_ULL(3, 1)
+#define THERM_ON_DEM_CLO_MOD_ENABLE BIT_ULL(4)
-#define THERM_STATUS_PROCHOT (1 << 0)
-#define THERM_STATUS_POWER_LIMIT (1 << 10)
+#define MSR_IA32_THERM_INTERRUPT 0x0000019b
+#define THERM_INT_HIGH_ENABLE BIT_ULL(0)
+#define THERM_INT_LOW_ENABLE BIT_ULL(1)
+#define THERM_INT_PROCHOT_ENABLE BIT_ULL(2)
+#define THERM_INT_FORCEPR_ENABLE BIT_ULL(3)
+#define THERM_INT_CRITICAL_TEM_ENABLE BIT_ULL(4)
+#define THERM_INT_PLN_ENABLE BIT_ULL(24)
+
+#define MSR_IA32_THERM_STATUS 0x0000019c
+#define THERM_STATUS_PROCHOT BIT_ULL(0)
+#define THERM_STATUS_PROCHOT_LOG BIT_ULL(1)
+#define THERM_STATUS_PROCHOT_FORCEPR_EVENT BIT_ULL(2)
+#define THERM_STATUS_PROCHOT_FORCEPR_LOG BIT_ULL(3)
+#define THERM_STATUS_CRITICAL_TEMP BIT_ULL(4)
+#define THERM_STATUS_CRITICAL_TEMP_LOG BIT_ULL(5)
+#define THERM_STATUS_POWER_LIMIT BIT_ULL(10)
#define MSR_THERM2_CTL 0x0000019d
@@ -861,17 +870,24 @@
#define ENERGY_PERF_BIAS_POWERSAVE 15
#define MSR_IA32_PACKAGE_THERM_STATUS 0x000001b1
-
-#define PACKAGE_THERM_STATUS_PROCHOT (1 << 0)
-#define PACKAGE_THERM_STATUS_POWER_LIMIT (1 << 10)
-#define PACKAGE_THERM_STATUS_HFI_UPDATED (1 << 26)
+#define PACKAGE_THERM_STATUS_PROCHOT BIT_ULL(0)
+#define PACKAGE_THERM_STATUS_PROCHOT_LOG BIT_ULL(1)
+#define PACKAGE_THERM_STATUS_PROCHOT_EVENT BIT_ULL(2)
+#define PACKAGE_THERM_STATUS_PROCHOT_EVENT_LOG BIT_ULL(3)
+#define PACKAGE_THERM_STATUS_CRITICAL_TEMP BIT_ULL(4)
+#define PACKAGE_THERM_STATUS_CRITICAL_TEMP_LOG BIT_ULL(5)
+#define PACKAGE_THERM_STATUS_POWER_LIMIT BIT_ULL(10)
+#define PACKAGE_THERM_STATUS_POWER_LIMIT_LOG BIT_ULL(11)
+#define PACKAGE_THERM_STATUS_DIG_READOUT_MASK GENMASK_ULL(22, 16)
+#define PACKAGE_THERM_STATUS_HFI_UPDATED BIT_ULL(26)
#define MSR_IA32_PACKAGE_THERM_INTERRUPT 0x000001b2
-
-#define PACKAGE_THERM_INT_HIGH_ENABLE (1 << 0)
-#define PACKAGE_THERM_INT_LOW_ENABLE (1 << 1)
-#define PACKAGE_THERM_INT_PLN_ENABLE (1 << 24)
-#define PACKAGE_THERM_INT_HFI_ENABLE (1 << 25)
+#define PACKAGE_THERM_INT_HIGH_ENABLE BIT_ULL(0)
+#define PACKAGE_THERM_INT_LOW_ENABLE BIT_ULL(1)
+#define PACKAGE_THERM_INT_PROCHOT_ENABLE BIT_ULL(2)
+#define PACKAGE_THERM_INT_OVERHEAT_ENABLE BIT_ULL(4)
+#define PACKAGE_THERM_INT_PLN_ENABLE BIT_ULL(24)
+#define PACKAGE_THERM_INT_HFI_ENABLE BIT_ULL(25)
/* Thermal Thresholds Support */
#define THERM_INT_THRESHOLD0_ENABLE (1 << 15)
diff --git a/drivers/thermal/intel/therm_throt.c b/drivers/thermal/intel/therm_throt.c
index e69868e868eb..4c72fee32bf2 100644
--- a/drivers/thermal/intel/therm_throt.c
+++ b/drivers/thermal/intel/therm_throt.c
@@ -191,7 +191,6 @@ static const struct attribute_group thermal_attr_group = {
#endif /* CONFIG_SYSFS */
#define THERM_THROT_POLL_INTERVAL HZ
-#define THERM_STATUS_PROCHOT_LOG BIT(1)
static u64 therm_intr_core_clear_mask;
static u64 therm_intr_pkg_clear_mask;
--
2.34.1
* [RFC 02/26] thermal: intel: hfi: Add helpers to build HFI/ITD structures
From: Zhao Liu @ 2024-02-03 9:11 UTC
From: Zhuocheng Ding <zhuocheng.ding@intel.com>
Virtual machines need to compose their own HFI tables. Provide helper
functions that collect the relevant features and data from the host
machine.
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/include/asm/hfi.h | 20 ++++
drivers/thermal/intel/intel_hfi.c | 149 ++++++++++++++++++++++++++++++
2 files changed, 169 insertions(+)
diff --git a/arch/x86/include/asm/hfi.h b/arch/x86/include/asm/hfi.h
index b7fda3e0e8c8..e0fe5b30fb53 100644
--- a/arch/x86/include/asm/hfi.h
+++ b/arch/x86/include/asm/hfi.h
@@ -82,4 +82,24 @@ struct hfi_features {
unsigned int hdr_size;
};
+#if defined(CONFIG_INTEL_HFI_THERMAL)
+int intel_hfi_max_instances(void);
+int intel_hfi_build_virt_features(struct hfi_features *features, unsigned int nr_classes,
+ unsigned int nr_entries);
+int intel_hfi_build_virt_table(struct hfi_table *table, struct hfi_features *features,
+ unsigned int nr_classes, unsigned int hfi_index,
+ unsigned int cpu);
+static inline bool intel_hfi_enabled(void) { return intel_hfi_max_instances() > 0; }
+#else
+static inline int intel_hfi_max_instances(void) { return 0; }
+static inline int intel_hfi_build_virt_features(struct hfi_features *features,
+ unsigned int nr_classes,
+ unsigned int nr_entries) { return 0; }
+static inline int intel_hfi_build_virt_table(struct hfi_table *table,
+ struct hfi_features *features,
+ unsigned int nr_classes, unsigned int hfi_index,
+ unsigned int cpu) { return 0; }
+static inline bool intel_hfi_enabled(void) { return false; }
+#endif
+
#endif /* _ASM_X86_HFI_H */
diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index b69fa234b317..139ce2d4b26b 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -29,6 +29,7 @@
#include <linux/io.h>
#include <linux/kernel.h>
#include <linux/math.h>
+#include <linux/mm.h>
#include <linux/mutex.h>
#include <linux/percpu-defs.h>
#include <linux/printk.h>
@@ -642,3 +643,151 @@ void __init intel_hfi_init(void)
kfree(hfi_instances);
hfi_instances = NULL;
}
+
+/**
+ * intel_hfi_max_instances() - Get the maximum number of hfi instances.
+ *
+ * Return: the maximum number of hfi instances.
+ */
+int intel_hfi_max_instances(void)
+{
+ return max_hfi_instances;
+}
+EXPORT_SYMBOL_GPL(intel_hfi_max_instances);
+
+/**
+ * intel_hfi_build_virt_features() - Build a virtual hfi_features structure.
+ *
+ * @features: Feature structure need to be filled
+ * @nr_classes: Maximum number of classes supported. 1 class indicates
+ * only HFI feature is configured and 4 classes indicates
+ * both HFI and ITD features.
+ * @nr_entries: Number of HFI entries in HFI table.
+ *
+ * Fill a virtual hfi_features structure which is used for HFI/ITD virtualization.
+ * HFI and ITD have different feature information, and the virtual feature
+ * structure is based on the corresponding configured number of classes (in Guest
+ * CPUID) to be built.
+ *
+ * Return: -EINVAL if there's the error for the parameters, otherwise 0.
+ */
+int intel_hfi_build_virt_features(struct hfi_features *features,
+ unsigned int nr_classes,
+ unsigned int nr_entries)
+{
+ unsigned int data_size;
+
+ if (!features || !nr_classes || !nr_entries)
+ return -EINVAL;
+
+ /*
+ * The virtual feature must be based on the Host's feature; when Host
+ * enables both HFI and ITD, it is allowed for Guest to create only the
+ * HFI feature structure which has fewer classes than ITD.
+ */
+ if (nr_classes > hfi_features.nr_classes)
+ return -EINVAL;
+
+ features->nr_classes = nr_classes;
+ features->class_stride = hfi_features.class_stride;
+ /*
+ * For the meaning of these two calculations, please refer to the comments
+ * in hfi_parse_features().
+ */
+ features->hdr_size = DIV_ROUND_UP(features->class_stride *
+ features->nr_classes, 8) * 8;
+ features->cpu_stride = DIV_ROUND_UP(features->class_stride *
+ features->nr_classes, 8) * 8;
+
+ data_size = features->hdr_size + nr_entries * features->cpu_stride;
+ features->nr_table_pages = PAGE_ALIGN(data_size) >> PAGE_SHIFT;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(intel_hfi_build_virt_features);
+
+/**
+ * intel_hfi_build_virt_table() - Fill the data of @hfi_index in virtual HFI table.
+ *
+ * @table: HFI table to be filled
+ * @features: Configured feature information of the HFI table
+ * @nr_classes: Number of classes to be updated for @table. This field is
+ * based on the enabled feature, which may be different with
+ * the feature information configured in @features.
+ * @hfi_index: Index of the HFI data in HFI table to be filled
+ * @cpu: CPU whose real HFI data is used to fill the @hfi_index
+ *
+ * Fill the row data of hfi_index in a virtual HFI table which is used for HFI/ITD
+ * virtualization. The size of the virtual HFI table is decided by the configured
+ * feature information in @features, and the filled HFI data range is decided by
+ * specified number of classes @nr_classes.
+ *
+ * Virtual machine may disable ITD at runtime through MSR_IA32_HW_FEEDBACK_CONFIG,
+ * in this case, only 1 class data (class 0) can be dynamically updated in virtual
+ * HFI table (class 0).
+ *
+ * Return: 1 if the @table is changed, 0 if the @table isn't changed, and
+ * -EINVAL/-ENOMEM if there's the error for the parameters.
+ */
+int intel_hfi_build_virt_table(struct hfi_table *table,
+ struct hfi_features *features,
+ unsigned int nr_classes,
+ unsigned int hfi_index,
+ unsigned int cpu)
+{
+ struct hfi_instance *hfi_instance;
+ struct hfi_hdr *hfi_hdr;
+ s16 host_hfi_index;
+ void *src_ptr, *dst_ptr;
+ int table_changed = 0;
+
+ if (!table || !features || !nr_classes)
+ return -EINVAL;
+ hfi_hdr = table->hdr; /* dereference @table only after the NULL check */
+ if (nr_classes > features->nr_classes ||
+ nr_classes > hfi_features.nr_classes)
+ return -EINVAL;
+
+ /*
+ * Make sure that this row that will be filled doesn't cause overflow.
+ * features->nr_classes indicates the maximum number of possible
+ * classes.
+ */
+ if (features->hdr_size + (hfi_index + 1) * features->cpu_stride >
+ features->nr_table_pages << PAGE_SHIFT)
+ return -ENOMEM;
+
+ if (cpu >= nr_cpu_ids)
+ return -EINVAL;
+
+ if (features->class_stride != hfi_features.class_stride)
+ return -EINVAL;
+
+ hfi_instance = per_cpu(hfi_cpu_info, cpu).hfi_instance;
+ host_hfi_index = per_cpu(hfi_cpu_info, cpu).index;
+
+ src_ptr = hfi_instance->local_table.data +
+ host_hfi_index * hfi_features.cpu_stride;
+ dst_ptr = table->data + hfi_index * features->cpu_stride;
+
+ raw_spin_lock_irq(&hfi_instance->table_lock);
+ for (int i = 0; i < nr_classes; i++) {
+ struct hfi_cpu_data *src = src_ptr + i * hfi_features.class_stride;
+ struct hfi_cpu_data *dst = dst_ptr + i * features->class_stride;
+
+ if (dst->perf_cap != src->perf_cap) {
+ dst->perf_cap = src->perf_cap;
+ hfi_hdr->perf_updated = 1;
+ }
+ if (dst->ee_cap != src->ee_cap) {
+ dst->ee_cap = src->ee_cap;
+ hfi_hdr->ee_updated = 1;
+ }
+ if (hfi_hdr->perf_updated || hfi_hdr->ee_updated)
+ table_changed = 1;
+ hfi_hdr++;
+ }
+ raw_spin_unlock_irq(&hfi_instance->table_lock);
+
+ return table_changed;
+}
+EXPORT_SYMBOL_GPL(intel_hfi_build_virt_table);
--
2.34.1
* [RFC 03/26] thermal: intel: hfi: Add HFI notifier helpers to notify HFI update
From: Zhao Liu @ 2024-02-03 9:11 UTC
From: Zhuocheng Ding <zhuocheng.ding@intel.com>
KVM builds virtual HFI tables for virtual machines and needs to keep
them in sync with updates to the Host's HFI table in a timely manner.
Add a notifier chain to the HFI instance to notify other modules about
HFI table updates, and provide 2 helpers to register/unregister a
notifier hook in the HFI driver.
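The notifier pattern can be sketched in userspace as below. This is a simplified stand-in for the kernel's raw notifier chain (no priorities, no return-value handling), and every name in it is hypothetical. It shows how a consumer such as KVM could embed a notifier in its per-VM state and be called back on each HFI table update.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct notifier_block. */
struct sketch_notifier {
	int (*call)(struct sketch_notifier *nb, unsigned long event, void *data);
	struct sketch_notifier *next;
};

/* Simplified stand-in for a notifier chain head, one per HFI instance. */
struct sketch_notifier_head {
	struct sketch_notifier *first;
};

/* Add @nb at the front of the chain. */
void sketch_notifier_register(struct sketch_notifier_head *head,
			      struct sketch_notifier *nb)
{
	assert(head && nb && nb->call);
	nb->next = head->first;
	head->first = nb;
}

/* Remove @nb from the chain; returns 0 on success, -1 if not found. */
int sketch_notifier_unregister(struct sketch_notifier_head *head,
			       struct sketch_notifier *nb)
{
	for (struct sketch_notifier **p = &head->first; *p; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			nb->next = NULL;
			return 0;
		}
	}
	return -1;
}

/* Invoke every registered callback; returns how many were called. */
unsigned int sketch_notifier_call_chain(struct sketch_notifier_head *head,
					unsigned long event, void *data)
{
	unsigned int called = 0;

	for (struct sketch_notifier *nb = head->first; nb; nb = nb->next) {
		nb->call(nb, event, data);
		called++;
	}
	return called;
}

/* Hypothetical consumer: per-VM state with an embedded notifier. */
struct sketch_vm {
	unsigned int pending_hfi_updates;
	struct sketch_notifier hfi_nb;
};

/* Callback: recover the VM from the embedded notifier and record the update. */
int sketch_vm_hfi_notify(struct sketch_notifier *nb, unsigned long event,
			 void *data)
{
	struct sketch_vm *vm = (struct sketch_vm *)
		((char *)nb - offsetof(struct sketch_vm, hfi_nb));

	(void)event;
	(void)data;
	vm->pending_hfi_updates++;
	return 0;
}
```

In this model, the HFI driver side would call sketch_notifier_call_chain() after refreshing its local table, which corresponds to the raw_notifier_call_chain() call this patch adds to hfi_update_work_fn().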
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/include/asm/hfi.h | 8 ++++
drivers/thermal/intel/intel_hfi.c | 63 ++++++++++++++++++++++++++++---
2 files changed, 65 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/hfi.h b/arch/x86/include/asm/hfi.h
index e0fe5b30fb53..19e3e5a7fb77 100644
--- a/arch/x86/include/asm/hfi.h
+++ b/arch/x86/include/asm/hfi.h
@@ -90,6 +90,10 @@ int intel_hfi_build_virt_table(struct hfi_table *table, struct hfi_features *fea
unsigned int nr_classes, unsigned int hfi_index,
unsigned int cpu);
static inline bool intel_hfi_enabled(void) { return intel_hfi_max_instances() > 0; }
+int intel_hfi_notifier_register(struct notifier_block *notifier,
+ unsigned int cpu);
+int intel_hfi_notifier_unregister(struct notifier_block *notifier,
+ unsigned int cpu);
#else
static inline int intel_hfi_max_instances(void) { return 0; }
static inline int intel_hfi_build_virt_features(struct hfi_features *features,
@@ -100,6 +104,10 @@ static inline int intel_hfi_build_virt_table(struct hfi_table *table,
unsigned int nr_classes, unsigned int hfi_index,
unsigned int cpu) { return 0; }
static inline bool intel_hfi_enabled(void) { return false; }
+static inline int intel_hfi_notifier_register(struct notifier_block *notifier,
+ unsigned int cpu) { return -ENODEV; }
+static inline int intel_hfi_notifier_unregister(struct notifier_block *notifier,
+ unsigned int cpu) { return -ENODEV; }
#endif
#endif /* _ASM_X86_HFI_H */
diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index 139ce2d4b26b..330b264ca23d 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -72,18 +72,20 @@ struct hfi_cpu_data {
* @cpus: CPUs represented in this HFI table instance
* @hw_table: Pointer to the HFI table of this instance
* @update_work: Delayed work to process HFI updates
+ * @notifier_chain: Notification chain dedicated to this instance
* @table_lock: Lock to protect acceses to the table of this instance
* @event_lock: Lock to process HFI interrupts
*
* A set of parameters to parse and navigate a specific HFI table.
*/
struct hfi_instance {
- struct hfi_table local_table;
- cpumask_var_t cpus;
- void *hw_table;
- struct delayed_work update_work;
- raw_spinlock_t table_lock;
- raw_spinlock_t event_lock;
+ struct hfi_table local_table;
+ cpumask_var_t cpus;
+ void *hw_table;
+ struct delayed_work update_work;
+ struct raw_notifier_head notifier_chain;
+ raw_spinlock_t table_lock;
+ raw_spinlock_t event_lock;
};
/**
@@ -189,6 +191,7 @@ static void hfi_update_work_fn(struct work_struct *work)
update_work);
update_capabilities(hfi_instance);
+ raw_notifier_call_chain(&hfi_instance->notifier_chain, 0, NULL);
}
void intel_hfi_process_event(__u64 pkg_therm_status_msr_val)
@@ -448,6 +451,7 @@ void intel_hfi_online(unsigned int cpu)
init_hfi_instance(hfi_instance);
INIT_DELAYED_WORK(&hfi_instance->update_work, hfi_update_work_fn);
+ RAW_INIT_NOTIFIER_HEAD(&hfi_instance->notifier_chain);
raw_spin_lock_init(&hfi_instance->table_lock);
raw_spin_lock_init(&hfi_instance->event_lock);
@@ -791,3 +795,50 @@ int intel_hfi_build_virt_table(struct hfi_table *table,
return table_changed;
}
EXPORT_SYMBOL_GPL(intel_hfi_build_virt_table);
+
+/**
+ * intel_hfi_notifier_register() - Register @notifier hook at @hfi_instance.
+ *
+ * @notifier: HFI notifier hook to be registered
+ * @cpu: CPU whose HFI instance the notifier is registered at
+ *
+ * When the HFI instance of @cpu receives HFI interrupt and updates its local
+ * HFI table, the registered HFI notifier will be called.
+ *
+ * Return: 0 if successful, otherwise error.
+ */
+int intel_hfi_notifier_register(struct notifier_block *notifier,
+ unsigned int cpu)
+{
+ struct hfi_instance *hfi_instance;
+
+ if (!notifier || cpu >= nr_cpu_ids)
+ return -EINVAL;
+
+ hfi_instance = per_cpu(hfi_cpu_info, cpu).hfi_instance;
+ return raw_notifier_chain_register(&hfi_instance->notifier_chain,
+ notifier);
+}
+EXPORT_SYMBOL_GPL(intel_hfi_notifier_register);
+
+/**
+ * intel_hfi_notifier_unregister() - Unregister @notifier hook at @hfi_instance
+ *
+ * @notifier: HFI notifier hook to be unregistered
+ * @cpu: CPU whose HFI instance the notifier is unregistered from
+ *
+ * Return: 0 if successful, otherwise error.
+ */
+int intel_hfi_notifier_unregister(struct notifier_block *notifier,
+ unsigned int cpu)
+{
+ struct hfi_instance *hfi_instance;
+
+ if (!notifier || cpu >= nr_cpu_ids)
+ return -EINVAL;
+
+ hfi_instance = per_cpu(hfi_cpu_info, cpu).hfi_instance;
+ return raw_notifier_chain_unregister(&hfi_instance->notifier_chain,
+ notifier);
+}
+EXPORT_SYMBOL_GPL(intel_hfi_notifier_unregister);
--
2.34.1
* [RFC 04/26] KVM: Add kvm_arch_sched_out() hook
From: Zhao Liu @ 2024-02-03 9:11 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhao Liu <zhao1.liu@intel.com>
x86 needs to reset the classification history when a vCPU is scheduled
out. Add the kvm_arch_sched_out() hook to allow x86 to implement its own
history reset logic at sched_out.
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/mips/include/asm/kvm_host.h | 1 +
arch/powerpc/include/asm/kvm_host.h | 1 +
arch/riscv/include/asm/kvm_host.h | 1 +
arch/s390/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/kvm_host.h | 2 ++
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 1 +
8 files changed, 9 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 21c57b812569..a7898fceb761 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1127,6 +1127,7 @@ static inline bool kvm_system_needs_idmapped_vectors(void)
static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu) {}
void kvm_arm_init_debug(void);
void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 179f320cc231..2bcd462db11a 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -891,6 +891,7 @@ static inline void kvm_arch_free_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot) {}
static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 8abac532146e..96bcf62439b2 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -898,6 +898,7 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 484d04a92fa6..a395a366f034 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -273,6 +273,7 @@ struct kvm_vcpu_arch {
static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu) {}
#define KVM_RISCV_GSTAGE_TLB_MIN_ORDER 12
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 52664105a473..6e03188d11b0 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -1045,6 +1045,7 @@ extern int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc);
static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_free_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot) {}
static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b5b2d0fde579..2be78549bec8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2280,6 +2280,8 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu) {}
+
#define KVM_CLOCK_VALID_FLAGS \
(KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7e7fd25b09b3..3aabd3813de0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1478,6 +1478,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu);
void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu);
+void kvm_arch_sched_out(struct kvm_vcpu *vcpu);
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 10bfc88a69f7..671f88dff006 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6317,6 +6317,7 @@ static void kvm_sched_out(struct preempt_notifier *pn,
WRITE_ONCE(vcpu->ready, true);
}
kvm_arch_vcpu_put(vcpu);
+ kvm_arch_sched_out(vcpu);
__this_cpu_write(kvm_running_vcpu, NULL);
}
--
2.34.1
* [RFC 05/26] KVM: x86: Reset hardware history at vCPU's sched_in/out
From: Zhao Liu @ 2024-02-03 9:11 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhao Liu <zhao1.liu@intel.com>
Reset the classification history of the vCPU thread when it's scheduled
in and scheduled out, so that hardware starts classifying the vCPU
thread from scratch.
This prevents Host history from leaking to VMs and VM history from
leaking to sibling VMs.
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/include/asm/kvm_host.h | 2 --
arch/x86/kvm/x86.c | 8 ++++++++
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2be78549bec8..b5b2d0fde579 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2280,8 +2280,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
-static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu) {}
-
#define KVM_CLOCK_VALID_FLAGS \
(KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 363b1c080205..cd9a7251c768 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -79,6 +79,7 @@
#include <asm/div64.h>
#include <asm/irq_remapping.h>
#include <asm/mshyperv.h>
+#include <asm/hreset.h>
#include <asm/hypervisor.h>
#include <asm/tlbflush.h>
#include <asm/intel_pt.h>
@@ -12491,9 +12492,16 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
pmu->need_cleanup = true;
kvm_make_request(KVM_REQ_PMU, vcpu);
}
+
+ reset_hardware_history();
static_call(kvm_x86_sched_in)(vcpu, cpu);
}
+void kvm_arch_sched_out(struct kvm_vcpu *vcpu)
+{
+ reset_hardware_history();
+}
+
void kvm_arch_free_vm(struct kvm *kvm)
{
#if IS_ENABLED(CONFIG_HYPERV)
--
2.34.1
* [RFC 06/26] KVM: VMX: Add helpers to handle the writes to MSR's R/O and R/WC0 bits
From: Zhao Liu @ 2024-02-03 9:11 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhao Liu <zhao1.liu@intel.com>
For WRMSR emulation, any write to an R/O bit and any nonzero write to an
R/WC0 bit must be ignored.
Provide 2 helpers to emulate the above R/O and R/WC0 write behavior.
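As a rough illustration of the two write behaviors, the standalone
user-space sketch below applies the same merge logic to plain integers.
The names demo_ro_merge() and demo_rwc0_merge() and the example bit layout
are hypothetical and exist only for this illustration. The R/WC0 expression
is written in a shorter form that should be equivalent to the one in the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* R/O bits keep their old value; all other bits take the written value. */
static inline uint64_t demo_ro_merge(uint64_t new_val, uint64_t old_val,
				     uint64_t ro_mask)
{
	return (new_val & ~ro_mask) | (old_val & ro_mask);
}

/*
 * An R/WC0 bit can only be cleared by writing 0. Writing 1 leaves it
 * unchanged, so for those bits the result is (new & old).
 */
static inline uint64_t demo_rwc0_merge(uint64_t new_val, uint64_t old_val,
				       uint64_t rwc0_mask)
{
	return (new_val & old_val & rwc0_mask) | (new_val & ~rwc0_mask);
}

/* Made-up layout for the example: bit 0 is R/O, bit 1 is R/WC0, bit 2 is R/W. */
static inline void demo_msr_write_example(void)
{
	const uint64_t ro = 0x1, rwc0 = 0x2;

	/* Writing 0 cannot clear a set R/O bit. */
	assert(demo_ro_merge(0x0, 0x1, ro) == 0x1);
	/* Writing 1 cannot set a clear R/WC0 bit... */
	assert(demo_rwc0_merge(0x2, 0x0, rwc0) == 0x0);
	/* ...but writing 0 clears a set one. */
	assert(demo_rwc0_merge(0x0, 0x2, rwc0) == 0x0);
	/* Ordinary R/W bits (bit 2 here) pass through untouched. */
	assert(demo_rwc0_merge(0x4, 0x0, rwc0) == 0x4);
}
```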
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e262bc2ba4e5..8f5981635fe5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2147,6 +2147,20 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
return debugctl;
}
+/* Ignore writes to R/O bits. */
+static inline u64 vmx_set_msr_ro_bits(u64 new_val, u64 old_val, u64 ro_mask)
+{
+ return (new_val & ~ro_mask) | (old_val & ro_mask);
+}
+
+/* Ignore non-zero writes to R/WC0 bits. */
+static inline u64 vmx_set_msr_rwc0_bits(u64 new_val, u64 old_val, u64 rwc0_mask)
+{
+ u64 new_rwc0 = new_val & rwc0_mask, old_rwc0 = old_val & rwc0_mask;
+
+ return ((new_rwc0 | ~old_rwc0) & old_rwc0) | (new_val & ~rwc0_mask);
+}
+
/*
* Writes msr value into the appropriate "register".
* Returns 0 on success, non-0 otherwise.
--
2.34.1
* [RFC 07/26] KVM: VMX: Emulate ACPI (CPUID.0x01.edx[bit 22]) feature
From: Zhao Liu @ 2024-02-03 9:11 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhuocheng Ding <zhuocheng.ding@intel.com>
The ACPI (Thermal Monitor and Software Controlled Clock Facilities)
feature is a dependency of thermal interrupt processing, so it is
required for handling the HFI notification (a thermal interrupt).
To let the VM handle thermal interrupts, emulate the ACPI feature in
KVM:
1. Emulate MSR_IA32_THERM_CONTROL (alias, IA32_CLOCK_MODULATION),
MSR_IA32_THERM_INTERRUPT and MSR_IA32_THERM_STATUS with dummy values.
According to SDM [1], the ACPI feature means:
"The ACPI flag (bit 22) of the CPUID feature flags indicates the
presence of the IA32_THERM_STATUS, IA32_THERM_INTERRUPT,
IA32_CLOCK_MODULATION MSRs, and the xAPIC thermal LVT entry."
It is enough to use dummy values in KVM to emulate the RDMSR/WRMSR on
them.
2. Add the thermal interrupt injection interface.
This interface completes the ACPI emulation. Although thermal interrupts
are not actually injected into the Guest yet, the following HFI/ITD
emulation patches will inject them into the Guest once the conditions
are met.
3. Additionally, expose the CPUID bit of the ACPI feature to the VM,
which can help enable thermal interrupt handling in the VM.
[1]: SDM, vol. 3B, section 15.8.4.1, Detection of Software Controlled
Clock Modulation Extension.
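The access rules for these dummy MSRs can be sketched in user space as
follows, using the clock modulation register as the example. This is a
simplified model, not the KVM code: struct demo_vcpu and the demo_*
functions are hypothetical, and the bit positions (duty cycle in bits 3:1,
enable in bit 4) are an assumption based on the SDM layout of
IA32_CLOCK_MODULATION.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed IA32_CLOCK_MODULATION layout: bits 3:1 duty cycle, bit 4 enable. */
#define DEMO_CLOCK_MOD_DUTY_MASK	(UINT64_C(0x7) << 1)
#define DEMO_CLOCK_MOD_ENABLE		(UINT64_C(1) << 4)
#define DEMO_THERM_CONTROL_AVAIL	(DEMO_CLOCK_MOD_ENABLE | \
					 DEMO_CLOCK_MOD_DUTY_MASK)

/* Hypothetical per-vCPU state: guest-visible ACPI flag and the dummy value. */
struct demo_vcpu {
	bool guest_has_acpi;
	uint64_t therm_control;
};

/*
 * Model of a write to the emulated register. Returns 0 on success and 1
 * when the access should fault, matching the convention used in the diff.
 * Host-initiated writes skip both the CPUID check and the bit check.
 */
static inline int demo_set_therm_control(struct demo_vcpu *vcpu,
					 uint64_t data, bool host_initiated)
{
	if (!host_initiated && !vcpu->guest_has_acpi)
		return 1;
	if (!host_initiated && (data & ~DEMO_THERM_CONTROL_AVAIL))
		return 1;
	vcpu->therm_control = data;
	return 0;
}

static inline void demo_therm_control_example(void)
{
	struct demo_vcpu vcpu = { .guest_has_acpi = true, .therm_control = 0 };
	const uint64_t val = DEMO_CLOCK_MOD_ENABLE | (UINT64_C(2) << 1);

	/* Enable modulation with a duty-cycle field of 2: accepted. */
	assert(demo_set_therm_control(&vcpu, val, false) == 0);
	/* Bit 0 is outside the available mask: the write faults. */
	assert(demo_set_therm_control(&vcpu, 0x1, false) == 1);
	/* The rejected write left the stored value unchanged. */
	assert(vcpu.therm_control == val);
}
```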
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/cpuid.c | 2 +-
arch/x86/kvm/irq.h | 1 +
arch/x86/kvm/lapic.c | 9 ++++
arch/x86/kvm/svm/svm.c | 3 ++
arch/x86/kvm/vmx/vmx.c | 94 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 3 ++
arch/x86/kvm/x86.c | 3 ++
7 files changed, 114 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index adba49afb5fe..1ad547651022 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -623,7 +623,7 @@ void kvm_set_cpu_caps(void)
F(CX8) | F(APIC) | 0 /* Reserved */ | F(SEP) |
F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
F(PAT) | F(PSE36) | 0 /* PSN */ | F(CLFLUSH) |
- 0 /* Reserved, DS, ACPI */ | F(MMX) |
+ 0 /* Reserved, DS */ | F(ACPI) | F(MMX) |
F(FXSR) | F(XMM) | F(XMM2) | F(SELFSNOOP) |
0 /* HTT, TM, Reserved, PBE */
);
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index c2d7cfe82d00..e11c1fb6e1e6 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -99,6 +99,7 @@ static inline int irqchip_in_kernel(struct kvm *kvm)
void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
+void kvm_apic_therm_deliver(struct kvm_vcpu *vcpu);
void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu);
void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu);
void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 3242f3da2457..af8572798976 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2783,6 +2783,15 @@ void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu)
kvm_apic_local_deliver(apic, APIC_LVT0);
}
+void kvm_apic_therm_deliver(struct kvm_vcpu *vcpu)
+{
+ struct kvm_lapic *apic = vcpu->arch.apic;
+
+ if (apic)
+ kvm_apic_local_deliver(apic, APIC_LVTTHMR);
+}
+EXPORT_SYMBOL_GPL(kvm_apic_therm_deliver);
+
static const struct kvm_io_device_ops apic_mmio_ops = {
.read = apic_mmio_read,
.write = apic_mmio_write,
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e90b429c84f1..2e22d5e86768 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4288,6 +4288,9 @@ static bool svm_has_emulated_msr(struct kvm *kvm, u32 index)
switch (index) {
case MSR_IA32_MCG_EXT_CTL:
case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
+ case MSR_IA32_THERM_CONTROL:
+ case MSR_IA32_THERM_INTERRUPT:
+ case MSR_IA32_THERM_STATUS:
return false;
case MSR_IA32_SMBASE:
if (!IS_ENABLED(CONFIG_KVM_SMM))
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8f5981635fe5..aa37b55cf045 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -157,6 +157,32 @@ module_param(allow_smaller_maxphyaddr, bool, S_IRUGO);
RTIT_STATUS_ERROR | RTIT_STATUS_STOPPED | \
RTIT_STATUS_BYTECNT))
+/*
+ * TM2 (CPUID.01H:ECX[8]), DTHERM (CPUID.06H:EAX[0]), PLN (CPUID.06H:EAX[4]),
+ * and HWP (CPUID.06H:EAX[7]) are not emulated in kvm.
+ */
+#define MSR_IA32_THERM_STATUS_RO_MASK (THERM_STATUS_PROCHOT | \
+ THERM_STATUS_PROCHOT_FORCEPR_EVENT | THERM_STATUS_CRITICAL_TEMP)
+#define MSR_IA32_THERM_STATUS_RWC0_MASK (THERM_STATUS_PROCHOT_LOG | \
+ THERM_STATUS_PROCHOT_FORCEPR_LOG | THERM_STATUS_CRITICAL_TEMP_LOG)
+/* MSR_IA32_THERM_STATUS unavailable bits mask: unsupported and reserved bits. */
+#define MSR_IA32_THERM_STATUS_UNAVAIL_MASK (~(MSR_IA32_THERM_STATUS_RO_MASK | \
+ MSR_IA32_THERM_STATUS_RWC0_MASK))
+
+/* ECMD (CPUID.06H:EAX[5]) is not emulated in kvm. */
+#define MSR_IA32_THERM_CONTROL_AVAIL_MASK (THERM_ON_DEM_CLO_MOD_ENABLE | \
+ THERM_ON_DEM_CLO_MOD_DUTY_CYC_MASK)
+
+/*
+ * MSR_IA32_THERM_INTERRUPT available bits mask.
+ * PLN (CPUID.06H:EAX[4]) and HFN (CPUID.06H:EAX[24]) are not emulated in kvm.
+ */
+#define MSR_IA32_THERM_INTERRUPT_AVAIL_MASK (THERM_INT_HIGH_ENABLE | \
+ THERM_INT_LOW_ENABLE | THERM_INT_PROCHOT_ENABLE | \
+ THERM_INT_FORCEPR_ENABLE | THERM_INT_CRITICAL_TEM_ENABLE | \
+ THERM_MASK_THRESHOLD0 | THERM_INT_THRESHOLD0_ENABLE | \
+ THERM_MASK_THRESHOLD1 | THERM_INT_THRESHOLD1_ENABLE)
+
/*
* List of MSRs that can be directly passed to the guest.
* In addition to these x2apic and PT MSRs are handled specially.
@@ -1470,6 +1496,19 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
}
}
+static void vmx_inject_therm_interrupt(struct kvm_vcpu *vcpu)
+{
+ /*
+ * From SDM, the ACPI flag also indicates the presence of the
+ * xAPIC thermal LVT entry.
+ */
+ if (!guest_cpuid_has(vcpu, X86_FEATURE_ACPI))
+ return;
+
+ if (irqchip_in_kernel(vcpu->kvm))
+ kvm_apic_therm_deliver(vcpu);
+}
+
/*
* Switches to specified vcpu, until a matching vcpu_put(), but assumes
* vcpu mutex is already taken.
@@ -2109,6 +2148,24 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_DEBUGCTLMSR:
msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
break;
+ case MSR_IA32_THERM_CONTROL:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_ACPI))
+ return 1;
+ msr_info->data = vmx->msr_ia32_therm_control;
+ break;
+ case MSR_IA32_THERM_INTERRUPT:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_ACPI))
+ return 1;
+ msr_info->data = vmx->msr_ia32_therm_interrupt;
+ break;
+ case MSR_IA32_THERM_STATUS:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_ACPI))
+ return 1;
+ msr_info->data = vmx->msr_ia32_therm_status;
+ break;
default:
find_uret_msr:
msr = vmx_find_uret_msr(vmx, msr_info->index);
@@ -2452,6 +2509,40 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
}
ret = kvm_set_msr_common(vcpu, msr_info);
break;
+ case MSR_IA32_THERM_CONTROL:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_ACPI))
+ return 1;
+ if (!msr_info->host_initiated &&
+ data & ~MSR_IA32_THERM_CONTROL_AVAIL_MASK)
+ return 1;
+ vmx->msr_ia32_therm_control = data;
+ break;
+ case MSR_IA32_THERM_INTERRUPT:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_ACPI))
+ return 1;
+ if (!msr_info->host_initiated &&
+ data & ~MSR_IA32_THERM_INTERRUPT_AVAIL_MASK)
+ return 1;
+ vmx->msr_ia32_therm_interrupt = data;
+ break;
+ case MSR_IA32_THERM_STATUS:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_ACPI))
+ return 1;
+ /* Unsupported and reserved bits: generate the exception. */
+ if (!msr_info->host_initiated &&
+ data & MSR_IA32_THERM_STATUS_UNAVAIL_MASK)
+ return 1;
+ if (!msr_info->host_initiated) {
+ data = vmx_set_msr_rwc0_bits(data, vmx->msr_ia32_therm_status,
+ MSR_IA32_THERM_STATUS_RWC0_MASK);
+ data = vmx_set_msr_ro_bits(data, vmx->msr_ia32_therm_status,
+ MSR_IA32_THERM_STATUS_RO_MASK);
+ }
+ vmx->msr_ia32_therm_status = data;
+ break;
default:
find_uret_msr:
@@ -4870,6 +4961,9 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
vmx->spec_ctrl = 0;
vmx->msr_ia32_umwait_control = 0;
+ vmx->msr_ia32_therm_control = 0;
+ vmx->msr_ia32_therm_interrupt = 0;
+ vmx->msr_ia32_therm_status = 0;
vmx->hv_deadline_tsc = -1;
kvm_set_cr8(vcpu, 0);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index e3b0985bb74a..e159dd5b7a66 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -282,6 +282,9 @@ struct vcpu_vmx {
u64 spec_ctrl;
u32 msr_ia32_umwait_control;
+ u64 msr_ia32_therm_control;
+ u64 msr_ia32_therm_interrupt;
+ u64 msr_ia32_therm_status;
/*
* loaded_vmcs points to the VMCS currently used in this vcpu. For a
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cd9a7251c768..50aceb0ce4ee 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1545,6 +1545,9 @@ static const u32 emulated_msrs_all[] = {
MSR_AMD64_TSC_RATIO,
MSR_IA32_POWER_CTL,
MSR_IA32_UCODE_REV,
+ MSR_IA32_THERM_CONTROL,
+ MSR_IA32_THERM_INTERRUPT,
+ MSR_IA32_THERM_STATUS,
/*
* KVM always supports the "true" VMX control MSRs, even if the host
--
2.34.1
* [RFC 08/26] KVM: x86: Expose TM/ACC (CPUID.0x01.edx[bit 29]) feature bit to VM
From: Zhao Liu @ 2024-02-03 9:11 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhuocheng Ding <zhuocheng.ding@intel.com>
The TM (Thermal Monitor, alias, TM1/ACC) feature is a dependency of
thermal interrupt processing, so it is required for handling the HFI
notification (a thermal interrupt).
According to SDM [1], the TM feature means:
"The TM1 flag (bit 29) of the CPUID feature flags indicates the presence
of the automatic thermal monitoring facilities that modulate clock duty
cycles."
The TM feature does not provide any OS interaction interface; it only
indicates the presence of a hardware feature. Therefore, no additional
software emulation is needed when exposing the TM feature bit.
Expose the TM feature bit to the VM to support the VM in handling the
thermal interrupt.
[1]: SDM, vol. 3B, section 15.8.4.1, Detection of Software Controlled
Clock Modulation Extension.
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/cpuid.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1ad547651022..829bb9c6516f 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -625,7 +625,7 @@ void kvm_set_cpu_caps(void)
F(PAT) | F(PSE36) | 0 /* PSN */ | F(CLFLUSH) |
0 /* Reserved, DS */ | F(ACPI) | F(MMX) |
F(FXSR) | F(XMM) | F(XMM2) | F(SELFSNOOP) |
- 0 /* HTT, TM, Reserved, PBE */
+ 0 /* HTT */ | F(ACC) | 0 /* Reserved, PBE */
);
kvm_cpu_cap_mask(CPUID_7_0_EBX,
--
2.34.1
* [RFC 09/26] KVM: x86: cpuid: Define CPUID 0x06.eax by kvm_cpu_cap_mask()
From: Zhao Liu @ 2024-02-03 9:11 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhao Liu <zhao1.liu@intel.com>
PTS, HFI, and ITD feature bits are to be specified in kvm_cpu_caps and
depend on Host support.
Define kvm_cpu_caps[CPUID_6_EAX] with kvm_cpu_cap_mask() and use this
0x06 cap feature to set the 0x06 leaf of the guest.
Currently, only ARAT is supported in 0x06.eax. Although ARAT is not
available on all CPUs with VMX support [1], commit e453aa0f7e7b ("KVM:
x86: Allow ARAT CPU feature") always sets ARAT for the Guest because the
APIC timer is emulated.
Explicitly check ARAT in __do_cpuid_func() and make sure this feature
bit is always set.
[1]: https://lore.kernel.org/kvm/1523455369.20087.16.camel@intel.com/
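The resulting EAX value for leaf 0x06 can be sketched as below. This is a
simplified model under the assumption that the capability mask only keeps
bits that both KVM lists and the Host reports. demo_guest_cpuid6_eax() and
DEMO_ARAT are hypothetical names, with DEMO_ARAT matching the 0x4 value
used in the diff.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_ARAT	(UINT32_C(1) << 2)	/* the 0x4 "allow ARAT" bit */

/*
 * host_eax: what the Host reports for CPUID.06H:EAX (assumed input).
 * kvm_mask: the bits KVM is willing to expose for this leaf.
 */
static inline uint32_t demo_guest_cpuid6_eax(uint32_t host_eax,
					     uint32_t kvm_mask)
{
	/* Keep only features that both the Host and KVM support. */
	uint32_t caps = host_eax & kvm_mask;

	/* ARAT is forced on even when the Host lacks it. */
	return caps | DEMO_ARAT;
}

static inline void demo_cpuid6_example(void)
{
	/* Host without ARAT: the Guest still sees it. */
	assert(demo_guest_cpuid6_eax(0x0, DEMO_ARAT) == DEMO_ARAT);
	/* Host bits that KVM does not list are filtered out. */
	assert(demo_guest_cpuid6_eax(0xffffffffu, DEMO_ARAT) == DEMO_ARAT);
}
```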
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/cpuid.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 829bb9c6516f..d8cfae17cc92 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -628,6 +628,10 @@ void kvm_set_cpu_caps(void)
0 /* HTT */ | F(ACC) | 0 /* Reserved, PBE */
);
+ kvm_cpu_cap_mask(CPUID_6_EAX,
+ F(ARAT)
+ );
+
kvm_cpu_cap_mask(CPUID_7_0_EBX,
F(FSGSBASE) | F(SGX) | F(BMI1) | F(HLE) | F(AVX2) |
F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) | F(INVPCID) |
@@ -964,7 +968,12 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
}
break;
case 6: /* Thermal management */
- entry->eax = 0x4; /* allow ARAT */
+ cpuid_entry_override(entry, CPUID_6_EAX);
+
+ /* Always allow ARAT since APICs are emulated. */
+ if (!kvm_cpu_cap_has(X86_FEATURE_ARAT))
+ entry->eax |= 0x4;
+
entry->ebx = 0;
entry->ecx = 0;
entry->edx = 0;
--
2.34.1
* [RFC 10/26] KVM: VMX: Emulate PTM/PTS (CPUID.0x06.eax[bit 6]) feature
From: Zhao Liu @ 2024-02-03 9:11 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhuocheng Ding <zhuocheng.ding@intel.com>
The PTM feature (Package Thermal Management, alias, PTS) is a
dependency of the Hardware Feedback Interface (HFI) feature.
To support HFI virtualization, the PTM feature must also be emulated in
KVM.
The PTM feature provides 2 package-level thermal related MSRs:
MSR_IA32_PACKAGE_THERM_INTERRUPT and MSR_IA32_PACKAGE_THERM_STATUS.
Currently KVM doesn't support MSR topology (only thread-scope MSRs are
handled; no other topological scopes exist), but since PTM's package
thermal MSRs are only used on client platforms with a single package,
it's enough to handle these 2 MSRs at VM level. Additionally, a mutex
serializes different vCPUs' accesses to the emulated MSR values stored
in kvm_vmx.
PTM also indicates the presence of package level thermal interrupts,
which the VM needs in order to handle package level thermal interrupts.
The ACPI emulation patch has already added support for thermal interrupt
injection, which also completes the PTM emulation. Although thermal
interrupts are not actually injected into the Guest yet, the following
HFI/ITD emulation patches will inject them into the Guest once the
conditions are met.
In addition, expose the CPUID bit of the PTM feature to the VM, which
can help enable package thermal interrupt handling in the VM.
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/cpuid.c | 11 ++++++
arch/x86/kvm/svm/svm.c | 2 ++
arch/x86/kvm/vmx/vmx.c | 76 +++++++++++++++++++++++++++++++++++++++++-
arch/x86/kvm/vmx/vmx.h | 9 +++++
arch/x86/kvm/x86.c | 2 ++
5 files changed, 99 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index d8cfae17cc92..eaac2c8d98b9 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -632,6 +632,17 @@ void kvm_set_cpu_caps(void)
F(ARAT)
);
+ /*
+ * PTS is the dependency of ITD, currently we only use PTS for
+ * enabling ITD in KVM. Since KVM does not support msr topology at
+ * present, the emulation of PTS has restrictions on the topology of
+ * Guest, so we only expose PTS when Host enables ITD.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_ITD)) {
+ if (boot_cpu_has(X86_FEATURE_PTS))
+ kvm_cpu_cap_set(X86_FEATURE_PTS);
+ }
+
kvm_cpu_cap_mask(CPUID_7_0_EBX,
F(FSGSBASE) | F(SGX) | F(BMI1) | F(HLE) | F(AVX2) |
F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) | F(INVPCID) |
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2e22d5e86768..7039ae48d8d0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4291,6 +4291,8 @@ static bool svm_has_emulated_msr(struct kvm *kvm, u32 index)
case MSR_IA32_THERM_CONTROL:
case MSR_IA32_THERM_INTERRUPT:
case MSR_IA32_THERM_STATUS:
+ case MSR_IA32_PACKAGE_THERM_INTERRUPT:
+ case MSR_IA32_PACKAGE_THERM_STATUS:
return false;
case MSR_IA32_SMBASE:
if (!IS_ENABLED(CONFIG_KVM_SMM))
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index aa37b55cf045..45b40a47b448 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -183,6 +183,29 @@ module_param(allow_smaller_maxphyaddr, bool, S_IRUGO);
THERM_MASK_THRESHOLD0 | THERM_INT_THRESHOLD0_ENABLE | \
THERM_MASK_THRESHOLD1 | THERM_INT_THRESHOLD1_ENABLE)
+/* HFI (CPUID.06H:EAX[19]) is not emulated in kvm yet. */
+#define MSR_IA32_PACKAGE_THERM_STATUS_RO_MASK (PACKAGE_THERM_STATUS_PROCHOT | \
+ PACKAGE_THERM_STATUS_PROCHOT_EVENT | PACKAGE_THERM_STATUS_CRITICAL_TEMP | \
+ THERM_STATUS_THRESHOLD0 | THERM_STATUS_THRESHOLD1 | \
+ PACKAGE_THERM_STATUS_POWER_LIMIT | PACKAGE_THERM_STATUS_DIG_READOUT_MASK)
+#define MSR_IA32_PACKAGE_THERM_STATUS_RWC0_MASK (PACKAGE_THERM_STATUS_PROCHOT_LOG | \
+ PACKAGE_THERM_STATUS_PROCHOT_EVENT_LOG | PACKAGE_THERM_STATUS_CRITICAL_TEMP_LOG | \
+ THERM_LOG_THRESHOLD0 | THERM_LOG_THRESHOLD1 | \
+ PACKAGE_THERM_STATUS_POWER_LIMIT_LOG)
+/* MSR_IA32_PACKAGE_THERM_STATUS unavailable bits mask: unsupported and reserved bits. */
+#define MSR_IA32_PACKAGE_THERM_STATUS_UNAVAIL_MASK (~(MSR_IA32_PACKAGE_THERM_STATUS_RO_MASK | \
+ MSR_IA32_PACKAGE_THERM_STATUS_RWC0_MASK))
+
+/*
+ * MSR_IA32_PACKAGE_THERM_INTERRUPT available bits mask.
+ * HFI (CPUID.06H:EAX[19]) is not emulated in kvm yet.
+ */
+#define MSR_IA32_PACKAGE_THERM_INTERRUPT_AVAIL_MASK (PACKAGE_THERM_INT_HIGH_ENABLE | \
+ PACKAGE_THERM_INT_LOW_ENABLE | PACKAGE_THERM_INT_PROCHOT_ENABLE | \
+ PACKAGE_THERM_INT_OVERHEAT_ENABLE | THERM_MASK_THRESHOLD0 | \
+ THERM_INT_THRESHOLD0_ENABLE | THERM_MASK_THRESHOLD1 | \
+ THERM_INT_THRESHOLD1_ENABLE | PACKAGE_THERM_INT_PLN_ENABLE)
+
/*
* List of MSRs that can be directly passed to the guest.
* In addition to these x2apic and PT MSRs are handled specially.
@@ -2013,6 +2036,7 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm);
struct vmx_uret_msr *msr;
u32 index;
@@ -2166,6 +2190,18 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
msr_info->data = vmx->msr_ia32_therm_status;
break;
+ case MSR_IA32_PACKAGE_THERM_INTERRUPT:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_PTS))
+ return 1;
+ msr_info->data = kvm_vmx->pkg_therm.msr_pkg_therm_int;
+ break;
+ case MSR_IA32_PACKAGE_THERM_STATUS:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_PTS))
+ return 1;
+ msr_info->data = kvm_vmx->pkg_therm.msr_pkg_therm_status;
+ break;
default:
find_uret_msr:
msr = vmx_find_uret_msr(vmx, msr_info->index);
@@ -2226,6 +2262,7 @@ static inline u64 vmx_set_msr_rwc0_bits(u64 new_val, u64 old_val, u64 rwc0_mask)
static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm);
struct vmx_uret_msr *msr;
int ret = 0;
u32 msr_index = msr_info->index;
@@ -2543,7 +2580,35 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
}
vmx->msr_ia32_therm_status = data;
break;
+ case MSR_IA32_PACKAGE_THERM_INTERRUPT:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_PTS))
+ return 1;
+ /* Unsupported and reserved bits: generate the exception. */
+ if (!msr_info->host_initiated &&
+ data & ~MSR_IA32_PACKAGE_THERM_INTERRUPT_AVAIL_MASK)
+ return 1;
+ kvm_vmx->pkg_therm.msr_pkg_therm_int = data;
+ break;
+ case MSR_IA32_PACKAGE_THERM_STATUS:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_PTS))
+ return 1;
+ /* Unsupported and reserved bits: generate the exception. */
+ if (!msr_info->host_initiated &&
+ data & MSR_IA32_PACKAGE_THERM_STATUS_UNAVAIL_MASK)
+ return 1;
+ mutex_lock(&kvm_vmx->pkg_therm.pkg_therm_lock);
+ if (!msr_info->host_initiated) {
+ data = vmx_set_msr_rwc0_bits(data, kvm_vmx->pkg_therm.msr_pkg_therm_status,
+ MSR_IA32_PACKAGE_THERM_STATUS_RWC0_MASK);
+ data = vmx_set_msr_ro_bits(data, kvm_vmx->pkg_therm.msr_pkg_therm_status,
+ MSR_IA32_PACKAGE_THERM_STATUS_RO_MASK);
+ }
+ kvm_vmx->pkg_therm.msr_pkg_therm_status = data;
+ mutex_unlock(&kvm_vmx->pkg_therm.pkg_therm_lock);
+ break;
default:
find_uret_msr:
msr = vmx_find_uret_msr(vmx, msr_index);
@@ -7649,6 +7714,14 @@ static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
return err;
}
+static int vmx_vm_init_pkg_therm(struct kvm *kvm)
+{
+ struct pkg_therm_desc *pkg_therm = &to_kvm_vmx(kvm)->pkg_therm;
+
+ mutex_init(&pkg_therm->pkg_therm_lock);
+ return 0;
+}
+
#define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"
#define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"
@@ -7680,7 +7753,8 @@ static int vmx_vm_init(struct kvm *kvm)
break;
}
}
- return 0;
+
+ return vmx_vm_init_pkg_therm(kvm);
}
static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index e159dd5b7a66..5723780da180 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -369,6 +369,13 @@ struct vcpu_vmx {
} shadow_msr_intercept;
};
+struct pkg_therm_desc {
+ u64 msr_pkg_therm_int;
+ u64 msr_pkg_therm_status;
+ /* All members before "struct mutex pkg_therm_lock" are protected by the lock. */
+ struct mutex pkg_therm_lock;
+};
+
struct kvm_vmx {
struct kvm kvm;
@@ -377,6 +384,8 @@ struct kvm_vmx {
gpa_t ept_identity_map_addr;
/* Posted Interrupt Descriptor (PID) table for IPI virtualization */
u64 *pid_table;
+
+ struct pkg_therm_desc pkg_therm;
};
void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 50aceb0ce4ee..7d787ced513f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1548,6 +1548,8 @@ static const u32 emulated_msrs_all[] = {
MSR_IA32_THERM_CONTROL,
MSR_IA32_THERM_INTERRUPT,
MSR_IA32_THERM_STATUS,
+ MSR_IA32_PACKAGE_THERM_INTERRUPT,
+ MSR_IA32_PACKAGE_THERM_STATUS,
/*
* KVM always supports the "true" VMX control MSRs, even if the host
--
2.34.1
* [RFC 11/26] KVM: VMX: Introduce HFI description structure
2024-02-03 9:11 [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
` (9 preceding siblings ...)
2024-02-03 9:11 ` [RFC 10/26] KVM: VMX: Emulate PTM/PTS (CPUID.0x06.eax[bit 6]) feature Zhao Liu
@ 2024-02-03 9:11 ` Zhao Liu
2024-02-03 9:12 ` [RFC 12/26] KVM: VMX: Introduce HFI table index for vCPU Zhao Liu
` (15 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: Zhao Liu @ 2024-02-03 9:11 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhao Liu <zhao1.liu@intel.com>
HFI/ITD virtualization needs to maintain a virtual HFI table in KVM.
This virtual HFI table tracks updates of the Host's HFI table, and KVM
then writes it to the Guest's HFI table address.
Along with the sync process, KVM also needs to know the status of the
Guest HFI.
Therefore, provide the hfi_desc structure to store the following things:
* The state flags of Guest HFI.
* The basic information of Guest HFI.
* The local virtual HFI table.
The PTS feature is emulated at VM level, so also support hfi_desc at VM
level.
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 2 ++
arch/x86/kvm/vmx/vmx.h | 41 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 43 insertions(+)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 45b40a47b448..48f304683d6f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8386,7 +8386,9 @@ static void vmx_hardware_unsetup(void)
static void vmx_vm_destroy(struct kvm *kvm)
{
struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
+ struct hfi_desc *kvm_vmx_hfi = &kvm_vmx->pkg_therm.hfi_desc;
+ kfree(kvm_vmx_hfi->hfi_table.base_addr);
free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm));
}
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 5723780da180..4bf4ca6ac1c0 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -7,6 +7,7 @@
#include <asm/kvm.h>
#include <asm/intel_pt.h>
#include <asm/perf_event.h>
+#include <asm/hfi.h>
#include "capabilities.h"
#include "../kvm_cache_regs.h"
@@ -369,9 +370,49 @@ struct vcpu_vmx {
} shadow_msr_intercept;
};
+/**
+ * struct hfi_desc - Representation of an HFI instance (i.e., a table)
+ * @hfi_enabled: Flag to indicate whether HFI is enabled at runtime.
+ * Parsed from the Guest's MSR_IA32_HW_FEEDBACK_CONFIG.
+ * @hfi_int_enabled: Flag to indicate whether the HFI interrupt is enabled at runtime.
+ * Parsed from Guest's MSR_IA32_PACKAGE_THERM_INTERRUPT[bit 25].
+ * @table_ptr_valid: Flag to indicate whether the memory of Guest HFI table is ready.
+ * Parsed from the valid bit of Guest's MSR_IA32_HW_FEEDBACK_PTR.
+ * @hfi_update_status: Flag to indicate whether Guest has handled the virtual HFI table
+ * update.
+ * Parsed from Guest's MSR_IA32_PACKAGE_THERM_STATUS[bit 26].
+ * @hfi_update_pending: Flag to indicate whether there's any update on Host that is not
+ * synced to Guest.
+ * KVM should update the Guest's HFI table and inject the notification
+ * until Guest has cleared hfi_update_status.
+ * @table_base: GPA of Guest's HFI table, which is parsed from Guest's
+ * MSR_IA32_HW_FEEDBACK_PTR.
+ * @hfi_features: Feature information based on Guest's HFI/ITD CPUID.
+ * @hfi_table: Local virtual HFI table based on the HFI data of the pCPU that
+ * the vCPU is running on.
+ * When KVM updates the Guest's HFI table, it writes the local
+ * virtual HFI table to the Guest HFI table memory in @table_base.
+ *
+ * A set of status flags and feature information, used to maintain local virtual HFI table
+ * and sync updates to Guest HFI table.
+ */
+
+struct hfi_desc {
+ bool hfi_enabled;
+ bool hfi_int_enabled;
+ bool table_ptr_valid;
+ bool hfi_update_status;
+ bool hfi_update_pending;
+ gpa_t table_base;
+ struct hfi_features hfi_features;
+ struct hfi_table hfi_table;
+};
+
struct pkg_therm_desc {
u64 msr_pkg_therm_int;
u64 msr_pkg_therm_status;
+ /* Currently HFI is only supported at package level. */
+ struct hfi_desc hfi_desc;
/* All members before "struct mutex pkg_therm_lock" are protected by the lock. */
struct mutex pkg_therm_lock;
};
--
2.34.1
* [RFC 12/26] KVM: VMX: Introduce HFI table index for vCPU
2024-02-03 9:11 [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
` (10 preceding siblings ...)
2024-02-03 9:11 ` [RFC 11/26] KVM: VMX: Introduce HFI description structure Zhao Liu
@ 2024-02-03 9:12 ` Zhao Liu
2024-02-03 9:12 ` [RFC 13/26] KVM: VMX: Support virtual HFI table for VM Zhao Liu
` (14 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: Zhao Liu @ 2024-02-03 9:12 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhao Liu <zhao1.liu@intel.com>
The HFI table contains a table header and many table entries. Each table
entry is identified by an HFI table index, and each CPU corresponds to
one of the HFI table indexes [1].
Add hfi_table_idx in vcpu_vmx; it will be used to build the virtual
HFI table.
This HFI index is initialized to 0, but a following patch allows the
VMM to configure it with a custom value
(CPUID.0x06.edx[bits 16-31]).
[1]: SDM, vol. 3B, section 15.6.1 Hardware Feedback Interface Table
Structure
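As a small illustration of where the index lives, the sketch below
extracts bits 16-31 from a CPUID.0x06 EDX value. The helper name
demo_hfi_table_idx() is hypothetical and not part of this series.

```c
#include <stdint.h>

/* Hypothetical helper: HFI table index is CPUID.0x06.edx[bits 16-31]. */
static inline unsigned int demo_hfi_table_idx(uint32_t cpuid6_edx)
{
	return (cpuid6_edx >> 16) & 0xffff;
}
```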
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 6 ++++++
arch/x86/kvm/vmx/vmx.h | 3 +++
2 files changed, 9 insertions(+)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 48f304683d6f..96f0f768939d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7648,6 +7648,12 @@ static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
tsx_ctrl->mask = ~(u64)TSX_CTRL_CPUID_CLEAR;
}
+ /*
+ * hfi_table_idx is initialized to 0, but later it may be changed according
+ * to the value in the Guest's CPUID.0x06.edx[bits 16-31].
+ */
+ vmx->hfi_table_idx = 0;
+
err = alloc_loaded_vmcs(&vmx->vmcs01);
if (err < 0)
goto free_pml;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 4bf4ca6ac1c0..63874aad7ae3 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -362,6 +362,9 @@ struct vcpu_vmx {
struct pt_desc pt_desc;
struct lbr_desc lbr_desc;
+ /* Should be extracted from Guest's CPUID.0x06.edx[bits 16-31]. */
+ int hfi_table_idx;
+
/* Save desired MSR intercept (read: pass-through) state */
#define MAX_POSSIBLE_PASSTHROUGH_MSRS 16
struct {
--
2.34.1
* [RFC 13/26] KVM: VMX: Support virtual HFI table for VM
2024-02-03 9:11 [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
` (11 preceding siblings ...)
2024-02-03 9:12 ` [RFC 12/26] KVM: VMX: Introduce HFI table index for vCPU Zhao Liu
@ 2024-02-03 9:12 ` Zhao Liu
2024-02-03 9:12 ` [RFC 14/26] KVM: x86: Introduce the HFI dynamic update request and kvm_x86_ops Zhao Liu
` (13 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: Zhao Liu @ 2024-02-03 9:12 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhuocheng Ding <zhuocheng.ding@intel.com>
The Hardware Feedback Interface (HFI) is a feature that allows hardware
to provide guidance to the operating system scheduler through a hardware
feedback interface structure (HFI table) in memory [1], so that the
scheduler can perform optimal workload scheduling.
ITD (Intel Thread Director) and HFI features both depend on the HFI
table, but their HFI tables are slightly different. The HFI table
provided by the ITD feature has 4 classes (in terms of more columns
in the table) and the native HFI feature supports 1 class [2].
In fact, the size of the HFI table is determined by the feature bit that
the processor supports, but the range of data updates in the table
is determined by the feature actually enabled (HFI or ITD) [3], which is
controlled by MSR_IA32_HW_FEEDBACK_CONFIG.
To let the VM's scheduling benefit from HFI/ITD, we need to maintain
virtual HFI tables in KVM. The virtual HFI table is based on the real
HFI table. We extract the HFI entries corresponding to the pCPUs that
the vCPUs are running on, and reorganize these entries into a new
virtual HFI table indexed by each vCPU's HFI index.
Also, to simplify the logic, until ITD emulation is supported, we build
the virtual HFI table based on the HFI feature by default (i.e. only 1
class is supported, based on class 0 of real hardware).
Add the interfaces to initialize and build the virtual HFI table, and to
inject the thermal interrupt into the VM to notify about HFI updates.
[1]: SDM, vol. 3B, section 15.6 HARDWARE FEEDBACK INTERFACE AND INTEL
THREAD DIRECTOR
[2]: SDM, vol. 3B, section 15.6.2 Intel Thread Director Table Structure
[3]: SDM, vol. 3B, section 15.6.5 Hardware Feedback Interface
Configuration, Table 15-10. IA32_HW_FEEDBACK_CONFIG Control Option
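The remapping described above can be sketched in userspace as follows.
This is only an illustration under simplifying assumptions: a single
class, a fixed number of rows, and a flattened table layout. The
demo_* types and functions are hypothetical; the real work is done by
intel_hfi_build_virt_table() from the HFI driver series, whose
implementation is not shown here.

```c
#include <stdint.h>

#define DEMO_NR_ROWS 4

/* One row per CPU, single class: performance and energy efficiency. */
struct demo_hfi_row {
	uint8_t perf_cap;
	uint8_t ee_cap;
};

/* Simplified table: timestamp, per-class change flags, then rows. */
struct demo_hfi_table {
	uint64_t timestamp;
	uint8_t perf_updated;
	uint8_t ee_updated;
	struct demo_hfi_row rows[DEMO_NR_ROWS];
};

/*
 * Copy the host row of @pcpu into row @vidx of the virtual table.
 * Returns 1 if the virtual row changed, 0 if not, -1 on a bad index.
 */
static int demo_build_virt_row(struct demo_hfi_table *virt,
			       const struct demo_hfi_table *host,
			       int vidx, int pcpu)
{
	const struct demo_hfi_row *src;
	struct demo_hfi_row *dst;
	int updated = 0;

	if (vidx < 0 || vidx >= DEMO_NR_ROWS ||
	    pcpu < 0 || pcpu >= DEMO_NR_ROWS)
		return -1;

	src = &host->rows[pcpu];
	dst = &virt->rows[vidx];
	if (dst->perf_cap != src->perf_cap) {
		dst->perf_cap = src->perf_cap;
		virt->perf_updated = 1;
		updated = 1;
	}
	if (dst->ee_cap != src->ee_cap) {
		dst->ee_cap = src->ee_cap;
		virt->ee_updated = 1;
		updated = 1;
	}
	return updated;
}

/* Rebuild all rows; bump the timestamp only if something changed. */
static int demo_build_virt_table(struct demo_hfi_table *virt,
				 const struct demo_hfi_table *host,
				 const int *vcpu_idx, const int *vcpu_pcpu,
				 int nr_vcpus)
{
	int updated = 0, ret, i;

	virt->perf_updated = 0;
	virt->ee_updated = 0;
	for (i = 0; i < nr_vcpus; i++) {
		ret = demo_build_virt_row(virt, host, vcpu_idx[i],
					  vcpu_pcpu[i]);
		if (ret < 0)
			return ret;
		updated |= ret;
	}
	if (updated)
		virt->timestamp++;
	return updated;
}
```

As in vmx_build_hfi_table() below, the change flags are reset before
each rebuild and the timestamp only advances when at least one row was
actually updated.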
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 119 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 119 insertions(+)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 96f0f768939d..7881f6b51daa 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1532,6 +1532,125 @@ static void vmx_inject_therm_interrupt(struct kvm_vcpu *vcpu)
kvm_apic_therm_deliver(vcpu);
}
+static inline bool vmx_hfi_initialized(struct kvm_vmx *kvm_vmx)
+{
+ return kvm_vmx->pkg_therm.hfi_desc.hfi_enabled &&
+ kvm_vmx->pkg_therm.hfi_desc.table_ptr_valid;
+}
+
+static inline bool vmx_hfi_int_enabled(struct kvm_vmx *kvm_vmx)
+{
+ return kvm_vmx->pkg_therm.hfi_desc.hfi_int_enabled;
+}
+
+static int vmx_init_hfi_table(struct kvm *kvm)
+{
+ struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
+ struct hfi_desc *kvm_vmx_hfi = &kvm_vmx->pkg_therm.hfi_desc;
+ struct hfi_features *hfi_features = &kvm_vmx_hfi->hfi_features;
+ struct hfi_table *hfi_table = &kvm_vmx_hfi->hfi_table;
+ int nr_classes, ret = 0;
+
+ /*
+ * Currently we haven't supported ITD. HFI is the default feature
+ * with 1 class.
+ */
+ nr_classes = 1;
+ ret = intel_hfi_build_virt_features(hfi_features,
+ nr_classes,
+ kvm->created_vcpus);
+ if (unlikely(ret))
+ return ret;
+
+ hfi_table->base_addr = kzalloc(hfi_features->nr_table_pages <<
+ PAGE_SHIFT, GFP_KERNEL);
+ if (!hfi_table->base_addr)
+ return -ENOMEM;
+
+ hfi_table->hdr = hfi_table->base_addr + sizeof(*hfi_table->timestamp);
+ hfi_table->data = hfi_table->hdr + hfi_features->hdr_size;
+ return 0;
+}
+
+static int vmx_build_hfi_table(struct kvm *kvm)
+{
+ struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
+ struct hfi_desc *kvm_vmx_hfi = &kvm_vmx->pkg_therm.hfi_desc;
+ struct hfi_features *hfi_features = &kvm_vmx_hfi->hfi_features;
+ struct hfi_table *hfi_table = &kvm_vmx_hfi->hfi_table;
+ struct hfi_hdr *hfi_hdr = hfi_table->hdr;
+ int nr_classes, ret = 0, updated = 0;
+ struct kvm_vcpu *v;
+ unsigned long i;
+
+ /*
+ * Currently we haven't supported ITD. HFI is the default feature
+ * with 1 class.
+ */
+ nr_classes = 1;
+ for (int j = 0; j < nr_classes; j++) {
+ hfi_hdr->perf_updated = 0;
+ hfi_hdr->ee_updated = 0;
+ hfi_hdr++;
+ }
+
+ kvm_for_each_vcpu(i, v, kvm) {
+ ret = intel_hfi_build_virt_table(hfi_table, hfi_features,
+ nr_classes,
+ to_vmx(v)->hfi_table_idx,
+ v->cpu);
+ if (unlikely(ret < 0))
+ return ret;
+ updated |= ret;
+ }
+
+ if (!updated)
+ return updated;
+
+ /* Timestamp must be monotonic. */
+ (*kvm_vmx_hfi->hfi_table.timestamp)++;
+
+ /* Update the HFI table, whether the HFI interrupt is enabled or not. */
+ kvm_write_guest(kvm, kvm_vmx_hfi->table_base, hfi_table->base_addr,
+ hfi_features->nr_table_pages << PAGE_SHIFT);
+ return 1;
+}
+
+static void vmx_update_hfi_table(struct kvm *kvm)
+{
+ struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
+ struct hfi_desc *kvm_vmx_hfi = &kvm_vmx->pkg_therm.hfi_desc;
+ int ret = 0;
+
+ if (!intel_hfi_enabled())
+ return;
+
+ if (!vmx_hfi_initialized(kvm_vmx))
+ return;
+
+ if (!kvm_vmx_hfi->hfi_table.base_addr) {
+ ret = vmx_init_hfi_table(kvm);
+ if (unlikely(ret))
+ return;
+ }
+
+ ret = vmx_build_hfi_table(kvm);
+ if (ret <= 0)
+ return;
+
+ kvm_vmx_hfi->hfi_update_status = true;
+ kvm_vmx_hfi->hfi_update_pending = false;
+
+ /*
+ * Since HFI is shared for all vCPUs of the same VM, we
+ * actually support only 1 package topology VMs, so when
+ * emulating package level interrupt, we only inject an
+ * interrupt into one vCPU to reduce the overhead.
+ */
+ if (vmx_hfi_int_enabled(kvm_vmx))
+ vmx_inject_therm_interrupt(kvm_get_vcpu(kvm, 0));
+}
+
/*
* Switches to specified vcpu, until a matching vcpu_put(), but assumes
* vcpu mutex is already taken.
--
2.34.1
* [RFC 14/26] KVM: x86: Introduce the HFI dynamic update request and kvm_x86_ops
2024-02-03 9:11 [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
` (12 preceding siblings ...)
2024-02-03 9:12 ` [RFC 13/26] KVM: VMX: Support virtual HFI table for VM Zhao Liu
@ 2024-02-03 9:12 ` Zhao Liu
2024-02-03 9:12 ` [RFC 15/26] KVM: VMX: Sync update of Host HFI table to Guest Zhao Liu
` (12 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: Zhao Liu @ 2024-02-03 9:12 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhao Liu <zhao1.liu@intel.com>
There are 2 cases in which we need to update the Guest HFI table
dynamically:
1. When the Host's HFI table is updated, we need to sync the change to
the Guest.
2. When a vCPU thread migrates to another pCPU, we need to rebuild the
Guest HFI table based on the HFI data of the pCPU that the vCPU is
running on.
So add the update mechanism with a new request and a new op to prepare
for the above 2 cases:
- New KVM request type to perform the HFI update at vcpu_enter_guest().
Updating the VM's HFI table results in a write to the VM's memory.
This requires vCPU context, so we defer HFI updates via a KVM request
until the vCPU is running. Here we only make the request for one vCPU
per VM because all vCPUs of the same VM share the same HFI table. This
allows one vCPU to update the HFI table for the entire VM.
- New kvm_x86_op (optional for x86).
When KVM processes KVM_REQ_HFI_UPDATE, this op is called to update
the corresponding HFI table row for the specified vCPU.
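The deferral pattern can be sketched in userspace with C11 atomics:
the producer sets a request bit at any time, and the consumer tests
and clears it just before "entering the guest". The demo_* names are
hypothetical and only mimic the shape of kvm_make_request() and
kvm_check_request(); they are not the KVM implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_REQ_HFI_UPDATE 0	/* hypothetical request number */

struct demo_vcpu {
	atomic_ulong requests;	/* one bit per pending request */
};

/* Producer side: mark the request as pending (idempotent). */
static void demo_make_request(int req, struct demo_vcpu *vcpu)
{
	atomic_fetch_or(&vcpu->requests, 1UL << req);
}

/* Consumer side: return true once per pending request and clear it. */
static bool demo_check_request(int req, struct demo_vcpu *vcpu)
{
	unsigned long bit = 1UL << req;

	return atomic_fetch_and(&vcpu->requests, ~bit) & bit;
}
```

Because setting the bit is idempotent, several Host HFI updates that
arrive before the vCPU runs collapse into a single table rebuild.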
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/include/asm/kvm-x86-ops.h | 3 ++-
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/vmx/vmx.c | 30 ++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 2 ++
4 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 378ed944b849..1b16de7a03eb 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -136,8 +136,9 @@ KVM_X86_OP_OPTIONAL(migrate_timers)
KVM_X86_OP(msr_filter_changed)
KVM_X86_OP(complete_emulated_msr)
KVM_X86_OP(vcpu_deliver_sipi_vector)
-KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons)
KVM_X86_OP_OPTIONAL(get_untagged_addr)
+KVM_X86_OP_OPTIONAL(update_hfi)
#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b5b2d0fde579..e476a86b0766 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -121,6 +121,7 @@
KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_HV_TLB_FLUSH \
KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_HFI_UPDATE KVM_ARCH_REQ(33)
#define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
@@ -1794,6 +1795,7 @@ struct kvm_x86_ops {
unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
gva_t (*get_untagged_addr)(struct kvm_vcpu *vcpu, gva_t gva, unsigned int flags);
+ void (*update_hfi)(struct kvm_vcpu *vcpu);
};
struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 7881f6b51daa..93c47ba0817b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1651,6 +1651,35 @@ static void vmx_update_hfi_table(struct kvm *kvm)
vmx_inject_therm_interrupt(kvm_get_vcpu(kvm, 0));
}
+static void vmx_dynamic_update_hfi_table(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm);
+ struct hfi_desc *kvm_vmx_hfi = &kvm_vmx->pkg_therm.hfi_desc;
+
+ if (!intel_hfi_enabled())
+ return;
+
+ mutex_lock(&kvm_vmx->pkg_therm.pkg_therm_lock);
+
+ /*
+ * If Guest hasn't handled the previous update, just mark a pending
+ * flag to indicate that Host has more updates that KVM needs to sync.
+ */
+ if (kvm_vmx_hfi->hfi_update_status) {
+ kvm_vmx_hfi->hfi_update_pending = true;
+ mutex_unlock(&kvm_vmx->pkg_therm.pkg_therm_lock);
+ return;
+ }
+
+ /*
+ * The virtual HFI table is maintained at VM level so that vCPUs
+ * of the same VM are sharing the one HFI table. Therefore, one
+ * vCPU can update the HFI table for the whole VM.
+ */
+ vmx_update_hfi_table(vcpu->kvm);
+ mutex_unlock(&kvm_vmx->pkg_therm.pkg_therm_lock);
+}
+
/*
* Switches to specified vcpu, until a matching vcpu_put(), but assumes
* vcpu mutex is already taken.
@@ -8703,6 +8732,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
.get_untagged_addr = vmx_get_untagged_addr,
+ .update_hfi = vmx_dynamic_update_hfi_table,
};
static unsigned int vmx_handle_intel_pt_intr(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7d787ced513f..bea3def6a4b1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10850,6 +10850,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
+ if (kvm_check_request(KVM_REQ_HFI_UPDATE, vcpu))
+ static_call(kvm_x86_update_hfi)(vcpu);
}
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
--
2.34.1
* [RFC 15/26] KVM: VMX: Sync update of Host HFI table to Guest
2024-02-03 9:11 [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
` (13 preceding siblings ...)
2024-02-03 9:12 ` [RFC 14/26] KVM: x86: Introduce the HFI dynamic update request and kvm_x86_ops Zhao Liu
@ 2024-02-03 9:12 ` Zhao Liu
2024-02-03 9:12 ` [RFC 16/26] KVM: VMX: Update HFI table when vCPU migrates Zhao Liu
` (11 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: Zhao Liu @ 2024-02-03 9:12 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhuocheng Ding <zhuocheng.ding@intel.com>
The HFI table can be updated via thermal interrupt as the actual
operating conditions of the processor change during runtime [1], so
hardware hint changes must be synchronized to the Guest's HFI table in
time.
Provide interfaces to register/unregister a notifier for the Host's HFI
updates, and in the notifier callback, make an HFI update request so
that the Guest's HFI table is updated before entering the Guest.
[1]: SDM, vol. 3B, section 15.6.7 Hardware Feedback Interface and Intel
Thread Director Structure Dynamic Update
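The callback in the diff below recovers its hfi_desc from the embedded
notifier block via container_of(). A minimal userspace sketch of that
pattern follows; the demo_* types are hypothetical stand-ins for
struct notifier_block and struct hfi_desc, and setting a flag stands
in for making the KVM request.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_notifier {
	int (*call)(struct demo_notifier *nb, unsigned long code,
		    void *data);
};

struct demo_hfi_desc {
	int update_requested;
	struct demo_notifier nb;	/* embedded, like hfi_nb */
};

/* The callback only receives @nb; walk back to the enclosing desc. */
static int demo_hfi_notify(struct demo_notifier *nb, unsigned long code,
			   void *data)
{
	struct demo_hfi_desc *desc =
		demo_container_of(nb, struct demo_hfi_desc, nb);

	(void)code;
	(void)data;
	desc->update_requested = 1;
	return 0;
}
```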
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 59 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 8 ++++++
2 files changed, 67 insertions(+)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 93c47ba0817b..0ad5e3473a28 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1651,6 +1651,61 @@ static void vmx_update_hfi_table(struct kvm *kvm)
vmx_inject_therm_interrupt(kvm_get_vcpu(kvm, 0));
}
+static void vmx_hfi_notifier_register(struct kvm *kvm)
+{
+ struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
+ struct hfi_desc *kvm_vmx_hfi = &kvm_vmx->pkg_therm.hfi_desc;
+
+ if (!intel_hfi_enabled())
+ return;
+
+ if (!vmx_hfi_initialized(kvm_vmx))
+ return;
+
+ if (kvm_vmx_hfi->has_hfi_instance)
+ return;
+
+ /*
+ * HFI/ITD virtualization is supported on the platforms with only
+ * 1 HFI instance. Just register notifier for vCPU 0.
+ */
+ kvm_vmx_hfi->has_hfi_instance =
+ !intel_hfi_notifier_register(&kvm_vmx_hfi->hfi_nb,
+ kvm_get_vcpu(kvm, 0)->cpu);
+}
+
+static void vmx_hfi_notifier_unregister(struct kvm *kvm)
+{
+ struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
+ struct hfi_desc *kvm_vmx_hfi = &kvm_vmx->pkg_therm.hfi_desc;
+
+ if (!kvm_vmx_hfi->has_hfi_instance)
+ return;
+
+ intel_hfi_notifier_unregister(&kvm_vmx_hfi->hfi_nb,
+ kvm_get_vcpu(kvm, 0)->cpu);
+ kvm_vmx_hfi->has_hfi_instance = false;
+}
+
+static int vmx_hfi_update_notify(struct notifier_block *nb,
+ unsigned long code, void *data)
+{
+ struct hfi_desc *kvm_vmx_hfi;
+ struct kvm *kvm;
+
+ kvm_vmx_hfi = container_of(nb, struct hfi_desc, hfi_nb);
+ kvm = &kvm_vmx_hfi->vmx->kvm;
+
+ /*
+ * Don't need to check if vcpu 0 belongs to
+ * kvm_vmx_hfi->host_hfi_instance since currently ITD/HFI
+ * virtualization is only supported for client platforms
+ * (with only one HFI instance).
+ */
+ kvm_make_request(KVM_REQ_HFI_UPDATE, kvm_get_vcpu(kvm, 0));
+ return NOTIFY_OK;
+}
+
static void vmx_dynamic_update_hfi_table(struct kvm_vcpu *vcpu)
{
struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm);
@@ -7871,8 +7926,11 @@ static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
static int vmx_vm_init_pkg_therm(struct kvm *kvm)
{
struct pkg_therm_desc *pkg_therm = &to_kvm_vmx(kvm)->pkg_therm;
+ struct hfi_desc *kvm_vmx_hfi = &pkg_therm->hfi_desc;
mutex_init(&pkg_therm->pkg_therm_lock);
+ kvm_vmx_hfi->hfi_nb.notifier_call = vmx_hfi_update_notify;
+ kvm_vmx_hfi->vmx = to_kvm_vmx(kvm);
return 0;
}
@@ -8542,6 +8600,7 @@ static void vmx_vm_destroy(struct kvm *kvm)
struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
struct hfi_desc *kvm_vmx_hfi = &kvm_vmx->pkg_therm.hfi_desc;
+ vmx_hfi_notifier_unregister(kvm);
kfree(kvm_vmx_hfi->hfi_table.base_addr);
free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm));
}
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 63874aad7ae3..ff205bc0e99a 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -395,6 +395,11 @@ struct vcpu_vmx {
* the vCPU is running on.
* When KVM updates the Guest's HFI table, it writes the local
* virtual HFI table to the Guest HFI table memory in @table_base.
+ * @has_hfi_instance: Flag indicates if VM registers @hfi_nb on Host's HFI instance.
+ * @hfi_nb: Notifier block to be registered in Host HFI instance.
+ * @vmx: Points to the kvm_vmx where the current nb is located.
+ * Used to get the corresponding kvm_vmx of the nb when it
+ * is executed.
*
* A set of status flags and feature information, used to maintain local virtual HFI table
* and sync updates to Guest HFI table.
@@ -409,6 +414,9 @@ struct hfi_desc {
gpa_t table_base;
struct hfi_features hfi_features;
struct hfi_table hfi_table;
+ bool has_hfi_instance;
+ struct notifier_block hfi_nb;
+ struct kvm_vmx *vmx;
};
struct pkg_therm_desc {
--
2.34.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC 16/26] KVM: VMX: Update HFI table when vCPU migrates
2024-02-03 9:11 [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
` (14 preceding siblings ...)
2024-02-03 9:12 ` [RFC 15/26] KVM: VMX: Sync update of Host HFI table to Guest Zhao Liu
@ 2024-02-03 9:12 ` Zhao Liu
2024-02-03 9:12 ` [RFC 17/26] KVM: VMX: Allow to inject thermal interrupt without HFI update Zhao Liu
` (10 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: Zhao Liu @ 2024-02-03 9:12 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhuocheng Ding <zhuocheng.ding@intel.com>
When a vCPU migrates to a different pCPU, the virtual HFI data
corresponding to the vCPU's HFI index should be updated with the new
pCPU's data.
There is no need to re-register the HFI notifier, because ITD/HFI
virtualization is currently supported only on client platforms (which
have a single HFI instance).
So on migration, just make a request to update the virtual HFI table.
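As a rough illustration of the migration handling described above (not the
code in this patch), the decision can be modeled in plain C. The struct and
function names below are hypothetical, and the hfi_ready flag stands in for
the intel_hfi_enabled() and vmx_hfi_initialized() checks:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few vCPU fields this sketch needs. */
struct vcpu_sketch {
	int cpu;                    /* pCPU the vCPU last ran on */
	bool hfi_update_requested;  /* models KVM_REQ_HFI_UPDATE */
};

/*
 * Model of the load path: request an HFI table update only when the
 * vCPU lands on a different pCPU and HFI virtualization is usable.
 */
static void vcpu_load_sketch(struct vcpu_sketch *v, int new_cpu, bool hfi_ready)
{
	if (v->cpu != new_cpu && hfi_ready)
		v->hfi_update_requested = true;
	v->cpu = new_cpu;
}
```

Reloading on the same pCPU leaves the request flag untouched, which keeps
the common (non-migrating) path cheap.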
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0ad5e3473a28..44c09c995120 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1735,6 +1735,17 @@ static void vmx_dynamic_update_hfi_table(struct kvm_vcpu *vcpu)
mutex_unlock(&kvm_vmx->pkg_therm.pkg_therm_lock);
}
+static void vmx_vcpu_hfi_load(struct kvm_vcpu *vcpu, int cpu)
+{
+ if (!intel_hfi_enabled())
+ return;
+
+ if (!vmx_hfi_initialized(to_kvm_vmx(vcpu->kvm)))
+ return;
+
+ kvm_make_request(KVM_REQ_HFI_UPDATE, vcpu);
+}
+
/*
* Switches to specified vcpu, until a matching vcpu_put(), but assumes
* vcpu mutex is already taken.
@@ -1748,6 +1759,9 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
vmx_vcpu_pi_load(vcpu, cpu);
vmx->host_debugctlmsr = get_debugctlmsr();
+
+ if (unlikely(vcpu->cpu != cpu))
+ vmx_vcpu_hfi_load(vcpu, cpu);
}
static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
--
2.34.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC 17/26] KVM: VMX: Allow to inject thermal interrupt without HFI update
2024-02-03 9:11 [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
` (15 preceding siblings ...)
2024-02-03 9:12 ` [RFC 16/26] KVM: VMX: Update HFI table when vCPU migrates Zhao Liu
@ 2024-02-03 9:12 ` Zhao Liu
2024-02-03 9:12 ` [RFC 18/26] KVM: VMX: Emulate HFI related bits in package thermal MSRs Zhao Liu
` (9 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: Zhao Liu @ 2024-02-03 9:12 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhao Liu <zhao1.liu@intel.com>
When the HFI table memory address is set by MSR_IA32_HW_FEEDBACK_PTR or
when MSR_IA32_HW_FEEDBACK_CONFIG enables the HFI feature, the hardware
sends an initial HFI notification via thermal interrupt and sets the
thermal status bit.
To prepare for these cases, extend vmx_update_hfi_table() to allow
forcing thermal interrupt injection (with the thermal status bit set)
regardless of whether there is any HFI table change to apply.
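The resulting decision can be sketched as a standalone predicate. This is
only an illustration: should_notify_guest() is a hypothetical name, and the
convention that a negative build result means failure, 0 means "no change"
and a positive value means "table changed" is inferred from the diff in this
patch.

```c
#include <stdbool.h>

/*
 * Hypothetical helper modeling the check in vmx_update_hfi_table():
 *   build_ret < 0  -> building the table failed, never notify
 *   build_ret == 0 -> no change, notify only when forced
 *   build_ret > 0  -> table changed, always notify
 */
static bool should_notify_guest(int build_ret, bool forced_int)
{
	if (build_ret < 0)
		return false;
	if (build_ret == 0 && !forced_int)
		return false;
	return true;
}
```

With forced_int set, the only case that suppresses the notification is a
failure to build the table.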
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 44c09c995120..97bb7b304213 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1616,7 +1616,7 @@ static int vmx_build_hfi_table(struct kvm *kvm)
return 1;
}
-static void vmx_update_hfi_table(struct kvm *kvm)
+static void vmx_update_hfi_table(struct kvm *kvm, bool forced_int)
{
struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
struct hfi_desc *kvm_vmx_hfi = &kvm_vmx->pkg_therm.hfi_desc;
@@ -1635,7 +1635,7 @@ static void vmx_update_hfi_table(struct kvm *kvm)
}
ret = vmx_build_hfi_table(kvm);
- if (ret <= 0)
+ if (ret < 0 || (!ret && !forced_int))
return;
kvm_vmx_hfi->hfi_update_status = true;
@@ -1731,7 +1731,7 @@ static void vmx_dynamic_update_hfi_table(struct kvm_vcpu *vcpu)
* of the same VM are sharing the one HFI table. Therefore, one
* vCPU can update the HFI table for the whole VM.
*/
- vmx_update_hfi_table(vcpu->kvm);
+ vmx_update_hfi_table(vcpu->kvm, false);
mutex_unlock(&kvm_vmx->pkg_therm.pkg_therm_lock);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC 18/26] KVM: VMX: Emulate HFI related bits in package thermal MSRs
2024-02-03 9:11 [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
` (16 preceding siblings ...)
2024-02-03 9:12 ` [RFC 17/26] KVM: VMX: Allow to inject thermal interrupt without HFI update Zhao Liu
@ 2024-02-03 9:12 ` Zhao Liu
2024-02-03 9:12 ` [RFC 19/26] KVM: VMX: Emulate the MSRs of HFI feature Zhao Liu
` (8 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: Zhao Liu @ 2024-02-03 9:12 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhao Liu <zhao1.liu@intel.com>
The HFI feature adds new bits to MSR_IA32_PACKAGE_THERM_STATUS and
MSR_IA32_PACKAGE_THERM_INTERRUPT to control HFI status and notification:
* MSR_IA32_PACKAGE_THERM_STATUS: PACKAGE_THERM_STATUS_HFI_UPDATED bit.
This bit indicates whether there is a new HFI update. Whenever the HFI
table is updated, the hardware sends an HFI notification and sets this
bit to 1. The HFI table is not updated again until the OS clears this
bit to 0.
Emulate the logic of this bit to coordinate with updates of the
Guest HFI table, and to support the Guest clearing it by writing 0.
* MSR_IA32_PACKAGE_THERM_INTERRUPT: PACKAGE_THERM_INT_HFI_ENABLE bit.
This bit enables HFI notification. If it is set to 1, the hardware
sends a thermal interrupt to notify the OS every time the HFI table
is updated.
Therefore, also emulate this bit to support the thermal interrupt when
the Guest HFI table is updated.
These status/control bits correspond to flags in struct hfi_desc
(hfi_update_status and hfi_int_enabled, respectively).
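The interaction between the status bit, the interrupt enable bit and pending
updates can be modeled as a small state machine. This is a simplified sketch
and not the KVM implementation: all names are hypothetical, table_version
merely stands in for the Guest-visible table contents, and the "defer while
the status bit is set" rule is taken from the SDM behavior described above.

```c
#include <stdbool.h>

struct hfi_status_sketch {
	bool update_status;   /* models the HFI_UPDATED status bit */
	bool update_pending;  /* a Host update arrived while status was set */
	bool int_enabled;     /* models the HFI interrupt enable bit */
	int table_version;    /* stand-in for Guest-visible table contents */
	int interrupts;       /* thermal interrupts injected so far */
};

/* A new Host HFI update is available for the Guest. */
static void host_table_changed(struct hfi_status_sketch *s)
{
	if (s->update_status) {
		/* Guest has not acknowledged the last update: defer. */
		s->update_pending = true;
		return;
	}
	s->table_version++;
	s->update_status = true;
	s->update_pending = false;
	if (s->int_enabled)
		s->interrupts++;
}

/* Guest writes 0 to the status bit: apply any deferred update. */
static void guest_clears_status(struct hfi_status_sketch *s)
{
	s->update_status = false;
	if (s->update_pending)
		host_table_changed(s);
}
```

In this model a Guest that never clears the status bit sees exactly one
update, which matches the "no further updates until cleared" rule.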
Note that among the thermal interrupt-related features, only HFI is
fully emulated, so setting other bits in MSR_IA32_PACKAGE_THERM_STATUS
and MSR_IA32_PACKAGE_THERM_INTERRUPT does not (and should not) take
effect, even though KVM does not reject such initial MSR values when
they are set via KVM_SET_MSRS.
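For reference, the "read / write-0-to-clear" (RWC0) semantics that the HFI
status bit joins can be sketched as below. This is an illustrative
restatement and not the helper used in the patch: rwc0_write() is a
hypothetical name, the bit position is taken from the SDM reference to
IA32_PACKAGE_THERM_STATUS[bit 26], and read-only bits are ignored here.

```c
#include <stdint.h>

/* Assumed bit position: the SDM places the HFI status bit at bit 26. */
#define PKG_THERM_STATUS_HFI_UPDATED	(1ULL << 26)

/*
 * Read / write-0-to-clear: a bit covered by rwc0_mask stays set only if
 * it was already set and the writer wrote 1; the writer can never set it.
 * Bits outside the mask simply take the written value.
 */
static uint64_t rwc0_write(uint64_t new_val, uint64_t old_val,
			   uint64_t rwc0_mask)
{
	uint64_t kept = new_val & old_val & rwc0_mask;

	return kept | (new_val & ~rwc0_mask);
}
```

A Guest write of 0 therefore clears the status bit, while a Guest write of 1
cannot fabricate an HFI update that KVM never signaled.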
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 129 +++++++++++++++++++++++++++++++++++------
1 file changed, 111 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 97bb7b304213..92dded89ae3c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -183,7 +183,6 @@ module_param(allow_smaller_maxphyaddr, bool, S_IRUGO);
THERM_MASK_THRESHOLD0 | THERM_INT_THRESHOLD0_ENABLE | \
THERM_MASK_THRESHOLD1 | THERM_INT_THRESHOLD1_ENABLE)
-/* HFI (CPUID.06H:EAX[19]) is not emulated in kvm yet. */
#define MSR_IA32_PACKAGE_THERM_STATUS_RO_MASK (PACKAGE_THERM_STATUS_PROCHOT | \
PACKAGE_THERM_STATUS_PROCHOT_EVENT | PACKAGE_THERM_STATUS_CRITICAL_TEMP | \
THERM_STATUS_THRESHOLD0 | THERM_STATUS_THRESHOLD1 | \
@@ -191,20 +190,17 @@ module_param(allow_smaller_maxphyaddr, bool, S_IRUGO);
#define MSR_IA32_PACKAGE_THERM_STATUS_RWC0_MASK (PACKAGE_THERM_STATUS_PROCHOT_LOG | \
PACKAGE_THERM_STATUS_PROCHOT_EVENT_LOG | PACKAGE_THERM_STATUS_CRITICAL_TEMP_LOG | \
THERM_LOG_THRESHOLD0 | THERM_LOG_THRESHOLD1 | \
- PACKAGE_THERM_STATUS_POWER_LIMIT_LOG)
+ PACKAGE_THERM_STATUS_POWER_LIMIT_LOG | PACKAGE_THERM_STATUS_HFI_UPDATED)
/* MSR_IA32_PACKAGE_THERM_STATUS unavailable bits mask: unsupported and reserved bits. */
#define MSR_IA32_PACKAGE_THERM_STATUS_UNAVAIL_MASK (~(MSR_IA32_PACKAGE_THERM_STATUS_RO_MASK | \
MSR_IA32_PACKAGE_THERM_STATUS_RWC0_MASK))
-/*
- * MSR_IA32_PACKAGE_THERM_INTERRUPT available bits mask.
- * HFI (CPUID.06H:EAX[19]) is not emulated in kvm yet.
- */
-#define MSR_IA32_PACKAGE_THERM_INTERRUPT_AVAIL_MASK (PACKAGE_THERM_INT_HIGH_ENABLE | \
+#define MSR_IA32_PACKAGE_THERM_INTERRUPT_MASK (PACKAGE_THERM_INT_HIGH_ENABLE | \
PACKAGE_THERM_INT_LOW_ENABLE | PACKAGE_THERM_INT_PROCHOT_ENABLE | \
PACKAGE_THERM_INT_OVERHEAT_ENABLE | THERM_MASK_THRESHOLD0 | \
THERM_INT_THRESHOLD0_ENABLE | THERM_MASK_THRESHOLD1 | \
- THERM_INT_THRESHOLD1_ENABLE | PACKAGE_THERM_INT_PLN_ENABLE)
+ THERM_INT_THRESHOLD1_ENABLE | PACKAGE_THERM_INT_PLN_ENABLE | \
+ PACKAGE_THERM_INT_HFI_ENABLE)
/*
* List of MSRs that can be directly passed to the guest.
@@ -2417,7 +2413,16 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_PTS))
return 1;
+
+ mutex_lock(&kvm_vmx->pkg_therm.pkg_therm_lock);
+ if (kvm_vmx->pkg_therm.hfi_desc.hfi_update_status)
+ kvm_vmx->pkg_therm.msr_pkg_therm_status |=
+ PACKAGE_THERM_STATUS_HFI_UPDATED;
+ else
+ kvm_vmx->pkg_therm.msr_pkg_therm_status &=
+ ~PACKAGE_THERM_STATUS_HFI_UPDATED;
msr_info->data = kvm_vmx->pkg_therm.msr_pkg_therm_status;
+ mutex_unlock(&kvm_vmx->pkg_therm.pkg_therm_lock);
break;
default:
find_uret_msr:
@@ -2471,6 +2476,87 @@ static inline u64 vmx_set_msr_rwc0_bits(u64 new_val, u64 old_val, u64 rwc0_mask)
return ((new_rwc0 | ~old_rwc0) & old_rwc0) | (new_val & ~rwc0_mask);
}
+static int vmx_set_pkg_therm_int_msr(struct kvm_vcpu *vcpu,
+ struct msr_data *msr_info)
+{
+ struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm);
+ struct hfi_desc *kvm_vmx_hfi = &kvm_vmx->pkg_therm.hfi_desc;
+ u64 data = msr_info->data;
+ bool hfi_int_enabled, hfi_int_changed;
+
+ hfi_int_enabled = data & PACKAGE_THERM_INT_HFI_ENABLE;
+ hfi_int_changed = vmx_hfi_int_enabled(kvm_vmx) != hfi_int_enabled;
+
+ kvm_vmx->pkg_therm.msr_pkg_therm_int = data;
+ kvm_vmx_hfi->hfi_int_enabled = hfi_int_enabled;
+
+ /*
+ * Only HFI notification is supported, otherwise behave as a
+ * dummy MSR.
+ */
+ if (!intel_hfi_enabled() ||
+ !guest_cpuid_has(vcpu, X86_FEATURE_HFI) ||
+ !hfi_int_changed)
+ return 0;
+
+ if (!hfi_int_enabled)
+ return 0;
+
+ /*
+ * SDM: (For IA32_HW_FEEDBACK_CONFIG) no (HFI) status bit
+ * set, no interrupt is generated.
+ */
+ if (!kvm_vmx_hfi->hfi_enabled)
+ return 0;
+
+ /*
+ * When HFI interrupt enable bit transitions from 0 to 1,
+ * try to inject initial interrupt. No need to force
+ * injection of the interrupt if there's no HFI table update.
+ */
+ vmx_update_hfi_table(vcpu->kvm, false);
+
+ return 0;
+}
+
+static int vmx_set_pkg_therm_status_msr(struct kvm_vcpu *vcpu,
+ struct msr_data *msr_info)
+{
+ struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm);
+ struct hfi_desc *kvm_vmx_hfi = &kvm_vmx->pkg_therm.hfi_desc;
+ u64 data = msr_info->data;
+ bool hfi_status_updated, hfi_status_changed;
+
+ if (!msr_info->host_initiated) {
+ data = vmx_set_msr_rwc0_bits(data, kvm_vmx->pkg_therm.msr_pkg_therm_status,
+ MSR_IA32_PACKAGE_THERM_STATUS_RWC0_MASK);
+ data = vmx_set_msr_ro_bits(data, kvm_vmx->pkg_therm.msr_pkg_therm_status,
+ MSR_IA32_PACKAGE_THERM_STATUS_RO_MASK);
+ }
+
+ hfi_status_updated = data & PACKAGE_THERM_STATUS_HFI_UPDATED;
+ hfi_status_changed = kvm_vmx_hfi->hfi_update_status != hfi_status_updated;
+
+ kvm_vmx->pkg_therm.msr_pkg_therm_status = data;
+ kvm_vmx_hfi->hfi_update_status = hfi_status_updated;
+
+ if (!intel_hfi_enabled() ||
+ !guest_cpuid_has(vcpu, X86_FEATURE_HFI) ||
+ !hfi_status_changed)
+ return 0;
+
+ /*
+ * From SDM, once the HFI (thermal) status bit is set, the hardware
+ * will not generate any further updates to HFI table until the OS
+ * clears this bit by writing 0. When this bit is cleared, apply any
+ * pending updates to guest HFI table.
+ */
+ if (!kvm_vmx_hfi->hfi_update_status && kvm_vmx_hfi->hfi_update_pending)
+ vmx_update_hfi_table(vcpu->kvm, false);
+
+ return 0;
+}
+
/*
* Writes msr value into the appropriate "register".
* Returns 0 on success, non-0 otherwise.
@@ -2801,11 +2887,19 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_PTS))
return 1;
- /* Unsupported and reserved bits: generate the exception. */
+ /* Unsupported bit: generate the exception. */
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_HFI) &&
+ data & PACKAGE_THERM_INT_HFI_ENABLE)
+ return 1;
+ /* Reserved bits: generate the exception. */
if (!msr_info->host_initiated &&
- data & ~MSR_IA32_PACKAGE_THERM_INTERRUPT_AVAIL_MASK)
+ data & ~MSR_IA32_PACKAGE_THERM_INTERRUPT_MASK)
return 1;
- kvm_vmx->pkg_therm.msr_pkg_therm_int = data;
+
+ mutex_lock(&kvm_vmx->pkg_therm.pkg_therm_lock);
+ ret = vmx_set_pkg_therm_int_msr(vcpu, msr_info);
+ mutex_unlock(&kvm_vmx->pkg_therm.pkg_therm_lock);
break;
case MSR_IA32_PACKAGE_THERM_STATUS:
if (!msr_info->host_initiated &&
@@ -2815,15 +2909,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (!msr_info->host_initiated &&
data & MSR_IA32_PACKAGE_THERM_STATUS_UNAVAIL_MASK)
return 1;
+ /* Unsupported bit: generate the exception. */
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_HFI) &&
+ data & PACKAGE_THERM_STATUS_HFI_UPDATED)
+ return 1;
mutex_lock(&kvm_vmx->pkg_therm.pkg_therm_lock);
- if (!msr_info->host_initiated) {
- data = vmx_set_msr_rwc0_bits(data, kvm_vmx->pkg_therm.msr_pkg_therm_status,
- MSR_IA32_PACKAGE_THERM_STATUS_RWC0_MASK);
- data = vmx_set_msr_ro_bits(data, kvm_vmx->pkg_therm.msr_pkg_therm_status,
- MSR_IA32_PACKAGE_THERM_STATUS_RO_MASK);
- }
- kvm_vmx->pkg_therm.msr_pkg_therm_status = data;
+ ret = vmx_set_pkg_therm_status_msr(vcpu, msr_info);
mutex_unlock(&kvm_vmx->pkg_therm.pkg_therm_lock);
break;
default:
--
2.34.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC 19/26] KVM: VMX: Emulate the MSRs of HFI feature
2024-02-03 9:11 [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
` (17 preceding siblings ...)
2024-02-03 9:12 ` [RFC 18/26] KVM: VMX: Emulate HFI related bits in package thermal MSRs Zhao Liu
@ 2024-02-03 9:12 ` Zhao Liu
2024-02-03 9:12 ` [RFC 20/26] KVM: x86: Expose HFI feature bit and HFI info in CPUID Zhao Liu
` (7 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: Zhao Liu @ 2024-02-03 9:12 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhao Liu <zhao1.liu@intel.com>
In addition to adding new bits to the package thermal MSRs, HFI also
introduces two new MSRs:
* MSR_IA32_HW_FEEDBACK_CONFIG: used to enable/disable the HFI feature at
runtime.
Emulate this MSR by parsing the HFI enable bit.
* MSR_IA32_HW_FEEDBACK_PTR: used to configure the HFI table's memory
address.
Emulate this MSR by storing the Guest HFI table's GPA, and writing the
local virtual HFI table to this GPA when the Guest's HFI table needs to
be updated.
The Guest has a valid HFI table, and that table can be updated, only
when HFI is enabled (by the Guest via MSR_IA32_HW_FEEDBACK_CONFIG) and
the Guest HFI table address is valid (set by the Guest via
MSR_IA32_HW_FEEDBACK_PTR).
Because the virtual HFI table is currently maintained per VM, not per
virtual package, these two MSRs are also emulated at the VM level.
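The "enabled and valid" condition can be sketched as follows. This is a
minimal model under stated assumptions: the struct and function names are
hypothetical, and the bit positions (bit 0 as the enable bit in the config
MSR, bit 0 as the valid bit in the pointer MSR) are assumptions based on
the SDM layout rather than something defined in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bit 0 of IA32_HW_FEEDBACK_CONFIG enables HFI. */
#define HFI_CFG_ENABLE	(1ULL << 0)
/* Assumed layout: bit 0 of IA32_HW_FEEDBACK_PTR is the valid bit. */
#define HFI_PTR_VALID	(1ULL << 0)

/* Hypothetical per-VM state, loosely modeled on struct hfi_desc. */
struct guest_hfi_state {
	bool enabled;
	bool table_ptr_valid;
	uint64_t table_base;	/* GPA of the Guest HFI table */
};

static void guest_write_hfi_cfg(struct guest_hfi_state *s, uint64_t data)
{
	s->enabled = data & HFI_CFG_ENABLE;
}

static void guest_write_hfi_ptr(struct guest_hfi_state *s, uint64_t data)
{
	s->table_ptr_valid = data & HFI_PTR_VALID;
	s->table_base = data & ~HFI_PTR_VALID;
}

/* The Guest table may be written only when both conditions hold. */
static bool guest_table_updatable(const struct guest_hfi_state *s)
{
	return s->enabled && s->table_ptr_valid;
}
```

The order of the two MSR writes does not matter in this model; the table
becomes updatable once both have been written with the relevant bit set.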
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/svm/svm.c | 2 +
arch/x86/kvm/vmx/vmx.c | 112 +++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 2 +
arch/x86/kvm/x86.c | 2 +
4 files changed, 118 insertions(+)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7039ae48d8d0..980d93c70eb6 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4293,6 +4293,8 @@ static bool svm_has_emulated_msr(struct kvm *kvm, u32 index)
case MSR_IA32_THERM_STATUS:
case MSR_IA32_PACKAGE_THERM_INTERRUPT:
case MSR_IA32_PACKAGE_THERM_STATUS:
+ case MSR_IA32_HW_FEEDBACK_CONFIG:
+ case MSR_IA32_HW_FEEDBACK_PTR:
return false;
case MSR_IA32_SMBASE:
if (!IS_ENABLED(CONFIG_KVM_SMM))
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 92dded89ae3c..9c28d4ea0b2d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2424,6 +2424,18 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
msr_info->data = kvm_vmx->pkg_therm.msr_pkg_therm_status;
mutex_unlock(&kvm_vmx->pkg_therm.pkg_therm_lock);
break;
+ case MSR_IA32_HW_FEEDBACK_CONFIG:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_HFI))
+ return 1;
+ msr_info->data = kvm_vmx->pkg_therm.msr_ia32_hfi_cfg;
+ break;
+ case MSR_IA32_HW_FEEDBACK_PTR:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_HFI))
+ return 1;
+ msr_info->data = kvm_vmx->pkg_therm.msr_ia32_hfi_ptr;
+ break;
default:
find_uret_msr:
msr = vmx_find_uret_msr(vmx, msr_info->index);
@@ -2557,6 +2569,77 @@ static int vmx_set_pkg_therm_status_msr(struct kvm_vcpu *vcpu,
return 0;
}
+static int vmx_set_hfi_cfg_msr(struct kvm_vcpu *vcpu,
+ struct msr_data *msr_info)
+{
+ struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm);
+ struct hfi_desc *kvm_vmx_hfi = &kvm_vmx->pkg_therm.hfi_desc;
+ u64 data = msr_info->data;
+ bool hfi_enabled, hfi_changed;
+
+ /*
+ * When the HFI enable bit changes (either from 0 to 1 or 1 to
+ * 0), HFI status bit is set and an interrupt is generated if
+ * enabled.
+ */
+ hfi_enabled = data & HW_FEEDBACK_CONFIG_HFI_ENABLE;
+ hfi_changed = kvm_vmx_hfi->hfi_enabled != hfi_enabled;
+
+ kvm_vmx->pkg_therm.msr_ia32_hfi_cfg = data;
+ kvm_vmx_hfi->hfi_enabled = hfi_enabled;
+
+ if (!hfi_changed)
+ return 0;
+
+ if (!hfi_enabled) {
+ /*
+ * SDM: hardware sets the IA32_PACKAGE_THERM_STATUS[bit 26]
+ * to 1 to acknowledge disabling of the interface.
+ */
+ kvm_vmx_hfi->hfi_update_status = true;
+ if (vmx_hfi_int_enabled(kvm_vmx))
+ vmx_inject_therm_interrupt(vcpu);
+ } else {
+ /*
+ * Here we don't care about pending updates, because the enabled
+ * feature change may cause the HFI table update range to
+ * change.
+ */
+ vmx_update_hfi_table(vcpu->kvm, true);
+ vmx_hfi_notifier_register(vcpu->kvm);
+ }
+
+ return 0;
+}
+
+static int vmx_set_hfi_ptr_msr(struct kvm_vcpu *vcpu,
+ struct msr_data *msr_info)
+{
+ struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm);
+ struct hfi_desc *kvm_vmx_hfi = &kvm_vmx->pkg_therm.hfi_desc;
+ u64 data = msr_info->data;
+
+ if (kvm_vmx->pkg_therm.msr_ia32_hfi_ptr == data)
+ return 0;
+
+ kvm_vmx->pkg_therm.msr_ia32_hfi_ptr = data;
+ kvm_vmx_hfi->table_ptr_valid = data & HW_FEEDBACK_PTR_VALID;
+ /*
+ * Currently we don't really support MSR handling for package
+ * scope, so when Guest writes, it is not possible to distinguish
+ * between writes from different packages or repeated writes from
+ * the same package. To simplify the process, we just assume that
+ * multiple writes are duplicate writes of the same package and
+ * overwrite the old.
+ */
+ kvm_vmx_hfi->table_base = data & ~HW_FEEDBACK_PTR_VALID;
+
+ vmx_update_hfi_table(vcpu->kvm, true);
+ vmx_hfi_notifier_register(vcpu->kvm);
+
+ return 0;
+}
+
/*
* Writes msr value into the appropriate "register".
* Returns 0 on success, non-0 otherwise.
@@ -2919,6 +3002,35 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
ret = vmx_set_pkg_therm_status_msr(vcpu, msr_info);
mutex_unlock(&kvm_vmx->pkg_therm.pkg_therm_lock);
break;
+ case MSR_IA32_HW_FEEDBACK_CONFIG:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_HFI))
+ return 1;
+ /*
+ * Unsupported and reserved bits. ITD (CPUID.06H:EAX[23])
+ * is not supported yet.
+ */
+ if (!msr_info->host_initiated &&
+ data & ~(HW_FEEDBACK_CONFIG_HFI_ENABLE))
+ return 1;
+
+ mutex_lock(&kvm_vmx->pkg_therm.pkg_therm_lock);
+ ret = vmx_set_hfi_cfg_msr(vcpu, msr_info);
+ mutex_unlock(&kvm_vmx->pkg_therm.pkg_therm_lock);
+ break;
+ case MSR_IA32_HW_FEEDBACK_PTR:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_HFI))
+ return 1;
+ /* Reserved bits: generate the exception. */
+ if (!msr_info->host_initiated &&
+ data & HW_FEEDBACK_PTR_RESERVED_MASK)
+ return 1;
+
+ mutex_lock(&kvm_vmx->pkg_therm.pkg_therm_lock);
+ ret = vmx_set_hfi_ptr_msr(vcpu, msr_info);
+ mutex_unlock(&kvm_vmx->pkg_therm.pkg_therm_lock);
+ break;
default:
find_uret_msr:
msr = vmx_find_uret_msr(vmx, msr_index);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index ff205bc0e99a..d9db8bf3726f 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -422,6 +422,8 @@ struct hfi_desc {
struct pkg_therm_desc {
u64 msr_pkg_therm_int;
u64 msr_pkg_therm_status;
+ u64 msr_ia32_hfi_cfg;
+ u64 msr_ia32_hfi_ptr;
/* Currently HFI is only supported at package level. */
struct hfi_desc hfi_desc;
/* All members before "struct mutex pkg_therm_lock" are protected by the lock. */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bea3def6a4b1..27bec359907c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1550,6 +1550,8 @@ static const u32 emulated_msrs_all[] = {
MSR_IA32_THERM_STATUS,
MSR_IA32_PACKAGE_THERM_INTERRUPT,
MSR_IA32_PACKAGE_THERM_STATUS,
+ MSR_IA32_HW_FEEDBACK_CONFIG,
+ MSR_IA32_HW_FEEDBACK_PTR,
/*
* KVM always supports the "true" VMX control MSRs, even if the host
--
2.34.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC 20/26] KVM: x86: Expose HFI feature bit and HFI info in CPUID
2024-02-03 9:11 [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
` (18 preceding siblings ...)
2024-02-03 9:12 ` [RFC 19/26] KVM: VMX: Emulate the MSRs of HFI feature Zhao Liu
@ 2024-02-03 9:12 ` Zhao Liu
2024-02-03 9:12 ` [RFC 21/26] KVM: VMX: Extend HFI table and MSR emulation to support ITD Zhao Liu
` (6 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: Zhao Liu @ 2024-02-03 9:12 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhao Liu <zhao1.liu@intel.com>
The HFI feature contains the following relevant CPUID fields:
* 0x06.eax[bit 19]: HFI feature bit
* 0x06.ecx[bits 08-15]: Number of HFI/ITD supported classes
* 0x06.edx[bits 00-07]: Bitmap of supported HFI capabilities
* 0x06.edx[bits 08-11]: Enumerates the size of the HFI table in number
of 4 KB pages
* 0x06.edx[bits 16-31]: HFI table index of processor
The Guest's HFI feature bit (0x06.eax[bit 19]) depends on whether HFI is
enabled on the Host.
The other HFI related CPUID fields affect the memory allocation and
the HFI data filling of the virtual HFI table in KVM, so check them
after KVM_SET_CPUID/KVM_SET_CPUID2 to ensure that the HFI feature
information and the memory size are valid.
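One of these checks, whether the row selected by the HFI index still fits
inside the table, can be sketched as below. This mirrors the arithmetic in
kvm_check_hfi_cpuid() but is only an illustration: hfi_index_fits() is a
hypothetical name, and a 4 KB page size is assumed.

```c
#include <stdbool.h>
#include <stddef.h>

#define HFI_PAGE_SIZE	4096u	/* assumed 4 KB pages */

/*
 * The row for @index ends at hdr_size + (index + 1) * cpu_stride bytes
 * from the table start; it must not run past the allocated pages.
 */
static bool hfi_index_fits(size_t hdr_size, size_t cpu_stride,
			   unsigned int index, unsigned int nr_table_pages)
{
	size_t data_size = hdr_size + ((size_t)index + 1) * cpu_stride;

	return data_size <= (size_t)nr_table_pages * HFI_PAGE_SIZE;
}
```

With an illustrative 16-byte header and 8-byte rows, a one-page table holds
indexes 0 through 509; index 510 would end at byte 4104 and is rejected.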
As for the HFI table index, since KVM currently creates the same
CPUID template for all vCPUs, follow the CPU topology handling and
leave the filling of the HFI table index to the user. If the user does
not specify the HFI index, all vCPUs will share the HFI entry with
HFI index 0.
A shared HFI index is valid per the spec [1]. However, since the data
in the virtual HFI table all comes from the pCPU on which the vCPU is
running, sharing an HFI index among vCPUs on different pCPUs might cause
frequent HFI updates, and the virtual HFI table cannot accurately reflect
the actual processor situation, which might hurt Guest performance.
Therefore, it is better to assign different HFI table indexes to
different vCPUs.
[1]: SDM, vol. 2A, chap. CPUID--CPU Identification, CPUID.06H.EDX[Bits
31-16], about HFI table index sharing, which says, "Note that on some
parts the index may be same for multiple logical processors".
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/cpuid.c | 136 ++++++++++++++++++++++++++++++++++++-----
arch/x86/kvm/vmx/vmx.c | 7 +++
2 files changed, 128 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index eaac2c8d98b9..4da8f3319917 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -17,6 +17,7 @@
#include <linux/uaccess.h>
#include <linux/sched/stat.h>
+#include <asm/hfi.h>
#include <asm/processor.h>
#include <asm/user.h>
#include <asm/fpu/xstate.h>
@@ -130,12 +131,77 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
return NULL;
}
+static int kvm_check_hfi_cpuid(struct kvm_vcpu *vcpu,
+ struct kvm_cpuid_entry2 *entries,
+ int nent)
+{
+ struct hfi_features hfi_features;
+ struct kvm_cpuid_entry2 *best = NULL;
+ bool has_hfi;
+ int nr_classes, ret;
+ union cpuid6_ecx ecx;
+ union cpuid6_edx edx;
+ unsigned int data_size;
+
+ best = cpuid_entry2_find(entries, nent, 0x6, 0);
+ if (!best)
+ return 0;
+
+ has_hfi = cpuid_entry_has(best, X86_FEATURE_HFI);
+ if (!has_hfi)
+ return 0;
+
+ /*
+ * Only the platform with 1 HFI instance (i.e., client platform)
+ * can enable HFI in Guest. For more information, please refer to
+ * the comment in kvm_set_cpu_caps().
+ */
+ if (intel_hfi_max_instances() != 1)
+ return -EINVAL;
+
+ /*
+ * Currently we haven't supported ITD. HFI is the default feature
+ * with 1 class.
+ */
+ nr_classes = 1;
+ ret = intel_hfi_build_virt_features(&hfi_features,
+ nr_classes,
+ vcpu->kvm->created_vcpus);
+ if (ret)
+ return ret;
+
+ ecx.full = best->ecx;
+ edx.full = best->edx;
+
+ if (ecx.split.nr_classes != hfi_features.nr_classes)
+ return -EINVAL;
+
+ if (hweight8(edx.split.capabilities.bits) != hfi_features.class_stride)
+ return -EINVAL;
+
+ if (edx.split.table_pages + 1 != hfi_features.nr_table_pages)
+ return -EINVAL;
+
+ /*
+ * The total size of the row corresponding to index and all
+ * previous data.
+ */
+ data_size = hfi_features.hdr_size + (edx.split.index + 1) *
+ hfi_features.cpu_stride;
+ /* Invalid index. */
+ if (data_size > hfi_features.nr_table_pages << PAGE_SHIFT)
+ return -EINVAL;
+
+ return 0;
+}
+
static int kvm_check_cpuid(struct kvm_vcpu *vcpu,
struct kvm_cpuid_entry2 *entries,
int nent)
{
struct kvm_cpuid_entry2 *best;
u64 xfeatures;
+ int ret;
/*
* The existing code assumes virtual address is 48-bit or 57-bit in the
@@ -155,15 +221,18 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu,
* enabling in the FPU, e.g. to expand the guest XSAVE state size.
*/
best = cpuid_entry2_find(entries, nent, 0xd, 0);
- if (!best)
- return 0;
-
- xfeatures = best->eax | ((u64)best->edx << 32);
- xfeatures &= XFEATURE_MASK_USER_DYNAMIC;
- if (!xfeatures)
- return 0;
+ if (best) {
+ xfeatures = best->eax | ((u64)best->edx << 32);
+ xfeatures &= XFEATURE_MASK_USER_DYNAMIC;
+ if (xfeatures) {
+ ret = fpu_enable_guest_xfd_features(&vcpu->arch.guest_fpu,
+ xfeatures);
+ if (ret)
+ return ret;
+ }
+ }
- return fpu_enable_guest_xfd_features(&vcpu->arch.guest_fpu, xfeatures);
+ return kvm_check_hfi_cpuid(vcpu, entries, nent);
}
/* Check whether the supplied CPUID data is equal to what is already set for the vCPU. */
@@ -633,14 +702,27 @@ void kvm_set_cpu_caps(void)
);
/*
- * PTS is the dependency of ITD, currently we only use PTS for
- * enabling ITD in KVM. Since KVM does not support msr topology at
- * present, the emulation of PTS has restrictions on the topology of
- * Guest, so we only expose PTS when Host enables ITD.
+ * PTS and HFI are the dependencies of ITD, currently we only use PTS/HFI
+ * for enabling ITD in KVM. Since KVM does not support msr topology at
+ * present, the emulation of PTS/HFI has restrictions on the topology of
+ * Guest, so we only expose PTS/HFI when Host enables ITD.
+ *
+ * We also restrict HFI virtualization support to platforms with only 1 HFI
+ * instance (i.e., this is the client platform, and ITD is currently a
+ * client-specific feature), while server platforms with multiple instances
+ * do not require HFI virtualization. This restriction avoids adding
+ * additional complex logic to handle notification register updates when
+ * vCPUs migrate between different HFI instances.
*/
- if (cpu_feature_enabled(X86_FEATURE_ITD)) {
+ if (cpu_feature_enabled(X86_FEATURE_ITD) && intel_hfi_max_instances() == 1) {
if (boot_cpu_has(X86_FEATURE_PTS))
kvm_cpu_cap_set(X86_FEATURE_PTS);
+ /*
+ * Set HFI based on hardware capability. Only when the Host has
+ * the valid HFI instance, KVM can build the virtual HFI table.
+ */
+ if (intel_hfi_enabled())
+ kvm_cpu_cap_set(X86_FEATURE_HFI);
}
kvm_cpu_cap_mask(CPUID_7_0_EBX,
@@ -986,8 +1068,32 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->eax |= 0x4;
entry->ebx = 0;
- entry->ecx = 0;
- entry->edx = 0;
+
+ if (kvm_cpu_cap_has(X86_FEATURE_HFI)) {
+ union cpuid6_ecx ecx;
+ union cpuid6_edx edx;
+
+ ecx.full = 0;
+ edx.full = 0;
+ /* Number of supported HFI classes */
+ ecx.split.nr_classes = 1;
+ /* HFI supports performance and energy efficiency capabilities. */
+ edx.split.capabilities.split.performance = 1;
+ edx.split.capabilities.split.energy_efficiency = 1;
+ /* As default, keep the same HFI table size as host. */
+ edx.split.table_pages = ((union cpuid6_edx)entry->edx).split.table_pages;
+ /*
+ * Default HFI index = 0. Users should make sure that
+ * the index differs for each CPU.
+ */
+ edx.split.index = 0;
+
+ entry->ecx = ecx.full;
+ entry->edx = edx.full;
+ } else {
+ entry->ecx = 0;
+ entry->edx = 0;
+ }
break;
/* function 7 has additional index. */
case 7:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9c28d4ea0b2d..636f2bd68546 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8434,6 +8434,13 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
vmx->msr_ia32_feature_control_valid_bits &=
~FEAT_CTL_SGX_LC_ENABLED;
+ if (guest_cpuid_has(vcpu, X86_FEATURE_HFI) && intel_hfi_enabled()) {
+ struct kvm_cpuid_entry2 *best = kvm_find_cpuid_entry_index(vcpu, 0x6, 0);
+
+ if (best)
+ vmx->hfi_table_idx = ((union cpuid6_edx)best->edx).split.index;
+ }
+
/* Refresh #PF interception to account for MAXPHYADDR changes. */
vmx_update_exception_bitmap(vcpu);
}
--
2.34.1
* [RFC 21/26] KVM: VMX: Extend HFI table and MSR emulation to support ITD
2024-02-03 9:11 [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
` (19 preceding siblings ...)
2024-02-03 9:12 ` [RFC 20/26] KVM: x86: Expose HFI feature bit and HFI info in CPUID Zhao Liu
@ 2024-02-03 9:12 ` Zhao Liu
2024-02-03 9:12 ` [RFC 22/26] KVM: VMX: Pass through ITD classification related MSRs to Guest Zhao Liu
` (5 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: Zhao Liu @ 2024-02-03 9:12 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhao Liu <zhao1.liu@intel.com>
ITD (Intel Thread Director) is an extension of the HFI feature. On top
of HFI, it extends the HFI table to 4 classes and provides an MSR
interface that lets the OS classify the currently running task into one
of these 4 classes.
As the first step of ITD support, extend the HFI table and related HFI
MSRs' emulation to support the ITD:
* More classes in HFI table
If ITD is configured in Guest's CPUID, the virtual HFI table will be
built with 4 classes.
All 4 classes are updated only when ITD is enabled in
MSR_IA32_HW_FEEDBACK_CONFIG; otherwise, if HFI is enabled, only
class 0's data is updated.
* MSR_IA32_HW_FEEDBACK_CONFIG (HW_FEEDBACK_CONFIG_ITD_ENABLE bit)
With ITD support, MSR_IA32_HW_FEEDBACK_CONFIG has 2 feature enable
bits: HW_FEEDBACK_CONFIG_HFI_ENABLE and HW_FEEDBACK_CONFIG_ITD_ENABLE.
These 2 bits control whether the HFI and ITD features are enabled,
and also determine which class data is actually updated in the
virtual HFI table [1].
For the MSR_IA32_HW_FEEDBACK_CONFIG's emulation, add support for
dynamically changing these two bits and the corresponding HFI update
adjustments.
[1]: SDM, vol. 3B, section 15.6.5 Hardware Feedback Interface
Configuration, Table 15-10. IA32_HW_FEEDBACK_CONFIG Control Option
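To make the rules above concrete, here is a minimal user-space sketch (not the KVM code) of how the two enable bits select the number of classes that get refreshed in the virtual HFI table. The bit positions and the helper name hfi_classes_to_update() are assumptions made for illustration; the patch itself only uses the symbolic names.

```c
#include <stdint.h>

/* Assumed bit positions; the patch only refers to the symbolic names. */
#define HW_FEEDBACK_CONFIG_HFI_ENABLE	(1ULL << 0)
#define HW_FEEDBACK_CONFIG_ITD_ENABLE	(1ULL << 1)

/*
 * Hypothetical helper: number of classes that would be updated in the
 * virtual HFI table for a given MSR_IA32_HW_FEEDBACK_CONFIG value.
 * nr_classes is the class count the table was built with (4 with ITD).
 */
static unsigned int hfi_classes_to_update(uint64_t cfg, unsigned int nr_classes)
{
	int hfi = !!(cfg & HW_FEEDBACK_CONFIG_HFI_ENABLE);
	int itd = !!(cfg & HW_FEEDBACK_CONFIG_ITD_ENABLE);

	if (!hfi)
		return 0;		/* HFI off: no update, the ITD bit is ignored */
	if (itd)
		return nr_classes;	/* HFI and ITD on: all classes */
	return 1;			/* HFI only: class 0 */
}
```

With a 4-class table, a config value with only the ITD bit set yields 0 updated classes, which matches the "invalid option, quietly ignored" row of the SDM table.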
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 68 +++++++++++++++++++++++++++++++-----------
arch/x86/kvm/vmx/vmx.h | 3 ++
2 files changed, 54 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 636f2bd68546..bdff1d424b2f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1547,11 +1547,11 @@ static int vmx_init_hfi_table(struct kvm *kvm)
struct hfi_table *hfi_table = &kvm_vmx_hfi->hfi_table;
int nr_classes, ret = 0;
- /*
- * Currently we haven't supported ITD. HFI is the default feature
- * with 1 class.
- */
- nr_classes = 1;
+ if (guest_cpuid_has(kvm_get_vcpu(kvm, 0), X86_FEATURE_ITD))
+ nr_classes = 4;
+ else
+ nr_classes = 1;
+
ret = intel_hfi_build_virt_features(hfi_features,
nr_classes,
kvm->created_vcpus);
@@ -1579,11 +1579,11 @@ static int vmx_build_hfi_table(struct kvm *kvm)
struct kvm_vcpu *v;
unsigned long i;
- /*
- * Currently we haven't supported ITD. HFI is the default feature
- * with 1 class.
- */
- nr_classes = 1;
+ if (kvm_vmx_hfi->itd_enabled)
+ nr_classes = kvm_vmx_hfi->hfi_features.nr_classes;
+ else
+ nr_classes = 1;
+
for (int j = 0; j < nr_classes; j++) {
hfi_hdr->perf_updated = 0;
hfi_hdr->ee_updated = 0;
@@ -2575,7 +2575,7 @@ static int vmx_set_hfi_cfg_msr(struct kvm_vcpu *vcpu,
struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm);
struct hfi_desc *kvm_vmx_hfi = &kvm_vmx->pkg_therm.hfi_desc;
u64 data = msr_info->data;
- bool hfi_enabled, hfi_changed;
+ bool hfi_enabled, hfi_changed, itd_enabled, itd_changed;
/*
* When the HFI enable bit changes (either from 0 to 1 or 1 to
@@ -2584,12 +2584,44 @@ static int vmx_set_hfi_cfg_msr(struct kvm_vcpu *vcpu,
*/
hfi_enabled = data & HW_FEEDBACK_CONFIG_HFI_ENABLE;
hfi_changed = kvm_vmx_hfi->hfi_enabled != hfi_enabled;
+ itd_enabled = data & HW_FEEDBACK_CONFIG_ITD_ENABLE;
+ itd_changed = kvm_vmx_hfi->itd_enabled != itd_enabled;
kvm_vmx->pkg_therm.msr_ia32_hfi_cfg = data;
kvm_vmx_hfi->hfi_enabled = hfi_enabled;
+ kvm_vmx_hfi->itd_enabled = itd_enabled;
+
+ if (!hfi_changed && !itd_changed)
+ return 0;
+
+ /*
+ * Refer to SDM, vol. 3B, Table 15-10. IA32_HW_FEEDBACK_CONFIG
+ * Control Option.
+ */
+
+ /* Invalid option; quietly ignored by the hardware. */
+ if (!hfi_changed && itd_changed && !hfi_enabled && itd_enabled) {
+ /* No action (no update in the table). */
+ return 0;
+ }
- if (!hfi_changed)
+ /* No action; keep HFI and Intel Thread Director disabled. */
+ if (!hfi_changed && itd_changed && !hfi_enabled && !itd_enabled) {
+ /* No action (no update in the table). */
return 0;
+ }
+
+ /* No action; keep HFI enabled. */
+ if (!hfi_changed && itd_changed && hfi_enabled && !itd_enabled) {
+ /* No action (no update in the table). */
+ return 0;
+ }
+
+ /* Disable HFI and Intel Thread Director regardless of whether ITD changed. */
+ if (hfi_changed && !hfi_enabled && itd_enabled) {
+ kvm_vmx_hfi->hfi_enabled = false;
+ kvm_vmx_hfi->itd_enabled = false;
+ }
if (!hfi_enabled) {
/*
@@ -3006,12 +3038,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_HFI))
return 1;
- /*
- * Unsupported and reserved bits. ITD is not supported
- * (CPUID.06H:EAX[19]) yet.
- */
+ /* Unsupported bit: generate the exception. */
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_ITD) &&
+ (data & HW_FEEDBACK_CONFIG_ITD_ENABLE))
+ return 1;
+ /* Reserved bits: generate the exception. */
if (!msr_info->host_initiated &&
- data & ~(HW_FEEDBACK_CONFIG_HFI_ENABLE))
+ data & ~(HW_FEEDBACK_CONFIG_HFI_ENABLE | HW_FEEDBACK_CONFIG_ITD_ENABLE))
return 1;
mutex_lock(&kvm_vmx->pkg_therm.pkg_therm_lock);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index d9db8bf3726f..0ef767d63def 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -377,6 +377,8 @@ struct vcpu_vmx {
* struct hfi_desc - Representation of an HFI instance (i.e., a table)
* @hfi_enabled: Flag to indicate whether HFI is enabled at runtime.
* Parsed from the Guest's MSR_IA32_HW_FEEDBACK_CONFIG.
+ * @itd_enabled: Flag to indicate whether ITD is enabled at runtime.
+ * Parsed from the Guest's MSR_IA32_HW_FEEDBACK_CONFIG.
* @hfi_int_enabled: Flag to indicate whether HFI is enabled at runtime.
* Parsed from Guest's MSR_IA32_PACKAGE_THERM_INTERRUPT[bit 25].
* @table_ptr_valid: Flag to indicate whether the memory of Guest HFI table is ready.
@@ -407,6 +409,7 @@ struct vcpu_vmx {
struct hfi_desc {
bool hfi_enabled;
+ bool itd_enabled;
bool hfi_int_enabled;
bool table_ptr_valid;
bool hfi_update_status;
--
2.34.1
* [RFC 22/26] KVM: VMX: Pass through ITD classification related MSRs to Guest
2024-02-03 9:11 [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
` (20 preceding siblings ...)
2024-02-03 9:12 ` [RFC 21/26] KVM: VMX: Extend HFI table and MSR emulation to support ITD Zhao Liu
@ 2024-02-03 9:12 ` Zhao Liu
2024-02-03 9:12 ` [RFC 23/26] KVM: x86: Expose ITD feature bit and related info in CPUID Zhao Liu
` (4 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: Zhao Liu @ 2024-02-03 9:12 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhao Liu <zhao1.liu@intel.com>
ITD adds 2 new MSRs, MSR_IA32_HW_FEEDBACK_CHAR and
MSR_IA32_HW_FEEDBACK_THREAD_CONFIG, to allow the OS to classify the
running task into one of four classes [1].
Pass through these 2 MSRs to Guest:
* MSR_IA32_HW_FEEDBACK_CHAR.
MSR_IA32_HW_FEEDBACK_CHAR is a thread scope MSR. It is used to specify
the class for the currently running workload.
* MSR_IA32_HW_FEEDBACK_THREAD_CONFIG.
MSR_IA32_HW_FEEDBACK_THREAD_CONFIG is also a thread scope MSR and is
used to control the enablement of the classification function.
[1]: SDM, vol. 3B, section 15.6.8 Logical Processor Scope Intel Thread
Director Configuration
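Since the Guest writes a passed-through MSR directly, the Guest's value has to be read back on VM exit and the Host's value restored, as itd_guest_enter()/itd_guest_exit() do in the diff below. The following is only a user-space model of that swap: fake_thread_cfg_msr stands in for the hardware register, and the model_* names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for the per-logical-processor register; not a real MSR access. */
static uint64_t fake_thread_cfg_msr;

struct model_hfi_desc {
	uint64_t host_thread_cfg;
	uint64_t guest_thread_cfg;
};

/* Before entering the Guest: save the Host value, load the Guest value. */
static void model_itd_guest_enter(struct model_hfi_desc *d)
{
	d->host_thread_cfg = fake_thread_cfg_msr;
	fake_thread_cfg_msr = d->guest_thread_cfg;
}

/*
 * After leaving the Guest: the Guest may have written the register
 * directly, so read it back before restoring the Host value.
 */
static void model_itd_guest_exit(struct model_hfi_desc *d)
{
	d->guest_thread_cfg = fake_thread_cfg_msr;
	fake_thread_cfg_msr = d->host_thread_cfg;
}
```

The read-back on exit is what distinguishes a passed-through MSR from an emulated one, where KVM already knows the Guest's value.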
Suggested-by: Zhenyu Wang <zhenyu.z.wang@intel.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/vmx/vmx.c | 37 +++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 8 +++++++-
2 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bdff1d424b2f..11d42e0a208b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -225,6 +225,8 @@ static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = {
MSR_CORE_C3_RESIDENCY,
MSR_CORE_C6_RESIDENCY,
MSR_CORE_C7_RESIDENCY,
+ MSR_IA32_HW_FEEDBACK_THREAD_CONFIG,
+ MSR_IA32_HW_FEEDBACK_CHAR,
};
/*
@@ -1288,6 +1290,30 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
}
+static void itd_guest_enter(struct vcpu_vmx *vmx)
+{
+ struct vcpu_hfi_desc *vcpu_hfi = &vmx->vcpu_hfi_desc;
+
+ if (!guest_cpuid_has(&vmx->vcpu, X86_FEATURE_ITD) ||
+ !kvm_cpu_cap_has(X86_FEATURE_ITD))
+ return;
+
+ rdmsrl(MSR_IA32_HW_FEEDBACK_THREAD_CONFIG, vcpu_hfi->host_thread_cfg);
+ wrmsrl(MSR_IA32_HW_FEEDBACK_THREAD_CONFIG, vcpu_hfi->guest_thread_cfg);
+}
+
+static void itd_guest_exit(struct vcpu_vmx *vmx)
+{
+ struct vcpu_hfi_desc *vcpu_hfi = &vmx->vcpu_hfi_desc;
+
+ if (!guest_cpuid_has(&vmx->vcpu, X86_FEATURE_ITD) ||
+ !kvm_cpu_cap_has(X86_FEATURE_ITD))
+ return;
+
+ rdmsrl(MSR_IA32_HW_FEEDBACK_THREAD_CONFIG, vcpu_hfi->guest_thread_cfg);
+ wrmsrl(MSR_IA32_HW_FEEDBACK_THREAD_CONFIG, vcpu_hfi->host_thread_cfg);
+}
+
void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
unsigned long fs_base, unsigned long gs_base)
{
@@ -5485,6 +5511,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
vmx->msr_ia32_therm_control = 0;
vmx->msr_ia32_therm_interrupt = 0;
vmx->msr_ia32_therm_status = 0;
+ vmx->vcpu_hfi_desc.host_thread_cfg = 0;
+ vmx->vcpu_hfi_desc.guest_thread_cfg = 0;
vmx->hv_deadline_tsc = -1;
kvm_set_cr8(vcpu, 0);
@@ -7977,6 +8005,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
kvm_load_guest_xsave_state(vcpu);
pt_guest_enter(vmx);
+ itd_guest_enter(vmx);
atomic_switch_perf_msrs(vmx);
if (intel_pmu_lbr_is_enabled(vcpu))
@@ -8015,6 +8044,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
loadsegment(es, __USER_DS);
#endif
+ itd_guest_exit(vmx);
pt_guest_exit(vmx);
kvm_load_host_xsave_state(vcpu);
@@ -8475,6 +8505,13 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
vmx->hfi_table_idx = ((union cpuid6_edx)best->edx).split.index;
}
+ if (guest_cpuid_has(vcpu, X86_FEATURE_ITD) && kvm_cpu_cap_has(X86_FEATURE_ITD)) {
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_HW_FEEDBACK_THREAD_CONFIG,
+ MSR_TYPE_RW, !guest_cpuid_has(vcpu, X86_FEATURE_ITD));
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_HW_FEEDBACK_CHAR,
+ MSR_TYPE_RW, !guest_cpuid_has(vcpu, X86_FEATURE_ITD));
+ }
+
/* Refresh #PF interception to account for MAXPHYADDR changes. */
vmx_update_exception_bitmap(vcpu);
}
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 0ef767d63def..3d3238dd8fc3 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -71,6 +71,11 @@ struct pt_desc {
struct pt_ctx guest;
};
+struct vcpu_hfi_desc {
+ u64 host_thread_cfg;
+ u64 guest_thread_cfg;
+};
+
union vmx_exit_reason {
struct {
u32 basic : 16;
@@ -286,6 +291,7 @@ struct vcpu_vmx {
u64 msr_ia32_therm_control;
u64 msr_ia32_therm_interrupt;
u64 msr_ia32_therm_status;
+ struct vcpu_hfi_desc vcpu_hfi_desc;
/*
* loaded_vmcs points to the VMCS currently used in this vcpu. For a
@@ -366,7 +372,7 @@ struct vcpu_vmx {
int hfi_table_idx;
/* Save desired MSR intercept (read: pass-through) state */
-#define MAX_POSSIBLE_PASSTHROUGH_MSRS 16
+#define MAX_POSSIBLE_PASSTHROUGH_MSRS 18
struct {
DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
--
2.34.1
* [RFC 23/26] KVM: x86: Expose ITD feature bit and related info in CPUID
2024-02-03 9:11 [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
` (21 preceding siblings ...)
2024-02-03 9:12 ` [RFC 22/26] KVM: VMX: Pass through ITD classification related MSRs to Guest Zhao Liu
@ 2024-02-03 9:12 ` Zhao Liu
2024-02-03 9:12 ` [RFC 24/26] KVM: VMX: Emulate the MSR of HRESET feature Zhao Liu
` (3 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: Zhao Liu @ 2024-02-03 9:12 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhao Liu <zhao1.liu@intel.com>
The Guest's Intel Thread Director (ITD) feature bit is not only
dependent on the Host ITD's enablement, but is also based on the Guest's
HFI feature bit.
When the Host supports both HFI and ITD, try to support HFI and ITD for
the Guest.
If Host doesn't support ITD, we won't allow Guest to enable HFI or ITD.
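The dependency rules above can be summarized in a small decision function. This is a sketch that follows the wording of this commit message, not the exact in-kernel flow (which is split between kvm_set_cpu_caps() and kvm_check_hfi_cpuid()); guest_hfi_nr_classes() is a hypothetical name.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns the number of HFI classes for the Guest
 * (0 if neither feature is requested), or -EINVAL if the requested
 * combination is not allowed.
 */
static int guest_hfi_nr_classes(bool has_hfi, bool has_itd, bool host_itd)
{
	if (!has_hfi && !has_itd)
		return 0;		/* nothing requested */
	if (!has_hfi)
		return -EINVAL;		/* ITD without HFI */
	if (!host_itd)
		return -EINVAL;		/* Host ITD is required for HFI and ITD */
	return has_itd ? 4 : 1;
}
```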
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/cpuid.c | 55 +++++++++++++++++++++++++++++++-------------
1 file changed, 39 insertions(+), 16 deletions(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 4da8f3319917..9e78398f29dc 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -137,7 +137,7 @@ static int kvm_check_hfi_cpuid(struct kvm_vcpu *vcpu,
{
struct hfi_features hfi_features;
struct kvm_cpuid_entry2 *best = NULL;
- bool has_hfi;
+ bool has_hfi, has_itd;
int nr_classes, ret;
union cpuid6_ecx ecx;
union cpuid6_edx edx;
@@ -148,9 +148,14 @@ static int kvm_check_hfi_cpuid(struct kvm_vcpu *vcpu,
return 0;
has_hfi = cpuid_entry_has(best, X86_FEATURE_HFI);
- if (!has_hfi)
+ has_itd = cpuid_entry_has(best, X86_FEATURE_ITD);
+ if (!has_hfi && !has_itd)
return 0;
+ /* ITD must be based on HFI. */
+ if (!has_hfi && has_itd)
+ return -EINVAL;
+
/*
* Only the platform with 1 HFI instance (i.e., client platform)
* can enable HFI in Guest. For more information, please refer to
@@ -159,11 +164,11 @@ static int kvm_check_hfi_cpuid(struct kvm_vcpu *vcpu,
if (intel_hfi_max_instances() != 1)
return -EINVAL;
- /*
- * Currently we haven't supported ITD. HFI is the default feature
- * with 1 class.
- */
- nr_classes = 1;
+ /* Guest's ITD must be based on Host's ITD enablement. */
+ if (!cpu_feature_enabled(X86_FEATURE_ITD) && has_itd)
+ return -EINVAL;
+
+ nr_classes = has_itd ? 4 : 1;
ret = intel_hfi_build_virt_features(&hfi_features,
nr_classes,
vcpu->kvm->created_vcpus);
@@ -718,11 +723,13 @@ void kvm_set_cpu_caps(void)
if (boot_cpu_has(X86_FEATURE_PTS))
kvm_cpu_cap_set(X86_FEATURE_PTS);
/*
- * Set HFI based on hardware capability. Only when the Host has
+ * Set HFI/ITD based on hardware capability. Only when the Host has
* the valid HFI instance, KVM can build the virtual HFI table.
*/
- if (intel_hfi_enabled())
+ if (intel_hfi_enabled()) {
kvm_cpu_cap_set(X86_FEATURE_HFI);
+ kvm_cpu_cap_set(X86_FEATURE_ITD);
+ }
}
kvm_cpu_cap_mask(CPUID_7_0_EBX,
@@ -1069,19 +1076,35 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->ebx = 0;
- if (kvm_cpu_cap_has(X86_FEATURE_HFI)) {
+ /*
+ * When Host enables ITD, we will expose ITD and HFI,
+ * otherwise, HFI/ITD will not be exposed to Guest.
+ * ITD is an extension of HFI, so after KVM supports ITD
+ * emulation, HFI-related info in 0x6 leaf should be consistent
+ * with the Host, that is, use the Host's ITD info, except
+ * for the HFI index.
+ *
+ * HFI table size is related to the HFI table indexes, but
+ * this item will be checked in kvm_check_cpuid() after
+ * KVM_SET_CPUID/KVM_SET_CPUID2.
+ */
+ if (kvm_cpu_cap_has(X86_FEATURE_ITD)) {
union cpuid6_ecx ecx;
union cpuid6_edx edx;
+ union cpuid6_ecx *host_ecx = (union cpuid6_ecx *)&entry->ecx;
+ union cpuid6_edx *host_edx = (union cpuid6_edx *)&entry->edx;
ecx.full = 0;
edx.full = 0;
- /* Number of supported HFI classes */
- ecx.split.nr_classes = 1;
- /* HFI supports performance and energy efficiency capabilities. */
- edx.split.capabilities.split.performance = 1;
- edx.split.capabilities.split.energy_efficiency = 1;
+ /* Number of supported HFI/ITD classes. */
+ ecx.split.nr_classes = host_ecx->split.nr_classes;
+ /* HFI/ITD supports performance and energy efficiency capabilities. */
+ edx.split.capabilities.split.performance =
+ host_edx->split.capabilities.split.performance;
+ edx.split.capabilities.split.energy_efficiency =
+ host_edx->split.capabilities.split.energy_efficiency;
/* As default, keep the same HFI table size as host. */
- edx.split.table_pages = ((union cpuid6_edx)entry->edx).split.table_pages;
+ edx.split.table_pages = host_edx->split.table_pages;
/*
* Default HFI index = 0. User should be careful that
* the index differ for each CPUs.
--
2.34.1
* [RFC 24/26] KVM: VMX: Emulate the MSR of HRESET feature
2024-02-03 9:11 [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
` (22 preceding siblings ...)
2024-02-03 9:12 ` [RFC 23/26] KVM: x86: Expose ITD feature bit and related info in CPUID Zhao Liu
@ 2024-02-03 9:12 ` Zhao Liu
2024-02-03 9:12 ` [RFC 25/26] KVM: x86: Expose HRESET feature's CPUID to Guest Zhao Liu
` (2 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: Zhao Liu @ 2024-02-03 9:12 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhao Liu <zhao1.liu@intel.com>
HRESET is a feature associated with ITD. It provides the HRESET
instruction, which resets the ITD related history accumulated on the
logical processor it is executed on [1]. The HRESET instruction does
not cause a VM exit and is therefore available to the Guest by default
when the HRESET feature bit is set for the Guest.
The HRESET feature also provides a thread scope MSR to control the
enabling of the ITD history reset via the HRESET instruction [2]:
MSR_IA32_HW_HRESET_ENABLE.
Because this MSR controls the hardware, emulate it for the Guest
instead of passing it through, so that the Guest's changes to the
hardware stay under the control of the Host.
Since the HRESET enabling status may differ between Guest and Host,
store the MSR_IA32_HW_HRESET_ENABLE values of Host and Guest in
vcpu_vmx and save/load the respective configuration on Guest/Host
switches.
[1]: SDM, vol. 3B, section 15.6.11 Logical Processor Scope History
[2]: SDM, vol. 2A, chap. CPUID--CPU Identification, CPUID.07H.01H.EAX
[Bit 22], HRESET.
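As an illustration of the emulation approach described above, here is a user-space model of the three pieces: validating a Guest write against the supported-bits mask from CPUID.0x20:EBX, loading the Guest value on entry, and restoring the Host value on exit. fake_hreset_enable_msr stands in for the hardware register and all hreset_model_* names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for the hardware register, plus a counter of simulated writes. */
static uint64_t fake_hreset_enable_msr;
static unsigned int fake_wrmsr_count;

struct hreset_model {
	uint64_t host;
	uint64_t guest;
};

/* Returns 0 on success, 1 if the Guest tries to set a reserved bit. */
static int hreset_model_set_guest(struct hreset_model *s, uint64_t data,
				  uint32_t supported_ebx)
{
	/* Widen before inverting so bits 63:32 are also treated as reserved. */
	if (data & ~(uint64_t)supported_ebx)
		return 1;
	s->guest = data;	/* only cached; applied on the next entry */
	return 0;
}

static void hreset_model_guest_enter(struct hreset_model *s)
{
	s->host = fake_hreset_enable_msr;
	if (s->host != s->guest) {
		fake_hreset_enable_msr = s->guest;
		fake_wrmsr_count++;
	}
}

/* The MSR is emulated, so the Guest value is already known here. */
static void hreset_model_guest_exit(struct hreset_model *s)
{
	if (s->host != s->guest) {
		fake_hreset_enable_msr = s->host;
		fake_wrmsr_count++;
	}
}
```

When Host and Guest agree on the value, neither path touches the register, which is the common case the unlikely() hints in the diff assume.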
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/vmx/vmx.c | 54 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.h | 2 ++
arch/x86/kvm/x86.c | 1 +
4 files changed, 58 insertions(+)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 980d93c70eb6..d847dd8eb193 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4295,6 +4295,7 @@ static bool svm_has_emulated_msr(struct kvm *kvm, u32 index)
case MSR_IA32_PACKAGE_THERM_STATUS:
case MSR_IA32_HW_FEEDBACK_CONFIG:
case MSR_IA32_HW_FEEDBACK_PTR:
+ case MSR_IA32_HW_HRESET_ENABLE:
return false;
case MSR_IA32_SMBASE:
if (!IS_ENABLED(CONFIG_KVM_SMM))
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 11d42e0a208b..2d733c959f32 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1314,6 +1314,35 @@ static void itd_guest_exit(struct vcpu_vmx *vmx)
wrmsrl(MSR_IA32_HW_FEEDBACK_THREAD_CONFIG, vcpu_hfi->host_thread_cfg);
}
+static void hreset_guest_enter(struct vcpu_vmx *vmx)
+{
+ struct vcpu_hfi_desc *vcpu_hfi = &vmx->vcpu_hfi_desc;
+
+ if (!kvm_cpu_cap_has(X86_FEATURE_HRESET) ||
+ !guest_cpuid_has(&vmx->vcpu, X86_FEATURE_HRESET))
+ return;
+
+ rdmsrl(MSR_IA32_HW_HRESET_ENABLE, vcpu_hfi->host_hreset_enable);
+ if (unlikely(vcpu_hfi->host_hreset_enable != vcpu_hfi->guest_hreset_enable))
+ wrmsrl(MSR_IA32_HW_HRESET_ENABLE, vcpu_hfi->guest_hreset_enable);
+}
+
+static void hreset_guest_exit(struct vcpu_vmx *vmx)
+{
+ struct vcpu_hfi_desc *vcpu_hfi = &vmx->vcpu_hfi_desc;
+
+ if (!kvm_cpu_cap_has(X86_FEATURE_HRESET) ||
+ !guest_cpuid_has(&vmx->vcpu, X86_FEATURE_HRESET))
+ return;
+
+ /*
+ * MSR_IA32_HW_HRESET_ENABLE is not passed through to Guest, so there
+ * is no need to read the MSR to save the Guest's value.
+ */
+ if (unlikely(vcpu_hfi->host_hreset_enable != vcpu_hfi->guest_hreset_enable))
+ wrmsrl(MSR_IA32_HW_HRESET_ENABLE, vcpu_hfi->host_hreset_enable);
+}
+
void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
unsigned long fs_base, unsigned long gs_base)
{
@@ -2462,6 +2491,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
msr_info->data = kvm_vmx->pkg_therm.msr_ia32_hfi_ptr;
break;
+ case MSR_IA32_HW_HRESET_ENABLE:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(&vmx->vcpu, X86_FEATURE_HRESET))
+ return 1;
+ msr_info->data = vmx->vcpu_hfi_desc.guest_hreset_enable;
+ break;
default:
find_uret_msr:
msr = vmx_find_uret_msr(vmx, msr_info->index);
@@ -3091,6 +3126,21 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
ret = vmx_set_hfi_ptr_msr(vcpu, msr_info);
mutex_unlock(&kvm_vmx->pkg_therm.pkg_therm_lock);
break;
+ case MSR_IA32_HW_HRESET_ENABLE: {
+ struct kvm_cpuid_entry2 *entry;
+
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(&vmx->vcpu, X86_FEATURE_HRESET))
+ return 1;
+
+ entry = kvm_find_cpuid_entry_index(&vmx->vcpu, 0x20, 0);
+ /* Reserved bits: generate the exception. */
+ if (!msr_info->host_initiated && data & ~(u64)(entry ? entry->ebx : 0))
+ return 1;
+ /* hreset_guest_enter() will update MSR for Guest. */
+ vmx->vcpu_hfi_desc.guest_hreset_enable = data;
+ break;
+ }
default:
find_uret_msr:
msr = vmx_find_uret_msr(vmx, msr_index);
@@ -5513,6 +5563,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
vmx->msr_ia32_therm_status = 0;
vmx->vcpu_hfi_desc.host_thread_cfg = 0;
vmx->vcpu_hfi_desc.guest_thread_cfg = 0;
+ vmx->vcpu_hfi_desc.host_hreset_enable = 0;
+ vmx->vcpu_hfi_desc.guest_hreset_enable = 0;
vmx->hv_deadline_tsc = -1;
kvm_set_cr8(vcpu, 0);
@@ -8006,6 +8058,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
pt_guest_enter(vmx);
itd_guest_enter(vmx);
+ hreset_guest_enter(vmx);
atomic_switch_perf_msrs(vmx);
if (intel_pmu_lbr_is_enabled(vcpu))
@@ -8044,6 +8097,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
loadsegment(es, __USER_DS);
#endif
+ hreset_guest_exit(vmx);
itd_guest_exit(vmx);
pt_guest_exit(vmx);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 3d3238dd8fc3..c5b4684a5b51 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -74,6 +74,8 @@ struct pt_desc {
struct vcpu_hfi_desc {
u64 host_thread_cfg;
u64 guest_thread_cfg;
+ u64 host_hreset_enable;
+ u64 guest_hreset_enable;
};
union vmx_exit_reason {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 27bec359907c..04489efc2fb4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1552,6 +1552,7 @@ static const u32 emulated_msrs_all[] = {
MSR_IA32_PACKAGE_THERM_STATUS,
MSR_IA32_HW_FEEDBACK_CONFIG,
MSR_IA32_HW_FEEDBACK_PTR,
+ MSR_IA32_HW_HRESET_ENABLE,
/*
* KVM always supports the "true" VMX control MSRs, even if the host
--
2.34.1
* [RFC 25/26] KVM: x86: Expose HRESET feature's CPUID to Guest
2024-02-03 9:11 [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
` (23 preceding siblings ...)
2024-02-03 9:12 ` [RFC 24/26] KVM: VMX: Emulate the MSR of HRESET feature Zhao Liu
@ 2024-02-03 9:12 ` Zhao Liu
2024-02-03 9:12 ` [RFC 26/26] Documentation: KVM: Add description of pkg_therm_lock Zhao Liu
2024-02-22 7:42 ` [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
26 siblings, 0 replies; 28+ messages in thread
From: Zhao Liu @ 2024-02-03 9:12 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhuocheng Ding <zhuocheng.ding@intel.com>
The HRESET feature requires not only the feature bit
CPUID.0x07.0x01:EAX[bit 22], but also the associated 0x20 leaf, so pass
the Host's 0x20 leaf through to the Guest.
Since HRESET is currently only used to clear ITD's classification
history, only expose the HRESET related CPUID when the Guest has the
ITD capability.
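The diff below also rejects a Guest 0x20 leaf that sets EBX bits the Host does not report. That subset test is small enough to sketch in isolation; check_hreset_leaf() is a hypothetical name and the EBX values are passed in rather than read via CPUID.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: the Guest's CPUID.0x20:EBX must be a subset of the
 * Host's, since Guest writes to MSR_IA32_HW_HRESET_ENABLE reach hardware.
 */
static int check_hreset_leaf(uint32_t guest_ebx, uint32_t host_ebx)
{
	return (guest_ebx & ~host_ebx) ? -EINVAL : 0;
}
```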
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
arch/x86/kvm/cpuid.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 9e78398f29dc..726b723ee34b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -197,6 +197,16 @@ static int kvm_check_hfi_cpuid(struct kvm_vcpu *vcpu,
if (data_size > hfi_features.nr_table_pages << PAGE_SHIFT)
return -EINVAL;
+ /*
+ * Check HRESET leaf since Guest's control of MSR_IA32_HW_HRESET_ENABLE
+ * needs to take effect on hardware.
+ */
+ best = cpuid_entry2_find(entries, nent, 0x20, 0);
+
+ /* Cannot set the Guest bit that is unsupported by Host. */
+ if (best && best->ebx & ~cpuid_ebx(0x20))
+ return -EINVAL;
+
return 0;
}
@@ -784,6 +794,10 @@ void kvm_set_cpu_caps(void)
F(AMX_FP16) | F(AVX_IFMA) | F(LAM)
);
+ /* Currently HRESET is used to reset the ITD related history. */
+ if (kvm_cpu_cap_has(X86_FEATURE_ITD))
+ kvm_cpu_cap_set(X86_FEATURE_HRESET);
+
kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI) |
F(AMX_COMPLEX)
@@ -1030,7 +1044,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
switch (function) {
case 0:
/* Limited to the highest leaf implemented in KVM. */
- entry->eax = min(entry->eax, 0x1fU);
+ entry->eax = min(entry->eax, 0x20U);
break;
case 1:
cpuid_entry_override(entry, CPUID_1_EDX);
@@ -1300,6 +1314,16 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
break;
}
break;
+ /* Intel HRESET */
+ case 0x20:
+ if (!kvm_cpu_cap_has(X86_FEATURE_HRESET)) {
+ entry->eax = 0;
+ entry->ebx = 0;
+ entry->ecx = 0;
+ entry->edx = 0;
+ break;
+ }
+ break;
case KVM_CPUID_SIGNATURE: {
const u32 *sigptr = (const u32 *)KVM_SIGNATURE;
entry->eax = KVM_CPUID_FEATURES;
--
2.34.1
* [RFC 26/26] Documentation: KVM: Add description of pkg_therm_lock
2024-02-03 9:11 [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
` (24 preceding siblings ...)
2024-02-03 9:12 ` [RFC 25/26] KVM: x86: Expose HRESET feature's CPUID to Guest Zhao Liu
@ 2024-02-03 9:12 ` Zhao Liu
2024-02-22 7:42 ` [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
26 siblings, 0 replies; 28+ messages in thread
From: Zhao Liu @ 2024-02-03 9:12 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson, Rafael J . Wysocki,
Daniel Lezcano, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, kvm, linux-pm, linux-kernel, x86
Cc: Ricardo Neri, Len Brown, Zhang Rui, Zhenyu Wang, Zhuocheng Ding,
Dapeng Mi, Yanting Jiang, Yongwei Ma, Vineeth Pillai,
Suleiman Souhlal, Masami Hiramatsu, David Dai, Saravana Kannan,
Zhao Liu
From: Zhao Liu <zhao1.liu@intel.com>
pkg_therm_lock is a per-VM lock used by the PTS, HFI and ITD
virtualization support. Add a description of it.
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
Documentation/virt/kvm/locking.rst | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 02880d5552d5..84a138916898 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -290,7 +290,7 @@ time it will be set using the Dirty tracking mechanism described above.
wakeup.
``vendor_module_lock``
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^
:Type: mutex
:Arch: x86
:Protects: loading a vendor module (kvm_amd or kvm_intel)
@@ -298,3 +298,14 @@ time it will be set using the Dirty tracking mechanism described above.
taken outside of kvm_lock, e.g. in KVM's CPU online/offline callbacks, and
many operations need to take cpu_hotplug_lock when loading a vendor module,
e.g. updating static calls.
+
+``pkg_therm_lock``
+^^^^^^^^^^^^^^^^^^
+:Type: mutex
+:Arch: x86 (vmx)
+:Protects: PTS, HFI and ITD emulation
+:Comment: This is a per-VM lock used for the emulation of VM level thermal
+ features (PTS, HFI and ITD). It is taken when these features' emulated
+ MSRs are changed or when the virtual HFI table is updated, in order to
+ make the update atomic and to avoid races with other vCPUs in the same
+ VM.
--
2.34.1
* Re: [RFC 00/26] Intel Thread Director Virtualization
2024-02-03 9:11 [RFC 00/26] Intel Thread Director Virtualization Zhao Liu
` (25 preceding siblings ...)
2024-02-03 9:12 ` [RFC 26/26] Documentation: KVM: Add description of pkg_therm_lock Zhao Liu
@ 2024-02-22 7:42 ` Zhao Liu
26 siblings, 0 replies; 28+ messages in thread
From: Zhao Liu @ 2024-02-22 7:42 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: Rafael J . Wysocki, Daniel Lezcano, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H . Peter Anvin, kvm, linux-pm,
linux-kernel, x86, Ricardo Neri, Len Brown, Zhang Rui,
Zhenyu Wang, Zhuocheng Ding, Dapeng Mi, Yanting Jiang, Yongwei Ma,
Vineeth Pillai, Suleiman Souhlal, Masami Hiramatsu, David Dai,
Saravana Kannan, Zhao Liu
Ping Paolo & Sean,
Do you have any comments? Or do you think ITD virtualization would be
appropriate to discuss at PUCK?
Thanks,
Zhao
On Sat, Feb 03, 2024 at 05:11:48PM +0800, Zhao Liu wrote:
> Date: Sat, 3 Feb 2024 17:11:48 +0800
> From: Zhao Liu <zhao1.liu@linux.intel.com>
> Subject: [RFC 00/26] Intel Thread Director Virtualization
> X-Mailer: git-send-email 2.34.1
>
> From: Zhao Liu <zhao1.liu@intel.com>
>
> Hi list,
>
> This is our RFC to virtualize the Intel Thread Director (ITD) feature for
> Guests. It is based on Ricardo's patch series adding ITD-related
> support to the HFI driver ("[PATCH 0/9] thermal: intel: hfi: Prework for the
> virtualization of HFI" [1]).
>
> In short, the purpose of this patch set is to enable the ITD-based
> scheduling logic in Guest so that Guest can better schedule Guest tasks
> on Intel hybrid platforms.
>
> Currently, ITD is necessary for Windows VMs. With ITD virtualization
> support, a Windows 11 Guest can see a significant performance
> improvement (for example, on i9-13900K, up to 14%+ improvement on
> 3DMARK).
>
> Our ITD virtualization is not bound to the VM's hybrid topology or the
> vCPUs' CPU affinity. However, in our experience, the ITD scheduling
> optimization for Win11 VMs works best when combined with hybrid topology
> and CPU affinity (this is related to the specific implementation of Win11
> scheduling). For more details, please see Section 1.2 "About hybrid
> topology and vCPU pinning".
>
> To enable ITD-related scheduling optimization in a Win11 VM, some other
> thermal-related support is also needed (HWP, CPPC), but it can be emulated
> with dummy values in the VMM (we will send out extra patches for these in
> the future).
>
> Your feedback is welcome!
>
>
> 1. Background and Motivation
> ============================
>
> 1.1. Background
> ^^^^^^^^^^^^^^^
>
> Our use case is running games in a client Windows VM as a cloud gaming
> solution.
>
> Gaming VMs are performance-sensitive VMs on client platforms, so they
> usually have two characteristics to ensure interactivity and performance:
>
> i) The number of vCPUs is equal to or close to the number of Host pCPUs.
>
> ii) The vCPUs of a gaming VM are often bound to pCPUs to get exclusive
> resources and avoid the overhead of migration.
>
> In this case, the Host can't provide effective scheduling for the Guest,
> so we need to expose more hardware-assisted scheduling capabilities to the
> Guest to enhance its scheduling.
>
> Windows 11 (and future Windows products) is heavily optimized for the
> Intel hybrid platform. To get the best performance, we need to
> virtualize hybrid scheduling features (HFI/ITD) for Windows Guest.
>
>
> 1.2. About hybrid topology and vCPU pinning
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Our ITD virtualization supports most vCPU topologies (except multiple
> packages/dies, see details in 3.5 Restrictions on Guest Topology), and
> also supports non-pinned vCPUs (i.e. it can handle vCPU thread
> migration).
>
> The following is our performance measurement on an i9-13900K machine
> (2995 MHz, 24 cores, 32 threads (8+16), RAM: 14GB (16GB physical)), with
> iGPU passthrough, running 3DMARK in a Win11 Professional Guest:
>
>
> Compared with smp topo case    smp topo     smp topo     smp topo     hybrid topo  hybrid topo  hybrid topo  hybrid topo
>                                + affinity   + ITD        + ITD                     + affinity   + ITD        + ITD
>                                                          + affinity                                          + affinity
> Time Spy - Overall             0.179%       -0.250%      0.179%       -0.107%      0.143%       -0.179%      -0.107%
>   Graphics score               0.124%       -0.249%      0.124%       -0.083%      0.124%       -0.166%      -0.249%
>   CPU score                    0.916%       -0.485%      1.149%       -0.076%      0.722%       -0.324%      11.915%
> Fire Strike Extreme - Overall  0.149%       0.000%       0.224%       -1.021%      -3.361%      -1.319%      -3.361%
>   Graphics score               0.100%       0.050%       0.150%       -1.376%      -3.427%      -1.676%      -3.652%
>   Physics score                5.060%       0.759%       0.518%       -2.907%      -10.914%     -0.897%      14.638%
>   Combined score               0.120%       -0.179%      0.418%       0.060%       -2.929%      -0.179%      -2.809%
> Fire Strike - Overall          0.350%       -0.085%      0.193%       -1.377%      -1.365%      -1.509%      -1.787%
>   Graphics score               0.256%       -0.047%      0.210%       -1.527%      -1.376%      -1.504%      -2.320%
>   Physics score                3.695%       -2.180%      0.629%       -1.581%      -6.846%      -1.444%      14.100%
>   Combined score               0.415%       -0.128%      0.128%       -0.957%      -1.052%      -1.594%      -0.957%
> CPU Profile Max Threads        1.836%       0.298%       1.786%       -0.069%      1.545%       0.025%       9.472%
>   16 Threads                   4.290%       0.989%       3.588%       0.595%       1.580%       0.848%       11.295%
>   8 Threads                    -22.632%     -0.602%      -23.167%     -0.988%      -1.345%      -1.340%      8.648%
>   4 Threads                    -21.598%     0.449%       -21.429%     -0.817%      1.951%       -0.832%      2.084%
>   2 Threads                    -12.912%     -0.014%      -12.006%     -0.481%      -0.609%      -0.595%      1.161%
>   1 Thread                     -3.793%      -0.137%      -3.793%      -0.495%      -3.189%      -0.495%      1.154%
>
>
> Based on the above results, exposing only HFI/ITD to Win11 VMs without
> hybrid topology or CPU affinity (case "smp topo + ITD") does not hurt
> performance, but does not bring any improvement either.
>
> With both hybrid topology and CPU affinity set for ITD, Win11 VMs get a
> significant performance improvement (up to 14%+, compared with the case
> of smp topology without CPU affinity).
>
> This is not limited to the 3DMARK numbers: in practice, there is also a
> significant improvement in the frame rate of games.
>
> Also, the more powerful the machine, the more significant the
> performance gains!
>
> Therefore, the best practice for enabling ITD scheduling optimization
> is to set up both CPU affinity and hybrid topology for the Win11 Guest
> while enabling our ITD virtualization.
>
> Our earlier QEMU prototype RFC [2] presented the initial hybrid
> topology support for VMs. Our other proposal, "QOM topology" [3], is
> currently under discussion in the QEMU community and is the first step
> towards a QOM-based hybrid topology implementation.
>
>
> 2. Introduction of HFI and ITD
> ==============================
>
> Intel provides the Hardware Feedback Interface (HFI) feature to allow
> hardware to guide the OS scheduler towards optimal workload scheduling
> through a hardware feedback interface structure in memory [4]. This
> structure is called the HFI table.
>
> For now, the guidance includes performance and energy efficiency
> hints, and it can be updated via thermal interrupt as the actual
> operating conditions of the processor change at run time.
>
> The Intel Thread Director (ITD) feature extends HFI to provide
> performance and energy efficiency data for advanced classes of
> instructions.
>
> Since ITD is an extension of HFI, our ITD virtualization also
> virtualizes the native HFI feature.
>
>
> 3. Dependencies of ITD
> ======================
>
> ITD is a thermal feature that requires:
> * PTM (Package Thermal Management, alias PTS)
> * HFI (Hardware Feedback Interface)
>
> To support the notification mechanism for dynamic ITD/HFI updates, we
> also need to add thermal interrupt related support, including the
> following two features:
> * ACPI (Thermal Monitor and Software Controlled Clock Facilities)
> * TM (Thermal Monitor, alias TM1/ACC)
>
> Therefore, we must also consider emulating all of the above
> dependencies.
>
>
> 3.1. ACPI emulation
> ^^^^^^^^^^^^^^^^^^^
>
> For ACPI, we can support it by emulating the RDMSR/WRMSR of the
> associated MSRs and adding the ability to inject thermal interrupts.
> In fact, we don't inject thermal interrupts into the Guest for the
> thermal conditions corresponding to ACPI; the thermal interrupt support
> is prepared for the subsequent HFI/ITD emulation.
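
To illustrate the WRMSR emulation mentioned above: the series adds helpers for MSR bits that are R/O or R/WC0 (patch 06). The following is only a minimal user-space sketch of that idea, not the code from the patch. The function name and mask parameters are hypothetical, and it assumes that writing 1 to an R/WC0 bit leaves the bit unchanged rather than faulting.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: merge a Guest write (new_val) into the current
 * emulated MSR value (old_val).
 *   - R/O bits keep their old value.
 *   - R/WC0 bits can only be cleared: a bit stays set only if it was set
 *     and the Guest wrote 1 (assumption: writing 1 is a no-op).
 *   - all remaining bits are treated as plain read/write.
 */
uint64_t emu_msr_write(uint64_t old_val, uint64_t new_val,
		       uint64_t ro_mask, uint64_t rwc0_mask)
{
	uint64_t rw_mask = ~(ro_mask | rwc0_mask);

	/* A bit cannot be both R/O and R/WC0. */
	assert((ro_mask & rwc0_mask) == 0);

	return (old_val & ro_mask) |
	       (old_val & new_val & rwc0_mask) |
	       (new_val & rw_mask);
}
```

With a status bit (R/O) in bit 0 and its log bit (R/WC0) in bit 1, a Guest write of 0 clears the log bit but leaves the status bit alone.
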
>
>
> 3.2. TM emulation
> ^^^^^^^^^^^^^^^^^
>
> TM is a hardware feature and its CPUID bit only indicates the presence
> of the automatic thermal monitoring facilities. For TM, there's no
> interactive interface between OS and hardware, but its flag is one of
> the prerequisites for the OS to enable thermal interrupt.
>
> Therefore, to support TM, it is enough to expose its CPUID flag to the
> Guest.
>
>
> 3.3. PTM emulation
> ^^^^^^^^^^^^^^^^^^
>
> PTM is a package-scope feature that includes package-level MSRs and a
> package-level thermal interrupt. Unfortunately, KVM currently only
> supports thread-scope MSR handling and is not aware of the Guest's
> topology.
>
> But since our purpose in supporting PTM in KVM is to further support
> ITD, and the current platforms with ITD all have a single package, we
> emulate the package-scope MSRs provided by PTM at the VM level.
>
> As a result, the VMM is required to set up a single-package topology for
> PTM. To limit the impact of this restriction, we only expose the PTM
> feature bit to the Guest when ITD needs to be supported.
>
>
> 3.4. HFI emulation
> ^^^^^^^^^^^^^^^^^^
>
> ITD is an extension of HFI, so both HFI and ITD depend on the HFI table.
> HFI itself is used on the Host for power-related management control, so
> we should only expose HFI to the Guest when we need to enable ITD.
>
> HFI also relies on PTM interrupt control, so it has the same package
> topology requirement, and we also emulate HFI (including ITD) at the VM
> level.
>
> In addition, because the HFI driver allocates HFI instances per die,
> HFI (and ITD) emulation also requires the Guest to have only one die.
>
>
> 3.5. Restrictions on Guest Topology
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Due to KVM's incomplete support for MSR topology and the requirement for
> HFI instance management in the kernel, PTM, HFI, and ITD limit the
> topology of the Guest (mainly restricting the topology types created on
> the VMM side).
>
> Therefore, we only expose PTM, HFI, and ITD to userspace when we need to
> support ITD. Considering that ITD is currently only used on client
> platforms with 1 package and 1 die, these temporary restrictions should
> not have much impact.
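
The dependencies in section 3 and the topology restrictions above can be summarized as a validity check on the Guest configuration. This is just a sketch under my own assumptions: the structure and function names are hypothetical, and a real VMM/KVM check would work on CPUID bits and the VM topology instead of a plain struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of what the VMM wants to expose. */
struct guest_thermal_cfg {
	unsigned int packages;
	unsigned int dies_per_package;
	bool acpi, tm, ptm, hfi, itd;
};

bool guest_thermal_cfg_valid(const struct guest_thermal_cfg *c)
{
	bool single_pkg_die;

	assert(c != NULL);
	single_pkg_die = c->packages == 1 && c->dies_per_package == 1;

	/* PTM, HFI and ITD are emulated per VM: one package, one die. */
	if ((c->ptm || c->hfi || c->itd) && !single_pkg_die)
		return false;

	/* HFI needs PTM plus the thermal interrupt prerequisites. */
	if (c->hfi && !(c->ptm && c->acpi && c->tm))
		return false;

	/* ITD is an extension of HFI. */
	if (c->itd && !c->hfi)
		return false;

	return true;
}
```

A Guest that exposes none of these features passes the check with any topology, which matches the intent of exposing PTM/HFI only when ITD is needed.
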
>
>
> 4. Overview of ITD (and HFI) virtualization
> ===========================================
>
> The main tasks of ITD (including HFI) virtualization are:
> * maintain a virtual HFI table for the VM.
> * inject a thermal interrupt when the HFI table is updated.
> * emulate the related MSRs and adjust the HFI table based on the MSRs'
>   control bits.
> * expose ITD/HFI configuration info in the related CPUID leaves.
>
> The most important of these is the maintenance of the virtual HFI table.
> Although the HFI table should also be per package, since ITD/HFI related
> MSRs are treated as per VM in KVM, we also treat the virtual HFI table
> as per VM.
>
>
> 4.1. HFI table building
> ^^^^^^^^^^^^^^^^^^^^^^^
>
> The HFI table contains a table header and many table entries. Each table
> entry is identified by an HFI table index, and each CPU corresponds to
> one of the HFI table indexes.
>
> The ITD and HFI features both depend on the HFI table, but their HFI
> tables are a little different. The HFI table provided by the ITD feature
> has more classes (i.e. more columns in the table) than the HFI table of
> the native HFI feature.
>
> The virtual HFI table in KVM is built from the actual HFI table, which
> is maintained by the HFI instance in the HFI driver. We extract the HFI
> data of the pCPUs that the vCPUs are running on to form the virtual HFI
> table.
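
The table building described above could be sketched as follows. This is a simplified user-space model, not the KVM code: all names are hypothetical, the table header is omitted, MAX_CLASSES is an arbitrary illustrative limit, and the host table is indexed directly by pCPU number (the real host table is indexed by each pCPU's own HFI index).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CLASSES 4	/* illustrative upper bound, not from the series */

/* One row: a performance/energy-efficiency pair per class. */
struct hfi_row {
	uint8_t perf[MAX_CLASSES];
	uint8_t ee[MAX_CLASSES];
};

struct vcpu_info {
	int pcpu;		/* pCPU the vCPU is running on */
	uint16_t hfi_index;	/* row assigned to this vCPU */
};

/*
 * For each vCPU, copy the host row of its pCPU into the virtual row
 * selected by the vCPU's HFI table index. Only the first nr_classes
 * columns are copied (one for native HFI, more for ITD).
 */
void build_virtual_hfi(struct hfi_row *virt, size_t virt_rows,
		       const struct hfi_row *host, size_t host_rows,
		       const struct vcpu_info *vcpus, size_t nr_vcpus,
		       unsigned int nr_classes)
{
	assert(nr_classes <= MAX_CLASSES);

	for (size_t i = 0; i < nr_vcpus; i++) {
		const struct hfi_row *src;
		struct hfi_row *dst;

		assert(vcpus[i].pcpu >= 0 &&
		       (size_t)vcpus[i].pcpu < host_rows);
		assert(vcpus[i].hfi_index < virt_rows);

		src = &host[vcpus[i].pcpu];
		dst = &virt[vcpus[i].hfi_index];
		for (unsigned int c = 0; c < nr_classes; c++) {
			dst->perf[c] = src->perf[c];
			dst->ee[c] = src->ee[c];
		}
	}
}
```
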
>
>
> 4.2. HFI table index
> ^^^^^^^^^^^^^^^^^^^^
>
> There are many entries in the HFI table, and each vCPU is assigned an
> HFI table index that specifies the entry it maps to. KVM fills the HFI
> data of the pCPU that the vCPU is running on into the entry of the
> virtual HFI table that corresponds to the vCPU's HFI table index.
>
> This index is set by the VMM in CPUID.
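
For reference, a sketch of how a VMM might encode the per-vCPU index in CPUID. It assumes the CPUID.06H:EDX layout as I understand it from the SDM and the Linux HFI driver: capability bitmap in bits 7:0, table size in 4KB pages minus one in bits 11:8, and the row index in bits 31:16. The function names are hypothetical, so please check the layout against the SDM before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Pack CPUID.06H:EDX for one vCPU (assumed layout, see above). */
uint32_t cpuid6_edx_pack(uint8_t caps, unsigned int table_pages,
			 uint16_t index)
{
	/* The size field is 4 bits wide and stores "pages - 1". */
	assert(table_pages >= 1 && table_pages <= 16);

	return (uint32_t)caps |
	       ((uint32_t)(table_pages - 1) << 8) |
	       ((uint32_t)index << 16);
}

/* Extract the HFI table index of this logical processor. */
uint16_t cpuid6_edx_index(uint32_t edx)
{
	return (uint16_t)(edx >> 16);
}
```
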
>
>
> 4.3. HFI table updating
> ^^^^^^^^^^^^^^^^^^^^^^^
>
> On some platforms, the HFI table is dynamically updated, with the update
> signaled by a thermal interrupt. To update the virtual HFI table in time,
> we add a per-VM notifier to the HFI driver that notifies KVM to update
> the virtual HFI table for the VM; KVM then injects a thermal interrupt
> into the VM to notify the Guest.
>
> The virtual HFI table also needs to be updated when a vCPU is migrated:
> the pCPU it runs on changes, so the corresponding virtual HFI data must
> be updated to the new pCPU's data. In this case, to reduce overhead, we
> only update the data of that single vCPU without traversing the entire
> virtual HFI table.
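
The two update paths above (host notification vs. vCPU migration) might look roughly like this. Again a simplified sketch with hypothetical names and a single class per entry. Marking the interrupt as pending only when the Guest-visible data actually changed is my assumption, not something stated in the cover letter, and the locking (pkg_therm_lock) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct hfi_entry {
	uint8_t perf;
	uint8_t ee;
};

struct virt_hfi {
	struct hfi_entry *rows;	/* indexed by HFI table index */
	size_t nr_rows;
	bool irq_pending;	/* thermal interrupt to inject */
};

struct vcpu {
	int pcpu;
	uint16_t hfi_index;
};

/* Refresh one vCPU's row; returns true if the data changed. */
bool virt_hfi_update_one(struct virt_hfi *t, const struct hfi_entry *host,
			 const struct vcpu *v)
{
	struct hfi_entry *dst;

	assert(v->hfi_index < t->nr_rows);
	dst = &t->rows[v->hfi_index];
	if (dst->perf == host[v->pcpu].perf && dst->ee == host[v->pcpu].ee)
		return false;

	*dst = host[v->pcpu];
	return true;
}

/* Host HFI table changed: walk all vCPUs of the VM. */
void virt_hfi_on_host_update(struct virt_hfi *t,
			     const struct hfi_entry *host,
			     const struct vcpu *vcpus, size_t nr_vcpus)
{
	for (size_t i = 0; i < nr_vcpus; i++) {
		if (virt_hfi_update_one(t, host, &vcpus[i]))
			t->irq_pending = true;
	}
}

/* vCPU migrated: refresh only that vCPU's row. */
void virt_hfi_on_migrate(struct virt_hfi *t, const struct hfi_entry *host,
			 struct vcpu *v, int new_pcpu)
{
	v->pcpu = new_pcpu;
	if (virt_hfi_update_one(t, host, v))
		t->irq_pending = true;
}
```
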
>
>
> 5. Patch Summary
> ================
>
> Patch 01-03: Prepare the bit definition, the hfi helpers and hfi data
> structures that KVM needs.
> Patch 04-05: Add the sched_out arch hook and reset the classification
> history at sched_in()/sched_out().
> Patch 06-10: Add emulations of ACPI, TM and PTM, mainly about CPUID and
> related MSRs.
> Patch 11-20: Add the emulation support for HFI, including maintaining
> the HFI table for VM.
> Patch 21-23: Add the emulation support for ITD, including extending HFI
> to ITD and passing through the classification MSRs.
> Patch 24-25: Add HRESET emulation support, which is also used by IPC
> classes feature.
> Patch 26: Add the brief doc about the per-VM lock - pkg_therm_lock.
>
>
> 6. References
> =============
>
> [1]: [PATCH 0/9] thermal: intel: hfi: Prework for the virtualization of HFI
> https://lore.kernel.org/lkml/20240203040515.23947-1-ricardo.neri-calderon@linux.intel.com/
> [2]: [RFC 00/52] Introduce hybrid CPU topology,
> https://lore.kernel.org/qemu-devel/20230213095035.158240-1-zhao1.liu@linux.intel.com/
> [3]: [RFC 00/41] qom-topo: Abstract Everything about CPU Topology,
> https://lore.kernel.org/qemu-devel/20231130144203.2307629-1-zhao1.liu@linux.intel.com/
> [4]: SDM, vol. 3B, section 15.6 HARDWARE FEEDBACK INTERFACE AND INTEL
> THREAD DIRECTOR
>
>
> Thanks and Best Regards,
> Zhao
> ---
> Zhao Liu (17):
> thermal: Add bit definition for x86 thermal related MSRs
> KVM: Add kvm_arch_sched_out() hook
> KVM: x86: Reset hardware history at vCPU's sched_in/out
> KVM: VMX: Add helpers to handle the writes to MSR's R/O and R/WC0 bits
> KVM: x86: cpuid: Define CPUID 0x06.eax by kvm_cpu_cap_mask()
> KVM: VMX: Introduce HFI description structure
> KVM: VMX: Introduce HFI table index for vCPU
> KVM: x86: Introduce the HFI dynamic update request and kvm_x86_ops
> KVM: VMX: Allow to inject thermal interrupt without HFI update
> KVM: VMX: Emulate HFI related bits in package thermal MSRs
> KVM: VMX: Emulate the MSRs of HFI feature
> KVM: x86: Expose HFI feature bit and HFI info in CPUID
> KVM: VMX: Extend HFI table and MSR emulation to support ITD
> KVM: VMX: Pass through ITD classification related MSRs to Guest
> KVM: x86: Expose ITD feature bit and related info in CPUID
> KVM: VMX: Emulate the MSR of HRESET feature
> Documentation: KVM: Add description of pkg_therm_lock
>
> Zhuocheng Ding (9):
> thermal: intel: hfi: Add helpers to build HFI/ITD structures
> thermal: intel: hfi: Add HFI notifier helpers to notify HFI update
> KVM: VMX: Emulate ACPI (CPUID.0x01.edx[bit 22]) feature
> KVM: x86: Expose TM/ACC (CPUID.0x01.edx[bit 29]) feature bit to VM
> KVM: VMX: Emulate PTM/PTS (CPUID.0x06.eax[bit 6]) feature
> KVM: VMX: Support virtual HFI table for VM
> KVM: VMX: Sync update of Host HFI table to Guest
> KVM: VMX: Update HFI table when vCPU migrates
> KVM: x86: Expose HRESET feature's CPUID to Guest
>
> Documentation/virt/kvm/locking.rst | 13 +-
> arch/arm64/include/asm/kvm_host.h | 1 +
> arch/mips/include/asm/kvm_host.h | 1 +
> arch/powerpc/include/asm/kvm_host.h | 1 +
> arch/riscv/include/asm/kvm_host.h | 1 +
> arch/s390/include/asm/kvm_host.h | 1 +
> arch/x86/include/asm/hfi.h | 28 ++
> arch/x86/include/asm/kvm-x86-ops.h | 3 +-
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/include/asm/msr-index.h | 54 +-
> arch/x86/kvm/cpuid.c | 201 +++++++-
> arch/x86/kvm/irq.h | 1 +
> arch/x86/kvm/lapic.c | 9 +
> arch/x86/kvm/svm/svm.c | 8 +
> arch/x86/kvm/vmx/vmx.c | 751 +++++++++++++++++++++++++++-
> arch/x86/kvm/vmx/vmx.h | 79 ++-
> arch/x86/kvm/x86.c | 18 +
> drivers/thermal/intel/intel_hfi.c | 212 +++++++-
> drivers/thermal/intel/therm_throt.c | 1 -
> include/linux/kvm_host.h | 1 +
> virt/kvm/kvm_main.c | 1 +
> 21 files changed, 1343 insertions(+), 44 deletions(-)
>
> --
> 2.34.1
>