* [RFC PATCH v2 01/34] x86/msr: Move rdtsc{,_ordered}() to <asm/tsc.h>
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
@ 2025-04-22 8:21 ` Xin Li (Intel)
2025-04-23 14:13 ` Dave Hansen
2025-04-22 8:21 ` [RFC PATCH v2 02/34] x86/msr: Remove rdpmc() Xin Li (Intel)
` (33 subsequent siblings)
34 siblings, 1 reply; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:21 UTC
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
Relocate rdtsc{,_ordered}() from <asm/msr.h> to <asm/tsc.h>, and
subsequently remove the inclusion of <asm/msr.h> in <asm/tsc.h>.
Consequently, <asm/msr.h> must be included in several source files
that previously did not require it.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/boot/startup/sme.c | 1 +
arch/x86/events/msr.c | 3 +
arch/x86/events/perf_event.h | 1 +
arch/x86/events/probe.c | 2 +
arch/x86/hyperv/ivm.c | 1 +
arch/x86/include/asm/fred.h | 1 +
arch/x86/include/asm/microcode.h | 2 +
arch/x86/include/asm/mshyperv.h | 1 +
arch/x86/include/asm/msr.h | 55 +-------------
arch/x86/include/asm/suspend_32.h | 1 +
arch/x86/include/asm/suspend_64.h | 1 +
arch/x86/include/asm/switch_to.h | 2 +
arch/x86/include/asm/tsc.h | 76 ++++++++++++++++++-
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 1 +
arch/x86/kernel/fpu/xstate.h | 1 +
arch/x86/kernel/hpet.c | 1 +
arch/x86/kernel/process_64.c | 1 +
arch/x86/kernel/trace_clock.c | 2 +-
arch/x86/kernel/tsc_sync.c | 1 +
arch/x86/lib/kaslr.c | 2 +-
arch/x86/realmode/init.c | 1 +
drivers/acpi/processor_perflib.c | 1 +
drivers/acpi/processor_throttling.c | 3 +-
drivers/cpufreq/amd-pstate-ut.c | 2 +
drivers/hwmon/hwmon-vid.c | 4 +
drivers/net/vmxnet3/vmxnet3_drv.c | 4 +
.../intel/speed_select_if/isst_if_common.c | 1 +
drivers/platform/x86/intel/turbo_max_3.c | 1 +
28 files changed, 115 insertions(+), 58 deletions(-)
diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c
index 5738b31c8e60..591d6a4d2e59 100644
--- a/arch/x86/boot/startup/sme.c
+++ b/arch/x86/boot/startup/sme.c
@@ -44,6 +44,7 @@
#include <asm/sections.h>
#include <asm/coco.h>
#include <asm/sev.h>
+#include <asm/msr.h>
#define PGD_FLAGS _KERNPG_TABLE_NOENC
#define P4D_FLAGS _KERNPG_TABLE_NOENC
diff --git a/arch/x86/events/msr.c b/arch/x86/events/msr.c
index 8970ecef87c5..c39e49cecace 100644
--- a/arch/x86/events/msr.c
+++ b/arch/x86/events/msr.c
@@ -3,6 +3,9 @@
#include <linux/sysfs.h>
#include <linux/nospec.h>
#include <asm/cpu_device_id.h>
+#include <asm/msr.h>
+#include <asm/tsc.h>
+
#include "probe.h"
enum perf_msr_id {
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index b29b452b1187..53ef48b4c65c 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -17,6 +17,7 @@
#include <asm/fpu/xstate.h>
#include <asm/intel_ds.h>
#include <asm/cpu.h>
+#include <asm/msr.h>
/* To enable MSR tracing please use the generic trace points. */
diff --git a/arch/x86/events/probe.c b/arch/x86/events/probe.c
index fda35cf25528..bb719d0d3f0b 100644
--- a/arch/x86/events/probe.c
+++ b/arch/x86/events/probe.c
@@ -2,6 +2,8 @@
#include <linux/export.h>
#include <linux/types.h>
#include <linux/bits.h>
+
+#include <asm/msr.h>
#include "probe.h"
static umode_t
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 1b8a2415183b..8209de792388 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -22,6 +22,7 @@
#include <asm/realmode.h>
#include <asm/e820/api.h>
#include <asm/desc.h>
+#include <asm/msr.h>
#include <uapi/asm/vmx.h>
#ifdef CONFIG_AMD_MEM_ENCRYPT
diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index 2a29e5216881..12b34d5b2953 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -9,6 +9,7 @@
#include <linux/const.h>
#include <asm/asm.h>
+#include <asm/msr.h>
#include <asm/trapnr.h>
/*
diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
index 263ea3dd0001..107a1aaa211b 100644
--- a/arch/x86/include/asm/microcode.h
+++ b/arch/x86/include/asm/microcode.h
@@ -2,6 +2,8 @@
#ifndef _ASM_X86_MICROCODE_H
#define _ASM_X86_MICROCODE_H
+#include <asm/msr.h>
+
struct cpu_signature {
unsigned int sig;
unsigned int pf;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index bab5ccfc60a7..15d00dace70f 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -8,6 +8,7 @@
#include <linux/io.h>
#include <asm/nospec-branch.h>
#include <asm/paravirt.h>
+#include <asm/msr.h>
#include <hyperv/hvhdk.h>
/*
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 2ccc78ebc3d7..2caa13830e11 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -12,6 +12,7 @@
#include <uapi/asm/msr.h>
#include <asm/shared/msr.h>
+#include <linux/types.h>
#include <linux/percpu.h>
struct msr_info {
@@ -169,60 +170,6 @@ native_write_msr_safe(u32 msr, u32 low, u32 high)
extern int rdmsr_safe_regs(u32 regs[8]);
extern int wrmsr_safe_regs(u32 regs[8]);
-/**
- * rdtsc() - returns the current TSC without ordering constraints
- *
- * rdtsc() returns the result of RDTSC as a 64-bit integer. The
- * only ordering constraint it supplies is the ordering implied by
- * "asm volatile": it will put the RDTSC in the place you expect. The
- * CPU can and will speculatively execute that RDTSC, though, so the
- * results can be non-monotonic if compared on different CPUs.
- */
-static __always_inline u64 rdtsc(void)
-{
- DECLARE_ARGS(val, low, high);
-
- asm volatile("rdtsc" : EAX_EDX_RET(val, low, high));
-
- return EAX_EDX_VAL(val, low, high);
-}
-
-/**
- * rdtsc_ordered() - read the current TSC in program order
- *
- * rdtsc_ordered() returns the result of RDTSC as a 64-bit integer.
- * It is ordered like a load to a global in-memory counter. It should
- * be impossible to observe non-monotonic rdtsc_unordered() behavior
- * across multiple CPUs as long as the TSC is synced.
- */
-static __always_inline u64 rdtsc_ordered(void)
-{
- DECLARE_ARGS(val, low, high);
-
- /*
- * The RDTSC instruction is not ordered relative to memory
- * access. The Intel SDM and the AMD APM are both vague on this
- * point, but empirically an RDTSC instruction can be
- * speculatively executed before prior loads. An RDTSC
- * immediately after an appropriate barrier appears to be
- * ordered as a normal load, that is, it provides the same
- * ordering guarantees as reading from a global memory location
- * that some other imaginary CPU is updating continuously with a
- * time stamp.
- *
- * Thus, use the preferred barrier on the respective CPU, aiming for
- * RDTSCP as the default.
- */
- asm volatile(ALTERNATIVE_2("rdtsc",
- "lfence; rdtsc", X86_FEATURE_LFENCE_RDTSC,
- "rdtscp", X86_FEATURE_RDTSCP)
- : EAX_EDX_RET(val, low, high)
- /* RDTSCP clobbers ECX with MSR_TSC_AUX. */
- :: "ecx");
-
- return EAX_EDX_VAL(val, low, high);
-}
-
static inline u64 native_read_pmc(int counter)
{
DECLARE_ARGS(val, low, high);
diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
index d8416b3bf832..e8e5aab06255 100644
--- a/arch/x86/include/asm/suspend_32.h
+++ b/arch/x86/include/asm/suspend_32.h
@@ -9,6 +9,7 @@
#include <asm/desc.h>
#include <asm/fpu/api.h>
+#include <asm/msr.h>
/* image of the saved processor state */
struct saved_context {
diff --git a/arch/x86/include/asm/suspend_64.h b/arch/x86/include/asm/suspend_64.h
index 54df06687d83..b512f9665f78 100644
--- a/arch/x86/include/asm/suspend_64.h
+++ b/arch/x86/include/asm/suspend_64.h
@@ -9,6 +9,7 @@
#include <asm/desc.h>
#include <asm/fpu/api.h>
+#include <asm/msr.h>
/*
* Image of the saved processor state, used by the low level ACPI suspend to
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 75248546403d..4f21df7af715 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -52,6 +52,8 @@ do { \
} while (0)
#ifdef CONFIG_X86_32
+#include <asm/msr.h>
+
static inline void refresh_sysenter_cs(struct thread_struct *thread)
{
/* Only happens when SEP is enabled, no need to test "SEP"arately: */
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 94408a784c8e..13335a130edf 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -7,7 +7,81 @@
#include <asm/cpufeature.h>
#include <asm/processor.h>
-#include <asm/msr.h>
+
+/*
+ * both i386 and x86_64 returns 64-bit value in edx:eax, but gcc's "A"
+ * constraint has different meanings. For i386, "A" means exactly
+ * edx:eax, while for x86_64 it doesn't mean rdx:rax or edx:eax. Instead,
+ * it means rax *or* rdx.
+ */
+#ifdef CONFIG_X86_64
+/* Using 64-bit values saves one instruction clearing the high half of low */
+#define DECLARE_ARGS(val, low, high) unsigned long low, high
+#define EAX_EDX_VAL(val, low, high) ((low) | (high) << 32)
+#define EAX_EDX_RET(val, low, high) "=a" (low), "=d" (high)
+#else
+#define DECLARE_ARGS(val, low, high) u64 val
+#define EAX_EDX_VAL(val, low, high) (val)
+#define EAX_EDX_RET(val, low, high) "=A" (val)
+#endif
+
+/**
+ * rdtsc() - returns the current TSC without ordering constraints
+ *
+ * rdtsc() returns the result of RDTSC as a 64-bit integer. The
+ * only ordering constraint it supplies is the ordering implied by
+ * "asm volatile": it will put the RDTSC in the place you expect. The
+ * CPU can and will speculatively execute that RDTSC, though, so the
+ * results can be non-monotonic if compared on different CPUs.
+ */
+static __always_inline u64 rdtsc(void)
+{
+ DECLARE_ARGS(val, low, high);
+
+ asm volatile("rdtsc" : EAX_EDX_RET(val, low, high));
+
+ return EAX_EDX_VAL(val, low, high);
+}
+
+/**
+ * rdtsc_ordered() - read the current TSC in program order
+ *
+ * rdtsc_ordered() returns the result of RDTSC as a 64-bit integer.
+ * It is ordered like a load to a global in-memory counter. It should
+ * be impossible to observe non-monotonic rdtsc_unordered() behavior
+ * across multiple CPUs as long as the TSC is synced.
+ */
+static __always_inline u64 rdtsc_ordered(void)
+{
+ DECLARE_ARGS(val, low, high);
+
+ /*
+ * The RDTSC instruction is not ordered relative to memory
+ * access. The Intel SDM and the AMD APM are both vague on this
+ * point, but empirically an RDTSC instruction can be
+ * speculatively executed before prior loads. An RDTSC
+ * immediately after an appropriate barrier appears to be
+ * ordered as a normal load, that is, it provides the same
+ * ordering guarantees as reading from a global memory location
+ * that some other imaginary CPU is updating continuously with a
+ * time stamp.
+ *
+ * Thus, use the preferred barrier on the respective CPU, aiming for
+ * RDTSCP as the default.
+ */
+ asm volatile(ALTERNATIVE_2("rdtsc",
+ "lfence; rdtsc", X86_FEATURE_LFENCE_RDTSC,
+ "rdtscp", X86_FEATURE_RDTSCP)
+ : EAX_EDX_RET(val, low, high)
+ /* RDTSCP clobbers ECX with MSR_TSC_AUX. */
+ :: "ecx");
+
+ return EAX_EDX_VAL(val, low, high);
+}
+
+#undef DECLARE_ARGS
+#undef EAX_EDX_VAL
+#undef EAX_EDX_RET
/*
* Standard way to access the cycle counter.
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 2a82eb6a0376..26c354bdea07 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -25,6 +25,7 @@
#include <asm/cpu_device_id.h>
#include <asm/resctrl.h>
#include <asm/perf_event.h>
+#include <asm/msr.h>
#include "../../events/perf_event.h" /* For X86_CONFIG() */
#include "internal.h"
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index a3b7dcbdb060..52ce19289989 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -5,6 +5,7 @@
#include <asm/cpufeature.h>
#include <asm/fpu/xstate.h>
#include <asm/fpu/xcr.h>
+#include <asm/msr.h>
#ifdef CONFIG_X86_64
DECLARE_PER_CPU(u64, xfd_state);
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index cc5d12232216..c9982a7c9536 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -12,6 +12,7 @@
#include <asm/hpet.h>
#include <asm/time.h>
#include <asm/mwait.h>
+#include <asm/msr.h>
#undef pr_fmt
#define pr_fmt(fmt) "hpet: " fmt
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 24e1ccf22912..cfa9c031de91 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -57,6 +57,7 @@
#include <asm/unistd.h>
#include <asm/fsgsbase.h>
#include <asm/fred.h>
+#include <asm/msr.h>
#ifdef CONFIG_IA32_EMULATION
/* Not included via unistd.h */
#include <asm/unistd_32_ia32.h>
diff --git a/arch/x86/kernel/trace_clock.c b/arch/x86/kernel/trace_clock.c
index b8e7abe00b06..708d61743d15 100644
--- a/arch/x86/kernel/trace_clock.c
+++ b/arch/x86/kernel/trace_clock.c
@@ -4,7 +4,7 @@
*/
#include <asm/trace_clock.h>
#include <asm/barrier.h>
-#include <asm/msr.h>
+#include <asm/tsc.h>
/*
* trace_clock_x86_tsc(): A clock that is just the cycle counter.
diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index f1c7a86dbf49..ec3aa340d351 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -21,6 +21,7 @@
#include <linux/kernel.h>
#include <linux/smp.h>
#include <linux/nmi.h>
+#include <asm/msr.h>
#include <asm/tsc.h>
struct tsc_adjust {
diff --git a/arch/x86/lib/kaslr.c b/arch/x86/lib/kaslr.c
index a58f451a7dd3..b5893928d55c 100644
--- a/arch/x86/lib/kaslr.c
+++ b/arch/x86/lib/kaslr.c
@@ -8,7 +8,7 @@
*/
#include <asm/asm.h>
#include <asm/kaslr.h>
-#include <asm/msr.h>
+#include <asm/tsc.h>
#include <asm/archrandom.h>
#include <asm/e820/api.h>
#include <asm/shared/io.h>
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 263787b4800c..ed5c63c0b4e5 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -9,6 +9,7 @@
#include <asm/realmode.h>
#include <asm/tlbflush.h>
#include <asm/crash.h>
+#include <asm/msr.h>
#include <asm/sev.h>
struct real_mode_header *real_mode_header;
diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
index 53996f1a2d80..64b8d1e19594 100644
--- a/drivers/acpi/processor_perflib.c
+++ b/drivers/acpi/processor_perflib.c
@@ -20,6 +20,7 @@
#include <acpi/processor.h>
#ifdef CONFIG_X86
#include <asm/cpufeature.h>
+#include <asm/msr.h>
#endif
#define ACPI_PROCESSOR_FILE_PERFORMANCE "performance"
diff --git a/drivers/acpi/processor_throttling.c b/drivers/acpi/processor_throttling.c
index 00d045e5f524..8482e9a8a7aa 100644
--- a/drivers/acpi/processor_throttling.c
+++ b/drivers/acpi/processor_throttling.c
@@ -18,9 +18,10 @@
#include <linux/sched.h>
#include <linux/cpufreq.h>
#include <linux/acpi.h>
+#include <linux/uaccess.h>
#include <acpi/processor.h>
#include <asm/io.h>
-#include <linux/uaccess.h>
+#include <asm/asm.h>
/* ignore_tpc:
* 0 -> acpi processor driver doesn't ignore _TPC values
diff --git a/drivers/cpufreq/amd-pstate-ut.c b/drivers/cpufreq/amd-pstate-ut.c
index 707fa81c749f..c8d031b297d2 100644
--- a/drivers/cpufreq/amd-pstate-ut.c
+++ b/drivers/cpufreq/amd-pstate-ut.c
@@ -31,6 +31,8 @@
#include <acpi/cppc_acpi.h>
+#include <asm/msr.h>
+
#include "amd-pstate.h"
diff --git a/drivers/hwmon/hwmon-vid.c b/drivers/hwmon/hwmon-vid.c
index 6d1175a51832..2df4956296ed 100644
--- a/drivers/hwmon/hwmon-vid.c
+++ b/drivers/hwmon/hwmon-vid.c
@@ -15,6 +15,10 @@
#include <linux/kernel.h>
#include <linux/hwmon-vid.h>
+#ifdef CONFIG_X86
+#include <asm/msr.h>
+#endif
+
/*
* Common code for decoding VID pins.
*
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 3df6aabc7e33..7edd0b5e0e77 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -27,6 +27,10 @@
#include <linux/module.h>
#include <net/ip6_checksum.h>
+#ifdef CONFIG_X86
+#include <asm/msr.h>
+#endif
+
#include "vmxnet3_int.h"
#include "vmxnet3_xdp.h"
diff --git a/drivers/platform/x86/intel/speed_select_if/isst_if_common.c b/drivers/platform/x86/intel/speed_select_if/isst_if_common.c
index 44dcd165b4c0..8a5713593811 100644
--- a/drivers/platform/x86/intel/speed_select_if/isst_if_common.c
+++ b/drivers/platform/x86/intel/speed_select_if/isst_if_common.c
@@ -21,6 +21,7 @@
#include <asm/cpu_device_id.h>
#include <asm/intel-family.h>
+#include <asm/msr.h>
#include "isst_if_common.h"
diff --git a/drivers/platform/x86/intel/turbo_max_3.c b/drivers/platform/x86/intel/turbo_max_3.c
index 7e538bbd5b50..b5af3e91ba04 100644
--- a/drivers/platform/x86/intel/turbo_max_3.c
+++ b/drivers/platform/x86/intel/turbo_max_3.c
@@ -17,6 +17,7 @@
#include <asm/cpu_device_id.h>
#include <asm/intel-family.h>
+#include <asm/msr.h>
#define MSR_OC_MAILBOX 0x150
#define MSR_OC_MAILBOX_CMD_OFFSET 32
--
2.49.0
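
A minimal usage sketch of the relocated helpers (illustrative only; the
time_region() wrapper is hypothetical and not part of this patch). With
the move applied, a caller pulls both readers from <asm/tsc.h>, and the
ordered variant keeps the two reads from being speculated across the
measured instructions:

        #include <asm/tsc.h>

        /* Cycles spent in fn(), bounded by ordered TSC reads. */
        static u64 time_region(void (*fn)(void))
        {
                u64 start, end;

                start = rdtsc_ordered();
                fn();
                end = rdtsc_ordered();

                return end - start;
        }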
* Re: [RFC PATCH v2 01/34] x86/msr: Move rdtsc{,_ordered}() to <asm/tsc.h>
2025-04-22 8:21 ` [RFC PATCH v2 01/34] x86/msr: Move rdtsc{,_ordered}() to <asm/tsc.h> Xin Li (Intel)
@ 2025-04-23 14:13 ` Dave Hansen
2025-04-23 17:12 ` Xin Li
0 siblings, 1 reply; 94+ messages in thread
From: Dave Hansen @ 2025-04-23 14:13 UTC
To: Xin Li (Intel), linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
On 4/22/25 01:21, Xin Li (Intel) wrote:
> Relocate rdtsc{,_ordered}() from <asm/msr.h> to <asm/tsc.h>, and
> subsequently remove the inclusion of <asm/msr.h> in <asm/tsc.h>.
> Consequently, <asm/msr.h> must be included in several source files
> that previously did not require it.
I know it's mildly obvious but could you please add a problem statement
to these changelogs, even if it's just one little sentence?
For some reason, there are some TSC-related functions in the
MSR header even though there is a tsc.h header.
Relocate rdtsc{,_ordered}() and subsequently remove the
inclusion of <asm/msr.h> in <asm/tsc.h>. Consequently,
<asm/msr.h> must be included in several source files that
previously did not require it.
But I agree with the concept, so with this fixed:
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
* Re: [RFC PATCH v2 01/34] x86/msr: Move rdtsc{,_ordered}() to <asm/tsc.h>
2025-04-23 14:13 ` Dave Hansen
@ 2025-04-23 17:12 ` Xin Li
0 siblings, 0 replies; 94+ messages in thread
From: Xin Li @ 2025-04-23 17:12 UTC
To: Dave Hansen, linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
On 4/23/2025 7:13 AM, Dave Hansen wrote:
> On 4/22/25 01:21, Xin Li (Intel) wrote:
>> Relocate rdtsc{,_ordered}() from <asm/msr.h> to <asm/tsc.h>, and
>> subsequently remove the inclusion of <asm/msr.h> in <asm/tsc.h>.
>> Consequently, <asm/msr.h> must be included in several source files
>> that previously did not require it.
>
> I know it's mildly obvious but could you please add a problem statement
> to these changelogs, even if it's just one little sentence?
So "ALWAYS make a changelog a complete story", right?
And that would be helpful for long-term maintainability.
>
> For some reason, there are some TSC-related functions in the
> MSR header even though there is a tsc.h header.
>
> Relocate rdtsc{,_ordered}() and subsequently remove the
> inclusion of <asm/msr.h> in <asm/tsc.h>. Consequently,
> <asm/msr.h> must be included in several source files that
> previously did not require it.
>
> But I agree with the concept, so with this fixed:
TBH, I did hesitate to touch so many files just to include msr.h.
But because tsc.h doesn't reference any MSR definitions, it doesn't make
sense to include msr.h in tsc.h, so I went ahead with the big change
anyway.
>
> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Thank you very much!
* [RFC PATCH v2 02/34] x86/msr: Remove rdpmc()
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
2025-04-22 8:21 ` [RFC PATCH v2 01/34] x86/msr: Move rdtsc{,_ordered}() to <asm/tsc.h> Xin Li (Intel)
@ 2025-04-22 8:21 ` Xin Li (Intel)
2025-04-23 14:23 ` Dave Hansen
2025-04-22 8:21 ` [RFC PATCH v2 03/34] x86/msr: Rename rdpmcl() to rdpmcq() Xin Li (Intel)
` (32 subsequent siblings)
34 siblings, 1 reply; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:21 UTC
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
rdpmc() is not used anywhere, remove it.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/include/asm/msr.h | 7 -------
arch/x86/include/asm/paravirt.h | 7 -------
2 files changed, 14 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 2caa13830e11..e05466e486fc 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -234,13 +234,6 @@ static inline int rdmsrq_safe(u32 msr, u64 *p)
return err;
}
-#define rdpmc(counter, low, high) \
-do { \
- u64 _l = native_read_pmc((counter)); \
- (low) = (u32)_l; \
- (high) = (u32)(_l >> 32); \
-} while (0)
-
#define rdpmcl(counter, val) ((val) = native_read_pmc(counter))
#endif /* !CONFIG_PARAVIRT_XXL */
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 86a77528792d..c4dedb984735 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -244,13 +244,6 @@ static inline u64 paravirt_read_pmc(int counter)
return PVOP_CALL1(u64, cpu.read_pmc, counter);
}
-#define rdpmc(counter, low, high) \
-do { \
- u64 _l = paravirt_read_pmc(counter); \
- low = (u32)_l; \
- high = _l >> 32; \
-} while (0)
-
#define rdpmcl(counter, val) ((val) = paravirt_read_pmc(counter))
static inline void paravirt_alloc_ldt(struct desc_struct *ldt, unsigned entries)
--
2.49.0
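
For reference, the removed macro next to the surviving rdpmcl(), at a
hypothetical call site (counter 0 is illustrative, not from the tree):

        u32 lo, hi;
        u64 val;

        rdpmc(0, lo, hi);       /* removed: splits the count into halves */
        rdpmcl(0, val);         /* kept: full 64-bit count in one variable */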
* Re: [RFC PATCH v2 02/34] x86/msr: Remove rdpmc()
2025-04-22 8:21 ` [RFC PATCH v2 02/34] x86/msr: Remove rdpmc() Xin Li (Intel)
@ 2025-04-23 14:23 ` Dave Hansen
0 siblings, 0 replies; 94+ messages in thread
From: Dave Hansen @ 2025-04-23 14:23 UTC (permalink / raw)
To: Xin Li (Intel), linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
On 4/22/25 01:21, Xin Li (Intel) wrote:
> rdpmc() is not used anywhere, remove it.
I'm not sure it was *ever* used (at least since git started). Thanks for
finding this.
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
* [RFC PATCH v2 03/34] x86/msr: Rename rdpmcl() to rdpmcq()
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
2025-04-22 8:21 ` [RFC PATCH v2 01/34] x86/msr: Move rdtsc{,_ordered}() to <asm/tsc.h> Xin Li (Intel)
2025-04-22 8:21 ` [RFC PATCH v2 02/34] x86/msr: Remove rdpmc() Xin Li (Intel)
@ 2025-04-22 8:21 ` Xin Li (Intel)
2025-04-23 14:24 ` Dave Hansen
2025-04-23 14:28 ` Sean Christopherson
2025-04-22 8:21 ` [RFC PATCH v2 04/34] x86/msr: Convert rdpmcq() into a function Xin Li (Intel)
` (31 subsequent siblings)
34 siblings, 2 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:21 UTC
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/events/amd/uncore.c | 2 +-
arch/x86/events/core.c | 2 +-
arch/x86/events/intel/core.c | 4 ++--
arch/x86/events/intel/ds.c | 2 +-
arch/x86/include/asm/msr.h | 2 +-
arch/x86/include/asm/paravirt.h | 2 +-
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 12 ++++++------
7 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index f231e1078e51..b9933ab3116c 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -152,7 +152,7 @@ static void amd_uncore_read(struct perf_event *event)
if (hwc->event_base_rdpmc < 0)
rdmsrq(hwc->event_base, new);
else
- rdpmcl(hwc->event_base_rdpmc, new);
+ rdpmcq(hwc->event_base_rdpmc, new);
local64_set(&hwc->prev_count, new);
delta = (new << COUNTER_SHIFT) - (prev << COUNTER_SHIFT);
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d390472f6c10..3da1f0b3446c 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -139,7 +139,7 @@ u64 x86_perf_event_update(struct perf_event *event)
*/
prev_raw_count = local64_read(&hwc->prev_count);
do {
- rdpmcl(hwc->event_base_rdpmc, new_raw_count);
+ rdpmcq(hwc->event_base_rdpmc, new_raw_count);
} while (!local64_try_cmpxchg(&hwc->prev_count,
&prev_raw_count, new_raw_count));
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 1aed31514869..ba623e6cae1b 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2739,12 +2739,12 @@ static u64 intel_update_topdown_event(struct perf_event *event, int metric_end,
if (!val) {
/* read Fixed counter 3 */
- rdpmcl((3 | INTEL_PMC_FIXED_RDPMC_BASE), slots);
+ rdpmcq((3 | INTEL_PMC_FIXED_RDPMC_BASE), slots);
if (!slots)
return 0;
/* read PERF_METRICS */
- rdpmcl(INTEL_PMC_FIXED_RDPMC_METRICS, metrics);
+ rdpmcq(INTEL_PMC_FIXED_RDPMC_METRICS, metrics);
} else {
slots = val[0];
metrics = val[1];
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index f33395c2e925..4074567219de 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2279,7 +2279,7 @@ intel_pmu_save_and_restart_reload(struct perf_event *event, int count)
WARN_ON(this_cpu_read(cpu_hw_events.enabled));
prev_raw_count = local64_read(&hwc->prev_count);
- rdpmcl(hwc->event_base_rdpmc, new_raw_count);
+ rdpmcq(hwc->event_base_rdpmc, new_raw_count);
local64_set(&hwc->prev_count, new_raw_count);
/*
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index e05466e486fc..ed32637b1df6 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -234,7 +234,7 @@ static inline int rdmsrq_safe(u32 msr, u64 *p)
return err;
}
-#define rdpmcl(counter, val) ((val) = native_read_pmc(counter))
+#define rdpmcq(counter, val) ((val) = native_read_pmc(counter))
#endif /* !CONFIG_PARAVIRT_XXL */
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index c4dedb984735..63ca099f8368 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -244,7 +244,7 @@ static inline u64 paravirt_read_pmc(int counter)
return PVOP_CALL1(u64, cpu.read_pmc, counter);
}
-#define rdpmcl(counter, val) ((val) = paravirt_read_pmc(counter))
+#define rdpmcq(counter, val) ((val) = paravirt_read_pmc(counter))
static inline void paravirt_alloc_ldt(struct desc_struct *ldt, unsigned entries)
{
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 26c354bdea07..a5e21f44b0ca 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -1019,8 +1019,8 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
* used in L1 cache, second to capture accurate value that does not
* include cache misses incurred because of instruction loads.
*/
- rdpmcl(hit_pmcnum, hits_before);
- rdpmcl(miss_pmcnum, miss_before);
+ rdpmcq(hit_pmcnum, hits_before);
+ rdpmcq(miss_pmcnum, miss_before);
/*
* From SDM: Performing back-to-back fast reads are not guaranteed
* to be monotonic.
@@ -1028,8 +1028,8 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
* before proceeding.
*/
rmb();
- rdpmcl(hit_pmcnum, hits_before);
- rdpmcl(miss_pmcnum, miss_before);
+ rdpmcq(hit_pmcnum, hits_before);
+ rdpmcq(miss_pmcnum, miss_before);
/*
* Use LFENCE to ensure all previous instructions are retired
* before proceeding.
@@ -1051,8 +1051,8 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
* before proceeding.
*/
rmb();
- rdpmcl(hit_pmcnum, hits_after);
- rdpmcl(miss_pmcnum, miss_after);
+ rdpmcq(hit_pmcnum, hits_after);
+ rdpmcq(miss_pmcnum, miss_after);
/*
* Use LFENCE to ensure all previous instructions are retired
* before proceeding.
--
2.49.0
* Re: [RFC PATCH v2 03/34] x86/msr: Rename rdpmcl() to rdpmcq()
2025-04-22 8:21 ` [RFC PATCH v2 03/34] x86/msr: Rename rdpmcl() to rdpmcq() Xin Li (Intel)
@ 2025-04-23 14:24 ` Dave Hansen
2025-04-23 14:28 ` Sean Christopherson
1 sibling, 0 replies; 94+ messages in thread
From: Dave Hansen @ 2025-04-23 14:24 UTC (permalink / raw)
To: Xin Li (Intel), linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
On 4/22/25 01:21, Xin Li (Intel) wrote:
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
We had a non-trivial discussion about the l=>q renames. Please at least
include a sentence or two about those discussions.
For the code:
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
* Re: [RFC PATCH v2 03/34] x86/msr: Rename rdpmcl() to rdpmcq()
2025-04-22 8:21 ` [RFC PATCH v2 03/34] x86/msr: Rename rdpmcl() to rdpmcq() Xin Li (Intel)
2025-04-23 14:24 ` Dave Hansen
@ 2025-04-23 14:28 ` Sean Christopherson
2025-04-23 15:06 ` Dave Hansen
1 sibling, 1 reply; 94+ messages in thread
From: Sean Christopherson @ 2025-04-23 14:28 UTC
To: Xin Li (Intel)
Cc: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86, tglx, mingo, bp, dave.hansen, x86, hpa, acme,
jgross, andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, luto, boris.ostrovsky, kys, haiyangz, decui
On Tue, Apr 22, 2025, Xin Li (Intel) wrote:
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
> ---
> arch/x86/events/amd/uncore.c | 2 +-
> arch/x86/events/core.c | 2 +-
> arch/x86/events/intel/core.c | 4 ++--
> arch/x86/events/intel/ds.c | 2 +-
> arch/x86/include/asm/msr.h | 2 +-
> arch/x86/include/asm/paravirt.h | 2 +-
> arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 12 ++++++------
> 7 files changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
> index f231e1078e51..b9933ab3116c 100644
> --- a/arch/x86/events/amd/uncore.c
> +++ b/arch/x86/events/amd/uncore.c
> @@ -152,7 +152,7 @@ static void amd_uncore_read(struct perf_event *event)
> if (hwc->event_base_rdpmc < 0)
> rdmsrq(hwc->event_base, new);
> else
> - rdpmcl(hwc->event_base_rdpmc, new);
> + rdpmcq(hwc->event_base_rdpmc, new);
Now that rdpmc() is gone, i.e. rdpmcl/rdpmcq() is the only helper, why not simply
rename rdpmcl() => rdpmc()? I see no point in adding a 'q' qualifier; it doesn't
disambiguate anything and IMO is pure noise.
* Re: [RFC PATCH v2 03/34] x86/msr: Rename rdpmcl() to rdpmcq()
2025-04-23 14:28 ` Sean Christopherson
@ 2025-04-23 15:06 ` Dave Hansen
2025-04-23 17:23 ` Xin Li
0 siblings, 1 reply; 94+ messages in thread
From: Dave Hansen @ 2025-04-23 15:06 UTC
To: Sean Christopherson, Xin Li (Intel)
Cc: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86, tglx, mingo, bp, dave.hansen, x86, hpa, acme,
jgross, andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, luto, boris.ostrovsky, kys, haiyangz, decui
On 4/23/25 07:28, Sean Christopherson wrote:
> Now that rdpmc() is gone, i.e. rdpmcl/rdpmcq() is the only helper, why not simply
> rename rdpmcl() => rdpmc()? I see no point in adding a 'q' qualifier; it doesn't
> disambiguate anything and IMO is pure noise.
That makes total sense to me.
* Re: [RFC PATCH v2 03/34] x86/msr: Rename rdpmcl() to rdpmcq()
2025-04-23 15:06 ` Dave Hansen
@ 2025-04-23 17:23 ` Xin Li
0 siblings, 0 replies; 94+ messages in thread
From: Xin Li @ 2025-04-23 17:23 UTC (permalink / raw)
To: Dave Hansen, Sean Christopherson
Cc: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86, tglx, mingo, bp, dave.hansen, x86, hpa, acme,
jgross, andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, luto, boris.ostrovsky, kys, haiyangz, decui
On 4/23/2025 8:06 AM, Dave Hansen wrote:
> On 4/23/25 07:28, Sean Christopherson wrote:
>> Now that rdpmc() is gone, i.e. rdpmcl/rdpmcq() is the only helper, why not simply
>> rename rdpmcl() => rdpmc()? I see no point in adding a 'q' qualifier; it doesn't
>> disambiguate anything and IMO is pure noise.
>
> That makes total sense to me.
>
I can't argue with two maintainers over a simple naming question ;), so
I will make the change.
* [RFC PATCH v2 04/34] x86/msr: Convert rdpmcq() into a function
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (2 preceding siblings ...)
2025-04-22 8:21 ` [RFC PATCH v2 03/34] x86/msr: Rename rdpmcl() to rdpmcq() Xin Li (Intel)
@ 2025-04-22 8:21 ` Xin Li (Intel)
2025-04-23 14:25 ` Dave Hansen
2025-04-22 8:21 ` [RFC PATCH v2 05/34] x86/msr: Return u64 consistently in Xen PMC read functions Xin Li (Intel)
` (30 subsequent siblings)
34 siblings, 1 reply; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:21 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/events/amd/uncore.c | 2 +-
arch/x86/events/core.c | 2 +-
arch/x86/events/intel/core.c | 4 ++--
arch/x86/events/intel/ds.c | 2 +-
arch/x86/include/asm/msr.h | 5 ++++-
arch/x86/include/asm/paravirt.h | 4 +---
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 12 ++++++------
7 files changed, 16 insertions(+), 15 deletions(-)
diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index b9933ab3116c..f2601c662783 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -152,7 +152,7 @@ static void amd_uncore_read(struct perf_event *event)
if (hwc->event_base_rdpmc < 0)
rdmsrq(hwc->event_base, new);
else
- rdpmcq(hwc->event_base_rdpmc, new);
+ new = rdpmcq(hwc->event_base_rdpmc);
local64_set(&hwc->prev_count, new);
delta = (new << COUNTER_SHIFT) - (prev << COUNTER_SHIFT);
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 3da1f0b3446c..0a3939b9965e 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -139,7 +139,7 @@ u64 x86_perf_event_update(struct perf_event *event)
*/
prev_raw_count = local64_read(&hwc->prev_count);
do {
- rdpmcq(hwc->event_base_rdpmc, new_raw_count);
+ new_raw_count = rdpmcq(hwc->event_base_rdpmc);
} while (!local64_try_cmpxchg(&hwc->prev_count,
&prev_raw_count, new_raw_count));
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index ba623e6cae1b..4370d0d86013 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2739,12 +2739,12 @@ static u64 intel_update_topdown_event(struct perf_event *event, int metric_end,
if (!val) {
/* read Fixed counter 3 */
- rdpmcq((3 | INTEL_PMC_FIXED_RDPMC_BASE), slots);
+ slots = rdpmcq(3 | INTEL_PMC_FIXED_RDPMC_BASE);
if (!slots)
return 0;
/* read PERF_METRICS */
- rdpmcq(INTEL_PMC_FIXED_RDPMC_METRICS, metrics);
+ metrics = rdpmcq(INTEL_PMC_FIXED_RDPMC_METRICS);
} else {
slots = val[0];
metrics = val[1];
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 4074567219de..845439fd9c03 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2279,7 +2279,7 @@ intel_pmu_save_and_restart_reload(struct perf_event *event, int count)
WARN_ON(this_cpu_read(cpu_hw_events.enabled));
prev_raw_count = local64_read(&hwc->prev_count);
- rdpmcq(hwc->event_base_rdpmc, new_raw_count);
+ new_raw_count = rdpmcq(hwc->event_base_rdpmc);
local64_set(&hwc->prev_count, new_raw_count);
/*
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index ed32637b1df6..01dc8e61ef97 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -234,7 +234,10 @@ static inline int rdmsrq_safe(u32 msr, u64 *p)
return err;
}
-#define rdpmcq(counter, val) ((val) = native_read_pmc(counter))
+static __always_inline u64 rdpmcq(int counter)
+{
+ return native_read_pmc(counter);
+}
#endif /* !CONFIG_PARAVIRT_XXL */
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 63ca099f8368..590824916394 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -239,13 +239,11 @@ static inline int rdmsrq_safe(unsigned msr, u64 *p)
return err;
}
-static inline u64 paravirt_read_pmc(int counter)
+static __always_inline u64 rdpmcq(int counter)
{
return PVOP_CALL1(u64, cpu.read_pmc, counter);
}
-#define rdpmcq(counter, val) ((val) = paravirt_read_pmc(counter))
-
static inline void paravirt_alloc_ldt(struct desc_struct *ldt, unsigned entries)
{
PVOP_VCALL2(cpu.alloc_ldt, ldt, entries);
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index a5e21f44b0ca..276ffab194f6 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -1019,8 +1019,8 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
* used in L1 cache, second to capture accurate value that does not
* include cache misses incurred because of instruction loads.
*/
- rdpmcq(hit_pmcnum, hits_before);
- rdpmcq(miss_pmcnum, miss_before);
+ hits_before = rdpmcq(hit_pmcnum);
+ miss_before = rdpmcq(miss_pmcnum);
/*
* From SDM: Performing back-to-back fast reads are not guaranteed
* to be monotonic.
@@ -1028,8 +1028,8 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
* before proceeding.
*/
rmb();
- rdpmcq(hit_pmcnum, hits_before);
- rdpmcq(miss_pmcnum, miss_before);
+ hits_before = rdpmcq(hit_pmcnum);
+ miss_before = rdpmcq(miss_pmcnum);
/*
* Use LFENCE to ensure all previous instructions are retired
* before proceeding.
@@ -1051,8 +1051,8 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
* before proceeding.
*/
rmb();
- rdpmcq(hit_pmcnum, hits_after);
- rdpmcq(miss_pmcnum, miss_after);
+ hits_after = rdpmcq(hit_pmcnum);
+ miss_after = rdpmcq(miss_pmcnum);
/*
* Use LFENCE to ensure all previous instructions are retired
* before proceeding.
--
2.49.0
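
One practical gain of the function form, shown against the
amd_uncore_read() hunk above (sketch only; hwc, prev and COUNTER_SHIFT
come from that function's context): the read now composes directly in an
expression, and the result type is pinned to u64 by the prototype rather
than by whatever variable the caller happened to pass:

        u64 delta;

        delta = (rdpmcq(hwc->event_base_rdpmc) << COUNTER_SHIFT) -
                (prev << COUNTER_SHIFT);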
* Re: [RFC PATCH v2 04/34] x86/msr: Convert rdpmcq() into a function
2025-04-22 8:21 ` [RFC PATCH v2 04/34] x86/msr: Convert rdpmcq() into a function Xin Li (Intel)
@ 2025-04-23 14:25 ` Dave Hansen
0 siblings, 0 replies; 94+ messages in thread
From: Dave Hansen @ 2025-04-23 14:25 UTC
To: Xin Li (Intel), linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
On 4/22/25 01:21, Xin Li (Intel) wrote:
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Code: good. No changelog: bad.
Once there's some semblance of a changelog:
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
* [RFC PATCH v2 05/34] x86/msr: Return u64 consistently in Xen PMC read functions
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (3 preceding siblings ...)
2025-04-22 8:21 ` [RFC PATCH v2 04/34] x86/msr: Convert rdpmcq() into a function Xin Li (Intel)
@ 2025-04-22 8:21 ` Xin Li (Intel)
2025-04-22 8:40 ` Jürgen Groß
2025-04-22 8:21 ` [RFC PATCH v2 06/34] x86/msr: Use the alternatives mechanism to read PMC Xin Li (Intel)
` (29 subsequent siblings)
34 siblings, 1 reply; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:21 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
The pv_ops PMC read API is defined as:
u64 (*read_pmc)(int counter);
But Xen PMC read functions return unsigned long long, make them
return u64 consistently.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/xen/pmu.c | 6 +++---
arch/x86/xen/xen-ops.h | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index f06987b0efc3..9c1682af620a 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -346,7 +346,7 @@ bool pmu_msr_write(unsigned int msr, uint32_t low, uint32_t high, int *err)
return true;
}
-static unsigned long long xen_amd_read_pmc(int counter)
+static u64 xen_amd_read_pmc(int counter)
{
struct xen_pmu_amd_ctxt *ctxt;
uint64_t *counter_regs;
@@ -366,7 +366,7 @@ static unsigned long long xen_amd_read_pmc(int counter)
return counter_regs[counter];
}
-static unsigned long long xen_intel_read_pmc(int counter)
+static u64 xen_intel_read_pmc(int counter)
{
struct xen_pmu_intel_ctxt *ctxt;
uint64_t *fixed_counters;
@@ -396,7 +396,7 @@ static unsigned long long xen_intel_read_pmc(int counter)
return arch_cntr_pair[counter].counter;
}
-unsigned long long xen_read_pmc(int counter)
+u64 xen_read_pmc(int counter)
{
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
return xen_amd_read_pmc(counter);
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 25e318ef27d6..dc886c3cc24d 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -274,7 +274,7 @@ static inline void xen_pmu_finish(int cpu) {}
bool pmu_msr_read(unsigned int msr, uint64_t *val, int *err);
bool pmu_msr_write(unsigned int msr, uint32_t low, uint32_t high, int *err);
int pmu_apic_update(uint32_t reg);
-unsigned long long xen_read_pmc(int counter);
+u64 xen_read_pmc(int counter);
#ifdef CONFIG_SMP
--
2.49.0
* Re: [RFC PATCH v2 05/34] x86/msr: Return u64 consistently in Xen PMC read functions
2025-04-22 8:21 ` [RFC PATCH v2 05/34] x86/msr: Return u64 consistently in Xen PMC read functions Xin Li (Intel)
@ 2025-04-22 8:40 ` Jürgen Groß
0 siblings, 0 replies; 94+ messages in thread
From: Jürgen Groß @ 2025-04-22 8:40 UTC (permalink / raw)
To: Xin Li (Intel), linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 22.04.25 10:21, Xin Li (Intel) wrote:
> The pv_ops PMC read API is defined as:
> u64 (*read_pmc)(int counter);
>
> But Xen PMC read functions return unsigned long long, make them
> return u64 consistently.
>
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Juergen
* [RFC PATCH v2 06/34] x86/msr: Use the alternatives mechanism to read PMC
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (4 preceding siblings ...)
2025-04-22 8:21 ` [RFC PATCH v2 05/34] x86/msr: Return u64 consistently in Xen PMC read functions Xin Li (Intel)
@ 2025-04-22 8:21 ` Xin Li (Intel)
2025-04-22 8:38 ` Jürgen Groß
2025-04-22 8:21 ` [RFC PATCH v2 07/34] x86/msr: Convert __wrmsr() uses to native_wrmsr{,q}() uses Xin Li (Intel)
` (28 subsequent siblings)
34 siblings, 1 reply; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:21 UTC
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
To eliminate the indirect call overhead introduced by the pv_ops API,
use the alternatives mechanism to read PMC:
1) When built with !CONFIG_XEN_PV, X86_FEATURE_XENPV becomes a
disabled feature, preventing the Xen PMC read code from being
built and ensuring the native code is executed unconditionally.
2) When built with CONFIG_XEN_PV:
2.1) If not running on the Xen hypervisor (!X86_FEATURE_XENPV),
the kernel runtime binary is patched to unconditionally
jump to the native PMC read code.
2.2) If running on the Xen hypervisor (X86_FEATURE_XENPV), the
kernel runtime binary is patched to unconditionally jump
to the Xen PMC read code.
Consequently, remove the pv_ops PMC read API.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/include/asm/msr.h | 31 ++++++++++++++++++++-------
arch/x86/include/asm/paravirt.h | 5 -----
arch/x86/include/asm/paravirt_types.h | 2 --
arch/x86/kernel/paravirt.c | 1 -
arch/x86/xen/enlighten_pv.c | 2 --
drivers/net/vmxnet3/vmxnet3_drv.c | 2 +-
6 files changed, 24 insertions(+), 19 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 01dc8e61ef97..33cf506e2fd6 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -8,6 +8,7 @@
#include <asm/asm.h>
#include <asm/errno.h>
+#include <asm/cpufeature.h>
#include <asm/cpumask.h>
#include <uapi/asm/msr.h>
#include <asm/shared/msr.h>
@@ -73,6 +74,10 @@ static inline void do_trace_read_msr(u32 msr, u64 val, int failed) {}
static inline void do_trace_rdpmc(u32 msr, u64 val, int failed) {}
#endif
+#ifdef CONFIG_XEN_PV
+extern u64 xen_read_pmc(int counter);
+#endif
+
/*
* __rdmsr() and __wrmsr() are the two primitives which are the bare minimum MSR
* accessors and should not have any tracing or other functionality piggybacking
@@ -170,16 +175,32 @@ native_write_msr_safe(u32 msr, u32 low, u32 high)
extern int rdmsr_safe_regs(u32 regs[8]);
extern int wrmsr_safe_regs(u32 regs[8]);
-static inline u64 native_read_pmc(int counter)
+static __always_inline u64 native_rdpmcq(int counter)
{
DECLARE_ARGS(val, low, high);
- asm volatile("rdpmc" : EAX_EDX_RET(val, low, high) : "c" (counter));
+ asm_inline volatile("rdpmc" : EAX_EDX_RET(val, low, high) : "c" (counter));
+
if (tracepoint_enabled(rdpmc))
do_trace_rdpmc(counter, EAX_EDX_VAL(val, low, high), 0);
+
return EAX_EDX_VAL(val, low, high);
}
+static __always_inline u64 rdpmcq(int counter)
+{
+#ifdef CONFIG_XEN_PV
+ if (cpu_feature_enabled(X86_FEATURE_XENPV))
+ return xen_read_pmc(counter);
+#endif
+
+ /*
+ * 1) When built with !CONFIG_XEN_PV.
+ * 2) When built with CONFIG_XEN_PV but not running on Xen hypervisor.
+ */
+ return native_rdpmcq(counter);
+}
+
#ifdef CONFIG_PARAVIRT_XXL
#include <asm/paravirt.h>
#else
@@ -233,12 +254,6 @@ static inline int rdmsrq_safe(u32 msr, u64 *p)
*p = native_read_msr_safe(msr, &err);
return err;
}
-
-static __always_inline u64 rdpmcq(int counter)
-{
- return native_read_pmc(counter);
-}
-
#endif /* !CONFIG_PARAVIRT_XXL */
/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 590824916394..c7689f5f70d6 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -239,11 +239,6 @@ static inline int rdmsrq_safe(unsigned msr, u64 *p)
return err;
}
-static __always_inline u64 rdpmcq(int counter)
-{
- return PVOP_CALL1(u64, cpu.read_pmc, counter);
-}
-
static inline void paravirt_alloc_ldt(struct desc_struct *ldt, unsigned entries)
{
PVOP_VCALL2(cpu.alloc_ldt, ldt, entries);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 631c306ce1ff..475f508531d6 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -101,8 +101,6 @@ struct pv_cpu_ops {
u64 (*read_msr_safe)(unsigned int msr, int *err);
int (*write_msr_safe)(unsigned int msr, unsigned low, unsigned high);
- u64 (*read_pmc)(int counter);
-
void (*start_context_switch)(struct task_struct *prev);
void (*end_context_switch)(struct task_struct *next);
#endif
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 1ccd05d8999f..28d195ad7514 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -132,7 +132,6 @@ struct paravirt_patch_template pv_ops = {
.cpu.write_msr = native_write_msr,
.cpu.read_msr_safe = native_read_msr_safe,
.cpu.write_msr_safe = native_write_msr_safe,
- .cpu.read_pmc = native_read_pmc,
.cpu.load_tr_desc = native_load_tr_desc,
.cpu.set_ldt = native_set_ldt,
.cpu.load_gdt = native_load_gdt,
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 846b5737d320..9fbe187aff00 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1236,8 +1236,6 @@ static const typeof(pv_ops) xen_cpu_ops __initconst = {
.read_msr_safe = xen_read_msr_safe,
.write_msr_safe = xen_write_msr_safe,
- .read_pmc = xen_read_pmc,
-
.load_tr_desc = paravirt_nop,
.set_ldt = xen_set_ldt,
.load_gdt = xen_load_gdt,
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 7edd0b5e0e77..8af3b4d7ef4d 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -151,7 +151,7 @@ static u64
vmxnet3_get_cycles(int pmc)
{
#ifdef CONFIG_X86
- return native_read_pmc(pmc);
+ return native_rdpmcq(pmc);
#else
return 0;
#endif
--
2.49.0
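
The gating pattern in the new rdpmcq(), reduced to its essentials (the
Kconfig option, feature bit and helpers below are hypothetical, for
illustration only): cpu_feature_enabled() folds to a constant zero when
the option is off, so the guest branch is compiled out; when the option
is on, the test is patched at boot into a direct jump by the
alternatives machinery, avoiding any indirect call:

        static __always_inline u64 read_counter(int counter)
        {
        #ifdef CONFIG_GUEST_FOO                         /* hypothetical */
                if (cpu_feature_enabled(X86_FEATURE_FOO))
                        return guest_read_counter(counter);
        #endif
                return native_read_counter(counter);
        }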
* Re: [RFC PATCH v2 06/34] x86/msr: Use the alternatives mechanism to read PMC
2025-04-22 8:21 ` [RFC PATCH v2 06/34] x86/msr: Use the alternatives mechanism to read PMC Xin Li (Intel)
@ 2025-04-22 8:38 ` Jürgen Groß
2025-04-22 9:12 ` Xin Li
0 siblings, 1 reply; 94+ messages in thread
From: Jürgen Groß @ 2025-04-22 8:38 UTC (permalink / raw)
To: Xin Li (Intel), linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 22.04.25 10:21, Xin Li (Intel) wrote:
> To eliminate the indirect call overhead introduced by the pv_ops API,
> use the alternatives mechanism to read PMC:
Which indirect call overhead? The indirect call is patched via the
alternative mechanism to a direct one.
>
> 1) When built with !CONFIG_XEN_PV, X86_FEATURE_XENPV becomes a
> disabled feature, preventing the Xen PMC read code from being
> built and ensuring the native code is executed unconditionally.
Without CONFIG_XEN_PV, CONFIG_PARAVIRT_XXL is not selected, resulting in
native code anyway.
>
> 2) When built with CONFIG_XEN_PV:
>
> 2.1) If not running on the Xen hypervisor (!X86_FEATURE_XENPV),
> the kernel runtime binary is patched to unconditionally
> jump to the native PMC read code.
>
> 2.2) If running on the Xen hypervisor (X86_FEATURE_XENPV), the
> kernel runtime binary is patched to unconditionally jump
> to the Xen PMC read code.
>
> Consequently, remove the pv_ops PMC read API.
I don't see the value of this patch.
It adds more #ifdef and code lines without any real gain.
In case the x86 maintainers think it is still worth it, I won't object.
Juergen
>
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
> ---
> arch/x86/include/asm/msr.h | 31 ++++++++++++++++++++-------
> arch/x86/include/asm/paravirt.h | 5 -----
> arch/x86/include/asm/paravirt_types.h | 2 --
> arch/x86/kernel/paravirt.c | 1 -
> arch/x86/xen/enlighten_pv.c | 2 --
> drivers/net/vmxnet3/vmxnet3_drv.c | 2 +-
> 6 files changed, 24 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
> index 01dc8e61ef97..33cf506e2fd6 100644
> --- a/arch/x86/include/asm/msr.h
> +++ b/arch/x86/include/asm/msr.h
> @@ -8,6 +8,7 @@
>
> #include <asm/asm.h>
> #include <asm/errno.h>
> +#include <asm/cpufeature.h>
> #include <asm/cpumask.h>
> #include <uapi/asm/msr.h>
> #include <asm/shared/msr.h>
> @@ -73,6 +74,10 @@ static inline void do_trace_read_msr(u32 msr, u64 val, int failed) {}
> static inline void do_trace_rdpmc(u32 msr, u64 val, int failed) {}
> #endif
>
> +#ifdef CONFIG_XEN_PV
> +extern u64 xen_read_pmc(int counter);
> +#endif
> +
> /*
> * __rdmsr() and __wrmsr() are the two primitives which are the bare minimum MSR
> * accessors and should not have any tracing or other functionality piggybacking
> @@ -170,16 +175,32 @@ native_write_msr_safe(u32 msr, u32 low, u32 high)
> extern int rdmsr_safe_regs(u32 regs[8]);
> extern int wrmsr_safe_regs(u32 regs[8]);
>
> -static inline u64 native_read_pmc(int counter)
> +static __always_inline u64 native_rdpmcq(int counter)
> {
> DECLARE_ARGS(val, low, high);
>
> - asm volatile("rdpmc" : EAX_EDX_RET(val, low, high) : "c" (counter));
> + asm_inline volatile("rdpmc" : EAX_EDX_RET(val, low, high) : "c" (counter));
> +
> if (tracepoint_enabled(rdpmc))
> do_trace_rdpmc(counter, EAX_EDX_VAL(val, low, high), 0);
> +
> return EAX_EDX_VAL(val, low, high);
> }
>
> +static __always_inline u64 rdpmcq(int counter)
> +{
> +#ifdef CONFIG_XEN_PV
> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
> + return xen_read_pmc(counter);
> +#endif
> +
> + /*
> + * 1) When built with !CONFIG_XEN_PV.
> + * 2) When built with CONFIG_XEN_PV but not running on Xen hypervisor.
> + */
> + return native_rdpmcq(counter);
> +}
> +
> #ifdef CONFIG_PARAVIRT_XXL
> #include <asm/paravirt.h>
> #else
> @@ -233,12 +254,6 @@ static inline int rdmsrq_safe(u32 msr, u64 *p)
> *p = native_read_msr_safe(msr, &err);
> return err;
> }
> -
> -static __always_inline u64 rdpmcq(int counter)
> -{
> - return native_read_pmc(counter);
> -}
> -
> #endif /* !CONFIG_PARAVIRT_XXL */
>
> /* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 590824916394..c7689f5f70d6 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -239,11 +239,6 @@ static inline int rdmsrq_safe(unsigned msr, u64 *p)
> return err;
> }
>
> -static __always_inline u64 rdpmcq(int counter)
> -{
> - return PVOP_CALL1(u64, cpu.read_pmc, counter);
> -}
> -
> static inline void paravirt_alloc_ldt(struct desc_struct *ldt, unsigned entries)
> {
> PVOP_VCALL2(cpu.alloc_ldt, ldt, entries);
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index 631c306ce1ff..475f508531d6 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -101,8 +101,6 @@ struct pv_cpu_ops {
> u64 (*read_msr_safe)(unsigned int msr, int *err);
> int (*write_msr_safe)(unsigned int msr, unsigned low, unsigned high);
>
> - u64 (*read_pmc)(int counter);
> -
> void (*start_context_switch)(struct task_struct *prev);
> void (*end_context_switch)(struct task_struct *next);
> #endif
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index 1ccd05d8999f..28d195ad7514 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -132,7 +132,6 @@ struct paravirt_patch_template pv_ops = {
> .cpu.write_msr = native_write_msr,
> .cpu.read_msr_safe = native_read_msr_safe,
> .cpu.write_msr_safe = native_write_msr_safe,
> - .cpu.read_pmc = native_read_pmc,
> .cpu.load_tr_desc = native_load_tr_desc,
> .cpu.set_ldt = native_set_ldt,
> .cpu.load_gdt = native_load_gdt,
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index 846b5737d320..9fbe187aff00 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -1236,8 +1236,6 @@ static const typeof(pv_ops) xen_cpu_ops __initconst = {
> .read_msr_safe = xen_read_msr_safe,
> .write_msr_safe = xen_write_msr_safe,
>
> - .read_pmc = xen_read_pmc,
> -
> .load_tr_desc = paravirt_nop,
> .set_ldt = xen_set_ldt,
> .load_gdt = xen_load_gdt,
> diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
> index 7edd0b5e0e77..8af3b4d7ef4d 100644
> --- a/drivers/net/vmxnet3/vmxnet3_drv.c
> +++ b/drivers/net/vmxnet3/vmxnet3_drv.c
> @@ -151,7 +151,7 @@ static u64
> vmxnet3_get_cycles(int pmc)
> {
> #ifdef CONFIG_X86
> - return native_read_pmc(pmc);
> + return native_rdpmcq(pmc);
> #else
> return 0;
> #endif
* Re: [RFC PATCH v2 06/34] x86/msr: Use the alternatives mechanism to read PMC
2025-04-22 8:38 ` Jürgen Groß
@ 2025-04-22 9:12 ` Xin Li
2025-04-22 9:28 ` Juergen Gross
0 siblings, 1 reply; 94+ messages in thread
From: Xin Li @ 2025-04-22 9:12 UTC (permalink / raw)
To: Jürgen Groß, linux-kernel, kvm, linux-perf-users,
linux-hyperv, virtualization, linux-pm, linux-edac, xen-devel,
linux-acpi, linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 4/22/2025 1:38 AM, Jürgen Groß wrote:
> On 22.04.25 10:21, Xin Li (Intel) wrote:
>> To eliminate the indirect call overhead introduced by the pv_ops API,
>> use the alternatives mechanism to read PMC:
>
> Which indirect call overhead? The indirect call is patched via the
> alternative mechanism to a direct one.
>
See below.
>>
>> 1) When built with !CONFIG_XEN_PV, X86_FEATURE_XENPV becomes a
>> disabled feature, preventing the Xen PMC read code from being
>> built and ensuring the native code is executed unconditionally.
>
> Without CONFIG_XEN_PV CONFIG_PARAVIRT_XXL is not selected, resulting in
> native code anyway.
Yes, this is kept in this patch, but in a slightly different way.
>
>>
>> 2) When built with CONFIG_XEN_PV:
>>
>> 2.1) If not running on the Xen hypervisor (!X86_FEATURE_XENPV),
>> the kernel runtime binary is patched to unconditionally
>> jump to the native PMC read code.
>>
>> 2.2) If running on the Xen hypervisor (X86_FEATURE_XENPV), the
>> kernel runtime binary is patched to unconditionally jump
>> to the Xen PMC read code.
>>
>> Consequently, remove the pv_ops PMC read API.
>
> I don't see the value of this patch.
>
> It adds more #ifdef and code lines without any real gain.
>
> In case the x86 maintainers think it is still worth it, I won't object.
I think we want to totally bypass pv_ops in the case 2.1).
Do you mean the indirect call is patched to call native code *directly*
for 2.1? I didn't know that; can you please elaborate?
AFAIK, Xen PV has been the sole user of pv_ops for nearly 20 years. This
raises significant doubts about whether pv_ops provides Linux with the
value of being a well-abstracted "CPU" or "Platform". And the x86
maintainers have said that it's a maintenance nightmare.
Thanks!
Xin
* Re: [RFC PATCH v2 06/34] x86/msr: Use the alternatives mechanism to read PMC
2025-04-22 9:12 ` Xin Li
@ 2025-04-22 9:28 ` Juergen Gross
2025-04-23 7:40 ` Xin Li
0 siblings, 1 reply; 94+ messages in thread
From: Juergen Gross @ 2025-04-22 9:28 UTC (permalink / raw)
To: Xin Li, linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 22.04.25 11:12, Xin Li wrote:
> On 4/22/2025 1:38 AM, Jürgen Groß wrote:
>> On 22.04.25 10:21, Xin Li (Intel) wrote:
>>> To eliminate the indirect call overhead introduced by the pv_ops API,
>>> use the alternatives mechanism to read PMC:
>>
>> Which indirect call overhead? The indirect call is patched via the
>> alternative mechanism to a direct one.
>>
>
> See below.
>
>
>>>
>>> 1) When built with !CONFIG_XEN_PV, X86_FEATURE_XENPV becomes a
>>> disabled feature, preventing the Xen PMC read code from being
>>> built and ensuring the native code is executed unconditionally.
>>
>> Without CONFIG_XEN_PV CONFIG_PARAVIRT_XXL is not selected, resulting in
>> native code anyway.
>
> Yes, this is kept in this patch, but in a slightly different way.
>
>>
>>>
>>> 2) When built with CONFIG_XEN_PV:
>>>
>>> 2.1) If not running on the Xen hypervisor (!X86_FEATURE_XENPV),
>>> the kernel runtime binary is patched to unconditionally
>>> jump to the native PMC read code.
>>>
>>> 2.2) If running on the Xen hypervisor (X86_FEATURE_XENPV), the
>>> kernel runtime binary is patched to unconditionally jump
>>> to the Xen PMC read code.
>>>
>>> Consequently, remove the pv_ops PMC read API.
>>
>> I don't see the value of this patch.
>>
>> It adds more #ifdef and code lines without any real gain.
>>
>> In case the x86 maintainers think it is still worth it, I won't object.
>
> I think we want to totally bypass pv_ops in the case 2.1).
>
> Do you mean the indirect call is patched to call native code *directly*
> for 2.1? I didn't know that; can you please elaborate?
All paravirt indirect calls are patched to direct calls via the normal
alternative patch mechanism.
Have a look at alt_replace_call() in arch/x86/kernel/alternative.c
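To sketch the idea (an illustration only, not the code actually emitted
by the PVOP_CALL machinery; "OFF" stands in for the real structure
offset):

	/*
	 * What the compiler emits for a pv_ops call site (sketch):
	 *
	 *	call	*pv_ops+OFF(%rip)	# indirect call
	 *
	 * What alt_replace_call() rewrites it to at boot (sketch):
	 *
	 *	call	native_read_pmc		# on bare metal
	 *	call	xen_read_pmc		# on Xen PV
	 */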
> AFAIK, Xen PV has been the sole user of pv_ops for nearly 20 years. This
Not quite. There was lguest until I ripped it out. :-)
And some use cases are left for KVM and Hyper-V guests (I have kept those
behind CONFIG_PARAVIRT, while the Xen-specific parts are behind
CONFIG_PARAVIRT_XXL now).
> raises significant doubts about whether pv_ops provides Linux with the
> value of being a well-abstracted "CPU" or "Platform". And the x86
> maintainers have said that it's a maintenance nightmare.
I have worked rather hard to make it less intrusive, especially by removing
the paravirt specific code patching (now all done via alternative patching)
and by removing 32-bit Xen PV mode.
Juergen
* Re: [RFC PATCH v2 06/34] x86/msr: Use the alternatives mechanism to read PMC
2025-04-22 9:28 ` Juergen Gross
@ 2025-04-23 7:40 ` Xin Li
0 siblings, 0 replies; 94+ messages in thread
From: Xin Li @ 2025-04-23 7:40 UTC (permalink / raw)
To: Juergen Gross, linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 4/22/2025 2:28 AM, Juergen Gross wrote:
>
> I have worked rather hard to make it less intrusive, especially by removing
> the paravirt specific code patching (now all done via alternative patching)
> and by removing 32-bit Xen PV mode.
I looked at the optimization, and it is a nice improvement.
* [RFC PATCH v2 07/34] x86/msr: Convert __wrmsr() uses to native_wrmsr{,q}() uses
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (5 preceding siblings ...)
2025-04-22 8:21 ` [RFC PATCH v2 06/34] x86/msr: Use the alternatives mechanism to read PMC Xin Li (Intel)
@ 2025-04-22 8:21 ` Xin Li (Intel)
2025-04-22 8:21 ` [RFC PATCH v2 08/34] x86/msr: Convert a native_wrmsr() use to native_wrmsrq() Xin Li (Intel)
` (27 subsequent siblings)
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:21 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
__wrmsr() is the lowest level primitive MSR write API, and its direct
use is NOT preferred. Use its wrapper function native_wrmsrq() instead.
No functional change intended.
This change also prepares for using the alternatives mechanism to access
MSRs: uses of native_wrmsr{,q}() don't need to change, but the way they
perform MSR operations is binary patched at boot time, depending on the
availability of the new MSR instructions.
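For reference, the resulting layering looks roughly like this (a
condensed sketch based on the hunks below, not a verbatim copy of
<asm/msr.h>):

	/* Lowest-level primitive, not for direct use: */
	void __wrmsr(u32 msr, u32 low, u32 high);

	/* Preferred wrappers: */
	#define native_wrmsr(msr, low, high)	__wrmsr(msr, low, high)

	static __always_inline void native_wrmsrq(u32 msr, u64 val)
	{
		/* Sketch: split the u64 once, in one place. */
		__wrmsr(msr, (u32)val, (u32)(val >> 32));
	}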
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
Change in v2:
* Use native_wrmsr() where natural [rmid_p, closid_p] high/lo parameters
can be used, without the shift-uglification (Ingo).
---
arch/x86/events/amd/brs.c | 2 +-
arch/x86/include/asm/apic.h | 2 +-
arch/x86/include/asm/msr.h | 6 ++++--
arch/x86/kernel/cpu/mce/core.c | 2 +-
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 6 +++---
5 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/arch/x86/events/amd/brs.c b/arch/x86/events/amd/brs.c
index ec4e8a4cace4..3f5ecfd80d1e 100644
--- a/arch/x86/events/amd/brs.c
+++ b/arch/x86/events/amd/brs.c
@@ -44,7 +44,7 @@ static inline unsigned int brs_to(int idx)
static __always_inline void set_debug_extn_cfg(u64 val)
{
/* bits[4:3] must always be set to 11b */
- __wrmsr(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3, val >> 32);
+ native_wrmsrq(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3);
}
static __always_inline u64 get_debug_extn_cfg(void)
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 1c136f54651c..0174dd548327 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -214,7 +214,7 @@ static inline void native_apic_msr_write(u32 reg, u32 v)
static inline void native_apic_msr_eoi(void)
{
- __wrmsr(APIC_BASE_MSR + (APIC_EOI >> 4), APIC_EOI_ACK, 0);
+ native_wrmsrq(APIC_BASE_MSR + (APIC_EOI >> 4), APIC_EOI_ACK);
}
static inline u32 native_apic_msr_read(u32 reg)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 33cf506e2fd6..b50cbd3299b3 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -149,10 +149,12 @@ static inline u64 native_read_msr_safe(u32 msr, int *err)
static inline void notrace
native_write_msr(u32 msr, u32 low, u32 high)
{
- __wrmsr(msr, low, high);
+ u64 val = (u64)high << 32 | low;
+
+ native_wrmsrq(msr, val);
if (tracepoint_enabled(write_msr))
- do_trace_write_msr(msr, ((u64)high << 32 | low), 0);
+ do_trace_write_msr(msr, val, 0);
}
/* Can be uninlined because referenced by paravirt */
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 255927f0284e..1ae75ec7ac95 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1306,7 +1306,7 @@ static noinstr bool mce_check_crashing_cpu(void)
}
if (mcgstatus & MCG_STATUS_RIPV) {
- __wrmsr(MSR_IA32_MCG_STATUS, 0, 0);
+ native_wrmsrq(MSR_IA32_MCG_STATUS, 0);
return true;
}
}
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 276ffab194f6..9ab033d6856a 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -483,7 +483,7 @@ int resctrl_arch_pseudo_lock_fn(void *_plr)
* cache.
*/
saved_msr = __rdmsr(MSR_MISC_FEATURE_CONTROL);
- __wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
+ native_wrmsrq(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits);
closid_p = this_cpu_read(pqr_state.cur_closid);
rmid_p = this_cpu_read(pqr_state.cur_rmid);
mem_r = plr->kmem;
@@ -495,7 +495,7 @@ int resctrl_arch_pseudo_lock_fn(void *_plr)
* pseudo-locked followed by reading of kernel memory to load it
* into the cache.
*/
- __wrmsr(MSR_IA32_PQR_ASSOC, rmid_p, plr->closid);
+ native_wrmsr(MSR_IA32_PQR_ASSOC, rmid_p, plr->closid);
/*
* Cache was flushed earlier. Now access kernel memory to read it
@@ -532,7 +532,7 @@ int resctrl_arch_pseudo_lock_fn(void *_plr)
* Critical section end: restore closid with capacity bitmask that
* does not overlap with pseudo-locked region.
*/
- __wrmsr(MSR_IA32_PQR_ASSOC, rmid_p, closid_p);
+ native_wrmsr(MSR_IA32_PQR_ASSOC, rmid_p, closid_p);
/* Re-enable the hardware prefetcher(s) */
wrmsrq(MSR_MISC_FEATURE_CONTROL, saved_msr);
--
2.49.0
* [RFC PATCH v2 08/34] x86/msr: Convert a native_wrmsr() use to native_wrmsrq()
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (6 preceding siblings ...)
2025-04-22 8:21 ` [RFC PATCH v2 07/34] x86/msr: Convert __wrmsr() uses to native_wrmsr{,q}() uses Xin Li (Intel)
@ 2025-04-22 8:21 ` Xin Li (Intel)
2025-04-23 15:51 ` Dave Hansen
2025-04-22 8:21 ` [RFC PATCH v2 09/34] x86/msr: Add the native_rdmsrq() helper Xin Li (Intel)
` (26 subsequent siblings)
34 siblings, 1 reply; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:21 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
Convert a native_wrmsr() use to native_wrmsrq() to zap the meaningless
type conversions when a u64 MSR value is split into two u32 halves.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/include/asm/sev-internal.h | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
index 73cb774c3639..9da509e52e11 100644
--- a/arch/x86/include/asm/sev-internal.h
+++ b/arch/x86/include/asm/sev-internal.h
@@ -101,12 +101,7 @@ static inline u64 sev_es_rd_ghcb_msr(void)
static __always_inline void sev_es_wr_ghcb_msr(u64 val)
{
- u32 low, high;
-
- low = (u32)(val);
- high = (u32)(val >> 32);
-
- native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
+ native_wrmsrq(MSR_AMD64_SEV_ES_GHCB, val);
}
enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
--
2.49.0
* Re: [RFC PATCH v2 08/34] x86/msr: Convert a native_wrmsr() use to native_wrmsrq()
2025-04-22 8:21 ` [RFC PATCH v2 08/34] x86/msr: Convert a native_wrmsr() use to native_wrmsrq() Xin Li (Intel)
@ 2025-04-23 15:51 ` Dave Hansen
2025-04-23 17:27 ` Xin Li
2025-04-23 23:23 ` Xin Li
0 siblings, 2 replies; 94+ messages in thread
From: Dave Hansen @ 2025-04-23 15:51 UTC (permalink / raw)
To: Xin Li (Intel), linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
On 4/22/25 01:21, Xin Li (Intel) wrote:
> static __always_inline void sev_es_wr_ghcb_msr(u64 val)
> {
> - u32 low, high;
> -
> - low = (u32)(val);
> - high = (u32)(val >> 32);
> -
> - native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
> + native_wrmsrq(MSR_AMD64_SEV_ES_GHCB, val);
> }
A note on ordering: Had this been a native_wrmsr()=>__wrmsr()
conversion, it could be sucked into the tree easily before the big
__wrmsr()=>native_wrmsrq() conversion.
Yeah, you'd have to base the big rename on top of this. But with a
series this big, I'd prioritize whatever gets it trimmed down.
* Re: [RFC PATCH v2 08/34] x86/msr: Convert a native_wrmsr() use to native_wrmsrq()
2025-04-23 15:51 ` Dave Hansen
@ 2025-04-23 17:27 ` Xin Li
2025-04-23 23:23 ` Xin Li
1 sibling, 0 replies; 94+ messages in thread
From: Xin Li @ 2025-04-23 17:27 UTC (permalink / raw)
To: Dave Hansen, linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
On 4/23/2025 8:51 AM, Dave Hansen wrote:
> On 4/22/25 01:21, Xin Li (Intel) wrote:
>> static __always_inline void sev_es_wr_ghcb_msr(u64 val)
>> {
>> - u32 low, high;
>> -
>> - low = (u32)(val);
>> - high = (u32)(val >> 32);
>> -
>> - native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
>> + native_wrmsrq(MSR_AMD64_SEV_ES_GHCB, val);
>> }
>
> A note on ordering: Had this been a native_wrmsr()=>__wrmsr()
> conversion, it could be sucked into the tree easily before the big
> __wrmsr()=>native_wrmsrq() conversion.
>
> Yeah, you'd have to base the big rename on top of this. But with a
> series this big, I'd prioritize whatever gets it trimmed down.
Okay, I will focus on cleanup first.
* Re: [RFC PATCH v2 08/34] x86/msr: Convert a native_wrmsr() use to native_wrmsrq()
2025-04-23 15:51 ` Dave Hansen
2025-04-23 17:27 ` Xin Li
@ 2025-04-23 23:23 ` Xin Li
1 sibling, 0 replies; 94+ messages in thread
From: Xin Li @ 2025-04-23 23:23 UTC (permalink / raw)
To: Dave Hansen, linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
On 4/23/2025 8:51 AM, Dave Hansen wrote:
> On 4/22/25 01:21, Xin Li (Intel) wrote:
>> static __always_inline void sev_es_wr_ghcb_msr(u64 val)
>> {
>> - u32 low, high;
>> -
>> - low = (u32)(val);
>> - high = (u32)(val >> 32);
>> -
>> - native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
>> + native_wrmsrq(MSR_AMD64_SEV_ES_GHCB, val);
>> }
>
> A note on ordering: Had this been a native_wrmsr()=>__wrmsr()
> conversion, it could be sucked into the tree easily before the big
> __wrmsr()=>native_wrmsrq() conversion.
Can't reorder the 2 patches, because __wrmsr() takes two u32 arguments
and the split has to be done explicitly in sev_es_wr_ghcb_msr().
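Spelled out with the signatures used in this series (sketch):

	void __wrmsr(u32 msr, u32 low, u32 high);	/* caller splits */
	void native_wrmsr(u32 msr, u32 low, u32 high);	/* caller splits */
	void native_wrmsrq(u32 msr, u64 val);		/* no split needed */

a native_wrmsr() => __wrmsr() conversion would keep the explicit u64
split in sev_es_wr_ghcb_msr(), so nothing would be simplified.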
Thanks!
Xin
* [RFC PATCH v2 09/34] x86/msr: Add the native_rdmsrq() helper
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (7 preceding siblings ...)
2025-04-22 8:21 ` [RFC PATCH v2 08/34] x86/msr: Convert a native_wrmsr() use to native_wrmsrq() Xin Li (Intel)
@ 2025-04-22 8:21 ` Xin Li (Intel)
2025-04-22 8:21 ` [RFC PATCH v2 10/34] x86/msr: Convert __rdmsr() uses to native_rdmsrq() uses Xin Li (Intel)
` (25 subsequent siblings)
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:21 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
__rdmsr() is the lowest-level primitive MSR read API, implemented in
assembly code and returning an MSR value in a u64 integer, on top of
which a convenience wrapper native_rdmsr() is defined to return an MSR
value in two u32 integers. For some reason, native_rdmsrq() is not
defined, and __rdmsr() is used directly whenever an MSR value needs to
be returned as a u64 integer.
Add the native_rdmsrq() helper, which is simply an alias of __rdmsr(),
to make native_rdmsr() and native_rdmsrq() a pair of MSR read APIs.
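With the pair in place, callers can pick whichever form matches their
data layout (usage sketch; MSR_IA32_MCG_CAP is just an example):

	u32 low, high;
	u64 val;

	native_rdmsr(MSR_IA32_MCG_CAP, low, high);	/* two u32 halves */
	val = native_rdmsrq(MSR_IA32_MCG_CAP);		/* one u64 value */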
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
Change in v2:
* Split into two changes and add the native_rdmsrl() helper in the
first one with a proper explanation (Ingo).
---
arch/x86/include/asm/msr.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index b50cbd3299b3..2ab8effea4cd 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -112,6 +112,11 @@ do { \
(void)((val2) = (u32)(__val >> 32)); \
} while (0)
+static __always_inline u64 native_rdmsrq(u32 msr)
+{
+ return __rdmsr(msr);
+}
+
#define native_wrmsr(msr, low, high) \
__wrmsr(msr, low, high)
--
2.49.0
* [RFC PATCH v2 10/34] x86/msr: Convert __rdmsr() uses to native_rdmsrq() uses
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (8 preceding siblings ...)
2025-04-22 8:21 ` [RFC PATCH v2 09/34] x86/msr: Add the native_rdmsrq() helper Xin Li (Intel)
@ 2025-04-22 8:21 ` Xin Li (Intel)
2025-04-22 15:09 ` Sean Christopherson
2025-04-22 8:21 ` [RFC PATCH v2 11/34] x86/msr: Remove calling native_{read,write}_msr{,_safe}() in pmu_msr_{read,write}() Xin Li (Intel)
` (24 subsequent siblings)
34 siblings, 1 reply; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:21 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
__rdmsr() is the lowest level primitive MSR read API, and its direct
use is NOT preferred. Use its wrapper function native_rdmsrq() instead.
No functional change intended.
This change also prepares for using the alternatives mechanism to access
MSRs: uses of native_rdmsr{,q}() don't need to change, but the way they
perform MSR operations is binary patched at boot time, depending on the
availability of the new MSR instructions.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/boot/startup/sme.c | 4 ++--
arch/x86/events/amd/brs.c | 2 +-
arch/x86/hyperv/hv_vtl.c | 4 ++--
arch/x86/hyperv/ivm.c | 2 +-
arch/x86/include/asm/mshyperv.h | 2 +-
arch/x86/include/asm/sev-internal.h | 2 +-
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/cpu/mce/core.c | 4 ++--
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 4 ++--
10 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c
index 591d6a4d2e59..5e147bf5a0a8 100644
--- a/arch/x86/boot/startup/sme.c
+++ b/arch/x86/boot/startup/sme.c
@@ -524,7 +524,7 @@ void __head sme_enable(struct boot_params *bp)
me_mask = 1UL << (ebx & 0x3f);
/* Check the SEV MSR whether SEV or SME is enabled */
- sev_status = msr = __rdmsr(MSR_AMD64_SEV);
+ sev_status = msr = native_rdmsrq(MSR_AMD64_SEV);
feature_mask = (msr & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
/*
@@ -555,7 +555,7 @@ void __head sme_enable(struct boot_params *bp)
return;
/* For SME, check the SYSCFG MSR */
- msr = __rdmsr(MSR_AMD64_SYSCFG);
+ msr = native_rdmsrq(MSR_AMD64_SYSCFG);
if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
return;
}
diff --git a/arch/x86/events/amd/brs.c b/arch/x86/events/amd/brs.c
index 3f5ecfd80d1e..06f35a6b58a5 100644
--- a/arch/x86/events/amd/brs.c
+++ b/arch/x86/events/amd/brs.c
@@ -49,7 +49,7 @@ static __always_inline void set_debug_extn_cfg(u64 val)
static __always_inline u64 get_debug_extn_cfg(void)
{
- return __rdmsr(MSR_AMD_DBG_EXTN_CFG);
+ return native_rdmsrq(MSR_AMD_DBG_EXTN_CFG);
}
static bool __init amd_brs_detect(void)
diff --git a/arch/x86/hyperv/hv_vtl.c b/arch/x86/hyperv/hv_vtl.c
index 13242ed8ff16..c6343e699154 100644
--- a/arch/x86/hyperv/hv_vtl.c
+++ b/arch/x86/hyperv/hv_vtl.c
@@ -149,11 +149,11 @@ static int hv_vtl_bringup_vcpu(u32 target_vp_index, int cpu, u64 eip_ignored)
input->vp_context.rip = rip;
input->vp_context.rsp = rsp;
input->vp_context.rflags = 0x0000000000000002;
- input->vp_context.efer = __rdmsr(MSR_EFER);
+ input->vp_context.efer = native_rdmsrq(MSR_EFER);
input->vp_context.cr0 = native_read_cr0();
input->vp_context.cr3 = __native_read_cr3();
input->vp_context.cr4 = native_read_cr4();
- input->vp_context.msr_cr_pat = __rdmsr(MSR_IA32_CR_PAT);
+ input->vp_context.msr_cr_pat = native_rdmsrq(MSR_IA32_CR_PAT);
input->vp_context.idtr.limit = idt_ptr.size;
input->vp_context.idtr.base = idt_ptr.address;
input->vp_context.gdtr.limit = gdt_ptr.size;
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 8209de792388..09a165a3c41e 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -111,7 +111,7 @@ u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
static inline u64 rd_ghcb_msr(void)
{
- return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
+ return native_rdmsrq(MSR_AMD64_SEV_ES_GHCB);
}
static inline void wr_ghcb_msr(u64 val)
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 15d00dace70f..778444310cfb 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -305,7 +305,7 @@ void hv_set_non_nested_msr(unsigned int reg, u64 value);
static __always_inline u64 hv_raw_get_msr(unsigned int reg)
{
- return __rdmsr(reg);
+ return native_rdmsrq(reg);
}
#else /* CONFIG_HYPERV */
diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
index 9da509e52e11..d259bcec220a 100644
--- a/arch/x86/include/asm/sev-internal.h
+++ b/arch/x86/include/asm/sev-internal.h
@@ -96,7 +96,7 @@ int svsm_perform_call_protocol(struct svsm_call *call);
static inline u64 sev_es_rd_ghcb_msr(void)
{
- return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
+ return native_rdmsrq(MSR_AMD64_SEV_ES_GHCB);
}
static __always_inline void sev_es_wr_ghcb_msr(u64 val)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index de1a25217053..10da3da5b81f 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -164,7 +164,7 @@ static void ppin_init(struct cpuinfo_x86 *c)
/* Is the enable bit set? */
if (val & 2UL) {
- c->ppin = __rdmsr(info->msr_ppin);
+ c->ppin = native_rdmsrq(info->msr_ppin);
set_cpu_cap(c, info->feature);
return;
}
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 1ae75ec7ac95..32286bad75e6 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -121,7 +121,7 @@ void mce_prep_record_common(struct mce *m)
{
m->cpuid = cpuid_eax(1);
m->cpuvendor = boot_cpu_data.x86_vendor;
- m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
+ m->mcgcap = native_rdmsrq(MSR_IA32_MCG_CAP);
/* need the internal __ version to avoid deadlocks */
m->time = __ktime_get_real_seconds();
}
@@ -1298,7 +1298,7 @@ static noinstr bool mce_check_crashing_cpu(void)
(crashing_cpu != -1 && crashing_cpu != cpu)) {
u64 mcgstatus;
- mcgstatus = __rdmsr(MSR_IA32_MCG_STATUS);
+ mcgstatus = native_rdmsrq(MSR_IA32_MCG_STATUS);
if (boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN) {
if (mcgstatus & MCG_STATUS_LMCES)
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 9ab033d6856a..185317c6b509 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -482,7 +482,7 @@ int resctrl_arch_pseudo_lock_fn(void *_plr)
* the buffer and evict pseudo-locked memory read earlier from the
* cache.
*/
- saved_msr = __rdmsr(MSR_MISC_FEATURE_CONTROL);
+ saved_msr = native_rdmsrq(MSR_MISC_FEATURE_CONTROL);
native_wrmsrq(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits);
closid_p = this_cpu_read(pqr_state.cur_closid);
rmid_p = this_cpu_read(pqr_state.cur_rmid);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1547bfacd40f..e73c1d5ba6c4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -380,7 +380,7 @@ static __always_inline void vmx_disable_fb_clear(struct vcpu_vmx *vmx)
if (!vmx->disable_fb_clear)
return;
- msr = __rdmsr(MSR_IA32_MCU_OPT_CTRL);
+ msr = native_rdmsrq(MSR_IA32_MCU_OPT_CTRL);
msr |= FB_CLEAR_DIS;
native_wrmsrq(MSR_IA32_MCU_OPT_CTRL, msr);
/* Cache the MSR value to avoid reading it later */
@@ -7307,7 +7307,7 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
return;
if (flags & VMX_RUN_SAVE_SPEC_CTRL)
- vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
+ vmx->spec_ctrl = native_rdmsrq(MSR_IA32_SPEC_CTRL);
/*
* If the guest/host SPEC_CTRL values differ, restore the host value.
--
2.49.0
* Re: [RFC PATCH v2 10/34] x86/msr: Convert __rdmsr() uses to native_rdmsrq() uses
2025-04-22 8:21 ` [RFC PATCH v2 10/34] x86/msr: Convert __rdmsr() uses to native_rdmsrq() uses Xin Li (Intel)
@ 2025-04-22 15:09 ` Sean Christopherson
2025-04-23 9:27 ` Xin Li
0 siblings, 1 reply; 94+ messages in thread
From: Sean Christopherson @ 2025-04-22 15:09 UTC (permalink / raw)
To: Xin Li (Intel)
Cc: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86, tglx, mingo, bp, dave.hansen, x86, hpa, acme,
jgross, andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, luto, boris.ostrovsky, kys, haiyangz, decui
On Tue, Apr 22, 2025, Xin Li (Intel) wrote:
> __rdmsr() is the lowest level primitive MSR read API, and its direct
> use is NOT preferred.
Doesn't mean it's wrong.
> Use its wrapper function native_rdmsrq() instead.
...
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 1547bfacd40f..e73c1d5ba6c4 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -380,7 +380,7 @@ static __always_inline void vmx_disable_fb_clear(struct vcpu_vmx *vmx)
> if (!vmx->disable_fb_clear)
> return;
>
> - msr = __rdmsr(MSR_IA32_MCU_OPT_CTRL);
> + msr = native_rdmsrq(MSR_IA32_MCU_OPT_CTRL);
> msr |= FB_CLEAR_DIS;
> native_wrmsrq(MSR_IA32_MCU_OPT_CTRL, msr);
> /* Cache the MSR value to avoid reading it later */
> @@ -7307,7 +7307,7 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
> return;
>
> if (flags & VMX_RUN_SAVE_SPEC_CTRL)
> - vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
> + vmx->spec_ctrl = native_rdmsrq(MSR_IA32_SPEC_CTRL);
And what guarantees that native_rdmsrq() won't have tracing? Ugh, a later patch
renames native_rdmsrq() => native_rdmsrq_no_trace().
I really don't like this. It makes simple and obvious code:
vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
so much harder to read:
vmx->spec_ctrl = native_rdmsrq_no_trace(MSR_IA32_SPEC_CTRL);
and does so in a way that is difficult to review, e.g. I have to peek ahead to
understand that this is even ok.
I strongly prefer that we find a way to not require such verbose APIs, especially
if KVM ends up using native variants throughout. Xen PV is supposed to be the
odd one out, yet native code is what suffers. Blech.
* Re: [RFC PATCH v2 10/34] x86/msr: Convert __rdmsr() uses to native_rdmsrq() uses
2025-04-22 15:09 ` Sean Christopherson
@ 2025-04-23 9:27 ` Xin Li
2025-04-23 13:37 ` Sean Christopherson
2025-04-23 14:02 ` Dave Hansen
0 siblings, 2 replies; 94+ messages in thread
From: Xin Li @ 2025-04-23 9:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86, tglx, mingo, bp, dave.hansen, x86, hpa, acme,
jgross, andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, luto, boris.ostrovsky, kys, haiyangz, decui
On 4/22/2025 8:09 AM, Sean Christopherson wrote:
> On Tue, Apr 22, 2025, Xin Li (Intel) wrote:
>> __rdmsr() is the lowest level primitive MSR read API, and its direct
>> use is NOT preferred.
>
> Doesn't mean it's wrong.
I wouldn't go so far as to claim that it's wrong :-)
>> Use its wrapper function native_rdmsrq() instead.
The current code exhibits a somewhat haphazard use of MSR APIs, so I
wanted to clarify which API to employ in specific situations with
verbose function naming.
Here is an example where Boris had to fix the use of MSR APIs:
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f980f9c31a923e9040dee0bc679a5f5b09e61f40
>
> ...
>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 1547bfacd40f..e73c1d5ba6c4 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -380,7 +380,7 @@ static __always_inline void vmx_disable_fb_clear(struct vcpu_vmx *vmx)
>> if (!vmx->disable_fb_clear)
>> return;
>>
>> - msr = __rdmsr(MSR_IA32_MCU_OPT_CTRL);
>> + msr = native_rdmsrq(MSR_IA32_MCU_OPT_CTRL);
>> msr |= FB_CLEAR_DIS;
>> native_wrmsrq(MSR_IA32_MCU_OPT_CTRL, msr);
>> /* Cache the MSR value to avoid reading it later */
>> @@ -7307,7 +7307,7 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
>> return;
>>
>> if (flags & VMX_RUN_SAVE_SPEC_CTRL)
>> - vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
>> + vmx->spec_ctrl = native_rdmsrq(MSR_IA32_SPEC_CTRL);
>
> And what guarantees that native_rdmsrq() won't have tracing? Ugh, a later patch
> renames native_rdmsrq() => native_rdmsrq_no_trace().
>
> I really don't like this. It makes simple and obvious code:
>
> vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
>
> so much harder to read:
>
> vmx->spec_ctrl = native_rdmsrq_no_trace(MSR_IA32_SPEC_CTRL);
>
> and does so in a way that is difficult to review, e.g. I have to peek ahead to
> understand that this is even ok.
>
> I strongly prefer that we find a way to not require such verbose APIs, especially
> if KVM ends up using native variants throughout. Xen PV is supposed to be the
> odd one out, yet native code is what suffers. Blech.
Will try to figure out how to name the APIs.
One reason I chose verbose names is that short names are in use and
renaming needs to touch a lot of files (and not fun at all).
Thanks!
Xin
* Re: [RFC PATCH v2 10/34] x86/msr: Convert __rdmsr() uses to native_rdmsrq() uses
2025-04-23 9:27 ` Xin Li
@ 2025-04-23 13:37 ` Sean Christopherson
2025-04-23 14:02 ` Dave Hansen
1 sibling, 0 replies; 94+ messages in thread
From: Sean Christopherson @ 2025-04-23 13:37 UTC (permalink / raw)
To: Xin Li
Cc: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86, tglx, mingo, bp, dave.hansen, x86, hpa, acme,
jgross, andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, luto, boris.ostrovsky, kys, haiyangz, decui
On Wed, Apr 23, 2025, Xin Li wrote:
> On 4/22/2025 8:09 AM, Sean Christopherson wrote:
> > I strongly prefer that we find a way to not require such verbose APIs, especially
> > if KVM ends up using native variants throughout. Xen PV is supposed to be the
> > odd one out, yet native code is what suffers. Blech.
>
> Will try to figure out how to name the APIs.
>
> One reason I chose verbose names is that short names are in use and
> renaming needs to touch a lot of files (and not fun at all).
Yeah, I've looked at modifying rdmsrl() to "return" a value more than once, and
ran away screaming every time.
But since you're already doing a pile of renames, IMO this is the perfect time to
do an aggressive cleanup.
* Re: [RFC PATCH v2 10/34] x86/msr: Convert __rdmsr() uses to native_rdmsrq() uses
2025-04-23 9:27 ` Xin Li
2025-04-23 13:37 ` Sean Christopherson
@ 2025-04-23 14:02 ` Dave Hansen
1 sibling, 0 replies; 94+ messages in thread
From: Dave Hansen @ 2025-04-23 14:02 UTC (permalink / raw)
To: Xin Li, Sean Christopherson
Cc: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86, tglx, mingo, bp, dave.hansen, x86, hpa, acme,
jgross, andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, luto, boris.ostrovsky, kys, haiyangz, decui
On 4/23/25 02:27, Xin Li wrote:
> One reason I chose verbose names is that short names are in use and
> renaming needs to touch a lot of files (and not fun at all).
This series is getting *WAY* too big.
Could you please peel the renaming stuff out and we can get it applied
independently of the new instruction gunk?
* [RFC PATCH v2 11/34] x86/msr: Remove calling native_{read,write}_msr{,_safe}() in pmu_msr_{read,write}()
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (9 preceding siblings ...)
2025-04-22 8:21 ` [RFC PATCH v2 10/34] x86/msr: Convert __rdmsr() uses to native_rdmsrq() uses Xin Li (Intel)
@ 2025-04-22 8:21 ` Xin Li (Intel)
2025-04-24 6:25 ` Mi, Dapeng
2025-04-22 8:21 ` [RFC PATCH v2 12/34] x86/msr: Remove pmu_msr_{read,write}() Xin Li (Intel)
` (23 subsequent siblings)
34 siblings, 1 reply; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:21 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
hpa found that pmu_msr_write() is actually a completely pointless
function [1]: all it does is shuffle some arguments, then calls
pmu_msr_chk_emulated() and if it returns true AND the emulated flag
is clear then does *exactly the same thing* that the calling code
would have done if pmu_msr_write() itself had returned true. And
pmu_msr_read() does the equivalent stupidity.
Remove the calls to native_{read,write}_msr{,_safe}() within
pmu_msr_{read,write}(). Instead reuse the existing calling code
that decides whether to call native_{read,write}_msr{,_safe}() based
on the return value from pmu_msr_{read,write}(). Consequently,
eliminate the need to pass an error pointer to pmu_msr_{read,write}().
While at it, refactor pmu_msr_write() to take the MSR value as a u64
argument, replacing the current dual u32 arguments, because the dual
u32 arguments were only used to call native_write_msr{,_safe}(), which
has now been removed.
[1]: https://lore.kernel.org/lkml/0ec48b84-d158-47c6-b14c-3563fd14bcc4@zytor.com/
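After this change the fallback lives only in the caller (sketch of the
resulting xen_do_write_msr() flow, matching the enlighten_pv.c hunk
below):

	if (!pmu_msr_write(msr, val)) {
		/* Not a PMU MSR, or not emulated: use the native path. */
		if (err)
			*err = native_write_msr_safe(msr, low, high);
		else
			native_write_msr(msr, low, high);
	}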
Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/xen/enlighten_pv.c | 6 +++++-
arch/x86/xen/pmu.c | 27 ++++-----------------------
arch/x86/xen/xen-ops.h | 4 ++--
3 files changed, 11 insertions(+), 26 deletions(-)
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 9fbe187aff00..1418758b57ff 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1132,6 +1132,8 @@ static void set_seg(unsigned int which, unsigned int low, unsigned int high,
static void xen_do_write_msr(unsigned int msr, unsigned int low,
unsigned int high, int *err)
{
+ u64 val;
+
switch (msr) {
case MSR_FS_BASE:
set_seg(SEGBASE_FS, low, high, err);
@@ -1158,7 +1160,9 @@ static void xen_do_write_msr(unsigned int msr, unsigned int low,
break;
default:
- if (!pmu_msr_write(msr, low, high, err)) {
+ val = (u64)high << 32 | low;
+
+ if (!pmu_msr_write(msr, val)) {
if (err)
*err = native_write_msr_safe(msr, low, high);
else
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index 9c1682af620a..95caae97a394 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -313,37 +313,18 @@ static bool pmu_msr_chk_emulated(unsigned int msr, uint64_t *val, bool is_read,
return true;
}
-bool pmu_msr_read(unsigned int msr, uint64_t *val, int *err)
+bool pmu_msr_read(u32 msr, u64 *val)
{
bool emulated;
- if (!pmu_msr_chk_emulated(msr, val, true, &emulated))
- return false;
-
- if (!emulated) {
- *val = err ? native_read_msr_safe(msr, err)
- : native_read_msr(msr);
- }
-
- return true;
+ return pmu_msr_chk_emulated(msr, val, true, &emulated) && emulated;
}
-bool pmu_msr_write(unsigned int msr, uint32_t low, uint32_t high, int *err)
+bool pmu_msr_write(u32 msr, u64 val)
{
- uint64_t val = ((uint64_t)high << 32) | low;
bool emulated;
- if (!pmu_msr_chk_emulated(msr, &val, false, &emulated))
- return false;
-
- if (!emulated) {
- if (err)
- *err = native_write_msr_safe(msr, low, high);
- else
- native_write_msr(msr, low, high);
- }
-
- return true;
+ return pmu_msr_chk_emulated(msr, &val, false, &emulated) && emulated;
}
static u64 xen_amd_read_pmc(int counter)
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index dc886c3cc24d..a1875e10be31 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -271,8 +271,8 @@ void xen_pmu_finish(int cpu);
static inline void xen_pmu_init(int cpu) {}
static inline void xen_pmu_finish(int cpu) {}
#endif
-bool pmu_msr_read(unsigned int msr, uint64_t *val, int *err);
-bool pmu_msr_write(unsigned int msr, uint32_t low, uint32_t high, int *err);
+bool pmu_msr_read(u32 msr, u64 *val);
+bool pmu_msr_write(u32 msr, u64 val);
int pmu_apic_update(uint32_t reg);
u64 xen_read_pmc(int counter);
--
2.49.0
* Re: [RFC PATCH v2 11/34] x86/msr: Remove calling native_{read,write}_msr{,_safe}() in pmu_msr_{read,write}()
2025-04-22 8:21 ` [RFC PATCH v2 11/34] x86/msr: Remove calling native_{read,write}_msr{,_safe}() in pmu_msr_{read,write}() Xin Li (Intel)
@ 2025-04-24 6:25 ` Mi, Dapeng
2025-04-24 7:16 ` Xin Li
0 siblings, 1 reply; 94+ messages in thread
From: Mi, Dapeng @ 2025-04-24 6:25 UTC (permalink / raw)
To: Xin Li (Intel), linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
On 4/22/2025 4:21 PM, Xin Li (Intel) wrote:
> hpa found that pmu_msr_write() is actually a completely pointless
> function [1]: all it does is shuffle some arguments, then calls
> pmu_msr_chk_emulated() and if it returns true AND the emulated flag
> is clear then does *exactly the same thing* that the calling code
> would have done if pmu_msr_write() itself had returned true. And
> pmu_msr_read() does the equivalent stupidity.
>
> Remove the calls to native_{read,write}_msr{,_safe}() within
> pmu_msr_{read,write}(). Instead reuse the existing calling code
> that decides whether to call native_{read,write}_msr{,_safe}() based
> on the return value from pmu_msr_{read,write}(). Consequently,
> eliminate the need to pass an error pointer to pmu_msr_{read,write}().
>
> While at it, refactor pmu_msr_write() to take the MSR value as a u64
> argument, replacing the current dual u32 arguments, because the dual
> u32 arguments were only used to call native_write_msr{,_safe}(), which
> has now been removed.
>
> [1]: https://lore.kernel.org/lkml/0ec48b84-d158-47c6-b14c-3563fd14bcc4@zytor.com/
>
> Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
> ---
> arch/x86/xen/enlighten_pv.c | 6 +++++-
> arch/x86/xen/pmu.c | 27 ++++-----------------------
> arch/x86/xen/xen-ops.h | 4 ++--
> 3 files changed, 11 insertions(+), 26 deletions(-)
>
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index 9fbe187aff00..1418758b57ff 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -1132,6 +1132,8 @@ static void set_seg(unsigned int which, unsigned int low, unsigned int high,
> static void xen_do_write_msr(unsigned int msr, unsigned int low,
> unsigned int high, int *err)
> {
> + u64 val;
> +
> switch (msr) {
> case MSR_FS_BASE:
> set_seg(SEGBASE_FS, low, high, err);
> @@ -1158,7 +1160,9 @@ static void xen_do_write_msr(unsigned int msr, unsigned int low,
> break;
>
> default:
> - if (!pmu_msr_write(msr, low, high, err)) {
> + val = (u64)high << 32 | low;
> +
> + if (!pmu_msr_write(msr, val)) {
> if (err)
> *err = native_write_msr_safe(msr, low, high);
> else
> diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
> index 9c1682af620a..95caae97a394 100644
> --- a/arch/x86/xen/pmu.c
> +++ b/arch/x86/xen/pmu.c
> @@ -313,37 +313,18 @@ static bool pmu_msr_chk_emulated(unsigned int msr, uint64_t *val, bool is_read,
> return true;
> }
>
> -bool pmu_msr_read(unsigned int msr, uint64_t *val, int *err)
> +bool pmu_msr_read(u32 msr, u64 *val)
The function name is somewhat misleading now. With this change, the
function only reads the PMU MSR's value if it is emulated; otherwise it
won't really read the PMU MSR. How about changing the name to
"pmu_emulated_msr_read" or something similar?
> {
> bool emulated;
>
> - if (!pmu_msr_chk_emulated(msr, val, true, &emulated))
> - return false;
> -
> - if (!emulated) {
> - *val = err ? native_read_msr_safe(msr, err)
> - : native_read_msr(msr);
> - }
> -
> - return true;
> + return pmu_msr_chk_emulated(msr, val, true, &emulated) && emulated;
> }
>
> -bool pmu_msr_write(unsigned int msr, uint32_t low, uint32_t high, int *err)
> +bool pmu_msr_write(u32 msr, u64 val)
ditto.
> {
> - uint64_t val = ((uint64_t)high << 32) | low;
> bool emulated;
>
> - if (!pmu_msr_chk_emulated(msr, &val, false, &emulated))
> - return false;
> -
> - if (!emulated) {
> - if (err)
> - *err = native_write_msr_safe(msr, low, high);
> - else
> - native_write_msr(msr, low, high);
> - }
> -
> - return true;
> + return pmu_msr_chk_emulated(msr, &val, false, &emulated) && emulated;
> }
>
> static u64 xen_amd_read_pmc(int counter)
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index dc886c3cc24d..a1875e10be31 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -271,8 +271,8 @@ void xen_pmu_finish(int cpu);
> static inline void xen_pmu_init(int cpu) {}
> static inline void xen_pmu_finish(int cpu) {}
> #endif
> -bool pmu_msr_read(unsigned int msr, uint64_t *val, int *err);
> -bool pmu_msr_write(unsigned int msr, uint32_t low, uint32_t high, int *err);
> +bool pmu_msr_read(u32 msr, u64 *val);
The prototype of pmu_msr_read() has been changed, but why is there no
corresponding change in its caller (xen_do_read_msr())?
> +bool pmu_msr_write(u32 msr, u64 val);
> int pmu_apic_update(uint32_t reg);
> u64 xen_read_pmc(int counter);
>
* Re: [RFC PATCH v2 11/34] x86/msr: Remove calling native_{read,write}_msr{,_safe}() in pmu_msr_{read,write}()
2025-04-24 6:25 ` Mi, Dapeng
@ 2025-04-24 7:16 ` Xin Li
0 siblings, 0 replies; 94+ messages in thread
From: Xin Li @ 2025-04-24 7:16 UTC (permalink / raw)
To: Mi, Dapeng, linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
On 4/23/2025 11:25 PM, Mi, Dapeng wrote:
>> -bool pmu_msr_read(unsigned int msr, uint64_t *val, int *err)
>> +bool pmu_msr_read(u32 msr, u64 *val)
>
> The function name is somewhat misleading now. With this change, the
> function only reads the PMU MSR's value if it is emulated; otherwise it
> won't really read the PMU MSR. How about changing the name to
> "pmu_emulated_msr_read" or something similar?
This makes sense!
>> -bool pmu_msr_read(unsigned int msr, uint64_t *val, int *err);
>> -bool pmu_msr_write(unsigned int msr, uint32_t low, uint32_t high, int *err);
>> +bool pmu_msr_read(u32 msr, u64 *val);
>
> The prototype of pmu_msr_read() has been changed, but why is there no
> corresponding change in its caller (xen_do_read_msr())?
Good catch. I didn't compile the patches one by one, and thus missed it.
* [RFC PATCH v2 12/34] x86/msr: Remove pmu_msr_{read,write}()
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (10 preceding siblings ...)
2025-04-22 8:21 ` [RFC PATCH v2 11/34] x86/msr: Remove calling native_{read,write}_msr{,_safe}() in pmu_msr_{read,write}() Xin Li (Intel)
@ 2025-04-22 8:21 ` Xin Li (Intel)
2025-04-24 6:33 ` Mi, Dapeng
2025-04-24 10:05 ` Jürgen Groß
2025-04-22 8:21 ` [RFC PATCH v2 13/34] x86/xen/msr: Remove the error pointer argument from set_reg() Xin Li (Intel)
` (22 subsequent siblings)
34 siblings, 2 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:21 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
As pmu_msr_{read,write}() are now wrappers of pmu_msr_chk_emulated(),
remove them and use pmu_msr_chk_emulated() directly.
While at it, convert the data type of MSR index to u32 in functions
called in pmu_msr_chk_emulated().
Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/xen/enlighten_pv.c | 17 ++++++++++-------
arch/x86/xen/pmu.c | 24 ++++--------------------
arch/x86/xen/xen-ops.h | 3 +--
3 files changed, 15 insertions(+), 29 deletions(-)
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 1418758b57ff..b5a8bceb5f56 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1089,8 +1089,9 @@ static void xen_write_cr4(unsigned long cr4)
static u64 xen_do_read_msr(unsigned int msr, int *err)
{
u64 val = 0; /* Avoid uninitialized value for safe variant. */
+ bool emulated;
- if (pmu_msr_read(msr, &val, err))
+ if (pmu_msr_chk_emulated(msr, &val, true, &emulated) && emulated)
return val;
if (err)
@@ -1133,6 +1134,7 @@ static void xen_do_write_msr(unsigned int msr, unsigned int low,
unsigned int high, int *err)
{
u64 val;
+ bool emulated;
switch (msr) {
case MSR_FS_BASE:
@@ -1162,12 +1164,13 @@ static void xen_do_write_msr(unsigned int msr, unsigned int low,
default:
val = (u64)high << 32 | low;
- if (!pmu_msr_write(msr, val)) {
- if (err)
- *err = native_write_msr_safe(msr, low, high);
- else
- native_write_msr(msr, low, high);
- }
+ if (pmu_msr_chk_emulated(msr, &val, false, &emulated) && emulated)
+ return;
+
+ if (err)
+ *err = native_write_msr_safe(msr, low, high);
+ else
+ native_write_msr(msr, low, high);
}
}
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index 95caae97a394..afb02f43ee3f 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -128,7 +128,7 @@ static inline uint32_t get_fam15h_addr(u32 addr)
return addr;
}
-static inline bool is_amd_pmu_msr(unsigned int msr)
+static bool is_amd_pmu_msr(u32 msr)
{
if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
@@ -194,8 +194,7 @@ static bool is_intel_pmu_msr(u32 msr_index, int *type, int *index)
}
}
-static bool xen_intel_pmu_emulate(unsigned int msr, u64 *val, int type,
- int index, bool is_read)
+static bool xen_intel_pmu_emulate(u32 msr, u64 *val, int type, int index, bool is_read)
{
uint64_t *reg = NULL;
struct xen_pmu_intel_ctxt *ctxt;
@@ -257,7 +256,7 @@ static bool xen_intel_pmu_emulate(unsigned int msr, u64 *val, int type,
return false;
}
-static bool xen_amd_pmu_emulate(unsigned int msr, u64 *val, bool is_read)
+static bool xen_amd_pmu_emulate(u32 msr, u64 *val, bool is_read)
{
uint64_t *reg = NULL;
int i, off = 0;
@@ -298,8 +297,7 @@ static bool xen_amd_pmu_emulate(unsigned int msr, u64 *val, bool is_read)
return false;
}
-static bool pmu_msr_chk_emulated(unsigned int msr, uint64_t *val, bool is_read,
- bool *emul)
+bool pmu_msr_chk_emulated(u32 msr, u64 *val, bool is_read, bool *emul)
{
int type, index = 0;
@@ -313,20 +311,6 @@ static bool pmu_msr_chk_emulated(unsigned int msr, uint64_t *val, bool is_read,
return true;
}
-bool pmu_msr_read(u32 msr, u64 *val)
-{
- bool emulated;
-
- return pmu_msr_chk_emulated(msr, val, true, &emulated) && emulated;
-}
-
-bool pmu_msr_write(u32 msr, u64 val)
-{
- bool emulated;
-
- return pmu_msr_chk_emulated(msr, &val, false, &emulated) && emulated;
-}
-
static u64 xen_amd_read_pmc(int counter)
{
struct xen_pmu_amd_ctxt *ctxt;
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index a1875e10be31..fde9f9d7415f 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -271,8 +271,7 @@ void xen_pmu_finish(int cpu);
static inline void xen_pmu_init(int cpu) {}
static inline void xen_pmu_finish(int cpu) {}
#endif
-bool pmu_msr_read(u32 msr, u64 *val);
-bool pmu_msr_write(u32 msr, u64 val);
+bool pmu_msr_chk_emulated(u32 msr, u64 *val, bool is_read, bool *emul);
int pmu_apic_update(uint32_t reg);
u64 xen_read_pmc(int counter);
--
2.49.0
* Re: [RFC PATCH v2 12/34] x86/msr: Remove pmu_msr_{read,write}()
2025-04-22 8:21 ` [RFC PATCH v2 12/34] x86/msr: Remove pmu_msr_{read,write}() Xin Li (Intel)
@ 2025-04-24 6:33 ` Mi, Dapeng
2025-04-24 7:21 ` Xin Li
2025-04-24 10:05 ` Jürgen Groß
1 sibling, 1 reply; 94+ messages in thread
From: Mi, Dapeng @ 2025-04-24 6:33 UTC (permalink / raw)
To: Xin Li (Intel), linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
On 4/22/2025 4:21 PM, Xin Li (Intel) wrote:
> As pmu_msr_{read,write}() are now wrappers of pmu_msr_chk_emulated(),
> remove them and use pmu_msr_chk_emulated() directly.
>
> While at it, convert the data type of the MSR index to u32 in the
> functions called from pmu_msr_chk_emulated().
>
> Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
> ---
> arch/x86/xen/enlighten_pv.c | 17 ++++++++++-------
> arch/x86/xen/pmu.c | 24 ++++--------------------
> arch/x86/xen/xen-ops.h | 3 +--
> 3 files changed, 15 insertions(+), 29 deletions(-)
>
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index 1418758b57ff..b5a8bceb5f56 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -1089,8 +1089,9 @@ static void xen_write_cr4(unsigned long cr4)
> static u64 xen_do_read_msr(unsigned int msr, int *err)
> {
> u64 val = 0; /* Avoid uninitialized value for safe variant. */
> + bool emulated;
>
> - if (pmu_msr_read(msr, &val, err))
> + if (pmu_msr_chk_emulated(msr, &val, true, &emulated) && emulated)
Ah, here it is.
Could we merge this patch and the previous one into a single patch? It's
unnecessary to modify pmu_msr_read()/pmu_msr_write() in the previous
patch only to delete them immediately. It just wastes effort.
> return val;
>
> if (err)
> @@ -1133,6 +1134,7 @@ static void xen_do_write_msr(unsigned int msr, unsigned int low,
> unsigned int high, int *err)
> {
> u64 val;
> + bool emulated;
>
> switch (msr) {
> case MSR_FS_BASE:
> @@ -1162,12 +1164,13 @@ static void xen_do_write_msr(unsigned int msr, unsigned int low,
> default:
> val = (u64)high << 32 | low;
>
> - if (!pmu_msr_write(msr, val)) {
> - if (err)
> - *err = native_write_msr_safe(msr, low, high);
> - else
> - native_write_msr(msr, low, high);
> - }
> + if (pmu_msr_chk_emulated(msr, &val, false, &emulated) && emulated)
> + return;
> +
> + if (err)
> + *err = native_write_msr_safe(msr, low, high);
> + else
> + native_write_msr(msr, low, high);
> }
> }
>
> diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
> index 95caae97a394..afb02f43ee3f 100644
> --- a/arch/x86/xen/pmu.c
> +++ b/arch/x86/xen/pmu.c
> @@ -128,7 +128,7 @@ static inline uint32_t get_fam15h_addr(u32 addr)
> return addr;
> }
>
> -static inline bool is_amd_pmu_msr(unsigned int msr)
> +static bool is_amd_pmu_msr(u32 msr)
> {
> if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
> boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
> @@ -194,8 +194,7 @@ static bool is_intel_pmu_msr(u32 msr_index, int *type, int *index)
> }
> }
>
> -static bool xen_intel_pmu_emulate(unsigned int msr, u64 *val, int type,
> - int index, bool is_read)
> +static bool xen_intel_pmu_emulate(u32 msr, u64 *val, int type, int index, bool is_read)
> {
> uint64_t *reg = NULL;
> struct xen_pmu_intel_ctxt *ctxt;
> @@ -257,7 +256,7 @@ static bool xen_intel_pmu_emulate(unsigned int msr, u64 *val, int type,
> return false;
> }
>
> -static bool xen_amd_pmu_emulate(unsigned int msr, u64 *val, bool is_read)
> +static bool xen_amd_pmu_emulate(u32 msr, u64 *val, bool is_read)
> {
> uint64_t *reg = NULL;
> int i, off = 0;
> @@ -298,8 +297,7 @@ static bool xen_amd_pmu_emulate(unsigned int msr, u64 *val, bool is_read)
> return false;
> }
>
> -static bool pmu_msr_chk_emulated(unsigned int msr, uint64_t *val, bool is_read,
> - bool *emul)
> +bool pmu_msr_chk_emulated(u32 msr, u64 *val, bool is_read, bool *emul)
> {
> int type, index = 0;
>
> @@ -313,20 +311,6 @@ static bool pmu_msr_chk_emulated(unsigned int msr, uint64_t *val, bool is_read,
> return true;
> }
>
> -bool pmu_msr_read(u32 msr, u64 *val)
> -{
> - bool emulated;
> -
> - return pmu_msr_chk_emulated(msr, val, true, &emulated) && emulated;
> -}
> -
> -bool pmu_msr_write(u32 msr, u64 val)
> -{
> - bool emulated;
> -
> - return pmu_msr_chk_emulated(msr, &val, false, &emulated) && emulated;
> -}
> -
> static u64 xen_amd_read_pmc(int counter)
> {
> struct xen_pmu_amd_ctxt *ctxt;
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index a1875e10be31..fde9f9d7415f 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -271,8 +271,7 @@ void xen_pmu_finish(int cpu);
> static inline void xen_pmu_init(int cpu) {}
> static inline void xen_pmu_finish(int cpu) {}
> #endif
> -bool pmu_msr_read(u32 msr, u64 *val);
> -bool pmu_msr_write(u32 msr, u64 val);
> +bool pmu_msr_chk_emulated(u32 msr, u64 *val, bool is_read, bool *emul);
> int pmu_apic_update(uint32_t reg);
> u64 xen_read_pmc(int counter);
>
* Re: [RFC PATCH v2 12/34] x86/msr: Remove pmu_msr_{read,write}()
2025-04-24 6:33 ` Mi, Dapeng
@ 2025-04-24 7:21 ` Xin Li
2025-04-24 7:43 ` Mi, Dapeng
0 siblings, 1 reply; 94+ messages in thread
From: Xin Li @ 2025-04-24 7:21 UTC (permalink / raw)
To: Mi, Dapeng, linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
On 4/23/2025 11:33 PM, Mi, Dapeng wrote:
> Could we merge this patch and the previous one into a single patch? It's
> unnecessary to modify pmu_msr_read()/pmu_msr_write() in the previous
> patch only to delete them immediately. It just wastes effort.
No, it's not wasted effort; it makes review easier.
Looking at this patch, you can easily tell that pmu_msr_read() and
pmu_msr_write() are nothing more than wrappers around
pmu_msr_chk_emulated(), and removing them then makes a lot of sense.
* Re: [RFC PATCH v2 12/34] x86/msr: Remove pmu_msr_{read,write}()
2025-04-24 7:21 ` Xin Li
@ 2025-04-24 7:43 ` Mi, Dapeng
2025-04-24 7:50 ` Xin Li
0 siblings, 1 reply; 94+ messages in thread
From: Mi, Dapeng @ 2025-04-24 7:43 UTC (permalink / raw)
To: Xin Li, linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
On 4/24/2025 3:21 PM, Xin Li wrote:
> On 4/23/2025 11:33 PM, Mi, Dapeng wrote:
>> Could we merge this patch and the previous one into a single patch? It's
>> unnecessary to modify pmu_msr_read()/pmu_msr_write() in the previous
>> patch only to delete them immediately. It just wastes effort.
> No, it's not wasted effort; it makes review easier.
>
> Looking at this patch, you can easily tell that pmu_msr_read() and
> pmu_msr_write() are nothing more than wrappers around
> pmu_msr_chk_emulated(), and removing them then makes a lot of sense.
These two patches are not complicated; they won't be difficult to review
if merged into one, as long as the commit message describes the change
clearly. Anyway, I'm fine if you prefer to keep them as two patches.
* Re: [RFC PATCH v2 12/34] x86/msr: Remove pmu_msr_{read,write}()
2025-04-24 7:43 ` Mi, Dapeng
@ 2025-04-24 7:50 ` Xin Li
0 siblings, 0 replies; 94+ messages in thread
From: Xin Li @ 2025-04-24 7:50 UTC (permalink / raw)
To: Mi, Dapeng, linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
On 4/24/2025 12:43 AM, Mi, Dapeng wrote:
> These two patches are not complicated; they won't be difficult to review
> if merged into one, as long as the commit message describes the change
> clearly. Anyway, I'm fine if you prefer to keep them as two patches.
Simple Small Steps...
* Re: [RFC PATCH v2 12/34] x86/msr: Remove pmu_msr_{read,write}()
2025-04-22 8:21 ` [RFC PATCH v2 12/34] x86/msr: Remove pmu_msr_{read,write}() Xin Li (Intel)
2025-04-24 6:33 ` Mi, Dapeng
@ 2025-04-24 10:05 ` Jürgen Groß
2025-04-24 17:49 ` Xin Li
1 sibling, 1 reply; 94+ messages in thread
From: Jürgen Groß @ 2025-04-24 10:05 UTC (permalink / raw)
To: Xin Li (Intel), linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 22.04.25 10:21, Xin Li (Intel) wrote:
> As pmu_msr_{read,write}() are now wrappers of pmu_msr_chk_emulated(),
> remove them and use pmu_msr_chk_emulated() directly.
>
> While at it, convert the data type of MSR index to u32 in functions
> called in pmu_msr_chk_emulated().
>
> Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
> ---
> arch/x86/xen/enlighten_pv.c | 17 ++++++++++-------
> arch/x86/xen/pmu.c | 24 ++++--------------------
> arch/x86/xen/xen-ops.h | 3 +--
> 3 files changed, 15 insertions(+), 29 deletions(-)
>
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index 1418758b57ff..b5a8bceb5f56 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -1089,8 +1089,9 @@ static void xen_write_cr4(unsigned long cr4)
> static u64 xen_do_read_msr(unsigned int msr, int *err)
> {
> u64 val = 0; /* Avoid uninitialized value for safe variant. */
> + bool emulated;
>
> - if (pmu_msr_read(msr, &val, err))
> + if (pmu_msr_chk_emulated(msr, &val, true, &emulated) && emulated)
> return val;
>
> if (err)
> @@ -1133,6 +1134,7 @@ static void xen_do_write_msr(unsigned int msr, unsigned int low,
> unsigned int high, int *err)
> {
> u64 val;
> + bool emulated;
>
> switch (msr) {
> case MSR_FS_BASE:
> @@ -1162,12 +1164,13 @@ static void xen_do_write_msr(unsigned int msr, unsigned int low,
> default:
> val = (u64)high << 32 | low;
>
> - if (!pmu_msr_write(msr, val)) {
> - if (err)
> - *err = native_write_msr_safe(msr, low, high);
> - else
> - native_write_msr(msr, low, high);
> - }
> + if (pmu_msr_chk_emulated(msr, &val, false, &emulated) && emulated)
> + return;
> +
> + if (err)
> + *err = native_write_msr_safe(msr, low, high);
> + else
> + native_write_msr(msr, low, high);
> }
> }
>
> diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
> index 95caae97a394..afb02f43ee3f 100644
> --- a/arch/x86/xen/pmu.c
> +++ b/arch/x86/xen/pmu.c
> @@ -128,7 +128,7 @@ static inline uint32_t get_fam15h_addr(u32 addr)
> return addr;
> }
>
> -static inline bool is_amd_pmu_msr(unsigned int msr)
> +static bool is_amd_pmu_msr(u32 msr)
> {
> if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
> boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
> @@ -194,8 +194,7 @@ static bool is_intel_pmu_msr(u32 msr_index, int *type, int *index)
> }
> }
>
> -static bool xen_intel_pmu_emulate(unsigned int msr, u64 *val, int type,
> - int index, bool is_read)
> +static bool xen_intel_pmu_emulate(u32 msr, u64 *val, int type, int index, bool is_read)
> {
> uint64_t *reg = NULL;
> struct xen_pmu_intel_ctxt *ctxt;
> @@ -257,7 +256,7 @@ static bool xen_intel_pmu_emulate(unsigned int msr, u64 *val, int type,
> return false;
> }
>
> -static bool xen_amd_pmu_emulate(unsigned int msr, u64 *val, bool is_read)
> +static bool xen_amd_pmu_emulate(u32 msr, u64 *val, bool is_read)
> {
> uint64_t *reg = NULL;
> int i, off = 0;
> @@ -298,8 +297,7 @@ static bool xen_amd_pmu_emulate(unsigned int msr, u64 *val, bool is_read)
> return false;
> }
>
> -static bool pmu_msr_chk_emulated(unsigned int msr, uint64_t *val, bool is_read,
> - bool *emul)
> +bool pmu_msr_chk_emulated(u32 msr, u64 *val, bool is_read, bool *emul)
> {
> int type, index = 0;
>
> @@ -313,20 +311,6 @@ static bool pmu_msr_chk_emulated(unsigned int msr, uint64_t *val, bool is_read,
> return true;
> }
>
> -bool pmu_msr_read(u32 msr, u64 *val)
> -{
> - bool emulated;
> -
> - return pmu_msr_chk_emulated(msr, val, true, &emulated) && emulated;
> -}
> -
> -bool pmu_msr_write(u32 msr, u64 val)
> -{
> - bool emulated;
> -
> - return pmu_msr_chk_emulated(msr, &val, false, &emulated) && emulated;
> -}
> -
> static u64 xen_amd_read_pmc(int counter)
> {
> struct xen_pmu_amd_ctxt *ctxt;
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index a1875e10be31..fde9f9d7415f 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -271,8 +271,7 @@ void xen_pmu_finish(int cpu);
> static inline void xen_pmu_init(int cpu) {}
> static inline void xen_pmu_finish(int cpu) {}
> #endif
> -bool pmu_msr_read(u32 msr, u64 *val);
> -bool pmu_msr_write(u32 msr, u64 val);
> +bool pmu_msr_chk_emulated(u32 msr, u64 *val, bool is_read, bool *emul);
> int pmu_apic_update(uint32_t reg);
> u64 xen_read_pmc(int counter);
>
May I suggest getting rid of the "emul" parameter of pmu_msr_chk_emulated()?
It has no real value, as pmu_msr_chk_emulated() could easily return false in
the cases where it would set *emul to false.
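A minimal sketch of that shape, assuming the emulation result is simply
folded into the return value (illustration only, not part of the posted
series):

  /* Return true only if the access was actually emulated. */
  bool pmu_msr_chk_emulated(u32 msr, u64 *val, bool is_read)
  {
          int type, index = 0;

          if (is_amd_pmu_msr(msr))
                  return xen_amd_pmu_emulate(msr, val, is_read);

          if (is_intel_pmu_msr(msr, &type, &index))
                  return xen_intel_pmu_emulate(msr, val, type, index, is_read);

          return false;
  }

with the read-side caller reduced to:

  if (pmu_msr_chk_emulated(msr, &val, true))
          return val;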
Juergen
* Re: [RFC PATCH v2 12/34] x86/msr: Remove pmu_msr_{read,write}()
2025-04-24 10:05 ` Jürgen Groß
@ 2025-04-24 17:49 ` Xin Li
2025-04-24 21:14 ` H. Peter Anvin
0 siblings, 1 reply; 94+ messages in thread
From: Xin Li @ 2025-04-24 17:49 UTC (permalink / raw)
To: Jürgen Groß, linux-kernel, kvm, linux-perf-users,
linux-hyperv, virtualization, linux-pm, linux-edac, xen-devel,
linux-acpi, linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 4/24/2025 3:05 AM, Jürgen Groß wrote:
>
> May I suggest getting rid of the "emul" parameter of pmu_msr_chk_emulated()?
> It has no real value, as pmu_msr_chk_emulated() could easily return false
> in the cases where it would set *emul to false.
Good idea!
The function type is a bit weird, but I didn't think of changing it.
* Re: [RFC PATCH v2 12/34] x86/msr: Remove pmu_msr_{read,write}()
2025-04-24 17:49 ` Xin Li
@ 2025-04-24 21:14 ` H. Peter Anvin
2025-04-24 22:24 ` Xin Li
0 siblings, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2025-04-24 21:14 UTC (permalink / raw)
To: Xin Li, Jürgen Groß, linux-kernel, kvm,
linux-perf-users, linux-hyperv, virtualization, linux-pm,
linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, acme, andrew.cooper3, peterz,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On April 24, 2025 10:49:59 AM PDT, Xin Li <xin@zytor.com> wrote:
>On 4/24/2025 3:05 AM, Jürgen Groß wrote:
>>
>> May I suggest getting rid of the "emul" parameter of pmu_msr_chk_emulated()?
>> It has no real value, as pmu_msr_chk_emulated() could easily return false
>> in the cases where it would set *emul to false.
>
>Good idea!
>
>The function type is a bit weird, but I didn't think of changing it.
It is weird in the extreme.
By the way, this patch should have "xen" in its subject tag.
* Re: [RFC PATCH v2 12/34] x86/msr: Remove pmu_msr_{read,write}()
2025-04-24 21:14 ` H. Peter Anvin
@ 2025-04-24 22:24 ` Xin Li
0 siblings, 0 replies; 94+ messages in thread
From: Xin Li @ 2025-04-24 22:24 UTC (permalink / raw)
To: H. Peter Anvin, Jürgen Groß, linux-kernel, kvm,
linux-perf-users, linux-hyperv, virtualization, linux-pm,
linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, acme, andrew.cooper3, peterz,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
> By the way, this patch should have "xen" in its subject tag.
>
Right, I should add it.
* [RFC PATCH v2 13/34] x86/xen/msr: Remove the error pointer argument from set_reg()
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (11 preceding siblings ...)
2025-04-22 8:21 ` [RFC PATCH v2 12/34] x86/msr: Remove pmu_msr_{read,write}() Xin Li (Intel)
@ 2025-04-22 8:21 ` Xin Li (Intel)
2025-04-24 10:11 ` Jürgen Groß
2025-04-22 8:21 ` [RFC PATCH v2 14/34] x86/msr: refactor pv_cpu_ops.write_msr{_safe}() Xin Li (Intel)
` (21 subsequent siblings)
34 siblings, 1 reply; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:21 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
set_reg() is used to write the following MSRs on Xen:
MSR_FS_BASE
MSR_KERNEL_GS_BASE
MSR_GS_BASE
But none of these MSRs are written using any MSR write safe API.
Therefore there is no need to pass an error pointer argument to
set_reg() for returning an error code to be used in MSR safe APIs.
Remove the error pointer argument.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/xen/enlighten_pv.c | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index b5a8bceb5f56..9a89cb29fa35 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1111,17 +1111,11 @@ static u64 xen_do_read_msr(unsigned int msr, int *err)
return val;
}
-static void set_seg(unsigned int which, unsigned int low, unsigned int high,
- int *err)
+static void set_seg(u32 which, u32 low, u32 high)
{
u64 base = ((u64)high << 32) | low;
- if (HYPERVISOR_set_segment_base(which, base) == 0)
- return;
-
- if (err)
- *err = -EIO;
- else
+ if (HYPERVISOR_set_segment_base(which, base))
WARN(1, "Xen set_segment_base(%u, %llx) failed\n", which, base);
}
@@ -1138,15 +1132,15 @@ static void xen_do_write_msr(unsigned int msr, unsigned int low,
switch (msr) {
case MSR_FS_BASE:
- set_seg(SEGBASE_FS, low, high, err);
+ set_seg(SEGBASE_FS, low, high);
break;
case MSR_KERNEL_GS_BASE:
- set_seg(SEGBASE_GS_USER, low, high, err);
+ set_seg(SEGBASE_GS_USER, low, high);
break;
case MSR_GS_BASE:
- set_seg(SEGBASE_GS_KERNEL, low, high, err);
+ set_seg(SEGBASE_GS_KERNEL, low, high);
break;
case MSR_STAR:
--
2.49.0
* Re: [RFC PATCH v2 13/34] x86/xen/msr: Remove the error pointer argument from set_reg()
2025-04-22 8:21 ` [RFC PATCH v2 13/34] x86/xen/msr: Remove the error pointer argument from set_reg() Xin Li (Intel)
@ 2025-04-24 10:11 ` Jürgen Groß
2025-04-24 17:50 ` Xin Li
0 siblings, 1 reply; 94+ messages in thread
From: Jürgen Groß @ 2025-04-24 10:11 UTC (permalink / raw)
To: Xin Li (Intel), linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 22.04.25 10:21, Xin Li (Intel) wrote:
> set_reg() is used to write the following MSRs on Xen:
>
> MSR_FS_BASE
> MSR_KERNEL_GS_BASE
> MSR_GS_BASE
>
> But none of these MSRs are written using any MSR write safe API.
> Therefore there is no need to pass an error pointer argument to
> set_reg() for returning an error code to be used in MSR safe APIs.
set_seg(), please (further up, too).
>
> Remove the error pointer argument.
>
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Juergen
* Re: [RFC PATCH v2 13/34] x86/xen/msr: Remove the error pointer argument from set_reg()
2025-04-24 10:11 ` Jürgen Groß
@ 2025-04-24 17:50 ` Xin Li
0 siblings, 0 replies; 94+ messages in thread
From: Xin Li @ 2025-04-24 17:50 UTC (permalink / raw)
To: Jürgen Groß, linux-kernel, kvm, linux-perf-users,
linux-hyperv, virtualization, linux-pm, linux-edac, xen-devel,
linux-acpi, linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 4/24/2025 3:11 AM, Jürgen Groß wrote:
> set_seg(), please (further up, too).
Good catch, thanks a lot!
* [RFC PATCH v2 14/34] x86/msr: refactor pv_cpu_ops.write_msr{_safe}()
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (12 preceding siblings ...)
2025-04-22 8:21 ` [RFC PATCH v2 13/34] x86/xen/msr: Remove the error pointer argument from set_reg() Xin Li (Intel)
@ 2025-04-22 8:21 ` Xin Li (Intel)
2025-04-24 10:16 ` Jürgen Groß
2025-04-22 8:21 ` [RFC PATCH v2 15/34] x86/msr: Replace wrmsr(msr, low, 0) with wrmsrq(msr, low) Xin Li (Intel)
` (20 subsequent siblings)
34 siblings, 1 reply; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:21 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
An MSR value is represented as a 64-bit unsigned integer, with existing
MSR instructions storing it in EDX:EAX as two 32-bit segments.
The new immediate form MSR instructions, however, utilize a 64-bit
general-purpose register to store the MSR value. To unify the usage of
all MSR instructions, let the default MSR access APIs accept an MSR
value as a single 64-bit argument instead of two 32-bit segments.
The dual 32-bit APIs are still available as convenient wrappers over the
APIs that handle an MSR value as a single 64-bit argument.
The following illustrates the updated derivation of the MSR write APIs:
__wrmsrq(u32 msr, u64 val)
/ \
/ \
native_wrmsrq(msr, val) native_wrmsr(msr, low, high)
|
|
native_write_msr(msr, val)
/ \
/ \
wrmsrq(msr, val) wrmsr(msr, low, high)
When CONFIG_PARAVIRT is enabled, wrmsrq() and wrmsr() are defined on top
of paravirt_write_msr():
paravirt_write_msr(u32 msr, u64 val)
/ \
/ \
wrmsrq(msr, val) wrmsr(msr, low, high)
paravirt_write_msr() invokes cpu.write_msr(msr, val), an indirect layer
of pv_ops MSR write call:
If on native:
cpu.write_msr = native_write_msr
If on Xen:
cpu.write_msr = xen_write_msr
Therefore, refactor pv_cpu_ops.write_msr{_safe}() to accept an MSR value
in a single u64 argument, replacing the current dual u32 arguments.
No functional change intended.
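At the wrapper boundary the recombination is a simple shift-and-or; a
minimal sketch of the non-paravirt pair (matching the diff below):

  static inline void wrmsr(u32 msr, u32 low, u32 high)
  {
          native_write_msr(msr, (u64)high << 32 | low);
  }

  static inline void wrmsrq(u32 msr, u64 val)
  {
          native_write_msr(msr, val);
  }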
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
Change in v2:
* Spell out the reason why use a single u64 argument to pass the MSR
value in the lowest level APIs (Andrew Cooper).
---
arch/x86/include/asm/msr.h | 35 ++++++++++++---------------
arch/x86/include/asm/paravirt.h | 27 +++++++++++----------
arch/x86/include/asm/paravirt_types.h | 4 +--
arch/x86/kernel/kvmclock.c | 2 +-
arch/x86/kvm/svm/svm.c | 15 +++---------
arch/x86/xen/enlighten_pv.c | 29 +++++++++-------------
6 files changed, 46 insertions(+), 66 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 2ab8effea4cd..dd1114053173 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -97,12 +97,12 @@ static __always_inline u64 __rdmsr(u32 msr)
return EAX_EDX_VAL(val, low, high);
}
-static __always_inline void __wrmsr(u32 msr, u32 low, u32 high)
+static __always_inline void __wrmsrq(u32 msr, u64 val)
{
asm volatile("1: wrmsr\n"
"2:\n"
_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
- : : "c" (msr), "a"(low), "d" (high) : "memory");
+ : : "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32)) : "memory");
}
#define native_rdmsr(msr, val1, val2) \
@@ -118,11 +118,10 @@ static __always_inline u64 native_rdmsrq(u32 msr)
}
#define native_wrmsr(msr, low, high) \
- __wrmsr(msr, low, high)
+ __wrmsrq((msr), (u64)(high) << 32 | (low))
#define native_wrmsrq(msr, val) \
- __wrmsr((msr), (u32)((u64)(val)), \
- (u32)((u64)(val) >> 32))
+ __wrmsrq((msr), (val))
static inline u64 native_read_msr(u32 msr)
{
@@ -151,11 +150,8 @@ static inline u64 native_read_msr_safe(u32 msr, int *err)
}
/* Can be uninlined because referenced by paravirt */
-static inline void notrace
-native_write_msr(u32 msr, u32 low, u32 high)
+static inline void notrace native_write_msr(u32 msr, u64 val)
{
- u64 val = (u64)high << 32 | low;
-
native_wrmsrq(msr, val);
if (tracepoint_enabled(write_msr))
@@ -163,8 +159,7 @@ native_write_msr(u32 msr, u32 low, u32 high)
}
/* Can be uninlined because referenced by paravirt */
-static inline int notrace
-native_write_msr_safe(u32 msr, u32 low, u32 high)
+static inline int notrace native_write_msr_safe(u32 msr, u64 val)
{
int err;
@@ -172,10 +167,10 @@ native_write_msr_safe(u32 msr, u32 low, u32 high)
"2:\n\t"
_ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_WRMSR_SAFE, %[err])
: [err] "=a" (err)
- : "c" (msr), "0" (low), "d" (high)
+ : "c" (msr), "0" ((u32)val), "d" ((u32)(val >> 32))
: "memory");
if (tracepoint_enabled(write_msr))
- do_trace_write_msr(msr, ((u64)high << 32 | low), err);
+ do_trace_write_msr(msr, val, err);
return err;
}
@@ -227,7 +222,7 @@ do { \
static inline void wrmsr(u32 msr, u32 low, u32 high)
{
- native_write_msr(msr, low, high);
+ native_write_msr(msr, (u64)high << 32 | low);
}
#define rdmsrq(msr, val) \
@@ -235,13 +230,13 @@ static inline void wrmsr(u32 msr, u32 low, u32 high)
static inline void wrmsrq(u32 msr, u64 val)
{
- native_write_msr(msr, (u32)(val & 0xffffffffULL), (u32)(val >> 32));
+ native_write_msr(msr, val);
}
/* wrmsr with exception handling */
-static inline int wrmsr_safe(u32 msr, u32 low, u32 high)
+static inline int wrmsrq_safe(u32 msr, u64 val)
{
- return native_write_msr_safe(msr, low, high);
+ return native_write_msr_safe(msr, val);
}
/* rdmsr with exception handling */
@@ -279,11 +274,11 @@ static __always_inline void wrmsrns(u32 msr, u64 val)
}
/*
- * 64-bit version of wrmsr_safe():
+ * Dual u32 version of wrmsrq_safe():
*/
-static inline int wrmsrq_safe(u32 msr, u64 val)
+static inline int wrmsr_safe(u32 msr, u32 low, u32 high)
{
- return wrmsr_safe(msr, (u32)val, (u32)(val >> 32));
+ return wrmsrq_safe(msr, (u64)high << 32 | low);
}
struct msr __percpu *msrs_alloc(void);
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index c7689f5f70d6..1bd1dad8da5a 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -180,10 +180,9 @@ static inline u64 paravirt_read_msr(unsigned msr)
return PVOP_CALL1(u64, cpu.read_msr, msr);
}
-static inline void paravirt_write_msr(unsigned msr,
- unsigned low, unsigned high)
+static inline void paravirt_write_msr(u32 msr, u64 val)
{
- PVOP_VCALL3(cpu.write_msr, msr, low, high);
+ PVOP_VCALL2(cpu.write_msr, msr, val);
}
static inline u64 paravirt_read_msr_safe(unsigned msr, int *err)
@@ -191,10 +190,9 @@ static inline u64 paravirt_read_msr_safe(unsigned msr, int *err)
return PVOP_CALL2(u64, cpu.read_msr_safe, msr, err);
}
-static inline int paravirt_write_msr_safe(unsigned msr,
- unsigned low, unsigned high)
+static inline int paravirt_write_msr_safe(u32 msr, u64 val)
{
- return PVOP_CALL3(int, cpu.write_msr_safe, msr, low, high);
+ return PVOP_CALL2(int, cpu.write_msr_safe, msr, val);
}
#define rdmsr(msr, val1, val2) \
@@ -204,22 +202,25 @@ do { \
val2 = _l >> 32; \
} while (0)
-#define wrmsr(msr, val1, val2) \
-do { \
- paravirt_write_msr(msr, val1, val2); \
-} while (0)
+static __always_inline void wrmsr(u32 msr, u32 low, u32 high)
+{
+ paravirt_write_msr(msr, (u64)high << 32 | low);
+}
#define rdmsrq(msr, val) \
do { \
val = paravirt_read_msr(msr); \
} while (0)
-static inline void wrmsrq(unsigned msr, u64 val)
+static inline void wrmsrq(u32 msr, u64 val)
{
- wrmsr(msr, (u32)val, (u32)(val>>32));
+ paravirt_write_msr(msr, val);
}
-#define wrmsr_safe(msr, a, b) paravirt_write_msr_safe(msr, a, b)
+static inline int wrmsrq_safe(u32 msr, u64 val)
+{
+ return paravirt_write_msr_safe(msr, val);
+}
/* rdmsr with exception handling */
#define rdmsr_safe(msr, a, b) \
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 475f508531d6..91b3423d36ce 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -92,14 +92,14 @@ struct pv_cpu_ops {
/* Unsafe MSR operations. These will warn or panic on failure. */
u64 (*read_msr)(unsigned int msr);
- void (*write_msr)(unsigned int msr, unsigned low, unsigned high);
+ void (*write_msr)(u32 msr, u64 val);
/*
* Safe MSR operations.
* read sets err to 0 or -EIO. write returns 0 or -EIO.
*/
u64 (*read_msr_safe)(unsigned int msr, int *err);
- int (*write_msr_safe)(unsigned int msr, unsigned low, unsigned high);
+ int (*write_msr_safe)(u32 msr, u64 val);
void (*start_context_switch)(struct task_struct *prev);
void (*end_context_switch)(struct task_struct *next);
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 0af797930ccb..ca0a49eeac4a 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -196,7 +196,7 @@ static void kvm_setup_secondary_clock(void)
void kvmclock_disable(void)
{
if (msr_kvm_system_time)
- native_write_msr(msr_kvm_system_time, 0, 0);
+ native_write_msr(msr_kvm_system_time, 0);
}
static void __init kvmclock_init_mem(void)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 67657b3a36ce..4ef9978dce70 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -475,7 +475,6 @@ static void svm_inject_exception(struct kvm_vcpu *vcpu)
static void svm_init_erratum_383(void)
{
- u32 low, high;
int err;
u64 val;
@@ -489,10 +488,7 @@ static void svm_init_erratum_383(void)
val |= (1ULL << 47);
- low = lower_32_bits(val);
- high = upper_32_bits(val);
-
- native_write_msr_safe(MSR_AMD64_DC_CFG, low, high);
+ native_write_msr_safe(MSR_AMD64_DC_CFG, val);
erratum_383_found = true;
}
@@ -2167,17 +2163,12 @@ static bool is_erratum_383(void)
/* Clear MCi_STATUS registers */
for (i = 0; i < 6; ++i)
- native_write_msr_safe(MSR_IA32_MCx_STATUS(i), 0, 0);
+ native_write_msr_safe(MSR_IA32_MCx_STATUS(i), 0);
value = native_read_msr_safe(MSR_IA32_MCG_STATUS, &err);
if (!err) {
- u32 low, high;
-
value &= ~(1ULL << 2);
- low = lower_32_bits(value);
- high = upper_32_bits(value);
-
- native_write_msr_safe(MSR_IA32_MCG_STATUS, low, high);
+ native_write_msr_safe(MSR_IA32_MCG_STATUS, value);
}
/* Flush tlb to evict multi-match entries */
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 9a89cb29fa35..052f68c92111 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1111,10 +1111,8 @@ static u64 xen_do_read_msr(unsigned int msr, int *err)
return val;
}
-static void set_seg(u32 which, u32 low, u32 high)
+static void set_seg(u32 which, u64 base)
{
- u64 base = ((u64)high << 32) | low;
-
if (HYPERVISOR_set_segment_base(which, base))
WARN(1, "Xen set_segment_base(%u, %llx) failed\n", which, base);
}
@@ -1124,23 +1122,21 @@ static void set_seg(u32 which, u32 low, u32 high)
* With err == NULL write_msr() semantics are selected.
* Supplying an err pointer requires err to be pre-initialized with 0.
*/
-static void xen_do_write_msr(unsigned int msr, unsigned int low,
- unsigned int high, int *err)
+static void xen_do_write_msr(u32 msr, u64 val, int *err)
{
- u64 val;
bool emulated;
switch (msr) {
case MSR_FS_BASE:
- set_seg(SEGBASE_FS, low, high);
+ set_seg(SEGBASE_FS, val);
break;
case MSR_KERNEL_GS_BASE:
- set_seg(SEGBASE_GS_USER, low, high);
+ set_seg(SEGBASE_GS_USER, val);
break;
case MSR_GS_BASE:
- set_seg(SEGBASE_GS_KERNEL, low, high);
+ set_seg(SEGBASE_GS_KERNEL, val);
break;
case MSR_STAR:
@@ -1156,15 +1152,13 @@ static void xen_do_write_msr(unsigned int msr, unsigned int low,
break;
default:
- val = (u64)high << 32 | low;
-
if (pmu_msr_chk_emulated(msr, &val, false, &emulated) && emulated)
return;
if (err)
- *err = native_write_msr_safe(msr, low, high);
+ *err = native_write_msr_safe(msr, val);
else
- native_write_msr(msr, low, high);
+ native_write_msr(msr, val);
}
}
@@ -1173,12 +1167,11 @@ static u64 xen_read_msr_safe(unsigned int msr, int *err)
return xen_do_read_msr(msr, err);
}
-static int xen_write_msr_safe(unsigned int msr, unsigned int low,
- unsigned int high)
+static int xen_write_msr_safe(u32 msr, u64 val)
{
int err = 0;
- xen_do_write_msr(msr, low, high, &err);
+ xen_do_write_msr(msr, val, &err);
return err;
}
@@ -1190,11 +1183,11 @@ static u64 xen_read_msr(unsigned int msr)
return xen_do_read_msr(msr, xen_msr_safe ? &err : NULL);
}
-static void xen_write_msr(unsigned int msr, unsigned low, unsigned high)
+static void xen_write_msr(u32 msr, u64 val)
{
int err;
- xen_do_write_msr(msr, low, high, xen_msr_safe ? &err : NULL);
+ xen_do_write_msr(msr, val, xen_msr_safe ? &err : NULL);
}
/* This is called once we have the cpu_possible_mask */
--
2.49.0
* Re: [RFC PATCH v2 14/34] x86/msr: refactor pv_cpu_ops.write_msr{_safe}()
2025-04-22 8:21 ` [RFC PATCH v2 14/34] x86/msr: refactor pv_cpu_ops.write_msr{_safe}() Xin Li (Intel)
@ 2025-04-24 10:16 ` Jürgen Groß
0 siblings, 0 replies; 94+ messages in thread
From: Jürgen Groß @ 2025-04-24 10:16 UTC (permalink / raw)
To: Xin Li (Intel), linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 22.04.25 10:21, Xin Li (Intel) wrote:
> An MSR value is represented as a 64-bit unsigned integer, with existing
> MSR instructions storing it in EDX:EAX as two 32-bit segments.
>
> The new immediate form MSR instructions, however, utilize a 64-bit
> general-purpose register to store the MSR value. To unify the usage of
> all MSR instructions, let the default MSR access APIs accept an MSR
> value as a single 64-bit argument instead of two 32-bit segments.
>
> The dual 32-bit APIs are still available as convenient wrappers over the
> APIs that handle an MSR value as a single 64-bit argument.
>
> The following illustrates the updated derivation of the MSR write APIs:
>
> __wrmsrq(u32 msr, u64 val)
> / \
> / \
> native_wrmsrq(msr, val) native_wrmsr(msr, low, high)
> |
> |
> native_write_msr(msr, val)
> / \
> / \
> wrmsrq(msr, val) wrmsr(msr, low, high)
>
> When CONFIG_PARAVIRT is enabled, wrmsrq() and wrmsr() are defined on top
> of paravirt_write_msr():
>
> paravirt_write_msr(u32 msr, u64 val)
> / \
> / \
> wrmsrq(msr, val) wrmsr(msr, low, high)
>
> paravirt_write_msr() invokes cpu.write_msr(msr, val), an indirect layer
> of pv_ops MSR write call:
>
> If on native:
>
> cpu.write_msr = native_write_msr
>
> If on Xen:
>
> cpu.write_msr = xen_write_msr
>
> Therefore, refactor pv_cpu_ops.write_msr{_safe}() to accept an MSR value
> in a single u64 argument, replacing the current dual u32 arguments.
>
> No functional change intended.
>
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Juergen
* [RFC PATCH v2 15/34] x86/msr: Replace wrmsr(msr, low, 0) with wrmsrq(msr, low)
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (13 preceding siblings ...)
2025-04-22 8:21 ` [RFC PATCH v2 14/34] x86/msr: refactor pv_cpu_ops.write_msr{_safe}() Xin Li (Intel)
@ 2025-04-22 8:21 ` Xin Li (Intel)
2025-04-22 8:21 ` [RFC PATCH v2 16/34] x86/msr: Change function type of native_read_msr_safe() Xin Li (Intel)
` (19 subsequent siblings)
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:21 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
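With the high half zero, the dual u32 form and the u64 form write the
identical value, since wrmsr() now just recombines the halves; a minimal
sketch of the identity being applied:

  wrmsr(msr, low, 0);      /* writes (u64)0 << 32 | low */
  wrmsrq(msr, low);        /* writes the same 64-bit value */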
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/hyperv/hv_apic.c | 6 +++---
arch/x86/include/asm/apic.h | 2 +-
arch/x86/include/asm/switch_to.h | 2 +-
arch/x86/kernel/cpu/amd.c | 2 +-
arch/x86/kernel/cpu/common.c | 8 ++++----
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 4 ++--
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 2 +-
arch/x86/kernel/cpu/umwait.c | 4 ++--
arch/x86/kernel/kvm.c | 2 +-
9 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index c450e67cb0a4..4d617ee59377 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -75,10 +75,10 @@ static void hv_apic_write(u32 reg, u32 val)
{
switch (reg) {
case APIC_EOI:
- wrmsr(HV_X64_MSR_EOI, val, 0);
+ wrmsrq(HV_X64_MSR_EOI, val);
break;
case APIC_TASKPRI:
- wrmsr(HV_X64_MSR_TPR, val, 0);
+ wrmsrq(HV_X64_MSR_TPR, val);
break;
default:
native_apic_mem_write(reg, val);
@@ -92,7 +92,7 @@ static void hv_apic_eoi_write(void)
if (hvp && (xchg(&hvp->apic_assist, 0) & 0x1))
return;
- wrmsr(HV_X64_MSR_EOI, APIC_EOI_ACK, 0);
+ wrmsrq(HV_X64_MSR_EOI, APIC_EOI_ACK);
}
static bool cpu_is_self(int cpu)
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 0174dd548327..68e10e30fe9b 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -209,7 +209,7 @@ static inline void native_apic_msr_write(u32 reg, u32 v)
reg == APIC_LVR)
return;
- wrmsr(APIC_BASE_MSR + (reg >> 4), v, 0);
+ wrmsrq(APIC_BASE_MSR + (reg >> 4), v);
}
static inline void native_apic_msr_eoi(void)
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 4f21df7af715..499b1c15cc8b 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -61,7 +61,7 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread)
return;
this_cpu_write(cpu_tss_rw.x86_tss.ss1, thread->sysenter_cs);
- wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0);
+ wrmsrq(MSR_IA32_SYSENTER_CS, thread->sysenter_cs);
}
#endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 1f7925e45b46..6132a3c529cc 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -1206,7 +1206,7 @@ void amd_set_dr_addr_mask(unsigned long mask, unsigned int dr)
if (per_cpu(amd_dr_addr_mask, cpu)[dr] == mask)
return;
- wrmsr(amd_msr_dr_addr_masks[dr], mask, 0);
+ wrmsrq(amd_msr_dr_addr_masks[dr], mask);
per_cpu(amd_dr_addr_mask, cpu)[dr] = mask;
}
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 10da3da5b81f..99d8a8c15ba5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2019,9 +2019,9 @@ void enable_sep_cpu(void)
*/
tss->x86_tss.ss1 = __KERNEL_CS;
- wrmsr(MSR_IA32_SYSENTER_CS, tss->x86_tss.ss1, 0);
- wrmsr(MSR_IA32_SYSENTER_ESP, (unsigned long)(cpu_entry_stack(cpu) + 1), 0);
- wrmsr(MSR_IA32_SYSENTER_EIP, (unsigned long)entry_SYSENTER_32, 0);
+ wrmsrq(MSR_IA32_SYSENTER_CS, tss->x86_tss.ss1);
+ wrmsrq(MSR_IA32_SYSENTER_ESP, (unsigned long)(cpu_entry_stack(cpu) + 1));
+ wrmsrq(MSR_IA32_SYSENTER_EIP, (unsigned long)entry_SYSENTER_32);
put_cpu();
}
@@ -2235,7 +2235,7 @@ static inline void setup_getcpu(int cpu)
struct desc_struct d = { };
if (boot_cpu_has(X86_FEATURE_RDTSCP) || boot_cpu_has(X86_FEATURE_RDPID))
- wrmsr(MSR_TSC_AUX, cpudata, 0);
+ wrmsrq(MSR_TSC_AUX, cpudata);
/* Store CPU and node number in limit. */
d.limit0 = cpudata;
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 185317c6b509..cc534a83f19d 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -905,7 +905,7 @@ int resctrl_arch_measure_cycles_lat_fn(void *_plr)
* Disable hardware prefetchers.
*/
rdmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
- wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
+ wrmsrq(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits);
mem_r = READ_ONCE(plr->kmem);
/*
* Dummy execute of the time measurement to load the needed
@@ -1001,7 +1001,7 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
* Disable hardware prefetchers.
*/
rdmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
- wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
+ wrmsrq(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits);
/* Initialize rest of local variables */
/*
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f4a2ee2a6404..73ed83f1dff8 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1707,7 +1707,7 @@ void resctrl_arch_mon_event_config_write(void *_config_info)
pr_warn_once("Invalid event id %d\n", config_info->evtid);
return;
}
- wrmsr(MSR_IA32_EVT_CFG_BASE + index, config_info->mon_config, 0);
+ wrmsrq(MSR_IA32_EVT_CFG_BASE + index, config_info->mon_config);
}
static void mbm_config_write_domain(struct rdt_resource *r,
diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index 0050eae153bb..933fcd7ff250 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -33,7 +33,7 @@ static DEFINE_MUTEX(umwait_lock);
static void umwait_update_control_msr(void * unused)
{
lockdep_assert_irqs_disabled();
- wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
+ wrmsrq(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached));
}
/*
@@ -71,7 +71,7 @@ static int umwait_cpu_offline(unsigned int cpu)
* the original control MSR value in umwait_init(). So there
* is no race condition here.
*/
- wrmsr(MSR_IA32_UMWAIT_CONTROL, orig_umwait_control_cached, 0);
+ wrmsrq(MSR_IA32_UMWAIT_CONTROL, orig_umwait_control_cached);
return 0;
}
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 44a45df7200a..bc9d21d7395f 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -399,7 +399,7 @@ static void kvm_disable_steal_time(void)
if (!has_steal_clock)
return;
- wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
+ wrmsrq(MSR_KVM_STEAL_TIME, 0);
}
static u64 kvm_steal_clock(int cpu)
--
2.49.0
* [RFC PATCH v2 16/34] x86/msr: Change function type of native_read_msr_safe()
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (14 preceding siblings ...)
2025-04-22 8:21 ` [RFC PATCH v2 15/34] x86/msr: Replace wrmsr(msr, low, 0) with wrmsrq(msr, low) Xin Li (Intel)
@ 2025-04-22 8:21 ` Xin Li (Intel)
2025-04-22 8:21 ` [RFC PATCH v2 17/34] x86/cpufeatures: Add a CPU feature bit for MSR immediate form instructions Xin Li (Intel)
` (18 subsequent siblings)
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:21 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
Change function type of native_read_msr_safe() to
int native_read_msr_safe(u32 msr, u64 *val)
to make it the same as the type of native_write_msr_safe().
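Call sites then follow the same shape as the write side; a minimal
sketch of the pattern (hypothetical caller):

  u64 val;
  int err;

  /* Before: value returned, error reported through a pointer. */
  val = native_read_msr_safe(msr, &err);
  if (err)
          return;

  /* After: error returned, value reported through a pointer. */
  if (native_read_msr_safe(msr, &val))
          return;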
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/include/asm/msr.h | 21 +++++++++++----------
arch/x86/include/asm/paravirt_types.h | 4 ++--
arch/x86/kvm/svm/svm.c | 19 +++++++------------
arch/x86/xen/enlighten_pv.c | 9 ++++++---
arch/x86/xen/pmu.c | 14 ++++++++------
5 files changed, 34 insertions(+), 33 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index dd1114053173..c955339be9c9 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -135,18 +135,22 @@ static inline u64 native_read_msr(u32 msr)
return val;
}
-static inline u64 native_read_msr_safe(u32 msr, int *err)
+static inline int native_read_msr_safe(u32 msr, u64 *p)
{
+ int err;
DECLARE_ARGS(val, low, high);
asm volatile("1: rdmsr ; xor %[err],%[err]\n"
"2:\n\t"
_ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %[err])
- : [err] "=r" (*err), EAX_EDX_RET(val, low, high)
+ : [err] "=r" (err), EAX_EDX_RET(val, low, high)
: "c" (msr));
if (tracepoint_enabled(read_msr))
- do_trace_read_msr(msr, EAX_EDX_VAL(val, low, high), *err);
- return EAX_EDX_VAL(val, low, high);
+ do_trace_read_msr(msr, EAX_EDX_VAL(val, low, high), err);
+
+ *p = EAX_EDX_VAL(val, low, high);
+
+ return err;
}
/* Can be uninlined because referenced by paravirt */
@@ -242,8 +246,8 @@ static inline int wrmsrq_safe(u32 msr, u64 val)
/* rdmsr with exception handling */
#define rdmsr_safe(msr, low, high) \
({ \
- int __err; \
- u64 __val = native_read_msr_safe((msr), &__err); \
+ u64 __val; \
+ int __err = native_read_msr_safe((msr), &__val); \
(*low) = (u32)__val; \
(*high) = (u32)(__val >> 32); \
__err; \
@@ -251,10 +255,7 @@ static inline int wrmsrq_safe(u32 msr, u64 val)
static inline int rdmsrq_safe(u32 msr, u64 *p)
{
- int err;
-
- *p = native_read_msr_safe(msr, &err);
- return err;
+ return native_read_msr_safe(msr, p);
}
#endif /* !CONFIG_PARAVIRT_XXL */
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 91b3423d36ce..d2db38c32bc5 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -96,9 +96,9 @@ struct pv_cpu_ops {
/*
* Safe MSR operations.
- * read sets err to 0 or -EIO. write returns 0 or -EIO.
+ * Returns 0 or -EIO.
*/
- u64 (*read_msr_safe)(unsigned int msr, int *err);
+ int (*read_msr_safe)(unsigned int msr, u64 *val);
int (*write_msr_safe)(u32 msr, u64 val);
void (*start_context_switch)(struct task_struct *prev);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4ef9978dce70..838606f784c9 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -475,15 +475,13 @@ static void svm_inject_exception(struct kvm_vcpu *vcpu)
static void svm_init_erratum_383(void)
{
- int err;
u64 val;
if (!static_cpu_has_bug(X86_BUG_AMD_TLB_MMATCH))
return;
/* Use _safe variants to not break nested virtualization */
- val = native_read_msr_safe(MSR_AMD64_DC_CFG, &err);
- if (err)
+ if (native_read_msr_safe(MSR_AMD64_DC_CFG, &val))
return;
val |= (1ULL << 47);
@@ -648,13 +646,12 @@ static int svm_enable_virtualization_cpu(void)
* erratum is present everywhere).
*/
if (cpu_has(&boot_cpu_data, X86_FEATURE_OSVW)) {
- uint64_t len, status = 0;
+ u64 len, status = 0;
int err;
- len = native_read_msr_safe(MSR_AMD64_OSVW_ID_LENGTH, &err);
+ err = native_read_msr_safe(MSR_AMD64_OSVW_ID_LENGTH, &len);
if (!err)
- status = native_read_msr_safe(MSR_AMD64_OSVW_STATUS,
- &err);
+ err = native_read_msr_safe(MSR_AMD64_OSVW_STATUS, &status);
if (err)
osvw_status = osvw_len = 0;
@@ -2145,14 +2142,13 @@ static int ac_interception(struct kvm_vcpu *vcpu)
static bool is_erratum_383(void)
{
- int err, i;
+ int i;
u64 value;
if (!erratum_383_found)
return false;
- value = native_read_msr_safe(MSR_IA32_MC0_STATUS, &err);
- if (err)
+ if (native_read_msr_safe(MSR_IA32_MC0_STATUS, &value))
return false;
/* Bit 62 may or may not be set for this mce */
@@ -2165,8 +2161,7 @@ static bool is_erratum_383(void)
for (i = 0; i < 6; ++i)
native_write_msr_safe(MSR_IA32_MCx_STATUS(i), 0);
- value = native_read_msr_safe(MSR_IA32_MCG_STATUS, &err);
- if (!err) {
+ if (!native_read_msr_safe(MSR_IA32_MCG_STATUS, &value)) {
value &= ~(1ULL << 2);
native_write_msr_safe(MSR_IA32_MCG_STATUS, value);
}
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 052f68c92111..195e6501a000 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1095,7 +1095,7 @@ static u64 xen_do_read_msr(unsigned int msr, int *err)
return val;
if (err)
- val = native_read_msr_safe(msr, err);
+ *err = native_read_msr_safe(msr, &val);
else
val = native_read_msr(msr);
@@ -1162,9 +1162,12 @@ static void xen_do_write_msr(u32 msr, u64 val, int *err)
}
}
-static u64 xen_read_msr_safe(unsigned int msr, int *err)
+static int xen_read_msr_safe(unsigned int msr, u64 *val)
{
- return xen_do_read_msr(msr, err);
+ int err;
+
+ *val = xen_do_read_msr(msr, &err);
+ return err;
}
static int xen_write_msr_safe(u32 msr, u64 val)
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index afb02f43ee3f..ee908dfcff48 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -319,11 +319,12 @@ static u64 xen_amd_read_pmc(int counter)
uint8_t xenpmu_flags = get_xenpmu_flags();
if (!xenpmu_data || !(xenpmu_flags & XENPMU_IRQ_PROCESSING)) {
- uint32_t msr;
- int err;
+ u32 msr;
+ u64 val;
msr = amd_counters_base + (counter * amd_msr_step);
- return native_read_msr_safe(msr, &err);
+ native_read_msr_safe(msr, &val);
+ return val;
}
ctxt = &xenpmu_data->pmu.c.amd;
@@ -340,15 +341,16 @@ static u64 xen_intel_read_pmc(int counter)
uint8_t xenpmu_flags = get_xenpmu_flags();
if (!xenpmu_data || !(xenpmu_flags & XENPMU_IRQ_PROCESSING)) {
- uint32_t msr;
- int err;
+ u32 msr;
+ u64 val;
if (counter & (1 << INTEL_PMC_TYPE_SHIFT))
msr = MSR_CORE_PERF_FIXED_CTR0 + (counter & 0xffff);
else
msr = MSR_IA32_PERFCTR0 + counter;
- return native_read_msr_safe(msr, &err);
+ native_read_msr_safe(msr, &val);
+ return val;
}
ctxt = &xenpmu_data->pmu.c.intel;
--
2.49.0
* [RFC PATCH v2 17/34] x86/cpufeatures: Add a CPU feature bit for MSR immediate form instructions
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (15 preceding siblings ...)
2025-04-22 8:21 ` [RFC PATCH v2 16/34] x86/msr: Change function type of native_read_msr_safe() Xin Li (Intel)
@ 2025-04-22 8:21 ` Xin Li (Intel)
2025-04-22 8:21 ` [RFC PATCH v2 18/34] x86/opcode: Add immediate form MSR instructions Xin Li (Intel)
` (17 subsequent siblings)
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:21 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
The immediate form of the MSR access instructions is primarily motivated
by performance, not code size: with the MSR number in an immediate, it
is available *much* earlier in the pipeline, which allows the hardware
much more leeway in how a particular MSR is handled.
Use a scattered CPU feature bit for MSR immediate form instructions.
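Kernel code can then gate on the new bit in the usual way; a minimal
sketch, assuming the standard cpufeature helper:

  if (cpu_feature_enabled(X86_FEATURE_MSR_IMM)) {
          /* Immediate-form WRMSRNS/RDMSR are available. */
  }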
Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 7642310276a8..1dc6dc794018 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -482,6 +482,7 @@
#define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32+ 7) /* Workload Classification */
#define X86_FEATURE_PREFER_YMM (21*32+ 8) /* Avoid ZMM registers due to downclocking */
#define X86_FEATURE_APX (21*32+ 9) /* Advanced Performance Extensions */
+#define X86_FEATURE_MSR_IMM (21*32+10) /* MSR immediate form instructions */
/*
* BUG word(s)
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index dbf6d71bdf18..c63ddbf35a71 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -27,6 +27,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_APERFMPERF, CPUID_ECX, 0, 0x00000006, 0 },
{ X86_FEATURE_EPB, CPUID_ECX, 3, 0x00000006, 0 },
{ X86_FEATURE_INTEL_PPIN, CPUID_EBX, 0, 0x00000007, 1 },
+ { X86_FEATURE_MSR_IMM, CPUID_ECX, 5, 0x00000007, 1 },
{ X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
{ X86_FEATURE_RRSBA_CTRL, CPUID_EDX, 2, 0x00000007, 2 },
{ X86_FEATURE_BHI_CTRL, CPUID_EDX, 4, 0x00000007, 2 },
--
2.49.0
* [RFC PATCH v2 18/34] x86/opcode: Add immediate form MSR instructions
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (16 preceding siblings ...)
2025-04-22 8:21 ` [RFC PATCH v2 17/34] x86/cpufeatures: Add a CPU feature bit for MSR immediate form instructions Xin Li (Intel)
@ 2025-04-22 8:21 ` Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 19/34] x86/extable: Add support for " Xin Li (Intel)
` (16 subsequent siblings)
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:21 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
Add the instruction opcodes used by the immediate form WRMSRNS/RDMSR
to x86-opcode-map.
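For reference, a sketch of the byte patterns these entries describe. The
WRMSRNS line matches the .byte fallback used later in this series; the
RDMSR line is inferred from the F2 prefix in the table and should be
treated as an assumption:
	c4 e7 7a f6 c0 <imm32>	# WRMSRNS imm: VEX.128.F3.M7.W0 0xf6 /0, value in %rax
	c4 e7 7b f6 c0 <imm32>	# RDMSR imm:   VEX.128.F2.M7.W0 0xf6 /0, result in %rax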
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/lib/x86-opcode-map.txt | 5 +++--
tools/arch/x86/lib/x86-opcode-map.txt | 5 +++--
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index caedb3ef6688..e64f52321d6d 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -839,7 +839,7 @@ f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
f2: ANDN Gy,By,Ey (v)
f3: Grp17 (1A)
f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSSD/Q My,Gy (66)
-f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,Gy
+f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,Gy | RDMSR Rq,Gq (F2),(11B) | WRMSRNS Gq,Rq (F3),(11B)
f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3) | URDMSR Rq,Gq (F2),(11B) | UWRMSR Gq,Rq (F3),(11B)
f9: MOVDIRI My,Gy
@@ -1014,7 +1014,7 @@ f1: CRC32 Gy,Ey (es) | CRC32 Gy,Ey (66),(es) | INVVPID Gy,Mdq (F3),(ev)
f2: INVPCID Gy,Mdq (F3),(ev)
f4: TZCNT Gv,Ev (es) | TZCNT Gv,Ev (66),(es)
f5: LZCNT Gv,Ev (es) | LZCNT Gv,Ev (66),(es)
-f6: Grp3_1 Eb (1A),(ev)
+f6: Grp3_1 Eb (1A),(ev) | RDMSR Rq,Gq (F2),(11B),(ev) | WRMSRNS Gq,Rq (F3),(11B),(ev)
f7: Grp3_2 Ev (1A),(es)
f8: MOVDIR64B Gv,Mdqq (66),(ev) | ENQCMD Gv,Mdqq (F2),(ev) | ENQCMDS Gv,Mdqq (F3),(ev) | URDMSR Rq,Gq (F2),(11B),(ev) | UWRMSR Gq,Rq (F3),(11B),(ev)
f9: MOVDIRI My,Gy (ev)
@@ -1103,6 +1103,7 @@ EndTable
Table: VEX map 7
Referrer:
AVXcode: 7
+f6: RDMSR Rq,Id (F2),(v1),(11B) | WRMSRNS Id,Rq (F3),(v1),(11B)
f8: URDMSR Rq,Id (F2),(v1),(11B) | UWRMSR Id,Rq (F3),(v1),(11B)
EndTable
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index caedb3ef6688..e64f52321d6d 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -839,7 +839,7 @@ f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
f2: ANDN Gy,By,Ey (v)
f3: Grp17 (1A)
f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSSD/Q My,Gy (66)
-f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,Gy
+f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,Gy | RDMSR Rq,Gq (F2),(11B) | WRMSRNS Gq,Rq (F3),(11B)
f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3) | URDMSR Rq,Gq (F2),(11B) | UWRMSR Gq,Rq (F3),(11B)
f9: MOVDIRI My,Gy
@@ -1014,7 +1014,7 @@ f1: CRC32 Gy,Ey (es) | CRC32 Gy,Ey (66),(es) | INVVPID Gy,Mdq (F3),(ev)
f2: INVPCID Gy,Mdq (F3),(ev)
f4: TZCNT Gv,Ev (es) | TZCNT Gv,Ev (66),(es)
f5: LZCNT Gv,Ev (es) | LZCNT Gv,Ev (66),(es)
-f6: Grp3_1 Eb (1A),(ev)
+f6: Grp3_1 Eb (1A),(ev) | RDMSR Rq,Gq (F2),(11B),(ev) | WRMSRNS Gq,Rq (F3),(11B),(ev)
f7: Grp3_2 Ev (1A),(es)
f8: MOVDIR64B Gv,Mdqq (66),(ev) | ENQCMD Gv,Mdqq (F2),(ev) | ENQCMDS Gv,Mdqq (F3),(ev) | URDMSR Rq,Gq (F2),(11B),(ev) | UWRMSR Gq,Rq (F3),(11B),(ev)
f9: MOVDIRI My,Gy (ev)
@@ -1103,6 +1103,7 @@ EndTable
Table: VEX map 7
Referrer:
AVXcode: 7
+f6: RDMSR Rq,Id (F2),(v1),(11B) | WRMSRNS Id,Rq (F3),(v1),(11B)
f8: URDMSR Rq,Id (F2),(v1),(11B) | UWRMSR Id,Rq (F3),(v1),(11B)
EndTable
--
2.49.0
* [RFC PATCH v2 19/34] x86/extable: Add support for immediate form MSR instructions
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (17 preceding siblings ...)
2025-04-22 8:21 ` [RFC PATCH v2 18/34] x86/opcode: Add immediate form MSR instructions Xin Li (Intel)
@ 2025-04-22 8:22 ` Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 20/34] x86/extable: Implement EX_TYPE_FUNC_REWIND Xin Li (Intel)
` (15 subsequent siblings)
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:22 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/include/asm/msr.h | 18 ++++++++++++++++++
arch/x86/mm/extable.c | 39 +++++++++++++++++++++++++++++++++-----
2 files changed, 52 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index c955339be9c9..8f7a67b1c61c 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -78,6 +78,24 @@ static inline void do_trace_rdpmc(u32 msr, u64 val, int failed) {}
extern u64 xen_read_pmc(int counter);
#endif
+/*
+ * Called only from an MSR fault handler, the instruction pointer points to
+ * the MSR access instruction that caused the fault.
+ */
+static __always_inline bool is_msr_imm_insn(void *ip)
+{
+ /*
+ * A full decoder for immediate form MSR instructions appears excessive.
+ */
+#ifdef CONFIG_X86_64
+ const u8 msr_imm_insn_prefix[] = { 0xc4, 0xe7 };
+
+ return !memcmp(ip, msr_imm_insn_prefix, sizeof(msr_imm_insn_prefix));
+#else
+ return false;
+#endif
+}
+
/*
* __rdmsr() and __wrmsr() are the two primitives which are the bare minimum MSR
* accessors and should not have any tracing or other functionality piggybacking
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index bf8dab18be97..f1743babafc8 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -167,23 +167,52 @@ static bool ex_handler_uaccess(const struct exception_table_entry *fixup,
static bool ex_handler_msr(const struct exception_table_entry *fixup,
struct pt_regs *regs, bool wrmsr, bool safe, int reg)
{
+ bool imm_insn = is_msr_imm_insn((void *)regs->ip);
+ u32 msr;
+
+ if (imm_insn)
+ /*
+ * The 32-bit immediate specifying an MSR is encoded in
+ * bytes 5-8 of an immediate form MSR instruction.
+ */
+ msr = *(u32 *)(regs->ip + 5);
+ else
+ msr = (u32)regs->cx;
+
if (__ONCE_LITE_IF(!safe && wrmsr)) {
- pr_warn("unchecked MSR access error: WRMSR to 0x%x (tried to write 0x%08x%08x) at rIP: 0x%lx (%pS)\n",
- (unsigned int)regs->cx, (unsigned int)regs->dx,
- (unsigned int)regs->ax, regs->ip, (void *)regs->ip);
+ /*
+ * To maintain consistency with existing RDMSR and WRMSR(NS) instructions,
+ * the register operand for immediate form MSR instructions is ALWAYS
+ * encoded as RAX in <asm/msr.h> for reading or writing the MSR value.
+ */
+ u64 msr_val = regs->ax;
+
+ if (!imm_insn) {
+ /*
+ * On processors that support the Intel 64 architecture, the
+ * high-order 32 bits of each of RAX and RDX are ignored.
+ */
+ msr_val &= 0xffffffff;
+ msr_val |= (u64)regs->dx << 32;
+ }
+
+ pr_warn("unchecked MSR access error: WRMSR to 0x%x (tried to write 0x%016llx) at rIP: 0x%lx (%pS)\n",
+ msr, msr_val, regs->ip, (void *)regs->ip);
show_stack_regs(regs);
}
if (__ONCE_LITE_IF(!safe && !wrmsr)) {
pr_warn("unchecked MSR access error: RDMSR from 0x%x at rIP: 0x%lx (%pS)\n",
- (unsigned int)regs->cx, regs->ip, (void *)regs->ip);
+ msr, regs->ip, (void *)regs->ip);
show_stack_regs(regs);
}
if (!wrmsr) {
/* Pretend that the read succeeded and returned 0. */
regs->ax = 0;
- regs->dx = 0;
+
+ if (!imm_insn)
+ regs->dx = 0;
}
if (safe)
--
2.49.0
* [RFC PATCH v2 20/34] x86/extable: Implement EX_TYPE_FUNC_REWIND
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (18 preceding siblings ...)
2025-04-22 8:22 ` [RFC PATCH v2 19/34] x86/extable: Add support for " Xin Li (Intel)
@ 2025-04-22 8:22 ` Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR Xin Li (Intel)
` (14 subsequent siblings)
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:22 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
From: "H. Peter Anvin (Intel)" <hpa@zytor.com>
Add a new exception type, which allows emulating an exception as if it
had happened at or near the call site of a function. This allows a
function call inside an alternative for instruction emulation to "kick
back" the exception into the alternatives pattern, possibly invoking a
different exception handling pattern there, or at least indicating the
"real" location of the fault.
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/include/asm/asm.h | 6 +
arch/x86/include/asm/extable_fixup_types.h | 1 +
arch/x86/mm/extable.c | 135 +++++++++++++--------
3 files changed, 91 insertions(+), 51 deletions(-)
diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index a9f07799e337..722340d7c1af 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -243,5 +243,11 @@ register unsigned long current_stack_pointer asm(_ASM_SP);
#define _ASM_EXTABLE_FAULT(from, to) \
_ASM_EXTABLE_TYPE(from, to, EX_TYPE_FAULT)
+#define _ASM_EXTABLE_FUNC_REWIND(from, ipdelta, spdelta) \
+ _ASM_EXTABLE_TYPE(from, from /* unused */, \
+ EX_TYPE_FUNC_REWIND | \
+ EX_DATA_REG(spdelta) | \
+ EX_DATA_IMM(ipdelta))
+
#endif /* __KERNEL__ */
#endif /* _ASM_X86_ASM_H */
diff --git a/arch/x86/include/asm/extable_fixup_types.h b/arch/x86/include/asm/extable_fixup_types.h
index 906b0d5541e8..9cd1cea45052 100644
--- a/arch/x86/include/asm/extable_fixup_types.h
+++ b/arch/x86/include/asm/extable_fixup_types.h
@@ -67,5 +67,6 @@
#define EX_TYPE_ZEROPAD 20 /* longword load with zeropad on fault */
#define EX_TYPE_ERETU 21
+#define EX_TYPE_FUNC_REWIND 22
#endif
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index f1743babafc8..6bf4c2a43c2c 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -319,6 +319,27 @@ static bool ex_handler_eretu(const struct exception_table_entry *fixup,
}
#endif
+/*
+ * Emulate a fault taken at the call site of a function.
+ *
+ * The combined reg and flags fields are used as an unsigned number of
+ * machine words to pop off the stack before the return address, then
+ * the signed imm field is used as a delta from the return IP address.
+ */
+static bool ex_handler_func_rewind(struct pt_regs *regs, int data)
+{
+ const long ipdelta = FIELD_GET(EX_DATA_IMM_MASK, data);
+ const unsigned long pops = FIELD_GET(EX_DATA_REG_MASK | EX_DATA_FLAG_MASK, data);
+ unsigned long *sp;
+
+ sp = (unsigned long *)regs->sp;
+ sp += pops;
+ regs->ip = *sp++ + ipdelta;
+ regs->sp = (unsigned long)sp;
+
+ return true;
+}
+
int ex_get_fixup_type(unsigned long ip)
{
const struct exception_table_entry *e = search_exception_tables(ip);
@@ -331,6 +352,7 @@ int fixup_exception(struct pt_regs *regs, int trapnr, unsigned long error_code,
{
const struct exception_table_entry *e;
int type, reg, imm;
+ bool again;
#ifdef CONFIG_PNPBIOS
if (unlikely(SEGMENT_IS_PNP_CODE(regs->cs))) {
@@ -346,60 +368,71 @@ int fixup_exception(struct pt_regs *regs, int trapnr, unsigned long error_code,
}
#endif
- e = search_exception_tables(regs->ip);
- if (!e)
- return 0;
-
- type = FIELD_GET(EX_DATA_TYPE_MASK, e->data);
- reg = FIELD_GET(EX_DATA_REG_MASK, e->data);
- imm = FIELD_GET(EX_DATA_IMM_MASK, e->data);
-
- switch (type) {
- case EX_TYPE_DEFAULT:
- case EX_TYPE_DEFAULT_MCE_SAFE:
- return ex_handler_default(e, regs);
- case EX_TYPE_FAULT:
- case EX_TYPE_FAULT_MCE_SAFE:
- return ex_handler_fault(e, regs, trapnr);
- case EX_TYPE_UACCESS:
- return ex_handler_uaccess(e, regs, trapnr, fault_addr);
- case EX_TYPE_CLEAR_FS:
- return ex_handler_clear_fs(e, regs);
- case EX_TYPE_FPU_RESTORE:
- return ex_handler_fprestore(e, regs);
- case EX_TYPE_BPF:
- return ex_handler_bpf(e, regs);
- case EX_TYPE_WRMSR:
- return ex_handler_msr(e, regs, true, false, reg);
- case EX_TYPE_RDMSR:
- return ex_handler_msr(e, regs, false, false, reg);
- case EX_TYPE_WRMSR_SAFE:
- return ex_handler_msr(e, regs, true, true, reg);
- case EX_TYPE_RDMSR_SAFE:
- return ex_handler_msr(e, regs, false, true, reg);
- case EX_TYPE_WRMSR_IN_MCE:
- ex_handler_msr_mce(regs, true);
- break;
- case EX_TYPE_RDMSR_IN_MCE:
- ex_handler_msr_mce(regs, false);
- break;
- case EX_TYPE_POP_REG:
- regs->sp += sizeof(long);
- fallthrough;
- case EX_TYPE_IMM_REG:
- return ex_handler_imm_reg(e, regs, reg, imm);
- case EX_TYPE_FAULT_SGX:
- return ex_handler_sgx(e, regs, trapnr);
- case EX_TYPE_UCOPY_LEN:
- return ex_handler_ucopy_len(e, regs, trapnr, fault_addr, reg, imm);
- case EX_TYPE_ZEROPAD:
- return ex_handler_zeropad(e, regs, fault_addr);
+ do {
+ e = search_exception_tables(regs->ip);
+ if (!e)
+ return 0;
+
+ again = false;
+
+ type = FIELD_GET(EX_DATA_TYPE_MASK, e->data);
+ reg = FIELD_GET(EX_DATA_REG_MASK, e->data);
+ imm = FIELD_GET(EX_DATA_IMM_MASK, e->data);
+
+ switch (type) {
+ case EX_TYPE_DEFAULT:
+ case EX_TYPE_DEFAULT_MCE_SAFE:
+ return ex_handler_default(e, regs);
+ case EX_TYPE_FAULT:
+ case EX_TYPE_FAULT_MCE_SAFE:
+ return ex_handler_fault(e, regs, trapnr);
+ case EX_TYPE_UACCESS:
+ return ex_handler_uaccess(e, regs, trapnr, fault_addr);
+ case EX_TYPE_CLEAR_FS:
+ return ex_handler_clear_fs(e, regs);
+ case EX_TYPE_FPU_RESTORE:
+ return ex_handler_fprestore(e, regs);
+ case EX_TYPE_BPF:
+ return ex_handler_bpf(e, regs);
+ case EX_TYPE_WRMSR:
+ return ex_handler_msr(e, regs, true, false, reg);
+ case EX_TYPE_RDMSR:
+ return ex_handler_msr(e, regs, false, false, reg);
+ case EX_TYPE_WRMSR_SAFE:
+ return ex_handler_msr(e, regs, true, true, reg);
+ case EX_TYPE_RDMSR_SAFE:
+ return ex_handler_msr(e, regs, false, true, reg);
+ case EX_TYPE_WRMSR_IN_MCE:
+ ex_handler_msr_mce(regs, true);
+ break;
+ case EX_TYPE_RDMSR_IN_MCE:
+ ex_handler_msr_mce(regs, false);
+ break;
+ case EX_TYPE_POP_REG:
+ regs->sp += sizeof(long);
+ fallthrough;
+ case EX_TYPE_IMM_REG:
+ return ex_handler_imm_reg(e, regs, reg, imm);
+ case EX_TYPE_FAULT_SGX:
+ return ex_handler_sgx(e, regs, trapnr);
+ case EX_TYPE_UCOPY_LEN:
+ return ex_handler_ucopy_len(e, regs, trapnr, fault_addr, reg, imm);
+ case EX_TYPE_ZEROPAD:
+ return ex_handler_zeropad(e, regs, fault_addr);
#ifdef CONFIG_X86_FRED
- case EX_TYPE_ERETU:
- return ex_handler_eretu(e, regs, error_code);
+ case EX_TYPE_ERETU:
+ return ex_handler_eretu(e, regs, error_code);
#endif
- }
+ case EX_TYPE_FUNC_REWIND:
+ again = ex_handler_func_rewind(regs, e->data);
+ break;
+ default:
+ break; /* Will BUG() */
+ }
+ } while (again);
+
BUG();
+ return 0;
}
extern unsigned int early_recursion_flag;
--
2.49.0
* [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (19 preceding siblings ...)
2025-04-22 8:22 ` [RFC PATCH v2 20/34] x86/extable: Implement EX_TYPE_FUNC_REWIND Xin Li (Intel)
@ 2025-04-22 8:22 ` Xin Li (Intel)
2025-04-22 9:57 ` Jürgen Groß
2025-04-22 8:22 ` [RFC PATCH v2 22/34] x86/msr: Utilize the alternatives mechanism to read MSR Xin Li (Intel)
` (13 subsequent siblings)
34 siblings, 1 reply; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:22 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
The story started from tglx's reply in [1]:
For actual performance relevant code the current PV ops mechanics
are a horrorshow when the op defaults to the native instruction.
look at wrmsrl():
wrmsrl(msr, val
wrmsr(msr, (u32)val, (u32)val >> 32))
paravirt_write_msr(msr, low, high)
PVOP_VCALL3(cpu.write_msr, msr, low, high)
Which results in
mov $msr, %edi
mov $val, %rdx
mov %edx, %esi
shr $0x20, %rdx
call native_write_msr
and native_write_msr() does at minimum:
mov %edi,%ecx
mov %esi,%eax
wrmsr
ret
In the worst case 'ret' is going through the return thunk. Not to
talk about function prologues and whatever.
This becomes even more silly for trivial instructions like STI/CLI
or in the worst case paravirt_nop().
The call makes only sense, when the native default is an actual
function, but for the trivial cases it's a blatant engineering
trainwreck.
Later a consensus was reached to utilize the alternatives mechanism to
eliminate the indirect call overhead introduced by the pv_ops APIs:
1) When built with !CONFIG_XEN_PV, X86_FEATURE_XENPV becomes a
disabled feature, preventing the Xen code from being built
and ensuring the native code is executed unconditionally.
2) When built with CONFIG_XEN_PV:
2.1) If not running on the Xen hypervisor (!X86_FEATURE_XENPV),
the kernel runtime binary is patched to unconditionally
jump to the native MSR write code.
2.2) If running on the Xen hypervisor (X86_FEATURE_XENPV), the
kernel runtime binary is patched to unconditionally jump
to the Xen MSR write code.
The alternatives mechanism is also used to choose the new immediate
form MSR write instruction when it's available.
Consequently, remove the pv_ops MSR write APIs and the Xen callbacks.
[1]: https://lore.kernel.org/lkml/87y1h81ht4.ffs@tglx/
Originally-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/include/asm/fred.h | 2 +-
arch/x86/include/asm/msr.h | 294 ++++++++++++++++++++------
arch/x86/include/asm/paravirt.h | 25 ---
arch/x86/include/asm/paravirt_types.h | 2 -
arch/x86/kernel/paravirt.c | 2 -
arch/x86/xen/enlighten_pv.c | 41 +---
arch/x86/xen/xen-asm.S | 64 ++++++
arch/x86/xen/xen-ops.h | 2 +
8 files changed, 302 insertions(+), 130 deletions(-)
diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index 12b34d5b2953..8ae4429e5401 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -101,7 +101,7 @@ static __always_inline void fred_update_rsp0(void)
unsigned long rsp0 = (unsigned long) task_stack_page(current) + THREAD_SIZE;
if (cpu_feature_enabled(X86_FEATURE_FRED) && (__this_cpu_read(fred_rsp0) != rsp0)) {
- wrmsrns(MSR_IA32_FRED_RSP0, rsp0);
+ native_wrmsrq(MSR_IA32_FRED_RSP0, rsp0);
__this_cpu_write(fred_rsp0, rsp0);
}
}
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 8f7a67b1c61c..bd3bdb3c3d23 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -75,9 +75,40 @@ static inline void do_trace_rdpmc(u32 msr, u64 val, int failed) {}
#endif
#ifdef CONFIG_XEN_PV
+extern void asm_xen_write_msr(void);
extern u64 xen_read_pmc(int counter);
#endif
+/* The GNU Assembler (Gas) with Binutils 2.40 adds WRMSRNS support */
+#if defined(CONFIG_AS_IS_GNU) && CONFIG_AS_VERSION >= 24000
+#define ASM_WRMSRNS "wrmsrns"
+#else
+#define ASM_WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
+#endif
+
+/* The GNU Assembler (Gas) with Binutils 2.41 adds .insn directive support */
+#if defined(CONFIG_AS_IS_GNU) && CONFIG_AS_VERSION >= 24100
+#define ASM_WRMSRNS_IMM \
+ " .insn VEX.128.F3.M7.W0 0xf6 /0, %[val], %[msr]%{:u32}\n\t"
+#else
+/*
+ * Note, clang also doesn't support the .insn directive.
+ *
+ * The register operand is encoded as %rax because all uses of the immediate
+ * form MSR access instructions reference %rax as the register operand.
+ */
+#define ASM_WRMSRNS_IMM \
+ " .byte 0xc4,0xe7,0x7a,0xf6,0xc0; .long %c[msr]"
+#endif
+
+#define PREPARE_RDX_FOR_WRMSR \
+ "mov %%rax, %%rdx\n\t" \
+ "shr $0x20, %%rdx\n\t"
+
+#define PREPARE_RCX_RDX_FOR_WRMSR \
+ "mov %[msr], %%ecx\n\t" \
+ PREPARE_RDX_FOR_WRMSR
+
/*
* Called only from an MSR fault handler, the instruction pointer points to
* the MSR access instruction that caused the fault.
@@ -96,13 +127,6 @@ static __always_inline bool is_msr_imm_insn(void *ip)
#endif
}
-/*
- * __rdmsr() and __wrmsr() are the two primitives which are the bare minimum MSR
- * accessors and should not have any tracing or other functionality piggybacking
- * on them - those are *purely* for accessing MSRs and nothing more. So don't even
- * think of extending them - you will be slapped with a stinking trout or a frozen
- * shark will reach you, wherever you are! You've been warned.
- */
static __always_inline u64 __rdmsr(u32 msr)
{
DECLARE_ARGS(val, low, high);
@@ -115,14 +139,6 @@ static __always_inline u64 __rdmsr(u32 msr)
return EAX_EDX_VAL(val, low, high);
}
-static __always_inline void __wrmsrq(u32 msr, u64 val)
-{
- asm volatile("1: wrmsr\n"
- "2:\n"
- _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
- : : "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32)) : "memory");
-}
-
#define native_rdmsr(msr, val1, val2) \
do { \
u64 __val = __rdmsr((msr)); \
@@ -135,12 +151,6 @@ static __always_inline u64 native_rdmsrq(u32 msr)
return __rdmsr(msr);
}
-#define native_wrmsr(msr, low, high) \
- __wrmsrq((msr), (u64)(high) << 32 | (low))
-
-#define native_wrmsrq(msr, val) \
- __wrmsrq((msr), (val))
-
static inline u64 native_read_msr(u32 msr)
{
u64 val;
@@ -171,7 +181,132 @@ static inline int native_read_msr_safe(u32 msr, u64 *p)
return err;
}
-/* Can be uninlined because referenced by paravirt */
+/*
+ * There are two sets of APIs for MSR accesses: native APIs and generic APIs.
+ * Native MSR APIs execute MSR instructions directly, regardless of whether the
+ * CPU is paravirtualized or native. Generic MSR APIs determine the appropriate
+ * MSR access method at runtime, allowing them to be used generically on both
+ * paravirtualized and native CPUs.
+ *
+ * When the compiler can determine the MSR number at compile time, the APIs
+ * with the suffix _constant() are used to enable the immediate form MSR
+ * instructions when available. The APIs with the suffix _variable() are
+ * used when the MSR number is not known until run time.
+ *
+ * Below is a diagram illustrating the derivation of the MSR write APIs:
+ *
+ * __native_wrmsrq_variable() __native_wrmsrq_constant()
+ * \ /
+ * \ /
+ * __native_wrmsrq() -----------------------
+ * / \ |
+ * / \ |
+ * native_wrmsrq() native_write_msr_safe() |
+ * / \ |
+ * / \ |
+ * native_wrmsr() native_write_msr() |
+ * |
+ * |
+ * |
+ * __xenpv_wrmsrq() |
+ * | |
+ * | |
+ * __wrmsrq() <--------------------------------
+ * / \
+ * / \
+ * wrmsrq() wrmsrq_safe()
+ * / \
+ * / \
+ * wrmsr() wrmsr_safe()
+ */
+
+/*
+ * Non-serializing WRMSR, when available.
+ *
+ * Otherwise, it falls back to a serializing WRMSR.
+ */
+static __always_inline bool __native_wrmsrq_variable(u32 msr, u64 val, int type)
+{
+#ifdef CONFIG_X86_64
+ BUILD_BUG_ON(__builtin_constant_p(msr));
+#endif
+
+ /*
+ * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a redundant
+ * DS prefix to avoid a trailing NOP.
+ */
+ asm_inline volatile goto(
+ "1:\n"
+ ALTERNATIVE("ds wrmsr",
+ ASM_WRMSRNS,
+ X86_FEATURE_WRMSRNS)
+ _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type])
+
+ :
+ : "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32)), [type] "i" (type)
+ : "memory"
+ : badmsr);
+
+ return false;
+
+badmsr:
+ return true;
+}
+
+#ifdef CONFIG_X86_64
+/*
+ * Non-serializing WRMSR or its immediate form, when available.
+ *
+ * Otherwise, it falls back to a serializing WRMSR.
+ */
+static __always_inline bool __native_wrmsrq_constant(u32 msr, u64 val, int type)
+{
+ BUILD_BUG_ON(!__builtin_constant_p(msr));
+
+ asm_inline volatile goto(
+ "1:\n"
+ ALTERNATIVE_2(PREPARE_RCX_RDX_FOR_WRMSR
+ "2: ds wrmsr",
+ PREPARE_RCX_RDX_FOR_WRMSR
+ ASM_WRMSRNS,
+ X86_FEATURE_WRMSRNS,
+ ASM_WRMSRNS_IMM,
+ X86_FEATURE_MSR_IMM)
+ _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For WRMSRNS immediate */
+ _ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type]) /* For WRMSR(NS) */
+
+ :
+ : [val] "a" (val), [msr] "i" (msr), [type] "i" (type)
+ : "memory", "ecx", "rdx"
+ : badmsr);
+
+ return false;
+
+badmsr:
+ return true;
+}
+#endif
+
+static __always_inline bool __native_wrmsrq(u32 msr, u64 val, int type)
+{
+#ifdef CONFIG_X86_64
+ if (__builtin_constant_p(msr))
+ return __native_wrmsrq_constant(msr, val, type);
+#endif
+
+ return __native_wrmsrq_variable(msr, val, type);
+}
+
+static __always_inline void native_wrmsrq(u32 msr, u64 val)
+{
+ __native_wrmsrq(msr, val, EX_TYPE_WRMSR);
+}
+
+static __always_inline void native_wrmsr(u32 msr, u32 low, u32 high)
+{
+ native_wrmsrq(msr, (u64)high << 32 | low);
+}
+
static inline void notrace native_write_msr(u32 msr, u64 val)
{
native_wrmsrq(msr, val);
@@ -180,22 +315,82 @@ static inline void notrace native_write_msr(u32 msr, u64 val)
do_trace_write_msr(msr, val, 0);
}
-/* Can be uninlined because referenced by paravirt */
static inline int notrace native_write_msr_safe(u32 msr, u64 val)
{
- int err;
+ int err = __native_wrmsrq(msr, val, EX_TYPE_WRMSR_SAFE) ? -EIO : 0;
- asm volatile("1: wrmsr ; xor %[err],%[err]\n"
- "2:\n\t"
- _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_WRMSR_SAFE, %[err])
- : [err] "=a" (err)
- : "c" (msr), "0" ((u32)val), "d" ((u32)(val >> 32))
- : "memory");
if (tracepoint_enabled(write_msr))
do_trace_write_msr(msr, val, err);
+
return err;
}
+#ifdef CONFIG_XEN_PV
+/* No plan to support immediate form MSR instructions in Xen */
+static __always_inline bool __xenpv_wrmsrq(u32 msr, u64 val, int type)
+{
+ asm_inline volatile goto(
+ "call asm_xen_write_msr\n\t"
+ "jnz 2f\n\t"
+ ALTERNATIVE("1: ds wrmsr",
+ ASM_WRMSRNS,
+ X86_FEATURE_WRMSRNS)
+ "2:\n"
+ _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For WRMSR(NS) */
+
+ : ASM_CALL_CONSTRAINT
+ : "a" (val), "c" (msr), [type] "i" (type)
+ : "memory", "rdx"
+ : badmsr);
+
+ return false;
+
+badmsr:
+ return true;
+}
+#endif
+
+static __always_inline bool __wrmsrq(u32 msr, u64 val, int type)
+{
+ bool ret;
+
+#ifdef CONFIG_XEN_PV
+ if (cpu_feature_enabled(X86_FEATURE_XENPV))
+ return __xenpv_wrmsrq(msr, val, type);
+#endif
+
+ /*
+ * 1) When built with !CONFIG_XEN_PV.
+ * 2) When built with CONFIG_XEN_PV but not running on Xen hypervisor.
+ */
+ ret = __native_wrmsrq(msr, val, type);
+
+ if (tracepoint_enabled(write_msr))
+ do_trace_write_msr(msr, val, ret ? -EIO : 0);
+
+ return ret;
+}
+
+static __always_inline void wrmsrq(u32 msr, u64 val)
+{
+ __wrmsrq(msr, val, EX_TYPE_WRMSR);
+}
+
+static __always_inline void wrmsr(u32 msr, u32 low, u32 high)
+{
+ wrmsrq(msr, (u64)high << 32 | low);
+}
+
+static __always_inline int wrmsrq_safe(u32 msr, u64 val)
+{
+ return __wrmsrq(msr, val, EX_TYPE_WRMSR_SAFE) ? -EIO : 0;
+}
+
+static __always_inline int wrmsr_safe(u32 msr, u32 low, u32 high)
+{
+ return wrmsrq_safe(msr, (u64)high << 32 | low);
+}
+
extern int rdmsr_safe_regs(u32 regs[8]);
extern int wrmsr_safe_regs(u32 regs[8]);
@@ -242,25 +437,9 @@ do { \
(void)((high) = (u32)(__val >> 32)); \
} while (0)
-static inline void wrmsr(u32 msr, u32 low, u32 high)
-{
- native_write_msr(msr, (u64)high << 32 | low);
-}
-
#define rdmsrq(msr, val) \
((val) = native_read_msr((msr)))
-static inline void wrmsrq(u32 msr, u64 val)
-{
- native_write_msr(msr, val);
-}
-
-/* wrmsr with exception handling */
-static inline int wrmsrq_safe(u32 msr, u64 val)
-{
- return native_write_msr_safe(msr, val);
-}
-
/* rdmsr with exception handling */
#define rdmsr_safe(msr, low, high) \
({ \
@@ -277,29 +456,6 @@ static inline int rdmsrq_safe(u32 msr, u64 *p)
}
#endif /* !CONFIG_PARAVIRT_XXL */
-/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
-#define WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
-
-/* Non-serializing WRMSR, when available. Falls back to a serializing WRMSR. */
-static __always_inline void wrmsrns(u32 msr, u64 val)
-{
- /*
- * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a redundant
- * DS prefix to avoid a trailing NOP.
- */
- asm volatile("1: " ALTERNATIVE("ds wrmsr", WRMSRNS, X86_FEATURE_WRMSRNS)
- "2: " _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
- : : "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32)));
-}
-
-/*
- * Dual u32 version of wrmsrq_safe():
- */
-static inline int wrmsr_safe(u32 msr, u32 low, u32 high)
-{
- return wrmsrq_safe(msr, (u64)high << 32 | low);
-}
-
struct msr __percpu *msrs_alloc(void);
void msrs_free(struct msr __percpu *msrs);
int msr_set_bit(u32 msr, u8 bit);
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 1bd1dad8da5a..6634f6cf801f 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -180,21 +180,11 @@ static inline u64 paravirt_read_msr(unsigned msr)
return PVOP_CALL1(u64, cpu.read_msr, msr);
}
-static inline void paravirt_write_msr(u32 msr, u64 val)
-{
- PVOP_VCALL2(cpu.write_msr, msr, val);
-}
-
static inline u64 paravirt_read_msr_safe(unsigned msr, int *err)
{
return PVOP_CALL2(u64, cpu.read_msr_safe, msr, err);
}
-static inline int paravirt_write_msr_safe(u32 msr, u64 val)
-{
- return PVOP_CALL2(int, cpu.write_msr_safe, msr, val);
-}
-
#define rdmsr(msr, val1, val2) \
do { \
u64 _l = paravirt_read_msr(msr); \
@@ -202,26 +192,11 @@ do { \
val2 = _l >> 32; \
} while (0)
-static __always_inline void wrmsr(u32 msr, u32 low, u32 high)
-{
- paravirt_write_msr(msr, (u64)high << 32 | low);
-}
-
#define rdmsrq(msr, val) \
do { \
val = paravirt_read_msr(msr); \
} while (0)
-static inline void wrmsrq(u32 msr, u64 val)
-{
- paravirt_write_msr(msr, val);
-}
-
-static inline int wrmsrq_safe(u32 msr, u64 val)
-{
- return paravirt_write_msr_safe(msr, val)
-}
-
/* rdmsr with exception handling */
#define rdmsr_safe(msr, a, b) \
({ \
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index d2db38c32bc5..18bb0e5bd22f 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -92,14 +92,12 @@ struct pv_cpu_ops {
/* Unsafe MSR operations. These will warn or panic on failure. */
u64 (*read_msr)(unsigned int msr);
- void (*write_msr)(u32 msr, u64 val);
/*
* Safe MSR operations.
* Returns 0 or -EIO.
*/
int (*read_msr_safe)(unsigned int msr, u64 *val);
- int (*write_msr_safe)(u32 msr, u64 val);
void (*start_context_switch)(struct task_struct *prev);
void (*end_context_switch)(struct task_struct *next);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 28d195ad7514..62bf66f61821 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -129,9 +129,7 @@ struct paravirt_patch_template pv_ops = {
.cpu.write_cr0 = native_write_cr0,
.cpu.write_cr4 = native_write_cr4,
.cpu.read_msr = native_read_msr,
- .cpu.write_msr = native_write_msr,
.cpu.read_msr_safe = native_read_msr_safe,
- .cpu.write_msr_safe = native_write_msr_safe,
.cpu.load_tr_desc = native_load_tr_desc,
.cpu.set_ldt = native_set_ldt,
.cpu.load_gdt = native_load_gdt,
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 195e6501a000..4672de7fc084 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1118,26 +1118,26 @@ static void set_seg(u32 which, u64 base)
}
/*
- * Support write_msr_safe() and write_msr() semantics.
- * With err == NULL write_msr() semantics are selected.
- * Supplying an err pointer requires err to be pre-initialized with 0.
+ * Return true to indicate the requested MSR write has been done successfully,
+ * otherwise return false so that the calling MSR write primitives in msr.h
+ * fail.
*/
-static void xen_do_write_msr(u32 msr, u64 val, int *err)
+bool xen_write_msr(u32 msr, u64 val)
{
bool emulated;
switch (msr) {
case MSR_FS_BASE:
set_seg(SEGBASE_FS, val);
- break;
+ return true;
case MSR_KERNEL_GS_BASE:
set_seg(SEGBASE_GS_USER, val);
- break;
+ return true;
case MSR_GS_BASE:
set_seg(SEGBASE_GS_KERNEL, val);
- break;
+ return true;
case MSR_STAR:
case MSR_CSTAR:
@@ -1149,16 +1149,13 @@ static void xen_do_write_msr(u32 msr, u64 val, int *err)
/* Fast syscall setup is all done in hypercalls, so
these are all ignored. Stub them out here to stop
Xen console noise. */
- break;
+ return true;
default:
if (pmu_msr_chk_emulated(msr, &val, false, &emulated) && emulated)
- return;
+ return true;
- if (err)
- *err = native_write_msr_safe(msr, val);
- else
- native_write_msr(msr, val);
+ return false;
}
}
@@ -1170,15 +1167,6 @@ static int xen_read_msr_safe(unsigned int msr, u64 *val)
return err;
}
-static int xen_write_msr_safe(u32 msr, u64 val)
-{
- int err = 0;
-
- xen_do_write_msr(msr, val, &err);
-
- return err;
-}
-
static u64 xen_read_msr(unsigned int msr)
{
int err;
@@ -1186,13 +1174,6 @@ static u64 xen_read_msr(unsigned int msr)
return xen_do_read_msr(msr, xen_msr_safe ? &err : NULL);
}
-static void xen_write_msr(u32 msr, u64 val)
-{
- int err;
-
- xen_do_write_msr(msr, val, xen_msr_safe ? &err : NULL);
-}
-
/* This is called once we have the cpu_possible_mask */
void __init xen_setup_vcpu_info_placement(void)
{
@@ -1228,10 +1209,8 @@ static const typeof(pv_ops) xen_cpu_ops __initconst = {
.write_cr4 = xen_write_cr4,
.read_msr = xen_read_msr,
- .write_msr = xen_write_msr,
.read_msr_safe = xen_read_msr_safe,
- .write_msr_safe = xen_write_msr_safe,
.load_tr_desc = paravirt_nop,
.set_ldt = xen_set_ldt,
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 461bb1526502..eecce47fbe49 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -342,3 +342,67 @@ SYM_CODE_END(xen_entry_SYSENTER_compat)
SYM_CODE_END(xen_entry_SYSCALL_compat)
#endif /* CONFIG_IA32_EMULATION */
+
+/*
+ * To leverage the alternatives mechanism and eliminate the overhead of Xen
+ * MSR and PMU counter access on native systems, as well as to enable new MSR
+ * instructions based on their availability, assembly trampoline functions
+ * are introduced when CONFIG_XEN_PV is enabled.
+ *
+ * Since these trampolines are entered from call sites that preserve no
+ * call-clobbered registers, they must save and restore those registers
+ * and the frame pointer themselves.
+ */
+.macro XEN_SAVE_CALLEE_REGS_FOR_MSR
+ push %rcx
+ push %rdi
+ push %rsi
+ push %r8
+ push %r9
+ push %r10
+ push %r11
+.endm
+
+.macro XEN_RESTORE_CALLEE_REGS_FOR_MSR
+ pop %r11
+ pop %r10
+ pop %r9
+ pop %r8
+ pop %rsi
+ pop %rdi
+ pop %rcx
+.endm
+
+/*
+ * MSR number in %ecx, MSR value in %rax.
+ *
+ * %edx is set up to match %rax >> 32 like the native stub
+ * is expected to do.
+ *
+ * Let xen_write_msr() return 'false' if the MSR access should
+ * be executed natively, IOW, 'true' means it has done the job.
+ *
+ * bool xen_write_msr(u32 msr, u64 value)
+ *
+ * If ZF=1 then this will fall through to the actual native WRMSR[NS]
+ * instruction.
+ *
+ * This also removes the need for Xen to maintain different safe and
+ * unsafe MSR routines, as the difference is handled by the same
+ * trap handler as is used natively.
+ */
+ SYM_FUNC_START(asm_xen_write_msr)
+ ENDBR
+ FRAME_BEGIN
+ push %rax /* Save in case of native fallback */
+ XEN_SAVE_CALLEE_REGS_FOR_MSR
+ mov %ecx, %edi /* MSR number */
+ mov %rax, %rsi /* MSR data */
+ call xen_write_msr
+ test %al, %al /* %al=1, i.e., ZF=0, means successfully done */
+ XEN_RESTORE_CALLEE_REGS_FOR_MSR
+ mov 4(%rsp), %edx /* Set up %edx for native execution */
+ pop %rax
+ FRAME_END
+ RET
+SYM_FUNC_END(asm_xen_write_msr)
+EXPORT_SYMBOL_GPL(asm_xen_write_msr)
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index fde9f9d7415f..56712242262a 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -146,6 +146,8 @@ __visible unsigned long xen_read_cr2_direct(void);
/* These are not functions, and cannot be called normally */
__visible void xen_iret(void);
+extern bool xen_write_msr(u32 msr, u64 val);
+
extern int xen_panic_handler_init(void);
int xen_cpuhp_setup(int (*cpu_up_prepare_cb)(unsigned int),
--
2.49.0
* Re: [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR
2025-04-22 8:22 ` [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR Xin Li (Intel)
@ 2025-04-22 9:57 ` Jürgen Groß
2025-04-23 8:51 ` Xin Li
2025-04-25 7:11 ` Peter Zijlstra
0 siblings, 2 replies; 94+ messages in thread
From: Jürgen Groß @ 2025-04-22 9:57 UTC (permalink / raw)
To: Xin Li (Intel), linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 22.04.25 10:22, Xin Li (Intel) wrote:
> The story started from tglx's reply in [1]:
>
> For actual performance relevant code the current PV ops mechanics
> are a horrorshow when the op defaults to the native instruction.
>
> look at wrmsrl():
>
> wrmsrl(msr, val
> wrmsr(msr, (u32)val, (u32)val >> 32))
> paravirt_write_msr(msr, low, high)
> PVOP_VCALL3(cpu.write_msr, msr, low, high)
>
> Which results in
>
> mov $msr, %edi
> mov $val, %rdx
> mov %edx, %esi
> shr $0x20, %rdx
> call native_write_msr
>
> and native_write_msr() does at minimum:
>
> mov %edi,%ecx
> mov %esi,%eax
> wrmsr
> ret
>
> In the worst case 'ret' is going through the return thunk. Not to
> talk about function prologues and whatever.
>
> This becomes even more silly for trivial instructions like STI/CLI
> or in the worst case paravirt_nop().
This is nonsense.
In the non-Xen case the initial indirect call is directly replaced with
STI/CLI via alternative patching, while for Xen it is replaced by a direct
call.
The paravirt_nop() case is handled in alt_replace_call() by replacing the
indirect call with a nop in case the target of the call was paravirt_nop()
(which is in fact no_func()).
>
> The call makes only sense, when the native default is an actual
> function, but for the trivial cases it's a blatant engineering
> trainwreck.
The trivial cases are all handled as stated above: a direct replacement
instruction is placed at the indirect call position.
> Later a consensus was reached to utilize the alternatives mechanism to
> eliminate the indirect call overhead introduced by the pv_ops APIs:
>
> 1) When built with !CONFIG_XEN_PV, X86_FEATURE_XENPV becomes a
> disabled feature, preventing the Xen code from being built
> and ensuring the native code is executed unconditionally.
This is the case today already. There is no need for any change to have
this in place.
>
> 2) When built with CONFIG_XEN_PV:
>
> 2.1) If not running on the Xen hypervisor (!X86_FEATURE_XENPV),
> the kernel runtime binary is patched to unconditionally
> jump to the native MSR write code.
>
> 2.2) If running on the Xen hypervisor (X86_FEATURE_XENPV), the
> kernel runtime binary is patched to unconditionally jump
> to the Xen MSR write code.
I can't see what is different here compared to today's state.
>
> The alternatives mechanism is also used to choose the new immediate
> form MSR write instruction when it's available.
Yes, this needs to be added.
> Consequently, remove the pv_ops MSR write APIs and the Xen callbacks.
I still don't see a major difference to today's solution.
Only the "paravirt" term has been eliminated.
Juergen
* Re: [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR
2025-04-22 9:57 ` Jürgen Groß
@ 2025-04-23 8:51 ` Xin Li
2025-04-23 16:05 ` Jürgen Groß
2025-04-25 7:11 ` Peter Zijlstra
1 sibling, 1 reply; 94+ messages in thread
From: Xin Li @ 2025-04-23 8:51 UTC (permalink / raw)
To: Jürgen Groß, linux-kernel, kvm, linux-perf-users,
linux-hyperv, virtualization, linux-pm, linux-edac, xen-devel,
linux-acpi, linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 4/22/2025 2:57 AM, Jürgen Groß wrote:
> On 22.04.25 10:22, Xin Li (Intel) wrote:
>> The story started from tglx's reply in [1]:
>>
>> For actual performance relevant code the current PV ops mechanics
>> are a horrorshow when the op defaults to the native instruction.
>>
>> look at wrmsrl():
>>
>> wrmsrl(msr, val
>> wrmsr(msr, (u32)val, (u32)val >> 32))
>> paravirt_write_msr(msr, low, high)
>> PVOP_VCALL3(cpu.write_msr, msr, low, high)
>>
>> Which results in
>>
>> mov $msr, %edi
>> mov $val, %rdx
>> mov %edx, %esi
>> shr $0x20, %rdx
>> call native_write_msr
>>
>> and native_write_msr() does at minimum:
>>
>> mov %edi,%ecx
>> mov %esi,%eax
>> wrmsr
>> ret
>>
>> In the worst case 'ret' is going through the return thunk. Not to
>> talk about function prologues and whatever.
>>
>> This becomes even more silly for trivial instructions like STI/CLI
>> or in the worst case paravirt_nop().
>
> This is nonsense.
>
> In the non-Xen case the initial indirect call is directly replaced with
> STI/CLI via alternative patching, while for Xen it is replaced by a direct
> call.
>
> The paravirt_nop() case is handled in alt_replace_call() by replacing the
> indirect call with a nop in case the target of the call was paravirt_nop()
> (which is in fact no_func()).
>
>>
>> The call makes only sense, when the native default is an actual
>> function, but for the trivial cases it's a blatant engineering
>> trainwreck.
>
> The trivial cases are all handled as stated above: a direct replacement
> instruction is placed at the indirect call position.
The above comment was given in 2023 IIRC, and you have addressed it.
>
>> Later a consensus was reached to utilize the alternatives mechanism to
>> eliminate the indirect call overhead introduced by the pv_ops APIs:
>>
>> 1) When built with !CONFIG_XEN_PV, X86_FEATURE_XENPV becomes a
>> disabled feature, preventing the Xen code from being built
>> and ensuring the native code is executed unconditionally.
>
> This is the case today already. There is no need for any change to have
> this in place.
>
>>
>> 2) When built with CONFIG_XEN_PV:
>>
>> 2.1) If not running on the Xen hypervisor (!X86_FEATURE_XENPV),
>> the kernel runtime binary is patched to unconditionally
>> jump to the native MSR write code.
>>
>> 2.2) If running on the Xen hypervisor (X86_FEATURE_XENPV), the
>> kernel runtime binary is patched to unconditionally jump
>> to the Xen MSR write code.
>
> I can't see what is different here compared to today's state.
>
>>
>> The alternatives mechanism is also used to choose the new immediate
>> form MSR write instruction when it's available.
>
> Yes, this needs to be added.
>
>> Consequently, remove the pv_ops MSR write APIs and the Xen callbacks.
>
> I still don't see a major difference to today's solution.
The existing code generates:
...
bf e0 06 00 00 mov $0x6e0,%edi
89 d6 mov %edx,%esi
48 c1 ea 20 shr $0x20,%rdx
ff 15 07 48 8c 01 call *0x18c4807(%rip) # <pv_ops+0xb8>
31 c0 xor %eax,%eax
...
And on native, the indirect call instruction is patched to a direct call
as you mentioned:
...
bf e0 06 00 00 mov $0x6e0,%edi
89 d6 mov %edx,%esi
48 c1 ea 20 shr $0x20,%rdx
e8 60 3e 01 00 call <{native,xen}_write_msr> # direct
90 nop
31 c0 xor %eax,%eax
...
This patch set generates assembly w/o CALL on native:
...
e9 e6 22 c6 01 jmp 1f # on native or nop on Xen
b9 e0 06 00 00 mov $0x6e0,%ecx
e8 91 d4 fa ff call ffffffff8134ee80 <asm_xen_write_msr>
e9 a4 9f eb 00 jmp ffffffff8225b9a0 <__x86_return_thunk>
...
1: b9 e0 06 00 00 mov $0x6e0,%ecx # immediate form here
48 89 c2 mov %rax,%rdx
48 c1 ea 20 shr $0x20,%rdx
3e 0f 30 ds wrmsr
...
It's not a major change, but when it is patched to use the immediate
form MSR write instruction, it's straightforwardly streamlined.
>
> Only the "paravirt" term has been eliminated.
Yes.
But a PV guest doesn't operate at the highest privilege level, which
means MSR instructions typically result in a #GP fault. I actually
think the pv_ops MSR APIs are unnecessary because of this inherent
limitation.
Looking at the Xen MSR code, except for the PMU and just a few MSRs, it
falls back to executing native MSR instructions. As MSR instructions trigger
#GP, Xen takes control and handles them in 2 ways:
1) emulate (or ignore) an MSR operation and skip the guest instruction.
2) inject the #GP back into the guest OS and let its #GP handler handle it.
But the Linux MSR exception handler just ignores the MSR instruction
(MCE MSR exception will panic).
So why not let Xen handle all the details which it already tries to do?
(Linux w/ such a change may not be able to run on old Xen hypervisors.)
BTW, if performance is a concern, writes to MSR_KERNEL_GS_BASE and
MSR_GS_BASE are hypercalls into Xen anyway.
Thanks!
Xin
* Re: [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR
2025-04-23 8:51 ` Xin Li
@ 2025-04-23 16:05 ` Jürgen Groß
2025-04-24 8:06 ` Xin Li
2025-04-25 12:33 ` Peter Zijlstra
0 siblings, 2 replies; 94+ messages in thread
From: Jürgen Groß @ 2025-04-23 16:05 UTC (permalink / raw)
To: Xin Li, linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 23.04.25 10:51, Xin Li wrote:
> On 4/22/2025 2:57 AM, Jürgen Groß wrote:
>> On 22.04.25 10:22, Xin Li (Intel) wrote:
>>> The story started from tglx's reply in [1]:
>>>
>>> For actual performance relevant code the current PV ops mechanics
>>> are a horrorshow when the op defaults to the native instruction.
>>>
>>> look at wrmsrl():
>>>
>>> wrmsrl(msr, val
>>> wrmsr(msr, (u32)val, (u32)val >> 32))
>>> paravirt_write_msr(msr, low, high)
>>> PVOP_VCALL3(cpu.write_msr, msr, low, high)
>>>
>>> Which results in
>>>
>>> mov $msr, %edi
>>> mov $val, %rdx
>>> mov %edx, %esi
>>> shr $0x20, %rdx
>>> call native_write_msr
>>>
>>> and native_write_msr() does at minimum:
>>>
>>> mov %edi,%ecx
>>> mov %esi,%eax
>>> wrmsr
>>> ret
>>>
>>> In the worst case 'ret' is going through the return thunk. Not to
>>> talk about function prologues and whatever.
>>>
>>> This becomes even more silly for trivial instructions like STI/CLI
>>> or in the worst case paravirt_nop().
>>
>> This is nonsense.
>>
>> In the non-Xen case the initial indirect call is directly replaced with
>> STI/CLI via alternative patching, while for Xen it is replaced by a direct
>> call.
>>
>> The paravirt_nop() case is handled in alt_replace_call() by replacing the
>> indirect call with a nop in case the target of the call was paravirt_nop()
>> (which is in fact no_func()).
>>
>>>
>>> The call makes only sense, when the native default is an actual
>>> function, but for the trivial cases it's a blatant engineering
>>> trainwreck.
>>
>> The trivial cases are all handled as stated above: a direct replacement
>> instruction is placed at the indirect call position.
>
> The above comment was given in 2023 IIRC, and you have addressed it.
>
>>
>>> Later a consensus was reached to utilize the alternatives mechanism to
>>> eliminate the indirect call overhead introduced by the pv_ops APIs:
>>>
>>> 1) When built with !CONFIG_XEN_PV, X86_FEATURE_XENPV becomes a
>>> disabled feature, preventing the Xen code from being built
>>> and ensuring the native code is executed unconditionally.
>>
>> This is the case today already. There is no need for any change to have
>> this in place.
>>
>>>
>>> 2) When built with CONFIG_XEN_PV:
>>>
>>> 2.1) If not running on the Xen hypervisor (!X86_FEATURE_XENPV),
>>> the kernel runtime binary is patched to unconditionally
>>> jump to the native MSR write code.
>>>
>>> 2.2) If running on the Xen hypervisor (X86_FEATURE_XENPV), the
>>> kernel runtime binary is patched to unconditionally jump
>>> to the Xen MSR write code.
>>
>> I can't see what is different here compared to today's state.
>>
>>>
>>> The alternatives mechanism is also used to choose the new immediate
>>> form MSR write instruction when it's available.
>>
>> Yes, this needs to be added.
>>
>>> Consequently, remove the pv_ops MSR write APIs and the Xen callbacks.
>>
>> I still don't see a major difference to today's solution.
>
> The existing code generates:
>
> ...
> bf e0 06 00 00 mov $0x6e0,%edi
> 89 d6 mov %edx,%esi
> 48 c1 ea 20 shr $0x20,%rdx
> ff 15 07 48 8c 01 call *0x18c4807(%rip) # <pv_ops+0xb8>
> 31 c0 xor %eax,%eax
> ...
>
> And on native, the indirect call instruction is patched to a direct call
> as you mentioned:
>
> ...
> bf e0 06 00 00 mov $0x6e0,%edi
> 89 d6 mov %edx,%esi
> 48 c1 ea 20 shr $0x20,%rdx
> e8 60 3e 01 00 call <{native,xen}_write_msr> # direct
> 90 nop
> 31 c0 xor %eax,%eax
> ...
>
>
> This patch set generates assembly w/o CALL on native:
>
> ...
> e9 e6 22 c6 01 jmp 1f # on native or nop on Xen
> b9 e0 06 00 00 mov $0x6e0,%ecx
> e8 91 d4 fa ff call ffffffff8134ee80 <asm_xen_write_msr>
> e9 a4 9f eb 00 jmp ffffffff8225b9a0 <__x86_return_thunk>
> ...
> 1: b9 e0 06 00 00 mov $0x6e0,%ecx # immediate form here
> 48 89 c2 mov %rax,%rdx
> 48 c1 ea 20 shr $0x20,%rdx
> 3e 0f 30 ds wrmsr
> ...
>
> It's not a major change, but when it is patched to use the immediate form MSR
> write instruction, it's straightforwardly streamlined.
It should be rather easy to switch the current wrmsr/rdmsr paravirt patching
locations to use the rdmsr/wrmsr instructions instead of doing a call to
native_*msr().
The case of the new immediate form could be handled the same way.
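A rough sketch of that idea (illustrative only; the native case becomes
the bare instruction at the patched site, while the Xen PV case stays a
direct call to the trampoline added in this series):
	asm_inline volatile(ALTERNATIVE("ds wrmsr",
					"call asm_xen_write_msr",
					X86_FEATURE_XENPV)
			    : : "c" (msr), "a" (val),
				"d" ((u32)(val >> 32))
			    : "memory");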
>
>>
>> Only the "paravirt" term has been eliminated.
>
> Yes.
>
> But a PV guest doesn't operate at the highest privilege level, which
> means MSR instructions typically result in a #GP fault. I actually think the
> pv_ops MSR APIs are unnecessary because of this inherent
> limitation.
>
> Looking at the Xen MSR code, except for the PMU and just a few MSRs, it
> falls back to executing native MSR instructions. As MSR instructions trigger
> #GP, Xen takes control and handles them in 2 ways:
>
> 1) emulate (or ignore) an MSR operation and skip the guest instruction.
>
> 2) inject the #GP back into the guest OS and let its #GP handler handle it.
> But the Linux MSR exception handler just ignores the MSR instruction
> (MCE MSR exception will panic).
>
> So why not let Xen handle all the details which it already tries to do?
Some MSRs are not handled that way, but via a kernel internal emulation.
And those are handled that way mostly due to performance reasons. And some
need special treatment.
> (Linux w/ such a change may not be able to run on old Xen hypervisors.)
Yes, and this is something to avoid.
And remember that Linux isn't the only PV-mode guest existing.
> BTW, if performance is a concern, writes to MSR_KERNEL_GS_BASE and
> MSR_GS_BASE are hypercalls into Xen anyway.
Yes, and some other MSR writes are just NOPs with Xen-PV.
Juergen
* Re: [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR
2025-04-23 16:05 ` Jürgen Groß
@ 2025-04-24 8:06 ` Xin Li
2025-04-24 8:14 ` Jürgen Groß
2025-04-25 12:33 ` Peter Zijlstra
1 sibling, 1 reply; 94+ messages in thread
From: Xin Li @ 2025-04-24 8:06 UTC (permalink / raw)
To: Jürgen Groß, linux-kernel, kvm, linux-perf-users,
linux-hyperv, virtualization, linux-pm, linux-edac, xen-devel,
linux-acpi, linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 4/23/2025 9:05 AM, Jürgen Groß wrote:
>> It's not a major change, but when it is patched to use the immediate
>> form MSR write instruction, it's straightforwardly streamlined.
>
> It should be rather easy to switch the current wrmsr/rdmsr paravirt
> patching
> locations to use the rdmsr/wrmsr instructions instead of doing a call to
> native_*msr().
>
> The case of the new immediate form could be handled the same way.
Actually, that is how we get this patch with the existing alternatives
infrastructure. And we took a step further to also remove the pv_ops
MSR APIs...
It looks to me that you want to add a new facility to the alternatives
infrastructure first?
>>> Only the "paravirt" term has been eliminated.
>>
>> Yes.
>>
>> But a PV guest doesn't operate at the highest privilege level, which
>> means MSR instructions typically result in a #GP fault. I actually
>> think the pv_ops MSR APIs are unnecessary because of this inherent
>> limitation.
>>
>> Looking at the Xen MSR code, except PMU and just a few MSRs, it falls
>> back to executes native MSR instructions. As MSR instructions trigger
>> #GP, Xen takes control and handles them in 2 ways:
>>
>> 1) emulate (or ignore) a MSR operation and skip the guest instruction.
>>
>> 2) inject the #GP back to guest OS and let its #GP handler handle it.
>> But Linux MSR exception handler just ignores the MSR instruction
>> (MCE MSR exception will panic).
>>
>> So why not let Xen handle all the details which it already tries to do?
>
> Some MSRs are not handled that way, but via a kernel internal emulation.
> And those are handled that way mostly due to performance reasons. And some
> need special treatment.
>
>> (Linux w/ such a change may not be able to run on old Xen hypervisors.)
>
> Yes, and this is something to avoid.
>
> And remember that Linux isn't the only PV-mode guest existing.
>
>> BTW, if performance is a concern, writes to MSR_KERNEL_GS_BASE and
>> MSR_GS_BASE anyway are hpyercalls into Xen.
>
> Yes, and some other MSR writes are just NOPs with Xen-PV.
>
I will do some cleanup and refactor first.
BTW, at least we can merge the safe() APIs into the non-safe() ones.
Thanks!
Xin
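A minimal sketch of what merging the safe() APIs into the non-safe()
ones can look like, modeled on the EX_TYPE-parameterized read helpers
that appear in patch 22 later in this thread (the __rdmsrq() helper and
the EX_TYPE_* constants are taken from that patch; this is illustrative,
not the final code):

#define rdmsrq(msr, val)				\
do {							\
	u64 ___val = 0;					\
	/* non-safe: a faulting read is just fixed up */\
	__rdmsrq((msr), &___val, EX_TYPE_RDMSR);	\
	(val) = ___val;					\
} while (0)

static __always_inline int rdmsrq_safe(u32 msr, u64 *val)
{
	/* safe: same helper, different extable type, error returned */
	return __rdmsrq(msr, val, EX_TYPE_RDMSR_SAFE) ? -EIO : 0;
}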
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR
2025-04-24 8:06 ` Xin Li
@ 2025-04-24 8:14 ` Jürgen Groß
2025-04-25 1:15 ` H. Peter Anvin
0 siblings, 1 reply; 94+ messages in thread
From: Jürgen Groß @ 2025-04-24 8:14 UTC (permalink / raw)
To: Xin Li, linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
[-- Attachment #1.1.1: Type: text/plain, Size: 1048 bytes --]
On 24.04.25 10:06, Xin Li wrote:
> On 4/23/2025 9:05 AM, Jürgen Groß wrote:
>>> It's not a major change, but when it is patched to use the immediate form MSR
>>> write instruction, it's straightforwardly streamlined.
>>
>> It should be rather easy to switch the current wrmsr/rdmsr paravirt patching
>> locations to use the rdmsr/wrmsr instructions instead of doing a call to
>> native_*msr().
>>
>> The case of the new immediate form could be handled the same way.
>
> Actually, that is how we get this patch with the existing alternatives
> infrastructure. And we took a step further to also remove the pv_ops
> MSR APIs...
And this is what I'm questioning. IMHO this approach is adding more
code by removing the pv_ops MSR APIs just because "pv_ops is bad". And
I believe most of the refusal of pv_ops is based on reasoning that is no
longer valid.
> It looks to me that you want to add a new facility to the alternatives
> infrastructure first?
Why would we need a new facility in the alternatives infrastructure?
Juergen
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR
2025-04-24 8:14 ` Jürgen Groß
@ 2025-04-25 1:15 ` H. Peter Anvin
2025-04-25 3:44 ` H. Peter Anvin
2025-04-25 6:51 ` Jürgen Groß
0 siblings, 2 replies; 94+ messages in thread
From: H. Peter Anvin @ 2025-04-25 1:15 UTC (permalink / raw)
To: Jürgen Groß, Xin Li, linux-kernel, kvm,
linux-perf-users, linux-hyperv, virtualization, linux-pm,
linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, acme, andrew.cooper3, peterz,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 4/24/25 01:14, Jürgen Groß wrote:
>>
>> Actually, that is how we get this patch with the existing alternatives
>> infrastructure. And we took a step further to also remove the pv_ops
>> MSR APIs...
>
> And this is what I'm questioning. IMHO this approach is adding more
> code by removing the pv_ops MSR APIs just because "pv_ops is bad". And
> I believe most of the refusal of pv_ops is based on reasoning that is
> no longer valid.
>
pvops are a headache because they are effectively a secondary alternatives
infrastructure that is incompatible with the primary one...
>> It looks to me that you want to add a new facility to the alternatives
>> infrastructure first?
>
> Why would we need a new facility in the alternatives infrastructure?
I'm not sure what Xin means with "facility", but a key motivation for
this is to:
a. Avoid using the pvops for MSRs when the only remaining user
thereof (Xen) is only using it for a very small subset of MSRs and for
the rest it is just overhead, even for Xen;
b. Being able to do wrmsrns immediate/wrmsrns/wrmsr and rdmsr
immediate/rdmsr alternatives.
Of these, (b) is by far the biggest motivation. The architectural
direction for supervisor states is to avoid ad hoc ISA and XSAVES and
instead use MSRs. The immediate forms are expected to be significantly
faster, because they make the MSR index available at the very beginning
of the pipeline instead of at a relatively late stage.
-hpa
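As a concrete illustration of (b), here is a minimal sketch of a
write-side alternative for a compile-time-constant MSR index, reusing
the ASM_WRMSRNS_IMM encoding and the X86_FEATURE_MSR_IMM flag from this
series (the X86_FEATURE_WRMSRNS flag and the wrmsrns mnemonic are
assumed to be available; a sketch, not the series' implementation):

static __always_inline void wrmsrns_constant(u32 msr, u64 val)
{
	asm_inline volatile(ALTERNATIVE_2("mov %[msr], %%ecx\n\t"
					  "mov %%rax, %%rdx\n\t"
					  "shr $0x20, %%rdx\n\t"
					  "wrmsr",	     /* legacy, serializing */
					  "mov %[msr], %%ecx\n\t"
					  "mov %%rax, %%rdx\n\t"
					  "shr $0x20, %%rdx\n\t"
					  "wrmsrns",	     /* non-serializing */
					  X86_FEATURE_WRMSRNS,
					  ASM_WRMSRNS_IMM,   /* index as immediate */
					  X86_FEATURE_MSR_IMM)
			    : : [msr] "i" (msr), [val] "a" (val)
			    : "ecx", "rdx", "memory");
}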
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR
2025-04-25 1:15 ` H. Peter Anvin
@ 2025-04-25 3:44 ` H. Peter Anvin
2025-04-25 7:01 ` Jürgen Groß
2025-04-25 6:51 ` Jürgen Groß
1 sibling, 1 reply; 94+ messages in thread
From: H. Peter Anvin @ 2025-04-25 3:44 UTC (permalink / raw)
To: Jürgen Groß, Xin Li, linux-kernel, kvm,
linux-perf-users, linux-hyperv, virtualization, linux-pm,
linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, acme, andrew.cooper3, peterz,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 4/24/25 18:15, H. Peter Anvin wrote:
> On 4/24/25 01:14, Jürgen Groß wrote:
>>>
>>> Actually, that is how we get this patch with the existing alternatives
>>> infrastructure. And we took a step further to also remove the pv_ops
>>> MSR APIs...
>>
>> And this is what I'm questioning. IMHO this approach is adding more
>> code by removing the pv_ops MSR APIs just because "pv_ops is bad". And
>> I believe most of the refusal of pv_ops is based on reasoning that is
>> no longer valid.
>>
>
> pvops are a headache because they are effectively a secondary alternatives
> infrastructure that is incompatible with the primary one...
>
>>> It looks to me that you want to add a new facility to the alternatives
>>> infrastructure first?
>>
>> Why would we need a new facility in the alternatives infrastructure?
>
> I'm not sure what Xin means with "facility", but a key motivation for
> this is to:
>
> a. Avoid using the pvops for MSRs when the only remaining user
> thereof (Xen) is only using it for a very small subset of MSRs and for
> the rest it is just overhead, even for Xen;
>
> b. Being able to do wrmsrns immediate/wrmsrns/wrmsr and rdmsr immediate/
> rdmsr alternatives.
>
> Of these, (b) is by far the biggest motivation. The architectural
> direction for supervisor states is to avoid ad hoc ISA and XSAVES and
> instead use MSRs. The immediate forms are expected to be significantly
> faster, because they make the MSR index available at the very beginning
> of the pipeline instead of at a relatively late stage.
>
Note that to support the immediate forms, we *must* do these inline, or
the const-ness of the MSR index -- which applies to by far the vast
majority of MSR references -- gets lost. pvops does exactly that: it
loses the const-ness.
Furthermore, the MSR immediate instructions take a 64-bit number in a
single register; as these instructions are by necessity relatively long,
it makes sense for the alternative sequence to accept a 64-bit input
register and do the %eax/%edx shuffle in the legacy fallback code... we
did a bunch of experiments to see what made most sense.
-hpa
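To make the const-ness point concrete: because the accessors are
__always_inline, __builtin_constant_p(msr) is resolved per call site,
so constant MSR indices can be steered to the immediate-form path. A
sketch of the write-side dispatcher, using the names from the write-API
diagram in patch 22 below (the bool-on-fault signature is assumed here
to mirror the read side):

static __always_inline bool __native_wrmsrq(u32 msr, u64 val, int type)
{
#ifdef CONFIG_X86_64
	/* Index known at compile time: eligible for the immediate form. */
	if (__builtin_constant_p(msr))
		return __native_wrmsrq_constant(msr, val, type);
#endif

	/* Runtime index: legacy %ecx-based encoding only. */
	return __native_wrmsrq_variable(msr, val, type);
}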
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR
2025-04-25 3:44 ` H. Peter Anvin
@ 2025-04-25 7:01 ` Jürgen Groß
2025-04-25 15:28 ` H. Peter Anvin
0 siblings, 1 reply; 94+ messages in thread
From: Jürgen Groß @ 2025-04-25 7:01 UTC (permalink / raw)
To: H. Peter Anvin, Xin Li, linux-kernel, kvm, linux-perf-users,
linux-hyperv, virtualization, linux-pm, linux-edac, xen-devel,
linux-acpi, linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, acme, andrew.cooper3, peterz,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
[-- Attachment #1.1.1: Type: text/plain, Size: 2854 bytes --]
On 25.04.25 05:44, H. Peter Anvin wrote:
> On 4/24/25 18:15, H. Peter Anvin wrote:
>> On 4/24/25 01:14, Jürgen Groß wrote:
>>>>
>>>> Actually, that is how we get this patch with the existing alternatives
>>>> infrastructure. And we took a step further to also remove the pv_ops
>>>> MSR APIs...
>>>
>>> And this is what I'm questioning. IMHO this approach is adding more
>>> code by removing the pv_ops MSR APIs just because "pv_ops is bad". And
>>> I believe most of the refusal of pv_ops is based on reasoning that is
>>> no longer valid.
>>>
>>
>> pvops are a headache because they are effectively a secondary alternatives
>> infrastructure that is incompatible with the primary one...
>>
>>>> It looks to me that you want to add a new facility to the alternatives
>>>> infrastructure first?
>>>
>>> Why would we need a new facility in the alternatives infrastructure?
>>
>> I'm not sure what Xin means with "facility", but a key motivation for this is to:
>>
>> a. Avoid using the pvops for MSRs when the only remaining user thereof
>> (Xen) is only using it for a very small subset of MSRs and for the rest it is
>> just overhead, even for Xen;
>>
>> b. Being able to do wrmsrns immediate/wrmsrns/wrmsr and rdmsr immediate/ rdmsr
>> alternatives.
>>
>> Of these, (b) is by far the biggest motivation. The architectural direction
>> for supervisor states is to avoid ad hoc ISA and XSAVES and instead use MSRs.
>> The immediate forms are expected to be significantly faster, because they make
>> the MSR index available at the very beginning of the pipeline instead of at a
>> relatively late stage.
>>
>
> Note that to support the immediate forms, we *must* do these inline, or the
> const-ness of the MSR index -- which applies to by far the vast majority of MSR
> references -- gets lost. pvops does exactly that: it loses the const-ness.
>
> Furthermore, the MSR immediate instructions take a 64-bit number in a single
> register; as these instructions are by necessity relatively long, it makes sense
> for the alternative sequence to accept a 64-bit input register and do the %eax/
> %edx shuffle in the legacy fallback code... we did a bunch of experiments to see
> what made most sense.
Yes, I understand that.
And I'm totally in favor of Xin's rework of the MSR low level functions.
Inlining the MSR access instructions with pv_ops should not be very
complicated. We do that with other instructions (STI/CLI, PTE accesses)
today, so this is no new kind of functionality.
I could try writing a patch achieving that, but I would only start that
work if you might consider taking it instead of Xin's patch removing the
pv_ops usage for rdmsr/wrmsr. If it turns out that my version results in
more code changes than Xin's patch, I'd be fine with dropping my patch,
of course.
Juergen
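What Jürgen proposes would presumably mirror the STI/CLI pattern quoted
later in this thread. A rough sketch, assuming a hypothetical
PVOP_ALT_VCALL2 macro with the same semantics as the existing
PVOP_ALT_VCALLEE0 (neither the macro nor the asm template below is
existing code):

static __always_inline void wrmsrq(u32 msr, u64 val)
{
	/*
	 * Hypothetical: inline the native wrmsr sequence when not
	 * running on Xen and keep the pv_ops call only as the Xen
	 * fallback.  The arguments arrive per the C ABI:
	 * %edi = msr, %rsi = val.
	 */
	PVOP_ALT_VCALL2(cpu.write_msr, msr, val,
			"mov %%edi, %%ecx\n\t"
			"mov %%rsi, %%rax\n\t"
			"mov %%rsi, %%rdx\n\t"
			"shr $0x20, %%rdx\n\t"
			"wrmsr",
			ALT_NOT_XEN);
}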
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR
2025-04-25 7:01 ` Jürgen Groß
@ 2025-04-25 15:28 ` H. Peter Anvin
0 siblings, 0 replies; 94+ messages in thread
From: H. Peter Anvin @ 2025-04-25 15:28 UTC (permalink / raw)
To: Jürgen Groß, Xin Li, linux-kernel, kvm,
linux-perf-users, linux-hyperv, virtualization, linux-pm,
linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, acme, andrew.cooper3, peterz,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On April 25, 2025 12:01:29 AM PDT, "Jürgen Groß" <jgross@suse.com> wrote:
>On 25.04.25 05:44, H. Peter Anvin wrote:
>> On 4/24/25 18:15, H. Peter Anvin wrote:
>>> On 4/24/25 01:14, Jürgen Groß wrote:
>>>>>
>>>>> Actually, that is how we get this patch with the existing alternatives
>>>>> infrastructure. And we took a step further to also remove the pv_ops
>>>>> MSR APIs...
>>>>
>>>> And this is what I'm questioning. IMHO this approach is adding more
>>>> code by removing the pv_ops MSR APIs just because "pv_ops is bad". And
>>>> I believe most of the refusal of pv_ops is based on reasoning that is
>>>> no longer valid.
>>>>
>>>
>>> pvops are a headache because they are effectively a secondary alternatives infrastructure that is incompatible with the primary one...
>>>
>>>>> It looks to me that you want to add a new facility to the alternatives
>>>>> infrastructure first?
>>>>
>>>> Why would we need a new facility in the alternatives infrastructure?
>>>
>>> I'm not sure what Xin means with "facility", but a key motivation for this is to:
>>>
>>> a. Avoid using the pvops for MSRs when the only remaining user thereof (Xen) is only using it for a very small subset of MSRs and for the rest it is just overhead, even for Xen;
>>>
>>> b. Being able to do wrmsrns immediate/wrmsrns/wrmsr and rdmsr immediate/ rdmsr alternatives.
>>>
>>> Of these, (b) is by far the biggest motivation. The architectural direction for supervisor states is to avoid ad hoc ISA and XSAVES and instead use MSRs. The immediate forms are expected to be significantly faster, because they make the MSR index available at the very beginning of the pipeline instead of at a relatively late stage.
>>>
>>
>> Note that to support the immediate forms, we *must* do these inline, or the const-ness of the MSR index -- which applies to by far the vast majority of MSR references -- gets lost. pvops does exactly that: it loses the const-ness.
>>
>> Furthermore, the MSR immediate instructions take a 64-bit number in a single register; as these instructions are by necessity relatively long, it makes sense for the alternative sequence to accept a 64-bit input register and do the %eax/ %edx shuffle in the legacy fallback code... we did a bunch of experiments to see what made most sense.
>
>Yes, I understand that.
>
>And I'm totally in favor of Xin's rework of the MSR low level functions.
>
>Inlining the MSR access instructions with pv_ops should not be very
>complicated. We do that with other instructions (STI/CLI, PTE accesses)
>today, so this is no new kind of functionality.
>
>I could have a try writing a patch achieving that, but I would only start
>that work in case you might consider taking it instead of Xin's patch
>removing the pv_ops usage for rdmsr/wrmsr. In case it turns out that my
>version results in more code changes than Xin's patch, I'd be fine to drop
>my patch, of course.
>
>
>Juergen
The wrapper in question is painfully opaque, but if it is much simpler, then I'm certainly willing to consider it... but I don't really see how it would be possible given, among other things, the need for trap points for the safe MSRs.
Keep in mind this needs to work even without PV enabled!
Note that Andrew encouraged us to pursue the pvops removal for MSRs. Note that Xen benefits pretty heavily because, for the common case of fixed MSRs, it can dispatch directly to the proper path for the few that are left.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR
2025-04-25 1:15 ` H. Peter Anvin
2025-04-25 3:44 ` H. Peter Anvin
@ 2025-04-25 6:51 ` Jürgen Groß
1 sibling, 0 replies; 94+ messages in thread
From: Jürgen Groß @ 2025-04-25 6:51 UTC (permalink / raw)
To: H. Peter Anvin, Xin Li, linux-kernel, kvm, linux-perf-users,
linux-hyperv, virtualization, linux-pm, linux-edac, xen-devel,
linux-acpi, linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, acme, andrew.cooper3, peterz,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
[-- Attachment #1.1.1: Type: text/plain, Size: 2019 bytes --]
On 25.04.25 03:15, H. Peter Anvin wrote:
> On 4/24/25 01:14, Jürgen Groß wrote:
>>>
>>> Actually, that is how we get this patch with the existing alternatives
>>> infrastructure. And we took a step further to also remove the pv_ops
>>> MSR APIs...
>>
>> And this is what I'm questioning. IMHO this approach is adding more
>> code by removing the pv_ops MSR APIs just because "pv_ops is bad". And
>> I believe most of the refusal of pv_ops is based on reasoning that is
>> no longer valid.
>>
>
> pvops are a headache because they are effectively a secondary alternatives
> infrastructure that is incompatible with the primary one...
Huh? How can that be, as pv_ops uses only the alternatives infrastructure
for doing the patching?
I'd say that today pv_ops is a convenience wrapper around alternatives.
>
>>> It looks to me that you want to add a new facility to the alternatives
>>> infrastructure first?
>>
>> Why would we need a new facility in the alternatives infrastructure?
>
> I'm not sure what Xin means with "facility", but a key motivation for this is to:
>
> a. Avoid using the pvops for MSRs when the only remaining user thereof (Xen)
> is only using it for a very small subset of MSRs and for the rest it is just
> overhead, even for Xen;
>
> b. Being able to do wrmsrns immediate/wrmsrns/wrmsr and rdmsr immediate/rdmsr
> alternatives.
>
> Of these, (b) is by far the biggest motivation. The architectural direction for
> supervisor states is to avoid ad hoc ISA and XSAVES and instead use MSRs. The
> immediate forms are expected to be significantly faster, because they make the
> MSR index available at the very beginning of the pipeline instead of at a
> relatively late stage.
I understand the motivation for b), but I think this could be achieved without
a) rather easily. And I continue to believe that your reasoning for a) is based
on old facts. But maybe I'm just not understanding your concerns with today's
pv_ops implementation.
Juergen
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR
2025-04-23 16:05 ` Jürgen Groß
2025-04-24 8:06 ` Xin Li
@ 2025-04-25 12:33 ` Peter Zijlstra
2025-04-25 12:51 ` Jürgen Groß
2025-04-25 15:29 ` H. Peter Anvin
1 sibling, 2 replies; 94+ messages in thread
From: Peter Zijlstra @ 2025-04-25 12:33 UTC (permalink / raw)
To: Jürgen Groß
Cc: Xin Li, linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86, tglx, mingo, bp,
dave.hansen, x86, hpa, acme, andrew.cooper3, namhyung,
mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter,
kan.liang, wei.liu, ajay.kaher, bcm-kernel-feedback-list,
tony.luck, pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys,
haiyangz, decui
[-- Attachment #1: Type: text/plain, Size: 545 bytes --]
On Wed, Apr 23, 2025 at 06:05:19PM +0200, Jürgen Groß wrote:
> > It's not a major change, but when it is patched to use the immediate
> > form MSR write instruction, it's straightforwardly streamlined.
>
> It should be rather easy to switch the current wrmsr/rdmsr paravirt patching
> locations to use the rdmsr/wrmsr instructions instead of doing a call to
> native_*msr().
Right, just make the Xen functions asm stubs that expect the instruction
registers instead of the C ABI, and ALT_NOT_XEN the thing.
Shouldn't be hard at all.
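Spelled out, the suggestion is roughly the following sketch: the Xen
stub keeps the instruction's register convention (%ecx = MSR index,
%rax = value), so the call site becomes a plain alternative instead of
a pv_ops call (asm_xen_write_msr and ALT_NOT_XEN are names from this
series; the exact constraints here are illustrative):

static __always_inline void wrmsrq(u32 msr, u64 val)
{
	asm_inline volatile(ALTERNATIVE("call asm_xen_write_msr", /* Xen PV */
					"mov %%rax, %%rdx\n\t"
					"shr $0x20, %%rdx\n\t"
					"wrmsr",		  /* native */
					ALT_NOT_XEN)
			    : ASM_CALL_CONSTRAINT
			    : "c" (msr), "a" (val)
			    : "rdx", "memory");
}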
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR
2025-04-25 12:33 ` Peter Zijlstra
@ 2025-04-25 12:51 ` Jürgen Groß
2025-04-25 20:12 ` H. Peter Anvin
2025-04-25 15:29 ` H. Peter Anvin
1 sibling, 1 reply; 94+ messages in thread
From: Jürgen Groß @ 2025-04-25 12:51 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Xin Li, linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86, tglx, mingo, bp,
dave.hansen, x86, hpa, acme, andrew.cooper3, namhyung,
mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter,
kan.liang, wei.liu, ajay.kaher, bcm-kernel-feedback-list,
tony.luck, pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys,
haiyangz, decui
[-- Attachment #1.1.1: Type: text/plain, Size: 690 bytes --]
On 25.04.25 14:33, Peter Zijlstra wrote:
> On Wed, Apr 23, 2025 at 06:05:19PM +0200, Jürgen Groß wrote:
>
>>> It's not a major change, but when it is patched to use the immediate
>>> form MSR write instruction, it's straightforwardly streamlined.
>>
>> It should be rather easy to switch the current wrmsr/rdmsr paravirt patching
>> locations to use the rdmsr/wrmsr instructions instead of doing a call to
>> native_*msr().
>
> Right, just make the Xen functions asm stubs that expect the instruction
> registers instead of the C ABI, and ALT_NOT_XEN the thing.
>
> Shouldn't be hard at all.
Correct. And for the new immediate form we can use ALTERNATIVE_3().
Juergen
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR
2025-04-25 12:51 ` Jürgen Groß
@ 2025-04-25 20:12 ` H. Peter Anvin
0 siblings, 0 replies; 94+ messages in thread
From: H. Peter Anvin @ 2025-04-25 20:12 UTC (permalink / raw)
To: Jürgen Groß, Peter Zijlstra
Cc: Xin Li, linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86, tglx, mingo, bp,
dave.hansen, x86, acme, andrew.cooper3, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
On April 25, 2025 5:51:27 AM PDT, "Jürgen Groß" <jgross@suse.com> wrote:
>On 25.04.25 14:33, Peter Zijlstra wrote:
>> On Wed, Apr 23, 2025 at 06:05:19PM +0200, Jürgen Groß wrote:
>>
>>>> It's not a major change, but when it is patched to use the immediate
>>>> form MSR write instruction, it's straightforwardly streamlined.
>>>
>>> It should be rather easy to switch the current wrmsr/rdmsr paravirt patching
>>> locations to use the rdmsr/wrmsr instructions instead of doing a call to
>>> native_*msr().
>>
>> Right, just make the Xen functions asm stubs that expect the instruction
>> registers instead of the C ABI, and ALT_NOT_XEN the thing.
>>
>> Shouldn't be hard at all.
>
>Correct. And for the new immediate form we can use ALTERNATIVE_3().
>
>
>Juergen
Yes; in the ultimate case there are *four* alternatives, but the concept is the same and again we have it implemented already.
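Putting Jürgen's ALTERNATIVE_3() remark and the four-way case together
as one sketch, reusing PREPARE_RDX_FOR_WRMSR, ASM_WRMSRNS_IMM,
X86_FEATURE_MSR_IMM and ALT_NOT_XEN from this series (X86_FEATURE_WRMSRNS
and the wrmsrns mnemonic are assumed; ordering matters, since a later
matching feature wins):

static __always_inline void wrmsrq_constant(u32 msr, u64 val)
{
	/* Assumes Xen PV never sets the WRMSRNS/MSR_IMM feature bits. */
	asm_inline volatile(
		ALTERNATIVE_3("call asm_xen_write_msr",	/* 1: Xen PV stub */
			      PREPARE_RDX_FOR_WRMSR
			      "wrmsr", ALT_NOT_XEN,	/* 2: legacy */
			      PREPARE_RDX_FOR_WRMSR
			      "wrmsrns",		/* 3: non-serializing */
			      X86_FEATURE_WRMSRNS,
			      ASM_WRMSRNS_IMM,		/* 4: immediate form */
			      X86_FEATURE_MSR_IMM)
		: ASM_CALL_CONSTRAINT
		: "c" (msr), [msr] "i" (msr), [val] "a" (val)
		: "rdx", "memory");
}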
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR
2025-04-25 12:33 ` Peter Zijlstra
2025-04-25 12:51 ` Jürgen Groß
@ 2025-04-25 15:29 ` H. Peter Anvin
1 sibling, 0 replies; 94+ messages in thread
From: H. Peter Anvin @ 2025-04-25 15:29 UTC (permalink / raw)
To: Peter Zijlstra, Jürgen Groß
Cc: Xin Li, linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86, tglx, mingo, bp,
dave.hansen, x86, acme, andrew.cooper3, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
On April 25, 2025 5:33:17 AM PDT, Peter Zijlstra <peterz@infradead.org> wrote:
>On Wed, Apr 23, 2025 at 06:05:19PM +0200, Jürgen Groß wrote:
>
>> > It's not a major change, but when it is patched to use the immediate
>> > form MSR write instruction, it's straightforwardly streamlined.
>>
>> It should be rather easy to switch the current wrmsr/rdmsr paravirt patching
>> locations to use the rdmsr/wrmsr instructions instead of doing a call to
>> native_*msr().
>
>Right, just make the Xen functions asm stubs that expect the instruction
>registers instead of the C ABI, and ALT_NOT_XEN the thing.
>
>Shouldn't be hard at all.
And that's what we will be doing. We already have code for that.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR
2025-04-22 9:57 ` Jürgen Groß
2025-04-23 8:51 ` Xin Li
@ 2025-04-25 7:11 ` Peter Zijlstra
1 sibling, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2025-04-25 7:11 UTC (permalink / raw)
To: Jürgen Groß
Cc: Xin Li (Intel), linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86, tglx, mingo, bp,
dave.hansen, x86, hpa, acme, andrew.cooper3, namhyung,
mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter,
kan.liang, wei.liu, ajay.kaher, bcm-kernel-feedback-list,
tony.luck, pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys,
haiyangz, decui
[-- Attachment #1: Type: text/plain, Size: 634 bytes --]
On Tue, Apr 22, 2025 at 11:57:01AM +0200, Jürgen Groß wrote:
> On 22.04.25 10:22, Xin Li (Intel) wrote:
> > This becomes even more silly for trivial instructions like STI/CLI
> > or in the worst case paravirt_nop().
>
> This is nonsense.
What Jurgen says. Someone hasn't done their homework.
static __always_inline void arch_local_irq_disable(void)
{
PVOP_ALT_VCALLEE0(irq.irq_disable, "cli;", ALT_NOT_XEN);
}
static __always_inline void arch_local_irq_enable(void)
{
PVOP_ALT_VCALLEE0(irq.irq_enable, "sti;", ALT_NOT_XEN);
}
That very much patches in STI/CLI directly when not Xen.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 94+ messages in thread
* [RFC PATCH v2 22/34] x86/msr: Utilize the alternatives mechanism to read MSR
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (20 preceding siblings ...)
2025-04-22 8:22 ` [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR Xin Li (Intel)
@ 2025-04-22 8:22 ` Xin Li (Intel)
2025-04-22 8:59 ` Jürgen Groß
2025-04-22 11:12 ` Jürgen Groß
2025-04-22 8:22 ` [RFC PATCH v2 23/34] x86/extable: Remove new dead code in ex_handler_msr() Xin Li (Intel)
` (12 subsequent siblings)
34 siblings, 2 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:22 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
To eliminate the indirect call overhead introduced by the pv_ops API,
utilize the alternatives mechanism to read MSR:
1) When built with !CONFIG_XEN_PV, X86_FEATURE_XENPV becomes a
disabled feature, preventing the Xen code from being built
and ensuring the native code is executed unconditionally.
2) When built with CONFIG_XEN_PV:
2.1) If not running on the Xen hypervisor (!X86_FEATURE_XENPV),
the kernel runtime binary is patched to unconditionally
jump to the native MSR read code.
2.2) If running on the Xen hypervisor (X86_FEATURE_XENPV), the
kernel runtime binary is patched to unconditionally jump
to the Xen MSR read code.
The alternatives mechanism is also used to choose the new immediate
form MSR read instruction when it's available.
Consequently, remove the pv_ops MSR read APIs and the Xen callbacks.
Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/include/asm/msr.h | 277 +++++++++++++++++++-------
arch/x86/include/asm/paravirt.h | 40 ----
arch/x86/include/asm/paravirt_types.h | 9 -
arch/x86/kernel/paravirt.c | 2 -
arch/x86/xen/enlighten_pv.c | 48 ++---
arch/x86/xen/xen-asm.S | 49 +++++
arch/x86/xen/xen-ops.h | 7 +
7 files changed, 279 insertions(+), 153 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index bd3bdb3c3d23..5271cb002b23 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -75,6 +75,7 @@ static inline void do_trace_rdpmc(u32 msr, u64 val, int failed) {}
#endif
#ifdef CONFIG_XEN_PV
+extern void asm_xen_read_msr(void);
extern void asm_xen_write_msr(void);
extern u64 xen_read_pmc(int counter);
#endif
@@ -88,6 +89,8 @@ extern u64 xen_read_pmc(int counter);
/* The GNU Assembler (Gas) with Binutils 2.41 adds the .insn directive support */
#if defined(CONFIG_AS_IS_GNU) && CONFIG_AS_VERSION >= 24100
+#define ASM_RDMSR_IMM \
+ " .insn VEX.128.F2.M7.W0 0xf6 /0, %[msr]%{:u32}, %[val]\n\t"
#define ASM_WRMSRNS_IMM \
" .insn VEX.128.F3.M7.W0 0xf6 /0, %[val], %[msr]%{:u32}\n\t"
#else
@@ -97,10 +100,17 @@ extern u64 xen_read_pmc(int counter);
* The register operand is encoded as %rax because all uses of the immediate
* form MSR access instructions reference %rax as the register operand.
*/
+#define ASM_RDMSR_IMM \
+ " .byte 0xc4,0xe7,0x7b,0xf6,0xc0; .long %c[msr]"
#define ASM_WRMSRNS_IMM \
" .byte 0xc4,0xe7,0x7a,0xf6,0xc0; .long %c[msr]"
#endif
+#define RDMSR_AND_SAVE_RESULT \
+ "rdmsr\n\t" \
+ "shl $0x20, %%rdx\n\t" \
+ "or %%rdx, %%rax\n\t"
+
#define PREPARE_RDX_FOR_WRMSR \
"mov %%rax, %%rdx\n\t" \
"shr $0x20, %%rdx\n\t"
@@ -127,35 +137,135 @@ static __always_inline bool is_msr_imm_insn(void *ip)
#endif
}
-static __always_inline u64 __rdmsr(u32 msr)
+/*
+ * There are two sets of APIs for MSR accesses: native APIs and generic APIs.
+ * Native MSR APIs execute MSR instructions directly, regardless of whether the
+ * CPU is paravirtualized or native. Generic MSR APIs determine the appropriate
+ * MSR access method at runtime, allowing them to be used generically on both
+ * paravirtualized and native CPUs.
+ *
+ * When the compiler can determine the MSR number at compile time, the APIs
+ * with the suffix _constant() are used to enable the immediate form MSR
+ * instructions when available. The APIs with the suffix _variable() are
+ * used when the MSR number is not known until run time.
+ *
+ * Below is a diagram illustrating the derivation of the MSR read APIs:
+ *
+ * __native_rdmsrq_variable() __native_rdmsrq_constant()
+ * \ /
+ * \ /
+ * __native_rdmsrq() -----------------------
+ * / \ |
+ * / \ |
+ * native_rdmsrq() native_read_msr_safe() |
+ * / \ |
+ * / \ |
+ * native_rdmsr() native_read_msr() |
+ * |
+ * |
+ * |
+ * __xenpv_rdmsrq() |
+ * | |
+ * | |
+ * __rdmsrq() <--------------------------------
+ * / \
+ * / \
+ * rdmsrq() rdmsrq_safe()
+ * / \
+ * / \
+ * rdmsr() rdmsr_safe()
+ */
+
+static __always_inline bool __native_rdmsrq_variable(u32 msr, u64 *val, int type)
{
- DECLARE_ARGS(val, low, high);
+#ifdef CONFIG_X86_64
+ BUILD_BUG_ON(__builtin_constant_p(msr));
- asm volatile("1: rdmsr\n"
- "2:\n"
- _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR)
- : EAX_EDX_RET(val, low, high) : "c" (msr));
+ asm_inline volatile goto(
+ "1:\n"
+ RDMSR_AND_SAVE_RESULT
+ _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For RDMSR */
- return EAX_EDX_VAL(val, low, high);
+ : [val] "=a" (*val)
+ : "c" (msr), [type] "i" (type)
+ : "rdx"
+ : badmsr);
+#else
+ asm_inline volatile goto(
+ "1: rdmsr\n\t"
+ _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For RDMSR */
+
+ : "=A" (*val)
+ : "c" (msr), [type] "i" (type)
+ :
+ : badmsr);
+#endif
+
+ return false;
+
+badmsr:
+ *val = 0;
+
+ return true;
}
-#define native_rdmsr(msr, val1, val2) \
-do { \
- u64 __val = __rdmsr((msr)); \
- (void)((val1) = (u32)__val); \
- (void)((val2) = (u32)(__val >> 32)); \
-} while (0)
+#ifdef CONFIG_X86_64
+static __always_inline bool __native_rdmsrq_constant(u32 msr, u64 *val, int type)
+{
+ BUILD_BUG_ON(!__builtin_constant_p(msr));
+
+ asm_inline volatile goto(
+ "1:\n"
+ ALTERNATIVE("mov %[msr], %%ecx\n\t"
+ "2:\n"
+ RDMSR_AND_SAVE_RESULT,
+ ASM_RDMSR_IMM,
+ X86_FEATURE_MSR_IMM)
+ _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For RDMSR immediate */
+ _ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type]) /* For RDMSR */
+
+ : [val] "=a" (*val)
+ : [msr] "i" (msr), [type] "i" (type)
+ : "ecx", "rdx"
+ : badmsr);
+
+ return false;
+
+badmsr:
+ *val = 0;
+
+ return true;
+}
+#endif
+
+static __always_inline bool __native_rdmsrq(u32 msr, u64 *val, int type)
+{
+#ifdef CONFIG_X86_64
+ if (__builtin_constant_p(msr))
+ return __native_rdmsrq_constant(msr, val, type);
+#endif
+
+ return __native_rdmsrq_variable(msr, val, type);
+}
static __always_inline u64 native_rdmsrq(u32 msr)
{
- return __rdmsr(msr);
+ u64 val = 0;
+
+ __native_rdmsrq(msr, &val, EX_TYPE_RDMSR);
+ return val;
}
+#define native_rdmsr(msr, low, high) \
+do { \
+ u64 __val = native_rdmsrq(msr); \
+ (void)((low) = (u32)__val); \
+ (void)((high) = (u32)(__val >> 32)); \
+} while (0)
+
static inline u64 native_read_msr(u32 msr)
{
- u64 val;
-
- val = __rdmsr(msr);
+ u64 val = native_rdmsrq(msr);
if (tracepoint_enabled(read_msr))
do_trace_read_msr(msr, val, 0);
@@ -163,36 +273,91 @@ static inline u64 native_read_msr(u32 msr)
return val;
}
-static inline int native_read_msr_safe(u32 msr, u64 *p)
+static inline int native_read_msr_safe(u32 msr, u64 *val)
{
int err;
- DECLARE_ARGS(val, low, high);
- asm volatile("1: rdmsr ; xor %[err],%[err]\n"
- "2:\n\t"
- _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %[err])
- : [err] "=r" (err), EAX_EDX_RET(val, low, high)
- : "c" (msr));
- if (tracepoint_enabled(read_msr))
- do_trace_read_msr(msr, EAX_EDX_VAL(val, low, high), err);
+ err = __native_rdmsrq(msr, val, EX_TYPE_RDMSR_SAFE) ? -EIO : 0;
- *p = EAX_EDX_VAL(val, low, high);
+ if (tracepoint_enabled(read_msr))
+ do_trace_read_msr(msr, *val, err);
return err;
}
+#ifdef CONFIG_XEN_PV
+/* No plan to support immediate form MSR instructions in Xen */
+static __always_inline bool __xenpv_rdmsrq(u32 msr, u64 *val, int type)
+{
+ asm_inline volatile goto(
+ "1: call asm_xen_read_msr\n\t"
+ _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For CALL */
+
+ : [val] "=a" (*val), ASM_CALL_CONSTRAINT
+ : "c" (msr), [type] "i" (type)
+ : "rdx"
+ : badmsr);
+
+ return false;
+
+badmsr:
+ *val = 0;
+
+ return true;
+}
+#endif
+
+static __always_inline bool __rdmsrq(u32 msr, u64 *val, int type)
+{
+ bool ret;
+
+#ifdef CONFIG_XEN_PV
+ if (cpu_feature_enabled(X86_FEATURE_XENPV))
+ return __xenpv_rdmsrq(msr, val, type);
+#endif
+
+ /*
+ * 1) When built with !CONFIG_XEN_PV.
+ * 2) When built with CONFIG_XEN_PV but not running on Xen hypervisor.
+ */
+ ret = __native_rdmsrq(msr, val, type);
+
+ if (tracepoint_enabled(read_msr))
+ do_trace_read_msr(msr, *val, ret ? -EIO : 0);
+
+ return ret;
+}
+
+#define rdmsrq(msr, val) \
+do { \
+ u64 ___val = 0; \
+ __rdmsrq((msr), &___val, EX_TYPE_RDMSR); \
+ (val) = ___val; \
+} while (0)
+
+#define rdmsr(msr, low, high) \
+do { \
+ u64 __val = 0; \
+ rdmsrq((msr), __val); \
+ (void)((low) = (u32)__val); \
+ (void)((high) = (u32)(__val >> 32)); \
+} while (0)
+
+static __always_inline int rdmsrq_safe(u32 msr, u64 *val)
+{
+ return __rdmsrq(msr, val, EX_TYPE_RDMSR_SAFE) ? -EIO : 0;
+}
+
+#define rdmsr_safe(msr, low, high) \
+({ \
+ u64 __val = 0; \
+ int __err = rdmsrq_safe((msr), &__val); \
+ (*low) = (u32)__val; \
+ (*high) = (u32)(__val >> 32); \
+ __err; \
+})
+
/*
- * There are two sets of APIs for MSR accesses: native APIs and generic APIs.
- * Native MSR APIs execute MSR instructions directly, regardless of whether the
- * CPU is paravirtualized or native. Generic MSR APIs determine the appropriate
- * MSR access method at runtime, allowing them to be used generically on both
- * paravirtualized and native CPUs.
- *
- * When the compiler can determine the MSR number at compile time, the APIs
- * with the suffix _constant() are used to enable the immediate form MSR
- * instructions when available. The APIs with the suffix _variable() are
- * used when the MSR number is not known until run time.
- *
* Below is a diagram illustrating the derivation of the MSR write APIs:
*
* __native_wrmsrq_variable() __native_wrmsrq_constant()
@@ -420,42 +585,6 @@ static __always_inline u64 rdpmcq(int counter)
return native_rdpmcq(counter);
}
-#ifdef CONFIG_PARAVIRT_XXL
-#include <asm/paravirt.h>
-#else
-#include <linux/errno.h>
-/*
- * Access to machine-specific registers (available on 586 and better only)
- * Note: the rd* operations modify the parameters directly (without using
- * pointer indirection), this allows gcc to optimize better
- */
-
-#define rdmsr(msr, low, high) \
-do { \
- u64 __val = native_read_msr((msr)); \
- (void)((low) = (u32)__val); \
- (void)((high) = (u32)(__val >> 32)); \
-} while (0)
-
-#define rdmsrq(msr, val) \
- ((val) = native_read_msr((msr)))
-
-/* rdmsr with exception handling */
-#define rdmsr_safe(msr, low, high) \
-({ \
- u64 __val; \
- int __err = native_read_msr_safe((msr), &__val); \
- (*low) = (u32)__val; \
- (*high) = (u32)(__val >> 32); \
- __err; \
-})
-
-static inline int rdmsrq_safe(u32 msr, u64 *p)
-{
- return native_read_msr_safe(msr, p);
-}
-#endif /* !CONFIG_PARAVIRT_XXL */
-
struct msr __percpu *msrs_alloc(void);
void msrs_free(struct msr __percpu *msrs);
int msr_set_bit(u32 msr, u8 bit);
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 6634f6cf801f..e248a77b719f 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -175,46 +175,6 @@ static inline void __write_cr4(unsigned long x)
PVOP_VCALL1(cpu.write_cr4, x);
}
-static inline u64 paravirt_read_msr(unsigned msr)
-{
- return PVOP_CALL1(u64, cpu.read_msr, msr);
-}
-
-static inline u64 paravirt_read_msr_safe(unsigned msr, int *err)
-{
- return PVOP_CALL2(u64, cpu.read_msr_safe, msr, err);
-}
-
-#define rdmsr(msr, val1, val2) \
-do { \
- u64 _l = paravirt_read_msr(msr); \
- val1 = (u32)_l; \
- val2 = _l >> 32; \
-} while (0)
-
-#define rdmsrq(msr, val) \
-do { \
- val = paravirt_read_msr(msr); \
-} while (0)
-
-/* rdmsr with exception handling */
-#define rdmsr_safe(msr, a, b) \
-({ \
- int _err; \
- u64 _l = paravirt_read_msr_safe(msr, &_err); \
- (*a) = (u32)_l; \
- (*b) = _l >> 32; \
- _err; \
-})
-
-static inline int rdmsrq_safe(unsigned msr, u64 *p)
-{
- int err;
-
- *p = paravirt_read_msr_safe(msr, &err);
- return err;
-}
-
static inline void paravirt_alloc_ldt(struct desc_struct *ldt, unsigned entries)
{
PVOP_VCALL2(cpu.alloc_ldt, ldt, entries);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 18bb0e5bd22f..ae31ecf08933 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -90,15 +90,6 @@ struct pv_cpu_ops {
void (*cpuid)(unsigned int *eax, unsigned int *ebx,
unsigned int *ecx, unsigned int *edx);
- /* Unsafe MSR operations. These will warn or panic on failure. */
- u64 (*read_msr)(unsigned int msr);
-
- /*
- * Safe MSR operations.
- * Returns 0 or -EIO.
- */
- int (*read_msr_safe)(unsigned int msr, u64 *val);
-
void (*start_context_switch)(struct task_struct *prev);
void (*end_context_switch)(struct task_struct *next);
#endif
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 62bf66f61821..9f5eb8a78040 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -128,8 +128,6 @@ struct paravirt_patch_template pv_ops = {
.cpu.read_cr0 = native_read_cr0,
.cpu.write_cr0 = native_write_cr0,
.cpu.write_cr4 = native_write_cr4,
- .cpu.read_msr = native_read_msr,
- .cpu.read_msr_safe = native_read_msr_safe,
.cpu.load_tr_desc = native_load_tr_desc,
.cpu.set_ldt = native_set_ldt,
.cpu.load_gdt = native_load_gdt,
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 4672de7fc084..267e241b9236 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1086,19 +1086,26 @@ static void xen_write_cr4(unsigned long cr4)
native_write_cr4(cr4);
}
-static u64 xen_do_read_msr(unsigned int msr, int *err)
+/*
+ * Return true in xen_rdmsr_ret_type to indicate the requested MSR read has
+ * been done successfully.
+ */
+struct xen_rdmsr_ret_type xen_read_msr(u32 msr)
{
- u64 val = 0; /* Avoid uninitialized value for safe variant. */
- bool emulated;
+ struct xen_rdmsr_ret_type ret;
- if (pmu_msr_chk_emulated(msr, &val, true, &emulated) && emulated)
- return val;
+ ret.done = true;
- if (err)
- *err = native_read_msr_safe(msr, &val);
- else
- val = native_read_msr(msr);
+ if (pmu_msr_chk_emulated(msr, &ret.val, true, &ret.done) && ret.done)
+ return ret;
+
+ ret.val = 0;
+ ret.done = false;
+ return ret;
+}
+u64 xen_read_msr_fixup(u32 msr, u64 val)
+{
switch (msr) {
case MSR_IA32_APICBASE:
val &= ~X2APIC_ENABLE;
@@ -1107,7 +1114,11 @@ static u64 xen_do_read_msr(unsigned int msr, int *err)
else
val &= ~MSR_IA32_APICBASE_BSP;
break;
+
+ default:
+ break;
}
+
return val;
}
@@ -1159,21 +1170,6 @@ bool xen_write_msr(u32 msr, u64 val)
}
}
-static int xen_read_msr_safe(unsigned int msr, u64 *val)
-{
- int err;
-
- *val = xen_do_read_msr(msr, &err);
- return err;
-}
-
-static u64 xen_read_msr(unsigned int msr)
-{
- int err;
-
- return xen_do_read_msr(msr, xen_msr_safe ? &err : NULL);
-}
-
/* This is called once we have the cpu_possible_mask */
void __init xen_setup_vcpu_info_placement(void)
{
@@ -1208,10 +1204,6 @@ static const typeof(pv_ops) xen_cpu_ops __initconst = {
.write_cr4 = xen_write_cr4,
- .read_msr = xen_read_msr,
-
- .read_msr_safe = xen_read_msr_safe,
-
.load_tr_desc = paravirt_nop,
.set_ldt = xen_set_ldt,
.load_gdt = xen_load_gdt,
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index eecce47fbe49..62270ef85c56 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -406,3 +406,52 @@ SYM_CODE_END(xen_entry_SYSCALL_compat)
RET
SYM_FUNC_END(asm_xen_write_msr)
EXPORT_SYMBOL_GPL(asm_xen_write_msr)
+
+/*
+ * MSR number in %ecx, MSR value will be returned in %rax.
+ *
+ * The prototype of the Xen C code:
+ * struct { u64 val; bool done; } xen_read_msr(u32 msr)
+ */
+SYM_FUNC_START(asm_xen_read_msr)
+ ENDBR
+ FRAME_BEGIN
+ XEN_SAVE_CALLEE_REGS_FOR_MSR
+ mov %ecx, %edi /* MSR number */
+ call xen_read_msr
+ test %dl, %dl /* %dl=1, i.e., ZF=0, meaning successfully done */
+ XEN_RESTORE_CALLEE_REGS_FOR_MSR
+ jnz 2f
+
+ /*
+ * Falls through to the native RDMSR instruction if xen_read_msr() failed,
+ * i.e., the MSR access should be executed natively, which will trigger a
+ * #GP fault...
+ */
+1: rdmsr
+
+ /*
+ * Note, #GP on RDMSR is reflected to the caller of this function through
+ * EX_TYPE_FUNC_REWIND, which enforces a coupling between the caller and
+ * callee, IOW the callee is able to calculate the address of the CALL
+ * instruction in the caller that invoked it.
+ *
+ * The top of the stack points directly at the return address;
+ * back up by 5 bytes (length of the CALL instruction in the caller) from
+ * the return address.
+ */
+ _ASM_EXTABLE_FUNC_REWIND(1b, -5, FRAME_OFFSET / (BITS_PER_LONG / 8))
+
+ shl $0x20, %rdx
+ or %rdx, %rax
+
+ XEN_SAVE_CALLEE_REGS_FOR_MSR
+ mov %ecx, %edi
+ mov %rax, %rsi
+ call xen_read_msr_fixup
+ XEN_RESTORE_CALLEE_REGS_FOR_MSR
+
+2: FRAME_END
+ RET
+SYM_FUNC_END(asm_xen_read_msr)
+EXPORT_SYMBOL_GPL(asm_xen_read_msr)
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 56712242262a..483526ec13c6 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -146,7 +146,14 @@ __visible unsigned long xen_read_cr2_direct(void);
/* These are not functions, and cannot be called normally */
__visible void xen_iret(void);
+struct xen_rdmsr_ret_type {
+ u64 val;
+ bool done;
+};
+
extern bool xen_write_msr(u32 msr, u64 val);
+extern struct xen_rdmsr_ret_type xen_read_msr(u32 msr);
+extern u64 xen_read_msr_fixup(u32 msr, u64 val);
extern int xen_panic_handler_init(void);
--
2.49.0
^ permalink raw reply related [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 22/34] x86/msr: Utilize the alternatives mechanism to read MSR
2025-04-22 8:22 ` [RFC PATCH v2 22/34] x86/msr: Utilize the alternatives mechanism to read MSR Xin Li (Intel)
@ 2025-04-22 8:59 ` Jürgen Groß
2025-04-22 9:20 ` Xin Li
2025-04-22 11:12 ` Jürgen Groß
1 sibling, 1 reply; 94+ messages in thread
From: Jürgen Groß @ 2025-04-22 8:59 UTC (permalink / raw)
To: Xin Li (Intel), linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
[-- Attachment #1.1.1: Type: text/plain, Size: 1135 bytes --]
On 22.04.25 10:22, Xin Li (Intel) wrote:
> To eliminate the indirect call overhead introduced by the pv_ops API,
> utilize the alternatives mechanism to read MSR:
>
> 1) When built with !CONFIG_XEN_PV, X86_FEATURE_XENPV becomes a
> disabled feature, preventing the Xen code from being built
> and ensuring the native code is executed unconditionally.
>
> 2) When built with CONFIG_XEN_PV:
>
> 2.1) If not running on the Xen hypervisor (!X86_FEATURE_XENPV),
> the kernel runtime binary is patched to unconditionally
> jump to the native MSR read code.
>
> 2.2) If running on the Xen hypervisor (X86_FEATURE_XENPV), the
> kernel runtime binary is patched to unconditionally jump
> to the Xen MSR read code.
>
> The alternatives mechanism is also used to choose the new immediate
> form MSR read instruction when it's available.
>
> Consequently, remove the pv_ops MSR read APIs and the Xen callbacks.
Same as the comment to patch 5: there is no indirect call overhead after
the system has come up.
Juergen
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 22/34] x86/msr: Utilize the alternatives mechanism to read MSR
2025-04-22 8:59 ` Jürgen Groß
@ 2025-04-22 9:20 ` Xin Li
2025-04-22 9:57 ` Jürgen Groß
0 siblings, 1 reply; 94+ messages in thread
From: Xin Li @ 2025-04-22 9:20 UTC (permalink / raw)
To: Jürgen Groß, linux-kernel, kvm, linux-perf-users,
linux-hyperv, virtualization, linux-pm, linux-edac, xen-devel,
linux-acpi, linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 4/22/2025 1:59 AM, Jürgen Groß wrote:
> On 22.04.25 10:22, Xin Li (Intel) wrote:
>> To eliminate the indirect call overhead introduced by the pv_ops API,
>> utilize the alternatives mechanism to read MSR:
>>
>> 1) When built with !CONFIG_XEN_PV, X86_FEATURE_XENPV becomes a
>> disabled feature, preventing the Xen code from being built
>> and ensuring the native code is executed unconditionally.
>>
>> 2) When built with CONFIG_XEN_PV:
>>
>> 2.1) If not running on the Xen hypervisor (!X86_FEATURE_XENPV),
>> the kernel runtime binary is patched to unconditionally
>> jump to the native MSR read code.
>>
>> 2.2) If running on the Xen hypervisor (X86_FEATURE_XENPV), the
>> kernel runtime binary is patched to unconditionally jump
>> to the Xen MSR read code.
>>
>> The alternatives mechanism is also used to choose the new immediate
>> form MSR read instruction when it's available.
>>
>> Consequently, remove the pv_ops MSR read APIs and the Xen callbacks.
>
> Same as the comment to patch 5: there is no indirect call overhead after
> the system has come up.
>
Please check https://lore.kernel.org/lkml/87y1h81ht4.ffs@tglx/.
And it was also mentioned in the previous patch:
https://lore.kernel.org/lkml/20250422082216.1954310-22-xin@zytor.com/
Please let me know what I have missed.
Thanks!
Xin
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 22/34] x86/msr: Utilize the alternatives mechanism to read MSR
2025-04-22 9:20 ` Xin Li
@ 2025-04-22 9:57 ` Jürgen Groß
0 siblings, 0 replies; 94+ messages in thread
From: Jürgen Groß @ 2025-04-22 9:57 UTC (permalink / raw)
To: Xin Li, linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
[-- Attachment #1.1.1: Type: text/plain, Size: 1651 bytes --]
On 22.04.25 11:20, Xin Li wrote:
> On 4/22/2025 1:59 AM, Jürgen Groß wrote:
>> On 22.04.25 10:22, Xin Li (Intel) wrote:
>>> To eliminate the indirect call overhead introduced by the pv_ops API,
>>> utilize the alternatives mechanism to read MSR:
>>>
>>> 1) When built with !CONFIG_XEN_PV, X86_FEATURE_XENPV becomes a
>>> disabled feature, preventing the Xen code from being built
>>> and ensuring the native code is executed unconditionally.
>>>
>>> 2) When built with CONFIG_XEN_PV:
>>>
>>> 2.1) If not running on the Xen hypervisor (!X86_FEATURE_XENPV),
>>> the kernel runtime binary is patched to unconditionally
>>> jump to the native MSR read code.
>>>
>>> 2.2) If running on the Xen hypervisor (X86_FEATURE_XENPV), the
>>> kernel runtime binary is patched to unconditionally jump
>>> to the Xen MSR read code.
>>>
>>> The alternatives mechanism is also used to choose the new immediate
>>> form MSR read instruction when it's available.
>>>
>>> Consequently, remove the pv_ops MSR read APIs and the Xen callbacks.
>>
>> Same as the comment to patch 5: there is no indirect call overhead after
>> the system has come up.
>>
>
> Please check https://lore.kernel.org/lkml/87y1h81ht4.ffs@tglx/.
>
> And it was also mentioned in the previous patch:
>
> https://lore.kernel.org/lkml/20250422082216.1954310-22-xin@zytor.com/
>
> Please let me know what I have missed.
Please see my response to the previous patch.
Juergen
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 22/34] x86/msr: Utilize the alternatives mechanism to read MSR
2025-04-22 8:22 ` [RFC PATCH v2 22/34] x86/msr: Utilize the alternatives mechanism to read MSR Xin Li (Intel)
2025-04-22 8:59 ` Jürgen Groß
@ 2025-04-22 11:12 ` Jürgen Groß
2025-04-23 9:03 ` Xin Li
1 sibling, 1 reply; 94+ messages in thread
From: Jürgen Groß @ 2025-04-22 11:12 UTC (permalink / raw)
To: Xin Li (Intel), linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
[-- Attachment #1.1.1: Type: text/plain, Size: 10126 bytes --]
On 22.04.25 10:22, Xin Li (Intel) wrote:
> To eliminate the indirect call overhead introduced by the pv_ops API,
> utilize the alternatives mechanism to read MSR:
>
> 1) When built with !CONFIG_XEN_PV, X86_FEATURE_XENPV becomes a
> disabled feature, preventing the Xen code from being built
> and ensuring the native code is executed unconditionally.
>
> 2) When built with CONFIG_XEN_PV:
>
> 2.1) If not running on the Xen hypervisor (!X86_FEATURE_XENPV),
> the kernel runtime binary is patched to unconditionally
> jump to the native MSR read code.
>
> 2.2) If running on the Xen hypervisor (X86_FEATURE_XENPV), the
> kernel runtime binary is patched to unconditionally jump
> to the Xen MSR read code.
>
> The alternatives mechanism is also used to choose the new immediate
> form MSR read instruction when it's available.
>
> Consequently, remove the pv_ops MSR read APIs and the Xen callbacks.
>
> Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
> ---
> arch/x86/include/asm/msr.h | 277 +++++++++++++++++++-------
> arch/x86/include/asm/paravirt.h | 40 ----
> arch/x86/include/asm/paravirt_types.h | 9 -
> arch/x86/kernel/paravirt.c | 2 -
> arch/x86/xen/enlighten_pv.c | 48 ++---
> arch/x86/xen/xen-asm.S | 49 +++++
> arch/x86/xen/xen-ops.h | 7 +
> 7 files changed, 279 insertions(+), 153 deletions(-)
>
> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
> index bd3bdb3c3d23..5271cb002b23 100644
> --- a/arch/x86/include/asm/msr.h
> +++ b/arch/x86/include/asm/msr.h
> @@ -75,6 +75,7 @@ static inline void do_trace_rdpmc(u32 msr, u64 val, int failed) {}
> #endif
>
> #ifdef CONFIG_XEN_PV
> +extern void asm_xen_read_msr(void);
> extern void asm_xen_write_msr(void);
> extern u64 xen_read_pmc(int counter);
> #endif
> @@ -88,6 +89,8 @@ extern u64 xen_read_pmc(int counter);
>
> /* The GNU Assembler (Gas) with Binutils 2.41 adds the .insn directive support */
> #if defined(CONFIG_AS_IS_GNU) && CONFIG_AS_VERSION >= 24100
> +#define ASM_RDMSR_IMM \
> + " .insn VEX.128.F2.M7.W0 0xf6 /0, %[msr]%{:u32}, %[val]\n\t"
> #define ASM_WRMSRNS_IMM \
> " .insn VEX.128.F3.M7.W0 0xf6 /0, %[val], %[msr]%{:u32}\n\t"
> #else
> @@ -97,10 +100,17 @@ extern u64 xen_read_pmc(int counter);
> * The register operand is encoded as %rax because all uses of the immediate
> * form MSR access instructions reference %rax as the register operand.
> */
> +#define ASM_RDMSR_IMM \
> + " .byte 0xc4,0xe7,0x7b,0xf6,0xc0; .long %c[msr]"
> #define ASM_WRMSRNS_IMM \
> " .byte 0xc4,0xe7,0x7a,0xf6,0xc0; .long %c[msr]"
> #endif
>
> +#define RDMSR_AND_SAVE_RESULT \
> + "rdmsr\n\t" \
> + "shl $0x20, %%rdx\n\t" \
> + "or %%rdx, %%rax\n\t"
> +
> #define PREPARE_RDX_FOR_WRMSR \
> "mov %%rax, %%rdx\n\t" \
> "shr $0x20, %%rdx\n\t"
> @@ -127,35 +137,135 @@ static __always_inline bool is_msr_imm_insn(void *ip)
> #endif
> }
>
> -static __always_inline u64 __rdmsr(u32 msr)
> +/*
> + * There are two sets of APIs for MSR accesses: native APIs and generic APIs.
> + * Native MSR APIs execute MSR instructions directly, regardless of whether the
> + * CPU is paravirtualized or native. Generic MSR APIs determine the appropriate
> + * MSR access method at runtime, allowing them to be used generically on both
> + * paravirtualized and native CPUs.
> + *
> + * When the compiler can determine the MSR number at compile time, the APIs
> + * with the suffix _constant() are used to enable the immediate form MSR
> + * instructions when available. The APIs with the suffix _variable() are
> + * used when the MSR number is not known until run time.
> + *
> + * Below is a diagram illustrating the derivation of the MSR read APIs:
> + *
> + * __native_rdmsrq_variable() __native_rdmsrq_constant()
> + * \ /
> + * \ /
> + * __native_rdmsrq() -----------------------
> + * / \ |
> + * / \ |
> + * native_rdmsrq() native_read_msr_safe() |
> + * / \ |
> + * / \ |
> + * native_rdmsr() native_read_msr() |
> + * |
> + * |
> + * |
> + * __xenpv_rdmsrq() |
> + * | |
> + * | |
> + * __rdmsrq() <--------------------------------
> + * / \
> + * / \
> + * rdmsrq() rdmsrq_safe()
> + * / \
> + * / \
> + * rdmsr() rdmsr_safe()
> + */
> +
> +static __always_inline bool __native_rdmsrq_variable(u32 msr, u64 *val, int type)
> {
> - DECLARE_ARGS(val, low, high);
> +#ifdef CONFIG_X86_64
> + BUILD_BUG_ON(__builtin_constant_p(msr));
>
> - asm volatile("1: rdmsr\n"
> - "2:\n"
> - _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR)
> - : EAX_EDX_RET(val, low, high) : "c" (msr));
> + asm_inline volatile goto(
> + "1:\n"
> + RDMSR_AND_SAVE_RESULT
> + _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For RDMSR */
>
> - return EAX_EDX_VAL(val, low, high);
> + : [val] "=a" (*val)
> + : "c" (msr), [type] "i" (type)
> + : "rdx"
> + : badmsr);
> +#else
> + asm_inline volatile goto(
> + "1: rdmsr\n\t"
> + _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For RDMSR */
> +
> + : "=A" (*val)
> + : "c" (msr), [type] "i" (type)
> + :
> + : badmsr);
> +#endif
> +
> + return false;
> +
> +badmsr:
> + *val = 0;
> +
> + return true;
> }
>
> -#define native_rdmsr(msr, val1, val2) \
> -do { \
> - u64 __val = __rdmsr((msr)); \
> - (void)((val1) = (u32)__val); \
> - (void)((val2) = (u32)(__val >> 32)); \
> -} while (0)
> +#ifdef CONFIG_X86_64
> +static __always_inline bool __native_rdmsrq_constant(u32 msr, u64 *val, int type)
> +{
> + BUILD_BUG_ON(!__builtin_constant_p(msr));
> +
> + asm_inline volatile goto(
> + "1:\n"
> + ALTERNATIVE("mov %[msr], %%ecx\n\t"
> + "2:\n"
> + RDMSR_AND_SAVE_RESULT,
> + ASM_RDMSR_IMM,
> + X86_FEATURE_MSR_IMM)
> + _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For RDMSR immediate */
> + _ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type]) /* For RDMSR */
> +
> + : [val] "=a" (*val)
> + : [msr] "i" (msr), [type] "i" (type)
> + : "ecx", "rdx"
> + : badmsr);
> +
> + return false;
> +
> +badmsr:
> + *val = 0;
> +
> + return true;
> +}
> +#endif
> +
> +static __always_inline bool __native_rdmsrq(u32 msr, u64 *val, int type)
> +{
> +#ifdef CONFIG_X86_64
> + if (__builtin_constant_p(msr))
> + return __native_rdmsrq_constant(msr, val, type);
> +#endif
> +
> + return __native_rdmsrq_variable(msr, val, type);
> +}
>
> static __always_inline u64 native_rdmsrq(u32 msr)
> {
> - return __rdmsr(msr);
> + u64 val = 0;
> +
> + __native_rdmsrq(msr, &val, EX_TYPE_RDMSR);
> + return val;
> }
>
> +#define native_rdmsr(msr, low, high) \
> +do { \
> + u64 __val = native_rdmsrq(msr); \
> + (void)((low) = (u32)__val); \
> + (void)((high) = (u32)(__val >> 32)); \
> +} while (0)
> +
> static inline u64 native_read_msr(u32 msr)
> {
> - u64 val;
> -
> - val = __rdmsr(msr);
> + u64 val = native_rdmsrq(msr);
>
> if (tracepoint_enabled(read_msr))
> do_trace_read_msr(msr, val, 0);
> @@ -163,36 +273,91 @@ static inline u64 native_read_msr(u32 msr)
> return val;
> }
>
> -static inline int native_read_msr_safe(u32 msr, u64 *p)
> +static inline int native_read_msr_safe(u32 msr, u64 *val)
> {
> int err;
> - DECLARE_ARGS(val, low, high);
>
> - asm volatile("1: rdmsr ; xor %[err],%[err]\n"
> - "2:\n\t"
> - _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %[err])
> - : [err] "=r" (err), EAX_EDX_RET(val, low, high)
> - : "c" (msr));
> - if (tracepoint_enabled(read_msr))
> - do_trace_read_msr(msr, EAX_EDX_VAL(val, low, high), err);
> + err = __native_rdmsrq(msr, val, EX_TYPE_RDMSR_SAFE) ? -EIO : 0;
>
> - *p = EAX_EDX_VAL(val, low, high);
> + if (tracepoint_enabled(read_msr))
> + do_trace_read_msr(msr, *val, err);
>
> return err;
> }
>
> +#ifdef CONFIG_XEN_PV
> +/* No plan to support immediate form MSR instructions in Xen */
> +static __always_inline bool __xenpv_rdmsrq(u32 msr, u64 *val, int type)
> +{
> + asm_inline volatile goto(
> + "1: call asm_xen_read_msr\n\t"
> + _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For CALL */
> +
> + : [val] "=a" (*val), ASM_CALL_CONSTRAINT
> + : "c" (msr), [type] "i" (type)
> + : "rdx"
> + : badmsr);
> +
> + return false;
> +
> +badmsr:
> + *val = 0;
> +
> + return true;
> +}
> +#endif
> +
> +static __always_inline bool __rdmsrq(u32 msr, u64 *val, int type)
> +{
> + bool ret;
> +
> +#ifdef CONFIG_XEN_PV
> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
> + return __xenpv_rdmsrq(msr, val, type);
I don't think this will work for the Xen PV case.
X86_FEATURE_XENPV is set only after the first MSRs have already been read.
This can be fixed by setting the feature earlier, but it shows that the
paravirt feature has its benefits in such cases.
Juergen
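A minimal sketch of the "setting the feature earlier" fix mentioned above,
assuming it would live at the top of xen_start_kernel() in
arch/x86/xen/enlighten_pv.c; setup_force_cpu_cap() and X86_FEATURE_XENPV
are existing kernel symbols, but the exact placement relative to the first
MSR read is an assumption, not something this thread establishes:

	/* Hypothetical ordering fix: force the Xen PV feature bit before
	 * any code path can reach __rdmsrq(), so that
	 * cpu_feature_enabled(X86_FEATURE_XENPV) is already true for the
	 * very first MSR read.  Signature abbreviated.
	 */
	asmlinkage __visible void __init xen_start_kernel(struct start_info *si)
	{
		setup_force_cpu_cap(X86_FEATURE_XENPV);	/* before any MSR read */

		/* ... rest of the existing PV setup, which reads MSRs ... */
	}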
* Re: [RFC PATCH v2 22/34] x86/msr: Utilize the alternatives mechanism to read MSR
2025-04-22 11:12 ` Jürgen Groß
@ 2025-04-23 9:03 ` Xin Li
2025-04-23 16:11 ` Jürgen Groß
0 siblings, 1 reply; 94+ messages in thread
From: Xin Li @ 2025-04-23 9:03 UTC (permalink / raw)
To: Jürgen Groß, linux-kernel, kvm, linux-perf-users,
linux-hyperv, virtualization, linux-pm, linux-edac, xen-devel,
linux-acpi, linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 4/22/2025 4:12 AM, Jürgen Groß wrote:
>> +
>> +static __always_inline bool __rdmsrq(u32 msr, u64 *val, int type)
>> +{
>> + bool ret;
>> +
>> +#ifdef CONFIG_XEN_PV
>> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
>> + return __xenpv_rdmsrq(msr, val, type);
>
> I don't think this will work for the Xen PV case.
Well, I have been testing the code on the xen-4.17 that comes with
Ubuntu 24.04.2 LTS :)
>
> X86_FEATURE_XENPV is set only after the first MSR is being read.
Whether the code works or not, good catch!
>
> This can be fixed by setting the feature earlier, but it shows that the
> paravirt feature has its benefits in such cases.
See my other reply about letting Xen handle all the details.
Plus, since the code actually works, I would argue the opposite :-P
* Re: [RFC PATCH v2 22/34] x86/msr: Utilize the alternatives mechanism to read MSR
2025-04-23 9:03 ` Xin Li
@ 2025-04-23 16:11 ` Jürgen Groß
0 siblings, 0 replies; 94+ messages in thread
From: Jürgen Groß @ 2025-04-23 16:11 UTC (permalink / raw)
To: Xin Li, linux-kernel, kvm, linux-perf-users, linux-hyperv,
virtualization, linux-pm, linux-edac, xen-devel, linux-acpi,
linux-hwmon, netdev, platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, andrew.cooper3,
peterz, namhyung, mark.rutland, alexander.shishkin, jolsa,
irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
luto, boris.ostrovsky, kys, haiyangz, decui
On 23.04.25 11:03, Xin Li wrote:
> On 4/22/2025 4:12 AM, Jürgen Groß wrote:
>>> +
>>> +static __always_inline bool __rdmsrq(u32 msr, u64 *val, int type)
>>> +{
>>> + bool ret;
>>> +
>>> +#ifdef CONFIG_XEN_PV
>>> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
>>> + return __xenpv_rdmsrq(msr, val, type);
>>
>> I don't think this will work for the Xen PV case.
>
> Well, I have been testing the code on xen-4.17 coming with Ubuntu
> 24.04.2 LTS :)
Hmm, it seems the MSR(s) accessed that early are the ones falling back to
the native_rdmsr() calls. At least on the hardware you tested on.
>> X86_FEATURE_XENPV is set only after the first MSR is being read.
>
> No matter whether the code works or not, good catch!
>
>>
>> This can be fixed by setting the feature earlier, but it shows that the
>> paravirt feature has its benefits in such cases.
>
> See my other reply to let Xen handle all the details.
>
> Plus the code actually works, I would actually argue the opposite :-P
BTW, it was in kernel 6.12 that I last had to change the MSR read
emulation for Xen PV (to fix some problems with the changed x86 topology
detection). Things like that won't easily be put into the hypervisor,
which needs to serve other OSes, too.
Juergen
* [RFC PATCH v2 23/34] x86/extable: Remove new dead code in ex_handler_msr()
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (21 preceding siblings ...)
2025-04-22 8:22 ` [RFC PATCH v2 22/34] x86/msr: Utilize the alternatives mechanism to read MSR Xin Li (Intel)
@ 2025-04-22 8:22 ` Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 24/34] x86/mce: Use native MSR API __native_{wr,rd}msrq() Xin Li (Intel)
` (11 subsequent siblings)
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:22 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
The MSR read APIs no longer expect RAX or EDX:EAX to be cleared upon
returning from a #GP caused by an MSR read instruction.
The MSR safe APIs no longer assume that -EIO is returned in a register
from a #GP caused by an MSR instruction.
Remove the code that these changes have made dead.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
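For readers following along, a stand-alone user-space sketch of the new
error contract, where the C fallback label does what the exception-table
fixup used to do with registers. Nothing below is kernel API: the jz is a
stand-in for the #GP fixup branch, and -5 stands in for -EIO.

#include <stdio.h>

static int read_thing(unsigned int ok, unsigned long long *val)
{
	__asm__ goto("test %0, %0\n\t"
		     "jz %l[bad]"	/* stand-in for the #GP fixup branch */
		     : : "r" (ok) : "cc" : bad);

	*val = 42;			/* pretend RDMSR result */
	return 0;

bad:
	*val = 0;			/* C fallback zeroes the result ... */
	return -5;			/* ... and picks the error code itself */
}

int main(void)
{
	unsigned long long v;
	int err;

	err = read_thing(1, &v);
	printf("good: err=%d val=%llu\n", err, v);
	err = read_thing(0, &v);
	printf("bad:  err=%d val=%llu\n", err, v);
	return 0;
}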
arch/x86/mm/extable.c | 21 +++++----------------
1 file changed, 5 insertions(+), 16 deletions(-)
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 6bf4c2a43c2c..ea3fe7f32772 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -165,7 +165,7 @@ static bool ex_handler_uaccess(const struct exception_table_entry *fixup,
}
static bool ex_handler_msr(const struct exception_table_entry *fixup,
- struct pt_regs *regs, bool wrmsr, bool safe, int reg)
+ struct pt_regs *regs, bool wrmsr, bool safe)
{
bool imm_insn = is_msr_imm_insn((void *)regs->ip);
u32 msr;
@@ -207,17 +207,6 @@ static bool ex_handler_msr(const struct exception_table_entry *fixup,
show_stack_regs(regs);
}
- if (!wrmsr) {
- /* Pretend that the read succeeded and returned 0. */
- regs->ax = 0;
-
- if (!imm_insn)
- regs->dx = 0;
- }
-
- if (safe)
- *pt_regs_nr(regs, reg) = -EIO;
-
return ex_handler_default(fixup, regs);
}
@@ -395,13 +384,13 @@ int fixup_exception(struct pt_regs *regs, int trapnr, unsigned long error_code,
case EX_TYPE_BPF:
return ex_handler_bpf(e, regs);
case EX_TYPE_WRMSR:
- return ex_handler_msr(e, regs, true, false, reg);
+ return ex_handler_msr(e, regs, true, false);
case EX_TYPE_RDMSR:
- return ex_handler_msr(e, regs, false, false, reg);
+ return ex_handler_msr(e, regs, false, false);
case EX_TYPE_WRMSR_SAFE:
- return ex_handler_msr(e, regs, true, true, reg);
+ return ex_handler_msr(e, regs, true, true);
case EX_TYPE_RDMSR_SAFE:
- return ex_handler_msr(e, regs, false, true, reg);
+ return ex_handler_msr(e, regs, false, true);
case EX_TYPE_WRMSR_IN_MCE:
ex_handler_msr_mce(regs, true);
break;
--
2.49.0
* [RFC PATCH v2 24/34] x86/mce: Use native MSR API __native_{wr,rd}msrq()
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (22 preceding siblings ...)
2025-04-22 8:22 ` [RFC PATCH v2 23/34] x86/extable: Remove new dead code in ex_handler_msr() Xin Li (Intel)
@ 2025-04-22 8:22 ` Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 25/34] x86/msr: Rename native_wrmsrq() to native_wrmsrq_no_trace() Xin Li (Intel)
` (10 subsequent siblings)
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:22 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
Use the native MSR APIs __native_{wr,rd}msrq() instead of open-coded MSR assembly.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
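A quick aside on the decoding done below: the MSR number sits at byte
offset 5 because ASM_RDMSR_IMM is five bytes of VEX prefix/opcode/modrm
followed by the imm32 (the byte values come from the macro earlier in the
series). A stand-alone sketch of the extraction, with a hand-built buffer
standing in for regs->ip; the memcpy() sidesteps the unaligned dereference:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	/* RDMSR-immediate encoding: 0xc4,0xe7,0x7b,0xf6,0xc0, then imm32 */
	uint8_t insn[9] = { 0xc4, 0xe7, 0x7b, 0xf6, 0xc0,
			    0x8b, 0x00, 0x00, 0x00 };	/* MSR 0x8b */
	uint32_t msr;

	/* equivalent of the patch's *(u32 *)(regs->ip + 5) */
	memcpy(&msr, insn + 5, sizeof(msr));
	printf("msr = 0x%x\n", msr);			/* msr = 0x8b */
	return 0;
}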
arch/x86/kernel/cpu/mce/core.c | 55 +++++++++++++++++++++-------------
1 file changed, 35 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 32286bad75e6..b854a60238de 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -370,13 +370,40 @@ static int msr_to_offset(u32 msr)
void ex_handler_msr_mce(struct pt_regs *regs, bool wrmsr)
{
+ bool imm_insn = is_msr_imm_insn((void *)regs->ip);
+ u32 msr;
+
+ if (imm_insn)
+ /*
+ * The 32-bit immediate specifying an MSR is encoded into
+ * bytes 5 ~ 8 of an immediate form MSR instruction.
+ */
+ msr = *(u32 *)(regs->ip + 5);
+ else
+ msr = (u32)regs->cx;
+
if (wrmsr) {
- pr_emerg("MSR access error: WRMSR to 0x%x (tried to write 0x%08x%08x) at rIP: 0x%lx (%pS)\n",
- (unsigned int)regs->cx, (unsigned int)regs->dx, (unsigned int)regs->ax,
- regs->ip, (void *)regs->ip);
+ /*
+ * To maintain consistency with existing RDMSR and WRMSR(NS) instructions,
+ * the register operand for immediate form MSR instructions is ALWAYS
+ * encoded as RAX in <asm/msr.h> for reading or writing the MSR value.
+ */
+ u64 msr_val = regs->ax;
+
+ if (!imm_insn) {
+ /*
+ * On processors that support the Intel 64 architecture, the
+ * high-order 32 bits of each of RAX and RDX are ignored.
+ */
+ msr_val &= 0xffffffff;
+ msr_val |= (u64)regs->dx << 32;
+ }
+
+ pr_emerg("MSR access error: WRMSR to 0x%x (tried to write 0x%016llx) at rIP: 0x%lx (%pS)\n",
+ msr, msr_val, regs->ip, (void *)regs->ip);
} else {
pr_emerg("MSR access error: RDMSR from 0x%x at rIP: 0x%lx (%pS)\n",
- (unsigned int)regs->cx, regs->ip, (void *)regs->ip);
+ msr, regs->ip, (void *)regs->ip);
}
show_stack_regs(regs);
@@ -390,7 +417,7 @@ void ex_handler_msr_mce(struct pt_regs *regs, bool wrmsr)
/* MSR access wrappers used for error injection */
noinstr u64 mce_rdmsrq(u32 msr)
{
- DECLARE_ARGS(val, low, high);
+ u64 val;
if (__this_cpu_read(injectm.finished)) {
int offset;
@@ -414,19 +441,13 @@ noinstr u64 mce_rdmsrq(u32 msr)
* architectural violation and needs to be reported to hw vendor. Panic
* the box to not allow any further progress.
*/
- asm volatile("1: rdmsr\n"
- "2:\n"
- _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR_IN_MCE)
- : EAX_EDX_RET(val, low, high) : "c" (msr));
+ __native_rdmsrq(msr, &val, EX_TYPE_RDMSR_IN_MCE);
-
- return EAX_EDX_VAL(val, low, high);
+ return val;
}
static noinstr void mce_wrmsrq(u32 msr, u64 v)
{
- u32 low, high;
-
if (__this_cpu_read(injectm.finished)) {
int offset;
@@ -441,14 +462,8 @@ static noinstr void mce_wrmsrq(u32 msr, u64 v)
return;
}
- low = (u32)v;
- high = (u32)(v >> 32);
-
/* See comment in mce_rdmsrq() */
- asm volatile("1: wrmsr\n"
- "2:\n"
- _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR_IN_MCE)
- : : "c" (msr), "a"(low), "d" (high) : "memory");
+ __native_wrmsrq(msr, v, EX_TYPE_WRMSR_IN_MCE);
}
/*
--
2.49.0
* [RFC PATCH v2 25/34] x86/msr: Rename native_wrmsrq() to native_wrmsrq_no_trace()
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (23 preceding siblings ...)
2025-04-22 8:22 ` [RFC PATCH v2 24/34] x86/mce: Use native MSR API __native_{wr,rd}msrq() Xin Li (Intel)
@ 2025-04-22 8:22 ` Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 26/34] x86/msr: Rename native_wrmsr() to native_wrmsr_no_trace() Xin Li (Intel)
` (9 subsequent siblings)
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:22 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
native_wrmsrq() doesn't do tracing and thus can be used in noinstr
context; rename it to native_wrmsrq_no_trace() to make that explicit.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
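To see what the suffix buys, here is a tiny user-space mimic of the
layering this rename makes explicit: the _no_trace() primitive never calls
out, so it stays noinstr-safe, and the traced wrapper adds the tracepoint
on top. A plain flag stands in for tracepoint_enabled(write_msr); none of
these bodies are kernel code.

#include <stdbool.h>
#include <stdio.h>

static bool write_msr_tracepoint_on;	/* mimics tracepoint_enabled(write_msr) */

/* the noinstr-safe primitive: must never call into tracing */
static void native_wrmsrq_no_trace(unsigned int msr, unsigned long long val)
{
	(void)msr; (void)val;		/* would be the WRMSR instruction */
}

/* the traced variant is layered on top, as in the derivation diagram */
static void native_write_msr(unsigned int msr, unsigned long long val)
{
	native_wrmsrq_no_trace(msr, val);
	if (write_msr_tracepoint_on)
		printf("trace: wrmsr %#x <- %#llx\n", msr, val);
}

int main(void)
{
	native_write_msr(0x1b, 0xfee00900ULL);	/* silent */
	write_msr_tracepoint_on = true;
	native_write_msr(0x1b, 0xfee00900ULL);	/* emits the trace line */
	return 0;
}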
arch/x86/events/amd/brs.c | 2 +-
arch/x86/hyperv/ivm.c | 2 +-
arch/x86/include/asm/apic.h | 2 +-
arch/x86/include/asm/fred.h | 2 +-
arch/x86/include/asm/microcode.h | 2 +-
arch/x86/include/asm/msr.h | 8 ++++----
arch/x86/include/asm/sev-internal.h | 2 +-
arch/x86/include/asm/spec-ctrl.h | 2 +-
arch/x86/kernel/cpu/mce/core.c | 2 +-
arch/x86/kernel/cpu/microcode/amd.c | 2 +-
arch/x86/kernel/cpu/microcode/intel.c | 2 +-
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 8 ++++----
13 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/arch/x86/events/amd/brs.c b/arch/x86/events/amd/brs.c
index 06f35a6b58a5..0153616a97cd 100644
--- a/arch/x86/events/amd/brs.c
+++ b/arch/x86/events/amd/brs.c
@@ -44,7 +44,7 @@ static inline unsigned int brs_to(int idx)
static __always_inline void set_debug_extn_cfg(u64 val)
{
/* bits[4:3] must always be set to 11b */
- native_wrmsrq(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3);
+ native_wrmsrq_no_trace(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3);
}
static __always_inline u64 get_debug_extn_cfg(void)
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 09a165a3c41e..821609af5bd2 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -116,7 +116,7 @@ static inline u64 rd_ghcb_msr(void)
static inline void wr_ghcb_msr(u64 val)
{
- native_wrmsrq(MSR_AMD64_SEV_ES_GHCB, val);
+ native_wrmsrq_no_trace(MSR_AMD64_SEV_ES_GHCB, val);
}
static enum es_result hv_ghcb_hv_call(struct ghcb *ghcb, u64 exit_code,
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 68e10e30fe9b..442127c3e1f5 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -214,7 +214,7 @@ static inline void native_apic_msr_write(u32 reg, u32 v)
static inline void native_apic_msr_eoi(void)
{
- native_wrmsrq(APIC_BASE_MSR + (APIC_EOI >> 4), APIC_EOI_ACK);
+ native_wrmsrq_no_trace(APIC_BASE_MSR + (APIC_EOI >> 4), APIC_EOI_ACK);
}
static inline u32 native_apic_msr_read(u32 reg)
diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index 8ae4429e5401..3a58545415d9 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -101,7 +101,7 @@ static __always_inline void fred_update_rsp0(void)
unsigned long rsp0 = (unsigned long) task_stack_page(current) + THREAD_SIZE;
if (cpu_feature_enabled(X86_FEATURE_FRED) && (__this_cpu_read(fred_rsp0) != rsp0)) {
- native_wrmsrq(MSR_IA32_FRED_RSP0, rsp0);
+ native_wrmsrq_no_trace(MSR_IA32_FRED_RSP0, rsp0);
__this_cpu_write(fred_rsp0, rsp0);
}
}
diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
index 107a1aaa211b..da482f430d80 100644
--- a/arch/x86/include/asm/microcode.h
+++ b/arch/x86/include/asm/microcode.h
@@ -63,7 +63,7 @@ static inline u32 intel_get_microcode_revision(void)
{
u32 rev, dummy;
- native_wrmsrq(MSR_IA32_UCODE_REV, 0);
+ native_wrmsrq_no_trace(MSR_IA32_UCODE_REV, 0);
/* As documented in the SDM: Do a CPUID 1 here */
native_cpuid_eax(1);
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 5271cb002b23..d130bdeed3ce 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -366,7 +366,7 @@ static __always_inline int rdmsrq_safe(u32 msr, u64 *val)
* __native_wrmsrq() -----------------------
* / \ |
* / \ |
- * native_wrmsrq() native_write_msr_safe() |
+ * native_wrmsrq_no_trace() native_write_msr_safe() |
* / \ |
* / \ |
* native_wrmsr() native_write_msr() |
@@ -462,19 +462,19 @@ static __always_inline bool __native_wrmsrq(u32 msr, u64 val, int type)
return __native_wrmsrq_variable(msr, val, type);
}
-static __always_inline void native_wrmsrq(u32 msr, u64 val)
+static __always_inline void native_wrmsrq_no_trace(u32 msr, u64 val)
{
__native_wrmsrq(msr, val, EX_TYPE_WRMSR);
}
static __always_inline void native_wrmsr(u32 msr, u32 low, u32 high)
{
- native_wrmsrq(msr, (u64)high << 32 | low);
+ native_wrmsrq_no_trace(msr, (u64)high << 32 | low);
}
static inline void notrace native_write_msr(u32 msr, u64 val)
{
- native_wrmsrq(msr, val);
+ native_wrmsrq_no_trace(msr, val);
if (tracepoint_enabled(write_msr))
do_trace_write_msr(msr, val, 0);
diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
index d259bcec220a..7eb030702435 100644
--- a/arch/x86/include/asm/sev-internal.h
+++ b/arch/x86/include/asm/sev-internal.h
@@ -101,7 +101,7 @@ static inline u64 sev_es_rd_ghcb_msr(void)
static __always_inline void sev_es_wr_ghcb_msr(u64 val)
{
- native_wrmsrq(MSR_AMD64_SEV_ES_GHCB, val);
+ native_wrmsrq_no_trace(MSR_AMD64_SEV_ES_GHCB, val);
}
enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
diff --git a/arch/x86/include/asm/spec-ctrl.h b/arch/x86/include/asm/spec-ctrl.h
index 00b7e0398210..8cf69849bbbe 100644
--- a/arch/x86/include/asm/spec-ctrl.h
+++ b/arch/x86/include/asm/spec-ctrl.h
@@ -84,7 +84,7 @@ static inline u64 ssbd_tif_to_amd_ls_cfg(u64 tifn)
static __always_inline void __update_spec_ctrl(u64 val)
{
__this_cpu_write(x86_spec_ctrl_current, val);
- native_wrmsrq(MSR_IA32_SPEC_CTRL, val);
+ native_wrmsrq_no_trace(MSR_IA32_SPEC_CTRL, val);
}
#ifdef CONFIG_SMP
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index b854a60238de..bd3cb984ccb9 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1321,7 +1321,7 @@ static noinstr bool mce_check_crashing_cpu(void)
}
if (mcgstatus & MCG_STATUS_RIPV) {
- native_wrmsrq(MSR_IA32_MCG_STATUS, 0);
+ native_wrmsrq_no_trace(MSR_IA32_MCG_STATUS, 0);
return true;
}
}
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index 1798a6c027f8..41c553396500 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -607,7 +607,7 @@ static bool __apply_microcode_amd(struct microcode_amd *mc, u32 *cur_rev,
if (!verify_sha256_digest(mc->hdr.patch_id, *cur_rev, (const u8 *)p_addr, psize))
return false;
- native_wrmsrq(MSR_AMD64_PATCH_LOADER, p_addr);
+ native_wrmsrq_no_trace(MSR_AMD64_PATCH_LOADER, p_addr);
if (x86_family(bsp_cpuid_1_eax) == 0x17) {
unsigned long p_addr_end = p_addr + psize - 1;
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 86e1047f738f..26e13dc4cedd 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -320,7 +320,7 @@ static enum ucode_state __apply_microcode(struct ucode_cpu_info *uci,
}
/* write microcode via MSR 0x79 */
- native_wrmsrq(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits);
+ native_wrmsrq_no_trace(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits);
rev = intel_get_microcode_revision();
if (rev != mc->hdr.rev)
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index cc534a83f19d..e970a0de894f 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -483,7 +483,7 @@ int resctrl_arch_pseudo_lock_fn(void *_plr)
* cache.
*/
saved_msr = native_rdmsrq(MSR_MISC_FEATURE_CONTROL);
- native_wrmsrq(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits);
+ native_wrmsrq_no_trace(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits);
closid_p = this_cpu_read(pqr_state.cur_closid);
rmid_p = this_cpu_read(pqr_state.cur_rmid);
mem_r = plr->kmem;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e73c1d5ba6c4..b53575dee64a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -382,7 +382,7 @@ static __always_inline void vmx_disable_fb_clear(struct vcpu_vmx *vmx)
msr = native_rdmsrq(MSR_IA32_MCU_OPT_CTRL);
msr |= FB_CLEAR_DIS;
- native_wrmsrq(MSR_IA32_MCU_OPT_CTRL, msr);
+ native_wrmsrq_no_trace(MSR_IA32_MCU_OPT_CTRL, msr);
/* Cache the MSR value to avoid reading it later */
vmx->msr_ia32_mcu_opt_ctrl = msr;
}
@@ -393,7 +393,7 @@ static __always_inline void vmx_enable_fb_clear(struct vcpu_vmx *vmx)
return;
vmx->msr_ia32_mcu_opt_ctrl &= ~FB_CLEAR_DIS;
- native_wrmsrq(MSR_IA32_MCU_OPT_CTRL, vmx->msr_ia32_mcu_opt_ctrl);
+ native_wrmsrq_no_trace(MSR_IA32_MCU_OPT_CTRL, vmx->msr_ia32_mcu_opt_ctrl);
}
static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
@@ -6745,7 +6745,7 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
vcpu->stat.l1d_flush++;
if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
- native_wrmsrq(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
+ native_wrmsrq_no_trace(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
return;
}
@@ -7318,7 +7318,7 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
*/
if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) ||
vmx->spec_ctrl != hostval)
- native_wrmsrq(MSR_IA32_SPEC_CTRL, hostval);
+ native_wrmsrq_no_trace(MSR_IA32_SPEC_CTRL, hostval);
barrier_nospec();
}
--
2.49.0
* [RFC PATCH v2 26/34] x86/msr: Rename native_wrmsr() to native_wrmsr_no_trace()
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (24 preceding siblings ...)
2025-04-22 8:22 ` [RFC PATCH v2 25/34] x86/msr: Rename native_wrmsrq() to native_wrmsrq_no_trace() Xin Li (Intel)
@ 2025-04-22 8:22 ` Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 27/34] x86/msr: Rename native_write_msr() to native_wrmsrq() Xin Li (Intel)
` (8 subsequent siblings)
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:22 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
native_wrmsr() doesn't do tracing and thus can be used in noinstr
context; rename it to native_wrmsr_no_trace() to make that explicit.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/include/asm/msr.h | 8 ++++----
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 4 ++--
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index d130bdeed3ce..2a62a899f7a5 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -367,9 +367,9 @@ static __always_inline int rdmsrq_safe(u32 msr, u64 *val)
* / \ |
* / \ |
* native_wrmsrq_no_trace() native_write_msr_safe() |
- * / \ |
- * / \ |
- * native_wrmsr() native_write_msr() |
+ * / \ |
+ * / \ |
+ * native_wrmsr_no_trace() native_write_msr() |
* |
* |
* |
@@ -467,7 +467,7 @@ static __always_inline void native_wrmsrq_no_trace(u32 msr, u64 val)
__native_wrmsrq(msr, val, EX_TYPE_WRMSR);
}
-static __always_inline void native_wrmsr(u32 msr, u32 low, u32 high)
+static __always_inline void native_wrmsr_no_trace(u32 msr, u32 low, u32 high)
{
native_wrmsrq_no_trace(msr, (u64)high << 32 | low);
}
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index e970a0de894f..184bc1b3fb02 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -495,7 +495,7 @@ int resctrl_arch_pseudo_lock_fn(void *_plr)
* pseudo-locked followed by reading of kernel memory to load it
* into the cache.
*/
- native_wrmsr(MSR_IA32_PQR_ASSOC, rmid_p, plr->closid);
+ native_wrmsr_no_trace(MSR_IA32_PQR_ASSOC, rmid_p, plr->closid);
/*
* Cache was flushed earlier. Now access kernel memory to read it
@@ -532,7 +532,7 @@ int resctrl_arch_pseudo_lock_fn(void *_plr)
* Critical section end: restore closid with capacity bitmask that
* does not overlap with pseudo-locked region.
*/
- native_wrmsr(MSR_IA32_PQR_ASSOC, rmid_p, closid_p);
+ native_wrmsr_no_trace(MSR_IA32_PQR_ASSOC, rmid_p, closid_p);
/* Re-enable the hardware prefetcher(s) */
wrmsrq(MSR_MISC_FEATURE_CONTROL, saved_msr);
--
2.49.0
* [RFC PATCH v2 27/34] x86/msr: Rename native_write_msr() to native_wrmsrq()
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (25 preceding siblings ...)
2025-04-22 8:22 ` [RFC PATCH v2 26/34] x86/msr: Rename native_wrmsr() to native_wrmsr_no_trace() Xin Li (Intel)
@ 2025-04-22 8:22 ` Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 28/34] x86/msr: Rename native_write_msr_safe() to native_wrmsrq_safe() Xin Li (Intel)
` (7 subsequent siblings)
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:22 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/include/asm/msr.h | 4 ++--
arch/x86/kernel/kvmclock.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 2a62a899f7a5..72a1c3301d46 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -369,7 +369,7 @@ static __always_inline int rdmsrq_safe(u32 msr, u64 *val)
* native_wrmsrq_no_trace() native_write_msr_safe() |
* / \ |
* / \ |
- * native_wrmsr_no_trace() native_write_msr() |
+ * native_wrmsr_no_trace() native_wrmsrq() |
* |
* |
* |
@@ -472,7 +472,7 @@ static __always_inline void native_wrmsr_no_trace(u32 msr, u32 low, u32 high)
native_wrmsrq_no_trace(msr, (u64)high << 32 | low);
}
-static inline void notrace native_write_msr(u32 msr, u64 val)
+static inline void notrace native_wrmsrq(u32 msr, u64 val)
{
native_wrmsrq_no_trace(msr, val);
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index ca0a49eeac4a..36417fed7f18 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -196,7 +196,7 @@ static void kvm_setup_secondary_clock(void)
void kvmclock_disable(void)
{
if (msr_kvm_system_time)
- native_write_msr(msr_kvm_system_time, 0);
+ native_wrmsrq(msr_kvm_system_time, 0);
}
static void __init kvmclock_init_mem(void)
--
2.49.0
* [RFC PATCH v2 28/34] x86/msr: Rename native_write_msr_safe() to native_wrmsrq_safe()
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (26 preceding siblings ...)
2025-04-22 8:22 ` [RFC PATCH v2 27/34] x86/msr: Rename native_write_msr() to native_wrmsrq() Xin Li (Intel)
@ 2025-04-22 8:22 ` Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 29/34] x86/msr: Rename native_rdmsrq() to native_rdmsrq_no_trace() Xin Li (Intel)
` (6 subsequent siblings)
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:22 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/include/asm/msr.h | 4 ++--
arch/x86/kvm/svm/svm.c | 6 +++---
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 72a1c3301d46..a1c63bed14be 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -366,7 +366,7 @@ static __always_inline int rdmsrq_safe(u32 msr, u64 *val)
* __native_wrmsrq() -----------------------
* / \ |
* / \ |
- * native_wrmsrq_no_trace() native_write_msr_safe() |
+ * native_wrmsrq_no_trace() native_wrmsrq_safe() |
* / \ |
* / \ |
* native_wrmsr_no_trace() native_wrmsrq() |
@@ -480,7 +480,7 @@ static inline void notrace native_wrmsrq(u32 msr, u64 val)
do_trace_write_msr(msr, val, 0);
}
-static inline int notrace native_write_msr_safe(u32 msr, u64 val)
+static inline int notrace native_wrmsrq_safe(u32 msr, u64 val)
{
int err = __native_wrmsrq(msr, val, EX_TYPE_WRMSR_SAFE) ? -EIO : 0;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 838606f784c9..01dd3cd20730 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -486,7 +486,7 @@ static void svm_init_erratum_383(void)
val |= (1ULL << 47);
- native_write_msr_safe(MSR_AMD64_DC_CFG, val);
+ native_wrmsrq_safe(MSR_AMD64_DC_CFG, val);
erratum_383_found = true;
}
@@ -2159,11 +2159,11 @@ static bool is_erratum_383(void)
/* Clear MCi_STATUS registers */
for (i = 0; i < 6; ++i)
- native_write_msr_safe(MSR_IA32_MCx_STATUS(i), 0);
+ native_wrmsrq_safe(MSR_IA32_MCx_STATUS(i), 0);
if (!native_read_msr_safe(MSR_IA32_MCG_STATUS, &value)) {
value &= ~(1ULL << 2);
- native_write_msr_safe(MSR_IA32_MCG_STATUS, value);
+ native_wrmsrq_safe(MSR_IA32_MCG_STATUS, value);
}
/* Flush tlb to evict multi-match entries */
--
2.49.0
* [RFC PATCH v2 29/34] x86/msr: Rename native_rdmsrq() to native_rdmsrq_no_trace()
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (27 preceding siblings ...)
2025-04-22 8:22 ` [RFC PATCH v2 28/34] x86/msr: Rename native_write_msr_safe() to native_wrmsrq_safe() Xin Li (Intel)
@ 2025-04-22 8:22 ` Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 30/34] x86/msr: Rename native_rdmsr() to native_rdmsr_no_trace() Xin Li (Intel)
` (5 subsequent siblings)
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:22 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
native_rdmsrq() doesn't do tracing and thus can be used in noinstr
context; rename it to native_rdmsrq_no_trace() to make that explicit.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/boot/startup/sme.c | 4 ++--
arch/x86/events/amd/brs.c | 2 +-
arch/x86/hyperv/hv_vtl.c | 4 ++--
arch/x86/hyperv/ivm.c | 2 +-
arch/x86/include/asm/mshyperv.h | 2 +-
arch/x86/include/asm/msr.h | 8 ++++----
arch/x86/include/asm/sev-internal.h | 2 +-
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/cpu/mce/core.c | 4 ++--
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 4 ++--
11 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c
index 5e147bf5a0a8..859d92ad91a4 100644
--- a/arch/x86/boot/startup/sme.c
+++ b/arch/x86/boot/startup/sme.c
@@ -524,7 +524,7 @@ void __head sme_enable(struct boot_params *bp)
me_mask = 1UL << (ebx & 0x3f);
/* Check the SEV MSR whether SEV or SME is enabled */
- sev_status = msr = native_rdmsrq(MSR_AMD64_SEV);
+ sev_status = msr = native_rdmsrq_no_trace(MSR_AMD64_SEV);
feature_mask = (msr & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
/*
@@ -555,7 +555,7 @@ void __head sme_enable(struct boot_params *bp)
return;
/* For SME, check the SYSCFG MSR */
- msr = native_rdmsrq(MSR_AMD64_SYSCFG);
+ msr = native_rdmsrq_no_trace(MSR_AMD64_SYSCFG);
if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
return;
}
diff --git a/arch/x86/events/amd/brs.c b/arch/x86/events/amd/brs.c
index 0153616a97cd..0623b6d775fb 100644
--- a/arch/x86/events/amd/brs.c
+++ b/arch/x86/events/amd/brs.c
@@ -49,7 +49,7 @@ static __always_inline void set_debug_extn_cfg(u64 val)
static __always_inline u64 get_debug_extn_cfg(void)
{
- return native_rdmsrq(MSR_AMD_DBG_EXTN_CFG);
+ return native_rdmsrq_no_trace(MSR_AMD_DBG_EXTN_CFG);
}
static bool __init amd_brs_detect(void)
diff --git a/arch/x86/hyperv/hv_vtl.c b/arch/x86/hyperv/hv_vtl.c
index c6343e699154..9e41e380ad26 100644
--- a/arch/x86/hyperv/hv_vtl.c
+++ b/arch/x86/hyperv/hv_vtl.c
@@ -149,11 +149,11 @@ static int hv_vtl_bringup_vcpu(u32 target_vp_index, int cpu, u64 eip_ignored)
input->vp_context.rip = rip;
input->vp_context.rsp = rsp;
input->vp_context.rflags = 0x0000000000000002;
- input->vp_context.efer = native_rdmsrq(MSR_EFER);
+ input->vp_context.efer = native_rdmsrq_no_trace(MSR_EFER);
input->vp_context.cr0 = native_read_cr0();
input->vp_context.cr3 = __native_read_cr3();
input->vp_context.cr4 = native_read_cr4();
- input->vp_context.msr_cr_pat = native_rdmsrq(MSR_IA32_CR_PAT);
+ input->vp_context.msr_cr_pat = native_rdmsrq_no_trace(MSR_IA32_CR_PAT);
input->vp_context.idtr.limit = idt_ptr.size;
input->vp_context.idtr.base = idt_ptr.address;
input->vp_context.gdtr.limit = gdt_ptr.size;
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 821609af5bd2..dfddf522e838 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -111,7 +111,7 @@ u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
static inline u64 rd_ghcb_msr(void)
{
- return native_rdmsrq(MSR_AMD64_SEV_ES_GHCB);
+ return native_rdmsrq_no_trace(MSR_AMD64_SEV_ES_GHCB);
}
static inline void wr_ghcb_msr(u64 val)
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 778444310cfb..ab94221ff38d 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -305,7 +305,7 @@ void hv_set_non_nested_msr(unsigned int reg, u64 value);
static __always_inline u64 hv_raw_get_msr(unsigned int reg)
{
- return native_rdmsrq(reg);
+ return native_rdmsrq_no_trace(reg);
}
#else /* CONFIG_HYPERV */
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index a1c63bed14be..050d750a5ab7 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -157,7 +157,7 @@ static __always_inline bool is_msr_imm_insn(void *ip)
* __native_rdmsrq() -----------------------
* / \ |
* / \ |
- * native_rdmsrq() native_read_msr_safe() |
+ * native_rdmsrq_no_trace() native_read_msr_safe() |
* / \ |
* / \ |
* native_rdmsr() native_read_msr() |
@@ -248,7 +248,7 @@ static __always_inline bool __native_rdmsrq(u32 msr, u64 *val, int type)
return __native_rdmsrq_variable(msr, val, type);
}
-static __always_inline u64 native_rdmsrq(u32 msr)
+static __always_inline u64 native_rdmsrq_no_trace(u32 msr)
{
u64 val = 0;
@@ -258,14 +258,14 @@ static __always_inline u64 native_rdmsrq(u32 msr)
#define native_rdmsr(msr, low, high) \
do { \
- u64 __val = native_rdmsrq(msr); \
+ u64 __val = native_rdmsrq_no_trace(msr); \
(void)((low) = (u32)__val); \
(void)((high) = (u32)(__val >> 32)); \
} while (0)
static inline u64 native_read_msr(u32 msr)
{
- u64 val = native_rdmsrq(msr);
+ u64 val = native_rdmsrq_no_trace(msr);
if (tracepoint_enabled(read_msr))
do_trace_read_msr(msr, val, 0);
diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
index 7eb030702435..743da9fc7454 100644
--- a/arch/x86/include/asm/sev-internal.h
+++ b/arch/x86/include/asm/sev-internal.h
@@ -96,7 +96,7 @@ int svsm_perform_call_protocol(struct svsm_call *call);
static inline u64 sev_es_rd_ghcb_msr(void)
{
- return native_rdmsrq(MSR_AMD64_SEV_ES_GHCB);
+ return native_rdmsrq_no_trace(MSR_AMD64_SEV_ES_GHCB);
}
static __always_inline void sev_es_wr_ghcb_msr(u64 val)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 99d8a8c15ba5..9d2de568cb96 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -164,7 +164,7 @@ static void ppin_init(struct cpuinfo_x86 *c)
/* Is the enable bit set? */
if (val & 2UL) {
- c->ppin = native_rdmsrq(info->msr_ppin);
+ c->ppin = native_rdmsrq_no_trace(info->msr_ppin);
set_cpu_cap(c, info->feature);
return;
}
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index bd3cb984ccb9..9f7538b9d2fa 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -121,7 +121,7 @@ void mce_prep_record_common(struct mce *m)
{
m->cpuid = cpuid_eax(1);
m->cpuvendor = boot_cpu_data.x86_vendor;
- m->mcgcap = native_rdmsrq(MSR_IA32_MCG_CAP);
+ m->mcgcap = native_rdmsrq_no_trace(MSR_IA32_MCG_CAP);
/* need the internal __ version to avoid deadlocks */
m->time = __ktime_get_real_seconds();
}
@@ -1313,7 +1313,7 @@ static noinstr bool mce_check_crashing_cpu(void)
(crashing_cpu != -1 && crashing_cpu != cpu)) {
u64 mcgstatus;
- mcgstatus = native_rdmsrq(MSR_IA32_MCG_STATUS);
+ mcgstatus = native_rdmsrq_no_trace(MSR_IA32_MCG_STATUS);
if (boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN) {
if (mcgstatus & MCG_STATUS_LMCES)
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 184bc1b3fb02..819c07a23c6d 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -482,7 +482,7 @@ int resctrl_arch_pseudo_lock_fn(void *_plr)
* the buffer and evict pseudo-locked memory read earlier from the
* cache.
*/
- saved_msr = native_rdmsrq(MSR_MISC_FEATURE_CONTROL);
+ saved_msr = native_rdmsrq_no_trace(MSR_MISC_FEATURE_CONTROL);
native_wrmsrq_no_trace(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits);
closid_p = this_cpu_read(pqr_state.cur_closid);
rmid_p = this_cpu_read(pqr_state.cur_rmid);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b53575dee64a..cdbbfa0b9851 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -380,7 +380,7 @@ static __always_inline void vmx_disable_fb_clear(struct vcpu_vmx *vmx)
if (!vmx->disable_fb_clear)
return;
- msr = native_rdmsrq(MSR_IA32_MCU_OPT_CTRL);
+ msr = native_rdmsrq_no_trace(MSR_IA32_MCU_OPT_CTRL);
msr |= FB_CLEAR_DIS;
native_wrmsrq_no_trace(MSR_IA32_MCU_OPT_CTRL, msr);
/* Cache the MSR value to avoid reading it later */
@@ -7307,7 +7307,7 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
return;
if (flags & VMX_RUN_SAVE_SPEC_CTRL)
- vmx->spec_ctrl = native_rdmsrq(MSR_IA32_SPEC_CTRL);
+ vmx->spec_ctrl = native_rdmsrq_no_trace(MSR_IA32_SPEC_CTRL);
/*
* If the guest/host SPEC_CTRL values differ, restore the host value.
--
2.49.0
* [RFC PATCH v2 30/34] x86/msr: Rename native_rdmsr() to native_rdmsr_no_trace()
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (28 preceding siblings ...)
2025-04-22 8:22 ` [RFC PATCH v2 29/34] x86/msr: Rename native_rdmsrq() to native_rdmsrq_no_trace() Xin Li (Intel)
@ 2025-04-22 8:22 ` Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 31/34] x86/msr: Rename native_read_msr() to native_rdmsrq() Xin Li (Intel)
` (4 subsequent siblings)
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:22 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
native_rdmsr() doesn't do tracing and thus can be used in noinstr
context; rename it to native_rdmsr_no_trace() to make that explicit.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/include/asm/microcode.h | 2 +-
arch/x86/include/asm/msr.h | 8 ++++----
arch/x86/kernel/cpu/microcode/amd.c | 2 +-
arch/x86/kernel/cpu/microcode/core.c | 2 +-
arch/x86/kernel/cpu/microcode/intel.c | 2 +-
5 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
index da482f430d80..d581fdaf1f36 100644
--- a/arch/x86/include/asm/microcode.h
+++ b/arch/x86/include/asm/microcode.h
@@ -69,7 +69,7 @@ static inline u32 intel_get_microcode_revision(void)
native_cpuid_eax(1);
/* get the current revision from MSR 0x8B */
- native_rdmsr(MSR_IA32_UCODE_REV, dummy, rev);
+ native_rdmsr_no_trace(MSR_IA32_UCODE_REV, dummy, rev);
return rev;
}
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 050d750a5ab7..dfaac42b6258 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -158,9 +158,9 @@ static __always_inline bool is_msr_imm_insn(void *ip)
* / \ |
* / \ |
* native_rdmsrq_no_trace() native_read_msr_safe() |
- * / \ |
- * / \ |
- * native_rdmsr() native_read_msr() |
+ * / \ |
+ * / \ |
+ * native_rdmsr_no_trace() native_read_msr() |
* |
* |
* |
@@ -256,7 +256,7 @@ static __always_inline u64 native_rdmsrq_no_trace(u32 msr)
return val;
}
-#define native_rdmsr(msr, low, high) \
+#define native_rdmsr_no_trace(msr, low, high) \
do { \
u64 __val = native_rdmsrq_no_trace(msr); \
(void)((low) = (u32)__val); \
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index 41c553396500..f1f275ddab57 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -256,7 +256,7 @@ static u32 get_patch_level(void)
{
u32 rev, dummy __always_unused;
- native_rdmsr(MSR_AMD64_PATCH_LEVEL, rev, dummy);
+ native_rdmsr_no_trace(MSR_AMD64_PATCH_LEVEL, rev, dummy);
return rev;
}
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index b3658d11e7b6..9bda8fd987ab 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -84,7 +84,7 @@ static bool amd_check_current_patch_level(void)
u32 lvl, dummy, i;
u32 *levels;
- native_rdmsr(MSR_AMD64_PATCH_LEVEL, lvl, dummy);
+ native_rdmsr_no_trace(MSR_AMD64_PATCH_LEVEL, lvl, dummy);
levels = final_levels;
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 26e13dc4cedd..c0307b1ad63d 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -78,7 +78,7 @@ void intel_collect_cpu_info(struct cpu_signature *sig)
unsigned int val[2];
/* get processor flags from MSR 0x17 */
- native_rdmsr(MSR_IA32_PLATFORM_ID, val[0], val[1]);
+ native_rdmsr_no_trace(MSR_IA32_PLATFORM_ID, val[0], val[1]);
sig->pf = 1 << ((val[1] >> 18) & 7);
}
}
--
2.49.0
* [RFC PATCH v2 31/34] x86/msr: Rename native_read_msr() to native_rdmsrq()
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (29 preceding siblings ...)
2025-04-22 8:22 ` [RFC PATCH v2 30/34] x86/msr: Rename native_rdmsr() to native_rdmsr_no_trace() Xin Li (Intel)
@ 2025-04-22 8:22 ` Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 32/34] x86/msr: Rename native_read_msr_safe() to native_rdmsrq_safe() Xin Li (Intel)
` (3 subsequent siblings)
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:22 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/hyperv/ivm.c | 2 +-
arch/x86/include/asm/msr.h | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index dfddf522e838..8860c6c0f013 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -319,7 +319,7 @@ int hv_snp_boot_ap(u32 cpu, unsigned long start_ip)
asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
hv_populate_vmcb_seg(vmsa->ds, vmsa->gdtr.base);
- vmsa->efer = native_read_msr(MSR_EFER);
+ vmsa->efer = native_rdmsrq(MSR_EFER);
vmsa->cr4 = native_read_cr4();
vmsa->cr3 = __native_read_cr3();
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index dfaac42b6258..4c7aa9e7fbac 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -160,7 +160,7 @@ static __always_inline bool is_msr_imm_insn(void *ip)
* native_rdmsrq_no_trace() native_read_msr_safe() |
* / \ |
* / \ |
- * native_rdmsr_no_trace() native_read_msr() |
+ * native_rdmsr_no_trace() native_rdmsrq() |
* |
* |
* |
@@ -263,7 +263,7 @@ do { \
(void)((high) = (u32)(__val >> 32)); \
} while (0)
-static inline u64 native_read_msr(u32 msr)
+static inline u64 native_rdmsrq(u32 msr)
{
u64 val = native_rdmsrq_no_trace(msr);
--
2.49.0
* [RFC PATCH v2 32/34] x86/msr: Rename native_read_msr_safe() to native_rdmsrq_safe()
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (30 preceding siblings ...)
2025-04-22 8:22 ` [RFC PATCH v2 31/34] x86/msr: Rename native_read_msr() to native_rdmsrq() Xin Li (Intel)
@ 2025-04-22 8:22 ` Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 33/34] x86/msr: Move the ARGS macros after the MSR read/write APIs Xin Li (Intel)
` (2 subsequent siblings)
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:22 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
arch/x86/include/asm/msr.h | 4 ++--
arch/x86/kvm/svm/svm.c | 10 +++++-----
arch/x86/xen/pmu.c | 4 ++--
3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 4c7aa9e7fbac..b4447ba4d6c2 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -157,7 +157,7 @@ static __always_inline bool is_msr_imm_insn(void *ip)
* __native_rdmsrq() -----------------------
* / \ |
* / \ |
- * native_rdmsrq_no_trace() native_read_msr_safe() |
+ * native_rdmsrq_no_trace() native_rdmsrq_safe() |
* / \ |
* / \ |
* native_rdmsr_no_trace() native_rdmsrq() |
@@ -273,7 +273,7 @@ static inline u64 native_rdmsrq(u32 msr)
return val;
}
-static inline int native_read_msr_safe(u32 msr, u64 *val)
+static inline int native_rdmsrq_safe(u32 msr, u64 *val)
{
int err;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 01dd3cd20730..251fd9366b35 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -481,7 +481,7 @@ static void svm_init_erratum_383(void)
return;
/* Use _safe variants to not break nested virtualization */
- if (native_read_msr_safe(MSR_AMD64_DC_CFG, &val))
+ if (native_rdmsrq_safe(MSR_AMD64_DC_CFG, &val))
return;
val |= (1ULL << 47);
@@ -649,9 +649,9 @@ static int svm_enable_virtualization_cpu(void)
u64 len, status = 0;
int err;
- err = native_read_msr_safe(MSR_AMD64_OSVW_ID_LENGTH, &len);
+ err = native_rdmsrq_safe(MSR_AMD64_OSVW_ID_LENGTH, &len);
if (!err)
- err = native_read_msr_safe(MSR_AMD64_OSVW_STATUS, &status);
+ err = native_rdmsrq_safe(MSR_AMD64_OSVW_STATUS, &status);
if (err)
osvw_status = osvw_len = 0;
@@ -2148,7 +2148,7 @@ static bool is_erratum_383(void)
if (!erratum_383_found)
return false;
- if (native_read_msr_safe(MSR_IA32_MC0_STATUS, &value))
+ if (native_rdmsrq_safe(MSR_IA32_MC0_STATUS, &value))
return false;
/* Bit 62 may or may not be set for this mce */
@@ -2161,7 +2161,7 @@ static bool is_erratum_383(void)
for (i = 0; i < 6; ++i)
native_wrmsrq_safe(MSR_IA32_MCx_STATUS(i), 0);
- if (!native_read_msr_safe(MSR_IA32_MCG_STATUS, &value)) {
+ if (!native_rdmsrq_safe(MSR_IA32_MCG_STATUS, &value)) {
value &= ~(1ULL << 2);
native_wrmsrq_safe(MSR_IA32_MCG_STATUS, value);
}
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index ee908dfcff48..66d2c6fc7bfa 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -323,7 +323,7 @@ static u64 xen_amd_read_pmc(int counter)
u64 val;
msr = amd_counters_base + (counter * amd_msr_step);
- native_read_msr_safe(msr, &val);
+ native_rdmsrq_safe(msr, &val);
return val;
}
@@ -349,7 +349,7 @@ static u64 xen_intel_read_pmc(int counter)
else
msr = MSR_IA32_PERFCTR0 + counter;
- native_read_msr_safe(msr, &val);
+ native_rdmsrq_safe(msr, &val);
return val;
}
--
2.49.0
* [RFC PATCH v2 33/34] x86/msr: Move the ARGS macros after the MSR read/write APIs
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (31 preceding siblings ...)
2025-04-22 8:22 ` [RFC PATCH v2 32/34] x86/msr: Rename native_read_msr_safe() to native_rdmsrq_safe() Xin Li (Intel)
@ 2025-04-22 8:22 ` Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 34/34] x86/msr: Convert native_rdmsr_no_trace() uses to native_rdmsrq_no_trace() uses Xin Li (Intel)
2025-04-22 15:03 ` [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Sean Christopherson
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:22 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
Since the ARGS macros are no longer used in the MSR read/write API
implementations, move them after those APIs, next to their only
remaining user, native_rdpmcq().
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
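As a stand-alone reminder of what these macros do for their one remaining
user: on x86_64 they fold the EDX:EAX halves of an RDPMC/RDMSR result into
a single 64-bit value. The macro bodies are copied from the hunk below;
the input values are made up, and the sketch assumes a 64-bit
unsigned long:

#include <stdio.h>

/* x86_64 definitions, as in the hunk below */
#define DECLARE_ARGS(val, low, high)	unsigned long low, high
#define EAX_EDX_VAL(val, low, high)	((low) | (high) << 32)

int main(void)
{
	DECLARE_ARGS(val, low, high);

	low  = 0xdeadbeef;	/* what RDPMC leaves in EAX */
	high = 0x1;		/* what RDPMC leaves in EDX */
	printf("%#lx\n", EAX_EDX_VAL(val, low, high));	/* 0x1deadbeef */
	return 0;
}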
arch/x86/include/asm/msr.h | 38 +++++++++++++++++++++-----------------
1 file changed, 21 insertions(+), 17 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index b4447ba4d6c2..be593a15a838 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -38,23 +38,6 @@ struct saved_msrs {
struct saved_msr *array;
};
-/*
- * both i386 and x86_64 returns 64-bit value in edx:eax, but gcc's "A"
- * constraint has different meanings. For i386, "A" means exactly
- * edx:eax, while for x86_64 it doesn't mean rdx:rax or edx:eax. Instead,
- * it means rax *or* rdx.
- */
-#ifdef CONFIG_X86_64
-/* Using 64-bit values saves one instruction clearing the high half of low */
-#define DECLARE_ARGS(val, low, high) unsigned long low, high
-#define EAX_EDX_VAL(val, low, high) ((low) | (high) << 32)
-#define EAX_EDX_RET(val, low, high) "=a" (low), "=d" (high)
-#else
-#define DECLARE_ARGS(val, low, high) u64 val
-#define EAX_EDX_VAL(val, low, high) (val)
-#define EAX_EDX_RET(val, low, high) "=A" (val)
-#endif
-
/*
* Be very careful with includes. This header is prone to include loops.
*/
@@ -559,6 +542,23 @@ static __always_inline int wrmsr_safe(u32 msr, u32 low, u32 high)
extern int rdmsr_safe_regs(u32 regs[8]);
extern int wrmsr_safe_regs(u32 regs[8]);
+/*
+ * both i386 and x86_64 returns 64-bit value in edx:eax, but gcc's "A"
+ * constraint has different meanings. For i386, "A" means exactly
+ * edx:eax, while for x86_64 it doesn't mean rdx:rax or edx:eax. Instead,
+ * it means rax *or* rdx.
+ */
+#ifdef CONFIG_X86_64
+/* Using 64-bit values saves one instruction clearing the high half of low */
+#define DECLARE_ARGS(val, low, high) unsigned long low, high
+#define EAX_EDX_VAL(val, low, high) ((low) | (high) << 32)
+#define EAX_EDX_RET(val, low, high) "=a" (low), "=d" (high)
+#else
+#define DECLARE_ARGS(val, low, high) u64 val
+#define EAX_EDX_VAL(val, low, high) (val)
+#define EAX_EDX_RET(val, low, high) "=A" (val)
+#endif
+
static __always_inline u64 native_rdpmcq(int counter)
{
DECLARE_ARGS(val, low, high);
@@ -571,6 +571,10 @@ static __always_inline u64 native_rdpmcq(int counter)
return EAX_EDX_VAL(val, low, high);
}
+#undef DECLARE_ARGS
+#undef EAX_EDX_VAL
+#undef EAX_EDX_RET
+
static __always_inline u64 rdpmcq(int counter)
{
#ifdef CONFIG_XEN_PV
--
2.49.0
* [RFC PATCH v2 34/34] x86/msr: Convert native_rdmsr_no_trace() uses to native_rdmsrq_no_trace() uses
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (32 preceding siblings ...)
2025-04-22 8:22 ` [RFC PATCH v2 33/34] x86/msr: Move the ARGS macros after the MSR read/write APIs Xin Li (Intel)
@ 2025-04-22 8:22 ` Xin Li (Intel)
2025-04-22 15:03 ` [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Sean Christopherson
34 siblings, 0 replies; 94+ messages in thread
From: Xin Li (Intel) @ 2025-04-22 8:22 UTC (permalink / raw)
To: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86
Cc: tglx, mingo, bp, dave.hansen, x86, hpa, acme, jgross,
andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
decui
Convert all native_rdmsr_no_trace() call sites to native_rdmsrq_no_trace(),
using struct msr where a caller needs the two 32-bit halves separately, and
then remove native_rdmsr_no_trace().
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
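(Not part of the patch, just an illustration of the conversion pattern when
a caller only wants one 32-bit half; struct msr is a union, so .q aliases
the .l/.h pair, and the variable names are made up.)

	struct msr val;
	u32 rev;

	val.q = native_rdmsrq_no_trace(MSR_IA32_UCODE_REV);	/* full 64 bits */
	rev = val.h;						/* bits 63:32 */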
arch/x86/include/asm/microcode.h | 6 +++---
arch/x86/include/asm/msr.h | 13 +++----------
arch/x86/kernel/cpu/microcode/amd.c | 8 ++------
arch/x86/kernel/cpu/microcode/core.c | 4 ++--
arch/x86/kernel/cpu/microcode/intel.c | 6 +++---
5 files changed, 13 insertions(+), 24 deletions(-)
diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
index d581fdaf1f36..1d9641349744 100644
--- a/arch/x86/include/asm/microcode.h
+++ b/arch/x86/include/asm/microcode.h
@@ -61,7 +61,7 @@ static inline int intel_microcode_get_datasize(struct microcode_header_intel *hd
static inline u32 intel_get_microcode_revision(void)
{
- u32 rev, dummy;
+ struct msr val;
native_wrmsrq_no_trace(MSR_IA32_UCODE_REV, 0);
@@ -69,9 +69,9 @@ static inline u32 intel_get_microcode_revision(void)
native_cpuid_eax(1);
/* get the current revision from MSR 0x8B */
- native_rdmsr_no_trace(MSR_IA32_UCODE_REV, dummy, rev);
+ val.q = native_rdmsrq_no_trace(MSR_IA32_UCODE_REV);
- return rev;
+ return val.h;
}
#endif /* !CONFIG_CPU_SUP_INTEL */
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index be593a15a838..aebcd846af3e 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -141,9 +141,9 @@ static __always_inline bool is_msr_imm_insn(void *ip)
* / \ |
* / \ |
* native_rdmsrq_no_trace() native_rdmsrq_safe() |
- * / \ |
- * / \ |
- * native_rdmsr_no_trace() native_rdmsrq() |
+ * / |
+ * / |
+ * native_rdmsrq() |
* |
* |
* |
@@ -239,13 +239,6 @@ static __always_inline u64 native_rdmsrq_no_trace(u32 msr)
return val;
}
-#define native_rdmsr_no_trace(msr, low, high) \
-do { \
- u64 __val = native_rdmsrq_no_trace(msr); \
- (void)((low) = (u32)__val); \
- (void)((high) = (u32)(__val >> 32)); \
-} while (0)
-
static inline u64 native_rdmsrq(u32 msr)
{
u64 val = native_rdmsrq_no_trace(msr);
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index f1f275ddab57..b4d66e79089c 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -254,11 +254,7 @@ static bool verify_sha256_digest(u32 patch_id, u32 cur_rev, const u8 *data, unsi
static u32 get_patch_level(void)
{
- u32 rev, dummy __always_unused;
-
- native_rdmsr_no_trace(MSR_AMD64_PATCH_LEVEL, rev, dummy);
-
- return rev;
+ return native_rdmsrq_no_trace(MSR_AMD64_PATCH_LEVEL);
}
static union cpuid_1_eax ucode_rev_to_cpuid(unsigned int val)
@@ -835,7 +831,7 @@ static struct ucode_patch *find_patch(unsigned int cpu)
void reload_ucode_amd(unsigned int cpu)
{
- u32 rev, dummy __always_unused;
+ u32 rev;
struct microcode_amd *mc;
struct ucode_patch *p;
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 9bda8fd987ab..81b264373d3e 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -81,10 +81,10 @@ struct early_load_data early_data;
*/
static bool amd_check_current_patch_level(void)
{
- u32 lvl, dummy, i;
+ u32 lvl, i;
u32 *levels;
- native_rdmsr_no_trace(MSR_AMD64_PATCH_LEVEL, lvl, dummy);
+ lvl = native_rdmsrq_no_trace(MSR_AMD64_PATCH_LEVEL);
levels = final_levels;
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index c0307b1ad63d..1b484214f3ee 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -75,11 +75,11 @@ void intel_collect_cpu_info(struct cpu_signature *sig)
sig->rev = intel_get_microcode_revision();
if (IFM(x86_family(sig->sig), x86_model(sig->sig)) >= INTEL_PENTIUM_III_DESCHUTES) {
- unsigned int val[2];
+ struct msr val;
/* get processor flags from MSR 0x17 */
- native_rdmsr_no_trace(MSR_IA32_PLATFORM_ID, val[0], val[1]);
- sig->pf = 1 << ((val[1] >> 18) & 7);
+ val.q = native_rdmsrq_no_trace(MSR_IA32_PLATFORM_ID);
+ sig->pf = 1 << ((val.h >> 18) & 7);
}
}
EXPORT_SYMBOL_GPL(intel_collect_cpu_info);
--
2.49.0
^ permalink raw reply related [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
` (33 preceding siblings ...)
2025-04-22 8:22 ` [RFC PATCH v2 34/34] x86/msr: Convert native_rdmsr_no_trace() uses to native_rdmsrq_no_trace() uses Xin Li (Intel)
@ 2025-04-22 15:03 ` Sean Christopherson
2025-04-22 17:51 ` Xin Li
34 siblings, 1 reply; 94+ messages in thread
From: Sean Christopherson @ 2025-04-22 15:03 UTC (permalink / raw)
To: Xin Li (Intel)
Cc: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86, tglx, mingo, bp, dave.hansen, x86, hpa, acme,
jgross, andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, luto, boris.ostrovsky, kys, haiyangz, decui
On Tue, Apr 22, 2025, Xin Li (Intel) wrote:
> base-commit: f30a0c0d2b08b355c01392538de8fc872387cb2b
This commit doesn't exist in Linus' tree or the tip tree, and the series doesn't
apply cleanly on any of the "obvious" choices. Reviewing a 34-patch series
without being able to apply it is a wee bit difficult...
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support
2025-04-22 15:03 ` [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Sean Christopherson
@ 2025-04-22 17:51 ` Xin Li
2025-04-22 18:05 ` Luck, Tony
0 siblings, 1 reply; 94+ messages in thread
From: Xin Li @ 2025-04-22 17:51 UTC (permalink / raw)
To: Sean Christopherson
Cc: linux-kernel, kvm, linux-perf-users, linux-hyperv, virtualization,
linux-pm, linux-edac, xen-devel, linux-acpi, linux-hwmon, netdev,
platform-driver-x86, tglx, mingo, bp, dave.hansen, x86, hpa, acme,
jgross, andrew.cooper3, peterz, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
pbonzini, vkuznets, luto, boris.ostrovsky, kys, haiyangz, decui
On 4/22/2025 8:03 AM, Sean Christopherson wrote:
> On Tue, Apr 22, 2025, Xin Li (Intel) wrote:
>> base-commit: f30a0c0d2b08b355c01392538de8fc872387cb2b
>
> This commit doesn't exist in Linus' tree or the tip tree, and the series doesn't
> apply cleanly on any of the "obvious" choices. Reviewing a 34-patch series
> without being able to apply it is a wee bit difficult...
>
$ git show f30a0c0d2b08b355c01392538de8fc872387cb2b
commit f30a0c0d2b08b355c01392538de8fc872387cb2b
Merge: 49b517e68cf7 e396dd85172c
Author: Ingo Molnar <mingo@kernel.org>
Date: Tue Apr 22 08:37:32 2025 +0200
Merge branch into tip/master: 'x86/sev'
# New commits in x86/sev:
e396dd85172c ("x86/sev: Register tpm-svsm platform device")
93b7c6b3ce91 ("tpm: Add SNP SVSM vTPM driver")
b2849b072366 ("svsm: Add header with SVSM_VTPM_CMD helpers")
770de678bc28 ("x86/sev: Add SVSM vTPM probe/send_command
functions")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
You probably need to git pull from the tip tree :-)
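E.g.:

$ git pull git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master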
^ permalink raw reply [flat|nested] 94+ messages in thread
* RE: [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support
2025-04-22 17:51 ` Xin Li
@ 2025-04-22 18:05 ` Luck, Tony
2025-04-22 19:44 ` Ingo Molnar
0 siblings, 1 reply; 94+ messages in thread
From: Luck, Tony @ 2025-04-22 18:05 UTC (permalink / raw)
To: Xin Li, Sean Christopherson
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
linux-perf-users@vger.kernel.org, linux-hyperv@vger.kernel.org,
virtualization@lists.linux.dev, linux-pm@vger.kernel.org,
linux-edac@vger.kernel.org, xen-devel@lists.xenproject.org,
linux-acpi@vger.kernel.org, linux-hwmon@vger.kernel.org,
netdev@vger.kernel.org, platform-driver-x86@vger.kernel.org,
tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
acme@kernel.org, jgross@suse.com, andrew.cooper3@citrix.com,
peterz@infradead.org, namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, Hunter, Adrian, kan.liang@linux.intel.com,
wei.liu@kernel.org, ajay.kaher@broadcom.com,
bcm-kernel-feedback-list@broadcom.com, pbonzini@redhat.com,
vkuznets@redhat.com, luto@kernel.org, Ostrovsky, Boris,
kys@microsoft.com, haiyangz@microsoft.com, Cui, Dexuan
> >> base-commit: f30a0c0d2b08b355c01392538de8fc872387cb2b
> >
> > This commit doesn't exist in Linus' tree or the tip tree, and the series doesn't
> > apply cleanly on any of the "obvious" choices. Reviewing a 34-patch series
> > without being able to apply it is a wee bit difficult...
> >
>
> $ git show f30a0c0d2b08b355c01392538de8fc872387cb2b
> commit f30a0c0d2b08b355c01392538de8fc872387cb2b
> Merge: 49b517e68cf7 e396dd85172c
> Author: Ingo Molnar <mingo@kernel.org>
> Date: Tue Apr 22 08:37:32 2025 +0200
>
> Merge branch into tip/master: 'x86/sev'
>
> # New commits in x86/sev:
> e396dd85172c ("x86/sev: Register tpm-svsm platform device")
> 93b7c6b3ce91 ("tpm: Add SNP SVSM vTPM driver")
> b2849b072366 ("svsm: Add header with SVSM_VTPM_CMD helpers")
> 770de678bc28 ("x86/sev: Add SVSM vTPM probe/send_command
> functions")
>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>
>
> You probably need to git pull from the tip tree :-)
If possible, you should avoid basing a series on tip/master as it gets recreated
frequently by merging all the topic branches. The SHA1 is here today, gone
tomorrow.
If your changes only depend on one TIP topic branch, base on that and mention
it in the cover letter (as well as the SHA1 supplied by git format-patch
--base=xxx).
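For example (assuming the dependency is tip's x86/msr branch; the branch
name below is just illustrative):

$ git fetch git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/msr
$ git checkout -b msr-work FETCH_HEAD
  ... commit your changes ...
$ git format-patch --base=FETCH_HEAD --cover-letter -v2 FETCH_HEAD

That records a matching base-commit trailer in the generated patches.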
If you do depend on multiple tip topic branches, then maybe tip/master is your
only hope. But in that case the cover letter should say "tip/master as of
yyyy-mm-dd".
-Tony
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support
2025-04-22 18:05 ` Luck, Tony
@ 2025-04-22 19:44 ` Ingo Molnar
2025-04-22 19:51 ` Sean Christopherson
0 siblings, 1 reply; 94+ messages in thread
From: Ingo Molnar @ 2025-04-22 19:44 UTC (permalink / raw)
To: Luck, Tony
Cc: Xin Li, Sean Christopherson, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org, linux-perf-users@vger.kernel.org,
linux-hyperv@vger.kernel.org, virtualization@lists.linux.dev,
linux-pm@vger.kernel.org, linux-edac@vger.kernel.org,
xen-devel@lists.xenproject.org, linux-acpi@vger.kernel.org,
linux-hwmon@vger.kernel.org, netdev@vger.kernel.org,
platform-driver-x86@vger.kernel.org, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
x86@kernel.org, hpa@zytor.com, acme@kernel.org, jgross@suse.com,
andrew.cooper3@citrix.com, peterz@infradead.org,
namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, Hunter, Adrian, kan.liang@linux.intel.com,
wei.liu@kernel.org, ajay.kaher@broadcom.com,
bcm-kernel-feedback-list@broadcom.com, pbonzini@redhat.com,
vkuznets@redhat.com, luto@kernel.org, Ostrovsky, Boris,
kys@microsoft.com, haiyangz@microsoft.com, Cui, Dexuan
* Luck, Tony <tony.luck@intel.com> wrote:
> > >> base-commit: f30a0c0d2b08b355c01392538de8fc872387cb2b
> > >
> > > This commit doesn't exist in Linus' tree or the tip tree, and the series doesn't
> > > apply cleanly on any of the "obvious" choices. Reviewing a 34-patch series
> > > without being able to apply it is a wee bit difficult...
> > >
> >
> > $ git show f30a0c0d2b08b355c01392538de8fc872387cb2b
> > commit f30a0c0d2b08b355c01392538de8fc872387cb2b
> > Merge: 49b517e68cf7 e396dd85172c
> > Author: Ingo Molnar <mingo@kernel.org>
> > Date: Tue Apr 22 08:37:32 2025 +0200
> >
> > Merge branch into tip/master: 'x86/sev'
> >
> > # New commits in x86/sev:
> > e396dd85172c ("x86/sev: Register tpm-svsm platform device")
> > 93b7c6b3ce91 ("tpm: Add SNP SVSM vTPM driver")
> > b2849b072366 ("svsm: Add header with SVSM_VTPM_CMD helpers")
> > 770de678bc28 ("x86/sev: Add SVSM vTPM probe/send_command
> > functions")
> >
> > Signed-off-by: Ingo Molnar <mingo@kernel.org>
> >
> >
> > You probably need to git pull from the tip tree :-)
>
> If possible, you should avoid basing a series on tip/master as it
> gets recreated frequently by merging all the topic branches. The SHA1
> is here today, gone tomorrow.
Correct, although for x86 patch submissions via email it's not wrong:
what applies today will likely apply tomorrow as well, regardless of
the SHA1 change. :-)
> If your changes only depend on one TIP topic branch, base on that and
> mention in the cover letter (as well as the SHA1 supplied from git
> format-patches --base=xxx).
Yeah, the main dependency this series has is tip:x86/msr I believe:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/msr
Its SHA1s should be stable at this point.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support
2025-04-22 19:44 ` Ingo Molnar
@ 2025-04-22 19:51 ` Sean Christopherson
0 siblings, 0 replies; 94+ messages in thread
From: Sean Christopherson @ 2025-04-22 19:51 UTC (permalink / raw)
To: Ingo Molnar
Cc: Tony Luck, Xin Li, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org, linux-perf-users@vger.kernel.org,
linux-hyperv@vger.kernel.org, virtualization@lists.linux.dev,
linux-pm@vger.kernel.org, linux-edac@vger.kernel.org,
xen-devel@lists.xenproject.org, linux-acpi@vger.kernel.org,
linux-hwmon@vger.kernel.org, netdev@vger.kernel.org,
platform-driver-x86@vger.kernel.org, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
x86@kernel.org, hpa@zytor.com, acme@kernel.org, jgross@suse.com,
andrew.cooper3@citrix.com, peterz@infradead.org,
namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, Adrian Hunter, kan.liang@linux.intel.com,
wei.liu@kernel.org, ajay.kaher@broadcom.com,
bcm-kernel-feedback-list@broadcom.com, pbonzini@redhat.com,
vkuznets@redhat.com, luto@kernel.org, Boris Ostrovsky,
kys@microsoft.com, haiyangz@microsoft.com, Dexuan Cui
On Tue, Apr 22, 2025, Ingo Molnar wrote:
>
> * Luck, Tony <tony.luck@intel.com> wrote:
>
> > > >> base-commit: f30a0c0d2b08b355c01392538de8fc872387cb2b
> > > >
> > > > This commit doesn't exist in Linus' tree or the tip tree, and the series doesn't
> > > > apply cleanly on any of the "obvious" choices. Reviewing a 34-patch series
> > > > without being able to apply it is a wee bit difficult...
> > > >
> > >
> > > $ git show f30a0c0d2b08b355c01392538de8fc872387cb2b
> > > commit f30a0c0d2b08b355c01392538de8fc872387cb2b
> > > Merge: 49b517e68cf7 e396dd85172c
> > > Author: Ingo Molnar <mingo@kernel.org>
> > > Date: Tue Apr 22 08:37:32 2025 +0200
> > >
> > > Merge branch into tip/master: 'x86/sev'
> > >
> > > # New commits in x86/sev:
> > > e396dd85172c ("x86/sev: Register tpm-svsm platform device")
> > > 93b7c6b3ce91 ("tpm: Add SNP SVSM vTPM driver")
> > > b2849b072366 ("svsm: Add header with SVSM_VTPM_CMD helpers")
> > > 770de678bc28 ("x86/sev: Add SVSM vTPM probe/send_command
> > > functions")
> > >
> > > Signed-off-by: Ingo Molnar <mingo@kernel.org>
> > >
> > >
> > > You probably need to git pull from the tip tree :-)
> >
> > If possible, you should avoid basing a series on tip/master as it
> > gets recreated frequently by merging all the topic branches. The SHA1
> > is here today, gone tomorrow.
>
> Correct, although for x86 patch submissions via email it's not wrong:
> what applies today will likely apply tomorrow as well, regardless of
> the SHA1 change. :-)
Yeah, but as Tony pointed out, when using a base commit that may be ephemeral,
the cover letter needs to call out the tree+branch. This series applies on the
current tip/master, but there was nothing to clue me into that fact.
^ permalink raw reply [flat|nested] 94+ messages in thread