[PATCH 0/6] x86/msr: let paravirt inline rdmsr/wrmsr instructions

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/6] x86/msr: let paravirt inline rdmsr/wrmsr instructions
@ 2025-05-06  9:20 Juergen Gross
  2025-05-06  9:20 ` [PATCH 1/6] coco/tdx: Rename MSR access helpers Juergen Gross
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Juergen Gross @ 2025-05-06  9:20 UTC (permalink / raw)
  To: linux-kernel, x86, linux-coco, kvm, linux-hyperv, virtualization
  Cc: xin, Juergen Gross, Kirill A. Shutemov, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Paolo Bonzini, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Vitaly Kuznetsov,
	Boris Ostrovsky, xen-devel, Ajay Kaher, Alexey Makhalov,
	Broadcom internal kernel review list

When building a kernel with CONFIG_PARAVIRT_XXL the paravirt
infrastructure will always use functions for reading or writing MSRs,
even when running on bare metal.

Switch to inline RDMSR/WRMSR instructions in this case, reducing the
paravirt overhead.

In order to make this less intrusive, some further reorganization of
the MSR access helpers is done in the first 4 patches.

This series has been tested to work with Xen PV and on bare metal.

There has been another approach by Xin Li, which used dedicated #ifdef
and removing the MSR related paravirt hooks instead of just modifying
the paravirt code generation.

Please note that I haven't included the use of WRMSRNS or the
immediate forms of WRMSR and RDMSR, because I wanted to get some
feedback on my approach first. Enhancing paravirt for those cases
is not very complicated, as the main base is already prepared for
that enhancement.

This series is based on the x86/msr branch of the tip tree.

Juergen Gross (6):
  coco/tdx: Rename MSR access helpers
  x86/kvm: Rename the KVM private read_msr() function
  x86/msr: minimize usage of native_*() msr access functions
  x86/msr: Move MSR trace calls one function level up
  x86/paravirt: Switch MSR access pv_ops functions to instruction
    interfaces
  x86/msr: reduce number of low level MSR access helpers

 arch/x86/coco/tdx/tdx.c                   |   8 +-
 arch/x86/hyperv/ivm.c                     |   2 +-
 arch/x86/include/asm/kvm_host.h           |   2 +-
 arch/x86/include/asm/msr.h                | 116 ++++++++++-------
 arch/x86/include/asm/paravirt.h           | 152 ++++++++++++++--------
 arch/x86/include/asm/paravirt_types.h     |  13 +-
 arch/x86/include/asm/qspinlock_paravirt.h |   5 +-
 arch/x86/kernel/kvmclock.c                |   2 +-
 arch/x86/kernel/paravirt.c                |  26 +++-
 arch/x86/kvm/svm/svm.c                    |  16 +--
 arch/x86/kvm/vmx/vmx.c                    |   4 +-
 arch/x86/xen/enlighten_pv.c               |  60 ++++++---
 arch/x86/xen/pmu.c                        |   4 +-
 13 files changed, 262 insertions(+), 148 deletions(-)

-- 
2.43.0

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/6] coco/tdx: Rename MSR access helpers
  2025-05-06  9:20 [PATCH 0/6] x86/msr: let paravirt inline rdmsr/wrmsr instructions Juergen Gross
@ 2025-05-06  9:20 ` Juergen Gross
  2025-05-06 11:26   ` Kirill A. Shutemov
  2025-05-06  9:20 ` [PATCH 2/6] x86/kvm: Rename the KVM private read_msr() function Juergen Gross
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Juergen Gross @ 2025-05-06  9:20 UTC (permalink / raw)
  To: linux-kernel, x86, linux-coco
  Cc: xin, Juergen Gross, Kirill A. Shutemov, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

In order to avoid a name clash with some general MSR access helpers
after a future MSR infrastructure rework, rename the TDX specific
helpers.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/coco/tdx/tdx.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index edab6d6049be..49d79668f85f 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -428,7 +428,7 @@ static void __cpuidle tdx_safe_halt(void)
 	raw_local_irq_enable();
 }
 
-static int read_msr(struct pt_regs *regs, struct ve_info *ve)
+static int tdx_read_msr(struct pt_regs *regs, struct ve_info *ve)
 {
 	struct tdx_module_args args = {
 		.r10 = TDX_HYPERCALL_STANDARD,
@@ -449,7 +449,7 @@ static int read_msr(struct pt_regs *regs, struct ve_info *ve)
 	return ve_instr_len(ve);
 }
 
-static int write_msr(struct pt_regs *regs, struct ve_info *ve)
+static int tdx_write_msr(struct pt_regs *regs, struct ve_info *ve)
 {
 	struct tdx_module_args args = {
 		.r10 = TDX_HYPERCALL_STANDARD,
@@ -802,9 +802,9 @@ static int virt_exception_kernel(struct pt_regs *regs, struct ve_info *ve)
 	case EXIT_REASON_HLT:
 		return handle_halt(ve);
 	case EXIT_REASON_MSR_READ:
-		return read_msr(regs, ve);
+		return tdx_read_msr(regs, ve);
 	case EXIT_REASON_MSR_WRITE:
-		return write_msr(regs, ve);
+		return tdx_write_msr(regs, ve);
 	case EXIT_REASON_CPUID:
 		return handle_cpuid(regs, ve);
 	case EXIT_REASON_EPT_VIOLATION:
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/6] x86/kvm: Rename the KVM private read_msr() function
  2025-05-06  9:20 [PATCH 0/6] x86/msr: let paravirt inline rdmsr/wrmsr instructions Juergen Gross
  2025-05-06  9:20 ` [PATCH 1/6] coco/tdx: Rename MSR access helpers Juergen Gross
@ 2025-05-06  9:20 ` Juergen Gross
  2025-05-06 13:53   ` Sean Christopherson
  2025-05-06  9:20 ` [PATCH 3/6] x86/msr: minimize usage of native_*() msr access functions Juergen Gross
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Juergen Gross @ 2025-05-06  9:20 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: xin, Juergen Gross, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

Avoid a name clash with a new general MSR access helper after a future
MSR infrastructure rework by renaming the KVM specific read_msr() to
kvm_read_msr().

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 arch/x86/kvm/vmx/vmx.c          | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9c971f846108..308f7020dc9d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2275,7 +2275,7 @@ static inline void kvm_load_ldt(u16 sel)
 }
 
 #ifdef CONFIG_X86_64
-static inline unsigned long read_msr(unsigned long msr)
+static inline unsigned long kvm_read_msr(unsigned long msr)
 {
 	u64 value;
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 63de5f6051e5..5a5f3c57363c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1335,8 +1335,8 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 	} else {
 		savesegment(fs, fs_sel);
 		savesegment(gs, gs_sel);
-		fs_base = read_msr(MSR_FS_BASE);
-		vmx->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
+		fs_base = kvm_read_msr(MSR_FS_BASE);
+		vmx->msr_host_kernel_gs_base = kvm_read_msr(MSR_KERNEL_GS_BASE);
 	}
 
 	wrmsrq(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/6] x86/msr: minimize usage of native_*() msr access functions
  2025-05-06  9:20 [PATCH 0/6] x86/msr: let paravirt inline rdmsr/wrmsr instructions Juergen Gross
  2025-05-06  9:20 ` [PATCH 1/6] coco/tdx: Rename MSR access helpers Juergen Gross
  2025-05-06  9:20 ` [PATCH 2/6] x86/kvm: Rename the KVM private read_msr() function Juergen Gross
@ 2025-05-06  9:20 ` Juergen Gross
  2025-05-06 14:24   ` Sean Christopherson
  2025-05-09 21:49   ` Wei Liu
  2025-05-06  9:20 ` [PATCH 4/6] x86/msr: Move MSR trace calls one function level up Juergen Gross
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 30+ messages in thread
From: Juergen Gross @ 2025-05-06  9:20 UTC (permalink / raw)
  To: linux-kernel, x86, linux-hyperv, kvm
  Cc: xin, Juergen Gross, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Paolo Bonzini, Vitaly Kuznetsov,
	Sean Christopherson, Boris Ostrovsky, xen-devel

In order to prepare for some MSR access function reorg work, switch
most users of native_{read|write}_msr[_safe]() to the more generic
rdmsr*()/wrmsr*() variants.

For now this will have some intermediate performance impact with
paravirtualization configured when running on bare metal, but this
is a prereq change for the planned direct inlining of the rdmsr/wrmsr
instructions with this configuration.

The main reason for this switch is the planned move of the MSR trace
function invocation from the native_*() functions to the generic
rdmsr*()/wrmsr*() variants. Without this switch the users of the
native_*() functions would lose the related tracing entries.

Note that the Xen related MSR access functions will not be switched,
as these will be handled after the move of the trace hooks.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/hyperv/ivm.c      |  2 +-
 arch/x86/kernel/kvmclock.c |  2 +-
 arch/x86/kvm/svm/svm.c     | 16 ++++++++--------
 arch/x86/xen/pmu.c         |  4 ++--
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 09a165a3c41e..fe177a6be581 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -319,7 +319,7 @@ int hv_snp_boot_ap(u32 cpu, unsigned long start_ip)
 	asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
 	hv_populate_vmcb_seg(vmsa->ds, vmsa->gdtr.base);
 
-	vmsa->efer = native_read_msr(MSR_EFER);
+	rdmsrq(MSR_EFER, vmsa->efer);
 
 	vmsa->cr4 = native_read_cr4();
 	vmsa->cr3 = __native_read_cr3();
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index ca0a49eeac4a..b6cd45cce5fe 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -196,7 +196,7 @@ static void kvm_setup_secondary_clock(void)
 void kvmclock_disable(void)
 {
 	if (msr_kvm_system_time)
-		native_write_msr(msr_kvm_system_time, 0);
+		wrmsrq(msr_kvm_system_time, 0);
 }
 
 static void __init kvmclock_init_mem(void)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4c2a843780bf..3f0eed84f82a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -482,12 +482,12 @@ static void svm_init_erratum_383(void)
 		return;
 
 	/* Use _safe variants to not break nested virtualization */
-	if (native_read_msr_safe(MSR_AMD64_DC_CFG, &val))
+	if (rdmsrq_safe(MSR_AMD64_DC_CFG, &val))
 		return;
 
 	val |= (1ULL << 47);
 
-	native_write_msr_safe(MSR_AMD64_DC_CFG, val);
+	wrmsrq_safe(MSR_AMD64_DC_CFG, val);
 
 	erratum_383_found = true;
 }
@@ -650,9 +650,9 @@ static int svm_enable_virtualization_cpu(void)
 		u64 len, status = 0;
 		int err;
 
-		err = native_read_msr_safe(MSR_AMD64_OSVW_ID_LENGTH, &len);
+		err = rdmsrq_safe(MSR_AMD64_OSVW_ID_LENGTH, &len);
 		if (!err)
-			err = native_read_msr_safe(MSR_AMD64_OSVW_STATUS, &status);
+			err = rdmsrq_safe(MSR_AMD64_OSVW_STATUS, &status);
 
 		if (err)
 			osvw_status = osvw_len = 0;
@@ -2149,7 +2149,7 @@ static bool is_erratum_383(void)
 	if (!erratum_383_found)
 		return false;
 
-	if (native_read_msr_safe(MSR_IA32_MC0_STATUS, &value))
+	if (rdmsrq_safe(MSR_IA32_MC0_STATUS, &value))
 		return false;
 
 	/* Bit 62 may or may not be set for this mce */
@@ -2160,11 +2160,11 @@ static bool is_erratum_383(void)
 
 	/* Clear MCi_STATUS registers */
 	for (i = 0; i < 6; ++i)
-		native_write_msr_safe(MSR_IA32_MCx_STATUS(i), 0);
+		wrmsrq_safe(MSR_IA32_MCx_STATUS(i), 0);
 
-	if (!native_read_msr_safe(MSR_IA32_MCG_STATUS, &value)) {
+	if (!rdmsrq_safe(MSR_IA32_MCG_STATUS, &value)) {
 		value &= ~(1ULL << 2);
-		native_write_msr_safe(MSR_IA32_MCG_STATUS, value);
+		wrmsrq_safe(MSR_IA32_MCG_STATUS, value);
 	}
 
 	/* Flush tlb to evict multi-match entries */
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index 8f89ce0b67e3..d49a3bdc448b 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -323,7 +323,7 @@ static u64 xen_amd_read_pmc(int counter)
 		u64 val;
 
 		msr = amd_counters_base + (counter * amd_msr_step);
-		native_read_msr_safe(msr, &val);
+		rdmsrq_safe(msr, &val);
 		return val;
 	}
 
@@ -349,7 +349,7 @@ static u64 xen_intel_read_pmc(int counter)
 		else
 			msr = MSR_IA32_PERFCTR0 + counter;
 
-		native_read_msr_safe(msr, &val);
+		rdmsrq_safe(msr, &val);
 		return val;
 	}
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/6] x86/msr: Move MSR trace calls one function level up
  2025-05-06  9:20 [PATCH 0/6] x86/msr: let paravirt inline rdmsr/wrmsr instructions Juergen Gross
                   ` (2 preceding siblings ...)
  2025-05-06  9:20 ` [PATCH 3/6] x86/msr: minimize usage of native_*() msr access functions Juergen Gross
@ 2025-05-06  9:20 ` Juergen Gross
  2025-05-06  9:20 ` [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces Juergen Gross
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Juergen Gross @ 2025-05-06  9:20 UTC (permalink / raw)
  To: linux-kernel, x86, virtualization
  Cc: xin, Juergen Gross, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Ajay Kaher, Alexey Makhalov,
	Broadcom internal kernel review list

In order to prepare paravirt inlining of the MSR access instructions
move the calls of MSR trace functions one function level up.

Introduce helpers {read|write}_msr[_safe]() helpers allowing to have
common definitions in msr.h doing the trace calls.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/msr.h      | 102 ++++++++++++++++++++------------
 arch/x86/include/asm/paravirt.h |  38 +++---------
 2 files changed, 73 insertions(+), 67 deletions(-)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index a9ce56fc8785..3a94cffb6a3e 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -103,14 +103,7 @@ static __always_inline u64 native_rdmsrq(u32 msr)
 
 static inline u64 native_read_msr(u32 msr)
 {
-	u64 val;
-
-	val = __rdmsr(msr);
-
-	if (tracepoint_enabled(read_msr))
-		do_trace_read_msr(msr, val, 0);
-
-	return val;
+	return __rdmsr(msr);
 }
 
 static inline int native_read_msr_safe(u32 msr, u64 *p)
@@ -123,8 +116,6 @@ static inline int native_read_msr_safe(u32 msr, u64 *p)
 		     _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %[err])
 		     : [err] "=r" (err), EAX_EDX_RET(val, low, high)
 		     : "c" (msr));
-	if (tracepoint_enabled(read_msr))
-		do_trace_read_msr(msr, EAX_EDX_VAL(val, low, high), err);
 
 	*p = EAX_EDX_VAL(val, low, high);
 
@@ -135,9 +126,6 @@ static inline int native_read_msr_safe(u32 msr, u64 *p)
 static inline void notrace native_write_msr(u32 msr, u64 val)
 {
 	native_wrmsrq(msr, val);
-
-	if (tracepoint_enabled(write_msr))
-		do_trace_write_msr(msr, val, 0);
 }
 
 /* Can be uninlined because referenced by paravirt */
@@ -151,8 +139,6 @@ static inline int notrace native_write_msr_safe(u32 msr, u64 val)
 		     : [err] "=a" (err)
 		     : "c" (msr), "0" ((u32)val), "d" ((u32)(val >> 32))
 		     : "memory");
-	if (tracepoint_enabled(write_msr))
-		do_trace_write_msr(msr, val, err);
 	return err;
 }
 
@@ -173,59 +159,96 @@ static inline u64 native_read_pmc(int counter)
 #include <asm/paravirt.h>
 #else
 #include <linux/errno.h>
+static __always_inline u64 read_msr(u32 msr)
+{
+	return native_read_msr(msr);
+}
+
+static __always_inline int read_msr_safe(u32 msr, u64 *p)
+{
+	return native_read_msr_safe(msr, p);
+}
+
+static __always_inline void write_msr(u32 msr, u64 val)
+{
+	native_write_msr(msr, val);
+}
+
+static __always_inline int write_msr_safe(u32 msr, u64 val)
+{
+	return native_write_msr_safe(msr, val);
+}
+
+static __always_inline u64 rdpmc(int counter)
+{
+	return native_read_pmc(counter);
+}
+
+#endif	/* !CONFIG_PARAVIRT_XXL */
+
 /*
  * Access to machine-specific registers (available on 586 and better only)
  * Note: the rd* operations modify the parameters directly (without using
  * pointer indirection), this allows gcc to optimize better
  */
 
+#define rdmsrq(msr, val)			\
+do {						\
+	(val) = read_msr(msr);			\
+	if (tracepoint_enabled(read_msr))	\
+		do_trace_read_msr(msr, val, 0);	\
+} while (0)
+
 #define rdmsr(msr, low, high)					\
 do {								\
-	u64 __val = native_read_msr((msr));			\
+	u64 __val;						\
+	rdmsrq(msr, __val);					\
 	(void)((low) = (u32)__val);				\
 	(void)((high) = (u32)(__val >> 32));			\
 } while (0)
 
-static inline void wrmsr(u32 msr, u32 low, u32 high)
+/* rdmsr with exception handling */
+static inline int rdmsrq_safe(u32 msr, u64 *p)
 {
-	native_write_msr(msr, (u64)high << 32 | low);
-}
+	int err;
 
-#define rdmsrq(msr, val)			\
-	((val) = native_read_msr((msr)))
+	err = read_msr_safe(msr, p);
 
-static inline void wrmsrq(u32 msr, u64 val)
-{
-	native_write_msr(msr, val);
-}
+	if (tracepoint_enabled(read_msr))
+		do_trace_read_msr(msr, *p, err);
 
-/* wrmsr with exception handling */
-static inline int wrmsrq_safe(u32 msr, u64 val)
-{
-	return native_write_msr_safe(msr, val);
+	return err;
 }
 
-/* rdmsr with exception handling */
 #define rdmsr_safe(msr, low, high)				\
 ({								\
 	u64 __val;						\
-	int __err = native_read_msr_safe((msr), &__val);	\
+	int __err = rdmsrq_safe((msr), &__val);			\
 	(*low) = (u32)__val;					\
 	(*high) = (u32)(__val >> 32);				\
 	__err;							\
 })
 
-static inline int rdmsrq_safe(u32 msr, u64 *p)
+static inline void wrmsrq(u32 msr, u64 val)
 {
-	return native_read_msr_safe(msr, p);
+	write_msr(msr, val);
+
+	if (tracepoint_enabled(write_msr))
+		do_trace_write_msr(msr, val, 0);
 }
 
-static __always_inline u64 rdpmc(int counter)
+/* wrmsr with exception handling */
+static inline int wrmsrq_safe(u32 msr, u64 val)
 {
-	return native_read_pmc(counter);
-}
+	int err;
 
-#endif	/* !CONFIG_PARAVIRT_XXL */
+	err = write_msr_safe(msr, val);
+
+	if (tracepoint_enabled(write_msr))
+		do_trace_write_msr(msr, val, err);
+
+	return err;
+}
 
 /* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
 #define WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
@@ -242,6 +265,11 @@ static __always_inline void wrmsrns(u32 msr, u64 val)
 		     : : "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32)));
 }
 
+static inline void wrmsr(u32 msr, u32 low, u32 high)
+{
+	wrmsrq(msr, (u64)high << 32 | low);
+}
+
 /*
  * Dual u32 version of wrmsrq_safe():
  */
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 03f680d1057a..a463c747c780 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -195,46 +195,24 @@ static inline int paravirt_write_msr_safe(u32 msr, u64 val)
 	return PVOP_CALL2(int, cpu.write_msr_safe, msr, val);
 }
 
-#define rdmsr(msr, val1, val2)			\
-do {						\
-	u64 _l = paravirt_read_msr(msr);	\
-	val1 = (u32)_l;				\
-	val2 = _l >> 32;			\
-} while (0)
-
-static __always_inline void wrmsr(u32 msr, u32 low, u32 high)
+static __always_inline u64 read_msr(u32 msr)
 {
-	paravirt_write_msr(msr, (u64)high << 32 | low);
+	return paravirt_read_msr(msr);
 }
 
-#define rdmsrq(msr, val)			\
-do {						\
-	val = paravirt_read_msr(msr);		\
-} while (0)
-
-static inline void wrmsrq(u32 msr, u64 val)
+static __always_inline int read_msr_safe(u32 msr, u64 *p)
 {
-	paravirt_write_msr(msr, val);
+	return paravirt_read_msr_safe(msr, p);
 }
 
-static inline int wrmsrq_safe(u32 msr, u64 val)
+static __always_inline void write_msr(u32 msr, u64 val)
 {
-	return paravirt_write_msr_safe(msr, val);
+	paravirt_write_msr(msr, val);
 }
 
-/* rdmsr with exception handling */
-#define rdmsr_safe(msr, a, b)				\
-({							\
-	u64 _l;						\
-	int _err = paravirt_read_msr_safe((msr), &_l);	\
-	(*a) = (u32)_l;					\
-	(*b) = (u32)(_l >> 32);				\
-	_err;						\
-})
-
-static __always_inline int rdmsrq_safe(u32 msr, u64 *p)
+static __always_inline int write_msr_safe(u32 msr, u64 val)
 {
-	return paravirt_read_msr_safe(msr, p);
+	return paravirt_write_msr_safe(msr, val);
 }
 
 static __always_inline u64 rdpmc(int counter)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces
  2025-05-06  9:20 [PATCH 0/6] x86/msr: let paravirt inline rdmsr/wrmsr instructions Juergen Gross
                   ` (3 preceding siblings ...)
  2025-05-06  9:20 ` [PATCH 4/6] x86/msr: Move MSR trace calls one function level up Juergen Gross
@ 2025-05-06  9:20 ` Juergen Gross
  2025-05-09  8:18   ` Xin Li
  2025-05-06  9:20 ` [PATCH 6/6] x86/msr: reduce number of low level MSR access helpers Juergen Gross
  2025-05-10 16:03 ` [PATCH 0/6] x86/msr: let paravirt inline rdmsr/wrmsr instructions Michael Kelley
  6 siblings, 1 reply; 30+ messages in thread
From: Juergen Gross @ 2025-05-06  9:20 UTC (permalink / raw)
  To: linux-kernel, x86, virtualization
  Cc: xin, Juergen Gross, Ajay Kaher, Alexey Makhalov,
	Broadcom internal kernel review list, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Boris Ostrovsky, xen-devel

Instead of having callback functions for rdmsr/wrmsr on native, switch
to inline the respective instructions directly in order to avoid
overhead with the call interface.

This requires to use the instruction interfaces for rdmsr/wrmsr
emulation when running as a Xen PV guest.

In order to prepare support for the immediate forms of RDMSR and WRMSR
when not running as a Xen PV guest, use the RDMSR and WRMSR
instructions as the fallback case instead of ALT_CALL_INSTR.

Note that in the Xen PV case the RDMSR/WRMSR patching must not happen
even as an intermediate step, as this would clobber the indirect call
information needed when patching in the direct call for the Xen case.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/paravirt.h           | 114 +++++++++++++++++-----
 arch/x86/include/asm/paravirt_types.h     |  13 ++-
 arch/x86/include/asm/qspinlock_paravirt.h |   5 +-
 arch/x86/kernel/paravirt.c                |  26 ++++-
 arch/x86/xen/enlighten_pv.c               |  56 ++++++++---
 5 files changed, 167 insertions(+), 47 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index a463c747c780..df10b0e4f7b8 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -175,24 +175,72 @@ static inline void __write_cr4(unsigned long x)
 	PVOP_VCALL1(cpu.write_cr4, x);
 }
 
-static inline u64 paravirt_read_msr(u32 msr)
+static __always_inline u64 paravirt_read_msr(u32 msr)
 {
-	return PVOP_CALL1(u64, cpu.read_msr, msr);
+	EAX_EDX_DECLARE_ARGS(val, low, high);
+
+	PVOP_TEST_NULL(cpu.read_msr);
+	asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
+					"rdmsr", ALT_NOT_XEN,
+					ALT_CALL_INSTR, ALT_XENPV_CALL)
+		     "2:\n"
+		     _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR)
+		     : EAX_EDX_RET(val, low, high), ASM_CALL_CONSTRAINT
+		     : paravirt_ptr(cpu.read_msr), "c" (msr));
+
+	return EAX_EDX_VAL(val, low, high);
 }
 
-static inline void paravirt_write_msr(u32 msr, u64 val)
+static __always_inline void paravirt_write_msr(u32 msr, u64 val)
 {
-	PVOP_VCALL2(cpu.write_msr, msr, val);
+	PVOP_TEST_NULL(cpu.write_msr);
+	asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
+					"wrmsr", ALT_NOT_XEN,
+					ALT_CALL_INSTR, ALT_XENPV_CALL)
+		      "2:\n"
+		      _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
+		      : ASM_CALL_CONSTRAINT
+		      : paravirt_ptr(cpu.write_msr),
+			  "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32))
+		      : "memory");
 }
 
-static inline int paravirt_read_msr_safe(u32 msr, u64 *val)
+static __always_inline int paravirt_read_msr_safe(u32 msr, u64 *p)
 {
-	return PVOP_CALL2(int, cpu.read_msr_safe, msr, val);
+	int err;
+	EAX_EDX_DECLARE_ARGS(val, low, high);
+
+	PVOP_TEST_NULL(cpu.read_msr_safe);
+	asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
+					"rdmsr; xor %[err],%[err]", ALT_NOT_XEN,
+					ALT_CALL_INSTR, ALT_XENPV_CALL)
+		     "2:\n"
+		     _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %[err])
+		     : [err] "=c" (err), EAX_EDX_RET(val, low, high),
+		       ASM_CALL_CONSTRAINT
+		     : paravirt_ptr(cpu.read_msr_safe), "0" (msr));
+
+	*p = EAX_EDX_VAL(val, low, high);
+
+	return err;
 }
 
-static inline int paravirt_write_msr_safe(u32 msr, u64 val)
+static __always_inline int paravirt_write_msr_safe(u32 msr, u64 val)
 {
-	return PVOP_CALL2(int, cpu.write_msr_safe, msr, val);
+	int err;
+
+	PVOP_TEST_NULL(cpu.write_msr_safe);
+	asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
+					"wrmsr; xor %[err],%[err]", ALT_NOT_XEN,
+					ALT_CALL_INSTR, ALT_XENPV_CALL)
+		     "2:\n"
+		     _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_WRMSR_SAFE, %[err])
+		     : [err] "=a" (err), ASM_CALL_CONSTRAINT
+		     : paravirt_ptr(cpu.write_msr_safe),
+		       "c" (msr), "0" ((u32)val), "d" ((u32)(val >> 32))
+		     : "memory");
+
+	return err;
 }
 
 static __always_inline u64 read_msr(u32 msr)
@@ -573,27 +621,43 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu);
 #define PV_SAVE_ALL_CALLER_REGS		"pushl %ecx;"
 #define PV_RESTORE_ALL_CALLER_REGS	"popl  %ecx;"
 #else
+/* save and restore caller-save registers, except %rax, %rcx and %rdx. */
+#define PV_SAVE_COMMON_CALLER_REGS	\
+	"push %rsi;"			\
+	"push %rdi;"			\
+	"push %r8;"			\
+	"push %r9;"			\
+	"push %r10;"			\
+	"push %r11;"
+#define PV_RESTORE_COMMON_CALLER_REGS	\
+	"pop %r11;"			\
+	"pop %r10;"			\
+	"pop %r9;"			\
+	"pop %r8;"			\
+	"pop %rdi;"			\
+	"pop %rsi;"
+
+#define PV_PROLOGUE_MSR(func)		\
+	PV_SAVE_COMMON_CALLER_REGS	\
+	PV_PROLOGUE_MSR_##func
+#define PV_EPILOGUE_MSR(func)		\
+	PV_EPILOGUE_MSR_##func		\
+	PV_RESTORE_COMMON_CALLER_REGS
+
 /* save and restore all caller-save registers, except return value */
 #define PV_SAVE_ALL_CALLER_REGS						\
 	"push %rcx;"							\
 	"push %rdx;"							\
-	"push %rsi;"							\
-	"push %rdi;"							\
-	"push %r8;"							\
-	"push %r9;"							\
-	"push %r10;"							\
-	"push %r11;"
+	PV_SAVE_COMMON_CALLER_REGS
 #define PV_RESTORE_ALL_CALLER_REGS					\
-	"pop %r11;"							\
-	"pop %r10;"							\
-	"pop %r9;"							\
-	"pop %r8;"							\
-	"pop %rdi;"							\
-	"pop %rsi;"							\
+	PV_RESTORE_COMMON_CALLER_REGS					\
 	"pop %rdx;"							\
 	"pop %rcx;"
 #endif
 
+#define PV_PROLOGUE_ALL(func)	PV_SAVE_ALL_CALLER_REGS
+#define PV_EPILOGUE_ALL(func)	PV_RESTORE_ALL_CALLER_REGS
+
 /*
  * Generate a thunk around a function which saves all caller-save
  * registers except for the return value.  This allows C functions to
@@ -607,7 +671,7 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu);
  * functions.
  */
 #define PV_THUNK_NAME(func) "__raw_callee_save_" #func
-#define __PV_CALLEE_SAVE_REGS_THUNK(func, section)			\
+#define __PV_CALLEE_SAVE_REGS_THUNK(func, section, helper)		\
 	extern typeof(func) __raw_callee_save_##func;			\
 									\
 	asm(".pushsection " section ", \"ax\";"				\
@@ -617,16 +681,18 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu);
 	    PV_THUNK_NAME(func) ":"					\
 	    ASM_ENDBR							\
 	    FRAME_BEGIN							\
-	    PV_SAVE_ALL_CALLER_REGS					\
+	    PV_PROLOGUE_##helper(func)					\
 	    "call " #func ";"						\
-	    PV_RESTORE_ALL_CALLER_REGS					\
+	    PV_EPILOGUE_##helper(func)					\
 	    FRAME_END							\
 	    ASM_RET							\
 	    ".size " PV_THUNK_NAME(func) ", .-" PV_THUNK_NAME(func) ";"	\
 	    ".popsection")
 
 #define PV_CALLEE_SAVE_REGS_THUNK(func)			\
-	__PV_CALLEE_SAVE_REGS_THUNK(func, ".text")
+	__PV_CALLEE_SAVE_REGS_THUNK(func, ".text", ALL)
+#define PV_CALLEE_SAVE_REGS_MSR_THUNK(func)		\
+	__PV_CALLEE_SAVE_REGS_THUNK(func, ".text", MSR)
 
 /* Get a reference to a callee-save function */
 #define PV_CALLEE_SAVE(func)						\
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index b08b9d3122d6..f7f879319e90 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -91,15 +91,15 @@ struct pv_cpu_ops {
 		      unsigned int *ecx, unsigned int *edx);
 
 	/* Unsafe MSR operations.  These will warn or panic on failure. */
-	u64 (*read_msr)(u32 msr);
-	void (*write_msr)(u32 msr, u64 val);
+	struct paravirt_callee_save read_msr;
+	struct paravirt_callee_save write_msr;
 
 	/*
 	 * Safe MSR operations.
 	 * Returns 0 or -EIO.
 	 */
-	int (*read_msr_safe)(u32 msr, u64 *val);
-	int (*write_msr_safe)(u32 msr, u64 val);
+	struct paravirt_callee_save read_msr_safe;
+	struct paravirt_callee_save write_msr_safe;
 
 	u64 (*read_pmc)(int counter);
 
@@ -520,6 +520,10 @@ unsigned long pv_native_save_fl(void);
 void pv_native_irq_disable(void);
 void pv_native_irq_enable(void);
 unsigned long pv_native_read_cr2(void);
+void pv_native_rdmsr(void);
+void pv_native_wrmsr(void);
+void pv_native_rdmsr_safe(void);
+void pv_native_wrmsr_safe(void);
 #endif
 
 #define paravirt_nop	((void *)nop_func)
@@ -527,6 +531,7 @@ unsigned long pv_native_read_cr2(void);
 #endif	/* __ASSEMBLER__ */
 
 #define ALT_NOT_XEN	ALT_NOT(X86_FEATURE_XENPV)
+#define ALT_XENPV_CALL	ALT_DIRECT_CALL(X86_FEATURE_XENPV)
 
 #endif  /* CONFIG_PARAVIRT */
 #endif	/* _ASM_X86_PARAVIRT_TYPES_H */
diff --git a/arch/x86/include/asm/qspinlock_paravirt.h b/arch/x86/include/asm/qspinlock_paravirt.h
index 0a985784be9b..0351acb5a143 100644
--- a/arch/x86/include/asm/qspinlock_paravirt.h
+++ b/arch/x86/include/asm/qspinlock_paravirt.h
@@ -14,7 +14,8 @@ void __lockfunc __pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 lock
  */
 #ifdef CONFIG_64BIT
 
-__PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock_slowpath, ".spinlock.text");
+__PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock_slowpath, ".spinlock.text",
+			    ALL);
 #define __pv_queued_spin_unlock	__pv_queued_spin_unlock
 
 /*
@@ -61,7 +62,7 @@ DEFINE_ASM_FUNC(__raw_callee_save___pv_queued_spin_unlock,
 #else /* CONFIG_64BIT */
 
 extern void __lockfunc __pv_queued_spin_unlock(struct qspinlock *lock);
-__PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock, ".spinlock.text");
+__PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock, ".spinlock.text", ALL);
 
 #endif /* CONFIG_64BIT */
 #endif
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 015bf298434f..ff7d7fdae360 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -50,6 +50,24 @@ DEFINE_ASM_FUNC(pv_native_save_fl, "pushf; pop %rax", .noinstr.text);
 DEFINE_ASM_FUNC(pv_native_irq_disable, "cli", .noinstr.text);
 DEFINE_ASM_FUNC(pv_native_irq_enable, "sti", .noinstr.text);
 DEFINE_ASM_FUNC(pv_native_read_cr2, "mov %cr2, %rax", .noinstr.text);
+DEFINE_ASM_FUNC(pv_native_rdmsr,
+		"1: rdmsr\n"
+		"2:\n"
+		_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR), .noinstr.text);
+DEFINE_ASM_FUNC(pv_native_wrmsr,
+		"1: wrmsr\n"
+		"2:\n"
+		_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR), .noinstr.text);
+DEFINE_ASM_FUNC(pv_native_rdmsr_safe,
+		"1: rdmsr; xor %ecx, %ecx\n"
+		"2:\n"
+		_ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %%ecx),
+		.noinstr.text);
+DEFINE_ASM_FUNC(pv_native_wrmsr_safe,
+		"1: wrmsr; xor %eax, %eax\n"
+		"2:\n"
+		_ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_WRMSR_SAFE, %%eax),
+		.noinstr.text);
 #endif
 
 DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);
@@ -129,10 +147,10 @@ struct paravirt_patch_template pv_ops = {
 	.cpu.read_cr0		= native_read_cr0,
 	.cpu.write_cr0		= native_write_cr0,
 	.cpu.write_cr4		= native_write_cr4,
-	.cpu.read_msr		= native_read_msr,
-	.cpu.write_msr		= native_write_msr,
-	.cpu.read_msr_safe	= native_read_msr_safe,
-	.cpu.write_msr_safe	= native_write_msr_safe,
+	.cpu.read_msr		= __PV_IS_CALLEE_SAVE(pv_native_rdmsr),
+	.cpu.write_msr		= __PV_IS_CALLEE_SAVE(pv_native_wrmsr),
+	.cpu.read_msr_safe	= __PV_IS_CALLEE_SAVE(pv_native_rdmsr_safe),
+	.cpu.write_msr_safe	= __PV_IS_CALLEE_SAVE(pv_native_wrmsr_safe),
 	.cpu.read_pmc		= native_read_pmc,
 	.cpu.load_tr_desc	= native_load_tr_desc,
 	.cpu.set_ldt		= native_set_ldt,
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 3be38350f044..c279b2bef7eb 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1160,36 +1160,66 @@ static void xen_do_write_msr(u32 msr, u64 val, int *err)
 	}
 }
 
-static int xen_read_msr_safe(u32 msr, u64 *val)
-{
+/*
+ * Prototypes for functions called via PV_CALLEE_SAVE_REGS_THUNK() in order
+ * to avoid warnings with "-Wmissing-prototypes".
+ */
+struct xen_rdmsr_safe_ret {
+	u64 val;
 	int err;
+};
+struct xen_rdmsr_safe_ret xen_read_msr_safe(u32 msr);
+int xen_write_msr_safe(u32 msr, u32 low, u32 high);
+u64 xen_read_msr(u32 msr);
+void xen_write_msr(u32 msr, u32 low, u32 high);
 
-	*val = xen_do_read_msr(msr, &err);
-	return err;
+__visible struct xen_rdmsr_safe_ret xen_read_msr_safe(u32 msr)
+{
+	struct xen_rdmsr_safe_ret ret;
+
+	ret.val = xen_do_read_msr(msr, &ret.err);
+	return ret;
 }
+#define PV_PROLOGUE_MSR_xen_read_msr_safe	"mov %ecx, %edi;"
+#define PV_EPILOGUE_MSR_xen_read_msr_safe	\
+	"mov %edx, %ecx; mov %rax, %rdx; mov %eax, %eax; shr $0x20, %rdx;"
+PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_read_msr_safe);
 
-static int xen_write_msr_safe(u32 msr, u64 val)
+__visible int xen_write_msr_safe(u32 msr, u32 low, u32 high)
 {
 	int err = 0;
 
-	xen_do_write_msr(msr, val, &err);
+	xen_do_write_msr(msr, (u64)high << 32 | low, &err);
 
 	return err;
 }
+#define PV_PROLOGUE_MSR_xen_write_msr_safe	\
+	"mov %ecx, %edi; mov %eax, %esi;"
+#define PV_EPILOGUE_MSR_xen_write_msr_safe
+PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_write_msr_safe);
 
-static u64 xen_read_msr(u32 msr)
+__visible u64 xen_read_msr(u32 msr)
 {
 	int err;
 
 	return xen_do_read_msr(msr, xen_msr_safe ? &err : NULL);
 }
+#define PV_PROLOGUE_MSR_xen_read_msr	"mov %ecx, %edi;"
+#define PV_EPILOGUE_MSR_xen_read_msr	\
+	"mov %rax, %rdx; mov %eax, %eax; shr $0x20, %rdx;"
+PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_read_msr);
 
-static void xen_write_msr(u32 msr, u64 val)
+__visible void xen_write_msr(u32 msr, u32 low, u32 high)
 {
 	int err;
 
-	xen_do_write_msr(msr, val, xen_msr_safe ? &err : NULL);
+	xen_do_write_msr(msr, (u64)high << 32 | low,
+			 xen_msr_safe ? &err : NULL);
 }
+#define PV_PROLOGUE_MSR_xen_write_msr	\
+	"mov %ecx, %edi; mov %eax, %esi;"
+#define PV_EPILOGUE_MSR_xen_write_msr
+PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_write_msr);
 
 /* This is called once we have the cpu_possible_mask */
 void __init xen_setup_vcpu_info_placement(void)
@@ -1225,11 +1255,11 @@ static const typeof(pv_ops) xen_cpu_ops __initconst = {
 
 		.write_cr4 = xen_write_cr4,
 
-		.read_msr = xen_read_msr,
-		.write_msr = xen_write_msr,
+		.read_msr = PV_CALLEE_SAVE(xen_read_msr),
+		.write_msr = PV_CALLEE_SAVE(xen_write_msr),
 
-		.read_msr_safe = xen_read_msr_safe,
-		.write_msr_safe = xen_write_msr_safe,
+		.read_msr_safe = PV_CALLEE_SAVE(xen_read_msr_safe),
+		.write_msr_safe = PV_CALLEE_SAVE(xen_write_msr_safe),
 
 		.read_pmc = xen_read_pmc,
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 6/6] x86/msr: reduce number of low level MSR access helpers
  2025-05-06  9:20 [PATCH 0/6] x86/msr: let paravirt inline rdmsr/wrmsr instructions Juergen Gross
                   ` (4 preceding siblings ...)
  2025-05-06  9:20 ` [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces Juergen Gross
@ 2025-05-06  9:20 ` Juergen Gross
  2025-05-10 16:03 ` [PATCH 0/6] x86/msr: let paravirt inline rdmsr/wrmsr instructions Michael Kelley
  6 siblings, 0 replies; 30+ messages in thread
From: Juergen Gross @ 2025-05-06  9:20 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: xin, Juergen Gross, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Boris Ostrovsky, xen-devel

Some MSR access helpers are redundant now, so remove the no longer
needed ones.

At the same time make the native_*_msr_safe() helpers always inline.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/msr.h  | 20 ++++----------------
 arch/x86/xen/enlighten_pv.c |  4 ++--
 2 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 3a94cffb6a3e..0e2ed1604015 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -101,12 +101,7 @@ static __always_inline u64 native_rdmsrq(u32 msr)
 #define native_wrmsrq(msr, val)				\
 	__wrmsrq((msr), (val))
 
-static inline u64 native_read_msr(u32 msr)
-{
-	return __rdmsr(msr);
-}
-
-static inline int native_read_msr_safe(u32 msr, u64 *p)
+static __always_inline int native_read_msr_safe(u32 msr, u64 *p)
 {
 	int err;
 	EAX_EDX_DECLARE_ARGS(val, low, high);
@@ -122,14 +117,7 @@ static inline int native_read_msr_safe(u32 msr, u64 *p)
 	return err;
 }
 
-/* Can be uninlined because referenced by paravirt */
-static inline void notrace native_write_msr(u32 msr, u64 val)
-{
-	native_wrmsrq(msr, val);
-}
-
-/* Can be uninlined because referenced by paravirt */
-static inline int notrace native_write_msr_safe(u32 msr, u64 val)
+static __always_inline int notrace native_write_msr_safe(u32 msr, u64 val)
 {
 	int err;
 
@@ -161,7 +149,7 @@ static inline u64 native_read_pmc(int counter)
 #include <linux/errno.h>
 static __always_inline u64 read_msr(u32 msr)
 {
-	return native_read_msr(msr);
+	return native_rdmsrq(msr);
 }
 
 static __always_inline int read_msr_safe(u32 msr, u64 *p)
@@ -171,7 +159,7 @@ static __always_inline int read_msr_safe(u32 msr, u64 *p)
 
 static __always_inline void write_msr(u32 msr, u64 val)
 {
-	native_write_msr(msr, val);
+	native_wrmsrq(msr, val);
 }
 
 static __always_inline int write_msr_safe(u32 msr, u64 val)
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index c279b2bef7eb..ea3d7d583254 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1097,7 +1097,7 @@ static u64 xen_do_read_msr(u32 msr, int *err)
 	if (err)
 		*err = native_read_msr_safe(msr, &val);
 	else
-		val = native_read_msr(msr);
+		val = native_rdmsrq(msr);
 
 	switch (msr) {
 	case MSR_IA32_APICBASE:
@@ -1156,7 +1156,7 @@ static void xen_do_write_msr(u32 msr, u64 val, int *err)
 		if (err)
 			*err = native_write_msr_safe(msr, val);
 		else
-			native_write_msr(msr, val);
+			native_wrmsrq(msr, val);
 	}
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/6] coco/tdx: Rename MSR access helpers
  2025-05-06  9:20 ` [PATCH 1/6] coco/tdx: Rename MSR access helpers Juergen Gross
@ 2025-05-06 11:26   ` Kirill A. Shutemov
  0 siblings, 0 replies; 30+ messages in thread
From: Kirill A. Shutemov @ 2025-05-06 11:26 UTC (permalink / raw)
  To: Juergen Gross
  Cc: linux-kernel, x86, linux-coco, xin, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin

On Tue, May 06, 2025 at 11:20:10AM +0200, Juergen Gross wrote:
> In order to avoid a name clash with some general MSR access helpers
> after a future MSR infrastructure rework, rename the TDX specific
> helpers.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] x86/kvm: Rename the KVM private read_msr() function
  2025-05-06  9:20 ` [PATCH 2/6] x86/kvm: Rename the KVM private read_msr() function Juergen Gross
@ 2025-05-06 13:53   ` Sean Christopherson
  2025-05-06 13:58     ` Jürgen Groß
  2025-05-06 16:16     ` Ingo Molnar
  0 siblings, 2 replies; 30+ messages in thread
From: Sean Christopherson @ 2025-05-06 13:53 UTC (permalink / raw)
  To: Juergen Gross
  Cc: linux-kernel, x86, kvm, xin, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

On Tue, May 06, 2025, Juergen Gross wrote:
> Avoid a name clash with a new general MSR access helper after a future
> MSR infrastructure rework by renaming the KVM specific read_msr() to
> kvm_read_msr().
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 2 +-
>  arch/x86/kvm/vmx/vmx.c          | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9c971f846108..308f7020dc9d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2275,7 +2275,7 @@ static inline void kvm_load_ldt(u16 sel)
>  }
>  
>  #ifdef CONFIG_X86_64
> -static inline unsigned long read_msr(unsigned long msr)

Ewwww.  Eww, eww, eww.  I forgot this thing existed.

Please just delete this and use rdmsrq() directly (or is it still rdmsrl()? at
this point?).

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] x86/kvm: Rename the KVM private read_msr() function
  2025-05-06 13:53   ` Sean Christopherson
@ 2025-05-06 13:58     ` Jürgen Groß
  2025-05-06 16:16     ` Ingo Molnar
  1 sibling, 0 replies; 30+ messages in thread
From: Jürgen Groß @ 2025-05-06 13:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-kernel, x86, kvm, xin, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin


[-- Attachment #1.1.1: Type: text/plain, Size: 1068 bytes --]

On 06.05.25 15:53, Sean Christopherson wrote:
> On Tue, May 06, 2025, Juergen Gross wrote:
>> Avoid a name clash with a new general MSR access helper after a future
>> MSR infrastructure rework by renaming the KVM specific read_msr() to
>> kvm_read_msr().
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   arch/x86/include/asm/kvm_host.h | 2 +-
>>   arch/x86/kvm/vmx/vmx.c          | 4 ++--
>>   2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 9c971f846108..308f7020dc9d 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -2275,7 +2275,7 @@ static inline void kvm_load_ldt(u16 sel)
>>   }
>>   
>>   #ifdef CONFIG_X86_64
>> -static inline unsigned long read_msr(unsigned long msr)
> 
> Ewwww.  Eww, eww, eww.  I forgot this thing existed.
> 
> Please just delete this and use rdmsrq() directly (or is it still rdmsrl()? at
> this point?).

rdmsrq() it is.

Fine with me.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/6] x86/msr: minimize usage of native_*() msr access functions
  2025-05-06  9:20 ` [PATCH 3/6] x86/msr: minimize usage of native_*() msr access functions Juergen Gross
@ 2025-05-06 14:24   ` Sean Christopherson
  2025-05-09 21:49   ` Wei Liu
  1 sibling, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2025-05-06 14:24 UTC (permalink / raw)
  To: Juergen Gross
  Cc: linux-kernel, x86, linux-hyperv, kvm, xin, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Paolo Bonzini,
	Vitaly Kuznetsov, Boris Ostrovsky, xen-devel

On Tue, May 06, 2025, Juergen Gross wrote:
> In order to prepare for some MSR access function reorg work, switch
> most users of native_{read|write}_msr[_safe]() to the more generic
> rdmsr*()/wrmsr*() variants.
> 
> For now this will have some intermediate performance impact with
> paravirtualization configured when running on bare metal, but this
> is a prereq change for the planned direct inlining of the rdmsr/wrmsr
> instructions with this configuration.

Oh the horror, KVM's probing of errata will be marginally slower :-)

> The main reason for this switch is the planned move of the MSR trace
> function invocation from the native_*() functions to the generic
> rdmsr*()/wrmsr*() variants. Without this switch the users of the
> native_*() functions would lose the related tracing entries.
> 
> Note that the Xen related MSR access functions will not be switched,
> as these will be handled after the move of the trace hooks.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---

Acked-by: Sean Christopherson <seanjc@google.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] x86/kvm: Rename the KVM private read_msr() function
  2025-05-06 13:53   ` Sean Christopherson
  2025-05-06 13:58     ` Jürgen Groß
@ 2025-05-06 16:16     ` Ingo Molnar
  2025-05-06 16:29       ` H. Peter Anvin
  1 sibling, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2025-05-06 16:16 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Juergen Gross, linux-kernel, x86, kvm, xin, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin


* Sean Christopherson <seanjc@google.com> wrote:

> On Tue, May 06, 2025, Juergen Gross wrote:
> > Avoid a name clash with a new general MSR access helper after a future
> > MSR infrastructure rework by renaming the KVM specific read_msr() to
> > kvm_read_msr().
> > 
> > Signed-off-by: Juergen Gross <jgross@suse.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h | 2 +-
> >  arch/x86/kvm/vmx/vmx.c          | 4 ++--
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 9c971f846108..308f7020dc9d 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -2275,7 +2275,7 @@ static inline void kvm_load_ldt(u16 sel)
> >  }
> >  
> >  #ifdef CONFIG_X86_64
> > -static inline unsigned long read_msr(unsigned long msr)
> 
> Ewwww.  Eww, eww, eww.  I forgot this thing existed.
> 
> Please just delete this and use rdmsrq() directly (or is it still rdmsrl()? at
> this point?).

Both will work, so code-in-transition isn't build-broken unnecessarily:

  arch/x86/include/asm/msr.h:#define rdmsrl(msr, val) rdmsrq(msr, val)

:-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] x86/kvm: Rename the KVM private read_msr() function
  2025-05-06 16:16     ` Ingo Molnar
@ 2025-05-06 16:29       ` H. Peter Anvin
  0 siblings, 0 replies; 30+ messages in thread
From: H. Peter Anvin @ 2025-05-06 16:29 UTC (permalink / raw)
  To: Ingo Molnar, Sean Christopherson
  Cc: Juergen Gross, linux-kernel, x86, kvm, xin, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen

On May 6, 2025 9:16:03 AM PDT, Ingo Molnar <mingo@kernel.org> wrote:
>
>* Sean Christopherson <seanjc@google.com> wrote:
>
>> On Tue, May 06, 2025, Juergen Gross wrote:
>> > Avoid a name clash with a new general MSR access helper after a future
>> > MSR infrastructure rework by renaming the KVM specific read_msr() to
>> > kvm_read_msr().
>> > 
>> > Signed-off-by: Juergen Gross <jgross@suse.com>
>> > ---
>> >  arch/x86/include/asm/kvm_host.h | 2 +-
>> >  arch/x86/kvm/vmx/vmx.c          | 4 ++--
>> >  2 files changed, 3 insertions(+), 3 deletions(-)
>> > 
>> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> > index 9c971f846108..308f7020dc9d 100644
>> > --- a/arch/x86/include/asm/kvm_host.h
>> > +++ b/arch/x86/include/asm/kvm_host.h
>> > @@ -2275,7 +2275,7 @@ static inline void kvm_load_ldt(u16 sel)
>> >  }
>> >  
>> >  #ifdef CONFIG_X86_64
>> > -static inline unsigned long read_msr(unsigned long msr)
>> 
>> Ewwww.  Eww, eww, eww.  I forgot this thing existed.
>> 
>> Please just delete this and use rdmsrq() directly (or is it still rdmsrl()? at
>> this point?).
>
>Both will work, so code-in-transition isn't build-broken unnecessarily:
>
>  arch/x86/include/asm/msr.h:#define rdmsrl(msr, val) rdmsrq(msr, val)
>
>:-)
>
>Thanks,
>
>	Ingo

But for forward-looking code, rdmsrq().

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces
  2025-05-06  9:20 ` [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces Juergen Gross
@ 2025-05-09  8:18   ` Xin Li
  2025-05-12 11:20     ` Jürgen Groß
  0 siblings, 1 reply; 30+ messages in thread
From: Xin Li @ 2025-05-09  8:18 UTC (permalink / raw)
  To: Juergen Gross, linux-kernel, x86, virtualization
  Cc: Ajay Kaher, Alexey Makhalov, Broadcom internal kernel review list,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Boris Ostrovsky, xen-devel

On 5/6/2025 2:20 AM, Juergen Gross wrote:
> Instead of having callback functions for rdmsr/wrmsr on native, switch
> to inline the respective instructions directly in order to avoid
> overhead with the call interface.

To me, this is a beneficial addition to the existing pvops MSR code.

> 
> This requires to use the instruction interfaces for rdmsr/wrmsr
> emulation when running as a Xen PV guest.
> 
> In order to prepare support for the immediate forms of RDMSR and WRMSR
> when not running as a Xen PV guest, use the RDMSR and WRMSR
> instructions as the fallback case instead of ALT_CALL_INSTR.

I'm trying to evaluate how to add the immediate form MSR instructions
on top of this patch set.  And I'm close to get it done.

> 
> Note that in the Xen PV case the RDMSR/WRMSR patching must not happen
> even as an intermediate step, as this would clobber the indirect call
> information needed when patching in the direct call for the Xen case.

Good point!


Deciding whether to retain the pvops MSR API is the responsibility of
the x86 maintainers, who are the ones experiencing the challenges of 
maintaining the code.

tglx said @https://lore.kernel.org/lkml/87y1h81ht4.ffs@tglx/:

 > I fundamentaly hate adding this to the PV infrastructure. We don't
 > want more PV ops, quite the contrary.

That is the reason I took a different direction, i.e., removing the
pvops MSR APIs.  But if your approach is cleaner, they may prefer it.

> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index a463c747c780..df10b0e4f7b8 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -175,24 +175,72 @@ static inline void __write_cr4(unsigned long x)
>   	PVOP_VCALL1(cpu.write_cr4, x);
>   }
>   
> -static inline u64 paravirt_read_msr(u32 msr)
> +static __always_inline u64 paravirt_read_msr(u32 msr)
>   {
> -	return PVOP_CALL1(u64, cpu.read_msr, msr);
> +	EAX_EDX_DECLARE_ARGS(val, low, high);

This is under CONFIG_PARAVIRT_XXL, thus CONFIG_XEN_PV and CONFIG_X86_64,
therefore we don't need to consider 32-bit at all, no?

> +
> +	PVOP_TEST_NULL(cpu.read_msr);
> +	asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
> +					"rdmsr", ALT_NOT_XEN,
> +					ALT_CALL_INSTR, ALT_XENPV_CALL)
> +		     "2:\n"
> +		     _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR)
> +		     : EAX_EDX_RET(val, low, high), ASM_CALL_CONSTRAINT
> +		     : paravirt_ptr(cpu.read_msr), "c" (msr));
> +
> +	return EAX_EDX_VAL(val, low, high);
>   }
>   
> -static inline void paravirt_write_msr(u32 msr, u64 val)
> +static __always_inline void paravirt_write_msr(u32 msr, u64 val)
>   {
> -	PVOP_VCALL2(cpu.write_msr, msr, val);
> +	PVOP_TEST_NULL(cpu.write_msr);
> +	asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
> +					"wrmsr", ALT_NOT_XEN,
> +					ALT_CALL_INSTR, ALT_XENPV_CALL)
> +		      "2:\n"
> +		      _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
> +		      : ASM_CALL_CONSTRAINT
> +		      : paravirt_ptr(cpu.write_msr),
> +			  "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32))
> +		      : "memory");
>   }
>   
> -static inline int paravirt_read_msr_safe(u32 msr, u64 *val)
> +static __always_inline int paravirt_read_msr_safe(u32 msr, u64 *p)
>   {
> -	return PVOP_CALL2(int, cpu.read_msr_safe, msr, val);
> +	int err;
> +	EAX_EDX_DECLARE_ARGS(val, low, high);
> +
> +	PVOP_TEST_NULL(cpu.read_msr_safe);
> +	asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
> +					"rdmsr; xor %[err],%[err]", ALT_NOT_XEN,
> +					ALT_CALL_INSTR, ALT_XENPV_CALL)
> +		     "2:\n"
> +		     _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %[err])
> +		     : [err] "=c" (err), EAX_EDX_RET(val, low, high),
> +		       ASM_CALL_CONSTRAINT
> +		     : paravirt_ptr(cpu.read_msr_safe), "0" (msr));
> +
> +	*p = EAX_EDX_VAL(val, low, high);
> +
> +	return err;
>   }
>   
> -static inline int paravirt_write_msr_safe(u32 msr, u64 val)
> +static __always_inline int paravirt_write_msr_safe(u32 msr, u64 val)
>   {
> -	return PVOP_CALL2(int, cpu.write_msr_safe, msr, val);
> +	int err;
> +
> +	PVOP_TEST_NULL(cpu.write_msr_safe);
> +	asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
> +					"wrmsr; xor %[err],%[err]", ALT_NOT_XEN,
> +					ALT_CALL_INSTR, ALT_XENPV_CALL)
> +		     "2:\n"
> +		     _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_WRMSR_SAFE, %[err])
> +		     : [err] "=a" (err), ASM_CALL_CONSTRAINT
> +		     : paravirt_ptr(cpu.write_msr_safe),
> +		       "c" (msr), "0" ((u32)val), "d" ((u32)(val >> 32))
> +		     : "memory");
> +
> +	return err;
>   }
>   
>   static __always_inline u64 read_msr(u32 msr)
> @@ -573,27 +621,43 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu);
>   #define PV_SAVE_ALL_CALLER_REGS		"pushl %ecx;"
>   #define PV_RESTORE_ALL_CALLER_REGS	"popl  %ecx;"
>   #else
> +/* save and restore caller-save registers, except %rax, %rcx and %rdx. */
> +#define PV_SAVE_COMMON_CALLER_REGS	\
> +	"push %rsi;"			\
> +	"push %rdi;"			\
> +	"push %r8;"			\
> +	"push %r9;"			\
> +	"push %r10;"			\
> +	"push %r11;"

Add an empty line please, easier to read.

> +#define PV_RESTORE_COMMON_CALLER_REGS	\
> +	"pop %r11;"			\
> +	"pop %r10;"			\
> +	"pop %r9;"			\
> +	"pop %r8;"			\
> +	"pop %rdi;"			\
> +	"pop %rsi;"
> +
> +#define PV_PROLOGUE_MSR(func)		\
> +	PV_SAVE_COMMON_CALLER_REGS	\
> +	PV_PROLOGUE_MSR_##func

Ditto.  And the following similar cases.

> +#define PV_EPILOGUE_MSR(func)		\
> +	PV_EPILOGUE_MSR_##func		\
> +	PV_RESTORE_COMMON_CALLER_REGS
> +
>   /* save and restore all caller-save registers, except return value */
>   #define PV_SAVE_ALL_CALLER_REGS						\
>   	"push %rcx;"							\
>   	"push %rdx;"							\
> -	"push %rsi;"							\
> -	"push %rdi;"							\
> -	"push %r8;"							\
> -	"push %r9;"							\
> -	"push %r10;"							\
> -	"push %r11;"
> +	PV_SAVE_COMMON_CALLER_REGS
>   #define PV_RESTORE_ALL_CALLER_REGS					\
> -	"pop %r11;"							\
> -	"pop %r10;"							\
> -	"pop %r9;"							\
> -	"pop %r8;"							\
> -	"pop %rdi;"							\
> -	"pop %rsi;"							\
> +	PV_RESTORE_COMMON_CALLER_REGS					\
>   	"pop %rdx;"							\
>   	"pop %rcx;"
>   #endif
>   
> +#define PV_PROLOGUE_ALL(func)	PV_SAVE_ALL_CALLER_REGS
> +#define PV_EPILOGUE_ALL(func)	PV_RESTORE_ALL_CALLER_REGS
> +
>   /*
>    * Generate a thunk around a function which saves all caller-save
>    * registers except for the return value.  This allows C functions to
> @@ -607,7 +671,7 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu);
>    * functions.
>    */
>   #define PV_THUNK_NAME(func) "__raw_callee_save_" #func
> -#define __PV_CALLEE_SAVE_REGS_THUNK(func, section)			\
> +#define __PV_CALLEE_SAVE_REGS_THUNK(func, section, helper)		\
>   	extern typeof(func) __raw_callee_save_##func;			\
>   									\
>   	asm(".pushsection " section ", \"ax\";"				\
> @@ -617,16 +681,18 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu);
>   	    PV_THUNK_NAME(func) ":"					\
>   	    ASM_ENDBR							\
>   	    FRAME_BEGIN							\
> -	    PV_SAVE_ALL_CALLER_REGS					\
> +	    PV_PROLOGUE_##helper(func)					\
>   	    "call " #func ";"						\
> -	    PV_RESTORE_ALL_CALLER_REGS					\
> +	    PV_EPILOGUE_##helper(func)					\
>   	    FRAME_END							\
>   	    ASM_RET							\
>   	    ".size " PV_THUNK_NAME(func) ", .-" PV_THUNK_NAME(func) ";"	\
>   	    ".popsection")
>   
>   #define PV_CALLEE_SAVE_REGS_THUNK(func)			\
> -	__PV_CALLEE_SAVE_REGS_THUNK(func, ".text")
> +	__PV_CALLEE_SAVE_REGS_THUNK(func, ".text", ALL)
> +#define PV_CALLEE_SAVE_REGS_MSR_THUNK(func)		\
> +	__PV_CALLEE_SAVE_REGS_THUNK(func, ".text", MSR)
>   
>   /* Get a reference to a callee-save function */
>   #define PV_CALLEE_SAVE(func)						\
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index b08b9d3122d6..f7f879319e90 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -91,15 +91,15 @@ struct pv_cpu_ops {
>   		      unsigned int *ecx, unsigned int *edx);
>   
>   	/* Unsafe MSR operations.  These will warn or panic on failure. */
> -	u64 (*read_msr)(u32 msr);
> -	void (*write_msr)(u32 msr, u64 val);
> +	struct paravirt_callee_save read_msr;
> +	struct paravirt_callee_save write_msr;
>   
>   	/*
>   	 * Safe MSR operations.
>   	 * Returns 0 or -EIO.
>   	 */
> -	int (*read_msr_safe)(u32 msr, u64 *val);
> -	int (*write_msr_safe)(u32 msr, u64 val);
> +	struct paravirt_callee_save read_msr_safe;
> +	struct paravirt_callee_save write_msr_safe;
>   
>   	u64 (*read_pmc)(int counter);
>   
> @@ -520,6 +520,10 @@ unsigned long pv_native_save_fl(void);
>   void pv_native_irq_disable(void);
>   void pv_native_irq_enable(void);
>   unsigned long pv_native_read_cr2(void);
> +void pv_native_rdmsr(void);
> +void pv_native_wrmsr(void);
> +void pv_native_rdmsr_safe(void);
> +void pv_native_wrmsr_safe(void);
>   #endif
>   
>   #define paravirt_nop	((void *)nop_func)
> @@ -527,6 +531,7 @@ unsigned long pv_native_read_cr2(void);
>   #endif	/* __ASSEMBLER__ */
>   
>   #define ALT_NOT_XEN	ALT_NOT(X86_FEATURE_XENPV)
> +#define ALT_XENPV_CALL	ALT_DIRECT_CALL(X86_FEATURE_XENPV)
>   
>   #endif  /* CONFIG_PARAVIRT */
>   #endif	/* _ASM_X86_PARAVIRT_TYPES_H */
> diff --git a/arch/x86/include/asm/qspinlock_paravirt.h b/arch/x86/include/asm/qspinlock_paravirt.h
> index 0a985784be9b..0351acb5a143 100644
> --- a/arch/x86/include/asm/qspinlock_paravirt.h
> +++ b/arch/x86/include/asm/qspinlock_paravirt.h
> @@ -14,7 +14,8 @@ void __lockfunc __pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 lock
>    */
>   #ifdef CONFIG_64BIT
>   
> -__PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock_slowpath, ".spinlock.text");
> +__PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock_slowpath, ".spinlock.text",
> +			    ALL);
>   #define __pv_queued_spin_unlock	__pv_queued_spin_unlock
>   
>   /*
> @@ -61,7 +62,7 @@ DEFINE_ASM_FUNC(__raw_callee_save___pv_queued_spin_unlock,
>   #else /* CONFIG_64BIT */
>   
>   extern void __lockfunc __pv_queued_spin_unlock(struct qspinlock *lock);
> -__PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock, ".spinlock.text");
> +__PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock, ".spinlock.text", ALL);
>   
>   #endif /* CONFIG_64BIT */
>   #endif
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index 015bf298434f..ff7d7fdae360 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -50,6 +50,24 @@ DEFINE_ASM_FUNC(pv_native_save_fl, "pushf; pop %rax", .noinstr.text);
>   DEFINE_ASM_FUNC(pv_native_irq_disable, "cli", .noinstr.text);
>   DEFINE_ASM_FUNC(pv_native_irq_enable, "sti", .noinstr.text);
>   DEFINE_ASM_FUNC(pv_native_read_cr2, "mov %cr2, %rax", .noinstr.text);
> +DEFINE_ASM_FUNC(pv_native_rdmsr,
> +		"1: rdmsr\n"
> +		"2:\n"
> +		_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR), .noinstr.text);
> +DEFINE_ASM_FUNC(pv_native_wrmsr,
> +		"1: wrmsr\n"
> +		"2:\n"
> +		_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR), .noinstr.text);
> +DEFINE_ASM_FUNC(pv_native_rdmsr_safe,
> +		"1: rdmsr; xor %ecx, %ecx\n"
> +		"2:\n"
> +		_ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %%ecx),
> +		.noinstr.text);
> +DEFINE_ASM_FUNC(pv_native_wrmsr_safe,
> +		"1: wrmsr; xor %eax, %eax\n"
> +		"2:\n"
> +		_ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_WRMSR_SAFE, %%eax),
> +		.noinstr.text);
>   #endif
>   
>   DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);
> @@ -129,10 +147,10 @@ struct paravirt_patch_template pv_ops = {
>   	.cpu.read_cr0		= native_read_cr0,
>   	.cpu.write_cr0		= native_write_cr0,
>   	.cpu.write_cr4		= native_write_cr4,
> -	.cpu.read_msr		= native_read_msr,
> -	.cpu.write_msr		= native_write_msr,
> -	.cpu.read_msr_safe	= native_read_msr_safe,
> -	.cpu.write_msr_safe	= native_write_msr_safe,
> +	.cpu.read_msr		= __PV_IS_CALLEE_SAVE(pv_native_rdmsr),
> +	.cpu.write_msr		= __PV_IS_CALLEE_SAVE(pv_native_wrmsr),
> +	.cpu.read_msr_safe	= __PV_IS_CALLEE_SAVE(pv_native_rdmsr_safe),
> +	.cpu.write_msr_safe	= __PV_IS_CALLEE_SAVE(pv_native_wrmsr_safe),
>   	.cpu.read_pmc		= native_read_pmc,
>   	.cpu.load_tr_desc	= native_load_tr_desc,
>   	.cpu.set_ldt		= native_set_ldt,
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index 3be38350f044..c279b2bef7eb 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -1160,36 +1160,66 @@ static void xen_do_write_msr(u32 msr, u64 val, int *err)
>   	}
>   }
>   
> -static int xen_read_msr_safe(u32 msr, u64 *val)
> -{
> +/*
> + * Prototypes for functions called via PV_CALLEE_SAVE_REGS_THUNK() in order
> + * to avoid warnings with "-Wmissing-prototypes".
> + */
> +struct xen_rdmsr_safe_ret {
> +	u64 val;
>   	int err;
> +};
> +struct xen_rdmsr_safe_ret xen_read_msr_safe(u32 msr);
> +int xen_write_msr_safe(u32 msr, u32 low, u32 high);
> +u64 xen_read_msr(u32 msr);
> +void xen_write_msr(u32 msr, u32 low, u32 high);
>   
> -	*val = xen_do_read_msr(msr, &err);
> -	return err;
> +__visible struct xen_rdmsr_safe_ret xen_read_msr_safe(u32 msr)
> +{
> +	struct xen_rdmsr_safe_ret ret;

struct xen_rdmsr_safe_ret ret = { 0, 0 };

Because the 'err' member may not be set in xen_do_read_msr().

> +
> +	ret.val = xen_do_read_msr(msr, &ret.err);
> +	return ret;
>   }
> +#define PV_PROLOGUE_MSR_xen_read_msr_safe	"mov %ecx, %edi;"
> +#define PV_EPILOGUE_MSR_xen_read_msr_safe	\
> +	"mov %edx, %ecx; mov %rax, %rdx; mov %eax, %eax; shr $0x20, %rdx;"
> +PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_read_msr_safe);
>   
> -static int xen_write_msr_safe(u32 msr, u64 val)
> +__visible int xen_write_msr_safe(u32 msr, u32 low, u32 high)

I think we can avoid splitting this u64 into two u32.

>   {
>   	int err = 0;
>   
> -	xen_do_write_msr(msr, val, &err);
> +	xen_do_write_msr(msr, (u64)high << 32 | low, &err);
>   
>   	return err;
>   }
> +#define PV_PROLOGUE_MSR_xen_write_msr_safe	\
> +	"mov %ecx, %edi; mov %eax, %esi;"
> +#define PV_EPILOGUE_MSR_xen_write_msr_safe
> +PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_write_msr_safe);
>   
> -static u64 xen_read_msr(u32 msr)
> +__visible u64 xen_read_msr(u32 msr)
>   {
>   	int err;
>   
>   	return xen_do_read_msr(msr, xen_msr_safe ? &err : NULL);
>   }
> +#define PV_PROLOGUE_MSR_xen_read_msr	"mov %ecx, %edi;"
> +#define PV_EPILOGUE_MSR_xen_read_msr	\
> +	"mov %rax, %rdx; mov %eax, %eax; shr $0x20, %rdx;"
> +PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_read_msr);
>   
> -static void xen_write_msr(u32 msr, u64 val)
> +__visible void xen_write_msr(u32 msr, u32 low, u32 high)

Ditto.

>   {
>   	int err;
>   
> -	xen_do_write_msr(msr, val, xen_msr_safe ? &err : NULL);
> +	xen_do_write_msr(msr, (u64)high << 32 | low,
> +			 xen_msr_safe ? &err : NULL);
>   }
> +#define PV_PROLOGUE_MSR_xen_write_msr	\
> +	"mov %ecx, %edi; mov %eax, %esi;"
> +#define PV_EPILOGUE_MSR_xen_write_msr
> +PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_write_msr);
>   
>   /* This is called once we have the cpu_possible_mask */
>   void __init xen_setup_vcpu_info_placement(void)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/6] x86/msr: minimize usage of native_*() msr access functions
  2025-05-06  9:20 ` [PATCH 3/6] x86/msr: minimize usage of native_*() msr access functions Juergen Gross
  2025-05-06 14:24   ` Sean Christopherson
@ 2025-05-09 21:49   ` Wei Liu
  1 sibling, 0 replies; 30+ messages in thread
From: Wei Liu @ 2025-05-09 21:49 UTC (permalink / raw)
  To: Juergen Gross
  Cc: linux-kernel, x86, linux-hyperv, kvm, xin, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Paolo Bonzini,
	Vitaly Kuznetsov, Sean Christopherson, Boris Ostrovsky, xen-devel

On Tue, May 06, 2025 at 11:20:12AM +0200, Juergen Gross wrote:
> In order to prepare for some MSR access function reorg work, switch
> most users of native_{read|write}_msr[_safe]() to the more generic
> rdmsr*()/wrmsr*() variants.
> 
> For now this will have some intermediate performance impact with
> paravirtualization configured when running on bare metal, but this
> is a prereq change for the planned direct inlining of the rdmsr/wrmsr
> instructions with this configuration.
> 
> The main reason for this switch is the planned move of the MSR trace
> function invocation from the native_*() functions to the generic
> rdmsr*()/wrmsr*() variants. Without this switch the users of the
> native_*() functions would lose the related tracing entries.
> 
> Note that the Xen related MSR access functions will not be switched,
> as these will be handled after the move of the trace hooks.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  arch/x86/hyperv/ivm.c      |  2 +-

Acked-by: Wei Liu <wei.liu@kernel.org>

> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 09a165a3c41e..fe177a6be581 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -319,7 +319,7 @@ int hv_snp_boot_ap(u32 cpu, unsigned long start_ip)
>  	asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
>  	hv_populate_vmcb_seg(vmsa->ds, vmsa->gdtr.base);
>  
> -	vmsa->efer = native_read_msr(MSR_EFER);
> +	rdmsrq(MSR_EFER, vmsa->efer);
>  

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 0/6] x86/msr: let paravirt inline rdmsr/wrmsr instructions
  2025-05-06  9:20 [PATCH 0/6] x86/msr: let paravirt inline rdmsr/wrmsr instructions Juergen Gross
                   ` (5 preceding siblings ...)
  2025-05-06  9:20 ` [PATCH 6/6] x86/msr: reduce number of low level MSR access helpers Juergen Gross
@ 2025-05-10 16:03 ` Michael Kelley
  6 siblings, 0 replies; 30+ messages in thread
From: Michael Kelley @ 2025-05-10 16:03 UTC (permalink / raw)
  To: Juergen Gross, linux-kernel@vger.kernel.org, x86@kernel.org,
	linux-coco@lists.linux.dev, kvm@vger.kernel.org,
	linux-hyperv@vger.kernel.org, virtualization@lists.linux.dev
  Cc: xin@zytor.com, Kirill A. Shutemov, Dave Hansen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Sean Christopherson,
	Paolo Bonzini, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Vitaly Kuznetsov, Boris Ostrovsky,
	xen-devel@lists.xenproject.org, Ajay Kaher, Alexey Makhalov,
	Broadcom internal kernel review list

From: Juergen Gross <jgross@suse.com> Sent: Tuesday, May 6, 2025 2:20 AM
> 
> When building a kernel with CONFIG_PARAVIRT_XXL the paravirt
> infrastructure will always use functions for reading or writing MSRs,
> even when running on bare metal.
> 
> Switch to inline RDMSR/WRMSR instructions in this case, reducing the
> paravirt overhead.
> 
> In order to make this less intrusive, some further reorganization of
> the MSR access helpers is done in the first 4 patches.
> 
> This series has been tested to work with Xen PV and on bare metal.

I've tested in SEV-SNP and TDX guests with paravisor on Hyper-V. Basic
smoke test showed no issues.

Tested-by: Michael Kelley <mhklinux@outlook.com>

> 
> There has been another approach by Xin Li, which used dedicated #ifdef
> and removing the MSR related paravirt hooks instead of just modifying
> the paravirt code generation.
> 
> Please note that I haven't included the use of WRMSRNS or the
> immediate forms of WRMSR and RDMSR, because I wanted to get some
> feedback on my approach first. Enhancing paravirt for those cases
> is not very complicated, as the main base is already prepared for
> that enhancement.
> 
> This series is based on the x86/msr branch of the tip tree.
> 
> Juergen Gross (6):
>   coco/tdx: Rename MSR access helpers
>   x86/kvm: Rename the KVM private read_msr() function
>   x86/msr: minimize usage of native_*() msr access functions
>   x86/msr: Move MSR trace calls one function level up
>   x86/paravirt: Switch MSR access pv_ops functions to instruction
>     interfaces
>   x86/msr: reduce number of low level MSR access helpers
> 
>  arch/x86/coco/tdx/tdx.c                   |   8 +-
>  arch/x86/hyperv/ivm.c                     |   2 +-
>  arch/x86/include/asm/kvm_host.h           |   2 +-
>  arch/x86/include/asm/msr.h                | 116 ++++++++++-------
>  arch/x86/include/asm/paravirt.h           | 152 ++++++++++++++--------
>  arch/x86/include/asm/paravirt_types.h     |  13 +-
>  arch/x86/include/asm/qspinlock_paravirt.h |   5 +-
>  arch/x86/kernel/kvmclock.c                |   2 +-
>  arch/x86/kernel/paravirt.c                |  26 +++-
>  arch/x86/kvm/svm/svm.c                    |  16 +--
>  arch/x86/kvm/vmx/vmx.c                    |   4 +-
>  arch/x86/xen/enlighten_pv.c               |  60 ++++++---
>  arch/x86/xen/pmu.c                        |   4 +-
>  13 files changed, 262 insertions(+), 148 deletions(-)
> 
> --
> 2.43.0
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces
  2025-05-09  8:18   ` Xin Li
@ 2025-05-12 11:20     ` Jürgen Groß
  2025-05-12 11:24       ` Juergen Gross
  2025-05-13  7:44       ` Xin Li
  0 siblings, 2 replies; 30+ messages in thread
From: Jürgen Groß @ 2025-05-12 11:20 UTC (permalink / raw)
  To: Xin Li, linux-kernel, x86, virtualization
  Cc: Ajay Kaher, Alexey Makhalov, Broadcom internal kernel review list,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Boris Ostrovsky, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 20462 bytes --]

On 09.05.25 10:18, Xin Li wrote:
> On 5/6/2025 2:20 AM, Juergen Gross wrote:
>> Instead of having callback functions for rdmsr/wrmsr on native, switch
>> to inline the respective instructions directly in order to avoid
>> overhead with the call interface.
> 
> To me, this is a beneficial addition to the existing pvops MSR code.
> 
>>
>> This requires to use the instruction interfaces for rdmsr/wrmsr
>> emulation when running as a Xen PV guest.
>>
>> In order to prepare support for the immediate forms of RDMSR and WRMSR
>> when not running as a Xen PV guest, use the RDMSR and WRMSR
>> instructions as the fallback case instead of ALT_CALL_INSTR.
> 
> I'm trying to evaluate how to add the immediate form MSR instructions
> on top of this patch set.  And I'm close to get it done.

There is something to consider when running as a Xen PV guest, ...

> 
>>
>> Note that in the Xen PV case the RDMSR/WRMSR patching must not happen
>> even as an intermediate step, as this would clobber the indirect call
>> information needed when patching in the direct call for the Xen case.
> 
> Good point!

... as this still needs to be true.

There are 2 different ways to deal with this:

1. When running as a Xen PV guest disable X86_FEATURE_WRMSRNS and
    ASM_WRMSRNS_IMM (e.g. in xen_init_capabilities()).

2. Buffer the original instruction before patching in apply_alternatives()
    in order to avoid the sequence limitation above (see attached patch).

> Deciding whether to retain the pvops MSR API is the responsibility of
> the x86 maintainers, who are the ones experiencing the challenges of maintaining 
> the code.

Well, I'm the PV ops maintainer, so it is basically me who needs to deal
with this. OTOH I do understand that diagnosis of problems with PV ops is
more complicated than without.

> 
> tglx said @https://lore.kernel.org/lkml/87y1h81ht4.ffs@tglx/:
> 
>  > I fundamentaly hate adding this to the PV infrastructure. We don't
>  > want more PV ops, quite the contrary.
> 
> That is the reason I took a different direction, i.e., removing the
> pvops MSR APIs.  But if your approach is cleaner, they may prefer it.

In the end it isn't adding additional PV ops interfaces. It is modifying
existing ones.

> 
>> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
>> index a463c747c780..df10b0e4f7b8 100644
>> --- a/arch/x86/include/asm/paravirt.h
>> +++ b/arch/x86/include/asm/paravirt.h
>> @@ -175,24 +175,72 @@ static inline void __write_cr4(unsigned long x)
>>       PVOP_VCALL1(cpu.write_cr4, x);
>>   }
>> -static inline u64 paravirt_read_msr(u32 msr)
>> +static __always_inline u64 paravirt_read_msr(u32 msr)
>>   {
>> -    return PVOP_CALL1(u64, cpu.read_msr, msr);
>> +    EAX_EDX_DECLARE_ARGS(val, low, high);
> 
> This is under CONFIG_PARAVIRT_XXL, thus CONFIG_XEN_PV and CONFIG_X86_64,
> therefore we don't need to consider 32-bit at all, no?

Right. OTOH the macros are there, so why not use them?

In the end I'm fine to open code the 64-bit case here.

> 
>> +
>> +    PVOP_TEST_NULL(cpu.read_msr);
>> +    asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
>> +                    "rdmsr", ALT_NOT_XEN,
>> +                    ALT_CALL_INSTR, ALT_XENPV_CALL)
>> +             "2:\n"
>> +             _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR)
>> +             : EAX_EDX_RET(val, low, high), ASM_CALL_CONSTRAINT
>> +             : paravirt_ptr(cpu.read_msr), "c" (msr));
>> +
>> +    return EAX_EDX_VAL(val, low, high);
>>   }
>> -static inline void paravirt_write_msr(u32 msr, u64 val)
>> +static __always_inline void paravirt_write_msr(u32 msr, u64 val)
>>   {
>> -    PVOP_VCALL2(cpu.write_msr, msr, val);
>> +    PVOP_TEST_NULL(cpu.write_msr);
>> +    asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
>> +                    "wrmsr", ALT_NOT_XEN,
>> +                    ALT_CALL_INSTR, ALT_XENPV_CALL)
>> +              "2:\n"
>> +              _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
>> +              : ASM_CALL_CONSTRAINT
>> +              : paravirt_ptr(cpu.write_msr),
>> +              "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32))
>> +              : "memory");
>>   }
>> -static inline int paravirt_read_msr_safe(u32 msr, u64 *val)
>> +static __always_inline int paravirt_read_msr_safe(u32 msr, u64 *p)
>>   {
>> -    return PVOP_CALL2(int, cpu.read_msr_safe, msr, val);
>> +    int err;
>> +    EAX_EDX_DECLARE_ARGS(val, low, high);
>> +
>> +    PVOP_TEST_NULL(cpu.read_msr_safe);
>> +    asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
>> +                    "rdmsr; xor %[err],%[err]", ALT_NOT_XEN,
>> +                    ALT_CALL_INSTR, ALT_XENPV_CALL)
>> +             "2:\n"
>> +             _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %[err])
>> +             : [err] "=c" (err), EAX_EDX_RET(val, low, high),
>> +               ASM_CALL_CONSTRAINT
>> +             : paravirt_ptr(cpu.read_msr_safe), "0" (msr));
>> +
>> +    *p = EAX_EDX_VAL(val, low, high);
>> +
>> +    return err;
>>   }
>> -static inline int paravirt_write_msr_safe(u32 msr, u64 val)
>> +static __always_inline int paravirt_write_msr_safe(u32 msr, u64 val)
>>   {
>> -    return PVOP_CALL2(int, cpu.write_msr_safe, msr, val);
>> +    int err;
>> +
>> +    PVOP_TEST_NULL(cpu.write_msr_safe);
>> +    asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
>> +                    "wrmsr; xor %[err],%[err]", ALT_NOT_XEN,
>> +                    ALT_CALL_INSTR, ALT_XENPV_CALL)
>> +             "2:\n"
>> +             _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_WRMSR_SAFE, %[err])
>> +             : [err] "=a" (err), ASM_CALL_CONSTRAINT
>> +             : paravirt_ptr(cpu.write_msr_safe),
>> +               "c" (msr), "0" ((u32)val), "d" ((u32)(val >> 32))
>> +             : "memory");
>> +
>> +    return err;
>>   }
>>   static __always_inline u64 read_msr(u32 msr)
>> @@ -573,27 +621,43 @@ bool __raw_callee_save___native_vcpu_is_preempted(long 
>> cpu);
>>   #define PV_SAVE_ALL_CALLER_REGS        "pushl %ecx;"
>>   #define PV_RESTORE_ALL_CALLER_REGS    "popl  %ecx;"
>>   #else
>> +/* save and restore caller-save registers, except %rax, %rcx and %rdx. */
>> +#define PV_SAVE_COMMON_CALLER_REGS    \
>> +    "push %rsi;"            \
>> +    "push %rdi;"            \
>> +    "push %r8;"            \
>> +    "push %r9;"            \
>> +    "push %r10;"            \
>> +    "push %r11;"
> 
> Add an empty line please, easier to read.

Okay (same below).

> 
>> +#define PV_RESTORE_COMMON_CALLER_REGS    \
>> +    "pop %r11;"            \
>> +    "pop %r10;"            \
>> +    "pop %r9;"            \
>> +    "pop %r8;"            \
>> +    "pop %rdi;"            \
>> +    "pop %rsi;"
>> +
>> +#define PV_PROLOGUE_MSR(func)        \
>> +    PV_SAVE_COMMON_CALLER_REGS    \
>> +    PV_PROLOGUE_MSR_##func
> 
> Ditto.  And the following similar cases.
> 
>> +#define PV_EPILOGUE_MSR(func)        \
>> +    PV_EPILOGUE_MSR_##func        \
>> +    PV_RESTORE_COMMON_CALLER_REGS
>> +
>>   /* save and restore all caller-save registers, except return value */
>>   #define PV_SAVE_ALL_CALLER_REGS                        \
>>       "push %rcx;"                            \
>>       "push %rdx;"                            \
>> -    "push %rsi;"                            \
>> -    "push %rdi;"                            \
>> -    "push %r8;"                            \
>> -    "push %r9;"                            \
>> -    "push %r10;"                            \
>> -    "push %r11;"
>> +    PV_SAVE_COMMON_CALLER_REGS
>>   #define PV_RESTORE_ALL_CALLER_REGS                    \
>> -    "pop %r11;"                            \
>> -    "pop %r10;"                            \
>> -    "pop %r9;"                            \
>> -    "pop %r8;"                            \
>> -    "pop %rdi;"                            \
>> -    "pop %rsi;"                            \
>> +    PV_RESTORE_COMMON_CALLER_REGS                    \
>>       "pop %rdx;"                            \
>>       "pop %rcx;"
>>   #endif
>> +#define PV_PROLOGUE_ALL(func)    PV_SAVE_ALL_CALLER_REGS
>> +#define PV_EPILOGUE_ALL(func)    PV_RESTORE_ALL_CALLER_REGS
>> +
>>   /*
>>    * Generate a thunk around a function which saves all caller-save
>>    * registers except for the return value.  This allows C functions to
>> @@ -607,7 +671,7 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu);
>>    * functions.
>>    */
>>   #define PV_THUNK_NAME(func) "__raw_callee_save_" #func
>> -#define __PV_CALLEE_SAVE_REGS_THUNK(func, section)            \
>> +#define __PV_CALLEE_SAVE_REGS_THUNK(func, section, helper)        \
>>       extern typeof(func) __raw_callee_save_##func;            \
>>                                       \
>>       asm(".pushsection " section ", \"ax\";"                \
>> @@ -617,16 +681,18 @@ bool __raw_callee_save___native_vcpu_is_preempted(long 
>> cpu);
>>           PV_THUNK_NAME(func) ":"                    \
>>           ASM_ENDBR                            \
>>           FRAME_BEGIN                            \
>> -        PV_SAVE_ALL_CALLER_REGS                    \
>> +        PV_PROLOGUE_##helper(func)                    \
>>           "call " #func ";"                        \
>> -        PV_RESTORE_ALL_CALLER_REGS                    \
>> +        PV_EPILOGUE_##helper(func)                    \
>>           FRAME_END                            \
>>           ASM_RET                            \
>>           ".size " PV_THUNK_NAME(func) ", .-" PV_THUNK_NAME(func) ";"    \
>>           ".popsection")
>>   #define PV_CALLEE_SAVE_REGS_THUNK(func)            \
>> -    __PV_CALLEE_SAVE_REGS_THUNK(func, ".text")
>> +    __PV_CALLEE_SAVE_REGS_THUNK(func, ".text", ALL)
>> +#define PV_CALLEE_SAVE_REGS_MSR_THUNK(func)        \
>> +    __PV_CALLEE_SAVE_REGS_THUNK(func, ".text", MSR)
>>   /* Get a reference to a callee-save function */
>>   #define PV_CALLEE_SAVE(func)                        \
>> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/ 
>> paravirt_types.h
>> index b08b9d3122d6..f7f879319e90 100644
>> --- a/arch/x86/include/asm/paravirt_types.h
>> +++ b/arch/x86/include/asm/paravirt_types.h
>> @@ -91,15 +91,15 @@ struct pv_cpu_ops {
>>                 unsigned int *ecx, unsigned int *edx);
>>       /* Unsafe MSR operations.  These will warn or panic on failure. */
>> -    u64 (*read_msr)(u32 msr);
>> -    void (*write_msr)(u32 msr, u64 val);
>> +    struct paravirt_callee_save read_msr;
>> +    struct paravirt_callee_save write_msr;
>>       /*
>>        * Safe MSR operations.
>>        * Returns 0 or -EIO.
>>        */
>> -    int (*read_msr_safe)(u32 msr, u64 *val);
>> -    int (*write_msr_safe)(u32 msr, u64 val);
>> +    struct paravirt_callee_save read_msr_safe;
>> +    struct paravirt_callee_save write_msr_safe;
>>       u64 (*read_pmc)(int counter);
>> @@ -520,6 +520,10 @@ unsigned long pv_native_save_fl(void);
>>   void pv_native_irq_disable(void);
>>   void pv_native_irq_enable(void);
>>   unsigned long pv_native_read_cr2(void);
>> +void pv_native_rdmsr(void);
>> +void pv_native_wrmsr(void);
>> +void pv_native_rdmsr_safe(void);
>> +void pv_native_wrmsr_safe(void);
>>   #endif
>>   #define paravirt_nop    ((void *)nop_func)
>> @@ -527,6 +531,7 @@ unsigned long pv_native_read_cr2(void);
>>   #endif    /* __ASSEMBLER__ */
>>   #define ALT_NOT_XEN    ALT_NOT(X86_FEATURE_XENPV)
>> +#define ALT_XENPV_CALL    ALT_DIRECT_CALL(X86_FEATURE_XENPV)
>>   #endif  /* CONFIG_PARAVIRT */
>>   #endif    /* _ASM_X86_PARAVIRT_TYPES_H */
>> diff --git a/arch/x86/include/asm/qspinlock_paravirt.h b/arch/x86/include/asm/ 
>> qspinlock_paravirt.h
>> index 0a985784be9b..0351acb5a143 100644
>> --- a/arch/x86/include/asm/qspinlock_paravirt.h
>> +++ b/arch/x86/include/asm/qspinlock_paravirt.h
>> @@ -14,7 +14,8 @@ void __lockfunc __pv_queued_spin_unlock_slowpath(struct 
>> qspinlock *lock, u8 lock
>>    */
>>   #ifdef CONFIG_64BIT
>> -__PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock_slowpath, ".spinlock.text");
>> +__PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock_slowpath, ".spinlock.text",
>> +                ALL);
>>   #define __pv_queued_spin_unlock    __pv_queued_spin_unlock
>>   /*
>> @@ -61,7 +62,7 @@ DEFINE_ASM_FUNC(__raw_callee_save___pv_queued_spin_unlock,
>>   #else /* CONFIG_64BIT */
>>   extern void __lockfunc __pv_queued_spin_unlock(struct qspinlock *lock);
>> -__PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock, ".spinlock.text");
>> +__PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock, ".spinlock.text", ALL);
>>   #endif /* CONFIG_64BIT */
>>   #endif
>> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
>> index 015bf298434f..ff7d7fdae360 100644
>> --- a/arch/x86/kernel/paravirt.c
>> +++ b/arch/x86/kernel/paravirt.c
>> @@ -50,6 +50,24 @@ DEFINE_ASM_FUNC(pv_native_save_fl, "pushf; pop 
>> %rax", .noinstr.text);
>>   DEFINE_ASM_FUNC(pv_native_irq_disable, "cli", .noinstr.text);
>>   DEFINE_ASM_FUNC(pv_native_irq_enable, "sti", .noinstr.text);
>>   DEFINE_ASM_FUNC(pv_native_read_cr2, "mov %cr2, %rax", .noinstr.text);
>> +DEFINE_ASM_FUNC(pv_native_rdmsr,
>> +        "1: rdmsr\n"
>> +        "2:\n"
>> +        _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR), .noinstr.text);
>> +DEFINE_ASM_FUNC(pv_native_wrmsr,
>> +        "1: wrmsr\n"
>> +        "2:\n"
>> +        _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR), .noinstr.text);
>> +DEFINE_ASM_FUNC(pv_native_rdmsr_safe,
>> +        "1: rdmsr; xor %ecx, %ecx\n"
>> +        "2:\n"
>> +        _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %%ecx),
>> +        .noinstr.text);
>> +DEFINE_ASM_FUNC(pv_native_wrmsr_safe,
>> +        "1: wrmsr; xor %eax, %eax\n"
>> +        "2:\n"
>> +        _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_WRMSR_SAFE, %%eax),
>> +        .noinstr.text);
>>   #endif
>>   DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);
>> @@ -129,10 +147,10 @@ struct paravirt_patch_template pv_ops = {
>>       .cpu.read_cr0        = native_read_cr0,
>>       .cpu.write_cr0        = native_write_cr0,
>>       .cpu.write_cr4        = native_write_cr4,
>> -    .cpu.read_msr        = native_read_msr,
>> -    .cpu.write_msr        = native_write_msr,
>> -    .cpu.read_msr_safe    = native_read_msr_safe,
>> -    .cpu.write_msr_safe    = native_write_msr_safe,
>> +    .cpu.read_msr        = __PV_IS_CALLEE_SAVE(pv_native_rdmsr),
>> +    .cpu.write_msr        = __PV_IS_CALLEE_SAVE(pv_native_wrmsr),
>> +    .cpu.read_msr_safe    = __PV_IS_CALLEE_SAVE(pv_native_rdmsr_safe),
>> +    .cpu.write_msr_safe    = __PV_IS_CALLEE_SAVE(pv_native_wrmsr_safe),
>>       .cpu.read_pmc        = native_read_pmc,
>>       .cpu.load_tr_desc    = native_load_tr_desc,
>>       .cpu.set_ldt        = native_set_ldt,
>> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
>> index 3be38350f044..c279b2bef7eb 100644
>> --- a/arch/x86/xen/enlighten_pv.c
>> +++ b/arch/x86/xen/enlighten_pv.c
>> @@ -1160,36 +1160,66 @@ static void xen_do_write_msr(u32 msr, u64 val, int *err)
>>       }
>>   }
>> -static int xen_read_msr_safe(u32 msr, u64 *val)
>> -{
>> +/*
>> + * Prototypes for functions called via PV_CALLEE_SAVE_REGS_THUNK() in order
>> + * to avoid warnings with "-Wmissing-prototypes".
>> + */
>> +struct xen_rdmsr_safe_ret {
>> +    u64 val;
>>       int err;
>> +};
>> +struct xen_rdmsr_safe_ret xen_read_msr_safe(u32 msr);
>> +int xen_write_msr_safe(u32 msr, u32 low, u32 high);
>> +u64 xen_read_msr(u32 msr);
>> +void xen_write_msr(u32 msr, u32 low, u32 high);
>> -    *val = xen_do_read_msr(msr, &err);
>> -    return err;
>> +__visible struct xen_rdmsr_safe_ret xen_read_msr_safe(u32 msr)
>> +{
>> +    struct xen_rdmsr_safe_ret ret;
> 
> struct xen_rdmsr_safe_ret ret = { 0, 0 };
> 
> Because the 'err' member may not be set in xen_do_read_msr().

Right.

> 
>> +
>> +    ret.val = xen_do_read_msr(msr, &ret.err);
>> +    return ret;
>>   }
>> +#define PV_PROLOGUE_MSR_xen_read_msr_safe    "mov %ecx, %edi;"
>> +#define PV_EPILOGUE_MSR_xen_read_msr_safe    \
>> +    "mov %edx, %ecx; mov %rax, %rdx; mov %eax, %eax; shr $0x20, %rdx;"
>> +PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_read_msr_safe);
>> -static int xen_write_msr_safe(u32 msr, u64 val)
>> +__visible int xen_write_msr_safe(u32 msr, u32 low, u32 high)
> 
> I think we can avoid splitting this u64 into two u32.

This is related to the native WRMSR interface. The WRMSR needs to be
able to be replaced by the call of the Xen specific function.

I could handle this in the prologue helpers, but I'd prefer to keep
those helpers as small as possible.

> 
>>   {
>>       int err = 0;
>> -    xen_do_write_msr(msr, val, &err);
>> +    xen_do_write_msr(msr, (u64)high << 32 | low, &err);
>>       return err;
>>   }
>> +#define PV_PROLOGUE_MSR_xen_write_msr_safe    \
>> +    "mov %ecx, %edi; mov %eax, %esi;"
>> +#define PV_EPILOGUE_MSR_xen_write_msr_safe
>> +PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_write_msr_safe);
>> -static u64 xen_read_msr(u32 msr)
>> +__visible u64 xen_read_msr(u32 msr)
>>   {
>>       int err;
>>       return xen_do_read_msr(msr, xen_msr_safe ? &err : NULL);
>>   }
>> +#define PV_PROLOGUE_MSR_xen_read_msr    "mov %ecx, %edi;"
>> +#define PV_EPILOGUE_MSR_xen_read_msr    \
>> +    "mov %rax, %rdx; mov %eax, %eax; shr $0x20, %rdx;"
>> +PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_read_msr);
>> -static void xen_write_msr(u32 msr, u64 val)
>> +__visible void xen_write_msr(u32 msr, u32 low, u32 high)
> 
> Ditto.

See above.

> 
>>   {
>>       int err;
>> -    xen_do_write_msr(msr, val, xen_msr_safe ? &err : NULL);
>> +    xen_do_write_msr(msr, (u64)high << 32 | low,
>> +             xen_msr_safe ? &err : NULL);
>>   }
>> +#define PV_PROLOGUE_MSR_xen_write_msr    \
>> +    "mov %ecx, %edi; mov %eax, %esi;"
>> +#define PV_EPILOGUE_MSR_xen_write_msr
>> +PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_write_msr);
>>   /* This is called once we have the cpu_possible_mask */
>>   void __init xen_setup_vcpu_info_placement(void)


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces
  2025-05-12 11:20     ` Jürgen Groß
@ 2025-05-12 11:24       ` Juergen Gross
  2025-05-13  5:55         ` Xin Li
  2025-05-13  7:44       ` Xin Li
  1 sibling, 1 reply; 30+ messages in thread
From: Juergen Gross @ 2025-05-12 11:24 UTC (permalink / raw)
  To: Xin Li, linux-kernel, x86, virtualization
  Cc: Ajay Kaher, Alexey Makhalov, Broadcom internal kernel review list,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Boris Ostrovsky, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 21040 bytes --]

Now with the mentioned patch really attached. :-)

On 12.05.25 13:20, Jürgen Groß wrote:
> On 09.05.25 10:18, Xin Li wrote:
>> On 5/6/2025 2:20 AM, Juergen Gross wrote:
>>> Instead of having callback functions for rdmsr/wrmsr on native, switch
>>> to inline the respective instructions directly in order to avoid
>>> overhead with the call interface.
>>
>> To me, this is a beneficial addition to the existing pvops MSR code.
>>
>>>
>>> This requires to use the instruction interfaces for rdmsr/wrmsr
>>> emulation when running as a Xen PV guest.
>>>
>>> In order to prepare support for the immediate forms of RDMSR and WRMSR
>>> when not running as a Xen PV guest, use the RDMSR and WRMSR
>>> instructions as the fallback case instead of ALT_CALL_INSTR.
>>
>> I'm trying to evaluate how to add the immediate form MSR instructions
>> on top of this patch set.  And I'm close to get it done.
> 
> There is something to consider when running as a Xen PV guest, ...
> 
>>
>>>
>>> Note that in the Xen PV case the RDMSR/WRMSR patching must not happen
>>> even as an intermediate step, as this would clobber the indirect call
>>> information needed when patching in the direct call for the Xen case.
>>
>> Good point!
> 
> ... as this still needs to be true.
> 
> There are 2 different ways to deal with this:
> 
> 1. When running as a Xen PV guest disable X86_FEATURE_WRMSRNS and
>     ASM_WRMSRNS_IMM (e.g. in xen_init_capabilities()).
> 
> 2. Buffer the original instruction before patching in apply_alternatives()
>     in order to avoid the sequence limitation above (see attached patch).
> 
>> Deciding whether to retain the pvops MSR API is the responsibility of
>> the x86 maintainers, who are the ones experiencing the challenges of 
>> maintaining the code.
> 
> Well, I'm the PV ops maintainer, so it is basically me who needs to deal
> with this. OTOH I do understand that diagnosis of problems with PV ops is
> more complicated than without.
> 
>>
>> tglx said @https://lore.kernel.org/lkml/87y1h81ht4.ffs@tglx/:
>>
>>  > I fundamentaly hate adding this to the PV infrastructure. We don't
>>  > want more PV ops, quite the contrary.
>>
>> That is the reason I took a different direction, i.e., removing the
>> pvops MSR APIs.  But if your approach is cleaner, they may prefer it.
> 
> In the end it isn't adding additional PV ops interfaces. It is modifying
> existing ones.
> 
>>
>>> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
>>> index a463c747c780..df10b0e4f7b8 100644
>>> --- a/arch/x86/include/asm/paravirt.h
>>> +++ b/arch/x86/include/asm/paravirt.h
>>> @@ -175,24 +175,72 @@ static inline void __write_cr4(unsigned long x)
>>>       PVOP_VCALL1(cpu.write_cr4, x);
>>>   }
>>> -static inline u64 paravirt_read_msr(u32 msr)
>>> +static __always_inline u64 paravirt_read_msr(u32 msr)
>>>   {
>>> -    return PVOP_CALL1(u64, cpu.read_msr, msr);
>>> +    EAX_EDX_DECLARE_ARGS(val, low, high);
>>
>> This is under CONFIG_PARAVIRT_XXL, thus CONFIG_XEN_PV and CONFIG_X86_64,
>> therefore we don't need to consider 32-bit at all, no?
> 
> Right. OTOH the macros are there, so why not use them?
> 
> In the end I'm fine to open code the 64-bit case here.
> 
>>
>>> +
>>> +    PVOP_TEST_NULL(cpu.read_msr);
>>> +    asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
>>> +                    "rdmsr", ALT_NOT_XEN,
>>> +                    ALT_CALL_INSTR, ALT_XENPV_CALL)
>>> +             "2:\n"
>>> +             _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR)
>>> +             : EAX_EDX_RET(val, low, high), ASM_CALL_CONSTRAINT
>>> +             : paravirt_ptr(cpu.read_msr), "c" (msr));
>>> +
>>> +    return EAX_EDX_VAL(val, low, high);
>>>   }
>>> -static inline void paravirt_write_msr(u32 msr, u64 val)
>>> +static __always_inline void paravirt_write_msr(u32 msr, u64 val)
>>>   {
>>> -    PVOP_VCALL2(cpu.write_msr, msr, val);
>>> +    PVOP_TEST_NULL(cpu.write_msr);
>>> +    asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
>>> +                    "wrmsr", ALT_NOT_XEN,
>>> +                    ALT_CALL_INSTR, ALT_XENPV_CALL)
>>> +              "2:\n"
>>> +              _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
>>> +              : ASM_CALL_CONSTRAINT
>>> +              : paravirt_ptr(cpu.write_msr),
>>> +              "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32))
>>> +              : "memory");
>>>   }
>>> -static inline int paravirt_read_msr_safe(u32 msr, u64 *val)
>>> +static __always_inline int paravirt_read_msr_safe(u32 msr, u64 *p)
>>>   {
>>> -    return PVOP_CALL2(int, cpu.read_msr_safe, msr, val);
>>> +    int err;
>>> +    EAX_EDX_DECLARE_ARGS(val, low, high);
>>> +
>>> +    PVOP_TEST_NULL(cpu.read_msr_safe);
>>> +    asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
>>> +                    "rdmsr; xor %[err],%[err]", ALT_NOT_XEN,
>>> +                    ALT_CALL_INSTR, ALT_XENPV_CALL)
>>> +             "2:\n"
>>> +             _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %[err])
>>> +             : [err] "=c" (err), EAX_EDX_RET(val, low, high),
>>> +               ASM_CALL_CONSTRAINT
>>> +             : paravirt_ptr(cpu.read_msr_safe), "0" (msr));
>>> +
>>> +    *p = EAX_EDX_VAL(val, low, high);
>>> +
>>> +    return err;
>>>   }
>>> -static inline int paravirt_write_msr_safe(u32 msr, u64 val)
>>> +static __always_inline int paravirt_write_msr_safe(u32 msr, u64 val)
>>>   {
>>> -    return PVOP_CALL2(int, cpu.write_msr_safe, msr, val);
>>> +    int err;
>>> +
>>> +    PVOP_TEST_NULL(cpu.write_msr_safe);
>>> +    asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
>>> +                    "wrmsr; xor %[err],%[err]", ALT_NOT_XEN,
>>> +                    ALT_CALL_INSTR, ALT_XENPV_CALL)
>>> +             "2:\n"
>>> +             _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_WRMSR_SAFE, %[err])
>>> +             : [err] "=a" (err), ASM_CALL_CONSTRAINT
>>> +             : paravirt_ptr(cpu.write_msr_safe),
>>> +               "c" (msr), "0" ((u32)val), "d" ((u32)(val >> 32))
>>> +             : "memory");
>>> +
>>> +    return err;
>>>   }
>>>   static __always_inline u64 read_msr(u32 msr)
>>> @@ -573,27 +621,43 @@ bool __raw_callee_save___native_vcpu_is_preempted(long 
>>> cpu);
>>>   #define PV_SAVE_ALL_CALLER_REGS        "pushl %ecx;"
>>>   #define PV_RESTORE_ALL_CALLER_REGS    "popl  %ecx;"
>>>   #else
>>> +/* save and restore caller-save registers, except %rax, %rcx and %rdx. */
>>> +#define PV_SAVE_COMMON_CALLER_REGS    \
>>> +    "push %rsi;"            \
>>> +    "push %rdi;"            \
>>> +    "push %r8;"            \
>>> +    "push %r9;"            \
>>> +    "push %r10;"            \
>>> +    "push %r11;"
>>
>> Add an empty line please, easier to read.
> 
> Okay (same below).
> 
>>
>>> +#define PV_RESTORE_COMMON_CALLER_REGS    \
>>> +    "pop %r11;"            \
>>> +    "pop %r10;"            \
>>> +    "pop %r9;"            \
>>> +    "pop %r8;"            \
>>> +    "pop %rdi;"            \
>>> +    "pop %rsi;"
>>> +
>>> +#define PV_PROLOGUE_MSR(func)        \
>>> +    PV_SAVE_COMMON_CALLER_REGS    \
>>> +    PV_PROLOGUE_MSR_##func
>>
>> Ditto.  And the following similar cases.
>>
>>> +#define PV_EPILOGUE_MSR(func)        \
>>> +    PV_EPILOGUE_MSR_##func        \
>>> +    PV_RESTORE_COMMON_CALLER_REGS
>>> +
>>>   /* save and restore all caller-save registers, except return value */
>>>   #define PV_SAVE_ALL_CALLER_REGS                        \
>>>       "push %rcx;"                            \
>>>       "push %rdx;"                            \
>>> -    "push %rsi;"                            \
>>> -    "push %rdi;"                            \
>>> -    "push %r8;"                            \
>>> -    "push %r9;"                            \
>>> -    "push %r10;"                            \
>>> -    "push %r11;"
>>> +    PV_SAVE_COMMON_CALLER_REGS
>>>   #define PV_RESTORE_ALL_CALLER_REGS                    \
>>> -    "pop %r11;"                            \
>>> -    "pop %r10;"                            \
>>> -    "pop %r9;"                            \
>>> -    "pop %r8;"                            \
>>> -    "pop %rdi;"                            \
>>> -    "pop %rsi;"                            \
>>> +    PV_RESTORE_COMMON_CALLER_REGS                    \
>>>       "pop %rdx;"                            \
>>>       "pop %rcx;"
>>>   #endif
>>> +#define PV_PROLOGUE_ALL(func)    PV_SAVE_ALL_CALLER_REGS
>>> +#define PV_EPILOGUE_ALL(func)    PV_RESTORE_ALL_CALLER_REGS
>>> +
>>>   /*
>>>    * Generate a thunk around a function which saves all caller-save
>>>    * registers except for the return value.  This allows C functions to
>>> @@ -607,7 +671,7 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu);
>>>    * functions.
>>>    */
>>>   #define PV_THUNK_NAME(func) "__raw_callee_save_" #func
>>> -#define __PV_CALLEE_SAVE_REGS_THUNK(func, section)            \
>>> +#define __PV_CALLEE_SAVE_REGS_THUNK(func, section, helper)        \
>>>       extern typeof(func) __raw_callee_save_##func;            \
>>>                                       \
>>>       asm(".pushsection " section ", \"ax\";"                \
>>> @@ -617,16 +681,18 @@ bool __raw_callee_save___native_vcpu_is_preempted(long 
>>> cpu);
>>>           PV_THUNK_NAME(func) ":"                    \
>>>           ASM_ENDBR                            \
>>>           FRAME_BEGIN                            \
>>> -        PV_SAVE_ALL_CALLER_REGS                    \
>>> +        PV_PROLOGUE_##helper(func)                    \
>>>           "call " #func ";"                        \
>>> -        PV_RESTORE_ALL_CALLER_REGS                    \
>>> +        PV_EPILOGUE_##helper(func)                    \
>>>           FRAME_END                            \
>>>           ASM_RET                            \
>>>           ".size " PV_THUNK_NAME(func) ", .-" PV_THUNK_NAME(func) ";"    \
>>>           ".popsection")
>>>   #define PV_CALLEE_SAVE_REGS_THUNK(func)            \
>>> -    __PV_CALLEE_SAVE_REGS_THUNK(func, ".text")
>>> +    __PV_CALLEE_SAVE_REGS_THUNK(func, ".text", ALL)
>>> +#define PV_CALLEE_SAVE_REGS_MSR_THUNK(func)        \
>>> +    __PV_CALLEE_SAVE_REGS_THUNK(func, ".text", MSR)
>>>   /* Get a reference to a callee-save function */
>>>   #define PV_CALLEE_SAVE(func)                        \
>>> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/ 
>>> paravirt_types.h
>>> index b08b9d3122d6..f7f879319e90 100644
>>> --- a/arch/x86/include/asm/paravirt_types.h
>>> +++ b/arch/x86/include/asm/paravirt_types.h
>>> @@ -91,15 +91,15 @@ struct pv_cpu_ops {
>>>                 unsigned int *ecx, unsigned int *edx);
>>>       /* Unsafe MSR operations.  These will warn or panic on failure. */
>>> -    u64 (*read_msr)(u32 msr);
>>> -    void (*write_msr)(u32 msr, u64 val);
>>> +    struct paravirt_callee_save read_msr;
>>> +    struct paravirt_callee_save write_msr;
>>>       /*
>>>        * Safe MSR operations.
>>>        * Returns 0 or -EIO.
>>>        */
>>> -    int (*read_msr_safe)(u32 msr, u64 *val);
>>> -    int (*write_msr_safe)(u32 msr, u64 val);
>>> +    struct paravirt_callee_save read_msr_safe;
>>> +    struct paravirt_callee_save write_msr_safe;
>>>       u64 (*read_pmc)(int counter);
>>> @@ -520,6 +520,10 @@ unsigned long pv_native_save_fl(void);
>>>   void pv_native_irq_disable(void);
>>>   void pv_native_irq_enable(void);
>>>   unsigned long pv_native_read_cr2(void);
>>> +void pv_native_rdmsr(void);
>>> +void pv_native_wrmsr(void);
>>> +void pv_native_rdmsr_safe(void);
>>> +void pv_native_wrmsr_safe(void);
>>>   #endif
>>>   #define paravirt_nop    ((void *)nop_func)
>>> @@ -527,6 +531,7 @@ unsigned long pv_native_read_cr2(void);
>>>   #endif    /* __ASSEMBLER__ */
>>>   #define ALT_NOT_XEN    ALT_NOT(X86_FEATURE_XENPV)
>>> +#define ALT_XENPV_CALL    ALT_DIRECT_CALL(X86_FEATURE_XENPV)
>>>   #endif  /* CONFIG_PARAVIRT */
>>>   #endif    /* _ASM_X86_PARAVIRT_TYPES_H */
>>> diff --git a/arch/x86/include/asm/qspinlock_paravirt.h b/arch/x86/include/ 
>>> asm/ qspinlock_paravirt.h
>>> index 0a985784be9b..0351acb5a143 100644
>>> --- a/arch/x86/include/asm/qspinlock_paravirt.h
>>> +++ b/arch/x86/include/asm/qspinlock_paravirt.h
>>> @@ -14,7 +14,8 @@ void __lockfunc __pv_queued_spin_unlock_slowpath(struct 
>>> qspinlock *lock, u8 lock
>>>    */
>>>   #ifdef CONFIG_64BIT
>>> -__PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock_slowpath, 
>>> ".spinlock.text");
>>> +__PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock_slowpath, ".spinlock.text",
>>> +                ALL);
>>>   #define __pv_queued_spin_unlock    __pv_queued_spin_unlock
>>>   /*
>>> @@ -61,7 +62,7 @@ DEFINE_ASM_FUNC(__raw_callee_save___pv_queued_spin_unlock,
>>>   #else /* CONFIG_64BIT */
>>>   extern void __lockfunc __pv_queued_spin_unlock(struct qspinlock *lock);
>>> -__PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock, ".spinlock.text");
>>> +__PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock, ".spinlock.text", ALL);
>>>   #endif /* CONFIG_64BIT */
>>>   #endif
>>> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
>>> index 015bf298434f..ff7d7fdae360 100644
>>> --- a/arch/x86/kernel/paravirt.c
>>> +++ b/arch/x86/kernel/paravirt.c
>>> @@ -50,6 +50,24 @@ DEFINE_ASM_FUNC(pv_native_save_fl, "pushf; pop 
>>> %rax", .noinstr.text);
>>>   DEFINE_ASM_FUNC(pv_native_irq_disable, "cli", .noinstr.text);
>>>   DEFINE_ASM_FUNC(pv_native_irq_enable, "sti", .noinstr.text);
>>>   DEFINE_ASM_FUNC(pv_native_read_cr2, "mov %cr2, %rax", .noinstr.text);
>>> +DEFINE_ASM_FUNC(pv_native_rdmsr,
>>> +        "1: rdmsr\n"
>>> +        "2:\n"
>>> +        _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR), .noinstr.text);
>>> +DEFINE_ASM_FUNC(pv_native_wrmsr,
>>> +        "1: wrmsr\n"
>>> +        "2:\n"
>>> +        _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR), .noinstr.text);
>>> +DEFINE_ASM_FUNC(pv_native_rdmsr_safe,
>>> +        "1: rdmsr; xor %ecx, %ecx\n"
>>> +        "2:\n"
>>> +        _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %%ecx),
>>> +        .noinstr.text);
>>> +DEFINE_ASM_FUNC(pv_native_wrmsr_safe,
>>> +        "1: wrmsr; xor %eax, %eax\n"
>>> +        "2:\n"
>>> +        _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_WRMSR_SAFE, %%eax),
>>> +        .noinstr.text);
>>>   #endif
>>>   DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);
>>> @@ -129,10 +147,10 @@ struct paravirt_patch_template pv_ops = {
>>>       .cpu.read_cr0        = native_read_cr0,
>>>       .cpu.write_cr0        = native_write_cr0,
>>>       .cpu.write_cr4        = native_write_cr4,
>>> -    .cpu.read_msr        = native_read_msr,
>>> -    .cpu.write_msr        = native_write_msr,
>>> -    .cpu.read_msr_safe    = native_read_msr_safe,
>>> -    .cpu.write_msr_safe    = native_write_msr_safe,
>>> +    .cpu.read_msr        = __PV_IS_CALLEE_SAVE(pv_native_rdmsr),
>>> +    .cpu.write_msr        = __PV_IS_CALLEE_SAVE(pv_native_wrmsr),
>>> +    .cpu.read_msr_safe    = __PV_IS_CALLEE_SAVE(pv_native_rdmsr_safe),
>>> +    .cpu.write_msr_safe    = __PV_IS_CALLEE_SAVE(pv_native_wrmsr_safe),
>>>       .cpu.read_pmc        = native_read_pmc,
>>>       .cpu.load_tr_desc    = native_load_tr_desc,
>>>       .cpu.set_ldt        = native_set_ldt,
>>> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
>>> index 3be38350f044..c279b2bef7eb 100644
>>> --- a/arch/x86/xen/enlighten_pv.c
>>> +++ b/arch/x86/xen/enlighten_pv.c
>>> @@ -1160,36 +1160,66 @@ static void xen_do_write_msr(u32 msr, u64 val, int *err)
>>>       }
>>>   }
>>> -static int xen_read_msr_safe(u32 msr, u64 *val)
>>> -{
>>> +/*
>>> + * Prototypes for functions called via PV_CALLEE_SAVE_REGS_THUNK() in order
>>> + * to avoid warnings with "-Wmissing-prototypes".
>>> + */
>>> +struct xen_rdmsr_safe_ret {
>>> +    u64 val;
>>>       int err;
>>> +};
>>> +struct xen_rdmsr_safe_ret xen_read_msr_safe(u32 msr);
>>> +int xen_write_msr_safe(u32 msr, u32 low, u32 high);
>>> +u64 xen_read_msr(u32 msr);
>>> +void xen_write_msr(u32 msr, u32 low, u32 high);
>>> -    *val = xen_do_read_msr(msr, &err);
>>> -    return err;
>>> +__visible struct xen_rdmsr_safe_ret xen_read_msr_safe(u32 msr)
>>> +{
>>> +    struct xen_rdmsr_safe_ret ret;
>>
>> struct xen_rdmsr_safe_ret ret = { 0, 0 };
>>
>> Because the 'err' member may not be set in xen_do_read_msr().
> 
> Right.
> 
>>
>>> +
>>> +    ret.val = xen_do_read_msr(msr, &ret.err);
>>> +    return ret;
>>>   }
>>> +#define PV_PROLOGUE_MSR_xen_read_msr_safe    "mov %ecx, %edi;"
>>> +#define PV_EPILOGUE_MSR_xen_read_msr_safe    \
>>> +    "mov %edx, %ecx; mov %rax, %rdx; mov %eax, %eax; shr $0x20, %rdx;"
>>> +PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_read_msr_safe);
>>> -static int xen_write_msr_safe(u32 msr, u64 val)
>>> +__visible int xen_write_msr_safe(u32 msr, u32 low, u32 high)
>>
>> I think we can avoid splitting this u64 into two u32.
> 
> This is related to the native WRMSR interface. The WRMSR needs to be
> able to be replaced by the call of the Xen specific function.
> 
> I could handle this in the prologue helpers, but I'd prefer to keep
> those helpers as small as possible.
> 
>>
>>>   {
>>>       int err = 0;
>>> -    xen_do_write_msr(msr, val, &err);
>>> +    xen_do_write_msr(msr, (u64)high << 32 | low, &err);
>>>       return err;
>>>   }
>>> +#define PV_PROLOGUE_MSR_xen_write_msr_safe    \
>>> +    "mov %ecx, %edi; mov %eax, %esi;"
>>> +#define PV_EPILOGUE_MSR_xen_write_msr_safe
>>> +PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_write_msr_safe);
>>> -static u64 xen_read_msr(u32 msr)
>>> +__visible u64 xen_read_msr(u32 msr)
>>>   {
>>>       int err;
>>>       return xen_do_read_msr(msr, xen_msr_safe ? &err : NULL);
>>>   }
>>> +#define PV_PROLOGUE_MSR_xen_read_msr    "mov %ecx, %edi;"
>>> +#define PV_EPILOGUE_MSR_xen_read_msr    \
>>> +    "mov %rax, %rdx; mov %eax, %eax; shr $0x20, %rdx;"
>>> +PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_read_msr);
>>> -static void xen_write_msr(u32 msr, u64 val)
>>> +__visible void xen_write_msr(u32 msr, u32 low, u32 high)
>>
>> Ditto.
> 
> See above.
> 
>>
>>>   {
>>>       int err;
>>> -    xen_do_write_msr(msr, val, xen_msr_safe ? &err : NULL);
>>> +    xen_do_write_msr(msr, (u64)high << 32 | low,
>>> +             xen_msr_safe ? &err : NULL);
>>>   }
>>> +#define PV_PROLOGUE_MSR_xen_write_msr    \
>>> +    "mov %ecx, %edi; mov %eax, %esi;"
>>> +#define PV_EPILOGUE_MSR_xen_write_msr
>>> +PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_write_msr);
>>>   /* This is called once we have the cpu_possible_mask */
>>>   void __init xen_setup_vcpu_info_placement(void)
> 
> 
> Juergen


[-- Attachment #1.1.2: 0001-x86-alternative-save-original-code-before-replacing-.patch --]
[-- Type: text/x-patch, Size: 4068 bytes --]

From 7db2e9790442d073d25fec220d88fb2f85e4683f Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Wed, 7 May 2025 12:47:22 +0200
Subject: [PATCH] x86/alternative: save original code before replacing it

In case of ALT_FLAG_DIRECT_CALL being set for an alternative
replacement, the patching needs to look at the original instruction to
find the target address of the direct call to be patched in.

In case of nested ALTERNATIVEs this limits the use of
ALT_FLAG_DIRECT_CALL to either the first replacement, or to be
mutually exclusive with all previous replacements. Otherwise the
original code could have been overwritten already resulting in a
BUG(), due to ALT_FLAG_DIRECT_CALL handling not finding the expected
indirect call instruction.

Avoid this problem by saving the original code before replacing it. As
this is the only case where the original code is required to be
analyzed, special case the copy to happen only if the original code has
the length of an indirect call (6 bytes). This minimizes complexity and
stack usage.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/kernel/alternative.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index bf82c6f7d690..8d6d3a4fc4ab 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -387,6 +387,12 @@ EXPORT_SYMBOL(BUG_func);
 
 #define CALL_RIP_REL_OPCODE	0xff
 #define CALL_RIP_REL_MODRM	0x15
+#define CALL_RIP_INSTR_LEN	6
+
+static inline u8 * instr_va(struct alt_instr *i)
+{
+	return (u8 *)&i->instr_offset + i->instr_offset;
+}
 
 /*
  * Rewrite the "call BUG_func" replacement to point to the target of the
@@ -402,7 +408,7 @@ static int alt_replace_call(u8 *instr, u8 *insn_buff, struct alt_instr *a)
 		BUG();
 	}
 
-	if (a->instrlen != 6 ||
+	if (a->instrlen != CALL_RIP_INSTR_LEN ||
 	    instr[0] != CALL_RIP_REL_OPCODE ||
 	    instr[1] != CALL_RIP_REL_MODRM) {
 		pr_err("ALT_FLAG_DIRECT_CALL set for unrecognized indirect call\n");
@@ -414,7 +420,7 @@ static int alt_replace_call(u8 *instr, u8 *insn_buff, struct alt_instr *a)
 #ifdef CONFIG_X86_64
 	/* ff 15 00 00 00 00   call   *0x0(%rip) */
 	/* target address is stored at "next instruction + disp". */
-	target = *(void **)(instr + a->instrlen + disp);
+	target = *(void **)(instr_va(a) + a->instrlen + disp);
 #else
 	/* ff 15 00 00 00 00   call   *0x0 */
 	/* target address is stored at disp. */
@@ -432,11 +438,6 @@ static int alt_replace_call(u8 *instr, u8 *insn_buff, struct alt_instr *a)
 	return 5;
 }
 
-static inline u8 * instr_va(struct alt_instr *i)
-{
-	return (u8 *)&i->instr_offset + i->instr_offset;
-}
-
 /*
  * Replace instructions with better alternatives for this CPU type. This runs
  * before SMP is initialized to avoid SMP problems with self modifying code.
@@ -451,7 +452,8 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
 						  struct alt_instr *end)
 {
 	u8 insn_buff[MAX_PATCH_LEN];
-	u8 *instr, *replacement;
+	u8 old_insn[CALL_RIP_INSTR_LEN];
+	u8 *instr, *replacement, *old_va = NULL;
 	struct alt_instr *a, *b;
 
 	DPRINTK(ALT, "alt table %px, -> %px", start, end);
@@ -513,11 +515,21 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
 			instr, instr, a->instrlen,
 			replacement, a->replacementlen, a->flags);
 
+		/*
+		 * Remember original code if it could be an indirect call.
+		 * This enables ALT_FLAG_DIRECT_CALL handling with nested
+		 * alternatives even if the original code has been modified
+		 * already.
+		 */
+		if (old_va != instr && a->instrlen == CALL_RIP_INSTR_LEN) {
+			old_va = instr;
+			memcpy(old_insn, instr, CALL_RIP_INSTR_LEN);
+		}
 		memcpy(insn_buff, replacement, a->replacementlen);
 		insn_buff_sz = a->replacementlen;
 
 		if (a->flags & ALT_FLAG_DIRECT_CALL) {
-			insn_buff_sz = alt_replace_call(instr, insn_buff, a);
+			insn_buff_sz = alt_replace_call(old_insn, insn_buff, a);
 			if (insn_buff_sz < 0)
 				continue;
 		}
-- 
2.43.0


[-- Attachment #1.1.3: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces
  2025-05-12 11:24       ` Juergen Gross
@ 2025-05-13  5:55         ` Xin Li
  2025-05-13  6:06           ` Jürgen Groß
  0 siblings, 1 reply; 30+ messages in thread
From: Xin Li @ 2025-05-13  5:55 UTC (permalink / raw)
  To: Juergen Gross, linux-kernel, x86, virtualization
  Cc: Ajay Kaher, Alexey Makhalov, Broadcom internal kernel review list,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Boris Ostrovsky, xen-devel

On 5/12/2025 4:24 AM, Juergen Gross wrote:
> Now with the mentioned patch really attached. :-)
> 

Does it allow patching with an instruction more than 6 bytes long?

The immediate form MSR instructions are 9 bytes long.

Thanks!
     Xin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces
  2025-05-13  5:55         ` Xin Li
@ 2025-05-13  6:06           ` Jürgen Groß
  2025-05-13  6:55             ` Xin Li
  2025-05-13 22:24             ` H. Peter Anvin
  0 siblings, 2 replies; 30+ messages in thread
From: Jürgen Groß @ 2025-05-13  6:06 UTC (permalink / raw)
  To: Xin Li, linux-kernel, x86, virtualization
  Cc: Ajay Kaher, Alexey Makhalov, Broadcom internal kernel review list,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Boris Ostrovsky, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 318 bytes --]

On 13.05.25 07:55, Xin Li wrote:
> On 5/12/2025 4:24 AM, Juergen Gross wrote:
>> Now with the mentioned patch really attached. :-)
>>
> 
> Does it allow patching with an instruction more than 6 bytes long?
> 
> The immediate form MSR instructions are 9 bytes long.

Yes, shouldn't be a problem.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces
  2025-05-13  6:06           ` Jürgen Groß
@ 2025-05-13  6:55             ` Xin Li
  2025-05-13 22:24             ` H. Peter Anvin
  1 sibling, 0 replies; 30+ messages in thread
From: Xin Li @ 2025-05-13  6:55 UTC (permalink / raw)
  To: Jürgen Groß, linux-kernel, x86, virtualization
  Cc: Ajay Kaher, Alexey Makhalov, Broadcom internal kernel review list,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Boris Ostrovsky, xen-devel

On 5/12/2025 11:06 PM, Jürgen Groß wrote:
> On 13.05.25 07:55, Xin Li wrote:
>> On 5/12/2025 4:24 AM, Juergen Gross wrote:
>>> Now with the mentioned patch really attached. :-)
>>>
>>
>> Does it allow patching with an instruction more than 6 bytes long?
>>
>> The immediate form MSR instructions are 9 bytes long.
> 
> Yes, shouldn't be a problem.
> 

Excellent, I will give it a try.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces
  2025-05-12 11:20     ` Jürgen Groß
  2025-05-12 11:24       ` Juergen Gross
@ 2025-05-13  7:44       ` Xin Li
  2025-06-11 12:58         ` Juergen Gross
  1 sibling, 1 reply; 30+ messages in thread
From: Xin Li @ 2025-05-13  7:44 UTC (permalink / raw)
  To: Jürgen Groß, linux-kernel, x86, virtualization
  Cc: Ajay Kaher, Broadcom internal kernel review list, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Boris Ostrovsky, xen-devel, Andrew Cooper

On 5/12/2025 4:20 AM, Jürgen Groß wrote:
> On 09.05.25 10:18, Xin Li wrote:
>> On 5/6/2025 2:20 AM, Juergen Gross wrote:
>> I'm trying to evaluate how to add the immediate form MSR instructions
>> on top of this patch set.  And I'm close to get it done.
> 
> There is something to consider when running as a Xen PV guest, ...

Andrew said he doens't plan to expose WRMSRNS to PV guests, and doesn't
expect MSR_IMM to be useful in a PV guest either, which I fully agree.
>>>
>>> Note that in the Xen PV case the RDMSR/WRMSR patching must not happen
>>> even as an intermediate step, as this would clobber the indirect call
>>> information needed when patching in the direct call for the Xen case.
>>
>> Good point!
> 
> ... as this still needs to be true.
> 
> There are 2 different ways to deal with this:
> 
> 1. When running as a Xen PV guest disable X86_FEATURE_WRMSRNS and
>     ASM_WRMSRNS_IMM (e.g. in xen_init_capabilities()).
> 
> 2. Buffer the original instruction before patching in apply_alternatives()
>     in order to avoid the sequence limitation above (see attached patch).
> 
>> Deciding whether to retain the pvops MSR API is the responsibility of
>> the x86 maintainers, who are the ones experiencing the challenges of 
>> maintaining the code.
> 
> Well, I'm the PV ops maintainer, so it is basically me who needs to deal
> with this. OTOH I do understand that diagnosis of problems with PV ops is
> more complicated than without.

Indeed, significant improvements continue to be implemented.

> 
>>
>> tglx said @https://lore.kernel.org/lkml/87y1h81ht4.ffs@tglx/:
>>
>>  > I fundamentaly hate adding this to the PV infrastructure. We don't
>>  > want more PV ops, quite the contrary.
>>
>> That is the reason I took a different direction, i.e., removing the
>> pvops MSR APIs.  But if your approach is cleaner, they may prefer it.
> 
> In the end it isn't adding additional PV ops interfaces. It is modifying
> existing ones.
> 
>>
>>> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/ 
>>> paravirt.h
>>> index a463c747c780..df10b0e4f7b8 100644
>>> --- a/arch/x86/include/asm/paravirt.h
>>> +++ b/arch/x86/include/asm/paravirt.h
>>> @@ -175,24 +175,72 @@ static inline void __write_cr4(unsigned long x)
>>>       PVOP_VCALL1(cpu.write_cr4, x);
>>>   }
>>> -static inline u64 paravirt_read_msr(u32 msr)
>>> +static __always_inline u64 paravirt_read_msr(u32 msr)
>>>   {
>>> -    return PVOP_CALL1(u64, cpu.read_msr, msr);
>>> +    EAX_EDX_DECLARE_ARGS(val, low, high);
>>
>> This is under CONFIG_PARAVIRT_XXL, thus CONFIG_XEN_PV and CONFIG_X86_64,
>> therefore we don't need to consider 32-bit at all, no?
> 
> Right. OTOH the macros are there, so why not use them?
> 
> In the end I'm fine to open code the 64-bit case here.
> 

Here is a patch I cooked.  I added an ALTERNATIVE() hack because the new 
instructions can't be more than 6 bytes long.  But with the patch you
just sent, it shouldn't be needed.

diff --git a/arch/x86/include/asm/paravirt.h 
b/arch/x86/include/asm/paravirt.h
index df10b0e4f7b8..82ffc11d6f1f 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -177,18 +177,20 @@ static inline void __write_cr4(unsigned long x)

  static __always_inline u64 paravirt_read_msr(u32 msr)
  {
-	EAX_EDX_DECLARE_ARGS(val, low, high);
+	u64 val;

  	PVOP_TEST_NULL(cpu.read_msr);
  	asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
  					"rdmsr", ALT_NOT_XEN,
  					ALT_CALL_INSTR, ALT_XENPV_CALL)
+		     ALTERNATIVE("", "shl $0x20, %%rdx; or %%rdx, %%rax", ALT_NOT_XEN)
  		     "2:\n"
  		     _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR)
-		     : EAX_EDX_RET(val, low, high), ASM_CALL_CONSTRAINT
-		     : paravirt_ptr(cpu.read_msr), "c" (msr));
+		     : "=a" (val), ASM_CALL_CONSTRAINT
+		     : paravirt_ptr(cpu.read_msr), "c" (msr)
+		     : "rdx");

-	return EAX_EDX_VAL(val, low, high);
+	return val;
  }

  static __always_inline void paravirt_write_msr(u32 msr, u64 val)
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index ea3d7d583254..cacd9c37c3bd 100644
@@ -1204,20 +1206,20 @@ __visible u64 xen_read_msr(u32 msr)

  	return xen_do_read_msr(msr, xen_msr_safe ? &err : NULL);
  }
+
  #define PV_PROLOGUE_MSR_xen_read_msr	"mov %ecx, %edi;"
-#define PV_EPILOGUE_MSR_xen_read_msr	\
-	"mov %rax, %rdx; mov %eax, %eax; shr $0x20, %rdx;"
+#define PV_EPILOGUE_MSR_xen_read_msr
  PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_read_msr);

-__visible void xen_write_msr(u32 msr, u32 low, u32 high)
+__visible void xen_write_msr(u32 msr, u64 val)
  {
  	int err;

-	xen_do_write_msr(msr, (u64)high << 32 | low,
-			 xen_msr_safe ? &err : NULL);
+	xen_do_write_msr(msr, val, xen_msr_safe ? &err : NULL);
  }
+
  #define PV_PROLOGUE_MSR_xen_write_msr	\
-	"mov %ecx, %edi; mov %eax, %esi;"
+	"mov %ecx, %edi; mov %rax, %rsi;"
  #define PV_EPILOGUE_MSR_xen_write_msr
  PV_CALLEE_SAVE_REGS_MSR_THUNK(xen_write_msr);



>>> +__visible int xen_write_msr_safe(u32 msr, u32 low, u32 high)
>>
>> I think we can avoid splitting this u64 into two u32.
> 
> This is related to the native WRMSR interface. The WRMSR needs to be
> able to be replaced by the call of the Xen specific function.
> 
> I could handle this in the prologue helpers, but I'd prefer to keep
> those helpers as small as possible.

The above patch makes PV_EPILOGUE_MSR_xen_read_msr empty, because only
RDMSR needs to convert edx:eax into a 64-bit register, and the code is
added into paravirt_read_msr() already.

For xen_write_msr(), the change is simple enough.

Thanks!
     Xin

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces
  2025-05-13  6:06           ` Jürgen Groß
  2025-05-13  6:55             ` Xin Li
@ 2025-05-13 22:24             ` H. Peter Anvin
  2025-05-15  7:32               ` Xin Li
  1 sibling, 1 reply; 30+ messages in thread
From: H. Peter Anvin @ 2025-05-13 22:24 UTC (permalink / raw)
  To: Jürgen Groß, Xin Li, linux-kernel, x86, virtualization
  Cc: Ajay Kaher, Alexey Makhalov, Broadcom internal kernel review list,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Boris Ostrovsky, xen-devel

On May 12, 2025 11:06:02 PM PDT, "Jürgen Groß" <jgross@suse.com> wrote:
>On 13.05.25 07:55, Xin Li wrote:
>> On 5/12/2025 4:24 AM, Juergen Gross wrote:
>>> Now with the mentioned patch really attached. :-)
>>> 
>> 
>> Does it allow patching with an instruction more than 6 bytes long?
>> 
>> The immediate form MSR instructions are 9 bytes long.
>
>Yes, shouldn't be a problem.
>
>
>Juergen

However, it is more than that. The immediate instructions have a different interface, and it makes more sense to use the extra bytes to shuffle the bits around for the legacy forms:

Write:

    mov %rax,%rdx
    shr $32,%rdx
    wrmsr(ns)

Read:

    rdmsr
    shl $32,%rdx
    or %rdx,%rax

For the write case, this also means that two separate trap points are needed.

As far as Xen (the only user of pv msrs), note that it only paravirtualizes a very small number of MSRs, and some of those are fairly performance sensitive, so not going through the Xen framework for MSRs known to be either native or null on Xen would definitely be a win.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces
  2025-05-13 22:24             ` H. Peter Anvin
@ 2025-05-15  7:32               ` Xin Li
  2025-05-15 20:24                 ` H. Peter Anvin
  0 siblings, 1 reply; 30+ messages in thread
From: Xin Li @ 2025-05-15  7:32 UTC (permalink / raw)
  To: H. Peter Anvin, Jürgen Groß, linux-kernel, x86,
	virtualization, Peter Zijlstra
  Cc: Ajay Kaher, Alexey Makhalov, Broadcom internal kernel review list,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Boris Ostrovsky, xen-devel

On 5/13/2025 3:24 PM, H. Peter Anvin wrote:
> On May 12, 2025 11:06:02 PM PDT, "Jürgen Groß" <jgross@suse.com> wrote:
>> On 13.05.25 07:55, Xin Li wrote:
>>> On 5/12/2025 4:24 AM, Juergen Gross wrote:
>>>> Now with the mentioned patch really attached. :-)
>>>>
>>>
>>> Does it allow patching with an instruction more than 6 bytes long?
>>>
>>> The immediate form MSR instructions are 9 bytes long.
>>
>> Yes, shouldn't be a problem.
>>
>>
>> Juergen
> 
> However, it is more than that. The immediate instructions have a different interface, and it makes more sense to use the extra bytes to shuffle the bits around for the legacy forms:
> 
> Write:
> 
>      mov %rax,%rdx
>      shr $32,%rdx
>      wrmsr(ns)
> 
> Read:
> 
>      rdmsr
>      shl $32,%rdx
>      or %rdx,%rax
> 
> For the write case, this also means that two separate trap points are needed.
> 
> As far as Xen (the only user of pv msrs), note that it only paravirtualizes a very small number of MSRs, and some of those are fairly performance sensitive, so not going through the Xen framework for MSRs known to be either native or null on Xen would definitely be a win.
> 
> 

Hi Juergen,

I have some update on this thread while working on it.

If we continue down the path of maintaining pvops MSR APIs as this patch
series does, it seems we’ll need to duplicate the ALTERNATIVE code in
three different places.

1) The MSR access primitives defined in <asm/msr.h>, which is used when
    CONFIG_PARAVIRT=n.

2) The pvops native MSR functions pv_native_{rd,wr}msr{,_safe}() defined
    in arch/x86/kernel/paravirt.c, used when CONFIG_PARAVIRT=y on bare
    metal.

3) The pvops Xen MSR functions paravirt_{read,write}_msr{,_safe}()
    defined in <asm/paravirt.h>, used when CONFIG_PARAVIRT_XXL=y.

hpa had mentioned to me earlier that this would be a maintenance burden
— something I only truly realized once I got hands-on with it.

Maybe you have something in mind to address it?

Also add PeterZ to the To list because he cares it.

Thanks!
     Xin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces
  2025-05-15  7:32               ` Xin Li
@ 2025-05-15 20:24                 ` H. Peter Anvin
  0 siblings, 0 replies; 30+ messages in thread
From: H. Peter Anvin @ 2025-05-15 20:24 UTC (permalink / raw)
  To: Xin Li, Jürgen Groß, linux-kernel, x86, virtualization,
	Peter Zijlstra
  Cc: Ajay Kaher, Alexey Makhalov, Broadcom internal kernel review list,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Boris Ostrovsky, xen-devel

On 5/15/25 00:32, Xin Li wrote:
> 
> Hi Juergen,
> 
> I have some update on this thread while working on it.
> 
> If we continue down the path of maintaining pvops MSR APIs as this patch
> series does, it seems we’ll need to duplicate the ALTERNATIVE code in
> three different places.
> 
> 1) The MSR access primitives defined in <asm/msr.h>, which is used when
>     CONFIG_PARAVIRT=n.
> 
> 2) The pvops native MSR functions pv_native_{rd,wr}msr{,_safe}() defined
>     in arch/x86/kernel/paravirt.c, used when CONFIG_PARAVIRT=y on bare
>     metal.
> 
> 3) The pvops Xen MSR functions paravirt_{read,write}_msr{,_safe}()
>     defined in <asm/paravirt.h>, used when CONFIG_PARAVIRT_XXL=y.
> 
> hpa had mentioned to me earlier that this would be a maintenance burden
> — something I only truly realized once I got hands-on with it.
> 
> Maybe you have something in mind to address it?
> 
> Also add PeterZ to the To list because he cares it.
> 

Having the code being duplicated is definitely not a good thing; 
although I'm not one of the x86 maintainers anymore, I would consider it 
a strong reason to NAK such a patchset.

At one point I was considering augmenting the alternatives framework to 
be able to call an ad hoc subroutine to generate the code. It would be 
useful in cases like this, where if PV is enabled it can make a callout 
to the currently-active PV code to query the desired code to be output.

There are 16 unused bits in the alternatives table (not counting the 14 
unused flag bits), which could be used for an enumeration of such 
subroutines, optionally split into 8 bits of function enumeration and 8 
bits of private data. In this case, the "replacement" pointer becomes 
available as a private pointer; possibly to a metadata structure used by 
the subroutine.

This could also be used to significantly enhance the static-immediate 
framework, by being able to have explicit code which handles the 
transformations instead of needing to rely on assembly hacks. That way 
we might even be able to do that kind of transformations for any 
ro_after_init value.

I think the biggest concern is how this would affect objtool, since 
objtool would now not have any kind of direct visibility into the 
possibly generated code. How to best feed the information objtool needs 
to it would be my biggest question (in part because I don't know what 
objtool would actually need.)

	-hpa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces
  2025-05-13  7:44       ` Xin Li
@ 2025-06-11 12:58         ` Juergen Gross
  2025-06-13  7:31           ` Xin Li
  2025-08-25  1:54           ` Xin Li
  0 siblings, 2 replies; 30+ messages in thread
From: Juergen Gross @ 2025-06-11 12:58 UTC (permalink / raw)
  To: Xin Li, linux-kernel, x86, virtualization
  Cc: Ajay Kaher, Broadcom internal kernel review list, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Boris Ostrovsky, xen-devel, Andrew Cooper


[-- Attachment #1.1.1: Type: text/plain, Size: 5856 bytes --]

On 13.05.25 09:44, Xin Li wrote:
> On 5/12/2025 4:20 AM, Jürgen Groß wrote:
>> On 09.05.25 10:18, Xin Li wrote:
>>> On 5/6/2025 2:20 AM, Juergen Gross wrote:
>>> I'm trying to evaluate how to add the immediate form MSR instructions
>>> on top of this patch set.  And I'm close to get it done.
>>
>> There is something to consider when running as a Xen PV guest, ...
> 
> Andrew said he doens't plan to expose WRMSRNS to PV guests, and doesn't
> expect MSR_IMM to be useful in a PV guest either, which I fully agree.
>>>>
>>>> Note that in the Xen PV case the RDMSR/WRMSR patching must not happen
>>>> even as an intermediate step, as this would clobber the indirect call
>>>> information needed when patching in the direct call for the Xen case.
>>>
>>> Good point!
>>
>> ... as this still needs to be true.
>>
>> There are 2 different ways to deal with this:
>>
>> 1. When running as a Xen PV guest disable X86_FEATURE_WRMSRNS and
>>     ASM_WRMSRNS_IMM (e.g. in xen_init_capabilities()).
>>
>> 2. Buffer the original instruction before patching in apply_alternatives()
>>     in order to avoid the sequence limitation above (see attached patch).
>>
>>> Deciding whether to retain the pvops MSR API is the responsibility of
>>> the x86 maintainers, who are the ones experiencing the challenges of 
>>> maintaining the code.
>>
>> Well, I'm the PV ops maintainer, so it is basically me who needs to deal
>> with this. OTOH I do understand that diagnosis of problems with PV ops is
>> more complicated than without.
> 
> Indeed, significant improvements continue to be implemented.
> 
>>
>>>
>>> tglx said @https://lore.kernel.org/lkml/87y1h81ht4.ffs@tglx/:
>>>
>>>  > I fundamentaly hate adding this to the PV infrastructure. We don't
>>>  > want more PV ops, quite the contrary.
>>>
>>> That is the reason I took a different direction, i.e., removing the
>>> pvops MSR APIs.  But if your approach is cleaner, they may prefer it.
>>
>> In the end it isn't adding additional PV ops interfaces. It is modifying
>> existing ones.
>>
>>>
>>>> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/ paravirt.h
>>>> index a463c747c780..df10b0e4f7b8 100644
>>>> --- a/arch/x86/include/asm/paravirt.h
>>>> +++ b/arch/x86/include/asm/paravirt.h
>>>> @@ -175,24 +175,72 @@ static inline void __write_cr4(unsigned long x)
>>>>       PVOP_VCALL1(cpu.write_cr4, x);
>>>>   }
>>>> -static inline u64 paravirt_read_msr(u32 msr)
>>>> +static __always_inline u64 paravirt_read_msr(u32 msr)
>>>>   {
>>>> -    return PVOP_CALL1(u64, cpu.read_msr, msr);
>>>> +    EAX_EDX_DECLARE_ARGS(val, low, high);
>>>
>>> This is under CONFIG_PARAVIRT_XXL, thus CONFIG_XEN_PV and CONFIG_X86_64,
>>> therefore we don't need to consider 32-bit at all, no?
>>
>> Right. OTOH the macros are there, so why not use them?
>>
>> In the end I'm fine to open code the 64-bit case here.
>>
> 
> Here is a patch I cooked.  I added an ALTERNATIVE() hack because the new 
> instructions can't be more than 6 bytes long.  But with the patch you
> just sent, it shouldn't be needed.

I have meanwhile dropped the patch copying the original indirect call.

Reason is that I'm seeing a potential risk with current alternative
patching when using ALTERNATIVE_[23](): depending on the tested features
it might happen that an instruction sequence not suitable for the current
runtime environment is patched in as an intermediate step. In case there
is an interrupt happening just then AND the handling of the interrupt is
using the patch site, this could result in crashes or undefined behavior.

I have meanwhile a set of 3 patches fixing that problem by merging all
alternative patching of a patch site in the local buffer and only then
patching the code at the target site with the final result.

The same problem arises with your code below, but this time it isn't
fixed by my patches: the two ALTERNATIVE() instances in the asm() construct
would need to be patched in a single atomic operation to be consistent.
Otherwise you could end up e.g. on bare metal with paravirt_read_msr()
having replaced the indirect call with "rdmsr", but not yet having added
the code to merge %rdx into %rax.

I'm just doing a V2 of my series, but this time including the additional
support of the non-serializing and immediate forms. Lets see how this will
look like. I will drop using the EAX_EDX_* macros, but due to the reason
mentioned above I won't switch to your variant completely.

> 
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index df10b0e4f7b8..82ffc11d6f1f 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -177,18 +177,20 @@ static inline void __write_cr4(unsigned long x)
> 
>   static __always_inline u64 paravirt_read_msr(u32 msr)
>   {
> -    EAX_EDX_DECLARE_ARGS(val, low, high);
> +    u64 val;
> 
>       PVOP_TEST_NULL(cpu.read_msr);
>       asm volatile("1: "ALTERNATIVE_2(PARAVIRT_CALL,
>                       "rdmsr", ALT_NOT_XEN,
>                       ALT_CALL_INSTR, ALT_XENPV_CALL)
> +             ALTERNATIVE("", "shl $0x20, %%rdx; or %%rdx, %%rax", ALT_NOT_XEN)
>                "2:\n"
>                _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR)
> -             : EAX_EDX_RET(val, low, high), ASM_CALL_CONSTRAINT
> -             : paravirt_ptr(cpu.read_msr), "c" (msr));
> +             : "=a" (val), ASM_CALL_CONSTRAINT
> +             : paravirt_ptr(cpu.read_msr), "c" (msr)
> +             : "rdx");
> 
> -    return EAX_EDX_VAL(val, low, high);
> +    return val;

Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces
  2025-06-11 12:58         ` Juergen Gross
@ 2025-06-13  7:31           ` Xin Li
  2025-06-13  8:01             ` Jürgen Groß
  2025-08-25  1:54           ` Xin Li
  1 sibling, 1 reply; 30+ messages in thread
From: Xin Li @ 2025-06-13  7:31 UTC (permalink / raw)
  To: Juergen Gross, linux-kernel, x86, virtualization
  Cc: Ajay Kaher, Broadcom internal kernel review list, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Boris Ostrovsky, xen-devel, Andrew Cooper

On 6/11/2025 5:58 AM, Juergen Gross wrote:
>> Here is a patch I cooked.  I added an ALTERNATIVE() hack because the 
>> new instructions can't be more than 6 bytes long.  But with the patch you
>> just sent, it shouldn't be needed.
> 
> I have meanwhile dropped the patch copying the original indirect call.
> 
> Reason is that I'm seeing a potential risk with current alternative
> patching when using ALTERNATIVE_[23](): depending on the tested features
> it might happen that an instruction sequence not suitable for the current
> runtime environment is patched in as an intermediate step. In case there
> is an interrupt happening just then AND the handling of the interrupt is
> using the patch site, this could result in crashes or undefined behavior.

Oh, I had assumed that Linux disables interrupts during the patching
process. Just out of curiosity, why are interrupts allowed in this case?

> 
> I have meanwhile a set of 3 patches fixing that problem by merging all
> alternative patching of a patch site in the local buffer and only then
> patching the code at the target site with the final result.
> 
> The same problem arises with your code below, but this time it isn't
> fixed by my patches: the two ALTERNATIVE() instances in the asm() construct
> would need to be patched in a single atomic operation to be consistent.
> Otherwise you could end up e.g. on bare metal with paravirt_read_msr()
> having replaced the indirect call with "rdmsr", but not yet having added
> the code to merge %rdx into %rax.
> 
> I'm just doing a V2 of my series, but this time including the additional
> support of the non-serializing and immediate forms. Lets see how this will
> look like. I will drop using the EAX_EDX_* macros, but due to the reason
> mentioned above I won't switch to your variant completely.

Great!

Thanks!
     Xin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces
  2025-06-13  7:31           ` Xin Li
@ 2025-06-13  8:01             ` Jürgen Groß
  0 siblings, 0 replies; 30+ messages in thread
From: Jürgen Groß @ 2025-06-13  8:01 UTC (permalink / raw)
  To: Xin Li, linux-kernel, x86, virtualization
  Cc: Ajay Kaher, Broadcom internal kernel review list, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Boris Ostrovsky, xen-devel, Andrew Cooper


[-- Attachment #1.1.1: Type: text/plain, Size: 1240 bytes --]

On 13.06.25 09:31, Xin Li wrote:
> On 6/11/2025 5:58 AM, Juergen Gross wrote:
>>> Here is a patch I cooked.  I added an ALTERNATIVE() hack because the new 
>>> instructions can't be more than 6 bytes long.  But with the patch you
>>> just sent, it shouldn't be needed.
>>
>> I have meanwhile dropped the patch copying the original indirect call.
>>
>> Reason is that I'm seeing a potential risk with current alternative
>> patching when using ALTERNATIVE_[23](): depending on the tested features
>> it might happen that an instruction sequence not suitable for the current
>> runtime environment is patched in as an intermediate step. In case there
>> is an interrupt happening just then AND the handling of the interrupt is
>> using the patch site, this could result in crashes or undefined behavior.
> 
> Oh, I had assumed that Linux disables interrupts during the patching
> process. Just out of curiosity, why are interrupts allowed in this case?

Interrupts are disabled within text_poke_early() while patching a single
instance.

I guess keeping interrupts disabled during the complete apply_alternatives()
handling would potentially result in a too long period without handling any
interrupts.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces
  2025-06-11 12:58         ` Juergen Gross
  2025-06-13  7:31           ` Xin Li
@ 2025-08-25  1:54           ` Xin Li
  2025-08-26 10:39             ` Jürgen Groß
  1 sibling, 1 reply; 30+ messages in thread
From: Xin Li @ 2025-08-25  1:54 UTC (permalink / raw)
  To: Juergen Gross, linux-kernel, x86, virtualization
  Cc: Ajay Kaher, Broadcom internal kernel review list, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Boris Ostrovsky, xen-devel, Andrew Cooper

On 6/11/2025 5:58 AM, Juergen Gross wrote:
> I'm just doing a V2 of my series, but this time including the additional
> support of the non-serializing and immediate forms. Lets see how this will
> look like. I will drop using the EAX_EDX_* macros, but due to the reason
> mentioned above I won't switch to your variant completely.

Hi Juergen,

Do you have any update on this?

Thanks!
     Xin


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces
  2025-08-25  1:54           ` Xin Li
@ 2025-08-26 10:39             ` Jürgen Groß
  0 siblings, 0 replies; 30+ messages in thread
From: Jürgen Groß @ 2025-08-26 10:39 UTC (permalink / raw)
  To: Xin Li, linux-kernel, x86, virtualization
  Cc: Ajay Kaher, Broadcom internal kernel review list, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Boris Ostrovsky, xen-devel, Andrew Cooper


[-- Attachment #1.1.1: Type: text/plain, Size: 606 bytes --]

On 25.08.25 03:54, Xin Li wrote:
> On 6/11/2025 5:58 AM, Juergen Gross wrote:
>> I'm just doing a V2 of my series, but this time including the additional
>> support of the non-serializing and immediate forms. Lets see how this will
>> look like. I will drop using the EAX_EDX_* macros, but due to the reason
>> mentioned above I won't switch to your variant completely.
> 
> Hi Juergen,
> 
> Do you have any update on this?

I've been very busy with other stuff (downstream, security, ...).

In between I've been working on the series. I hope to post it some time in
September.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2025-08-26 10:39 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-06  9:20 [PATCH 0/6] x86/msr: let paravirt inline rdmsr/wrmsr instructions Juergen Gross
2025-05-06  9:20 ` [PATCH 1/6] coco/tdx: Rename MSR access helpers Juergen Gross
2025-05-06 11:26   ` Kirill A. Shutemov
2025-05-06  9:20 ` [PATCH 2/6] x86/kvm: Rename the KVM private read_msr() function Juergen Gross
2025-05-06 13:53   ` Sean Christopherson
2025-05-06 13:58     ` Jürgen Groß
2025-05-06 16:16     ` Ingo Molnar
2025-05-06 16:29       ` H. Peter Anvin
2025-05-06  9:20 ` [PATCH 3/6] x86/msr: minimize usage of native_*() msr access functions Juergen Gross
2025-05-06 14:24   ` Sean Christopherson
2025-05-09 21:49   ` Wei Liu
2025-05-06  9:20 ` [PATCH 4/6] x86/msr: Move MSR trace calls one function level up Juergen Gross
2025-05-06  9:20 ` [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces Juergen Gross
2025-05-09  8:18   ` Xin Li
2025-05-12 11:20     ` Jürgen Groß
2025-05-12 11:24       ` Juergen Gross
2025-05-13  5:55         ` Xin Li
2025-05-13  6:06           ` Jürgen Groß
2025-05-13  6:55             ` Xin Li
2025-05-13 22:24             ` H. Peter Anvin
2025-05-15  7:32               ` Xin Li
2025-05-15 20:24                 ` H. Peter Anvin
2025-05-13  7:44       ` Xin Li
2025-06-11 12:58         ` Juergen Gross
2025-06-13  7:31           ` Xin Li
2025-06-13  8:01             ` Jürgen Groß
2025-08-25  1:54           ` Xin Li
2025-08-26 10:39             ` Jürgen Groß
2025-05-06  9:20 ` [PATCH 6/6] x86/msr: reduce number of low level MSR access helpers Juergen Gross
2025-05-10 16:03 ` [PATCH 0/6] x86/msr: let paravirt inline rdmsr/wrmsr instructions Michael Kelley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).