* [PATCH v3 02/16] coco/tdx: Rename MSR access helpers
2026-02-18 8:21 [PATCH v3 00/16] x86/msr: Inline rdmsr/wrmsr instructions Juergen Gross
@ 2026-02-18 8:21 ` Juergen Gross
2026-02-18 14:11 ` Edgecombe, Rick P
2026-02-18 8:21 ` [PATCH v3 04/16] KVM: x86: Remove the KVM private read_msr() function Juergen Gross
` (3 subsequent siblings)
4 siblings, 1 reply; 20+ messages in thread
From: Juergen Gross @ 2026-02-18 8:21 UTC (permalink / raw)
To: linux-kernel, x86, linux-coco, kvm
Cc: Juergen Gross, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Kiryl Shutsemau, Rick Edgecombe
In order to avoid a name clash with some general MSR access helpers
after a future MSR infrastructure rework, rename the TDX-specific
helpers.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
---
arch/x86/coco/tdx/tdx.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 7b2833705d47..500166c1a161 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -468,7 +468,7 @@ static void __cpuidle tdx_safe_halt(void)
raw_local_irq_enable();
}
-static int read_msr(struct pt_regs *regs, struct ve_info *ve)
+static int tdx_read_msr(struct pt_regs *regs, struct ve_info *ve)
{
struct tdx_module_args args = {
.r10 = TDX_HYPERCALL_STANDARD,
@@ -489,7 +489,7 @@ static int read_msr(struct pt_regs *regs, struct ve_info *ve)
return ve_instr_len(ve);
}
-static int write_msr(struct pt_regs *regs, struct ve_info *ve)
+static int tdx_write_msr(struct pt_regs *regs, struct ve_info *ve)
{
struct tdx_module_args args = {
.r10 = TDX_HYPERCALL_STANDARD,
@@ -842,9 +842,9 @@ static int virt_exception_kernel(struct pt_regs *regs, struct ve_info *ve)
case EXIT_REASON_HLT:
return handle_halt(ve);
case EXIT_REASON_MSR_READ:
- return read_msr(regs, ve);
+ return tdx_read_msr(regs, ve);
case EXIT_REASON_MSR_WRITE:
- return write_msr(regs, ve);
+ return tdx_write_msr(regs, ve);
case EXIT_REASON_CPUID:
return handle_cpuid(regs, ve);
case EXIT_REASON_EPT_VIOLATION:
--
2.53.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v3 02/16] coco/tdx: Rename MSR access helpers
2026-02-18 8:21 ` [PATCH v3 02/16] coco/tdx: Rename MSR access helpers Juergen Gross
@ 2026-02-18 14:11 ` Edgecombe, Rick P
0 siblings, 0 replies; 20+ messages in thread
From: Edgecombe, Rick P @ 2026-02-18 14:11 UTC (permalink / raw)
To: kvm@vger.kernel.org, jgross@suse.com, linux-coco@lists.linux.dev,
linux-kernel@vger.kernel.org, x86@kernel.org
Cc: hpa@zytor.com, mingo@redhat.com, tglx@kernel.org, kas@kernel.org,
bp@alien8.de, dave.hansen@linux.intel.com
On Wed, 2026-02-18 at 09:21 +0100, Juergen Gross wrote:
> In order to avoid a name clash with some general MSR access helpers
> after a future MSR infrastructure rework, rename the TDX-specific
> helpers.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Reviewed-by: Kiryl Shutsemau <kas@kernel.org>
> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> ---
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v3 04/16] KVM: x86: Remove the KVM private read_msr() function
2026-02-18 8:21 [PATCH v3 00/16] x86/msr: Inline rdmsr/wrmsr instructions Juergen Gross
2026-02-18 8:21 ` [PATCH v3 02/16] coco/tdx: Rename MSR access helpers Juergen Gross
@ 2026-02-18 8:21 ` Juergen Gross
2026-02-18 14:21 ` Edgecombe, Rick P
2026-02-18 8:21 ` [PATCH v3 05/16] x86/msr: Minimize usage of native_*() msr access functions Juergen Gross
` (2 subsequent siblings)
4 siblings, 1 reply; 20+ messages in thread
From: Juergen Gross @ 2026-02-18 8:21 UTC (permalink / raw)
To: linux-kernel, x86, kvm, linux-coco
Cc: Juergen Gross, Sean Christopherson, Paolo Bonzini,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Kiryl Shutsemau, Rick Edgecombe
Instead of having a KVM private read_msr() function, just use rdmsrq().
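The calling-convention change this implies can be sketched in plain userspace C. Everything below is an illustrative mock (the mock_* names and the fake MSR table are stand-ins, not kernel code — the real rdmsrq() executes the RDMSR instruction):

```c
#include <assert.h>
#include <stdint.h>

/* Fake MSR storage standing in for the hardware register file. */
static uint64_t fake_msrs[2] = { 0xdeadbeefULL, 0x1122334455667788ULL };

/*
 * Output-parameter shape, like the kernel's rdmsrq() macro: the result
 * lands in the second argument instead of being returned.
 */
#define mock_rdmsrq(msr, val) ((val) = fake_msrs[(msr)])

/* The removed KVM-private shape: a value-returning wrapper around it. */
static inline uint64_t mock_read_msr(uint32_t msr)
{
	uint64_t value;

	mock_rdmsrq(msr, value);
	return value;
}
```

Dropping the wrapper means call sites change from "x = read_msr(msr)" to "rdmsrq(msr, x)", which is exactly the mechanical edit the diff below performs.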
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
---
V2:
- remove the helper and use rdmsrq() directly (Sean Christopherson)
---
arch/x86/include/asm/kvm_host.h | 10 ----------
arch/x86/kvm/vmx/tdx.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 6 +++---
3 files changed, 4 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ff07c45e3c73..9034222a96e8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2347,16 +2347,6 @@ static inline void kvm_load_ldt(u16 sel)
asm("lldt %0" : : "rm"(sel));
}
-#ifdef CONFIG_X86_64
-static inline unsigned long read_msr(unsigned long msr)
-{
- u64 value;
-
- rdmsrq(msr, value);
- return value;
-}
-#endif
-
static inline void kvm_inject_gp(struct kvm_vcpu *vcpu, u32 error_code)
{
kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 5df9d32d2058..d9e371e39853 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -801,7 +801,7 @@ void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
if (likely(is_64bit_mm(current->mm)))
vt->msr_host_kernel_gs_base = current->thread.gsbase;
else
- vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
+ rdmsrq(MSR_KERNEL_GS_BASE, vt->msr_host_kernel_gs_base);
vt->guest_state_loaded = true;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 967b58a8ab9d..3799cbbb4577 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1403,8 +1403,8 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
} else {
savesegment(fs, fs_sel);
savesegment(gs, gs_sel);
- fs_base = read_msr(MSR_FS_BASE);
- vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
+ rdmsrq(MSR_FS_BASE, fs_base);
+ rdmsrq(MSR_KERNEL_GS_BASE, vt->msr_host_kernel_gs_base);
}
wrmsrq(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
@@ -1463,7 +1463,7 @@ static u64 vmx_read_guest_host_msr(struct vcpu_vmx *vmx, u32 msr, u64 *cache)
{
preempt_disable();
if (vmx->vt.guest_state_loaded)
- *cache = read_msr(msr);
+ rdmsrq(msr, *cache);
preempt_enable();
return *cache;
}
--
2.53.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v3 04/16] KVM: x86: Remove the KVM private read_msr() function
2026-02-18 8:21 ` [PATCH v3 04/16] KVM: x86: Remove the KVM private read_msr() function Juergen Gross
@ 2026-02-18 14:21 ` Edgecombe, Rick P
2026-02-18 14:29 ` Sean Christopherson
0 siblings, 1 reply; 20+ messages in thread
From: Edgecombe, Rick P @ 2026-02-18 14:21 UTC (permalink / raw)
To: kvm@vger.kernel.org, jgross@suse.com, linux-coco@lists.linux.dev,
linux-kernel@vger.kernel.org, x86@kernel.org
Cc: seanjc@google.com, bp@alien8.de, kas@kernel.org, hpa@zytor.com,
mingo@redhat.com, dave.hansen@linux.intel.com, tglx@kernel.org,
pbonzini@redhat.com
On Wed, 2026-02-18 at 09:21 +0100, Juergen Gross wrote:
> Instead of having a KVM private read_msr() function, just use
> rdmsrq().
Might be nice to include a little bit more on the "why", but the patch
is pretty simple.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 04/16] KVM: x86: Remove the KVM private read_msr() function
2026-02-18 14:21 ` Edgecombe, Rick P
@ 2026-02-18 14:29 ` Sean Christopherson
0 siblings, 0 replies; 20+ messages in thread
From: Sean Christopherson @ 2026-02-18 14:29 UTC (permalink / raw)
To: Rick P Edgecombe
Cc: kvm@vger.kernel.org, jgross@suse.com, linux-coco@lists.linux.dev,
linux-kernel@vger.kernel.org, x86@kernel.org, bp@alien8.de,
kas@kernel.org, hpa@zytor.com, mingo@redhat.com,
dave.hansen@linux.intel.com, tglx@kernel.org, pbonzini@redhat.com
On Wed, Feb 18, 2026, Rick P Edgecombe wrote:
> On Wed, 2026-02-18 at 09:21 +0100, Juergen Gross wrote:
> > Instead of having a KVM private read_msr() function, just use
> > rdmsrq().
>
> Might be nice to include a little bit more on the "why", but the patch
> is pretty simple.
Eh, the why is basically "KVM is old and crusty". I'm a-ok without a history
lesson on how we got here :-)
> > Signed-off-by: Juergen Gross <jgross@suse.com>
> > Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
>
> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Acked-by: Sean Christopherson <seanjc@google.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v3 05/16] x86/msr: Minimize usage of native_*() msr access functions
2026-02-18 8:21 [PATCH v3 00/16] x86/msr: Inline rdmsr/wrmsr instructions Juergen Gross
2026-02-18 8:21 ` [PATCH v3 02/16] coco/tdx: Rename MSR access helpers Juergen Gross
2026-02-18 8:21 ` [PATCH v3 04/16] KVM: x86: Remove the KVM private read_msr() function Juergen Gross
@ 2026-02-18 8:21 ` Juergen Gross
2026-02-18 8:21 ` [PATCH v3 09/16] x86/msr: Use the alternatives mechanism for WRMSR Juergen Gross
2026-02-18 20:37 ` [PATCH v3 00/16] x86/msr: Inline rdmsr/wrmsr instructions H. Peter Anvin
4 siblings, 0 replies; 20+ messages in thread
From: Juergen Gross @ 2026-02-18 8:21 UTC (permalink / raw)
To: linux-kernel, x86, linux-hyperv, kvm
Cc: Juergen Gross, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, Paolo Bonzini,
Vitaly Kuznetsov, Sean Christopherson, Boris Ostrovsky, xen-devel
In order to prepare for some MSR access function reorg work, switch
most users of native_{read|write}_msr[_safe]() to the more generic
rdmsr*()/wrmsr*() variants.
For now this will have some performance impact when paravirtualization
is configured but the kernel is running on bare metal, but this is a
prerequisite for the planned direct inlining of the rdmsr/wrmsr
instructions in this configuration.
The main reason for this switch is the planned move of the MSR trace
function invocation from the native_*() functions to the generic
rdmsr*()/wrmsr*() variants. Without this switch the users of the
native_*() functions would lose the related tracing entries.
Note that the Xen related MSR access functions will not be switched,
as these will be handled after the move of the trace hooks.
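The tracing concern can be pictured with a small userspace model (all names here are illustrative mocks, not the kernel's actual helpers): once the trace hook lives only in the generic accessor, any caller still using the bare native accessor silently bypasses tracing.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t fake_msr_value = 0x42;
static int trace_hits;	/* how often the trace hook fired */

/* Bare accessor: no tracing, modeling native_read_msr() after the rework. */
static inline uint64_t mock_native_rdmsrq(uint32_t msr)
{
	(void)msr;
	return fake_msr_value;
}

/* Generic accessor: the planned home of the trace hook. */
static inline uint64_t mock_rdmsrq(uint32_t msr)
{
	uint64_t val = mock_native_rdmsrq(msr);

	trace_hits++;	/* stands in for do_trace_read_msr() */
	return val;
}
```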
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
---
arch/x86/hyperv/ivm.c | 2 +-
arch/x86/kernel/cpu/mshyperv.c | 7 +++++--
arch/x86/kernel/kvmclock.c | 2 +-
arch/x86/kvm/svm/svm.c | 16 ++++++++--------
arch/x86/xen/pmu.c | 4 ++--
5 files changed, 17 insertions(+), 14 deletions(-)
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 651771534cae..1b2222036a0b 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -327,7 +327,7 @@ int hv_snp_boot_ap(u32 apic_id, unsigned long start_ip, unsigned int cpu)
asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
hv_populate_vmcb_seg(vmsa->ds, vmsa->gdtr.base);
- vmsa->efer = native_read_msr(MSR_EFER);
+ rdmsrq(MSR_EFER, vmsa->efer);
vmsa->cr4 = native_read_cr4();
vmsa->cr3 = __native_read_cr3();
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 579fb2c64cfd..9bebb1a1ebee 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -111,9 +111,12 @@ void hv_para_set_sint_proxy(bool enable)
*/
u64 hv_para_get_synic_register(unsigned int reg)
{
+ u64 val;
+
if (WARN_ON(!ms_hyperv.paravisor_present || !hv_is_synic_msr(reg)))
return ~0ULL;
- return native_read_msr(reg);
+ rdmsrq(reg, val);
+ return val;
}
/*
@@ -123,7 +126,7 @@ void hv_para_set_synic_register(unsigned int reg, u64 val)
{
if (WARN_ON(!ms_hyperv.paravisor_present || !hv_is_synic_msr(reg)))
return;
- native_write_msr(reg, val);
+ wrmsrq(reg, val);
}
u64 hv_get_msr(unsigned int reg)
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index b5991d53fc0e..1002bdd45c0f 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -197,7 +197,7 @@ static void kvm_setup_secondary_clock(void)
void kvmclock_disable(void)
{
if (msr_kvm_system_time)
- native_write_msr(msr_kvm_system_time, 0);
+ wrmsrq(msr_kvm_system_time, 0);
}
static void __init kvmclock_init_mem(void)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8f8bc863e214..1c0e7cae9e49 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -389,12 +389,12 @@ static void svm_init_erratum_383(void)
return;
/* Use _safe variants to not break nested virtualization */
- if (native_read_msr_safe(MSR_AMD64_DC_CFG, &val))
+ if (rdmsrq_safe(MSR_AMD64_DC_CFG, &val))
return;
val |= (1ULL << 47);
- native_write_msr_safe(MSR_AMD64_DC_CFG, val);
+ wrmsrq_safe(MSR_AMD64_DC_CFG, val);
erratum_383_found = true;
}
@@ -554,9 +554,9 @@ static int svm_enable_virtualization_cpu(void)
u64 len, status = 0;
int err;
- err = native_read_msr_safe(MSR_AMD64_OSVW_ID_LENGTH, &len);
+ err = rdmsrq_safe(MSR_AMD64_OSVW_ID_LENGTH, &len);
if (!err)
- err = native_read_msr_safe(MSR_AMD64_OSVW_STATUS, &status);
+ err = rdmsrq_safe(MSR_AMD64_OSVW_STATUS, &status);
if (err)
osvw_status = osvw_len = 0;
@@ -2029,7 +2029,7 @@ static bool is_erratum_383(void)
if (!erratum_383_found)
return false;
- if (native_read_msr_safe(MSR_IA32_MC0_STATUS, &value))
+ if (rdmsrq_safe(MSR_IA32_MC0_STATUS, &value))
return false;
/* Bit 62 may or may not be set for this mce */
@@ -2040,11 +2040,11 @@ static bool is_erratum_383(void)
/* Clear MCi_STATUS registers */
for (i = 0; i < 6; ++i)
- native_write_msr_safe(MSR_IA32_MCx_STATUS(i), 0);
+ wrmsrq_safe(MSR_IA32_MCx_STATUS(i), 0);
- if (!native_read_msr_safe(MSR_IA32_MCG_STATUS, &value)) {
+ if (!rdmsrq_safe(MSR_IA32_MCG_STATUS, &value)) {
value &= ~(1ULL << 2);
- native_write_msr_safe(MSR_IA32_MCG_STATUS, value);
+ wrmsrq_safe(MSR_IA32_MCG_STATUS, value);
}
/* Flush tlb to evict multi-match entries */
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index 8f89ce0b67e3..d49a3bdc448b 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -323,7 +323,7 @@ static u64 xen_amd_read_pmc(int counter)
u64 val;
msr = amd_counters_base + (counter * amd_msr_step);
- native_read_msr_safe(msr, &val);
+ rdmsrq_safe(msr, &val);
return val;
}
@@ -349,7 +349,7 @@ static u64 xen_intel_read_pmc(int counter)
else
msr = MSR_IA32_PERFCTR0 + counter;
- native_read_msr_safe(msr, &val);
+ rdmsrq_safe(msr, &val);
return val;
}
--
2.53.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v3 09/16] x86/msr: Use the alternatives mechanism for WRMSR
2026-02-18 8:21 [PATCH v3 00/16] x86/msr: Inline rdmsr/wrmsr instructions Juergen Gross
` (2 preceding siblings ...)
2026-02-18 8:21 ` [PATCH v3 05/16] x86/msr: Minimize usage of native_*() msr access functions Juergen Gross
@ 2026-02-18 8:21 ` Juergen Gross
2026-02-18 21:00 ` Sean Christopherson
2026-02-18 20:37 ` [PATCH v3 00/16] x86/msr: Inline rdmsr/wrmsr instructions H. Peter Anvin
4 siblings, 1 reply; 20+ messages in thread
From: Juergen Gross @ 2026-02-18 8:21 UTC (permalink / raw)
To: linux-kernel, x86, kvm, llvm
Cc: Juergen Gross, Xin Li, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Sean Christopherson,
Paolo Bonzini, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
Justin Stitt
When available use one of the non-serializing WRMSR variants (WRMSRNS
with or without an immediate operand specifying the MSR register) in
__wrmsrq().
For the safe/unsafe variants make __wrmsrq() to be a common base
function instead of duplicating the ALTERNATIVE*() macros. This
requires to let native_wrmsr() use native_wrmsrq() instead of
__wrmsrq(). While changing this, convert native_wrmsr() into an inline
function.
Replace the only call of wsrmsrns() with the now equivalent call to
native_wrmsrq() and remove wsrmsrns().
The paravirt case will be handled later.
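As a rough userspace analogy for what the ALTERNATIVE() patching achieves (the kernel rewrites the instruction bytes in place at boot; a function pointer is just the closest portable stand-in, and every name below is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t fake_msr;
static int serializing_writes;	/* counts uses of the slow path */

static void write_serializing(uint32_t msr, uint64_t val)
{
	(void)msr;
	fake_msr = val;
	serializing_writes++;	/* models the costly serializing WRMSR */
}

static void write_non_serializing(uint32_t msr, uint64_t val)
{
	(void)msr;
	fake_msr = val;		/* models WRMSRNS: same effect, no fence */
}

/* Default, like the ALTERNATIVE() fallback path. */
static void (*wrmsr_alt)(uint32_t, uint64_t) = write_serializing;

/* "Apply alternatives": done once, when the CPU feature is known. */
static void apply_alternatives(int has_wrmsrns)
{
	if (has_wrmsrns)
		wrmsr_alt = write_non_serializing;
}
```

Once apply_alternatives() has run, every write site takes the non-serializing path with no per-call branch, mirroring how X86_FEATURE_WRMSRNS selects the WRMSRNS encoding.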
Originally-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- new patch, partially taken from "[RFC PATCH v2 21/34] x86/msr: Utilize
the alternatives mechanism to write MSR" by Xin Li.
---
arch/x86/include/asm/fred.h | 2 +-
arch/x86/include/asm/msr.h | 144 +++++++++++++++++++++++++++---------
arch/x86/kvm/vmx/vmx.c | 2 +-
3 files changed, 111 insertions(+), 37 deletions(-)
diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index 2bb65677c079..71fc0c6e4e32 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -101,7 +101,7 @@ static __always_inline void fred_update_rsp0(void)
unsigned long rsp0 = (unsigned long) task_stack_page(current) + THREAD_SIZE;
if (cpu_feature_enabled(X86_FEATURE_FRED) && (__this_cpu_read(fred_rsp0) != rsp0)) {
- wrmsrns(MSR_IA32_FRED_RSP0, rsp0);
+ native_wrmsrq(MSR_IA32_FRED_RSP0, rsp0);
__this_cpu_write(fred_rsp0, rsp0);
}
}
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 71f41af11591..ba11c3375cbd 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -7,11 +7,11 @@
#ifndef __ASSEMBLER__
#include <asm/asm.h>
-#include <asm/errno.h>
#include <asm/cpumask.h>
#include <uapi/asm/msr.h>
#include <asm/shared/msr.h>
+#include <linux/errno.h>
#include <linux/types.h>
#include <linux/percpu.h>
@@ -56,6 +56,36 @@ static inline void do_trace_read_msr(u32 msr, u64 val, int failed) {}
static inline void do_trace_rdpmc(u32 msr, u64 val, int failed) {}
#endif
+/* The GNU Assembler (Gas) with Binutils 2.40 adds WRMSRNS support */
+#if defined(CONFIG_AS_IS_GNU) && CONFIG_AS_VERSION >= 24000
+#define ASM_WRMSRNS "wrmsrns\n\t"
+#else
+#define ASM_WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
+#endif
+
+/* The GNU Assembler (Gas) with Binutils 2.41 adds the .insn directive support */
+#if defined(CONFIG_AS_IS_GNU) && CONFIG_AS_VERSION >= 24100
+#define ASM_WRMSRNS_IMM \
+ " .insn VEX.128.F3.M7.W0 0xf6 /0, %[val], %[msr]%{:u32}\n\t"
+#else
+/*
+ * Note, clang also doesn't support the .insn directive.
+ *
+ * The register operand is encoded as %rax because all uses of the immediate
+ * form MSR access instructions reference %rax as the register operand.
+ */
+#define ASM_WRMSRNS_IMM \
+ " .byte 0xc4,0xe7,0x7a,0xf6,0xc0; .long %c[msr]"
+#endif
+
+#define PREPARE_RDX_FOR_WRMSR \
+ "mov %%rax, %%rdx\n\t" \
+ "shr $0x20, %%rdx\n\t"
+
+#define PREPARE_RCX_RDX_FOR_WRMSR \
+ "mov %[msr], %%ecx\n\t" \
+ PREPARE_RDX_FOR_WRMSR
+
/*
* __rdmsr() and __wrmsr() are the two primitives which are the bare minimum MSR
* accessors and should not have any tracing or other functionality piggybacking
@@ -75,12 +105,76 @@ static __always_inline u64 __rdmsr(u32 msr)
return EAX_EDX_VAL(val, low, high);
}
-static __always_inline void __wrmsrq(u32 msr, u64 val)
+static __always_inline bool __wrmsrq_variable(u32 msr, u64 val, int type)
{
- asm volatile("1: wrmsr\n"
- "2:\n"
- _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
- : : "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32)) : "memory");
+#ifdef CONFIG_X86_64
+ BUILD_BUG_ON(__builtin_constant_p(msr));
+#endif
+
+ /*
+ * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a redundant
+ * DS prefix to avoid a trailing NOP.
+ */
+ asm_inline volatile goto(
+ "1:\n"
+ ALTERNATIVE("ds wrmsr",
+ ASM_WRMSRNS,
+ X86_FEATURE_WRMSRNS)
+ _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type])
+
+ :
+ : "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32)), [type] "i" (type)
+ : "memory"
+ : badmsr);
+
+ return false;
+
+badmsr:
+ return true;
+}
+
+#ifdef CONFIG_X86_64
+/*
+ * Non-serializing WRMSR or its immediate form, when available.
+ *
+ * Otherwise, it falls back to a serializing WRMSR.
+ */
+static __always_inline bool __wrmsrq_constant(u32 msr, u64 val, int type)
+{
+ BUILD_BUG_ON(!__builtin_constant_p(msr));
+
+ asm_inline volatile goto(
+ "1:\n"
+ ALTERNATIVE_2(PREPARE_RCX_RDX_FOR_WRMSR
+ "2: ds wrmsr",
+ PREPARE_RCX_RDX_FOR_WRMSR
+ ASM_WRMSRNS,
+ X86_FEATURE_WRMSRNS,
+ ASM_WRMSRNS_IMM,
+ X86_FEATURE_MSR_IMM)
+ _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For WRMSRNS immediate */
+ _ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type]) /* For WRMSR(NS) */
+
+ :
+ : [val] "a" (val), [msr] "i" (msr), [type] "i" (type)
+ : "memory", "ecx", "rdx"
+ : badmsr);
+
+ return false;
+
+badmsr:
+ return true;
+}
+#endif
+
+static __always_inline bool __wrmsrq(u32 msr, u64 val, int type)
+{
+#ifdef CONFIG_X86_64
+ if (__builtin_constant_p(msr))
+ return __wrmsrq_constant(msr, val, type);
+#endif
+
+ return __wrmsrq_variable(msr, val, type);
}
#define native_rdmsr(msr, val1, val2) \
@@ -95,11 +189,15 @@ static __always_inline u64 native_rdmsrq(u32 msr)
return __rdmsr(msr);
}
-#define native_wrmsr(msr, low, high) \
- __wrmsrq((msr), (u64)(high) << 32 | (low))
+static __always_inline void native_wrmsrq(u32 msr, u64 val)
+{
+ __wrmsrq(msr, val, EX_TYPE_WRMSR);
+}
-#define native_wrmsrq(msr, val) \
- __wrmsrq((msr), (val))
+static __always_inline void native_wrmsr(u32 msr, u32 low, u32 high)
+{
+ native_wrmsrq(msr, (u64)high << 32 | low);
+}
static inline u64 native_read_msr(u32 msr)
{
@@ -131,15 +229,7 @@ static inline void notrace native_write_msr(u32 msr, u64 val)
/* Can be uninlined because referenced by paravirt */
static inline int notrace native_write_msr_safe(u32 msr, u64 val)
{
- int err;
-
- asm volatile("1: wrmsr ; xor %[err],%[err]\n"
- "2:\n\t"
- _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_WRMSR_SAFE, %[err])
- : [err] "=a" (err)
- : "c" (msr), "0" ((u32)val), "d" ((u32)(val >> 32))
- : "memory");
- return err;
+ return __wrmsrq(msr, val, EX_TYPE_WRMSR_SAFE) ? -EIO : 0;
}
extern int rdmsr_safe_regs(u32 regs[8]);
@@ -158,7 +248,6 @@ static inline u64 native_read_pmc(int counter)
#ifdef CONFIG_PARAVIRT_XXL
#include <asm/paravirt.h>
#else
-#include <linux/errno.h>
static __always_inline u64 read_msr(u32 msr)
{
return native_read_msr(msr);
@@ -250,21 +339,6 @@ static inline int wrmsrq_safe(u32 msr, u64 val)
return err;
}
-/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
-#define ASM_WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
-
-/* Non-serializing WRMSR, when available. Falls back to a serializing WRMSR. */
-static __always_inline void wrmsrns(u32 msr, u64 val)
-{
- /*
- * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a redundant
- * DS prefix to avoid a trailing NOP.
- */
- asm volatile("1: " ALTERNATIVE("ds wrmsr", ASM_WRMSRNS, X86_FEATURE_WRMSRNS)
- "2: " _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
- : : "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32)));
-}
-
static inline void wrmsr(u32 msr, u32 low, u32 high)
{
wrmsrq(msr, (u64)high << 32 | low);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3799cbbb4577..e29a2ac24669 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1473,7 +1473,7 @@ static void vmx_write_guest_host_msr(struct vcpu_vmx *vmx, u32 msr, u64 data,
{
preempt_disable();
if (vmx->vt.guest_state_loaded)
- wrmsrns(msr, data);
+ native_wrmsrq(msr, data);
preempt_enable();
*cache = data;
}
--
2.53.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v3 09/16] x86/msr: Use the alternatives mechanism for WRMSR
2026-02-18 8:21 ` [PATCH v3 09/16] x86/msr: Use the alternatives mechanism for WRMSR Juergen Gross
@ 2026-02-18 21:00 ` Sean Christopherson
2026-02-18 21:37 ` Dave Hansen
0 siblings, 1 reply; 20+ messages in thread
From: Sean Christopherson @ 2026-02-18 21:00 UTC (permalink / raw)
To: Juergen Gross
Cc: linux-kernel, x86, kvm, llvm, Xin Li, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
Paolo Bonzini, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
Justin Stitt
On Wed, Feb 18, 2026, Juergen Gross wrote:
> When available use one of the non-serializing WRMSR variants (WRMSRNS
> with or without an immediate operand specifying the MSR register) in
> __wrmsrq().
Silently using a non-serializing version (or not) seems dangerous (not for KVM,
but for the kernel at-large), unless the rule is going to be that MSR writes need
to be treated as non-serializing by default. Which I'm fine with, but if we go
that route, then I'd prefer not to special case non-serializing callers.
E.g. in the KVM code, I find the use of wrmsrns() intuitive, because KVM doesn't
need the WRMSR to be serializing and so can eke out a bit of extra performance by
using wrmsrns() instead of wrmsrq(). But with native_wrmsrq(), it's not clear
why _this_ particular WRMSR in KVM needs to use the "native" version.
There are a pile of other WRMSRs in KVM that are in hot paths, especially with
the mediated PMU support. If we're going to make the default version non-serializing,
then I'd prefer to get that via wrmsrq(), i.e. reap the benefits for all of KVM,
not just one arbitrary path.
> For the safe/unsafe variants make __wrmsrq() to be a common base
> function instead of duplicating the ALTERNATIVE*() macros. This
> requires to let native_wrmsr() use native_wrmsrq() instead of
> __wrmsrq(). While changing this, convert native_wrmsr() into an inline
> function.
>
> Replace the only call of wrmsrns() with the now equivalent call to
> native_wrmsrq() and remove wrmsrns()
...
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 3799cbbb4577..e29a2ac24669 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1473,7 +1473,7 @@ static void vmx_write_guest_host_msr(struct vcpu_vmx *vmx, u32 msr, u64 data,
> {
> preempt_disable();
> if (vmx->vt.guest_state_loaded)
> - wrmsrns(msr, data);
> + native_wrmsrq(msr, data);
> preempt_enable();
> *cache = data;
> }
> --
> 2.53.0
>
^ permalink raw reply [flat|nested] 20+ messages in thread* Re: [PATCH v3 09/16] x86/msr: Use the alternatives mechanism for WRMSR
2026-02-18 21:00 ` Sean Christopherson
@ 2026-02-18 21:37 ` Dave Hansen
2026-02-18 23:36 ` H. Peter Anvin
2026-02-19 6:44 ` Jürgen Groß
0 siblings, 2 replies; 20+ messages in thread
From: Dave Hansen @ 2026-02-18 21:37 UTC (permalink / raw)
To: Sean Christopherson, Juergen Gross
Cc: linux-kernel, x86, kvm, llvm, Xin Li, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
Paolo Bonzini, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
Justin Stitt
On 2/18/26 13:00, Sean Christopherson wrote:
> On Wed, Feb 18, 2026, Juergen Gross wrote:
>> When available use one of the non-serializing WRMSR variants (WRMSRNS
>> with or without an immediate operand specifying the MSR register) in
>> __wrmsrq().
> Silently using a non-serializing version (or not) seems dangerous (not for KVM,
> but for the kernel at-large), unless the rule is going to be that MSR writes need
> to be treated as non-serializing by default.
Yeah, there's no way we can do this in general. It'll work for 99% of
the MSRs on 99% of the systems for a long time. Then the one new system
with WRMSRNS is going to have one hell of a heisenbug that'll take years
off some poor schmuck's life.
We should really encourage *new* code to use wrmsrns() when it can at
least for annotation that it doesn't need serialization. But I don't
think we should do anything to old, working code.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 09/16] x86/msr: Use the alternatives mechanism for WRMSR
2026-02-18 21:37 ` Dave Hansen
@ 2026-02-18 23:36 ` H. Peter Anvin
2026-02-19 6:41 ` Jürgen Groß
2026-02-19 6:44 ` Jürgen Groß
1 sibling, 1 reply; 20+ messages in thread
From: H. Peter Anvin @ 2026-02-18 23:36 UTC (permalink / raw)
To: Dave Hansen, Sean Christopherson, Juergen Gross
Cc: linux-kernel, x86, kvm, llvm, Xin Li, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Paolo Bonzini,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt
On February 18, 2026 1:37:42 PM PST, Dave Hansen <dave.hansen@intel.com> wrote:
>On 2/18/26 13:00, Sean Christopherson wrote:
>> On Wed, Feb 18, 2026, Juergen Gross wrote:
>>> When available use one of the non-serializing WRMSR variants (WRMSRNS
>>> with or without an immediate operand specifying the MSR register) in
>>> __wrmsrq().
>> Silently using a non-serializing version (or not) seems dangerous (not for KVM,
>> but for the kernel at-large), unless the rule is going to be that MSR writes need
>> to be treated as non-serializing by default.
>
>Yeah, there's no way we can do this in general. It'll work for 99% of
>the MSRs on 99% of the systems for a long time. Then the one new system
>with WRMSRNS is going to have one hell of a heisenbug that'll take years
>off some poor schmuck's life.
>
>We should really encourage *new* code to use wrmsrns() when it can at
>least for annotation that it doesn't need serialization. But I don't
>think we should do anything to old, working code.
Correct. We need to do this on a user by user basis.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 09/16] x86/msr: Use the alternatives mechanism for WRMSR
2026-02-18 23:36 ` H. Peter Anvin
@ 2026-02-19 6:41 ` Jürgen Groß
0 siblings, 0 replies; 20+ messages in thread
From: Jürgen Groß @ 2026-02-19 6:41 UTC (permalink / raw)
To: H. Peter Anvin, Dave Hansen, Sean Christopherson
Cc: linux-kernel, x86, kvm, llvm, Xin Li, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Paolo Bonzini,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt
[-- Attachment #1.1.1: Type: text/plain, Size: 1455 bytes --]
On 19.02.26 00:36, H. Peter Anvin wrote:
> On February 18, 2026 1:37:42 PM PST, Dave Hansen <dave.hansen@intel.com> wrote:
>> On 2/18/26 13:00, Sean Christopherson wrote:
>>> On Wed, Feb 18, 2026, Juergen Gross wrote:
>>>> When available use one of the non-serializing WRMSR variants (WRMSRNS
>>>> with or without an immediate operand specifying the MSR register) in
>>>> __wrmsrq().
>>> Silently using a non-serializing version (or not) seems dangerous (not for KVM,
>>> but for the kernel at-large), unless the rule is going to be that MSR writes need
>>> to be treated as non-serializing by default.
>>
>> Yeah, there's no way we can do this in general. It'll work for 99% of
>> the MSRs on 99% of the systems for a long time. Then the one new system
>> with WRMSRNS is going to have one hell of a heisenbug that'll take years
>> off some poor schmuck's life.
>>
>> We should really encourage *new* code to use wrmsrns() when it can at
>> least for annotation that it doesn't need serialization. But I don't
>> think we should do anything to old, working code.
>
> Correct. We need to do this on a user by user basis.
Then I'd prefer to introduce a new wrmsr_sync() function for the serializing
variant and to switch to it all current users which are not known to tolerate
the non-serializing form. The main advantage of that approach would be the
ability to automatically use the immediate form where possible.
Juergen
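As a sketch of the API split proposed above (nothing here is merged kernel API; the names and semantics are hypothetical, mirroring the suggestion that plain writes default to the fast path while ordering-sensitive callers opt in explicitly):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t fake_msr;
static int fences;	/* counts explicit serialization points */

/* Default write: free to use a non-serializing instruction. */
static void mock_wrmsrq(uint32_t msr, uint64_t val)
{
	(void)msr;
	fake_msr = val;
}

/* Hypothetical wrmsr_sync(): same write plus a serialization guarantee. */
static void mock_wrmsr_sync(uint32_t msr, uint64_t val)
{
	mock_wrmsrq(msr, val);
	fences++;	/* models the serializing behavior callers opted into */
}
```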
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 09/16] x86/msr: Use the alternatives mechanism for WRMSR
2026-02-18 21:37 ` Dave Hansen
2026-02-18 23:36 ` H. Peter Anvin
@ 2026-02-19 6:44 ` Jürgen Groß
2026-02-20 17:12 ` Xin Li
1 sibling, 1 reply; 20+ messages in thread
From: Jürgen Groß @ 2026-02-19 6:44 UTC (permalink / raw)
To: Dave Hansen, Sean Christopherson
Cc: linux-kernel, x86, kvm, llvm, Xin Li, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
Paolo Bonzini, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
Justin Stitt
[-- Attachment #1.1.1: Type: text/plain, Size: 963 bytes --]
On 18.02.26 22:37, Dave Hansen wrote:
> On 2/18/26 13:00, Sean Christopherson wrote:
>> On Wed, Feb 18, 2026, Juergen Gross wrote:
>>> When available use one of the non-serializing WRMSR variants (WRMSRNS
>>> with or without an immediate operand specifying the MSR register) in
>>> __wrmsrq().
>> Silently using a non-serializing version (or not) seems dangerous (not for KVM,
>> but for the kernel at-large), unless the rule is going to be that MSR writes need
>> to be treated as non-serializing by default.
>
> Yeah, there's no way we can do this in general. It'll work for 99% of
> the MSRs on 99% of the systems for a long time. Then the one new system
> with WRMSRNS is going to have one hell of a heisenbug that'll take years
> off some poor schmuck's life.
I _really_ thought this was discussed upfront by Xin before he sent out his
first version of the series.
Sorry for not making it more clear in the header message.
Juergen
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 09/16] x86/msr: Use the alternatives mechanism for WRMSR
2026-02-19 6:44 ` Jürgen Groß
@ 2026-02-20 17:12 ` Xin Li
2026-02-20 17:32 ` Sean Christopherson
2026-02-20 17:40 ` Dave Hansen
0 siblings, 2 replies; 20+ messages in thread
From: Xin Li @ 2026-02-20 17:12 UTC (permalink / raw)
To: Jürgen Groß
Cc: Dave Hansen, Sean Christopherson, linux-kernel, x86, kvm, llvm,
H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, Paolo Bonzini, Nathan Chancellor, Nick Desaulniers,
Bill Wendling, Justin Stitt
> On Feb 18, 2026, at 10:44 PM, Jürgen Groß <jgross@suse.com> wrote:
>
> On 18.02.26 22:37, Dave Hansen wrote:
>> On 2/18/26 13:00, Sean Christopherson wrote:
>>> On Wed, Feb 18, 2026, Juergen Gross wrote:
>>>> When available use one of the non-serializing WRMSR variants (WRMSRNS
>>>> with or without an immediate operand specifying the MSR register) in
>>>> __wrmsrq().
>>> Silently using a non-serializing version (or not) seems dangerous (not for KVM,
>>> but for the kernel at-large), unless the rule is going to be that MSR writes need
>>> to be treated as non-serializing by default.
>> Yeah, there's no way we can do this in general. It'll work for 99% of
>> the MSRs on 99% of the systems for a long time. Then the one new system
>> with WRMSRNS is going to have one hell of a heisenbug that'll take years
>> off some poor schmuck's life.
>
> I _really_ thought this was discussed upfront by Xin before he sent out his
> first version of the series.
I actually reached out to the Intel architects about this before I started
coding. Turns out, if the CPU supports WRMSRNS, you can use it across the
board. The hardware is smart enough to perform a serialized write whenever
a non-serialized one isn't proper, so there’s no risk.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 09/16] x86/msr: Use the alternatives mechanism for WRMSR
2026-02-20 17:12 ` Xin Li
@ 2026-02-20 17:32 ` Sean Christopherson
2026-02-20 17:40 ` Dave Hansen
1 sibling, 0 replies; 20+ messages in thread
From: Sean Christopherson @ 2026-02-20 17:32 UTC (permalink / raw)
To: Xin Li
Cc: Jürgen Groß, Dave Hansen, linux-kernel, x86, kvm, llvm,
H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, Paolo Bonzini, Nathan Chancellor, Nick Desaulniers,
Bill Wendling, Justin Stitt
On Fri, Feb 20, 2026, Xin Li wrote:
>
> > On Feb 18, 2026, at 10:44 PM, Jürgen Groß <jgross@suse.com> wrote:
> >
> > On 18.02.26 22:37, Dave Hansen wrote:
> >> On 2/18/26 13:00, Sean Christopherson wrote:
> >>> On Wed, Feb 18, 2026, Juergen Gross wrote:
> >>>> When available use one of the non-serializing WRMSR variants (WRMSRNS
> >>>> with or without an immediate operand specifying the MSR register) in
> >>>> __wrmsrq().
> >>> Silently using a non-serializing version (or not) seems dangerous (not for KVM,
> >>> but for the kernel at-large), unless the rule is going to be that MSR writes need
> >>> to be treated as non-serializing by default.
> >> Yeah, there's no way we can do this in general. It'll work for 99% of
> >> the MSRs on 99% of the systems for a long time. Then the one new system
> >> with WRMSRNS is going to have one hell of a heisenbug that'll take years
> >> off some poor schmuck's life.
> >
> > I _really_ thought this was discussed upfront by Xin before he sent out his
> > first version of the series.
>
> I actually reached out to the Intel architects about this before I started
> coding. Turns out, if the CPU supports WRMSRNS, you can use it across the
> board. The hardware is smart enough to perform a serialized write whenever
> a non-serialized one isn't proper, so there’s no risk.
How can hardware possibly know what's "proper"? E.g. I don't see how hardware
can reason about safety if there's a software sequence that is subtly relying on
the serialization of WRMSR to provide some form of ordering.
And if that's the _architectural_ behavior, then what's the point of WRMSRNS?
If it's not architectural, then I don't see how the kernel can rely on it.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 09/16] x86/msr: Use the alternatives mechanism for WRMSR
2026-02-20 17:12 ` Xin Li
2026-02-20 17:32 ` Sean Christopherson
@ 2026-02-20 17:40 ` Dave Hansen
2026-02-23 17:56 ` Xin Li
1 sibling, 1 reply; 20+ messages in thread
From: Dave Hansen @ 2026-02-20 17:40 UTC (permalink / raw)
To: Xin Li, Jürgen Groß
Cc: Sean Christopherson, linux-kernel, x86, kvm, llvm, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
Paolo Bonzini, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
Justin Stitt
On 2/20/26 09:12, Xin Li wrote:
>> I _really_ thought this was discussed upfront by Xin before he sent out his
>> first version of the series.
> I actually reached out to the Intel architects about this before I started
> coding. Turns out, if the CPU supports WRMSRNS, you can use it across the
> board. The hardware is smart enough to perform a serialized write whenever
> a non-serialized one isn't proper, so there’s no risk.
Could we be a little more specific here, please?
If it was universally safe to s/WRMSR/WRMSRNS/, then there wouldn't have
been a need for WRMSRNS in the ISA.
Even the WRMSRNS description in the SDM talks about some caveats with
"performance-monitor events" MSRs. That sounds like it contradicts the
idea that the "hardware is smart enough" universally to tolerate using
WRMSRNS *EVERYWHERE*.
It also says:
Like WRMSR, WRMSRNS will ensure that all operations before it do
not use the new MSR value and that all operations after the
WRMSRNS do use the new value.
Which is a handy guarantee for sure. But, it's far short of a fully
serializing instruction.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 09/16] x86/msr: Use the alternatives mechanism for WRMSR
2026-02-20 17:40 ` Dave Hansen
@ 2026-02-23 17:56 ` Xin Li
2026-02-23 21:34 ` H. Peter Anvin
0 siblings, 1 reply; 20+ messages in thread
From: Xin Li @ 2026-02-23 17:56 UTC (permalink / raw)
To: Dave Hansen
Cc: Jürgen Groß, Sean Christopherson, linux-kernel, x86,
kvm, llvm, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, Paolo Bonzini, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt
>>> I _really_ thought this was discussed upfront by Xin before he sent out his
>>> first version of the series.
>> I actually reached out to the Intel architects about this before I started
>> coding. Turns out, if the CPU supports WRMSRNS, you can use it across the
>> board. The hardware is smart enough to perform a serialized write whenever
>> a non-serialized one isn't proper, so there’s no risk.
>
> Could we be a little more specific here, please?
Sorry as I’m no longer with Intel, I don’t have access to those emails.
Got to mention, also to reply to Sean’s challenge, as usual I didn’t get
detailed explanation about how would hardware implement WRMSRNS,
except it falls back to do a serialized write when it’s not *proper*.
>
> If it was universally safe to s/WRMSR/WRMSRNS/, then there wouldn't have
> been a need for WRMSRNS in the ISA.
>
> Even the WRMSRNS description in the SDM talks about some caveats with
> "performance-monitor events" MSRs. That sounds like it contradicts the
> idea that the "hardware is smart enough" universally to tolerate using
> WRMSRNS *EVERYWHERE*.
>
> It also says:
>
> Like WRMSR, WRMSRNS will ensure that all operations before it do
> not use the new MSR value and that all operations after the
> WRMSRNS do use the new value.
>
> Which is a handy guarantee for sure. But, it's far short of a fully
> serializing instruction.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 09/16] x86/msr: Use the alternatives mechanism for WRMSR
2026-02-23 17:56 ` Xin Li
@ 2026-02-23 21:34 ` H. Peter Anvin
0 siblings, 0 replies; 20+ messages in thread
From: H. Peter Anvin @ 2026-02-23 21:34 UTC (permalink / raw)
To: Xin Li, Dave Hansen
Cc: Jürgen Groß, Sean Christopherson, linux-kernel, x86,
kvm, llvm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, Paolo Bonzini, Nathan Chancellor, Nick Desaulniers,
Bill Wendling, Justin Stitt
On February 23, 2026 9:56:28 AM PST, Xin Li <xin@zytor.com> wrote:
>
>>>> I _really_ thought this was discussed upfront by Xin before he sent out his
>>>> first version of the series.
>>> I actually reached out to the Intel architects about this before I started
>>> coding. Turns out, if the CPU supports WRMSRNS, you can use it across the
>>> board. The hardware is smart enough to perform a serialized write whenever
>>> a non-serialized one isn't proper, so there’s no risk.
>>
>> Could we be a little more specific here, please?
>
>Sorry as I’m no longer with Intel, I don’t have access to those emails.
>
>Got to mention, also to reply to Sean’s challenge, as usual I didn’t get
>detailed explanation about how would hardware implement WRMSRNS,
>except it falls back to do a serialized write when it’s not *proper*.
>
>
>>
>> If it was universally safe to s/WRMSR/WRMSRNS/, then there wouldn't have
>> been a need for WRMSRNS in the ISA.
>>
>> Even the WRMSRNS description in the SDM talks about some caveats with
>> "performance-monitor events" MSRs. That sounds like it contradicts the
>> idea that the "hardware is smart enough" universally to tolerate using
>> WRMSRNS *EVERYWHERE*.
>>
>> It also says:
>>
>> Like WRMSR, WRMSRNS will ensure that all operations before it do
>> not use the new MSR value and that all operations after the
>> WRMSRNS do use the new value.
>>
>> Which is a handy guarantee for sure. But, it's far short of a fully
>> serializing instruction.
>
>
So to get a little bit of clarity here as to the architectural contract as opposed to the current implementations:
1. WRMSRNS is indeed intended as an opt-in, as opposed to declaring random registers non-serializing a posteriori by sheer necessity, in technical violation of the ISA.
We should not blindly replace all WRMSRs with WRMSRNS. We should, however, make wrmsrns() fall back to WRMSR on hardware which does not support it, so we can unconditionally replace it at call sites. Many, probably most, would be possible to replace, but for those that make no difference performance-wise there is really no reason to worry about the testing.
It is also quite likely we will find cases where we need *one* serialization after writing to a whole group of MSRs. In that case, we may want to add a sync_cpu_after_wrmsrns() [or something like that] which does a sync_cpu() if and only if WRMSRNS is supported.
I don't know if there will ever be any CPUs which support WRMSRNS but not SERIALIZE, so it might be entirely reasonable to have WRMSRNS depend on SERIALIZE and not bother with the IRET fallback variation.
2. WRMSRNS *may* perform a fully serializing write if the hardware implementation does not support a faster write method for a certain MSR. This is particularly likely for MSRs that have system-wide consequences, but it is also a legitimate option for the hardware implementation for MSRs that are not expected to have any kind of performance impact (full serialization is a very easy way to ensure full consistency and so reduces implementation and verification burden.)
3. All registers, including MSRs, in x86 are subject to scoreboarding, meaning that so-called "psychic effects" (a direct effect being observable before the cause) or use of stale resources are never permitted. This does *not* imply that events cannot be observed out of order, and cross-CPU visibility has its own rules, but that is not relevant for most registers.
4. WRMSRNS immediate can be reasonably expected to be significantly faster than even WRMSRNS ecx (at least for MSRs deemed valuable to optimize), because the MSR number is available to the hardware at the very beginning of the instruction pipeline. To take proper advantage of that, it is desirable to avoid calling wrmsrns() with a non-constant value in code paths where performance matters, even if it bloats the code somewhat. The main case which I can think about that might actually matter is context-switching with perf enabled (also a good example for wanting to SERIALIZE or at least MFENCE or LFENCE after the batch write if they will have effects before returning to user space.) There is also of course the option of dynamically generating a code snippet if the list of MSRs is too dynamic.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 00/16] x86/msr: Inline rdmsr/wrmsr instructions
2026-02-18 8:21 [PATCH v3 00/16] x86/msr: Inline rdmsr/wrmsr instructions Juergen Gross
` (3 preceding siblings ...)
2026-02-18 8:21 ` [PATCH v3 09/16] x86/msr: Use the alternatives mechanism for WRMSR Juergen Gross
@ 2026-02-18 20:37 ` H. Peter Anvin
2026-02-19 6:28 ` Jürgen Groß
4 siblings, 1 reply; 20+ messages in thread
From: H. Peter Anvin @ 2026-02-18 20:37 UTC (permalink / raw)
To: Juergen Gross, linux-kernel, x86, linux-coco, kvm, linux-hyperv,
virtualization, llvm
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
Kiryl Shutsemau, Rick Edgecombe, Sean Christopherson,
Paolo Bonzini, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Vitaly Kuznetsov, Boris Ostrovsky, xen-devel,
Ajay Kaher, Alexey Makhalov, Broadcom internal kernel review list,
Andy Lutomirski, Peter Zijlstra, Xin Li, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt, Josh Poimboeuf,
andy.cooper
On February 18, 2026 12:21:17 AM PST, Juergen Gross <jgross@suse.com> wrote:
>When building a kernel with CONFIG_PARAVIRT_XXL the paravirt
>infrastructure will always use functions for reading or writing MSRs,
>even when running on bare metal.
>
>Switch to inline RDMSR/WRMSR instructions in this case, reducing the
>paravirt overhead.
>
>The first patch is a prerequisite fix for alternative patching. It is
>needed because the initial indirect call needs to be padded with NOPs
>in some cases by the following patches.
>
>In order to make this less intrusive, some further reorganization of
>the MSR access helpers is done in the patches 1-6.
>
>The next 4 patches are converting the non-paravirt case to use direct
>inlining of the MSR access instructions, including the WRMSRNS
>instruction and the immediate variants of RDMSR and WRMSR if possible.
>
>Patches 11-13 are some further preparations for making the real switch
>to directly patch in the native MSR instructions easier.
>
>Patch 14 is switching the paravirt MSR function interface from normal
>call ABI to one more similar to the native MSR instructions.
>
>Patch 15 is a little cleanup patch.
>
>Patch 16 is the final step for patching in the native MSR instructions
>when not running as a Xen PV guest.
>
>This series has been tested to work with Xen PV and on bare metal.
>
>Note that there is more room for improvement. This series is sent out
>to get a first impression how the code will basically look like.
Does that mean you are considering this patchset an RFC? If so, you should put that in the subject header.
>Right now the same problem is solved differently for the paravirt and
>the non-paravirt cases. In case this is not desired, there are two
>possibilities to merge the two implementations. Both solutions have
>the common idea to have rather similar code for paravirt and
>non-paravirt variants, but just use a different main macro for
>generating the respective code. For making the code of both possible
>scenarios more similar, the following variants are possible:
>
>1. Remove the micro-optimizations of the non-paravirt case, making
> it similar to the paravirt code in my series. This has the
> advantage of being more simple, but might have a very small
> negative performance impact (probably not really detectable).
>
>2. Add the same micro-optimizations to the paravirt case, requiring
>   paravirt patching to be enhanced to support a to-be-patched indirect
>   call in the middle of the initial code snippet.
>
>In both cases the native MSR function variants would no longer be
>usable in the paravirt case, but this would mostly affect Xen, as it
>would need to open code the WRMSR/RDMSR instructions to be used
>instead of the native_*msr*() functions.
>
>Changes since V2:
>- switch back to the paravirt approach
>
>Changes since V1:
>- Use Xin Li's approach for inlining
>- Several new patches
>
>Juergen Gross (16):
> x86/alternative: Support alt_replace_call() with instructions after
> call
> coco/tdx: Rename MSR access helpers
> x86/sev: Replace call of native_wrmsr() with native_wrmsrq()
> KVM: x86: Remove the KVM private read_msr() function
> x86/msr: Minimize usage of native_*() msr access functions
> x86/msr: Move MSR trace calls one function level up
> x86/opcode: Add immediate form MSR instructions
> x86/extable: Add support for immediate form MSR instructions
> x86/msr: Use the alternatives mechanism for WRMSR
> x86/msr: Use the alternatives mechanism for RDMSR
> x86/alternatives: Add ALTERNATIVE_4()
> x86/paravirt: Split off MSR related hooks into new header
> x86/paravirt: Prepare support of MSR instruction interfaces
> x86/paravirt: Switch MSR access pv_ops functions to instruction
> interfaces
> x86/msr: Reduce number of low level MSR access helpers
> x86/paravirt: Use alternatives for MSR access with paravirt
>
> arch/x86/coco/sev/internal.h | 7 +-
> arch/x86/coco/tdx/tdx.c | 8 +-
> arch/x86/hyperv/ivm.c | 2 +-
> arch/x86/include/asm/alternative.h | 6 +
> arch/x86/include/asm/fred.h | 2 +-
> arch/x86/include/asm/kvm_host.h | 10 -
> arch/x86/include/asm/msr.h | 345 ++++++++++++++++------
> arch/x86/include/asm/paravirt-msr.h | 148 ++++++++++
> arch/x86/include/asm/paravirt.h | 67 -----
> arch/x86/include/asm/paravirt_types.h | 57 ++--
> arch/x86/include/asm/qspinlock_paravirt.h | 4 +-
> arch/x86/kernel/alternative.c | 5 +-
> arch/x86/kernel/cpu/mshyperv.c | 7 +-
> arch/x86/kernel/kvmclock.c | 2 +-
> arch/x86/kernel/paravirt.c | 42 ++-
> arch/x86/kvm/svm/svm.c | 16 +-
> arch/x86/kvm/vmx/tdx.c | 2 +-
> arch/x86/kvm/vmx/vmx.c | 8 +-
> arch/x86/lib/x86-opcode-map.txt | 5 +-
> arch/x86/mm/extable.c | 35 ++-
> arch/x86/xen/enlighten_pv.c | 52 +++-
> arch/x86/xen/pmu.c | 4 +-
> tools/arch/x86/lib/x86-opcode-map.txt | 5 +-
> tools/objtool/check.c | 1 +
> 24 files changed, 576 insertions(+), 264 deletions(-)
> create mode 100644 arch/x86/include/asm/paravirt-msr.h
>
Could you clarify *on the high design level* what "go back to the paravirt approach" means, and the motivation for that?
Note that for Xen *most* MSRs fall in one of two categories: those that are dropped entirely and those that are just passed straight on to the hardware.
I don't know if anyone cares about optimizing PV Xen anymore, but at least in theory Xen can un-paravirtualize most sites.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 00/16] x86/msr: Inline rdmsr/wrmsr instructions
2026-02-18 20:37 ` [PATCH v3 00/16] x86/msr: Inline rdmsr/wrmsr instructions H. Peter Anvin
@ 2026-02-19 6:28 ` Jürgen Groß
0 siblings, 0 replies; 20+ messages in thread
From: Jürgen Groß @ 2026-02-19 6:28 UTC (permalink / raw)
To: H. Peter Anvin, linux-kernel, x86, linux-coco, kvm, linux-hyperv,
virtualization, llvm
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
Kiryl Shutsemau, Rick Edgecombe, Sean Christopherson,
Paolo Bonzini, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Vitaly Kuznetsov, Boris Ostrovsky, xen-devel,
Ajay Kaher, Alexey Makhalov, Broadcom internal kernel review list,
Andy Lutomirski, Peter Zijlstra, Xin Li, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt, Josh Poimboeuf,
andy.cooper
On 18.02.26 21:37, H. Peter Anvin wrote:
> On February 18, 2026 12:21:17 AM PST, Juergen Gross <jgross@suse.com> wrote:
>> When building a kernel with CONFIG_PARAVIRT_XXL the paravirt
>> infrastructure will always use functions for reading or writing MSRs,
>> even when running on bare metal.
>>
>> Switch to inline RDMSR/WRMSR instructions in this case, reducing the
>> paravirt overhead.
>>
>> The first patch is a prerequisite fix for alternative patching. It is
>> needed because the initial indirect call needs to be padded with NOPs
>> in some cases by the following patches.
>>
>> In order to make this less intrusive, some further reorganization of
>> the MSR access helpers is done in the patches 1-6.
>>
>> The next 4 patches are converting the non-paravirt case to use direct
>> inlining of the MSR access instructions, including the WRMSRNS
>> instruction and the immediate variants of RDMSR and WRMSR if possible.
>>
>> Patches 11-13 are some further preparations for making the real switch
>> to directly patch in the native MSR instructions easier.
>>
>> Patch 14 is switching the paravirt MSR function interface from normal
>> call ABI to one more similar to the native MSR instructions.
>>
>> Patch 15 is a little cleanup patch.
>>
>> Patch 16 is the final step for patching in the native MSR instructions
>> when not running as a Xen PV guest.
>>
>> This series has been tested to work with Xen PV and on bare metal.
>>
>> Note that there is more room for improvement. This series is sent out
>> to get a first impression how the code will basically look like.
>
> Does that mean you are considering this patchset an RFC? If so, you should put that in the subject header.
It is one possible solution.
>
>> Right now the same problem is solved differently for the paravirt and
>> the non-paravirt cases. In case this is not desired, there are two
>> possibilities to merge the two implementations. Both solutions have
>> the common idea to have rather similar code for paravirt and
>> non-paravirt variants, but just use a different main macro for
>> generating the respective code. For making the code of both possible
>> scenarios more similar, the following variants are possible:
>>
>> 1. Remove the micro-optimizations of the non-paravirt case, making
>> it similar to the paravirt code in my series. This has the
>> advantage of being more simple, but might have a very small
>> negative performance impact (probably not really detectable).
>>
>> 2. Add the same micro-optimizations to the paravirt case, requiring
>>    paravirt patching to be enhanced to support a to-be-patched indirect
>>    call in the middle of the initial code snippet.
>>
>> In both cases the native MSR function variants would no longer be
>> usable in the paravirt case, but this would mostly affect Xen, as it
>> would need to open code the WRMSR/RDMSR instructions to be used
>> instead of the native_*msr*() functions.
>>
>> Changes since V2:
>> - switch back to the paravirt approach
>>
>> Changes since V1:
>> - Use Xin Li's approach for inlining
>> - Several new patches
>>
>> Juergen Gross (16):
>> x86/alternative: Support alt_replace_call() with instructions after
>> call
>> coco/tdx: Rename MSR access helpers
>> x86/sev: Replace call of native_wrmsr() with native_wrmsrq()
>> KVM: x86: Remove the KVM private read_msr() function
>> x86/msr: Minimize usage of native_*() msr access functions
>> x86/msr: Move MSR trace calls one function level up
>> x86/opcode: Add immediate form MSR instructions
>> x86/extable: Add support for immediate form MSR instructions
>> x86/msr: Use the alternatives mechanism for WRMSR
>> x86/msr: Use the alternatives mechanism for RDMSR
>> x86/alternatives: Add ALTERNATIVE_4()
>> x86/paravirt: Split off MSR related hooks into new header
>> x86/paravirt: Prepare support of MSR instruction interfaces
>> x86/paravirt: Switch MSR access pv_ops functions to instruction
>> interfaces
>> x86/msr: Reduce number of low level MSR access helpers
>> x86/paravirt: Use alternatives for MSR access with paravirt
>>
>> arch/x86/coco/sev/internal.h | 7 +-
>> arch/x86/coco/tdx/tdx.c | 8 +-
>> arch/x86/hyperv/ivm.c | 2 +-
>> arch/x86/include/asm/alternative.h | 6 +
>> arch/x86/include/asm/fred.h | 2 +-
>> arch/x86/include/asm/kvm_host.h | 10 -
>> arch/x86/include/asm/msr.h | 345 ++++++++++++++++------
>> arch/x86/include/asm/paravirt-msr.h | 148 ++++++++++
>> arch/x86/include/asm/paravirt.h | 67 -----
>> arch/x86/include/asm/paravirt_types.h | 57 ++--
>> arch/x86/include/asm/qspinlock_paravirt.h | 4 +-
>> arch/x86/kernel/alternative.c | 5 +-
>> arch/x86/kernel/cpu/mshyperv.c | 7 +-
>> arch/x86/kernel/kvmclock.c | 2 +-
>> arch/x86/kernel/paravirt.c | 42 ++-
>> arch/x86/kvm/svm/svm.c | 16 +-
>> arch/x86/kvm/vmx/tdx.c | 2 +-
>> arch/x86/kvm/vmx/vmx.c | 8 +-
>> arch/x86/lib/x86-opcode-map.txt | 5 +-
>> arch/x86/mm/extable.c | 35 ++-
>> arch/x86/xen/enlighten_pv.c | 52 +++-
>> arch/x86/xen/pmu.c | 4 +-
>> tools/arch/x86/lib/x86-opcode-map.txt | 5 +-
>> tools/objtool/check.c | 1 +
>> 24 files changed, 576 insertions(+), 264 deletions(-)
>> create mode 100644 arch/x86/include/asm/paravirt-msr.h
>>
>
> Could you clarify *on the high design level* what "go back to the paravirt approach" means, and the motivation for that?
This is related to V2 of this series, where I used a static branch for
special-casing Xen PV.
Peter Zijlstra commented on that, asking me to try harder to use the
pv_ops hooks for Xen PV, too.
> Note that for Xen *most* MSRs fall in one of two categories: those that are dropped entirely and those that are just passed straight on to the hardware.
>
> I don't know if anyone cares about optimizing PV Xen anymore, but at least in theory Xen can un-paravirtualize most sites.
The problem with that is that this would need to be taken care of at the
callers' sites, "poisoning" a lot of code with Xen specific paths. Or we'd
need to use the native variants explicitly at all places where Xen PV
would just use the MSR instructions itself. But please be aware that there
are plans to introduce a hypercall for Xen to speed up MSR accesses, which
would reduce the "passed through to hardware" cases to zero.
Juergen
^ permalink raw reply [flat|nested] 20+ messages in thread