* [PATCH v2 01/12] coco/tdx: Rename MSR access helpers
2025-09-30 7:03 [PATCH v2 00/12] x86/msr: Inline rdmsr/wrmsr instructions Juergen Gross
@ 2025-09-30 7:03 ` Juergen Gross
2025-09-30 9:07 ` Kiryl Shutsemau
2025-09-30 7:03 ` [PATCH v2 02/12] x86/sev: Replace call of native_wrmsr() with native_wrmsrq() Juergen Gross
` (11 subsequent siblings)
12 siblings, 1 reply; 43+ messages in thread
From: Juergen Gross @ 2025-09-30 7:03 UTC (permalink / raw)
To: linux-kernel, x86, linux-coco
Cc: xin, Juergen Gross, Kirill A. Shutemov, Dave Hansen,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin
In order to avoid a name clash with some general MSR access helpers
after a future MSR infrastructure rework, rename the TDX specific
helpers.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
arch/x86/coco/tdx/tdx.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 7b2833705d47..500166c1a161 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -468,7 +468,7 @@ static void __cpuidle tdx_safe_halt(void)
raw_local_irq_enable();
}
-static int read_msr(struct pt_regs *regs, struct ve_info *ve)
+static int tdx_read_msr(struct pt_regs *regs, struct ve_info *ve)
{
struct tdx_module_args args = {
.r10 = TDX_HYPERCALL_STANDARD,
@@ -489,7 +489,7 @@ static int read_msr(struct pt_regs *regs, struct ve_info *ve)
return ve_instr_len(ve);
}
-static int write_msr(struct pt_regs *regs, struct ve_info *ve)
+static int tdx_write_msr(struct pt_regs *regs, struct ve_info *ve)
{
struct tdx_module_args args = {
.r10 = TDX_HYPERCALL_STANDARD,
@@ -842,9 +842,9 @@ static int virt_exception_kernel(struct pt_regs *regs, struct ve_info *ve)
case EXIT_REASON_HLT:
return handle_halt(ve);
case EXIT_REASON_MSR_READ:
- return read_msr(regs, ve);
+ return tdx_read_msr(regs, ve);
case EXIT_REASON_MSR_WRITE:
- return write_msr(regs, ve);
+ return tdx_write_msr(regs, ve);
case EXIT_REASON_CPUID:
return handle_cpuid(regs, ve);
case EXIT_REASON_EPT_VIOLATION:
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 01/12] coco/tdx: Rename MSR access helpers
2025-09-30 7:03 ` [PATCH v2 01/12] coco/tdx: Rename MSR access helpers Juergen Gross
@ 2025-09-30 9:07 ` Kiryl Shutsemau
0 siblings, 0 replies; 43+ messages in thread
From: Kiryl Shutsemau @ 2025-09-30 9:07 UTC (permalink / raw)
To: Juergen Gross
Cc: linux-kernel, x86, linux-coco, xin, Dave Hansen, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, H. Peter Anvin
On Tue, Sep 30, 2025 at 09:03:45AM +0200, Juergen Gross wrote:
> In order to avoid a name clash with some general MSR access helpers
> after a future MSR infrastructure rework, rename the TDX specific
> helpers.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Kiryl Shutsemau <kas@kernel.org>
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 02/12] x86/sev: Replace call of native_wrmsr() with native_wrmsrq()
2025-09-30 7:03 [PATCH v2 00/12] x86/msr: Inline rdmsr/wrmsr instructions Juergen Gross
2025-09-30 7:03 ` [PATCH v2 01/12] coco/tdx: Rename MSR access helpers Juergen Gross
@ 2025-09-30 7:03 ` Juergen Gross
2025-10-07 6:08 ` Nikunj A Dadhania
2025-09-30 7:03 ` [PATCH v2 03/12] x86/kvm: Remove the KVM private read_msr() function Juergen Gross
` (10 subsequent siblings)
12 siblings, 1 reply; 43+ messages in thread
From: Juergen Gross @ 2025-09-30 7:03 UTC (permalink / raw)
To: linux-kernel, x86
Cc: xin, Juergen Gross, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin
In sev_es_wr_ghcb_msr() the 64-bit MSR value is split into two 32-bit
values in order to call native_wrmsr(), which just combines the values
into a 64-bit value again.
Just use native_wrmsrq() instead.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- new patch
---
arch/x86/include/asm/sev-internal.h | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
index 3dfd306d1c9e..f5d6fb3b5916 100644
--- a/arch/x86/include/asm/sev-internal.h
+++ b/arch/x86/include/asm/sev-internal.h
@@ -89,12 +89,7 @@ static inline u64 sev_es_rd_ghcb_msr(void)
static __always_inline void sev_es_wr_ghcb_msr(u64 val)
{
- u32 low, high;
-
- low = (u32)(val);
- high = (u32)(val >> 32);
-
- native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
+ native_wrmsrq(MSR_AMD64_SEV_ES_GHCB, val);
}
void snp_register_ghcb_early(unsigned long paddr);
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 02/12] x86/sev: Replace call of native_wrmsr() with native_wrmsrq()
2025-09-30 7:03 ` [PATCH v2 02/12] x86/sev: Replace call of native_wrmsr() with native_wrmsrq() Juergen Gross
@ 2025-10-07 6:08 ` Nikunj A Dadhania
0 siblings, 0 replies; 43+ messages in thread
From: Nikunj A Dadhania @ 2025-10-07 6:08 UTC (permalink / raw)
To: Juergen Gross, linux-kernel, x86, thomas.lendacky
Cc: xin, Juergen Gross, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin
Juergen Gross <jgross@suse.com> writes:
> In sev_es_wr_ghcb_msr() the 64-bit MSR value is split into two 32-bit
> values in order to call native_wrmsr(), which just combines the values
> into a 64-bit value again.
>
> Just use native_wrmsrq() instead.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
> ---
> V2:
> - new patch
> ---
> arch/x86/include/asm/sev-internal.h | 7 +------
> 1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
> index 3dfd306d1c9e..f5d6fb3b5916 100644
> --- a/arch/x86/include/asm/sev-internal.h
> +++ b/arch/x86/include/asm/sev-internal.h
> @@ -89,12 +89,7 @@ static inline u64 sev_es_rd_ghcb_msr(void)
>
> static __always_inline void sev_es_wr_ghcb_msr(u64 val)
> {
> - u32 low, high;
> -
> - low = (u32)(val);
> - high = (u32)(val >> 32);
> -
> - native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
> + native_wrmsrq(MSR_AMD64_SEV_ES_GHCB, val);
> }
>
> void snp_register_ghcb_early(unsigned long paddr);
> --
> 2.51.0
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 03/12] x86/kvm: Remove the KVM private read_msr() function
2025-09-30 7:03 [PATCH v2 00/12] x86/msr: Inline rdmsr/wrmsr instructions Juergen Gross
2025-09-30 7:03 ` [PATCH v2 01/12] coco/tdx: Rename MSR access helpers Juergen Gross
2025-09-30 7:03 ` [PATCH v2 02/12] x86/sev: Replace call of native_wrmsr() with native_wrmsrq() Juergen Gross
@ 2025-09-30 7:03 ` Juergen Gross
2025-09-30 16:04 ` Sean Christopherson
2025-09-30 7:03 ` [PATCH v2 04/12] x86/msr: Minimize usage of native_*() msr access functions Juergen Gross
` (9 subsequent siblings)
12 siblings, 1 reply; 43+ messages in thread
From: Juergen Gross @ 2025-09-30 7:03 UTC (permalink / raw)
To: linux-kernel, x86, kvm
Cc: xin, Juergen Gross, Sean Christopherson, Paolo Bonzini,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin
Instead of having a KVM private read_msr() function, just use rdmsrq().
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- remove the helper and use rdmsrq() directly (Sean Christopherson)
---
arch/x86/include/asm/kvm_host.h | 10 ----------
arch/x86/kvm/vmx/vmx.c | 4 ++--
2 files changed, 2 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f19a76d3ca0e..aed754dda1a3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2296,16 +2296,6 @@ static inline void kvm_load_ldt(u16 sel)
asm("lldt %0" : : "rm"(sel));
}
-#ifdef CONFIG_X86_64
-static inline unsigned long read_msr(unsigned long msr)
-{
- u64 value;
-
- rdmsrq(msr, value);
- return value;
-}
-#endif
-
static inline void kvm_inject_gp(struct kvm_vcpu *vcpu, u32 error_code)
{
kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index aa157fe5b7b3..12bb1769e3ae 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1288,8 +1288,8 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
} else {
savesegment(fs, fs_sel);
savesegment(gs, gs_sel);
- fs_base = read_msr(MSR_FS_BASE);
- vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
+ rdmsrq(MSR_FS_BASE, fs_base);
+ rdmsrq(MSR_KERNEL_GS_BASE, vt->msr_host_kernel_gs_base);
}
wrmsrq(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 03/12] x86/kvm: Remove the KVM private read_msr() function
2025-09-30 7:03 ` [PATCH v2 03/12] x86/kvm: Remove the KVM private read_msr() function Juergen Gross
@ 2025-09-30 16:04 ` Sean Christopherson
2025-10-01 9:14 ` Jürgen Groß
0 siblings, 1 reply; 43+ messages in thread
From: Sean Christopherson @ 2025-09-30 16:04 UTC (permalink / raw)
To: Juergen Gross
Cc: linux-kernel, x86, kvm, xin, Paolo Bonzini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin
For the scope:
KVM: x86:
because x86/kvm is specifically used for guest-side code.
On Tue, Sep 30, 2025, Juergen Gross wrote:
> Instead of having a KVM private read_msr() function, just use rdmsrq().
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V2:
> - remove the helper and use rdmsrq() directly (Sean Christopherson)
> ---
> arch/x86/include/asm/kvm_host.h | 10 ----------
> arch/x86/kvm/vmx/vmx.c | 4 ++--
> 2 files changed, 2 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f19a76d3ca0e..aed754dda1a3 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2296,16 +2296,6 @@ static inline void kvm_load_ldt(u16 sel)
> asm("lldt %0" : : "rm"(sel));
> }
>
> -#ifdef CONFIG_X86_64
> -static inline unsigned long read_msr(unsigned long msr)
> -{
> - u64 value;
> -
> - rdmsrq(msr, value);
> - return value;
> -}
> -#endif
Gah, the same commit[*] that added a wrmsrns() use also added a read_msr(). Sorry :-(
[*] 65391feb042b ("KVM: VMX: Add host MSR read/write helpers to consolidate preemption handling")
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 03/12] x86/kvm: Remove the KVM private read_msr() function
2025-09-30 16:04 ` Sean Christopherson
@ 2025-10-01 9:14 ` Jürgen Groß
0 siblings, 0 replies; 43+ messages in thread
From: Jürgen Groß @ 2025-10-01 9:14 UTC (permalink / raw)
To: Sean Christopherson
Cc: linux-kernel, x86, kvm, xin, Paolo Bonzini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin
[-- Attachment #1.1.1: Type: text/plain, Size: 1378 bytes --]
On 30.09.25 18:04, Sean Christopherson wrote:
> For the scope:
>
> KVM: x86:
>
> because x86/kvm is specifically used for guest-side code.
Okay, will change that.
>
> On Tue, Sep 30, 2025, Juergen Gross wrote:
>> Instead of having a KVM private read_msr() function, just use rdmsrq().
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V2:
>> - remove the helper and use rdmsrq() directly (Sean Christopherson)
>> ---
>> arch/x86/include/asm/kvm_host.h | 10 ----------
>> arch/x86/kvm/vmx/vmx.c | 4 ++--
>> 2 files changed, 2 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index f19a76d3ca0e..aed754dda1a3 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -2296,16 +2296,6 @@ static inline void kvm_load_ldt(u16 sel)
>> asm("lldt %0" : : "rm"(sel));
>> }
>>
>> -#ifdef CONFIG_X86_64
>> -static inline unsigned long read_msr(unsigned long msr)
>> -{
>> - u64 value;
>> -
>> - rdmsrq(msr, value);
>> - return value;
>> -}
>> -#endif
>
> Gah, the same commit[*] that added a wrmsrns() use also added a read_msr(). Sorry :-(
>
> [*] 65391feb042b ("KVM: VMX: Add host MSR read/write helpers to consolidate preemption handling")
Again, thanks for the heads up.
Juergen
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 04/12] x86/msr: Minimize usage of native_*() msr access functions
2025-09-30 7:03 [PATCH v2 00/12] x86/msr: Inline rdmsr/wrmsr instructions Juergen Gross
` (2 preceding siblings ...)
2025-09-30 7:03 ` [PATCH v2 03/12] x86/kvm: Remove the KVM private read_msr() function Juergen Gross
@ 2025-09-30 7:03 ` Juergen Gross
2025-09-30 7:03 ` [PATCH v2 05/12] x86/msr: Move MSR trace calls one function level up Juergen Gross
` (8 subsequent siblings)
12 siblings, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2025-09-30 7:03 UTC (permalink / raw)
To: linux-kernel, x86, linux-hyperv, kvm
Cc: xin, Juergen Gross, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Paolo Bonzini, Vitaly Kuznetsov,
Sean Christopherson, Boris Ostrovsky, xen-devel
In order to prepare for some MSR access function reorg work, switch
most users of native_{read|write}_msr[_safe]() to the more generic
rdmsr*()/wrmsr*() variants.
For now this will have some intermediate performance impact when
paravirtualization is configured but the kernel is running on bare
metal. This is a prerequisite for the planned direct inlining of the
rdmsr/wrmsr instructions in that configuration.
The main reason for this switch is the planned move of the MSR trace
function invocation from the native_*() functions to the generic
rdmsr*()/wrmsr*() variants. Without this switch the users of the
native_*() functions would lose the related tracing entries.
Note that the Xen related MSR access functions will not be switched,
as these will be handled after the move of the trace hooks.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
---
arch/x86/hyperv/ivm.c | 2 +-
arch/x86/kernel/kvmclock.c | 2 +-
arch/x86/kvm/svm/svm.c | 16 ++++++++--------
arch/x86/xen/pmu.c | 4 ++--
4 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index ade6c665c97e..202ed01dc151 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -327,7 +327,7 @@ int hv_snp_boot_ap(u32 apic_id, unsigned long start_ip, unsigned int cpu)
asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
hv_populate_vmcb_seg(vmsa->ds, vmsa->gdtr.base);
- vmsa->efer = native_read_msr(MSR_EFER);
+ rdmsrq(MSR_EFER, vmsa->efer);
vmsa->cr4 = native_read_cr4();
vmsa->cr3 = __native_read_cr3();
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index ca0a49eeac4a..b6cd45cce5fe 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -196,7 +196,7 @@ static void kvm_setup_secondary_clock(void)
void kvmclock_disable(void)
{
if (msr_kvm_system_time)
- native_write_msr(msr_kvm_system_time, 0);
+ wrmsrq(msr_kvm_system_time, 0);
}
static void __init kvmclock_init_mem(void)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1bfebe40854f..105d5c2aae46 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -393,12 +393,12 @@ static void svm_init_erratum_383(void)
return;
/* Use _safe variants to not break nested virtualization */
- if (native_read_msr_safe(MSR_AMD64_DC_CFG, &val))
+ if (rdmsrq_safe(MSR_AMD64_DC_CFG, &val))
return;
val |= (1ULL << 47);
- native_write_msr_safe(MSR_AMD64_DC_CFG, val);
+ wrmsrq_safe(MSR_AMD64_DC_CFG, val);
erratum_383_found = true;
}
@@ -558,9 +558,9 @@ static int svm_enable_virtualization_cpu(void)
u64 len, status = 0;
int err;
- err = native_read_msr_safe(MSR_AMD64_OSVW_ID_LENGTH, &len);
+ err = rdmsrq_safe(MSR_AMD64_OSVW_ID_LENGTH, &len);
if (!err)
- err = native_read_msr_safe(MSR_AMD64_OSVW_STATUS, &status);
+ err = rdmsrq_safe(MSR_AMD64_OSVW_STATUS, &status);
if (err)
osvw_status = osvw_len = 0;
@@ -2032,7 +2032,7 @@ static bool is_erratum_383(void)
if (!erratum_383_found)
return false;
- if (native_read_msr_safe(MSR_IA32_MC0_STATUS, &value))
+ if (rdmsrq_safe(MSR_IA32_MC0_STATUS, &value))
return false;
/* Bit 62 may or may not be set for this mce */
@@ -2043,11 +2043,11 @@ static bool is_erratum_383(void)
/* Clear MCi_STATUS registers */
for (i = 0; i < 6; ++i)
- native_write_msr_safe(MSR_IA32_MCx_STATUS(i), 0);
+ wrmsrq_safe(MSR_IA32_MCx_STATUS(i), 0);
- if (!native_read_msr_safe(MSR_IA32_MCG_STATUS, &value)) {
+ if (!rdmsrq_safe(MSR_IA32_MCG_STATUS, &value)) {
value &= ~(1ULL << 2);
- native_write_msr_safe(MSR_IA32_MCG_STATUS, value);
+ wrmsrq_safe(MSR_IA32_MCG_STATUS, value);
}
/* Flush tlb to evict multi-match entries */
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index 8f89ce0b67e3..d49a3bdc448b 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -323,7 +323,7 @@ static u64 xen_amd_read_pmc(int counter)
u64 val;
msr = amd_counters_base + (counter * amd_msr_step);
- native_read_msr_safe(msr, &val);
+ rdmsrq_safe(msr, &val);
return val;
}
@@ -349,7 +349,7 @@ static u64 xen_intel_read_pmc(int counter)
else
msr = MSR_IA32_PERFCTR0 + counter;
- native_read_msr_safe(msr, &val);
+ rdmsrq_safe(msr, &val);
return val;
}
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 05/12] x86/msr: Move MSR trace calls one function level up
2025-09-30 7:03 [PATCH v2 00/12] x86/msr: Inline rdmsr/wrmsr instructions Juergen Gross
` (3 preceding siblings ...)
2025-09-30 7:03 ` [PATCH v2 04/12] x86/msr: Minimize usage of native_*() msr access functions Juergen Gross
@ 2025-09-30 7:03 ` Juergen Gross
2025-09-30 7:03 ` [PATCH v2 06/12] x86/cpufeatures: Add a CPU feature bit for MSR immediate form instructions Juergen Gross
` (7 subsequent siblings)
12 siblings, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2025-09-30 7:03 UTC (permalink / raw)
To: linux-kernel, x86, virtualization
Cc: xin, Juergen Gross, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Ajay Kaher, Alexey Makhalov,
Broadcom internal kernel review list
In order to prepare for paravirt inlining of the MSR access
instructions, move the calls of the MSR trace functions one function
level up.
Introduce {read|write}_msr[_safe]() helpers which allow the trace calls
to be made from common definitions in msr.h.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
arch/x86/include/asm/msr.h | 102 ++++++++++++++++++++------------
arch/x86/include/asm/paravirt.h | 38 +++---------
2 files changed, 73 insertions(+), 67 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 9c2ea29e12a9..71f41af11591 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -103,14 +103,7 @@ static __always_inline u64 native_rdmsrq(u32 msr)
static inline u64 native_read_msr(u32 msr)
{
- u64 val;
-
- val = __rdmsr(msr);
-
- if (tracepoint_enabled(read_msr))
- do_trace_read_msr(msr, val, 0);
-
- return val;
+ return __rdmsr(msr);
}
static inline int native_read_msr_safe(u32 msr, u64 *p)
@@ -123,8 +116,6 @@ static inline int native_read_msr_safe(u32 msr, u64 *p)
_ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %[err])
: [err] "=r" (err), EAX_EDX_RET(val, low, high)
: "c" (msr));
- if (tracepoint_enabled(read_msr))
- do_trace_read_msr(msr, EAX_EDX_VAL(val, low, high), err);
*p = EAX_EDX_VAL(val, low, high);
@@ -135,9 +126,6 @@ static inline int native_read_msr_safe(u32 msr, u64 *p)
static inline void notrace native_write_msr(u32 msr, u64 val)
{
native_wrmsrq(msr, val);
-
- if (tracepoint_enabled(write_msr))
- do_trace_write_msr(msr, val, 0);
}
/* Can be uninlined because referenced by paravirt */
@@ -151,8 +139,6 @@ static inline int notrace native_write_msr_safe(u32 msr, u64 val)
: [err] "=a" (err)
: "c" (msr), "0" ((u32)val), "d" ((u32)(val >> 32))
: "memory");
- if (tracepoint_enabled(write_msr))
- do_trace_write_msr(msr, val, err);
return err;
}
@@ -173,59 +159,96 @@ static inline u64 native_read_pmc(int counter)
#include <asm/paravirt.h>
#else
#include <linux/errno.h>
+static __always_inline u64 read_msr(u32 msr)
+{
+ return native_read_msr(msr);
+}
+
+static __always_inline int read_msr_safe(u32 msr, u64 *p)
+{
+ return native_read_msr_safe(msr, p);
+}
+
+static __always_inline void write_msr(u32 msr, u64 val)
+{
+ native_write_msr(msr, val);
+}
+
+static __always_inline int write_msr_safe(u32 msr, u64 val)
+{
+ return native_write_msr_safe(msr, val);
+}
+
+static __always_inline u64 rdpmc(int counter)
+{
+ return native_read_pmc(counter);
+}
+
+#endif /* !CONFIG_PARAVIRT_XXL */
+
/*
* Access to machine-specific registers (available on 586 and better only)
* Note: the rd* operations modify the parameters directly (without using
* pointer indirection), this allows gcc to optimize better
*/
+#define rdmsrq(msr, val) \
+do { \
+ (val) = read_msr(msr); \
+ if (tracepoint_enabled(read_msr)) \
+ do_trace_read_msr(msr, val, 0); \
+} while (0)
+
#define rdmsr(msr, low, high) \
do { \
- u64 __val = native_read_msr((msr)); \
+ u64 __val; \
+ rdmsrq(msr, __val); \
(void)((low) = (u32)__val); \
(void)((high) = (u32)(__val >> 32)); \
} while (0)
-static inline void wrmsr(u32 msr, u32 low, u32 high)
+/* rdmsr with exception handling */
+static inline int rdmsrq_safe(u32 msr, u64 *p)
{
- native_write_msr(msr, (u64)high << 32 | low);
-}
+ int err;
-#define rdmsrq(msr, val) \
- ((val) = native_read_msr((msr)))
+ err = read_msr_safe(msr, p);
-static inline void wrmsrq(u32 msr, u64 val)
-{
- native_write_msr(msr, val);
-}
+ if (tracepoint_enabled(read_msr))
+ do_trace_read_msr(msr, *p, err);
-/* wrmsr with exception handling */
-static inline int wrmsrq_safe(u32 msr, u64 val)
-{
- return native_write_msr_safe(msr, val);
+ return err;
}
-/* rdmsr with exception handling */
#define rdmsr_safe(msr, low, high) \
({ \
u64 __val; \
- int __err = native_read_msr_safe((msr), &__val); \
+ int __err = rdmsrq_safe((msr), &__val); \
(*low) = (u32)__val; \
(*high) = (u32)(__val >> 32); \
__err; \
})
-static inline int rdmsrq_safe(u32 msr, u64 *p)
+static inline void wrmsrq(u32 msr, u64 val)
{
- return native_read_msr_safe(msr, p);
+ write_msr(msr, val);
+
+ if (tracepoint_enabled(write_msr))
+ do_trace_write_msr(msr, val, 0);
}
-static __always_inline u64 rdpmc(int counter)
+/* wrmsr with exception handling */
+static inline int wrmsrq_safe(u32 msr, u64 val)
{
- return native_read_pmc(counter);
-}
+ int err;
-#endif /* !CONFIG_PARAVIRT_XXL */
+ err = write_msr_safe(msr, val);
+
+ if (tracepoint_enabled(write_msr))
+ do_trace_write_msr(msr, val, err);
+
+ return err;
+}
/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
#define ASM_WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
@@ -242,6 +265,11 @@ static __always_inline void wrmsrns(u32 msr, u64 val)
: : "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32)));
}
+static inline void wrmsr(u32 msr, u32 low, u32 high)
+{
+ wrmsrq(msr, (u64)high << 32 | low);
+}
+
/*
* Dual u32 version of wrmsrq_safe():
*/
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index b5e59a7ba0d0..dc297f62b935 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -195,46 +195,24 @@ static inline int paravirt_write_msr_safe(u32 msr, u64 val)
return PVOP_CALL2(int, cpu.write_msr_safe, msr, val);
}
-#define rdmsr(msr, val1, val2) \
-do { \
- u64 _l = paravirt_read_msr(msr); \
- val1 = (u32)_l; \
- val2 = _l >> 32; \
-} while (0)
-
-static __always_inline void wrmsr(u32 msr, u32 low, u32 high)
+static __always_inline u64 read_msr(u32 msr)
{
- paravirt_write_msr(msr, (u64)high << 32 | low);
+ return paravirt_read_msr(msr);
}
-#define rdmsrq(msr, val) \
-do { \
- val = paravirt_read_msr(msr); \
-} while (0)
-
-static inline void wrmsrq(u32 msr, u64 val)
+static __always_inline int read_msr_safe(u32 msr, u64 *p)
{
- paravirt_write_msr(msr, val);
+ return paravirt_read_msr_safe(msr, p);
}
-static inline int wrmsrq_safe(u32 msr, u64 val)
+static __always_inline void write_msr(u32 msr, u64 val)
{
- return paravirt_write_msr_safe(msr, val);
+ paravirt_write_msr(msr, val);
}
-/* rdmsr with exception handling */
-#define rdmsr_safe(msr, a, b) \
-({ \
- u64 _l; \
- int _err = paravirt_read_msr_safe((msr), &_l); \
- (*a) = (u32)_l; \
- (*b) = (u32)(_l >> 32); \
- _err; \
-})
-
-static __always_inline int rdmsrq_safe(u32 msr, u64 *p)
+static __always_inline int write_msr_safe(u32 msr, u64 val)
{
- return paravirt_read_msr_safe(msr, p);
+ return paravirt_write_msr_safe(msr, val);
}
static __always_inline u64 rdpmc(int counter)
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 06/12] x86/cpufeatures: Add a CPU feature bit for MSR immediate form instructions
2025-09-30 7:03 [PATCH v2 00/12] x86/msr: Inline rdmsr/wrmsr instructions Juergen Gross
` (4 preceding siblings ...)
2025-09-30 7:03 ` [PATCH v2 05/12] x86/msr: Move MSR trace calls one function level up Juergen Gross
@ 2025-09-30 7:03 ` Juergen Gross
2025-09-30 7:03 ` [PATCH v2 07/12] x86/opcode: Add immediate form MSR instructions Juergen Gross
` (6 subsequent siblings)
12 siblings, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2025-09-30 7:03 UTC (permalink / raw)
To: linux-kernel, x86
Cc: xin, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Juergen Gross
From: "Xin Li (Intel)" <xin@zytor.com>
The immediate form of the MSR access instructions is primarily motivated
by performance, not code size: with the MSR number in an immediate
operand, it is available *much* earlier in the pipeline, which allows the
hardware much more leeway in how a particular MSR is handled.
Use a scattered CPU feature bit for the immediate form MSR instructions.
Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- new patch, taken from the RFC v2 MSR refactor series by Xin Li
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 751ca35386b0..625635767085 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -496,6 +496,7 @@
#define X86_FEATURE_TSA_L1_NO (21*32+12) /* AMD CPU not vulnerable to TSA-L1 */
#define X86_FEATURE_CLEAR_CPU_BUF_VM (21*32+13) /* Clear CPU buffers using VERW before VMRUN */
#define X86_FEATURE_IBPB_EXIT_TO_USER (21*32+14) /* Use IBPB on exit-to-userspace, see VMSCAPE bug */
+#define X86_FEATURE_MSR_IMM (21*32+15) /* MSR immediate form instructions */
/*
* BUG word(s)
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 6b868afb26c3..cf4ae822bcc0 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -27,6 +27,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_APERFMPERF, CPUID_ECX, 0, 0x00000006, 0 },
{ X86_FEATURE_EPB, CPUID_ECX, 3, 0x00000006, 0 },
{ X86_FEATURE_INTEL_PPIN, CPUID_EBX, 0, 0x00000007, 1 },
+ { X86_FEATURE_MSR_IMM, CPUID_ECX, 5, 0x00000007, 1 },
{ X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
{ X86_FEATURE_RRSBA_CTRL, CPUID_EDX, 2, 0x00000007, 2 },
{ X86_FEATURE_BHI_CTRL, CPUID_EDX, 4, 0x00000007, 2 },
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 07/12] x86/opcode: Add immediate form MSR instructions
2025-09-30 7:03 [PATCH v2 00/12] x86/msr: Inline rdmsr/wrmsr instructions Juergen Gross
` (5 preceding siblings ...)
2025-09-30 7:03 ` [PATCH v2 06/12] x86/cpufeatures: Add a CPU feature bit for MSR immediate form instructions Juergen Gross
@ 2025-09-30 7:03 ` Juergen Gross
2025-09-30 7:03 ` [PATCH v2 08/12] x86/extable: Add support for " Juergen Gross
` (5 subsequent siblings)
12 siblings, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2025-09-30 7:03 UTC (permalink / raw)
To: linux-kernel, x86
Cc: xin, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Juergen Gross
From: "Xin Li (Intel)" <xin@zytor.com>
Add the instruction opcodes used by the immediate form WRMSRNS/RDMSR
to x86-opcode-map.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- new patch, taken from the RFC v2 MSR refactor series by Xin Li
---
arch/x86/lib/x86-opcode-map.txt | 5 +++--
tools/arch/x86/lib/x86-opcode-map.txt | 5 +++--
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 262f7ca1fb95..65aeb18ad22d 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -839,7 +839,7 @@ f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
f2: ANDN Gy,By,Ey (v)
f3: Grp17 (1A)
f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSSD/Q My,Gy (66)
-f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,Gy
+f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,Gy | RDMSR Rq,Gq (F2),(11B) | WRMSRNS Gq,Rq (F3),(11B)
f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3) | URDMSR Rq,Gq (F2),(11B) | UWRMSR Gq,Rq (F3),(11B)
f9: MOVDIRI My,Gy
@@ -1014,7 +1014,7 @@ f1: CRC32 Gy,Ey (es) | CRC32 Gy,Ey (66),(es) | INVVPID Gy,Mdq (F3),(ev)
f2: INVPCID Gy,Mdq (F3),(ev)
f4: TZCNT Gv,Ev (es) | TZCNT Gv,Ev (66),(es)
f5: LZCNT Gv,Ev (es) | LZCNT Gv,Ev (66),(es)
-f6: Grp3_1 Eb (1A),(ev)
+f6: Grp3_1 Eb (1A),(ev) | RDMSR Rq,Gq (F2),(11B),(ev) | WRMSRNS Gq,Rq (F3),(11B),(ev)
f7: Grp3_2 Ev (1A),(es)
f8: MOVDIR64B Gv,Mdqq (66),(ev) | ENQCMD Gv,Mdqq (F2),(ev) | ENQCMDS Gv,Mdqq (F3),(ev) | URDMSR Rq,Gq (F2),(11B),(ev) | UWRMSR Gq,Rq (F3),(11B),(ev)
f9: MOVDIRI My,Gy (ev)
@@ -1103,6 +1103,7 @@ EndTable
Table: VEX map 7
Referrer:
AVXcode: 7
+f6: RDMSR Rq,Id (F2),(v1),(11B) | WRMSRNS Id,Rq (F3),(v1),(11B)
f8: URDMSR Rq,Id (F2),(v1),(11B) | UWRMSR Id,Rq (F3),(v1),(11B)
EndTable
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 262f7ca1fb95..65aeb18ad22d 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -839,7 +839,7 @@ f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
f2: ANDN Gy,By,Ey (v)
f3: Grp17 (1A)
f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSSD/Q My,Gy (66)
-f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,Gy
+f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,Gy | RDMSR Rq,Gq (F2),(11B) | WRMSRNS Gq,Rq (F3),(11B)
f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3) | URDMSR Rq,Gq (F2),(11B) | UWRMSR Gq,Rq (F3),(11B)
f9: MOVDIRI My,Gy
@@ -1014,7 +1014,7 @@ f1: CRC32 Gy,Ey (es) | CRC32 Gy,Ey (66),(es) | INVVPID Gy,Mdq (F3),(ev)
f2: INVPCID Gy,Mdq (F3),(ev)
f4: TZCNT Gv,Ev (es) | TZCNT Gv,Ev (66),(es)
f5: LZCNT Gv,Ev (es) | LZCNT Gv,Ev (66),(es)
-f6: Grp3_1 Eb (1A),(ev)
+f6: Grp3_1 Eb (1A),(ev) | RDMSR Rq,Gq (F2),(11B),(ev) | WRMSRNS Gq,Rq (F3),(11B),(ev)
f7: Grp3_2 Ev (1A),(es)
f8: MOVDIR64B Gv,Mdqq (66),(ev) | ENQCMD Gv,Mdqq (F2),(ev) | ENQCMDS Gv,Mdqq (F3),(ev) | URDMSR Rq,Gq (F2),(11B),(ev) | UWRMSR Gq,Rq (F3),(11B),(ev)
f9: MOVDIRI My,Gy (ev)
@@ -1103,6 +1103,7 @@ EndTable
Table: VEX map 7
Referrer:
AVXcode: 7
+f6: RDMSR Rq,Id (F2),(v1),(11B) | WRMSRNS Id,Rq (F3),(v1),(11B)
f8: URDMSR Rq,Id (F2),(v1),(11B) | UWRMSR Id,Rq (F3),(v1),(11B)
EndTable
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 08/12] x86/extable: Add support for immediate form MSR instructions
2025-09-30 7:03 [PATCH v2 00/12] x86/msr: Inline rdmsr/wrmsr instructions Juergen Gross
` (6 preceding siblings ...)
2025-09-30 7:03 ` [PATCH v2 07/12] x86/opcode: Add immediate form MSR instructions Juergen Gross
@ 2025-09-30 7:03 ` Juergen Gross
2025-09-30 8:14 ` Peter Zijlstra
2025-09-30 7:03 ` [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR Juergen Gross
` (4 subsequent siblings)
12 siblings, 1 reply; 43+ messages in thread
From: Juergen Gross @ 2025-09-30 7:03 UTC (permalink / raw)
To: linux-kernel, x86
Cc: xin, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Andy Lutomirski, Peter Zijlstra, Juergen Gross
From: "Xin Li (Intel)" <xin@zytor.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- new patch, taken from the RFC v2 MSR refactor series by Xin Li
---
arch/x86/include/asm/msr.h | 18 ++++++++++++++++++
arch/x86/mm/extable.c | 39 +++++++++++++++++++++++++++++++++-----
2 files changed, 52 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 71f41af11591..cf5205300266 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -56,6 +56,24 @@ static inline void do_trace_read_msr(u32 msr, u64 val, int failed) {}
static inline void do_trace_rdpmc(u32 msr, u64 val, int failed) {}
#endif
+/*
+ * Called only from an MSR fault handler, the instruction pointer points to
+ * the MSR access instruction that caused the fault.
+ */
+static __always_inline bool is_msr_imm_insn(void *ip)
+{
+ /*
+ * A full decoder for immediate form MSR instructions appears excessive.
+ */
+#ifdef CONFIG_X86_64
+ const u8 msr_imm_insn_prefix[] = { 0xc4, 0xe7 };
+
+ return !memcmp(ip, msr_imm_insn_prefix, sizeof(msr_imm_insn_prefix));
+#else
+ return false;
+#endif
+}
+
/*
* __rdmsr() and __wrmsr() are the two primitives which are the bare minimum MSR
* accessors and should not have any tracing or other functionality piggybacking
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 2fdc1f1f5adb..c021e4dbc69d 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -166,23 +166,52 @@ static bool ex_handler_uaccess(const struct exception_table_entry *fixup,
static bool ex_handler_msr(const struct exception_table_entry *fixup,
struct pt_regs *regs, bool wrmsr, bool safe, int reg)
{
+ bool imm_insn = is_msr_imm_insn((void *)regs->ip);
+ u32 msr;
+
+ if (imm_insn)
+ /*
+ * The 32-bit immediate specifying a MSR is encoded into
+ * byte 5 ~ 8 of an immediate form MSR instruction.
+ */
+ msr = *(u32 *)(regs->ip + 5);
+ else
+ msr = (u32)regs->cx;
+
if (__ONCE_LITE_IF(!safe && wrmsr)) {
- pr_warn("unchecked MSR access error: WRMSR to 0x%x (tried to write 0x%08x%08x) at rIP: 0x%lx (%pS)\n",
- (unsigned int)regs->cx, (unsigned int)regs->dx,
- (unsigned int)regs->ax, regs->ip, (void *)regs->ip);
+ /*
+ * To maintain consistency with existing RDMSR and WRMSR(NS) instructions,
+ * the register operand for immediate form MSR instructions is ALWAYS
+ * encoded as RAX in <asm/msr.h> for reading or writing the MSR value.
+ */
+ u64 msr_val = regs->ax;
+
+ if (!imm_insn) {
+ /*
+ * On processors that support the Intel 64 architecture, the
+ * high-order 32 bits of each of RAX and RDX are ignored.
+ */
+ msr_val &= 0xffffffff;
+ msr_val |= (u64)regs->dx << 32;
+ }
+
+ pr_warn("unchecked MSR access error: WRMSR to 0x%x (tried to write 0x%016llx) at rIP: 0x%lx (%pS)\n",
+ msr, msr_val, regs->ip, (void *)regs->ip);
show_stack_regs(regs);
}
if (__ONCE_LITE_IF(!safe && !wrmsr)) {
pr_warn("unchecked MSR access error: RDMSR from 0x%x at rIP: 0x%lx (%pS)\n",
- (unsigned int)regs->cx, regs->ip, (void *)regs->ip);
+ msr, regs->ip, (void *)regs->ip);
show_stack_regs(regs);
}
if (!wrmsr) {
/* Pretend that the read succeeded and returned 0. */
regs->ax = 0;
- regs->dx = 0;
+
+ if (!imm_insn)
+ regs->dx = 0;
}
if (safe)
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 08/12] x86/extable: Add support for immediate form MSR instructions
2025-09-30 7:03 ` [PATCH v2 08/12] x86/extable: Add support for " Juergen Gross
@ 2025-09-30 8:14 ` Peter Zijlstra
2025-09-30 8:31 ` Jürgen Groß
0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2025-09-30 8:14 UTC (permalink / raw)
To: Juergen Gross
Cc: linux-kernel, x86, xin, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, Andy Lutomirski
On Tue, Sep 30, 2025 at 09:03:52AM +0200, Juergen Gross wrote:
> From: "Xin Li (Intel)" <xin@zytor.com>
>
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V2:
> - new patch, taken from the RFC v2 MSR refactor series by Xin Li
> ---
> arch/x86/include/asm/msr.h | 18 ++++++++++++++++++
> arch/x86/mm/extable.c | 39 +++++++++++++++++++++++++++++++++-----
> 2 files changed, 52 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
> index 71f41af11591..cf5205300266 100644
> --- a/arch/x86/include/asm/msr.h
> +++ b/arch/x86/include/asm/msr.h
> @@ -56,6 +56,24 @@ static inline void do_trace_read_msr(u32 msr, u64 val, int failed) {}
> static inline void do_trace_rdpmc(u32 msr, u64 val, int failed) {}
> #endif
>
> +/*
> + * Called only from an MSR fault handler, the instruction pointer points to
> + * the MSR access instruction that caused the fault.
> + */
> +static __always_inline bool is_msr_imm_insn(void *ip)
> +{
> + /*
> + * A full decoder for immediate form MSR instructions appears excessive.
> + */
> +#ifdef CONFIG_X86_64
> + const u8 msr_imm_insn_prefix[] = { 0xc4, 0xe7 };
> +
> + return !memcmp(ip, msr_imm_insn_prefix, sizeof(msr_imm_insn_prefix));
This seems fragile. Those two bytes are basically the first two bytes of
VEX3 and only indicate VEX3.map7. Which is not very specific, but when
combined with the fact that this is an MSR exception, might just work.
Trouble is that it is also possible to encode the immediate form using
EVEX. If a toolchain were to do that, we'd fail to detect it.
(And there is segment prefix stuffing possible I suppose)
> +#else
> + return false;
> +#endif
> +}
> +
> /*
> * __rdmsr() and __wrmsr() are the two primitives which are the bare minimum MSR
> * accessors and should not have any tracing or other functionality piggybacking
> diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
> index 2fdc1f1f5adb..c021e4dbc69d 100644
> --- a/arch/x86/mm/extable.c
> +++ b/arch/x86/mm/extable.c
> @@ -166,23 +166,52 @@ static bool ex_handler_uaccess(const struct exception_table_entry *fixup,
> static bool ex_handler_msr(const struct exception_table_entry *fixup,
> struct pt_regs *regs, bool wrmsr, bool safe, int reg)
> {
> + bool imm_insn = is_msr_imm_insn((void *)regs->ip);
> + u32 msr;
> +
> + if (imm_insn)
> + /*
> + * The 32-bit immediate specifying a MSR is encoded into
> + * byte 5 ~ 8 of an immediate form MSR instruction.
> + */
> + msr = *(u32 *)(regs->ip + 5);
> + else
> + msr = (u32)regs->cx;
This seems to have fallen subject to the US tariff induced {} shortage
or something.
Also, EVEX form will have them in other bytes (one further, on account
of EVEX being 4 bytes, rather than 3).
Given this really isn't a fast path or anything, how about we just use
the instruction decoder? Something like:
struct insn insn;
u32 msr = (u32)regs->cx;
ret = insn_decode_kernel(&insn, (void *)regs->ip);
if (!ret && insn.vex_prefix.nbytes)
msr = insn.immediate.value;
should do, I suppose. Isn't that both simpler and more robust?
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 08/12] x86/extable: Add support for immediate form MSR instructions
2025-09-30 8:14 ` Peter Zijlstra
@ 2025-09-30 8:31 ` Jürgen Groß
0 siblings, 0 replies; 43+ messages in thread
From: Jürgen Groß @ 2025-09-30 8:31 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, x86, xin, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, Andy Lutomirski
On 30.09.25 10:14, Peter Zijlstra wrote:
> On Tue, Sep 30, 2025 at 09:03:52AM +0200, Juergen Gross wrote:
>> From: "Xin Li (Intel)" <xin@zytor.com>
>>
>> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V2:
>> - new patch, taken from the RFC v2 MSR refactor series by Xin Li
>> ---
>> arch/x86/include/asm/msr.h | 18 ++++++++++++++++++
>> arch/x86/mm/extable.c | 39 +++++++++++++++++++++++++++++++++-----
>> 2 files changed, 52 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
>> index 71f41af11591..cf5205300266 100644
>> --- a/arch/x86/include/asm/msr.h
>> +++ b/arch/x86/include/asm/msr.h
>> @@ -56,6 +56,24 @@ static inline void do_trace_read_msr(u32 msr, u64 val, int failed) {}
>> static inline void do_trace_rdpmc(u32 msr, u64 val, int failed) {}
>> #endif
>>
>> +/*
>> + * Called only from an MSR fault handler, the instruction pointer points to
>> + * the MSR access instruction that caused the fault.
>> + */
>> +static __always_inline bool is_msr_imm_insn(void *ip)
>> +{
>> + /*
>> + * A full decoder for immediate form MSR instructions appears excessive.
>> + */
>> +#ifdef CONFIG_X86_64
>> + const u8 msr_imm_insn_prefix[] = { 0xc4, 0xe7 };
>> +
>> + return !memcmp(ip, msr_imm_insn_prefix, sizeof(msr_imm_insn_prefix));
>
> This seems fragile. Those two bytes are basically the first two bytes of
> VEX3 and only indicate VEX3.map7. Which is not very specific, but when
> combined with the fact that this is an MSR exception, might just work.
>
> Trouble is that it is also possible to encode the immediate form using
> EVEX. If a toolchain were to do that, we'd fail to detect it.
>
> (And there is segment prefix stuffing possible I suppose)
>
>> +#else
>> + return false;
>> +#endif
>> +}
>> +
>> /*
>> * __rdmsr() and __wrmsr() are the two primitives which are the bare minimum MSR
>> * accessors and should not have any tracing or other functionality piggybacking
>> diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
>> index 2fdc1f1f5adb..c021e4dbc69d 100644
>> --- a/arch/x86/mm/extable.c
>> +++ b/arch/x86/mm/extable.c
>> @@ -166,23 +166,52 @@ static bool ex_handler_uaccess(const struct exception_table_entry *fixup,
>> static bool ex_handler_msr(const struct exception_table_entry *fixup,
>> struct pt_regs *regs, bool wrmsr, bool safe, int reg)
>> {
>> + bool imm_insn = is_msr_imm_insn((void *)regs->ip);
>> + u32 msr;
>> +
>> + if (imm_insn)
>> + /*
>> + * The 32-bit immediate specifying a MSR is encoded into
>> + * byte 5 ~ 8 of an immediate form MSR instruction.
>> + */
>> + msr = *(u32 *)(regs->ip + 5);
>> + else
>> + msr = (u32)regs->cx;
>
> This seems to have fallen subject to the US tariff induced {} shortage
> or something.
>
> Also, EVEX form will have them in other bytes (one further, on account
> of EVEX being 4 bytes, rather than 3).
>
> Given this really isn't a fast path or anything, how about we just use
> the instruction decoder? Something like:
>
> struct insn insn;
> u32 msr = (u32)regs->cx;
>
> ret = insn_decode_kernel(&insn, (void *)regs->ip);
> if (!ret && insn.vex_prefix.nbytes)
> msr = insn.immediate.value;
>
> should do, I suppose. Isn't that both simpler and more robust?
>
Works for me.
Juergen
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR
2025-09-30 7:03 [PATCH v2 00/12] x86/msr: Inline rdmsr/wrmsr instructions Juergen Gross
` (7 preceding siblings ...)
2025-09-30 7:03 ` [PATCH v2 08/12] x86/extable: Add support for " Juergen Gross
@ 2025-09-30 7:03 ` Juergen Gross
2025-09-30 8:31 ` Peter Zijlstra
2025-09-30 16:00 ` Sean Christopherson
2025-09-30 7:03 ` [PATCH v2 10/12] x86/msr: Use the alternatives mechanism for RDMSR Juergen Gross
` (3 subsequent siblings)
12 siblings, 2 replies; 43+ messages in thread
From: Juergen Gross @ 2025-09-30 7:03 UTC (permalink / raw)
To: linux-kernel, x86, llvm
Cc: xin, Juergen Gross, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, Nathan Chancellor, Nick Desaulniers,
Bill Wendling, Justin Stitt
When available use one of the non-serializing WRMSR variants (WRMSRNS
with or without an immediate operand specifying the MSR register) in
__wrmsrq().
For the safe/unsafe variants, make __wrmsrq() a common base function
instead of duplicating the ALTERNATIVE*() macros. This requires
letting native_wrmsr() use native_wrmsrq() instead of __wrmsrq().
While changing this, convert native_wrmsr() into an inline function.
Replace the only call of wrmsrns() with the now equivalent call to
native_wrmsrq() and remove wrmsrns().
The paravirt case will be handled later.
Originally-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- new patch, partially taken from "[RFC PATCH v2 21/34] x86/msr: Utilize
the alternatives mechanism to write MSR" by Xin Li.
---
arch/x86/include/asm/fred.h | 2 +-
arch/x86/include/asm/msr.h | 144 +++++++++++++++++++++++++++---------
2 files changed, 110 insertions(+), 36 deletions(-)
diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index 12b34d5b2953..8ae4429e5401 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -101,7 +101,7 @@ static __always_inline void fred_update_rsp0(void)
unsigned long rsp0 = (unsigned long) task_stack_page(current) + THREAD_SIZE;
if (cpu_feature_enabled(X86_FEATURE_FRED) && (__this_cpu_read(fred_rsp0) != rsp0)) {
- wrmsrns(MSR_IA32_FRED_RSP0, rsp0);
+ native_wrmsrq(MSR_IA32_FRED_RSP0, rsp0);
__this_cpu_write(fred_rsp0, rsp0);
}
}
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index cf5205300266..19ed780c2a09 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -7,11 +7,11 @@
#ifndef __ASSEMBLER__
#include <asm/asm.h>
-#include <asm/errno.h>
#include <asm/cpumask.h>
#include <uapi/asm/msr.h>
#include <asm/shared/msr.h>
+#include <linux/errno.h>
#include <linux/types.h>
#include <linux/percpu.h>
@@ -56,6 +56,36 @@ static inline void do_trace_read_msr(u32 msr, u64 val, int failed) {}
static inline void do_trace_rdpmc(u32 msr, u64 val, int failed) {}
#endif
+/* The GNU Assembler (Gas) with Binutils 2.40 adds WRMSRNS support */
+#if defined(CONFIG_AS_IS_GNU) && CONFIG_AS_VERSION >= 24000
+#define ASM_WRMSRNS "wrmsrns"
+#else
+#define ASM_WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
+#endif
+
+/* The GNU Assembler (Gas) with Binutils 2.41 adds the .insn directive support */
+#if defined(CONFIG_AS_IS_GNU) && CONFIG_AS_VERSION >= 24100
+#define ASM_WRMSRNS_IMM \
+ " .insn VEX.128.F3.M7.W0 0xf6 /0, %[val], %[msr]%{:u32}\n\t"
+#else
+/*
+ * Note, clang also doesn't support the .insn directive.
+ *
+ * The register operand is encoded as %rax because all uses of the immediate
+ * form MSR access instructions reference %rax as the register operand.
+ */
+#define ASM_WRMSRNS_IMM \
+ " .byte 0xc4,0xe7,0x7a,0xf6,0xc0; .long %c[msr]"
+#endif
+
+#define PREPARE_RDX_FOR_WRMSR \
+ "mov %%rax, %%rdx\n\t" \
+ "shr $0x20, %%rdx\n\t"
+
+#define PREPARE_RCX_RDX_FOR_WRMSR \
+ "mov %[msr], %%ecx\n\t" \
+ PREPARE_RDX_FOR_WRMSR
+
/*
* Called only from an MSR fault handler, the instruction pointer points to
* the MSR access instruction that caused the fault.
@@ -93,12 +123,76 @@ static __always_inline u64 __rdmsr(u32 msr)
return EAX_EDX_VAL(val, low, high);
}
-static __always_inline void __wrmsrq(u32 msr, u64 val)
+static __always_inline bool __wrmsrq_variable(u32 msr, u64 val, int type)
{
- asm volatile("1: wrmsr\n"
- "2:\n"
- _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
- : : "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32)) : "memory");
+#ifdef CONFIG_X86_64
+ BUILD_BUG_ON(__builtin_constant_p(msr));
+#endif
+
+ /*
+ * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a redundant
+ * DS prefix to avoid a trailing NOP.
+ */
+ asm_inline volatile goto(
+ "1:\n"
+ ALTERNATIVE("ds wrmsr",
+ ASM_WRMSRNS,
+ X86_FEATURE_WRMSRNS)
+ _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type])
+
+ :
+ : "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32)), [type] "i" (type)
+ : "memory"
+ : badmsr);
+
+ return false;
+
+badmsr:
+ return true;
+}
+
+#ifdef CONFIG_X86_64
+/*
+ * Non-serializing WRMSR or its immediate form, when available.
+ *
+ * Otherwise, it falls back to a serializing WRMSR.
+ */
+static __always_inline bool __wrmsrq_constant(u32 msr, u64 val, int type)
+{
+ BUILD_BUG_ON(!__builtin_constant_p(msr));
+
+ asm_inline volatile goto(
+ "1:\n"
+ ALTERNATIVE_2(PREPARE_RCX_RDX_FOR_WRMSR
+ "2: ds wrmsr",
+ PREPARE_RCX_RDX_FOR_WRMSR
+ ASM_WRMSRNS,
+ X86_FEATURE_WRMSRNS,
+ ASM_WRMSRNS_IMM,
+ X86_FEATURE_MSR_IMM)
+ _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For WRMSRNS immediate */
+ _ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type]) /* For WRMSR(NS) */
+
+ :
+ : [val] "a" (val), [msr] "i" (msr), [type] "i" (type)
+ : "memory", "ecx", "rdx"
+ : badmsr);
+
+ return false;
+
+badmsr:
+ return true;
+}
+#endif
+
+static __always_inline bool __wrmsrq(u32 msr, u64 val, int type)
+{
+#ifdef CONFIG_X86_64
+ if (__builtin_constant_p(msr))
+ return __wrmsrq_constant(msr, val, type);
+#endif
+
+ return __wrmsrq_variable(msr, val, type);
}
#define native_rdmsr(msr, val1, val2) \
@@ -113,11 +207,15 @@ static __always_inline u64 native_rdmsrq(u32 msr)
return __rdmsr(msr);
}
-#define native_wrmsr(msr, low, high) \
- __wrmsrq((msr), (u64)(high) << 32 | (low))
+static __always_inline void native_wrmsrq(u32 msr, u64 val)
+{
+ __wrmsrq(msr, val, EX_TYPE_WRMSR);
+}
-#define native_wrmsrq(msr, val) \
- __wrmsrq((msr), (val))
+static __always_inline void native_wrmsr(u32 msr, u32 low, u32 high)
+{
+ native_wrmsrq(msr, (u64)high << 32 | low);
+}
static inline u64 native_read_msr(u32 msr)
{
@@ -149,15 +247,7 @@ static inline void notrace native_write_msr(u32 msr, u64 val)
/* Can be uninlined because referenced by paravirt */
static inline int notrace native_write_msr_safe(u32 msr, u64 val)
{
- int err;
-
- asm volatile("1: wrmsr ; xor %[err],%[err]\n"
- "2:\n\t"
- _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_WRMSR_SAFE, %[err])
- : [err] "=a" (err)
- : "c" (msr), "0" ((u32)val), "d" ((u32)(val >> 32))
- : "memory");
- return err;
+ return __wrmsrq(msr, val, EX_TYPE_WRMSR_SAFE) ? -EIO : 0;
}
extern int rdmsr_safe_regs(u32 regs[8]);
@@ -176,7 +266,6 @@ static inline u64 native_read_pmc(int counter)
#ifdef CONFIG_PARAVIRT_XXL
#include <asm/paravirt.h>
#else
-#include <linux/errno.h>
static __always_inline u64 read_msr(u32 msr)
{
return native_read_msr(msr);
@@ -268,21 +357,6 @@ static inline int wrmsrq_safe(u32 msr, u64 val)
return err;
}
-/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
-#define ASM_WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
-
-/* Non-serializing WRMSR, when available. Falls back to a serializing WRMSR. */
-static __always_inline void wrmsrns(u32 msr, u64 val)
-{
- /*
- * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a redundant
- * DS prefix to avoid a trailing NOP.
- */
- asm volatile("1: " ALTERNATIVE("ds wrmsr", ASM_WRMSRNS, X86_FEATURE_WRMSRNS)
- "2: " _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
- : : "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32)));
-}
-
static inline void wrmsr(u32 msr, u32 low, u32 high)
{
wrmsrq(msr, (u64)high << 32 | low);
--
2.51.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR
2025-09-30 7:03 ` [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR Juergen Gross
@ 2025-09-30 8:31 ` Peter Zijlstra
2025-09-30 8:46 ` Jürgen Groß
2025-09-30 16:00 ` Sean Christopherson
1 sibling, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2025-09-30 8:31 UTC (permalink / raw)
To: Juergen Gross
Cc: linux-kernel, x86, llvm, xin, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt
On Tue, Sep 30, 2025 at 09:03:53AM +0200, Juergen Gross wrote:
> +static __always_inline bool __wrmsrq_constant(u32 msr, u64 val, int type)
> +{
> + BUILD_BUG_ON(!__builtin_constant_p(msr));
> +
> + asm_inline volatile goto(
> + "1:\n"
> + ALTERNATIVE_2(PREPARE_RCX_RDX_FOR_WRMSR
> + "2: ds wrmsr",
> + PREPARE_RCX_RDX_FOR_WRMSR
> + ASM_WRMSRNS,
> + X86_FEATURE_WRMSRNS,
> + ASM_WRMSRNS_IMM,
> + X86_FEATURE_MSR_IMM)
> + _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For WRMSRNS immediate */
> + _ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type]) /* For WRMSR(NS) */
> +
> + :
> + : [val] "a" (val), [msr] "i" (msr), [type] "i" (type)
> + : "memory", "ecx", "rdx"
> + : badmsr);
> +
> + return false;
> +
> +badmsr:
> + return true;
> +}
Just wondering, would something like this work?
asm_inline volatile goto(
"1:\n"
ALTERNATIVE(PREPARE_RCX_RDX_FOR_WRMSR
"2:\n"
ALTERNATIVE("ds wrmsr",
ASM_WRMSRNS, X86_FEATURE_WRMSRNS),
ASM_WRMSRNS_IMM, X86_FEATURE_MSR_IMM);
_ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For WRMSRNS immediate */
_ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type]) /* For WRMSR(NS) */
:
: [val] "a" (val), [msr] "i" (msr), [type] "i" (type)
: "memory", "ecx", "rdx"
: badmsr);
It's a bit weird because the nested alternative isn't for the exact same
position, I suppose. But I find it a more readable form.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR
2025-09-30 8:31 ` Peter Zijlstra
@ 2025-09-30 8:46 ` Jürgen Groß
2025-09-30 8:50 ` Peter Zijlstra
2025-10-01 8:49 ` Juergen Gross
0 siblings, 2 replies; 43+ messages in thread
From: Jürgen Groß @ 2025-09-30 8:46 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, x86, llvm, xin, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt
On 30.09.25 10:31, Peter Zijlstra wrote:
> On Tue, Sep 30, 2025 at 09:03:53AM +0200, Juergen Gross wrote:
>
>> +static __always_inline bool __wrmsrq_constant(u32 msr, u64 val, int type)
>> +{
>> + BUILD_BUG_ON(!__builtin_constant_p(msr));
>> +
>> + asm_inline volatile goto(
>> + "1:\n"
>> + ALTERNATIVE_2(PREPARE_RCX_RDX_FOR_WRMSR
>> + "2: ds wrmsr",
>> + PREPARE_RCX_RDX_FOR_WRMSR
>> + ASM_WRMSRNS,
>> + X86_FEATURE_WRMSRNS,
>> + ASM_WRMSRNS_IMM,
>> + X86_FEATURE_MSR_IMM)
>> + _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For WRMSRNS immediate */
>> + _ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type]) /* For WRMSR(NS) */
>> +
>> + :
>> + : [val] "a" (val), [msr] "i" (msr), [type] "i" (type)
>> + : "memory", "ecx", "rdx"
>> + : badmsr);
>> +
>> + return false;
>> +
>> +badmsr:
>> + return true;
>> +}
>
> Just wondering, would something this work?
>
> asm_inline volatile goto(
> "1:\n"
> ALTERNATIVE(PREPARE_RCX_RDX_FOR_WRMSR
> "2:\n"
> ALTERNATIVE("ds wrmsr",
> ASM_WRMSRNS, X86_FEATURE_WRMSRNS),
> ASM_WRMSRNS_IMM, X86_FEATURE_MSR_IMM);
> _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For WRMSRNS immediate */
> _ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type]) /* For WRMSR(NS) */
>
> :
> : [val] "a" (val), [msr] "i" (msr), [type] "i" (type)
> : "memory", "ecx", "rdx"
> : badmsr);
>
> Its a bit weird because the nested alternative isn't for the exact same
> position I suppose. But I find it a more readable form.
I don't think it would work. Nested ALTERNATIVE()s only work with
all of them starting at the same location. Have a look at the
ALTERNATIVE() macro, which defines the label "771" via OLDINSTR()
and then refers to this label via ALTINSTR_ENTRY(). In your case
the ALTINSTR_ENTRY() of the outer ALTERNATIVE() invocation would find
the wrong "771" label (the one of the inner ALTERNATIVE()).
Allowing such constructs would probably require switching from preprocessor
macros to assembler macros.
Juergen
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR
2025-09-30 8:46 ` Jürgen Groß
@ 2025-09-30 8:50 ` Peter Zijlstra
2025-09-30 12:51 ` Peter Zijlstra
2025-10-01 8:49 ` Juergen Gross
1 sibling, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2025-09-30 8:50 UTC (permalink / raw)
To: Jürgen Groß
Cc: linux-kernel, x86, llvm, xin, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt
On Tue, Sep 30, 2025 at 10:46:23AM +0200, Jürgen Groß wrote:
> On 30.09.25 10:31, Peter Zijlstra wrote:
> > On Tue, Sep 30, 2025 at 09:03:53AM +0200, Juergen Gross wrote:
> >
> > > +static __always_inline bool __wrmsrq_constant(u32 msr, u64 val, int type)
> > > +{
> > > + BUILD_BUG_ON(!__builtin_constant_p(msr));
> > > +
> > > + asm_inline volatile goto(
> > > + "1:\n"
> > > + ALTERNATIVE_2(PREPARE_RCX_RDX_FOR_WRMSR
> > > + "2: ds wrmsr",
> > > + PREPARE_RCX_RDX_FOR_WRMSR
> > > + ASM_WRMSRNS,
> > > + X86_FEATURE_WRMSRNS,
> > > + ASM_WRMSRNS_IMM,
> > > + X86_FEATURE_MSR_IMM)
> > > + _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For WRMSRNS immediate */
> > > + _ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type]) /* For WRMSR(NS) */
> > > +
> > > + :
> > > + : [val] "a" (val), [msr] "i" (msr), [type] "i" (type)
> > > + : "memory", "ecx", "rdx"
> > > + : badmsr);
> > > +
> > > + return false;
> > > +
> > > +badmsr:
> > > + return true;
> > > +}
> >
> > Just wondering, would something this work?
> >
> > asm_inline volatile goto(
> > "1:\n"
> > ALTERNATIVE(PREPARE_RCX_RDX_FOR_WRMSR
> > "2:\n"
> > ALTERNATIVE("ds wrmsr",
> > ASM_WRMSRNS, X86_FEATURE_WRMSRNS),
> > ASM_WRMSRNS_IMM, X86_FEATURE_MSR_IMM);
> > _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For WRMSRNS immediate */
> > _ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type]) /* For WRMSR(NS) */
> >
> > :
> > : [val] "a" (val), [msr] "i" (msr), [type] "i" (type)
> > : "memory", "ecx", "rdx"
> > : badmsr);
> >
> > Its a bit weird because the nested alternative isn't for the exact same
> > position I suppose. But I find it a more readable form.
>
> I don't think it would work. Nested ALTERNATIVE()s do work only with
> all of them starting at the same location. Have a look at the
> ALTERNATIVE() macro, which is defining the label "771" via OLDINSTR()
> and then referring to this label via ALTINSTR_ENTRY(). In your case
> the ALTINSTR_ENTRY() of the outer ALTERNATIVE() invocation would find
> the wrong "771" label (the one of the inner ALTERNATIVE()).
>
> Allowing such constructs would probably require switching from preprocessor
> macros to assembler macros.
Right, I was looking at the asm macros.
As long as the inner comes first in the apply list it should all just
work, except you get the double patching back.
Using the asm macros isn't going to make it more readable though.
Oh well, let's forget about this :-)
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR
2025-09-30 8:50 ` Peter Zijlstra
@ 2025-09-30 12:51 ` Peter Zijlstra
2025-09-30 15:42 ` Jürgen Groß
0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2025-09-30 12:51 UTC (permalink / raw)
To: Jürgen Groß
Cc: linux-kernel, x86, llvm, xin, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt
On Tue, Sep 30, 2025 at 10:50:44AM +0200, Peter Zijlstra wrote:
> On Tue, Sep 30, 2025 at 10:46:23AM +0200, Jürgen Groß wrote:
> > > asm_inline volatile goto(
> > > "1:\n"
> > > ALTERNATIVE(PREPARE_RCX_RDX_FOR_WRMSR
> > > "2:\n"
> > > ALTERNATIVE("ds wrmsr",
> > > ASM_WRMSRNS, X86_FEATURE_WRMSRNS),
> > > ASM_WRMSRNS_IMM, X86_FEATURE_MSR_IMM);
> > > _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For WRMSRNS immediate */
> > > _ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type]) /* For WRMSR(NS) */
> > >
> > > :
> > > : [val] "a" (val), [msr] "i" (msr), [type] "i" (type)
> > > : "memory", "ecx", "rdx"
> > > : badmsr);
> > >
> Oh well, lets forget about this :-)
So I couldn't. I tried the below, which, when building a .i, generates
the following:
static inline __attribute__((__gnu_inline__)) __attribute__((__unused__)) __attribute__((no_instrument_function)) void clear_page(void *page)
{
kmsan_unpoison_memory(page, ((1UL) << 12));
asm __inline volatile(
"# ALT: oldinstr\n"
"__UNIQUE_ID_altinstr_9" "_begin:\n\t"
"# ALT: oldinstr\n"
"__UNIQUE_ID_altinstr_8" "_begin:\n\t"
"call %c[old]" "\n"
"__UNIQUE_ID_altinstr_8" "_pad:\n"
"# ALT: padding\n"
".skip -(((" "__UNIQUE_ID_altinstr_8" "_alt_end - " "__UNIQUE_ID_altinstr_8" "_alt_begin" ")-(" "__UNIQUE_ID_altinstr_8" "_pad - " "__UNIQUE_ID_altinstr_8" "_begin" ")) > 0) * "
"((" "__UNIQUE_ID_altinstr_8" "_alt_end - " "__UNIQUE_ID_altinstr_8" "_alt_begin" ")-(" "__UNIQUE_ID_altinstr_8" "_pad - " "__UNIQUE_ID_altinstr_8" "_begin" ")),0x90\n"
"__UNIQUE_ID_altinstr_8" "_end:\n"
".pushsection .altinstructions,\"a\"\n"
" .long " "__UNIQUE_ID_altinstr_8" "_begin - .\n"
" .long " "__UNIQUE_ID_altinstr_8" "_alt_begin - .\n"
" .4byte " "( 3*32+16)" "\n"
" .byte " "__UNIQUE_ID_altinstr_8" "_end - " "__UNIQUE_ID_altinstr_8" "_begin" "\n"
" .byte " "__UNIQUE_ID_altinstr_8" "_alt_end - " "__UNIQUE_ID_altinstr_8" "_alt_begin" "\n"
".popsection\n"
".pushsection .altinstr_replacement, \"ax\"\n"
"# ALT: replacement\n"
"__UNIQUE_ID_altinstr_8" "_alt_begin:\n\t"
"call %c[new1]" "\n"
"__UNIQUE_ID_altinstr_8" "_alt_end:\n"
".popsection\n"
"\n"
"__UNIQUE_ID_altinstr_9" "_pad:\n"
"# ALT: padding\n"
".skip -(((" "__UNIQUE_ID_altinstr_9" "_alt_end - " "__UNIQUE_ID_altinstr_9" "_alt_begin" ")-(" "__UNIQUE_ID_altinstr_9" "_pad - " "__UNIQUE_ID_altinstr_9" "_begin" ")) > 0) * "
"((" "__UNIQUE_ID_altinstr_9" "_alt_end - " "__UNIQUE_ID_altinstr_9" "_alt_begin" ")-(" "__UNIQUE_ID_altinstr_9" "_pad - " "__UNIQUE_ID_altinstr_9" "_begin" ")),0x90\n"
"__UNIQUE_ID_altinstr_9" "_end:\n"
".pushsection .altinstructions,\"a\"\n"
" .long " "__UNIQUE_ID_altinstr_9" "_begin - .\n"
" .long " "__UNIQUE_ID_altinstr_9" "_alt_begin - .\n"
" .4byte " "( 9*32+ 9)" "\n"
" .byte " "__UNIQUE_ID_altinstr_9" "_end - " "__UNIQUE_ID_altinstr_9" "_begin" "\n"
" .byte " "__UNIQUE_ID_altinstr_9" "_alt_end - " "__UNIQUE_ID_altinstr_9" "_alt_begin" "\n"
".popsection\n"
".pushsection .altinstr_replacement, \"ax\"\n"
"# ALT: replacement\n"
"__UNIQUE_ID_altinstr_9" "_alt_begin:\n\t"
"call %c[new2]" "\n"
"__UNIQUE_ID_altinstr_9" "_alt_end:\n"
".popsection\n"
: "+r" (current_stack_pointer), "=D" (page)
: [old] "i" (clear_page_orig), [new1] "i" (clear_page_rep), [new2] "i" (clear_page_erms) , "D" (page)
: "cc", "memory", "rax", "rcx")
;
}
Which looks right, but utterly fails to build :(
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 15bc07a5ebb3..12a93982457a 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -5,6 +5,7 @@
#include <linux/types.h>
#include <linux/stringify.h>
#include <linux/objtool.h>
+#include <linux/compiler.h>
#include <asm/asm.h>
#include <asm/bug.h>
@@ -184,38 +185,41 @@ static inline int alternatives_text_reserved(void *start, void *end)
#define ALT_CALL_INSTR "call BUG_func"
-#define alt_slen "772b-771b"
-#define alt_total_slen "773b-771b"
-#define alt_rlen "775f-774f"
+#define alt_slen(pfx) #pfx "_pad - " #pfx "_begin"
+#define alt_total_slen(pfx) #pfx "_end - " #pfx "_begin"
+#define alt_rlen(pfx) #pfx "_alt_end - " #pfx "_alt_begin"
-#define OLDINSTR(oldinstr) \
+#define OLDINSTR(oldinstr, pfx) \
"# ALT: oldinstr\n" \
- "771:\n\t" oldinstr "\n772:\n" \
+ #pfx "_begin:\n\t" oldinstr "\n" #pfx "_pad:\n" \
"# ALT: padding\n" \
- ".skip -(((" alt_rlen ")-(" alt_slen ")) > 0) * " \
- "((" alt_rlen ")-(" alt_slen ")),0x90\n" \
- "773:\n"
+ ".skip -(((" alt_rlen(pfx) ")-(" alt_slen(pfx) ")) > 0) * " \
+ "((" alt_rlen(pfx) ")-(" alt_slen(pfx) ")),0x90\n" \
+ #pfx "_end:\n"
-#define ALTINSTR_ENTRY(ft_flags) \
+#define ALTINSTR_ENTRY(ft_flags, pfx) \
".pushsection .altinstructions,\"a\"\n" \
- " .long 771b - .\n" /* label */ \
- " .long 774f - .\n" /* new instruction */ \
+ " .long " #pfx "_begin - .\n" /* label */ \
+ " .long " #pfx "_alt_begin - .\n" /* new instruction */ \
" .4byte " __stringify(ft_flags) "\n" /* feature + flags */ \
- " .byte " alt_total_slen "\n" /* source len */ \
- " .byte " alt_rlen "\n" /* replacement len */ \
+ " .byte " alt_total_slen(pfx) "\n" /* source len */ \
+ " .byte " alt_rlen(pfx) "\n" /* replacement len */ \
".popsection\n"
-#define ALTINSTR_REPLACEMENT(newinstr) /* replacement */ \
+#define ALTINSTR_REPLACEMENT(newinstr, pfx) /* replacement */ \
".pushsection .altinstr_replacement, \"ax\"\n" \
"# ALT: replacement\n" \
- "774:\n\t" newinstr "\n775:\n" \
+ #pfx "_alt_begin:\n\t" newinstr "\n" #pfx "_alt_end:\n" \
".popsection\n"
/* alternative assembly primitive: */
-#define ALTERNATIVE(oldinstr, newinstr, ft_flags) \
- OLDINSTR(oldinstr) \
- ALTINSTR_ENTRY(ft_flags) \
- ALTINSTR_REPLACEMENT(newinstr)
+#define __ALTERNATIVE(oldinstr, newinstr, ft_flags, pfx) \
+ OLDINSTR(oldinstr, pfx) \
+ ALTINSTR_ENTRY(ft_flags, pfx) \
+ ALTINSTR_REPLACEMENT(newinstr, pfx)
+
+#define ALTERNATIVE(oldinstr, newinstr, feat) \
+ __ALTERNATIVE(oldinstr, newinstr, feat, __UNIQUE_ID(altinstr))
#define ALTERNATIVE_2(oldinstr, newinstr1, ft_flags1, newinstr2, ft_flags2) \
ALTERNATIVE(ALTERNATIVE(oldinstr, newinstr1, ft_flags1), newinstr2, ft_flags2)
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 64ff73c533e5..d79552a61fda 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -163,7 +163,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
__asm__ ("" : "=r" (var) : "0" (var))
#endif
-#define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
+#define __UNIQUE_ID(prefix) __PASTE(__PASTE(__PASTE(__UNIQUE_ID_, prefix), _), __COUNTER__)
/**
* data_race - mark an expression as containing intentional data races
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR
2025-09-30 12:51 ` Peter Zijlstra
@ 2025-09-30 15:42 ` Jürgen Groß
2025-10-01 6:43 ` Peter Zijlstra
0 siblings, 1 reply; 43+ messages in thread
From: Jürgen Groß @ 2025-09-30 15:42 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, x86, llvm, xin, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt
On 30.09.25 14:51, Peter Zijlstra wrote:
> On Tue, Sep 30, 2025 at 10:50:44AM +0200, Peter Zijlstra wrote:
>> On Tue, Sep 30, 2025 at 10:46:23AM +0200, Jürgen Groß wrote:
>
>>>> asm_inline volatile goto(
>>>> "1:\n"
>>>> ALTERNATIVE(PREPARE_RCX_RDX_FOR_WRMSR
>>>> "2:\n"
>>>> ALTERNATIVE("ds wrmsr",
>>>> ASM_WRMSRNS, X86_FEATURE_WRMSRNS),
>>>> ASM_WRMSRNS_IMM, X86_FEATURE_MSR_IMM);
>>>> _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For WRMSRNS immediate */
>>>> _ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type]) /* For WRMSR(NS) */
>>>>
>>>> :
>>>> : [val] "a" (val), [msr] "i" (msr), [type] "i" (type)
>>>> : "memory", "ecx", "rdx"
>>>> : badmsr);
>>>>
>
>> Oh well, lets forget about this :-)
>
> So I couldn't. I tried the below, which when building a .i generates the
> following:
>
>
> static inline __attribute__((__gnu_inline__)) __attribute__((__unused__)) __attribute__((no_instrument_function)) void clear_page(void *page)
> {
>
> kmsan_unpoison_memory(page, ((1UL) << 12));
> asm __inline volatile(
> "# ALT: oldinstr\n"
> "__UNIQUE_ID_altinstr_9" "_begin:\n\t"
>
> "# ALT: oldinstr\n"
> "__UNIQUE_ID_altinstr_8" "_begin:\n\t"
> "call %c[old]" "\n"
> "__UNIQUE_ID_altinstr_8" "_pad:\n"
> "# ALT: padding\n"
> ".skip -(((" "__UNIQUE_ID_altinstr_8" "_alt_end - " "__UNIQUE_ID_altinstr_8" "_alt_begin" ")-(" "__UNIQUE_ID_altinstr_8" "_pad - " "__UNIQUE_ID_altinstr_8" "_begin" ")) > 0) * "
> "((" "__UNIQUE_ID_altinstr_8" "_alt_end - " "__UNIQUE_ID_altinstr_8" "_alt_begin" ")-(" "__UNIQUE_ID_altinstr_8" "_pad - " "__UNIQUE_ID_altinstr_8" "_begin" ")),0x90\n"
> "__UNIQUE_ID_altinstr_8" "_end:\n"
> ".pushsection .altinstructions,\"a\"\n"
> " .long " "__UNIQUE_ID_altinstr_8" "_begin - .\n"
> " .long " "__UNIQUE_ID_altinstr_8" "_alt_begin - .\n"
> " .4byte " "( 3*32+16)" "\n"
> " .byte " "__UNIQUE_ID_altinstr_8" "_end - " "__UNIQUE_ID_altinstr_8" "_begin" "\n"
> " .byte " "__UNIQUE_ID_altinstr_8" "_alt_end - " "__UNIQUE_ID_altinstr_8" "_alt_begin" "\n"
> ".popsection\n"
> ".pushsection .altinstr_replacement, \"ax\"\n"
> "# ALT: replacement\n"
> "__UNIQUE_ID_altinstr_8" "_alt_begin:\n\t"
> "call %c[new1]" "\n"
> "__UNIQUE_ID_altinstr_8" "_alt_end:\n"
> ".popsection\n"
> "\n"
>
> "__UNIQUE_ID_altinstr_9" "_pad:\n"
> "# ALT: padding\n"
> ".skip -(((" "__UNIQUE_ID_altinstr_9" "_alt_end - " "__UNIQUE_ID_altinstr_9" "_alt_begin" ")-(" "__UNIQUE_ID_altinstr_9" "_pad - " "__UNIQUE_ID_altinstr_9" "_begin" ")) > 0) * "
> "((" "__UNIQUE_ID_altinstr_9" "_alt_end - " "__UNIQUE_ID_altinstr_9" "_alt_begin" ")-(" "__UNIQUE_ID_altinstr_9" "_pad - " "__UNIQUE_ID_altinstr_9" "_begin" ")),0x90\n"
> "__UNIQUE_ID_altinstr_9" "_end:\n"
> ".pushsection .altinstructions,\"a\"\n"
> " .long " "__UNIQUE_ID_altinstr_9" "_begin - .\n"
> " .long " "__UNIQUE_ID_altinstr_9" "_alt_begin - .\n"
> " .4byte " "( 9*32+ 9)" "\n"
> " .byte " "__UNIQUE_ID_altinstr_9" "_end - " "__UNIQUE_ID_altinstr_9" "_begin" "\n"
> " .byte " "__UNIQUE_ID_altinstr_9" "_alt_end - " "__UNIQUE_ID_altinstr_9" "_alt_begin" "\n"
> ".popsection\n"
> ".pushsection .altinstr_replacement, \"ax\"\n"
> "# ALT: replacement\n"
> "__UNIQUE_ID_altinstr_9" "_alt_begin:\n\t"
> "call %c[new2]" "\n"
> "__UNIQUE_ID_altinstr_9" "_alt_end:\n"
> ".popsection\n"
> : "+r" (current_stack_pointer), "=D" (page)
> : [old] "i" (clear_page_orig), [new1] "i" (clear_page_rep), [new2] "i" (clear_page_erms) , "D" (page)
> : "cc", "memory", "rax", "rcx")
>
> ;
> }
>
> Which looks right, but utterly fails to build :(
What does the failure look like?
Could it be that the labels should be local ones?
Juergen
* Re: [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR
2025-09-30 15:42 ` Jürgen Groß
@ 2025-10-01 6:43 ` Peter Zijlstra
2025-10-01 7:23 ` Peter Zijlstra
0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2025-10-01 6:43 UTC (permalink / raw)
To: Jürgen Groß
Cc: linux-kernel, x86, llvm, xin, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt
On Tue, Sep 30, 2025 at 05:42:13PM +0200, Jürgen Groß wrote:
> Could it be that the labels should be local ones?
I already tried 's/#pfx/".L" #pfx/g' and that made no difference.
> What does the failure look like?
It's complaining about label redefinitions.
$ make O=defconfig-build/ arch/x86/kernel/alternative.o
../arch/x86/include/asm/cpufeature.h: Assembler messages:
../arch/x86/include/asm/cpufeature.h:102: Error: symbol `.L__UNIQUE_ID_altinstr_67_begin' is already defined
../arch/x86/include/asm/cpufeature.h:104: Error: symbol `.L__UNIQUE_ID_altinstr_66_begin' is already defined
../arch/x86/include/asm/cpufeature.h:106: Error: symbol `.L__UNIQUE_ID_altinstr_66_pad' is already defined
../arch/x86/include/asm/cpufeature.h:109: Error: symbol `.L__UNIQUE_ID_altinstr_66_end' is already defined
../arch/x86/include/asm/cpufeature.h:119: Error: symbol `.L__UNIQUE_ID_altinstr_66_alt_begin' is already defined
../arch/x86/include/asm/cpufeature.h:121: Error: symbol `.L__UNIQUE_ID_altinstr_66_alt_end' is already defined
../arch/x86/include/asm/cpufeature.h:124: Error: symbol `.L__UNIQUE_ID_altinstr_67_pad' is already defined
../arch/x86/include/asm/cpufeature.h:127: Error: symbol `.L__UNIQUE_ID_altinstr_67_end' is already defined
../arch/x86/include/asm/cpufeature.h:137: Error: symbol `.L__UNIQUE_ID_altinstr_67_alt_begin' is already defined
../arch/x86/include/asm/cpufeature.h:139: Error: symbol `.L__UNIQUE_ID_altinstr_67_alt_end' is already defined
../arch/x86/include/asm/cpufeature.h:102: Error: symbol `.L__UNIQUE_ID_altinstr_67_begin' is already defined
../arch/x86/include/asm/cpufeature.h:104: Error: symbol `.L__UNIQUE_ID_altinstr_66_begin' is already defined
../arch/x86/include/asm/cpufeature.h:106: Error: symbol `.L__UNIQUE_ID_altinstr_66_pad' is already defined
../arch/x86/include/asm/cpufeature.h:109: Error: symbol `.L__UNIQUE_ID_altinstr_66_end' is already defined
../arch/x86/include/asm/cpufeature.h:119: Error: symbol `.L__UNIQUE_ID_altinstr_66_alt_begin' is already defined
../arch/x86/include/asm/cpufeature.h:121: Error: symbol `.L__UNIQUE_ID_altinstr_66_alt_end' is already defined
../arch/x86/include/asm/cpufeature.h:124: Error: symbol `.L__UNIQUE_ID_altinstr_67_pad' is already defined
../arch/x86/include/asm/cpufeature.h:127: Error: symbol `.L__UNIQUE_ID_altinstr_67_end' is already defined
../arch/x86/include/asm/cpufeature.h:137: Error: symbol `.L__UNIQUE_ID_altinstr_67_alt_begin' is already defined
../arch/x86/include/asm/cpufeature.h:139: Error: symbol `.L__UNIQUE_ID_altinstr_67_alt_end' is already defined
../arch/x86/include/asm/cpufeature.h:102: Error: symbol `.L__UNIQUE_ID_altinstr_67_begin' is already defined
../arch/x86/include/asm/cpufeature.h:104: Error: symbol `.L__UNIQUE_ID_altinstr_66_begin' is already defined
../arch/x86/include/asm/cpufeature.h:106: Error: symbol `.L__UNIQUE_ID_altinstr_66_pad' is already defined
../arch/x86/include/asm/cpufeature.h:109: Error: symbol `.L__UNIQUE_ID_altinstr_66_end' is already defined
../arch/x86/include/asm/cpufeature.h:119: Error: symbol `.L__UNIQUE_ID_altinstr_66_alt_begin' is already defined
../arch/x86/include/asm/cpufeature.h:121: Error: symbol `.L__UNIQUE_ID_altinstr_66_alt_end' is already defined
../arch/x86/include/asm/cpufeature.h:124: Error: symbol `.L__UNIQUE_ID_altinstr_67_pad' is already defined
../arch/x86/include/asm/cpufeature.h:127: Error: symbol `.L__UNIQUE_ID_altinstr_67_end' is already defined
...
That specific one is this:
static inline __attribute__((__gnu_inline__)) __attribute__((__unused__)) __attribute__((no_instrument_function)) __attribute__((__always_inline__)) bool _static_cpu_has(u16 bit)
{
asm goto("# ALT: oldinstr\n"
".L" "__UNIQUE_ID_altinstr_67" "_begin:\n\t"
"# ALT: oldinstr\n"
".L" "__UNIQUE_ID_altinstr_66" "_begin:\n\t"
"jmp 6f" "\n"
".L" "__UNIQUE_ID_altinstr_66" "_pad:\n"
"# ALT: padding\n"
".skip -(((" ".L" "__UNIQUE_ID_altinstr_66" "_alt_end - " ".L" "__UNIQUE_ID_altinstr_66" "_alt_begin" ")-(" ".L" "__UNIQUE_ID_altinstr_66" "_pad - " ".L" "__UNIQUE_ID_altinstr_66" "_begin" ")) > 0) * "
"((" ".L" "__UNIQUE_ID_altinstr_66" "_alt_end - " ".L" "__UNIQUE_ID_altinstr_66" "_alt_begin" ")-(" ".L" "__UNIQUE_ID_altinstr_66" "_pad - " ".L" "__UNIQUE_ID_altinstr_66" "_begin" ")),0x90\n"
".L" "__UNIQUE_ID_altinstr_66" "_end:\n"
".pushsection .altinstructions,\"a\"\n"
" .long " ".L" "__UNIQUE_ID_altinstr_66" "_begin - .\n"
" .long " ".L" "__UNIQUE_ID_altinstr_66" "_alt_begin - .\n"
" .4byte " "( 3*32+21)" "\n"
" .byte " ".L" "__UNIQUE_ID_altinstr_66" "_end - " ".L" "__UNIQUE_ID_altinstr_66" "_begin" "\n"
" .byte " ".L" "__UNIQUE_ID_altinstr_66" "_alt_end - " ".L" "__UNIQUE_ID_altinstr_66" "_alt_begin" "\n"
".popsection\n"
".pushsection .altinstr_replacement, \"ax\"\n"
"# ALT: replacement\n"
".L" "__UNIQUE_ID_altinstr_66" "_alt_begin:\n\t"
"jmp %l[t_no]" "\n"
".L" "__UNIQUE_ID_altinstr_66" "_alt_end:\n"
".popsection\n"
"\n"
".L" "__UNIQUE_ID_altinstr_67" "_pad:\n"
"# ALT: padding\n"
".skip -(((" ".L" "__UNIQUE_ID_altinstr_67" "_alt_end - " ".L" "__UNIQUE_ID_altinstr_67" "_alt_begin" ")-(" ".L" "__UNIQUE_ID_altinstr_67" "_pad - " ".L" "__UNIQUE_ID_altinstr_67" "_begin" ")) > 0) * "
"((" ".L" "__UNIQUE_ID_altinstr_67" "_alt_end - " ".L" "__UNIQUE_ID_altinstr_67" "_alt_begin" ")-(" ".L" "__UNIQUE_ID_altinstr_67" "_pad - " ".L" "__UNIQUE_ID_altinstr_67" "_begin" ")),0x90\n"
".L" "__UNIQUE_ID_altinstr_67" "_end:\n"
".pushsection .altinstructions,\"a\"\n"
" .long " ".L" "__UNIQUE_ID_altinstr_67" "_begin - .\n"
" .long " ".L" "__UNIQUE_ID_altinstr_67" "_alt_begin - .\n"
" .4byte " "%c[feature]" "\n"
" .byte " ".L" "__UNIQUE_ID_altinstr_67" "_end - " ".L" "__UNIQUE_ID_altinstr_67" "_begin" "\n"
" .byte " ".L" "__UNIQUE_ID_altinstr_67" "_alt_end - " ".L" "__UNIQUE_ID_altinstr_67" "_alt_begin" "\n"
".popsection\n"
".pushsection .altinstr_replacement, \"ax\"\n"
"# ALT: replacement\n"
".L" "__UNIQUE_ID_altinstr_67" "_alt_begin:\n\t"
"" "\n"
".L" "__UNIQUE_ID_altinstr_67" "_alt_end:\n"
".popsection\n"
".pushsection .altinstr_aux,\"ax\"\n"
"6:\n"
" testb %[bitnum], %a[cap_byte]\n"
" jnz %l[t_yes]\n"
" jmp %l[t_no]\n"
".popsection\n"
: : [feature] "i" (bit),
[bitnum] "i" (1 << (bit & 7)),
[cap_byte] "i" (&((const char *)boot_cpu_data.x86_capability)[bit >> 3])
: : t_yes, t_no);
t_yes:
return true;
t_no:
return false;
}
What I'm thinking is happening is that the __always_inline__ is causing
multiple exact copies of the thing.
Let me see how terrible it all ends up when using as macros
* Re: [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR
2025-10-01 6:43 ` Peter Zijlstra
@ 2025-10-01 7:23 ` Peter Zijlstra
2025-10-03 14:23 ` Dave Hansen
0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2025-10-01 7:23 UTC (permalink / raw)
To: Jürgen Groß
Cc: linux-kernel, x86, llvm, xin, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt
On Wed, Oct 01, 2025 at 08:43:39AM +0200, Peter Zijlstra wrote:
> Let me see how terrible it all ends up when using as macros
Argh, as macros are differently painful. I hate computers :/
* Re: [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR
2025-10-01 7:23 ` Peter Zijlstra
@ 2025-10-03 14:23 ` Dave Hansen
2025-10-03 16:53 ` H. Peter Anvin
0 siblings, 1 reply; 43+ messages in thread
From: Dave Hansen @ 2025-10-03 14:23 UTC (permalink / raw)
To: Peter Zijlstra, Jürgen Groß
Cc: linux-kernel, x86, llvm, xin, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt
On 10/1/25 00:23, Peter Zijlstra wrote:
> On Wed, Oct 01, 2025 at 08:43:39AM +0200, Peter Zijlstra wrote:
>> Let me see how terrible it all ends up when using as macros
> Argh, as macros are differently painful. I hate computers :/
ALTERNATIVES are fun and all, but is there a good reason we're pulling
out our hair to use them here?
Normal WRMSR is slooooooooooow. The ones that aren't slow don't need
WRMSRNS in the first place.
Would an out-of-line wrmsr() with an if() in it be so bad? Or a
static_call()? Having WRMSR be inlined is a laudable goal, but I'm
really asking if it's worth it.
* Re: [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR
2025-10-03 14:23 ` Dave Hansen
@ 2025-10-03 16:53 ` H. Peter Anvin
0 siblings, 0 replies; 43+ messages in thread
From: H. Peter Anvin @ 2025-10-03 16:53 UTC (permalink / raw)
To: Dave Hansen, Peter Zijlstra, Jürgen Groß
Cc: linux-kernel, x86, llvm, xin, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, Nathan Chancellor, Nick Desaulniers,
Bill Wendling, Justin Stitt
On October 3, 2025 7:23:29 AM PDT, Dave Hansen <dave.hansen@intel.com> wrote:
>On 10/1/25 00:23, Peter Zijlstra wrote:
>> On Wed, Oct 01, 2025 at 08:43:39AM +0200, Peter Zijlstra wrote:
>>> Let me see how terrible it all ends up when using as macros
>> Argh, as macros are differently painful. I hate computers :/
>
>ALTERNATIVES are fun and all, but is there a good reason we're pulling
>out our hair to use them here?
>
>Normal WRMSR is slooooooooooow. The ones that aren't slow don't need
>WRMSRNS in the first place.
>
>Would an out-of-line wrmsr() with an if() in it be so bad? Or a
>static_call()? Having WRMSR be inlined in a laudable goal, but I'm
>really asking if it's worth it.
We need them to use wrmsrns immediate.
* Re: [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR
2025-09-30 8:46 ` Jürgen Groß
2025-09-30 8:50 ` Peter Zijlstra
@ 2025-10-01 8:49 ` Juergen Gross
2025-10-01 10:50 ` Peter Zijlstra
1 sibling, 1 reply; 43+ messages in thread
From: Juergen Gross @ 2025-10-01 8:49 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, x86, llvm, xin, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt
On 30.09.25 10:46, Jürgen Groß wrote:
> On 30.09.25 10:31, Peter Zijlstra wrote:
>> On Tue, Sep 30, 2025 at 09:03:53AM +0200, Juergen Gross wrote:
>>
>>> +static __always_inline bool __wrmsrq_constant(u32 msr, u64 val, int type)
>>> +{
>>> + BUILD_BUG_ON(!__builtin_constant_p(msr));
>>> +
>>> + asm_inline volatile goto(
>>> + "1:\n"
>>> + ALTERNATIVE_2(PREPARE_RCX_RDX_FOR_WRMSR
>>> + "2: ds wrmsr",
>>> + PREPARE_RCX_RDX_FOR_WRMSR
>>> + ASM_WRMSRNS,
>>> + X86_FEATURE_WRMSRNS,
>>> + ASM_WRMSRNS_IMM,
>>> + X86_FEATURE_MSR_IMM)
>>> + _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For WRMSRNS
>>> immediate */
>>> + _ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type]) /* For WRMSR(NS) */
>>> +
>>> + :
>>> + : [val] "a" (val), [msr] "i" (msr), [type] "i" (type)
>>> + : "memory", "ecx", "rdx"
>>> + : badmsr);
>>> +
>>> + return false;
>>> +
>>> +badmsr:
>>> + return true;
>>> +}
>>
>> Just wondering, would something like this work?
>>
>> asm_inline volatile goto(
>> "1:\n"
>> ALTERNATIVE(PREPARE_RCX_RDX_FOR_WRMSR
>> "2:\n"
>> ALTERNATIVE("ds wrmsr",
>> ASM_WRMSRNS, X86_FEATURE_WRMSRNS),
>> ASM_WRMSRNS_IMM, X86_FEATURE_MSR_IMM);
>> _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For WRMSRNS
>> immediate */
>> _ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type]) /* For WRMSR(NS) */
>>
>> :
>> : [val] "a" (val), [msr] "i" (msr), [type] "i" (type)
>> : "memory", "ecx", "rdx"
>> : badmsr);
>>
>> It's a bit weird because the nested alternative isn't for the exact same
>> position I suppose. But I find it a more readable form.
>
> I don't think it would work. Nested ALTERNATIVE()s do work only with
> all of them starting at the same location. Have a look at the
> ALTERNATIVE() macro, which is defining the label "771" via OLDINSTR()
> and then referring to this label via ALTINSTR_ENTRY(). In your case
> the ALTINSTR_ENTRY() of the outer ALTERNATIVE() invocation would find
> the wrong "771" label (the one of the inner ALTERNATIVE()).
>
> Allowing such constructs would probably require switching from preprocessor
> macros to assembler macros.
Thinking more about that I believe there are additional problems:
Having overlapping alternatives not starting at the same address will result
in problems with length padding of the outer alternative in case the inner
one starting later is extending past the end of the outer one. This might be
possible to handle, but it will be tedious.
A similar problem occurs with my recent series for merging nested alternative
patching into a temporary buffer. Currently the code relies on all nested
alternatives to start at the same location.
Using your idea with pv_ops could result in the inner alternative not being
at the start of the outer alternative AND not being the initial code. This
would result in patching in the .altinstructions area instead of the normal
.text site, resulting in possible loss of a patching action if the patched
area would have been used as a replacement before.
So in my opinion allowing alternatives to nest without all inner levels
starting at the same location as the outermost level would be a recipe for
failure.
I think I'll write another patch to BUG() in case such a situation is being
detected.
Juergen
* Re: [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR
2025-10-01 8:49 ` Juergen Gross
@ 2025-10-01 10:50 ` Peter Zijlstra
2025-10-01 11:16 ` Jürgen Groß
0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2025-10-01 10:50 UTC (permalink / raw)
To: Juergen Gross
Cc: linux-kernel, x86, llvm, xin, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt
On Wed, Oct 01, 2025 at 10:49:31AM +0200, Juergen Gross wrote:
> Thinking more about that I believe there are additional problems:
>
> Having overlapping alternatives not starting at the same address will result
> in problems with length padding of the outer alternative in case the inner
> one starting later is extending past the end of the outer one. This might be
> possible to handle, but it will be tedious.
Yes, this must not happen.
> Using your idea with pv_ops could result in the inner alternative not being
>> at the start of the outer alternative AND not being the initial code. This
> would result in patching in the .altinstructions area instead of the normal
> .text site, resulting in possible loss of a patching action if the patched
> area would have been used as a replacement before.
Not quite, the nested alternative was in the orig_insn part. So it would
result in patching the orig text twice, once from the inner (which comes
first in the patch list) and then once again from the outer (which comes
later).
> So in my opinion allowing alternatives to nest without all inner levels
>> starting at the same location as the outermost level would be a recipe for
> failure.
Certainly tricky, no argument there.
> I think I'll write another patch to BUG() in case such a situation is being
> detected.
Fair enough; we should not currently have any such cases. And going by
my attempt to make it work, it's going to be really tricky in any case.
But please put a comment on the why, which explains the constraints.
* Re: [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR
2025-10-01 10:50 ` Peter Zijlstra
@ 2025-10-01 11:16 ` Jürgen Groß
0 siblings, 0 replies; 43+ messages in thread
From: Jürgen Groß @ 2025-10-01 11:16 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, x86, llvm, xin, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt
On 01.10.25 12:50, Peter Zijlstra wrote:
> On Wed, Oct 01, 2025 at 10:49:31AM +0200, Juergen Gross wrote:
>
>> Thinking more about that I believe there are additional problems:
>>
>> Having overlapping alternatives not starting at the same address will result
>> in problems with length padding of the outer alternative in case the inner
>> one starting later is extending past the end of the outer one. This might be
>> possible to handle, but it will be tedious.
>
> Yes, this must not happen.
>
>> Using your idea with pv_ops could result in the inner alternative not being
>> at the start of the outer alternative AND not being the initial code. This
>> would result in patching in the .altinstructions area instead of the normal
>> .text site, resulting in possible loss of a patching action if the patched
>> area would have been used as a replacement before.
>
> Not quite, the nested alternative was in the orig_insn part. So it would
> result in patching the orig text twice, once from the inner (which comes
> first in the patch list) and then once again from the outer (which comes
> later).
Yes, but that was the native case only.
With pv_ops this would mean the original instruction would be the
paravirt-call, resulting in your construct becoming an inner nesting level.
>
>> So in my opinion allowing alternatives to nest without all inner levels
>> starting at the same location as the outermost level would be a recipe for
>> failure.
>
> Certainly tricky, no argument there.
>
>> I think I'll write another patch to BUG() in case such a situation is being
>> detected.
>
> Fair enough; we should not currently have any such cases. And going by
> my attempt to make it work, it's going to be really tricky in any case.
>
> But please put a comment on the why, which explains the constraints.
Agreed.
Juergen
* Re: [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR
2025-09-30 7:03 ` [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR Juergen Gross
2025-09-30 8:31 ` Peter Zijlstra
@ 2025-09-30 16:00 ` Sean Christopherson
2025-10-01 9:13 ` Jürgen Groß
1 sibling, 1 reply; 43+ messages in thread
From: Sean Christopherson @ 2025-09-30 16:00 UTC (permalink / raw)
To: Juergen Gross
Cc: linux-kernel, x86, llvm, xin, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt
On Tue, Sep 30, 2025, Juergen Gross wrote:
> When available use one of the non-serializing WRMSR variants (WRMSRNS
> with or without an immediate operand specifying the MSR register) in
> __wrmsrq().
>
> For the safe/unsafe variants make __wrmsrq() to be a common base
> function instead of duplicating the ALTERNATIVE*() macros. This
> requires to let native_wrmsr() use native_wrmsrq() instead of
> __wrmsrq(). While changing this, convert native_wrmsr() into an inline
> function.
>
>> Replace the only call of wrmsrns() with the now equivalent call to
>> native_wrmsrq() and remove wrmsrns().
>
> The paravirt case will be handled later.
...
> @@ -268,21 +357,6 @@ static inline int wrmsrq_safe(u32 msr, u64 val)
> return err;
> }
>
> -/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
> -#define ASM_WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
> -
> -/* Non-serializing WRMSR, when available. Falls back to a serializing WRMSR. */
> -static __always_inline void wrmsrns(u32 msr, u64 val)
FYI, a use of wrmsrns() is likely coming in through the KVM (x86) tree, commit
65391feb042b ("KVM: VMX: Add host MSR read/write helpers to consolidate preemption
handling").
Probably makes sense to spin v3 after the merge window? Or on linux-next? (I
can't tell what was used as the base, and I double-checked that the above commit
is in linux-next).
> -{
> - /*
> - * WRMSR is 2 bytes. WRMSRNS is 3 bytes. Pad WRMSR with a redundant
> - * DS prefix to avoid a trailing NOP.
> - */
> - asm volatile("1: " ALTERNATIVE("ds wrmsr", ASM_WRMSRNS, X86_FEATURE_WRMSRNS)
> - "2: " _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
> - : : "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32)));
> -}
> -
> static inline void wrmsr(u32 msr, u32 low, u32 high)
> {
> wrmsrq(msr, (u64)high << 32 | low);
> --
> 2.51.0
>
* Re: [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR
2025-09-30 16:00 ` Sean Christopherson
@ 2025-10-01 9:13 ` Jürgen Groß
0 siblings, 0 replies; 43+ messages in thread
From: Jürgen Groß @ 2025-10-01 9:13 UTC (permalink / raw)
To: Sean Christopherson
Cc: linux-kernel, x86, llvm, xin, H. Peter Anvin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt
On 30.09.25 18:00, Sean Christopherson wrote:
> On Tue, Sep 30, 2025, Juergen Gross wrote:
>> When available use one of the non-serializing WRMSR variants (WRMSRNS
>> with or without an immediate operand specifying the MSR register) in
>> __wrmsrq().
>>
>> For the safe/unsafe variants make __wrmsrq() to be a common base
>> function instead of duplicating the ALTERNATIVE*() macros. This
>> requires to let native_wrmsr() use native_wrmsrq() instead of
>> __wrmsrq(). While changing this, convert native_wrmsr() into an inline
>> function.
>>
>> Replace the only call of wrmsrns() with the now equivalent call to
>> native_wrmsrq() and remove wrmsrns().
>>
>> The paravirt case will be handled later.
>
> ...
>
>> @@ -268,21 +357,6 @@ static inline int wrmsrq_safe(u32 msr, u64 val)
>> return err;
>> }
>>
>> -/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
>> -#define ASM_WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
>> -
>> -/* Non-serializing WRMSR, when available. Falls back to a serializing WRMSR. */
>> -static __always_inline void wrmsrns(u32 msr, u64 val)
>
> FYI, a use of wrmsrns() is likely coming in through the KVM (x86) tree, commit
> 65391feb042b ("KVM: VMX: Add host MSR read/write helpers to consolidate preemption
> handling").
Thanks for the heads up!
>
> Probably makes sense to spin v3 after the merge window? Or on linux-next? (I
> can't tell what was used as the base, and I double-checked that the above commit
> is in linux-next).
I'll find a proper solution. :-)
Juergen
* [PATCH v2 10/12] x86/msr: Use the alternatives mechanism for RDMSR
2025-09-30 7:03 [PATCH v2 00/12] x86/msr: Inline rdmsr/wrmsr instructions Juergen Gross
` (8 preceding siblings ...)
2025-09-30 7:03 ` [PATCH v2 09/12] x86/msr: Use the alternatives mechanism for WRMSR Juergen Gross
@ 2025-09-30 7:03 ` Juergen Gross
2025-09-30 7:03 ` [PATCH v2 11/12] x86/paravirt: Don't use pv_ops vector for MSR access functions Juergen Gross
` (2 subsequent siblings)
12 siblings, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2025-09-30 7:03 UTC (permalink / raw)
To: linux-kernel, x86
Cc: xin, Juergen Gross, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin
When available use the immediate variant of RDMSR in __rdmsr().
For the safe/unsafe variants make __rdmsr() a common base function
instead of duplicating the ALTERNATIVE*() macros.
Modify native_rdmsr() and native_read_msr() to use native_rdmsrq().
The paravirt case will be handled later.
Originally-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- new patch, partially taken from "[RFC PATCH v2 22/34] x86/msr: Utilize
the alternatives mechanism to read MSR" by Xin Li
---
arch/x86/include/asm/msr.h | 116 ++++++++++++++++++++++++++++---------
1 file changed, 89 insertions(+), 27 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 19ed780c2a09..cc592611e333 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -65,6 +65,8 @@ static inline void do_trace_rdpmc(u32 msr, u64 val, int failed) {}
/* The GNU Assembler (Gas) with Binutils 2.41 adds the .insn directive support */
#if defined(CONFIG_AS_IS_GNU) && CONFIG_AS_VERSION >= 24100
+#define ASM_RDMSR_IMM \
+ " .insn VEX.128.F2.M7.W0 0xf6 /0, %[msr]%{:u32}, %[val]\n\t"
#define ASM_WRMSRNS_IMM \
" .insn VEX.128.F3.M7.W0 0xf6 /0, %[val], %[msr]%{:u32}\n\t"
#else
@@ -74,10 +76,17 @@ static inline void do_trace_rdpmc(u32 msr, u64 val, int failed) {}
* The register operand is encoded as %rax because all uses of the immediate
* form MSR access instructions reference %rax as the register operand.
*/
+#define ASM_RDMSR_IMM \
+ " .byte 0xc4,0xe7,0x7b,0xf6,0xc0; .long %c[msr]"
#define ASM_WRMSRNS_IMM \
" .byte 0xc4,0xe7,0x7a,0xf6,0xc0; .long %c[msr]"
#endif
+#define RDMSR_AND_SAVE_RESULT \
+ "rdmsr\n\t" \
+ "shl $0x20, %%rdx\n\t" \
+ "or %%rdx, %%rax\n\t"
+
#define PREPARE_RDX_FOR_WRMSR \
"mov %%rax, %%rdx\n\t" \
"shr $0x20, %%rdx\n\t"
@@ -111,16 +120,76 @@ static __always_inline bool is_msr_imm_insn(void *ip)
* think of extending them - you will be slapped with a stinking trout or a frozen
* shark will reach you, wherever you are! You've been warned.
*/
-static __always_inline u64 __rdmsr(u32 msr)
+static __always_inline bool __rdmsrq_variable(u32 msr, u64 *val, int type)
+{
+#ifdef CONFIG_X86_64
+ BUILD_BUG_ON(__builtin_constant_p(msr));
+
+ asm_inline volatile goto(
+ "1:\n"
+ RDMSR_AND_SAVE_RESULT
+ _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For RDMSR */
+
+ : [val] "=a" (*val)
+ : "c" (msr), [type] "i" (type)
+ : "rdx"
+ : badmsr);
+#else
+ asm_inline volatile goto(
+ "1: rdmsr\n\t"
+ _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For RDMSR */
+
+ : "=A" (*val)
+ : "c" (msr), [type] "i" (type)
+ :
+ : badmsr);
+#endif
+
+ return false;
+
+badmsr:
+ *val = 0;
+
+ return true;
+}
+
+#ifdef CONFIG_X86_64
+static __always_inline bool __rdmsrq_constant(u32 msr, u64 *val, int type)
{
- EAX_EDX_DECLARE_ARGS(val, low, high);
+ BUILD_BUG_ON(!__builtin_constant_p(msr));
- asm volatile("1: rdmsr\n"
- "2:\n"
- _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR)
- : EAX_EDX_RET(val, low, high) : "c" (msr));
+ asm_inline volatile goto(
+ "1:\n"
+ ALTERNATIVE("mov %[msr], %%ecx\n\t"
+ "2:\n"
+ RDMSR_AND_SAVE_RESULT,
+ ASM_RDMSR_IMM,
+ X86_FEATURE_MSR_IMM)
+ _ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type]) /* For RDMSR immediate */
+ _ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type]) /* For RDMSR */
+
+ : [val] "=a" (*val)
+ : [msr] "i" (msr), [type] "i" (type)
+ : "ecx", "rdx"
+ : badmsr);
- return EAX_EDX_VAL(val, low, high);
+ return false;
+
+badmsr:
+ *val = 0;
+
+ return true;
+}
+#endif
+
+static __always_inline bool __rdmsr(u32 msr, u64 *val, int type)
+{
+#ifdef CONFIG_X86_64
+ if (__builtin_constant_p(msr))
+ return __rdmsrq_constant(msr, val, type);
+#endif
+
+ return __rdmsrq_variable(msr, val, type);
}
static __always_inline bool __wrmsrq_variable(u32 msr, u64 val, int type)
@@ -195,18 +264,22 @@ static __always_inline bool __wrmsrq(u32 msr, u64 val, int type)
return __wrmsrq_variable(msr, val, type);
}
+static __always_inline u64 native_rdmsrq(u32 msr)
+{
+ u64 val;
+
+ __rdmsr(msr, &val, EX_TYPE_RDMSR);
+
+ return val;
+}
+
#define native_rdmsr(msr, val1, val2) \
do { \
- u64 __val = __rdmsr((msr)); \
+ u64 __val = native_rdmsrq((msr)); \
(void)((val1) = (u32)__val); \
(void)((val2) = (u32)(__val >> 32)); \
} while (0)
-static __always_inline u64 native_rdmsrq(u32 msr)
-{
- return __rdmsr(msr);
-}
-
static __always_inline void native_wrmsrq(u32 msr, u64 val)
{
__wrmsrq(msr, val, EX_TYPE_WRMSR);
@@ -219,23 +292,12 @@ static __always_inline void native_wrmsr(u32 msr, u32 low, u32 high)
static inline u64 native_read_msr(u32 msr)
{
- return __rdmsr(msr);
+ return native_rdmsrq(msr);
}
-static inline int native_read_msr_safe(u32 msr, u64 *p)
+static inline int native_read_msr_safe(u32 msr, u64 *val)
{
- int err;
- EAX_EDX_DECLARE_ARGS(val, low, high);
-
- asm volatile("1: rdmsr ; xor %[err],%[err]\n"
- "2:\n\t"
- _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %[err])
- : [err] "=r" (err), EAX_EDX_RET(val, low, high)
- : "c" (msr));
-
- *p = EAX_EDX_VAL(val, low, high);
-
- return err;
+ return __rdmsr(msr, val, EX_TYPE_RDMSR_SAFE) ? -EIO : 0;
}
/* Can be uninlined because referenced by paravirt */
--
2.51.0
^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 11/12] x86/paravirt: Don't use pv_ops vector for MSR access functions
2025-09-30 7:03 [PATCH v2 00/12] x86/msr: Inline rdmsr/wrmsr instructions Juergen Gross
` (9 preceding siblings ...)
2025-09-30 7:03 ` [PATCH v2 10/12] x86/msr: Use the alternatives mechanism for RDMSR Juergen Gross
@ 2025-09-30 7:03 ` Juergen Gross
2025-09-30 8:38 ` Peter Zijlstra
2025-09-30 21:27 ` kernel test robot
2025-09-30 7:03 ` [PATCH v2 12/12] x86/msr: Reduce number of low level MSR access helpers Juergen Gross
2025-09-30 19:19 ` [PATCH v2 00/12] x86/msr: Inline rdmsr/wrmsr instructions H. Peter Anvin
12 siblings, 2 replies; 43+ messages in thread
From: Juergen Gross @ 2025-09-30 7:03 UTC (permalink / raw)
To: linux-kernel, x86, virtualization
Cc: xin, Juergen Gross, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Ajay Kaher, Alexey Makhalov,
Broadcom internal kernel review list, Boris Ostrovsky, xen-devel
Instead of using the pv_ops vector for RDMSR/WRMSR related functions,
use a more explicit approach that allows inlining the RDMSR/WRMSR
instructions directly when not running as a Xen PV guest.
By using cpu_feature_enabled(X86_FEATURE_XENPV) for the Xen PV case,
the related calls into Xen specific code are statically disabled on
non-Xen systems via boot-time alternatives patching.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- new patch
---
arch/x86/include/asm/msr.h | 57 ++++++++++++++++++++++-----
arch/x86/include/asm/paravirt.h | 45 ---------------------
arch/x86/include/asm/paravirt_types.h | 13 ------
arch/x86/kernel/paravirt.c | 5 ---
arch/x86/xen/enlighten_pv.c | 20 ++++------
arch/x86/xen/pmu.c | 1 +
6 files changed, 57 insertions(+), 84 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index cc592611e333..d42cd2c19818 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -290,24 +290,22 @@ static __always_inline void native_wrmsr(u32 msr, u32 low, u32 high)
native_wrmsrq(msr, (u64)high << 32 | low);
}
-static inline u64 native_read_msr(u32 msr)
+static __always_inline u64 native_read_msr(u32 msr)
{
return native_rdmsrq(msr);
}
-static inline int native_read_msr_safe(u32 msr, u64 *val)
+static __always_inline int native_read_msr_safe(u32 msr, u64 *val)
{
return __rdmsr(msr, val, EX_TYPE_RDMSR_SAFE) ? -EIO : 0;
}
-/* Can be uninlined because referenced by paravirt */
-static inline void notrace native_write_msr(u32 msr, u64 val)
+static __always_inline void native_write_msr(u32 msr, u64 val)
{
native_wrmsrq(msr, val);
}
-/* Can be uninlined because referenced by paravirt */
-static inline int notrace native_write_msr_safe(u32 msr, u64 val)
+static __always_inline int native_write_msr_safe(u32 msr, u64 val)
{
return __wrmsrq(msr, val, EX_TYPE_WRMSR_SAFE) ? -EIO : 0;
}
@@ -325,8 +323,49 @@ static inline u64 native_read_pmc(int counter)
return EAX_EDX_VAL(val, low, high);
}
-#ifdef CONFIG_PARAVIRT_XXL
-#include <asm/paravirt.h>
+#ifdef CONFIG_XEN_PV
+#include <asm/xen/msr.h>
+
+static __always_inline u64 read_msr(u32 msr)
+{
+ if (cpu_feature_enabled(X86_FEATURE_XENPV))
+ return xen_read_msr(msr);
+
+ return native_rdmsrq(msr);
+}
+
+static __always_inline int read_msr_safe(u32 msr, u64 *p)
+{
+ if (cpu_feature_enabled(X86_FEATURE_XENPV))
+ return xen_read_msr_safe(msr, p);
+
+ return native_read_msr_safe(msr, p);
+}
+
+static __always_inline void write_msr(u32 msr, u64 val)
+{
+ if (cpu_feature_enabled(X86_FEATURE_XENPV))
+ xen_write_msr(msr, val);
+ else
+ native_wrmsrq(msr, val);
+}
+
+static __always_inline int write_msr_safe(u32 msr, u64 val)
+{
+ if (cpu_feature_enabled(X86_FEATURE_XENPV))
+ return xen_write_msr_safe(msr, val);
+
+ return native_write_msr_safe(msr, val);
+}
+
+static __always_inline u64 rdpmc(int counter)
+{
+ if (cpu_feature_enabled(X86_FEATURE_XENPV))
+ return xen_read_pmc(counter);
+
+ return native_read_pmc(counter);
+}
+
#else
static __always_inline u64 read_msr(u32 msr)
{
@@ -353,7 +392,7 @@ static __always_inline u64 rdpmc(int counter)
return native_read_pmc(counter);
}
-#endif /* !CONFIG_PARAVIRT_XXL */
+#endif /* !CONFIG_XEN_PV */
/*
* Access to machine-specific registers (available on 586 and better only)
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index dc297f62b935..45f47b7d9f56 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -175,51 +175,6 @@ static inline void __write_cr4(unsigned long x)
PVOP_VCALL1(cpu.write_cr4, x);
}
-static inline u64 paravirt_read_msr(u32 msr)
-{
- return PVOP_CALL1(u64, cpu.read_msr, msr);
-}
-
-static inline void paravirt_write_msr(u32 msr, u64 val)
-{
- PVOP_VCALL2(cpu.write_msr, msr, val);
-}
-
-static inline int paravirt_read_msr_safe(u32 msr, u64 *val)
-{
- return PVOP_CALL2(int, cpu.read_msr_safe, msr, val);
-}
-
-static inline int paravirt_write_msr_safe(u32 msr, u64 val)
-{
- return PVOP_CALL2(int, cpu.write_msr_safe, msr, val);
-}
-
-static __always_inline u64 read_msr(u32 msr)
-{
- return paravirt_read_msr(msr);
-}
-
-static __always_inline int read_msr_safe(u32 msr, u64 *p)
-{
- return paravirt_read_msr_safe(msr, p);
-}
-
-static __always_inline void write_msr(u32 msr, u64 val)
-{
- paravirt_write_msr(msr, val);
-}
-
-static __always_inline int write_msr_safe(u32 msr, u64 val)
-{
- return paravirt_write_msr_safe(msr, val);
-}
-
-static __always_inline u64 rdpmc(int counter)
-{
- return PVOP_CALL1(u64, cpu.read_pmc, counter);
-}
-
static inline void paravirt_alloc_ldt(struct desc_struct *ldt, unsigned entries)
{
PVOP_VCALL2(cpu.alloc_ldt, ldt, entries);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 37a8627d8277..0d03e658ea8f 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -90,19 +90,6 @@ struct pv_cpu_ops {
void (*cpuid)(unsigned int *eax, unsigned int *ebx,
unsigned int *ecx, unsigned int *edx);
- /* Unsafe MSR operations. These will warn or panic on failure. */
- u64 (*read_msr)(u32 msr);
- void (*write_msr)(u32 msr, u64 val);
-
- /*
- * Safe MSR operations.
- * Returns 0 or -EIO.
- */
- int (*read_msr_safe)(u32 msr, u64 *val);
- int (*write_msr_safe)(u32 msr, u64 val);
-
- u64 (*read_pmc)(int counter);
-
void (*start_context_switch)(struct task_struct *prev);
void (*end_context_switch)(struct task_struct *next);
#endif
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index ab3e172dcc69..240eeb1beab5 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -129,11 +129,6 @@ struct paravirt_patch_template pv_ops = {
.cpu.read_cr0 = native_read_cr0,
.cpu.write_cr0 = native_write_cr0,
.cpu.write_cr4 = native_write_cr4,
- .cpu.read_msr = native_read_msr,
- .cpu.write_msr = native_write_msr,
- .cpu.read_msr_safe = native_read_msr_safe,
- .cpu.write_msr_safe = native_write_msr_safe,
- .cpu.read_pmc = native_read_pmc,
.cpu.load_tr_desc = native_load_tr_desc,
.cpu.set_ldt = native_set_ldt,
.cpu.load_gdt = native_load_gdt,
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 26bbaf4b7330..df653099c567 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1160,15 +1160,16 @@ static void xen_do_write_msr(u32 msr, u64 val, int *err)
}
}
-static int xen_read_msr_safe(u32 msr, u64 *val)
+int xen_read_msr_safe(u32 msr, u64 *val)
{
int err = 0;
*val = xen_do_read_msr(msr, &err);
return err;
}
+EXPORT_SYMBOL(xen_read_msr_safe);
-static int xen_write_msr_safe(u32 msr, u64 val)
+int xen_write_msr_safe(u32 msr, u64 val)
{
int err = 0;
@@ -1176,20 +1177,23 @@ static int xen_write_msr_safe(u32 msr, u64 val)
return err;
}
+EXPORT_SYMBOL(xen_write_msr_safe);
-static u64 xen_read_msr(u32 msr)
+u64 xen_read_msr(u32 msr)
{
int err = 0;
return xen_do_read_msr(msr, xen_msr_safe ? &err : NULL);
}
+EXPORT_SYMBOL(xen_read_msr);
-static void xen_write_msr(u32 msr, u64 val)
+void xen_write_msr(u32 msr, u64 val)
{
int err;
xen_do_write_msr(msr, val, xen_msr_safe ? &err : NULL);
}
+EXPORT_SYMBOL(xen_write_msr);
/* This is called once we have the cpu_possible_mask */
void __init xen_setup_vcpu_info_placement(void)
@@ -1225,14 +1229,6 @@ static const typeof(pv_ops) xen_cpu_ops __initconst = {
.write_cr4 = xen_write_cr4,
- .read_msr = xen_read_msr,
- .write_msr = xen_write_msr,
-
- .read_msr_safe = xen_read_msr_safe,
- .write_msr_safe = xen_write_msr_safe,
-
- .read_pmc = xen_read_pmc,
-
.load_tr_desc = paravirt_nop,
.set_ldt = xen_set_ldt,
.load_gdt = xen_load_gdt,
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index d49a3bdc448b..d0dea950cd4f 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -370,6 +370,7 @@ u64 xen_read_pmc(int counter)
else
return xen_intel_read_pmc(counter);
}
+EXPORT_SYMBOL(xen_read_pmc);
int pmu_apic_update(uint32_t val)
{
--
2.51.0
* Re: [PATCH v2 11/12] x86/paravirt: Don't use pv_ops vector for MSR access functions
2025-09-30 7:03 ` [PATCH v2 11/12] x86/paravirt: Don't use pv_ops vector for MSR access functions Juergen Gross
@ 2025-09-30 8:38 ` Peter Zijlstra
2025-09-30 9:02 ` Jürgen Groß
2025-09-30 21:27 ` kernel test robot
1 sibling, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2025-09-30 8:38 UTC (permalink / raw)
To: Juergen Gross
Cc: linux-kernel, x86, virtualization, xin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Ajay Kaher, Alexey Makhalov, Broadcom internal kernel review list,
Boris Ostrovsky, xen-devel
On Tue, Sep 30, 2025 at 09:03:55AM +0200, Juergen Gross wrote:
> +static __always_inline u64 read_msr(u32 msr)
> +{
> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
> + return xen_read_msr(msr);
> +
> + return native_rdmsrq(msr);
> +}
> +
> +static __always_inline int read_msr_safe(u32 msr, u64 *p)
> +{
> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
> + return xen_read_msr_safe(msr, p);
> +
> + return native_read_msr_safe(msr, p);
> +}
> +
> +static __always_inline void write_msr(u32 msr, u64 val)
> +{
> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
> + xen_write_msr(msr, val);
> + else
> + native_wrmsrq(msr, val);
> +}
> +
> +static __always_inline int write_msr_safe(u32 msr, u64 val)
> +{
> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
> + return xen_write_msr_safe(msr, val);
> +
> + return native_write_msr_safe(msr, val);
> +}
> +
> +static __always_inline u64 rdpmc(int counter)
> +{
> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
> + return xen_read_pmc(counter);
> +
> + return native_read_pmc(counter);
> +}
Egads, didn't we just construct giant ALTERNATIVE()s for the native_
things? Why wrap that in a cpu_feature_enabled() instead of just adding
one more case to the ALTERNATIVE() ?
* Re: [PATCH v2 11/12] x86/paravirt: Don't use pv_ops vector for MSR access functions
2025-09-30 8:38 ` Peter Zijlstra
@ 2025-09-30 9:02 ` Jürgen Groß
2025-09-30 10:04 ` Peter Zijlstra
0 siblings, 1 reply; 43+ messages in thread
From: Jürgen Groß @ 2025-09-30 9:02 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, x86, virtualization, xin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Ajay Kaher, Alexey Makhalov, Broadcom internal kernel review list,
Boris Ostrovsky, xen-devel
On 30.09.25 10:38, Peter Zijlstra wrote:
> On Tue, Sep 30, 2025 at 09:03:55AM +0200, Juergen Gross wrote:
>
>> +static __always_inline u64 read_msr(u32 msr)
>> +{
>> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
>> + return xen_read_msr(msr);
>> +
>> + return native_rdmsrq(msr);
>> +}
>> +
>> +static __always_inline int read_msr_safe(u32 msr, u64 *p)
>> +{
>> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
>> + return xen_read_msr_safe(msr, p);
>> +
>> + return native_read_msr_safe(msr, p);
>> +}
>> +
>> +static __always_inline void write_msr(u32 msr, u64 val)
>> +{
>> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
>> + xen_write_msr(msr, val);
>> + else
>> + native_wrmsrq(msr, val);
>> +}
>> +
>> +static __always_inline int write_msr_safe(u32 msr, u64 val)
>> +{
>> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
>> + return xen_write_msr_safe(msr, val);
>> +
>> + return native_write_msr_safe(msr, val);
>> +}
>> +
>> +static __always_inline u64 rdpmc(int counter)
>> +{
>> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
>> + return xen_read_pmc(counter);
>> +
>> + return native_read_pmc(counter);
>> +}
>
> Egads, didn't we just construct giant ALTERNATIVE()s for the native_
> things? Why wrap that in a cpu_feature_enabled() instead of just adding
> one more case to the ALTERNATIVE() ?
The problem I encountered with using pv_ops was to implement the *_safe()
variants. There is no simple way to do that using ALTERNATIVE_<n>(), as
in the Xen PV case the call will remain, and I didn't find a way to
specify a sane interface between the call-site and the called Xen function
to return the error indicator. Remember that at the call site the main
interface is the one of the RDMSR/WRMSR instructions. They lack an error
indicator.
In Xin's series there was a patch written initially by you to solve such
a problem by adding the _ASM_EXTABLE_FUNC_REWIND() exception table method.
I think this is a dead end, as it will break when using a shadow stack.
Additionally I found a rather ugly hack only to avoid re-iterating most of
the bare metal ALTERNATIVE() for the paravirt case. It is possible, but the
bare metal case is gaining one additional ALTERNATIVE level, resulting in
patching the original instruction with an identical copy first.
Another benefit of my approach is the dropping of "#include <asm/paravirt.h>"
from msr.h, which makes life a little bit easier.
Juergen
* Re: [PATCH v2 11/12] x86/paravirt: Don't use pv_ops vector for MSR access functions
2025-09-30 9:02 ` Jürgen Groß
@ 2025-09-30 10:04 ` Peter Zijlstra
2025-09-30 10:43 ` Jürgen Groß
0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2025-09-30 10:04 UTC (permalink / raw)
To: Jürgen Groß
Cc: linux-kernel, x86, virtualization, xin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Ajay Kaher, Alexey Makhalov, Broadcom internal kernel review list,
Boris Ostrovsky, xen-devel
On Tue, Sep 30, 2025 at 11:02:52AM +0200, Jürgen Groß wrote:
> On 30.09.25 10:38, Peter Zijlstra wrote:
> > On Tue, Sep 30, 2025 at 09:03:55AM +0200, Juergen Gross wrote:
> >
> > > +static __always_inline u64 read_msr(u32 msr)
> > > +{
> > > + if (cpu_feature_enabled(X86_FEATURE_XENPV))
> > > + return xen_read_msr(msr);
> > > +
> > > + return native_rdmsrq(msr);
> > > +}
> > > +
> > > +static __always_inline int read_msr_safe(u32 msr, u64 *p)
> > > +{
> > > + if (cpu_feature_enabled(X86_FEATURE_XENPV))
> > > + return xen_read_msr_safe(msr, p);
> > > +
> > > + return native_read_msr_safe(msr, p);
> > > +}
> > > +
> > > +static __always_inline void write_msr(u32 msr, u64 val)
> > > +{
> > > + if (cpu_feature_enabled(X86_FEATURE_XENPV))
> > > + xen_write_msr(msr, val);
> > > + else
> > > + native_wrmsrq(msr, val);
> > > +}
> > > +
> > > +static __always_inline int write_msr_safe(u32 msr, u64 val)
> > > +{
> > > + if (cpu_feature_enabled(X86_FEATURE_XENPV))
> > > + return xen_write_msr_safe(msr, val);
> > > +
> > > + return native_write_msr_safe(msr, val);
> > > +}
> > > +
> > > +static __always_inline u64 rdpmc(int counter)
> > > +{
> > > + if (cpu_feature_enabled(X86_FEATURE_XENPV))
> > > + return xen_read_pmc(counter);
> > > +
> > > + return native_read_pmc(counter);
> > > +}
> >
> > Egads, didn't we just construct giant ALTERNATIVE()s for the native_
> > things? Why wrap that in a cpu_feature_enabled() instead of just adding
> > one more case to the ALTERNATIVE() ?
>
> The problem I encountered with using pv_ops was to implement the *_safe()
> variants. There is no simple way to do that using ALTERNATIVE_<n>(), as
> in the Xen PV case the call will remain, and I didn't find a way to
> specify a sane interface between the call-site and the called Xen function
> to return the error indicator. Remember that at the call site the main
> interface is the one of the RDMSR/WRMSR instructions. They lack an error
> indicator.
Would've been useful Changelog material, I suppose.
> In Xin's series there was a patch written initially by you to solve such
> a problem by adding the _ASM_EXTABLE_FUNC_REWIND() exception table method.
> I think this is a dead end, as it will break when using a shadow stack.
No memories, let me go search. I found this:
https://patchwork.ozlabs.org/project/linux-ide/patch/20250331082251.3171276-12-xin@zytor.com/
That's the other Peter :-)
Anyway, with shadowstack you should be able to frob SSP along with SP in
the exception context. IIRC the SSP 'return' value is on the SS itself,
so a WRSS to that field can easily make the whole CALL go away.
> Additionally I found a rather ugly hack only to avoid re-iterating most of
> the bare metal ALTERNATIVE() for the paravirt case. It is possible, but the
> bare metal case is gaining one additional ALTERNATIVE level, resulting in
> patching the original instruction with an identical copy first.
OTOH the above generates atrocious crap code :/
You get that _static_cpu_has() crud, which is basically a really fat
jump_label (because it needs to include the runtime test) and then the
code for both your xen thing and the alternative.
/me ponders things a bit..
> Remember that at the call site the main interface is the one of the
> RDMSR/WRMSR instructions. They lack an error indicator.
This, that isn't true.
Note how ex_handler_msr() takes a reg argument and how that sets that
reg to -EIO. See how the current native_read_msr_safe() uses that:
_ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %[err])
(also note how using _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_*_SAFE) like you
do, will result in reg being 0 or ax. Scribbling your 0 return value)
It very explicitly uses @err as error return value. So your call would
return eax:edx and take ecx to be the msr, but there is nothing stopping
us from then using say ebx for error return, like:
int err = 0;
asm_inline(
"1:\n"
ALTERNATIVE("ds rdmsr",
"call xen_rdmsr", XENPV)
"2:\n"
_ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %%ebx)
: "=a" (ax), "=d" (dx), "+b" (err)
: "c" (msr));
return err;
Hmm?
* Re: [PATCH v2 11/12] x86/paravirt: Don't use pv_ops vector for MSR access functions
2025-09-30 10:04 ` Peter Zijlstra
@ 2025-09-30 10:43 ` Jürgen Groß
2025-09-30 19:49 ` H. Peter Anvin
0 siblings, 1 reply; 43+ messages in thread
From: Jürgen Groß @ 2025-09-30 10:43 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, x86, virtualization, xin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Ajay Kaher, Alexey Makhalov, Broadcom internal kernel review list,
Boris Ostrovsky, xen-devel
On 30.09.25 12:04, Peter Zijlstra wrote:
> On Tue, Sep 30, 2025 at 11:02:52AM +0200, Jürgen Groß wrote:
>> On 30.09.25 10:38, Peter Zijlstra wrote:
>>> On Tue, Sep 30, 2025 at 09:03:55AM +0200, Juergen Gross wrote:
>>>
>>>> +static __always_inline u64 read_msr(u32 msr)
>>>> +{
>>>> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
>>>> + return xen_read_msr(msr);
>>>> +
>>>> + return native_rdmsrq(msr);
>>>> +}
>>>> +
>>>> +static __always_inline int read_msr_safe(u32 msr, u64 *p)
>>>> +{
>>>> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
>>>> + return xen_read_msr_safe(msr, p);
>>>> +
>>>> + return native_read_msr_safe(msr, p);
>>>> +}
>>>> +
>>>> +static __always_inline void write_msr(u32 msr, u64 val)
>>>> +{
>>>> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
>>>> + xen_write_msr(msr, val);
>>>> + else
>>>> + native_wrmsrq(msr, val);
>>>> +}
>>>> +
>>>> +static __always_inline int write_msr_safe(u32 msr, u64 val)
>>>> +{
>>>> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
>>>> + return xen_write_msr_safe(msr, val);
>>>> +
>>>> + return native_write_msr_safe(msr, val);
>>>> +}
>>>> +
>>>> +static __always_inline u64 rdpmc(int counter)
>>>> +{
>>>> + if (cpu_feature_enabled(X86_FEATURE_XENPV))
>>>> + return xen_read_pmc(counter);
>>>> +
>>>> + return native_read_pmc(counter);
>>>> +}
>>>
>>> Egads, didn't we just construct giant ALTERNATIVE()s for the native_
>>> things? Why wrap that in a cpu_feature_enabled() instead of just adding
>>> one more case to the ALTERNATIVE() ?
>>
>> The problem I encountered with using pv_ops was to implement the *_safe()
>> variants. There is no simple way to do that using ALTERNATIVE_<n>(), as
>> in the Xen PV case the call will remain, and I didn't find a way to
>> specify a sane interface between the call-site and the called Xen function
>> to return the error indicator. Remember that at the call site the main
>> interface is the one of the RDMSR/WRMSR instructions. They lack an error
>> indicator.
>
> Would've been useful Changelog material that I suppose.
>
>> In Xin's series there was a patch written initially by you to solve such
>> a problem by adding the _ASM_EXTABLE_FUNC_REWIND() exception table method.
>> I think this is a dead end, as it will break when using a shadow stack.
>
> No memories, let me go search. I found this:
>
> https://patchwork.ozlabs.org/project/linux-ide/patch/20250331082251.3171276-12-xin@zytor.com/
>
> That's the other Peter :-)
Oh, my bad, sorry. :-)
> Anyway, with shadowstack you should be able to frob SSP along with SP in
> the exception context. IIRC the SSP 'return' value is on the SS itself,
> so a WRSS to that field can easily make the whole CALL go away.
Yeah, but being able to avoid all of that dance wouldn't be too bad either.
>> Additionally I found a rather ugly hack only to avoid re-iterating most of
>> the bare metal ALTERNATIVE() for the paravirt case. It is possible, but the
>> bare metal case is gaining one additional ALTERNATIVE level, resulting in
>> patching the original instruction with an identical copy first.
>
> OTOH the above generates atrocious crap code :/
Yes.
> You get that _static_cpu_has() crud, which is basically a really fat
> jump_label (because it needs to include the runtime test) and then the
> code for both your xen thing and the alternative.
Seeing both variants would make it easier to decide, I guess.
>
> /me ponders things a bit..
>
>> Remember that at the call site the main interface is the one of the
>> RDMSR/WRMSR instructions. They lack an error indicator.
>
> This, that isn't true.
>
> Note how ex_handler_msr() takes a reg argument and how that sets that
> reg to -EIO. See how the current native_read_msr_safe() uses that:
>
> _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %[err])
>
> (also note how using _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_*_SAFE) like you
> do, will result in reg being 0 or ax. Scribbling your 0 return value)
>
> It very explicitly uses @err as error return value. So your call would
> return eax:edx and take ecx to be the msr, but there is nothing stopping
> us from then using say ebx for error return, like:
>
> int err = 0;
>
> asm_inline(
> "1:\n"
> ALTERNATIVE("ds rdmsr",
> "call xen_rdmsr", XENPV)
> "2:\n"
>
> _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %%ebx)
>
> : "a" (ax), "d" (dx), "+b" (err)
> : "c" (msr));
>
> return err;
>
> Hmm?
Oh, indeed.
Let me try that and we can choose the less evil. :-)
Juergen
* Re: [PATCH v2 11/12] x86/paravirt: Don't use pv_ops vector for MSR access functions
2025-09-30 10:43 ` Jürgen Groß
@ 2025-09-30 19:49 ` H. Peter Anvin
2025-09-30 19:59 ` H. Peter Anvin
2025-10-01 6:45 ` Peter Zijlstra
0 siblings, 2 replies; 43+ messages in thread
From: H. Peter Anvin @ 2025-09-30 19:49 UTC (permalink / raw)
To: Jürgen Groß, Peter Zijlstra
Cc: linux-kernel, x86, virtualization, xin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Ajay Kaher,
Alexey Makhalov, Broadcom internal kernel review list,
Boris Ostrovsky, xen-devel
On 2025-09-30 03:43, Jürgen Groß wrote:
>>
>>> In Xin's series there was a patch written initially by you to solve such
>>> a problem by adding the _ASM_EXTABLE_FUNC_REWIND() exception table method.
>>> I think this is a dead end, as it will break when using a shadow stack.
>>
>> No memories, let me go search. I found this:
>>
>> https://patchwork.ozlabs.org/project/linux-ide/
>> patch/20250331082251.3171276-12-xin@zytor.com/
>>
>> That's the other Peter :-)
>
> Oh, my bad, sorry. :-)
Yes, you would have to patch both the stack and the shadow stack.
BUT: in the end it doesn't really buy much. The only thing that
benefits is Xen, but it is fairly easy for Xen (my original POC did this) to
filter out the few MSRs that it does special dispatch for (plus the
variable case), and then it can just use the native code, including the
benefit of using WRMSRNS and immediates.
The other approach that I also came up with looks like this:
/* Native code (non-immediate): trap point at +7 */
0: 48 89 c2 mov %rax,%rdx
3: 48 c1 ea 20 shr $0x20,%rdx
7: 0f 01 c6 wrmsrns
a:
/* Native code (immediate): trap point at +0 */
0: 36 c4 e7 7a f6 c0 xx ds wrmsrns %rax,$XX
xx xx xx
a:
/* Xen code, stub sets CF = 1 on failure */
0: e8 xx xx xx xx call asm_xen_pv_wrmsr
5: 73 03 jnc 0xa
7: 0f 0b ud2
9: 90 nop
a:
The trap point even ends up in the same place! UD2 can be any 1-, 2-, or
3-byte trapping instruction.
>
>>> Additionally I found a rather ugly hack only to avoid re-iterating most of
>>> the bare metal ALTERNATIVE() for the paravirt case. It is possible, but the
>>> bare metal case is gaining one additional ALTERNATIVE level, resulting in
>>> patching the original instruction with an identical copy first.
>>
>> OTOH the above generates atrocious crap code :/
>
> Yes.
Please don't generate crap code -- that's exactly The Wrong Thing. I didn't
actually look at the output code; I honestly didn't think that that would even
be an issue.
If it is REALLY evil, then do something like shell script to generate the code
instead...
(One big problem here is that cpp doesn't understand colons as separators...)
-hpa
* Re: [PATCH v2 11/12] x86/paravirt: Don't use pv_ops vector for MSR access functions
2025-09-30 19:49 ` H. Peter Anvin
@ 2025-09-30 19:59 ` H. Peter Anvin
2025-10-01 6:45 ` Peter Zijlstra
1 sibling, 0 replies; 43+ messages in thread
From: H. Peter Anvin @ 2025-09-30 19:59 UTC (permalink / raw)
To: Jürgen Groß, Peter Zijlstra
Cc: linux-kernel, x86, virtualization, xin, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Ajay Kaher,
Alexey Makhalov, Broadcom internal kernel review list,
Boris Ostrovsky, xen-devel
On 2025-09-30 12:49, H. Peter Anvin wrote:
>
> /* Xen code, stub sets CF = 1 on failure */
>
> 0: e8 xx xx xx xx call asm_xen_pv_wrmsr
> 5: 73 03 jnc 0xa
> 7: 0f 0b ud2
> 9: 90 nop
> a:
>
> The trap point even ends up in the same place! UD2 can be any 1-, 2-, or
> 3-byte trapping instruction.
>
You can, of course, also simply have a conditional jump, at the expense of
making the whole alternative block one byte longer:
0: e8 xx xx xx xx call asm_xen_pv_wrmsr
5: 0f 82 xx xx xx xx jc wrmsr_failed
-hpa
* Re: [PATCH v2 11/12] x86/paravirt: Don't use pv_ops vector for MSR access functions
2025-09-30 19:49 ` H. Peter Anvin
2025-09-30 19:59 ` H. Peter Anvin
@ 2025-10-01 6:45 ` Peter Zijlstra
1 sibling, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2025-10-01 6:45 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Jürgen Groß, linux-kernel, x86, virtualization, xin,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
Ajay Kaher, Alexey Makhalov, Broadcom internal kernel review list,
Boris Ostrovsky, xen-devel
On Tue, Sep 30, 2025 at 12:49:21PM -0700, H. Peter Anvin wrote:
> /* Xen code, stub sets CF = 1 on failure */
>
> 0: e8 xx xx xx xx call asm_xen_pv_wrmsr
> 5: 73 03 jnc 0xa
> 7: 0f 0b ud2
> 9: 90 nop
> a:
>
> The trap point even ends up in the same place! UD2 can be any 1-, 2-, or
> 3-byte trapping instruction.
Please don't rely on flags being retained across RET. The various
mitigations have trouble with that.
* Re: [PATCH v2 11/12] x86/paravirt: Don't use pv_ops vector for MSR access functions
2025-09-30 7:03 ` [PATCH v2 11/12] x86/paravirt: Don't use pv_ops vector for MSR access functions Juergen Gross
2025-09-30 8:38 ` Peter Zijlstra
@ 2025-09-30 21:27 ` kernel test robot
2025-10-01 5:48 ` Jürgen Groß
1 sibling, 1 reply; 43+ messages in thread
From: kernel test robot @ 2025-09-30 21:27 UTC (permalink / raw)
To: Juergen Gross, linux-kernel, x86, virtualization
Cc: llvm, oe-kbuild-all, xin, Juergen Gross, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Ajay Kaher, Alexey Makhalov, Broadcom internal kernel review list,
Boris Ostrovsky, xen-devel
Hi Juergen,
kernel test robot noticed the following build errors:
[auto build test ERROR on tip/x86/core]
[also build test ERROR on linus/master v6.17]
[cannot apply to kvm/queue kvm/next tip/master kvm/linux-next tip/auto-latest next-20250929]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Juergen-Gross/coco-tdx-Rename-MSR-access-helpers/20250930-150753
base: tip/x86/core
patch link: https://lore.kernel.org/r/20250930070356.30695-12-jgross%40suse.com
patch subject: [PATCH v2 11/12] x86/paravirt: Don't use pv_ops vector for MSR access functions
config: x86_64-buildonly-randconfig-001-20251001 (https://download.01.org/0day-ci/archive/20251001/202510010555.InsgYDTd-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251001/202510010555.InsgYDTd-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510010555.InsgYDTd-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from arch/x86/kernel/asm-offsets.c:9:
In file included from include/linux/crypto.h:18:
In file included from include/linux/slab.h:16:
In file included from include/linux/gfp.h:7:
In file included from include/linux/mmzone.h:22:
In file included from include/linux/mm_types.h:16:
In file included from include/linux/uprobes.h:18:
In file included from include/linux/timer.h:6:
In file included from include/linux/ktime.h:25:
In file included from include/linux/jiffies.h:10:
In file included from include/linux/time.h:60:
In file included from include/linux/time32.h:13:
In file included from include/linux/timex.h:67:
In file included from arch/x86/include/asm/timex.h:6:
In file included from arch/x86/include/asm/tsc.h:11:
>> arch/x86/include/asm/msr.h:327:10: fatal error: 'asm/xen/msr.h' file not found
327 | #include <asm/xen/msr.h>
| ^~~~~~~~~~~~~~~
1 error generated.
make[3]: *** [scripts/Makefile.build:182: arch/x86/kernel/asm-offsets.s] Error 1 shuffle=3471495288
make[3]: Target 'prepare' not remade because of errors.
make[2]: *** [Makefile:1282: prepare0] Error 2 shuffle=3471495288
make[2]: Target 'prepare' not remade because of errors.
make[1]: *** [Makefile:248: __sub-make] Error 2 shuffle=3471495288
make[1]: Target 'prepare' not remade because of errors.
make: *** [Makefile:248: __sub-make] Error 2 shuffle=3471495288
make: Target 'prepare' not remade because of errors.
vim +327 arch/x86/include/asm/msr.h
325
326 #ifdef CONFIG_XEN_PV
> 327 #include <asm/xen/msr.h>
328
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 11/12] x86/paravirt: Don't use pv_ops vector for MSR access functions
2025-09-30 21:27 ` kernel test robot
@ 2025-10-01 5:48 ` Jürgen Groß
0 siblings, 0 replies; 43+ messages in thread
From: Jürgen Groß @ 2025-10-01 5:48 UTC (permalink / raw)
To: kernel test robot, linux-kernel, x86, virtualization
Cc: llvm, oe-kbuild-all, xin, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, Ajay Kaher,
Alexey Makhalov, Broadcom internal kernel review list,
Boris Ostrovsky, xen-devel
On 30.09.25 23:27, kernel test robot wrote:
> Hi Juergen,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on tip/x86/core]
> [also build test ERROR on linus/master v6.17]
> [cannot apply to kvm/queue kvm/next tip/master kvm/linux-next tip/auto-latest next-20250929]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Juergen-Gross/coco-tdx-Rename-MSR-access-helpers/20250930-150753
> base: tip/x86/core
> patch link: https://lore.kernel.org/r/20250930070356.30695-12-jgross%40suse.com
> patch subject: [PATCH v2 11/12] x86/paravirt: Don't use pv_ops vector for MSR access functions
> config: x86_64-buildonly-randconfig-001-20251001 (https://download.01.org/0day-ci/archive/20251001/202510010555.InsgYDTd-lkp@intel.com/config)
> compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251001/202510010555.InsgYDTd-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202510010555.InsgYDTd-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> In file included from arch/x86/kernel/asm-offsets.c:9:
> In file included from include/linux/crypto.h:18:
> In file included from include/linux/slab.h:16:
> In file included from include/linux/gfp.h:7:
> In file included from include/linux/mmzone.h:22:
> In file included from include/linux/mm_types.h:16:
> In file included from include/linux/uprobes.h:18:
> In file included from include/linux/timer.h:6:
> In file included from include/linux/ktime.h:25:
> In file included from include/linux/jiffies.h:10:
> In file included from include/linux/time.h:60:
> In file included from include/linux/time32.h:13:
> In file included from include/linux/timex.h:67:
> In file included from arch/x86/include/asm/timex.h:6:
> In file included from arch/x86/include/asm/tsc.h:11:
>>> arch/x86/include/asm/msr.h:327:10: fatal error: 'asm/xen/msr.h' file not found
> 327 | #include <asm/xen/msr.h>
> | ^~~~~~~~~~~~~~~
> 1 error generated.
> make[3]: *** [scripts/Makefile.build:182: arch/x86/kernel/asm-offsets.s] Error 1 shuffle=3471495288
> make[3]: Target 'prepare' not remade because of errors.
> make[2]: *** [Makefile:1282: prepare0] Error 2 shuffle=3471495288
> make[2]: Target 'prepare' not remade because of errors.
> make[1]: *** [Makefile:248: __sub-make] Error 2 shuffle=3471495288
> make[1]: Target 'prepare' not remade because of errors.
> make: *** [Makefile:248: __sub-make] Error 2 shuffle=3471495288
> make: Target 'prepare' not remade because of errors.
>
>
> vim +327 arch/x86/include/asm/msr.h
>
> 325
> 326 #ifdef CONFIG_XEN_PV
> > 327 #include <asm/xen/msr.h>
> 328
>
Uh, I forgot to "git add" that new header file. Will be corrected in V3.
Juergen
* [PATCH v2 12/12] x86/msr: Reduce number of low level MSR access helpers
2025-09-30 7:03 [PATCH v2 00/12] x86/msr: Inline rdmsr/wrmsr instructions Juergen Gross
` (10 preceding siblings ...)
2025-09-30 7:03 ` [PATCH v2 11/12] x86/paravirt: Don't use pv_ops vector for MSR access functions Juergen Gross
@ 2025-09-30 7:03 ` Juergen Gross
2025-09-30 19:19 ` [PATCH v2 00/12] x86/msr: Inline rdmsr/wrmsr instructions H. Peter Anvin
12 siblings, 0 replies; 43+ messages in thread
From: Juergen Gross @ 2025-09-30 7:03 UTC (permalink / raw)
To: linux-kernel, x86
Cc: xin, Juergen Gross, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Boris Ostrovsky, xen-devel
Some MSR access helpers are now redundant, so remove the ones which are
no longer needed.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
arch/x86/include/asm/msr.h | 14 ++------------
arch/x86/xen/enlighten_pv.c | 4 ++--
2 files changed, 4 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index d42cd2c19818..43924d8a3d66 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -290,21 +290,11 @@ static __always_inline void native_wrmsr(u32 msr, u32 low, u32 high)
native_wrmsrq(msr, (u64)high << 32 | low);
}
-static __always_inline u64 native_read_msr(u32 msr)
-{
- return native_rdmsrq(msr);
-}
-
static __always_inline int native_read_msr_safe(u32 msr, u64 *val)
{
return __rdmsr(msr, val, EX_TYPE_RDMSR_SAFE) ? -EIO : 0;
}
-static __always_inline void native_write_msr(u32 msr, u64 val)
-{
- native_wrmsrq(msr, val);
-}
-
static __always_inline int native_write_msr_safe(u32 msr, u64 val)
{
return __wrmsrq(msr, val, EX_TYPE_WRMSR_SAFE) ? -EIO : 0;
@@ -369,7 +359,7 @@ static __always_inline u64 rdpmc(int counter)
#else
static __always_inline u64 read_msr(u32 msr)
{
- return native_read_msr(msr);
+ return native_rdmsrq(msr);
}
static __always_inline int read_msr_safe(u32 msr, u64 *p)
@@ -379,7 +369,7 @@ static __always_inline int read_msr_safe(u32 msr, u64 *p)
static __always_inline void write_msr(u32 msr, u64 val)
{
- native_write_msr(msr, val);
+ native_wrmsrq(msr, val);
}
static __always_inline int write_msr_safe(u32 msr, u64 val)
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index df653099c567..277e053cf3dd 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1097,7 +1097,7 @@ static u64 xen_do_read_msr(u32 msr, int *err)
if (err)
*err = native_read_msr_safe(msr, &val);
else
- val = native_read_msr(msr);
+ val = native_rdmsrq(msr);
switch (msr) {
case MSR_IA32_APICBASE:
@@ -1156,7 +1156,7 @@ static void xen_do_write_msr(u32 msr, u64 val, int *err)
if (err)
*err = native_write_msr_safe(msr, val);
else
- native_write_msr(msr, val);
+ native_wrmsrq(msr, val);
}
}
--
2.51.0
* Re: [PATCH v2 00/12] x86/msr: Inline rdmsr/wrmsr instructions
2025-09-30 7:03 [PATCH v2 00/12] x86/msr: Inline rdmsr/wrmsr instructions Juergen Gross
` (11 preceding siblings ...)
2025-09-30 7:03 ` [PATCH v2 12/12] x86/msr: Reduce number of low level MSR access helpers Juergen Gross
@ 2025-09-30 19:19 ` H. Peter Anvin
12 siblings, 0 replies; 43+ messages in thread
From: H. Peter Anvin @ 2025-09-30 19:19 UTC (permalink / raw)
To: Juergen Gross, linux-kernel, x86, linux-coco, kvm, linux-hyperv,
virtualization, llvm
Cc: xin, Kirill A. Shutemov, Dave Hansen, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Sean Christopherson, Paolo Bonzini,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Vitaly Kuznetsov, Boris Ostrovsky, xen-devel, Ajay Kaher,
Alexey Makhalov, Broadcom internal kernel review list,
Andy Lutomirski, Peter Zijlstra, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt
On 2025-09-30 00:03, Juergen Gross wrote:
> When building a kernel with CONFIG_PARAVIRT_XXL the paravirt
> infrastructure will always use functions for reading or writing MSRs,
> even when running on bare metal.
>
> Switch to inline RDMSR/WRMSR instructions in this case, reducing the
> paravirt overhead.
>
> In order to make this less intrusive, some further reorganization of
> the MSR access helpers is done in the first 5 patches.
>
> The next 5 patches are converting the non-paravirt case to use direct
> inlining of the MSR access instructions, including the WRMSRNS
> instruction and the immediate variants of RDMSR and WRMSR if possible.
>
> Patch 11 removes the PV hooks for MSR accesses and implements the
> Xen PV cases via calls depending on X86_FEATURE_XENPV, which results
> in runtime patching those calls away for the non-XenPV case.
>
> Patch 12 is a final little cleanup patch.
>
> This series has been tested to work with Xen PV and on bare metal.
>
> This series is inspired by Xin Li, who used a similar approach, but
> (in my opinion) with some flaws. Originally I thought it should be
> possible to use the paravirt infrastructure, but this turned out to be
> rather complicated, especially for the Xen PV case in the *_safe()
> variants of the MSR access functions.
>
Looks good to me.
(I'm not at all surprised that paravirt_ops didn't do the job. Both Xin and I
had come to the same conclusion.)
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>