linux-ide.vger.kernel.org archive mirror
* [RFC PATCH v1 00/15] MSR refactor with new MSR instructions support
@ 2025-03-31  8:22 Xin Li (Intel)
  2025-03-31  8:22 ` [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl() Xin Li (Intel)
                   ` (14 more replies)
  0 siblings, 15 replies; 55+ messages in thread
From: Xin Li (Intel) @ 2025-03-31  8:22 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3,
	peterz, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	alexey.amakhalov, bcm-kernel-feedback-list, tony.luck, pbonzini,
	vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz, decui

Obviously the existing MSR code and the pv_ops MSR access APIs need some
love: https://lore.kernel.org/lkml/87y1h81ht4.ffs@tglx/

hpa started a discussion about how to refactor it last October:
https://lore.kernel.org/lkml/7a4de623-ecda-4369-a7ae-0c43ef328177@zytor.com/

The consensus so far is to utilize the alternatives mechanism to eliminate
the Xen MSR access overhead on native systems and enable new MSR instructions
based on their availability.

To achieve this, a code refactor is necessary.  Initially, the MSR API usage
needs to be unified and simplified, which is addressed by patches 1 through 7.

Patches 8 and 9 introduce basic support for immediate form MSR instructions,
while patch 10 employs the immediate form WRMSRNS in VMX.

Finally, patches 11 to 15 leverage the alternatives mechanism to read and
write MSRs.


H. Peter Anvin (Intel) (1):
  x86/extable: Implement EX_TYPE_FUNC_REWIND

Xin Li (Intel) (14):
  x86/msr: Replace __wrmsr() with native_wrmsrl()
  x86/msr: Replace __rdmsr() with native_rdmsrl()
  x86/msr: Simplify pmu_msr_{read,write}()
  x86/msr: Let pv_cpu_ops.write_msr{_safe}() take an u64 instead of two
    u32
  x86/msr: Replace wrmsr(msr, low, 0) with wrmsrl(msr, value)
  x86/msr: Remove MSR write APIs that take the MSR value in two u32
    arguments
  x86/msr: Remove pmu_msr_{read,write}()
  x86/cpufeatures: Add a CPU feature bit for MSR immediate form
    instructions
  x86/opcode: Add immediate form MSR instructions to x86-opcode-map
  KVM: VMX: Use WRMSRNS or its immediate form when available
  x86/msr: Use the alternatives mechanism to write MSR
  x86/msr: Use the alternatives mechanism to read MSR
  x86/extable: Add support for the immediate form MSR instructions
  x86/msr: Move the ARGS macros after the MSR read/write APIs

 arch/x86/coco/sev/core.c                   |   2 +-
 arch/x86/events/amd/brs.c                  |   4 +-
 arch/x86/hyperv/hv_apic.c                  |   6 +-
 arch/x86/hyperv/hv_vtl.c                   |   4 +-
 arch/x86/hyperv/ivm.c                      |   2 +-
 arch/x86/include/asm/apic.h                |   4 +-
 arch/x86/include/asm/asm.h                 |   6 +
 arch/x86/include/asm/cpufeatures.h         |  19 +-
 arch/x86/include/asm/extable_fixup_types.h |   1 +
 arch/x86/include/asm/fred.h                |   2 +-
 arch/x86/include/asm/mshyperv.h            |   2 +-
 arch/x86/include/asm/msr-index.h           |   6 +
 arch/x86/include/asm/msr.h                 | 664 ++++++++++++++++-----
 arch/x86/include/asm/paravirt.h            |  64 --
 arch/x86/include/asm/paravirt_types.h      |  11 -
 arch/x86/include/asm/switch_to.h           |   2 +-
 arch/x86/kernel/cpu/amd.c                  |   2 +-
 arch/x86/kernel/cpu/common.c               |  10 +-
 arch/x86/kernel/cpu/mce/core.c             |   6 +-
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c  |  12 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c     |   2 +-
 arch/x86/kernel/cpu/scattered.c            |   1 +
 arch/x86/kernel/cpu/umwait.c               |   4 +-
 arch/x86/kernel/kvm.c                      |   2 +-
 arch/x86/kernel/kvmclock.c                 |   2 +-
 arch/x86/kernel/paravirt.c                 |   4 -
 arch/x86/kvm/svm/svm.c                     |  15 +-
 arch/x86/kvm/vmx/vmenter.S                 |  28 +-
 arch/x86/kvm/vmx/vmx.c                     |   4 +-
 arch/x86/lib/x86-opcode-map.txt            |   5 +-
 arch/x86/mm/extable.c                      | 186 ++++--
 arch/x86/mm/mem_encrypt_identity.c         |   4 +-
 arch/x86/xen/enlighten_pv.c                | 110 ++--
 arch/x86/xen/pmu.c                         |  43 +-
 arch/x86/xen/xen-asm.S                     |  89 +++
 arch/x86/xen/xen-ops.h                     |  12 +-
 drivers/ata/pata_cs5535.c                  |  12 +-
 drivers/ata/pata_cs5536.c                  |   6 +-
 drivers/cpufreq/acpi-cpufreq.c             |   2 +-
 drivers/cpufreq/e_powersaver.c             |   2 +-
 drivers/cpufreq/powernow-k6.c              |   8 +-
 tools/arch/x86/lib/x86-opcode-map.txt      |   5 +-
 42 files changed, 896 insertions(+), 479 deletions(-)


base-commit: 8fc8ae1aeed6dc895bf35a4797c6e770574f4612
-- 
2.49.0


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-03-31  8:22 [RFC PATCH v1 00/15] MSR refactor with new MSR instructions support Xin Li (Intel)
@ 2025-03-31  8:22 ` Xin Li (Intel)
  2025-03-31 10:17   ` Ingo Molnar
  2025-03-31 21:45   ` [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl() Andrew Cooper
  2025-03-31  8:22 ` [RFC PATCH v1 02/15] x86/msr: Replace __rdmsr() with native_rdmsrl() Xin Li (Intel)
                   ` (13 subsequent siblings)
  14 siblings, 2 replies; 55+ messages in thread
From: Xin Li (Intel) @ 2025-03-31  8:22 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3,
	peterz, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	alexey.amakhalov, bcm-kernel-feedback-list, tony.luck, pbonzini,
	vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz, decui

__wrmsr() is the lowest-level primitive MSR write API, and its direct
use is NOT preferred.  Use its wrapper function native_wrmsrl() instead.

No functional change intended.

Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
 arch/x86/events/amd/brs.c                 | 2 +-
 arch/x86/include/asm/apic.h               | 2 +-
 arch/x86/include/asm/msr.h                | 6 ++++--
 arch/x86/kernel/cpu/mce/core.c            | 2 +-
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 6 +++---
 5 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/amd/brs.c b/arch/x86/events/amd/brs.c
index ec3427463382..4a47f3c108de 100644
--- a/arch/x86/events/amd/brs.c
+++ b/arch/x86/events/amd/brs.c
@@ -44,7 +44,7 @@ static inline unsigned int brs_to(int idx)
 static __always_inline void set_debug_extn_cfg(u64 val)
 {
 	/* bits[4:3] must always be set to 11b */
-	__wrmsr(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3, val >> 32);
+	native_wrmsrl(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3);
 }
 
 static __always_inline u64 get_debug_extn_cfg(void)
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index c903d358405d..3345a819c859 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -214,7 +214,7 @@ static inline void native_apic_msr_write(u32 reg, u32 v)
 
 static inline void native_apic_msr_eoi(void)
 {
-	__wrmsr(APIC_BASE_MSR + (APIC_EOI >> 4), APIC_EOI_ACK, 0);
+	native_wrmsrl(APIC_BASE_MSR + (APIC_EOI >> 4), APIC_EOI_ACK);
 }
 
 static inline u32 native_apic_msr_read(u32 reg)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 9397a319d165..27ea8793705d 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -144,10 +144,12 @@ static inline unsigned long long native_read_msr_safe(unsigned int msr,
 static inline void notrace
 native_write_msr(unsigned int msr, u32 low, u32 high)
 {
-	__wrmsr(msr, low, high);
+	u64 val = (u64)high << 32 | low;
+
+	native_wrmsrl(msr, val);
 
 	if (tracepoint_enabled(write_msr))
-		do_trace_write_msr(msr, ((u64)high << 32 | low), 0);
+		do_trace_write_msr(msr, val, 0);
 }
 
 /* Can be uninlined because referenced by paravirt */
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 1f14c3308b6b..0eaeaba12df2 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1306,7 +1306,7 @@ static noinstr bool mce_check_crashing_cpu(void)
 		}
 
 		if (mcgstatus & MCG_STATUS_RIPV) {
-			__wrmsr(MSR_IA32_MCG_STATUS, 0, 0);
+			native_wrmsrl(MSR_IA32_MCG_STATUS, 0);
 			return true;
 		}
 	}
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 01fa7890b43f..55536120c8d1 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -481,7 +481,7 @@ int resctrl_arch_pseudo_lock_fn(void *_plr)
 	 * cache.
 	 */
 	saved_msr = __rdmsr(MSR_MISC_FEATURE_CONTROL);
-	__wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
+	native_wrmsrl(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits);
 	closid_p = this_cpu_read(pqr_state.cur_closid);
 	rmid_p = this_cpu_read(pqr_state.cur_rmid);
 	mem_r = plr->kmem;
@@ -493,7 +493,7 @@ int resctrl_arch_pseudo_lock_fn(void *_plr)
 	 * pseudo-locked followed by reading of kernel memory to load it
 	 * into the cache.
 	 */
-	__wrmsr(MSR_IA32_PQR_ASSOC, rmid_p, plr->closid);
+	native_wrmsrl(MSR_IA32_PQR_ASSOC, (u64)plr->closid << 32 | rmid_p);
 
 	/*
 	 * Cache was flushed earlier. Now access kernel memory to read it
@@ -530,7 +530,7 @@ int resctrl_arch_pseudo_lock_fn(void *_plr)
 	 * Critical section end: restore closid with capacity bitmask that
 	 * does not overlap with pseudo-locked region.
 	 */
-	__wrmsr(MSR_IA32_PQR_ASSOC, rmid_p, closid_p);
+	native_wrmsrl(MSR_IA32_PQR_ASSOC, (u64)closid_p << 32 | rmid_p);
 
 	/* Re-enable the hardware prefetcher(s) */
 	wrmsrl(MSR_MISC_FEATURE_CONTROL, saved_msr);
-- 
2.49.0



* [RFC PATCH v1 02/15] x86/msr: Replace __rdmsr() with native_rdmsrl()
  2025-03-31  8:22 [RFC PATCH v1 00/15] MSR refactor with new MSR instructions support Xin Li (Intel)
  2025-03-31  8:22 ` [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl() Xin Li (Intel)
@ 2025-03-31  8:22 ` Xin Li (Intel)
  2025-03-31 10:26   ` Ingo Molnar
  2025-03-31  8:22 ` [RFC PATCH v1 03/15] x86/msr: Simplify pmu_msr_{read,write}() Xin Li (Intel)
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Xin Li (Intel) @ 2025-03-31  8:22 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3,
	peterz, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	alexey.amakhalov, bcm-kernel-feedback-list, tony.luck, pbonzini,
	vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz, decui

__rdmsr() is the lowest-level primitive MSR read API, and its direct
use is NOT preferred.  Use its wrapper function native_rdmsrl() instead.

No functional change intended.

Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
 arch/x86/coco/sev/core.c                  | 2 +-
 arch/x86/events/amd/brs.c                 | 2 +-
 arch/x86/hyperv/hv_vtl.c                  | 4 ++--
 arch/x86/hyperv/ivm.c                     | 2 +-
 arch/x86/include/asm/mshyperv.h           | 2 +-
 arch/x86/include/asm/msr.h                | 5 +++++
 arch/x86/kernel/cpu/common.c              | 2 +-
 arch/x86/kernel/cpu/mce/core.c            | 4 ++--
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
 arch/x86/kvm/vmx/vmx.c                    | 4 ++--
 arch/x86/mm/mem_encrypt_identity.c        | 4 ++--
 11 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index b0c1a7a57497..d38e6f0ff9c4 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -276,7 +276,7 @@ static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
 
 static inline u64 sev_es_rd_ghcb_msr(void)
 {
-	return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
+	return native_rdmsrl(MSR_AMD64_SEV_ES_GHCB);
 }
 
 static __always_inline void sev_es_wr_ghcb_msr(u64 val)
diff --git a/arch/x86/events/amd/brs.c b/arch/x86/events/amd/brs.c
index 4a47f3c108de..3ad7d87b5403 100644
--- a/arch/x86/events/amd/brs.c
+++ b/arch/x86/events/amd/brs.c
@@ -49,7 +49,7 @@ static __always_inline void set_debug_extn_cfg(u64 val)
 
 static __always_inline u64 get_debug_extn_cfg(void)
 {
-	return __rdmsr(MSR_AMD_DBG_EXTN_CFG);
+	return native_rdmsrl(MSR_AMD_DBG_EXTN_CFG);
 }
 
 static bool __init amd_brs_detect(void)
diff --git a/arch/x86/hyperv/hv_vtl.c b/arch/x86/hyperv/hv_vtl.c
index 13242ed8ff16..4a27e475d35f 100644
--- a/arch/x86/hyperv/hv_vtl.c
+++ b/arch/x86/hyperv/hv_vtl.c
@@ -149,11 +149,11 @@ static int hv_vtl_bringup_vcpu(u32 target_vp_index, int cpu, u64 eip_ignored)
 	input->vp_context.rip = rip;
 	input->vp_context.rsp = rsp;
 	input->vp_context.rflags = 0x0000000000000002;
-	input->vp_context.efer = __rdmsr(MSR_EFER);
+	input->vp_context.efer = native_rdmsrl(MSR_EFER);
 	input->vp_context.cr0 = native_read_cr0();
 	input->vp_context.cr3 = __native_read_cr3();
 	input->vp_context.cr4 = native_read_cr4();
-	input->vp_context.msr_cr_pat = __rdmsr(MSR_IA32_CR_PAT);
+	input->vp_context.msr_cr_pat = native_rdmsrl(MSR_IA32_CR_PAT);
 	input->vp_context.idtr.limit = idt_ptr.size;
 	input->vp_context.idtr.base = idt_ptr.address;
 	input->vp_context.gdtr.limit = gdt_ptr.size;
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 77bf05f06b9e..95cf2113a72a 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -110,7 +110,7 @@ u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
 
 static inline u64 rd_ghcb_msr(void)
 {
-	return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
+	return native_rdmsrl(MSR_AMD64_SEV_ES_GHCB);
 }
 
 static inline void wr_ghcb_msr(u64 val)
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index bab5ccfc60a7..2ca6ef89530d 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -304,7 +304,7 @@ void hv_set_non_nested_msr(unsigned int reg, u64 value);
 
 static __always_inline u64 hv_raw_get_msr(unsigned int reg)
 {
-	return __rdmsr(reg);
+	return native_rdmsrl(reg);
 }
 
 #else /* CONFIG_HYPERV */
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 27ea8793705d..fb3d7c4cb774 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -106,6 +106,11 @@ do {							\
 	(void)((val2) = (u32)(__val >> 32));		\
 } while (0)
 
+static __always_inline u64 native_rdmsrl(const u32 msr)
+{
+	return __rdmsr(msr);
+}
+
 #define native_wrmsr(msr, low, high)			\
 	__wrmsr(msr, low, high)
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 12126adbc3a9..a268db71d944 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -164,7 +164,7 @@ static void ppin_init(struct cpuinfo_x86 *c)
 
 	/* Is the enable bit set? */
 	if (val & 2UL) {
-		c->ppin = __rdmsr(info->msr_ppin);
+		c->ppin = native_rdmsrl(info->msr_ppin);
 		set_cpu_cap(c, info->feature);
 		return;
 	}
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 0eaeaba12df2..0e050af723f5 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -121,7 +121,7 @@ void mce_prep_record_common(struct mce *m)
 {
 	m->cpuid	= cpuid_eax(1);
 	m->cpuvendor	= boot_cpu_data.x86_vendor;
-	m->mcgcap	= __rdmsr(MSR_IA32_MCG_CAP);
+	m->mcgcap	= native_rdmsrl(MSR_IA32_MCG_CAP);
 	/* need the internal __ version to avoid deadlocks */
 	m->time		= __ktime_get_real_seconds();
 }
@@ -1298,7 +1298,7 @@ static noinstr bool mce_check_crashing_cpu(void)
 	    (crashing_cpu != -1 && crashing_cpu != cpu)) {
 		u64 mcgstatus;
 
-		mcgstatus = __rdmsr(MSR_IA32_MCG_STATUS);
+		mcgstatus = native_rdmsrl(MSR_IA32_MCG_STATUS);
 
 		if (boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN) {
 			if (mcgstatus & MCG_STATUS_LMCES)
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 55536120c8d1..675fd9f93e33 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -480,7 +480,7 @@ int resctrl_arch_pseudo_lock_fn(void *_plr)
 	 * the buffer and evict pseudo-locked memory read earlier from the
 	 * cache.
 	 */
-	saved_msr = __rdmsr(MSR_MISC_FEATURE_CONTROL);
+	saved_msr = native_rdmsrl(MSR_MISC_FEATURE_CONTROL);
 	native_wrmsrl(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits);
 	closid_p = this_cpu_read(pqr_state.cur_closid);
 	rmid_p = this_cpu_read(pqr_state.cur_rmid);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5c5766467a61..2a24060397cd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -380,7 +380,7 @@ static __always_inline void vmx_disable_fb_clear(struct vcpu_vmx *vmx)
 	if (!vmx->disable_fb_clear)
 		return;
 
-	msr = __rdmsr(MSR_IA32_MCU_OPT_CTRL);
+	msr = native_rdmsrl(MSR_IA32_MCU_OPT_CTRL);
 	msr |= FB_CLEAR_DIS;
 	native_wrmsrl(MSR_IA32_MCU_OPT_CTRL, msr);
 	/* Cache the MSR value to avoid reading it later */
@@ -7307,7 +7307,7 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
 		return;
 
 	if (flags & VMX_RUN_SAVE_SPEC_CTRL)
-		vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
+		vmx->spec_ctrl = native_rdmsrl(MSR_IA32_SPEC_CTRL);
 
 	/*
 	 * If the guest/host SPEC_CTRL values differ, restore the host value.
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 5eecdd92da10..3005b07a0016 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -526,7 +526,7 @@ void __head sme_enable(struct boot_params *bp)
 	me_mask = 1UL << (ebx & 0x3f);
 
 	/* Check the SEV MSR whether SEV or SME is enabled */
-	RIP_REL_REF(sev_status) = msr = __rdmsr(MSR_AMD64_SEV);
+	RIP_REL_REF(sev_status) = msr = native_rdmsrl(MSR_AMD64_SEV);
 	feature_mask = (msr & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
 
 	/*
@@ -557,7 +557,7 @@ void __head sme_enable(struct boot_params *bp)
 			return;
 
 		/* For SME, check the SYSCFG MSR */
-		msr = __rdmsr(MSR_AMD64_SYSCFG);
+		msr = native_rdmsrl(MSR_AMD64_SYSCFG);
 		if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
 			return;
 	}
-- 
2.49.0



* [RFC PATCH v1 03/15] x86/msr: Simplify pmu_msr_{read,write}()
  2025-03-31  8:22 [RFC PATCH v1 00/15] MSR refactor with new MSR instructions support Xin Li (Intel)
  2025-03-31  8:22 ` [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl() Xin Li (Intel)
  2025-03-31  8:22 ` [RFC PATCH v1 02/15] x86/msr: Replace __rdmsr() with native_rdmsrl() Xin Li (Intel)
@ 2025-03-31  8:22 ` Xin Li (Intel)
  2025-03-31  8:22 ` [RFC PATCH v1 04/15] x86/msr: Let pv_cpu_ops.write_msr{_safe}() take an u64 instead of two u32 Xin Li (Intel)
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 55+ messages in thread
From: Xin Li (Intel) @ 2025-03-31  8:22 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3,
	peterz, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	alexey.amakhalov, bcm-kernel-feedback-list, tony.luck, pbonzini,
	vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz, decui

Remove the calls to native_{read,write}_msr{_safe}() in pmu_msr_{read,write}(),
and have them return false so that their callers do that instead.

Refactor pmu_msr_write() to take the input MSR value in a u64 argument,
replacing the current dual u32 arguments.

Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
 arch/x86/xen/enlighten_pv.c |  6 +++++-
 arch/x86/xen/pmu.c          | 27 ++++-----------------------
 arch/x86/xen/xen-ops.h      |  4 ++--
 3 files changed, 11 insertions(+), 26 deletions(-)

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index dcc2041f8e61..2bfe57469ac3 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1133,6 +1133,8 @@ static void set_seg(unsigned int which, unsigned int low, unsigned int high,
 static void xen_do_write_msr(unsigned int msr, unsigned int low,
 			     unsigned int high, int *err)
 {
+	u64 val;
+
 	switch (msr) {
 	case MSR_FS_BASE:
 		set_seg(SEGBASE_FS, low, high, err);
@@ -1159,7 +1161,9 @@ static void xen_do_write_msr(unsigned int msr, unsigned int low,
 		break;
 
 	default:
-		if (!pmu_msr_write(msr, low, high, err)) {
+		val = (u64)high << 32 | low;
+
+		if (!pmu_msr_write(msr, val)) {
 			if (err)
 				*err = native_write_msr_safe(msr, low, high);
 			else
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index f06987b0efc3..1364cd3fb3ef 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -313,37 +313,18 @@ static bool pmu_msr_chk_emulated(unsigned int msr, uint64_t *val, bool is_read,
 	return true;
 }
 
-bool pmu_msr_read(unsigned int msr, uint64_t *val, int *err)
+bool pmu_msr_read(u32 msr, u64 *val, int *err)
 {
 	bool emulated;
 
-	if (!pmu_msr_chk_emulated(msr, val, true, &emulated))
-		return false;
-
-	if (!emulated) {
-		*val = err ? native_read_msr_safe(msr, err)
-			   : native_read_msr(msr);
-	}
-
-	return true;
+	return pmu_msr_chk_emulated(msr, val, true, &emulated) && emulated;
 }
 
-bool pmu_msr_write(unsigned int msr, uint32_t low, uint32_t high, int *err)
+bool pmu_msr_write(u32 msr, u64 val)
 {
-	uint64_t val = ((uint64_t)high << 32) | low;
 	bool emulated;
 
-	if (!pmu_msr_chk_emulated(msr, &val, false, &emulated))
-		return false;
-
-	if (!emulated) {
-		if (err)
-			*err = native_write_msr_safe(msr, low, high);
-		else
-			native_write_msr(msr, low, high);
-	}
-
-	return true;
+	return pmu_msr_chk_emulated(msr, &val, false, &emulated) && emulated;
 }
 
 static unsigned long long xen_amd_read_pmc(int counter)
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 63c13a2ccf55..4a0a1d73d8b8 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -274,8 +274,8 @@ void xen_pmu_finish(int cpu);
 static inline void xen_pmu_init(int cpu) {}
 static inline void xen_pmu_finish(int cpu) {}
 #endif
-bool pmu_msr_read(unsigned int msr, uint64_t *val, int *err);
-bool pmu_msr_write(unsigned int msr, uint32_t low, uint32_t high, int *err);
+bool pmu_msr_read(u32 msr, u64 *val, int *err);
+bool pmu_msr_write(u32 msr, u64 val);
 int pmu_apic_update(uint32_t reg);
 unsigned long long xen_read_pmc(int counter);
 
-- 
2.49.0



* [RFC PATCH v1 04/15] x86/msr: Let pv_cpu_ops.write_msr{_safe}() take an u64 instead of two u32
  2025-03-31  8:22 [RFC PATCH v1 00/15] MSR refactor with new MSR instructions support Xin Li (Intel)
                   ` (2 preceding siblings ...)
  2025-03-31  8:22 ` [RFC PATCH v1 03/15] x86/msr: Simplify pmu_msr_{read,write}() Xin Li (Intel)
@ 2025-03-31  8:22 ` Xin Li (Intel)
  2025-03-31  8:22 ` [RFC PATCH v1 05/15] x86/msr: Replace wrmsr(msr, low, 0) with wrmsrl(msr, value) Xin Li (Intel)
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 55+ messages in thread
From: Xin Li (Intel) @ 2025-03-31  8:22 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3,
	peterz, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	alexey.amakhalov, bcm-kernel-feedback-list, tony.luck, pbonzini,
	vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz, decui

Refactor pv_cpu_ops.write_msr{_safe}() to take the input MSR value
in a single u64 argument, replacing the current dual u32 arguments.

No functional change intended.

Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
 arch/x86/include/asm/msr.h            | 33 ++++++++++++---------------
 arch/x86/include/asm/paravirt.h       | 10 ++++----
 arch/x86/include/asm/paravirt_types.h |  4 ++--
 arch/x86/kernel/kvmclock.c            |  2 +-
 arch/x86/kvm/svm/svm.c                | 15 +++---------
 arch/x86/xen/enlighten_pv.c           | 13 +++++------
 6 files changed, 30 insertions(+), 47 deletions(-)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index fb3d7c4cb774..121597fc5d41 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -91,12 +91,12 @@ static __always_inline unsigned long long __rdmsr(unsigned int msr)
 	return EAX_EDX_VAL(val, low, high);
 }
 
-static __always_inline void __wrmsr(unsigned int msr, u32 low, u32 high)
+static __always_inline void __wrmsr(u32 msr, u64 val)
 {
 	asm volatile("1: wrmsr\n"
 		     "2:\n"
 		     _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
-		     : : "c" (msr), "a"(low), "d" (high) : "memory");
+		     : : "c" (msr), "a"((u32)val), "d" ((u32)(val >> 32)) : "memory");
 }
 
 #define native_rdmsr(msr, val1, val2)			\
@@ -112,11 +112,10 @@ static __always_inline u64 native_rdmsrl(const u32 msr)
 }
 
 #define native_wrmsr(msr, low, high)			\
-	__wrmsr(msr, low, high)
+	__wrmsr((msr), ((u64)(high) << 32) | (low))
 
 #define native_wrmsrl(msr, val)				\
-	__wrmsr((msr), (u32)((u64)(val)),		\
-		       (u32)((u64)(val) >> 32))
+	__wrmsr((msr), (val))
 
 static inline unsigned long long native_read_msr(unsigned int msr)
 {
@@ -146,11 +145,8 @@ static inline unsigned long long native_read_msr_safe(unsigned int msr,
 }
 
 /* Can be uninlined because referenced by paravirt */
-static inline void notrace
-native_write_msr(unsigned int msr, u32 low, u32 high)
+static inline void notrace native_write_msr(u32 msr, u64 val)
 {
-	u64 val = (u64)high << 32 | low;
-
 	native_wrmsrl(msr, val);
 
 	if (tracepoint_enabled(write_msr))
@@ -158,8 +154,7 @@ native_write_msr(unsigned int msr, u32 low, u32 high)
 }
 
 /* Can be uninlined because referenced by paravirt */
-static inline int notrace
-native_write_msr_safe(unsigned int msr, u32 low, u32 high)
+static inline int notrace native_write_msr_safe(u32 msr, u64 val)
 {
 	int err;
 
@@ -167,10 +162,10 @@ native_write_msr_safe(unsigned int msr, u32 low, u32 high)
 		     "2:\n\t"
 		     _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_WRMSR_SAFE, %[err])
 		     : [err] "=a" (err)
-		     : "c" (msr), "0" (low), "d" (high)
+		     : "c" (msr), "0" ((u32)val), "d" ((u32)(val >> 32))
 		     : "memory");
 	if (tracepoint_enabled(write_msr))
-		do_trace_write_msr(msr, ((u64)high << 32 | low), err);
+		do_trace_write_msr(msr, val, err);
 	return err;
 }
 
@@ -258,23 +253,23 @@ do {								\
 	(void)((high) = (u32)(__val >> 32));			\
 } while (0)
 
-static inline void wrmsr(unsigned int msr, u32 low, u32 high)
+static inline void wrmsr(u32 msr, u32 low, u32 high)
 {
-	native_write_msr(msr, low, high);
+	native_write_msr(msr, (u64)high << 32 | low);
 }
 
 #define rdmsrl(msr, val)			\
 	((val) = native_read_msr((msr)))
 
-static inline void wrmsrl(unsigned int msr, u64 val)
+static inline void wrmsrl(u32 msr, u64 val)
 {
-	native_write_msr(msr, (u32)(val & 0xffffffffULL), (u32)(val >> 32));
+	native_write_msr(msr, val);
 }
 
 /* wrmsr with exception handling */
-static inline int wrmsr_safe(unsigned int msr, u32 low, u32 high)
+static inline int wrmsr_safe(u32 msr, u32 low, u32 high)
 {
-	return native_write_msr_safe(msr, low, high);
+	return native_write_msr_safe(msr, (u64)high << 32 | low);
 }
 
 /* rdmsr with exception handling */
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index c4c23190925c..f3d6e8394d38 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -180,10 +180,9 @@ static inline u64 paravirt_read_msr(unsigned msr)
 	return PVOP_CALL1(u64, cpu.read_msr, msr);
 }
 
-static inline void paravirt_write_msr(unsigned msr,
-				      unsigned low, unsigned high)
+static inline void paravirt_write_msr(u32 msr, u32 low, u32 high)
 {
-	PVOP_VCALL3(cpu.write_msr, msr, low, high);
+	PVOP_VCALL2(cpu.write_msr, msr, (u64)high << 32 | low);
 }
 
 static inline u64 paravirt_read_msr_safe(unsigned msr, int *err)
@@ -191,10 +190,9 @@ static inline u64 paravirt_read_msr_safe(unsigned msr, int *err)
 	return PVOP_CALL2(u64, cpu.read_msr_safe, msr, err);
 }
 
-static inline int paravirt_write_msr_safe(unsigned msr,
-					  unsigned low, unsigned high)
+static inline int paravirt_write_msr_safe(u32 msr, u32 low, u32 high)
 {
-	return PVOP_CALL3(int, cpu.write_msr_safe, msr, low, high);
+	return PVOP_CALL2(int, cpu.write_msr_safe, msr, (u64)high << 32 | low);
 }
 
 #define rdmsr(msr, val1, val2)			\
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 631c306ce1ff..78777b78da12 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -92,14 +92,14 @@ struct pv_cpu_ops {
 
 	/* Unsafe MSR operations.  These will warn or panic on failure. */
 	u64 (*read_msr)(unsigned int msr);
-	void (*write_msr)(unsigned int msr, unsigned low, unsigned high);
+	void (*write_msr)(u32 msr, u64 val);
 
 	/*
 	 * Safe MSR operations.
 	 * read sets err to 0 or -EIO.  write returns 0 or -EIO.
 	 */
 	u64 (*read_msr_safe)(unsigned int msr, int *err);
-	int (*write_msr_safe)(unsigned int msr, unsigned low, unsigned high);
+	int (*write_msr_safe)(u32 msr, u64 val);
 
 	u64 (*read_pmc)(int counter);
 
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 5b2c15214a6b..6b4102365ae5 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -196,7 +196,7 @@ static void kvm_setup_secondary_clock(void)
 void kvmclock_disable(void)
 {
 	if (msr_kvm_system_time)
-		native_write_msr(msr_kvm_system_time, 0, 0);
+		native_write_msr(msr_kvm_system_time, 0);
 }
 
 static void __init kvmclock_init_mem(void)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d5d0c5c3300b..5cbc4ccb145c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -475,7 +475,6 @@ static void svm_inject_exception(struct kvm_vcpu *vcpu)
 
 static void svm_init_erratum_383(void)
 {
-	u32 low, high;
 	int err;
 	u64 val;
 
@@ -489,10 +488,7 @@ static void svm_init_erratum_383(void)
 
 	val |= (1ULL << 47);
 
-	low  = lower_32_bits(val);
-	high = upper_32_bits(val);
-
-	native_write_msr_safe(MSR_AMD64_DC_CFG, low, high);
+	native_write_msr_safe(MSR_AMD64_DC_CFG, val);
 
 	erratum_383_found = true;
 }
@@ -2167,17 +2163,12 @@ static bool is_erratum_383(void)
 
 	/* Clear MCi_STATUS registers */
 	for (i = 0; i < 6; ++i)
-		native_write_msr_safe(MSR_IA32_MCx_STATUS(i), 0, 0);
+		native_write_msr_safe(MSR_IA32_MCx_STATUS(i), 0);
 
 	value = native_read_msr_safe(MSR_IA32_MCG_STATUS, &err);
 	if (!err) {
-		u32 low, high;
-
 		value &= ~(1ULL << 2);
-		low    = lower_32_bits(value);
-		high   = upper_32_bits(value);
-
-		native_write_msr_safe(MSR_IA32_MCG_STATUS, low, high);
+		native_write_msr_safe(MSR_IA32_MCG_STATUS, value);
 	}
 
 	/* Flush tlb to evict multi-match entries */
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 2bfe57469ac3..7401cce19939 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1165,9 +1165,9 @@ static void xen_do_write_msr(unsigned int msr, unsigned int low,
 
 		if (!pmu_msr_write(msr, val)) {
 			if (err)
-				*err = native_write_msr_safe(msr, low, high);
+				*err = native_write_msr_safe(msr, val);
 			else
-				native_write_msr(msr, low, high);
+				native_write_msr(msr, val);
 		}
 	}
 }
@@ -1177,12 +1177,11 @@ static u64 xen_read_msr_safe(unsigned int msr, int *err)
 	return xen_do_read_msr(msr, err);
 }
 
-static int xen_write_msr_safe(unsigned int msr, unsigned int low,
-			      unsigned int high)
+static int xen_write_msr_safe(u32 msr, u64 val)
 {
 	int err = 0;
 
-	xen_do_write_msr(msr, low, high, &err);
+	xen_do_write_msr(msr, val, (u32)(val >> 32), &err);
 
 	return err;
 }
@@ -1194,11 +1193,11 @@ static u64 xen_read_msr(unsigned int msr)
 	return xen_do_read_msr(msr, xen_msr_safe ? &err : NULL);
 }
 
-static void xen_write_msr(unsigned int msr, unsigned low, unsigned high)
+static void xen_write_msr(u32 msr, u64 val)
 {
 	int err;
 
-	xen_do_write_msr(msr, low, high, xen_msr_safe ? &err : NULL);
+	xen_do_write_msr(msr, val, (u32)(val >> 32), xen_msr_safe ? &err : NULL);
 }
 
 /* This is called once we have the cpu_possible_mask */
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [RFC PATCH v1 05/15] x86/msr: Replace wrmsr(msr, low, 0) with wrmsrl(msr, value)
  2025-03-31  8:22 [RFC PATCH v1 00/15] MSR refactor with new MSR instructions support Xin Li (Intel)
                   ` (3 preceding siblings ...)
  2025-03-31  8:22 ` [RFC PATCH v1 04/15] x86/msr: Let pv_cpu_ops.write_msr{_safe}() take an u64 instead of two u32 Xin Li (Intel)
@ 2025-03-31  8:22 ` Xin Li (Intel)
  2025-03-31  8:22 ` [RFC PATCH v1 06/15] x86/msr: Remove MSR write APIs that take the MSR value in two u32 arguments Xin Li (Intel)
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 55+ messages in thread
From: Xin Li (Intel) @ 2025-03-31  8:22 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3,
	peterz, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	alexey.amakhalov, bcm-kernel-feedback-list, tony.luck, pbonzini,
	vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz, decui

Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
 arch/x86/hyperv/hv_apic.c                 |  6 +++---
 arch/x86/include/asm/apic.h               |  2 +-
 arch/x86/include/asm/switch_to.h          |  2 +-
 arch/x86/kernel/cpu/amd.c                 |  2 +-
 arch/x86/kernel/cpu/common.c              |  8 ++++----
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  4 ++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    |  2 +-
 arch/x86/kernel/cpu/umwait.c              |  4 ++--
 arch/x86/kernel/kvm.c                     |  2 +-
 drivers/ata/pata_cs5535.c                 | 12 ++++++------
 drivers/ata/pata_cs5536.c                 |  6 +++---
 drivers/cpufreq/acpi-cpufreq.c            |  2 +-
 drivers/cpufreq/e_powersaver.c            |  2 +-
 drivers/cpufreq/powernow-k6.c             |  8 ++++----
 14 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index 6d91ac5f9836..284e16fe359b 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -75,10 +75,10 @@ static void hv_apic_write(u32 reg, u32 val)
 {
 	switch (reg) {
 	case APIC_EOI:
-		wrmsr(HV_X64_MSR_EOI, val, 0);
+		wrmsrl(HV_X64_MSR_EOI, val);
 		break;
 	case APIC_TASKPRI:
-		wrmsr(HV_X64_MSR_TPR, val, 0);
+		wrmsrl(HV_X64_MSR_TPR, val);
 		break;
 	default:
 		native_apic_mem_write(reg, val);
@@ -92,7 +92,7 @@ static void hv_apic_eoi_write(void)
 	if (hvp && (xchg(&hvp->apic_assist, 0) & 0x1))
 		return;
 
-	wrmsr(HV_X64_MSR_EOI, APIC_EOI_ACK, 0);
+	wrmsrl(HV_X64_MSR_EOI, APIC_EOI_ACK);
 }
 
 static bool cpu_is_self(int cpu)
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 3345a819c859..003b2cd2266b 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -209,7 +209,7 @@ static inline void native_apic_msr_write(u32 reg, u32 v)
 	    reg == APIC_LVR)
 		return;
 
-	wrmsr(APIC_BASE_MSR + (reg >> 4), v, 0);
+	wrmsrl(APIC_BASE_MSR + (reg >> 4), v);
 }
 
 static inline void native_apic_msr_eoi(void)
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 75248546403d..525896a18028 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -59,7 +59,7 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread)
 		return;
 
 	this_cpu_write(cpu_tss_rw.x86_tss.ss1, thread->sysenter_cs);
-	wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0);
+	wrmsrl(MSR_IA32_SYSENTER_CS, thread->sysenter_cs);
 }
 #endif
 
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 79569f72b8ee..2f70cd525043 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -1200,7 +1200,7 @@ void amd_set_dr_addr_mask(unsigned long mask, unsigned int dr)
 	if (per_cpu(amd_dr_addr_mask, cpu)[dr] == mask)
 		return;
 
-	wrmsr(amd_msr_dr_addr_masks[dr], mask, 0);
+	wrmsrl(amd_msr_dr_addr_masks[dr], mask);
 	per_cpu(amd_dr_addr_mask, cpu)[dr] = mask;
 }
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a268db71d944..9b53f92df21c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1982,9 +1982,9 @@ void enable_sep_cpu(void)
 	 */
 
 	tss->x86_tss.ss1 = __KERNEL_CS;
-	wrmsr(MSR_IA32_SYSENTER_CS, tss->x86_tss.ss1, 0);
-	wrmsr(MSR_IA32_SYSENTER_ESP, (unsigned long)(cpu_entry_stack(cpu) + 1), 0);
-	wrmsr(MSR_IA32_SYSENTER_EIP, (unsigned long)entry_SYSENTER_32, 0);
+	wrmsrl(MSR_IA32_SYSENTER_CS, tss->x86_tss.ss1);
+	wrmsrl(MSR_IA32_SYSENTER_ESP, (unsigned long)(cpu_entry_stack(cpu) + 1));
+	wrmsrl(MSR_IA32_SYSENTER_EIP, (unsigned long)entry_SYSENTER_32);
 
 	put_cpu();
 }
@@ -2198,7 +2198,7 @@ static inline void setup_getcpu(int cpu)
 	struct desc_struct d = { };
 
 	if (boot_cpu_has(X86_FEATURE_RDTSCP) || boot_cpu_has(X86_FEATURE_RDPID))
-		wrmsr(MSR_TSC_AUX, cpudata, 0);
+		wrmsrl(MSR_TSC_AUX, cpudata);
 
 	/* Store CPU and node number in limit. */
 	d.limit0 = cpudata;
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 675fd9f93e33..44a9ac87b7be 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -903,7 +903,7 @@ int resctrl_arch_measure_cycles_lat_fn(void *_plr)
 	 * Disable hardware prefetchers.
 	 */
 	rdmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
-	wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
+	wrmsrl(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits);
 	mem_r = READ_ONCE(plr->kmem);
 	/*
 	 * Dummy execute of the time measurement to load the needed
@@ -999,7 +999,7 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
 	 * Disable hardware prefetchers.
 	 */
 	rdmsr(MSR_MISC_FEATURE_CONTROL, saved_low, saved_high);
-	wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
+	wrmsrl(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits);
 
 	/* Initialize rest of local variables */
 	/*
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index c6274d40b217..e5a4c283c924 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1697,7 +1697,7 @@ void resctrl_arch_mon_event_config_write(void *_config_info)
 		pr_warn_once("Invalid event id %d\n", config_info->evtid);
 		return;
 	}
-	wrmsr(MSR_IA32_EVT_CFG_BASE + index, config_info->mon_config, 0);
+	wrmsrl(MSR_IA32_EVT_CFG_BASE + index, config_info->mon_config);
 }
 
 static void mbm_config_write_domain(struct rdt_resource *r,
diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index 2293efd6ffa6..0f5d5d9f3352 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -33,7 +33,7 @@ static DEFINE_MUTEX(umwait_lock);
 static void umwait_update_control_msr(void * unused)
 {
 	lockdep_assert_irqs_disabled();
-	wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
+	wrmsrl(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached));
 }
 
 /*
@@ -71,7 +71,7 @@ static int umwait_cpu_offline(unsigned int cpu)
 	 * the original control MSR value in umwait_init(). So there
 	 * is no race condition here.
 	 */
-	wrmsr(MSR_IA32_UMWAIT_CONTROL, orig_umwait_control_cached, 0);
+	wrmsrl(MSR_IA32_UMWAIT_CONTROL, orig_umwait_control_cached);
 
 	return 0;
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3be9b3342c67..1b74aa64a1bc 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -399,7 +399,7 @@ static void kvm_disable_steal_time(void)
 	if (!has_steal_clock)
 		return;
 
-	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
+	wrmsrl(MSR_KVM_STEAL_TIME, 0);
 }
 
 static u64 kvm_steal_clock(int cpu)
diff --git a/drivers/ata/pata_cs5535.c b/drivers/ata/pata_cs5535.c
index d793fc441b46..b0ebd0fe31ed 100644
--- a/drivers/ata/pata_cs5535.c
+++ b/drivers/ata/pata_cs5535.c
@@ -102,16 +102,16 @@ static void cs5535_set_piomode(struct ata_port *ap, struct ata_device *adev)
 		cmdmode = min(mode, pairmode);
 		/* Write the other drive timing register if it changed */
 		if (cmdmode < pairmode)
-			wrmsr(ATAC_CH0D0_PIO + 2 * pair->devno,
-				pio_cmd_timings[cmdmode] << 16 | pio_timings[pairmode], 0);
+			wrmsrl(ATAC_CH0D0_PIO + 2 * pair->devno,
+				pio_cmd_timings[cmdmode] << 16 | pio_timings[pairmode]);
 	}
 	/* Write the drive timing register */
-	wrmsr(ATAC_CH0D0_PIO + 2 * adev->devno,
-		pio_cmd_timings[cmdmode] << 16 | pio_timings[mode], 0);
+	wrmsrl(ATAC_CH0D0_PIO + 2 * adev->devno,
+		pio_cmd_timings[cmdmode] << 16 | pio_timings[mode]);
 
 	/* Set the PIO "format 1" bit in the DMA timing register */
 	rdmsr(ATAC_CH0D0_DMA + 2 * adev->devno, reg, dummy);
-	wrmsr(ATAC_CH0D0_DMA + 2 * adev->devno, reg | 0x80000000UL, 0);
+	wrmsrl(ATAC_CH0D0_DMA + 2 * adev->devno, reg | 0x80000000UL);
 }
 
 /**
@@ -138,7 +138,7 @@ static void cs5535_set_dmamode(struct ata_port *ap, struct ata_device *adev)
 		reg |= udma_timings[mode - XFER_UDMA_0];
 	else
 		reg |= mwdma_timings[mode - XFER_MW_DMA_0];
-	wrmsr(ATAC_CH0D0_DMA + 2 * adev->devno, reg, 0);
+	wrmsrl(ATAC_CH0D0_DMA + 2 * adev->devno, reg);
 }
 
 static const struct scsi_host_template cs5535_sht = {
diff --git a/drivers/ata/pata_cs5536.c b/drivers/ata/pata_cs5536.c
index b811efd2cc34..0578f3046b51 100644
--- a/drivers/ata/pata_cs5536.c
+++ b/drivers/ata/pata_cs5536.c
@@ -34,9 +34,9 @@ module_param_named(msr, use_msr, int, 0644);
 MODULE_PARM_DESC(msr, "Force using MSR to configure IDE function (Default: 0)");
 #else
 #undef rdmsr	/* avoid accidental MSR usage on, e.g. x86-64 */
-#undef wrmsr
+#undef wrmsrl
 #define rdmsr(x, y, z) do { } while (0)
-#define wrmsr(x, y, z) do { } while (0)
+#define wrmsrl(x, y) do { } while (0)
 #define use_msr 0
 #endif
 
@@ -98,7 +98,7 @@ static int cs5536_read(struct pci_dev *pdev, int reg, u32 *val)
 static int cs5536_write(struct pci_dev *pdev, int reg, int val)
 {
 	if (unlikely(use_msr)) {
-		wrmsr(MSR_IDE_CFG + reg, val, 0);
+		wrmsrl(MSR_IDE_CFG + reg, val);
 		return 0;
 	}
 
diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 924314cdeebc..937c07f0839f 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -271,7 +271,7 @@ static u32 cpu_freq_read_amd(struct acpi_pct_register *not_used)
 
 static void cpu_freq_write_amd(struct acpi_pct_register *not_used, u32 val)
 {
-	wrmsr(MSR_AMD_PERF_CTL, val, 0);
+	wrmsrl(MSR_AMD_PERF_CTL, val);
 }
 
 static u32 cpu_freq_read_io(struct acpi_pct_register *reg)
diff --git a/drivers/cpufreq/e_powersaver.c b/drivers/cpufreq/e_powersaver.c
index d23a97ba6478..08bf293eb4bb 100644
--- a/drivers/cpufreq/e_powersaver.c
+++ b/drivers/cpufreq/e_powersaver.c
@@ -123,7 +123,7 @@ static int eps_set_state(struct eps_cpu_data *centaur,
 		}
 	}
 	/* Set new multiplier and voltage */
-	wrmsr(MSR_IA32_PERF_CTL, dest_state & 0xffff, 0);
+	wrmsrl(MSR_IA32_PERF_CTL, dest_state & 0xffff);
 	/* Wait until transition end */
 	i = 0;
 	do {
diff --git a/drivers/cpufreq/powernow-k6.c b/drivers/cpufreq/powernow-k6.c
index 99d2244e03b0..d22a0b981797 100644
--- a/drivers/cpufreq/powernow-k6.c
+++ b/drivers/cpufreq/powernow-k6.c
@@ -88,10 +88,10 @@ static int powernow_k6_get_cpu_multiplier(void)
 	local_irq_disable();
 
 	msrval = POWERNOW_IOPORT + 0x1;
-	wrmsr(MSR_K6_EPMR, msrval, 0); /* enable the PowerNow port */
+	wrmsrl(MSR_K6_EPMR, msrval); /* enable the PowerNow port */
 	invalue = inl(POWERNOW_IOPORT + 0x8);
 	msrval = POWERNOW_IOPORT + 0x0;
-	wrmsr(MSR_K6_EPMR, msrval, 0); /* disable it again */
+	wrmsrl(MSR_K6_EPMR, msrval); /* disable it again */
 
 	local_irq_enable();
 
@@ -118,13 +118,13 @@ static void powernow_k6_set_cpu_multiplier(unsigned int best_i)
 	outvalue = (1<<12) | (1<<10) | (1<<9) | (index_to_register[best_i]<<5);
 
 	msrval = POWERNOW_IOPORT + 0x1;
-	wrmsr(MSR_K6_EPMR, msrval, 0); /* enable the PowerNow port */
+	wrmsrl(MSR_K6_EPMR, msrval); /* enable the PowerNow port */
 	invalue = inl(POWERNOW_IOPORT + 0x8);
 	invalue = invalue & 0x1f;
 	outvalue = outvalue | invalue;
 	outl(outvalue, (POWERNOW_IOPORT + 0x8));
 	msrval = POWERNOW_IOPORT + 0x0;
-	wrmsr(MSR_K6_EPMR, msrval, 0); /* disable it again */
+	wrmsrl(MSR_K6_EPMR, msrval); /* disable it again */
 
 	write_cr0(cr0);
 	local_irq_enable();
-- 
2.49.0



* [RFC PATCH v1 06/15] x86/msr: Remove MSR write APIs that take the MSR value in two u32 arguments
  2025-03-31  8:22 [RFC PATCH v1 00/15] MSR refactor with new MSR instructions support Xin Li (Intel)
                   ` (4 preceding siblings ...)
  2025-03-31  8:22 ` [RFC PATCH v1 05/15] x86/msr: Replace wrmsr(msr, low, 0) with wrmsrl(msr, value) Xin Li (Intel)
@ 2025-03-31  8:22 ` Xin Li (Intel)
  2025-03-31  8:22 ` [RFC PATCH v1 07/15] x86/msr: Remove pmu_msr_{read,write}() Xin Li (Intel)
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 55+ messages in thread
From: Xin Li (Intel) @ 2025-03-31  8:22 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3,
	peterz, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	alexey.amakhalov, bcm-kernel-feedback-list, tony.luck, pbonzini,
	vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz, decui

Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
 arch/x86/include/asm/msr.h | 18 ++----------------
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 121597fc5d41..da4f2f6d127f 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -111,9 +111,6 @@ static __always_inline u64 native_rdmsrl(const u32 msr)
 	return __rdmsr(msr);
 }
 
-#define native_wrmsr(msr, low, high)			\
-	__wrmsr((msr), ((u64)(high) << 32) | (low))
-
 #define native_wrmsrl(msr, val)				\
 	__wrmsr((msr), (val))
 
@@ -253,11 +250,6 @@ do {								\
 	(void)((high) = (u32)(__val >> 32));			\
 } while (0)
 
-static inline void wrmsr(u32 msr, u32 low, u32 high)
-{
-	native_write_msr(msr, (u64)high << 32 | low);
-}
-
 #define rdmsrl(msr, val)			\
 	((val) = native_read_msr((msr)))
 
@@ -266,12 +258,6 @@ static inline void wrmsrl(u32 msr, u64 val)
 	native_write_msr(msr, val);
 }
 
-/* wrmsr with exception handling */
-static inline int wrmsr_safe(u32 msr, u32 low, u32 high)
-{
-	return native_write_msr_safe(msr, (u64)high << 32 | low);
-}
-
 /* rdmsr with exception handling */
 #define rdmsr_safe(msr, low, high)				\
 ({								\
@@ -321,7 +307,7 @@ static __always_inline void wrmsrns(u32 msr, u64 val)
  */
 static inline int wrmsrl_safe(u32 msr, u64 val)
 {
-	return wrmsr_safe(msr, (u32)val,  (u32)(val >> 32));
+	return native_write_msr_safe(msr, val);
 }
 
 struct msr __percpu *msrs_alloc(void);
@@ -380,7 +366,7 @@ static inline int rdmsr_safe_on_cpu(unsigned int cpu, u32 msr_no,
 }
 static inline int wrmsr_safe_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h)
 {
-	return wrmsr_safe(msr_no, l, h);
+	return wrmsrl_safe(msr_no, (u64)h << 32 | l);
 }
 static inline int rdmsrl_safe_on_cpu(unsigned int cpu, u32 msr_no, u64 *q)
 {
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [RFC PATCH v1 07/15] x86/msr: Remove pmu_msr_{read,write}()
  2025-03-31  8:22 [RFC PATCH v1 00/15] MSR refactor with new MSR instructions support Xin Li (Intel)
                   ` (5 preceding siblings ...)
  2025-03-31  8:22 ` [RFC PATCH v1 06/15] x86/msr: Remove MSR write APIs that take the MSR value in two u32 arguments Xin Li (Intel)
@ 2025-03-31  8:22 ` Xin Li (Intel)
  2025-03-31  8:22 ` [RFC PATCH v1 08/15] x86/cpufeatures: Add a CPU feature bit for MSR immediate form instructions Xin Li (Intel)
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 55+ messages in thread
From: Xin Li (Intel) @ 2025-03-31  8:22 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3,
	peterz, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	alexey.amakhalov, bcm-kernel-feedback-list, tony.luck, pbonzini,
	vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz, decui

Now that pmu_msr_{read,write}() are mere wrappers around
pmu_msr_chk_emulated(), remove them and have their callers invoke
pmu_msr_chk_emulated() directly.
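
The resulting call pattern can be sketched in plain C (a stand-alone
illustration; the demo_* names and the fixed MSR number are hypothetical
stand-ins, not the real Xen/PMU code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for pmu_msr_chk_emulated(): pretend exactly one
 * MSR is emulated.  Returns true when the MSR belongs to the PMU range;
 * *emul reports whether the access was actually emulated. */
#define DEMO_EMULATED_MSR 0xC1u

static bool demo_msr_chk_emulated(uint32_t msr, uint64_t *val,
				  bool is_read, bool *emul)
{
	if (msr != DEMO_EMULATED_MSR) {
		*emul = false;
		return false;		/* not a PMU MSR at all */
	}
	*emul = true;
	if (is_read)
		*val = 0x1234;		/* emulated read value */
	return true;
}

/* Mirrors the new read path: use the emulated value only when the MSR was
 * both recognized and emulated; otherwise fall through to the native path
 * (represented here by returning 0). */
static uint64_t demo_read_msr(uint32_t msr)
{
	uint64_t val = 0;
	bool emulated;

	if (demo_msr_chk_emulated(msr, &val, true, &emulated) && emulated)
		return val;

	return 0;	/* native rdmsr path in the real code */
}
```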

Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
 arch/x86/xen/enlighten_pv.c | 17 ++++++++++-------
 arch/x86/xen/pmu.c          | 24 ++++--------------------
 arch/x86/xen/xen-ops.h      |  3 +--
 3 files changed, 15 insertions(+), 29 deletions(-)

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 7401cce19939..a047dadf4511 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1090,8 +1090,9 @@ static void xen_write_cr4(unsigned long cr4)
 static u64 xen_do_read_msr(unsigned int msr, int *err)
 {
 	u64 val = 0;	/* Avoid uninitialized value for safe variant. */
+	bool emulated;
 
-	if (pmu_msr_read(msr, &val, err))
+	if (pmu_msr_chk_emulated(msr, &val, true, &emulated) && emulated)
 		return val;
 
 	if (err)
@@ -1134,6 +1135,7 @@ static void xen_do_write_msr(unsigned int msr, unsigned int low,
 			     unsigned int high, int *err)
 {
 	u64 val;
+	bool emulated;
 
 	switch (msr) {
 	case MSR_FS_BASE:
@@ -1163,12 +1165,13 @@ static void xen_do_write_msr(unsigned int msr, unsigned int low,
 	default:
 		val = (u64)high << 32 | low;
 
-		if (!pmu_msr_write(msr, val)) {
-			if (err)
-				*err = native_write_msr_safe(msr, val);
-			else
-				native_write_msr(msr, val);
-		}
+		if (pmu_msr_chk_emulated(msr, &val, false, &emulated) && emulated)
+			return;
+
+		if (err)
+			*err = native_write_msr_safe(msr, val);
+		else
+			native_write_msr(msr, val);
 	}
 }
 
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index 1364cd3fb3ef..4d20503430dd 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -128,7 +128,7 @@ static inline uint32_t get_fam15h_addr(u32 addr)
 	return addr;
 }
 
-static inline bool is_amd_pmu_msr(unsigned int msr)
+static bool is_amd_pmu_msr(u32 msr)
 {
 	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
 	    boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
@@ -194,8 +194,7 @@ static bool is_intel_pmu_msr(u32 msr_index, int *type, int *index)
 	}
 }
 
-static bool xen_intel_pmu_emulate(unsigned int msr, u64 *val, int type,
-				  int index, bool is_read)
+static bool xen_intel_pmu_emulate(u32 msr, u64 *val, int type, int index, bool is_read)
 {
 	uint64_t *reg = NULL;
 	struct xen_pmu_intel_ctxt *ctxt;
@@ -257,7 +256,7 @@ static bool xen_intel_pmu_emulate(unsigned int msr, u64 *val, int type,
 	return false;
 }
 
-static bool xen_amd_pmu_emulate(unsigned int msr, u64 *val, bool is_read)
+static bool xen_amd_pmu_emulate(u32 msr, u64 *val, bool is_read)
 {
 	uint64_t *reg = NULL;
 	int i, off = 0;
@@ -298,8 +297,7 @@ static bool xen_amd_pmu_emulate(unsigned int msr, u64 *val, bool is_read)
 	return false;
 }
 
-static bool pmu_msr_chk_emulated(unsigned int msr, uint64_t *val, bool is_read,
-				 bool *emul)
+bool pmu_msr_chk_emulated(u32 msr, u64 *val, bool is_read, bool *emul)
 {
 	int type, index = 0;
 
@@ -313,20 +311,6 @@ static bool pmu_msr_chk_emulated(unsigned int msr, uint64_t *val, bool is_read,
 	return true;
 }
 
-bool pmu_msr_read(u32 msr, u64 *val, int *err)
-{
-	bool emulated;
-
-	return pmu_msr_chk_emulated(msr, val, true, &emulated) && emulated;
-}
-
-bool pmu_msr_write(u32 msr, u64 val)
-{
-	bool emulated;
-
-	return pmu_msr_chk_emulated(msr, &val, false, &emulated) && emulated;
-}
-
 static unsigned long long xen_amd_read_pmc(int counter)
 {
 	struct xen_pmu_amd_ctxt *ctxt;
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 4a0a1d73d8b8..6545661010ce 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -274,8 +274,7 @@ void xen_pmu_finish(int cpu);
 static inline void xen_pmu_init(int cpu) {}
 static inline void xen_pmu_finish(int cpu) {}
 #endif
-bool pmu_msr_read(u32 msr, u64 *val, int *err);
-bool pmu_msr_write(u32 msr, u64 val);
+bool pmu_msr_chk_emulated(u32 msr, u64 *val, bool is_read, bool *emul);
 int pmu_apic_update(uint32_t reg);
 unsigned long long xen_read_pmc(int counter);
 
-- 
2.49.0



* [RFC PATCH v1 08/15] x86/cpufeatures: Add a CPU feature bit for MSR immediate form instructions
  2025-03-31  8:22 [RFC PATCH v1 00/15] MSR refactor with new MSR instructions support Xin Li (Intel)
                   ` (6 preceding siblings ...)
  2025-03-31  8:22 ` [RFC PATCH v1 07/15] x86/msr: Remove pmu_msr_{read,write}() Xin Li (Intel)
@ 2025-03-31  8:22 ` Xin Li (Intel)
  2025-03-31  8:22 ` [RFC PATCH v1 09/15] x86/opcode: Add immediate form MSR instructions to x86-opcode-map Xin Li (Intel)
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 55+ messages in thread
From: Xin Li (Intel) @ 2025-03-31  8:22 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3,
	peterz, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	alexey.amakhalov, bcm-kernel-feedback-list, tony.luck, pbonzini,
	vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz, decui

The immediate form of MSR access instructions is primarily motivated
by performance, not code size: by having the MSR number in an immediate
operand, it is available *much* earlier in the pipeline, which allows
the hardware much more leeway about how a particular MSR is handled.

Use a scattered CPU feature bit for MSR immediate form instructions.
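
In plain C, the enumeration added to scattered.c below boils down to
testing ECX bit 5 of CPUID leaf 0x7, subleaf 1 (a sketch with a
hypothetical helper name; the real kernel goes through
cpu_feature_enabled(X86_FEATURE_MSR_IMM)):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: given ECX as returned by CPUID(EAX=0x7, ECX=1),
 * report whether the immediate form MSR instructions are enumerated
 * (bit 5), matching the new scattered-feature table entry. */
static bool cpu_has_msr_imm(uint32_t cpuid_7_1_ecx)
{
	return (cpuid_7_1_ecx >> 5) & 1;
}
```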

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
 arch/x86/include/asm/cpufeatures.h | 19 ++++++++++---------
 arch/x86/kernel/cpu/scattered.c    |  1 +
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6c2c152d8a67..a742a3d34712 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -472,15 +472,16 @@
  *
  * Reuse free bits when adding new feature flags!
  */
-#define X86_FEATURE_AMD_LBR_PMC_FREEZE	(21*32+ 0) /* "amd_lbr_pmc_freeze" AMD LBR and PMC Freeze */
-#define X86_FEATURE_CLEAR_BHB_LOOP	(21*32+ 1) /* Clear branch history at syscall entry using SW loop */
-#define X86_FEATURE_BHI_CTRL		(21*32+ 2) /* BHI_DIS_S HW control available */
-#define X86_FEATURE_CLEAR_BHB_HW	(21*32+ 3) /* BHI_DIS_S HW control enabled */
-#define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* Clear branch history at vmexit using SW loop */
-#define X86_FEATURE_AMD_FAST_CPPC	(21*32 + 5) /* Fast CPPC */
-#define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
-#define X86_FEATURE_AMD_WORKLOAD_CLASS	(21*32 + 7) /* Workload Classification */
-#define X86_FEATURE_PREFER_YMM		(21*32 + 8) /* Avoid ZMM registers due to downclocking */
+#define X86_FEATURE_AMD_LBR_PMC_FREEZE		(21*32+ 0) /* "amd_lbr_pmc_freeze" AMD LBR and PMC Freeze */
+#define X86_FEATURE_CLEAR_BHB_LOOP		(21*32+ 1) /* Clear branch history at syscall entry using SW loop */
+#define X86_FEATURE_BHI_CTRL			(21*32+ 2) /* BHI_DIS_S HW control available */
+#define X86_FEATURE_CLEAR_BHB_HW		(21*32+ 3) /* BHI_DIS_S HW control enabled */
+#define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT	(21*32+ 4) /* Clear branch history at vmexit using SW loop */
+#define X86_FEATURE_AMD_FAST_CPPC		(21*32+ 5) /* Fast CPPC */
+#define X86_FEATURE_AMD_HETEROGENEOUS_CORES	(21*32+ 6) /* Heterogeneous Core Topology */
+#define X86_FEATURE_AMD_WORKLOAD_CLASS		(21*32+ 7) /* Workload Classification */
+#define X86_FEATURE_PREFER_YMM			(21*32+ 8) /* Avoid ZMM registers due to downclocking */
+#define X86_FEATURE_MSR_IMM			(21*32+ 9) /* MSR immediate form instructions */
 
 /*
  * BUG word(s)
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 16f3ca30626a..9eda656e9793 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -27,6 +27,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_APERFMPERF,		CPUID_ECX,  0, 0x00000006, 0 },
 	{ X86_FEATURE_EPB,			CPUID_ECX,  3, 0x00000006, 0 },
 	{ X86_FEATURE_INTEL_PPIN,		CPUID_EBX,  0, 0x00000007, 1 },
+	{ X86_FEATURE_MSR_IMM,			CPUID_ECX,  5, 0x00000007, 1 },
 	{ X86_FEATURE_RRSBA_CTRL,		CPUID_EDX,  2, 0x00000007, 2 },
 	{ X86_FEATURE_BHI_CTRL,			CPUID_EDX,  4, 0x00000007, 2 },
 	{ X86_FEATURE_CQM_LLC,			CPUID_EDX,  1, 0x0000000f, 0 },
-- 
2.49.0



* [RFC PATCH v1 09/15] x86/opcode: Add immediate form MSR instructions to x86-opcode-map
  2025-03-31  8:22 [RFC PATCH v1 00/15] MSR refactor with new MSR instructions support Xin Li (Intel)
                   ` (7 preceding siblings ...)
  2025-03-31  8:22 ` [RFC PATCH v1 08/15] x86/cpufeatures: Add a CPU feature bit for MSR immediate form instructions Xin Li (Intel)
@ 2025-03-31  8:22 ` Xin Li (Intel)
  2025-03-31  8:22 ` [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available Xin Li (Intel)
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 55+ messages in thread
From: Xin Li (Intel) @ 2025-03-31  8:22 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3,
	peterz, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	alexey.amakhalov, bcm-kernel-feedback-list, tony.luck, pbonzini,
	vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz, decui

Add the instruction opcodes used by the immediate forms of WRMSRNS and
RDMSR to x86-opcode-map.
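
For orientation, the new entries live in VEX map 7 at opcode 0xf6 with
F2 (RDMSR) and F3 (WRMSRNS) implied prefixes.  That can be sanity-checked
by pulling the map and prefix fields out of a 3-byte VEX prefix; the test
bytes 0xc4 0xe7 0x7a below are the leading bytes of ASM_WRMSRNS_RAX as
defined in patch 10 of this series (a decoding sketch, not the kernel's
instruction decoder):

```c
#include <stdint.h>

/* Field extraction for a 3-byte VEX prefix (0xC4 P0 P1):
 *   P0 bits 4:0 -> mmmmm (opcode map)
 *   P1 bits 1:0 -> pp (implied prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2)
 */
static unsigned int vex3_map(const uint8_t *vex) { return vex[1] & 0x1f; }
static unsigned int vex3_pp(const uint8_t *vex)  { return vex[2] & 0x03; }

/* Leading bytes of the immediate form WRMSRNS encoding (ASM_WRMSRNS_RAX). */
static const uint8_t wrmsrns_imm_vex[3] = { 0xc4, 0xe7, 0x7a };
```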

Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
 arch/x86/lib/x86-opcode-map.txt       | 5 +++--
 tools/arch/x86/lib/x86-opcode-map.txt | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index caedb3ef6688..e64f52321d6d 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -839,7 +839,7 @@ f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
 f3: Grp17 (1A)
 f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSSD/Q My,Gy (66)
-f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,Gy
+f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,Gy | RDMSR Rq,Gq (F2),(11B) | WRMSRNS Gq,Rq (F3),(11B)
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3) | URDMSR Rq,Gq (F2),(11B) | UWRMSR Gq,Rq (F3),(11B)
 f9: MOVDIRI My,Gy
@@ -1014,7 +1014,7 @@ f1: CRC32 Gy,Ey (es) | CRC32 Gy,Ey (66),(es) | INVVPID Gy,Mdq (F3),(ev)
 f2: INVPCID Gy,Mdq (F3),(ev)
 f4: TZCNT Gv,Ev (es) | TZCNT Gv,Ev (66),(es)
 f5: LZCNT Gv,Ev (es) | LZCNT Gv,Ev (66),(es)
-f6: Grp3_1 Eb (1A),(ev)
+f6: Grp3_1 Eb (1A),(ev) | RDMSR Rq,Gq (F2),(11B),(ev) | WRMSRNS Gq,Rq (F3),(11B),(ev)
 f7: Grp3_2 Ev (1A),(es)
 f8: MOVDIR64B Gv,Mdqq (66),(ev) | ENQCMD Gv,Mdqq (F2),(ev) | ENQCMDS Gv,Mdqq (F3),(ev) | URDMSR Rq,Gq (F2),(11B),(ev) | UWRMSR Gq,Rq (F3),(11B),(ev)
 f9: MOVDIRI My,Gy (ev)
@@ -1103,6 +1103,7 @@ EndTable
 Table: VEX map 7
 Referrer:
 AVXcode: 7
+f6: RDMSR Rq,Id (F2),(v1),(11B) | WRMSRNS Id,Rq (F3),(v1),(11B)
 f8: URDMSR Rq,Id (F2),(v1),(11B) | UWRMSR Id,Rq (F3),(v1),(11B)
 EndTable
 
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index caedb3ef6688..e64f52321d6d 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -839,7 +839,7 @@ f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
 f3: Grp17 (1A)
 f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSSD/Q My,Gy (66)
-f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,Gy
+f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,Gy | RDMSR Rq,Gq (F2),(11B) | WRMSRNS Gq,Rq (F3),(11B)
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3) | URDMSR Rq,Gq (F2),(11B) | UWRMSR Gq,Rq (F3),(11B)
 f9: MOVDIRI My,Gy
@@ -1014,7 +1014,7 @@ f1: CRC32 Gy,Ey (es) | CRC32 Gy,Ey (66),(es) | INVVPID Gy,Mdq (F3),(ev)
 f2: INVPCID Gy,Mdq (F3),(ev)
 f4: TZCNT Gv,Ev (es) | TZCNT Gv,Ev (66),(es)
 f5: LZCNT Gv,Ev (es) | LZCNT Gv,Ev (66),(es)
-f6: Grp3_1 Eb (1A),(ev)
+f6: Grp3_1 Eb (1A),(ev) | RDMSR Rq,Gq (F2),(11B),(ev) | WRMSRNS Gq,Rq (F3),(11B),(ev)
 f7: Grp3_2 Ev (1A),(es)
 f8: MOVDIR64B Gv,Mdqq (66),(ev) | ENQCMD Gv,Mdqq (F2),(ev) | ENQCMDS Gv,Mdqq (F3),(ev) | URDMSR Rq,Gq (F2),(11B),(ev) | UWRMSR Gq,Rq (F3),(11B),(ev)
 f9: MOVDIRI My,Gy (ev)
@@ -1103,6 +1103,7 @@ EndTable
 Table: VEX map 7
 Referrer:
 AVXcode: 7
+f6: RDMSR Rq,Id (F2),(v1),(11B) | WRMSRNS Id,Rq (F3),(v1),(11B)
 f8: URDMSR Rq,Id (F2),(v1),(11B) | UWRMSR Id,Rq (F3),(v1),(11B)
 EndTable
 
-- 
2.49.0



* [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available
  2025-03-31  8:22 [RFC PATCH v1 00/15] MSR refactor with new MSR instructions support Xin Li (Intel)
                   ` (8 preceding siblings ...)
  2025-03-31  8:22 ` [RFC PATCH v1 09/15] x86/opcode: Add immediate form MSR instructions to x86-opcode-map Xin Li (Intel)
@ 2025-03-31  8:22 ` Xin Li (Intel)
  2025-03-31 20:27   ` Konrad Rzeszutek Wilk
  2025-04-10 23:24   ` Sean Christopherson
  2025-03-31  8:22 ` [RFC PATCH v1 11/15] x86/extable: Implement EX_TYPE_FUNC_REWIND Xin Li (Intel)
                   ` (4 subsequent siblings)
  14 siblings, 2 replies; 55+ messages in thread
From: Xin Li (Intel) @ 2025-03-31  8:22 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3,
	peterz, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	alexey.amakhalov, bcm-kernel-feedback-list, tony.luck, pbonzini,
	vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz, decui

Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
 arch/x86/include/asm/msr-index.h |  6 ++++++
 arch/x86/kvm/vmx/vmenter.S       | 28 ++++++++++++++++++++++++----
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e6134ef2263d..04244c3ba374 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1226,4 +1226,10 @@
 						* a #GP
 						*/
 
+/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
+#define ASM_WRMSRNS		_ASM_BYTES(0x0f,0x01,0xc6)
+
+/* Instruction opcode for the immediate form RDMSR/WRMSRNS */
+#define ASM_WRMSRNS_RAX		_ASM_BYTES(0xc4,0xe7,0x7a,0xf6,0xc0)
+
 #endif /* _ASM_X86_MSR_INDEX_H */
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index f6986dee6f8c..9fae43723c44 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -64,6 +64,29 @@
 	RET
 .endm
 
+/*
+ * Write EAX to MSR_IA32_SPEC_CTRL.
+ *
+ * Choose the best WRMSR instruction based on availability.
+ *
+ * Replace with 'wrmsrns' and 'wrmsrns %rax, $MSR_IA32_SPEC_CTRL' once binutils supports them.
+ */
+.macro WRITE_EAX_TO_MSR_IA32_SPEC_CTRL
+	ALTERNATIVE_2 __stringify(mov $MSR_IA32_SPEC_CTRL, %ecx;		\
+				  xor %edx, %edx;				\
+				  mov %edi, %eax;				\
+				  ds wrmsr),					\
+		      __stringify(mov $MSR_IA32_SPEC_CTRL, %ecx;		\
+				  xor %edx, %edx;				\
+				  mov %edi, %eax;				\
+				  ASM_WRMSRNS),					\
+		      X86_FEATURE_WRMSRNS,					\
+		      __stringify(xor %_ASM_AX, %_ASM_AX;			\
+				  mov %edi, %eax;				\
+				  ASM_WRMSRNS_RAX; .long MSR_IA32_SPEC_CTRL),	\
+		      X86_FEATURE_MSR_IMM
+.endm
+
 .section .noinstr.text, "ax"
 
 /**
@@ -123,10 +146,7 @@ SYM_FUNC_START(__vmx_vcpu_run)
 	movl PER_CPU_VAR(x86_spec_ctrl_current), %esi
 	cmp %edi, %esi
 	je .Lspec_ctrl_done
-	mov $MSR_IA32_SPEC_CTRL, %ecx
-	xor %edx, %edx
-	mov %edi, %eax
-	wrmsr
+	WRITE_EAX_TO_MSR_IA32_SPEC_CTRL
 
 .Lspec_ctrl_done:
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [RFC PATCH v1 11/15] x86/extable: Implement EX_TYPE_FUNC_REWIND
  2025-03-31  8:22 [RFC PATCH v1 00/15] MSR refactor with new MSR instructions support Xin Li (Intel)
                   ` (9 preceding siblings ...)
  2025-03-31  8:22 ` [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available Xin Li (Intel)
@ 2025-03-31  8:22 ` Xin Li (Intel)
  2025-03-31  8:22 ` [RFC PATCH v1 12/15] x86/msr: Use the alternatives mechanism to write MSR Xin Li (Intel)
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 55+ messages in thread
From: Xin Li (Intel) @ 2025-03-31  8:22 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3,
	peterz, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	alexey.amakhalov, bcm-kernel-feedback-list, tony.luck, pbonzini,
	vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz, decui

From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

Add a new exception type, which allows emulating an exception as if it
had happened at or near the call site of a function.  This allows a
function call inside an alternative for instruction emulation to "kick
back" the exception into the alternatives pattern, possibly invoking a
different exception handling pattern there, or at least indicating the
"real" location of the fault.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
 arch/x86/include/asm/asm.h                 |   6 +
 arch/x86/include/asm/extable_fixup_types.h |   1 +
 arch/x86/mm/extable.c                      | 135 +++++++++++++--------
 3 files changed, 91 insertions(+), 51 deletions(-)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index cc2881576c2c..c05c33653194 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -243,5 +243,11 @@ register unsigned long current_stack_pointer asm(_ASM_SP);
 #define _ASM_EXTABLE_FAULT(from, to)				\
 	_ASM_EXTABLE_TYPE(from, to, EX_TYPE_FAULT)
 
+#define _ASM_EXTABLE_FUNC_REWIND(from, ipdelta, spdelta)	\
+	_ASM_EXTABLE_TYPE(from, from /* unused */,		\
+			  EX_TYPE_FUNC_REWIND |			\
+			  EX_DATA_REG(spdelta) |		\
+			  EX_DATA_IMM(ipdelta))
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_X86_ASM_H */
diff --git a/arch/x86/include/asm/extable_fixup_types.h b/arch/x86/include/asm/extable_fixup_types.h
index 906b0d5541e8..9cd1cea45052 100644
--- a/arch/x86/include/asm/extable_fixup_types.h
+++ b/arch/x86/include/asm/extable_fixup_types.h
@@ -67,5 +67,6 @@
 #define	EX_TYPE_ZEROPAD			20 /* longword load with zeropad on fault */
 
 #define	EX_TYPE_ERETU			21
+#define	EX_TYPE_FUNC_REWIND		22
 
 #endif
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 51986e8a9d35..eb9331240a88 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -290,6 +290,27 @@ static bool ex_handler_eretu(const struct exception_table_entry *fixup,
 }
 #endif
 
+/*
+ * Emulate a fault taken at the call site of a function.
+ *
+ * The combined reg and flags field are used as an unsigned number of
+ * machine words to pop off the stack before the return address, then
+ * the signed imm field is used as a delta from the return IP address.
+ */
+static bool ex_handler_func_rewind(struct pt_regs *regs, int data)
+{
+	const long ipdelta = FIELD_GET(EX_DATA_IMM_MASK, data);
+	const unsigned long pops = FIELD_GET(EX_DATA_REG_MASK | EX_DATA_FLAG_MASK, data);
+	unsigned long *sp;
+
+	sp = (unsigned long *)regs->sp;
+	sp += pops;
+	regs->ip = *sp++ + ipdelta;
+	regs->sp = (unsigned long)sp;
+
+	return true;
+}
+
 int ex_get_fixup_type(unsigned long ip)
 {
 	const struct exception_table_entry *e = search_exception_tables(ip);
@@ -302,6 +323,7 @@ int fixup_exception(struct pt_regs *regs, int trapnr, unsigned long error_code,
 {
 	const struct exception_table_entry *e;
 	int type, reg, imm;
+	bool again;
 
 #ifdef CONFIG_PNPBIOS
 	if (unlikely(SEGMENT_IS_PNP_CODE(regs->cs))) {
@@ -317,60 +339,71 @@ int fixup_exception(struct pt_regs *regs, int trapnr, unsigned long error_code,
 	}
 #endif
 
-	e = search_exception_tables(regs->ip);
-	if (!e)
-		return 0;
-
-	type = FIELD_GET(EX_DATA_TYPE_MASK, e->data);
-	reg  = FIELD_GET(EX_DATA_REG_MASK,  e->data);
-	imm  = FIELD_GET(EX_DATA_IMM_MASK,  e->data);
-
-	switch (type) {
-	case EX_TYPE_DEFAULT:
-	case EX_TYPE_DEFAULT_MCE_SAFE:
-		return ex_handler_default(e, regs);
-	case EX_TYPE_FAULT:
-	case EX_TYPE_FAULT_MCE_SAFE:
-		return ex_handler_fault(e, regs, trapnr);
-	case EX_TYPE_UACCESS:
-		return ex_handler_uaccess(e, regs, trapnr, fault_addr);
-	case EX_TYPE_CLEAR_FS:
-		return ex_handler_clear_fs(e, regs);
-	case EX_TYPE_FPU_RESTORE:
-		return ex_handler_fprestore(e, regs);
-	case EX_TYPE_BPF:
-		return ex_handler_bpf(e, regs);
-	case EX_TYPE_WRMSR:
-		return ex_handler_msr(e, regs, true, false, reg);
-	case EX_TYPE_RDMSR:
-		return ex_handler_msr(e, regs, false, false, reg);
-	case EX_TYPE_WRMSR_SAFE:
-		return ex_handler_msr(e, regs, true, true, reg);
-	case EX_TYPE_RDMSR_SAFE:
-		return ex_handler_msr(e, regs, false, true, reg);
-	case EX_TYPE_WRMSR_IN_MCE:
-		ex_handler_msr_mce(regs, true);
-		break;
-	case EX_TYPE_RDMSR_IN_MCE:
-		ex_handler_msr_mce(regs, false);
-		break;
-	case EX_TYPE_POP_REG:
-		regs->sp += sizeof(long);
-		fallthrough;
-	case EX_TYPE_IMM_REG:
-		return ex_handler_imm_reg(e, regs, reg, imm);
-	case EX_TYPE_FAULT_SGX:
-		return ex_handler_sgx(e, regs, trapnr);
-	case EX_TYPE_UCOPY_LEN:
-		return ex_handler_ucopy_len(e, regs, trapnr, fault_addr, reg, imm);
-	case EX_TYPE_ZEROPAD:
-		return ex_handler_zeropad(e, regs, fault_addr);
+	do {
+		e = search_exception_tables(regs->ip);
+		if (!e)
+			return 0;
+
+		again = false;
+
+		type = FIELD_GET(EX_DATA_TYPE_MASK, e->data);
+		reg  = FIELD_GET(EX_DATA_REG_MASK,  e->data);
+		imm  = FIELD_GET(EX_DATA_IMM_MASK,  e->data);
+
+		switch (type) {
+		case EX_TYPE_DEFAULT:
+		case EX_TYPE_DEFAULT_MCE_SAFE:
+			return ex_handler_default(e, regs);
+		case EX_TYPE_FAULT:
+		case EX_TYPE_FAULT_MCE_SAFE:
+			return ex_handler_fault(e, regs, trapnr);
+		case EX_TYPE_UACCESS:
+			return ex_handler_uaccess(e, regs, trapnr, fault_addr);
+		case EX_TYPE_CLEAR_FS:
+			return ex_handler_clear_fs(e, regs);
+		case EX_TYPE_FPU_RESTORE:
+			return ex_handler_fprestore(e, regs);
+		case EX_TYPE_BPF:
+			return ex_handler_bpf(e, regs);
+		case EX_TYPE_WRMSR:
+			return ex_handler_msr(e, regs, true, false, reg);
+		case EX_TYPE_RDMSR:
+			return ex_handler_msr(e, regs, false, false, reg);
+		case EX_TYPE_WRMSR_SAFE:
+			return ex_handler_msr(e, regs, true, true, reg);
+		case EX_TYPE_RDMSR_SAFE:
+			return ex_handler_msr(e, regs, false, true, reg);
+		case EX_TYPE_WRMSR_IN_MCE:
+			ex_handler_msr_mce(regs, true);
+			break;
+		case EX_TYPE_RDMSR_IN_MCE:
+			ex_handler_msr_mce(regs, false);
+			break;
+		case EX_TYPE_POP_REG:
+			regs->sp += sizeof(long);
+			fallthrough;
+		case EX_TYPE_IMM_REG:
+			return ex_handler_imm_reg(e, regs, reg, imm);
+		case EX_TYPE_FAULT_SGX:
+			return ex_handler_sgx(e, regs, trapnr);
+		case EX_TYPE_UCOPY_LEN:
+			return ex_handler_ucopy_len(e, regs, trapnr, fault_addr, reg, imm);
+		case EX_TYPE_ZEROPAD:
+			return ex_handler_zeropad(e, regs, fault_addr);
 #ifdef CONFIG_X86_FRED
-	case EX_TYPE_ERETU:
-		return ex_handler_eretu(e, regs, error_code);
+		case EX_TYPE_ERETU:
+			return ex_handler_eretu(e, regs, error_code);
 #endif
-	}
+		case EX_TYPE_FUNC_REWIND:
+			again = ex_handler_func_rewind(regs, e->data);
+			break;
+		default:
+			break;	/* Will BUG() */
+		}
+	} while (again);
+
 	BUG();
+	return 0;
 }
 
 extern unsigned int early_recursion_flag;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [RFC PATCH v1 12/15] x86/msr: Use the alternatives mechanism to write MSR
  2025-03-31  8:22 [RFC PATCH v1 00/15] MSR refactor with new MSR instructions support Xin Li (Intel)
                   ` (10 preceding siblings ...)
  2025-03-31  8:22 ` [RFC PATCH v1 11/15] x86/extable: Implement EX_TYPE_FUNC_REWIND Xin Li (Intel)
@ 2025-03-31  8:22 ` Xin Li (Intel)
  2025-03-31  8:22 ` [RFC PATCH v1 13/15] x86/msr: Use the alternatives mechanism to read MSR Xin Li (Intel)
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 55+ messages in thread
From: Xin Li (Intel) @ 2025-03-31  8:22 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3,
	peterz, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	alexey.amakhalov, bcm-kernel-feedback-list, tony.luck, pbonzini,
	vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz, decui

Also add support for the immediate form of the MSR write instruction.

Originally-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
 arch/x86/include/asm/fred.h           |   2 +-
 arch/x86/include/asm/msr.h            | 340 ++++++++++++++++++++++----
 arch/x86/include/asm/paravirt.h       |  22 --
 arch/x86/include/asm/paravirt_types.h |   2 -
 arch/x86/kernel/paravirt.c            |   2 -
 arch/x86/xen/enlighten_pv.c           |  63 ++---
 arch/x86/xen/xen-asm.S                |  55 +++++
 arch/x86/xen/xen-ops.h                |   2 +
 8 files changed, 362 insertions(+), 126 deletions(-)

diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index 2a29e5216881..e6eab64095d4 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -100,7 +100,7 @@ static __always_inline void fred_update_rsp0(void)
 	unsigned long rsp0 = (unsigned long) task_stack_page(current) + THREAD_SIZE;
 
 	if (cpu_feature_enabled(X86_FEATURE_FRED) && (__this_cpu_read(fred_rsp0) != rsp0)) {
-		wrmsrns(MSR_IA32_FRED_RSP0, rsp0);
+		native_wrmsrl(MSR_IA32_FRED_RSP0, rsp0);
 		__this_cpu_write(fred_rsp0, rsp0);
 	}
 }
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index da4f2f6d127f..066cde11254a 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -8,6 +8,7 @@
 
 #include <asm/asm.h>
 #include <asm/errno.h>
+#include <asm/cpufeature.h>
 #include <asm/cpumask.h>
 #include <uapi/asm/msr.h>
 #include <asm/shared/msr.h>
@@ -72,13 +73,83 @@ static inline void do_trace_read_msr(unsigned int msr, u64 val, int failed) {}
 static inline void do_trace_rdpmc(unsigned int msr, u64 val, int failed) {}
 #endif
 
+#ifdef CONFIG_CC_IS_GCC
+#define ASM_WRMSRNS_IMM			\
+	" .insn VEX.128.F3.M7.W0 0xf6 /0, %[val], %[msr]%{:u32}\n\t"
+#endif
+
+#ifdef CONFIG_CC_IS_CLANG
 /*
- * __rdmsr() and __wrmsr() are the two primitives which are the bare minimum MSR
- * accessors and should not have any tracing or other functionality piggybacking
- * on them - those are *purely* for accessing MSRs and nothing more. So don't even
- * think of extending them - you will be slapped with a stinking trout or a frozen
- * shark will reach you, wherever you are! You've been warned.
+ * clang doesn't support the insn directive.
+ *
+ * The register operand is encoded as %rax because all uses of the immediate
+ * form MSR access instructions reference %rax as the register operand.
  */
+#define ASM_WRMSRNS_IMM			\
+	" .byte 0xc4,0xe7,0x7a,0xf6,0xc0; .long %c[msr]"
+#endif
+
+#define PREPARE_RDX_FOR_WRMSR		\
+	"mov %%rax, %%rdx\n\t"		\
+	"shr $0x20, %%rdx\n\t"
+
+#define PREPARE_RCX_RDX_FOR_WRMSR	\
+	"mov %[msr], %%ecx\n\t"		\
+	PREPARE_RDX_FOR_WRMSR
+
+enum pv_msr_action {
+	PV_MSR_NATIVE,
+	PV_MSR_PV,
+	PV_MSR_IGNORE,
+};
+
+#ifdef CONFIG_XEN_PV
+static __always_inline enum pv_msr_action get_pv_msr_action(const u32 msr)
+{
+	if (!__builtin_constant_p(msr)) {
+		/* Is it safe to blindly do so? */
+		return PV_MSR_NATIVE;
+	}
+
+	switch (msr) {
+	case MSR_FS_BASE:
+	case MSR_KERNEL_GS_BASE:
+	case MSR_GS_BASE:
+	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+	case MSR_CORE_PERF_GLOBAL_STATUS:
+	case MSR_CORE_PERF_GLOBAL_CTRL:
+	case MSR_CORE_PERF_FIXED_CTR_CTRL:
+	case MSR_IA32_APICBASE:
+		return PV_MSR_PV;
+
+	case MSR_STAR:
+	case MSR_CSTAR:
+	case MSR_LSTAR:
+	case MSR_SYSCALL_MASK:
+	case MSR_IA32_SYSENTER_CS:
+	case MSR_IA32_SYSENTER_ESP:
+	case MSR_IA32_SYSENTER_EIP:
+		return PV_MSR_IGNORE;
+
+	default:
+		/*
+		 * MSR access instructions RDMSR/WRMSR/WRMSRNS will be used.
+		 *
+		 * The hypervisor will trap and inject #GP into the guest and
+		 * the MSR access instruction will be skipped.
+		 */
+		return PV_MSR_NATIVE;
+	}
+}
+
+extern void asm_xen_write_msr(void);
+#else
+static __always_inline enum pv_msr_action get_pv_msr_action(const u32 msr)
+{
+	return PV_MSR_NATIVE;
+}
+#endif
+
 static __always_inline unsigned long long __rdmsr(unsigned int msr)
 {
 	DECLARE_ARGS(val, low, high);
@@ -91,14 +162,6 @@ static __always_inline unsigned long long __rdmsr(unsigned int msr)
 	return EAX_EDX_VAL(val, low, high);
 }
 
-static __always_inline void __wrmsr(u32 msr, u64 val)
-{
-	asm volatile("1: wrmsr\n"
-		     "2:\n"
-		     _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
-		     : : "c" (msr), "a"((u32)val), "d" ((u32)(val >> 32)) : "memory");
-}
-
 #define native_rdmsr(msr, val1, val2)			\
 do {							\
 	u64 __val = __rdmsr((msr));			\
@@ -111,9 +174,6 @@ static __always_inline u64 native_rdmsrl(const u32 msr)
 	return __rdmsr(msr);
 }
 
-#define native_wrmsrl(msr, val)				\
-	__wrmsr((msr), (val))
-
 static inline unsigned long long native_read_msr(unsigned int msr)
 {
 	unsigned long long val;
@@ -141,31 +201,232 @@ static inline unsigned long long native_read_msr_safe(unsigned int msr,
 	return EAX_EDX_VAL(val, low, high);
 }
 
-/* Can be uninlined because referenced by paravirt */
-static inline void notrace native_write_msr(u32 msr, u64 val)
+/*
+ * Non-serializing WRMSR, when available.
+ * Falls back to a serializing WRMSR.
+ */
+static __always_inline bool __native_wrmsr_variable(const u32 msr, const u64 val, const int type)
+{
+#ifdef CONFIG_X86_64
+	BUILD_BUG_ON(__builtin_constant_p(msr));
+#endif
+
+	asm_inline volatile goto(
+		"1:\n"
+		ALTERNATIVE("ds wrmsr",
+			    ASM_WRMSRNS,
+			    X86_FEATURE_WRMSRNS)
+		_ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type])
+
+		:
+		: "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32)), [type] "i" (type)
+		: "memory"
+		: badmsr);
+
+	return false;
+
+badmsr:
+	return true;
+}
+
+#ifdef CONFIG_X86_64
+/*
+ * Non-serializing WRMSR or its immediate form, when available.
+ * Falls back to a serializing WRMSR.
+ */
+static __always_inline bool __native_wrmsr_constant(const u32 msr, const u64 val, const int type)
+{
+	BUILD_BUG_ON(!__builtin_constant_p(msr));
+
+	/*
+	 * WRMSR is 2 bytes.  WRMSRNS is 3 bytes.  Pad WRMSR with a redundant
+	 * DS prefix to avoid a trailing NOP.
+	 */
+	asm_inline volatile goto(
+		"1:\n"
+		ALTERNATIVE_2(PREPARE_RCX_RDX_FOR_WRMSR
+			      "2: ds wrmsr",
+			      PREPARE_RCX_RDX_FOR_WRMSR
+			      ASM_WRMSRNS,
+			      X86_FEATURE_WRMSRNS,
+			      ASM_WRMSRNS_IMM,
+			      X86_FEATURE_MSR_IMM)
+		_ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type])	/* For WRMSRNS immediate */
+		_ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type])	/* For WRMSR(NS) */
+
+		:
+		: [val] "a" (val), [msr] "i" (msr), [type] "i" (type)
+		: "memory", "ecx", "rdx"
+		: badmsr);
+
+	return false;
+
+badmsr:
+	return true;
+}
+#endif
+
+static __always_inline bool __native_wrmsr(const u32 msr, const u64 val, const int type)
+{
+#ifdef CONFIG_X86_64
+	if (__builtin_constant_p(msr))
+		return __native_wrmsr_constant(msr, val, type);
+#endif
+
+	return __native_wrmsr_variable(msr, val, type);
+}
+
+static __always_inline void native_wrmsr(const u32 msr, const u32 low, const u32 high)
+{
+	__native_wrmsr(msr, (u64)high << 32 | low, EX_TYPE_WRMSR);
+}
+
+static __always_inline void native_wrmsrl(const u32 msr, const u64 val)
+{
+	__native_wrmsr(msr, val, EX_TYPE_WRMSR);
+}
+
+static inline void notrace native_write_msr(const u32 msr, const u64 val)
 {
-	native_wrmsrl(msr, val);
+	__native_wrmsr(msr, val, EX_TYPE_WRMSR);
 
 	if (tracepoint_enabled(write_msr))
 		do_trace_write_msr(msr, val, 0);
 }
 
-/* Can be uninlined because referenced by paravirt */
-static inline int notrace native_write_msr_safe(u32 msr, u64 val)
+static inline int notrace native_write_msr_safe(const u32 msr, const u64 val)
 {
-	int err;
+	int err = __native_wrmsr(msr, val, EX_TYPE_WRMSR_SAFE) ? -EIO : 0;
 
-	asm volatile("1: wrmsr ; xor %[err],%[err]\n"
-		     "2:\n\t"
-		     _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_WRMSR_SAFE, %[err])
-		     : [err] "=a" (err)
-		     : "c" (msr), "0" ((u32)val), "d" ((u32)(val >> 32))
-		     : "memory");
 	if (tracepoint_enabled(write_msr))
 		do_trace_write_msr(msr, val, err);
+
 	return err;
 }
 
+static __always_inline bool __wrmsr_variable(const u32 msr, const u64 val, const int type)
+{
+#ifdef CONFIG_X86_64
+	BUILD_BUG_ON(__builtin_constant_p(msr));
+
+	asm_inline volatile goto(
+		ALTERNATIVE(PREPARE_RDX_FOR_WRMSR,
+			    "call asm_xen_write_msr\n\t"
+			    "jnz 2f\n\t",
+			    X86_FEATURE_XENPV)
+		ALTERNATIVE("1: ds wrmsr",
+			    ASM_WRMSRNS,
+			    X86_FEATURE_WRMSRNS)
+		"2:\n"
+		_ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type])	/* For WRMSR(NS) */
+
+		: ASM_CALL_CONSTRAINT
+		: "a" (val), "c" (msr), [type] "i" (type)
+		: "memory", "rdx"
+		: badmsr);
+
+	return false;
+
+badmsr:
+	return true;
+#else
+	return __native_wrmsr_variable(msr, val, type);
+#endif
+}
+
+static __always_inline bool __wrmsr_variable_all(const u32 msr, const u64 val, const int type)
+{
+	const enum pv_msr_action action = get_pv_msr_action(msr);
+
+	if (action == PV_MSR_PV) {
+		return __wrmsr_variable(msr, val, type);
+	} else if (action == PV_MSR_IGNORE) {
+		if (cpu_feature_enabled(X86_FEATURE_XENPV))
+			return false;
+	}
+
+	return __native_wrmsr_variable(msr, val, type);
+}
+
+#ifdef CONFIG_X86_64
+static __always_inline bool __wrmsr_constant(const u32 msr, const u64 val, const int type)
+{
+	BUILD_BUG_ON(!__builtin_constant_p(msr));
+
+	asm_inline volatile goto(
+		"1:\n"
+		ALTERNATIVE_2(PREPARE_RCX_RDX_FOR_WRMSR,
+			      "",
+			      X86_FEATURE_MSR_IMM,
+			      "mov %[msr], %%ecx\n\t"
+			      "call asm_xen_write_msr\n\t"
+			      "jnz 3f\n\t",
+			      X86_FEATURE_XENPV)
+		ALTERNATIVE_2("2: ds wrmsr",
+			      ASM_WRMSRNS,
+			      X86_FEATURE_WRMSRNS,
+			      ASM_WRMSRNS_IMM,
+			      X86_FEATURE_MSR_IMM)
+		"3:\n"
+		_ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type])	/* For WRMSRNS immediate */
+		_ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type])	/* For WRMSR(NS) */
+
+		: ASM_CALL_CONSTRAINT
+		: [val] "a" (val), [msr] "i" (msr), [type] "i" (type)
+		: "memory", "ecx", "rdx"
+		: badmsr);
+
+	return false;
+
+badmsr:
+	return true;
+}
+
+static __always_inline bool __wrmsr_constant_all(const u32 msr, const u64 val, const int type)
+{
+	const enum pv_msr_action action = get_pv_msr_action(msr);
+
+	if (action == PV_MSR_PV) {
+		return __wrmsr_constant(msr, val, type);
+	} else if (action == PV_MSR_IGNORE) {
+		if (cpu_feature_enabled(X86_FEATURE_XENPV))
+			return false;
+	}
+
+	return __native_wrmsr_constant(msr, val, type);
+}
+#endif
+
+static __always_inline bool __wrmsr(const u32 msr, const u64 val, const int type)
+{
+#ifdef CONFIG_X86_64
+	if (__builtin_constant_p(msr))
+		return __wrmsr_constant_all(msr, val, type);
+#endif
+
+	return __wrmsr_variable_all(msr, val, type);
+}
+
+static __always_inline void wrmsr(const u32 msr, const u32 low, const u32 high)
+{
+	__wrmsr(msr, (u64)high << 32 | low, EX_TYPE_WRMSR);
+}
+
+static __always_inline void wrmsrl(const u32 msr, const u64 val)
+{
+	__wrmsr(msr, val, EX_TYPE_WRMSR);
+}
+
+static __always_inline int wrmsr_safe(const u32 msr, const u32 low, const u32 high)
+{
+	return __wrmsr(msr, (u64)high << 32 | low, EX_TYPE_WRMSR_SAFE) ? -EIO : 0;
+}
+
+static __always_inline int wrmsrl_safe(const u32 msr, const u64 val)
+{
+	return __wrmsr(msr, val, EX_TYPE_WRMSR_SAFE) ? -EIO : 0;
+}
+
 extern int rdmsr_safe_regs(u32 regs[8]);
 extern int wrmsr_safe_regs(u32 regs[8]);
 
@@ -287,29 +548,6 @@ do {							\
 
 #endif	/* !CONFIG_PARAVIRT_XXL */
 
-/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
-#define WRMSRNS _ASM_BYTES(0x0f,0x01,0xc6)
-
-/* Non-serializing WRMSR, when available.  Falls back to a serializing WRMSR. */
-static __always_inline void wrmsrns(u32 msr, u64 val)
-{
-	/*
-	 * WRMSR is 2 bytes.  WRMSRNS is 3 bytes.  Pad WRMSR with a redundant
-	 * DS prefix to avoid a trailing NOP.
-	 */
-	asm volatile("1: " ALTERNATIVE("ds wrmsr", WRMSRNS, X86_FEATURE_WRMSRNS)
-		     "2: " _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
-		     : : "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32)));
-}
-
-/*
- * 64-bit version of wrmsr_safe():
- */
-static inline int wrmsrl_safe(u32 msr, u64 val)
-{
-	return native_write_msr_safe(msr, val);
-}
-
 struct msr __percpu *msrs_alloc(void);
 void msrs_free(struct msr __percpu *msrs);
 int msr_set_bit(u32 msr, u8 bit);
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index f3d6e8394d38..351feb890ab0 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -180,21 +180,11 @@ static inline u64 paravirt_read_msr(unsigned msr)
 	return PVOP_CALL1(u64, cpu.read_msr, msr);
 }
 
-static inline void paravirt_write_msr(u32 msr, u32 low, u32 high)
-{
-	PVOP_VCALL2(cpu.write_msr, msr, (u64)high << 32 | low);
-}
-
 static inline u64 paravirt_read_msr_safe(unsigned msr, int *err)
 {
 	return PVOP_CALL2(u64, cpu.read_msr_safe, msr, err);
 }
 
-static inline int paravirt_write_msr_safe(u32 msr, u32 low, u32 high)
-{
-	return PVOP_CALL2(int, cpu.write_msr_safe, msr, (u64)high << 32 | low);
-}
-
 #define rdmsr(msr, val1, val2)			\
 do {						\
 	u64 _l = paravirt_read_msr(msr);	\
@@ -202,23 +192,11 @@ do {						\
 	val2 = _l >> 32;			\
 } while (0)
 
-#define wrmsr(msr, val1, val2)			\
-do {						\
-	paravirt_write_msr(msr, val1, val2);	\
-} while (0)
-
 #define rdmsrl(msr, val)			\
 do {						\
 	val = paravirt_read_msr(msr);		\
 } while (0)
 
-static inline void wrmsrl(unsigned msr, u64 val)
-{
-	wrmsr(msr, (u32)val, (u32)(val>>32));
-}
-
-#define wrmsr_safe(msr, a, b)	paravirt_write_msr_safe(msr, a, b)
-
 /* rdmsr with exception handling */
 #define rdmsr_safe(msr, a, b)				\
 ({							\
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 78777b78da12..8a563576d70e 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -92,14 +92,12 @@ struct pv_cpu_ops {
 
 	/* Unsafe MSR operations.  These will warn or panic on failure. */
 	u64 (*read_msr)(unsigned int msr);
-	void (*write_msr)(u32 msr, u64 val);
 
 	/*
 	 * Safe MSR operations.
 	 * read sets err to 0 or -EIO.  write returns 0 or -EIO.
 	 */
 	u64 (*read_msr_safe)(unsigned int msr, int *err);
-	int (*write_msr_safe)(u32 msr, u64 val);
 
 	u64 (*read_pmc)(int counter);
 
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 1ccd05d8999f..ffb04445f97e 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -129,9 +129,7 @@ struct paravirt_patch_template pv_ops = {
 	.cpu.write_cr0		= native_write_cr0,
 	.cpu.write_cr4		= native_write_cr4,
 	.cpu.read_msr		= native_read_msr,
-	.cpu.write_msr		= native_write_msr,
 	.cpu.read_msr_safe	= native_read_msr_safe,
-	.cpu.write_msr_safe	= native_write_msr_safe,
 	.cpu.read_pmc		= native_read_pmc,
 	.cpu.load_tr_desc	= native_load_tr_desc,
 	.cpu.set_ldt		= native_set_ldt,
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index a047dadf4511..d02f55bfa869 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1112,43 +1112,33 @@ static u64 xen_do_read_msr(unsigned int msr, int *err)
 	return val;
 }
 
-static void set_seg(unsigned int which, unsigned int low, unsigned int high,
-		    int *err)
+static void set_seg(u32 which, u64 base)
 {
-	u64 base = ((u64)high << 32) | low;
-
-	if (HYPERVISOR_set_segment_base(which, base) == 0)
-		return;
-
-	if (err)
-		*err = -EIO;
-	else
+	if (HYPERVISOR_set_segment_base(which, base))
 		WARN(1, "Xen set_segment_base(%u, %llx) failed\n", which, base);
 }
 
 /*
- * Support write_msr_safe() and write_msr() semantics.
- * With err == NULL write_msr() semantics are selected.
- * Supplying an err pointer requires err to be pre-initialized with 0.
+ * Return true to indicate the requested MSR write has been done successfully,
+ * otherwise return false to have the calling MSR write primitives in msr.h
+ * fail.
  */
-static void xen_do_write_msr(unsigned int msr, unsigned int low,
-			     unsigned int high, int *err)
+bool xen_do_write_msr(u32 msr, u64 val)
 {
-	u64 val;
 	bool emulated;
 
 	switch (msr) {
 	case MSR_FS_BASE:
-		set_seg(SEGBASE_FS, low, high, err);
-		break;
+		set_seg(SEGBASE_FS, val);
+		return true;
 
 	case MSR_KERNEL_GS_BASE:
-		set_seg(SEGBASE_GS_USER, low, high, err);
-		break;
+		set_seg(SEGBASE_GS_USER, val);
+		return true;
 
 	case MSR_GS_BASE:
-		set_seg(SEGBASE_GS_KERNEL, low, high, err);
-		break;
+		set_seg(SEGBASE_GS_KERNEL, val);
+		return true;
 
 	case MSR_STAR:
 	case MSR_CSTAR:
@@ -1160,18 +1150,13 @@ static void xen_do_write_msr(unsigned int msr, unsigned int low,
 		/* Fast syscall setup is all done in hypercalls, so
 		   these are all ignored.  Stub them out here to stop
 		   Xen console noise. */
-		break;
+		return true;
 
 	default:
-		val = (u64)high << 32 | low;
-
 		if (pmu_msr_chk_emulated(msr, &val, false, &emulated) && emulated)
-			return;
+			return true;
 
-		if (err)
-			*err = native_write_msr_safe(msr, val);
-		else
-			native_write_msr(msr, val);
+		return false;
 	}
 }
 
@@ -1180,15 +1165,6 @@ static u64 xen_read_msr_safe(unsigned int msr, int *err)
 	return xen_do_read_msr(msr, err);
 }
 
-static int xen_write_msr_safe(u32 msr, u64 val)
-{
-	int err = 0;
-
-	xen_do_write_msr(msr, val, (u32)(val >> 32), &err);
-
-	return err;
-}
-
 static u64 xen_read_msr(unsigned int msr)
 {
 	int err;
@@ -1196,13 +1172,6 @@ static u64 xen_read_msr(unsigned int msr)
 	return xen_do_read_msr(msr, xen_msr_safe ? &err : NULL);
 }
 
-static void xen_write_msr(u32 msr, u64 val)
-{
-	int err;
-
-	xen_do_write_msr(msr, val, (u32)(val >> 32), xen_msr_safe ? &err : NULL);
-}
-
 /* This is called once we have the cpu_possible_mask */
 void __init xen_setup_vcpu_info_placement(void)
 {
@@ -1238,10 +1207,8 @@ static const typeof(pv_ops) xen_cpu_ops __initconst = {
 		.write_cr4 = xen_write_cr4,
 
 		.read_msr = xen_read_msr,
-		.write_msr = xen_write_msr,
 
 		.read_msr_safe = xen_read_msr_safe,
-		.write_msr_safe = xen_write_msr_safe,
 
 		.read_pmc = xen_read_pmc,
 
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 109af12f7647..e672632b1cc0 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -344,3 +344,58 @@ SYM_CODE_END(xen_entry_SYSENTER_compat)
 SYM_CODE_END(xen_entry_SYSCALL_compat)
 
 #endif	/* CONFIG_IA32_EMULATION */
+
+.macro XEN_SAVE_CALLEE_REGS_FOR_MSR
+	push %rcx
+	push %rdi
+	push %rsi
+	push %r8
+	push %r9
+	push %r10
+	push %r11
+.endm
+
+.macro XEN_RESTORE_CALLEE_REGS_FOR_MSR
+	pop %r11
+	pop %r10
+	pop %r9
+	pop %r8
+	pop %rsi
+	pop %rdi
+	pop %rcx
+.endm
+
+/*
+ * MSR number in %ecx, MSR value in %rax.
+ *
+ * %edx is set up to match %rax >> 32 like the native stub
+ * is expected to do.
+ *
+ * Let xen_do_write_msr() return 'false' if the MSR access should
+ * be executed natively, IOW, 'true' means it has done the job.
+ *
+ * 	bool xen_do_write_msr(u32 msr, u64 value)
+ *
+ * If ZF=1 then this will fall down to the actual native WRMSR[NS]
+ * instruction.
+ *
+ * This also removes the need for Xen to maintain different safe and
+ * unsafe MSR routines, as the difference is handled by the same
+ * trap handler as is used natively.
+ */
+SYM_FUNC_START(asm_xen_write_msr)
+	ENDBR
+	FRAME_BEGIN
+	push %rax		/* Save in case of native fallback */
+	XEN_SAVE_CALLEE_REGS_FOR_MSR
+	mov %ecx, %edi		/* MSR number */
+	mov %rax, %rsi		/* MSR data */
+	call xen_do_write_msr
+	test %al, %al		/* %al=1, i.e., ZF=0, means successfully done */
+	XEN_RESTORE_CALLEE_REGS_FOR_MSR
+	mov 4(%rsp), %edx	/* Set up %edx for native execution */
+	pop %rax
+	FRAME_END
+	RET
+SYM_FUNC_END(asm_xen_write_msr)
+EXPORT_SYMBOL_GPL(asm_xen_write_msr)
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 6545661010ce..fc3c55871037 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -146,6 +146,8 @@ __visible unsigned long xen_read_cr2_direct(void);
 /* These are not functions, and cannot be called normally */
 __visible void xen_iret(void);
 
+extern bool xen_do_write_msr(u32 msr, u64 val);
+
 extern int xen_panic_handler_init(void);
 
 int xen_cpuhp_setup(int (*cpu_up_prepare_cb)(unsigned int),
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [RFC PATCH v1 13/15] x86/msr: Use the alternatives mechanism to read MSR
  2025-03-31  8:22 [RFC PATCH v1 00/15] MSR refactor with new MSR instructions support Xin Li (Intel)
                   ` (11 preceding siblings ...)
  2025-03-31  8:22 ` [RFC PATCH v1 12/15] x86/msr: Use the alternatives mechanism to write MSR Xin Li (Intel)
@ 2025-03-31  8:22 ` Xin Li (Intel)
  2025-04-14 17:13   ` Francesco Lavra
  2025-03-31  8:22 ` [RFC PATCH v1 14/15] x86/extable: Add support for the immediate form MSR instructions Xin Li (Intel)
  2025-03-31  8:22 ` [RFC PATCH v1 15/15] x86/msr: Move the ARGS macros after the MSR read/write APIs Xin Li (Intel)
  14 siblings, 1 reply; 55+ messages in thread
From: Xin Li (Intel) @ 2025-03-31  8:22 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3,
	peterz, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	alexey.amakhalov, bcm-kernel-feedback-list, tony.luck, pbonzini,
	vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz, decui

Also add support for the immediate form MSR read instruction.

Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
 arch/x86/include/asm/msr.h            | 274 ++++++++++++++++++++------
 arch/x86/include/asm/paravirt.h       |  40 ----
 arch/x86/include/asm/paravirt_types.h |   9 -
 arch/x86/kernel/paravirt.c            |   2 -
 arch/x86/xen/enlighten_pv.c           |  45 ++---
 arch/x86/xen/xen-asm.S                |  34 ++++
 arch/x86/xen/xen-ops.h                |   7 +
 7 files changed, 276 insertions(+), 135 deletions(-)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 066cde11254a..fc93c2601853 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -74,6 +74,8 @@ static inline void do_trace_rdpmc(unsigned int msr, u64 val, int failed) {}
 #endif
 
 #ifdef CONFIG_CC_IS_GCC
+#define ASM_RDMSR_IMM			\
+	" .insn VEX.128.F2.M7.W0 0xf6 /0, %[msr]%{:u32}, %[val]\n\t"
 #define ASM_WRMSRNS_IMM			\
 	" .insn VEX.128.F3.M7.W0 0xf6 /0, %[val], %[msr]%{:u32}\n\t"
 #endif
@@ -85,10 +87,17 @@ static inline void do_trace_rdpmc(unsigned int msr, u64 val, int failed) {}
  * The register operand is encoded as %rax because all uses of the immediate
  * form MSR access instructions reference %rax as the register operand.
  */
+#define ASM_RDMSR_IMM			\
+	" .byte 0xc4,0xe7,0x7b,0xf6,0xc0; .long %c[msr]"
 #define ASM_WRMSRNS_IMM			\
 	" .byte 0xc4,0xe7,0x7a,0xf6,0xc0; .long %c[msr]"
 #endif
 
+#define RDMSR_AND_SAVE_RESULT		\
+	"rdmsr\n\t"			\
+	"shl $0x20, %%rdx\n\t"		\
+	"or %%rdx, %[val]\n\t"
+
 #define PREPARE_RDX_FOR_WRMSR		\
 	"mov %%rax, %%rdx\n\t"		\
 	"shr $0x20, %%rdx\n\t"
@@ -142,6 +151,7 @@ static __always_inline enum pv_msr_action get_pv_msr_action(const u32 msr)
 	}
 }
 
+extern void asm_xen_read_msr(void);
 extern void asm_xen_write_msr(void);
 #else
 static __always_inline enum pv_msr_action get_pv_msr_action(const u32 msr)
@@ -150,35 +160,95 @@ static __always_inline enum pv_msr_action get_pv_msr_action(const u32 msr)
 }
 #endif
 
-static __always_inline unsigned long long __rdmsr(unsigned int msr)
+static __always_inline bool __native_rdmsr_variable(const u32 msr, u64 *val, const int type)
 {
-	DECLARE_ARGS(val, low, high);
+#ifdef CONFIG_X86_64
+	BUILD_BUG_ON(__builtin_constant_p(msr));
 
-	asm volatile("1: rdmsr\n"
-		     "2:\n"
-		     _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR)
-		     : EAX_EDX_RET(val, low, high) : "c" (msr));
+	asm_inline volatile goto(
+		"1:\n"
+		RDMSR_AND_SAVE_RESULT
+		_ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type])	/* For RDMSR */
 
-	return EAX_EDX_VAL(val, low, high);
+		: [val] "=a" (*val)
+		: "c" (msr), [type] "i" (type)
+		: "memory", "rdx"
+		: badmsr);
+#else
+	asm_inline volatile goto(
+		"1: rdmsr\n\t"
+		_ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type])	/* For RDMSR */
+
+		: "=A" (*val)
+		: "c" (msr), [type] "i" (type)
+		: "memory"
+		: badmsr);
+#endif
+
+	return false;
+
+badmsr:
+	return true;
+}
+
+#ifdef CONFIG_X86_64
+static __always_inline bool __native_rdmsr_constant(const u32 msr, u64 *val, const int type)
+{
+	BUILD_BUG_ON(!__builtin_constant_p(msr));
+
+	asm_inline volatile goto(
+		"1:\n"
+		ALTERNATIVE("mov %[msr], %%ecx\n\t"
+			    "2:\n"
+			    RDMSR_AND_SAVE_RESULT,
+			    ASM_RDMSR_IMM,
+			    X86_FEATURE_MSR_IMM)
+		_ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type])	/* For RDMSR immediate */
+		_ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type])	/* For RDMSR */
+
+		: [val] "=a" (*val)
+		: [msr] "i" (msr), [type] "i" (type)
+		: "memory", "ecx", "rdx"
+		: badmsr);
+
+	return false;
+
+badmsr:
+	return true;
+}
+#endif
+
+static __always_inline bool __native_rdmsr(const u32 msr, u64 *val, const int type)
+{
+#ifdef CONFIG_X86_64
+	if (__builtin_constant_p(msr))
+		return __native_rdmsr_constant(msr, val, type);
+#endif
+
+	return __native_rdmsr_variable(msr, val, type);
 }
 
-#define native_rdmsr(msr, val1, val2)			\
+#define native_rdmsr(msr, low, high)			\
 do {							\
-	u64 __val = __rdmsr((msr));			\
-	(void)((val1) = (u32)__val);			\
-	(void)((val2) = (u32)(__val >> 32));		\
+	u64 __val = 0;					\
+	__native_rdmsr((msr), &__val, EX_TYPE_RDMSR);	\
+	(void)((low) = (u32)__val);			\
+	(void)((high) = (u32)(__val >> 32));		\
 } while (0)
 
 static __always_inline u64 native_rdmsrl(const u32 msr)
 {
-	return __rdmsr(msr);
+	u64 val = 0;
+
+	__native_rdmsr(msr, &val, EX_TYPE_RDMSR);
+	return val;
 }
 
-static inline unsigned long long native_read_msr(unsigned int msr)
+static inline u64 native_read_msr(const u32 msr)
 {
-	unsigned long long val;
+	u64 val = 0;
 
-	val = __rdmsr(msr);
+	__native_rdmsr(msr, &val, EX_TYPE_RDMSR);
 
 	if (tracepoint_enabled(read_msr))
 		do_trace_read_msr(msr, val, 0);
@@ -186,19 +256,139 @@ static inline unsigned long long native_read_msr(unsigned int msr)
 	return val;
 }
 
-static inline unsigned long long native_read_msr_safe(unsigned int msr,
-						      int *err)
+static inline u64 native_read_msr_safe(const u32 msr, int *err)
 {
-	DECLARE_ARGS(val, low, high);
+	u64 val = 0;
+
+	*err = __native_rdmsr(msr, &val, EX_TYPE_RDMSR_SAFE) ? -EIO : 0;
 
-	asm volatile("1: rdmsr ; xor %[err],%[err]\n"
-		     "2:\n\t"
-		     _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_RDMSR_SAFE, %[err])
-		     : [err] "=r" (*err), EAX_EDX_RET(val, low, high)
-		     : "c" (msr));
 	if (tracepoint_enabled(read_msr))
-		do_trace_read_msr(msr, EAX_EDX_VAL(val, low, high), *err);
-	return EAX_EDX_VAL(val, low, high);
+		do_trace_read_msr(msr, val, *err);
+
+	return val;
+}
+
+static __always_inline bool __rdmsr_variable(const u32 msr, u64 *val, const int type)
+{
+#ifdef CONFIG_X86_64
+	BUILD_BUG_ON(__builtin_constant_p(msr));
+
+	asm_inline volatile goto(
+		"1:\n"
+		ALTERNATIVE(RDMSR_AND_SAVE_RESULT,
+			    "call asm_xen_read_msr\n\t",
+			    X86_FEATURE_XENPV)
+		_ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type])	/* For RDMSR and CALL */
+
+		: [val] "=a" (*val), ASM_CALL_CONSTRAINT
+		: "c" (msr), [type] "i" (type)
+		: "memory", "rdx"
+		: badmsr);
+
+	return false;
+
+badmsr:
+	return true;
+#else
+	return __native_rdmsr_variable(msr, val, type);
+#endif
+}
+
+static __always_inline bool __rdmsr_variable_all(const u32 msr, u64 *val, const int type)
+{
+	const enum pv_msr_action action = get_pv_msr_action(msr);
+
+	if (action == PV_MSR_PV) {
+		return __rdmsr_variable(msr, val, type);
+	} else if (action == PV_MSR_IGNORE) {
+		if (cpu_feature_enabled(X86_FEATURE_XENPV))
+			return false;
+	}
+
+	return __native_rdmsr_variable(msr, val, type);
+}
+
+#ifdef CONFIG_X86_64
+static __always_inline bool __rdmsr_constant(const u32 msr, u64 *val, const int type)
+{
+	BUILD_BUG_ON(!__builtin_constant_p(msr));
+
+	asm_inline volatile goto(
+		"1:\n"
+		ALTERNATIVE_2("mov %[msr], %%ecx\n\t"
+			      "2:\n"
+			      RDMSR_AND_SAVE_RESULT,
+			      ASM_RDMSR_IMM,
+			      X86_FEATURE_MSR_IMM,
+			      "mov %[msr], %%ecx\n\t"
+			      "call asm_xen_read_msr\n\t",
+			      X86_FEATURE_XENPV)
+		_ASM_EXTABLE_TYPE(1b, %l[badmsr], %c[type])	/* For RDMSR immediate */
+		_ASM_EXTABLE_TYPE(2b, %l[badmsr], %c[type])	/* For RDMSR and CALL */
+
+		: [val] "=a" (*val), ASM_CALL_CONSTRAINT
+		: [msr] "i" (msr), [type] "i" (type)
+		: "memory", "ecx", "rdx"
+		: badmsr);
+
+	return false;
+
+badmsr:
+	return true;
+}
+
+static __always_inline bool __rdmsr_constant_all(const u32 msr, u64 *val, const int type)
+{
+	const enum pv_msr_action action = get_pv_msr_action(msr);
+
+	if (action == PV_MSR_PV) {
+		return __rdmsr_constant(msr, val, type);
+	} else if (action == PV_MSR_IGNORE) {
+		if (cpu_feature_enabled(X86_FEATURE_XENPV))
+			return false;
+	}
+
+	return __native_rdmsr_constant(msr, val, type);
+}
+#endif
+
+static __always_inline bool __rdmsr(const u32 msr, u64 *val, const int type)
+{
+#ifdef CONFIG_X86_64
+	if (__builtin_constant_p(msr))
+		return __rdmsr_constant_all(msr, val, type);
+#endif
+
+	return __rdmsr_variable_all(msr, val, type);
+}
+
+#define rdmsr(msr, low, high)				\
+do {							\
+	u64 __val = 0;					\
+	__rdmsr((msr), &__val, EX_TYPE_RDMSR);		\
+	(void)((low) = (u32)__val);			\
+	(void)((high) = (u32)(__val >> 32));		\
+} while (0)
+
+#define rdmsrl(msr, val)				\
+do {							\
+	u64 __val = 0;					\
+	__rdmsr((msr), &__val, EX_TYPE_RDMSR);		\
+	(val) = __val;					\
+} while (0)
+
+#define rdmsr_safe(msr, low, high)						\
+({										\
+	u64 __val = 0;								\
+	int __err = __rdmsr((msr), &__val, EX_TYPE_RDMSR_SAFE) ? -EIO : 0;	\
+	(*low) = (u32)__val;							\
+	(*high) = (u32)(__val >> 32);						\
+	__err;									\
+})
+
+static __always_inline int rdmsrl_safe(const u32 msr, u64 *val)
+{
+	return __rdmsr(msr, val, EX_TYPE_RDMSR_SAFE) ? -EIO : 0;
 }
 
 /*
@@ -503,40 +693,6 @@ static inline unsigned long long native_read_pmc(int counter)
  * Note: the rd* operations modify the parameters directly (without using
  * pointer indirection), this allows gcc to optimize better
  */
-
-#define rdmsr(msr, low, high)					\
-do {								\
-	u64 __val = native_read_msr((msr));			\
-	(void)((low) = (u32)__val);				\
-	(void)((high) = (u32)(__val >> 32));			\
-} while (0)
-
-#define rdmsrl(msr, val)			\
-	((val) = native_read_msr((msr)))
-
-static inline void wrmsrl(u32 msr, u64 val)
-{
-	native_write_msr(msr, val);
-}
-
-/* rdmsr with exception handling */
-#define rdmsr_safe(msr, low, high)				\
-({								\
-	int __err;						\
-	u64 __val = native_read_msr_safe((msr), &__err);	\
-	(*low) = (u32)__val;					\
-	(*high) = (u32)(__val >> 32);				\
-	__err;							\
-})
-
-static inline int rdmsrl_safe(unsigned int msr, unsigned long long *p)
-{
-	int err;
-
-	*p = native_read_msr_safe(msr, &err);
-	return err;
-}
-
 #define rdpmc(counter, low, high)			\
 do {							\
 	u64 _l = native_read_pmc((counter));		\
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 351feb890ab0..7ddb71ed9d0c 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -175,46 +175,6 @@ static inline void __write_cr4(unsigned long x)
 	PVOP_VCALL1(cpu.write_cr4, x);
 }
 
-static inline u64 paravirt_read_msr(unsigned msr)
-{
-	return PVOP_CALL1(u64, cpu.read_msr, msr);
-}
-
-static inline u64 paravirt_read_msr_safe(unsigned msr, int *err)
-{
-	return PVOP_CALL2(u64, cpu.read_msr_safe, msr, err);
-}
-
-#define rdmsr(msr, val1, val2)			\
-do {						\
-	u64 _l = paravirt_read_msr(msr);	\
-	val1 = (u32)_l;				\
-	val2 = _l >> 32;			\
-} while (0)
-
-#define rdmsrl(msr, val)			\
-do {						\
-	val = paravirt_read_msr(msr);		\
-} while (0)
-
-/* rdmsr with exception handling */
-#define rdmsr_safe(msr, a, b)				\
-({							\
-	int _err;					\
-	u64 _l = paravirt_read_msr_safe(msr, &_err);	\
-	(*a) = (u32)_l;					\
-	(*b) = _l >> 32;				\
-	_err;						\
-})
-
-static inline int rdmsrl_safe(unsigned msr, unsigned long long *p)
-{
-	int err;
-
-	*p = paravirt_read_msr_safe(msr, &err);
-	return err;
-}
-
 static inline unsigned long long paravirt_read_pmc(int counter)
 {
 	return PVOP_CALL1(u64, cpu.read_pmc, counter);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 8a563576d70e..5c57e6e4115f 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -90,15 +90,6 @@ struct pv_cpu_ops {
 	void (*cpuid)(unsigned int *eax, unsigned int *ebx,
 		      unsigned int *ecx, unsigned int *edx);
 
-	/* Unsafe MSR operations.  These will warn or panic on failure. */
-	u64 (*read_msr)(unsigned int msr);
-
-	/*
-	 * Safe MSR operations.
-	 * read sets err to 0 or -EIO.  write returns 0 or -EIO.
-	 */
-	u64 (*read_msr_safe)(unsigned int msr, int *err);
-
 	u64 (*read_pmc)(int counter);
 
 	void (*start_context_switch)(struct task_struct *prev);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index ffb04445f97e..e3d4f9869779 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -128,8 +128,6 @@ struct paravirt_patch_template pv_ops = {
 	.cpu.read_cr0		= native_read_cr0,
 	.cpu.write_cr0		= native_write_cr0,
 	.cpu.write_cr4		= native_write_cr4,
-	.cpu.read_msr		= native_read_msr,
-	.cpu.read_msr_safe	= native_read_msr_safe,
 	.cpu.read_pmc		= native_read_pmc,
 	.cpu.load_tr_desc	= native_load_tr_desc,
 	.cpu.set_ldt		= native_set_ldt,
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index d02f55bfa869..e49f16278487 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1087,19 +1087,26 @@ static void xen_write_cr4(unsigned long cr4)
 	native_write_cr4(cr4);
 }
 
-static u64 xen_do_read_msr(unsigned int msr, int *err)
+/*
+ * Set xen_rdmsr_ret_type.done to true to indicate that the requested MSR
+ * read has been done successfully.
+ */
+struct xen_rdmsr_ret_type xen_do_read_msr(u32 msr)
 {
-	u64 val = 0;	/* Avoid uninitialized value for safe variant. */
-	bool emulated;
+	struct xen_rdmsr_ret_type ret;
 
-	if (pmu_msr_chk_emulated(msr, &val, true, &emulated) && emulated)
-		return val;
+	ret.done = true;
 
-	if (err)
-		val = native_read_msr_safe(msr, err);
-	else
-		val = native_read_msr(msr);
+	if (pmu_msr_chk_emulated(msr, &ret.val, true, &ret.done) && ret.done)
+		return ret;
+
+	ret.val = 0;
+	ret.done = false;
+	return ret;
+}
 
+u64 xen_do_read_msr_fixup(u32 msr, u64 val)
+{
 	switch (msr) {
 	case MSR_IA32_APICBASE:
 		val &= ~X2APIC_ENABLE;
@@ -1108,7 +1115,11 @@ static u64 xen_do_read_msr(unsigned int msr, int *err)
 		else
 			val &= ~MSR_IA32_APICBASE_BSP;
 		break;
+
+	default:
+		break;
 	}
+
 	return val;
 }
 
@@ -1160,18 +1171,6 @@ bool xen_do_write_msr(u32 msr, u64 val)
 	}
 }
 
-static u64 xen_read_msr_safe(unsigned int msr, int *err)
-{
-	return xen_do_read_msr(msr, err);
-}
-
-static u64 xen_read_msr(unsigned int msr)
-{
-	int err;
-
-	return xen_do_read_msr(msr, xen_msr_safe ? &err : NULL);
-}
-
 /* This is called once we have the cpu_possible_mask */
 void __init xen_setup_vcpu_info_placement(void)
 {
@@ -1206,10 +1205,6 @@ static const typeof(pv_ops) xen_cpu_ops __initconst = {
 
 		.write_cr4 = xen_write_cr4,
 
-		.read_msr = xen_read_msr,
-
-		.read_msr_safe = xen_read_msr_safe,
-
 		.read_pmc = xen_read_pmc,
 
 		.load_tr_desc = paravirt_nop,
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index e672632b1cc0..6e7a9daa03d4 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -399,3 +399,37 @@ SYM_CODE_END(xen_entry_SYSCALL_compat)
 	RET
 SYM_FUNC_END(asm_xen_write_msr)
 EXPORT_SYMBOL_GPL(asm_xen_write_msr)
+
+/*
+ * The prototype of the Xen C code:
+ * 	struct { u64 val; bool done; } xen_do_read_msr(u32 msr)
+ */
+SYM_FUNC_START(asm_xen_read_msr)
+	ENDBR
+	FRAME_BEGIN
+	XEN_SAVE_CALLEE_REGS_FOR_MSR
+	mov %ecx, %edi		/* MSR number */
+	call xen_do_read_msr
+	test %dl, %dl		/* %dl=1, i.e., ZF=0, meaning successfully done */
+	XEN_RESTORE_CALLEE_REGS_FOR_MSR
+	jnz 2f
+
+1:	rdmsr
+	_ASM_EXTABLE_FUNC_REWIND(1b, -5, FRAME_OFFSET / (BITS_PER_LONG / 8))
+	shl $0x20, %rdx
+	or %rdx, %rax
+	/*
+	 * The top of the stack points directly at the return address;
+	 * back up by 5 bytes from the return address.
+	 */
+
+	XEN_SAVE_CALLEE_REGS_FOR_MSR
+	mov %ecx, %edi
+	mov %rax, %rsi
+	call xen_do_read_msr_fixup
+	XEN_RESTORE_CALLEE_REGS_FOR_MSR
+
+2:	FRAME_END
+	RET
+SYM_FUNC_END(asm_xen_read_msr)
+EXPORT_SYMBOL_GPL(asm_xen_read_msr)
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index fc3c55871037..46efeaa4bbd3 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -146,7 +146,14 @@ __visible unsigned long xen_read_cr2_direct(void);
 /* These are not functions, and cannot be called normally */
 __visible void xen_iret(void);
 
+struct xen_rdmsr_ret_type {
+	u64 val;
+	bool done;
+};
+
 extern bool xen_do_write_msr(u32 msr, u64 val);
+extern struct xen_rdmsr_ret_type xen_do_read_msr(u32 msr);
+extern u64 xen_do_read_msr_fixup(u32 msr, u64 val);
 
 extern int xen_panic_handler_init(void);
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [RFC PATCH v1 14/15] x86/extable: Add support for the immediate form MSR instructions
  2025-03-31  8:22 [RFC PATCH v1 00/15] MSR refactor with new MSR instructions support Xin Li (Intel)
                   ` (12 preceding siblings ...)
  2025-03-31  8:22 ` [RFC PATCH v1 13/15] x86/msr: Use the alternatives mechanism to read MSR Xin Li (Intel)
@ 2025-03-31  8:22 ` Xin Li (Intel)
  2025-03-31  8:22 ` [RFC PATCH v1 15/15] x86/msr: Move the ARGS macros after the MSR read/write APIs Xin Li (Intel)
  14 siblings, 0 replies; 55+ messages in thread
From: Xin Li (Intel) @ 2025-03-31  8:22 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3,
	peterz, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	alexey.amakhalov, bcm-kernel-feedback-list, tony.luck, pbonzini,
	vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz, decui

Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
 arch/x86/mm/extable.c | 59 ++++++++++++++++++++++++++++++-------------
 1 file changed, 41 insertions(+), 18 deletions(-)

diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index eb9331240a88..56138c0762b7 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -164,31 +164,54 @@ static bool ex_handler_uaccess(const struct exception_table_entry *fixup,
 	return ex_handler_default(fixup, regs);
 }
 
+#ifdef CONFIG_X86_64
+static const u8 msr_imm_insn_prefix[] = { 0xc4, 0xe7 };
+#endif
+
 static bool ex_handler_msr(const struct exception_table_entry *fixup,
-			   struct pt_regs *regs, bool wrmsr, bool safe, int reg)
+			   struct pt_regs *regs, bool wrmsr, bool safe)
 {
+	/*
+	 * To ensure consistency with the existing RDMSR and WRMSR(NS), the register
+	 * operand of the immediate form MSR access instructions is ALWAYS encoded as
+	 * RAX in <asm/msr.h> for the MSR value to be written or read.
+	 *
+	 * Full decoder for the immediate form MSR access instructions looks overkill.
+	 */
+	bool is_imm_insn;
+	u32 msr;
+	u64 msr_val;
+
+#ifdef CONFIG_X86_64
+	is_imm_insn = !memcmp((void *)regs->ip, msr_imm_insn_prefix, sizeof(msr_imm_insn_prefix));
+#else
+	is_imm_insn = false;
+#endif
+
+	if (is_imm_insn) {
+		u8 *insn = (u8 *)regs->ip;
+
+		msr = insn[5] | (insn[6] << 8) | (insn[7] << 16) | (insn[8] << 24);
+	} else {
+		msr = (u32)regs->cx;
+	}
+
 	if (__ONCE_LITE_IF(!safe && wrmsr)) {
-		pr_warn("unchecked MSR access error: WRMSR to 0x%x (tried to write 0x%08x%08x) at rIP: 0x%lx (%pS)\n",
-			(unsigned int)regs->cx, (unsigned int)regs->dx,
-			(unsigned int)regs->ax,  regs->ip, (void *)regs->ip);
+		msr_val = regs->ax;
+		if (!is_imm_insn)
+			msr_val |= (u64)regs->dx << 32;
+
+		pr_warn("unchecked MSR access error: WRMSR to 0x%x (tried to write 0x%016llx) at rIP: 0x%lx (%pS)\n",
+			msr, msr_val, regs->ip, (void *)regs->ip);
 		show_stack_regs(regs);
 	}
 
 	if (__ONCE_LITE_IF(!safe && !wrmsr)) {
 		pr_warn("unchecked MSR access error: RDMSR from 0x%x at rIP: 0x%lx (%pS)\n",
-			(unsigned int)regs->cx, regs->ip, (void *)regs->ip);
+			msr, regs->ip, (void *)regs->ip);
 		show_stack_regs(regs);
 	}
 
-	if (!wrmsr) {
-		/* Pretend that the read succeeded and returned 0. */
-		regs->ax = 0;
-		regs->dx = 0;
-	}
-
-	if (safe)
-		*pt_regs_nr(regs, reg) = -EIO;
-
 	return ex_handler_default(fixup, regs);
 }
 
@@ -366,13 +389,13 @@ int fixup_exception(struct pt_regs *regs, int trapnr, unsigned long error_code,
 		case EX_TYPE_BPF:
 			return ex_handler_bpf(e, regs);
 		case EX_TYPE_WRMSR:
-			return ex_handler_msr(e, regs, true, false, reg);
+			return ex_handler_msr(e, regs, true, false);
 		case EX_TYPE_RDMSR:
-			return ex_handler_msr(e, regs, false, false, reg);
+			return ex_handler_msr(e, regs, false, false);
 		case EX_TYPE_WRMSR_SAFE:
-			return ex_handler_msr(e, regs, true, true, reg);
+			return ex_handler_msr(e, regs, true, true);
 		case EX_TYPE_RDMSR_SAFE:
-			return ex_handler_msr(e, regs, false, true, reg);
+			return ex_handler_msr(e, regs, false, true);
 		case EX_TYPE_WRMSR_IN_MCE:
 			ex_handler_msr_mce(regs, true);
 			break;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [RFC PATCH v1 15/15] x86/msr: Move the ARGS macros after the MSR read/write APIs
  2025-03-31  8:22 [RFC PATCH v1 00/15] MSR refactor with new MSR instructions support Xin Li (Intel)
                   ` (13 preceding siblings ...)
  2025-03-31  8:22 ` [RFC PATCH v1 14/15] x86/extable: Add support for the immediate form MSR instructions Xin Li (Intel)
@ 2025-03-31  8:22 ` Xin Li (Intel)
  14 siblings, 0 replies; 55+ messages in thread
From: Xin Li (Intel) @ 2025-03-31  8:22 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3,
	peterz, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	alexey.amakhalov, bcm-kernel-feedback-list, tony.luck, pbonzini,
	vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz, decui

Since the ARGS macros are no longer used in the MSR read/write API
implementation, move them after the MSR read/write API definitions.

Signed-off-by: Xin Li (Intel) <xin@zytor.com>
---
 arch/x86/include/asm/msr.h | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index fc93c2601853..9b109d1d92aa 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -37,23 +37,6 @@ struct saved_msrs {
 	struct saved_msr *array;
 };
 
-/*
- * both i386 and x86_64 returns 64-bit value in edx:eax, but gcc's "A"
- * constraint has different meanings. For i386, "A" means exactly
- * edx:eax, while for x86_64 it doesn't mean rdx:rax or edx:eax. Instead,
- * it means rax *or* rdx.
- */
-#ifdef CONFIG_X86_64
-/* Using 64-bit values saves one instruction clearing the high half of low */
-#define DECLARE_ARGS(val, low, high)	unsigned long low, high
-#define EAX_EDX_VAL(val, low, high)	((low) | (high) << 32)
-#define EAX_EDX_RET(val, low, high)	"=a" (low), "=d" (high)
-#else
-#define DECLARE_ARGS(val, low, high)	unsigned long long val
-#define EAX_EDX_VAL(val, low, high)	(val)
-#define EAX_EDX_RET(val, low, high)	"=A" (val)
-#endif
-
 /*
  * Be very careful with includes. This header is prone to include loops.
  */
@@ -620,6 +603,23 @@ static __always_inline int wrmsrl_safe(const u32 msr, const u64 val)
 extern int rdmsr_safe_regs(u32 regs[8]);
 extern int wrmsr_safe_regs(u32 regs[8]);
 
+/*
+ * both i386 and x86_64 returns 64-bit value in edx:eax, but gcc's "A"
+ * constraint has different meanings. For i386, "A" means exactly
+ * edx:eax, while for x86_64 it doesn't mean rdx:rax or edx:eax. Instead,
+ * it means rax *or* rdx.
+ */
+#ifdef CONFIG_X86_64
+/* Using 64-bit values saves one instruction clearing the high half of low */
+#define DECLARE_ARGS(val, low, high)	unsigned long low, high
+#define EAX_EDX_VAL(val, low, high)	((low) | (high) << 32)
+#define EAX_EDX_RET(val, low, high)	"=a" (low), "=d" (high)
+#else
+#define DECLARE_ARGS(val, low, high)	unsigned long long val
+#define EAX_EDX_VAL(val, low, high)	(val)
+#define EAX_EDX_RET(val, low, high)	"=A" (val)
+#endif
+
 /**
  * rdtsc() - returns the current TSC without ordering constraints
  *
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-03-31  8:22 ` [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl() Xin Li (Intel)
@ 2025-03-31 10:17   ` Ingo Molnar
  2025-03-31 20:32     ` H. Peter Anvin
  2025-03-31 21:45   ` [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl() Andrew Cooper
  1 sibling, 1 reply; 55+ messages in thread
From: Ingo Molnar @ 2025-03-31 10:17 UTC (permalink / raw)
  To: Xin Li (Intel)
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3, peterz,
	acme, namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher, alexey.amakhalov,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
	luto, boris.ostrovsky, kys, haiyangz, decui


* Xin Li (Intel) <xin@zytor.com> wrote:

> -	__wrmsr      (MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3, val >> 32);
> +	native_wrmsrl(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3);

This is an improvement.

> -	__wrmsr      (MSR_IA32_PQR_ASSOC, rmid_p, plr->closid);
> +	native_wrmsrl(MSR_IA32_PQR_ASSOC, (u64)plr->closid << 32 | rmid_p);

> -	__wrmsr      (MSR_IA32_PQR_ASSOC, rmid_p, closid_p);
> +	native_wrmsrl(MSR_IA32_PQR_ASSOC, (u64)closid_p << 32 | rmid_p);

This is not an improvement.

Please provide a native_wrmsrl() API variant where natural [rmid_p, closid_p]
high/lo parameters can be used, without the shift-uglification...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v1 02/15] x86/msr: Replace __rdmsr() with native_rdmsrl()
  2025-03-31  8:22 ` [RFC PATCH v1 02/15] x86/msr: Replace __rdmsr() with native_rdmsrl() Xin Li (Intel)
@ 2025-03-31 10:26   ` Ingo Molnar
  0 siblings, 0 replies; 55+ messages in thread
From: Ingo Molnar @ 2025-03-31 10:26 UTC (permalink / raw)
  To: Xin Li (Intel)
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3, peterz,
	acme, namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher, alexey.amakhalov,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
	luto, boris.ostrovsky, kys, haiyangz, decui


* Xin Li (Intel) <xin@zytor.com> wrote:

> __rdmsr() is the lowest level primitive MSR read API, and its direct 
> use is NOT preferred.  Use its wrapper function native_rdmsrl() 
> instead.

This description is very misleading. As of today, native_rdmsrl() 
doesn't exist in-tree, so it cannot be 'preferred' in any fashion.

We have native_read_msr(), a confusingly similar function name, and 
this changelog doesn't make it clear, at all, why the extra 
native_rdmsrl() indirection is introduced.

Please split this into two changes and explain them properly:

 - x86/msr: Add the native_rdmsrl() helper
 - x86/msr: Convert __rdmsr() uses to native_rdmsrl() uses

For the first patch you should explain why you want an extra layer of 
indirection within these APIs and how it relates to native_read_msr() 
and why there is a _read_msr() and a _rdmsr() variant...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available
  2025-03-31  8:22 ` [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available Xin Li (Intel)
@ 2025-03-31 20:27   ` Konrad Rzeszutek Wilk
  2025-03-31 20:38     ` Borislav Petkov
                       ` (2 more replies)
  2025-04-10 23:24   ` Sean Christopherson
  1 sibling, 3 replies; 55+ messages in thread
From: Konrad Rzeszutek Wilk @ 2025-03-31 20:27 UTC (permalink / raw)
  To: Xin Li (Intel)
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3, peterz,
	acme, namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher, alexey.amakhalov,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
	luto, boris.ostrovsky, kys, haiyangz, decui

On Mon, Mar 31, 2025 at 01:22:46AM -0700, Xin Li (Intel) wrote:
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
> ---
>  arch/x86/include/asm/msr-index.h |  6 ++++++
>  arch/x86/kvm/vmx/vmenter.S       | 28 ++++++++++++++++++++++++----
>  2 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index e6134ef2263d..04244c3ba374 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1226,4 +1226,10 @@
>  						* a #GP
>  						*/
>  
> +/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
> +#define ASM_WRMSRNS		_ASM_BYTES(0x0f,0x01,0xc6)
> +
> +/* Instruction opcode for the immediate form RDMSR/WRMSRNS */
> +#define ASM_WRMSRNS_RAX		_ASM_BYTES(0xc4,0xe7,0x7a,0xf6,0xc0)
> +
>  #endif /* _ASM_X86_MSR_INDEX_H */
> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> index f6986dee6f8c..9fae43723c44 100644
> --- a/arch/x86/kvm/vmx/vmenter.S
> +++ b/arch/x86/kvm/vmx/vmenter.S
> @@ -64,6 +64,29 @@
>  	RET
>  .endm
>  
> +/*
> + * Write EAX to MSR_IA32_SPEC_CTRL.
> + *
> + * Choose the best WRMSR instruction based on availability.
> + *
> + * Replace with 'wrmsrns' and 'wrmsrns %rax, $MSR_IA32_SPEC_CTRL' once binutils support them.
> + */
> +.macro WRITE_EAX_TO_MSR_IA32_SPEC_CTRL
> +	ALTERNATIVE_2 __stringify(mov $MSR_IA32_SPEC_CTRL, %ecx;		\
> +				  xor %edx, %edx;				\
> +				  mov %edi, %eax;				\
> +				  ds wrmsr),					\
> +		      __stringify(mov $MSR_IA32_SPEC_CTRL, %ecx;		\
> +				  xor %edx, %edx;				\
> +				  mov %edi, %eax;				\
> +				  ASM_WRMSRNS),					\
> +		      X86_FEATURE_WRMSRNS,					\
> +		      __stringify(xor %_ASM_AX, %_ASM_AX;			\
> +				  mov %edi, %eax;				\
> +				  ASM_WRMSRNS_RAX; .long MSR_IA32_SPEC_CTRL),	\
> +		      X86_FEATURE_MSR_IMM
> +.endm
> +
>  .section .noinstr.text, "ax"
>  
>  /**
> @@ -123,10 +146,7 @@ SYM_FUNC_START(__vmx_vcpu_run)
>  	movl PER_CPU_VAR(x86_spec_ctrl_current), %esi
>  	cmp %edi, %esi
>  	je .Lspec_ctrl_done
> -	mov $MSR_IA32_SPEC_CTRL, %ecx
> -	xor %edx, %edx
> -	mov %edi, %eax
> -	wrmsr

Is that the right path forward?

That is replace the MSR write to disable speculative execution with a
non-serialized WRMSR? Doesn't that mean the WRMSRNS is speculative?


> +	WRITE_EAX_TO_MSR_IA32_SPEC_CTRL
>  
>  .Lspec_ctrl_done:
>  
> -- 
> 2.49.0
> 
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-03-31 10:17   ` Ingo Molnar
@ 2025-03-31 20:32     ` H. Peter Anvin
  2025-04-01  5:53       ` Xin Li
  2025-04-01  7:52       ` Ingo Molnar
  0 siblings, 2 replies; 55+ messages in thread
From: H. Peter Anvin @ 2025-03-31 20:32 UTC (permalink / raw)
  To: Ingo Molnar, Xin Li (Intel)
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, jgross, andrew.cooper3, peterz, acme,
	namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher, alexey.amakhalov,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
	luto, boris.ostrovsky, kys, haiyangz, decui

On March 31, 2025 3:17:30 AM PDT, Ingo Molnar <mingo@kernel.org> wrote:
>
>* Xin Li (Intel) <xin@zytor.com> wrote:
>
>> -	__wrmsr      (MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3, val >> 32);
>> +	native_wrmsrl(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3);
>
>This is an improvement.
>
>> -	__wrmsr      (MSR_IA32_PQR_ASSOC, rmid_p, plr->closid);
>> +	native_wrmsrl(MSR_IA32_PQR_ASSOC, (u64)plr->closid << 32 | rmid_p);
>
>> -	__wrmsr      (MSR_IA32_PQR_ASSOC, rmid_p, closid_p);
>> +	native_wrmsrl(MSR_IA32_PQR_ASSOC, (u64)closid_p << 32 | rmid_p);
>
>This is not an improvement.
>
>Please provide a native_wrmsrl() API variant where natural [rmid_p, closid_p]
>high/lo parameters can be used, without the shift-uglification...
>
>Thanks,
>
>	Ingo

Directing this question primarily to Ingo, who is more than anyone else the namespace consistency guardian:

On the subject of msr function naming ... *msrl() has always been misleading. The -l suffix usually means 32 bits; sometimes it means the C type "long" (which in the kernel is used instead of size_t/uintptr_t, which might end up being "fun" when 128-bit architectures appear some time this century), but for a fixed 64-bit type we normally use -q.

Should we rename the *msrl() functions to *msrq() as part of this overhaul?


* Re: [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available
  2025-03-31 20:27   ` Konrad Rzeszutek Wilk
@ 2025-03-31 20:38     ` Borislav Petkov
  2025-03-31 20:41     ` Andrew Cooper
  2025-03-31 20:45     ` H. Peter Anvin
  2 siblings, 0 replies; 55+ messages in thread
From: Borislav Petkov @ 2025-03-31 20:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Xin Li (Intel), linux-kernel, linux-perf-users, linux-hyperv,
	virtualization, linux-edac, kvm, xen-devel, linux-ide, linux-pm,
	bpf, llvm, tglx, mingo, dave.hansen, x86, hpa, jgross,
	andrew.cooper3, peterz, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
	wei.liu, ajay.kaher, alexey.amakhalov, bcm-kernel-feedback-list,
	tony.luck, pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys,
	haiyangz, decui

On Mon, Mar 31, 2025 at 04:27:23PM -0400, Konrad Rzeszutek Wilk wrote:
> Is that the right path forward?
> 
> That is replace the MSR write to disable speculative execution with a
> non-serialized WRMSR? Doesn't that mean the WRMSRNS is speculative?

Ha, interesting question.

If the WRMSR is non-serializing, when do speculative things like indirect
branches and the like get *actually* cleared and can such a speculation window
be used to leak branch data even if IBRS is actually enabled for example...

Fun.

This change needs to be run by hw folks and I guess until then WRMSRNS should
not get anywhere near mitigation MSRs like SPEC_CTRL or PRED_CMD...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available
  2025-03-31 20:27   ` Konrad Rzeszutek Wilk
  2025-03-31 20:38     ` Borislav Petkov
@ 2025-03-31 20:41     ` Andrew Cooper
  2025-03-31 20:55       ` H. Peter Anvin
  2025-03-31 20:45     ` H. Peter Anvin
  2 siblings, 1 reply; 55+ messages in thread
From: Andrew Cooper @ 2025-03-31 20:41 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Xin Li (Intel)
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, hpa, jgross, peterz, acme, namhyung,
	mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter,
	kan.liang, wei.liu, ajay.kaher, alexey.amakhalov,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
	luto, boris.ostrovsky, kys, haiyangz, decui

On 31/03/2025 9:27 pm, Konrad Rzeszutek Wilk wrote:
> On Mon, Mar 31, 2025 at 01:22:46AM -0700, Xin Li (Intel) wrote:
>> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
>> ---
>>  arch/x86/include/asm/msr-index.h |  6 ++++++
>>  arch/x86/kvm/vmx/vmenter.S       | 28 ++++++++++++++++++++++++----
>>  2 files changed, 30 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index e6134ef2263d..04244c3ba374 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -1226,4 +1226,10 @@
>>  						* a #GP
>>  						*/
>>  
>> +/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
>> +#define ASM_WRMSRNS		_ASM_BYTES(0x0f,0x01,0xc6)
>> +
>> +/* Instruction opcode for the immediate form RDMSR/WRMSRNS */
>> +#define ASM_WRMSRNS_RAX		_ASM_BYTES(0xc4,0xe7,0x7a,0xf6,0xc0)
>> +
>>  #endif /* _ASM_X86_MSR_INDEX_H */
>> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
>> index f6986dee6f8c..9fae43723c44 100644
>> --- a/arch/x86/kvm/vmx/vmenter.S
>> +++ b/arch/x86/kvm/vmx/vmenter.S
>> @@ -64,6 +64,29 @@
>>  	RET
>>  .endm
>>  
>> +/*
>> + * Write EAX to MSR_IA32_SPEC_CTRL.
>> + *
>> + * Choose the best WRMSR instruction based on availability.
>> + *
>> + * Replace with 'wrmsrns' and 'wrmsrns %rax, $MSR_IA32_SPEC_CTRL' once binutils support them.
>> + */
>> +.macro WRITE_EAX_TO_MSR_IA32_SPEC_CTRL
>> +	ALTERNATIVE_2 __stringify(mov $MSR_IA32_SPEC_CTRL, %ecx;		\
>> +				  xor %edx, %edx;				\
>> +				  mov %edi, %eax;				\
>> +				  ds wrmsr),					\
>> +		      __stringify(mov $MSR_IA32_SPEC_CTRL, %ecx;		\
>> +				  xor %edx, %edx;				\
>> +				  mov %edi, %eax;				\
>> +				  ASM_WRMSRNS),					\
>> +		      X86_FEATURE_WRMSRNS,					\
>> +		      __stringify(xor %_ASM_AX, %_ASM_AX;			\
>> +				  mov %edi, %eax;				\
>> +				  ASM_WRMSRNS_RAX; .long MSR_IA32_SPEC_CTRL),	\
>> +		      X86_FEATURE_MSR_IMM
>> +.endm
>> +
>>  .section .noinstr.text, "ax"
>>  
>>  /**
>> @@ -123,10 +146,7 @@ SYM_FUNC_START(__vmx_vcpu_run)
>>  	movl PER_CPU_VAR(x86_spec_ctrl_current), %esi
>>  	cmp %edi, %esi
>>  	je .Lspec_ctrl_done
>> -	mov $MSR_IA32_SPEC_CTRL, %ecx
>> -	xor %edx, %edx
>> -	mov %edi, %eax
>> -	wrmsr
> Is that the right path forward?
>
> That is replace the MSR write to disable speculative execution with a
> non-serialized WRMSR? Doesn't that mean the WRMSRNS is speculative?

MSR_SPEC_CTRL is explicitly non-serialising, even with a plain WRMSR.

non-serialising != non-speculative.

Although WRMSRNS's precise statement on the matter of
non-speculativeness is woolly, given an intent to optimise it some more
in the future.

~Andrew


* Re: [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available
  2025-03-31 20:27   ` Konrad Rzeszutek Wilk
  2025-03-31 20:38     ` Borislav Petkov
  2025-03-31 20:41     ` Andrew Cooper
@ 2025-03-31 20:45     ` H. Peter Anvin
  2 siblings, 0 replies; 55+ messages in thread
From: H. Peter Anvin @ 2025-03-31 20:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Xin Li (Intel)
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, jgross, andrew.cooper3, peterz, acme,
	namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher, alexey.amakhalov,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
	luto, boris.ostrovsky, kys, haiyangz, decui

On March 31, 2025 1:27:23 PM PDT, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>On Mon, Mar 31, 2025 at 01:22:46AM -0700, Xin Li (Intel) wrote:
>> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
>> ---
>>  arch/x86/include/asm/msr-index.h |  6 ++++++
>>  arch/x86/kvm/vmx/vmenter.S       | 28 ++++++++++++++++++++++++----
>>  2 files changed, 30 insertions(+), 4 deletions(-)
>> 
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index e6134ef2263d..04244c3ba374 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -1226,4 +1226,10 @@
>>  						* a #GP
>>  						*/
>>  
>> +/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
>> +#define ASM_WRMSRNS		_ASM_BYTES(0x0f,0x01,0xc6)
>> +
>> +/* Instruction opcode for the immediate form RDMSR/WRMSRNS */
>> +#define ASM_WRMSRNS_RAX		_ASM_BYTES(0xc4,0xe7,0x7a,0xf6,0xc0)
>> +
>>  #endif /* _ASM_X86_MSR_INDEX_H */
>> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
>> index f6986dee6f8c..9fae43723c44 100644
>> --- a/arch/x86/kvm/vmx/vmenter.S
>> +++ b/arch/x86/kvm/vmx/vmenter.S
>> @@ -64,6 +64,29 @@
>>  	RET
>>  .endm
>>  
>> +/*
>> + * Write EAX to MSR_IA32_SPEC_CTRL.
>> + *
>> + * Choose the best WRMSR instruction based on availability.
>> + *
>> + * Replace with 'wrmsrns' and 'wrmsrns %rax, $MSR_IA32_SPEC_CTRL' once binutils support them.
>> + */
>> +.macro WRITE_EAX_TO_MSR_IA32_SPEC_CTRL
>> +	ALTERNATIVE_2 __stringify(mov $MSR_IA32_SPEC_CTRL, %ecx;		\
>> +				  xor %edx, %edx;				\
>> +				  mov %edi, %eax;				\
>> +				  ds wrmsr),					\
>> +		      __stringify(mov $MSR_IA32_SPEC_CTRL, %ecx;		\
>> +				  xor %edx, %edx;				\
>> +				  mov %edi, %eax;				\
>> +				  ASM_WRMSRNS),					\
>> +		      X86_FEATURE_WRMSRNS,					\
>> +		      __stringify(xor %_ASM_AX, %_ASM_AX;			\
>> +				  mov %edi, %eax;				\
>> +				  ASM_WRMSRNS_RAX; .long MSR_IA32_SPEC_CTRL),	\
>> +		      X86_FEATURE_MSR_IMM
>> +.endm
>> +
>>  .section .noinstr.text, "ax"
>>  
>>  /**
>> @@ -123,10 +146,7 @@ SYM_FUNC_START(__vmx_vcpu_run)
>>  	movl PER_CPU_VAR(x86_spec_ctrl_current), %esi
>>  	cmp %edi, %esi
>>  	je .Lspec_ctrl_done
>> -	mov $MSR_IA32_SPEC_CTRL, %ecx
>> -	xor %edx, %edx
>> -	mov %edi, %eax
>> -	wrmsr
>
>Is that the right path forward?
>
>That is replace the MSR write to disable speculative execution with a
>non-serialized WRMSR? Doesn't that mean the WRMSRNS is speculative?
>
>
>> +	WRITE_EAX_TO_MSR_IA32_SPEC_CTRL
>>  
>>  .Lspec_ctrl_done:
>>  
>> -- 
>> 2.49.0
>> 
>> 

So to clarify the semantics of WRMSRNS: it is an opt-in capability for the OS to allow the hardware to choose the amount of serialization needed on an MSR- or implementation-specific basis.

It also allows the OS to set multiple MSRs followed by a SERIALIZE instruction if full hard serialization is still desired, rather than having each individual MSR write do a full hard serialization (killing the full pipe and starting over from instruction fetch.)

This will replace the – architecturally questionable, in my opinion – practice of introducing non-serializing MSRs, which are after all retroactive changes to the semantics of the WRMSR instruction with no opt-out (although the existence of SERIALIZE improves the situation somewhat.)

I agree that we need better documentation as to the semantics of WRMSRNS on critical MSRs like SPEC_CTRL, and especially in that specific case, when post-batch SERIALIZE would be called for.


* Re: [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available
  2025-03-31 20:41     ` Andrew Cooper
@ 2025-03-31 20:55       ` H. Peter Anvin
  0 siblings, 0 replies; 55+ messages in thread
From: H. Peter Anvin @ 2025-03-31 20:55 UTC (permalink / raw)
  To: Andrew Cooper, Konrad Rzeszutek Wilk, Xin Li (Intel)
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, jgross, peterz, acme, namhyung,
	mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter,
	kan.liang, wei.liu, ajay.kaher, alexey.amakhalov,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
	luto, boris.ostrovsky, kys, haiyangz, decui

On 3/31/25 13:41, Andrew Cooper wrote:
>>
>> That is replace the MSR write to disable speculative execution with a
>> non-serialized WRMSR? Doesn't that mean the WRMSRNS is speculative?
> 
> MSR_SPEC_CTRL is explicitly non-serialising, even with a plain WRMSR.
> 
> non-serialising != non-speculative.
> 
> Although WRMSRNS's precise statement on the matter of
> non-speculativeness is woolly, given an intent to optimise it some more
> in the future.
> 

To be specific, "serializing" is a much harder statement than 
"non-speculative."

For architecturally non-serializing MSRs, WRMSRNS and WRMSR are 
equivalent (or to put it differently, WRMSR acts like WRMSRNS.)

The advantage with making them explicitly WRMSRNS is that it allows for 
the substitution of the upcoming immediate forms.

	-hpa



* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-03-31  8:22 ` [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl() Xin Li (Intel)
  2025-03-31 10:17   ` Ingo Molnar
@ 2025-03-31 21:45   ` Andrew Cooper
  2025-04-01  5:13     ` H. Peter Anvin
  2025-04-03  7:13     ` Xin Li
  1 sibling, 2 replies; 55+ messages in thread
From: Andrew Cooper @ 2025-03-31 21:45 UTC (permalink / raw)
  To: Xin Li (Intel), linux-kernel, linux-perf-users, linux-hyperv,
	virtualization, linux-edac, kvm, xen-devel, linux-ide, linux-pm,
	bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, jgross, peterz, acme,
	namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher, alexey.amakhalov,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
	luto, boris.ostrovsky, kys, haiyangz, decui

On 31/03/2025 9:22 am, Xin Li (Intel) wrote:
> __wrmsr() is the lowest level primitive MSR write API, and its direct
> use is NOT preferred.  Use its wrapper function native_wrmsrl() instead.
>
> No functional change intended.
>
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>

The critical piece of information you're missing from the commit message
is that the MSR_IMM instructions take a single u64.

Therefore to use them, you've got to arrange for all callers to provide
a single u64, rather than a split u32 pair.

~Andrew


* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-03-31 21:45   ` [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl() Andrew Cooper
@ 2025-04-01  5:13     ` H. Peter Anvin
  2025-04-01  5:29       ` Xin Li
  2025-04-03  7:13     ` Xin Li
  1 sibling, 1 reply; 55+ messages in thread
From: H. Peter Anvin @ 2025-04-01  5:13 UTC (permalink / raw)
  To: Andrew Cooper, Xin Li (Intel), linux-kernel, linux-perf-users,
	linux-hyperv, virtualization, linux-edac, kvm, xen-devel,
	linux-ide, linux-pm, bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, jgross, peterz, acme, namhyung,
	mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter,
	kan.liang, wei.liu, ajay.kaher, alexey.amakhalov,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
	luto, boris.ostrovsky, kys, haiyangz, decui

On March 31, 2025 2:45:43 PM PDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>On 31/03/2025 9:22 am, Xin Li (Intel) wrote:
>> __wrmsr() is the lowest level primitive MSR write API, and its direct
>> use is NOT preferred.  Use its wrapper function native_wrmsrl() instead.
>>
>> No functional change intended.
>>
>> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
>
>The critical piece of information you're missing from the commit message
>is that the MSR_IMM instructions take a single u64.
>
>Therefore to use them, you've got to arrange for all callers to provide
>a single u64, rather than a split u32 pair.
>
>~Andrew

That being said, there is nothing wrong with having a two-word convenience wrapper.


* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-04-01  5:13     ` H. Peter Anvin
@ 2025-04-01  5:29       ` Xin Li
  0 siblings, 0 replies; 55+ messages in thread
From: Xin Li @ 2025-04-01  5:29 UTC (permalink / raw)
  To: H. Peter Anvin, Andrew Cooper, linux-kernel, linux-perf-users,
	linux-hyperv, virtualization, linux-edac, kvm, xen-devel,
	linux-ide, linux-pm, bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, jgross, peterz, acme, namhyung,
	mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter,
	kan.liang, wei.liu, ajay.kaher, alexey.amakhalov,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
	luto, boris.ostrovsky, kys, haiyangz, decui

On 3/31/2025 10:13 PM, H. Peter Anvin wrote:
> On March 31, 2025 2:45:43 PM PDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 31/03/2025 9:22 am, Xin Li (Intel) wrote:
>>> __wrmsr() is the lowest level primitive MSR write API, and its direct
>>> use is NOT preferred.  Use its wrapper function native_wrmsrl() instead.
>>>
>>> No functional change intended.
>>>
>>> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
>>
>> The critical piece of information you're missing from the commit message
>> is that the MSR_IMM instructions take a single u64.
>>
>> Therefore to use them, you've got to arrange for all callers to provide
>> a single u64, rather than a split u32 pair.
>>
>> ~Andrew
> 
> That being said, there is nothing wrong with having a two-word convenience wrapper.
> 

Yes, I ended up keeping the two-word convenience wrapper in this patch
set, and the wrapper calls a lower level API that takes a u64 argument.

And yes, as Ingo said, some of the conversions are NOT an improvement.


* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-03-31 20:32     ` H. Peter Anvin
@ 2025-04-01  5:53       ` Xin Li
  2025-04-02 15:41         ` Dave Hansen
  2025-04-01  7:52       ` Ingo Molnar
  1 sibling, 1 reply; 55+ messages in thread
From: Xin Li @ 2025-04-01  5:53 UTC (permalink / raw)
  To: H. Peter Anvin, Ingo Molnar
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, jgross, andrew.cooper3, peterz, acme,
	namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher, alexey.amakhalov,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
	luto, boris.ostrovsky, kys, haiyangz, decui

On 3/31/2025 1:32 PM, H. Peter Anvin wrote:
> On March 31, 2025 3:17:30 AM PDT, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> * Xin Li (Intel) <xin@zytor.com> wrote:
>>
>>> -	__wrmsr      (MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3, val >> 32);
>>> +	native_wrmsrl(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3);
>>
>> This is an improvement.
>>
>>> -	__wrmsr      (MSR_IA32_PQR_ASSOC, rmid_p, plr->closid);
>>> +	native_wrmsrl(MSR_IA32_PQR_ASSOC, (u64)plr->closid << 32 | rmid_p);
>>
>>> -	__wrmsr      (MSR_IA32_PQR_ASSOC, rmid_p, closid_p);
>>> +	native_wrmsrl(MSR_IA32_PQR_ASSOC, (u64)closid_p << 32 | rmid_p);
>>
>> This is not an improvement.
>>
>> Please provide a native_wrmsrl() API variant where natural [rmid_p, closid_p]
>> high/lo parameters can be used, without the shift-uglification...
>>
>> Thanks,
>>
>> 	Ingo
> 
> Directing this question primarily to Ingo, who is more than anyone else the namespace consistency guardian:
> 
> On the subject of msr function naming ... *msrl() has always been misleading. The -l suffix usually means 32 bits; sometimes it means the C type "long" (which in the kernel is used instead of size_t/uintptr_t, which might end up being "fun" when 128-bit architectures appear some time this century), but for a fixed 64-bit type we normally use -q.
> 
> Should we rename the *msrl() functions to *msrq() as part of this overhaul?
> 

Per "struct msr" defined in arch/x86/include/asm/shared/msr.h:

struct msr {
         union {
                 struct {
                         u32 l;
                         u32 h;
                 };
                 u64 q;
         };
};

Probably *msrq() is what we want?




* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-03-31 20:32     ` H. Peter Anvin
  2025-04-01  5:53       ` Xin Li
@ 2025-04-01  7:52       ` Ingo Molnar
  2025-04-02  3:45         ` Xin Li
  2025-04-03  5:09         ` Xin Li
  1 sibling, 2 replies; 55+ messages in thread
From: Ingo Molnar @ 2025-04-01  7:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Xin Li (Intel), linux-kernel, linux-perf-users, linux-hyperv,
	virtualization, linux-edac, kvm, xen-devel, linux-ide, linux-pm,
	bpf, llvm, tglx, mingo, bp, dave.hansen, x86, jgross,
	andrew.cooper3, peterz, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
	wei.liu, ajay.kaher, alexey.amakhalov, bcm-kernel-feedback-list,
	tony.luck, pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys,
	haiyangz, decui, Linus Torvalds


* H. Peter Anvin <hpa@zytor.com> wrote:

> On March 31, 2025 3:17:30 AM PDT, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >* Xin Li (Intel) <xin@zytor.com> wrote:
> >
> >> -	__wrmsr      (MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3, val >> 32);
> >> +	native_wrmsrl(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3);
> >
> >This is an improvement.
> >
> >> -	__wrmsr      (MSR_IA32_PQR_ASSOC, rmid_p, plr->closid);
> >> +	native_wrmsrl(MSR_IA32_PQR_ASSOC, (u64)plr->closid << 32 | rmid_p);
> >
> >> -	__wrmsr      (MSR_IA32_PQR_ASSOC, rmid_p, closid_p);
> >> +	native_wrmsrl(MSR_IA32_PQR_ASSOC, (u64)closid_p << 32 | rmid_p);
> >
> >This is not an improvement.
> >
> >Please provide a native_wrmsrl() API variant where natural [rmid_p, closid_p]
> >high/lo parameters can be used, without the shift-uglification...
> >
> >Thanks,
> >
> >	Ingo
> 
> Directing this question primarily to Ingo, who is more than anyone 
> else the namespace consistency guardian:
> 
> On the subject of msr function naming ... *msrl() has always been 
> misleading. The -l suffix usually means 32 bits; sometimes it means 
> the C type "long" (which in the kernel is used instead of 
> size_t/uintptr_t, which might end up being "fun" when 128-bit 
> architectures appear some time this century), but for a fixed 64-bit 
> type we normally use -q.

Yeah, agreed - that's been bothering me for a while too. :-)

> Should we rename the *msrl() functions to *msrq() as part of this 
> overhaul?

Yeah, that's a good idea, and because talk is cheap I just implemented 
this in the tip:WIP.x86/msr branch with a couple of other cleanups in 
this area (see the shortlog & diffstat below), but the churn is high:

  144 files changed, 1034 insertions(+), 1034 deletions(-)

So this can only be done if regenerated and sent to Linus right before 
an -rc1 I think:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip WIP.x86/msr

Thanks,

	Ingo

=======================>
Ingo Molnar (18):
      x86/msr: Standardize on u64 in <asm/msr.h>
      x86/msr: Standardize on u64 in <asm/msr-index.h>
      x86/msr: Use u64 in rdmsrl_amd_safe() and wrmsrl_amd_safe()
      x86/msr: Use u64 in rdmsrl_safe() and paravirt_read_pmc()
      x86/msr: Rename 'rdmsrl()' to 'rdmsrq()'
      x86/msr: Rename 'wrmsrl()' to 'wrmsrq()'
      x86/msr: Rename 'rdmsrl_safe()' to 'rdmsrq_safe()'
      x86/msr: Rename 'wrmsrl_safe()' to 'wrmsrq_safe()'
      x86/msr: Rename 'rdmsrl_safe_on_cpu()' to 'rdmsrq_safe_on_cpu()'
      x86/msr: Rename 'wrmsrl_safe_on_cpu()' to 'wrmsrq_safe_on_cpu()'
      x86/msr: Rename 'rdmsrl_on_cpu()' to 'rdmsrq_on_cpu()'
      x86/msr: Rename 'wrmsrl_on_cpu()' to 'wrmsrq_on_cpu()'
      x86/msr: Rename 'mce_rdmsrl()' to 'mce_rdmsrq()'
      x86/msr: Rename 'mce_wrmsrl()' to 'mce_wrmsrq()'
      x86/msr: Rename 'rdmsrl_amd_safe()' to 'rdmsrq_amd_safe()'
      x86/msr: Rename 'wrmsrl_amd_safe()' to 'wrmsrq_amd_safe()'
      x86/msr: Rename 'native_wrmsrl()' to 'native_wrmsrq()'
      x86/msr: Rename 'wrmsrl_cstar()' to 'wrmsrq_cstar()'

 arch/x86/coco/sev/core.c                           |   2 +-
 arch/x86/events/amd/brs.c                          |   8 +-
 arch/x86/events/amd/core.c                         |  12 +--
 arch/x86/events/amd/ibs.c                          |  26 ++---
 arch/x86/events/amd/lbr.c                          |  20 ++--
 arch/x86/events/amd/power.c                        |  10 +-
 arch/x86/events/amd/uncore.c                       |  12 +--
 arch/x86/events/core.c                             |  42 ++++----
 arch/x86/events/intel/core.c                       |  66 ++++++-------
 arch/x86/events/intel/cstate.c                     |   2 +-
 arch/x86/events/intel/ds.c                         |  10 +-
 arch/x86/events/intel/knc.c                        |  16 +--
 arch/x86/events/intel/lbr.c                        |  44 ++++-----
 arch/x86/events/intel/p4.c                         |  24 ++---
 arch/x86/events/intel/p6.c                         |  12 +--
 arch/x86/events/intel/pt.c                         |  32 +++---
 arch/x86/events/intel/uncore.c                     |   2 +-
 arch/x86/events/intel/uncore_discovery.c           |  10 +-
 arch/x86/events/intel/uncore_nhmex.c               |  70 ++++++-------
 arch/x86/events/intel/uncore_snb.c                 |  42 ++++----
 arch/x86/events/intel/uncore_snbep.c               |  50 +++++-----
 arch/x86/events/msr.c                              |   2 +-
 arch/x86/events/perf_event.h                       |  26 ++---
 arch/x86/events/probe.c                            |   2 +-
 arch/x86/events/rapl.c                             |   8 +-
 arch/x86/events/zhaoxin/core.c                     |  16 +--
 arch/x86/hyperv/hv_apic.c                          |   4 +-
 arch/x86/hyperv/hv_init.c                          |  66 ++++++-------
 arch/x86/hyperv/hv_spinlock.c                      |   6 +-
 arch/x86/hyperv/ivm.c                              |   2 +-
 arch/x86/include/asm/apic.h                        |   8 +-
 arch/x86/include/asm/debugreg.h                    |   4 +-
 arch/x86/include/asm/fsgsbase.h                    |   4 +-
 arch/x86/include/asm/kvm_host.h                    |   2 +-
 arch/x86/include/asm/microcode.h                   |   2 +-
 arch/x86/include/asm/msr-index.h                   |  12 +--
 arch/x86/include/asm/msr.h                         |  50 +++++-----
 arch/x86/include/asm/paravirt.h                    |   8 +-
 arch/x86/include/asm/spec-ctrl.h                   |   2 +-
 arch/x86/kernel/acpi/cppc.c                        |   8 +-
 arch/x86/kernel/amd_nb.c                           |   2 +-
 arch/x86/kernel/apic/apic.c                        |  16 +--
 arch/x86/kernel/apic/apic_numachip.c               |   6 +-
 arch/x86/kernel/cet.c                              |   2 +-
 arch/x86/kernel/cpu/amd.c                          |  28 +++---
 arch/x86/kernel/cpu/aperfmperf.c                   |  28 +++---
 arch/x86/kernel/cpu/bugs.c                         |  24 ++---
 arch/x86/kernel/cpu/bus_lock.c                     |  18 ++--
 arch/x86/kernel/cpu/common.c                       |  68 ++++++-------
 arch/x86/kernel/cpu/feat_ctl.c                     |   4 +-
 arch/x86/kernel/cpu/hygon.c                        |   6 +-
 arch/x86/kernel/cpu/intel.c                        |  10 +-
 arch/x86/kernel/cpu/intel_epb.c                    |  12 +--
 arch/x86/kernel/cpu/mce/amd.c                      |  22 ++---
 arch/x86/kernel/cpu/mce/core.c                     |  58 +++++------
 arch/x86/kernel/cpu/mce/inject.c                   |  32 +++---
 arch/x86/kernel/cpu/mce/intel.c                    |  32 +++---
 arch/x86/kernel/cpu/mce/internal.h                 |   2 +-
 arch/x86/kernel/cpu/microcode/amd.c                |   2 +-
 arch/x86/kernel/cpu/microcode/intel.c              |   2 +-
 arch/x86/kernel/cpu/mshyperv.c                     |  12 +--
 arch/x86/kernel/cpu/resctrl/core.c                 |  10 +-
 arch/x86/kernel/cpu/resctrl/monitor.c              |   2 +-
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c          |   2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c             |   6 +-
 arch/x86/kernel/cpu/sgx/main.c                     |   2 +-
 arch/x86/kernel/cpu/topology.c                     |   2 +-
 arch/x86/kernel/cpu/topology_amd.c                 |   4 +-
 arch/x86/kernel/cpu/tsx.c                          |  20 ++--
 arch/x86/kernel/cpu/umwait.c                       |   2 +-
 arch/x86/kernel/fpu/core.c                         |   2 +-
 arch/x86/kernel/fpu/xstate.c                       |  10 +-
 arch/x86/kernel/fpu/xstate.h                       |   2 +-
 arch/x86/kernel/fred.c                             |  20 ++--
 arch/x86/kernel/hpet.c                             |   2 +-
 arch/x86/kernel/kvm.c                              |  28 +++---
 arch/x86/kernel/kvmclock.c                         |   4 +-
 arch/x86/kernel/mmconf-fam10h_64.c                 |   8 +-
 arch/x86/kernel/process.c                          |  16 +--
 arch/x86/kernel/process_64.c                       |  20 ++--
 arch/x86/kernel/reboot_fixups_32.c                 |   2 +-
 arch/x86/kernel/shstk.c                            |  18 ++--
 arch/x86/kernel/traps.c                            |  10 +-
 arch/x86/kernel/tsc.c                              |   2 +-
 arch/x86/kernel/tsc_sync.c                         |  14 +--
 arch/x86/kvm/svm/avic.c                            |   2 +-
 arch/x86/kvm/svm/sev.c                             |   2 +-
 arch/x86/kvm/svm/svm.c                             |  16 +--
 arch/x86/kvm/vmx/nested.c                          |   4 +-
 arch/x86/kvm/vmx/pmu_intel.c                       |   4 +-
 arch/x86/kvm/vmx/sgx.c                             |   8 +-
 arch/x86/kvm/vmx/vmx.c                             |  66 ++++++-------
 arch/x86/kvm/x86.c                                 |  38 ++++----
 arch/x86/lib/insn-eval.c                           |   6 +-
 arch/x86/lib/msr-smp.c                             |  16 +--
 arch/x86/lib/msr.c                                 |   4 +-
 arch/x86/mm/pat/memtype.c                          |   4 +-
 arch/x86/mm/tlb.c                                  |   2 +-
 arch/x86/pci/amd_bus.c                             |  10 +-
 arch/x86/platform/olpc/olpc-xo1-rtc.c              |   6 +-
 arch/x86/platform/olpc/olpc-xo1-sci.c              |   2 +-
 arch/x86/power/cpu.c                               |  26 ++---
 arch/x86/realmode/init.c                           |   2 +-
 arch/x86/virt/svm/sev.c                            |  20 ++--
 arch/x86/xen/suspend.c                             |   6 +-
 drivers/acpi/acpi_extlog.c                         |   2 +-
 drivers/acpi/acpi_lpit.c                           |   2 +-
 drivers/cpufreq/acpi-cpufreq.c                     |   8 +-
 drivers/cpufreq/amd-pstate-ut.c                    |   6 +-
 drivers/cpufreq/amd-pstate.c                       |  22 ++---
 drivers/cpufreq/amd_freq_sensitivity.c             |   2 +-
 drivers/cpufreq/e_powersaver.c                     |   6 +-
 drivers/cpufreq/intel_pstate.c                     | 108 ++++++++++-----------
 drivers/cpufreq/longhaul.c                         |  24 ++---
 drivers/cpufreq/powernow-k7.c                      |  14 +--
 drivers/crypto/ccp/sev-dev.c                       |   2 +-
 drivers/edac/amd64_edac.c                          |   6 +-
 drivers/gpu/drm/i915/selftests/librapl.c           |   4 +-
 drivers/hwmon/fam15h_power.c                       |   6 +-
 drivers/idle/intel_idle.c                          |  34 +++----
 drivers/mtd/nand/raw/cs553x_nand.c                 |   6 +-
 drivers/platform/x86/intel/ifs/core.c              |   4 +-
 drivers/platform/x86/intel/ifs/load.c              |  20 ++--
 drivers/platform/x86/intel/ifs/runtest.c           |  16 +--
 drivers/platform/x86/intel/pmc/cnp.c               |   6 +-
 drivers/platform/x86/intel/pmc/core.c              |   8 +-
 .../x86/intel/speed_select_if/isst_if_common.c     |  18 ++--
 .../x86/intel/speed_select_if/isst_if_mbox_msr.c   |  14 +--
 .../x86/intel/speed_select_if/isst_tpmi_core.c     |   2 +-
 drivers/platform/x86/intel/tpmi_power_domains.c    |   4 +-
 drivers/platform/x86/intel/turbo_max_3.c           |   4 +-
 .../x86/intel/uncore-frequency/uncore-frequency.c  |  10 +-
 drivers/platform/x86/intel_ips.c                   |  36 +++----
 drivers/powercap/intel_rapl_msr.c                  |   6 +-
 .../int340x_thermal/processor_thermal_device.c     |   2 +-
 drivers/thermal/intel/intel_hfi.c                  |  14 +--
 drivers/thermal/intel/intel_powerclamp.c           |   4 +-
 drivers/thermal/intel/intel_tcc_cooling.c          |   4 +-
 drivers/thermal/intel/therm_throt.c                |  10 +-
 drivers/video/fbdev/geode/gxfb_core.c              |   2 +-
 drivers/video/fbdev/geode/lxfb_ops.c               |  22 ++---
 drivers/video/fbdev/geode/suspend_gx.c             |  10 +-
 drivers/video/fbdev/geode/video_gx.c               |  16 +--
 include/hyperv/hvgdk_mini.h                        |   2 +-
 144 files changed, 1034 insertions(+), 1034 deletions(-)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-04-01  7:52       ` Ingo Molnar
@ 2025-04-02  3:45         ` Xin Li
  2025-04-02  4:10           ` Ingo Molnar
  2025-04-03  5:09         ` Xin Li
  1 sibling, 1 reply; 55+ messages in thread
From: Xin Li @ 2025-04-02  3:45 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, jgross, andrew.cooper3, peterz, acme,
	namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher, alexey.amakhalov,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
	luto, boris.ostrovsky, kys, haiyangz, decui, Linus Torvalds

On 4/1/2025 12:52 AM, Ingo Molnar wrote:
> 
> * H. Peter Anvin <hpa@zytor.com> wrote:
> 
>> On March 31, 2025 3:17:30 AM PDT, Ingo Molnar <mingo@kernel.org> wrote:
>>>
>>> * Xin Li (Intel) <xin@zytor.com> wrote:
>>>
>>>> -	__wrmsr      (MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3, val >> 32);
>>>> +	native_wrmsrl(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3);
>>>
>>> This is an improvement.
>>>
>>>> -	__wrmsr      (MSR_IA32_PQR_ASSOC, rmid_p, plr->closid);
>>>> +	native_wrmsrl(MSR_IA32_PQR_ASSOC, (u64)plr->closid << 32 | rmid_p);
>>>
>>>> -	__wrmsr      (MSR_IA32_PQR_ASSOC, rmid_p, closid_p);
>>>> +	native_wrmsrl(MSR_IA32_PQR_ASSOC, (u64)closid_p << 32 | rmid_p);
>>>
>>> This is not an improvement.
>>>
>>> Please provide a native_wrmsrl() API variant where natural [rmid_p, closid_p]
>>> high/lo parameters can be used, without the shift-uglification...
>>>
>>> Thanks,
>>>
>>> 	Ingo
>>
>> Directing this question primarily to Ingo, who is more than anyone
>> else the namespace consistency guardian:
>>
>> On the subject of msr function naming ... *msrl() has always been
>> misleading. The -l suffix usually means 32 bits; sometimes it means
>> the C type "long" (which in the kernel is used instead of
>> size_t/uintptr_t, which might end up being "fun" when 128-bit
>> architectures appear some time this century), but for a fixed 64-bit
>> type we normally use -q.
> 
> Yeah, agreed - that's been bothering me for a while too. :-)
> 
>> Should we rename the *msrl() functions to *msrq() as part of this
>> overhaul?
> 
> Yeah, that's a good idea, and because talk is cheap I just implemented
> this in the tip:WIP.x86/msr branch with a couple of other cleanups in
> this area (see the shortlog & diffstat below), but the churn is high:
> 
>    144 files changed, 1034 insertions(+), 1034 deletions(-)
> 
> So this can only be done if regenerated and sent to Linus right before
> an -rc1 I think:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip WIP.x86/msr

Hi Ingo,

Is this branch public?

I wanted to rebase on it and then incorporate your review comments, but
couldn't find the branch.

Thanks!
     Xin

> 
> Thanks,
> 
> 	Ingo
> 
> =======================>
> Ingo Molnar (18):
>        x86/msr: Standardize on u64 in <asm/msr.h>
>        x86/msr: Standardize on u64 in <asm/msr-index.h>
>        x86/msr: Use u64 in rdmsrl_amd_safe() and wrmsrl_amd_safe()
>        x86/msr: Use u64 in rdmsrl_safe() and paravirt_read_pmc()
>        x86/msr: Rename 'rdmsrl()' to 'rdmsrq()'
>        x86/msr: Rename 'wrmsrl()' to 'wrmsrq()'
>        x86/msr: Rename 'rdmsrl_safe()' to 'rdmsrq_safe()'
>        x86/msr: Rename 'wrmsrl_safe()' to 'wrmsrq_safe()'
>        x86/msr: Rename 'rdmsrl_safe_on_cpu()' to 'rdmsrq_safe_on_cpu()'
>        x86/msr: Rename 'wrmsrl_safe_on_cpu()' to 'wrmsrq_safe_on_cpu()'
>        x86/msr: Rename 'rdmsrl_on_cpu()' to 'rdmsrq_on_cpu()'
>        x86/msr: Rename 'wrmsrl_on_cpu()' to 'wrmsrq_on_cpu()'
>        x86/msr: Rename 'mce_rdmsrl()' to 'mce_rdmsrq()'
>        x86/msr: Rename 'mce_wrmsrl()' to 'mce_wrmsrq()'
>        x86/msr: Rename 'rdmsrl_amd_safe()' to 'rdmsrq_amd_safe()'
>        x86/msr: Rename 'wrmsrl_amd_safe()' to 'wrmsrq_amd_safe()'
>        x86/msr: Rename 'native_wrmsrl()' to 'native_wrmsrq()'
>        x86/msr: Rename 'wrmsrl_cstar()' to 'wrmsrq_cstar()'
> 
>   [per-file diffstat snipped; identical to the diffstat shown above]
>   144 files changed, 1034 insertions(+), 1034 deletions(-)


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-04-02  3:45         ` Xin Li
@ 2025-04-02  4:10           ` Ingo Molnar
  2025-04-02  4:57             ` Xin Li
  2025-04-08 17:34             ` Xin Li
  0 siblings, 2 replies; 55+ messages in thread
From: Ingo Molnar @ 2025-04-02  4:10 UTC (permalink / raw)
  To: Xin Li
  Cc: H. Peter Anvin, linux-kernel, linux-perf-users, linux-hyperv,
	virtualization, linux-edac, kvm, xen-devel, linux-ide, linux-pm,
	bpf, llvm, tglx, mingo, bp, dave.hansen, x86, jgross,
	andrew.cooper3, peterz, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
	wei.liu, ajay.kaher, alexey.amakhalov, bcm-kernel-feedback-list,
	tony.luck, pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys,
	haiyangz, decui, Linus Torvalds


* Xin Li <xin@zytor.com> wrote:

> Hi Ingo,
> 
> Is this branch public?
> 
> I wanted to rebase on it and then incooperate your review comments, but
> couldn't find the branch.

Yeah, I moved it over to:

  git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/msr

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-04-02  4:10           ` Ingo Molnar
@ 2025-04-02  4:57             ` Xin Li
  2025-04-08 17:34             ` Xin Li
  1 sibling, 0 replies; 55+ messages in thread
From: Xin Li @ 2025-04-02  4:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, linux-kernel, linux-perf-users, linux-hyperv,
	virtualization, linux-edac, kvm, xen-devel, linux-ide, linux-pm,
	bpf, llvm, tglx, mingo, bp, dave.hansen, x86, jgross,
	andrew.cooper3, peterz, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
	wei.liu, ajay.kaher, alexey.amakhalov, bcm-kernel-feedback-list,
	tony.luck, pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys,
	haiyangz, decui, Linus Torvalds

On 4/1/2025 9:10 PM, Ingo Molnar wrote:
> Yeah, I moved it over to:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/msr

On it now.

Thanks!
     Xin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-04-01  5:53       ` Xin Li
@ 2025-04-02 15:41         ` Dave Hansen
  2025-04-02 15:56           ` H. Peter Anvin
  0 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2025-04-02 15:41 UTC (permalink / raw)
  To: Xin Li, H. Peter Anvin, Ingo Molnar
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, jgross, andrew.cooper3, peterz, acme,
	namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher, alexey.amakhalov,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
	luto, boris.ostrovsky, kys, haiyangz, decui

On 3/31/25 22:53, Xin Li wrote:
> Per "struct msr" defined in arch/x86/include/asm/shared/msr.h:
> 
> struct msr {
>         union {
>                 struct {
>                         u32 l;
>                         u32 h;
>                 };
>                 u64 q;
>         };
> };
> 
> Probably *msrq() is what we want?

What would folks think about "wrmsr64()"? It's writing a 64-bit value to
an MSR and there are a lot of functions in the kernel that are named
with the argument width in bits.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-04-02 15:41         ` Dave Hansen
@ 2025-04-02 15:56           ` H. Peter Anvin
  2025-04-09 19:53             ` Ingo Molnar
  0 siblings, 1 reply; 55+ messages in thread
From: H. Peter Anvin @ 2025-04-02 15:56 UTC (permalink / raw)
  To: Dave Hansen, Xin Li, Ingo Molnar
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, jgross, andrew.cooper3, peterz, acme,
	namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher, alexey.amakhalov,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
	luto, boris.ostrovsky, kys, haiyangz, decui

On April 2, 2025 8:41:07 AM PDT, Dave Hansen <dave.hansen@intel.com> wrote:
>On 3/31/25 22:53, Xin Li wrote:
>> Per "struct msr" defined in arch/x86/include/asm/shared/msr.h:
>> 
>> struct msr {
>>         union {
>>                 struct {
>>                         u32 l;
>>                         u32 h;
>>                 };
>>                 u64 q;
>>         };
>> };
>> 
>> Probably *msrq() is what we want?
>
>What would folks think about "wrmsr64()"? It's writing a 64-bit value to
>an MSR and there are a lot of functions in the kernel that are named
>with the argument width in bits.

Personally, I hate the extra verbosity, mostly visual: since numerals are nearly as prominent as capital letters, they tend to attract the eye. There is a reason why they aren't used this way in assembly languages.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-04-01  7:52       ` Ingo Molnar
  2025-04-02  3:45         ` Xin Li
@ 2025-04-03  5:09         ` Xin Li
  2025-04-03  6:01           ` H. Peter Anvin
  2025-04-09 19:17           ` [PATCH] x86/msr: Standardize on 'u32' MSR indices in <asm/msr.h> Ingo Molnar
  1 sibling, 2 replies; 55+ messages in thread
From: Xin Li @ 2025-04-03  5:09 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, jgross, andrew.cooper3, peterz, acme,
	namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
	luto, boris.ostrovsky, kys, haiyangz, decui, Linus Torvalds

On 4/1/2025 12:52 AM, Ingo Molnar wrote:
>> Should we rename the *msrl() functions to *msrq() as part of this
>> overhaul?
> Yeah, that's a good idea, and because talk is cheap I just implemented
> this in the tip:WIP.x86/msr branch with a couple of other cleanups in
> this area (see the shortlog & diffstat below), but the churn is high:
> 
>    144 files changed, 1034 insertions(+), 1034 deletions(-)

Hi Ingo,

I noticed that you keep the type of MSR index in these patches as
"unsigned int".

I'm thinking: would it be better to standardize it as "u32"?

Because:
1) MSR index is placed in ECX to execute MSR instructions, and the
    high-order 32 bits of RCX are ignored on 64-bit.
2) MSR index is encoded as a 32-bit immediate in the new immediate form
    MSR instructions.

Thanks!
     Xin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-04-03  5:09         ` Xin Li
@ 2025-04-03  6:01           ` H. Peter Anvin
  2025-04-09 19:17           ` [PATCH] x86/msr: Standardize on 'u32' MSR indices in <asm/msr.h> Ingo Molnar
  1 sibling, 0 replies; 55+ messages in thread
From: H. Peter Anvin @ 2025-04-03  6:01 UTC (permalink / raw)
  To: Xin Li, Ingo Molnar
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, jgross, andrew.cooper3, peterz, acme,
	namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
	luto, boris.ostrovsky, kys, haiyangz, decui, Linus Torvalds

On April 2, 2025 10:09:21 PM PDT, Xin Li <xin@zytor.com> wrote:
>On 4/1/2025 12:52 AM, Ingo Molnar wrote:
>>> Should we rename the *msrl() functions to *msrq() as part of this
>>> overhaul?
>> Yeah, that's a good idea, and because talk is cheap I just implemented
>> this in the tip:WIP.x86/msr branch with a couple of other cleanups in
>> this area (see the shortlog & diffstat below), but the churn is high:
>> 
>>    144 files changed, 1034 insertions(+), 1034 deletions(-)
>
>Hi Ingo,
>
>I noticed that you keep the type of MSR index in these patches as
>"unsigned int".
>
>I'm thinking: would it be better to standardize it as "u32"?
>
>Because:
>1) MSR index is placed in ECX to execute MSR instructions, and the
>   high-order 32 bits of RCX are ignored on 64-bit.
>2) MSR index is encoded as a 32-bit immediate in the new immediate form
>   MSR instructions.
>
>Thanks!
>    Xin

"unsigned int" and "u32" are synonymous, but the latter is more explicit and would be better.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-03-31 21:45   ` [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl() Andrew Cooper
  2025-04-01  5:13     ` H. Peter Anvin
@ 2025-04-03  7:13     ` Xin Li
  1 sibling, 0 replies; 55+ messages in thread
From: Xin Li @ 2025-04-03  7:13 UTC (permalink / raw)
  To: Andrew Cooper, linux-kernel, linux-perf-users, linux-hyperv,
	virtualization, linux-edac, kvm, xen-devel, linux-ide, linux-pm,
	bpf, llvm
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, jgross, peterz, acme,
	namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, seanjc,
	luto, boris.ostrovsky, kys, haiyangz, decui

On 3/31/2025 2:45 PM, Andrew Cooper wrote:
> On 31/03/2025 9:22 am, Xin Li (Intel) wrote:
>> __wrmsr() is the lowest level primitive MSR write API, and its direct
>> use is NOT preferred.  Use its wrapper function native_wrmsrl() instead.
>>
>> No functional change intended.
>>
>> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
> 
> The critical piece of information you're missing from the commit message
> is that the MSR_IMM instructions take a single u64.
> 
> Therefore to use them, you've got to arrange for all callers to provide
> a single u64, rather than a split u32 pair.

You definitely caught how I was thinking about it ;)

Sometimes it is nice to see a change log with a thinking process.

Thanks!
     Xin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-04-02  4:10           ` Ingo Molnar
  2025-04-02  4:57             ` Xin Li
@ 2025-04-08 17:34             ` Xin Li
  1 sibling, 0 replies; 55+ messages in thread
From: Xin Li @ 2025-04-08 17:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, linux-kernel, linux-perf-users, linux-hyperv,
	virtualization, linux-edac, kvm, xen-devel, linux-ide, linux-pm,
	bpf, llvm, tglx, mingo, bp, dave.hansen, x86, jgross,
	andrew.cooper3, peterz, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
	wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
	pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
	decui, Linus Torvalds

On 4/1/2025 9:10 PM, Ingo Molnar wrote:
> Yeah, I moved it over to:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/msr
> 

Hi Ingo,

Are you going to merge it into tip in this development cycle for the
v6.16 merge window?

Thanks!
     Xin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH] x86/msr: Standardize on 'u32' MSR indices in <asm/msr.h>
  2025-04-03  5:09         ` Xin Li
  2025-04-03  6:01           ` H. Peter Anvin
@ 2025-04-09 19:17           ` Ingo Molnar
  1 sibling, 0 replies; 55+ messages in thread
From: Ingo Molnar @ 2025-04-09 19:17 UTC (permalink / raw)
  To: Xin Li
  Cc: H. Peter Anvin, linux-kernel, linux-perf-users, linux-hyperv,
	virtualization, linux-edac, kvm, xen-devel, linux-ide, linux-pm,
	bpf, llvm, tglx, mingo, bp, dave.hansen, x86, jgross,
	andrew.cooper3, peterz, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
	wei.liu, ajay.kaher, bcm-kernel-feedback-list, tony.luck,
	pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys, haiyangz,
	decui, Linus Torvalds


* Xin Li <xin@zytor.com> wrote:

> On 4/1/2025 12:52 AM, Ingo Molnar wrote:
> > > Should we rename the *msrl() functions to *msrq() as part of this
> > > overhaul?
> > Yeah, that's a good idea, and because talk is cheap I just implemented
> > this in the tip:WIP.x86/msr branch with a couple of other cleanups in
> > this area (see the shortlog & diffstat below), but the churn is high:
> > 
> >    144 files changed, 1034 insertions(+), 1034 deletions(-)
> 
> Hi Ingo,
> 
> I noticed that you keep the type of MSR index in these patches as
> "unsigned int".
> 
> I'm thinking: would it be better to standardize it as "u32"?
> 
> Because:
> 1) MSR index is placed in ECX to execute MSR instructions, and the
>    high-order 32 bits of RCX are ignored on 64-bit.
> 2) MSR index is encoded as a 32-bit immediate in the new immediate form
>    MSR instructions.

Makes sense - something like the attached patch?

Thanks,

	Ingo

=====================>
From: Ingo Molnar <mingo@kernel.org>
Date: Wed, 9 Apr 2025 21:12:39 +0200
Subject: [PATCH] x86/msr: Standardize on 'u32' MSR indices in <asm/msr.h>

This is the customary type used for hardware ABIs.

Suggested-by: Xin Li <xin@zytor.com>
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/msr.h | 29 ++++++++++++++---------------
 arch/x86/lib/msr.c         |  4 ++--
 2 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 4ee9ae734c08..20deb58308e5 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -63,12 +63,12 @@ struct saved_msrs {
 DECLARE_TRACEPOINT(read_msr);
 DECLARE_TRACEPOINT(write_msr);
 DECLARE_TRACEPOINT(rdpmc);
-extern void do_trace_write_msr(unsigned int msr, u64 val, int failed);
-extern void do_trace_read_msr(unsigned int msr, u64 val, int failed);
+extern void do_trace_write_msr(u32 msr, u64 val, int failed);
+extern void do_trace_read_msr(u32 msr, u64 val, int failed);
 extern void do_trace_rdpmc(u32 msr, u64 val, int failed);
 #else
-static inline void do_trace_write_msr(unsigned int msr, u64 val, int failed) {}
-static inline void do_trace_read_msr(unsigned int msr, u64 val, int failed) {}
+static inline void do_trace_write_msr(u32 msr, u64 val, int failed) {}
+static inline void do_trace_read_msr(u32 msr, u64 val, int failed) {}
 static inline void do_trace_rdpmc(u32 msr, u64 val, int failed) {}
 #endif
 
@@ -79,7 +79,7 @@ static inline void do_trace_rdpmc(u32 msr, u64 val, int failed) {}
  * think of extending them - you will be slapped with a stinking trout or a frozen
  * shark will reach you, wherever you are! You've been warned.
  */
-static __always_inline u64 __rdmsr(unsigned int msr)
+static __always_inline u64 __rdmsr(u32 msr)
 {
 	DECLARE_ARGS(val, low, high);
 
@@ -91,7 +91,7 @@ static __always_inline u64 __rdmsr(unsigned int msr)
 	return EAX_EDX_VAL(val, low, high);
 }
 
-static __always_inline void __wrmsr(unsigned int msr, u32 low, u32 high)
+static __always_inline void __wrmsr(u32 msr, u32 low, u32 high)
 {
 	asm volatile("1: wrmsr\n"
 		     "2:\n"
@@ -113,7 +113,7 @@ do {							\
 	__wrmsr((msr), (u32)((u64)(val)),		\
 		       (u32)((u64)(val) >> 32))
 
-static inline u64 native_read_msr(unsigned int msr)
+static inline u64 native_read_msr(u32 msr)
 {
 	u64 val;
 
@@ -125,8 +125,7 @@ static inline u64 native_read_msr(unsigned int msr)
 	return val;
 }
 
-static inline u64 native_read_msr_safe(unsigned int msr,
-						      int *err)
+static inline u64 native_read_msr_safe(u32 msr, int *err)
 {
 	DECLARE_ARGS(val, low, high);
 
@@ -142,7 +141,7 @@ static inline u64 native_read_msr_safe(unsigned int msr,
 
 /* Can be uninlined because referenced by paravirt */
 static inline void notrace
-native_write_msr(unsigned int msr, u32 low, u32 high)
+native_write_msr(u32 msr, u32 low, u32 high)
 {
 	__wrmsr(msr, low, high);
 
@@ -152,7 +151,7 @@ native_write_msr(unsigned int msr, u32 low, u32 high)
 
 /* Can be uninlined because referenced by paravirt */
 static inline int notrace
-native_write_msr_safe(unsigned int msr, u32 low, u32 high)
+native_write_msr_safe(u32 msr, u32 low, u32 high)
 {
 	int err;
 
@@ -251,7 +250,7 @@ do {								\
 	(void)((high) = (u32)(__val >> 32));			\
 } while (0)
 
-static inline void wrmsr(unsigned int msr, u32 low, u32 high)
+static inline void wrmsr(u32 msr, u32 low, u32 high)
 {
 	native_write_msr(msr, low, high);
 }
@@ -259,13 +258,13 @@ static inline void wrmsr(unsigned int msr, u32 low, u32 high)
 #define rdmsrq(msr, val)			\
 	((val) = native_read_msr((msr)))
 
-static inline void wrmsrq(unsigned int msr, u64 val)
+static inline void wrmsrq(u32 msr, u64 val)
 {
 	native_write_msr(msr, (u32)(val & 0xffffffffULL), (u32)(val >> 32));
 }
 
 /* wrmsr with exception handling */
-static inline int wrmsr_safe(unsigned int msr, u32 low, u32 high)
+static inline int wrmsr_safe(u32 msr, u32 low, u32 high)
 {
 	return native_write_msr_safe(msr, low, high);
 }
@@ -280,7 +279,7 @@ static inline int wrmsr_safe(unsigned int msr, u32 low, u32 high)
 	__err;							\
 })
 
-static inline int rdmsrq_safe(unsigned int msr, u64 *p)
+static inline int rdmsrq_safe(u32 msr, u64 *p)
 {
 	int err;
 
diff --git a/arch/x86/lib/msr.c b/arch/x86/lib/msr.c
index e18925899f13..4ef7c6dcbea6 100644
--- a/arch/x86/lib/msr.c
+++ b/arch/x86/lib/msr.c
@@ -122,14 +122,14 @@ int msr_clear_bit(u32 msr, u8 bit)
 EXPORT_SYMBOL_GPL(msr_clear_bit);
 
 #ifdef CONFIG_TRACEPOINTS
-void do_trace_write_msr(unsigned int msr, u64 val, int failed)
+void do_trace_write_msr(u32 msr, u64 val, int failed)
 {
 	trace_write_msr(msr, val, failed);
 }
 EXPORT_SYMBOL(do_trace_write_msr);
 EXPORT_TRACEPOINT_SYMBOL(write_msr);
 
-void do_trace_read_msr(unsigned int msr, u64 val, int failed)
+void do_trace_read_msr(u32 msr, u64 val, int failed)
 {
 	trace_read_msr(msr, val, failed);
 }

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-04-02 15:56           ` H. Peter Anvin
@ 2025-04-09 19:53             ` Ingo Molnar
  2025-04-09 19:56               ` Dave Hansen
  0 siblings, 1 reply; 55+ messages in thread
From: Ingo Molnar @ 2025-04-09 19:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dave Hansen, Xin Li, linux-kernel, linux-perf-users, linux-hyperv,
	virtualization, linux-edac, kvm, xen-devel, linux-ide, linux-pm,
	bpf, llvm, tglx, mingo, bp, dave.hansen, x86, jgross,
	andrew.cooper3, peterz, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
	wei.liu, ajay.kaher, alexey.amakhalov, bcm-kernel-feedback-list,
	tony.luck, pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys,
	haiyangz, decui


* H. Peter Anvin <hpa@zytor.com> wrote:

> On April 2, 2025 8:41:07 AM PDT, Dave Hansen <dave.hansen@intel.com> wrote:
> >On 3/31/25 22:53, Xin Li wrote:
> >> Per "struct msr" defined in arch/x86/include/asm/shared/msr.h:
> >> 
> >> struct msr {
> >>         union {
> >>                 struct {
> >>                         u32 l;
> >>                         u32 h;
> >>                 };
> >>                 u64 q;
> >>         };
> >> };
> >> 
> >> Probably *msrq() is what we want?
> >
> > What would folks think about "wrmsr64()"? It's writing a 64-bit 
> > value to an MSR and there are a lot of functions in the kernel that 
> > are named with the argument width in bits.
> 
> Personally, I hate the extra verbosity, mostly visual, since numerals 
> are nearly as prominent as capital letters they tend to attract the 
> eye. There is a reason why they aren't used this way in assembly 
> languages.

So what's the consensus here? Both work for me, but I have to pick one. :-)

Thanks,

	Ingo


* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-04-09 19:53             ` Ingo Molnar
@ 2025-04-09 19:56               ` Dave Hansen
  2025-04-09 20:11                 ` Ingo Molnar
  0 siblings, 1 reply; 55+ messages in thread
From: Dave Hansen @ 2025-04-09 19:56 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin
  Cc: Xin Li, linux-kernel, linux-perf-users, linux-hyperv,
	virtualization, linux-edac, kvm, xen-devel, linux-ide, linux-pm,
	bpf, llvm, tglx, mingo, bp, dave.hansen, x86, jgross,
	andrew.cooper3, peterz, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
	wei.liu, ajay.kaher, alexey.amakhalov, bcm-kernel-feedback-list,
	tony.luck, pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys,
	haiyangz, decui

On 4/9/25 12:53, Ingo Molnar wrote:
>>> What would folks think about "wrmsr64()"? It's writing a 64-bit 
>>> value to an MSR and there are a lot of functions in the kernel that 
>>> are named with the argument width in bits.
>> Personally, I hate the extra verbosity, mostly visual, since numerals 
>> are nearly as prominent as capital letters they tend to attract the 
>> eye. There is a reason why they aren't used this way in assembly 
>> languages.
> So what's the consensus here? Both work for me, but I have to pick one. :-)

I don't feel strongly about it. You're not going to hurt my feelings if
you pick the "q" one, so go for "q" unless you have a real preference.


* Re: [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl()
  2025-04-09 19:56               ` Dave Hansen
@ 2025-04-09 20:11                 ` Ingo Molnar
  0 siblings, 0 replies; 55+ messages in thread
From: Ingo Molnar @ 2025-04-09 20:11 UTC (permalink / raw)
  To: Dave Hansen
  Cc: H. Peter Anvin, Xin Li, linux-kernel, linux-perf-users,
	linux-hyperv, virtualization, linux-edac, kvm, xen-devel,
	linux-ide, linux-pm, bpf, llvm, tglx, mingo, bp, dave.hansen, x86,
	jgross, andrew.cooper3, peterz, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
	wei.liu, ajay.kaher, alexey.amakhalov, bcm-kernel-feedback-list,
	tony.luck, pbonzini, vkuznets, seanjc, luto, boris.ostrovsky, kys,
	haiyangz, decui


* Dave Hansen <dave.hansen@intel.com> wrote:

> On 4/9/25 12:53, Ingo Molnar wrote:
> >>> What would folks think about "wrmsr64()"? It's writing a 64-bit 
> >>> value to an MSR and there are a lot of functions in the kernel that 
> >>> are named with the argument width in bits.
> >> Personally, I hate the extra verbosity, mostly visual, since numerals 
> >> are nearly as prominent as capital letters they tend to attract the 
> >> eye. There is a reason why they aren't used this way in assembly 
> >> languages.
> > So what's the consensus here? Both work for me, but I have to pick one. :-)
> 
> I don't feel strongly about it. You're not going to hurt my feelings if
> you pick the "q" one, so go for "q" unless you have a real preference.

Ok, since hpa seems to hate the wrmsr64()/rdmsr64() names due to the 
numeric verbosity, I'll go with wrmsrq()/rdmsrq().

Thanks,

	Ingo



* Re: [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available
  2025-03-31  8:22 ` [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available Xin Li (Intel)
  2025-03-31 20:27   ` Konrad Rzeszutek Wilk
@ 2025-04-10 23:24   ` Sean Christopherson
  2025-04-11 16:18     ` Xin Li
  2025-04-11 21:12     ` Jim Mattson
  1 sibling, 2 replies; 55+ messages in thread
From: Sean Christopherson @ 2025-04-10 23:24 UTC (permalink / raw)
  To: Xin Li (Intel)
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3, peterz,
	acme, namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher, alexey.amakhalov,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, luto,
	boris.ostrovsky, kys, haiyangz, decui

On Mon, Mar 31, 2025, Xin Li (Intel) wrote:
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
> ---
>  arch/x86/include/asm/msr-index.h |  6 ++++++
>  arch/x86/kvm/vmx/vmenter.S       | 28 ++++++++++++++++++++++++----
>  2 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index e6134ef2263d..04244c3ba374 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1226,4 +1226,10 @@
>  						* a #GP
>  						*/
>  
> +/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
> +#define ASM_WRMSRNS		_ASM_BYTES(0x0f,0x01,0xc6)
> +
> +/* Instruction opcode for the immediate form RDMSR/WRMSRNS */
> +#define ASM_WRMSRNS_RAX		_ASM_BYTES(0xc4,0xe7,0x7a,0xf6,0xc0)
> +
>  #endif /* _ASM_X86_MSR_INDEX_H */
> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> index f6986dee6f8c..9fae43723c44 100644
> --- a/arch/x86/kvm/vmx/vmenter.S
> +++ b/arch/x86/kvm/vmx/vmenter.S
> @@ -64,6 +64,29 @@
>  	RET
>  .endm
>  
> +/*
> + * Write EAX to MSR_IA32_SPEC_CTRL.
> + *
> + * Choose the best WRMSR instruction based on availability.
> + *
> + * Replace with 'wrmsrns' and 'wrmsrns %rax, $MSR_IA32_SPEC_CTRL' once binutils support them.
> + */
> +.macro WRITE_EAX_TO_MSR_IA32_SPEC_CTRL
> +	ALTERNATIVE_2 __stringify(mov $MSR_IA32_SPEC_CTRL, %ecx;		\
> +				  xor %edx, %edx;				\
> +				  mov %edi, %eax;				\
> +				  ds wrmsr),					\
> +		      __stringify(mov $MSR_IA32_SPEC_CTRL, %ecx;		\
> +				  xor %edx, %edx;				\
> +				  mov %edi, %eax;				\
> +				  ASM_WRMSRNS),					\
> +		      X86_FEATURE_WRMSRNS,					\
> +		      __stringify(xor %_ASM_AX, %_ASM_AX;			\
> +				  mov %edi, %eax;				\
> +				  ASM_WRMSRNS_RAX; .long MSR_IA32_SPEC_CTRL),	\
> +		      X86_FEATURE_MSR_IMM
> +.endm

This is quite hideous.  I have no objection to optimizing __vmx_vcpu_run(), but
I would much prefer that a macro like this live in generic code, and that it be
generic.  It should be easy enough to provide an assembly friendly equivalent to
__native_wrmsr_constant().


> +
>  .section .noinstr.text, "ax"
>  
>  /**
> @@ -123,10 +146,7 @@ SYM_FUNC_START(__vmx_vcpu_run)
>  	movl PER_CPU_VAR(x86_spec_ctrl_current), %esi
>  	cmp %edi, %esi
>  	je .Lspec_ctrl_done
> -	mov $MSR_IA32_SPEC_CTRL, %ecx
> -	xor %edx, %edx
> -	mov %edi, %eax
> -	wrmsr
> +	WRITE_EAX_TO_MSR_IA32_SPEC_CTRL
>  
>  .Lspec_ctrl_done:
>  
> -- 
> 2.49.0
> 


* Re: [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available
  2025-04-10 23:24   ` Sean Christopherson
@ 2025-04-11 16:18     ` Xin Li
  2025-04-11 20:50       ` H. Peter Anvin
  2025-04-11 21:12     ` Jim Mattson
  1 sibling, 1 reply; 55+ messages in thread
From: Xin Li @ 2025-04-11 16:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3, peterz,
	acme, namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, luto,
	boris.ostrovsky, kys, haiyangz, decui

On 4/10/2025 4:24 PM, Sean Christopherson wrote:
>> +/*
>> + * Write EAX to MSR_IA32_SPEC_CTRL.
>> + *
>> + * Choose the best WRMSR instruction based on availability.
>> + *
>> + * Replace with 'wrmsrns' and 'wrmsrns %rax, $MSR_IA32_SPEC_CTRL' once binutils support them.
>> + */
>> +.macro WRITE_EAX_TO_MSR_IA32_SPEC_CTRL
>> +	ALTERNATIVE_2 __stringify(mov $MSR_IA32_SPEC_CTRL, %ecx;		\
>> +				  xor %edx, %edx;				\
>> +				  mov %edi, %eax;				\
>> +				  ds wrmsr),					\
>> +		      __stringify(mov $MSR_IA32_SPEC_CTRL, %ecx;		\
>> +				  xor %edx, %edx;				\
>> +				  mov %edi, %eax;				\
>> +				  ASM_WRMSRNS),					\
>> +		      X86_FEATURE_WRMSRNS,					\
>> +		      __stringify(xor %_ASM_AX, %_ASM_AX;			\
>> +				  mov %edi, %eax;				\
>> +				  ASM_WRMSRNS_RAX; .long MSR_IA32_SPEC_CTRL),	\
>> +		      X86_FEATURE_MSR_IMM
>> +.endm
> This is quite hideous.  I have no objection to optimizing __vmx_vcpu_run(), but
> I would much prefer that a macro like this live in generic code, and that it be
> generic.  It should be easy enough to provide an assembly friendly equivalent to
> __native_wrmsr_constant().

Will do.


* Re: [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available
  2025-04-11 16:18     ` Xin Li
@ 2025-04-11 20:50       ` H. Peter Anvin
  2025-04-12  4:28         ` Xin Li
  0 siblings, 1 reply; 55+ messages in thread
From: H. Peter Anvin @ 2025-04-11 20:50 UTC (permalink / raw)
  To: Xin Li, Sean Christopherson
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, jgross, andrew.cooper3, peterz, acme,
	namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, luto,
	boris.ostrovsky, kys, haiyangz, decui

On April 11, 2025 9:18:08 AM PDT, Xin Li <xin@zytor.com> wrote:
>On 4/10/2025 4:24 PM, Sean Christopherson wrote:
>>> +/*
>>> + * Write EAX to MSR_IA32_SPEC_CTRL.
>>> + *
>>> + * Choose the best WRMSR instruction based on availability.
>>> + *
>>> + * Replace with 'wrmsrns' and 'wrmsrns %rax, $MSR_IA32_SPEC_CTRL' once binutils support them.
>>> + */
>>> +.macro WRITE_EAX_TO_MSR_IA32_SPEC_CTRL
>>> +	ALTERNATIVE_2 __stringify(mov $MSR_IA32_SPEC_CTRL, %ecx;		\
>>> +				  xor %edx, %edx;				\
>>> +				  mov %edi, %eax;				\
>>> +				  ds wrmsr),					\
>>> +		      __stringify(mov $MSR_IA32_SPEC_CTRL, %ecx;		\
>>> +				  xor %edx, %edx;				\
>>> +				  mov %edi, %eax;				\
>>> +				  ASM_WRMSRNS),					\
>>> +		      X86_FEATURE_WRMSRNS,					\
>>> +		      __stringify(xor %_ASM_AX, %_ASM_AX;			\
>>> +				  mov %edi, %eax;				\
>>> +				  ASM_WRMSRNS_RAX; .long MSR_IA32_SPEC_CTRL),	\
>>> +		      X86_FEATURE_MSR_IMM
>>> +.endm
>> This is quite hideous.  I have no objection to optimizing __vmx_vcpu_run(), but
>> I would much prefer that a macro like this live in generic code, and that it be
>> generic.  It should be easy enough to provide an assembly friendly equivalent to
>> __native_wrmsr_constant().
>
>Will do.

This should be coming anyway, right?


* Re: [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available
  2025-04-10 23:24   ` Sean Christopherson
  2025-04-11 16:18     ` Xin Li
@ 2025-04-11 21:12     ` Jim Mattson
  2025-04-12  4:32       ` Xin Li
  1 sibling, 1 reply; 55+ messages in thread
From: Jim Mattson @ 2025-04-11 21:12 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Xin Li (Intel), linux-kernel, linux-perf-users, linux-hyperv,
	virtualization, linux-edac, kvm, xen-devel, linux-ide, linux-pm,
	bpf, llvm, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	andrew.cooper3, peterz, acme, namhyung, mark.rutland,
	alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
	wei.liu, ajay.kaher, alexey.amakhalov, bcm-kernel-feedback-list,
	tony.luck, pbonzini, vkuznets, luto, boris.ostrovsky, kys,
	haiyangz, decui

On Thu, Apr 10, 2025 at 4:24 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Mar 31, 2025, Xin Li (Intel) wrote:
> > Signed-off-by: Xin Li (Intel) <xin@zytor.com>
> > ---
> >  arch/x86/include/asm/msr-index.h |  6 ++++++
> >  arch/x86/kvm/vmx/vmenter.S       | 28 ++++++++++++++++++++++++----
> >  2 files changed, 30 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> > index e6134ef2263d..04244c3ba374 100644
> > --- a/arch/x86/include/asm/msr-index.h
> > +++ b/arch/x86/include/asm/msr-index.h
> > @@ -1226,4 +1226,10 @@
> >                                               * a #GP
> >                                               */
> >
> > +/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
> > +#define ASM_WRMSRNS          _ASM_BYTES(0x0f,0x01,0xc6)
> > +
> > +/* Instruction opcode for the immediate form RDMSR/WRMSRNS */
> > +#define ASM_WRMSRNS_RAX              _ASM_BYTES(0xc4,0xe7,0x7a,0xf6,0xc0)
> > +
> >  #endif /* _ASM_X86_MSR_INDEX_H */
> > diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> > index f6986dee6f8c..9fae43723c44 100644
> > --- a/arch/x86/kvm/vmx/vmenter.S
> > +++ b/arch/x86/kvm/vmx/vmenter.S
> > @@ -64,6 +64,29 @@
> >       RET
> >  .endm
> >
> > +/*
> > + * Write EAX to MSR_IA32_SPEC_CTRL.
> > + *
> > + * Choose the best WRMSR instruction based on availability.
> > + *
> > + * Replace with 'wrmsrns' and 'wrmsrns %rax, $MSR_IA32_SPEC_CTRL' once binutils support them.
> > + */
> > +.macro WRITE_EAX_TO_MSR_IA32_SPEC_CTRL
> > +     ALTERNATIVE_2 __stringify(mov $MSR_IA32_SPEC_CTRL, %ecx;                \
> > +                               xor %edx, %edx;                               \
> > +                               mov %edi, %eax;                               \
> > +                               ds wrmsr),                                    \
> > +                   __stringify(mov $MSR_IA32_SPEC_CTRL, %ecx;                \
> > +                               xor %edx, %edx;                               \
> > +                               mov %edi, %eax;                               \
> > +                               ASM_WRMSRNS),                                 \
> > +                   X86_FEATURE_WRMSRNS,                                      \
> > +                   __stringify(xor %_ASM_AX, %_ASM_AX;                       \
> > +                               mov %edi, %eax;                               \
> > +                               ASM_WRMSRNS_RAX; .long MSR_IA32_SPEC_CTRL),   \
> > +                   X86_FEATURE_MSR_IMM
> > +.endm
>
> This is quite hideous.  I have no objection to optimizing __vmx_vcpu_run(), but
> I would much prefer that a macro like this live in generic code, and that it be
> generic.  It should be easy enough to provide an assembly friendly equivalent to
> __native_wrmsr_constant().

Surely, any CPU that has WRMSRNS also supports "Virtualize
IA32_SPEC_CTRL," right? Shouldn't we be using that feature rather than
swapping host and guest values with some form of WRMSR?

> > +
> >  .section .noinstr.text, "ax"
> >
> >  /**
> > @@ -123,10 +146,7 @@ SYM_FUNC_START(__vmx_vcpu_run)
> >       movl PER_CPU_VAR(x86_spec_ctrl_current), %esi
> >       cmp %edi, %esi
> >       je .Lspec_ctrl_done
> > -     mov $MSR_IA32_SPEC_CTRL, %ecx
> > -     xor %edx, %edx
> > -     mov %edi, %eax
> > -     wrmsr
> > +     WRITE_EAX_TO_MSR_IA32_SPEC_CTRL
> >
> >  .Lspec_ctrl_done:
> >
> > --
> > 2.49.0
> >
>


* Re: [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available
  2025-04-11 20:50       ` H. Peter Anvin
@ 2025-04-12  4:28         ` Xin Li
  0 siblings, 0 replies; 55+ messages in thread
From: Xin Li @ 2025-04-12  4:28 UTC (permalink / raw)
  To: H. Peter Anvin, Sean Christopherson
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, jgross, andrew.cooper3, peterz, acme,
	namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, bcm-kernel-feedback-list,
	tony.luck, pbonzini, vkuznets, luto, boris.ostrovsky, kys,
	haiyangz, decui

On 4/11/2025 1:50 PM, H. Peter Anvin wrote:
>>> This is quite hideous.  I have no objection to optimizing __vmx_vcpu_run(), but
>>> I would much prefer that a macro like this live in generic code, and that it be
>>> generic.  It should be easy enough to provide an assembly friendly equivalent to
>>> __native_wrmsr_constant().
>> Will do.
> This should be coming anyway, right?

Absolutely.

Totally stupid me: we have it ready to use here, but ...



* Re: [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available
  2025-04-11 21:12     ` Jim Mattson
@ 2025-04-12  4:32       ` Xin Li
  2025-04-12 23:10         ` H. Peter Anvin
  0 siblings, 1 reply; 55+ messages in thread
From: Xin Li @ 2025-04-12  4:32 UTC (permalink / raw)
  To: Jim Mattson, Sean Christopherson
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, hpa, jgross, andrew.cooper3, peterz,
	acme, namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, luto,
	boris.ostrovsky, kys, haiyangz, decui

On 4/11/2025 2:12 PM, Jim Mattson wrote:
> Surely, any CPU that has WRMSRNS also supports "Virtualize
> IA32_SPEC_CTRL," right? Shouldn't we be using that feature rather than
> swapping host and guest values with some form of WRMSR?

Good question; the simple answer is that they are irrelevant.


* Re: [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available
  2025-04-12  4:32       ` Xin Li
@ 2025-04-12 23:10         ` H. Peter Anvin
  2025-04-14 17:48           ` Xin Li
  0 siblings, 1 reply; 55+ messages in thread
From: H. Peter Anvin @ 2025-04-12 23:10 UTC (permalink / raw)
  To: Xin Li, Jim Mattson, Sean Christopherson
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, jgross, andrew.cooper3, peterz, acme,
	namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, luto,
	boris.ostrovsky, kys, haiyangz, decui

On April 11, 2025 9:32:32 PM PDT, Xin Li <xin@zytor.com> wrote:
>On 4/11/2025 2:12 PM, Jim Mattson wrote:
>> Surely, any CPU that has WRMSRNS also supports "Virtualize
>> IA32_SPEC_CTRL," right? Shouldn't we be using that feature rather than
>> swapping host and guest values with some form of WRMSR?
>
>Good question, the simple answer is that they are irrelevant.

Also, *in this specific case* IA32_SPEC_CTRL is architecturally nonserializing, i.e. WRMSR executes as WRMSRNS anyway.


* Re: [RFC PATCH v1 13/15] x86/msr: Use the alternatives mechanism to read MSR
  2025-03-31  8:22 ` [RFC PATCH v1 13/15] x86/msr: Use the alternatives mechanism to read MSR Xin Li (Intel)
@ 2025-04-14 17:13   ` Francesco Lavra
  2025-04-17 11:10     ` Xin Li
  0 siblings, 1 reply; 55+ messages in thread
From: Francesco Lavra @ 2025-04-14 17:13 UTC (permalink / raw)
  To: xin
  Cc: acme, adrian.hunter, ajay.kaher, alexander.shishkin,
	alexey.amakhalov, andrew.cooper3, bcm-kernel-feedback-list,
	boris.ostrovsky, bp, bpf, dave.hansen, decui, haiyangz, hpa,
	irogers, jgross, jolsa, kan.liang, kvm, kys, linux-edac,
	linux-hyperv, linux-ide, linux-kernel, linux-perf-users, linux-pm,
	llvm, luto, mark.rutland, mingo, namhyung, pbonzini, peterz,
	seanjc, tglx, tony.luck, virtualization, vkuznets, wei.liu, x86,
	xen-devel

On 2025-03-31 at 8:22, Xin Li (Intel) wrote:
> diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
> index e672632b1cc0..6e7a9daa03d4 100644
> --- a/arch/x86/xen/xen-asm.S
> +++ b/arch/x86/xen/xen-asm.S
> @@ -399,3 +399,37 @@ SYM_CODE_END(xen_entry_SYSCALL_compat)
>  	RET
>  SYM_FUNC_END(asm_xen_write_msr)
>  EXPORT_SYMBOL_GPL(asm_xen_write_msr)
> +
> +/*
> + * The prototype of the Xen C code:
> + * 	struct { u64 val; bool done; } xen_do_read_msr(u32 msr)
> + */
> +SYM_FUNC_START(asm_xen_read_msr)
> +	ENDBR
> +	FRAME_BEGIN
> +	XEN_SAVE_CALLEE_REGS_FOR_MSR
> +	mov %ecx, %edi		/* MSR number */
> +	call xen_do_read_msr
> +	test %dl, %dl		/* %dl=1, i.e., ZF=0, meaning successfully done */
> +	XEN_RESTORE_CALLEE_REGS_FOR_MSR
> +	jnz 2f
> +
> +1:	rdmsr
> +	_ASM_EXTABLE_FUNC_REWIND(1b, -5, FRAME_OFFSET / (BITS_PER_LONG / 8))
> +	shl $0x20, %rdx
> +	or %rdx, %rax
> +	/*
> +	 * The top of the stack points directly at the return address;
> +	 * back up by 5 bytes from the return address.
> +	 */

This works only if this function has been called directly (e.g. via
`call asm_xen_write_msr`), but doesn't work with alternative call types
(like indirect calls). Not sure why one might want to use an indirect
call to invoke asm_xen_write_msr, but this creates a hidden coupling
between caller and callee.
I don't have a suggestion on how to get rid of this coupling, other
than setting ipdelta in _ASM_EXTABLE_FUNC_REWIND() to 0 and adjusting
the _ASM_EXTABLE_TYPE entries at the call sites to consider the
instruction that follows the function call (instead of the call
instruction) as the faulting instruction (which seems pretty ugly, at
least because what follows the function call could be an instruction
that might itself fault). But you may want to make this caveat explicit
in the comment.


* Re: [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available
  2025-04-12 23:10         ` H. Peter Anvin
@ 2025-04-14 17:48           ` Xin Li
  2025-04-15  6:56             ` H. Peter Anvin
  0 siblings, 1 reply; 55+ messages in thread
From: Xin Li @ 2025-04-14 17:48 UTC (permalink / raw)
  To: H. Peter Anvin, Jim Mattson, Sean Christopherson
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, jgross, andrew.cooper3, peterz, acme,
	namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, luto,
	boris.ostrovsky, kys, haiyangz, decui

On 4/12/2025 4:10 PM, H. Peter Anvin wrote:
> Also, *in this specific case* IA32_SPEC_CTRL is architecturally nonserializing, i.e. WRMSR executes as WRMSRNS anyway.

While the immediate form WRMSRNS could be faster because the MSR index
is available *much* earlier in the pipeline, right?


* Re: [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available
  2025-04-14 17:48           ` Xin Li
@ 2025-04-15  6:56             ` H. Peter Anvin
  2025-04-15 17:06               ` Xin Li
  0 siblings, 1 reply; 55+ messages in thread
From: H. Peter Anvin @ 2025-04-15  6:56 UTC (permalink / raw)
  To: Xin Li, Jim Mattson, Sean Christopherson
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, jgross, andrew.cooper3, peterz, acme,
	namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, luto,
	boris.ostrovsky, kys, haiyangz, decui

On April 14, 2025 10:48:47 AM PDT, Xin Li <xin@zytor.com> wrote:
>On 4/12/2025 4:10 PM, H. Peter Anvin wrote:
>> Also, *in this specific case* IA32_SPEC_CTRL is architecturally nonserializing, i.e. WRMSR executes as WRMSRNS anyway.
>
>While the immediate form WRMSRNS could be faster because the MSR index
>is available *much* earlier in the pipeline, right?

Yes, but then it would be redundant with the virtualization support.


* Re: [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available
  2025-04-15  6:56             ` H. Peter Anvin
@ 2025-04-15 17:06               ` Xin Li
  2025-04-15 17:07                 ` H. Peter Anvin
  0 siblings, 1 reply; 55+ messages in thread
From: Xin Li @ 2025-04-15 17:06 UTC (permalink / raw)
  To: H. Peter Anvin, Jim Mattson, Sean Christopherson
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, jgross, andrew.cooper3, peterz, acme,
	namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, luto,
	boris.ostrovsky, kys, haiyangz, decui

On 4/14/2025 11:56 PM, H. Peter Anvin wrote:
>> While the immediate form WRMSRNS could be faster because the MSR index
>> is available *much* earlier in the pipeline, right?
> Yes, but then it would be redundant with the virtualization support.
> 

So better to drop this patch then.


* Re: [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available
  2025-04-15 17:06               ` Xin Li
@ 2025-04-15 17:07                 ` H. Peter Anvin
  0 siblings, 0 replies; 55+ messages in thread
From: H. Peter Anvin @ 2025-04-15 17:07 UTC (permalink / raw)
  To: Xin Li, Jim Mattson, Sean Christopherson
  Cc: linux-kernel, linux-perf-users, linux-hyperv, virtualization,
	linux-edac, kvm, xen-devel, linux-ide, linux-pm, bpf, llvm, tglx,
	mingo, bp, dave.hansen, x86, jgross, andrew.cooper3, peterz, acme,
	namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
	adrian.hunter, kan.liang, wei.liu, ajay.kaher,
	bcm-kernel-feedback-list, tony.luck, pbonzini, vkuznets, luto,
	boris.ostrovsky, kys, haiyangz, decui

On April 15, 2025 10:06:01 AM PDT, Xin Li <xin@zytor.com> wrote:
>On 4/14/2025 11:56 PM, H. Peter Anvin wrote:
>>> arlier in the pipeline, right?
>> Yes, but then it would be redundant with the virtualization support.
>> 
>
>So better to drop this patch then.

Yeah, if it gets pulled in as a consequence of a global change, that is OK, but the local change makes no sense.


* Re: [RFC PATCH v1 13/15] x86/msr: Use the alternatives mechanism to read MSR
  2025-04-14 17:13   ` Francesco Lavra
@ 2025-04-17 11:10     ` Xin Li
  0 siblings, 0 replies; 55+ messages in thread
From: Xin Li @ 2025-04-17 11:10 UTC (permalink / raw)
  To: Francesco Lavra
  Cc: acme, adrian.hunter, ajay.kaher, alexander.shishkin,
	andrew.cooper3, bcm-kernel-feedback-list, boris.ostrovsky, bp,
	bpf, dave.hansen, decui, haiyangz, hpa, irogers, jgross, jolsa,
	kan.liang, kvm, kys, linux-edac, linux-hyperv, linux-ide,
	linux-kernel, linux-perf-users, linux-pm, llvm, luto,
	mark.rutland, mingo, namhyung, pbonzini, peterz, seanjc, tglx,
	tony.luck, virtualization, vkuznets, wei.liu, x86, xen-devel

On 4/14/2025 10:13 AM, Francesco Lavra wrote:
> This works only if this function has been called directly (e.g. via
> `call asm_xen_write_msr`), but doesn't work with alternative call types
> (like indirect calls). Not sure why one might want to use an indirect
> call to invoke asm_xen_write_msr, but this creates a hidden coupling
> between caller and callee.
> I don't have a suggestion on how to get rid of this coupling, other
> than setting ipdelta in _ASM_EXTABLE_FUNC_REWIND() to 0 and adjusting
> the _ASM_EXTABLE_TYPE entries at the call sites to consider the
> instruction that follows the function call (instead of the call
> instruction) as the faulting instruction (which seems pretty ugly, at
> least because what follows the function call could be an instruction
> that might itself fault). But you may want to make this caveat explicit
> in the comment.

Good idea, will state that in the comment.


end of thread, other threads:[~2025-04-17 11:12 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-31  8:22 [RFC PATCH v1 00/15] MSR refactor with new MSR instructions support Xin Li (Intel)
2025-03-31  8:22 ` [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl() Xin Li (Intel)
2025-03-31 10:17   ` Ingo Molnar
2025-03-31 20:32     ` H. Peter Anvin
2025-04-01  5:53       ` Xin Li
2025-04-02 15:41         ` Dave Hansen
2025-04-02 15:56           ` H. Peter Anvin
2025-04-09 19:53             ` Ingo Molnar
2025-04-09 19:56               ` Dave Hansen
2025-04-09 20:11                 ` Ingo Molnar
2025-04-01  7:52       ` Ingo Molnar
2025-04-02  3:45         ` Xin Li
2025-04-02  4:10           ` Ingo Molnar
2025-04-02  4:57             ` Xin Li
2025-04-08 17:34             ` Xin Li
2025-04-03  5:09         ` Xin Li
2025-04-03  6:01           ` H. Peter Anvin
2025-04-09 19:17           ` [PATCH] x86/msr: Standardize on 'u32' MSR indices in <asm/msr.h> Ingo Molnar
2025-03-31 21:45   ` [RFC PATCH v1 01/15] x86/msr: Replace __wrmsr() with native_wrmsrl() Andrew Cooper
2025-04-01  5:13     ` H. Peter Anvin
2025-04-01  5:29       ` Xin Li
2025-04-03  7:13     ` Xin Li
2025-03-31  8:22 ` [RFC PATCH v1 02/15] x86/msr: Replace __rdmsr() with native_rdmsrl() Xin Li (Intel)
2025-03-31 10:26   ` Ingo Molnar
2025-03-31  8:22 ` [RFC PATCH v1 03/15] x86/msr: Simplify pmu_msr_{read,write}() Xin Li (Intel)
2025-03-31  8:22 ` [RFC PATCH v1 04/15] x86/msr: Let pv_cpu_ops.write_msr{_safe}() take an u64 instead of two u32 Xin Li (Intel)
2025-03-31  8:22 ` [RFC PATCH v1 05/15] x86/msr: Replace wrmsr(msr, low, 0) with wrmsrl(msr, value) Xin Li (Intel)
2025-03-31  8:22 ` [RFC PATCH v1 06/15] x86/msr: Remove MSR write APIs that take the MSR value in two u32 arguments Xin Li (Intel)
2025-03-31  8:22 ` [RFC PATCH v1 07/15] x86/msr: Remove pmu_msr_{read,write}() Xin Li (Intel)
2025-03-31  8:22 ` [RFC PATCH v1 08/15] x86/cpufeatures: Add a CPU feature bit for MSR immediate form instructions Xin Li (Intel)
2025-03-31  8:22 ` [RFC PATCH v1 09/15] x86/opcode: Add immediate form MSR instructions to x86-opcode-map Xin Li (Intel)
2025-03-31  8:22 ` [RFC PATCH v1 10/15] KVM: VMX: Use WRMSRNS or its immediate form when available Xin Li (Intel)
2025-03-31 20:27   ` Konrad Rzeszutek Wilk
2025-03-31 20:38     ` Borislav Petkov
2025-03-31 20:41     ` Andrew Cooper
2025-03-31 20:55       ` H. Peter Anvin
2025-03-31 20:45     ` H. Peter Anvin
2025-04-10 23:24   ` Sean Christopherson
2025-04-11 16:18     ` Xin Li
2025-04-11 20:50       ` H. Peter Anvin
2025-04-12  4:28         ` Xin Li
2025-04-11 21:12     ` Jim Mattson
2025-04-12  4:32       ` Xin Li
2025-04-12 23:10         ` H. Peter Anvin
2025-04-14 17:48           ` Xin Li
2025-04-15  6:56             ` H. Peter Anvin
2025-04-15 17:06               ` Xin Li
2025-04-15 17:07                 ` H. Peter Anvin
2025-03-31  8:22 ` [RFC PATCH v1 11/15] x86/extable: Implement EX_TYPE_FUNC_REWIND Xin Li (Intel)
2025-03-31  8:22 ` [RFC PATCH v1 12/15] x86/msr: Use the alternatives mechanism to write MSR Xin Li (Intel)
2025-03-31  8:22 ` [RFC PATCH v1 13/15] x86/msr: Use the alternatives mechanism to read MSR Xin Li (Intel)
2025-04-14 17:13   ` Francesco Lavra
2025-04-17 11:10     ` Xin Li
2025-03-31  8:22 ` [RFC PATCH v1 14/15] x86/extable: Add support for the immediate form MSR instructions Xin Li (Intel)
2025-03-31  8:22 ` [RFC PATCH v1 15/15] x86/msr: Move the ARGS macros after the MSR read/write APIs Xin Li (Intel)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).