* [PATCH v3 1/4] x86/CPU/AMD: Avoid racy updates to MSR_K7_HWCR in set_cpuid_faulting()
2026-06-18 22:45 [PATCH v3 0/4] Fix racy and incorrect updates to MSR_K7_HWCR Jim Mattson
@ 2026-06-18 22:45 ` Jim Mattson
2026-06-18 22:45 ` [PATCH v3 2/4] x86/mce/inject: Avoid racy updates to MSR_K7_HWCR during MCE injection Jim Mattson
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Jim Mattson @ 2026-06-18 22:45 UTC
To: bp, tglx, x86, rafael, viresh.kumar, yosry, andrew.cooper3,
ludloff
Cc: linux-kernel, linux-pm, Jim Mattson
Since msr_set_bit() and msr_clear_bit() perform a non-atomic update to an
MSR, they can race with a write to the same MSR from interrupt context.

On AMD CPUs, set_cpuid_faulting() uses these functions to modify
MSR_K7_HWCR from process context, with preemption disabled but interrupts
enabled. The acpi-cpufreq driver's boost_set_msr() modifies HWCR from
interrupt context. If a crosscall IPI arrives between
set_cpuid_faulting()'s read and write of MSR_K7_HWCR and toggles the Core
Performance Boost disable bit (CPB_DIS), the IPI's update is lost.

This race has been observed empirically on a Turin system running the
acpi-cpufreq driver with a synthetic test. One thread repeatedly toggles
/sys/devices/system/cpu/cpufreq/boost and verifies CPB_DIS on CPU0 after
each write. A second thread pinned to CPU0 calls arch_prctl(ARCH_SET_CPUID,
<val>), with alternating <val>s of 0 and 1. CPB_DIS bit changes are
sometimes lost.

Introduce amd_update_hwcr() to perform an interrupt-safe read-modify-write
of MSR_K7_HWCR, and use it in set_cpuid_faulting() to prevent races with
HWCR updates in interrupt context. Note that when set_cpuid_faulting() is
called from __switch_to_xtra(), interrupts are already disabled, so the
race is only possible on the arch_prctl() paths.
Reported-by: Sashiko (gemini/gemini-3.1-pro-preview)
Closes: https://lore.kernel.org/all/20260609211611.466231-1-jmattson@google.com/
Suggested-by: Borislav Petkov <bp@alien8.de>
Fixes: 65f55a301766 ("x86/CPU/AMD: Add CPUID faulting support")
Assisted-by: Gemini:gemini-3.5-pro
Signed-off-by: Jim Mattson <jmattson@google.com>
---
arch/x86/include/asm/processor.h | 2 ++
arch/x86/kernel/cpu/amd.c | 38 ++++++++++++++++++++++++++++++++
arch/x86/kernel/process.c | 5 +----
3 files changed, 41 insertions(+), 4 deletions(-)
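Not part of the patch: a minimal userspace sketch of the lost update
described in the commit message, under stated assumptions. The names
hwcr, irq_window(), boost_ipi(), rmw_set_bit(), update_irqsafe() and
ipi_update_survives() are hypothetical stand-ins rather than kernel APIs,
and the IPI is modeled as a deterministic hook between the read and the
write, not as a real interrupt. Bit 25 for CPB_DIS is an assumption about
msr-index.h; bit 35 is taken from the msr-index.h context quoted in
patch 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPB_DIS_BIT        25   /* assumed value of MSR_K7_HWCR_CPB_DIS_BIT */
#define CPUID_USER_DIS_BIT 35   /* MSR_K7_HWCR_CPUID_USER_DIS_BIT */

static uint64_t hwcr;               /* models MSR_K7_HWCR on one CPU */
static bool irqs_enabled = true;    /* models the CPU's interrupt flag */
static bool irq_pending;            /* an IPI waiting to toggle CPB_DIS */

/* Models the interrupt-context update done by the cpufreq driver. */
static void boost_ipi(void)
{
	hwcr ^= 1ULL << CPB_DIS_BIT;
}

/* A point where a pending interrupt is delivered, if interrupts are on. */
static void irq_window(void)
{
	if (irqs_enabled && irq_pending) {
		irq_pending = false;
		boost_ipi();
	}
}

/* Non-atomic read-modify-write, the shape of msr_set_bit(). */
static void rmw_set_bit(unsigned int bit)
{
	uint64_t val = hwcr;        /* read */

	irq_window();               /* an IPI may land between read and write */
	val |= 1ULL << bit;
	hwcr = val;                 /* write back a possibly stale value */
}

/* The shape of amd_update_hwcr(): the same RMW with interrupts masked. */
static void update_irqsafe(unsigned int bit)
{
	bool saved = irqs_enabled;

	irqs_enabled = false;       /* stands in for local_irq_save() */
	rmw_set_bit(bit);
	irqs_enabled = saved;       /* stands in for local_irq_restore() */
	irq_window();               /* the deferred IPI runs only now */
}

/* Returns true if the IPI's CPB_DIS update is still visible afterwards. */
static bool ipi_update_survives(bool irqsafe)
{
	hwcr = 0;
	irqs_enabled = true;
	irq_pending = true;

	if (irqsafe)
		update_irqsafe(CPUID_USER_DIS_BIT);
	else
		rmw_set_bit(CPUID_USER_DIS_BIT);

	/* The process-context update itself always lands. */
	assert((hwcr & (1ULL << CPUID_USER_DIS_BIT)) != 0);
	return (hwcr & (1ULL << CPB_DIS_BIT)) != 0;
}
```

In this model ipi_update_survives(false) returns false, because the stale
value overwrites the IPI's change, while ipi_update_survives(true) returns
true, because the IPI is deferred until after the write.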
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 87b1d4c0727e..153b621777ae 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -722,9 +722,11 @@ static __always_inline void amd_clear_divider(void)
}
extern void amd_check_microcode(void);
+extern int amd_update_hwcr(u8 bit, bool set);
#else
static inline void amd_clear_divider(void) { }
static inline void amd_check_microcode(void) { }
+static inline int amd_update_hwcr(u8 bit, bool set) { return -ENODEV; }
#endif
extern unsigned long arch_align_stack(unsigned long sp);
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 487ac147e11f..15e7ca3b815d 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -1320,6 +1320,44 @@ void amd_check_microcode(void)
on_each_cpu(zenbleed_check_cpu, NULL, 1);
}
+/**
+ * amd_update_hwcr - Update MSR_K7_HWCR on the executing CPU
+ * @bit: bit number to change
+ * @set: whether to set or clear the bit
+ *
+ * MSR_K7_HWCR is written from both process context (e.g. CPUID faulting
+ * updates via arch_prctl(ARCH_SET_CPUID)) and interrupt context (e.g.
+ * Core Performance Boost updates IPI'd by the acpi-cpufreq driver), so
+ * a read-modify-write of the MSR must be performed with interrupts
+ * disabled to avoid losing an update made by an intervening interrupt.
+ * All runtime (non-initialization) updates of MSR_K7_HWCR should go
+ * through this helper.
+ *
+ * Context: Any context except NMI. Disabling interrupts does not
+ * serialize against an NMI, so NMI handlers must not write
+ * MSR_K7_HWCR. Warns if called from NMI context.
+ *
+ * Return: 0 on success, negative error code if an MSR access faults.
+ */
+int amd_update_hwcr(u8 bit, bool set)
+{
+ unsigned long flags;
+ int ret;
+
+ if (WARN_ON_ONCE(in_nmi()))
+ return -EINVAL;
+
+ local_irq_save(flags);
+ if (set)
+ ret = msr_set_bit(MSR_K7_HWCR, bit);
+ else
+ ret = msr_clear_bit(MSR_K7_HWCR, bit);
+ local_irq_restore(flags);
+
+ return ret < 0 ? ret : 0;
+}
+EXPORT_SYMBOL_GPL(amd_update_hwcr);
+
static const char * const s5_reset_reason_txt[] = {
[0] = "thermal pin BP_THERMTRIP_L was tripped",
[1] = "power button was pressed for 4 seconds",
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index a554f19c9973..8895cc0fa472 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -354,10 +354,7 @@ static void set_cpuid_faulting(bool on)
this_cpu_write(msr_misc_features_shadow, msrval);
wrmsrq(MSR_MISC_FEATURES_ENABLES, msrval);
} else if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
- if (on)
- msr_set_bit(MSR_K7_HWCR, MSR_K7_HWCR_CPUID_USER_DIS_BIT);
- else
- msr_clear_bit(MSR_K7_HWCR, MSR_K7_HWCR_CPUID_USER_DIS_BIT);
+ amd_update_hwcr(MSR_K7_HWCR_CPUID_USER_DIS_BIT, on);
}
}
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v3 2/4] x86/mce/inject: Avoid racy updates to MSR_K7_HWCR during MCE injection
From: Jim Mattson @ 2026-06-18 22:45 UTC
To: bp, tglx, x86, rafael, viresh.kumar, yosry, andrew.cooper3,
ludloff
Cc: linux-kernel, linux-pm, Jim Mattson
MCE injection sets the MCA status write-enable bit in MSR_K7_HWCR before
the actual MSR writes (prepare_msrs) and clears it afterwards. Each toggle
(toggle_hw_mce_inject) performs its read-modify-write of MSR_K7_HWCR as
two independent crosscalls, rdmsrq_on_cpu() and wrmsrq_on_cpu(). Another
HWCR update on the target CPU could be lost if it occurred between these
crosscalls.

Introduce ipi_inject_mce() to perform the entire injection sequence (toggle
ON, write MSRs, toggle OFF) in a single IPI callback, ensuring atomicity on
the target CPU.

For the local CPU initialization path in check_hw_inj_possible(), use
amd_update_hwcr() directly to avoid IPI overhead and ensure safe updates.

Remove toggle_hw_mce_inject() as it is no longer used.

Opportunistically, replace the open-coded BIT(18) with a new
MSR_K7_HWCR_MCSTATUSWREN macro.
Fixes: 21690934d934 ("EDAC, mce_amd_inj: Enable direct writes to MCE MSRs")
Link: https://sashiko.dev/#/patchset/20260612215729.1532175-1-jmattson%40google.com?part=2
Assisted-by: Gemini:gemini-3.5-pro
Signed-off-by: Jim Mattson <jmattson@google.com>
---
arch/x86/include/asm/msr-index.h | 2 ++
arch/x86/kernel/cpu/mce/inject.c | 46 ++++++++++----------------------
2 files changed, 16 insertions(+), 32 deletions(-)
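Not part of the patch: a small userspace model of the toggle ON, write,
toggle OFF sequence that ipi_inject_mce() runs in one callback. struct
cpu_model, update_hwcr(), write_mc_status() and inject_in_one_callback()
are hypothetical names for this sketch. The model simply rejects a status
write while the enable bit is clear; on real hardware such a write is
expected to fault instead, which this sketch does not attempt to
reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MCSTATUSWREN_BIT 18     /* MSR_K7_HWCR_MCSTATUSWREN_BIT in the patch */

/* One CPU's view of the two registers involved (illustrative only). */
struct cpu_model {
	uint64_t hwcr;          /* models MSR_K7_HWCR */
	uint64_t mc_status;     /* models one MCA status register */
};

/* Set or clear a single HWCR bit, leaving every other bit untouched. */
static void update_hwcr(struct cpu_model *c, unsigned int bit, bool set)
{
	assert(bit < 64);
	if (set)
		c->hwcr |= 1ULL << bit;
	else
		c->hwcr &= ~(1ULL << bit);
}

/* Simplification: the write only takes effect while the enable bit is set. */
static bool write_mc_status(struct cpu_model *c, uint64_t val)
{
	if (!(c->hwcr & (1ULL << MCSTATUSWREN_BIT)))
		return false;
	c->mc_status = val;
	return true;
}

/* Whole sequence in one step, so nothing on this CPU can interleave. */
static bool inject_in_one_callback(struct cpu_model *c, uint64_t status)
{
	bool ok;

	update_hwcr(c, MCSTATUSWREN_BIT, true);
	ok = write_mc_status(c, status);
	update_hwcr(c, MCSTATUSWREN_BIT, false);
	return ok;
}
```

After inject_in_one_callback() returns, the enable bit is clear again and
any unrelated HWCR bits that were set beforehand are unchanged.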
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 86554de9a3f5..29c4abade594 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -896,6 +896,8 @@
#define MSR_K7_HWCR 0xc0010015
#define MSR_K7_HWCR_SMMLOCK_BIT 0
#define MSR_K7_HWCR_SMMLOCK BIT_ULL(MSR_K7_HWCR_SMMLOCK_BIT)
+#define MSR_K7_HWCR_MCSTATUSWREN_BIT 18
+#define MSR_K7_HWCR_MCSTATUSWREN BIT_ULL(MSR_K7_HWCR_MCSTATUSWREN_BIT)
#define MSR_K7_HWCR_IRPERF_EN_BIT 30
#define MSR_K7_HWCR_IRPERF_EN BIT_ULL(MSR_K7_HWCR_IRPERF_EN_BIT)
#define MSR_K7_HWCR_CPUID_USER_DIS_BIT 35
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 6f8a49d8baeb..1e017d8e23e4 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -31,6 +31,7 @@
#include <asm/mce.h>
#include <asm/msr.h>
#include <asm/nmi.h>
+#include <asm/processor.h>
#include <asm/smp.h>
#include "internal.h"
@@ -311,30 +312,6 @@ static struct notifier_block inject_nb = {
.notifier_call = mce_inject_raise,
};
-/*
- * Caller needs to be make sure this cpu doesn't disappear
- * from under us, i.e.: get_cpu/put_cpu.
- */
-static int toggle_hw_mce_inject(unsigned int cpu, bool enable)
-{
- struct msr val;
- int err;
-
- err = rdmsrq_on_cpu(cpu, MSR_K7_HWCR, &val.q);
- if (err) {
- pr_err("%s: error reading HWCR\n", __func__);
- return err;
- }
-
- enable ? (val.l |= BIT(18)) : (val.l &= ~BIT(18));
-
- err = wrmsrq_on_cpu(cpu, MSR_K7_HWCR, val.q);
- if (err)
- pr_err("%s: error writing HWCR\n", __func__);
-
- return err;
-}
-
static int __set_inj(const char *buf)
{
int i;
@@ -500,6 +477,12 @@ static void prepare_msrs(void *info)
wrmsrq(MSR_IA32_MCx_MISC(b), m.misc);
}
}
+static void ipi_inject_mce(void *info)
+{
+ amd_update_hwcr(MSR_K7_HWCR_MCSTATUSWREN_BIT, true);
+ prepare_msrs(info);
+ amd_update_hwcr(MSR_K7_HWCR_MCSTATUSWREN_BIT, false);
+}
static void do_inject(void)
{
@@ -556,13 +539,13 @@ static void do_inject(void)
if (!cpu_online(cpu))
goto err;
- toggle_hw_mce_inject(cpu, true);
-
i_mce.mcgstatus = mcg_status;
i_mce.inject_flags = inj_type;
- smp_call_function_single(cpu, prepare_msrs, &i_mce, 0);
- toggle_hw_mce_inject(cpu, false);
+ if (smp_call_function_single(cpu, ipi_inject_mce, &i_mce, 1)) {
+ pr_err("%s: Error injecting MCE on CPU %d\n", __func__, cpu);
+ goto err;
+ }
switch (inj_type) {
case DFR_INT_INJ:
@@ -727,7 +710,6 @@ static void __init debugfs_init(void)
static void check_hw_inj_possible(void)
{
- int cpu;
u8 bank;
/*
@@ -737,7 +719,7 @@ static void check_hw_inj_possible(void)
if (!cpu_feature_enabled(X86_FEATURE_SMCA))
return;
- cpu = get_cpu();
+ get_cpu();
for (bank = 0; bank < MAX_NR_BANKS; ++bank) {
u64 status = MCI_STATUS_VAL, ipid;
@@ -747,7 +729,7 @@ static void check_hw_inj_possible(void)
if (!ipid)
continue;
- toggle_hw_mce_inject(cpu, true);
+ amd_update_hwcr(MSR_K7_HWCR_MCSTATUSWREN_BIT, true);
wrmsrq_safe(mca_msr_reg(bank, MCA_STATUS), status);
rdmsrq_safe(mca_msr_reg(bank, MCA_STATUS), &status);
@@ -759,7 +741,7 @@ static void check_hw_inj_possible(void)
"Try using APEI EINJ instead.\n");
}
- toggle_hw_mce_inject(cpu, false);
+ amd_update_hwcr(MSR_K7_HWCR_MCSTATUSWREN_BIT, false);
break;
}
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v3 3/4] cpufreq: ACPI: Use IPI to update boost MSR in cpufreq_boost_down_prep()
From: Jim Mattson @ 2026-06-18 22:45 UTC
To: bp, tglx, x86, rafael, viresh.kumar, yosry, andrew.cooper3,
ludloff
Cc: linux-kernel, linux-pm, Jim Mattson, Sashiko
During driver exit or CPU hotplug, acpi_cpufreq_cpu_exit() calls
cpufreq_boost_down_prep() to re-enable boost on the target CPU. However,
cpufreq_boost_down_prep() ignores the target CPU parameter and calls
boost_set_msr(1) locally. Since this runs in an unbound process context, it
updates the local CPU's MSR while leaving the target CPU's boost state
unchanged.

On Intel platforms, the open-coded read-modify-write of
MSR_IA32_MISC_ENABLE in boost_set_msr() is vulnerable to preemption and
thread migration if executed in process context.

Fix both issues by routing the MSR update through
smp_call_function_single() to execute boost_set_msr() synchronously on the
target CPU. This ensures the correct CPU is updated and that the RMW
sequence on Intel executes safely in IPI context with interrupts disabled.
Fixes: a3605c46e0c0 ("cpufreq: acpi-cpufreq: drop rdmsr_on_cpus() usage")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260612215729.1532175-1-jmattson%40google.com
Assisted-by: Gemini:gemini-3.5-pro
Signed-off-by: Jim Mattson <jmattson@google.com>
---
drivers/cpufreq/acpi-cpufreq.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
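Not part of the patch: a userspace sketch of the wrong-CPU problem, with
per-CPU MSRs modeled as an array. hwcr[], current_cpu, boost_set_local(),
run_on_cpu(), down_prep_old(), down_prep_new() and
disable_boost_everywhere() are hypothetical names. run_on_cpu() only
stands in for a synchronous smp_call_function_single(); it does not model
real IPI delivery. Bit 25 for CPB_DIS is an assumption about msr-index.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4
#define CPB_DIS (1ULL << 25)    /* assumed position of the boost-disable bit */

static uint64_t hwcr[NR_CPUS];      /* one modeled MSR per CPU */
static unsigned int current_cpu;    /* CPU the calling thread runs on */

/* Like boost_set_msr(): always acts on whichever CPU executes it. */
static void boost_set_local(bool enable)
{
	if (enable)
		hwcr[current_cpu] &= ~CPB_DIS;
	else
		hwcr[current_cpu] |= CPB_DIS;
}

/* Old shape: the cpu argument is ignored, so the local CPU is updated. */
static void down_prep_old(unsigned int cpu)
{
	(void)cpu;
	boost_set_local(true);
}

/* Stand-in for a synchronous cross-CPU call. */
static void run_on_cpu(unsigned int cpu, void (*fn)(bool), bool arg)
{
	unsigned int saved = current_cpu;

	assert(cpu < NR_CPUS);
	current_cpu = cpu;
	fn(arg);
	current_cpu = saved;
}

/* New shape: the update executes on the CPU that was asked for. */
static void down_prep_new(unsigned int cpu)
{
	run_on_cpu(cpu, boost_set_local, true);
}

/* Test helper: start from "boost disabled" on every modeled CPU. */
static void disable_boost_everywhere(void)
{
	for (unsigned int i = 0; i < NR_CPUS; i++)
		hwcr[i] = CPB_DIS;
}
```

With the caller on CPU 0 and CPU 3 as the target, down_prep_old(3) clears
the bit on CPU 0 only, while down_prep_new(3) clears it on CPU 3 and
leaves CPU 0 alone.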
diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 21639d9ac753..48d678812fd6 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -528,10 +528,11 @@ static void free_acpi_perf_data(void)
static int cpufreq_boost_down_prep(unsigned int cpu)
{
/*
- * Clear the boost-disable bit on the CPU_DOWN path so that
- * this cpu cannot block the remaining ones from boosting.
+ * Clear the boost-disable bit on the target CPU so that
+ * it cannot block the remaining ones from boosting.
*/
- return boost_set_msr(1);
+ return smp_call_function_single(cpu, boost_set_msr_each,
+ (void *)1L, 1);
}
/*
--
2.55.0.rc0.799.gd6f94ed593-goog
* [PATCH v3 4/4] cpufreq: ACPI: Use amd_update_hwcr() for MSR_K7_HWCR updates
From: Jim Mattson @ 2026-06-18 22:45 UTC
To: bp, tglx, x86, rafael, viresh.kumar, yosry, andrew.cooper3,
ludloff
Cc: linux-kernel, linux-pm, Jim Mattson
For consistency and maintainability, update boost_set_msr() to use
the centralized amd_update_hwcr() helper when modifying MSR_K7_HWCR.
Suggested-by: Borislav Petkov <bp@alien8.de>
Assisted-by: Gemini:gemini-3.5-pro
Signed-off-by: Jim Mattson <jmattson@google.com>
---
drivers/cpufreq/acpi-cpufreq.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
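Not part of the patch: a short sketch of the polarity involved in passing
!enable to the helper. CPB_DIS is a disable bit, so enabling boost means
clearing it. update_bit(), boost_set() and boost_enabled() are
hypothetical pure functions on a plain 64-bit value, and bit 25 is an
assumption about the value of MSR_K7_HWCR_CPB_DIS_BIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPB_DIS_BIT 25  /* assumed value of MSR_K7_HWCR_CPB_DIS_BIT */

/* Return val with one bit set or cleared and all other bits preserved. */
static uint64_t update_bit(uint64_t val, unsigned int bit, bool set)
{
	assert(bit < 64);
	return set ? (val | (1ULL << bit)) : (val & ~(1ULL << bit));
}

/* Enabling boost clears the disable bit, hence the inverted argument. */
static uint64_t boost_set(uint64_t hwcr, bool enable)
{
	return update_bit(hwcr, CPB_DIS_BIT, !enable);
}

/* Boost is on exactly when the disable bit is clear. */
static bool boost_enabled(uint64_t hwcr)
{
	return (hwcr & (1ULL << CPB_DIS_BIT)) == 0;
}
```

Passing enable straight through instead of !enable would silently invert
the sysfs boost knob, which is why the inversion is worth spelling out.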
diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 48d678812fd6..b3e058dd6b4b 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -103,9 +103,7 @@ static int boost_set_msr(bool enable)
break;
case X86_VENDOR_HYGON:
case X86_VENDOR_AMD:
- msr_addr = MSR_K7_HWCR;
- msr_mask = MSR_K7_HWCR_CPB_DIS;
- break;
+ return amd_update_hwcr(MSR_K7_HWCR_CPB_DIS_BIT, !enable);
default:
return -EINVAL;
}
--
2.55.0.rc0.799.gd6f94ed593-goog