* [PATCH 0/9] x86/mwait-idle: sync up with Linux 7.0-rc
@ 2026-03-12 16:53 Jan Beulich
2026-03-12 16:54 ` [PATCH 1/9] x86/mwait-idle: arrange for BSP MSR adjustments during S3 resume Jan Beulich
` (9 more replies)
0 siblings, 10 replies; 27+ messages in thread
From: Jan Beulich @ 2026-03-12 16:53 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org; +Cc: Andrew Cooper, Roger Pau Monné
Includes a few custom changes, too.
1: arrange for BSP MSR adjustments during S3 resume
2: clean up BYT/CHT auto demotion disable
3: latch struct idle_cpu contents
4: move pre-initialized struct idle_cpu instances
5: Remove unused driver version constant
6: Remove the 'preferred_cstates' parameter
7: drop const from struct cpuidle_state arrays
8: Add cmdline option to adjust C-states table
9: Add C-states validation
Jan
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 1/9] x86/mwait-idle: arrange for BSP MSR adjustments during S3 resume
2026-03-12 16:53 [PATCH 0/9] x86/mwait-idle: sync up with Linux 7.0-rc Jan Beulich
@ 2026-03-12 16:54 ` Jan Beulich
2026-04-24 14:34 ` Roger Pau Monné
2026-03-12 16:54 ` [PATCH 2/9] x86/mwait-idle: clean up BYT/CHT auto demotion disable Jan Beulich
` (8 subsequent siblings)
9 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2026-03-12 16:54 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org; +Cc: Andrew Cooper, Roger Pau Monné
mwait_idle_cpu_init() is only called for APs, yet MSR writes will
typically need re-doing post-S3 even for the BSP. When multiple cores /
threads are present (and to come back online) in a package, for package
scope MSRs this may be covered by APs doing the writes, but we can't rely
on that.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -28,6 +28,7 @@
#include <asm/io_apic.h>
#include <asm/irq.h>
#include <asm/microcode.h>
+#include <asm/mwait.h>
#include <asm/prot-key.h>
#include <asm/spec_ctrl.h>
#include <asm/tboot.h>
@@ -299,6 +300,7 @@ static int enter_state(u32 state)
acpi_sleep_post(state);
if ( hvm_cpu_up() )
BUG();
+ mwait_idle_resume();
cpufreq_add_cpu(0);
enable_cpu:
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -1680,6 +1680,28 @@ static int __init mwait_idle_probe(void)
return 0;
}
+static void mwait_idle_cpu_tweak(unsigned int cpu)
+{
+ if (icpu->auto_demotion_disable_flags)
+ on_selected_cpus(cpumask_of(cpu), auto_demotion_disable, NULL, 1);
+
+ if (icpu->byt_auto_demotion_disable_flag)
+ on_selected_cpus(cpumask_of(cpu), byt_auto_demotion_disable, NULL, 1);
+
+ switch (icpu->c1e_promotion) {
+ case C1E_PROMOTION_DISABLE:
+ on_selected_cpus(cpumask_of(cpu), c1e_promotion_disable, NULL, 1);
+ break;
+
+ case C1E_PROMOTION_ENABLE:
+ on_selected_cpus(cpumask_of(cpu), c1e_promotion_enable, NULL, 1);
+ break;
+
+ case C1E_PROMOTION_PRESERVE:
+ break;
+ }
+}
+
static int cf_check mwait_idle_cpu_init(
struct notifier_block *nfb, unsigned long action, void *hcpu)
{
@@ -1762,24 +1784,7 @@ static int cf_check mwait_idle_cpu_init(
dev->count++;
}
- if (icpu->auto_demotion_disable_flags)
- on_selected_cpus(cpumask_of(cpu), auto_demotion_disable, NULL, 1);
-
- if (icpu->byt_auto_demotion_disable_flag)
- on_selected_cpus(cpumask_of(cpu), byt_auto_demotion_disable, NULL, 1);
-
- switch (icpu->c1e_promotion) {
- case C1E_PROMOTION_DISABLE:
- on_selected_cpus(cpumask_of(cpu), c1e_promotion_disable, NULL, 1);
- break;
-
- case C1E_PROMOTION_ENABLE:
- on_selected_cpus(cpumask_of(cpu), c1e_promotion_enable, NULL, 1);
- break;
-
- case C1E_PROMOTION_PRESERVE:
- break;
- }
+ mwait_idle_cpu_tweak(cpu);
return NOTIFY_DONE;
}
@@ -1811,6 +1816,14 @@ int __init mwait_idle_init(struct notifi
return err;
}
+void mwait_idle_resume(void)
+{
+ if (!icpu)
+ return;
+
+ mwait_idle_cpu_tweak(smp_processor_id());
+}
+
/* Helper function for HPET. */
bool __init mwait_pc10_supported(void)
{
--- a/xen/arch/x86/include/asm/mwait.h
+++ b/xen/arch/x86/include/asm/mwait.h
@@ -14,9 +14,12 @@
#define MWAIT_ECX_INTERRUPT_BREAK 0x1
void mwait_idle_with_hints(unsigned int eax, unsigned int ecx);
+
#ifdef CONFIG_INTEL
+void mwait_idle_resume(void);
bool mwait_pc10_supported(void);
#else
+static inline void mwait_idle_resume(void) {}
static inline bool mwait_pc10_supported(void)
{
return false;
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 2/9] x86/mwait-idle: clean up BYT/CHT auto demotion disable
2026-03-12 16:53 [PATCH 0/9] x86/mwait-idle: sync up with Linux 7.0-rc Jan Beulich
2026-03-12 16:54 ` [PATCH 1/9] x86/mwait-idle: arrange for BSP MSR adjustments during S3 resume Jan Beulich
@ 2026-03-12 16:54 ` Jan Beulich
2026-04-24 14:47 ` Roger Pau Monné
2026-03-12 16:55 ` [PATCH 3/9] x86/mwait-idle: latch struct idle_cpu contents Jan Beulich
` (7 subsequent siblings)
9 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2026-03-12 16:54 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org; +Cc: Andrew Cooper, Roger Pau Monné
Bay Trail (BYT) and Cherry Trail (CHT) platforms have a very specific way
of disabling auto-demotion via specific MSR bits. Clean up the code so that
BYT/CHT-specifics do not show up in the common 'struct idle_cpu' data
structure.
Remove the 'byt_auto_demotion_disable_flag' flag from 'struct idle_cpu',
because a better coding pattern is to avoid very case-specific fields like
'bool byt_auto_demotion_disable_flag' in a common data structure, which is
used for all platforms, not only BYT/CHT. The code is just more readable
when common data structures contain only commonly used fields.
Instead, match BYT/CHT in the 'intel_idle_init_cstates_icpu()' function,
and introduce a small helper to take care of BYT/CHT auto-demotion. This
is consistent with how platform-specific things are done for other
platforms.
No intended functional changes.
Inspired by (and description largely taken from) Linux'es c93d13b661a6
("intel_idle: clean up BYT/CHT auto demotion disable").
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -104,7 +104,6 @@ struct idle_cpu {
* Indicate which enable bits to clear here.
*/
unsigned long auto_demotion_disable_flags;
- bool byt_auto_demotion_disable_flag;
enum c1e_promotion c1e_promotion;
};
@@ -1144,7 +1143,7 @@ static void cf_check auto_demotion_disab
wrmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr_bits);
}
-static void cf_check byt_auto_demotion_disable(void *dummy)
+static void byt_cht_auto_demotion_disable(void)
{
wrmsrl(MSR_CC6_DEMOTION_POLICY_CONFIG, 0);
wrmsrl(MSR_MC6_DEMOTION_POLICY_CONFIG, 0);
@@ -1195,13 +1194,11 @@ static const struct idle_cpu idle_cpu_sn
static const struct idle_cpu idle_cpu_byt = {
.state_table = byt_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
- .byt_auto_demotion_disable_flag = true,
};
static const struct idle_cpu idle_cpu_cht = {
.state_table = cht_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
- .byt_auto_demotion_disable_flag = true,
};
static const struct idle_cpu idle_cpu_ivb = {
@@ -1680,14 +1677,11 @@ static int __init mwait_idle_probe(void)
return 0;
}
-static void mwait_idle_cpu_tweak(unsigned int cpu)
+static void mwait_idle_cpu_tweak(unsigned int cpu, bool bsp)
{
if (icpu->auto_demotion_disable_flags)
on_selected_cpus(cpumask_of(cpu), auto_demotion_disable, NULL, 1);
- if (icpu->byt_auto_demotion_disable_flag)
- on_selected_cpus(cpumask_of(cpu), byt_auto_demotion_disable, NULL, 1);
-
switch (icpu->c1e_promotion) {
case C1E_PROMOTION_DISABLE:
on_selected_cpus(cpumask_of(cpu), c1e_promotion_disable, NULL, 1);
@@ -1700,12 +1694,24 @@ static void mwait_idle_cpu_tweak(unsigne
case C1E_PROMOTION_PRESERVE:
break;
}
+
+ /* Pkg-scope MSRs on 1-socket-only systems need writing only once. */
+ if (!bsp)
+ return;
+
+ switch (boot_cpu_data.vfm) {
+ case INTEL_ATOM_SILVERMONT:
+ case INTEL_ATOM_AIRMONT:
+ byt_cht_auto_demotion_disable();
+ break;
+ }
}
static int cf_check mwait_idle_cpu_init(
struct notifier_block *nfb, unsigned long action, void *hcpu)
{
unsigned int cpu = (unsigned long)hcpu, cstate;
+ static bool first;
struct acpi_processor_power *dev = processor_powers[cpu];
switch (action) {
@@ -1784,7 +1790,8 @@ static int cf_check mwait_idle_cpu_init(
dev->count++;
}
- mwait_idle_cpu_tweak(cpu);
+ mwait_idle_cpu_tweak(cpu, first);
+ first = false;
return NOTIFY_DONE;
}
@@ -1821,7 +1828,7 @@ void mwait_idle_resume(void)
if (!icpu)
return;
- mwait_idle_cpu_tweak(smp_processor_id());
+ mwait_idle_cpu_tweak(smp_processor_id(), true);
}
/* Helper function for HPET. */
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 3/9] x86/mwait-idle: latch struct idle_cpu contents
2026-03-12 16:53 [PATCH 0/9] x86/mwait-idle: sync up with Linux 7.0-rc Jan Beulich
2026-03-12 16:54 ` [PATCH 1/9] x86/mwait-idle: arrange for BSP MSR adjustments during S3 resume Jan Beulich
2026-03-12 16:54 ` [PATCH 2/9] x86/mwait-idle: clean up BYT/CHT auto demotion disable Jan Beulich
@ 2026-03-12 16:55 ` Jan Beulich
2026-04-24 15:24 ` Roger Pau Monné
2026-03-12 16:55 ` [PATCH 4/9] x86/mwait-idle: move pre-initialized struct idle_cpu instances Jan Beulich
` (6 subsequent siblings)
9 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2026-03-12 16:55 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org; +Cc: Andrew Cooper, Roger Pau Monné
Rather than storing a pointer (and needing to keep all struct instances in
memory post-init), and rather than (like the Linux counterpart has it)
keeping individual variables, simply copy the respective structure
instance. By implication, subsequent updates now need doing to the copy.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -107,14 +107,14 @@ struct idle_cpu {
enum c1e_promotion c1e_promotion;
};
-static const struct idle_cpu *__ro_after_init icpu;
+static struct idle_cpu __ro_after_init icpu;
-static const struct cpuidle_state {
+struct cpuidle_state {
char name[16];
unsigned int flags;
unsigned int exit_latency; /* in US */
unsigned int target_residency; /* in US */
-} *__ro_after_init cpuidle_state_table;
+};
#define CPUIDLE_FLAG_DISABLED 0x1
/*
@@ -1097,7 +1097,7 @@ static void cf_check mwait_idle(void)
* leave_mm() to avoid costly and often unnecessary wakeups
* for flushing the user TLB's associated with the active mm.
*/
- if (cpuidle_state_table[].flags & CPUIDLE_FLAG_TLB_FLUSHED)
+ if (icpu.state_table[].flags & CPUIDLE_FLAG_TLB_FLUSHED)
leave_mm(cpu);
#endif
@@ -1139,7 +1139,7 @@ static void cf_check auto_demotion_disab
u64 msr_bits;
rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr_bits);
- msr_bits &= ~(icpu->auto_demotion_disable_flags);
+ msr_bits &= ~icpu.auto_demotion_disable_flags;
wrmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr_bits);
}
@@ -1371,10 +1371,10 @@ static void __init ivt_idle_state_table_
/* 1 and 2 socket systems use default ivt_cstates */
break;
case 2: case 3:
- cpuidle_state_table = ivt_cstates_4s;
+ icpu.state_table = ivt_cstates_4s;
break;
default:
- cpuidle_state_table = ivt_cstates_8s;
+ icpu.state_table = ivt_cstates_8s;
break;
}
}
@@ -1525,14 +1525,12 @@ static void __init adl_idle_state_table_
adl_l_cstates[1].flags |= CPUIDLE_FLAG_DISABLED;
/* Disable C1E by clearing the "C1E promotion" bit. */
- idle_cpu_adl.c1e_promotion = C1E_PROMOTION_DISABLE;
- idle_cpu_adl_l.c1e_promotion = C1E_PROMOTION_DISABLE;
+ icpu.c1e_promotion = C1E_PROMOTION_DISABLE;
return;
}
/* Make sure C1E is enabled by default */
- idle_cpu_adl.c1e_promotion = C1E_PROMOTION_ENABLE;
- idle_cpu_adl_l.c1e_promotion = C1E_PROMOTION_ENABLE;
+ icpu.c1e_promotion = C1E_PROMOTION_ENABLE;
}
/*
@@ -1593,6 +1591,7 @@ static int __init mwait_idle_probe(void)
{
unsigned int eax, ebx, ecx;
const struct x86_cpu_id *id;
+ const struct idle_cpu *idle_cpu;
const char *str;
if (boot_cpu_data.vendor != X86_VENDOR_INTEL)
@@ -1627,8 +1626,8 @@ static int __init mwait_idle_probe(void)
pr_debug(PREFIX "MWAIT substates: %#x\n", mwait_substates);
- icpu = id->driver_data;
- cpuidle_state_table = icpu->state_table;
+ idle_cpu = id->driver_data;
+ icpu = *idle_cpu;
if (boot_cpu_has(X86_FEATURE_XEN_ARAT))
lapic_timer_reliable_states = LAPIC_TIMER_ALWAYS_RELIABLE;
@@ -1647,7 +1646,7 @@ static int __init mwait_idle_probe(void)
const char *ss;
do {
- const struct cpuidle_state *state = icpu->state_table;
+ const struct cpuidle_state *state = idle_cpu->state_table;
unsigned int bit = 1;
ss = strchr(str, ',');
@@ -1679,10 +1678,10 @@ static int __init mwait_idle_probe(void)
static void mwait_idle_cpu_tweak(unsigned int cpu, bool bsp)
{
- if (icpu->auto_demotion_disable_flags)
+ if (icpu.auto_demotion_disable_flags)
on_selected_cpus(cpumask_of(cpu), auto_demotion_disable, NULL, 1);
- switch (icpu->c1e_promotion) {
+ switch (icpu.c1e_promotion) {
case C1E_PROMOTION_DISABLE:
on_selected_cpus(cpumask_of(cpu), c1e_promotion_disable, NULL, 1);
break;
@@ -1735,11 +1734,11 @@ static int cf_check mwait_idle_cpu_init(
dev->count = 1;
- for (cstate = 0; cpuidle_state_table[cstate].target_residency; ++cstate) {
+ for (cstate = 0; icpu.state_table[cstate].target_residency; ++cstate) {
unsigned int num_substates, hint, state;
struct acpi_processor_cx *cx;
- hint = flg2MWAIT(cpuidle_state_table[cstate].flags);
+ hint = flg2MWAIT(icpu.state_table[cstate].flags);
state = MWAIT_HINT2CSTATE(hint) + 1;
if (state > max_cstate) {
@@ -1755,10 +1754,10 @@ static int cf_check mwait_idle_cpu_init(
continue;
/* if state marked as disabled, skip it */
- if (cpuidle_state_table[cstate].flags &
+ if (icpu.state_table[cstate].flags &
CPUIDLE_FLAG_DISABLED) {
printk(XENLOG_DEBUG PREFIX "state %s is disabled\n",
- cpuidle_state_table[cstate].name);
+ icpu.state_table[cstate].name);
continue;
}
@@ -1776,15 +1775,15 @@ static int cf_check mwait_idle_cpu_init(
cx->type = state;
cx->address = hint;
cx->entry_method = ACPI_CSTATE_EM_FFH;
- cx->latency = cpuidle_state_table[cstate].exit_latency;
+ cx->latency = icpu.state_table[cstate].exit_latency;
cx->target_residency =
- cpuidle_state_table[cstate].target_residency;
- if ((cpuidle_state_table[cstate].flags &
+ icpu.state_table[cstate].target_residency;
+ if ((icpu.state_table[cstate].flags &
CPUIDLE_FLAG_IRQ_ENABLE) &&
/* cstate_restore_tsc() needs to be a no-op */
boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
cx->irq_enable_early = true;
- if (cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IBRS)
+ if (icpu.state_table[cstate].flags & CPUIDLE_FLAG_IBRS)
cx->ibrs_disable = true;
dev->count++;
@@ -1825,7 +1824,7 @@ int __init mwait_idle_init(struct notifi
void mwait_idle_resume(void)
{
- if (!icpu)
+ if (!icpu.state_table)
return;
mwait_idle_cpu_tweak(smp_processor_id(), true);
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 4/9] x86/mwait-idle: move pre-initialized struct idle_cpu instances
2026-03-12 16:53 [PATCH 0/9] x86/mwait-idle: sync up with Linux 7.0-rc Jan Beulich
` (2 preceding siblings ...)
2026-03-12 16:55 ` [PATCH 3/9] x86/mwait-idle: latch struct idle_cpu contents Jan Beulich
@ 2026-03-12 16:55 ` Jan Beulich
2026-04-24 15:33 ` Roger Pau Monné
2026-03-12 16:56 ` [PATCH 5/9] x86/mwait-idle: Remove unused driver version constant Jan Beulich
` (5 subsequent siblings)
9 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2026-03-12 16:55 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org; +Cc: Andrew Cooper, Roger Pau Monné
Now that they're not referenced anymore post-init, they can themselves
move into .init.rodata. (idle_cpu_adl{,_l} can also become const in the
first place.)
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -1167,132 +1167,132 @@ static void cf_check c1e_promotion_disab
wrmsrl(MSR_IA32_POWER_CTL, msr_bits);
}
-static const struct idle_cpu idle_cpu_nehalem = {
+static const struct idle_cpu __initconstrel idle_cpu_nehalem = {
.state_table = nehalem_cstates,
.auto_demotion_disable_flags = NHM_C1_AUTO_DEMOTE | NHM_C3_AUTO_DEMOTE,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static const struct idle_cpu idle_cpu_atom = {
+static const struct idle_cpu __initconstrel idle_cpu_atom = {
.state_table = atom_cstates,
};
-static const struct idle_cpu idle_cpu_tangier = {
+static const struct idle_cpu __initconstrel idle_cpu_tangier = {
.state_table = tangier_cstates,
};
-static const struct idle_cpu idle_cpu_lincroft = {
+static const struct idle_cpu __initconstrel idle_cpu_lincroft = {
.state_table = atom_cstates,
.auto_demotion_disable_flags = ATM_LNC_C6_AUTO_DEMOTE,
};
-static const struct idle_cpu idle_cpu_snb = {
+static const struct idle_cpu __initconstrel idle_cpu_snb = {
.state_table = snb_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static const struct idle_cpu idle_cpu_byt = {
+static const struct idle_cpu __initconstrel idle_cpu_byt = {
.state_table = byt_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static const struct idle_cpu idle_cpu_cht = {
+static const struct idle_cpu __initconstrel idle_cpu_cht = {
.state_table = cht_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static const struct idle_cpu idle_cpu_ivb = {
+static const struct idle_cpu __initconstrel idle_cpu_ivb = {
.state_table = ivb_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static const struct idle_cpu idle_cpu_ivt = {
+static const struct idle_cpu __initconstrel idle_cpu_ivt = {
.state_table = ivt_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static const struct idle_cpu idle_cpu_hsw = {
+static const struct idle_cpu __initconstrel idle_cpu_hsw = {
.state_table = hsw_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static const struct idle_cpu idle_cpu_bdw = {
+static const struct idle_cpu __initconstrel idle_cpu_bdw = {
.state_table = bdw_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static const struct idle_cpu idle_cpu_skl = {
+static const struct idle_cpu __initconstrel idle_cpu_skl = {
.state_table = skl_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static const struct idle_cpu idle_cpu_skx = {
+static const struct idle_cpu __initconstrel idle_cpu_skx = {
.state_table = skx_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static const struct idle_cpu idle_cpu_icx = {
+static const struct idle_cpu __initconstrel idle_cpu_icx = {
.state_table = icx_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static struct idle_cpu __ro_after_init idle_cpu_adl = {
+static const struct idle_cpu __initconstrel idle_cpu_adl = {
.state_table = adl_cstates,
};
-static struct idle_cpu __ro_after_init idle_cpu_adl_l = {
+static const struct idle_cpu __initconstrel idle_cpu_adl_l = {
.state_table = adl_l_cstates,
};
-static const struct idle_cpu idle_cpu_mtl_l = {
+static const struct idle_cpu __initconstrel idle_cpu_mtl_l = {
.state_table = mtl_l_cstates,
};
-static const struct idle_cpu idle_cpu_gmt = {
+static const struct idle_cpu __initconstrel idle_cpu_gmt = {
.state_table = gmt_cstates,
};
-static const struct idle_cpu idle_cpu_spr = {
+static const struct idle_cpu __initconstrel idle_cpu_spr = {
.state_table = spr_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static const struct idle_cpu idle_cpu_gnr = {
+static const struct idle_cpu __initconstrel idle_cpu_gnr = {
.state_table = gnr_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static const struct idle_cpu idle_cpu_gnrd = {
+static const struct idle_cpu __initconstrel idle_cpu_gnrd = {
.state_table = gnrd_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static const struct idle_cpu idle_cpu_avn = {
+static const struct idle_cpu __initconstrel idle_cpu_avn = {
.state_table = avn_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static const struct idle_cpu idle_cpu_bxt = {
+static const struct idle_cpu __initconstrel idle_cpu_bxt = {
.state_table = bxt_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static const struct idle_cpu idle_cpu_dnv = {
+static const struct idle_cpu __initconstrel idle_cpu_dnv = {
.state_table = dnv_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static const struct idle_cpu idle_cpu_snr = {
+static const struct idle_cpu __initconstrel idle_cpu_snr = {
.state_table = snr_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static const struct idle_cpu idle_cpu_grr = {
+static const struct idle_cpu __initconstrel idle_cpu_grr = {
.state_table = grr_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
-static const struct idle_cpu idle_cpu_srf = {
+static const struct idle_cpu __initconstrel idle_cpu_srf = {
.state_table = srf_cstates,
.c1e_promotion = C1E_PROMOTION_DISABLE,
};
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 5/9] x86/mwait-idle: Remove unused driver version constant
2026-03-12 16:53 [PATCH 0/9] x86/mwait-idle: sync up with Linux 7.0-rc Jan Beulich
` (3 preceding siblings ...)
2026-03-12 16:55 ` [PATCH 4/9] x86/mwait-idle: move pre-initialized struct idle_cpu instances Jan Beulich
@ 2026-03-12 16:56 ` Jan Beulich
2026-04-24 15:35 ` Roger Pau Monné
2026-03-12 16:56 ` [PATCH 6/9] x86/mwait-idle: Remove the 'preferred_cstates' parameter Jan Beulich
` (4 subsequent siblings)
9 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2026-03-12 16:56 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org; +Cc: Andrew Cooper, Roger Pau Monné
From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
The MWAIT_IDLE_VERSION constant has not been updated since 2016 and serves
no useful purpose. The driver version is implicitly defined by the
hypervisor version, making this constant redundant.
Remove the constant to eliminate potential confusion about version
tracking.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Link: https://patch.msgid.link/20251215111229.132705-1-dedekind1@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 25ff69011ddf
Adjust description to fit our code base.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -58,7 +58,6 @@
#include <acpi/cpufreq/cpufreq.h>
-#define MWAIT_IDLE_VERSION "0.4.1"
#undef PREFIX
#define PREFIX "mwait-idle: "
@@ -1632,9 +1631,6 @@ static int __init mwait_idle_probe(void)
if (boot_cpu_has(X86_FEATURE_XEN_ARAT))
lapic_timer_reliable_states = LAPIC_TIMER_ALWAYS_RELIABLE;
- pr_debug(PREFIX "v" MWAIT_IDLE_VERSION " model %#x\n",
- boot_cpu_data.x86_model);
-
pr_debug(PREFIX "lapic_timer_reliable_states %#x\n",
lapic_timer_reliable_states);
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 6/9] x86/mwait-idle: Remove the 'preferred_cstates' parameter
2026-03-12 16:53 [PATCH 0/9] x86/mwait-idle: sync up with Linux 7.0-rc Jan Beulich
` (4 preceding siblings ...)
2026-03-12 16:56 ` [PATCH 5/9] x86/mwait-idle: Remove unused driver version constant Jan Beulich
@ 2026-03-12 16:56 ` Jan Beulich
2026-04-24 15:37 ` Roger Pau Monné
2026-03-12 16:57 ` [PATCH 7/9] x86/mwait-idle: drop const from struct cpuidle_state arrays Jan Beulich
` (3 subsequent siblings)
9 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2026-03-12 16:56 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org; +Cc: Andrew Cooper, Roger Pau Monné
From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Remove the 'preferred_cstates' module parameter as it is not really useful.
The parameter currently only affects Alder Lake, where it controls C1/C1E
preference, with C1E being the default. The parameter does not support any
other platform. For example, Meteor Lake has a similar C1/C1E limitation,
but the parameter does not support Meteor Lake. This indicates that the
parameter is not very useful.
Generally, independent C1 and C1E are important for server platforms where
low latency is key. However, they are not as important for client platforms,
like Alder Lake, where C1E providing better energy savings is generally
preferred.
The parameter was originally introduced for Sapphire Rapids Xeon:
da0e58c038e6 intel_idle: add 'preferred_cstates' module argument
Later it was added to Alder Lake:
d1cf8bbfed1ed ("intel_idle: Add AlderLake support")
But it was removed from Sapphire Rapids when firmware fixed the C1/C1E
limitation:
1548fac47a114 ("intel_idle: make SPR C1 and C1E be independent")
So Alder Lake is the only platform left where this parameter has any effect.
Remove this parameter to simplify the driver and reduce maintenance burden.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/20251215111300.132803-1-dedekind1@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git a36dc37b5672
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2083,12 +2083,6 @@ compression is selected at build time fr
### ple_window (Intel)
> `= <integer>`
-### preferred-cstates (x86)
-> `= ( <integer> | List of ( C1 | C1E | C2 | ... )`
-
-This is a mask of C-states which are to be used preferably. This option is
-applicable only on hardware were certain C-states are exclusive of one another.
-
### probe-port-aliases (x86)
> `= <boolean>`
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -72,19 +72,6 @@ boolean_param("mwait-idle", opt_mwait_id
static unsigned int mwait_substates;
-/*
- * Some platforms come with mutually exclusive C-states, so that if one is
- * enabled, the other C-states must not be used. Example: C1 and C1E on
- * Sapphire Rapids platform. This parameter allows for selecting the
- * preferred C-states among the groups of mutually exclusive C-states - the
- * selected C-states will be registered, the other C-states from the mutually
- * exclusive group won't be registered. If the platform has no mutually
- * exclusive C-states, this parameter has no effect.
- */
-static unsigned int __ro_after_init preferred_states_mask;
-static char __initdata preferred_states[64];
-string_param("preferred-cstates", preferred_states);
-
#define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
/* Reliable LAPIC Timer States, bit 1 for C1 etc. Default to only C1. */
static unsigned int lapic_timer_reliable_states = (1 << 1);
@@ -1511,28 +1498,6 @@ static void __init skx_idle_state_table_
}
/*
- * adl_idle_state_table_update - Adjust AlderLake idle states table.
- */
-static void __init adl_idle_state_table_update(void)
-{
- /* Check if user prefers C1 over C1E. */
- if ((preferred_states_mask & BIT(1, U)) &&
- !(preferred_states_mask & BIT(2, U))) {
- adl_cstates[0].flags &= ~CPUIDLE_FLAG_DISABLED;
- adl_cstates[1].flags |= CPUIDLE_FLAG_DISABLED;
- adl_l_cstates[0].flags &= ~CPUIDLE_FLAG_DISABLED;
- adl_l_cstates[1].flags |= CPUIDLE_FLAG_DISABLED;
-
- /* Disable C1E by clearing the "C1E promotion" bit. */
- icpu.c1e_promotion = C1E_PROMOTION_DISABLE;
- return;
- }
-
- /* Make sure C1E is enabled by default */
- icpu.c1e_promotion = C1E_PROMOTION_ENABLE;
-}
-
-/*
* spr_idle_state_table_update - Adjust Sapphire Rapids idle states table.
*/
static void __init spr_idle_state_table_update(void)
@@ -1578,11 +1543,6 @@ static void __init mwait_idle_state_tabl
case INTEL_EMERALDRAPIDS_X:
spr_idle_state_table_update();
break;
- case INTEL_ALDERLAKE:
- case INTEL_ALDERLAKE_L:
- case INTEL_ATOM_GRACEMONT:
- adl_idle_state_table_update();
- break;
}
}
@@ -1591,7 +1551,6 @@ static int __init mwait_idle_probe(void)
unsigned int eax, ebx, ecx;
const struct x86_cpu_id *id;
const struct idle_cpu *idle_cpu;
- const char *str;
if (boot_cpu_data.vendor != X86_VENDOR_INTEL)
return -ENODEV;
@@ -1634,39 +1593,6 @@ static int __init mwait_idle_probe(void)
pr_debug(PREFIX "lapic_timer_reliable_states %#x\n",
lapic_timer_reliable_states);
- str = preferred_states;
- if (isdigit(str[0]))
- preferred_states_mask = simple_strtoul(str, &str, 0);
- else if (str[0])
- {
- const char *ss;
-
- do {
- const struct cpuidle_state *state = idle_cpu->state_table;
- unsigned int bit = 1;
-
- ss = strchr(str, ',');
- if (!ss)
- ss = strchr(str, '\0');
-
- for (; state->name[0]; ++state) {
- bit <<= 1;
- if (!cmdline_strcmp(str, state->name)) {
- preferred_states_mask |= bit;
- break;
- }
- }
- if (!state->name[0])
- break;
-
- str = ss + 1;
- } while (*ss);
-
- str -= str == ss + 1;
- }
- if (str[0])
- printk("unrecognized \"preferred-cstates=%s\"\n", str);
-
mwait_idle_state_table_update();
return 0;
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 7/9] x86/mwait-idle: drop const from struct cpuidle_state arrays
2026-03-12 16:53 [PATCH 0/9] x86/mwait-idle: sync up with Linux 7.0-rc Jan Beulich
` (5 preceding siblings ...)
2026-03-12 16:56 ` [PATCH 6/9] x86/mwait-idle: Remove the 'preferred_cstates' parameter Jan Beulich
@ 2026-03-12 16:57 ` Jan Beulich
2026-04-24 17:57 ` Roger Pau Monné
2026-03-12 16:57 ` [PATCH 8/9] x86/mwait-idle: Add cmdline option to adjust C-states table Jan Beulich
` (2 subsequent siblings)
9 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2026-03-12 16:57 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org; +Cc: Andrew Cooper, Roger Pau Monné
A subsequent change will want to be able to alter them based on a new
command line option. (Note that some were __ro_after_init already.)
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -83,7 +83,7 @@ enum c1e_promotion {
};
struct idle_cpu {
- const struct cpuidle_state *state_table;
+ struct cpuidle_state *state_table;
/*
* Hardware C-state auto-demotion may not always be optimal.
@@ -139,7 +139,7 @@ struct cpuidle_state {
* which is also the index into the MWAIT hint array.
* Thus C0 is a dummy.
*/
-static const struct cpuidle_state nehalem_cstates[] = {
+static struct cpuidle_state __ro_after_init nehalem_cstates[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00),
@@ -167,7 +167,7 @@ static const struct cpuidle_state nehale
{}
};
-static const struct cpuidle_state snb_cstates[] = {
+static struct cpuidle_state __ro_after_init snb_cstates[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00),
@@ -201,7 +201,7 @@ static const struct cpuidle_state snb_cs
{}
};
-static const struct cpuidle_state byt_cstates[] = {
+static struct cpuidle_state __ro_after_init byt_cstates[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00),
@@ -235,7 +235,7 @@ static const struct cpuidle_state byt_cs
{}
};
-static const struct cpuidle_state cht_cstates[] = {
+static struct cpuidle_state __ro_after_init cht_cstates[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00),
@@ -269,7 +269,7 @@ static const struct cpuidle_state cht_cs
{}
};
-static const struct cpuidle_state ivb_cstates[] = {
+static struct cpuidle_state __ro_after_init ivb_cstates[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00),
@@ -303,7 +303,7 @@ static const struct cpuidle_state ivb_cs
{}
};
-static const struct cpuidle_state ivt_cstates[] = {
+static struct cpuidle_state __ro_after_init ivt_cstates[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00),
@@ -331,7 +331,7 @@ static const struct cpuidle_state ivt_cs
{}
};
-static const struct cpuidle_state ivt_cstates_4s[] = {
+static struct cpuidle_state __ro_after_init ivt_cstates_4s[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00),
@@ -359,7 +359,7 @@ static const struct cpuidle_state ivt_cs
{}
};
-static const struct cpuidle_state ivt_cstates_8s[] = {
+static struct cpuidle_state __ro_after_init ivt_cstates_8s[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00),
@@ -387,7 +387,7 @@ static const struct cpuidle_state ivt_cs
{}
};
-static const struct cpuidle_state hsw_cstates[] = {
+static struct cpuidle_state __ro_after_init hsw_cstates[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00),
@@ -439,7 +439,7 @@ static const struct cpuidle_state hsw_cs
{}
};
-static const struct cpuidle_state bdw_cstates[] = {
+static struct cpuidle_state __ro_after_init bdw_cstates[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00),
@@ -565,7 +565,7 @@ static struct cpuidle_state __ro_after_i
{}
};
-static const struct cpuidle_state icx_cstates[] = {
+static struct cpuidle_state __ro_after_init icx_cstates[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_IRQ_ENABLE,
@@ -665,7 +665,7 @@ static struct cpuidle_state __ro_after_i
{}
};
-static const struct cpuidle_state mtl_l_cstates[] = {
+static struct cpuidle_state __ro_after_init mtl_l_cstates[] = {
{
.name = "C1E",
.flags = MWAIT2flg(0x01),
@@ -687,7 +687,7 @@ static const struct cpuidle_state mtl_l_
{}
};
-static const struct cpuidle_state gmt_cstates[] = {
+static struct cpuidle_state __ro_after_init gmt_cstates[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_DISABLED,
@@ -743,7 +743,7 @@ static struct cpuidle_state __ro_after_i
{}
};
-static const struct cpuidle_state gnr_cstates[] = {
+static struct cpuidle_state __ro_after_init gnr_cstates[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00),
@@ -771,7 +771,7 @@ static const struct cpuidle_state gnr_cs
{}
};
-static const struct cpuidle_state gnrd_cstates[] = {
+static struct cpuidle_state __ro_after_init gnrd_cstates[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00),
@@ -799,7 +799,7 @@ static const struct cpuidle_state gnrd_c
{}
};
-static const struct cpuidle_state atom_cstates[] = {
+static struct cpuidle_state __ro_after_init atom_cstates[] = {
{
.name = "C1E",
.flags = MWAIT2flg(0x00),
@@ -827,7 +827,7 @@ static const struct cpuidle_state atom_c
{}
};
-static const struct cpuidle_state tangier_cstates[] = {
+static struct cpuidle_state __ro_after_init tangier_cstates[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00),
@@ -861,7 +861,7 @@ static const struct cpuidle_state tangie
{}
};
-static const struct cpuidle_state avn_cstates[] = {
+static struct cpuidle_state __ro_after_init avn_cstates[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00),
@@ -923,7 +923,7 @@ static struct cpuidle_state __ro_after_i
{}
};
-static const struct cpuidle_state dnv_cstates[] = {
+static struct cpuidle_state __ro_after_init dnv_cstates[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00),
@@ -949,7 +949,7 @@ static const struct cpuidle_state dnv_cs
* Note, depending on HW and FW revision, SnowRidge SoC may or may not support
* C6, and this is indicated in the CPUID mwait leaf.
*/
-static const struct cpuidle_state snr_cstates[] = {
+static struct cpuidle_state __ro_after_init snr_cstates[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00),
@@ -971,7 +971,7 @@ static const struct cpuidle_state snr_cs
{}
};
-static const struct cpuidle_state grr_cstates[] = {
+static struct cpuidle_state __ro_after_init grr_cstates[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00),
@@ -993,7 +993,7 @@ static const struct cpuidle_state grr_cs
{}
};
-static const struct cpuidle_state srf_cstates[] = {
+static struct cpuidle_state __ro_after_init srf_cstates[] = {
{
.name = "C1",
.flags = MWAIT2flg(0x00),
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 8/9] x86/mwait-idle: Add cmdline option to adjust C-states table
2026-03-12 16:53 [PATCH 0/9] x86/mwait-idle: sync up with Linux 7.0-rc Jan Beulich
` (6 preceding siblings ...)
2026-03-12 16:57 ` [PATCH 7/9] x86/mwait-idle: drop const from struct cpuidle_state arrays Jan Beulich
@ 2026-03-12 16:57 ` Jan Beulich
2026-04-24 19:10 ` Roger Pau Monné
2026-03-12 16:58 ` [PATCH 9/9] x86/mwait-idle: Add C-states validation Jan Beulich
2026-05-12 15:22 ` [PATCH 0/9] x86/mwait-idle: sync up with Linux 7.0-rc Oleksii Kurochko
9 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2026-03-12 16:57 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org; +Cc: Andrew Cooper, Roger Pau Monné
From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Add a new module parameter that allows adjusting the C-states table used by
the driver.
Currently, the C-states table is hardcoded in the driver based on the CPU
model. The goal is to have good enough defaults for most users.
However, C-state characteristics, such as exit latency and residency, can
vary between different variants of the same CPU model and BIOS settings.
Moreover, different platform usage models and user preferences may benefit
from different C-state target_residency values.
Provide a way for users to adjust the C-states table via a module parameter
"table". The general format is:
"state1:latency1:target_residency1,state2:latency2:target_residency2,..."
In other words, represent each C-state by its name, exit latency (in
microseconds), and target residency (in microseconds), separated by colons.
Separate multiple C-states by commas.
For example, suppose a CPU has 3 C-states with the following
characteristics:
C1: exit_latency=1, target_residency=2
C1E: exit_latency=10, target_residency=10
C6: exit_latency=100, target_residency=500
Users can specify a custom C-states table as follows:
1. intel_idle.table="C1:2:2,C1E:5:20,C6:150:600"
Result: C1: exit_latency=2, target_residency=2
C1E: exit_latency=5, target_residency=20
C6: exit_latency=150, target_residency=600
2. intel_idle.table="C6::400"
Result: C1: exit_latency=1, target_residency=2 (unchanged)
C1E: exit_latency=10, target_residency=10 (unchanged)
C6: exit_latency=100, target_residency=400
(only target_residency changed)
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/20251216080402.156988-3-dedekind1@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 111f77a23348
Add __init to get_cmdline_field(). Put cmdline_table_str[] in .init.data.
Other adjustments to fit our env.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
For the initial attempt, I've left the new option as a standalone one. It
may be worth integrating with "mwait-idle", but I think much of the
parsing would then want doing differently. It'll then likely be much
harder to apply future Linux changes there.
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1928,6 +1928,23 @@ Print boot time MTRR state.
Use the MWAIT idle driver (with model specific C-state knowledge) instead
of the ACPI based one.
+### mwait-idle.table (x86)
+> `= <string>`
+
+ Define the C-states table from a user input string. Expected format is
+ 'name:latency:residency', where:
+ - name: The C-state name.
+ - latency: The C-state exit latency in us.
+ - residency: The C-state target residency in us.
+
+ Multiple C-states can be defined by separating them with commas:
+ 'name1:latency1:residency1,name2:latency2:residency2'
+
+ Example: intel_idle.table=C1:1:1,C1E:5:10,C6:100:600
+
+ To leave latency or residency unchanged, use an empty field, for example:
+ 'C1:1:1,C1E::10' - leaves C1E latency unchanged.
+
### nmi (x86)
> `= ignore | dom0 | fatal`
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -70,6 +70,11 @@
static __initdata bool opt_mwait_idle = true;
boolean_param("mwait-idle", opt_mwait_idle);
+/* The maximum allowed length for the 'table' module parameter */
+#define MAX_CMDLINE_TABLE_LEN 256
+static char cmdline_table_str[MAX_CMDLINE_TABLE_LEN] __initdata;
+string_param("mwait-idle.table", cmdline_table_str);
+
static unsigned int mwait_substates;
#define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
@@ -122,6 +127,9 @@ struct cpuidle_state {
*/
#define CPUIDLE_FLAG_IBRS 0x20000
+/* C-states data from the 'mwait-idle.table' cmdline parameter */
+static struct cpuidle_state cmdline_states[ACPI_PROCESSOR_MAX_POWER] __initdata;
+
/*
* MWAIT takes an 8-bit "hint" in EAX "suggesting"
* the C-state (top nibble) and sub-state (bottom nibble)
@@ -1546,6 +1554,161 @@ static void __init mwait_idle_state_tabl
}
}
+ /**
+ * get_cmdline_field - Get the current field from a cmdline string.
+ * @args: The cmdline string to get the current field from.
+ * @field: Pointer to the current field upon return.
+ * @sep: The fields separator character.
+ *
+ * Examples:
+ * Input: args="C1:1:1,C1E:2:10", sep=':'
+ * Output: field="C1", return "1:1,C1E:2:10"
+ * Input: args="C1:1:1,C1E:2:10", sep=','
+ * Output: field="C1:1:1", return "C1E:2:10"
+ * Ipnut: args="::", sep=':'
+ * Output: field="", return ":"
+ *
+ * Return: The continuation of the cmdline string after the field or NULL.
+ */
+static char *__init get_cmdline_field(char *args, char **field, char sep)
+{
+ unsigned int i;
+
+ for (i = 0; args[i] && !isspace(args[i]); i++) {
+ if (args[i] == sep)
+ break;
+ }
+
+ *field = args;
+
+ if (args[i] != sep)
+ return NULL;
+
+ args[i] = '\0';
+ return args + i + 1;
+}
+
+/**
+ * cmdline_table_adjust - Adjust the C-states table with data from cmdline.
+ *
+ * Adjust the C-states table with data from the 'mwait-idle.table' parameter
+ * (if specified).
+ */
+static void __init cmdline_table_adjust(void)
+{
+ char *args = cmdline_table_str;
+ struct cpuidle_state *state;
+ unsigned int i, state_count;
+
+ if (args[0] == '\0')
+ /* The 'mwait-idle.table' module parameter was not specified */
+ return;
+
+ /* Create a copy of the C-states table */
+ for (i = 0;
+ i < ARRAY_SIZE(cmdline_states) && icpu.state_table[i].name[0];
+ i++)
+ cmdline_states[i] = icpu.state_table[i];
+
+ state_count = i;
+
+ /*
+ * Adjust the C-states table copy with data from the 'mwait-idle.table'
+ * module parameter.
+ */
+ while (args) {
+ char *fields, *name, *val;
+
+ /*
+ * Get the next C-state definition, which is expected to be
+ * '<name>:<latency_us>:<target_residency_us>'. Treat "empty"
+ * fields as unchanged. For example,
+ * '<name>::<target_residency_us>' leaves the latency unchanged.
+ */
+ args = get_cmdline_field(args, &fields, ',');
+
+ /* name */
+ fields = get_cmdline_field(fields, &name, ':');
+ if (!fields)
+ goto error;
+
+ /* Find the C-state by its name */
+ state = NULL;
+ for (i = 0; i < state_count; i++) {
+ if (!strcmp(name, cmdline_states[i].name)) {
+ state = &cmdline_states[i];
+ break;
+ }
+ }
+
+ if (!state) {
+ printk(XENLOG_ERR PREFIX "C-state '%s' was not found\n",
+ name);
+ continue;
+ }
+
+ /* Latency */
+ fields = get_cmdline_field(fields, &val, ':');
+ if (!fields)
+ goto error;
+
+ if (*val) {
+ const char *end;
+ unsigned long n = simple_strtoul(val, &end, 0);
+
+ state->exit_latency = n;
+ if (*end || state->exit_latency != n)
+ goto error;
+ }
+
+ /* Target residency */
+ fields = get_cmdline_field(fields, &val, ':');
+
+ if (*val) {
+ const char *end;
+ unsigned long n = simple_strtoul(val, &end, 0);
+
+ state->target_residency = n;
+ if (*end || state->target_residency != n)
+ goto error;
+ }
+
+ /*
+ * Allow for 3 more fields, but ignore them. Helps to make
+ * possible future extensions of the cmdline format backward
+ * compatible.
+ */
+ for (i = 0; fields && i < 3; i++) {
+ fields = get_cmdline_field(fields, &val, ':');
+ if (!fields)
+ break;
+ }
+
+ if (fields) {
+ printk(XENLOG_ERR PREFIX
+ "Too many fields for C-state '%s'\n",
+ state->name);
+ goto error;
+ }
+
+ printk(XENLOG_INFO PREFIX
+ "C-state from cmdline: name=%s, latency=%u, residency=%u\n",
+ state->name, state->exit_latency, state->target_residency);
+ }
+
+ /* Copy the adjusted C-states table back */
+ for (i = 0; i < state_count; i++)
+ icpu.state_table[i] = cmdline_states[i];
+
+ printk(XENLOG_INFO PREFIX
+ "Adjusted C-states with data from 'mwait-idle.table'\n");
+ return;
+
+ error:
+ printk(PREFIX
+ "Failed to adjust C-states with data from 'mwait-idle.table'\n");
+}
+
static int __init mwait_idle_probe(void)
{
unsigned int eax, ebx, ecx;
@@ -1595,6 +1758,8 @@ static int __init mwait_idle_probe(void)
mwait_idle_state_table_update();
+ cmdline_table_adjust();
+
return 0;
}
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 9/9] x86/mwait-idle: Add C-states validation
2026-03-12 16:53 [PATCH 0/9] x86/mwait-idle: sync up with Linux 7.0-rc Jan Beulich
` (7 preceding siblings ...)
2026-03-12 16:57 ` [PATCH 8/9] x86/mwait-idle: Add cmdline option to adjust C-states table Jan Beulich
@ 2026-03-12 16:58 ` Jan Beulich
2026-04-24 19:15 ` Roger Pau Monné
2026-05-12 15:22 ` [PATCH 0/9] x86/mwait-idle: sync up with Linux 7.0-rc Oleksii Kurochko
9 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2026-03-12 16:58 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org; +Cc: Andrew Cooper, Roger Pau Monné
From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Add validation for C-states specified via the "table=" module parameter.
Treat this module parameter as untrusted input and validate it thoroughly.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/20251216080402.156988-4-dedekind1@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git be6a150829b3
Add __init to validate_cmdline_cstate(). Other adjustments to fit our env.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -72,6 +72,11 @@ boolean_param("mwait-idle", opt_mwait_id
/* The maximum allowed length for the 'table' module parameter */
#define MAX_CMDLINE_TABLE_LEN 256
+/* Maximum allowed C-state latency */
+#define MAX_CMDLINE_LATENCY_US (5 * 1000 /* USEC_PER_MSEC */)
+/* Maximum allowed C-state target residency */
+#define MAX_CMDLINE_RESIDENCY_US (100 * 1000 /* USEC_PER_MSEC */)
+
static char cmdline_table_str[MAX_CMDLINE_TABLE_LEN] __initdata;
string_param("mwait-idle.table", cmdline_table_str);
@@ -1589,6 +1594,41 @@ static char *__init get_cmdline_field(ch
}
/**
+ * validate_cmdline_cstate - Validate a C-state from cmdline.
+ * @state: The C-state to validate.
+ * @prev_state: The previous C-state in the table or NULL.
+ *
+ * Return: 0 if the C-state is valid or -EINVAL otherwise.
+ */
+static int __init validate_cmdline_cstate(struct cpuidle_state *state,
+ struct cpuidle_state *prev_state)
+{
+ if (state->exit_latency == 0)
+ /* Exit latency 0 can only be used for the POLL state */
+ return -EINVAL;
+
+ if (state->exit_latency > MAX_CMDLINE_LATENCY_US)
+ return -EINVAL;
+
+ if (state->target_residency > MAX_CMDLINE_RESIDENCY_US)
+ return -EINVAL;
+
+ if (state->target_residency < state->exit_latency)
+ return -EINVAL;
+
+ if (!prev_state)
+ return 0;
+
+ if (state->exit_latency <= prev_state->exit_latency)
+ return -EINVAL;
+
+ if (state->target_residency <= prev_state->target_residency)
+ return -EINVAL;
+
+ return 0;
+}
+
+/**
* cmdline_table_adjust - Adjust the C-states table with data from cmdline.
*
* Adjust the C-states table with data from the 'mwait-idle.table' parameter
@@ -1696,6 +1736,21 @@ static void __init cmdline_table_adjust(
state->name, state->exit_latency, state->target_residency);
}
+ /* Validate the adjusted C-states */
+ for (i = 0; i < state_count; i++) {
+ struct cpuidle_state *prev_state;
+
+ state = &cmdline_states[i];
+ prev_state = i ? &cmdline_states[i - 1] : NULL;
+
+ if (validate_cmdline_cstate(state, prev_state)) {
+ printk(XENLOG_ERR PREFIX
+ "C-state '%s' validation failed\n",
+ state->name);
+ goto error;
+ }
+ }
+
/* Copy the adjusted C-states table back */
for (i = 0; i < state_count; i++)
icpu.state_table[i] = cmdline_states[i];
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/9] x86/mwait-idle: arrange for BSP MSR adjustments during S3 resume
2026-03-12 16:54 ` [PATCH 1/9] x86/mwait-idle: arrange for BSP MSR adjustments during S3 resume Jan Beulich
@ 2026-04-24 14:34 ` Roger Pau Monné
2026-05-04 9:02 ` Jan Beulich
0 siblings, 1 reply; 27+ messages in thread
From: Roger Pau Monné @ 2026-04-24 14:34 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On Thu, Mar 12, 2026 at 05:54:30PM +0100, Jan Beulich wrote:
> mwait_idle_cpu_init() is only called for APs, yet MSR writes will
> typically need re-doing post-S3 even for the BSP. When multiple cores /
> threads are present (and to come back online) in a package, for package
> scope MSRs this may be covered by APs doing the writes, but we can't rely
> on that.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/acpi/power.c
> +++ b/xen/arch/x86/acpi/power.c
> @@ -28,6 +28,7 @@
> #include <asm/io_apic.h>
> #include <asm/irq.h>
> #include <asm/microcode.h>
> +#include <asm/mwait.h>
> #include <asm/prot-key.h>
> #include <asm/spec_ctrl.h>
> #include <asm/tboot.h>
> @@ -299,6 +300,7 @@ static int enter_state(u32 state)
> acpi_sleep_post(state);
> if ( hvm_cpu_up() )
> BUG();
> + mwait_idle_resume();
> cpufreq_add_cpu(0);
>
> enable_cpu:
> --- a/xen/arch/x86/cpu/mwait-idle.c
> +++ b/xen/arch/x86/cpu/mwait-idle.c
> @@ -1680,6 +1680,28 @@ static int __init mwait_idle_probe(void)
> return 0;
> }
>
> +static void mwait_idle_cpu_tweak(unsigned int cpu)
> +{
> + if (icpu->auto_demotion_disable_flags)
> + on_selected_cpus(cpumask_of(cpu), auto_demotion_disable, NULL, 1);
> +
> + if (icpu->byt_auto_demotion_disable_flag)
> + on_selected_cpus(cpumask_of(cpu), byt_auto_demotion_disable, NULL, 1);
> +
> + switch (icpu->c1e_promotion) {
> + case C1E_PROMOTION_DISABLE:
> + on_selected_cpus(cpumask_of(cpu), c1e_promotion_disable, NULL, 1);
> + break;
> +
> + case C1E_PROMOTION_ENABLE:
> + on_selected_cpus(cpumask_of(cpu), c1e_promotion_enable, NULL, 1);
> + break;
> +
> + case C1E_PROMOTION_PRESERVE:
> + break;
> + }
> +}
> +
> static int cf_check mwait_idle_cpu_init(
> struct notifier_block *nfb, unsigned long action, void *hcpu)
> {
> @@ -1762,24 +1784,7 @@ static int cf_check mwait_idle_cpu_init(
> dev->count++;
> }
>
> - if (icpu->auto_demotion_disable_flags)
> - on_selected_cpus(cpumask_of(cpu), auto_demotion_disable, NULL, 1);
> -
> - if (icpu->byt_auto_demotion_disable_flag)
> - on_selected_cpus(cpumask_of(cpu), byt_auto_demotion_disable, NULL, 1);
> -
> - switch (icpu->c1e_promotion) {
> - case C1E_PROMOTION_DISABLE:
> - on_selected_cpus(cpumask_of(cpu), c1e_promotion_disable, NULL, 1);
> - break;
> -
> - case C1E_PROMOTION_ENABLE:
> - on_selected_cpus(cpumask_of(cpu), c1e_promotion_enable, NULL, 1);
> - break;
I'm possibly missing some context here, but why do we use
on_selected_cpus and the CPU_ONLINE hook? Won't it be easier to use
CPU_STARTING and avoid the use of on_selected_cpus(), as CPU_STARTING
runs in the context of the CPU being onlined.
Thanks, Roger.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/9] x86/mwait-idle: clean up BYT/CHT auto demotion disable
2026-03-12 16:54 ` [PATCH 2/9] x86/mwait-idle: clean up BYT/CHT auto demotion disable Jan Beulich
@ 2026-04-24 14:47 ` Roger Pau Monné
2026-05-04 9:07 ` Jan Beulich
0 siblings, 1 reply; 27+ messages in thread
From: Roger Pau Monné @ 2026-04-24 14:47 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On Thu, Mar 12, 2026 at 05:54:56PM +0100, Jan Beulich wrote:
> Bay Trail (BYT) and Cherry Trail (CHT) platforms have a very specific way
> of disabling auto-demotion via specific MSR bits. Clean up the code so that
> BYT/CHT-specifics do not show up in the common 'struct idle_cpu' data
> structure.
>
> Remove the 'byt_auto_demotion_disable_flag' flag from 'struct idle_cpu',
> because a better coding pattern is to avoid very case-specific fields like
> 'bool byt_auto_demotion_disable_flag' in a common data structure, which is
> used for all platforms, not only BYT/CHT. The code is just more readable
> when common data structures contain only commonly used fields.
>
> Instead, match BYT/CHT in the 'intel_idle_init_cstates_icpu()' function,
> and introduce a small helper to take care of BYT/CHT auto-demotion. This
> is consistent with how platform-specific things are done for other
> platforms.
>
> No intended functional changes.
>
> Inspired by (and description largely taken from) Linux'es c93d13b661a6
> ("intel_idle: clean up BYT/CHT auto demotion disable").
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/cpu/mwait-idle.c
> +++ b/xen/arch/x86/cpu/mwait-idle.c
> @@ -104,7 +104,6 @@ struct idle_cpu {
> * Indicate which enable bits to clear here.
> */
> unsigned long auto_demotion_disable_flags;
> - bool byt_auto_demotion_disable_flag;
> enum c1e_promotion c1e_promotion;
> };
>
> @@ -1144,7 +1143,7 @@ static void cf_check auto_demotion_disab
> wrmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr_bits);
> }
>
> -static void cf_check byt_auto_demotion_disable(void *dummy)
> +static void byt_cht_auto_demotion_disable(void)
> {
> wrmsrl(MSR_CC6_DEMOTION_POLICY_CONFIG, 0);
> wrmsrl(MSR_MC6_DEMOTION_POLICY_CONFIG, 0);
> @@ -1195,13 +1194,11 @@ static const struct idle_cpu idle_cpu_sn
> static const struct idle_cpu idle_cpu_byt = {
> .state_table = byt_cstates,
> .c1e_promotion = C1E_PROMOTION_DISABLE,
> - .byt_auto_demotion_disable_flag = true,
> };
>
> static const struct idle_cpu idle_cpu_cht = {
> .state_table = cht_cstates,
> .c1e_promotion = C1E_PROMOTION_DISABLE,
> - .byt_auto_demotion_disable_flag = true,
> };
>
> static const struct idle_cpu idle_cpu_ivb = {
> @@ -1680,14 +1677,11 @@ static int __init mwait_idle_probe(void)
> return 0;
> }
>
> -static void mwait_idle_cpu_tweak(unsigned int cpu)
> +static void mwait_idle_cpu_tweak(unsigned int cpu, bool bsp)
> {
> if (icpu->auto_demotion_disable_flags)
> on_selected_cpus(cpumask_of(cpu), auto_demotion_disable, NULL, 1);
>
> - if (icpu->byt_auto_demotion_disable_flag)
> - on_selected_cpus(cpumask_of(cpu), byt_auto_demotion_disable, NULL, 1);
> -
> switch (icpu->c1e_promotion) {
> case C1E_PROMOTION_DISABLE:
> on_selected_cpus(cpumask_of(cpu), c1e_promotion_disable, NULL, 1);
> @@ -1700,12 +1694,24 @@ static void mwait_idle_cpu_tweak(unsigne
> case C1E_PROMOTION_PRESERVE:
> break;
> }
> +
> + /* Pkg-scope MSRs on 1-socket-only systems need writing only once. */
> + if (!bsp)
> + return;
> +
> + switch (boot_cpu_data.vfm) {
> + case INTEL_ATOM_SILVERMONT:
> + case INTEL_ATOM_AIRMONT:
> + byt_cht_auto_demotion_disable();
> + break;
> + }
> }
>
> static int cf_check mwait_idle_cpu_init(
> struct notifier_block *nfb, unsigned long action, void *hcpu)
> {
> unsigned int cpu = (unsigned long)hcpu, cstate;
> + static bool first;
I think you want to init first = true here, so that after the first
call to mwait_idle_cpu_tweak() it gets set to false for future calls?
Thanks, Roger.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/9] x86/mwait-idle: latch struct idle_cpu contents
2026-03-12 16:55 ` [PATCH 3/9] x86/mwait-idle: latch struct idle_cpu contents Jan Beulich
@ 2026-04-24 15:24 ` Roger Pau Monné
0 siblings, 0 replies; 27+ messages in thread
From: Roger Pau Monné @ 2026-04-24 15:24 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On Thu, Mar 12, 2026 at 05:55:22PM +0100, Jan Beulich wrote:
> Rather than storing a pointer (and needing to keep all struct instances in
> memory post-init), and rather than (like the Linux counterpart has it)
> keeping individual variables, simply copy the respective structure
> instance. By implication, subsequent updates now need doing to the copy.
Shouldn't the aim here be to move the myriad of per-arch cpuidle_state
arrays to the init section, so we can get rid of them after boot?
Overall I wonder whether we would rather attempt to sya in-sync with
what Linux does, simply because it's then easier to pick updates fro
Linux, and less likely to introduce bugs as part of the modification
needs to adapt the Linux code into our fork.
Not saying the change is bad, just wondering whether in general we
would better try to keep the code bases as similar as feasible.
Thanks, Roger.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 4/9] x86/mwait-idle: move pre-initialized struct idle_cpu instances
2026-03-12 16:55 ` [PATCH 4/9] x86/mwait-idle: move pre-initialized struct idle_cpu instances Jan Beulich
@ 2026-04-24 15:33 ` Roger Pau Monné
0 siblings, 0 replies; 27+ messages in thread
From: Roger Pau Monné @ 2026-04-24 15:33 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On Thu, Mar 12, 2026 at 05:55:50PM +0100, Jan Beulich wrote:
> Now that they're not referenced anymore post-init, they can themselves
> move into .init.rodata. (idle_cpu_adl{,_l} can also become const in the
> first place.)
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Oh, OK, that's what I was missing from the previous patch, then for
this and the previous:
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Albeit I'm still concerned whether us diverging (further?) from Linux
would causes more work when picking up upstream changes.
FWIW, I would also consider squashing this into the previous patch.
Thanks, Roger.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 5/9] x86/mwait-idle: Remove unused driver version constant
2026-03-12 16:56 ` [PATCH 5/9] x86/mwait-idle: Remove unused driver version constant Jan Beulich
@ 2026-04-24 15:35 ` Roger Pau Monné
0 siblings, 0 replies; 27+ messages in thread
From: Roger Pau Monné @ 2026-04-24 15:35 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On Thu, Mar 12, 2026 at 05:56:24PM +0100, Jan Beulich wrote:
> From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
>
> The MWAIT_IDLE_VERSION constant has not been updated since 2016 and serves
> no useful purpose. The driver version is implicitly defined by the
> hypervisor version, making this constant redundant.
>
> Remove the constant to eliminate potential confusion about version
> tracking.
>
> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
> Link: https://patch.msgid.link/20251215111229.132705-1-dedekind1@gmail.com
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 25ff69011ddf
>
> Adjust description to fit our code base.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Thanks, Roger.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 6/9] x86/mwait-idle: Remove the 'preferred_cstates' parameter
2026-03-12 16:56 ` [PATCH 6/9] x86/mwait-idle: Remove the 'preferred_cstates' parameter Jan Beulich
@ 2026-04-24 15:37 ` Roger Pau Monné
0 siblings, 0 replies; 27+ messages in thread
From: Roger Pau Monné @ 2026-04-24 15:37 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On Thu, Mar 12, 2026 at 05:56:55PM +0100, Jan Beulich wrote:
> From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
>
> Remove the 'preferred_cstates' module parameter as it is not really useful.
>
> The parameter currently only affects Alder Lake, where it controls C1/C1E
> preference, with C1E being the default. The parameter does not support any
> other platform. For example, Meteor Lake has a similar C1/C1E limitation,
> but the parameter does not support Meteor Lake. This indicates that the
> parameter is not very useful.
>
> Generally, independent C1 and C1E are important for server platforms where
> low latency is key. However, they are not as important for client platforms,
> like Alder Lake, where C1E providing better energy savings is generally
> preferred.
>
> The parameter was originally introduced for Sapphire Rapids Xeon:
> da0e58c038e6 intel_idle: add 'preferred_cstates' module argument
>
> Later it was added to Alder Lake:
> d1cf8bbfed1ed ("intel_idle: Add AlderLake support")
>
> But it was removed from Sapphire Rapids when firmware fixed the C1/C1E
> limitation:
> 1548fac47a114 ("intel_idle: make SPR C1 and C1E be independent")
>
> So Alder Lake is the only platform left where this parameter has any effect.
> Remove this parameter to simplify the driver and reduce maintenance burden.
>
> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> Link: https://patch.msgid.link/20251215111300.132803-1-dedekind1@gmail.com
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git a36dc37b5672
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Thanks, Roger.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 7/9] x86/mwait-idle: drop const from struct cpuidle_state arrays
2026-03-12 16:57 ` [PATCH 7/9] x86/mwait-idle: drop const from struct cpuidle_state arrays Jan Beulich
@ 2026-04-24 17:57 ` Roger Pau Monné
2026-05-04 9:14 ` Jan Beulich
0 siblings, 1 reply; 27+ messages in thread
From: Roger Pau Monné @ 2026-04-24 17:57 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On Thu, Mar 12, 2026 at 05:57:18PM +0100, Jan Beulich wrote:
> A subsequent change will want to be able to alter them based on a new
> command line option. (Note that some were __ro_after_init already.)
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
I wonder if we could also move all the cstate tables to .init section,
as after boot we would only use one of those. Anyway, for the change
here:
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Thanks, Roger.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 8/9] x86/mwait-idle: Add cmdline option to adjust C-states table
2026-03-12 16:57 ` [PATCH 8/9] x86/mwait-idle: Add cmdline option to adjust C-states table Jan Beulich
@ 2026-04-24 19:10 ` Roger Pau Monné
2026-05-04 9:29 ` Jan Beulich
0 siblings, 1 reply; 27+ messages in thread
From: Roger Pau Monné @ 2026-04-24 19:10 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On Thu, Mar 12, 2026 at 05:57:53PM +0100, Jan Beulich wrote:
> From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
>
> Add a new module parameter that allows adjusting the C-states table used by
> the driver.
>
> Currently, the C-states table is hardcoded in the driver based on the CPU
> model. The goal is to have good enough defaults for most users.
>
> However, C-state characteristics, such as exit latency and residency, can
> vary between different variants of the same CPU model and BIOS settings.
> Moreover, different platform usage models and user preferences may benefit
> from different C-state target_residency values.
>
> Provide a way for users to adjust the C-states table via a module parameter
> "table". The general format is:
> "state1:latency1:target_residency1,state2:latency2:target_residency2,..."
>
> In other words, represent each C-state by its name, exit latency (in
> microseconds), and target residency (in microseconds), separated by colons.
> Separate multiple C-states by commas.
>
> For example, suppose a CPU has 3 C-states with the following
> characteristics:
> C1: exit_latency=1, target_residency=2
> C1E: exit_latency=10, target_residency=10
> C6: exit_latency=100, target_residency=500
>
> Users can specify a custom C-states table as follows:
>
> 1. intel_idle.table="C1:2:2,C1E:5:20,C6:150:600"
> Result: C1: exit_latency=2, target_residency=2
> C1E: exit_latency=5, target_residency=20
> C6: exit_latency=150, target_residency=600
> 2. intel_idle.table="C6::400"
> Result: C1: exit_latency=1, target_residency=2 (unchanged)
> C1E: exit_latency=10, target_residency=10 (unchanged)
> C6: exit_latency=100, target_residency=400
> (only target_residency changed)
>
> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> Link: https://patch.msgid.link/20251216080402.156988-3-dedekind1@gmail.com
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 111f77a23348
>
> Add __init to get_cmdline_field(). Put cmdline_table_str[] in .init.data.
> Other adjustments to fit our env.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> For the initial attempt, I've left the new option as a standalone one. It
> may be worth integrating with "mwait-idle", but I think much of the
> parsing would then want doing differently. It'll then likely be much
> harder to apply future Linux changes there.
>
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1928,6 +1928,23 @@ Print boot time MTRR state.
> Use the MWAIT idle driver (with model specific C-state knowledge) instead
> of the ACPI based one.
>
> +### mwait-idle.table (x86)
The .table suffix is kind of weird, as we don't use it elsewhere. It
might feel more natural given the naming of the existing command line
options to use mwait-idle-table?
> +> `= <string>`
> +
> + Define the C-states table from a user input string. Expected format is
> + 'name:latency:residency', where:
> + - name: The C-state name.
> + - latency: The C-state exit latency in us.
> + - residency: The C-state target residency in us.
> +
> + Multiple C-states can be defined by separating them with commas:
> + 'name1:latency1:residency1,name2:latency2:residency2'
> +
> + Example: intel_idle.table=C1:1:1,C1E:5:10,C6:100:600
s/intel_idle/mwait-idle/ in the line above.
> +
> + To leave latency or residency unchanged, use an empty field, for example:
> + 'C1:1:1,C1E::10' - leaves C1E latency unchanged.
> +
> ### nmi (x86)
> > `= ignore | dom0 | fatal`
>
> --- a/xen/arch/x86/cpu/mwait-idle.c
> +++ b/xen/arch/x86/cpu/mwait-idle.c
> @@ -70,6 +70,11 @@
> static __initdata bool opt_mwait_idle = true;
> boolean_param("mwait-idle", opt_mwait_idle);
>
> +/* The maximum allowed length for the 'table' module parameter */
> +#define MAX_CMDLINE_TABLE_LEN 256
> +static char cmdline_table_str[MAX_CMDLINE_TABLE_LEN] __initdata;
> +string_param("mwait-idle.table", cmdline_table_str);
> +
> static unsigned int mwait_substates;
>
> #define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
> @@ -122,6 +127,9 @@ struct cpuidle_state {
> */
> #define CPUIDLE_FLAG_IBRS 0x20000
>
> +/* C-states data from the 'mwait-idle.table' cmdline parameter */
> +static struct cpuidle_state cmdline_states[ACPI_PROCESSOR_MAX_POWER] __initdata;
> +
> /*
> * MWAIT takes an 8-bit "hint" in EAX "suggesting"
> * the C-state (top nibble) and sub-state (bottom nibble)
> @@ -1546,6 +1554,161 @@ static void __init mwait_idle_state_tabl
> }
> }
>
> + /**
> + * get_cmdline_field - Get the current field from a cmdline string.
> + * @args: The cmdline string to get the current field from.
> + * @field: Pointer to the current field upon return.
> + * @sep: The fields separator character.
> + *
> + * Examples:
> + * Input: args="C1:1:1,C1E:2:10", sep=':'
> + * Output: field="C1", return "1:1,C1E:2:10"
> + * Input: args="C1:1:1,C1E:2:10", sep=','
> + * Output: field="C1:1:1", return "C1E:2:10"
> + * Ipnut: args="::", sep=':'
> + * Output: field="", return ":"
> + *
> + * Return: The continuation of the cmdline string after the field or NULL.
> + */
> +static char *__init get_cmdline_field(char *args, char **field, char sep)
> +{
> + unsigned int i;
> +
> + for (i = 0; args[i] && !isspace(args[i]); i++) {
> + if (args[i] == sep)
> + break;
> + }
> +
> + *field = args;
> +
> + if (args[i] != sep)
> + return NULL;
> +
> + args[i] = '\0';
> + return args + i + 1;
> +}
> +
> +/**
> + * cmdline_table_adjust - Adjust the C-states table with data from cmdline.
> + *
> + * Adjust the C-states table with data from the 'mwait-idle.table' parameter
> + * (if specified).
> + */
> +static void __init cmdline_table_adjust(void)
> +{
> + char *args = cmdline_table_str;
> + struct cpuidle_state *state;
> + unsigned int i, state_count;
> +
> + if (args[0] == '\0')
> + /* The 'mwait-idle.table' module parameter was not specified */
> + return;
> +
> + /* Create a copy of the C-states table */
> + for (i = 0;
> + i < ARRAY_SIZE(cmdline_states) && icpu.state_table[i].name[0];
> + i++)
> + cmdline_states[i] = icpu.state_table[i];
> +
> + state_count = i;
> +
> + /*
> + * Adjust the C-states table copy with data from the 'mwait-idle.table'
> + * module parameter.
> + */
> + while (args) {
> + char *fields, *name, *val;
> +
> + /*
> + * Get the next C-state definition, which is expected to be
> + * '<name>:<latency_us>:<target_residency_us>'. Treat "empty"
> + * fields as unchanged. For example,
> + * '<name>::<target_residency_us>' leaves the latency unchanged.
> + */
> + args = get_cmdline_field(args, &fields, ',');
> +
> + /* name */
> + fields = get_cmdline_field(fields, &name, ':');
> + if (!fields)
> + goto error;
> +
> + /* Find the C-state by its name */
> + state = NULL;
> + for (i = 0; i < state_count; i++) {
> + if (!strcmp(name, cmdline_states[i].name)) {
> + state = &cmdline_states[i];
> + break;
> + }
> + }
> +
> + if (!state) {
> + printk(XENLOG_ERR PREFIX "C-state '%s' was not found\n",
> + name);
> + continue;
> + }
> +
> + /* Latency */
> + fields = get_cmdline_field(fields, &val, ':');
> + if (!fields)
> + goto error;
> +
> + if (*val) {
> + const char *end;
> + unsigned long n = simple_strtoul(val, &end, 0);
> +
> + state->exit_latency = n;
> + if (*end || state->exit_latency != n)
> + goto error;
> + }
> +
> + /* Target residency */
> + fields = get_cmdline_field(fields, &val, ':');
> +
> + if (*val) {
> + const char *end;
> + unsigned long n = simple_strtoul(val, &end, 0);
> +
> + state->target_residency = n;
> + if (*end || state->target_residency != n)
> + goto error;
> + }
> +
> + /*
> + * Allow for 3 more fields, but ignore them. Helps to make
> + * possible future extensions of the cmdline format backward
> + * compatible.
> + */
> + for (i = 0; fields && i < 3; i++) {
> + fields = get_cmdline_field(fields, &val, ':');
> + if (!fields)
> + break;
> + }
This seems a bit arbitrary for my taste. I would rather ignore the
extra fields (and print a message about it), and proceed with the next
state.
> +
> + if (fields) {
> + printk(XENLOG_ERR PREFIX
> + "Too many fields for C-state '%s'\n",
> + state->name);
> + goto error;
> + }
> +
> + printk(XENLOG_INFO PREFIX
> + "C-state from cmdline: name=%s, latency=%u, residency=%u\n",
> + state->name, state->exit_latency, state->target_residency);
> + }
> +
> + /* Copy the adjusted C-states table back */
> + for (i = 0; i < state_count; i++)
> + icpu.state_table[i] = cmdline_states[i];
> +
> + printk(XENLOG_INFO PREFIX
> + "Adjusted C-states with data from 'mwait-idle.table'\n");
> + return;
> +
> + error:
> + printk(PREFIX
XENLOG_ERR maybe?
Thanks, Roger.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 9/9] x86/mwait-idle: Add C-states validation
2026-03-12 16:58 ` [PATCH 9/9] x86/mwait-idle: Add C-states validation Jan Beulich
@ 2026-04-24 19:15 ` Roger Pau Monné
2026-05-04 9:34 ` Jan Beulich
0 siblings, 1 reply; 27+ messages in thread
From: Roger Pau Monné @ 2026-04-24 19:15 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On Thu, Mar 12, 2026 at 05:58:21PM +0100, Jan Beulich wrote:
> From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
>
> Add validation for C-states specified via the "table=" module parameter.
> Treat this module parameter as untrusted input and validate it thoroughly.
>
> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> Link: https://patch.msgid.link/20251216080402.156988-4-dedekind1@gmail.com
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git be6a150829b3
>
> Add __init to validate_cmdline_cstate(). Other adjustments to fit our env.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/cpu/mwait-idle.c
> +++ b/xen/arch/x86/cpu/mwait-idle.c
> @@ -72,6 +72,11 @@ boolean_param("mwait-idle", opt_mwait_id
>
> /* The maximum allowed length for the 'table' module parameter */
> #define MAX_CMDLINE_TABLE_LEN 256
> +/* Maximum allowed C-state latency */
> +#define MAX_CMDLINE_LATENCY_US (5 * 1000 /* USEC_PER_MSEC */)
> +/* Maximum allowed C-state target residency */
> +#define MAX_CMDLINE_RESIDENCY_US (100 * 1000 /* USEC_PER_MSEC */)
> +
> static char cmdline_table_str[MAX_CMDLINE_TABLE_LEN] __initdata;
> string_param("mwait-idle.table", cmdline_table_str);
>
> @@ -1589,6 +1594,41 @@ static char *__init get_cmdline_field(ch
> }
>
> /**
> + * validate_cmdline_cstate - Validate a C-state from cmdline.
> + * @state: The C-state to validate.
> + * @prev_state: The previous C-state in the table or NULL.
> + *
> + * Return: 0 if the C-state is valid or -EINVAL otherwise.
Hm, I know we picked this up from upstream, but this function would
better return a boolean, rather than 0 or -EINVAL.
> + */
> +static int __init validate_cmdline_cstate(struct cpuidle_state *state,
> + struct cpuidle_state *prev_state)
> +{
> + if (state->exit_latency == 0)
> + /* Exit latency 0 can only be used for the POLL state */
> + return -EINVAL;
> +
> + if (state->exit_latency > MAX_CMDLINE_LATENCY_US)
> + return -EINVAL;
> +
> + if (state->target_residency > MAX_CMDLINE_RESIDENCY_US)
> + return -EINVAL;
> +
> + if (state->target_residency < state->exit_latency)
> + return -EINVAL;
> +
> + if (!prev_state)
> + return 0;
> +
> + if (state->exit_latency <= prev_state->exit_latency)
> + return -EINVAL;
> +
> + if (state->target_residency <= prev_state->target_residency)
> + return -EINVAL;
I'm not an expert on C-states, but isn't this checking against the
previous value kind of defeating part of the purpose of the command
line?
Also, it might help to also write down those limits in the command
line documentation.
Thanks, Roger.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/9] x86/mwait-idle: arrange for BSP MSR adjustments during S3 resume
2026-04-24 14:34 ` Roger Pau Monné
@ 2026-05-04 9:02 ` Jan Beulich
0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2026-05-04 9:02 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On 24.04.2026 16:34, Roger Pau Monné wrote:
> On Thu, Mar 12, 2026 at 05:54:30PM +0100, Jan Beulich wrote:
>> mwait_idle_cpu_init() is only called for APs, yet MSR writes will
>> typically need re-doing post-S3 even for the BSP. When multiple cores /
>> threads are present (and to come back online) in a package, for package
>> scope MSRs this may be covered by APs doing the writes, but we can't rely
>> on that.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/xen/arch/x86/acpi/power.c
>> +++ b/xen/arch/x86/acpi/power.c
>> @@ -28,6 +28,7 @@
>> #include <asm/io_apic.h>
>> #include <asm/irq.h>
>> #include <asm/microcode.h>
>> +#include <asm/mwait.h>
>> #include <asm/prot-key.h>
>> #include <asm/spec_ctrl.h>
>> #include <asm/tboot.h>
>> @@ -299,6 +300,7 @@ static int enter_state(u32 state)
>> acpi_sleep_post(state);
>> if ( hvm_cpu_up() )
>> BUG();
>> + mwait_idle_resume();
>> cpufreq_add_cpu(0);
>>
>> enable_cpu:
>> --- a/xen/arch/x86/cpu/mwait-idle.c
>> +++ b/xen/arch/x86/cpu/mwait-idle.c
>> @@ -1680,6 +1680,28 @@ static int __init mwait_idle_probe(void)
>> return 0;
>> }
>>
>> +static void mwait_idle_cpu_tweak(unsigned int cpu)
>> +{
>> + if (icpu->auto_demotion_disable_flags)
>> + on_selected_cpus(cpumask_of(cpu), auto_demotion_disable, NULL, 1);
>> +
>> + if (icpu->byt_auto_demotion_disable_flag)
>> + on_selected_cpus(cpumask_of(cpu), byt_auto_demotion_disable, NULL, 1);
>> +
>> + switch (icpu->c1e_promotion) {
>> + case C1E_PROMOTION_DISABLE:
>> + on_selected_cpus(cpumask_of(cpu), c1e_promotion_disable, NULL, 1);
>> + break;
>> +
>> + case C1E_PROMOTION_ENABLE:
>> + on_selected_cpus(cpumask_of(cpu), c1e_promotion_enable, NULL, 1);
>> + break;
>> +
>> + case C1E_PROMOTION_PRESERVE:
>> + break;
>> + }
>> +}
>> +
>> static int cf_check mwait_idle_cpu_init(
>> struct notifier_block *nfb, unsigned long action, void *hcpu)
>> {
>> @@ -1762,24 +1784,7 @@ static int cf_check mwait_idle_cpu_init(
>> dev->count++;
>> }
>>
>> - if (icpu->auto_demotion_disable_flags)
>> - on_selected_cpus(cpumask_of(cpu), auto_demotion_disable, NULL, 1);
>> -
>> - if (icpu->byt_auto_demotion_disable_flag)
>> - on_selected_cpus(cpumask_of(cpu), byt_auto_demotion_disable, NULL, 1);
>> -
>> - switch (icpu->c1e_promotion) {
>> - case C1E_PROMOTION_DISABLE:
>> - on_selected_cpus(cpumask_of(cpu), c1e_promotion_disable, NULL, 1);
>> - break;
>> -
>> - case C1E_PROMOTION_ENABLE:
>> - on_selected_cpus(cpumask_of(cpu), c1e_promotion_enable, NULL, 1);
>> - break;
>
> I'm possibly missing some context here, but why do we use
> on_selected_cpus and the CPU_ONLINE hook? Won't it be easier to use
> CPU_STARTING and avoid the use of on_selected_cpus(), as CPU_STARTING
> runs in the context of the CPU being onlined.
CPU_STARTING happens pretty early, e.g. before IRQs are first enabled. I
consider use of that notifier to generally be restricted to pretty
special purposes. But yes, technically switching over may be possible.
However, in any event what you're suggesting is imo an entirely separate
change.
Jan
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/9] x86/mwait-idle: clean up BYT/CHT auto demotion disable
2026-04-24 14:47 ` Roger Pau Monné
@ 2026-05-04 9:07 ` Jan Beulich
0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2026-05-04 9:07 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On 24.04.2026 16:47, Roger Pau Monné wrote:
> On Thu, Mar 12, 2026 at 05:54:56PM +0100, Jan Beulich wrote:
>> @@ -1700,12 +1694,24 @@ static void mwait_idle_cpu_tweak(unsigne
>> case C1E_PROMOTION_PRESERVE:
>> break;
>> }
>> +
>> + /* Pkg-scope MSRs on 1-socket-only systems need writing only once. */
>> + if (!bsp)
>> + return;
>> +
>> + switch (boot_cpu_data.vfm) {
>> + case INTEL_ATOM_SILVERMONT:
>> + case INTEL_ATOM_AIRMONT:
>> + byt_cht_auto_demotion_disable();
>> + break;
>> + }
>> }
>>
>> static int cf_check mwait_idle_cpu_init(
>> struct notifier_block *nfb, unsigned long action, void *hcpu)
>> {
>> unsigned int cpu = (unsigned long)hcpu, cstate;
>> + static bool first;
>
> I think you want to init first = true here, so that after the first
> call to mwait_idle_cpu_tweak() it gets set to false for future calls?
Ouch, yes, definitely.
Jan
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 7/9] x86/mwait-idle: drop const from struct cpuidle_state arrays
2026-04-24 17:57 ` Roger Pau Monné
@ 2026-05-04 9:14 ` Jan Beulich
0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2026-05-04 9:14 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On 24.04.2026 19:57, Roger Pau Monné wrote:
> On Thu, Mar 12, 2026 at 05:57:18PM +0100, Jan Beulich wrote:
>> A subsequent change will want to be able to alter them based on a new
>> command line option. (Note that some were __ro_after_init already.)
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> I wonder if we could also move all the cstate tables to .init section,
> as after boot we would only use one of those.
I think we could, at the price of introducing a compile-time upper bound
to the size of those arrays. We'd need to copy the one we want to use at
runtime, and hence we'd need to reserve enough space (or use a runtime
allocation).
> Anyway, for the change
> here:
>
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Thanks.
Jan
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 8/9] x86/mwait-idle: Add cmdline option to adjust C-states table
2026-04-24 19:10 ` Roger Pau Monné
@ 2026-05-04 9:29 ` Jan Beulich
0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2026-05-04 9:29 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On 24.04.2026 21:10, Roger Pau Monné wrote:
> On Thu, Mar 12, 2026 at 05:57:53PM +0100, Jan Beulich wrote:
>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -1928,6 +1928,23 @@ Print boot time MTRR state.
>> Use the MWAIT idle driver (with model specific C-state knowledge) instead
>> of the ACPI based one.
>>
>> +### mwait-idle.table (x86)
>
> The .table suffix is kind of weird, as we don't use it elsewhere. It
> might feel more natural given the naming of the existing command line
> options to use mwait-idle-table?
I thought it might be better to follow the Linux way of naming per-module
sub-options, when we "inherit" them. See also "mtrr.show".
>> +static void __init cmdline_table_adjust(void)
>> +{
>> + char *args = cmdline_table_str;
>> + struct cpuidle_state *state;
>> + unsigned int i, state_count;
>> +
>> + if (args[0] == '\0')
>> + /* The 'mwait-idle.table' module parameter was not specified */
>> + return;
>> +
>> + /* Create a copy of the C-states table */
>> + for (i = 0;
>> + i < ARRAY_SIZE(cmdline_states) && icpu.state_table[i].name[0];
>> + i++)
>> + cmdline_states[i] = icpu.state_table[i];
>> +
>> + state_count = i;
>> +
>> + /*
>> + * Adjust the C-states table copy with data from the 'mwait-idle.table'
>> + * module parameter.
>> + */
>> + while (args) {
>> + char *fields, *name, *val;
>> +
>> + /*
>> + * Get the next C-state definition, which is expected to be
>> + * '<name>:<latency_us>:<target_residency_us>'. Treat "empty"
>> + * fields as unchanged. For example,
>> + * '<name>::<target_residency_us>' leaves the latency unchanged.
>> + */
>> + args = get_cmdline_field(args, &fields, ',');
>> +
>> + /* name */
>> + fields = get_cmdline_field(fields, &name, ':');
>> + if (!fields)
>> + goto error;
>> +
>> + /* Find the C-state by its name */
>> + state = NULL;
>> + for (i = 0; i < state_count; i++) {
>> + if (!strcmp(name, cmdline_states[i].name)) {
>> + state = &cmdline_states[i];
>> + break;
>> + }
>> + }
>> +
>> + if (!state) {
>> + printk(XENLOG_ERR PREFIX "C-state '%s' was not found\n",
>> + name);
>> + continue;
>> + }
>> +
>> + /* Latency */
>> + fields = get_cmdline_field(fields, &val, ':');
>> + if (!fields)
>> + goto error;
>> +
>> + if (*val) {
>> + const char *end;
>> + unsigned long n = simple_strtoul(val, &end, 0);
>> +
>> + state->exit_latency = n;
>> + if (*end || state->exit_latency != n)
>> + goto error;
>> + }
>> +
>> + /* Target residency */
>> + fields = get_cmdline_field(fields, &val, ':');
>> +
>> + if (*val) {
>> + const char *end;
>> + unsigned long n = simple_strtoul(val, &end, 0);
>> +
>> + state->target_residency = n;
>> + if (*end || state->target_residency != n)
>> + goto error;
>> + }
>> +
>> + /*
>> + * Allow for 3 more fields, but ignore them. Helps to make
>> + * possible future extensions of the cmdline format backward
>> + * compatible.
>> + */
>> + for (i = 0; fields && i < 3; i++) {
>> + fields = get_cmdline_field(fields, &val, ':');
>> + if (!fields)
>> + break;
>> + }
>
> This seems a bit arbitrary for my taste. I would rather ignore the
> extra fields (and print a message about it), and proceed with the next
> state.
I think we want to stick to the original Linux behavior here. If you
think that wants adjustment, imo doing so would call for going through
Linux first.
>> +
>> + if (fields) {
>> + printk(XENLOG_ERR PREFIX
>> + "Too many fields for C-state '%s'\n",
>> + state->name);
>> + goto error;
>> + }
>> +
>> + printk(XENLOG_INFO PREFIX
>> + "C-state from cmdline: name=%s, latency=%u, residency=%u\n",
>> + state->name, state->exit_latency, state->target_residency);
>> + }
>> +
>> + /* Copy the adjusted C-states table back */
>> + for (i = 0; i < state_count; i++)
>> + icpu.state_table[i] = cmdline_states[i];
>> +
>> + printk(XENLOG_INFO PREFIX
>> + "Adjusted C-states with data from 'mwait-idle.table'\n");
>> + return;
>> +
>> + error:
>> + printk(PREFIX
>
> XENLOG_ERR maybe?
It's pr_info() in the Linux original, so ERR would feel like going too
far. And WARNING is the default anyway. I can switch to XENLOG_WARNING
if you think we want to make this explicit.
Jan
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 9/9] x86/mwait-idle: Add C-states validation
2026-04-24 19:15 ` Roger Pau Monné
@ 2026-05-04 9:34 ` Jan Beulich
2026-05-08 7:38 ` Roger Pau Monné
0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2026-05-04 9:34 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On 24.04.2026 21:15, Roger Pau Monné wrote:
> On Thu, Mar 12, 2026 at 05:58:21PM +0100, Jan Beulich wrote:
>> @@ -1589,6 +1594,41 @@ static char *__init get_cmdline_field(ch
>> }
>>
>> /**
>> + * validate_cmdline_cstate - Validate a C-state from cmdline.
>> + * @state: The C-state to validate.
>> + * @prev_state: The previous C-state in the table or NULL.
>> + *
>> + * Return: 0 if the C-state is valid or -EINVAL otherwise.
>
> Hm, I know we picked this up from upstream, but this function would
> better return a boolean, rather than 0 or -EINVAL.
I agree, but I didn't want to deviate from their code purely for cosmetic
reasons.
>> +static int __init validate_cmdline_cstate(struct cpuidle_state *state,
>> + struct cpuidle_state *prev_state)
>> +{
>> + if (state->exit_latency == 0)
>> + /* Exit latency 0 can only be used for the POLL state */
>> + return -EINVAL;
>> +
>> + if (state->exit_latency > MAX_CMDLINE_LATENCY_US)
>> + return -EINVAL;
>> +
>> + if (state->target_residency > MAX_CMDLINE_RESIDENCY_US)
>> + return -EINVAL;
>> +
>> + if (state->target_residency < state->exit_latency)
>> + return -EINVAL;
>> +
>> + if (!prev_state)
>> + return 0;
>> +
>> + if (state->exit_latency <= prev_state->exit_latency)
>> + return -EINVAL;
>> +
>> + if (state->target_residency <= prev_state->target_residency)
>> + return -EINVAL;
>
> I'm not an expert on C-states, but isn't this checking against the
> previous value kind of defeating part of the purpose of the command
> line?
I don't know. The question would need raising to the author.
> Also, it might help to also write down those limits in the command
> line documentation.
What do you mean there? Some of the values are universal, but some
checks are against model-specific values. I don't think you mean to
enumerate them all?
Jan
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 9/9] x86/mwait-idle: Add C-states validation
2026-05-04 9:34 ` Jan Beulich
@ 2026-05-08 7:38 ` Roger Pau Monné
2026-05-11 10:41 ` Jan Beulich
0 siblings, 1 reply; 27+ messages in thread
From: Roger Pau Monné @ 2026-05-08 7:38 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On Mon, May 04, 2026 at 11:34:40AM +0200, Jan Beulich wrote:
> On 24.04.2026 21:15, Roger Pau Monné wrote:
> > On Thu, Mar 12, 2026 at 05:58:21PM +0100, Jan Beulich wrote:
> >> @@ -1589,6 +1594,41 @@ static char *__init get_cmdline_field(ch
> >> }
> >>
> >> /**
> >> + * validate_cmdline_cstate - Validate a C-state from cmdline.
> >> + * @state: The C-state to validate.
> >> + * @prev_state: The previous C-state in the table or NULL.
> >> + *
> >> + * Return: 0 if the C-state is valid or -EINVAL otherwise.
> >
> > Hm, I know we picked this up from upstream, but this function would
> > better return a boolean, rather than 0 or -EINVAL.
>
> I agree, but I didn't want to deviate from their code purely for cosmetic
> reasons.
>
> >> +static int __init validate_cmdline_cstate(struct cpuidle_state *state,
> >> + struct cpuidle_state *prev_state)
> >> +{
> >> + if (state->exit_latency == 0)
> >> + /* Exit latency 0 can only be used for the POLL state */
> >> + return -EINVAL;
> >> +
> >> + if (state->exit_latency > MAX_CMDLINE_LATENCY_US)
> >> + return -EINVAL;
> >> +
> >> + if (state->target_residency > MAX_CMDLINE_RESIDENCY_US)
> >> + return -EINVAL;
> >> +
> >> + if (state->target_residency < state->exit_latency)
> >> + return -EINVAL;
> >> +
> >> + if (!prev_state)
> >> + return 0;
> >> +
> >> + if (state->exit_latency <= prev_state->exit_latency)
> >> + return -EINVAL;
> >> +
> >> + if (state->target_residency <= prev_state->target_residency)
> >> + return -EINVAL;
> >
> > I'm not an expert on C-states, but isn't this checking against the
> > previous value kind of defeating part of the purpose of the command
> > line?
>
> I don't know. The question would need raising to the author.
>
> > Also, it might help to also write down those limits in the command
> > line documentation.
>
> What do you mean there? Some of the values are universal, but some
> checks are against model-specific values. I don't think you mean to
> enumerate them all?
Maybe it's indeed not very useful. What I referring to was something
along the lines of: "the command line provided residency and latency
values must be smaller than the default ones". As noted above it
seems weird to me than higher than current values cannot be set,
albeit I have no idea what's the expected usage of this interface.
Thanks, Roger.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 9/9] x86/mwait-idle: Add C-states validation
2026-05-08 7:38 ` Roger Pau Monné
@ 2026-05-11 10:41 ` Jan Beulich
0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2026-05-11 10:41 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On 08.05.2026 09:38, Roger Pau Monné wrote:
> On Mon, May 04, 2026 at 11:34:40AM +0200, Jan Beulich wrote:
>> On 24.04.2026 21:15, Roger Pau Monné wrote:
>>> On Thu, Mar 12, 2026 at 05:58:21PM +0100, Jan Beulich wrote:
>>>> @@ -1589,6 +1594,41 @@ static char *__init get_cmdline_field(ch
>>>> }
>>>>
>>>> /**
>>>> + * validate_cmdline_cstate - Validate a C-state from cmdline.
>>>> + * @state: The C-state to validate.
>>>> + * @prev_state: The previous C-state in the table or NULL.
>>>> + *
>>>> + * Return: 0 if the C-state is valid or -EINVAL otherwise.
>>>
>>> Hm, I know we picked this up from upstream, but this function would
>>> better return a boolean, rather than 0 or -EINVAL.
>>
>> I agree, but I didn't want to deviate from their code purely for cosmetic
>> reasons.
>>
>>>> +static int __init validate_cmdline_cstate(struct cpuidle_state *state,
>>>> + struct cpuidle_state *prev_state)
>>>> +{
>>>> + if (state->exit_latency == 0)
>>>> + /* Exit latency 0 can only be used for the POLL state */
>>>> + return -EINVAL;
>>>> +
>>>> + if (state->exit_latency > MAX_CMDLINE_LATENCY_US)
>>>> + return -EINVAL;
>>>> +
>>>> + if (state->target_residency > MAX_CMDLINE_RESIDENCY_US)
>>>> + return -EINVAL;
>>>> +
>>>> + if (state->target_residency < state->exit_latency)
>>>> + return -EINVAL;
>>>> +
>>>> + if (!prev_state)
>>>> + return 0;
>>>> +
>>>> + if (state->exit_latency <= prev_state->exit_latency)
>>>> + return -EINVAL;
>>>> +
>>>> + if (state->target_residency <= prev_state->target_residency)
>>>> + return -EINVAL;
>>>
>>> I'm not an expert on C-states, but isn't this checking against the
>>> previous value kind of defeating part of the purpose of the command
>>> line?
>>
>> I don't know. The question would need raising to the author.
>>
>>> Also, it might help to also write down those limits in the command
>>> line documentation.
>>
>> What do you mean there? Some of the values are universal, but some
>> checks are against model-specific values. I don't think you mean to
>> enumerate them all?
>
> Maybe it's indeed not very useful. What I referring to was something
> along the lines of: "the command line provided residency and latency
> values must be smaller than the default ones". As noted above it
> seems weird to me than higher than current values cannot be set,
> albeit I have no idea what's the expected usage of this interface.
Hmm, while meaning to make this change I came to wonder: What exactly do
you refer to by "current values" and "default ones"? prev_state here
isn't "previous state" as in "before this option was parsed", but as in
"next lower C-state", as per
prev_state = i ? &cmdline_states[i - 1] : NULL;
ahead of the call site.
Instead what I'm inclined to do (despite deviating from the original) is
to constify the function's parameters.
Jan
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/9] x86/mwait-idle: sync up with Linux 7.0-rc
2026-03-12 16:53 [PATCH 0/9] x86/mwait-idle: sync up with Linux 7.0-rc Jan Beulich
` (8 preceding siblings ...)
2026-03-12 16:58 ` [PATCH 9/9] x86/mwait-idle: Add C-states validation Jan Beulich
@ 2026-05-12 15:22 ` Oleksii Kurochko
9 siblings, 0 replies; 27+ messages in thread
From: Oleksii Kurochko @ 2026-05-12 15:22 UTC (permalink / raw)
To: Jan Beulich, xen-devel@lists.xenproject.org
Cc: Andrew Cooper, Roger Pau Monné
On 3/12/26 5:53 PM, Jan Beulich wrote:
> Includes a few custom changes, too.
>
> 1: arrange for BSP MSR adjustments during S3 resume
> 2: clean up BYT/CHT auto demotion disable
> 3: latch struct idle_cpu contents
> 4: move pre-initialized struct idle_cpu instances
> 5: Remove unused driver version constant
> 6: Remove the 'preferred_cstates' parameter
> 7: drop const from struct cpuidle_state arrays
> 8: Add cmdline option to adjust C-states table
> 9: Add C-states validation
>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
~ Oleksii
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2026-05-12 15:23 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-12 16:53 [PATCH 0/9] x86/mwait-idle: sync up with Linux 7.0-rc Jan Beulich
2026-03-12 16:54 ` [PATCH 1/9] x86/mwait-idle: arrange for BSP MSR adjustments during S3 resume Jan Beulich
2026-04-24 14:34 ` Roger Pau Monné
2026-05-04 9:02 ` Jan Beulich
2026-03-12 16:54 ` [PATCH 2/9] x86/mwait-idle: clean up BYT/CHT auto demotion disable Jan Beulich
2026-04-24 14:47 ` Roger Pau Monné
2026-05-04 9:07 ` Jan Beulich
2026-03-12 16:55 ` [PATCH 3/9] x86/mwait-idle: latch struct idle_cpu contents Jan Beulich
2026-04-24 15:24 ` Roger Pau Monné
2026-03-12 16:55 ` [PATCH 4/9] x86/mwait-idle: move pre-initialized struct idle_cpu instances Jan Beulich
2026-04-24 15:33 ` Roger Pau Monné
2026-03-12 16:56 ` [PATCH 5/9] x86/mwait-idle: Remove unused driver version constant Jan Beulich
2026-04-24 15:35 ` Roger Pau Monné
2026-03-12 16:56 ` [PATCH 6/9] x86/mwait-idle: Remove the 'preferred_cstates' parameter Jan Beulich
2026-04-24 15:37 ` Roger Pau Monné
2026-03-12 16:57 ` [PATCH 7/9] x86/mwait-idle: drop const from struct cpuidle_state arrays Jan Beulich
2026-04-24 17:57 ` Roger Pau Monné
2026-05-04 9:14 ` Jan Beulich
2026-03-12 16:57 ` [PATCH 8/9] x86/mwait-idle: Add cmdline option to adjust C-states table Jan Beulich
2026-04-24 19:10 ` Roger Pau Monné
2026-05-04 9:29 ` Jan Beulich
2026-03-12 16:58 ` [PATCH 9/9] x86/mwait-idle: Add C-states validation Jan Beulich
2026-04-24 19:15 ` Roger Pau Monné
2026-05-04 9:34 ` Jan Beulich
2026-05-08 7:38 ` Roger Pau Monné
2026-05-11 10:41 ` Jan Beulich
2026-05-12 15:22 ` [PATCH 0/9] x86/mwait-idle: sync up with Linux 7.0-rc Oleksii Kurochko
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.