* [PATCH 0/4 v2] Add support for running in VM guests to intel_idle
@ 2023-06-05 15:47 arjan
2023-06-05 15:47 ` [PATCH 1/4] intel_idle: refactor state->enter manipulation into its own function arjan
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: arjan @ 2023-06-05 15:47 UTC (permalink / raw)
To: linux-pm; +Cc: artem.bityutskiy, rafael, Arjan van de Ven
From: Arjan van de Ven <arjan@linux.intel.com>
intel_idle provides the CPU Idle states (for power saving in idle) to the
cpuidle framework, based on per-cpu tables combined with limited hardware
enumeration. This combination of cpuidle and intel_idle provides dynamic
behavior where power saving and performance impact are balanced, and where a
set of generic knobs is provided in sysfs for users to tune the heuristics
(and get statistics, etc.).
However, intel_idle currently does not support running inside VM guests, so
the Linux kernel falls back to either ACPI-based idle (if supported by the
hypervisor/virtual BIOS) or the default x86 fallback "hlt"-based idle method,
which dates back to the 1.2 kernel series and lacks all the dynamic behavior,
user control and statistics that cpuidle brings.
While this is obviously functional, it's not great, and we can do better for
the user by hooking intel_idle into the cpuidle framework for the "in a guest"
case as well. Beyond being less friendly to the user, the fallback also lacks
two key capabilities that the bare metal case supports:
1) The ability to flush the TLB for very long idle periods, to avoid a
costly, high-latency IPI wakeup of an idle vCPU later, when a process that
used to run on that vCPU does an munmap or similar operation. Avoiding
high-latency IPIs helps avoid performance jitter.
2) The ability to use the new Intel C0.2 idle state instead of polling
for very short idle periods, to save power (and carbon footprint).
This patch series adds the basic support to run in a VM guest
to the intel_idle driver, and then addresses the first of these shortfalls.
The C0.2 gap will be fixed with a small additional patch after the
C0.2 support is merged separately.
Arjan van de Ven (4):
intel_idle: refactor state->enter manipulation into its own function
intel_idle: clean up the (new) state_update_enter_method function
intel_idle: Add support for using intel_idle in a VM guest using just
hlt
intel_idle: Add a "Long HLT" C1 state for the VM guest mode
drivers/idle/intel_idle.c | 238 ++++++++++++++++++++++++++++++++++----
1 file changed, 215 insertions(+), 23 deletions(-)
--
2.40.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/4] intel_idle: refactor state->enter manipulation into its own function
2023-06-05 15:47 [PATCH 0/4 v2] Add support for running in VM guests to intel_idle arjan
@ 2023-06-05 15:47 ` arjan
2023-06-07 7:34 ` Zhang, Rui
2023-06-05 15:47 ` [PATCH 2/4] intel_idle: clean up the (new) state_update_enter_method function arjan
` (2 subsequent siblings)
3 siblings, 1 reply; 16+ messages in thread
From: arjan @ 2023-06-05 15:47 UTC (permalink / raw)
To: linux-pm; +Cc: artem.bityutskiy, rafael, Arjan van de Ven
From: Arjan van de Ven <arjan@linux.intel.com>
Since the 6.4 kernel, the logic for updating a state's enter method
based on "environmental conditions" (command-line options, CPU side-channel
workarounds, etc.) has gotten pretty complex.
This patch refactors it into a separate small, self-contained function
(with no behavior changes) for improved readability and to make future
changes to this logic easier to do and understand.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
drivers/idle/intel_idle.c | 50 ++++++++++++++++++++++-----------------
1 file changed, 28 insertions(+), 22 deletions(-)
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index aa2d19db2b1d..c351b21c0875 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1839,6 +1839,32 @@ static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
return true;
}
+static void state_update_enter_method(struct cpuidle_state *state, int cstate)
+{
+ if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
+ /*
+ * Combining with XSTATE with IBRS or IRQ_ENABLE flags
+ * is not currently supported but this driver.
+ */
+ WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IBRS);
+ WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
+ state->enter = intel_idle_xstate;
+ } else if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
+ state->flags & CPUIDLE_FLAG_IBRS) {
+ /*
+ * IBRS mitigation requires that C-states are entered
+ * with interrupts disabled.
+ */
+ WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
+ state->enter = intel_idle_ibrs;
+ } else if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE) {
+ state->enter = intel_idle_irq;
+ } else if (force_irq_on) {
+ pr_info("forced intel_idle_irq for state %d\n", cstate);
+ state->enter = intel_idle_irq;
+ }
+}
+
static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
{
int cstate;
@@ -1894,28 +1920,8 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
drv->states[drv->state_count] = cpuidle_state_table[cstate];
state = &drv->states[drv->state_count];
- if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
- /*
- * Combining with XSTATE with IBRS or IRQ_ENABLE flags
- * is not currently supported but this driver.
- */
- WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IBRS);
- WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
- state->enter = intel_idle_xstate;
- } else if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
- state->flags & CPUIDLE_FLAG_IBRS) {
- /*
- * IBRS mitigation requires that C-states are entered
- * with interrupts disabled.
- */
- WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
- state->enter = intel_idle_ibrs;
- } else if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE) {
- state->enter = intel_idle_irq;
- } else if (force_irq_on) {
- pr_info("forced intel_idle_irq for state %d\n", cstate);
- state->enter = intel_idle_irq;
- }
+ state_update_enter_method(state, cstate);
+
if ((disabled_states_mask & BIT(drv->state_count)) ||
((icpu->use_acpi || force_use_acpi) &&
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/4] intel_idle: clean up the (new) state_update_enter_method function
2023-06-05 15:47 [PATCH 0/4 v2] Add support for running in VM guests to intel_idle arjan
2023-06-05 15:47 ` [PATCH 1/4] intel_idle: refactor state->enter manipulation into its own function arjan
@ 2023-06-05 15:47 ` arjan
2023-06-12 18:09 ` Rafael J. Wysocki
2023-06-05 15:47 ` [PATCH 3/4] intel_idle: Add support for using intel_idle in a VM guest using just hlt arjan
2023-06-05 15:47 ` [PATCH 4/4] intel_idle: Add a "Long HLT" C1 state for the VM guest mode arjan
3 siblings, 1 reply; 16+ messages in thread
From: arjan @ 2023-06-05 15:47 UTC (permalink / raw)
To: linux-pm; +Cc: artem.bityutskiy, rafael, Arjan van de Ven
From: Arjan van de Ven <arjan@linux.intel.com>
Now that the logic for state_update_enter_method() is in its own
function, the long if .. else if .. else if .. else if chain
can be simplified by just returning from the function
at the various places. This does not change functionality,
but it makes the logic much simpler to read or modify later.
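The shape of the cleanup can be illustrated in miniature with a user-space
sketch (the flag bits and return codes below are stand-ins, not the driver's
real CPUIDLE_FLAG_* values): both forms pick the same enter method, but the
early-return version keeps each case self-contained.

```c
#include <assert.h>

/* Stand-in flag bits; illustrative only, not the kernel's values. */
#define FLAG_XSTATE 0x1
#define FLAG_IBRS   0x2
#define FLAG_IRQ_EN 0x4

/* Before: one long else-if chain. */
static int pick_enter_chained(int flags)
{
	if (flags & FLAG_XSTATE)
		return 1;		/* e.g. intel_idle_xstate */
	else if (flags & FLAG_IBRS)
		return 2;		/* e.g. intel_idle_ibrs */
	else if (flags & FLAG_IRQ_EN)
		return 3;		/* e.g. intel_idle_irq */
	return 0;			/* default enter method */
}

/* After: early returns, one decision per block. */
static int pick_enter_early(int flags)
{
	if (flags & FLAG_XSTATE)
		return 1;

	if (flags & FLAG_IBRS)
		return 2;

	if (flags & FLAG_IRQ_EN)
		return 3;

	return 0;
}
```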
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
drivers/idle/intel_idle.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index c351b21c0875..256c2d42e350 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1849,7 +1849,10 @@ static void state_update_enter_method(struct cpuidle_state *state, int cstate)
WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IBRS);
WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
state->enter = intel_idle_xstate;
- } else if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
+ return;
+ }
+
+ if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
state->flags & CPUIDLE_FLAG_IBRS) {
/*
* IBRS mitigation requires that C-states are entered
@@ -1857,9 +1860,15 @@ static void state_update_enter_method(struct cpuidle_state *state, int cstate)
*/
WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
state->enter = intel_idle_ibrs;
- } else if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE) {
+ return;
+ }
+
+ if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE) {
state->enter = intel_idle_irq;
- } else if (force_irq_on) {
+ return;
+ }
+
+ if (force_irq_on) {
pr_info("forced intel_idle_irq for state %d\n", cstate);
state->enter = intel_idle_irq;
}
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 3/4] intel_idle: Add support for using intel_idle in a VM guest using just hlt
2023-06-05 15:47 [PATCH 0/4 v2] Add support for running in VM guests to intel_idle arjan
2023-06-05 15:47 ` [PATCH 1/4] intel_idle: refactor state->enter manipulation into its own function arjan
2023-06-05 15:47 ` [PATCH 2/4] intel_idle: clean up the (new) state_update_enter_method function arjan
@ 2023-06-05 15:47 ` arjan
2023-06-12 18:05 ` Rafael J. Wysocki
` (2 more replies)
2023-06-05 15:47 ` [PATCH 4/4] intel_idle: Add a "Long HLT" C1 state for the VM guest mode arjan
3 siblings, 3 replies; 16+ messages in thread
From: arjan @ 2023-06-05 15:47 UTC (permalink / raw)
To: linux-pm; +Cc: artem.bityutskiy, rafael, Arjan van de Ven
From: Arjan van de Ven <arjan@linux.intel.com>
In a typical VM guest, the mwait instruction is not available, leaving only the
'hlt' instruction (which causes a VMEXIT to the host).
So for this common case, intel_idle detects the lack of mwait and fails
to initialize, after which another idle method steps in and always uses hlt.
Other (less common) cases exist; the table below shows the before/after for these:
+------------+--------------------------+-------------------------+
| Hypervisor | Idle method before patch | Idle method after patch |
| exposes | | |
+============+==========================+=========================+
| nothing | default_idle fallback | intel_idle VM table |
| (common) | (straight "hlt") | |
+------------+--------------------------+-------------------------+
| mwait | intel_idle mwait table | intel_idle mwait table |
+------------+--------------------------+-------------------------+
| ACPI | ACPI C1 state ("hlt") | intel_idle VM table |
+------------+--------------------------+-------------------------+
By providing this capability in the intel_idle driver, we can
do better than the fallback or ACPI table methods. While this current change
only gets us to the existing behavior, later patches in this series
will add new capabilities such as optimized TLB flushing.
To do this, a simplified version of the initialization function
is created for VM guests; it is called when the CPU is recognized
but mwait is not supported and we are running in a VM guest.
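The resulting dispatch at init time can be sketched as a small decision
function (illustrative user-space code, not the driver's actual symbols; the
ACPI fallback path is left out for brevity):

```c
#include <assert.h>
#include <stdbool.h>

enum idle_method {
	IDLE_DEFAULT_HLT,	/* kernel's default_idle fallback ("hlt") */
	IDLE_INTEL_MWAIT,	/* intel_idle mwait-based state tables */
	IDLE_INTEL_VM_TABLE,	/* new hlt-based VM guest state table */
};

/*
 * Mirrors the intel_idle_init() flow after this patch: a recognized CPU
 * with mwait uses the normal tables; a recognized CPU without mwait now
 * gets the VM table when running under a hypervisor, instead of failing.
 */
static enum idle_method select_idle_method(bool cpu_recognized,
					   bool has_mwait,
					   bool in_vm_guest)
{
	if (cpu_recognized && has_mwait)
		return IDLE_INTEL_MWAIT;
	if (cpu_recognized && in_vm_guest)
		return IDLE_INTEL_VM_TABLE;
	return IDLE_DEFAULT_HLT;
}
```

With this, a recognized CPU without mwait no longer bails out with -ENODEV
under a hypervisor; it registers the hlt-based VM table instead.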
One thing to note is that the max latency (and break-even point) of this C1
state is higher than that of the typical bare metal C1 state, because hlt
causes a vmexit and the cost of vmexit + hypervisor overhead + vmenter is
typically on the order of up to 5 microseconds, even if the hypervisor does
not actually enter a hardware power saving state.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
drivers/idle/intel_idle.c | 125 +++++++++++++++++++++++++++++++++++++-
1 file changed, 124 insertions(+), 1 deletion(-)
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 256c2d42e350..d2518cf36ab4 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -199,6 +199,43 @@ static __cpuidle int intel_idle_xstate(struct cpuidle_device *dev,
return __intel_idle(dev, drv, index);
}
+static __always_inline int __intel_idle_hlt(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
+{
+ raw_safe_halt();
+ raw_local_irq_disable();
+ return index;
+}
+
+/**
+ * intel_idle_hlt - Ask the processor to enter the given idle state using hlt.
+ * @dev: cpuidle device of the target CPU.
+ * @drv: cpuidle driver (assumed to point to intel_idle_driver).
+ * @index: Target idle state index.
+ *
+ * Use the HLT instruction to notify the processor that the CPU represented by
+ * @dev is idle and it can try to enter the idle state corresponding to @index.
+ *
+ * Must be called under local_irq_disable().
+ */
+static __cpuidle int intel_idle_hlt(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
+{
+ return __intel_idle_hlt(dev, drv, index);
+}
+
+static __cpuidle int intel_idle_hlt_irq_on(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
+{
+ int ret;
+
+ raw_local_irq_enable();
+ ret = __intel_idle_hlt(dev, drv, index);
+ raw_local_irq_disable();
+
+ return ret;
+}
+
/**
* intel_idle_s2idle - Ask the processor to enter the given idle state.
* @dev: cpuidle device of the target CPU.
@@ -1242,6 +1279,18 @@ static struct cpuidle_state snr_cstates[] __initdata = {
.enter = NULL }
};
+static struct cpuidle_state vmguest_cstates[] __initdata = {
+ {
+ .name = "C1",
+ .desc = "HLT",
+ .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_IRQ_ENABLE,
+ .exit_latency = 5,
+ .target_residency = 10,
+ .enter = &intel_idle_hlt, },
+ {
+ .enter = NULL }
+};
+
static const struct idle_cpu idle_cpu_nehalem __initconst = {
.state_table = nehalem_cstates,
.auto_demotion_disable_flags = NHM_C1_AUTO_DEMOTE | NHM_C3_AUTO_DEMOTE,
@@ -1841,6 +1890,16 @@ static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
static void state_update_enter_method(struct cpuidle_state *state, int cstate)
{
+ if (state->enter == intel_idle_hlt) {
+ if (force_irq_on) {
+ pr_info("forced intel_idle_irq for state %d\n", cstate);
+ state->enter = intel_idle_hlt_irq_on;
+ }
+ return;
+ }
+ if (state->enter == intel_idle_hlt_irq_on)
+ return; /* no update scenarios */
+
if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
/*
* Combining with XSTATE with IBRS or IRQ_ENABLE flags
@@ -1874,6 +1933,29 @@ static void state_update_enter_method(struct cpuidle_state *state, int cstate)
}
}
+/*
+ * For mwait based states, we want to verify the cpuid data to see if the state
+ * is actually supported by this specific CPU.
+ * For non-mwait based states, this check should be skipped.
+ */
+static bool should_verify_mwait(struct cpuidle_state *state)
+{
+ if (state->enter == intel_idle_irq)
+ return true;
+ if (state->enter == intel_idle)
+ return true;
+ if (state->enter == intel_idle_ibrs)
+ return true;
+ if (state->enter == intel_idle_xstate)
+ return true;
+ if (state->enter == intel_idle_hlt)
+ return false;
+ if (state->enter == intel_idle_hlt_irq_on)
+ return false;
+
+ return true;
+}
+
static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
{
int cstate;
@@ -1922,7 +2004,7 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
}
mwait_hint = flg2MWAIT(cpuidle_state_table[cstate].flags);
- if (!intel_idle_verify_cstate(mwait_hint))
+ if (should_verify_mwait(&cpuidle_state_table[cstate]) && !intel_idle_verify_cstate(mwait_hint))
continue;
/* Structure copy. */
@@ -2056,6 +2138,45 @@ static void __init intel_idle_cpuidle_devices_uninit(void)
cpuidle_unregister_device(per_cpu_ptr(intel_idle_cpuidle_devices, i));
}
+static int __init intel_idle_vminit(const struct x86_cpu_id *id)
+{
+ int retval;
+
+ cpuidle_state_table = vmguest_cstates;
+
+ icpu = (const struct idle_cpu *)id->driver_data;
+
+ pr_debug("v" INTEL_IDLE_VERSION " model 0x%X\n",
+ boot_cpu_data.x86_model);
+
+ intel_idle_cpuidle_devices = alloc_percpu(struct cpuidle_device);
+ if (!intel_idle_cpuidle_devices)
+ return -ENOMEM;
+
+ intel_idle_cpuidle_driver_init(&intel_idle_driver);
+
+ retval = cpuidle_register_driver(&intel_idle_driver);
+ if (retval) {
+ struct cpuidle_driver *drv = cpuidle_get_driver();
+ printk(KERN_DEBUG pr_fmt("intel_idle yielding to %s\n"),
+ drv ? drv->name : "none");
+ goto init_driver_fail;
+ }
+
+ retval = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "idle/intel:online",
+ intel_idle_cpu_online, NULL);
+ if (retval < 0)
+ goto hp_setup_fail;
+
+ return 0;
+hp_setup_fail:
+ intel_idle_cpuidle_devices_uninit();
+ cpuidle_unregister_driver(&intel_idle_driver);
+init_driver_fail:
+ free_percpu(intel_idle_cpuidle_devices);
+ return retval;
+}
+
static int __init intel_idle_init(void)
{
const struct x86_cpu_id *id;
@@ -2074,6 +2195,8 @@ static int __init intel_idle_init(void)
id = x86_match_cpu(intel_idle_ids);
if (id) {
if (!boot_cpu_has(X86_FEATURE_MWAIT)) {
+ if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+ return intel_idle_vminit(id);
pr_debug("Please enable MWAIT in BIOS SETUP\n");
return -ENODEV;
}
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 4/4] intel_idle: Add a "Long HLT" C1 state for the VM guest mode
2023-06-05 15:47 [PATCH 0/4 v2] Add support for running in VM guests to intel_idle arjan
` (2 preceding siblings ...)
2023-06-05 15:47 ` [PATCH 3/4] intel_idle: Add support for using intel_idle in a VM guest using just hlt arjan
@ 2023-06-05 15:47 ` arjan
3 siblings, 0 replies; 16+ messages in thread
From: arjan @ 2023-06-05 15:47 UTC (permalink / raw)
To: linux-pm; +Cc: artem.bityutskiy, rafael, Arjan van de Ven
From: Arjan van de Ven <arjan@linux.intel.com>
intel_idle will, for the bare metal case, usually have one or more deep
power states that have the CPUIDLE_FLAG_TLB_FLUSHED flag set. When
a state with this flag is selected by the cpuidle framework, it will also
flush the TLBs as part of entering this state. The benefit of doing this is
that the kernel does not need to wake the CPU out of this deep power state
just to flush the TLBs, for which the latency can be very high due to the
exit latency of deep power states.
In a VM guest currently, this benefit of avoiding the wakeup does not exist,
while the problem (long exit latency) is even more severe. Linux will need
to wake up a vCPU (causing the host to either come out of a deep C state,
or the VMM to have to deschedule something else to schedule the vCPU), which
can take a very long time, adding a lot of latency to TLB flush operations
(including munmap and others).
To solve this, add a "Long HLT" C state to the state table for the VM guest
case that has the CPUIDLE_FLAG_TLB_FLUSHED flag set. The result of that is
that for long idle periods (where the VMM is likely to do things that cause
large latency) the cpuidle framework will flush the TLBs (and avoid the
wakeups), while for short/quick idle durations, the existing behavior is
retained.
Now, there is still only "hlt" available in the guest, but for long idle
periods the host can go to a deeper state (say C6). There is a reasonable
debate one can have about what to set for the exit_latency and break-even
point of this "Long HLT" state. The good news is that intel_idle has these
values available for the underlying CPU (even when mwait is not exposed).
The solution thus is to just use the latency and break-even point of the
deepest state from the bare metal CPU, under the assumption that this is a
pretty reasonable estimate of what the VMM would do to cause latency.
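As a rough sketch of that matching step (illustrative user-space code with
made-up latency numbers; the real driver walks icpu->state_table and updates
the "Long HLT" entry in matchup_vm_state_with_baremetal()):

```c
#include <assert.h>

struct sketch_state {
	unsigned int exit_latency;	/* microseconds */
	unsigned int target_residency;	/* microseconds */
};

/*
 * Raise the VM "Long HLT" state's latency and break-even point to those
 * of the deepest (highest exit latency) bare metal state, mimicking the
 * loop in matchup_vm_state_with_baremetal().
 */
static void match_deepest(struct sketch_state *vm_state,
			  const struct sketch_state *host, int nr_host)
{
	for (int i = 0; i < nr_host; i++) {
		if (host[i].exit_latency > vm_state->exit_latency) {
			vm_state->exit_latency = host[i].exit_latency;
			vm_state->target_residency = host[i].target_residency;
		}
	}
}
```

The loop ends up carrying the exit_latency and target_residency of the
highest-latency bare metal state into the VM "Long HLT" state, while the
short hlt state keeps its vmexit-dominated values.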
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
drivers/idle/intel_idle.c | 54 +++++++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index d2518cf36ab4..0bf5e9f5bed8 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1287,6 +1287,13 @@ static struct cpuidle_state vmguest_cstates[] __initdata = {
.exit_latency = 5,
.target_residency = 10,
.enter = &intel_idle_hlt, },
+ {
+ .name = "C1L",
+ .desc = "Long HLT",
+ .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TLB_FLUSHED,
+ .exit_latency = 5,
+ .target_residency = 200,
+ .enter = &intel_idle_hlt, },
{
.enter = NULL }
};
@@ -2138,6 +2145,44 @@ static void __init intel_idle_cpuidle_devices_uninit(void)
cpuidle_unregister_device(per_cpu_ptr(intel_idle_cpuidle_devices, i));
}
+/*
+ * Match up the latency and break even point of the bare metal (cpu based)
+ * states with the deepest VM available state.
+ *
+ * We only want to do this for the deepest states, the ones that have
+ * the TLB_FLUSHED flag set.
+ *
+ * All our short idle states are dominated by vmexit/vmenter latencies,
+ * not the underlying hardware latencies so we keep our values for these.
+ */
+static void matchup_vm_state_with_baremetal(void)
+{
+ int cstate;
+
+ for (cstate = 0; cstate < CPUIDLE_STATE_MAX; ++cstate) {
+ int matching_cstate;
+
+ if (intel_idle_max_cstate_reached(cstate))
+ break;
+
+ if (!cpuidle_state_table[cstate].enter &&
+ !cpuidle_state_table[cstate].enter_s2idle)
+ break;
+
+ if (!(cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_TLB_FLUSHED))
+ continue;
+
+ for (matching_cstate = 0; matching_cstate < CPUIDLE_STATE_MAX; ++matching_cstate) {
+ if (icpu->state_table[matching_cstate].exit_latency > cpuidle_state_table[cstate].exit_latency) {
+ cpuidle_state_table[cstate].exit_latency = icpu->state_table[matching_cstate].exit_latency;
+ cpuidle_state_table[cstate].target_residency = icpu->state_table[matching_cstate].target_residency;
+ }
+ }
+
+ }
+}
+
+
static int __init intel_idle_vminit(const struct x86_cpu_id *id)
{
int retval;
@@ -2153,6 +2198,15 @@ static int __init intel_idle_vminit(const struct x86_cpu_id *id)
if (!intel_idle_cpuidle_devices)
return -ENOMEM;
+ /*
+ * We don't know exactly what the host will do when we go idle, but as a worst-case estimate
+ * we can assume that the exit latency of the deepest host state will be hit for our
+ * deep (long duration) guest idle state.
+ * The same logic applies to the break-even point for the long duration guest idle state.
+ * So let's copy these two properties from the table we found for the host CPU type.
+ */
+ matchup_vm_state_with_baremetal();
+
intel_idle_cpuidle_driver_init(&intel_idle_driver);
retval = cpuidle_register_driver(&intel_idle_driver);
--
2.40.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/4] intel_idle: refactor state->enter manipulation into its own function
2023-06-05 15:47 ` [PATCH 1/4] intel_idle: refactor state->enter manipulation into its own function arjan
@ 2023-06-07 7:34 ` Zhang, Rui
2023-06-07 14:08 ` Arjan van de Ven
0 siblings, 1 reply; 16+ messages in thread
From: Zhang, Rui @ 2023-06-07 7:34 UTC (permalink / raw)
To: linux-pm@vger.kernel.org, arjan@linux.intel.com
Cc: artem.bityutskiy@linux.intel.com, rafael@kernel.org
Just one minor comment.
On Mon, 2023-06-05 at 15:47 +0000, arjan@linux.intel.com wrote:
>
>
> +static void state_update_enter_method(struct cpuidle_state *state,
> int cstate)
> +{
> + if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
> + /*
> + * Combining with XSTATE with IBRS or IRQ_ENABLE
> flags
> + * is not currently supported but this driver.
s/but/by?
seems like a typo in the original commit.
thanks,
rui
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/4] intel_idle: refactor state->enter manipulation into its own function
2023-06-07 7:34 ` Zhang, Rui
@ 2023-06-07 14:08 ` Arjan van de Ven
0 siblings, 0 replies; 16+ messages in thread
From: Arjan van de Ven @ 2023-06-07 14:08 UTC (permalink / raw)
To: Zhang, Rui, linux-pm@vger.kernel.org
Cc: artem.bityutskiy@linux.intel.com, rafael@kernel.org
On 6/7/2023 12:34 AM, Zhang, Rui wrote:
> Just one minor comment.
>
> On Mon, 2023-06-05 at 15:47 +0000, arjan@linux.intel.com wrote:
>>
>>
>> +static void state_update_enter_method(struct cpuidle_state *state,
>> int cstate)
>> +{
>> + if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
>> + /*
>> + * Combining with XSTATE with IBRS or IRQ_ENABLE
>> flags
>> + * is not currently supported but this driver.
>
> s/but/by?
> seems like a typo in the original commit.
this commit is a strict move of the existing code into a function, with zero changes.
I would prefer to keep it that way, obviously.
If someone wants to fix up spelling in comments, that should be a completely independent commit
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/4] intel_idle: Add support for using intel_idle in a VM guest using just hlt
2023-06-05 15:47 ` [PATCH 3/4] intel_idle: Add support for using intel_idle in a VM guest using just hlt arjan
@ 2023-06-12 18:05 ` Rafael J. Wysocki
2023-06-12 18:07 ` Arjan van de Ven
2023-06-15 17:46 ` Rafael J. Wysocki
2023-07-17 8:34 ` Xiaoyao Li
2 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2023-06-12 18:05 UTC (permalink / raw)
To: arjan; +Cc: linux-pm, artem.bityutskiy, rafael
On Mon, Jun 5, 2023 at 5:47 PM <arjan@linux.intel.com> wrote:
>
> From: Arjan van de Ven <arjan@linux.intel.com>
>
> In a typical VM guest, the mwait instruction is not available, leaving only the
> 'hlt' instruction (which causes a VMEXIT to the host).
>
> So for this common case, intel_idle detects the lack of mwait and fails
> to initialize, after which another idle method steps in and always uses hlt.
>
> Other (less common) cases exist; the table below shows the before/after for these:
>
> +------------+--------------------------+-------------------------+
> | Hypervisor | Idle method before patch | Idle method after patch |
> | exposes | | |
> +============+==========================+=========================+
> | nothing | default_idle fallback | intel_idle VM table |
> | (common) | (straight "hlt") | |
> +------------+--------------------------+-------------------------+
> | mwait | intel_idle mwait table | intel_idle mwait table |
> +------------+--------------------------+-------------------------+
> | ACPI | ACPI C1 state ("hlt") | intel_idle VM table |
> +------------+--------------------------+-------------------------+
>
> By providing this capability in the intel_idle driver, we can
> do better than the fallback or ACPI table methods. While this current change
> only gets us to the existing behavior, later patches in this series
> will add new capabilities such as optimized TLB flushing.
>
> To do this, a simplified version of the initialization function
> is created for VM guests; it is called when the CPU is recognized
> but mwait is not supported and we are running in a VM guest.
>
> One thing to note is that the max latency (and break-even point) of this C1
> state is higher than that of the typical bare metal C1 state, because hlt
> causes a vmexit and the cost of vmexit + hypervisor overhead + vmenter is
> typically on the order of up to 5 microseconds, even if the hypervisor does
> not actually enter a hardware power saving state.
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
> drivers/idle/intel_idle.c | 125 +++++++++++++++++++++++++++++++++++++-
> 1 file changed, 124 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 256c2d42e350..d2518cf36ab4 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -199,6 +199,43 @@ static __cpuidle int intel_idle_xstate(struct cpuidle_device *dev,
> return __intel_idle(dev, drv, index);
> }
>
> +static __always_inline int __intel_idle_hlt(struct cpuidle_device *dev,
> + struct cpuidle_driver *drv, int index)
> +{
> + raw_safe_halt();
> + raw_local_irq_disable();
> + return index;
> +}
> +
> +/**
> + * intel_idle_hlt - Ask the processor to enter the given idle state using hlt.
> + * @dev: cpuidle device of the target CPU.
> + * @drv: cpuidle driver (assumed to point to intel_idle_driver).
> + * @index: Target idle state index.
> + *
> + * Use the HLT instruction to notify the processor that the CPU represented by
> + * @dev is idle and it can try to enter the idle state corresponding to @index.
> + *
> + * Must be called under local_irq_disable().
> + */
> +static __cpuidle int intel_idle_hlt(struct cpuidle_device *dev,
> + struct cpuidle_driver *drv, int index)
> +{
> + return __intel_idle_hlt(dev, drv, index);
> +}
> +
> +static __cpuidle int intel_idle_hlt_irq_on(struct cpuidle_device *dev,
> + struct cpuidle_driver *drv, int index)
> +{
> + int ret;
> +
> + raw_local_irq_enable();
> + ret = __intel_idle_hlt(dev, drv, index);
> + raw_local_irq_disable();
> +
> + return ret;
> +}
> +
> /**
> * intel_idle_s2idle - Ask the processor to enter the given idle state.
> * @dev: cpuidle device of the target CPU.
> @@ -1242,6 +1279,18 @@ static struct cpuidle_state snr_cstates[] __initdata = {
> .enter = NULL }
> };
>
> +static struct cpuidle_state vmguest_cstates[] __initdata = {
> + {
> + .name = "C1",
> + .desc = "HLT",
> + .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_IRQ_ENABLE,
> + .exit_latency = 5,
> + .target_residency = 10,
> + .enter = &intel_idle_hlt, },
> + {
> + .enter = NULL }
> +};
> +
> static const struct idle_cpu idle_cpu_nehalem __initconst = {
> .state_table = nehalem_cstates,
> .auto_demotion_disable_flags = NHM_C1_AUTO_DEMOTE | NHM_C3_AUTO_DEMOTE,
> @@ -1841,6 +1890,16 @@ static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
>
> static void state_update_enter_method(struct cpuidle_state *state, int cstate)
> {
> + if (state->enter == intel_idle_hlt) {
> + if (force_irq_on) {
> + pr_info("forced intel_idle_irq for state %d\n", cstate);
> + state->enter = intel_idle_hlt_irq_on;
> + }
> + return;
> + }
> + if (state->enter == intel_idle_hlt_irq_on)
> + return; /* no update scenarios */
> +
> if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
> /*
> * Combining with XSTATE with IBRS or IRQ_ENABLE flags
> @@ -1874,6 +1933,29 @@ static void state_update_enter_method(struct cpuidle_state *state, int cstate)
> }
> }
>
> +/*
> + * For mwait based states, we want to verify the cpuid data to see if the state
> + * is actually supported by this specific CPU.
> + * For non-mwait based states, this check should be skipped.
> + */
> +static bool should_verify_mwait(struct cpuidle_state *state)
> +{
> + if (state->enter == intel_idle_irq)
> + return true;
> + if (state->enter == intel_idle)
> + return true;
> + if (state->enter == intel_idle_ibrs)
> + return true;
> + if (state->enter == intel_idle_xstate)
> + return true;
Since true is returned by default below, why are the above checks
necessary (or even useful for that matter)?
> + if (state->enter == intel_idle_hlt)
> + return false;
> + if (state->enter == intel_idle_hlt_irq_on)
> + return false;
> +
> + return true;
> +}
> +
> static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
> {
> int cstate;
> @@ -1922,7 +2004,7 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
> }
>
> mwait_hint = flg2MWAIT(cpuidle_state_table[cstate].flags);
> - if (!intel_idle_verify_cstate(mwait_hint))
> + if (should_verify_mwait(&cpuidle_state_table[cstate]) && !intel_idle_verify_cstate(mwait_hint))
> continue;
>
> /* Structure copy. */
> @@ -2056,6 +2138,45 @@ static void __init intel_idle_cpuidle_devices_uninit(void)
> cpuidle_unregister_device(per_cpu_ptr(intel_idle_cpuidle_devices, i));
> }
>
> +static int __init intel_idle_vminit(const struct x86_cpu_id *id)
> +{
> + int retval;
> +
> + cpuidle_state_table = vmguest_cstates;
> +
> + icpu = (const struct idle_cpu *)id->driver_data;
> +
> + pr_debug("v" INTEL_IDLE_VERSION " model 0x%X\n",
> + boot_cpu_data.x86_model);
> +
> + intel_idle_cpuidle_devices = alloc_percpu(struct cpuidle_device);
> + if (!intel_idle_cpuidle_devices)
> + return -ENOMEM;
> +
> + intel_idle_cpuidle_driver_init(&intel_idle_driver);
> +
> + retval = cpuidle_register_driver(&intel_idle_driver);
> + if (retval) {
> + struct cpuidle_driver *drv = cpuidle_get_driver();
> + printk(KERN_DEBUG pr_fmt("intel_idle yielding to %s\n"),
> + drv ? drv->name : "none");
> + goto init_driver_fail;
> + }
> +
> + retval = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "idle/intel:online",
> + intel_idle_cpu_online, NULL);
> + if (retval < 0)
> + goto hp_setup_fail;
> +
> + return 0;
> +hp_setup_fail:
> + intel_idle_cpuidle_devices_uninit();
> + cpuidle_unregister_driver(&intel_idle_driver);
> +init_driver_fail:
> + free_percpu(intel_idle_cpuidle_devices);
> + return retval;
> +}
> +
> static int __init intel_idle_init(void)
> {
> const struct x86_cpu_id *id;
> @@ -2074,6 +2195,8 @@ static int __init intel_idle_init(void)
> id = x86_match_cpu(intel_idle_ids);
> if (id) {
> if (!boot_cpu_has(X86_FEATURE_MWAIT)) {
> + if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
> + return intel_idle_vminit(id);
> pr_debug("Please enable MWAIT in BIOS SETUP\n");
> return -ENODEV;
> }
> --
> 2.40.1
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/4] intel_idle: Add support for using intel_idle in a VM guest using just hlt
2023-06-12 18:05 ` Rafael J. Wysocki
@ 2023-06-12 18:07 ` Arjan van de Ven
0 siblings, 0 replies; 16+ messages in thread
From: Arjan van de Ven @ 2023-06-12 18:07 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: linux-pm, artem.bityutskiy
On 6/12/2023 11:05 AM, Rafael J. Wysocki wrote:
> Since true is returned by default below, why are the above checks
> necessary (or even useful for that matter)?
they are really there for documentation purposes, to show that all cases are
handled... but if you hate them I don't mind them going away; I can respin
quickly or send another patch to remove them... up to you
* Re: [PATCH 2/4] intel_idle: clean up the (new) state_update_enter_method function
2023-06-05 15:47 ` [PATCH 2/4] intel_idle: clean up the (new) state_update_enter_method function arjan
@ 2023-06-12 18:09 ` Rafael J. Wysocki
0 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2023-06-12 18:09 UTC (permalink / raw)
To: arjan; +Cc: linux-pm, artem.bityutskiy, rafael
On Mon, Jun 5, 2023 at 5:47 PM <arjan@linux.intel.com> wrote:
>
> From: Arjan van de Ven <arjan@linux.intel.com>
>
> Now that the logic for state_update_enter_method() is in its own
> function, the long if .. else if .. else if .. else if chain
> can be simplified by just returning from the function
> at the various places. This does not change functionality,
> but it makes the logic much simpler to read or modify later.
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
> drivers/idle/intel_idle.c | 15 ++++++++++++---
> 1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index c351b21c0875..256c2d42e350 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -1849,7 +1849,10 @@ static void state_update_enter_method(struct cpuidle_state *state, int cstate)
> WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IBRS);
> WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
> state->enter = intel_idle_xstate;
> - } else if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
> + return;
> + }
> +
> + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
> state->flags & CPUIDLE_FLAG_IBRS) {
> /*
> * IBRS mitigation requires that C-states are entered
> @@ -1857,9 +1860,15 @@ static void state_update_enter_method(struct cpuidle_state *state, int cstate)
> */
> WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
> state->enter = intel_idle_ibrs;
> - } else if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE) {
> + return;
> + }
> +
> + if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE) {
> state->enter = intel_idle_irq;
> - } else if (force_irq_on) {
> + return;
> + }
> +
> + if (force_irq_on) {
> pr_info("forced intel_idle_irq for state %d\n", cstate);
> state->enter = intel_idle_irq;
> }
> --
This and the [1/4] applied as 6.5 material.
I'll get to the other two shortly.
Thanks!
* Re: [PATCH 3/4] intel_idle: Add support for using intel_idle in a VM guest using just hlt
2023-06-05 15:47 ` [PATCH 3/4] intel_idle: Add support for using intel_idle in a VM guest using just hlt arjan
2023-06-12 18:05 ` Rafael J. Wysocki
@ 2023-06-15 17:46 ` Rafael J. Wysocki
2023-07-17 8:34 ` Xiaoyao Li
2 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2023-06-15 17:46 UTC (permalink / raw)
To: arjan; +Cc: linux-pm, artem.bityutskiy, rafael
On Mon, Jun 5, 2023 at 5:47 PM <arjan@linux.intel.com> wrote:
>
> From: Arjan van de Ven <arjan@linux.intel.com>
>
> In a typical VM guest, the mwait instruction is not available, leaving only the
> 'hlt' instruction (which causes a VMEXIT to the host).
>
> So for this common case, intel_idle will detect the lack of mwait and fail
> to initialize (after which another idle method steps in, one that just
> uses hlt always).
>
> Other (less common) cases exist; the table below shows the before/after for these:
>
> +------------+--------------------------+-------------------------+
> | Hypervisor | Idle method before patch | Idle method after patch |
> | exposes | | |
> +============+==========================+=========================+
> | nothing | default_idle fallback | intel_idle VM table |
> | (common) | (straight "hlt") | |
> +------------+--------------------------+-------------------------+
> | mwait | intel_idle mwait table | intel_idle mwait table |
> +------------+--------------------------+-------------------------+
> | ACPI | ACPI C1 state ("hlt") | intel_idle VM table |
> +------------+--------------------------+-------------------------+
However, this is only for the processors known to intel_idle. On
processors whose IDs are not in intel_idle_ids, intel_idle in a guest
will still refuse to register, so the first column remains applicable.
IOW, for this change to take effect on any new processors, their IDs
need to be added to intel_idle_ids.
IMO it would be prudent to mention this here. I can add a paragraph
to that effect to the changelog, or please send me one if you prefer.
> By providing this capability in the intel_idle driver, we can do better
> than the fallback or ACPI table methods. While this change only matches
> the existing behavior, later patches in this series will add new
> capabilities such as optimized TLB flushing.
>
> In order to do this, a simplified version of the initialization function
> is created for VM guests; it is called when the CPU is recognized but
> mwait is not supported and we're running in a VM guest.
>
> One thing to note is that the max latency (and break-even) of this C1 state
> is higher than that of the typical bare metal C1 state, because hlt causes a
> vmexit, and the cost of vmexit + hypervisor overhead + vmenter is typically
> on the order of up to 5 microseconds... even if the hypervisor does not
> actually go into a hardware power saving state.
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
> drivers/idle/intel_idle.c | 125 +++++++++++++++++++++++++++++++++++++-
> 1 file changed, 124 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 256c2d42e350..d2518cf36ab4 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -199,6 +199,43 @@ static __cpuidle int intel_idle_xstate(struct cpuidle_device *dev,
> return __intel_idle(dev, drv, index);
> }
>
> +static __always_inline int __intel_idle_hlt(struct cpuidle_device *dev,
> + struct cpuidle_driver *drv, int index)
> +{
> + raw_safe_halt();
> + raw_local_irq_disable();
> + return index;
> +}
> +
> +/**
> + * intel_idle_hlt - Ask the processor to enter the given idle state using hlt.
> + * @dev: cpuidle device of the target CPU.
> + * @drv: cpuidle driver (assumed to point to intel_idle_driver).
> + * @index: Target idle state index.
> + *
> + * Use the HLT instruction to notify the processor that the CPU represented by
> + * @dev is idle and it can try to enter the idle state corresponding to @index.
> + *
> + * Must be called under local_irq_disable().
> + */
> +static __cpuidle int intel_idle_hlt(struct cpuidle_device *dev,
> + struct cpuidle_driver *drv, int index)
> +{
> + return __intel_idle_hlt(dev, drv, index);
> +}
> +
> +static __cpuidle int intel_idle_hlt_irq_on(struct cpuidle_device *dev,
> + struct cpuidle_driver *drv, int index)
> +{
> + int ret;
> +
> + raw_local_irq_enable();
> + ret = __intel_idle_hlt(dev, drv, index);
> + raw_local_irq_disable();
> +
> + return ret;
> +}
> +
> /**
> * intel_idle_s2idle - Ask the processor to enter the given idle state.
> * @dev: cpuidle device of the target CPU.
> @@ -1242,6 +1279,18 @@ static struct cpuidle_state snr_cstates[] __initdata = {
> .enter = NULL }
> };
>
> +static struct cpuidle_state vmguest_cstates[] __initdata = {
> + {
> + .name = "C1",
> + .desc = "HLT",
> + .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_IRQ_ENABLE,
> + .exit_latency = 5,
> + .target_residency = 10,
> + .enter = &intel_idle_hlt, },
> + {
> + .enter = NULL }
> +};
> +
> static const struct idle_cpu idle_cpu_nehalem __initconst = {
> .state_table = nehalem_cstates,
> .auto_demotion_disable_flags = NHM_C1_AUTO_DEMOTE | NHM_C3_AUTO_DEMOTE,
> @@ -1841,6 +1890,16 @@ static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
>
> static void state_update_enter_method(struct cpuidle_state *state, int cstate)
> {
> + if (state->enter == intel_idle_hlt) {
> + if (force_irq_on) {
> + pr_info("forced intel_idle_irq for state %d\n", cstate);
> + state->enter = intel_idle_hlt_irq_on;
> + }
> + return;
> + }
> + if (state->enter == intel_idle_hlt_irq_on)
> + return; /* no update scenarios */
> +
> if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
> /*
> * Combining with XSTATE with IBRS or IRQ_ENABLE flags
> @@ -1874,6 +1933,29 @@ static void state_update_enter_method(struct cpuidle_state *state, int cstate)
> }
> }
>
> +/*
> + * For mwait based states, we want to verify the cpuid data to see if the state
> + * is actually supported by this specific CPU.
> + * For non-mwait based states, this check should be skipped.
> + */
> +static bool should_verify_mwait(struct cpuidle_state *state)
> +{
> + if (state->enter == intel_idle_irq)
> + return true;
> + if (state->enter == intel_idle)
> + return true;
> + if (state->enter == intel_idle_ibrs)
> + return true;
> + if (state->enter == intel_idle_xstate)
> + return true;
> + if (state->enter == intel_idle_hlt)
> + return false;
> + if (state->enter == intel_idle_hlt_irq_on)
> + return false;
> +
> + return true;
> +}
> +
> static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
> {
> int cstate;
> @@ -1922,7 +2004,7 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
> }
>
> mwait_hint = flg2MWAIT(cpuidle_state_table[cstate].flags);
> - if (!intel_idle_verify_cstate(mwait_hint))
> + if (should_verify_mwait(&cpuidle_state_table[cstate]) && !intel_idle_verify_cstate(mwait_hint))
> continue;
>
> /* Structure copy. */
> @@ -2056,6 +2138,45 @@ static void __init intel_idle_cpuidle_devices_uninit(void)
> cpuidle_unregister_device(per_cpu_ptr(intel_idle_cpuidle_devices, i));
> }
>
> +static int __init intel_idle_vminit(const struct x86_cpu_id *id)
> +{
> + int retval;
> +
> + cpuidle_state_table = vmguest_cstates;
> +
> + icpu = (const struct idle_cpu *)id->driver_data;
> +
> + pr_debug("v" INTEL_IDLE_VERSION " model 0x%X\n",
> + boot_cpu_data.x86_model);
> +
> + intel_idle_cpuidle_devices = alloc_percpu(struct cpuidle_device);
> + if (!intel_idle_cpuidle_devices)
> + return -ENOMEM;
> +
> + intel_idle_cpuidle_driver_init(&intel_idle_driver);
> +
> + retval = cpuidle_register_driver(&intel_idle_driver);
> + if (retval) {
> + struct cpuidle_driver *drv = cpuidle_get_driver();
> + printk(KERN_DEBUG pr_fmt("intel_idle yielding to %s\n"),
> + drv ? drv->name : "none");
> + goto init_driver_fail;
> + }
> +
> + retval = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "idle/intel:online",
> + intel_idle_cpu_online, NULL);
> + if (retval < 0)
> + goto hp_setup_fail;
> +
> + return 0;
> +hp_setup_fail:
> + intel_idle_cpuidle_devices_uninit();
> + cpuidle_unregister_driver(&intel_idle_driver);
> +init_driver_fail:
> + free_percpu(intel_idle_cpuidle_devices);
> + return retval;
> +}
> +
> static int __init intel_idle_init(void)
> {
> const struct x86_cpu_id *id;
> @@ -2074,6 +2195,8 @@ static int __init intel_idle_init(void)
> id = x86_match_cpu(intel_idle_ids);
> if (id) {
> if (!boot_cpu_has(X86_FEATURE_MWAIT)) {
> + if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
> + return intel_idle_vminit(id);
> pr_debug("Please enable MWAIT in BIOS SETUP\n");
> return -ENODEV;
> }
> --
> 2.40.1
>
* Re: [PATCH 3/4] intel_idle: Add support for using intel_idle in a VM guest using just hlt
2023-06-05 15:47 ` [PATCH 3/4] intel_idle: Add support for using intel_idle in a VM guest using just hlt arjan
2023-06-12 18:05 ` Rafael J. Wysocki
2023-06-15 17:46 ` Rafael J. Wysocki
@ 2023-07-17 8:34 ` Xiaoyao Li
2023-07-17 12:58 ` Rafael J. Wysocki
2023-07-17 14:07 ` Arjan van de Ven
2 siblings, 2 replies; 16+ messages in thread
From: Xiaoyao Li @ 2023-07-17 8:34 UTC (permalink / raw)
To: arjan, linux-pm; +Cc: artem.bityutskiy, rafael, kvm, Dan Wu
+ KVM maillist.
On 6/5/2023 11:47 PM, arjan@linux.intel.com wrote:
...
>
> +static int __init intel_idle_vminit(const struct x86_cpu_id *id)
> +{
> + int retval;
> +
> + cpuidle_state_table = vmguest_cstates;
> +
> + icpu = (const struct idle_cpu *)id->driver_data;
> +
> + pr_debug("v" INTEL_IDLE_VERSION " model 0x%X\n",
> + boot_cpu_data.x86_model);
> +
> + intel_idle_cpuidle_devices = alloc_percpu(struct cpuidle_device);
> + if (!intel_idle_cpuidle_devices)
> + return -ENOMEM;
> +
> + intel_idle_cpuidle_driver_init(&intel_idle_driver);
> +
> + retval = cpuidle_register_driver(&intel_idle_driver);
> + if (retval) {
> + struct cpuidle_driver *drv = cpuidle_get_driver();
> + printk(KERN_DEBUG pr_fmt("intel_idle yielding to %s\n"),
> + drv ? drv->name : "none");
> + goto init_driver_fail;
> + }
> +
> + retval = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "idle/intel:online",
> + intel_idle_cpu_online, NULL);
> + if (retval < 0)
> + goto hp_setup_fail;
> +
> + return 0;
> +hp_setup_fail:
> + intel_idle_cpuidle_devices_uninit();
> + cpuidle_unregister_driver(&intel_idle_driver);
> +init_driver_fail:
> + free_percpu(intel_idle_cpuidle_devices);
> + return retval;
> +}
> +
> static int __init intel_idle_init(void)
> {
> const struct x86_cpu_id *id;
> @@ -2074,6 +2195,8 @@ static int __init intel_idle_init(void)
> id = x86_match_cpu(intel_idle_ids);
> if (id) {
> if (!boot_cpu_has(X86_FEATURE_MWAIT)) {
> + if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
> + return intel_idle_vminit(id);
It leads to the MSR access error below on SPR.
[ 4.158636] unchecked MSR access error: RDMSR from 0xe2 at rIP:
0xffffffffbcaeebed (intel_idle_init_cstates_icpu.constprop.0+0x2dd/0x5a0)
[ 4.174991] Call Trace:
[ 4.179611] <TASK>
[ 4.183610] ? ex_handler_msr+0x11e/0x150
[ 4.190624] ? fixup_exception+0x17e/0x3c0
[ 4.197648] ? gp_try_fixup_and_notify+0x1d/0xc0
[ 4.205579] ? exc_general_protection+0x1bb/0x410
[ 4.213620] ? asm_exc_general_protection+0x26/0x30
[ 4.221624] ? __pfx_intel_idle_init+0x10/0x10
[ 4.228588] ? intel_idle_init_cstates_icpu.constprop.0+0x2dd/0x5a0
[ 4.238632] ? __pfx_intel_idle_init+0x10/0x10
[ 4.246632] ? __pfx_intel_idle_init+0x10/0x10
[ 4.253616] intel_idle_vminit.isra.0+0xf5/0x1d0
[ 4.261580] ? __pfx_intel_idle_init+0x10/0x10
[ 4.269670] ? __pfx_intel_idle_init+0x10/0x10
[ 4.274605] do_one_initcall+0x50/0x230
[ 4.279873] do_initcalls+0xb3/0x130
[ 4.286535] kernel_init_freeable+0x255/0x310
[ 4.293688] ? __pfx_kernel_init+0x10/0x10
[ 4.300630] kernel_init+0x1a/0x1c0
[ 4.305681] ret_from_fork+0x29/0x50
[ 4.312700] </TASK>
On Intel SPR, the call site is
intel_idle_vminit()
-> intel_idle_cpuidle_driver_init()
-> intel_idle_init_cstates_icpu()
-> spr_idle_state_table_update()
-> rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr);
However, current KVM doesn't provide emulation for
MSR_PKG_CST_CONFIG_CONTROL. It leads to #GP on accessing.
> pr_debug("Please enable MWAIT in BIOS SETUP\n");
> return -ENODEV;
> }
* Re: [PATCH 3/4] intel_idle: Add support for using intel_idle in a VM guest using just hlt
2023-07-17 8:34 ` Xiaoyao Li
@ 2023-07-17 12:58 ` Rafael J. Wysocki
2023-07-17 14:07 ` Arjan van de Ven
1 sibling, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2023-07-17 12:58 UTC (permalink / raw)
To: Xiaoyao Li; +Cc: arjan, linux-pm, artem.bityutskiy, rafael, kvm, Dan Wu
On Mon, Jul 17, 2023 at 10:34 AM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
>
> + KVM maillist.
>
> On 6/5/2023 11:47 PM, arjan@linux.intel.com wrote:
> ...
> >
> > +static int __init intel_idle_vminit(const struct x86_cpu_id *id)
> > +{
> > + int retval;
> > +
> > + cpuidle_state_table = vmguest_cstates;
> > +
> > + icpu = (const struct idle_cpu *)id->driver_data;
> > +
> > + pr_debug("v" INTEL_IDLE_VERSION " model 0x%X\n",
> > + boot_cpu_data.x86_model);
> > +
> > + intel_idle_cpuidle_devices = alloc_percpu(struct cpuidle_device);
> > + if (!intel_idle_cpuidle_devices)
> > + return -ENOMEM;
> > +
> > + intel_idle_cpuidle_driver_init(&intel_idle_driver);
> > +
> > + retval = cpuidle_register_driver(&intel_idle_driver);
> > + if (retval) {
> > + struct cpuidle_driver *drv = cpuidle_get_driver();
> > + printk(KERN_DEBUG pr_fmt("intel_idle yielding to %s\n"),
> > + drv ? drv->name : "none");
> > + goto init_driver_fail;
> > + }
> > +
> > + retval = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "idle/intel:online",
> > + intel_idle_cpu_online, NULL);
> > + if (retval < 0)
> > + goto hp_setup_fail;
> > +
> > + return 0;
> > +hp_setup_fail:
> > + intel_idle_cpuidle_devices_uninit();
> > + cpuidle_unregister_driver(&intel_idle_driver);
> > +init_driver_fail:
> > + free_percpu(intel_idle_cpuidle_devices);
> > + return retval;
> > +}
> > +
> > static int __init intel_idle_init(void)
> > {
> > const struct x86_cpu_id *id;
> > @@ -2074,6 +2195,8 @@ static int __init intel_idle_init(void)
> > id = x86_match_cpu(intel_idle_ids);
> > if (id) {
> > if (!boot_cpu_has(X86_FEATURE_MWAIT)) {
> > + if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
> > + return intel_idle_vminit(id);
>
> It leads to below MSR access error on SPR.
>
> [ 4.158636] unchecked MSR access error: RDMSR from 0xe2 at rIP:
> 0xffffffffbcaeebed (intel_idle_init_cstates_icpu.constprop.0+0x2dd/0x5a0)
> [ 4.174991] Call Trace:
> [ 4.179611] <TASK>
> [ 4.183610] ? ex_handler_msr+0x11e/0x150
> [ 4.190624] ? fixup_exception+0x17e/0x3c0
> [ 4.197648] ? gp_try_fixup_and_notify+0x1d/0xc0
> [ 4.205579] ? exc_general_protection+0x1bb/0x410
> [ 4.213620] ? asm_exc_general_protection+0x26/0x30
> [ 4.221624] ? __pfx_intel_idle_init+0x10/0x10
> [ 4.228588] ? intel_idle_init_cstates_icpu.constprop.0+0x2dd/0x5a0
> [ 4.238632] ? __pfx_intel_idle_init+0x10/0x10
> [ 4.246632] ? __pfx_intel_idle_init+0x10/0x10
> [ 4.253616] intel_idle_vminit.isra.0+0xf5/0x1d0
> [ 4.261580] ? __pfx_intel_idle_init+0x10/0x10
> [ 4.269670] ? __pfx_intel_idle_init+0x10/0x10
> [ 4.274605] do_one_initcall+0x50/0x230
> [ 4.279873] do_initcalls+0xb3/0x130
> [ 4.286535] kernel_init_freeable+0x255/0x310
> [ 4.293688] ? __pfx_kernel_init+0x10/0x10
> [ 4.300630] kernel_init+0x1a/0x1c0
> [ 4.305681] ret_from_fork+0x29/0x50
> [ 4.312700] </TASK>
>
> On Intel SPR, the call site is
>
> intel_idle_vminit()
> -> intel_idle_cpuidle_driver_init()
> -> intel_idle_init_cstates_icpu()
> -> spr_idle_state_table_update()
> -> rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr);
>
> However, current KVM doesn't provide emulation for
> MSR_PKG_CST_CONFIG_CONTROL. It leads to #GP on accessing.
>
> > pr_debug("Please enable MWAIT in BIOS SETUP\n");
> > return -ENODEV;
> > }
Well, I'm waiting for a fix from Arjan, thanks!
* Re: [PATCH 3/4] intel_idle: Add support for using intel_idle in a VM guest using just hlt
2023-07-17 8:34 ` Xiaoyao Li
2023-07-17 12:58 ` Rafael J. Wysocki
@ 2023-07-17 14:07 ` Arjan van de Ven
2023-07-17 14:51 ` Rafael J. Wysocki
1 sibling, 1 reply; 16+ messages in thread
From: Arjan van de Ven @ 2023-07-17 14:07 UTC (permalink / raw)
To: Xiaoyao Li, linux-pm; +Cc: artem.bityutskiy, rafael, kvm, Dan Wu
> It leads to below MSR access error on SPR.
yeah I have a fix for this but Peter instead wants to delete the whole thing...
... so I'm sort of stuck in two worlds. I'll send the fix today but Rafael will need to
choose if he wants to revert or not
* Re: [PATCH 3/4] intel_idle: Add support for using intel_idle in a VM guest using just hlt
2023-07-17 14:07 ` Arjan van de Ven
@ 2023-07-17 14:51 ` Rafael J. Wysocki
2023-07-17 14:53 ` Arjan van de Ven
0 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2023-07-17 14:51 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Xiaoyao Li, linux-pm, artem.bityutskiy, rafael, kvm, Dan Wu
On Mon, Jul 17, 2023 at 4:10 PM Arjan van de Ven <arjan@linux.intel.com> wrote:
>
> > It leads to below MSR access error on SPR.
> yeah I have a fix for this but Peter instead wants to delete the whole thing...
> ... so I'm sort of stuck in two worlds. I'll send the fix today but Rafael will need to
> chose if he wants to revert or not
I thought that you wanted to fix this rather than to revert, but the
latter is entirely fine with me too.
Given Peter's general objection to the changes, a revert would
likely be more appropriate until the controversy is resolved.
* Re: [PATCH 3/4] intel_idle: Add support for using intel_idle in a VM guest using just hlt
2023-07-17 14:51 ` Rafael J. Wysocki
@ 2023-07-17 14:53 ` Arjan van de Ven
0 siblings, 0 replies; 16+ messages in thread
From: Arjan van de Ven @ 2023-07-17 14:53 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Xiaoyao Li, linux-pm, artem.bityutskiy, kvm, Dan Wu
On 7/17/2023 7:51 AM, Rafael J. Wysocki wrote:
> On Mon, Jul 17, 2023 at 4:10 PM Arjan van de Ven <arjan@linux.intel.com> wrote:
>>
>>> It leads to below MSR access error on SPR.
>> yeah I have a fix for this but Peter instead wants to delete the whole thing...
>> ... so I'm sort of stuck in two worlds. I'll send the fix today but Rafael will need to
>> chose if he wants to revert or not
>
> I thought that you wanted to fix this rather than to revert, but the
> latter is entirely fine with me too.
>
I would rather fix it, of course, and will send the fixes out shortly (final test ongoing)
end of thread, other threads:[~2023-07-17 14:53 UTC | newest]
Thread overview: 16+ messages
2023-06-05 15:47 [PATCH 0/4 v2] Add support for running in VM guests to intel_idle arjan
2023-06-05 15:47 ` [PATCH 1/4] intel_idle: refactor state->enter manipulation into its own function arjan
2023-06-07 7:34 ` Zhang, Rui
2023-06-07 14:08 ` Arjan van de Ven
2023-06-05 15:47 ` [PATCH 2/4] intel_idle: clean up the (new) state_update_enter_method function arjan
2023-06-12 18:09 ` Rafael J. Wysocki
2023-06-05 15:47 ` [PATCH 3/4] intel_idle: Add support for using intel_idle in a VM guest using just hlt arjan
2023-06-12 18:05 ` Rafael J. Wysocki
2023-06-12 18:07 ` Arjan van de Ven
2023-06-15 17:46 ` Rafael J. Wysocki
2023-07-17 8:34 ` Xiaoyao Li
2023-07-17 12:58 ` Rafael J. Wysocki
2023-07-17 14:07 ` Arjan van de Ven
2023-07-17 14:51 ` Rafael J. Wysocki
2023-07-17 14:53 ` Arjan van de Ven
2023-06-05 15:47 ` [PATCH 4/4] intel_idle: Add a "Long HLT" C1 state for the VM guest mode arjan