linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: arjan@linux.intel.com
To: linux-pm@vger.kernel.org
Cc: artem.bityutskiy@linux.intel.com, rafael@kernel.org,
	Arjan van de Ven <arjan.van.de.ven@intel.com>,
	Arjan van de Ven <arjan@linux.intel.com>
Subject: [PATCH 7/7] intel_idle: Add a "Long HLT" C1 state for the VM guest mode
Date: Thu,  1 Jun 2023 18:28:01 +0000	[thread overview]
Message-ID: <20230601182801.2622044-8-arjan@linux.intel.com> (raw)
In-Reply-To: <20230601182801.2622044-1-arjan@linux.intel.com>

From: Arjan van de Ven <arjan.van.de.ven@intel.com>

intel_idle will, for the bare metal case, usually have one or more deep
power states that have the CPUIDLE_FLAG_TLB_FLUSHED flag set. When
a state with this flag is selected by the cpuidle framework, it will also
flush the TLBs as part of entering this state. The benefit of doing this is
that the kernel does not need to wake the cpu out of this deep power state
just to flush the TLBs... for which the latency can be very high due to
the exit latency of deep power states.

In a VM guest currently, this benefit of avoiding the wakeup does not exist,
while the problem (long exit latency) is even more severe. Linux will need
to wake up a vCPU (causing the host to either come out of a deep C state,
or the VMM to have to deschedule something else to schedule the vCPU) which
can take a very long time.. adding a lot of latency to tlb flush operations
(including munmap and others).

To solve this, add a "Long HLT" C state to the state table for the VM guest
case that has the CPUIDLE_FLAG_TLB_FLUSHED flag set.  The result of that is
that for long idle periods (where the VMM is likely to do things that cause
large latency) the cpuidle framework will flush the TLBs (and avoid the
wakeups), while for short/quick idle durations, the existing behavior is
retained.

Now, there is still only "hlt" available in the guest, but for long idle,
the host can go to a deeper state (say C6).  There is a reasonable debate
one can have to what to set for the exit_latency and break even point for
this "Long HLT" state.  The good news is that intel_idle has these values
available for the underlying CPU (even when mwait is not exposed).  The
solution thus is to just use the latency and break even of the deepest state
from the bare metal CPU.  This is under the assumption that this is a pretty
reasonable estimate of what the VMM would do to cause latency.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/idle/intel_idle.c | 54 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index c4929d8a35a4..e056e1ec64a9 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1288,6 +1288,13 @@ static struct cpuidle_state vmguest_cstates[] __initdata = {
 		.exit_latency = 5,
 		.target_residency = 10,
 		.enter = &intel_idle_hlt_irq, },
+	{
+		.name = "C1L",
+		.desc = "Long HLT",
+		.flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TLB_FLUSHED,
+		.exit_latency = 5,
+		.target_residency = 200,
+		.enter = &intel_idle_hlt, },
 	{
 		.enter = NULL }
 };
@@ -2117,6 +2124,44 @@ static void __init intel_idle_cpuidle_devices_uninit(void)
 		cpuidle_unregister_device(per_cpu_ptr(intel_idle_cpuidle_devices, i));
 }
 
+/*
+ * Match up the latency and break even point of the bare metal (cpu based)
+ * states with the deepest VM available state.
+ *
+ * We only want to do this for the deepest state, the ones that has
+ * the TLB_FLUSHED flag set on the .
+ *
+ * All our short idle states are dominated by vmexit/vmenter latencies,
+ * not the underlying hardware latencies so we keep our values for these.
+ */
+static void matchup_vm_state_with_baremetal(void)
+{
+	int cstate;
+
+	for (cstate = 0; cstate < CPUIDLE_STATE_MAX; ++cstate) {
+		int matching_cstate;
+
+		if (intel_idle_max_cstate_reached(cstate))
+			break;
+
+		if (!cpuidle_state_table[cstate].enter &&
+		    !cpuidle_state_table[cstate].enter_s2idle)
+			break;
+
+		if (!(cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_TLB_FLUSHED))
+			continue;
+
+		for (matching_cstate = 0; matching_cstate < CPUIDLE_STATE_MAX; ++matching_cstate) {
+			if (icpu->state_table[matching_cstate].exit_latency > cpuidle_state_table[cstate].exit_latency) {
+				cpuidle_state_table[cstate].exit_latency = icpu->state_table[matching_cstate].exit_latency;
+				cpuidle_state_table[cstate].target_residency = icpu->state_table[matching_cstate].target_residency;
+			}
+		}
+
+	}
+}
+
+
 static int __init intel_idle_vminit(const struct x86_cpu_id *id)
 {
 	int retval;
@@ -2133,6 +2178,15 @@ static int __init intel_idle_vminit(const struct x86_cpu_id *id)
 	if (!intel_idle_cpuidle_devices)
 		return -ENOMEM;
 
+	/*
+	 * We don't know exactly what the host will do when we go idle, but as a worst estimate
+	 * we can assume that the exit latency of the deepest host state will be hit for our
+	 * deep (long duration) guest idle state.
+	 * The same logic applies to the break even point for the long duration guest idle state.
+	 * So lets copy these two properties from the table we found for the host CPU type.
+	 */
+	matchup_vm_state_with_baremetal();
+
 	intel_idle_cpuidle_driver_init(&intel_idle_driver);
 
 	retval = cpuidle_register_driver(&intel_idle_driver);
-- 
2.40.1


  parent reply	other threads:[~2023-06-01 18:31 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-01 18:27 [PATCH 00/7 Add support for running in VM guests to intel_idle arjan
2023-06-01 18:27 ` [PATCH 1/7] intel_idle: refactor state->enter manipulation into its own function arjan
2023-06-01 18:27 ` [PATCH 2/7] intel_idle: clean up the (new) state_update_enter_method function arjan
2023-06-04 15:34   ` Rafael J. Wysocki
2023-06-04 22:35     ` Van De Ven, Arjan
2023-06-01 18:27 ` [PATCH 3/7] intel_idle: Add a sanity check in the new " arjan
2023-06-04 15:43   ` Rafael J. Wysocki
2023-06-01 18:27 ` [PATCH 4/7] intel_idle: Add helper functions to support 'hlt' as idle state arjan
2023-06-04 15:46   ` Rafael J. Wysocki
2023-06-01 18:27 ` [PATCH 5/7] intel_idle: Add a way to skip the mwait check on all states arjan
2023-06-04 15:54   ` Rafael J. Wysocki
2023-06-05 15:24     ` Arjan van de Ven
2023-06-01 18:28 ` [PATCH 6/7] intel_idle: Add support for using intel_idle in a VM guest using just hlt arjan
2023-06-04 15:59   ` Rafael J. Wysocki
2023-06-04 22:34     ` Van De Ven, Arjan
2023-06-01 18:28 ` arjan [this message]
2023-06-04 15:01 ` [PATCH 00/7 Add support for running in VM guests to intel_idle Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230601182801.2622044-8-arjan@linux.intel.com \
    --to=arjan@linux.intel.com \
    --cc=arjan.van.de.ven@intel.com \
    --cc=artem.bityutskiy@linux.intel.com \
    --cc=linux-pm@vger.kernel.org \
    --cc=rafael@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).