All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Doug Smythies" <dsmythies@telus.net>
To: "'Rafael J. Wysocki'" <rjw@rjwysocki.net>
Cc: "'LKML'" <linux-kernel@vger.kernel.org>,
	"'Daniel Lezcano'" <daniel.lezcano@linaro.org>,
	"'Christian Loehle'" <christian.loehle@arm.com>,
	"'Artem Bityutskiy'" <artem.bityutskiy@linux.intel.com>,
	"'Aboorva Devarajan'" <aboorvad@linux.ibm.com>,
	"Doug Smythies" <dsmythies@telus.net>,
	"'Linux PM'" <linux-pm@vger.kernel.org>
Subject: RE: [RFT][PATCH v1] cpuidle: teo: Avoid selecting deepest idle state over-eagerly
Date: Fri, 7 Feb 2025 15:40:33 -0800	[thread overview]
Message-ID: <009d01db79b9$aecd1c70$0c675550$@telus.net> (raw)
In-Reply-To: <12630185.O9o76ZdvQC@rjwysocki.net>

Hi Rafael,

On 2025.02.04 12:58 Rafael J. Wysocki wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> It has been observed that the recent teo governor update which concluded
> with commit 16c8d7586c19 ("cpuidle: teo: Skip sleep length computation
> for low latency constraints") caused the max-jOPS score of the SPECjbb
> 2015 benchmark [1] on Intel Granite Rapids to decrease by around 1.4%.
> While it may be argued that this is not a significant increase, the
> previous score can be restored by tweaking the inequality used by teo
> to decide whether or not to preselect the deepest enabled idle state.
> That change also causes the critical-jOPS score of SPECjbb to increase
> by around 2%.
>
> Namely, the likelihood of selecting the deepest enabled idle state in
> teo on the platform in question has increased after commit 13ed5c4a6d9c
> ("cpuidle: teo: Skip getting the sleep length if wakeups are very
> frequent") because some timer wakeups were previously counted as non-
> timer ones and they were effectively added to the left-hand side of the
> inequality deciding whether or not to preselect the deepest idle state.
>
> Many of them are now (accurately) counted as timer wakeups, so the left-
> hand side of that inequality is now effectively smaller in some cases,
> especially when timer wakeups often occur in the range below the target
> residency of the deepest enabled idle state and idle states with target
> residencies below CPUIDLE_FLAG_POLLING are often selected, but the
> majority of recent idle intervals are still above that value most of
> the time.  As a result, the deepest enabled idle state may be selected
> more often than it used to be selected in some cases.
>
> To counter that effect, add the sum of the hits metric for all of the
> idle states below the candidate one (which is the deepest enabled idle
> state at that point) to the left-hand side of the inequality mentioned
> above.  This will cause it to be more balanced because, in principle,
> putting both timer and non-timer wakeups on both sides of it is more
> consistent than only taking into account the timer wakeups in the range
> above the target residency of the deepest enabled idle state.
>
> Link: https://www.spec.org/jbb2015/
> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpuidle/governors/teo.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -349,13 +349,13 @@
>         }
>
>         /*
> -        * If the sum of the intercepts metric for all of the idle states
> -        * shallower than the current candidate one (idx) is greater than the
> +        * If the sum of the intercepts and hits metric for all of the idle
> +        * states below the current candidate one (idx) is greater than the
>          * sum of the intercepts and hits metrics for the candidate state and
>          * all of the deeper states, a shallower idle state is likely to be a
>          * better choice.
>          */
> -       if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
> +       if (2 * (idx_intercept_sum + idx_hit_sum) > cpu_data->total) {
>                 int first_suitable_idx = idx;
>
>                 /*

I have only just started testing the recent idle governor changes,
and have not gotten very far yet.

There is a significant increase in processor package power during idle
with this patch, about 5 times increase (400%).

My processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
Distro: Ubuntu 24.04.1, server, no desktop GUI.
CPU frequency scaling driver: intel_pstate
HWP: disabled.
CPU frequency scaling governor: performance

Idle states:
$ grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1_ACPI
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C2_ACPI
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3_ACPI

Test durations were >= 45 minutes each.

Kernel 6.14-rc1: Includes cpuidle: teo: Cleanups and very frequent wakeups handling update
Average Idle Power: teo governor: 2.199 watts (+25.51%)
Average Idle power: menu governor: 1.873 watts (+6.91%)

Kernel 6.14-rc1-p: Added this patch for teo and "cpuidle: menu: Avoid discarding useful information when processing recent idle intervals" for menu
Average Idle Power: teo governor: 9.401 watts (+436.6%)
Only 69.61% idle is in the deepest idle state. More typically it would be 98% to 99%.
28.6531% idle time is in state 1. More typically it would be 0.03%
Average Idle Power: menu governor: 1.820 watts (+3.9%)

Kernel 6.13: before "cpuidle: teo: Cleanups and very frequent wakeups handling update"
Average Idle Power: teo governor: 1.752 watts (reference: 0.0%)
Average Idle power: menu governor: 1.909 watts (+9.0%)

... Doug



  parent reply	other threads:[~2025-02-07 23:40 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-04 20:58 [RFT][PATCH v1] cpuidle: teo: Avoid selecting deepest idle state over-eagerly Rafael J. Wysocki
2025-02-06 14:37 ` Rafael J. Wysocki
2025-02-07 23:40 ` Doug Smythies [this message]
2025-02-08 11:24   ` Rafael J. Wysocki
2025-02-10 15:17     ` Doug Smythies
2025-02-09  9:24   ` Artem Bityutskiy
2025-02-10 15:17     ` Doug Smythies
2025-02-13 14:07 ` Christian Loehle
2025-02-14  4:23   ` Doug Smythies
2025-02-14 21:34   ` Rafael J. Wysocki
2025-02-18 11:28     ` Christian Loehle

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='009d01db79b9$aecd1c70$0c675550$@telus.net' \
    --to=dsmythies@telus.net \
    --cc=aboorvad@linux.ibm.com \
    --cc=artem.bityutskiy@linux.intel.com \
    --cc=christian.loehle@arm.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rjw@rjwysocki.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.