linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Aboorva Devarajan <aboorvad@linux.ibm.com>
To: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Linux PM <linux-pm@vger.kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Christian Loehle <christian.loehle@arm.com>,
	Artem Bityutskiy <artem.bityutskiy@linux.intel.com>,
	aboorvad@linux.ibm.com
Subject: Re: [RFT][PATCH v1 0/5] cpuidle: menu: Avoid discarding useful information when processing recent idle intervals
Date: Mon, 24 Feb 2025 11:57:22 +0530	[thread overview]
Message-ID: <d0c013d5d2a9251d5dc468446f2a08ae8a7a8953.camel@linux.ibm.com> (raw)
In-Reply-To: <1916668.tdWV9SEqCh@rjwysocki.net>

On Thu, 2025-02-06 at 15:21 +0100, Rafael J. Wysocki wrote:

> Hi Everyone,
> 
> This work had been triggered by a report that commit 0611a640e60a ("eventpoll:
> prefer kfree_rcu() in __ep_remove()") had caused the critical-jOPS metric of
> the SPECjbb 2015 benchmark [1] to drop by around 50% even though it generally
> reduced kernel overhead.  Indeed, it was found during further investigation
> that the total interrupt rate while running the SPECjbb workload had fallen as
> a result of that commit by 55% and the local timer interrupt rate had fallen by
> almost 80%.
> 
> That turned out to cause the menu cpuidle governor to select the deepest idle
> state supplied by the cpuidle driver (intel_idle) much more often which added
> significantly more idle state latency to the workload and that led to the
> decrease of the critical-jOPS score.
> 
> Interestingly enough, this problem was not visible when the teo cpuidle
> governor was used instead of menu, so it appeared to be specific to the
> latter.  CPU wakeup event statistics collected while running the workload
> indicated that the menu governor was effectively ignoring non-timer wakeup
> information and all of its idle state selection decisions appeared to be
> based on timer wakeups only.  Thus, it appeared that the reduction of the
> local timer interrupt rate caused the governor to predict a idle duration
> much more often while running the workload and the deepest idle state was
> selected significantly more often as a result of that.
> 
> A subsequent inspection of the get_typical_interval() function in the menu
> governor indicated that it might return UINT_MAX too often which then caused
> the governor's decisions to be based entirely on information related to timers.
> 
> Generally speaking, UINT_MAX is returned by get_typical_interval() if it
> cannot make a prediction based on the most recent idle intervals data with
> sufficiently high confidence, but at least in some cases this means that
> useful information is not taken into account at all which may lead to
> significant idle state selection mistakes.  Moreover, this is not really
> unlikely to happen.
> 
> One issue with get_typical_interval() is that, when it eliminates outliers from
> the sample set in an attempt to reduce the standard deviation (and so improve
> the prediction confidence), it does that by dropping high-end samples only,
> while samples at the low end of the set are retained.  However, the samples
> at the low end very well may be the outliers and they should be eliminated
> from the sample set instead of the high-end samples.  Accordingly, the
> likelihood of making a meaningful idle duration prediction can be improved
> by making it also eliminate low-end samples if they are farther from the
> average than high-end samples.  This is done in patch [4/5].
> 
> Another issue is that get_typical_interval() gives up after eliminating 1/4
> of the samples if the standard deviation is still not as low as desired (within
> 1/6 of the average or within 20 us if the average is close to 0), but the
> remaining samples in the set still represent useful information at that point
> and discarding them altogether may lead to suboptimal idle state selection.
> 
> For instance, the largest idle duration value in the get_typical_interval()
> data set is the maximum idle duration observed recently and it is likely that
> the upcoming idle duration will not exceed it.  Therefore, in the absence of
> a better choice, this value can be used as an upper bound on the target
> residency of the idle state to select.  Patch [5/5] works along these lines,
> but it takes the maximum data point remaining after the elimination of
> outliers.
> 
> The first two patches in the series are straightforward cleanups (in fact,
> the first patch is kind of reversed by patch [4/5], but it is there because
> it can be applied without the latter) and patch [3/5] is a cosmetic change
> made in preparation for the subsequent ones.
> 
> This series turns out to restore the SPECjbb critical-jOPS metric on affected
> systems to the level from before commit 0611a640e60a and it also happens to
> increase its max-jOPS metric by around 3%.
> 
> For easier reference/testing it is present in the git branch at
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=experimental/menu
> 
> based on the cpuidle material that went into 6.14-rc1.
> 
> If possible, please let me know if it works for you.
> 
> Thanks!
> 
> 
> [1] Link: https://www.spec.org/jbb2015/
> 
> 
> 

Hi Rafael,

I ran some tests using a cpuidle microbenchmark that I have [1]:

The test does a uniform distribution of idle durations, which means the past eight idle intervals approximately match 
the sleep duration. So as anticipated, this change-set does not impact this case, and the behavior remains mostly
consistent even after applying the patch.

 Pipe based wakeup - Above %
 ----------------------------------------------------------------------
| Sleep time (s)  | Menu No Patch(%) | Menu Patch(%)| Teo No Patch (%) |
|---------------- |------------------|--------------|------------------|
| 5               | 0.00%            | 0.00%        | 0.00%            |
| 10              | 0.00%            | 0.00%        | 0.00%            |
------------------------------------------------------------------------
| 30              | 99.97%           | 99.96%       | 0.00%            |
| 100             | 98.61%           | 98.67%       | 0.00%            |
| 120             | 97.88%           | 98.42%       | 1.03%            | -> (*)
------------------------------------------------------------------------
| 130             | 2.82%            | 1.26%        | 0.22%            |
| 150             | 1.68%            | 1.43%        | 0.32%            |
| 200             | 2.09%            | 1.91%        | 0.15%            |
| 300             | 1.22%            | 1.22%        | 0.19%            |
| 400             | 1.20%            | 1.21%        | 0.19%            |
| 500             | 1.16%            | 1.12%        | 0.12%            |
------------------------------------------------------------------------

 Pipe based wakeup - Below %
 -----------------------------------------------------------------------
| Sleep time (s)  | Menu No Patch(%) | Menu Patch(%)| Teo No Patch (%) |
|---------------- |------------------|--------------|------------------|
| 5               | 0.00%            | 0.00%        | 0.00%            |
| 10              | 0.00%            | 0.00%        | 0.00%            |
| 30              | 0.00%            | 0.00%        | 0.00%            |
| 100             | 0.00%            | 0.00%        | 0.00%            |
| 120             | 2.76%            | 1.14%        | 0.93%            |
| 130             | 3.11%            | 1.13%        | 0.12%            |
| 150             | 1.34%            | 1.16%        | 0.18%            |
| 200             | 1.38%            | 1.15%        | 0.09%            |
| 300             | 1.33%            | 1.21%        | 0.11%            |
| 400             | 1.36%            | 1.25%        | 0.10%            |
| 500             | 1.25%            | 1.22%        | 0.10%            |
|---------------- |------------------|--------------|------------------|

 Time based wakeup - Above %
 -----------------------------------------------------------------------
| Sleep time (s)  | Menu No Patch(%) | Menu Patch(%)| Teo No Patch (%) |
|---------------- |------------------|--------------|------------------|
| 5               | 0.00%            | 0.00%        | 0.00%            |
| 10              | 0.00%            | 0.00%        | 0.00%            |
| 30              | 0.00%            | 0.00%        | 0.00%            |
| 100             | 0.01%            | 0.00%        | 0.00%            |
| 120             | 0.00%            | 0.00%        | 0.15%            |
| 130             | 15.84%           | 0.14%        | 0.23%            |
| 150             | 0.39%            | 0.23%        | 0.48%            |
| 200             | 0.95%            | 0.87%        | 0.10%            |
| 300             | 0.20%            | 0.17%        | 0.15%            |
| 400             | 0.14%            | 0.12%        | 0.17%            |
| 500             | 0.10%            | 0.20%        | 0.11%            |
|---------------- |------------------|--------------|------------------|

 Time based wakeup - Below %
 -----------------------------------------------------------------------
| Sleep time (s)  | Menu No Patch(%) | Menu Patch(%)| Teo No Patch (%) |
|---------------- |------------------|--------------|------------------|
| 5               | 0.00%            | 0.00%        | 0.00%            |
| 10              | 0.00%            | 0.00%        | 0.00%            |
| 30              | 0.00%            | 0.00%        | 0.00%            |
| 100             | 0.00%            | 0.00%        | 0.00%            |
| 120             | 1.85%            | 1.66%        | 2.67%            |
| 130             | 16.71%           | 1.13%        | 1.11%            |
| 150             | 1.36%            | 1.16%        | 1.13%            |
| 200             | 1.33%            | 1.14%        | 1.19%            |
| 300             | 1.44%            | 1.20%        | 1.17%            |
| 400             | 1.51%            | 1.21%        | 1.21%            |
| 500             | 1.42%            | 1.24%        | 1.25%            |
|---------------- |------------------|--------------|------------------|


(*) - Above and below values are higher even without the patch with menu governor,
      this issue still persists, as previously reported in [2]. I will investigate
      further and submit a revision to get additional feedback.

I also carried out some benchmarks using pgbench:

pgbench Results

Without Patch:
-----------------------------------------------------------------------------
| Run  | Transactions Processed | Latency Avg (ms) | TPS (without init time) |
-----------------------------------------------------------------------------
|  1   |  11,936,327            |  0.050           |  198,946.141025         |
|  2   |  11,899,540            |  0.050           |  198,333.097547         |
|  3   |  11,870,792            |  0.051           |  197,853.728614         |
|  4   |  11,901,670            |  0.050           |  198,368.526139         |
|  5   |  11,922,046            |  0.050           |  198,708.112243         |
-----------------------------------------------------------------------------
| Avg  |  11,906,075            |  0.0502          |  198,441.921114         |
-----------------------------------------------------------------------------

With Patch:
-----------------------------------------------------------------------------
| Run  | Transactions Processed | Latency Avg (ms) | TPS (without init time) |
-----------------------------------------------------------------------------
|  1   |  12,052,865            |  0.050           |  200,888.492771         |
|  2   |  12,058,359            |  0.050           |  200,979.895325         |
|  3   |  12,071,012            |  0.050           |  201,190.809734         |
|  4   |  12,054,646            |  0.050           |  200,918.076736         |
|  5   |  12,053,087            |  0.050           |  200,892.045581         |
-----------------------------------------------------------------------------
| Avg  |  12,058,394            |  0.0500          |  200,973.464029         |
------------------------------------------------------------------------------

Performance Improvement After Patch:
--------------------------------------------------------------------------------------
| Metric                  | Without Patch (Avg.)| With Patch (Avg.)  | % Improvement |
--------------------------------------------------------------------------------------
| Transactions Processed  |  11,906,075         |  12,058,394        |  +1.28%       | 
| Latency Avg (ms)        |  0.0502             |  0.0500            |  -0.40%       |
| TPS (without init time) |  198,441.921114     |  200,973.464029    |  +1.28%       |
--------------------------------------------------------------------------------------

So with the patch pgbench's "Transactions Processed" improves by ~1.28%, and I did not observe
any major variations on other benchmarks with the patch.

So for the entire series:

Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>

I'm also trying a minimal unit fuzz-test with the pre- and post- patched version of the get_typical_interval 
function to understand this better, will post the results soon.

[1] - https://github.com/AboorvaDevarajan/linux-utils/tree/main/cpuidle/cpuidle_wakeup
[2] - https://lore.kernel.org/all/20240809073120.250974-1-aboorvad@linux.ibm.com/


Thanks,
Aboorva

  parent reply	other threads:[~2025-02-24  6:27 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-06 14:21 [RFT][PATCH v1 0/5] cpuidle: menu: Avoid discarding useful information when processing recent idle intervals Rafael J. Wysocki
2025-02-06 14:22 ` [RFT][PATCH v1 1/5] cpuidle: menu: Drop a redundant local variable Rafael J. Wysocki
2025-02-06 14:55   ` Christian Loehle
2025-02-06 14:24 ` [RFT][PATCH v1 2/5] cpuidle: menu: Use one loop for average and variance computations Rafael J. Wysocki
2025-02-17 13:03   ` Christian Loehle
2025-02-06 14:25 ` [RFT][PATCH v1 3/5] cpuidle: menu: Tweak threshold use in get_typical_interval() Rafael J. Wysocki
2025-02-17 13:08   ` Christian Loehle
2025-02-06 14:26 ` [RFT][PATCH v1 4/5] cpuidle: menu: Eliminate outliers on both ends of the sample set Rafael J. Wysocki
2025-02-17 13:26   ` Christian Loehle
2025-02-06 14:29 ` [RFT][PATCH v1 5/5] cpuidle: menu: Avoid discarding useful information Rafael J. Wysocki
2025-02-17 13:39   ` Christian Loehle
2025-02-17 13:47     ` Rafael J. Wysocki
2025-08-04 16:54   ` Marc Zyngier
2025-08-05 13:23     ` Rafael J. Wysocki
2025-08-05 14:41       ` Christian Loehle
2025-08-05 16:00       ` Marc Zyngier
2025-08-05 18:50         ` Rafael J. Wysocki
2025-08-06  7:19           ` Marc Zyngier
2025-08-06 12:48           ` Christian Loehle
2025-02-07 14:48 ` [RFT][PATCH v1 0/5] cpuidle: menu: Avoid discarding useful information when processing recent idle intervals Artem Bityutskiy
2025-02-07 15:24   ` Christian Loehle
2025-02-07 15:35     ` Rafael J. Wysocki
2025-02-07 15:45   ` Rafael J. Wysocki
2025-03-12 21:38   ` Doug Smythies
2025-02-10 14:15 ` Christian Loehle
2025-02-10 14:43   ` Rafael J. Wysocki
2025-02-10 14:47     ` Christian Loehle
2025-02-18 21:17   ` Christian Loehle
2025-02-19 12:06     ` Rafael J. Wysocki
2025-02-14  4:30 ` Doug Smythies
2025-02-14 22:10   ` Rafael J. Wysocki
2025-02-16 16:16     ` Doug Smythies
2025-02-24  6:27 ` Aboorva Devarajan [this message]
2025-02-24  6:38   ` Aboorva Devarajan
2025-02-24 12:35   ` Rafael J. Wysocki
2025-02-26  4:49   ` Aboorva Devarajan
2025-02-26 10:54     ` Christian Loehle

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d0c013d5d2a9251d5dc468446f2a08ae8a7a8953.camel@linux.ibm.com \
    --to=aboorvad@linux.ibm.com \
    --cc=artem.bityutskiy@linux.intel.com \
    --cc=christian.loehle@arm.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rjw@rjwysocki.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).