linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christian Loehle <christian.loehle@arm.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Linux PM <linux-pm@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Artem Bityutskiy <artem.bityutskiy@linux.intel.com>,
	Aboorva Devarajan <aboorvad@linux.ibm.com>
Subject: Re: [RFT][PATCH v1 0/5] cpuidle: menu: Avoid discarding useful information when processing recent idle intervals
Date: Mon, 10 Feb 2025 14:47:06 +0000	[thread overview]
Message-ID: <22c8bff8-4aea-4d24-875e-96ac18f921c3@arm.com> (raw)
In-Reply-To: <CAJZ5v0hxx-9AVJPJRRVPU8kx4kt9pgr6NhDuhEU_RHj=tCJQvw@mail.gmail.com>

On 2/10/25 14:43, Rafael J. Wysocki wrote:
> On Mon, Feb 10, 2025 at 3:15 PM Christian Loehle
> <christian.loehle@arm.com> wrote:
>>
>> On 2/6/25 14:21, Rafael J. Wysocki wrote:
>>> Hi Everyone,
>>>
>>> This work had been triggered by a report that commit 0611a640e60a ("eventpoll:
>>> prefer kfree_rcu() in __ep_remove()") had caused the critical-jOPS metric of
>>> the SPECjbb 2015 benchmark [1] to drop by around 50% even though it generally
>>> reduced kernel overhead.  Indeed, it was found during further investigation
>>> that the total interrupt rate while running the SPECjbb workload had fallen as
>>> a result of that commit by 55% and the local timer interrupt rate had fallen by
>>> almost 80%.
>>>
>>> That turned out to cause the menu cpuidle governor to select the deepest idle
>>> state supplied by the cpuidle driver (intel_idle) much more often which added
>>> significantly more idle state latency to the workload and that led to the
>>> decrease of the critical-jOPS score.
>>>
>>> Interestingly enough, this problem was not visible when the teo cpuidle
>>> governor was used instead of menu, so it appeared to be specific to the
>>> latter.  CPU wakeup event statistics collected while running the workload
>>> indicated that the menu governor was effectively ignoring non-timer wakeup
>>> information and all of its idle state selection decisions appeared to be
>>> based on timer wakeups only.  Thus, it appeared that the reduction of the
>>> local timer interrupt rate caused the governor to predict a idle duration
>>> much more often while running the workload and the deepest idle state was
>>> selected significantly more often as a result of that.
>>>
>>> A subsequent inspection of the get_typical_interval() function in the menu
>>> governor indicated that it might return UINT_MAX too often which then caused
>>> the governor's decisions to be based entirely on information related to timers.
>>>
>>> Generally speaking, UINT_MAX is returned by get_typical_interval() if it
>>> cannot make a prediction based on the most recent idle intervals data with
>>> sufficiently high confidence, but at least in some cases this means that
>>> useful information is not taken into account at all which may lead to
>>> significant idle state selection mistakes.  Moreover, this is not really
>>> unlikely to happen.
>>>
>>> One issue with get_typical_interval() is that, when it eliminates outliers from
>>> the sample set in an attempt to reduce the standard deviation (and so improve
>>> the prediction confidence), it does that by dropping high-end samples only,
>>> while samples at the low end of the set are retained.  However, the samples
>>> at the low end very well may be the outliers and they should be eliminated
>>> from the sample set instead of the high-end samples.  Accordingly, the
>>> likelihood of making a meaningful idle duration prediction can be improved
>>> by making it also eliminate low-end samples if they are farther from the
>>> average than high-end samples.  This is done in patch [4/5].
>>>
>>> Another issue is that get_typical_interval() gives up after eliminating 1/4
>>> of the samples if the standard deviation is still not as low as desired (within
>>> 1/6 of the average or within 20 us if the average is close to 0), but the
>>> remaining samples in the set still represent useful information at that point
>>> and discarding them altogether may lead to suboptimal idle state selection.
>>>
>>> For instance, the largest idle duration value in the get_typical_interval()
>>> data set is the maximum idle duration observed recently and it is likely that
>>> the upcoming idle duration will not exceed it.  Therefore, in the absence of
>>> a better choice, this value can be used as an upper bound on the target
>>> residency of the idle state to select.  Patch [5/5] works along these lines,
>>> but it takes the maximum data point remaining after the elimination of
>>> outliers.
>>>
>>> The first two patches in the series are straightforward cleanups (in fact,
>>> the first patch is kind of reversed by patch [4/5], but it is there because
>>> it can be applied without the latter) and patch [3/5] is a cosmetic change
>>> made in preparation for the subsequent ones.
>>>
>>> This series turns out to restore the SPECjbb critical-jOPS metric on affected
>>> systems to the level from before commit 0611a640e60a and it also happens to
>>> increase its max-jOPS metric by around 3%.
>>>
>>> For easier reference/testing it is present in the git branch at
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=experimental/menu
>>>
>>> based on the cpuidle material that went into 6.14-rc1.
>>>
>>> If possible, please let me know if it works for you.
>>>
>>> Thanks!
>>>
>>>
>>> [1] Link: https://www.spec.org/jbb2015/
>>
>> 5/5 shows significant IO workload improvements (the shorter wakeup scenario is
>> much more likely to be picked up now).
>> I don't see a significant regression in idle misses so far, I'll try Android
>> backports soon and some other system.
> 
> Sounds good, thanks!
> 
>> Here's a full dump, sorry it's from a different system (rk3588, only two idle
>> states), apparently eth networking is broken on 6.14-rc1 now on rk3399 :(
>>
>> For dm-delay 51ms (dm-slow) the command is (8 CPUs)
>> fio --minimal --time_based --group_reporting --name=fiotest --filename=/dev/mapper/dm-slow --runtime=30s --numjobs=16 --rw=randread --bs=4k --ioengine=psync --iodepth=1 --direct=1
>> For the rest:
>> fio --minimal --time_based --name=fiotest --filename=/dev/mmcblk1 --runtime=30s --rw=randread --bs=4k --ioengine=psync --iodepth=1 --direct=1
> 
> Thanks for the data!
> 
> Do I understand correctly that menu-X is menu with patches [1-X/5]
> applied?  And what's menu-m?

Correct about -1 to -5.
-m is just mainline. I changed that over time, it's just equivalent to menu now.
I'll just run that twice in the future (to be able to check for side-effects
like thermals because they all run one after the other).


  reply	other threads:[~2025-02-10 14:47 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-06 14:21 [RFT][PATCH v1 0/5] cpuidle: menu: Avoid discarding useful information when processing recent idle intervals Rafael J. Wysocki
2025-02-06 14:22 ` [RFT][PATCH v1 1/5] cpuidle: menu: Drop a redundant local variable Rafael J. Wysocki
2025-02-06 14:55   ` Christian Loehle
2025-02-06 14:24 ` [RFT][PATCH v1 2/5] cpuidle: menu: Use one loop for average and variance computations Rafael J. Wysocki
2025-02-17 13:03   ` Christian Loehle
2025-02-06 14:25 ` [RFT][PATCH v1 3/5] cpuidle: menu: Tweak threshold use in get_typical_interval() Rafael J. Wysocki
2025-02-17 13:08   ` Christian Loehle
2025-02-06 14:26 ` [RFT][PATCH v1 4/5] cpuidle: menu: Eliminate outliers on both ends of the sample set Rafael J. Wysocki
2025-02-17 13:26   ` Christian Loehle
2025-02-06 14:29 ` [RFT][PATCH v1 5/5] cpuidle: menu: Avoid discarding useful information Rafael J. Wysocki
2025-02-17 13:39   ` Christian Loehle
2025-02-17 13:47     ` Rafael J. Wysocki
2025-08-04 16:54   ` Marc Zyngier
2025-08-05 13:23     ` Rafael J. Wysocki
2025-08-05 14:41       ` Christian Loehle
2025-08-05 16:00       ` Marc Zyngier
2025-08-05 18:50         ` Rafael J. Wysocki
2025-08-06  7:19           ` Marc Zyngier
2025-08-06 12:48           ` Christian Loehle
2025-02-07 14:48 ` [RFT][PATCH v1 0/5] cpuidle: menu: Avoid discarding useful information when processing recent idle intervals Artem Bityutskiy
2025-02-07 15:24   ` Christian Loehle
2025-02-07 15:35     ` Rafael J. Wysocki
2025-02-07 15:45   ` Rafael J. Wysocki
2025-03-12 21:38   ` Doug Smythies
2025-02-10 14:15 ` Christian Loehle
2025-02-10 14:43   ` Rafael J. Wysocki
2025-02-10 14:47     ` Christian Loehle [this message]
2025-02-18 21:17   ` Christian Loehle
2025-02-19 12:06     ` Rafael J. Wysocki
2025-02-14  4:30 ` Doug Smythies
2025-02-14 22:10   ` Rafael J. Wysocki
2025-02-16 16:16     ` Doug Smythies
2025-02-24  6:27 ` Aboorva Devarajan
2025-02-24  6:38   ` Aboorva Devarajan
2025-02-24 12:35   ` Rafael J. Wysocki
2025-02-26  4:49   ` Aboorva Devarajan
2025-02-26 10:54     ` Christian Loehle

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=22c8bff8-4aea-4d24-875e-96ac18f921c3@arm.com \
    --to=christian.loehle@arm.com \
    --cc=aboorvad@linux.ibm.com \
    --cc=artem.bityutskiy@linux.intel.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rafael@kernel.org \
    --cc=rjw@rjwysocki.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).