Re: [PATCH 0/1] cpuidle/menu: Address performance drop from favoring physical over polling cpuidle state

public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed

From: Christian Loehle <christian.loehle@arm.com>
To: Aboorva Devarajan <aboorvad@linux.ibm.com>,
	rafael@kernel.org, daniel.lezcano@linaro.org,
	linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: gautam@linux.ibm.com
Subject: Re: [PATCH 0/1] cpuidle/menu: Address performance drop from favoring physical over polling cpuidle state
Date: Thu, 19 Sep 2024 15:07:09 +0100	[thread overview]
Message-ID: <964275d1-5f31-4eef-8dcd-fa8f11dbb830@arm.com> (raw)
In-Reply-To: <592a0c06-a23c-464a-9f67-2ce8ce91a306@arm.com>

On 9/19/24 10:43, Christian Loehle wrote:
> On 9/19/24 09:49, Christian Loehle wrote:
>> On 9/19/24 06:02, Aboorva Devarajan wrote:
>>> On Wed, 2024-08-21 at 11:55 +0100, Christian Loehle wrote:
>>
>>>
>>> Timer based wakeup:
>>>
>>> +------------------------+---------------------+---------------------+
>>> | Metric                 | Without Patch       | With Patch          |
>>> +------------------------+---------------------+---------------------+
>>> | Wakee thread - CPU     | 110                 | 110                 |
>>> | Waker thread - CPU     | 20                  | 20                  |
>>> | Sleep Interval         | 50 us               | 50 us               |
>>> | Total Wakeups          | -                   | -                   |
>>> | Avg Wakeup Latency     | -                   | -                   |
>>> | Actual Sleep time      | 52.639 us           | 52.627 us           |
>>> +------------------------+---------------------+---------------------+
>>> | Idle State 0 Usage Diff| 94,879              | 94,879              |
>>> | Idle State 0 Time Diff | 4,700,323 ns        | 4,697,576 ns        |
>>> | Idle State 1 Usage Diff| 0                   | 0                   |
>>> | Idle State 1 Time Diff | 0 ns                | 0 ns                |
>>> +------------------------+---------------------+---------------------+
>>> | Total Above Usage      | 0 (0.00%)           | (0.00%)             |
>>> +------------------------+---------------------+---------------------+
>>> | Total Below Usage      | 0 (0.00%)           | 0 (0.00%)           |
>>> +------------------------+---------------------+---------------------+
>>>
>>> In timer-based wakeups, the menu governor effectively predicts the idle
>>> duration both with and without the patch. This ensures that there are
>>> few or no instances of "Above" usage, allowing the CPU to remain in the
>>> correct idle state.
>>>
>>> The condition (s->target_residency_ns <= data->next_timer_ns) in the menu
>>> governor ensures that a physical idle state is not prioritized when a
>>> timer event is expected before the target residency of the first physical
>>> idle state.
>>>
>>> As a result, the patch has no impact in this case, and performance
>>> remains stable with timer based wakeups.
>>>
>>> Pipe based wakeup (non-timer wakeup):
>>>
>>> +------------------------+---------------------+---------------------+
>>> | Metric                 | Without Patch       | With Patch          |
>>> +------------------------+---------------------+---------------------+
>>> | Wakee thread - CPU     | 110                 | 110                 |
>>> | Waker thread - CPU     | 20                  | 20                  |
>>> | Sleep Interval         | 50 us               | 50 us               |
>>> | Total Wakeups          | 97031               | 96583               |
>>> | Avg Wakeup Latency     | 7.070 us            | 4.879 us            |
>>> | Actual Sleep time      | 51.366 us           | 51.605 us           |
>>> +------------------------+---------------------+---------------------+
>>> | Idle State 0 Usage Diff| 1209                | 96,586              |
>>> | Idle State 0 Time Diff | 55,579 ns           | 4,510,003 ns        |
>>> | Idle State 1 Usage Diff| 95,826              | 5                   |
>>> | Idle State 1 Time Diff | 4,522,639 ns        | 198 ns              |
>>> +------------------------+---------------------+---------------------+
>>> +------------------------+---------------------+---------------------+
>>> | **Total Above Usage**  | 95,824 (98.75%)     | 5 (0.01%)           |
>>> +------------------------+---------------------+---------------------+     
>>> +------------------------+---------------------+---------------------+
>>> | Total Below Usage      | 0 (0.00%)           | 0 (0.00%)           |
>>> +------------------------+---------------------+---------------------+
>>>
>>> In the pipe-based wakeup scenario, before the patch was applied, the 
>>> "Above" metric was notably high around 98.75%. This suggests that the
>>> menu governor frequently selected a deeper idle state like CEDE, even
>>> when the idle period was relatively short.
>>>
>>> This happened because the menu governor is inclined to prioritize the
>>> physical idle state (CEDE) even when the target residency time of the
>>> physical idle state (s->target_residency_ns) was longer than the
>>> predicted idle time (predicted_ns), so data->next_timer_ns won't be
>>> relevant here in non-timer wakeups.
>>>
>>> In this test, despite the actual idle period being around 50 microseconds,
>>> the menu governor still chose CEDE state, which has a target residency of
>>> 120 microseconds.
>>
>> And the idle_miss statistics confirms that this was mostly wrong decisions
>> in hindsight.
>> I'll go through the menu code again, this indeed shouldn't be happening.
>> I'd be very surprised if upstream teo performed as badly (or actually badly
>> at all) on this, although it doesn't seem to help your actual workload,
>> would you mind giving that a try too?
>>
>> Regards,
>> Christian
> 
> So with a workload as static as this, the recent interval detection of menu
> should take affect. It records the last 8 idle interval lengths and does some
> statistics do see if they are somewhat stable and uses those instead of the
> next timer event.
> While I wouldn't vouch for the statistics just yet, the results don't seem
> to be completely unreasonable.
> Do you mind trying a) printing some snapshots of these intervals and the
> resulting return value of get_typical_interval()?
> b) Trying this out? Setting these values to some around 50us, that returns
> 50us for me as expected and therefore should not select an idle state above
> that.
> 
> -->8--
> 
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -112,6 +112,8 @@ struct menu_device {
>         int             interval_ptr;
>  };
>  
> +static int intervals_dummy[] = {50, 52, 48, 50, 51, 49, 51, 51};

So just to illustrate this a bit more:
{51, 80, 90, 54, 72, 60, 80, 117} is insignificant, but decrementing the last:
{51, 80, 90, 54, 72, 60, 80, 116} is significant (75us returned)
That sounds about reasonable but I think something like that would
also be observable for you. It's an unfortunate edge case.
I wouldn't want to mess with the ancient statistics (and silently break
an unknown amount of setups). I much prefer the teo intercept approach that
makes this idle state dependent and doesn't try to get one formula right
for all setups.

That is all assuming that a) get_typical_interval() doesn't return a
significant interval length for you and b) replacing that with dummy
significant values then fixes it (and not the error being some menu
logic that I overlooked).

Thinking out loud regarding the menu logic:
If we have ~50us predicted interval and your state setup the code then
goes as follows:

predicted_ns = 50000
...
data->next_timer_ns = KTIME_MAX; 
delta_tick = TICK_NSEC / 2;  
data->bucket = which_bucket(KTIME_MAX, nr_iowaiters); // bucket highest state probably

And then we have the big for loop for state 0 POLLING and state 1 CEDE target_residency_ns==100000:
trivially we have for i = 0:
	if (idx == -1)                                                  
		idx = i; /* first enabled state */
and for i = 1:
	if (s->target_residency_ns > predicted_ns) { // True
...
		if (predicted_ns < TICK_NSEC) // True                      
			break;
and should end up with idx == 0 and *stop_tick = false, all good.
If my above assumptions in a) and b) do not hold then I must've made a
mistake reading the code just now.

next prev parent reply	other threads:[~2024-09-19 14:07 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-09  7:31 [PATCH 0/1] cpuidle/menu: Address performance drop from favoring physical over polling cpuidle state Aboorva Devarajan
2024-08-09  7:31 ` [PATCH 1/1] cpuidle/menu: avoid prioritizing physical state over polling state Aboorva Devarajan
2024-08-13 13:09   ` Christian Loehle
2024-09-19 16:08   ` Christian Loehle
2024-10-18 10:36   ` Gautam Menghani
2024-08-13 12:56 ` [PATCH 0/1] cpuidle/menu: Address performance drop from favoring physical over polling cpuidle state Christian Loehle
2024-08-20  8:51   ` Aboorva Devarajan
2024-08-21 10:55     ` Christian Loehle
2024-09-19  5:02       ` Aboorva Devarajan
2024-09-19  8:49         ` Christian Loehle
2024-09-19  9:43           ` Christian Loehle
2024-09-19 14:07             ` Christian Loehle [this message]
2024-09-19 14:38               ` Christian Loehle
2024-10-21  8:02                 ` Aboorva Devarajan
2024-10-21  9:03                   ` Christian Loehle
2024-10-21  7:40             ` Aboorva Devarajan
2024-10-21  5:27           ` Aboorva Devarajan
2024-10-21  8:43             ` Christian Loehle

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=964275d1-5f31-4eef-8dcd-fa8f11dbb830@arm.com \
    --to=christian.loehle@arm.com \
    --cc=aboorvad@linux.ibm.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=gautam@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rafael@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox