From: Peter Zijlstra <peterz@infradead.org>
To: Chen Yu <yu.c.chen@intel.com>
Cc: linux-pm@vger.kernel.org, "Rafael J. Wysocki" <rafael@kernel.org>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Len Brown <len.brown@intel.com>, Tim Chen <tim.c.chen@intel.com>,
	Giovanni Gherdovich <ggherdovich@suse.cz>,
	Chen Yu <yu.chen.surf@gmail.com>,
	linux-kernel@vger.kernel.org, Zhang Rui <rui.zhang@intel.com>
Subject: Re: [PATCH] cpufreq: intel_pstate: Handle no_turbo in frequency invariance
Date: Fri, 8 Apr 2022 10:22:25 +0200
Message-ID: <20220408082225.GN2731@worktop.programming.kicks-ass.net>
In-Reply-To: <20220407234258.569681-1-yu.c.chen@intel.com>

On Fri, Apr 08, 2022 at 07:42:58AM +0800, Chen Yu wrote:
> Problem statement:
> Once the user has disabled turbo frequency by
> echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo,
> the cfs_rq's util_avg becomes quite small when compared with
> CPU capacity.
> 
> Steps to reproduce:
> 
> echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
> 
> ./x86_cpuload --count 1 --start 3 --timeout 100 --busy 99
> would launch 1 thread and bind it to CPU3, lasting for 100 seconds,
> with a CPU utilization of 99%. [1]
> 
> top result:
> %Cpu3  : 98.4 us,  0.0 sy,  0.0 ni,  1.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> 
> check util_avg:
> cat /sys/kernel/debug/sched/debug | grep "cfs_rq\[3\]" -A 20 | grep util_avg
>   .util_avg                      : 611
> 
> So util_avg / CPU capacity is 611/1024 (about 60%), which is much smaller
> than the 98.4% shown in the top result.
> 
> This might impact some logic in the scheduler. For example, group_is_overloaded()
> would compare the group_capacity and group_util in the sched group, to
> check if this sched group is overloaded or not. With this gap, even
> when there is a nearly 100% workload, the sched group will not be regarded
> as overloaded. Besides group_is_overloaded(), there are also other victims.
> There is ongoing work that aims to optimize task wakeup in an LLC domain.
> The main idea is to stop searching idle CPUs if the sched domain is overloaded[2].
> This proposal also relies on the util_avg/CPU capacity to decide whether the LLC
> domain is overloaded.
> 
> Analysis:
> CPU frequency invariance has caused this difference. In summary,
> when CPU frequency invariance is enabled, the util_sum of the cfs_rq
> decays quite fast while the CPU is idle.
> 
> The details are as follows:
> 
> As depicted in update_rq_clock_pelt(), when the frequency invariance
> is enabled, there would be two clock variables on each rq, clock_task
> and clock_pelt:
> 
>    The clock_pelt scales the time to reflect the effective amount of
>    computation done during the running delta time but then syncs back to
>    clock_task when rq is idle.
> 
>    absolute time    | 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|16
>    @ max frequency  ------******---------------******---------------
>    @ half frequency ------************---------************---------
>    clock pelt       | 1| 2|    3|    4| 7| 8| 9|   10|   11|14|15|16
> 
> The fast decay of util_sum during idle is due to:
> 1. rq->clock_pelt is always behind rq->clock_task
> 2. rq->last_update is updated to rq->clock_pelt' after invoking ___update_load_sum()
> 3. When the CPU then becomes idle, rq->clock_pelt' suddenly jumps forward
>    to rq->clock_task
> 4. When ___update_load_sum() is entered again, the idle period is calculated as
>    rq->clock_task - rq->last_update, i.e. rq->clock_task - rq->clock_pelt'.
>    The lower the CPU frequency, the larger the delta =
>    rq->clock_task - rq->clock_pelt' will be. Since the idle period is
>    used only to decay util_sum, util_sum drops significantly during the
>    idle period.
> 
> Proposal:
> This symptom is not only caused by disabling turbo frequency; it
> would also appear if the user limits the max frequency at runtime, because
> if the frequency is always lower than the max frequency,
> CPU frequency invariance would decay the util_sum quite fast during idle.
> 
> As some end users would disable turbo after boot up, this patch aims to
> present this symptom and deals with turbo scenarios for now. It might
> be ideal if CPU frequency invariance is aware of the max CPU frequency
> (user specified) at runtime in the future.
> 
> [Previous patch seems to be lost on LKML, this is a resend, sorry for any
> inconvenience]
> 
> Link: https://github.com/yu-chen-surf/x86_cpuload.git #1
> Link: https://lore.kernel.org/lkml/20220310005228.11737-1-yu.c.chen@intel.com/ #2
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
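
The decay mechanism in the quoted analysis can be illustrated with a small
model. This is only a sketch under stated assumptions, not the kernel's PELT
code: it treats PELT as a continuous exponential average with a half-life of
about 32 ms, scales running time by a hypothetical frequency ratio (0.6 here,
picked for illustration and not measured on the reporter's machine), and
accounts the time skipped by clock_pelt while running as extra idle decay, as
in step 4. The function names and the 9.9 ms busy / 0.1 ms idle cycle are
likewise made up for the example.

```python
HALF_LIFE_MS = 32.0   # assumed: a PELT contribution halves about every 32 ms
MAX_UTIL = 1024.0     # utilization of an always-running task at max frequency


def decay(pelt_ms):
    """Decay factor applied after pelt_ms of PELT time."""
    return 0.5 ** (pelt_ms / HALF_LIFE_MS)


def model_util(freq_ratio, busy_ms=9.9, idle_ms=0.1, cycles=2000):
    """Mean of the per-cycle peak and trough once the model has settled."""
    util = peak = 0.0
    for _ in range(cycles):
        # While running, clock_pelt advances at freq_ratio of wall time.
        run = decay(busy_ms * freq_ratio)
        util = util * run + MAX_UTIL * (1.0 - run)
        peak = util
        # On idle, clock_pelt jumps to clock_task, so the skipped time
        # (busy_ms * (1 - freq_ratio)) is accounted as idle decay.
        util *= decay(idle_ms + busy_ms * (1.0 - freq_ratio))
    return (peak + util) / 2.0


full = model_util(1.0)      # CPU always at the max (turbo) frequency
scaled = model_util(0.6)    # hypothetical: capped at 60% of max frequency
print(f"freq_ratio=1.0 -> util ~ {full:.0f}")
print(f"freq_ratio=0.6 -> util ~ {scaled:.0f}")
```

With these assumed inputs the model settles at roughly 1014 for the full-speed
case and roughly 608 for the 0.6 ratio, which is in the same range as the
util_avg of 611 reported above. Real values depend on the actual frequency
ratio and on the workload's run/idle pattern.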

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Thread overview: 4+ messages
2022-04-07 23:42 [PATCH] cpufreq: intel_pstate: Handle no_turbo in frequency invariance Chen Yu
2022-04-08  8:22 ` Peter Zijlstra [this message]
2022-04-08 14:22 ` Giovanni Gherdovich
2022-04-13 15:39   ` Rafael J. Wysocki
