public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed
From: Patrick Bellasi <patrick.bellasi@arm.com>
To: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Linux PM <linux-pm@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Juri Lelli <juri.lelli@arm.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Joel Fernandes <joelaf@google.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Ingo Molnar <mingo@redhat.com>
Subject: Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
Date: Tue, 21 Mar 2017 11:50:49 +0000	[thread overview]
Message-ID: <20170321115049.GB11054@e110439-lin> (raw)
In-Reply-To: <3300960.HE4b3sK4dn@aspire.rjw.lan>

On 20-Mar 22:46, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The way the schedutil governor uses the PELT metric causes it to
> underestimate the CPU utilization in some cases.
> 
> That can be easily demonstrated by running kernel compilation on
> a Sandy Bridge Intel processor, running turbostat in parallel with
> it and looking at the values written to the MSR_IA32_PERF_CTL
> register.  Namely, the expected result would be that when all CPUs
> were 100% busy, all of them would be requested to run in the maximum
> P-state, but observation shows that this clearly isn't the case.
> The CPUs run in the maximum P-state for a while and then are
> requested to run slower and go back to the maximum P-state after
> a while again.  That causes the actual frequency of the processor to
> visibly oscillate below the sustainable maximum in a jittery fashion
> which clearly is not desirable.
> 
> To work around this issue use the observation that, from the
> schedutil governor's perspective, it does not make sense to decrease
> the frequency of a CPU that doesn't enter idle and avoid decreasing
> the frequency of busy CPUs.
> 
> To that end, use the counter of idle calls in the timekeeping code.
> Namely, make the schedutil governor look at that counter for the
> current CPU every time before it is about to set a new frequency
> for that CPU's policy.  If the counter has not changed since the
> previous iteration, the CPU has been busy for all that time and
> its frequency should not be decreased, so if the new frequency would
> be lower than the one set previously, the governor will skip the
> frequency update.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> This is a slightly different approach (avoid decreasing frequency for busy CPUs
> instead of bumping if for them to the max upfront) and it works around the
> original problem too.

I like much better this version where we do not enforce max frequency
as well as we removed the hardcoded time threshold. ;-)

Makes sense to me also to avoid down scaling until we don't hit an IDLE.

However, I also agree with Vincent's observation: this constraint should be
there only for "overutilized" CPUs... but unfortunately, in mainline we are
still missing that flag and thus we should probably use the
"overloaded" one for the time being.

> I tried to address a few Peter's comments here and the result doesn't seem to
> be too heavy-wieght.

Nice!

> Thanks,
> Rafael
> 
> ---
>  include/linux/tick.h             |    1 +
>  kernel/sched/cpufreq_schedutil.c |   28 ++++++++++++++++++++++++----
>  kernel/time/tick-sched.c         |   12 ++++++++++++
>  3 files changed, 37 insertions(+), 4 deletions(-)
> 
> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> @@ -56,6 +56,9 @@ struct sugov_cpu {
>  	unsigned long iowait_boost;
>  	unsigned long iowait_boost_max;
>  	u64 last_update;
> +#ifdef CONFIG_NO_HZ_COMMON
> +	unsigned long saved_idle_calls;
> +#endif
>  
>  	/* The fields below are only needed when sharing a policy. */
>  	unsigned long util;
> @@ -88,11 +91,28 @@ static bool sugov_should_update_freq(str
>  	return delta_ns >= sg_policy->freq_update_delay_ns;
>  }
>  
> -static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
> -				unsigned int next_freq)
> +#ifdef CONFIG_NO_HZ_COMMON
> +static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
> +{
> +	unsigned long idle_calls = tick_nohz_get_idle_calls();
> +	bool ret = idle_calls == sg_cpu->saved_idle_calls;
> +
> +	sg_cpu->saved_idle_calls = idle_calls;
> +	return ret;
> +}
> +#else
> +static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
> +#endif /* CONFIG_NO_HZ_COMMON */
> +
> +static void sugov_update_commit(struct sugov_cpu *sg_cpu,
> +				struct sugov_policy *sg_policy,
> +				u64 time, unsigned int next_freq)
>  {
>  	struct cpufreq_policy *policy = sg_policy->policy;
>  
> +	if (sugov_cpu_is_busy(sg_cpu) && next_freq < sg_policy->next_freq)
> +		next_freq = sg_policy->next_freq;
> +
>  	if (policy->fast_switch_enabled) {
>  		if (sg_policy->next_freq == next_freq) {
>  			trace_cpu_frequency(policy->cur, smp_processor_id());
> @@ -214,7 +234,7 @@ static void sugov_update_single(struct u
>  		sugov_iowait_boost(sg_cpu, &util, &max);
>  		next_f = get_next_freq(sg_policy, util, max);
>  	}
> -	sugov_update_commit(sg_policy, time, next_f);
> +	sugov_update_commit(sg_cpu, sg_policy, time, next_f);
>  }
>  
>  static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
> @@ -283,7 +303,7 @@ static void sugov_update_shared(struct u
>  		else
>  			next_f = sugov_next_freq_shared(sg_cpu);
>  
> -		sugov_update_commit(sg_policy, time, next_f);
> +		sugov_update_commit(sg_cpu, sg_policy, time, next_f);
>  	}
>  
>  	raw_spin_unlock(&sg_policy->update_lock);
> Index: linux-pm/include/linux/tick.h
> ===================================================================
> --- linux-pm.orig/include/linux/tick.h
> +++ linux-pm/include/linux/tick.h
> @@ -117,6 +117,7 @@ extern void tick_nohz_idle_enter(void);
>  extern void tick_nohz_idle_exit(void);
>  extern void tick_nohz_irq_exit(void);
>  extern ktime_t tick_nohz_get_sleep_length(void);
> +extern unsigned long tick_nohz_get_idle_calls(void);
>  extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
>  extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
>  #else /* !CONFIG_NO_HZ_COMMON */
> Index: linux-pm/kernel/time/tick-sched.c
> ===================================================================
> --- linux-pm.orig/kernel/time/tick-sched.c
> +++ linux-pm/kernel/time/tick-sched.c
> @@ -993,6 +993,18 @@ ktime_t tick_nohz_get_sleep_length(void)
>  	return ts->sleep_length;
>  }
>  
> +/**
> + * tick_nohz_get_idle_calls - return the current idle calls counter value
> + *
> + * Called from the schedutil frequency scaling governor in scheduler context.
> + */
> +unsigned long tick_nohz_get_idle_calls(void)
> +{
> +	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
> +
> +	return ts->idle_calls;
> +}
> +
>  static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
>  {
>  #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
> 

-- 
#include <best/regards.h>

Patrick Bellasi

  parent reply	other threads:[~2017-03-21 11:50 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-19 13:21 [PATCH 0/2] cpufreq: schedutil: Fix and optimization Rafael J. Wysocki
2017-03-19 13:30 ` [PATCH 1/2] cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start() Rafael J. Wysocki
2017-03-20  3:28   ` Viresh Kumar
2017-03-20 12:36     ` Rafael J. Wysocki
2017-03-19 13:34 ` [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs Rafael J. Wysocki
2017-03-19 21:24   ` Rafael J. Wysocki
2017-03-19 21:42     ` Rafael J. Wysocki
2017-03-20 10:38     ` Peter Zijlstra
2017-03-20 12:31       ` Rafael J. Wysocki
2017-03-20  3:57   ` Viresh Kumar
2017-03-20  8:26     ` Vincent Guittot
2017-03-20 12:34       ` Patrick Bellasi
2017-03-22 23:56         ` Joel Fernandes
2017-03-23 22:08           ` Vincent Guittot
2017-03-25  3:48             ` Joel Fernandes
2017-03-27  6:59               ` Vincent Guittot
2017-03-20 12:59       ` Rafael J. Wysocki
2017-03-20 13:20         ` Vincent Guittot
2017-03-20 12:48     ` Rafael J. Wysocki
2017-03-20 10:36   ` Peter Zijlstra
2017-03-20 12:35     ` Rafael J. Wysocki
2017-03-20 12:50       ` Peter Zijlstra
2017-03-20 13:04         ` Rafael J. Wysocki
2017-03-20 13:06         ` Patrick Bellasi
2017-03-20 13:05           ` Rafael J. Wysocki
2017-03-20 14:13             ` Patrick Bellasi
2017-03-20 21:46   ` [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of " Rafael J. Wysocki
2017-03-21  6:40     ` Viresh Kumar
2017-03-21 12:30       ` Rafael J. Wysocki
2017-03-21  8:50     ` Vincent Guittot
2017-03-21 11:56       ` Patrick Bellasi
2017-03-21 13:22       ` Peter Zijlstra
2017-03-21 13:37         ` Vincent Guittot
2017-03-21 14:03           ` Peter Zijlstra
2017-03-21 14:18             ` Vincent Guittot
2017-03-21 14:25             ` Patrick Bellasi
     [not found]             ` <CAKfTPtALorn7HNpz4LOfWWSc3u+9y5iHB5byzfTHGQXDA+tVJQ@mail.gmail.com>
2017-03-21 14:58               ` Peter Zijlstra
2017-03-21 17:00                 ` Vincent Guittot
2017-03-21 17:01                   ` Vincent Guittot
2017-03-21 14:26           ` Rafael J. Wysocki
2017-03-21 14:38             ` Patrick Bellasi
2017-03-21 14:46               ` Rafael J. Wysocki
2017-03-21 14:50                 ` Rafael J. Wysocki
2017-03-21 15:04                 ` Peter Zijlstra
2017-03-21 15:18                   ` Rafael J. Wysocki
2017-03-21 17:00                     ` Peter Zijlstra
2017-03-21 17:17                       ` Rafael J. Wysocki
2017-03-21 15:08                 ` Patrick Bellasi
2017-03-21 15:18                   ` Peter Zijlstra
2017-03-21 19:28                     ` Patrick Bellasi
2017-03-21 15:02             ` Peter Zijlstra
2017-03-21 11:50     ` Patrick Bellasi [this message]
2017-03-21 23:08     ` [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely Rafael J. Wysocki
2017-03-22  9:26       ` Peter Zijlstra
2017-03-22  9:54       ` Viresh Kumar
2017-03-23  1:04       ` Joel Fernandes
2017-03-23 19:26       ` Sai Gurrappadi
2017-03-23 20:48         ` Sai Gurrappadi
2017-03-24  1:39         ` Rafael J. Wysocki
2017-03-24 19:08           ` Sai Gurrappadi
2017-03-25  1:14       ` Sai Gurrappadi
2017-03-25  1:39         ` Rafael J. Wysocki
2017-03-27  7:04         ` Vincent Guittot
2017-03-27 21:01           ` Sai Gurrappadi
2017-03-27 21:11             ` Rafael J. Wysocki
2017-05-08  3:49       ` Wanpeng Li
2017-05-08  4:01         ` Viresh Kumar
2017-05-08  5:15           ` Wanpeng Li
2017-05-08 22:16           ` Rafael J. Wysocki
2017-05-08 22:36             ` Wanpeng Li
2017-05-08 23:01               ` Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170321115049.GB11054@e110439-lin \
    --to=patrick.bellasi@arm.com \
    --cc=joelaf@google.com \
    --cc=juri.lelli@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=rjw@rjwysocki.net \
    --cc=srinivas.pandruvada@linux.intel.com \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox