From: Patrick Bellasi <patrick.bellasi@arm.com>
To: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
Peter Zijlstra <peterz@infradead.org>,
Linux PM <linux-pm@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
Viresh Kumar <viresh.kumar@linaro.org>,
Juri Lelli <juri.lelli@arm.com>,
Joel Fernandes <joelaf@google.com>,
Morten Rasmussen <morten.rasmussen@arm.com>,
Ingo Molnar <mingo@redhat.com>
Subject: Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs
Date: Tue, 21 Mar 2017 15:08:20 +0000 [thread overview]
Message-ID: <20170321150820.GF11054@e110439-lin> (raw)
In-Reply-To: <1844525.jBn1oKmyb6@aspire.rjw.lan>
On 21-Mar 15:46, Rafael J. Wysocki wrote:
> On Tuesday, March 21, 2017 02:38:42 PM Patrick Bellasi wrote:
> > On 21-Mar 15:26, Rafael J. Wysocki wrote:
> > > On Tuesday, March 21, 2017 02:37:08 PM Vincent Guittot wrote:
> > > > On 21 March 2017 at 14:22, Peter Zijlstra <peterz@infradead.org> wrote:
> > > > > On Tue, Mar 21, 2017 at 09:50:28AM +0100, Vincent Guittot wrote:
> > > > >> On 20 March 2017 at 22:46, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > >
> > > > >> > To work around this issue use the observation that, from the
> > > > >> > schedutil governor's perspective, it does not make sense to decrease
> > > > >> > the frequency of a CPU that doesn't enter idle and avoid decreasing
> > > > >> > the frequency of busy CPUs.
> > > > >>
> > > > >> I don't fully agree with that statement.
> > > > >> If there are 2 runnable tasks on CPU A and scheduler migrates the
> > > > >> waiting task to another CPU B so CPU A is less loaded now, it makes
> > > > >> sense to reduce the OPP. That's even for that purpose that we have
> > > > >> decided to use scheduler metrics in cpufreq governor so we can adjust
> > > > >> OPP immediately when tasks migrate.
> > > > >> That being said, i probably know why you see such OPP switches in your
> > > > >> use case. When we migrate a task, we also migrate/remove its
> > > > >> utilization from CPU.
> > > > >> If the CPU is not overloaded, it means that runnable tasks have all
> > > > >> computation that they need and don't have any reason to use more when
> > > > >> a task migrates to another CPU. so decreasing the OPP makes sense
> > > > >> because the utilzation is decreasing
> > > > >> If the CPU is overloaded, it means that runnable tasks have to share
> > > > >> CPU time and probably don't have all computations that they would like
> > > > >> so when a task migrate, the remaining tasks on the CPU will increase
> > > > >> their utilization and fill space left by the task that has just
> > > > >> migrated. So the CPU's utilization will decrease when a task migrates
> > > > >> (and as a result the OPP) but then its utilization will increase with
> > > > >> remaining tasks running more time as well as the OPP
> > > > >>
> > > > >> So you need to make the difference between this 2 cases: Is a CPU
> > > > >> overloaded or not. You can't really rely on the utilization to detect
> > > > >> that but you could take advantage of the load which take into account
> > > > >> the waiting time of tasks
> > > > >
> > > > > I'm confused. What two cases? You only list the overloaded case, but he
> > > >
> > > > overloaded vs not overloaded use case.
> > > > For the not overloaded case, it makes sense to immediately update to
> > > > OPP to be aligned with the new utilization of the CPU even if it was
> > > > not idle in the past couple of ticks
> > >
> > > Yes, if the OPP (or P-state if you will) can be changed immediately. If it can't,
> > > conditions may change by the time we actually update it and in that case It'd
> > > be better to wait and see IMO.
> > >
> > > In any case, the theory about migrating tasks made sense to me, so below is
> > > what I tested. It works, and besides it has a nice feature that I don't need
> > > to fetch for the timekeeping data. :-)
> > >
> > > I only wonder if we want to do this or only prevent the frequency from
> > > decreasing in the overloaded case?
> > >
> > > ---
> > > kernel/sched/cpufreq_schedutil.c | 8 +++++---
> > > 1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > > Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> > > ===================================================================
> > > --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> > > +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> > > @@ -61,6 +61,7 @@ struct sugov_cpu {
> > > unsigned long util;
> > > unsigned long max;
> > > unsigned int flags;
> > > + bool overload;
> > > };
> > >
> > > static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
> > > @@ -207,7 +208,7 @@ static void sugov_update_single(struct u
> > > if (!sugov_should_update_freq(sg_policy, time))
> > > return;
> > >
> > > - if (flags & SCHED_CPUFREQ_RT_DL) {
> > > + if ((flags & SCHED_CPUFREQ_RT_DL) || this_rq()->rd->overload) {
> > > next_f = policy->cpuinfo.max_freq;
> >
> > Isn't this going to max OPP every time we have more than 1 task in
> > that CPU?
> >
> > In that case it will not fit the case: we have two 10% tasks on that CPU.
>
> Good point.
>
> > Previous solution was better IMO, apart from using overloaded instead
> > of overutilized (which is not yet there) :-/
>
> OK, so the one below works too.
Better... just one minor comment.
> ---
> kernel/sched/cpufreq_schedutil.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> @@ -37,6 +37,7 @@ struct sugov_policy {
> s64 freq_update_delay_ns;
> unsigned int next_freq;
> unsigned int cached_raw_freq;
> + bool overload;
Can we avoid using "overloaded" in favor of a more generic and
schedutil specific name. Mainly because in the future we would
probably like to switch from "overloaded" to "overutilized".
What about something like: "busy" ?
>
> /* The next fields are only needed if fast switch cannot be used. */
> struct irq_work irq_work;
> @@ -61,6 +62,7 @@ struct sugov_cpu {
> unsigned long util;
> unsigned long max;
> unsigned int flags;
> + bool overload;
> };
>
> static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
> @@ -93,6 +95,9 @@ static void sugov_update_commit(struct s
> {
> struct cpufreq_policy *policy = sg_policy->policy;
>
> + if (sg_policy->overload && next_freq < sg_policy->next_freq)
> + next_freq = sg_policy->next_freq;
> +
> if (policy->fast_switch_enabled) {
> if (sg_policy->next_freq == next_freq) {
> trace_cpu_frequency(policy->cur, smp_processor_id());
> @@ -207,6 +212,8 @@ static void sugov_update_single(struct u
> if (!sugov_should_update_freq(sg_policy, time))
> return;
>
> + sg_policy->overload = this_rq()->rd->overload;
And than we can move this bit into an inline function, something like e.g.:
static inline bool sugov_this_cpu_is_busy()
{
return this_rq()->rd->overloaded
}
Where in future we can easily switch from usage of "overloaded" to
usage of "utilization".
> +
> if (flags & SCHED_CPUFREQ_RT_DL) {
> next_f = policy->cpuinfo.max_freq;
> } else {
> @@ -225,6 +232,8 @@ static unsigned int sugov_next_freq_shar
> unsigned long util = 0, max = 1;
> unsigned int j;
>
> + sg_policy->overload = false;
> +
> for_each_cpu(j, policy->cpus) {
> struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
> unsigned long j_util, j_max;
> @@ -253,6 +262,7 @@ static unsigned int sugov_next_freq_shar
> }
>
> sugov_iowait_boost(j_sg_cpu, &util, &max);
> + sg_policy->overload = sg_policy->overload || sg_cpu->overload;
> }
>
> return get_next_freq(sg_policy, util, max);
> @@ -273,6 +283,7 @@ static void sugov_update_shared(struct u
> sg_cpu->util = util;
> sg_cpu->max = max;
> sg_cpu->flags = flags;
> + sg_cpu->overload = this_rq()->rd->overload;
>
> sugov_set_iowait_boost(sg_cpu, time, flags);
> sg_cpu->last_update = time;
>
--
#include <best/regards.h>
Patrick Bellasi
next prev parent reply other threads:[~2017-03-21 15:08 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-19 13:21 [PATCH 0/2] cpufreq: schedutil: Fix and optimization Rafael J. Wysocki
2017-03-19 13:30 ` [PATCH 1/2] cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start() Rafael J. Wysocki
2017-03-20 3:28 ` Viresh Kumar
2017-03-20 12:36 ` Rafael J. Wysocki
2017-03-19 13:34 ` [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs Rafael J. Wysocki
2017-03-19 21:24 ` Rafael J. Wysocki
2017-03-19 21:42 ` Rafael J. Wysocki
2017-03-20 10:38 ` Peter Zijlstra
2017-03-20 12:31 ` Rafael J. Wysocki
2017-03-20 3:57 ` Viresh Kumar
2017-03-20 8:26 ` Vincent Guittot
2017-03-20 12:34 ` Patrick Bellasi
2017-03-22 23:56 ` Joel Fernandes
2017-03-23 22:08 ` Vincent Guittot
2017-03-25 3:48 ` Joel Fernandes
2017-03-27 6:59 ` Vincent Guittot
2017-03-20 12:59 ` Rafael J. Wysocki
2017-03-20 13:20 ` Vincent Guittot
2017-03-20 12:48 ` Rafael J. Wysocki
2017-03-20 10:36 ` Peter Zijlstra
2017-03-20 12:35 ` Rafael J. Wysocki
2017-03-20 12:50 ` Peter Zijlstra
2017-03-20 13:04 ` Rafael J. Wysocki
2017-03-20 13:06 ` Patrick Bellasi
2017-03-20 13:05 ` Rafael J. Wysocki
2017-03-20 14:13 ` Patrick Bellasi
2017-03-20 21:46 ` [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of " Rafael J. Wysocki
2017-03-21 6:40 ` Viresh Kumar
2017-03-21 12:30 ` Rafael J. Wysocki
2017-03-21 8:50 ` Vincent Guittot
2017-03-21 11:56 ` Patrick Bellasi
2017-03-21 13:22 ` Peter Zijlstra
2017-03-21 13:37 ` Vincent Guittot
2017-03-21 14:03 ` Peter Zijlstra
2017-03-21 14:18 ` Vincent Guittot
2017-03-21 14:25 ` Patrick Bellasi
[not found] ` <CAKfTPtALorn7HNpz4LOfWWSc3u+9y5iHB5byzfTHGQXDA+tVJQ@mail.gmail.com>
2017-03-21 14:58 ` Peter Zijlstra
2017-03-21 17:00 ` Vincent Guittot
2017-03-21 17:01 ` Vincent Guittot
2017-03-21 14:26 ` Rafael J. Wysocki
2017-03-21 14:38 ` Patrick Bellasi
2017-03-21 14:46 ` Rafael J. Wysocki
2017-03-21 14:50 ` Rafael J. Wysocki
2017-03-21 15:04 ` Peter Zijlstra
2017-03-21 15:18 ` Rafael J. Wysocki
2017-03-21 17:00 ` Peter Zijlstra
2017-03-21 17:17 ` Rafael J. Wysocki
2017-03-21 15:08 ` Patrick Bellasi [this message]
2017-03-21 15:18 ` Peter Zijlstra
2017-03-21 19:28 ` Patrick Bellasi
2017-03-21 15:02 ` Peter Zijlstra
2017-03-21 11:50 ` Patrick Bellasi
2017-03-21 23:08 ` [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely Rafael J. Wysocki
2017-03-22 9:26 ` Peter Zijlstra
2017-03-22 9:54 ` Viresh Kumar
2017-03-23 1:04 ` Joel Fernandes
2017-03-23 19:26 ` Sai Gurrappadi
2017-03-23 20:48 ` Sai Gurrappadi
2017-03-24 1:39 ` Rafael J. Wysocki
2017-03-24 19:08 ` Sai Gurrappadi
2017-03-25 1:14 ` Sai Gurrappadi
2017-03-25 1:39 ` Rafael J. Wysocki
2017-03-27 7:04 ` Vincent Guittot
2017-03-27 21:01 ` Sai Gurrappadi
2017-03-27 21:11 ` Rafael J. Wysocki
2017-05-08 3:49 ` Wanpeng Li
2017-05-08 4:01 ` Viresh Kumar
2017-05-08 5:15 ` Wanpeng Li
2017-05-08 22:16 ` Rafael J. Wysocki
2017-05-08 22:36 ` Wanpeng Li
2017-05-08 23:01 ` Rafael J. Wysocki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170321150820.GF11054@e110439-lin \
--to=patrick.bellasi@arm.com \
--cc=joelaf@google.com \
--cc=juri.lelli@arm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=morten.rasmussen@arm.com \
--cc=peterz@infradead.org \
--cc=rjw@rjwysocki.net \
--cc=srinivas.pandruvada@linux.intel.com \
--cc=vincent.guittot@linaro.org \
--cc=viresh.kumar@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.