From: Peter Zijlstra <peterz@infradead.org>
To: Michael Turquette <mturquette@linaro.org>
Cc: mingo@kernel.org, linux-kernel@vger.kernel.org,
preeti@linux.vnet.ibm.com, Morten.Rasmussen@arm.com,
riel@redhat.com, efault@gmx.de, nicolas.pitre@linaro.org,
daniel.lezcano@linaro.org, dietmar.eggemann@arm.com,
vincent.guittot@linaro.org, amit.kucheria@linaro.org,
juri.lelli@arm.com, rjw@rjwysocki.net, viresh.kumar@linaro.org,
ashwin.chaugule@linaro.org, alex.shi@linaro.org,
linux-pm@vger.kernel.org
Subject: Re: [PATCH 4/4] sched: cpufreq_cfs: pelt-based cpu frequency scaling
Date: Wed, 6 May 2015 14:22:40 +0200 [thread overview]
Message-ID: <20150506122240.GY23123@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20150505182347.16410.16338@quantum>
On Tue, May 05, 2015 at 11:23:47AM -0700, Michael Turquette wrote:
> Quoting Peter Zijlstra (2015-05-05 02:00:42)
> > On Mon, May 04, 2015 at 03:10:41PM -0700, Michael Turquette wrote:
> > > This policy is implemented using the cpufreq governor interface for two
> > > main reasons:
> > >
> > > 1) re-using the cpufreq machine drivers without using the governor
> > > interface is hard.
> > >
> > > 2) using the cpufreq interface allows us to switch between the
> > > scheduler-driven policy and legacy cpufreq governors such as ondemand at
> > > run-time. This is very useful for comparative testing and tuning.
> >
> > Urgh. So I don't really like that. It adds a lot of noise to the
> > system. You do the irq work thing to kick the cpufreq threads which do
> > their little thing -- and their wakeup will influence the cfs
> > accounting, which in turn will start the whole thing anew.
> >
> > I would really prefer you did a whole new system with directly invoked
> > drivers that avoid the silly dance. Your 'new' ARM systems should be
> > well capable of that.
>
> Thanks for the review Peter.
Well, I didn't actually get beyond the Changelog, though I have
rectified that now. A few more comments below.
> We'll need something in process context for the many cpufreq drivers
> that might sleep during their cpu frequency transition, no? This is due
> to calls into the regulator framework, the clock framework and sometimes
> other things such as conversing with a power management IC or voting
> with some system controller.
Yes, we need _something_. I just spoke to a bunch of people on IRC and
it does indeed seem that I was mistaken in my assumption that modern ARM
systems were 'easy' in this regard.
> > As to the drivers, they're mostly fairly small and self contained, it
> > should not be too hard to hack them up to work without cpufreq.
>
> The drivers are not the only thing. I want to leverage the existing
> cpufreq core infrastructure:
>
> * rate change notifiers
> * cpu hotplug awareness
> * methods to fetch frequency tables from firmware (acpi, devicetree)
> * other stuff I can't think of now
>
> So I do not think we should throw out the baby with the bath water. The
> thing that people really don't like about cpufreq are the governors IMO.
> Let's fix that by creating a list of requirements that we really want
> for scheduler-driven cpu frequency selection:
>
> 0) do something smart with scheduler statistics to create an improved
> frequency selection algorithm over existing cpufreq governors
>
> 1) support upcoming and legacy hardware, within reason
>
> 2) if a system exports a fast, async frequency selection interface to
> Linux, then provide a solution that doesn't do any irq_work or kthread
> stuff. Do it all in the fast path
>
> 3) if a system has a slow, synchronous interface for frequency
> selection, then provide an acceptable solution for kicking this work to
> process context. Limit time in the fast path
>
> The patch set tries to tackle 0, 1 and 3. Would the inclusion of #2 make
> you feel better about supporting "newer" hardware with a fire-and-forget
> frequency selection interface?
I should probably look at doing a x86 support patch to try some of this
out, I'll try and find some time to play with that.
So two comments on the actual code:
1) Ideally these hooks would be called from places where we've just
computed the cpu utilization already. I understand this is currently not
the case and we need to do it in-situ.
That said, it would be good to have the interface such that we pass the
utilization in; this would of course mean we'd have to compute it at the
call sites, but that should be fine I think.
2) I dislike how policy->cpus is handled; it feels upside down.
If, per 1), each CPU has already computed its utilization and provides
it in the call, we should not have to recompute it and its scale factor
(which btw seems done slightly oddly too; I know humans like base 10,
but computers suck at it).
Why not something like:
	usage = ((1024 + 256) * usage) >> 10; /* +25% */

	old_usage = __this_cpu_read(gd->usage);
	__this_cpu_write(gd->usage, usage);

	max_usage = 0;
	for_each_cpu(cpu, policy->cpus)
		max_usage = max(max_usage, per_cpu(gd->usage, cpu));

	if (max_usage < old_usage ||				/* dropped */
	    (max_usage == usage && max_usage != old_usage))	/* raised */
		request_change(max_usage);