Re: [PATCH v4 0/9] Inefficient OPPs

linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Quentin Perret <qperret@google.com>
To: Lukasz Luba <lukasz.luba@arm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Vincent Donnefort <vincent.donnefort@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Linux PM <linux-pm@vger.kernel.org>,
	Ionela Voinescu <ionela.voinescu@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Matthias Kaehlcke <mka@chromium.org>
Subject: Re: [PATCH v4 0/9] Inefficient OPPs
Date: Tue, 10 Aug 2021 15:07:48 +0100	[thread overview]
Message-ID: <YRKINFhDmYqvgxsN@google.com> (raw)
In-Reply-To: <78bc08fe-71c2-398c-9a10-caa54b8bd866@arm.com>

On Tuesday 10 Aug 2021 at 13:28:21 (+0100), Lukasz Luba wrote:
> 
> 
> On 8/10/21 7:13 AM, Viresh Kumar wrote:
> > On 04-08-21, 18:21, Rafael J. Wysocki wrote:
> > > The cpufreq changes are mostly fine by me now, but I would like to
> > > hear from Viresh regarding this.
> > 
> > I have few doubts/concerns here.
> > 
> > Just to iterate it again, the idea here is to choose a higher
> > frequency, which will work in an efficient manner based on energy
> > consumption. So this _only_ works for the case where the caller asked
> > for CPUFREQ_RELATION_L.
> > 
> > - The new flag being added here, CPUFREQ_RELATION_E, doesn't feel
> >    complete in this sense to me. It should rather be called as
> >    CPUFREQ_RELATION_LE instead as it is _always_ used with relation L.
> > 
> > - IMO this has made the caller site a bit confusing now, i.e.  why we
> >    send CPUFREQ_RELATION_E sometimes and CPUFREQ_RELATION_H at other
> >    times. Why shouldn't the _H freq be efficient as well ? I understand
> >    that this was done to not do the efficient stuff in case of
> >    userspace/powersave/performance governors.
> > 
> >    What about reusing following for finding all such cases ?
> > 
> >          policy->governor->flags & CPUFREQ_GOV_DYNAMIC_SWITCHING
> > 
> >    The driver can set a flag to tell if it wants efficient frequencies
> >    or not, and at runtime we apply efficient algorithm only if the
> >    current governor does DVFS, which will leave out
> >    userspace/performance/powersave.
> > 
> > 
> > Now the other thing I didn't like since the beginning, I still don't
> > like it :)
> > 
> > A cpufreq table is provided by the driver, which can have some
> > inefficient frequencies. The inefficient frequencies can be caught by
> > many parts of the kernel, the current way (in this series) is via EM.
> > But this can totally be anything else as well, like a devfreq driver.
> 
> Currently devfreq drivers and governors don't support 'inefficient'
> OPPs idea. They are not even 'utilization' driven yet. I'm not sure
> if that would make sense for their workloads. For now, they are far
> behind the CPUFreq/scheduler development in this field.
> It needs more research and experiments, to even estimate if this brings
> benefits. So, I would just skip devfreq use case...
> 
> > 
> > The way we have tied this whole thing with EM, via
> > cpufreq_read_inefficiencies_from_em(), is what I don't like. We have
> > closely bound the whole thing with one of the ways this can be done
> > and we shouldn't be polluting the cpufreq core with this IMHO. In a
> > sane world, the cpufreq core should just provide an API, like
> > cpufreq_set_freq_invariant() and it can be called by any part of
> > the kernel.
> > 
> > I understand that the current problem is where do we make that call
> > from and I suggested dev_pm_opp_of_register_em() for the same earlier.
> > But that doesn't work as the policy isn't properly initialized by that
> > point.
> 
> True, the policy is not initialized yet when cpufreq_driver::init()
> callback is called, which the current place for scmi-cpufreq.
> 
> What about the other callback cpufreq_driver::ready() ?
> This might solve the two issues I mentioned below.
> 
> > 
> > Now that I see the current implementation, I think we can make it work
> > in a different way.
> > 
> > - Copy what's done for thermal-cooling in cpufreq core, i.e.
> >    CPUFREQ_IS_COOLING_DEV flag, for the EM core as well. So the cpufreq
> >    drivers can set a flag, CPUFREQ_REGISTER_EM, and the cpufreq core
> >    can call dev_pm_opp_of_register_em() on their behalf. This call will
> >    be made from cpufreq_online(), just before/after
> >    cpufreq_thermal_control_enabled() stuff. By this point the policy is
> >    properly initialized and is also updated in
> >    per_cpu(cpufreq_cpu_data).
> > 
> > - Now the cpufreq core can provide an API like
> >    cpufreq_set_freq_invariant(int cpu, unsigned long freq), which can
> >    be called from EM core's implementation of
> >    em_dev_register_perf_domain().
> > 
> 
> I have to point out that there are two issues (which we
> might solve):
> 1. Advanced setup, due to per-CPU performance requests,
> which are not limited to real DVFS domain.
> The scmi-cpufreq (and possibly some other soon) uses more
> tricky EM setup. As you might recall, the construction of temporary
> cpumask is a bit complex, since we want per-CPU policy, but the
> cpumask pass to EM has not a single bit but is 'spanned' with a few
> CPUs.
> 
> 2. The scmi-cpufreq (and IIRC MTK cpufreq driver soon) requires
> custom struct em_data_callback, since the power data is coming from FW.

+1, we really need this to work.

> If there is a need for complex EM registration, maybe we could
> do it in the cpufreq_driver::ready(). The policy would be ready
> during that call, so it might fly.

I remember trying this, but ran into issues as well FWIW. I would need
to check if this is still relevant, but at the time this was introduced
we needed to register the EM _before_ the policy notifier is called with
'CPUFREQ_CREATE_POLICY', because this will trigger a sched domain
rebuild in the arch_topology driver, which allows the scheduler to pick
up the newly introduced EM data.

next prev parent reply	other threads:[~2021-08-10 14:08 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-08 10:08 [PATCH v4 0/9] Inefficient OPPs Vincent Donnefort
2021-07-08 10:08 ` [PATCH v4 1/9] PM / EM: Fix inefficient states detection Vincent Donnefort
2021-07-08 10:08 ` [PATCH v4 2/9] PM / EM: Mark inefficient states Vincent Donnefort
2021-07-22  7:25   ` Lukasz Luba
2021-07-08 10:09 ` [PATCH v4 3/9] PM / EM: Extend em_perf_domain with a flag field Vincent Donnefort
2021-07-22  7:27   ` Lukasz Luba
2021-07-08 10:09 ` [PATCH v4 4/9] PM / EM: Allow skipping inefficient states Vincent Donnefort
2021-07-22 13:09   ` Lukasz Luba
2021-07-08 10:09 ` [PATCH v4 5/9] cpufreq: Add an interface to mark inefficient frequencies Vincent Donnefort
2021-07-08 10:09 ` [PATCH v4 6/9] cpufreq: Add a new freq-table relation CPUFREQ_RELATION_E Vincent Donnefort
2021-07-22  8:17   ` Lukasz Luba
2021-07-08 10:09 ` [PATCH v4 7/9] cpufreq: CPUFREQ_RELATION_E in schedutil ondemand and conservative Vincent Donnefort
2021-07-08 10:09 ` [PATCH v4 8/9] cpufreq: Add driver flag CPUFREQ_READ_ENERGY_MODEL Vincent Donnefort
2021-07-08 10:09 ` [PATCH v4 9/9] cpufreq: dt: Add CPUFREQ_READ_ENERGY_MODEL Vincent Donnefort
2021-08-04 16:21 ` [PATCH v4 0/9] Inefficient OPPs Rafael J. Wysocki
2021-08-10  6:13   ` Viresh Kumar
2021-08-10  7:39     ` Viresh Kumar
2021-08-10 12:28     ` Lukasz Luba
2021-08-10 14:07       ` Quentin Perret [this message]
2021-08-10 14:18         ` Lukasz Luba
2021-08-10 15:12           ` Lukasz Luba
2021-08-10 15:47             ` Quentin Perret
2021-08-11  5:03               ` Viresh Kumar
2021-08-11 11:38                 ` Lukasz Luba
2021-08-16 14:35                   ` Rafael J. Wysocki
2021-08-17  7:09                     ` Viresh Kumar
2021-08-11  4:01       ` Viresh Kumar
2021-08-16 14:19     ` Rafael J. Wysocki
2021-08-17  7:06       ` Viresh Kumar
2021-08-17  9:03         ` Lukasz Luba
2021-08-23 17:06           ` Vincent Donnefort

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YRKINFhDmYqvgxsN@google.com \
    --to=qperret@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=ionela.voinescu@arm.com \
    --cc=linux-pm@vger.kernel.org \
    --cc=lukasz.luba@arm.com \
    --cc=mka@chromium.org \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=vincent.donnefort@arm.com \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).