From: Quentin Perret <qperret@google.com>
To: Lukasz Luba <lukasz.luba@arm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Vincent Donnefort <vincent.donnefort@arm.com>,
Peter Zijlstra <peterz@infradead.org>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
Vincent Guittot <vincent.guittot@linaro.org>,
Linux PM <linux-pm@vger.kernel.org>,
Ionela Voinescu <ionela.voinescu@arm.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Matthias Kaehlcke <mka@chromium.org>
Subject: Re: [PATCH v4 0/9] Inefficient OPPs
Date: Tue, 10 Aug 2021 15:07:48 +0100 [thread overview]
Message-ID: <YRKINFhDmYqvgxsN@google.com> (raw)
In-Reply-To: <78bc08fe-71c2-398c-9a10-caa54b8bd866@arm.com>
On Tuesday 10 Aug 2021 at 13:28:21 (+0100), Lukasz Luba wrote:
>
>
> On 8/10/21 7:13 AM, Viresh Kumar wrote:
> > On 04-08-21, 18:21, Rafael J. Wysocki wrote:
> > > The cpufreq changes are mostly fine by me now, but I would like to
> > > hear from Viresh regarding this.
> >
> > I have few doubts/concerns here.
> >
> > Just to iterate it again, the idea here is to choose a higher
> > frequency, which will work in an efficient manner based on energy
> > consumption. So this _only_ works for the case where the caller asked
> > for CPUFREQ_RELATION_L.
> >
> > - The new flag being added here, CPUFREQ_RELATION_E, doesn't feel
> > complete in this sense to me. It should rather be called as
> > CPUFREQ_RELATION_LE instead as it is _always_ used with relation L.
> >
> > - IMO this has made the caller site a bit confusing now, i.e. why we
> > send CPUFREQ_RELATION_E sometimes and CPUFREQ_RELATION_H at other
> > times. Why shouldn't the _H freq be efficient as well ? I understand
> > that this was done to not do the efficient stuff in case of
> > userspace/powersave/performance governors.
> >
> > What about reusing following for finding all such cases ?
> >
> > policy->governor->flags & CPUFREQ_GOV_DYNAMIC_SWITCHING
> >
> > The driver can set a flag to tell if it wants efficient frequencies
> > or not, and at runtime we apply efficient algorithm only if the
> > current governor does DVFS, which will leave out
> > userspace/performance/powersave.
> >
> >
> > Now the other thing I didn't like since the beginning, I still don't
> > like it :)
> >
> > A cpufreq table is provided by the driver, which can have some
> > inefficient frequencies. The inefficient frequencies can be caught by
> > many parts of the kernel, the current way (in this series) is via EM.
> > But this can totally be anything else as well, like a devfreq driver.
>
> Currently devfreq drivers and governors don't support 'inefficient'
> OPPs idea. They are not even 'utilization' driven yet. I'm not sure
> if that would make sense for their workloads. For now, they are far
> behind the CPUFreq/scheduler development in this field.
> It needs more research and experiments, to even estimate if this brings
> benefits. So, I would just skip devfreq use case...
>
> >
> > The way we have tied this whole thing with EM, via
> > cpufreq_read_inefficiencies_from_em(), is what I don't like. We have
> > closely bound the whole thing with one of the ways this can be done
> > and we shouldn't be polluting the cpufreq core with this IMHO. In a
> > sane world, the cpufreq core should just provide an API, like
> > cpufreq_set_freq_invariant() and it can be called by any part of
> > the kernel.
> >
> > I understand that the current problem is where do we make that call
> > from and I suggested dev_pm_opp_of_register_em() for the same earlier.
> > But that doesn't work as the policy isn't properly initialized by that
> > point.
>
> True, the policy is not initialized yet when cpufreq_driver::init()
> callback is called, which the current place for scmi-cpufreq.
>
> What about the other callback cpufreq_driver::ready() ?
> This might solve the two issues I mentioned below.
>
> >
> > Now that I see the current implementation, I think we can make it work
> > in a different way.
> >
> > - Copy what's done for thermal-cooling in cpufreq core, i.e.
> > CPUFREQ_IS_COOLING_DEV flag, for the EM core as well. So the cpufreq
> > drivers can set a flag, CPUFREQ_REGISTER_EM, and the cpufreq core
> > can call dev_pm_opp_of_register_em() on their behalf. This call will
> > be made from cpufreq_online(), just before/after
> > cpufreq_thermal_control_enabled() stuff. By this point the policy is
> > properly initialized and is also updated in
> > per_cpu(cpufreq_cpu_data).
> >
> > - Now the cpufreq core can provide an API like
> > cpufreq_set_freq_invariant(int cpu, unsigned long freq), which can
> > be called from EM core's implementation of
> > em_dev_register_perf_domain().
> >
>
> I have to point out that there are two issues (which we
> might solve):
> 1. Advanced setup, due to per-CPU performance requests,
> which are not limited to real DVFS domain.
> The scmi-cpufreq (and possibly some other soon) uses more
> tricky EM setup. As you might recall, the construction of temporary
> cpumask is a bit complex, since we want per-CPU policy, but the
> cpumask pass to EM has not a single bit but is 'spanned' with a few
> CPUs.
>
> 2. The scmi-cpufreq (and IIRC MTK cpufreq driver soon) requires
> custom struct em_data_callback, since the power data is coming from FW.
+1, we really need this to work.
> If there is a need for complex EM registration, maybe we could
> do it in the cpufreq_driver::ready(). The policy would be ready
> during that call, so it might fly.
I remember trying this, but ran into issues as well FWIW. I would need
to check if this is still relevant, but at the time this was introduced
we needed to register the EM _before_ the policy notifier is called with
'CPUFREQ_CREATE_POLICY', because this will trigger a sched domain
rebuild in the arch_topology driver, which allows the scheduler to pick
up the newly introduced EM data.
next prev parent reply other threads:[~2021-08-10 14:08 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-08 10:08 [PATCH v4 0/9] Inefficient OPPs Vincent Donnefort
2021-07-08 10:08 ` [PATCH v4 1/9] PM / EM: Fix inefficient states detection Vincent Donnefort
2021-07-08 10:08 ` [PATCH v4 2/9] PM / EM: Mark inefficient states Vincent Donnefort
2021-07-22 7:25 ` Lukasz Luba
2021-07-08 10:09 ` [PATCH v4 3/9] PM / EM: Extend em_perf_domain with a flag field Vincent Donnefort
2021-07-22 7:27 ` Lukasz Luba
2021-07-08 10:09 ` [PATCH v4 4/9] PM / EM: Allow skipping inefficient states Vincent Donnefort
2021-07-22 13:09 ` Lukasz Luba
2021-07-08 10:09 ` [PATCH v4 5/9] cpufreq: Add an interface to mark inefficient frequencies Vincent Donnefort
2021-07-08 10:09 ` [PATCH v4 6/9] cpufreq: Add a new freq-table relation CPUFREQ_RELATION_E Vincent Donnefort
2021-07-22 8:17 ` Lukasz Luba
2021-07-08 10:09 ` [PATCH v4 7/9] cpufreq: CPUFREQ_RELATION_E in schedutil ondemand and conservative Vincent Donnefort
2021-07-08 10:09 ` [PATCH v4 8/9] cpufreq: Add driver flag CPUFREQ_READ_ENERGY_MODEL Vincent Donnefort
2021-07-08 10:09 ` [PATCH v4 9/9] cpufreq: dt: Add CPUFREQ_READ_ENERGY_MODEL Vincent Donnefort
2021-08-04 16:21 ` [PATCH v4 0/9] Inefficient OPPs Rafael J. Wysocki
2021-08-10 6:13 ` Viresh Kumar
2021-08-10 7:39 ` Viresh Kumar
2021-08-10 12:28 ` Lukasz Luba
2021-08-10 14:07 ` Quentin Perret [this message]
2021-08-10 14:18 ` Lukasz Luba
2021-08-10 15:12 ` Lukasz Luba
2021-08-10 15:47 ` Quentin Perret
2021-08-11 5:03 ` Viresh Kumar
2021-08-11 11:38 ` Lukasz Luba
2021-08-16 14:35 ` Rafael J. Wysocki
2021-08-17 7:09 ` Viresh Kumar
2021-08-11 4:01 ` Viresh Kumar
2021-08-16 14:19 ` Rafael J. Wysocki
2021-08-17 7:06 ` Viresh Kumar
2021-08-17 9:03 ` Lukasz Luba
2021-08-23 17:06 ` Vincent Donnefort
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YRKINFhDmYqvgxsN@google.com \
--to=qperret@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=ionela.voinescu@arm.com \
--cc=linux-pm@vger.kernel.org \
--cc=lukasz.luba@arm.com \
--cc=mka@chromium.org \
--cc=peterz@infradead.org \
--cc=rafael@kernel.org \
--cc=rjw@rjwysocki.net \
--cc=vincent.donnefort@arm.com \
--cc=vincent.guittot@linaro.org \
--cc=viresh.kumar@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).