* Re: [PATCH] thermal: intel: int340x_thermal: Remove unnecessary acpi_has_method() uses
From: Rafael J. Wysocki @ 2019-07-18 8:36 UTC (permalink / raw)
To: Kelsey Skunberg
Cc: rui.zhang, edubezval, daniel.lezcano, linux-pm, linux-kernel,
bjorn, skhan, linux-kernel-mentees
In-Reply-To: <20190717192639.90092-1-skunberg.kelsey@gmail.com>
On Wednesday, July 17, 2019 9:26:39 PM CEST Kelsey Skunberg wrote:
> acpi_evaluate_object() will already return in error if the method does not
> exist. Checking if the method is absent before the acpi_evaluate_object()
> call is not needed. Remove acpi_has_method() calls to avoid additional
> work.
>
> Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
LGTM:
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c | 6 ------
> 1 file changed, 6 deletions(-)
>
> diff --git a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> index 9716bc3abaf9..7130e90773ed 100644
> --- a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> +++ b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> @@ -77,9 +77,6 @@ int acpi_parse_trt(acpi_handle handle, int *trt_count, struct trt **trtp,
> struct acpi_buffer element = { 0, NULL };
> struct acpi_buffer trt_format = { sizeof("RRNNNNNN"), "RRNNNNNN" };
>
> - if (!acpi_has_method(handle, "_TRT"))
> - return -ENODEV;
> -
> status = acpi_evaluate_object(handle, "_TRT", NULL, &buffer);
> if (ACPI_FAILURE(status))
> return -ENODEV;
> @@ -158,9 +155,6 @@ int acpi_parse_art(acpi_handle handle, int *art_count, struct art **artp,
> struct acpi_buffer art_format = {
> sizeof("RRNNNNNNNNNNN"), "RRNNNNNNNNNNN" };
>
> - if (!acpi_has_method(handle, "_ART"))
> - return -ENODEV;
> -
> status = acpi_evaluate_object(handle, "_ART", NULL, &buffer);
> if (ACPI_FAILURE(status))
> return -ENODEV;
>
^ permalink raw reply
* Re: [PATCH RESEND] PCI: disable runtime PM for PLX switches
From: Rafael J. Wysocki @ 2019-07-18 8:35 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Alexander Fomichev, Rafael J. Wysocki, Mika Westerberg, Linux PCI,
linux, Linux PM, Logan Gunthorpe
In-Reply-To: <20190717214205.GA61126@google.com>
On Wednesday, July 17, 2019 11:42:05 PM CEST Bjorn Helgaas wrote:
> On Thu, Jun 27, 2019 at 02:06:24PM +0300, Alexander Fomichev wrote:
> > On Wed, Apr 24, 2019 at 11:09:52PM +0200, Rafael J. Wysocki wrote:
> > > On Wed, Apr 24, 2019 at 8:01 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Wed, Apr 24, 2019 at 05:58:19PM +0300, Mika Westerberg wrote:
> > > > > On Wed, Apr 24, 2019 at 09:11:48AM -0500, Bjorn Helgaas wrote:
> > > > > > - Maybe the PCI sysfs accessors (pci_mmap_resource(), etc) should
> > > > > > turn off runtime PM? If we allow mmap of a BAR and then put the
> > > > > > device in D3hot, that seems like a bug that could affect lots of
> > > > > > things. But maybe that's already done magically elsewhere?
> > > > >
> > > > > IIRC there is no PM magic happening for MMIO userspace accesses.
> > > > >
> > > > > What you suggest above sounds like a good way to fix it. We already do
> > > > > similar for config space access from userspace (if the device is in
> > > > > D3cold) so definitely makes sense to do the same for MMIO. However, I
> > > > > don't think we need to disable runtime PM - it should be enough to
> > > > > increase the reference count (pm_runtime_get_sync() and friends) during
> > > > > the time the MMIO resource is mmapped.
> > > >
> > > > OK, so if I understand correctly this would be basically adding
> > > > pm_runtime_get_sync(dev) in pci_mmap_resource(). I don't know what
> > > > the unmap path is, but there would have to be a matching
> > > > pm_runtime_put() somewhere.
> > >
> > > Right.
> > >
> > > > And a similar change in the read/write path for /sys/.../resource<N>;
> > > > I think this must be related to the sysfs_create_bin_file() call in
> > > > pci_create_attr(), but I don't see the path where the actual
> > > > read/write to the device is done.
> > > >
> > > > And probably something similar should be done in pci_resource_io(),
> > > > pci_map_rom(), and pci_unmap_rom().
> > >
> > > In general, every path in which there is a memory or IO address space
> > > access requires pm_runtime_get_sync()/pm_runtime_put() around it as
> > > these accesses are only guaranteed to work in D0.
> >
> > Tested a solution based on proposals by Logan, Bjorn, Mika, Rafael (thanks all
> > of you, guys), I managed to fix the problem inside the PLX driver code. So no
> > additional quirks or other modifications in Linux kernel needed. I think
> > my patch can be easily rejected.
>
> Can you fill us in a little bit on the solution? Are you referring to
> an out-of-tree PLX kernel driver? I assume this is not a userspace
> PLX tool because I don't think we have a solution to make sysfs mmap
> safe yet.
>
> Did you have to call pm_runtime_get() or similar from your driver?
> Did your driver already call some PM interface before that? (If you
> could point us at the source, that would be ideal.)
>
> Rafael, does a PCI driver have to indicate somehow that it's prepared
> for runtime PM?
Yes, it does.
Please see the comment in local_pci_probe().
> I assume the runtime PM core is designed in such a
> way that it doesn't force driver changes (e.g., maybe driver changes
> would enable more power savings, but at least things would *work*
> unchanged).
Right.
^ permalink raw reply
* Re: [PATCH RFC v2 0/2] Use cpu based scaling passive governor for MT8183 CCI
From: Hsin-Yi Wang @ 2019-07-18 7:41 UTC (permalink / raw)
To: moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE
Cc: MyungJoo Ham ), Kyungmin Park, Chanwoo Choi, Matthias Brugger,
Rafael J . Wysocki, Viresh Kumar, linux-pm, linux-mediatek, lkml
In-Reply-To: <20190717061124.453-1-hsinyi@chromium.org>
On Wed, Jul 17, 2019 at 2:12 PM Hsin-Yi Wang <hsinyi@chromium.org> wrote:
> 2. cpu based scaling support to passive_governor from Sibi Sankar
> https://lore.kernel.org/patchwork/patch/1101049/
This series is tested with previous version:
https://patchwork.kernel.org/patch/10875195/
^ permalink raw reply
* [PATCH] Revert "cpufreq: schedutil: Don't set next_freq to UINT_MAX"
From: Doug Smythies @ 2019-07-18 6:26 UTC (permalink / raw)
To: viresh.kumar, rjw, joel; +Cc: linux-kernel, linux-pm, dsmythies
This reverts commit ecd2884291261e3fddbc7651ee11a20d596bb514.
The commit caused a regression whereby reducing the maximum
CPU clock frequency is ineffective while busy, and the CPU
clock remains unchanged. Once the system has experienced
some idle time, the new limit is implemented.
A consequence is that any thermal throttling monitoring
and control based on max freq limits fail to respond
in a timely manor, if at all, to a thermal temperature
trip on a busy system.
Please review the original e-mail threads, as this
attempt at a revertion would then re-open some of
those questions. One possible alterative to this
revertion would be to revert B7EAF1AAB9F8:
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Wed Mar 22 00:08:50 2017 +0100
cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
References:
https://marc.info/?t=152576189300002&r=1&w=2
https://marc.info/?t=152586219000002&r=1&w=2
https://marc.info/?t=152607169800004&r=1&w=2
https://marc.info/?t=149013813900002&r=1&w=2
---
Test Data:
Typical commands:
watch -n 5 cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
watch -n 5 cat /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq
sudo turbostat --quiet --Summary --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 5
watch -n 5 cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq
Driver: intel_cpufreq ; Governor: Schedutil
Kernel 5.2:
Test 1: No thermal throttling involved:
- load the system (some sort of CPU stress test).
- adjust downwards the maximum clock frequency
echo 60 | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
- observe the Bzy_MHz does not change on the turbostat output.
- reduce the system load, such that there is some idle time.
- observe the Bzy_MHz now does change.
- increase the load again.
- observe the CPU frequency is now pinned to the new limit.
Test 2: Use thermald to try Thermal throttling:
- By default thermald will use pstate control first.
On my system a lower thermal trip point it used,
because it doesn't get hot enough.
- enable and start thermald
- load the system (some sort of CPU stress test)
- Once the thermal trip point is reached, observe thermald
adjusting the max_perf_pct but nothing happens in response.
- Once the lowest limit of max_perf_pct is reached, observe
thermald fall through to intel-powerclamp, and kidle inject
starts to be used. Therefore the "busy" condition (see the code)
is not longer met, and the CPU frequency suddenly goes to minimum.
Test 3: Limit thermald to using max_perf_pct only:
- Adjust the cpu cooling device order to just include the one
option, thereby excluding help from the fall through partner
experienced in test 2.
- Observations here tracked test 2 until the lower limit
of max_perf_pct is reached. The CPUs remain at maximum
frequency until the load is removed. Processor
temperature continues to climb.
Driver: intel_cpufreq ; Governor: Schedutil
Kernel 5.2 + this revert patch:
Test 1: No thermal throttling involved:
- load the system (some sort of CPU stress test)
- adjust downwards the maximum clock frequency
echo 60 | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
- observe the Bzy_MHz changes on the turbostat output.
Test 2: Use thermald to try Thermal throttling:
- By default thermald will use pstate control first.
- enable and start thermald
- load the system (some sort of CPU stress test)
- Once the thermal trip point is reached, observe thermald
adjusting the max_perf_pct and observe the actual frequency
adjust in response.
Test 3: Limit thermald to using max_perf_pct only:
- Adjust the cpu cooling device order to just include the one
option, adjusting max_perf_pct.
- Observations here tracked test 2. (test 2 never fell
through to intel_powerclamp and kidle inject.
Driver: acpi-cpufreq ; Governor: Schedutil
Kernel 5.2:
Test 1: No thermal throttling involved:
- load the system (some sort of CPU stress test)
- adjust downwards the maximum clock frequency
for file in /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq; do echo "2200000" > $file; done
- observe the Bzy_MHz does not change on the turbostat output.
- reduce the system load, such that there is some idle time.
- observe the Bzy_MHz now does change.
- increase the load again.
- observe the CPU frequency is now pinned to the new limit.
Test 2: Use thermald to try Thermal throttling:
- By default thermald will use intel_powerclamp control first.
- enable and start thermald
- load the system (some sort of CPU stress test)
- Once the thermal trip point is reached, observe thermald
adjusting the kidle_inj amounts.
- If the maximum kidle_inj is not enough, then oberve thermald
start to limit scaling_max_freq, however the actual frequency
response is immediate. Why? Because of the kidle inject, the
system is not longer fully "busy", and so that condition is not met.
Test 3: Limit thermald to using scaling_max_freq only:
- Adjust the cpu cooling device order to just include the one
option, thereby excluding help from previous partner
experienced in test 2.
- load the system (some sort of CPU stress test)
- Once the thermal trip point is reached, observe thermald
reducing scaling_max_freq.
- observe the Bzy_MHz does not change on the turbostat output.
- Eventually, for all CPUs, scaling_max_freq is set to the minimum.
- Processor temperature continues to climb.
- The CPUs remain at maximum frequency until the load is removed.
Driver: acpi-cpufreq ; Governor: Schedutil
Kernel 5.2 + this revert patch:
Test 1: No thermal throttling involved:
- load the system (some sort of CPU stress test)
- adjust downwards the maximum clock frequency
for file in /sys/devices/system/cpu/cpufreq/policy*/scaling_max_freq; do echo "2200000" > $file; done
- observe the Bzy_MHz changes on the turbostat output.
Test 2: Use thermald to try Thermal throttling:
- By default thermald will use intel_powerclamp control first.
- enable and start thermald
- This test already passes using kernel 5.2, and
proper operation was observed with this revert.
Test 3: Limit thermald to using scaling_max_freq only:
- Adjust the cpu cooling device order to just include the one
option, thereby excluding help from previous partner
experienced in test 2.
- load the system (some sort of CPU stress test)
- Once the thermal trip point is reached, observe thermald
reducing scaling_max_freq.
- observe the Bzy_MHz now does change on the turbostat output
and the processor package temperature is now controlled.
---
kernel/sched/cpufreq_schedutil.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 962cf34..ea4e04b 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -89,8 +89,15 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
!cpufreq_this_cpu_can_update(sg_policy->policy))
return false;
- if (unlikely(sg_policy->need_freq_update))
+ if (unlikely(sg_policy->need_freq_update)) {
+ sg_policy->need_freq_update = false;
+ /*
+ * This happens when limits change, so forget the previous
+ * next_freq value and force an update.
+ */
+ sg_policy->next_freq = UINT_MAX;
return true;
+ }
delta_ns = time - sg_policy->last_freq_update_time;
@@ -168,10 +175,8 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
freq = map_util_freq(util, freq, max);
- if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
+ if (freq == sg_policy->cached_raw_freq && sg_policy->next_freq != UINT_MAX)
return sg_policy->next_freq;
-
- sg_policy->need_freq_update = false;
sg_policy->cached_raw_freq = freq;
return cpufreq_driver_resolve_freq(policy, freq);
}
@@ -457,7 +462,8 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
* Do not reduce the frequency if the CPU has not been idle
* recently, as the reduction is likely to be premature then.
*/
- if (busy && next_f < sg_policy->next_freq) {
+ if (busy && next_f < sg_policy->next_freq &&
+ sg_policy->next_freq != UINT_MAX) {
next_f = sg_policy->next_freq;
/* Reset cached freq as next_freq has changed */
@@ -819,7 +825,7 @@ static int sugov_start(struct cpufreq_policy *policy)
sg_policy->freq_update_delay_ns = sg_policy->tunables->rate_limit_us * NSEC_PER_USEC;
sg_policy->last_freq_update_time = 0;
- sg_policy->next_freq = 0;
+ sg_policy->next_freq = UINT_MAX;
sg_policy->work_in_progress = false;
sg_policy->need_freq_update = false;
sg_policy->cached_raw_freq = 0;
--
2.7.4
^ permalink raw reply related
* Re: [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
From: Viresh Kumar @ 2019-07-18 5:37 UTC (permalink / raw)
To: Saravana Kannan
Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
Sweeney, Sean, daidavid1, Rajendra Nayak, Sibi Sankar,
Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML
In-Reply-To: <CAGETcx-tbjVzRKW8D-564zgNOhrA_z-NC1q5U70bhoUDBhp6VA@mail.gmail.com>
I know you have explained lots of things earlier as well, but they are
available over multiple threads and I don't know where to reply now :)
Lets have proper discussion (once again) here and be done with it.
Sorry for the trouble of explaining things again.
On 17-07-19, 13:34, Saravana Kannan wrote:
> On Wed, Jul 17, 2019 at 3:32 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > On 02-07-19, 18:10, Saravana Kannan wrote:
> > > gpu_cache_opp_table: gpu_cache_opp_table {
> > > compatible = "operating-points-v2";
> > >
> > > gpu_cache_3000: opp-3000 {
> > > opp-peak-KBps = <3000>;
> > > opp-avg-KBps = <1000>;
> > > };
> > > gpu_cache_6000: opp-6000 {
> > > opp-peak-KBps = <6000>;
> > > opp-avg-KBps = <2000>;
> > > };
> > > gpu_cache_9000: opp-9000 {
> > > opp-peak-KBps = <9000>;
> > > opp-avg-KBps = <9000>;
> > > };
> > > };
> > >
> > > gpu_ddr_opp_table: gpu_ddr_opp_table {
> > > compatible = "operating-points-v2";
> > >
> > > gpu_ddr_1525: opp-1525 {
> > > opp-peak-KBps = <1525>;
> > > opp-avg-KBps = <452>;
> > > };
> > > gpu_ddr_3051: opp-3051 {
> > > opp-peak-KBps = <3051>;
> > > opp-avg-KBps = <915>;
> > > };
> > > gpu_ddr_7500: opp-7500 {
> > > opp-peak-KBps = <7500>;
> > > opp-avg-KBps = <3000>;
> > > };
> > > };
> >
> > Who is going to use the above tables and how ?
>
> In this example the GPU driver would use these. It'll go through these
> and then decide what peak and average bw to pick based on whatever
> criteria.
Are you saying that the GPU driver will decide which bandwidth to
choose while running at a particular frequency (say 2 GHz) ? And that
it can choose 1525 or 3051 or 7500 from the ddr path ?
Will it be possible to publicly share how we derive to these decisions
?
The thing is I don't like these separate OPP tables which will not be
used by anyone else, but just GPU (or a single device). I would like
to put this data in the GPU OPP table only. What about putting a
range in the GPU OPP table for the Bandwidth if it can change so much
for the same frequency.
> > These are the maximum
> > BW available over these paths, right ?
>
> I wouldn't call them "maximum" because there can't be multiple
> maximums :) But yes, these are the meaningful bandwidth from the GPU's
> perspective to use over these paths.
>
> >
> > > gpu_opp_table: gpu_opp_table {
> > > compatible = "operating-points-v2";
> > > opp-shared;
> > >
> > > opp-200000000 {
> > > opp-hz = /bits/ 64 <200000000>;
> > > };
> > > opp-400000000 {
> > > opp-hz = /bits/ 64 <400000000>;
> > > };
> > > };
> >
> > Shouldn't this link back to the above tables via required-opp, etc ?
> > How will we know how much BW is required by the GPU device for all the
> > paths ?
>
> If that's what the GPU driver wants to do, then yes. But the GPU
> driver could also choose to scale the bandwidth for these paths based
> on multiple other signals. Eg: bus port busy percentage, measure
> bandwidth, etc.
Lets say that the GPU is running at 2 GHz right now and based on above
inputs it wants to increase the bandwidth to 7500 for ddr path, now
does it make sense to run at 4 GHz instead of 2 so we utilize the
bandwidth to the best of our ability and waste less power ?
If something like that is acceptable, then what about keeping the
bandwidth fixed for frequencies and rather scale the frequency of the
GPU on the inputs your provided (like bus port busy percentage, etc).
The current proposal makes me wonder on why should we try to reuse OPP
tables for providing these bandwidth values as the OPP tables for
interconnect paths isn't really a lot of data, only bandwidth all the
time and there is no linking from the device's OPP table as well.
--
viresh
^ permalink raw reply
* Re: [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
From: Viresh Kumar @ 2019-07-18 4:35 UTC (permalink / raw)
To: Saravana Kannan
Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
Sweeney, Sean, daidavid1, Rajendra Nayak, Sibi Sankar,
Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML
In-Reply-To: <CAGETcx-kUM7MqNYowwNAL1Q0bnFzxPEO6yMg0YTkk16=OnPdmg@mail.gmail.com>
On 17-07-19, 13:29, Saravana Kannan wrote:
> On Wed, Jul 17, 2019 at 12:54 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 02-07-19, 18:10, Saravana Kannan wrote:
> > > Interconnects often quantify their performance points in terms of
> > > bandwidth. So, add opp-peak-KBps (required) and opp-avg-KBps (optional) to
> > > allow specifying Bandwidth OPP tables in DT.
> > >
> > > opp-peak-KBps is a required property that replace opp-hz for Bandwidth OPP
> > > tables.
> > >
> > > opp-avg-KBps is an optional property that can be used in Bandwidth OPP
> > > tables.
> > >
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > ---
> > > Documentation/devicetree/bindings/opp/opp.txt | 15 ++++++++++++---
> > > 1 file changed, 12 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
> > > index 76b6c79604a5..c869e87caa2a 100644
> > > --- a/Documentation/devicetree/bindings/opp/opp.txt
> > > +++ b/Documentation/devicetree/bindings/opp/opp.txt
> > > @@ -83,9 +83,14 @@ properties.
> > >
> > > Required properties:
> > > - opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer. This is a
> > > - required property for all device nodes but devices like power domains. The
> > > - power domain nodes must have another (implementation dependent) property which
> > > - uniquely identifies the OPP nodes.
> > > + required property for all device nodes but for devices like power domains or
> > > + bandwidth opp tables. The power domain nodes must have another (implementation
> > > + dependent) property which uniquely identifies the OPP nodes. The interconnect
> > > + opps are required to have the opp-peak-bw property.
> >
> > ??
>
> Sorry, what's the question? Was this an accidental email?
Too much smartness is too bad sometimes, sorry about that :)
I placed the ?? right below "opp-peak-bw", there is no property like
that. You failed to update it :)
--
viresh
^ permalink raw reply
* Re: [PATCH] cpuidle: Always stop scheduler tick on adaptive-tick CPUs
From: Frederic Weisbecker @ 2019-07-18 0:08 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, Linux PM, Thomas Lindroth, LKML,
Peter Zijlstra
In-Reply-To: <CAJZ5v0icfumJc7E4+LgWpi3+UNpTsH4usAJOg4FEeCBptYYzUQ@mail.gmail.com>
On Thu, Jul 18, 2019 at 12:27:09AM +0200, Rafael J. Wysocki wrote:
> On Wed, Jul 17, 2019 at 3:21 PM Frederic Weisbecker <frederic@kernel.org> wrote:
> >
> > On Wed, Jul 17, 2019 at 09:55:08AM +0200, Rafael J. Wysocki wrote:
> > Well I think we disagree on that assumption that if a nohz_full CPU is put
> > idle, it will remain there indefinitely. Nohz_full CPUs aren't really special
> > in this regard, they can sleep on an IO, wait for a short event just like
> > any other CPU.
>
> Fair enough.
>
> This means that the governor (or rather governors) will need to be
> modified to address the issue reported by Thomas.
>
> Fortunately, I have a patch going in that direction too. :-)
Good to hear, please also Cc me on that one, thanks!
^ permalink raw reply
* Re: [PATCH] cpuidle: Always stop scheduler tick on adaptive-tick CPUs
From: Rafael J. Wysocki @ 2019-07-17 22:27 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, Thomas Lindroth,
LKML, Peter Zijlstra
In-Reply-To: <20190717132115.GB8345@lenoir>
On Wed, Jul 17, 2019 at 3:21 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> On Wed, Jul 17, 2019 at 09:55:08AM +0200, Rafael J. Wysocki wrote:
> > On Tue, Jul 16, 2019 at 11:40 PM Frederic Weisbecker
> > <frederic@kernel.org> wrote:
> > >
> > > On Tue, Jul 16, 2019 at 05:25:10PM +0200, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >
> > > > Running the scheduler tick on idle adaptive-tick CPUs is not useful
> > >
> > > Judging by the below change, you mean full dynticks, right?
> >
> > Right.
> >
> > > > and it may also be not expected by users (as reported by Thomas), so
> > > > add a check to cpuidle_idle_call() to always stop the tick on them
> > > > regardless of the idle duration predicted by the governor.
> > > >
> > > > Fixes: 554c8aa8ecad ("sched: idle: Select idle state before stopping the tick")
> > > > Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
> > > > Tested-by: Thomas Lindroth <thomas.lindroth@gmail.com>
> > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > ---
> > > > kernel/sched/idle.c | 3 ++-
> > > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > Index: linux-pm/kernel/sched/idle.c
> > > > ===================================================================
> > > > --- linux-pm.orig/kernel/sched/idle.c
> > > > +++ linux-pm/kernel/sched/idle.c
> > > > @@ -191,7 +191,8 @@ static void cpuidle_idle_call(void)
> > > > */
> > > > next_state = cpuidle_select(drv, dev, &stop_tick);
> > > >
> > > > - if (stop_tick || tick_nohz_tick_stopped())
> > > > + if (stop_tick || tick_nohz_tick_stopped() ||
> > > > + !housekeeping_cpu(dev->cpu, HK_FLAG_TICK))
> > >
> > > But tick_nohz_tick_stopped() also works on full dynticks CPUs. If the
> > > tick isn't stopped on a full dynticks CPU by the time we reach this path,
> > > it means that the conditions for the tick to be stopped are not met anyway
> > > (eg: more than one task and sched tick is needed, perf event requires the tick,
> > > posix CPU timer, etc...)
> >
> > First of all, according to Thomas, the patch does make a difference,
> > so evidently on his system(s) the full dynticks CPUs enter the idle
> > loop with running tick.
> >
> > This means that, indeed, the conditions for the tick to be stopped
> > have not been met up to that point, but if the (full dynticks) CPU
> > becomes idle, that's because it has been made idle on purpose
> > (presumably by a user-space "orchestrator" or the sysadmin), so the
> > kernel can assume that it will remain idle indefinitely. That, in
> > turn, is when the tick would be stopped on it regardless of everything
> > else (even if it wasn't a full dynticks CPU).
>
> Well I think we disagree on that assumption that if a nohz_full CPU is put
> idle, it will remain there indefinitely. Nohz_full CPUs aren't really special
> in this regard, they can sleep on an IO, wait for a short event just like
> any other CPU.
Fair enough.
This means that the governor (or rather governors) will need to be
modified to address the issue reported by Thomas.
Fortunately, I have a patch going in that direction too. :-)
Cheers!
^ permalink raw reply
* [PATCH v3 1/5] OPP: Allow required-opps even if the device doesn't have power-domains
From: Saravana Kannan @ 2019-07-17 22:23 UTC (permalink / raw)
To: MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Viresh Kumar,
Nishanth Menon, Stephen Boyd, Rafael J. Wysocki
Cc: Saravana Kannan, Sibi Sankar, kernel-team, linux-pm, linux-kernel
In-Reply-To: <20190717222340.137578-1-saravanak@google.com>
A Device-A can have a (minimum) performance requirement on another
Device-B to be able to function correctly. This performance requirement
on Device-B can also change based on the current performance level of
Device-A.
The existing required-opps feature fits well to describe this need. So,
instead of limiting required-opps to point to only PM-domain devices,
allow it to point to any device.
Signed-off-by: Saravana Kannan <saravanak@google.com>
---
drivers/opp/core.c | 2 +-
drivers/opp/of.c | 11 -----------
2 files changed, 1 insertion(+), 12 deletions(-)
diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index c094d5d20fd7..438fcd134d93 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -707,7 +707,7 @@ static int _set_required_opps(struct device *dev,
return 0;
/* Single genpd case */
- if (!genpd_virt_devs) {
+ if (!genpd_virt_devs && required_opp_tables[0]->is_genpd) {
pstate = likely(opp) ? opp->required_opps[0]->pstate : 0;
ret = dev_pm_genpd_set_performance_state(dev, pstate);
if (ret) {
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index b313aca9894f..ff88eaf66b56 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -197,17 +197,6 @@ static void _opp_table_alloc_required_tables(struct opp_table *opp_table,
if (IS_ERR(required_opp_tables[i]))
goto free_required_tables;
-
- /*
- * We only support genpd's OPPs in the "required-opps" for now,
- * as we don't know how much about other cases. Error out if the
- * required OPP doesn't belong to a genpd.
- */
- if (!required_opp_tables[i]->is_genpd) {
- dev_err(dev, "required-opp doesn't belong to genpd: %pOF\n",
- required_np);
- goto free_required_tables;
- }
}
goto put_np;
--
2.22.0.510.g264f2c817a-goog
^ permalink raw reply related
* [PATCH v3 4/5] PM / devfreq: Cache OPP table reference in devfreq
From: Saravana Kannan @ 2019-07-17 22:23 UTC (permalink / raw)
To: MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Viresh Kumar,
Nishanth Menon, Stephen Boyd, Rafael J. Wysocki
Cc: Saravana Kannan, Sibi Sankar, kernel-team, linux-pm, linux-kernel
In-Reply-To: <20190717222340.137578-1-saravanak@google.com>
The OPP table can be used often in devfreq. Trying to get it each time can
be expensive, so cache it in the devfreq struct.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: MyungJoo Ham <myungjoo.ham@samsung.com>
---
drivers/devfreq/devfreq.c | 6 ++++++
include/linux/devfreq.h | 2 ++
2 files changed, 8 insertions(+)
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 784c08e4f931..7984b01d585d 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -594,6 +594,8 @@ static void devfreq_dev_release(struct device *dev)
if (devfreq->profile->exit)
devfreq->profile->exit(devfreq->dev.parent);
+ if (devfreq->opp_table)
+ dev_pm_opp_put_opp_table(devfreq->opp_table);
mutex_destroy(&devfreq->lock);
kfree(devfreq);
}
@@ -674,6 +676,10 @@ struct devfreq *devfreq_add_device(struct device *dev,
devfreq->max_freq = devfreq->scaling_max_freq;
devfreq->suspend_freq = dev_pm_opp_get_suspend_opp_freq(dev);
+ devfreq->opp_table = dev_pm_opp_get_opp_table(dev);
+ if (IS_ERR(devfreq->opp_table))
+ devfreq->opp_table = NULL;
+
atomic_set(&devfreq->suspend_count, 0);
dev_set_name(&devfreq->dev, "devfreq%d",
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index 2bae9ed3c783..1c05129f76c0 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -116,6 +116,7 @@ struct devfreq_dev_profile {
* @profile: device-specific devfreq profile
* @governor: method how to choose frequency based on the usage.
* @governor_name: devfreq governor name for use with this devfreq
+ * @opp_table: Reference to OPP table of dev.parent, if one exists.
* @nb: notifier block used to notify devfreq object that it should
* reevaluate operable frequencies. Devfreq users may use
* devfreq.nb to the corresponding register notifier call chain.
@@ -153,6 +154,7 @@ struct devfreq {
struct devfreq_dev_profile *profile;
const struct devfreq_governor *governor;
char governor_name[DEVFREQ_NAME_LEN];
+ struct opp_table *opp_table;
struct notifier_block nb;
struct delayed_work work;
--
2.22.0.510.g264f2c817a-goog
^ permalink raw reply related
* [PATCH v3 5/5] PM / devfreq: Add required OPPs support to passive governor
From: Saravana Kannan @ 2019-07-17 22:23 UTC (permalink / raw)
To: MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Viresh Kumar,
Nishanth Menon, Stephen Boyd, Rafael J. Wysocki
Cc: Saravana Kannan, Sibi Sankar, kernel-team, linux-pm, linux-kernel
In-Reply-To: <20190717222340.137578-1-saravanak@google.com>
Look at the required OPPs of the "parent" device to determine the OPP that
is required from the slave device managed by the passive governor. This
allows having mappings between a parent device and a slave device even when
they don't have the same number of OPPs.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Acked-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
---
drivers/devfreq/governor_passive.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
index 58308948b863..24ce94c80f06 100644
--- a/drivers/devfreq/governor_passive.c
+++ b/drivers/devfreq/governor_passive.c
@@ -19,7 +19,7 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
= (struct devfreq_passive_data *)devfreq->data;
struct devfreq *parent_devfreq = (struct devfreq *)p_data->parent;
unsigned long child_freq = ULONG_MAX;
- struct dev_pm_opp *opp;
+ struct dev_pm_opp *opp = NULL, *p_opp = NULL;
int i, count, ret = 0;
/*
@@ -56,13 +56,20 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
* list of parent device. Because in this case, *freq is temporary
* value which is decided by ondemand governor.
*/
- opp = devfreq_recommended_opp(parent_devfreq->dev.parent, freq, 0);
- if (IS_ERR(opp)) {
- ret = PTR_ERR(opp);
+ p_opp = devfreq_recommended_opp(parent_devfreq->dev.parent, freq, 0);
+ if (IS_ERR(p_opp)) {
+ ret = PTR_ERR(p_opp);
goto out;
}
- dev_pm_opp_put(opp);
+ if (devfreq->opp_table && parent_devfreq->opp_table)
+ opp = dev_pm_opp_xlate_opp(parent_devfreq->opp_table,
+ devfreq->opp_table, p_opp);
+ if (opp) {
+ *freq = dev_pm_opp_get_freq(opp);
+ dev_pm_opp_put(opp);
+ goto out;
+ }
/*
* Get the OPP table's index of decided freqeuncy by governor
@@ -89,6 +96,9 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
*freq = child_freq;
out:
+ if (!IS_ERR_OR_NULL(opp))
+ dev_pm_opp_put(p_opp);
+
return ret;
}
--
2.22.0.510.g264f2c817a-goog
^ permalink raw reply related
* [PATCH v3 2/5] OPP: Add function to look up required OPP's for a given OPP
From: Saravana Kannan @ 2019-07-17 22:23 UTC (permalink / raw)
To: MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Viresh Kumar,
Nishanth Menon, Stephen Boyd, Rafael J. Wysocki
Cc: Saravana Kannan, Sibi Sankar, kernel-team, linux-pm, linux-kernel
In-Reply-To: <20190717222340.137578-1-saravanak@google.com>
Add a function that allows looking up required OPPs given a source OPP
table, destination OPP table and the source OPP.
Signed-off-by: Saravana Kannan <saravanak@google.com>
---
drivers/opp/core.c | 54 ++++++++++++++++++++++++++++++++++++++++++
include/linux/pm_opp.h | 11 +++++++++
2 files changed, 65 insertions(+)
diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 438fcd134d93..72c055a3f6b7 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -1883,6 +1883,60 @@ void dev_pm_opp_detach_genpd(struct opp_table *opp_table)
}
EXPORT_SYMBOL_GPL(dev_pm_opp_detach_genpd);
+/**
+ * dev_pm_opp_xlate_opp() - Find required OPP for src_table OPP.
+ * @src_table: OPP table which has dst_table as one of its required OPP table.
+ * @dst_table: Required OPP table of the src_table.
+ * @pstate: OPP of the src_table.
+ *
+ * This function returns the OPP (present in @dst_table) pointed out by the
+ * "required-opps" property of the OPP (present in @src_table).
+ *
+ * The callers are required to call dev_pm_opp_put() for the returned OPP after
+ * use.
+ *
+ * Return: destination table OPP on success, otherwise NULL on errors.
+ */
+struct dev_pm_opp *dev_pm_opp_xlate_opp(struct opp_table *src_table,
+ struct opp_table *dst_table,
+ struct dev_pm_opp *src_opp)
+{
+ struct dev_pm_opp *opp, *dest_opp = NULL;
+ int i;
+
+ if (!src_table || !dst_table || !src_opp)
+ return NULL;
+
+ for (i = 0; i < src_table->required_opp_count; i++) {
+ if (src_table->required_opp_tables[i]->np == dst_table->np)
+ break;
+ }
+
+ if (unlikely(i == src_table->required_opp_count)) {
+ pr_err("%s: Couldn't find matching OPP table (%p: %p)\n",
+ __func__, src_table, dst_table);
+ return NULL;
+ }
+
+ mutex_lock(&src_table->lock);
+
+ list_for_each_entry(opp, &src_table->opp_list, node) {
+ if (opp == src_opp) {
+ dest_opp = opp->required_opps[i];
+ dev_pm_opp_get(dest_opp);
+ goto unlock;
+ }
+ }
+
+ pr_err("%s: Couldn't find matching OPP (%p: %p)\n", __func__, src_table,
+ dst_table);
+
+unlock:
+ mutex_unlock(&src_table->lock);
+
+ return dest_opp;
+}
+
/**
* dev_pm_opp_xlate_performance_state() - Find required OPP's pstate for src_table.
* @src_table: OPP table which has dst_table as one of its required OPP table.
diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
index af5021f27cb7..36f52b9cf24a 100644
--- a/include/linux/pm_opp.h
+++ b/include/linux/pm_opp.h
@@ -131,6 +131,9 @@ void dev_pm_opp_unregister_set_opp_helper(struct opp_table *opp_table);
struct opp_table *dev_pm_opp_attach_genpd(struct device *dev, const char **names);
void dev_pm_opp_detach_genpd(struct opp_table *opp_table);
int dev_pm_opp_xlate_performance_state(struct opp_table *src_table, struct opp_table *dst_table, unsigned int pstate);
+struct dev_pm_opp *dev_pm_opp_xlate_opp(struct opp_table *src_table,
+ struct opp_table *dst_table,
+ struct dev_pm_opp *src_opp);
int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq);
int dev_pm_opp_set_sharing_cpus(struct device *cpu_dev, const struct cpumask *cpumask);
int dev_pm_opp_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask);
@@ -304,6 +307,14 @@ static inline int dev_pm_opp_xlate_performance_state(struct opp_table *src_table
return -ENOTSUPP;
}
+static inline struct dev_pm_opp *dev_pm_opp_xlate_opp(
+ struct opp_table *src_table,
+ struct opp_table *dst_table,
+ struct dev_pm_opp *src_opp)
+{
+ return NULL;
+}
+
static inline int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
{
return -ENOTSUPP;
--
2.22.0.510.g264f2c817a-goog
^ permalink raw reply related
* [PATCH v3 3/5] OPP: Improve require-opps linking
From: Saravana Kannan @ 2019-07-17 22:23 UTC (permalink / raw)
To: MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Viresh Kumar,
Nishanth Menon, Stephen Boyd, Rafael J. Wysocki
Cc: Saravana Kannan, Sibi Sankar, kernel-team, linux-pm, linux-kernel
In-Reply-To: <20190717222340.137578-1-saravanak@google.com>
Currently, the linking of required-opps fails silently if the
destination OPP table hasn't been added before the source OPP table is
added. This puts an unnecessary requirement that the destination table
be added before the source table is added.
In reality, the destination table is needed only when we try to
translate from source OPP to destination OPP. So, instead of
completely failing, retry linking the tables when the translation is
attempted.
Signed-off-by: Saravana Kannan <saravanak@google.com>
---
drivers/opp/core.c | 32 +++++++++++-----
drivers/opp/of.c | 91 ++++++++++++++++++++++------------------------
drivers/opp/opp.h | 5 +++
3 files changed, 71 insertions(+), 57 deletions(-)
diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 72c055a3f6b7..cafe6ec05d6c 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -706,8 +706,11 @@ static int _set_required_opps(struct device *dev,
if (!required_opp_tables)
return 0;
+ _of_lazy_link_required_tables(opp_table);
+
/* Single genpd case */
- if (!genpd_virt_devs && required_opp_tables[0]->is_genpd) {
+ if (!genpd_virt_devs && required_opp_tables[0]
+ && required_opp_tables[0]->is_genpd) {
pstate = likely(opp) ? opp->required_opps[0]->pstate : 0;
ret = dev_pm_genpd_set_performance_state(dev, pstate);
if (ret) {
@@ -726,11 +729,16 @@ static int _set_required_opps(struct device *dev,
mutex_lock(&opp_table->genpd_virt_dev_lock);
for (i = 0; i < opp_table->required_opp_count; i++) {
- pstate = likely(opp) ? opp->required_opps[i]->pstate : 0;
-
if (!genpd_virt_devs[i])
continue;
+ if (!opp->required_opps[i]) {
+ ret = -ENODEV;
+ break;
+ }
+
+ pstate = likely(opp) ? opp->required_opps[i]->pstate : 0;
+
ret = dev_pm_genpd_set_performance_state(genpd_virt_devs[i], pstate);
if (ret) {
dev_err(dev, "Failed to set performance rate of %s: %d (%d)\n",
@@ -1907,8 +1915,11 @@ struct dev_pm_opp *dev_pm_opp_xlate_opp(struct opp_table *src_table,
if (!src_table || !dst_table || !src_opp)
return NULL;
+ _of_lazy_link_required_tables(src_table);
+
for (i = 0; i < src_table->required_opp_count; i++) {
- if (src_table->required_opp_tables[i]->np == dst_table->np)
+ if (src_table->required_opp_tables[i]
+ && src_table->required_opp_tables[i]->np == dst_table->np)
break;
}
@@ -1971,6 +1982,8 @@ int dev_pm_opp_xlate_performance_state(struct opp_table *src_table,
if (!src_table->required_opp_count)
return pstate;
+ _of_lazy_link_required_tables(src_table);
+
for (i = 0; i < src_table->required_opp_count; i++) {
if (src_table->required_opp_tables[i]->np == dst_table->np)
break;
@@ -1986,15 +1999,16 @@ int dev_pm_opp_xlate_performance_state(struct opp_table *src_table,
list_for_each_entry(opp, &src_table->opp_list, node) {
if (opp->pstate == pstate) {
- dest_pstate = opp->required_opps[i]->pstate;
- goto unlock;
+ if (opp->required_opps[i])
+ dest_pstate = opp->required_opps[i]->pstate;
+ break;
}
}
- pr_err("%s: Couldn't find matching OPP (%p: %p)\n", __func__, src_table,
- dst_table);
+ if (dest_pstate < 0)
+ pr_err("%s: Couldn't find matching OPP (%p: %p)\n", __func__,
+ src_table, dst_table);
-unlock:
mutex_unlock(&src_table->lock);
return dest_pstate;
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index ff88eaf66b56..4e527245fd59 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -145,7 +145,7 @@ static void _opp_table_free_required_tables(struct opp_table *opp_table)
for (i = 0; i < opp_table->required_opp_count; i++) {
if (IS_ERR_OR_NULL(required_opp_tables[i]))
- break;
+ continue;
dev_pm_opp_put_opp_table(required_opp_tables[i]);
}
@@ -165,8 +165,8 @@ static void _opp_table_alloc_required_tables(struct opp_table *opp_table,
struct device_node *opp_np)
{
struct opp_table **required_opp_tables;
- struct device_node *required_np, *np;
- int count, i;
+ struct device_node *np;
+ int count;
/* Traversing the first OPP node is all we need */
np = of_get_next_available_child(opp_np, NULL);
@@ -176,35 +176,57 @@ static void _opp_table_alloc_required_tables(struct opp_table *opp_table,
}
count = of_count_phandle_with_args(np, "required-opps", NULL);
+ of_node_put(np);
if (!count)
- goto put_np;
+ return;
required_opp_tables = kcalloc(count, sizeof(*required_opp_tables),
GFP_KERNEL);
if (!required_opp_tables)
- goto put_np;
+ return;
opp_table->required_opp_tables = required_opp_tables;
opp_table->required_opp_count = count;
+}
- for (i = 0; i < count; i++) {
- required_np = of_parse_required_opp(np, i);
- if (!required_np)
- goto free_required_tables;
+void _of_lazy_link_required_tables(struct opp_table *src)
+{
+ struct dev_pm_opp *src_opp, *tmp_opp;
+ struct opp_table *req_table;
+ struct device_node *req_np;
+ int i;
- required_opp_tables[i] = _find_table_of_opp_np(required_np);
- of_node_put(required_np);
+ mutex_lock(&src->lock);
- if (IS_ERR(required_opp_tables[i]))
- goto free_required_tables;
- }
+ if (list_empty(&src->opp_list))
+ goto out;
- goto put_np;
+ src_opp = list_first_entry(&src->opp_list, struct dev_pm_opp, node);
-free_required_tables:
- _opp_table_free_required_tables(opp_table);
-put_np:
- of_node_put(np);
+ for (i = 0; i < src->required_opp_count; i++) {
+ if (src->required_opp_tables[i])
+ continue;
+
+ req_np = of_parse_required_opp(src_opp->np, i);
+ if (!req_np)
+ continue;
+
+ req_table = _find_table_of_opp_np(req_np);
+ of_node_put(req_np);
+ if (!req_table)
+ continue;
+
+ src->required_opp_tables[i] = req_table;
+ list_for_each_entry(tmp_opp, &src->opp_list, node) {
+ req_np = of_parse_required_opp(tmp_opp->np, i);
+ tmp_opp->required_opps[i] = _find_opp_of_np(req_table,
+ req_np);
+ of_node_put(req_np);
+ }
+ }
+
+out:
+ mutex_unlock(&src->lock);
}
void _of_init_opp_table(struct opp_table *opp_table, struct device *dev,
@@ -267,7 +289,7 @@ void _of_opp_free_required_opps(struct opp_table *opp_table,
for (i = 0; i < opp_table->required_opp_count; i++) {
if (!required_opps[i])
- break;
+ continue;
/* Put the reference back */
dev_pm_opp_put(required_opps[i]);
@@ -282,9 +304,7 @@ static int _of_opp_alloc_required_opps(struct opp_table *opp_table,
struct dev_pm_opp *opp)
{
struct dev_pm_opp **required_opps;
- struct opp_table *required_table;
- struct device_node *np;
- int i, ret, count = opp_table->required_opp_count;
+ int count = opp_table->required_opp_count;
if (!count)
return 0;
@@ -295,32 +315,7 @@ static int _of_opp_alloc_required_opps(struct opp_table *opp_table,
opp->required_opps = required_opps;
- for (i = 0; i < count; i++) {
- required_table = opp_table->required_opp_tables[i];
-
- np = of_parse_required_opp(opp->np, i);
- if (unlikely(!np)) {
- ret = -ENODEV;
- goto free_required_opps;
- }
-
- required_opps[i] = _find_opp_of_np(required_table, np);
- of_node_put(np);
-
- if (!required_opps[i]) {
- pr_err("%s: Unable to find required OPP node: %pOF (%d)\n",
- __func__, opp->np, i);
- ret = -ENODEV;
- goto free_required_opps;
- }
- }
-
return 0;
-
-free_required_opps:
- _of_opp_free_required_opps(opp_table, opp);
-
- return ret;
}
static bool _opp_is_supported(struct device *dev, struct opp_table *opp_table,
diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
index 01a500e2c40a..5b074eb7da07 100644
--- a/drivers/opp/opp.h
+++ b/drivers/opp/opp.h
@@ -225,12 +225,17 @@ void _of_clear_opp_table(struct opp_table *opp_table);
struct opp_table *_managed_opp(struct device *dev, int index);
void _of_opp_free_required_opps(struct opp_table *opp_table,
struct dev_pm_opp *opp);
+void _of_lazy_link_required_tables(struct opp_table *src);
#else
static inline void _of_init_opp_table(struct opp_table *opp_table, struct device *dev, int index) {}
static inline void _of_clear_opp_table(struct opp_table *opp_table) {}
static inline struct opp_table *_managed_opp(struct device *dev, int index) { return NULL; }
static inline void _of_opp_free_required_opps(struct opp_table *opp_table,
struct dev_pm_opp *opp) {}
+void bool _of_lazy_link_required_tables(struct opp_table *src)
+{
+ return false;
+}
#endif
#ifdef CONFIG_DEBUG_FS
--
2.22.0.510.g264f2c817a-goog
^ permalink raw reply related
* [PATCH v3 0/5] Add required-opps support to devfreq passive gov
From: Saravana Kannan @ 2019-07-17 22:23 UTC (permalink / raw)
To: MyungJoo Ham, Kyungmin Park, Chanwoo Choi, Viresh Kumar,
Nishanth Menon, Stephen Boyd, Rafael J. Wysocki
Cc: Saravana Kannan, Sibi Sankar, kernel-team, linux-pm, linux-kernel
The devfreq passive governor scales the frequency of a "child" device based
on the current frequency of a "parent" device (not parent/child in the
sense of device hierarchy). As of today, the passive governor requires one
of the following to work correctly:
1. The parent and child device have the same number of frequencies
2. The child device driver passes a mapping function to translate from
parent frequency to child frequency.
When (1) is not true, (2) is the only option right now. But often times,
all that is required is a simple mapping from parent's frequency to child's
frequency.
Since OPPs already support pointing to other "required-opps", add support
for using that to map from parent device frequency to child device
frequency. That way, every child device driver doesn't have to implement a
separate mapping function anytime (1) isn't true.
Some common (but not comprehensive) reason for needing a devfreq passive
governor to adjust the frequency of one device based on another are:
1. These were the combination of frequencies that were validated/screened
during the manufacturing process.
2. These are the sensible performance combinations between two devices
interacting with each other. So that when one runs fast the other
doesn't become the bottleneck.
3. Hardware bugs requiring some kind of frequency ratio between devices.
For example, the following mapping can't be captured in DT as it stands
today because the parent and child device have different number of OPPs.
But with this patch series, this mapping can be captured cleanly.
In arch/arm64/boot/dts/exynos/exynos5433-bus.dtsi you have something
like this with the following changes:
bus_g2d_400: bus0 {
compatible = "samsung,exynos-bus";
clocks = <&cmu_top CLK_ACLK_G2D_400>;
clock-names = "bus";
operating-points-v2 = <&bus_g2d_400_opp_table>;
status = "disabled";
};
bus_noc2: bus9 {
compatible = "samsung,exynos-bus";
clocks = <&cmu_mif CLK_ACLK_BUS2_400>;
clock-names = "bus";
operating-points-v2 = <&bus_noc2_opp_table>;
status = "disabled";
};
bus_g2d_400_opp_table: opp_table2 {
compatible = "operating-points-v2";
opp-shared;
opp-400000000 {
opp-hz = /bits/ 64 <400000000>;
opp-microvolt = <1075000>;
required-opps = <&noc2_400>;
};
opp-267000000 {
opp-hz = /bits/ 64 <267000000>;
opp-microvolt = <1000000>;
required-opps = <&noc2_200>;
};
opp-200000000 {
opp-hz = /bits/ 64 <200000000>;
opp-microvolt = <975000>;
required-opps = <&noc2_200>;
};
opp-160000000 {
opp-hz = /bits/ 64 <160000000>;
opp-microvolt = <962500>;
required-opps = <&noc2_134>;
};
opp-134000000 {
opp-hz = /bits/ 64 <134000000>;
opp-microvolt = <950000>;
required-opps = <&noc2_134>;
};
opp-100000000 {
opp-hz = /bits/ 64 <100000000>;
opp-microvolt = <937500>;
required-opps = <&noc2_100>;
};
};
bus_noc2_opp_table: opp_table6 {
compatible = "operating-points-v2";
noc2_400: opp-400000000 {
opp-hz = /bits/ 64 <400000000>;
};
noc2_200: opp-200000000 {
opp-hz = /bits/ 64 <200000000>;
};
noc2_134: opp-134000000 {
opp-hz = /bits/ 64 <134000000>;
};
noc2_100: opp-100000000 {
opp-hz = /bits/ 64 <100000000>;
};
};
-Saravana
v2 -> v3:
- Rebased onto linux-next.
- Added documentation comment for new fields.
- Added support for lazy required-opps linking.
- Updated Ack/Reviewed-bys.
v1 -> v2:
- Cached OPP table reference in devfreq to avoid looking up every time.
- Renamed variable in passive governor to be more intuitive.
- Updated cover letter with examples.
Saravana Kannan (5):
OPP: Allow required-opps even if the device doesn't have power-domains
OPP: Add function to look up required OPP's for a given OPP
OPP: Improve require-opps linking
PM / devfreq: Cache OPP table reference in devfreq
PM / devfreq: Add required OPPs support to passive governor
drivers/devfreq/devfreq.c | 6 ++
drivers/devfreq/governor_passive.c | 20 ++++--
drivers/opp/core.c | 84 ++++++++++++++++++++++---
drivers/opp/of.c | 98 +++++++++++++-----------------
drivers/opp/opp.h | 5 ++
include/linux/devfreq.h | 2 +
include/linux/pm_opp.h | 11 ++++
7 files changed, 156 insertions(+), 70 deletions(-)
--
2.22.0.510.g264f2c817a-goog
^ permalink raw reply
* Re: [PATCH RESEND] PCI: disable runtime PM for PLX switches
From: Bjorn Helgaas @ 2019-07-17 21:42 UTC (permalink / raw)
To: Alexander Fomichev
Cc: Rafael J. Wysocki, Mika Westerberg, Linux PCI, linux,
Rafael J. Wysocki, Linux PM, Logan Gunthorpe
In-Reply-To: <20190627110624.nxwloyphithj4rmt@yadro.com>
On Thu, Jun 27, 2019 at 02:06:24PM +0300, Alexander Fomichev wrote:
> On Wed, Apr 24, 2019 at 11:09:52PM +0200, Rafael J. Wysocki wrote:
> > On Wed, Apr 24, 2019 at 8:01 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Wed, Apr 24, 2019 at 05:58:19PM +0300, Mika Westerberg wrote:
> > > > On Wed, Apr 24, 2019 at 09:11:48AM -0500, Bjorn Helgaas wrote:
> > > > > - Maybe the PCI sysfs accessors (pci_mmap_resource(), etc) should
> > > > > turn off runtime PM? If we allow mmap of a BAR and then put the
> > > > > device in D3hot, that seems like a bug that could affect lots of
> > > > > things. But maybe that's already done magically elsewhere?
> > > >
> > > > IIRC there is no PM magic happening for MMIO userspace accesses.
> > > >
> > > > What you suggest above sounds like a good way to fix it. We already do
> > > > similar for config space access from userspace (if the device is in
> > > > D3cold) so definitely makes sense to do the same for MMIO. However, I
> > > > don't think we need to disable runtime PM - it should be enough to
> > > > increase the reference count (pm_runtime_get_sync() and friends) during
> > > > the time the MMIO resource is mmapped.
> > >
> > > OK, so if I understand correctly this would be basically adding
> > > pm_runtime_get_sync(dev) in pci_mmap_resource(). I don't know what
> > > the unmap path is, but there would have to be a matching
> > > pm_runtime_put() somewhere.
> >
> > Right.
> >
> > > And a similar change in the read/write path for /sys/.../resource<N>;
> > > I think this must be related to the sysfs_create_bin_file() call in
> > > pci_create_attr(), but I don't see the path where the actual
> > > read/write to the device is done.
> > >
> > > And probably something similar should be done in pci_resource_io(),
> > > pci_map_rom(), and pci_unmap_rom().
> >
> > In general, every path in which there is a memory or IO address space
> > access requires pm_runtime_get_sync()/pm_runtime_put() around it as
> > these accesses are only guaranteed to work in D0.
>
> Tested a solution based on proposals by Logan, Bjorn, Mika, Rafael (thanks all
> of you, guys), I managed to fix the problem inside the PLX driver code. So no
> additional quirks or other modifications in Linux kernel needed. I think
> my patch can be easily rejected.
Can you fill us in a little bit on the solution? Are you referring to
an out-of-tree PLX kernel driver? I assume this is not a userspace
PLX tool because I don't think we have a solution to make sysfs mmap
safe yet.
Did you have to call pm_runtime_get() or similar from your driver?
Did your driver already call some PM interface before that? (If you
could point us at the source, that would be ideal.)
Rafael, does a PCI driver have to indicate somehow that it's prepared
for runtime PM? I assume the runtime PM core is designed in such a
way that it doesn't force driver changes (e.g., maybe driver changes
would enable more power savings, but at least things would *work*
unchanged).
Bjorn
^ permalink raw reply
* Re: [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
From: Saravana Kannan @ 2019-07-17 20:34 UTC (permalink / raw)
To: Viresh Kumar
Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
Sweeney, Sean, daidavid1, Rajendra Nayak, Sibi Sankar,
Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML
In-Reply-To: <20190717103220.f7cys267hq23fbsb@vireshk-i7>
On Wed, Jul 17, 2019 at 3:32 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 02-07-19, 18:10, Saravana Kannan wrote:
> > Interconnects and interconnect paths quantify their performance levels in
> > terms of bandwidth and not in terms of frequency. So similar to how we have
> > frequency based OPP tables in DT and in the OPP framework, we need
> > bandwidth OPP table support in the OPP framework and in DT. Since there can
> > be more than one interconnect path used by a device, we also need a way to
> > assign a bandwidth OPP table to an interconnect path.
> >
> > This patch series:
> > - Adds opp-peak-KBps and opp-avg-KBps properties to OPP DT bindings
> > - Adds interconnect-opp-table property to interconnect DT bindings
> > - Adds OPP helper functions for bandwidth OPP tables
> > - Adds icc_get_opp_table() to get the OPP table for an interconnect path
> >
> > So with the DT bindings added in this patch series, the DT for a GPU
> > that does bandwidth voting from GPU to Cache and GPU to DDR would look
> > something like this:
> >
> > gpu_cache_opp_table: gpu_cache_opp_table {
> > compatible = "operating-points-v2";
> >
> > gpu_cache_3000: opp-3000 {
> > opp-peak-KBps = <3000>;
> > opp-avg-KBps = <1000>;
> > };
> > gpu_cache_6000: opp-6000 {
> > opp-peak-KBps = <6000>;
> > opp-avg-KBps = <2000>;
> > };
> > gpu_cache_9000: opp-9000 {
> > opp-peak-KBps = <9000>;
> > opp-avg-KBps = <9000>;
> > };
> > };
> >
> > gpu_ddr_opp_table: gpu_ddr_opp_table {
> > compatible = "operating-points-v2";
> >
> > gpu_ddr_1525: opp-1525 {
> > opp-peak-KBps = <1525>;
> > opp-avg-KBps = <452>;
> > };
> > gpu_ddr_3051: opp-3051 {
> > opp-peak-KBps = <3051>;
> > opp-avg-KBps = <915>;
> > };
> > gpu_ddr_7500: opp-7500 {
> > opp-peak-KBps = <7500>;
> > opp-avg-KBps = <3000>;
> > };
> > };
>
> Who is going to use the above tables and how ?
In this example the GPU driver would use these. It'll go through these
and then decide what peak and average bw to pick based on whatever
criteria.
> These are the maximum
> BW available over these paths, right ?
I wouldn't call them "maximum" because there can't be multiple
maximums :) But yes, these are the meaningful bandwidth from the GPU's
perspective to use over these paths.
>
> > gpu_opp_table: gpu_opp_table {
> > compatible = "operating-points-v2";
> > opp-shared;
> >
> > opp-200000000 {
> > opp-hz = /bits/ 64 <200000000>;
> > };
> > opp-400000000 {
> > opp-hz = /bits/ 64 <400000000>;
> > };
> > };
>
> Shouldn't this link back to the above tables via required-opp, etc ?
> How will we know how much BW is required by the GPU device for all the
> paths ?
If that's what the GPU driver wants to do, then yes. But the GPU
driver could also choose to scale the bandwidth for these paths based
on multiple other signals. Eg: bus port busy percentage, measure
bandwidth, etc.
Or they might even decide to pick the BW OPP that matches the index of
their GPU OPP. It's up to them how they actually do the heuristic.
-Saravana
^ permalink raw reply
* Re: [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
From: Saravana Kannan @ 2019-07-17 20:29 UTC (permalink / raw)
To: Viresh Kumar
Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
Sweeney, Sean, daidavid1, Rajendra Nayak, Sibi Sankar,
Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML
In-Reply-To: <20190717075448.xlyg2ddewlci3abg@vireshk-i7>
On Wed, Jul 17, 2019 at 12:54 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 02-07-19, 18:10, Saravana Kannan wrote:
> > Interconnects often quantify their performance points in terms of
> > bandwidth. So, add opp-peak-KBps (required) and opp-avg-KBps (optional) to
> > allow specifying Bandwidth OPP tables in DT.
> >
> > opp-peak-KBps is a required property that replace opp-hz for Bandwidth OPP
> > tables.
> >
> > opp-avg-KBps is an optional property that can be used in Bandwidth OPP
> > tables.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> > Documentation/devicetree/bindings/opp/opp.txt | 15 ++++++++++++---
> > 1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
> > index 76b6c79604a5..c869e87caa2a 100644
> > --- a/Documentation/devicetree/bindings/opp/opp.txt
> > +++ b/Documentation/devicetree/bindings/opp/opp.txt
> > @@ -83,9 +83,14 @@ properties.
> >
> > Required properties:
> > - opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer. This is a
> > - required property for all device nodes but devices like power domains. The
> > - power domain nodes must have another (implementation dependent) property which
> > - uniquely identifies the OPP nodes.
> > + required property for all device nodes but for devices like power domains or
> > + bandwidth opp tables. The power domain nodes must have another (implementation
> > + dependent) property which uniquely identifies the OPP nodes. The interconnect
> > + opps are required to have the opp-peak-bw property.
>
> ??
Sorry, what's the question? Was this an accidental email?
-Saravana
>
> > +
> > +- opp-peak-KBps: Peak bandwidth in kilobytes per second, expressed as a 32-bit
> > + big-endian integer. This is a required property for all devices that don't
> > + have opp-hz. For example, bandwidth OPP tables for interconnect paths.
> >
> > Optional properties:
> > - opp-microvolt: voltage in micro Volts.
> > @@ -132,6 +137,10 @@ Optional properties:
> > - opp-level: A value representing the performance level of the device,
> > expressed as a 32-bit integer.
> >
> > +- opp-avg-KBps: Average bandwidth in kilobytes per second, expressed as a
> > + 32-bit big-endian integer. This property is only meaningful in OPP tables
> > + where opp-peak-KBps is present.
> > +
> > - clock-latency-ns: Specifies the maximum possible transition latency (in
> > nanoseconds) for switching to this OPP from any other OPP.
> >
> > --
> > 2.22.0.410.gd8fdbe21b5-goog
>
> --
> viresh
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
^ permalink raw reply
* [PATCH] thermal: intel: int340x_thermal: Remove unnecessary acpi_has_method() uses
From: Kelsey Skunberg @ 2019-07-17 19:26 UTC (permalink / raw)
To: rui.zhang, edubezval, daniel.lezcano, linux-pm, linux-kernel
Cc: skunberg.kelsey, bjorn, skhan, linux-kernel-mentees, rjw
acpi_evaluate_object() will already return in error if the method does not
exist. Checking if the method is absent before the acpi_evaluate_object()
call is not needed. Remove acpi_has_method() calls to avoid additional
work.
Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
---
drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
index 9716bc3abaf9..7130e90773ed 100644
--- a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
+++ b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
@@ -77,9 +77,6 @@ int acpi_parse_trt(acpi_handle handle, int *trt_count, struct trt **trtp,
struct acpi_buffer element = { 0, NULL };
struct acpi_buffer trt_format = { sizeof("RRNNNNNN"), "RRNNNNNN" };
- if (!acpi_has_method(handle, "_TRT"))
- return -ENODEV;
-
status = acpi_evaluate_object(handle, "_TRT", NULL, &buffer);
if (ACPI_FAILURE(status))
return -ENODEV;
@@ -158,9 +155,6 @@ int acpi_parse_art(acpi_handle handle, int *art_count, struct art **artp,
struct acpi_buffer art_format = {
sizeof("RRNNNNNNNNNNN"), "RRNNNNNNNNNNN" };
- if (!acpi_has_method(handle, "_ART"))
- return -ENODEV;
-
status = acpi_evaluate_object(handle, "_ART", NULL, &buffer);
if (ACPI_FAILURE(status))
return -ENODEV;
--
2.20.1
^ permalink raw reply related
* Re: [PATCH 17/18] arm64: dts: Convert to the hierarchical CPU topology layout for MSM8916
From: Sudeep Holla @ 2019-07-17 17:18 UTC (permalink / raw)
To: Lina Iyer
Cc: Ulf Hansson, Lorenzo Pieralisi, Mark Rutland, linux-arm-kernel,
Rafael J . Wysocki, Daniel Lezcano, Raju P . L . S . S . S . N,
Amit Kucheria, Bjorn Andersson, Stephen Boyd, Niklas Cassel,
Tony Lindgren, Kevin Hilman, Viresh Kumar, Vincent Guittot,
Geert Uytterhoeven, Souvik Chakravarty, linux-pm, linux-arm-msm,
linux-kernel, Lina Iyer, Andy Gross, David Brown
In-Reply-To: <20190716203631.GC25567@codeaurora.org>
On Tue, Jul 16, 2019 at 02:36:31PM -0600, Lina Iyer wrote:
> On Tue, Jul 16 2019 at 08:47 -0600, Sudeep Holla wrote:
> > On Mon, May 13, 2019 at 09:22:59PM +0200, Ulf Hansson wrote:
> > > From: Lina Iyer <lina.iyer@linaro.org>
> > >
> > > In the hierarchical layout, we are creating power domains around each CPU
> > > and describes the idle states for them inside the power domain provider
> > > node. Note that, the CPU's idle states still needs to be compatible with
> > > "arm,idle-state".
> > >
> > > Furthermore, represent the CPU cluster as a separate master power domain,
> > > powering the CPU's power domains. The cluster node, contains the idle
> > > states for the cluster and each idle state needs to be compatible with the
> > > "domain-idle-state".
> > >
> > > If the running platform is using a PSCI FW that supports the OS initiated
> > > CPU suspend mode, which likely should be the case unless the PSCI FW is
> > > very old, this change triggers the PSCI driver to enable it.
> > >
> > > Cc: Andy Gross <andy.gross@linaro.org>
> > > Cc: David Brown <david.brown@linaro.org>
> > > Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
> > > Co-developed-by: Ulf Hansson <ulf.hansson@linaro.org>
> > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > > ---
> >
> > [...]
> >
> > > @@ -166,12 +170,57 @@
> > > min-residency-us = <2000>;
> > > local-timer-stop;
> > > };
> > > +
> > > + CLUSTER_RET: cluster-retention {
> > > + compatible = "domain-idle-state";
> > > + arm,psci-suspend-param = <0x1000010>;
> > > + entry-latency-us = <500>;
> > > + exit-latency-us = <500>;
> > > + min-residency-us = <2000>;
> > > + };
> > > +
> > > + CLUSTER_PWRDN: cluster-gdhs {
> > > + compatible = "domain-idle-state";
> > > + arm,psci-suspend-param = <0x1000030>;
> > > + entry-latency-us = <2000>;
> > > + exit-latency-us = <2000>;
> > > + min-residency-us = <6000>;
> > > + };
> > > };
> > > };
> >
> > I was trying to understand the composition of composite state parameters
> > in this series and that made me look at these DT examples.
> >
> This was meant to depict a hierarchical state format for OSI.
>
Hmm, I am more confused. We have 2 formats: original and extended.
1. Original:
31:26 Reserved. Must be zero.
25:24 PowerLevel
23:17 Reserved. Must be zero.
16 StateType
15:0 StateID
2. Extended
31 Reserved. Must be zero.
30 StateType
29:28 Reserved. Must be zero.
27:0 StateID
I was trying to match them to that. I think I commented on other patches.
I think simple OR logic breaks with extended format easily if StateIDs
are not carefully crafted which is not mandated and hence the trouble.
The same holds to original format but with PowerLevel, it slightly
relaxing things a bit but still it needs to be crafted when firmware
decides these parameters. E.g.: what is done with HiKey platform is
completely wrong.
It's helpful if we want to avoid save/restore for retention states.
CPU_PM_CPU_IDLE_ENTER_RETENTION vs CPU_PM_CPU_IDLE_ENTER
> > What format does the above platform use ? I tried matching them to
> > both original as well as extended format and I fail to understand.
> > Assuming original format:
> > State power_state PowerLevel StateType StateID
> > SPC 0x40000002 0(core) 0(Retention) 0x2 (Res0 b[29]=1?)
> > CLUSTER_RET 0x1000010 1(clusters) 0(Retention) 0x10
> > CLUSTER_PWRDN 0x1000030 1(clusters) 0(Retention?) 0x30
> > Now extended format:
> > State power_state StateType StateID
> > SPC 0x40000002 0(Retention) 0x40000002 (Res0 b[29]=1?)
> > CLUSTER_RET 0x1000010 0(Retention) 0x1000010
> The composite state would comprise of CPU state and Cluster state.
> So for the last CPU entering idle -
> (CLUSTER_RET | SPC)
> 0x41000012
> > CLUSTER_PWRDN 0x1000030 0(Retention?) 0x1000030
> >
> (CLUSTER_PWRDN | SPC)
> 0x41000032
>
> Hope this helps.
>
I just follow OR logic. I have made wrong reference to bit 29 above(I
can't read simple 32 bit number anymore :(), it should bit 30 and if
this platform follow extended state, then it makes some sense. But
I expect CLUSTER_PWRDN also to have bit 30 set. I tried to match to both
formats and failed to understand which it follows, so thought of asking.
--
Regards,
Sudeep
^ permalink raw reply
* Re: [PATCH v4 02/24] PM / devfreq: tegra30: Keep interrupt disabled while governor is stopped
From: Dmitry Osipenko @ 2019-07-17 16:44 UTC (permalink / raw)
To: Chanwoo Choi, Thierry Reding, MyungJoo Ham, Kyungmin Park,
Jonathan Hunter, Tomeu Vizoso
Cc: linux-pm, linux-tegra, linux-kernel
In-Reply-To: <ca339b7f-1141-4c68-1c07-2ac818422bbc@samsung.com>
17.07.2019 9:37, Chanwoo Choi пишет:
> On 19. 7. 16. 오후 10:03, Dmitry Osipenko wrote:
>> 16.07.2019 14:47, Chanwoo Choi пишет:
>>> On 19. 7. 8. 오전 7:32, Dmitry Osipenko wrote:
>>>> There is no real need to keep interrupt always-enabled, will be nicer
>>>> to keep it disabled while governor is inactive.
>>>>
>>>> Suggested-by: Thierry Reding <thierry.reding@gmail.com>
>>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>>>> ---
>>>> drivers/devfreq/tegra30-devfreq.c | 43 ++++++++++++++++---------------
>>>> 1 file changed, 22 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c
>>>> index a27300f40b0b..5e2b133babdd 100644
>>>> --- a/drivers/devfreq/tegra30-devfreq.c
>>>> +++ b/drivers/devfreq/tegra30-devfreq.c
>>>> @@ -11,6 +11,7 @@
>>>> #include <linux/devfreq.h>
>>>> #include <linux/interrupt.h>
>>>> #include <linux/io.h>
>>>> +#include <linux/irq.h>
>>>> #include <linux/module.h>
>>>> #include <linux/mod_devicetable.h>
>>>> #include <linux/platform_device.h>
>>>> @@ -416,8 +417,6 @@ static void tegra_actmon_start(struct tegra_devfreq *tegra)
>>>> {
>>>> unsigned int i;
>>>>
>>>> - disable_irq(tegra->irq);
>>>> -
>>>> actmon_writel(tegra, ACTMON_SAMPLING_PERIOD - 1,
>>>> ACTMON_GLB_PERIOD_CTRL);
>>>>
>>>> @@ -442,8 +441,6 @@ static void tegra_actmon_stop(struct tegra_devfreq *tegra)
>>>> }
>>>>
>>>> actmon_write_barrier(tegra);
>>>> -
>>>> - enable_irq(tegra->irq);
>>>> }
>>>>
>>>> static int tegra_devfreq_target(struct device *dev, unsigned long *freq,
>>>> @@ -552,6 +549,12 @@ static int tegra_governor_event_handler(struct devfreq *devfreq,
>>>> {
>>>> struct tegra_devfreq *tegra = dev_get_drvdata(devfreq->dev.parent);
>>>>
>>>> + /*
>>>> + * Couple device with the governor early as it is needed at
>>>> + * the moment of governor's start (used by ISR).
>>>> + */
>>>> + tegra->devfreq = devfreq;
>>>
>>> I'm not sure it is necessary. Almost devfreq device get
>>> the devfreq instance on probe timing through devfreq_add_device directly.
>>
>> This is necessary because this assignment is for the "governor" and not
>> the "device". Governor is started during of devfreq_add_device(), hence
>> there is no better way to assign device to the driver's governor.
>
> OK. I understand.
>
> But, I have a question. Is it working before this patch?
> How can you test it on that tegra->devfreq is NULL?
It was working before this patch because previously interrupt was
requested *after* devfreq_add_device(), now the IRQ requesting happens
*before* devfreq_add_device() and enabling *during of*. If interrupt
fires before the assignment happened, then ISR gets a NULL deference and
this is easily reproducible.
Please note that 'tegra->devfreq' is used only by the ISR and onward, in
the further patches of this series the usage is extended by the cpufreq
notifier.
>>
>>>> +
>>>> switch (event) {
>>>> case DEVFREQ_GOV_START:
>>>> devfreq_monitor_start(devfreq);
>>>> @@ -586,10 +589,11 @@ static struct devfreq_governor tegra_devfreq_governor = {
>>>>
>>>> static int tegra_devfreq_probe(struct platform_device *pdev)
>>>> {
>>>> - struct tegra_devfreq *tegra;
>>>> struct tegra_devfreq_device *dev;
>>>> - unsigned int i;
>>>> + struct tegra_devfreq *tegra;
>>>> + struct devfreq *devfreq;
>>>> unsigned long rate;
>>>> + unsigned int i;
>>>> int err;
>>>>
>>>> tegra = devm_kzalloc(&pdev->dev, sizeof(*tegra), GFP_KERNEL);
>>>> @@ -625,6 +629,16 @@ static int tegra_devfreq_probe(struct platform_device *pdev)
>>>> }
>>>> tegra->irq = err;
>>>>
>>>> + irq_set_status_flags(tegra->irq, IRQ_NOAUTOEN);
>>>> +
>>>> + err = devm_request_threaded_irq(&pdev->dev, tegra->irq, NULL,
>>>> + actmon_thread_isr, IRQF_ONESHOT,
>>>> + "tegra-devfreq", tegra);
>>>> + if (err) {
>>>> + dev_err(&pdev->dev, "Interrupt request failed: %d\n", err);
>>>> + return err;
>>>> + }
>>>> +
>>>> reset_control_assert(tegra->reset);
>>>>
>>>> err = clk_prepare_enable(tegra->clock);
>>>> @@ -672,28 +686,15 @@ static int tegra_devfreq_probe(struct platform_device *pdev)
>>>> }
>>>>
>>>> tegra_devfreq_profile.initial_freq = clk_get_rate(tegra->emc_clock);
>>>> - tegra->devfreq = devfreq_add_device(&pdev->dev,
>>>> - &tegra_devfreq_profile,
>>>> - "tegra_actmon",
>>>> - NULL);
>>>> + devfreq = devfreq_add_device(&pdev->dev, &tegra_devfreq_profile,
>>>> + "tegra_actmon", NULL);
>>>> if (IS_ERR(tegra->devfreq)) {
>>>
>>> Have to check 'devfreq' instead of 'tegra->devfreq'.
>>> Did you test it? It might be failed because 'tegra->devfreq is NULL.
>>
>> That's a good catch! Thank you very much.
>>
>>>> err = PTR_ERR(tegra->devfreq);
>>>
>>> ditto.
>>
>> Ok
^ permalink raw reply
* Re: [PATCH v4 13/15] docs: ABI: testing: make the files compatible with ReST output
From: Jonathan Cameron @ 2019-07-17 16:13 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: gregkh, Rafael J. Wysocki, Len Brown, Jonathan Cameron,
Hartmut Knaack, Lars-Peter Clausen, Peter Meerwald-Stadler,
Peter Rosin, Benson Leung, Enric Balletbo i Serra, Guenter Roeck,
Maxime Coquelin, Alexandre Torgue, Fabrice Gasnier,
Frederic Barrat, Andrew Donnellan, Sebastian Reichel,
Heikki Krogerus, Boris Ostrovsky, Juergen Gross,
Stefano Stabellini, Mike Kravetz, Nicolas Ferre,
Alexandre Belloni, Ludovic Desroches, Richard Cochran,
Jonathan Corbet, linux-acpi, linux-iio, linux-stm32,
linux-arm-kernel, linuxppc-dev, linux-pm, linux-usb, xen-devel,
linux-mm, netdev, linux-doc
In-Reply-To: <88d15fa38167e3f2e73e65e1c1a1f39bca0267b4.1563365880.git.mchehab+samsung@kernel.org>
On Wed, 17 Jul 2019 09:28:17 -0300
Mauro Carvalho Chehab <mchehab+samsung@kernel.org> wrote:
> Some files over there won't parse well by Sphinx.
>
> Fix them.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Hi Mauro,
Does feel like this one should perhaps have been broken up a touch!
For the IIO ones I've eyeballed it rather than testing the results
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
^ permalink raw reply
* [GIT PULL] Thermal management updates for v5.3-rc1
From: Zhang Rui @ 2019-07-17 15:54 UTC (permalink / raw)
To: Linus Torvalds; +Cc: LKML, Linux PM list, Eduardo Valentin
Hi, Linus,
Please pull from
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux.git next
to receive the latest Thermal Management updates for v5.3-rc1 with
top-most commit 6c395f66e98c895cf3ebf87c0b2fc63b6a57a196:
drivers: thermal: processor_thermal_device: Fix build warning (2019-
07-09 21:19:12 +0800)
on top of commit 4b972a01a7da614b4796475f933094751a295a2f:
Linux 5.2-rc6 (2019-06-22 16:01:36 -0700)
Specifics:
- Covert thermal documents to ReST. (Mauro Carvalho Chehab)
- Fix a cyclic depedency in between thermal core and governors.
(Daniel Lezcano)
- Fix processor_thermal_device driver to re-evaluate power limits
after resume. (Srinivas Pandruvada, Zhang Rui)
thanks,
rui
----------------------------------------------------------------
Daniel Lezcano (2):
thermal/drivers/core: Add init section table for self-
encapsulation
thermal/drivers/core: Use governor table to initialize
Mauro Carvalho Chehab (1):
docs: thermal: convert to ReST
Srinivas Pandruvada (1):
drivers: thermal: processor_thermal: Read PPCC on resume
Zhang Rui (2):
Merge branches 'thermal-core' and 'thermal-intel' into next
drivers: thermal: processor_thermal_device: Fix build warning
.../{cpu-cooling-api.txt => cpu-cooling-api.rst} | 39 +-
.../thermal/{exynos_thermal => exynos_thermal.rst} | 47 +-
Documentation/thermal/exynos_thermal_emulation | 53 ---
Documentation/thermal/exynos_thermal_emulation.rst | 61 +++
Documentation/thermal/index.rst | 18 +
.../{intel_powerclamp.txt => intel_powerclamp.rst} | 183 ++++----
.../{nouveau_thermal => nouveau_thermal.rst} | 54 ++-
.../{power_allocator.txt => power_allocator.rst} | 144 +++---
.../thermal/{sysfs-api.txt => sysfs-api.rst} | 488
++++++++++++++-------
...ure_thermal => x86_pkg_temperature_thermal.rst} | 28 +-
MAINTAINERS | 2 +-
drivers/thermal/fair_share.c | 12 +-
drivers/thermal/gov_bang_bang.c | 11 +-
.../int340x_thermal/processor_thermal_device.c | 18 +
drivers/thermal/power_allocator.c | 11 +-
drivers/thermal/step_wise.c | 11 +-
drivers/thermal/thermal_core.c | 52 ++-
drivers/thermal/thermal_core.h | 55 +--
drivers/thermal/user_space.c | 12 +-
include/asm-generic/vmlinux.lds.h | 11 +
include/linux/thermal.h | 4 +-
21 files changed, 771 insertions(+), 543 deletions(-)
rename Documentation/thermal/{cpu-cooling-api.txt => cpu-cooling-
api.rst} (82%)
rename Documentation/thermal/{exynos_thermal => exynos_thermal.rst}
(67%)
delete mode 100644 Documentation/thermal/exynos_thermal_emulation
create mode 100644 Documentation/thermal/exynos_thermal_emulation.rst
create mode 100644 Documentation/thermal/index.rst
rename Documentation/thermal/{intel_powerclamp.txt =>
intel_powerclamp.rst} (76%)
rename Documentation/thermal/{nouveau_thermal => nouveau_thermal.rst}
(64%)
rename Documentation/thermal/{power_allocator.txt =>
power_allocator.rst} (74%)
rename Documentation/thermal/{sysfs-api.txt => sysfs-api.rst} (66%)
rename Documentation/thermal/{x86_pkg_temperature_thermal =>
x86_pkg_temperature_thermal.rst} (80%)
^ permalink raw reply
* [PATCH] qcom: Add BCM vote macro to TCS header
From: Jordan Crouse @ 2019-07-17 15:53 UTC (permalink / raw)
To: freedreno
Cc: ilina, bjorn.andersson, linux-pm, Raju P.L.S.S.S.N, linux-arm-msm,
linux-kernel, Andy Gross, David Brown, Georgi Djakov
The A6XX family of Adreno GPUs use a microcontroller to control the
GPU clock independently. The microcontroller also has the capability
to vote for the bus but doesn't currently do so except for one initial
vote that is hard coded [1].
Currently there is no good way to construct a valid TCS command outside
of the inner workings of the QCOM interconnect driver which is something
that will need to be addressed for the next generation of GPU drivers.
To start the process, this change moves the TCS command macros from the
sdm845 interconnect driver into a soc specific header to make it available
for future efforts into this area.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/msm/adreno/a6xx_hfi.c#n219
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
drivers/interconnect/qcom/sdm845.c | 17 -----------------
include/soc/qcom/tcs.h | 17 +++++++++++++++++
2 files changed, 17 insertions(+), 17 deletions(-)
diff --git a/drivers/interconnect/qcom/sdm845.c b/drivers/interconnect/qcom/sdm845.c
index 4915b78..79b6f01 100644
--- a/drivers/interconnect/qcom/sdm845.c
+++ b/drivers/interconnect/qcom/sdm845.c
@@ -20,23 +20,6 @@
#include <soc/qcom/rpmh.h>
#include <soc/qcom/tcs.h>
-#define BCM_TCS_CMD_COMMIT_SHFT 30
-#define BCM_TCS_CMD_COMMIT_MASK 0x40000000
-#define BCM_TCS_CMD_VALID_SHFT 29
-#define BCM_TCS_CMD_VALID_MASK 0x20000000
-#define BCM_TCS_CMD_VOTE_X_SHFT 14
-#define BCM_TCS_CMD_VOTE_MASK 0x3fff
-#define BCM_TCS_CMD_VOTE_Y_SHFT 0
-#define BCM_TCS_CMD_VOTE_Y_MASK 0xfffc000
-
-#define BCM_TCS_CMD(commit, valid, vote_x, vote_y) \
- (((commit) << BCM_TCS_CMD_COMMIT_SHFT) | \
- ((valid) << BCM_TCS_CMD_VALID_SHFT) | \
- ((cpu_to_le32(vote_x) & \
- BCM_TCS_CMD_VOTE_MASK) << BCM_TCS_CMD_VOTE_X_SHFT) | \
- ((cpu_to_le32(vote_y) & \
- BCM_TCS_CMD_VOTE_MASK) << BCM_TCS_CMD_VOTE_Y_SHFT))
-
#define to_qcom_provider(_provider) \
container_of(_provider, struct qcom_icc_provider, provider)
diff --git a/include/soc/qcom/tcs.h b/include/soc/qcom/tcs.h
index 262876a..6012a9e 100644
--- a/include/soc/qcom/tcs.h
+++ b/include/soc/qcom/tcs.h
@@ -53,4 +53,21 @@ struct tcs_request {
struct tcs_cmd *cmds;
};
+#define BCM_TCS_CMD_COMMIT_SHFT 30
+#define BCM_TCS_CMD_COMMIT_MASK 0x40000000
+#define BCM_TCS_CMD_VALID_SHFT 29
+#define BCM_TCS_CMD_VALID_MASK 0x20000000
+#define BCM_TCS_CMD_VOTE_X_SHFT 14
+#define BCM_TCS_CMD_VOTE_MASK 0x3fff
+#define BCM_TCS_CMD_VOTE_Y_SHFT 0
+#define BCM_TCS_CMD_VOTE_Y_MASK 0xfffc000
+
+#define BCM_TCS_CMD(commit, valid, vote_x, vote_y) \
+ (((commit) << BCM_TCS_CMD_COMMIT_SHFT) | \
+ ((valid) << BCM_TCS_CMD_VALID_SHFT) | \
+ ((cpu_to_le32(vote_x) & \
+ BCM_TCS_CMD_VOTE_MASK) << BCM_TCS_CMD_VOTE_X_SHFT) | \
+ ((cpu_to_le32(vote_y) & \
+ BCM_TCS_CMD_VOTE_MASK) << BCM_TCS_CMD_VOTE_Y_SHFT))
+
#endif /* __SOC_QCOM_TCS_H__ */
--
2.7.4
^ permalink raw reply related
* Re: [PATCH v4 11/24] PM / devfreq: tegra30: Add debug messages
From: Dmitry Osipenko @ 2019-07-17 15:46 UTC (permalink / raw)
To: Chanwoo Choi, Thierry Reding, MyungJoo Ham, Kyungmin Park,
Jonathan Hunter, Tomeu Vizoso
Cc: linux-pm, linux-tegra, linux-kernel
In-Reply-To: <922c9178-71de-46ad-eafd-805af461bedb@samsung.com>
17.07.2019 9:45, Chanwoo Choi пишет:
> On 19. 7. 16. 오후 10:26, Dmitry Osipenko wrote:
>> 16.07.2019 15:23, Chanwoo Choi пишет:
>>> Hi Dmitry,
>>>
>>> Usually, the kernel log print for all users
>>> such as changing the frequency, fail or success.
>>>
>>> But, if the log just show the register dump,
>>> it is not useful for all users. It is just used
>>> for only specific developer.
>>>
>>> I recommend that you better to add more exception handling
>>> code on many points instead of just showing the register dump.
>>
>> The debug messages are not users, but for developers. Yes, I primarily
>> made the debugging to be useful for myself and will be happy to change
>> the way debugging is done if there will be any other active developer
>> for this driver. The registers dump is more than enough in order to
>> understand what's going on, I don't see any real need to change anything
>> here for now.
>
> Basically, we have to develop code and add the log for anyone.
> As you commented, even if there are no other developer, we never
> guarantee this assumption forever. And also, if added debug message
> for only you, you can add them when testing it temporarily.
>
> If you want to add the just register dump log for you,
> I can't agree. Once again, I hope that anyone understand
> the meaning of debug message as much possible as.
>
The registers dump should be good for everyone because it's a
self-explanatory information for anyone who is familiar with the
hardware. I don't think there is a need for anything else than what is
proposed in this patch, at least for now. I also simply don't see any
other better way to debug the state of this particular hardware, again
this logging is for the driver developers and not for users.
Initially, I was temporarily adding the debug messages. Now they are
pretty much mandatory for verifying that driver is working properly. And
of course the debugging messages got into the shape of this patch after
several iterations of refinements. So again, I suppose that this should
be good enough for everyone who is familiar with the hardware. And of
course I'm open to the constructive suggestions, the debugging aid is
not an ABI and could be changed/improved at any time.
You're suggesting to break down the debugging into several smaller
pieces, but I'm finding that as not a constructive suggestion because
the information about the full hardware state is actually necessary for
the productive debugging.
^ permalink raw reply
* [PATCH v2] power: supply: ab8500: remove set but not used variables 'vbup33_vrtcn' and 'bup_vch_range'
From: YueHaibing @ 2019-07-17 14:18 UTC (permalink / raw)
To: sre, linus.walleij, lee.jones, loic.pallardy
Cc: linux-kernel, linux-pm, YueHaibing
Fixes gcc '-Wunused-but-set-variable' warnings:
drivers/power/supply/ab8500_charger.c:
In function ab8500_charger_init_hw_registers:
drivers/power/supply/ab8500_charger.c:3013:24: warning:
variable vbup33_vrtcn set but not used [-Wunused-but-set-variable]
drivers/power/supply/ab8500_charger.c:3013:5: warning:
variable bup_vch_range set but not used [-Wunused-but-set-variable]
They are not used since commit 4c4268dc97c4 ("power:
supply: ab8500: Drop AB8540/9540 support")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
v2: Fix patch title
---
drivers/power/supply/ab8500_charger.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/drivers/power/supply/ab8500_charger.c b/drivers/power/supply/ab8500_charger.c
index 30de448..270a48a 100644
--- a/drivers/power/supply/ab8500_charger.c
+++ b/drivers/power/supply/ab8500_charger.c
@@ -3010,7 +3010,6 @@ static int ab8500_charger_usb_get_property(struct power_supply *psy,
static int ab8500_charger_init_hw_registers(struct ab8500_charger *di)
{
int ret = 0;
- u8 bup_vch_range = 0, vbup33_vrtcn = 0;
/* Setup maximum charger current and voltage for ABB cut2.0 */
if (!is_ab8500_1p1_or_earlier(di->parent)) {
@@ -3111,12 +3110,6 @@ static int ab8500_charger_init_hw_registers(struct ab8500_charger *di)
goto out;
}
- /* Backup battery voltage and current */
- if (di->bm->bkup_bat_v > BUP_VCH_SEL_3P1V)
- bup_vch_range = BUP_VCH_RANGE;
- if (di->bm->bkup_bat_v == BUP_VCH_SEL_3P3V)
- vbup33_vrtcn = VBUP33_VRTCN;
-
ret = abx500_set_register_interruptible(di->dev,
AB8500_RTC,
AB8500_RTC_BACKUP_CHG_REG,
--
2.7.4
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox