Linux Power Management development

Linux Power Management development
 help / color / mirror / Atom feed

* Re: [PATCH v4 12/24] PM / devfreq: tegra30: Inline all one-line functions
From: Dmitry Osipenko @ 2019-07-19  2:14 UTC (permalink / raw)
  To: Chanwoo Choi
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <97f2a317-989a-bcad-dd45-ccf00ba18cca@samsung.com>

В Fri, 19 Jul 2019 10:27:16 +0900
Chanwoo Choi <cw00.choi@samsung.com> пишет:

> On 19. 7. 19. 오전 10:24, Chanwoo Choi wrote:
> > On 19. 7. 19. 오전 10:22, Dmitry Osipenko wrote:  
> >> В Thu, 18 Jul 2019 18:09:05 +0900
> >> Chanwoo Choi <cw00.choi@samsung.com> пишет:
> >>  
> >>> On 19. 7. 16. 오후 10:35, Dmitry Osipenko wrote:  
> >>>> 16.07.2019 15:26, Chanwoo Choi пишет:    
> >>>>> Hi Dmitry,
> >>>>>
> >>>>> I'm not sure that it is necessary.
> >>>>> As I knew, usally, the 'inline' is used on header file
> >>>>> to define the empty functions.
> >>>>>
> >>>>> Do we have to change it with 'inline' keyword?    
> >>>>
> >>>> The 'inline' attribute tells compiler that instead of jumping
> >>>> into the function, it should take the function's code and
> >>>> replace the function's invocation with that code. This is done
> >>>> in order to help compiler optimize code properly, please see
> >>>> [1]. There is absolutely no need to create a function call into
> >>>> a function that consists of a single instruction.
> >>>>
> >>>> [1] https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Inline.html
> >>>>     
> >>>
> >>> If you want to add 'inline' keyword, I recommend that 
> >>> you better to remove the modified function in this patch
> >>> and then just call the 'write_relaxed or read_relaxed' function
> >>> directly. It is same result when using inline keyword.  
> >>
> >> That could be done, but it makes code less readable.
> >>
> >> See the difference:
> >>
> >> device_writel(dev, ACTMON_INTR_STATUS_CLEAR,
> >> ACTMON_DEV_INTR_STATUS);
> >>
> >> writel_relaxed(ACTMON_INTR_STATUS_CLEAR,
> >> 	       dev->regs + ACTMON_DEV_INTR_STATUS);  
> > 
> > No problem if you add the detailed comment and you want to use
> > the 'inline' keyword.  
> 
> Basically, I think that 'inline' keyword is not necessary.

Sure, but I'm finding that it's always nicer to explicitly inline a very
simple functions because compiler may not do it properly itself in some
cases.

> But if you want to use 'inline' keyword, I recommend
> that call the 'write_relaxed or read_relaxed' function directly
> with detailed description. 

Could you please reword this sentence? Not sure that I'm understanding
it correctly.

^ permalink raw reply

* Re: [PATCH v4 20/24] PM / devfreq: tegra30: Optimize upper average watermark selection
From: Chanwoo Choi @ 2019-07-19  2:06 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <20190719045943.73b53e31@dimatab>

On 19. 7. 19. 오전 10:59, Dmitry Osipenko wrote:
> В Fri, 19 Jul 2019 10:36:30 +0900
> Chanwoo Choi <cw00.choi@samsung.com> пишет:
> 
>> On 19. 7. 8. 오전 7:32, Dmitry Osipenko wrote:
>>> I noticed that CPU may be crossing the dependency threshold very
>>> frequently for some workloads and this results in a lot of
>>> interrupts which could be avoided if MCALL client is keeping actual
>>> EMC frequency at a higher rate.
>>>
>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>>> ---
>>>  drivers/devfreq/tegra30-devfreq.c | 23 ++++++++++++++++++-----
>>>  1 file changed, 18 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/devfreq/tegra30-devfreq.c
>>> b/drivers/devfreq/tegra30-devfreq.c index
>>> c3cf87231d25..4d582809acb6 100644 ---
>>> a/drivers/devfreq/tegra30-devfreq.c +++
>>> b/drivers/devfreq/tegra30-devfreq.c @@ -314,7 +314,8 @@ static void
>>> tegra_actmon_get_lower_upper(struct tegra_devfreq *tegra, }
>>>  
>>>  static void tegra_devfreq_update_avg_wmark(struct tegra_devfreq
>>> *tegra,
>>> -					   struct
>>> tegra_devfreq_device *dev)
>>> +					   struct
>>> tegra_devfreq_device *dev,
>>> +					   unsigned long freq)
>>>  {
>>>  	unsigned long avg_threshold, lower, upper;
>>>  
>>> @@ -323,6 +324,15 @@ static void
>>> tegra_devfreq_update_avg_wmark(struct tegra_devfreq *tegra,
>>> avg_threshold = dev->config->avg_dependency_threshold;
>>> avg_threshold = avg_threshold * ACTMON_SAMPLING_PERIOD; 
>>> +	/*
>>> +	 * If cumulative EMC frequency selection is higher than the
>>> +	 * device's, then there is no need to set upper watermark
>>> to
>>> +	 * a lower value because it will result in unnecessary
>>> upper
>>> +	 * interrupts.
>>> +	 */
>>> +	if (freq * ACTMON_SAMPLING_PERIOD > upper)
>>> +		upper = freq * ACTMON_SAMPLING_PERIOD;  
>>
>> Also, 'upper value is used on the patch5. You can combine this code
>> to patch5 or if this patch depends on the cpu notifier, you can
>> combine it to the patch of adding cpu notifier without separate patch.
> 
> Well okay, I'll try to squash some of the patches in the next revision.
> Usually I'm receiving comments in the other direction, asking to
> separate patches into smaller changes ;) So that's more a personal
> preference of each maintainer, I'd say.
> 

Right. We have to make the patch with atomic attribute.
But, if there are patches which touch the same code
in the same patchset. We can squash or do refactorig
of this code.

And also, if possible, I'd like you to make the patch
list according to the role of patch. For example,
the patches related to the 'watermark' could be sequentially
listed. But, it is not forced opinion. If just possible.



-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply

* Re: [PATCH v4 20/24] PM / devfreq: tegra30: Optimize upper average watermark selection
From: Dmitry Osipenko @ 2019-07-19  1:59 UTC (permalink / raw)
  To: Chanwoo Choi
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <e3358039-d1b3-a5a0-1a37-aeb8edd49d6b@samsung.com>

В Fri, 19 Jul 2019 10:36:30 +0900
Chanwoo Choi <cw00.choi@samsung.com> пишет:

> On 19. 7. 8. 오전 7:32, Dmitry Osipenko wrote:
> > I noticed that CPU may be crossing the dependency threshold very
> > frequently for some workloads and this results in a lot of
> > interrupts which could be avoided if MCALL client is keeping actual
> > EMC frequency at a higher rate.
> > 
> > Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> > ---
> >  drivers/devfreq/tegra30-devfreq.c | 23 ++++++++++++++++++-----
> >  1 file changed, 18 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/devfreq/tegra30-devfreq.c
> > b/drivers/devfreq/tegra30-devfreq.c index
> > c3cf87231d25..4d582809acb6 100644 ---
> > a/drivers/devfreq/tegra30-devfreq.c +++
> > b/drivers/devfreq/tegra30-devfreq.c @@ -314,7 +314,8 @@ static void
> > tegra_actmon_get_lower_upper(struct tegra_devfreq *tegra, }
> >  
> >  static void tegra_devfreq_update_avg_wmark(struct tegra_devfreq
> > *tegra,
> > -					   struct
> > tegra_devfreq_device *dev)
> > +					   struct
> > tegra_devfreq_device *dev,
> > +					   unsigned long freq)
> >  {
> >  	unsigned long avg_threshold, lower, upper;
> >  
> > @@ -323,6 +324,15 @@ static void
> > tegra_devfreq_update_avg_wmark(struct tegra_devfreq *tegra,
> > avg_threshold = dev->config->avg_dependency_threshold;
> > avg_threshold = avg_threshold * ACTMON_SAMPLING_PERIOD; 
> > +	/*
> > +	 * If cumulative EMC frequency selection is higher than the
> > +	 * device's, then there is no need to set upper watermark
> > to
> > +	 * a lower value because it will result in unnecessary
> > upper
> > +	 * interrupts.
> > +	 */
> > +	if (freq * ACTMON_SAMPLING_PERIOD > upper)
> > +		upper = freq * ACTMON_SAMPLING_PERIOD;  
> 
> Also, 'upper value is used on the patch5. You can combine this code
> to patch5 or if this patch depends on the cpu notifier, you can
> combine it to the patch of adding cpu notifier without separate patch.

Well okay, I'll try to squash some of the patches in the next revision.
Usually I'm receiving comments in the other direction, asking to
separate patches into smaller changes ;) So that's more a personal
preference of each maintainer, I'd say.

^ permalink raw reply

* Re: [PATCH v4 17/24] PM / devfreq: tegra30: Use tracepoints for debugging
From: Dmitry Osipenko @ 2019-07-19  1:50 UTC (permalink / raw)
  To: Chanwoo Choi
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <ffa24275-4499-c27a-8c60-a5f4d738913c@samsung.com>

В Fri, 19 Jul 2019 10:01:55 +0900
Chanwoo Choi <cw00.choi@samsung.com> пишет:

> On 19. 7. 19. 오전 9:49, Dmitry Osipenko wrote:
> > В Thu, 18 Jul 2019 18:47:09 +0900
> > Chanwoo Choi <cw00.choi@samsung.com> пишет:
> >   
> >> On 19. 7. 8. 오전 7:32, Dmitry Osipenko wrote:  
> >>> Debug messages create too much CPU and memory activity by
> >>> themselves, so it's difficult to debug lower rates and catch
> >>> unwanted interrupts that happen rarely. Tracepoints are ideal in
> >>> that regards because they do not contribute to the sampled date at
> >>> all. This allowed me to catch few problems which are fixed by the
> >>> followup patches, without tracepoints it would be much harder to
> >>> do.
> >>>
> >>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> >>> ---
> >>>  drivers/devfreq/tegra30-devfreq.c      |  43 +++-------
> >>>  include/trace/events/tegra30_devfreq.h | 105
> >>> +++++++++++++++++++++++++ 2 files changed, 117 insertions(+), 31
> >>> deletions(-) create mode 100644
> >>> include/trace/events/tegra30_devfreq.h    
> >>
> >> As I knew, 'include/trace/events' don't include the header file
> >> for only one device driver. Usually, the trace event is provided
> >> by framework instead of each devic driver.  
> > 
> > There are at least trace headers there for the tegra-apbdma,
> > tegra-host1x, intel-sst and intel-ish devices. I don't think that
> > there is a strict rule for the trace headers placement.  
> 
> OK.
> 
> But, As I already replied on patch4, if you want to show the register
> dump, you better to add the debugfs feature to devfreq framework for
> showing the register dump instead of printing the register dump with
> debug level and this trace event. 
> 
> As I said, just register dump is not useful for all developers.
> Almost developer cannot understand the meaning of debug log for
> register dump.

I think there is some disconnect here. I'm finding that the raw
register values are essential for debugging of this driver. The
registers tracing is very trivial and self-explanatory, just can't see
any better variant.

The registers documentation is available for everyone, you can go to
NVIDIA website and download it (after registration).

We have registers tracing in other Tegra drivers, please see for the
quick example:

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/staging/media/tegra-vde/trace.h
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/gpu/drm/tegra/trace.h

It's the first time I'm seeing complains about debug tracing and
currently having hard time trying to understand yours point.

> Also, it is not proper way that front patch adds the some code
> and then later patch removes the additional code in the same series.
> Before sending the patches, you can renew them.

Okay.

^ permalink raw reply

* Re: [PATCH v4 21/24] PM / devfreq: tegra30: Synchronize average count on target's update
From: Chanwoo Choi @ 2019-07-19  1:40 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <20190719033152.1d4c3003@dimatab>

On 19. 7. 19. 오전 9:31, Dmitry Osipenko wrote:
> В Thu, 18 Jul 2019 19:15:54 +0900
> Chanwoo Choi <cw00.choi@samsung.com> пишет:
> 
>> On 19. 7. 8. 오전 7:33, Dmitry Osipenko wrote:
>>> The average count may get out of sync if interrupt was disabled /
>>> avoided for a long time due to upper watermark optimization, hence
>>> it should be re-synced on each target's update to ensure that
>>> watermarks are set up correctly on EMC rate-change notification and
>>> that a correct frequency is selected for device.
>>>
>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>>> ---
>>>  drivers/devfreq/tegra30-devfreq.c | 30
>>> ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+)
>>>
>>> diff --git a/drivers/devfreq/tegra30-devfreq.c
>>> b/drivers/devfreq/tegra30-devfreq.c index
>>> 4d582809acb6..8a674fad26be 100644 ---
>>> a/drivers/devfreq/tegra30-devfreq.c +++
>>> b/drivers/devfreq/tegra30-devfreq.c @@ -466,11 +466,41 @@ static
>>> void actmon_isr_device(struct tegra_devfreq *tegra,
>>> dev->boost_freq, cpufreq_get(0)); }
>>>  
>>> +static void tegra_devfreq_sync_avg_count(struct tegra_devfreq
>>> *tegra,
>>> +					 struct
>>> tegra_devfreq_device *dev) +{
>>> +	u32 avg_count, avg_freq, old_upper, new_upper;
>>> +
>>> +	avg_count = device_readl(dev, ACTMON_DEV_AVG_COUNT);
>>> +	avg_freq = avg_count / ACTMON_SAMPLING_PERIOD;
>>> +
>>> +	old_upper = tegra_actmon_upper_freq(tegra, dev->avg_freq);
>>> +	new_upper = tegra_actmon_upper_freq(tegra, avg_freq);
>>> +
>>> +	/* similar to ISR, see comments in actmon_isr_device() */
>>> +	if (old_upper != new_upper) {
>>> +		dev->avg_freq = avg_freq;
>>> +		dev->boost_freq = 0;
>>> +	}
>>> +}
>>> +
>>>  static unsigned long actmon_update_target(struct tegra_devfreq
>>> *tegra, struct tegra_devfreq_device *dev)
>>>  {
>>>  	unsigned long target_freq;
>>>  
>>> +	/*
>>> +	 * The avg_count / avg_freq is getting snapshoted on
>>> device's
>>> +	 * interrupt, but there are cases where actual value need
>>> to
>>> +	 * be utilized on target's update, like CPUFreq boosting
>>> and
>>> +	 * overriding the min freq
>>> via /sys/class/devfreq/devfreq0/min_freq
>>> +	 * because we're optimizing the upper watermark based on
>>> the
>>> +	 * actual EMC frequency. This means that interrupt may be
>>> +	 * inactive for a long time and thus making snapshoted
>>> value
>>> +	 * outdated.
>>> +	 */
>>> +	tegra_devfreq_sync_avg_count(tegra, dev);  
>>
>> I think that you don't need to add the separate function to calculate
>> the 'dev->avg_freq'. It is enough with your detailed comment to add
>> this code in this function.
> 
> The separate function is indeed not mandatory here, but I'm finding that
> it usually makes easier to read and follow the code when it is properly
> split up into logical blocks. Don't you agree?

It is right to make the separate function if function is too long or 
function is called on the multiple points.

But, in this case, I think that it is enough to add this code
to the actmon_update_target() because you already added the detailed comment. 

It is enough to understand the role of this code with your comment.

> 
>>> +
>>>  	target_freq = min(dev->avg_freq + dev->boost_freq,
>>> KHZ_MAX); target_freq = tegra_actmon_account_cpu_freq(tegra, dev,
>>> target_freq); 
>>>   
>>
>> And also, is it impossible to squash this patch with patch19/patch20?
>>
> 
> It should be possible to squash this patch with #20, but wouldn't
> be better to keep changes in the chronological order? It's also better
> to keep changes separate simply to aid bisection in case of a problem.
> 
> 
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply

* Re: [PATCH v4 20/24] PM / devfreq: tegra30: Optimize upper average watermark selection
From: Chanwoo Choi @ 2019-07-19  1:36 UTC (permalink / raw)
  To: Dmitry Osipenko, Thierry Reding, MyungJoo Ham, Kyungmin Park,
	Jonathan Hunter, Tomeu Vizoso
  Cc: linux-pm, linux-tegra, linux-kernel
In-Reply-To: <20190707223303.6755-21-digetx@gmail.com>

On 19. 7. 8. 오전 7:32, Dmitry Osipenko wrote:
> I noticed that CPU may be crossing the dependency threshold very
> frequently for some workloads and this results in a lot of interrupts
> which could be avoided if MCALL client is keeping actual EMC frequency
> at a higher rate.
> 
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/devfreq/tegra30-devfreq.c | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c
> index c3cf87231d25..4d582809acb6 100644
> --- a/drivers/devfreq/tegra30-devfreq.c
> +++ b/drivers/devfreq/tegra30-devfreq.c
> @@ -314,7 +314,8 @@ static void tegra_actmon_get_lower_upper(struct tegra_devfreq *tegra,
>  }
>  
>  static void tegra_devfreq_update_avg_wmark(struct tegra_devfreq *tegra,
> -					   struct tegra_devfreq_device *dev)
> +					   struct tegra_devfreq_device *dev,
> +					   unsigned long freq)
>  {
>  	unsigned long avg_threshold, lower, upper;
>  
> @@ -323,6 +324,15 @@ static void tegra_devfreq_update_avg_wmark(struct tegra_devfreq *tegra,
>  	avg_threshold = dev->config->avg_dependency_threshold;
>  	avg_threshold = avg_threshold * ACTMON_SAMPLING_PERIOD;
>  
> +	/*
> +	 * If cumulative EMC frequency selection is higher than the
> +	 * device's, then there is no need to set upper watermark to
> +	 * a lower value because it will result in unnecessary upper
> +	 * interrupts.
> +	 */
> +	if (freq * ACTMON_SAMPLING_PERIOD > upper)
> +		upper = freq * ACTMON_SAMPLING_PERIOD;

Also, 'upper value is used on the patch5. You can combine this code to patch5
or if this patch depends on the cpu notifier, you can combine it to the patch
of adding cpu notifier without separate patch.

> +
>  	/*
>  	 * We want to get interrupts when MCCPU client crosses the
>  	 * dependency threshold in order to take into / out of account
> @@ -392,6 +402,7 @@ static void actmon_isr_device(struct tegra_devfreq *tegra,
>  			      struct tegra_devfreq_device *dev)
>  {
>  	u32 intr_status, dev_ctrl, avg_intr_mask, avg_count;
> +	unsigned long freq;
>  
>  	trace_device_isr_enter(tegra->regs, dev->config->offset,
>  			       dev->boost_freq, cpufreq_get(0));
> @@ -405,8 +416,10 @@ static void actmon_isr_device(struct tegra_devfreq *tegra,
>  	avg_intr_mask = ACTMON_DEV_INTR_AVG_BELOW_WMARK |
>  			ACTMON_DEV_INTR_AVG_ABOVE_WMARK;
>  
> -	if (intr_status & avg_intr_mask)
> -		tegra_devfreq_update_avg_wmark(tegra, dev);
> +	if (intr_status & avg_intr_mask) {
> +		freq = clk_get_rate(tegra->emc_clock) / KHZ;
> +		tegra_devfreq_update_avg_wmark(tegra, dev, freq);
> +	}
>  
>  	if (intr_status & ACTMON_DEV_INTR_CONSECUTIVE_UPPER) {
>  		/*
> @@ -525,7 +538,7 @@ static int tegra_actmon_clk_notify_cb(struct notifier_block *nb,
>  	for (i = 0; i < ARRAY_SIZE(tegra->devices); i++) {
>  		dev = &tegra->devices[i];
>  
> -		tegra_devfreq_update_avg_wmark(tegra, dev);
> +		tegra_devfreq_update_avg_wmark(tegra, dev, freq);
>  		tegra_devfreq_update_wmark(tegra, dev, freq);
>  	}
>  
> @@ -630,7 +643,7 @@ static void tegra_actmon_configure_device(struct tegra_devfreq *tegra,
>  	device_writel(dev, dev->avg_freq * ACTMON_SAMPLING_PERIOD,
>  		      ACTMON_DEV_INIT_AVG);
>  
> -	tegra_devfreq_update_avg_wmark(tegra, dev);
> +	tegra_devfreq_update_avg_wmark(tegra, dev, dev->avg_freq);
>  	tegra_devfreq_update_wmark(tegra, dev, dev->avg_freq);
>  
>  	device_writel(dev, ACTMON_COUNT_WEIGHT, ACTMON_DEV_COUNT_WEIGHT);
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply

* Re: [PATCH v4 05/24] PM / devfreq: tegra30: Set up watermarks properly
From: Chanwoo Choi @ 2019-07-19  1:31 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <20190719030058.617de581@dimatab>

On 19. 7. 19. 오전 9:00, Dmitry Osipenko wrote:
> В Thu, 18 Jul 2019 19:17:17 +0900
> Chanwoo Choi <cw00.choi@samsung.com> пишет:
> 
>> On 19. 7. 8. 오전 7:32, Dmitry Osipenko wrote:
>>> The current implementation is inaccurate and results in very
>>> intensive interrupt activity, which neglects the whole idea of
>>> polling offload to hardware. The reason of the shortcoming is that
>>> watermarks are not set up correctly and this results in ACTMON
>>> constantly asking to change freq and then these requests are
>>> ignored. The end result of this patch is that there are few
>>> hundreds of ACTMON's interrupts instead of tens thousands after few
>>> minutes of a working devfreq, meanwhile the transitions activity
>>> stays about the same and governor becomes more reactive.
>>>
>>> Since watermarks are set precisely correct now, the boosting logic
>>> is changed a tad to accommodate the change. The "average sustain
>>> coefficient" multiplier is gone now since there is no need to
>>> compensate the improper watermarks and EMC frequency-bump happens
>>> once boosting hits the upper watermark enough times, depending on
>>> the per-device boosting threshold.
>>>
>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>>> ---
>>>  drivers/devfreq/tegra30-devfreq.c | 293
>>> +++++++++++++++++++++--------- 1 file changed, 209 insertions(+),
>>> 84 deletions(-)
>>>
>>> diff --git a/drivers/devfreq/tegra30-devfreq.c
>>> b/drivers/devfreq/tegra30-devfreq.c index
>>> 4be7858c33bc..16f7e6cf3b99 100644 ---
>>> a/drivers/devfreq/tegra30-devfreq.c +++
>>> b/drivers/devfreq/tegra30-devfreq.c @@ -47,6 +47,8 @@
>>>  
>>>  #define ACTMON_DEV_INTR_CONSECUTIVE_UPPER
>>> BIT(31) #define
>>> ACTMON_DEV_INTR_CONSECUTIVE_LOWER			BIT(30)
>>> +#define
>>> ACTMON_DEV_INTR_AVG_BELOW_WMARK
>>> BIT(25) +#define
>>> ACTMON_DEV_INTR_AVG_ABOVE_WMARK
>>> BIT(24) #define
>>> ACTMON_ABOVE_WMARK_WINDOW				1 #define
>>> ACTMON_BELOW_WMARK_WINDOW				3 @@ -63,9
>>> +65,8 @@
>>>   * ACTMON_AVERAGE_WINDOW_LOG2: default value for @DEV_CTRL_K_VAL,
>>> which
>>>   * translates to 2 ^ (K_VAL + 1). ex: 2 ^ (6 + 1) = 128
>>>   */
>>> -#define ACTMON_AVERAGE_WINDOW_LOG2			6
>>> -#define ACTMON_SAMPLING_PERIOD
>>> 12 /* ms */ -#define
>>> ACTMON_DEFAULT_AVG_BAND				6  /* 1/10
>>> of % */ +#define
>>> ACTMON_AVERAGE_WINDOW_LOG2				6
>>> +#define
>>> ACTMON_SAMPLING_PERIOD					12 /*
>>> ms */ #define
>>> KHZ							1000 @@
>>> -142,9 +143,6 @@ struct tegra_devfreq_device {
>>>  	 * watermark breaches.
>>>  	 */
>>>  	unsigned long boost_freq;
>>> -
>>> -	/* Optimal frequency calculated from the stats for this
>>> device */
>>> -	unsigned long target_freq;
>>>  };
>>>  
>>>  struct tegra_devfreq {
>>> @@ -156,7 +154,6 @@ struct tegra_devfreq {
>>>  
>>>  	struct clk		*emc_clock;
>>>  	unsigned long		max_freq;
>>> -	unsigned long		cur_freq;
>>>  	struct notifier_block	rate_change_nb;
>>>  
>>>  	struct tegra_devfreq_device
>>> devices[ARRAY_SIZE(actmon_device_configs)]; @@ -205,42 +202,182 @@
>>> static unsigned long do_percent(unsigned long val, unsigned int
>>> pct) return val * pct / 100; }
>>>  
>>> +static unsigned long actmon_cpu_to_emc_rate(struct tegra_devfreq
>>> *tegra) +{
>>> +	struct tegra_actmon_emc_ratio *ratio = actmon_emc_ratios;
>>> +	unsigned int cpu_freq = cpufreq_get(0);
>>> +	unsigned int i;
>>> +
>>> +	for (i = 0; i < ARRAY_SIZE(actmon_emc_ratios); i++,
>>> ratio++) {
>>> +		if (cpu_freq >= ratio->cpu_freq) {
>>> +			if (ratio->emc_freq >= tegra->max_freq)
>>> +				return tegra->max_freq;
>>> +			else
>>> +				return ratio->emc_freq;
>>> +		}
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static unsigned long
>>> +tegra_actmon_account_cpu_freq(struct tegra_devfreq *tegra,
>>> +			      struct tegra_devfreq_device *dev,
>>> +			      unsigned long target_freq)
>>> +{
>>> +	unsigned long static_cpu_emc_freq;
>>> +
>>> +	if (dev->config->avg_dependency_threshold &&
>>> +	    dev->config->avg_dependency_threshold <
>>> dev->avg_count) {
>>> +		static_cpu_emc_freq =
>>> actmon_cpu_to_emc_rate(tegra);
>>> +		target_freq = max(target_freq,
>>> static_cpu_emc_freq);
>>> +	}
>>> +
>>> +	return target_freq;
>>> +}
>>> +
>>> +static unsigned long tegra_actmon_lower_freq(struct tegra_devfreq
>>> *tegra,
>>> +					     unsigned long
>>> target_freq) +{
>>> +	unsigned long lower = target_freq;
>>> +	struct dev_pm_opp *opp;
>>> +
>>> +	opp =
>>> dev_pm_opp_find_freq_floor(tegra->devfreq->dev.parent, &lower);
>>> +	if (IS_ERR(opp))
>>> +		lower = 0;
>>> +	else
>>> +		dev_pm_opp_put(opp);
>>> +
>>> +	return lower;
>>> +}
>>> +
>>> +static unsigned long tegra_actmon_upper_freq(struct tegra_devfreq
>>> *tegra,
>>> +					     unsigned long
>>> target_freq) +{
>>> +	unsigned long upper = target_freq + 1;
>>> +	struct dev_pm_opp *opp;
>>> +
>>> +	opp =
>>> dev_pm_opp_find_freq_ceil(tegra->devfreq->dev.parent, &upper);
>>> +	if (IS_ERR(opp))
>>> +		upper = ULONG_MAX;
>>> +	else
>>> +		dev_pm_opp_put(opp);
>>> +
>>> +	return upper;
>>> +}
>>> +
>>> +static void tegra_actmon_get_lower_upper(struct tegra_devfreq
>>> *tegra,
>>> +					 struct
>>> tegra_devfreq_device *dev,
>>> +					 unsigned long target_freq,
>>> +					 unsigned long *lower,
>>> +					 unsigned long *upper)
>>> +{
>>> +	/*
>>> +	 * Memory frequencies are guaranteed to have 1MHz
>>> granularity
>>> +	 * and thus we need this rounding down to get a proper
>>> watermarks
>>> +	 * range in a case where target_freq falls into a range of
>>> +	 * next_possible_opp_freq - 1MHz.
>>> +	 */
>>> +	target_freq = round_down(target_freq, 1000000);
>>> +
>>> +	/* watermarks are set at the borders of the corresponding
>>> OPPs */
>>> +	*lower = tegra_actmon_lower_freq(tegra, target_freq);
>>> +	*upper = tegra_actmon_upper_freq(tegra, target_freq);
>>> +
>>> +	*lower /= KHZ;
>>> +	*upper /= KHZ;
>>> +
>>> +	/*
>>> +	 * The upper watermark should take into account CPU's
>>> frequency
>>> +	 * because cpu_to_emc_rate() may override the target_freq
>>> with
>>> +	 * a higher value and thus upper watermark need to be set
>>> up
>>> +	 * accordingly to avoid parasitic upper-events.
>>> +	 */
>>> +	*upper = tegra_actmon_account_cpu_freq(tegra, dev, *upper);
>>> +
>>> +	*lower *= ACTMON_SAMPLING_PERIOD;
>>> +	*upper *= ACTMON_SAMPLING_PERIOD;
>>> +}
>>> +
>>>  static void tegra_devfreq_update_avg_wmark(struct tegra_devfreq
>>> *tegra, struct tegra_devfreq_device *dev)
>>>  {
>>> -	u32 avg = dev->avg_count;
>>> -	u32 avg_band_freq = tegra->max_freq *
>>> ACTMON_DEFAULT_AVG_BAND / KHZ;
>>> -	u32 band = avg_band_freq * ACTMON_SAMPLING_PERIOD;
>>> +	unsigned long lower, upper, freq;
>>>  
>>> -	device_writel(dev, avg + band, ACTMON_DEV_AVG_UPPER_WMARK);
>>> +	freq = dev->avg_count / ACTMON_SAMPLING_PERIOD * KHZ;
>>> +	tegra_actmon_get_lower_upper(tegra, dev, freq, &lower,
>>> &upper); 
>>> -	avg = max(dev->avg_count, band);
>>> -	device_writel(dev, avg - band, ACTMON_DEV_AVG_LOWER_WMARK);
>>> +	/*
>>> +	 * We want to get interrupts when MCCPU client crosses the
>>> +	 * dependency threshold in order to take into / out of
>>> account
>>> +	 * the CPU's freq.
>>> +	 */
>>> +	if (lower < dev->config->avg_dependency_threshold &&
>>> +	    upper > dev->config->avg_dependency_threshold) {
>>> +		if (dev->avg_count <
>>> dev->config->avg_dependency_threshold)
>>> +			upper =
>>> dev->config->avg_dependency_threshold;
>>> +		else
>>> +			lower =
>>> dev->config->avg_dependency_threshold;
>>> +	}
>>> +
>>> +	device_writel(dev, lower, ACTMON_DEV_AVG_LOWER_WMARK);
>>> +	device_writel(dev, upper, ACTMON_DEV_AVG_UPPER_WMARK);
>>>  }
>>>  
>>>  static void tegra_devfreq_update_wmark(struct tegra_devfreq *tegra,
>>> -				       struct tegra_devfreq_device
>>> *dev)
>>> +				       struct tegra_devfreq_device
>>> *dev,
>>> +				       unsigned long freq)
>>>  {
>>> -	u32 val = tegra->cur_freq * ACTMON_SAMPLING_PERIOD;
>>> +	unsigned long lower, upper, delta;
>>> +
>>> +	/*
>>> +	 * Boosting logic kicks-in once lower / upper watermark is
>>> hit.
>>> +	 * The watermarks are based on the updated EMC rate and the
>>> +	 * average activity.
>>> +	 *
>>> +	 * The higher watermark is set in accordance to the EMC
>>> rate
>>> +	 * because we want to set it to the highest mark here and
>>> EMC rate
>>> +	 * represents that mark. The consecutive-upper interrupts
>>> are
>>> +	 * always enabled and we don't want to receive them if
>>> they won't
>>> +	 * do anything useful, hence the upper watermark is capped
>>> to maximum.
>>> +	 * Note that the EMC rate is changed once boosting pushed
>>> the rate
>>> +	 * too high, in that case boosting-up will be stopped
>>> because
>>> +	 * upper watermark is much higher now and it is
>>> *important* to
>>> +	 * stop the unwanted interrupts.
>>> +	 */
>>> +	tegra_actmon_get_lower_upper(tegra, dev, freq - 1, &lower,
>>> &upper); +
>>> +	delta = do_percent(upper - lower,
>>> dev->config->boost_up_threshold);
>>> +	device_writel(dev, lower + delta, ACTMON_DEV_UPPER_WMARK);
>>>  
>>> -	device_writel(dev, do_percent(val,
>>> dev->config->boost_up_threshold),
>>> -		      ACTMON_DEV_UPPER_WMARK);
>>> +	/*
>>> +	 * Meanwhile the lower mark is based on the average value
>>> +	 * because it is the lowest possible consecutive-mark for
>>> this
>>> +	 * device. Once that mark is hit and boosting is stopped,
>>> the
>>> +	 * interrupt is disabled by ISR.
>>> +	 */
>>> +	freq = dev->avg_count / ACTMON_SAMPLING_PERIOD * KHZ;
>>> +	tegra_actmon_get_lower_upper(tegra, dev, freq, &lower,
>>> &upper); 
>>> -	device_writel(dev, do_percent(val,
>>> dev->config->boost_down_threshold),
>>> -		      ACTMON_DEV_LOWER_WMARK);
>>> +	delta = do_percent(upper - lower,
>>> dev->config->boost_down_threshold);
>>> +	device_writel(dev, lower + delta, ACTMON_DEV_LOWER_WMARK);
>>>  }
>>>  
>>>  static void actmon_isr_device(struct tegra_devfreq *tegra,
>>>  			      struct tegra_devfreq_device *dev)
>>>  {
>>> -	u32 intr_status, dev_ctrl;
>>> +	u32 intr_status, dev_ctrl, avg_intr_mask;
>>>  
>>>  	dev->avg_count = device_readl(dev, ACTMON_DEV_AVG_COUNT);
>>> -	tegra_devfreq_update_avg_wmark(tegra, dev);
>>> -
>>>  	intr_status = device_readl(dev, ACTMON_DEV_INTR_STATUS);
>>>  	dev_ctrl = device_readl(dev, ACTMON_DEV_CTRL);
>>>  
>>> +	avg_intr_mask = ACTMON_DEV_INTR_AVG_BELOW_WMARK |
>>> +			ACTMON_DEV_INTR_AVG_ABOVE_WMARK;
>>> +
>>> +	if (intr_status & avg_intr_mask)
>>> +		tegra_devfreq_update_avg_wmark(tegra, dev);
>>> +
>>>  	if (intr_status & ACTMON_DEV_INTR_CONSECUTIVE_UPPER) {
>>>  		/*
>>>  		 * new_boost = min(old_boost * up_coef + step,
>>> max_freq) @@ -253,8 +390,6 @@ static void actmon_isr_device(struct
>>> tegra_devfreq *tegra, 
>>>  		if (dev->boost_freq >= tegra->max_freq)
>>>  			dev->boost_freq = tegra->max_freq;
>>> -		else
>>> -			dev_ctrl |=
>>> ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_EN; } else if (intr_status
>>> & ACTMON_DEV_INTR_CONSECUTIVE_LOWER) { /*
>>>  		 * new_boost = old_boost * down_coef
>>> @@ -263,63 +398,37 @@ static void actmon_isr_device(struct
>>> tegra_devfreq *tegra, dev->boost_freq = do_percent(dev->boost_freq,
>>>  					     dev->config->boost_down_coeff);
>>>  
>>> -		dev_ctrl |=
>>> ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_EN; -
>>>  		if (dev->boost_freq < (ACTMON_BOOST_FREQ_STEP >>
>>> 1)) dev->boost_freq = 0;
>>> -		else
>>> -			dev_ctrl |=
>>> ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN; }
>>>  
>>> -	if (dev->config->avg_dependency_threshold) {
>>> -		if (dev->avg_count >=
>>> dev->config->avg_dependency_threshold)
>>> -			dev_ctrl |=
>>> ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN;
>>> -		else if (dev->boost_freq == 0)
>>> -			dev_ctrl &=
>>> ~ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN;
>>> +	if (intr_status & avg_intr_mask) {
>>> +		/*
>>> +		 * Once average watermark is hit, it means that
>>> the memory
>>> +		 * activity changed significantly and thus
>>> boosting-up shall
>>> +		 * be reset because EMC clock rate will be changed
>>> and
>>> +		 * boosting will restart in this case.
>>> +		 */
>>> +		dev->boost_freq = 0;
>>>  	}
>>>  
>>> -	device_writel(dev, dev_ctrl, ACTMON_DEV_CTRL);
>>> +	/* no boosting => no need for consecutive-down interrupt */
>>> +	if (dev->boost_freq == 0)
>>> +		dev_ctrl &=
>>> ~ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN; 
>>> +	device_writel(dev, dev_ctrl, ACTMON_DEV_CTRL);
>>>  	device_writel(dev, ACTMON_INTR_STATUS_CLEAR,
>>> ACTMON_DEV_INTR_STATUS); }
>>>  
>>> -static unsigned long actmon_cpu_to_emc_rate(struct tegra_devfreq
>>> *tegra,
>>> -					    unsigned long cpu_freq)
>>> -{
>>> -	unsigned int i;
>>> -	struct tegra_actmon_emc_ratio *ratio = actmon_emc_ratios;
>>> -
>>> -	for (i = 0; i < ARRAY_SIZE(actmon_emc_ratios); i++,
>>> ratio++) {
>>> -		if (cpu_freq >= ratio->cpu_freq) {
>>> -			if (ratio->emc_freq >= tegra->max_freq)
>>> -				return tegra->max_freq;
>>> -			else
>>> -				return ratio->emc_freq;
>>> -		}
>>> -	}
>>> -
>>> -	return 0;
>>> -}
>>> -
>>> -static void actmon_update_target(struct tegra_devfreq *tegra,
>>> -				 struct tegra_devfreq_device *dev)
>>> +static unsigned long actmon_update_target(struct tegra_devfreq
>>> *tegra,
>>> +					  struct
>>> tegra_devfreq_device *dev) {
>>> -	unsigned long cpu_freq = 0;
>>> -	unsigned long static_cpu_emc_freq = 0;
>>> -	unsigned int avg_sustain_coef;
>>> -
>>> -	if (dev->config->avg_dependency_threshold) {
>>> -		cpu_freq = cpufreq_get(0);
>>> -		static_cpu_emc_freq =
>>> actmon_cpu_to_emc_rate(tegra, cpu_freq);
>>> -	}
>>> +	unsigned long target_freq;
>>>  
>>> -	dev->target_freq = dev->avg_count / ACTMON_SAMPLING_PERIOD;
>>> -	avg_sustain_coef = 100 * 100 /
>>> dev->config->boost_up_threshold;
>>> -	dev->target_freq = do_percent(dev->target_freq,
>>> avg_sustain_coef);
>>> -	dev->target_freq += dev->boost_freq;
>>> +	target_freq = dev->avg_count / ACTMON_SAMPLING_PERIOD +
>>> dev->boost_freq;
>>> +	target_freq = tegra_actmon_account_cpu_freq(tegra, dev,
>>> target_freq); 
>>> -	if (dev->avg_count >=
>>> dev->config->avg_dependency_threshold)
>>> -		dev->target_freq = max(dev->target_freq,
>>> static_cpu_emc_freq);
>>> +	return target_freq;
>>>  }
>>>  
>>>  static irqreturn_t actmon_thread_isr(int irq, void *data)
>>> @@ -351,8 +460,8 @@ static int tegra_actmon_rate_notify_cb(struct
>>> notifier_block *nb, unsigned long action, void *ptr)
>>>  {
>>>  	struct clk_notifier_data *data = ptr;
>>> -	struct tegra_devfreq *tegra;
>>>  	struct tegra_devfreq_device *dev;
>>> +	struct tegra_devfreq *tegra;
>>>  	unsigned int i;
>>>  
>>>  	if (action != POST_RATE_CHANGE)
>>> @@ -360,12 +469,28 @@ static int tegra_actmon_rate_notify_cb(struct
>>> notifier_block *nb, 
>>>  	tegra = container_of(nb, struct tegra_devfreq,
>>> rate_change_nb); 
>>> -	tegra->cur_freq = data->new_rate / KHZ;
>>> -
>>> +	/*
>>> +	 * EMC rate could change due to three reasons:
>>> +	 *
>>> +	 *    1. Average watermark hit
>>> +	 *    2. Boosting overflow
>>> +	 *    3. CPU freq change
>>> +	 *
>>> +	 * Once rate is changed, the consecutive watermarks need
>>> to be
>>> +	 * updated in order for boosting to work properly and to
>>> avoid
>>> +	 * unnecessary interrupts. Note that the consecutive range
>>> is set for
>>> +	 * all of devices using the same rate, hence if CPU is
>>> doing much
>>> +	 * less than the other memory clients, then its upper
>>> watermark will
>>> +	 * be very high in comparison to the actual activity
>>> (lower watermark)
>>> +	 * and thus unnecessary upper-interrupts will be
>>> suppressed.
>>> +	 *
>>> +	 * The average watermarks also should be updated because
>>> of 3.
>>> +	 */
>>>  	for (i = 0; i < ARRAY_SIZE(tegra->devices); i++) {
>>>  		dev = &tegra->devices[i];
>>>  
>>> -		tegra_devfreq_update_wmark(tegra, dev);
>>> +		tegra_devfreq_update_avg_wmark(tegra, dev);
>>> +		tegra_devfreq_update_wmark(tegra, dev,
>>> data->new_rate); }
>>>  
>>>  	return NOTIFY_OK;
>>> @@ -374,15 +499,14 @@ static int tegra_actmon_rate_notify_cb(struct
>>> notifier_block *nb, static void
>>> tegra_actmon_configure_device(struct tegra_devfreq *tegra, struct
>>> tegra_devfreq_device *dev) {
>>> -	u32 val = 0;
>>> -
>>> -	dev->target_freq = tegra->cur_freq;
>>> +	u32 val = 0, target_freq;
>>>  
>>> -	dev->avg_count = tegra->cur_freq * ACTMON_SAMPLING_PERIOD;
>>> +	target_freq = clk_get_rate(tegra->emc_clock) / KHZ;
>>> +	dev->avg_count = target_freq * ACTMON_SAMPLING_PERIOD;
>>>  	device_writel(dev, dev->avg_count, ACTMON_DEV_INIT_AVG);
>>>  
>>>  	tegra_devfreq_update_avg_wmark(tegra, dev);
>>> -	tegra_devfreq_update_wmark(tegra, dev);
>>> +	tegra_devfreq_update_wmark(tegra, dev, target_freq);
>>>  
>>>  	device_writel(dev, ACTMON_COUNT_WEIGHT,
>>> ACTMON_DEV_COUNT_WEIGHT); device_writel(dev,
>>> ACTMON_INTR_STATUS_CLEAR, ACTMON_DEV_INTR_STATUS); @@ -469,13
>>> +593,13 @@ static int tegra_devfreq_get_dev_status(struct device
>>> *dev, struct tegra_devfreq_device *actmon_dev; unsigned long
>>> cur_freq; 
>>> -	cur_freq = READ_ONCE(tegra->cur_freq);
>>> +	cur_freq = clk_get_rate(tegra->emc_clock);
>>>  
>>>  	/* To be used by the tegra governor */
>>>  	stat->private_data = tegra;
>>>  
>>>  	/* The below are to be used by the other governors */
>>> -	stat->current_frequency = cur_freq * KHZ;
>>> +	stat->current_frequency = cur_freq;
>>>  
>>>  	actmon_dev = &tegra->devices[MCALL];
>>>  
>>> @@ -486,7 +610,7 @@ static int tegra_devfreq_get_dev_status(struct
>>> device *dev, stat->busy_time *= 100 / BUS_SATURATION_RATIO;
>>>  
>>>  	/* Number of cycles in a sampling period */
>>> -	stat->total_time = ACTMON_SAMPLING_PERIOD * cur_freq;
>>> +	stat->total_time = cur_freq / KHZ * ACTMON_SAMPLING_PERIOD;
>>>  
>>>  	stat->busy_time = min(stat->busy_time, stat->total_time);
>>>  
>>> @@ -505,6 +629,7 @@ static int tegra_governor_get_target(struct
>>> devfreq *devfreq, struct devfreq_dev_status *stat;
>>>  	struct tegra_devfreq *tegra;
>>>  	struct tegra_devfreq_device *dev;
>>> +	unsigned long dev_target_freq;
>>>  	unsigned long target_freq = 0;
>>>  	unsigned int i;
>>>  	int err;
>>> @@ -520,9 +645,9 @@ static int tegra_governor_get_target(struct
>>> devfreq *devfreq, for (i = 0; i < ARRAY_SIZE(tegra->devices); i++) {
>>>  		dev = &tegra->devices[i];
>>>  
>>> -		actmon_update_target(tegra, dev);
>>> +		dev_target_freq = actmon_update_target(tegra, dev);
>>>  
>>> -		target_freq = max(target_freq, dev->target_freq);
>>> +		target_freq = max(target_freq, dev_target_freq);
>>>  	}
>>>  
>>>  	*freq = target_freq * KHZ;
>>> @@ -642,7 +767,6 @@ static int tegra_devfreq_probe(struct
>>> platform_device *pdev) return rate;
>>>  	}
>>>  
>>> -	tegra->cur_freq = clk_get_rate(tegra->emc_clock) / KHZ;
>>>  	tegra->max_freq = rate / KHZ;
>>>  
>>>  	for (i = 0; i < ARRAY_SIZE(actmon_device_configs); i++) {
>>> @@ -671,7 +795,8 @@ static int tegra_devfreq_probe(struct
>>> platform_device *pdev) platform_set_drvdata(pdev, tegra);
>>>  
>>>  	tegra->rate_change_nb.notifier_call =
>>> tegra_actmon_rate_notify_cb;
>>> -	err = clk_notifier_register(tegra->emc_clock,
>>> &tegra->rate_change_nb);
>>> +	err = clk_notifier_register(tegra->emc_clock,
>>> +				    &tegra->rate_change_nb);
>>>  	if (err) {
>>>  		dev_err(&pdev->dev,
>>>  			"Failed to register rate change
>>> notifier\n"); 
>>
>>
>> Maybe, it is possible to merge patch4/patch19/patch20 to one patch.
> 
> All these three patches are completely separate changes, thus they
> should be kept separate.
> 

I replied on patch19 why it is possible to merge patch5 and patch19.
Please check my comment.


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply

* Re: [PATCH v4 12/24] PM / devfreq: tegra30: Inline all one-line functions
From: Chanwoo Choi @ 2019-07-19  1:27 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <92f82420-5c50-468f-a403-7b4c36958076@samsung.com>

On 19. 7. 19. 오전 10:24, Chanwoo Choi wrote:
> On 19. 7. 19. 오전 10:22, Dmitry Osipenko wrote:
>> В Thu, 18 Jul 2019 18:09:05 +0900
>> Chanwoo Choi <cw00.choi@samsung.com> пишет:
>>
>>> On 19. 7. 16. 오후 10:35, Dmitry Osipenko wrote:
>>>> 16.07.2019 15:26, Chanwoo Choi пишет:  
>>>>> Hi Dmitry,
>>>>>
>>>>> I'm not sure that it is necessary.
>>>>> As I knew, usally, the 'inline' is used on header file
>>>>> to define the empty functions.
>>>>>
>>>>> Do we have to change it with 'inline' keyword?  
>>>>
>>>> The 'inline' attribute tells compiler that instead of jumping into
>>>> the function, it should take the function's code and replace the
>>>> function's invocation with that code. This is done in order to help
>>>> compiler optimize code properly, please see [1]. There is
>>>> absolutely no need to create a function call into a function that
>>>> consists of a single instruction.
>>>>
>>>> [1] https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Inline.html
>>>>   
>>>
>>> If you want to add 'inline' keyword, I recommend that 
>>> you better to remove the modified function in this patch
>>> and then just call the 'write_relaxed or read_relaxed' function
>>> directly. It is same result when using inline keyword.
>>
>> That could be done, but it makes code less readable.
>>
>> See the difference:
>>
>> device_writel(dev, ACTMON_INTR_STATUS_CLEAR, ACTMON_DEV_INTR_STATUS);
>>
>> writel_relaxed(ACTMON_INTR_STATUS_CLEAR,
>> 	       dev->regs + ACTMON_DEV_INTR_STATUS);
> 
> No problem if you add the detailed comment and you want to use
> the 'inline' keyword.

Basically, I think that 'inline' keyword is not necessary.
But if you want to use 'inline' keyword, I recommend
that call the 'write_relaxed or read_relaxed' function directly
with detailed description. 

> 
>>
>>
>>
> 
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply

* Re: [PATCH v4 12/24] PM / devfreq: tegra30: Inline all one-line functions
From: Chanwoo Choi @ 2019-07-19  1:24 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <20190719042251.37cc9cda@dimatab>

On 19. 7. 19. 오전 10:22, Dmitry Osipenko wrote:
> В Thu, 18 Jul 2019 18:09:05 +0900
> Chanwoo Choi <cw00.choi@samsung.com> пишет:
> 
>> On 19. 7. 16. 오후 10:35, Dmitry Osipenko wrote:
>>> 16.07.2019 15:26, Chanwoo Choi пишет:  
>>>> Hi Dmitry,
>>>>
>>>> I'm not sure that it is necessary.
>>>> As I knew, usally, the 'inline' is used on header file
>>>> to define the empty functions.
>>>>
>>>> Do we have to change it with 'inline' keyword?  
>>>
>>> The 'inline' attribute tells compiler that instead of jumping into
>>> the function, it should take the function's code and replace the
>>> function's invocation with that code. This is done in order to help
>>> compiler optimize code properly, please see [1]. There is
>>> absolutely no need to create a function call into a function that
>>> consists of a single instruction.
>>>
>>> [1] https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Inline.html
>>>   
>>
>> If you want to add 'inline' keyword, I recommend that 
>> you better to remove the modified function in this patch
>> and then just call the 'write_relaxed or read_relaxed' function
>> directly. It is same result when using inline keyword.
> 
> That could be done, but it makes code less readable.
> 
> See the difference:
> 
> device_writel(dev, ACTMON_INTR_STATUS_CLEAR, ACTMON_DEV_INTR_STATUS);
> 
> writel_relaxed(ACTMON_INTR_STATUS_CLEAR,
> 	       dev->regs + ACTMON_DEV_INTR_STATUS);

No problem if you add the detailed comment and you want to use
the 'inline' keyword.

> 
> 
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply

* Re: [PATCH v4 11/24] PM / devfreq: tegra30: Add debug messages
From: Chanwoo Choi @ 2019-07-19  1:22 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <20190719041357.0a80a2dc@dimatab>

On 19. 7. 19. 오전 10:13, Dmitry Osipenko wrote:
> В Thu, 18 Jul 2019 18:07:05 +0900
> Chanwoo Choi <cw00.choi@samsung.com> пишет:
> 
>> On 19. 7. 18. 오전 12:46, Dmitry Osipenko wrote:
>>> 17.07.2019 9:45, Chanwoo Choi пишет:  
>>>> On 19. 7. 16. 오후 10:26, Dmitry Osipenko wrote:  
>>>>> 16.07.2019 15:23, Chanwoo Choi пишет:  
>>>>>> Hi Dmitry,
>>>>>>
>>>>>> Usually, the kernel log print for all users
>>>>>> such as changing the frequency, fail or success.
>>>>>>
>>>>>> But, if the log just show the register dump,
>>>>>> it is not useful for all users. It is just used
>>>>>> for only specific developer.
>>>>>>
>>>>>> I recommend that you better to add more exception handling
>>>>>> code on many points instead of just showing the register dump.  
>>>>>
>>>>> The debug messages are not users, but for developers. Yes, I
>>>>> primarily made the debugging to be useful for myself and will be
>>>>> happy to change the way debugging is done if there will be any
>>>>> other active developer for this driver. The registers dump is
>>>>> more than enough in order to understand what's going on, I don't
>>>>> see any real need to change anything here for now.  
>>>>
>>>> Basically, we have to develop code and add the log for anyone.
>>>> As you commented, even if there are no other developer, we never
>>>> guarantee this assumption forever. And also, if added debug message
>>>> for only you, you can add them when testing it temporarily.
>>>>
>>>> If you want to add the just register dump log for you,
>>>> I can't agree. Once again, I hope that anyone understand
>>>> the meaning of debug message as much possible as.
>>>>  
>>>
>>> The registers dump should be good for everyone because it's a
>>> self-explanatory information for anyone who is familiar with the
>>> hardware. I don't think there is a need for anything else than what
>>> is proposed in this patch, at least for now. I also simply don't
>>> see any other better way to debug the state of this particular
>>> hardware, again this logging is for the driver developers and not
>>> for users.
>>>
>>> Initially, I was temporarily adding the debug messages. Now they are
>>> pretty much mandatory for verifying that driver is working
>>> properly. And of course the debugging messages got into the shape
>>> of this patch after several iterations of refinements. So again, I
>>> suppose that this should be good enough for everyone who is
>>> familiar with the hardware. And of course I'm open to the
>>> constructive suggestions, the debugging aid is not an ABI and could
>>> be changed/improved at any time.
>>>
>>> You're suggesting to break down the debugging into several smaller
>>> pieces, but I'm finding that as not a constructive suggestion
>>> because the information about the full hardware state is actually
>>> necessary for the productive debugging.
>>>
>>>   
>>
>> Sorry for that as I saie, I cannot agree this patch. In my case,
>> I don't understand what is meaning of register dump of this patch.
>> I knew that just register dump are useful for real developer.
> 
> It's not only a registers dump, as you may see there is also a dump of
> other properties like boosting value, OPPs selection and etc.
> 
> It looks to me that you're also missing important detail that debug
> messages are compiled out unless DEBUG is defined for the drivers
> build. So in order to get the debug message a user shall explicitly add
> #define DEBUG macro to the code or enable debug messages globally in
> the kernel's config. There is also an option for dynamic debug messages
> in the kernel, but it doesn't matter now because all these messages are
> turned into tracepoints later in the patch #17.


Right. But, this patch could not the split up between register dump and others.
As I said repeatly, I hope to add the log that anyone can understand. 


> 
>> If you want to show the register dump, you better to add some feature
>> with debugfs for devfreq framework in order to read the register dump.
>> As I knew, sound framework (alsa) has the similar feature for checking
>> the register dump.
>>
> 
> The intent was to have an option for dynamic debugging of the driver and
> initially debug messages were good enough, but then it became not enough
> and hence the debug messages were turned into tracepoints in the patch
> #17. Would it be acceptable to squash this patch and #17?
> 
> 
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply

* Re: [PATCH v4 12/24] PM / devfreq: tegra30: Inline all one-line functions
From: Dmitry Osipenko @ 2019-07-19  1:22 UTC (permalink / raw)
  To: Chanwoo Choi
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <45621f73-2f86-cde7-a92e-2a34810b9c05@samsung.com>

В Thu, 18 Jul 2019 18:09:05 +0900
Chanwoo Choi <cw00.choi@samsung.com> пишет:

> On 19. 7. 16. 오후 10:35, Dmitry Osipenko wrote:
> > 16.07.2019 15:26, Chanwoo Choi пишет:  
> >> Hi Dmitry,
> >>
> >> I'm not sure that it is necessary.
> >> As I knew, usally, the 'inline' is used on header file
> >> to define the empty functions.
> >>
> >> Do we have to change it with 'inline' keyword?  
> > 
> > The 'inline' attribute tells compiler that instead of jumping into
> > the function, it should take the function's code and replace the
> > function's invocation with that code. This is done in order to help
> > compiler optimize code properly, please see [1]. There is
> > absolutely no need to create a function call into a function that
> > consists of a single instruction.
> > 
> > [1] https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Inline.html
> >   
> 
> If you want to add 'inline' keyword, I recommend that 
> you better to remove the modified function in this patch
> and then just call the 'write_relaxed or read_relaxed' function
> directly. It is same result when using inline keyword.

That could be done, but it makes code less readable.

See the difference:

device_writel(dev, ACTMON_INTR_STATUS_CLEAR, ACTMON_DEV_INTR_STATUS);

writel_relaxed(ACTMON_INTR_STATUS_CLEAR,
	       dev->regs + ACTMON_DEV_INTR_STATUS);


^ permalink raw reply

* Re: [PATCH v4 19/24] PM / devfreq: tegra30: Optimize upper consecutive watermark selection
From: Chanwoo Choi @ 2019-07-19  1:17 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <e45a250c-1860-ca86-4b60-4896166752a8@samsung.com>

On 19. 7. 19. 오전 10:15, Chanwoo Choi wrote:
> On 19. 7. 19. 오전 9:40, Dmitry Osipenko wrote:
>> В Thu, 18 Jul 2019 18:51:02 +0900
>> Chanwoo Choi <cw00.choi@samsung.com> пишет:
>>
>>> On 19. 7. 8. 오전 7:32, Dmitry Osipenko wrote:
>>>> The memory activity counter may get a bit higher than a watermark
>>>> which is selected based on OPP that corresponds to a highest EMC
>>>> rate, in this case watermark is lower than the actual memory
>>>> activity is and thus results in unwanted "upper" interrupts.
>>>>
>>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>>>> ---
>>>>  drivers/devfreq/tegra30-devfreq.c | 13 ++++++++++++-
>>>>  1 file changed, 12 insertions(+), 1 deletion(-)  
>>>
>>> It seems that you can combine patch19 with patch20.
>>
>> No, consecutive and average watermarks are different things that have
>> different purposes. Consecutive are used for boosting, while average
>> are for significant memory bandwidth changes.
>>
>>>>
>>>> diff --git a/drivers/devfreq/tegra30-devfreq.c
>>>> b/drivers/devfreq/tegra30-devfreq.c index
>>>> 8d6bf6e9f1ae..c3cf87231d25 100644 ---
>>>> a/drivers/devfreq/tegra30-devfreq.c +++
>>>> b/drivers/devfreq/tegra30-devfreq.c @@ -363,7 +363,18 @@ static
>>>> void tegra_devfreq_update_wmark(struct tegra_devfreq *tegra,
>>>> tegra_actmon_get_lower_upper(tegra, dev, freq - 1, &lower, &upper); 
>>>>  	delta = do_percent(upper - lower,
>>>> dev->config->boost_up_threshold);
>>>> -	device_writel(dev, lower + delta, ACTMON_DEV_UPPER_WMARK);
> 
> This line was added on patch5 and then this line is removed on this patch.
> It is wrong method to make the patch in the same patchset.
> 
> It is enough to reduce the inefficient add/remove code in the same patchset.

I'm missing some expression. So, I update my comment as following:
It is possible to reduce the inefficient add/remove code in the same patchset
if you combine the patches.

> 
> Have to merge this patch to patch5.
> 
>>>> +
>>>> +	/*
>>>> +	 * The memory events count could go a bit higher than the
>>>> maximum
>>>> +	 * defined by the OPPs, hence make the upper watermark
>>>> infinitely
>>>> +	 * high to avoid unnecessary upper interrupts in that case.
>>>> +	 */
>>>> +	if (freq == tegra->max_freq)
>>>> +		upper = ULONG_MAX;
>>>> +	else
>>>> +		upper = lower + delta;
>>>> +
>>>> +	device_writel(dev, upper, ACTMON_DEV_UPPER_WMARK);
>>>>  
>>>>  	/*
>>>>  	 * Meanwhile the lower mark is based on the average value
>>>>   
>>>
>>>
>>
>>
>>
> 
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply

* Re: [PATCH v4 19/24] PM / devfreq: tegra30: Optimize upper consecutive watermark selection
From: Chanwoo Choi @ 2019-07-19  1:15 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <20190719034025.67c82123@dimatab>

On 19. 7. 19. 오전 9:40, Dmitry Osipenko wrote:
> В Thu, 18 Jul 2019 18:51:02 +0900
> Chanwoo Choi <cw00.choi@samsung.com> пишет:
> 
>> On 19. 7. 8. 오전 7:32, Dmitry Osipenko wrote:
>>> The memory activity counter may get a bit higher than a watermark
>>> which is selected based on OPP that corresponds to a highest EMC
>>> rate, in this case watermark is lower than the actual memory
>>> activity is and thus results in unwanted "upper" interrupts.
>>>
>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>>> ---
>>>  drivers/devfreq/tegra30-devfreq.c | 13 ++++++++++++-
>>>  1 file changed, 12 insertions(+), 1 deletion(-)  
>>
>> It seems that you can combine patch19 with patch20.
> 
> No, consecutive and average watermarks are different things that have
> different purposes. Consecutive are used for boosting, while average
> are for significant memory bandwidth changes.
> 
>>>
>>> diff --git a/drivers/devfreq/tegra30-devfreq.c
>>> b/drivers/devfreq/tegra30-devfreq.c index
>>> 8d6bf6e9f1ae..c3cf87231d25 100644 ---
>>> a/drivers/devfreq/tegra30-devfreq.c +++
>>> b/drivers/devfreq/tegra30-devfreq.c @@ -363,7 +363,18 @@ static
>>> void tegra_devfreq_update_wmark(struct tegra_devfreq *tegra,
>>> tegra_actmon_get_lower_upper(tegra, dev, freq - 1, &lower, &upper); 
>>>  	delta = do_percent(upper - lower,
>>> dev->config->boost_up_threshold);
>>> -	device_writel(dev, lower + delta, ACTMON_DEV_UPPER_WMARK);

This line was added on patch5 and then this line is removed on this patch.
It is wrong method to make the patch in the same patchset.

It is enough to reduce the inefficient add/remove code in the same patchset.

Have to merge this patch to patch5.

>>> +
>>> +	/*
>>> +	 * The memory events count could go a bit higher than the
>>> maximum
>>> +	 * defined by the OPPs, hence make the upper watermark
>>> infinitely
>>> +	 * high to avoid unnecessary upper interrupts in that case.
>>> +	 */
>>> +	if (freq == tegra->max_freq)
>>> +		upper = ULONG_MAX;
>>> +	else
>>> +		upper = lower + delta;
>>> +
>>> +	device_writel(dev, upper, ACTMON_DEV_UPPER_WMARK);
>>>  
>>>  	/*
>>>  	 * Meanwhile the lower mark is based on the average value
>>>   
>>
>>
> 
> 
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply

* Re: [PATCH v4 11/24] PM / devfreq: tegra30: Add debug messages
From: Dmitry Osipenko @ 2019-07-19  1:13 UTC (permalink / raw)
  To: Chanwoo Choi
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <f630dacc-2065-a12d-bd03-1fc6c4363e1f@samsung.com>

В Thu, 18 Jul 2019 18:07:05 +0900
Chanwoo Choi <cw00.choi@samsung.com> пишет:

> On 19. 7. 18. 오전 12:46, Dmitry Osipenko wrote:
> > 17.07.2019 9:45, Chanwoo Choi пишет:  
> >> On 19. 7. 16. 오후 10:26, Dmitry Osipenko wrote:  
> >>> 16.07.2019 15:23, Chanwoo Choi пишет:  
> >>>> Hi Dmitry,
> >>>>
> >>>> Usually, the kernel log print for all users
> >>>> such as changing the frequency, fail or success.
> >>>>
> >>>> But, if the log just show the register dump,
> >>>> it is not useful for all users. It is just used
> >>>> for only specific developer.
> >>>>
> >>>> I recommend that you better to add more exception handling
> >>>> code on many points instead of just showing the register dump.  
> >>>
> >>> The debug messages are not users, but for developers. Yes, I
> >>> primarily made the debugging to be useful for myself and will be
> >>> happy to change the way debugging is done if there will be any
> >>> other active developer for this driver. The registers dump is
> >>> more than enough in order to understand what's going on, I don't
> >>> see any real need to change anything here for now.  
> >>
> >> Basically, we have to develop code and add the log for anyone.
> >> As you commented, even if there are no other developer, we never
> >> guarantee this assumption forever. And also, if added debug message
> >> for only you, you can add them when testing it temporarily.
> >>
> >> If you want to add the just register dump log for you,
> >> I can't agree. Once again, I hope that anyone understand
> >> the meaning of debug message as much possible as.
> >>  
> > 
> > The registers dump should be good for everyone because it's a
> > self-explanatory information for anyone who is familiar with the
> > hardware. I don't think there is a need for anything else than what
> > is proposed in this patch, at least for now. I also simply don't
> > see any other better way to debug the state of this particular
> > hardware, again this logging is for the driver developers and not
> > for users.
> > 
> > Initially, I was temporarily adding the debug messages. Now they are
> > pretty much mandatory for verifying that driver is working
> > properly. And of course the debugging messages got into the shape
> > of this patch after several iterations of refinements. So again, I
> > suppose that this should be good enough for everyone who is
> > familiar with the hardware. And of course I'm open to the
> > constructive suggestions, the debugging aid is not an ABI and could
> > be changed/improved at any time.
> > 
> > You're suggesting to break down the debugging into several smaller
> > pieces, but I'm finding that as not a constructive suggestion
> > because the information about the full hardware state is actually
> > necessary for the productive debugging.
> > 
> >   
> 
> Sorry for that as I saie, I cannot agree this patch. In my case,
> I don't understand what is meaning of register dump of this patch.
> I knew that just register dump are useful for real developer.

It's not only a registers dump, as you may see there is also a dump of
other properties like boosting value, OPPs selection and etc.

It looks to me that you're also missing important detail that debug
messages are compiled out unless DEBUG is defined for the drivers
build. So in order to get the debug message a user shall explicitly add
#define DEBUG macro to the code or enable debug messages globally in
the kernel's config. There is also an option for dynamic debug messages
in the kernel, but it doesn't matter now because all these messages are
turned into tracepoints later in the patch #17.

> If you want to show the register dump, you better to add some feature
> with debugfs for devfreq framework in order to read the register dump.
> As I knew, sound framework (alsa) has the similar feature for checking
> the register dump.
> 

The intent was to have an option for dynamic debugging of the driver and
initially debug messages were good enough, but then it became not enough
and hence the debug messages were turned into tracepoints in the patch
#17. Would it be acceptable to squash this patch and #17?


^ permalink raw reply

* Re: [PATCH v4 18/24] PM / devfreq: tegra30: Optimize CPUFreq notifier
From: Chanwoo Choi @ 2019-07-19  1:09 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <20190719034258.651a9c06@dimatab>

On 19. 7. 19. 오전 9:42, Dmitry Osipenko wrote:
> В Thu, 18 Jul 2019 18:48:42 +0900
> Chanwoo Choi <cw00.choi@samsung.com> пишет:
> 
>> On 19. 7. 8. 오전 7:32, Dmitry Osipenko wrote:
>>> When CPU's memory activity is low or memory activity is high such
>>> that CPU's frequency contribution to the boosting is not taken into
>>> account, then there is no need to schedule devfreq's update. This
>>> eliminates unnecessary CPU activity during of idling caused by the
>>> scheduled work.
>>>
>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>>> ---
>>>  drivers/devfreq/tegra30-devfreq.c | 73
>>> +++++++++++++++++++++++++++---- 1 file changed, 64 insertions(+), 9
>>> deletions(-)  
>>
>> Patch4 add the 'cpufreq notifier' and this patch optimize the cpufreq
>> notifier. I think t hat you can combine two patches.
> 
> I'd prefer to keep them separate for a sake of git bisection.

Sorry, patch7 instead of patch4.

Patch7 made the 'tegra_actmon_cpu_notify_cb()' function
and this patch makes 'tegra_actmon_cpufreq_contribution' function
which is only called in the 'tegra_actmon_cpu_notify_cb()' function.

It is enough to make them as the only one patch related to
the cpu notifier. As I replied on patch17, you can merge
the patch if the patches has some relationship in this patchset.

> 
>>>
>>> diff --git a/drivers/devfreq/tegra30-devfreq.c
>>> b/drivers/devfreq/tegra30-devfreq.c index
>>> 43c9c5fbfe91..8d6bf6e9f1ae 100644 ---
>>> a/drivers/devfreq/tegra30-devfreq.c +++
>>> b/drivers/devfreq/tegra30-devfreq.c @@ -216,10 +216,10 @@ static
>>> inline unsigned long do_percent(unsigned long val, unsigned int
>>> pct) return val * pct / 100; }
>>>  
>>> -static unsigned long actmon_cpu_to_emc_rate(struct tegra_devfreq
>>> *tegra) +static unsigned long actmon_cpu_to_emc_rate(struct
>>> tegra_devfreq *tegra,
>>> +					    unsigned int cpu_freq)
>>>  {
>>>  	const struct tegra_actmon_emc_ratio *ratio =
>>> actmon_emc_ratios;
>>> -	unsigned int cpu_freq = cpufreq_get(0);
>>>  	unsigned int i;
>>>  
>>>  	for (i = 0; i < ARRAY_SIZE(actmon_emc_ratios); i++,
>>> ratio++) { @@ -239,15 +239,15 @@
>>> tegra_actmon_account_cpu_freq(struct tegra_devfreq *tegra, struct
>>> tegra_devfreq_device *dev, unsigned long target_freq)
>>>  {
>>> -	unsigned long static_cpu_emc_freq;
>>> +	unsigned long cpu_emc_freq = 0;
>>>  
>>> -	if (dev->config->avg_dependency_threshold &&
>>> -	    dev->config->avg_dependency_threshold < dev->avg_freq)
>>> {
>>> -		static_cpu_emc_freq =
>>> actmon_cpu_to_emc_rate(tegra);
>>> -		target_freq = max(target_freq,
>>> static_cpu_emc_freq);
>>> -	}
>>> +	if (!dev->config->avg_dependency_threshold)
>>> +		return target_freq;
>>>  
>>> -	return target_freq;
>>> +	if (dev->avg_freq > dev->config->avg_dependency_threshold)
>>> +		cpu_emc_freq = actmon_cpu_to_emc_rate(tegra,
>>> cpufreq_get(0)); +
>>> +	return max(target_freq, cpu_emc_freq);
>>>  }
>>>  
>>>  static unsigned long tegra_actmon_lower_freq(struct tegra_devfreq
>>> *tegra, @@ -531,16 +531,71 @@ static void
>>> tegra_actmon_delayed_update(struct work_struct *work)
>>> mutex_unlock(&tegra->devfreq->lock); }
>>>  
>>> +static unsigned long
>>> +tegra_actmon_cpufreq_contribution(struct tegra_devfreq *tegra,
>>> +				  unsigned int cpu_freq)
>>> +{
>>> +	unsigned long freq, static_cpu_emc_freq;
>>> +
>>> +	/* check whether CPU's freq is taken into account at all */
>>> +	if (tegra->devices[MCCPU].avg_freq <=
>>> +	    tegra->devices[MCCPU].config->avg_dependency_threshold)
>>> +		return 0;
>>> +
>>> +	static_cpu_emc_freq = actmon_cpu_to_emc_rate(tegra,
>>> cpu_freq); +
>>> +	/* compare static CPU-EMC freq with MCALL */
>>> +	freq = tegra->devices[MCALL].avg_freq +
>>> +	       tegra->devices[MCALL].boost_freq;
>>> +
>>> +	freq = tegra_actmon_upper_freq(tegra, freq);
>>> +
>>> +	if (freq == tegra->max_freq || freq >= static_cpu_emc_freq)
>>> +		return 0;
>>> +
>>> +	/* compare static CPU-EMC freq with MCCPU */
>>> +	freq = tegra->devices[MCCPU].avg_freq +
>>> +	       tegra->devices[MCCPU].boost_freq;
>>> +
>>> +	freq = tegra_actmon_upper_freq(tegra, freq);
>>> +
>>> +	if (freq == tegra->max_freq || freq >= static_cpu_emc_freq)
>>> +		return 0;
>>> +
>>> +	return static_cpu_emc_freq;
>>> +}
>>> +
>>>  static int tegra_actmon_cpu_notify_cb(struct notifier_block *nb,
>>>  				      unsigned long action, void
>>> *ptr) {
>>> +	struct cpufreq_freqs *freqs = ptr;
>>>  	struct tegra_devfreq *tegra;
>>> +	unsigned long old, new;
>>>  
>>>  	if (action != CPUFREQ_POSTCHANGE)
>>>  		return NOTIFY_OK;
>>>  
>>>  	tegra = container_of(nb, struct tegra_devfreq,
>>> cpu_rate_change_nb); 
>>> +	/*
>>> +	 * Quickly check whether CPU frequency should be taken
>>> into account
>>> +	 * at all, without blocking CPUFreq's core.
>>> +	 */
>>> +	if (mutex_trylock(&tegra->devfreq->lock)) {
>>> +		old = tegra_actmon_cpufreq_contribution(tegra,
>>> freqs->old);
>>> +		new = tegra_actmon_cpufreq_contribution(tegra,
>>> freqs->new);
>>> +		mutex_unlock(&tegra->devfreq->lock);
>>> +
>>> +		/*
>>> +		 * If CPU's frequency shouldn't be taken into
>>> account at
>>> +		 * the moment, then there is no need to update the
>>> devfreq's
>>> +		 * state because ISR will re-check CPU's frequency
>>> on the
>>> +		 * next interrupt.
>>> +		 */
>>> +		if (old == new)
>>> +			return NOTIFY_OK;
>>> +	}
>>> +
>>>  	/*
>>>  	 * CPUFreq driver should support
>>> CPUFREQ_ASYNC_NOTIFICATION in order
>>>  	 * to allow asynchronous notifications. This means we
>>> can't block 
>>
>>
> 
> 
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply

* Re: [PATCH v4 17/24] PM / devfreq: tegra30: Use tracepoints for debugging
From: Chanwoo Choi @ 2019-07-19  1:01 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <20190719034938.6382f989@dimatab>

On 19. 7. 19. 오전 9:49, Dmitry Osipenko wrote:
> В Thu, 18 Jul 2019 18:47:09 +0900
> Chanwoo Choi <cw00.choi@samsung.com> пишет:
> 
>> On 19. 7. 8. 오전 7:32, Dmitry Osipenko wrote:
>>> Debug messages create too much CPU and memory activity by
>>> themselves, so it's difficult to debug lower rates and catch
>>> unwanted interrupts that happen rarely. Tracepoints are ideal in
>>> that regards because they do not contribute to the sampled date at
>>> all. This allowed me to catch few problems which are fixed by the
>>> followup patches, without tracepoints it would be much harder to do.
>>>
>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>>> ---
>>>  drivers/devfreq/tegra30-devfreq.c      |  43 +++-------
>>>  include/trace/events/tegra30_devfreq.h | 105
>>> +++++++++++++++++++++++++ 2 files changed, 117 insertions(+), 31
>>> deletions(-) create mode 100644
>>> include/trace/events/tegra30_devfreq.h  
>>
>> As I knew, 'include/trace/events' don't include the header file
>> for only one device driver. Usually, the trace event is provided
>> by framework instead of each devic driver.
> 
> There are at least trace headers there for the tegra-apbdma,
> tegra-host1x, intel-sst and intel-ish devices. I don't think that there
> is a strict rule for the trace headers placement.

OK.

But, As I already replied on patch4, if you want to show the register dump,
you better to add the debugfs feature to devfreq framework for showing
the register dump instead of printing the register dump with debug level
and this trace event. 

As I said, just register dump is not useful for all developers.
Almost developer cannot understand the meaning of debug log for register dump.

Also, it is not proper way that front patch adds the some code
and then later patch removes the additional code in the same series.
Before sending the patches, you can renew them.

> 
>>>
>>> diff --git a/drivers/devfreq/tegra30-devfreq.c
>>> b/drivers/devfreq/tegra30-devfreq.c index
>>> 6ebf0f505767..43c9c5fbfe91 100644 ---
>>> a/drivers/devfreq/tegra30-devfreq.c +++
>>> b/drivers/devfreq/tegra30-devfreq.c @@ -19,6 +19,9 @@
>>>  #include <linux/reset.h>
>>>  #include <linux/workqueue.h>
>>>  
>>> +#define CREATE_TRACE_POINTS
>>> +#include <trace/events/tegra30_devfreq.h>
>>> +
>>>  #include "governor.h"
>>>  
>>>  #define ACTMON_GLB_STATUS
>>> 0x0 @@ -283,9 +286,6 @@ static void
>>> tegra_actmon_get_lower_upper(struct tegra_devfreq *tegra, unsigned
>>> long *lower, unsigned long *upper)
>>>  {
>>> -	struct device *ddev = tegra->devfreq->dev.parent;
>>> -	u32 offset = dev->config->offset;
>>> -
>>>  	/*
>>>  	 * Memory frequencies are guaranteed to have 1MHz
>>> granularity
>>>  	 * and thus we need this rounding down to get a proper
>>> watermarks @@ -298,8 +298,8 @@ static void
>>> tegra_actmon_get_lower_upper(struct tegra_devfreq *tegra, *lower =
>>> tegra_actmon_lower_freq(tegra, target_freq); *upper =
>>> tegra_actmon_upper_freq(tegra, target_freq); 
>>> -	dev_dbg(ddev, "%03x: target_freq %lu lower freq %lu upper
>>> freq %lu\n",
>>> -		offset, target_freq, *lower, *upper);
>>> +	trace_device_lower_upper(dev->config->offset, target_freq,
>>> +				 *lower, *upper);
>>>  
>>>  	/*
>>>  	 * The upper watermark should take into account CPU's
>>> frequency @@ -377,30 +377,13 @@ static void
>>> tegra_devfreq_update_wmark(struct tegra_devfreq *tegra,
>>> device_writel(dev, lower + delta, ACTMON_DEV_LOWER_WMARK); }
>>>  
>>> -static void actmon_device_debug(struct tegra_devfreq *tegra,
>>> -				struct tegra_devfreq_device *dev,
>>> -				const char *prefix)
>>> -{
>>> -	dev_dbg(tegra->devfreq->dev.parent,
>>> -		"%03x: %s: 0x%08x 0x%08x a %u %u %u c %u %u %u b
>>> %lu cpu %u\n",
>>> -		dev->config->offset, prefix,
>>> -		device_readl(dev, ACTMON_DEV_INTR_STATUS),
>>> -		device_readl(dev, ACTMON_DEV_CTRL),
>>> -		device_readl(dev, ACTMON_DEV_AVG_COUNT),
>>> -		device_readl(dev, ACTMON_DEV_AVG_LOWER_WMARK),
>>> -		device_readl(dev, ACTMON_DEV_AVG_UPPER_WMARK),
>>> -		device_readl(dev, ACTMON_DEV_COUNT),
>>> -		device_readl(dev, ACTMON_DEV_LOWER_WMARK),
>>> -		device_readl(dev, ACTMON_DEV_UPPER_WMARK),
>>> -		dev->boost_freq, cpufreq_get(0));
>>> -}
>>> -
>>>  static void actmon_isr_device(struct tegra_devfreq *tegra,
>>>  			      struct tegra_devfreq_device *dev)
>>>  {
>>>  	u32 intr_status, dev_ctrl, avg_intr_mask, avg_count;
>>>  
>>> -	actmon_device_debug(tegra, dev, "isr+");
>>> +	trace_device_isr_enter(tegra->regs, dev->config->offset,
>>> +			       dev->boost_freq, cpufreq_get(0));
>>>  
>>>  	intr_status = device_readl(dev, ACTMON_DEV_INTR_STATUS);
>>>  	avg_count = device_readl(dev, ACTMON_DEV_AVG_COUNT);
>>> @@ -455,7 +438,8 @@ static void actmon_isr_device(struct
>>> tegra_devfreq *tegra, device_writel(dev, dev_ctrl, ACTMON_DEV_CTRL);
>>>  	device_writel(dev, ACTMON_INTR_STATUS_CLEAR,
>>> ACTMON_DEV_INTR_STATUS); 
>>> -	actmon_device_debug(tegra, dev, "isr-");
>>> +	trace_device_isr_exit(tegra->regs, dev->config->offset,
>>> +			      dev->boost_freq, cpufreq_get(0));
>>>  }
>>>  
>>>  static unsigned long actmon_update_target(struct tegra_devfreq
>>> *tegra, @@ -749,7 +733,6 @@ static struct devfreq_dev_profile
>>> tegra_devfreq_profile = { static int
>>> tegra_governor_get_target(struct devfreq *devfreq, unsigned long
>>> *freq) {
>>> -	struct device *ddev = devfreq->dev.parent;
>>>  	struct devfreq_dev_status *stat;
>>>  	struct tegra_devfreq *tegra;
>>>  	struct tegra_devfreq_device *dev;
>>> @@ -770,13 +753,11 @@ static int tegra_governor_get_target(struct
>>> devfreq *devfreq, dev = &tegra->devices[i];
>>>  
>>>  		dev_target_freq = actmon_update_target(tegra, dev);
>>> -
>>>  		target_freq = max(target_freq, dev_target_freq);
>>>  
>>> -		dev_dbg(ddev, "%03x: upd: dev_target_freq %lu\n",
>>> -			dev->config->offset, dev_target_freq);
>>> -
>>> -		actmon_device_debug(tegra, dev, "upd");
>>> +		trace_device_target_freq(dev->config->offset,
>>> dev_target_freq);
>>> +		trace_device_target_update(tegra->regs,
>>> dev->config->offset,
>>> +					   dev->boost_freq,
>>> cpufreq_get(0)); }
>>>  
>>>  	*freq = target_freq;
>>> diff --git a/include/trace/events/tegra30_devfreq.h
>>> b/include/trace/events/tegra30_devfreq.h new file mode 100644
>>> index 000000000000..8f264a489daf
>>> --- /dev/null
>>> +++ b/include/trace/events/tegra30_devfreq.h
>>> @@ -0,0 +1,105 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +#undef TRACE_SYSTEM
>>> +#define TRACE_SYSTEM tegra30_devfreq
>>> +
>>> +#if !defined(_TRACE_TEGRA30_DEVFREQ_H) ||
>>> defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_TEGRA30_DEVFREQ_H
>>> +
>>> +#include <linux/io.h>
>>> +#include <linux/tracepoint.h>
>>> +#include <linux/types.h>
>>> +
>>> +DECLARE_EVENT_CLASS(device_state,
>>> +	TP_PROTO(void __iomem *base, u32 offset, u32 boost, u32
>>> cpufreq),
>>> +	TP_ARGS(base, offset, boost, cpufreq),
>>> +	TP_STRUCT__entry(
>>> +		__field(u32, offset)
>>> +		__field(u32, intr_status)
>>> +		__field(u32, ctrl)
>>> +		__field(u32, avg_count)
>>> +		__field(u32, avg_lower)
>>> +		__field(u32, avg_upper)
>>> +		__field(u32, count)
>>> +		__field(u32, lower)
>>> +		__field(u32, upper)
>>> +		__field(u32, boost_freq)
>>> +		__field(u32, cpu_freq)
>>> +	),
>>> +	TP_fast_assign(
>>> +		__entry->offset		= offset;
>>> +		__entry->intr_status	= readl_relaxed(base +
>>> offset + 0x24);
>>> +		__entry->ctrl		= readl_relaxed(base
>>> + offset + 0x0);
>>> +		__entry->avg_count	= readl_relaxed(base +
>>> offset + 0x20);
>>> +		__entry->avg_lower	= readl_relaxed(base +
>>> offset + 0x14);
>>> +		__entry->avg_upper	= readl_relaxed(base +
>>> offset + 0x10);
>>> +		__entry->count		= readl_relaxed(base
>>> + offset + 0x1c);
>>> +		__entry->lower		= readl_relaxed(base
>>> + offset + 0x8);
>>> +		__entry->upper		= readl_relaxed(base
>>> + offset + 0x4);
>>> +		__entry->boost_freq	= boost;
>>> +		__entry->cpu_freq	= cpufreq;
>>> +	),
>>> +	TP_printk("%03x: intr 0x%08x ctrl 0x%08x avg %010u %010u
>>> %010u cnt %010u %010u %010u boost %010u cpu %u",
>>> +		__entry->offset,
>>> +		__entry->intr_status,
>>> +		__entry->ctrl,
>>> +		__entry->avg_count,
>>> +		__entry->avg_lower,
>>> +		__entry->avg_upper,
>>> +		__entry->count,
>>> +		__entry->lower,
>>> +		__entry->upper,
>>> +		__entry->boost_freq,
>>> +		__entry->cpu_freq)
>>> +);
>>> +
>>> +DEFINE_EVENT(device_state, device_isr_enter,
>>> +	TP_PROTO(void __iomem *base, u32 offset, u32 boost, u32
>>> cpufreq),
>>> +	TP_ARGS(base, offset, boost, cpufreq));
>>> +
>>> +DEFINE_EVENT(device_state, device_isr_exit,
>>> +	TP_PROTO(void __iomem *base, u32 offset, u32 boost, u32
>>> cpufreq),
>>> +	TP_ARGS(base, offset, boost, cpufreq));
>>> +
>>> +DEFINE_EVENT(device_state, device_target_update,
>>> +	TP_PROTO(void __iomem *base, u32 offset, u32 boost, u32
>>> cpufreq),
>>> +	TP_ARGS(base, offset, boost, cpufreq));
>>> +
>>> +TRACE_EVENT(device_lower_upper,
>>> +	TP_PROTO(u32 offset, u32 target, u32 lower, u32 upper),
>>> +	TP_ARGS(offset, target, lower, upper),
>>> +	TP_STRUCT__entry(
>>> +		__field(u32, offset)
>>> +		__field(u32, target)
>>> +		__field(u32, lower)
>>> +		__field(u32, upper)
>>> +	),
>>> +	TP_fast_assign(
>>> +		__entry->offset = offset;
>>> +		__entry->target = target;
>>> +		__entry->lower = lower;
>>> +		__entry->upper = upper;
>>> +	),
>>> +	TP_printk("%03x: freq %010u lower freq %010u upper freq
>>> %010u",
>>> +		__entry->offset,
>>> +		__entry->target,
>>> +		__entry->lower,
>>> +		__entry->upper)
>>> +);
>>> +
>>> +TRACE_EVENT(device_target_freq,
>>> +	TP_PROTO(u32 offset, u32 target),
>>> +	TP_ARGS(offset, target),
>>> +	TP_STRUCT__entry(
>>> +		__field(u32, offset)
>>> +		__field(u32, target)
>>> +	),
>>> +	TP_fast_assign(
>>> +		__entry->offset = offset;
>>> +		__entry->target = target;
>>> +	),
>>> +	TP_printk("%03x: freq %010u", __entry->offset,
>>> __entry->target) +);
>>> +#endif /* _TRACE_TEGRA30_DEVFREQ_H */
>>> +
>>> +/* This part must be outside protection */
>>> +#include <trace/define_trace.h>
>>>   
>>
>>
> 
> 
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply

* Re: [PATCH v4 17/24] PM / devfreq: tegra30: Use tracepoints for debugging
From: Dmitry Osipenko @ 2019-07-19  0:49 UTC (permalink / raw)
  To: Chanwoo Choi
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <209220ec-b677-3500-0e55-6cad57e97f88@samsung.com>

В Thu, 18 Jul 2019 18:47:09 +0900
Chanwoo Choi <cw00.choi@samsung.com> пишет:

> On 19. 7. 8. 오전 7:32, Dmitry Osipenko wrote:
> > Debug messages create too much CPU and memory activity by
> > themselves, so it's difficult to debug lower rates and catch
> > unwanted interrupts that happen rarely. Tracepoints are ideal in
> > that regards because they do not contribute to the sampled date at
> > all. This allowed me to catch few problems which are fixed by the
> > followup patches, without tracepoints it would be much harder to do.
> > 
> > Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> > ---
> >  drivers/devfreq/tegra30-devfreq.c      |  43 +++-------
> >  include/trace/events/tegra30_devfreq.h | 105
> > +++++++++++++++++++++++++ 2 files changed, 117 insertions(+), 31
> > deletions(-) create mode 100644
> > include/trace/events/tegra30_devfreq.h  
> 
> As I knew, 'include/trace/events' don't include the header file
> for only one device driver. Usually, the trace event is provided
> by framework instead of each devic driver.

There are at least trace headers there for the tegra-apbdma,
tegra-host1x, intel-sst and intel-ish devices. I don't think that there
is a strict rule for the trace headers placement.

> > 
> > diff --git a/drivers/devfreq/tegra30-devfreq.c
> > b/drivers/devfreq/tegra30-devfreq.c index
> > 6ebf0f505767..43c9c5fbfe91 100644 ---
> > a/drivers/devfreq/tegra30-devfreq.c +++
> > b/drivers/devfreq/tegra30-devfreq.c @@ -19,6 +19,9 @@
> >  #include <linux/reset.h>
> >  #include <linux/workqueue.h>
> >  
> > +#define CREATE_TRACE_POINTS
> > +#include <trace/events/tegra30_devfreq.h>
> > +
> >  #include "governor.h"
> >  
> >  #define ACTMON_GLB_STATUS
> > 0x0 @@ -283,9 +286,6 @@ static void
> > tegra_actmon_get_lower_upper(struct tegra_devfreq *tegra, unsigned
> > long *lower, unsigned long *upper)
> >  {
> > -	struct device *ddev = tegra->devfreq->dev.parent;
> > -	u32 offset = dev->config->offset;
> > -
> >  	/*
> >  	 * Memory frequencies are guaranteed to have 1MHz
> > granularity
> >  	 * and thus we need this rounding down to get a proper
> > watermarks @@ -298,8 +298,8 @@ static void
> > tegra_actmon_get_lower_upper(struct tegra_devfreq *tegra, *lower =
> > tegra_actmon_lower_freq(tegra, target_freq); *upper =
> > tegra_actmon_upper_freq(tegra, target_freq); 
> > -	dev_dbg(ddev, "%03x: target_freq %lu lower freq %lu upper
> > freq %lu\n",
> > -		offset, target_freq, *lower, *upper);
> > +	trace_device_lower_upper(dev->config->offset, target_freq,
> > +				 *lower, *upper);
> >  
> >  	/*
> >  	 * The upper watermark should take into account CPU's
> > frequency @@ -377,30 +377,13 @@ static void
> > tegra_devfreq_update_wmark(struct tegra_devfreq *tegra,
> > device_writel(dev, lower + delta, ACTMON_DEV_LOWER_WMARK); }
> >  
> > -static void actmon_device_debug(struct tegra_devfreq *tegra,
> > -				struct tegra_devfreq_device *dev,
> > -				const char *prefix)
> > -{
> > -	dev_dbg(tegra->devfreq->dev.parent,
> > -		"%03x: %s: 0x%08x 0x%08x a %u %u %u c %u %u %u b
> > %lu cpu %u\n",
> > -		dev->config->offset, prefix,
> > -		device_readl(dev, ACTMON_DEV_INTR_STATUS),
> > -		device_readl(dev, ACTMON_DEV_CTRL),
> > -		device_readl(dev, ACTMON_DEV_AVG_COUNT),
> > -		device_readl(dev, ACTMON_DEV_AVG_LOWER_WMARK),
> > -		device_readl(dev, ACTMON_DEV_AVG_UPPER_WMARK),
> > -		device_readl(dev, ACTMON_DEV_COUNT),
> > -		device_readl(dev, ACTMON_DEV_LOWER_WMARK),
> > -		device_readl(dev, ACTMON_DEV_UPPER_WMARK),
> > -		dev->boost_freq, cpufreq_get(0));
> > -}
> > -
> >  static void actmon_isr_device(struct tegra_devfreq *tegra,
> >  			      struct tegra_devfreq_device *dev)
> >  {
> >  	u32 intr_status, dev_ctrl, avg_intr_mask, avg_count;
> >  
> > -	actmon_device_debug(tegra, dev, "isr+");
> > +	trace_device_isr_enter(tegra->regs, dev->config->offset,
> > +			       dev->boost_freq, cpufreq_get(0));
> >  
> >  	intr_status = device_readl(dev, ACTMON_DEV_INTR_STATUS);
> >  	avg_count = device_readl(dev, ACTMON_DEV_AVG_COUNT);
> > @@ -455,7 +438,8 @@ static void actmon_isr_device(struct
> > tegra_devfreq *tegra, device_writel(dev, dev_ctrl, ACTMON_DEV_CTRL);
> >  	device_writel(dev, ACTMON_INTR_STATUS_CLEAR,
> > ACTMON_DEV_INTR_STATUS); 
> > -	actmon_device_debug(tegra, dev, "isr-");
> > +	trace_device_isr_exit(tegra->regs, dev->config->offset,
> > +			      dev->boost_freq, cpufreq_get(0));
> >  }
> >  
> >  static unsigned long actmon_update_target(struct tegra_devfreq
> > *tegra, @@ -749,7 +733,6 @@ static struct devfreq_dev_profile
> > tegra_devfreq_profile = { static int
> > tegra_governor_get_target(struct devfreq *devfreq, unsigned long
> > *freq) {
> > -	struct device *ddev = devfreq->dev.parent;
> >  	struct devfreq_dev_status *stat;
> >  	struct tegra_devfreq *tegra;
> >  	struct tegra_devfreq_device *dev;
> > @@ -770,13 +753,11 @@ static int tegra_governor_get_target(struct
> > devfreq *devfreq, dev = &tegra->devices[i];
> >  
> >  		dev_target_freq = actmon_update_target(tegra, dev);
> > -
> >  		target_freq = max(target_freq, dev_target_freq);
> >  
> > -		dev_dbg(ddev, "%03x: upd: dev_target_freq %lu\n",
> > -			dev->config->offset, dev_target_freq);
> > -
> > -		actmon_device_debug(tegra, dev, "upd");
> > +		trace_device_target_freq(dev->config->offset,
> > dev_target_freq);
> > +		trace_device_target_update(tegra->regs,
> > dev->config->offset,
> > +					   dev->boost_freq,
> > cpufreq_get(0)); }
> >  
> >  	*freq = target_freq;
> > diff --git a/include/trace/events/tegra30_devfreq.h
> > b/include/trace/events/tegra30_devfreq.h new file mode 100644
> > index 000000000000..8f264a489daf
> > --- /dev/null
> > +++ b/include/trace/events/tegra30_devfreq.h
> > @@ -0,0 +1,105 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#undef TRACE_SYSTEM
> > +#define TRACE_SYSTEM tegra30_devfreq
> > +
> > +#if !defined(_TRACE_TEGRA30_DEVFREQ_H) ||
> > defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_TEGRA30_DEVFREQ_H
> > +
> > +#include <linux/io.h>
> > +#include <linux/tracepoint.h>
> > +#include <linux/types.h>
> > +
> > +DECLARE_EVENT_CLASS(device_state,
> > +	TP_PROTO(void __iomem *base, u32 offset, u32 boost, u32
> > cpufreq),
> > +	TP_ARGS(base, offset, boost, cpufreq),
> > +	TP_STRUCT__entry(
> > +		__field(u32, offset)
> > +		__field(u32, intr_status)
> > +		__field(u32, ctrl)
> > +		__field(u32, avg_count)
> > +		__field(u32, avg_lower)
> > +		__field(u32, avg_upper)
> > +		__field(u32, count)
> > +		__field(u32, lower)
> > +		__field(u32, upper)
> > +		__field(u32, boost_freq)
> > +		__field(u32, cpu_freq)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->offset		= offset;
> > +		__entry->intr_status	= readl_relaxed(base +
> > offset + 0x24);
> > +		__entry->ctrl		= readl_relaxed(base
> > + offset + 0x0);
> > +		__entry->avg_count	= readl_relaxed(base +
> > offset + 0x20);
> > +		__entry->avg_lower	= readl_relaxed(base +
> > offset + 0x14);
> > +		__entry->avg_upper	= readl_relaxed(base +
> > offset + 0x10);
> > +		__entry->count		= readl_relaxed(base
> > + offset + 0x1c);
> > +		__entry->lower		= readl_relaxed(base
> > + offset + 0x8);
> > +		__entry->upper		= readl_relaxed(base
> > + offset + 0x4);
> > +		__entry->boost_freq	= boost;
> > +		__entry->cpu_freq	= cpufreq;
> > +	),
> > +	TP_printk("%03x: intr 0x%08x ctrl 0x%08x avg %010u %010u
> > %010u cnt %010u %010u %010u boost %010u cpu %u",
> > +		__entry->offset,
> > +		__entry->intr_status,
> > +		__entry->ctrl,
> > +		__entry->avg_count,
> > +		__entry->avg_lower,
> > +		__entry->avg_upper,
> > +		__entry->count,
> > +		__entry->lower,
> > +		__entry->upper,
> > +		__entry->boost_freq,
> > +		__entry->cpu_freq)
> > +);
> > +
> > +DEFINE_EVENT(device_state, device_isr_enter,
> > +	TP_PROTO(void __iomem *base, u32 offset, u32 boost, u32
> > cpufreq),
> > +	TP_ARGS(base, offset, boost, cpufreq));
> > +
> > +DEFINE_EVENT(device_state, device_isr_exit,
> > +	TP_PROTO(void __iomem *base, u32 offset, u32 boost, u32
> > cpufreq),
> > +	TP_ARGS(base, offset, boost, cpufreq));
> > +
> > +DEFINE_EVENT(device_state, device_target_update,
> > +	TP_PROTO(void __iomem *base, u32 offset, u32 boost, u32
> > cpufreq),
> > +	TP_ARGS(base, offset, boost, cpufreq));
> > +
> > +TRACE_EVENT(device_lower_upper,
> > +	TP_PROTO(u32 offset, u32 target, u32 lower, u32 upper),
> > +	TP_ARGS(offset, target, lower, upper),
> > +	TP_STRUCT__entry(
> > +		__field(u32, offset)
> > +		__field(u32, target)
> > +		__field(u32, lower)
> > +		__field(u32, upper)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->offset = offset;
> > +		__entry->target = target;
> > +		__entry->lower = lower;
> > +		__entry->upper = upper;
> > +	),
> > +	TP_printk("%03x: freq %010u lower freq %010u upper freq
> > %010u",
> > +		__entry->offset,
> > +		__entry->target,
> > +		__entry->lower,
> > +		__entry->upper)
> > +);
> > +
> > +TRACE_EVENT(device_target_freq,
> > +	TP_PROTO(u32 offset, u32 target),
> > +	TP_ARGS(offset, target),
> > +	TP_STRUCT__entry(
> > +		__field(u32, offset)
> > +		__field(u32, target)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->offset = offset;
> > +		__entry->target = target;
> > +	),
> > +	TP_printk("%03x: freq %010u", __entry->offset,
> > __entry->target) +);
> > +#endif /* _TRACE_TEGRA30_DEVFREQ_H */
> > +
> > +/* This part must be outside protection */
> > +#include <trace/define_trace.h>
> >   
> 
> 


^ permalink raw reply

* Re: [PATCH v4 18/24] PM / devfreq: tegra30: Optimize CPUFreq notifier
From: Dmitry Osipenko @ 2019-07-19  0:42 UTC (permalink / raw)
  To: Chanwoo Choi
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <ce18694a-4281-a245-7bdf-299fedc3c724@samsung.com>

В Thu, 18 Jul 2019 18:48:42 +0900
Chanwoo Choi <cw00.choi@samsung.com> пишет:

> On 19. 7. 8. 오전 7:32, Dmitry Osipenko wrote:
> > When CPU's memory activity is low or memory activity is high such
> > that CPU's frequency contribution to the boosting is not taken into
> > account, then there is no need to schedule devfreq's update. This
> > eliminates unnecessary CPU activity during of idling caused by the
> > scheduled work.
> > 
> > Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> > ---
> >  drivers/devfreq/tegra30-devfreq.c | 73
> > +++++++++++++++++++++++++++---- 1 file changed, 64 insertions(+), 9
> > deletions(-)  
> 
> Patch4 add the 'cpufreq notifier' and this patch optimize the cpufreq
> notifier. I think t hat you can combine two patches.

I'd prefer to keep them separate for a sake of git bisection.

> > 
> > diff --git a/drivers/devfreq/tegra30-devfreq.c
> > b/drivers/devfreq/tegra30-devfreq.c index
> > 43c9c5fbfe91..8d6bf6e9f1ae 100644 ---
> > a/drivers/devfreq/tegra30-devfreq.c +++
> > b/drivers/devfreq/tegra30-devfreq.c @@ -216,10 +216,10 @@ static
> > inline unsigned long do_percent(unsigned long val, unsigned int
> > pct) return val * pct / 100; }
> >  
> > -static unsigned long actmon_cpu_to_emc_rate(struct tegra_devfreq
> > *tegra) +static unsigned long actmon_cpu_to_emc_rate(struct
> > tegra_devfreq *tegra,
> > +					    unsigned int cpu_freq)
> >  {
> >  	const struct tegra_actmon_emc_ratio *ratio =
> > actmon_emc_ratios;
> > -	unsigned int cpu_freq = cpufreq_get(0);
> >  	unsigned int i;
> >  
> >  	for (i = 0; i < ARRAY_SIZE(actmon_emc_ratios); i++,
> > ratio++) { @@ -239,15 +239,15 @@
> > tegra_actmon_account_cpu_freq(struct tegra_devfreq *tegra, struct
> > tegra_devfreq_device *dev, unsigned long target_freq)
> >  {
> > -	unsigned long static_cpu_emc_freq;
> > +	unsigned long cpu_emc_freq = 0;
> >  
> > -	if (dev->config->avg_dependency_threshold &&
> > -	    dev->config->avg_dependency_threshold < dev->avg_freq)
> > {
> > -		static_cpu_emc_freq =
> > actmon_cpu_to_emc_rate(tegra);
> > -		target_freq = max(target_freq,
> > static_cpu_emc_freq);
> > -	}
> > +	if (!dev->config->avg_dependency_threshold)
> > +		return target_freq;
> >  
> > -	return target_freq;
> > +	if (dev->avg_freq > dev->config->avg_dependency_threshold)
> > +		cpu_emc_freq = actmon_cpu_to_emc_rate(tegra,
> > cpufreq_get(0)); +
> > +	return max(target_freq, cpu_emc_freq);
> >  }
> >  
> >  static unsigned long tegra_actmon_lower_freq(struct tegra_devfreq
> > *tegra, @@ -531,16 +531,71 @@ static void
> > tegra_actmon_delayed_update(struct work_struct *work)
> > mutex_unlock(&tegra->devfreq->lock); }
> >  
> > +static unsigned long
> > +tegra_actmon_cpufreq_contribution(struct tegra_devfreq *tegra,
> > +				  unsigned int cpu_freq)
> > +{
> > +	unsigned long freq, static_cpu_emc_freq;
> > +
> > +	/* check whether CPU's freq is taken into account at all */
> > +	if (tegra->devices[MCCPU].avg_freq <=
> > +	    tegra->devices[MCCPU].config->avg_dependency_threshold)
> > +		return 0;
> > +
> > +	static_cpu_emc_freq = actmon_cpu_to_emc_rate(tegra,
> > cpu_freq); +
> > +	/* compare static CPU-EMC freq with MCALL */
> > +	freq = tegra->devices[MCALL].avg_freq +
> > +	       tegra->devices[MCALL].boost_freq;
> > +
> > +	freq = tegra_actmon_upper_freq(tegra, freq);
> > +
> > +	if (freq == tegra->max_freq || freq >= static_cpu_emc_freq)
> > +		return 0;
> > +
> > +	/* compare static CPU-EMC freq with MCCPU */
> > +	freq = tegra->devices[MCCPU].avg_freq +
> > +	       tegra->devices[MCCPU].boost_freq;
> > +
> > +	freq = tegra_actmon_upper_freq(tegra, freq);
> > +
> > +	if (freq == tegra->max_freq || freq >= static_cpu_emc_freq)
> > +		return 0;
> > +
> > +	return static_cpu_emc_freq;
> > +}
> > +
> >  static int tegra_actmon_cpu_notify_cb(struct notifier_block *nb,
> >  				      unsigned long action, void
> > *ptr) {
> > +	struct cpufreq_freqs *freqs = ptr;
> >  	struct tegra_devfreq *tegra;
> > +	unsigned long old, new;
> >  
> >  	if (action != CPUFREQ_POSTCHANGE)
> >  		return NOTIFY_OK;
> >  
> >  	tegra = container_of(nb, struct tegra_devfreq,
> > cpu_rate_change_nb); 
> > +	/*
> > +	 * Quickly check whether CPU frequency should be taken
> > into account
> > +	 * at all, without blocking CPUFreq's core.
> > +	 */
> > +	if (mutex_trylock(&tegra->devfreq->lock)) {
> > +		old = tegra_actmon_cpufreq_contribution(tegra,
> > freqs->old);
> > +		new = tegra_actmon_cpufreq_contribution(tegra,
> > freqs->new);
> > +		mutex_unlock(&tegra->devfreq->lock);
> > +
> > +		/*
> > +		 * If CPU's frequency shouldn't be taken into
> > account at
> > +		 * the moment, then there is no need to update the
> > devfreq's
> > +		 * state because ISR will re-check CPU's frequency
> > on the
> > +		 * next interrupt.
> > +		 */
> > +		if (old == new)
> > +			return NOTIFY_OK;
> > +	}
> > +
> >  	/*
> >  	 * CPUFreq driver should support
> > CPUFREQ_ASYNC_NOTIFICATION in order
> >  	 * to allow asynchronous notifications. This means we
> > can't block 
> 
> 


^ permalink raw reply

* Re: [PATCH v4 19/24] PM / devfreq: tegra30: Optimize upper consecutive watermark selection
From: Dmitry Osipenko @ 2019-07-19  0:40 UTC (permalink / raw)
  To: Chanwoo Choi
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <17fabbaf-ceca-7551-0a58-9c8a0e7027ed@samsung.com>

В Thu, 18 Jul 2019 18:51:02 +0900
Chanwoo Choi <cw00.choi@samsung.com> пишет:

> On 19. 7. 8. 오전 7:32, Dmitry Osipenko wrote:
> > The memory activity counter may get a bit higher than a watermark
> > which is selected based on OPP that corresponds to a highest EMC
> > rate, in this case watermark is lower than the actual memory
> > activity is and thus results in unwanted "upper" interrupts.
> > 
> > Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> > ---
> >  drivers/devfreq/tegra30-devfreq.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)  
> 
> It seems that you can combine patch19 with patch20.

No, consecutive and average watermarks are different things that have
different purposes. Consecutive are used for boosting, while average
are for significant memory bandwidth changes.

> > 
> > diff --git a/drivers/devfreq/tegra30-devfreq.c
> > b/drivers/devfreq/tegra30-devfreq.c index
> > 8d6bf6e9f1ae..c3cf87231d25 100644 ---
> > a/drivers/devfreq/tegra30-devfreq.c +++
> > b/drivers/devfreq/tegra30-devfreq.c @@ -363,7 +363,18 @@ static
> > void tegra_devfreq_update_wmark(struct tegra_devfreq *tegra,
> > tegra_actmon_get_lower_upper(tegra, dev, freq - 1, &lower, &upper); 
> >  	delta = do_percent(upper - lower,
> > dev->config->boost_up_threshold);
> > -	device_writel(dev, lower + delta, ACTMON_DEV_UPPER_WMARK);
> > +
> > +	/*
> > +	 * The memory events count could go a bit higher than the
> > maximum
> > +	 * defined by the OPPs, hence make the upper watermark
> > infinitely
> > +	 * high to avoid unnecessary upper interrupts in that case.
> > +	 */
> > +	if (freq == tegra->max_freq)
> > +		upper = ULONG_MAX;
> > +	else
> > +		upper = lower + delta;
> > +
> > +	device_writel(dev, upper, ACTMON_DEV_UPPER_WMARK);
> >  
> >  	/*
> >  	 * Meanwhile the lower mark is based on the average value
> >   
> 
> 


^ permalink raw reply

* Re: [PATCH v4 22/24] PM / devfreq: tegra30: Include appropriate header
From: Dmitry Osipenko @ 2019-07-19  0:34 UTC (permalink / raw)
  To: Chanwoo Choi
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <2f317faf-8de0-2717-cda7-b15374039493@samsung.com>

В Thu, 18 Jul 2019 18:58:50 +0900
Chanwoo Choi <cw00.choi@samsung.com> пишет:

> On 19. 7. 8. 오전 7:33, Dmitry Osipenko wrote:
> > It's not very correct to include mod_devicetable.h for the OF device
> > drivers and of_device.h should be included instead.
> > 
> > Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> > ---
> >  drivers/devfreq/tegra30-devfreq.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/devfreq/tegra30-devfreq.c
> > b/drivers/devfreq/tegra30-devfreq.c index
> > 8a674fad26be..19e872a64148 100644 ---
> > a/drivers/devfreq/tegra30-devfreq.c +++
> > b/drivers/devfreq/tegra30-devfreq.c @@ -13,7 +13,7 @@
> >  #include <linux/io.h>
> >  #include <linux/irq.h>
> >  #include <linux/module.h>
> > -#include <linux/mod_devicetable.h>
> > +#include <linux/of_device.h>
> >  #include <linux/platform_device.h>
> >  #include <linux/pm_opp.h>
> >  #include <linux/reset.h>
> >   
> 
> Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
> 
> Is not there other unused header file anymore?
> 

The rest looks good to me.

^ permalink raw reply

* Re: [PATCH v4 21/24] PM / devfreq: tegra30: Synchronize average count on target's update
From: Dmitry Osipenko @ 2019-07-19  0:31 UTC (permalink / raw)
  To: Chanwoo Choi
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <f83fb873-9214-9649-9435-9f211c4cba9e@samsung.com>

В Thu, 18 Jul 2019 19:15:54 +0900
Chanwoo Choi <cw00.choi@samsung.com> пишет:

> On 19. 7. 8. 오전 7:33, Dmitry Osipenko wrote:
> > The average count may get out of sync if interrupt was disabled /
> > avoided for a long time due to upper watermark optimization, hence
> > it should be re-synced on each target's update to ensure that
> > watermarks are set up correctly on EMC rate-change notification and
> > that a correct frequency is selected for device.
> > 
> > Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> > ---
> >  drivers/devfreq/tegra30-devfreq.c | 30
> > ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+)
> > 
> > diff --git a/drivers/devfreq/tegra30-devfreq.c
> > b/drivers/devfreq/tegra30-devfreq.c index
> > 4d582809acb6..8a674fad26be 100644 ---
> > a/drivers/devfreq/tegra30-devfreq.c +++
> > b/drivers/devfreq/tegra30-devfreq.c @@ -466,11 +466,41 @@ static
> > void actmon_isr_device(struct tegra_devfreq *tegra,
> > dev->boost_freq, cpufreq_get(0)); }
> >  
> > +static void tegra_devfreq_sync_avg_count(struct tegra_devfreq
> > *tegra,
> > +					 struct
> > tegra_devfreq_device *dev) +{
> > +	u32 avg_count, avg_freq, old_upper, new_upper;
> > +
> > +	avg_count = device_readl(dev, ACTMON_DEV_AVG_COUNT);
> > +	avg_freq = avg_count / ACTMON_SAMPLING_PERIOD;
> > +
> > +	old_upper = tegra_actmon_upper_freq(tegra, dev->avg_freq);
> > +	new_upper = tegra_actmon_upper_freq(tegra, avg_freq);
> > +
> > +	/* similar to ISR, see comments in actmon_isr_device() */
> > +	if (old_upper != new_upper) {
> > +		dev->avg_freq = avg_freq;
> > +		dev->boost_freq = 0;
> > +	}
> > +}
> > +
> >  static unsigned long actmon_update_target(struct tegra_devfreq
> > *tegra, struct tegra_devfreq_device *dev)
> >  {
> >  	unsigned long target_freq;
> >  
> > +	/*
> > +	 * The avg_count / avg_freq is getting snapshoted on
> > device's
> > +	 * interrupt, but there are cases where actual value need
> > to
> > +	 * be utilized on target's update, like CPUFreq boosting
> > and
> > +	 * overriding the min freq
> > via /sys/class/devfreq/devfreq0/min_freq
> > +	 * because we're optimizing the upper watermark based on
> > the
> > +	 * actual EMC frequency. This means that interrupt may be
> > +	 * inactive for a long time and thus making snapshoted
> > value
> > +	 * outdated.
> > +	 */
> > +	tegra_devfreq_sync_avg_count(tegra, dev);  
> 
> I think that you don't need to add the separate function to calculate
> the 'dev->avg_freq'. It is enough with your detailed comment to add
> this code in this function.

The separate function is indeed not mandatory here, but I'm finding that
it usually makes easier to read and follow the code when it is properly
split up into logical blocks. Don't you agree?

> > +
> >  	target_freq = min(dev->avg_freq + dev->boost_freq,
> > KHZ_MAX); target_freq = tegra_actmon_account_cpu_freq(tegra, dev,
> > target_freq); 
> >   
> 
> And also, is it impossible to squash this patch with patch19/patch20?
> 

It should be possible to squash this patch with #20, but wouldn't
be better to keep changes in the chronological order? It's also better
to keep changes separate simply to aid bisection in case of a problem.


^ permalink raw reply

* Re: [PATCH v4 05/24] PM / devfreq: tegra30: Set up watermarks properly
From: Dmitry Osipenko @ 2019-07-19  0:00 UTC (permalink / raw)
  To: Chanwoo Choi
  Cc: Thierry Reding, MyungJoo Ham, Kyungmin Park, Jonathan Hunter,
	Tomeu Vizoso, linux-pm, linux-tegra, linux-kernel
In-Reply-To: <9efb75fe-994f-24c7-7872-b5e5041105a6@samsung.com>

В Thu, 18 Jul 2019 19:17:17 +0900
Chanwoo Choi <cw00.choi@samsung.com> пишет:

> On 19. 7. 8. 오전 7:32, Dmitry Osipenko wrote:
> > The current implementation is inaccurate and results in very
> > intensive interrupt activity, which neglects the whole idea of
> > polling offload to hardware. The reason of the shortcoming is that
> > watermarks are not set up correctly and this results in ACTMON
> > constantly asking to change freq and then these requests are
> > ignored. The end result of this patch is that there are few
> > hundreds of ACTMON's interrupts instead of tens thousands after few
> > minutes of a working devfreq, meanwhile the transitions activity
> > stays about the same and governor becomes more reactive.
> > 
> > Since watermarks are set precisely correct now, the boosting logic
> > is changed a tad to accommodate the change. The "average sustain
> > coefficient" multiplier is gone now since there is no need to
> > compensate the improper watermarks and EMC frequency-bump happens
> > once boosting hits the upper watermark enough times, depending on
> > the per-device boosting threshold.
> > 
> > Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> > ---
> >  drivers/devfreq/tegra30-devfreq.c | 293
> > +++++++++++++++++++++--------- 1 file changed, 209 insertions(+),
> > 84 deletions(-)
> > 
> > diff --git a/drivers/devfreq/tegra30-devfreq.c
> > b/drivers/devfreq/tegra30-devfreq.c index
> > 4be7858c33bc..16f7e6cf3b99 100644 ---
> > a/drivers/devfreq/tegra30-devfreq.c +++
> > b/drivers/devfreq/tegra30-devfreq.c @@ -47,6 +47,8 @@
> >  
> >  #define ACTMON_DEV_INTR_CONSECUTIVE_UPPER
> > BIT(31) #define
> > ACTMON_DEV_INTR_CONSECUTIVE_LOWER			BIT(30)
> > +#define
> > ACTMON_DEV_INTR_AVG_BELOW_WMARK
> > BIT(25) +#define
> > ACTMON_DEV_INTR_AVG_ABOVE_WMARK
> > BIT(24) #define
> > ACTMON_ABOVE_WMARK_WINDOW				1 #define
> > ACTMON_BELOW_WMARK_WINDOW				3 @@ -63,9
> > +65,8 @@
> >   * ACTMON_AVERAGE_WINDOW_LOG2: default value for @DEV_CTRL_K_VAL,
> > which
> >   * translates to 2 ^ (K_VAL + 1). ex: 2 ^ (6 + 1) = 128
> >   */
> > -#define ACTMON_AVERAGE_WINDOW_LOG2			6
> > -#define ACTMON_SAMPLING_PERIOD
> > 12 /* ms */ -#define
> > ACTMON_DEFAULT_AVG_BAND				6  /* 1/10
> > of % */ +#define
> > ACTMON_AVERAGE_WINDOW_LOG2				6
> > +#define
> > ACTMON_SAMPLING_PERIOD					12 /*
> > ms */ #define
> > KHZ							1000 @@
> > -142,9 +143,6 @@ struct tegra_devfreq_device {
> >  	 * watermark breaches.
> >  	 */
> >  	unsigned long boost_freq;
> > -
> > -	/* Optimal frequency calculated from the stats for this
> > device */
> > -	unsigned long target_freq;
> >  };
> >  
> >  struct tegra_devfreq {
> > @@ -156,7 +154,6 @@ struct tegra_devfreq {
> >  
> >  	struct clk		*emc_clock;
> >  	unsigned long		max_freq;
> > -	unsigned long		cur_freq;
> >  	struct notifier_block	rate_change_nb;
> >  
> >  	struct tegra_devfreq_device
> > devices[ARRAY_SIZE(actmon_device_configs)]; @@ -205,42 +202,182 @@
> > static unsigned long do_percent(unsigned long val, unsigned int
> > pct) return val * pct / 100; }
> >  
> > +static unsigned long actmon_cpu_to_emc_rate(struct tegra_devfreq
> > *tegra) +{
> > +	struct tegra_actmon_emc_ratio *ratio = actmon_emc_ratios;
> > +	unsigned int cpu_freq = cpufreq_get(0);
> > +	unsigned int i;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(actmon_emc_ratios); i++,
> > ratio++) {
> > +		if (cpu_freq >= ratio->cpu_freq) {
> > +			if (ratio->emc_freq >= tegra->max_freq)
> > +				return tegra->max_freq;
> > +			else
> > +				return ratio->emc_freq;
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static unsigned long
> > +tegra_actmon_account_cpu_freq(struct tegra_devfreq *tegra,
> > +			      struct tegra_devfreq_device *dev,
> > +			      unsigned long target_freq)
> > +{
> > +	unsigned long static_cpu_emc_freq;
> > +
> > +	if (dev->config->avg_dependency_threshold &&
> > +	    dev->config->avg_dependency_threshold <
> > dev->avg_count) {
> > +		static_cpu_emc_freq =
> > actmon_cpu_to_emc_rate(tegra);
> > +		target_freq = max(target_freq,
> > static_cpu_emc_freq);
> > +	}
> > +
> > +	return target_freq;
> > +}
> > +
> > +static unsigned long tegra_actmon_lower_freq(struct tegra_devfreq
> > *tegra,
> > +					     unsigned long
> > target_freq) +{
> > +	unsigned long lower = target_freq;
> > +	struct dev_pm_opp *opp;
> > +
> > +	opp =
> > dev_pm_opp_find_freq_floor(tegra->devfreq->dev.parent, &lower);
> > +	if (IS_ERR(opp))
> > +		lower = 0;
> > +	else
> > +		dev_pm_opp_put(opp);
> > +
> > +	return lower;
> > +}
> > +
> > +static unsigned long tegra_actmon_upper_freq(struct tegra_devfreq
> > *tegra,
> > +					     unsigned long
> > target_freq) +{
> > +	unsigned long upper = target_freq + 1;
> > +	struct dev_pm_opp *opp;
> > +
> > +	opp =
> > dev_pm_opp_find_freq_ceil(tegra->devfreq->dev.parent, &upper);
> > +	if (IS_ERR(opp))
> > +		upper = ULONG_MAX;
> > +	else
> > +		dev_pm_opp_put(opp);
> > +
> > +	return upper;
> > +}
> > +
> > +static void tegra_actmon_get_lower_upper(struct tegra_devfreq
> > *tegra,
> > +					 struct
> > tegra_devfreq_device *dev,
> > +					 unsigned long target_freq,
> > +					 unsigned long *lower,
> > +					 unsigned long *upper)
> > +{
> > +	/*
> > +	 * Memory frequencies are guaranteed to have 1MHz
> > granularity
> > +	 * and thus we need this rounding down to get a proper
> > watermarks
> > +	 * range in a case where target_freq falls into a range of
> > +	 * next_possible_opp_freq - 1MHz.
> > +	 */
> > +	target_freq = round_down(target_freq, 1000000);
> > +
> > +	/* watermarks are set at the borders of the corresponding
> > OPPs */
> > +	*lower = tegra_actmon_lower_freq(tegra, target_freq);
> > +	*upper = tegra_actmon_upper_freq(tegra, target_freq);
> > +
> > +	*lower /= KHZ;
> > +	*upper /= KHZ;
> > +
> > +	/*
> > +	 * The upper watermark should take into account CPU's
> > frequency
> > +	 * because cpu_to_emc_rate() may override the target_freq
> > with
> > +	 * a higher value and thus upper watermark need to be set
> > up
> > +	 * accordingly to avoid parasitic upper-events.
> > +	 */
> > +	*upper = tegra_actmon_account_cpu_freq(tegra, dev, *upper);
> > +
> > +	*lower *= ACTMON_SAMPLING_PERIOD;
> > +	*upper *= ACTMON_SAMPLING_PERIOD;
> > +}
> > +
> >  static void tegra_devfreq_update_avg_wmark(struct tegra_devfreq
> > *tegra, struct tegra_devfreq_device *dev)
> >  {
> > -	u32 avg = dev->avg_count;
> > -	u32 avg_band_freq = tegra->max_freq *
> > ACTMON_DEFAULT_AVG_BAND / KHZ;
> > -	u32 band = avg_band_freq * ACTMON_SAMPLING_PERIOD;
> > +	unsigned long lower, upper, freq;
> >  
> > -	device_writel(dev, avg + band, ACTMON_DEV_AVG_UPPER_WMARK);
> > +	freq = dev->avg_count / ACTMON_SAMPLING_PERIOD * KHZ;
> > +	tegra_actmon_get_lower_upper(tegra, dev, freq, &lower,
> > &upper); 
> > -	avg = max(dev->avg_count, band);
> > -	device_writel(dev, avg - band, ACTMON_DEV_AVG_LOWER_WMARK);
> > +	/*
> > +	 * We want to get interrupts when MCCPU client crosses the
> > +	 * dependency threshold in order to take into / out of
> > account
> > +	 * the CPU's freq.
> > +	 */
> > +	if (lower < dev->config->avg_dependency_threshold &&
> > +	    upper > dev->config->avg_dependency_threshold) {
> > +		if (dev->avg_count <
> > dev->config->avg_dependency_threshold)
> > +			upper =
> > dev->config->avg_dependency_threshold;
> > +		else
> > +			lower =
> > dev->config->avg_dependency_threshold;
> > +	}
> > +
> > +	device_writel(dev, lower, ACTMON_DEV_AVG_LOWER_WMARK);
> > +	device_writel(dev, upper, ACTMON_DEV_AVG_UPPER_WMARK);
> >  }
> >  
> >  static void tegra_devfreq_update_wmark(struct tegra_devfreq *tegra,
> > -				       struct tegra_devfreq_device
> > *dev)
> > +				       struct tegra_devfreq_device
> > *dev,
> > +				       unsigned long freq)
> >  {
> > -	u32 val = tegra->cur_freq * ACTMON_SAMPLING_PERIOD;
> > +	unsigned long lower, upper, delta;
> > +
> > +	/*
> > +	 * Boosting logic kicks-in once lower / upper watermark is
> > hit.
> > +	 * The watermarks are based on the updated EMC rate and the
> > +	 * average activity.
> > +	 *
> > +	 * The higher watermark is set in accordance to the EMC
> > rate
> > +	 * because we want to set it to the highest mark here and
> > EMC rate
> > +	 * represents that mark. The consecutive-upper interrupts
> > are
> > +	 * always enabled and we don't want to receive them if
> > they won't
> > +	 * do anything useful, hence the upper watermark is capped
> > to maximum.
> > +	 * Note that the EMC rate is changed once boosting pushed
> > the rate
> > +	 * too high, in that case boosting-up will be stopped
> > because
> > +	 * upper watermark is much higher now and it is
> > *important* to
> > +	 * stop the unwanted interrupts.
> > +	 */
> > +	tegra_actmon_get_lower_upper(tegra, dev, freq - 1, &lower,
> > &upper); +
> > +	delta = do_percent(upper - lower,
> > dev->config->boost_up_threshold);
> > +	device_writel(dev, lower + delta, ACTMON_DEV_UPPER_WMARK);
> >  
> > -	device_writel(dev, do_percent(val,
> > dev->config->boost_up_threshold),
> > -		      ACTMON_DEV_UPPER_WMARK);
> > +	/*
> > +	 * Meanwhile the lower mark is based on the average value
> > +	 * because it is the lowest possible consecutive-mark for
> > this
> > +	 * device. Once that mark is hit and boosting is stopped,
> > the
> > +	 * interrupt is disabled by ISR.
> > +	 */
> > +	freq = dev->avg_count / ACTMON_SAMPLING_PERIOD * KHZ;
> > +	tegra_actmon_get_lower_upper(tegra, dev, freq, &lower,
> > &upper); 
> > -	device_writel(dev, do_percent(val,
> > dev->config->boost_down_threshold),
> > -		      ACTMON_DEV_LOWER_WMARK);
> > +	delta = do_percent(upper - lower,
> > dev->config->boost_down_threshold);
> > +	device_writel(dev, lower + delta, ACTMON_DEV_LOWER_WMARK);
> >  }
> >  
> >  static void actmon_isr_device(struct tegra_devfreq *tegra,
> >  			      struct tegra_devfreq_device *dev)
> >  {
> > -	u32 intr_status, dev_ctrl;
> > +	u32 intr_status, dev_ctrl, avg_intr_mask;
> >  
> >  	dev->avg_count = device_readl(dev, ACTMON_DEV_AVG_COUNT);
> > -	tegra_devfreq_update_avg_wmark(tegra, dev);
> > -
> >  	intr_status = device_readl(dev, ACTMON_DEV_INTR_STATUS);
> >  	dev_ctrl = device_readl(dev, ACTMON_DEV_CTRL);
> >  
> > +	avg_intr_mask = ACTMON_DEV_INTR_AVG_BELOW_WMARK |
> > +			ACTMON_DEV_INTR_AVG_ABOVE_WMARK;
> > +
> > +	if (intr_status & avg_intr_mask)
> > +		tegra_devfreq_update_avg_wmark(tegra, dev);
> > +
> >  	if (intr_status & ACTMON_DEV_INTR_CONSECUTIVE_UPPER) {
> >  		/*
> >  		 * new_boost = min(old_boost * up_coef + step,
> > max_freq) @@ -253,8 +390,6 @@ static void actmon_isr_device(struct
> > tegra_devfreq *tegra, 
> >  		if (dev->boost_freq >= tegra->max_freq)
> >  			dev->boost_freq = tegra->max_freq;
> > -		else
> > -			dev_ctrl |=
> > ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_EN; } else if (intr_status
> > & ACTMON_DEV_INTR_CONSECUTIVE_LOWER) { /*
> >  		 * new_boost = old_boost * down_coef
> > @@ -263,63 +398,37 @@ static void actmon_isr_device(struct
> > tegra_devfreq *tegra, dev->boost_freq = do_percent(dev->boost_freq,
> >  					     dev->config->boost_down_coeff);
> >  
> > -		dev_ctrl |=
> > ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_EN; -
> >  		if (dev->boost_freq < (ACTMON_BOOST_FREQ_STEP >>
> > 1)) dev->boost_freq = 0;
> > -		else
> > -			dev_ctrl |=
> > ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN; }
> >  
> > -	if (dev->config->avg_dependency_threshold) {
> > -		if (dev->avg_count >=
> > dev->config->avg_dependency_threshold)
> > -			dev_ctrl |=
> > ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN;
> > -		else if (dev->boost_freq == 0)
> > -			dev_ctrl &=
> > ~ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN;
> > +	if (intr_status & avg_intr_mask) {
> > +		/*
> > +		 * Once average watermark is hit, it means that
> > the memory
> > +		 * activity changed significantly and thus
> > boosting-up shall
> > +		 * be reset because EMC clock rate will be changed
> > and
> > +		 * boosting will restart in this case.
> > +		 */
> > +		dev->boost_freq = 0;
> >  	}
> >  
> > -	device_writel(dev, dev_ctrl, ACTMON_DEV_CTRL);
> > +	/* no boosting => no need for consecutive-down interrupt */
> > +	if (dev->boost_freq == 0)
> > +		dev_ctrl &=
> > ~ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN; 
> > +	device_writel(dev, dev_ctrl, ACTMON_DEV_CTRL);
> >  	device_writel(dev, ACTMON_INTR_STATUS_CLEAR,
> > ACTMON_DEV_INTR_STATUS); }
> >  
> > -static unsigned long actmon_cpu_to_emc_rate(struct tegra_devfreq
> > *tegra,
> > -					    unsigned long cpu_freq)
> > -{
> > -	unsigned int i;
> > -	struct tegra_actmon_emc_ratio *ratio = actmon_emc_ratios;
> > -
> > -	for (i = 0; i < ARRAY_SIZE(actmon_emc_ratios); i++,
> > ratio++) {
> > -		if (cpu_freq >= ratio->cpu_freq) {
> > -			if (ratio->emc_freq >= tegra->max_freq)
> > -				return tegra->max_freq;
> > -			else
> > -				return ratio->emc_freq;
> > -		}
> > -	}
> > -
> > -	return 0;
> > -}
> > -
> > -static void actmon_update_target(struct tegra_devfreq *tegra,
> > -				 struct tegra_devfreq_device *dev)
> > +static unsigned long actmon_update_target(struct tegra_devfreq
> > *tegra,
> > +					  struct
> > tegra_devfreq_device *dev) {
> > -	unsigned long cpu_freq = 0;
> > -	unsigned long static_cpu_emc_freq = 0;
> > -	unsigned int avg_sustain_coef;
> > -
> > -	if (dev->config->avg_dependency_threshold) {
> > -		cpu_freq = cpufreq_get(0);
> > -		static_cpu_emc_freq =
> > actmon_cpu_to_emc_rate(tegra, cpu_freq);
> > -	}
> > +	unsigned long target_freq;
> >  
> > -	dev->target_freq = dev->avg_count / ACTMON_SAMPLING_PERIOD;
> > -	avg_sustain_coef = 100 * 100 /
> > dev->config->boost_up_threshold;
> > -	dev->target_freq = do_percent(dev->target_freq,
> > avg_sustain_coef);
> > -	dev->target_freq += dev->boost_freq;
> > +	target_freq = dev->avg_count / ACTMON_SAMPLING_PERIOD +
> > dev->boost_freq;
> > +	target_freq = tegra_actmon_account_cpu_freq(tegra, dev,
> > target_freq); 
> > -	if (dev->avg_count >=
> > dev->config->avg_dependency_threshold)
> > -		dev->target_freq = max(dev->target_freq,
> > static_cpu_emc_freq);
> > +	return target_freq;
> >  }
> >  
> >  static irqreturn_t actmon_thread_isr(int irq, void *data)
> > @@ -351,8 +460,8 @@ static int tegra_actmon_rate_notify_cb(struct
> > notifier_block *nb, unsigned long action, void *ptr)
> >  {
> >  	struct clk_notifier_data *data = ptr;
> > -	struct tegra_devfreq *tegra;
> >  	struct tegra_devfreq_device *dev;
> > +	struct tegra_devfreq *tegra;
> >  	unsigned int i;
> >  
> >  	if (action != POST_RATE_CHANGE)
> > @@ -360,12 +469,28 @@ static int tegra_actmon_rate_notify_cb(struct
> > notifier_block *nb, 
> >  	tegra = container_of(nb, struct tegra_devfreq,
> > rate_change_nb); 
> > -	tegra->cur_freq = data->new_rate / KHZ;
> > -
> > +	/*
> > +	 * EMC rate could change due to three reasons:
> > +	 *
> > +	 *    1. Average watermark hit
> > +	 *    2. Boosting overflow
> > +	 *    3. CPU freq change
> > +	 *
> > +	 * Once rate is changed, the consecutive watermarks need
> > to be
> > +	 * updated in order for boosting to work properly and to
> > avoid
> > +	 * unnecessary interrupts. Note that the consecutive range
> > is set for
> > +	 * all of devices using the same rate, hence if CPU is
> > doing much
> > +	 * less than the other memory clients, then its upper
> > watermark will
> > +	 * be very high in comparison to the actual activity
> > (lower watermark)
> > +	 * and thus unnecessary upper-interrupts will be
> > suppressed.
> > +	 *
> > +	 * The average watermarks also should be updated because
> > of 3.
> > +	 */
> >  	for (i = 0; i < ARRAY_SIZE(tegra->devices); i++) {
> >  		dev = &tegra->devices[i];
> >  
> > -		tegra_devfreq_update_wmark(tegra, dev);
> > +		tegra_devfreq_update_avg_wmark(tegra, dev);
> > +		tegra_devfreq_update_wmark(tegra, dev,
> > data->new_rate); }
> >  
> >  	return NOTIFY_OK;
> > @@ -374,15 +499,14 @@ static int tegra_actmon_rate_notify_cb(struct
> > notifier_block *nb, static void
> > tegra_actmon_configure_device(struct tegra_devfreq *tegra, struct
> > tegra_devfreq_device *dev) {
> > -	u32 val = 0;
> > -
> > -	dev->target_freq = tegra->cur_freq;
> > +	u32 val = 0, target_freq;
> >  
> > -	dev->avg_count = tegra->cur_freq * ACTMON_SAMPLING_PERIOD;
> > +	target_freq = clk_get_rate(tegra->emc_clock) / KHZ;
> > +	dev->avg_count = target_freq * ACTMON_SAMPLING_PERIOD;
> >  	device_writel(dev, dev->avg_count, ACTMON_DEV_INIT_AVG);
> >  
> >  	tegra_devfreq_update_avg_wmark(tegra, dev);
> > -	tegra_devfreq_update_wmark(tegra, dev);
> > +	tegra_devfreq_update_wmark(tegra, dev, target_freq);
> >  
> >  	device_writel(dev, ACTMON_COUNT_WEIGHT,
> > ACTMON_DEV_COUNT_WEIGHT); device_writel(dev,
> > ACTMON_INTR_STATUS_CLEAR, ACTMON_DEV_INTR_STATUS); @@ -469,13
> > +593,13 @@ static int tegra_devfreq_get_dev_status(struct device
> > *dev, struct tegra_devfreq_device *actmon_dev; unsigned long
> > cur_freq; 
> > -	cur_freq = READ_ONCE(tegra->cur_freq);
> > +	cur_freq = clk_get_rate(tegra->emc_clock);
> >  
> >  	/* To be used by the tegra governor */
> >  	stat->private_data = tegra;
> >  
> >  	/* The below are to be used by the other governors */
> > -	stat->current_frequency = cur_freq * KHZ;
> > +	stat->current_frequency = cur_freq;
> >  
> >  	actmon_dev = &tegra->devices[MCALL];
> >  
> > @@ -486,7 +610,7 @@ static int tegra_devfreq_get_dev_status(struct
> > device *dev, stat->busy_time *= 100 / BUS_SATURATION_RATIO;
> >  
> >  	/* Number of cycles in a sampling period */
> > -	stat->total_time = ACTMON_SAMPLING_PERIOD * cur_freq;
> > +	stat->total_time = cur_freq / KHZ * ACTMON_SAMPLING_PERIOD;
> >  
> >  	stat->busy_time = min(stat->busy_time, stat->total_time);
> >  
> > @@ -505,6 +629,7 @@ static int tegra_governor_get_target(struct
> > devfreq *devfreq, struct devfreq_dev_status *stat;
> >  	struct tegra_devfreq *tegra;
> >  	struct tegra_devfreq_device *dev;
> > +	unsigned long dev_target_freq;
> >  	unsigned long target_freq = 0;
> >  	unsigned int i;
> >  	int err;
> > @@ -520,9 +645,9 @@ static int tegra_governor_get_target(struct
> > devfreq *devfreq, for (i = 0; i < ARRAY_SIZE(tegra->devices); i++) {
> >  		dev = &tegra->devices[i];
> >  
> > -		actmon_update_target(tegra, dev);
> > +		dev_target_freq = actmon_update_target(tegra, dev);
> >  
> > -		target_freq = max(target_freq, dev->target_freq);
> > +		target_freq = max(target_freq, dev_target_freq);
> >  	}
> >  
> >  	*freq = target_freq * KHZ;
> > @@ -642,7 +767,6 @@ static int tegra_devfreq_probe(struct
> > platform_device *pdev) return rate;
> >  	}
> >  
> > -	tegra->cur_freq = clk_get_rate(tegra->emc_clock) / KHZ;
> >  	tegra->max_freq = rate / KHZ;
> >  
> >  	for (i = 0; i < ARRAY_SIZE(actmon_device_configs); i++) {
> > @@ -671,7 +795,8 @@ static int tegra_devfreq_probe(struct
> > platform_device *pdev) platform_set_drvdata(pdev, tegra);
> >  
> >  	tegra->rate_change_nb.notifier_call =
> > tegra_actmon_rate_notify_cb;
> > -	err = clk_notifier_register(tegra->emc_clock,
> > &tegra->rate_change_nb);
> > +	err = clk_notifier_register(tegra->emc_clock,
> > +				    &tegra->rate_change_nb);
> >  	if (err) {
> >  		dev_err(&pdev->dev,
> >  			"Failed to register rate change
> > notifier\n"); 
> 
> 
> Maybe, it is possible to merge patch4/patch19/patch20 to one patch.

All these three patches are completely separate changes, thus they
should be kept separate.

^ permalink raw reply

* Re: [PATCH 14/18] drivers: firmware: psci: Manage runtime PM in the idle path for CPUs
From: Ulf Hansson @ 2019-07-18 21:49 UTC (permalink / raw)
  To: Lina Iyer
  Cc: Lorenzo Pieralisi, Sudeep Holla, Mark Rutland, Linux ARM,
	Rafael J . Wysocki, Daniel Lezcano, Raju P . L . S . S . S . N,
	Amit Kucheria, Bjorn Andersson, Stephen Boyd, Niklas Cassel,
	Tony Lindgren, Kevin Hilman, Viresh Kumar, Vincent Guittot,
	Geert Uytterhoeven, Souvik Chakravarty, Linux PM, linux-arm-msm,
	Linux Kernel Mailing List
In-Reply-To: <20190718174116.GD25567@codeaurora.org>

On Thu, 18 Jul 2019 at 19:41, Lina Iyer <ilina@codeaurora.org> wrote:
>
> On Thu, Jul 18 2019 at 10:55 -0600, Ulf Hansson wrote:
> >On Thu, 18 Jul 2019 at 15:31, Lorenzo Pieralisi
> ><lorenzo.pieralisi@arm.com> wrote:
> >>
> >> On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
> >> > On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
> >> > <lorenzo.pieralisi@arm.com> wrote:
> >> > >
> >> > > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> >> > > > When the hierarchical CPU topology layout is used in DT, let's allow the
> >> > > > CPU to be power managed through its PM domain, via deploying runtime PM
> >> > > > support.
> >> > > >
> >> > > > To know for which idle states runtime PM reference counting is needed,
> >> > > > let's store the index of deepest idle state for the CPU, in a per CPU
> >> > > > variable. This allows psci_cpu_suspend_enter() to compare this index with
> >> > > > the requested idle state index and then act accordingly.
> >> > >
> >> > > I do not see why a system with two CPU CPUidle states, say CPU retention
> >> > > and CPU shutdown, should not be calling runtime PM on CPU retention
> >> > > entry.
> >> >
> >> > If the CPU idle governor did select the CPU retention for the CPU, it
> >> > was probably because the target residency for the CPU shutdown state
> >> > could not be met.
> >>
> >> The kernel does not know what those cpu states represent, so, this is an
> >> assumption you are making and it must be made clear that this code works
> >> as long as your assumption is valid.
> >>
> >> If eg a "cluster" retention state has lower target_residency than
> >> the deepest CPU idle state this assumption is wrong.
> >
> >Good point, you are right. I try to find a place to document this assumption.
> >
> >>
> >> And CPUidle and genPD governor decisions are not synced anyway so,
> >> again, this is an assumption, not a certainty.
> >>
> >> > In this case, there is no point in allowing any other deeper idle
> >> > states for cluster/package/system, since those have even greater
> >> > residencies, hence calling runtime PM doesn't make sense.
> >>
> >> On the systems you are testing on.
> >
> >So what you are saying typically means, that if all CPUs in the same
> >cluster have entered the CPU retention state, on some system the
> >cluster may also put into a cluster retention state (assuming the
> >target residency is met)?
> >
> >Do you know of any systems that has these characteristics?
> >
> Many QCOM SoCs can do that. But with the hardware improving, the
> power-performance benefits skew the results in favor of powering off
> the cluster than keeping the CPU and cluster in retention.
>
> Kevin H and I thought of this problem earlier on. But that is a second
> level problem to solve and definitely to be thought of after we have the
> support for the deepest states in the kernel. We left that out for a
> later date. The idea would have been to setup the allowable state(s) in
> the DT for CPU and cluster state definitions and have the genpd take
> that into consideration when deciding the idle state for the domain.

Thanks for confirming.

This more or less means we need to improve the hierarchical support in
genpd to support more levels, such that it makes sense to have a genpd
governor assigned at more than one level. This doesn't work well
today. As I also have stated, this is on my todo list for genpd.

However, I also agree with your standpoint, that let's start simple to
enable the deepest state as a start with, then we can improve things
on top.

Kind regards
Uffe

^ permalink raw reply

* [PATCH V36 10/29] hibernate: Disable when the kernel is locked down
From: Matthew Garrett @ 2019-07-18 19:43 UTC (permalink / raw)
  To: jmorris
  Cc: linux-security-module, linux-kernel, linux-api, Josh Boyer,
	David Howells, Matthew Garrett, Kees Cook, rjw, pavel, linux-pm
In-Reply-To: <20190718194415.108476-1-matthewgarrett@google.com>

From: Josh Boyer <jwboyer@fedoraproject.org>

There is currently no way to verify the resume image when returning
from hibernate.  This might compromise the signed modules trust model,
so until we can work with signed hibernate images we disable it when the
kernel is locked down.

Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: rjw@rjwysocki.net
Cc: pavel@ucw.cz
cc: linux-pm@vger.kernel.org
---
 include/linux/security.h     | 1 +
 kernel/power/hibernate.c     | 3 ++-
 security/lockdown/lockdown.c | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index 69c5de539e9a..304a155a5628 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -106,6 +106,7 @@ enum lockdown_reason {
 	LOCKDOWN_MODULE_SIGNATURE,
 	LOCKDOWN_DEV_MEM,
 	LOCKDOWN_KEXEC,
+	LOCKDOWN_HIBERNATION,
 	LOCKDOWN_INTEGRITY_MAX,
 	LOCKDOWN_CONFIDENTIALITY_MAX,
 };
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index cd7434e6000d..3c0a5a8170b0 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -30,6 +30,7 @@
 #include <linux/ctype.h>
 #include <linux/genhd.h>
 #include <linux/ktime.h>
+#include <linux/security.h>
 #include <trace/events/power.h>
 
 #include "power.h"
@@ -68,7 +69,7 @@ static const struct platform_hibernation_ops *hibernation_ops;
 
 bool hibernation_available(void)
 {
-	return (nohibernate == 0);
+	return nohibernate == 0 && !security_locked_down(LOCKDOWN_HIBERNATION);
 }
 
 /**
diff --git a/security/lockdown/lockdown.c b/security/lockdown/lockdown.c
index 6f302c156bc8..a0996f75629f 100644
--- a/security/lockdown/lockdown.c
+++ b/security/lockdown/lockdown.c
@@ -21,6 +21,7 @@ static char *lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
 	[LOCKDOWN_MODULE_SIGNATURE] = "unsigned module loading",
 	[LOCKDOWN_DEV_MEM] = "/dev/mem,kmem,port",
 	[LOCKDOWN_KEXEC] = "kexec of unsigned images",
+	[LOCKDOWN_HIBERNATION] = "hibernation",
 	[LOCKDOWN_INTEGRITY_MAX] = "integrity",
 	[LOCKDOWN_CONFIDENTIALITY_MAX] = "confidentiality",
 };
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [PATCH v12 0/6] Add utilization clamping support (CGroups API)
From: Patrick Bellasi @ 2019-07-18 18:17 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-api, cgroups
  Cc: Ingo Molnar, Peter Zijlstra, Tejun Heo, Rafael J . Wysocki,
	Vincent Guittot, Viresh Kumar, Paul Turner, Michal Koutny,
	Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
	Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan,
	Alessio Balsini

Hi all, this is a respin of:

  https://lore.kernel.org/lkml/20190708084357.12944-1-patrick.bellasi@arm.com/

which addresses all the comments collected so far:

- track requested cgroup's percentage to mask conversion rounding to userspace
- use a dedicated variable for parent restrictions
- make more explicit in the documentation that the requested "protection" is
  always capped by the requested "limit"
- use the newly added uclamp_mutex to serialize the sysfs write callback
- add missing RCU read locks across cpu_util_update_eff() call from
  uclamp_update_root_tg()
- remove not required and confusing sentence from the above changelog
- add a new patch to always use enum uclamp_id for clamp_id values
- fix percentage's decimals format string

as well as adds some small modifications:

- introduce UCLAMP_PERCENT_{SHIFT,SCALE} to avoid hardcoded constants
- s/uclamp_scale_from_percent()/capacity_from_percent()/
- move range check from cpu_uclamp_{min,max}_write() to capacity_from_percent()

The series is based on top of today's Linus master branch (wip for 5.3-rc1):

  commit 22051d9c4a57 ("Merge tag 'platform-drivers-x86-v5.3-2' of git://git.infradead.org/linux-platform-drivers-x86")

Thanks Quentin, Michal and Tejun for your review comments!

This has been the first code review targeting specifically the cgroups bits and
the series is now hopefully in a better shape.

Looking forward for any additional comments! ;)

Cheers,
Patrick

Series Organization
===================

The full tree is available here:

   git://linux-arm.org/linux-pb.git   lkml/utilclamp_v12
   http://www.linux-arm.org/git?p=linux-pb.git;a=shortlog;h=refs/heads/lkml/utilclamp_v12

Newcomer's Short Abstract
=========================

The Linux scheduler tracks a "utilization" signal for each scheduling entity
(SE), e.g. tasks, to know how much CPU time they use. This signal allows the
scheduler to know how "big" a task is and, in principle, it can support
advanced task placement strategies by selecting the best CPU to run a task.
Some of these strategies are represented by the Energy Aware Scheduler [1].

When the schedutil cpufreq governor is in use, the utilization signal allows
the Linux scheduler to also drive frequency selection. The CPU utilization
signal, which represents the aggregated utilization of tasks scheduled on that
CPU, is used to select the frequency which best fits the workload generated by
the tasks.

The current translation of utilization values into a frequency selection is
simple: we go to max for RT tasks or to the minimum frequency which can
accommodate the utilization of DL+FAIR tasks.
However, utilization values by themselves cannot convey the desired
power/performance behaviors of each task as intended by user-space.
As such they are not ideally suited for task placement decisions.

Task placement and frequency selection policies in the kernel can be improved
by taking into consideration hints coming from authorized user-space elements,
like for example the Android middleware or more generally any "System
Management Software" (SMS) framework.

Utilization clamping is a mechanism which allows to "clamp" (i.e. filter) the
utilization generated by RT and FAIR tasks within a range defined by user-space.
The clamped utilization value can then be used, for example, to enforce a
minimum and/or maximum frequency depending on which tasks are active on a CPU.

The main use-cases for utilization clamping are:

 - boosting: better interactive response for small tasks which
   are affecting the user experience.

   Consider for example the case of a small control thread for an external
   accelerator (e.g. GPU, DSP, other devices). Here, from the task utilization
   the scheduler does not have a complete view of what the task's requirements
   are and, if it's a small utilization task, it keeps selecting a more energy
   efficient CPU, with smaller capacity and lower frequency, thus negatively
   impacting the overall time required to complete task activations.

 - capping: increase energy efficiency for background tasks not affecting the
   user experience.

   Since running on a lower capacity CPU at a lower frequency is more energy
   efficient, when the completion time is not a main goal, then capping the
   utilization considered for certain (maybe big) tasks can have positive
   effects, both on energy consumption and thermal headroom.
   This feature allows also to make RT tasks more energy friendly on mobile
   systems where running them on high capacity CPUs and at the maximum
   frequency is not required.

From these two use-cases, it's worth noticing that frequency selection
biasing, introduced by patches 9 and 10 of this series, is just one possible
usage of utilization clamping. Another compelling extension of utilization
clamping is in helping the scheduler in making tasks placement decisions.

Utilization is (also) a task specific property the scheduler uses to know
how much CPU bandwidth a task requires, at least as long as there is idle time.
Thus, the utilization clamp values, defined either per-task or per-task_group,
can represent tasks to the scheduler as being bigger (or smaller) than what
they actually are.

Utilization clamping thus enables interesting additional optimizations, for
example on asymmetric capacity systems like Arm big.LITTLE and DynamIQ CPUs,
where:

 - boosting: try to run small/foreground tasks on higher-capacity CPUs to
   complete them faster despite being less energy efficient.

 - capping: try to run big/background tasks on low-capacity CPUs to save power
   and thermal headroom for more important tasks

This series does not present this additional usage of utilization clamping but
it's an integral part of the EAS feature set, where [2] is one of its main
components.

Android kernels use SchedTune, a solution similar to utilization clamping, to
bias both 'frequency selection' and 'task placement'. This series provides the
foundation to add similar features to mainline while focusing, for the
time being, just on schedutil integration.

References
==========

[1] Energy Aware Scheduling
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/scheduler/sched-energy.txt?h=v5.1

[2] Expressing per-task/per-cgroup performance hints
    Linux Plumbers Conference 2018
    https://linuxplumbersconf.org/event/2/contributions/128/

Patrick Bellasi (6):
  sched/core: uclamp: Extend CPU's cgroup controller
  sched/core: uclamp: Propagate parent clamps
  sched/core: uclamp: Propagate system defaults to root group
  sched/core: uclamp: Use TG's clamps to restrict TASK's clamps
  sched/core: uclamp: Update CPU's refcount on TG's clamp changes
  sched/core: uclamp: always use enum uclamp_id for clamp_id values

 Documentation/admin-guide/cgroup-v2.rst |  34 +++
 init/Kconfig                            |  22 ++
 kernel/sched/core.c                     | 382 ++++++++++++++++++++++--
 kernel/sched/sched.h                    |  12 +-
 4 files changed, 430 insertions(+), 20 deletions(-)

-- 
2.22.0

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox