From mboxrd@z Thu Jan  1 00:00:00 1970
From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Subject: Re: [Update][PATCH 1/2] cpufreq: intel_pstate: Per CPU P-State
 limits
Date: Wed, 12 Oct 2016 16:16:05 -0700
Message-ID: <1476314165.59335.137.camel@linux.intel.com>
References: <1476299241-87924-1-git-send-email-srinivas.pandruvada@linux.intel.com>
         <3832699.WyNH3AWCH1@vostro.rjw.lan>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Return-path: <linux-pm-owner@vger.kernel.org>
Received: from mga11.intel.com ([192.55.52.93]:52684 "EHLO mga11.intel.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1753562AbcJLXWh (ORCPT <rfc822;linux-pm@vger.kernel.org>);
        Wed, 12 Oct 2016 19:22:37 -0400
In-Reply-To: <3832699.WyNH3AWCH1@vostro.rjw.lan>
Sender: linux-pm-owner@vger.kernel.org
List-Id: linux-pm@vger.kernel.org
To: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org

Hi,

On Thu, 2016-10-13 at 00:26 +0200, Rafael J. Wysocki wrote:

[...]

> 
> I have a couple of comments to the details, but let me start with a
> high-level view.
> I would do this quite differently.  There is quite a bit of a
> headache with
> keeping the global settings (intel_pstate/max(min)_perf_pct) and the
> per-policy
> scaling_max(min)_freq in sync and I'm not sure this really is
> necessary.  
> Moreover, all of these updates are asynchronous with respect to the
> scheduler
> callback anyway, so there are races this synchronization between the
> settings
> doesn't cover.  IMO, it would be sufficient to combine the two in
> intel_pstate_get_min_max() only where the limits to apply for the
> given CPU are
> computed.
> 
> So if limits->max_sysfs_perf is there (in addition to or even instead
> of
> limits->max_sysfs_pct), intel_pstate_get_min_max() can be something
> like:

Idea is to preserve current functionality.

The problem with approach is that limits are changed via cpufreq sysfs,
the intel_pstate sysfs doesn't reflect that. This is the value it is
displaying currently so keeping the same functionality.

Currently we know from Intel P-state sysfs the limited performance
value irrespective of which sysfs did. So I don't want to break this
behavior (There are tools reading intel_pstate sysfs for current limits
to get to next quickly to meet thermal deadline, also cpufreq sysfs
limits can change without tools knowledge by say thermal or _PPC, so
the intel P-state sysfs is a common place now).

For the sync issue with scheduler callbacks, the effect will see in
next cycle. Anyway limits don't need to be instantaneous.

 
static void intel_pstate_get_min_max(struct cpudata *cpu, int *min,
> int *max)
> 
Better to avoid more comparisons in fast path function.


[...]

> > @@ -569,12 +566,12 @@ static void intel_pstate_hwp_set(const struct
> > cpumask *cpumask)
> >  		range = hw_max - hw_min;
> >  
> >  		rdmsrl_on_cpu(cpu, MSR_HWP_REQUEST, &value);
> > -		adj_range = limits->min_perf_pct * range / 100;
> > +		adj_range = all_cpu_data[cpu]->min_perf_pct *
> > range / 100;
> >  		min = hw_min + adj_range;
> >  		value &= ~HWP_MIN_PERF(~0L);
> >  		value |= HWP_MIN_PERF(min);
> >  
> > -		adj_range = limits->max_perf_pct * range / 100;
> > +		adj_range = all_cpu_data[cpu]->max_perf_pct *
> > range / 100;
> >  		max = hw_min + adj_range;
> >  		if (limits->no_turbo) {
> >  			hw_max = HWP_GUARANTEED_PERF(cap);
> 
> I'd change this as I said in the beginning.
This needs to be done irrespective of the way choose. The HWP requests
are per CPU.

> 
> > 
> > 
[...]
> > + * since that is what can be be most constrained min performance
> > limit.
> > + */
> 
> Generally, if you add a comment to a function, there's no reason why
> it should
> not be a proper kerneldoc.
I can add that. 

> > [...]

> > +	for_each_online_cpu(cpu_index) {
> 
> Iterating over online CPUs generally requires CPU online/offline to
> be locked.
> 
> Otherwise the CPU you are dealing with at the moment may go away from
> under you.
We are updating memory variables only which are not per_cpu variable.
When offline is done the stop() callback will be called where the mutex
will prevent race. I can add get/put_online, this is an overkill as
user can get during next read. Also there is still a small window after
put_online_cpu() and sysfs return to user mode, where offline can still
happen, so it doesn't take care of all cases.

> 
[...]

> > +		struct cpudata *cpu = all_cpu_data[cpu_index];
> > 
> > +static ssize_t show_min_perf_pct(struct kobject *kobj,
> > +				 struct attribute *attr, char
> > *buf)
> > +{
> > +	int max_perf, min_perf;
> > +
> > +	get_max_min_perf(&max_perf, &min_perf);
> > +
> > +	return sprintf(buf, "%d\n", min_perf);
> > +}
> 
> From the above two we can simply return the last value written to
> them IMO.
Most of the time it will be last value unless someone changed the
cpufreq policy anyway.

> 
> That would be less confusing to users than returning some effective
> value
> depending on the per-policy settings.
This is what returned now. Currently cpupower utility is used by many
to limit performance via cpufreq sysfs and we return the reduced value
via sysfs read here.

Thanks,
Srinivas