From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
To: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Subject: Re: [Update][PATCH 1/2] cpufreq: intel_pstate: Per CPU P-State limits
Date: Wed, 12 Oct 2016 16:16:05 -0700
Message-ID: <1476314165.59335.137.camel@linux.intel.com>
In-Reply-To: <3832699.WyNH3AWCH1@vostro.rjw.lan>
Hi,
On Thu, 2016-10-13 at 00:26 +0200, Rafael J. Wysocki wrote:
[...]
>
> I have a couple of comments to the details, but let me start with a
> high-level view.
> I would do this quite differently.  There is quite a bit of a headache with
> keeping the global settings (intel_pstate/max(min)_perf_pct) and the
> per-policy scaling_max(min)_freq in sync and I'm not sure this really is
> necessary.
>
> Moreover, all of these updates are asynchronous with respect to the scheduler
> callback anyway, so there are races this synchronization between the settings
> doesn't cover.  IMO, it would be sufficient to combine the two in
> intel_pstate_get_min_max() only where the limits to apply for the given CPU
> are computed.
>
> So if limits->max_sysfs_perf is there (in addition to or even instead of
> limits->max_sysfs_pct), intel_pstate_get_min_max() can be something like:
The idea is to preserve the current functionality. The problem with that
approach is that when the limits are changed via the cpufreq sysfs, the
intel_pstate sysfs doesn't reflect the change. The effective value is what
it displays today, so I am keeping the same behavior.
Currently we can read the effective performance limit from the intel_pstate
sysfs irrespective of which sysfs interface set it, and I don't want to
break that behavior. (There are tools reading the intel_pstate sysfs for
the current limits so they can get to the next limit quickly and meet a
thermal deadline; also the cpufreq sysfs limits can change without the
tools' knowledge, e.g. via thermal or _PPC, so the intel_pstate sysfs is
the common place now.)
As for the sync issue with the scheduler callbacks, the effect will be seen
in the next cycle; the limits don't need to be instantaneous anyway.
> static void intel_pstate_get_min_max(struct cpudata *cpu, int *min,
> 				     int *max)
>
It is better to avoid adding more comparisons to this fast-path function.
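Just to make the cost concrete, with your suggestion the combination would
happen on every invocation from the scheduler callback path, roughly like
this (a sketch based on the current code plus your hypothetical
limits->max_sysfs_perf, not the actual patch):

static void intel_pstate_get_min_max(struct cpudata *cpu, int *min, int *max)
{
	int max_perf = cpu->pstate.turbo_pstate;

	if (limits->no_turbo || limits->turbo_disabled)
		max_perf = cpu->pstate.max_pstate;

	*max = clamp_t(int, fp_toint(max_perf * limits->max_perf),
		       cpu->pstate.min_pstate, cpu->pstate.turbo_pstate);

	/* the extra fast-path comparison: fold in the sysfs limit here */
	*max = min(*max, limits->max_sysfs_perf);

	*min = clamp_t(int, fp_toint(max_perf * limits->min_perf),
		       cpu->pstate.min_pstate, *max);
}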
[...]
> > @@ -569,12 +566,12 @@ static void intel_pstate_hwp_set(const struct cpumask *cpumask)
> >  		range = hw_max - hw_min;
> >
> >  		rdmsrl_on_cpu(cpu, MSR_HWP_REQUEST, &value);
> > -		adj_range = limits->min_perf_pct * range / 100;
> > +		adj_range = all_cpu_data[cpu]->min_perf_pct * range / 100;
> >  		min = hw_min + adj_range;
> >  		value &= ~HWP_MIN_PERF(~0L);
> >  		value |= HWP_MIN_PERF(min);
> >
> > -		adj_range = limits->max_perf_pct * range / 100;
> > +		adj_range = all_cpu_data[cpu]->max_perf_pct * range / 100;
> >  		max = hw_min + adj_range;
> >  		if (limits->no_turbo) {
> >  			hw_max = HWP_GUARANTEED_PERF(cap);
>
> I'd change this as I said in the beginning.
This needs to be done irrespective of which approach we choose; the HWP
requests are per CPU.
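That is, each CPU's request register has to be programmed individually;
the loop ends with a per-CPU MSR write along these lines (matching the
existing intel_pstate_hwp_set() code, to the best of my reading):

	value &= ~HWP_MAX_PERF(~0L);
	value |= HWP_MAX_PERF(max);
	/* each CPU gets its own request; there is no global HWP knob */
	wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value);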
>
> >
> >
[...]
> > + * since that is what can be the most constrained min performance limit.
> > + */
>
> Generally, if you add a comment to a function, there's no reason why
> it should
> not be a proper kerneldoc.
I can add that.
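Something along these lines, for example (exact wording to be decided):

/**
 * get_max_min_perf() - Get the effective performance limit percentages
 * @max_perf: Filled with the most constrained max performance limit.
 * @min_perf: Filled with the most constrained min performance limit.
 *
 * Scan the per-CPU limits of all online CPUs and return the effective
 * limits, which is what the intel_pstate sysfs displays.
 */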
> > [...]
> > +	for_each_online_cpu(cpu_index) {
>
> Iterating over online CPUs generally requires CPU online/offline to be locked.
>
> Otherwise the CPU you are dealing with at the moment may go away from under you.
We are only updating memory variables here, which are not per-CPU
variables. When a CPU goes offline, the stop() callback will be called,
where the mutex prevents the race. I can add get/put_online_cpus(), but it
is overkill since the user will get the updated value on the next read.
Also, there is still a small window between put_online_cpus() and the
sysfs return to user mode where an offline can still happen, so it doesn't
cover all cases anyway.
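If we did add it, it would just wrap the loop, along these lines (sketch
only; the locking is the only change):

	get_online_cpus();
	for_each_online_cpu(cpu_index) {
		struct cpudata *cpu = all_cpu_data[cpu_index];
		/* ... per-CPU limit aggregation as in the patch ... */
	}
	put_online_cpus();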
>
[...]
> > +		struct cpudata *cpu = all_cpu_data[cpu_index];
> >
> > +static ssize_t show_min_perf_pct(struct kobject *kobj,
> > +				 struct attribute *attr, char *buf)
> > +{
> > +	int max_perf, min_perf;
> > +
> > +	get_max_min_perf(&max_perf, &min_perf);
> > +
> > +	return sprintf(buf, "%d\n", min_perf);
> > +}
>
> From the above two we can simply return the last value written to
> them IMO.
Most of the time it will be the last written value, unless someone has
changed the cpufreq policy in the meantime.
>
> That would be less confusing to users than returning some effective value
> depending on the per-policy settings.
This is what is returned now. The cpupower utility is widely used to limit
performance via the cpufreq sysfs, and we already return the reduced
(effective) value on a sysfs read here.
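To make the "effective value" concrete, the aggregation amounts to
something like this (a sketch reflecting my reading of "most constrained",
not the actual patch):

static void get_max_min_perf(int *max_perf, int *min_perf)
{
	int cpu_index;

	*max_perf = 100;
	*min_perf = 0;

	for_each_online_cpu(cpu_index) {
		struct cpudata *cpu = all_cpu_data[cpu_index];

		/* most constrained max limit: the lowest per-CPU max */
		if (cpu->max_perf_pct < *max_perf)
			*max_perf = cpu->max_perf_pct;
		/* most constrained min limit: the highest per-CPU min */
		if (cpu->min_perf_pct > *min_perf)
			*min_perf = cpu->min_perf_pct;
	}
}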
Thanks,
Srinivas