From: "Doug Smythies" <dsmythies@telus.net>
To: "'Rafael J. Wysocki'" <rjw@rjwysocki.net>,
"'Linux PM'" <linux-pm@vger.kernel.org>
Cc: "'LKML'" <linux-kernel@vger.kernel.org>,
"'Len Brown'" <len.brown@intel.com>,
"'Srinivas Pandruvada'" <srinivas.pandruvada@linux.intel.com>,
"'Peter Zijlstra'" <peterz@infradead.org>,
"'Giovanni Gherdovich'" <ggherdovich@suse.cz>,
"'Francisco Jerez'" <francisco.jerez.plata@intel.com>
Subject: RE: [RFC/RFT][PATCH] cpufreq: intel_pstate: Work in passive mode with HWP enabled
Date: Sun, 14 Jun 2020 23:18:16 -0700 [thread overview]
Message-ID: <002801d642dc$c5225c10$4f671430$@net> (raw)
In-Reply-To: <3169564.ZRsPWhXyMD@kreacher>
On 2020.05.21 10:16 Rafael J. Wysocki wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Allow intel_pstate to work in the passive mode with HWP enabled and
> make it translate the target frequency supplied by the cpufreq
> governor in use into an EPP value to be written to the HWP request
> MSR (high frequencies are mapped to low EPP values that mean more
> performance-oriented HWP operation) as a hint for the HWP algorithm
> in the processor, so as to prevent it and the CPU scheduler from
> working against each other at least when the schedutil governor is
> in use.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> This is a prototype not intended for production use (based on linux-next).
>
> Please test it if you can (on HWP systems, of course) and let me know the
> results.
>
> The INTEL_CPUFREQ_TRANSITION_DELAY_HWP value has been guessed and it very well
> may turn out to be either too high or too low for the general use, which is one
> reason why getting as much testing coverage as possible is key here.
>
> If you can play with different INTEL_CPUFREQ_TRANSITION_DELAY_HWP values,
> please do so and let me know the conclusions.
>
> Cheers,
> Rafael
To anyone trying this patch:
You will need to monitor EPP (Energy Performance Preference) carefully.
It changes as a function of passive/active, and if you booted active
or passive or no-hwp and changed later.
Originally, I was not specifically monitoring EPP, or paths taken since boot
towards driver, intel_pstate or intel_cpufreq, and governor, and will now have
to set aside test results.
@Rafael: I am still having problems with my test computer and HWP. However, I can
observe the energy saving potential of this "passive-yet-active HWP mode".
At this point, I am actually trying to make my newer test computer
simply behave and do what it is told with respect to CPU frequency scaling,
because even acpi-cpufreq misbehaves for performance governor under
some conditions [1].
[1] https://marc.info/?l=linux-pm&m=159155067328641&w=2
To my way of thinking:
1.) it is imperative that we be able to decouple the governor servo
from the processor servo. At a minimum this is needed for system testing,
debugging and reference baselines. At a maximum users could, perhaps, decide
for themselves. Myself, I would prefer "passive" to mean "do what you
have been told", and that is now what I am testing.
2.) I have always thought, indeed relied on, performance mode as being
more than a hint. For my older i7-2600K it never disobeyed orders, except
for the most minuscule of workloads.
This newer i5-9600K seems to have a mind of its own which I would like
to be able to turn off, yet still be able to use intel_pstate trace
with schedutil.
Recall last week I said
> moving forward the typical CPU frequency scaling
> configuration for my test system will be:
>
> driver: intel-cpufreq, forced at boot.
> governor: schedutil
> hwp: forced off at boot.
The problem is that baseline references are still needed
and performance mode is unreliable. Maybe other stuff also,
I simply don't know at this point.
Example of EPP changing (no need to read on) (from fresh boot):
Current EPP:
root@s18:/home/doug# rdmsr --bitfield 31:24 -u -a 0x774
128
128
128
128
128
128
root@s18:/home/doug# grep . /sys/devices/system/cpu/cpu3/cpufreq/*
/sys/devices/system/cpu/cpu3/cpufreq/affected_cpus:3
/sys/devices/system/cpu/cpu3/cpufreq/base_frequency:3700000
/sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_max_freq:4600000
/sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_min_freq:800000
/sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_transition_latency:0
/sys/devices/system/cpu/cpu3/cpufreq/energy_performance_available_preferences:default performance balance_performance balance_power
power
/sys/devices/system/cpu/cpu3/cpufreq/energy_performance_preference:balance_performance
/sys/devices/system/cpu/cpu3/cpufreq/related_cpus:3
/sys/devices/system/cpu/cpu3/cpufreq/scaling_available_governors:performance powersave
/sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:800102
/sys/devices/system/cpu/cpu3/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu3/cpufreq/scaling_max_freq:4600000
/sys/devices/system/cpu/cpu3/cpufreq/scaling_min_freq:800000
/sys/devices/system/cpu/cpu3/cpufreq/scaling_setspeed:<unsupported>
Now, switch to passive mode:
echo passive > /sys/devices/system/cpu/intel_pstate/status
And observe EPP:
root@s18:/home/doug# rdmsr --bitfield 31:24 -u -a 0x774
255
255
255
255
255
255
root@s18:/home/doug# grep . /sys/devices/system/cpu/cpu3/cpufreq/*
/sys/devices/system/cpu/cpu3/cpufreq/affected_cpus:3
/sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_max_freq:4600000
/sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_min_freq:800000
/sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_transition_latency:20000
/sys/devices/system/cpu/cpu3/cpufreq/related_cpus:3
/sys/devices/system/cpu/cpu3/cpufreq/scaling_available_governors:conservative ondemand userspace powersave performance schedutil
Hey, where did the ability to adjust the energy_performance_preference setting go?
/sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:3400313
/sys/devices/system/cpu/cpu3/cpufreq/scaling_driver:intel_cpufreq
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu3/cpufreq/scaling_max_freq:4600000
/sys/devices/system/cpu/cpu3/cpufreq/scaling_min_freq:800000
/sys/devices/system/cpu/cpu3/cpufreq/scaling_setspeed:<unsupported>
Kernel is 5.7 +plus this patch:
root@s18:/home/doug# uname -a
Linux s18 5.7.0-hwp10 #786 SMP PREEMPT Tue Jun 9 20:15:18 PDT 2020 x86_64 x86_64 x86_64 GNU/Linux
223e5c33f927 (HEAD -> k57-doug-hwp) cpufreq: intel_pstate: Accept passive mode with HWP enabled
5d890a14763d cpufreq: intel_pstate: Use passive mode by default without HWP
3d77e6a8804a (tag: v5.7) Linux 5.7
The below is on top of this patch, and is how I am attempting to move forward:
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 4ab8bc1476c9..6c28ec49b192 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2331,33 +2331,32 @@ static void intel_cpufreq_update_hwp_request(struct cpudata *cpu, u32 min_perf)
value |= HWP_MIN_PERF(min_perf);
/*
- * The entire MSR needs to be updated in order to update the HWP min
- * field in it, so opportunistically update the max too if needed.
+ * the max also...
*/
value &= ~HWP_MAX_PERF(~0L);
- value |= HWP_MAX_PERF(cpu->max_perf_ratio);
+ value |= HWP_MAX_PERF(min_perf);
if (value != prev)
wrmsrl_on_cpu(cpu->cpu, MSR_HWP_REQUEST, value);
}
/**
- * intel_cpufreq_adjust_hwp - Adjust the HWP reuqest register.
+ * intel_cpufreq_adjust_hwp - Adjust the HWP request register.
* @cpu: Target CPU.
* @target_pstate: P-state corresponding to the target frequency.
*
- * Set the HWP minimum performance limit to 75% of @target_pstate taking the
+ * Set the HWP minimum performance limit to @target_pstate taking the
* global min and max policy limits into account.
*
- * The purpose of this is to avoid situations in which the kernel and the HWP
- * algorithm work against each other by giving a hint about the expectations of
- * the former to the latter.
+ * The purpose of this is to force the slave (passive) servo to do what
+ * it has been told, not what ever it wants.
+ * This NOT a hint. EPP (responsiveness) is managed from elsewhere.
*/
static void intel_cpufreq_adjust_hwp(struct cpudata *cpu, u32 target_pstate)
{
u32 min_perf;
- min_perf = max_t(u32, (3 * target_pstate) / 4, cpu->min_perf_ratio);
+ min_perf = max_t(u32, target_pstate, cpu->min_perf_ratio);
min_perf = min_t(u32, min_perf, cpu->max_perf_ratio);
if (min_perf != cpu->pstate.current_pstate) {
cpu->pstate.current_pstate = min_perf;
... Doug
... [deleted] ...
next prev parent reply other threads:[~2020-06-15 6:18 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-21 17:15 [RFC/RFT][PATCH] cpufreq: intel_pstate: Work in passive mode with HWP enabled Rafael J. Wysocki
2020-05-25 1:36 ` Francisco Jerez
2020-05-25 13:23 ` Rafael J. Wysocki
2020-05-25 20:57 ` Francisco Jerez
2020-05-26 8:19 ` Rafael J. Wysocki
2020-05-26 15:51 ` Doug Smythies
2020-05-26 17:42 ` Rafael J. Wysocki
2020-05-26 18:29 ` Francisco Jerez
2020-05-25 15:30 ` Doug Smythies
2020-05-25 21:09 ` Doug Smythies
2020-06-15 6:18 ` Doug Smythies [this message]
2020-08-24 14:54 ` Doug Smythies
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='002801d642dc$c5225c10$4f671430$@net' \
--to=dsmythies@telus.net \
--cc=francisco.jerez.plata@intel.com \
--cc=ggherdovich@suse.cz \
--cc=len.brown@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=rjw@rjwysocki.net \
--cc=srinivas.pandruvada@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.