Re: [PATCH] cpufreq: intel_pstate: Enforce _PPC limits

linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
To: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Subject: Re: [PATCH] cpufreq: intel_pstate: Enforce _PPC limits
Date: Wed, 20 Apr 2016 13:11:36 -0700	[thread overview]
Message-ID: <1461183096.8946.169.camel@linux.intel.com> (raw)
In-Reply-To: <1586030.5a1IISzElq@vostro.rjw.lan>

On Wed, 2016-04-20 at 03:20 +0200, Rafael J. Wysocki wrote:
> On Monday, April 04, 2016 04:22:30 PM Srinivas Pandruvada wrote:
> > 
> > Use ACPI _PPC notification to limit max P state driver will
> > request.

[...]

> > +		acpi_ppc
> > +			Enforce ACPI _PPC performance limits.
> I'd call it support_acpi_ppc or similar.
OK.

> 
> > 
> >  
> >  	intremap=	[X86-64, Intel-IOMMU]
> >  			on	enable Interrupt Remapping
> > (default)
> > diff --git a/drivers/cpufreq/Kconfig.x86
> > b/drivers/cpufreq/Kconfig.x86
> > index c59bdcb..adbd1de 100644
> > --- a/drivers/cpufreq/Kconfig.x86
> > +++ b/drivers/cpufreq/Kconfig.x86
> > @@ -5,6 +5,7 @@
> >  config X86_INTEL_PSTATE
> >         bool "Intel P state control"
> >         depends on X86
> > +       select ACPI_PROCESSOR if ACPI
> >         help
> >            This driver provides a P state for Intel core
> > processors.
> >  	  The driver implements an internal governor and will
> > become
> > diff --git a/drivers/cpufreq/intel_pstate.c
> > b/drivers/cpufreq/intel_pstate.c
> > index 8b5a415..b10ea73 100644
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -39,6 +39,10 @@
> >  #define ATOM_TURBO_RATIOS	0x66c
> >  #define ATOM_TURBO_VIDS		0x66d
> >  
> > +#if IS_ENABLED(CONFIG_ACPI)
> > +#include <acpi/processor.h>
> > +#endif
> I'd prefer #ifdef (and below).
> 
OK.

> > 
> > +
> >  #define FRAC_BITS 8
> >  #define int_tofp(X) ((int64_t)(X) << FRAC_BITS)
> >  #define fp_toint(X) ((X) >> FRAC_BITS)
> > @@ -190,6 +194,10 @@ struct cpudata {
> >  	u64	prev_tsc;
> >  	u64	prev_cummulative_iowait;
> >  	struct sample sample;
> > +#if IS_ENABLED(CONFIG_ACPI)
> > +	struct acpi_processor_performance acpi_perf_data;
> > +	bool valid_pss_table;
> > +#endif
> >  };
> >  
> >  static struct cpudata **all_cpu_data;
> > @@ -257,7 +265,7 @@ static inline int32_t
> > get_target_pstate_use_cpu_load(struct cpudata *cpu);
> >  static struct pstate_adjust_policy pid_params;
> >  static struct pstate_funcs pstate_funcs;
> >  static int hwp_active;
> > -
> > +static int acpi_ppc;
> Is this needed for !CONFIG_ACPI?
> 
If I keep this outside then I can avoid !CONFIG_ACPI check at couple of
places.

> > 
> >  
> >  /**
> >   * struct perf_limits - Store user and policy limits
> > @@ -331,6 +339,117 @@ static struct perf_limits *limits =
> > &performance_limits;
> >  static struct perf_limits *limits = &powersave_limits;
> >  #endif
> >  
> > +#if IS_ENABLED(CONFIG_ACPI)
> > +/*
> > + * The max target pstate ratio is a 8 bit value in both
> > PLATFORM_INFO MSR and
> > + * in TURBO_RATIO_LIMIT MSR, which pstate driver stores in
> > max_pstate and
> > + * max_turbo_pstate fields. The PERF_CTL MSR contains 16 bit value
> > for P state
> > + * ratio, out of it only high 8 bits are used. For example 0x1700
> > is setting
> > + * target ratio 0x17. The _PSS control value stores in a format
> > which can be
> > + * directly written to PERF_CTL MSR. But in intel_pstate driver
> > this shift
> > + * occurs during write to PERF_CTL (E.g. for cores
> > core_set_pstate()).
> > + * This function converts the _PSS control value to intel pstate
> > driver format
> > + * for comparison and assignment.
> > + */
> > +static int convert_to_native_pstate_format(struct cpudata *cpu,
> > int index)
> > +{
> > +	return cpu->acpi_perf_data.states[index].control >> 8;
> > +}
> > +
> > +static int intel_pstate_init_perf_limits(struct cpufreq_policy
> > *policy)
> > +{
> > +	struct cpudata *cpu;
> > +	int turbo_pss_ctl;
> > +	int ret;
> > +	int i;
> > +
> > +	cpu = all_cpu_data[policy->cpu];
> > +
> > +	if (!cpu->acpi_perf_data.shared_cpu_map &&
> > +	    zalloc_cpumask_var_node(&cpu-
> > >acpi_perf_data.shared_cpu_map,
> > +				    GFP_KERNEL,
> > cpu_to_node(policy->cpu))) {
> > +		return -ENOMEM;
> > +	}
> Why exactly is the thing above needed?
> 
Just for safety. shared_cpu_map element is used during evaluating _PSD.
But this evaluation happens
during  acpi_processor_preregister_performance. But someone moves this
processing then we will be impacted.
I can remove this.

> > 
> > +
> > +	ret = acpi_processor_register_performance(&cpu-
> > >acpi_perf_data,
> > +						  policy->cpu);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/*
> > +	 * Check if the control value in _PSS is for PERF_CTL MSR,
> > which should
> > +	 * guarantee that the states returned by it map to the
> > states in our
> > +	 * list directly.
> > +	 */
> > +	if (cpu->acpi_perf_data.control_register.space_id !=
> > +						ACPI_ADR_SPACE_FIX
> > ED_HARDWARE)
> > +		goto unreg_perf;
> > +
> > +	/*
> > +	 * If there is only one entry _PSS, simply ignore _PSS and
> > continue as
> > +	 * usual without taking _PSS into account
> > +	 */
> > +	if (cpu->acpi_perf_data.state_count < 2)
> > +		goto unreg_perf;
> I'd call the label err or similar.
> 
OK

> > 
> > +
> > +	pr_debug("intel_pstate: CPU%u - ACPI _PSS perf data\n",
> > policy->cpu);
> Don't we have a pr_fmt() there now?
> 
Yes. I need to rebase and send update for this.

> > 
> > +	for (i = 0; i < cpu->acpi_perf_data.state_count; i++) {
> > +		pr_debug("     %cP%d: %u MHz, %u mW, 0x%x\n",
> > +			 (i == cpu->acpi_perf_data.state ? '*' : '
> > '), i,
> > +			 (u32) cpu-
> > >acpi_perf_data.states[i].core_frequency,
> > +			 (u32) cpu-
> > >acpi_perf_data.states[i].power,
> > +			 (u32) cpu-
> > >acpi_perf_data.states[i].control);
> > +	}
> > +
> > +	/*
> > +	 * The _PSS table doesn't contain whole turbo frequency
> > range.
> > +	 * This just contains +1 MHZ above the max non turbo
> > frequency,
> > +	 * with control value corresponding to max turbo ratio.
> > But
> > +	 * when cpufreq set policy is called, it will call with
> > this
> > +	 * max frequency, which will cause a reduced performance
> > as
> > +	 * this driver uses real max turbo frequency as the max
> > +	 * frequeny. So correct this frequency in _PSS table to
> frequency
> 
> > 
> > +	 * correct max turbo frequency based on the turbo ratio.
> > +	 * Also need to convert to MHz as _PSS freq is in MHz.
> > +	 */
> > +	turbo_pss_ctl = convert_to_native_pstate_format(cpu, 0);
> > +	if (turbo_pss_ctl > cpu->pstate.max_pstate)
> > +		cpu->acpi_perf_data.states[0].core_frequency =
> > +					policy->cpuinfo.max_freq /
> > 1000;
> > +	cpu->valid_pss_table = true;
> > +	pr_info("intel_pstate: _PPC limits will be enforced\n");
> > +
> > +	return 0;
> > +unreg_perf:
> > +	cpu->valid_pss_table = false;
> > +	acpi_processor_unregister_performance(policy->cpu);
> > +	return -EINVAL;
> > +}
> > +
> > +static int intel_pstate_exit_perf_limits(struct cpufreq_policy
> > *policy)
> > +{
> > +	struct cpudata *cpu;
> > +
> > +	cpu = all_cpu_data[policy->cpu];
> > +	if (!acpi_ppc || !cpu->valid_pss_table)
> > +		return 0;
> It should not be necessary to check acpi_ppc here as cpu-
> >valid_pss_table
> should never be set if acpi_ppc is unset.
> 
Yes.

> > 
> > +
> > +	acpi_processor_unregister_performance(policy->cpu);
> > +	return 0;
> > +}
> > +
> > +#else
> > +static int intel_pstate_init_perf_limits(struct cpufreq_policy
> > *policy)
> > +{
> > +	return 0;
> > +}
> > +
> > +static int intel_pstate_exit_perf_limits(struct cpufreq_policy
> > *policy)
> > +{
> > +	return 0;
> > +}
> > +#endif
> > +
> >  static inline void pid_reset(struct _pid *pid, int setpoint, int
> > busy,
> >  			     int deadband, int integral) {
> >  	pid->setpoint = int_tofp(setpoint);
> > @@ -1297,6 +1416,30 @@ static int intel_pstate_set_policy(struct
> > cpufreq_policy *policy)
> >  
> >  	intel_pstate_clear_update_util_hook(policy->cpu);
> >  
> > +	if (acpi_ppc) {
> > +		struct cpudata *cpu;
> > +
> > +		/*
> > +		 * If the platform has config TDP feature, then to
> > indicate
> > +		 * start of turbo range _PPC is set to one more
> > than the turbo
> > +		 * activation ratio, which is cpu-
> > >pstate.max_pstate. Here the
> > +		 * updated frequency corresponding to _PPC is
> > reflected in
> > +		 * policy->max. This  means that this _PPC setting
> > still
> But ->set_policy may be called on updates of policy->max from sysfs
> which
> then may not reflect the _PPC value if I'm not mistaken, may it not?
> 
Yes, it will be called from sysfs path also. But if user sets a
frequency above max non turbo, then it will be in turbo range anyway.
But, I should add a check to support only for config tdp platforms
here.

If the _PPC limits sets above user set limits
then cpufreq_verify_within_limits should pick the lower value.

> > 
> > +		 * allowing system to reach policy-
> > >cpuinfo.max_freq anyway as
> > +		 * this is turbo range.
> > +		 * In this case showing restricted limits in
> > intel_pstate
> > +		 * sysfs or setting limits->max_perf to a lower
> > value has
> > +		 * no meaning.
> > +		 */
> > +		cpu = all_cpu_data[0];
> > +		if (policy->max < policy->cpuinfo.max_freq &&
> > +		    policy->max > (cpu->pstate.max_pstate *
> > +					cpu->pstate.scaling)) {
> > +			pr_info("intel_pstate: _PPC > Max non
> > Turbo P_state\n");
> > +			policy->max = policy->cpuinfo.max_freq;
> > +		}
> > +	}
> > +
> >  	if (policy->policy == CPUFREQ_POLICY_PERFORMANCE) {
> >  		limits = &performance_limits;
> >  		if (policy->max >= policy->cpuinfo.max_freq) {
> > @@ -1392,18 +1535,30 @@ static int intel_pstate_cpu_init(struct
> > cpufreq_policy *policy)
> >  	policy->cpuinfo.min_freq = cpu->pstate.min_pstate * cpu-
> > >pstate.scaling;
> >  	policy->cpuinfo.max_freq =
> >  		cpu->pstate.turbo_pstate * cpu->pstate.scaling;
> > +	if (acpi_ppc)
> > +		intel_pstate_init_perf_limits(policy);
> Why don't you check acpi_ppc in intel_pstate_init_perf_limits()?
OK

> > 
> > +	/*
> > +	 * If there is no acpi perf data or error, we ignore and
> > use Intel P
> > +	 * state calculated limits, So this is not fatal error.
> > +	 */
> >  	policy->cpuinfo.transition_latency = CPUFREQ_ETERNAL;
> >  	cpumask_set_cpu(policy->cpu, policy->cpus);
> >  
> >  	return 0;
> >  }
> >  
> > +static int intel_pstate_cpu_exit(struct cpufreq_policy *policy)
> > +{
> > +	return intel_pstate_exit_perf_limits(policy);
> > +}
> > +
> >  static struct cpufreq_driver intel_pstate_driver = {
> >  	.flags		= CPUFREQ_CONST_LOOPS,
> >  	.verify		= intel_pstate_verify_policy,
> >  	.setpolicy	= intel_pstate_set_policy,
> >  	.get		= intel_pstate_get,
> >  	.init		= intel_pstate_cpu_init,
> > +	.exit		= intel_pstate_cpu_exit,
> >  	.stop_cpu	= intel_pstate_stop_cpu,
> >  	.name		= "intel_pstate",
> >  };
> > @@ -1448,7 +1603,6 @@ static void copy_cpu_funcs(struct
> > pstate_funcs *funcs)
> >  }
> >  
> >  #if IS_ENABLED(CONFIG_ACPI)
> > -#include <acpi/processor.h>
> >  
> >  static bool intel_pstate_no_acpi_pss(void)
> >  {
> > @@ -1654,6 +1808,9 @@ static int __init intel_pstate_setup(char
> > *str)
> >  		force_load = 1;
> >  	if (!strcmp(str, "hwp_only"))
> >  		hwp_only = 1;
> > +	if (!strcmp(str, "acpi_ppc"))
> > +		acpi_ppc = 1;
> > +
> >  	return 0;
> >  }
> >  early_param("intel_pstate", intel_pstate_setup);
> > 
I will submit version 2 of the patch soon.

Thanks,
Srinivas

> Thanks,
> Rafael
>

next prev parent reply	other threads:[~2016-04-20 20:11 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-04 23:22 [PATCH] cpufreq: intel_pstate: Enforce _PPC limits Srinivas Pandruvada
2016-04-20  1:20 ` Rafael J. Wysocki
2016-04-20 20:11   ` Srinivas Pandruvada [this message]
2016-04-20 14:02 ` Konstantin Khlebnikov
2016-04-20 17:28   ` Srinivas Pandruvada

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1461183096.8946.169.camel@linux.intel.com \
    --to=srinivas.pandruvada@linux.intel.com \
    --cc=linux-pm@vger.kernel.org \
    --cc=rjw@rjwysocki.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).