All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gautham R Shenoy <ego@linux.vnet.ibm.com>
To: Viresh Kumar <viresh.kumar@linaro.org>
Cc: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	"linuxppc-dev@ozlabs.org" <linuxppc-dev@ozlabs.org>,
	Linux PM list <linux-pm@vger.kernel.org>,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
	"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>,
	Anton Blanchard <anton@samba.org>
Subject: Re: [PATCH v4] powernv, cpufreq: cpufreq driver for powernv platform
Date: Thu, 27 Mar 2014 15:00:50 +0530	[thread overview]
Message-ID: <20140327093050.GA27777@in.ibm.com> (raw)
In-Reply-To: <CAKohponPb+b=8Pa=te3CkiN6QPAUAukngqLSdG4vxc9pUkD8OQ@mail.gmail.com>

On Thu, Mar 27, 2014 at 12:09:53PM +0530, Viresh Kumar wrote:
> Cc'ing Rafael.
>

Thanks. It was a lapse on my part by not Cc'ing Rafael in the first
place.
 
> On 26 March 2014 22:25, Gautham R. Shenoy <ego@linux.vnet.ibm.com> wrote:
> > From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
> >
> > Backend driver to dynamically set voltage and frequency on
> > IBM POWER non-virtualized platforms.  Power management SPRs
> > are used to set the required PState.
> >
> > This driver works in conjunction with cpufreq governors
> > like 'ondemand' to provide a demand based frequency and
> > voltage setting on IBM POWER non-virtualized platforms.
> >
> > PState table is obtained from OPAL v3 firmware through device
> > tree.
> >
> > powernv_cpufreq back-end driver would parse the relevant device-tree
> > nodes and initialise the cpufreq subsystem on powernv platform.
> >
> > The code was originally written by svaidy@linux.vnet.ibm.com. Over
> > time it was modified to accomodate bug-fixes as well as updates to the
> > the cpu-freq core. Relevant portions of the change logs corresponding
> > to those modifications are noted below:
> >
> >  * The policy->cpus needs to be populated in a hotplug-invariant
> >    manner instead of using cpu_sibling_mask() which varies with
> >    cpu-hotplug. This is because the cpufreq core code copies this
> >    content into policy->related_cpus mask which is should not vary on
> 
> s/is /

Good catch! Will fix this

[...]
> 
> >  * On POWER systems, the CPU frequency is controlled at a core-level
> >    and hence we need to serialize so that only one of the threads in
> >    the core switches the core's frequency at a time. Introduce
> >    per-core locking to enable finer-grained synchronization and
> >    thereby enhance the speed and responsiveness of the cpufreq driver
> >    to varying workload demands.
> >
> >      The design of per-core locking is very simple and
> >    straight-forward: we first define a Per-CPU lock and use the ones
> >    that belongs to the first thread sibling of the core.
> >
> >      cpu_first_thread_sibling() macro is used to find the *common*
> >    lock for all thread siblings belonging to a core. [Authored by
> >    srivatsa.bhat@linux.vnet.ibm.com]
> 
> We don't need that after serialization patch of core is accepted. And it
> should be accepted soon, in one form or other.

As of now, I prefer this patch be based on code that is in the -next
tree. I'll get rid of the per-core locking once the serialization
patch of the core is accepted.

[...]
> > --- a/arch/powerpc/configs/pseries_defconfig
> > +++ b/arch/powerpc/configs/pseries_defconfig
> > @@ -353,3 +353,4 @@ CONFIG_CRYPTO_DEV_NX_ENCRYPT=m
> >  CONFIG_VIRTUALIZATION=y
> >  CONFIG_KVM_BOOK3S_64=m
> >  CONFIG_KVM_BOOK3S_64_HV=y
> > +CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
> > diff --git a/arch/powerpc/configs/pseries_le_defconfig b/arch/powerpc/configs/pseries_le_defconfig
> > index 62771e0..47e6161 100644
> > --- a/arch/powerpc/configs/pseries_le_defconfig
> > +++ b/arch/powerpc/configs/pseries_le_defconfig
> > @@ -350,3 +350,4 @@ CONFIG_CRYPTO_LZO=m
> >  # CONFIG_CRYPTO_ANSI_CPRNG is not set
> >  CONFIG_CRYPTO_DEV_NX=y
> >  CONFIG_CRYPTO_DEV_NX_ENCRYPT=m
> > +CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
> 
> don't know how Rafael want this, but probably this part could have gone
> through ppc tree..
> 

That would mean multiple patches :-)

> > +
> > +#define pr_fmt(fmt)    "powernv-cpufreq: " fmt
> > +
> > +#include <linux/module.h>
> > +#include <linux/cpufreq.h>
> > +#include <linux/smp.h>
> > +#include <linux/of.h>
> > +#include <asm/cputhreads.h>
> 
> I thought I gave a comment on missing headers here?

Ok, so these seem to be the missing ones.

#include <linux/kernel.h>
#include <linux/percpu-defs.h>
#include <linux/mutex.h>
#include <linux/sysfs.h>
#include <linux/cpumask.h>

#include <asm/reg.h>

Have I missed anything else ?


> > +static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
> > +static int powernv_pstate_ids[POWERNV_MAX_PSTATES+1];
> 
> I though we don't need this anymore? Either Rafael will take my patch as
> is for the BOOST fixup or we will end up creating .isboost field in
> struct cpufreq_frequency_table and so you could have used .driver_data here.
> 

I mentioned in my reply to your BOOST fixup patch that we would like
pstate_ids to be of the type "signed int", while the .driver_data is
an "unsigned int". If we end up using .driver_data, we would have to
be careful and not use it directly but reassign it to a signed int (or
typecast it) before using it.

Not just that, the pr_debugs in the core which are printed during
frequency transitions will then end up printing large positive values
(since it will interpret negative pstate_ids as unsigned int) in the
place of pstate_ids, which would not be particularly useful.

> > +struct powernv_pstate_info {
> > +       int pstate_min_id;
> > +       int pstate_max_id;
> > +       int pstate_nominal_id;
> > +       int nr_pstates;
> > +};
> > +static struct powernv_pstate_info powernv_pstate_info;
> 
> Maybe write it as this (if you like :), as this is the only instance
> of this struct):
> 
> Also, because 'powernv_pstate_info' is a local variable we can get rid of
> powerenv_ from its name and name it just pstate_info. That will make
> code shorter at many places and you may not be required to break
> lines into two at some places. If you wish :)
> 
Ok fair enough!

> +static struct powernv_pstate_info {
> +       int pstate_min_id;
> +       int pstate_max_id;
> +       int pstate_nominal_id;
> +       int nr_pstates;
> +} powernv_pstate_info;
> 
> > +/*
> > + * Initialize the freq table based on data obtained
> > + * from the firmware passed via device-tree
> > + */
> > +static int init_powernv_pstates(void)
> > +{
> > +       struct device_node *power_mgt;
> > +       int nr_pstates = 0;
> > +       int pstate_min, pstate_max, pstate_nominal;
> > +       const __be32 *pstate_ids, *pstate_freqs;
> > +       int i;
> 
> Can merge all the int definitions into a single line.

Too many ints to be incorporated in a single line. But will see if I
can beautify it :-) 

> > +       u32 len_ids, len_freqs;
> > +
> > +       power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
> > +       if (!power_mgt) {
> > +               pr_warn("power-mgt node not found\n");
> > +               return -ENODEV;
> > +       }
> > +
> > +       if (of_property_read_u32(power_mgt, "ibm,pstate-min", &pstate_min)) {
> > +               pr_warn("ibm,pstate-min node not found\n");
> > +               return -ENODEV;
> > +       }
> > +
> > +       if (of_property_read_u32(power_mgt, "ibm,pstate-max", &pstate_max)) {
> > +               pr_warn("ibm,pstate-max node not found\n");
> > +               return -ENODEV;
> > +       }
> 
> Why do you need to get these from DT? And not find that yourself here instead?
>

Well, I think we can obtain a more accurate value from the DT which
would have already computed it. But I'll let vaidy answer this.

> > +       if (of_property_read_u32(power_mgt, "ibm,pstate-nominal",
> > +                                &pstate_nominal)) {
> > +               pr_warn("ibm,pstate-nominal not found\n");
> > +               return -ENODEV;
> > +       }
> > +       pr_info("cpufreq pstate min %d nominal %d max %d\n", pstate_min,
> > +               pstate_nominal, pstate_max);
> > +
> > +       pstate_ids = of_get_property(power_mgt, "ibm,pstate-ids", &len_ids);
> > +       if (!pstate_ids) {
> > +               pr_warn("ibm,pstate-ids not found\n");
> > +               return -ENODEV;
> > +       }
> > +
> > +       pstate_freqs = of_get_property(power_mgt, "ibm,pstate-frequencies-mhz",
> > +                                     &len_freqs);
> > +       if (!pstate_freqs) {
> > +               pr_warn("ibm,pstate-frequencies-mhz not found\n");
> > +               return -ENODEV;
> > +       }
> > +
> > +       WARN_ON(len_ids != len_freqs);
> > +       nr_pstates = min(len_ids, len_freqs) / sizeof(u32);
> > +       if (!nr_pstates) {
> > +               WARN_ON(1);
> > +               return -ENODEV;
> > +       }
> 
> Maybe like this:
> 
> +       if (WARN_ON(!nr_pstates))
> +               return -ENODEV;
> 

Thanks. This looks better.

> > +       pr_debug("NR PStates %d\n", nr_pstates);
> > +       for (i = 0; i < nr_pstates; i++) {
> > +               u32 id = be32_to_cpu(pstate_ids[i]);
> > +               u32 freq = be32_to_cpu(pstate_freqs[i]);
> > +
> > +               pr_debug("PState id %d freq %d MHz\n", id, freq);
> > +               powernv_freqs[i].driver_data = i;
> 
> Looks like more than one comments aren't addressed :(
> You can use this field for your id. And even if you couldn't have
> done that, you don't need to initialize this field at all..


Ok, since we are better off not using it, we shouldn't be initializing
it.

> 
> > +               powernv_freqs[i].frequency = freq * 1000; /* kHz */
> > +               powernv_pstate_ids[i] = id;
> > +       }
> > +       /* End of list marker entry */
> > +       powernv_freqs[i].frequency = CPUFREQ_TABLE_END;
> > +
> > +       powernv_pstate_info.pstate_min_id = pstate_min;
> > +       powernv_pstate_info.pstate_max_id = pstate_max;
> > +       powernv_pstate_info.pstate_nominal_id = pstate_nominal;
> > +       powernv_pstate_info.nr_pstates = nr_pstates;
> > +
> > +       return 0;
> > +}
> > +
> > +/*
> > + * Returns the cpu frequency corresponding to the pstate_id.
> > + */
> 
> Maybe:
> 
> +/* Returns the cpu frequency corresponding to the pstate_id. */
> 

Right, single line comment! Will fix this.

> > +static unsigned int pstate_id_to_freq(int pstate_id)
> > +{
> > +       int i;
> > +
> > +       i = powernv_pstate_info.pstate_max_id - pstate_id;
> 
> It looks like these ids would always be contiguous? In that case
> it would be better if you can mention this property at the top of this
> file in some comment. So, that new people can understand things
> quickly.
> 

Ok.

> > +       BUG_ON(i >= powernv_pstate_info.nr_pstates || i < 0);
> > +       WARN_ON(powernv_pstate_ids[i] != pstate_id);
> 
> Do you really want it? We have already confirmed that 'i' is
> within limits.

"i" might be within limits. But I want to make sure that the pstate_id
corresponding to "i" is the same as the pstate_id that we
requested. Not that anything would have changed since the
initialization, but I get paranoid about these things from time to
time :-)

> 
> > +       return powernv_freqs[i].frequency;
> > +}
> > +

[...]

> > +
> > +/*
> > + * powernv_read_cpu_freq: Reads the current frequency on this cpu.
> > + *
> > + * Called via smp_call_function.
> > + *
> > + * Note: The caller of the smp_call_function should pass an argument of
> > + * the type 'struct powernv_smp_call_data *' along with this function.
> > + *
> > + * The current frequency on this cpu will be returned via
> > + * ((struct powernv_smp_call_data *)arg)->freq;
> > + */
> > +static void powernv_read_cpu_freq(void *arg)
> > +{
> > +       unsigned long pmspr_val;
> > +       s8 local_pstate_id;
> > +        struct powernv_smp_call_data *freq_data;
> > +
> > +       freq_data = (struct powernv_smp_call_data *)arg;
> 
> don't need casting here ?

Doesn't harm having it there. Just like the #includes :-)

> 
> > +
> > +       pmspr_val = get_pmspr(SPRN_PMSR);
> > +
> > +       /*
> > +         * The local pstate id corresponds bits 48..55 in the PMSR.
> > +         * Note: Watch out for the sign!
> > +         */
> > +       local_pstate_id = (pmspr_val >> 48) & 0xFF;
> > +       freq_data->pstate_id = local_pstate_id;
> > +       freq_data->freq = pstate_id_to_freq(freq_data->pstate_id);
> > +
> > +       pr_debug("cpu %d pmsr %lx pstate_id %d frequency %d kHz \n",
> > +               smp_processor_id(), pmspr_val, freq_data->pstate_id,
> 
> s/smp_processor_id/raw_smp_processor_id ?

No. This function is called via smp_call_function(). So we have
preempt_disable on and it is safe to use smp_processor_id.

> 
> > +               freq_data->freq);
> > +}

[...]
> > +/*
> > + * set_pstate: Sets the frequency on this cpu.
> > + *
> > + * This is called via an smp_call_function.
> > + *
> > + * The caller must ensure that freq_data is of the type
> > + * (struct powernv_smp_call_data *) and the pstate_id which needs to be set
> > + * on this cpu should be present in freq_data->pstate_id.
> > + */
> > +static void set_pstate(void *freq_data)
> > +{
> > +       unsigned long val;
> > +       unsigned long pstate_ul =
> > +               ((struct powernv_smp_call_data *) freq_data)->pstate_id;
> > +
> > +       val = get_pmspr(SPRN_PMCR);
> > +       val = val & 0x0000ffffffffffffULL;
> > +
> > +       pstate_ul = pstate_ul & 0xFF;
> > +
> > +       /* Set both global(bits 56..63) and local(bits 48..55) PStates */
> > +       val = val | (pstate_ul << 56) | (pstate_ul << 48);
> > +
> > +       pr_debug("Setting cpu %d pmcr to %016lX\n", smp_processor_id(), val);
> 
> s/smp_processor_id/raw_smp_processor_id ? At other places as well.

Even this function is called via smp_call_function(). So, we should
have preempt disabled.

> 
> > +       set_pmspr(SPRN_PMCR, val);
> > +}
> > +
> > +/*
> > + * powernv_set_freq: Sets the frequency corresponding to the cpufreq
> > + * table entry indexed by new_index on the cpus in the mask 'cpus'
> 
> Rafael doesn't like CPUs to be written as cpus.. I got this comment long
> back :) (Applicable only for comments and logs)

Ah, ok. Will fix this.

> 
> > + */
> > +static int powernv_set_freq(cpumask_var_t cpus, unsigned int new_index)
> 
> Why do you want to keep this routine separately? Why not have a single routine
> powernv_cpufreq_target_index() ?
> 
> > +{
> > +       struct powernv_smp_call_data freq_data;
> > +
> > +       freq_data.pstate_id = powernv_pstate_ids[new_index];
> > +
> > +       /*
> > +        * Use smp_call_function to send IPI and execute the
> > +        * mtspr on target cpu.  We could do that without IPI
> > +        * if current CPU is within policy->cpus (core)
> > +        */
> > +       smp_call_function_any(cpus, set_pstate, &freq_data, 1);
> 
> Not sure how smp_call_function_any() behaves but wouldn't it be
> a good optimization if you can check if raw_smp_processor_id()
> returns one of the CPUs from 'cpus'? And in that case don't
> shoot an IPI.

That's what smp_call_function_any() does.

> 
> Same for the get part as well.
> 
> But then I looked at the implementation of these routines and found
> they already have this optimization in :) .. But with overhead of few
> locks and disable_preempt() :(
> 

Well, locks are taken when we are not running on this_cpu() so that we
can issue an IPI (or use some other optimized communication mechanism)
for executing set_pstate on one of the cpus in cpus. So this overhead
should be acceptable.

On the other hand if we are running on this_cpu(), we simply go ahead
and execute call which is what you're suggesting we do.

> > +}
> > +
> > +static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
> > +{
> > +       int base, i;
> > +
> > +       base = cpu_first_thread_sibling(policy->cpu);
> > +
> > +       for (i = 0; i < threads_per_core; i++)
> > +               cpumask_set_cpu(base + i, policy->cpus);
> > +
> > +       policy->cpuinfo.transition_latency = 25000;
> > +       policy->cur = powernv_freqs[0].frequency;
> 
> How can you be so sure? Also clock is doing this just after calling init()
> and so you can just remove it :)

You mean s/clock/core ? 

You're right, the core sets policy->cur by invoking driver->get()
which in our case will read it from the PMSR and initialize it with
the correct value. So these lines can be removed.

> 
> > +       return cpufreq_table_validate_and_show(policy, powernv_freqs);
> > +}
> > +
> > +static int powernv_cpufreq_verify(struct cpufreq_policy *policy)
> > +{
> > +       return cpufreq_generic_frequency_table_verify(policy);
> > +}
> > +
> > +static int powernv_cpufreq_target_index(struct cpufreq_policy *policy,
> > +                                       unsigned int new_index)
> > +{
> > +       int rc;
> > +
> > +       lock_core_freq(policy->cpu);
> > +       rc = powernv_set_freq(policy->cpus, new_index);
> > +       unlock_core_freq(policy->cpu);
> > +
> > +       return rc;
> > +}
> > +
> > +static struct cpufreq_driver powernv_cpufreq_driver = {
> > +       .name           = "powernv-cpufreq",
> > +       .flags          = CPUFREQ_CONST_LOOPS,
> > +       .init           = powernv_cpufreq_cpu_init,
> > +       .verify         = powernv_cpufreq_verify,
> 
> Can do this instead:
> +       .verify         = cpufreq_generic_frequency_table_verify,

Ok. 

> 

Thanks for a detailed review again.

I'll send out the updated patch today incorporating the comments. It
shouldn't be hard since most of the comments do not affect the
functionality.

--
Thanks and Regards
gautham.


WARNING: multiple messages have this Message-ID (diff)
From: Gautham R Shenoy <ego@linux.vnet.ibm.com>
To: Viresh Kumar <viresh.kumar@linaro.org>
Cc: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>,
	Linux PM list <linux-pm@vger.kernel.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	"linuxppc-dev@ozlabs.org" <linuxppc-dev@ozlabs.org>,
	Anton Blanchard <anton@samba.org>,
	"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Subject: Re: [PATCH v4] powernv, cpufreq: cpufreq driver for powernv platform
Date: Thu, 27 Mar 2014 15:00:50 +0530	[thread overview]
Message-ID: <20140327093050.GA27777@in.ibm.com> (raw)
In-Reply-To: <CAKohponPb+b=8Pa=te3CkiN6QPAUAukngqLSdG4vxc9pUkD8OQ@mail.gmail.com>

On Thu, Mar 27, 2014 at 12:09:53PM +0530, Viresh Kumar wrote:
> Cc'ing Rafael.
>

Thanks. It was a lapse on my part by not Cc'ing Rafael in the first
place.
 
> On 26 March 2014 22:25, Gautham R. Shenoy <ego@linux.vnet.ibm.com> wrote:
> > From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
> >
> > Backend driver to dynamically set voltage and frequency on
> > IBM POWER non-virtualized platforms.  Power management SPRs
> > are used to set the required PState.
> >
> > This driver works in conjunction with cpufreq governors
> > like 'ondemand' to provide a demand based frequency and
> > voltage setting on IBM POWER non-virtualized platforms.
> >
> > PState table is obtained from OPAL v3 firmware through device
> > tree.
> >
> > powernv_cpufreq back-end driver would parse the relevant device-tree
> > nodes and initialise the cpufreq subsystem on powernv platform.
> >
> > The code was originally written by svaidy@linux.vnet.ibm.com. Over
> > time it was modified to accomodate bug-fixes as well as updates to the
> > the cpu-freq core. Relevant portions of the change logs corresponding
> > to those modifications are noted below:
> >
> >  * The policy->cpus needs to be populated in a hotplug-invariant
> >    manner instead of using cpu_sibling_mask() which varies with
> >    cpu-hotplug. This is because the cpufreq core code copies this
> >    content into policy->related_cpus mask which is should not vary on
> 
> s/is /

Good catch! Will fix this

[...]
> 
> >  * On POWER systems, the CPU frequency is controlled at a core-level
> >    and hence we need to serialize so that only one of the threads in
> >    the core switches the core's frequency at a time. Introduce
> >    per-core locking to enable finer-grained synchronization and
> >    thereby enhance the speed and responsiveness of the cpufreq driver
> >    to varying workload demands.
> >
> >      The design of per-core locking is very simple and
> >    straight-forward: we first define a Per-CPU lock and use the ones
> >    that belongs to the first thread sibling of the core.
> >
> >      cpu_first_thread_sibling() macro is used to find the *common*
> >    lock for all thread siblings belonging to a core. [Authored by
> >    srivatsa.bhat@linux.vnet.ibm.com]
> 
> We don't need that after serialization patch of core is accepted. And it
> should be accepted soon, in one form or other.

As of now, I prefer this patch be based on code that is in the -next
tree. I'll get rid of the per-core locking once the serialization
patch of the core is accepted.

[...]
> > --- a/arch/powerpc/configs/pseries_defconfig
> > +++ b/arch/powerpc/configs/pseries_defconfig
> > @@ -353,3 +353,4 @@ CONFIG_CRYPTO_DEV_NX_ENCRYPT=m
> >  CONFIG_VIRTUALIZATION=y
> >  CONFIG_KVM_BOOK3S_64=m
> >  CONFIG_KVM_BOOK3S_64_HV=y
> > +CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
> > diff --git a/arch/powerpc/configs/pseries_le_defconfig b/arch/powerpc/configs/pseries_le_defconfig
> > index 62771e0..47e6161 100644
> > --- a/arch/powerpc/configs/pseries_le_defconfig
> > +++ b/arch/powerpc/configs/pseries_le_defconfig
> > @@ -350,3 +350,4 @@ CONFIG_CRYPTO_LZO=m
> >  # CONFIG_CRYPTO_ANSI_CPRNG is not set
> >  CONFIG_CRYPTO_DEV_NX=y
> >  CONFIG_CRYPTO_DEV_NX_ENCRYPT=m
> > +CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
> 
> don't know how Rafael want this, but probably this part could have gone
> through ppc tree..
> 

That would mean multiple patches :-)

> > +
> > +#define pr_fmt(fmt)    "powernv-cpufreq: " fmt
> > +
> > +#include <linux/module.h>
> > +#include <linux/cpufreq.h>
> > +#include <linux/smp.h>
> > +#include <linux/of.h>
> > +#include <asm/cputhreads.h>
> 
> I thought I gave a comment on missing headers here?

Ok, so these seem to be the missing ones.

#include <linux/kernel.h>
#include <linux/percpu-defs.h>
#include <linux/mutex.h>
#include <linux/sysfs.h>
#include <linux/cpumask.h>

#include <asm/reg.h>

Have I missed anything else ?


> > +static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
> > +static int powernv_pstate_ids[POWERNV_MAX_PSTATES+1];
> 
> I though we don't need this anymore? Either Rafael will take my patch as
> is for the BOOST fixup or we will end up creating .isboost field in
> struct cpufreq_frequency_table and so you could have used .driver_data here.
> 

I mentioned in my reply to your BOOST fixup patch that we would like
pstate_ids to be of the type "signed int", while the .driver_data is
an "unsigned int". If we end up using .driver_data, we would have to
be careful and not use it directly but reassign it to a signed int (or
typecast it) before using it.

Not just that, the pr_debugs in the core which are printed during
frequency transitions will then end up printing large positive values
(since it will interpret negative pstate_ids as unsigned int) in the
place of pstate_ids, which would not be particularly useful.

> > +struct powernv_pstate_info {
> > +       int pstate_min_id;
> > +       int pstate_max_id;
> > +       int pstate_nominal_id;
> > +       int nr_pstates;
> > +};
> > +static struct powernv_pstate_info powernv_pstate_info;
> 
> Maybe write it as this (if you like :), as this is the only instance
> of this struct):
> 
> Also, because 'powernv_pstate_info' is a local variable we can get rid of
> powerenv_ from its name and name it just pstate_info. That will make
> code shorter at many places and you may not be required to break
> lines into two at some places. If you wish :)
> 
Ok fair enough!

> +static struct powernv_pstate_info {
> +       int pstate_min_id;
> +       int pstate_max_id;
> +       int pstate_nominal_id;
> +       int nr_pstates;
> +} powernv_pstate_info;
> 
> > +/*
> > + * Initialize the freq table based on data obtained
> > + * from the firmware passed via device-tree
> > + */
> > +static int init_powernv_pstates(void)
> > +{
> > +       struct device_node *power_mgt;
> > +       int nr_pstates = 0;
> > +       int pstate_min, pstate_max, pstate_nominal;
> > +       const __be32 *pstate_ids, *pstate_freqs;
> > +       int i;
> 
> Can merge all the int definitions into a single line.

Too many ints to be incorporated in a single line. But will see if I
can beautify it :-) 

> > +       u32 len_ids, len_freqs;
> > +
> > +       power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
> > +       if (!power_mgt) {
> > +               pr_warn("power-mgt node not found\n");
> > +               return -ENODEV;
> > +       }
> > +
> > +       if (of_property_read_u32(power_mgt, "ibm,pstate-min", &pstate_min)) {
> > +               pr_warn("ibm,pstate-min node not found\n");
> > +               return -ENODEV;
> > +       }
> > +
> > +       if (of_property_read_u32(power_mgt, "ibm,pstate-max", &pstate_max)) {
> > +               pr_warn("ibm,pstate-max node not found\n");
> > +               return -ENODEV;
> > +       }
> 
> Why do you need to get these from DT? And not find that yourself here instead?
>

Well, I think we can obtain a more accurate value from the DT which
would have already computed it. But I'll let vaidy answer this.

> > +       if (of_property_read_u32(power_mgt, "ibm,pstate-nominal",
> > +                                &pstate_nominal)) {
> > +               pr_warn("ibm,pstate-nominal not found\n");
> > +               return -ENODEV;
> > +       }
> > +       pr_info("cpufreq pstate min %d nominal %d max %d\n", pstate_min,
> > +               pstate_nominal, pstate_max);
> > +
> > +       pstate_ids = of_get_property(power_mgt, "ibm,pstate-ids", &len_ids);
> > +       if (!pstate_ids) {
> > +               pr_warn("ibm,pstate-ids not found\n");
> > +               return -ENODEV;
> > +       }
> > +
> > +       pstate_freqs = of_get_property(power_mgt, "ibm,pstate-frequencies-mhz",
> > +                                     &len_freqs);
> > +       if (!pstate_freqs) {
> > +               pr_warn("ibm,pstate-frequencies-mhz not found\n");
> > +               return -ENODEV;
> > +       }
> > +
> > +       WARN_ON(len_ids != len_freqs);
> > +       nr_pstates = min(len_ids, len_freqs) / sizeof(u32);
> > +       if (!nr_pstates) {
> > +               WARN_ON(1);
> > +               return -ENODEV;
> > +       }
> 
> Maybe like this:
> 
> +       if (WARN_ON(!nr_pstates))
> +               return -ENODEV;
> 

Thanks. This looks better.

> > +       pr_debug("NR PStates %d\n", nr_pstates);
> > +       for (i = 0; i < nr_pstates; i++) {
> > +               u32 id = be32_to_cpu(pstate_ids[i]);
> > +               u32 freq = be32_to_cpu(pstate_freqs[i]);
> > +
> > +               pr_debug("PState id %d freq %d MHz\n", id, freq);
> > +               powernv_freqs[i].driver_data = i;
> 
> Looks like more than one comments aren't addressed :(
> You can use this field for your id. And even if you couldn't have
> done that, you don't need to initialize this field at all..


Ok, since we are better off not using it, we shouldn't be initializing
it.

> 
> > +               powernv_freqs[i].frequency = freq * 1000; /* kHz */
> > +               powernv_pstate_ids[i] = id;
> > +       }
> > +       /* End of list marker entry */
> > +       powernv_freqs[i].frequency = CPUFREQ_TABLE_END;
> > +
> > +       powernv_pstate_info.pstate_min_id = pstate_min;
> > +       powernv_pstate_info.pstate_max_id = pstate_max;
> > +       powernv_pstate_info.pstate_nominal_id = pstate_nominal;
> > +       powernv_pstate_info.nr_pstates = nr_pstates;
> > +
> > +       return 0;
> > +}
> > +
> > +/*
> > + * Returns the cpu frequency corresponding to the pstate_id.
> > + */
> 
> Maybe:
> 
> +/* Returns the cpu frequency corresponding to the pstate_id. */
> 

Right, single line comment! Will fix this.

> > +static unsigned int pstate_id_to_freq(int pstate_id)
> > +{
> > +       int i;
> > +
> > +       i = powernv_pstate_info.pstate_max_id - pstate_id;
> 
> It looks like these ids would always be contiguous? In that case
> it would be better if you can mention this property at the top of this
> file in some comment. So, that new people can understand things
> quickly.
> 

Ok.

> > +       BUG_ON(i >= powernv_pstate_info.nr_pstates || i < 0);
> > +       WARN_ON(powernv_pstate_ids[i] != pstate_id);
> 
> Do you really want it? We have already confirmed that 'i' is
> within limits.

"i" might be within limits. But I want to make sure that the pstate_id
corresponding to "i" is the same as the pstate_id that we
requested. Not that anything would have changed since the
initialization, but I get paranoid about these things from time to
time :-)

> 
> > +       return powernv_freqs[i].frequency;
> > +}
> > +

[...]

> > +
> > +/*
> > + * powernv_read_cpu_freq: Reads the current frequency on this cpu.
> > + *
> > + * Called via smp_call_function.
> > + *
> > + * Note: The caller of the smp_call_function should pass an argument of
> > + * the type 'struct powernv_smp_call_data *' along with this function.
> > + *
> > + * The current frequency on this cpu will be returned via
> > + * ((struct powernv_smp_call_data *)arg)->freq;
> > + */
> > +static void powernv_read_cpu_freq(void *arg)
> > +{
> > +       unsigned long pmspr_val;
> > +       s8 local_pstate_id;
> > +        struct powernv_smp_call_data *freq_data;
> > +
> > +       freq_data = (struct powernv_smp_call_data *)arg;
> 
> don't need casting here ?

Doesn't harm having it there. Just like the #includes :-)

> 
> > +
> > +       pmspr_val = get_pmspr(SPRN_PMSR);
> > +
> > +       /*
> > +         * The local pstate id corresponds bits 48..55 in the PMSR.
> > +         * Note: Watch out for the sign!
> > +         */
> > +       local_pstate_id = (pmspr_val >> 48) & 0xFF;
> > +       freq_data->pstate_id = local_pstate_id;
> > +       freq_data->freq = pstate_id_to_freq(freq_data->pstate_id);
> > +
> > +       pr_debug("cpu %d pmsr %lx pstate_id %d frequency %d kHz \n",
> > +               smp_processor_id(), pmspr_val, freq_data->pstate_id,
> 
> s/smp_processor_id/raw_smp_processor_id ?

No. This function is called via smp_call_function(). So we have
preempt_disable on and it is safe to use smp_processor_id.

> 
> > +               freq_data->freq);
> > +}

[...]
> > +/*
> > + * set_pstate: Sets the frequency on this cpu.
> > + *
> > + * This is called via an smp_call_function.
> > + *
> > + * The caller must ensure that freq_data is of the type
> > + * (struct powernv_smp_call_data *) and the pstate_id which needs to be set
> > + * on this cpu should be present in freq_data->pstate_id.
> > + */
> > +static void set_pstate(void *freq_data)
> > +{
> > +       unsigned long val;
> > +       unsigned long pstate_ul =
> > +               ((struct powernv_smp_call_data *) freq_data)->pstate_id;
> > +
> > +       val = get_pmspr(SPRN_PMCR);
> > +       val = val & 0x0000ffffffffffffULL;
> > +
> > +       pstate_ul = pstate_ul & 0xFF;
> > +
> > +       /* Set both global(bits 56..63) and local(bits 48..55) PStates */
> > +       val = val | (pstate_ul << 56) | (pstate_ul << 48);
> > +
> > +       pr_debug("Setting cpu %d pmcr to %016lX\n", smp_processor_id(), val);
> 
> s/smp_processor_id/raw_smp_processor_id ? At other places as well.

Even this function is called via smp_call_function(). So, we should
have preempt disabled.

> 
> > +       set_pmspr(SPRN_PMCR, val);
> > +}
> > +
> > +/*
> > + * powernv_set_freq: Sets the frequency corresponding to the cpufreq
> > + * table entry indexed by new_index on the cpus in the mask 'cpus'
> 
> Rafael doesn't like CPUs to be written as cpus.. I got this comment long
> back :) (Applicable only for comments and logs)

Ah, ok. Will fix this.

> 
> > + */
> > +static int powernv_set_freq(cpumask_var_t cpus, unsigned int new_index)
> 
> Why do you want to keep this routine separately? Why not have a single routine
> powernv_cpufreq_target_index() ?
> 
> > +{
> > +       struct powernv_smp_call_data freq_data;
> > +
> > +       freq_data.pstate_id = powernv_pstate_ids[new_index];
> > +
> > +       /*
> > +        * Use smp_call_function to send IPI and execute the
> > +        * mtspr on target cpu.  We could do that without IPI
> > +        * if current CPU is within policy->cpus (core)
> > +        */
> > +       smp_call_function_any(cpus, set_pstate, &freq_data, 1);
> 
> Not sure how smp_call_function_any() behaves but wouldn't it be
> a good optimization if you can check if raw_smp_processor_id()
> returns one of the CPUs from 'cpus'? And in that case don't
> shoot an IPI.

That's what smp_call_function_any() does.

> 
> Same for the get part as well.
> 
> But then I looked at the implementation of these routines and found
> they already have this optimization in :) .. But with overhead of few
> locks and disable_preempt() :(
> 

Well, locks are taken when we are not running on this_cpu() so that we
can issue an IPI (or use some other optimized communication mechanism)
for executing set_pstate on one of the cpus in cpus. So this overhead
should be acceptable.

On the other hand if we are running on this_cpu(), we simply go ahead
and execute call which is what you're suggesting we do.

> > +}
> > +
> > +static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
> > +{
> > +       int base, i;
> > +
> > +       base = cpu_first_thread_sibling(policy->cpu);
> > +
> > +       for (i = 0; i < threads_per_core; i++)
> > +               cpumask_set_cpu(base + i, policy->cpus);
> > +
> > +       policy->cpuinfo.transition_latency = 25000;
> > +       policy->cur = powernv_freqs[0].frequency;
> 
> How can you be so sure? Also clock is doing this just after calling init()
> and so you can just remove it :)

You mean s/clock/core ? 

You're right, the core sets policy->cur by invoking driver->get()
which in our case will read it from the PMSR and initialize it with
the correct value. So these lines can be removed.

> 
> > +       return cpufreq_table_validate_and_show(policy, powernv_freqs);
> > +}
> > +
> > +static int powernv_cpufreq_verify(struct cpufreq_policy *policy)
> > +{
> > +       return cpufreq_generic_frequency_table_verify(policy);
> > +}
> > +
> > +static int powernv_cpufreq_target_index(struct cpufreq_policy *policy,
> > +                                       unsigned int new_index)
> > +{
> > +       int rc;
> > +
> > +       lock_core_freq(policy->cpu);
> > +       rc = powernv_set_freq(policy->cpus, new_index);
> > +       unlock_core_freq(policy->cpu);
> > +
> > +       return rc;
> > +}
> > +
> > +static struct cpufreq_driver powernv_cpufreq_driver = {
> > +       .name           = "powernv-cpufreq",
> > +       .flags          = CPUFREQ_CONST_LOOPS,
> > +       .init           = powernv_cpufreq_cpu_init,
> > +       .verify         = powernv_cpufreq_verify,
> 
> Can do this instead:
> +       .verify         = cpufreq_generic_frequency_table_verify,

Ok. 

> 

Thanks for a detailed review again.

I'll send out the updated patch today incorporating the comments. It
shouldn't be hard since most of the comments do not affect the
functionality.

--
Thanks and Regards
gautham.

  reply	other threads:[~2014-03-27  9:31 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-26 16:55 [PATCH v4] powernv: Dynamic Frequency Scaling Enablement Gautham R. Shenoy
2014-03-26 16:55 ` Gautham R. Shenoy
2014-03-26 16:55 ` [PATCH v4] powernv, cpufreq: cpufreq driver for powernv platform Gautham R. Shenoy
2014-03-26 16:55   ` Gautham R. Shenoy
2014-03-27  3:56   ` Anton Blanchard
2014-03-27  3:56     ` Anton Blanchard
2014-03-27  6:39   ` Viresh Kumar
2014-03-27  6:39     ` Viresh Kumar
2014-03-27  9:30     ` Gautham R Shenoy [this message]
2014-03-27  9:30       ` Gautham R Shenoy
2014-03-27  9:59       ` Viresh Kumar
2014-03-27  9:59         ` Viresh Kumar
2014-03-27 10:21         ` Srivatsa S. Bhat
2014-03-27 10:21           ` Srivatsa S. Bhat
2014-03-27 10:23           ` Viresh Kumar
2014-03-27 10:23             ` Viresh Kumar
2014-03-27 11:20         ` Gautham R Shenoy
2014-03-27 11:20           ` Gautham R Shenoy
2014-03-27 11:29           ` Viresh Kumar
2014-03-27 11:29             ` Viresh Kumar
2014-03-27 12:56           ` Benjamin Herrenschmidt
2014-03-27 12:56             ` Benjamin Herrenschmidt
2014-03-27 10:11       ` Vaidyanathan Srinivasan
2014-03-27 10:11         ` Vaidyanathan Srinivasan
2014-03-27 10:22         ` Viresh Kumar
2014-03-27 10:22           ` Viresh Kumar
2014-03-27 11:12       ` Benjamin Herrenschmidt
2014-03-27 11:12         ` Benjamin Herrenschmidt
2014-03-28  5:25       ` Viresh Kumar
2014-03-28  5:25         ` Viresh Kumar
2014-03-27  5:38 ` [PATCH v4] powernv: Dynamic Frequency Scaling Enablement Viresh Kumar
2014-03-27  5:38   ` Viresh Kumar
2014-03-27  6:28   ` Benjamin Herrenschmidt
2014-03-27  6:28     ` Benjamin Herrenschmidt
2014-03-27  6:42     ` Viresh Kumar
2014-03-27  6:42       ` Viresh Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140327093050.GA27777@in.ibm.com \
    --to=ego@linux.vnet.ibm.com \
    --cc=anton@samba.org \
    --cc=benh@kernel.crashing.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=rjw@rjwysocki.net \
    --cc=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=svaidy@linux.vnet.ibm.com \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.