* [PATCH v2 0/2] Add exit_prepare callback to the cpufreq_driver interface. [not found] <16035918.jZXKnQ3yiq@vostro.rjw.lan> @ 2014-03-14 21:03 ` dirk.brandewie 2014-03-14 21:03 ` [PATCH v2 1/2] cpufreq: Add exit_prepare callback to " dirk.brandewie ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: dirk.brandewie @ 2014-03-14 21:03 UTC (permalink / raw) To: linux-pm; +Cc: linux-kernel, rjw, patrick.marlier, dirk.brandewie, Dirk Brandewie From: Dirk Brandewie <dirk.j.brandewie@intel.com> Some drivers (intel_pstate) need to modify state on a core before it is completely offline. The ->exit() callback is executed during the CPU_POST_DEAD phase of the cpu offline process which is too late to change the state of the core. Patch 1 adds an optional callback function to the cpufreq_driver interface which is called by the core during the CPU_DOWN_PREPARE phase of cpu offline in __cpufreq_remove_dev_prepare(). Patch 2 changes intel_pstate to use the ->stop callback to do its cleanup during cpu offline. Dirk Brandewie (2): cpufreq: Add exit_prepare callback to cpufreq_driver interface intel_pstate: Set core to min P state during core offline Documentation/cpu-freq/cpu-drivers.txt | 8 +++++++- drivers/cpufreq/cpufreq.c | 3 +++ drivers/cpufreq/intel_pstate.c | 20 +++++++++++++------- include/linux/cpufreq.h | 1 + 4 files changed, 24 insertions(+), 8 deletions(-) -- 1.8.3.1 ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v2 1/2] cpufreq: Add exit_prepare callback to cpufreq_driver interface 2014-03-14 21:03 ` [PATCH v2 0/2] Add exit_prepare callback to the cpufreq_driver interface dirk.brandewie @ 2014-03-14 21:03 ` dirk.brandewie 2014-03-15 2:04 ` Rafael J. Wysocki 2014-03-14 21:03 ` [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline dirk.brandewie 2014-03-18 17:22 ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface dirk.brandewie 2 siblings, 1 reply; 25+ messages in thread From: dirk.brandewie @ 2014-03-14 21:03 UTC (permalink / raw) To: linux-pm; +Cc: linux-kernel, rjw, patrick.marlier, dirk.brandewie, Dirk Brandewie From: Dirk Brandewie <dirk.j.brandewie@intel.com> This callback allows the driver to do clean up before the CPU is completely down and its state cannot be modified. This is used by the intel_pstate driver to reduce the requested P state prior to the core going away. This is required because the requested P state of the offline core is used to select the package P state. This effectively sets the floor package P state to the requested P state on the offline core. Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> --- Documentation/cpu-freq/cpu-drivers.txt | 8 +++++++- drivers/cpufreq/cpufreq.c | 3 +++ include/linux/cpufreq.h | 1 + 3 files changed, 11 insertions(+), 1 deletion(-) diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt index 8b1a445..79def80 100644 --- a/Documentation/cpu-freq/cpu-drivers.txt +++ b/Documentation/cpu-freq/cpu-drivers.txt @@ -61,7 +61,13 @@ target_index - See below on the differences. And optionally -cpufreq_driver.exit - A pointer to a per-CPU cleanup function. +cpufreq_driver.exit - A pointer to a per-CPU cleanup + function called during CPU_POST_DEAD + phase of cpu hotplug process. + +cpufreq_driver.stop - A pointer to a per-CPU stop function + called during CPU_DOWN_PREPARE phase of + cpu hotplug process. cpufreq_driver.resume - A pointer to a per-CPU resume function which is called with interrupts disabled diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index cf485d9..0d430d7 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1338,6 +1338,9 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, } } + if (cpufreq_driver->stop) + cpufreq_driver->stop(policy); + return 0; } diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 4d89e0e..ff8db19 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -224,6 +224,7 @@ struct cpufreq_driver { int (*bios_limit) (int cpu, unsigned int *limit); int (*exit) (struct cpufreq_policy *policy); + int (*stop) (struct cpufreq_policy *policy); int (*suspend) (struct cpufreq_policy *policy); int (*resume) (struct cpufreq_policy *policy); struct freq_attr **attr; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v2 1/2] cpufreq: Add exit_prepare callback to cpufreq_driver interface 2014-03-14 21:03 ` [PATCH v2 1/2] cpufreq: Add exit_prepare callback to " dirk.brandewie @ 2014-03-15 2:04 ` Rafael J. Wysocki 2014-03-18 5:43 ` Viresh Kumar 0 siblings, 1 reply; 25+ messages in thread From: Rafael J. Wysocki @ 2014-03-15 2:04 UTC (permalink / raw) To: dirk.brandewie; +Cc: linux-pm, linux-kernel, patrick.marlier, Dirk Brandewie On Friday, March 14, 2014 02:03:56 PM dirk.brandewie@gmail.com wrote: > From: Dirk Brandewie <dirk.j.brandewie@intel.com> > > This callback allows the driver to do clean up before the CPU is > completely down and its state cannot be modified. This is used > by the intel_pstate driver to reduce the requested P state prior to > the core going away. This is required because the requested P state > of the offline core is used to select the package P state. This > effectively sets the floor package P state to the requested P state on > the offline core. > > Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> > --- > Documentation/cpu-freq/cpu-drivers.txt | 8 +++++++- > drivers/cpufreq/cpufreq.c | 3 +++ > include/linux/cpufreq.h | 1 + > 3 files changed, 11 insertions(+), 1 deletion(-) > > diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt > index 8b1a445..79def80 100644 > --- a/Documentation/cpu-freq/cpu-drivers.txt > +++ b/Documentation/cpu-freq/cpu-drivers.txt > @@ -61,7 +61,13 @@ target_index - See below on the differences. > > And optionally > > -cpufreq_driver.exit - A pointer to a per-CPU cleanup function. > +cpufreq_driver.exit - A pointer to a per-CPU cleanup > + function called during CPU_POST_DEAD > + phase of cpu hotplug process. > + > +cpufreq_driver.stop - A pointer to a per-CPU stop function > + called during CPU_DOWN_PREPARE phase of > + cpu hotplug process. > > cpufreq_driver.resume - A pointer to a per-CPU resume function > which is called with interrupts disabled > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index cf485d9..0d430d7 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -1338,6 +1338,9 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, > } > } > > + if (cpufreq_driver->stop) What about doing + if (cpufreq_driver->setpolicy && cpufreq_driver->stop) here instead? That would make it clear where the new callback belongs. If you're fine with that, I can make that change when applying the patch. > + cpufreq_driver->stop(policy); > + > return 0; > } > > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h > index 4d89e0e..ff8db19 100644 > --- a/include/linux/cpufreq.h > +++ b/include/linux/cpufreq.h > @@ -224,6 +224,7 @@ struct cpufreq_driver { > int (*bios_limit) (int cpu, unsigned int *limit); > > int (*exit) (struct cpufreq_policy *policy); > + int (*stop) (struct cpufreq_policy *policy); > int (*suspend) (struct cpufreq_policy *policy); > int (*resume) (struct cpufreq_policy *policy); > struct freq_attr **attr; > -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 1/2] cpufreq: Add exit_prepare callback to cpufreq_driver interface 2014-03-15 2:04 ` Rafael J. Wysocki @ 2014-03-18 5:43 ` Viresh Kumar 0 siblings, 0 replies; 25+ messages in thread From: Viresh Kumar @ 2014-03-18 5:43 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Dirk Brandewie, Linux PM list, linux-kernel@vger.kernel.org, Patrick Marlier, Dirk Brandewie On Sat, Mar 15, 2014 at 7:34 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > On Friday, March 14, 2014 02:03:56 PM dirk.brandewie@gmail.com wrote: >> +++ b/drivers/cpufreq/cpufreq.c >> @@ -1338,6 +1338,9 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, >> } >> } >> >> + if (cpufreq_driver->stop) > > What about doing > > + if (cpufreq_driver->setpolicy && cpufreq_driver->stop) > > here instead? That would make it clear where the new callback belongs. This is called after stopping governor and so might be useful for ->target() drivers. So, wouldn't be a bad option if we keep it available for all.. @Dirk: I thought about the solution I mentioned in another mail. And it looks like I will end up getting a similar solution. So, we will go with your solution. Just few changes for your solution.. You need to call ->stop() only for the last CPU of every policy and not for every CPU, so you need something like this: diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 19d25a8..78d41c0 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1371,6 +1371,8 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, __func__, new_cpu, cpu); } } + } else if (cpufreq_driver->stop) { + cpufreq_driver->stop(policy); } return 0; ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline 2014-03-14 21:03 ` [PATCH v2 0/2] Add exit_prepare callback to the cpufreq_driver interface dirk.brandewie 2014-03-14 21:03 ` [PATCH v2 1/2] cpufreq: Add exit_prepare callback to " dirk.brandewie @ 2014-03-14 21:03 ` dirk.brandewie 2014-03-18 5:44 ` Viresh Kumar 2014-03-18 17:22 ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface dirk.brandewie 2 siblings, 1 reply; 25+ messages in thread From: dirk.brandewie @ 2014-03-14 21:03 UTC (permalink / raw) To: linux-pm; +Cc: linux-kernel, rjw, patrick.marlier, dirk.brandewie, Dirk Brandewie From: Dirk Brandewie <dirk.j.brandewie@intel.com> Change to use ->stop() callback to do clean up during CPU hotplug. The requested P state for an offline core will be used by the hardware coordination function to select the package P state. If the core is under load when it is offlined it will fix the package P state floor to the requested P state of offline core. Reported-by: Patrick Marlier <patrick.marlier@gmail.com> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> --- drivers/cpufreq/intel_pstate.c | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 2cd36b9..e9092fd 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -447,7 +447,7 @@ static void core_set_pstate(struct cpudata *cpudata, int pstate) if (limits.no_turbo) val |= (u64)1 << 32; - wrmsrl(MSR_IA32_PERF_CTL, val); + wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val); } static struct cpu_defaults core_params = { @@ -773,16 +773,22 @@ static int intel_pstate_verify_policy(struct cpufreq_policy *policy) return 0; } -static int intel_pstate_cpu_exit(struct cpufreq_policy *policy) +static int intel_pstate_cpu_stop(struct cpufreq_policy *policy) { - int cpu = policy->cpu; + int cpu_num = policy->cpu; + struct cpudata *cpu = all_cpu_data[cpu_num]; + + pr_info("intel_pstate CPU %d exiting\n", cpu_num); + + del_timer(&all_cpu_data[cpu_num]->timer); + intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate); + kfree(all_cpu_data[cpu_num]); + all_cpu_data[cpu_num] = NULL; - del_timer(&all_cpu_data[cpu]->timer); - kfree(all_cpu_data[cpu]); - all_cpu_data[cpu] = NULL; return 0; } + static int intel_pstate_cpu_init(struct cpufreq_policy *policy) { struct cpudata *cpu; @@ -818,7 +824,7 @@ static struct cpufreq_driver intel_pstate_driver = { .setpolicy = intel_pstate_set_policy, .get = intel_pstate_get, .init = intel_pstate_cpu_init, - .exit = intel_pstate_cpu_exit, + .stop = intel_pstate_cpu_stop, .name = "intel_pstate", }; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline 2014-03-14 21:03 ` [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline dirk.brandewie @ 2014-03-18 5:44 ` Viresh Kumar 2014-03-18 15:01 ` Dirk Brandewie 0 siblings, 1 reply; 25+ messages in thread From: Viresh Kumar @ 2014-03-18 5:44 UTC (permalink / raw) To: Dirk Brandewie Cc: Linux PM list, linux-kernel@vger.kernel.org, Rafael J. Wysocki, Patrick Marlier, Dirk Brandewie On Sat, Mar 15, 2014 at 2:33 AM, <dirk.brandewie@gmail.com> wrote: > From: Dirk Brandewie <dirk.j.brandewie@intel.com> > > Change to use ->stop() callback to do clean up during CPU > hotplug. The requested P state for an offline core will be used by the > hardware coordination function to select the package P state. If the > core is under load when it is offlined it will fix the package P state > floor to the requested P state of offline core. > > Reported-by: Patrick Marlier <patrick.marlier@gmail.com> > Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> > --- > drivers/cpufreq/intel_pstate.c | 20 +++++++++++++------- > 1 file changed, 13 insertions(+), 7 deletions(-) > > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c > index 2cd36b9..e9092fd 100644 > --- a/drivers/cpufreq/intel_pstate.c > +++ b/drivers/cpufreq/intel_pstate.c > @@ -447,7 +447,7 @@ static void core_set_pstate(struct cpudata *cpudata, int pstate) > if (limits.no_turbo) > val |= (u64)1 << 32; > > - wrmsrl(MSR_IA32_PERF_CTL, val); > + wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val); > } > > static struct cpu_defaults core_params = { > @@ -773,16 +773,22 @@ static int intel_pstate_verify_policy(struct cpufreq_policy *policy) > return 0; > } > > -static int intel_pstate_cpu_exit(struct cpufreq_policy *policy) > +static int intel_pstate_cpu_stop(struct cpufreq_policy *policy) > { > - int cpu = policy->cpu; > + int cpu_num = policy->cpu; > + struct cpudata *cpu = all_cpu_data[cpu_num]; > + > + pr_info("intel_pstate CPU %d exiting\n", cpu_num); > + > + del_timer(&all_cpu_data[cpu_num]->timer); > + intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate); > + kfree(all_cpu_data[cpu_num]); > + all_cpu_data[cpu_num] = NULL; > > - del_timer(&all_cpu_data[cpu]->timer); > - kfree(all_cpu_data[cpu]); > - all_cpu_data[cpu] = NULL; > return 0; > } > > + > static int intel_pstate_cpu_init(struct cpufreq_policy *policy) > { > struct cpudata *cpu; > @@ -818,7 +824,7 @@ static struct cpufreq_driver intel_pstate_driver = { > .setpolicy = intel_pstate_set_policy, > .get = intel_pstate_get, > .init = intel_pstate_cpu_init, > - .exit = intel_pstate_cpu_exit, > + .stop = intel_pstate_cpu_stop, Probably, keep exit as is and only change P-state in stop(). So that allocation of resources happen in init() and they are freed in exit()? ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline 2014-03-18 5:44 ` Viresh Kumar @ 2014-03-18 15:01 ` Dirk Brandewie 2014-03-18 18:52 ` Srivatsa S. Bhat 0 siblings, 1 reply; 25+ messages in thread From: Dirk Brandewie @ 2014-03-18 15:01 UTC (permalink / raw) To: Viresh Kumar Cc: dirk.brandewie, Linux PM list, linux-kernel@vger.kernel.org, Rafael J. Wysocki, Patrick Marlier, Dirk Brandewie On 03/17/2014 10:44 PM, Viresh Kumar wrote: > On Sat, Mar 15, 2014 at 2:33 AM, <dirk.brandewie@gmail.com> wrote: >> + >> static int intel_pstate_cpu_init(struct cpufreq_policy *policy) >> { >> struct cpudata *cpu; >> @@ -818,7 +824,7 @@ static struct cpufreq_driver intel_pstate_driver = { >> .setpolicy = intel_pstate_set_policy, >> .get = intel_pstate_get, >> .init = intel_pstate_cpu_init, >> - .exit = intel_pstate_cpu_exit, >> + .stop = intel_pstate_cpu_stop, > > Probably, keep exit as is and only change P-state in stop(). So that > allocation of resources happen in init() and they are freed in exit()? > I looked at doing just that but it junked up the code. if stop() is called during PREPARE then init() will be called via __cpufreq_add_dev() in the ONLINE and DOWN_FAILED case. So once stop() is called the driver will be ready for init() to be called exactly like when exit() is called. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline 2014-03-18 15:01 ` Dirk Brandewie @ 2014-03-18 18:52 ` Srivatsa S. Bhat 2014-03-18 19:44 ` Dirk Brandewie 0 siblings, 1 reply; 25+ messages in thread From: Srivatsa S. Bhat @ 2014-03-18 18:52 UTC (permalink / raw) To: Dirk Brandewie Cc: Viresh Kumar, Linux PM list, linux-kernel@vger.kernel.org, Rafael J. Wysocki, Patrick Marlier, Dirk Brandewie On 03/18/2014 08:31 PM, Dirk Brandewie wrote: > On 03/17/2014 10:44 PM, Viresh Kumar wrote: >> On Sat, Mar 15, 2014 at 2:33 AM, <dirk.brandewie@gmail.com> wrote: >>> + >>> static int intel_pstate_cpu_init(struct cpufreq_policy *policy) >>> { >>> struct cpudata *cpu; >>> @@ -818,7 +824,7 @@ static struct cpufreq_driver intel_pstate_driver = { >>> .setpolicy = intel_pstate_set_policy, >>> .get = intel_pstate_get, >>> .init = intel_pstate_cpu_init, >>> - .exit = intel_pstate_cpu_exit, >>> + .stop = intel_pstate_cpu_stop, >> >> Probably, keep exit as is and only change P-state in stop(). So that >> allocation of resources happen in init() and they are freed in exit()? >> > I looked at doing just that but it junked up the code. if stop() is called > during PREPARE then init() will be called via __cpufreq_add_dev() in the > ONLINE > and DOWN_FAILED case. So once stop() is called the driver will be ready for > init() to be called exactly like when exit() is called. > I'm sorry, but that didn't make much sense to me. Can you be a little more specific as to what problems you hit while trying to have a ->stop() which sets min P state and a separate ->exit() which frees the resources? I think we can achieve this with almost no trouble. If you ignore the failure case (such as DOWN_FAILED) for now, do you still see any serious roadblocks? Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline 2014-03-18 18:52 ` Srivatsa S. Bhat @ 2014-03-18 19:44 ` Dirk Brandewie 2014-03-18 20:15 ` Srivatsa S. Bhat 2014-03-19 5:20 ` Viresh Kumar 0 siblings, 2 replies; 25+ messages in thread From: Dirk Brandewie @ 2014-03-18 19:44 UTC (permalink / raw) To: Srivatsa S. Bhat, Dirk Brandewie Cc: dirk.j.brandewie, Viresh Kumar, Linux PM list, linux-kernel@vger.kernel.org, Rafael J. Wysocki, Patrick Marlier On 03/18/2014 11:52 AM, Srivatsa S. Bhat wrote: > On 03/18/2014 08:31 PM, Dirk Brandewie wrote: >> On 03/17/2014 10:44 PM, Viresh Kumar wrote: >>> On Sat, Mar 15, 2014 at 2:33 AM, <dirk.brandewie@gmail.com> wrote: >>>> + >>>> static int intel_pstate_cpu_init(struct cpufreq_policy *policy) >>>> { >>>> struct cpudata *cpu; >>>> @@ -818,7 +824,7 @@ static struct cpufreq_driver intel_pstate_driver = { >>>> .setpolicy = intel_pstate_set_policy, >>>> .get = intel_pstate_get, >>>> .init = intel_pstate_cpu_init, >>>> - .exit = intel_pstate_cpu_exit, >>>> + .stop = intel_pstate_cpu_stop, >>> >>> Probably, keep exit as is and only change P-state in stop(). So that >>> allocation of resources happen in init() and they are freed in exit()? >>> >> I looked at doing just that but it junked up the code. if stop() is called >> during PREPARE then init() will be called via __cpufreq_add_dev() in the >> ONLINE >> and DOWN_FAILED case. So once stop() is called the driver will be ready for >> init() to be called exactly like when exit() is called. >> > > I'm sorry, but that didn't make much sense to me. Can you be a little > more specific as to what problems you hit while trying to have a > ->stop() which sets min P state and a separate ->exit() which frees > the resources? I think we can achieve this with almost no trouble. > There was no problem per se. In stop() all I really needed to do is stop the timer and set the P state to MIN. At init time I need to allocate memory and start timer. If stopping the timer and deallocating memory are separated then I need code in init() to detect this case. Moving all the clean up to stop() make my code simpler, covers the failure case and maintains the behaviour expected by the core. > If you ignore the failure case (such as DOWN_FAILED) for now, do you > still see any serious roadblocks? Why would I ignore a valid failure case? > > Regards, > Srivatsa S. Bhat > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline 2014-03-18 19:44 ` Dirk Brandewie @ 2014-03-18 20:15 ` Srivatsa S. Bhat 2014-03-19 5:20 ` Viresh Kumar 1 sibling, 0 replies; 25+ messages in thread From: Srivatsa S. Bhat @ 2014-03-18 20:15 UTC (permalink / raw) To: Dirk Brandewie Cc: dirk.j.brandewie, Viresh Kumar, Linux PM list, linux-kernel@vger.kernel.org, Rafael J. Wysocki, Patrick Marlier On 03/19/2014 01:14 AM, Dirk Brandewie wrote: > On 03/18/2014 11:52 AM, Srivatsa S. Bhat wrote: >> On 03/18/2014 08:31 PM, Dirk Brandewie wrote: >>> On 03/17/2014 10:44 PM, Viresh Kumar wrote: >>>> On Sat, Mar 15, 2014 at 2:33 AM, <dirk.brandewie@gmail.com> wrote: >>>>> + >>>>> static int intel_pstate_cpu_init(struct cpufreq_policy *policy) >>>>> { >>>>> struct cpudata *cpu; >>>>> @@ -818,7 +824,7 @@ static struct cpufreq_driver >>>>> intel_pstate_driver = { >>>>> .setpolicy = intel_pstate_set_policy, >>>>> .get = intel_pstate_get, >>>>> .init = intel_pstate_cpu_init, >>>>> - .exit = intel_pstate_cpu_exit, >>>>> + .stop = intel_pstate_cpu_stop, >>>> >>>> Probably, keep exit as is and only change P-state in stop(). So that >>>> allocation of resources happen in init() and they are freed in exit()? >>>> >>> I looked at doing just that but it junked up the code. if stop() is >>> called >>> during PREPARE then init() will be called via __cpufreq_add_dev() in the >>> ONLINE >>> and DOWN_FAILED case. So once stop() is called the driver will be >>> ready for >>> init() to be called exactly like when exit() is called. >>> >> >> I'm sorry, but that didn't make much sense to me. Can you be a little >> more specific as to what problems you hit while trying to have a >> ->stop() which sets min P state and a separate ->exit() which frees >> the resources? I think we can achieve this with almost no trouble. >> > > There was no problem per se. In stop() all I really needed to do is > stop the > timer and set the P state to MIN. > > At init time I need to allocate memory and start timer. If stopping the > timer > and deallocating memory are separated then I need code in init() to detect > this case. > > Moving all the clean up to stop() make my code simpler, covers the > failure case and maintains the behaviour expected by the core. > >> If you ignore the failure case (such as DOWN_FAILED) for now, do you >> still see any serious roadblocks? > > Why would I ignore a valid failure case? > Of course you shouldn't ignore it. I was just trying to make it easier to think about the design without complicating it with arcane failure cases right at the outset, that's all. Now that I looked at it again, I see your point. The problem is that by the DOWN_PREPARE stage, the core would have completed only half the tear-down (via __cpufreq_remove_dev_prepare()), but on failure, it tries to do a full init (via __cpufreq_add_dev()). I would say that's actually not a great design from the cpufreq core perspective, but perhaps we can fix it at a later point in time if it is that painful to endure. So yes, now I understand see why you do all the teardown in ->stop(), to workaround the somewhat inconvenient rollback performed by the cpufreq core. Your approach looks good to me. Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline 2014-03-18 19:44 ` Dirk Brandewie 2014-03-18 20:15 ` Srivatsa S. Bhat @ 2014-03-19 5:20 ` Viresh Kumar 2014-03-19 15:32 ` Dirk Brandewie 1 sibling, 1 reply; 25+ messages in thread From: Viresh Kumar @ 2014-03-19 5:20 UTC (permalink / raw) To: Dirk Brandewie Cc: Srivatsa S. Bhat, Dirk Brandewie, Linux PM list, linux-kernel@vger.kernel.org, Rafael J. Wysocki, Patrick Marlier On 19 March 2014 01:14, Dirk Brandewie <dirk.brandewie@gmail.com> wrote: > There was no problem per se. In stop() all I really needed to do is stop > the > timer and set the P state to MIN. > > At init time I need to allocate memory and start timer. If stopping the > timer > and deallocating memory are separated then I need code in init() to detect > this case. Sorry, I didn't understood what exactly is special here :( If we return failure from CPU_POST_DEAD for some reason without calling exit() then you will have memory leak in your init() as we are allocating memory without checking if we already have that (nothing wrong in it though as other parts of kernel should handle things properly here). Probably the situation would be exactly same if we divide the exit path into stop and exit routines, which I still feel is the right way forward. Because ideally cpufreq shouldn't call init() if it hasn't called exit() (If it is doing that right now then its wrong and can be fixed). And so you must do the cleanup in exit().. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline 2014-03-19 5:20 ` Viresh Kumar @ 2014-03-19 15:32 ` Dirk Brandewie 0 siblings, 0 replies; 25+ messages in thread From: Dirk Brandewie @ 2014-03-19 15:32 UTC (permalink / raw) To: Viresh Kumar, Dirk Brandewie Cc: dirk.j.brandewie, Srivatsa S. Bhat, Linux PM list, linux-kernel@vger.kernel.org, Rafael J. Wysocki, Patrick Marlier On 03/18/2014 10:20 PM, Viresh Kumar wrote: > On 19 March 2014 01:14, Dirk Brandewie <dirk.brandewie@gmail.com> wrote: >> There was no problem per se. In stop() all I really needed to do is stop >> the >> timer and set the P state to MIN. >> >> At init time I need to allocate memory and start timer. If stopping the >> timer >> and deallocating memory are separated then I need code in init() to detect >> this case. > > Sorry, I didn't understood what exactly is special here :( > > If we return failure from CPU_POST_DEAD for some reason without > calling exit() then you will have memory leak in your init() as we are > allocating memory without checking if we already have that (nothing wrong > in it though as other parts of kernel should handle things properly here). No. If you got the CPU_POST_DEAD callback CPU_DOWN_PREPARE has already succeeded. init() is called on the CPU_ONLINE and CPU_DOWN_FAILED path. The issue is there is a two part teardown that can fail and the teardown fail will be followed by a call to init(). If the timer is not running (stopped in stop()) then there is no reason to have the memory around. If CPU_DOWN_PREPARE happens followed by CPU_DOWN_FAILED then intel_pstate is ready for init() to be called with no special case code. This maintains the semantics the core expects. > > Probably the situation would be exactly same if we divide the exit path into > stop and exit routines, which I still feel is the right way forward. Because > ideally cpufreq shouldn't call init() if it hasn't called exit() (If > it is doing that > right now then its wrong and can be fixed). And so you must do the cleanup > in exit().. > The core *is* doing this on the CPU_DOWN_FAILED path ATM. On the CPU_DOWN_FAILED path the core should be undoing the work it did in the CPU_DOWN_PREPARE path this would require another callback to drivers to let them "restart" after a call to stop() as well. I don't think it is worth that level of effort IMHO. --Dirk ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface. 2014-03-14 21:03 ` [PATCH v2 0/2] Add exit_prepare callback to the cpufreq_driver interface dirk.brandewie 2014-03-14 21:03 ` [PATCH v2 1/2] cpufreq: Add exit_prepare callback to " dirk.brandewie 2014-03-14 21:03 ` [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline dirk.brandewie @ 2014-03-18 17:22 ` dirk.brandewie 2014-03-18 17:22 ` [PATCH 1/2] cpufreq: Add stop callback to " dirk.brandewie ` (2 more replies) 2 siblings, 3 replies; 25+ messages in thread From: dirk.brandewie @ 2014-03-18 17:22 UTC (permalink / raw) To: linux-pm Cc: linux-kernel, rjw, patrick.marlier, viresh.kumar, srivatsa.bhat, dirk.brandewie, Dirk Brandewie From: Dirk Brandewie <dirk.j.brandewie@intel.com> Changes: v2->v3 Changed the calling of the ->stop() callback to be conditional on the core being the last core controlled by a given policy. v1->2 Change name of callback function added to cpufreq_driver interface. Some drivers (intel_pstate) need to modify state on a core before it is completely offline. The ->exit() callback is executed during the CPU_POST_DEAD phase of the cpu offline process which is too late to change the state of the core. Patch 1 adds an optional callback function to the cpufreq_driver interface which is called by the core during the CPU_DOWN_PREPARE phase of cpu offline in __cpufreq_remove_dev_prepare(). Patch 2 changes intel_pstate to use the ->stop callback to do its cleanup during cpu offline. Dirk Brandewie (2): cpufreq: Add exit_prepare callback to cpufreq_driver interface intel_pstate: Set core to min P state during core offline Documentation/cpu-freq/cpu-drivers.txt | 8 +++++++- drivers/cpufreq/cpufreq.c | 3 +++ drivers/cpufreq/intel_pstate.c | 20 +++++++++++++------- include/linux/cpufreq.h | 1 + 4 files changed, 24 insertions(+), 8 deletions(-) -- 1.8.3.1 ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 1/2] cpufreq: Add stop callback to cpufreq_driver interface 2014-03-18 17:22 ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface dirk.brandewie @ 2014-03-18 17:22 ` dirk.brandewie 2014-03-19 5:04 ` Viresh Kumar 2014-03-18 17:22 ` [PATCH 2/2] intel_pstate: Set core to min P state during core offline dirk.brandewie 2014-03-18 19:08 ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface Srivatsa S. Bhat 2 siblings, 1 reply; 25+ messages in thread From: dirk.brandewie @ 2014-03-18 17:22 UTC (permalink / raw) To: linux-pm Cc: linux-kernel, rjw, patrick.marlier, viresh.kumar, srivatsa.bhat, dirk.brandewie, Dirk Brandewie From: Dirk Brandewie <dirk.j.brandewie@intel.com> This callback allows the driver to do clean up before the CPU is completely down and its state cannot be modified. This is used by the intel_pstate driver to reduce the requested P state prior to the core going away. This is required because the requested P state of the offline core is used to select the package P state. This effectively sets the floor package P state to the requested P state on the offline core. Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> --- Documentation/cpu-freq/cpu-drivers.txt | 8 +++++++- drivers/cpufreq/cpufreq.c | 3 ++- include/linux/cpufreq.h | 1 + 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt index 8b1a445..79def80 100644 --- a/Documentation/cpu-freq/cpu-drivers.txt +++ b/Documentation/cpu-freq/cpu-drivers.txt @@ -61,7 +61,13 @@ target_index - See below on the differences. And optionally -cpufreq_driver.exit - A pointer to a per-CPU cleanup function. +cpufreq_driver.exit - A pointer to a per-CPU cleanup + function called during CPU_POST_DEAD + phase of cpu hotplug process. + +cpufreq_driver.stop - A pointer to a per-CPU stop function + called during CPU_DOWN_PREPARE phase of + cpu hotplug process. cpufreq_driver.resume - A pointer to a per-CPU resume function which is called with interrupts disabled diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index cf485d9..6e6beae 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1336,7 +1336,8 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, __func__, new_cpu, cpu); } } - } + } else if (cpufreq_driver->stop) + cpufreq_driver->stop(policy); return 0; } diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 4d89e0e..ff8db19 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -224,6 +224,7 @@ struct cpufreq_driver { int (*bios_limit) (int cpu, unsigned int *limit); int (*exit) (struct cpufreq_policy *policy); + int (*stop) (struct cpufreq_policy *policy); int (*suspend) (struct cpufreq_policy *policy); int (*resume) (struct cpufreq_policy *policy); struct freq_attr **attr; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] cpufreq: Add stop callback to cpufreq_driver interface 2014-03-18 17:22 ` [PATCH 1/2] cpufreq: Add stop callback to " dirk.brandewie @ 2014-03-19 5:04 ` Viresh Kumar 0 siblings, 0 replies; 25+ messages in thread From: Viresh Kumar @ 2014-03-19 5:04 UTC (permalink / raw) To: Dirk Brandewie Cc: linux-pm@vger.kernel.org, Linux Kernel Mailing List, Rafael J. Wysocki, Patrick Marlier, Srivatsa S. Bhat, Dirk Brandewie On 18 March 2014 22:52, <dirk.brandewie@gmail.com> wrote: > From: Dirk Brandewie <dirk.j.brandewie@intel.com> > > This callback allows the driver to do clean up before the CPU is > completely down and its state cannot be modified. This is used > by the intel_pstate driver to reduce the requested P state prior to > the core going away. This is required because the requested P state > of the offline core is used to select the package P state. This > effectively sets the floor package P state to the requested P state on > the offline core. > > Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> > --- > Documentation/cpu-freq/cpu-drivers.txt | 8 +++++++- > drivers/cpufreq/cpufreq.c | 3 ++- > include/linux/cpufreq.h | 1 + > 3 files changed, 10 insertions(+), 2 deletions(-) > > diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt > index 8b1a445..79def80 100644 > --- a/Documentation/cpu-freq/cpu-drivers.txt > +++ b/Documentation/cpu-freq/cpu-drivers.txt > @@ -61,7 +61,13 @@ target_index - See below on the differences. > > And optionally > > -cpufreq_driver.exit - A pointer to a per-CPU cleanup function. > +cpufreq_driver.exit - A pointer to a per-CPU cleanup > + function called during CPU_POST_DEAD > + phase of cpu hotplug process. > + > +cpufreq_driver.stop - A pointer to a per-CPU stop function > + called during CPU_DOWN_PREPARE phase of > + cpu hotplug process. > > cpufreq_driver.resume - A pointer to a per-CPU resume function > which is called with interrupts disabled > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index cf485d9..6e6beae 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -1336,7 +1336,8 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, > __func__, new_cpu, cpu); > } > } > - } > + } else if (cpufreq_driver->stop) > + cpufreq_driver->stop(policy); > > return 0; > } > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h > index 4d89e0e..ff8db19 100644 > --- a/include/linux/cpufreq.h > +++ b/include/linux/cpufreq.h > @@ -224,6 +224,7 @@ struct cpufreq_driver { > int (*bios_limit) (int cpu, unsigned int *limit); > > int (*exit) (struct cpufreq_policy *policy); > + int (*stop) (struct cpufreq_policy *policy); > int (*suspend) (struct cpufreq_policy *policy); > int (*resume) (struct cpufreq_policy *policy); > struct freq_attr **attr; Acked-by: Viresh Kumar <viresh.kumar@linaro.org> ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 2/2] intel_pstate: Set core to min P state during core offline 2014-03-18 17:22 ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface dirk.brandewie 2014-03-18 17:22 ` [PATCH 1/2] cpufreq: Add stop callback to " dirk.brandewie @ 2014-03-18 17:22 ` dirk.brandewie 2014-03-18 19:08 ` Srivatsa S. Bhat 2014-03-18 19:08 ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface Srivatsa S. Bhat 2 siblings, 1 reply; 25+ messages in thread From: dirk.brandewie @ 2014-03-18 17:22 UTC (permalink / raw) To: linux-pm Cc: linux-kernel, rjw, patrick.marlier, viresh.kumar, srivatsa.bhat, dirk.brandewie, Dirk Brandewie From: Dirk Brandewie <dirk.j.brandewie@intel.com> Change to use ->exit_prepare() callback to do clean up during CPU hotplug. The requested P state for an offline core will be used by the hardware coordination function to select the package P state. If the core is under load when it is offlined it will fix the package P state floor to the requested P state of offline core. Reported-by: Patrick Marlier <patrick.marlier@gmail.com> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> --- drivers/cpufreq/intel_pstate.c | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 2cd36b9..e9092fd 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -447,7 +447,7 @@ static void core_set_pstate(struct cpudata *cpudata, int pstate) if (limits.no_turbo) val |= (u64)1 << 32; - wrmsrl(MSR_IA32_PERF_CTL, val); + wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val); } static struct cpu_defaults core_params = { @@ -773,16 +773,22 @@ static int intel_pstate_verify_policy(struct cpufreq_policy *policy) return 0; } -static int intel_pstate_cpu_exit(struct cpufreq_policy *policy) +static int intel_pstate_cpu_stop(struct cpufreq_policy *policy) { - int cpu = policy->cpu; + int cpu_num = policy->cpu; + struct cpudata *cpu = all_cpu_data[cpu_num]; + + pr_info("intel_pstate CPU %d exiting\n", cpu_num); + + del_timer(&all_cpu_data[cpu_num]->timer); + intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate); + kfree(all_cpu_data[cpu_num]); + all_cpu_data[cpu_num] = NULL; - del_timer(&all_cpu_data[cpu]->timer); - kfree(all_cpu_data[cpu]); - all_cpu_data[cpu] = NULL; return 0; } + static int intel_pstate_cpu_init(struct cpufreq_policy *policy) { struct cpudata *cpu; @@ -818,7 +824,7 @@ static struct cpufreq_driver intel_pstate_driver = { .setpolicy = intel_pstate_set_policy, .get = intel_pstate_get, .init = intel_pstate_cpu_init, - .exit = intel_pstate_cpu_exit, + .stop = intel_pstate_cpu_stop, .name = "intel_pstate", }; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] intel_pstate: Set core to min P state during core offline 2014-03-18 17:22 ` [PATCH 2/2] intel_pstate: Set core to min P state during core offline dirk.brandewie @ 2014-03-18 19:08 ` Srivatsa S. Bhat 0 siblings, 0 replies; 25+ messages in thread From: Srivatsa S. Bhat @ 2014-03-18 19:08 UTC (permalink / raw) To: dirk.brandewie Cc: linux-pm, linux-kernel, rjw, patrick.marlier, viresh.kumar, Dirk Brandewie On 03/18/2014 10:52 PM, dirk.brandewie@gmail.com wrote: > From: Dirk Brandewie <dirk.j.brandewie@intel.com> > > Change to use ->exit_prepare() callback to do clean up during CPU ->stop() > hotplug. The requested P state for an offline core will be used by the > hardware coordination function to select the package P state. If the > core is under load when it is offlined it will fix the package P state > floor to the requested P state of offline core. > > Reported-by: Patrick Marlier <patrick.marlier@gmail.com> > Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> > --- > drivers/cpufreq/intel_pstate.c | 20 +++++++++++++------- > 1 file changed, 13 insertions(+), 7 deletions(-) > > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c > index 2cd36b9..e9092fd 100644 > --- a/drivers/cpufreq/intel_pstate.c > +++ b/drivers/cpufreq/intel_pstate.c > @@ -447,7 +447,7 @@ static void core_set_pstate(struct cpudata *cpudata, int pstate) > if (limits.no_turbo) > val |= (u64)1 << 32; > > - wrmsrl(MSR_IA32_PERF_CTL, val); > + wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val); > } > > static struct cpu_defaults core_params = { > @@ -773,16 +773,22 @@ static int intel_pstate_verify_policy(struct cpufreq_policy *policy) > return 0; > } > > -static int intel_pstate_cpu_exit(struct cpufreq_policy *policy) > +static int intel_pstate_cpu_stop(struct cpufreq_policy *policy) > { > - int cpu = policy->cpu; > + int cpu_num = policy->cpu; > + struct cpudata *cpu = all_cpu_data[cpu_num]; > + > + pr_info("intel_pstate CPU %d exiting\n", cpu_num); > + > + del_timer(&all_cpu_data[cpu_num]->timer); > + intel_pstate_set_pstate(cpu, cpu->pstate.min_pstate); > + kfree(all_cpu_data[cpu_num]); > + all_cpu_data[cpu_num] = NULL; > I think it should be relatively simple to keep the intel_pstate_set_pstate() here inside ->stop() and move the rest of the code to ->exit(). Regards, Srivatsa S. Bhat > - del_timer(&all_cpu_data[cpu]->timer); > - kfree(all_cpu_data[cpu]); > - all_cpu_data[cpu] = NULL; > return 0; > } > > + > static int intel_pstate_cpu_init(struct cpufreq_policy *policy) > { > struct cpudata *cpu; > @@ -818,7 +824,7 @@ static struct cpufreq_driver intel_pstate_driver = { > .setpolicy = intel_pstate_set_policy, > .get = intel_pstate_get, > .init = intel_pstate_cpu_init, > - .exit = intel_pstate_cpu_exit, > + .stop = intel_pstate_cpu_stop, > .name = "intel_pstate", > }; > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface. 2014-03-18 17:22 ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface dirk.brandewie 2014-03-18 17:22 ` [PATCH 1/2] cpufreq: Add stop callback to " dirk.brandewie 2014-03-18 17:22 ` [PATCH 2/2] intel_pstate: Set core to min P state during core offline dirk.brandewie @ 2014-03-18 19:08 ` Srivatsa S. Bhat 2014-03-18 19:25 ` Dirk Brandewie 2 siblings, 1 reply; 25+ messages in thread From: Srivatsa S. Bhat @ 2014-03-18 19:08 UTC (permalink / raw) To: dirk.brandewie Cc: linux-pm, linux-kernel, rjw, patrick.marlier, viresh.kumar, Dirk Brandewie On 03/18/2014 10:52 PM, dirk.brandewie@gmail.com wrote: > From: Dirk Brandewie <dirk.j.brandewie@intel.com> > I don't mean to nitpick, but generally its easier to deal with patchsets if you post the subsequent versions in fresh email threads. Otherwise it can get a bit muddled along with too many other email discussions in the same thread :-( > Changes: > v2->v3 > Changed the calling of the ->stop() callback to be conditional on the > core being the last core controlled by a given policy. > Wait, why? I'm sorry if I am not catching up with the discussions on this issue quickly enough, but I don't see why we should make it conditional on _that_. I thought we agreed that we should make it conditional in the sense that ->stop() should be invoked only for ->setpolicy drivers, right? The way I look at it, ->stop() gives you a chance to stop managing the CPU going offline. As in "stop this CPU". ->exit() is your chance to cleanup the policy, since all its users have gone offline (or this is the last CPU belonging to that policy which is going offline). With this in mind, we should invoke ->stop() every time we take a CPU offline, and invoke ->exit() only when the last CPU in the policy goes offline. What am I missing? Regards, Srivatsa S. Bhat > v1->2 > Change name of callback function added to cpufreq_driver interface. > > Some drivers (intel_pstate) need to modify state on a core before it > is completely offline. The ->exit() callback is executed during the > CPU_POST_DEAD phase of the cpu offline process which is too late to > change the state of the core. > > Patch 1 adds an optional callback function to the cpufreq_driver > interface which is called by the core during the CPU_DOWN_PREPARE > phase of cpu offline in __cpufreq_remove_dev_prepare(). > > Patch 2 changes intel_pstate to use the ->stop callback to do > its cleanup during cpu offline. > > Dirk Brandewie (2): > cpufreq: Add exit_prepare callback to cpufreq_driver interface > intel_pstate: Set core to min P state during core offline > > Documentation/cpu-freq/cpu-drivers.txt | 8 +++++++- > drivers/cpufreq/cpufreq.c | 3 +++ > drivers/cpufreq/intel_pstate.c | 20 +++++++++++++------- > include/linux/cpufreq.h | 1 + > 4 files changed, 24 insertions(+), 8 deletions(-) > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface. 2014-03-18 19:08 ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface Srivatsa S. Bhat @ 2014-03-18 19:25 ` Dirk Brandewie 2014-03-18 20:04 ` Srivatsa S. Bhat 2014-03-19 0:53 ` Rafael J. Wysocki 0 siblings, 2 replies; 25+ messages in thread From: Dirk Brandewie @ 2014-03-18 19:25 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: dirk.brandewie, linux-pm, linux-kernel, rjw, patrick.marlier, viresh.kumar, Dirk Brandewie On 03/18/2014 12:08 PM, Srivatsa S. Bhat wrote: > On 03/18/2014 10:52 PM, dirk.brandewie@gmail.com wrote: >> From: Dirk Brandewie <dirk.j.brandewie@intel.com> >> > > I don't mean to nitpick, but generally its easier to deal with > patchsets if you post the subsequent versions in fresh email threads. > Otherwise it can get a bit muddled along with too many other email > discussions in the same thread :-( > >> Changes: >> v2->v3 >> Changed the calling of the ->stop() callback to be conditional on the >> core being the last core controlled by a given policy. >> > > Wait, why? I'm sorry if I am not catching up with the discussions on > this issue quickly enough, but I don't see why we should make it > conditional on _that_. I thought we agreed that we should make it > conditional in the sense that ->stop() should be invoked only for > ->setpolicy drivers, right? This was done at Viresh's suggestion since thought there might be value for ->target drivers. Any of the options work for me called only for set_policy scaling drivers called unconditionally for all scaling drivers called for last core controlled by a given policy > > The way I look at it, ->stop() gives you a chance to stop managing > the CPU going offline. As in "stop this CPU". ->exit() is your chance > to cleanup the policy, since all its users have gone offline (or this > is the last CPU belonging to that policy which is going offline). > > With this in mind, we should invoke ->stop() every time we take a > CPU offline, and invoke ->exit() only when the last CPU in the policy > goes offline. This is exactly what will happen for intel_pstate since the policies cover a single core. I will defer to you and Viresh how policies that affect more that one cpu should be handled. What intel_pstate needs it to be called during the PREPARE phase of the offline process. > > What am I missing? > > Regards, > Srivatsa S. Bhat > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface. 2014-03-18 19:25 ` Dirk Brandewie @ 2014-03-18 20:04 ` Srivatsa S. Bhat 2014-03-19 0:53 ` Rafael J. Wysocki 1 sibling, 0 replies; 25+ messages in thread From: Srivatsa S. Bhat @ 2014-03-18 20:04 UTC (permalink / raw) To: Dirk Brandewie Cc: linux-pm, linux-kernel, rjw, patrick.marlier, viresh.kumar, Dirk Brandewie On 03/19/2014 12:55 AM, Dirk Brandewie wrote: > On 03/18/2014 12:08 PM, Srivatsa S. Bhat wrote: >> On 03/18/2014 10:52 PM, dirk.brandewie@gmail.com wrote: >>> From: Dirk Brandewie <dirk.j.brandewie@intel.com> >>> >> >> I don't mean to nitpick, but generally its easier to deal with >> patchsets if you post the subsequent versions in fresh email threads. >> Otherwise it can get a bit muddled along with too many other email >> discussions in the same thread :-( >> >>> Changes: >>> v2->v3 >>> Changed the calling of the ->stop() callback to be conditional on the >>> core being the last core controlled by a given policy. >>> >> >> Wait, why? I'm sorry if I am not catching up with the discussions on >> this issue quickly enough, but I don't see why we should make it >> conditional on _that_. I thought we agreed that we should make it >> conditional in the sense that ->stop() should be invoked only for >> ->setpolicy drivers, right? > > This was done at Viresh's suggestion since thought there might be value > for ->target drivers. > > Any of the options work for me > called only for set_policy scaling drivers > called unconditionally for all scaling drivers > called for last core controlled by a given policy > >> >> The way I look at it, ->stop() gives you a chance to stop managing >> the CPU going offline. As in "stop this CPU". ->exit() is your chance >> to cleanup the policy, since all its users have gone offline (or this >> is the last CPU belonging to that policy which is going offline). >> >> With this in mind, we should invoke ->stop() every time we take a >> CPU offline, and invoke ->exit() only when the last CPU in the policy >> goes offline. > > This is exactly what will happen for intel_pstate since the policies cover > a single core. > > I will defer to you and Viresh how policies that affect more that one > cpu should be handled. > > What intel_pstate needs it to be called during the PREPARE phase of the > offline process. > Sure, so here are my thoughts: 1. ->target() and ->setpolicy() drivers are mutually exclusive. Rafael had sent a patch to enforce this, and Viresh acked this patch. http://marc.info/?l=linux-pm&m=139458107925911&w=2 http://marc.info/?l=linux-pm&m=139460177829875&w=2 2. The way I understand, ->target() drivers use one of the defined governors, and hence have a way to stop managing the CPUs upon CPU offline events. This is done via the GOV_STOP mechanism (in the DOWN_PREPARE stage). On the other hand, the ->setpolicy() drivers don't have any equivalent mechanism at all, and all they have today is the ->exit() callback which is invoked way too late in the offline process, for it to be of any use. So the goal here is to introduce a new mechanism or callback to help those ->setpolicy drivers to do whatever they wish to do, while taking the CPU offline. In the case of intel_pstate, the driver wants to set the outgoing CPU's frequency to the min P state. With this reasoning, the cpufreq core should invoke the ->stop() callback only for the ->setpolicy() drivers. Let us not over-generalize interfaces unless they are actually going to be useful in broader scenarios. Now let's focus on the second issue - how often should we call ->stop()? Should we call it on every CPU offline or only upon the last CPU going offline in a given policy? If we look back at the ->target() drivers who use the defined governors, we _effectively_ call GOV_STOP during every CPU offline event. That is, the policy needs to stop governing the CPU going offline from this point onwards, hence the GOV_STOP (and subsequent GOV_START without the offline CPU) makes sense. Similarly, I believe that we should call ->stop() on every CPU offline. We already have ->exit() that gets called when the last CPU goes offline in that policy. With this scheme, we give 2 unique properties to ->stop() as compared to ->exit(): a. ->stop() gets called during the CPU_DOWN_PREPARE stage, which lets the ->setpolicy driver actually take some action on the CPU's frequency. b. ->stop() gets called for every CPU offline. The driver can use this fact if useful. Otherwise, the driver can still ignore some of these calls and do the actual work when the last CPU goes offline in that policy. From what I know, the former case is much more useful, and hence this semantics of invoking it on every CPU offline makes sense. Thoughts? Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface. 2014-03-18 19:25 ` Dirk Brandewie 2014-03-18 20:04 ` Srivatsa S. Bhat @ 2014-03-19 0:53 ` Rafael J. Wysocki 2014-03-19 5:33 ` Viresh Kumar 1 sibling, 1 reply; 25+ messages in thread From: Rafael J. Wysocki @ 2014-03-19 0:53 UTC (permalink / raw) To: Dirk Brandewie Cc: Srivatsa S. Bhat, linux-pm, linux-kernel, patrick.marlier, viresh.kumar, Dirk Brandewie On Tuesday, March 18, 2014 12:25:14 PM Dirk Brandewie wrote: > On 03/18/2014 12:08 PM, Srivatsa S. Bhat wrote: > > On 03/18/2014 10:52 PM, dirk.brandewie@gmail.com wrote: > >> From: Dirk Brandewie <dirk.j.brandewie@intel.com> > >> > > > > I don't mean to nitpick, but generally its easier to deal with > > patchsets if you post the subsequent versions in fresh email threads. > > Otherwise it can get a bit muddled along with too many other email > > discussions in the same thread :-( > > > >> Changes: > >> v2->v3 > >> Changed the calling of the ->stop() callback to be conditional on the > >> core being the last core controlled by a given policy. > >> > > > > Wait, why? I'm sorry if I am not catching up with the discussions on > > this issue quickly enough, but I don't see why we should make it > > conditional on _that_. I thought we agreed that we should make it > > conditional in the sense that ->stop() should be invoked only for > > ->setpolicy drivers, right? > > This was done at Viresh's suggestion since thought there might be value > for ->target drivers. > > Any of the options work for me > called only for set_policy scaling drivers And that's what we should do *today* in my opinion, unless we want to add it to any ->target() drivers *right* now. Do we want that? Rafael ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface. 2014-03-19 0:53 ` Rafael J. Wysocki @ 2014-03-19 5:33 ` Viresh Kumar 2014-03-19 14:01 ` Rafael J. Wysocki 0 siblings, 1 reply; 25+ messages in thread From: Viresh Kumar @ 2014-03-19 5:33 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Dirk Brandewie, Srivatsa S. Bhat, linux-pm@vger.kernel.org, Linux Kernel Mailing List, Patrick Marlier, Dirk Brandewie On 19 March 2014 06:23, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > On Tuesday, March 18, 2014 12:25:14 PM Dirk Brandewie wrote: >> On 03/18/2014 12:08 PM, Srivatsa S. Bhat wrote: >> > On 03/18/2014 10:52 PM, dirk.brandewie@gmail.com wrote: >> >> From: Dirk Brandewie <dirk.j.brandewie@intel.com> >> >> >> > >> > I don't mean to nitpick, but generally its easier to deal with >> > patchsets if you post the subsequent versions in fresh email threads. >> > Otherwise it can get a bit muddled along with too many other email >> > discussions in the same thread :-( >> > >> >> Changes: >> >> v2->v3 >> >> Changed the calling of the ->stop() callback to be conditional on the >> >> core being the last core controlled by a given policy. >> >> >> > >> > Wait, why? I'm sorry if I am not catching up with the discussions on >> > this issue quickly enough, but I don't see why we should make it >> > conditional on _that_. I thought we agreed that we should make it >> > conditional in the sense that ->stop() should be invoked only for >> > ->setpolicy drivers, right? >> >> This was done at Viresh's suggestion since thought there might be value >> for ->target drivers. >> >> Any of the options work for me >> called only for set_policy scaling drivers > > And that's what we should do *today* in my opinion, unless we want to add > it to any ->target() drivers *right* now. Do we want that? We don't have a platform right now that might want to use it, but people are doing this during suspend and so there is a high possibility that they will use it while normal cpu offline as well.. This is what I think: - We are adding a new callback to struct cpufreq_driver.. - We have a classic case here because a driver (set-policy ones) is both a driver and governor. And that's why its special.. - All we want here is to give drivers a chance to do something useful on the CPUs that are going down.. - There is nothing like GOV_STOP for setpolicy drivers as we never needed it and if we really want that, probably we better register setpolicy drivers as governors as well (only to a level where they would get GOV_START|STOP|etc) callbacks and nothing else. - And so I wanted the ->stop() callback to be more about the driver than the governor. - And because a policy contains only the CPUs that share clock line, its only required to call stop() for last CPU of the policy. -- viresh ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface. 2014-03-19 5:33 ` Viresh Kumar @ 2014-03-19 14:01 ` Rafael J. Wysocki 2014-03-19 13:49 ` Viresh Kumar 0 siblings, 1 reply; 25+ messages in thread From: Rafael J. Wysocki @ 2014-03-19 14:01 UTC (permalink / raw) To: Viresh Kumar Cc: Dirk Brandewie, Srivatsa S. Bhat, linux-pm@vger.kernel.org, Linux Kernel Mailing List, Patrick Marlier, Dirk Brandewie On Wednesday, March 19, 2014 11:03:56 AM Viresh Kumar wrote: > On 19 March 2014 06:23, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > On Tuesday, March 18, 2014 12:25:14 PM Dirk Brandewie wrote: > >> On 03/18/2014 12:08 PM, Srivatsa S. Bhat wrote: > >> > On 03/18/2014 10:52 PM, dirk.brandewie@gmail.com wrote: > >> >> From: Dirk Brandewie <dirk.j.brandewie@intel.com> > >> >> > >> > > >> > I don't mean to nitpick, but generally its easier to deal with > >> > patchsets if you post the subsequent versions in fresh email threads. > >> > Otherwise it can get a bit muddled along with too many other email > >> > discussions in the same thread :-( > >> > > >> >> Changes: > >> >> v2->v3 > >> >> Changed the calling of the ->stop() callback to be conditional on the > >> >> core being the last core controlled by a given policy. > >> >> > >> > > >> > Wait, why? I'm sorry if I am not catching up with the discussions on > >> > this issue quickly enough, but I don't see why we should make it > >> > conditional on _that_. I thought we agreed that we should make it > >> > conditional in the sense that ->stop() should be invoked only for > >> > ->setpolicy drivers, right? > >> > >> This was done at Viresh's suggestion since thought there might be value > >> for ->target drivers. > >> > >> Any of the options work for me > >> called only for set_policy scaling drivers > > > > And that's what we should do *today* in my opinion, unless we want to add > > it to any ->target() drivers *right* now. Do we want that? > > We don't have a platform right now that might want to use it, but people > are doing this during suspend and so there is a high possibility that they > will use it while normal cpu offline as well.. > > This is what I think: > - We are adding a new callback to struct cpufreq_driver.. > - We have a classic case here because a driver (set-policy ones) is both a > driver and governor. And that's why its special.. > - All we want here is to give drivers a chance to do something useful on the > CPUs that are going down.. > - There is nothing like GOV_STOP for setpolicy drivers as we never needed > it and if we really want that, probably we better register setpolicy drivers as > governors as well (only to a level where they would get GOV_START|STOP|etc) > callbacks and nothing else. > - And so I wanted the ->stop() callback to be more about the driver than the > governor. > - And because a policy contains only the CPUs that share clock line, its > only required to call stop() for last CPU of the policy. So you're still insisting on putting ->setpolicy and ->target drivers into one bag, which in my opinion is a mistake. Moreover, it has always been a mistake. Let's add ->stop() (or whatever to call it) *specifically* for ->setpolicy drivers, so that the meaning of it is entirely clear. And *if* any ->target drivers need a similar callback, let's add it for them *separately* (just as a different callback pointer). Yes, we'll potentially waste a pointer size worth of storage this way, but then it will be clear who's supposed to use those new callbacks and when. -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface. 2014-03-19 14:01 ` Rafael J. Wysocki @ 2014-03-19 13:49 ` Viresh Kumar 2014-03-19 14:25 ` Rafael J. Wysocki 0 siblings, 1 reply; 25+ messages in thread From: Viresh Kumar @ 2014-03-19 13:49 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Dirk Brandewie, Srivatsa S. Bhat, linux-pm@vger.kernel.org, Linux Kernel Mailing List, Patrick Marlier, Dirk Brandewie On 19 March 2014 19:31, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > Let's add ->stop() (or whatever to call it) *specifically* for ->setpolicy > drivers setpolicy_stop() ? I know that's a bad name :) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface. 2014-03-19 13:49 ` Viresh Kumar @ 2014-03-19 14:25 ` Rafael J. Wysocki 0 siblings, 0 replies; 25+ messages in thread From: Rafael J. Wysocki @ 2014-03-19 14:25 UTC (permalink / raw) To: Viresh Kumar Cc: Dirk Brandewie, Srivatsa S. Bhat, linux-pm@vger.kernel.org, Linux Kernel Mailing List, Patrick Marlier, Dirk Brandewie On Wednesday, March 19, 2014 07:19:13 PM Viresh Kumar wrote: > On 19 March 2014 19:31, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > Let's add ->stop() (or whatever to call it) *specifically* for ->setpolicy > > drivers > > setpolicy_stop() ? I know that's a bad name :) Well, what about ->stop_cpu()? -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2014-03-19 15:32 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <16035918.jZXKnQ3yiq@vostro.rjw.lan>
2014-03-14 21:03 ` [PATCH v2 0/2] Add exit_prepare callback to the cpufreq_driver interface dirk.brandewie
2014-03-14 21:03 ` [PATCH v2 1/2] cpufreq: Add exit_prepare callback to " dirk.brandewie
2014-03-15 2:04 ` Rafael J. Wysocki
2014-03-18 5:43 ` Viresh Kumar
2014-03-14 21:03 ` [PATCH v2 2/2] intel_pstate: Set core to min P state during core offline dirk.brandewie
2014-03-18 5:44 ` Viresh Kumar
2014-03-18 15:01 ` Dirk Brandewie
2014-03-18 18:52 ` Srivatsa S. Bhat
2014-03-18 19:44 ` Dirk Brandewie
2014-03-18 20:15 ` Srivatsa S. Bhat
2014-03-19 5:20 ` Viresh Kumar
2014-03-19 15:32 ` Dirk Brandewie
2014-03-18 17:22 ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface dirk.brandewie
2014-03-18 17:22 ` [PATCH 1/2] cpufreq: Add stop callback to " dirk.brandewie
2014-03-19 5:04 ` Viresh Kumar
2014-03-18 17:22 ` [PATCH 2/2] intel_pstate: Set core to min P state during core offline dirk.brandewie
2014-03-18 19:08 ` Srivatsa S. Bhat
2014-03-18 19:08 ` [PATCH v3 0/2] Add stop callback to the cpufreq_driver interface Srivatsa S. Bhat
2014-03-18 19:25 ` Dirk Brandewie
2014-03-18 20:04 ` Srivatsa S. Bhat
2014-03-19 0:53 ` Rafael J. Wysocki
2014-03-19 5:33 ` Viresh Kumar
2014-03-19 14:01 ` Rafael J. Wysocki
2014-03-19 13:49 ` Viresh Kumar
2014-03-19 14:25 ` Rafael J. Wysocki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).