From: Eduardo Valentin <edubezval@gmail.com>
To: Javi Merino <javi.merino@arm.com>
Cc: linux-pm@vger.kernel.org, Punit.Agrawal@arm.com,
Zhang Rui <rui.zhang@intel.com>
Subject: Re: [RFC PATCH 4/5] thermal: introduce the Power Allocator governor
Date: Tue, 13 May 2014 19:20:41 -0400 [thread overview]
Message-ID: <20140513232041.GC28174@developer> (raw)
In-Reply-To: <1399377998-14870-5-git-send-email-javi.merino@arm.com>
Hello Javi,
Here are a series of initial comments. I will be probably reviewing the
code several times and coming back again with more questions.
On Tue, May 06, 2014 at 01:06:37PM +0100, Javi Merino wrote:
> The power allocator governor is a thermal governor that controls system
> and device power allocation to control temperature. Conceptually, the
> implementation takes a system view of heat dissipation by managing
> multiple heat sources.
Can it be able to control several thermal zones then?
>
> This governor relies on power-aware cooling devices (power actors) to
> operate. That is, cooling devices whose thermal_cooling_device_ops
> accept THERMAL_UNIT_POWER.
>
> It uses a Proportional Integral (PI) controller driven by the
> temperature of the thermal zone. This budget is then allocated to
> each cooling device that can have bearing on the temperature we are
> trying to control. It decides how much power to give each cooling
> device based on the performance they are requesting. The PI
> controller ensures that the total power budget does not exceed the
> control temperature.
>
> Cc: Zhang Rui <rui.zhang@intel.com>
> Cc: Eduardo Valentin <edubezval@gmail.com>
> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
> Signed-off-by: Javi Merino <javi.merino@arm.com>
> ---
> drivers/thermal/Kconfig | 14 ++
> drivers/thermal/Makefile | 1 +
> drivers/thermal/power_allocator.c | 403 +++++++++++++++++++++++++++++++++++++
> drivers/thermal/thermal_core.c | 7 +-
> drivers/thermal/thermal_core.h | 8 +
> include/linux/thermal.h | 5 +
We need documentation under Documentation/thermal to understand
how to use this governor. Specially on how to accomodate
several actors and domains under its control.
> 6 files changed, 437 insertions(+), 1 deletion(-)
> create mode 100644 drivers/thermal/power_allocator.c
>
> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
> index 2d51912a6e40..f5e6b693fc19 100644
> --- a/drivers/thermal/Kconfig
> +++ b/drivers/thermal/Kconfig
> @@ -71,6 +71,14 @@ config THERMAL_DEFAULT_GOV_USER_SPACE
> Select this if you want to let the user space manage the
> platform thermals.
>
> +config THERMAL_DEFAULT_GOV_POWER_ALLOCATOR
> + bool "power_allocator"
> + select THERMAL_GOV_POWER_ALLOCATOR
> + help
> + Select this if you want to control temperature based on
> + system and device power allocation. This governor relies on
> + power-aware cooling devices (power actors) to operate.
> +
> endchoice
>
> config THERMAL_GOV_FAIR_SHARE
> @@ -89,6 +97,12 @@ config THERMAL_GOV_USER_SPACE
> help
> Enable this to let the user space manage the platform thermals.
>
> +config THERMAL_GOV_POWER_ALLOCATOR
> + bool "Power allocator thermal governor"
> + help
> + Enable this to manage platform thermals by dynamically
> + allocating and limiting power to devices.
> +
> config CPU_THERMAL
> bool "generic cpu cooling support"
> depends on CPU_FREQ
> diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
> index 54e4ec9eb5df..1cb7d5fec8dc 100644
> --- a/drivers/thermal/Makefile
> +++ b/drivers/thermal/Makefile
> @@ -13,6 +13,7 @@ thermal_sys-$(CONFIG_THERMAL_OF) += of-thermal.o
> thermal_sys-$(CONFIG_THERMAL_GOV_FAIR_SHARE) += fair_share.o
> thermal_sys-$(CONFIG_THERMAL_GOV_STEP_WISE) += step_wise.o
> thermal_sys-$(CONFIG_THERMAL_GOV_USER_SPACE) += user_space.o
> +thermal_sys-$(CONFIG_THERMAL_GOV_POWER_ALLOCATOR) += power_allocator.o
>
> # cpufreq cooling
> thermal_sys-$(CONFIG_CPU_THERMAL) += cpu_cooling.o
> diff --git a/drivers/thermal/power_allocator.c b/drivers/thermal/power_allocator.c
> new file mode 100644
> index 000000000000..39c425abf10a
> --- /dev/null
> +++ b/drivers/thermal/power_allocator.c
> @@ -0,0 +1,403 @@
> +/*
> + * A power allocator to manage temperature
> + *
> + * Copyright (C) 2014 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed "as is" WITHOUT ANY WARRANTY of any
> + * kind, whether express or implied; without even the implied warranty
> + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + */
> +
> +#define pr_fmt(fmt) "Power allocator: " fmt
> +
> +#include <linux/list.h>
> +#include <linux/slab.h>
> +#include <linux/thermal.h>
> +
> +#include "thermal_core.h"
> +
> +#define MAX_NUM_ACTORS 8
hmm... Why do we need to limit the amount of actors?
> +
> +#define FRAC_BITS 8
> +#define int_to_frac(x) ((x) << FRAC_BITS)
> +#define frac_to_int(x) ((x) >> FRAC_BITS)
> +
> +/**
> + * mul_frac - multiply two fixed-point numbers
> + * @x: first multiplicand
> + * @y: second multiplicand
> + *
> + * Returns the result of multiplying two fixed-point numbers. The
> + * result is also a fixed-point number.
> + */
> +static inline s64 mul_frac(s64 x, s64 y)
> +{
> + return (x * y) >> FRAC_BITS;
> +}
> +
> +enum power_allocator_trip_levels {
> + TRIP_SWITCH_ON = 0, /* Switch on PI controller */
> + TRIP_MAX_DESIRED_TEMPERATURE, /* Temperature we are controlling for */
> +};
Can we reuse the trip types we already have?
> +
> +/**
> + * struct power_allocator_params - parameters for the power allocator governor
> + * @k_po: P parameter of the PI controller when overshooting (i.e., when
> + * temperature is below the target)
> + * @k_pi: P parameter of the PI controller when undershooting
> + * @k_i: I parameter of the PI controller
> + * @integral_cutoff: threshold below which the error is no longer accumulated
> + in the PI controller
> + * @err_integral: Accumulated error in the PI controller.
> + */
> +struct power_allocator_params {
> + s32 k_po;
> + s32 k_pu;
> + s32 k_i;
> + s32 integral_cutoff;
> + s32 err_integral;
> +};
We would need a documentation on how to define these values.
> +
> +/**
> + * pi_controller() - PI controller
> + * @tz: thermal zone we are operating in
> + * @control_temp: The target temperature
> + * @max_allocatable_power: maximum allocatable power for this thermal zone
> + *
> + * This PI controller increases the available power budget so that the
> + * temperature of the thermal zone gets as close as possible to
> + * @control_temp and limits the power if it exceeds it. k_po is the
> + * proportional term when we are overshooting, k_pu is the
> + * proportional term when we are undershooting. integral_cutoff is a
> + * threshold below which we stop accumulating the error. The
> + * accumulated error is only valid if the requested power will make
> + * the system warmer. If the system is mostly idle, there's no point
> + * in accumulating positive error.
> + *
> + * It returns the power budget for the next period.
> + */
> +static u32 pi_controller(struct thermal_zone_device *tz,
> + unsigned long current_temp, unsigned long control_temp,
> + unsigned long max_allocatable_power)
> +{
> + s64 p, i, power_range;
> + s32 err;
> + struct power_allocator_params *params = tz->governor_data;
> +
> + err = ((s32)control_temp - (s32)current_temp) / 1000;
> + err = int_to_frac(err);
> +
> + /* Calculate the proportional term */
> + p = mul_frac(err < 0 ? params->k_po : params->k_pu, err);
> +
> + /*
> + * Calculate the integral term
> + *
> + * if the error s less than cut off allow integration (but
> + * the integral is limited to max power)
> + */
> + i = mul_frac(params->k_i, params->err_integral);
> +
> + if (err < int_to_frac(params->integral_cutoff)) {
> + s64 tmpi = mul_frac(params->k_i, err);
> + tmpi += i;
> + if (tmpi <= int_to_frac(max_allocatable_power)) {
> + i = tmpi;
> + params->err_integral += err;
> + }
> + }
> +
> + power_range = p + i;
> +
> + /* feed-forward the known maximum dissipatable power */
> + power_range = tz->tzp->max_dissipatable_power +
> + frac_to_int(power_range);
> +
> + /* output should not exceed max power */
> + if (power_range > max_allocatable_power)
> + power_range = max_allocatable_power;
> +
> + return power_range >= 0 ? power_range : 0;
What does it mean return 0? The above if does not make completelly sense
to me, as power_range can still be 0. power_range < 0 means error?
what if the value at power_range overflows the u32?
> +}
> +
> +static void divvy_up_power(struct thermal_zone_device *tz,
Can you please document how you perform this power split?
> + unsigned long req_power[MAX_NUM_ACTORS],
> + unsigned long max_power[MAX_NUM_ACTORS],
> + int num_actors, unsigned long total_req_power,
> + u32 power_range,
> + unsigned long granted_power[MAX_NUM_ACTORS])
> +{
> + unsigned long extra_power, capped_extra_power;
> + unsigned long extra_actor_power[MAX_NUM_ACTORS];
> + int i;
> +
> + if (!total_req_power) {
> + /*
> + * Nobody requested anything, so just give everybody
> + * the maximum power
> + */
> + for (i = 0; i < num_actors; i++)
> + granted_power[i] = max_power[i];
> +
> + return;
> + }
> +
> + capped_extra_power = 0;
> + extra_power = 0;
> + for (i = 0; i < num_actors; i++) {
> + u64 req_range = req_power[i] * power_range;
> +
> + granted_power[i] = div_u64(req_range, total_req_power);
> +
> + if (granted_power[i] > max_power[i]) {
> + extra_power += granted_power[i] - max_power[i];
> + granted_power[i] = max_power[i];
> + }
> +
> + extra_actor_power[i] = max_power[i] - granted_power[i];
> + capped_extra_power += extra_actor_power[i];
> + }
> +
> + if (!extra_power)
> + return;
> +
> + /*
> + * Re-divvy the reclaimed extra among actors based on
> + * how far they are from the max
> + */
> + extra_power = min(extra_power, capped_extra_power);
> + if (capped_extra_power > 0)
> + for (i = 0; i < num_actors; i++)
> + granted_power[i] += (extra_actor_power[i] *
> + extra_power) / capped_extra_power;
> +}
> +
> +static int allocate_power(struct thermal_zone_device *tz,
> + unsigned long current_temp, unsigned long control_temp)
> +{
> + struct thermal_instance *instance;
> + unsigned long req_power[MAX_NUM_ACTORS], max_power[MAX_NUM_ACTORS];
> + unsigned long granted_power[MAX_NUM_ACTORS];
> + unsigned long total_req_power, max_allocatable_power;
> + u32 power_range;
> + int i, num_actors, ret = 0;
> +
> + i = 0;
> + total_req_power = 0;
> + max_allocatable_power = 0;
> +
> + mutex_lock(&tz->lock);
> +
> + list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
> + struct thermal_cooling_device *cdev = instance->cdev;
> +
> + if (instance->trip != TRIP_MAX_DESIRED_TEMPERATURE)
How does one instance->trip gets the value of
TRIP_MAX_DESIRED_TEMPERATURE?
> + continue;
> +
> + if (i >= MAX_NUM_ACTORS) {
> + pr_warn("Too many actors (%d) for this governor\n", i);
dev_* family is preferrable over pr_* family. Same applies to other
patches.
> + ret = -EINVAL;
> + goto unlock;
> + }
> +
> + ret = cdev->ops->get_cur(cdev, &req_power[i],
> + THERMAL_UNIT_POWER);
> + if (ret) {
> + pr_warn_once("Couldn't get current power for %s: %d\n",
> + cdev->type, ret);
> + goto unlock;
> + }
> +
> + total_req_power += req_power[i];
> +
> + cdev->ops->get_max(cdev, &max_power[i], THERMAL_UNIT_POWER);
> + max_allocatable_power += max_power[i];
> +
> + i++;
> + }
> + num_actors = i;
> +
> + power_range = pi_controller(tz, current_temp, control_temp,
> + max_allocatable_power);
> +
> + divvy_up_power(tz, req_power, max_power, num_actors, total_req_power,
> + power_range, granted_power);
> +
> + i = 0;
> + list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
> + struct thermal_cooling_device *cdev = instance->cdev;
> +
> + if (instance->trip != TRIP_MAX_DESIRED_TEMPERATURE)
> + continue;
> +
> + cdev->ops->set_cur(cdev, granted_power[i], THERMAL_UNIT_POWER);
> + i++;
> + }
> +
> +unlock:
> + mutex_unlock(&tz->lock);
> +
> + return ret;
> +}
> +
> +static int check_trips(struct thermal_zone_device *tz)
> +{
> + int ret;
> + enum thermal_trip_type type;
> +
> + if (tz->trips < 2)
> + return -EINVAL;
> +
> + ret = tz->ops->get_trip_type(tz, TRIP_SWITCH_ON, &type);
> + if (ret)
> + return ret;
> +
> + if ((type != THERMAL_TRIP_PASSIVE) && (type != THERMAL_TRIP_ACTIVE))
> + return -EINVAL;
> +
> + ret = tz->ops->get_trip_type(tz, TRIP_MAX_DESIRED_TEMPERATURE, &type);
> + if (ret)
> + return ret;
> +
> + if ((type != THERMAL_TRIP_PASSIVE) && (type != THERMAL_TRIP_ACTIVE))
> + return -EINVAL;
> +
> + return ret;
> +}
> +
> +static void reset_pi_controller(struct power_allocator_params *params)
> +{
> + params->err_integral = 0;
> +}
> +
> +/**
> + * power_allocator_bind - bind the power_allocator governor to a thermal zone
> + * @tz: thermal zone to bind it to
> + *
> + * Check that the thermal zone is valid for this governor: has two
> + * thermal trips. If so, initialize the PI controller parameters and
> + * bind it to the thermal zone.
> + *
> + * Returns 0 on success, -EINVAL if the trips were invalid or -ENOMEM
> + * if we ran out of memory.
> + */
> +static int power_allocator_bind(struct thermal_zone_device *tz)
> +{
> + int ret;
> + struct power_allocator_params *params;
> + unsigned long switch_on_temp, control_temp;
> + u32 temperature_threshold;
> +
> + ret = check_trips(tz);
> + if (ret) {
> + pr_err("thermal zone %s has the wrong number of trips for this governor\n",
> + tz->type);
> + return ret;
> + }
> +
> + if (!tz->tzp || !tz->tzp->max_dissipatable_power) {
> + pr_err("Trying to bind the power_allocator governor to a thermal zone without the max_dissipatable_power parameter\n");
Can you please break the above line?
> + return -EINVAL;
> + }
> +
> + params = kzalloc(sizeof(*params), GFP_KERNEL);
can we use devm_kzalloc?
> + if (!params)
> + return -ENOMEM;
> +
> + ret = tz->ops->get_trip_temp(tz, TRIP_SWITCH_ON, &switch_on_temp);
> + if (ret)
> + goto free;
> +
> + ret = tz->ops->get_trip_temp(tz, TRIP_MAX_DESIRED_TEMPERATURE,
> + &control_temp);
thermal zone drivers are not aware of these trip types, are they?
> + if (ret)
> + goto free;
> +
> + temperature_threshold = (control_temp - switch_on_temp) / 1000;
> +
> + params->k_po = int_to_frac(tz->tzp->max_dissipatable_power) /
> + temperature_threshold;
> + params->k_pu = int_to_frac(2 * tz->tzp->max_dissipatable_power) /
> + temperature_threshold;
> + params->k_i = int_to_frac(1);
I suppose we need a way to input k_po, k_pu and k_i parameters from
platform code to this governor right? Or are they always defined as the
above code? always based on max_dissipated_power?
> + params->integral_cutoff = 0;
> +
> + reset_pi_controller(params);
> +
> + tz->governor_data = params;
> + return 0;
> +
> +free:
> + kfree(params);
> + return ret;
> +}
> +
> +static void power_allocator_unbind(struct thermal_zone_device *tz)
> +{
> + pr_debug("Unbinding from thermal zone %d\n", tz->id);
> + kfree(tz->governor_data);
> + tz->governor_data = NULL;
> +}
> +
> +static int power_allocator_throttle(struct thermal_zone_device *tz, int trip)
> +{
> + int ret;
> + unsigned long switch_on_temp, control_temp, current_temp;
> + struct power_allocator_params *params = tz->governor_data;
> +
> + /*
> + * We get called for every trip point but we only need to do
> + * our calculations once
> + */
> + if (trip != TRIP_MAX_DESIRED_TEMPERATURE)
> + return 0;
> +
> + ret = thermal_zone_get_temp(tz, ¤t_temp);
> + if (ret) {
> + pr_warn("Failed to get temperature: %d\n", ret);
> + return ret;
> + }
> +
> + ret = tz->ops->get_trip_temp(tz, TRIP_SWITCH_ON, &switch_on_temp);
> + if (ret) {
> + pr_warn("Failed to get switch on temperature: %d\n", ret);
> + return ret;
> + }
> +
> + if (current_temp < switch_on_temp) {
> + reset_pi_controller(params);
> + return 0;
> + }
> +
> + ret = tz->ops->get_trip_temp(tz, TRIP_MAX_DESIRED_TEMPERATURE,
> + &control_temp);
> + if (ret) {
> + pr_warn("Failed to get the maximum desired temperature: %d\n",
> + ret);
> + return ret;
> + }
> +
> + return allocate_power(tz, current_temp, control_temp);
> +}
> +
> +static struct thermal_governor thermal_gov_power_allocator = {
> + .name = "power_allocator",
> + .bind_to_tz = power_allocator_bind,
> + .unbind_from_tz = power_allocator_unbind,
> + .throttle = power_allocator_throttle,
> +};
> +
> +int thermal_gov_power_allocator_register(void)
> +{
> + return thermal_register_governor(&thermal_gov_power_allocator);
> +}
> +
> +void thermal_gov_power_allocator_unregister(void)
> +{
> + thermal_unregister_governor(&thermal_gov_power_allocator);
> +}
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 1efaadf436fa..bed3d2ef8ef3 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1840,7 +1840,11 @@ static int __init thermal_register_governors(void)
> if (result)
> return result;
>
> - return thermal_gov_user_space_register();
> + result = thermal_gov_user_space_register();
> + if (result)
> + return result;
> +
> + return thermal_gov_power_allocator_register();
> }
>
> static void thermal_unregister_governors(void)
> @@ -1848,6 +1852,7 @@ static void thermal_unregister_governors(void)
> thermal_gov_step_wise_unregister();
> thermal_gov_fair_share_unregister();
> thermal_gov_user_space_unregister();
> + thermal_gov_power_allocator_unregister();
> }
>
> static int __init thermal_init(void)
> diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
> index 3db339fb636f..b24cde2c71cc 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
> @@ -77,6 +77,14 @@ static inline int thermal_gov_user_space_register(void) { return 0; }
> static inline void thermal_gov_user_space_unregister(void) {}
> #endif /* CONFIG_THERMAL_GOV_USER_SPACE */
>
> +#ifdef CONFIG_THERMAL_GOV_POWER_ALLOCATOR
> +int thermal_gov_power_allocator_register(void);
> +void thermal_gov_power_allocator_unregister(void);
> +#else
> +static inline int thermal_gov_power_allocator_register(void) { return 0; }
> +static inline void thermal_gov_power_allocator_unregister(void) {}
> +#endif /* CONFIG_THERMAL_GOV_POWER_ALLOCATOR */
> +
> /* device tree support */
> #ifdef CONFIG_THERMAL_OF
> int of_parse_thermal_zones(void);
> diff --git a/include/linux/thermal.h b/include/linux/thermal.h
> index 5986f546ca98..3861cb443520 100644
> --- a/include/linux/thermal.h
> +++ b/include/linux/thermal.h
> @@ -57,6 +57,8 @@
> #define DEFAULT_THERMAL_GOVERNOR "fair_share"
> #elif defined(CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE)
> #define DEFAULT_THERMAL_GOVERNOR "user_space"
> +#elif defined(CONFIG_THERMAL_DEFAULT_GOV_POWER_ALLOCATOR)
> +#define DEFAULT_THERMAL_GOVERNOR "power_allocator"
> #endif
>
> extern struct list_head thermal_cdev_list;
> @@ -250,6 +252,9 @@ struct thermal_zone_params {
>
> int num_tbps; /* Number of tbp entries */
> struct thermal_bind_params *tbp;
> +
> + /* Maximum power (heat) that this thermal zone can dissipate in mW */
> + u32 max_dissipatable_power;
> };
>
> struct thermal_genl_event {
> --
> 1.7.9.5
>
>
next prev parent reply other threads:[~2014-05-13 23:21 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-05-06 12:06 [RFC PATCH 0/5] The power allocator thermal governor Javi Merino
2014-05-06 12:06 ` [RFC PATCH 1/5] thermal: let governors have private data for each thermal zone Javi Merino
2014-05-13 14:31 ` edubezval
2014-05-15 14:30 ` Javi Merino
2014-05-15 15:10 ` Eduardo Valentin
2014-05-06 12:06 ` [RFC PATCH 2/5] thermal: let cooling devices operate on other units other than state Javi Merino
2014-05-13 15:22 ` edubezval
2014-05-06 12:06 ` [RFC PATCH 3/5] thermal: add a basic cpu power actor Javi Merino
2014-05-13 22:57 ` Eduardo Valentin
2014-05-15 17:02 ` Javi Merino
2014-05-06 12:06 ` [RFC PATCH 4/5] thermal: introduce the Power Allocator governor Javi Merino
2014-05-13 23:20 ` Eduardo Valentin [this message]
2014-05-15 18:24 ` Javi Merino
2014-05-22 14:26 ` Eduardo Valentin
2014-05-06 12:06 ` [RFC PATCH 5/5] thermal: add trace events to the power allocator governor Javi Merino
2014-05-06 12:28 ` Steven Rostedt
2014-05-14 10:05 ` [RFC PATCH 0/5] The power allocator thermal governor James King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140513232041.GC28174@developer \
--to=edubezval@gmail.com \
--cc=Punit.Agrawal@arm.com \
--cc=javi.merino@arm.com \
--cc=linux-pm@vger.kernel.org \
--cc=rui.zhang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).