From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eduardo Valentin Subject: Re: [RFC PATCH 4/5] thermal: introduce the Power Allocator governor Date: Tue, 13 May 2014 19:20:41 -0400 Message-ID: <20140513232041.GC28174@developer> References: <1399377998-14870-1-git-send-email-javi.merino@arm.com> <1399377998-14870-5-git-send-email-javi.merino@arm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Return-path: Received: from mail-yh0-f45.google.com ([209.85.213.45]:62232 "EHLO mail-yh0-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754116AbaEMXVD (ORCPT ); Tue, 13 May 2014 19:21:03 -0400 Received: by mail-yh0-f45.google.com with SMTP id b6so1007080yha.18 for ; Tue, 13 May 2014 16:21:01 -0700 (PDT) Content-Disposition: inline In-Reply-To: <1399377998-14870-5-git-send-email-javi.merino@arm.com> Sender: linux-pm-owner@vger.kernel.org List-Id: linux-pm@vger.kernel.org To: Javi Merino Cc: linux-pm@vger.kernel.org, Punit.Agrawal@arm.com, Zhang Rui Hello Javi, Here are a series of initial comments. I will be probably reviewing the code several times and coming back again with more questions. On Tue, May 06, 2014 at 01:06:37PM +0100, Javi Merino wrote: > The power allocator governor is a thermal governor that controls system > and device power allocation to control temperature. Conceptually, the > implementation takes a system view of heat dissipation by managing > multiple heat sources. Can it be able to control several thermal zones then? > > This governor relies on power-aware cooling devices (power actors) to > operate. That is, cooling devices whose thermal_cooling_device_ops > accept THERMAL_UNIT_POWER. > > It uses a Proportional Integral (PI) controller driven by the > temperature of the thermal zone. This budget is then allocated to > each cooling device that can have bearing on the temperature we are > trying to control. It decides how much power to give each cooling > device based on the performance they are requesting. The PI > controller ensures that the total power budget does not exceed the > control temperature. > > Cc: Zhang Rui > Cc: Eduardo Valentin > Signed-off-by: Punit Agrawal > Signed-off-by: Javi Merino > --- > drivers/thermal/Kconfig | 14 ++ > drivers/thermal/Makefile | 1 + > drivers/thermal/power_allocator.c | 403 +++++++++++++++++++++++++++++++++++++ > drivers/thermal/thermal_core.c | 7 +- > drivers/thermal/thermal_core.h | 8 + > include/linux/thermal.h | 5 + We need documentation under Documentation/thermal to understand how to use this governor. Specially on how to accomodate several actors and domains under its control. > 6 files changed, 437 insertions(+), 1 deletion(-) > create mode 100644 drivers/thermal/power_allocator.c > > diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig > index 2d51912a6e40..f5e6b693fc19 100644 > --- a/drivers/thermal/Kconfig > +++ b/drivers/thermal/Kconfig > @@ -71,6 +71,14 @@ config THERMAL_DEFAULT_GOV_USER_SPACE > Select this if you want to let the user space manage the > platform thermals. > > +config THERMAL_DEFAULT_GOV_POWER_ALLOCATOR > + bool "power_allocator" > + select THERMAL_GOV_POWER_ALLOCATOR > + help > + Select this if you want to control temperature based on > + system and device power allocation. This governor relies on > + power-aware cooling devices (power actors) to operate. > + > endchoice > > config THERMAL_GOV_FAIR_SHARE > @@ -89,6 +97,12 @@ config THERMAL_GOV_USER_SPACE > help > Enable this to let the user space manage the platform thermals. > > +config THERMAL_GOV_POWER_ALLOCATOR > + bool "Power allocator thermal governor" > + help > + Enable this to manage platform thermals by dynamically > + allocating and limiting power to devices. > + > config CPU_THERMAL > bool "generic cpu cooling support" > depends on CPU_FREQ > diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile > index 54e4ec9eb5df..1cb7d5fec8dc 100644 > --- a/drivers/thermal/Makefile > +++ b/drivers/thermal/Makefile > @@ -13,6 +13,7 @@ thermal_sys-$(CONFIG_THERMAL_OF) += of-thermal.o > thermal_sys-$(CONFIG_THERMAL_GOV_FAIR_SHARE) += fair_share.o > thermal_sys-$(CONFIG_THERMAL_GOV_STEP_WISE) += step_wise.o > thermal_sys-$(CONFIG_THERMAL_GOV_USER_SPACE) += user_space.o > +thermal_sys-$(CONFIG_THERMAL_GOV_POWER_ALLOCATOR) += power_allocator.o > > # cpufreq cooling > thermal_sys-$(CONFIG_CPU_THERMAL) += cpu_cooling.o > diff --git a/drivers/thermal/power_allocator.c b/drivers/thermal/power_allocator.c > new file mode 100644 > index 000000000000..39c425abf10a > --- /dev/null > +++ b/drivers/thermal/power_allocator.c > @@ -0,0 +1,403 @@ > +/* > + * A power allocator to manage temperature > + * > + * Copyright (C) 2014 ARM Ltd. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 as > + * published by the Free Software Foundation. > + * > + * This program is distributed "as is" WITHOUT ANY WARRANTY of any > + * kind, whether express or implied; without even the implied warranty > + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + */ > + > +#define pr_fmt(fmt) "Power allocator: " fmt > + > +#include > +#include > +#include > + > +#include "thermal_core.h" > + > +#define MAX_NUM_ACTORS 8 hmm... Why do we need to limit the amount of actors? > + > +#define FRAC_BITS 8 > +#define int_to_frac(x) ((x) << FRAC_BITS) > +#define frac_to_int(x) ((x) >> FRAC_BITS) > + > +/** > + * mul_frac - multiply two fixed-point numbers > + * @x: first multiplicand > + * @y: second multiplicand > + * > + * Returns the result of multiplying two fixed-point numbers. The > + * result is also a fixed-point number. > + */ > +static inline s64 mul_frac(s64 x, s64 y) > +{ > + return (x * y) >> FRAC_BITS; > +} > + > +enum power_allocator_trip_levels { > + TRIP_SWITCH_ON = 0, /* Switch on PI controller */ > + TRIP_MAX_DESIRED_TEMPERATURE, /* Temperature we are controlling for */ > +}; Can we reuse the trip types we already have? > + > +/** > + * struct power_allocator_params - parameters for the power allocator governor > + * @k_po: P parameter of the PI controller when overshooting (i.e., when > + * temperature is below the target) > + * @k_pi: P parameter of the PI controller when undershooting > + * @k_i: I parameter of the PI controller > + * @integral_cutoff: threshold below which the error is no longer accumulated > + in the PI controller > + * @err_integral: Accumulated error in the PI controller. > + */ > +struct power_allocator_params { > + s32 k_po; > + s32 k_pu; > + s32 k_i; > + s32 integral_cutoff; > + s32 err_integral; > +}; We would need a documentation on how to define these values. > + > +/** > + * pi_controller() - PI controller > + * @tz: thermal zone we are operating in > + * @control_temp: The target temperature > + * @max_allocatable_power: maximum allocatable power for this thermal zone > + * > + * This PI controller increases the available power budget so that the > + * temperature of the thermal zone gets as close as possible to > + * @control_temp and limits the power if it exceeds it. k_po is the > + * proportional term when we are overshooting, k_pu is the > + * proportional term when we are undershooting. integral_cutoff is a > + * threshold below which we stop accumulating the error. The > + * accumulated error is only valid if the requested power will make > + * the system warmer. If the system is mostly idle, there's no point > + * in accumulating positive error. > + * > + * It returns the power budget for the next period. > + */ > +static u32 pi_controller(struct thermal_zone_device *tz, > + unsigned long current_temp, unsigned long control_temp, > + unsigned long max_allocatable_power) > +{ > + s64 p, i, power_range; > + s32 err; > + struct power_allocator_params *params = tz->governor_data; > + > + err = ((s32)control_temp - (s32)current_temp) / 1000; > + err = int_to_frac(err); > + > + /* Calculate the proportional term */ > + p = mul_frac(err < 0 ? params->k_po : params->k_pu, err); > + > + /* > + * Calculate the integral term > + * > + * if the error s less than cut off allow integration (but > + * the integral is limited to max power) > + */ > + i = mul_frac(params->k_i, params->err_integral); > + > + if (err < int_to_frac(params->integral_cutoff)) { > + s64 tmpi = mul_frac(params->k_i, err); > + tmpi += i; > + if (tmpi <= int_to_frac(max_allocatable_power)) { > + i = tmpi; > + params->err_integral += err; > + } > + } > + > + power_range = p + i; > + > + /* feed-forward the known maximum dissipatable power */ > + power_range = tz->tzp->max_dissipatable_power + > + frac_to_int(power_range); > + > + /* output should not exceed max power */ > + if (power_range > max_allocatable_power) > + power_range = max_allocatable_power; > + > + return power_range >= 0 ? power_range : 0; What does it mean return 0? The above if does not make completelly sense to me, as power_range can still be 0. power_range < 0 means error? what if the value at power_range overflows the u32? > +} > + > +static void divvy_up_power(struct thermal_zone_device *tz, Can you please document how you perform this power split? > + unsigned long req_power[MAX_NUM_ACTORS], > + unsigned long max_power[MAX_NUM_ACTORS], > + int num_actors, unsigned long total_req_power, > + u32 power_range, > + unsigned long granted_power[MAX_NUM_ACTORS]) > +{ > + unsigned long extra_power, capped_extra_power; > + unsigned long extra_actor_power[MAX_NUM_ACTORS]; > + int i; > + > + if (!total_req_power) { > + /* > + * Nobody requested anything, so just give everybody > + * the maximum power > + */ > + for (i = 0; i < num_actors; i++) > + granted_power[i] = max_power[i]; > + > + return; > + } > + > + capped_extra_power = 0; > + extra_power = 0; > + for (i = 0; i < num_actors; i++) { > + u64 req_range = req_power[i] * power_range; > + > + granted_power[i] = div_u64(req_range, total_req_power); > + > + if (granted_power[i] > max_power[i]) { > + extra_power += granted_power[i] - max_power[i]; > + granted_power[i] = max_power[i]; > + } > + > + extra_actor_power[i] = max_power[i] - granted_power[i]; > + capped_extra_power += extra_actor_power[i]; > + } > + > + if (!extra_power) > + return; > + > + /* > + * Re-divvy the reclaimed extra among actors based on > + * how far they are from the max > + */ > + extra_power = min(extra_power, capped_extra_power); > + if (capped_extra_power > 0) > + for (i = 0; i < num_actors; i++) > + granted_power[i] += (extra_actor_power[i] * > + extra_power) / capped_extra_power; > +} > + > +static int allocate_power(struct thermal_zone_device *tz, > + unsigned long current_temp, unsigned long control_temp) > +{ > + struct thermal_instance *instance; > + unsigned long req_power[MAX_NUM_ACTORS], max_power[MAX_NUM_ACTORS]; > + unsigned long granted_power[MAX_NUM_ACTORS]; > + unsigned long total_req_power, max_allocatable_power; > + u32 power_range; > + int i, num_actors, ret = 0; > + > + i = 0; > + total_req_power = 0; > + max_allocatable_power = 0; > + > + mutex_lock(&tz->lock); > + > + list_for_each_entry(instance, &tz->thermal_instances, tz_node) { > + struct thermal_cooling_device *cdev = instance->cdev; > + > + if (instance->trip != TRIP_MAX_DESIRED_TEMPERATURE) How does one instance->trip gets the value of TRIP_MAX_DESIRED_TEMPERATURE? > + continue; > + > + if (i >= MAX_NUM_ACTORS) { > + pr_warn("Too many actors (%d) for this governor\n", i); dev_* family is preferrable over pr_* family. Same applies to other patches. > + ret = -EINVAL; > + goto unlock; > + } > + > + ret = cdev->ops->get_cur(cdev, &req_power[i], > + THERMAL_UNIT_POWER); > + if (ret) { > + pr_warn_once("Couldn't get current power for %s: %d\n", > + cdev->type, ret); > + goto unlock; > + } > + > + total_req_power += req_power[i]; > + > + cdev->ops->get_max(cdev, &max_power[i], THERMAL_UNIT_POWER); > + max_allocatable_power += max_power[i]; > + > + i++; > + } > + num_actors = i; > + > + power_range = pi_controller(tz, current_temp, control_temp, > + max_allocatable_power); > + > + divvy_up_power(tz, req_power, max_power, num_actors, total_req_power, > + power_range, granted_power); > + > + i = 0; > + list_for_each_entry(instance, &tz->thermal_instances, tz_node) { > + struct thermal_cooling_device *cdev = instance->cdev; > + > + if (instance->trip != TRIP_MAX_DESIRED_TEMPERATURE) > + continue; > + > + cdev->ops->set_cur(cdev, granted_power[i], THERMAL_UNIT_POWER); > + i++; > + } > + > +unlock: > + mutex_unlock(&tz->lock); > + > + return ret; > +} > + > +static int check_trips(struct thermal_zone_device *tz) > +{ > + int ret; > + enum thermal_trip_type type; > + > + if (tz->trips < 2) > + return -EINVAL; > + > + ret = tz->ops->get_trip_type(tz, TRIP_SWITCH_ON, &type); > + if (ret) > + return ret; > + > + if ((type != THERMAL_TRIP_PASSIVE) && (type != THERMAL_TRIP_ACTIVE)) > + return -EINVAL; > + > + ret = tz->ops->get_trip_type(tz, TRIP_MAX_DESIRED_TEMPERATURE, &type); > + if (ret) > + return ret; > + > + if ((type != THERMAL_TRIP_PASSIVE) && (type != THERMAL_TRIP_ACTIVE)) > + return -EINVAL; > + > + return ret; > +} > + > +static void reset_pi_controller(struct power_allocator_params *params) > +{ > + params->err_integral = 0; > +} > + > +/** > + * power_allocator_bind - bind the power_allocator governor to a thermal zone > + * @tz: thermal zone to bind it to > + * > + * Check that the thermal zone is valid for this governor: has two > + * thermal trips. If so, initialize the PI controller parameters and > + * bind it to the thermal zone. > + * > + * Returns 0 on success, -EINVAL if the trips were invalid or -ENOMEM > + * if we ran out of memory. > + */ > +static int power_allocator_bind(struct thermal_zone_device *tz) > +{ > + int ret; > + struct power_allocator_params *params; > + unsigned long switch_on_temp, control_temp; > + u32 temperature_threshold; > + > + ret = check_trips(tz); > + if (ret) { > + pr_err("thermal zone %s has the wrong number of trips for this governor\n", > + tz->type); > + return ret; > + } > + > + if (!tz->tzp || !tz->tzp->max_dissipatable_power) { > + pr_err("Trying to bind the power_allocator governor to a thermal zone without the max_dissipatable_power parameter\n"); Can you please break the above line? > + return -EINVAL; > + } > + > + params = kzalloc(sizeof(*params), GFP_KERNEL); can we use devm_kzalloc? > + if (!params) > + return -ENOMEM; > + > + ret = tz->ops->get_trip_temp(tz, TRIP_SWITCH_ON, &switch_on_temp); > + if (ret) > + goto free; > + > + ret = tz->ops->get_trip_temp(tz, TRIP_MAX_DESIRED_TEMPERATURE, > + &control_temp); thermal zone drivers are not aware of these trip types, are they? > + if (ret) > + goto free; > + > + temperature_threshold = (control_temp - switch_on_temp) / 1000; > + > + params->k_po = int_to_frac(tz->tzp->max_dissipatable_power) / > + temperature_threshold; > + params->k_pu = int_to_frac(2 * tz->tzp->max_dissipatable_power) / > + temperature_threshold; > + params->k_i = int_to_frac(1); I suppose we need a way to input k_po, k_pu and k_i parameters from platform code to this governor right? Or are they always defined as the above code? always based on max_dissipated_power? > + params->integral_cutoff = 0; > + > + reset_pi_controller(params); > + > + tz->governor_data = params; > + return 0; > + > +free: > + kfree(params); > + return ret; > +} > + > +static void power_allocator_unbind(struct thermal_zone_device *tz) > +{ > + pr_debug("Unbinding from thermal zone %d\n", tz->id); > + kfree(tz->governor_data); > + tz->governor_data = NULL; > +} > + > +static int power_allocator_throttle(struct thermal_zone_device *tz, int trip) > +{ > + int ret; > + unsigned long switch_on_temp, control_temp, current_temp; > + struct power_allocator_params *params = tz->governor_data; > + > + /* > + * We get called for every trip point but we only need to do > + * our calculations once > + */ > + if (trip != TRIP_MAX_DESIRED_TEMPERATURE) > + return 0; > + > + ret = thermal_zone_get_temp(tz, ¤t_temp); > + if (ret) { > + pr_warn("Failed to get temperature: %d\n", ret); > + return ret; > + } > + > + ret = tz->ops->get_trip_temp(tz, TRIP_SWITCH_ON, &switch_on_temp); > + if (ret) { > + pr_warn("Failed to get switch on temperature: %d\n", ret); > + return ret; > + } > + > + if (current_temp < switch_on_temp) { > + reset_pi_controller(params); > + return 0; > + } > + > + ret = tz->ops->get_trip_temp(tz, TRIP_MAX_DESIRED_TEMPERATURE, > + &control_temp); > + if (ret) { > + pr_warn("Failed to get the maximum desired temperature: %d\n", > + ret); > + return ret; > + } > + > + return allocate_power(tz, current_temp, control_temp); > +} > + > +static struct thermal_governor thermal_gov_power_allocator = { > + .name = "power_allocator", > + .bind_to_tz = power_allocator_bind, > + .unbind_from_tz = power_allocator_unbind, > + .throttle = power_allocator_throttle, > +}; > + > +int thermal_gov_power_allocator_register(void) > +{ > + return thermal_register_governor(&thermal_gov_power_allocator); > +} > + > +void thermal_gov_power_allocator_unregister(void) > +{ > + thermal_unregister_governor(&thermal_gov_power_allocator); > +} > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c > index 1efaadf436fa..bed3d2ef8ef3 100644 > --- a/drivers/thermal/thermal_core.c > +++ b/drivers/thermal/thermal_core.c > @@ -1840,7 +1840,11 @@ static int __init thermal_register_governors(void) > if (result) > return result; > > - return thermal_gov_user_space_register(); > + result = thermal_gov_user_space_register(); > + if (result) > + return result; > + > + return thermal_gov_power_allocator_register(); > } > > static void thermal_unregister_governors(void) > @@ -1848,6 +1852,7 @@ static void thermal_unregister_governors(void) > thermal_gov_step_wise_unregister(); > thermal_gov_fair_share_unregister(); > thermal_gov_user_space_unregister(); > + thermal_gov_power_allocator_unregister(); > } > > static int __init thermal_init(void) > diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h > index 3db339fb636f..b24cde2c71cc 100644 > --- a/drivers/thermal/thermal_core.h > +++ b/drivers/thermal/thermal_core.h > @@ -77,6 +77,14 @@ static inline int thermal_gov_user_space_register(void) { return 0; } > static inline void thermal_gov_user_space_unregister(void) {} > #endif /* CONFIG_THERMAL_GOV_USER_SPACE */ > > +#ifdef CONFIG_THERMAL_GOV_POWER_ALLOCATOR > +int thermal_gov_power_allocator_register(void); > +void thermal_gov_power_allocator_unregister(void); > +#else > +static inline int thermal_gov_power_allocator_register(void) { return 0; } > +static inline void thermal_gov_power_allocator_unregister(void) {} > +#endif /* CONFIG_THERMAL_GOV_POWER_ALLOCATOR */ > + > /* device tree support */ > #ifdef CONFIG_THERMAL_OF > int of_parse_thermal_zones(void); > diff --git a/include/linux/thermal.h b/include/linux/thermal.h > index 5986f546ca98..3861cb443520 100644 > --- a/include/linux/thermal.h > +++ b/include/linux/thermal.h > @@ -57,6 +57,8 @@ > #define DEFAULT_THERMAL_GOVERNOR "fair_share" > #elif defined(CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE) > #define DEFAULT_THERMAL_GOVERNOR "user_space" > +#elif defined(CONFIG_THERMAL_DEFAULT_GOV_POWER_ALLOCATOR) > +#define DEFAULT_THERMAL_GOVERNOR "power_allocator" > #endif > > extern struct list_head thermal_cdev_list; > @@ -250,6 +252,9 @@ struct thermal_zone_params { > > int num_tbps; /* Number of tbp entries */ > struct thermal_bind_params *tbp; > + > + /* Maximum power (heat) that this thermal zone can dissipate in mW */ > + u32 max_dissipatable_power; > }; > > struct thermal_genl_event { > -- > 1.7.9.5 > >