* RE: [PATCH v4] Thermal: exynos: Add sysfs node supporting exynos's emulation mode.
From: Zhang Rui @ 2012-11-07 6:36 UTC (permalink / raw)
To: R, Durgadoss
Cc: Jonghwa Lee, linux-pm@vger.kernel.org,
linux-kernel@vger.kernel.org, Brown, Len, Rafael J. Wysocki,
Amit Dinel Kachhap, MyungJoo Ham, Kyungmin Park
In-Reply-To: <4D68720C2E767A4AA6A8796D42C8EB591FFA29@BGSMSX101.gar.corp.intel.com>
On Thu, 2012-11-01 at 23:13 -0600, R, Durgadoss wrote:
> Hi Lee,
>
> > -----Original Message-----
> > From: Jonghwa Lee [mailto:jonghwa3.lee@samsung.com]
> > Sent: Friday, November 02, 2012 7:55 AM
> > To: linux-pm@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org; Brown, Len; R, Durgadoss; Rafael J.
> > Wysocki; Amit Dinel Kachhap; MyungJoo Ham; Kyungmin Park; Jonghwa Lee
> > Subject: [PATCH v4] Thermal: exynos: Add sysfs node supporting exynos's
> > emulation mode.
> >
> > This patch supports exynos's emulation mode with newly created sysfs node.
> > Exynos 4x12 (4212, 4412) and 5 series provide emulation mode for thermal
> > management unit. Thermal emulation mode supports software debug for
> > TMU's
> > operation. User can set temperature manually with software code and TMU
> > will read current temperature from user value not from sensor's value.
> > This patch includes also documentary placed under
> > Documentation/thermal/.
> >
>
first of all, what would happen if overheat happens during emulation?
I just had a thought about if we can introduce this to the generic
thermal layer.
to do this, we only need to:
1) introduce tz->emulation
2) introduce thermal_get_temp()
static int thermal_get_temp(tz) {
if (tz->emulation)
return tz->emulation;
else
return tz->ops->get_temp(tz);
}
3) replace tz->ops->get_temp() with thermal_get_temp() in thermal layer
4) introduce /sys/class/thermal/thermal_zoneX/emulation
5) when setting /sys/class/thermal/thermal_zoneX/emulation,
a) set tz->emulation
b) invoke thermal_zone_device_update();
this is a pure software emulation solution but it would work on all
generic thermal layer users.
do you think this proposal would work properly?
if yes, I'd like to see if it is valuable for the other platform thermal
drivers.
thanks,
rui
> Thanks for fixing the comments.
> Please CC linux-acpi, when you submit thermal patches, going forward.
> I am CCing Rui for now, for him to review/merge this patch.
>
> Reviewed-by: Durgadoss R <durgadoss.r@intel.com>
>
> Thanks,
> Durga
>
> > Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com>
> > ---
> > v4
> > - Fix Typo.
> > - Remove unnecessary codes.
> > - Add comments about feature of exynos emulation operation to the
> > document.
> >
> > v3
> > - Remove unnecessay variables.
> > - Do some code clean in exynos_tmu_emulation_store().
> > - Make wrapping function of sysfs node creation function to use
> > #ifdefs in minimum.
> >
> > v2
> > exynos_thermal.c
> > - Fix build error occured by wrong emulation control register name.
> > - Remove exynos5410 dependent codes.
> > exynos_thermal_emulation
> > - Align indentation.
> >
> > Documentation/thermal/exynos_thermal_emulation | 56
> > +++++++++++++++
> > drivers/thermal/Kconfig | 9 +++
> > drivers/thermal/exynos_thermal.c | 91
> > ++++++++++++++++++++++++
> > 3 files changed, 156 insertions(+), 0 deletions(-)
> > create mode 100644 Documentation/thermal/exynos_thermal_emulation
> >
> > diff --git a/Documentation/thermal/exynos_thermal_emulation
> > b/Documentation/thermal/exynos_thermal_emulation
> > new file mode 100644
> > index 0000000..a6ea06f
> > --- /dev/null
> > +++ b/Documentation/thermal/exynos_thermal_emulation
> > @@ -0,0 +1,56 @@
> > +EXYNOS EMULATION MODE
> > +========================
> > +
> > +Copyright (C) 2012 Samsung Electronics
> > +
> > +Written by Jonghwa Lee <jonghwa3.lee@samsung.com>
> > +
> > +Description
> > +-----------
> > +
> > +Exynos 4x12 (4212, 4412) and 5 series provide emulation mode for thermal
> > management unit.
> > +Thermal emulation mode supports software debug for TMU's operation.
> > User can set temperature
> > +manually with software code and TMU will read current temperature from
> > user value not from
> > +sensor's value.
> > +
> > +Enabling CONFIG_EXYNOS_THERMAL_EMUL option will make this support
> > in available.
> > +When it's enabled, sysfs node will be created under
> > +/sys/bus/platform/devices/'exynos device name'/ with name of
> > 'emulation'.
> > +
> > +The sysfs node, 'emulation', will contain value 0 for the initial state. When
> > you input any
> > +temperature you want to update to sysfs node, it automatically enable
> > emulation mode and
> > +current temperature will be changed into it.
> > +(Exynos also supports user changable delay time which would be used to
> > delay of
> > + changing temperature. However, this node only uses same delay of real
> > sensing time, 938us.)
> > +
> > +Exynos emulation mode requires synchronous of value changing and
> > enabling. It means when you
> > +want to update the any value of delay or next temperature, then you have
> > to enable emulation
> > +mode at the same time. (Or you have to keep the mode enabling.) If you
> > don't, it fails to
> > +change the value to updated one and just use last succeessful value
> > repeatedly. That's why
> > +this node gives users the right to change termerpature only. Just one
> > interface makes it more
> > +simply to use.
> > +
> > +Disabling emulation mode only requires writing value 0 to sysfs node.
> > +
> > +
> > +TEMP 120 |
> > + |
> > + 100 |
> > + |
> > + 80 |
> > + | +-----------
> > + 60 | | |
> > + | +-------------| |
> > + 40 | | | |
> > + | | | |
> > + 20 | | | +----------
> > + | | | | |
> > + 0
> > |______________|_____________|__________|__________|_______
> > __
> > + A A A A TIME
> > + |<----->| |<----->| |<----->| |
> > + | 938us | | | | | |
> > +emulation : 0 50 | 70 | 20 | 0
> > +current temp : sensor 50 70 20 sensor
> > +
> > +
> > +
> > diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
> > index e1cb6bd..c02a66c 100644
> > --- a/drivers/thermal/Kconfig
> > +++ b/drivers/thermal/Kconfig
> > @@ -55,3 +55,12 @@ config EXYNOS_THERMAL
> > help
> > If you say yes here you get support for TMU (Thermal Managment
> > Unit) on SAMSUNG EXYNOS series of SoC.
> > +
> > +config EXYNOS_THERMAL_EMUL
> > + bool "EXYNOS TMU emulation mode support"
> > + depends on !CPU_EXYNOS4210 && EXYNOS_THERMAL
> > + help
> > + Exynos 4412 and 4414 and 5 series has emulation mode on TMU.
> > + Enable this option will be make sysfs node in exynos thermal
> > platform
> > + device directory to support emulation mode. With emulation mode
> > sysfs
> > + node, you can manually input temperature to TMU for simulation
> > purpose.
> > diff --git a/drivers/thermal/exynos_thermal.c
> > b/drivers/thermal/exynos_thermal.c
> > index fd03e85..eebd4e5 100644
> > --- a/drivers/thermal/exynos_thermal.c
> > +++ b/drivers/thermal/exynos_thermal.c
> > @@ -99,6 +99,14 @@
> > #define IDLE_INTERVAL 10000
> > #define MCELSIUS 1000
> >
> > +#ifdef CONFIG_EXYNOS_THERMAL_EMUL
> > +#define EXYNOS_EMUL_TIME 0x57F0
> > +#define EXYNOS_EMUL_TIME_SHIFT 16
> > +#define EXYNOS_EMUL_DATA_SHIFT 8
> > +#define EXYNOS_EMUL_DATA_MASK 0xFF
> > +#define EXYNOS_EMUL_ENABLE 0x1
> > +#endif /* CONFIG_EXYNOS_THERMAL_EMUL */
> > +
> > /* CPU Zone information */
> > #define PANIC_ZONE 4
> > #define WARN_ZONE 3
> > @@ -832,6 +840,82 @@ static inline struct exynos_tmu_platform_data
> > *exynos_get_driver_data(
> > return (struct exynos_tmu_platform_data *)
> > platform_get_device_id(pdev)->driver_data;
> > }
> > +
> > +#ifdef CONFIG_EXYNOS_THERMAL_EMUL
> > +static ssize_t exynos_tmu_emulation_show(struct device *dev,
> > + struct device_attribute *attr,
> > + char *buf)
> > +{
> > + struct platform_device *pdev = container_of(dev,
> > + struct platform_device, dev);
> > + struct exynos_tmu_data *data = platform_get_drvdata(pdev);
> > + unsigned int reg;
> > + u8 temp_code;
> > + int temp = 0;
> > +
> > + mutex_lock(&data->lock);
> > + clk_enable(data->clk);
> > + reg = readl(data->base + EXYNOS_EMUL_CON);
> > + clk_disable(data->clk);
> > + mutex_unlock(&data->lock);
> > +
> > + if (reg & EXYNOS_EMUL_ENABLE) {
> > + reg >>= EXYNOS_EMUL_DATA_SHIFT;
> > + temp_code = reg & EXYNOS_EMUL_DATA_MASK;
> > + temp = code_to_temp(data, temp_code);
> > + }
> > +
> > + return sprintf(buf, "%d\n", temp);
> > +}
> > +
> > +static ssize_t exynos_tmu_emulation_store(struct device *dev,
> > + struct device_attribute *attr,
> > + const char *buf, size_t count)
> > +{
> > + struct platform_device *pdev = container_of(dev,
> > + struct platform_device, dev);
> > + struct exynos_tmu_data *data = platform_get_drvdata(pdev);
> > + unsigned int reg;
> > + int temp;
> > +
> > + if (!sscanf(buf, "%d\n", &temp) || temp < 0)
> > + return -EINVAL;
> > +
> > + mutex_lock(&data->lock);
> > + clk_enable(data->clk);
> > +
> > + reg = readl(data->base + EXYNOS_EMUL_CON);
> > +
> > + if (temp)
> > + reg = (EXYNOS_EMUL_TIME << EXYNOS_EMUL_TIME_SHIFT)
> > |
> > + (temp_to_code(data, temp) <<
> > EXYNOS_EMUL_DATA_SHIFT) |
> > + EXYNOS_EMUL_ENABLE;
> > + else
> > + reg &= ~EXYNOS_EMUL_ENABLE;
> > +
> > + writel(reg, data->base + EXYNOS_EMUL_CON);
> > +
> > + clk_disable(data->clk);
> > + mutex_unlock(&data->lock);
> > +
> > + return count;
> > +}
> > +
> > +static DEVICE_ATTR(emulation, 0644, exynos_tmu_emulation_show,
> > + exynos_tmu_emulation_store);
> > +static int create_emulation_sysfs(struct device *dev)
> > +{
> > + return device_create_file(dev, &dev_attr_emulation);
> > +}
> > +static void remove_emulation_sysfs(struct device *dev)
> > +{
> > + device_remove_file(dev, &dev_attr_emulation);
> > +}
> > +#else
> > +static inline int create_emulation_sysfs(struct device *dev) {return 0;}
> > +static inline void remove_emulation_sysfs(struct device *dev){}
> > +#endif
> > +
> > static int __devinit exynos_tmu_probe(struct platform_device *pdev)
> > {
> > struct exynos_tmu_data *data;
> > @@ -930,6 +1014,11 @@ static int __devinit exynos_tmu_probe(struct
> > platform_device *pdev)
> > dev_err(&pdev->dev, "Failed to register thermal
> > interface\n");
> > goto err_clk;
> > }
> > +
> > + ret = create_emulation_sysfs(&pdev->dev);
> > + if (ret)
> > + dev_err(&pdev->dev, "Failed to create emulation mode
> > sysfs node\n");
> > +
> > return 0;
> > err_clk:
> > platform_set_drvdata(pdev, NULL);
> > @@ -941,6 +1030,8 @@ static int __devexit exynos_tmu_remove(struct
> > platform_device *pdev)
> > {
> > struct exynos_tmu_data *data = platform_get_drvdata(pdev);
> >
> > + remove_emulation_sysfs(&pdev->dev);
> > +
> > exynos_tmu_control(pdev, false);
> >
> > exynos_unregister_thermal();
> > --
> > 1.7.4.1
>
^ permalink raw reply
* Re: [PATCH V3 1/5] Thermal: add indent for code alignment.
From: Zhang Rui @ 2012-11-07 6:54 UTC (permalink / raw)
To: hongbo.zhang
Cc: linaro-kernel-cunTk1MwBs8s++Sfvej+rw,
linaro-dev-cunTk1MwBs8s++Sfvej+rw, patches-QSEj5FYQhm4dnm+yROfE0A,
linux-pm-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
STEricsson_nomadik_linux-nkJGhpqTU55BDgjK7y7TUQ,
kernel-vMlcbD5RyM6HZuj8yyL1ah2eb7JE58TQ, hongbo.zhang
In-Reply-To: <1351615741-24134-2-git-send-email-hongbo.zhang-68IGFXMjmZ7QT0dZR+AlfA@public.gmane.org>
On Tue, 2012-10-30 at 17:48 +0100, hongbo.zhang wrote:
> From: "hongbo.zhang" <hongbo.zhang-68IGFXMjmZ7QT0dZR+AlfA@public.gmane.org>
>
> The curly bracket should be aligned with corresponding if else statements.
>
> Signed-off-by: hongbo.zhang <hongbo.zhang-68IGFXMjmZ7QT0dZR+AlfA@public.gmane.org>
> Reviewed-by: Viresh Kumar <viresh.kumar-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
applied to thermal-next.
thanks,
rui
> ---
> drivers/thermal/cpu_cooling.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index cc1c930..b6b4c2a 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -369,7 +369,7 @@ struct thermal_cooling_device *cpufreq_cooling_register(
> if (min != policy.cpuinfo.min_freq ||
> max != policy.cpuinfo.max_freq)
> return ERR_PTR(-EINVAL);
> -}
> + }
> }
> cpufreq_dev = kzalloc(sizeof(struct cpufreq_cooling_device),
> GFP_KERNEL);
^ permalink raw reply
* Re: [PATCH V3 2/5] Thermal: fix bug of counting cpu frequencies.
From: Zhang Rui @ 2012-11-07 6:54 UTC (permalink / raw)
To: hongbo.zhang
Cc: linaro-dev, linux-kernel, linux-pm, amit.kachhap, patches,
linaro-kernel, STEricsson_nomadik_linux, kernel, hongbo.zhang
In-Reply-To: <1351615741-24134-3-git-send-email-hongbo.zhang@linaro.com>
On Tue, 2012-10-30 at 17:48 +0100, hongbo.zhang wrote:
> From: "hongbo.zhang" <hongbo.zhang@linaro.com>
>
> In the while loop for counting cpu frequencies, if table[i].frequency equals
> CPUFREQ_ENTRY_INVALID, index i won't be increased, so this leads to an endless
> loop, what's more the index i cannot be referred as cpu frequencies number if
> there is CPUFREQ_ENTRY_INVALID case.
>
> Signed-off-by: hongbo.zhang <hongbo.zhang@linaro.com>
> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
> Reviewed-by: Amit Daniel Kachhap <amit.kachhap@linaro.org>
applied to thermal-next.
thanks,
rui
> ---
> drivers/thermal/cpu_cooling.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index b6b4c2a..bfd62b7 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -245,6 +245,7 @@ static int cpufreq_get_max_state(struct thermal_cooling_device *cdev,
> struct cpumask *maskPtr;
> unsigned int cpu;
> struct cpufreq_frequency_table *table;
> + unsigned long count = 0;
>
> mutex_lock(&cooling_cpufreq_lock);
> list_for_each_entry(cpufreq_device, &cooling_cpufreq_list, node) {
> @@ -263,13 +264,14 @@ static int cpufreq_get_max_state(struct thermal_cooling_device *cdev,
> goto return_get_max_state;
> }
>
> - while (table[i].frequency != CPUFREQ_TABLE_END) {
> + for (i = 0; (table[i].frequency != CPUFREQ_TABLE_END); i++) {
> if (table[i].frequency == CPUFREQ_ENTRY_INVALID)
> continue;
> - i++;
> + count++;
> }
> - if (i > 0) {
> - *state = --i;
> +
> + if (count > 0) {
> + *state = --count;
> ret = 0;
> }
>
^ permalink raw reply
* Re: [PATCH V3 3/5] Thermal: Remove the cooling_cpufreq_list.
From: Zhang Rui @ 2012-11-07 6:54 UTC (permalink / raw)
To: hongbo.zhang
Cc: linaro-dev, linux-kernel, linux-pm, amit.kachhap, patches,
linaro-kernel, STEricsson_nomadik_linux, kernel, hongbo.zhang
In-Reply-To: <1351615741-24134-4-git-send-email-hongbo.zhang@linaro.com>
On Tue, 2012-10-30 at 17:48 +0100, hongbo.zhang wrote:
> From: "hongbo.zhang" <hongbo.zhang@linaro.com>
>
> Problem of using this list is that the cpufreq_get_max_state callback will be
> called when register cooling device by thermal_cooling_device_register, but
> this list isn't ready at this moment. What's more, there is no need to maintain
> such a list, we can get cpufreq_cooling_device instance by the private
> thermal_cooling_device.devdata.
>
> Signed-off-by: hongbo.zhang <hongbo.zhang@linaro.com>
> Reviewed-by: Francesco Lavra <francescolavra.fl@gmail.com>
> Reviewed-by: Amit Daniel Kachhap <amit.kachhap@linaro.org>
applied to thermal-next.
thanks,
rui
> ---
> drivers/thermal/cpu_cooling.c | 91 +++++++++----------------------------------
> 1 file changed, 19 insertions(+), 72 deletions(-)
>
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index bfd62b7..392d57d 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -58,8 +58,9 @@ struct cpufreq_cooling_device {
> };
> static LIST_HEAD(cooling_cpufreq_list);
> static DEFINE_IDR(cpufreq_idr);
> +static DEFINE_MUTEX(cooling_cpufreq_lock);
>
> -static struct mutex cooling_cpufreq_lock;
> +static unsigned int cpufreq_dev_count;
>
> /* notify_table passes value to the CPUFREQ_ADJUST callback function. */
> #define NOTIFY_INVALID NULL
> @@ -240,28 +241,18 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
> static int cpufreq_get_max_state(struct thermal_cooling_device *cdev,
> unsigned long *state)
> {
> - int ret = -EINVAL, i = 0;
> - struct cpufreq_cooling_device *cpufreq_device;
> - struct cpumask *maskPtr;
> + struct cpufreq_cooling_device *cpufreq_device = cdev->devdata;
> + struct cpumask *maskPtr = &cpufreq_device->allowed_cpus;
> unsigned int cpu;
> struct cpufreq_frequency_table *table;
> unsigned long count = 0;
> + int i = 0;
>
> - mutex_lock(&cooling_cpufreq_lock);
> - list_for_each_entry(cpufreq_device, &cooling_cpufreq_list, node) {
> - if (cpufreq_device && cpufreq_device->cool_dev == cdev)
> - break;
> - }
> - if (cpufreq_device == NULL)
> - goto return_get_max_state;
> -
> - maskPtr = &cpufreq_device->allowed_cpus;
> cpu = cpumask_any(maskPtr);
> table = cpufreq_frequency_get_table(cpu);
> if (!table) {
> *state = 0;
> - ret = 0;
> - goto return_get_max_state;
> + return 0;
> }
>
> for (i = 0; (table[i].frequency != CPUFREQ_TABLE_END); i++) {
> @@ -272,12 +263,10 @@ static int cpufreq_get_max_state(struct thermal_cooling_device *cdev,
>
> if (count > 0) {
> *state = --count;
> - ret = 0;
> + return 0;
> }
>
> -return_get_max_state:
> - mutex_unlock(&cooling_cpufreq_lock);
> - return ret;
> + return -EINVAL;
> }
>
> /**
> @@ -288,20 +277,10 @@ return_get_max_state:
> static int cpufreq_get_cur_state(struct thermal_cooling_device *cdev,
> unsigned long *state)
> {
> - int ret = -EINVAL;
> - struct cpufreq_cooling_device *cpufreq_device;
> + struct cpufreq_cooling_device *cpufreq_device = cdev->devdata;
>
> - mutex_lock(&cooling_cpufreq_lock);
> - list_for_each_entry(cpufreq_device, &cooling_cpufreq_list, node) {
> - if (cpufreq_device && cpufreq_device->cool_dev == cdev) {
> - *state = cpufreq_device->cpufreq_state;
> - ret = 0;
> - break;
> - }
> - }
> - mutex_unlock(&cooling_cpufreq_lock);
> -
> - return ret;
> + *state = cpufreq_device->cpufreq_state;
> + return 0;
> }
>
> /**
> @@ -312,22 +291,9 @@ static int cpufreq_get_cur_state(struct thermal_cooling_device *cdev,
> static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
> unsigned long state)
> {
> - int ret = -EINVAL;
> - struct cpufreq_cooling_device *cpufreq_device;
> + struct cpufreq_cooling_device *cpufreq_device = cdev->devdata;
>
> - mutex_lock(&cooling_cpufreq_lock);
> - list_for_each_entry(cpufreq_device, &cooling_cpufreq_list, node) {
> - if (cpufreq_device && cpufreq_device->cool_dev == cdev) {
> - ret = 0;
> - break;
> - }
> - }
> - if (!ret)
> - ret = cpufreq_apply_cooling(cpufreq_device, state);
> -
> - mutex_unlock(&cooling_cpufreq_lock);
> -
> - return ret;
> + return cpufreq_apply_cooling(cpufreq_device, state);
> }
>
> /* Bind cpufreq callbacks to thermal cooling device ops */
> @@ -351,14 +317,11 @@ struct thermal_cooling_device *cpufreq_cooling_register(
> {
> struct thermal_cooling_device *cool_dev;
> struct cpufreq_cooling_device *cpufreq_dev = NULL;
> - unsigned int cpufreq_dev_count = 0, min = 0, max = 0;
> + unsigned int min = 0, max = 0;
> char dev_name[THERMAL_NAME_LENGTH];
> int ret = 0, i;
> struct cpufreq_policy policy;
>
> - list_for_each_entry(cpufreq_dev, &cooling_cpufreq_list, node)
> - cpufreq_dev_count++;
> -
> /*Verify that all the clip cpus have same freq_min, freq_max limit*/
> for_each_cpu(i, clip_cpus) {
> /*continue if cpufreq policy not found and not return error*/
> @@ -380,9 +343,6 @@ struct thermal_cooling_device *cpufreq_cooling_register(
>
> cpumask_copy(&cpufreq_dev->allowed_cpus, clip_cpus);
>
> - if (cpufreq_dev_count == 0)
> - mutex_init(&cooling_cpufreq_lock);
> -
> ret = get_idr(&cpufreq_idr, &cpufreq_dev->id);
> if (ret) {
> kfree(cpufreq_dev);
> @@ -401,12 +361,12 @@ struct thermal_cooling_device *cpufreq_cooling_register(
> cpufreq_dev->cool_dev = cool_dev;
> cpufreq_dev->cpufreq_state = 0;
> mutex_lock(&cooling_cpufreq_lock);
> - list_add_tail(&cpufreq_dev->node, &cooling_cpufreq_list);
>
> /* Register the notifier for first cpufreq cooling device */
> if (cpufreq_dev_count == 0)
> cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> CPUFREQ_POLICY_NOTIFIER);
> + cpufreq_dev_count++;
>
> mutex_unlock(&cooling_cpufreq_lock);
> return cool_dev;
> @@ -419,33 +379,20 @@ EXPORT_SYMBOL(cpufreq_cooling_register);
> */
> void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
> {
> - struct cpufreq_cooling_device *cpufreq_dev = NULL;
> - unsigned int cpufreq_dev_count = 0;
> + struct cpufreq_cooling_device *cpufreq_dev = cdev->devdata;
>
> mutex_lock(&cooling_cpufreq_lock);
> - list_for_each_entry(cpufreq_dev, &cooling_cpufreq_list, node) {
> - if (cpufreq_dev && cpufreq_dev->cool_dev == cdev)
> - break;
> - cpufreq_dev_count++;
> - }
> -
> - if (!cpufreq_dev || cpufreq_dev->cool_dev != cdev) {
> - mutex_unlock(&cooling_cpufreq_lock);
> - return;
> - }
> -
> - list_del(&cpufreq_dev->node);
> + cpufreq_dev_count--;
>
> /* Unregister the notifier for the last cpufreq cooling device */
> - if (cpufreq_dev_count == 1) {
> + if (cpufreq_dev_count == 0) {
> cpufreq_unregister_notifier(&thermal_cpufreq_notifier_block,
> CPUFREQ_POLICY_NOTIFIER);
> }
> mutex_unlock(&cooling_cpufreq_lock);
> +
> thermal_cooling_device_unregister(cpufreq_dev->cool_dev);
> release_idr(&cpufreq_idr, cpufreq_dev->id);
> - if (cpufreq_dev_count == 1)
> - mutex_destroy(&cooling_cpufreq_lock);
> kfree(cpufreq_dev);
> }
> EXPORT_SYMBOL(cpufreq_cooling_unregister);
^ permalink raw reply
* RE: [PATCH v4] Thermal: exynos: Add sysfs node supporting exynos's emulation mode.
From: R, Durgadoss @ 2012-11-07 7:06 UTC (permalink / raw)
To: Zhang, Rui
Cc: Jonghwa Lee, linux-pm@vger.kernel.org,
linux-kernel@vger.kernel.org, Brown, Len, Rafael J. Wysocki,
Amit Dinel Kachhap, MyungJoo Ham, Kyungmin Park
In-Reply-To: <1352270197.2137.32.camel@rzhang1-mobl4>
Hi Rui,
> -----Original Message-----
> From: Zhang, Rui
> Sent: Wednesday, November 07, 2012 12:07 PM
> To: R, Durgadoss
> Cc: Jonghwa Lee; linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org;
> Brown, Len; Rafael J. Wysocki; Amit Dinel Kachhap; MyungJoo Ham;
> Kyungmin Park
> Subject: RE: [PATCH v4] Thermal: exynos: Add sysfs node supporting
> exynos's emulation mode.
>
> On Thu, 2012-11-01 at 23:13 -0600, R, Durgadoss wrote:
> > Hi Lee,
> >
> > > -----Original Message-----
> > > From: Jonghwa Lee [mailto:jonghwa3.lee@samsung.com]
> > > Sent: Friday, November 02, 2012 7:55 AM
> > > To: linux-pm@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org; Brown, Len; R, Durgadoss; Rafael J.
> > > Wysocki; Amit Dinel Kachhap; MyungJoo Ham; Kyungmin Park; Jonghwa
> Lee
> > > Subject: [PATCH v4] Thermal: exynos: Add sysfs node supporting
> exynos's
> > > emulation mode.
> > >
> > > This patch supports exynos's emulation mode with newly created sysfs
> node.
> > > Exynos 4x12 (4212, 4412) and 5 series provide emulation mode for
> thermal
> > > management unit. Thermal emulation mode supports software debug for
> > > TMU's
> > > operation. User can set temperature manually with software code and
> TMU
> > > will read current temperature from user value not from sensor's value.
> > > This patch includes also documentary placed under
> > > Documentation/thermal/.
> > >
> >
> first of all, what would happen if overheat happens during emulation?
>
> I just had a thought about if we can introduce this to the generic
> thermal layer.
Sure, we can.
> to do this, we only need to:
> 1) introduce tz->emulation
> 2) introduce thermal_get_temp()
> static int thermal_get_temp(tz) {
> if (tz->emulation)
> return tz->emulation;
> else
> return tz->ops->get_temp(tz);
> }
> 3) replace tz->ops->get_temp() with thermal_get_temp() in thermal layer
> 4) introduce /sys/class/thermal/thermal_zoneX/emulation
> 5) when setting /sys/class/thermal/thermal_zoneX/emulation,
> a) set tz->emulation
> b) invoke thermal_zone_device_update();
> this is a pure software emulation solution but it would work on all
> generic thermal layer users.
>
> do you think this proposal would work properly?
Yes, this should work..
But, I am working on (top of your -next tree) to add multiple sensor support to
thermal framework.(What we discussed in Plumbers this year).
This changes APIs quite a bit in the thermal framework.
So, we will add this emulation support after the above changes are
in. What do you think ?
Thanks,
Durga
> if yes, I'd like to see if it is valuable for the other platform thermal
> drivers.
>
> thanks,
> rui
> > Thanks for fixing the comments.
> > Please CC linux-acpi, when you submit thermal patches, going forward.
> > I am CCing Rui for now, for him to review/merge this patch.
> >
> > Reviewed-by: Durgadoss R <durgadoss.r@intel.com>
> >
> > Thanks,
> > Durga
> >
> > > Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com>
> > > ---
> > > v4
> > > - Fix Typo.
> > > - Remove unnecessary codes.
> > > - Add comments about feature of exynos emulation operation to the
> > > document.
> > >
> > > v3
> > > - Remove unnecessay variables.
> > > - Do some code clean in exynos_tmu_emulation_store().
> > > - Make wrapping function of sysfs node creation function to use
> > > #ifdefs in minimum.
> > >
> > > v2
> > > exynos_thermal.c
> > > - Fix build error occured by wrong emulation control register name.
> > > - Remove exynos5410 dependent codes.
> > > exynos_thermal_emulation
> > > - Align indentation.
> > >
> > > Documentation/thermal/exynos_thermal_emulation | 56
> > > +++++++++++++++
> > > drivers/thermal/Kconfig | 9 +++
> > > drivers/thermal/exynos_thermal.c | 91
> > > ++++++++++++++++++++++++
> > > 3 files changed, 156 insertions(+), 0 deletions(-)
> > > create mode 100644
> Documentation/thermal/exynos_thermal_emulation
> > >
> > > diff --git a/Documentation/thermal/exynos_thermal_emulation
> > > b/Documentation/thermal/exynos_thermal_emulation
> > > new file mode 100644
> > > index 0000000..a6ea06f
> > > --- /dev/null
> > > +++ b/Documentation/thermal/exynos_thermal_emulation
> > > @@ -0,0 +1,56 @@
> > > +EXYNOS EMULATION MODE
> > > +========================
> > > +
> > > +Copyright (C) 2012 Samsung Electronics
> > > +
> > > +Written by Jonghwa Lee <jonghwa3.lee@samsung.com>
> > > +
> > > +Description
> > > +-----------
> > > +
> > > +Exynos 4x12 (4212, 4412) and 5 series provide emulation mode for
> thermal
> > > management unit.
> > > +Thermal emulation mode supports software debug for TMU's
> operation.
> > > User can set temperature
> > > +manually with software code and TMU will read current temperature
> from
> > > user value not from
> > > +sensor's value.
> > > +
> > > +Enabling CONFIG_EXYNOS_THERMAL_EMUL option will make this
> support
> > > in available.
> > > +When it's enabled, sysfs node will be created under
> > > +/sys/bus/platform/devices/'exynos device name'/ with name of
> > > 'emulation'.
> > > +
> > > +The sysfs node, 'emulation', will contain value 0 for the initial state.
> When
> > > you input any
> > > +temperature you want to update to sysfs node, it automatically enable
> > > emulation mode and
> > > +current temperature will be changed into it.
> > > +(Exynos also supports user changable delay time which would be used
> to
> > > delay of
> > > + changing temperature. However, this node only uses same delay of real
> > > sensing time, 938us.)
> > > +
> > > +Exynos emulation mode requires synchronous of value changing and
> > > enabling. It means when you
> > > +want to update the any value of delay or next temperature, then you
> have
> > > to enable emulation
> > > +mode at the same time. (Or you have to keep the mode enabling.) If
> you
> > > don't, it fails to
> > > +change the value to updated one and just use last succeessful value
> > > repeatedly. That's why
> > > +this node gives users the right to change termerpature only. Just one
> > > interface makes it more
> > > +simply to use.
> > > +
> > > +Disabling emulation mode only requires writing value 0 to sysfs node.
> > > +
> > > +
> > > +TEMP 120 |
> > > + |
> > > + 100 |
> > > + |
> > > + 80 |
> > > + | +-----------
> > > + 60 | | |
> > > + | +-------------| |
> > > + 40 | | | |
> > > + | | | |
> > > + 20 | | | +----------
> > > + | | | | |
> > > + 0
> > >
> |______________|_____________|__________|__________|_______
> > > __
> > > + A A A A TIME
> > > + |<----->| |<----->| |<----->| |
> > > + | 938us | | | | | |
> > > +emulation : 0 50 | 70 | 20 | 0
> > > +current temp : sensor 50 70 20 sensor
> > > +
> > > +
> > > +
> > > diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
> > > index e1cb6bd..c02a66c 100644
> > > --- a/drivers/thermal/Kconfig
> > > +++ b/drivers/thermal/Kconfig
> > > @@ -55,3 +55,12 @@ config EXYNOS_THERMAL
> > > help
> > > If you say yes here you get support for TMU (Thermal Managment
> > > Unit) on SAMSUNG EXYNOS series of SoC.
> > > +
> > > +config EXYNOS_THERMAL_EMUL
> > > + bool "EXYNOS TMU emulation mode support"
> > > + depends on !CPU_EXYNOS4210 && EXYNOS_THERMAL
> > > + help
> > > + Exynos 4412 and 4414 and 5 series has emulation mode on TMU.
> > > + Enable this option will be make sysfs node in exynos thermal
> > > platform
> > > + device directory to support emulation mode. With emulation mode
> > > sysfs
> > > + node, you can manually input temperature to TMU for simulation
> > > purpose.
> > > diff --git a/drivers/thermal/exynos_thermal.c
> > > b/drivers/thermal/exynos_thermal.c
> > > index fd03e85..eebd4e5 100644
> > > --- a/drivers/thermal/exynos_thermal.c
> > > +++ b/drivers/thermal/exynos_thermal.c
> > > @@ -99,6 +99,14 @@
> > > #define IDLE_INTERVAL 10000
> > > #define MCELSIUS 1000
> > >
> > > +#ifdef CONFIG_EXYNOS_THERMAL_EMUL
> > > +#define EXYNOS_EMUL_TIME 0x57F0
> > > +#define EXYNOS_EMUL_TIME_SHIFT 16
> > > +#define EXYNOS_EMUL_DATA_SHIFT 8
> > > +#define EXYNOS_EMUL_DATA_MASK 0xFF
> > > +#define EXYNOS_EMUL_ENABLE 0x1
> > > +#endif /* CONFIG_EXYNOS_THERMAL_EMUL */
> > > +
> > > /* CPU Zone information */
> > > #define PANIC_ZONE 4
> > > #define WARN_ZONE 3
> > > @@ -832,6 +840,82 @@ static inline struct exynos_tmu_platform_data
> > > *exynos_get_driver_data(
> > > return (struct exynos_tmu_platform_data *)
> > > platform_get_device_id(pdev)->driver_data;
> > > }
> > > +
> > > +#ifdef CONFIG_EXYNOS_THERMAL_EMUL
> > > +static ssize_t exynos_tmu_emulation_show(struct device *dev,
> > > + struct device_attribute *attr,
> > > + char *buf)
> > > +{
> > > + struct platform_device *pdev = container_of(dev,
> > > + struct platform_device, dev);
> > > + struct exynos_tmu_data *data = platform_get_drvdata(pdev);
> > > + unsigned int reg;
> > > + u8 temp_code;
> > > + int temp = 0;
> > > +
> > > + mutex_lock(&data->lock);
> > > + clk_enable(data->clk);
> > > + reg = readl(data->base + EXYNOS_EMUL_CON);
> > > + clk_disable(data->clk);
> > > + mutex_unlock(&data->lock);
> > > +
> > > + if (reg & EXYNOS_EMUL_ENABLE) {
> > > + reg >>= EXYNOS_EMUL_DATA_SHIFT;
> > > + temp_code = reg & EXYNOS_EMUL_DATA_MASK;
> > > + temp = code_to_temp(data, temp_code);
> > > + }
> > > +
> > > + return sprintf(buf, "%d\n", temp);
> > > +}
> > > +
> > > +static ssize_t exynos_tmu_emulation_store(struct device *dev,
> > > + struct device_attribute *attr,
> > > + const char *buf, size_t count)
> > > +{
> > > + struct platform_device *pdev = container_of(dev,
> > > + struct platform_device, dev);
> > > + struct exynos_tmu_data *data = platform_get_drvdata(pdev);
> > > + unsigned int reg;
> > > + int temp;
> > > +
> > > + if (!sscanf(buf, "%d\n", &temp) || temp < 0)
> > > + return -EINVAL;
> > > +
> > > + mutex_lock(&data->lock);
> > > + clk_enable(data->clk);
> > > +
> > > + reg = readl(data->base + EXYNOS_EMUL_CON);
> > > +
> > > + if (temp)
> > > + reg = (EXYNOS_EMUL_TIME << EXYNOS_EMUL_TIME_SHIFT)
> > > |
> > > + (temp_to_code(data, temp) <<
> > > EXYNOS_EMUL_DATA_SHIFT) |
> > > + EXYNOS_EMUL_ENABLE;
> > > + else
> > > + reg &= ~EXYNOS_EMUL_ENABLE;
> > > +
> > > + writel(reg, data->base + EXYNOS_EMUL_CON);
> > > +
> > > + clk_disable(data->clk);
> > > + mutex_unlock(&data->lock);
> > > +
> > > + return count;
> > > +}
> > > +
> > > +static DEVICE_ATTR(emulation, 0644, exynos_tmu_emulation_show,
> > > + exynos_tmu_emulation_store);
> > > +static int create_emulation_sysfs(struct device *dev)
> > > +{
> > > + return device_create_file(dev, &dev_attr_emulation);
> > > +}
> > > +static void remove_emulation_sysfs(struct device *dev)
> > > +{
> > > + device_remove_file(dev, &dev_attr_emulation);
> > > +}
> > > +#else
> > > +static inline int create_emulation_sysfs(struct device *dev) {return 0;}
> > > +static inline void remove_emulation_sysfs(struct device *dev){}
> > > +#endif
> > > +
> > > static int __devinit exynos_tmu_probe(struct platform_device *pdev)
> > > {
> > > struct exynos_tmu_data *data;
> > > @@ -930,6 +1014,11 @@ static int __devinit exynos_tmu_probe(struct
> > > platform_device *pdev)
> > > dev_err(&pdev->dev, "Failed to register thermal
> > > interface\n");
> > > goto err_clk;
> > > }
> > > +
> > > + ret = create_emulation_sysfs(&pdev->dev);
> > > + if (ret)
> > > + dev_err(&pdev->dev, "Failed to create emulation mode
> > > sysfs node\n");
> > > +
> > > return 0;
> > > err_clk:
> > > platform_set_drvdata(pdev, NULL);
> > > @@ -941,6 +1030,8 @@ static int __devexit exynos_tmu_remove(struct
> > > platform_device *pdev)
> > > {
> > > struct exynos_tmu_data *data = platform_get_drvdata(pdev);
> > >
> > > + remove_emulation_sysfs(&pdev->dev);
> > > +
> > > exynos_tmu_control(pdev, false);
> > >
> > > exynos_unregister_thermal();
> > > --
> > > 1.7.4.1
> >
>
^ permalink raw reply
* [PATCH V4 2/2] Thermal: Add ST-Ericsson DB8500 thermal properties and platform data.
From: hongbo.zhang @ 2012-11-07 10:23 UTC (permalink / raw)
To: linux-acpi-u79uwXL29TY76Z2rM5mHXA,
rui.zhang-ral2JQCrhuEAvxtiuMwx3w
Cc: linaro-kernel-cunTk1MwBs8s++Sfvej+rw,
linaro-dev-cunTk1MwBs8s++Sfvej+rw, patches-QSEj5FYQhm4dnm+yROfE0A,
linux-pm-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
STEricsson_nomadik_linux-nkJGhpqTU55BDgjK7y7TUQ,
kernel-vMlcbD5RyM6HZuj8yyL1ah2eb7JE58TQ, hongbo.zhang
In-Reply-To: <1352283782-16489-1-git-send-email-hongbo.zhang-68IGFXMjmZ7QT0dZR+AlfA@public.gmane.org>
From: "hongbo.zhang" <hongbo.zhang-68IGFXMjmZ7QT0dZR+AlfA@public.gmane.org>
This patch adds device tree properties for ST-Ericsson DB8500 thermal driver,
also adds the platform data to support the old fashion.
Signed-off-by: hongbo.zhang <hongbo.zhang-68IGFXMjmZ7QT0dZR+AlfA@public.gmane.org>
Reviewed-by: Viresh Kumar <viresh.kumar-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
---
arch/arm/boot/dts/dbx5x0.dtsi | 14 +++++++++
arch/arm/boot/dts/snowball.dts | 31 ++++++++++++++++++
arch/arm/configs/u8500_defconfig | 2 ++
arch/arm/mach-ux500/board-mop500.c | 64 ++++++++++++++++++++++++++++++++++++++
4 files changed, 111 insertions(+)
diff --git a/arch/arm/boot/dts/dbx5x0.dtsi b/arch/arm/boot/dts/dbx5x0.dtsi
index 4b0e0ca..731086b 100644
--- a/arch/arm/boot/dts/dbx5x0.dtsi
+++ b/arch/arm/boot/dts/dbx5x0.dtsi
@@ -203,6 +203,14 @@
reg = <0x80157450 0xC>;
};
+ thermal@801573c0 {
+ compatible = "stericsson,db8500-thermal";
+ reg = <0x801573c0 0x40>;
+ interrupts = <21 0x4>, <22 0x4>;
+ interrupt-names = "IRQ_HOTMON_LOW", "IRQ_HOTMON_HIGH";
+ status = "disabled";
+ };
+
db8500-prcmu-regulators {
compatible = "stericsson,db8500-prcmu-regulator";
@@ -660,5 +668,11 @@
ranges = <0 0x50000000 0x4000000>;
status = "disabled";
};
+
+ cpufreq-cooling {
+ compatible = "stericsson,db8500-cpufreq-cooling";
+ status = "disabled";
+ };
+
};
};
diff --git a/arch/arm/boot/dts/snowball.dts b/arch/arm/boot/dts/snowball.dts
index 702c0ba..c6f85f0 100644
--- a/arch/arm/boot/dts/snowball.dts
+++ b/arch/arm/boot/dts/snowball.dts
@@ -99,6 +99,33 @@
status = "okay";
};
+ prcmu@80157000 {
+ thermal@801573c0 {
+ num-trips = <4>;
+
+ trip0-temp = <70000>;
+ trip0-type = "active";
+ trip0-cdev-num = <1>;
+ trip0-cdev-name0 = "thermal-cpufreq-0";
+
+ trip1-temp = <75000>;
+ trip1-type = "active";
+ trip1-cdev-num = <1>;
+ trip1-cdev-name0 = "thermal-cpufreq-0";
+
+ trip2-temp = <80000>;
+ trip2-type = "active";
+ trip2-cdev-num = <1>;
+ trip2-cdev-name0 = "thermal-cpufreq-0";
+
+ trip3-temp = <85000>;
+ trip3-type = "critical";
+ trip3-cdev-num = <0>;
+
+ status = "okay";
+ };
+ };
+
external-bus@50000000 {
status = "okay";
@@ -183,5 +210,9 @@
reg = <0x33>;
};
};
+
+ cpufreq-cooling {
+ status = "okay";
+ };
};
};
diff --git a/arch/arm/configs/u8500_defconfig b/arch/arm/configs/u8500_defconfig
index da68454..250625d 100644
--- a/arch/arm/configs/u8500_defconfig
+++ b/arch/arm/configs/u8500_defconfig
@@ -69,6 +69,8 @@ CONFIG_GPIO_TC3589X=y
CONFIG_POWER_SUPPLY=y
CONFIG_AB8500_BM=y
CONFIG_AB8500_BATTERY_THERM_ON_BATCTRL=y
+CONFIG_THERMAL=y
+CONFIG_CPU_THERMAL=y
CONFIG_MFD_STMPE=y
CONFIG_MFD_TC3589X=y
CONFIG_AB5500_CORE=y
diff --git a/arch/arm/mach-ux500/board-mop500.c b/arch/arm/mach-ux500/board-mop500.c
index 416d436..b03216b 100644
--- a/arch/arm/mach-ux500/board-mop500.c
+++ b/arch/arm/mach-ux500/board-mop500.c
@@ -16,6 +16,7 @@
#include <linux/io.h>
#include <linux/i2c.h>
#include <linux/platform_data/i2c-nomadik.h>
+#include <linux/platform_data/db8500_thermal.h>
#include <linux/gpio.h>
#include <linux/amba/bus.h>
#include <linux/amba/pl022.h>
@@ -229,6 +230,67 @@ static struct ab8500_platform_data ab8500_platdata = {
};
/*
+ * Thermal Sensor
+ */
+
+static struct resource db8500_thsens_resources[] = {
+ {
+ .name = "IRQ_HOTMON_LOW",
+ .start = IRQ_PRCMU_HOTMON_LOW,
+ .end = IRQ_PRCMU_HOTMON_LOW,
+ .flags = IORESOURCE_IRQ,
+ },
+ {
+ .name = "IRQ_HOTMON_HIGH",
+ .start = IRQ_PRCMU_HOTMON_HIGH,
+ .end = IRQ_PRCMU_HOTMON_HIGH,
+ .flags = IORESOURCE_IRQ,
+ },
+};
+
+static struct db8500_thsens_platform_data db8500_thsens_data = {
+ .trip_points[0] = {
+ .temp = 70000,
+ .type = THERMAL_TRIP_ACTIVE,
+ .cdev_name = {
+ [0] = "thermal-cpufreq-0",
+ },
+ },
+ .trip_points[1] = {
+ .temp = 75000,
+ .type = THERMAL_TRIP_ACTIVE,
+ .cdev_name = {
+ [0] = "thermal-cpufreq-0",
+ },
+ },
+ .trip_points[2] = {
+ .temp = 80000,
+ .type = THERMAL_TRIP_ACTIVE,
+ .cdev_name = {
+ [0] = "thermal-cpufreq-0",
+ },
+ },
+ .trip_points[3] = {
+ .temp = 85000,
+ .type = THERMAL_TRIP_CRITICAL,
+ },
+ .num_trips = 4,
+};
+
+static struct platform_device u8500_thsens_device = {
+ .name = "db8500-thermal",
+ .resource = db8500_thsens_resources,
+ .num_resources = ARRAY_SIZE(db8500_thsens_resources),
+ .dev = {
+ .platform_data = &db8500_thsens_data,
+ },
+};
+
+static struct platform_device u8500_cpufreq_cooling_device = {
+ .name = "db8500-cpufreq-cooling",
+};
+
+/*
* TPS61052
*/
@@ -583,6 +645,8 @@ static struct platform_device *snowball_platform_devs[] __initdata = {
&snowball_key_dev,
&snowball_sbnet_dev,
&snowball_gpio_en_3v3_regulator_dev,
+ &u8500_thsens_device,
+ &u8500_cpufreq_cooling_device,
};
static void __init mop500_init_machine(void)
--
1.7.11.3
^ permalink raw reply related
* [PATCH V4 0/2] Upstream ST-Ericsson thermal driver
From: hongbo.zhang @ 2012-11-07 10:23 UTC (permalink / raw)
To: linux-acpi, rui.zhang
Cc: linux-kernel, linux-pm, amit.kachhap, patches, linaro-dev,
linaro-kernel, STEricsson_nomadik_linux, kernel, hongbo.zhang
From: "hongbo.zhang" <hongbo.zhang@linaro.com>
V3->V4 Changes:
1. In previous patch set V3 "Fix thermal bugs and Upstream ST-Ericsson thermal
driver", there were 5 patches in total, since the first 3 for fixing thermal
layer bugs have been accepted by the maintainer, I'd like to send out the
updated last 2 only this time, for upstreaming ST-Ericsson thermal driver.
2. In patch "Thermal: Add ST-Ericsson DB8500 thermal driver"
- typo in commit message: diver -> driver;
- add more explanation of device tree properties are optional or required.
- in function db8500_thermal_match_cdev: remove cdev_name, use
trip_points->cdev_name[i] directly instead;
- in function db8500_cdev_bind: move cdev->ops->get_max_state(cdev,
&max_state) out of for loop to call it only once;
- in function db8500_thermal_parse_dt: update checking strlen(tmp_str) >=
THERMAL_NAME_LENGTH
- add Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
- add Reviewed-by: Francesco Lavra <francescolavra.fl@gmail.com>
3. In patch "Thermal: Add ST-Ericsson DB8500 thermal properties and platform
data"
- add Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
hongbo.zhang (2):
Thermal: Add ST-Ericsson DB8500 thermal driver.
Thermal: Add ST-Ericsson DB8500 thermal properties and platform data.
.../devicetree/bindings/thermal/db8500-thermal.txt | 44 ++
arch/arm/boot/dts/dbx5x0.dtsi | 14 +
arch/arm/boot/dts/snowball.dts | 31 ++
arch/arm/configs/u8500_defconfig | 2 +
arch/arm/mach-ux500/board-mop500.c | 64 +++
drivers/thermal/Kconfig | 20 +
drivers/thermal/Makefile | 2 +
drivers/thermal/db8500_cpufreq_cooling.c | 108 +++++
drivers/thermal/db8500_thermal.c | 530 +++++++++++++++++++++
include/linux/platform_data/db8500_thermal.h | 38 ++
10 files changed, 853 insertions(+)
create mode 100644 Documentation/devicetree/bindings/thermal/db8500-thermal.txt
create mode 100644 drivers/thermal/db8500_cpufreq_cooling.c
create mode 100644 drivers/thermal/db8500_thermal.c
create mode 100644 include/linux/platform_data/db8500_thermal.h
--
1.7.11.3
^ permalink raw reply
* [PATCH V4 1/2] Thermal: Add ST-Ericsson DB8500 thermal driver.
From: hongbo.zhang @ 2012-11-07 10:23 UTC (permalink / raw)
To: linux-acpi, rui.zhang
Cc: linux-kernel, linux-pm, amit.kachhap, patches, linaro-dev,
linaro-kernel, STEricsson_nomadik_linux, kernel, hongbo.zhang
In-Reply-To: <1352283782-16489-1-git-send-email-hongbo.zhang@linaro.com>
From: "hongbo.zhang" <hongbo.zhang@linaro.com>
This driver is based on the thermal management framework in thermal_sys.c. A
thermal zone device is created with the trip points to which cooling devices
can be bound, the current cooling device is cpufreq, e.g. CPU frequency is
clipped down to cool the CPU, and other cooling devices can be added and bound
to the trip points dynamically. The platform specific PRCMU interrupts are
used to active thermal update when trip points are reached.
Signed-off-by: hongbo.zhang <hongbo.zhang@linaro.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Francesco Lavra <francescolavra.fl@gmail.com>
---
.../devicetree/bindings/thermal/db8500-thermal.txt | 44 ++
drivers/thermal/Kconfig | 20 +
drivers/thermal/Makefile | 2 +
drivers/thermal/db8500_cpufreq_cooling.c | 108 +++++
drivers/thermal/db8500_thermal.c | 530 +++++++++++++++++++++
include/linux/platform_data/db8500_thermal.h | 38 ++
6 files changed, 742 insertions(+)
create mode 100644 Documentation/devicetree/bindings/thermal/db8500-thermal.txt
create mode 100644 drivers/thermal/db8500_cpufreq_cooling.c
create mode 100644 drivers/thermal/db8500_thermal.c
create mode 100644 include/linux/platform_data/db8500_thermal.h
diff --git a/Documentation/devicetree/bindings/thermal/db8500-thermal.txt b/Documentation/devicetree/bindings/thermal/db8500-thermal.txt
new file mode 100644
index 0000000..2a03c49
--- /dev/null
+++ b/Documentation/devicetree/bindings/thermal/db8500-thermal.txt
@@ -0,0 +1,44 @@
+* ST-Ericsson DB8500 Thermal
+
+** Thermal node properties:
+
+- compatible : "stericsson,db8500-thermal";
+- reg : address range of the thermal sensor registers;
+- interrupts : interrupts generated from PRCMU;
+- interrupt-names : "IRQ_HOTMON_LOW" and "IRQ_HOTMON_HIGH";
+- num-trips : number of total trip points, this is required, set it 0 if none,
+ if greater than 0, the following properties must be defined;
+- tripN-temp : temperature of trip point N, should be in ascending order;
+- tripN-type : type of trip point N, should be one of "active" "passive" "hot"
+ "critical";
+- tripN-cdev-num : number of the cooling devices which can be bound to trip
+ point N, this is required if trip point N is defined, set it 0 if none,
+ otherwise the following cooling device names must be defined;
+- tripN-cdev-nameM : name of the No. M cooling device of trip point N;
+
+Usually the num-trips and tripN-*** are separated in board related dts files.
+
+Example:
+thermal@801573c0 {
+ compatible = "stericsson,db8500-thermal";
+ reg = <0x801573c0 0x40>;
+ interrupts = <21 0x4>, <22 0x4>;
+ interrupt-names = "IRQ_HOTMON_LOW", "IRQ_HOTMON_HIGH";
+
+ num-trips = <3>;
+
+ trip0-temp = <70000>;
+ trip0-type = "active";
+ trip0-cdev-num = <1>;
+ trip0-cdev-name0 = "thermal-cpufreq-0";
+
+ trip1-temp = <75000>;
+ trip1-type = "active";
+ trip1-cdev-num = <2>;
+ trip1-cdev-name0 = "thermal-cpufreq-0";
+ trip1-cdev-name1 = "thermal-fan";
+
+ trip2-temp = <85000>;
+ trip2-type = "critical";
+ trip2-cdev-num = <0>;
+}
diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index e1cb6bd..54c8fd0 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -31,6 +31,26 @@ config CPU_THERMAL
and not the ACPI interface.
If you want this support, you should say Y here.
+config DB8500_THERMAL
+ bool "DB8500 thermal management"
+ depends on THERMAL
+ default y
+ help
+ Adds DB8500 thermal management implementation according to the thermal
+ management framework. A thermal zone with several trip points will be
+ created. Cooling devices can be bound to the trip points to cool this
+ thermal zone if trip points reached.
+
+config DB8500_CPUFREQ_COOLING
+ tristate "DB8500 cpufreq cooling"
+ depends on CPU_THERMAL
+ default y
+ help
+ Adds DB8500 cpufreq cooling devices, and these cooling devices can be
+ bound to thermal zone trip points. When a trip point reached, the
+ bound cpufreq cooling device turns active to set CPU frequency low to
+ cool down the CPU.
+
config SPEAR_THERMAL
bool "SPEAr thermal sensor driver"
depends on THERMAL
diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
index 885550d..c7a8dab 100644
--- a/drivers/thermal/Makefile
+++ b/drivers/thermal/Makefile
@@ -7,3 +7,5 @@ obj-$(CONFIG_CPU_THERMAL) += cpu_cooling.o
obj-$(CONFIG_SPEAR_THERMAL) += spear_thermal.o
obj-$(CONFIG_RCAR_THERMAL) += rcar_thermal.o
obj-$(CONFIG_EXYNOS_THERMAL) += exynos_thermal.o
+obj-$(CONFIG_DB8500_THERMAL) += db8500_thermal.o
+obj-$(CONFIG_DB8500_CPUFREQ_COOLING) += db8500_cpufreq_cooling.o
diff --git a/drivers/thermal/db8500_cpufreq_cooling.c b/drivers/thermal/db8500_cpufreq_cooling.c
new file mode 100644
index 0000000..4cf8e72
--- /dev/null
+++ b/drivers/thermal/db8500_cpufreq_cooling.c
@@ -0,0 +1,108 @@
+/*
+ * db8500_cpufreq_cooling.c - DB8500 cpufreq works as cooling device.
+ *
+ * Copyright (C) 2012 ST-Ericsson
+ * Copyright (C) 2012 Linaro Ltd.
+ *
+ * Author: Hongbo Zhang <hongbo.zhang@linaro.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/cpu_cooling.h>
+#include <linux/cpufreq.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+
+static int db8500_cpufreq_cooling_probe(struct platform_device *pdev)
+{
+ struct thermal_cooling_device *cdev;
+ struct cpumask mask_val;
+
+ /* make sure cpufreq driver has been initialized */
+ if (!cpufreq_frequency_get_table(0))
+ return -EPROBE_DEFER;
+
+ cpumask_set_cpu(0, &mask_val);
+ cdev = cpufreq_cooling_register(&mask_val);
+
+ if (IS_ERR_OR_NULL(cdev)) {
+ dev_err(&pdev->dev, "Failed to register cooling device\n");
+ return PTR_ERR(cdev);
+ }
+
+ platform_set_drvdata(pdev, cdev);
+
+ dev_info(&pdev->dev, "Cooling device registered: %s\n", cdev->type);
+
+ return 0;
+}
+
+static int db8500_cpufreq_cooling_remove(struct platform_device *pdev)
+{
+ struct thermal_cooling_device *cdev = platform_get_drvdata(pdev);
+
+ cpufreq_cooling_unregister(cdev);
+
+ return 0;
+}
+
+static int db8500_cpufreq_cooling_suspend(struct platform_device *pdev,
+ pm_message_t state)
+{
+ return -ENOSYS;
+}
+
+static int db8500_cpufreq_cooling_resume(struct platform_device *pdev)
+{
+ return -ENOSYS;
+}
+
+#ifdef CONFIG_OF
+static const struct of_device_id db8500_cpufreq_cooling_match[] = {
+ { .compatible = "stericsson,db8500-cpufreq-cooling" },
+ {},
+};
+#else
+#define db8500_cpufreq_cooling_match NULL
+#endif
+
+static struct platform_driver db8500_cpufreq_cooling_driver = {
+ .driver = {
+ .owner = THIS_MODULE,
+ .name = "db8500-cpufreq-cooling",
+ .of_match_table = db8500_cpufreq_cooling_match,
+ },
+ .probe = db8500_cpufreq_cooling_probe,
+ .suspend = db8500_cpufreq_cooling_suspend,
+ .resume = db8500_cpufreq_cooling_resume,
+ .remove = db8500_cpufreq_cooling_remove,
+};
+
+static int __init db8500_cpufreq_cooling_init(void)
+{
+ return platform_driver_register(&db8500_cpufreq_cooling_driver);
+}
+
+static void __exit db8500_cpufreq_cooling_exit(void)
+{
+ platform_driver_unregister(&db8500_cpufreq_cooling_driver);
+}
+
+/* Should be later than db8500_cpufreq_register */
+late_initcall(db8500_cpufreq_cooling_init);
+module_exit(db8500_cpufreq_cooling_exit);
+
+MODULE_AUTHOR("Hongbo Zhang <hongbo.zhang@stericsson.com>");
+MODULE_DESCRIPTION("DB8500 cpufreq cooling driver");
+MODULE_LICENSE("GPL");
diff --git a/drivers/thermal/db8500_thermal.c b/drivers/thermal/db8500_thermal.c
new file mode 100644
index 0000000..beafe10
--- /dev/null
+++ b/drivers/thermal/db8500_thermal.c
@@ -0,0 +1,530 @@
+/*
+ * db8500_thermal.c - DB8500 Thermal Management Implementation
+ *
+ * Copyright (C) 2012 ST-Ericsson
+ * Copyright (C) 2012 Linaro Ltd.
+ *
+ * Author: Hongbo Zhang <hongbo.zhang@linaro.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/cpu_cooling.h>
+#include <linux/interrupt.h>
+#include <linux/mfd/dbx500-prcmu.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_data/db8500_thermal.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/thermal.h>
+
+#define PRCMU_DEFAULT_MEASURE_TIME 0xFFF
+#define PRCMU_DEFAULT_LOW_TEMP 0
+
+struct db8500_thermal_zone {
+ struct thermal_zone_device *therm_dev;
+ struct mutex th_lock;
+ struct work_struct therm_work;
+ struct db8500_thsens_platform_data *trip_tab;
+ enum thermal_device_mode mode;
+ enum thermal_trend trend;
+ unsigned long cur_temp_pseudo;
+ unsigned int cur_index;
+};
+
+/* Local function to check if thermal zone matches cooling devices */
+static int db8500_thermal_match_cdev(struct thermal_cooling_device *cdev,
+ struct db8500_trip_point *trip_points)
+{
+ int i;
+
+ if (!strlen(cdev->type))
+ return -EINVAL;
+
+ for (i = 0; i < COOLING_DEV_MAX; i++) {
+ if (!strcmp(trip_points->cdev_name[i], cdev->type))
+ return 0;
+ }
+
+ return -ENODEV;
+}
+
+/* Callback to bind cooling device to thermal zone */
+static int db8500_cdev_bind(struct thermal_zone_device *thermal,
+ struct thermal_cooling_device *cdev)
+{
+ struct db8500_thermal_zone *pzone = thermal->devdata;
+ struct db8500_thsens_platform_data *ptrips = pzone->trip_tab;
+ unsigned long max_state, upper, lower;
+ int i, ret = -EINVAL;
+
+ cdev->ops->get_max_state(cdev, &max_state);
+
+ for (i = 0; i < ptrips->num_trips; i++) {
+ if (db8500_thermal_match_cdev(cdev, &ptrips->trip_points[i]))
+ continue;
+
+ lower = upper = (i > max_state) ? max_state : i;
+
+ ret = thermal_zone_bind_cooling_device(thermal, i, cdev,
+ upper, lower);
+
+ dev_info(&cdev->device, "%s bind to %d: %d-%s\n",
+ cdev->type, i, ret, ret ? "fail" : "succeed");
+ }
+
+ return ret;
+}
+
+/* Callback to unbind cooling device from thermal zone */
+static int db8500_cdev_unbind(struct thermal_zone_device *thermal,
+ struct thermal_cooling_device *cdev)
+{
+ struct db8500_thermal_zone *pzone = thermal->devdata;
+ struct db8500_thsens_platform_data *ptrips = pzone->trip_tab;
+ int i, ret = -EINVAL;
+
+ for (i = 0; i < ptrips->num_trips; i++) {
+ if (db8500_thermal_match_cdev(cdev, &ptrips->trip_points[i]))
+ continue;
+
+ ret = thermal_zone_unbind_cooling_device(thermal, i, cdev);
+
+ dev_info(&cdev->device, "%s unbind: %s\n",
+ cdev->type, ret ? "fail" : "succeed");
+ }
+
+ return ret;
+}
+
+/* Callback to get current temperature */
+static int db8500_sys_get_temp(struct thermal_zone_device *thermal,
+ unsigned long *temp)
+{
+ struct db8500_thermal_zone *pzone = thermal->devdata;
+
+ /*
+ * TODO: There is no PRCMU interface to get temperature data currently,
+ * so a pseudo temperature is returned , it works for thermal framework
+ * and this will be fixed when the PRCMU interface is available.
+ */
+ *temp = pzone->cur_temp_pseudo;
+
+ return 0;
+}
+
+/* Callback to get temperature changing trend */
+static int db8500_sys_get_trend(struct thermal_zone_device *thermal,
+ int trip, enum thermal_trend *trend)
+{
+ struct db8500_thermal_zone *pzone = thermal->devdata;
+
+ *trend = pzone->trend;
+
+ return 0;
+}
+
+/* Callback to get thermal zone mode */
+static int db8500_sys_get_mode(struct thermal_zone_device *thermal,
+ enum thermal_device_mode *mode)
+{
+ struct db8500_thermal_zone *pzone = thermal->devdata;
+
+ mutex_lock(&pzone->th_lock);
+ *mode = pzone->mode;
+ mutex_unlock(&pzone->th_lock);
+
+ return 0;
+}
+
+/* Callback to set thermal zone mode */
+static int db8500_sys_set_mode(struct thermal_zone_device *thermal,
+ enum thermal_device_mode mode)
+{
+ struct db8500_thermal_zone *pzone = thermal->devdata;
+
+ mutex_lock(&pzone->th_lock);
+
+ pzone->mode = mode;
+ if (mode == THERMAL_DEVICE_ENABLED)
+ schedule_work(&pzone->therm_work);
+
+ mutex_unlock(&pzone->th_lock);
+
+ return 0;
+}
+
+/* Callback to get trip point type */
+static int db8500_sys_get_trip_type(struct thermal_zone_device *thermal,
+ int trip, enum thermal_trip_type *type)
+{
+ struct db8500_thermal_zone *pzone = thermal->devdata;
+ struct db8500_thsens_platform_data *ptrips = pzone->trip_tab;
+
+ if (trip >= ptrips->num_trips)
+ return -EINVAL;
+
+ *type = ptrips->trip_points[trip].type;
+
+ return 0;
+}
+
+/* Callback to get trip point temperature */
+static int db8500_sys_get_trip_temp(struct thermal_zone_device *thermal,
+ int trip, unsigned long *temp)
+{
+ struct db8500_thermal_zone *pzone = thermal->devdata;
+ struct db8500_thsens_platform_data *ptrips = pzone->trip_tab;
+
+ if (trip >= ptrips->num_trips)
+ return -EINVAL;
+
+ *temp = ptrips->trip_points[trip].temp;
+
+ return 0;
+}
+
+/* Callback to get critical trip point temperature */
+static int db8500_sys_get_crit_temp(struct thermal_zone_device *thermal,
+ unsigned long *temp)
+{
+ struct db8500_thermal_zone *pzone = thermal->devdata;
+ struct db8500_thsens_platform_data *ptrips = pzone->trip_tab;
+ int i;
+
+ for (i = ptrips->num_trips - 1; i > 0; i--) {
+ if (ptrips->trip_points[i].type == THERMAL_TRIP_CRITICAL) {
+ *temp = ptrips->trip_points[i].temp;
+ return 0;
+ }
+ }
+
+ return -EINVAL;
+}
+
+static struct thermal_zone_device_ops thdev_ops = {
+ .bind = db8500_cdev_bind,
+ .unbind = db8500_cdev_unbind,
+ .get_temp = db8500_sys_get_temp,
+ .get_trend = db8500_sys_get_trend,
+ .get_mode = db8500_sys_get_mode,
+ .set_mode = db8500_sys_set_mode,
+ .get_trip_type = db8500_sys_get_trip_type,
+ .get_trip_temp = db8500_sys_get_trip_temp,
+ .get_crit_temp = db8500_sys_get_crit_temp,
+};
+
+static void db8500_thermal_update_config(struct db8500_thermal_zone *pzone,
+ unsigned int idx, enum thermal_trend trend,
+ unsigned long next_low, unsigned long next_high)
+{
+ pzone->cur_index = idx;
+ pzone->cur_temp_pseudo = (next_low + next_high)/2;
+ pzone->trend = trend;
+
+ prcmu_stop_temp_sense();
+ prcmu_config_hotmon((u8)(next_low/1000), (u8)(next_high/1000));
+ prcmu_start_temp_sense(PRCMU_DEFAULT_MEASURE_TIME);
+}
+
+static irqreturn_t prcmu_low_irq_handler(int irq, void *irq_data)
+{
+ struct db8500_thermal_zone *pzone = irq_data;
+ struct db8500_thsens_platform_data *ptrips = pzone->trip_tab;
+ unsigned int idx = pzone->cur_index;
+ unsigned long next_low, next_high;
+
+ if (unlikely(idx == 0))
+ /* Meaningless for thermal management, ignoring it */
+ return IRQ_HANDLED;
+
+ if (idx == 1) {
+ next_high = ptrips->trip_points[0].temp;
+ next_low = PRCMU_DEFAULT_LOW_TEMP;
+ } else {
+ next_high = ptrips->trip_points[idx-1].temp;
+ next_low = ptrips->trip_points[idx-2].temp;
+ }
+ idx -= 1;
+
+ db8500_thermal_update_config(pzone, idx, THERMAL_TREND_DROPPING,
+ next_low, next_high);
+
+ dev_dbg(&pzone->therm_dev->device,
+ "PRCMU set max %ld, min %ld\n", next_high, next_low);
+
+ schedule_work(&pzone->therm_work);
+
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t prcmu_high_irq_handler(int irq, void *irq_data)
+{
+ struct db8500_thermal_zone *pzone = irq_data;
+ struct db8500_thsens_platform_data *ptrips = pzone->trip_tab;
+ unsigned int idx = pzone->cur_index;
+ unsigned long next_low, next_high;
+
+ if (idx < ptrips->num_trips - 1) {
+ next_high = ptrips->trip_points[idx+1].temp;
+ next_low = ptrips->trip_points[idx].temp;
+ idx += 1;
+
+ db8500_thermal_update_config(pzone, idx, THERMAL_TREND_RAISING,
+ next_low, next_high);
+
+ dev_dbg(&pzone->therm_dev->device,
+ "PRCMU set max %ld, min %ld\n", next_high, next_low);
+ } else if (idx == ptrips->num_trips - 1)
+ pzone->cur_temp_pseudo = ptrips->trip_points[idx].temp + 1;
+
+ schedule_work(&pzone->therm_work);
+
+ return IRQ_HANDLED;
+}
+
+static void db8500_thermal_work(struct work_struct *work)
+{
+ enum thermal_device_mode cur_mode;
+ struct db8500_thermal_zone *pzone;
+
+ pzone = container_of(work, struct db8500_thermal_zone, therm_work);
+
+ mutex_lock(&pzone->th_lock);
+ cur_mode = pzone->mode;
+ mutex_unlock(&pzone->th_lock);
+
+ if (cur_mode == THERMAL_DEVICE_DISABLED)
+ return;
+
+ thermal_zone_device_update(pzone->therm_dev);
+ dev_dbg(&pzone->therm_dev->device, "thermal work finished.\n");
+}
+
+#ifdef CONFIG_OF
+static struct db8500_thsens_platform_data*
+ db8500_thermal_parse_dt(struct platform_device *pdev)
+{
+ struct db8500_thsens_platform_data *ptrips;
+ struct device_node *np = pdev->dev.of_node;
+ char prop_name[32];
+ const char *tmp_str;
+ u32 tmp_data;
+ int i, j;
+
+ ptrips = devm_kzalloc(&pdev->dev, sizeof(*ptrips), GFP_KERNEL);
+ if (!ptrips)
+ return NULL;
+
+ if (of_property_read_u32(np, "num-trips", &tmp_data))
+ goto err_parse_dt;
+
+ if (tmp_data > THERMAL_MAX_TRIPS)
+ goto err_parse_dt;
+
+ ptrips->num_trips = tmp_data;
+
+ for (i = 0; i < ptrips->num_trips; i++) {
+ sprintf(prop_name, "trip%d-temp", i);
+ if (of_property_read_u32(np, prop_name, &tmp_data))
+ goto err_parse_dt;
+
+ ptrips->trip_points[i].temp = tmp_data;
+
+ sprintf(prop_name, "trip%d-type", i);
+ if (of_property_read_string(np, prop_name, &tmp_str))
+ goto err_parse_dt;
+
+ if (!strcmp(tmp_str, "active"))
+ ptrips->trip_points[i].type = THERMAL_TRIP_ACTIVE;
+ else if (!strcmp(tmp_str, "passive"))
+ ptrips->trip_points[i].type = THERMAL_TRIP_PASSIVE;
+ else if (!strcmp(tmp_str, "hot"))
+ ptrips->trip_points[i].type = THERMAL_TRIP_HOT;
+ else if (!strcmp(tmp_str, "critical"))
+ ptrips->trip_points[i].type = THERMAL_TRIP_CRITICAL;
+ else
+ goto err_parse_dt;
+
+ sprintf(prop_name, "trip%d-cdev-num", i);
+ if (of_property_read_u32(np, prop_name, &tmp_data))
+ goto err_parse_dt;
+
+ if (tmp_data > COOLING_DEV_MAX)
+ goto err_parse_dt;
+
+ for (j = 0; j < tmp_data; j++) {
+ sprintf(prop_name, "trip%d-cdev-name%d", i, j);
+ if (of_property_read_string(np, prop_name, &tmp_str))
+ goto err_parse_dt;
+
+ if (strlen(tmp_str) >= THERMAL_NAME_LENGTH)
+ goto err_parse_dt;
+
+ strcpy(ptrips->trip_points[i].cdev_name[j], tmp_str);
+ }
+ }
+ return ptrips;
+
+err_parse_dt:
+ dev_err(&pdev->dev, "Parsing device tree data error.\n");
+ return NULL;
+}
+#else
+static inline struct db8500_thsens_platform_data*
+ db8500_thermal_parse_dt(struct platform_device *pdev)
+{
+ return NULL;
+}
+#endif
+
+static int db8500_thermal_probe(struct platform_device *pdev)
+{
+ struct db8500_thermal_zone *pzone = NULL;
+ struct db8500_thsens_platform_data *ptrips = NULL;
+ struct device_node *np = pdev->dev.of_node;
+ int low_irq, high_irq, ret = 0;
+ unsigned long dft_low, dft_high;
+
+ if (np)
+ ptrips = db8500_thermal_parse_dt(pdev);
+ else
+ ptrips = dev_get_platdata(&pdev->dev);
+
+ if (!ptrips)
+ return -EINVAL;
+
+ pzone = devm_kzalloc(&pdev->dev, sizeof(*pzone), GFP_KERNEL);
+ if (!pzone)
+ return -ENOMEM;
+
+ mutex_init(&pzone->th_lock);
+ mutex_lock(&pzone->th_lock);
+
+ pzone->mode = THERMAL_DEVICE_DISABLED;
+ pzone->trip_tab = ptrips;
+
+ dft_low = PRCMU_DEFAULT_LOW_TEMP;
+ dft_high = ptrips->trip_points[0].temp;
+
+ db8500_thermal_update_config(pzone, 0, THERMAL_TREND_STABLE,
+ dft_low, dft_high);
+
+ INIT_WORK(&pzone->therm_work, db8500_thermal_work);
+
+ low_irq = platform_get_irq_byname(pdev, "IRQ_HOTMON_LOW");
+ if (low_irq < 0) {
+ dev_err(&pdev->dev, "Get IRQ_HOTMON_LOW failed.\n");
+ return low_irq;
+ }
+
+ ret = devm_request_threaded_irq(&pdev->dev, low_irq, NULL,
+ prcmu_low_irq_handler, IRQF_NO_SUSPEND | IRQF_ONESHOT,
+ "dbx500_temp_low", pzone);
+ if (ret < 0) {
+ dev_err(&pdev->dev, "Failed to allocate temp low irq.\n");
+ return ret;
+ }
+
+ high_irq = platform_get_irq_byname(pdev, "IRQ_HOTMON_HIGH");
+ if (high_irq < 0) {
+ dev_err(&pdev->dev, "Get IRQ_HOTMON_HIGH failed.\n");
+ return high_irq;
+ }
+
+ ret = devm_request_threaded_irq(&pdev->dev, high_irq, NULL,
+ prcmu_high_irq_handler, IRQF_NO_SUSPEND | IRQF_ONESHOT,
+ "dbx500_temp_high", pzone);
+ if (ret < 0) {
+ dev_err(&pdev->dev, "Failed to allocate temp high irq.\n");
+ return ret;
+ }
+
+ pzone->therm_dev = thermal_zone_device_register("db8500_thermal_zone",
+ ptrips->num_trips, 0, pzone, &thdev_ops, 0, 0);
+
+ if (IS_ERR_OR_NULL(pzone->therm_dev)) {
+ dev_err(&pdev->dev, "Register thermal zone device failed.\n");
+ return PTR_ERR(pzone->therm_dev);
+ }
+ dev_info(&pdev->dev, "Thermal zone device registered.\n");
+
+ platform_set_drvdata(pdev, pzone);
+ pzone->mode = THERMAL_DEVICE_ENABLED;
+ mutex_unlock(&pzone->th_lock);
+
+ return 0;
+}
+
+static int db8500_thermal_remove(struct platform_device *pdev)
+{
+ struct db8500_thermal_zone *pzone = platform_get_drvdata(pdev);
+
+ thermal_zone_device_unregister(pzone->therm_dev);
+ cancel_work_sync(&pzone->therm_work);
+ mutex_destroy(&pzone->th_lock);
+
+ return 0;
+}
+
+static int db8500_thermal_suspend(struct platform_device *pdev,
+ pm_message_t state)
+{
+ struct db8500_thermal_zone *pzone = platform_get_drvdata(pdev);
+
+ flush_work_sync(&pzone->therm_work);
+ prcmu_stop_temp_sense();
+
+ return 0;
+}
+
+static int db8500_thermal_resume(struct platform_device *pdev)
+{
+ struct db8500_thermal_zone *pzone = platform_get_drvdata(pdev);
+ struct db8500_thsens_platform_data *ptrips = pzone->trip_tab;
+ unsigned long dft_low, dft_high;
+
+ dft_low = PRCMU_DEFAULT_LOW_TEMP;
+ dft_high = ptrips->trip_points[0].temp;
+
+ db8500_thermal_update_config(pzone, 0, THERMAL_TREND_STABLE,
+ dft_low, dft_high);
+
+ return 0;
+}
+
+#ifdef CONFIG_OF
+static const struct of_device_id db8500_thermal_match[] = {
+ { .compatible = "stericsson,db8500-thermal" },
+ {},
+};
+#else
+#define db8500_thermal_match NULL
+#endif
+
+static struct platform_driver db8500_thermal_driver = {
+ .driver = {
+ .owner = THIS_MODULE,
+ .name = "db8500-thermal",
+ .of_match_table = db8500_thermal_match,
+ },
+ .probe = db8500_thermal_probe,
+ .suspend = db8500_thermal_suspend,
+ .resume = db8500_thermal_resume,
+ .remove = db8500_thermal_remove,
+};
+
+module_platform_driver(db8500_thermal_driver);
+
+MODULE_AUTHOR("Hongbo Zhang <hongbo.zhang@stericsson.com>");
+MODULE_DESCRIPTION("DB8500 thermal driver");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/platform_data/db8500_thermal.h b/include/linux/platform_data/db8500_thermal.h
new file mode 100644
index 0000000..3bf6090
--- /dev/null
+++ b/include/linux/platform_data/db8500_thermal.h
@@ -0,0 +1,38 @@
+/*
+ * db8500_thermal.h - DB8500 Thermal Management Implementation
+ *
+ * Copyright (C) 2012 ST-Ericsson
+ * Copyright (C) 2012 Linaro Ltd.
+ *
+ * Author: Hongbo Zhang <hongbo.zhang@linaro.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _DB8500_THERMAL_H_
+#define _DB8500_THERMAL_H_
+
+#include <linux/thermal.h>
+
+#define COOLING_DEV_MAX 8
+
+struct db8500_trip_point {
+ unsigned long temp;
+ enum thermal_trip_type type;
+ char cdev_name[COOLING_DEV_MAX][THERMAL_NAME_LENGTH];
+};
+
+struct db8500_thsens_platform_data {
+ struct db8500_trip_point trip_points[THERMAL_MAX_TRIPS];
+ int num_trips;
+};
+
+#endif /* _DB8500_THERMAL_H_ */
--
1.7.11.3
^ permalink raw reply related
* Re: [PATCH 9/9] cgroup_freezer: implement proper hierarchy support
From: Michal Hocko @ 2012-11-07 11:00 UTC (permalink / raw)
To: Tejun Heo
Cc: lizefan, rjw, containers, cgroups, linux-kernel, linux-pm,
fweisbec
In-Reply-To: <1351931915-1701-10-git-send-email-tj@kernel.org>
On Sat 03-11-12 01:38:35, Tejun Heo wrote:
[...]
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index 4f12d31..3262537 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
[...]
> @@ -320,14 +388,39 @@ static void freezer_apply_state(struct freezer *freezer, bool freeze,
> * @freezer: freezer of interest
> * @freeze: whether to freeze or thaw
> *
> - * Freeze or thaw @cgroup according to @freeze.
> + * Freeze or thaw @freezer according to @freeze. The operations are
> + * recursive - all descendants of @freezer will be affected.
> */
> static void freezer_change_state(struct freezer *freezer, bool freeze)
> {
> + struct cgroup *pos;
> +
> /* update @freezer */
> spin_lock_irq(&freezer->lock);
> freezer_apply_state(freezer, freeze, CGROUP_FREEZING_SELF);
> spin_unlock_irq(&freezer->lock);
> +
> + /*
> + * Update all its descendants in pre-order traversal. Each
> + * descendant will try to inherit its parent's FREEZING state as
> + * CGROUP_FREEZING_PARENT.
> + */
> + rcu_read_lock();
> + cgroup_for_each_descendant_pre(pos, freezer->css.cgroup) {
> + struct freezer *pos_f = cgroup_freezer(pos);
> + struct freezer *parent = parent_freezer(freezer);
This doesn't seem right. Why are we interested in freezer->parent here
at all? We should either care about freezer->state & CGROUP_FREEZING or
parent = parent_freezer(pos_f), right?
> +
> + /*
> + * Our update to @parent->state is already visible which is
> + * all we need. No need to lock @parent. For more info on
> + * synchronization, see freezer_post_create().
> + */
> + spin_lock_irq(&pos_f->lock);
> + freezer_apply_state(pos_f, parent->state & CGROUP_FREEZING,
> + CGROUP_FREEZING_PARENT);
> + spin_unlock_irq(&pos_f->lock);
> + }
> + rcu_read_unlock();
> }
>
> static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
--
Michal Hocko
SUSE Labs
^ permalink raw reply
* Re: [PATCH V3 0/4] cpuidle : multiple drivers support
From: Peter De Schrijver @ 2012-11-07 13:32 UTC (permalink / raw)
To: Lorenzo Pieralisi
Cc: Daniel Lezcano, rjw@sisk.pl, linux-pm@vger.kernel.org,
linaro-dev@lists.linaro.org
In-Reply-To: <20121102131950.GA4432@e102568-lin.cambridge.arm.com>
On Fri, Nov 02, 2012 at 02:19:50PM +0100, Lorenzo Pieralisi wrote:
> On Wed, Oct 31, 2012 at 04:44:44PM +0000, Daniel Lezcano wrote:
> > The discussion about having different cpus on the system with
> > different latencies bring us to a first attemp by adding a
> > pointer in the cpuidle_device to the states array.
> >
> > But as Rafael suggested, it would make more sense to create a
> > driver per cpu [1].
> >
> > This patch adds support for multiple cpuidle drivers.
> >
> > It creates a per cpu cpuidle driver pointer.
> >
> > In order to not break the different drivers, the function cpuidle_register_driver
> > assign for each cpu, the driver.
> >
> > The multiple driver support is optional and if it is not set, the cpuide driver
> > core code remains the same (except some code reorganisation).
> >
> > I did the following tests compiled, booted, tested without/with CONFIG_CPU_IDLE,
> > with/without CONFIG_CPU_IDLE_MULTIPLE_DRIVERS.
> >
> > Tested on Core2 Duo T9500 with acpi_idle [and intel_idle]
> > Tested on ARM Dual Cortex-A9 U8500 (aka Snowball)
> >
> > V1 tested on Tegra3 and Vexpress TC2
>
> V3 tested on TC2, hence, on the whole series
>
> Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>
V3 tested on Tegra3, so:
Tested-by: Peter De Schrijver <pdeschrijver@nvidia.com>
^ permalink raw reply
* Re: [PATCH 0/4] cpufreq: OMAP: if available, scale the iva coprocessor
From: Santosh Shilimkar @ 2012-11-07 14:42 UTC (permalink / raw)
To: Joshua Emele
Cc: Kevin Hilman, Rafael J. Wysocki, linux-omap, cpufreq, linux-pm,
linux-kernel
In-Reply-To: <1352252861-18384-1-git-send-email-jemele@gmail.com>
Hi,
On Tuesday 06 November 2012 07:47 PM, Joshua Emele wrote:
> The iva coprocessor, available on some omap platforms, shares a voltage domain
> with the mpu. If cpufreq is active and the mpu processor is scaled down, the iva
> coprocessor should also be scaled. The goal is to make sure we do not ramp down
> the voltage on the domain and affect clocking on the iva coprocessor leading to
> a dsp crash.
>
> I only have access to an omap3evm-ish device, so I do not know what the iva
> clock name is for omap24xx and omap44xx. This detail can be added later if the
> general approach is approved.
>
> I have tested a version of this patch against the linux-3.3 kernel, so this my
> attempt at a forward port against the current mainline. I have based my patch
> series against linux-omap-pm/pm-next.
>
How about making use of CPUFREQ notifiers to handle it. I don't like
this series since it adds multi-media co-processor scaling into the
CPUFreq driver.
Regards
Santosh
^ permalink raw reply
* Re: [PATCH 1/4] cpufreq: OMAP: if an iva clock name is specified, load iva resources
From: Nishanth Menon @ 2012-11-07 14:53 UTC (permalink / raw)
To: Joshua Emele
Cc: Kevin Hilman, Rafael J. Wysocki, linux-omap, cpufreq, linux-pm,
linux-kernel
In-Reply-To: <1352252861-18384-2-git-send-email-jemele@gmail.com>
On 17:47-20121106, Joshua Emele wrote:
> +static int __cpuinit omap_iva_init(struct cpufreq_policy *policy)
> +{
> + int result;
> + if (!iva_clk_name) {
> + pr_info("%s: iva unavailable\n", __func__);
> + return 0;
> + }
> + iva_dev = omap_device_get_by_hwmod_name("iva");
NAK. Reasons as follows:
a) cpufreq is purely meant for cpu, not IVA. we should instead be using
devfreq which is designed around the usage for co-processor
b) clubbing ARM's frequency decision is definitely NOT equal to IVA
frequency decision, not only is it wrong in terms of modularization, it
is wrong in terms of power benefits as well (not to mention weirdness
needed when you look at all OMAP SoC variants)
c) DVFS is not trivial around multiple co-processor transitions -> core
OPPs (as dependent OPP) needs to be addressed as well. ideally common
clock framework could take care of clock dependencies.
--
Regards,
Nishanth Menon
^ permalink raw reply
* Re: [PATCH 1/9] cgroup: add cgroup_subsys->post_create()
From: Michal Hocko @ 2012-11-07 15:25 UTC (permalink / raw)
To: Tejun Heo
Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, rjw-KKrjLPT3xs0,
cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1351931915-1701-2-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
On Sat 03-11-12 01:38:27, Tejun Heo wrote:
> Currently, there's no way for a controller to find out whether a new
> cgroup finished all ->create() allocatinos successfully and is
> considered "live" by cgroup.
>
> This becomes a problem later when we add generic descendants walking
> to cgroup which can be used by controllers as controllers don't have a
> synchronization point where it can synchronize against new cgroups
> appearing in such walks.
>
> This patch adds ->post_create(). It's called after all ->create()
> succeeded and the cgroup is linked into the generic cgroup hierarchy.
> This plays the counterpart of ->pre_destroy().
Hmm, I had to look at "cgroup_freezer: implement proper hierarchy
support" to actually understand what is the callback good for. The above
sounds as if the callback is needed when a controller wants to use
the new iterators or when pre_destroy is defined.
I think it would be helpful if the changelog described that the callback
is needed when the controller keeps a mutable shared state for the
hierarchy. For example memory controller doesn't have any such a strict
requirement so we can safely use your new iterators without pre_destroy.
Anyway, I like this change because the shared state is now really easy
to implement.
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Acked-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> ---
> include/linux/cgroup.h | 1 +
> kernel/cgroup.c | 12 ++++++++++--
> 2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index fe876a7..b442122 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -438,6 +438,7 @@ int cgroup_taskset_size(struct cgroup_taskset *tset);
>
> struct cgroup_subsys {
> struct cgroup_subsys_state *(*create)(struct cgroup *cgrp);
> + void (*post_create)(struct cgroup *cgrp);
> void (*pre_destroy)(struct cgroup *cgrp);
> void (*destroy)(struct cgroup *cgrp);
> int (*can_attach)(struct cgroup *cgrp, struct cgroup_taskset *tset);
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index e3045ad..f05d992 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -4060,10 +4060,15 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
> if (err < 0)
> goto err_remove;
>
> - /* each css holds a ref to the cgroup's dentry */
> - for_each_subsys(root, ss)
> + for_each_subsys(root, ss) {
> + /* each css holds a ref to the cgroup's dentry */
> dget(dentry);
>
> + /* creation succeeded, notify subsystems */
> + if (ss->post_create)
> + ss->post_create(cgrp);
> + }
> +
> /* The cgroup directory was pre-locked for us */
> BUG_ON(!mutex_is_locked(&cgrp->dentry->d_inode->i_mutex));
>
> @@ -4281,6 +4286,9 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss)
>
> ss->active = 1;
>
> + if (ss->post_create)
> + ss->post_create(&ss->root->top_cgroup);
> +
> /* this function shouldn't be used with modular subsystems, since they
> * need to register a subsys_id, among other things */
> BUG_ON(ss->module);
> --
> 1.7.11.7
>
--
Michal Hocko
SUSE Labs
^ permalink raw reply
* Re: [PATCH 2/9] cgroup: Use rculist ops for cgroup->children
From: Michal Hocko @ 2012-11-07 15:30 UTC (permalink / raw)
To: Tejun Heo
Cc: lizefan, rjw, containers, cgroups, linux-kernel, linux-pm,
fweisbec
In-Reply-To: <1351931915-1701-3-git-send-email-tj@kernel.org>
On Sat 03-11-12 01:38:28, Tejun Heo wrote:
> Use RCU safe list operations for cgroup->children. This will be used
> to implement cgroup children / descendant walking which can be used by
> controllers.
>
> Note that cgroup_create() now puts a new cgroup at the end of the
> ->children list instead of head. This isn't strictly necessary but is
> done so that the iteration order is more conventional.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
> ---
> include/linux/cgroup.h | 1 +
> kernel/cgroup.c | 8 +++-----
> 2 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index b442122..90c33eb 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -12,6 +12,7 @@
> #include <linux/cpumask.h>
> #include <linux/nodemask.h>
> #include <linux/rcupdate.h>
> +#include <linux/rculist.h>
> #include <linux/cgroupstats.h>
> #include <linux/prio_heap.h>
> #include <linux/rwsem.h>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index f05d992..cc5d2a0 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -1650,7 +1650,6 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
>
> free_cg_links(&tmp_cg_links);
>
> - BUG_ON(!list_empty(&root_cgrp->sibling));
> BUG_ON(!list_empty(&root_cgrp->children));
> BUG_ON(root->number_of_cgroups != 1);
>
> @@ -1699,7 +1698,6 @@ static void cgroup_kill_sb(struct super_block *sb) {
>
> BUG_ON(root->number_of_cgroups != 1);
> BUG_ON(!list_empty(&cgrp->children));
> - BUG_ON(!list_empty(&cgrp->sibling));
>
> mutex_lock(&cgroup_mutex);
> mutex_lock(&cgroup_root_mutex);
> @@ -4053,7 +4051,7 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
> }
> }
>
> - list_add(&cgrp->sibling, &cgrp->parent->children);
> + list_add_tail_rcu(&cgrp->sibling, &cgrp->parent->children);
> root->number_of_cgroups++;
>
> err = cgroup_create_dir(cgrp, dentry, mode);
> @@ -4084,7 +4082,7 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
>
> err_remove:
>
> - list_del(&cgrp->sibling);
> + list_del_rcu(&cgrp->sibling);
> root->number_of_cgroups--;
>
> err_destroy:
> @@ -4210,7 +4208,7 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
> raw_spin_unlock(&release_list_lock);
>
> /* delete this cgroup from parent->children */
> - list_del_init(&cgrp->sibling);
> + list_del_rcu(&cgrp->sibling);
>
> list_del_init(&cgrp->allcg_node);
>
> --
> 1.7.11.7
>
--
Michal Hocko
SUSE Labs
^ permalink raw reply
* Re: [PATCH 3/9] cgroup: implement generic child / descendant walk macros
From: Michal Hocko @ 2012-11-07 15:38 UTC (permalink / raw)
To: Tejun Heo
Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, rjw-KKrjLPT3xs0,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <20121106203154.GV30069-9pTldWuhBndy/B6EtB590w@public.gmane.org>
On Tue 06-11-12 12:31:54, Tejun Heo wrote:
> On Sat, Nov 03, 2012 at 01:38:29AM -0700, Tejun Heo wrote:
> > Currently, cgroup doesn't provide any generic helper for walking a
> > given cgroup's children or descendants. This patch adds the following
> > three macros.
> >
> > * cgroup_for_each_child() - walk immediate children of a cgroup.
> >
> > * cgroup_for_each_descendant_pre() - visit all descendants of a cgroup
> > in pre-order tree traversal.
> >
> > * cgroup_for_each_descendant_post() - visit all descendants of a
> > cgroup in post-order tree traversal.
> >
> > All three only require the user to hold RCU read lock during
> > traversal. Verifying that each iterated cgroup is online is the
> > responsibility of the user. When used with proper synchronization,
> > cgroup_for_each_descendant_pre() can be used to propagate config
> > updates to descendants in reliable way. See comments for details.
>
> Michal, Li, how does this look to you? Would this be okay for memcg
> too?
Yes, definitely. We are currently iterating by css->id which is, ehm,
impractical. Having a deterministic tree walk is definitely a plus.
--
Michal Hocko
SUSE Labs
^ permalink raw reply
* Re: [BUGFIX] PM: Fix active child counting when disabled and forbidden
From: Alan Stern @ 2012-11-07 15:49 UTC (permalink / raw)
To: Huang Ying; +Cc: Rafael J. Wysocki, linux-kernel, linux-pm
In-Reply-To: <1352248006.31033.192.camel@yhuang-dev>
On Wed, 7 Nov 2012, Huang Ying wrote:
> > > Devices will be disabled if the PCI driver is unbound from the PCI
> > > device.
> >
> > Yes. But without a PCI driver, nothing will call
> > pm_runtime_set_suspended.
>
> pm_runtime_set_suspended will be called in pci_device_remove or error
> path of local_pci_probe.
Yes, okay. But that's what we want. Unused, driverless PCI devices
shouldn't force their parents to remain at full power.
> > And even if something does call
> > pm_runtime_set_suspended, it's still not a problem -- the device can't
> > be used without a driver.
>
> The VGA device can be used without a driver.
Ah, right, that's your _real_ problem. You should have mentioned this
in the original Changelog for the patch.
Rafael, this does need to be fixed. The PCI subsystem assumes that
driverless devices are not in use, so they are disabled for runtime PM
and marked as suspended. This is not appropriate for VGA devices,
which can indeed be used without a driver.
I'm not sure what the best solution is. Maybe we should Ying's
proposal a step farther:
Make pm_runtime_set_suspended() fail if runtime PM is
forbidden.
Make pm_runtime_forbid() call pm_runtime_set_active()
(and do a runtime resume of the parent) if disable_depth > 0.
Make the PCI runtime-idle routine call
pm_runtime_set_suspended() if disable_depth > 0. Or maybe
do this for all devices, in the runtime PM core.
What do you think?
Alan Stern
^ permalink raw reply
* Re: [BUGFIX] PM: Fix active child counting when disabled and forbidden
From: Rafael J. Wysocki @ 2012-11-07 16:09 UTC (permalink / raw)
To: Alan Stern; +Cc: Huang Ying, linux-kernel, linux-pm
In-Reply-To: <Pine.LNX.4.44L0.1211071032070.1859-100000@iolanthe.rowland.org>
On Wednesday, November 07, 2012 10:49:04 AM Alan Stern wrote:
> On Wed, 7 Nov 2012, Huang Ying wrote:
>
> > > > Devices will be disabled if the PCI driver is unbound from the PCI
> > > > device.
> > >
> > > Yes. But without a PCI driver, nothing will call
> > > pm_runtime_set_suspended.
> >
> > pm_runtime_set_suspended will be called in pci_device_remove or error
> > path of local_pci_probe.
>
> Yes, okay. But that's what we want. Unused, driverless PCI devices
> shouldn't force their parents to remain at full power.
>
> > > And even if something does call
> > > pm_runtime_set_suspended, it's still not a problem -- the device can't
> > > be used without a driver.
> >
> > The VGA device can be used without a driver.
>
> Ah, right, that's your _real_ problem. You should have mentioned this
> in the original Changelog for the patch.
>
> Rafael, this does need to be fixed.
Yup.
> The PCI subsystem assumes that
> driverless devices are not in use, so they are disabled for runtime PM
> and marked as suspended. This is not appropriate for VGA devices,
> which can indeed be used without a driver.
>
> I'm not sure what the best solution is. Maybe we should Ying's
> proposal a step farther:
>
> Make pm_runtime_set_suspended() fail if runtime PM is
> forbidden.
>
> Make pm_runtime_forbid() call pm_runtime_set_active()
> (and do a runtime resume of the parent) if disable_depth > 0.
I'd prefer this one. The callers of pm_runtime_forbid() may actually
reasonably expect something like this to happen.
> Make the PCI runtime-idle routine call
> pm_runtime_set_suspended() if disable_depth > 0. Or maybe
> do this for all devices, in the runtime PM core.
That would mean calling it on every call to pm_runtime_put() and friends
which seems to be a bit wasteful.
Thanks,
Rafael
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
^ permalink raw reply
* Re: [PATCH 9/9] cgroup_freezer: implement proper hierarchy support
From: Tejun Heo @ 2012-11-07 16:31 UTC (permalink / raw)
To: Michal Hocko
Cc: lizefan, rjw, containers, cgroups, linux-kernel, linux-pm,
fweisbec
In-Reply-To: <20121107110024.GA4143@dhcp22.suse.cz>
Hello, Michal.
On Wed, Nov 07, 2012 at 12:00:57PM +0100, Michal Hocko wrote:
> > + * Update all its descendants in pre-order traversal. Each
> > + * descendant will try to inherit its parent's FREEZING state as
> > + * CGROUP_FREEZING_PARENT.
> > + */
> > + rcu_read_lock();
> > + cgroup_for_each_descendant_pre(pos, freezer->css.cgroup) {
> > + struct freezer *pos_f = cgroup_freezer(pos);
> > + struct freezer *parent = parent_freezer(freezer);
>
> This doesn't seem right. Why are we interested in freezer->parent here
> at all? We should either care about freezer->state & CGROUP_FREEZING or
> parent = parent_freezer(pos_f), right?
Yeap, that should have been parent_freezer(pos_f). Argh...
Thanks.
--
tejun
^ permalink raw reply
* [PATCH 9/9 v2] cgroup_freezer: implement proper hierarchy support
From: Tejun Heo @ 2012-11-07 16:39 UTC (permalink / raw)
To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-pm-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1351931915-1701-10-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Up until now, cgroup_freezer didn't implement hierarchy properly.
cgroups could be arranged in hierarchy but it didn't make any
difference in how each cgroup_freezer behaved. They all operated
separately.
This patch implements proper hierarchy support. If a cgroup is
frozen, all its descendants are frozen. A cgroup is thawed iff it and
all its ancestors are THAWED. freezer.self_freezing shows the current
freezing state for the cgroup itself. freezer.parent_freezing shows
whether the cgroup is freezing because any of its ancestors is
freezing.
freezer_post_create() locks the parent and new cgroup and inherits the
parent's state and freezer_change_state() applies new state top-down
using cgroup_for_each_descendant_pre() which guarantees that no child
can escape its parent's state. update_if_frozen() uses
cgroup_for_each_descendant_post() to propagate frozen states
bottom-up.
Synchronization could be coarser and easier by using a single mutex to
protect all hierarchy operations. Finer grained approach was used
because it wasn't too difficult for cgroup_freezer and I think it's
beneficial to have an example implementation and cgroup_freezer is
rather simple and can serve a good one.
As this makes cgroup_freezer properly hierarchical,
freezer_subsys.broken_hierarchy marking is removed.
Note that this patch changes userland visible behavior - freezing a
cgroup now freezes all its descendants too. This behavior change is
intended and has been warned via .broken_hierarchy.
v2: Michal spotted a bug in freezer_change_state() - descendants were
inheriting from the wrong ancestor. Fixed.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
kernel/cgroup_freezer.c | 161 ++++++++++++++++++++++++++++++++++++------------
1 file changed, 123 insertions(+), 38 deletions(-)
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -22,6 +22,13 @@
#include <linux/freezer.h>
#include <linux/seq_file.h>
+/*
+ * A cgroup is freezing if any FREEZING flags are set. FREEZING_SELF is
+ * set if "FROZEN" is written to freezer.state cgroupfs file, and cleared
+ * for "THAWED". FREEZING_PARENT is set if the parent freezer is FREEZING
+ * for whatever reason. IOW, a cgroup has FREEZING_PARENT set if one of
+ * its ancestors has FREEZING_SELF set.
+ */
enum freezer_state_flags {
CGROUP_FREEZER_ONLINE = (1 << 0), /* freezer is fully online */
CGROUP_FREEZING_SELF = (1 << 1), /* this freezer is freezing */
@@ -50,6 +57,15 @@ static inline struct freezer *task_freez
struct freezer, css);
}
+static struct freezer *parent_freezer(struct freezer *freezer)
+{
+ struct cgroup *pcg = freezer->css.cgroup->parent;
+
+ if (pcg)
+ return cgroup_freezer(pcg);
+ return NULL;
+}
+
bool cgroup_freezing(struct task_struct *task)
{
bool ret;
@@ -74,17 +90,6 @@ static const char *freezer_state_strs(un
return "THAWED";
};
-/*
- * State diagram
- * Transitions are caused by userspace writes to the freezer.state file.
- * The values in parenthesis are state labels. The rest are edge labels.
- *
- * (THAWED) --FROZEN--> (FREEZING) --FROZEN--> (FROZEN)
- * ^ ^ | |
- * | \_______THAWED_______/ |
- * \__________________________THAWED____________/
- */
-
struct cgroup_subsys freezer_subsys;
static struct cgroup_subsys_state *freezer_create(struct cgroup *cgroup)
@@ -103,15 +108,34 @@ static struct cgroup_subsys_state *freez
* freezer_post_create - commit creation of a freezer cgroup
* @cgroup: cgroup being created
*
- * We're committing to creation of @cgroup. Mark it online.
+ * We're committing to creation of @cgroup. Mark it online and inherit
+ * parent's freezing state while holding both parent's and our
+ * freezer->lock.
*/
static void freezer_post_create(struct cgroup *cgroup)
{
struct freezer *freezer = cgroup_freezer(cgroup);
+ struct freezer *parent = parent_freezer(freezer);
+
+ /*
+ * The following double locking and freezing state inheritance
+ * guarantee that @cgroup can never escape ancestors' freezing
+ * states. See cgroup_for_each_descendant_pre() for details.
+ */
+ if (parent)
+ spin_lock_irq(&parent->lock);
+ spin_lock_nested(&freezer->lock, SINGLE_DEPTH_NESTING);
- spin_lock_irq(&freezer->lock);
freezer->state |= CGROUP_FREEZER_ONLINE;
- spin_unlock_irq(&freezer->lock);
+
+ if (parent && (parent->state & CGROUP_FREEZING)) {
+ freezer->state |= CGROUP_FREEZING_PARENT | CGROUP_FROZEN;
+ atomic_inc(&system_freezing_cnt);
+ }
+
+ spin_unlock(&freezer->lock);
+ if (parent)
+ spin_unlock_irq(&parent->lock);
}
/**
@@ -153,6 +177,7 @@ static void freezer_attach(struct cgroup
{
struct freezer *freezer = cgroup_freezer(new_cgrp);
struct task_struct *task;
+ bool clear_frozen = false;
spin_lock_irq(&freezer->lock);
@@ -172,10 +197,25 @@ static void freezer_attach(struct cgroup
} else {
freeze_task(task);
freezer->state &= ~CGROUP_FROZEN;
+ clear_frozen = true;
}
}
spin_unlock_irq(&freezer->lock);
+
+ /*
+ * Propagate FROZEN clearing upwards. We may race with
+ * update_if_frozen(), but as long as both work bottom-up, either
+ * update_if_frozen() sees child's FROZEN cleared or we clear the
+ * parent's FROZEN later. No parent w/ !FROZEN children can be
+ * left FROZEN.
+ */
+ while (clear_frozen && (freezer = parent_freezer(freezer))) {
+ spin_lock_irq(&freezer->lock);
+ freezer->state &= ~CGROUP_FROZEN;
+ clear_frozen = freezer->state & CGROUP_FREEZING;
+ spin_unlock_irq(&freezer->lock);
+ }
}
static void freezer_fork(struct task_struct *task)
@@ -200,24 +240,47 @@ out:
rcu_read_unlock();
}
-/*
- * We change from FREEZING to FROZEN lazily if the cgroup was only
- * partially frozen when we exitted write. Caller must hold freezer->lock.
+/**
+ * update_if_frozen - update whether a cgroup finished freezing
+ * @cgroup: cgroup of interest
+ *
+ * Once FREEZING is initiated, transition to FROZEN is lazily updated by
+ * calling this function. If the current state is FREEZING but not FROZEN,
+ * this function checks whether all tasks of this cgroup and the descendant
+ * cgroups finished freezing and, if so, sets FROZEN.
+ *
+ * The caller is responsible for grabbing RCU read lock and calling
+ * update_if_frozen() on all descendants prior to invoking this function.
*
* Task states and freezer state might disagree while tasks are being
* migrated into or out of @cgroup, so we can't verify task states against
* @freezer state here. See freezer_attach() for details.
*/
-static void update_if_frozen(struct freezer *freezer)
+static void update_if_frozen(struct cgroup *cgroup)
{
- struct cgroup *cgroup = freezer->css.cgroup;
+ struct freezer *freezer = cgroup_freezer(cgroup);
+ struct cgroup *pos;
struct cgroup_iter it;
struct task_struct *task;
+ WARN_ON_ONCE(!rcu_read_lock_held());
+
+ spin_lock_irq(&freezer->lock);
+
if (!(freezer->state & CGROUP_FREEZING) ||
(freezer->state & CGROUP_FROZEN))
- return;
+ goto out_unlock;
+ /* are all (live) children frozen? */
+ cgroup_for_each_child(pos, cgroup) {
+ struct freezer *child = cgroup_freezer(pos);
+
+ if ((child->state & CGROUP_FREEZER_ONLINE) &&
+ !(child->state & CGROUP_FROZEN))
+ goto out_unlock;
+ }
+
+ /* are all tasks frozen? */
cgroup_iter_start(cgroup, &it);
while ((task = cgroup_iter_next(cgroup, &it))) {
@@ -229,27 +292,32 @@ static void update_if_frozen(struct free
* the usual frozen condition.
*/
if (!frozen(task) && !freezer_should_skip(task))
- goto notyet;
+ goto out_iter_end;
}
}
freezer->state |= CGROUP_FROZEN;
-notyet:
+out_iter_end:
cgroup_iter_end(cgroup, &it);
+out_unlock:
+ spin_unlock_irq(&freezer->lock);
}
static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
struct seq_file *m)
{
- struct freezer *freezer = cgroup_freezer(cgroup);
- unsigned int state;
+ struct cgroup *pos;
- spin_lock_irq(&freezer->lock);
- update_if_frozen(freezer);
- state = freezer->state;
- spin_unlock_irq(&freezer->lock);
+ rcu_read_lock();
- seq_puts(m, freezer_state_strs(state));
+ /* update states bottom-up */
+ cgroup_for_each_descendant_post(pos, cgroup)
+ update_if_frozen(pos);
+ update_if_frozen(cgroup);
+
+ rcu_read_unlock();
+
+ seq_puts(m, freezer_state_strs(cgroup_freezer(cgroup)->state));
seq_putc(m, '\n');
return 0;
}
@@ -320,14 +388,39 @@ static void freezer_apply_state(struct f
* @freezer: freezer of interest
* @freeze: whether to freeze or thaw
*
- * Freeze or thaw @cgroup according to @freeze.
+ * Freeze or thaw @freezer according to @freeze. The operations are
+ * recursive - all descendants of @freezer will be affected.
*/
static void freezer_change_state(struct freezer *freezer, bool freeze)
{
+ struct cgroup *pos;
+
/* update @freezer */
spin_lock_irq(&freezer->lock);
freezer_apply_state(freezer, freeze, CGROUP_FREEZING_SELF);
spin_unlock_irq(&freezer->lock);
+
+ /*
+ * Update all its descendants in pre-order traversal. Each
+ * descendant will try to inherit its parent's FREEZING state as
+ * CGROUP_FREEZING_PARENT.
+ */
+ rcu_read_lock();
+ cgroup_for_each_descendant_pre(pos, freezer->css.cgroup) {
+ struct freezer *pos_f = cgroup_freezer(pos);
+ struct freezer *parent = parent_freezer(pos_f);
+
+ /*
+ * Our update to @parent->state is already visible which is
+ * all we need. No need to lock @parent. For more info on
+ * synchronization, see freezer_post_create().
+ */
+ spin_lock_irq(&pos_f->lock);
+ freezer_apply_state(pos_f, parent->state & CGROUP_FREEZING,
+ CGROUP_FREEZING_PARENT);
+ spin_unlock_irq(&pos_f->lock);
+ }
+ rcu_read_unlock();
}
static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
@@ -390,12 +483,4 @@ struct cgroup_subsys freezer_subsys = {
.attach = freezer_attach,
.fork = freezer_fork,
.base_cftypes = files,
-
- /*
- * freezer subsys doesn't handle hierarchy at all. Frozen state
- * should be inherited through the hierarchy - if a parent is
- * frozen, all its children should be frozen. Fix it and remove
- * the following.
- */
- .broken_hierarchy = true,
};
^ permalink raw reply
* Re: [PATCH 3/9] cgroup: implement generic child / descendant walk macros
From: Michal Hocko @ 2012-11-07 16:54 UTC (permalink / raw)
To: Tejun Heo
Cc: lizefan, rjw, containers, cgroups, linux-kernel, linux-pm,
fweisbec
In-Reply-To: <1351931915-1701-4-git-send-email-tj@kernel.org>
On Sat 03-11-12 01:38:29, Tejun Heo wrote:
[...]
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index cc5d2a0..8bd662c 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -2985,6 +2985,92 @@ static void cgroup_enable_task_cg_lists(void)
> write_unlock(&css_set_lock);
> }
>
> +/**
> + * cgroup_next_descendant_pre - find the next descendant for pre-order walk
> + * @pos: the current position (%NULL to initiate traversal)
> + * @cgroup: cgroup whose descendants to walk
> + *
> + * To be used by cgroup_for_each_descendant_pre(). Find the next
> + * descendant to visit for pre-order traversal of @cgroup's descendants.
> + */
> +struct cgroup *cgroup_next_descendant_pre(struct cgroup *pos,
> + struct cgroup *cgroup)
> +{
> + struct cgroup *next;
> +
> + WARN_ON_ONCE(!rcu_read_lock_held());
> +
> + /* if first iteration, pretend we just visited @cgroup */
> + if (!pos) {
> + if (list_empty(&cgroup->children))
> + return NULL;
> + pos = cgroup;
> + }
Is there any specific reason why the root of the tree is excluded?
This is bit impractical because you have to special case the root
in the code.
> +
> + /* visit the first child if exists */
> + next = list_first_or_null_rcu(&pos->children, struct cgroup, sibling);
> + if (next)
> + return next;
> +
> + /* no child, visit my or the closest ancestor's next sibling */
> + do {
> + next = list_entry_rcu(pos->sibling.next, struct cgroup,
> + sibling);
> + if (&next->sibling != &pos->parent->children)
> + return next;
> +
> + pos = pos->parent;
> + } while (pos != cgroup);
> +
> + return NULL;
> +}
> +EXPORT_SYMBOL_GPL(cgroup_next_descendant_pre);
[...]
--
Michal Hocko
SUSE Labs
^ permalink raw reply
* Re: [PATCH 3/9] cgroup: implement generic child / descendant walk macros
From: Tejun Heo @ 2012-11-07 17:01 UTC (permalink / raw)
To: Michal Hocko
Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, rjw-KKrjLPT3xs0,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <20121107165457.GD4131-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
Hello, Michal.
On Wed, Nov 07, 2012 at 05:54:57PM +0100, Michal Hocko wrote:
> > +struct cgroup *cgroup_next_descendant_pre(struct cgroup *pos,
> > + struct cgroup *cgroup)
> > +{
> > + struct cgroup *next;
> > +
> > + WARN_ON_ONCE(!rcu_read_lock_held());
> > +
> > + /* if first iteration, pretend we just visited @cgroup */
> > + if (!pos) {
> > + if (list_empty(&cgroup->children))
> > + return NULL;
> > + pos = cgroup;
> > + }
>
> Is there any specific reason why the root of the tree is excluded?
> This is bit impractical because you have to special case the root
> in the code.
Yeah, thought about including it but decided against it for two
reasons.
* To be consistent with cgroup_for_each_children() - it's a bit weird
for descendants to include self when children don't.
* Iteration root is likely to require different treatment anyway.
e.g. for cgroup_freezer, the root is updated to the specified config
while all the descendants inherit config from its immediate parent.
They are different.
Thanks.
--
tejun
^ permalink raw reply
* Re: [PATCH 1/9] cgroup: add cgroup_subsys->post_create()
From: Tejun Heo @ 2012-11-07 17:02 UTC (permalink / raw)
To: Michal Hocko
Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, rjw-KKrjLPT3xs0,
cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20121107152516.GA4131-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
Hello, Michal.
On Wed, Nov 07, 2012 at 04:25:16PM +0100, Michal Hocko wrote:
> > This patch adds ->post_create(). It's called after all ->create()
> > succeeded and the cgroup is linked into the generic cgroup hierarchy.
> > This plays the counterpart of ->pre_destroy().
>
> Hmm, I had to look at "cgroup_freezer: implement proper hierarchy
> support" to actually understand what is the callback good for. The above
> sounds as if the callback is needed when a controller wants to use
> the new iterators or when pre_destroy is defined.
>
> I think it would be helpful if the changelog described that the callback
> is needed when the controller keeps a mutable shared state for the
> hierarchy. For example memory controller doesn't have any such a strict
> requirement so we can safely use your new iterators without pre_destroy.
Hmm.... will try to explain it but I think it might be best to just
refer to the later patch for details. It's a bit tricky to explain.
Thanks.
--
tejun
^ permalink raw reply
* [PATCH 1/9 v2] cgroup: add cgroup_subsys->post_create()
From: Tejun Heo @ 2012-11-07 17:15 UTC (permalink / raw)
To: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
rjw-KKrjLPT3xs0
Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1351931915-1701-2-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Currently, there's no way for a controller to find out whether a new
cgroup finished all ->create() allocatinos successfully and is
considered "live" by cgroup.
This becomes a problem later when we add generic descendants walking
to cgroup which can be used by controllers as controllers don't have a
synchronization point where it can synchronize against new cgroups
appearing in such walks.
This patch adds ->post_create(). It's called after all ->create()
succeeded and the cgroup is linked into the generic cgroup hierarchy.
This plays the counterpart of ->pre_destroy().
When used in combination with the to-be-added generic descendant
iterators, ->post_create() can be used to implement reliable state
inheritance. It will be explained with the descendant iterators.
v2: Added a paragraph about its future use w/ descendant iterators per
Michal.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Acked-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Cc: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
---
If you can explain it better, please be my guest.
Thanks.
include/linux/cgroup.h | 1 +
kernel/cgroup.c | 12 ++++++++++--
2 files changed, 11 insertions(+), 2 deletions(-)
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -438,6 +438,7 @@ int cgroup_taskset_size(struct cgroup_ta
struct cgroup_subsys {
struct cgroup_subsys_state *(*create)(struct cgroup *cgrp);
+ void (*post_create)(struct cgroup *cgrp);
void (*pre_destroy)(struct cgroup *cgrp);
void (*destroy)(struct cgroup *cgrp);
int (*can_attach)(struct cgroup *cgrp, struct cgroup_taskset *tset);
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4060,10 +4060,15 @@ static long cgroup_create(struct cgroup
if (err < 0)
goto err_remove;
- /* each css holds a ref to the cgroup's dentry */
- for_each_subsys(root, ss)
+ for_each_subsys(root, ss) {
+ /* each css holds a ref to the cgroup's dentry */
dget(dentry);
+ /* creation succeeded, notify subsystems */
+ if (ss->post_create)
+ ss->post_create(cgrp);
+ }
+
/* The cgroup directory was pre-locked for us */
BUG_ON(!mutex_is_locked(&cgrp->dentry->d_inode->i_mutex));
@@ -4281,6 +4286,9 @@ static void __init cgroup_init_subsys(st
ss->active = 1;
+ if (ss->post_create)
+ ss->post_create(&ss->root->top_cgroup);
+
/* this function shouldn't be used with modular subsystems, since they
* need to register a subsys_id, among other things */
BUG_ON(ss->module);
^ permalink raw reply
* Re: [BUGFIX] PM: Fix active child counting when disabled and forbidden
From: Alan Stern @ 2012-11-07 17:17 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Huang Ying, linux-kernel, linux-pm
In-Reply-To: <2385996.Tx2Do9TUNJ@vostro.rjw.lan>
On Wed, 7 Nov 2012, Rafael J. Wysocki wrote:
> > The PCI subsystem assumes that
> > driverless devices are not in use, so they are disabled for runtime PM
> > and marked as suspended. This is not appropriate for VGA devices,
> > which can indeed be used without a driver.
> >
> > I'm not sure what the best solution is. Maybe we should Ying's
> > proposal a step farther:
> >
> > Make pm_runtime_set_suspended() fail if runtime PM is
> > forbidden.
> >
> > Make pm_runtime_forbid() call pm_runtime_set_active()
> > (and do a runtime resume of the parent) if disable_depth > 0.
>
> I'd prefer this one.
That wasn't meant to be a choice. The first item is close to what the
original patch did; I was suggesting that we should adopt all three
items.
If you adopt the second item but not the first then things won't work
when a driver is removed after power/control is set to "on".
> The callers of pm_runtime_forbid() may actually
> reasonably expect something like this to happen.
>
> > Make the PCI runtime-idle routine call
> > pm_runtime_set_suspended() if disable_depth > 0. Or maybe
> > do this for all devices, in the runtime PM core.
>
> That would mean calling it on every call to pm_runtime_put() and friends
> which seems to be a bit wasteful.
Not on every call; only when disable_depth > 0. We don't expect to see
idle calls very often under those circumstances.
Alan Stern
^ permalink raw reply
* Re: [PATCH 1/9 v2] cgroup: add cgroup_subsys->post_create()
From: Michal Hocko @ 2012-11-07 17:40 UTC (permalink / raw)
To: Tejun Heo
Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, rjw-KKrjLPT3xs0,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
Glauber Costa
In-Reply-To: <20121107171508.GF2660-9pTldWuhBndy/B6EtB590w@public.gmane.org>
On Wed 07-11-12 09:15:08, Tejun Heo wrote:
[...]
> This patch adds ->post_create(). It's called after all ->create()
> succeeded and the cgroup is linked into the generic cgroup hierarchy.
> This plays the counterpart of ->pre_destroy().
>
> When used in combination with the to-be-added generic descendant
> iterators, ->post_create() can be used to implement reliable state
> inheritance. It will be explained with the descendant iterators.
Thanks
--
Michal Hocko
SUSE Labs
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox