* [PATCH] opp: introduce library for device-specific OPPs
[not found] <[PATCH 1/4] OMAP: introduce OPP layer for device-specific OPPs>
@ 2010-09-17 1:29 ` Nishanth Menon
2010-09-17 13:41 ` Linus Walleij
` (3 more replies)
0 siblings, 4 replies; 34+ messages in thread
From: Nishanth Menon @ 2010-09-17 1:29 UTC (permalink / raw)
To: linux-arm-kernel
SOCs have a standard set of tuples consisting of frequency and
voltage pairs that the device will support per voltage domain. These
are called Operating Performance Points or OPPs. The actual
definitions of Operating Performance Points varies over silicon within the
same family of devices. For a specific domain, you can have a set of
{frequency, voltage} pairs. As the kernel boots and more information
is available, a set of these are activated based on the precise nature
of device the kernel boots up on. It is interesting to remember that
each IP which belongs to a voltage domain may define their own set of
OPPs on top of this.
To implement an OPP, some sort of power management support is necessary
hence this library enablement depends on CONFIG_PM, however this does
not fit into the core power framework as it is an independent library.
This is hence introduced under lib allowing all architectures to
selectively enable the feature based on thier capabilities.
Contributions include:
Sanjeev Premi for the initial concept:
http://patchwork.kernel.org/patch/50998/
Kevin Hilman for converting original design to device-based
Kevin Hilman and Paul Walmsey for cleaning up many of the function
abstractions, improvements and data structure handling
Romit Dasgupta for using enums instead of opp pointers
Thara Gopinath, Eduardo Valentin and Vishwanath BS for fixes and
cleanups.
Linus Walleij for recommending this layer be made generic for usage
in other architectures beyond OMAP and ARM.
Discussions and comments from:
http://marc.info/?l=linux-omap&m=126033945313269&w=2
http://marc.info/?l=linux-omap&m=125482970102327&w=2
http://marc.info/?t=125809247500002&r=1&w=2
http://marc.info/?l=linux-omap&m=126025973426007&w=2
http://marc.info/?t=128152609200064&r=1&w=2
incorporated.
Cc: Benoit Cousson <b-cousson@ti.com>
Cc: Madhusudhan Chikkature Rajashekar <madhu.cr@ti.com>
Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Roberto Granados Dorado <x0095451@ti.com>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Sergio Alberto Aguirre Rodriguez <saaguirre@ti.com>
Cc: Tero Kristo <Tero.Kristo@nokia.com>
Cc: Eduardo Valentin <eduardo.valentin@nokia.com>
Cc: Paul Walmsley <paul@pwsan.com>
Cc: Romit Dasgupta <romit@ti.com>
Cc: Sanjeev Premi <premi@ti.com>
Cc: Thara Gopinath <thara@ti.com>
Cc: Vishwanath BS <vishwanath.bs@ti.com>
Cc: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
---
minor notes:
a) Code rebased to linus's tree commit 03a7ab0
b) Looping in get-maintainter.pl list and lkml as per thread of discussion
http://marc.info/?t=128152609200064&r=1&w=2
Documentation/power/00-INDEX | 2 +
include/linux/opp.h | 136 +++++++++++++
kernel/power/Kconfig | 14 ++
lib/Makefile | 2 +
lib/opp.c | 440 ++++++++++++++++++++++++++++++++++++++++++
5 files changed, 594 insertions(+), 0 deletions(-)
create mode 100644 include/linux/opp.h
create mode 100644 lib/opp.c
diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
index fb742c2..45e9d4a 100644
--- a/Documentation/power/00-INDEX
+++ b/Documentation/power/00-INDEX
@@ -14,6 +14,8 @@ interface.txt
- Power management user interface in /sys/power
notifiers.txt
- Registering suspend notifiers in device drivers
+opp.txt
+ - Operating Performance Point library
pci.txt
- How the PCI Subsystem Does Power Management
pm_qos_interface.txt
diff --git a/include/linux/opp.h b/include/linux/opp.h
new file mode 100644
index 0000000..94a552b
--- /dev/null
+++ b/include/linux/opp.h
@@ -0,0 +1,136 @@
+/*
+ * Generic OPP Interface
+ *
+ * Copyright (C) 2009-2010 Texas Instruments Incorporated.
+ * Nishanth Menon
+ * Romit Dasgupta <romit@ti.com>
+ * Copyright (C) 2009 Deep Root Systems, LLC.
+ * Kevin Hilman
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef __ASM_OPP_H
+#define __ASM_OPP_H
+
+#include <linux/err.h>
+#include <linux/cpufreq.h>
+
+/**
+ * struct opp_def - Generic OPP Definition
+ * @freq: Frequency in hertz corresponding to this OPP
+ * @u_volt: Nominal voltage in microvolts corresponding to this OPP
+ * @enabled: True/false - is this OPP enabled/disabled by default
+ *
+ * SOCs have a standard set of tuples consisting of frequency and voltage
+ * pairs that the device will support per voltage domain. This is called
+ * Operating Performance Points or OPP. The actual definitions of Operating
+ * Performance Points varies over silicon within the same family of devices.
+ * For a specific domain, you can have a set of {frequency, voltage} pairs
+ * and this is denoted by an array of opp_def. As the kernel boots and more
+ * information is available, a set of these are activated based on the precise
+ * nature of device the kernel boots up on. It is interesting to remember that
+ * each IP which belongs to a voltage domain may define their own set of OPPs
+ * on top of this - but this is handled by the appropriate driver.
+ */
+struct opp_def {
+ unsigned long freq;
+ unsigned long u_volt;
+
+ bool enabled;
+};
+
+/*
+ * Initialization wrapper used to define an OPP.
+ * To point at the end of a terminator of a list of OPPs,
+ * use OPP_DEF(0, 0, 0)
+ */
+#define OPP_DEF(_enabled, _freq, _uv) \
+{ \
+ .enabled = _enabled, \
+ .freq = _freq, \
+ .u_volt = _uv, \
+}
+
+struct opp;
+
+#ifdef CONFIG_PM
+
+unsigned long opp_get_voltage(const struct opp *opp);
+
+unsigned long opp_get_freq(const struct opp *opp);
+
+int opp_get_opp_count(struct device *dev);
+
+struct opp *opp_find_freq_exact(struct device *dev, unsigned long freq,
+ bool enabled);
+
+struct opp *opp_find_freq_floor(struct device *dev, unsigned long *freq);
+
+struct opp *opp_find_freq_ceil(struct device *dev, unsigned long *freq);
+
+int opp_add(struct device *dev, const struct opp_def *opp_def);
+
+int opp_enable(struct opp *opp);
+
+int opp_disable(struct opp *opp);
+
+void opp_init_cpufreq_table(struct device *dev,
+ struct cpufreq_frequency_table **table);
+#else
+static inline unsigned long opp_get_voltage(const struct opp *opp)
+{
+ return 0;
+}
+
+static inline unsigned long opp_get_freq(const struct opp *opp)
+{
+ return 0;
+}
+
+static inline int opp_get_opp_count(struct device *dev)
+{
+ return 0;
+}
+
+static inline struct opp *opp_find_freq_exact(struct device *dev,
+ unsigned long freq, bool enabled)
+{
+ return ERR_PTR(-EINVAL);
+}
+
+static inline struct opp *opp_find_freq_floor(struct device *dev,
+ unsigned long *freq)
+{
+ return ERR_PTR(-EINVAL);
+}
+
+static inline struct opp *opp_find_freq_ceil(struct device *dev,
+ unsigned long *freq)
+{
+ return ERR_PTR(-EINVAL);
+}
+
+static inline int opp_add(struct device *dev, const struct opp_def *opp_def)
+{
+ return ERR_PTR(-EINVAL);
+}
+
+static inline int opp_enable(struct opp *opp)
+{
+ return 0;
+}
+
+static inline int opp_disable(struct opp *opp)
+{
+ return 0;
+}
+
+static inline void opp_init_cpufreq_table(struct device *dev,
+ struct cpufreq_frequency_table **table)
+{
+}
+
+#endif /* CONFIG_PM */
+#endif /* __ASM_OPP_H */
diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
index ca6066a..634eab6 100644
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -242,3 +242,17 @@ config PM_OPS
bool
depends on PM_SLEEP || PM_RUNTIME
default y
+
+config PM_OPP
+ bool "Enable Operating Performance Point(OPP) Layer library"
+ depends on PM
+ ---help---
+ SOCs have a standard set of tuples consisting of frequency and
+ voltage pairs that the device will support per voltage domain. This
+ is called Operating Performance Point or OPP. The actual definitions
+ of OPP varies over silicon within the same family of devices.
+
+ OPP layer organizes the data internally using device pointers
+ representing individual voltage domains and provides SOC
+ implementations a ready to use framework to manage OPPs.
+ For more information, read <file:Documentation/power/opp.txt>
diff --git a/lib/Makefile b/lib/Makefile
index e6a3763..0114fcf 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -88,6 +88,8 @@ obj-$(CONFIG_IOMMU_HELPER) += iommu-helper.o
obj-$(CONFIG_FAULT_INJECTION) += fault-inject.o
obj-$(CONFIG_CPU_NOTIFIER_ERROR_INJECT) += cpu-notifier-error-inject.o
+obj-$(CONFIG_PM_OPP) += opp.o
+
lib-$(CONFIG_GENERIC_BUG) += bug.o
obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
diff --git a/lib/opp.c b/lib/opp.c
new file mode 100644
index 0000000..650c8c3
--- /dev/null
+++ b/lib/opp.c
@@ -0,0 +1,440 @@
+/*
+ * Generic OPP Interface
+ *
+ * Copyright (C) 2009-2010 Texas Instruments Incorporated.
+ * Nishanth Menon
+ * Romit Dasgupta <romit@ti.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/cpufreq.h>
+#include <linux/list.h>
+#include <linux/opp.h>
+
+/**
+ * struct opp - Generic OPP description structure
+ * @node: list node
+ * @enabled: true/false - marking this OPP as enabled/disabled
+ * @rate: Frequency in hertz
+ * @u_volt: Nominal voltage in microvolts corresponding to this OPP
+ * @dev_opp: contains the device_opp struct
+ *
+ * This structure stores the OPP information for a given domain.
+ */
+struct opp {
+ struct list_head node;
+
+ bool enabled;
+ unsigned long rate;
+ unsigned long u_volt;
+
+ struct device_opp *dev_opp;
+};
+
+/**
+ * struct device_opp - Device opp structure
+ * @node: list node
+ * @dev: device handle
+ * @opp_list: list of opps
+ * @opp_count: num opps
+ * @enabled_opp_count: how many opps are actually enabled
+ *
+ * This is an internal datastructure maintaining the link to
+ * opps attached to a domain device. This structure is not
+ * meant to be shared with users as it private to opp layer.
+ */
+struct device_opp {
+ struct list_head node;
+
+ struct device *dev;
+
+ struct list_head opp_list;
+ u32 opp_count;
+ u32 enabled_opp_count;
+};
+
+static LIST_HEAD(dev_opp_list);
+
+/**
+ * find_device_opp() - find device_opp struct using device pointer
+ * @dev: device pointer used to lookup device OPPs
+ *
+ * Search list of device OPPs for one containing matching device.
+ *
+ * Returns pointer to 'struct device_opp' if found, otherwise -ENODEV or
+ * -EINVAL based on type of error.
+ */
+static struct device_opp *find_device_opp(struct device *dev)
+{
+ struct device_opp *tmp_dev_opp, *dev_opp = ERR_PTR(-ENODEV);
+
+ if (unlikely(!dev || IS_ERR(dev))) {
+ pr_err("%s: Invalid parameters being passed\n", __func__);
+ return ERR_PTR(-EINVAL);
+ }
+
+ list_for_each_entry(tmp_dev_opp, &dev_opp_list, node) {
+ if (tmp_dev_opp->dev == dev) {
+ dev_opp = tmp_dev_opp;
+ break;
+ }
+ }
+
+ return dev_opp;
+}
+
+/**
+ * opp_get_voltage() - Gets the voltage corresponding to an opp
+ * @opp: opp for which voltage has to be returned for
+ *
+ * Return voltage in micro volt corresponding to the opp, else
+ * return 0
+ */
+unsigned long opp_get_voltage(const struct opp *opp)
+{
+ if (unlikely(!opp || IS_ERR(opp)) || !opp->enabled) {
+ pr_err("%s: Invalid parameters being passed\n", __func__);
+ return 0;
+ }
+
+ return opp->u_volt;
+}
+
+/**
+ * opp_get_freq() - Gets the frequency corresponding to an opp
+ * @opp: opp for which frequency has to be returned for
+ *
+ * Return frequency in hertz corresponding to the opp, else
+ * return 0
+ */
+unsigned long opp_get_freq(const struct opp *opp)
+{
+ if (unlikely(!opp || IS_ERR(opp)) || !opp->enabled) {
+ pr_err("%s: Invalid parameters being passed\n", __func__);
+ return 0;
+ }
+
+ return opp->rate;
+}
+
+/**
+ * opp_get_opp_count() - Get number of opps enabled in the opp list
+ * @dev: device for which we do this operation
+ *
+ * This functions returns the number of opps if there are any OPPs enabled,
+ * else returns corresponding error value.
+ */
+int opp_get_opp_count(struct device *dev)
+{
+ struct device_opp *dev_opp;
+
+ dev_opp = find_device_opp(dev);
+ if (IS_ERR(dev_opp))
+ return -ENODEV;
+
+ return dev_opp->enabled_opp_count;
+}
+
+/**
+ * opp_find_freq_exact() - search for an exact frequency
+ * @dev: device for which we do this operation
+ * @freq: frequency to search for
+ * @enabled: enabled/disabled OPP to search for
+ *
+ * Searches for exact match in the opp list and returns handle to the matching
+ * opp if found, else returns ERR_PTR in case of error and should be handled
+ * using IS_ERR.
+ *
+ * Note: enabled is a modifier for the search. if enabled=true, then the match
+ * is for exact matching frequency and is enabled. if false, the match is for
+ * exact frequency which is disabled.
+ */
+struct opp *opp_find_freq_exact(struct device *dev,
+ unsigned long freq, bool enabled)
+{
+ struct device_opp *dev_opp;
+ struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
+
+ dev_opp = find_device_opp(dev);
+ if (IS_ERR(dev_opp))
+ return opp;
+
+ list_for_each_entry(temp_opp, &dev_opp->opp_list, node) {
+ if (temp_opp->enabled && temp_opp->rate == freq) {
+ opp = temp_opp;
+ break;
+ }
+ }
+
+ return opp;
+}
+
+/**
+ * opp_find_freq_ceil() - Search for an rounded ceil freq
+ * @dev: device for which we do this operation
+ * @freq: Start frequency
+ *
+ * Search for the matching ceil *enabled* OPP from a starting freq
+ * for a domain.
+ *
+ * Returns *opp and *freq is populated with the match, else
+ * returns NULL opp if no match, else returns ERR_PTR in case of error.
+ *
+ * Example usages:
+ * * find match/next highest available frequency *
+ * freq = 350000;
+ * opp = opp_find_freq_ceil(dev, &freq))
+ * if (IS_ERR(opp))
+ * pr_err("unable to find a higher frequency\n");
+ * else
+ * pr_info("match freq = %ld\n", freq);
+ *
+ * * print all supported frequencies in ascending order *
+ * freq = 0; * Search for the lowest enabled frequency *
+ * while (!IS_ERR(opp = opp_find_freq_ceil(OPP_MPU, &freq)) {
+ * pr_info("freq = %ld\n", freq);
+ * freq++; * for next higher match *
+ * }
+ */
+struct opp *opp_find_freq_ceil(struct device *dev, unsigned long *freq)
+{
+ struct device_opp *dev_opp;
+ struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
+
+ dev_opp = find_device_opp(dev);
+ if (IS_ERR(dev_opp))
+ return opp;
+
+ list_for_each_entry(temp_opp, &dev_opp->opp_list, node) {
+ if (temp_opp->enabled && temp_opp->rate >= *freq) {
+ opp = temp_opp;
+ *freq = opp->rate;
+ break;
+ }
+ }
+
+ return opp;
+}
+
+/**
+ * opp_find_freq_floor() - Search for an rounded floor freq
+ * @dev: device for which we do this operation
+ * @freq: Start frequency
+ *
+ * Search for the matching floor *enabled* OPP from a starting freq
+ * for a domain.
+ *
+ * Returns *opp and *freq is populated with the next match, else
+ * returns NULL opp if no match, else returns ERR_PTR in case of error.
+ *
+ * Example usages:
+ * * find match/next lowest available frequency
+ * freq = 350000;
+ * opp = opp_find_freq_floor(dev, &freq)))
+ * if (IS_ERR(opp))
+ * pr_err ("unable to find a lower frequency\n");
+ * else
+ * pr_info("match freq = %ld\n", freq);
+ *
+ * * print all supported frequencies in descending order *
+ * freq = ULONG_MAX; * search highest enabled frequency *
+ * while (!IS_ERR(opp = opp_find_freq_floor(OPP_MPU, &freq)) {
+ * pr_info("freq = %ld\n", freq);
+ * freq--; * for next lower match *
+ * }
+ */
+struct opp *opp_find_freq_floor(struct device *dev, unsigned long *freq)
+{
+ struct device_opp *dev_opp;
+ struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
+
+ dev_opp = find_device_opp(dev);
+ if (IS_ERR(dev_opp))
+ return opp;
+
+ list_for_each_entry_reverse(temp_opp, &dev_opp->opp_list, node) {
+ if (temp_opp->enabled && temp_opp->rate <= *freq) {
+ opp = temp_opp;
+ *freq = opp->rate;
+ break;
+ }
+ }
+
+ return opp;
+}
+
+/* wrapper to reuse converting opp_def to opp struct */
+static void opp_populate(struct opp *opp,
+ const struct opp_def *opp_def)
+{
+ opp->rate = opp_def->freq;
+ opp->enabled = opp_def->enabled;
+ opp->u_volt = opp_def->u_volt;
+}
+
+/**
+ * opp_add() - Add an OPP table from a table definitions
+ * @dev: device for which we do this operation
+ * @opp_def: opp_def to describe the OPP which we want to add.
+ *
+ * This function adds an opp definition to the opp list and returns status.
+ */
+int opp_add(struct device *dev, const struct opp_def *opp_def)
+{
+ struct device_opp *tmp_dev_opp, *dev_opp = NULL;
+ struct opp *opp, *new_opp;
+ struct list_head *head;
+
+ /* Check for existing list for 'dev' */
+ list_for_each_entry(tmp_dev_opp, &dev_opp_list, node) {
+ if (dev == tmp_dev_opp->dev) {
+ dev_opp = tmp_dev_opp;
+ break;
+ }
+ }
+
+ if (!dev_opp) {
+ /* Allocate a new device OPP table */
+ dev_opp = kzalloc(sizeof(struct device_opp), GFP_KERNEL);
+ if (!dev_opp) {
+ pr_warning("%s: unable to allocate device struct\n",
+ __func__);
+ return -ENOMEM;
+ }
+
+ dev_opp->dev = dev;
+ INIT_LIST_HEAD(&dev_opp->opp_list);
+
+ list_add(&dev_opp->node, &dev_opp_list);
+ }
+
+ /* allocate new OPP node */
+ new_opp = kzalloc(sizeof(struct opp), GFP_KERNEL);
+ if (!new_opp) {
+ if (list_empty(&dev_opp->opp_list)) {
+ list_del(&dev_opp->node);
+ kfree(dev_opp);
+ }
+ pr_warning("%s: unable to allocate new opp node\n",
+ __func__);
+ return -ENOMEM;
+ }
+ opp_populate(new_opp, opp_def);
+
+ /* Insert new OPP in order of increasing frequency */
+ head = &dev_opp->opp_list;
+ list_for_each_entry_reverse(opp, &dev_opp->opp_list, node) {
+ if (new_opp->rate >= opp->rate) {
+ head = &opp->node;
+ break;
+ }
+ }
+ list_add(&new_opp->node, head);
+ dev_opp->opp_count++;
+ if (new_opp->enabled)
+ dev_opp->enabled_opp_count++;
+
+ return 0;
+}
+
+/**
+ * opp_enable() - Enable a specific OPP
+ * @opp: Pointer to opp
+ *
+ * Enables a provided opp. If the operation is valid, this returns 0, else the
+ * corresponding error value.
+ *
+ * OPP used here is from the the opp_is_valid/opp_has_freq or other search
+ * functions
+ */
+int opp_enable(struct opp *opp)
+{
+ if (unlikely(!opp || IS_ERR(opp))) {
+ pr_err("%s: Invalid parameters being passed\n", __func__);
+ return -EINVAL;
+ }
+
+ if (!opp->enabled && opp->dev_opp)
+ opp->dev_opp->enabled_opp_count++;
+
+ opp->enabled = true;
+
+ return 0;
+}
+
+/**
+ * opp_disable() - Disable a specific OPP
+ * @opp: Pointer to opp
+ *
+ * Disables a provided opp. If the operation is valid, this returns 0, else the
+ * corresponding error value.
+ *
+ * OPP used here is from the the opp_is_valid/opp_has_freq or other search
+ * functions
+ */
+int opp_disable(struct opp *opp)
+{
+ if (unlikely(!opp || IS_ERR(opp))) {
+ pr_err("%s: Invalid parameters being passed\n", __func__);
+ return -EINVAL;
+ }
+
+ if (opp->enabled && opp->dev_opp)
+ opp->dev_opp->enabled_opp_count--;
+
+ opp->enabled = false;
+
+ return 0;
+}
+
+/**
+ * opp_init_cpufreq_table() - create a cpufreq table for a domain
+ * @dev: device for which we do this operation
+ * @table: Cpufreq table returned back to caller
+ *
+ * Generate a cpufreq table for a provided domain - this assumes that the
+ * opp list is already initialized and ready for usage
+ */
+void opp_init_cpufreq_table(struct device *dev,
+ struct cpufreq_frequency_table **table)
+{
+ struct device_opp *dev_opp;
+ struct opp *opp;
+ struct cpufreq_frequency_table *freq_table;
+ int i = 0;
+
+ dev_opp = find_device_opp(dev);
+ if (IS_ERR(dev_opp)) {
+ pr_warning("%s: unable to find device\n", __func__);
+ return;
+ }
+
+ freq_table = kzalloc(sizeof(struct cpufreq_frequency_table) *
+ (dev_opp->enabled_opp_count + 1), GFP_ATOMIC);
+ if (!freq_table) {
+ pr_warning("%s: failed to allocate frequency table\n",
+ __func__);
+ return;
+ }
+
+ list_for_each_entry(opp, &dev_opp->opp_list, node) {
+ if (opp->enabled) {
+ freq_table[i].index = i;
+ freq_table[i].frequency = opp->rate / 1000;
+ i++;
+ }
+ }
+
+ freq_table[i].index = i;
+ freq_table[i].frequency = CPUFREQ_TABLE_END;
+
+ *table = &freq_table[0];
+}
--
1.6.3.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 1:29 ` [PATCH] opp: introduce library for device-specific OPPs Nishanth Menon
@ 2010-09-17 13:41 ` Linus Walleij
2010-09-17 15:05 ` Nishanth Menon
2010-09-17 14:09 ` Aguirre, Sergio
` (2 subsequent siblings)
3 siblings, 1 reply; 34+ messages in thread
From: Linus Walleij @ 2010-09-17 13:41 UTC (permalink / raw)
To: linux-arm-kernel
2010/9/17 Nishanth Menon <nm@ti.com>:
> diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
> index fb742c2..45e9d4a 100644
> --- a/Documentation/power/00-INDEX
> +++ b/Documentation/power/00-INDEX
> @@ -14,6 +14,8 @@ interface.txt
> ? ? ? ?- Power management user interface in /sys/power
> ?notifiers.txt
> ? ? ? ?- Registering suspend notifiers in device drivers
> +opp.txt
> + ? ? ? - Operating Performance Point library
> ?pci.txt
> ? ? ? ?- How the PCI Subsystem Does Power Management
Hm you add the documentation to the index but not the documentation
itself (easy slip) would you mind posting it?
Yours,
Linus Walleij
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 1:29 ` [PATCH] opp: introduce library for device-specific OPPs Nishanth Menon
2010-09-17 13:41 ` Linus Walleij
@ 2010-09-17 14:09 ` Aguirre, Sergio
2010-09-17 15:30 ` Nishanth Menon
2010-09-17 15:36 ` [linux-pm] " Mark Brown
2010-09-17 19:19 ` Andrew Morton
3 siblings, 1 reply; 34+ messages in thread
From: Aguirre, Sergio @ 2010-09-17 14:09 UTC (permalink / raw)
To: linux-arm-kernel
Hi Nishanth,
> -----Original Message-----
> From: Menon, Nishanth
> Sent: Thursday, September 16, 2010 8:30 PM
> To: linux-arm; lkml
> Cc: Len Brown; Pavel Machek; Rafael J. Wysocki; Randy Dunlap; Jesse
> Barnes; Matthew Garrett; H. Peter Anvin; Andrew Morton; Benjamin
> Herrenschmidt; Martin K. Petersen; Andi Kleen; linux-pm; linux-doc; linux-
> omap; Menon, Nishanth; Cousson, Benoit; Chikkature Rajashekar,
> Madhusudhan; Phil Carmody; Granados Dorado, Roberto; Shilimkar, Santosh;
> Aguirre, Sergio; Tero Kristo; Eduardo Valentin; Paul Walmsley; Romit
> Dasgupta; Premi, Sanjeev; Gopinath, Thara; Sripathy, Vishwanath; Linus
> Walleij; Kevin Hilman
> Subject: [PATCH] opp: introduce library for device-specific OPPs
>
> SOCs have a standard set of tuples consisting of frequency and
> voltage pairs that the device will support per voltage domain. These
> are called Operating Performance Points or OPPs. The actual
> definitions of Operating Performance Points varies over silicon within the
> same family of devices. For a specific domain, you can have a set of
> {frequency, voltage} pairs. As the kernel boots and more information
> is available, a set of these are activated based on the precise nature
> of device the kernel boots up on. It is interesting to remember that
> each IP which belongs to a voltage domain may define their own set of
> OPPs on top of this.
>
> To implement an OPP, some sort of power management support is necessary
> hence this library enablement depends on CONFIG_PM, however this does
> not fit into the core power framework as it is an independent library.
> This is hence introduced under lib allowing all architectures to
> selectively enable the feature based on thier capabilities.
>
> Contributions include:
> Sanjeev Premi for the initial concept:
> http://patchwork.kernel.org/patch/50998/
> Kevin Hilman for converting original design to device-based
> Kevin Hilman and Paul Walmsey for cleaning up many of the function
> abstractions, improvements and data structure handling
> Romit Dasgupta for using enums instead of opp pointers
> Thara Gopinath, Eduardo Valentin and Vishwanath BS for fixes and
> cleanups.
> Linus Walleij for recommending this layer be made generic for usage
> in other architectures beyond OMAP and ARM.
>
> Discussions and comments from:
> http://marc.info/?l=linux-omap&m=126033945313269&w=2
> http://marc.info/?l=linux-omap&m=125482970102327&w=2
> http://marc.info/?t=125809247500002&r=1&w=2
> http://marc.info/?l=linux-omap&m=126025973426007&w=2
> http://marc.info/?t=128152609200064&r=1&w=2
> incorporated.
>
> Cc: Benoit Cousson <b-cousson@ti.com>
> Cc: Madhusudhan Chikkature Rajashekar <madhu.cr@ti.com>
> Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
> Cc: Roberto Granados Dorado <x0095451@ti.com>
> Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
> Cc: Sergio Alberto Aguirre Rodriguez <saaguirre@ti.com>
> Cc: Tero Kristo <Tero.Kristo@nokia.com>
> Cc: Eduardo Valentin <eduardo.valentin@nokia.com>
> Cc: Paul Walmsley <paul@pwsan.com>
> Cc: Romit Dasgupta <romit@ti.com>
> Cc: Sanjeev Premi <premi@ti.com>
> Cc: Thara Gopinath <thara@ti.com>
> Cc: Vishwanath BS <vishwanath.bs@ti.com>
> Cc: Linus Walleij <linus.walleij@stericsson.com>
>
> Signed-off-by: Nishanth Menon <nm@ti.com>
> Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
> ---
> minor notes:
> a) Code rebased to linus's tree commit 03a7ab0
> b) Looping in get-maintainter.pl list and lkml as per thread of discussion
> http://marc.info/?t=128152609200064&r=1&w=2
>
> Documentation/power/00-INDEX | 2 +
> include/linux/opp.h | 136 +++++++++++++
> kernel/power/Kconfig | 14 ++
> lib/Makefile | 2 +
> lib/opp.c | 440
> ++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 594 insertions(+), 0 deletions(-)
> create mode 100644 include/linux/opp.h
> create mode 100644 lib/opp.c
>
> diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
> index fb742c2..45e9d4a 100644
> --- a/Documentation/power/00-INDEX
> +++ b/Documentation/power/00-INDEX
> @@ -14,6 +14,8 @@ interface.txt
> - Power management user interface in /sys/power
> notifiers.txt
> - Registering suspend notifiers in device drivers
> +opp.txt
> + - Operating Performance Point library
> pci.txt
> - How the PCI Subsystem Does Power Management
> pm_qos_interface.txt
> diff --git a/include/linux/opp.h b/include/linux/opp.h
> new file mode 100644
> index 0000000..94a552b
> --- /dev/null
> +++ b/include/linux/opp.h
> @@ -0,0 +1,136 @@
> +/*
> + * Generic OPP Interface
> + *
> + * Copyright (C) 2009-2010 Texas Instruments Incorporated.
> + * Nishanth Menon
> + * Romit Dasgupta <romit@ti.com>
> + * Copyright (C) 2009 Deep Root Systems, LLC.
> + * Kevin Hilman
Just curious why only Romit's e-mail address is there, and not Kevin's or
yours...
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#ifndef __ASM_OPP_H
> +#define __ASM_OPP_H
> +
> +#include <linux/err.h>
> +#include <linux/cpufreq.h>
> +
> +/**
> + * struct opp_def - Generic OPP Definition
> + * @freq: Frequency in hertz corresponding to this OPP
> + * @u_volt: Nominal voltage in microvolts corresponding to this OPP
> + * @enabled: True/false - is this OPP enabled/disabled by default
> + *
> + * SOCs have a standard set of tuples consisting of frequency and voltage
> + * pairs that the device will support per voltage domain. This is called
> + * Operating Performance Points or OPP. The actual definitions of
> Operating
> + * Performance Points varies over silicon within the same family of
> devices.
> + * For a specific domain, you can have a set of {frequency, voltage}
> pairs
> + * and this is denoted by an array of opp_def. As the kernel boots and
> more
> + * information is available, a set of these are activated based on the
> precise
> + * nature of device the kernel boots up on. It is interesting to remember
> that
> + * each IP which belongs to a voltage domain may define their own set of
> OPPs
> + * on top of this - but this is handled by the appropriate driver.
> + */
> +struct opp_def {
> + unsigned long freq;
> + unsigned long u_volt;
> +
> + bool enabled;
> +};
> +
> +/*
> + * Initialization wrapper used to define an OPP.
> + * To point at the end of a terminator of a list of OPPs,
> + * use OPP_DEF(0, 0, 0)
> + */
> +#define OPP_DEF(_enabled, _freq, _uv) \
> +{ \
> + .enabled = _enabled, \
> + .freq = _freq, \
> + .u_volt = _uv, \
> +}
> +
> +struct opp;
> +
> +#ifdef CONFIG_PM
> +
> +unsigned long opp_get_voltage(const struct opp *opp);
> +
> +unsigned long opp_get_freq(const struct opp *opp);
> +
> +int opp_get_opp_count(struct device *dev);
> +
> +struct opp *opp_find_freq_exact(struct device *dev, unsigned long freq,
> + bool enabled);
> +
> +struct opp *opp_find_freq_floor(struct device *dev, unsigned long *freq);
> +
> +struct opp *opp_find_freq_ceil(struct device *dev, unsigned long *freq);
> +
> +int opp_add(struct device *dev, const struct opp_def *opp_def);
> +
> +int opp_enable(struct opp *opp);
> +
> +int opp_disable(struct opp *opp);
> +
> +void opp_init_cpufreq_table(struct device *dev,
> + struct cpufreq_frequency_table **table);
> +#else
> +static inline unsigned long opp_get_voltage(const struct opp *opp)
> +{
> + return 0;
> +}
> +
> +static inline unsigned long opp_get_freq(const struct opp *opp)
> +{
> + return 0;
> +}
> +
> +static inline int opp_get_opp_count(struct device *dev)
> +{
> + return 0;
> +}
> +
> +static inline struct opp *opp_find_freq_exact(struct device *dev,
> + unsigned long freq, bool enabled)
> +{
> + return ERR_PTR(-EINVAL);
> +}
> +
> +static inline struct opp *opp_find_freq_floor(struct device *dev,
> + unsigned long *freq)
> +{
> + return ERR_PTR(-EINVAL);
> +}
> +
> +static inline struct opp *opp_find_freq_ceil(struct device *dev,
> + unsigned long *freq)
> +{
> + return ERR_PTR(-EINVAL);
> +}
> +
> +static inline int opp_add(struct device *dev, const struct opp_def
> *opp_def)
> +{
> + return ERR_PTR(-EINVAL);
> +}
> +
> +static inline int opp_enable(struct opp *opp)
> +{
> + return 0;
> +}
> +
> +static inline int opp_disable(struct opp *opp)
> +{
> + return 0;
> +}
> +
> +static inline void opp_init_cpufreq_table(struct device *dev,
> + struct cpufreq_frequency_table **table)
> +{
> +}
> +
> +#endif /* CONFIG_PM */
> +#endif /* __ASM_OPP_H */
> diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
> index ca6066a..634eab6 100644
> --- a/kernel/power/Kconfig
> +++ b/kernel/power/Kconfig
> @@ -242,3 +242,17 @@ config PM_OPS
> bool
> depends on PM_SLEEP || PM_RUNTIME
> default y
> +
> +config PM_OPP
> + bool "Enable Operating Performance Point(OPP) Layer library"
> + depends on PM
> + ---help---
> + SOCs have a standard set of tuples consisting of frequency and
> + voltage pairs that the device will support per voltage domain.
> This
> + is called Operating Performance Point or OPP. The actual
> definitions
> + of OPP varies over silicon within the same family of devices.
> +
> + OPP layer organizes the data internally using device pointers
> + representing individual voltage domains and provides SOC
> + implementations a ready to use framework to manage OPPs.
> + For more information, read <file:Documentation/power/opp.txt>
> diff --git a/lib/Makefile b/lib/Makefile
> index e6a3763..0114fcf 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -88,6 +88,8 @@ obj-$(CONFIG_IOMMU_HELPER) += iommu-helper.o
> obj-$(CONFIG_FAULT_INJECTION) += fault-inject.o
> obj-$(CONFIG_CPU_NOTIFIER_ERROR_INJECT) += cpu-notifier-error-inject.o
>
> +obj-$(CONFIG_PM_OPP) += opp.o
> +
> lib-$(CONFIG_GENERIC_BUG) += bug.o
>
> obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
> diff --git a/lib/opp.c b/lib/opp.c
> new file mode 100644
> index 0000000..650c8c3
> --- /dev/null
> +++ b/lib/opp.c
> @@ -0,0 +1,440 @@
> +/*
> + * Generic OPP Interface
> + *
> + * Copyright (C) 2009-2010 Texas Instruments Incorporated.
> + * Nishanth Menon
> + * Romit Dasgupta <romit@ti.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/err.h>
> +#include <linux/init.h>
> +#include <linux/slab.h>
> +#include <linux/cpufreq.h>
> +#include <linux/list.h>
> +#include <linux/opp.h>
> +
> +/**
> + * struct opp - Generic OPP description structure
> + * @node: list node
> + * @enabled: true/false - marking this OPP as enabled/disabled
> + * @rate: Frequency in hertz
> + * @u_volt: Nominal voltage in microvolts corresponding to this OPP
> + * @dev_opp: contains the device_opp struct
> + *
> + * This structure stores the OPP information for a given domain.
> + */
> +struct opp {
> + struct list_head node;
> +
> + bool enabled;
> + unsigned long rate;
> + unsigned long u_volt;
> +
> + struct device_opp *dev_opp;
> +};
> +
> +/**
> + * struct device_opp - Device opp structure
> + * @node: list node
> + * @dev: device handle
> + * @opp_list: list of opps
> + * @opp_count: num opps
> + * @enabled_opp_count: how many opps are actually enabled
> + *
> + * This is an internal datastructure maintaining the link to
> + * opps attached to a domain device. This structure is not
> + * meant to be shared with users as it private to opp layer.
"... as it is private ..."
> + */
> +struct device_opp {
> + struct list_head node;
> +
> + struct device *dev;
> +
> + struct list_head opp_list;
> + u32 opp_count;
> + u32 enabled_opp_count;
> +};
> +
> +static LIST_HEAD(dev_opp_list);
> +
> +/**
> + * find_device_opp() - find device_opp struct using device pointer
> + * @dev: device pointer used to lookup device OPPs
> + *
> + * Search list of device OPPs for one containing matching device.
> + *
> + * Returns pointer to 'struct device_opp' if found, otherwise -ENODEV or
> + * -EINVAL based on type of error.
> + */
> +static struct device_opp *find_device_opp(struct device *dev)
> +{
> + struct device_opp *tmp_dev_opp, *dev_opp = ERR_PTR(-ENODEV);
> +
> + if (unlikely(!dev || IS_ERR(dev))) {
> + pr_err("%s: Invalid parameters being passed\n", __func__);
> + return ERR_PTR(-EINVAL);
> + }
> +
> + list_for_each_entry(tmp_dev_opp, &dev_opp_list, node) {
> + if (tmp_dev_opp->dev == dev) {
> + dev_opp = tmp_dev_opp;
> + break;
> + }
> + }
> +
> + return dev_opp;
> +}
> +
> +/**
> + * opp_get_voltage() - Gets the voltage corresponding to an opp
> + * @opp: opp for which voltage has to be returned for
> + *
> + * Return voltage in micro volt corresponding to the opp, else
> + * return 0
> + */
> +unsigned long opp_get_voltage(const struct opp *opp)
> +{
> + if (unlikely(!opp || IS_ERR(opp)) || !opp->enabled) {
> + pr_err("%s: Invalid parameters being passed\n", __func__);
> + return 0;
Shouldn't the case for !opp->enabled just return 0, without error reporting.
IMHO, having an OPP disabled is not really a bug in params, and returning this, will misinform the user.. I think.
What do you think?
> + }
> +
> + return opp->u_volt;
> +}
> +
> +/**
> + * opp_get_freq() - Gets the frequency corresponding to an opp
> + * @opp: opp for which frequency has to be returned for
> + *
> + * Return frequency in hertz corresponding to the opp, else
> + * return 0
> + */
> +unsigned long opp_get_freq(const struct opp *opp)
> +{
> + if (unlikely(!opp || IS_ERR(opp)) || !opp->enabled) {
> + pr_err("%s: Invalid parameters being passed\n", __func__);
> + return 0;
Similarly here.
> + }
> +
> + return opp->rate;
> +}
> +
> +/**
> + * opp_get_opp_count() - Get number of opps enabled in the opp list
> + * @dev: device for which we do this operation
> + *
> + * This functions returns the number of opps if there are any OPPs
"This function returns.."
> enabled,
> + * else returns corresponding error value.
> + */
> +int opp_get_opp_count(struct device *dev)
> +{
> + struct device_opp *dev_opp;
> +
> + dev_opp = find_device_opp(dev);
> + if (IS_ERR(dev_opp))
> + return -ENODEV;
> +
> + return dev_opp->enabled_opp_count;
> +}
> +
> +/**
> + * opp_find_freq_exact() - search for an exact frequency
> + * @dev: device for which we do this operation
> + * @freq: frequency to search for
> + * @enabled: enabled/disabled OPP to search for
> + *
> + * Searches for exact match in the opp list and returns handle to the
> matching
> + * opp if found, else returns ERR_PTR in case of error and should be
> handled
> + * using IS_ERR.
> + *
> + * Note: enabled is a modifier for the search. if enabled=true, then the
> match
> + * is for exact matching frequency and is enabled. if false, the match is
> for
> + * exact frequency which is disabled.
That enabled var is ignored, which makes this explanation mismatch with the code behavior.
Am I missing something?
> + */
> +struct opp *opp_find_freq_exact(struct device *dev,
> + unsigned long freq, bool enabled)
> +{
> + struct device_opp *dev_opp;
> + struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
> +
> + dev_opp = find_device_opp(dev);
> + if (IS_ERR(dev_opp))
> + return opp;
> +
> + list_for_each_entry(temp_opp, &dev_opp->opp_list, node) {
> + if (temp_opp->enabled && temp_opp->rate == freq) {
> + opp = temp_opp;
> + break;
> + }
> + }
> +
> + return opp;
> +}
> +
> +/**
> + * opp_find_freq_ceil() - Search for an rounded ceil freq
> + * @dev: device for which we do this operation
> + * @freq: Start frequency
> + *
> + * Search for the matching ceil *enabled* OPP from a starting freq
> + * for a domain.
> + *
> + * Returns *opp and *freq is populated with the match, else
> + * returns NULL opp if no match, else returns ERR_PTR in case of error.
By looking at the code below, I don't think returning NULL is a possibility.
> + *
> + * Example usages:
> + * * find match/next highest available frequency *
> + * freq = 350000;
> + * opp = opp_find_freq_ceil(dev, &freq))
> + * if (IS_ERR(opp))
> + * pr_err("unable to find a higher frequency\n");
> + * else
> + * pr_info("match freq = %ld\n", freq);
> + *
> + * * print all supported frequencies in ascending order *
> + * freq = 0; * Search for the lowest enabled frequency *
> + * while (!IS_ERR(opp = opp_find_freq_ceil(OPP_MPU, &freq)) {
> + * pr_info("freq = %ld\n", freq);
> + * freq++; * for next higher match *
> + * }
> + */
> +struct opp *opp_find_freq_ceil(struct device *dev, unsigned long *freq)
> +{
> + struct device_opp *dev_opp;
> + struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
> +
> + dev_opp = find_device_opp(dev);
> + if (IS_ERR(dev_opp))
> + return opp;
> +
> + list_for_each_entry(temp_opp, &dev_opp->opp_list, node) {
> + if (temp_opp->enabled && temp_opp->rate >= *freq) {
What if freq contains a NULL pointer?
> + opp = temp_opp;
> + *freq = opp->rate;
> + break;
> + }
> + }
> +
> + return opp;
> +}
> +
> +/**
> + * opp_find_freq_floor() - Search for an rounded floor freq
"... Search for a rounded floor ..."
> + * @dev: device for which we do this operation
> + * @freq: Start frequency
> + *
> + * Search for the matching floor *enabled* OPP from a starting freq
> + * for a domain.
> + *
> + * Returns *opp and *freq is populated with the next match, else
> + * returns NULL opp if no match, else returns ERR_PTR in case of error.
Similar comment about opp not being ever NULL as the function above.
Regards,
Sergio
> + *
> + * Example usages:
> + * * find match/next lowest available frequency
> + * freq = 350000;
> + * opp = opp_find_freq_floor(dev, &freq)))
> + * if (IS_ERR(opp))
> + * pr_err ("unable to find a lower frequency\n");
> + * else
> + * pr_info("match freq = %ld\n", freq);
> + *
> + * * print all supported frequencies in descending order *
> + * freq = ULONG_MAX; * search highest enabled frequency *
> + * while (!IS_ERR(opp = opp_find_freq_floor(OPP_MPU, &freq)) {
> + * pr_info("freq = %ld\n", freq);
> + * freq--; * for next lower match *
> + * }
> + */
> +struct opp *opp_find_freq_floor(struct device *dev, unsigned long *freq)
> +{
> + struct device_opp *dev_opp;
> + struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
> +
> + dev_opp = find_device_opp(dev);
> + if (IS_ERR(dev_opp))
> + return opp;
> +
> + list_for_each_entry_reverse(temp_opp, &dev_opp->opp_list, node) {
> + if (temp_opp->enabled && temp_opp->rate <= *freq) {
> + opp = temp_opp;
> + *freq = opp->rate;
> + break;
> + }
> + }
> +
> + return opp;
> +}
> +
> +/* wrapper to reuse converting opp_def to opp struct */
> +static void opp_populate(struct opp *opp,
> + const struct opp_def *opp_def)
> +{
> + opp->rate = opp_def->freq;
> + opp->enabled = opp_def->enabled;
> + opp->u_volt = opp_def->u_volt;
> +}
> +
> +/**
> + * opp_add() - Add an OPP table from a table definitions
> + * @dev: device for which we do this operation
> + * @opp_def: opp_def to describe the OPP which we want to add.
> + *
> + * This function adds an opp definition to the opp list and returns
> status.
> + */
> +int opp_add(struct device *dev, const struct opp_def *opp_def)
> +{
> + struct device_opp *tmp_dev_opp, *dev_opp = NULL;
> + struct opp *opp, *new_opp;
> + struct list_head *head;
> +
> + /* Check for existing list for 'dev' */
> + list_for_each_entry(tmp_dev_opp, &dev_opp_list, node) {
> + if (dev == tmp_dev_opp->dev) {
> + dev_opp = tmp_dev_opp;
> + break;
> + }
> + }
> +
> + if (!dev_opp) {
> + /* Allocate a new device OPP table */
> + dev_opp = kzalloc(sizeof(struct device_opp), GFP_KERNEL);
> + if (!dev_opp) {
> + pr_warning("%s: unable to allocate device struct\n",
> + __func__);
> + return -ENOMEM;
> + }
> +
> + dev_opp->dev = dev;
> + INIT_LIST_HEAD(&dev_opp->opp_list);
> +
> + list_add(&dev_opp->node, &dev_opp_list);
> + }
> +
> + /* allocate new OPP node */
> + new_opp = kzalloc(sizeof(struct opp), GFP_KERNEL);
> + if (!new_opp) {
> + if (list_empty(&dev_opp->opp_list)) {
> + list_del(&dev_opp->node);
> + kfree(dev_opp);
> + }
> + pr_warning("%s: unable to allocate new opp node\n",
> + __func__);
> + return -ENOMEM;
> + }
> + opp_populate(new_opp, opp_def);
> +
> + /* Insert new OPP in order of increasing frequency */
> + head = &dev_opp->opp_list;
> + list_for_each_entry_reverse(opp, &dev_opp->opp_list, node) {
> + if (new_opp->rate >= opp->rate) {
> + head = &opp->node;
> + break;
> + }
> + }
> + list_add(&new_opp->node, head);
> + dev_opp->opp_count++;
> + if (new_opp->enabled)
> + dev_opp->enabled_opp_count++;
> +
> + return 0;
> +}
> +
> +/**
> + * opp_enable() - Enable a specific OPP
> + * @opp: Pointer to opp
> + *
> + * Enables a provided opp. If the operation is valid, this returns 0,
> else the
> + * corresponding error value.
> + *
> + * OPP used here is from the the opp_is_valid/opp_has_freq or other
> search
> + * functions
> + */
> +int opp_enable(struct opp *opp)
> +{
> + if (unlikely(!opp || IS_ERR(opp))) {
> + pr_err("%s: Invalid parameters being passed\n", __func__);
> + return -EINVAL;
> + }
> +
> + if (!opp->enabled && opp->dev_opp)
> + opp->dev_opp->enabled_opp_count++;
> +
> + opp->enabled = true;
> +
> + return 0;
> +}
> +
> +/**
> + * opp_disable() - Disable a specific OPP
> + * @opp: Pointer to opp
> + *
> + * Disables a provided opp. If the operation is valid, this returns 0,
> else the
> + * corresponding error value.
> + *
> + * OPP used here is from the the opp_is_valid/opp_has_freq or other
> search
> + * functions
> + */
> +int opp_disable(struct opp *opp)
> +{
> + if (unlikely(!opp || IS_ERR(opp))) {
> + pr_err("%s: Invalid parameters being passed\n", __func__);
> + return -EINVAL;
> + }
> +
> + if (opp->enabled && opp->dev_opp)
> + opp->dev_opp->enabled_opp_count--;
> +
> + opp->enabled = false;
> +
> + return 0;
> +}
> +
> +/**
> + * opp_init_cpufreq_table() - create a cpufreq table for a domain
> + * @dev: device for which we do this operation
> + * @table: Cpufreq table returned back to caller
> + *
> + * Generate a cpufreq table for a provided domain - this assumes that the
> + * opp list is already initialized and ready for usage
> + */
> +void opp_init_cpufreq_table(struct device *dev,
> + struct cpufreq_frequency_table **table)
> +{
> + struct device_opp *dev_opp;
> + struct opp *opp;
> + struct cpufreq_frequency_table *freq_table;
> + int i = 0;
> +
> + dev_opp = find_device_opp(dev);
> + if (IS_ERR(dev_opp)) {
> + pr_warning("%s: unable to find device\n", __func__);
> + return;
> + }
> +
> + freq_table = kzalloc(sizeof(struct cpufreq_frequency_table) *
> + (dev_opp->enabled_opp_count + 1), GFP_ATOMIC);
> + if (!freq_table) {
> + pr_warning("%s: failed to allocate frequency table\n",
> + __func__);
> + return;
> + }
> +
> + list_for_each_entry(opp, &dev_opp->opp_list, node) {
> + if (opp->enabled) {
> + freq_table[i].index = i;
> + freq_table[i].frequency = opp->rate / 1000;
> + i++;
> + }
> + }
> +
> + freq_table[i].index = i;
> + freq_table[i].frequency = CPUFREQ_TABLE_END;
> +
> + *table = &freq_table[0];
> +}
> --
> 1.6.3.3
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 13:41 ` Linus Walleij
@ 2010-09-17 15:05 ` Nishanth Menon
2010-09-17 15:59 ` Nishanth Menon
0 siblings, 1 reply; 34+ messages in thread
From: Nishanth Menon @ 2010-09-17 15:05 UTC (permalink / raw)
To: linux-arm-kernel
Linus Walleij had written, on 09/17/2010 08:41 AM, the following:
> 2010/9/17 Nishanth Menon <nm@ti.com>:
>
>> diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
>> index fb742c2..45e9d4a 100644
>> --- a/Documentation/power/00-INDEX
>> +++ b/Documentation/power/00-INDEX
>> @@ -14,6 +14,8 @@ interface.txt
>> - Power management user interface in /sys/power
>> notifiers.txt
>> - Registering suspend notifiers in device drivers
>> +opp.txt
>> + - Operating Performance Point library
>> pci.txt
>> - How the PCI Subsystem Does Power Management
>
> Hm you add the documentation to the index but not the documentation
> itself (easy slip) would you mind posting it?
ouch.. Sorry.. I realized that I missed adding MAINTAINER entry as
well.. v2 coming up..
--
Regards,
Nishanth Menon
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 14:09 ` Aguirre, Sergio
@ 2010-09-17 15:30 ` Nishanth Menon
2010-09-17 16:11 ` Aguirre, Sergio
2010-09-17 16:15 ` Aguirre, Sergio
0 siblings, 2 replies; 34+ messages in thread
From: Nishanth Menon @ 2010-09-17 15:30 UTC (permalink / raw)
To: linux-arm-kernel
Aguirre, Sergio had written, on 09/17/2010 09:09 AM, the following:
> Hi Nishanth,
thanks for the review.. comments below..
[...]
>> diff --git a/include/linux/opp.h b/include/linux/opp.h
>> new file mode 100644
>> index 0000000..94a552b
>> --- /dev/null
>> +++ b/include/linux/opp.h
>> @@ -0,0 +1,136 @@
>> +/*
>> + * Generic OPP Interface
>> + *
>> + * Copyright (C) 2009-2010 Texas Instruments Incorporated.
>> + * Nishanth Menon
>> + * Romit Dasgupta <romit@ti.com>
>> + * Copyright (C) 2009 Deep Root Systems, LLC.
>> + * Kevin Hilman
>
> Just curious why only Romit's e-mail address is there, and not Kevin's or
> yours...
personal choice I guess, Romit prefered his email id added Kevin and I
did not.
[...]
>> diff --git a/lib/opp.c b/lib/opp.c
>> new file mode 100644
>> index 0000000..650c8c3
>> --- /dev/null
>> +++ b/lib/opp.c
>> @@ -0,0 +1,440 @@
>> +/*
>> + * Generic OPP Interface
[...]
>> +
>> +/**
>> + * struct device_opp - Device opp structure
>> + * @node: list node
>> + * @dev: device handle
>> + * @opp_list: list of opps
>> + * @opp_count: num opps
>> + * @enabled_opp_count: how many opps are actually enabled
>> + *
>> + * This is an internal datastructure maintaining the link to
>> + * opps attached to a domain device. This structure is not
>> + * meant to be shared with users as it private to opp layer.
>
> "... as it is private ..."
thanks. will fix.
[..]
>> +/**
>> + * opp_get_voltage() - Gets the voltage corresponding to an opp
>> + * @opp: opp for which voltage has to be returned for
>> + *
>> + * Return voltage in micro volt corresponding to the opp, else
>> + * return 0
>> + */
>> +unsigned long opp_get_voltage(const struct opp *opp)
>> +{
>> + if (unlikely(!opp || IS_ERR(opp)) || !opp->enabled) {
>> + pr_err("%s: Invalid parameters being passed\n", __func__);
>> + return 0;
>
> Shouldn't the case for !opp->enabled just return 0, without error reporting.
>
> IMHO, having an OPP disabled is not really a bug in params, and returning this, will misinform the user.. I think.
>
> What do you think?
I think we discussed it a little earlier in the chain
http://marc.info/?t=128152609200064&r=1&w=2
the operations of get_freq and get_voltage dont make sense to be used on
a disabled OPP. if we are using it so, we need to report an error to the
user and log it for debugging.
The intent is as follows:
a) find_freq_exact is the function to use for finding the opp which is
disabled (without the opp pointer, you cant query voltage and frequency
anyways).
b) to query this, you need to know the opp frequency anyways, so it
makes no usage sense for get_freq or get_voltage to be able to be using
a disabled opp.
Do we think it makes sense to add this info to the documentation as well?
>
>> + }
>> +
>> + return opp->u_volt;
>> +}
>> +
>> +/**
>> + * opp_get_freq() - Gets the frequency corresponding to an opp
>> + * @opp: opp for which frequency has to be returned for
>> + *
>> + * Return frequency in hertz corresponding to the opp, else
>> + * return 0
>> + */
>> +unsigned long opp_get_freq(const struct opp *opp)
>> +{
>> + if (unlikely(!opp || IS_ERR(opp)) || !opp->enabled) {
>> + pr_err("%s: Invalid parameters being passed\n", __func__);
>> + return 0;
>
> Similarly here.
commented above.
>
>> + }
>> +
>> + return opp->rate;
>> +}
>> +
>> +/**
>> + * opp_get_opp_count() - Get number of opps enabled in the opp list
>> + * @dev: device for which we do this operation
>> + *
>> + * This functions returns the number of opps if there are any OPPs
>
> "This function returns.."
Thanks. will fix in v2 post.
>
>> enabled,
>> + * else returns corresponding error value.
>> + */
>> +int opp_get_opp_count(struct device *dev)
>> +{
>> + struct device_opp *dev_opp;
>> +
>> + dev_opp = find_device_opp(dev);
>> + if (IS_ERR(dev_opp))
>> + return -ENODEV;
>> +
>> + return dev_opp->enabled_opp_count;
>> +}
>> +
>> +/**
>> + * opp_find_freq_exact() - search for an exact frequency
>> + * @dev: device for which we do this operation
>> + * @freq: frequency to search for
>> + * @enabled: enabled/disabled OPP to search for
>> + *
>> + * Searches for exact match in the opp list and returns handle to the
>> matching
>> + * opp if found, else returns ERR_PTR in case of error and should be
>> handled
>> + * using IS_ERR.
>> + *
>> + * Note: enabled is a modifier for the search. if enabled=true, then the
>> match
>> + * is for exact matching frequency and is enabled. if false, the match is
>> for
>> + * exact frequency which is disabled.
>
> That enabled var is ignored, which makes this explanation mismatch with the code behavior.
>
> Am I missing something?
ouch.. you are correct.. it is a mistake that got introduced in one of
the merges we did.. the matching check
if (temp_opp->enabled && temp_opp->rate == freq) {
should have been
if (temp_opp->enabled == enabled && temp_opp->rate == freq) {
Thanks for catching it.
>
>> + */
>> +struct opp *opp_find_freq_exact(struct device *dev,
>> + unsigned long freq, bool enabled)
>> +{
>> + struct device_opp *dev_opp;
>> + struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
>> +
>> + dev_opp = find_device_opp(dev);
>> + if (IS_ERR(dev_opp))
>> + return opp;
>> +
>> + list_for_each_entry(temp_opp, &dev_opp->opp_list, node) {
>> + if (temp_opp->enabled && temp_opp->rate == freq) {
>> + opp = temp_opp;
>> + break;
>> + }
>> + }
>> +
>> + return opp;
>> +}
>> +
>> +/**
>> + * opp_find_freq_ceil() - Search for an rounded ceil freq
>> + * @dev: device for which we do this operation
>> + * @freq: Start frequency
>> + *
>> + * Search for the matching ceil *enabled* OPP from a starting freq
>> + * for a domain.
>> + *
>> + * Returns *opp and *freq is populated with the match, else
>> + * returns NULL opp if no match, else returns ERR_PTR in case of error.
>
> By looking at the code below, I don't think returning NULL is a possibility.
yet another documentation error.. thanks for catching.
ERR_PTR(-ENODEV) is returned instead of NULL..
how about "
Returns *opp and *freq is populated with the match if found, else
returns ERR_PTR in case of error and should be handled using IS_ERR.
"
>
>> + *
>> + * Example usages:
>> + * * find match/next highest available frequency *
>> + * freq = 350000;
>> + * opp = opp_find_freq_ceil(dev, &freq))
>> + * if (IS_ERR(opp))
>> + * pr_err("unable to find a higher frequency\n");
>> + * else
>> + * pr_info("match freq = %ld\n", freq);
>> + *
>> + * * print all supported frequencies in ascending order *
>> + * freq = 0; * Search for the lowest enabled frequency *
>> + * while (!IS_ERR(opp = opp_find_freq_ceil(OPP_MPU, &freq)) {
>> + * pr_info("freq = %ld\n", freq);
>> + * freq++; * for next higher match *
>> + * }
>> + */
>> +struct opp *opp_find_freq_ceil(struct device *dev, unsigned long *freq)
>> +{
>> + struct device_opp *dev_opp;
>> + struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
>> +
>> + dev_opp = find_device_opp(dev);
>> + if (IS_ERR(dev_opp))
>> + return opp;
>> +
>> + list_for_each_entry(temp_opp, &dev_opp->opp_list, node) {
>> + if (temp_opp->enabled && temp_opp->rate >= *freq) {
>
> What if freq contains a NULL pointer?
thanks.. good catch.. should have checked for dev and freq on entry..
added in the to be posted v2
>
>> + opp = temp_opp;
>> + *freq = opp->rate;
>> + break;
>> + }
>> + }
>> +
>> + return opp;
>> +}
>> +
>> +/**
>> + * opp_find_freq_floor() - Search for an rounded floor freq
>
> "... Search for a rounded floor ..."
Thanks.. will fix in v2.
>
>> + * @dev: device for which we do this operation
>> + * @freq: Start frequency
>> + *
>> + * Search for the matching floor *enabled* OPP from a starting freq
>> + * for a domain.
>> + *
>> + * Returns *opp and *freq is populated with the next match, else
>> + * returns NULL opp if no match, else returns ERR_PTR in case of error.
>
> Similar comment about opp not being ever NULL as the function above.
Suggest replacing with:
Returns *opp and *freq is populated with the match if found, else
returns ERR_PTR in case of error and should be handled using IS_ERR.
[..]
--
Regards,
Nishanth Menon
^ permalink raw reply [flat|nested] 34+ messages in thread
* [linux-pm] [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 1:29 ` [PATCH] opp: introduce library for device-specific OPPs Nishanth Menon
2010-09-17 13:41 ` Linus Walleij
2010-09-17 14:09 ` Aguirre, Sergio
@ 2010-09-17 15:36 ` Mark Brown
2010-09-17 15:53 ` Nishanth Menon
2010-09-17 16:45 ` Phil Carmody
2010-09-17 19:19 ` Andrew Morton
3 siblings, 2 replies; 34+ messages in thread
From: Mark Brown @ 2010-09-17 15:36 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Sep 16, 2010 at 08:29:33PM -0500, Nishanth Menon wrote:
> +struct opp_def {
> + unsigned long freq;
> + unsigned long u_volt;
> +
> + bool enabled;
> +};
It might be clearer to use some term other than enabled in the code -
when reading I wasn't immediately sure if enabled meant that it was
available to be selected or if it was the active operating point. How
about 'allowed' (though I'm not 100% happy with that)?
> +static inline int opp_add(struct device *dev, const struct opp_def *opp_def)
> +{
> + return ERR_PTR(-EINVAL);
> +}
Mismatch with the return type and value here.
> +/**
> + * opp_enable() - Enable a specific OPP
> + * @opp: Pointer to opp
> + *
> + * Enables a provided opp. If the operation is valid, this returns 0, else the
> + * corresponding error value.
> + *
> + * OPP used here is from the the opp_is_valid/opp_has_freq or other search
> + * functions
> + */
> +int opp_enable(struct opp *opp)
> +{
> + if (unlikely(!opp || IS_ERR(opp))) {
> + pr_err("%s: Invalid parameters being passed\n", __func__);
> + return -EINVAL;
> + }
> +
> + if (!opp->enabled && opp->dev_opp)
> + opp->dev_opp->enabled_opp_count++;
> +
> + opp->enabled = true;
> +
> + return 0;
> +}
When reading the description I'd expected to see some facility to
trigger selection of an active operating point in the library (possibly
as a separate call since you might have a bunch of operating points
being updated in quick succession) but it looks like that needs to be
supplied externally at the minute?
^ permalink raw reply [flat|nested] 34+ messages in thread
* [linux-pm] [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 15:36 ` [linux-pm] " Mark Brown
@ 2010-09-17 15:53 ` Nishanth Menon
2010-09-17 15:59 ` Mark Brown
2010-09-17 22:22 ` Rafael J. Wysocki
2010-09-17 16:45 ` Phil Carmody
1 sibling, 2 replies; 34+ messages in thread
From: Nishanth Menon @ 2010-09-17 15:53 UTC (permalink / raw)
To: linux-arm-kernel
Mark Brown had written, on 09/17/2010 10:36 AM, the following:
> On Thu, Sep 16, 2010 at 08:29:33PM -0500, Nishanth Menon wrote:
>
>> +struct opp_def {
>> + unsigned long freq;
>> + unsigned long u_volt;
>> +
>> + bool enabled;
>> +};
>
> It might be clearer to use some term other than enabled in the code -
> when reading I wasn't immediately sure if enabled meant that it was
> available to be selected or if it was the active operating point. How
> about 'allowed' (though I'm not 100% happy with that)?
;).. The opp is enabled or disabled if it is populated, it is implicit
as being available but not enabled- how about active? this would change
the opp_enable/disable functions to opp_activate, opp_deactivate..
Recommendations folks?
>
>> +static inline int opp_add(struct device *dev, const struct opp_def *opp_def)
>> +{
>> + return ERR_PTR(-EINVAL);
>> +}
>
> Mismatch with the return type and value here.
/me kicks himself.. ouch.. thanks.. will fix in v2.
>
>> +/**
>> + * opp_enable() - Enable a specific OPP
>> + * @opp: Pointer to opp
>> + *
>> + * Enables a provided opp. If the operation is valid, this returns 0, else the
>> + * corresponding error value.
>> + *
>> + * OPP used here is from the the opp_is_valid/opp_has_freq or other search
>> + * functions
>> + */
>> +int opp_enable(struct opp *opp)
>> +{
>> + if (unlikely(!opp || IS_ERR(opp))) {
>> + pr_err("%s: Invalid parameters being passed\n", __func__);
>> + return -EINVAL;
>> + }
>> +
>> + if (!opp->enabled && opp->dev_opp)
>> + opp->dev_opp->enabled_opp_count++;
>> +
>> + opp->enabled = true;
>> +
>> + return 0;
>> +}
>
> When reading the description I'd expected to see some facility to
> trigger selection of an active operating point in the library (possibly
> as a separate call since you might have a bunch of operating points
> being updated in quick succession) but it looks like that needs to be
> supplied externally at the minute?
The intent is we use the opp_search* functions to pick up the opp and
enable/activate it and disable/deactivate it.
--
Regards,
Nishanth Menon
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 15:05 ` Nishanth Menon
@ 2010-09-17 15:59 ` Nishanth Menon
2010-09-17 22:45 ` Rafael J. Wysocki
0 siblings, 1 reply; 34+ messages in thread
From: Nishanth Menon @ 2010-09-17 15:59 UTC (permalink / raw)
To: linux-arm-kernel
Nishanth Menon had written, on 09/17/2010 10:05 AM, the following:
> Linus Walleij had written, on 09/17/2010 08:41 AM, the following:
>> 2010/9/17 Nishanth Menon <nm@ti.com>:
>>
>>> diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
>>> index fb742c2..45e9d4a 100644
>>> --- a/Documentation/power/00-INDEX
>>> +++ b/Documentation/power/00-INDEX
>>> @@ -14,6 +14,8 @@ interface.txt
>>> - Power management user interface in /sys/power
>>> notifiers.txt
>>> - Registering suspend notifiers in device drivers
>>> +opp.txt
>>> + - Operating Performance Point library
>>> pci.txt
>>> - How the PCI Subsystem Does Power Management
>> Hm you add the documentation to the index but not the documentation
>> itself (easy slip) would you mind posting it?
> ouch.. Sorry.. I realized that I missed adding MAINTAINER entry as
> well.. v2 coming up..
>
while the review goes on, I will till end of the day before posting a v2
will collated comments, here is the opp.txt documentation for the same..
apologies on missing in my v1..
OPP Layer
==============
SOCs have a standard set of tuples consisting of frequency and
voltage pairs that the device will support per voltage domain. This
is called Operating Performance Point or OPP. The actual definitions
of OPP varies over silicon within the same family of devices.
For a specific domain, you can have a set of {frequency, voltage}
pairs. As the kernel boots and more information is available, a set
of these are activated based on the precise nature of device the kernel
boots up on. It is interesting to remember that each IP which belongs
to a voltage domain may define their own set of OPPs on top of this.
OPP layer of its own depends on silicon specific implementation and
board specific data to finalize on the final set of OPPs available
in a system
OPP layer organizes the data internally using device pointers representing
individual voltage domains.
NOTE:
a) the OPP layer implementation expects to be used in multiple contexts
based on SOC implementation and recommends locking schemes appropriate to
the usage of SOC.
b) All pointer returns are expected to be handled by standard error checks
such as IS_ERR() and appropriate actions taken by the caller.
c) Dependency of OPP layer is on CONFIG_PM as certain SOCs such as Texas
Instrument's OMAP support have frameworks to optionally boot at a certain
opp without needing cpufreq.
Initial list initialization:
---------------------------
The SOC implementation calls the following function to add opp per
domain device.
1. opp_add - add a new OPP - NOTE: use struct opp_def and define
the custom OPP with OPP_DEF for usage.
Query functions:
----------------
Search for OPPs for various cases:
2. opp_find_freq_exact - exact search function
3. opp_find_freq_floor - round_up search function
4. opp_find_freq_ceil - round_down search function
OPP modifier functions:
----------------------
This allows opp layer users to add customized OPPs or change the table
for any need they may have
5. opp_enable - enable a disabled OPP
6. opp_disable - disable an enabled OPP
OPP Data retrieval functions:
----------------------------
The following sets of functions are useful for drivers to retrieve
data stored in opp layer for various functions.
7. opp_get_voltage - retrieve voltage for an opp
9. opp_get_freq - get the frequency for an opp
8. opp_get_opp_count - get number of opps enabled for a domain
Cpufreq table generation:
------------------------
9. opp_init_cpufreq_table - this translates the OPP layer's internal
OPP arrangement into a table understood and operated upon by the
cpufreq layer.
Data Structures:
---------------
struct opp * is a handle structure whose internals are known only
to the OPP layer and is meant to hide the complexity away from users of
opp layer.
struct opp_def * is the definitions that users can interface with
opp layer and is meant to define one OPP for a domain device.
--
Regards,
Nishanth Menon
^ permalink raw reply [flat|nested] 34+ messages in thread
* [linux-pm] [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 15:53 ` Nishanth Menon
@ 2010-09-17 15:59 ` Mark Brown
2010-09-18 0:37 ` Kevin Hilman
2010-09-17 22:22 ` Rafael J. Wysocki
1 sibling, 1 reply; 34+ messages in thread
From: Mark Brown @ 2010-09-17 15:59 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Sep 17, 2010 at 10:53:06AM -0500, Nishanth Menon wrote:
> Mark Brown had written, on 09/17/2010 10:36 AM, the following:
> >It might be clearer to use some term other than enabled in the code -
> >when reading I wasn't immediately sure if enabled meant that it was
> >available to be selected or if it was the active operating point. How
> >about 'allowed' (though I'm not 100% happy with that)?
> ;).. The opp is enabled or disabled if it is populated, it is
> implicit as being available but not enabled- how about active? this
> would change the opp_enable/disable functions to opp_activate,
> opp_deactivate..
> Recommendations folks?
The enable/disable thing wasn't so noticable in the API itself, it was
in the data structures that I found it confusing - the core has a
different idea about what's going on with the system as a whole compared
to the decision that an individual device is taking.
> >When reading the description I'd expected to see some facility to
> >trigger selection of an active operating point in the library (possibly
> >as a separate call since you might have a bunch of operating points
> >being updated in quick succession) but it looks like that needs to be
> >supplied externally at the minute?
> The intent is we use the opp_search* functions to pick up the opp
> and enable/activate it and disable/deactivate it.
Sure, I get that bit. What I meant was that I was expecting something
that would say that changes had been made to the enabled/disabled sets
and that it'd be a good idea to rescan, especially for cases where the
devices change their requirements but the OPPs need to be done over a
larger block.
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 15:30 ` Nishanth Menon
@ 2010-09-17 16:11 ` Aguirre, Sergio
2010-09-17 16:15 ` Aguirre, Sergio
1 sibling, 0 replies; 34+ messages in thread
From: Aguirre, Sergio @ 2010-09-17 16:11 UTC (permalink / raw)
To: linux-arm-kernel
Hi Nishanth,
> -----Original Message-----
> From: Menon, Nishanth
> Sent: Friday, September 17, 2010 10:30 AM
> To: Aguirre, Sergio
> Cc: linux-arm; lkml; Len Brown; Pavel Machek; Rafael J. Wysocki; Randy
> Dunlap; Jesse Barnes; Matthew Garrett; H. Peter Anvin; Andrew Morton;
> Benjamin Herrenschmidt; Martin K. Petersen; Andi Kleen; linux-pm; linux-
> doc; linux-omap; Cousson, Benoit; Chikkature Rajashekar, Madhusudhan; Phil
> Carmody; Granados Dorado, Roberto; Shilimkar, Santosh; Tero Kristo;
> Eduardo Valentin; Paul Walmsley; Romit Dasgupta; Premi, Sanjeev; Gopinath,
> Thara; Sripathy, Vishwanath; Linus Walleij; Kevin Hilman
> Subject: Re: [PATCH] opp: introduce library for device-specific OPPs
>
> Aguirre, Sergio had written, on 09/17/2010 09:09 AM, the following:
> > Hi Nishanth,
> thanks for the review.. comments below..
>
You're welcome!
My responses below.
>
> [...]
> >> diff --git a/include/linux/opp.h b/include/linux/opp.h
> >> new file mode 100644
> >> index 0000000..94a552b
> >> --- /dev/null
> >> +++ b/include/linux/opp.h
> >> @@ -0,0 +1,136 @@
> >> +/*
> >> + * Generic OPP Interface
> >> + *
> >> + * Copyright (C) 2009-2010 Texas Instruments Incorporated.
> >> + * Nishanth Menon
> >> + * Romit Dasgupta <romit@ti.com>
> >> + * Copyright (C) 2009 Deep Root Systems, LLC.
> >> + * Kevin Hilman
> >
> > Just curious why only Romit's e-mail address is there, and not Kevin's
> or
> > yours...
>
> personal choice I guess, Romit prefered his email id added Kevin and I
> did not.
OK. No problem then :)
>
> [...]
>
> >> diff --git a/lib/opp.c b/lib/opp.c
> >> new file mode 100644
> >> index 0000000..650c8c3
> >> --- /dev/null
> >> +++ b/lib/opp.c
> >> @@ -0,0 +1,440 @@
> >> +/*
> >> + * Generic OPP Interface
> [...]
> >> +
> >> +/**
> >> + * struct device_opp - Device opp structure
> >> + * @node: list node
> >> + * @dev: device handle
> >> + * @opp_list: list of opps
> >> + * @opp_count: num opps
> >> + * @enabled_opp_count: how many opps are actually enabled
> >> + *
> >> + * This is an internal datastructure maintaining the link to
> >> + * opps attached to a domain device. This structure is not
> >> + * meant to be shared with users as it private to opp layer.
> >
> > "... as it is private ..."
> thanks. will fix.
> [..]
> >> +/**
> >> + * opp_get_voltage() - Gets the voltage corresponding to an opp
> >> + * @opp: opp for which voltage has to be returned for
> >> + *
> >> + * Return voltage in micro volt corresponding to the opp, else
> >> + * return 0
> >> + */
> >> +unsigned long opp_get_voltage(const struct opp *opp)
> >> +{
> >> + if (unlikely(!opp || IS_ERR(opp)) || !opp->enabled) {
> >> + pr_err("%s: Invalid parameters being passed\n",
> __func__);
> >> + return 0;
> >
> > Shouldn't the case for !opp->enabled just return 0, without error
> reporting.
> >
> > IMHO, having an OPP disabled is not really a bug in params, and
> returning this, will misinform the user.. I think.
> >
> > What do you think?
>
> I think we discussed it a little earlier in the chain
> http://marc.info/?t=128152609200064&r=1&w=2
> the operations of get_freq and get_voltage dont make sense to be used on
> a disabled OPP. if we are using it so, we need to report an error to the
> user and log it for debugging.
>
> The intent is as follows:
> a) find_freq_exact is the function to use for finding the opp which is
> disabled (without the opp pointer, you cant query voltage and frequency
> anyways).
> b) to query this, you need to know the opp frequency anyways, so it
> makes no usage sense for get_freq or get_voltage to be able to be using
> a disabled opp.
Ok. I understand now.
>
> Do we think it makes sense to add this info to the documentation as well?
Yeah, I think that would be good to see explained in the docs.
>
> >
> >> + }
> >> +
> >> + return opp->u_volt;
> >> +}
> >> +
> >> +/**
> >> + * opp_get_freq() - Gets the frequency corresponding to an opp
> >> + * @opp: opp for which frequency has to be returned for
> >> + *
> >> + * Return frequency in hertz corresponding to the opp, else
> >> + * return 0
> >> + */
> >> +unsigned long opp_get_freq(const struct opp *opp)
> >> +{
> >> + if (unlikely(!opp || IS_ERR(opp)) || !opp->enabled) {
> >> + pr_err("%s: Invalid parameters being passed\n",
> __func__);
> >> + return 0;
> >
> > Similarly here.
> commented above.
Ok.
>
> >
> >> + }
> >> +
> >> + return opp->rate;
> >> +}
> >> +
> >> +/**
> >> + * opp_get_opp_count() - Get number of opps enabled in the opp list
> >> + * @dev: device for which we do this operation
> >> + *
> >> + * This functions returns the number of opps if there are any OPPs
> >
> > "This function returns.."
> Thanks. will fix in v2 post.
>
> >
> >> enabled,
> >> + * else returns corresponding error value.
> >> + */
> >> +int opp_get_opp_count(struct device *dev)
> >> +{
> >> + struct device_opp *dev_opp;
> >> +
> >> + dev_opp = find_device_opp(dev);
> >> + if (IS_ERR(dev_opp))
> >> + return -ENODEV;
> >> +
> >> + return dev_opp->enabled_opp_count;
> >> +}
> >> +
> >> +/**
> >> + * opp_find_freq_exact() - search for an exact frequency
> >> + * @dev: device for which we do this operation
> >> + * @freq: frequency to search for
> >> + * @enabled: enabled/disabled OPP to search for
> >> + *
> >> + * Searches for exact match in the opp list and returns handle to the
> >> matching
> >> + * opp if found, else returns ERR_PTR in case of error and should be
> >> handled
> >> + * using IS_ERR.
> >> + *
> >> + * Note: enabled is a modifier for the search. if enabled=true, then
> the
> >> match
> >> + * is for exact matching frequency and is enabled. if false, the match
> is
> >> for
> >> + * exact frequency which is disabled.
> >
> > That enabled var is ignored, which makes this explanation mismatch with
> the code behavior.
> >
> > Am I missing something?
> ouch.. you are correct.. it is a mistake that got introduced in one of
> the merges we did.. the matching check
> if (temp_opp->enabled && temp_opp->rate == freq) {
> should have been
> if (temp_opp->enabled == enabled && temp_opp->rate == freq) {
>
> Thanks for catching it.
>
> >
> >> + */
> >> +struct opp *opp_find_freq_exact(struct device *dev,
> >> + unsigned long freq, bool enabled)
> >> +{
> >> + struct device_opp *dev_opp;
> >> + struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
> >> +
> >> + dev_opp = find_device_opp(dev);
> >> + if (IS_ERR(dev_opp))
> >> + return opp;
> >> +
> >> + list_for_each_entry(temp_opp, &dev_opp->opp_list, node) {
> >> + if (temp_opp->enabled && temp_opp->rate == freq) {
> >> + opp = temp_opp;
> >> + break;
> >> + }
> >> + }
> >> +
> >> + return opp;
> >> +}
> >> +
> >> +/**
> >> + * opp_find_freq_ceil() - Search for an rounded ceil freq
> >> + * @dev: device for which we do this operation
> >> + * @freq: Start frequency
> >> + *
> >> + * Search for the matching ceil *enabled* OPP from a starting freq
> >> + * for a domain.
> >> + *
> >> + * Returns *opp and *freq is populated with the match, else
> >> + * returns NULL opp if no match, else returns ERR_PTR in case of
> error.
> >
> > By looking at the code below, I don't think returning NULL is a
> possibility.
> yet another documentation error.. thanks for catching.
> ERR_PTR(-ENODEV) is returned instead of NULL..
>
> how about "
> Returns *opp and *freq is populated with the match if found, else
> returns ERR_PTR in case of error and should be handled using IS_ERR.
>
> "
I'll say:
"Returns matching *opp and refreshes *freq accordingly"
But is a matter of taste of words. :)
> >
> >> + *
> >> + * Example usages:
> >> + * * find match/next highest available frequency *
> >> + * freq = 350000;
> >> + * opp = opp_find_freq_ceil(dev, &freq))
> >> + * if (IS_ERR(opp))
> >> + * pr_err("unable to find a higher frequency\n");
> >> + * else
> >> + * pr_info("match freq = %ld\n", freq);
> >> + *
> >> + * * print all supported frequencies in ascending order *
> >> + * freq = 0; * Search for the lowest enabled frequency *
> >> + * while (!IS_ERR(opp = opp_find_freq_ceil(OPP_MPU, &freq)) {
> >> + * pr_info("freq = %ld\n", freq);
> >> + * freq++; * for next higher match *
> >> + * }
> >> + */
> >> +struct opp *opp_find_freq_ceil(struct device *dev, unsigned long
> *freq)
> >> +{
> >> + struct device_opp *dev_opp;
> >> + struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
> >> +
> >> + dev_opp = find_device_opp(dev);
> >> + if (IS_ERR(dev_opp))
> >> + return opp;
> >> +
> >> + list_for_each_entry(temp_opp, &dev_opp->opp_list, node) {
> >> + if (temp_opp->enabled && temp_opp->rate >= *freq) {
> >
> > What if freq contains a NULL pointer?
> thanks.. good catch.. should have checked for dev and freq on entry..
> added in the to be posted v2
>
> >
> >> + opp = temp_opp;
> >> + *freq = opp->rate;
> >> + break;
> >> + }
> >> + }
> >> +
> >> + return opp;
> >> +}
> >> +
> >> +/**
> >> + * opp_find_freq_floor() - Search for an rounded floor freq
> >
> > "... Search for a rounded floor ..."
> Thanks.. will fix in v2.
>
> >
> >> + * @dev: device for which we do this operation
> >> + * @freq: Start frequency
> >> + *
> >> + * Search for the matching floor *enabled* OPP from a starting freq
> >> + * for a domain.
> >> + *
> >> + * Returns *opp and *freq is populated with the next match, else
> >> + * returns NULL opp if no match, else returns ERR_PTR in case of
> error.
> >
> > Similar comment about opp not being ever NULL as the function above.
>
> Suggest replacing with:
> Returns *opp and *freq is populated with the match if found, else
> returns ERR_PTR in case of error and should be handled using IS_ERR.
Same comment as above.
Regards,
Sergio
>
> [..]
>
> --
> Regards,
> Nishanth Menon
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 15:30 ` Nishanth Menon
2010-09-17 16:11 ` Aguirre, Sergio
@ 2010-09-17 16:15 ` Aguirre, Sergio
2010-09-17 16:20 ` Nishanth Menon
1 sibling, 1 reply; 34+ messages in thread
From: Aguirre, Sergio @ 2010-09-17 16:15 UTC (permalink / raw)
To: linux-arm-kernel
Nishanth,
<snip>
> > >> diff --git a/include/linux/opp.h b/include/linux/opp.h
> > >> new file mode 100644
> > >> index 0000000..94a552b
> > >> --- /dev/null
> > >> +++ b/include/linux/opp.h
> > >> @@ -0,0 +1,136 @@
> > >> +/*
> > >> + * Generic OPP Interface
> > >> + *
> > >> + * Copyright (C) 2009-2010 Texas Instruments Incorporated.
> > >> + * Nishanth Menon
> > >> + * Romit Dasgupta <romit@ti.com>
> > >> + * Copyright (C) 2009 Deep Root Systems, LLC.
> > >> + * Kevin Hilman
> > >
> > > Just curious why only Romit's e-mail address is there, and not Kevin's
> > or
> > > yours...
> >
> > personal choice I guess, Romit prefered his email id added Kevin and I
> > did not.
>
> OK. No problem then :)
Actually, Romit's e-mail address seems to be invalid... Is he still having that address? (OR is he still working in TI? 8-| )
Regards,
Sergio
<snip>
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 16:15 ` Aguirre, Sergio
@ 2010-09-17 16:20 ` Nishanth Menon
0 siblings, 0 replies; 34+ messages in thread
From: Nishanth Menon @ 2010-09-17 16:20 UTC (permalink / raw)
To: linux-arm-kernel
Aguirre, Sergio had written, on 09/17/2010 11:15 AM, the following:
>>>>> diff --git a/include/linux/opp.h b/include/linux/opp.h
>>>>> new file mode 100644
>>>>> index 0000000..94a552b
>>>>> --- /dev/null
>>>>> +++ b/include/linux/opp.h
>>>>> @@ -0,0 +1,136 @@
>>>>> +/*
>>>>> + * Generic OPP Interface
>>>>> + *
>>>>> + * Copyright (C) 2009-2010 Texas Instruments Incorporated.
>>>>> + * Nishanth Menon
>>>>> + * Romit Dasgupta <romit@ti.com>
>>>>> + * Copyright (C) 2009 Deep Root Systems, LLC.
>>>>> + * Kevin Hilman
>>>> Just curious why only Romit's e-mail address is there, and not Kevin's
>>> or
>>>> yours...
>>> personal choice I guess, Romit prefered his email id added Kevin and I
>>> did not.
>> OK. No problem then :)
>
> Actually, Romit's e-mail address seems to be invalid... Is he still having that address? (OR is he still working in TI? 8-| )
On an internal thread, I am told to remove the email id - so will remove
Romit's email id and cc from v2 patch.
--
Regards,
Nishanth Menon
^ permalink raw reply [flat|nested] 34+ messages in thread
* [linux-pm] [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 15:36 ` [linux-pm] " Mark Brown
2010-09-17 15:53 ` Nishanth Menon
@ 2010-09-17 16:45 ` Phil Carmody
2010-09-18 10:08 ` Mark Brown
1 sibling, 1 reply; 34+ messages in thread
From: Phil Carmody @ 2010-09-17 16:45 UTC (permalink / raw)
To: linux-arm-kernel
On 17/09/10 17:36 +0200, ext Mark Brown wrote:
> On Thu, Sep 16, 2010 at 08:29:33PM -0500, Nishanth Menon wrote:
>
> > +struct opp_def {
> > + unsigned long freq;
> > + unsigned long u_volt;
> > +
> > + bool enabled;
> > +};
>
> It might be clearer to use some term other than enabled in the code -
> when reading I wasn't immediately sure if enabled meant that it was
> available to be selected or if it was the active operating point. How
^^^^^^^^^
> about 'allowed' (though I'm not 100% happy with that)?
I think your query contains its own answer.
Phil
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 1:29 ` [PATCH] opp: introduce library for device-specific OPPs Nishanth Menon
` (2 preceding siblings ...)
2010-09-17 15:36 ` [linux-pm] " Mark Brown
@ 2010-09-17 19:19 ` Andrew Morton
2010-09-17 21:23 ` Nishanth Menon
2010-09-17 22:07 ` Rafael J. Wysocki
3 siblings, 2 replies; 34+ messages in thread
From: Andrew Morton @ 2010-09-17 19:19 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 16 Sep 2010 20:29:33 -0500
Nishanth Menon <nm@ti.com> wrote:
> SOCs have a standard set of tuples consisting of frequency and
> voltage pairs that the device will support per voltage domain. These
> are called Operating Performance Points or OPPs. The actual
> definitions of Operating Performance Points varies over silicon within the
> same family of devices. For a specific domain, you can have a set of
> {frequency, voltage} pairs. As the kernel boots and more information
> is available, a set of these are activated based on the precise nature
> of device the kernel boots up on. It is interesting to remember that
> each IP which belongs to a voltage domain may define their own set of
> OPPs on top of this.
>
> To implement an OPP, some sort of power management support is necessary
> hence this library enablement depends on CONFIG_PM, however this does
> not fit into the core power framework as it is an independent library.
> This is hence introduced under lib allowing all architectures to
> selectively enable the feature based on thier capabilities.
>
> Contributions include:
> Sanjeev Premi for the initial concept:
> http://patchwork.kernel.org/patch/50998/
> Kevin Hilman for converting original design to device-based
> Kevin Hilman and Paul Walmsey for cleaning up many of the function
> abstractions, improvements and data structure handling
> Romit Dasgupta for using enums instead of opp pointers
> Thara Gopinath, Eduardo Valentin and Vishwanath BS for fixes and
> cleanups.
> Linus Walleij for recommending this layer be made generic for usage
> in other architectures beyond OMAP and ARM.
>
> Discussions and comments from:
> http://marc.info/?l=linux-omap&m=126033945313269&w=2
> http://marc.info/?l=linux-omap&m=125482970102327&w=2
> http://marc.info/?t=125809247500002&r=1&w=2
> http://marc.info/?l=linux-omap&m=126025973426007&w=2
> http://marc.info/?t=128152609200064&r=1&w=2
> incorporated.
>
> ...
>
> Documentation/power/00-INDEX | 2 +
> include/linux/opp.h | 136 +++++++++++++
> kernel/power/Kconfig | 14 ++
> lib/Makefile | 2 +
> lib/opp.c | 440 ++++++++++++++++++++++++++++++++++++++++++
./lib/ is an unusual place to put a driver-like thing such as this.
The lib/ directory is mainly for generic kernel-wide support things.
I'd suggest that ./drivers/opp/ would be a better place.
>
> ...
>
> +/*
> + * Initialization wrapper used to define an OPP.
> + * To point at the end of a terminator of a list of OPPs,
> + * use OPP_DEF(0, 0, 0)
> + */
> +#define OPP_DEF(_enabled, _freq, _uv) \
> +{ \
> + .enabled = _enabled, \
> + .freq = _freq, \
> + .u_volt = _uv, \
> +}
OPP_DEF is a somewhat atypical name. OPP_INITIALIZER would be more
conventional.
However OPP_DEF has no usage in this patch so perhaps this can be
removed?
> +static LIST_HEAD(dev_opp_list);
There's no locking for this list. That's OK under some circumstances,
but I do think there should be a comment here explaining this apparent
bug: why is no locking needed, what are the lifetime rules for entries
on this list.
Also, the _ordering_ of items on this list is significant. It should
also be documented.
>
> ...
>
> +/**
> + * opp_get_voltage() - Gets the voltage corresponding to an opp
Usually the () is omitted from function names in kerneldoc comments.
It might be OK, or it might produce strange output - I haven't
checked.
>
> ...
>
> +/**
> + * opp_find_freq_exact() - search for an exact frequency
> + * @dev: device for which we do this operation
> + * @freq: frequency to search for
> + * @enabled: enabled/disabled OPP to search for
> + *
> + * Searches for exact match in the opp list and returns handle to the matching
s/handle/pointer/
> + * opp if found, else returns ERR_PTR in case of error and should be handled
> + * using IS_ERR.
> + *
> + * Note: enabled is a modifier for the search. if enabled=true, then the match
> + * is for exact matching frequency and is enabled. if false, the match is for
> + * exact frequency which is disabled.
> + */
>
> ...
>
> +int opp_add(struct device *dev, const struct opp_def *opp_def)
> +{
> + struct device_opp *tmp_dev_opp, *dev_opp = NULL;
> + struct opp *opp, *new_opp;
> + struct list_head *head;
> +
> + /* Check for existing list for 'dev' */
> + list_for_each_entry(tmp_dev_opp, &dev_opp_list, node) {
> + if (dev == tmp_dev_opp->dev) {
> + dev_opp = tmp_dev_opp;
> + break;
> + }
> + }
> +
> + if (!dev_opp) {
> + /* Allocate a new device OPP table */
> + dev_opp = kzalloc(sizeof(struct device_opp), GFP_KERNEL);
> + if (!dev_opp) {
> + pr_warning("%s: unable to allocate device struct\n",
> + __func__);
> + return -ENOMEM;
> + }
> +
> + dev_opp->dev = dev;
> + INIT_LIST_HEAD(&dev_opp->opp_list);
> +
> + list_add(&dev_opp->node, &dev_opp_list);
> + }
> +
> + /* allocate new OPP node */
> + new_opp = kzalloc(sizeof(struct opp), GFP_KERNEL);
> + if (!new_opp) {
> + if (list_empty(&dev_opp->opp_list)) {
> + list_del(&dev_opp->node);
It would be neater to move the list_add() down to after the allocation
of new_opp and to remove this list_del().
> + kfree(dev_opp);
> + }
> + pr_warning("%s: unable to allocate new opp node\n",
> + __func__);
> + return -ENOMEM;
> + }
> + opp_populate(new_opp, opp_def);
> +
> + /* Insert new OPP in order of increasing frequency */
> + head = &dev_opp->opp_list;
> + list_for_each_entry_reverse(opp, &dev_opp->opp_list, node) {
> + if (new_opp->rate >= opp->rate) {
> + head = &opp->node;
> + break;
> + }
> + }
> + list_add(&new_opp->node, head);
> + dev_opp->opp_count++;
> + if (new_opp->enabled)
> + dev_opp->enabled_opp_count++;
These non-atomic read-modify-write operations on *dev_opp have no
locking. What prevents races here?
> + return 0;
> +}
> +
>
> ...
>
> +void opp_init_cpufreq_table(struct device *dev,
> + struct cpufreq_frequency_table **table)
> +{
> + struct device_opp *dev_opp;
> + struct opp *opp;
> + struct cpufreq_frequency_table *freq_table;
> + int i = 0;
> +
> + dev_opp = find_device_opp(dev);
> + if (IS_ERR(dev_opp)) {
> + pr_warning("%s: unable to find device\n", __func__);
> + return;
> + }
> +
> + freq_table = kzalloc(sizeof(struct cpufreq_frequency_table) *
> + (dev_opp->enabled_opp_count + 1), GFP_ATOMIC);
> + if (!freq_table) {
> + pr_warning("%s: failed to allocate frequency table\n",
> + __func__);
> + return;
> + }
> +
> + list_for_each_entry(opp, &dev_opp->opp_list, node) {
> + if (opp->enabled) {
> + freq_table[i].index = i;
> + freq_table[i].frequency = opp->rate / 1000;
> + i++;
> + }
> + }
> +
> + freq_table[i].index = i;
> + freq_table[i].frequency = CPUFREQ_TABLE_END;
> +
> + *table = &freq_table[0];
> +}
So we're playing with cpufreq internals here but there's no #ifdef
CONFIG_CPUFREQ and there's no Kconfig dependency on cpufreq. That
needs fixing I think, if only from a reduce-code-bloat perspective.
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 19:19 ` Andrew Morton
@ 2010-09-17 21:23 ` Nishanth Menon
2010-09-17 22:51 ` Kevin Hilman
2010-09-17 23:07 ` Rafael J. Wysocki
2010-09-17 22:07 ` Rafael J. Wysocki
1 sibling, 2 replies; 34+ messages in thread
From: Nishanth Menon @ 2010-09-17 21:23 UTC (permalink / raw)
To: linux-arm-kernel
Andrew Morton had written, on 09/17/2010 02:19 PM, the following:
> On Thu, 16 Sep 2010 20:29:33 -0500
> Nishanth Menon <nm@ti.com> wrote:
>
[...]
>>
>> Documentation/power/00-INDEX | 2 +
>> include/linux/opp.h | 136 +++++++++++++
>> kernel/power/Kconfig | 14 ++
>> lib/Makefile | 2 +
>> lib/opp.c | 440 ++++++++++++++++++++++++++++++++++++++++++
>
> ./lib/ is an unusual place to put a driver-like thing such as this.
> The lib/ directory is mainly for generic kernel-wide support things.
> I'd suggest that ./drivers/opp/ would be a better place.
we had an interesting debate on this topic on the the thread starting
http://marc.info/?l=linux-arm-kernel&m=128465710624421&w=2
It really does not provide any driver like feature here. it is just a
bunch of helper functions which any SOC framework can use to build up to
implement either:
a) cpufreq implementation with various OPPs
b) bootup in a specific opp in a set of supported OPPs and stay there
(e.g. mpurate support on OMAP platforms)
I had considered putting it in drivers/power, but it looks to contain
mostly regulator stuff, the other alternative would be to include it in
kernel/power or as Kevin H, Linus W and you mention drivers/opp
Personally, I dont have a strong feeling about it as long as it is
reusable across SOCs and am hoping to hear some alignment :).
>
>> ...
>>
>> +/*
>> + * Initialization wrapper used to define an OPP.
>> + * To point at the end of a terminator of a list of OPPs,
>> + * use OPP_DEF(0, 0, 0)
>> + */
>> +#define OPP_DEF(_enabled, _freq, _uv) \
>> +{ \
>> + .enabled = _enabled, \
>> + .freq = _freq, \
>> + .u_volt = _uv, \
>> +}
>
> OPP_DEF is a somewhat atypical name. OPP_INITIALIZER would be more
> conventional.
Thanks. will incorporate this in v2 of my patchset.
>
> However OPP_DEF has no usage in this patch so perhaps this can be
> removed?
there were a few OMAP followon patches from
http://marc.info/?l=linux-omap&m=128152135801634&w=2
which will actually use this for OMAP3 platform.
>
>> +static LIST_HEAD(dev_opp_list);
>
> There's no locking for this list. That's OK under some circumstances,
> but I do think there should be a comment here explaining this apparent
> bug: why is no locking needed, what are the lifetime rules for entries
> on this list.
Locking:
arrgh.. my bad.. I had documented it in the missing and later on posted
opp.txt http://marc.info/?l=linux-arm-kernel&m=128473931626114&w=2
The OPP layer implementation expects to be used in multiple contexts
(including calls from interrupt locked context) based on SOC framework
implementation. It is recommended to use appropriate locking schemes
within the SOC framework itself.
In terms of the lifetime rules on the nodes in the list:
The list is expected to be maintained once created, entries are expected
to be added optimally and not expected to be destroyed, the choice of
list implementation was for reducing the complexity of the code itself
and not yet meant as a mechanism to dynamically add and delete nodes on
the fly.. Essentially, it was intended for the SOC framework to ensure
it plugs in the OPP entries optimally and not create a humungous list of
all possible OPPs for all families of the vendor SOCs - even though it
is possible to use the OPP layer so - it just wont be smart to do so
considering list scan latencies on hot paths such as cpufreq transitions
or idle transitions.
I should add comments to the effect of the same to the list header and
documentation - will do this as part of v2.
>
> Also, the _ordering_ of items on this list is significant. It should
> also be documented.
Ack. thanks for catching this. Will do it in v2.
>
>> ...
>>
>> +/**
>> + * opp_get_voltage() - Gets the voltage corresponding to an opp
>
> Usually the () is omitted from function names in kerneldoc comments.
> It might be OK, or it might produce strange output - I haven't
> checked.
>
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/kernel-doc-nano-HOWTO.txt;h=3d8a97747f7731c801ca7d3a1483858feeb76b6c;hb=HEAD#l55
I wonder why the example protos use ()
>> ...
>>
>> +/**
>> + * opp_find_freq_exact() - search for an exact frequency
>> + * @dev: device for which we do this operation
>> + * @freq: frequency to search for
>> + * @enabled: enabled/disabled OPP to search for
>> + *
>> + * Searches for exact match in the opp list and returns handle to the matching
>
> s/handle/pointer/
Thanks.. will fix in v2.
>
>> + * opp if found, else returns ERR_PTR in case of error and should be handled
>> + * using IS_ERR.
>> + *
>> + * Note: enabled is a modifier for the search. if enabled=true, then the match
>> + * is for exact matching frequency and is enabled. if false, the match is for
>> + * exact frequency which is disabled.
>> + */
>>
>> ...
>>
>> +int opp_add(struct device *dev, const struct opp_def *opp_def)
>> +{
>> + struct device_opp *tmp_dev_opp, *dev_opp = NULL;
>> + struct opp *opp, *new_opp;
>> + struct list_head *head;
>> +
>> + /* Check for existing list for 'dev' */
>> + list_for_each_entry(tmp_dev_opp, &dev_opp_list, node) {
>> + if (dev == tmp_dev_opp->dev) {
>> + dev_opp = tmp_dev_opp;
>> + break;
>> + }
>> + }
>> +
>> + if (!dev_opp) {
>> + /* Allocate a new device OPP table */
>> + dev_opp = kzalloc(sizeof(struct device_opp), GFP_KERNEL);
>> + if (!dev_opp) {
>> + pr_warning("%s: unable to allocate device struct\n",
>> + __func__);
>> + return -ENOMEM;
>> + }
>> +
>> + dev_opp->dev = dev;
>> + INIT_LIST_HEAD(&dev_opp->opp_list);
>> +
>> + list_add(&dev_opp->node, &dev_opp_list);
>> + }
>> +
>> + /* allocate new OPP node */
>> + new_opp = kzalloc(sizeof(struct opp), GFP_KERNEL);
>> + if (!new_opp) {
>> + if (list_empty(&dev_opp->opp_list)) {
>> + list_del(&dev_opp->node);
>
> It would be neater to move the list_add() down to after the allocation
> of new_opp and to remove this list_del().
ack. thanks.. yes it does clean it up. additionally i expanded the
static opp_populate which is used just once - removed one function call ;)
>
>> + kfree(dev_opp);
>> + }
>> + pr_warning("%s: unable to allocate new opp node\n",
>> + __func__);
>> + return -ENOMEM;
>> + }
>> + opp_populate(new_opp, opp_def);
^^^^^^^^^^^^
>> +
>> + /* Insert new OPP in order of increasing frequency */
>> + head = &dev_opp->opp_list;
>> + list_for_each_entry_reverse(opp, &dev_opp->opp_list, node) {
>> + if (new_opp->rate >= opp->rate) {
>> + head = &opp->node;
>> + break;
>> + }
>> + }
>> + list_add(&new_opp->node, head);
>> + dev_opp->opp_count++;
>> + if (new_opp->enabled)
>> + dev_opp->enabled_opp_count++;
>
> These non-atomic read-modify-write operations on *dev_opp have no
> locking. What prevents races here?
explained above. it is the job of the framework using the OPP Layer library.
>
>> + return 0;
>> +}
>> +
>>
>> ...
>>
>> +void opp_init_cpufreq_table(struct device *dev,
>> + struct cpufreq_frequency_table **table)
>> +{
>> + struct device_opp *dev_opp;
>> + struct opp *opp;
>> + struct cpufreq_frequency_table *freq_table;
>> + int i = 0;
>> +
>> + dev_opp = find_device_opp(dev);
>> + if (IS_ERR(dev_opp)) {
>> + pr_warning("%s: unable to find device\n", __func__);
>> + return;
>> + }
>> +
>> + freq_table = kzalloc(sizeof(struct cpufreq_frequency_table) *
>> + (dev_opp->enabled_opp_count + 1), GFP_ATOMIC);
>> + if (!freq_table) {
>> + pr_warning("%s: failed to allocate frequency table\n",
>> + __func__);
>> + return;
>> + }
>> +
>> + list_for_each_entry(opp, &dev_opp->opp_list, node) {
>> + if (opp->enabled) {
>> + freq_table[i].index = i;
>> + freq_table[i].frequency = opp->rate / 1000;
>> + i++;
>> + }
>> + }
>> +
>> + freq_table[i].index = i;
>> + freq_table[i].frequency = CPUFREQ_TABLE_END;
>> +
>> + *table = &freq_table[0];
>> +}
>
> So we're playing with cpufreq internals here but there's no #ifdef
> CONFIG_CPUFREQ and there's no Kconfig dependency on cpufreq. That
> needs fixing I think, if only from a reduce-code-bloat perspective.
Thanks and ouch.. Again missing documentation. Apologies.
http://marc.info/?l=linux-arm-kernel&m=128473931626114&w=2
c) Dependency of OPP layer is on CONFIG_PM as certain SOCs such as Texas
Instrument's OMAP support have frameworks to optionally boot at a
certain opp without needing cpufreq.
This is called "mpurate" bootarg parameter in OMAP framework. I will put
this under #ifdef CPUFREQ and provide header coverage for the same
appropriately.
--
Regards,
Nishanth Menon
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 19:19 ` Andrew Morton
2010-09-17 21:23 ` Nishanth Menon
@ 2010-09-17 22:07 ` Rafael J. Wysocki
1 sibling, 0 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2010-09-17 22:07 UTC (permalink / raw)
To: linux-arm-kernel
On Friday, September 17, 2010, Andrew Morton wrote:
> On Thu, 16 Sep 2010 20:29:33 -0500
> Nishanth Menon <nm@ti.com> wrote:
>
> > SOCs have a standard set of tuples consisting of frequency and
> > voltage pairs that the device will support per voltage domain. These
> > are called Operating Performance Points or OPPs. The actual
> > definitions of Operating Performance Points varies over silicon within the
> > same family of devices. For a specific domain, you can have a set of
> > {frequency, voltage} pairs. As the kernel boots and more information
> > is available, a set of these are activated based on the precise nature
> > of device the kernel boots up on. It is interesting to remember that
> > each IP which belongs to a voltage domain may define their own set of
> > OPPs on top of this.
> >
> > To implement an OPP, some sort of power management support is necessary
> > hence this library enablement depends on CONFIG_PM, however this does
> > not fit into the core power framework as it is an independent library.
> > This is hence introduced under lib allowing all architectures to
> > selectively enable the feature based on thier capabilities.
> >
> > Contributions include:
> > Sanjeev Premi for the initial concept:
> > http://patchwork.kernel.org/patch/50998/
> > Kevin Hilman for converting original design to device-based
> > Kevin Hilman and Paul Walmsey for cleaning up many of the function
> > abstractions, improvements and data structure handling
> > Romit Dasgupta for using enums instead of opp pointers
> > Thara Gopinath, Eduardo Valentin and Vishwanath BS for fixes and
> > cleanups.
> > Linus Walleij for recommending this layer be made generic for usage
> > in other architectures beyond OMAP and ARM.
> >
> > Discussions and comments from:
> > http://marc.info/?l=linux-omap&m=126033945313269&w=2
> > http://marc.info/?l=linux-omap&m=125482970102327&w=2
> > http://marc.info/?t=125809247500002&r=1&w=2
> > http://marc.info/?l=linux-omap&m=126025973426007&w=2
> > http://marc.info/?t=128152609200064&r=1&w=2
> > incorporated.
> >
> > ...
> >
> > Documentation/power/00-INDEX | 2 +
> > include/linux/opp.h | 136 +++++++++++++
> > kernel/power/Kconfig | 14 ++
> > lib/Makefile | 2 +
> > lib/opp.c | 440 ++++++++++++++++++++++++++++++++++++++++++
>
> ./lib/ is an unusual place to put a driver-like thing such as this.
> The lib/ directory is mainly for generic kernel-wide support things.
> I'd suggest that ./drivers/opp/ would be a better place.
Well, there are a few similar things in drivers/base/power already.
I agree with all of your comments below.
Thanks,
Rafael
> > ...
> >
> > +/*
> > + * Initialization wrapper used to define an OPP.
> > + * To point at the end of a terminator of a list of OPPs,
> > + * use OPP_DEF(0, 0, 0)
> > + */
> > +#define OPP_DEF(_enabled, _freq, _uv) \
> > +{ \
> > + .enabled = _enabled, \
> > + .freq = _freq, \
> > + .u_volt = _uv, \
> > +}
>
> OPP_DEF is a somewhat atypical name. OPP_INITIALIZER would be more
> conventional.
>
> However OPP_DEF has no usage in this patch so perhaps this can be
> removed?
>
> > +static LIST_HEAD(dev_opp_list);
>
> There's no locking for this list. That's OK under some circumstances,
> but I do think there should be a comment here explaining this apparent
> bug: why is no locking needed, what are the lifetime rules for entries
> on this list.
>
> Also, the _ordering_ of items on this list is significant. It should
> also be documented.
>
> >
> > ...
> >
> > +/**
> > + * opp_get_voltage() - Gets the voltage corresponding to an opp
>
> Usually the () is omitted from function names in kerneldoc comments.
> It might be OK, or it might produce strange output - I haven't
> checked.
>
> >
> > ...
> >
> > +/**
> > + * opp_find_freq_exact() - search for an exact frequency
> > + * @dev: device for which we do this operation
> > + * @freq: frequency to search for
> > + * @enabled: enabled/disabled OPP to search for
> > + *
> > + * Searches for exact match in the opp list and returns handle to the matching
>
> s/handle/pointer/
>
> > + * opp if found, else returns ERR_PTR in case of error and should be handled
> > + * using IS_ERR.
> > + *
> > + * Note: enabled is a modifier for the search. if enabled=true, then the match
> > + * is for exact matching frequency and is enabled. if false, the match is for
> > + * exact frequency which is disabled.
> > + */
> >
> > ...
> >
> > +int opp_add(struct device *dev, const struct opp_def *opp_def)
> > +{
> > + struct device_opp *tmp_dev_opp, *dev_opp = NULL;
> > + struct opp *opp, *new_opp;
> > + struct list_head *head;
> > +
> > + /* Check for existing list for 'dev' */
> > + list_for_each_entry(tmp_dev_opp, &dev_opp_list, node) {
> > + if (dev == tmp_dev_opp->dev) {
> > + dev_opp = tmp_dev_opp;
> > + break;
> > + }
> > + }
> > +
> > + if (!dev_opp) {
> > + /* Allocate a new device OPP table */
> > + dev_opp = kzalloc(sizeof(struct device_opp), GFP_KERNEL);
> > + if (!dev_opp) {
> > + pr_warning("%s: unable to allocate device struct\n",
> > + __func__);
> > + return -ENOMEM;
> > + }
> > +
> > + dev_opp->dev = dev;
> > + INIT_LIST_HEAD(&dev_opp->opp_list);
> > +
> > + list_add(&dev_opp->node, &dev_opp_list);
> > + }
> > +
> > + /* allocate new OPP node */
> > + new_opp = kzalloc(sizeof(struct opp), GFP_KERNEL);
> > + if (!new_opp) {
> > + if (list_empty(&dev_opp->opp_list)) {
> > + list_del(&dev_opp->node);
>
> It would be neater to move the list_add() down to after the allocation
> of new_opp and to remove this list_del().
>
> > + kfree(dev_opp);
> > + }
> > + pr_warning("%s: unable to allocate new opp node\n",
> > + __func__);
> > + return -ENOMEM;
> > + }
> > + opp_populate(new_opp, opp_def);
> > +
> > + /* Insert new OPP in order of increasing frequency */
> > + head = &dev_opp->opp_list;
> > + list_for_each_entry_reverse(opp, &dev_opp->opp_list, node) {
> > + if (new_opp->rate >= opp->rate) {
> > + head = &opp->node;
> > + break;
> > + }
> > + }
> > + list_add(&new_opp->node, head);
> > + dev_opp->opp_count++;
> > + if (new_opp->enabled)
> > + dev_opp->enabled_opp_count++;
>
> These non-atomic read-modify-write operations on *dev_opp have no
> locking. What prevents races here?
>
> > + return 0;
> > +}
> > +
> >
> > ...
> >
> > +void opp_init_cpufreq_table(struct device *dev,
> > + struct cpufreq_frequency_table **table)
> > +{
> > + struct device_opp *dev_opp;
> > + struct opp *opp;
> > + struct cpufreq_frequency_table *freq_table;
> > + int i = 0;
> > +
> > + dev_opp = find_device_opp(dev);
> > + if (IS_ERR(dev_opp)) {
> > + pr_warning("%s: unable to find device\n", __func__);
> > + return;
> > + }
> > +
> > + freq_table = kzalloc(sizeof(struct cpufreq_frequency_table) *
> > + (dev_opp->enabled_opp_count + 1), GFP_ATOMIC);
> > + if (!freq_table) {
> > + pr_warning("%s: failed to allocate frequency table\n",
> > + __func__);
> > + return;
> > + }
> > +
> > + list_for_each_entry(opp, &dev_opp->opp_list, node) {
> > + if (opp->enabled) {
> > + freq_table[i].index = i;
> > + freq_table[i].frequency = opp->rate / 1000;
> > + i++;
> > + }
> > + }
> > +
> > + freq_table[i].index = i;
> > + freq_table[i].frequency = CPUFREQ_TABLE_END;
> > +
> > + *table = &freq_table[0];
> > +}
>
> So we're playing with cpufreq internals here but there's no #ifdef
> CONFIG_CPUFREQ and there's no Kconfig dependency on cpufreq. That
> needs fixing I think, if only from a reduce-code-bloat perspective.
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* [linux-pm] [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 15:53 ` Nishanth Menon
2010-09-17 15:59 ` Mark Brown
@ 2010-09-17 22:22 ` Rafael J. Wysocki
2010-09-17 22:26 ` Nishanth Menon
1 sibling, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2010-09-17 22:22 UTC (permalink / raw)
To: linux-arm-kernel
On Friday, September 17, 2010, Nishanth Menon wrote:
> Mark Brown had written, on 09/17/2010 10:36 AM, the following:
> > On Thu, Sep 16, 2010 at 08:29:33PM -0500, Nishanth Menon wrote:
> >
> >> +struct opp_def {
> >> + unsigned long freq;
> >> + unsigned long u_volt;
> >> +
> >> + bool enabled;
> >> +};
> >
> > It might be clearer to use some term other than enabled in the code -
> > when reading I wasn't immediately sure if enabled meant that it was
> > available to be selected or if it was the active operating point. How
> > about 'allowed' (though I'm not 100% happy with that)?
> ;).. The opp is enabled or disabled if it is populated, it is implicit
> as being available but not enabled- how about active? this would change
> the opp_enable/disable functions to opp_activate, opp_deactivate..
Would that mean that "active" is the one currently in use?
Rafael
^ permalink raw reply [flat|nested] 34+ messages in thread
* [linux-pm] [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 22:22 ` Rafael J. Wysocki
@ 2010-09-17 22:26 ` Nishanth Menon
2010-09-17 22:52 ` Rafael J. Wysocki
0 siblings, 1 reply; 34+ messages in thread
From: Nishanth Menon @ 2010-09-17 22:26 UTC (permalink / raw)
To: linux-arm-kernel
Rafael J. Wysocki had written, on 09/17/2010 05:22 PM, the following:
> On Friday, September 17, 2010, Nishanth Menon wrote:
>> Mark Brown had written, on 09/17/2010 10:36 AM, the following:
>>> On Thu, Sep 16, 2010 at 08:29:33PM -0500, Nishanth Menon wrote:
>>>
>>>> +struct opp_def {
>>>> + unsigned long freq;
>>>> + unsigned long u_volt;
>>>> +
>>>> + bool enabled;
>>>> +};
>>> It might be clearer to use some term other than enabled in the code -
>>> when reading I wasn't immediately sure if enabled meant that it was
>>> available to be selected or if it was the active operating point. How
>>> about 'allowed' (though I'm not 100% happy with that)?
>> ;).. The opp is enabled or disabled if it is populated, it is implicit
>> as being available but not enabled- how about active? this would change
>> the opp_enable/disable functions to opp_activate, opp_deactivate..
>
> Would that mean that "active" is the one currently in use?
I like the idea Phil pointed out[1] on using "available" instead..
opp_enable and disable will make the OPP available or not. does this
sound better?
[1] http://marc.info/?l=linux-arm-kernel&m=128474217132058&w=2
--
Regards,
Nishanth Menon
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 15:59 ` Nishanth Menon
@ 2010-09-17 22:45 ` Rafael J. Wysocki
2010-09-17 23:19 ` Nishanth Menon
0 siblings, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2010-09-17 22:45 UTC (permalink / raw)
To: linux-arm-kernel
On Friday, September 17, 2010, Nishanth Menon wrote:
> Nishanth Menon had written, on 09/17/2010 10:05 AM, the following:
> > Linus Walleij had written, on 09/17/2010 08:41 AM, the following:
> >> 2010/9/17 Nishanth Menon <nm@ti.com>:
> >>
> >>> diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
> >>> index fb742c2..45e9d4a 100644
> >>> --- a/Documentation/power/00-INDEX
> >>> +++ b/Documentation/power/00-INDEX
> >>> @@ -14,6 +14,8 @@ interface.txt
> >>> - Power management user interface in /sys/power
> >>> notifiers.txt
> >>> - Registering suspend notifiers in device drivers
> >>> +opp.txt
> >>> + - Operating Performance Point library
> >>> pci.txt
> >>> - How the PCI Subsystem Does Power Management
> >> Hm you add the documentation to the index but not the documentation
> >> itself (easy slip) would you mind posting it?
> > ouch.. Sorry.. I realized that I missed adding MAINTAINER entry as
> > well.. v2 coming up..
> >
>
> while the review goes on, I will till end of the day before posting a v2
> will collated comments, here is the opp.txt documentation for the same..
> apologies on missing in my v1..
OK, I'm not generally familiar with these things, so a couple of questions.
> OPP Layer
> ==============
> SOCs have a standard set of tuples consisting of frequency and
> voltage pairs that the device will support per voltage domain. This
> is called Operating Performance Point or OPP. The actual definitions
> of OPP varies over silicon within the same family of devices.
> For a specific domain, you can have a set of {frequency, voltage}
> pairs. As the kernel boots and more information is available, a set
> of these are activated based on the precise nature of device the kernel
> boots up on.
Does it mean that the full set of possible OPPs is already known from the
hardware configuration and the ones that are actually useable are found
during boot?
> It is interesting to remember that each IP which belongs
> to a voltage domain may define their own set of OPPs on top of this.
What does IP stand for in this context?
> OPP layer of its own depends on silicon specific implementation and
> board specific data to finalize on the final set of OPPs available
> in a system
How does the kernel learn about these things? Do they need to be hardcoded
or is there any dynamic configuration mechanism available?
> OPP layer organizes the data internally using device pointers representing
> individual voltage domains.
Can you please describe these data structures in a bit more detail?
> NOTE:
> a) the OPP layer implementation expects to be used in multiple contexts
> based on SOC implementation and recommends locking schemes appropriate to
> the usage of SOC.
> b) All pointer returns are expected to be handled by standard error checks
> such as IS_ERR() and appropriate actions taken by the caller.
I noticed that in a few places it isn't really necessary to return a pointer.
I think you can simply return int in such cases.
> c) Dependency of OPP layer is on CONFIG_PM as certain SOCs such as Texas
> Instrument's OMAP support have frameworks to optionally boot at a certain
> opp without needing cpufreq.
>
> Initial list initialization:
> ---------------------------
> The SOC implementation calls the following function to add opp per
> domain device.
>
> 1. opp_add - add a new OPP - NOTE: use struct opp_def and define
> the custom OPP with OPP_DEF for usage.
>
> Query functions:
> ----------------
> Search for OPPs for various cases:
> 2. opp_find_freq_exact - exact search function
> 3. opp_find_freq_floor - round_up search function
> 4. opp_find_freq_ceil - round_down search function
What do they do exactly?
> OPP modifier functions:
> ----------------------
> This allows opp layer users to add customized OPPs or change the table
How exactly is that supposed to work?
> for any need they may have
> 5. opp_enable - enable a disabled OPP
> 6. opp_disable - disable an enabled OPP
>
> OPP Data retrieval functions:
> ----------------------------
> The following sets of functions are useful for drivers to retrieve
> data stored in opp layer for various functions.
> 7. opp_get_voltage - retrieve voltage for an opp
> 9. opp_get_freq - get the frequency for an opp
> 8. opp_get_opp_count - get number of opps enabled for a domain
OK, suppose I'm writing a driver that needs to care. What should I use these
functions for?
> Cpufreq table generation:
> ------------------------
> 9. opp_init_cpufreq_table - this translates the OPP layer's internal
> OPP arrangement into a table understood and operated upon by the
> cpufreq layer.
>
> Data Structures:
> ---------------
> struct opp * is a handle structure whose internals are known only
> to the OPP layer and is meant to hide the complexity away from users of
> opp layer.
>
> struct opp_def * is the definitions that users can interface with
> opp layer and is meant to define one OPP for a domain device.
The data structures require some more description IMHO. They aren't completely
intuitive.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 21:23 ` Nishanth Menon
@ 2010-09-17 22:51 ` Kevin Hilman
2010-09-17 23:07 ` Rafael J. Wysocki
1 sibling, 0 replies; 34+ messages in thread
From: Kevin Hilman @ 2010-09-17 22:51 UTC (permalink / raw)
To: linux-arm-kernel
Nishanth Menon <nm@ti.com> writes:
> Andrew Morton had written, on 09/17/2010 02:19 PM, the following:
>> On Thu, 16 Sep 2010 20:29:33 -0500
>> Nishanth Menon <nm@ti.com> wrote:
>>
[...]
>>> +void opp_init_cpufreq_table(struct device *dev,
>>> + struct cpufreq_frequency_table **table)
>>> +{
>>> + struct device_opp *dev_opp;
>>> + struct opp *opp;
>>> + struct cpufreq_frequency_table *freq_table;
>>> + int i = 0;
>>> +
>>> + dev_opp = find_device_opp(dev);
>>> + if (IS_ERR(dev_opp)) {
>>> + pr_warning("%s: unable to find device\n", __func__);
>>> + return;
>>> + }
>>> +
>>> + freq_table = kzalloc(sizeof(struct cpufreq_frequency_table) *
>>> + (dev_opp->enabled_opp_count + 1), GFP_ATOMIC);
>>> + if (!freq_table) {
>>> + pr_warning("%s: failed to allocate frequency table\n",
>>> + __func__);
>>> + return;
>>> + }
>>> +
>>> + list_for_each_entry(opp, &dev_opp->opp_list, node) {
>>> + if (opp->enabled) {
>>> + freq_table[i].index = i;
>>> + freq_table[i].frequency = opp->rate / 1000;
>>> + i++;
>>> + }
>>> + }
>>> +
>>> + freq_table[i].index = i;
>>> + freq_table[i].frequency = CPUFREQ_TABLE_END;
>>> +
>>> + *table = &freq_table[0];
>>> +}
>>
>> So we're playing with cpufreq internals here but there's no #ifdef
>> CONFIG_CPUFREQ and there's no Kconfig dependency on cpufreq. That
>> needs fixing I think, if only from a reduce-code-bloat perspective.
>
> Thanks and ouch.. Again missing documentation. Apologies.
> http://marc.info/?l=linux-arm-kernel&m=128473931626114&w=2
>
> c) Dependency of OPP layer is on CONFIG_PM as certain SOCs such as Texas
> Instrument's OMAP support have frameworks to optionally boot at a
> certain opp without needing cpufreq.
>
> This is called "mpurate" bootarg parameter in OMAP framework. I will
> put this under #ifdef CPUFREQ and provide header coverage for the same
> appropriately.
The OPP layer in general is dependent on CONFIG_PM, but the snippit
above is called only by CPUfreq core when CPUfreq is enabled, so at
least that function should be under #ifdef CPUFREQ.
Kevin
^ permalink raw reply [flat|nested] 34+ messages in thread
* [linux-pm] [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 22:26 ` Nishanth Menon
@ 2010-09-17 22:52 ` Rafael J. Wysocki
0 siblings, 0 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2010-09-17 22:52 UTC (permalink / raw)
To: linux-arm-kernel
On Saturday, September 18, 2010, Nishanth Menon wrote:
> Rafael J. Wysocki had written, on 09/17/2010 05:22 PM, the following:
> > On Friday, September 17, 2010, Nishanth Menon wrote:
> >> Mark Brown had written, on 09/17/2010 10:36 AM, the following:
> >>> On Thu, Sep 16, 2010 at 08:29:33PM -0500, Nishanth Menon wrote:
> >>>
> >>>> +struct opp_def {
> >>>> + unsigned long freq;
> >>>> + unsigned long u_volt;
> >>>> +
> >>>> + bool enabled;
> >>>> +};
> >>> It might be clearer to use some term other than enabled in the code -
> >>> when reading I wasn't immediately sure if enabled meant that it was
> >>> available to be selected or if it was the active operating point. How
> >>> about 'allowed' (though I'm not 100% happy with that)?
> >> ;).. The opp is enabled or disabled if it is populated, it is implicit
> >> as being available but not enabled- how about active? this would change
> >> the opp_enable/disable functions to opp_activate, opp_deactivate..
> >
> > Would that mean that "active" is the one currently in use?
>
> I like the idea Phil pointed out[1] on using "available" instead..
> opp_enable and disable will make the OPP available or not. does this
> sound better?
Yes, it does.
Rafael
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 21:23 ` Nishanth Menon
2010-09-17 22:51 ` Kevin Hilman
@ 2010-09-17 23:07 ` Rafael J. Wysocki
2010-09-17 23:33 ` Nishanth Menon
2010-09-19 19:46 ` Mark Brown
1 sibling, 2 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2010-09-17 23:07 UTC (permalink / raw)
To: linux-arm-kernel
On Friday, September 17, 2010, Nishanth Menon wrote:
> Andrew Morton had written, on 09/17/2010 02:19 PM, the following:
> > On Thu, 16 Sep 2010 20:29:33 -0500
> > Nishanth Menon <nm@ti.com> wrote:
> >
> [...]
> >>
> >> Documentation/power/00-INDEX | 2 +
> >> include/linux/opp.h | 136 +++++++++++++
> >> kernel/power/Kconfig | 14 ++
> >> lib/Makefile | 2 +
> >> lib/opp.c | 440 ++++++++++++++++++++++++++++++++++++++++++
> >
> > ./lib/ is an unusual place to put a driver-like thing such as this.
> > The lib/ directory is mainly for generic kernel-wide support things.
> > I'd suggest that ./drivers/opp/ would be a better place.
> we had an interesting debate on this topic on the the thread starting
> http://marc.info/?l=linux-arm-kernel&m=128465710624421&w=2
>
> It really does not provide any driver like feature here. it is just a
> bunch of helper functions which any SOC framework can use to build up to
> implement either:
> a) cpufreq implementation with various OPPs
> b) bootup in a specific opp in a set of supported OPPs and stay there
> (e.g. mpurate support on OMAP platforms)
Generally, some files under drivers/base/power contain helper functions of
similar nature. You can put it in there, as far as I'm concerned.
> I had considered putting it in drivers/power, but it looks to contain
> mostly regulator stuff, the other alternative would be to include it in
> kernel/power or as Kevin H, Linus W and you mention drivers/opp
>
> Personally, I dont have a strong feeling about it as long as it is
> reusable across SOCs and am hoping to hear some alignment :).
>
> >
> >> ...
> >>
> >> +/*
> >> + * Initialization wrapper used to define an OPP.
> >> + * To point at the end of a terminator of a list of OPPs,
> >> + * use OPP_DEF(0, 0, 0)
> >> + */
> >> +#define OPP_DEF(_enabled, _freq, _uv) \
> >> +{ \
> >> + .enabled = _enabled, \
> >> + .freq = _freq, \
> >> + .u_volt = _uv, \
> >> +}
> >
> > OPP_DEF is a somewhat atypical name. OPP_INITIALIZER would be more
> > conventional.
> Thanks. will incorporate this in v2 of my patchset.
>
> >
> > However OPP_DEF has no usage in this patch so perhaps this can be
> > removed?
> there were a few OMAP followon patches from
> http://marc.info/?l=linux-omap&m=128152135801634&w=2
> which will actually use this for OMAP3 platform.
Still, the $subject patch doesn't use it. So perhaps add it in a separate patch?
> >> +static LIST_HEAD(dev_opp_list);
> >
> > There's no locking for this list. That's OK under some circumstances,
> > but I do think there should be a comment here explaining this apparent
> > bug: why is no locking needed, what are the lifetime rules for entries
> > on this list.
> Locking:
> arrgh.. my bad.. I had documented it in the missing and later on posted
> opp.txt http://marc.info/?l=linux-arm-kernel&m=128473931626114&w=2
>
> The OPP layer implementation expects to be used in multiple contexts
> (including calls from interrupt locked context) based on SOC framework
> implementation. It is recommended to use appropriate locking schemes
> within the SOC framework itself.
That should be stated directly in the kerneldoc comments. The requirement
that the caller is supposed to ensure suitable locking is by no means
a trivial one.
Apart from this, it might be a good idea to help callers a bit and actually
introduce some sort of locking into the framework.
> In terms of the lifetime rules on the nodes in the list:
> The list is expected to be maintained once created, entries are expected
> to be added optimally and not expected to be destroyed, the choice of
> list implementation was for reducing the complexity of the code itself
> and not yet meant as a mechanism to dynamically add and delete nodes on
> the fly.. Essentially, it was intended for the SOC framework to ensure
> it plugs in the OPP entries optimally and not create a humungous list of
> all possible OPPs for all families of the vendor SOCs - even though it
> is possible to use the OPP layer so - it just wont be smart to do so
> considering list scan latencies on hot paths such as cpufreq transitions
> or idle transitions.
If the list nodes are not supposed to be added and removed dynamically,
it probably would make sense to create data structures containing
the "available" OPPs only, once they are known, and simply free the object
representing the other ones.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 22:45 ` Rafael J. Wysocki
@ 2010-09-17 23:19 ` Nishanth Menon
2010-09-18 19:11 ` Rafael J. Wysocki
0 siblings, 1 reply; 34+ messages in thread
From: Nishanth Menon @ 2010-09-17 23:19 UTC (permalink / raw)
To: linux-arm-kernel
Rafael J. Wysocki had written, on 09/17/2010 05:45 PM, the following:
Thanks for your review. few views below..
> On Friday, September 17, 2010, Nishanth Menon wrote:
>> Nishanth Menon had written, on 09/17/2010 10:05 AM, the following:
>>> Linus Walleij had written, on 09/17/2010 08:41 AM, the following:
>>>> 2010/9/17 Nishanth Menon <nm@ti.com>:
>>>>
>>>>> diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
>>>>> index fb742c2..45e9d4a 100644
>>>>> --- a/Documentation/power/00-INDEX
>>>>> +++ b/Documentation/power/00-INDEX
>>>>> @@ -14,6 +14,8 @@ interface.txt
>>>>> - Power management user interface in /sys/power
>>>>> notifiers.txt
>>>>> - Registering suspend notifiers in device drivers
>>>>> +opp.txt
>>>>> + - Operating Performance Point library
>>>>> pci.txt
>>>>> - How the PCI Subsystem Does Power Management
>>>> Hm you add the documentation to the index but not the documentation
>>>> itself (easy slip) would you mind posting it?
>>> ouch.. Sorry.. I realized that I missed adding MAINTAINER entry as
>>> well.. v2 coming up..
>>>
>> while the review goes on, I will till end of the day before posting a v2
>> will collated comments, here is the opp.txt documentation for the same..
>> apologies on missing in my v1..
>
> OK, I'm not generally familiar with these things, so a couple of questions.
>
>> OPP Layer
>> ==============
>> SOCs have a standard set of tuples consisting of frequency and
>> voltage pairs that the device will support per voltage domain. This
>> is called Operating Performance Point or OPP. The actual definitions
>> of OPP varies over silicon within the same family of devices.
>> For a specific domain, you can have a set of {frequency, voltage}
>> pairs. As the kernel boots and more information is available, a set
>> of these are activated based on the precise nature of device the kernel
>> boots up on.
>
> Does it mean that the full set of possible OPPs is already known from the
> hardware configuration and the ones that are actually useable are found
> during boot?
yes. example - https://patchwork.kernel.org/patch/183742/ (based on
original patch posted to l-arm, this will need some tweaks with the
latest evolution, the concepts remain the same).
For example, consider in the patch above where we have a opp definitions
we can choose from omap3430 and 3630,
when using 3630 silicon as a specific, certain boards wish to enable
1GHz, this allows us to do something as follows:
in the generic omap code, we do init of all opps
in the board specific file, we enable 1Ghz
The selection of oppset and actual availability is done on the fly.
>
>> It is interesting to remember that each IP which belongs
>> to a voltage domain may define their own set of OPPs on top of this.
>
> What does IP stand for in this context?
I had intended domains linked to an IP Block within an SOC - in the case
of OMAP as an example we may have scenarios where voltage levels are
optimized beyond those defined in the original OPP tables using a TI
hardware IP called SmartReflex AVS (ok many SOCs probably are not that
complex.. but it is a cool feature to enable improve the power
consumption). more:
http://www.powersystemsdesign.com/index.php?option=com_content&view=article&id=168&Itemid=107
So in reality once you enable you'd have a lot more tuples of (Fx,
Vnom),(Fx, Vopt1), (Fx, Vopt2).... where (Fx, Vnom) is the original opp
definition.
Unfortunately our framework for supporting this is still in churn and
probably not ready for upstream yet.
>
>> OPP layer of its own depends on silicon specific implementation and
>> board specific data to finalize on the final set of OPPs available
>> in a system
>
> How does the kernel learn about these things? Do they need to be hardcoded
> or is there any dynamic configuration mechanism available?
yes, it is prefered to allow this to be SOC specific - every SOC
(including various OMAP families) have their capabilities allowing them
to implement the detection and configuration differently.. maybe device
trees could be in use in one SOCx, but SOCy might choose to implement a
run time detection as demonstrated in the link above.
>
>> OPP layer organizes the data internally using device pointers representing
>> individual voltage domains.
>
> Can you please describe these data structures in a bit more detail?
probably a dumb question: Is'nt it enough to give detailed verbose
information in the struct comment header? why duplicate here as well..
other than giving an overview?
>
>> NOTE:
>> a) the OPP layer implementation expects to be used in multiple contexts
>> based on SOC implementation and recommends locking schemes appropriate to
>> the usage of SOC.
>> b) All pointer returns are expected to be handled by standard error checks
>> such as IS_ERR() and appropriate actions taken by the caller.
>
> I noticed that in a few places it isn't really necessary to return a pointer.
> I think you can simply return int in such cases.
could you help and point these to me. opp_find_freq_exact,
opp_find_freq_ceil, opp_find_freq_floor are the only ones that return
pointers and they need to return the pointer to the opp they found to
operate on them such as add_freq etc..
>
>> c) Dependency of OPP layer is on CONFIG_PM as certain SOCs such as Texas
>> Instrument's OMAP support have frameworks to optionally boot at a certain
>> opp without needing cpufreq.
>>
>> Initial list initialization:
>> ---------------------------
>> The SOC implementation calls the following function to add opp per
>> domain device.
>>
>> 1. opp_add - add a new OPP - NOTE: use struct opp_def and define
>> the custom OPP with OPP_DEF for usage.
>>
>> Query functions:
>> ----------------
>> Search for OPPs for various cases:
>> 2. opp_find_freq_exact - exact search function
>> 3. opp_find_freq_floor - round_up search function
>> 4. opp_find_freq_ceil - round_down search function
>
> What do they do exactly?
I assume you are expecting to get a little more verbose data here to the
effect:
2. opp_find_freq_exact - function to find an opp with exact match to a
provided frequency
3. opp_find_freq_floor - function to find an opp with rounded floor
match to a provided frequency
4. opp_find_freq_ceil - function to find an opp with rounded ceil match
to a provided frequency
>
>> OPP modifier functions:
>> ----------------------
>> This allows opp layer users to add customized OPPs or change the table
>
> How exactly is that supposed to work?
:) This is more from the real way folks who use the OPP on thier
platforms face. As an example: OMAP3630 supports 100MHz and 200MHz for
the L3 bus driving DDR. some platforms choose not to have 200MHz DDR due
to some internal reasons, instead they choose a DDR - for the sake of
argument say 166MHz. This is not actually "official OPP" from TI
perspective, but these platforms need to do:
init_omap_default_opps()
add_my_custom_166MHz_opp()
disable_the_default_200MHz()
Disclaimer to prevent OMAP guys from flaming me: the values used in my
example are hypothetical and is not an official statement from TI to use
166MHz on OMAP3630 ;). But there are folks not using 200MHz and they
know the pain if this runtime addition was not possible.
>
>> for any need they may have
>> 5. opp_enable - enable a disabled OPP
>> 6. opp_disable - disable an enabled OPP
>>
>> OPP Data retrieval functions:
>> ----------------------------
>> The following sets of functions are useful for drivers to retrieve
>> data stored in opp layer for various functions.
>> 7. opp_get_voltage - retrieve voltage for an opp
>> 9. opp_get_freq - get the frequency for an opp
>> 8. opp_get_opp_count - get number of opps enabled for a domain
>
> OK, suppose I'm writing a driver that needs to care. What should I use these
> functions for?
opp_get_opp_count gets the number of opps available for that domain -
you may choose to setup some sort of array containing some special data
on it.
cpufreq SOCs impleemntation might be told to go to a frequency, it makes
sense for the implementation to pick up the opp pointer with search
function and call opp_get_voltage to get the voltage to scale voltage etc..
Or am I missing your intent here? I did not intend to explain the
concept of power management in this documentation..
>
>> Cpufreq table generation:
>> ------------------------
>> 9. opp_init_cpufreq_table - this translates the OPP layer's internal
>> OPP arrangement into a table understood and operated upon by the
>> cpufreq layer.
>>
>> Data Structures:
>> ---------------
>> struct opp * is a handle structure whose internals are known only
>> to the OPP layer and is meant to hide the complexity away from users of
>> opp layer.
>>
>> struct opp_def * is the definitions that users can interface with
>> opp layer and is meant to define one OPP for a domain device.
>
> The data structures require some more description IMHO. They aren't completely
> intuitive.
ok, a repeat question -> why duplicate the information defined in the
structure comment header?
--
Regards,
Nishanth Menon
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 23:07 ` Rafael J. Wysocki
@ 2010-09-17 23:33 ` Nishanth Menon
2010-09-18 18:41 ` Rafael J. Wysocki
2010-09-19 19:46 ` Mark Brown
1 sibling, 1 reply; 34+ messages in thread
From: Nishanth Menon @ 2010-09-17 23:33 UTC (permalink / raw)
To: linux-arm-kernel
Rafael J. Wysocki had written, on 09/17/2010 06:07 PM, the following:
> On Friday, September 17, 2010, Nishanth Menon wrote:
>> Andrew Morton had written, on 09/17/2010 02:19 PM, the following:
>>> On Thu, 16 Sep 2010 20:29:33 -0500
>>> Nishanth Menon <nm@ti.com> wrote:
>>>
>> [...]
>>>> Documentation/power/00-INDEX | 2 +
>>>> include/linux/opp.h | 136 +++++++++++++
>>>> kernel/power/Kconfig | 14 ++
>>>> lib/Makefile | 2 +
>>>> lib/opp.c | 440 ++++++++++++++++++++++++++++++++++++++++++
>>> ./lib/ is an unusual place to put a driver-like thing such as this.
>>> The lib/ directory is mainly for generic kernel-wide support things.
>>> I'd suggest that ./drivers/opp/ would be a better place.
>> we had an interesting debate on this topic on the the thread starting
>> http://marc.info/?l=linux-arm-kernel&m=128465710624421&w=2
>>
>> It really does not provide any driver like feature here. it is just a
>> bunch of helper functions which any SOC framework can use to build up to
>> implement either:
>> a) cpufreq implementation with various OPPs
>> b) bootup in a specific opp in a set of supported OPPs and stay there
>> (e.g. mpurate support on OMAP platforms)
>
> Generally, some files under drivers/base/power contain helper functions of
> similar nature. You can put it in there, as far as I'm concerned.
Ack. thanks for the suggestion. it looks like a good place to me too.
will do it as part of v2.
>
>> I had considered putting it in drivers/power, but it looks to contain
>> mostly regulator stuff, the other alternative would be to include it in
>> kernel/power or as Kevin H, Linus W and you mention drivers/opp
>>
>> Personally, I dont have a strong feeling about it as long as it is
>> reusable across SOCs and am hoping to hear some alignment :).
>>
>>>> ...
>>>>
>>>> +/*
>>>> + * Initialization wrapper used to define an OPP.
>>>> + * To point at the end of a terminator of a list of OPPs,
>>>> + * use OPP_DEF(0, 0, 0)
>>>> + */
>>>> +#define OPP_DEF(_enabled, _freq, _uv) \
>>>> +{ \
>>>> + .enabled = _enabled, \
>>>> + .freq = _freq, \
>>>> + .u_volt = _uv, \
>>>> +}
>>> OPP_DEF is a somewhat atypical name. OPP_INITIALIZER would be more
>>> conventional.
>> Thanks. will incorporate this in v2 of my patchset.
>>
>>> However OPP_DEF has no usage in this patch so perhaps this can be
>>> removed?
>> there were a few OMAP followon patches from
>> http://marc.info/?l=linux-omap&m=128152135801634&w=2
>> which will actually use this for OMAP3 platform.
>
> Still, the $subject patch doesn't use it. So perhaps add it in a separate patch?
was hoping to align on the approach with v2, the follow on OMAP patch
https://patchwork.kernel.org/patch/183742/ will post with needed changes
be part of Kevin's thread for OMAP3 if we are ok with it, or Do we want
to see it in this series as reference?
>
>>>> +static LIST_HEAD(dev_opp_list);
>>> There's no locking for this list. That's OK under some circumstances,
>>> but I do think there should be a comment here explaining this apparent
>>> bug: why is no locking needed, what are the lifetime rules for entries
>>> on this list.
>> Locking:
>> arrgh.. my bad.. I had documented it in the missing and later on posted
>> opp.txt http://marc.info/?l=linux-arm-kernel&m=128473931626114&w=2
>>
>> The OPP layer implementation expects to be used in multiple contexts
>> (including calls from interrupt locked context) based on SOC framework
>> implementation. It is recommended to use appropriate locking schemes
>> within the SOC framework itself.
>
> That should be stated directly in the kerneldoc comments. The requirement
> that the caller is supposed to ensure suitable locking is by no means
> a trivial one.
Agreed. thanks for pointing this out. I have done this as part of v2 of
my patch. will post in a few hrs.
>
> Apart from this, it might be a good idea to help callers a bit and actually
> introduce some sort of locking into the framework.
in OMAP implementation we have need to use these apis from irq_locked
contexts as well.. lock implementation will prevent thier necessary
usage. instead we have implemented lock mechanisms in higher layers and
ensured that the opp table does not change once created - the rest of
them are query functions - the risk is on opp_enable/disable.
>
>> In terms of the lifetime rules on the nodes in the list:
>> The list is expected to be maintained once created, entries are expected
>> to be added optimally and not expected to be destroyed, the choice of
>> list implementation was for reducing the complexity of the code itself
>> and not yet meant as a mechanism to dynamically add and delete nodes on
>> the fly.. Essentially, it was intended for the SOC framework to ensure
>> it plugs in the OPP entries optimally and not create a humongous list of
>> all possible OPPs for all families of the vendor SOCs - even though it
>> is possible to use the OPP layer so - it just wont be smart to do so
>> considering list scan latencies on hot paths such as cpufreq transitions
>> or idle transitions.
>
> If the list nodes are not supposed to be added and removed dynamically,
> it probably would make sense to create data structures containing
> the "available" OPPs only, once they are known, and simply free the object
> representing the other ones.
I covered the usage in my reply here:
http://marc.info/?l=linux-arm-kernel&m=128476570300466&w=2
but to repeat, the list is dynamic during initialization but remains
static after initialization based on SOC framework implementation - this
is best implemented with a list (we had started with an original array
implementation which evolved to the current list implementation
http://marc.info/?l=linux-omap&m=125912217718770&w=2)
--
Regards,
Nishanth Menon
^ permalink raw reply [flat|nested] 34+ messages in thread
* [linux-pm] [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 15:59 ` Mark Brown
@ 2010-09-18 0:37 ` Kevin Hilman
2010-09-18 10:04 ` Mark Brown
0 siblings, 1 reply; 34+ messages in thread
From: Kevin Hilman @ 2010-09-18 0:37 UTC (permalink / raw)
To: linux-arm-kernel
[trimmed Cc list a bit, as vger bounced my last reply due to header too long]
Mark Brown <broonie@opensource.wolfsonmicro.com> writes:
> On Fri, Sep 17, 2010 at 10:53:06AM -0500, Nishanth Menon wrote:
>> Mark Brown had written, on 09/17/2010 10:36 AM, the following:
>
>> >It might be clearer to use some term other than enabled in the code -
>> >when reading I wasn't immediately sure if enabled meant that it was
>> >available to be selected or if it was the active operating point. How
>> >about 'allowed' (though I'm not 100% happy with that)?
>
>> ;).. The opp is enabled or disabled if it is populated, it is
>> implicit as being available but not enabled- how about active? this
>> would change the opp_enable/disable functions to opp_activate,
>> opp_deactivate..
>
>> Recommendations folks?
>
> The enable/disable thing wasn't so noticable in the API itself, it was
> in the data structures that I found it confusing - the core has a
> different idea about what's going on with the system as a whole compared
> to the decision that an individual device is taking.
Can you clarify your confusion here? I don't follow the problem you're
raising.
The enabled flag in the internal data structure is set to true when
opp_enable() is called and set to false when opp_disable() is called.
I'm not sure
At least as we're currently using it, this API has know knowlege of what
OPP is currently active on the system. IOW, it has no idea what the
current frequency/voltage a given device is set to. It's intended more
like an OPP database with some convience functions to search through the
OPPs and to make OPPs available (or not.) The users of this API (in our
case, CPUfreq and voltage scaling code) are the ones which
keep track of which OPP is actually in use.
As I write this, I agree with what Phil pointed out; that 'available' is
probably the right name for this flag, instead of 'enabled.'
>> >When reading the description I'd expected to see some facility to
>> >trigger selection of an active operating point in the library (possibly
>> >as a separate call since you might have a bunch of operating points
>> >being updated in quick succession) but it looks like that needs to be
>> >supplied externally at the minute?
>
>> The intent is we use the opp_search* functions to pick up the opp
>> and enable/activate it and disable/deactivate it.
>
> Sure, I get that bit. What I meant was that I was expecting something
> that would say that changes had been made to the enabled/disabled sets
> and that it'd be a good idea to rescan, especially for cases where the
> devices change their requirements but the OPPs need to be done over a
> larger block.
I guess you're thinking of notifiers, so if the list of available OPPs
changes, a driver could (re)search to see if a more optimal OPP was
available?
Certainly sounds possible, but not sure how useful in practice. OPP
change decisions are not very often, so any OPP database updates may not
affect a device OPP change currently in progress, but would take effect
the next time that device makes an OPP change.
Either way, this is something that could easily be added if it seems
useful.
Kevin
^ permalink raw reply [flat|nested] 34+ messages in thread
* [linux-pm] [PATCH] opp: introduce library for device-specific OPPs
2010-09-18 0:37 ` Kevin Hilman
@ 2010-09-18 10:04 ` Mark Brown
0 siblings, 0 replies; 34+ messages in thread
From: Mark Brown @ 2010-09-18 10:04 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Sep 17, 2010 at 05:37:06PM -0700, Kevin Hilman wrote:
> [trimmed Cc list a bit, as vger bounced my last reply due to header too long]
> Mark Brown <broonie@opensource.wolfsonmicro.com> writes:
> > The enable/disable thing wasn't so noticable in the API itself, it was
> > in the data structures that I found it confusing - the core has a
> > different idea about what's going on with the system as a whole compared
> > to the decision that an individual device is taking.
> Can you clarify your confusion here? I don't follow the problem you're
> raising.
The confusion is between the enable flag meaning that the operating
point is on the list that can be chosen from and it being the currently
active one. It's clear once you understand what the API does but the
current naming makes that harder to grasp.
> As I write this, I agree with what Phil pointed out; that 'available' is
> probably the right name for this flag, instead of 'enabled.'
Yup, me too.
> > Sure, I get that bit. What I meant was that I was expecting something
> > that would say that changes had been made to the enabled/disabled sets
> > and that it'd be a good idea to rescan, especially for cases where the
> > devices change their requirements but the OPPs need to be done over a
> > larger block.
> I guess you're thinking of notifiers, so if the list of available OPPs
> changes, a driver could (re)search to see if a more optimal OPP was
> available?
Yup, or at least some suggestion in the API for how this should be done.
> Certainly sounds possible, but not sure how useful in practice. OPP
> change decisions are not very often, so any OPP database updates may not
> affect a device OPP change currently in progress, but would take effect
> the next time that device makes an OPP change.
Like I say, the main use case would be when the device itself is not
making the actual operating point decision but is feeding in to a wider
block - if the device implements the decision then it really shouldn't
need to notify itself about it, though I guess it might be handy.
> Either way, this is something that could easily be added if it seems
> useful.
My concern there would be divergent idioms for working with the API.
It'd seem better to start everyone off down the same path if we can
rather than have differences between platforms which make life harder
when moving between mulitple platforms.
^ permalink raw reply [flat|nested] 34+ messages in thread
* [linux-pm] [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 16:45 ` Phil Carmody
@ 2010-09-18 10:08 ` Mark Brown
0 siblings, 0 replies; 34+ messages in thread
From: Mark Brown @ 2010-09-18 10:08 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Sep 17, 2010 at 07:45:43PM +0300, Phil Carmody wrote:
> On 17/09/10 17:36 +0200, ext Mark Brown wrote:
> > It might be clearer to use some term other than enabled in the code -
> > when reading I wasn't immediately sure if enabled meant that it was
> > available to be selected or if it was the active operating point. How
> ^^^^^^^^^
> > about 'allowed' (though I'm not 100% happy with that)?
> I think your query contains its own answer.
Yeah, I didn't pick that because all of them are available in the sense
that they could in the future be enabled but I do think it's the best
option.
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 23:33 ` Nishanth Menon
@ 2010-09-18 18:41 ` Rafael J. Wysocki
2010-09-20 15:26 ` Kevin Hilman
0 siblings, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2010-09-18 18:41 UTC (permalink / raw)
To: linux-arm-kernel
[Trimming the CC list.]
On Saturday, September 18, 2010, Nishanth Menon wrote:
> Rafael J. Wysocki had written, on 09/17/2010 06:07 PM, the following:
...
> >
> > Apart from this, it might be a good idea to help callers a bit and actually
> > introduce some sort of locking into the framework.
> in OMAP implementation we have need to use these apis from irq_locked
> contexts as well.. lock implementation will prevent thier necessary
> usage.
Why so? You can use spin_lock_irqsave/spin_unclock_irqrestore to
avoid that.
> instead we have implemented lock mechanisms in higher layers and
> ensured that the opp table does not change once created - the rest of
> them are query functions - the risk is on opp_enable/disable.
OK, but this way you practically make the users of the framework duplicate
some code for the locking purposes. This is kind of suboptimal, and error
prone too.
> >> In terms of the lifetime rules on the nodes in the list:
> >> The list is expected to be maintained once created, entries are expected
> >> to be added optimally and not expected to be destroyed, the choice of
> >> list implementation was for reducing the complexity of the code itself
> >> and not yet meant as a mechanism to dynamically add and delete nodes on
> >> the fly.. Essentially, it was intended for the SOC framework to ensure
> >> it plugs in the OPP entries optimally and not create a humongous list of
> >> all possible OPPs for all families of the vendor SOCs - even though it
> >> is possible to use the OPP layer so - it just wont be smart to do so
> >> considering list scan latencies on hot paths such as cpufreq transitions
> >> or idle transitions.
> >
> > If the list nodes are not supposed to be added and removed dynamically,
> > it probably would make sense to create data structures containing
> > the "available" OPPs only, once they are known, and simply free the object
> > representing the other ones.
> I covered the usage in my reply here:
> http://marc.info/?l=linux-arm-kernel&m=128476570300466&w=2
> but to repeat, the list is dynamic during initialization but remains
> static after initialization based on SOC framework implementation - this
> is best implemented with a list (we had started with an original array
> implementation which evolved to the current list implementation
> http://marc.info/?l=linux-omap&m=125912217718770&w=2)
Well, my point is, since the _final_ set of OPPs doesn't really change, there's
no need to use a list for storing it in principle.
Your current algorithm seems to be:
(1) Create a list of all _possible_ OPPs.
(2) Mark the ones that can actually be used on the given hardware as
"available".
(3) Whenever we need to find an OPP to use, browse the entire list.
This isn't optimal, because the OPPs that are not marked as "available" in (2)
will never be used, although they _will_ be inspected while browsing the list.
So, I think a better algorithm would be:
(1) Create a list of all possible OPPs.
(2) Drop the nonavailable OPPs from the list.
(3) Whenever we need to find an OPP to use, browse the entire list.
But then, it may be better to simply move the list we get in (2) into an
array, because the browsing is going to require fewer memory accesses in
that case (also, an array would use less memory than the list). So, perhaps,
it's better to change the algorithm even further:
(1) Create a list of all possible OPPs.
(2) Drop the nonavailable OPPs from the list.
(3) Move the list we got in (2) into a sorted array.
(4) Whenever we need to find an OPP to use, browse the array
(perhaps using binary search).
Thanks,
Rafael
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 23:19 ` Nishanth Menon
@ 2010-09-18 19:11 ` Rafael J. Wysocki
0 siblings, 0 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2010-09-18 19:11 UTC (permalink / raw)
To: linux-arm-kernel
[Trimming the CC list]
On Saturday, September 18, 2010, Nishanth Menon wrote:
> Rafael J. Wysocki had written, on 09/17/2010 05:45 PM, the following:
>
> Thanks for your review. few views below..
>
> > On Friday, September 17, 2010, Nishanth Menon wrote:
> >> Nishanth Menon had written, on 09/17/2010 10:05 AM, the following:
> >>> Linus Walleij had written, on 09/17/2010 08:41 AM, the following:
> >>>> 2010/9/17 Nishanth Menon <nm@ti.com>:
> >>>>
> >>>>> diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
> >>>>> index fb742c2..45e9d4a 100644
> >>>>> --- a/Documentation/power/00-INDEX
> >>>>> +++ b/Documentation/power/00-INDEX
> >>>>> @@ -14,6 +14,8 @@ interface.txt
> >>>>> - Power management user interface in /sys/power
> >>>>> notifiers.txt
> >>>>> - Registering suspend notifiers in device drivers
> >>>>> +opp.txt
> >>>>> + - Operating Performance Point library
> >>>>> pci.txt
> >>>>> - How the PCI Subsystem Does Power Management
> >>>> Hm you add the documentation to the index but not the documentation
> >>>> itself (easy slip) would you mind posting it?
> >>> ouch.. Sorry.. I realized that I missed adding MAINTAINER entry as
> >>> well.. v2 coming up..
> >>>
> >> while the review goes on, I will till end of the day before posting a v2
> >> will collated comments, here is the opp.txt documentation for the same..
> >> apologies on missing in my v1..
> >
> > OK, I'm not generally familiar with these things, so a couple of questions.
> >
> >> OPP Layer
> >> ==============
> >> SOCs have a standard set of tuples consisting of frequency and
> >> voltage pairs that the device will support per voltage domain. This
> >> is called Operating Performance Point or OPP. The actual definitions
> >> of OPP varies over silicon within the same family of devices.
> >> For a specific domain, you can have a set of {frequency, voltage}
> >> pairs. As the kernel boots and more information is available, a set
> >> of these are activated based on the precise nature of device the kernel
> >> boots up on.
> >
> > Does it mean that the full set of possible OPPs is already known from the
> > hardware configuration and the ones that are actually useable are found
> > during boot?
>
> yes. example - https://patchwork.kernel.org/patch/183742/ (based on
> original patch posted to l-arm, this will need some tweaks with the
> latest evolution, the concepts remain the same).
>
> For example, consider in the patch above where we have a opp definitions
> we can choose from omap3430 and 3630,
>
> when using 3630 silicon as a specific, certain boards wish to enable
> 1GHz, this allows us to do something as follows:
> in the generic omap code, we do init of all opps
> in the board specific file, we enable 1Ghz
>
> The selection of oppset and actual availability is done on the fly.
OK, so why don't you simply drop the OPPs you're not going to use from the
list in the board-specific file?
...
> >> OPP layer organizes the data internally using device pointers representing
> >> individual voltage domains.
> >
> > Can you please describe these data structures in a bit more detail?
> probably a dumb question: Is'nt it enough to give detailed verbose
> information in the struct comment header? why duplicate here as well..
> other than giving an overview?
I meant "explain it to me" rather than "explain it in the doc". :-) Sorry for
not being clear enough.
> >> NOTE:
> >> a) the OPP layer implementation expects to be used in multiple contexts
> >> based on SOC implementation and recommends locking schemes appropriate to
> >> the usage of SOC.
> >> b) All pointer returns are expected to be handled by standard error checks
> >> such as IS_ERR() and appropriate actions taken by the caller.
> >
> > I noticed that in a few places it isn't really necessary to return a pointer.
> > I think you can simply return int in such cases.
> could you help and point these to me. opp_find_freq_exact,
> opp_find_freq_ceil, opp_find_freq_floor are the only ones that return
> pointers and they need to return the pointer to the opp they found to
> operate on them such as add_freq etc..
Hmm, OK. I must have made a mistake, sorry.
...
> >> Data Structures:
> >> ---------------
> >> struct opp * is a handle structure whose internals are known only
> >> to the OPP layer and is meant to hide the complexity away from users of
> >> opp layer.
> >>
> >> struct opp_def * is the definitions that users can interface with
> >> opp layer and is meant to define one OPP for a domain device.
> >
> > The data structures require some more description IMHO. They aren't completely
> > intuitive.
>
> ok, a repeat question -> why duplicate the information defined in the
> structure comment header?
Because it doesn't imply the specific implementation, AFAICS.
Like what the list heads in your structures are used for etc.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-17 23:07 ` Rafael J. Wysocki
2010-09-17 23:33 ` Nishanth Menon
@ 2010-09-19 19:46 ` Mark Brown
1 sibling, 0 replies; 34+ messages in thread
From: Mark Brown @ 2010-09-19 19:46 UTC (permalink / raw)
To: linux-arm-kernel
On Sat, Sep 18, 2010 at 01:07:51AM +0200, Rafael J. Wysocki wrote:
> On Friday, September 17, 2010, Nishanth Menon wrote:
> > there were a few OMAP followon patches from
> > http://marc.info/?l=linux-omap&m=128152135801634&w=2
> > which will actually use this for OMAP3 platform.
> Still, the $subject patch doesn't use it. So perhaps add it in a separate patch?
By the same argument nothing in this patch causes the code to become
used so we may as well not include it at all :)
More seriously, looking at a lot of the questions that people are
raising it might make the review a bit easier if the patch were posted
in a series along with an initial user or two to provide some context
for the expected usage.
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-18 18:41 ` Rafael J. Wysocki
@ 2010-09-20 15:26 ` Kevin Hilman
2010-09-20 16:38 ` Rafael J. Wysocki
0 siblings, 1 reply; 34+ messages in thread
From: Kevin Hilman @ 2010-09-20 15:26 UTC (permalink / raw)
To: linux-arm-kernel
"Rafael J. Wysocki" <rjw@sisk.pl> writes:
[...]
>> >> In terms of the lifetime rules on the nodes in the list:
>> >> The list is expected to be maintained once created, entries are expected
>> >> to be added optimally and not expected to be destroyed, the choice of
>> >> list implementation was for reducing the complexity of the code itself
>> >> and not yet meant as a mechanism to dynamically add and delete nodes on
>> >> the fly.. Essentially, it was intended for the SOC framework to ensure
>> >> it plugs in the OPP entries optimally and not create a humongous list of
>> >> all possible OPPs for all families of the vendor SOCs - even though it
>> >> is possible to use the OPP layer so - it just wont be smart to do so
>> >> considering list scan latencies on hot paths such as cpufreq transitions
>> >> or idle transitions.
>> >
>> > If the list nodes are not supposed to be added and removed dynamically,
>> > it probably would make sense to create data structures containing
>> > the "available" OPPs only, once they are known, and simply free the object
>> > representing the other ones.
>> I covered the usage in my reply here:
>> http://marc.info/?l=linux-arm-kernel&m=128476570300466&w=2
>> but to repeat, the list is dynamic during initialization but remains
>> static after initialization based on SOC framework implementation - this
>> is best implemented with a list (we had started with an original array
>> implementation which evolved to the current list implementation
>> http://marc.info/?l=linux-omap&m=125912217718770&w=2)
>
> Well, my point is, since the _final_ set of OPPs doesn't really
> change, there's no need to use a list for storing it in principle.
>
> Your current algorithm seems to be:
> (1) Create a list of all _possible_ OPPs.
> (2) Mark the ones that can actually be used on the given hardware as
> "available".
> (3) Whenever we need to find an OPP to use, browse the entire list.
> This isn't optimal, because the OPPs that are not marked as "available" in (2)
> will never be used, although they _will_ be inspected while browsing the list.
A little clarificaion about "will never be used" below...
> So, I think a better algorithm would be:
> (1) Create a list of all possible OPPs.
> (2) Drop the nonavailable OPPs from the list.
> (3) Whenever we need to find an OPP to use, browse the entire list.
>
> But then, it may be better to simply move the list we get in (2) into an
> array, because the browsing is going to require fewer memory accesses in
> that case (also, an array would use less memory than the list). So, perhaps,
> it's better to change the algorithm even further:
> (1) Create a list of all possible OPPs.
> (2) Drop the nonavailable OPPs from the list.
> (3) Move the list we got in (2) into a sorted array.
> (4) Whenever we need to find an OPP to use, browse the array
> (perhaps using binary search).
Just a little clarification on "available." The intended use of this flag
was not just a one-time "available on hardware X." It was also intended
to be able to add/remove availbale OPPs dynamically at run-time.
More specifically, it's intended for use to *temporarily* remove an OPP
from being selected. The production usage of this would primarily for
thermal considerations (e.g. don't use OPPx until the temperature drops)
However, for PM development & debug, we also use this to temporarily
take a class of OPPs out of the running for test/debug purposes
(e.g. driver X runs great at OPPx and OPPy, but not OPPz.) So the
ability to temporarily be selective about OPPs at runtime for
debug/development is extremely useful.
So, to summarize, "most of the time", all the OPPs that were added (via
opp_add()) will be "available". Ones that are !availble will likely
only be so temporarily, so I'm not sure that the overhead of keeping a
separate structure for the available and !available OPPs is worth it.
Especially, since OPP changes are relatively infrequent.
Kevin
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-20 15:26 ` Kevin Hilman
@ 2010-09-20 16:38 ` Rafael J. Wysocki
2010-09-20 17:21 ` Kevin Hilman
0 siblings, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2010-09-20 16:38 UTC (permalink / raw)
To: linux-arm-kernel
On Monday, September 20, 2010, Kevin Hilman wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
>
> [...]
>
> >> >> In terms of the lifetime rules on the nodes in the list:
> >> >> The list is expected to be maintained once created, entries are expected
> >> >> to be added optimally and not expected to be destroyed, the choice of
> >> >> list implementation was for reducing the complexity of the code itself
> >> >> and not yet meant as a mechanism to dynamically add and delete nodes on
> >> >> the fly.. Essentially, it was intended for the SOC framework to ensure
> >> >> it plugs in the OPP entries optimally and not create a humongous list of
> >> >> all possible OPPs for all families of the vendor SOCs - even though it
> >> >> is possible to use the OPP layer so - it just wont be smart to do so
> >> >> considering list scan latencies on hot paths such as cpufreq transitions
> >> >> or idle transitions.
> >> >
> >> > If the list nodes are not supposed to be added and removed dynamically,
> >> > it probably would make sense to create data structures containing
> >> > the "available" OPPs only, once they are known, and simply free the object
> >> > representing the other ones.
> >> I covered the usage in my reply here:
> >> http://marc.info/?l=linux-arm-kernel&m=128476570300466&w=2
> >> but to repeat, the list is dynamic during initialization but remains
> >> static after initialization based on SOC framework implementation - this
> >> is best implemented with a list (we had started with an original array
> >> implementation which evolved to the current list implementation
> >> http://marc.info/?l=linux-omap&m=125912217718770&w=2)
> >
> > Well, my point is, since the _final_ set of OPPs doesn't really
> > change, there's no need to use a list for storing it in principle.
> >
> > Your current algorithm seems to be:
> > (1) Create a list of all _possible_ OPPs.
> > (2) Mark the ones that can actually be used on the given hardware as
> > "available".
> > (3) Whenever we need to find an OPP to use, browse the entire list.
> > This isn't optimal, because the OPPs that are not marked as "available" in (2)
> > will never be used, although they _will_ be inspected while browsing the list.
>
> A little clarificaion about "will never be used" below...
>
> > So, I think a better algorithm would be:
> > (1) Create a list of all possible OPPs.
> > (2) Drop the nonavailable OPPs from the list.
> > (3) Whenever we need to find an OPP to use, browse the entire list.
> >
> > But then, it may be better to simply move the list we get in (2) into an
> > array, because the browsing is going to require fewer memory accesses in
> > that case (also, an array would use less memory than the list). So, perhaps,
> > it's better to change the algorithm even further:
> > (1) Create a list of all possible OPPs.
> > (2) Drop the nonavailable OPPs from the list.
> > (3) Move the list we got in (2) into a sorted array.
> > (4) Whenever we need to find an OPP to use, browse the array
> > (perhaps using binary search).
>
> Just a little clarification on "available." The intended use of this flag
> was not just a one-time "available on hardware X." It was also intended
> to be able to add/remove availbale OPPs dynamically at run-time.
>
> More specifically, it's intended for use to *temporarily* remove an OPP
> from being selected. The production usage of this would primarily for
> thermal considerations (e.g. don't use OPPx until the temperature drops)
>
> However, for PM development & debug, we also use this to temporarily
> take a class of OPPs out of the running for test/debug purposes
> (e.g. driver X runs great at OPPx and OPPy, but not OPPz.) So the
> ability to temporarily be selective about OPPs at runtime for
> debug/development is extremely useful.
>
> So, to summarize, "most of the time", all the OPPs that were added (via
> opp_add()) will be "available". Ones that are !availble will likely
> only be so temporarily, so I'm not sure that the overhead of keeping a
> separate structure for the available and !available OPPs is worth it.
> Especially, since OPP changes are relatively infrequent.
Well, the Nishanth's description doesn't match this, so thanks for the
clarification.
In that case you might consider using a red-black tree for storing the
"available" OPPs, so that you can add-remove them dynamically, but
you can avoid a linear search through the entire list every time you need to
find and available OPP. Since we have standard helpers for handling rbtrees,
that shouldn't be a big deal.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-20 16:38 ` Rafael J. Wysocki
@ 2010-09-20 17:21 ` Kevin Hilman
2010-09-20 17:35 ` Rafael J. Wysocki
0 siblings, 1 reply; 34+ messages in thread
From: Kevin Hilman @ 2010-09-20 17:21 UTC (permalink / raw)
To: linux-arm-kernel
"Rafael J. Wysocki" <rjw@sisk.pl> writes:
> On Monday, September 20, 2010, Kevin Hilman wrote:
>> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
>>
>> [...]
>>
>> >> >> In terms of the lifetime rules on the nodes in the list:
>> >> >> The list is expected to be maintained once created, entries are expected
>> >> >> to be added optimally and not expected to be destroyed, the choice of
>> >> >> list implementation was for reducing the complexity of the code itself
>> >> >> and not yet meant as a mechanism to dynamically add and delete nodes on
>> >> >> the fly.. Essentially, it was intended for the SOC framework to ensure
>> >> >> it plugs in the OPP entries optimally and not create a humongous list of
>> >> >> all possible OPPs for all families of the vendor SOCs - even though it
>> >> >> is possible to use the OPP layer so - it just wont be smart to do so
>> >> >> considering list scan latencies on hot paths such as cpufreq transitions
>> >> >> or idle transitions.
>> >> >
>> >> > If the list nodes are not supposed to be added and removed dynamically,
>> >> > it probably would make sense to create data structures containing
>> >> > the "available" OPPs only, once they are known, and simply free the object
>> >> > representing the other ones.
>> >> I covered the usage in my reply here:
>> >> http://marc.info/?l=linux-arm-kernel&m=128476570300466&w=2
>> >> but to repeat, the list is dynamic during initialization but remains
>> >> static after initialization based on SOC framework implementation - this
>> >> is best implemented with a list (we had started with an original array
>> >> implementation which evolved to the current list implementation
>> >> http://marc.info/?l=linux-omap&m=125912217718770&w=2)
>> >
>> > Well, my point is, since the _final_ set of OPPs doesn't really
>> > change, there's no need to use a list for storing it in principle.
>> >
>> > Your current algorithm seems to be:
>> > (1) Create a list of all _possible_ OPPs.
>> > (2) Mark the ones that can actually be used on the given hardware as
>> > "available".
>> > (3) Whenever we need to find an OPP to use, browse the entire list.
>> > This isn't optimal, because the OPPs that are not marked as "available" in (2)
>> > will never be used, although they _will_ be inspected while browsing the list.
>>
>> A little clarificaion about "will never be used" below...
>>
>> > So, I think a better algorithm would be:
>> > (1) Create a list of all possible OPPs.
>> > (2) Drop the nonavailable OPPs from the list.
>> > (3) Whenever we need to find an OPP to use, browse the entire list.
>> >
>> > But then, it may be better to simply move the list we get in (2) into an
>> > array, because the browsing is going to require fewer memory accesses in
>> > that case (also, an array would use less memory than the list). So, perhaps,
>> > it's better to change the algorithm even further:
>> > (1) Create a list of all possible OPPs.
>> > (2) Drop the nonavailable OPPs from the list.
>> > (3) Move the list we got in (2) into a sorted array.
>> > (4) Whenever we need to find an OPP to use, browse the array
>> > (perhaps using binary search).
>>
>> Just a little clarification on "available." The intended use of this flag
>> was not just a one-time "available on hardware X." It was also intended
>> to be able to add/remove availbale OPPs dynamically at run-time.
>>
>> More specifically, it's intended for use to *temporarily* remove an OPP
>> from being selected. The production usage of this would primarily for
>> thermal considerations (e.g. don't use OPPx until the temperature drops)
>>
>> However, for PM development & debug, we also use this to temporarily
>> take a class of OPPs out of the running for test/debug purposes
>> (e.g. driver X runs great at OPPx and OPPy, but not OPPz.) So the
>> ability to temporarily be selective about OPPs at runtime for
>> debug/development is extremely useful.
>>
>> So, to summarize, "most of the time", all the OPPs that were added (via
>> opp_add()) will be "available". Ones that are !availble will likely
>> only be so temporarily, so I'm not sure that the overhead of keeping a
>> separate structure for the available and !available OPPs is worth it.
>> Especially, since OPP changes are relatively infrequent.
>
> Well, the Nishanth's description doesn't match this, so thanks for the
> clarification.
Agreed, we need to update the doc file to reflect this.
> In that case you might consider using a red-black tree for storing the
> "available" OPPs, so that you can add-remove them dynamically, but
> you can avoid a linear search through the entire list every time you need to
> find and available OPP. Since we have standard helpers for handling rbtrees,
> that shouldn't be a big deal.
That's a possibility, but do you think rbtrees are worth it for a
relatively small number of OPPs for any given device? We're using this
to track a list of OPPs for any struct device, so there may be multiple
independent OPP lists, but each would have a small number of OPPs.
For example, on OMAP, while the CPU might have a larger number of OPPs
(5-6), most devices will have a small number of OPPs (1-3.) I gather
this is similar for many of the embedded SoCs available today.
Considering such a small number of OPPs, is the extra complexity of an
rbtree worth it?
Kevin
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] opp: introduce library for device-specific OPPs
2010-09-20 17:21 ` Kevin Hilman
@ 2010-09-20 17:35 ` Rafael J. Wysocki
0 siblings, 0 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2010-09-20 17:35 UTC (permalink / raw)
To: linux-arm-kernel
On Monday, September 20, 2010, Kevin Hilman wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
>
> > On Monday, September 20, 2010, Kevin Hilman wrote:
> >> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> >>
> >> [...]
> >>
> >> >> >> In terms of the lifetime rules on the nodes in the list:
> >> >> >> The list is expected to be maintained once created, entries are expected
> >> >> >> to be added optimally and not expected to be destroyed, the choice of
> >> >> >> list implementation was for reducing the complexity of the code itself
> >> >> >> and not yet meant as a mechanism to dynamically add and delete nodes on
> >> >> >> the fly.. Essentially, it was intended for the SOC framework to ensure
> >> >> >> it plugs in the OPP entries optimally and not create a humongous list of
> >> >> >> all possible OPPs for all families of the vendor SOCs - even though it
> >> >> >> is possible to use the OPP layer so - it just wont be smart to do so
> >> >> >> considering list scan latencies on hot paths such as cpufreq transitions
> >> >> >> or idle transitions.
> >> >> >
> >> >> > If the list nodes are not supposed to be added and removed dynamically,
> >> >> > it probably would make sense to create data structures containing
> >> >> > the "available" OPPs only, once they are known, and simply free the object
> >> >> > representing the other ones.
> >> >> I covered the usage in my reply here:
> >> >> http://marc.info/?l=linux-arm-kernel&m=128476570300466&w=2
> >> >> but to repeat, the list is dynamic during initialization but remains
> >> >> static after initialization based on SOC framework implementation - this
> >> >> is best implemented with a list (we had started with an original array
> >> >> implementation which evolved to the current list implementation
> >> >> http://marc.info/?l=linux-omap&m=125912217718770&w=2)
> >> >
> >> > Well, my point is, since the _final_ set of OPPs doesn't really
> >> > change, there's no need to use a list for storing it in principle.
> >> >
> >> > Your current algorithm seems to be:
> >> > (1) Create a list of all _possible_ OPPs.
> >> > (2) Mark the ones that can actually be used on the given hardware as
> >> > "available".
> >> > (3) Whenever we need to find an OPP to use, browse the entire list.
> >> > This isn't optimal, because the OPPs that are not marked as "available" in (2)
> >> > will never be used, although they _will_ be inspected while browsing the list.
> >>
> >> A little clarificaion about "will never be used" below...
> >>
> >> > So, I think a better algorithm would be:
> >> > (1) Create a list of all possible OPPs.
> >> > (2) Drop the nonavailable OPPs from the list.
> >> > (3) Whenever we need to find an OPP to use, browse the entire list.
> >> >
> >> > But then, it may be better to simply move the list we get in (2) into an
> >> > array, because the browsing is going to require fewer memory accesses in
> >> > that case (also, an array would use less memory than the list). So, perhaps,
> >> > it's better to change the algorithm even further:
> >> > (1) Create a list of all possible OPPs.
> >> > (2) Drop the nonavailable OPPs from the list.
> >> > (3) Move the list we got in (2) into a sorted array.
> >> > (4) Whenever we need to find an OPP to use, browse the array
> >> > (perhaps using binary search).
> >>
> >> Just a little clarification on "available." The intended use of this flag
> >> was not just a one-time "available on hardware X." It was also intended
> >> to be able to add/remove availbale OPPs dynamically at run-time.
> >>
> >> More specifically, it's intended for use to *temporarily* remove an OPP
> >> from being selected. The production usage of this would primarily for
> >> thermal considerations (e.g. don't use OPPx until the temperature drops)
> >>
> >> However, for PM development & debug, we also use this to temporarily
> >> take a class of OPPs out of the running for test/debug purposes
> >> (e.g. driver X runs great at OPPx and OPPy, but not OPPz.) So the
> >> ability to temporarily be selective about OPPs at runtime for
> >> debug/development is extremely useful.
> >>
> >> So, to summarize, "most of the time", all the OPPs that were added (via
> >> opp_add()) will be "available". Ones that are !availble will likely
> >> only be so temporarily, so I'm not sure that the overhead of keeping a
> >> separate structure for the available and !available OPPs is worth it.
> >> Especially, since OPP changes are relatively infrequent.
> >
> > Well, the Nishanth's description doesn't match this, so thanks for the
> > clarification.
>
> Agreed, we need to update the doc file to reflect this.
>
> > In that case you might consider using a red-black tree for storing the
> > "available" OPPs, so that you can add-remove them dynamically, but
> > you can avoid a linear search through the entire list every time you need to
> > find and available OPP. Since we have standard helpers for handling rbtrees,
> > that shouldn't be a big deal.
>
> That's a possibility, but do you think rbtrees are worth it for a
> relatively small number of OPPs for any given device? We're using this
> to track a list of OPPs for any struct device, so there may be multiple
> independent OPP lists, but each would have a small number of OPPs.
>
> For example, on OMAP, while the CPU might have a larger number of OPPs
> (5-6), most devices will have a small number of OPPs (1-3.) I gather
> this is similar for many of the embedded SoCs available today.
>
> Considering such a small number of OPPs, is the extra complexity of an
> rbtree worth it?
OK, probably not. If there's so few of them, I agree that using lists is
probably better, but please put the numbers information into the doc too. :-)
Thanks,
Rafael
^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2010-09-20 17:35 UTC | newest]
Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <[PATCH 1/4] OMAP: introduce OPP layer for device-specific OPPs>
2010-09-17 1:29 ` [PATCH] opp: introduce library for device-specific OPPs Nishanth Menon
2010-09-17 13:41 ` Linus Walleij
2010-09-17 15:05 ` Nishanth Menon
2010-09-17 15:59 ` Nishanth Menon
2010-09-17 22:45 ` Rafael J. Wysocki
2010-09-17 23:19 ` Nishanth Menon
2010-09-18 19:11 ` Rafael J. Wysocki
2010-09-17 14:09 ` Aguirre, Sergio
2010-09-17 15:30 ` Nishanth Menon
2010-09-17 16:11 ` Aguirre, Sergio
2010-09-17 16:15 ` Aguirre, Sergio
2010-09-17 16:20 ` Nishanth Menon
2010-09-17 15:36 ` [linux-pm] " Mark Brown
2010-09-17 15:53 ` Nishanth Menon
2010-09-17 15:59 ` Mark Brown
2010-09-18 0:37 ` Kevin Hilman
2010-09-18 10:04 ` Mark Brown
2010-09-17 22:22 ` Rafael J. Wysocki
2010-09-17 22:26 ` Nishanth Menon
2010-09-17 22:52 ` Rafael J. Wysocki
2010-09-17 16:45 ` Phil Carmody
2010-09-18 10:08 ` Mark Brown
2010-09-17 19:19 ` Andrew Morton
2010-09-17 21:23 ` Nishanth Menon
2010-09-17 22:51 ` Kevin Hilman
2010-09-17 23:07 ` Rafael J. Wysocki
2010-09-17 23:33 ` Nishanth Menon
2010-09-18 18:41 ` Rafael J. Wysocki
2010-09-20 15:26 ` Kevin Hilman
2010-09-20 16:38 ` Rafael J. Wysocki
2010-09-20 17:21 ` Kevin Hilman
2010-09-20 17:35 ` Rafael J. Wysocki
2010-09-19 19:46 ` Mark Brown
2010-09-17 22:07 ` Rafael J. Wysocki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).