* [PATCH v5 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
2011-08-08 9:03 [PATCH v5 0/3] DEVFREQ, DVFS Framework for Non-CPU Devices MyungJoo Ham
@ 2011-08-08 9:03 ` MyungJoo Ham
2011-08-11 1:00 ` Turquette, Mike
2011-08-08 9:03 ` [PATCH v5 2/3] PM / DEVFREQ: add basic governors MyungJoo Ham
` (3 subsequent siblings)
4 siblings, 1 reply; 13+ messages in thread
From: MyungJoo Ham @ 2011-08-08 9:03 UTC (permalink / raw)
To: linux-pm; +Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner
With OPPs, a device may have multiple operable frequency and voltage
sets. However, there can be multiple possible operable sets and a system
will need to choose one from them. In order to reduce the power
consumption (by reducing frequency and voltage) without affecting the
performance too much, a Dynamic Voltage and Frequency Scaling (DVFS)
scheme may be used.
This patch introduces the DVFS capability to non-CPU devices with OPPs.
DVFS is a techique whereby the frequency and supplied voltage of a
device is adjusted on-the-fly. DVFS usually sets the frequency as low
as possible with given conditions (such as QoS assurance) and adjusts
voltage according to the chosen frequency in order to reduce power
consumption and heat dissipation.
The generic DVFS for devices, DEVFREQ, may appear quite similar with
/drivers/cpufreq. However, CPUFREQ does not allow to have multiple
devices registered and is not suitable to have multiple heterogenous
devices with different (but simple) governors.
Normally, DVFS mechanism controls frequency based on the demand for
the device, and then, chooses voltage based on the chosen frequency.
DEVFREQ also controls the frequency based on the governor's frequency
recommendation and let OPP pick up the pair of frequency and voltage
based on the recommended frequency. Then, the chosen OPP is passed to
device driver's "target" callback.
When PM QoS is going to be used with the DEVFREQ device, the device
driver should enable OPPs that are appropriate with the current PM QoS
requests. In order to do so, the device driver may call opp_enable and
opp_disable at the notifier callback of PM QoS so that PM QoS's
update_target() call enables the appropriate OPPs. Note that at least
one of OPPs should be enabled at any time; be careful when there is a
transition.
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
Tested with memory bus of Exynos4-NURI board.
The test code with board support for Exynos4-NURI is at
http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
---
Thank you for your valuable comments, Rafael, Greg, Pavel, Colin, Mike,
and Kevin.
Changed from v4
- Removed tickle, which is a duplicated feature; PM QoS can do the same.
- Allow to extend polling interval if devices have longer polling intervals.
- Relocated private data of governors.
- Removed system-wide sysfs
Changed from v3
- In kerneldoc comments, DEVFREQ has ben replaced by devfreq
- Revised removing devfreq entries with error mechanism
- Added and revised comments
- Removed unnecessary codes
- Allow to give a name to a governor
- Bugfix: a tickle call may cancel an older tickle call that is still in
effect.
Changed from v2
- Code style revised and cleaned up.
- Remove DEVFREQ entries that incur errors except for EAGAIN
- Bug fixed: tickle for devices without polling governors
Changes from v1(RFC)
- Rename: DVFS --> DEVFREQ
- Revised governor design
. Governor receives the whole struct devfreq
. Governor should gather usage information (thru get_dev_status)
itself
- Periodic monitoring runs only when needed.
- DEVFREQ no more deals with voltage information directly
- Removed some printks.
- Some cosmetics update
- Use freezable_wq.
---
drivers/base/power/Makefile | 1 +
drivers/base/power/devfreq.c | 303 ++++++++++++++++++++++++++++++++++++++++++
drivers/base/power/opp.c | 9 ++
include/linux/devfreq.h | 103 ++++++++++++++
kernel/power/Kconfig | 34 +++++
5 files changed, 450 insertions(+), 0 deletions(-)
create mode 100644 drivers/base/power/devfreq.c
create mode 100644 include/linux/devfreq.h
diff --git a/drivers/base/power/Makefile b/drivers/base/power/Makefile
index 3647e11..20118dc 100644
--- a/drivers/base/power/Makefile
+++ b/drivers/base/power/Makefile
@@ -4,5 +4,6 @@ obj-$(CONFIG_PM_RUNTIME) += runtime.o
obj-$(CONFIG_PM_TRACE_RTC) += trace.o
obj-$(CONFIG_PM_OPP) += opp.o
obj-$(CONFIG_HAVE_CLK) += clock_ops.o
+obj-$(CONFIG_PM_DEVFREQ) += devfreq.o
ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
\ No newline at end of file
diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
new file mode 100644
index 0000000..6f4bd3a
--- /dev/null
+++ b/drivers/base/power/devfreq.c
@@ -0,0 +1,303 @@
+/*
+ * devfreq: Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework
+ * for Non-CPU Devices Based on OPP.
+ *
+ * Copyright (C) 2011 Samsung Electronics
+ * MyungJoo Ham <myungjoo.ham@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/opp.h>
+#include <linux/devfreq.h>
+#include <linux/workqueue.h>
+#include <linux/platform_device.h>
+#include <linux/list.h>
+#include <linux/printk.h>
+#include <linux/hrtimer.h>
+
+/*
+ * devfreq polling interval in ms.
+ * It is recommended to be "jiffy_in_ms" * n, where n is an integer >= 1.
+ */
+static unsigned int devfreq_interval = 20;
+
+/*
+ * devfreq_work periodically (given by devfreq_interval) monitors every
+ * registered device.
+ */
+static bool polling;
+static ktime_t last_polled_at;
+static struct workqueue_struct *devfreq_wq;
+static struct delayed_work devfreq_work;
+
+/* The list of all device-devfreq */
+static LIST_HEAD(devfreq_list);
+static DEFINE_MUTEX(devfreq_list_lock);
+
+/**
+ * find_device_devfreq() - find devfreq struct using device pointer
+ * @dev: device pointer used to lookup device devfreq.
+ *
+ * Search the list of device devfreqs and return the matched device's
+ * devfreq info. devfreq_list_lock should be held by the caller.
+ */
+static struct devfreq *find_device_devfreq(struct device *dev)
+{
+ struct devfreq *tmp_devfreq;
+
+ if (unlikely(IS_ERR_OR_NULL(dev))) {
+ pr_err("DEVFREQ: %s: Invalid parameters\n", __func__);
+ return ERR_PTR(-EINVAL);
+ }
+
+ list_for_each_entry(tmp_devfreq, &devfreq_list, node) {
+ if (tmp_devfreq->dev == dev)
+ return tmp_devfreq;
+ }
+
+ return ERR_PTR(-ENODEV);
+}
+
+/**
+ * devfreq_do() - Check the usage profile of a given device and configure
+ * frequency and voltage accordingly
+ * @devfreq: devfreq info of the given device
+ */
+static int devfreq_do(struct devfreq *devfreq)
+{
+ struct opp *opp;
+ unsigned long freq;
+ int err;
+
+ err = devfreq->governor->get_target_freq(devfreq, &freq);
+ if (err)
+ return err;
+
+ opp = opp_find_freq_ceil(devfreq->dev, &freq);
+ if (opp == ERR_PTR(-ENODEV))
+ opp = opp_find_freq_floor(devfreq->dev, &freq);
+
+ if (IS_ERR(opp))
+ return PTR_ERR(opp);
+
+ if (devfreq->previous_freq == freq)
+ return 0;
+
+ err = devfreq->profile->target(devfreq->dev, opp);
+ if (err)
+ return err;
+
+ devfreq->previous_freq = freq;
+ return 0;
+}
+
+/**
+ * devfreq_monitor() - Periodically run devfreq_do()
+ * @work: the work struct used to run devfreq_monitor periodically.
+ *
+ */
+static void devfreq_monitor(struct work_struct *work)
+{
+ struct devfreq *devfreq, *tmp;
+ int error;
+ unsigned int next_poll_min = UINT_MAX;
+ ktime_t now = ktime_get();
+ s64 time_passed = ktime_to_ms(ktime_sub(now, last_polled_at));
+ int iterations_passed = 1;
+
+ /* If n * devfreq_interval has passed, count it */
+ do_div(time_passed, devfreq_interval);
+ if (time_passed > 1)
+ iterations_passed = time_passed;
+ last_polled_at = now;
+
+ mutex_lock(&devfreq_list_lock);
+
+ list_for_each_entry_safe(devfreq, tmp, &devfreq_list, node) {
+ if (devfreq->next_polling == 0)
+ continue;
+
+ /*
+ * Reduce more next_polling if devfreq_wq took an extra
+ * delay. (i.e., CPU has been idled.)
+ */
+ if (devfreq->next_polling <= iterations_passed) {
+ error = devfreq_do(devfreq);
+
+ /* Remove a devfreq with an error. */
+ if (error && error != -EAGAIN) {
+ dev_err(devfreq->dev, "devfreq_do error(%d). "
+ "devfreq is removed from the device\n",
+ error);
+
+ list_del(&devfreq->node);
+ kfree(devfreq);
+
+ continue;
+ }
+ devfreq->next_polling = DIV_ROUND_UP(
+ devfreq->profile->polling_ms,
+ devfreq_interval);
+
+ /* No more polling required (polling_ms changed) */
+ if (devfreq->next_polling == 0)
+ continue;
+ } else {
+ devfreq->next_polling -= iterations_passed;
+ }
+
+ next_poll_min = (next_poll_min > devfreq->next_polling) ?
+ devfreq->next_polling : next_poll_min;
+ }
+
+ if (next_poll_min > 0 && next_poll_min < UINT_MAX) {
+ polling = true;
+ queue_delayed_work(devfreq_wq, &devfreq_work, msecs_to_jiffies(
+ devfreq_interval * next_poll_min));
+ } else {
+ polling = false;
+ }
+
+ mutex_unlock(&devfreq_list_lock);
+}
+
+/**
+ * devfreq_add_device() - Add devfreq feature to the device
+ * @dev: the device to add devfreq feature.
+ * @profile: device-specific profile to run devfreq.
+ * @governor: the policy to choose frequency.
+ */
+int devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile,
+ struct devfreq_governor *governor)
+{
+ struct devfreq *new_devfreq, *devfreq;
+ int err = 0;
+
+ if (!dev || !profile || !governor) {
+ dev_err(dev, "%s: Invalid parameters.\n", __func__);
+ return -EINVAL;
+ }
+
+ mutex_lock(&devfreq_list_lock);
+
+ devfreq = find_device_devfreq(dev);
+ if (!IS_ERR(devfreq)) {
+ dev_err(dev, "%s: Unable to create devfreq for the device. "
+ "It already has one.\n", __func__);
+ err = -EINVAL;
+ goto out;
+ }
+
+ new_devfreq = kzalloc(sizeof(struct devfreq), GFP_KERNEL);
+ if (!new_devfreq) {
+ dev_err(dev, "%s: Unable to create devfreq for the device\n",
+ __func__);
+ err = -ENOMEM;
+ goto out;
+ }
+
+ new_devfreq->dev = dev;
+ new_devfreq->profile = profile;
+ new_devfreq->governor = governor;
+ new_devfreq->next_polling = DIV_ROUND_UP(profile->polling_ms,
+ devfreq_interval);
+ new_devfreq->previous_freq = profile->initial_freq;
+
+ list_add(&new_devfreq->node, &devfreq_list);
+
+ if (devfreq_wq && new_devfreq->next_polling && !polling) {
+ polling = true;
+ queue_delayed_work(devfreq_wq, &devfreq_work,
+ msecs_to_jiffies(devfreq_interval));
+ }
+out:
+ mutex_unlock(&devfreq_list_lock);
+
+ return err;
+}
+
+/**
+ * devfreq_remove_device() - Remove devfreq feature from a device.
+ * @device: the device to remove devfreq feature.
+ */
+int devfreq_remove_device(struct device *dev)
+{
+ struct devfreq *devfreq;
+
+ if (!dev)
+ return -EINVAL;
+
+ mutex_lock(&devfreq_list_lock);
+ devfreq = find_device_devfreq(dev);
+ if (IS_ERR(devfreq)) {
+ dev_err(dev, "%s: Unable to find devfreq entry for the device.\n",
+ __func__);
+ mutex_unlock(&devfreq_list_lock);
+ return -EINVAL;
+ }
+
+ list_del(&devfreq->node);
+
+ kfree(devfreq);
+
+ mutex_unlock(&devfreq_list_lock);
+
+ return 0;
+}
+
+/**
+ * devfreq_update() - Notify that the device OPP has been changed.
+ * @dev: the device whose OPP has been changed.
+ */
+int devfreq_update(struct device *dev)
+{
+ struct devfreq *devfreq;
+ int err = 0;
+
+ mutex_lock(&devfreq_list_lock);
+
+ devfreq = find_device_devfreq(dev);
+ if (IS_ERR(devfreq)) {
+ err = PTR_ERR(devfreq);
+ goto out;
+ }
+
+ /* Reevaluate the proper frequency */
+ err = devfreq_do(devfreq);
+
+out:
+ mutex_unlock(&devfreq_list_lock);
+ return err;
+}
+
+/**
+ * devfreq_init() - Initialize data structure for devfreq framework and
+ * start polling registered devfreq devices.
+ */
+static int __init devfreq_init(void)
+{
+ devfreq_interval = jiffies_to_msecs(msecs_to_jiffies(devfreq_interval));
+ if (devfreq_interval <= 0) {
+ pr_err("DEVFREQ: devfreq_interval too small.\n");
+ return -EINVAL;
+ }
+
+ mutex_lock(&devfreq_list_lock);
+ polling = false;
+ devfreq_wq = create_freezable_workqueue("devfreq_wq");
+ INIT_DELAYED_WORK_DEFERRABLE(&devfreq_work, devfreq_monitor);
+ mutex_unlock(&devfreq_list_lock);
+
+ last_polled_at = ktime_get();
+ devfreq_monitor(&devfreq_work.work);
+ return 0;
+}
+late_initcall(devfreq_init);
diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
index 56a6899..819c1b3 100644
--- a/drivers/base/power/opp.c
+++ b/drivers/base/power/opp.c
@@ -21,6 +21,7 @@
#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/opp.h>
+#include <linux/devfreq.h>
/*
* Internal data structure organization with the OPP layer library is as
@@ -428,6 +429,11 @@ int opp_add(struct device *dev, unsigned long freq, unsigned long u_volt)
list_add_rcu(&new_opp->node, head);
mutex_unlock(&dev_opp_list_lock);
+ /*
+ * Notify generic dvfs for the change and ignore error
+ * because the device may not have a devfreq entry
+ */
+ devfreq_update(dev);
return 0;
}
@@ -512,6 +518,9 @@ unlock:
mutex_unlock(&dev_opp_list_lock);
out:
kfree(new_opp);
+
+ /* Notify generic dvfs for the change and ignore error */
+ devfreq_update(dev);
return r;
}
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
new file mode 100644
index 0000000..6ec630b
--- /dev/null
+++ b/include/linux/devfreq.h
@@ -0,0 +1,103 @@
+/*
+ * devfreq: Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework
+ * for Non-CPU Devices Based on OPP.
+ *
+ * Copyright (C) 2011 Samsung Electronics
+ * MyungJoo Ham <myungjoo.ham@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __LINUX_DEVFREQ_H__
+#define __LINUX_DEVFREQ_H__
+
+#define DEVFREQ_NAME_LEN 16
+
+struct devfreq;
+struct devfreq_dev_status {
+ /* both since the last measure */
+ unsigned long total_time;
+ unsigned long busy_time;
+ unsigned long current_frequency;
+};
+
+struct devfreq_dev_profile {
+ unsigned long max_freq; /* may be larger than the actual value */
+ unsigned long initial_freq;
+ int polling_ms; /* 0 for at opp change only */
+
+ int (*target)(struct device *dev, struct opp *opp);
+ int (*get_dev_status)(struct device *dev,
+ struct devfreq_dev_status *stat);
+};
+
+/**
+ * struct devfreq_governor - Devfreq policy governor
+ * @name Governor's name
+ * @get_target_freq Returns desired operating frequency for the device.
+ * Basically, get_target_freq will run
+ * devfreq_dev_profile.get_dev_status() to get the
+ * status of the device (load = busy_time / total_time).
+ */
+struct devfreq_governor {
+ char name[DEVFREQ_NAME_LEN];
+ int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
+};
+
+/**
+ * struct devfreq - Device devfreq structure
+ * @node list node - contains the devices with devfreq that have been
+ * registered.
+ * @dev device pointer
+ * @profile device-specific devfreq profile
+ * @governor method how to choose frequency based on the usage.
+ * @previous_freq previously configured frequency value.
+ * @next_polling the number of remaining "devfreq_monitor" executions to
+ * reevaluate frequency/voltage of the device. Set by
+ * profile's polling_ms interval.
+ * @data Private data of the governor. The devfreq framework does not
+ * touch this.
+ *
+ * This structure stores the devfreq information for a give device.
+ */
+struct devfreq {
+ struct list_head node;
+
+ struct device *dev;
+ struct devfreq_dev_profile *profile;
+ struct devfreq_governor *governor;
+
+ unsigned long previous_freq;
+ unsigned int next_polling;
+
+ void *data; /* private data for governors */
+};
+
+#if defined(CONFIG_PM_DEVFREQ)
+extern int devfreq_add_device(struct device *dev,
+ struct devfreq_dev_profile *profile,
+ struct devfreq_governor *governor);
+extern int devfreq_remove_device(struct device *dev);
+extern int devfreq_update(struct device *dev);
+#else /* !CONFIG_PM_DEVFREQ */
+static int devfreq_add_device(struct device *dev,
+ struct devfreq_dev_profile *profile,
+ struct devfreq_governor *governor)
+{
+ return 0;
+}
+
+static int devfreq_remove_device(struct device *dev)
+{
+ return 0;
+}
+
+static int devfreq_update(struct device *dev)
+{
+ return 0;
+}
+#endif /* CONFIG_PM_DEVFREQ */
+
+#endif /* __LINUX_DEVFREQ_H__ */
diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
index 87f4d24..b7e15c8 100644
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -227,3 +227,37 @@ config PM_OPP
config PM_RUNTIME_CLK
def_bool y
depends on PM_RUNTIME && HAVE_CLK
+
+config ARCH_HAS_DEVFREQ
+ bool
+ depends on ARCH_HAS_OPP
+ help
+ Denotes that the architecture supports DEVFREQ. If the architecture
+ supports multiple OPP entries per device and the frequency of the
+ devices with OPPs may be altered dynamically, the architecture
+ supports DEVFREQ.
+
+config PM_DEVFREQ
+ bool "Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework"
+ depends on PM_OPP && ARCH_HAS_DEVFREQ
+ help
+ With OPP support, a device may have a list of frequencies and
+ voltages available. DEVFREQ, a generic DVFS framework can be
+ registered for a device with OPP support in order to let the
+ governor provided to DEVFREQ choose an operating frequency
+ based on the OPP's list and the policy given with DEVFREQ.
+
+ Each device may have its own governor and policy. DEVFREQ can
+ reevaluate the device state periodically and/or based on the
+ OPP list changes (each frequency/voltage pair in OPP may be
+ disabled or enabled).
+
+ Like some CPUs with CPUFREQ, a device may have multiple clocks.
+ However, because the clock frequencies of a single device are
+ determined by the single device's state, an instance of DEVFREQ
+ is attached to a single device and returns a "representative"
+ clock frequency from the OPP of the device, which is also attached
+ to a device by 1-to-1. The device registering DEVFREQ takes the
+ responsiblity to "interpret" the frequency listed in OPP and
+ to set its every clock accordingly with the "target" callback
+ given to DEVFREQ.
--
1.7.4.1
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [PATCH v5 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
2011-08-08 9:03 ` [PATCH v5 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs MyungJoo Ham
@ 2011-08-11 1:00 ` Turquette, Mike
2011-08-17 9:40 ` MyungJoo Ham
0 siblings, 1 reply; 13+ messages in thread
From: Turquette, Mike @ 2011-08-11 1:00 UTC (permalink / raw)
To: MyungJoo Ham
Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, linux-pm,
Thomas Gleixner
On Mon, Aug 8, 2011 at 2:03 AM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote:
> With OPPs, a device may have multiple operable frequency and voltage
> sets. However, there can be multiple possible operable sets and a system
> will need to choose one from them. In order to reduce the power
> consumption (by reducing frequency and voltage) without affecting the
> performance too much, a Dynamic Voltage and Frequency Scaling (DVFS)
> scheme may be used.
>
> This patch introduces the DVFS capability to non-CPU devices with OPPs.
> DVFS is a techique whereby the frequency and supplied voltage of a
> device is adjusted on-the-fly. DVFS usually sets the frequency as low
> as possible with given conditions (such as QoS assurance) and adjusts
> voltage according to the chosen frequency in order to reduce power
> consumption and heat dissipation.
>
> The generic DVFS for devices, DEVFREQ, may appear quite similar with
> /drivers/cpufreq. However, CPUFREQ does not allow to have multiple
> devices registered and is not suitable to have multiple heterogenous
> devices with different (but simple) governors.
>
> Normally, DVFS mechanism controls frequency based on the demand for
> the device, and then, chooses voltage based on the chosen frequency.
> DEVFREQ also controls the frequency based on the governor's frequency
> recommendation and let OPP pick up the pair of frequency and voltage
> based on the recommended frequency. Then, the chosen OPP is passed to
> device driver's "target" callback.
>
> When PM QoS is going to be used with the DEVFREQ device, the device
> driver should enable OPPs that are appropriate with the current PM QoS
> requests. In order to do so, the device driver may call opp_enable and
> opp_disable at the notifier callback of PM QoS so that PM QoS's
> update_target() call enables the appropriate OPPs. Note that at least
> one of OPPs should be enabled at any time; be careful when there is a
> transition.
>
> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>
> ---
> Tested with memory bus of Exynos4-NURI board.
>
> The test code with board support for Exynos4-NURI is at
> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
>
> ---
> Thank you for your valuable comments, Rafael, Greg, Pavel, Colin, Mike,
> and Kevin.
>
> Changed from v4
> - Removed tickle, which is a duplicated feature; PM QoS can do the same.
> - Allow to extend polling interval if devices have longer polling intervals.
> - Relocated private data of governors.
> - Removed system-wide sysfs
>
> Changed from v3
> - In kerneldoc comments, DEVFREQ has ben replaced by devfreq
> - Revised removing devfreq entries with error mechanism
> - Added and revised comments
> - Removed unnecessary codes
> - Allow to give a name to a governor
> - Bugfix: a tickle call may cancel an older tickle call that is still in
> effect.
>
> Changed from v2
> - Code style revised and cleaned up.
> - Remove DEVFREQ entries that incur errors except for EAGAIN
> - Bug fixed: tickle for devices without polling governors
>
> Changes from v1(RFC)
> - Rename: DVFS --> DEVFREQ
> - Revised governor design
> . Governor receives the whole struct devfreq
> . Governor should gather usage information (thru get_dev_status)
> itself
> - Periodic monitoring runs only when needed.
> - DEVFREQ no more deals with voltage information directly
> - Removed some printks.
> - Some cosmetics update
> - Use freezable_wq.
> ---
> drivers/base/power/Makefile | 1 +
> drivers/base/power/devfreq.c | 303 ++++++++++++++++++++++++++++++++++++++++++
> drivers/base/power/opp.c | 9 ++
> include/linux/devfreq.h | 103 ++++++++++++++
> kernel/power/Kconfig | 34 +++++
> 5 files changed, 450 insertions(+), 0 deletions(-)
> create mode 100644 drivers/base/power/devfreq.c
> create mode 100644 include/linux/devfreq.h
>
> diff --git a/drivers/base/power/Makefile b/drivers/base/power/Makefile
> index 3647e11..20118dc 100644
> --- a/drivers/base/power/Makefile
> +++ b/drivers/base/power/Makefile
> @@ -4,5 +4,6 @@ obj-$(CONFIG_PM_RUNTIME) += runtime.o
> obj-$(CONFIG_PM_TRACE_RTC) += trace.o
> obj-$(CONFIG_PM_OPP) += opp.o
> obj-$(CONFIG_HAVE_CLK) += clock_ops.o
> +obj-$(CONFIG_PM_DEVFREQ) += devfreq.o
>
> ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
> \ No newline at end of file
> diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
> new file mode 100644
> index 0000000..6f4bd3a
> --- /dev/null
> +++ b/drivers/base/power/devfreq.c
> @@ -0,0 +1,303 @@
> +/*
> + * devfreq: Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework
> + * for Non-CPU Devices Based on OPP.
> + *
> + * Copyright (C) 2011 Samsung Electronics
> + * MyungJoo Ham <myungjoo.ham@samsung.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/err.h>
> +#include <linux/init.h>
> +#include <linux/slab.h>
> +#include <linux/opp.h>
> +#include <linux/devfreq.h>
> +#include <linux/workqueue.h>
> +#include <linux/platform_device.h>
> +#include <linux/list.h>
> +#include <linux/printk.h>
> +#include <linux/hrtimer.h>
> +
> +/*
> + * devfreq polling interval in ms.
> + * It is recommended to be "jiffy_in_ms" * n, where n is an integer >= 1.
> + */
> +static unsigned int devfreq_interval = 20;
Instead of devfreq_interval, how about devices specify their
devfreq->profile->polling_ms which gets turned into a jiffies value
called devfreq->profile->next_monitor? devfreq->profile->next_monitor
is a jiffies value some time in the future.
When the monitor is run then devfreq->profile->next_monitor will
simply be "jiffies + msecs_to_jiffies(devfreq->profile->polling_ms)".
That would remove the unnecessary interval which is pretty restrictive
for someone that *really* wants a 50ms polling interval (not a
multiple of 20).
Then something like the following can be done in devfreq_monitor:
unsigned long next_monitor = UINT_MAX;
for_each_entry(devfreq, &devfreq_list, node) {
next_monitor = minimum(next_monitor, devfreq->next_monitor);
}
queue_delayed_work(devfreq_wq, &devfreq_work, next_monitor);
> +
> +/*
> + * devfreq_work periodically (given by devfreq_interval) monitors every
> + * registered device.
> + */
> +static bool polling;
Since patchset v5 removes tickle (so as not to duplicate a QoS
constraint), devfreq should no longer handle any non-polling DVFS
transitions, which can also be expressed through a QoS constraint.
Any static constraint requests from other drivers will use the QoS API
and not devfreq. As such "polling" can be removed and all the code
that touches it since polling is assumed. Why use devfreq if you're
not going to poll?
This also makes the performance and powersave governors redundant, and
possibly the userspace governor as well since QoS requests can just as
easily achieve the sort of constraints desirable for a non-polling
implementation.
This also touches on another point that I'll elaborate more on below,
which is that timer queueing/scheduling should be implemented by the
governors, not the core code.
> +static ktime_t last_polled_at;
> +static struct workqueue_struct *devfreq_wq;
> +static struct delayed_work devfreq_work;
Polling should not be implemented in core code. Instead the devfreq
devices should define their own delayed_work and governors should
schedule that work, possibly with governor-specific work queues, or
using the kernel global workqueue. E.g: simple-ondemand should have
its own workqueue and each device have it's own delayed_work;
something like devfreq->governor->wq and devfreq->work would suffice.
This is better aligned with CPUfreq, which leaves polling details to
the governors. For example cpufreq-ondemand has delayed_work for each
CPU and that governor schedules the work, not core code. Core code
handles little more than registration and sysfs events (including
switching governors).
Some other benefits are increased debugging capability (we know which
governor failed, we know which device failed, etc) by separating the
work queues and the work structs and it simplifies having to figure
out some min_polling/next_polling since workqueues do that for us when
we schedule multiple pieces of work, all with different delays.
> +
> +/* The list of all device-devfreq */
> +static LIST_HEAD(devfreq_list);
> +static DEFINE_MUTEX(devfreq_list_lock);
> +
> +/**
> + * find_device_devfreq() - find devfreq struct using device pointer
> + * @dev: device pointer used to lookup device devfreq.
> + *
> + * Search the list of device devfreqs and return the matched device's
> + * devfreq info. devfreq_list_lock should be held by the caller.
> + */
> +static struct devfreq *find_device_devfreq(struct device *dev)
> +{
> + struct devfreq *tmp_devfreq;
> +
> + if (unlikely(IS_ERR_OR_NULL(dev))) {
> + pr_err("DEVFREQ: %s: Invalid parameters\n", __func__);
> + return ERR_PTR(-EINVAL);
> + }
> +
> + list_for_each_entry(tmp_devfreq, &devfreq_list, node) {
> + if (tmp_devfreq->dev == dev)
> + return tmp_devfreq;
> + }
> +
> + return ERR_PTR(-ENODEV);
> +}
> +
> +/**
> + * devfreq_do() - Check the usage profile of a given device and configure
> + * frequency and voltage accordingly
> + * @devfreq: devfreq info of the given device
> + */
> +static int devfreq_do(struct devfreq *devfreq)
> +{
> + struct opp *opp;
> + unsigned long freq;
> + int err;
> +
> + err = devfreq->governor->get_target_freq(devfreq, &freq);
Following along with my comments above, the semantics of calling
get_target_freq in devfreq_do are backwards. This code should go in
the governor and is analogous to the various dbs_check_cpu functions
in those governors.
> + if (err)
> + return err;
> +
> + opp = opp_find_freq_ceil(devfreq->dev, &freq);
I know that devfreq was originally developed with OPP as a
requirement, but there is no reason to include such details in
devfreq. Many platforms may not adopt the OPP libary.
How about the governor calls devfreq->profile->get_target_freq in it's
decision making function (e.g. dbs_check_cpu) and then calls
devfreq->profile->target with the frequency from that same function,
only this time passing in the frequency instead of an OPP?
That target function can determine what to do with the frequency. In
the case of Exoyns4 the target function can use the OPP library. In
the case of a simpler device it might just make a clk_set_rate call.
This also removes the drama from V1 of this patchset where the use of
OPP library itself was disputed and allows devfreq to be more generic.
> + if (opp == ERR_PTR(-ENODEV))
> + opp = opp_find_freq_floor(devfreq->dev, &freq);
> +
> + if (IS_ERR(opp))
> + return PTR_ERR(opp);
> +
> + if (devfreq->previous_freq == freq)
> + return 0;
> +
> + err = devfreq->profile->target(devfreq->dev, opp);
> + if (err)
> + return err;
> +
> + devfreq->previous_freq = freq;
> + return 0;
> +}
> +
> +/**
> + * devfreq_monitor() - Periodically run devfreq_do()
> + * @work: the work struct used to run devfreq_monitor periodically.
> + *
> + */
> +static void devfreq_monitor(struct work_struct *work)
> +{
> + struct devfreq *devfreq, *tmp;
> + int error;
> + unsigned int next_poll_min = UINT_MAX;
> + ktime_t now = ktime_get();
> + s64 time_passed = ktime_to_ms(ktime_sub(now, last_polled_at));
> + int iterations_passed = 1;
> +
> + /* If n * devfreq_interval has passed, count it */
> + do_div(time_passed, devfreq_interval);
> + if (time_passed > 1)
> + iterations_passed = time_passed;
> + last_polled_at = now;
> +
> + mutex_lock(&devfreq_list_lock);
> +
> + list_for_each_entry_safe(devfreq, tmp, &devfreq_list, node) {
> + if (devfreq->next_polling == 0)
> + continue;
> +
> + /*
> + * Reduce more next_polling if devfreq_wq took an extra
> + * delay. (i.e., CPU has been idled.)
> + */
> + if (devfreq->next_polling <= iterations_passed) {
> + error = devfreq_do(devfreq);
> +
> + /* Remove a devfreq with an error. */
> + if (error && error != -EAGAIN) {
> + dev_err(devfreq->dev, "devfreq_do error(%d). "
> + "devfreq is removed from the device\n",
> + error);
> +
> + list_del(&devfreq->node);
> + kfree(devfreq);
> +
> + continue;
> + }
> + devfreq->next_polling = DIV_ROUND_UP(
> + devfreq->profile->polling_ms,
> + devfreq_interval);
> +
> + /* No more polling required (polling_ms changed) */
> + if (devfreq->next_polling == 0)
> + continue;
> + } else {
> + devfreq->next_polling -= iterations_passed;
> + }
> +
> + next_poll_min = (next_poll_min > devfreq->next_polling) ?
> + devfreq->next_polling : next_poll_min;
> + }
Again, I find this whole method of using devfreq_interval and
evaluating every single device in a single delayed_work in a single
workqueue unintuitive. workqueues already do the job of taking many
pieces of future work and ordering them based on their delay, so why
should we replicate that pattern here by grouping all work together
and manually determining when to run the monitor?
> +
> + if (next_poll_min > 0 && next_poll_min < UINT_MAX) {
> + polling = true;
> + queue_delayed_work(devfreq_wq, &devfreq_work, msecs_to_jiffies(
> + devfreq_interval * next_poll_min));
> + } else {
> + polling = false;
> + }
> +
> + mutex_unlock(&devfreq_list_lock);
> +}
> +
> +/**
> + * devfreq_add_device() - Add devfreq feature to the device
> + * @dev: the device to add devfreq feature.
> + * @profile: device-specific profile to run devfreq.
> + * @governor: the policy to choose frequency.
> + */
> +int devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile,
> + struct devfreq_governor *governor)
> +{
> + struct devfreq *new_devfreq, *devfreq;
Why use new_devfreq at all? devfreq can be reused after checking for
error conditions.
> + int err = 0;
> +
> + if (!dev || !profile || !governor) {
> + dev_err(dev, "%s: Invalid parameters.\n", __func__);
> + return -EINVAL;
> + }
> +
> + mutex_lock(&devfreq_list_lock);
> +
> + devfreq = find_device_devfreq(dev);
> + if (!IS_ERR(devfreq)) {
> + dev_err(dev, "%s: Unable to create devfreq for the device. "
> + "It already has one.\n", __func__);
Put the error string all on one line, even if it goes past 80 chars.
> + err = -EINVAL;
> + goto out;
> + }
> +
> + new_devfreq = kzalloc(sizeof(struct devfreq), GFP_KERNEL);
> + if (!new_devfreq) {
> + dev_err(dev, "%s: Unable to create devfreq for the device\n",
> + __func__);
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + new_devfreq->dev = dev;
> + new_devfreq->profile = profile;
> + new_devfreq->governor = governor;
> + new_devfreq->next_polling = DIV_ROUND_UP(profile->polling_ms,
> + devfreq_interval);
> + new_devfreq->previous_freq = profile->initial_freq;
> +
> + list_add(&new_devfreq->node, &devfreq_list);
> +
> + if (devfreq_wq && new_devfreq->next_polling && !polling) {
> + polling = true;
> + queue_delayed_work(devfreq_wq, &devfreq_work,
> + msecs_to_jiffies(devfreq_interval));
> + }
> +out:
> + mutex_unlock(&devfreq_list_lock);
> +
> + return err;
> +}
> +
> +/**
> + * devfreq_remove_device() - Remove devfreq feature from a device.
> + * @device: the device to remove devfreq feature.
> + */
> +int devfreq_remove_device(struct device *dev)
> +{
> + struct devfreq *devfreq;
> +
> + if (!dev)
> + return -EINVAL;
> +
> + mutex_lock(&devfreq_list_lock);
> + devfreq = find_device_devfreq(dev);
> + if (IS_ERR(devfreq)) {
> + dev_err(dev, "%s: Unable to find devfreq entry for the device.\n",
> + __func__);
> + mutex_unlock(&devfreq_list_lock);
> + return -EINVAL;
> + }
> +
> + list_del(&devfreq->node);
> +
> + kfree(devfreq);
> +
> + mutex_unlock(&devfreq_list_lock);
> +
> + return 0;
> +}
> +
> +/**
> + * devfreq_update() - Notify that the device OPP has been changed.
> + * @dev: the device whose OPP has been changed.
> + */
> +int devfreq_update(struct device *dev)
OPP library should implement notifiers which devfreq can subscribe to,
instead of hacking this code into the OPP libary. I've looped in
Nishanth Menon, author of OPP library, to comment.
> +{
> + struct devfreq *devfreq;
> + int err = 0;
> +
> + mutex_lock(&devfreq_list_lock);
> +
> + devfreq = find_device_devfreq(dev);
> + if (IS_ERR(devfreq)) {
> + err = PTR_ERR(devfreq);
> + goto out;
> + }
> +
> + /* Reevaluate the proper frequency */
> + err = devfreq_do(devfreq);
> +
> +out:
> + mutex_unlock(&devfreq_list_lock);
> + return err;
> +}
> +
> +/**
> + * devfreq_init() - Initialize data structure for devfreq framework and
> + * start polling registered devfreq devices.
> + */
> +static int __init devfreq_init(void)
> +{
> + devfreq_interval = jiffies_to_msecs(msecs_to_jiffies(devfreq_interval));
> + if (devfreq_interval <= 0) {
> + pr_err("DEVFREQ: devfreq_interval too small.\n");
> + return -EINVAL;
> + }
> +
> + mutex_lock(&devfreq_list_lock);
> + polling = false;
> + devfreq_wq = create_freezable_workqueue("devfreq_wq");
> + INIT_DELAYED_WORK_DEFERRABLE(&devfreq_work, devfreq_monitor);
> + mutex_unlock(&devfreq_list_lock);
> +
> + last_polled_at = ktime_get();
> + devfreq_monitor(&devfreq_work.work);
> + return 0;
> +}
> +late_initcall(devfreq_init);
> diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
> index 56a6899..819c1b3 100644
> --- a/drivers/base/power/opp.c
> +++ b/drivers/base/power/opp.c
> @@ -21,6 +21,7 @@
> #include <linux/rculist.h>
> #include <linux/rcupdate.h>
> #include <linux/opp.h>
> +#include <linux/devfreq.h>
>
> /*
> * Internal data structure organization with the OPP layer library is as
> @@ -428,6 +429,11 @@ int opp_add(struct device *dev, unsigned long freq, unsigned long u_volt)
> list_add_rcu(&new_opp->node, head);
> mutex_unlock(&dev_opp_list_lock);
>
> + /*
> + * Notify generic dvfs for the change and ignore error
> + * because the device may not have a devfreq entry
> + */
> + devfreq_update(dev);
Same comment as above.
> return 0;
> }
>
> @@ -512,6 +518,9 @@ unlock:
> mutex_unlock(&dev_opp_list_lock);
> out:
> kfree(new_opp);
> +
> + /* Notify generic dvfs for the change and ignore error */
> + devfreq_update(dev);
> return r;
Same comment as above.
Regards,
Mike
> }
>
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> new file mode 100644
> index 0000000..6ec630b
> --- /dev/null
> +++ b/include/linux/devfreq.h
> @@ -0,0 +1,103 @@
> +/*
> + * devfreq: Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework
> + * for Non-CPU Devices Based on OPP.
> + *
> + * Copyright (C) 2011 Samsung Electronics
> + * MyungJoo Ham <myungjoo.ham@samsung.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#ifndef __LINUX_DEVFREQ_H__
> +#define __LINUX_DEVFREQ_H__
> +
> +#define DEVFREQ_NAME_LEN 16
> +
> +struct devfreq;
> +struct devfreq_dev_status {
> + /* both since the last measure */
> + unsigned long total_time;
> + unsigned long busy_time;
> + unsigned long current_frequency;
> +};
> +
> +struct devfreq_dev_profile {
> + unsigned long max_freq; /* may be larger than the actual value */
> + unsigned long initial_freq;
> + int polling_ms; /* 0 for at opp change only */
> +
> + int (*target)(struct device *dev, struct opp *opp);
> + int (*get_dev_status)(struct device *dev,
> + struct devfreq_dev_status *stat);
> +};
> +
> +/**
> + * struct devfreq_governor - Devfreq policy governor
> + * @name Governor's name
> + * @get_target_freq Returns desired operating frequency for the device.
> + * Basically, get_target_freq will run
> + * devfreq_dev_profile.get_dev_status() to get the
> + * status of the device (load = busy_time / total_time).
> + */
> +struct devfreq_governor {
> + char name[DEVFREQ_NAME_LEN];
> + int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
> +};
> +
> +/**
> + * struct devfreq - Device devfreq structure
> + * @node list node - contains the devices with devfreq that have been
> + * registered.
> + * @dev device pointer
> + * @profile device-specific devfreq profile
> + * @governor method how to choose frequency based on the usage.
> + * @previous_freq previously configured frequency value.
> + * @next_polling the number of remaining "devfreq_monitor" executions to
> + * reevaluate frequency/voltage of the device. Set by
> + * profile's polling_ms interval.
> + * @data Private data of the governor. The devfreq framework does not
> + * touch this.
> + *
> + * This structure stores the devfreq information for a give device.
> + */
> +struct devfreq {
> + struct list_head node;
> +
> + struct device *dev;
> + struct devfreq_dev_profile *profile;
> + struct devfreq_governor *governor;
> +
> + unsigned long previous_freq;
> + unsigned int next_polling;
> +
> + void *data; /* private data for governors */
> +};
> +
> +#if defined(CONFIG_PM_DEVFREQ)
> +extern int devfreq_add_device(struct device *dev,
> + struct devfreq_dev_profile *profile,
> + struct devfreq_governor *governor);
> +extern int devfreq_remove_device(struct device *dev);
> +extern int devfreq_update(struct device *dev);
> +#else /* !CONFIG_PM_DEVFREQ */
> +static int devfreq_add_device(struct device *dev,
> + struct devfreq_dev_profile *profile,
> + struct devfreq_governor *governor)
> +{
> + return 0;
> +}
> +
> +static int devfreq_remove_device(struct device *dev)
> +{
> + return 0;
> +}
> +
> +static int devfreq_update(struct device *dev)
> +{
> + return 0;
> +}
> +#endif /* CONFIG_PM_DEVFREQ */
> +
> +#endif /* __LINUX_DEVFREQ_H__ */
> diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
> index 87f4d24..b7e15c8 100644
> --- a/kernel/power/Kconfig
> +++ b/kernel/power/Kconfig
> @@ -227,3 +227,37 @@ config PM_OPP
> config PM_RUNTIME_CLK
> def_bool y
> depends on PM_RUNTIME && HAVE_CLK
> +
> +config ARCH_HAS_DEVFREQ
> + bool
> + depends on ARCH_HAS_OPP
> + help
> + Denotes that the architecture supports DEVFREQ. If the architecture
> + supports multiple OPP entries per device and the frequency of the
> + devices with OPPs may be altered dynamically, the architecture
> + supports DEVFREQ.
> +
> +config PM_DEVFREQ
> + bool "Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework"
> + depends on PM_OPP && ARCH_HAS_DEVFREQ
> + help
> + With OPP support, a device may have a list of frequencies and
> + voltages available. DEVFREQ, a generic DVFS framework can be
> + registered for a device with OPP support in order to let the
> + governor provided to DEVFREQ choose an operating frequency
> + based on the OPP's list and the policy given with DEVFREQ.
> +
> + Each device may have its own governor and policy. DEVFREQ can
> + reevaluate the device state periodically and/or based on the
> + OPP list changes (each frequency/voltage pair in OPP may be
> + disabled or enabled).
> +
> + Like some CPUs with CPUFREQ, a device may have multiple clocks.
> + However, because the clock frequencies of a single device are
> + determined by the single device's state, an instance of DEVFREQ
> + is attached to a single device and returns a "representative"
> + clock frequency from the OPP of the device, which is also attached
> + to a device by 1-to-1. The device registering DEVFREQ takes the
> + responsiblity to "interpret" the frequency listed in OPP and
> + to set its every clock accordingly with the "target" callback
> + given to DEVFREQ.
> --
> 1.7.4.1
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH v5 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
2011-08-11 1:00 ` Turquette, Mike
@ 2011-08-17 9:40 ` MyungJoo Ham
0 siblings, 0 replies; 13+ messages in thread
From: MyungJoo Ham @ 2011-08-17 9:40 UTC (permalink / raw)
To: Turquette, Mike
Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, linux-pm,
Thomas Gleixner
On Thu, Aug 11, 2011 at 10:00 AM, Turquette, Mike <mturquette@ti.com> wrote:
>> +static unsigned int devfreq_interval = 20;
>
> Instead of devfreq_interval, how about devices specify their
> devfreq->profile->polling_ms which gets turned into a jiffies value
> called devfreq->profile->next_monitor? devfreq->profile->next_monitor
> is a jiffies value some time in the future.
>
> When the monitor is run then devfreq->profile->next_monitor will
> simply be "jiffies + msecs_to_jiffies(devfreq->profile->polling_ms)".
> That would remove the unnecessary interval which is pretty restrictive
> for someone that *really* wants a 50ms polling interval (not a
> multiple of 20).
>
> Then something like the following can be done in devfreq_monitor:
>
> unsigned long next_monitor = UINT_MAX;
> for_each_entry(devfreq, &devfreq_list, node) {
> next_monitor = minimum(next_monitor, devfreq->next_monitor);
> }
> queue_delayed_work(devfreq_wq, &devfreq_work, next_monitor);
>
Thanks. In the patchset v6, I'll let it based on simply JIFFY. In the
v6 rc, I've removed devfreq_interval.
>> +
>> +/*
>> + * devfreq_work periodically (given by devfreq_interval) monitors every
>> + * registered device.
>> + */
>> +static bool polling;
>
> Since patchset v5 removes tickle (so as not to duplicate a QoS
> constraint), devfreq should no longer handle any non-polling DVFS
> transitions, which can also be expressed through a QoS constraint.
> Any static constraint requests from other drivers will use the QoS API
> and not devfreq. As such "polling" can be removed and all the code
> that touches it since polling is assumed. Why use devfreq if you're
> not going to poll?
>
> This also makes the performance and powersave governors redundant, and
> possibly the userspace governor as well since QoS requests can just as
> easily achieve the sort of constraints desirable for a non-polling
> implementation.
>
I guess this is resolved in the other thread regarding patch 2/3.
> This also touches on another point that I'll elaborate more on below,
> which is that timer queueing/scheduling should be implemented by the
> governors, not the core code.
[]
> Polling should not be implemented in core code. Instead the devfreq
> devices should define their own delayed_work and governors should
> schedule that work, possibly with governor-specific work queues, or
> using the kernel global workqueue. E.g: simple-ondemand should have
> its own workqueue and each device have it's own delayed_work;
> something like devfreq->governor->wq and devfreq->work would suffice.
>
> This is better aligned with CPUfreq, which leaves polling details to
> the governors. For example cpufreq-ondemand has delayed_work for each
> CPU and that governor schedules the work, not core code. Core code
> handles little more than registration and sysfs events (including
> switching governors).
The first paragraphs in the reply of the thread of patch 2/3 discusses
this issue: in brief,
unlike CPUFREQ, allowing governors to loop itself increases overhead
and duplicated code.
>
> Some other benefits are increased debugging capability (we know which
> governor failed, we know which device failed, etc) by separating the
> work queues and the work structs and it simplifies having to figure
> out some min_polling/next_polling since workqueues do that for us when
> we schedule multiple pieces of work, all with different delays.
At an error in DEVFREQ polling process, the corresponding device's
DEVFREQ is removed from polling and an error message is printed. Thus,
the user should be able to know which device has failed with what
errno already. The governor is designated by DEVFREQ user, so the user
should know which governor is failing by knowing which device is
failing. Or, we may simply add governor name at the error message in
devfreq_monitor().
>
>> +static int devfreq_do(struct devfreq *devfreq)
>> +{
>> + struct opp *opp;
>> + unsigned long freq;
>> + int err;
>> +
>> + err = devfreq->governor->get_target_freq(devfreq, &freq);
>
> Following along with my comments above, the semantics of calling
> get_target_freq in devfreq_do are backwards. This code should go in
> the governor and is analogous to the various dbs_check_cpu functions
> in those governors.
By limiting the governors' role as recommending appropriate
frequencies only, we reduce redundant code across governors and
simplifies implementation of governors. As long as we might need to
implement custom governors for different type of devices (especially
for GPUs and MMCs), this can ease the effort later. I don't think we
need to create redundant code here across governors at least at this
stage.
>
>> + if (err)
>> + return err;
>> +
>> + opp = opp_find_freq_ceil(devfreq->dev, &freq);
>
> I know that devfreq was originally developed with OPP as a
> requirement, but there is no reason to include such details in
> devfreq. Many platforms may not adopt the OPP libary.
>
> How about the governor calls devfreq->profile->get_target_freq in it's
> decision making function (e.g. dbs_check_cpu) and then calls
> devfreq->profile->target with the frequency from that same function,
> only this time passing in the frequency instead of an OPP?
>
> That target function can determine what to do with the frequency. In
> the case of Exoyns4 the target function can use the OPP library. In
> the case of a simpler device it might just make a clk_set_rate call.
> This also removes the drama from V1 of this patchset where the use of
> OPP library itself was disputed and allows devfreq to be more generic.
>
A device that is capable of DVFS has multiple pairs of frequency and
voltage. OPP is a data structure to represent multiple pairs of
frequency and voltage of a device. Without OPP, DEVFREQ requires to
implement data structure to represent multiple pairs of frequency and
voltage per device. Why would we implement another data structure that
represents the same thing? Besides, for devices already with OPPs,
their device driver needs to make two copies of data with different
type representing the exactly same things.
Actually, later on (seems not just yet anyway), OPP may need to loosen
up the condition in Kconfig in case non-SoC-dependent devices starting
to use DVFS feature, or even, CPUFREQ may use OPP as its data
structure for listing available frequencies.
>> + next_poll_min = (next_poll_min > devfreq->next_polling) ?
>> + devfreq->next_polling : next_poll_min;
>> + }
>
> Again, I find this whole method of using devfreq_interval and
> evaluating every single device in a single delayed_work in a single
> workqueue unintuitive. workqueues already do the job of taking many
> pieces of future work and ordering them based on their delay, so why
> should we replicate that pattern here by grouping all work together
> and manually determining when to run the monitor?
>
>> + struct devfreq *new_devfreq, *devfreq;
>
> Why use new_devfreq at all? devfreq can be reused after checking for
> error conditions.
Thanks. I will remove new_devfreq.
>
>> + if (!IS_ERR(devfreq)) {
>> + dev_err(dev, "%s: Unable to create devfreq for the device. "
>> + "It already has one.\n", __func__);
>
> Put the error string all on one line, even if it goes past 80 chars.
will correct that one.
>> +int devfreq_update(struct device *dev)
>
> OPP library should implement notifiers which devfreq can subscribe to,
> instead of hacking this code into the OPP libary. I've looped in
> Nishanth Menon, author of OPP library, to comment.
>
>> + devfreq_update(dev);
>
> Same comment as above.
>
>> + devfreq_update(dev);
>
> Same comment as above.
>
The next revision won't use devfreq_update. Thanks.
Cheers!
MyungJoo
--
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v5 2/3] PM / DEVFREQ: add basic governors
2011-08-08 9:03 [PATCH v5 0/3] DEVFREQ, DVFS Framework for Non-CPU Devices MyungJoo Ham
2011-08-08 9:03 ` [PATCH v5 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs MyungJoo Ham
@ 2011-08-08 9:03 ` MyungJoo Ham
2011-08-11 1:35 ` Turquette, Mike
2011-08-08 9:03 ` [PATCH v5 3/3] PM / DEVFREQ: add sysfs interface MyungJoo Ham
` (2 subsequent siblings)
4 siblings, 1 reply; 13+ messages in thread
From: MyungJoo Ham @ 2011-08-08 9:03 UTC (permalink / raw)
To: linux-pm; +Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner
Four CPUFREQ-like governors are provided as examples.
powersave: use the lowest frequency possible. The user (device) should
set the polling_ms as 0 because polling is useless for this governor.
performance: use the highest freqeuncy possible. The user (device)
should set the polling_ms as 0 because polling is useless for this
governor.
userspace: use the user specified frequency stored at
devfreq.user_set_freq. With sysfs support in the following patch, a user
may set the value with the sysfs interface.
simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.
When a user updates OPP entries (enable/disable/add), OPP framework
automatically notifies DEVFREQ to update operating frequency
accordingly. Thus, DEVFREQ users (device drivers) do not need to update
DEVFREQ manually with OPP entry updates or set polling_ms for powersave
, performance, userspace, or any other "static" governors.
Note that these are given only as basic examples for governors and any
devices with DEVFREQ may implement their own governors with the drivers
and use them.
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
Changes from v4:
- Added userspace governor
Changes from v3:
- Bugfixes on simple-ondemand governor (divide by zero / overflow)
- Style fixes
- Give names to governors
---
drivers/base/power/devfreq.c | 100 ++++++++++++++++++++++++++++++++++++++++++
include/linux/devfreq.h | 8 +++
2 files changed, 108 insertions(+), 0 deletions(-)
diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
index 6f4bd3a..c53bca9 100644
--- a/drivers/base/power/devfreq.c
+++ b/drivers/base/power/devfreq.c
@@ -301,3 +301,103 @@ static int __init devfreq_init(void)
return 0;
}
late_initcall(devfreq_init);
+
+static int devfreq_powersave_func(struct devfreq *df,
+ unsigned long *freq)
+{
+ *freq = 0; /* devfreq_do will run "ceiling" to 0 */
+ return 0;
+}
+
+struct devfreq_governor devfreq_powersave = {
+ .name = "powersave",
+ .get_target_freq = devfreq_powersave_func,
+};
+
+static int devfreq_performance_func(struct devfreq *df,
+ unsigned long *freq)
+{
+ *freq = UINT_MAX; /* devfreq_do will run "floor" */
+ return 0;
+}
+
+struct devfreq_governor devfreq_performance = {
+ .name = "performance",
+ .get_target_freq = devfreq_performance_func,
+};
+
+static int devfreq_userspace_func(struct devfreq *df, unsigned long *freq)
+{
+ if (df->user_set_freq == 0)
+ *freq = df->previous_freq; /* No user freq specified yet */
+ else
+ *freq = df->user_set_freq;
+
+ return 0;
+}
+
+struct devfreq_governor devfreq_userspace = {
+ .name = "userspace",
+ .get_target_freq = devfreq_userspace_func,
+};
+
+/* Constants for DevFreq-Simple-Ondemand (DFSO) */
+#define DFSO_UPTHRESHOLD (90)
+#define DFSO_DOWNDIFFERENCTIAL (5)
+static int devfreq_simple_ondemand_func(struct devfreq *df,
+ unsigned long *freq)
+{
+ struct devfreq_dev_status stat;
+ int err = df->profile->get_dev_status(df->dev, &stat);
+ unsigned long long a, b;
+
+ if (err)
+ return err;
+
+ /* Assume MAX if it is going to be divided by zero */
+ if (stat.total_time == 0) {
+ *freq = UINT_MAX;
+ return 0;
+ }
+
+ /* Prevent overflow */
+ if (stat.busy_time >= (1 << 24) || stat.total_time >= (1 << 24)) {
+ stat.busy_time >>= 7;
+ stat.total_time >>= 7;
+ }
+
+ /* Set MAX if it's busy enough */
+ if (stat.busy_time * 100 >
+ stat.total_time * DFSO_UPTHRESHOLD) {
+ *freq = UINT_MAX;
+ return 0;
+ }
+
+ /* Set MAX if we do not know the initial frequency */
+ if (stat.current_frequency == 0) {
+ *freq = UINT_MAX;
+ return 0;
+ }
+
+ /* Keep the current frequency */
+ if (stat.busy_time * 100 >
+ stat.total_time * (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL)) {
+ *freq = stat.current_frequency;
+ return 0;
+ }
+
+ /* Set the desired frequency based on the load */
+ a = stat.busy_time;
+ a *= stat.current_frequency;
+ b = div_u64(a, stat.total_time);
+ b *= 100;
+ b = div_u64(b, (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL / 2));
+ *freq = (unsigned long) b;
+
+ return 0;
+}
+
+struct devfreq_governor devfreq_simple_ondemand = {
+ .name = "simple_ondemand",
+ .get_target_freq = devfreq_simple_ondemand_func,
+};
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index 6ec630b..7131d2a 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -57,6 +57,8 @@ struct devfreq_governor {
* @next_polling the number of remaining "devfreq_monitor" executions to
* reevaluate frequency/voltage of the device. Set by
* profile's polling_ms interval.
+ * @user_set_freq User specified adequete frequency value (thru sysfs
+ * interface). Governors may and may not use this value.
* @data Private data of the governor. The devfreq framework does not
* touch this.
*
@@ -72,6 +74,7 @@ struct devfreq {
unsigned long previous_freq;
unsigned int next_polling;
+ unsigned long user_set_freq; /* governors may ignore this. */
void *data; /* private data for governors */
};
@@ -81,6 +84,11 @@ extern int devfreq_add_device(struct device *dev,
struct devfreq_governor *governor);
extern int devfreq_remove_device(struct device *dev);
extern int devfreq_update(struct device *dev);
+
+extern struct devfreq_governor devfreq_powersave;
+extern struct devfreq_governor devfreq_performance;
+extern struct devfreq_governor devfreq_userspace;
+extern struct devfreq_governor devfreq_simple_ondemand;
#else /* !CONFIG_PM_DEVFREQ */
static int devfreq_add_device(struct device *dev,
struct devfreq_dev_profile *profile,
--
1.7.4.1
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [PATCH v5 2/3] PM / DEVFREQ: add basic governors
2011-08-08 9:03 ` [PATCH v5 2/3] PM / DEVFREQ: add basic governors MyungJoo Ham
@ 2011-08-11 1:35 ` Turquette, Mike
2011-08-16 8:52 ` MyungJoo Ham
0 siblings, 1 reply; 13+ messages in thread
From: Turquette, Mike @ 2011-08-11 1:35 UTC (permalink / raw)
To: MyungJoo Ham
Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, linux-pm,
Thomas Gleixner
On Mon, Aug 8, 2011 at 2:03 AM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote:
> Four CPUFREQ-like governors are provided as examples.
>
> powersave: use the lowest frequency possible. The user (device) should
> set the polling_ms as 0 because polling is useless for this governor.
>
> performance: use the highest freqeuncy possible. The user (device)
> should set the polling_ms as 0 because polling is useless for this
> governor.
Polling is the only practical use for devfreq, assuming a QoS API
exists for DVFS. As such powersave and performance governors should
be removed.
> userspace: use the user specified frequency stored at
> devfreq.user_set_freq. With sysfs support in the following patch, a user
> may set the value with the sysfs interface.
>
> simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.
I won't repeate everything from patch 1 of this series, but the
governors should implement the queue/loop logic in the same way that
CPUfreq does, and the individual devices should have their own
delayed_work.
> When a user updates OPP entries (enable/disable/add), OPP framework
> automatically notifies DEVFREQ to update operating frequency
> accordingly. Thus, DEVFREQ users (device drivers) do not need to update
It would be nice if OPP library "notified" devfreq but it does not
today. OPP library needs notifiers and devfreq can provide handlers
for them.
> DEVFREQ manually with OPP entry updates or set polling_ms for powersave
> , performance, userspace, or any other "static" governors.
>
> Note that these are given only as basic examples for governors and any
> devices with DEVFREQ may implement their own governors with the drivers
> and use them.
>
> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>
> ---
> Changes from v4:
> - Added userspace governor
>
> Changes from v3:
> - Bugfixes on simple-ondemand governor (divide by zero / overflow)
> - Style fixes
> - Give names to governors
> ---
> drivers/base/power/devfreq.c | 100 ++++++++++++++++++++++++++++++++++++++++++
> include/linux/devfreq.h | 8 +++
> 2 files changed, 108 insertions(+), 0 deletions(-)
Governors should be split out into their own file, especially since
they need to grow to include polling/queueing logic.
>
> diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
> index 6f4bd3a..c53bca9 100644
> --- a/drivers/base/power/devfreq.c
> +++ b/drivers/base/power/devfreq.c
> @@ -301,3 +301,103 @@ static int __init devfreq_init(void)
> return 0;
> }
> late_initcall(devfreq_init);
> +
> +static int devfreq_powersave_func(struct devfreq *df,
> + unsigned long *freq)
> +{
> + *freq = 0; /* devfreq_do will run "ceiling" to 0 */
> + return 0;
> +}
> +
> +struct devfreq_governor devfreq_powersave = {
> + .name = "powersave",
> + .get_target_freq = devfreq_powersave_func,
> +};
> +
> +static int devfreq_performance_func(struct devfreq *df,
> + unsigned long *freq)
> +{
> + *freq = UINT_MAX; /* devfreq_do will run "floor" */
> + return 0;
> +}
> +
> +struct devfreq_governor devfreq_performance = {
> + .name = "performance",
> + .get_target_freq = devfreq_performance_func,
> +};
> +
> +static int devfreq_userspace_func(struct devfreq *df, unsigned long *freq)
> +{
> + if (df->user_set_freq == 0)
> + *freq = df->previous_freq; /* No user freq specified yet */
> + else
> + *freq = df->user_set_freq;
> +
> + return 0;
> +}
> +
> +struct devfreq_governor devfreq_userspace = {
> + .name = "userspace",
> + .get_target_freq = devfreq_userspace_func,
> +};
> +
> +/* Constants for DevFreq-Simple-Ondemand (DFSO) */
> +#define DFSO_UPTHRESHOLD (90)
> +#define DFSO_DOWNDIFFERENCTIAL (5)
> +static int devfreq_simple_ondemand_func(struct devfreq *df,
> + unsigned long *freq)
> +{
> + struct devfreq_dev_status stat;
> + int err = df->profile->get_dev_status(df->dev, &stat);
> + unsigned long long a, b;
> +
> + if (err)
> + return err;
> +
> + /* Assume MAX if it is going to be divided by zero */
> + if (stat.total_time == 0) {
> + *freq = UINT_MAX;
> + return 0;
> + }
> +
> + /* Prevent overflow */
> + if (stat.busy_time >= (1 << 24) || stat.total_time >= (1 << 24)) {
> + stat.busy_time >>= 7;
> + stat.total_time >>= 7;
> + }
> +
> + /* Set MAX if it's busy enough */
> + if (stat.busy_time * 100 >
> + stat.total_time * DFSO_UPTHRESHOLD) {
Thresholds should not be constants, but should be tuneable parameters,
per-device. This is yet another reason for revising the existing
relationship between devfreq core code, governors and devices.
> + *freq = UINT_MAX;
> + return 0;
> + }
> +
> + /* Set MAX if we do not know the initial frequency */
> + if (stat.current_frequency == 0) {
> + *freq = UINT_MAX;
> + return 0;
> + }
> +
> + /* Keep the current frequency */
> + if (stat.busy_time * 100 >
> + stat.total_time * (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL)) {
Same as above.
> + *freq = stat.current_frequency;
> + return 0;
> + }
> +
> + /* Set the desired frequency based on the load */
> + a = stat.busy_time;
> + a *= stat.current_frequency;
> + b = div_u64(a, stat.total_time);
> + b *= 100;
> + b = div_u64(b, (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL / 2));
Same as above.
Regards,
Mike
> + *freq = (unsigned long) b;
> +
> + return 0;
> +}
> +
> +struct devfreq_governor devfreq_simple_ondemand = {
> + .name = "simple_ondemand",
> + .get_target_freq = devfreq_simple_ondemand_func,
> +};
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> index 6ec630b..7131d2a 100644
> --- a/include/linux/devfreq.h
> +++ b/include/linux/devfreq.h
> @@ -57,6 +57,8 @@ struct devfreq_governor {
> * @next_polling the number of remaining "devfreq_monitor" executions to
> * reevaluate frequency/voltage of the device. Set by
> * profile's polling_ms interval.
> + * @user_set_freq User specified adequete frequency value (thru sysfs
> + * interface). Governors may and may not use this value.
> * @data Private data of the governor. The devfreq framework does not
> * touch this.
> *
> @@ -72,6 +74,7 @@ struct devfreq {
> unsigned long previous_freq;
> unsigned int next_polling;
>
> + unsigned long user_set_freq; /* governors may ignore this. */
> void *data; /* private data for governors */
> };
>
> @@ -81,6 +84,11 @@ extern int devfreq_add_device(struct device *dev,
> struct devfreq_governor *governor);
> extern int devfreq_remove_device(struct device *dev);
> extern int devfreq_update(struct device *dev);
> +
> +extern struct devfreq_governor devfreq_powersave;
> +extern struct devfreq_governor devfreq_performance;
> +extern struct devfreq_governor devfreq_userspace;
> +extern struct devfreq_governor devfreq_simple_ondemand;
> #else /* !CONFIG_PM_DEVFREQ */
> static int devfreq_add_device(struct device *dev,
> struct devfreq_dev_profile *profile,
> --
> 1.7.4.1
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH v5 2/3] PM / DEVFREQ: add basic governors
2011-08-11 1:35 ` Turquette, Mike
@ 2011-08-16 8:52 ` MyungJoo Ham
2011-08-16 18:11 ` Turquette, Mike
0 siblings, 1 reply; 13+ messages in thread
From: MyungJoo Ham @ 2011-08-16 8:52 UTC (permalink / raw)
To: Turquette, Mike
Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, linux-pm,
Thomas Gleixner
On Thu, Aug 11, 2011 at 10:35 AM, Turquette, Mike <mturquette@ti.com> wrote:
[]
>
> Polling is the only practical use for devfreq, assuming a QoS API
> exists for DVFS. As such powersave and performance governors should
> be removed.
Although powersave/performance governors may seem useless, they are
used as basis on measuring the usefulness of DVFS mechanism of
specific devices. If a device is going to use DVFS, we can test the
device with them to find out the potential power save (compare
powersave to performance) and the performance deterioration (compared
to performance). Often, in testing phase, QA teams use performance to
find out any issues with DVFS features (in CPUFREQ). Users may simply
want to use performance governor in some cases (power is not an issue
sometimes).
Using QoS APIs simply to set "minimum" or "maximum" is possible.
However, they are not that straightforward; e.g., how should we set
"DMA latency" to be fixed at the minimum frequency regardless of
others, or how should we set "Network latency" to be fixed at the
maximum frequency? especially without knowing the specifications of
each DVFS-capable device (such as available frequencies, valid latency
values, ...).
>
>> userspace: use the user specified frequency stored at
>> devfreq.user_set_freq. With sysfs support in the following patch, a user
>> may set the value with the sysfs interface.
>>
>> simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.
>
> I won't repeate everything from patch 1 of this series, but the
> governors should implement the queue/loop logic in the same way that
> CPUfreq does, and the individual devices should have their own
> delayed_work.
First, in case where we want to let each DVFS-capable device have
exact polling frequency (up to jiffy resolution), we only need to set
polling_interval = jiffies_to_msecs(1);.
In case of CPUFREQ, there would be only one polling loop at most for
each core. However, in case of DEVFREQ, there could be multiple
polling loops at a core if CPUFREQ-like looping logic is introduced.
Why don't we reduce that overhead while their function is same, it is
easily doable, and it reduces redundancy?
>
>> When a user updates OPP entries (enable/disable/add), OPP framework
>> automatically notifies DEVFREQ to update operating frequency
>> accordingly. Thus, DEVFREQ users (device drivers) do not need to update
>
> It would be nice if OPP library "notified" devfreq but it does not
> today. OPP library needs notifiers and devfreq can provide handlers
> for them.
That's why devfreq_update() is added in the patch. While DEVFREQ is
the only one requiring notifications from OPP, do you think we may
incur the overhead of notifier at OPP by replacing devfreq_update with
notifier? If we somehow add another module that requires notifications
from OPP for frequency availability changes, we will need to implement
notifier at OPP side, but not just yet, I guess: (discussed before at
https://lists.linux-foundation.org/pipermail/linux-pm/2011-July/032053.html
)
>
[]
>> ---
>> drivers/base/power/devfreq.c | 100 ++++++++++++++++++++++++++++++++++++++++++
>> include/linux/devfreq.h | 8 +++
>> 2 files changed, 108 insertions(+), 0 deletions(-)
>
> Governors should be split out into their own file, especially since
> they need to grow to include polling/queueing logic.
We will need to decide where to settle devfreq core, drivers, and
governors first. Would /drivers/devfreq/ be appropriate?
[]
>> +
>> + /* Set MAX if it's busy enough */
>> + if (stat.busy_time * 100 >
>> + stat.total_time * DFSO_UPTHRESHOLD) {
>
> Thresholds should not be constants, but should be tuneable parameters,
> per-device. This is yet another reason for revising the existing
> relationship between devfreq core code, governors and devices.
>
I agree. I think I should add governor specific "setup" value at
devfreq_add_device(); modifying the interface from
extern int devfreq_add_device(struct device *dev, struct
devfreq_dev_profile *profile, struct devfreq_governor *governor);
==>
extern int devfreq_add_device(struct device *dev, struct
devfreq_dev_profile *profile, struct devfreq_governor *governor, void
*gov_data);
where gov_data is fed to struct devfreq's data field.
>> + *freq = UINT_MAX;
>> + return 0;
>> + }
>> +
>> + /* Set MAX if we do not know the initial frequency */
>> + if (stat.current_frequency == 0) {
>> + *freq = UINT_MAX;
>> + return 0;
>> + }
>> +
>> + /* Keep the current frequency */
>> + if (stat.busy_time * 100 >
>> + stat.total_time * (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL)) {
>
> Same as above.
Yes.
>
>> + *freq = stat.current_frequency;
>> + return 0;
>> + }
>> +
>> + /* Set the desired frequency based on the load */
>> + a = stat.busy_time;
>> + a *= stat.current_frequency;
>> + b = div_u64(a, stat.total_time);
>> + b *= 100;
>> + b = div_u64(b, (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL / 2));
>
> Same as above.
Yes.
>
> Regards,
> Mike
>
Cheers!
MyungJoo
--
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH v5 2/3] PM / DEVFREQ: add basic governors
2011-08-16 8:52 ` MyungJoo Ham
@ 2011-08-16 18:11 ` Turquette, Mike
2011-08-17 7:32 ` MyungJoo Ham
0 siblings, 1 reply; 13+ messages in thread
From: Turquette, Mike @ 2011-08-16 18:11 UTC (permalink / raw)
To: myungjoo.ham
Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, linux-pm,
Thomas Gleixner
On Tue, Aug 16, 2011 at 1:52 AM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote:
> On Thu, Aug 11, 2011 at 10:35 AM, Turquette, Mike <mturquette@ti.com> wrote:
> []
>>
>> Polling is the only practical use for devfreq, assuming a QoS API
>> exists for DVFS. As such powersave and performance governors should
>> be removed.
>
> Although powersave/performance governors may seem useless, they are
> used as basis on measuring the usefulness of DVFS mechanism of
> specific devices. If a device is going to use DVFS, we can test the
> device with them to find out the potential power save (compare
> powersave to performance) and the performance deterioration (compared
> to performance). Often, in testing phase, QA teams use performance to
> find out any issues with DVFS features (in CPUFREQ). Users may simply
> want to use performance governor in some cases (power is not an issue
> sometimes).
Fair enough. Keeping them around for testing is sensible.
> Using QoS APIs simply to set "minimum" or "maximum" is possible.
> However, they are not that straightforward; e.g., how should we set
> "DMA latency" to be fixed at the minimum frequency regardless of
> others, or how should we set "Network latency" to be fixed at the
> maximum frequency? especially without knowing the specifications of
> each DVFS-capable device (such as available frequencies, valid latency
> values, ...).
>
>>
>>> userspace: use the user specified frequency stored at
>>> devfreq.user_set_freq. With sysfs support in the following patch, a user
>>> may set the value with the sysfs interface.
>>>
>>> simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.
>>
>> I won't repeate everything from patch 1 of this series, but the
>> governors should implement the queue/loop logic in the same way that
>> CPUfreq does, and the individual devices should have their own
>> delayed_work.
>
> First, in case where we want to let each DVFS-capable device have
> exact polling frequency (up to jiffy resolution), we only need to set
> polling_interval = jiffies_to_msecs(1);.
That requires a source code change for anyone that wants to do it. My
main complaint with this method is that it is restrictive to begin
with and the whole method for determining the next_polling time
reproduces what workqueues already give us.
> In case of CPUFREQ, there would be only one polling loop at most for
> each core. However, in case of DEVFREQ, there could be multiple
> polling loops at a core if CPUFREQ-like looping logic is introduced.
> Why don't we reduce that overhead while their function is same, it is
> easily doable, and it reduces redundancy?
I'm afraid I don't follow. I was thinking of having a single wq loop
for each device. Under what conditions would a single device have
multiple wq loops operating against it?
>>> When a user updates OPP entries (enable/disable/add), OPP framework
>>> automatically notifies DEVFREQ to update operating frequency
>>> accordingly. Thus, DEVFREQ users (device drivers) do not need to update
>>
>> It would be nice if OPP library "notified" devfreq but it does not
>> today. OPP library needs notifiers and devfreq can provide handlers
>> for them.
>
> That's why devfreq_update() is added in the patch. While DEVFREQ is
> the only one requiring notifications from OPP, do you think we may
> incur the overhead of notifier at OPP by replacing devfreq_update with
> notifier? If we somehow add another module that requires notifications
> from OPP for frequency availability changes, we will need to implement
> notifier at OPP side, but not just yet, I guess: (discussed before at
> https://lists.linux-foundation.org/pipermail/linux-pm/2011-July/032053.html
> )
Reading that thread makes me think that we really should implement
notifiers in the OPP library. An obvious user of OPP notifiers would
be CPUfreq. I think it is safe to say that there may be
implementations of devfreq and CPUfreq that live side-by-side in the
near future; OPPs might be enabled/disabled dynamically, which means
both of them need callbacks. Better to abstract it out early, I
think.
>>
> []
>>> ---
>>> drivers/base/power/devfreq.c | 100 ++++++++++++++++++++++++++++++++++++++++++
>>> include/linux/devfreq.h | 8 +++
>>> 2 files changed, 108 insertions(+), 0 deletions(-)
>>
>> Governors should be split out into their own file, especially since
>> they need to grow to include polling/queueing logic.
>
> We will need to decide where to settle devfreq core, drivers, and
> governors first. Would /drivers/devfreq/ be appropriate?
I think GKH already ACK'd drivers/devfreq in a previous thread:
https://lists.linux-foundation.org/pipermail/linux-pm/2011-August/032537.html
> []
>>> +
>>> + /* Set MAX if it's busy enough */
>>> + if (stat.busy_time * 100 >
>>> + stat.total_time * DFSO_UPTHRESHOLD) {
>>
>> Thresholds should not be constants, but should be tuneable parameters,
>> per-device. This is yet another reason for revising the existing
>> relationship between devfreq core code, governors and devices.
>>
>
> I agree. I think I should add governor specific "setup" value at
> devfreq_add_device(); modifying the interface from
>
> extern int devfreq_add_device(struct device *dev, struct
> devfreq_dev_profile *profile, struct devfreq_governor *governor);
> ==>
> extern int devfreq_add_device(struct device *dev, struct
> devfreq_dev_profile *profile, struct devfreq_governor *governor, void
> *gov_data);
>
> where gov_data is fed to struct devfreq's data field.
It would be nice for the threshold values to be run-time tunable via
sysfs. CPUfreq does this well today for ondemand/conservative
governors and it really helps when doing power/performance tuning.
Regards,
Mike
>>> + *freq = UINT_MAX;
>>> + return 0;
>>> + }
>>> +
>>> + /* Set MAX if we do not know the initial frequency */
>>> + if (stat.current_frequency == 0) {
>>> + *freq = UINT_MAX;
>>> + return 0;
>>> + }
>>> +
>>> + /* Keep the current frequency */
>>> + if (stat.busy_time * 100 >
>>> + stat.total_time * (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL)) {
>>
>> Same as above.
> Yes.
>
>>
>>> + *freq = stat.current_frequency;
>>> + return 0;
>>> + }
>>> +
>>> + /* Set the desired frequency based on the load */
>>> + a = stat.busy_time;
>>> + a *= stat.current_frequency;
>>> + b = div_u64(a, stat.total_time);
>>> + b *= 100;
>>> + b = div_u64(b, (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL / 2));
>>
>> Same as above.
> Yes.
>
>>
>> Regards,
>> Mike
>>
>
>
> Cheers!
> MyungJoo
>
> --
> MyungJoo Ham (함명주), Ph.D.
> Mobile Software Platform Lab,
> Digital Media and Communications (DMC) Business
> Samsung Electronics
> cell: 82-10-6714-2858
>
_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH v5 2/3] PM / DEVFREQ: add basic governors
2011-08-16 18:11 ` Turquette, Mike
@ 2011-08-17 7:32 ` MyungJoo Ham
0 siblings, 0 replies; 13+ messages in thread
From: MyungJoo Ham @ 2011-08-17 7:32 UTC (permalink / raw)
To: Turquette, Mike
Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, linux-pm,
Thomas Gleixner
On Wed, Aug 17, 2011 at 3:11 AM, Turquette, Mike <mturquette@ti.com> wrote:
> On Tue, Aug 16, 2011 at 1:52 AM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote:
>>
>> First, in case where we want to let each DVFS-capable device have
>> exact polling frequency (up to jiffy resolution), we only need to set
>> polling_interval = jiffies_to_msecs(1);.
>
> That requires a source code change for anyone that wants to do it. My
> main complaint with this method is that it is restrictive to begin
> with and the whole method for determining the next_polling time
> reproduces what workqueues already give us.
>
>> In case of CPUFREQ, there would be only one polling loop at most for
>> each core. However, in case of DEVFREQ, there could be multiple
>> polling loops at a core if CPUFREQ-like looping logic is introduced.
>> Why don't we reduce that overhead while their function is same, it is
>> easily doable, and it reduces redundancy?
>
> I'm afraid I don't follow. I was thinking of having a single wq loop
> for each device. Under what conditions would a single device have
> multiple wq loops operating against it?
I meant a single wq loop for each device and multiple DEVFREQ devices
for a system.
I aggregated multiple instances of DEVFREQ polling into one polling loop because
1. remove redundant polling loops,
2. simplify governors implementation; I'm presuming that some devices
(especially GPUs and MMC hosts) might want their own custom governors
although the governors won't be complex, and
3. reduce overhead.
I don't see any benefit of looping at each DEVFREQ device, yet.
The case of CPUFREQ is a bit different because each CPUFREQ device
(CPU) is capable of looping itself and each looping should represent
one CPU. Each looping device (CPU) will have only up to one CPUFREQ
polling loop and each instance of CPUFREQ should be executed on the
corresponding CPU in order to avoid being slept while the represented
CPU is running.
>
>>>> When a user updates OPP entries (enable/disable/add), OPP framework
>>>> automatically notifies DEVFREQ to update operating frequency
>>>> accordingly. Thus, DEVFREQ users (device drivers) do not need to update
>>>
>>> It would be nice if OPP library "notified" devfreq but it does not
>>> today. OPP library needs notifiers and devfreq can provide handlers
>>> for them.
>>
>> That's why devfreq_update() is added in the patch. While DEVFREQ is
>> the only one requiring notifications from OPP, do you think we may
>> incur the overhead of notifier at OPP by replacing devfreq_update with
>> notifier? If we somehow add another module that requires notifications
>> from OPP for frequency availability changes, we will need to implement
>> notifier at OPP side, but not just yet, I guess: (discussed before at
>> https://lists.linux-foundation.org/pipermail/linux-pm/2011-July/032053.html
>> )
>
> Reading that thread makes me think that we really should implement
> notifiers in the OPP library. An obvious user of OPP notifiers would
> be CPUfreq. I think it is safe to say that there may be
> implementations of devfreq and CPUfreq that live side-by-side in the
> near future; OPPs might be enabled/disabled dynamically, which means
> both of them need callbacks. Better to abstract it out early, I
> think.
Ok, assuming that there would be another requesting notifications from
OPP, I've added "opp_get_notifier()" that returns struct
srcu_notifier_head * removing devfreq_update() at patchset v6
candidate.
>
>>>
>> []
>>>> ---
>>>> drivers/base/power/devfreq.c | 100 ++++++++++++++++++++++++++++++++++++++++++
>>>> include/linux/devfreq.h | 8 +++
>>>> 2 files changed, 108 insertions(+), 0 deletions(-)
>>>
>>> Governors should be split out into their own file, especially since
>>> they need to grow to include polling/queueing logic.
>>
>> We will need to decide where to settle devfreq core, drivers, and
>> governors first. Would /drivers/devfreq/ be appropriate?
>
> I think GKH already ACK'd drivers/devfreq in a previous thread:
> https://lists.linux-foundation.org/pipermail/linux-pm/2011-August/032537.html
Yup. I'm moving them to /drivers/devfreq/
>
>> []
>>>> +
>>>> + /* Set MAX if it's busy enough */
>>>> + if (stat.busy_time * 100 >
>>>> + stat.total_time * DFSO_UPTHRESHOLD) {
>>>
>>> Thresholds should not be constants, but should be tuneable parameters,
>>> per-device. This is yet another reason for revising the existing
>>> relationship between devfreq core code, governors and devices.
>>>
>>
>> I agree. I think I should add governor specific "setup" value at
>> devfreq_add_device(); modifying the interface from
>>
>> extern int devfreq_add_device(struct device *dev, struct
>> devfreq_dev_profile *profile, struct devfreq_governor *governor);
>> ==>
>> extern int devfreq_add_device(struct device *dev, struct
>> devfreq_dev_profile *profile, struct devfreq_governor *governor, void
>> *gov_data);
>>
>> where gov_data is fed to struct devfreq's data field.
>
> It would be nice for the threshold values to be run-time tunable via
> sysfs. CPUfreq does this well today for ondemand/conservative
> governors and it really helps when doing power/performance tuning.
>
> Regards,
> Mike
>
This will be done with the next revision.
Thank you.
MyungJoo
>>>> + *freq = UINT_MAX;
>>>> + return 0;
>>>> + }
>>>> +
>>>> + /* Set MAX if we do not know the initial frequency */
>>>> + if (stat.current_frequency == 0) {
>>>> + *freq = UINT_MAX;
>>>> + return 0;
>>>> + }
>>>> +
>>>> + /* Keep the current frequency */
>>>> + if (stat.busy_time * 100 >
>>>> + stat.total_time * (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL)) {
>>>
>>> Same as above.
>> Yes.
>>
>>>
>>>> + *freq = stat.current_frequency;
>>>> + return 0;
>>>> + }
>>>> +
>>>> + /* Set the desired frequency based on the load */
>>>> + a = stat.busy_time;
>>>> + a *= stat.current_frequency;
>>>> + b = div_u64(a, stat.total_time);
>>>> + b *= 100;
>>>> + b = div_u64(b, (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL / 2));
>>>
>>> Same as above.
>> Yes.
>>
>>>
>>> Regards,
>>> Mike
>>>
>>
>>
>> Cheers!
>> MyungJoo
>>
>> --
>> MyungJoo Ham (함명주), Ph.D.
>> Mobile Software Platform Lab,
>> Digital Media and Communications (DMC) Business
>> Samsung Electronics
>> cell: 82-10-6714-2858
>>
>
--
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v5 3/3] PM / DEVFREQ: add sysfs interface
2011-08-08 9:03 [PATCH v5 0/3] DEVFREQ, DVFS Framework for Non-CPU Devices MyungJoo Ham
2011-08-08 9:03 ` [PATCH v5 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs MyungJoo Ham
2011-08-08 9:03 ` [PATCH v5 2/3] PM / DEVFREQ: add basic governors MyungJoo Ham
@ 2011-08-08 9:03 ` MyungJoo Ham
2011-08-08 9:53 ` [PATCH v5 0/3] DEVFREQ, DVFS Framework for Non-CPU Devices MyungJoo Ham
2011-08-10 21:09 ` Rafael J. Wysocki
4 siblings, 0 replies; 13+ messages in thread
From: MyungJoo Ham @ 2011-08-08 9:03 UTC (permalink / raw)
To: linux-pm; +Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner
Device specific sysfs interface /sys/devices/.../power/devfreq_*
- governor R: name of governor
- cur_freq R: current frequency
- max_freq R: maximum operable frequency
- min_freq R: minimum operable frequency
- set_freq R: read user specified frequency (0 if not specified yet)
W: set user specified frequency
- polling_interval R: polling interval in ms given with devfreq profile
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
--
Changed from v4
- removed system-wide sysfs interface
- removed tickling sysfs interface
- added set_freq for userspace governor (and any other governors that
require user input)
Changed from v3
- corrected sysfs API usage
- corrected error messages
- moved sysfs entry location
- added sysfs entries
Changed from v2
- add ABI entries for devfreq sysfs interface
---
Documentation/ABI/testing/sysfs-devices-power | 45 +++++++
drivers/base/power/devfreq.c | 152 +++++++++++++++++++++++++
2 files changed, 197 insertions(+), 0 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-devices-power b/Documentation/ABI/testing/sysfs-devices-power
index 8ffbc25..6e77604 100644
--- a/Documentation/ABI/testing/sysfs-devices-power
+++ b/Documentation/ABI/testing/sysfs-devices-power
@@ -165,3 +165,48 @@ Description:
Not all drivers support this attribute. If it isn't supported,
attempts to read or write it will yield I/O errors.
+
+What: /sys/devices/.../power/devfreq_governor
+Date: July 2011
+Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+ The /sys/devices/.../power/devfreq_governor shows the name
+ of the governor used by the corresponding device.
+
+What: /sys/devices/.../power/devfreq_cur_freq
+Date: July 2011
+Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+ The /sys/devices/.../power/devfreq_cur_freq shows the current
+ frequency of the corresponding device.
+
+What: /sys/devices/.../power/devfreq_max_freq
+Date: July 2011
+Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+ The /sys/devices/.../power/devfreq_cur_freq shows the
+ maximum operable frequency of the corresponding device.
+
+What: /sys/devices/.../power/devfreq_min_freq
+Date: July 2011
+Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+ The /sys/devices/.../power/devfreq_cur_freq shows the
+ minimum operable frequency of the corresponding device.
+
+What: /sys/devices/.../power/devfreq_set_freq
+Date: August 2011
+Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+ The /sys/devices/.../power/devfreq_set_freq sets and shows
+ the user specified desired frequency of the device. The
+ governor may and may not use the value. With the basic
+ governors given with devfreq.c, userspace governor is
+ using the value.
+
+What: /sys/devices/.../power/devfreq_polling_interval
+Date: July 2011
+Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+ The /sys/devices/.../power/devfreq_polling_interval shows the
+ requested polling interval of the corresponding device.
diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
index c53bca9..4f964b7 100644
--- a/drivers/base/power/devfreq.c
+++ b/drivers/base/power/devfreq.c
@@ -42,6 +42,8 @@ static struct delayed_work devfreq_work;
static LIST_HEAD(devfreq_list);
static DEFINE_MUTEX(devfreq_list_lock);
+static struct attribute_group dev_attr_group;
+
/**
* find_device_devfreq() - find devfreq struct using device pointer
* @dev: device pointer used to lookup device devfreq.
@@ -138,6 +140,8 @@ static void devfreq_monitor(struct work_struct *work)
"devfreq is removed from the device\n",
error);
+ sysfs_unmerge_group(&devfreq->dev->kobj,
+ &dev_attr_group);
list_del(&devfreq->node);
kfree(devfreq);
@@ -218,6 +222,8 @@ int devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile,
queue_delayed_work(devfreq_wq, &devfreq_work,
msecs_to_jiffies(devfreq_interval));
}
+
+ sysfs_merge_group(&dev->kobj, &dev_attr_group);
out:
mutex_unlock(&devfreq_list_lock);
@@ -244,6 +250,8 @@ int devfreq_remove_device(struct device *dev)
return -EINVAL;
}
+ sysfs_unmerge_group(&dev->kobj, &dev_attr_group);
+
list_del(&devfreq->node);
kfree(devfreq);
@@ -278,6 +286,150 @@ out:
return err;
}
+static ssize_t show_governor(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct devfreq *df = find_device_devfreq(dev);
+
+ if (IS_ERR(df))
+ return PTR_ERR(df);
+ if (!df->governor)
+ return -EINVAL;
+
+ return sprintf(buf, "%s\n", df->governor->name);
+}
+
+static ssize_t show_freq(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct devfreq *df = find_device_devfreq(dev);
+
+ if (IS_ERR(df))
+ return PTR_ERR(df);
+
+ return sprintf(buf, "%lu\n", df->previous_freq);
+}
+
+static ssize_t show_max_freq(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct devfreq *df = find_device_devfreq(dev);
+ unsigned long freq = ULONG_MAX;
+ struct opp *opp;
+
+ if (IS_ERR(df))
+ return PTR_ERR(df);
+ if (!df->dev)
+ return -EINVAL;
+
+ opp = opp_find_freq_floor(df->dev, &freq);
+ if (IS_ERR(opp))
+ return PTR_ERR(opp);
+
+ return sprintf(buf, "%lu\n", freq);
+}
+
+static ssize_t show_min_freq(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct devfreq *df = find_device_devfreq(dev);
+ unsigned long freq = 0;
+ struct opp *opp;
+
+ if (IS_ERR(df))
+ return PTR_ERR(df);
+ if (!df->dev)
+ return -EINVAL;
+
+ opp = opp_find_freq_ceil(df->dev, &freq);
+ if (IS_ERR(opp))
+ return PTR_ERR(opp);
+
+ return sprintf(buf, "%lu\n", freq);
+}
+
+static ssize_t show_polling_interval(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct devfreq *df = find_device_devfreq(dev);
+
+ if (IS_ERR(df))
+ return PTR_ERR(df);
+ if (!df->profile)
+ return -EINVAL;
+
+ return sprintf(buf, "%d\n", df->profile->polling_ms);
+}
+
+static ssize_t set_user_frequency(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct devfreq *devfreq;
+ unsigned long wanted;
+ int err = 0;
+
+ sscanf(buf, "%lu", &wanted);
+
+ mutex_lock(&devfreq_list_lock);
+ devfreq = find_device_devfreq(dev);
+
+ if (IS_ERR(devfreq)) {
+ err = PTR_ERR(devfreq);
+ goto out;
+ }
+
+ devfreq->user_set_freq = wanted;
+ err = count;
+out:
+ mutex_unlock(&devfreq_list_lock);
+ if (err >= 0)
+ devfreq_update(dev);
+ return err;
+}
+
+static ssize_t show_user_frequency(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct devfreq *devfreq;
+ int err = 0;
+
+ mutex_lock(&devfreq_list_lock);
+ devfreq = find_device_devfreq(dev);
+
+ if (IS_ERR(devfreq)) {
+ err = PTR_ERR(devfreq);
+ goto out;
+ }
+
+ err = sprintf(buf, "%lu\n", devfreq->user_set_freq);
+out:
+ mutex_unlock(&devfreq_list_lock);
+ return err;
+
+}
+
+static DEVICE_ATTR(devfreq_governor, 0444, show_governor, NULL);
+static DEVICE_ATTR(devfreq_cur_freq, 0444, show_freq, NULL);
+static DEVICE_ATTR(devfreq_max_freq, 0444, show_max_freq, NULL);
+static DEVICE_ATTR(devfreq_min_freq, 0444, show_min_freq, NULL);
+static DEVICE_ATTR(devfreq_polling_interval, 0444, show_polling_interval, NULL);
+static DEVICE_ATTR(devfreq_set_freq, 0644, show_user_frequency,
+ set_user_frequency);
+static struct attribute *dev_entries[] = {
+ &dev_attr_devfreq_governor.attr,
+ &dev_attr_devfreq_cur_freq.attr,
+ &dev_attr_devfreq_max_freq.attr,
+ &dev_attr_devfreq_min_freq.attr,
+ &dev_attr_devfreq_polling_interval.attr,
+ &dev_attr_devfreq_set_freq.attr,
+ NULL,
+};
+static struct attribute_group dev_attr_group = {
+ .name = power_group_name,
+ .attrs = dev_entries,
+};
+
/**
* devfreq_init() - Initialize data structure for devfreq framework and
* start polling registered devfreq devices.
--
1.7.4.1
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [PATCH v5 0/3] DEVFREQ, DVFS Framework for Non-CPU Devices.
2011-08-08 9:03 [PATCH v5 0/3] DEVFREQ, DVFS Framework for Non-CPU Devices MyungJoo Ham
` (2 preceding siblings ...)
2011-08-08 9:03 ` [PATCH v5 3/3] PM / DEVFREQ: add sysfs interface MyungJoo Ham
@ 2011-08-08 9:53 ` MyungJoo Ham
2011-08-08 21:35 ` Greg KH
2011-08-10 21:09 ` Rafael J. Wysocki
4 siblings, 1 reply; 13+ messages in thread
From: MyungJoo Ham @ 2011-08-08 9:53 UTC (permalink / raw)
To: linux-pm; +Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner
Hello all,
I've got one issue requesting some comments: whether to create devfreq
directory or not.
For now, we only have four basic governors and one not-yet-posted
devfreq user for DEVFREQ; thus, it seems not necessary for just now.
However, like CPUFREQ, DEVFREQ may have several users later and we may
want to gather these files to one location: /drivers/devfreq or
somewhere else in /drivers/*. For example, Exynos4210 Memory DEVFREQ
user is located at arch/arm/mach-exynos4/devfreq_bus.c and with a
separated directory, we may have it at
drivers/devfreq/exynos4_memory_bus.c.
How do you think about separating DEVFREQ directory? Should I do so or
wait until we have several DEVFREQ users?
Cheers!
MyungJoo
--
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH v5 0/3] DEVFREQ, DVFS Framework for Non-CPU Devices.
2011-08-08 9:53 ` [PATCH v5 0/3] DEVFREQ, DVFS Framework for Non-CPU Devices MyungJoo Ham
@ 2011-08-08 21:35 ` Greg KH
0 siblings, 0 replies; 13+ messages in thread
From: Greg KH @ 2011-08-08 21:35 UTC (permalink / raw)
To: MyungJoo Ham; +Cc: Len Brown, Kyungmin Park, linux-pm, Thomas Gleixner
On Mon, Aug 08, 2011 at 06:53:31PM +0900, MyungJoo Ham wrote:
> Hello all,
>
>
> I've got one issue requesting some comments: whether to create devfreq
> directory or not.
>
> For now, we only have four basic governors and one not-yet-posted
> devfreq user for DEVFREQ; thus, it seems not necessary for just now.
>
> However, like CPUFREQ, DEVFREQ may have several users later and we may
> want to gather these files to one location: /drivers/devfreq or
> somewhere else in /drivers/*. For example, Exynos4210 Memory DEVFREQ
> user is located at arch/arm/mach-exynos4/devfreq_bus.c and with a
> separated directory, we may have it at
> drivers/devfreq/exynos4_memory_bus.c.
>
> How do you think about separating DEVFREQ directory? Should I do so or
> wait until we have several DEVFREQ users?
A new subdir is fine, it's not an issue, and seems to make sense here,
as you point out.
greg k-h
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v5 0/3] DEVFREQ, DVFS Framework for Non-CPU Devices.
2011-08-08 9:03 [PATCH v5 0/3] DEVFREQ, DVFS Framework for Non-CPU Devices MyungJoo Ham
` (3 preceding siblings ...)
2011-08-08 9:53 ` [PATCH v5 0/3] DEVFREQ, DVFS Framework for Non-CPU Devices MyungJoo Ham
@ 2011-08-10 21:09 ` Rafael J. Wysocki
4 siblings, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2011-08-10 21:09 UTC (permalink / raw)
To: MyungJoo Ham
Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, linux-pm,
Thomas Gleixner
On Monday, August 08, 2011, MyungJoo Ham wrote:
> For a usage example, please look at
> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
>
> In the above git tree, DVFS (dynamic voltage and frequency scaling) mechanism
> is applied to the memory bus of Exynos4210 for Exynos4210-NURI boards.
> In the example, the LPDDR2 DRAM frequency changes between 133, 266, and 400MHz
> and other related clocks simply follow the determined DDR RAM clock.
>
> The DEVFREQ driver for Exynos4210 memory bus is at
> /arch/arm/mach-exynos4/devfreq_bus.c in the git tree.
>
> In the dd (writing and reading 360MiB) test with NURI board, the memory
> throughput was not changed (the performance is not deteriorated) while
> the SoC power consumption has been reduced by 1%. When the memory access
> is not that intense while the CPU is heavily used, the SoC power consumption
> has been reduced by 6%. The power consumption has been compared with the
> case using the conventional Exynos4210 CPUFREQ driver, which sets memory
> bus frequency according to the CPU core frequency. Besides, when the CPU core
> running slow and the memory access is intense, the performance (memory
> throughput) has been increased by 11% (with higher SoC power consumption of
> 5%). The tested governor is "simple-ondemand".
>
> MyungJoo Ham (3):
> PM: Introduce DEVFREQ: generic DVFS framework with device-specific
> OPPs
> PM / DEVFREQ: add basic governors
> PM / DEVFREQ: add sysfs interface
>
> Documentation/ABI/testing/sysfs-devices-power | 45 ++
> drivers/base/power/Makefile | 1 +
> drivers/base/power/devfreq.c | 555 +++++++++++++++++++++++++
> drivers/base/power/opp.c | 9 +
> include/linux/devfreq.h | 111 +++++
> kernel/power/Kconfig | 34 ++
> 6 files changed, 755 insertions(+), 0 deletions(-)
> create mode 100644 drivers/base/power/devfreq.c
> create mode 100644 include/linux/devfreq.h
Are there still any objections to this version of the patchset?
Rafael
^ permalink raw reply [flat|nested] 13+ messages in thread