* [PATCH 1/4] cgroup: change locking order in attach_task_by_pid()
From: Tejun Heo @ 2011-09-04 18:01 UTC (permalink / raw)
To: rjw, paul, lizf
Cc: Paul Menage, fweisbec, containers, linux-kernel, Oleg Nesterov,
Tejun Heo, linux-pm, akpm
In-Reply-To: <1315159280-25032-1-git-send-email-htejun@gmail.com>
From: Tejun Heo <tj@kernel.org>
cgroup_mutex is updated to nest inside threadgroup_fork_lock instead
of the other way around. threadgroup locking is scheduled to be
updated to cover all threadgroup altering operations and nesting it
inside cgroup_mutex complicates locking dependency unnecessarily.
This also simplifies code a bit.
The ugly "if (threadgroup)" conditionals for threadgroup locking will
soon be removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
---
kernel/cgroup.c | 22 ++++++++++------------
1 files changed, 10 insertions(+), 12 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 1d2b6ce..bd1fb5f 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2203,15 +2203,11 @@ static int attach_task_by_pid(struct cgroup *cgrp, u64 pid, bool threadgroup)
const struct cred *cred = current_cred(), *tcred;
int ret;
- if (!cgroup_lock_live_group(cgrp))
- return -ENODEV;
-
if (pid) {
rcu_read_lock();
tsk = find_task_by_vpid(pid);
if (!tsk) {
rcu_read_unlock();
- cgroup_unlock();
return -ESRCH;
}
if (threadgroup) {
@@ -2225,7 +2221,6 @@ static int attach_task_by_pid(struct cgroup *cgrp, u64 pid, bool threadgroup)
} else if (tsk->flags & PF_EXITING) {
/* optimization for the single-task-only case */
rcu_read_unlock();
- cgroup_unlock();
return -ESRCH;
}
@@ -2238,7 +2233,6 @@ static int attach_task_by_pid(struct cgroup *cgrp, u64 pid, bool threadgroup)
cred->euid != tcred->uid &&
cred->euid != tcred->suid) {
rcu_read_unlock();
- cgroup_unlock();
return -EACCES;
}
get_task_struct(tsk);
@@ -2251,15 +2245,19 @@ static int attach_task_by_pid(struct cgroup *cgrp, u64 pid, bool threadgroup)
get_task_struct(tsk);
}
- if (threadgroup) {
+ if (threadgroup)
threadgroup_fork_write_lock(tsk);
- ret = cgroup_attach_proc(cgrp, tsk);
- threadgroup_fork_write_unlock(tsk);
- } else {
- ret = cgroup_attach_task(cgrp, tsk);
+ ret = -ENODEV;
+ if (cgroup_lock_live_group(cgrp)) {
+ if (threadgroup)
+ ret = cgroup_attach_proc(cgrp, tsk);
+ else
+ ret = cgroup_attach_task(cgrp, tsk);
+ cgroup_unlock();
}
+ if (threadgroup)
+ threadgroup_fork_write_unlock(tsk);
put_task_struct(tsk);
- cgroup_unlock();
return ret;
}
--
1.7.6
^ permalink raw reply related
* [PATCHSET cgroup] extend threadgroup locking
From: Tejun Heo @ 2011-09-04 18:01 UTC (permalink / raw)
To: rjw, paul, lizf; +Cc: fweisbec, containers, linux-kernel, linux-pm, akpm
Hello,
cgroup currently only blocks new threads from joining the target
threadgroup during migration, and on-going migration could race
against exec and exit leading to interesting problems - the symmetry
between various attach methods, task exiting during method execution,
->exit() racing against attach methods, migrating task switching basic
properties during exec and so on.
This patchset extends threadgroup locking such that it covers all
operations which can alter the threadgroup - fork, exit and exec, and
update cgroup to take advantage of it. rwsem read ops are added to
exit path but exec is excluded by grabbing the existing
cred_guard_mutex from threadgroup locking helper.
This makes threadgroup locking complete and resolves cgroup issues
stemming from the target taskset being unstable.
This patchset is on top of the current pm-freezer + "freezer: fixes &
simplifications" patchset and contains the following four patches.
Patch list and diffstat follow.
Thanks.
[PATCH 1/4] cgroup: change locking order in attach_task_by_pid()
[PATCH 2/4] threadgroup: rename signal->threadgroup_fork_lock to
[PATCH 3/4] threadgroup: extend threadgroup_lock() to cover exit and
[PATCH 4/4] cgroup: always lock threadgroup during migration
include/linux/init_task.h | 9 ++----
include/linux/sched.h | 58 ++++++++++++++++++++++++++++---------------
kernel/cgroup.c | 62 +++++++++++++++++++++-------------------------
kernel/exit.c | 16 ++++++++---
kernel/fork.c | 8 ++---
5 files changed, 88 insertions(+), 65 deletions(-)
--
tejun
[1] http://thread.gmane.org/gmane.linux.kernel/1187553
^ permalink raw reply
* Re: [PATCH v4 1/2] Input: enable i8042-level wakeup control
From: Daniel Drake @ 2011-09-03 12:45 UTC (permalink / raw)
To: Dmitry Torokhov; +Cc: linux-pm, dilinger, linux-input
In-Reply-To: <20110817070351.GE29361@core.coreip.homeip.net>
Hi,
Sorry for the delay in replying to this - I took sime time off.
On Wed, Aug 17, 2011 at 8:03 AM, Dmitry Torokhov
<dmitry.torokhov@gmail.com> wrote:
> OK, so we have the following scenario:
>
> - we do not need to worry about SND and LED on OLPC;
>
> - we do not need to worry about releasing the key that was pressed
> when we went to sleep because they system would not go to sleep while
> there is a key pressed. So the code in input_dev_resume() won't
> actually release any keys unless a key was held pressed and we forced
> system to sleep somehow.
>
> - we need to make sure we do not loose key press (and following events)
> when we are waking up.
Yes, that seems accurate, with one more addition:
- we need to make sure we do not loose mouse button press or mouse
movement (and following events) when we are waking up.
> Can atkbd driver stash away serio byte data (when not in command mode)
> and replay it when pm notifier signals us that resume process is done?
I'm having trouble understanding this suggestion or relating it to the
problems in question and the previous patches posted. Are you
suggesting stashing of data that is sent from the host to the
keyboard, or from the keyboard to the host? How does this solve our
issues? What about the mouse?
How is it going to avoid the input-level issues?
Thanks,
Daniel
^ permalink raw reply
* Re: [PATCH 3/5] PM / QoS: Add function dev_pm_qos_read_value()
From: Rafael J. Wysocki @ 2011-09-03 8:02 UTC (permalink / raw)
To: Jean Pihet; +Cc: Linux PM mailing list, LKML, Linux-sh list
In-Reply-To: <201109030155.28031.rjw@sisk.pl>
Hi,
On Saturday, September 03, 2011, Rafael J. Wysocki wrote:
> On Friday, September 02, 2011, Jean Pihet wrote:
> > On Fri, Sep 2, 2011 at 12:07 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
...
> > >> >
> > >> > - mutex_lock(&dev_pm_qos_mtx);
> > >> > req->dev = dev;
> > >> >
> > >> > - /* Return if the device has been removed */
> > >> > - if (req->dev->power.constraints_state == DEV_PM_QOS_NO_DEVICE) {
> > >> > - ret = -ENODEV;
> > >> > - goto out;
> > >> > - }
> > >> > + device_pm_lock();
> > >> > + mutex_lock(&dev_pm_qos_mtx);
> > >> >
> > >> > - /*
> > >> > - * Allocate the constraints data on the first call to add_request,
> > >> > - * i.e. only if the data is not already allocated and if the device has
> > >> > - * not been removed
> > >> > - */
> > >> > - if (dev->power.constraints_state == DEV_PM_QOS_DEVICE_PRESENT)
> > >> > - ret = dev_pm_qos_constraints_allocate(dev);
> > >> > + if (dev->power.constraints) {
> > >> > + device_pm_unlock();
> > >> > + } else {
> > >> > + if (list_empty(&dev->power.entry)) {
> > >> > + /* The device has been removed from the system. */
> > >> > + device_pm_unlock();
> > >> > + goto out;
> > >> 0 is silently returned in case the device has been removed. Is that
> > >> the intention?
> > >
> > > Pretty much it is. Is that a problem?
> > I think the caller needs to know if the constraint has been applied
> > correctly or not.
>
> However, the removal of the device is a special case. What would the caller
> be supposed to do when it learned that the device had been removed while it had
> been trying to add a QoS constraing for it? Not much I guess.
Anyway, since returning an error code in that case won't hurt either,
below is a v2 of the patch doing that and resetting the dev field in the
req structure.
Thanks,
Rafael
---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM / QoS: Add function dev_pm_qos_read_value() (v2)
To read the current PM QoS value for a given device we need to
make sure that the device's power.constraints object won't be
removed while we're doing that. For this reason, put the
operation under dev->power.lock and acquire the lock
around the initialization and removal of power.constraints.
Moreover, since we're using the value of power.constraints to
determine whether or not the object is present, the
power.constraints_state field isn't necessary any more and may be
removed. However, dev_pm_qos_add_request() needs to check if the
device is being removed from the system before allocating a new
PM QoS constraints object for it, so it has to use device_pm_lock()
and the device PM QoS initialization and destruction should be done
under device_pm_lock() as well.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
drivers/base/power/main.c | 4 -
drivers/base/power/qos.c | 170 ++++++++++++++++++++++++++--------------------
include/linux/pm.h | 8 --
include/linux/pm_qos.h | 3
4 files changed, 104 insertions(+), 81 deletions(-)
Index: linux/drivers/base/power/qos.c
===================================================================
--- linux.orig/drivers/base/power/qos.c
+++ linux/drivers/base/power/qos.c
@@ -30,15 +30,6 @@
* . To minimize the data usage by the per-device constraints, the data struct
* is only allocated at the first call to dev_pm_qos_add_request.
* . The data is later free'd when the device is removed from the system.
- * . The constraints_state variable from dev_pm_info tracks the data struct
- * allocation state:
- * DEV_PM_QOS_NO_DEVICE: No device present or device removed, no data
- * allocated,
- * DEV_PM_QOS_DEVICE_PRESENT: Device present, data not allocated and will be
- * allocated at the first call to dev_pm_qos_add_request,
- * DEV_PM_QOS_ALLOCATED: Device present, data allocated. The per-device
- * PM QoS constraints framework is operational and constraints can be
- * added, updated or removed using the dev_pm_qos_* API.
* . A global mutex protects the constraints users from the data being
* allocated and free'd.
*/
@@ -51,8 +42,30 @@
static DEFINE_MUTEX(dev_pm_qos_mtx);
+
static BLOCKING_NOTIFIER_HEAD(dev_pm_notifiers);
+/**
+ * dev_pm_qos_read_value - Get PM QoS constraint for a given device.
+ * @dev: Device to get the PM QoS constraint value for.
+ */
+s32 dev_pm_qos_read_value(struct device *dev)
+{
+ struct pm_qos_constraints *c;
+ unsigned long flags;
+ s32 ret = 0;
+
+ spin_lock_irqsave(&dev->power.lock, flags);
+
+ c = dev->power.constraints;
+ if (c)
+ ret = pm_qos_read_value(c);
+
+ spin_unlock_irqrestore(&dev->power.lock, flags);
+
+ return ret;
+}
+
/*
* apply_constraint
* @req: constraint request to apply
@@ -105,27 +118,37 @@ static int dev_pm_qos_constraints_alloca
}
BLOCKING_INIT_NOTIFIER_HEAD(n);
+ plist_head_init(&c->list);
+ c->target_value = PM_QOS_DEV_LAT_DEFAULT_VALUE;
+ c->default_value = PM_QOS_DEV_LAT_DEFAULT_VALUE;
+ c->type = PM_QOS_MIN;
+ c->notifiers = n;
+
+ spin_lock_irq(&dev->power.lock);
dev->power.constraints = c;
- plist_head_init(&dev->power.constraints->list);
- dev->power.constraints->target_value = PM_QOS_DEV_LAT_DEFAULT_VALUE;
- dev->power.constraints->default_value = PM_QOS_DEV_LAT_DEFAULT_VALUE;
- dev->power.constraints->type = PM_QOS_MIN;
- dev->power.constraints->notifiers = n;
- dev->power.constraints_state = DEV_PM_QOS_ALLOCATED;
+ spin_unlock_irq(&dev->power.lock);
return 0;
}
+static void __dev_pm_qos_constraints_init(struct device *dev)
+{
+ spin_lock_irq(&dev->power.lock);
+ dev->power.constraints = NULL;
+ spin_unlock_irq(&dev->power.lock);
+}
+
/**
- * dev_pm_qos_constraints_init
+ * dev_pm_qos_constraints_init - Initalize device's PM QoS constraints pointer.
* @dev: target device
*
- * Called from the device PM subsystem at device insertion
+ * Called from the device PM subsystem at device insertion under
+ * device_pm_lock().
*/
void dev_pm_qos_constraints_init(struct device *dev)
{
mutex_lock(&dev_pm_qos_mtx);
- dev->power.constraints_state = DEV_PM_QOS_DEVICE_PRESENT;
+ dev->power.constraints = NULL;
mutex_unlock(&dev_pm_qos_mtx);
}
@@ -133,34 +156,35 @@ void dev_pm_qos_constraints_init(struct
* dev_pm_qos_constraints_destroy
* @dev: target device
*
- * Called from the device PM subsystem at device removal
+ * Called from the device PM subsystem at device removal under device_pm_lock().
*/
void dev_pm_qos_constraints_destroy(struct device *dev)
{
struct dev_pm_qos_request *req, *tmp;
+ struct pm_qos_constraints *c;
mutex_lock(&dev_pm_qos_mtx);
- if (dev->power.constraints_state == DEV_PM_QOS_ALLOCATED) {
- /* Flush the constraints list for the device */
- plist_for_each_entry_safe(req, tmp,
- &dev->power.constraints->list,
- node) {
- /*
- * Update constraints list and call the notification
- * callbacks if needed
- */
- apply_constraint(req, PM_QOS_REMOVE_REQ,
- PM_QOS_DEFAULT_VALUE);
- memset(req, 0, sizeof(*req));
- }
+ c = dev->power.constraints;
+ if (!c)
+ goto out;
- kfree(dev->power.constraints->notifiers);
- kfree(dev->power.constraints);
- dev->power.constraints = NULL;
+ /* Flush the constraints list for the device */
+ plist_for_each_entry_safe(req, tmp, &c->list, node) {
+ /*
+ * Update constraints list and call the notification
+ * callbacks if needed
+ */
+ apply_constraint(req, PM_QOS_REMOVE_REQ, PM_QOS_DEFAULT_VALUE);
+ memset(req, 0, sizeof(*req));
}
- dev->power.constraints_state = DEV_PM_QOS_NO_DEVICE;
+ __dev_pm_qos_constraints_init(dev);
+
+ kfree(c->notifiers);
+ kfree(c);
+
+ out:
mutex_unlock(&dev_pm_qos_mtx);
}
@@ -178,8 +202,9 @@ void dev_pm_qos_constraints_destroy(stru
*
* Returns 1 if the aggregated constraint value has changed,
* 0 if the aggregated constraint value has not changed,
- * -EINVAL in case of wrong parameters, -ENODEV if the device has been
- * removed from the system
+ * -EINVAL in case of wrong parameters, -ENOMEM if there's not enough memory
+ * to allocate for data structures, -ENODEV if the device has just been removed
+ * from the system.
*/
int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
s32 value)
@@ -195,28 +220,37 @@ int dev_pm_qos_add_request(struct device
return -EINVAL;
}
- mutex_lock(&dev_pm_qos_mtx);
req->dev = dev;
- /* Return if the device has been removed */
- if (req->dev->power.constraints_state == DEV_PM_QOS_NO_DEVICE) {
- ret = -ENODEV;
- goto out;
- }
+ device_pm_lock();
+ mutex_lock(&dev_pm_qos_mtx);
- /*
- * Allocate the constraints data on the first call to add_request,
- * i.e. only if the data is not already allocated and if the device has
- * not been removed
- */
- if (dev->power.constraints_state == DEV_PM_QOS_DEVICE_PRESENT)
- ret = dev_pm_qos_constraints_allocate(dev);
+ if (dev->power.constraints) {
+ device_pm_unlock();
+ } else {
+ if (list_empty(&dev->power.entry)) {
+ /* The device has been removed from the system. */
+ device_pm_unlock();
+ req->dev = NULL;
+ ret = -ENODEV;
+ goto out;
+ } else {
+ device_pm_unlock();
+ /*
+ * Allocate the constraints data on the first call to
+ * add_request, i.e. only if the data is not already
+ * allocated and if the device has not been removed.
+ */
+ ret = dev_pm_qos_constraints_allocate(dev);
+ }
+ }
if (!ret)
ret = apply_constraint(req, PM_QOS_ADD_REQ, value);
-out:
+ out:
mutex_unlock(&dev_pm_qos_mtx);
+
return ret;
}
EXPORT_SYMBOL_GPL(dev_pm_qos_add_request);
@@ -252,13 +286,13 @@ int dev_pm_qos_update_request(struct dev
mutex_lock(&dev_pm_qos_mtx);
- if (req->dev->power.constraints_state == DEV_PM_QOS_ALLOCATED) {
+ if (req->dev->power.constraints) {
if (new_value != req->node.prio)
ret = apply_constraint(req, PM_QOS_UPDATE_REQ,
new_value);
} else {
/* Return if the device has been removed */
- ret = -ENODEV;
+ ret = -EINVAL;
}
mutex_unlock(&dev_pm_qos_mtx);
@@ -293,7 +327,7 @@ int dev_pm_qos_remove_request(struct dev
mutex_lock(&dev_pm_qos_mtx);
- if (req->dev->power.constraints_state == DEV_PM_QOS_ALLOCATED) {
+ if (req->dev->power.constraints) {
ret = apply_constraint(req, PM_QOS_REMOVE_REQ,
PM_QOS_DEFAULT_VALUE);
memset(req, 0, sizeof(*req));
@@ -323,15 +357,12 @@ int dev_pm_qos_add_notifier(struct devic
mutex_lock(&dev_pm_qos_mtx);
- /* Silently return if the device has been removed */
- if (dev->power.constraints_state != DEV_PM_QOS_ALLOCATED)
- goto out;
-
- retval = blocking_notifier_chain_register(
- dev->power.constraints->notifiers,
- notifier);
+ /* Silently return if the constraints object is not present. */
+ if (dev->power.constraints)
+ retval = blocking_notifier_chain_register(
+ dev->power.constraints->notifiers,
+ notifier);
-out:
mutex_unlock(&dev_pm_qos_mtx);
return retval;
}
@@ -354,15 +385,12 @@ int dev_pm_qos_remove_notifier(struct de
mutex_lock(&dev_pm_qos_mtx);
- /* Silently return if the device has been removed */
- if (dev->power.constraints_state != DEV_PM_QOS_ALLOCATED)
- goto out;
-
- retval = blocking_notifier_chain_unregister(
- dev->power.constraints->notifiers,
- notifier);
+ /* Silently return if the constraints object is not present. */
+ if (dev->power.constraints)
+ retval = blocking_notifier_chain_unregister(
+ dev->power.constraints->notifiers,
+ notifier);
-out:
mutex_unlock(&dev_pm_qos_mtx);
return retval;
}
Index: linux/include/linux/pm_qos.h
===================================================================
--- linux.orig/include/linux/pm_qos.h
+++ linux/include/linux/pm_qos.h
@@ -77,6 +77,7 @@ int pm_qos_remove_notifier(int pm_qos_cl
int pm_qos_request_active(struct pm_qos_request *req);
s32 pm_qos_read_value(struct pm_qos_constraints *c);
+s32 dev_pm_qos_read_value(struct device *dev);
int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
s32 value);
int dev_pm_qos_update_request(struct dev_pm_qos_request *req, s32 new_value);
@@ -117,6 +118,8 @@ static inline int pm_qos_request_active(
static inline s32 pm_qos_read_value(struct pm_qos_constraints *c)
{ return 0; }
+static inline s32 dev_pm_qos_read_value(struct device *dev)
+ { return 0; }
static inline int dev_pm_qos_add_request(struct device *dev,
struct dev_pm_qos_request *req,
s32 value)
Index: linux/drivers/base/power/main.c
===================================================================
--- linux.orig/drivers/base/power/main.c
+++ linux/drivers/base/power/main.c
@@ -98,8 +98,8 @@ void device_pm_add(struct device *dev)
dev_warn(dev, "parent %s should not be sleeping\n",
dev_name(dev->parent));
list_add_tail(&dev->power.entry, &dpm_list);
- mutex_unlock(&dpm_list_mtx);
dev_pm_qos_constraints_init(dev);
+ mutex_unlock(&dpm_list_mtx);
}
/**
@@ -110,9 +110,9 @@ void device_pm_remove(struct device *dev
{
pr_debug("PM: Removing info for %s:%s\n",
dev->bus ? dev->bus->name : "No Bus", dev_name(dev));
- dev_pm_qos_constraints_destroy(dev);
complete_all(&dev->power.completion);
mutex_lock(&dpm_list_mtx);
+ dev_pm_qos_constraints_destroy(dev);
list_del_init(&dev->power.entry);
mutex_unlock(&dpm_list_mtx);
device_wakeup_disable(dev);
Index: linux/include/linux/pm.h
===================================================================
--- linux.orig/include/linux/pm.h
+++ linux/include/linux/pm.h
@@ -421,13 +421,6 @@ enum rpm_request {
RPM_REQ_RESUME,
};
-/* Per-device PM QoS constraints data struct state */
-enum dev_pm_qos_state {
- DEV_PM_QOS_NO_DEVICE, /* No device present */
- DEV_PM_QOS_DEVICE_PRESENT, /* Device present, data not allocated */
- DEV_PM_QOS_ALLOCATED, /* Device present, data allocated */
-};
-
struct wakeup_source;
struct pm_domain_data {
@@ -489,7 +482,6 @@ struct dev_pm_info {
#endif
struct pm_subsys_data *subsys_data; /* Owned by the subsystem. */
struct pm_qos_constraints *constraints;
- enum dev_pm_qos_state constraints_state;
};
extern void update_pm_runtime_accounting(struct device *dev);
^ permalink raw reply
* Re: [PATCH 3/5] PM / QoS: Add function dev_pm_qos_read_value()
From: Rafael J. Wysocki @ 2011-09-02 23:55 UTC (permalink / raw)
To: Jean Pihet; +Cc: Linux PM mailing list, LKML, Linux-sh list
In-Reply-To: <CAORVsuVAzc8Q2tG0-WW+96Ad9GhR5OCq2bAuweggs1rimFMcqg@mail.gmail.com>
On Friday, September 02, 2011, Jean Pihet wrote:
> On Fri, Sep 2, 2011 at 12:07 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > Hi,
> >
> > On Thursday, September 01, 2011, Jean Pihet wrote:
> >> Hi Rafael,
> >>
> >> On Wed, Aug 31, 2011 at 12:21 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >> > From: Rafael J. Wysocki <rjw@sisk.pl>
> >> >
> >> > To read the current PM QoS value for a given device we need to
> >> > make sure that the device's power.constraints object won't be
> >> > removed while we're doing that. For this reason, put the
> >> > operation under dev->power.lock and acquire the lock
> >> > around the initialization and removal of power.constraints.
> >> Ok.
> >>
> >> > Moreover, since we're using the value of power.constraints to
> >> > determine whether or not the object is present, the
> >> > power.constraints_state field isn't necessary any more and may be
> >> > removed. However, dev_pm_qos_add_request() needs to check if the
> >> > device is being removed from the system before allocating a new
> >> > PM QoS constraints object for it, so it has to use device_pm_lock()
> >> > and the device PM QoS initialization and destruction should be done
> >> > under device_pm_lock() as well.
> >> Ok that makes sense.
> >> The constraints_state field can be replaced by a combination of
> >> dev->power.constraints and list_empty(&dev->power.entry), which makes
> >> the code more compact and less redundant.
> >>
> >> >
> >> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> >> > ---
> >> > drivers/base/power/main.c | 4 -
> >> > drivers/base/power/qos.c | 167 ++++++++++++++++++++++++++--------------------
> >> > include/linux/pm.h | 8 --
> >> > include/linux/pm_qos.h | 3
> >> > 4 files changed, 101 insertions(+), 81 deletions(-)
> >> >
> >> > Index: linux/drivers/base/power/qos.c
> >> > ===================================================================
> >> > --- linux.orig/drivers/base/power/qos.c
> >> > +++ linux/drivers/base/power/qos.c
> >> > @@ -30,15 +30,6 @@
> >> ...
> >>
> >> >
> >> > @@ -178,8 +202,8 @@ void dev_pm_qos_constraints_destroy(stru
> >> > *
> >> > * Returns 1 if the aggregated constraint value has changed,
> >> > * 0 if the aggregated constraint value has not changed,
> >> > - * -EINVAL in case of wrong parameters, -ENODEV if the device has been
> >> > - * removed from the system
> >> > + * -EINVAL in case of wrong parameters, -ENOMEM if there's not enough memory
> >> > + * to allocate for data structures.
> >> Why not use -ENODEV in case there is no device?
> >
> > I don't think it's useful for the caller. If the device is gone, the
> > constraing simply doesn't matter, so there's no error to handle.
> >
> >> > */
> >> > int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
> >> > s32 value)
> >> > @@ -195,28 +219,35 @@ int dev_pm_qos_add_request(struct device
> >> > return -EINVAL;
> >> > }
> >> >
> >> > - mutex_lock(&dev_pm_qos_mtx);
> >> > req->dev = dev;
> >> >
> >> > - /* Return if the device has been removed */
> >> > - if (req->dev->power.constraints_state == DEV_PM_QOS_NO_DEVICE) {
> >> > - ret = -ENODEV;
> >> > - goto out;
> >> > - }
> >> > + device_pm_lock();
> >> > + mutex_lock(&dev_pm_qos_mtx);
> >> >
> >> > - /*
> >> > - * Allocate the constraints data on the first call to add_request,
> >> > - * i.e. only if the data is not already allocated and if the device has
> >> > - * not been removed
> >> > - */
> >> > - if (dev->power.constraints_state == DEV_PM_QOS_DEVICE_PRESENT)
> >> > - ret = dev_pm_qos_constraints_allocate(dev);
> >> > + if (dev->power.constraints) {
> >> > + device_pm_unlock();
> >> > + } else {
> >> > + if (list_empty(&dev->power.entry)) {
> >> > + /* The device has been removed from the system. */
> >> > + device_pm_unlock();
> >> > + goto out;
> >> 0 is silently returned in case the device has been removed. Is that
> >> the intention?
> >
> > Pretty much it is. Is that a problem?
> I think the caller needs to know if the constraint has been applied
> correctly or not.
However, the removal of the device is a special case. What would the caller
be supposed to do when it learned that the device had been removed while it had
been trying to add a QoS constraing for it? Not much I guess.
Thanks,
Rafael
^ permalink raw reply
* Re: [PATCH pm-freezer 1/4] cgroup_freezer: fix freezer->state setting bug in freezer_change_state()
From: Matt Helsley @ 2011-09-02 18:31 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Paul Menage, linux-kernel, Tejun Heo, linux-pm, containers
In-Reply-To: <20110902165839.GA7478@redhat.com>
On Fri, Sep 02, 2011 at 06:58:39PM +0200, Oleg Nesterov wrote:
> On 09/01, Matt Helsley wrote:
> >
> > On Wed, Aug 31, 2011 at 12:21:07PM +0200, Tejun Heo wrote:
> > > case CGROUP_FROZEN:
> > > - atomic_inc(&system_freezing_cnt);
> > > - retval = try_to_freeze_cgroup(cgroup, freezer);
> > > + if (freezer->state == CGROUP_THAWED) {
> > > + freezer->state = CGROUP_FREEZING;
> > > + atomic_inc(&system_freezing_cnt);
> > > + retval = try_to_freeze_cgroup(cgroup, freezer);
> >
> > This still doesn't look quite right. If the cgroup is FREEZING it should
> > also call try_to_freeze_cgroup(). I think this is what's needed:
> >
> > if (freezer->state == CGROUP_THAWED)
> > atomic_inc(&system_freezing_cnt);
> > freezer->state = CGROUP_FREEZING;
> > retval = try_to_freeze_cgroup(cgroup, freezer);
>
> This is what I mentioned before, to me this looks like a win.
>
> Why do we need try_to_freeze_cgroup() in this case? "for safety"
> could actually mean "hide the bug" ;)
Well, I need to check Tejun's latest freezer bits to see if this is
still the case but it was possible to get to the FREEZING state and
not enter FROZEN before returning to userspace. So you could come back
into the state change function in the FREEZING state with FROZEN as
the goal state. Note that for the cgroup freezer the FREEZING state is
optional -- so skipping it is fine so long was we guarantee that by the
time we exit to userspace with a FROZEN state all the tasks in the cgroup
are actually frozen (in the refrigerator loop) or frozen enough (about to
enter the refrigerator loop without causing more IO -- e.g. stopped).
Cheers,
-Matt
^ permalink raw reply
* [PATCH 6/6] freezer: kill unused set_freezable_with_signal()
From: Tejun Heo @ 2011-09-02 18:27 UTC (permalink / raw)
To: oleg, matthltc, rjw, paul; +Cc: Tejun Heo, containers, linux-pm, linux-kernel
In-Reply-To: <1314988070-12244-1-git-send-email-tj@kernel.org>
There's no in-kernel user of set_freezable_with_signal() and there's
no plan to add one. Mixing TIF_SIGPENDING with kernel threads can
lead to nasty corner cases as kernel threads never travel signal
delivery path on their own.
e.g. the current implementation is buggy in the cancelation path of
__thaw_task(). It calls recalc_sigpending_and_wake() in an attempt to
clear TIF_SIGPENDING but the function never clears it regardless of
sigpending state. This means that signallable freezable kthreads may
continue executing with !freezing() && stuck TIF_SIGPENDING, which can
be troublesome.
This patch removes set_freezable_with_signal() along with
PF_FREEZER_NOSIG and recalc_sigpending*() calls in freezer. User
tasks get TIF_SIGPENDING, kernel tasks get woken up and the spurious
sigpending is dealt with in the usual signal delivery path.
Signed-off-by: Tejun Heo <tj@kernel.org>
---
include/linux/freezer.h | 20 +-------------------
include/linux/sched.h | 1 -
kernel/freezer.c | 27 ++++++---------------------
kernel/kthread.c | 2 +-
4 files changed, 8 insertions(+), 42 deletions(-)
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 9193bae..7ffbf04 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -48,7 +48,7 @@ static inline bool try_to_freeze(void)
}
extern bool freeze_task(struct task_struct *p);
-extern bool __set_freezable(bool with_signal);
+extern bool set_freezable(void);
#ifdef CONFIG_CGROUP_FREEZER
extern bool cgroup_freezing(struct task_struct *task);
@@ -104,23 +104,6 @@ static inline int freezer_should_skip(struct task_struct *p)
}
/*
- * Tell the freezer that the current task should be frozen by it
- */
-static inline bool set_freezable(void)
-{
- return __set_freezable(false);
-}
-
-/*
- * Tell the freezer that the current task should be frozen by it and that it
- * should send a fake signal to the task to freeze it.
- */
-static inline bool set_freezable_with_signal(void)
-{
- return __set_freezable(true);
-}
-
-/*
* Freezer-friendly wrappers around wait_event_interruptible() and
* wait_event_interruptible_timeout(), originally defined in <linux/wait.h>
*/
@@ -164,7 +147,6 @@ static inline void freezer_do_not_count(void) {}
static inline void freezer_count(void) {}
static inline int freezer_should_skip(struct task_struct *p) { return 0; }
static inline void set_freezable(void) {}
-static inline void set_freezable_with_signal(void) {}
#define wait_event_freezable(wq, condition) \
wait_event_interruptible(wq, condition)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1bb3356..6d45ce7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1784,7 +1784,6 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define PF_MEMPOLICY 0x10000000 /* Non-default NUMA mempolicy */
#define PF_MUTEX_TESTER 0x20000000 /* Thread belongs to the rt mutex tester */
#define PF_FREEZER_SKIP 0x40000000 /* Freezer should not count it as freezable */
-#define PF_FREEZER_NOSIG 0x80000000 /* Freezer won't send signals to it */
/*
* Only the _current_ task can read/write to tsk->flags, but other
diff --git a/kernel/freezer.c b/kernel/freezer.c
index da76b64..770e6f5 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -39,7 +39,7 @@ bool freezing_slow_path(struct task_struct *p)
if (pm_nosig_freezing || cgroup_freezing(p))
return true;
- if (pm_freezing && !(p->flags & PF_FREEZER_NOSIG))
+ if (pm_freezing && !(p->flags & PF_KTHREAD))
return true;
return false;
@@ -72,10 +72,6 @@ bool __refrigerator(bool check_kthr_stop)
schedule();
}
- spin_lock_irq(¤t->sighand->siglock);
- recalc_sigpending(); /* We sent fake signal, clean it up */
- spin_unlock_irq(¤t->sighand->siglock);
-
pr_debug("%s left refrigerator\n", current->comm);
/*
@@ -120,7 +116,7 @@ bool freeze_task(struct task_struct *p)
return false;
}
- if (!(p->flags & PF_FREEZER_NOSIG)) {
+ if (!(p->flags & PF_KTHREAD)) {
fake_signal_wake_up(p);
/*
* fake_signal_wake_up() goes through p's scheduler
@@ -145,28 +141,19 @@ void __thaw_task(struct task_struct *p)
* be visible to @p as waking up implies wmb. Waking up inside
* freezer_lock also prevents wakeups from leaking outside
* refrigerator.
- *
- * If !FROZEN, @p hasn't reached refrigerator, recalc sigpending to
- * avoid leaving dangling TIF_SIGPENDING behind.
*/
spin_lock_irqsave(&freezer_lock, flags);
- if (frozen(p)) {
+ if (frozen(p))
wake_up_process(p);
- } else {
- spin_lock(&p->sighand->siglock);
- recalc_sigpending_and_wake(p);
- spin_unlock(&p->sighand->siglock);
- }
spin_unlock_irqrestore(&freezer_lock, flags);
}
/**
- * __set_freezable - make %current freezable
- * @with_signal: do we want %TIF_SIGPENDING for notification too?
+ * set_freezable - make %current freezable
*
* Mark %current freezable and enter refrigerator if necessary.
*/
-bool __set_freezable(bool with_signal)
+bool set_freezable(void)
{
might_sleep();
@@ -177,10 +164,8 @@ bool __set_freezable(bool with_signal)
*/
spin_lock_irq(&freezer_lock);
current->flags &= ~PF_NOFREEZE;
- if (with_signal)
- current->flags &= ~PF_FREEZER_NOSIG;
spin_unlock_irq(&freezer_lock);
return try_to_freeze();
}
-EXPORT_SYMBOL(__set_freezable);
+EXPORT_SYMBOL(set_freezable);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index a6cbeea..7a4c862 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -282,7 +282,7 @@ int kthreadd(void *unused)
set_cpus_allowed_ptr(tsk, cpu_all_mask);
set_mems_allowed(node_states[N_HIGH_MEMORY]);
- current->flags |= PF_NOFREEZE | PF_FREEZER_NOSIG;
+ current->flags |= PF_NOFREEZE;
for (;;) {
set_current_state(TASK_INTERRUPTIBLE);
--
1.7.6
^ permalink raw reply related
* [PATCH 5/6] freezer: remove unused @sig_only from freeze_task()
From: Tejun Heo @ 2011-09-02 18:27 UTC (permalink / raw)
To: oleg, matthltc, rjw, paul; +Cc: Tejun Heo, containers, linux-pm, linux-kernel
In-Reply-To: <1314988070-12244-1-git-send-email-tj@kernel.org>
After 25a16d02ba "freezer: make freezing() test freeze conditions in
effect instead of TIF_FREEZE", freezing() returns authoritative answer
on whether the current task should freeze or not and freeze_task()
doesn't need or use @sig_only. Remove it.
While at it, rewrite function comment for freeze_task() and rename
@sig_only to @user_only in try_to_freeze_tasks().
This patch doesn't cause any functional change.
Signed-off-by: Tejun Heo <tj@kernel.org>
---
include/linux/freezer.h | 2 +-
kernel/cgroup_freezer.c | 4 ++--
kernel/freezer.c | 21 +++++++++------------
kernel/power/process.c | 8 ++++----
4 files changed, 16 insertions(+), 19 deletions(-)
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 122b5ce..9193bae 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -47,7 +47,7 @@ static inline bool try_to_freeze(void)
return __refrigerator(false);
}
-extern bool freeze_task(struct task_struct *p, bool sig_only);
+extern bool freeze_task(struct task_struct *p);
extern bool __set_freezable(bool with_signal);
#ifdef CONFIG_CGROUP_FREEZER
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 56a457a..8753e7b 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -206,7 +206,7 @@ static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
/* Locking avoids race with FREEZING -> THAWED transitions. */
if (freezer->state == CGROUP_FREEZING)
- freeze_task(task, true);
+ freeze_task(task);
spin_unlock_irq(&freezer->lock);
}
@@ -274,7 +274,7 @@ static int try_to_freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
cgroup_iter_start(cgroup, &it);
while ((task = cgroup_iter_next(cgroup, &it))) {
- if (!freeze_task(task, true))
+ if (!freeze_task(task))
continue;
if (frozen(task))
continue;
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 9846133..da76b64 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -100,20 +100,17 @@ static void fake_signal_wake_up(struct task_struct *p)
}
/**
- * freeze_task - send a freeze request to given task
- * @p: task to send the request to
- * @sig_only: if set, the request will only be sent if the task has the
- * PF_FREEZER_NOSIG flag unset
- * Return value: 'false', if @sig_only is set and the task has
- * PF_FREEZER_NOSIG set or the task is frozen, 'true', otherwise
+ * freeze_task - send a freeze request to given task
+ * @p: task to send the request to
*
- * The freeze request is sent by setting the tasks's TIF_FREEZE flag and
- * either sending a fake signal to it or waking it up, depending on whether
- * or not it has PF_FREEZER_NOSIG set. If @sig_only is set and the task
- * has PF_FREEZER_NOSIG set (ie. it is a typical kernel thread), its
- * TIF_FREEZE flag will not be set.
+ * If @p is freezing, the freeze request is sent by setting %TIF_FREEZE
+ * flag and either sending a fake signal to it or waking it up, depending
+ * on whether it has %PF_FREEZER_NOSIG set.
+ *
+ * RETURNS:
+ * %false, if @p is not freezing or already frozen; %true, otherwise
*/
-bool freeze_task(struct task_struct *p, bool sig_only)
+bool freeze_task(struct task_struct *p)
{
unsigned long flags;
diff --git a/kernel/power/process.c b/kernel/power/process.c
index fe06ddf..ab4fd7b 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -22,7 +22,7 @@
*/
#define TIMEOUT (20 * HZ)
-static int try_to_freeze_tasks(bool sig_only)
+static int try_to_freeze_tasks(bool user_only)
{
struct task_struct *g, *p;
unsigned long end_time;
@@ -37,14 +37,14 @@ static int try_to_freeze_tasks(bool sig_only)
end_time = jiffies + TIMEOUT;
- if (!sig_only)
+ if (!user_only)
freeze_workqueues_begin();
while (true) {
todo = 0;
read_lock(&tasklist_lock);
do_each_thread(g, p) {
- if (p == current || !freeze_task(p, sig_only))
+ if (p == current || !freeze_task(p))
continue;
/*
@@ -65,7 +65,7 @@ static int try_to_freeze_tasks(bool sig_only)
} while_each_thread(g, p);
read_unlock(&tasklist_lock);
- if (!sig_only) {
+ if (!user_only) {
wq_busy = freeze_workqueues_busy();
todo += wq_busy;
}
--
1.7.6
^ permalink raw reply related
* [PATCH 4/6] freezer: use lock_task_sighand() in fake_signal_wake_up()
From: Tejun Heo @ 2011-09-02 18:27 UTC (permalink / raw)
To: oleg, matthltc, rjw, paul; +Cc: Tejun Heo, containers, linux-pm, linux-kernel
In-Reply-To: <1314988070-12244-1-git-send-email-tj@kernel.org>
cgroup_freezer calls freeze_task() without holding tasklist_lock and,
if the task is exiting, its ->sighand may be gone by the time
fake_signal_wake_up() is called. Use lock_task_sighand() instead of
accessing ->sighand directly.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Paul Menage <paul@paulmenage.org>
---
kernel/freezer.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/kernel/freezer.c b/kernel/freezer.c
index ebee1ab..9846133 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -93,9 +93,10 @@ static void fake_signal_wake_up(struct task_struct *p)
{
unsigned long flags;
- spin_lock_irqsave(&p->sighand->siglock, flags);
- signal_wake_up(p, 0);
- spin_unlock_irqrestore(&p->sighand->siglock, flags);
+ if (lock_task_sighand(p, &flags)) {
+ signal_wake_up(p, 0);
+ unlock_task_sighand(p, &flags);
+ }
}
/**
--
1.7.6
^ permalink raw reply related
* [PATCH 3/6] freezer: restructure __refrigerator()
From: Tejun Heo @ 2011-09-02 18:27 UTC (permalink / raw)
To: oleg, matthltc, rjw, paul; +Cc: Tejun Heo, containers, linux-pm, linux-kernel
In-Reply-To: <1314988070-12244-1-git-send-email-tj@kernel.org>
If another freeze happens before all tasks leave FROZEN state after
being thawed, the freezer can see the existing FROZEN and consider the
tasks to be frozen but they can clear FROZEN without checking the new
freezing().
Oleg suggested restructuring __refrigerator() such that there's single
condition check section inside freezer_lock and sigpending is cleared
afterwards, which fixes the problem and simplifies the code.
Restructure accordingly.
-v2: Frozen loop exited without releasing freezer_lock. Fixed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
---
kernel/freezer.c | 29 +++++++++++------------------
1 files changed, 11 insertions(+), 18 deletions(-)
diff --git a/kernel/freezer.c b/kernel/freezer.c
index 466ea6b..ebee1ab 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -52,36 +52,29 @@ bool __refrigerator(bool check_kthr_stop)
/* Hmm, should we be allowed to suspend when there are realtime
processes around? */
bool was_frozen = false;
- long save;
+ long save = current->state;
- /*
- * No point in checking freezing() again - the caller already did.
- * Proceed to enter FROZEN.
- */
- spin_lock_irq(&freezer_lock);
- current->flags |= PF_FROZEN;
- spin_unlock_irq(&freezer_lock);
-
- save = current->state;
pr_debug("%s entered refrigerator\n", current->comm);
- spin_lock_irq(¤t->sighand->siglock);
- recalc_sigpending(); /* We sent fake signal, clean it up */
- spin_unlock_irq(¤t->sighand->siglock);
-
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
+
+ spin_lock_irq(&freezer_lock);
+ current->flags |= PF_FROZEN;
if (!freezing(current) ||
(check_kthr_stop && kthread_should_stop()))
+ current->flags &= ~PF_FROZEN;
+ spin_unlock_irq(&freezer_lock);
+
+ if (!(current->flags & PF_FROZEN))
break;
was_frozen = true;
schedule();
}
- /* leave FROZEN */
- spin_lock_irq(&freezer_lock);
- current->flags &= ~PF_FROZEN;
- spin_unlock_irq(&freezer_lock);
+ spin_lock_irq(¤t->sighand->siglock);
+ recalc_sigpending(); /* We sent fake signal, clean it up */
+ spin_unlock_irq(¤t->sighand->siglock);
pr_debug("%s left refrigerator\n", current->comm);
--
1.7.6
^ permalink raw reply related
* [PATCH 2/6] freezer: set PF_NOFREEZE on a dying task right before TASK_DEAD setting bug in freezer_change_state()
From: Tejun Heo @ 2011-09-02 18:27 UTC (permalink / raw)
To: oleg, matthltc, rjw, paul; +Cc: Tejun Heo, containers, linux-pm, linux-kernel
In-Reply-To: <1314988070-12244-1-git-send-email-tj@kernel.org>
3fb45733df "freezer: make exiting tasks properly unfreezable" removed
clear_freeze_flag() from exit path and set PF_NOFREEZE right after
PTRACE_EVENT_EXIT; however, Oleg pointed out that following exit paths
may cause interaction with device drivers after PM freezer consider
the system frozen.
There's no try_to_freeze() call in the exit path and the only
necessary guarantee is that freezer doesn't hang waiting for zombies.
Set PF_NOFREEZE right before setting tsk->state to TASK_DEAD instead.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
---
kernel/exit.c | 10 ++++------
1 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/kernel/exit.c b/kernel/exit.c
index ac58259..7b6c4fa 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -913,12 +913,6 @@ NORET_TYPE void do_exit(long code)
ptrace_event(PTRACE_EVENT_EXIT, code);
- /*
- * With ptrace notification done, there's no point in freezing from
- * here on. Disallow freezing.
- */
- current->flags |= PF_NOFREEZE;
-
validate_creds_for_do_exit(tsk);
/*
@@ -1044,6 +1038,10 @@ NORET_TYPE void do_exit(long code)
preempt_disable();
exit_rcu();
+
+ /* this task is now dead and freezer should ignore it */
+ current->flags |= PF_NOFREEZE;
+
/* causes final put_task_struct in finish_task_switch(). */
tsk->state = TASK_DEAD;
schedule();
--
1.7.6
^ permalink raw reply related
* [PATCH 1/6] cgroup_freezer: fix freezer->state setting bug in freezer_change_state()
From: Tejun Heo @ 2011-09-02 18:27 UTC (permalink / raw)
To: oleg, matthltc, rjw, paul; +Cc: Tejun Heo, containers, linux-pm, linux-kernel
In-Reply-To: <1314988070-12244-1-git-send-email-tj@kernel.org>
d02f52811d0e "cgroup_freezer: prepare for removal of TIF_FREEZE" moved
setting of freezer->state into freezer_change_state(); unfortunately,
while doing so, when it's beginning to freeze tasks, it sets the state
to CGROUP_FROZEN instead of CGROUP_FREEZING ending up skipping the
whole freezing state. Fix it.
-v2: Oleg pointed out that re-freezing FROZEN cgroup could increment
system_freezing_cnt. Fixed.
-v3: Matt pointed out that setting CGROUP_FROZEN always invoked
try_to_freeze_cgroup() regardless of the current state. Patch
updated such that the actual freeze/thaw operations are always
performed on state changes. This shouldn't make any difference
unless something is broken.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Matt Helsley <matthltc@us.ibm.com>
---
kernel/cgroup_freezer.c | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 4e82525..56a457a 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -308,24 +308,24 @@ static int freezer_change_state(struct cgroup *cgroup,
spin_lock_irq(&freezer->lock);
update_if_frozen(cgroup, freezer);
- if (goal_state == freezer->state)
- goto out;
-
- freezer->state = goal_state;
switch (goal_state) {
case CGROUP_THAWED:
- atomic_dec(&system_freezing_cnt);
+ if (freezer->state != CGROUP_THAWED)
+ atomic_dec(&system_freezing_cnt);
+ freezer->state = CGROUP_THAWED;
unfreeze_cgroup(cgroup, freezer);
break;
case CGROUP_FROZEN:
- atomic_inc(&system_freezing_cnt);
+ if (freezer->state == CGROUP_THAWED)
+ atomic_inc(&system_freezing_cnt);
+ freezer->state = CGROUP_FREEZING;
retval = try_to_freeze_cgroup(cgroup, freezer);
break;
default:
BUG();
}
-out:
+
spin_unlock_irq(&freezer->lock);
return retval;
--
1.7.6
^ permalink raw reply related
* [PATCHSET pm-freezer] freezer: fixes & simplifications
From: Tejun Heo @ 2011-09-02 18:27 UTC (permalink / raw)
To: oleg, matthltc, rjw, paul; +Cc: containers, linux-pm, linux-kernel
Hello,
These six patches are fixes and simplifications for the recent freezer
changes. The first four have been posted twice. Changes since the
last posting[L] are
* freezer->state setting bug fix updated per Matt Helsley so that the
actual per-task freeze/thaw operations are always performed.
* fixed a bug caused by forgetting to unlock freezer_lock in "freezer:
restructure __refrigerator()".
* Two patches added. The fifth one is mostly trivial. The sixth a
bit more involved but still shouldn't cause any noticeable
functional difference. This removes the buggy and unused
freezable_with_signal.
Properly tested this time. As hera is still down, no git branch
available at this point. Patch list and diffstat follow.
Thanks.
[PATCH 1/6] cgroup_freezer: fix freezer->state setting bug in
[PATCH 2/6] freezer: set PF_NOFREEZE on a dying task right before
[PATCH 3/6] freezer: restructure __refrigerator()
[PATCH 4/6] freezer: use lock_task_sighand() in
[PATCH 5/6] freezer: remove unused @sig_only from freeze_task()
[PATCH 6/6] freezer: kill unused set_freezable_with_signal()
include/linux/freezer.h | 22 +------------
include/linux/sched.h | 1
kernel/cgroup_freezer.c | 18 +++++------
kernel/exit.c | 10 ++----
kernel/freezer.c | 78 ++++++++++++++++--------------------------------
kernel/kthread.c | 2 -
kernel/power/process.c | 8 ++--
7 files changed, 47 insertions(+), 92 deletions(-)
--
tejun
[L] http://thread.gmane.org/gmane.linux.kernel/1186387
^ permalink raw reply
* Re: [PATCH pm-freezer 1/4] cgroup_freezer: fix freezer->state setting bug in freezer_change_state()
From: Tejun Heo @ 2011-09-02 17:31 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: containers, linux-kernel, linux-pm, Paul Menage
In-Reply-To: <20110902171517.GA8247@redhat.com>
Hello,
On Fri, Sep 02, 2011 at 07:15:17PM +0200, Oleg Nesterov wrote:
> > I guess it depends on the viewpoint. A simple analogy would be using
> > WARN_ON_ONCE() instead of BUG_ON() so that the mode of failure is
> > softer. This change isn't likely to make bugs significantly more
> > difficult to discover so why not?
>
> I agree either way.
>
> Personally I prefer your current patch. Because it is not clear why
> do we call try_to_freeze_cgroup() if it was already called. And, the
> 2nd call can silently hide the problem if we have some bug.
>
> But of course, this is up to you and Matt.
I'm okay either way too. It would make a bug in that area a bit less
annoying and thus may decrease the chance of bug report, but it means
that we ship with built-in easy work around (if it doesn't work at the
first kick, kick again), which can be beneficial in practice.
> > > But I agree either way. Rafael, I think 1-4 are fine, but I think
> > > we need the simple 5/4, will send in a minute...
> >
> > Can you please wait a bit? The second one was broken (missing unlock)
>
> Yes, I just noticed the small problem too, hopefully we mean the same
> bug ;)
Yeap, the same one. I thought it was the second patch instead of the
third. :)
Thanks.
--
tejun
^ permalink raw reply
* Re: [PATCH pm-freezer 1/4] cgroup_freezer: fix freezer->state setting bug in freezer_change_state()
From: Oleg Nesterov @ 2011-09-02 17:30 UTC (permalink / raw)
To: Tejun Heo; +Cc: containers, linux-kernel, linux-pm, Paul Menage
In-Reply-To: <20110902170844.GJ2752@htj.dyndns.org>
On 09/03, Tejun Heo wrote:
>
> Can you please wait a bit?
Sure. But just in case, I mean the simple patch below.
Feel free to incorporate into 4/4, or I can resend it later.
Oleg.
------------------------------------------------------------------------------
[PATCH] freezer: remove the pointless/unsafe __thaw_task()->recalc_sigpending_and_wake()
Remove __thaw_task()->recalc_sigpending_and_wake(). It was copied from
cancel_freezing() recently, but it was always wrong.
It is pointless, contary to the comment it can't clear TIF_SIGPENDING.
Not to mention, we must never do this with !current task, this is wrong.
The usage of ->sighand is not safe if the caller is cgroup_freezer, we
can racw with exit.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/freezer.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
--- 3.1/kernel/freezer.c~kill_rspaw 2011-09-02 18:52:46.000000000 +0200
+++ 3.1/kernel/freezer.c 2011-09-02 19:03:30.000000000 +0200
@@ -150,18 +150,10 @@ void __thaw_task(struct task_struct *p)
* be visible to @p as waking up implies wmb. Waking up inside
* freezer_lock also prevents wakeups from leaking outside
* refrigerator.
- *
- * If !FROZEN, @p hasn't reached refrigerator, recalc sigpending to
- * avoid leaving dangling TIF_SIGPENDING behind.
*/
spin_lock_irqsave(&freezer_lock, flags);
- if (frozen(p)) {
+ if (frozen(p))
wake_up_process(p);
- } else {
- spin_lock(&p->sighand->siglock);
- recalc_sigpending_and_wake(p);
- spin_unlock(&p->sighand->siglock);
- }
spin_unlock_irqrestore(&freezer_lock, flags);
}
^ permalink raw reply
* Re: [PATCH pm-freezer 1/4] cgroup_freezer: fix freezer->state setting bug in freezer_change_state()
From: Oleg Nesterov @ 2011-09-02 17:15 UTC (permalink / raw)
To: Tejun Heo; +Cc: containers, linux-kernel, linux-pm, Paul Menage
In-Reply-To: <20110902170844.GJ2752@htj.dyndns.org>
On 09/03, Tejun Heo wrote:
>
> Hello,
>
> On Fri, Sep 02, 2011 at 06:58:39PM +0200, Oleg Nesterov wrote:
> > > This still doesn't look quite right. If the cgroup is FREEZING it should
> > > also call try_to_freeze_cgroup(). I think this is what's needed:
> > >
> > > if (freezer->state == CGROUP_THAWED)
> > > atomic_inc(&system_freezing_cnt);
> > > freezer->state = CGROUP_FREEZING;
> > > retval = try_to_freeze_cgroup(cgroup, freezer);
> >
> > This is what I mentioned before, to me this looks like a win.
> >
> > Why do we need try_to_freeze_cgroup() in this case? "for safety"
> > could actually mean "hide the bug" ;)
>
> I guess it depends on the viewpoint. A simple analogy would be using
> WARN_ON_ONCE() instead of BUG_ON() so that the mode of failure is
> softer. This change isn't likely to make bugs significantly more
> difficult to discover so why not?
I agree either way.
Personally I prefer your current patch. Because it is not clear why
do we call try_to_freeze_cgroup() if it was already called. And, the
2nd call can silently hide the problem if we have some bug.
But of course, this is up to you and Matt.
> > But I agree either way. Rafael, I think 1-4 are fine, but I think
> > we need the simple 5/4, will send in a minute...
>
> Can you please wait a bit? The second one was broken (missing unlock)
Yes, I just noticed the small problem too, hopefully we mean the same
bug ;)
Oleg.
^ permalink raw reply
* Re: [PATCH pm-freezer 1/4] cgroup_freezer: fix freezer->state setting bug in freezer_change_state()
From: Tejun Heo @ 2011-09-02 17:08 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: containers, linux-kernel, linux-pm, Paul Menage
In-Reply-To: <20110902165839.GA7478@redhat.com>
Hello,
On Fri, Sep 02, 2011 at 06:58:39PM +0200, Oleg Nesterov wrote:
> > This still doesn't look quite right. If the cgroup is FREEZING it should
> > also call try_to_freeze_cgroup(). I think this is what's needed:
> >
> > if (freezer->state == CGROUP_THAWED)
> > atomic_inc(&system_freezing_cnt);
> > freezer->state = CGROUP_FREEZING;
> > retval = try_to_freeze_cgroup(cgroup, freezer);
>
> This is what I mentioned before, to me this looks like a win.
>
> Why do we need try_to_freeze_cgroup() in this case? "for safety"
> could actually mean "hide the bug" ;)
I guess it depends on the viewpoint. A simple analogy would be using
WARN_ON_ONCE() instead of BUG_ON() so that the mode of failure is
softer. This change isn't likely to make bugs significantly more
difficult to discover so why not?
> But I agree either way. Rafael, I think 1-4 are fine, but I think
> we need the simple 5/4, will send in a minute...
Can you please wait a bit? The second one was broken (missing unlock)
and I made some small adjustments. Will repost soon.
Thanks.
--
tejun
^ permalink raw reply
* Re: [PATCH pm-freezer 3/4] freezer: restructure __refrigerator()
From: Oleg Nesterov @ 2011-09-02 17:08 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-kernel, Paul Menage, containers, linux-pm
In-Reply-To: <20110831102210.GC2828@mtj.dyndns.org>
On 08/31, Tejun Heo wrote:
>
> for (;;) {
> set_current_state(TASK_UNINTERRUPTIBLE);
> +
> + spin_lock_irq(&freezer_lock);
> + current->flags |= PF_FROZEN;
> if (!freezing(current) ||
> - (check_kthr_stop && kthread_should_stop()))
> + (check_kthr_stop && kthread_should_stop())) {
> + current->flags &= ~PF_FROZEN;
> + break;
Argh! sorry I missed this...
This "break" is typo, it should be removed.
Otherwise I believe this is correct.
Oleg.
^ permalink raw reply
* Re: [PATCH pm-freezer 1/4] cgroup_freezer: fix freezer->state setting bug in freezer_change_state()
From: Oleg Nesterov @ 2011-09-02 16:58 UTC (permalink / raw)
To: Matt Helsley; +Cc: Paul Menage, linux-kernel, Tejun Heo, linux-pm, containers
In-Reply-To: <20110902004231.GF1919@count0.beaverton.ibm.com>
On 09/01, Matt Helsley wrote:
>
> On Wed, Aug 31, 2011 at 12:21:07PM +0200, Tejun Heo wrote:
> > case CGROUP_FROZEN:
> > - atomic_inc(&system_freezing_cnt);
> > - retval = try_to_freeze_cgroup(cgroup, freezer);
> > + if (freezer->state == CGROUP_THAWED) {
> > + freezer->state = CGROUP_FREEZING;
> > + atomic_inc(&system_freezing_cnt);
> > + retval = try_to_freeze_cgroup(cgroup, freezer);
>
> This still doesn't look quite right. If the cgroup is FREEZING it should
> also call try_to_freeze_cgroup(). I think this is what's needed:
>
> if (freezer->state == CGROUP_THAWED)
> atomic_inc(&system_freezing_cnt);
> freezer->state = CGROUP_FREEZING;
> retval = try_to_freeze_cgroup(cgroup, freezer);
This is what I mentioned before, to me this looks like a win.
Why do we need try_to_freeze_cgroup() in this case? "for safety"
could actually mean "hide the bug" ;)
But I agree either way. Rafael, I think 1-4 are fine, but I think
we need the simple 5/4, will send in a minute...
Oleg.
^ permalink raw reply
* [PATCH 8/8] OMAP3: update cpuidle latency and threshold figures
From: Jean Pihet @ 2011-09-02 13:13 UTC (permalink / raw)
To: Kevin Hilman, Linux PM mailing list, linux-omap,
Rafael J. Wysocki, Paul
Cc: Jean Pihet
In-Reply-To: <1314969204-21704-1-git-send-email-j-pihet@ti.com>
Update the data from the measurements performed at HW and SW levels.
Cf. http://www.omappedia.org/wiki/Power_Management_Device_Latencies_Measurement
for a detailed explanation on where are the numbers magically coming from.
ToDo:
- Measure the wake-up latencies for all power domains for OMAP3
- Correct some numbers when sys_clkreq and sys_offmode are supported
Signed-off-by: Jean Pihet <j-pihet@ti.com>
---
arch/arm/mach-omap2/cpuidle34xx.c | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/arm/mach-omap2/cpuidle34xx.c b/arch/arm/mach-omap2/cpuidle34xx.c
index b43d1d2..3f37e89 100644
--- a/arch/arm/mach-omap2/cpuidle34xx.c
+++ b/arch/arm/mach-omap2/cpuidle34xx.c
@@ -45,19 +45,19 @@
*/
static struct cpuidle_params cpuidle_params_table[] = {
/* C1 . MPU WFI + Core active */
- {2 + 2, 5, 1},
+ {73 + 78, 152, 1},
/* C2 . MPU WFI + Core inactive */
- {10 + 10, 30, 1},
+ {165 + 88, 345, 1},
/* C3 . MPU CSWR + Core inactive */
- {50 + 50, 300, 1},
+ {163 + 182, 345, 1},
/* C4 . MPU OFF + Core inactive */
- {1500 + 1800, 4000, 1},
+ {2852 + 605, 150000, 1},
/* C5 . MPU RET + Core RET */
- {2500 + 7500, 12000, 1},
+ {800 + 366, 2120, 1},
/* C6 . MPU OFF + Core RET */
- {3000 + 8500, 15000, 1},
+ {4080 + 801, 215000, 1},
/* C7 . MPU OFF + Core OFF */
- {10000 + 30000, 300000, 1},
+ {4300 + 13000, 215000, 1},
};
#define OMAP3_NUM_STATES ARRAY_SIZE(cpuidle_params_table)
--
1.7.4.1
^ permalink raw reply related
* [PATCH 7/8] OMAP2+: cpuidle only influences the MPU state
From: Jean Pihet @ 2011-09-02 13:13 UTC (permalink / raw)
To: Kevin Hilman, Linux PM mailing list, linux-omap,
Rafael J. Wysocki, Paul
Cc: Jean Pihet
In-Reply-To: <1314969204-21704-1-git-send-email-j-pihet@ti.com>
Since cpuidle is a CPU centric framework it decides the MPU
next power state based on the MPU exit_latency and target_residency
figures.
The rest of the power domains get their next power state programmed
from the devices PM QoS framework, via the devices wake-up latency
constraints.
Note: the exit_latency and target_residency figures of the MPU
include the MPU itself and the peripherals needed for the MPU to
execute instructions (e.g. main memory, caches, IRQ controller,
MMU etc). Some of those peripherals can belong to other power domains
than the MPU subsystem and so the corresponding latencies must be
included in this figure.
Tested on OMAP3 Beagleboard in RET/OFF using wake-up latency constraints
on MPU, CORE and PER.
Signed-off-by: Jean Pihet <j-pihet@ti.com>
---
arch/arm/mach-omap2/cpuidle34xx.c | 42 +++++++++++-------------------------
arch/arm/mach-omap2/pm.h | 17 +++++++++++++-
2 files changed, 28 insertions(+), 31 deletions(-)
diff --git a/arch/arm/mach-omap2/cpuidle34xx.c b/arch/arm/mach-omap2/cpuidle34xx.c
index 4bf6e6e..b43d1d2 100644
--- a/arch/arm/mach-omap2/cpuidle34xx.c
+++ b/arch/arm/mach-omap2/cpuidle34xx.c
@@ -37,26 +37,26 @@
#ifdef CONFIG_CPU_IDLE
/*
- * The latencies/thresholds for various C states have
+ * The MPU latencies/thresholds for various C states have
* to be configured from the respective board files.
* These are some default values (which might not provide
* the best power savings) used on boards which do not
* pass these details from the board file.
*/
static struct cpuidle_params cpuidle_params_table[] = {
- /* C1 */
+ /* C1 . MPU WFI + Core active */
{2 + 2, 5, 1},
- /* C2 */
+ /* C2 . MPU WFI + Core inactive */
{10 + 10, 30, 1},
- /* C3 */
+ /* C3 . MPU CSWR + Core inactive */
{50 + 50, 300, 1},
- /* C4 */
+ /* C4 . MPU OFF + Core inactive */
{1500 + 1800, 4000, 1},
- /* C5 */
+ /* C5 . MPU RET + Core RET */
{2500 + 7500, 12000, 1},
- /* C6 */
+ /* C6 . MPU OFF + Core RET */
{3000 + 8500, 15000, 1},
- /* C7 */
+ /* C7 . MPU OFF + Core OFF */
{10000 + 30000, 300000, 1},
};
#define OMAP3_NUM_STATES ARRAY_SIZE(cpuidle_params_table)
@@ -64,7 +64,6 @@ static struct cpuidle_params cpuidle_params_table[] = {
/* Mach specific information to be recorded in the C-state driver_data */
struct omap3_idle_statedata {
u32 mpu_state;
- u32 core_state;
u8 valid;
};
struct omap3_idle_statedata omap3_idle_data[OMAP3_NUM_STATES];
@@ -98,7 +97,7 @@ static int omap3_enter_idle(struct cpuidle_device *dev,
{
struct omap3_idle_statedata *cx = cpuidle_get_statedata(state);
struct timespec ts_preidle, ts_postidle, ts_idle;
- u32 mpu_state = cx->mpu_state, core_state = cx->core_state;
+ u32 mpu_state = cx->mpu_state;
/* Used to keep track of the total time in idle */
getnstimeofday(&ts_preidle);
@@ -107,7 +106,6 @@ static int omap3_enter_idle(struct cpuidle_device *dev,
local_fiq_disable();
pwrdm_set_next_pwrst(mpu_pd, mpu_state);
- pwrdm_set_next_pwrst(core_pd, core_state);
if (omap_irq_pending() || need_resched())
goto return_sleep_time;
@@ -156,6 +154,7 @@ static struct cpuidle_state *next_valid_state(struct cpuidle_device *dev,
struct omap3_idle_statedata *cx = cpuidle_get_statedata(curr);
u32 mpu_deepest_state = PWRDM_POWER_RET;
u32 core_deepest_state = PWRDM_POWER_RET;
+ u32 core_next_state = pwrdm_read_next_pwrst(core_pd);
if (enable_off_mode) {
mpu_deepest_state = PWRDM_POWER_OFF;
@@ -171,7 +170,7 @@ static struct cpuidle_state *next_valid_state(struct cpuidle_device *dev,
/* Check if current state is valid */
if ((cx->valid) &&
(cx->mpu_state >= mpu_deepest_state) &&
- (cx->core_state >= core_deepest_state)) {
+ (core_next_state >= core_deepest_state)) {
return curr;
} else {
int idx = OMAP3_NUM_STATES - 1;
@@ -196,7 +195,7 @@ static struct cpuidle_state *next_valid_state(struct cpuidle_device *dev,
cx = cpuidle_get_statedata(&dev->states[idx]);
if ((cx->valid) &&
(cx->mpu_state >= mpu_deepest_state) &&
- (cx->core_state >= core_deepest_state)) {
+ (core_next_state >= core_deepest_state)) {
next = &dev->states[idx];
break;
}
@@ -242,19 +241,11 @@ static int omap3_enter_idle_bm(struct cpuidle_device *dev,
}
/*
- * FIXME: we currently manage device-specific idle states
- * for PER and CORE in combination with CPU-specific
- * idle states. This is wrong, and device-specific
- * idle management needs to be separated out into
- * its own code.
- */
-
- /*
* Prevent PER off if CORE is not in retention or off as this
* would disable PER wakeups completely.
*/
cx = cpuidle_get_statedata(state);
- core_next_state = cx->core_state;
+ core_next_state = pwrdm_read_next_pwrst(core_pd);
per_next_state = per_saved_state = pwrdm_read_next_pwrst(per_pd);
if ((per_next_state == PWRDM_POWER_OFF) &&
(core_next_state > PWRDM_POWER_RET))
@@ -346,32 +337,26 @@ int __init omap3_idle_init(void)
dev->safe_state = &dev->states[0];
cx->valid = 1; /* C1 is always valid */
cx->mpu_state = PWRDM_POWER_ON;
- cx->core_state = PWRDM_POWER_ON;
/* C2 . MPU WFI + Core inactive */
cx = _fill_cstate(dev, 1, "MPU ON + CORE ON");
cx->mpu_state = PWRDM_POWER_ON;
- cx->core_state = PWRDM_POWER_ON;
/* C3 . MPU CSWR + Core inactive */
cx = _fill_cstate(dev, 2, "MPU RET + CORE ON");
cx->mpu_state = PWRDM_POWER_RET;
- cx->core_state = PWRDM_POWER_ON;
/* C4 . MPU OFF + Core inactive */
cx = _fill_cstate(dev, 3, "MPU OFF + CORE ON");
cx->mpu_state = PWRDM_POWER_OFF;
- cx->core_state = PWRDM_POWER_ON;
/* C5 . MPU RET + Core RET */
cx = _fill_cstate(dev, 4, "MPU RET + CORE RET");
cx->mpu_state = PWRDM_POWER_RET;
- cx->core_state = PWRDM_POWER_RET;
/* C6 . MPU OFF + Core RET */
cx = _fill_cstate(dev, 5, "MPU OFF + CORE RET");
cx->mpu_state = PWRDM_POWER_OFF;
- cx->core_state = PWRDM_POWER_RET;
/* C7 . MPU OFF + Core OFF */
cx = _fill_cstate(dev, 6, "MPU OFF + CORE OFF");
@@ -386,7 +371,6 @@ int __init omap3_idle_init(void)
__func__);
}
cx->mpu_state = PWRDM_POWER_OFF;
- cx->core_state = PWRDM_POWER_OFF;
dev->state_count = OMAP3_NUM_STATES;
if (cpuidle_register_device(dev)) {
diff --git a/arch/arm/mach-omap2/pm.h b/arch/arm/mach-omap2/pm.h
index 4e166ad..aca3b6c 100644
--- a/arch/arm/mach-omap2/pm.h
+++ b/arch/arm/mach-omap2/pm.h
@@ -43,9 +43,22 @@ static inline int omap4_opp_init(void)
* omap3_pm_init_cpuidle
*/
struct cpuidle_params {
- u32 exit_latency; /* exit_latency = sleep + wake-up latencies */
+ /*
+ * exit_latency = sleep + wake-up latencies of the MPU,
+ * which include the MPU itself and the peripherals needed
+ * for the MPU to execute instructions (e.g. main memory,
+ * caches, IRQ controller, MMU etc). Some of those peripherals
+ * can belong to other power domains than the MPU subsystem and so
+ * the corresponding latencies must be included in this figure.
+ */
+ u32 exit_latency;
+ /*
+ * target_residency: required amount of time in the C state
+ * to break even on energy cost
+ */
u32 target_residency;
- u8 valid; /* validates the C-state */
+ /* validates the C-state on the given board */
+ u8 valid;
};
#if defined(CONFIG_PM) && defined(CONFIG_CPU_IDLE)
--
1.7.4.1
^ permalink raw reply related
* [PATCH 6/8] OMAP: PM CONSTRAINTS: implement the devices wake-up latency constraints
From: Jean Pihet @ 2011-09-02 13:13 UTC (permalink / raw)
To: Kevin Hilman, Linux PM mailing list, linux-omap,
Rafael J. Wysocki, Paul
Cc: Jean Pihet
In-Reply-To: <1314969204-21704-1-git-send-email-j-pihet@ti.com>
Implement the devices wake-up latency constraints using the global
device PM QoS notification handler which applies the constraints to the
underlying layer by calling the corresponding function at hwmod level.
Note: the bus throughput function is implemented but currently is
a no-op. A new PM QoS class for the bus throughput needs to be
added.
Tested on OMAP3 Beagleboard and OMAP4 Pandaboard in RET/OFF using wake-up
latency constraints on MPU, CORE and PER.
Signed-off-by: Jean Pihet <j-pihet@ti.com>
---
arch/arm/plat-omap/include/plat/omap-pm.h | 128 ---------------------
arch/arm/plat-omap/omap-pm-constraints.c | 173 +++++++++++++---------------
arch/arm/plat-omap/omap-pm-noop.c | 89 ---------------
3 files changed, 80 insertions(+), 310 deletions(-)
diff --git a/arch/arm/plat-omap/include/plat/omap-pm.h b/arch/arm/plat-omap/include/plat/omap-pm.h
index 0840df8..d276082 100644
--- a/arch/arm/plat-omap/include/plat/omap-pm.h
+++ b/arch/arm/plat-omap/include/plat/omap-pm.h
@@ -62,136 +62,8 @@ void omap_pm_if_exit(void);
* Device-driver-originated constraints (via board-*.c files, platform_data)
*/
-
-/**
- * omap_pm_set_max_mpu_wakeup_lat - set the maximum MPU wakeup latency
- * @dev: struct device * requesting the constraint
- * @t: maximum MPU wakeup latency in microseconds
- *
- * Request that the maximum interrupt latency for the MPU to be no
- * greater than @t microseconds. "Interrupt latency" in this case is
- * defined as the elapsed time from the occurrence of a hardware or
- * timer interrupt to the time when the device driver's interrupt
- * service routine has been entered by the MPU.
- *
- * It is intended that underlying PM code will use this information to
- * determine what power state to put the MPU powerdomain into, and
- * possibly the CORE powerdomain as well, since interrupt handling
- * code currently runs from SDRAM. Advanced PM or board*.c code may
- * also configure interrupt controller priorities, OCP bus priorities,
- * CPU speed(s), etc.
- *
- * This function will not affect device wakeup latency, e.g., time
- * elapsed from when a device driver enables a hardware device with
- * clk_enable(), to when the device is ready for register access or
- * other use. To control this device wakeup latency, use
- * omap_pm_set_max_dev_wakeup_lat()
- *
- * Multiple calls to omap_pm_set_max_mpu_wakeup_lat() will replace the
- * previous t value. To remove the latency target for the MPU, call
- * with t = -1.
- *
- * XXX This constraint will be deprecated soon in favor of the more
- * general omap_pm_set_max_dev_wakeup_lat()
- *
- * Returns -EINVAL for an invalid argument, -ERANGE if the constraint
- * is not satisfiable, or 0 upon success.
- */
-int omap_pm_set_max_mpu_wakeup_lat(struct device *dev, long t);
-
-
-/**
- * omap_pm_set_min_bus_tput - set minimum bus throughput needed by device
- * @dev: struct device * requesting the constraint
- * @tbus_id: interconnect to operate on (OCP_{INITIATOR,TARGET}_AGENT)
- * @r: minimum throughput (in KiB/s)
- *
- * Request that the minimum data throughput on the OCP interconnect
- * attached to device @dev interconnect agent @tbus_id be no less
- * than @r KiB/s.
- *
- * It is expected that the OMAP PM or bus code will use this
- * information to set the interconnect clock to run at the lowest
- * possible speed that satisfies all current system users. The PM or
- * bus code will adjust the estimate based on its model of the bus, so
- * device driver authors should attempt to specify an accurate
- * quantity for their device use case, and let the PM or bus code
- * overestimate the numbers as necessary to handle request/response
- * latency, other competing users on the system, etc. On OMAP2/3, if
- * a driver requests a minimum L4 interconnect speed constraint, the
- * code will also need to add an minimum L3 interconnect speed
- * constraint,
- *
- * Multiple calls to omap_pm_set_min_bus_tput() will replace the
- * previous rate value for this device. To remove the interconnect
- * throughput restriction for this device, call with r = 0.
- *
- * Returns -EINVAL for an invalid argument, -ERANGE if the constraint
- * is not satisfiable, or 0 upon success.
- */
int omap_pm_set_min_bus_tput(struct device *dev, u8 agent_id, unsigned long r);
-
-/**
- * omap_pm_set_max_dev_wakeup_lat - set the maximum device enable latency
- * @req_dev: struct device * requesting the constraint, or NULL if none
- * @dev: struct device * to set the constraint one
- * @t: maximum device wakeup latency in microseconds
- *
- * Request that the maximum amount of time necessary for a device @dev
- * to become accessible after its clocks are enabled should be no
- * greater than @t microseconds. Specifically, this represents the
- * time from when a device driver enables device clocks with
- * clk_enable(), to when the register reads and writes on the device
- * will succeed. This function should be called before clk_disable()
- * is called, since the power state transition decision may be made
- * during clk_disable().
- *
- * It is intended that underlying PM code will use this information to
- * determine what power state to put the powerdomain enclosing this
- * device into.
- *
- * Multiple calls to omap_pm_set_max_dev_wakeup_lat() will replace the
- * previous wakeup latency values for this device. To remove the
- * wakeup latency restriction for this device, call with t = -1.
- *
- * Returns -EINVAL for an invalid argument, -ERANGE if the constraint
- * is not satisfiable, or 0 upon success.
- */
-int omap_pm_set_max_dev_wakeup_lat(struct device *req_dev, struct device *dev,
- long t);
-
-
-/**
- * omap_pm_set_max_sdma_lat - set the maximum system DMA transfer start latency
- * @dev: struct device *
- * @t: maximum DMA transfer start latency in microseconds
- *
- * Request that the maximum system DMA transfer start latency for this
- * device 'dev' should be no greater than 't' microseconds. "DMA
- * transfer start latency" here is defined as the elapsed time from
- * when a device (e.g., McBSP) requests that a system DMA transfer
- * start or continue, to the time at which data starts to flow into
- * that device from the system DMA controller.
- *
- * It is intended that underlying PM code will use this information to
- * determine what power state to put the CORE powerdomain into.
- *
- * Since system DMA transfers may not involve the MPU, this function
- * will not affect MPU wakeup latency. Use set_max_cpu_lat() to do
- * so. Similarly, this function will not affect device wakeup latency
- * -- use set_max_dev_wakeup_lat() to affect that.
- *
- * Multiple calls to set_max_sdma_lat() will replace the previous t
- * value for this device. To remove the maximum DMA latency for this
- * device, call with t = -1.
- *
- * Returns -EINVAL for an invalid argument, -ERANGE if the constraint
- * is not satisfiable, or 0 upon success.
- */
-int omap_pm_set_max_sdma_lat(struct device *dev, long t);
-
-
/**
* omap_pm_set_min_clk_rate - set minimum clock rate requested by @dev
* @dev: struct device * requesting the constraint
diff --git a/arch/arm/plat-omap/omap-pm-constraints.c b/arch/arm/plat-omap/omap-pm-constraints.c
index c8b4e4c..efec5df 100644
--- a/arch/arm/plat-omap/omap-pm-constraints.c
+++ b/arch/arm/plat-omap/omap-pm-constraints.c
@@ -17,132 +17,112 @@
#undef DEBUG
#include <linux/init.h>
+#include <linux/notifier.h>
#include <linux/cpufreq.h>
#include <linux/device.h>
#include <linux/platform_device.h>
+#include <linux/pm_qos.h>
/* Interface documentation is in mach/omap-pm.h */
#include <plat/omap-pm.h>
#include <plat/omap_device.h>
+#include <plat/common.h>
+#include <plat/omap_hwmod.h>
static bool off_mode_enabled;
static u32 dummy_context_loss_counter;
-/*
- * Device-driver-originated constraints (via board-*.c files)
- */
+static int _dev_pm_qos_wakeup_latency_handler(struct notifier_block *,
+ unsigned long, void *);
+static struct notifier_block _dev_pm_qos_wakeup_latency_notifier = {
+ .notifier_call = _dev_pm_qos_wakeup_latency_handler,
+};
-int omap_pm_set_max_mpu_wakeup_lat(struct device *dev, long t)
+
+static int _apply_dev_pm_qos_constraint(void *req, unsigned long new_value)
{
- if (!dev || t < -1) {
- WARN(1, "OMAP PM: %s: invalid parameter(s)", __func__);
- return -EINVAL;
- };
+ struct omap_device *od;
+ struct omap_hwmod *oh;
+ struct platform_device *pdev;
+ struct dev_pm_qos_request *dev_pm_qos_req = req;
- if (t == -1)
- pr_debug("OMAP PM: remove max MPU wakeup latency constraint: "
- "dev %s\n", dev_name(dev));
- else
- pr_debug("OMAP PM: add max MPU wakeup latency constraint: "
- "dev %s, t = %ld usec\n", dev_name(dev), t);
+ pr_debug("OMAP PM CONSTRAINTS: req@0x%p, new_value=%lu\n",
+ req, new_value);
- /*
- * For current Linux, this needs to map the MPU to a
- * powerdomain, then go through the list of current max lat
- * constraints on the MPU and find the smallest. If
- * the latency constraint has changed, the code should
- * recompute the state to enter for the next powerdomain
- * state.
- *
- * TI CDP code can call constraint_set here.
- */
+ /* Look for the platform device for the constraint target device */
+ pdev = to_platform_device(dev_pm_qos_req->dev);
- return 0;
-}
+ /* Try to catch non platform devices */
+ if (pdev->name == NULL) {
+ pr_err("%s: Error: platform device for device %s not valid\n",
+ __func__, dev_name(dev_pm_qos_req->dev));
+ return -EINVAL;
+ }
-int omap_pm_set_min_bus_tput(struct device *dev, u8 agent_id, unsigned long r)
-{
- if (!dev || (agent_id != OCP_INITIATOR_AGENT &&
- agent_id != OCP_TARGET_AGENT)) {
- WARN(1, "OMAP PM: %s: invalid parameter(s)", __func__);
+ /* Find the associated omap_device for dev */
+ od = container_of(pdev, struct omap_device, pdev);
+ if (od->hwmods_cnt != 1) {
+ pr_err("%s: Error: No unique hwmod for device %s\n",
+ __func__, dev_name(dev_pm_qos_req->dev));
return -EINVAL;
- };
+ }
- if (r == 0)
- pr_debug("OMAP PM: remove min bus tput constraint: "
- "dev %s for agent_id %d\n", dev_name(dev), agent_id);
- else
- pr_debug("OMAP PM: add min bus tput constraint: "
- "dev %s for agent_id %d: rate %ld KiB\n",
- dev_name(dev), agent_id, r);
+ /* Find the primary omap_hwmod for dev */
+ oh = od->hwmods[0];
- /*
- * This code should model the interconnect and compute the
- * required clock frequency, convert that to a VDD2 OPP ID, then
- * set the VDD2 OPP appropriately.
- *
- * TI CDP code can call constraint_set here on the VDD2 OPP.
- */
+ pr_debug("OMAP PM CONSTRAINTS: req@0x%p, dev=0x%p, new_value=%lu\n",
+ req, dev_pm_qos_req->dev, new_value);
- return 0;
+ /* Apply the constraint */
+ return omap_hwmod_set_wkup_lat_constraint(oh, dev_pm_qos_req,
+ new_value);
}
-int omap_pm_set_max_dev_wakeup_lat(struct device *req_dev, struct device *dev,
- long t)
+/* PM QoS classes handlers */
+static int _dev_pm_qos_wakeup_latency_handler(struct notifier_block *nb,
+ unsigned long new_value,
+ void *req)
{
- if (!req_dev || !dev || t < -1) {
- WARN(1, "OMAP PM: %s: invalid parameter(s)", __func__);
- return -EINVAL;
- };
-
- if (t == -1)
- pr_debug("OMAP PM: remove max device latency constraint: "
- "dev %s\n", dev_name(dev));
- else
- pr_debug("OMAP PM: add max device latency constraint: "
- "dev %s, t = %ld usec\n", dev_name(dev), t);
-
- /*
- * For current Linux, this needs to map the device to a
- * powerdomain, then go through the list of current max lat
- * constraints on that powerdomain and find the smallest. If
- * the latency constraint has changed, the code should
- * recompute the state to enter for the next powerdomain
- * state. Conceivably, this code should also determine
- * whether to actually disable the device clocks or not,
- * depending on how long it takes to re-enable the clocks.
- *
- * TI CDP code can call constraint_set here.
- */
-
- return 0;
+ return _apply_dev_pm_qos_constraint(req, new_value);
}
-int omap_pm_set_max_sdma_lat(struct device *dev, long t)
+/*
+ * omap_pm_set_min_bus_tput - set/release bus throughput constraints
+ * ToDo: currently is a no-op, to be converted to a PM QoS handler
+ * for the TPUT class
+ */
+int omap_pm_set_min_bus_tput(struct device *dev, u8 agent_id, unsigned long r)
{
- if (!dev || t < -1) {
+ long t;
+ struct device *req_dev = NULL;
+
+ if (!dev || (agent_id != OCP_INITIATOR_AGENT &&
+ agent_id != OCP_TARGET_AGENT)) {
WARN(1, "OMAP PM: %s: invalid parameter(s)", __func__);
return -EINVAL;
};
- if (t == -1)
- pr_debug("OMAP PM: remove max DMA latency constraint: "
- "dev %s\n", dev_name(dev));
+ /*
+ * A value of r == 0 removes the constraint. Convert it to the
+ * generic _set_dev_constraint convention (-1 for constraint removal)
+ */
+ if (r == 0)
+ t = -1;
else
- pr_debug("OMAP PM: add max DMA latency constraint: "
- "dev %s, t = %ld usec\n", dev_name(dev), t);
+ t = r;
/*
- * For current Linux PM QOS params, this code should scan the
- * list of maximum CPU and DMA latencies and select the
- * smallest, then set cpu_dma_latency pm_qos_param
- * accordingly.
- *
- * For future Linux PM QOS params, with separate CPU and DMA
- * latency params, this code should just set the dma_latency param.
- *
- * TI CDP code can call constraint_set here.
+ * Assign the device for L3 or L4 interconnect to req_dev,
+ * based on the value of agent_id
*/
+ switch (agent_id) {
+ case OCP_INITIATOR_AGENT:
+ req_dev = omap2_get_l3_device();
+ break;
+ case OCP_TARGET_AGENT:
+ /* Fixme: need the device for L4 interconnect */
+ break;
+ }
return 0;
}
@@ -350,10 +330,17 @@ int __init omap_pm_if_early_init(void)
return 0;
}
-/* Must be called after clock framework is initialized */
+/* Must be called after the clock framework is initialized */
int __init omap_pm_if_init(void)
{
- return 0;
+ int ret;
+
+ ret = dev_pm_qos_add_global_notifier(
+ &_dev_pm_qos_wakeup_latency_notifier);
+ if (ret)
+ WARN(1, KERN_ERR "Cannot add global notifier for dev PM QoS\n");
+
+ return ret;
}
void omap_pm_if_exit(void)
diff --git a/arch/arm/plat-omap/omap-pm-noop.c b/arch/arm/plat-omap/omap-pm-noop.c
index b0471bb2..8ad902f 100644
--- a/arch/arm/plat-omap/omap-pm-noop.c
+++ b/arch/arm/plat-omap/omap-pm-noop.c
@@ -32,35 +32,6 @@ static u32 dummy_context_loss_counter;
/*
* Device-driver-originated constraints (via board-*.c files)
*/
-
-int omap_pm_set_max_mpu_wakeup_lat(struct device *dev, long t)
-{
- if (!dev || t < -1) {
- WARN(1, "OMAP PM: %s: invalid parameter(s)", __func__);
- return -EINVAL;
- };
-
- if (t == -1)
- pr_debug("OMAP PM: remove max MPU wakeup latency constraint: "
- "dev %s\n", dev_name(dev));
- else
- pr_debug("OMAP PM: add max MPU wakeup latency constraint: "
- "dev %s, t = %ld usec\n", dev_name(dev), t);
-
- /*
- * For current Linux, this needs to map the MPU to a
- * powerdomain, then go through the list of current max lat
- * constraints on the MPU and find the smallest. If
- * the latency constraint has changed, the code should
- * recompute the state to enter for the next powerdomain
- * state.
- *
- * TI CDP code can call constraint_set here.
- */
-
- return 0;
-}
-
int omap_pm_set_min_bus_tput(struct device *dev, u8 agent_id, unsigned long r)
{
if (!dev || (agent_id != OCP_INITIATOR_AGENT &&
@@ -88,66 +59,6 @@ int omap_pm_set_min_bus_tput(struct device *dev, u8 agent_id, unsigned long r)
return 0;
}
-int omap_pm_set_max_dev_wakeup_lat(struct device *req_dev, struct device *dev,
- long t)
-{
- if (!req_dev || !dev || t < -1) {
- WARN(1, "OMAP PM: %s: invalid parameter(s)", __func__);
- return -EINVAL;
- };
-
- if (t == -1)
- pr_debug("OMAP PM: remove max device latency constraint: "
- "dev %s\n", dev_name(dev));
- else
- pr_debug("OMAP PM: add max device latency constraint: "
- "dev %s, t = %ld usec\n", dev_name(dev), t);
-
- /*
- * For current Linux, this needs to map the device to a
- * powerdomain, then go through the list of current max lat
- * constraints on that powerdomain and find the smallest. If
- * the latency constraint has changed, the code should
- * recompute the state to enter for the next powerdomain
- * state. Conceivably, this code should also determine
- * whether to actually disable the device clocks or not,
- * depending on how long it takes to re-enable the clocks.
- *
- * TI CDP code can call constraint_set here.
- */
-
- return 0;
-}
-
-int omap_pm_set_max_sdma_lat(struct device *dev, long t)
-{
- if (!dev || t < -1) {
- WARN(1, "OMAP PM: %s: invalid parameter(s)", __func__);
- return -EINVAL;
- };
-
- if (t == -1)
- pr_debug("OMAP PM: remove max DMA latency constraint: "
- "dev %s\n", dev_name(dev));
- else
- pr_debug("OMAP PM: add max DMA latency constraint: "
- "dev %s, t = %ld usec\n", dev_name(dev), t);
-
- /*
- * For current Linux PM QOS params, this code should scan the
- * list of maximum CPU and DMA latencies and select the
- * smallest, then set cpu_dma_latency pm_qos_param
- * accordingly.
- *
- * For future Linux PM QOS params, with separate CPU and DMA
- * latency params, this code should just set the dma_latency param.
- *
- * TI CDP code can call constraint_set here.
- */
-
- return 0;
-}
-
int omap_pm_set_min_clk_rate(struct device *dev, struct clk *c, long r)
{
if (!dev || !c || r < 0) {
--
1.7.4.1
^ permalink raw reply related
* [PATCH 5/8] OMAP2+: omap_hwmod: manage the wake-up latency constraints
From: Jean Pihet @ 2011-09-02 13:13 UTC (permalink / raw)
To: Kevin Hilman, Linux PM mailing list, linux-omap,
Rafael J. Wysocki, Paul
Cc: Jean Pihet
In-Reply-To: <1314969204-21704-1-git-send-email-j-pihet@ti.com>
Hwmod is queried from the OMAP_PM layer to manage the power domains
wake-up latency constraints. Hwmod retrieves the correct power domain
and if it exists it calls the corresponding power domain function.
Tested on OMAP3 Beagleboard and OMAP4 Pandaboard in RET/OFF using wake-up
latency constraints on MPU, CORE and PER.
Signed-off-by: Jean Pihet <j-pihet@ti.com>
---
arch/arm/mach-omap2/omap_hwmod.c | 26 +++++++++++++++++++++++++-
arch/arm/plat-omap/include/plat/omap_hwmod.h | 2 ++
2 files changed, 27 insertions(+), 1 deletions(-)
diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
index 84cc0bd..c6b1cc9 100644
--- a/arch/arm/mach-omap2/omap_hwmod.c
+++ b/arch/arm/mach-omap2/omap_hwmod.c
@@ -143,6 +143,7 @@
#include "powerdomain.h"
#include <plat/clock.h>
#include <plat/omap_hwmod.h>
+#include <plat/omap_device.h>
#include <plat/prcm.h>
#include "cm2xxx_3xxx.h"
@@ -2618,11 +2619,34 @@ ohsps_unlock:
return ret;
}
+/*
+ * omap_hwmod_set_wkup_constraint- set/release a wake-up latency constraint
+ *
+ * @oh: struct omap_hwmod* to which the target device belongs to.
+ * @cookie: identifier of the constraints list for @oh.
+ * @min_latency: the minimum allowed wake-up latency for @oh.
+ *
+ * Returns 0 upon success.
+ */
+int omap_hwmod_set_wkup_lat_constraint(struct omap_hwmod *oh,
+ void *cookie, long min_latency)
+{
+ struct powerdomain *pwrdm = omap_hwmod_get_pwrdm(oh);
+
+ if (!pwrdm) {
+ pr_err("%s: Error: could not find powerdomain "
+ "for %s\n", __func__, oh->name);
+ return -EINVAL;
+ }
+
+ return pwrdm_set_wkup_lat_constraint(pwrdm, cookie, min_latency);
+}
+
/**
* omap_hwmod_get_context_loss_count - get lost context count
* @oh: struct omap_hwmod *
*
- * Query the powerdomain of of @oh to get the context loss
+ * Query the powerdomain of @oh to get the context loss
* count for this device.
*
* Returns the context loss count of the powerdomain assocated with @oh
diff --git a/arch/arm/plat-omap/include/plat/omap_hwmod.h b/arch/arm/plat-omap/include/plat/omap_hwmod.h
index 0e329ca..75e0e7a 100644
--- a/arch/arm/plat-omap/include/plat/omap_hwmod.h
+++ b/arch/arm/plat-omap/include/plat/omap_hwmod.h
@@ -603,6 +603,8 @@ int omap_hwmod_for_each_by_class(const char *classname,
void *user);
int omap_hwmod_set_postsetup_state(struct omap_hwmod *oh, u8 state);
+int omap_hwmod_set_wkup_lat_constraint(struct omap_hwmod *oh, void *cookie,
+ long min_latency);
u32 omap_hwmod_get_context_loss_count(struct omap_hwmod *oh);
int omap_hwmod_no_setup_reset(struct omap_hwmod *oh);
--
1.7.4.1
^ permalink raw reply related
* [PATCH 4/8] OMAP3: powerdomain data: add wake-up latency figures
From: Jean Pihet @ 2011-09-02 13:13 UTC (permalink / raw)
To: Kevin Hilman, Linux PM mailing list, linux-omap,
Rafael J. Wysocki, Paul
Cc: Jean Pihet
In-Reply-To: <1314969204-21704-1-git-send-email-j-pihet@ti.com>
Figures are added to the power domains structs for RET and OFF modes.
Note: the latency figures for MPU, PER, CORE, NEON have been obtained
from actual measurements.
The latency figures for the other power domains are preliminary and
shall be added.
Tested on OMAP3 Beagleboard in RET/OFF using wake-up latency constraints
on MPU, CORE and PER.
Signed-off-by: Jean Pihet <j-pihet@ti.com>
---
arch/arm/mach-omap2/powerdomains3xxx_data.c | 78 +++++++++++++++++++++++++++
1 files changed, 78 insertions(+), 0 deletions(-)
diff --git a/arch/arm/mach-omap2/powerdomains3xxx_data.c b/arch/arm/mach-omap2/powerdomains3xxx_data.c
index 469a920..32922bb 100644
--- a/arch/arm/mach-omap2/powerdomains3xxx_data.c
+++ b/arch/arm/mach-omap2/powerdomains3xxx_data.c
@@ -31,6 +31,14 @@
/*
* Powerdomains
+ *
+ * The wakeup_lat values are derived from measurements on
+ * the actual target.
+ *
+ * Note: the latency figures for MPU, PER, CORE, NEON have been obtained
+ * from actual measurements.
+ * The latency figures for the other power domains are preliminary and
+ * shall be added.
*/
static struct powerdomain iva2_pwrdm = {
@@ -52,6 +60,13 @@ static struct powerdomain iva2_pwrdm = {
[2] = PWRSTS_OFF_ON,
[3] = PWRSTS_ON,
},
+ .wakeup_lat = {
+ [PWRDM_FUNC_PWRST_OFF] = 1100,
+ [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_CSWR] = 350,
+ [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_ON] = 0,
+ },
};
static struct powerdomain mpu_3xxx_pwrdm = {
@@ -68,6 +83,13 @@ static struct powerdomain mpu_3xxx_pwrdm = {
.pwrsts_mem_on = {
[0] = PWRSTS_OFF_ON,
},
+ .wakeup_lat = {
+ [PWRDM_FUNC_PWRST_OFF] = 1830,
+ [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_CSWR] = 121,
+ [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_ON] = 0,
+ },
};
/*
@@ -98,6 +120,13 @@ static struct powerdomain core_3xxx_pre_es3_1_pwrdm = {
[0] = PWRSTS_OFF_RET_ON, /* MEM1ONSTATE */
[1] = PWRSTS_OFF_RET_ON, /* MEM2ONSTATE */
},
+ .wakeup_lat = {
+ [PWRDM_FUNC_PWRST_OFF] = 3082,
+ [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_CSWR] = 153,
+ [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_ON] = 0,
+ },
};
static struct powerdomain core_3xxx_es3_1_pwrdm = {
@@ -121,6 +150,13 @@ static struct powerdomain core_3xxx_es3_1_pwrdm = {
[0] = PWRSTS_OFF_RET_ON, /* MEM1ONSTATE */
[1] = PWRSTS_OFF_RET_ON, /* MEM2ONSTATE */
},
+ .wakeup_lat = {
+ [PWRDM_FUNC_PWRST_OFF] = 3082,
+ [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_CSWR] = 153,
+ [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_ON] = 0,
+ },
};
static struct powerdomain dss_pwrdm = {
@@ -136,6 +172,13 @@ static struct powerdomain dss_pwrdm = {
.pwrsts_mem_on = {
[0] = PWRSTS_ON, /* MEMONSTATE */
},
+ .wakeup_lat = {
+ [PWRDM_FUNC_PWRST_OFF] = 70,
+ [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_CSWR] = 20,
+ [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_ON] = 0,
+ },
};
/*
@@ -157,6 +200,13 @@ static struct powerdomain sgx_pwrdm = {
.pwrsts_mem_on = {
[0] = PWRSTS_ON, /* MEMONSTATE */
},
+ .wakeup_lat = {
+ [PWRDM_FUNC_PWRST_OFF] = 1000,
+ [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_CSWR] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_ON] = 0,
+ },
};
static struct powerdomain cam_pwrdm = {
@@ -172,6 +222,13 @@ static struct powerdomain cam_pwrdm = {
.pwrsts_mem_on = {
[0] = PWRSTS_ON, /* MEMONSTATE */
},
+ .wakeup_lat = {
+ [PWRDM_FUNC_PWRST_OFF] = 850,
+ [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_CSWR] = 35,
+ [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_ON] = 0,
+ },
};
static struct powerdomain per_pwrdm = {
@@ -187,6 +244,13 @@ static struct powerdomain per_pwrdm = {
.pwrsts_mem_on = {
[0] = PWRSTS_ON, /* MEMONSTATE */
},
+ .wakeup_lat = {
+ [PWRDM_FUNC_PWRST_OFF] = 671,
+ [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_CSWR] = 31,
+ [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_ON] = 0,
+ },
};
static struct powerdomain emu_pwrdm = {
@@ -201,6 +265,13 @@ static struct powerdomain neon_pwrdm = {
.omap_chip = OMAP_CHIP_INIT(CHIP_IS_OMAP3430),
.pwrsts = PWRSTS_OFF_RET_ON,
.pwrsts_logic_ret = PWRSTS_RET,
+ .wakeup_lat = {
+ [PWRDM_FUNC_PWRST_OFF] = 0,
+ [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_CSWR] = 0,
+ [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_ON] = 0,
+ },
};
static struct powerdomain usbhost_pwrdm = {
@@ -223,6 +294,13 @@ static struct powerdomain usbhost_pwrdm = {
.pwrsts_mem_on = {
[0] = PWRSTS_ON, /* MEMONSTATE */
},
+ .wakeup_lat = {
+ [PWRDM_FUNC_PWRST_OFF] = 800,
+ [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_CSWR] = 150,
+ [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
+ [PWRDM_FUNC_PWRST_ON] = 0,
+ },
};
static struct powerdomain dpll1_pwrdm = {
--
1.7.4.1
^ permalink raw reply related
* [PATCH 3/8] OMAP2+: powerdomain: control power domains next state
From: Jean Pihet @ 2011-09-02 13:13 UTC (permalink / raw)
To: Kevin Hilman, Linux PM mailing list, linux-omap,
Rafael J. Wysocki, Paul
Cc: Jean Pihet
In-Reply-To: <1314969204-21704-1-git-send-email-j-pihet@ti.com>
When a PM QoS device latency constraint is requested or removed the
PM QoS layer notifies the underlying layer with the updated aggregated
constraint value. The constraint is stored in the powerdomain constraints
list and then applied to the corresponding power domain.
The power domains get the next power state programmed directly in the
registers via pwrdm_wakeuplat_update_pwrst.
Tested on OMAP3 Beagleboard and OMAP4 Pandaboard in RET/OFF using
wake-up latency constraints on MPU, CORE and PER.
Signed-off-by: Jean Pihet <j-pihet@ti.com>
---
arch/arm/mach-omap2/powerdomain.c | 190 +++++++++++++++++++++++++++++++++++++
arch/arm/mach-omap2/powerdomain.h | 33 ++++++-
2 files changed, 221 insertions(+), 2 deletions(-)
diff --git a/arch/arm/mach-omap2/powerdomain.c b/arch/arm/mach-omap2/powerdomain.c
index 9af0847..afa8153 100644
--- a/arch/arm/mach-omap2/powerdomain.c
+++ b/arch/arm/mach-omap2/powerdomain.c
@@ -17,8 +17,10 @@
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/list.h>
+#include <linux/slab.h>
#include <linux/errno.h>
#include <linux/string.h>
+#include <linux/pm_qos.h>
#include <trace/events/power.h>
#include "cm2xxx_3xxx.h"
@@ -104,6 +106,11 @@ static int _pwrdm_register(struct powerdomain *pwrdm)
for (i = 0; i < pwrdm->banks; i++)
pwrdm->ret_mem_off_counter[i] = 0;
+ /* Initialize the per-od wake-up constraints list and spinlock */
+ spin_lock_init(&pwrdm->wkup_lat_plist_lock);
+ plist_head_init(&pwrdm->wkup_lat_plist_head);
+
+ /* Initialize the pwrdm state */
pwrdm_wait_transition(pwrdm);
pwrdm->state = pwrdm_read_pwrst(pwrdm);
pwrdm->state_counter[pwrdm->state] = 1;
@@ -191,6 +198,77 @@ static int _pwrdm_post_transition_cb(struct powerdomain *pwrdm, void *unused)
return 0;
}
+/**
+ * pwrdm_wakeuplat_update_pwrst - Update power domain power state if needed
+ * @pwrdm: struct powerdomain * to which requesting device belongs to.
+ * @min_latency: the allowed wake-up latency for the given power domain. A
+ * value of PM_QOS_DEV_LAT_DEFAULT_VALUE means 'no constraint' on the pwrdm.
+ *
+ * Finds the power domain next power state that fulfills the constraint.
+ * Programs a new target state if it is different from current power state.
+ * The power domains get the next power state programmed directly in the
+ * registers.
+ *
+ * Returns 0 upon success.
+ */
+static int pwrdm_wakeuplat_update_pwrst(struct powerdomain *pwrdm,
+ long min_latency)
+{
+ int ret = 0, new_state = 0;
+
+ if (!pwrdm) {
+ WARN(1, "powerdomain: %s: invalid parameter(s)", __func__);
+ return -EINVAL;
+ }
+
+ /*
+ * Apply constraints to power domains by programming
+ * the pwrdm next power state.
+ */
+
+ /* Find power state with wakeup latency < minimum constraint */
+ for (new_state = 0x0; new_state < PWRDM_MAX_PWRSTS; new_state++) {
+ if (min_latency == PM_QOS_DEV_LAT_DEFAULT_VALUE ||
+ pwrdm->wakeup_lat[new_state] <= min_latency)
+ break;
+ }
+
+ switch (new_state) {
+ case PWRDM_FUNC_PWRST_OFF:
+ new_state = PWRDM_POWER_OFF;
+ break;
+ case PWRDM_FUNC_PWRST_OSWR:
+ pwrdm_set_logic_retst(pwrdm, PWRDM_POWER_OFF);
+ new_state = PWRDM_POWER_RET;
+ break;
+ case PWRDM_FUNC_PWRST_CSWR:
+ pwrdm_set_logic_retst(pwrdm, PWRDM_POWER_RET);
+ new_state = PWRDM_POWER_RET;
+ break;
+ case PWRDM_FUNC_PWRST_INACTIVE:
+ new_state = PWRDM_POWER_INACTIVE;
+ break;
+ case PWRDM_FUNC_PWRST_ON:
+ new_state = PWRDM_POWER_ON;
+ break;
+ default:
+ pr_warn("powerdomain: requested latency constraint not "
+ "supported %s set to ON state\n", pwrdm->name);
+ new_state = PWRDM_POWER_ON;
+ break;
+ }
+
+ if (pwrdm_read_next_pwrst(pwrdm) != new_state)
+ ret = omap_set_pwrdm_state(pwrdm, new_state);
+
+ pr_debug("powerdomain: %s pwrst: curr=%d, prev=%d next=%d "
+ "min_latency=%ld, set_state=%d\n", pwrdm->name,
+ pwrdm_read_pwrst(pwrdm), pwrdm_read_prev_pwrst(pwrdm),
+ pwrdm_read_next_pwrst(pwrdm), min_latency, new_state);
+
+ return ret;
+}
+
/* Public functions */
/**
@@ -930,6 +1008,118 @@ int pwrdm_post_transition(void)
return 0;
}
+/*
+ * pwrdm_set_wkup_lat_constraint - Set/update/remove a powerdomain wakeup
+ * latency constraint and apply it
+ * @pwrdm: struct powerdomain * which the constraint applies to
+ * @cookie: constraint identifier, used for tracking.
+ * @min_latency: minimum wakeup latency constraint (in microseconds) for
+ * the given pwrdm. The value of PM_QOS_DEV_LAT_DEFAULT_VALUE removes
+ * the constraint.
+ *
+ * Tracks the constraints by @cookie.
+ * Constraint set/update: Adds a new entry to powerdomain's wake-up latency
+ * constraint list.
+ * If the constraint identifier already exists in the list, the old value is
+ * overwritten.
+ * Constraint removal: Removes the identifier's entry from powerdomain's
+ * wakeup latency constraint list.
+ *
+ * Applies the strongest constraint value for the given pwrdm by calling
+ * pwrdm_wakeuplat_update_pwrst.
+ *
+ * Returns 0 upon success or a negative value in case of error.
+ *
+ * The caller must check the validity of the parameters.
+ */
+int pwrdm_set_wkup_lat_constraint(struct powerdomain *pwrdm, void *cookie,
+ long min_latency)
+{
+ struct pwrdm_wkup_constraints_entry *user = NULL;
+ struct pwrdm_wkup_constraints_entry *tmp_user, *new_user = NULL;
+ int ret = 0, free_new_user = 0, free_node = 0;
+ long value = PM_QOS_DEV_LAT_DEFAULT_VALUE;
+ unsigned long flags;
+
+ pr_debug("powerdomain: %s: pwrdm %s, cookie=0x%p, min_latency=%ld\n",
+ __func__, pwrdm->name, cookie, min_latency);
+
+ if (min_latency != PM_QOS_DEV_LAT_DEFAULT_VALUE) {
+ new_user = kzalloc(sizeof(struct pwrdm_wkup_constraints_entry),
+ GFP_KERNEL);
+ if (!new_user) {
+ pr_err("%s: FATAL ERROR: kzalloc failed\n", __func__);
+ return -ENOMEM;
+ }
+ free_new_user = 1;
+ }
+
+ spin_lock_irqsave(&pwrdm->wkup_lat_plist_lock, flags);
+
+ /* Check if there already is a constraint for cookie */
+ plist_for_each_entry(tmp_user, &pwrdm->wkup_lat_plist_head, node) {
+ if (tmp_user->cookie == cookie) {
+ user = tmp_user;
+ break;
+ }
+ }
+
+ if (min_latency != PM_QOS_DEV_LAT_DEFAULT_VALUE) {
+ /* If nothing to update, job done */
+ if (user && (user->node.prio == min_latency))
+ goto exit_ok;
+
+ if (!user) {
+ /* Add new entry to the list */
+ user = new_user;
+ user->cookie = cookie;
+ free_new_user = 0;
+ } else {
+ /* Update existing entry */
+ plist_del(&user->node, &pwrdm->wkup_lat_plist_head);
+ }
+
+ plist_node_init(&user->node, min_latency);
+ plist_add(&user->node, &pwrdm->wkup_lat_plist_head);
+ } else {
+ if (user) {
+ /* Remove the constraint from the list */
+ plist_del(&user->node, &pwrdm->wkup_lat_plist_head);
+ free_node = 1;
+ } else {
+ /* Constraint not existing or list empty, do nothing */
+ ret = -EINVAL;
+ goto exit_error;
+ }
+
+ }
+
+exit_ok:
+ /* Find the strongest constraint from the list */
+ if (!plist_head_empty(&pwrdm->wkup_lat_plist_head))
+ value = plist_first(&pwrdm->wkup_lat_plist_head)->prio;
+
+ spin_unlock_irqrestore(&pwrdm->wkup_lat_plist_lock, flags);
+
+ if (free_node)
+ kfree(user);
+
+ if (free_new_user)
+ kfree(new_user);
+
+ /* Apply the constraint to the pwrdm */
+ pr_debug("powerdomain: %s: pwrdm %s, value=%ld\n",
+ __func__, pwrdm->name, value);
+ pwrdm_wakeuplat_update_pwrst(pwrdm, value);
+
+ return 0;
+
+exit_error:
+ spin_unlock_irqrestore(&pwrdm->wkup_lat_plist_lock, flags);
+
+ return ret;
+}
+
/**
* pwrdm_get_context_loss_count - get powerdomain's context loss count
* @pwrdm: struct powerdomain * to wait for
diff --git a/arch/arm/mach-omap2/powerdomain.h b/arch/arm/mach-omap2/powerdomain.h
index d23d979..f2b0ed7 100644
--- a/arch/arm/mach-omap2/powerdomain.h
+++ b/arch/arm/mach-omap2/powerdomain.h
@@ -19,7 +19,9 @@
#include <linux/types.h>
#include <linux/list.h>
-
+#include <linux/plist.h>
+#include <linux/mutex.h>
+#include <linux/spinlock.h>
#include <linux/atomic.h>
#include <plat/cpu.h>
@@ -43,6 +45,16 @@
#define PWRSTS_RET_ON (PWRSTS_RET | PWRSTS_ON)
#define PWRSTS_OFF_RET_ON (PWRSTS_OFF_RET | PWRSTS_ON)
+/* Powerdomain functional power states */
+#define PWRDM_FUNC_PWRST_OFF 0x0
+#define PWRDM_FUNC_PWRST_OSWR 0x1
+#define PWRDM_FUNC_PWRST_CSWR 0x2
+#define PWRDM_FUNC_PWRST_INACTIVE 0x3
+#define PWRDM_FUNC_PWRST_ON 0x4
+
+#define PWRDM_MAX_FUNC_PWRSTS 5
+
+#define UNSUP_STATE -1
/* Powerdomain flags */
#define PWRDM_HAS_HDWR_SAR (1 << 0) /* hardware save-and-restore support */
@@ -93,7 +105,12 @@ struct powerdomain;
* @state_counter:
* @timer:
* @state_timer:
- *
+ * @wakeup_lat: wakeup latencies (in us) for possible powerdomain power states
+ * Note about the wakeup latencies ordering: the values must be sorted
+ * in decremental order
+ * @wkup_lat_plist_head: pwrdm wake-up latency constraints list
+ * @wkup_lat_plist_lock: spinlock that protects the constraints lists
+ * domains states
* @prcm_partition possible values are defined in mach-omap2/prcm44xx.h.
*/
struct powerdomain {
@@ -118,6 +135,15 @@ struct powerdomain {
s64 timer;
s64 state_timer[PWRDM_MAX_PWRSTS];
#endif
+ const u32 wakeup_lat[PWRDM_MAX_FUNC_PWRSTS];
+ struct plist_head wkup_lat_plist_head;
+ spinlock_t wkup_lat_plist_lock;
+};
+
+/* Linked list for the wake-up latency constraints */
+struct pwrdm_wkup_constraints_entry {
+ void *cookie;
+ struct plist_node node;
};
/**
@@ -207,6 +233,9 @@ int pwrdm_clkdm_state_switch(struct clockdomain *clkdm);
int pwrdm_pre_transition(void);
int pwrdm_post_transition(void);
int pwrdm_set_lowpwrstchange(struct powerdomain *pwrdm);
+
+int pwrdm_set_wkup_lat_constraint(struct powerdomain *pwrdm, void *cookie,
+ long min_latency);
u32 pwrdm_get_context_loss_count(struct powerdomain *pwrdm);
bool pwrdm_can_ever_lose_context(struct powerdomain *pwrdm);
--
1.7.4.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox