* Re: [PATCH 00/10] mm: Linux VM Infrastructure to support Memory Power Management
From: Vaidyanathan Srinivasan @ 2011-07-06 16:41 UTC (permalink / raw)
To: Pekka Enberg
Cc: KAMEZAWA Hiroyuki, linux-kernel, Dave Hansen, linux-mm,
thomas.abraham, Ankita Garg, linux-pm, Paul E. McKenney,
Christoph Lameter, linux-arm-kernel, Arjan van de Ven
In-Reply-To: <CAOJsxLHQP=-srK_uYYBsPb7+rUBnPZG7bzwtCd-rRaQa4ikUFg@mail.gmail.com>
* Pekka Enberg <penberg@kernel.org> [2011-07-06 11:45:41]:
> Hi Ankita,
>
> [ I don't really know anything about memory power management but
> here's my two cents since you asked for it. ]
>
> On Wed, Jun 29, 2011 at 4:00 PM, Ankita Garg <ankita@in.ibm.com> wrote:
> > I) Dynamic Power Transition
> >
> > The goal here is to ensure that as much as possible, on an idle system,
> > the memory references do not get spread across the entire RAM, a problem
> > similar to memory fragmentation. The proposed approach is as below:
> >
> > 1) One of the first things is to ensure that the memory allocations do
> > not spill over to more number of regions. Thus the allocator needs to
> > be aware of the address boundary of the different regions.
>
> Why does the allocator need to know about address boundaries? Why
> isn't it enough to make the page allocator and reclaim policies favor using
> memory from lower addresses as aggressively as possible? That'd mean
> we'd favor the first memory banks and could keep the remaining ones
> powered off as much as possible.
Yes, this will work to a limited extent when we have few regions to
account for. However if applications start and stop leaving large
holes in the address map, it may not worth the effort of migrating
pages to lower addresses to pack the holes.
> IOW, why do we need to support scenarios such as this:
>
> bank 0 bank 1 bank 2 bank3
> | online | offline | online | offline |
>
> instead of using memory compaction and possibly something like the
> SLUB defragmentation patches to turn the memory map into this:
>
> bank 0 bank 1 bank 2 bank3
> | online | online | offline | offline |
Yes, this is what we need, but also have a notion of how many pages
are used in each bank so that we can pack pages from under utilized
banks into a reasonably used bank and thereby free more banks.
Freeing more banks + clustering all used or free banks gives us more
power saving benefits.
> > 2) At the time of allocation, before spilling over allocations to the
> > next logical region, the allocator needs to make a best attempt to
> > reclaim some memory from within the existing region itself first. The
> > reclaim here needs to be in LRU order within the region. However, if
> > it is ascertained that the reclaim would take a lot of time, like there
> > are quite a fe write-backs needed, then we can spill over to the next
> > memory region (just like our NUMA node allocation policy now).
>
> I think a much more important question is what happens _after_ we've
> allocated and free'd tons of memory few times. AFAICT, memory
> regions don't help with that kind of fragmentation that will eventually
> happen anyway.
Memory regions allow us to have a zone per-region. This helps in the
cases were allocations are fragments into multiple regions by
potentially reclaiming very low utilized regions and packing the pages
into higher utilized regions. The requirement is a standard
de-fragmentation approach, except that the cluster of allocations
should fall within a region (any region) as much as possible.
--Vaidy
^ permalink raw reply
* Re: [PATCH 00/10] mm: Linux VM Infrastructure to support Memory Power Management
From: Vaidyanathan Srinivasan @ 2011-07-06 16:50 UTC (permalink / raw)
To: Pekka Enberg
Cc: KAMEZAWA Hiroyuki, linux-kernel, Dave Hansen, linux-mm,
thomas.abraham, Ankita Garg, linux-pm, Paul E. McKenney,
Christoph Lameter, linux-arm-kernel, Arjan van de Ven
In-Reply-To: <CAOJsxLF0me+=Rk8RnxNS=9=_pmwwAntu1c930F6ySEUD2zZkGw@mail.gmail.com>
* Pekka Enberg <penberg@kernel.org> [2011-07-06 12:01:45]:
> On Wed, Jul 6, 2011 at 11:45 AM, Pekka Enberg <penberg@kernel.org> wrote:
> > Hi Ankita,
> >
> > [ I don't really know anything about memory power management but
> > here's my two cents since you asked for it. ]
> >
> > On Wed, Jun 29, 2011 at 4:00 PM, Ankita Garg <ankita@in.ibm.com> wrote:
> >> I) Dynamic Power Transition
> >>
> >> The goal here is to ensure that as much as possible, on an idle system,
> >> the memory references do not get spread across the entire RAM, a problem
> >> similar to memory fragmentation. The proposed approach is as below:
> >>
> >> 1) One of the first things is to ensure that the memory allocations do
> >> not spill over to more number of regions. Thus the allocator needs to
> >> be aware of the address boundary of the different regions.
> >
> > Why does the allocator need to know about address boundaries? Why
> > isn't it enough to make the page allocator and reclaim policies favor using
> > memory from lower addresses as aggressively as possible? That'd mean
> > we'd favor the first memory banks and could keep the remaining ones
> > powered off as much as possible.
> >
> > IOW, why do we need to support scenarios such as this:
> >
> > bank 0 bank 1 bank 2 bank3
> > | online | offline | online | offline |
> >
> > instead of using memory compaction and possibly something like the
> > SLUB defragmentation patches to turn the memory map into this:
> >
> > bank 0 bank 1 bank 2 bank3
> > | online | online | offline | offline |
> >
> >> 2) At the time of allocation, before spilling over allocations to the
> >> next logical region, the allocator needs to make a best attempt to
> >> reclaim some memory from within the existing region itself first. The
> >> reclaim here needs to be in LRU order within the region. However, if
> >> it is ascertained that the reclaim would take a lot of time, like there
> >> are quite a fe write-backs needed, then we can spill over to the next
> >> memory region (just like our NUMA node allocation policy now).
> >
> > I think a much more important question is what happens _after_ we've
> > allocated and free'd tons of memory few times. AFAICT, memory
> > regions don't help with that kind of fragmentation that will eventually
> > happen anyway.
>
> Btw, I'd also decouple the 'memory map' required for PASR from
> memory region data structure and use page allocator hooks for letting
> the PASR driver know about allocated and unallocated memory. That
> way the PASR driver could automatically detect if full banks are
> unused and power them off. That'd make memory power management
> transparent to the VM regardless of whether we're using hardware or
> software poweroff.
Having a 'memory map' to track blocks of memory for the purpose of
exploiting PASR is a good alternative. However having the notion of
regions allows us to free more such banks and probably reclaim last
few pages in a block and mark it free. A method to keep allocations
within a block and also reclaim all pages from certain blocks will
improve PASR usage apart from aiding other use cases.
Without affecting allocation and reclaim, the tracking part can be
implemented in a less intrusive way, but it will be great to design
a less intrusive method to bias the allocations and reclaims as well.
--Vaidy
^ permalink raw reply
* [PATCH] PM / Hibernate: Fix free_unnecessary_pages()
From: Rafael J. Wysocki @ 2011-07-06 18:46 UTC (permalink / raw)
To: Linux PM mailing list; +Cc: LKML, Matthew Garrett
From: Rafael J. Wysocki <rjw@sisk.pl>
There is a bug in free_unnecessary_pages() that causes it to
attempt to free too many pages in some cases, which triggers the
BUG_ON() in memory_bm_clear_bit() for copy_bm. Namely, if
count_data_pages() is initially greater than alloc_normal, we get
to_free_normal equal to 0 and "save" greater from 0. In that case,
if the sum of "save" and count_highmem_pages() is greater than
alloc_highmem, we subtract a positive number from to_free_normal.
Hence, since to_free_normal was 0 before the subtraction and is
an unsigned int, the result is converted to a huge positive number
that is used as the number of pages to free.
Fix this bug by checking if to_free_normal is actually greater
than or equal to the number we're going to subtract from it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-and-tested-by: Matthew Garrett <mjg@redhat.com>
Cc: stable@kernel.org
---
kernel/power/snapshot.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1211,7 +1211,11 @@ static void free_unnecessary_pages(void)
to_free_highmem = alloc_highmem - save;
} else {
to_free_highmem = 0;
- to_free_normal -= save - alloc_highmem;
+ save -= alloc_highmem;
+ if (to_free_normal > save)
+ to_free_normal -= save;
+ else
+ to_free_normal = 0;
}
memory_bm_position_reset(©_bm);
^ permalink raw reply
* Re: [PATCH 00/10] mm: Linux VM Infrastructure to support Memory Power Management
From: david @ 2011-07-06 20:20 UTC (permalink / raw)
To: Pekka Enberg
Cc: Paul E. McKenney, KAMEZAWA Hiroyuki, linux-kernel, Dave Hansen,
linux-mm, thomas.abraham, Ankita Garg, linux-pm,
Christoph Lameter, linux-arm-kernel, Arjan van de Ven
In-Reply-To: <CAOJsxLHQP=-srK_uYYBsPb7+rUBnPZG7bzwtCd-rRaQa4ikUFg@mail.gmail.com>
On Wed, 6 Jul 2011, Pekka Enberg wrote:
> Why does the allocator need to know about address boundaries? Why
> isn't it enough to make the page allocator and reclaim policies favor using
> memory from lower addresses as aggressively as possible? That'd mean
> we'd favor the first memory banks and could keep the remaining ones
> powered off as much as possible.
>
> IOW, why do we need to support scenarios such as this:
>
> bank 0 bank 1 bank 2 bank3
> | online | offline | online | offline |
I believe that there are memory allocations that cannot be moved after
they are made (think about regions allocated to DMA from hardware where
the hardware has already been given the address space to DMA into)
As a result, you may not be able to take bank 2 offline, so your option is
to either leave banks 0-2 all online, or support emptying bank 1 and
taking it offline.
David Lang
^ permalink raw reply
* [RFC] Unprepare callback for cpuidle_device
From: asinghal @ 2011-07-06 20:23 UTC (permalink / raw)
To: linux-pm; +Cc: johlstei
We plan to use high resolution timers in one of our modules, with the
requirement that we cancel these timers when the cpu goes idle and restart
them when the cpu comes out of idle.
We are cancelling the timers in cpuidle prepare callback. The problem is
that if the need_resched() call in drivers/cpuidle/cpuidle.c returns true,
how do we restart the timer? If the call returns false, we can restart the
timer in the cpuidle enter callback.
The solution to the problem that we have in mind is adding an unprepare
callback to the cpuidle_device struct, and calling it if needs_resched()
returns true. Another option is to implement deferred timers for hrtimers.
Which of the two options is the better solution, or is there another
feasible alternative?
sincerely,
Amar Singhal
^ permalink raw reply
* [PATCH 0/5 v1] PM / Domains: Generic PM domains improvements
From: Rafael J. Wysocki @ 2011-07-06 20:48 UTC (permalink / raw)
To: Linux PM mailing list; +Cc: Greg KH, LKML, MyungJoo Ham
Hi,
For the last few days I've been working on modifications of the generic
PM domains code allowing .start_device()/.stop_device() and drivers'
.runtime_suspend()/.runtime_resume() callbacks to execute things like
pm_runtime_resume() for other devices in the same PM domain safely
without deadlocking (the original motivation was a console driver on
shmobile that pm_runtime_get_*()/pm_runtime_put_*() are called for from
the other devices' .runtime_suspend()/.runtime_resume() callbacks).
I think I solved that problem, but in the process I found a few places
where fixes and/on improvements were necessary. The patches in this
series address those issues. They are on top of the branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git pm-domains
Patches [1-2/5] are regarded as 3.1 material, the others not necessarily,
but I'd like to push them for 3.1 too if there are no complaints.
Thanks,
Rafael
^ permalink raw reply
* [PATCH 1/5 v1] PM / Domains: Set device state to "active" during system resume
From: Rafael J. Wysocki @ 2011-07-06 20:50 UTC (permalink / raw)
To: Linux PM mailing list; +Cc: Greg KH, LKML, MyungJoo Ham
In-Reply-To: <201107062248.35586.rjw@sisk.pl>
From: Rafael J. Wysocki <rjw@sisk.pl>
The runtime PM status of devices in a power domain that is not
powered off in pm_genpd_complete() should be set to "active", because
those devices are operational at this point. Some of them may not be
in use, though, so make pm_genpd_complete() call pm_runtime_idle()
in addition to pm_runtime_set_active() for each of them.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
drivers/base/power/domain.c | 2 ++
1 file changed, 2 insertions(+)
Index: linux-2.6/drivers/base/power/domain.c
===================================================================
--- linux-2.6.orig/drivers/base/power/domain.c
+++ linux-2.6/drivers/base/power/domain.c
@@ -786,7 +786,9 @@ static void pm_genpd_complete(struct dev
if (run_complete) {
pm_generic_complete(dev);
+ pm_runtime_set_active(dev);
pm_runtime_enable(dev);
+ pm_runtime_idle(dev);
}
}
^ permalink raw reply
* [PATCH 2/5 v1] PM / Domains: Make failing pm_genpd_prepare() clean up properly
From: Rafael J. Wysocki @ 2011-07-06 20:51 UTC (permalink / raw)
To: Linux PM mailing list; +Cc: Greg KH, LKML, MyungJoo Ham
In-Reply-To: <201107062248.35586.rjw@sisk.pl>
From: Rafael J. Wysocki <rjw@sisk.pl>
If pm_generic_prepare() in pm_genpd_prepare() returns error code,
the PM domains counter of "prepared" devices should be decremented
and its suspend_power_off flag should be reset if this counter drops
down to zero. Otherwise, the PM domain runtime PM code will not
handle the domain correctly (it will permanently think that system
suspend is in progress).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
drivers/base/power/domain.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
Index: linux-2.6/drivers/base/power/domain.c
===================================================================
--- linux-2.6.orig/drivers/base/power/domain.c
+++ linux-2.6/drivers/base/power/domain.c
@@ -367,6 +367,7 @@ static void pm_genpd_sync_poweroff(struc
static int pm_genpd_prepare(struct device *dev)
{
struct generic_pm_domain *genpd;
+ int ret;
dev_dbg(dev, "%s()\n", __func__);
@@ -400,7 +401,16 @@ static int pm_genpd_prepare(struct devic
mutex_unlock(&genpd->lock);
- return pm_generic_prepare(dev);
+ ret = pm_generic_prepare(dev);
+ if (ret) {
+ mutex_lock(&genpd->lock);
+
+ if (--genpd->prepared_count == 0)
+ genpd->suspend_power_off = false;
+
+ mutex_unlock(&genpd->lock);
+ }
+ return ret;
}
/**
^ permalink raw reply
* [PATCH 3/5 v1] PM / Domains: Rework locking
From: Rafael J. Wysocki @ 2011-07-06 20:53 UTC (permalink / raw)
To: Linux PM mailing list; +Cc: Greg KH, LKML, MyungJoo Ham
In-Reply-To: <201107062248.35586.rjw@sisk.pl>
From: Rafael J. Wysocki <rjw@sisk.pl>
CUrrently, the .start_device() and .stop_device() callbacks from
struct generic_pm_domain() as well as the device drivers' runtime PM
callbacks used by the generic PM domains code are executed under
the generic PM domain lock. This, unfortunately, is prone to
deadlocks, for example if a device and its parent are boths members
of the same PM domain. For this reason, it would be better if the
PM domains code didn't execute device callbacks under the lock.
Rework the locking in the generic PM domains code so that the lock
is dropped for the execution of device callbacks. To this end,
introduce PM domains states reflecting the current status of a PM
domain and such that the PM domain lock cannot be acquired if the
status is GPD_STATE_BUSY. Make threads attempting to acquire a PM
domain's lock wait until the status changes to either
GPD_STATE_ACTIVE or GPD_STATE_POWER_OFF.
This change by itself doesn't fix the deadlock problem mentioned
above, but the mechanism introduced by it will be used for for this
purpose by a subsequent patch.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
drivers/base/power/domain.c | 249 ++++++++++++++++++++++++++++++--------------
include/linux/pm_domain.h | 10 +
2 files changed, 181 insertions(+), 78 deletions(-)
Index: linux-2.6/include/linux/pm_domain.h
===================================================================
--- linux-2.6.orig/include/linux/pm_domain.h
+++ linux-2.6/include/linux/pm_domain.h
@@ -11,8 +11,11 @@
#include <linux/device.h>
-#define GPD_IN_SUSPEND 1
-#define GPD_POWER_OFF 2
+enum gpd_status {
+ GPD_STATE_ACTIVE = 0, /* PM domain is active */
+ GPD_STATE_BUSY, /* Something is happening to the PM domain */
+ GPD_STATE_POWER_OFF, /* PM domain is off */
+};
struct dev_power_governor {
bool (*power_down_ok)(struct dev_pm_domain *domain);
@@ -29,7 +32,8 @@ struct generic_pm_domain {
struct work_struct power_off_work;
unsigned int in_progress; /* Number of devices being suspended now */
unsigned int sd_count; /* Number of subdomains with power "on" */
- bool power_is_off; /* Whether or not power has been removed */
+ enum gpd_status status; /* Current state of the domain */
+ wait_queue_head_t status_wait_queue;
unsigned int device_count; /* Number of devices */
unsigned int suspended_count; /* System suspend device counter */
unsigned int prepared_count; /* Suspend counter of prepared devices */
Index: linux-2.6/drivers/base/power/domain.c
===================================================================
--- linux-2.6.orig/drivers/base/power/domain.c
+++ linux-2.6/drivers/base/power/domain.c
@@ -13,6 +13,8 @@
#include <linux/pm_domain.h>
#include <linux/slab.h>
#include <linux/err.h>
+#include <linux/sched.h>
+#include <linux/suspend.h>
#ifdef CONFIG_PM
@@ -30,6 +32,34 @@ static void genpd_sd_counter_dec(struct
genpd->sd_count--;
}
+static void genpd_acquire_lock(struct generic_pm_domain *genpd)
+{
+ DEFINE_WAIT(wait);
+
+ mutex_lock(&genpd->lock);
+ /*
+ * Wait for the domain to transition into either the active,
+ * or the power off state.
+ */
+ for (;;) {
+ prepare_to_wait(&genpd->status_wait_queue, &wait,
+ TASK_UNINTERRUPTIBLE);
+ if (genpd->status != GPD_STATE_BUSY)
+ break;
+ mutex_unlock(&genpd->lock);
+
+ schedule();
+
+ mutex_lock(&genpd->lock);
+ }
+ finish_wait(&genpd->status_wait_queue, &wait);
+}
+
+static void genpd_release_lock(struct generic_pm_domain *genpd)
+{
+ mutex_unlock(&genpd->lock);
+}
+
/**
* pm_genpd_poweron - Restore power to a given PM domain and its parents.
* @genpd: PM domain to power up.
@@ -39,22 +69,50 @@ static void genpd_sd_counter_dec(struct
*/
static int pm_genpd_poweron(struct generic_pm_domain *genpd)
{
+ struct generic_pm_domain *parent = genpd->parent;
+ DEFINE_WAIT(wait);
int ret = 0;
start:
- if (genpd->parent)
- mutex_lock(&genpd->parent->lock);
- mutex_lock_nested(&genpd->lock, SINGLE_DEPTH_NESTING);
+ if (parent) {
+ mutex_lock(&parent->lock);
+ mutex_lock_nested(&genpd->lock, SINGLE_DEPTH_NESTING);
+ } else {
+ mutex_lock(&genpd->lock);
+ }
+ /*
+ * Wait for the domain to transition into either the active,
+ * or the power off state.
+ */
+ for (;;) {
+ prepare_to_wait(&genpd->status_wait_queue, &wait,
+ TASK_UNINTERRUPTIBLE);
+ if (genpd->status != GPD_STATE_BUSY)
+ break;
+ mutex_unlock(&genpd->lock);
+ if (parent)
+ mutex_unlock(&parent->lock);
- if (!genpd->power_is_off
+ schedule();
+
+ if (parent) {
+ mutex_lock(&parent->lock);
+ mutex_lock_nested(&genpd->lock, SINGLE_DEPTH_NESTING);
+ } else {
+ mutex_lock(&genpd->lock);
+ }
+ }
+ finish_wait(&genpd->status_wait_queue, &wait);
+
+ if (genpd->status == GPD_STATE_ACTIVE
|| (genpd->prepared_count > 0 && genpd->suspend_power_off))
goto out;
- if (genpd->parent && genpd->parent->power_is_off) {
+ if (parent && parent->status != GPD_STATE_ACTIVE) {
mutex_unlock(&genpd->lock);
- mutex_unlock(&genpd->parent->lock);
+ mutex_unlock(&parent->lock);
- ret = pm_genpd_poweron(genpd->parent);
+ ret = pm_genpd_poweron(parent);
if (ret)
return ret;
@@ -67,14 +125,14 @@ static int pm_genpd_poweron(struct gener
goto out;
}
- genpd->power_is_off = false;
- if (genpd->parent)
- genpd->parent->sd_count++;
+ genpd->status = GPD_STATE_ACTIVE;
+ if (parent)
+ parent->sd_count++;
out:
mutex_unlock(&genpd->lock);
- if (genpd->parent)
- mutex_unlock(&genpd->parent->lock);
+ if (parent)
+ mutex_unlock(&parent->lock);
return ret;
}
@@ -90,6 +148,7 @@ static int pm_genpd_poweron(struct gener
*/
static int __pm_genpd_save_device(struct dev_list_entry *dle,
struct generic_pm_domain *genpd)
+ __releases(&genpd->lock) __acquires(&genpd->lock)
{
struct device *dev = dle->dev;
struct device_driver *drv = dev->driver;
@@ -98,6 +157,8 @@ static int __pm_genpd_save_device(struct
if (dle->need_restore)
return 0;
+ mutex_unlock(&genpd->lock);
+
if (drv && drv->pm && drv->pm->runtime_suspend) {
if (genpd->start_device)
genpd->start_device(dev);
@@ -108,6 +169,8 @@ static int __pm_genpd_save_device(struct
genpd->stop_device(dev);
}
+ mutex_lock(&genpd->lock);
+
if (!ret)
dle->need_restore = true;
@@ -121,6 +184,7 @@ static int __pm_genpd_save_device(struct
*/
static void __pm_genpd_restore_device(struct dev_list_entry *dle,
struct generic_pm_domain *genpd)
+ __releases(&genpd->lock) __acquires(&genpd->lock)
{
struct device *dev = dle->dev;
struct device_driver *drv = dev->driver;
@@ -128,6 +192,8 @@ static void __pm_genpd_restore_device(st
if (!dle->need_restore)
return;
+ mutex_unlock(&genpd->lock);
+
if (drv && drv->pm && drv->pm->runtime_resume) {
if (genpd->start_device)
genpd->start_device(dev);
@@ -138,6 +204,8 @@ static void __pm_genpd_restore_device(st
genpd->stop_device(dev);
}
+ mutex_lock(&genpd->lock);
+
dle->need_restore = false;
}
@@ -150,13 +218,14 @@ static void __pm_genpd_restore_device(st
* the @genpd's devices' drivers and remove power from @genpd.
*/
static int pm_genpd_poweroff(struct generic_pm_domain *genpd)
+ __releases(&genpd->lock) __acquires(&genpd->lock)
{
struct generic_pm_domain *parent;
struct dev_list_entry *dle;
unsigned int not_suspended;
int ret;
- if (genpd->power_is_off || genpd->prepared_count > 0)
+ if (genpd->status == GPD_STATE_POWER_OFF || genpd->prepared_count > 0)
return 0;
if (genpd->sd_count > 0)
@@ -175,22 +244,36 @@ static int pm_genpd_poweroff(struct gene
return -EAGAIN;
}
+ genpd->status = GPD_STATE_BUSY;
+
list_for_each_entry_reverse(dle, &genpd->dev_list, node) {
ret = __pm_genpd_save_device(dle, genpd);
if (ret)
goto err_dev;
}
+ mutex_unlock(&genpd->lock);
+
+ parent = genpd->parent;
+ if (parent) {
+ genpd_acquire_lock(parent);
+ mutex_lock_nested(&genpd->lock, SINGLE_DEPTH_NESTING);
+ } else {
+ mutex_lock(&genpd->lock);
+ }
+
if (genpd->power_off)
genpd->power_off(genpd);
- genpd->power_is_off = true;
+ genpd->status = GPD_STATE_POWER_OFF;
+ wake_up_all(&genpd->status_wait_queue);
- parent = genpd->parent;
if (parent) {
genpd_sd_counter_dec(parent);
if (parent->sd_count == 0)
queue_work(pm_wq, &parent->power_off_work);
+
+ genpd_release_lock(parent);
}
return 0;
@@ -199,6 +282,9 @@ static int pm_genpd_poweroff(struct gene
list_for_each_entry_continue(dle, &genpd->dev_list, node)
__pm_genpd_restore_device(dle, genpd);
+ genpd->status = GPD_STATE_ACTIVE;
+ wake_up_all(&genpd->status_wait_queue);
+
return ret;
}
@@ -212,13 +298,9 @@ static void genpd_power_off_work_fn(stru
genpd = container_of(work, struct generic_pm_domain, power_off_work);
- if (genpd->parent)
- mutex_lock(&genpd->parent->lock);
- mutex_lock_nested(&genpd->lock, SINGLE_DEPTH_NESTING);
+ genpd_acquire_lock(genpd);
pm_genpd_poweroff(genpd);
- mutex_unlock(&genpd->lock);
- if (genpd->parent)
- mutex_unlock(&genpd->parent->lock);
+ genpd_release_lock(genpd);
}
/**
@@ -239,23 +321,17 @@ static int pm_genpd_runtime_suspend(stru
if (IS_ERR(genpd))
return -EINVAL;
- if (genpd->parent)
- mutex_lock(&genpd->parent->lock);
- mutex_lock_nested(&genpd->lock, SINGLE_DEPTH_NESTING);
-
if (genpd->stop_device) {
int ret = genpd->stop_device(dev);
if (ret)
- goto out;
+ return ret;
}
+
+ genpd_acquire_lock(genpd);
genpd->in_progress++;
pm_genpd_poweroff(genpd);
genpd->in_progress--;
-
- out:
- mutex_unlock(&genpd->lock);
- if (genpd->parent)
- mutex_unlock(&genpd->parent->lock);
+ genpd_release_lock(genpd);
return 0;
}
@@ -276,9 +352,6 @@ static void __pm_genpd_runtime_resume(st
break;
}
}
-
- if (genpd->start_device)
- genpd->start_device(dev);
}
/**
@@ -304,9 +377,15 @@ static int pm_genpd_runtime_resume(struc
if (ret)
return ret;
- mutex_lock(&genpd->lock);
+ genpd_acquire_lock(genpd);
+ genpd->status = GPD_STATE_BUSY;
__pm_genpd_runtime_resume(dev, genpd);
- mutex_unlock(&genpd->lock);
+ genpd->status = GPD_STATE_ACTIVE;
+ wake_up_all(&genpd->status_wait_queue);
+ genpd_release_lock(genpd);
+
+ if (genpd->start_device)
+ genpd->start_device(dev);
return 0;
}
@@ -339,7 +418,7 @@ static void pm_genpd_sync_poweroff(struc
{
struct generic_pm_domain *parent = genpd->parent;
- if (genpd->power_is_off)
+ if (genpd->status == GPD_STATE_POWER_OFF)
return;
if (genpd->suspended_count != genpd->device_count || genpd->sd_count > 0)
@@ -348,7 +427,7 @@ static void pm_genpd_sync_poweroff(struc
if (genpd->power_off)
genpd->power_off(genpd);
- genpd->power_is_off = true;
+ genpd->status = GPD_STATE_POWER_OFF;
if (parent) {
genpd_sd_counter_dec(parent);
pm_genpd_sync_poweroff(parent);
@@ -375,33 +454,35 @@ static int pm_genpd_prepare(struct devic
if (IS_ERR(genpd))
return -EINVAL;
- mutex_lock(&genpd->lock);
+ genpd_acquire_lock(genpd);
if (genpd->prepared_count++ == 0)
- genpd->suspend_power_off = genpd->power_is_off;
+ genpd->suspend_power_off = genpd->status == GPD_STATE_POWER_OFF;
- if (genpd->suspend_power_off) {
- mutex_unlock(&genpd->lock);
- return 0;
- }
+ genpd_release_lock(genpd);
- /*
- * If the device is in the (runtime) "suspended" state, call
- * .start_device() for it, if defined.
- */
- if (pm_runtime_suspended(dev))
- __pm_genpd_runtime_resume(dev, genpd);
+ if (genpd->suspend_power_off)
+ return 0;
/*
- * Do not check if runtime resume is pending at this point, because it
- * has been taken care of already and if pm_genpd_poweron() ran at this
- * point as a result of the check, it would deadlock.
+ * The PM domain must be in the GPD_STATE_ACTIVE state at this point,
+ * so pm_genpd_poweron() will return immediately, but if the device
+ * is suspended (e.g. it's been stopped by .stop_device()), we need
+ * to make it operational. However, we need to prevent already
+ * received wakeup events from being lost too.
*/
- __pm_runtime_disable(dev, false);
+ pm_runtime_get_noresume(dev);
+ if (pm_runtime_barrier(dev) && device_may_wakeup(dev))
+ pm_wakeup_event(dev, 0);
+
+ if (pm_wakeup_pending()) {
+ ret = -EBUSY;
+ } else {
+ pm_runtime_resume(dev);
+ __pm_runtime_disable(dev, false);
- mutex_unlock(&genpd->lock);
-
- ret = pm_generic_prepare(dev);
+ ret = pm_generic_prepare(dev);
+ }
if (ret) {
mutex_lock(&genpd->lock);
@@ -410,6 +491,8 @@ static int pm_genpd_prepare(struct devic
mutex_unlock(&genpd->lock);
}
+
+ pm_runtime_put_sync(dev);
return ret;
}
@@ -726,7 +809,7 @@ static int pm_genpd_restore_noirq(struct
* guaranteed that this function will never run twice in parallel for
* the same PM domain, so it is not necessary to use locking here.
*/
- genpd->power_is_off = true;
+ genpd->status = GPD_STATE_POWER_OFF;
if (genpd->suspend_power_off) {
/*
* The boot kernel might put the domain into the power on state,
@@ -836,9 +919,9 @@ int pm_genpd_add_device(struct generic_p
if (IS_ERR_OR_NULL(genpd) || IS_ERR_OR_NULL(dev))
return -EINVAL;
- mutex_lock(&genpd->lock);
+ genpd_acquire_lock(genpd);
- if (genpd->power_is_off) {
+ if (genpd->status == GPD_STATE_POWER_OFF) {
ret = -EINVAL;
goto out;
}
@@ -870,7 +953,7 @@ int pm_genpd_add_device(struct generic_p
spin_unlock_irq(&dev->power.lock);
out:
- mutex_unlock(&genpd->lock);
+ genpd_release_lock(genpd);
return ret;
}
@@ -891,7 +974,7 @@ int pm_genpd_remove_device(struct generi
if (IS_ERR_OR_NULL(genpd) || IS_ERR_OR_NULL(dev))
return -EINVAL;
- mutex_lock(&genpd->lock);
+ genpd_acquire_lock(genpd);
if (genpd->prepared_count > 0) {
ret = -EAGAIN;
@@ -915,7 +998,7 @@ int pm_genpd_remove_device(struct generi
}
out:
- mutex_unlock(&genpd->lock);
+ genpd_release_lock(genpd);
return ret;
}
@@ -934,9 +1017,19 @@ int pm_genpd_add_subdomain(struct generi
if (IS_ERR_OR_NULL(genpd) || IS_ERR_OR_NULL(new_subdomain))
return -EINVAL;
- mutex_lock(&genpd->lock);
+ start:
+ genpd_acquire_lock(genpd);
+ mutex_lock_nested(&new_subdomain->lock, SINGLE_DEPTH_NESTING);
+
+ if (new_subdomain->status != GPD_STATE_POWER_OFF
+ && new_subdomain->status != GPD_STATE_ACTIVE) {
+ mutex_unlock(&new_subdomain->lock);
+ genpd_release_lock(genpd);
+ goto start;
+ }
- if (genpd->power_is_off && !new_subdomain->power_is_off) {
+ if (genpd->status == GPD_STATE_POWER_OFF
+ && new_subdomain->status != GPD_STATE_POWER_OFF) {
ret = -EINVAL;
goto out;
}
@@ -948,17 +1041,14 @@ int pm_genpd_add_subdomain(struct generi
}
}
- mutex_lock_nested(&new_subdomain->lock, SINGLE_DEPTH_NESTING);
-
list_add_tail(&new_subdomain->sd_node, &genpd->sd_list);
new_subdomain->parent = genpd;
- if (!subdomain->power_is_off)
+ if (subdomain->status != GPD_STATE_POWER_OFF)
genpd->sd_count++;
- mutex_unlock(&new_subdomain->lock);
-
out:
- mutex_unlock(&genpd->lock);
+ mutex_unlock(&new_subdomain->lock);
+ genpd_release_lock(genpd);
return ret;
}
@@ -977,7 +1067,8 @@ int pm_genpd_remove_subdomain(struct gen
if (IS_ERR_OR_NULL(genpd) || IS_ERR_OR_NULL(target))
return -EINVAL;
- mutex_lock(&genpd->lock);
+ start:
+ genpd_acquire_lock(genpd);
list_for_each_entry(subdomain, &genpd->sd_list, sd_node) {
if (subdomain != target)
@@ -985,9 +1076,16 @@ int pm_genpd_remove_subdomain(struct gen
mutex_lock_nested(&subdomain->lock, SINGLE_DEPTH_NESTING);
+ if (subdomain->status != GPD_STATE_POWER_OFF
+ && subdomain->status != GPD_STATE_ACTIVE) {
+ mutex_unlock(&subdomain->lock);
+ genpd_release_lock(genpd);
+ goto start;
+ }
+
list_del(&subdomain->sd_node);
subdomain->parent = NULL;
- if (!subdomain->power_is_off)
+ if (subdomain->status != GPD_STATE_POWER_OFF)
genpd_sd_counter_dec(genpd);
mutex_unlock(&subdomain->lock);
@@ -996,7 +1094,7 @@ int pm_genpd_remove_subdomain(struct gen
break;
}
- mutex_unlock(&genpd->lock);
+ genpd_release_lock(genpd);
return ret;
}
@@ -1022,7 +1120,8 @@ void pm_genpd_init(struct generic_pm_dom
INIT_WORK(&genpd->power_off_work, genpd_power_off_work_fn);
genpd->in_progress = 0;
genpd->sd_count = 0;
- genpd->power_is_off = is_off;
+ genpd->status = is_off ? GPD_STATE_POWER_OFF : GPD_STATE_ACTIVE;
+ init_waitqueue_head(&genpd->status_wait_queue);
genpd->device_count = 0;
genpd->suspended_count = 0;
genpd->domain.ops.runtime_suspend = pm_genpd_runtime_suspend;
^ permalink raw reply
* [PATCH 4/5 v1] PM / Domains: Allow callbacks to execute arbitrary runtime PM helpers
From: Rafael J. Wysocki @ 2011-07-06 21:00 UTC (permalink / raw)
To: Linux PM mailing list; +Cc: Greg KH, LKML, MyungJoo Ham
In-Reply-To: <201107062248.35586.rjw@sisk.pl>
From: Rafael J. Wysocki <rjw@sisk.pl>
A deadlock may occur if one of the PM domains' .start_device() or
.stop_device() callbacks or a device driver's .runtime_suspend() or
.runtime_resume() callback executed by the core generic PM domain
code uses a "wrong" runtime PM helper function. This happens, for
example, if .runtime_resume() from one device's driver calls
pm_runtime_resume() for another device in the same PM domain.
A similar situation may take place if a device's parent is in the
same PM domain, in which case the runtime PM framework may execute
pm_genpd_runtime_resume() automatically for the parent (if it is
suspended at the moment). This, of course, is undesirable, so
the generic PM domains code should be modified to prevent it from
happening.
The runtime PM framework guarantees that pm_genpd_runtime_suspend()
and pm_genpd_runtime_resume() won't be executed in parallel for
the same device, so the generic PM domains code need not worry
about those cases. Still, it needs to prevent the other possible
race conditions between pm_genpd_runtime_suspend(),
pm_genpd_runtime_resume(), pm_genpd_poweron() and pm_genpd_poweroff()
from happening and it needs to avoid deadlocks at the same time.
To this end, modify the generic PM domains code to relax
synchronization rules so that:
* pm_genpd_poweron() doesn't wait for the PM domain status to
change from GPD_STATE_BUSY. If it finds that the status is
not GPD_STATE_POWER_OFF, it returns without powering the domain on
(it may modify the status depending on the circumstances).
* pm_genpd_poweroff() returns as soon as it finds that the PM
domain's status changed from GPD_STATE_BUSY after it's released
the PM domain's lock.
* pm_genpd_runtime_suspend() doesn't wait for the PM domain status
to change from GPD_STATE_BUSY after executing the domain's
.stop_device() callback and executes pm_genpd_poweroff() only
if pm_genpd_runtime_resume() is not executed in parallel.
* pm_genpd_runtime_resume() doesn't wait for the PM domain status
to change from GPD_STATE_BUSY after executing pm_genpd_poweron()
and sets the domain's status to GPD_STATE_BUSY and increments its
counter of resuming devices (introduced by this change) immediately
after acquiring the lock. The counter of resuming devices is then
decremented after executing __pm_genpd_runtime_resume() for the
device and the domain's status is reset to GPD_STATE_ACTIVE (unless
there are more resuming devices in the domain, in which case the
status remains GPD_STATE_BUSY).
This way, for example, if a device driver's .runtime_resume()
callback executes pm_runtime_resume() for another device in the same
PM domain, pm_genpd_poweron() called by pm_genpd_runtime_resume()
invoked by the runtime PM framework will not block and it will see
that there's nothing to do for it. Next, the PM domain's lock will
be taken without waiting for its status to change from GPD_STATE_BUSY
and the device driver's .runtime_resume() callback will be executed.
In turn, if pm_runtime_suspend() is executed by one device driver's
.runtime_resume() callback for another device in the same PM domain,
pm_genpd_runtime_suspend() invoked by the runtime PM framework as
a result will notice that one of the devices in the domain is being
resumed, so it won't call pm_genpd_poweroff().
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
drivers/base/power/domain.c | 97 +++++++++++++++++++++++++++-----------------
include/linux/pm_domain.h | 1
2 files changed, 62 insertions(+), 36 deletions(-)
Index: linux-2.6/drivers/base/power/domain.c
===================================================================
--- linux-2.6.orig/drivers/base/power/domain.c
+++ linux-2.6/drivers/base/power/domain.c
@@ -60,6 +60,12 @@ static void genpd_release_lock(struct ge
mutex_unlock(&genpd->lock);
}
+static void genpd_set_active(struct generic_pm_domain *genpd)
+{
+ if (genpd->resume_count == 0)
+ genpd->status = GPD_STATE_ACTIVE;
+}
+
/**
* pm_genpd_poweron - Restore power to a given PM domain and its parents.
* @genpd: PM domain to power up.
@@ -75,42 +81,24 @@ static int pm_genpd_poweron(struct gener
start:
if (parent) {
- mutex_lock(&parent->lock);
+ genpd_acquire_lock(parent);
mutex_lock_nested(&genpd->lock, SINGLE_DEPTH_NESTING);
} else {
mutex_lock(&genpd->lock);
}
- /*
- * Wait for the domain to transition into either the active,
- * or the power off state.
- */
- for (;;) {
- prepare_to_wait(&genpd->status_wait_queue, &wait,
- TASK_UNINTERRUPTIBLE);
- if (genpd->status != GPD_STATE_BUSY)
- break;
- mutex_unlock(&genpd->lock);
- if (parent)
- mutex_unlock(&parent->lock);
-
- schedule();
-
- if (parent) {
- mutex_lock(&parent->lock);
- mutex_lock_nested(&genpd->lock, SINGLE_DEPTH_NESTING);
- } else {
- mutex_lock(&genpd->lock);
- }
- }
- finish_wait(&genpd->status_wait_queue, &wait);
if (genpd->status == GPD_STATE_ACTIVE
|| (genpd->prepared_count > 0 && genpd->suspend_power_off))
goto out;
+ if (genpd->status == GPD_STATE_BUSY) {
+ genpd_set_active(genpd);
+ goto out;
+ }
+
if (parent && parent->status != GPD_STATE_ACTIVE) {
mutex_unlock(&genpd->lock);
- mutex_unlock(&parent->lock);
+ genpd_release_lock(parent);
ret = pm_genpd_poweron(parent);
if (ret)
@@ -125,14 +113,14 @@ static int pm_genpd_poweron(struct gener
goto out;
}
- genpd->status = GPD_STATE_ACTIVE;
+ genpd_set_active(genpd);
if (parent)
parent->sd_count++;
out:
mutex_unlock(&genpd->lock);
if (parent)
- mutex_unlock(&parent->lock);
+ genpd_release_lock(parent);
return ret;
}
@@ -210,6 +198,20 @@ static void __pm_genpd_restore_device(st
}
/**
+ * genpd_abort_poweroff - Check if a PM domain power off should be aborted.
+ * @genpd: PM domain to check.
+ *
+ * Return true if a PM domain's status changed from GPD_STATE_BUSY during
+ * a "power off" operation, which means that a "power on" has occured in the
+ * meantime, or if its resume_count field is different from zero, which means
+ * that one of its devices has been resumed in the meantime.
+ */
+static bool genpd_abort_poweroff(struct generic_pm_domain *genpd)
+{
+ return genpd->status != GPD_STATE_BUSY || genpd->resume_count > 0;
+}
+
+/**
* pm_genpd_poweroff - Remove power from a given PM domain.
* @genpd: PM domain to power down.
*
@@ -239,6 +241,14 @@ static int pm_genpd_poweroff(struct gene
if (not_suspended > genpd->in_progress)
return -EBUSY;
+ /*
+ * Check if another instance of pm_genpd_poweroff() for the same domain
+ * that has already started to call drivers' runtime suspend routines
+ * is running in parallel with us.
+ */
+ if (genpd->status == GPD_STATE_BUSY)
+ return 0;
+
if (genpd->gov && genpd->gov->power_down_ok) {
if (!genpd->gov->power_down_ok(&genpd->domain))
return -EAGAIN;
@@ -250,6 +260,9 @@ static int pm_genpd_poweroff(struct gene
ret = __pm_genpd_save_device(dle, genpd);
if (ret)
goto err_dev;
+
+ if (genpd_abort_poweroff(genpd))
+ return 0;
}
mutex_unlock(&genpd->lock);
@@ -262,6 +275,13 @@ static int pm_genpd_poweroff(struct gene
mutex_lock(&genpd->lock);
}
+ if (genpd_abort_poweroff(genpd)) {
+ if (parent)
+ genpd_release_lock(parent);
+
+ return 0;
+ }
+
if (genpd->power_off)
genpd->power_off(genpd);
@@ -282,7 +302,7 @@ static int pm_genpd_poweroff(struct gene
list_for_each_entry_continue(dle, &genpd->dev_list, node)
__pm_genpd_restore_device(dle, genpd);
- genpd->status = GPD_STATE_ACTIVE;
+ genpd_set_active(genpd);
wake_up_all(&genpd->status_wait_queue);
return ret;
@@ -327,11 +347,13 @@ static int pm_genpd_runtime_suspend(stru
return ret;
}
- genpd_acquire_lock(genpd);
- genpd->in_progress++;
- pm_genpd_poweroff(genpd);
- genpd->in_progress--;
- genpd_release_lock(genpd);
+ mutex_lock(&genpd->lock);
+ if (genpd->resume_count == 0) {
+ genpd->in_progress++;
+ pm_genpd_poweroff(genpd);
+ genpd->in_progress--;
+ }
+ mutex_unlock(&genpd->lock);
return 0;
}
@@ -377,12 +399,14 @@ static int pm_genpd_runtime_resume(struc
if (ret)
return ret;
- genpd_acquire_lock(genpd);
+ mutex_lock(&genpd->lock);
genpd->status = GPD_STATE_BUSY;
+ genpd->resume_count++;
__pm_genpd_runtime_resume(dev, genpd);
- genpd->status = GPD_STATE_ACTIVE;
+ genpd->resume_count--;
+ genpd_set_active(genpd);
wake_up_all(&genpd->status_wait_queue);
- genpd_release_lock(genpd);
+ mutex_unlock(&genpd->lock);
if (genpd->start_device)
genpd->start_device(dev);
@@ -1122,6 +1146,7 @@ void pm_genpd_init(struct generic_pm_dom
genpd->sd_count = 0;
genpd->status = is_off ? GPD_STATE_POWER_OFF : GPD_STATE_ACTIVE;
init_waitqueue_head(&genpd->status_wait_queue);
+ genpd->resume_count = 0;
genpd->device_count = 0;
genpd->suspended_count = 0;
genpd->domain.ops.runtime_suspend = pm_genpd_runtime_suspend;
Index: linux-2.6/include/linux/pm_domain.h
===================================================================
--- linux-2.6.orig/include/linux/pm_domain.h
+++ linux-2.6/include/linux/pm_domain.h
@@ -34,6 +34,7 @@ struct generic_pm_domain {
unsigned int sd_count; /* Number of subdomains with power "on" */
enum gpd_status status; /* Current state of the domain */
wait_queue_head_t status_wait_queue;
+ unsigned int resume_count; /* Number of devices being resumed */
unsigned int device_count; /* Number of devices */
unsigned int suspended_count; /* System suspend device counter */
unsigned int prepared_count; /* Suspend counter of prepared devices */
^ permalink raw reply
* [PATCH 5/5 v1] PM / Domains: Do not restore all devices on power off error
From: Rafael J. Wysocki @ 2011-07-06 21:01 UTC (permalink / raw)
To: Linux PM mailing list; +Cc: Greg KH, LKML, MyungJoo Ham
In-Reply-To: <201107062248.35586.rjw@sisk.pl>
From: Rafael J. Wysocki <rjw@sisk.pl>
Since every device in a PM domain has its own need_restore
flag, which is set by __pm_genpd_save_device(), there's no need to
walk the domain's device list and restore all devices on an error
from one of the drivers' .runtime_suspend() callbacks.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
drivers/base/power/domain.c | 3 ---
1 file changed, 3 deletions(-)
Index: linux-2.6/drivers/base/power/domain.c
===================================================================
--- linux-2.6.orig/drivers/base/power/domain.c
+++ linux-2.6/drivers/base/power/domain.c
@@ -299,9 +299,6 @@ static int pm_genpd_poweroff(struct gene
return 0;
err_dev:
- list_for_each_entry_continue(dle, &genpd->dev_list, node)
- __pm_genpd_restore_device(dle, genpd);
-
genpd_set_active(genpd);
wake_up_all(&genpd->status_wait_queue);
^ permalink raw reply
* Re: [PATCH 0/5 v1] PM / Domains: Generic PM domains improvements
From: Greg KH @ 2011-07-06 21:08 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: LKML, MyungJoo Ham, Linux PM mailing list
In-Reply-To: <201107062248.35586.rjw@sisk.pl>
On Wed, Jul 06, 2011 at 10:48:35PM +0200, Rafael J. Wysocki wrote:
> Hi,
>
> For the last few days I've been working on modifications of the generic
> PM domains code allowing .start_device()/.stop_device() and drivers'
> .runtime_suspend()/.runtime_resume() callbacks to execute things like
> pm_runtime_resume() for other devices in the same PM domain safely
> without deadlocking (the original motivation was a console driver on
> shmobile that pm_runtime_get_*()/pm_runtime_put_*() are called for from
> the other devices' .runtime_suspend()/.runtime_resume() callbacks).
>
> I think I solved that problem, but in the process I found a few places
> where fixes and/on improvements were necessary. The patches in this
> series address those issues. They are on top of the branch at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git pm-domains
>
> Patches [1-2/5] are regarded as 3.1 material, the others not necessarily,
> but I'd like to push them for 3.1 too if there are no complaints.
No complaints from me, looks nice.
greg k-h
^ permalink raw reply
* Re: [PATCH 0/5 v1] PM / Domains: Generic PM domains improvements
From: Rafael J. Wysocki @ 2011-07-06 21:12 UTC (permalink / raw)
To: Greg KH; +Cc: LKML, MyungJoo Ham, Linux PM mailing list
In-Reply-To: <20110706210856.GA1215@suse.de>
On Wednesday, July 06, 2011, Greg KH wrote:
> On Wed, Jul 06, 2011 at 10:48:35PM +0200, Rafael J. Wysocki wrote:
> > Hi,
> >
> > For the last few days I've been working on modifications of the generic
> > PM domains code allowing .start_device()/.stop_device() and drivers'
> > .runtime_suspend()/.runtime_resume() callbacks to execute things like
> > pm_runtime_resume() for other devices in the same PM domain safely
> > without deadlocking (the original motivation was a console driver on
> > shmobile that pm_runtime_get_*()/pm_runtime_put_*() are called for from
> > the other devices' .runtime_suspend()/.runtime_resume() callbacks).
> >
> > I think I solved that problem, but in the process I found a few places
> > where fixes and/on improvements were necessary. The patches in this
> > series address those issues. They are on top of the branch at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git pm-domains
> >
> > Patches [1-2/5] are regarded as 3.1 material, the others not necessarily,
> > but I'd like to push them for 3.1 too if there are no complaints.
>
> No complaints from me, looks nice.
Thanks!
Rafael
^ permalink raw reply
* Re: [PATCH 00/10] mm: Linux VM Infrastructure to support Memory Power Management
From: Ankita Garg @ 2011-07-07 4:54 UTC (permalink / raw)
To: david
Cc: Paul E. McKenney, linux-kernel, Dave Hansen, Pekka Enberg,
linux-mm, thomas.abraham, KAMEZAWA Hiroyuki, linux-pm,
Christoph Lameter, linux-arm-kernel, Arjan van de Ven
In-Reply-To: <alpine.DEB.2.02.1107061318190.2535@asgard.lang.hm>
On Wed, Jul 06, 2011 at 01:20:55PM -0700, david@lang.hm wrote:
> On Wed, 6 Jul 2011, Pekka Enberg wrote:
>
> >Why does the allocator need to know about address boundaries? Why
> >isn't it enough to make the page allocator and reclaim policies favor using
> >memory from lower addresses as aggressively as possible? That'd mean
> >we'd favor the first memory banks and could keep the remaining ones
> >powered off as much as possible.
> >
> >IOW, why do we need to support scenarios such as this:
> >
> > bank 0 bank 1 bank 2 bank3
> >| online | offline | online | offline |
>
> I believe that there are memory allocations that cannot be moved
> after they are made (think about regions allocated to DMA from
> hardware where the hardware has already been given the address space
> to DMA into)
>
Thats true. These are kernel allocations which are not movable. However,
the ZONE_MOVABLE would enable us to create complete movable zones and
the ones that have the kernel allocations could be flagged as kernelcore
zone.
> As a result, you may not be able to take bank 2 offline, so your
> option is to either leave banks 0-2 all online, or support emptying
> bank 1 and taking it offline.
>
--
Regards,
Ankita Garg (ankita@in.ibm.com)
Linux Technology Center
IBM India Systems & Technology Labs,
Bangalore, India
^ permalink raw reply
* Re: [RFC] Unprepare callback for cpuidle_device
From: Trinabh Gupta @ 2011-07-07 6:25 UTC (permalink / raw)
To: asinghal; +Cc: linux-pm, johlstei
In-Reply-To: <a619d8bbefd2f390a13d09084583dd39.squirrel@www.codeaurora.org>
On 07/06/2011 04:23 PM, asinghal@codeaurora.org wrote:
> We plan to use high resolution timers in one of our modules, with the
> requirement that we cancel these timers when the cpu goes idle and restart
> them when the cpu comes out of idle.
>
> We are cancelling the timers in cpuidle prepare callback. The problem is
> that if the need_resched() call in drivers/cpuidle/cpuidle.c returns true,
> how do we restart the timer? If the call returns false, we can restart the
> timer in the cpuidle enter callback.
Hi Amar,
I think you should not use cpuidle prepare callback at all. It may be
removed soon (see https://lkml.org/lkml/2011/6/6/261) and I think
there are better ways to achieve what you are trying to do.
I think everything should go into the enter routines (the idle routines
provided by the driver). That way you would not have to worry about
need_resched() in cpuidle.c. Also it would be a cleaner implementation
as you wouldn't touch generic cpuidle code.
>
> The solution to the problem that we have in mind is adding an unprepare
> callback to the cpuidle_device struct, and calling it if needs_resched()
> returns true. Another option is to implement deferred timers for hrtimers.
> Which of the two options is the better solution, or is there another
> feasible alternative?
As i said, everything should go inside enter routine and
you wouldn't have to use/implement prepare/unprepare callbacks.
Thanks
-Trinabh
^ permalink raw reply
* [PATCH 00/05] ARM: mach-shmobile: another sh7372 power domain update
From: Magnus Damm @ 2011-07-07 13:32 UTC (permalink / raw)
To: linux-sh; +Cc: linux-pm
ARM: mach-shmobile: another sh7372 power domain update
[PATCH 01/05] ARM: mach-shmobile: sh7372 D4 support
[PATCH 02/05] ARM: mach-shmobile: Runtime PM late init callback
[PATCH 03/05] ARM: mach-shmobile: sh7372 late pm domain off
[PATCH 04/05] PM: export pm_genpd_poweron() in header
[PATCH 05/05] ARM: mach-shmobile: sh7372 A3RV requires A4LC
These patches update the partial hardware power domain support
included in the pm-domains branch of the suspend-2.6 git tree.
With these patches applied the unused sh7372 hardware power domains
are powered down during kernel boot. This is currently handled
by the SoC-specific code, but I believe this is something that
can be shared on a framework leve. Until that happens we need some
way to turn off the unused power domains, so here it is.
Included is also code that fixes A3RV support by making sure A4LC
is on whenever A3RV is turned on. This issue started triggering
when "[PATCH 03/05] ARM: mach-shmobile: sh7372 late pm domain off"
got applied and A4LC was turned off automatically.
Signed-off-by: Magnus Damm <damm@opensource.se>
---
Applies to the suspend-2.6 git pm-domains branch on top of
the following acked patches:
"[PATCH 1/2] ARM: mach-shmobile: sh7372: make sure that fsi is peripheral of spu2"
"[PATCH 2/2 v2] ARM: mach-shmobile: sh7372 A4MP support"
arch/arm/mach-shmobile/include/mach/common.h | 1
arch/arm/mach-shmobile/include/mach/sh7372.h | 1
arch/arm/mach-shmobile/pm-sh7372.c | 62 ++++++++++++++++++++++++--
arch/arm/mach-shmobile/pm_runtime.c | 10 ++++
arch/arm/mach-shmobile/setup-sh7372.c | 1
drivers/base/power/domain.c | 2
include/linux/pm_domain.h | 5 ++
7 files changed, 78 insertions(+), 4 deletions(-)
^ permalink raw reply
* [PATCH 01/05] ARM: mach-shmobile: sh7372 D4 support
From: Magnus Damm @ 2011-07-07 13:32 UTC (permalink / raw)
To: linux-sh; +Cc: linux-pm
In-Reply-To: <20110707133212.22347.68775.sendpatchset@t400s>
From: Magnus Damm <damm@opensource.se>
Add support for the sh7372 D4 power domain. This power domain
contains the Coresight-ETM hardware block.
Signed-off-by: Magnus Damm <damm@opensource.se>
---
arch/arm/mach-shmobile/include/mach/sh7372.h | 1 +
arch/arm/mach-shmobile/pm-sh7372.c | 4 ++++
arch/arm/mach-shmobile/setup-sh7372.c | 1 +
3 files changed, 6 insertions(+)
--- 0002/arch/arm/mach-shmobile/include/mach/sh7372.h
+++ work/arch/arm/mach-shmobile/include/mach/sh7372.h 2011-07-07 19:20:09.000000000 +0900
@@ -486,6 +486,7 @@ static inline struct sh7372_pm_domain *t
#ifdef CONFIG_PM
extern struct sh7372_pm_domain sh7372_a4lc;
extern struct sh7372_pm_domain sh7372_a4mp;
+extern struct sh7372_pm_domain sh7372_d4;
extern struct sh7372_pm_domain sh7372_a3rv;
extern struct sh7372_pm_domain sh7372_a3ri;
extern struct sh7372_pm_domain sh7372_a3sg;
--- 0002/arch/arm/mach-shmobile/pm-sh7372.c
+++ work/arch/arm/mach-shmobile/pm-sh7372.c 2011-07-07 19:24:26.000000000 +0900
@@ -129,6 +129,10 @@ struct sh7372_pm_domain sh7372_a4mp = {
.bit_shift = 2,
};
+struct sh7372_pm_domain sh7372_d4 = {
+ .bit_shift = 3,
+};
+
struct sh7372_pm_domain sh7372_a3rv = {
.bit_shift = 6,
};
--- 0002/arch/arm/mach-shmobile/setup-sh7372.c
+++ work/arch/arm/mach-shmobile/setup-sh7372.c 2011-07-07 19:20:33.000000000 +0900
@@ -843,6 +843,7 @@ void __init sh7372_add_standard_devices(
{
sh7372_init_pm_domain(&sh7372_a4lc);
sh7372_init_pm_domain(&sh7372_a4mp);
+ sh7372_init_pm_domain(&sh7372_d4);
sh7372_init_pm_domain(&sh7372_a3rv);
sh7372_init_pm_domain(&sh7372_a3ri);
sh7372_init_pm_domain(&sh7372_a3sg);
^ permalink raw reply
* [PATCH 02/05] ARM: mach-shmobile: Runtime PM late init callback
From: Magnus Damm @ 2011-07-07 13:32 UTC (permalink / raw)
To: linux-sh; +Cc: linux-pm
In-Reply-To: <20110707133212.22347.68775.sendpatchset@t400s>
From: Magnus Damm <damm@opensource.se>
Add a mach-shmobile specific callback for SoC-specific code
to hook into. By having the late_initcall() in a common place
we can have multi-SoC/board support in the same kernel binary.
Signed-off-by: Magnus Damm <damm@opensource.se>
---
arch/arm/mach-shmobile/include/mach/common.h | 1 +
arch/arm/mach-shmobile/pm_runtime.c | 10 ++++++++++
2 files changed, 11 insertions(+)
--- 0001/arch/arm/mach-shmobile/include/mach/common.h
+++ work/arch/arm/mach-shmobile/include/mach/common.h 2011-07-07 19:39:49.000000000 +0900
@@ -12,6 +12,7 @@ extern struct platform_suspend_ops shmob
struct cpuidle_device;
extern void (*shmobile_cpuidle_modes[])(void);
extern void (*shmobile_cpuidle_setup)(struct cpuidle_device *dev);
+extern void (*shmobile_runtime_pm_late_init)(void);
extern void sh7367_init_irq(void);
extern void sh7367_add_early_devices(void);
--- 0001/arch/arm/mach-shmobile/pm_runtime.c
+++ work/arch/arm/mach-shmobile/pm_runtime.c 2011-07-07 19:39:28.000000000 +0900
@@ -56,3 +56,13 @@ static int __init sh_pm_runtime_init(voi
return 0;
}
core_initcall(sh_pm_runtime_init);
+
+void (*shmobile_runtime_pm_late_init)(void);
+
+static int __init sh_pm_runtime_late_init(void)
+{
+ if (shmobile_runtime_pm_late_init)
+ shmobile_runtime_pm_late_init();
+ return 0;
+}
+late_initcall(sh_pm_runtime_late_init);
^ permalink raw reply
* [PATCH 03/05] ARM: mach-shmobile: sh7372 late pm domain off
From: Magnus Damm @ 2011-07-07 13:32 UTC (permalink / raw)
To: linux-sh; +Cc: linux-pm
In-Reply-To: <20110707133212.22347.68775.sendpatchset@t400s>
From: Magnus Damm <damm@opensource.se>
Add sh7372 specific code to power down unused pm domains.
This should really be replaced by some generic PM core
code IMO, but until that happens this patch makes sure
we don't waste power by leaving unused power domains on.
Signed-off-by: Magnus Damm <damm@opensource.se>
---
arch/arm/mach-shmobile/pm-sh7372.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
--- 0005/arch/arm/mach-shmobile/pm-sh7372.c
+++ work/arch/arm/mach-shmobile/pm-sh7372.c 2011-07-07 20:15:19.000000000 +0900
@@ -96,6 +96,17 @@ static bool pd_active_wakeup(struct devi
return true;
}
+static void sh7372_late_pm_domain_off(void)
+{
+ /* request power down of unused pm domains */
+ queue_work(pm_wq, &sh7372_a4lc.genpd.power_off_work);
+ queue_work(pm_wq, &sh7372_a4mp.genpd.power_off_work);
+ queue_work(pm_wq, &sh7372_d4.genpd.power_off_work);
+ queue_work(pm_wq, &sh7372_a3rv.genpd.power_off_work);
+ queue_work(pm_wq, &sh7372_a3ri.genpd.power_off_work);
+ queue_work(pm_wq, &sh7372_a3sg.genpd.power_off_work);
+}
+
void sh7372_init_pm_domain(struct sh7372_pm_domain *sh7372_pd)
{
struct generic_pm_domain *genpd = &sh7372_pd->genpd;
@@ -107,6 +118,8 @@ void sh7372_init_pm_domain(struct sh7372
genpd->power_off = pd_power_down;
genpd->power_on = pd_power_up;
pd_power_up(&sh7372_pd->genpd);
+
+ shmobile_runtime_pm_late_init = sh7372_late_pm_domain_off;
}
void sh7372_add_device_to_domain(struct sh7372_pm_domain *sh7372_pd,
^ permalink raw reply
* [PATCH 04/05] PM: export pm_genpd_poweron() in header
From: Magnus Damm @ 2011-07-07 13:32 UTC (permalink / raw)
To: linux-sh; +Cc: linux-pm
In-Reply-To: <20110707133212.22347.68775.sendpatchset@t400s>
From: Magnus Damm <damm@opensource.se>
Allow SoC-specific code to call pm_genpd_poweron().
Signed-off-by: Magnus Damm <damm@opensource.se>
---
Perhaps there are better ways to deal with this?
drivers/base/power/domain.c | 2 +-
include/linux/pm_domain.h | 5 +++++
2 files changed, 6 insertions(+), 1 deletion(-)
--- 0001/drivers/base/power/domain.c
+++ work/drivers/base/power/domain.c 2011-07-07 21:37:49.000000000 +0900
@@ -37,7 +37,7 @@ static void genpd_sd_counter_dec(struct
* Restore power to @genpd and all of its parents so that it is possible to
* resume a device belonging to it.
*/
-static int pm_genpd_poweron(struct generic_pm_domain *genpd)
+int pm_genpd_poweron(struct generic_pm_domain *genpd)
{
int ret = 0;
--- 0001/include/linux/pm_domain.h
+++ work/include/linux/pm_domain.h 2011-07-07 21:37:49.000000000 +0900
@@ -63,6 +63,7 @@ extern int pm_genpd_remove_subdomain(str
struct generic_pm_domain *target);
extern void pm_genpd_init(struct generic_pm_domain *genpd,
struct dev_power_governor *gov, bool is_off);
+extern int pm_genpd_poweron(struct generic_pm_domain *genpd);
#else
static inline int pm_genpd_add_device(struct generic_pm_domain *genpd,
struct device *dev)
@@ -86,6 +87,10 @@ static inline int pm_genpd_remove_subdom
}
static inline void pm_genpd_init(struct generic_pm_domain *genpd,
struct dev_power_governor *gov, bool is_off) {}
+static inline int pm_genpd_poweron(struct generic_pm_domain *genpd)
+{
+ return -ENOSYS;
+}
#endif
#endif /* _LINUX_PM_DOMAIN_H */
^ permalink raw reply
* [PATCH 05/05] ARM: mach-shmobile: sh7372 A3RV requires A4LC
From: Magnus Damm @ 2011-07-07 13:32 UTC (permalink / raw)
To: linux-sh; +Cc: linux-pm
In-Reply-To: <20110707133212.22347.68775.sendpatchset@t400s>
From: Magnus Damm <damm@opensource.se>
Add a power domain workaround for the VPU and A3RV on sh7372.
The sh7372 data sheet mentions that the VPU is located in the
A3RV power domain. The A3RV power domain is not related to A4LC
in any way, but testing shows that unless A3RV _and_ A4LC are
powered on the VPU test program will bomb out.
This issue may be caused by a more or less undocumented dependency
on the MERAM block that happens to be located in A4LC. So now we
know that the out-of-reset requirement of the VPU is that the MERAM
is powered on.
This patch adds a workaround for A3RV to make sure A4LC is powered
on - this so we can use the VPU even though the LCDCs are in blanking
state and A4LC is supposed to be off.
Signed-off-by: Magnus Damm <damm@opensource.se>
---
arch/arm/mach-shmobile/pm-sh7372.c | 45 +++++++++++++++++++++++++++++++++---
1 file changed, 42 insertions(+), 3 deletions(-)
--- 0006/arch/arm/mach-shmobile/pm-sh7372.c
+++ work/arch/arm/mach-shmobile/pm-sh7372.c 2011-07-07 21:57:14.000000000 +0900
@@ -91,6 +91,36 @@ static int pd_power_up(struct generic_pm
return ret;
}
+static int pd_power_up_a3rv(struct generic_pm_domain *genpd)
+{
+ int ret = pd_power_up(genpd);
+
+ /* force A4LC on after A3RV has been requested on */
+ pm_genpd_poweron(&sh7372_a4lc.genpd);
+
+ return ret;
+}
+
+static int pd_power_down_a3rv(struct generic_pm_domain *genpd)
+{
+ int ret = pd_power_down(genpd);
+
+ /* try to power down A4LC after A3RV is requested off */
+ pm_genpd_poweron(&sh7372_a4lc.genpd);
+ queue_work(pm_wq, &sh7372_a4lc.genpd.power_off_work);
+
+ return ret;
+}
+
+static int pd_power_down_a4lc(struct generic_pm_domain *genpd)
+{
+ /* only power down A4LC if A3RV is off */
+ if (!(__raw_readl(PSTR) & (1 << sh7372_a3rv.bit_shift)))
+ return pd_power_down(genpd);
+
+ return 0;
+}
+
static bool pd_active_wakeup(struct device *dev)
{
return true;
@@ -115,9 +145,18 @@ void sh7372_init_pm_domain(struct sh7372
genpd->stop_device = pm_clk_suspend;
genpd->start_device = pm_clk_resume;
genpd->active_wakeup = pd_active_wakeup;
- genpd->power_off = pd_power_down;
- genpd->power_on = pd_power_up;
- pd_power_up(&sh7372_pd->genpd);
+
+ if (sh7372_pd == &sh7372_a4lc) {
+ genpd->power_off = pd_power_down_a4lc;
+ genpd->power_on = pd_power_up;
+ } else if (sh7372_pd == &sh7372_a3rv) {
+ genpd->power_off = pd_power_down_a3rv;
+ genpd->power_on = pd_power_up_a3rv;
+ } else {
+ genpd->power_off = pd_power_down;
+ genpd->power_on = pd_power_up;
+ }
+ genpd->power_on(&sh7372_pd->genpd);
shmobile_runtime_pm_late_init = sh7372_late_pm_domain_off;
}
^ permalink raw reply
* Re: [PATCH 00/10] mm: Linux VM Infrastructure to support Memory Power Management
From: Pekka Enberg @ 2011-07-07 18:00 UTC (permalink / raw)
To: david
Cc: Paul E. McKenney, KAMEZAWA Hiroyuki, linux-kernel, Dave Hansen,
linux-mm, thomas.abraham, Ankita Garg, linux-pm,
Christoph Lameter, linux-arm-kernel, Arjan van de Ven
In-Reply-To: <alpine.DEB.2.02.1107061318190.2535@asgard.lang.hm>
On Wed, 6 Jul 2011, Pekka Enberg wrote:
>> Why does the allocator need to know about address boundaries? Why
>> isn't it enough to make the page allocator and reclaim policies favor using
>> memory from lower addresses as aggressively as possible? That'd mean
>> we'd favor the first memory banks and could keep the remaining ones
>> powered off as much as possible.
>>
>> IOW, why do we need to support scenarios such as this:
>>
>> bank 0 bank 1 bank 2 bank3
>> | online | offline | online | offline |
>
On Wed, 6 Jul 2011, david@lang.hm wrote:
> I believe that there are memory allocations that cannot be moved after they
> are made (think about regions allocated to DMA from hardware where the
> hardware has already been given the address space to DMA into)
>
> As a result, you may not be able to take bank 2 offline, so your option is to
> either leave banks 0-2 all online, or support emptying bank 1 and taking it
> offline.
But drivers allocate DMA memory for hardware during module load and stay
pinned there until the driver is unloaded, no? So in practice DMA buffers
are going to be in banks 0-1?
Pekka
^ permalink raw reply
* Re: [RFC] Unprepare callback for cpuidle_device
From: asinghal @ 2011-07-07 19:52 UTC (permalink / raw)
To: Trinabh Gupta; +Cc: linux-pm, asinghal, johlstei
In-Reply-To: <4E15516E.5040209@gmail.com>
Hello Trinabh,
i cannot use the enter callback due to the following reason:
the residency calculation(tick)nohz_get_sleep_length) and the idle state
selection happens in the menu governor. The enter callback is called with
the selected state.
So cancelling the hrtimer that would affect the residency value calculated
in the menu governor, in the enter callback is not possible. The timer
needs to be cancelled before the select call is made.
thanks,
amar
> On 07/06/2011 04:23 PM, asinghal@codeaurora.org wrote:
>> We plan to use high resolution timers in one of our modules, with the
>> requirement that we cancel these timers when the cpu goes idle and
>> restart
>> them when the cpu comes out of idle.
>>
>> We are cancelling the timers in cpuidle prepare callback. The problem is
>> that if the need_resched() call in drivers/cpuidle/cpuidle.c returns
>> true,
>> how do we restart the timer? If the call returns false, we can restart
>> the
>> timer in the cpuidle enter callback.
>
> Hi Amar,
>
> I think you should not use cpuidle prepare callback at all. It may be
> removed soon (see https://lkml.org/lkml/2011/6/6/261) and I think
> there are better ways to achieve what you are trying to do.
>
> I think everything should go into the enter routines (the idle routines
> provided by the driver). That way you would not have to worry about
> need_resched() in cpuidle.c. Also it would be a cleaner implementation
> as you wouldn't touch generic cpuidle code.
>
>>
>> The solution to the problem that we have in mind is adding an unprepare
>> callback to the cpuidle_device struct, and calling it if needs_resched()
>> returns true. Another option is to implement deferred timers for
>> hrtimers.
>> Which of the two options is the better solution, or is there another
>> feasible alternative?
>
> As i said, everything should go inside enter routine and
> you wouldn't have to use/implement prepare/unprepare callbacks.
>
> Thanks
> -Trinabh
>
^ permalink raw reply
* [Update][PATCH 4/5] PM / Domains: Allow callbacks to execute all runtime PM helpers
From: Rafael J. Wysocki @ 2011-07-07 19:59 UTC (permalink / raw)
To: Linux PM mailing list; +Cc: Greg KH, LKML, MyungJoo Ham
In-Reply-To: <201107062300.08884.rjw@sisk.pl>
From: Rafael J. Wysocki <rjw@sisk.pl>
A deadlock may occur if one of the PM domains' .start_device() or
.stop_device() callbacks or a device driver's .runtime_suspend() or
.runtime_resume() callback executed by the core generic PM domain
code uses a "wrong" runtime PM helper function. This happens, for
example, if .runtime_resume() from one device's driver calls
pm_runtime_resume() for another device in the same PM domain.
A similar situation may take place if a device's parent is in the
same PM domain, in which case the runtime PM framework may execute
pm_genpd_runtime_resume() automatically for the parent (if it is
suspended at the moment). This, of course, is undesirable, so
the generic PM domains code should be modified to prevent it from
happening.
The runtime PM framework guarantees that pm_genpd_runtime_suspend()
and pm_genpd_runtime_resume() won't be executed in parallel for
the same device, so the generic PM domains code need not worry
about those cases. Still, it needs to prevent the other possible
race conditions between pm_genpd_runtime_suspend(),
pm_genpd_runtime_resume(), pm_genpd_poweron() and pm_genpd_poweroff()
from happening and it needs to avoid deadlocks at the same time.
To this end, modify the generic PM domains code to relax
synchronization rules so that:
* pm_genpd_poweron() doesn't wait for the PM domain status to
change from GPD_STATE_BUSY. If it finds that the status is
not GPD_STATE_POWER_OFF, it returns without powering the domain on
(it may modify the status depending on the circumstances).
* pm_genpd_poweroff() returns as soon as it finds that the PM
domain's status changed from GPD_STATE_BUSY after it's released
the PM domain's lock.
* pm_genpd_runtime_suspend() doesn't wait for the PM domain status
to change from GPD_STATE_BUSY after executing the domain's
.stop_device() callback and executes pm_genpd_poweroff() only
if pm_genpd_runtime_resume() is not executed in parallel.
* pm_genpd_runtime_resume() doesn't wait for the PM domain status
to change from GPD_STATE_BUSY after executing pm_genpd_poweron()
and sets the domain's status to GPD_STATE_BUSY and increments its
counter of resuming devices (introduced by this change) immediately
after acquiring the lock. The counter of resuming devices is then
decremented after executing __pm_genpd_runtime_resume() for the
device and the domain's status is reset to GPD_STATE_ACTIVE (unless
there are more resuming devices in the domain, in which case the
status remains GPD_STATE_BUSY).
This way, for example, if a device driver's .runtime_resume()
callback executes pm_runtime_resume() for another device in the same
PM domain, pm_genpd_poweron() called by pm_genpd_runtime_resume()
invoked by the runtime PM framework will not block and it will see
that there's nothing to do for it. Next, the PM domain's lock will
be acquired without waiting for its status to change from
GPD_STATE_BUSY and the device driver's .runtime_resume() callback
will be executed. In turn, if pm_runtime_suspend() is executed by
one device driver's .runtime_resume() callback for another device in
the same PM domain, pm_genpd_poweroff() executed by
pm_genpd_runtime_suspend() invoked by the runtime PM framework as a
result will notice that one of the devices in the domain is being
resumed, so it will return immediately.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
Well, this turns out to be more tricky than I thought initially, because I
forgot about some "exotic" combinations of calls, like "runtime resume
immediately followed by runtime suspend called from a device driver's
.runtime_suspend() callback for another device in the same PM domain".
Updated patch, hopefully taking all of these into account, is below.
Thanks,
Rafael
---
drivers/base/power/domain.c | 144 ++++++++++++++++++++++++++++++--------------
include/linux/pm_domain.h | 3
2 files changed, 102 insertions(+), 45 deletions(-)
Index: linux-2.6/drivers/base/power/domain.c
===================================================================
--- linux-2.6.orig/drivers/base/power/domain.c
+++ linux-2.6/drivers/base/power/domain.c
@@ -44,7 +44,8 @@ static void genpd_acquire_lock(struct ge
for (;;) {
prepare_to_wait(&genpd->status_wait_queue, &wait,
TASK_UNINTERRUPTIBLE);
- if (genpd->status != GPD_STATE_BUSY)
+ if (genpd->status == GPD_STATE_ACTIVE
+ || genpd->status == GPD_STATE_POWER_OFF)
break;
mutex_unlock(&genpd->lock);
@@ -60,6 +61,12 @@ static void genpd_release_lock(struct ge
mutex_unlock(&genpd->lock);
}
+static void genpd_set_active(struct generic_pm_domain *genpd)
+{
+ if (genpd->resume_count == 0)
+ genpd->status = GPD_STATE_ACTIVE;
+}
+
/**
* pm_genpd_poweron - Restore power to a given PM domain and its parents.
* @genpd: PM domain to power up.
@@ -75,42 +82,24 @@ static int pm_genpd_poweron(struct gener
start:
if (parent) {
- mutex_lock(&parent->lock);
+ genpd_acquire_lock(parent);
mutex_lock_nested(&genpd->lock, SINGLE_DEPTH_NESTING);
} else {
mutex_lock(&genpd->lock);
}
- /*
- * Wait for the domain to transition into either the active,
- * or the power off state.
- */
- for (;;) {
- prepare_to_wait(&genpd->status_wait_queue, &wait,
- TASK_UNINTERRUPTIBLE);
- if (genpd->status != GPD_STATE_BUSY)
- break;
- mutex_unlock(&genpd->lock);
- if (parent)
- mutex_unlock(&parent->lock);
-
- schedule();
-
- if (parent) {
- mutex_lock(&parent->lock);
- mutex_lock_nested(&genpd->lock, SINGLE_DEPTH_NESTING);
- } else {
- mutex_lock(&genpd->lock);
- }
- }
- finish_wait(&genpd->status_wait_queue, &wait);
if (genpd->status == GPD_STATE_ACTIVE
|| (genpd->prepared_count > 0 && genpd->suspend_power_off))
goto out;
+ if (genpd->status != GPD_STATE_POWER_OFF) {
+ genpd_set_active(genpd);
+ goto out;
+ }
+
if (parent && parent->status != GPD_STATE_ACTIVE) {
mutex_unlock(&genpd->lock);
- mutex_unlock(&parent->lock);
+ genpd_release_lock(parent);
ret = pm_genpd_poweron(parent);
if (ret)
@@ -125,14 +114,14 @@ static int pm_genpd_poweron(struct gener
goto out;
}
- genpd->status = GPD_STATE_ACTIVE;
+ genpd_set_active(genpd);
if (parent)
parent->sd_count++;
out:
mutex_unlock(&genpd->lock);
if (parent)
- mutex_unlock(&parent->lock);
+ genpd_release_lock(parent);
return ret;
}
@@ -210,6 +199,20 @@ static void __pm_genpd_restore_device(st
}
/**
+ * genpd_abort_poweroff - Check if a PM domain power off should be aborted.
+ * @genpd: PM domain to check.
+ *
+ * Return true if a PM domain's status changed to GPD_STATE_ACTIVE during
+ * a "power off" operation, which means that a "power on" has occured in the
+ * meantime, or if its resume_count field is different from zero, which means
+ * that one of its devices has been resumed in the meantime.
+ */
+static bool genpd_abort_poweroff(struct generic_pm_domain *genpd)
+{
+ return genpd->status == GPD_STATE_ACTIVE || genpd->resume_count > 0;
+}
+
+/**
* pm_genpd_poweroff - Remove power from a given PM domain.
* @genpd: PM domain to power down.
*
@@ -223,9 +226,17 @@ static int pm_genpd_poweroff(struct gene
struct generic_pm_domain *parent;
struct dev_list_entry *dle;
unsigned int not_suspended;
- int ret;
+ int ret = 0;
- if (genpd->status == GPD_STATE_POWER_OFF || genpd->prepared_count > 0)
+ start:
+ /*
+ * Do not try to power off the domain in the following situations:
+ * (1) The domain is already in the "power off" state.
+ * (2) System suspend is in progress.
+ * (3) One of the domain's devices is being resumed right now.
+ */
+ if (genpd->status == GPD_STATE_POWER_OFF || genpd->prepared_count > 0
+ || genpd->resume_count > 0)
return 0;
if (genpd->sd_count > 0)
@@ -239,34 +250,54 @@ static int pm_genpd_poweroff(struct gene
if (not_suspended > genpd->in_progress)
return -EBUSY;
+ if (genpd->poweroff_task) {
+ /*
+ * Another instance of pm_genpd_poweroff() is executing
+ * callbacks, so tell it to start over and return.
+ */
+ genpd->status = GPD_STATE_REPEAT;
+ return 0;
+ }
+
if (genpd->gov && genpd->gov->power_down_ok) {
if (!genpd->gov->power_down_ok(&genpd->domain))
return -EAGAIN;
}
genpd->status = GPD_STATE_BUSY;
+ genpd->poweroff_task = current;
list_for_each_entry_reverse(dle, &genpd->dev_list, node) {
ret = __pm_genpd_save_device(dle, genpd);
if (ret)
goto err_dev;
- }
- mutex_unlock(&genpd->lock);
+ if (genpd_abort_poweroff(genpd))
+ goto out;
+
+ if (genpd->status == GPD_STATE_REPEAT) {
+ genpd->poweroff_task = NULL;
+ goto start;
+ }
+ }
parent = genpd->parent;
if (parent) {
+ mutex_unlock(&genpd->lock);
+
genpd_acquire_lock(parent);
mutex_lock_nested(&genpd->lock, SINGLE_DEPTH_NESTING);
- } else {
- mutex_lock(&genpd->lock);
+
+ if (genpd_abort_poweroff(genpd)) {
+ genpd_release_lock(parent);
+ goto out;
+ }
}
if (genpd->power_off)
genpd->power_off(genpd);
genpd->status = GPD_STATE_POWER_OFF;
- wake_up_all(&genpd->status_wait_queue);
if (parent) {
genpd_sd_counter_dec(parent);
@@ -276,16 +307,17 @@ static int pm_genpd_poweroff(struct gene
genpd_release_lock(parent);
}
- return 0;
+ out:
+ genpd->poweroff_task = NULL;
+ wake_up_all(&genpd->status_wait_queue);
+ return ret;
err_dev:
list_for_each_entry_continue(dle, &genpd->dev_list, node)
__pm_genpd_restore_device(dle, genpd);
- genpd->status = GPD_STATE_ACTIVE;
- wake_up_all(&genpd->status_wait_queue);
-
- return ret;
+ genpd_set_active(genpd);
+ goto out;
}
/**
@@ -327,11 +359,11 @@ static int pm_genpd_runtime_suspend(stru
return ret;
}
- genpd_acquire_lock(genpd);
+ mutex_lock(&genpd->lock);
genpd->in_progress++;
pm_genpd_poweroff(genpd);
genpd->in_progress--;
- genpd_release_lock(genpd);
+ mutex_unlock(&genpd->lock);
return 0;
}
@@ -365,6 +397,7 @@ static void __pm_genpd_runtime_resume(st
static int pm_genpd_runtime_resume(struct device *dev)
{
struct generic_pm_domain *genpd;
+ DEFINE_WAIT(wait);
int ret;
dev_dbg(dev, "%s()\n", __func__);
@@ -377,12 +410,31 @@ static int pm_genpd_runtime_resume(struc
if (ret)
return ret;
- genpd_acquire_lock(genpd);
+ mutex_lock(&genpd->lock);
genpd->status = GPD_STATE_BUSY;
+ genpd->resume_count++;
+ for (;;) {
+ prepare_to_wait(&genpd->status_wait_queue, &wait,
+ TASK_UNINTERRUPTIBLE);
+ /*
+ * If current is the powering off task, we have been called
+ * reentrantly from one of the device callbacks, so we should
+ * not wait.
+ */
+ if (!genpd->poweroff_task || genpd->poweroff_task == current)
+ break;
+ mutex_unlock(&genpd->lock);
+
+ schedule();
+
+ mutex_lock(&genpd->lock);
+ }
+ finish_wait(&genpd->status_wait_queue, &wait);
__pm_genpd_runtime_resume(dev, genpd);
- genpd->status = GPD_STATE_ACTIVE;
+ genpd->resume_count--;
+ genpd_set_active(genpd);
wake_up_all(&genpd->status_wait_queue);
- genpd_release_lock(genpd);
+ mutex_unlock(&genpd->lock);
if (genpd->start_device)
genpd->start_device(dev);
@@ -1122,6 +1174,8 @@ void pm_genpd_init(struct generic_pm_dom
genpd->sd_count = 0;
genpd->status = is_off ? GPD_STATE_POWER_OFF : GPD_STATE_ACTIVE;
init_waitqueue_head(&genpd->status_wait_queue);
+ genpd->poweroff_task = NULL;
+ genpd->resume_count = 0;
genpd->device_count = 0;
genpd->suspended_count = 0;
genpd->domain.ops.runtime_suspend = pm_genpd_runtime_suspend;
Index: linux-2.6/include/linux/pm_domain.h
===================================================================
--- linux-2.6.orig/include/linux/pm_domain.h
+++ linux-2.6/include/linux/pm_domain.h
@@ -14,6 +14,7 @@
enum gpd_status {
GPD_STATE_ACTIVE = 0, /* PM domain is active */
GPD_STATE_BUSY, /* Something is happening to the PM domain */
+ GPD_STATE_REPEAT, /* Power off in progress, to be repeated */
GPD_STATE_POWER_OFF, /* PM domain is off */
};
@@ -34,6 +35,8 @@ struct generic_pm_domain {
unsigned int sd_count; /* Number of subdomains with power "on" */
enum gpd_status status; /* Current state of the domain */
wait_queue_head_t status_wait_queue;
+ struct task_struct *poweroff_task; /* Powering off task */
+ unsigned int resume_count; /* Number of devices being resumed */
unsigned int device_count; /* Number of devices */
unsigned int suspended_count; /* System suspend device counter */
unsigned int prepared_count; /* Suspend counter of prepared devices */
^ permalink raw reply
* [Update][PATCH 5/5] PM / Domains: Do not restore all devices on power off error
From: Rafael J. Wysocki @ 2011-07-07 20:01 UTC (permalink / raw)
To: Linux PM mailing list; +Cc: Greg KH, LKML, MyungJoo Ham
In-Reply-To: <201107062301.22277.rjw@sisk.pl>
From: Rafael J. Wysocki <rjw@sisk.pl>
Since every device in a PM domain has its own need_restore
flag, which is set by __pm_genpd_save_device(), there's no need to
walk the domain's device list and restore all devices on an error
from one of the drivers' .runtime_suspend() callbacks.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
This update is on top of the [4/5] update I've just sent. It makes the code
more straightforward IMHO.
Thanks,
Rafael
---
drivers/base/power/domain.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
Index: linux-2.6/drivers/base/power/domain.c
===================================================================
--- linux-2.6.orig/drivers/base/power/domain.c
+++ linux-2.6/drivers/base/power/domain.c
@@ -269,8 +269,10 @@ static int pm_genpd_poweroff(struct gene
list_for_each_entry_reverse(dle, &genpd->dev_list, node) {
ret = __pm_genpd_save_device(dle, genpd);
- if (ret)
- goto err_dev;
+ if (ret) {
+ genpd_set_active(genpd);
+ goto out;
+ }
if (genpd_abort_poweroff(genpd))
goto out;
@@ -311,13 +313,6 @@ static int pm_genpd_poweroff(struct gene
genpd->poweroff_task = NULL;
wake_up_all(&genpd->status_wait_queue);
return ret;
-
- err_dev:
- list_for_each_entry_continue(dle, &genpd->dev_list, node)
- __pm_genpd_restore_device(dle, genpd);
-
- genpd_set_active(genpd);
- goto out;
}
/**
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox