* Re: [PATCH 00/10] mm: Linux VM Infrastructure to support Memory Power Management
From: Vaidyanathan Srinivasan @ 2011-07-06 16:41 UTC (permalink / raw)
To: Pekka Enberg
Cc: KAMEZAWA Hiroyuki, linux-kernel, Dave Hansen, linux-mm,
thomas.abraham, Ankita Garg, linux-pm, Paul E. McKenney,
Christoph Lameter, linux-arm-kernel, Arjan van de Ven
In-Reply-To: <CAOJsxLHQP=-srK_uYYBsPb7+rUBnPZG7bzwtCd-rRaQa4ikUFg@mail.gmail.com>
* Pekka Enberg <penberg@kernel.org> [2011-07-06 11:45:41]:
> Hi Ankita,
>
> [ I don't really know anything about memory power management but
> here's my two cents since you asked for it. ]
>
> On Wed, Jun 29, 2011 at 4:00 PM, Ankita Garg <ankita@in.ibm.com> wrote:
> > I) Dynamic Power Transition
> >
> > The goal here is to ensure that as much as possible, on an idle system,
> > the memory references do not get spread across the entire RAM, a problem
> > similar to memory fragmentation. The proposed approach is as below:
> >
> > 1) One of the first things is to ensure that the memory allocations do
> > not spill over to more number of regions. Thus the allocator needs to
> > be aware of the address boundary of the different regions.
>
> Why does the allocator need to know about address boundaries? Why
> isn't it enough to make the page allocator and reclaim policies favor using
> memory from lower addresses as aggressively as possible? That'd mean
> we'd favor the first memory banks and could keep the remaining ones
> powered off as much as possible.
Yes, this will work to a limited extent when we have few regions to
account for. However if applications start and stop leaving large
holes in the address map, it may not worth the effort of migrating
pages to lower addresses to pack the holes.
> IOW, why do we need to support scenarios such as this:
>
> bank 0 bank 1 bank 2 bank3
> | online | offline | online | offline |
>
> instead of using memory compaction and possibly something like the
> SLUB defragmentation patches to turn the memory map into this:
>
> bank 0 bank 1 bank 2 bank3
> | online | online | offline | offline |
Yes, this is what we need, but also have a notion of how many pages
are used in each bank so that we can pack pages from under utilized
banks into a reasonably used bank and thereby free more banks.
Freeing more banks + clustering all used or free banks gives us more
power saving benefits.
> > 2) At the time of allocation, before spilling over allocations to the
> > next logical region, the allocator needs to make a best attempt to
> > reclaim some memory from within the existing region itself first. The
> > reclaim here needs to be in LRU order within the region. However, if
> > it is ascertained that the reclaim would take a lot of time, like there
> > are quite a fe write-backs needed, then we can spill over to the next
> > memory region (just like our NUMA node allocation policy now).
>
> I think a much more important question is what happens _after_ we've
> allocated and free'd tons of memory few times. AFAICT, memory
> regions don't help with that kind of fragmentation that will eventually
> happen anyway.
Memory regions allow us to have a zone per-region. This helps in the
cases were allocations are fragments into multiple regions by
potentially reclaiming very low utilized regions and packing the pages
into higher utilized regions. The requirement is a standard
de-fragmentation approach, except that the cluster of allocations
should fall within a region (any region) as much as possible.
--Vaidy
^ permalink raw reply
* Re: [PATCH 00/10] mm: Linux VM Infrastructure to support Memory Power Management
From: Pekka Enberg @ 2011-07-06 9:01 UTC (permalink / raw)
To: Ankita Garg
Cc: Paul E. McKenney, linux-kernel, Dave Hansen, linux-mm,
thomas.abraham, KAMEZAWA Hiroyuki, linux-pm, Christoph Lameter,
linux-arm-kernel, Arjan van de Ven
In-Reply-To: <CAOJsxLHQP=-srK_uYYBsPb7+rUBnPZG7bzwtCd-rRaQa4ikUFg@mail.gmail.com>
On Wed, Jul 6, 2011 at 11:45 AM, Pekka Enberg <penberg@kernel.org> wrote:
> Hi Ankita,
>
> [ I don't really know anything about memory power management but
> here's my two cents since you asked for it. ]
>
> On Wed, Jun 29, 2011 at 4:00 PM, Ankita Garg <ankita@in.ibm.com> wrote:
>> I) Dynamic Power Transition
>>
>> The goal here is to ensure that as much as possible, on an idle system,
>> the memory references do not get spread across the entire RAM, a problem
>> similar to memory fragmentation. The proposed approach is as below:
>>
>> 1) One of the first things is to ensure that the memory allocations do
>> not spill over to more number of regions. Thus the allocator needs to
>> be aware of the address boundary of the different regions.
>
> Why does the allocator need to know about address boundaries? Why
> isn't it enough to make the page allocator and reclaim policies favor using
> memory from lower addresses as aggressively as possible? That'd mean
> we'd favor the first memory banks and could keep the remaining ones
> powered off as much as possible.
>
> IOW, why do we need to support scenarios such as this:
>
> bank 0 bank 1 bank 2 bank3
> | online | offline | online | offline |
>
> instead of using memory compaction and possibly something like the
> SLUB defragmentation patches to turn the memory map into this:
>
> bank 0 bank 1 bank 2 bank3
> | online | online | offline | offline |
>
>> 2) At the time of allocation, before spilling over allocations to the
>> next logical region, the allocator needs to make a best attempt to
>> reclaim some memory from within the existing region itself first. The
>> reclaim here needs to be in LRU order within the region. However, if
>> it is ascertained that the reclaim would take a lot of time, like there
>> are quite a fe write-backs needed, then we can spill over to the next
>> memory region (just like our NUMA node allocation policy now).
>
> I think a much more important question is what happens _after_ we've
> allocated and free'd tons of memory few times. AFAICT, memory
> regions don't help with that kind of fragmentation that will eventually
> happen anyway.
Btw, I'd also decouple the 'memory map' required for PASR from
memory region data structure and use page allocator hooks for letting
the PASR driver know about allocated and unallocated memory. That
way the PASR driver could automatically detect if full banks are
unused and power them off. That'd make memory power management
transparent to the VM regardless of whether we're using hardware or
software poweroff.
Pekka
^ permalink raw reply
* Re: [PATCH 00/10] mm: Linux VM Infrastructure to support Memory Power Management
From: Pekka Enberg @ 2011-07-06 8:45 UTC (permalink / raw)
To: Ankita Garg
Cc: Paul E. McKenney, linux-kernel, Dave Hansen, linux-mm,
thomas.abraham, KAMEZAWA Hiroyuki, linux-pm, Christoph Lameter,
linux-arm-kernel, Arjan van de Ven
In-Reply-To: <20110629130038.GA7909@in.ibm.com>
Hi Ankita,
[ I don't really know anything about memory power management but
here's my two cents since you asked for it. ]
On Wed, Jun 29, 2011 at 4:00 PM, Ankita Garg <ankita@in.ibm.com> wrote:
> I) Dynamic Power Transition
>
> The goal here is to ensure that as much as possible, on an idle system,
> the memory references do not get spread across the entire RAM, a problem
> similar to memory fragmentation. The proposed approach is as below:
>
> 1) One of the first things is to ensure that the memory allocations do
> not spill over to more number of regions. Thus the allocator needs to
> be aware of the address boundary of the different regions.
Why does the allocator need to know about address boundaries? Why
isn't it enough to make the page allocator and reclaim policies favor using
memory from lower addresses as aggressively as possible? That'd mean
we'd favor the first memory banks and could keep the remaining ones
powered off as much as possible.
IOW, why do we need to support scenarios such as this:
bank 0 bank 1 bank 2 bank3
| online | offline | online | offline |
instead of using memory compaction and possibly something like the
SLUB defragmentation patches to turn the memory map into this:
bank 0 bank 1 bank 2 bank3
| online | online | offline | offline |
> 2) At the time of allocation, before spilling over allocations to the
> next logical region, the allocator needs to make a best attempt to
> reclaim some memory from within the existing region itself first. The
> reclaim here needs to be in LRU order within the region. However, if
> it is ascertained that the reclaim would take a lot of time, like there
> are quite a fe write-backs needed, then we can spill over to the next
> memory region (just like our NUMA node allocation policy now).
I think a much more important question is what happens _after_ we've
allocated and free'd tons of memory few times. AFAICT, memory
regions don't help with that kind of fragmentation that will eventually
happen anyway.
Pekka
^ permalink raw reply
* Re: Random freezing failure with NFS and automount
From: Rafael J. Wysocki @ 2011-07-04 23:29 UTC (permalink / raw)
To: svaidy; +Cc: linux-pm
In-Reply-To: <20110704171449.GC5357@dirshya.in.ibm.com>
On Monday, July 04, 2011, Vaidyanathan Srinivasan wrote:
> * Rafael J. Wysocki <rjw@sisk.pl> [2011-07-03 09:07:18]:
>
> > Hi,
> >
> > On Tuesday, June 28, 2011, Vaidyanathan Srinivasan wrote:
> > > Hi,
> > >
> > > I have random freezing failures on my laptop running 2.6.39 kernel.
> > > The laptop has NFS client and automount. Network could have been
> > > disconnected by the time suspend is attempted, hence nfs client should
> > > fail all operations, just freeze and allow laptop to suspend.
> > >
> > > I need some help to drill deeper at this log and also suggestions on
> > > config options to try and get more information to help me root cause
> > > this issue.
> > >
> > > This happens once in 4-5 suspend/resume cycles, does not succeed on
> > > retry, eventually I have to reboot.
> >
> > This is a tasks freezer failure, ie. the freezing of tasks fails, because
> > one of them refuses to handle signals for 20 s. This is probably related
> > to waiting on a VFS mutex in the TASK_UNINTERRUPTIBLE state.
> >
> > We don't handle those cases nicely right now, sorry about that.
>
> Hi Rafael,
>
> Thanks for taking a look. The NFS mount option in hard,intr so
> I would expect an interruptible sleep. I will take this to file
> system folks and see if they can help. I will also review my mount
> options to improve the situation.
>
> When you said we are not handling the situation, what did you mean?
I meant that the freezing fails in those cases.
> We seem to cleanly unfreeze the tasks and return the system to working
> state (though suspend fails). Maybe we should send some signals and
> try to prod the failing task to get to freeze? What is needed here to
> improve our framework?
Probably there is a bug (or more bugs) in our error code paths. That wouldn't
suprpise me too much, because those code paths are not tested very hard ...
Thanks,
Rafael
^ permalink raw reply
* Re: Random freezing failure with NFS and automount
From: Vaidyanathan Srinivasan @ 2011-07-04 17:14 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: linux-pm
In-Reply-To: <201107030907.18203.rjw@sisk.pl>
* Rafael J. Wysocki <rjw@sisk.pl> [2011-07-03 09:07:18]:
> Hi,
>
> On Tuesday, June 28, 2011, Vaidyanathan Srinivasan wrote:
> > Hi,
> >
> > I have random freezing failures on my laptop running 2.6.39 kernel.
> > The laptop has NFS client and automount. Network could have been
> > disconnected by the time suspend is attempted, hence nfs client should
> > fail all operations, just freeze and allow laptop to suspend.
> >
> > I need some help to drill deeper at this log and also suggestions on
> > config options to try and get more information to help me root cause
> > this issue.
> >
> > This happens once in 4-5 suspend/resume cycles, does not succeed on
> > retry, eventually I have to reboot.
>
> This is a tasks freezer failure, ie. the freezing of tasks fails, because
> one of them refuses to handle signals for 20 s. This is probably related
> to waiting on a VFS mutex in the TASK_UNINTERRUPTIBLE state.
>
> We don't handle those cases nicely right now, sorry about that.
Hi Rafael,
Thanks for taking a look. The NFS mount option in hard,intr so
I would expect an interruptible sleep. I will take this to file
system folks and see if they can help. I will also review my mount
options to improve the situation.
When you said we are not handling the situation, what did you mean?
We seem to cleanly unfreeze the tasks and return the system to working
state (though suspend fails). Maybe we should send some signals and
try to prod the failing task to get to freeze? What is needed here to
improve our framework?
--Vaidy
^ permalink raw reply
* Re: [PATCH v3 2/3] PM / DEVFREQ: add example governors
From: MyungJoo Ham @ 2011-07-04 8:58 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
linux-pm, Thomas Gleixner
In-Reply-To: <201107041043.25150.rjw@sisk.pl>
On Mon, Jul 4, 2011 at 5:43 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Monday, July 04, 2011, MyungJoo Ham wrote:
>> Hello,
>>
>> 2011/7/3 Rafael J. Wysocki <rjw@sisk.pl>:
>> > Hi,
>> >
>> > On Friday, May 27, 2011, MyungJoo Ham wrote:
>> >> Three CPUFREQ-like governors are provided as examples.
>> >>
>> >> powersave: use the lowest frequency possible. The user (device) should
>> >> set the polling_ms as 0 because polling is useless for this governor.
>> >>
>> >> performance: use the highest freqeuncy possible. The user (device)
>> >> should set the polling_ms as 0 because polling is useless for this
>> >> governor.
>> >>
>> >> simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.
>> >>
>> >> When a user updates OPP entries (enable/disable/add), OPP framework
>> >> automatically notifies DEVFREQ to update operating frequency
>> >> accordingly. Thus, DEVFREQ users (device drivers) do not need to update
>> >> DEVFREQ manually with OPP entry updates or set polling_ms for powersave
>> >> , performance, or any other "static" governors.
>> >>
>> >> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
>> >> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>> []
>> >> +
>> >> + /* Set the desired frequency based on the load */
>> >> + a = (unsigned long long) stat.busy_time * stat.current_frequency;
>> >
>> > What's the purpose of the conversion?
>>
>> Assuming that the work speed of a device is proportional to its
>> frequency, it measures the amount of work done.
>> It's time * work/time. For example, during the last 10 second, if the
>> busy_time was 5 sec and frequency was 10MHz,
>> it's "50M", which is same as 20MHz and 2.5 sec.
>
> I understand that, but my question was why you're doing a forced conversion
> to (unsigned long long).
Ah.. that was for the 64bit operations.
Both busy_time and current_frequency are 32bit and current_frequency
may be a big number.
Thus, in order to get "freq" value without losing bits (e.g., if
current_frequency = 1GHz and busy_time = 8000, we get an overflow
without 64bit operations), I've inserted 64bit operations with the
conversion. For the cosmetic reasons, it appears that "u64" looks
better though.
>
> Thanks,
> Rafael
>
Thank you.
- MyungJoo
--
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
^ permalink raw reply
* Re: [PATCH v3 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
From: Rafael J. Wysocki @ 2011-07-04 8:57 UTC (permalink / raw)
To: myungjoo.ham
Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
linux-pm, Thomas Gleixner
In-Reply-To: <CAJ0PZbSBCK96uw5x_o6ZsMtvFo4U5Hexe10JoaJfPQisLbAshg@mail.gmail.com>
On Monday, July 04, 2011, MyungJoo Ham wrote:
> 2011/7/3 Rafael J. Wysocki <rjw@sisk.pl>:
> > Hi,
> >
> > I'm not sure if spelling DEVFREQ in the kerneldoc comments with capitals is
> > a good idea, it looks kind of odd. I'm not sure what would be look better,
> > though.
>
> Umm.. what about "devfreq" using all non-capitals? It looks fine to me as well.
Works for me.
> >
> > On Friday, May 27, 2011, MyungJoo Ham wrote:
> >
> > ...
> >> +/**
> >> + * devfreq_monitor() - Regularly run devfreq_do() and support device DEVFREQ tickle.
> >
> > I'd say "periodically" istead of "regularly".
>
> Yes, that is the proper term. Thanks.
>
> >
> >> + * @work: the work struct used to run devfreq_monitor periodically.
> >
> > Also please say something more about the "tickle" thinkg here.
>
> Sure.
>
> >
> >> + */
> >> +static void devfreq_monitor(struct work_struct *work)
> >> +{
> >> + struct devfreq *devfreq;
> >> + int error;
> >> + bool continue_polling = false;
> >> + struct devfreq *to_remove = NULL;
> >> +
> >> + mutex_lock(&devfreq_list_lock);
> >> +
> >> + list_for_each_entry(devfreq, &devfreq_list, node) {
> >> + /* Remove the devfreq entry that failed */
> >> + if (to_remove) {
> >> + list_del(&to_remove->node);
> >> + kfree(to_remove);
> >> + to_remove = NULL;
> >> + }
> >> +
> >> + /*
> >> + * If the device is tickled and the tickle duration is left,
> >> + * do not change the frequency for a while
> >> + */
> >> + if (devfreq->tickle) {
> >> + continue_polling = true;
> >> + devfreq->tickle--;
> >> +
> >> + /*
> >> + * If the tickle is ending and the device is not going
> >> + * to poll, force the device to poll next time so that
> >> + * it can return to the original frequency afterwards.
> >> + * However, non-polling device will have 0 polling_ms,
> >> + * it will not poll again later.
> >> + */
> >> + if (devfreq->tickle == 0 && devfreq->next_polling == 0)
> >> + devfreq->next_polling = 1;
> >> +
> >> + continue;
> >> + }
> >> +
> >> + /* This device does not require polling */
> >> + if (devfreq->next_polling == 0)
> >> + continue;
> >> +
> >> + continue_polling = true;
> >> +
> >> + if (devfreq->next_polling == 1) {
> >> + /* This device is polling this time */
> >
> > I'd remove this comment, it's confusing IMO. Besides, it may be better
> > to structure the code like this:
> >
> > if (devfreq->next_polling-- == 1) {
> > }
> >
> > and then you wouldn't need the "else" at all.
>
> Ok, I'll try that style.
>
> >
> >> + error = devfreq_do(devfreq);
> >> + if (error && error != -EAGAIN) {
> >> + /*
> >> + * Remove a devfreq with error. However,
> >> + * We cannot remove it right here because the
> >
> > Comma after "here", please.
>
> "We" should be "we" here, but no comma there though:
> http://owl.english.purdue.edu/owl/resource/607/02/
OK, whatever.
> However, "However, because the ..... above, we cannot remove it right
> here" is correct.
Yes, it is.
> >
> >> + * devfreq pointer is going to be used by
> >> + * list_for_each_entry above. Thus, it is
> >> + * removed afterwards.
> >
> > Why don't you use list_for_each_entry_safe(), then?
>
> Ah.. that's a good idea. Thanks! Now, I can reorganize this one.
>
> >
> >> + */
> >> + to_remove = devfreq->dev;
> >> + dev_err(devfreq->dev, "devfreq_do error(%d). "
> >> + "DEVFREQ is removed from the device\n",
> >> + error);
> >> + continue;
> >> + }
> >> + devfreq->next_polling = DIV_ROUND_UP(
> >> + devfreq->profile->polling_ms,
> >> + DEVFREQ_INTERVAL);
> >> + } else {
> >> + /* The device will poll later when next_polling = 1 */
> >> + devfreq->next_polling--;
> >> + }
> >> + }
> >> +
> >> + if (to_remove) {
> >> + list_del(&to_remove->node);
> >> + kfree(to_remove);
> >> + to_remove = NULL;
> >> + }
> >> +
> >> + if (continue_polling) {
> >> + polling = true;
> >> + queue_delayed_work(devfreq_wq, &devfreq_work,
> >> + msecs_to_jiffies(DEVFREQ_INTERVAL));
> >> + } else {
> >> + polling = false;
> >> + }
> >
> > OK, so why exactly continue_polling is needed? It seems you might siply use
> > "polling" instead of it above.
>
> The purpose of continue_polling seems to disappear in the middle of
> development. I'll remove it.
>
> >
> >> +
> >> + mutex_unlock(&devfreq_list_lock);
> >> +}
> > ...
> >> +/**
> >> + * devfreq_update() - Notify that the device OPP has been changed.
> >> + * @dev: the device whose OPP has been changed.
> >> + * @may_not_exist: do not print error message even if the device
> >> + * does not have devfreq entry.
> >
> > This argument isn't used any more.
>
> Ah.. yes. sure.
>
> >
> >> + */
> >> +int devfreq_update(struct device *dev)
> >> +{
> >> + struct devfreq *devfreq;
> >> + int err = 0;
> >> +
> >> + mutex_lock(&devfreq_list_lock);
> >> +
> >> + devfreq = find_device_devfreq(dev);
> >> + if (IS_ERR(devfreq)) {
> >> + err = PTR_ERR(devfreq);
> >> + goto out;
> >> + }
> >> +
> >> + if (devfreq->tickle) {
> >> + /* If the max freq available is changed, re-tickle */
> >
> > It would be good to explain what "re-tickle" means.
>
> Of course. I'll explain that.
>
> >
> >> + unsigned long freq = devfreq->profile->max_freq;
> >> + struct opp *opp = opp_find_freq_floor(devfreq->dev, &freq);
> >> +
> >> + if (IS_ERR(opp)) {
> >> + err = PTR_ERR(opp);
> >> + goto out;
> >> + }
> >> +
> >> + /* Max freq available is not changed */
> >> + if (devfreq->previous_freq == freq)
> >> + goto out;
> >> +
> >> + err = devfreq->profile->target(devfreq->dev, opp);
> >> + if (!err)
> >> + devfreq->previous_freq = freq;
> >
> > Why don't we run devfreq_do() in this case?
>
> When the list of available frequencies is updated and the device is to
> be kept at its maximum frequency, we do not need to reevaluate the
> device's activities to setup the new frequency. In such cases, we only
> need to keep it at its maximum freuqency. If the maximum frequency is
> not changed, we don't need anything to do (when "tickling" is
> completed, devfreq_do() will reevaluate automatically); otherwise,
> tickling again at the new frequency is needed (re-tickle).
OK
> >> + } else {
> >> + /* Reevaluate the proper frequency */
> >> + err = devfreq_do(devfreq);
> >> + }
> >> +
> >> +out:
> >> + mutex_unlock(&devfreq_list_lock);
> >> + return err;
> >> +}
> >> +
> >
> > Add a kerneldoc comment here, please.
>
> Yes. I'll add one. (and for any other functions missing kerneldoc comments)
Well, if the function is a static one-liner, you can skip the kerneldoc IMHO
However, that one below is not really a one-liner.. :-)
> >> +static int _devfreq_tickle_device(struct devfreq *df, unsigned long delay)
> >> +{
> >> + int err = 0;
> >> + unsigned long freq;
> >> + struct opp *opp;
> >> +
> >> + freq = df->profile->max_freq;
> >> + opp = opp_find_freq_floor(df->dev, &freq);
> >> + if (IS_ERR(opp))
> >> + return PTR_ERR(opp);
> >> +
> >> + if (df->previous_freq != freq) {
> >> + err = df->profile->target(df->dev, opp);
> >> + if (!err)
> >> + df->previous_freq = freq;
> >> + }
> >> + if (err) {
> >> + dev_err(df->dev, "%s: Cannot set frequency.\n", __func__);
> >> + } else {
> >> + df->tickle = delay;
> >> + df->num_tickle++;
> >> + }
> >> +
> >> + if (devfreq_wq && !polling) {
> >> + polling = true;
> >> + queue_delayed_work(devfreq_wq, &devfreq_work,
> >> + msecs_to_jiffies(DEVFREQ_INTERVAL));
> >> + }
> >> +
> >> + return err;
> >> +}
> >
> > Thanks,
> > Rafael
> >
>
> Thank you so much for the comments!
You're welcome.
Thanks,
Rafael
^ permalink raw reply
* Re: [PATCH v3 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
From: Rafael J. Wysocki @ 2011-07-04 8:48 UTC (permalink / raw)
To: myungjoo.ham
Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
linux-pm, Thomas Gleixner
In-Reply-To: <CAJ0PZbQJadxCu=b4JWun29gdPpdsJPiP0WdqmSJK5px5nQRpgA@mail.gmail.com>
On Monday, July 04, 2011, MyungJoo Ham wrote:
> Hello,
>
> 2011/7/3 Rafael J. Wysocki <rjw@sisk.pl>:
> > On Friday, May 27, 2011, MyungJoo Ham wrote:
> >> 1. System-wide sysfs interface
> >> - tickle_all R: number of tickle_all execution
> >> W: tickle all devfreq devices
> >> - min_interval R: devfreq monitoring base interval in ms
> >> - monitoring R: shows whether devfreq monitoring is active or
> >> not.
> >>
> >> 2. Device specific sysfs interface
> >> - tickle R: number of tickle execution for the device
> >> W: tickle the device
> >>
> >> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> >> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> >>
> >> --
> >> Changed from v2
> >> - add ABI entries for devfreq sysfs interface
> >> ---
> >> Documentation/ABI/testing/sysfs-devices-devfreq | 21 ++++
> >> Documentation/ABI/testing/sysfs-power | 43 ++++++++
> >> drivers/base/power/devfreq.c | 133 ++++++++++++++++++++++-
> >> include/linux/devfreq.h | 3 +
> >> 4 files changed, 199 insertions(+), 1 deletions(-)
> >> create mode 100644 Documentation/ABI/testing/sysfs-devices-devfreq
> >>
> >> diff --git a/Documentation/ABI/testing/sysfs-devices-devfreq b/Documentation/ABI/testing/sysfs-devices-devfreq
> >> new file mode 100644
> >> index 0000000..7f35a64
> >> --- /dev/null
> >> +++ b/Documentation/ABI/testing/sysfs-devices-devfreq
> >> @@ -0,0 +1,21 @@
> >> +What: /sys/devices/.../devfreq/
> >> +Date: May 2011
> >> +Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
> >> +Description:
> >> + The /sys/device/.../devfreq directory will contain files
> >> + that provide interfaces to DEVFREQ for a specific device.
> >> +
> >> +What: /sys/devices/.../devfreq/tickle
> >> +Date: May 2011
> >> +Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
> >> +Description:
> >> + The /sys/devices/.../devfreq/tickle file allows user space
> >> + to force the corresponding device to operate at its maximum
> >> + operable frequency instaneously and temporarily. After a
> >> + designated duration has passed, the operating frequency returns
> >> + to normal. When a user reads the tickle entry, it returns
> >> + the number of tickle executions for the device. When a user
> >> + writes to the tickle entry with the tickle duration in ms,
> >> + the effect of device tickling is held for the designated
> >> + duration. Note that the duration is rounded-up by
> >> + the value DEVFREQ_INTERVAL defined in devfreq.c
> >> diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
> >> index b464d12..4d8434b 100644
> >> --- a/Documentation/ABI/testing/sysfs-power
> >> +++ b/Documentation/ABI/testing/sysfs-power
> >> @@ -172,3 +172,46 @@ Description:
> >>
> >> Reading from this file will display the current value, which is
> >> set to 1 MB by default.
> >> +
> >> +What: /sys/power/devfreq/
> >> +Date: May 2011
> >> +Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
> >> +Description:
> >> + The /sys/power/devfreq directory will contain files that will
> >> + provide a unified interface to the DEVFREQ, a generic DVFS
> >> + (dynamic voltage and frequency scaling) framework.
> >> +
> >> +What: /sys/power/devfreq/tickle_all
> >> +Date: May 2011
> >> +Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
> >> +Description:
> >> + The /sys/power/devfreq/tickle_all file allows user space to
> >> + force every device with DEVFREQ to operate at the maximum
> >> + frequency of the device instaneously and temporarily. After
> >> + a designated delay has passed, the operating frequency returns
> >> + to normal. If a user reads the tickle_all entry, it returns
> >> + the number of tickle_all executions. When writing to the
> >> + tickle_all entry, the user should supply with the duration of
> >> + tickle in ms (the "designated delay" mentioned before). Then,
> >> + the effect of tickle_all will hold for the denoted duration.
> >> + Note that the duration is rounded by the monitoring period
> >> + defined by DEVFREQ_INTERVAL in /drivers/base/power/devfreq.c.
> >> +
> >> +What: /sys/power/devfreq/min_interval
> >> +Date: May 2011
> >> +Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
> >> +Description:
> >> + The /sys/power/devfreq/min_interval file shows the monitoring
> >> + period defined by DEVFREQ_INTERVAL in
> >> + /drivers/base/power/devfreq.c. The duration of device tickling
> >> + is rounded-up by DEVFREQ_INTERVAL.
> >> +
> >> +What: /sys/power/devfreq/monitoring
> >> +Date: May 2011
> >> +Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
> >> +Description:
> >> + The /sys/power/devfreq/monitoring file shows whether DEVFREQ
> >> + is periodically monitoring. Periodic monitoring is activated
> >> + if there is a device that wants periodic monitoring for DVFS or
> >> + there is a device that is tickled (and the tickling duration is
> >> + not yet expired).
> >> diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
> >> index 7648a94..709c138 100644
> >> --- a/drivers/base/power/devfreq.c
> >> +++ b/drivers/base/power/devfreq.c
> >> @@ -40,6 +40,9 @@ static LIST_HEAD(devfreq_list);
> >> /* Exclusive access to devfreq_list and its elements */
> >> static DEFINE_MUTEX(devfreq_list_lock);
> >>
> >> +static struct kobject *devfreq_kobj;
> >> +static struct attribute_group dev_attr_group;
> >> +
> >> /**
> >> * find_device_devfreq() - find devfreq struct using device pointer
> >> * @dev: device pointer used to lookup device DEVFREQ.
> >> @@ -237,6 +240,8 @@ int devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile,
> >> queue_delayed_work(devfreq_wq, &devfreq_work,
> >> msecs_to_jiffies(DEVFREQ_INTERVAL));
> >> }
> >> +
> >> + sysfs_update_group(&dev->kobj, &dev_attr_group);
> >
> > This appears to modify the attributes of the device, but anything like it is
> > not mentioned in the documentation. What's up?
>
> It is mentioned "Documentation/ABI/testing/sysfs-devices-devfreq".
Yes, it is, sorry. But it should be under /sys/devices/.../power/ .
> >
> >> out:
> >> mutex_unlock(&devfreq_list_lock);
> >>
> >> @@ -263,6 +268,8 @@ int devfreq_remove_device(struct device *dev)
> >> return -EINVAL;
> >> }
> >>
> >> + sysfs_remove_group(&dev->kobj, &dev_attr_group);
> >> +
> >> list_del(&devfreq->node);
> >>
> >> kfree(devfreq);
> >> @@ -344,7 +351,7 @@ static int _devfreq_tickle_device(struct devfreq *df, unsigned long delay)
> >> if (devfreq_wq && !polling) {
> >> polling = true;
> >> queue_delayed_work(devfreq_wq, &devfreq_work,
> >> - msecs_to_jiffies(DEVFREQ_INTERVAL));
> >> + msecs_to_jiffies(DEVFREQ_INTERVAL));
> >
> > This change doesn't seem to belong to this patch.
>
> Oops. I'll rearrange the patch series.
>
> >
> >> }
> >>
> >> return err;
> >> @@ -382,6 +389,116 @@ int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
> >> return err;
> >> }
> >>
> >> +static int num_tickle_all;
> >> +static ssize_t tickle_all(struct device *dev, struct device_attribute *attr,
> >> + const char *buf, size_t count)
> >> +{
> >> + int duration = 0;
> >> + struct devfreq *tmp;
> >> + unsigned long delay;
> >> +
> >> + sscanf(buf, "%d", &duration);
> >> + if (duration < DEVFREQ_INTERVAL)
> >> + duration = DEVFREQ_INTERVAL;
> >> +
> >> + if (unlikely(IS_ERR_OR_NULL(dev))) {
> >> + pr_err("%s: Invalid parameters\n", __func__);
> >
> > Please say "null device" instead of in addition to the above.
>
> Ok. I'll let it say "Null or invalid device" as it's IS_ERR_OR_NULL.
OK
> >
> >> + return -EINVAL;
> >> + }
> >> +
> >> + delay = DIV_ROUND_UP(duration, DEVFREQ_INTERVAL);
> >> +
> >> + mutex_lock(&devfreq_list_lock);
> >> + list_for_each_entry(tmp, &devfreq_list, node) {
> >> + _devfreq_tickle_device(tmp, delay);
> >> + }
> >> + mutex_unlock(&devfreq_list_lock);
> >> +
> >> + num_tickle_all++;
> >> + return count;
> >> +}
> >> +
> >> +static ssize_t show_num_tickle_all(struct device *dev,
> >> + struct device_attribute *attr, char *buf)
> >> +{
> >> + return sprintf(buf, "%d\n", num_tickle_all);
> >> +}
> >> +
> >> +static ssize_t show_min_interval(struct device *dev,
> >> + struct device_attribute *attr, char *buf)
> >> +{
> >> + return sprintf(buf, "%d\n", DEVFREQ_INTERVAL);
> >> +}
> >> +
> >> +static ssize_t show_monitoring(struct device *dev,
> >> + struct device_attribute *attr, char *buf)
> >> +{
> >> + return sprintf(buf, "%d\n", monitoring ? 1 : 0);
> >> +}
> >> +
> >> +static DEVICE_ATTR(tickle_all, 0644, show_num_tickle_all, tickle_all);
> >> +static DEVICE_ATTR(min_interval, 0444, show_min_interval, NULL);
> >> +static DEVICE_ATTR(monitoring, 0444, show_monitoring, NULL);
> >> +static struct attribute *devfreq_entries[] = {
> >> + &dev_attr_tickle_all.attr,
> >> + &dev_attr_min_interval.attr,
> >> + &dev_attr_monitoring.attr,
> >> + NULL,
> >> +};
> >> +static struct attribute_group devfreq_attr_group = {
> >> + .name = NULL,
> >> + .attrs = devfreq_entries,
> >> +};
> >> +
> >> +static ssize_t tickle(struct device *dev, struct device_attribute *attr,
> >> + const char *buf, size_t count)
> >> +{
> >> + int duration;
> >> + struct devfreq *df;
> >> + unsigned long delay;
> >> +
> >> + sscanf(buf, "%d", &duration);
> >> + if (duration < DEVFREQ_INTERVAL)
> >> + duration = DEVFREQ_INTERVAL;
> >> +
> >> + if (unlikely(IS_ERR_OR_NULL(dev))) {
> >> + pr_err("%s: Invalid parameters\n", __func__);
> >
> > Like above.
>
> Yup.
>
> >
> >> + return -EINVAL;
> >> + }
> >> +
> >> + delay = DIV_ROUND_UP(duration, DEVFREQ_INTERVAL);
> >> +
> >> + mutex_lock(&devfreq_list_lock);
> >> + df = find_device_devfreq(dev);
> >> + _devfreq_tickle_device(df, delay);
> >> + mutex_unlock(&devfreq_list_lock);
> >> +
> >> + return count;
> >> +}
> >> +
> >> +static ssize_t show_num_tickle(struct device *dev,
> >> + struct device_attribute *attr, char *buf)
> >> +{
> >> + struct devfreq *df;
> >> +
> >> + df = find_device_devfreq(dev);
> >> +
> >> + if (!IS_ERR(df))
> >> + return sprintf(buf, "%d\n", df->num_tickle);
> >> +
> >> + return PTR_ERR(df);
> >> +}
> >> +
> >> +static DEVICE_ATTR(tickle, 0644, show_num_tickle, tickle);
> >> +static struct attribute *dev_entries[] = {
> >> + &dev_attr_tickle.attr,
> >> + NULL,
> >> +};
> >> +static struct attribute_group dev_attr_group = {
> >> + .name = "devfreq",
> >> + .attrs = dev_entries,
> >> +};
> >> +
> >> static int __init devfreq_init(void)
> >> {
> >> mutex_lock(&devfreq_list_lock);
> >> @@ -389,6 +506,20 @@ static int __init devfreq_init(void)
> >> polling = false;
> >> devfreq_wq = create_freezable_workqueue("devfreq_wq");
> >> INIT_DELAYED_WORK_DEFERRABLE(&devfreq_work, devfreq_monitor);
> >> +
> >> +#ifdef CONFIG_PM
> >> + /* Create sysfs */
> >> + devfreq_kobj = kobject_create_and_add("devfreq", power_kobj);
> >
> > Hmm, so power_kobj is global? It shouldn't be.
>
> Yes, it is global and it's declared at include/linux/kobject.h.
Oh, well.
> >
> > Generally, whatever adds attributes to /sys/power should be located in
> > kernel/power/ .
>
> I've put it at /sys/power because these attributes are global to every
> device with devfreq.
That's fine.
What I wanted to say was that all code adding attributes into /sys/power
should be located under kernel/power/ in the kernel source tree.
Otherwise we'll end up with a mess of dependencies that's hard to maintain.
> Anyway, if you think that's a weird location for it, could you please
> give me some hints on where would be proper to locate system-wide
> devfreq sysfs attributes?
> Or, do you think kernel/power/devfreq.c would be a proper location for devfreq?
>
> Or, what about allowing every /sys/devices/.../devfreq/global/* to be
> the same "global" attributes of devfreq?
>
> >
> >> + if (!devfreq_kobj) {
> >> + pr_err("Unable to create DEVFREQ kobject.\n");
> >> + goto out;
> >> + }
> >> + if (sysfs_create_group(devfreq_kobj, &devfreq_attr_group)) {
> >> + pr_err("Unable to create DEVFREQ sysfs entries.\n");
> >> + goto out;
> >> + }
> >> +#endif
> >> +out:
> >> mutex_unlock(&devfreq_list_lock);
> >>
> >> devfreq_monitor(&devfreq_work.work);
> >> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> >> index 1ec9a40..69334e7 100644
> >> --- a/include/linux/devfreq.h
> >> +++ b/include/linux/devfreq.h
> >> @@ -59,6 +59,7 @@ struct devfreq_governor {
> >> * at each executino of devfreq_monitor, tickle is decremented.
> >> * User may tickle a device-devfreq in order to set maximum
> >> * frequency instaneously with some guaranteed duration.
> >> + * @num_tickle number of tickle calls.
> >> *
> >> * This structure stores the DEVFREQ information for a give device.
> >> */
> >> @@ -72,6 +73,8 @@ struct devfreq {
> >> unsigned long previous_freq;
> >> unsigned int next_polling;
> >> unsigned int tickle;
> >> +
> >> + unsigned int num_tickle;
> >> };
> >>
> >> #if defined(CONFIG_PM_DEVFREQ)
> >
> > It looks like the above two changes should be moved to a separate patch.
>
> These two changes are for the sysfs interface as counting number of
> tickle is added with sysfs interface.
OK
Thanks,
Rafael
^ permalink raw reply
* Re: [PATCH v3 2/3] PM / DEVFREQ: add example governors
From: Rafael J. Wysocki @ 2011-07-04 8:43 UTC (permalink / raw)
To: myungjoo.ham
Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
linux-pm, Thomas Gleixner
In-Reply-To: <CAJ0PZbQuiwkW-Z+1Wrqmr3r_U+LdWS312RggkkjFBGFHES4wCw@mail.gmail.com>
On Monday, July 04, 2011, MyungJoo Ham wrote:
> Hello,
>
> 2011/7/3 Rafael J. Wysocki <rjw@sisk.pl>:
> > Hi,
> >
> > On Friday, May 27, 2011, MyungJoo Ham wrote:
> >> Three CPUFREQ-like governors are provided as examples.
> >>
> >> powersave: use the lowest frequency possible. The user (device) should
> >> set the polling_ms as 0 because polling is useless for this governor.
> >>
> >> performance: use the highest freqeuncy possible. The user (device)
> >> should set the polling_ms as 0 because polling is useless for this
> >> governor.
> >>
> >> simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.
> >>
> >> When a user updates OPP entries (enable/disable/add), OPP framework
> >> automatically notifies DEVFREQ to update operating frequency
> >> accordingly. Thus, DEVFREQ users (device drivers) do not need to update
> >> DEVFREQ manually with OPP entry updates or set polling_ms for powersave
> >> , performance, or any other "static" governors.
> >>
> >> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> >> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> []
> >> +
> >> + /* Set the desired frequency based on the load */
> >> + a = (unsigned long long) stat.busy_time * stat.current_frequency;
> >
> > What's the purpose of the conversion?
>
> Assuming that the work speed of a device is proportional to its
> frequency, it measures the amount of work done.
> It's time * work/time. For example, during the last 10 second, if the
> busy_time was 5 sec and frequency was 10MHz,
> it's "50M", which is same as 20MHz and 2.5 sec.
I understand that, but my question was why you're doing a forced conversion
to (unsigned long long).
Thanks,
Rafael
^ permalink raw reply
* Re: [PATCH v2 00/11] PM QoS: add a per-device wake-up latency constraint class
From: Rafael J. Wysocki @ 2011-07-04 8:38 UTC (permalink / raw)
To: Vishwanath Sripathy
Cc: markgross, Linux PM mailing list, linux-omap, Jean Pihet-XID
In-Reply-To: <62b1cfbe626e329338b9f0a9b69ef2d3@mail.gmail.com>
On Monday, July 04, 2011, Vishwanath Sripathy wrote:
> > -----Original Message-----
> > From: linux-omap-owner@vger.kernel.org [mailto:linux-omap-
> > owner@vger.kernel.org] On Behalf Of Rafael J. Wysocki
> > Sent: Sunday, July 03, 2011 12:51 AM
> > To: jean.pihet@newoldbits.com
> > Cc: Paul Walmsley; Kevin Hilman; Magnus Damm; Linux PM mailing list;
> > linux-omap@vger.kernel.org; markgross@thegnar.org; Jean Pihet
> > Subject: Re: [PATCH v2 00/11] PM QoS: add a per-device wake-up
> > latency constraint class
> >
> > Hi,
> >
> > On Thursday, June 30, 2011, jean.pihet@newoldbits.com wrote:
> > > From: Jean Pihet <j-pihet@ti.com>
> > >
> > > This patch set is in an RFC state, for review and comments.
> >
> > First off, I'm sorry I couldn't review the patchset earlier.
> >
> > > In order to implement the new class in PM QoS the following
> > changes have been
> > > made:
> > >
> > > 1. Add a new PM QoS class for device wake-up constraints
> > > (PM_QOS_DEV_WAKEUP_LATENCY).
> > > Due to the per-device nature of the new class the constraints
> > lists are stored
> > > inside the device dev_pm_info struct instead of the internal per-
> > class
> > > constraints lists.
> > > The new class is only available from kernel drivers and so is not
> > exported to
> > > user space.
> >
> > Have you considered a design in which multiple devices may use the
> > same
> > list of constraints? It seems plausible that the constraints will
> > be the
> > same, for example, for all Ethernet adapters in the system, in which
> > case it
> > will be wasteful to duplicate the list of constraints for each of
> > them.
> >
> > > 2. Added a notification of device insertion/removal from the
> > device PM framework
> > > to PM QoS.
> > > This allows to init/de-init the per-device constraints list upon
> > device insertion
> > > and removal.
> > > RFC state for comments and review, barely tested
> >
> > I need to have a look at the details, but in principle this means
> > that the
> > per-device lists will be usable only after the devices have been
> > registered.
> > In particular, this means that it will only be possible to add new
> > constraints
> > after registering the device, which may be too late for some use
> > cases.
> >
> > > 3. Make the pm_qos_add_request API more generic by using a
> > > struct pm_qos_parameters parameter. This allows easy extension in
> > the future.
> > >
> > > 4. Upon a change of the strongest constraint in the
> > PM_QOS_DEV_WAKEUP_LATENCY
> > > class a notification chain mechanism is used to take action on the
> > system.
> > > This is the proposed way to have PM QoS and the platform dependant
> > code to
> > > interact with each other, cf. 4 below.
> >
> > I guess you mean 5.?
> >
> > I think we will need something in addition to the notifier here.
> > For example,
> > I wouldn't like any core code, like runtime PM or cpuidle, to have
> > to register
> > a notifier with PM QoS.
> >
> > > The notification mechanism now passes the constraint request
> > struct ptr in
> > > order for the notifier callback to have access to the full set of
> > constraint
> > > data, e.g. the struct device ptr.
> > >
> > > 5. cpuidle interaction with the OMAP3 cpuidle handler
> > > Since cpuidle is a CPU centric framework it decides the MPU next
> > power state
> > > based on the MPU exit_latency and target_residency figures.
> > >
> > > The rest of the power domains get their next power state
> > programmed from
> > > the PM_QOS_DEV_WAKEUP_LATENCY class of the PM QoS framework, via
> > the device
> > > wake-up latency constraints.
> > >
> > > Note: the exit_latency and target_residency figures of the MPU
> > include the MPU
> > > itself and the peripherals needed for the MPU to execute
> > instructions (e.g.
> > > main memory, caches, IRQ controller, MMU etc).
> > > Some of those peripherals can belong to other power domains than
> > the MPU
> > > subsystem and so the corresponding latencies must be included in
> > those figures.
> > >
> > > 6. Update the pm_qos_add_request callers to the generic API
> > >
> > > 7. Minor clean-ups and rename of struct fields
> > >
> > > Questions:
> > > 1. How to retrieve the device ptr from a given device driver in
> > order to add
> > > a constraint on it?
> > > 2. The device struct has recently been extended with the power
> > domain
> > > information. Can this be used to apply the constraints on power
> > domains?
> >
> > Yes, it can in principle, but that will require some work.
> >
> > > On-going developments, patches in preparation:
> > > 1. write Documentation for the new PM QoS class
> >
> > I'd wait with that until the code has settled.
> >
> > > 2. validate the constraints framework on OMAP4 HW (done on OMAP3)
> > > 3. refine the power domains wake-up latency and the cpuidle
> > figures
> > >
> > > Based on the master branch of the linux-omap git tree (3.0.0-rc3).
> > Compile
> > > tested using OMAP and x86 generic defconfigs.
> > > Tested on OMAP3 Beagleboard (ES2.x) with full RETention and OFF
> > modes.
> >
> > More detailed comments will follow.
> Thanks Rafael for reviewing the patches. Jean is currently on summer
> vacation for 2 weeks and will take this discussion forward once he is back
> to work on July 18.
OK, thanks for letting me know.
Regards,
Rafael
^ permalink raw reply
* Re: [PATCH v3 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
From: MyungJoo Ham @ 2011-07-04 8:34 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
linux-pm, Thomas Gleixner
In-Reply-To: <201107022356.24952.rjw@sisk.pl>
2011/7/3 Rafael J. Wysocki <rjw@sisk.pl>:
> Hi,
>
> I'm not sure if spelling DEVFREQ in the kerneldoc comments with capitals is
> a good idea, it looks kind of odd. I'm not sure what would be look better,
> though.
Umm.. what about "devfreq" using all non-capitals? It looks fine to me as well.
>
> On Friday, May 27, 2011, MyungJoo Ham wrote:
>
> ...
>> +/**
>> + * devfreq_monitor() - Regularly run devfreq_do() and support device DEVFREQ tickle.
>
> I'd say "periodically" istead of "regularly".
Yes, that is the proper term. Thanks.
>
>> + * @work: the work struct used to run devfreq_monitor periodically.
>
> Also please say something more about the "tickle" thinkg here.
Sure.
>
>> + */
>> +static void devfreq_monitor(struct work_struct *work)
>> +{
>> + struct devfreq *devfreq;
>> + int error;
>> + bool continue_polling = false;
>> + struct devfreq *to_remove = NULL;
>> +
>> + mutex_lock(&devfreq_list_lock);
>> +
>> + list_for_each_entry(devfreq, &devfreq_list, node) {
>> + /* Remove the devfreq entry that failed */
>> + if (to_remove) {
>> + list_del(&to_remove->node);
>> + kfree(to_remove);
>> + to_remove = NULL;
>> + }
>> +
>> + /*
>> + * If the device is tickled and the tickle duration is left,
>> + * do not change the frequency for a while
>> + */
>> + if (devfreq->tickle) {
>> + continue_polling = true;
>> + devfreq->tickle--;
>> +
>> + /*
>> + * If the tickle is ending and the device is not going
>> + * to poll, force the device to poll next time so that
>> + * it can return to the original frequency afterwards.
>> + * However, non-polling device will have 0 polling_ms,
>> + * it will not poll again later.
>> + */
>> + if (devfreq->tickle == 0 && devfreq->next_polling == 0)
>> + devfreq->next_polling = 1;
>> +
>> + continue;
>> + }
>> +
>> + /* This device does not require polling */
>> + if (devfreq->next_polling == 0)
>> + continue;
>> +
>> + continue_polling = true;
>> +
>> + if (devfreq->next_polling == 1) {
>> + /* This device is polling this time */
>
> I'd remove this comment, it's confusing IMO. Besides, it may be better
> to structure the code like this:
>
> if (devfreq->next_polling-- == 1) {
> }
>
> and then you wouldn't need the "else" at all.
Ok, I'll try that style.
>
>> + error = devfreq_do(devfreq);
>> + if (error && error != -EAGAIN) {
>> + /*
>> + * Remove a devfreq with error. However,
>> + * We cannot remove it right here because the
>
> Comma after "here", please.
"We" should be "we" here, but no comma there though:
http://owl.english.purdue.edu/owl/resource/607/02/
However, "However, because the ..... above, we cannot remove it right
here" is correct.
>
>> + * devfreq pointer is going to be used by
>> + * list_for_each_entry above. Thus, it is
>> + * removed afterwards.
>
> Why don't you use list_for_each_entry_safe(), then?
Ah.. that's a good idea. Thanks! Now, I can reorganize this one.
>
>> + */
>> + to_remove = devfreq->dev;
>> + dev_err(devfreq->dev, "devfreq_do error(%d). "
>> + "DEVFREQ is removed from the device\n",
>> + error);
>> + continue;
>> + }
>> + devfreq->next_polling = DIV_ROUND_UP(
>> + devfreq->profile->polling_ms,
>> + DEVFREQ_INTERVAL);
>> + } else {
>> + /* The device will poll later when next_polling = 1 */
>> + devfreq->next_polling--;
>> + }
>> + }
>> +
>> + if (to_remove) {
>> + list_del(&to_remove->node);
>> + kfree(to_remove);
>> + to_remove = NULL;
>> + }
>> +
>> + if (continue_polling) {
>> + polling = true;
>> + queue_delayed_work(devfreq_wq, &devfreq_work,
>> + msecs_to_jiffies(DEVFREQ_INTERVAL));
>> + } else {
>> + polling = false;
>> + }
>
> OK, so why exactly continue_polling is needed? It seems you might siply use
> "polling" instead of it above.
The purpose of continue_polling seems to disappear in the middle of
development. I'll remove it.
>
>> +
>> + mutex_unlock(&devfreq_list_lock);
>> +}
> ...
>> +/**
>> + * devfreq_update() - Notify that the device OPP has been changed.
>> + * @dev: the device whose OPP has been changed.
>> + * @may_not_exist: do not print error message even if the device
>> + * does not have devfreq entry.
>
> This argument isn't used any more.
Ah.. yes. sure.
>
>> + */
>> +int devfreq_update(struct device *dev)
>> +{
>> + struct devfreq *devfreq;
>> + int err = 0;
>> +
>> + mutex_lock(&devfreq_list_lock);
>> +
>> + devfreq = find_device_devfreq(dev);
>> + if (IS_ERR(devfreq)) {
>> + err = PTR_ERR(devfreq);
>> + goto out;
>> + }
>> +
>> + if (devfreq->tickle) {
>> + /* If the max freq available is changed, re-tickle */
>
> It would be good to explain what "re-tickle" means.
Of course. I'll explain that.
>
>> + unsigned long freq = devfreq->profile->max_freq;
>> + struct opp *opp = opp_find_freq_floor(devfreq->dev, &freq);
>> +
>> + if (IS_ERR(opp)) {
>> + err = PTR_ERR(opp);
>> + goto out;
>> + }
>> +
>> + /* Max freq available is not changed */
>> + if (devfreq->previous_freq == freq)
>> + goto out;
>> +
>> + err = devfreq->profile->target(devfreq->dev, opp);
>> + if (!err)
>> + devfreq->previous_freq = freq;
>
> Why don't we run devfreq_do() in this case?
When the list of available frequencies is updated and the device is to
be kept at its maximum frequency, we do not need to reevaluate the
device's activities to setup the new frequency. In such cases, we only
need to keep it at its maximum freuqency. If the maximum frequency is
not changed, we don't need anything to do (when "tickling" is
completed, devfreq_do() will reevaluate automatically); otherwise,
tickling again at the new frequency is needed (re-tickle).
>
>> + } else {
>> + /* Reevaluate the proper frequency */
>> + err = devfreq_do(devfreq);
>> + }
>> +
>> +out:
>> + mutex_unlock(&devfreq_list_lock);
>> + return err;
>> +}
>> +
>
> Add a kerneldoc comment here, please.
Yes. I'll add one. (and for any other functions missing kerneldoc comments)
>
>> +static int _devfreq_tickle_device(struct devfreq *df, unsigned long delay)
>> +{
>> + int err = 0;
>> + unsigned long freq;
>> + struct opp *opp;
>> +
>> + freq = df->profile->max_freq;
>> + opp = opp_find_freq_floor(df->dev, &freq);
>> + if (IS_ERR(opp))
>> + return PTR_ERR(opp);
>> +
>> + if (df->previous_freq != freq) {
>> + err = df->profile->target(df->dev, opp);
>> + if (!err)
>> + df->previous_freq = freq;
>> + }
>> + if (err) {
>> + dev_err(df->dev, "%s: Cannot set frequency.\n", __func__);
>> + } else {
>> + df->tickle = delay;
>> + df->num_tickle++;
>> + }
>> +
>> + if (devfreq_wq && !polling) {
>> + polling = true;
>> + queue_delayed_work(devfreq_wq, &devfreq_work,
>> + msecs_to_jiffies(DEVFREQ_INTERVAL));
>> + }
>> +
>> + return err;
>> +}
>
> Thanks,
> Rafael
>
Thank you so much for the comments!
Cheers!
- MyungJoo
--
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm
^ permalink raw reply
* Re: [patch 1/1] [PATCH] include storage keys in hibernation image.
From: Martin Schwidefsky @ 2011-07-04 8:09 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-s390, linux-kernel, linux-pm, Jiri Slaby
In-Reply-To: <20110703174616.GB26790@elf.ucw.cz>
On Sun, 3 Jul 2011 19:46:16 +0200
Pavel Machek <pavel@ucw.cz> wrote:
> Hi!
>
> > > I think, however, that we really should try to merge them. The only
> > > difference seems to be how the additionally allocated pages will be populated
> > > and what's going to happen to their contents during restore.
> > >
> > > ACPI will simply copy the NVS memory to those pages, while S390 will save
> > > the relevant storage key bits in there.
> >
> > One complication to keep in mind is that we need to know which storage key
> > goes to which page frame. We need something like the orig_bm/copy_bm or
> > we'd have to store the pfn with the key. Simply storing the key for every
> > page will make the array unnecessarily big.
>
> How big is the overhead? In percent / in megabytes?
Well, that depends on the ratio of the size of the hibernation image and
the total size of the ram. Consider a 1TB machine with a hibernation image
size of lets say 128 GB. The hibernation image would require 32 MB worth
of storage keys (0.024%), for 1TB the array size would be 256 MB (0.19%).
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
^ permalink raw reply
* Re: [PATCH v2 00/11] PM QoS: add a per-device wake-up latency constraint class
From: Vishwanath Sripathy @ 2011-07-04 7:16 UTC (permalink / raw)
To: Rafael J. Wysocki, jean.pihet
Cc: markgross, Linux PM mailing list, linux-omap, Jean Pihet-XID
In-Reply-To: <201107022120.47221.rjw@sisk.pl>
> -----Original Message-----
> From: linux-omap-owner@vger.kernel.org [mailto:linux-omap-
> owner@vger.kernel.org] On Behalf Of Rafael J. Wysocki
> Sent: Sunday, July 03, 2011 12:51 AM
> To: jean.pihet@newoldbits.com
> Cc: Paul Walmsley; Kevin Hilman; Magnus Damm; Linux PM mailing list;
> linux-omap@vger.kernel.org; markgross@thegnar.org; Jean Pihet
> Subject: Re: [PATCH v2 00/11] PM QoS: add a per-device wake-up
> latency constraint class
>
> Hi,
>
> On Thursday, June 30, 2011, jean.pihet@newoldbits.com wrote:
> > From: Jean Pihet <j-pihet@ti.com>
> >
> > This patch set is in an RFC state, for review and comments.
>
> First off, I'm sorry I couldn't review the patchset earlier.
>
> > In order to implement the new class in PM QoS the following
> changes have been
> > made:
> >
> > 1. Add a new PM QoS class for device wake-up constraints
> > (PM_QOS_DEV_WAKEUP_LATENCY).
> > Due to the per-device nature of the new class the constraints
> lists are stored
> > inside the device dev_pm_info struct instead of the internal per-
> class
> > constraints lists.
> > The new class is only available from kernel drivers and so is not
> exported to
> > user space.
>
> Have you considered a design in which multiple devices may use the
> same
> list of constraints? It seems plausible that the constraints will
> be the
> same, for example, for all Ethernet adapters in the system, in which
> case it
> will be wasteful to duplicate the list of constraints for each of
> them.
>
> > 2. Added a notification of device insertion/removal from the
> device PM framework
> > to PM QoS.
> > This allows to init/de-init the per-device constraints list upon
> device insertion
> > and removal.
> > RFC state for comments and review, barely tested
>
> I need to have a look at the details, but in principle this means
> that the
> per-device lists will be usable only after the devices have been
> registered.
> In particular, this means that it will only be possible to add new
> constraints
> after registering the device, which may be too late for some use
> cases.
>
> > 3. Make the pm_qos_add_request API more generic by using a
> > struct pm_qos_parameters parameter. This allows easy extension in
> the future.
> >
> > 4. Upon a change of the strongest constraint in the
> PM_QOS_DEV_WAKEUP_LATENCY
> > class a notification chain mechanism is used to take action on the
> system.
> > This is the proposed way to have PM QoS and the platform dependant
> code to
> > interact with each other, cf. 4 below.
>
> I guess you mean 5.?
>
> I think we will need something in addition to the notifier here.
> For example,
> I wouldn't like any core code, like runtime PM or cpuidle, to have
> to register
> a notifier with PM QoS.
>
> > The notification mechanism now passes the constraint request
> struct ptr in
> > order for the notifier callback to have access to the full set of
> constraint
> > data, e.g. the struct device ptr.
> >
> > 5. cpuidle interaction with the OMAP3 cpuidle handler
> > Since cpuidle is a CPU centric framework it decides the MPU next
> power state
> > based on the MPU exit_latency and target_residency figures.
> >
> > The rest of the power domains get their next power state
> programmed from
> > the PM_QOS_DEV_WAKEUP_LATENCY class of the PM QoS framework, via
> the device
> > wake-up latency constraints.
> >
> > Note: the exit_latency and target_residency figures of the MPU
> include the MPU
> > itself and the peripherals needed for the MPU to execute
> instructions (e.g.
> > main memory, caches, IRQ controller, MMU etc).
> > Some of those peripherals can belong to other power domains than
> the MPU
> > subsystem and so the corresponding latencies must be included in
> those figures.
> >
> > 6. Update the pm_qos_add_request callers to the generic API
> >
> > 7. Minor clean-ups and rename of struct fields
> >
> > Questions:
> > 1. How to retrieve the device ptr from a given device driver in
> order to add
> > a constraint on it?
> > 2. The device struct has recently been extended with the power
> domain
> > information. Can this be used to apply the constraints on power
> domains?
>
> Yes, it can in principle, but that will require some work.
>
> > On-going developments, patches in preparation:
> > 1. write Documentation for the new PM QoS class
>
> I'd wait with that until the code has settled.
>
> > 2. validate the constraints framework on OMAP4 HW (done on OMAP3)
> > 3. refine the power domains wake-up latency and the cpuidle
> figures
> >
> > Based on the master branch of the linux-omap git tree (3.0.0-rc3).
> Compile
> > tested using OMAP and x86 generic defconfigs.
> > Tested on OMAP3 Beagleboard (ES2.x) with full RETention and OFF
> modes.
>
> More detailed comments will follow.
Thanks Rafael for reviewing the patches. Jean is currently on summer
vacation for 2 weeks and will take this discussion forward once he is back
to work on July 18.
Regards
Vishwa
>
> Thanks,
> Rafael
> --
> To unsubscribe from this list: send the line "unsubscribe linux-
> omap" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH v3 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
From: MyungJoo Ham @ 2011-07-04 7:15 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
linux-pm, Thomas Gleixner
In-Reply-To: <201107030013.10438.rjw@sisk.pl>
Hello,
2011/7/3 Rafael J. Wysocki <rjw@sisk.pl>:
> On Friday, May 27, 2011, MyungJoo Ham wrote:
>> 1. System-wide sysfs interface
>> - tickle_all R: number of tickle_all execution
>> W: tickle all devfreq devices
>> - min_interval R: devfreq monitoring base interval in ms
>> - monitoring R: shows whether devfreq monitoring is active or
>> not.
>>
>> 2. Device specific sysfs interface
>> - tickle R: number of tickle execution for the device
>> W: tickle the device
>>
>> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>>
>> --
>> Changed from v2
>> - add ABI entries for devfreq sysfs interface
>> ---
>> Documentation/ABI/testing/sysfs-devices-devfreq | 21 ++++
>> Documentation/ABI/testing/sysfs-power | 43 ++++++++
>> drivers/base/power/devfreq.c | 133 ++++++++++++++++++++++-
>> include/linux/devfreq.h | 3 +
>> 4 files changed, 199 insertions(+), 1 deletions(-)
>> create mode 100644 Documentation/ABI/testing/sysfs-devices-devfreq
>>
>> diff --git a/Documentation/ABI/testing/sysfs-devices-devfreq b/Documentation/ABI/testing/sysfs-devices-devfreq
>> new file mode 100644
>> index 0000000..7f35a64
>> --- /dev/null
>> +++ b/Documentation/ABI/testing/sysfs-devices-devfreq
>> @@ -0,0 +1,21 @@
>> +What: /sys/devices/.../devfreq/
>> +Date: May 2011
>> +Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
>> +Description:
>> + The /sys/device/.../devfreq directory will contain files
>> + that provide interfaces to DEVFREQ for a specific device.
>> +
>> +What: /sys/devices/.../devfreq/tickle
>> +Date: May 2011
>> +Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
>> +Description:
>> + The /sys/devices/.../devfreq/tickle file allows user space
>> + to force the corresponding device to operate at its maximum
>> + operable frequency instaneously and temporarily. After a
>> + designated duration has passed, the operating frequency returns
>> + to normal. When a user reads the tickle entry, it returns
>> + the number of tickle executions for the device. When a user
>> + writes to the tickle entry with the tickle duration in ms,
>> + the effect of device tickling is held for the designated
>> + duration. Note that the duration is rounded-up by
>> + the value DEVFREQ_INTERVAL defined in devfreq.c
>> diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
>> index b464d12..4d8434b 100644
>> --- a/Documentation/ABI/testing/sysfs-power
>> +++ b/Documentation/ABI/testing/sysfs-power
>> @@ -172,3 +172,46 @@ Description:
>>
>> Reading from this file will display the current value, which is
>> set to 1 MB by default.
>> +
>> +What: /sys/power/devfreq/
>> +Date: May 2011
>> +Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
>> +Description:
>> + The /sys/power/devfreq directory will contain files that will
>> + provide a unified interface to the DEVFREQ, a generic DVFS
>> + (dynamic voltage and frequency scaling) framework.
>> +
>> +What: /sys/power/devfreq/tickle_all
>> +Date: May 2011
>> +Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
>> +Description:
>> + The /sys/power/devfreq/tickle_all file allows user space to
>> + force every device with DEVFREQ to operate at the maximum
>> + frequency of the device instaneously and temporarily. After
>> + a designated delay has passed, the operating frequency returns
>> + to normal. If a user reads the tickle_all entry, it returns
>> + the number of tickle_all executions. When writing to the
>> + tickle_all entry, the user should supply with the duration of
>> + tickle in ms (the "designated delay" mentioned before). Then,
>> + the effect of tickle_all will hold for the denoted duration.
>> + Note that the duration is rounded by the monitoring period
>> + defined by DEVFREQ_INTERVAL in /drivers/base/power/devfreq.c.
>> +
>> +What: /sys/power/devfreq/min_interval
>> +Date: May 2011
>> +Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
>> +Description:
>> + The /sys/power/devfreq/min_interval file shows the monitoring
>> + period defined by DEVFREQ_INTERVAL in
>> + /drivers/base/power/devfreq.c. The duration of device tickling
>> + is rounded-up by DEVFREQ_INTERVAL.
>> +
>> +What: /sys/power/devfreq/monitoring
>> +Date: May 2011
>> +Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
>> +Description:
>> + The /sys/power/devfreq/monitoring file shows whether DEVFREQ
>> + is periodically monitoring. Periodic monitoring is activated
>> + if there is a device that wants periodic monitoring for DVFS or
>> + there is a device that is tickled (and the tickling duration is
>> + not yet expired).
>> diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
>> index 7648a94..709c138 100644
>> --- a/drivers/base/power/devfreq.c
>> +++ b/drivers/base/power/devfreq.c
>> @@ -40,6 +40,9 @@ static LIST_HEAD(devfreq_list);
>> /* Exclusive access to devfreq_list and its elements */
>> static DEFINE_MUTEX(devfreq_list_lock);
>>
>> +static struct kobject *devfreq_kobj;
>> +static struct attribute_group dev_attr_group;
>> +
>> /**
>> * find_device_devfreq() - find devfreq struct using device pointer
>> * @dev: device pointer used to lookup device DEVFREQ.
>> @@ -237,6 +240,8 @@ int devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile,
>> queue_delayed_work(devfreq_wq, &devfreq_work,
>> msecs_to_jiffies(DEVFREQ_INTERVAL));
>> }
>> +
>> + sysfs_update_group(&dev->kobj, &dev_attr_group);
>
> This appears to modify the attributes of the device, but anything like it is
> not mentioned in the documentation. What's up?
It is mentioned "Documentation/ABI/testing/sysfs-devices-devfreq".
>
>> out:
>> mutex_unlock(&devfreq_list_lock);
>>
>> @@ -263,6 +268,8 @@ int devfreq_remove_device(struct device *dev)
>> return -EINVAL;
>> }
>>
>> + sysfs_remove_group(&dev->kobj, &dev_attr_group);
>> +
>> list_del(&devfreq->node);
>>
>> kfree(devfreq);
>> @@ -344,7 +351,7 @@ static int _devfreq_tickle_device(struct devfreq *df, unsigned long delay)
>> if (devfreq_wq && !polling) {
>> polling = true;
>> queue_delayed_work(devfreq_wq, &devfreq_work,
>> - msecs_to_jiffies(DEVFREQ_INTERVAL));
>> + msecs_to_jiffies(DEVFREQ_INTERVAL));
>
> This change doesn't seem to belong to this patch.
Oops. I'll rearrange the patch series.
>
>> }
>>
>> return err;
>> @@ -382,6 +389,116 @@ int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
>> return err;
>> }
>>
>> +static int num_tickle_all;
>> +static ssize_t tickle_all(struct device *dev, struct device_attribute *attr,
>> + const char *buf, size_t count)
>> +{
>> + int duration = 0;
>> + struct devfreq *tmp;
>> + unsigned long delay;
>> +
>> + sscanf(buf, "%d", &duration);
>> + if (duration < DEVFREQ_INTERVAL)
>> + duration = DEVFREQ_INTERVAL;
>> +
>> + if (unlikely(IS_ERR_OR_NULL(dev))) {
>> + pr_err("%s: Invalid parameters\n", __func__);
>
> Please say "null device" instead of in addition to the above.
Ok. I'll let it say "Null or invalid device" as it's IS_ERR_OR_NULL.
>
>> + return -EINVAL;
>> + }
>> +
>> + delay = DIV_ROUND_UP(duration, DEVFREQ_INTERVAL);
>> +
>> + mutex_lock(&devfreq_list_lock);
>> + list_for_each_entry(tmp, &devfreq_list, node) {
>> + _devfreq_tickle_device(tmp, delay);
>> + }
>> + mutex_unlock(&devfreq_list_lock);
>> +
>> + num_tickle_all++;
>> + return count;
>> +}
>> +
>> +static ssize_t show_num_tickle_all(struct device *dev,
>> + struct device_attribute *attr, char *buf)
>> +{
>> + return sprintf(buf, "%d\n", num_tickle_all);
>> +}
>> +
>> +static ssize_t show_min_interval(struct device *dev,
>> + struct device_attribute *attr, char *buf)
>> +{
>> + return sprintf(buf, "%d\n", DEVFREQ_INTERVAL);
>> +}
>> +
>> +static ssize_t show_monitoring(struct device *dev,
>> + struct device_attribute *attr, char *buf)
>> +{
>> + return sprintf(buf, "%d\n", monitoring ? 1 : 0);
>> +}
>> +
>> +static DEVICE_ATTR(tickle_all, 0644, show_num_tickle_all, tickle_all);
>> +static DEVICE_ATTR(min_interval, 0444, show_min_interval, NULL);
>> +static DEVICE_ATTR(monitoring, 0444, show_monitoring, NULL);
>> +static struct attribute *devfreq_entries[] = {
>> + &dev_attr_tickle_all.attr,
>> + &dev_attr_min_interval.attr,
>> + &dev_attr_monitoring.attr,
>> + NULL,
>> +};
>> +static struct attribute_group devfreq_attr_group = {
>> + .name = NULL,
>> + .attrs = devfreq_entries,
>> +};
>> +
>> +static ssize_t tickle(struct device *dev, struct device_attribute *attr,
>> + const char *buf, size_t count)
>> +{
>> + int duration;
>> + struct devfreq *df;
>> + unsigned long delay;
>> +
>> + sscanf(buf, "%d", &duration);
>> + if (duration < DEVFREQ_INTERVAL)
>> + duration = DEVFREQ_INTERVAL;
>> +
>> + if (unlikely(IS_ERR_OR_NULL(dev))) {
>> + pr_err("%s: Invalid parameters\n", __func__);
>
> Like above.
Yup.
>
>> + return -EINVAL;
>> + }
>> +
>> + delay = DIV_ROUND_UP(duration, DEVFREQ_INTERVAL);
>> +
>> + mutex_lock(&devfreq_list_lock);
>> + df = find_device_devfreq(dev);
>> + _devfreq_tickle_device(df, delay);
>> + mutex_unlock(&devfreq_list_lock);
>> +
>> + return count;
>> +}
>> +
>> +static ssize_t show_num_tickle(struct device *dev,
>> + struct device_attribute *attr, char *buf)
>> +{
>> + struct devfreq *df;
>> +
>> + df = find_device_devfreq(dev);
>> +
>> + if (!IS_ERR(df))
>> + return sprintf(buf, "%d\n", df->num_tickle);
>> +
>> + return PTR_ERR(df);
>> +}
>> +
>> +static DEVICE_ATTR(tickle, 0644, show_num_tickle, tickle);
>> +static struct attribute *dev_entries[] = {
>> + &dev_attr_tickle.attr,
>> + NULL,
>> +};
>> +static struct attribute_group dev_attr_group = {
>> + .name = "devfreq",
>> + .attrs = dev_entries,
>> +};
>> +
>> static int __init devfreq_init(void)
>> {
>> mutex_lock(&devfreq_list_lock);
>> @@ -389,6 +506,20 @@ static int __init devfreq_init(void)
>> polling = false;
>> devfreq_wq = create_freezable_workqueue("devfreq_wq");
>> INIT_DELAYED_WORK_DEFERRABLE(&devfreq_work, devfreq_monitor);
>> +
>> +#ifdef CONFIG_PM
>> + /* Create sysfs */
>> + devfreq_kobj = kobject_create_and_add("devfreq", power_kobj);
>
> Hmm, so power_kobj is global? It shouldn't be.
Yes, it is global and it's declared at include/linux/kobject.h.
>
> Generally, whatever adds attributes to /sys/power should be located in
> kernel/power/ .
I've put it at /sys/power because these attributes are global to every
device with devfreq.
Anyway, if you think that's a weird location for it, could you please
give me some hints on where would be proper to locate system-wide
devfreq sysfs attributes?
Or, do you think kernel/power/devfreq.c would be a proper location for devfreq?
Or, what about allowing every /sys/devices/.../devfreq/global/* to be
the same "global" attributes of devfreq?
>
>> + if (!devfreq_kobj) {
>> + pr_err("Unable to create DEVFREQ kobject.\n");
>> + goto out;
>> + }
>> + if (sysfs_create_group(devfreq_kobj, &devfreq_attr_group)) {
>> + pr_err("Unable to create DEVFREQ sysfs entries.\n");
>> + goto out;
>> + }
>> +#endif
>> +out:
>> mutex_unlock(&devfreq_list_lock);
>>
>> devfreq_monitor(&devfreq_work.work);
>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
>> index 1ec9a40..69334e7 100644
>> --- a/include/linux/devfreq.h
>> +++ b/include/linux/devfreq.h
>> @@ -59,6 +59,7 @@ struct devfreq_governor {
>> * at each executino of devfreq_monitor, tickle is decremented.
>> * User may tickle a device-devfreq in order to set maximum
>> * frequency instaneously with some guaranteed duration.
>> + * @num_tickle number of tickle calls.
>> *
>> * This structure stores the DEVFREQ information for a give device.
>> */
>> @@ -72,6 +73,8 @@ struct devfreq {
>> unsigned long previous_freq;
>> unsigned int next_polling;
>> unsigned int tickle;
>> +
>> + unsigned int num_tickle;
>> };
>>
>> #if defined(CONFIG_PM_DEVFREQ)
>
> It looks like the above two changes should be moved to a separate patch.
These two changes are for the sysfs interface as counting number of
tickle is added with sysfs interface.
>
> Thanks,
> Rafael
>
Thank you,
- MyungJoo
--
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm
^ permalink raw reply
* Re: [PATCH 3/7] fault-injection: notifier error injection
From: Akinobu Mita @ 2011-07-04 5:35 UTC (permalink / raw)
To: Pavel Machek
Cc: Greg Kroah-Hartman, linux-kernel, linux-mm, Paul Mackerras, akpm,
linuxppc-dev, linux-pm
In-Reply-To: <20110703171626.GG21127@elf.ucw.cz>
2011/7/4 Pavel Machek <pavel@ucw.cz>:
>
>> + for (action = enb->actions; action->name; action++) {
>> + struct dentry *file = debugfs_create_int(action->name, mode,
>> + enb->dir, &action->error);
>> +
>> + if (!file) {
>> + debugfs_remove_recursive(enb->dir);
>> + return -ENOMEM;
>> + }
>
> Few lines how this work would be welcome...?
OK, I'll add a comment like below.
/*
* Create debugfs r/w file containing action->error. If notifier call
* chain is called with action->val, it will fail with the error code
*/
^ permalink raw reply
* Re: [PATCH 1/7] pm: improve error code of pm_notifier_call_chain()
From: Akinobu Mita @ 2011-07-04 5:32 UTC (permalink / raw)
To: Pavel Machek
Cc: linux-s390, Jiri Kosina, Heiko Carstens, linux-kernel,
Martin Schwidefsky, linux390, akpm, linux-pm
In-Reply-To: <20110703171539.GF21127@elf.ucw.cz>
2011/7/4 Pavel Machek <pavel@ucw.cz>:
> On Sun 2011-07-03 23:16:15, Akinobu Mita wrote:
>> This enables pm_notifier_call_chain() to get the actual error code
>> in the callback rather than always assume -EINVAL by converting all
>> PM notifier calls to return encapsulate error code with
>> notifier_from_errno().
>>
>> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
>> Cc: Pavel Machek <pavel@ucw.cz>
>
> Nothing obviously wrong here.
Thanks. Can I add your ACK for this patch?
^ permalink raw reply
* Re: [PATCH v3 2/3] PM / DEVFREQ: add example governors
From: MyungJoo Ham @ 2011-07-04 1:37 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
linux-pm, Thomas Gleixner
In-Reply-To: <201107022358.58319.rjw@sisk.pl>
Hello,
2011/7/3 Rafael J. Wysocki <rjw@sisk.pl>:
> Hi,
>
> On Friday, May 27, 2011, MyungJoo Ham wrote:
>> Three CPUFREQ-like governors are provided as examples.
>>
>> powersave: use the lowest frequency possible. The user (device) should
>> set the polling_ms as 0 because polling is useless for this governor.
>>
>> performance: use the highest freqeuncy possible. The user (device)
>> should set the polling_ms as 0 because polling is useless for this
>> governor.
>>
>> simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.
>>
>> When a user updates OPP entries (enable/disable/add), OPP framework
>> automatically notifies DEVFREQ to update operating frequency
>> accordingly. Thus, DEVFREQ users (device drivers) do not need to update
>> DEVFREQ manually with OPP entry updates or set polling_ms for powersave
>> , performance, or any other "static" governors.
>>
>> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[]
>> +
>> + /* Set the desired frequency based on the load */
>> + a = (unsigned long long) stat.busy_time * stat.current_frequency;
>
> What's the purpose of the conversion?
Assuming that the work speed of a device is proportional to its
frequency, it measures the amount of work done.
It's time * work/time. For example, during the last 10 second, if the
busy_time was 5 sec and frequency was 10MHz,
it's "50M", which is same as 20MHz and 2.5 sec.
>
>> + b = div_u64(a, stat.total_time);
>> + b *= 100;
>> + b = div_u64(b, (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL / 2));
>> + *freq = (unsigned long) b;
>> +
>> + return 0;
>> +}
>
> Rafael
>
--
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm
^ permalink raw reply
* [GIT PULL] Power management documentation fixes for 3.0
From: Rafael J. Wysocki @ 2011-07-03 20:20 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Linux PM mailing list, LKML
Hi Linus,
Please pull power management documentation fixes for 3.0 from:
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git pm-fixes
One of them makes documentation match the code again after a recent change
and it depends on the other one, which fixes a typo.
Documentation/power/runtime_pm.txt | 26 +++++++++++++++++++++-----
1 files changed, 21 insertions(+), 5 deletions(-)
---------------
Kevin Hilman (1):
PM: Documentation: fix typo: pm_runtime_idle_sync() doesn't exist.
Rafael J. Wysocki (1):
PM / Runtime: Update documentation regarding driver removal
^ permalink raw reply
* Re: [patch 1/1] [PATCH] include storage keys in hibernation image.
From: Pavel Machek @ 2011-07-03 17:46 UTC (permalink / raw)
To: Martin Schwidefsky; +Cc: linux-s390, linux-kernel, linux-pm, Jiri Slaby
In-Reply-To: <20110615093629.07f01779@mschwide>
Hi!
> > I think, however, that we really should try to merge them. The only
> > difference seems to be how the additionally allocated pages will be populated
> > and what's going to happen to their contents during restore.
> >
> > ACPI will simply copy the NVS memory to those pages, while S390 will save
> > the relevant storage key bits in there.
>
> One complication to keep in mind is that we need to know which storage key
> goes to which page frame. We need something like the orig_bm/copy_bm or
> we'd have to store the pfn with the key. Simply storing the key for every
> page will make the array unnecessarily big.
How big is the overhead? In percent / in megabytes?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply
* Re: [PATCH 3/7] fault-injection: notifier error injection
From: Pavel Machek @ 2011-07-03 17:16 UTC (permalink / raw)
To: Akinobu Mita
Cc: Greg Kroah-Hartman, linux-kernel, linux-mm, Paul Mackerras, akpm,
linuxppc-dev, linux-pm
In-Reply-To: <1309702581-16863-4-git-send-email-akinobu.mita@gmail.com>
> + for (action = enb->actions; action->name; action++) {
> + struct dentry *file = debugfs_create_int(action->name, mode,
> + enb->dir, &action->error);
> +
> + if (!file) {
> + debugfs_remove_recursive(enb->dir);
> + return -ENOMEM;
> + }
Few lines how this work would be welcome...?
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply
* Re: [PATCH 1/7] pm: improve error code of pm_notifier_call_chain()
From: Pavel Machek @ 2011-07-03 17:15 UTC (permalink / raw)
To: Akinobu Mita
Cc: linux-s390, Jiri Kosina, Heiko Carstens, linux-kernel,
Martin Schwidefsky, linux390, akpm, linux-pm
In-Reply-To: <1309702581-16863-2-git-send-email-akinobu.mita@gmail.com>
On Sun 2011-07-03 23:16:15, Akinobu Mita wrote:
> This enables pm_notifier_call_chain() to get the actual error code
> in the callback rather than always assume -EINVAL by converting all
> PM notifier calls to return encapsulate error code with
> notifier_from_errno().
>
> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> Cc: Pavel Machek <pavel@ucw.cz>
Nothing obviously wrong here.
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply
* [PATCH 5/7] PM: PM notifier error injection
From: Akinobu Mita @ 2011-07-03 14:16 UTC (permalink / raw)
To: linux-kernel, akpm; +Cc: linux-pm, Akinobu Mita
In-Reply-To: <1309702581-16863-1-git-send-email-akinobu.mita@gmail.com>
This provides the ability to inject artifical errors to PM notifier
chain callbacks. It is controlled through debugfs interface under
/sys/kernel/debug/pm-notifier-error-inject/
Each of the files in the directory represents an event which can be
failed and contains the error code. If the notifier call chain should
be failed with some events notified, write the error code to the files.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: linux-pm@lists.linux-foundation.org
---
kernel/power/main.c | 30 ++++++++++++++++++++++++++++++
lib/Kconfig.debug | 8 ++++++++
2 files changed, 38 insertions(+), 0 deletions(-)
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 6c601f8..04b3774 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -42,6 +42,36 @@ int pm_notifier_call_chain(unsigned long val)
return notifier_to_errno(ret);
}
+#ifdef CONFIG_PM_NOTIFIER_ERROR_INJECTION
+
+static struct err_inject_notifier_block err_inject_pm_notifier = {
+ .actions = {
+ { ERR_INJECT_NOTIFIER_ACTION(PM_HIBERNATION_PREPARE) },
+ { ERR_INJECT_NOTIFIER_ACTION(PM_SUSPEND_PREPARE) },
+ { ERR_INJECT_NOTIFIER_ACTION(PM_RESTORE_PREPARE) },
+ {}
+ }
+};
+
+static int __init err_inject_pm_notifier_init(void)
+{
+ int err;
+
+ err = err_inject_notifier_block_init(&err_inject_pm_notifier,
+ "pm-notifier-error-inject", -1);
+ if (err)
+ return err;
+
+ err = register_pm_notifier(&err_inject_pm_notifier.nb);
+ if (err)
+ err_inject_notifier_block_cleanup(&err_inject_pm_notifier);
+
+ return err;
+}
+late_initcall(err_inject_pm_notifier_init);
+
+#endif /* CONFIG_PM_NOTIFIER_ERROR_INJECTION */
+
/* If set, devices may be suspended and resumed asynchronously. */
int pm_async_enabled = 1;
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index d944b32..3ffb38b 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1043,6 +1043,14 @@ config CPU_NOTIFIER_ERROR_INJECTION
# echo 0 > /sys/devices/system/cpu/cpu1/online
bash: echo: write error: Operation not permitted
+config PM_NOTIFIER_ERROR_INJECTION
+ bool "PM notifier error injection"
+ depends on PM_DEBUG && NOTIFIER_ERROR_INJECTION
+ help
+ This option provides the ability to inject artifical errors to
+ PM notifier chain callbacks. It is controlled through debugfs
+ interface under /sys/kernel/debug/pm-notifier-error-inject/
+
config CPU_NOTIFIER_ERROR_INJECT
tristate "CPU notifier error injection module"
depends on HOTPLUG_CPU && DEBUG_KERNEL
--
1.7.4.4
^ permalink raw reply related
* [PATCH 3/7] fault-injection: notifier error injection
From: Akinobu Mita @ 2011-07-03 14:16 UTC (permalink / raw)
To: linux-kernel, akpm
Cc: Greg Kroah-Hartman, Akinobu Mita, linux-mm, Paul Mackerras,
linux-pm, linuxppc-dev
In-Reply-To: <1309702581-16863-1-git-send-email-akinobu.mita@gmail.com>
The notifier error injection provides the ability to inject artifical
errors to specified notifier chain callbacks. It is useful to test
the error handling of notifier call chain failures.
This adds common basic functions to define which type of events can be
fail and to initialize the debugfs interface to control what error code
should be returned and which event should be failed.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linux-pm@lists.linux-foundation.org
Cc: linux-mm@kvack.org
Cc: linuxppc-dev@lists.ozlabs.org
---
include/linux/notifier.h | 25 ++++++++++++++++++++
kernel/notifier.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++
lib/Kconfig.debug | 11 +++++++++
3 files changed, 93 insertions(+), 0 deletions(-)
diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index c0688b0..51882d6 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -278,5 +278,30 @@ extern struct blocking_notifier_head reboot_notifier_list;
#define VT_UPDATE 0x0004 /* A bigger update occurred */
#define VT_PREWRITE 0x0005 /* A char is about to be written to the console */
+#ifdef CONFIG_NOTIFIER_ERROR_INJECTION
+
+struct err_inject_notifier_action {
+ unsigned long val;
+ int error;
+ const char *name;
+};
+
+#define ERR_INJECT_NOTIFIER_ACTION(action) \
+ .name = #action, .val = (action),
+
+struct err_inject_notifier_block {
+ struct notifier_block nb;
+ struct dentry *dir;
+ struct err_inject_notifier_action actions[];
+ /* The last slot must be terminated with zero sentinel */
+};
+
+extern int err_inject_notifier_block_init(struct err_inject_notifier_block *enb,
+ const char *name, int priority);
+extern void err_inject_notifier_block_cleanup(
+ struct err_inject_notifier_block *enb);
+
+#endif /* CONFIG_NOTIFIER_ERROR_INJECTION */
+
#endif /* __KERNEL__ */
#endif /* _LINUX_NOTIFIER_H */
diff --git a/kernel/notifier.c b/kernel/notifier.c
index 2488ba7..8dcc485 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -5,6 +5,7 @@
#include <linux/rcupdate.h>
#include <linux/vmalloc.h>
#include <linux/reboot.h>
+#include <linux/debugfs.h>
/*
* Notifier list for kernel code which wants to be called
@@ -584,3 +585,59 @@ int unregister_die_notifier(struct notifier_block *nb)
return atomic_notifier_chain_unregister(&die_chain, nb);
}
EXPORT_SYMBOL_GPL(unregister_die_notifier);
+
+#ifdef CONFIG_NOTIFIER_ERROR_INJECTION
+
+static int err_inject_notifier_callback(struct notifier_block *nb,
+ unsigned long val, void *p)
+{
+ int err = 0;
+ struct err_inject_notifier_block *enb =
+ container_of(nb, struct err_inject_notifier_block, nb);
+ struct err_inject_notifier_action *action;
+
+ for (action = enb->actions; action->name; action++) {
+ if (action->val == val) {
+ err = action->error;
+ break;
+ }
+ }
+ if (err) {
+ printk(KERN_INFO "Injecting error (%d) to %s\n",
+ err, action->name);
+ }
+
+ return notifier_from_errno(err);
+}
+
+int err_inject_notifier_block_init(struct err_inject_notifier_block *enb,
+ const char *name, int priority)
+{
+ struct err_inject_notifier_action *action;
+ mode_t mode = S_IFREG | S_IRUSR | S_IWUSR;
+
+ enb->nb.notifier_call = err_inject_notifier_callback;
+ enb->nb.priority = priority;
+
+ enb->dir = debugfs_create_dir(name, NULL);
+ if (!enb->dir)
+ return -ENOMEM;
+
+ for (action = enb->actions; action->name; action++) {
+ struct dentry *file = debugfs_create_int(action->name, mode,
+ enb->dir, &action->error);
+
+ if (!file) {
+ debugfs_remove_recursive(enb->dir);
+ return -ENOMEM;
+ }
+ }
+ return 0;
+}
+
+void err_inject_notifier_block_cleanup(struct err_inject_notifier_block *enb)
+{
+ debugfs_remove_recursive(enb->dir);
+}
+
+#endif /* CONFIG_NOTIFIER_ERROR_INJECTION */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index dd373c8..8c6ce7e 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1018,6 +1018,17 @@ config LKDTM
Documentation on how to use the module can be found in
Documentation/fault-injection/provoke-crashes.txt
+config NOTIFIER_ERROR_INJECTION
+ bool "Notifier error injection"
+ depends on DEBUG_KERNEL
+ select DEBUG_FS
+ help
+ This option provides the ability to inject artifical errors to
+ specified notifier chain callbacks. It is useful to test the error
+ handling of notifier call chain failures.
+
+ Say N if unsure.
+
config CPU_NOTIFIER_ERROR_INJECT
tristate "CPU notifier error injection module"
depends on HOTPLUG_CPU && DEBUG_KERNEL
--
1.7.4.4
^ permalink raw reply related
* [PATCH 1/7] pm: improve error code of pm_notifier_call_chain()
From: Akinobu Mita @ 2011-07-03 14:16 UTC (permalink / raw)
To: linux-kernel, akpm
Cc: linux-s390, Jiri Kosina, Heiko Carstens, Akinobu Mita,
Martin Schwidefsky, linux390, linux-pm
In-Reply-To: <1309702581-16863-1-git-send-email-akinobu.mita@gmail.com>
This enables pm_notifier_call_chain() to get the actual error code
in the callback rather than always assume -EINVAL by converting all
PM notifier calls to return encapsulate error code with
notifier_from_errno().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: linux-pm@lists.linux-foundation.org
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: linux-s390@vger.kernel.org
---
drivers/char/apm-emulation.c | 2 +-
drivers/s390/char/vmwatchdog.c | 4 ++--
drivers/s390/cio/css.c | 8 ++++----
kernel/power/main.c | 5 +++--
4 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/drivers/char/apm-emulation.c b/drivers/char/apm-emulation.c
index 548708c..a7346ab 100644
--- a/drivers/char/apm-emulation.c
+++ b/drivers/char/apm-emulation.c
@@ -606,7 +606,7 @@ static int apm_suspend_notifier(struct notifier_block *nb,
return NOTIFY_OK;
/* interrupted by signal */
- return NOTIFY_BAD;
+ return notifier_from_errno(err);
case PM_POST_SUSPEND:
/*
diff --git a/drivers/s390/char/vmwatchdog.c b/drivers/s390/char/vmwatchdog.c
index 12ef912..11312f4 100644
--- a/drivers/s390/char/vmwatchdog.c
+++ b/drivers/s390/char/vmwatchdog.c
@@ -258,13 +258,13 @@ static int vmwdt_suspend(void)
if (test_and_set_bit(VMWDT_OPEN, &vmwdt_is_open)) {
pr_err("The system cannot be suspended while the watchdog"
" is in use\n");
- return NOTIFY_BAD;
+ return notifier_from_errno(-EBUSY);
}
if (test_bit(VMWDT_RUNNING, &vmwdt_is_open)) {
clear_bit(VMWDT_OPEN, &vmwdt_is_open);
pr_err("The system cannot be suspended while the watchdog"
" is running\n");
- return NOTIFY_BAD;
+ return notifier_from_errno(-EBUSY);
}
return NOTIFY_DONE;
}
diff --git a/drivers/s390/cio/css.c b/drivers/s390/cio/css.c
index c47b25f..92d7324 100644
--- a/drivers/s390/cio/css.c
+++ b/drivers/s390/cio/css.c
@@ -814,8 +814,8 @@ static int css_power_event(struct notifier_block *this, unsigned long event,
mutex_unlock(&css->mutex);
continue;
}
- if (__chsc_do_secm(css, 0))
- ret = NOTIFY_BAD;
+ ret = __chsc_do_secm(css, 0);
+ ret = notifier_from_errno(ret);
mutex_unlock(&css->mutex);
}
break;
@@ -831,8 +831,8 @@ static int css_power_event(struct notifier_block *this, unsigned long event,
mutex_unlock(&css->mutex);
continue;
}
- if (__chsc_do_secm(css, 1))
- ret = NOTIFY_BAD;
+ ret = __chsc_do_secm(css, 1);
+ ret = notifier_from_errno(ret);
mutex_unlock(&css->mutex);
}
/* search for subchannels, which appeared during hibernation */
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 2981af4..6c601f8 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -37,8 +37,9 @@ EXPORT_SYMBOL_GPL(unregister_pm_notifier);
int pm_notifier_call_chain(unsigned long val)
{
- return (blocking_notifier_call_chain(&pm_chain_head, val, NULL)
- == NOTIFY_BAD) ? -EINVAL : 0;
+ int ret = blocking_notifier_call_chain(&pm_chain_head, val, NULL);
+
+ return notifier_to_errno(ret);
}
/* If set, devices may be suspended and resumed asynchronously. */
--
1.7.4.4
^ permalink raw reply related
* LPC2011 Power Management Micro Conf
From: Rafael J. Wysocki @ 2011-07-03 7:29 UTC (permalink / raw)
To: linux-pm
Cc: Arnd Bergmann, Greg Kroah-Hartman, Stephen Boyd, LKML,
Grant Likely, MyungJoo Ham, Jean Pihet, Arjan van de Ven
In-Reply-To: <201105011353.44809.rjw@sisk.pl>
Hi,
Some time ago I sent the following to linux-pm and wasn't really satisfied
with the response, so here it goes again. It is slightly outdated, because
we've already started to plan a PM meeting at the Kernel Summit, but I hope
some of you will be able to participate in the LPC PM Mini Conf too.
In the previous years we used to organize Power Management Mini Summits to
discuss PM-related issues, the last of which took place during LinuxCon in
Boston in 2010. Unfortunately, however, it wasn't particularly well attended
and some of the participants generally felt that it had failed its purpose.
The power management track at the last LPC, on the other hand, was generally
regarded as interesting and successful, so there only seems to be room for
one event devoted to power management a year. For this reason, there won't
be a PM Mini Summit in 2011 and the LPC PM Mini Conference will play the role
of it. To this end, in addition to the plenary sessions within the LPC PM
track there will be a possibility to discuss specific problems related to
power management in smaller groups.
If there's anything you'd like to discuss or give a presentation on related to
Power Management, please let me know, preferably by replying to this message.
We have 1.5 hours allocated in the LPC schedule for the plenary session, but
we will be able to move to another room (or a number of rooms if that's more
suitable) afterwards to discuss things in detail.
At the moment, there are two items proposed for the plenary session. One of
them will cover the power management development during the last year, but in
that context I also would like things that are still missing and problems to
solve to be mentioned, so please tell me what they are from your point of view.
The second one will cover the idle power consumption problem and will be led
by Len Brown. Len has a lot of experience in that area, but mostly with systems
based on Intel hardware, so if you want to contribute your own observations
and/or results related to idle power, please let me know.
Anyway, if you are going to attend the LPC and have anything PM-related to
talk about or discuss, please let me know.
Best regards,
Rafael
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox