Linux Power Management development

Linux Power Management development
 help / color / mirror / Atom feed

* Re: [PATCH] PM / devfreq: remove compiler error with module governors (2)
From: Rafael J. Wysocki @ 2012-11-28 10:22 UTC (permalink / raw)
  To: MyungJoo Ham; +Cc: linux-pm, myungjoo.ham, nm
In-Reply-To: <1354067841-4936-1-git-send-email-myungjoo.ham@samsung.com>

On Wednesday, November 28, 2012 10:57:21 AM MyungJoo Ham wrote:
> Governors compiled as modules may use these functions.

OK

Do you want me to take this directly?

Rafael


> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> ---
> This patch is also locataed at
> git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq for-rafael tags/pull_req_20121128
> 
>  drivers/devfreq/devfreq.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
> index 34c00c5..a8f0173 100644
> --- a/drivers/devfreq/devfreq.c
> +++ b/drivers/devfreq/devfreq.c
> @@ -235,6 +235,7 @@ void devfreq_monitor_start(struct devfreq *devfreq)
>  		queue_delayed_work(devfreq_wq, &devfreq->work,
>  			msecs_to_jiffies(devfreq->profile->polling_ms));
>  }
> +EXPORT_SYMBOL(devfreq_monitor_start);
>  
>  /**
>   * devfreq_monitor_stop() - Stop load monitoring of a devfreq instance
> @@ -248,6 +249,7 @@ void devfreq_monitor_stop(struct devfreq *devfreq)
>  {
>  	cancel_delayed_work_sync(&devfreq->work);
>  }
> +EXPORT_SYMBOL(devfreq_monitor_stop);
>  
>  /**
>   * devfreq_monitor_suspend() - Suspend load monitoring of a devfreq instance
> @@ -273,6 +275,7 @@ void devfreq_monitor_suspend(struct devfreq *devfreq)
>  	mutex_unlock(&devfreq->lock);
>  	cancel_delayed_work_sync(&devfreq->work);
>  }
> +EXPORT_SYMBOL(devfreq_monitor_suspend);
>  
>  /**
>   * devfreq_monitor_resume() - Resume load monitoring of a devfreq instance
> @@ -297,6 +300,7 @@ void devfreq_monitor_resume(struct devfreq *devfreq)
>  out:
>  	mutex_unlock(&devfreq->lock);
>  }
> +EXPORT_SYMBOL(devfreq_monitor_resume);
>  
>  /**
>   * devfreq_interval_update() - Update device devfreq monitoring interval
> @@ -343,6 +347,7 @@ void devfreq_interval_update(struct devfreq *devfreq, unsigned int *delay)
>  out:
>  	mutex_unlock(&devfreq->lock);
>  }
> +EXPORT_SYMBOL(devfreq_interval_update);
>  
>  /**
>   * devfreq_notifier_call() - Notify that the device frequency requirements
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
From: Rafael J. Wysocki @ 2012-11-28 10:15 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-pm, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm
In-Reply-To: <CACVXFVOk45wr8jv3w=KO7uTThGSTSkq0FRsPD6p_AyQZLWGQJg@mail.gmail.com>

On Wednesday, November 28, 2012 05:47:18 PM Ming Lei wrote:
> On Wed, Nov 28, 2012 at 5:29 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >
> > But it doesn't have to walk the children.  Moreover, with counters it only
> 
> Yeah, I got it, it is the advantage of counter, but with extra 'int'
> field introduced
> in 'struct device'.
> 
> > needs to walk the whole path if all devices in it need to be updated.  For
> > example, if you call pm_runtime_set_memalloc_noio(dev, true) for a device
> > whose parent's counter is greater than zero already, you don't need to
> > walk the path above the parent.
> 
> We still can do it with the flag only, pm_runtime_set_memalloc_noio(dev, true)
> can return immediately if one parent or the 'dev' flag is true.
> 
> But considered that the pm_runtime_set_memalloc_noio(dev, false) is only
> called in a very infrequent path(network/block device->remove()), looks the
> introduced cost isn't worthy of the obtained advantage.
> 
> So could you accept not introducing counter? and I will update with the
> above improvement you suggested.

Well, please see my other message I sent a while ago. :-)

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
From: Rafael J. Wysocki @ 2012-11-28 10:06 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-pm, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm
In-Reply-To: <CACVXFVP=3s3pawyEbogjb=PfbSeD1B+LFk7g04FAMkGuXDQUbQ@mail.gmail.com>

On Wednesday, November 28, 2012 11:57:19 AM Ming Lei wrote:
> On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Saturday, November 24, 2012 08:59:14 PM Ming Lei wrote:
> >> The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
> >> to help PM core to teach mm not allocating memory with GFP_KERNEL
> >> flag for avoiding probable deadlock.
> >>
> >> As explained in the comment, any GFP_KERNEL allocation inside
> >> runtime_resume() or runtime_suspend() on any one of device in
> >> the path from one block or network device to the root device
> >> in the device tree may cause deadlock, the introduced
> >> pm_runtime_set_memalloc_noio() sets or clears the flag on
> >> device in the path recursively.
> >>
> >> Cc: Alan Stern <stern@rowland.harvard.edu>
> >> Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
> >> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> >> ---
> >> v5:
> >>       - fix code style error
> >>       - add comment on clear the device memalloc_noio flag
> >> v4:
> >>       - rename memalloc_noio_resume as memalloc_noio
> >>       - remove pm_runtime_get_memalloc_noio()
> >>       - add comments on pm_runtime_set_memalloc_noio
> >> v3:
> >>       - introduce pm_runtime_get_memalloc_noio()
> >>       - hold one global lock on pm_runtime_set_memalloc_noio
> >>       - hold device power lock when accessing memalloc_noio_resume
> >>         flag suggested by Alan Stern
> >>       - implement pm_runtime_set_memalloc_noio without recursion
> >>         suggested by Alan Stern
> >> v2:
> >>       - introduce pm_runtime_set_memalloc_noio()
> >> ---
> >>  drivers/base/power/runtime.c |   60 ++++++++++++++++++++++++++++++++++++++++++
> >>  include/linux/pm.h           |    1 +
> >>  include/linux/pm_runtime.h   |    3 +++
> >>  3 files changed, 64 insertions(+)
> >>
> >> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> >> index 3148b10..3e198a0 100644
> >> --- a/drivers/base/power/runtime.c
> >> +++ b/drivers/base/power/runtime.c
> >> @@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct device *dev)
> >>  }
> >>  EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
> >>
> >> +static int dev_memalloc_noio(struct device *dev, void *data)
> >> +{
> >> +     return dev->power.memalloc_noio;
> >> +}
> >> +
> >> +/*
> >> + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
> >> + * @dev: Device to handle.
> >> + * @enable: True for setting the flag and False for clearing the flag.
> >> + *
> >> + * Set the flag for all devices in the path from the device to the
> >> + * root device in the device tree if @enable is true, otherwise clear
> >> + * the flag for devices in the path whose siblings don't set the flag.
> >> + *
> >
> > Please use counters instead of walking the whole path every time.  Ie. in
> > addition to the flag add a counter to store the number of the device's
> > children having that flag set.
> 
> Thanks for your review.
> 
> IMO, pm_runtime_set_memalloc_noio() is only called in
> probe() and release() of block device and network device, which is
> in a very infrequent path, so I am wondering if it is worthy of introducing
> another counter for all devices.

Well, it may be unfrequent, but does it mean it has to do things that may
be avoided (ie. walking the children of every node in the path in some cases)?

I don't really think that the counters would cost us that much anyway.

> Also looks the current implementation of pm_runtime_set_memalloc_noio()
> is simple and clean enough with the flag, IMO.

I know you always know better. :-)

> > I would use the flag only to store the information that
> > pm_runtime_set_memalloc_noio(dev, true) has been run for this device directly
> > and I'd use a counter for everything else.
> >
> > That is, have power.memalloc_count that would be incremented when (1)
> > pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) when
> > power.memalloc_count for one of its children changes from 0 to 1 (and
> > analogously for decrementation).  Then, check the counter in rpm_callback().
> 
> Sorry, could you explain in a bit detail why we need the counter? Looks only
> checking the flag in rpm_callback() is enough, doesn't it?

Why would I want to use power.memalloc_count in addition to the
power.memalloc_noio flag?

Consider this:

pm_runtime_set_memalloc_noio(dev):
	return if power.memalloc_noio is set
	set power.memalloc_noio
  loop:
	increment power.memalloc_count
	if power.memalloc_count is 1 now switch to parent and go to loop

pm_runtime_clear_memalloc_noio(dev):
	return if power.memalloc_noio is unset
	unset power.memalloc_noio
  loop:
	decrement power.memalloc_count
	if power.memalloc_count is 0 now switch to parent and go to loop

Looks kind of simpler, doesn't it?

And why rpm_callback() should check power.memalloc_count instead of the count?
Because power.memalloc_noio will only be set for devices that
pm_runtime_set_memalloc_noio(dev) was called for directly (not necessarily for
the parents).

And that works even if someone calls any of them twice in a row for the same
device (presumably by mistake) and doesn't have to make any assumptions
about devices it is called for.

> > Besides, don't you need to check children for the arg device itself?
> 
> It isn't needed since the children of network/block device can't be
> involved of the deadlock in runtime PM path.
> 
> Also, the function is only called by network device or block device
> subsystem, both the two kind of device are class device and should
> have no children.

OK, so not walking the arg device's children is an optimization related to
some assumptions regarding who's supposed to use this routine.  That should
be clearly documented.

However, I'd prefer it not to make such assumptions in the first place.

> >> + * The function should only be called by block device, or network
> >> + * device driver for solving the deadlock problem during runtime
> >> + * resume/suspend:
> >> + *
> >> + *     If memory allocation with GFP_KERNEL is called inside runtime
> >> + *     resume/suspend callback of any one of its ancestors(or the
> >> + *     block device itself), the deadlock may be triggered inside the
> >> + *     memory allocation since it might not complete until the block
> >> + *     device becomes active and the involed page I/O finishes. The
> >> + *     situation is pointed out first by Alan Stern. Network device
> >> + *     are involved in iSCSI kind of situation.
> >> + *
> >> + * The lock of dev_hotplug_mutex is held in the function for handling
> >> + * hotplug race because pm_runtime_set_memalloc_noio() may be called
> >> + * in async probe().
> >> + *
> >> + * The function should be called between device_add() and device_del()
> >> + * on the affected device(block/network device).
> >> + */
> >> +void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
> >> +{
> >> +     static DEFINE_MUTEX(dev_hotplug_mutex);
> >
> > What's the mutex for?
> 
> It is for avoiding hotplug race, for example, without the mutex,
> another child may set the flag between the time device_for_each_child()
> runs and the next loop iteration in pm_runtime_set_memalloc_noio(false).

OK

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
From: Ming Lei @ 2012-11-28  9:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm
In-Reply-To: <1408044.6czCGhbHJH@vostro.rjw.lan>

On Wed, Nov 28, 2012 at 5:29 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>
> But it doesn't have to walk the children.  Moreover, with counters it only

Yeah, I got it, it is the advantage of counter, but with extra 'int'
field introduced
in 'struct device'.

> needs to walk the whole path if all devices in it need to be updated.  For
> example, if you call pm_runtime_set_memalloc_noio(dev, true) for a device
> whose parent's counter is greater than zero already, you don't need to
> walk the path above the parent.

We still can do it with the flag only, pm_runtime_set_memalloc_noio(dev, true)
can return immediately if one parent or the 'dev' flag is true.

But considered that the pm_runtime_set_memalloc_noio(dev, false) is only
called in a very infrequent path(network/block device->remove()), looks the
introduced cost isn't worthy of the obtained advantage.

So could you accept not introducing counter? and I will update with the
above improvement you suggested.

Thanks,
--
Ming Lei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
From: Rafael J. Wysocki @ 2012-11-28  9:29 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-pm, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm
In-Reply-To: <CACVXFVODD9fRqQc3kR58OJm3ERgBWojnx=790xGwu=MPGaSmMA@mail.gmail.com>

On Wednesday, November 28, 2012 12:34:36 PM Ming Lei wrote:
> On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >
> > Please use counters instead of walking the whole path every time.  Ie. in
> > addition to the flag add a counter to store the number of the device's
> > children having that flag set.
> 
> Even though counter is added, walking the whole path can't be avoided too,
> and may be a explicit walking or recursion, because pm_runtime_set_memalloc_noio
> is required to set or clear the flag(or increase/decrease the counter) of
> devices in the whole path.

But it doesn't have to walk the children.  Moreover, with counters it only
needs to walk the whole path if all devices in it need to be updated.  For
example, if you call pm_runtime_set_memalloc_noio(dev, true) for a device
whose parent's counter is greater than zero already, you don't need to
walk the path above the parent.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH v9 06/10] ata: zpodd: check zero power ready status
From: James Bottomley @ 2012-11-28  8:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Rafael J. Wysocki, linux-pm, Aaron Lu, Jeff Garzik, Alan Stern,
	Jeff Wu, Aaron Lu, linux-ide, linux-scsi, linux-acpi
In-Reply-To: <20121128013928.GB15971@htj.dyndns.org>

On Tue, 2012-11-27 at 17:39 -0800, Tejun Heo wrote:
> Hey, Rafael.
> 
> On Wed, Nov 28, 2012 at 01:51:00AM +0100, Rafael J. Wysocki wrote:
> > Having considered that a bit more I'm now thinking that in fact the power state
> > the device is in at the moment doesn't really matter, so the polling code need
> > not really know what PM is doing.  What it needs to know is that the device
> > will generate a hardware event when something interesting happens, so it is not
> > necessary to poll it.
> > 
> > In this particular case it is related to power management (apparently, hardware
> > events will only be generated after the device has been put into ACPI D3cold,
> > or so Aaron seems to be claiming), but it need not be in general, at least in
> > principle.
> > 
> > It looks like we need an "event driven" flag that the can be set in the lower
> > layers and read by the upper layers.  I suppose this means it would need to be
> > in struct device, but not necessarily in the PM-specific part of it.
> 
> We already have that.  That's what gendisk->async_events is for (as
> opposed to gendisk->events).  If all events are async_events, block
> won't poll for events, but I'm not sure that's the golden bullet.
> 
> * None implements async_events yet and an API is missing -
>   disk_check_events() - which is trivial to add, but it's the same
>   story.  We'll need a mechanism to shoot up notification from libata
>   to block layer.  It's admittedly easier to justify routing through
>   SCSI tho.  So, we're mostly shifting the problem.  Given that async
>   events is nice to have, so it isn't a bad idea.

Could we drive it in the polling code?  As in, if we set a flag to say
we're event driven and please don't bother us, we could just respond to
the poll with the last known state (this would probably have to be in
SCSI somewhere since most polls are Test Unit Readys).  That way ZPODD
sets this flag when the device powers down and unsets it when it powers
up.

James

> * Still dunno much about zpodd but IIUC the notification from
>   zero-power is via ACPI.  To advertise that the device doesn't need
>   polling, it should also be able to do async notification while
>   powered up, which isn't covered by zpodd but ATA async notification.
>   So, ummm... that's another obstacle.  If zpodd requires the device
>   to support ATA async notification, it might not be too bad tho.




^ permalink raw reply

* Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
From: Ming Lei @ 2012-11-28  4:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm
In-Reply-To: <5434404.G1ERYjuorE@vostro.rjw.lan>

On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>
> Please use counters instead of walking the whole path every time.  Ie. in
> addition to the flag add a counter to store the number of the device's
> children having that flag set.

Even though counter is added, walking the whole path can't be avoided too,
and may be a explicit walking or recursion, because pm_runtime_set_memalloc_noio
is required to set or clear the flag(or increase/decrease the counter) of
devices in the whole path.

Thanks,
--
Ming Lei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
From: Ming Lei @ 2012-11-28  3:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm
In-Reply-To: <5434404.G1ERYjuorE@vostro.rjw.lan>

On Wed, Nov 28, 2012 at 5:19 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Saturday, November 24, 2012 08:59:14 PM Ming Lei wrote:
>> The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
>> to help PM core to teach mm not allocating memory with GFP_KERNEL
>> flag for avoiding probable deadlock.
>>
>> As explained in the comment, any GFP_KERNEL allocation inside
>> runtime_resume() or runtime_suspend() on any one of device in
>> the path from one block or network device to the root device
>> in the device tree may cause deadlock, the introduced
>> pm_runtime_set_memalloc_noio() sets or clears the flag on
>> device in the path recursively.
>>
>> Cc: Alan Stern <stern@rowland.harvard.edu>
>> Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>> ---
>> v5:
>>       - fix code style error
>>       - add comment on clear the device memalloc_noio flag
>> v4:
>>       - rename memalloc_noio_resume as memalloc_noio
>>       - remove pm_runtime_get_memalloc_noio()
>>       - add comments on pm_runtime_set_memalloc_noio
>> v3:
>>       - introduce pm_runtime_get_memalloc_noio()
>>       - hold one global lock on pm_runtime_set_memalloc_noio
>>       - hold device power lock when accessing memalloc_noio_resume
>>         flag suggested by Alan Stern
>>       - implement pm_runtime_set_memalloc_noio without recursion
>>         suggested by Alan Stern
>> v2:
>>       - introduce pm_runtime_set_memalloc_noio()
>> ---
>>  drivers/base/power/runtime.c |   60 ++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/pm.h           |    1 +
>>  include/linux/pm_runtime.h   |    3 +++
>>  3 files changed, 64 insertions(+)
>>
>> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
>> index 3148b10..3e198a0 100644
>> --- a/drivers/base/power/runtime.c
>> +++ b/drivers/base/power/runtime.c
>> @@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct device *dev)
>>  }
>>  EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
>>
>> +static int dev_memalloc_noio(struct device *dev, void *data)
>> +{
>> +     return dev->power.memalloc_noio;
>> +}
>> +
>> +/*
>> + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
>> + * @dev: Device to handle.
>> + * @enable: True for setting the flag and False for clearing the flag.
>> + *
>> + * Set the flag for all devices in the path from the device to the
>> + * root device in the device tree if @enable is true, otherwise clear
>> + * the flag for devices in the path whose siblings don't set the flag.
>> + *
>
> Please use counters instead of walking the whole path every time.  Ie. in
> addition to the flag add a counter to store the number of the device's
> children having that flag set.

Thanks for your review.

IMO, pm_runtime_set_memalloc_noio() is only called in
probe() and release() of block device and network device, which is
in a very infrequent path, so I am wondering if it is worthy of introducing
another counter for all devices.

Also looks the current implementation of pm_runtime_set_memalloc_noio()
is simple and clean enough with the flag, IMO.

> I would use the flag only to store the information that
> pm_runtime_set_memalloc_noio(dev, true) has been run for this device directly
> and I'd use a counter for everything else.
>
> That is, have power.memalloc_count that would be incremented when (1)
> pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) when
> power.memalloc_count for one of its children changes from 0 to 1 (and
> analogously for decrementation).  Then, check the counter in rpm_callback().

Sorry, could you explain in a bit detail why we need the counter? Looks only
checking the flag in rpm_callback() is enough, doesn't it?

>
> Besides, don't you need to check children for the arg device itself?

It isn't needed since the children of network/block device can't be
involved of the deadlock in runtime PM path.

Also, the function is only called by network device or block device
subsystem, both the two kind of device are class device and should
have no children.

>
>> + * The function should only be called by block device, or network
>> + * device driver for solving the deadlock problem during runtime
>> + * resume/suspend:
>> + *
>> + *     If memory allocation with GFP_KERNEL is called inside runtime
>> + *     resume/suspend callback of any one of its ancestors(or the
>> + *     block device itself), the deadlock may be triggered inside the
>> + *     memory allocation since it might not complete until the block
>> + *     device becomes active and the involed page I/O finishes. The
>> + *     situation is pointed out first by Alan Stern. Network device
>> + *     are involved in iSCSI kind of situation.
>> + *
>> + * The lock of dev_hotplug_mutex is held in the function for handling
>> + * hotplug race because pm_runtime_set_memalloc_noio() may be called
>> + * in async probe().
>> + *
>> + * The function should be called between device_add() and device_del()
>> + * on the affected device(block/network device).
>> + */
>> +void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
>> +{
>> +     static DEFINE_MUTEX(dev_hotplug_mutex);
>
> What's the mutex for?

It is for avoiding hotplug race, for example, without the mutex,
another child may set the flag between the time device_for_each_child()
runs and the next loop iteration in pm_runtime_set_memalloc_noio(false).

Thanks,
--
Ming Lei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH v6 5/6] PM / Runtime: force memory allocation with no I/O during Runtime PM callbcack
From: Ming Lei @ 2012-11-28  3:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm
In-Reply-To: <1354069667.BsTEhItmLz@vostro.rjw.lan>

On Wed, Nov 28, 2012 at 5:24 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>
> Please don't duplicate code this way.
>
> You can move that whole thing to rpm_callback().  Yes, you'll probably need to
> check dev->power.memalloc_noio twice in there, but that's OK.

Good idea, I will update it in v7.

Thanks,
--
Ming Lei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 6/6 v6] cpufreq, highbank: add support for highbank cpufreq
From: Shawn Guo @ 2012-11-28  2:32 UTC (permalink / raw)
  To: Mark Langsdorf
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cpufreq-u79uwXL29TY76Z2rM5mHXA, linux-pm-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1354046672-7392-7-git-send-email-mark.langsdorf-bsGFqQB8/DxBDgjK7y7TUQ@public.gmane.org>

On Tue, Nov 27, 2012 at 02:04:32PM -0600, Mark Langsdorf wrote:
> Highbank processors depend on the external ECME to perform voltage
> management based on a requested frequency. Communication between the
> A9 cores and the ECME happens over the pl320 IPC channel.

...

> +static int hb_voltage_change(unsigned int freq)
> +{
> +	int i;
> +	u32 msg[7];
> +
> +	msg[0] = HB_CPUFREQ_CHANGE_NOTE;
> +	msg[1] = freq / 1000;
> +	for (i = 2; i < 7; i++)
> +		msg[i] = 0;
> +
> +	return pl320_ipc_transmit(msg);

Is it possible to have this handled inside clk_set_rate() call of cpu
clock?

Shawn

> +}

^ permalink raw reply

* Re: [PATCH v9 06/10] ata: zpodd: check zero power ready status
From: Aaron Lu @ 2012-11-28  2:24 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Rafael J. Wysocki, James Bottomley, linux-pm, Jeff Garzik,
	Alan Stern, Jeff Wu, Aaron Lu, linux-ide, linux-scsi, linux-acpi
In-Reply-To: <20121128013928.GB15971@htj.dyndns.org>

On 11/28/2012 09:39 AM, Tejun Heo wrote:
> Hey, Rafael.
> 
> On Wed, Nov 28, 2012 at 01:51:00AM +0100, Rafael J. Wysocki wrote:
>> Having considered that a bit more I'm now thinking that in fact the power state
>> the device is in at the moment doesn't really matter, so the polling code need
>> not really know what PM is doing.  What it needs to know is that the device
>> will generate a hardware event when something interesting happens, so it is not
>> necessary to poll it.
>>
>> In this particular case it is related to power management (apparently, hardware
>> events will only be generated after the device has been put into ACPI D3cold,
>> or so Aaron seems to be claiming), but it need not be in general, at least in
>> principle.
>>
>> It looks like we need an "event driven" flag that the can be set in the lower
>> layers and read by the upper layers.  I suppose this means it would need to be
>> in struct device, but not necessarily in the PM-specific part of it.
> 
> We already have that.  That's what gendisk->async_events is for (as
> opposed to gendisk->events).  If all events are async_events, block
> won't poll for events, but I'm not sure that's the golden bullet.

Right. For ZPODD, the problem is, during power up time, it needs
gendisk->events to be set to get poll; and after powered off, it will
need to clear the gendisk->events field so that the poll work will stop.
I'm not sure if the gendisk->async_events should be set here, as the
interrupt only says that user pressed the eject button for the tray type
ODD but he may not insert any disc. The whole point of the interrupt is
to re-power the ODD, it is not designed to give notification of media
change. This is my understanding of ZPODD.

> 
> * None implements async_events yet and an API is missing -

That's what I'm confused when reading the sata async support code, the
SCSI sr driver will unconditionally set gendisk->events, so how the sata
async can benefit? But this is another story.

>   disk_check_events() - which is trivial to add, but it's the same
>   story.  We'll need a mechanism to shoot up notification from libata
>   to block layer.  It's admittedly easier to justify routing through
>   SCSI tho.  So, we're mostly shifting the problem.  Given that async
>   events is nice to have, so it isn't a bad idea.
> 
> * Still dunno much about zpodd but IIUC the notification from
>   zero-power is via ACPI.  To advertise that the device doesn't need

Yes, when powered off, if users press the eject button of a tray type
ODD or inserts a disc of the slot type ODD, the SATA ODD will generate a
DEVICE ATTENTION signal, which will cause ACPI to emit an interrupt.

On my test system, when I insert a disc into the slot type ODD to wake
it up, it will continue to generate DEVICE ATTENTION signal, and thus,
ACPI interrupts are coming all the time if I do not disable the ACPI GPE
that controls the interrupt. I guess the behaviour is device by device,
and the SPEC only states what ODD needs to do when in powered off state.
And I don't think a tray type ODD will generate DEVICE ATTENTION signal
when its eject button is pressed after powered up, but Jeff Wu from AMD
may be able to test this behaviour as he has some tray type ODDs if this
is meaningful to know.

>   polling, it should also be able to do async notification while
>   powered up, which isn't covered by zpodd but ATA async notification.

Agree. For powered up media change notification, that's SATA async
notification's job.

>   So, ummm... that's another obstacle.  If zpodd requires the device
>   to support ATA async notification, it might not be too bad tho.

This doesn't seem to be the case, since ZPODD is for how ODD to get
notified when it is powered off, so there is no words stating if the ODD
should also support sata async notification.

Thanks,
Aaron

^ permalink raw reply

* [PATCH] PM / devfreq: remove compiler error with module governors (2)
From: MyungJoo Ham @ 2012-11-28  1:57 UTC (permalink / raw)
  To: rjw; +Cc: myungjoo.ham, linux-pm, nm
In-Reply-To: <2278739.ton8Zzmpxv@vostro.rjw.lan>

Governors compiled as modules may use these functions.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
---
This patch is also locataed at
git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq for-rafael tags/pull_req_20121128

 drivers/devfreq/devfreq.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 34c00c5..a8f0173 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -235,6 +235,7 @@ void devfreq_monitor_start(struct devfreq *devfreq)
 		queue_delayed_work(devfreq_wq, &devfreq->work,
 			msecs_to_jiffies(devfreq->profile->polling_ms));
 }
+EXPORT_SYMBOL(devfreq_monitor_start);
 
 /**
  * devfreq_monitor_stop() - Stop load monitoring of a devfreq instance
@@ -248,6 +249,7 @@ void devfreq_monitor_stop(struct devfreq *devfreq)
 {
 	cancel_delayed_work_sync(&devfreq->work);
 }
+EXPORT_SYMBOL(devfreq_monitor_stop);
 
 /**
  * devfreq_monitor_suspend() - Suspend load monitoring of a devfreq instance
@@ -273,6 +275,7 @@ void devfreq_monitor_suspend(struct devfreq *devfreq)
 	mutex_unlock(&devfreq->lock);
 	cancel_delayed_work_sync(&devfreq->work);
 }
+EXPORT_SYMBOL(devfreq_monitor_suspend);
 
 /**
  * devfreq_monitor_resume() - Resume load monitoring of a devfreq instance
@@ -297,6 +300,7 @@ void devfreq_monitor_resume(struct devfreq *devfreq)
 out:
 	mutex_unlock(&devfreq->lock);
 }
+EXPORT_SYMBOL(devfreq_monitor_resume);
 
 /**
  * devfreq_interval_update() - Update device devfreq monitoring interval
@@ -343,6 +347,7 @@ void devfreq_interval_update(struct devfreq *devfreq, unsigned int *delay)
 out:
 	mutex_unlock(&devfreq->lock);
 }
+EXPORT_SYMBOL(devfreq_interval_update);
 
 /**
  * devfreq_notifier_call() - Notify that the device frequency requirements
-- 
1.7.5.4


^ permalink raw reply related

* Re: [PATCH v9 06/10] ata: zpodd: check zero power ready status
From: Tejun Heo @ 2012-11-28  1:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: James Bottomley, linux-pm, Aaron Lu, Jeff Garzik, Alan Stern,
	Jeff Wu, Aaron Lu, linux-ide, linux-scsi, linux-acpi
In-Reply-To: <21511277.LLinyDpbAK@vostro.rjw.lan>

Hey, Rafael.

On Wed, Nov 28, 2012 at 01:51:00AM +0100, Rafael J. Wysocki wrote:
> Having considered that a bit more I'm now thinking that in fact the power state
> the device is in at the moment doesn't really matter, so the polling code need
> not really know what PM is doing.  What it needs to know is that the device
> will generate a hardware event when something interesting happens, so it is not
> necessary to poll it.
> 
> In this particular case it is related to power management (apparently, hardware
> events will only be generated after the device has been put into ACPI D3cold,
> or so Aaron seems to be claiming), but it need not be in general, at least in
> principle.
> 
> It looks like we need an "event driven" flag that the can be set in the lower
> layers and read by the upper layers.  I suppose this means it would need to be
> in struct device, but not necessarily in the PM-specific part of it.

We already have that.  That's what gendisk->async_events is for (as
opposed to gendisk->events).  If all events are async_events, block
won't poll for events, but I'm not sure that's the golden bullet.

* None implements async_events yet and an API is missing -
  disk_check_events() - which is trivial to add, but it's the same
  story.  We'll need a mechanism to shoot up notification from libata
  to block layer.  It's admittedly easier to justify routing through
  SCSI tho.  So, we're mostly shifting the problem.  Given that async
  events is nice to have, so it isn't a bad idea.

* Still dunno much about zpodd but IIUC the notification from
  zero-power is via ACPI.  To advertise that the device doesn't need
  polling, it should also be able to do async notification while
  powered up, which isn't covered by zpodd but ATA async notification.
  So, ummm... that's another obstacle.  If zpodd requires the device
  to support ATA async notification, it might not be too bad tho.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH v9 06/10] ata: zpodd: check zero power ready status
From: Rafael J. Wysocki @ 2012-11-28  0:51 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-pm, Tejun Heo, Aaron Lu, Jeff Garzik, Alan Stern, Jeff Wu,
	Aaron Lu, linux-ide, linux-scsi, linux-acpi
In-Reply-To: <1353906191.2523.25.camel@dabdike>

On Monday, November 26, 2012 09:03:11 AM James Bottomley wrote:
> On Mon, 2012-11-26 at 01:33 +0100, Rafael J. Wysocki wrote:
> > On Monday, November 19, 2012 03:06:51 PM James Bottomley wrote:
> > > > I really think we need a way for (auto)pm and event polling to talk to
> > > > each other so that autopm can tell event poll to sod off while pm is
> > > > in effect.  Trying to solve this from inside libata doesn't seem
> > > > right.  The problem, again, seems to be figuring out which hardware
> > > > device maps to which block device.  Hmmm... Any good ideas?
> > > 
> > > I've asked the PM people several times about this, because it's a real
> > > problem for almost everything:  PM needs some type of top to bottom
> > > stack view, which the layering isolation we have within storage really
> > > doesn't cope with well.  No real suggestion has been forthcoming.
> > 
> > Actually, I think that the particular case in question is really special
> > and the fact that there's the pollig loop that user space is involved in
> > doesn't make things more stratightforward.
> > 
> > And PM really doesn't need to see things top to bottom, but the polling
> > needs to know what happens in the PM land.  We need to be able to tell it
> > "from now on tell user space that there are no events here".  The question
> > is where to put that information so that it's accessible to all parts of the
> > stack involved.
> 
> Right, open to suggestions ...

Having considered that a bit more I'm now thinking that in fact the power state
the device is in at the moment doesn't really matter, so the polling code need
not really know what PM is doing.  What it needs to know is that the device
will generate a hardware event when something interesting happens, so it is not
necessary to poll it.

In this particular case it is related to power management (apparently, hardware
events will only be generated after the device has been put into ACPI D3cold,
or so Aaron seems to be claiming), but it need not be in general, at least in
principle.

It looks like we need an "event driven" flag that the can be set in the lower
layers and read by the upper layers.  I suppose this means it would need to be
in struct device, but not necessarily in the PM-specific part of it.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: ERROR: "devfreq_monitor_resume" undefined!
From: Rafael J. Wysocki @ 2012-11-28  0:36 UTC (permalink / raw)
  To: MyungJoo Ham; +Cc: linux-pm, kbuild test robot
In-Reply-To: <50b55470.8yQB+TYqddithSLm%fengguang.wu@intel.com>

On Wednesday, November 28, 2012 08:01:52 AM kbuild test robot wrote:
> tree:   git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
> head:   0917293958d5d33213a23125ce5d9c7a852469d6
> commit: 883d588e556347c4b3221ac492a8acd8a75e730a PM / devfreq: remove compiler error when a governor is module
> date:   2 days ago
> config: make ARCH=m68k allmodconfig
> 
> All error/warnings:
> 
> ERROR: "devfreq_monitor_resume" undefined!
> ERROR: "devfreq_monitor_suspend" undefined!
> ERROR: "devfreq_interval_update" undefined!
> ERROR: "devfreq_monitor_stop" undefined!
> ERROR: "devfreq_monitor_start" undefined!

So I dropped the devfreq branch from my linux-next branch again and quite
frankly I'm not going to include it into my first pull request during the v3.8
merge window.

Currently, all the material I got from you is on my pm-devfreq branch.  Please
fix the build problems on top of that.

Thanks,
Rafael

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* ERROR: "devfreq_monitor_resume" undefined!
From: kbuild test robot @ 2012-11-28  0:01 UTC (permalink / raw)
  To: MyungJoo Ham; +Cc: linux-pm

tree:   git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
head:   0917293958d5d33213a23125ce5d9c7a852469d6
commit: 883d588e556347c4b3221ac492a8acd8a75e730a PM / devfreq: remove compiler error when a governor is module
date:   2 days ago
config: make ARCH=m68k allmodconfig

All error/warnings:

ERROR: "devfreq_monitor_resume" undefined!
ERROR: "devfreq_monitor_suspend" undefined!
ERROR: "devfreq_interval_update" undefined!
ERROR: "devfreq_monitor_stop" undefined!
ERROR: "devfreq_monitor_start" undefined!

---
0-DAY kernel build testing backend         Open Source Technology Center
Fengguang Wu, Yuanhan Liu                              Intel Corporation

^ permalink raw reply

* Re: [PATCH] Longhaul: Disable driver by default
From: Rafael J. Wysocki @ 2012-11-27 23:44 UTC (permalink / raw)
  To: Rafał Bilski; +Cc: cpufreq, linux-pm, Dave Jones
In-Reply-To: <50B54ED9.3090704@interia.pl>

On Tuesday, November 27, 2012 11:38:01 PM Rafał Bilski wrote:
> 
> > On Tuesday, November 27, 2012 10:13:55 PM Rafal Bilski wrote:
> >> This is only solution I can think of. User decides if he wants this
> >> driver on his machine. I don't have enough knowledge and time to find
> >> the reason why same code works on some machines and doesn't on others
> >> which use same, or very similar,  chipset and processor.
> > I always have problems with patches like this one, because they are pretty much
> > guaranteed to make someone complain.
> >
> > Is there any way to blacklist the affected machine you have?
> >
> > Rafael
> No. Also problem seems to be larger than one machine. Also weirder.
> One user claims his processor can't run below some frequency even
> if it should be perfectly capable of doing so. System in question
> freezes straight away. In past on good system I could change
> frequency at least a couple of times without any protection. Just by
> a chance. I tried to investigate "weird CPU 0 not listed by the BIOS"
> message, but I have nothing. Also I seem to have far less time for
> anything than in the past.

OK, I'll take the patch, then, but I won't include it in my first v3.8 pull
request.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [PATCH] Longhaul: Disable driver by default
From: Rafał Bilski @ 2012-11-27 23:38 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: cpufreq, linux-pm, Dave Jones
In-Reply-To: <1997813.IluQxocScm@vostro.rjw.lan>


> On Tuesday, November 27, 2012 10:13:55 PM Rafal Bilski wrote:
>> This is only solution I can think of. User decides if he wants this
>> driver on his machine. I don't have enough knowledge and time to find
>> the reason why same code works on some machines and doesn't on others
>> which use same, or very similar,  chipset and processor.
> I always have problems with patches like this one, because they are pretty much
> guaranteed to make someone complain.
>
> Is there any way to blacklist the affected machine you have?
>
> Rafael
No. Also problem seems to be larger than one machine. Also weirder.
One user claims his processor can't run below some frequency even
if it should be perfectly capable of doing so. System in question
freezes straight away. In past on good system I could change
frequency at least a couple of times without any protection. Just by
a chance. I tried to investigate "weird CPU 0 not listed by the BIOS"
message, but I have nothing. Also I seem to have far less time for
anything than in the past.

Sorry
Rafal

>
>
>> Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
>>
>> ---
>>   drivers/cpufreq/longhaul.c | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
>> index 53ddbc7..0bf5bd1 100644
>> --- a/drivers/cpufreq/longhaul.c
>> +++ b/drivers/cpufreq/longhaul.c
>> @@ -77,7 +77,7 @@ static unsigned int longhaul_index;
>>   static int scale_voltage;
>>   static int disable_acpi_c3;
>>   static int revid_errata;
>> -
>> +static int enable;
>>   
>>   /* Clock ratios multiplied by 10 */
>>   static int mults[32];
>> @@ -965,6 +965,10 @@ static int __init longhaul_init(void)
>>   	if (!x86_match_cpu(longhaul_id))
>>   		return -ENODEV;
>>   
>> +	if (!enable) {
>> +		printk(KERN_ERR PFX "Option \"enable\" not set. Aborting.\n");
>> +		return -ENODEV;
>> +	}
>>   #ifdef CONFIG_SMP
>>   	if (num_online_cpus() > 1) {
>>   		printk(KERN_ERR PFX "More than 1 CPU detected, "
>> @@ -1021,6 +1025,10 @@ MODULE_PARM_DESC(scale_voltage, "Scale voltage of processor");
>>    * such. */
>>   module_param(revid_errata, int, 0644);
>>   MODULE_PARM_DESC(revid_errata, "Ignore CPU Revision ID");
>> +/* By default driver is disabled to prevent incompatible
>> + * system freeze. */
>> +module_param(enable, int, 0644);
>> +MODULE_PARM_DESC(enable, "Enable driver");
>>   
>>   MODULE_AUTHOR("Dave Jones <davej@redhat.com>");
>>   MODULE_DESCRIPTION("Longhaul driver for VIA Cyrix processors.");
>>


^ permalink raw reply

* Re: [PATCH] Longhaul: Disable driver by default
From: Dave Jones @ 2012-11-27 22:41 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafal Bilski, cpufreq, linux-pm
In-Reply-To: <1997813.IluQxocScm@vostro.rjw.lan>

On Tue, Nov 27, 2012 at 11:33:10PM +0100, Rafael J. Wysocki wrote:
 > On Tuesday, November 27, 2012 10:13:55 PM Rafal Bilski wrote:
 > > This is only solution I can think of. User decides if he wants this
 > > driver on his machine. I don't have enough knowledge and time to find
 > > the reason why same code works on some machines and doesn't on others
 > > which use same, or very similar,  chipset and processor.
 > 
 > I always have problems with patches like this one, because they are pretty much
 > guaranteed to make someone complain.
 > 
 > Is there any way to blacklist the affected machine you have?

There are a lot of marginal VIA systems out there. Mostly due to really
poor quality motherboards (I had several myself that ended up with leaking
capacitors). They work fine until you put them under load and then start
tweaking the voltage. Rafal spent a long time trying to get them stable
(see the git history for longhaul.c).  

Given those CPUs are pretty underpowered today, and there are many better
alternatives if you care about power saving that much, I'd vote for
not worrying about it too much.  We even stopped building it in Fedora
due to a) the limited userbase and b) when we got bug reports there was
nothing we could really do, so we opted for stability over power saving.

	Dave

^ permalink raw reply

* [PATCH] Longhaul: Disable driver by default
From: Rafal Bilski @ 2012-11-27 22:13 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: cpufreq, linux-pm

This is only solution I can think of. User decides if he wants this
driver on his machine. I don't have enough knowledge and time to find
the reason why same code works on some machines and doesn't on others
which use same, or very similar,  chipset and processor.

Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>

---
 drivers/cpufreq/longhaul.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
index 53ddbc7..0bf5bd1 100644
--- a/drivers/cpufreq/longhaul.c
+++ b/drivers/cpufreq/longhaul.c
@@ -77,7 +77,7 @@ static unsigned int longhaul_index;
 static int scale_voltage;
 static int disable_acpi_c3;
 static int revid_errata;
-
+static int enable;
 
 /* Clock ratios multiplied by 10 */
 static int mults[32];
@@ -965,6 +965,10 @@ static int __init longhaul_init(void)
 	if (!x86_match_cpu(longhaul_id))
 		return -ENODEV;
 
+	if (!enable) {
+		printk(KERN_ERR PFX "Option \"enable\" not set. Aborting.\n");
+		return -ENODEV;
+	}
 #ifdef CONFIG_SMP
 	if (num_online_cpus() > 1) {
 		printk(KERN_ERR PFX "More than 1 CPU detected, "
@@ -1021,6 +1025,10 @@ MODULE_PARM_DESC(scale_voltage, "Scale voltage of processor");
  * such. */
 module_param(revid_errata, int, 0644);
 MODULE_PARM_DESC(revid_errata, "Ignore CPU Revision ID");
+/* By default driver is disabled to prevent incompatible
+ * system freeze. */
+module_param(enable, int, 0644);
+MODULE_PARM_DESC(enable, "Enable driver");
 
 MODULE_AUTHOR("Dave Jones <davej@redhat.com>");
 MODULE_DESCRIPTION("Longhaul driver for VIA Cyrix processors.");
-- 
1.8.0


^ permalink raw reply related

* Re: [PATCH] Longhaul: Disable driver by default
From: Rafael J. Wysocki @ 2012-11-27 22:33 UTC (permalink / raw)
  To: Rafal Bilski; +Cc: cpufreq, linux-pm, Dave Jones
In-Reply-To: <1354054435-2459-1-git-send-email-rafalbilski@interia.pl>

On Tuesday, November 27, 2012 10:13:55 PM Rafal Bilski wrote:
> This is only solution I can think of. User decides if he wants this
> driver on his machine. I don't have enough knowledge and time to find
> the reason why same code works on some machines and doesn't on others
> which use same, or very similar,  chipset and processor.

I always have problems with patches like this one, because they are pretty much
guaranteed to make someone complain.

Is there any way to blacklist the affected machine you have?

Rafael


> Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
> 
> ---
>  drivers/cpufreq/longhaul.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/longhaul.c b/drivers/cpufreq/longhaul.c
> index 53ddbc7..0bf5bd1 100644
> --- a/drivers/cpufreq/longhaul.c
> +++ b/drivers/cpufreq/longhaul.c
> @@ -77,7 +77,7 @@ static unsigned int longhaul_index;
>  static int scale_voltage;
>  static int disable_acpi_c3;
>  static int revid_errata;
> -
> +static int enable;
>  
>  /* Clock ratios multiplied by 10 */
>  static int mults[32];
> @@ -965,6 +965,10 @@ static int __init longhaul_init(void)
>  	if (!x86_match_cpu(longhaul_id))
>  		return -ENODEV;
>  
> +	if (!enable) {
> +		printk(KERN_ERR PFX "Option \"enable\" not set. Aborting.\n");
> +		return -ENODEV;
> +	}
>  #ifdef CONFIG_SMP
>  	if (num_online_cpus() > 1) {
>  		printk(KERN_ERR PFX "More than 1 CPU detected, "
> @@ -1021,6 +1025,10 @@ MODULE_PARM_DESC(scale_voltage, "Scale voltage of processor");
>   * such. */
>  module_param(revid_errata, int, 0644);
>  MODULE_PARM_DESC(revid_errata, "Ignore CPU Revision ID");
> +/* By default driver is disabled to prevent incompatible
> + * system freeze. */
> +module_param(enable, int, 0644);
> +MODULE_PARM_DESC(enable, "Enable driver");
>  
>  MODULE_AUTHOR("Dave Jones <davej@redhat.com>");
>  MODULE_DESCRIPTION("Longhaul driver for VIA Cyrix processors.");
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [PATCH 0/8] cpupower enhancements for 3.8
From: Rafael J. Wysocki @ 2012-11-27 22:30 UTC (permalink / raw)
  To: Thomas Renninger; +Cc: linux-pm, cpufreq
In-Reply-To: <1354018669-63680-1-git-send-email-trenn@suse.de>

On Tuesday, November 27, 2012 01:17:41 PM Thomas Renninger wrote:
> Palmer Cox (6):
>   cpupower tools: Remove brace expansion from clean target
>   cpupower tools: Update .gitignore for files created in the debug
>     directories
>   cpupower tools: Fix minor warnings
>   cpupower tools: Fix issues with sysfs_topology_read_file
>   cpupower tools: Fix malloc of cpu_info structure
>   cpupower tools: Fix warning and a bug with the cpu package count
> 
> Thomas Renninger (2):
>   cpupower: Provide -c param for cpupower monitor to schedule process
>     on all cores
>   cpupower: IvyBridge (0x3a and 0x3e models) support
> 
>  tools/power/cpupower/.gitignore                    |    7 +++
>  tools/power/cpupower/Makefile                      |    3 +-
>  tools/power/cpupower/debug/i386/Makefile           |    5 ++-
>  tools/power/cpupower/man/cpupower-monitor.1        |   15 +++++-
>  tools/power/cpupower/utils/helpers/cpuid.c         |    2 +
>  tools/power/cpupower/utils/helpers/helpers.h       |   18 ++++---
>  tools/power/cpupower/utils/helpers/sysfs.c         |   19 -------
>  tools/power/cpupower/utils/helpers/topology.c      |   53 +++++++++++---------
>  .../cpupower/utils/idle_monitor/cpupower-monitor.c |   21 +++++++-
>  .../cpupower/utils/idle_monitor/cpupower-monitor.h |   17 ++++++
>  tools/power/cpupower/utils/idle_monitor/snb_idle.c |   10 +++-
>  11 files changed, 110 insertions(+), 60 deletions(-)

Applied to linux-pm.git/linux-next.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [PATCH] cpufreq: ondemand: update sampling rate only on right CPUs
From: Rafael J. Wysocki @ 2012-11-27 22:08 UTC (permalink / raw)
  To: linux-pm; +Cc: Fabio Baltieri, cpufreq, linux-kernel
In-Reply-To: <1353953412-20667-1-git-send-email-fabio.baltieri@linaro.org>

On Monday, November 26, 2012 07:10:12 PM Fabio Baltieri wrote:
> Fix cpufreq_gov_ondemand to skip CPU where another governor is used.
> 
> The bug present itself as NULL pointer access on the mutex_lock() call,
> an can be reproduced on an SMP machine by setting the default governor
> to anything other than ondemand, setting a single CPU's governor to
> ondemand, then changing the sample rate by writing on:
> 
> > /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate
> 
> Backtrace:
> 
> Nov 26 17:36:54 balto kernel: [  839.585241] BUG: unable to handle kernel NULL pointer dereference at           (null)
> Nov 26 17:36:54 balto kernel: [  839.585311] IP: [<ffffffff8174e082>] __mutex_lock_slowpath+0xb2/0x170
> [snip]
> Nov 26 17:36:54 balto kernel: [  839.587005] Call Trace:
> Nov 26 17:36:54 balto kernel: [  839.587030]  [<ffffffff8174da82>] mutex_lock+0x22/0x40
> Nov 26 17:36:54 balto kernel: [  839.587067]  [<ffffffff81610b8f>] store_sampling_rate+0xbf/0x150
> Nov 26 17:36:54 balto kernel: [  839.587110]  [<ffffffff81031e9c>] ?  __do_page_fault+0x1cc/0x4c0
> Nov 26 17:36:54 balto kernel: [  839.587153]  [<ffffffff813309bf>] kobj_attr_store+0xf/0x20
> Nov 26 17:36:54 balto kernel: [  839.587192]  [<ffffffff811bb62d>] sysfs_write_file+0xcd/0x140
> Nov 26 17:36:54 balto kernel: [  839.587234]  [<ffffffff8114c12c>] vfs_write+0xac/0x180
> Nov 26 17:36:54 balto kernel: [  839.587271]  [<ffffffff8114c472>] sys_write+0x52/0xa0
> Nov 26 17:36:54 balto kernel: [  839.587306]  [<ffffffff810321ce>] ?  do_page_fault+0xe/0x10
> Nov 26 17:36:54 balto kernel: [  839.587345]  [<ffffffff81751202>] system_call_fastpath+0x16/0x1b
> 
> Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org>
> ---
> 
> Hi Rafael,
> 
> this is based on a clean linux-pm linux-next branch (i.e. not with my other
> patch-set applied), so expect a context conflict if both are applied.

I have applied this patch to my linux-next branch.

As for the other series, I'm in the process of reviewing it, but I rather
won't include it into my first pull request for v3.8.

Thanks,
Rafael


>  drivers/cpufreq/cpufreq_ondemand.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
> index cca3e9f..7731f7c 100644
> --- a/drivers/cpufreq/cpufreq_ondemand.c
> +++ b/drivers/cpufreq/cpufreq_ondemand.c
> @@ -40,6 +40,10 @@
>  static struct dbs_data od_dbs_data;
>  static DEFINE_PER_CPU(struct od_cpu_dbs_info_s, od_cpu_dbs_info);
>  
> +#ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND
> +static struct cpufreq_governor cpufreq_gov_ondemand;
> +#endif
> +
>  static struct od_dbs_tuners od_tuners = {
>  	.up_threshold = DEF_FREQUENCY_UP_THRESHOLD,
>  	.sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR,
> @@ -279,6 +283,10 @@ static void update_sampling_rate(unsigned int new_rate)
>  		policy = cpufreq_cpu_get(cpu);
>  		if (!policy)
>  			continue;
> +		if (policy->governor != &cpufreq_gov_ondemand) {
> +			cpufreq_cpu_put(policy);
> +			continue;
> +		}
>  		dbs_info = &per_cpu(od_cpu_dbs_info, policy->cpu);
>  		cpufreq_cpu_put(policy);
>  
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [PATCH 1/5] cpufreq: handle SW coordinated CPUs
From: Rafael J. Wysocki @ 2012-11-27 22:05 UTC (permalink / raw)
  To: linux-pm
  Cc: Fabio Baltieri, cpufreq, Rickard Andersson, Vincent Guittot,
	Linus Walleij, Lee Jones, linux-kernel
In-Reply-To: <1353947996-26723-2-git-send-email-fabio.baltieri@linaro.org>

On Monday, November 26, 2012 05:39:52 PM Fabio Baltieri wrote:
> From: Rickard Andersson <rickard.andersson@stericsson.com>
> 
> This patch fixes a bug that occurred when we had load on a secondary CPU
> and the primary CPU was sleeping. Only one sampling timer was spawned
> and it was spawned as a deferred timer on the primary CPU, so when a
> secondary CPU had a change in load this was not detected by the cpufreq
> governor (both ondemand and conservative).
> 
> This patch make sure that deferred timers are run on all CPUs in the
> case of software controlled CPUs that run on the same frequency.
> 
> Signed-off-by: Rickard Andersson <rickard.andersson@stericsson.com>
> Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org>
> ---
>  drivers/cpufreq/cpufreq_conservative.c |  3 +-
>  drivers/cpufreq/cpufreq_governor.c     | 52 ++++++++++++++++++++++++++++++----
>  drivers/cpufreq/cpufreq_governor.h     |  1 +
>  drivers/cpufreq/cpufreq_ondemand.c     |  3 +-
>  4 files changed, 51 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
> index 64ef737..b9d7f14 100644
> --- a/drivers/cpufreq/cpufreq_conservative.c
> +++ b/drivers/cpufreq/cpufreq_conservative.c
> @@ -122,7 +122,8 @@ static void cs_dbs_timer(struct work_struct *work)
>  
>  	dbs_check_cpu(&cs_dbs_data, cpu);
>  
> -	schedule_delayed_work_on(cpu, &dbs_info->cdbs.work, delay);
> +	schedule_delayed_work_on(smp_processor_id(), &dbs_info->cdbs.work,
> +			delay);
>  	mutex_unlock(&dbs_info->cdbs.timer_mutex);
>  }
>  
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index 6c5f1d3..a00f02d 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -161,13 +161,31 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
>  }
>  EXPORT_SYMBOL_GPL(dbs_check_cpu);
>  
> +bool dbs_sw_coordinated_cpus(struct cpu_dbs_common_info *cdbs)
> +{
> +	struct cpufreq_policy *policy = cdbs->cur_policy;
> +
> +	return cpumask_weight(policy->cpus) > 1;
> +}
> +EXPORT_SYMBOL_GPL(dbs_sw_coordinated_cpus);
> +
>  static inline void dbs_timer_init(struct dbs_data *dbs_data,
> -		struct cpu_dbs_common_info *cdbs, unsigned int sampling_rate)
> +				  struct cpu_dbs_common_info *cdbs,
> +				  unsigned int sampling_rate,
> +				  int cpu)
>  {
>  	int delay = delay_for_sampling_rate(sampling_rate);
> +	struct cpu_dbs_common_info *cdbs_local = dbs_data->get_cpu_cdbs(cpu);
> +	struct od_cpu_dbs_info_s *od_dbs_info;
> +
> +	cancel_delayed_work_sync(&cdbs_local->work);
> +
> +	if (dbs_data->governor == GOV_ONDEMAND) {
> +		od_dbs_info = dbs_data->get_cpu_dbs_info_s(cpu);
> +		od_dbs_info->sample_type = OD_NORMAL_SAMPLE;
> +	}

The patch looks good in general except for the special case above.

Why exactly is it necessary?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
From: Rafael J. Wysocki @ 2012-11-27 21:46 UTC (permalink / raw)
  To: linux-pm
  Cc: Ming Lei, linux-kernel, Alan Stern, Oliver Neukum, Minchan Kim,
	Greg Kroah-Hartman, Jens Axboe, David S. Miller, Andrew Morton,
	netdev, linux-usb, linux-mm
In-Reply-To: <5434404.G1ERYjuorE@vostro.rjw.lan>

On Tuesday, November 27, 2012 10:19:29 PM Rafael J. Wysocki wrote:
> On Saturday, November 24, 2012 08:59:14 PM Ming Lei wrote:
> > The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
> > to help PM core to teach mm not allocating memory with GFP_KERNEL
> > flag for avoiding probable deadlock.
> > 
> > As explained in the comment, any GFP_KERNEL allocation inside
> > runtime_resume() or runtime_suspend() on any one of device in
> > the path from one block or network device to the root device
> > in the device tree may cause deadlock, the introduced
> > pm_runtime_set_memalloc_noio() sets or clears the flag on
> > device in the path recursively.
> > 
> > Cc: Alan Stern <stern@rowland.harvard.edu>
> > Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
> > Signed-off-by: Ming Lei <ming.lei@canonical.com>
> > ---
> > v5:
> > 	- fix code style error
> > 	- add comment on clear the device memalloc_noio flag
> > v4:
> > 	- rename memalloc_noio_resume as memalloc_noio
> > 	- remove pm_runtime_get_memalloc_noio()
> > 	- add comments on pm_runtime_set_memalloc_noio
> > v3:
> > 	- introduce pm_runtime_get_memalloc_noio()
> > 	- hold one global lock on pm_runtime_set_memalloc_noio
> > 	- hold device power lock when accessing memalloc_noio_resume
> > 	  flag suggested by Alan Stern
> > 	- implement pm_runtime_set_memalloc_noio without recursion
> > 	  suggested by Alan Stern
> > v2:
> > 	- introduce pm_runtime_set_memalloc_noio()
> > ---
> >  drivers/base/power/runtime.c |   60 ++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/pm.h           |    1 +
> >  include/linux/pm_runtime.h   |    3 +++
> >  3 files changed, 64 insertions(+)
> > 
> > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> > index 3148b10..3e198a0 100644
> > --- a/drivers/base/power/runtime.c
> > +++ b/drivers/base/power/runtime.c
> > @@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
> >  
> > +static int dev_memalloc_noio(struct device *dev, void *data)
> > +{
> > +	return dev->power.memalloc_noio;
> > +}
> > +
> > +/*
> > + * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
> > + * @dev: Device to handle.
> > + * @enable: True for setting the flag and False for clearing the flag.
> > + *
> > + * Set the flag for all devices in the path from the device to the
> > + * root device in the device tree if @enable is true, otherwise clear
> > + * the flag for devices in the path whose siblings don't set the flag.
> > + *
> 
> Please use counters instead of walking the whole path every time.  Ie. in
> addition to the flag add a counter to store the number of the device's
> children having that flag set.

I would use the flag only to store the information that
pm_runtime_set_memalloc_noio(dev, true) has been run for this device directly
and I'd use a counter for everything else.

That is, have power.memalloc_count that would be incremented when (1)
pm_runtime_set_memalloc_noio(dev, true) is called for that device and (2) when
power.memalloc_count for one of its children changes from 0 to 1 (and
analogously for decrementation).  Then, check the counter in rpm_callback().

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox