Linux Power Management development
 help / color / mirror / Atom feed
* LPC2011 Power Management Micro Conf
From: Rafael J. Wysocki @ 2011-07-03  7:29 UTC (permalink / raw)
  To: linux-pm
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Stephen Boyd, LKML,
	Grant Likely, MyungJoo Ham, Jean Pihet, Arjan van de Ven
In-Reply-To: <201105011353.44809.rjw@sisk.pl>

Hi,

Some time ago I sent the following to linux-pm and wasn't really satisfied
with the response, so here it goes again.  It is slightly outdated, because
we've already started to plan a PM meeting at the Kernel Summit, but I hope
some of you will be able to participate in the LPC PM Mini Conf too.

In the previous years we used to organize Power Management Mini Summits to
discuss PM-related issues, the last of which took place during LinuxCon in
Boston in 2010.  Unfortunately, however, it wasn't particularly well attended
and some of the participants generally felt that it had failed its purpose.
The power management track at the last LPC, on the other hand, was generally
regarded as interesting and successful, so there only seems to be room for
one event devoted to power management a year.  For this reason, there won't
be a PM Mini Summit in 2011 and the LPC PM Mini Conference will play the role
of it.  To this end, in addition to the plenary sessions within the LPC PM
track there will be a possibility to discuss specific problems related to
power management in smaller groups.

If there's anything you'd like to discuss or give a presentation on related to
Power Management, please let me know, preferably by replying to this message.

We have 1.5 hours allocated in the LPC schedule for the plenary session, but
we will be able to move to another room (or a number of rooms if that's more
suitable) afterwards to discuss things in detail.

At the moment, there are two items proposed for the plenary session.  One of
them will cover the power management development during the last year, but in
that context I also would like things that are still missing and problems to
solve to be mentioned, so please tell me what they are from your point of view.
The second one will cover the idle power consumption problem and will be led
by Len Brown.  Len has a lot of experience in that area, but mostly with systems
based on Intel hardware, so if you want to contribute your own observations
and/or results related to idle power, please let me know.

Anyway, if you are going to attend the LPC and have anything PM-related to
talk about or discuss, please let me know. 

Best regards,
Rafael

^ permalink raw reply

* Re: Random freezing failure with NFS and automount
From: Rafael J. Wysocki @ 2011-07-03  7:07 UTC (permalink / raw)
  To: svaidy; +Cc: linux-pm
In-Reply-To: <20110628155054.GA23242@dirshya.in.ibm.com>

Hi,

On Tuesday, June 28, 2011, Vaidyanathan Srinivasan wrote:
> Hi,
> 
> I have random freezing failures on my laptop running 2.6.39 kernel.
> The laptop has NFS client and automount.  Network could have been
> disconnected by the time suspend is attempted, hence nfs client should
> fail all operations, just freeze and allow laptop to suspend.
> 
> I need some help to drill deeper at this log and also suggestions on
> config options to try and get more information to help me root cause
> this issue.
> 
> This happens once in 4-5 suspend/resume cycles, does not succeed on
> retry, eventually I have to reboot.

This is a tasks freezer failure, ie. the freezing of tasks fails, because
one of them refuses to handle signals for 20 s.  This is probably related
to waiting on a VFS mutex in the TASK_UNINTERRUPTIBLE state.

We don't handle those cases nicely right now, sorry about that.

Thanks,
Rafael


> Linux kernel version 2.6.39-2.slh.1-aptosid-amd64 (debian)
> 
> [15203.060847] PM: Syncing filesystems ... done.
> [15203.224792] Freezing user space processes ... 
> [15223.230516] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
> [15223.230551] man             T 0000000000000002     0  6788   6765 0x00800004
> [15223.230557]  ffff880037192760 0000000000000086 ffff88011823cc30 ffff88013328bb10
> [15223.230562]  ffff88010bd83fd8 ffff88010bd83fd8 ffff88010bd83fd8 ffff880037192760
> [15223.230567]  ffff8801332d5ac0 ffffffff8103cd7c ffff880037192760 ffff88012f8d5a88
> [15223.230571] Call Trace:
> [15223.230581]  [<ffffffff8103cd7c>] ? __wake_up_sync_key+0x4c/0x90
> [15223.230588]  [<ffffffff8105b4f9>] ? do_notify_parent_cldstop+0x149/0x1b0
> [15223.230596]  [<ffffffff810fb5c9>] ? kmem_cache_free+0x99/0xb0
> [15223.230600]  [<ffffffff8105b767>] ? do_signal_stop+0xa7/0x1e0
> [15223.230604]  [<ffffffff8105c74d>] ? get_signal_to_deliver+0xfd/0x3f0
> [15223.230609]  [<ffffffff8100a114>] ? do_signal+0x84/0x7e0
> [15223.230614]  [<ffffffff81033c28>] ? do_page_fault+0x198/0x440
> [15223.230619]  [<ffffffff8104fb38>] ? do_wait+0x1d8/0x210
> [15223.230623]  [<ffffffff81050dc5>] ? sys_wait4+0xa5/0x100
> [15223.230626]  [<ffffffff8100a8f5>] ? do_notify_resume+0x65/0x90
> [15223.230630]  [<ffffffff8104eab0>] ? delayed_put_task_struct+0x60/0x60
> [15223.230637]  [<ffffffff813b4c60>] ? int_signal+0x12/0x17
> [15223.230640] pager           T 0000000000000000     0  6799   6788 0x00800004
> [15223.230647]  ffff880037196900 0000000000000086 ffff880037025200 ffff880133326f90
> [15223.230649]  ffff88010487bfd8 ffff88010487bfd8 ffff88010487bfd8 ffff880037196900
> [15223.230651]  ffff8801303318c0 ffffffff8103cd7c ffff880037196900 ffff8801332d62c8
> [15223.230653] Call Trace:
> [15223.230655]  [<ffffffff8103cd7c>] ? __wake_up_sync_key+0x4c/0x90
> [15223.230657]  [<ffffffff8105b4f9>] ? do_notify_parent_cldstop+0x149/0x1b0
> [15223.230659]  [<ffffffff810400d8>] ? check_preempt_wakeup+0x118/0x160
> [15223.230661]  [<ffffffff810fb544>] ? kmem_cache_free+0x14/0xb0
> [15223.230663]  [<ffffffff8105b767>] ? do_signal_stop+0xa7/0x1e0
> [15223.230665]  [<ffffffff8105c74d>] ? get_signal_to_deliver+0xfd/0x3f0
> [15223.230667]  [<ffffffff8100a114>] ? do_signal+0x84/0x7e0
> [15223.230669]  [<ffffffff8105d312>] ? sys_kill+0x122/0x1d0
> [15223.230670]  [<ffffffff8100a8f5>] ? do_notify_resume+0x65/0x90
> [15223.230672]  [<ffffffff813b4c60>] ? int_signal+0x12/0x17
> [15223.230685] automount       D 0000000000000003     0 15394   2438 0x00800004
> [15223.230688]  ffff8800a0a83480 0000000000000082 0000000000000000 ffff8801183dc1a0
> [15223.230690]  ffff88009c4e3fd8 ffff88009c4e3fd8 ffff88009c4e3fd8 ffff8800a0a83480
> [15223.230692]  ffff88009c4e3ce0 ffff88010bf95001 ffff88009c4e3c5e ffffffff81113547
> [15223.230694] Call Trace:
> [15223.230697]  [<ffffffff81113547>] ? __follow_mount_rcu.isra.21+0x37/0xe0
> [15223.230703]  [<ffffffff812b3b89>] ? kernel_sendmsg+0x39/0x50
> [15223.230722]  [<ffffffffa059992a>] ? xs_send_kvec+0x8a/0x90 [sunrpc]
> [15223.230727]  [<ffffffffa059c3f0>] ? rpc_queue_empty+0x40/0x40 [sunrpc]
> [15223.230732]  [<ffffffffa059c40f>] ? rpc_wait_bit_killable+0x1f/0x40 [sunrpc]
> [15223.230734]  [<ffffffff813b253f>] ? __wait_on_bit+0x4f/0x80
> [15223.230738]  [<ffffffffa059c3f0>] ? rpc_queue_empty+0x40/0x40 [sunrpc]
> [15223.230740]  [<ffffffff813b25ec>] ? out_of_line_wait_on_bit+0x7c/0xa0
> [15223.230744]  [<ffffffff81068610>] ? autoremove_wake_function+0x30/0x30
> [15223.230748]  [<ffffffffa059d0d4>] ? __rpc_execute+0xe4/0x2f0 [sunrpc]
> [15223.230750]  [<ffffffff810682d8>] ? wake_up_bit+0x18/0x40
> [15223.230754]  [<ffffffffa0596299>] ? rpc_run_task+0x69/0x90 [sunrpc]
> [15223.230757]  [<ffffffffa05963cf>] ? rpc_call_sync+0x3f/0x70 [sunrpc]
> [15223.230765]  [<ffffffffa05f089c>] ? nfs3_rpc_wrapper.constprop.15+0x3c/0x60 [nfs]
> [15223.230770]  [<ffffffffa05f1c13>] ? nfs3_proc_getattr+0x43/0x90 [nfs]
> [15223.230775]  [<ffffffffa05e0d94>] ? __nfs_revalidate_inode+0x94/0x200 [nfs]
> [15223.230780]  [<ffffffffa05e1107>] ? nfs_getattr+0x57/0x110 [nfs]
> [15223.230782]  [<ffffffff8110c432>] ? vfs_fstatat+0x52/0x70
> [15223.230784]  [<ffffffff8110c5c2>] ? sys_newlstat+0x12/0x30
> [15223.230787]  [<ffffffff81125026>] ? mntput_no_expire+0x16/0xf0
> [15223.230790]  [<ffffffff8110625f>] ? filp_close+0x5f/0x90
> [15223.230791]  [<ffffffff8110633d>] ? sys_close+0xad/0x120
> [15223.230793]  [<ffffffff813b4992>] ? system_call_fastpath+0x16/0x1b
> [15223.230796] 
> [15223.230797] Restarting tasks ... done.
> 
> Thanks,
> Vaidy
> 
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/linux-pm
> 
> 

^ permalink raw reply

* Re: Suspend to RAm query
From: Rafael J. Wysocki @ 2011-07-03  7:04 UTC (permalink / raw)
  To: deepak.sikri79; +Cc: linux-pm, linux-kernel, deepaksi
In-Reply-To: <4E09DE03.3090100@st.com>

Hi,

On Tuesday, June 28, 2011, deepaksi wrote:
> Hi All,
> 
> I have a very basic query w.r.t the place to handle the standby/suspend
> process for the devices in the system.
> 
> Should it be done at the driver level or in the machine/platform
> specific code ?
> 
> There are instances where I need some specific functionality in the
> standby mode, or some extra functionality
> in the suspend mode at the device level. Is that the reason for
> differentiating between the suspend and standby at
> the platform code level, because in platform code we get different
> states signifying a suspend or a standby ?

Yes, it is.

The platform suspend callbacks in strcut platform_suspend_ops (as defined
in include/linux/suspend.h) take a suspend_state_t argument indicating
what state is being requested.  You can get the information from there,
if I understood your problem description correctly.

Thanks,
Rafael

^ permalink raw reply

* Re: [PATCH v3 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
From: Rafael J. Wysocki @ 2011-07-02 22:13 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
	linux-pm, Thomas Gleixner
In-Reply-To: <1306471995-4048-3-git-send-email-myungjoo.ham@samsung.com>

On Friday, May 27, 2011, MyungJoo Ham wrote:
> 1. System-wide sysfs interface
> - tickle_all	R: number of tickle_all execution
> 		W: tickle all devfreq devices
> - min_interval	R: devfreq monitoring base interval in ms
> - monitoring	R: shows whether devfreq monitoring is active or
>   not.
> 
> 2. Device specific sysfs interface
> - tickle	R: number of tickle execution for the device
> 		W: tickle the device
> 
> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> 
> --
> Changed from v2
> - add ABI entries for devfreq sysfs interface
> ---
>  Documentation/ABI/testing/sysfs-devices-devfreq |   21 ++++
>  Documentation/ABI/testing/sysfs-power           |   43 ++++++++
>  drivers/base/power/devfreq.c                    |  133 ++++++++++++++++++++++-
>  include/linux/devfreq.h                         |    3 +
>  4 files changed, 199 insertions(+), 1 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-devices-devfreq
> 
> diff --git a/Documentation/ABI/testing/sysfs-devices-devfreq b/Documentation/ABI/testing/sysfs-devices-devfreq
> new file mode 100644
> index 0000000..7f35a64
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-devices-devfreq
> @@ -0,0 +1,21 @@
> +What:		/sys/devices/.../devfreq/
> +Date:		May 2011
> +Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
> +Description:
> +		The /sys/device/.../devfreq directory will contain files
> +		that provide interfaces to DEVFREQ for a specific device.
> +
> +What:		/sys/devices/.../devfreq/tickle
> +Date:		May 2011
> +Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
> +Description:
> +		The /sys/devices/.../devfreq/tickle file allows user space
> +		to force the corresponding device to operate at its maximum
> +		operable frequency instaneously and temporarily. After a
> +		designated duration has passed, the operating frequency returns
> +		to normal. When a user reads the tickle entry, it returns
> +		the number of tickle executions for the device. When a user
> +		writes to the tickle entry with the tickle duration in ms,
> +		the effect of device tickling is held for the designated
> +		duration. Note that the duration is rounded-up by
> +		the value DEVFREQ_INTERVAL defined in devfreq.c
> diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
> index b464d12..4d8434b 100644
> --- a/Documentation/ABI/testing/sysfs-power
> +++ b/Documentation/ABI/testing/sysfs-power
> @@ -172,3 +172,46 @@ Description:
>  
>  		Reading from this file will display the current value, which is
>  		set to 1 MB by default.
> +
> +What:		/sys/power/devfreq/
> +Date:		May 2011
> +Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
> +Description:
> +		The /sys/power/devfreq directory will contain files that will
> +		provide a unified interface to the DEVFREQ, a generic DVFS
> +		(dynamic voltage and frequency scaling) framework.
> +
> +What:		/sys/power/devfreq/tickle_all
> +Date:		May 2011
> +Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
> +Description:
> +		The /sys/power/devfreq/tickle_all file allows user space to
> +		force every device with DEVFREQ to operate at the maximum
> +		frequency of the device instaneously and temporarily. After
> +		a designated delay has passed, the operating frequency returns
> +		to normal. If a user reads the tickle_all entry, it returns
> +		the number of tickle_all executions. When writing to the
> +		tickle_all entry, the user should supply with the duration of
> +		tickle in ms (the "designated delay" mentioned before). Then,
> +		the effect of tickle_all will hold for the denoted duration.
> +		Note that the duration is rounded by the monitoring period
> +		defined by DEVFREQ_INTERVAL in /drivers/base/power/devfreq.c.
> +
> +What:		/sys/power/devfreq/min_interval
> +Date:		May 2011
> +Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
> +Description:
> +		The /sys/power/devfreq/min_interval file shows the monitoring
> +		period defined by DEVFREQ_INTERVAL in
> +		/drivers/base/power/devfreq.c. The duration of device tickling
> +		is rounded-up by DEVFREQ_INTERVAL.
> +
> +What:		/sys/power/devfreq/monitoring
> +Date:		May 2011
> +Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
> +Description:
> +		The /sys/power/devfreq/monitoring file shows whether DEVFREQ
> +		is periodically monitoring. Periodic monitoring is activated
> +		if there is a device that wants periodic monitoring for DVFS or
> +		there is a device that is tickled (and the tickling duration is
> +		not yet expired).
> diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
> index 7648a94..709c138 100644
> --- a/drivers/base/power/devfreq.c
> +++ b/drivers/base/power/devfreq.c
> @@ -40,6 +40,9 @@ static LIST_HEAD(devfreq_list);
>  /* Exclusive access to devfreq_list and its elements */
>  static DEFINE_MUTEX(devfreq_list_lock);
>  
> +static struct kobject *devfreq_kobj;
> +static struct attribute_group dev_attr_group;
> +
>  /**
>   * find_device_devfreq() - find devfreq struct using device pointer
>   * @dev:	device pointer used to lookup device DEVFREQ.
> @@ -237,6 +240,8 @@ int devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile,
>  		queue_delayed_work(devfreq_wq, &devfreq_work,
>  				   msecs_to_jiffies(DEVFREQ_INTERVAL));
>  	}
> +
> +	sysfs_update_group(&dev->kobj, &dev_attr_group);

This appears to modify the attributes of the device, but anything like it is
not mentioned in the documentation.  What's up?

>  out:
>  	mutex_unlock(&devfreq_list_lock);
>  
> @@ -263,6 +268,8 @@ int devfreq_remove_device(struct device *dev)
>  		return -EINVAL;
>  	}
>  
> +	sysfs_remove_group(&dev->kobj, &dev_attr_group);
> +
>  	list_del(&devfreq->node);
>  
>  	kfree(devfreq);
> @@ -344,7 +351,7 @@ static int _devfreq_tickle_device(struct devfreq *df, unsigned long delay)
>  	if (devfreq_wq && !polling) {
>  		polling = true;
>  		queue_delayed_work(devfreq_wq, &devfreq_work,
> -				msecs_to_jiffies(DEVFREQ_INTERVAL));
> +				   msecs_to_jiffies(DEVFREQ_INTERVAL));

This change doesn't seem to belong to this patch.

>  	}
>  
>  	return err;
> @@ -382,6 +389,116 @@ int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
>  	return err;
>  }
>  
> +static int num_tickle_all;
> +static ssize_t tickle_all(struct device *dev, struct device_attribute *attr,
> +			  const char *buf, size_t count)
> +{
> +	int duration = 0;
> +	struct devfreq *tmp;
> +	unsigned long delay;
> +
> +	sscanf(buf, "%d", &duration);
> +	if (duration < DEVFREQ_INTERVAL)
> +		duration = DEVFREQ_INTERVAL;
> +
> +	if (unlikely(IS_ERR_OR_NULL(dev))) {
> +		pr_err("%s: Invalid parameters\n", __func__);

Please say "null device" instead of in addition to the above.

> +		return -EINVAL;
> +	}
> +
> +	delay = DIV_ROUND_UP(duration, DEVFREQ_INTERVAL);
> +
> +	mutex_lock(&devfreq_list_lock);
> +	list_for_each_entry(tmp, &devfreq_list, node) {
> +		_devfreq_tickle_device(tmp, delay);
> +	}
> +	mutex_unlock(&devfreq_list_lock);
> +
> +	num_tickle_all++;
> +	return count;
> +}
> +
> +static ssize_t show_num_tickle_all(struct device *dev,
> +				   struct device_attribute *attr, char *buf)
> +{
> +	return sprintf(buf, "%d\n", num_tickle_all);
> +}
> +
> +static ssize_t show_min_interval(struct device *dev,
> +				 struct device_attribute *attr, char *buf)
> +{
> +	return sprintf(buf, "%d\n", DEVFREQ_INTERVAL);
> +}
> +
> +static ssize_t show_monitoring(struct device *dev,
> +			       struct device_attribute *attr, char *buf)
> +{
> +	return sprintf(buf, "%d\n", monitoring ? 1 : 0);
> +}
> +
> +static DEVICE_ATTR(tickle_all, 0644, show_num_tickle_all, tickle_all);
> +static DEVICE_ATTR(min_interval, 0444, show_min_interval, NULL);
> +static DEVICE_ATTR(monitoring, 0444, show_monitoring, NULL);
> +static struct attribute *devfreq_entries[] = {
> +	&dev_attr_tickle_all.attr,
> +	&dev_attr_min_interval.attr,
> +	&dev_attr_monitoring.attr,
> +	NULL,
> +};
> +static struct attribute_group devfreq_attr_group = {
> +	.name	= NULL,
> +	.attrs	= devfreq_entries,
> +};
> +
> +static ssize_t tickle(struct device *dev, struct device_attribute *attr,
> +		      const char *buf, size_t count)
> +{
> +	int duration;
> +	struct devfreq *df;
> +	unsigned long delay;
> +
> +	sscanf(buf, "%d", &duration);
> +	if (duration < DEVFREQ_INTERVAL)
> +		duration = DEVFREQ_INTERVAL;
> +
> +	if (unlikely(IS_ERR_OR_NULL(dev))) {
> +		pr_err("%s: Invalid parameters\n", __func__);

Like above.

> +		return -EINVAL;
> +	}
> +
> +	delay = DIV_ROUND_UP(duration, DEVFREQ_INTERVAL);
> +
> +	mutex_lock(&devfreq_list_lock);
> +	df = find_device_devfreq(dev);
> +	_devfreq_tickle_device(df, delay);
> +	mutex_unlock(&devfreq_list_lock);
> +
> +	return count;
> +}
> +
> +static ssize_t show_num_tickle(struct device *dev,
> +			       struct device_attribute *attr, char *buf)
> +{
> +	struct devfreq *df;
> +
> +	df = find_device_devfreq(dev);
> +
> +	if (!IS_ERR(df))
> +		return sprintf(buf, "%d\n", df->num_tickle);
> +
> +	return PTR_ERR(df);
> +}
> +
> +static DEVICE_ATTR(tickle, 0644, show_num_tickle, tickle);
> +static struct attribute *dev_entries[] = {
> +	&dev_attr_tickle.attr,
> +	NULL,
> +};
> +static struct attribute_group dev_attr_group = {
> +	.name	= "devfreq",
> +	.attrs	= dev_entries,
> +};
> +
>  static int __init devfreq_init(void)
>  {
>  	mutex_lock(&devfreq_list_lock);
> @@ -389,6 +506,20 @@ static int __init devfreq_init(void)
>  	polling = false;
>  	devfreq_wq = create_freezable_workqueue("devfreq_wq");
>  	INIT_DELAYED_WORK_DEFERRABLE(&devfreq_work, devfreq_monitor);
> +
> +#ifdef CONFIG_PM
> +	/* Create sysfs */
> +	devfreq_kobj = kobject_create_and_add("devfreq", power_kobj);

Hmm, so power_kobj is global?  It shouldn't be.

Generally, whatever adds attributes to /sys/power should be located in
kernel/power/ .

> +	if (!devfreq_kobj) {
> +		pr_err("Unable to create DEVFREQ kobject.\n");
> +		goto out;
> +	}
> +	if (sysfs_create_group(devfreq_kobj, &devfreq_attr_group)) {
> +		pr_err("Unable to create DEVFREQ sysfs entries.\n");
> +		goto out;
> +	}
> +#endif
> +out:
>  	mutex_unlock(&devfreq_list_lock);
>  
>  	devfreq_monitor(&devfreq_work.work);
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> index 1ec9a40..69334e7 100644
> --- a/include/linux/devfreq.h
> +++ b/include/linux/devfreq.h
> @@ -59,6 +59,7 @@ struct devfreq_governor {
>   *		at each executino of devfreq_monitor, tickle is decremented.
>   *		User may tickle a device-devfreq in order to set maximum
>   *		frequency instaneously with some guaranteed duration.
> + * @num_tickle	number of tickle calls.
>   *
>   * This structure stores the DEVFREQ information for a give device.
>   */
> @@ -72,6 +73,8 @@ struct devfreq {
>  	unsigned long previous_freq;
>  	unsigned int next_polling;
>  	unsigned int tickle;
> +
> +	unsigned int num_tickle;
>  };
>  
>  #if defined(CONFIG_PM_DEVFREQ)

It looks like the above two changes should be moved to a separate patch.

Thanks,
Rafael

^ permalink raw reply

* Re: [PATCH v3 2/3] PM / DEVFREQ: add example governors
From: Rafael J. Wysocki @ 2011-07-02 21:58 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
	linux-pm, Thomas Gleixner
In-Reply-To: <1306471995-4048-2-git-send-email-myungjoo.ham@samsung.com>

Hi,

On Friday, May 27, 2011, MyungJoo Ham wrote:
> Three CPUFREQ-like governors are provided as examples.
> 
> powersave: use the lowest frequency possible. The user (device) should
> set the polling_ms as 0 because polling is useless for this governor.
> 
> performance: use the highest freqeuncy possible. The user (device)
> should set the polling_ms as 0 because polling is useless for this
> governor.
> 
> simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.
> 
> When a user updates OPP entries (enable/disable/add), OPP framework
> automatically notifies DEVFREQ to update operating frequency
> accordingly. Thus, DEVFREQ users (device drivers) do not need to update
> DEVFREQ manually with OPP entry updates or set polling_ms for powersave
> , performance, or any other "static" governors.
> 
> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> ---
>  drivers/base/power/devfreq.c |   69 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/devfreq.h      |    5 +++
>  2 files changed, 74 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
> index 883d953..7648a94 100644
> --- a/drivers/base/power/devfreq.c
> +++ b/drivers/base/power/devfreq.c
> @@ -395,3 +395,72 @@ static int __init devfreq_init(void)
>  	return 0;
>  }
>  late_initcall(devfreq_init);
> +
> +static int devfreq_powersave_func(struct devfreq *df,
> +				  unsigned long *freq)
> +{
> +	*freq = 0; /* devfreq_do will run "ceiling" to 0 */
> +	return 0;
> +}
> +
> +struct devfreq_governor devfreq_powersave = {
> +	.get_target_freq = devfreq_powersave_func,
> +};
> +
> +static int devfreq_performance_func(struct devfreq *df,
> +				    unsigned long *freq)
> +{
> +	*freq = UINT_MAX; /* devfreq_do will run "floor" */
> +	return 0;
> +}
> +
> +struct devfreq_governor devfreq_performance = {
> +	.get_target_freq = devfreq_performance_func,
> +};
> +
> +/* Constants for DevFreq-Simple-Ondemand (DFSO) */
> +#define DFSO_UPTHRESHOLD	(90)
> +#define DFSO_DOWNDIFFERENCTIAL	(5)
> +static int devfreq_simple_ondemand_func(struct devfreq *df,
> +					unsigned long *freq)
> +{
> +	struct devfreq_dev_status stat;
> +	int err = df->profile->get_dev_status(df->dev, &stat);
> +	unsigned long long a, b;
> +
> +	if (err)
> +		return err;
> +
> +	/* Set MAX if it's busy enough */
> +	if (stat.busy_time * 100 >
> +	    stat.total_time * DFSO_UPTHRESHOLD) {
> +		*freq = UINT_MAX;
> +		return 0;
> +	}
> +
> +	/* Set MAX if we do not know the initial frequency */
> +	if (stat.current_frequency == 0) {
> +		*freq = UINT_MAX;
> +		return 0;
> +	}
> +
> +	/* Keep the current frequency */
> +	if (stat.busy_time * 100 >
> +	    stat.total_time * (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL)) {
> +		*freq = stat.current_frequency;
> +		return 0;
> +	}
> +
> +	/* Set the desired frequency based on the load */
> +	a = (unsigned long long) stat.busy_time * stat.current_frequency;

What's the purpose of the conversion?

> +	b = div_u64(a, stat.total_time);
> +	b *= 100;
> +	b = div_u64(b, (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL / 2));
> +	*freq = (unsigned long) b;
> +
> +	return 0;
> +}

Rafael

^ permalink raw reply

* Re: [PATCH v3 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
From: Rafael J. Wysocki @ 2011-07-02 21:56 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
	linux-pm, Thomas Gleixner
In-Reply-To: <1306471995-4048-1-git-send-email-myungjoo.ham@samsung.com>

Hi,

I'm not sure if spelling DEVFREQ in the kerneldoc comments with capitals is
a good idea, it looks kind of odd.  I'm not sure what would be look better,
though.

On Friday, May 27, 2011, MyungJoo Ham wrote:

...
> +/**
> + * devfreq_monitor() - Regularly run devfreq_do() and support device DEVFREQ tickle.

I'd say "periodically" istead of "regularly".

> + * @work: the work struct used to run devfreq_monitor periodically.

Also please say something more about the "tickle" thinkg here.

> + */
> +static void devfreq_monitor(struct work_struct *work)
> +{
> +	struct devfreq *devfreq;
> +	int error;
> +	bool continue_polling = false;
> +	struct devfreq *to_remove = NULL;
> +
> +	mutex_lock(&devfreq_list_lock);
> +
> +	list_for_each_entry(devfreq, &devfreq_list, node) {
> +		/* Remove the devfreq entry that failed */
> +		if (to_remove) {
> +			list_del(&to_remove->node);
> +			kfree(to_remove);
> +			to_remove = NULL;
> +		}
> +
> +		/*
> +		 * If the device is tickled and the tickle duration is left,
> +		 * do not change the frequency for a while
> +		 */
> +		if (devfreq->tickle) {
> +			continue_polling = true;
> +			devfreq->tickle--;
> +
> +			/*
> +			 * If the tickle is ending and the device is not going
> +			 * to poll, force the device to poll next time so that
> +			 * it can return to the original frequency afterwards.
> +			 * However, non-polling device will have 0 polling_ms,
> +			 * it will not poll again later.
> +			 */
> +			if (devfreq->tickle == 0 && devfreq->next_polling == 0)
> +				devfreq->next_polling = 1;
> +
> +			continue;
> +		}
> +
> +		/* This device does not require polling */
> +		if (devfreq->next_polling == 0)
> +			continue;
> +
> +		continue_polling = true;
> +
> +		if (devfreq->next_polling == 1) {
> +			/* This device is polling this time */

I'd remove this comment, it's confusing IMO.  Besides, it may be better
to structure the code like this:

if (devfreq->next_polling-- == 1) {
}

and then you wouldn't need the "else" at all.

> +			error = devfreq_do(devfreq);
> +			if (error && error != -EAGAIN) {
> +				/*
> +				 * Remove a devfreq with error. However,
> +				 * We cannot remove it right here because the

Comma after "here", please.

> +				 * devfreq pointer is going to be used by
> +				 * list_for_each_entry above. Thus, it is
> +				 * removed afterwards.

Why don't you use list_for_each_entry_safe(), then?

> +				 */
> +				to_remove = devfreq->dev;
> +				dev_err(devfreq->dev, "devfreq_do error(%d). "
> +					"DEVFREQ is removed from the device\n",
> +					error);
> +				continue;
> +			}
> +			devfreq->next_polling = DIV_ROUND_UP(
> +						devfreq->profile->polling_ms,
> +						DEVFREQ_INTERVAL);
> +		} else {
> +			/* The device will poll later when next_polling = 1 */
> +			devfreq->next_polling--;
> +		}
> +	}
> +
> +	if (to_remove) {
> +		list_del(&to_remove->node);
> +		kfree(to_remove);
> +		to_remove = NULL;
> +	}
> +
> +	if (continue_polling) {
> +		polling = true;
> +		queue_delayed_work(devfreq_wq, &devfreq_work,
> +				   msecs_to_jiffies(DEVFREQ_INTERVAL));
> +	} else {
> +		polling = false;
> +	}

OK, so why exactly continue_polling is needed?  It seems you might siply use
"polling" instead of it above.

> +
> +	mutex_unlock(&devfreq_list_lock);
> +}
...
> +/**
> + * devfreq_update() - Notify that the device OPP has been changed.
> + * @dev:	the device whose OPP has been changed.
> + * @may_not_exist:	do not print error message even if the device
> + *			does not have devfreq entry.

This argument isn't used any more.

> + */
> +int devfreq_update(struct device *dev)
> +{
> +	struct devfreq *devfreq;
> +	int err = 0;
> +
> +	mutex_lock(&devfreq_list_lock);
> +
> +	devfreq = find_device_devfreq(dev);
> +	if (IS_ERR(devfreq)) {
> +		err = PTR_ERR(devfreq);
> +		goto out;
> +	}
> +
> +	if (devfreq->tickle) {
> +		/* If the max freq available is changed, re-tickle */

It would be good to explain what "re-tickle" means.

> +		unsigned long freq = devfreq->profile->max_freq;
> +		struct opp *opp = opp_find_freq_floor(devfreq->dev, &freq);
> +
> +		if (IS_ERR(opp)) {
> +			err = PTR_ERR(opp);
> +			goto out;
> +		}
> +
> +		/* Max freq available is not changed */
> +		if (devfreq->previous_freq == freq)
> +			goto out;
> +
> +		err = devfreq->profile->target(devfreq->dev, opp);
> +		if (!err)
> +			devfreq->previous_freq = freq;

Why don't we run devfreq_do() in this case?

> +	} else {
> +		/* Reevaluate the proper frequency */
> +		err = devfreq_do(devfreq);
> +	}
> +
> +out:
> +	mutex_unlock(&devfreq_list_lock);
> +	return err;
> +}
> +

Add a kerneldoc comment here, please.

> +static int _devfreq_tickle_device(struct devfreq *df, unsigned long delay)
> +{
> +	int err = 0;
> +	unsigned long freq;
> +	struct opp *opp;
> +
> +	freq = df->profile->max_freq;
> +	opp = opp_find_freq_floor(df->dev, &freq);
> +	if (IS_ERR(opp))
> +		return PTR_ERR(opp);
> +
> +	if (df->previous_freq != freq) {
> +		err = df->profile->target(df->dev, opp);
> +		if (!err)
> +			df->previous_freq = freq;
> +	}
> +	if (err) {
> +		dev_err(df->dev, "%s: Cannot set frequency.\n", __func__);
> +	} else {
> +		df->tickle = delay;
> +		df->num_tickle++;
> +	}
> +
> +	if (devfreq_wq && !polling) {
> +		polling = true;
> +		queue_delayed_work(devfreq_wq, &devfreq_work,
> +				msecs_to_jiffies(DEVFREQ_INTERVAL));
> +	}
> +
> +	return err;
> +}

Thanks,
Rafael

^ permalink raw reply

* Re: [PATCH v3 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
From: Rafael J. Wysocki @ 2011-07-02 21:30 UTC (permalink / raw)
  To: myungjoo.ham
  Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
	linux-pm, Thomas Gleixner
In-Reply-To: <201106162311.00593.rjw@sisk.pl>

On Thursday, June 16, 2011, Rafael J. Wysocki wrote:
> Nishanth,
> 
> Have your comments been addressed by the message below?

>From the silence I gather they have.

Thanks,
Rafael


 
> On Monday, May 30, 2011, MyungJoo Ham wrote:
> > On Sat, May 28, 2011 at 5:57 PM, Nishanth Menon <nm@ti.com> wrote:
> > > On 13:53-20110527, MyungJoo Ham wrote:
> > > [..]
> > >> diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
> > >> index 56a6899..819c1b3 100644
> > >> --- a/drivers/base/power/opp.c
> > >> +++ b/drivers/base/power/opp.c
> > >> @@ -21,6 +21,7 @@
> > >>  #include <linux/rculist.h>
> > >>  #include <linux/rcupdate.h>
> > >>  #include <linux/opp.h>
> > >> +#include <linux/devfreq.h>
> > >>
> > >>  /*
> > >>   * Internal data structure organization with the OPP layer library is as
> > >> @@ -428,6 +429,11 @@ int opp_add(struct device *dev, unsigned long freq, unsigned long u_volt)
> > >>       list_add_rcu(&new_opp->node, head);
> > >>       mutex_unlock(&dev_opp_list_lock);
> > >>
> > >> +     /*
> > >> +      * Notify generic dvfs for the change and ignore error
> > >> +      * because the device may not have a devfreq entry
> > >> +      */
> > >> +     devfreq_update(dev);
> > > I think I understand your thought about notifiers here - we had some
> > > initial thought that we may need notifiers, for e.g. clk change
> > > notifiers which could implement enable/disable on a dependent device,
> > > thermal management etc.. so far, we have'nt had a need to modify the
> > > lowest data store layer to handle this. In this case, I think you'd
> > > like a notifier mechanism in general for opp_add,enable or disable in
> > > the opp library.
> > 
> > Yes, I wanted a notifier machnaism called by OPP for any potential
> > changes in available frequencies. For that purpose, I've added
> > devfreq_update() because, for now, DEVFREQ is the only one that needs
> > such a notifier and a notifier has some overhead. If there are
> > multiple entities that require notifications from OPP for such
> > changes, we'd need notifiers maintained by OPP.
> > 
> > > However, with this change, an opp_add now means:
> > > a) opp_add now depends on devfreq_update() to be part of it's integral
> > > operation - I think the devfreq should maintain it's own notifier
> > > mechanisms and not make OPP library do the notification trigger job of
> > > devfreq.
> > 
> > To let the entities depending on the list of available frequencies be
> > notified, I think it would be appropriate for OPP to have a notifier
> > chain and call the notifier chain whenever there is a change in the
> > list. In that way, we can always add and remove the depending entities
> > (including DEVFREQ) to and from OPP's. Otherwise, we'd need OPP users
> > that call OPP functions to call the notifier chain when it appears
> > that the call will change the list.
> > 
> > As long as the updates in available frequencies affect other entities,
> > OPP library should be able to notify the others whenever such changes
> > occur. In this patch, I've used devfreq_update() directly because we
> > do not have other entities, yet. However, as you've  mentioned, there
> > will be others, so I'm willing to implement notifier in OPP if other
> > people also agree.
> > 
> > > b) By ignoring the error, how will the caller of opp_add now know if the
> > > failures were real - e.g. some device failed to act on the notification
> > > and the overall opp_add operation failed? And why should OPP library do
> > > recovery for on behalf of devfreq?
> > 
> > Um... OPP does not need to do any recovery for errors from DEVFREQ.
> > Other than errors from find_device_devfreq() in devfreq_update(), the
> > errors seem to be better returned somehow to the caller of opp_add().
> > However, would it be fine to let opp_add (and other
> > frequency-availability changing functions) return errors returned from
> > a notifier? (either devfreq_update() or srcu_notifier_call_chain())
> > 
> > > c) How will you ensure the lock accuracy - given that you need to keep
> > > to the constraints of opp_add/opp_set_availability here? For example:
> > > devfreq_do is called from devfreq_monitor(workqueue) and devfreq_update
> > > in addition to having to call a bunch of function pointers whose
> > > implementation you do not know, having a function that ensures security
> > > of it's data, handles all lock conditions that is imposed differently by
> > > add,enable and disable is not visible in this implementation.
> > 
> > devfreq_update() waits for devfreq_monitor() with devfreq_list_lock.
> > Both has devfreq_list_lock locked during their operations.
> > 
> > > c) How about considering the usage of OPP library with an SoC cpufreq
> > > implementation say for MPU AND using the same SoC using devfreq for
> > > controlling some devices like MALI in your example simultaneously? Lets
> > > say the thermal policy manager implementation for some reason did an
> > > opp_disable of a higher OPP of MPU - devfreq_update gets called with the
> > > device(mpu_dev).. devfreq_update needs to now invoke it's validation
> > > path and fails correctly - Vs devfreq_update being called by some module
> > > which actually uses devfreq and the update failed - How does one
> > > differentiate between the two?
> > 
> > When OPP is being used by CPUFREQ w/ CPU and by MALI, the OPP list
> > should be attached to the two devices independently.
> > In other words, the frequency of CPU should be controlled
> > independently from the frequency of OPP. If both CPU and MALI should
> > use the same frequency, there is no point to implement devfreq for
> > MALI. However, if the condition is not fully-dependent; e.g.,
> > Freq(MALI) <= Freq(CPU), we can implement two instances of OPP for
> > each of CPU and MALI and let the "target" function supplied to CPU to
> > run opp_enable/disable and devfreq_update() to MALI before adjusting
> > the frequency of itself.
> > 
> > When we are implementing another entity that controls OPP/DEVFREQ such
> > as the thermal policy manager, it should tackle both CPU and MALI w/
> > OPPs at the same time if they are fully independent (i.e., the
> > frequency of MALI is not affected by the frequency of CPU). If not,
> > the same things (discussed in the previous paragraph) are applied.
> > 
> > In the two examples you've mentioned, the first one is about the
> > implementation of the current patch (let thermal manager handle OPP
> > and OPP handle devfreq_update) and the another is when DEVFREQ calls
> > are implemented through thermal manager (let thermal manager handle
> > both OPP and devfreq_update), right?
> > 
> > For the two cases, I think that the first approach is more
> > appropriate. We will probably have more than just a thermal manager to
> > handle OPP (and devfreq that depends on OPP). The first approach
> > allows users to declare which frequencies are available and to let the
> > underlying structure do their work without intervention from users
> > (those who decide which frequencies are available). With the second
> > approach, we need to enforce every frequency affecting entity to
> > understand underlying frequency dependencies and the behavior of
> > devfreq, not jst to understand the behavior of the device according to
> > the frequency. DEVFREQ is going to be handled and implemented by the
> > device driver of the device controlled by the instance of DEVFREQ. By
> > using the first approach, we can prevent affecting entities other than
> > the device driver using DEVFREQ from thinking about DEVFREQ of that
> > device.
> > 
> > 
> > >
> > > just my 2 cents, I think these kind of notifiers should probably
> > > stay out of opp library OR notifiers be introduced in a generic and safe
> > > fashion.
> > 
> > So, you want either
> > 
> > to call devfreq_update from where opp_add/enable/disable/... are called.
> > 
> > or
> > 
> > to use notifier in OPP?
> > 
> > 
> > Well, the first one would become the source of messy problems as
> > mentioned earlier. The second one has some overhead, but it's a
> > general approach and does not incur the issues of the first approach.
> > 
> > Therefore, I prefer either to keep the initial approach regarding
> > devfreq_update() or to use a generic notifier at OPP (probably, SRCU
> > notifier chains?).
> > 
> > How about using a SRCU_NOTIFIER at OPP?
> > 
> > The notifier.h says that SRCU has low overhead generally and the most
> > overhead of SRCU comes from "remove srcu" (probably when "device" is
> > being removed), which won't happen a lot with OPP.
> > 
> > >
> > >>       return 0;
> > >>  }
> > >>
> > >> @@ -512,6 +518,9 @@ unlock:
> > >>       mutex_unlock(&dev_opp_list_lock);
> > >>  out:
> > >>       kfree(new_opp);
> > >> +
> > >> +     /* Notify generic dvfs for the change and ignore error */
> > >> +     devfreq_update(dev);
> > > Another place where I am confused - how does devfreq_update know in what
> > > context is it being called is the OPP getting enabled/disabled/added -
> > > devfreq_update has to do some additional work to make it's own decision
> > > - it is almost as if it uses devfreq_update as a trigger mechanism to reevalute
> > > the entire decision tree - without using information of the context
> > > opp_enable/disable/add has to maybe make better call. if this
> > > information is not needed, maybe some other mechanism implemented by
> > > devfreq layer might suffice..
> > 
> > No, DEVFREQ does not require the context (OPP being
> > enabled/disabled/...) with the devfreq_update() calls. It is
> > essentially a "notify devfreq to reevaluate" and DEVFREQ reevaluates
> > based on the available frequencies.
> > 
> > Thus, a plain srcu_notifier_call_chain() should work for
> > DEVFREQ/devfreq_update from OPP.
> > 
> > >
> > >>       return r;
> > >>  }
> > >
> > > [...]
> > > --
> > > Regards,
> > > Nishanth Menon
> > >
> > 
> > 
> > Thank you for your comments, Nishanth.
> > 
> > 
> > Cheers!
> > - MyungJoo
> > 
> > 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 

^ permalink raw reply

* Re: [PATCH 03/11] PM QoS: support the dynamic devices insertion and removal
From: Rafael J. Wysocki @ 2011-07-02 21:14 UTC (permalink / raw)
  To: jean.pihet; +Cc: markgross, Linux PM mailing list, linux-omap, Jean Pihet
In-Reply-To: <1309446685-17502-4-git-send-email-j-pihet@ti.com>

Hi,

On Thursday, June 30, 2011, jean.pihet@newoldbits.com wrote:
> From: Jean Pihet <j-pihet@ti.com>
> 
> The devices wake-up latency constraints class of PM QoS is
> storing the constraints list using the device pm_info struct.
> 
> This patch adds the init and de-init of the per-device constraints
> list in order to support the dynamic insertion and removal
> of the devices in the system.
> 
> Signed-off-by: Jean Pihet <j-pihet@ti.com>

The number of times this patch special cases PM_QOS_DEV_WAKEUP_LATENCY with
respect to the other PM QoS classes indicates a design issue, which kind of
corresponds with my comments on [3/11].

Thanks,
Rafael


> ---
>  drivers/base/power/main.c     |    8 +++--
>  include/linux/pm.h            |    1 +
>  include/linux/pm_qos_params.h |    2 +
>  kernel/pm_qos_params.c        |   70 ++++++++++++++++++++++++++++++++--------
>  4 files changed, 64 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index b1fd96b..51f5526 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -27,6 +27,7 @@
>  #include <linux/sched.h>
>  #include <linux/async.h>
>  #include <linux/suspend.h>
> +#include <linux/pm_qos_params.h>
>  
>  #include "../base.h"
>  #include "power.h"
> @@ -96,8 +97,8 @@ void device_pm_add(struct device *dev)
>  			dev_name(dev->parent));
>  	list_add_tail(&dev->power.entry, &dpm_list);
>  	mutex_unlock(&dpm_list_mtx);
> -	/* ToDo: call PM QoS to init the per-device wakeup latency constraints */
> -	plist_head_init(&dev->power.wakeup_lat_plist_head, &dev->power.lock);
> +	/* Call PM QoS to init the per-device wakeup latency constraints */
> +	pm_qos_dev_wakeup_lat_init(dev);
>  }
>  
>  /**
> @@ -108,7 +109,8 @@ void device_pm_remove(struct device *dev)
>  {
>  	pr_debug("PM: Removing info for %s:%s\n",
>  		 dev->bus ? dev->bus->name : "No Bus", dev_name(dev));
> -	/* ToDo: call PM QoS to de-init the per-device wakeup latency constraints */
> +	/* Call PM QoS to de-init the per-device wakeup latency constraints */
> +	pm_qos_dev_wakeup_lat_deinit(dev);
>  	complete_all(&dev->power.completion);
>  	mutex_lock(&dpm_list_mtx);
>  	list_del_init(&dev->power.entry);
> diff --git a/include/linux/pm.h b/include/linux/pm.h
> index 35fe682..d9b6092 100644
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -464,6 +464,7 @@ struct dev_pm_info {
>  	void			*subsys_data;  /* Owned by the subsystem. */
>  #endif
>  	struct plist_head	wakeup_lat_plist_head;
> +	int			wakeup_lat_init;
>  };
>  
>  extern void update_pm_runtime_accounting(struct device *dev);
> diff --git a/include/linux/pm_qos_params.h b/include/linux/pm_qos_params.h
> index e6e16cb..5b6707a 100644
> --- a/include/linux/pm_qos_params.h
> +++ b/include/linux/pm_qos_params.h
> @@ -45,4 +45,6 @@ int pm_qos_add_notifier(int class, struct notifier_block *notifier);
>  int pm_qos_remove_notifier(int class, struct notifier_block *notifier);
>  int pm_qos_request_active(struct pm_qos_request_list *req);
>  
> +void pm_qos_dev_wakeup_lat_init(struct device *dev);
> +void pm_qos_dev_wakeup_lat_deinit(struct device *dev);
>  #endif
> diff --git a/kernel/pm_qos_params.c b/kernel/pm_qos_params.c
> index d61c8e5..6f25ccb 100644
> --- a/kernel/pm_qos_params.c
> +++ b/kernel/pm_qos_params.c
> @@ -275,11 +275,21 @@ void pm_qos_add_request(struct pm_qos_request_list *pm_qos_req,
>  	struct pm_qos_object *o =  pm_qos_array[pm_qos_params->class];
>  	int new_value;
>  
> -	if ((pm_qos_params->class != PM_QOS_DEV_WAKEUP_LATENCY) &&
> -	    (pm_qos_request_active(pm_qos_req))) {
> -		WARN(1, KERN_ERR "pm_qos_add_request() called for already "
> -		     "added request\n");
> -		return;
> +	if (pm_qos_params->class == PM_QOS_DEV_WAKEUP_LATENCY) {
> +		if (!pm_qos_params->dev) {
> +			WARN(1, KERN_ERR "pm_qos_add_request() called for "
> +				"invalid device\n");
> +			return;
> +		}
> +		/* Silently return if the device is being released */
> +		if (!pm_qos_params->dev->power.wakeup_lat_init)
> +			return;
> +	} else {
> +		if (pm_qos_request_active(pm_qos_req)) {
> +			WARN(1, KERN_ERR "pm_qos_add_request() called for "
> +				"already added request\n");
> +			return;
> +		}
>  	}
>  
>  	if (pm_qos_params->value == PM_QOS_DEFAULT_VALUE)
> @@ -312,11 +322,16 @@ void pm_qos_update_request(struct pm_qos_request_list *pm_qos_req,
>  	if (!pm_qos_req) /*guard against callers passing in null */
>  		return;
>  
> -	if ((pm_qos_req->class != PM_QOS_DEV_WAKEUP_LATENCY) &&
> -	    (!pm_qos_request_active(pm_qos_req))) {
> -		WARN(1, KERN_ERR "pm_qos_update_request() called for unknown "
> -		     "object\n");
> -		return;
> +	if (pm_qos_req->class == PM_QOS_DEV_WAKEUP_LATENCY) {
> +		/* Silently return if the device is being released */
> +		if (!pm_qos_req->dev->power.wakeup_lat_init)
> +			return;
> +	} else {
> +		if (!pm_qos_request_active(pm_qos_req)) {
> +			WARN(1, KERN_ERR "pm_qos_update_request() called "
> +				"for unknown object\n");
> +			return;
> +		}
>  	}
>  
>  	if (new_value == PM_QOS_DEFAULT_VALUE)
> @@ -343,11 +358,16 @@ void pm_qos_remove_request(struct pm_qos_request_list *pm_qos_req)
>  		return;
>  		/* silent return to keep pcm code cleaner */
>  
> -	if ((pm_qos_req->class != PM_QOS_DEV_WAKEUP_LATENCY) &&
> -	    (!pm_qos_request_active(pm_qos_req))) {
> -		WARN(1, KERN_ERR "pm_qos_remove_request() called for unknown "
> -		     "object\n");
> +	if (pm_qos_req->class == PM_QOS_DEV_WAKEUP_LATENCY) {
> +		/* Silently return if the device is being released */
> +		if (!pm_qos_req->dev->power.wakeup_lat_init)
> +			return;
> +	} else {
> +		if (!pm_qos_request_active(pm_qos_req)) {
> +			WARN(1, KERN_ERR "pm_qos_remove_request() called "
> +				"for unknown object\n");
>  		return;
> +		}
>  	}
>  
>  	update_target(pm_qos_req, 1, PM_QOS_DEFAULT_VALUE);
> @@ -393,6 +413,28 @@ int pm_qos_remove_notifier(int class, struct notifier_block *notifier)
>  }
>  EXPORT_SYMBOL_GPL(pm_qos_remove_notifier);
>  
> +/* Called from the device PM subsystem at device init */
> +void pm_qos_dev_wakeup_lat_init(struct device *dev)
> +{
> +	plist_head_init(&dev->power.wakeup_lat_plist_head, &dev->power.lock);
> +	dev->power.wakeup_lat_init = 1;
> +}
> +
> +/* Called from the device PM subsystem at device release */
> +void pm_qos_dev_wakeup_lat_deinit(struct device *dev)
> +{
> +	struct pm_qos_request_list *req, *tmp;
> +
> +	dev->power.wakeup_lat_init = 0;
> +
> +	/* Flush the constraints list for the device */
> +	plist_for_each_entry_safe(req, tmp,
> +				  &dev->power.wakeup_lat_plist_head,
> +				  list)
> +		update_target(req, 1, PM_QOS_DEFAULT_VALUE);
> +	plist_head_init(&dev->power.wakeup_lat_plist_head, &dev->power.lock);
> +}
> +
>  static int pm_qos_power_open(struct inode *inode, struct file *filp)
>  {
>  	struct pm_qos_parameters pm_qos_params;
> 

^ permalink raw reply

* Re: [PATCH 02/11] PM: extend PM QoS with per-device wake-up constraints
From: Rafael J. Wysocki @ 2011-07-02 21:10 UTC (permalink / raw)
  To: jean.pihet; +Cc: markgross, Linux PM mailing list, linux-omap, Jean Pihet
In-Reply-To: <1309446685-17502-3-git-send-email-j-pihet@ti.com>

Hi,

On Thursday, June 30, 2011, jean.pihet@newoldbits.com wrote:
> From: Jean Pihet <j-pihet@ti.com>
> 
> - add a new PM QoS class PM_QOS_DEV_WAKEUP_LATENCY for device wake-up
> constraints. Due to the per-device nature of the new class the constraints
> list is stored inside the device dev_pm_info struct instead of the internal
> per-class constraints lists.

I think PM_QOS_DEV_LATENCY might be a better name.

> The new class is only available from kernel drivers and so is not exported
> to user space.

It should be available to user space, however, because in many cases drivers
simply have no idea what values to use (after all, the use decides if he
wants to trade worse video playback quality for better battery life, for
example).

> The new class is used to put constraints on given devices in the system
> while the existing PM_QOS_CPU_DMA_LATENCY class is used by cpuidle to
> determine the next MPU subsystem state.
> 
> - make the pm_qos_add_request API more generic by using a struct
> pm_qos_parameters parameter
> 
> - the notification mechanism now passes the constraint request struct ptr
> in order for the notifier callback to have access to the full set of
> constraint data, e.g. the struct device ptr
> 
> - update the pm_qos_add_request callers to the generic API
> 
> - minor clean-ups and rename of struct fields
> 
> Signed-off-by: Jean Pihet <j-pihet@ti.com>

Well, I think this patch attempts to do too many things at a time, which
makes it somewhat hard to comprehend at first glance.  I'd stronly suggest
splitting it into a series of patches that first modify the existing API
to make it suitable for the "real" changes and second introduce those
"real" chages in the most straightforward way possible.

> ---
>  arch/arm/plat-omap/i2c.c               |   20 ----
>  drivers/i2c/busses/i2c-omap.c          |   35 ++++---
>  drivers/media/video/via-camera.c       |    5 +-
>  drivers/net/e1000e/netdev.c            |    9 +-
>  drivers/net/wireless/ipw2x00/ipw2100.c |    6 +-
>  include/linux/pm_qos_params.h          |   40 +++++---
>  kernel/pm_qos_params.c                 |  185 +++++++++++++++++++-------------

Hmm.  If you're at it, what about moving pm_qos_params.c to kernel/power
and changing its name to pm_qos.c beforehand?

The header might be called pm_qos.h too, BTW.

>  sound/core/pcm_native.c                |    8 +-
>  8 files changed, 177 insertions(+), 131 deletions(-)
> 
...
> diff --git a/include/linux/pm_qos_params.h b/include/linux/pm_qos_params.h
> index a7d87f9..e6e16cb 100644
> --- a/include/linux/pm_qos_params.h
> +++ b/include/linux/pm_qos_params.h
> @@ -8,31 +8,41 @@
>  #include <linux/notifier.h>
>  #include <linux/miscdevice.h>
>  
> -#define PM_QOS_RESERVED 0
> -#define PM_QOS_CPU_DMA_LATENCY 1
> -#define PM_QOS_NETWORK_LATENCY 2
> -#define PM_QOS_NETWORK_THROUGHPUT 3
> +#define	PM_QOS_RESERVED			0
> +#define	PM_QOS_CPU_DMA_LATENCY		1
> +#define	PM_QOS_DEV_WAKEUP_LATENCY	2
> +#define	PM_QOS_NETWORK_LATENCY		3
> +#define	PM_QOS_NETWORK_THROUGHPUT	4
>  
> -#define PM_QOS_NUM_CLASSES 4
> -#define PM_QOS_DEFAULT_VALUE -1
> +#define PM_QOS_NUM_CLASSES		5
> +#define PM_QOS_DEFAULT_VALUE		-1
>  
> -#define PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE	(2000 * USEC_PER_SEC)
> -#define PM_QOS_NETWORK_LAT_DEFAULT_VALUE	(2000 * USEC_PER_SEC)
> -#define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE	0
> +#define	PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE	(2000 * USEC_PER_SEC)
> +#define	PM_QOS_DEV_WAKEUP_LAT_DEFAULT_VALUE	0
> +#define	PM_QOS_NETWORK_LAT_DEFAULT_VALUE	(2000 * USEC_PER_SEC)
> +#define	PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE	0
>  
>  struct pm_qos_request_list {
>  	struct plist_node list;
> -	int pm_qos_class;
> +	int class;
> +	struct device *dev;
>  };
>  
> -void pm_qos_add_request(struct pm_qos_request_list *l, int pm_qos_class, s32 value);
> +struct pm_qos_parameters {
> +	int class;
> +	struct device *dev;
> +	s32 value;
> +};
> +
> +void pm_qos_add_request(struct pm_qos_request_list *l,
> +			struct pm_qos_parameters *params);
>  void pm_qos_update_request(struct pm_qos_request_list *pm_qos_req,
> -		s32 new_value);
> +			   s32 new_value);
>  void pm_qos_remove_request(struct pm_qos_request_list *pm_qos_req);
>  
> -int pm_qos_request(int pm_qos_class);
> -int pm_qos_add_notifier(int pm_qos_class, struct notifier_block *notifier);
> -int pm_qos_remove_notifier(int pm_qos_class, struct notifier_block *notifier);
> +int pm_qos_request(int class);
> +int pm_qos_add_notifier(int class, struct notifier_block *notifier);
> +int pm_qos_remove_notifier(int class, struct notifier_block *notifier);
>  int pm_qos_request_active(struct pm_qos_request_list *req);
>  
>  #endif
> diff --git a/kernel/pm_qos_params.c b/kernel/pm_qos_params.c
> index 6824ca7..d61c8e5 100644
> --- a/kernel/pm_qos_params.c
> +++ b/kernel/pm_qos_params.c
> @@ -60,7 +60,7 @@ enum pm_qos_type {
>   * types linux supports for 32 bit quantites
>   */
>  struct pm_qos_object {
> -	struct plist_head requests;
> +	struct plist_head *requests;
>  	struct blocking_notifier_head *notifiers;
>  	struct miscdevice pm_qos_power_miscdev;
>  	char *name;

So, I gather the idea is to have "requests" point to the device's private
list of latency constraints.  [BTW, there seems to be a naming confusion,
because you tend to call those things "constraints" rather that "requests".]
If so, doesn't that mean we'll need a per-device struct pm_qos_object too?
In which case why not to make the device object point to that pm_qos_object
object instead?

> @@ -72,9 +72,12 @@ struct pm_qos_object {
>  static DEFINE_SPINLOCK(pm_qos_lock);
>  
>  static struct pm_qos_object null_pm_qos;
> +
>  static BLOCKING_NOTIFIER_HEAD(cpu_dma_lat_notifier);
> +static struct plist_head _cpu_dma_reqs =
> +			PLIST_HEAD_INIT(_cpu_dma_reqs, pm_qos_lock);
>  static struct pm_qos_object cpu_dma_pm_qos = {
> -	.requests = PLIST_HEAD_INIT(cpu_dma_pm_qos.requests, pm_qos_lock),
> +	.requests = &_cpu_dma_reqs,
>  	.notifiers = &cpu_dma_lat_notifier,
>  	.name = "cpu_dma_latency",
>  	.target_value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
> @@ -82,9 +85,20 @@ static struct pm_qos_object cpu_dma_pm_qos = {
>  	.type = PM_QOS_MIN,
>  };
>  
> +static BLOCKING_NOTIFIER_HEAD(dev_wakeup_lat_notifier);
> +static struct pm_qos_object dev_wakeup_lat_pm_qos = {
> +	.notifiers = &dev_wakeup_lat_notifier,
> +	.name = "dev_wakeup_latency",
> +	.target_value = PM_QOS_DEV_WAKEUP_LAT_DEFAULT_VALUE,
> +	.default_value = PM_QOS_DEV_WAKEUP_LAT_DEFAULT_VALUE,
> +	.type = PM_QOS_MIN,
> +};
> +
>  static BLOCKING_NOTIFIER_HEAD(network_lat_notifier);
> +static struct plist_head _network_lat_reqs =
> +			PLIST_HEAD_INIT(_network_lat_reqs, pm_qos_lock);
>  static struct pm_qos_object network_lat_pm_qos = {
> -	.requests = PLIST_HEAD_INIT(network_lat_pm_qos.requests, pm_qos_lock),
> +	.requests = &_network_lat_reqs,
>  	.notifiers = &network_lat_notifier,
>  	.name = "network_latency",
>  	.target_value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
> @@ -94,8 +108,10 @@ static struct pm_qos_object network_lat_pm_qos = {
>  
>  
>  static BLOCKING_NOTIFIER_HEAD(network_throughput_notifier);
> +static struct plist_head _network_throughput_reqs =
> +			PLIST_HEAD_INIT(_network_throughput_reqs, pm_qos_lock);
>  static struct pm_qos_object network_throughput_pm_qos = {
> -	.requests = PLIST_HEAD_INIT(network_throughput_pm_qos.requests, pm_qos_lock),
> +	.requests = &_network_throughput_reqs,
>  	.notifiers = &network_throughput_notifier,
>  	.name = "network_throughput",
>  	.target_value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
> @@ -107,6 +123,7 @@ static struct pm_qos_object network_throughput_pm_qos = {
>  static struct pm_qos_object *pm_qos_array[] = {
>  	&null_pm_qos,
>  	&cpu_dma_pm_qos,
> +	&dev_wakeup_lat_pm_qos,
>  	&network_lat_pm_qos,
>  	&network_throughput_pm_qos
>  };
> @@ -129,19 +146,19 @@ static const struct file_operations pm_qos_power_fops = {
>  /* unlocked internal variant */
>  static inline int pm_qos_get_value(struct pm_qos_object *o)
>  {
> -	if (plist_head_empty(&o->requests))
> +	if (plist_head_empty(o->requests))
>  		return o->default_value;
>  
>  	switch (o->type) {
> -	case PM_QOS_MIN:
> -		return plist_first(&o->requests)->prio;
> +		case PM_QOS_MIN:
> +			return plist_first(o->requests)->prio;
>  
> -	case PM_QOS_MAX:
> -		return plist_last(&o->requests)->prio;
> +		case PM_QOS_MAX:
> +			return plist_last(o->requests)->prio;
>  
> -	default:
> -		/* runtime check for not using enum */
> -		BUG();
> +		default:
> +			/* runtime check for not using enum */
> +			BUG();
>  	}
>  }
>  
> @@ -155,13 +172,22 @@ static inline void pm_qos_set_value(struct pm_qos_object *o, s32 value)
>  	o->target_value = value;
>  }
>  
> -static void update_target(struct pm_qos_object *o, struct plist_node *node,
> -			  int del, int value)
> +static void update_target(struct pm_qos_request_list *req, int del, int value)
>  {
>  	unsigned long flags;
>  	int prev_value, curr_value;
> +	struct pm_qos_object *o = pm_qos_array[req->class];
> +	struct plist_node *node = &req->list;
>  
>  	spin_lock_irqsave(&pm_qos_lock, flags);
> +	/*
> +	 * PM_QOS_DEV_WAKEUP_LATENCY:
> +	 * Use the per-device wake-up latency constraints list for
> +	 * constraints storage
> +	 */
> +	if (req->class == PM_QOS_DEV_WAKEUP_LATENCY)
> +		o->requests = &req->dev->power.wakeup_lat_plist_head;

Ah, I see what the idea is and quite frankly I don't like it.

Changing the requests field on the fly in every second (or so) invocation of
update_target() for PM_QOS_DEV_WAKEUP_LATENCY class is beyond ugly.  Moreover,
it makes all devices share the default_value field which I don't think is
correct (the target_value field is shared too, which may lead to all sorts of
problems).

I wouldn't even try to use struct pm_qos_object as it is.  Instead, I'd
introduce something like

struct pm_qos_constraits {
	struct plist_head requests;
	unsigned int current_val;
	unsigned int default_val;
	unsigned int floor_val;  /* for adding new requests */
	unsigned int ceiling_val;  /* ditto */
	enum pm_qos_type type;
};

Where I'm not sure if the "type" field is really necessary, but that should
work out at one point and "unsigned int" is on purpose (it should have been
that way from day one in the first place).  That's what is needed for device
IMO, the other fields in struct pm_qos_object are simply redundant for this
use case.

Now, for the compatibility with the existing code I'd redefine struct
pm_qos_object as:

struct pm_qos_object {
	struct pm_qos_constraints constraints;
	struct blocking_notifier *notifiers;
	struct miscdevice pm_qos_power_miscdev;
	char *name;
};

and I'd change the code to operate on struct pm_qos_constraits in the first
place, with a compatibility layer using struct pm_qos_object on top of that.

Then, each device may have its own struct pm_qos_constraits object that will
be operated on by the PM QoS functions.  [There is the problem of whether or
not those object should be embedded in struct dev_pm_info, it may be better
to use pointers to them.  That may be used for sharing of constraints, for
example, in which case it will be necessary to add a refcount to struct
pm_qos_constraits.]

Thanks,
Rafael

^ permalink raw reply

* Re: [PATCH 01/11] PM: add a per-device wake-up latency constraints plist
From: Rafael J. Wysocki @ 2011-07-02 19:39 UTC (permalink / raw)
  To: jean.pihet; +Cc: markgross, Linux PM mailing list, linux-omap, Jean Pihet
In-Reply-To: <1309446685-17502-2-git-send-email-j-pihet@ti.com>

Hi,

On Thursday, June 30, 2011, jean.pihet@newoldbits.com wrote:
> From: Jean Pihet <j-pihet@ti.com>
> 
> Add the field wakeup_lat_plist_head in the struct dev_pm_info
> and the initialization of the plist in device_pm_init.
> 
> This enables the implementation of per-device constraints in
> PM QoS.
> 
> Signed-off-by: Jean Pihet <j-pihet@ti.com>
> ---
>  drivers/base/power/main.c |    3 +++
>  include/linux/pm.h        |    2 ++
>  2 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index aa632020..b1fd96b 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -96,6 +96,8 @@ void device_pm_add(struct device *dev)
>  			dev_name(dev->parent));
>  	list_add_tail(&dev->power.entry, &dpm_list);
>  	mutex_unlock(&dpm_list_mtx);
> +	/* ToDo: call PM QoS to init the per-device wakeup latency constraints */
> +	plist_head_init(&dev->power.wakeup_lat_plist_head, &dev->power.lock);
>  }
>  
>  /**
> @@ -106,6 +108,7 @@ void device_pm_remove(struct device *dev)
>  {
>  	pr_debug("PM: Removing info for %s:%s\n",
>  		 dev->bus ? dev->bus->name : "No Bus", dev_name(dev));
> +	/* ToDo: call PM QoS to de-init the per-device wakeup latency constraints */
>  	complete_all(&dev->power.completion);
>  	mutex_lock(&dpm_list_mtx);
>  	list_del_init(&dev->power.entry);
> diff --git a/include/linux/pm.h b/include/linux/pm.h
> index 3160648..35fe682 100644
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -22,6 +22,7 @@
>  #define _LINUX_PM_H
>  
>  #include <linux/list.h>
> +#include <linux/plist.h>
>  #include <linux/workqueue.h>
>  #include <linux/spinlock.h>
>  #include <linux/wait.h>
> @@ -462,6 +463,7 @@ struct dev_pm_info {
>  	unsigned long		accounting_timestamp;
>  	void			*subsys_data;  /* Owned by the subsystem. */
>  #endif
> +	struct plist_head	wakeup_lat_plist_head;
>  };

Please use a better name.  I mean, relly, the type implies that this is a
plist head, so that doesn't need to appear in the field name too.  Also,
the name is confusing, because "wakeup" may mean a couple of different things
and it's not entirely clear what "lat" stands for.  So, I'd prefer something
like

+	struct plist_head	latency_constraints;

or perhaps you can invent something even better.

>  
>  extern void update_pm_runtime_accounting(struct device *dev);
> 

Thanks,
Rafael

^ permalink raw reply

* Re: [PATCH v2 00/11] PM QoS: add a per-device wake-up latency constraint class
From: Rafael J. Wysocki @ 2011-07-02 19:20 UTC (permalink / raw)
  To: jean.pihet; +Cc: markgross, Linux PM mailing list, linux-omap, Jean Pihet
In-Reply-To: <1309446685-17502-1-git-send-email-j-pihet@ti.com>

Hi,

On Thursday, June 30, 2011, jean.pihet@newoldbits.com wrote:
> From: Jean Pihet <j-pihet@ti.com>
> 
> This patch set is in an RFC state, for review and comments.

First off, I'm sorry I couldn't review the patchset earlier.

> In order to implement the new class in PM QoS the following changes have been
> made:
> 
> 1. Add a new PM QoS class for device wake-up constraints
> (PM_QOS_DEV_WAKEUP_LATENCY).
> Due to the per-device nature of the new class the constraints lists are stored
> inside the device dev_pm_info struct instead of the internal per-class
> constraints lists.
> The new class is only available from kernel drivers and so is not exported to
> user space.

Have you considered a design in which multiple devices may use the same
list of constraints?  It seems plausible that the constraints will be the
same, for example, for all Ethernet adapters in the system, in which case it
will be wasteful to duplicate the list of constraints for each of them.

> 2. Added a notification of device insertion/removal from the device PM framework
> to PM QoS.
> This allows to init/de-init the per-device constraints list upon device insertion
> and removal.
> RFC state for comments and review, barely tested

I need to have a look at the details, but in principle this means that the
per-device lists will be usable only after the devices have been registered.
In particular, this means that it will only be possible to add new constraints
after registering the device, which may be too late for some use cases.

> 3. Make the pm_qos_add_request API more generic by using a
> struct pm_qos_parameters parameter. This allows easy extension in the future.
> 
> 4. Upon a change of the strongest constraint in the PM_QOS_DEV_WAKEUP_LATENCY
> class a notification chain mechanism is used to take action on the system.
> This is the proposed way to have PM QoS and the platform dependant code to
> interact with each other, cf. 4 below.

I guess you mean 5.?

I think we will need something in addition to the notifier here.  For example,
I wouldn't like any core code, like runtime PM or cpuidle, to have to register
a notifier with PM QoS.

> The notification mechanism now passes the constraint request struct ptr in
> order for the notifier callback to have access to the full set of constraint
> data, e.g. the struct device ptr.
> 
> 5. cpuidle interaction with the OMAP3 cpuidle handler
> Since cpuidle is a CPU centric framework it decides the MPU next power state
> based on the MPU exit_latency and target_residency figures.
>     
> The rest of the power domains get their next power state programmed from
> the PM_QOS_DEV_WAKEUP_LATENCY class of the PM QoS framework, via the device
> wake-up latency constraints.
> 
> Note: the exit_latency and target_residency figures of the MPU include the MPU
> itself and the peripherals needed for the MPU to execute instructions (e.g.
> main memory, caches, IRQ controller, MMU etc).
> Some of those peripherals can belong to other power domains than the MPU
> subsystem and so the corresponding latencies must be included in those figures.
> 
> 6. Update the pm_qos_add_request callers to the generic API
> 
> 7. Minor clean-ups and rename of struct fields
> 
> Questions:
> 1. How to retrieve the device ptr from a given device driver in order to add
> a constraint on it?
> 2. The device struct has recently been extended with the power domain
> information. Can this be used to apply the constraints on power domains?

Yes, it can in principle, but that will require some work.

> On-going developments, patches in preparation:
> 1. write Documentation for the new PM QoS class

I'd wait with that until the code has settled.

> 2. validate the constraints framework on OMAP4 HW (done on OMAP3)
> 3. refine the power domains wake-up latency and the cpuidle figures
> 
> Based on the master branch of the linux-omap git tree (3.0.0-rc3). Compile
> tested using OMAP and x86 generic defconfigs.
> Tested on OMAP3 Beagleboard (ES2.x) with full RETention and OFF modes.

More detailed comments will follow.

Thanks,
Rafael

^ permalink raw reply

* Re: [PATCH] PM / Runtime: Update documentation regarding driver removal
From: Kevin Hilman @ 2011-07-01 22:49 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-pm, linux-omap@vger.kernel.org
In-Reply-To: <201107020012.29661.rjw@sisk.pl>

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> From: Rafael J. Wysocki <rjw@sisk.pl>
>
> Commit e1866b33b1e89f077b7132daae3dfd9a594e9a1a (PM / Runtime: Rework
> runtime PM handling during driver removal) forgot to update the
> documentation in Documentation/power/runtime_pm.txt to match the new
> code in drivers/base/dd.c.  Update that documentation to match the
> code it describes.
>
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Reviewed-by: Kevin Hilman <khilman@ti.com>

Thanks, it's clear now.

Kevin

^ permalink raw reply

* [PATCH] PM / Runtime: Update documentation regarding driver removal
From: Rafael J. Wysocki @ 2011-07-01 22:12 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-pm, linux-omap@vger.kernel.org
In-Reply-To: <Pine.LNX.4.44L0.1107011711370.1624-100000@iolanthe.rowland.org>

From: Rafael J. Wysocki <rjw@sisk.pl>

Commit e1866b33b1e89f077b7132daae3dfd9a594e9a1a (PM / Runtime: Rework
runtime PM handling during driver removal) forgot to update the
documentation in Documentation/power/runtime_pm.txt to match the new
code in drivers/base/dd.c.  Update that documentation to match the
code it describes.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 Documentation/power/runtime_pm.txt |   26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

Index: linux-2.6/Documentation/power/runtime_pm.txt
===================================================================
--- linux-2.6.orig/Documentation/power/runtime_pm.txt
+++ linux-2.6/Documentation/power/runtime_pm.txt
@@ -513,13 +513,29 @@ helper functions described in Section 4.
 should be used.  Of course, for this purpose the device's runtime PM has to be
 enabled earlier by calling pm_runtime_enable().
 
-If the device bus type's or driver's ->probe() or ->remove() callback runs
+If the device bus type's or driver's ->probe() callback runs
 pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts,
 they will fail returning -EAGAIN, because the device's usage counter is
-incremented by the core before executing ->probe() and ->remove().  Still, it
-may be desirable to suspend the device as soon as ->probe() or ->remove() has
-finished, so the PM core uses pm_runtime_put_sync() to invoke the
-subsystem-level idle callback for the device at that time.
+incremented by the driver core before executing ->probe().  Still, it may be
+desirable to suspend the device as soon as ->probe() has finished, so the driver
+core uses pm_runtime_put_sync() to invoke the subsystem-level idle callback for
+the device at that time.
+
+Moreover, the driver core prevents runtime PM callbacks from racing with the bus
+notifier callback in __device_release_driver(), which is necessary, because the
+notifier is used by some subsystems to carry out operations affecting the
+runtime PM functionality.  It does so by calling pm_runtime_get_sync() before
+driver_sysfs_remove() and the BUS_NOTIFY_UNBIND_DRIVER notifications.  This
+resumes the device if it's in the suspended state and prevents it from
+being suspended again while those routines are being executed.
+
+To allow bus types and drivers to put devices into the suspended state by
+calling pm_runtime_suspend() from their ->remove() routines, the driver core
+executes pm_runtime_put_sync() after running the BUS_NOTIFY_UNBIND_DRIVER
+notifications in __device_release_driver().  This requires bus types and
+drivers to make their ->remove() callbacks avoid races with runtime PM directly,
+but also it allows of more flexibility in the handling of devices during the
+removal of their drivers.
 
 The user space can effectively disallow the driver of the device to power manage
 it at run time by changing the value of its /sys/devices/.../power/control

^ permalink raw reply

* Re: runtime PM usage_count during driver_probe_device()?
From: Rafael J. Wysocki @ 2011-07-01 21:44 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-pm, linux-omap@vger.kernel.org
In-Reply-To: <Pine.LNX.4.44L0.1107011711370.1624-100000@iolanthe.rowland.org>

On Friday, July 01, 2011, Alan Stern wrote:
> On Fri, 1 Jul 2011, Rafael J. Wysocki wrote:
> 
> > Hi,
> > 
> > On Friday, July 01, 2011, Kevin Hilman wrote:
> > > Alan Stern <stern@rowland.harvard.edu> writes:
> > > 
> > > > On Fri, 1 Jul 2011, Kevin Hilman wrote:
> > > >
> > > >> OK, so the ->probe() part has been explained and makes sense, but I
> > > >> would expect ->remove() to be similarily protected (as the documentation
> > > >> states.)  But that is not the case.  Is that a bug?  If so, patch below
> > > >> makes the code match the documentation.
> > > >
> > > > I suspect it is a bug, but it's hard to be sure.  It's so _blatantly_ 
> > > > wrong that it looks like it was done deliberately.
> > > 
> > > heh
> > 
> > I seem to remeber having a problem with the pm_runtime_put_sync() after
> > drv->remove(dev) ...
> > 
> > So the code in question was introduced by
> > 
> > commit e1866b33b1e89f077b7132daae3dfd9a594e9a1a
> > Author: Rafael J. Wysocki <rjw@sisk.pl>
> > Date:   Fri Apr 29 00:33:45 2011 +0200
> > 
> >     PM / Runtime: Rework runtime PM handling during driver removal
> > 
> > with a long changelog explaining the reason why.  Which seems to make sense. ;-)
> 
> Okay, that seems fair enough.  Looks like the documentation needs to be 
> updated to match, though.

Yes, it does.

> And we probably still want to make sure that access to the 
> power/control and related attribute files is mutually exclusive with 
> probe and remove.

I agree.

Thanks,
Rafael

^ permalink raw reply

* Re: runtime PM usage_count during driver_probe_device()?
From: Rafael J. Wysocki @ 2011-07-01 21:42 UTC (permalink / raw)
  To: linux-pm; +Cc: linux-omap@vger.kernel.org
In-Reply-To: <87k4c3dktm.fsf@ti.com>

On Friday, July 01, 2011, Kevin Hilman wrote:
> Continuing on the theme of runtime PM interactions with other parts of
> the driver core...
> 
> In drivers/base/dd.c:driver_probe_device(), the driver core increments
> the usage count around ->probe():
> 
>         [...]
> 	pm_runtime_get_noresume(dev);
> 	pm_runtime_barrier(dev);
> 	ret = really_probe(dev, drv);
> 	pm_runtime_put_sync(dev);
> 
> I'm not following the reason for this.  On driver's I'm familar with,
> it's not until ->probe where pm_runtime_enable() is called.  What is
> being protected against here?
> 
> These seem to exist since the introduction of the runtime PM core, but I
> can't find any explanation.
> 
> The documentation refers to the increment by the core, but not the
> reasons why:
> 
>     If the device bus type's or driver's ->probe() or ->remove()
>     callback runs pm_runtime_suspend() or pm_runtime_idle() or their
>     asynchronous counterparts, they will fail returning -EAGAIN, because
>     the device's usage counter is incremented by the core before
>     executing ->probe() and ->remove().  Still, it may be desirable to
>     suspend the device as soon as ->probe() or ->remove() has finished,
>     so the PM core uses pm_runtime_idle_sync() to invoke the
>     subsystem-level idle callback for the device at that time.
> 
> On a side note, the bit about -EAGAIN above is not accurate with today's
> code.  For example, __pm_runtime_suspend() returns zero when the usage
> count decrement is non-zero, so callers can't currently know that doing
> a pm_runtime_suspend() or pm_runtime_put_sync() in their ->probe()
> actually didn't happen.
> 
> Another curiosity is that, contrary to the above documentation, there is
> no usage_count increment before the bus/driver ->remove() (although
> there is a _get_sync/_put_sync around the sysfs_remove and notifier just
> before the bus/driver->remove().
> 
> Also, below is a patch for a typo in the above Documentation exerpt.
> 
> Kevin
> 
> 
> 
> From 069484f8d2bb86473a271c27733e10fbfd410c2c Mon Sep 17 00:00:00 2001
> From: Kevin Hilman <khilman@ti.com>
> Date: Thu, 30 Jun 2011 15:07:31 -0700
> Subject: [PATCH] PM: Documentation: fix typo: pm_runtime_idle_sync() doesn't exist.
> 
> Replace reference to pm_runtime_idle_sync() in the driver core with
> pm_runtime_put_sync() which is used in the code.
> 
> Signed-off-by: Kevin Hilman <khilman@ti.com>

Applied to suspend-2.6/linux-next.

Thanks,
Rafael


> ---
>  Documentation/power/runtime_pm.txt |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt
> index 22accb3..518d9be 100644
> --- a/Documentation/power/runtime_pm.txt
> +++ b/Documentation/power/runtime_pm.txt
> @@ -506,7 +506,7 @@ pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts,
>  they will fail returning -EAGAIN, because the device's usage counter is
>  incremented by the core before executing ->probe() and ->remove().  Still, it
>  may be desirable to suspend the device as soon as ->probe() or ->remove() has
> -finished, so the PM core uses pm_runtime_idle_sync() to invoke the
> +finished, so the PM core uses pm_runtime_put_sync() to invoke the
>  subsystem-level idle callback for the device at that time.
>  
>  The user space can effectively disallow the driver of the device to power manage
> 

^ permalink raw reply

* Re: [PATCH] PM: prevent runtime_resume from racing with probe
From: Rafael J. Wysocki @ 2011-07-01 21:42 UTC (permalink / raw)
  To: Alan Stern; +Cc: Linux-pm mailing list
In-Reply-To: <Pine.LNX.4.44L0.1107011132070.1988-100000@iolanthe.rowland.org>

On Friday, July 01, 2011, Alan Stern wrote:
> This patch (as1475) adds device_lock() and device_unlock() calls to
> the store methods for the power/control and power/autosuspend_delay_ms
> sysfs attribute files.  We don't want badly timed writes to these
> files to cause runtime_resume callbacks to occur while a driver is
> being probed for a device.
> 
> Signed-off-by: Alan Stern <stern@rowland.harvard.edu>

Applied to suspend-2.6/linux-next.

Thanks,
Rafael


> ---
> 
>  drivers/base/power/sysfs.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> Index: usb-3.0/drivers/base/power/sysfs.c
> ===================================================================
> --- usb-3.0.orig/drivers/base/power/sysfs.c
> +++ usb-3.0/drivers/base/power/sysfs.c
> @@ -116,12 +116,14 @@ static ssize_t control_store(struct devi
>  	cp = memchr(buf, '\n', n);
>  	if (cp)
>  		len = cp - buf;
> +	device_lock(dev);
>  	if (len == sizeof ctrl_auto - 1 && strncmp(buf, ctrl_auto, len) == 0)
>  		pm_runtime_allow(dev);
>  	else if (len == sizeof ctrl_on - 1 && strncmp(buf, ctrl_on, len) == 0)
>  		pm_runtime_forbid(dev);
>  	else
> -		return -EINVAL;
> +		n = -EINVAL;
> +	device_unlock(dev);
>  	return n;
>  }
>  
> @@ -205,7 +207,9 @@ static ssize_t autosuspend_delay_ms_stor
>  	if (strict_strtol(buf, 10, &delay) != 0 || delay != (int) delay)
>  		return -EINVAL;
>  
> +	device_lock(dev);
>  	pm_runtime_set_autosuspend_delay(dev, delay);
> +	device_unlock(dev);
>  	return n;
>  }
>  
> 
> 
> 

^ permalink raw reply

* [PATCH] PM: prevent runtime_resume from racing with probe
From: Alan Stern @ 2011-07-01 21:13 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux-pm mailing list
In-Reply-To: <Pine.LNX.4.44L0.1107011045160.1988-100000@iolanthe.rowland.org>

This patch (as1475) adds device_lock() and device_unlock() calls to
the store methods for the power/control and power/autosuspend_delay_ms
sysfs attribute files.  We don't want badly timed writes to these
files to cause runtime_resume callbacks to occur while a driver is
being probed for a device.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>

---

 drivers/base/power/sysfs.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: usb-3.0/drivers/base/power/sysfs.c
===================================================================
--- usb-3.0.orig/drivers/base/power/sysfs.c
+++ usb-3.0/drivers/base/power/sysfs.c
@@ -116,12 +116,14 @@ static ssize_t control_store(struct devi
 	cp = memchr(buf, '\n', n);
 	if (cp)
 		len = cp - buf;
+	device_lock(dev);
 	if (len == sizeof ctrl_auto - 1 && strncmp(buf, ctrl_auto, len) == 0)
 		pm_runtime_allow(dev);
 	else if (len == sizeof ctrl_on - 1 && strncmp(buf, ctrl_on, len) == 0)
 		pm_runtime_forbid(dev);
 	else
-		return -EINVAL;
+		n = -EINVAL;
+	device_unlock(dev);
 	return n;
 }
 
@@ -205,7 +207,9 @@ static ssize_t autosuspend_delay_ms_stor
 	if (strict_strtol(buf, 10, &delay) != 0 || delay != (int) delay)
 		return -EINVAL;
 
+	device_lock(dev);
 	pm_runtime_set_autosuspend_delay(dev, delay);
+	device_unlock(dev);
 	return n;
 }
 

^ permalink raw reply

* Re: runtime PM usage_count during driver_probe_device()?
From: Alan Stern @ 2011-07-01 21:12 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-pm, linux-omap@vger.kernel.org
In-Reply-To: <201107012253.19988.rjw@sisk.pl>

On Fri, 1 Jul 2011, Rafael J. Wysocki wrote:

> Hi,
> 
> On Friday, July 01, 2011, Kevin Hilman wrote:
> > Alan Stern <stern@rowland.harvard.edu> writes:
> > 
> > > On Fri, 1 Jul 2011, Kevin Hilman wrote:
> > >
> > >> OK, so the ->probe() part has been explained and makes sense, but I
> > >> would expect ->remove() to be similarily protected (as the documentation
> > >> states.)  But that is not the case.  Is that a bug?  If so, patch below
> > >> makes the code match the documentation.
> > >
> > > I suspect it is a bug, but it's hard to be sure.  It's so _blatantly_ 
> > > wrong that it looks like it was done deliberately.
> > 
> > heh
> 
> I seem to remeber having a problem with the pm_runtime_put_sync() after
> drv->remove(dev) ...
> 
> So the code in question was introduced by
> 
> commit e1866b33b1e89f077b7132daae3dfd9a594e9a1a
> Author: Rafael J. Wysocki <rjw@sisk.pl>
> Date:   Fri Apr 29 00:33:45 2011 +0200
> 
>     PM / Runtime: Rework runtime PM handling during driver removal
> 
> with a long changelog explaining the reason why.  Which seems to make sense. ;-)

Okay, that seems fair enough.  Looks like the documentation needs to be 
updated to match, though.

And we probably still want to make sure that access to the 
power/control and related attribute files is mutually exclusive with 
probe and remove.

Alan Stern

^ permalink raw reply

* Re: runtime PM usage_count during driver_probe_device()?
From: Rafael J. Wysocki @ 2011-07-01 20:53 UTC (permalink / raw)
  To: Kevin Hilman; +Cc: linux-pm, linux-omap@vger.kernel.org
In-Reply-To: <87pqlu3sy3.fsf@ti.com>

Hi,

On Friday, July 01, 2011, Kevin Hilman wrote:
> Alan Stern <stern@rowland.harvard.edu> writes:
> 
> > On Fri, 1 Jul 2011, Kevin Hilman wrote:
> >
> >> OK, so the ->probe() part has been explained and makes sense, but I
> >> would expect ->remove() to be similarily protected (as the documentation
> >> states.)  But that is not the case.  Is that a bug?  If so, patch below
> >> makes the code match the documentation.
> >
> > I suspect it is a bug, but it's hard to be sure.  It's so _blatantly_ 
> > wrong that it looks like it was done deliberately.
> 
> heh

I seem to remeber having a problem with the pm_runtime_put_sync() after
drv->remove(dev) ...

So the code in question was introduced by

commit e1866b33b1e89f077b7132daae3dfd9a594e9a1a
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date:   Fri Apr 29 00:33:45 2011 +0200

    PM / Runtime: Rework runtime PM handling during driver removal

with a long changelog explaining the reason why.  Which seems to make sense. ;-)

So I'm not sure.

Thanks,
Rafael

^ permalink raw reply

* Re: [PATCH 7/10 v6] PM / Domains: Don't stop wakeup devices during system sleep transitions
From: Rafael J. Wysocki @ 2011-07-01 20:06 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-sh, Greg Kroah-Hartman, LKML, Linux PM mailing list
In-Reply-To: <Pine.LNX.4.44L0.1107011043410.1988-100000@iolanthe.rowland.org>

On Friday, July 01, 2011, Alan Stern wrote:
> On Fri, 1 Jul 2011, Rafael J. Wysocki wrote:
> 
> > So the only way forward I can see is to add a special PM domain callback,
> > say .active_wakeup(), that will return "true" if the device is to be left
> > active if wakeup-enabled.  So the check you don't like will become
> > something like:
> > 
> > if (device_may_wakeup(dev) && genpd->active_wakeup
> >     && genpd->active_wakeup(dev))
> >         return 0;
> > 
> > Would that be better?
> 
> Another option, less flexible but perhaps easier to use, would be to 
> set a couple of bitflags indicating whether the device needs power or 
> clocks to handle wakeup signals.

Well, I agree, but I've decided to add the new callback after all.

Thanks,
Rafael

^ permalink raw reply

* Re: [PATCH 3/10 v6] PM / Domains: Support for generic I/O PM domains (v7)
From: Rafael J. Wysocki @ 2011-07-01 20:03 UTC (permalink / raw)
  To: Kevin Hilman; +Cc: linux-sh, Greg Kroah-Hartman, LKML, Linux PM mailing list
In-Reply-To: <87tyb5yipq.fsf@ti.com>

On Friday, July 01, 2011, Kevin Hilman wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> >
> > Introduce common headers, helper functions and callbacks allowing
> > platforms to use simple generic power domains for runtime power
> > management.
> >
> > Introduce struct generic_pm_domain to be used for representing
> > power domains that each contain a number of devices and may be
> > parent domains or subdomains with respect to other power domains.
> > Among other things, this structure includes callbacks to be
> > provided by platforms for performing specific tasks related to
> > power management (i.e. ->stop_device() may disable a device's
> > clocks, while ->start_device() may enable them, ->power_off() is
> > supposed to remove power from the entire power domain
> > and ->power_on() is supposed to restore it).
> >
> > Introduce functions that can be used as power domain runtime PM
> > callbacks, pm_genpd_runtime_suspend() and pm_genpd_runtime_resume(),
> > as well as helper functions for the initialization of a power
> > domain represented by a struct generic_power_domain object,
> > adding a device to or removing a device from it and adding or
> > removing subdomains.
> >
> > Introduce configuration option CONFIG_PM_GENERIC_DOMAINS to be
> > selected by the platforms that want to use the new code.
> >
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
> 
> Reviewed-by: Kevin Hilman <khilman@ti.com>

Thanks!

> While I still don't think this is generic enough for some more
> complicated SoCs, it most certainly a huge step in the right direction,
> and I think we can address the other complexities with additional
> patches as needed.

That's my intention.

> Thanks Rafael for all your hard work on this and putting up with my
> nagging. ;)

I don't mind that at all, it kind of helps to focus on important stuff. ;-)

Rafael

^ permalink raw reply

* Re: [PATCH 3/3] PM: Limit race conditions between runtime PM and system sleep
From: Rafael J. Wysocki @ 2011-07-01 19:50 UTC (permalink / raw)
  To: Kevin Hilman; +Cc: linux-scsi, Greg KH, LKML, Tejun Heo, Linux PM mailing list
In-Reply-To: <87sjqq2coj.fsf@ti.com>

Hi,

On Friday, July 01, 2011, Kevin Hilman wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> >
> > One of the roles of the PM core is to prevent different PM callbacks
> > executed for the same device object from racing with each other.
> > Unfortunately, after commit e8665002477f0278f84f898145b1f141ba26ee26
> > (PM: Allow pm_runtime_suspend() to succeed during system suspend)
> > runtime PM callbacks may be executed concurrently with system
> > suspend/resume callbacks for the same device.
> >
> > The main reason for commit e8665002477f0278f84f898145b1f141ba26ee26
> > was that some subsystems and device drivers wanted to use runtime PM
> > helpers, pm_runtime_suspend() and pm_runtime_put_sync() in
> > particular, for carrying out the suspend of devices in their
> > .suspend() callbacks.  However, as it's been determined recently,
> > there are multiple reasons not to do so, inlcuding:
> >
> >  * The caller really doesn't control the runtime PM usage counters,
> >    because user space can access them through sysfs and effectively
> >    block runtime PM.  That means using pm_runtime_suspend() or
> >    pm_runtime_get_sync() to suspend devices during system suspend
> >    may or may not work.
> >
> >  * If a driver calls pm_runtime_suspend() from its .suspend()
> >    callback, it causes the subsystem's .runtime_suspend() callback to
> >    be executed, which leads to the call sequence:
> >
> >    subsys->suspend(dev)
> >       driver->suspend(dev)
> >          pm_runtime_suspend(dev)
> >             subsys->runtime_suspend(dev)
> >
> >    recursive from the subsystem's point of view.  For some subsystems
> >    that may actually work (e.g. the platform bus type), but for some
> >    it will fail in a rather spectacular fashion (e.g. PCI).  In each
> >    case it means a layering violation.
> >
> >  * Both the subsystem and the driver can provide .suspend_noirq()
> >    callbacks for system suspend that can do whatever the
> >    .runtime_suspend() callbacks do just fine, so it really isn't
> >    necessary to call pm_runtime_suspend() during system suspend.
> >
> >  * The runtime PM's handling of wakeup devices is usually different
> >    from the system suspend's one, so .runtime_suspend() may simply be
> >    inappropriate for system suspend.
> >
> >  * System suspend is supposed to work even if CONFIG_PM_RUNTIME is
> >    unset.
> >
> >  * The runtime PM workqueue is frozen before system suspend, so if
> >    whatever the driver is going to do during system suspend depends
> >    on it, that simply won't work.
> 
> Thanks for the thorough description of all these reasons.

Well, I thought I had to document them before I would forget again.

> > Still, there is a good reason to allow pm_runtime_resume() to
> > succeed during system suspend and resume (for instance, some
> > subsystems and device drivers may legitimately use it to ensure that
> > their devices are in full-power states before suspending them).
> > Moreover, there is no reason to prevent runtime PM callbacks from
> > being executed in parallel with the system suspend/resume .prepare()
> > and .complete() callbacks and the code removed by commit
> > e8665002477f0278f84f898145b1f141ba26ee26 went too far in this
> > respect.  On the other hand, runtime PM callbacks, including
> > .runtime_resume(), must not be executed during system suspend's
> > "late" stage of suspending devices and during system resume's "early"
> > device resume stage.
> >
> > Taking all of the above into consideration, make the PM core
> > acquire a runtime PM reference to every device and resume it if
> > there's a runtime PM resume request pending right before executing
> > the subsystem-level .suspend() callback for it.  Make the PM core
> > drop references to all devices right after executing the
> > subsystem-level .resume() callbacks for them.  Additionally,
> > make the PM core disable the runtime PM framework for all devices
> > during system suspend, after executing the subsystem-level .suspend()
> > callbacks for them, and enable the runtime PM framework for all
> > devices during system resume, right before executing the
> > subsystem-level .resume() callbacks for them.
> >
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> 
> OK, I'm convinced.  :)
> 
> Acked-by: Kevin Hilman <khilman@ti.com>

Cool, thanks. :-)

> Minor documentation comment below...
> 
> >  Documentation/power/runtime_pm.txt |   20 ++++++++++++++++++++
> >  drivers/base/power/main.c          |   27 ++++++++++++++++++++++-----
> >  2 files changed, 42 insertions(+), 5 deletions(-)
> >
> > Index: linux-2.6/drivers/base/power/main.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/base/power/main.c
> > +++ linux-2.6/drivers/base/power/main.c
> > @@ -505,6 +505,7 @@ static int legacy_resume(struct device *
> >  static int device_resume(struct device *dev, pm_message_t state, bool async)
> >  {
> >  	int error = 0;
> > +	bool put = false;
> >  
> >  	TRACE_DEVICE(dev);
> >  	TRACE_RESUME(0);
> > @@ -521,6 +522,9 @@ static int device_resume(struct device *
> >  	if (!dev->power.is_suspended)
> >  		goto Unlock;
> >  
> > +	pm_runtime_enable(dev);
> > +	put = true;
> > +
> >  	if (dev->pm_domain) {
> >  		pm_dev_dbg(dev, state, "power domain ");
> >  		error = pm_op(dev, &dev->pm_domain->ops, state);
> > @@ -563,6 +567,10 @@ static int device_resume(struct device *
> >  	complete_all(&dev->power.completion);
> >  
> >  	TRACE_RESUME(error);
> > +
> > +	if (put)
> > +		pm_runtime_put_sync(dev);
> > +
> >  	return error;
> >  }
> >  
> > @@ -843,16 +851,22 @@ static int __device_suspend(struct devic
> >  	int error = 0;
> >  
> >  	dpm_wait_for_children(dev, async);
> > -	device_lock(dev);
> >  
> >  	if (async_error)
> > -		goto Unlock;
> > +		return 0;
> > +
> > +	pm_runtime_get_noresume(dev);
> > +	if (pm_runtime_barrier(dev) && device_may_wakeup(dev))
> > +		pm_wakeup_event(dev, 0);
> >  	if (pm_wakeup_pending()) {
> > +		pm_runtime_put_sync(dev);
> >  		async_error = -EBUSY;
> > -		goto Unlock;
> > +		return 0;
> >  	}
> >  
> > +	device_lock(dev);
> > +
> >  	if (dev->pm_domain) {
> >  		pm_dev_dbg(dev, state, "power domain ");
> >  		error = pm_op(dev, &dev->pm_domain->ops, state);
> > @@ -890,12 +904,15 @@ static int __device_suspend(struct devic
> >   End:
> >  	dev->power.is_suspended = !error;
> >  
> > - Unlock:
> >  	device_unlock(dev);
> >  	complete_all(&dev->power.completion);
> >  
> > -	if (error)
> > +	if (error) {
> > +		pm_runtime_put_sync(dev);
> >  		async_error = error;
> > +	} else if (dev->power.is_suspended) {
> > +		__pm_runtime_disable(dev, false);
> > +	}
> >  
> >  	return error;
> >  }
> > Index: linux-2.6/Documentation/power/runtime_pm.txt
> > ===================================================================
> > --- linux-2.6.orig/Documentation/power/runtime_pm.txt
> > +++ linux-2.6/Documentation/power/runtime_pm.txt
> > @@ -567,6 +567,11 @@ this is:
> >  	pm_runtime_set_active(dev);
> >  	pm_runtime_enable(dev);
> >  
> > +The PM core always increments the run-time usage counter before calling the
> > +->suspend() callback and decrements it after calling the ->resume() callback.
> > +Hence disabling run-time PM temporarily like this will not cause any run-time
> > +suspend callbacks to be lost.
> > +
> >  On some systems, however, system sleep is not entered through a global firmware
> >  or hardware operation.  Instead, all hardware components are put into low-power
> >  states directly by the kernel in a coordinated way.  Then, the system sleep
> > @@ -579,6 +584,21 @@ place (in particular, if the system is n
> >  be more efficient to leave the devices that had been suspended before the system
> >  suspend began in the suspended state.
> >  
> > +The PM core does its best to reduce the probability of race conditions between
> > +the runtime PM and system suspend/resume (and hibernation) callbacks by carrying
> > +out the following operations:
> > +
> > +  * During system suspend it acquires a runtime PM reference to every device
> > +    and resume it if there's a runtime PM resume request pending right before
> 
> minor: s/resume/resumes/

That has been changed in the latest version:
https://patchwork.kernel.org/patch/919622/

Thanks,
Rafael

^ permalink raw reply

* Re: [REGRESSION] suspend/resume broken on T510
From: Rafael J. Wysocki @ 2011-07-01 19:40 UTC (permalink / raw)
  To: Marc Koschewski; +Cc: Linux PM mailing list, linux-kernel
In-Reply-To: <20110630223401.GA17292@marc.osknowledge.org>

On Friday, July 01, 2011, Marc Koschewski wrote:
> * Rafael J. Wysocki <rjw@sisk.pl> [2011-07-01 00:07:27 +0200]:
> 
> Hi Rafael,
> 
> I have attached some files here to let you know what machine won't resume.

I saw them, thanks.

Still, it's a bit difficult to figure out what's wrong.

Rafael

^ permalink raw reply

* Re: [PATCH 0/10 v6] PM / Domains: Support for generic I/O PM domains
From: Kevin Hilman @ 2011-07-01 18:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-sh, Greg Kroah-Hartman, LKML, Linux PM mailing list
In-Reply-To: <201106252324.13454.rjw@sisk.pl>

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> Hi,
>
> Well, one more update. :-)
>
> This is the 6th version of the patchset adding support for generic I/O PM
> domains.  The majority of patches have had some small fixes here and there
> to make things work when CONFIG_PM_RUNTIME is unset.  In addition to that,
> to make the shmobile code introduced by the last patch work for
> CONFIG_PM_RUNTIME unset I had to add patches [8-9/10].  Patch [7/10]
> adds wakeup support that was missing from the previous version and I
> decided not to push patch [1/10] for 3.0 after all.

For everything but the shmobile specifics, which I didn't look at too
closely, and the changes for [7/10] that we agreed on:

Reviewed-by: Kevin Hilman <khilman@ti.com>

Thanks!

Kevin

^ permalink raw reply

* Re: [PATCH 3/10 v6] PM / Domains: Support for generic I/O PM domains (v7)
From: Kevin Hilman @ 2011-07-01 18:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-sh, Greg Kroah-Hartman, LKML, Linux PM mailing list
In-Reply-To: <201106252326.23837.rjw@sisk.pl>

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> From: Rafael J. Wysocki <rjw@sisk.pl>
>
> Introduce common headers, helper functions and callbacks allowing
> platforms to use simple generic power domains for runtime power
> management.
>
> Introduce struct generic_pm_domain to be used for representing
> power domains that each contain a number of devices and may be
> parent domains or subdomains with respect to other power domains.
> Among other things, this structure includes callbacks to be
> provided by platforms for performing specific tasks related to
> power management (i.e. ->stop_device() may disable a device's
> clocks, while ->start_device() may enable them, ->power_off() is
> supposed to remove power from the entire power domain
> and ->power_on() is supposed to restore it).
>
> Introduce functions that can be used as power domain runtime PM
> callbacks, pm_genpd_runtime_suspend() and pm_genpd_runtime_resume(),
> as well as helper functions for the initialization of a power
> domain represented by a struct generic_power_domain object,
> adding a device to or removing a device from it and adding or
> removing subdomains.
>
> Introduce configuration option CONFIG_PM_GENERIC_DOMAINS to be
> selected by the platforms that want to use the new code.
>
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> Acked-by: Greg Kroah-Hartman <gregkh@suse.de>

Reviewed-by: Kevin Hilman <khilman@ti.com>

While I still don't think this is generic enough for some more
complicated SoCs, it most certainly a huge step in the right direction,
and I think we can address the other complexities with additional
patches as needed.

Thanks Rafael for all your hard work on this and putting up with my
nagging. ;)

Kevin

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox