[RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces

* [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
@ 2011-10-13 19:45 Rafael J. Wysocki
  2011-10-13 19:49 ` [RFC][PATCH 1/2] PM / Sleep: Add mechanism to disable suspend and hibernation Rafael J. Wysocki
                   ` (3 more replies)
  0 siblings, 4 replies; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-13 19:45 UTC
  To: Linux PM list; +Cc: mark gross, LKML, John Stultz, Alan Stern, NeilBrown

Hi,

There is an ongoing discussion about possible ways to avoid some problems
related to suspend/hibernate interfaces, such as races with the handling
of wakeup events (in user space) and (destructive) interference with some
important system maintenance tasks (such as firmware updates).

It follows from this discussion that whatever the kernel has to offer in
this area is either too complicated to use in practice or inadequate for
other reasons.  The two use case examples given by John, that is the
firmware update problem (i.e. system suspend or hibernation should not
be allowed to happen while the system's firmware is being updated) and the
backup problem (i.e. it should be possible to wake up the system from
sleep in the middle of the night via a timer event, create a backup of it
and make it go to sleep again automatically when the backup is ready
without implementing the backup feature in a power manager) are quite
convincing to me, but also it seems to me that previous attempts to
address them go a bit too far in complexity.  For this reason, I thought
it might be a good idea to propose a simpler approach.  It is not
bulletproof, but it should be suitable to address at least those two issues.

First, to address the firmware update problem, I think we need a big
hammer switch allowing a root-owned process to disable/enable all
suspend/hibernate interfaces.  This is introduced by the first patch in
the form of a new sysfs attribute, /sys/power/sleep_mode, that can be
used to disable the suspend/hibernate functionality (it does that with
the help of the existing wakeup events detection mechanism).
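
For illustration, here is a minimal user-space sketch of how a firmware
updater might use the proposed attribute.  The helper names
(sleep_mode_write(), run_with_sleep_disabled()) and the update() callback
are hypothetical and not part of the patch; only the path and the
"disabled"/"direct" strings come from it.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Attribute proposed by patch 1/2; absent on kernels without it. */
#define SLEEP_MODE_PATH	"/sys/power/sleep_mode"

/*
 * Write a mode string ("disabled" or "direct") to the attribute at @path.
 * Returns 0 on success and -1 on failure.  A write() return value of 0 is
 * treated as success, because the patch returns 0 when the requested mode
 * is already set.
 */
static int sleep_mode_write(const char *path, const char *mode)
{
	ssize_t ret;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	ret = write(fd, mode, strlen(mode));
	close(fd);

	return ret < 0 ? -1 : 0;
}

/*
 * Run @update() with suspend and hibernation disabled.  @update is a
 * hypothetical callback standing in for the firmware update itself.
 */
static int run_with_sleep_disabled(const char *path, int (*update)(void))
{
	int ret;

	if (sleep_mode_write(path, "disabled"))
		return -1;

	ret = update();

	/* Re-enable sleep even if the update failed. */
	if (sleep_mode_write(path, "direct"))
		return -1;

	return ret;
}
```

A caller would pass SLEEP_MODE_PATH as @path; taking the path as a
parameter only makes the helpers easy to exercise against an ordinary file.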

Second, to address the backup problem, we need to allow user space
processes other than the suspend/hibernate process itself to prevent the
system from being put into sleep states.  A mechanism for that is introduced
by the second patch in the form of the /dev/sleepctl special device working
kind of like user space wakelocks on Android (although in a simplified
fashion).

More details are in the changelogs and (of course) in the code itself.

The patches haven't been tested (I had tested the first one, but then I made
some changes to it afterwards), so most likely there are some bugs in them,
but I didn't want to lose time on testing things that people may not like
in principle. :-)

Thanks,
Rafael

* [RFC][PATCH 1/2] PM / Sleep: Add mechanism to disable suspend and hibernation
  2011-10-13 19:45 [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces Rafael J. Wysocki
@ 2011-10-13 19:49 ` Rafael J. Wysocki
  2011-10-13 19:50 ` [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode Rafael J. Wysocki
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-13 19:49 UTC
  To: Linux PM list; +Cc: mark gross, LKML, John Stultz, Alan Stern, NeilBrown

From: Rafael J. Wysocki <rjw@sisk.pl>

There are situations when it is necessary to disable suspend and
hibernation, so that they don't prevent some important operation
from completing.  For example, suspend or hibernation shouldn't
happen when the system's firmware is being updated, but it is
impossible to avoid race conditions with processes triggering suspend
or hibernation (e.g. a GUI power manager noticing that the GUI hasn't
been used for a sufficiently long time) entirely in user space.

For this reason, introduce a new sysfs attribute in /sys/power/,
called sleep_mode, allowing a root-owned process to set the mode
either to "direct", which means that it's possible to suspend or
hibernate normally and which is the default, or to "disabled", which
means that suspend and hibernation are impossible (this works by
adding a fake "wakeup event in progress" and enabling the wakeup
events detection mechanism defined in drivers/base/power/wakeup.c
unconditionally).
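
The fake "wakeup event in progress" relies on the combined counter layout
in drivers/base/power/wakeup.c.  The following user-space model is only a
sketch of that arithmetic: it assumes the in-progress count lives in the
low half of the word and the registered count in the high half, as
split_counters() does, and the fake_event_*() names are made up for the
example.

```c
#include <limits.h>

/* Assumed layout: low half = events in progress, high half = registered. */
#define IN_PROGRESS_BITS	(sizeof(unsigned int) * CHAR_BIT / 2)
#define MAX_IN_PROGRESS		((1U << IN_PROGRESS_BITS) - 1)

/* Split the combined value into the two counters. */
static void split_counters(unsigned int comb, unsigned int *cnt,
			   unsigned int *inpr)
{
	*cnt = comb >> IN_PROGRESS_BITS;
	*inpr = comb & MAX_IN_PROGRESS;
}

/* Models pm_wakeup_disable_suspend(): one more event in progress. */
static unsigned int fake_event_begin(unsigned int comb)
{
	return comb + 1;
}

/*
 * Models pm_wakeup_enable_suspend(): adding MAX_IN_PROGRESS is the same
 * as adding (1 << IN_PROGRESS_BITS) - 1, which increments the registered
 * count and decrements the in-progress count in a single addition.
 */
static unsigned int fake_event_end(unsigned int comb)
{
	return comb + MAX_IN_PROGRESS;
}
```

While the fake event is in progress, pm_wakeup_pending() sees a nonzero
in-progress count and every sleep transition is aborted.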

The name of the new attribute (sleep_mode) is chosen so that it
can be used for different purposes (e.g. adding a suspend mode
in which multiple processes have to agree on whether or not to
suspend) in the future.
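
Reading the attribute returns all modes on one line with the current one
in square brackets (e.g. "disabled [direct]"), as implemented by
sleep_mode_show() in this patch.  A user-space reader could extract the
current mode roughly as follows; sleep_mode_current() is a hypothetical
helper, not something the patch provides.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (current) mode out of a line such as
 * "disabled [direct]\n" into @out as a NUL-terminated string.
 * Returns 0 on success, -1 if there is no well-formed "[mode]" token
 * or if @out is too small to hold it.
 */
static int sleep_mode_current(const char *line, char *out, size_t outlen)
{
	const char *start = strchr(line, '[');
	const char *end;
	size_t len;

	if (!start)
		return -1;

	start++;
	end = strchr(start, ']');
	if (!end)
		return -1;

	len = (size_t)(end - start);
	if (len == 0 || len >= outlen)
		return -1;

	memcpy(out, start, len);
	out[len] = '\0';
	return 0;
}
```

Because the parser only looks for the brackets, it keeps working if more
modes are added to the list later.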

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 Documentation/ABI/testing/sysfs-power |   28 +++++++++--
 drivers/base/power/wakeup.c           |   75 +++++++++++++++++++++++++------
 include/linux/suspend.h               |    2 
 kernel/power/hibernate.c              |    6 ++
 kernel/power/main.c                   |   82 ++++++++++++++++++++++++++++++++++
 kernel/power/suspend.c                |    5 ++
 kernel/power/user.c                   |    6 ++
 7 files changed, 187 insertions(+), 17 deletions(-)

Index: linux/kernel/power/main.c
===================================================================
--- linux.orig/kernel/power/main.c
+++ linux/kernel/power/main.c
@@ -70,6 +70,87 @@ static ssize_t pm_async_store(struct kob
 
 power_attr(pm_async);
 
+enum sleep_mode {
+	PM_SLEEP_DISABLED = 0,
+	PM_SLEEP_DIRECT,
+};
+
+#define PM_SLEEP_LAST	PM_SLEEP_DIRECT
+
+static const char * const pm_sleep_modes[__TEST_AFTER_LAST] = {
+	[PM_SLEEP_DISABLED] = "disabled",
+	[PM_SLEEP_DIRECT] = "direct",
+};
+
+static enum sleep_mode pm_sleep_mode = PM_SLEEP_DIRECT;
+
+static ssize_t sleep_mode_show(struct kobject *kobj,
+				  struct kobj_attribute *attr,
+				  char *buf)
+{
+	char *s = buf;
+	enum sleep_mode mode;
+
+	for (mode = PM_SLEEP_DISABLED; mode <= PM_SLEEP_LAST; mode++) {
+		if (mode == pm_sleep_mode)
+			s += sprintf(s, "[%s] ", pm_sleep_modes[mode]);
+		else
+			s += sprintf(s, "%s ", pm_sleep_modes[mode]);
+	}
+
+	if (s != buf)
+		/* convert the last space to a newline */
+		*(s-1) = '\n';
+
+	return (s - buf);
+}
+
+static ssize_t sleep_mode_store(struct kobject *kobj,
+				   struct kobj_attribute *attr,
+				   const char *buf, size_t n)
+{
+	const char * const *s;
+	enum sleep_mode mode;
+	char *p;
+	int len;
+	int error = -EINVAL;
+
+	p = memchr(buf, '\n', n);
+	len = p ? p - buf : n;
+
+	error = mutex_lock_interruptible(&pm_mutex);
+	if (error)
+		return error;
+
+	mode = PM_SLEEP_DISABLED;
+	for (s = &pm_sleep_modes[mode]; mode <= PM_SLEEP_LAST; s++, mode++)
+		if (*s && len == strlen(*s) && !strncmp(buf, *s, len)) {
+			if (pm_sleep_mode != mode)
+				goto set_mode;
+
+			error = 0;
+			break;
+		}
+
+	mutex_unlock(&pm_mutex);
+
+	return error;
+
+ set_mode:
+	if (mode == PM_SLEEP_DISABLED)
+		pm_wakeup_disable_suspend();
+	else
+		pm_wakeup_enable_suspend();
+
+	pm_sleep_mode = mode;
+
+	mutex_unlock(&pm_mutex);
+
+	return n;
+}
+
+power_attr(sleep_mode);
+
 #ifdef CONFIG_PM_DEBUG
 int pm_test_level = TEST_NONE;
 
@@ -407,6 +488,7 @@ static struct attribute * g[] = {
 	&pm_trace_dev_match_attr.attr,
 #endif
 #ifdef CONFIG_PM_SLEEP
+	&sleep_mode_attr.attr,
 	&pm_async_attr.attr,
 	&wakeup_count_attr.attr,
 #ifdef CONFIG_PM_DEBUG
Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -25,6 +25,13 @@
 bool events_check_enabled;
 
 /*
+ * If set, events_check_enabled will be reset to 'false' every time by
+ * pm_wakeup_pending() if there are wakeup events in progress or new registered
+ * wakeup events and will be modified by pm_save_wakeup_count().
+ */
+static bool reset_events_check = true;
+
+/*
  * Combined counters of registered wakeup events and wakeup events in progress.
  * They need to be modified together atomically, so it's better to use one
  * atomic variable to hold them both.
@@ -601,7 +608,8 @@ bool pm_wakeup_pending(void)
 
 		split_counters(&cnt, &inpr);
 		ret = (cnt != saved_count || inpr > 0);
-		events_check_enabled = !ret;
+		if (reset_events_check)
+			events_check_enabled = !ret;
 	}
 	spin_unlock_irqrestore(&events_lock, flags);
 	if (ret)
@@ -641,29 +649,70 @@ bool pm_get_wakeup_count(unsigned int *c
  * pm_save_wakeup_count - Save the current number of registered wakeup events.
  * @count: Value to compare with the current number of registered wakeup events.
  *
- * If @count is equal to the current number of registered wakeup events and the
- * current number of wakeup events being processed is zero, store @count as the
- * old number of registered wakeup events for pm_check_wakeup_events(), enable
- * wakeup events detection and return 'true'.  Otherwise disable wakeup events
- * detection and return 'false'.
+ * Save the current value of the registered wakeup events counter for future
+ * checking if new wakeup events have been registered.
+ *
+ * Additionally, check if @count is equal to the current number of registered
+ * wakeup events and if the number of wakeup events in progress is 0.  If both
+ * conditions are satisfied, return 'true' (otherwise, 'false' is returned).
+ *
+ * If reset_events_check is set, change the wakeup events detection setting to
+ * reflect the return value.
  */
 bool pm_save_wakeup_count(unsigned int count)
 {
 	unsigned int cnt, inpr;
+	bool ret;
 
-	events_check_enabled = false;
 	spin_lock_irq(&events_lock);
+
 	split_counters(&cnt, &inpr);
-	if (cnt == count && inpr == 0) {
-		saved_count = count;
-		events_check_enabled = true;
-	}
+	saved_count = cnt;
+	ret = cnt == count && inpr == 0;
+	if (reset_events_check)
+		events_check_enabled = ret;
+
 	spin_unlock_irq(&events_lock);
-	if (!events_check_enabled)
+
+	if (!ret)
 		pm_wakeup_update_hit_counts();
-	return events_check_enabled;
+
+	return ret;
+}
+
+/**
+ * pm_wakeup_disable_suspend - Prevent the system from going to sleep.
+ */
+void pm_wakeup_disable_suspend(void)
+{
+	spin_lock_irq(&events_lock);
+	events_check_enabled = true;
+	reset_events_check = false;
+	spin_unlock_irq(&events_lock);
+	/* Increment the counter of events in progress. */
+	atomic_inc(&combined_event_count);
+}
+
+/**
+ * pm_wakeup_enable_suspend - Allow the system to be put into a sleep state.
+ *
+ * Detection of wakeup events needs to be re-enabled if desirable after this
+ * function has returned.
+ */
+void pm_wakeup_enable_suspend(void)
+{
+	spin_lock_irq(&events_lock);
+	reset_events_check = true;
+	events_check_enabled = false;
+	spin_unlock_irq(&events_lock);
+	/*
+	 * Increment the counter of registered wakeup events and decrement the
+	 * counter of wakeup events in progress simultaneously.
+	 */
+	atomic_add(MAX_IN_PROGRESS, &combined_event_count);
 }
 
+
 static struct dentry *wakeup_sources_stats_dentry;
 
 /**
Index: linux/include/linux/suspend.h
===================================================================
--- linux.orig/include/linux/suspend.h
+++ linux/include/linux/suspend.h
@@ -351,6 +351,8 @@ extern bool events_check_enabled;
 extern bool pm_wakeup_pending(void);
 extern bool pm_get_wakeup_count(unsigned int *count);
 extern bool pm_save_wakeup_count(unsigned int count);
+extern void pm_wakeup_disable_suspend(void);
+extern void pm_wakeup_enable_suspend(void);
 #else /* !CONFIG_PM_SLEEP */
 
 static inline int register_pm_notifier(struct notifier_block *nb)
Index: linux/kernel/power/suspend.c
===================================================================
--- linux.orig/kernel/power/suspend.c
+++ linux/kernel/power/suspend.c
@@ -284,6 +284,11 @@ int enter_state(suspend_state_t state)
 	if (!mutex_trylock(&pm_mutex))
 		return -EBUSY;
 
+	if (pm_wakeup_pending()) {
+		error = -EAGAIN;
+		goto Unlock;
+	}
+
 	printk(KERN_INFO "PM: Syncing filesystems ... ");
 	sys_sync();
 	printk("done.\n");
Index: linux/kernel/power/hibernate.c
===================================================================
--- linux.orig/kernel/power/hibernate.c
+++ linux/kernel/power/hibernate.c
@@ -612,6 +612,12 @@ int hibernate(void)
 	int error;
 
 	mutex_lock(&pm_mutex);
+
+	if (pm_wakeup_pending()) {
+		error = -EAGAIN;
+		goto Unlock;
+	}
+
 	/* The snapshot device should not be opened while we're running */
 	if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
 		error = -EBUSY;
Index: linux/kernel/power/user.c
===================================================================
--- linux.orig/kernel/power/user.c
+++ linux/kernel/power/user.c
@@ -239,6 +239,11 @@ static long snapshot_ioctl(struct file *
 	if (!mutex_trylock(&pm_mutex))
 		return -EBUSY;
 
+	if (pm_wakeup_pending()) {
+		error = -EAGAIN;
+		goto unlock;
+	}
+
 	data = filp->private_data;
 
 	switch (cmd) {
@@ -458,6 +463,7 @@ static long snapshot_ioctl(struct file *
 
 	}
 
+ unlock:
 	mutex_unlock(&pm_mutex);
 
 	return error;
Index: linux/Documentation/ABI/testing/sysfs-power
===================================================================
--- linux.orig/Documentation/ABI/testing/sysfs-power
+++ linux/Documentation/ABI/testing/sysfs-power
@@ -154,10 +154,14 @@ Description:
 		the current number of registered wakeup events and it blocks if
 		some wakeup events are being processed at the time the file is
 		read from.  Writing to it will only succeed if the current
-		number of wakeup events is equal to the written value and, if
-		successful, will make the kernel abort a subsequent transition
-		to a sleep state if any wakeup events are reported after the
-		write has returned.
+		number of registered wakeup events is equal to the written
+		value, but the way it works depends on the /sys/power/sleep_mode
+		setting.  Namely, in the "direct" mode, if the write has been
+		successful, it will make the kernel abort a subsequent
+		transition to a sleep state if any wakeup events are reported
+		after the write has returned.  In the "disabled" mode it only
+		saves the current value of registered wakeup events to be used
+		for future checking if new wakeup events have been registered.
 
 What:		/sys/power/reserved_size
 Date:		May 2011
@@ -172,3 +176,19 @@ Description:
 
 		Reading from this file will display the current value, which is
 		set to 1 MB by default.
+
+
+What:		/sys/power/sleep_mode
+Date:		October 2011
+Contact:	Rafael J. Wysocki <rjw@sisk.pl>
+Description:
+		The /sys/power/sleep_mode file allows user space to disable all
+		of the suspend/hibernation interfaces by writing "disabled" to
+		it and to enable them again by writing "direct" to it.  If the
+		string corresponding to the current setting is written to this
+		file, the write returns 0.  Otherwise, the number of characters
+		written is returned.
+
+		Reading from this file returns the list of available values
+		("disabled" and "direct") with the current one in square
+		brackets.


* [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode
  2011-10-13 19:45 [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces Rafael J. Wysocki
  2011-10-13 19:49 ` [RFC][PATCH 1/2] PM / Sleep: Add mechanism to disable suspend and hibernation Rafael J. Wysocki
@ 2011-10-13 19:50 ` Rafael J. Wysocki
  2011-10-13 22:58   ` John Stultz
  2011-10-14  5:52 ` [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces NeilBrown
  2011-10-31 19:55 ` Ming Lei
  3 siblings, 1 reply; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-13 19:50 UTC
  To: Linux PM list; +Cc: mark gross, LKML, John Stultz, Alan Stern, NeilBrown

From: Rafael J. Wysocki <rjw@sisk.pl>

The currently available mechanism allowing the suspend process to
avoid racing with wakeup events registered by the kernel appears
to be difficult to use.  Moreover, it requires that the suspend
process communicate with other user space processes that may take
part in the handling of wakeup events to make sure that they have
done their job before suspend is started.  Therefore all of the
wakeup-handling applications are expected to use an IPC mechanism
allowing them to exchange information with the suspend process, but
this expectation turns out to be unrealistic in practice.  For this
reason, it seems reasonable to add to the kernel a mechanism allowing
the wakeup-handling processes to communicate with the suspend
process.

This change introduces a new sleep mode, called "cooperative", which
is selected via the /sys/power/sleep_mode sysfs attribute and which,
among other things, keeps detection of wakeup events always enabled.
It also introduces a mechanism allowing user space processes to
prevent the system from being put into a sleep state while in this
mode.

The mechanism introduced by this change is based on a new special
device file, /dev/sleepctl.  A process wanting to prevent the system
from being put into a sleep state is expected to open /dev/sleepctl
and execute the SLEEPCTL_STAY_AWAKE ioctl() on it.
This will make all attempts to suspend or hibernate the system block
until (1) the process executes the SLEEPCTL_RELAX ioctl() or (2)
a predefined timeout expires.  The timeout is set to 500 ms by
default, but the process can change it by writing the new timeout
value (in milliseconds) to /dev/sleepctl, in binary (unsigned int)
format.  The current timeout value can be read from /dev/sleepctl.
Setting the timeout to 0 disables it, i.e. it makes the
SLEEPCTL_STAY_AWAKE ioctl() block attempts to suspend or hibernate
the system until the SLEEPCTL_RELAX ioctl() is executed.
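
A minimal user-space sketch of the intended usage might look like the
code below.  The ioctl numbers are taken from the suspend_ioctls.h hunk
of this patch; the value of SNAPSHOT_IOC_MAGIC ('3') is assumed from the
existing header, and the helper names and the work() callback are
hypothetical.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed to match include/linux/suspend_ioctls.h. */
#ifndef SNAPSHOT_IOC_MAGIC
#define SNAPSHOT_IOC_MAGIC	'3'
#endif
/* ioctl numbers as defined by this patch. */
#define SLEEPCTL_STAY_AWAKE	_IO(SNAPSHOT_IOC_MAGIC, 31)
#define SLEEPCTL_RELAX		_IO(SNAPSHOT_IOC_MAGIC, 32)

/* Set the timeout in milliseconds (0 disables it).  Returns 0 or -1. */
static int sleepctl_set_timeout(int fd, unsigned int ms)
{
	ssize_t ret = write(fd, &ms, sizeof(ms));

	return ret == (ssize_t)sizeof(ms) ? 0 : -1;
}

/* Read the current timeout in milliseconds.  Returns 0 or -1. */
static int sleepctl_get_timeout(int fd, unsigned int *ms)
{
	ssize_t ret = read(fd, ms, sizeof(*ms));

	return ret == (ssize_t)sizeof(*ms) ? 0 : -1;
}

/*
 * Run @work() while blocking suspend and hibernation.  @fd is an open
 * descriptor for /dev/sleepctl; @work is a hypothetical callback, e.g.
 * the backup job from the cover letter.
 */
static int sleepctl_run_blocked(int fd, int (*work)(void))
{
	int ret;

	/* Writing the timeout also relaxes, so do it before STAY_AWAKE. */
	if (sleepctl_set_timeout(fd, 0))
		return -1;

	if (ioctl(fd, SLEEPCTL_STAY_AWAKE))
		return -1;

	ret = work();

	if (ioctl(fd, SLEEPCTL_RELAX))
		return -1;

	return ret;
}
```

With the timeout disabled, the block lasts until SLEEPCTL_RELAX is
executed or the file descriptor is closed, so a crashed process cannot
keep the system awake indefinitely.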

In addition to that, when the system is resuming from suspend or
hibernation, the kernel automatically carries out an operation
equivalent to the SLEEPCTL_STAY_AWAKE ioctl() for all processes
that have /dev/sleepctl open at that time and whose timeouts are
greater than 0 (i.e. enabled), to allow those processes to
complete the handling of wakeup events before the system can be
put to a sleep state again.
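
The re-arming rule can be modelled in a few lines of plain C.  This is
only a user-space model of the behaviour described above, with made-up
names; the actual implementation is sleepctl_rearm() in
kernel/power/sleepctl.c and also starts the per-client timers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Model of one open /dev/sleepctl file. */
struct sleepctl_client {
	unsigned int	delay_ms;	/* 0 means the timeout is disabled */
	bool		active;		/* currently blocking sleep? */
};

/*
 * On resume, every inactive client with a nonzero timeout starts
 * blocking sleep transitions.  Returns the number of clients armed.
 */
static size_t model_rearm(struct sleepctl_client *clients, size_t n)
{
	size_t i, armed = 0;

	for (i = 0; i < n; i++) {
		if (!clients[i].active && clients[i].delay_ms > 0) {
			clients[i].active = true;
			armed++;
		}
	}
	return armed;
}

/* A sleep transition has to wait while any client is active. */
static bool model_sleep_blocked(const struct sleepctl_client *clients,
				size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (clients[i].active)
			return true;
	}
	return false;
}
```

Clients that disabled their timeout are not re-armed automatically, since
nothing would release them again without an explicit SLEEPCTL_RELAX.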

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 Documentation/ABI/testing/sysfs-power |   34 +++-
 drivers/base/power/wakeup.c           |   38 +++-
 include/linux/suspend.h               |    6 
 include/linux/suspend_ioctls.h        |    4 
 kernel/power/Makefile                 |    1 
 kernel/power/hibernate.c              |   11 -
 kernel/power/main.c                   |  112 +++++++++++++
 kernel/power/power.h                  |    5 
 kernel/power/sleepctl.c               |  275 ++++++++++++++++++++++++++++++++++
 kernel/power/suspend.c                |   12 -
 kernel/power/user.c                   |   12 -
 11 files changed, 460 insertions(+), 50 deletions(-)

Index: linux/drivers/base/power/wakeup.c
===================================================================
--- linux.orig/drivers/base/power/wakeup.c
+++ linux/drivers/base/power/wakeup.c
@@ -681,30 +681,49 @@ bool pm_save_wakeup_count(unsigned int c
 }
 
 /**
- * pm_wakeup_disable_suspend - Prevent the system from going to sleep.
+ * pm_wakeup_enable_events_checking - Turn on wakeup events detection.
  */
-void pm_wakeup_disable_suspend(void)
+void pm_wakeup_enable_events_checking(void)
 {
 	spin_lock_irq(&events_lock);
 	events_check_enabled = true;
 	reset_events_check = false;
 	spin_unlock_irq(&events_lock);
-	/* Increment the counter of events in progress. */
-	atomic_inc(&combined_event_count);
 }
 
 /**
- * pm_wakeup_enable_suspend - Allow the system to be put into a sleep state.
- *
- * Detection of wakeup events needs to be re-enabled if desirable after this
- * function has returned.
+ * pm_wakeup_disable_events_checking - Turn off wakeup events detection.
  */
-void pm_wakeup_enable_suspend(void)
+void pm_wakeup_disable_events_checking(void)
 {
 	spin_lock_irq(&events_lock);
 	reset_events_check = true;
 	events_check_enabled = false;
 	spin_unlock_irq(&events_lock);
+}
+
+/**
+ * pm_wakeup_disable_suspend - Prevent the system from going to sleep.
+ * @enable_events_check: Whether or not to turn on wakeup events detection.
+ */
+void pm_wakeup_disable_suspend(bool enable_events_check)
+{
+	if (enable_events_check)
+		pm_wakeup_enable_events_checking();
+
+	/* Increment the counter of events in progress. */
+	atomic_inc(&combined_event_count);
+}
+
+/**
+ * pm_wakeup_enable_suspend - Allow the system to be put into a sleep state.
+ * @disable_events_check: Whether or not to turn off wakeup events detection.
+ */
+void pm_wakeup_enable_suspend(bool disable_events_check)
+{
+	if (disable_events_check)
+		pm_wakeup_disable_events_checking();
+
 	/*
 	 * Increment the counter of registered wakeup events and decrement the
 	 * counter of wakeup events in progress simultaneously.
@@ -712,7 +731,6 @@ void pm_wakeup_enable_suspend(void)
 	atomic_add(MAX_IN_PROGRESS, &combined_event_count);
 }
 
-
 static struct dentry *wakeup_sources_stats_dentry;
 
 /**
Index: linux/include/linux/suspend.h
===================================================================
--- linux.orig/include/linux/suspend.h
+++ linux/include/linux/suspend.h
@@ -351,8 +351,10 @@ extern bool events_check_enabled;
 extern bool pm_wakeup_pending(void);
 extern bool pm_get_wakeup_count(unsigned int *count);
 extern bool pm_save_wakeup_count(unsigned int count);
-extern void pm_wakeup_disable_suspend(void);
-extern void pm_wakeup_enable_suspend(void);
+extern void pm_wakeup_enable_events_checking(void);
+extern void pm_wakeup_disable_events_checking(void);
+extern void pm_wakeup_disable_suspend(bool enable_events_check);
+extern void pm_wakeup_enable_suspend(bool disable_events_check);
 #else /* !CONFIG_PM_SLEEP */
 
 static inline int register_pm_notifier(struct notifier_block *nb)
Index: linux/kernel/power/main.c
===================================================================
--- linux.orig/kernel/power/main.c
+++ linux/kernel/power/main.c
@@ -70,16 +70,26 @@ static ssize_t pm_async_store(struct kob
 
 power_attr(pm_async);
 
+static void pm_update_saved_wakeup_count(void)
+{
+	unsigned int wakeup_count;
+
+	pm_get_wakeup_count(&wakeup_count);
+	pm_save_wakeup_count(wakeup_count);
+}
+
 enum sleep_mode {
 	PM_SLEEP_DISABLED = 0,
 	PM_SLEEP_DIRECT,
+	PM_SLEEP_COOPERATIVE,
 };
 
-#define PM_SLEEP_LAST	PM_SLEEP_DIRECT
+#define PM_SLEEP_LAST	PM_SLEEP_COOPERATIVE
 
 static const char * const pm_sleep_modes[__TEST_AFTER_LAST] = {
 	[PM_SLEEP_DISABLED] = "disabled",
 	[PM_SLEEP_DIRECT] = "direct",
+	[PM_SLEEP_COOPERATIVE] = "cooperative",
 };
 
 static enum sleep_mode pm_sleep_mode = PM_SLEEP_DIRECT;
@@ -137,11 +147,26 @@ static ssize_t sleep_mode_store(struct k
 	return error;
 
  set_mode:
-	if (mode == PM_SLEEP_DISABLED)
-		pm_wakeup_disable_suspend();
-	else
-		pm_wakeup_enable_suspend();
-
+	switch (mode) {
+	case PM_SLEEP_DISABLED:
+		pm_wakeup_disable_suspend(pm_sleep_mode == PM_SLEEP_DIRECT);
+		break;
+
+	case PM_SLEEP_COOPERATIVE:
+		if (pm_sleep_mode == PM_SLEEP_DISABLED)
+			pm_wakeup_enable_suspend(false);
+		else /* PM_SLEEP_DIRECT */
+			pm_wakeup_enable_events_checking();
+
+		pm_update_saved_wakeup_count();
+		break;
+
+	default: /* PM_SLEEP_DIRECT */
+		if (pm_sleep_mode == PM_SLEEP_DISABLED)
+			pm_wakeup_enable_suspend(false);
+		else /* PM_SLEEP_COOPERATIVE */
+			pm_wakeup_disable_events_checking();
+	}
 	pm_sleep_mode = mode;
 
 	mutex_unlock(&pm_mutex);
@@ -151,6 +176,81 @@ static ssize_t sleep_mode_store(struct k
 
 power_attr(sleep_mode);
 
+static DECLARE_WAIT_QUEUE_HEAD(pm_sleep_wait_queue);
+
+/**
+ * pm_sleep_begin - Prepare to start a transition into a sleep state.
+ *
+ * Acquire pm_mutex and if the current sleep mode is not "cooperative", return
+ * an error code if new wakeup events have been registered or 0 otherwise.
+ *
+ * If the current sleep mode is "cooperative", wait until all users of
+ * /dev/sleepctl allow us to continue or new wakeup events are registered.
+ */
+int pm_sleep_begin(void)
+{
+	DEFINE_WAIT(wait);
+	int error;
+
+	error = mutex_lock_interruptible(&pm_mutex);
+	if (error)
+		return error;
+
+	if (pm_sleep_mode != PM_SLEEP_COOPERATIVE) {
+		if (pm_wakeup_pending()) {
+			mutex_unlock(&pm_mutex);
+			return -EAGAIN;
+		}
+		return 0;
+	}
+
+	while (!error) {
+		prepare_to_wait(&pm_sleep_wait_queue, &wait,
+				TASK_INTERRUPTIBLE);
+		if (!sleepctl_active())
+			break;
+
+		mutex_unlock(&pm_mutex);
+
+		schedule();
+
+		error = mutex_lock_interruptible(&pm_mutex);
+	}
+	finish_wait(&pm_sleep_wait_queue, &wait);
+	if (error)
+		return error;
+
+	return pm_wakeup_pending() ? -EAGAIN : 0;
+}
+
+/**
+ * pm_sleep_continue - Wake up a waiting suspend/hibernate process.
+ */
+void pm_sleep_continue(void)
+{
+	wake_up_all(&pm_sleep_wait_queue);
+}
+
+/**
+ * pm_sleep_end - Finalize a transition from a sleep state to the working state.
+ *
+ * If the current sleep mode is "cooperative", save the current value of the
+ * registered wakeup events counter for future use and make all processes
+ * having /dev/sleepctl open whose sleep delay fields are different from 0
+ * block subsequent pm_sleep_begin() calls.
+ *
+ * Release pm_mutex.
+ */
+void pm_sleep_end(void)
+{
+	if (pm_sleep_mode == PM_SLEEP_COOPERATIVE) {
+		pm_update_saved_wakeup_count();
+		sleepctl_rearm();
+	}
+
+	mutex_unlock(&pm_mutex);
+}
+
 #ifdef CONFIG_PM_DEBUG
 int pm_test_level = TEST_NONE;
 
Index: linux/include/linux/suspend_ioctls.h
===================================================================
--- linux.orig/include/linux/suspend_ioctls.h
+++ linux/include/linux/suspend_ioctls.h
@@ -30,4 +30,8 @@ struct resume_swap_area {
 #define SNAPSHOT_ALLOC_SWAP_PAGE	_IOR(SNAPSHOT_IOC_MAGIC, 20, __kernel_loff_t)
 #define SNAPSHOT_IOC_MAXNR	20
 
+#define SLEEPCTL_STAY_AWAKE		_IO(SNAPSHOT_IOC_MAGIC, 31)
+#define SLEEPCTL_RELAX			_IO(SNAPSHOT_IOC_MAGIC, 32)
+#define SLEEPCTL_SET_DELAY	32
+
 #endif /* _LINUX_SUSPEND_IOCTLS_H */
Index: linux/kernel/power/Makefile
===================================================================
--- linux.orig/kernel/power/Makefile
+++ linux/kernel/power/Makefile
@@ -8,5 +8,6 @@ obj-$(CONFIG_SUSPEND)		+= suspend.o
 obj-$(CONFIG_PM_TEST_SUSPEND)	+= suspend_test.o
 obj-$(CONFIG_HIBERNATION)	+= hibernate.o snapshot.o swap.o user.o \
 				   block_io.o
+obj-$(CONFIG_PM_SLEEP)		+= sleepctl.o
 
 obj-$(CONFIG_MAGIC_SYSRQ)	+= poweroff.o
Index: linux/kernel/power/sleepctl.c
===================================================================
--- /dev/null
+++ linux/kernel/power/sleepctl.c
@@ -0,0 +1,275 @@
+/*
+ * kernel/power/sleepctl.c
+ *
+ * This file provides definitions of sleepctl special device file operations.
+ *
+ * Copyright (C) 2011 Rafael J. Wysocki <rjw@sisk.pl>, SUSE Labs
+ *
+ * This file is released under the GPLv2.
+ *
+ */
+
+#include <linux/suspend.h>
+#include <linux/syscalls.h>
+#include <linux/string.h>
+#include <linux/device.h>
+#include <linux/miscdevice.h>
+#include <linux/pm.h>
+#include <linux/slab.h>
+
+#include "power.h"
+
+#define DEFAULT_SUSPEND_DELAY_MS	500
+
+static LIST_HEAD(sleepctl_list);
+static LIST_HEAD(sleepctl_active_list);
+
+static DEFINE_SPINLOCK(sleepctl_lock);
+
+struct sleepctl_data {
+	struct list_head	entry;
+	struct timer_list	timer;
+	unsigned int		delay_ms;
+	bool			active;
+};
+
+static bool __sleepctl_active(void)
+{
+	return !(list_empty(&sleepctl_active_list) || pm_wakeup_pending());
+}
+
+/**
+ * sleepctl_active - Check if the system is allowed to go into a sleep state.
+ *
+ * Check if there's any user space process preventing the system from being put
+ * into a sleep state and return 'true' in that case.
+ */
+bool sleepctl_active(void)
+{
+	bool ret;
+
+	spin_lock_irq(&sleepctl_lock);
+	ret = __sleepctl_active();
+	spin_unlock_irq(&sleepctl_lock);
+	return ret;
+}
+
+static void __sleepctl_relax(struct sleepctl_data *data)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&sleepctl_lock, flags);
+
+	if (!data->active)
+		goto unlock;
+
+	data->active = false;
+	list_move_tail(&data->entry, &sleepctl_list);
+
+	if (!__sleepctl_active())
+		pm_sleep_continue();
+
+ unlock:
+	spin_unlock_irqrestore(&sleepctl_lock, flags);
+}
+
+static void sleepctl_timer_fn(unsigned long data)
+{
+	__sleepctl_relax((struct sleepctl_data *)data);
+}
+
+static void sleepctl_relax(struct sleepctl_data *data)
+{
+	del_timer_sync(&data->timer);
+	__sleepctl_relax(data);
+}
+
+static void sleepctl_stay_awake(struct sleepctl_data *data)
+{
+	spin_lock_irq(&sleepctl_lock);
+
+	if (data->active)
+		goto unlock;
+
+	data->active = true;
+	list_move_tail(&data->entry, &sleepctl_active_list);
+	if (data->delay_ms > 0)
+		mod_timer(&data->timer,
+			  jiffies + msecs_to_jiffies(data->delay_ms));
+
+ unlock:
+	spin_unlock_irq(&sleepctl_lock);
+}
+
+/**
+ * sleepctl_rearm - Make the users of /dev/sleepctl block sleep transitions.
+ *
+ * Make all processes having /dev/sleepctl open whose delay_ms fields are
+ * nonzero prevent the system from being put into sleep states.
+ */
+void sleepctl_rearm(void)
+{
+	struct sleepctl_data *data, *n;
+
+	list_for_each_entry_safe(data, n, &sleepctl_list, entry) {
+		if (data->delay_ms > 0)
+			sleepctl_stay_awake(data);
+	}
+}
+
+static int sleepctl_open(struct inode *inode, struct file *filp)
+{
+	struct sleepctl_data *data;
+	int error;
+
+	error = mutex_lock_interruptible(&pm_mutex);
+	if (error)
+		return error;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data) {
+		error = -ENOMEM;
+		goto err_out;
+	}
+
+	spin_lock_irq(&sleepctl_lock);
+	list_add_tail(&data->entry, &sleepctl_list);
+	spin_unlock_irq(&sleepctl_lock);
+
+	data->delay_ms = DEFAULT_SUSPEND_DELAY_MS;
+	setup_timer(&data->timer, sleepctl_timer_fn, (unsigned long)data);
+
+	nonseekable_open(inode, filp);
+	filp->private_data = data;
+
+ err_out:
+	mutex_unlock(&pm_mutex);
+
+	return error;
+}
+
+static int sleepctl_release(struct inode *inode, struct file *filp)
+{
+	struct sleepctl_data *data;
+	int error;
+
+	error = mutex_lock_interruptible(&pm_mutex);
+	if (error)
+		return error;
+
+	data = filp->private_data;
+	if (data->active)
+		sleepctl_relax(data);
+
+	filp->private_data = NULL;
+
+	spin_lock_irq(&sleepctl_lock);
+	list_del(&data->entry);
+	spin_unlock_irq(&sleepctl_lock);
+
+	kfree(data);
+
+	mutex_unlock(&pm_mutex);
+
+	return 0;
+}
+
+static ssize_t sleepctl_read(struct file *filp, char __user *buf,
+			     size_t count, loff_t *offp)
+{
+	struct sleepctl_data *data;
+	ssize_t res;
+
+	res = mutex_lock_interruptible(&pm_mutex);
+	if (res)
+		return res;
+
+	data = filp->private_data;
+	res = sizeof(unsigned int);
+	if (copy_to_user(buf, &data->delay_ms, res))
+		res = -EFAULT;
+
+	mutex_unlock(&pm_mutex);
+
+	return res;
+}
+
+static ssize_t sleepctl_write(struct file *filp, const char __user *buf,
+			      size_t count, loff_t *offp)
+{
+	struct sleepctl_data *data;
+	ssize_t res;
+	unsigned int delay_ms;
+
+	res = mutex_lock_interruptible(&pm_mutex);
+	if (res)
+		return res;
+
+	res = sizeof(unsigned int);
+	if (copy_from_user(&delay_ms, buf, res)) {
+		res = -EFAULT;
+	} else {
+		data = filp->private_data;
+		sleepctl_relax(data);
+		data->delay_ms = delay_ms;
+	}
+
+	mutex_unlock(&pm_mutex);
+
+	return res;
+}
+
+static long sleepctl_ioctl(struct file *filp, unsigned int cmd,
+			   unsigned long arg)
+{
+	struct sleepctl_data *data;
+	int error;
+
+	error = mutex_lock_interruptible(&pm_mutex);
+	if (error)
+		return error;
+
+	data = filp->private_data;
+
+	switch (cmd) {
+
+	case SLEEPCTL_STAY_AWAKE:
+		del_timer_sync(&data->timer);
+		sleepctl_stay_awake(data);
+		break;
+
+	case SLEEPCTL_RELAX:
+		sleepctl_relax(data);
+		break;
+
+	default:
+		error = -ENOTTY;
+
+	}
+
+	mutex_unlock(&pm_mutex);
+
+	return error;
+}
+
+static const struct file_operations sleepctl_fops = {
+	.open = sleepctl_open,
+	.release = sleepctl_release,
+	.read = sleepctl_read,
+	.write = sleepctl_write,
+	.llseek = no_llseek,
+	.unlocked_ioctl = sleepctl_ioctl,
+};
+
+static struct miscdevice sleepctl_device = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "sleepctl",
+	.fops = &sleepctl_fops,
+};
+
+static int __init sleepctl_device_init(void)
+{
+	return misc_register(&sleepctl_device);
+}
+
+device_initcall(sleepctl_device_init);
Index: linux/kernel/power/suspend.c
===================================================================
--- linux.orig/kernel/power/suspend.c
+++ linux/kernel/power/suspend.c
@@ -281,13 +281,9 @@ int enter_state(suspend_state_t state)
 	if (!valid_state(state))
 		return -ENODEV;
 
-	if (!mutex_trylock(&pm_mutex))
-		return -EBUSY;
-
-	if (pm_wakeup_pending()) {
-		error = -EAGAIN;
-		goto Unlock;
-	}
+	error = pm_sleep_begin();
+	if (error)
+		return error;
 
 	printk(KERN_INFO "PM: Syncing filesystems ... ");
 	sys_sync();
@@ -310,7 +306,7 @@ int enter_state(suspend_state_t state)
 	pr_debug("PM: Finishing wakeup.\n");
 	suspend_finish();
  Unlock:
-	mutex_unlock(&pm_mutex);
+	pm_sleep_end();
 	return error;
 }
 
Index: linux/kernel/power/hibernate.c
===================================================================
--- linux.orig/kernel/power/hibernate.c
+++ linux/kernel/power/hibernate.c
@@ -611,12 +611,9 @@ int hibernate(void)
 {
 	int error;
 
-	mutex_lock(&pm_mutex);
-
-	if (pm_wakeup_pending()) {
-		error = -EAGAIN;
-		goto Unlock;
-	}
+	error = pm_sleep_begin();
+	if (error)
+		return error;
 
 	/* The snapshot device should not be opened while we're running */
 	if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
@@ -684,7 +681,7 @@ int hibernate(void)
 	pm_restore_console();
 	atomic_inc(&snapshot_device_available);
  Unlock:
-	mutex_unlock(&pm_mutex);
+	pm_sleep_end();
 	return error;
 }
 
Index: linux/kernel/power/user.c
===================================================================
--- linux.orig/kernel/power/user.c
+++ linux/kernel/power/user.c
@@ -236,13 +236,9 @@ static long snapshot_ioctl(struct file *
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
-	if (!mutex_trylock(&pm_mutex))
-		return -EBUSY;
-
-	if (pm_wakeup_pending()) {
-		error = -EAGAIN;
-		goto unlock;
-	}
+	error = pm_sleep_begin();
+	if (error)
+		return error;
 
 	data = filp->private_data;
 
@@ -464,7 +460,7 @@ static long snapshot_ioctl(struct file *
 	}
 
  unlock:
-	mutex_unlock(&pm_mutex);
+	pm_sleep_end();
 
 	return error;
 }
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -196,6 +196,11 @@ static inline void suspend_test_finish(c
 #ifdef CONFIG_PM_SLEEP
 /* kernel/power/main.c */
 extern int pm_notifier_call_chain(unsigned long val);
+extern int pm_sleep_begin(void);
+extern void pm_sleep_continue(void);
+extern void pm_sleep_end(void);
+extern bool sleepctl_active(void);
+extern void sleepctl_rearm(void);
 #endif
 
 #ifdef CONFIG_HIGHMEM
Index: linux/Documentation/ABI/testing/sysfs-power
===================================================================
--- linux.orig/Documentation/ABI/testing/sysfs-power
+++ linux/Documentation/ABI/testing/sysfs-power
@@ -159,7 +159,7 @@ Description:
 		setting.  Namely, in the "direct" mode, if the write has been
 		successful, it will make the kernel abort a subsequent
 		transition to a sleep state if any wakeup events are reported
-		after the write has returned.  In the "disabled" mode it only
+		after the write has returned.  In the other modes it only
 		saves the current value of registered wakeup events to be used
 		for future checking if new wakeup events have been registered.
 
@@ -182,13 +182,29 @@ What:		/sys/power/sleep_mode
 Date:		October 2011
 Contact:	Rafael J. Wysocki <rjw@sisk.pl>
 Description:
-		The /sys/power/sleep_mode file allows user space to disable all
-		of the suspend/hibernation interfaces by writing "disabled" to
-		it and to enable them again by writing "direct" to it.  If the
-		string corresponding to the current setting is written to this
-		file, the write returns 0.  Otherwise, the number of characters
-		written is returned.
+		The /sys/power/sleep_mode file allows user space to control the
+		way the system suspend and hibernation interfaces work.  The
+		available modes are:
+
+		'disabled':	All of the suspend and hibernation interfaces
+				are disabled and return error codes when used.
+
+		'direct':	Suspend and hibernation interfaces work in the
+				"traditional" way (i.e. a root-owned process can
+				suspend or hibernate the system by writing to
+				/sys/power/state or using /dev/snapshot and the
+				other user space processes can't prevent the
+				system from going into the sleep state).
+
+		'cooperative':	User space processes can cause all attempts to
+				suspend or hibernate the system to block by
+				using the /dev/sleepctl special device file's
+				ioctl()s.
+
+		If the string corresponding to the current setting is written to
+		this file, the write returns 0.  Otherwise, the number of
+		characters written is returned.
 
 		Reading from this file returns the list of available values
-		("disabled" and "direct") with the current one in square
-		brackets.
+		("disabled", "direct", "cooperative") with the current one in
+		square brackets.
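
As an illustration of the read format described above, here is a minimal
user-space sketch in C that extracts the current mode from a line such as
"disabled [direct] cooperative".  The helper name current_sleep_mode() is
hypothetical and the space-separated layout is an assumption; only the
square-bracket convention comes from the ABI text in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (currently selected) mode out of a sleep_mode-style
 * line into out.  Returns 0 on success, -1 if there is no bracketed entry
 * or it does not fit into out.  (Hypothetical helper, not part of the patch.)
 */
int current_sleep_mode(const char *line, char *out, size_t outlen)
{
	const char *start, *end;
	size_t len;

	assert(line && out);

	start = strchr(line, '[');
	if (!start)
		return -1;
	end = strchr(start + 1, ']');
	if (!end)
		return -1;

	len = (size_t)(end - start - 1);	/* characters between the brackets */
	if (len + 1 > outlen)
		return -1;

	memcpy(out, start + 1, len);
	out[len] = '\0';
	return 0;
}
```

A caller would read one line from /sys/power/sleep_mode and pass it in; the
parser itself does not depend on how many modes the kernel lists.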



* Re: [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode
  2011-10-13 19:50 ` [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode Rafael J. Wysocki
@ 2011-10-13 22:58   ` John Stultz
  2011-10-14 22:49     ` Rafael J. Wysocki
  0 siblings, 1 reply; 80+ messages in thread
From: John Stultz @ 2011-10-13 22:58 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM list, mark gross, LKML, Alan Stern, NeilBrown

On Thu, 2011-10-13 at 21:50 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> The currently available mechanism allowing the suspend process to
> avoid racing with wakeup events registered by the kernel appears
> to be difficult to use.  Moreover, it requires that the suspend
> process communicate with other user space processes that may take
> part in the handling of wakeup events to make sure that they have
> done their job before suspend is started.  Therefore all of the
> wakeup-handling applications are expected to use an IPC mechanism
> allowing them to exchange information with the suspend process, but
> this expectation turns out to be unrealistic in practice.  For this
> reason, it seems reasonable to add a mechanism allowing the
> wakeup-handling processes to communicate with the suspend process
> to the kernel.

Hey Rafael! 

I'm *very* excited to see some alternate approaches here, as I'll very
much admit that my proposal does have some complexities. While I still
prefer my approach, I'm pragmatic and would be happy with other
solutions as long as they solve the issue.

I've not yet dug deeply into the code of your patch, but some conceptual
thoughts and issues below.

> This change introduces a new sleep mode, called "cooperative" sleep
> mode, which needs to be selected via the /sys/power/sleep_mode sysfs
> attribute and causes detection of wakeup events to be always
> enabled, among other things, and a mechanism allowing user space
> processes to prevent the system from being put into a sleep state
> while in this mode.
> 
> The mechanism introduced by this change is based on a new special
> device file, /dev/sleepctl.  A process wanting to prevent the system
> from being put into a sleep state is expected to open /dev/sleepctl
> and execute the SLEEPCTL_STAY_AWAKE ioctl() with the help of it.
> This will make all attempts to suspend or hibernate the system block
> until (1) the process executes the SLEEPCTL_RELAX ioctl() or (2)
> a predefined timeout expires.  The timeout is set to 500 ms by
> default, but the process can change it by writing the new timeout
> value (in milliseconds) to /dev/sleepctl, in binary (unsigned int)
> format.

Just a nit, but is there any reason not to use u64 nanosecond value
instead of the jiffies-like granularity and range? Maybe u64 ns is
over-design, but milliseconds are getting a bit coarse these days.

>   The current timeout value can be read from /dev/sleepctl.
> Setting the timeout to 0 disables it, i.e. it makes the
> SLEEPCTL_STAY_AWAKE ioctl() block attempts to suspend or hibernate
> the system until the SLEEPCTL_RELAX ioctl() is executed.
> 
> In addition to that, when the system is resuming from suspend or
> hibernation, the kernel automatically carries out an operation

Only when resuming from suspend/hibernation? Hrmm.. See below for my
concerns about this specifically.

> equivalent to the SLEEPCTL_STAY_AWAKE ioctl() for all processes
> that have /dev/sleepctl open at that time and whose timeouts are
> greater than 0 (i.e. enabled), to allow those processes to
> complete the handling of wakeup events before the system can be
> put to a sleep state again.

So the application pseudocode looks like the following?

Example 1:
----------
sleepfd = open("/dev/sleepctl",...);
devfd = open("/dev/wakeup-button",...);
...
count = read(devfd, buf, bufsize);
ioctl(sleepfd, SLEEP_STAY_AWAKE, 0); /* no timeout */
do_stuff(buf,count);
ioctl(sleepfd, SLEEP_RELAX);

And the assumption is that when *any* wakeup event occurs, even if it's
not the /dev/wakeup-button, the system will stay awake on this
application's behalf for 500ms (or the max value provided to sleepctl).

Then, the hope is that if the wakeup-button did wake the system up, the
application would get woken up from the read() call and hopefully
complete the STAY_AWAKE ioctl within the provided 500ms.

A minor nit, first: With the code above, after we call SLEEP_RELAX, the
timeout has been set to zero, so if we're the only one, the next wakeup
will not actually inhibit suspend for any amount of time. It might be
good to separate the ioctl used to set the timeout length, and the one
to inhibit suspend.

Now, my opinion: So, again, I'd welcome any solution to the problem, but
I'm personally not a big fan of the timeout usage found in this
proposal, as well as the Android wakelocks implementation. It's simply
racy, and would break down under heavy load or when interacting with
cpuhogging SCHED_FIFO tasks. Practically, it can be made to work, but I
worry the extra safety-margins folks will add to the timeouts will
result in inefficient power wasting.

Now, an actual problem: Further, I'm worried this still doesn't address
the main race in the alarmtimer triggered system backup case:

Example 2:
----------
sleepfd = open("/dev/sleepctl",...);
...
/* wait till 5pm */
clock_nanosleep(CLOCK_REALTIME_ALARM, TIMER_ABSTIME, backup_ts); 
ioctl(sleepfd, SLEEP_STAY_AWAKE, 0); /* no timeout */
do_backup();
ioctl(sleepfd, SLEEP_RELAX);

Which is basically identical to the above. At 5pm the alarmtimer fires,
and increments the wakeup_count. 

At the same time, maybe on a different cpu, the PM daemon reads the
updated wakeup_count, writes it back and triggers suspend.

All of this happens before my backup application gets scheduled and can
call the ioctl.
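
The ordering problem described above can be shown with a small toy model in
C.  This is only a sketch: the struct and function names are hypothetical,
and it assumes the simplified rule that writing the sampled count back
succeeds if and only if no event has been counted since the read.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the global wakeup event counter (not kernel code). */
struct wakeup_model {
	unsigned int count;
};

/* A wakeup event, e.g. the alarm timer firing, bumps the counter. */
void model_wakeup_event(struct wakeup_model *w)
{
	assert(w);
	w->count++;
}

/* The PM daemon samples the counter before deciding to suspend. */
unsigned int model_read_count(const struct wakeup_model *w)
{
	assert(w);
	return w->count;
}

/*
 * The PM daemon writes the sampled value back.  In this model the write
 * succeeds only if no event has been counted since the read.
 */
bool model_write_count(const struct wakeup_model *w, unsigned int seen)
{
	assert(w);
	return w->count == seen;
}
```

If the event is counted before the daemon's read, the write-back succeeds
and suspend goes ahead, although the application interested in that event
may not have been scheduled yet.  Only an event that lands between the read
and the write makes the daemon back off.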

In order to avoid this with your approach, I think you're going
to need to have the kernel take the SLEEPCTL_STAY_AWAKE timeout for
every open fd upon *any* wakeup event, even when the system is running
and not just at resume.

The same bad behavior could also be tripped in example #1, with the
wakeup button being pressed while the system was running, right as a
suspend was triggered.

I think this is in part an issue with the "globalness" of the
wakeup_count value. We know an event happened, but we don't know *which*
event, or if anyone was waiting for that event, or if the event has been
consumed. Thus with your approach, it's necessary to use a timeout to try
to cover everyone, since there's not enough knowledge. 

Basically it breaks down to three questions I think we have to answer:
1) What event is being waited on?
2) Who is waiting?
3) Has the event been consumed?

To summarize my understanding of other recently proposed approaches to
this core issue:

Again, in your proposal (if adjusted as I suggest to avoid the backup
race) tasks register their wakeup-interest (#2), by opening the sleepctl
file, and then you inhibit suspend for the maximum specified timeout on
every wakeup event (#1) assuming that gives enough time for whichever
task was waiting on the triggered event to consume it (#3).

Neil's userspace approach (as best as I understand it) tries to resolve
this knowledge issue by requiring *everyone* who might be waiting to
consume wakeup events check-in with the PM daemon prior to *any* suspend
(If the PM daemon is aggressive, trying to suspend frequently, this
results in requiring every consumer to check in on every wakeup event).
So in this model, we get a list of waiters (#2) communicating with the
PM daemon, and for any event (#1), we require all waiters to ack (#3)
that it's ok to suspend.

Mark's approach uses per-wakeup-device files in order to inform the
kernel about interest, allowing the kernel to inhibit suspend when a
wakeup event occurs on that device for each open fd (#1 & #2). Then it
requires each consumer to "ack" the event's consumption (#3) back to the
fd, where the suspend inhibition is dropped. 

My approach is using a per-task flag of power-importance (#2), which
inhibits suspend if any task has such a flag. Any blocking call upon a
wakeup device (#1) will drop the flag, allowing suspend to occur, and
the kernel re-raises the flag when the task is woken up, which the task
can drop when it's done (#3).

Finally, Android's wakelocks are actually very conceptually similar to
Mark's, but utilize existing device files (#1 & #2) (so it's a little more
implicit) and use read() as the "ack" (#3) to allow the kernel to drop
the wakelock.

Does that seem reasonably accurate/fair?

thanks
-john


* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-13 19:45 [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces Rafael J. Wysocki
  2011-10-13 19:49 ` [RFC][PATCH 1/2] PM / Sleep: Add mechanism to disable suspend and hibernation Rafael J. Wysocki
  2011-10-13 19:50 ` [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode Rafael J. Wysocki
@ 2011-10-14  5:52 ` NeilBrown
  2011-10-14 16:00   ` Alan Stern
  2011-10-15 22:10   ` Rafael J. Wysocki
  2011-10-31 19:55 ` Ming Lei
  3 siblings, 2 replies; 80+ messages in thread
From: NeilBrown @ 2011-10-14  5:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM list, mark gross, LKML, John Stultz, Alan Stern


On Thu, 13 Oct 2011 21:45:42 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> Hi,
> 
> There is an ongoing discussion about possible ways to avoid some problems
> related to suspend/hibernate interfaces, such as races with the handling
> of wakeup events (in user space) and (destructive) interference with some
> important system maintenance tasks (such as firmware updates).
> 
> It follows from this discussion that whatever the kernel has to offer in
> this area is either too complicated to use in practice or inadequate for
> other reasons.  The two use case examples given by John, that is the
> firmware update problem (i.e. system suspend or hibernation should not
> be allowed to happen while the system's firmware is being updated) and the
> backup problem (i.e. it should be possible to wake up the system from
> sleep in the middle of the night via a timer event, create a backup of it
> and make it go to sleep again automatically when the backup is ready
> without implementing the backup feature in a power manager) are quite
> convincing to me, but also it seems to me that previous attempts to
> address them go a bit too far in complexity.  For this reason, I thought
> it might be a good idea to propose a simpler approach.  It is not bullet
> proof, but it should be suitable to address at least those two issues.
> 
> First, to address the firmware update problem, I think we need a big
> hammer switch allowing a root-owned process to disable/enable all
> suspend/hibernate interfaces.  This is introduced by the first patch in
> the form of a new sysfs attribute, /sys/power/sleep_mode, that can be
> used to disable the suspend/hibernate functionality (it does that with
> the help of the existing wakeup events detection mechanism).
> 
> Second, to address the backup problem, we need to allow user space
> processes other than the suspend/hibernate process itself to prevent the
> system from being put into sleep states.  A mechanism for that is introduced
> by the second patch in the form of the /dev/sleepctl special device working
> kind of like user space wakelocks on Android (although in a simplified
> fashion).
> 
> More details are in the changelogs and (of course) in the code itself.
> 
> The patches haven't been tested (I had tested the first one, but then I made
> some changes to it afterwards), so most likely there are some bugs in them,
> but I didn't want to lose time on testing things that people may not like
> in principle. :-)
>

Hi Rafael,

 What do you mean by "too complicated to use in practice"?  What is your
 measure for complexity?
 Using suspend in a race-free way is certainly less complex than - for
 example - configuring bluetooth.
 And in what way is it "inadequate for other reasons"? What reasons?

 The only sane way to handle suspend is for any (suitably privileged) process
 to be able to request that suspend doesn't happen, and then for one process
 to initiate suspend when no-one is blocking it.

 This is very different from the way it is currently handled where the GUI
 says "Hmm.. I'm not doing anything just now, I think I'll suspend".

 The latter simply doesn't scale.  It is broken.  It has to be replaced.
 And it is being replaced.

 gnome-power-manager has a dbus interface on which you can request
 "InhibitInactiveSleep".  Calling that will stop gnome-power-manager from
 sleeping (I assume - haven't looked at the code).
 It might not inhibit an explicit request for sleep - in that case it is
 probably broken and needs to be fixed.  But it can be fixed.  Or replaced.

 So if someone is running gnome-power-manager and wants to perform a firmware
 update, the correct thing to do is to use dbus to disable the inactive sleep.
 If someone is using some other power manager they might need to use some
 other mechanism.  Presumably these things will be standardised at some stage.

 But I think it is very wrong to put some hack in the kernel like your
   sleep_mode = disabled

 just because the user-space community hasn't got its act together yet.

 And if you really need a hammer to stop processes from suspending the system:

   cat /sys/power/state > /tmp/state
   mount --bind /tmp/state /sys/power/state

 should do it.

 Your second patch has little to recommend it either.
 In the first place it seems to be entrenching the notion that timeouts are a
 good and valid way to think about suspend.
 I certainly agree that there are plenty of cases where timeouts are
 important and necessary.  But there are also plenty of cases where you will
 know exactly when you can allow suspend again, and having a timeout there is
 just confusing.

 But worse - the mechanism you provide can be trivially implemented using
 unix-domain sockets talking to a suspend-daemon.

 Instead of opening /dev/sleepctl, you connect to /var/run/suspend-daemon/sock
 Instead of ioctl(SLEEPCTL_STAY_AWAKE), you write a number to the socket.
 Instead of ioctl(SLEEPCTL_RELAX), you write zero to the socket.

 All the extra handling you do in the kernel, can easily be done by
 user-space suspend-daemon.
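
 The client side of such a scheme could look roughly like the C sketch below.
 It is only an illustration under stated assumptions: the function names are
 hypothetical, the wire format (a decimal number followed by a newline) is
 invented for the example, and no particular suspend daemon is implied.  The
 number is taken to be a stay-awake timeout, with zero meaning "relax".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect to the daemon's unix-domain socket; returns an fd or -1. */
int suspend_daemon_connect(const char *path)
{
	struct sockaddr_un addr;
	int fd;

	assert(path);
	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);	/* length checked above */

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Send one request: a non-zero value asks the daemon to stay awake,
 * zero releases the request.  Assumed format: decimal text plus '\n'.
 */
int suspend_daemon_send(int fd, unsigned int value)
{
	char msg[16];
	int len = snprintf(msg, sizeof(msg), "%u\n", value);

	if (len < 0 || (size_t)len >= sizeof(msg))
		return -1;
	return write(fd, msg, (size_t)len) == (ssize_t)len ? 0 : -1;
}
```

 A client would call suspend_daemon_connect("/var/run/suspend-daemon/sock")
 once, then suspend_daemon_send() around the work that must not be
 interrupted.  Closing the socket gives the daemon a natural way to notice
 that a client has gone away.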

 I really wish I could work out why people find the current mechanism
 "difficult to use".  What exactly is it that is difficult?
 I have described previously how to build a race-free suspend system.  Which
 bit of that is complicated or hard to achieve?  Or which bit of that cannot
 work the way I claim?  Or which need is not met by my proposals?

 Isn't it much preferable to do this in userspace where people can
 experiment and refine and improve without having to upgrade the kernel?

NeilBrown



* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-14  5:52 ` [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces NeilBrown
@ 2011-10-14 16:00   ` Alan Stern
  2011-10-14 21:07     ` NeilBrown
  2011-10-15 22:10   ` Rafael J. Wysocki
  1 sibling, 1 reply; 80+ messages in thread
From: Alan Stern @ 2011-10-14 16:00 UTC (permalink / raw)
  To: NeilBrown; +Cc: Rafael J. Wysocki, Linux PM list, mark gross, LKML, John Stultz

On Fri, 14 Oct 2011, NeilBrown wrote:

> Hi Rafael,
> 
>  What do you mean by "too complicated to use in practice"?  What is your
>  measure for complexity?
>  Using suspend in a race-free way is certainly less complex than - for
>  example - configuring bluetooth.
>  And in what way is it "inadequate for other reasons"? What reasons?
> 
> 
>  The only sane way to handle suspend is for any (suitably privileged) process
>  to be able to request that suspend doesn't happen, and then for one process
>  to initiate suspend when no-one is blocking it.

One of the things Rafael didn't mention is that sometimes a kernel 
driver needs to prevent the system from suspending.  This happens when 
recharging over a USB connection.

There's no simple way for such a driver to communicate with a power
daemon.  The driver has to use something like the wakeup mechanism -- 
but currently that mechanism is optional.

Alan Stern



* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-14 16:00   ` Alan Stern
@ 2011-10-14 21:07     ` NeilBrown
  2011-10-15 18:34       ` Alan Stern
  0 siblings, 1 reply; 80+ messages in thread
From: NeilBrown @ 2011-10-14 21:07 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Linux PM list, mark gross, LKML, John Stultz


On Fri, 14 Oct 2011 12:00:59 -0400 (EDT) Alan Stern
<stern@rowland.harvard.edu> wrote:

> On Fri, 14 Oct 2011, NeilBrown wrote:
> 
> > Hi Rafael,
> > 
> >  What do you mean by "too complicated to use in practice"?  What is your
> >  measure for complexity?
> >  Using suspend in a race-free way is certainly less complex than - for
> >  example - configuring bluetooth.
> >  And in what way is it "inadequate for other reasons"? What reasons?
> > 
> > 
> >  The only sane way to handle suspend is for any (suitably privileged) process
> >  to be able to request that suspend doesn't happen, and then for one process
> >  to initiate suspend when no-one is blocking it.
> 
> One of the things Rafael didn't mention is that sometimes a kernel 
> driver needs to prevent the system from suspending.  This happens when 
> recharging over a USB connection.
> 
> There's no simple way for such a driver to communicate with a power
> daemon.  The driver has to use something like the wakeup mechanism -- 
> but currently that mechanism is optional.
> 
> Alan Stern

Certainly I don't expect a kernel driver to communicate directly with a
user-space daemon.  It communicates indirectly through the wakeup_source
mechanism.
If user-space wants to block suspend, it talks to the suspend daemon (power
manager) somehow (dbus, lock files, sockets, signals, whatever).
If the kernel wants to block suspend, it activates a wakeup_source (aka
caffeine source) which the suspend daemon notices via /sys/power/wakeup_count.
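
The daemon side of that handshake might be sketched in C as follows.  The
function names are hypothetical and the file paths are passed in as
parameters so the code can be exercised against ordinary files; on a real
system they would be /sys/power/wakeup_count and /sys/power/state.  The
sketch relies on the kernel rejecting the write-back when new wakeup events
have been registered since the read.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the current wakeup count as text; returns 0 on success. */
int read_wakeup_count(const char *count_path, char *buf, size_t len)
{
	FILE *f;

	assert(count_path && buf && len > 0);
	f = fopen(count_path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';	/* strip the trailing newline */
	return 0;
}

/* Write a string to a file; a rejected write shows up at flush time. */
int write_string(const char *path, const char *s)
{
	FILE *f;
	int ok;

	assert(path && s);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(s, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* One suspend attempt: sample the count, write it back, request the state. */
int try_suspend(const char *count_path, const char *state_path,
		const char *state)
{
	char count[32];

	if (read_wakeup_count(count_path, count, sizeof(count)))
		return -1;
	/* A daemon would check here that no user-space client blocks suspend. */
	if (write_string(count_path, count))
		return -1;	/* new wakeup events arrived: do not suspend */
	return write_string(state_path, state);
}
```

A failed try_suspend() is not an error as such; the daemon would simply go
back to waiting and try again later.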

But you say this wakeup mechanism is optional .... I don't see that.

It is implemented in drivers/base/power/wakeup.c which is included in the
kernel if CONFIG_PM_SLEEP is set, which is defined as

config PM_SLEEP
	def_bool y
	depends on SUSPEND || HIBERNATE_CALLBACKS

which seems to mean "enable this unless we don't have suspend and we don't
have hibernate".

So it seems that the only time we don't have the wakeup mechanism, we also
have no risk of ever going to sleep.

What exactly were you saying was "optional"? I don't understand.

NeilBrown



* Re: [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode
  2011-10-13 22:58   ` John Stultz
@ 2011-10-14 22:49     ` Rafael J. Wysocki
  2011-10-15  0:04       ` John Stultz
  0 siblings, 1 reply; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-14 22:49 UTC (permalink / raw)
  To: John Stultz; +Cc: Linux PM list, mark gross, LKML, Alan Stern, NeilBrown

Hi,

On Friday, October 14, 2011, John Stultz wrote:
> On Thu, 2011-10-13 at 21:50 +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > The currently available mechanism allowing the suspend process to
> > avoid racing with wakeup events registered by the kernel appears
> > to be difficult to use.  Moreover, it requires that the suspend
> > process communicate with other user space processes that may take
> > part in the handling of wakeup events to make sure that they have
> > done their job before suspend is started.  Therefore all of the
> > wakeup-handling applications are expected to use an IPC mechanism
> > allowing them to exchange information with the suspend process, but
> > this expectation turns out to be unrealistic in practice.  For this
> > reason, it seems reasonable to add a mechanism allowing the
> > wakeup-handling processes to communicate with the suspend process
> > to the kernel.
> 
> Hey Rafael! 
> 
> I'm *very* excited to see some alternate approaches here, as I'll very
> much admit that my proposal does have some complexities. While I still
> prefer my approach, I'm pragmatic and would be happy with other
> solutions as long as they solve the issue.
> 
> I've not yet dug deeply into the code of your patch, but some conceptual
> thoughts and issues below.
> 
> > This change introduces a new sleep mode, called "cooperative" sleep
> > mode, which needs to be selected via the /sys/power/sleep_mode sysfs
> > attribute and causes detection of wakeup events to be always
> > enabled, among other things, and a mechanism allowing user space
> > processes to prevent the system from being put into a sleep state
> > while in this mode.
> > 
> > The mechanism introduced by this change is based on a new special
> > device file, /dev/sleepctl.  A process wanting to prevent the system
> > from being put into a sleep state is expected to open /dev/sleepctl
> > and execute the SLEEPCTL_STAY_AWAKE ioctl() with the help of it.
> > This will make all attempts to suspend or hibernate the system block
> > until (1) the process executes the SLEEPCTL_RELAX ioctl() or (2)
> > a predefined timeout expires.  The timeout is set to 500 ms by
> > default, but the process can change it by writing the new timeout
> > value (in milliseconds) to /dev/sleepctl, in binary (unsigned int)
> > format.
> 
> Just a nit, but is there any reason not to use u64 nanosecond value
> instead of the jiffies-like granularity and range? Maybe u64 ns is
> over-design, but milliseconds are getting a bit coarse these days.
> 
> >   The current timeout value can be read from /dev/sleepctl.
> > Setting the timeout to 0 disables it, i.e. it makes the
> > SLEEPCTL_STAY_AWAKE ioctl() block attempts to suspend or hibernate
> > the system until the SLEEPCTL_RELAX ioctl() is executed.
> > 
> > In addition to that, when the system is resuming from suspend or
> > hibernation, the kernel automatically carries out an operation
> 
> Only when resuming from suspend/hibernation? Hrmm.. See below for my
> concerns about this specifically.
> 
> > equivalent to the SLEEPCTL_STAY_AWAKE ioctl() for all processes
> > that have /dev/sleepctl open at that time and whose timeouts are
> > greater than 0 (i.e. enabled), to allow those processes to
> > complete the handling of wakeup events before the system can be
> > put to a sleep state again.
> 
> So the application pseudocode looks like the following?

Not really.

> Example 1:
> ----------
> sleepfd = open("/dev/sleepctl",...);
> devfd = open("/dev/wakeup-button",...);
> ...
> count = read(devfd, buf, bufsize);
> ioctl(sleepfd, SLEEP_STAY_AWAKE, 0); /* no timeout */

No, this doesn't work like this.  You'd need to do:

write(sleepfd, zero_buf, sizeof(unsigned int));
ioctl(sleepfd, SLEEP_STAY_AWAKE);

> do_stuff(buf,count);
> ioctl(sleepfd, SLEEP_RELAX);
> 
> 
> And the assumption is that when *any* wakeup event occurs, even if it's
> not the /dev/wakeup-button, the system will stay awake on this
> application's behalf for 500ms (or the max value provided to sleepctl).
> 
> Then, the hope is that if the wakeup-button did wake the system up, the
> application would get woken up from the read() call and hopefully
> complete the STAY_AWAKE ioctl within the provided 500ms.
> 
> 
> A minor nit, first: With the code above, after we call SLEEP_RELAX, the
> timeout has been set to zero, so if we're the only one, the next wakeup
> will not actually inhibit suspend for any amount of time. It might be
> good to separate the ioctl used to set the timeout length, and the one
> to inhibit suspend.

It is separate.  The timeout is set by a write().

> 
> Now, my opinion: So, again, I'd welcome any solution to the problem, but
> I'm personally not a big fan of the timeout usage found in this
> proposal, as well as the Android wakelocks implementation. It's simply
> racy, and would break down under heavy load or when interacting with
> cpuhogging SCHED_FIFO tasks. Practically, it can be made to work, but I
> worry the extra safety-margins folks will add to the timeouts will
> result in inefficient power wasting.

That's the cost of simplicity.

> Now, an actual problem: Further, I'm worried this still doesn't address
> the main race in the alarmtimer triggered system backup case:
> 
> Example 2:
> ----------
> sleepfd = open("/dev/sleepctl",...);
> ...
> /* wait till 5pm */
> clock_nanosleep(CLOCK_REALTIME_ALARM, TIMER_ABSTIME, backup_ts); 
> ioctl(sleepfd, SLEEP_STAY_AWAKE, 0); /* no timeout */
> do_backup();
> ioctl(sleepfd, SLEEP_RELAX);
> 
> 
> Which is basically identical to the above. At 5pm the alarmtimer fires,
> and increments the wakeup_count. 
> 
> At the same time, maybe on a different cpu, the PM daemon reads the
> updated wakeup_count, writes it back and triggers suspend.

It doesn't really have to write it back, but that's a minor thing.

> All of this happens before my backup application gets scheduled and can
> call the ioctl.

So the solution for your backup application is to (1) open "/dev/sleepctl",
(2) set a suitable timeout (using write()), (3) call
ioctl(sleepfd, SLEEP_STAY_AWAKE) and go to sleep, (4) (when woken up)
open "/dev/sleepctl" again, set the timeout on sleepfd2 to 0 and call
ioctl(sleepfd2, SLEEP_STAY_AWAKE), (5) close sleepfd (the first instance),
(6) do whatever it wants and (7) call ioctl(sleepfd2, SLEEP_RELAX).
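
To make the hand-over between the two descriptors easier to follow, here is
a user-space model of one /dev/sleepctl handle in C.  It is a sketch, not
the kernel code: the struct and function names are hypothetical, time is an
explicit parameter instead of a kernel timer, and the resume re-arm is
modelled per handle.  The 500 ms default, the "0 disables the timeout" rule
and the relax-on-write behaviour are taken from the changelog and from
sleepctl_write().

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model of one open /dev/sleepctl handle (not kernel code). */
struct sleepctl_model {
	unsigned int delay_ms;		/* timeout; 0 means "no timeout" */
	bool active;			/* set by STAY_AWAKE, cleared by RELAX */
	unsigned long deadline_ms;	/* used only when delay_ms > 0 */
};

/* open(): the changelog gives 500 ms as the default timeout. */
void model_open(struct sleepctl_model *m)
{
	assert(m);
	m->delay_ms = 500;
	m->active = false;
	m->deadline_ms = 0;
}

/* SLEEPCTL_RELAX; closing an active handle has the same effect. */
void model_relax(struct sleepctl_model *m)
{
	assert(m);
	m->active = false;
}

/* write(): as in sleepctl_write(), relax first, then store the timeout. */
void model_write(struct sleepctl_model *m, unsigned int delay_ms)
{
	model_relax(m);
	m->delay_ms = delay_ms;
}

/* SLEEPCTL_STAY_AWAKE issued at time now_ms. */
void model_stay_awake(struct sleepctl_model *m, unsigned long now_ms)
{
	assert(m);
	m->active = true;
	m->deadline_ms = now_ms + m->delay_ms;
}

/* On resume, every handle with a non-zero timeout is re-armed. */
void model_resume(struct sleepctl_model *m, unsigned long now_ms)
{
	assert(m);
	if (m->delay_ms > 0)
		model_stay_awake(m, now_ms);
}

/* Does this handle block suspend/hibernate at time now_ms? */
bool model_blocks_suspend(const struct sleepctl_model *m, unsigned long now_ms)
{
	assert(m);
	if (!m->active)
		return false;
	return m->delay_ms == 0 || now_ms < m->deadline_ms;
}
```

In terms of this model, steps (1)-(3) are model_open(), model_write() with a
non-zero timeout and model_stay_awake() on the first handle; the wakeup is
model_resume() on it; step (4) repeats open/write/stay_awake on a second
handle with a timeout of 0; step (5) is model_relax() on the first handle;
and step (7) is model_relax() on the second.  The second handle has to start
blocking before the first one's re-armed timeout runs out.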

> In order to avoid this with your approach, I think you're going
> to need to have the kernel take the SLEEPCTL_STAY_AWAKE timeout for
> every open fd upon *any* wakeup event, even when the system is running
> and not just at resume.
> 
> The same bad behavior could also be tripped in example #1, with the
> wakeup button being pressed while the system was running, right as a
> suspend was triggered.
> 
> 
> I think this is in part an issue with the "globalness" of the
> wakeup_count value. We know an event happened, but we don't know *which*
> event, or if anyone was waiting for that event, or if the event has been
> consumed. Thus with your approach, it's necessary to use a timeout to try
> to cover everyone, since there's not enough knowledge. 
> 
> Basically it breaks down to three questions I think we have to answer:
> 1) What event is being waited on?
> 2) Who is waiting?
> 3) Has the event been consumed?

Which only is relevant if we want to have a very fine grained resolution
of things.  I'm not really sure that very fine grained resolution is
achievable at all anyway, though.

> To summarize my understanding of other recently proposed approaches to
> this core issue:
> 
> Again, in your proposal (if adjusted as I suggest to avoid the backup
> race) tasks register their wakeup-interest (#2), by opening the sleepctl
> file, and then you inhibit suspend for the maximum specified timeout on
> every wakeup event (#1) assuming that gives enough time for whichever
> task was waiting on the triggered event to consume it (#3).

To be precise, the maximum specified timeout is only used for the events
that either aborted suspend in progress, or actually woke up the system.

> Neil's userspace approach (as best as I understand it) tries to resolve
> this knowledge issue by requiring *everyone* who might be waiting to
> consume wakeup events check-in with the PM daemon prior to *any* suspend
> (If the PM daemon is aggressive, trying to suspend frequently, this
> results in requiring every consumer to check in on every wakeup event).
> So in this model, we get a list of waiters (#2) communicating with the
> PM daemon, and for any event (#1), we require all waiters to ack (#3)
> that its ok to suspend.
> 
> Mark's approach uses per-wakeup-device files in order to inform the
> kernel about interest, allowing the kernel to inhibit suspend when a
> wakeup event occurs on that device for each open fd (#1 & #2). Then it
> requires each consumer to "ack" the events consumption (#3) back to the
> fd, where the suspend inhibition is dropped. 
> 
> My approach is using a per-task flag of power-importance(#2), which
> inhibits suspend if any task has such a flag. Any blocking call upon a
> wakeup device (#1) will drop the flag, allowing suspend to occur, and
> the kernel re-raises the flag when the task is woken up, which the task
> can drop when its done (#3). 
> 
> Finally, Android's wakelock's are actually very conceptually similar to
> Mark's, but utilize existing device files (#1&#2) (so its a little more
> implicit) and uses read()  as the "ack" (#3) to allow the kernel to drop
> the wakelock.

I don't think it works this way.  Android's interface for using wakelocks
from user space is just that a user space process can create and use a
wakelock with the help of a special device file or something like this,
IIRC.

> Does that seem reasonably accurate/fair?

It would be fair if you said that your approach and the Android's one had been
frowned upon by the scheduler people, so I don't think we can realistically
regard them as doable. :-)

Thanks,
Rafael


* Re: [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode
  2011-10-14 22:49     ` Rafael J. Wysocki
@ 2011-10-15  0:04       ` John Stultz
  2011-10-15 21:29         ` Rafael J. Wysocki
  0 siblings, 1 reply; 80+ messages in thread
From: John Stultz @ 2011-10-15  0:04 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM list, mark gross, LKML, Alan Stern, NeilBrown

On Sat, 2011-10-15 at 00:49 +0200, Rafael J. Wysocki wrote:
> On Friday, October 14, 2011, John Stultz wrote:
> > On Thu, 2011-10-13 at 21:50 +0200, Rafael J. Wysocki wrote:
> > > equivalent to the SLEEPCTL_STAY_AWAKE ioctl() for all processes
> > > that have /dev/sleepctl open at that time and whose timeouts are
> > > greater than 0 (i.e. enabled), to allow those processes to
> > > complete the handling of wakeup events before the system can be
> > > put to a sleep state again.
> > 
> > So the application pseudocode looks like the following?
> 
> Not really.
> 
> > Example 1:
> > ----------
> > sleepfd = open("/dev/sleepctl",...);
> > devfd = open("/dev/wakeup-button",...);
> > ...
> > count = read(devfd, buf, bufsize);
> > ioctl(sleepfd, SLEEP_STAY_AWAKE, 0); /* no timeout */
> 
> No, this doesn't work like this.  You'd need to do:
> 
> write(sleepfd, zero_buf, sizeof(unsigned int));
> ioctl(sleepfd, SLEEP_STAY_AWAKE);

Sorry for misunderstanding. Thanks for the clarification!

 
> > Now, my opinion: So, again, I'd welcome any solution to the problem, but
> > I'm personally not a big fan of the timeout usage found in this
> > proposal, as well as the Android wakelocks implementation. It's simply
> > racy, and would break down under heavy load or when interacting with
> > cpuhogging SCHED_FIFO tasks. Practically, it can be made to work, but I
> > worry the extra safety-margins folks will add to the timeouts will
> > result in inefficient power wasting.
> 
> That's the cost of simplicity.

True. Again, just my feelings about the approach, not an objection to
it. I'm happy if a workable solution can be merged, but I want to make
sure the tradeoffs are discussed/understood.


> > Now, an actual problem: Further, I'm worried this still doesn't address
> > the main race in the alarmtimer triggered system backup case:
> > 
> > Example 2:
> > ----------
> > sleepfd = open("/dev/sleepctl",...);
> > ...
> > /* wait till 5pm */
> > clock_nanosleep(CLOCK_REALTIME_ALARM, TIMER_ABSTIME, backup_ts); 
> > ioctl(sleepfd, SLEEP_STAY_AWAKE, 0); /* no timeout */
> > do_backup();
> > ioctl(sleepfd, SLEEP_RELAX);
> > 
> > 
> > Which is basically identical to the above. At 5pm the alarmtimer fires,
> > and increments the wakeup_count. 
> > 
> > At the same time, maybe on a different cpu, the PM daemon reads the
> > updated wakeup_count, writes it back and triggers suspend.
> 
> It doesn't really have to write it back, but that's a minor thing.
> 
> > All of this happens before my backup application gets scheduled and can
> > call the ioctl.
> 
> So the solution for your backup application is to (1) open "/dev/sleepctl",
> (2) set a suitable timeout (using write()), (3) call
> ioctl(sleepfd, SLEEP_STAY_AWAKE) and go to sleep, (4) (when woken up)
> open "/dev/sleepctl" again, set the timeout on sleepfd2 to 0 and call
> ioctl(sleepfd2, SLEEP_STAY_AWAKE), (5) close sleepfd (the first instance),
> (6) do whatever it wants and (7) call ioctl(sleepfd2, SLEEP_RELAX).

I think I see how using the two open fds on sleepctl allows having the
timeout based stay-awake to be triggered after the resume, as well as
the infinite stay-awake which can be taken once the task is woken up.

But I'm still not sure I follow how this avoids the race between the
alarm firing and a separate suspend call when the system has been
running. That is why I suggested having the kernel take the stay-awake
timeout on *any* wakeup event (which I think would resolve this
concern). Does that make any sense?

My apologies if I'm still just not understanding it.


> > I think in order to avoid this with your approach, I think you're going
> > to need to have the kernel take the SLEEPCTL_STAY_AWAKE timeout for
> > every open fd upon *any* wakeup event, even when the system is running
> > and not just at resume.
> > 
> > The same bad behavior could also be tripped in example #1, with the
> > wakeup button being pressed while the system was running, right as a
> > suspend was triggered.
> > 
> > 
> > I think this is in part an issue with the "globalness" of the
> > wakeup_count value. We know an event happened, but we don't know *which*
> > event, or if anyone was waiting for that event, or if the event has been
> > consumed. Thus with your approach, it's necessary to use a timeout to try
> > to cover everyone, since there's not enough knowledge. 
> > 
> > Basically it breaks down to three questions I think we have to answer:
> > 1) What event is being waited on?
> > 2) Who is waiting?
> > 3) Has the event been consumed?
> 
> Which only is relevant if we want to have a very fine grained resolution
> of things.  I'm not really sure that very fine grained resolution is
> achievable at all anyway, though.
> 
> > To summarize my understanding of other recently proposed approaches to
> > this core issue:
> > 
> > Again, in your proposal (if adjusted as I suggest to avoid the backup
> > race) tasks register their wakeup-interest (#2), by opening the sleepctl
> > file, and then you inhibit suspend for the maximum specified timeout on
> > every wakeup event (#1) assuming that gives enough time for whichever
> > task was waiting on the triggered event to consume it (#3).
> 
> To be precise, the maximum specified timeout is only used for the events
> that either aborted suspend in progress, or actually woke up the system.
> 
> > Neil's userspace approach (as best as I understand it) tries to resolve
> > this knowledge issue by requiring *everyone* who might be waiting to
> > consume wakeup events check-in with the PM daemon prior to *any* suspend
> > (If the PM daemon is aggressive, trying to suspend frequently, this
> > results in requiring every consumer to check in on every wakeup event).
> > So in this model, we get a list of waiters (#2) communicating with the
> > PM daemon, and for any event (#1), we require all waiters to ack (#3)
> > that its ok to suspend.
> > 
> > Mark's approach uses per-wakeup-device files in order to inform the
> > kernel about interest, allowing the kernel to inhibit suspend when a
> > wakeup event occurs on that device for each open fd (#1 & #2). Then it
> > requires each consumer to "ack" the events consumption (#3) back to the
> > fd, where the suspend inhibition is dropped. 
> > 
> > My approach is using a per-task flag of power-importance(#2), which
> > inhibits suspend if any task has such a flag. Any blocking call upon a
> > wakeup device (#1) will drop the flag, allowing suspend to occur, and
> > the kernel re-raises the flag when the task is woken up, which the task
> > can drop when its done (#3). 
> > 
> > Finally, Android's wakelock's are actually very conceptually similar to
> > Mark's, but utilize existing device files (#1&#2) (so its a little more
> > implicit) and uses read()  as the "ack" (#3) to allow the kernel to drop
> > the wakelock.
> 
> I don't think it works this way.  The Android's interface for using wakelocks
> from user space is just that a user space process can create and use a
> wakelock with the help of a special device file or something like this,
> IIRC.

There are also wakelocks (with timeouts) taken by the kernel to protect
the wakeup-event data while it is buffered until it is read by userland
(or the timeout trips). Thus the select/wake_lock/read pattern that was
discussed by Arve here:
http://www.kerneltrap.com/mailarchive/linux-kernel/2010/5/21/4573616


> > Does that seem reasonably accurate/fair?
> 
> It would be fair if you said that your approach and the Android's one had been
> frowned upon by the scheduler people, so I don't think we can realistically
> regard them as doable. :-)

Totally true! My approach does have its significant detractors. :) 

My hope with the above was not to judge the different approaches, but
just to summarize what we've seen so far in the hopes that we can better
understand the problem.

Again, as I said, I'm very excited to see your proposal, and look
forward to it (or maybe just my understanding of it :) progressing.

thanks
-john



* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-14 21:07     ` NeilBrown
@ 2011-10-15 18:34       ` Alan Stern
  2011-10-15 21:43         ` NeilBrown
  0 siblings, 1 reply; 80+ messages in thread
From: Alan Stern @ 2011-10-15 18:34 UTC (permalink / raw)
  To: NeilBrown; +Cc: Rafael J. Wysocki, Linux PM list, mark gross, LKML, John Stultz

On Sat, 15 Oct 2011, NeilBrown wrote:

> > One of the things Rafael didn't mention is that sometimes a kernel 
> > driver needs to prevent the system from suspending.  This happens when 
> > recharging over a USB connection.
> > 
> > There's no simple way for such a driver to communicate with a power
> > daemon.  The driver has to use something like the wakeup mechanism -- 
> > but currently that mechanism is optional.
> > 
> > Alan Stern
> 
> Certainly I don't expect a kernel driver to communicate directly with a
> user-space daemon.  It communicates indirectly through the wakeup_source
> mechanism.
> If user-space wants to block suspend, it talks to the suspend daemon (power
> manager) some how (dbus, lock files, sockets, signals, whatever).
> If the kernel wants to block suspend, it activates a wakeup_source (aka
> caffeine source) which the suspend daemon notices via /sys/power/wakeup_count.
> 
> But you say this wakeup mechanism is optional .... I don't see that.
> 
> It is implemented in drivers/base/power/wakeup.c which is included in the
> kernel if CONFIG_PM_SLEEP which is defined as
> 
> config PM_SLEEP
> 	def_bool y
> 	depends on SUSPEND || HIBERNATE_CALLBACKS
> 
> which seems to mean "enable this unless we don't have suspend and we don't
> have hibernate".
> 
> So it seems that the only time we don't have the wakeup mechanism, we also
> have no risk of ever going to sleep.
> 
> What exactly were you saying was "optional"?? I don't understand.

It's optional in the sense that user programs can bypass it.  They 
aren't forced to read or write /sys/power/wakeup_count, and if they 
don't then the wakeup mechanism won't prevent the system from 
suspending.
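
For reference, the cooperative handshake that a well-behaved suspend
initiator is expected to perform looks roughly like the sketch below.  The
sysfs semantics (read the count, write it back, the write fails if wakeup
events were registered in between) are those of /sys/power/wakeup_count; the
function name try_suspend() and the path parameters are only illustrative,
and error reporting is reduced to 0 / -1.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: one attempt at a race-free suspend.
 * On a real system count_path would be /sys/power/wakeup_count and
 * state_path would be /sys/power/state.
 * Returns 0 if "mem" was written, -1 on any failure.
 */
int try_suspend(const char *count_path, const char *state_path)
{
	unsigned int count;
	FILE *f;
	int ok;

	assert(count_path && state_path);

	/* Step 1: read the current number of registered wakeup events. */
	f = fopen(count_path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%u", &count) == 1;
	fclose(f);
	if (!ok)
		return -1;

	/*
	 * Step 2: write the value back.  The kernel rejects the write if
	 * more wakeup events have been registered since the read.
	 */
	f = fopen(count_path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%u", count) > 0;
	if (fclose(f) != 0)	/* a rejected write shows up when flushing */
		ok = 0;
	if (!ok)
		return -1;

	/* Step 3: only now ask for the sleep state. */
	f = fopen(state_path, "w");
	if (!f)
		return -1;
	ok = fputs("mem", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A program that writes "mem" to /sys/power/state directly, skipping the first
two steps, still suspends the system, which is the sense in which the
mechanism is optional.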

Alan Stern



* Re: [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode
  2011-10-15  0:04       ` John Stultz
@ 2011-10-15 21:29         ` Rafael J. Wysocki
  2011-10-17 16:48           ` John Stultz
  0 siblings, 1 reply; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-15 21:29 UTC (permalink / raw)
  To: John Stultz; +Cc: Linux PM list, mark gross, LKML, Alan Stern, NeilBrown

On Saturday, October 15, 2011, John Stultz wrote:
> On Sat, 2011-10-15 at 00:49 +0200, Rafael J. Wysocki wrote:
> > On Friday, October 14, 2011, John Stultz wrote:
> > > On Thu, 2011-10-13 at 21:50 +0200, Rafael J. Wysocki wrote:
> > > > equivalent to the SLEEPCTL_STAY_AWAKE ioctl() for all processes
> > > > that have /dev/sleepctl open at that time and whose timeouts are
> > > > greater than 0 (i.e. enabled), to allow those processes to
> > > > complete the handling of wakeup events before the system can be
> > > > put to a sleep state again.
> > > 
> > > So the application pseudocode looks like the following?
> > 
> > Not really.
> > 
> > > Example 1:
> > > ----------
> > > sleepfd = open("/dev/sleepctl",...);
> > > devfd = open("/dev/wakeup-button",...);
> > > ...
> > > count = read(devfd, buf, bufsize);
> > > ioctl(sleepfd, SLEEP_STAY_AWAKE, 0); /* no timeout */
> > 
> > No, this doesn't work like this.  You'd need to do:
> > 
> > write(sleepfd, zero_buf, sizeof(unsigned int));
> > ioctl(sleepfd, SLEEP_STAY_AWAKE);
> 
> Sorry for misunderstanding. Thanks for the clarification!
> 
>  
> > > Now, my opinion: So, again, I'd welcome any solution to the problem, but
> > > I'm personally not a big fan of the timeout usage found in this
> > > proposal, as well as the Android wakelocks implementation. It's simply
> > > racy, and would break down under heavy load or when interacting with
> > > cpuhogging SCHED_FIFO tasks. Practically, it can be made to work, but I
> > > worry the extra safety-margins folks will add to the timeouts will
> > > result in inefficient power wasting.
> > 
> > That's the cost of simplicity.
> 
> True. Again, just my feelings about the approach, not an objection to
> it. I'm happy if a workable solution can be merged, but I want to make
> sure the tradeoffs are discussed/understood.
> 
> 
> > > Now, an actual problem: Further, I'm worried this still doesn't address
> > > the main race in the alarmtimer triggered system backup case:
> > > 
> > > Example 2:
> > > ----------
> > > sleepfd = open("/dev/sleepctl",...);
> > > ...
> > > /* wait till 5pm */
> > > clock_nanosleep(CLOCK_REALTIME_ALARM, TIMER_ABSTIME, backup_ts); 
> > > ioctl(sleepfd, SLEEP_STAY_AWAKE, 0); /* no timeout */
> > > do_backup();
> > > ioctl(sleepfd, SLEEP_RELAX);
> > > 
> > > 
> > > Which is basically identical to the above. At 5pm the alarmtimer fires,
> > > and increments the wakeup_count. 
> > > 
> > > At the same time, maybe on a different cpu, the PM daemon reads the
> > > updated wakeup_count, writes it back and triggers suspend.
> > 
> > It doesn't really have to write it back, but that's a minor thing.
> > 
> > > All of this happens before my backup application gets scheduled and can
> > > call the ioctl.
> > 
> > So the solution for your backup application is to (1) open "/dev/sleepctl",
> > (2) set a suitable timeout (using write()), (3) call
> > ioctl(sleepfd, SLEEP_STAY_AWAKE) and go to sleep, (4) (when woken up)
> > open "/dev/sleepctl" again, set the timeout on sleepfd2 to 0 and call
> > ioctl(sleepfd2, SLEEP_STAY_AWAKE), (5) close sleepfd (the first instance),
> > (6) do whatever it wants and (7) call ioctl(sleepfd2, SLEEP_RELAX).
> 
> I think I see how using the two open fds on sleepctl allows having the
> timeout based stay-awake to be triggered after the resume, as well as
> the infinite stay-awake which can be taken once the task is woken up.
> 
> But I'm still not sure I follow how this avoids the race between the
> alarm firing and a separate suspend call when the system has been
> running. That is why I suggested having the kernel take the stay-awake
> timeout on *any* wakeup event (which I think would resolve this
> concern). Does that make any sense?
> 
> My apologies if I'm still just not understanding it.

No, you understand it very well, I'm sorry for being a bit dense.  I must
have been really tired last night. :-)

So I think (please correct me if I'm wrong) that you're worried about the
following situation:

- The process opens /dev/sleepctl and sets the timeout
- It sets up a wake alarm to trigger at time T.
- It goes to sleep and sets its wakeup time to time T too, e.g. using select()
  with a timeout.
- The system doesn't go to sleep in the meantime.
- The wake alarm triggers a bit earlier than the process is woken up and
  system suspend is started in between of the two events.

This particular race is avoidable if the process sets its wakeup time
to T - \Delta T, where \Delta T is enough for the process to be scheduled
and run ioctl(sleepfd, SLEEPCTL_STAY_AWAKE).  So the complete sequence may
look like this:

- The process opens /dev/sleepctl as sleepfd1 and sets the timeout to 0.
- The process opens /dev/sleepctl as sleepfd2 and sets the timeout to T_2.
  T_2 should be sufficient for the process to be able to call
  ioctl(sleepfd1, SLEEPCTL_STAY_AWAKE) when woken up.
- It sets up a wake alarm to trigger at time T.
- It goes to sleep and sets its wakeup time to time T - \Delta T, such that
  \Delta T is sufficient for the process to call
  ioctl(sleepfd2, SLEEPCTL_STAY_AWAKE).

Then, if system suspend happens before T - \Delta T, the process will be
woken up along with the wakealarm event at time T and it will be able to call
ioctl(sleepfd1, SLEEPCTL_STAY_AWAKE) before T_2 expires.  If system suspend
doesn't happen in that time frame, the process will wake up at T - \Delta T
and it will be able to call ioctl(sleepfd1, SLEEPCTL_STAY_AWAKE) (even if
system suspend triggers after the process has been woken up and before it's
able to run the ioctl, it doesn't matter, because the wakealarm wakeup will
trigger the sleepfd2's STAY_AWAKE anyway).
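
Computing the early wakeup time T - \Delta T is plain timespec arithmetic.
A minimal sketch, assuming both values are normalised and T >= \Delta T;
early_wakeup() is a hypothetical helper name:

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Hypothetical helper: returns t - delta with 0 <= tv_nsec < NSEC_PER_SEC.
 * Both arguments must be normalised and t must not be earlier than delta.
 */
struct timespec early_wakeup(struct timespec t, struct timespec delta)
{
	struct timespec r;

	assert(t.tv_nsec >= 0 && t.tv_nsec < NSEC_PER_SEC);
	assert(delta.tv_nsec >= 0 && delta.tv_nsec < NSEC_PER_SEC);

	r.tv_sec = t.tv_sec - delta.tv_sec;
	r.tv_nsec = t.tv_nsec - delta.tv_nsec;
	if (r.tv_nsec < 0) {
		/* borrow one second to bring tv_nsec back into range */
		r.tv_sec -= 1;
		r.tv_nsec += NSEC_PER_SEC;
	}
	return r;
}
```

The result can be passed to clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME,
&when, NULL), while the wake alarm itself stays armed for T.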

Still, there appear to be similar races that aren't avoidable (for example,
if the time the wake alarm will trigger is not known to the process in
advance), so I have an idea how to address them.  Namely, suppose we have
one more ioctl, SLEEPCTL_WAIT_EVENT, that's equivalent to a combination
of _RELAX, wait and _STAY_AWAKE such that the process will be sent a signal
(say SIGPWR) on the first wakeup event and its _STAY_AWAKE will trigger
automatically.

So in the scenario above:

- The process opens /dev/sleepctl, sets the timeout to 0 and calls
  ioctl(sleepfd, SLEEPCTL_STAY_AWAKE).
- It sets up a wake alarm to trigger at time T.
- It runs ioctl(sleepfd, SLEEPCTL_WAIT_EVENT) which "relaxes" its sleepfd
  and makes it go to sleep until the first wakeup event happens.
- The process' signal handler checks if the current time is >= T and makes
  the process go to the previous step if not.
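
The control flow of these steps can be sketched as a simple loop.  This is
only an illustration: SLEEPCTL_WAIT_EVENT does not exist yet, so the wait and
the clock are passed in as callbacks, the signal handler is folded into the
loop condition, and struct wait_ops, wait_until() and the fake_* helpers are
all invented names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the proposed ioctl and for a clock source. */
struct wait_ops {
	int  (*wait_event)(void *ctx);	/* models ioctl(sleepfd, SLEEPCTL_WAIT_EVENT) */
	long (*now)(void *ctx);		/* current time, in seconds */
	void *ctx;
};

/*
 * Keep waiting for wakeup events until the current time is >= deadline.
 * Returns the number of wakeup events seen, or -1 if a wait failed.
 */
int wait_until(const struct wait_ops *ops, long deadline)
{
	int events = 0;

	assert(ops && ops->wait_event && ops->now);

	while (ops->now(ops->ctx) < deadline) {
		/* "relax", sleep, and come back with STAY_AWAKE in effect */
		if (ops->wait_event(ops->ctx) != 0)
			return -1;
		events++;
	}
	return events;
}

/* Fake event source for experiments: every event advances time by 'step'. */
struct fake_clock {
	long t;
	long step;
};

int fake_wait(void *ctx)
{
	struct fake_clock *c = ctx;

	c->t += c->step;
	return 0;
}

long fake_now(void *ctx)
{
	struct fake_clock *c = ctx;

	return c->t;
}
```

With the fake clock, wait_until() returns once enough events have passed for
the time to reach the deadline; a real implementation would do its work at
that point and only then relax the descriptor.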

Hmm?

Rafael


* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-15 18:34       ` Alan Stern
@ 2011-10-15 21:43         ` NeilBrown
  0 siblings, 0 replies; 80+ messages in thread
From: NeilBrown @ 2011-10-15 21:43 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Linux PM list, mark gross, LKML, John Stultz


On Sat, 15 Oct 2011 14:34:59 -0400 (EDT) Alan Stern
<stern@rowland.harvard.edu> wrote:

> On Sat, 15 Oct 2011, NeilBrown wrote:
> 
> > > One of the things Rafael didn't mention is that sometimes a kernel 
> > > driver needs to prevent the system from suspending.  This happens when 
> > > recharging over a USB connection.
> > > 
> > > There's no simple way for such a driver to communicate with a power
> > > daemon.  The driver has to use something like the wakeup mechanism -- 
> > > but currently that mechanism is optional.
> > > 
> > > Alan Stern
> > 
> > Certainly I don't expect a kernel driver to communicate directly with a
> > user-space daemon.  It communicates indirectly through the wakeup_source
> > mechanism.
> > If user-space wants to block suspend, it talks to the suspend daemon (power
> > manager) some how (dbus, lock files, sockets, signals, whatever).
> > If the kernel wants to block suspend, it activates a wakeup_source (aka
> > caffeine source) which the suspend daemon notices via /sys/power/wakeup_count.
> > 
> > But you say this wakeup mechanism is optional .... I don't see that.
> > 
> > It is implemented in drivers/base/power/wakeup.c which is included in the
> > kernel if CONFIG_PM_SLEEP which is defined as
> > 
> > config PM_SLEEP
> > 	def_bool y
> > 	depends on SUSPEND || HIBERNATE_CALLBACKS
> > 
> > which seems to mean "enable this unless we don't have suspend and we don't
> > have hibernate".
> > 
> > So it seems that the only time we don't have the wakeup mechanism, we also
> > have no risk of ever going to sleep.
> > 
> > > What exactly were you saying was "optional"?? I don't understand.
> 
> It's optional in the sense that user programs can bypass it.  They 
> aren't forced to read or write /sys/power/wakeup_count, and if they 
> don't then the wakeup mechanism won't prevent the system from 
> suspending.
> 
> Alan Stern

Ahh, I understand now, thanks.

I didn't consider that as significant because I see it as a core and necessary
design philosophy.

For example, file locks are advisory in Unix/Linux.  In the old days we had
lock files which could be ignored (or even removed) by any (suitably
privileged) process.  Today we use flock or lockf which are easier to work
with but just as easy to ignore.  So they are optional, but it all still
works.
I have worked with systems with mandatory locking and they are often very
annoying.  I think they make it easier to write bad code, and harder to write
good code.

Another example is shutdown.  The right way to do this it tell 'init' to
clean up and shutdown, and /sbin/halt does that by default.
But it is optional, "halt -f" or "echo p > /proc/sysrq-trigger" just halt
immediately, and sometimes I want that option.

So yes: using /sys/power/wakeup_count is optional, but if some code doesn't
do it - file a bug report like you would if some code doesn't respect
necessary file locks.

Thanks,
NeilBrown


* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-14  5:52 ` [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces NeilBrown
  2011-10-14 16:00   ` Alan Stern
@ 2011-10-15 22:10   ` Rafael J. Wysocki
  2011-10-16  2:49     ` Alan Stern
  2011-10-16 23:48     ` NeilBrown
  1 sibling, 2 replies; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-15 22:10 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux PM list, mark gross, LKML, John Stultz, Alan Stern

Hi,

On Friday, October 14, 2011, NeilBrown wrote:
> On Thu, 13 Oct 2011 21:45:42 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
...
> 
> Hi Rafael,
> 
>  What do you mean by "too complicated to use in practice"?  What is your
>  measure for complexity?

I, personally, don't really know what the difficulty is, as I have already
described this approach a few times (for example, in Section 5 of the
article at http://lwn.net/images/pdf/suspend_blockers.pdf).  However, I've
recently talked to a few people whom I regard as smart and who had tried
to implement it and it didn't work for them, and they aren't really able
to say why exactly.  Thus I have concluded it has to be complicated, but
obviously you're free to draw your own conclusions. :-)

[BTW, attempts to defend the approach I have invented against myself are
 extremely likely to fail, pretty much by definition. ;-)]

>  Using suspend in a race-free way is certainly less complex than - for
>  example - configuring bluetooth.
>  And in what way is it "inadequate for other reasons"? What reasons?

Consider the scenario described by John (the wakeup problem).  A process
has to do something at certain time and the system shouldn't be suspended
while the process is doing that, although it very well may be suspended
earlier or later.  The process puts itself to sleep with the assumption
that a wake alarm is set (presumably by another process) to wake the system
up from suspend at the right time (if the suspend happens).  However, the
process itself doesn't know _exactly_ what time the wake alarm is set to.

In the situation in which we only have the existing mechanism and a user space
power manager daemon, this scenario appears to be inherently racy, such that it
cannot be handled correctly.

>  The only sane way to handle suspend is for any (suitably privileged) process
>  to be able to request that suspend doesn't happen, and then for one process
>  to initiate suspend when no-one is blocking it.

As long as you don't specify the exact way by which the request is made and
how the suspend is blocked, the above statement is almost meaningless.

>  This is very different from the way it is currently handled where the GUI
>  says "Hmm.. I'm not doing anything just now, I think I'll suspend".
> 
>  The latter simply doesn't scale.  It is broken.  It has to be replaced.
>  And it is being replaced.

Cool, good to hear that! :-)

>  gnome-power-manager has a dbus interface on which you can request
>  "InhibitInactiveSleep".  Calling that will stop gnome-power-manager from
>  sleeping (I assume - haven't looked at the code).
>  It might not inhibit an explicit request for sleep - in that case it is
>  probably broken and needs to be fixed.  But it can be fixed.  Or replaced.

Perhaps.

Is KDE going to use the same mechanism, for one example?  And what about other
user space variants?  MeeGo anyone?  Tizen?  Android??

>  So if someone is running gnome-power-manager and wants to perform a firmware
>  update, the correct thing to do is to use dbus to disable the inactive sleep.
>  If someone is using some other power manager they might need to use some
>  other mechanism.  Presumably these things will be standardised at some stage.

Unless you have a specific idea about how to make this standardization happen,
I call it wishful thinking to put it lightly.  Sorry about the harsh words, but
that's how it goes IMNSHO. 

>  But I think it is very wrong to put some hack in the kernel like your
>    suspend_mode = disabled

Why is it wrong and why do you think it is a "hack"?

>  just because the user-space community hasn't got its act together yet.

Is there any guarantee that it will get its act together in any foreseeable
time frame?

>  And if you really need a hammer to stop processes from suspending the system:
> 
>    cat /sys/power/state > /tmp/state
>    mount --bind /tmp/state /sys/power/state
> 
>  should do it.

Except that (1) it appears to be racy (what if system suspend happens between
the first and second line in your example - can you safely start to upgrade
your firmware in that case?) and (2) it won't prevent the hibernate interface
based on /dev/snapshot from being used.

Do you honestly think I'd propose something like patch [1/2] if I didn't
see any other _working_ approach?

>  Your second patch has little to recommend it either.
>  In the first place it seems to be entrenching the notion that timeouts are a
>  good and valid way to think about suspend.

That's because I think they are unavoidable.  Even if we are able to eliminate
all timeouts in the handling of wakeup events by the kernel and passing them
to user space, which I don't think is a realistic expectation, the user will
still have only so much time to wait for things to happen.  For example, if
a phone user doesn't see the screen turn on 0.5 sec after the button was
pressed, the button is pretty much guaranteed to be pressed again.  This
observation applies to other wakeup events, more or less.  They are very much
like items with "suitability for consumption" timestamps: if they are not
consumed quickly enough, we can simply forget about them.

>  I certainly agree that there are plenty of cases where timeouts are
>  important and necessary.  But there are also plenty of cases where you will
>  know exactly when you can allow suspend again, and having a timeout there is
>  just confusing.

Please note that with patch [2/2] the timeout can always be overridden.

>  But worse - the mechanism you provide can be trivially implemented using
>  unix-domain sockets talking to a suspend-daemon.
> 
>  Instead of opening /dev/sleepctl, you connect to /var/run/suspend-daemon/sock
>  Instead of ioctl(SLEEPCTL_STAY_AWAKE), you write a number to the socket.
>  Instead of ioctl(SLEEPCTL_RELAX), you write zero to the socket.
> 
>  All the extra handling you do in the kernel, can easily be done by
>  user-space suspend-daemon.

I'm not exactly sure why it is "worse".  Doing it through sockets may require
the kernel to do more work and it won't be possible to implement the
SLEEPCTL_WAIT_EVENT ioctl I've just described to John this way.

>  I really wish I could work out why people find the current mechanism
>  "difficult to use".  What exactly is it that is difficult?
>  I have described previously how to build a race-free suspend system.  Which
>  bit of that is complicated or hard to achieve?  Or which bit of that cannot
>  work the way I claim?  Or which need is not met by my proposals?
> 
>  Isn't it much preferable to do this in userspace where people can
>  experiment and refine and improve without having to upgrade the kernel?

Well, I used to think that it's better to do things in user space.  Hence,
the hibernate user space interface that's used by many people.  And my
experience with that particular thing made me think that doing things in
the kernel may actually work better, even if they _can_ be done in user space.

Obviously, that doesn't apply to everything, but sometimes it simply is worth
discussing (if not trying).  If it doesn't work out, then fine, let's do it
differently, but I'm really not taking the "this should be done in user space"
argument at face value any more.  Sorry about that.

Thanks,
Rafael


* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-15 22:10   ` Rafael J. Wysocki
@ 2011-10-16  2:49     ` Alan Stern
  2011-10-16 14:51       ` Alan Stern
  2011-10-16 20:26       ` Rafael J. Wysocki
  2011-10-16 23:48     ` NeilBrown
  1 sibling, 2 replies; 80+ messages in thread
From: Alan Stern @ 2011-10-16  2:49 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: NeilBrown, Linux PM list, mark gross, LKML, John Stultz

On Sun, 16 Oct 2011, Rafael J. Wysocki wrote:

> Hi,
> 
> On Friday, October 14, 2011, NeilBrown wrote:
> > On Thu, 13 Oct 2011 21:45:42 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> ...
> > 
> > Hi Rafael,
> > 
> >  What do you mean by "too complicated to use in practice"?  What is your
> >  measure for complexity?
> 
> I, personally, don't really know what the difficulty is, as I have already
> described this approach for a few times (for example, in Section 5 of the
> article at http://lwn.net/images/pdf/suspend_blockers.pdf).  However, I've
> recently talked to a few people whom I regard as smart and who had tried
> to implement it and it didn't work for them, and they aren't really able
> to say why exactly.  Thus I have concluded it has to be complicated, but
> obviously you're free to draw your own conclusions. :-)
> 
> [BTW, attempts to defend the approach I have invented against myself are
>  extremely likely to fail, pretty much by definition. ;-)]

I'm with Neil on this.  I think the mechanisms you have proposed could 
be implemented equally well in userspace, with very little penalty.

Certainly any process that is wakeup-aware will have to be 
specially written in any case.  Either it has provisions for using 
/dev/sleepctl or it has provisions for communicating with a PM daemon.  
One doesn't seem any simpler than the other.

In addition, with proper care a PM daemon could be written that would
work with "legacy" systems (all current systems qualify).  It would 
allow for new wakeup-aware programs while allowing the legacy system to 
work correctly.

> >  Using suspend in a race-free way is certainly less complex than - for
> >  example - configuring bluetooth.
> >  And in what way is it "inadequate for other reasons"? What reasons?
> 
> Consider the scenario described by John (the wakeup problem).  A process
> has to do something at certain time and the system shouldn't be suspended
> while the process is doing that, although it very well may be suspended
> earlier or later.  The process puts itself to sleep with the assumption
> that a wake alarm is set (presumably by another process) to wake the system
> up from suspend at the right time (if the suspend happens).  However, the
> process itself doesn't know _exactly_ what time the wake alarm is set to.

That's okay.  When the process is notified about an impending suspend,
it checks the current time.  If the current time is more than Delta-T
before the target time, it allows the suspend to occur (relying on the
wake alarm to activate before the target time).  If not, it forbids the
suspend.
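
As a rough illustration of that check (only a sketch; the function name,
the example times and the five-minute Delta-T are made up here and are not
part of any existing interface):

```python
from datetime import datetime, timedelta


def allow_suspend(now: datetime, target: datetime, delta_t: timedelta) -> bool:
    """Return True if suspend may proceed, False if it must be forbidden."""
    # Suspend is allowed only while the target time is more than delta_t
    # away; closer than that, the process vetoes the suspend and stays up.
    return target - now > delta_t


# Hypothetical example: a job scheduled for 03:00 with a 5-minute margin.
target = datetime(2011, 10, 17, 3, 0, 0)
delta_t = timedelta(minutes=5)
early = allow_suspend(datetime(2011, 10, 17, 1, 0, 0), target, delta_t)
late = allow_suspend(datetime(2011, 10, 17, 2, 58, 0), target, delta_t)
```

Here "early" allows the suspend (the wake alarm is relied upon) and "late"
forbids it.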

> In the situation in which we only have the existing mechanism and a user space
> power manager daemon, this scenario appears to be inherently racy, such that it
> cannot be handled correctly.

Anything wrong with the scheme described above?

> Is KDE going to use the same mechanism, for one example?  And what about other
> user space variants?  MeeGo anyone?  Tizen?  Android??

It shouldn't matter.  We ought to be able to write a PM daemon that 
would work under any of these systems as they currently exist.

> >  But I think it is very wrong to put some hack in the kernel like your
> >    suspend_mode = disabled
> 
> Why is it wrong and why do you think it is a "hack"?
> 
> >  just because the user-space community hasn't got its act together yet.
> 
> Is there any guarantee that it will get its act together in any foreseeable
> time frame?
> 
> >  And if you really need a hammer to stop processes from suspending the system:
> > 
> >    cat /sys/power/state > /tmp/state
> >    mount --bind /tmp/state /sys/power/state
> > 
> >  should do it.
> 
> Except that (1) it appears to be racy (what if system suspend happens between
> the first and second line in your example - can you safely start to upgrade
> your firmware in that case?) and (2) it won't prevent the hibernate interface
> based on /dev/snapshot from being used.

The bind mount, or something equivalent, would be done once, when the
PM daemon starts up (presumably at boot time).  Races aren't an issue 
then.

Basically, what we need is a reliable way to intercept the existing
mechanisms for suspend/hibernate and to redirect the requests to the PM
daemon.  When the daemon is started up in "legacy" mode, it assumes
there is a legacy client (representing the entire set of
non-wakeup-aware programs) that always forbids suspend _except_ when
one of the old mechanisms is invoked.

> Do you honestly think I'd propose something like patch [1/2] if I didn't
> see any other _working_ approach?

This redirection idea is worth considering.

> >  Your second patch has little to recommend it either.
> >  In the first place it seems to be entrenching the notion that timeouts are a
> >  good and valid way to think about suspend.
> 
> That's because I think they are unavoidable.  Even if we are able to eliminate
> all timeouts in the handling of wakeup events by the kernel and passing them
> to user space, which I don't think is a realistic expectation, the user will
> still have only so much time to wait for things to happen.  For example, if
> a phone user doesn't see the screen turn on 0.5 sec after the button was
> pressed, the button is pretty much guaranteed to be pressed again.  This
> observation applies to other wakeup events, more or less.  They are very much
> like items with "suitability for consumption" timestamps: if they are not
> consumed quickly enough, we can simply forget about them.

At the moment, I don't see the utility of timeouts for wakeup-aware 
user programs.  While they may sometimes be necessary in the kernel, a 
program can implement its own timers.

> >  But worse - the mechanism you provide can be trivially implemented using
> >  unix-domain sockets talking to a suspend-daemon.
> > 
> >  Instead of opening /dev/sleepctl, you connect to /var/run/suspend-daemon/sock
> >  Instead of ioctl(SLEEPCTL_STAY_AWAKE), you write a number to the socket.
> >  Instead of ioctl(SLEEPCTL_RELAX), you write zero to the socket.
> > 
> >  All the extra handling you do in the kernel, can easily be done by
> >  user-space suspend-daemon.
> 
> I'm not exactly sure why it is "worse".  Doing it through sockets may require
> the kernel to do more work and it won't be possible to implement the
> SLEEPCTL_WAIT_EVENT ioctl I've just described to John this way.

Why not?  The PM daemon queries all its clients when a suspend is 
imminent.  Those queries are just like the SIGPWR things you described 
for SLEEPCTL_WAIT_EVENT.

> >  Isn't it much preferable to do this in userspace where people can
> >  experiment and refine and improve without having to upgrade the kernel?
> 
> Well, I used to think that it's better to do things in user space.  Hence,
> the hibernate user space interface that's used by many people.  And my
> experience with that particular thing made me think that doing things in
> the kernel may actually work better, even if they _can_ be done in user space.
> 
> Obviously, that doesn't apply to everything, but sometimes it simply is worth
> discussing (if not trying).  If it doesn't work out, then fine, let's do it
> differently, but I'm really not taking the "this should be done in user space"
> argument at face value any more.  Sorry about that.

In this case, I strongly suspect that the difficulty level will be
about the same either way.  Both approaches would place strict
requirements on the structure of wakeup-aware programs.

Alan Stern


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-16  2:49     ` Alan Stern
@ 2011-10-16 14:51       ` Alan Stern
  2011-10-16 20:32         ` Rafael J. Wysocki
  2011-10-16 22:34         ` NeilBrown
  2011-10-16 20:26       ` Rafael J. Wysocki
  1 sibling, 2 replies; 80+ messages in thread
From: Alan Stern @ 2011-10-16 14:51 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: NeilBrown, Linux PM list, mark gross, LKML, John Stultz

On Sat, 15 Oct 2011, Alan Stern wrote:

> Basically, what we need is a reliable way to intercept the existing
> mechanisms for suspend/hibernate and to redirect the requests to the PM
> daemon.  When the daemon is started up in "legacy" mode, it assumes
> there is a legacy client (representing the entire set of
> non-wakeup-aware programs) that always forbids suspend _except_ when
> one of the old mechanisms is invoked.

The more I think about this, the better it seems.  In essence, it 
amounts to "virtualizing" the existing PM interface.

Let's add /sys/power/manage, and make it single-open.  Whenever that 
file is open, writes to /sys/power/state and /dev/snapshot don't work 
normally; instead they get forwarded over /sys/power/manage (and 
results get sent back).  Suspend is easy; hibernation (because of its 
multi-step nature) will be more difficult.

The only important requirement is that processes can use poll system 
calls to wait for wakeup events.  This may not always be true (consider 
timer expirations, for example), but we ought to be able to make some 
sort of accommodation.

The PM daemon will communicate with its clients over a Unix-domain
socket.  The protocol can be extremely simple: The daemon sends a byte
to the client when it wants to sleep, and the client sends the byte
back when it is ready to allow the system to go to sleep.  There's
never more than one byte outstanding at any time in either direction.

The clients would be structured like this:

	Open a socket connection to the PM daemon.

	Loop:

		Poll on possible events and the PM socket.

		If any events occurred, handle them.

		Otherwise if a byte was received from the PM daemon,
		send it back.
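
A minimal sketch of one pass through that client loop, assuming the wakeup
events arrive on a pollable descriptor.  The function name, the returned
labels and the socketpair stand-ins for the daemon and the event source are
all hypothetical; no such daemon exists yet:

```python
import select
import socket


def client_iteration(pm_sock, event_sock, handle_event, timeout_ms=1000):
    """Run one pass of the client loop; return 'event', 'ack' or 'idle'."""
    poller = select.poll()
    poller.register(pm_sock, select.POLLIN)
    poller.register(event_sock, select.POLLIN)
    ready = {fd for fd, _ in poller.poll(timeout_ms)}

    # Events take priority: while there is work to do, the daemon's byte
    # stays unanswered, which is what keeps the system awake.
    if event_sock.fileno() in ready:
        handle_event(event_sock.recv(4096))
        return "event"

    # No events pending: echo the daemon's byte to permit the suspend.
    if pm_sock.fileno() in ready:
        token = pm_sock.recv(1)
        if token:
            pm_sock.sendall(token)
            return "ack"
    return "idle"


# Stand-ins for the PM daemon connection and for a wakeup event source.
daemon_end, client_pm = socket.socketpair()
source_end, client_ev = socket.socketpair()
handled = []
source_end.sendall(b"wakeup")
daemon_end.sendall(b"\x01")
first = client_iteration(client_pm, client_ev, handled.append)
second = client_iteration(client_pm, client_ev, handled.append)
echoed = daemon_end.recv(1)
```

The first pass handles the event and leaves the daemon's byte pending; the
second pass finds nothing else to do and sends the byte back.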

In non-legacy mode, the PM daemon's main loop is also quite simple:

	1. Read /sys/power/wakeup_count.

	2. For each client socket:

		If a response to the previous transmission is still
		pending, wait for it.

		Send a byte (the data can be just a sequence number).

		Wait for the byte to be echoed back.

	3. Write /sys/power/wakeup_count.

	4. Write a sleep command to /sys/power/manage.

A timeout can be added to step 2 if desired, but in this mode it isn't
needed.
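
Steps 1-4 could be sketched roughly as follows.  This is only an
illustration: /sys/power/manage does not exist, the "mem" command written
to it is an assumption about a protocol that has not been defined, and the
paths are parameters so the sketch can be tried against ordinary temporary
files.  Because each send is followed by a blocking wait, no response is
ever left pending from a previous pass:

```python
import os
import socket
import tempfile
import threading


def suspend_cycle(clients, wakeup_count_path, manage_path, seq):
    """One pass of steps 1-4; True if the sleep command was written."""
    # Step 1: remember the current wakeup count.
    with open(wakeup_count_path) as f:
        count = f.read().strip()

    # Step 2: query every client and wait for the byte to be echoed back.
    token = bytes([seq % 256])
    for sock in clients:
        sock.sendall(token)
        if sock.recv(1) != token:
            return False  # client went away or answered with garbage

    # Step 3: write the count back; on the real sysfs file this write
    # fails if wakeup events occurred since step 1.
    try:
        with open(wakeup_count_path, "w") as f:
            f.write(count)
    except OSError:
        return False

    # Step 4: ask for the actual sleep transition (hypothetical command).
    with open(manage_path, "w") as f:
        f.write("mem")
    return True


def echo_client(sock):
    # Trivial client with no events of its own: always permits suspend.
    sock.sendall(sock.recv(1))


daemon_end, client_end = socket.socketpair()
worker = threading.Thread(target=echo_client, args=(client_end,))
worker.start()
with tempfile.TemporaryDirectory() as tmp:
    wc_path = os.path.join(tmp, "wakeup_count")
    mg_path = os.path.join(tmp, "manage")
    with open(wc_path, "w") as f:
        f.write("42\n")
    slept = suspend_cycle([daemon_end], wc_path, mg_path, seq=7)
    with open(mg_path) as f:
        command = f.read()
    with open(wc_path) as f:
        written = f.read()
worker.join()
```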

With legacy support enabled, we probably will want something like a 
1-second timeout for step 2.  We'll also need an extra step at the 
beginning and one at the end:

	0. Wait for somebody to write "standby" or "mem" to 
	   /sys/power/state (received via the /sys/power/manage file).

	5. Send the final status of the suspend command back to the
	   /sys/power/state writer.

Equivalent support for hibernation is left as an exercise for the 
reader.

Obviously the PM daemon will need a secondary thread to accept new 
incoming socket connections, and these connections will have to be 
synchronized with the end of the iteration in step 2 (i.e., don't 
accept new connections between the end of step 2 and the end of step 
4).

Initial startup of the daemon will be a little tricky, because it
shouldn't start carrying out suspends until some clients have had a
chance to connect.  For that matter, in non-legacy mode the daemon
might not want to initiate suspends when there are no clients -- the
system would never get anything done because it would go back to sleep
as soon as the kernel finished processing each wakeup event.

This really seems like it could work, and it wouldn't be tremendously 
complicated.  The only changes needed in the kernel would be the 
"virtualization" (or forwarding) mechanism for legacy support.

Alan Stern

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-16  2:49     ` Alan Stern
  2011-10-16 14:51       ` Alan Stern
@ 2011-10-16 20:26       ` Rafael J. Wysocki
  1 sibling, 0 replies; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-16 20:26 UTC (permalink / raw)
  To: Alan Stern; +Cc: NeilBrown, Linux PM list, mark gross, LKML, John Stultz

Hi,

On Sunday, October 16, 2011, Alan Stern wrote:
> On Sun, 16 Oct 2011, Rafael J. Wysocki wrote:
> 
> > Hi,
> > 
> > On Friday, October 14, 2011, NeilBrown wrote:
> > > On Thu, 13 Oct 2011 21:45:42 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > ...
> > > 
> > > Hi Rafael,
> > > 
> > >  What do you mean by "too complicated to use in practice"?  What is your
> > >  measure for complexity?
> > 
> > I, personally, don't really know what the difficulty is, as I have already
> > described this approach for a few times (for example, in Section 5 of the
> > article at http://lwn.net/images/pdf/suspend_blockers.pdf).  However, I've
> > recently talked to a few people whom I regard as smart and who had tried
> > to implement it and it didn't work for them, and they aren't really able
> > to say why exactly.  Thus I have concluded it has to be complicated, but
> > obviously you're free to draw your own conclusions. :-)
> > 
> > [BTW, attempts to defend the approach I have invented against myself are
> >  extremely likely to fail, pretty much by definition. ;-)]
> 
> I'm with Neil on this.  I think the mechanisms you have proposed could 
> be implemented equally well in userspace, with very little penalty.

There's one little problem with that, which is to make user space developers
actually implement your idea. :-)

> Certainly any process that is wakeup-aware will have to be 
> specially written in any case.  Either it has provisions for using 
> /dev/sleepctl or it has provisions for communicating with a PM daemon.  
> One doesn't seem any simpler than the other.

I can agree with that.

> In addition, with proper care a PM daemon could be written that would
> work with "legacy" systems (all current systems qualify).  It would 
> allow for new wakeup-aware programs while allowing the legacy system to 
> work correctly.

Cool.  I thought pretty much the same one year ago, but so far, there are
no real-life implementations.

> > >  Using suspend in a race-free way is certainly less complex than - for
> > >  example - configuring bluetooth.
> > >  And in what way is it "inadequate for other reasons"? What reasons?
> > 
> > Consider the scenario described by John (the wakeup problem).  A process
> > has to do something at certain time and the system shouldn't be suspended
> > while the process is doing that, although it very well may be suspended
> > earlier or later.  The process puts itself to sleep with the assumption
> > that a wake alarm is set (presumably by another process) to wake the system
> > up from suspend at the right time (if the suspend happens).  However, the
> > process itself doesn't know _exactly_ what time the wake alarm is set to.
> 
> That's okay.  When the process is notified about an impending suspend,
> it checks the current time.  If the current time is more than Delta-T
> before the target time, it allows the suspend to occur (relying on the
> wake alarm to activate before the target time).  If not, it forbids the
> suspend.
> 
> > In the situation in which we only have the existing mechanism and a user space
> > power manager daemon, this scenario appears to be inherently racy, such that it
> > cannot be handled correctly.
> 
> Anything wrong with the scheme described above?

The wake alarm may happen before time T - Delta-T, in which case the process
will allow suspend to happen and won't be woken up.  However, I agree that this
is a matter of timing.

> > Is KDE going to use the same mechanism, for one example?  And what about other
> > user space variants?  MeeGo anyone?  Tizen?  Android??
> 
> It shouldn't matter.  We ought to be able to write a PM daemon that 
> would work under any of these systems as they currently exist.

OK

> > >  But I think it is very wrong to put some hack in the kernel like your
> > >    suspend_mode = disabled
> > 
> > Why is it wrong and why do you think it is a "hack"?
> > 
> > >  just because the user-space community hasn't got its act together yet.
> > 
> > Is there any guarantee that it will get its act together in any foreseeable
> > time frame?
> > 
> > >  And if you really need a hammer to stop processes from suspending the system:
> > > 
> > >    cat /sys/power/state > /tmp/state
> > >    mount --bind /tmp/state /sys/power/state
> > > 
> > >  should do it.
> > 
> > Except that (1) it appears to be racy (what if system suspend happens between
> > the first and second line in your example - can you safely start to upgrade
> > your firmware in that case?) and (2) it won't prevent the hibernate interface
> > based on /dev/snapshot from being used.
> 
> The bind mount, or something equivalent, would be done once, when the
> PM daemon starts up (presumably at boot time).  Races aren't an issue 
> then.
> 
> Basically, what we need is a reliable way to intercept the existing
> mechanisms for suspend/hibernate and to redirect the requests to the PM
> daemon.  When the daemon is started up in "legacy" mode, it assumes
> there is a legacy client (representing the entire set of
> non-wakeup-aware programs) that always forbids suspend _except_ when
> one of the old mechanisms is invoked.

I think that implementing this will actually be more complicated than my
patches.

> > Do you honestly think I'd propose something like patch [1/2] if I didn't
> > see any other _working_ approach?
> 
> This redirection idea is worth considering.
> 
> > >  Your second patch has little to recommend it either.
> > >  In the first place it seems to be entrenching the notion that timeouts are a
> > >  good and valid way to think about suspend.
> > 
> > That's because I think they are unavoidable.  Even if we are able to eliminate
> > all timeouts in the handling of wakeup events by the kernel and passing them
> > to user space, which I don't think is a realistic expectation, the user will
> > still have only so much time to wait for things to happen.  For example, if
> > a phone user doesn't see the screen turn on 0.5 sec after the button was
> > pressed, the button is pretty much guaranteed to be pressed again.  This
> > observation applies to other wakeup events, more or less.  They are very much
> > like items with "suitability for consumption" timestamps: if they are not
> > consumed quickly enough, we can simply forget about them.
> 
> At the moment, I don't see the utility of timeouts for wakeup-aware 
> user programs.  While they may sometimes be necessary in the kernel, a 
> program can implement its own timers.

So consider the following modification of patch [2/2] in this series.

The SLEEPCTL_RELAX ioctl may take an additional argument (0 or 1)
indicating whether or not the process should be sent a signal (e.g. SIGPWR)
on the next wakeup event.  Along with sending the signal, the kernel will
do an equivalent of the SLEEPCTL_STAY_AWAKE, but the process will know
that it's supposed to do SLEEPCTL_RELAX again.  In this case, the timeouts
will be entirely optional (they need not be present at all in the patch).
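
To make the intended semantics a bit more concrete, here is a purely
user-space model of the state involved.  It is not driver code and issues
no real ioctls; the class and method names are made up for illustration,
and the "one-shot" treatment of the notification is my reading of "on the
next wakeup event":

```python
class SleepCtlModel:
    """Toy model of one open /dev/sleepctl handle (illustrative only)."""

    def __init__(self):
        self.blocking = False  # True while this handle prevents suspend
        self.notify = False    # True if a signal was requested via relax()

    def stay_awake(self):
        # Models SLEEPCTL_STAY_AWAKE.
        self.blocking = True

    def relax(self, notify):
        # Models SLEEPCTL_RELAX with the extra 0/1 argument.
        self.blocking = False
        self.notify = bool(notify)

    def wakeup_event(self):
        """Model a wakeup event; return True if SIGPWR would be sent."""
        if not self.notify:
            return False
        # The kernel signals the process and blocks suspend on its behalf;
        # the process is expected to call relax() again when it is done.
        self.notify = False
        self.blocking = True
        return True


handle = SleepCtlModel()
handle.relax(1)
signalled = handle.wakeup_event()
blocked_after_signal = handle.blocking
repeated = handle.wakeup_event()
handle.relax(0)
silent = handle.wakeup_event()
blocked_at_end = handle.blocking
```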

That said, I still don't think we can avoid timeouts at higher levels, so
I don't see why they are wrong at this level.

> > >  But worse - the mechanism you provide can be trivially implemented using
> > >  unix-domain sockets talking to a suspend-daemon.
> > > 
> > >  Instead of opening /dev/sleepctl, you connect to /var/run/suspend-daemon/sock
> > >  Instead of ioctl(SLEEPCTL_STAY_AWAKE), you write a number to the socket.
> > >  Instead of ioctl(SLEEPCTL_RELAX), you write zero to the socket.
> > > 
> > >  All the extra handling you do in the kernel, can easily be done by
> > >  user-space suspend-daemon.
> > 
> > I'm not exactly sure why it is "worse".  Doing it through sockets may require
> > the kernel to do more work and it won't be possible to implement the
> > SLEEPCTL_WAIT_EVENT ioctl I've just described to John this way.
> 
> Why not?  The PM daemon queries all its clients when a suspend is 
> imminent.  Those queries are just like the SIGPWR things you described 
> for SLEEPCTL_WAIT_EVENT.

SIGPWR means that a wakeup event has actually happened, but the queries are
just in case.  And I claim that programming applications for handling those
queries will be more complicated than using the SLEEPCTL_STAY_AWAKE and
SLEEPCTL_RELAX ioctls with the modification described above.

Plus we'll need to implement the PM manager daemon, which I think will take
more time and code than those relatively simple patches I sent. :-)

> > >  Isn't it much preferable to do this in userspace where people can
> > >  experiment and refine and improve without having to upgrade the kernel?
> > 
> > Well, I used to think that it's better to do things in user space.  Hence,
> > the hibernate user space interface that's used by many people.  And my
> > experience with that particular thing made me think that doing things in
> > the kernel may actually work better, even if they _can_ be done in user space.
> > 
> > Obviously, that doesn't apply to everything, but sometimes it simply is worth
> > discussing (if not trying).  If it doesn't work out, then fine, let's do it
> > differently, but I'm really not taking the "this should be done in user space"
> > argument at face value any more.  Sorry about that.
> 
> In this case, I strongly suspect that the difficulty level will be
> about the same either way.

I'm not sure about that.

> Both approaches would place strict requirements on the structure of
> wakeup-aware programs.

That's obviously correct.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-16 14:51       ` Alan Stern
@ 2011-10-16 20:32         ` Rafael J. Wysocki
  2011-10-17 15:33           ` Alan Stern
  2011-10-16 22:34         ` NeilBrown
  1 sibling, 1 reply; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-16 20:32 UTC (permalink / raw)
  To: Alan Stern; +Cc: NeilBrown, Linux PM list, mark gross, LKML, John Stultz

On Sunday, October 16, 2011, Alan Stern wrote:
> On Sat, 15 Oct 2011, Alan Stern wrote:
> 
> > Basically, what we need is a reliable way to intercept the existing
> > mechanisms for suspend/hibernate and to redirect the requests to the PM
> > daemon.  When the daemon is started up in "legacy" mode, it assumes
> > there is a legacy client (representing the entire set of
> > non-wakeup-aware programs) that always forbids suspend _except_ when
> > one of the old mechanisms is invoked.
> 
> The more I think about this, the better it seems.  In essence, it 
> amounts to "virtualizing" the existing PM interface.
> 
> Let's add /sys/power/manage, and make it single-open.

I'm not sure how to do that in sysfs.

Also I'm not sure what the real difference between /sys/power/manage
and my /sys/power/sleep_mode is (I could make /sys/power/sleep_mode
single-open too, if I knew how to do that).

> Whenever that file is open, writes to /sys/power/state and /dev/snapshot
> don't work normally; instead they get forwarded over /sys/power/manage (and 
> results get sent back).  Suspend is easy; hibernation (because of its 
> multi-step nature) will be more difficult.
> 
> The only important requirement is that processes can use poll system 
> calls to wait for wakeup events.  This may not always be true (consider 
> timer expirations, for example), but we ought to be able to make some 
> sort of accommodation.
> 
> The PM daemon will communicate with its clients over a Unix-domain
> socket.  The protocol can be extremely simple: The daemon sends a byte
> to the client when it wants to sleep, and the client sends the byte
> back when it is ready to allow the system to go to sleep.  There's
> never more than one byte outstanding at any time in either direction.
> 
> The clients would be structured like this:
> 
> 	Open a socket connection to the PM daemon.
> 
> 	Loop:
> 
> 		Poll on possible events and the PM socket.
> 
> 		If any events occurred, handle them.
> 
> 		Otherwise if a byte was received from the PM daemon,
> 		send it back.
> 
> In non-legacy mode, the PM daemon's main loop is also quite simple:
> 
> 	1. Read /sys/power/wakeup_count.
> 
> 	2. For each client socket:
> 
> 		If a response to the previous transmission is still
> 		pending, wait for it.
> 
> 		Send a byte (the data can be just a sequence number).
> 
> 		Wait for the byte to be echoed back.
> 
> 	3. Write /sys/power/wakeup_count.
> 
> 	4. Write a sleep command to /sys/power/manage.
> 
> A timeout can be added to step 2 if desired, but in this mode it isn't
> needed.
> 
> With legacy support enabled, we probably will want something like a 
> 1-second timeout for step 2.  We'll also need an extra step at the 
> beginning and one at the end:
> 
> 	0. Wait for somebody to write "standby" or "mem" to 
> 	   /sys/power/state (received via the /sys/power/manage file).
> 
> 	5. Send the final status of the suspend command back to the
> 	   /sys/power/state writer.
> 
> Equivalent support for hibernation is left as an exercise for the 
> reader.

Hehe.  Quite a difficult one for that matter. :-)

> Obviously the PM daemon will need a secondary thread to accept new 
> incoming socket connections, and these connections will have to be 
> synchronized with the end of the iteration in step 2 (i.e., don't 
> accept new connections between the end of step 2 and the end of step 
> 4).
> 
> Initial startup of the daemon will be a little tricky, because it
> shouldn't start carrying out suspends until some clients have had a
> chance to connect.  For that matter, in non-legacy mode the daemon
> might not want to initiate suspends when there are no clients -- the
> system would never get anything done because it would go back to sleep
> as soon as the kernel finished processing each wakeup event.
> 
> This really seems like it could work, and it wouldn't be tremendously 
> complicated.  The only changes needed in the kernel would be the 
> "virtualization" (or forwarding) mechanism for legacy support.

Yes, it could be made to work, just as the hibernate user space interface,
but would it be really convenient to use?  I have some doubts.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-16 14:51       ` Alan Stern
  2011-10-16 20:32         ` Rafael J. Wysocki
@ 2011-10-16 22:34         ` NeilBrown
  2011-10-17 14:45           ` Alan Stern
  2011-10-31 15:11           ` [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces Richard Hughes
  1 sibling, 2 replies; 80+ messages in thread
From: NeilBrown @ 2011-10-16 22:34 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Linux PM list, mark gross, LKML, John Stultz

On Sun, 16 Oct 2011 10:51:01 -0400 (EDT) Alan Stern
<stern@rowland.harvard.edu> wrote:

> On Sat, 15 Oct 2011, Alan Stern wrote:
> 
> > Basically, what we need is a reliable way to intercept the existing
> > mechanisms for suspend/hibernate and to redirect the requests to the PM
> > daemon.  When the daemon is started up in "legacy" mode, it assumes
> > there is a legacy client (representing the entire set of
> > non-wakeup-aware programs) that always forbids suspend _except_ when
> > one of the old mechanisms is invoked.
> 
> The more I think about this, the better it seems.  In essence, it 
> amounts to "virtualizing" the existing PM interface.

While "virtualizing" does sound attractive in some way, I think it would be
the wrong thing to do.
In practice there is only one process at a time that is likely to suspend
the system.  I've just been exploring how that works.

gnome-power-manager talks to upowerd over dbus to ask for a suspend.
upowerd then runs /usr/sbin/pm-suspend.
pm-suspend then runs all the scripts in /usr/lib/pm-utils/sleep.d/
and then calls "do_suspend" which is defined in /usr/lib/pm-utils/pm-functions

Ugghh.. That is a very deep stack that is doing things the "wrong" way.
i.e. it is structured around requests to suspend rather than requests to stay
awake.

Nonetheless, we only really need to worry about the bottom of the stack.
Rather than virtualize /sys/power/state, just modify pm-functions, which
you can probably do by putting appropriate content
into /usr/lib/pm-utils/defaults.
Get that to define a do_suspend which interacts with the new suspend-daemon
to say "now would be a good time to suspend" - if nothing else is blocking
suspend, it does.

Put it another way:  power-management has always been "virtualized" via lots
of shell scripts in pm-utils (and various daemons stacked on top of that).
We just need to plug in to that virtualisation.

This is all based on gnome.  kde might be different, but I suspect that is
only at the top levels.  I would be surprised if kde and the other desktops
don't all end up going through pm-utils.

NeilBrown

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-15 22:10   ` Rafael J. Wysocki
  2011-10-16  2:49     ` Alan Stern
@ 2011-10-16 23:48     ` NeilBrown
  2011-10-17 15:43       ` Alan Stern
  2011-10-17 22:02       ` Rafael J. Wysocki
  1 sibling, 2 replies; 80+ messages in thread
From: NeilBrown @ 2011-10-16 23:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM list, mark gross, LKML, John Stultz, Alan Stern

On Sun, 16 Oct 2011 00:10:40 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> Hi,
> 
> On Friday, October 14, 2011, NeilBrown wrote:
> > On Thu, 13 Oct 2011 21:45:42 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> ...
> > 
> > Hi Rafael,
> > 
> >  What do you mean by "too complicated to use in practice"?  What is your
> >  measure for complexity?
> 
> I, personally, don't really know what the difficulty is, as I have already
> described this approach for a few times (for example, in Section 5 of the
> article at http://lwn.net/images/pdf/suspend_blockers.pdf).  However, I've
> recently talked to a few people whom I regard as smart and who had tried
> to implement it and it didn't work for them, and they aren't really able
> to say why exactly.  Thus I have concluded it has to be complicated, but
> obviously you're free to draw your own conclusions. :-)
> 
> [BTW, attempts to defend the approach I have invented against myself are
>  extremely likely to fail, pretty much by definition. ;-)]

:-)   Maybe we can defend it together then.

> 
> >  Using suspend in a race-free way is certainly less complex than - for
> >  example - configuring bluetooth.
> >  And in what way is it "inadequate for other reasons"? What reasons?
> 
> Consider the scenario described by John (the wakeup problem).  A process
> has to do something at certain time and the system shouldn't be suspended
> while the process is doing that, although it very well may be suspended
> earlier or later.  The process puts itself to sleep with the assumption
> that a wake alarm is set (presumably by another process) to wake the system
> up from suspend at the right time (if the suspend happens).  However, the
> process itself doesn't know _exactly_ what time the wake alarm is set to.
> 
> In the situation in which we only have the existing mechanism and a user space
> power manager daemon, this scenario appears to be inherently racy, such that it
> cannot be handled correctly.

I would suggest that we need a time-management daemon - whether it ends up
being cron or systemd or scripts in pm-utils or some new daemon is just an
implementation detail.

i.e. we need a way for a process to say "I have something to do at X o'clock".
Whatever provides this service needs to hook in to the suspend logic and when
a suspend is about to happen, it programs the RTC alarm to wake up at (or
just before) the earliest requested time.
When the time arrives, the service blocks suspend and replies to the original
process "OK, do your thing".  The process takes over blocking of suspend,
acknowledges the wakeup, and does the backup or whatever.

So yes: we do need something new, but it is easy enough to do all in
user-space.

One important part of this is that an RTC alarm needs to be treated as a
wakeup_event and have a wakeup_source activated for it.  I suspect you could
avoid the need for that by having the suspend daemon know about programming
the RTC alarm and to simply not suspend at a bad time.
John Stultz posted a patch to add a wakeup_source for the RTC.  What do you
think of that?  Is a wakeup_source sensible here, or should user-space just
be careful about not suspending when an RTC alarm is due soon?

.... Actually, the more I think about it, the more sense it makes to include
the wake-up-at-time service with the suspend-daemon.  Then the RTC alarm
doesn't need a wakeup_source.
So my hypothetical suspend-daemon provides 2 services:
 1/ Client can say "Don't suspend after X".  If X is in the past it means
    don't suspend at all. In the future it means "If you suspend before
    this, be sure to wake up by X".  This request must be explicitly
    cancelled (though some mechanism is needed so that if the process dies
    it is automatically cancelled).
 2/ Client can say "check with me before entering suspend".  Client needs to
    respond to any callback promptly, but can register a "don't suspend after
    now" request first.
    (Client probably gets a callback both on suspend and resume)
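
To make service 1/ a little more concrete, here is a minimal Python sketch of
the bookkeeping such a suspend-daemon might keep.  It is only an illustration:
the class and method names are made up, times are plain numbers of seconds,
and neither the RTC programming nor the automatic cancellation when a client
dies is shown.

```python
class StayAwakeRequests:
    """Hypothetical table of "don't suspend after X" requests, keyed by client."""

    def __init__(self):
        self._deadlines = {}          # client id -> time X (seconds)

    def register(self, client, x):
        # A client asks: "don't suspend after X".
        self._deadlines[client] = x

    def cancel(self, client):
        # Requests stay in force until explicitly cancelled.
        self._deadlines.pop(client, None)

    def may_suspend(self, now):
        # Any X that is not in the future means "don't suspend at all".
        return all(x > now for x in self._deadlines.values())

    def next_wakeup(self, now):
        # Earliest future X: if we suspend now, we must be awake again by then.
        future = [x for x in self._deadlines.values() if x > now]
        return min(future) if future else None
```

Before suspending, the daemon would check may_suspend() and, if it goes ahead,
program the wake alarm for next_wakeup() or slightly earlier.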

> 
> >  The only sane way to handle suspend is for any (suitably privileged) process
> >  to be able to request that suspend doesn't happen, and then for one process
> >  to initiate suspend when no-one is blocking it.
> 
> As long as you don't specify the exact way by which the request is made and
> how the suspend is blocked, the above statement is almost meaningless.

The meaning is in the style of request.  Requests should be "don't suspend at
the moment", not "do suspend now".  I didn't intend it to carry more meaning
than that.

> 
> >  This is very different from the way it is currently handled where the GUI
> >  says "Hmm.. I'm not doing anything just now, I think I'll suspend".
> > 
> >  The latter simply doesn't scale.  It is broken.  It has to be replaced.
> >  And it is being replaced.
> 
> Cool, good to hear that! :-)

I might have spoken too soon there :-(
I looked more deeply at how gnome power management works and it is deeply
structured around requests to go to sleep, not requests to stay awake.

> 
> >  gnome-power-manager has a dbus interface on which you can request
> >  "InhibitInactiveSleep".  Calling that will stop gnome-power-manager from
> >  sleeping (I assume - haven't looked at the code).
> >  It might not inhibit an explicit request for sleep - in that case it is
> >  probably broken and needs to be fixed.  But it can be fixed.  Or replaced.
> 
> Perhaps.
> 
> Is KDE going to use the same mechanism, for one example?  And what about other
> user space variants?  MeeGo anyone?  Tizen?  Android??
> 
> >  So if someone is running gnome-power-manager and wants to perform a firmware
> >  update, the correct thing to do is to use dbus to disable the inactive sleep.
> >  If someone is using some other power manager they might need to use some
> >  other mechanism.  Presumably these things will be standardised at some stage.
> 
> Unless you have a specific idea about how to make this standardization happen,
> I call it wishful thinking to put it lightly.  Sorry about the harsh words, but
> that's how it goes IMNSHO. 

Standardisation will happen when enough people see a problem.  As yet it
seems that they don't.
Once there are enough Linux devices running open desktops and needing good
power management (i.e. suspend often) that people start seeing problems,
there will be more motivation to create solutions.

Currently, we just need to be sure that the kernel *can* provide the needed
functionality and, if we like, experiment with user-space code to make use of
that functionality in an effective way.

If enough people experiment, learn, and publish their results - then the more
successful implementations will eventually spread...

> 
> >  But I think it is very wrong to put some hack in the kernel like your
> >    suspend_mode = disabled
> 
> Why is it wrong and why do you think it is a "hack"?

I think it is a "hack" because it is addressing a specific complaint rather
than fixing a real problem.

Contrast that with your wakeup_events which are a carefully designed approach
addressing a real problem and taking into account the big picture.

i.e. it seems to be addressing a symptom rather than addressing the cause.

(and it is wrong because "hacks" are almost always wrong - short-term gain,
long term cost).

> 
> >  just because the user-space community hasn't got its act together yet.
> 
> Is there any guarantee that it will get its act together in any foreseeable
> time frame?
> 
> >  And if you really need a hammer to stop processes from suspending the system:
> > 
> >    cat /sys/power/state > /tmp/state
> >    mount --bind /tmp/state /sys/power/state
> > 
> >  should do it.
> 
> Except that (1) it appears to be racy (what if system suspend happens between
> the first and second line in your example - can you safely start to upgrade
> your firmware in that case?) and (2) it won't prevent the hibernate interface
> based on /dev/snapshot from being used.
> 
> Do you honestly think I'd propose something like patch [1/2] if I didn't
> see any other _working_ approach?

I think there are other workable approaches  (maybe not actually _working_,
but only because no-one has written the code).

I'm not saying we should definitely not add more functionality to the kernel,
but I am saying we should not do it at all hastily.

If someone has tried to use the current functionality, has really understood
it, has made an appropriate attempt to make use of it, and has found that
something cannot be made to work reliably, or efficiently, or securely or
whatever, then certainly consider ways to address the problems.

But I don't think we are there yet.  We are only just getting to the
"understanding" stage (and I have found these conversations very helpful in
refining my understanding).

When I get my GTA04 (phone motherboard) I hope to write some code that
actually realises these ideas properly (I have code on my GTA02, but it is
broken in various ways, and the kernel is too old to
have /sys/power/wakeup_count anyway).

> 
> >  Your second patch has little to recommend it either.
> >  In the first place it seems to be entrenching the notion that timeouts are a
> >  good and valid way to think about suspend.
> 
> That's because I think they are unavoidable.  Even if we are able to eliminate
> all timeouts in the handling of wakeup events by the kernel and passing them
> to user space, which I don't think is a realistic expectation, the user will
> still have only so much time to wait for things to happen.  For example, if
> a phone user doesn't see the screen turn on 0.5 sec after the button was
> pressed, the button is pretty much guaranteed to be pressed again.  This
> observation applies to other wakeup events, more or less.  They are very much
> like items with "suitability for consumption" timestamps: if they are not
> consumed quickly enough, we can simply forget about them.

I hadn't thought of it like that - I do see your point I think.
However things are usually consumed long before they expire - expiry times
are longer than expected shelf life.
I think it is important to think carefully about the correct expiry time for
each event type as they aren't all the same.
So I would probably go for a larger default which is always safe, but
possibly wasteful.  But that is a small point.

> 
> >  I certainly agree that there are plenty of cases where timeouts are
> >  important and necessary.  But there are also plenty of cases where you will
> >  know exactly when you can allow suspend again, and having a timeout there is
> >  just confusing.
> 
> Please note that with patch [2/2] the timeout can always be overridden.
> 
> >  But worse - the mechanism you provide can be trivially implemented using
> >  unix-domain sockets talking to a suspend-daemon.
> > 
> >  Instead of opening /dev/sleepctl, you connect to /var/run/suspend-daemon/sock
> >  Instead of ioctl(SLEEPCTL_STAY_AWAKE), you write a number to the socket.
> >  Instead of ioctl(SLEEPCTL_RELAX), you write zero to the socket.
> > 
> >  All the extra handling you do in the kernel, can easily be done by
> >  user-space suspend-daemon.
> 
> I'm not exactly sure why it is "worse".  Doing it through sockets may require
> the kernel to do more work and it won't be possible to implement the
> SLEEPCTL_WAIT_EVENT ioctl I've just described to John this way.

"worse" because it appears to me that you are adding functionality to the
kernel which is effectively already present.  When people do that to meet a
specific need it is usually not as usable as the original.  i.e. "You have
re-invented XXX - badly".  In this case XXX is IPC.

Yes - more CPU cycles may be expended in the user-space solution than a
kernel space solution, but that is a trade-off we often make.  I don't think
that suspend is a time-critical operation - is it?

And I think SLEEPCTL_WAIT_EVENT would work fine over sockets, particularly
if, instead of a signal being sent, a simple short message were sent back over
the socket.

> 
> >  I really wish I could work out why people find the current mechanism
> >  "difficult to use".  What exactly is it that is difficult?
> >  I have described previously how to build a race-free suspend system.  Which
> >  bit of that is complicated or hard to achieve?  Or which bit of that cannot
> >  work the way I claim?  Or which need is not met by my proposals?
> > 
> >  Isn't it much preferable to do this in userspace where people can
> >  experiment and refine and improve without having to upgrade the kernel?
> 
> Well, I used to think that it's better to do things in user space.  Hence,
> the hibernate user space interface that's used by many people.  And my
> experience with that particular thing made me think that doing things in
> the kernel may actually work better, even if they _can_ be done in user space.
> 
> Obviously, that doesn't apply to everything, but sometimes it simply is worth
> discussing (if not trying).  If it doesn't work out, then fine, let's do it
> differently, but I'm really not taking the "this should be done in user space"
> argument at face value any more.  Sorry about that.

:-)  I have had similar mixed experiences.   Sometimes it can be a lot easier
to get things working if it is all in the kernel.
But I think that doing things in user-space leads to a lot more flexibility.
Once you have the interfaces and designs worked out you can then start doing
more interesting things and experimenting with ideas more easily.

In this case, I think the *only* barrier to a simple solution in user-space
is the pre-existing software that uses the 'old' kernel interface.  It seems
that interfacing with that is as easy as adding a script or two to pm-utils.

With that problem solved, experimenting is much easier in user-space than in
the kernel.

Thanks,
NeilBrown

> 
> Thanks,
> Rafael

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-16 22:34         ` NeilBrown
@ 2011-10-17 14:45           ` Alan Stern
  2011-10-17 22:49             ` NeilBrown
  2011-10-31 15:11           ` [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces Richard Hughes
  1 sibling, 1 reply; 80+ messages in thread
From: Alan Stern @ 2011-10-17 14:45 UTC (permalink / raw)
  To: NeilBrown; +Cc: Rafael J. Wysocki, Linux PM list, mark gross, LKML, John Stultz

On Mon, 17 Oct 2011, NeilBrown wrote:

> > The more I think about this, the better it seems.  In essence, it 
> > amounts to "virtualizing" the existing PM interface.
> 
> While "virtualizing" does sound attractive in some way, I think it would be
> the wrong thing to do.
> In practice there is only one process at a time that is likely to suspend
> the system.  I've just been exploring how that works.
> 
> gnome-power-manager talks to upowerd over dbus to ask for a suspend.
> upowerd then runs /usr/sbin/pm-suspend.
> pm-suspend then runs all the scripts in /usr/lib/pm-utils/sleep.d/
> and then calls "do_suspend" which is defined in /usr/lib/pm-utils/pm-functions
> 
> Ugghh.. That is a very deep stack that is doing things the "wrong" way.
> i.e. it is structured around requests to suspend rather than requests to stay
> awake.
> 
> Nonetheless, we only really need to worry about the bottom of the stack.
> Rather than virtualize /sys/power/state, just modify pm-functions, which
> you can probably do by putting appropriate content
> into /usr/lib/pm-utils/defaults.
> Get that to define a do_suspend which interacts with the new suspend-daemon
> to say "now would be a good time to suspend" - if nothing else is blocking
> suspend, it does.
> 
> Put it another way:  power-management has always been "virtualized" via lots
> of shell scripts in pm-utils (and various daemons stacked on top of that).
> We just need to plug in to that virtualisation.
> 
> This is all based on gnome.  kde might be different, but I suspect that is
> only at the top levels.  I would be surprised if kde and the other desktops
> don't all end up going through pm-utils.

Okay, good; that allows us to avoid the virtualization issue.  The only
reason for having it in the first place was to be certain of working
with userspace environments that don't use a standard, structured
method for initiating system sleeps.  If you don't care about those 
environments then there's no need for it.

Do we agree about the best way to make this work?  I'm suggesting that
when the PM daemon is started up with a "legacy" option, it should
assume the existence of a predefined client that always wants to keep
the system awake, except for brief periods whenever a sleep request is
received from a new pm-utils program.  Maybe this new program could
pass the PM daemon a time limit (such as 3000 ms), with the requirement
that if the daemon can't put the system to sleep within that time limit
then it should give up and fail the sleep request.
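
As a rough sketch of the time-limit idea (Python, hypothetical names
throughout): try_suspend stands in for whatever the daemon does to attempt a
sleep and is assumed to return True on success; the retry interval is an
arbitrary choice, and the clock and wait hooks exist only so the logic can be
exercised without real delays.

```python
import time


def legacy_sleep_request(try_suspend, limit_ms=3000,
                         clock=time.monotonic, wait=time.sleep):
    """Keep trying to suspend until it works or the time limit runs out."""
    deadline = clock() + limit_ms / 1000.0
    while clock() < deadline:
        if try_suspend():
            return True       # the system went to sleep (and has resumed)
        wait(0.05)            # some client objected; retry shortly
    return False              # give up and fail the sleep request
```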

Alan Stern



* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-16 20:32         ` Rafael J. Wysocki
@ 2011-10-17 15:33           ` Alan Stern
  2011-10-17 21:10             ` Rafael J. Wysocki
  2011-10-17 21:27             ` Rafael J. Wysocki
  0 siblings, 2 replies; 80+ messages in thread
From: Alan Stern @ 2011-10-17 15:33 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: NeilBrown, Linux PM list, mark gross, LKML, John Stultz

On Sun, 16 Oct 2011, Rafael J. Wysocki wrote:

> On Sunday, October 16, 2011, Alan Stern wrote:
> > On Sat, 15 Oct 2011, Alan Stern wrote:
> > 
> > > Basically, what we need is a reliable way to intercept the existing
> > > mechanisms for suspend/hibernate and to redirect the requests to the PM
> > > daemon.  When the daemon is started up in "legacy" mode, it assumes
> > > there is a legacy client (representing the entire set of
> > > non-wakeup-aware programs) that always forbids suspend _except_ when
> > > one of the old mechanisms is invoked.
> > 
> > The more I think about this, the better it seems.  In essence, it 
> > amounts to "virtualizing" the existing PM interface.
> > 
> > Let's add /sys/power/manage, and make it single-open.
> 
> I'm not sure how to do that in sysfs.

If we don't implement the virtualization in the kernel, as Neil
suggests, then /sys/power/manage isn't necessary.  (And yes, I don't 
know how to make sysfs files single-open either -- probably there's no 
way to do it.)

> Also I'm not sure what the real difference between /sys/power/manage
> and my /sys/power/sleep_mode is (I could make /sys/power/sleep_mode
> single-open too, if I knew how to do that).

We really need to determine up front what userspace environments we
want to support.  It seems reasonable to decide that wakeup-awareness
will be available only on systems that use a centralized mechanism for
initiating system sleeps.  Whether that mechanism is pm-utils or a
vendor-specific program in an embedded system shouldn't matter too much
-- the important thing is that it can easily be changed to send
requests to a PM daemon instead of writing directly to /sys/power/state
or /dev/snapshot.

(Sending requests to the daemon need not be difficult; we could write a 
special program just for that purpose.)

If we do things this way, it leaves open the possibility of bypassing 
all the wakeup-aware code.  That's not necessarily a bad thing.

> > The only important requirement is that processes can use poll system 
> > calls to wait for wakeup events.  This may not always be true (consider 
> > timer expirations, for example), but we ought to be able to make some 
> > sort of accommodation.

This requirement remains somewhat tricky.  Can we guarantee it?  It 
comes down to two things.  When an event occurs that will cause a 
program to want to keep the system awake:

     A. The event must be capable of interrupting a poll system
	call.  I don't think it matters whether this interruption
	takes the form of a signal or of completing the system call.

     B. The program must be able to detect, in a non-blocking way, 
	whether the event has occurred.

Of course, any event that adds data to an input queue will be okay.  
But I don't know what other sorts of things we will have to handle.

> > The PM daemon will communicate with its clients over a Unix-domain
> > socket.  The protocol can be extremely simple: The daemon sends a byte
> > to the client when it wants to sleep, and the client sends the byte
> > back when it is ready to allow the system to go to sleep.  There's
> > never more than one byte outstanding at any time in either direction.
> > 
> > The clients would be structured like this:
> > 
> > 	Open a socket connection to the PM daemon.
> > 
> > 	Loop:
> > 
> > 		Poll on possible events and the PM socket.
> > 
> > 		If any events occurred, handle them.
> > 
> > 		Otherwise if a byte was received from the PM daemon,
> > 		send it back.
> > 
> > In non-legacy mode, the PM daemon's main loop is also quite simple:
> > 
> > 	1. Read /sys/power/wakeup_count.
> > 
> > 	2. For each client socket:
> > 
> > 		If a response to the previous transmission is still
> > 		pending, wait for it.
> > 
> > 		Send a byte (the data can be just a sequence number).
> > 
> > 		Wait for the byte to be echoed back.
> > 
> > 	3. Write /sys/power/wakeup_count.
> > 
> > 	4. Write a sleep command to /sys/power/manage.
> > 
> > A timeout can be added to step 2 if desired, but in this mode it isn't
> > needed.
> > 
> > With legacy support enabled, we probably will want something like a 
> > 1-second timeout for step 2.  We'll also need an extra step at the 
> > beginning and one at the end:
> > 
> > 	0. Wait for somebody to write "standby" or "mem" to 
> > 	   /sys/power/state (received via the /sys/power/manage file).

This would be replaced by: Wait for a sleep request to be received over 
the legacy interface.

> > 	5. Send the final status of the suspend command back to the
> > 	   /sys/power/state writer.

I haven't received any comments on these designs so far.  They seem
quite simple and adequate for what we want.  We may want to make the PM
daemon also responsible for keeping track of RTC wakeup alarm requests,
as Neil pointed out; that shouldn't be hard to add on.
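
For concreteness, one pass of the non-legacy daemon loop quoted above might
look roughly like the Python sketch below.  The three callables are
assumptions made for illustration: read_count and write_count stand in for
reading and writing /sys/power/wakeup_count (write_count returning False when
the write is rejected), and do_sleep stands in for the sleep command, which is
left abstract because /sys/power/manage is only a proposal here.  Timeouts and
client disconnects are not handled.

```python
import socket


def suspend_cycle(clients, read_count, write_count, do_sleep, seq=0):
    """One pass of the loop; returns True if the sleep command was issued."""
    count = read_count()                  # step 1: snapshot the wakeup count
    token = bytes([seq & 0xFF])           # the byte is just a sequence number
    for sock in clients:                  # step 2: ask every client in turn
        sock.sendall(token)
        if sock.recv(1) != token:         # blocks until the client answers
            return False
    if not write_count(count):            # step 3: rejected if an event arrived
        return False
    do_sleep()                            # step 4: issue the sleep command
    return True
```

A client that never echoes the byte simply keeps the daemon waiting in step 2,
which is what keeps the system awake.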

> > Equivalent support for hibernation is left as an exercise for the 
> > reader.
> 
> Hehe.  Quite a difficult one for that matter. :-)

That's another thing we need to think about more carefully.  How 
extravagant do we want to make the wakeup/hibernation interaction?  My 
own feeling is: as little as possible (whatever that amounts to).

> > This really seems like it could work, and it wouldn't be tremendously 
> > complicated.  The only changes needed in the kernel would be the 
> > "virtualization" (or forwarding) mechanism for legacy support.
> 
> Yes, it could be made to work, just like the hibernate user space interface,
> but would it be really convenient to use?  I have some doubts.

In terms of integration with current systems (and without the
virtualization), it should be very easy.  There will be a new daemon to
run when the system starts up, and a new program that will communicate
with that daemon (or will write to /sys/power/state if the daemon isn't
available).  That's all.

In terms of writing wakeup-aware clients, it's a little hard to say in 
the absence of any examples.  The client protocol described above 
shouldn't be too hard to use, especially if a wakeup library can be 
provided.
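
As a small example of what such a client could look like, here is a Python
sketch of one iteration of the client loop described earlier.  The function
name, the return values and the use of a single event socket are assumptions
for illustration; a real client would poll whatever descriptors deliver its
wakeup events.

```python
import select
import socket


def client_step(pm_sock, event_sock, handle_event, timeout=None):
    """Run one iteration of the client loop; returns 'event', 'ack' or 'idle'."""
    readable, _, _ = select.select([pm_sock, event_sock], [], [], timeout)
    if event_sock in readable:
        # Events come first; the daemon's byte (if any) stays unanswered,
        # so the system cannot go to sleep while we are busy.
        handle_event(event_sock.recv(4096))
        return "event"
    if pm_sock in readable:
        # Nothing to do: echo the byte to tell the daemon it may sleep.
        pm_sock.sendall(pm_sock.recv(1))
        return "ack"
    return "idle"
```

Note that the byte is echoed only on a pass where no events were pending.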

For something like a firmware update program, all the program has to do
is open a connection to the PM daemon before writing the new firmware.  
Nothing more -- if the program does not send any data over the socket
then the PM daemon will not allow sleep requests to go through.

Of course, the Android people have the most experience with this sort
of thing.  In an earlier discussion with Arve, he expressed some
concerns about getting the PM daemon started early enough (obviously it
needs to be running before any of its clients) and the fact that the
daemon would have to be multi-threaded.  I got the feeling that he was
complaining just for the sake of complaining, not because these things
would present any serious problems.

Converting the programs that currently use Android's userspace
wakelocks might be somewhat more difficult.  Simply releasing a
wakelock would no longer be sufficient; a program would need to respond
to polls from the PM daemon whenever it was willing to let the system
go to sleep.

Alan Stern


* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-16 23:48     ` NeilBrown
@ 2011-10-17 15:43       ` Alan Stern
  2011-10-17 22:02       ` Rafael J. Wysocki
  1 sibling, 0 replies; 80+ messages in thread
From: Alan Stern @ 2011-10-17 15:43 UTC (permalink / raw)
  To: NeilBrown; +Cc: Rafael J. Wysocki, Linux PM list, mark gross, LKML, John Stultz

On Mon, 17 Oct 2011, NeilBrown wrote:

> .... Actually, the more I think about it, the more sense it makes to include
> the wake-up-at-time service with the suspend-daemon.  Then the RTC alarm
> doesn't need a wakeup_source.
> So my hypothetical suspend-daemon provides 2 services:
>  1/ Client can say "Don't suspend after X".  If X is in the past it means
>     don't suspend at all. In the future it means "If you suspend before
>     this, be sure to wake up by X".  This request must be explicitly
>     cancelled (though some mechanism is needed so that if the process dies
>     it is automatically cancelled).
>  2/ Client can say "check with me before entering suspend".  Client needs to
>     respond to any callback promptly, but can register a "don't suspend after
>     now" request first.
>     (Client probably gets a callback both on suspend and resume)

1/ can be a separate type of communication channel to the daemon.  The
client opens a connection and sends the time X.  It then blocks waiting
for a response.  The daemon waits until X (using RTC wakeup alarms as
necessary), then acknowledges the request and prevents further suspends
until the connection is closed.

2/ is the normal client communication mechanism that I described 
earlier.  I don't see why a callback would be needed during resume in 
general, although some clients might want to be informed.

Alan Stern



* Re: [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode
  2011-10-15 21:29         ` Rafael J. Wysocki
@ 2011-10-17 16:48           ` John Stultz
  2011-10-17 18:19             ` Alan Stern
  2011-10-17 21:13             ` Rafael J. Wysocki
  0 siblings, 2 replies; 80+ messages in thread
From: John Stultz @ 2011-10-17 16:48 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM list, mark gross, LKML, Alan Stern, NeilBrown

On Sat, 2011-10-15 at 23:29 +0200, Rafael J. Wysocki wrote:
> So I think (please correct me if I'm wrong) that you're worried about the
> following situation:
> 
> - The process opens /dev/sleepctl and sets the timeout
> - It sets up a wake alarm to trigger at time T.
> - It goes to sleep and sets its wakeup time to time T too, e.g. using select()
>   with a timeout.
> - The system doesn't go to sleep in the meantime.
> - The wake alarm triggers a bit earlier than the process is woken up and
>   system suspend is started in between of the two events.
> 
> This particular race is avoidable if the process sets its wakeup time
> to T - \Delta T, where \Delta T is enough for the process to be scheduled
> and run ioctl(sleepfd, SLEEPCTL_STAY_AWAKE).  So the complete sequence may
> look like this:
> 
> - The process opens /dev/sleepctl as sleepfd1 and sets the timeout to 0.
> - The process opens /dev/sleepctl as sleepfd2 and sets the timeout to T_2.
>   T_2 should be sufficient for the process to be able to call
>   ioctl(sleepfd1, SLEEPCTL_STAY_AWAKE) when woken up.
> - It sets up a wake alarm to trigger at time T.
> - It goes to sleep and sets its wakeup time to time T - \Delta T, such that
>   \Delta T is sufficient for the process to call
>   ioctl(sleepfd2, SLEEPCTL_STAY_AWAKE).
> 
> Then, if system suspend happens before T - \Delta T, the process will be
> woken up along with the wakealarm event at time T and it will be able to call
> ioctl(sleepfd1, SLEEPCTL_STAY_AWAKE) before T_2 expires.  If system suspend
> doesn't happen in that time frame, the process will wake up at T - \Delta T
> and it will be able to call ioctl(sleepfd1, SLEEPCTL_STAY_AWAKE) (even if
> system suspend triggers after the process has been woken up and before it's
> able to run the ioctl, it doesn't matter, because the wakealarm wakeup will
> trigger the sleepfd2's STAY_AWAKE anyway).

So, the alarmtimer code is a bit simpler than what you describe
above (alarmtimers are just like regular posix timers, except that they enable
an RTC wakeup for the soonest event when the system goes into suspend).

However, such a dual-timer style behavior seems like it could work for
timer driven wakeups (and has been suggested to me by others as well).
Just to reiterate my understanding so that we're sure we're on the same
wavelength:

For any timer-style wakeup event, you set another non-wakeup timer for
some small period of time before the wakeup timer. Then when the
non-wakeup timer fires, the application inhibits suspend and waits for
the wakeup timer.  

Thus if the system is suspended, the system will stay asleep until the
wakeup event, where we'll hold off suspend for a timeout length so the
task can run. If the system is not suspended, the early timer inhibits
suspend to block the possible race.

So yes, while not a very elegant solution in my mind (as it's still racy
like any timeout based solution), it would seem to be workable in
practice, assuming wide error margins are used as the kernel does not
guarantee that timers will fire at a specific time (only after the
requested time).

And this again assumes we'll see no timing issues as a result of system
load or realtime task processing.

> Still, there appear to be similar races that aren't avoidable (for example,
> if the time the wake alarm will trigger is not known to the process in
> advance), so I have an idea how to address them.  Namely, suppose we have
> one more ioctl, SLEEPCTL_WAIT_EVENT, that's equivalent to a combination
> of _RELAX, wait and _STAY_AWAKE such that the process will be sent a signal
> (say SIGPWR) on the first wakeup event and its _STAY_AWAKE will trigger
> automatically.

So actually the first sentence above is key, so let me talk about that
before I get into your new solution: As long as we know the timer is
going to fire, we can set the pre-timer to inhibit suspend. But most
wakeup events (network packets, keyboard presses, other buttons) are not
timer based, and we don't know when they would arrive. Thus the same
race could trigger between a wakeup-button press and a suspend call. 

1) wakeup key press
2) suspend call
3) key-press task scheduled

That's why I suggested adding the timeout on any wake event, instead of
resume. This would block the suspend call in between the wake event and
the application processing it.

Really, the interaction is between the wakeup event and it being
processed in userland. Resume, if it occurs, should really be
transparent to that interaction. So that's why I think the
resume-specific behavior in your original proposal doesn't make sense.

> So in the scenario above:
> 
> - The process opens /dev/sleepctl, sets the timeout to 0 and calls
>   ioctl(sleepfd, SLEEPCTL_STAY_AWAKE).
> - It sets up a wake alarm to trigger at time T.
> - It runs ioctl(sleepfd, SLEEPCTL_WAIT_EVENT) which "relaxes" its sleepfd
>   and makes it go to sleep until the first wakeup event happens.
> - The process' signal handler checks if the current time is >= T and makes
>   the process go to the previous step if not.

So I'm not sure if I'm understanding your suggestion totally. Is it that
when you call SLEEPCTL_WAIT_EVENT, the ioctl sets SLEEPCTL_RELAX, and
then the ioctl call blocks?

Then when the signal handler triggers, where exactly does the
SLEEPCTL_STAY_AWAKE call get made? Is it in the signal handler (after
the task has been scheduled)? Or is it done by the kernel on task
wakeup?

If it's the former, I don't see how it blocks the race.

If it's the latter, then it seems this proposal starts to somewhat
approximate my proposal (i.e. the kernel allows suspend on blocking on a
specific device, then disables it on task wakeup).

thanks
-john


* Re: [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode
  2011-10-17 16:48           ` John Stultz
@ 2011-10-17 18:19             ` Alan Stern
  2011-10-17 19:08               ` John Stultz
  2011-10-17 21:13             ` Rafael J. Wysocki
  1 sibling, 1 reply; 80+ messages in thread
From: Alan Stern @ 2011-10-17 18:19 UTC (permalink / raw)
  To: John Stultz; +Cc: Rafael J. Wysocki, Linux PM list, mark gross, LKML, NeilBrown

On Mon, 17 Oct 2011, John Stultz wrote:

> So, the alarmtimer code is a bit simpler than what you describe
> above (alarmtimers are just like regular posix timers, except that they enable
> an RTC wakeup for the soonest event when the system goes into suspend).
> 
> However, such a dual-timer style behavior seems like it could work for
> timer driven wakeups (and has been suggested to me by others as well).
> Just to reiterate my understanding so that we're sure we're on the same
> wavelength:
> 
> For any timer-style wakeup event, you set another non-wakeup timer for
> some small period of time before the wakeup timer. Then when the
> non-wakeup timer fires, the application inhibits suspend and waits for
> the wakeup timer.  
> 
> Thus if the system is suspended, the system will stay asleep until the
> wakeup event, where we'll hold off suspend for a timeout length so the
> task can run. If the system is not suspended, the early timer inhibits
> suspend to block the possible race.
> 
> So yes, while not a very elegant solution in my mind (as it's still racy
> like any timeout-based solution), it would seem to be workable in
> practice, assuming wide error margins are used as the kernel does not
> guarantee that timers will fire at a specific time (only after the
> requested time). 
> 
> And this again assumes we'll see no timing issues as a result of system
> load or realtime task processing.

It shouldn't have to be this complicated.  If a program wants the
system to be awake at a certain target time, it sets a wakeup timer for
that time.  Then it vetoes any suspend requests that occur too close to 
the target time, and continues to veto them until it has finished its 
job.
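
To make the veto rule concrete, here is a minimal sketch of the decision
a program would make when asked whether a suspend may proceed.  The
struct, the field names and the "margin" parameter are hypothetical; they
only illustrate the rule above and are not part of any existing kernel or
PM daemon interface.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-program state; not an existing API. */
struct wake_job {
	time_t target;   /* time at which the system must be awake */
	time_t margin;   /* how close to target counts as "too close" */
	bool   finished; /* set once the job (e.g. the backup) is done */
};

/* Return true if a suspend request arriving at 'now' should be vetoed. */
bool veto_suspend(const struct wake_job *job, time_t now)
{
	if (job->finished)
		return false;	/* job done: never veto */
	/* Veto from (target - margin) onwards, until the job finishes. */
	return now >= job->target - job->margin;
}
```

How the veto is delivered (to a PM daemon or to the kernel) is left open
here; the sketch only covers the timing decision.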

Alan Stern


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode
  2011-10-17 18:19             ` Alan Stern
@ 2011-10-17 19:08               ` John Stultz
  2011-10-17 20:07                 ` Alan Stern
                                   ` (2 more replies)
  0 siblings, 3 replies; 80+ messages in thread
From: John Stultz @ 2011-10-17 19:08 UTC (permalink / raw)
  To: Alan Stern; +Cc: Rafael J. Wysocki, Linux PM list, mark gross, LKML, NeilBrown

On Mon, 2011-10-17 at 14:19 -0400, Alan Stern wrote:
> On Mon, 17 Oct 2011, John Stultz wrote:
> 
> > So, the alarmtimer code is a bit simpler than what you describe
> > above (alarmtimers are just like regular POSIX timers, except that they
> > enable an RTC wakeup for the soonest event when the system suspends).
> >
> > However, such dual-timer style behavior seems like it could work for
> > timer-driven wakeups (and has been suggested to me by others as well).
> > Just to reiterate my understanding so that we're sure we're on the same
> > wavelength:
> > 
> > For any timer-style wakeup event, you set another non-wakeup timer for
> > some small period of time before the wakeup timer. Then when the
> > non-wakeup timer fires, the application inhibits suspend and waits for
> > the wakeup timer.  
> > 
> > Thus if the system is suspended, the system will stay asleep until the
> > wakeup event, where we'll hold off suspend for a timeout length so the
> > task can run. If the system is not suspended, the early timer inhibits
> > suspend to block the possible race.
> > 
> > So yes, while not a very elegant solution in my mind (as it's still racy
> > like any timeout-based solution), it would seem to be workable in
> > practice, assuming wide error margins are used as the kernel does not
> > guarantee that timers will fire at a specific time (only after the
> > requested time). 
> > 
> > And this again assumes we'll see no timing issues as a result of system
> > load or realtime task processing.

> It shouldn't have to be this complicated.  If a program wants the
> system to be awake at a certain target time, it sets a wakeup timer for
> that time.  Then it vetoes any suspend requests that occur too close to 
> the target time, and continues to veto them until it has finished its 
> job.

I agree that the dual-timer approach is not really a good solution, and
doesn't help with similar races on non-timer based wakeups.

Though I also think proposed userland implementations that require
communication with all wakeup consumers before suspending (which really,
once you get aggressive about suspending when you can, means
communicating with all wakeup consumers on every wakeup event) aren't
really a good solution either.

Though as I've been thinking about it, there may be a way to do a
userland solution that uses the wakeup_count that isn't so inefficient.
Basically, it's a variant of Mark's wakeup-device idea, but moved out to
userland.

There is a userland PM daemon. It's responsible for both suspending the
system *and* handling all wakeup events.

Normal wakeup consumers open wakeup devices with a special library which
passes the open request through the PM daemon. The PM daemon opens the
device and provides a pipe fd back to the application, and basically
acts as a middle-man.

The PM daemon then cycles, doing the following:

while(1) {
	wakeup_count = read_int(wakeup_count_fd); /* possibly blocking */
	if (wakeup_count != last_wakeup) {
		have_data = check_open_fds(fds);
		if (have_data)
			process_fds(fds);
		last_wakeup = wakeup_count;
	}
	write_int(wakeup_count_fd, wakeup_count);
	attempt_suspend();
}

Where check_open_fds() does a non-blocking select on all the fds that
the PM daemon has opened on behalf of applications, and process_fds()
basically writes any available data from the opened fds over to the
application through the pipe set up earlier. The daemon's write to the
pipe could be blocking, to ensure the application has read all of the
necessary data before the daemon continues trying to suspend.
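
As a rough illustration of the non-blocking check, check_open_fds() could
be sketched as below.  The explicit count argument and the return
convention (number of readable fds, -1 on error) are assumptions made for
the sketch; the loop above only passes "fds".

```c
#include <stddef.h>
#include <sys/select.h>

/*
 * Return how many descriptors in fds[0..n-1] are readable right now,
 * or -1 on error.  The zero timeout makes select() poll, not block.
 */
int check_open_fds(const int *fds, size_t n)
{
	fd_set readable;
	struct timeval no_wait = { 0, 0 };
	int max_fd = -1;
	size_t i;

	FD_ZERO(&readable);
	for (i = 0; i < n; i++) {
		/* select() cannot watch descriptors >= FD_SETSIZE. */
		if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
			return -1;
		FD_SET(fds[i], &readable);
		if (fds[i] > max_fd)
			max_fd = fds[i];
	}

	/* Only the read set is passed, so the result counts readable fds. */
	return select(max_fd + 1, &readable, NULL, NULL, &no_wait);
}
```

A non-zero result would then lead to process_fds(), which forwards the
pending data to the applications before the daemon tries to suspend.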

Provided there is some suspend_inhibit/allow command that userspace can
send to the PM daemon, this approach then provides a
select/wakelock/read pattern similar to what Android uses. The only
other feature we might want is Peter's suggestion that /sys/power/state
be openable by only one application at a time, so that on systems which
don't have the PM daemon running, applications like the firmware update
tool can try opening /sys/power/state and block anyone else from
suspending under them.

Thoughts?

thanks
-john

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode
  2011-10-17 19:08               ` John Stultz
@ 2011-10-17 20:07                 ` Alan Stern
  2011-10-17 20:34                   ` John Stultz
  2011-10-17 20:38                 ` Rafael J. Wysocki
  2011-10-17 21:19                 ` NeilBrown
  2 siblings, 1 reply; 80+ messages in thread
From: Alan Stern @ 2011-10-17 20:07 UTC (permalink / raw)
  To: John Stultz; +Cc: Rafael J. Wysocki, Linux PM list, mark gross, LKML, NeilBrown

On Mon, 17 Oct 2011, John Stultz wrote:

> I agree that the dual-timer approach is not really a good solution, and
> doesn't help with similar races on non-timer based wakeups.
> 
> Though I also think proposed userland implementations that require
> communication with all wakeup consumers before suspending (which really,
> once you get aggressive about suspending when you can, means
> communicating with all wakeup consumers on every wakeup event) aren't
> really a good solution either.

I think you're not going to be able to do any better.  After a wakeup
event, any of the wakeup consumers could in theory become busy.  
Either you hope that the busy ones will tell the PM daemon they are
busy before the daemon tries another suspend (racy), or else the daemon
has to explicitly check the status of every client.

It doesn't get much better if you replace communication with the PM
daemon by communication with the kernel.

> Though as I've been thinking about it, there may be a way to do a
> userland solution that uses the wakeup_count that isn't so inefficient.
> Basically, it's a variant of Mark's wakeup-device idea, but moved out to
> userland.
>
> There is a userland PM daemon. It's responsible for both suspending the
> system *and* handling all wakeup events.
> 
> Normal wakeup consumers open wakeup devices with a special library which
> passes the open request through the PM daemon. The PM daemon opens the
> device and provides a pipe fd back to the application, and basically
> acts as a middle-man.
> 
> The PM daemon then cycles, doing the following:
> 
> while(1) {
> 	wakeup_count = read_int(wakeup_count_fd); /* possibly blocking */
> 	if (wakeup_count != last_wakeup) {
> 		have_data = check_open_fds(fds);
> 		if (have_data)
> 			process_fds(fds);
> 		last_wakeup = wakeup_count;
> 	}
> 	write_int(wakeup_count_fd, wakeup_count);
> 	attempt_suspend();
> }
> 
> 
> Where check_open_fds() does a non-blocking select on all the fds that
> the PM daemon has opened on behalf of applications, and process_fds()
> basically writes any available data from the opened fds over to the
> application through the pipe set up earlier. The daemon's write to the
> pipe could be blocking, to ensure the application has read all of the
> necessary data before the daemon continues trying to suspend.
>
> Provided there is some suspend_inhibit/allow command that userspace can
> send to the PM daemon, this approach then provides a
> select/wakelock/read pattern similar to what Android uses. The only
> other feature we might want is Peter's suggestion that /sys/power/state
> be openable by only one application at a time, so that on systems which
> don't have the PM daemon running, applications like the firmware update
> tool can try opening /sys/power/state and block anyone else from
> suspending under them.
> 
> Thoughts?

So now, instead of contacting every client on every wakeup event, your 
daemon has to contact a client on every I/O operation!  That hardly 
seems more efficient.

Also, this doesn't cope well with wakeup conditions that aren't 
expressed in terms of data flowing through a pipe, such as a timer 
expiration.

Alan Stern


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode
  2011-10-17 20:07                 ` Alan Stern
@ 2011-10-17 20:34                   ` John Stultz
  0 siblings, 0 replies; 80+ messages in thread
From: John Stultz @ 2011-10-17 20:34 UTC (permalink / raw)
  To: Alan Stern; +Cc: Rafael J. Wysocki, Linux PM list, mark gross, LKML, NeilBrown

On Mon, 2011-10-17 at 16:07 -0400, Alan Stern wrote:
> > Though as I've been thinking about it, there may be a way to do a
> > userland solution that uses the wakeup_count that isn't so inefficient.
> > Basically, it's a variant of Mark's wakeup-device idea, but moved out to
> > userland.
> >
> > There is a userland PM daemon. It's responsible for both suspending the
> > system *and* handling all wakeup events.
> > 
> > Normal wakeup consumers open wakeup devices with a special library which
> > passes the open request through the PM daemon. The PM daemon opens the
> > device and provides a pipe fd back to the application, and basically
> > acts as a middle-man.
> > 
> > The PM daemon then cycles, doing the following:
> > 
> > while(1) {
> > 	wakeup_count = read_int(wakeup_count_fd); /* possibly blocking */
> > 	if (wakeup_count != last_wakeup) {
> > 		have_data = check_open_fds(fds);
> > 		if (have_data)
> > 			process_fds(fds);
> > 		last_wakeup = wakeup_count;
> > 	}
> > 	write_int(wakeup_count_fd, wakeup_count);
> > 	attempt_suspend();
> > }
> > 
> > 
> > Where check_open_fds() does a non-blocking select on all the fds that
> > the PM daemon has opened on behalf of applications, and process_fds()
> > basically writes any available data from the opened fds over to the
> > application through the pipe set up earlier. The daemon's write to the
> > pipe could be blocking, to ensure the application has read all of the
> > necessary data before the daemon continues trying to suspend.
> >
> > Provided there is some suspend_inhibit/allow command that userspace can
> > send to the PM daemon, this approach then provides a
> > select/wakelock/read pattern similar to what Android uses. The only
> > other feature we might want is Peter's suggestion that /sys/power/state
> > be openable by only one application at a time, so that on systems which
> > don't have the PM daemon running, applications like the firmware update
> > tool can try opening /sys/power/state and block anyone else from
> > suspending under them.
> > 
> > Thoughts?
> 
> So now, instead of contacting every client on every wakeup event, your 
> daemon has to contact a client on every I/O operation!  That hardly 
> seems more efficient.

Well, I guess it depends on the common operation. If we're optimizing
for getting in and out of suspend, then doing less before suspend is
more efficient. 

If we're optimizing for all IO, then this proposal is less efficient.
But this is only for wakeup IO, which I suspect to be less frequent.

And if we're considering something like keyboard presses as wakeup IO,
the extra context switching is really no more overhead than that
between X and X applications.

Maybe networking wakeups would be more of a concern, but I'm not
familiar enough with the 3G modems on phones to know exactly if they
differentiate between any packet or have special wakeup packets.
Nonetheless, this overhead would only be for wakeup devices opened
through the PM daemon; everything else would be unaffected.


> Also, this doesn't cope well with wakeup conditions that aren't 
> expressed in terms of data flowing through a pipe, such as a timer 
> expiration.

Quite true. However, it's on my list to extend timerfd to support
alarmtimers, which would provide fd semantics similar to what Android
uses.

thanks
-john



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode
  2011-10-17 19:08               ` John Stultz
  2011-10-17 20:07                 ` Alan Stern
@ 2011-10-17 20:38                 ` Rafael J. Wysocki
  2011-10-17 21:20                   ` John Stultz
  2011-10-17 21:19                 ` NeilBrown
  2 siblings, 1 reply; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-17 20:38 UTC (permalink / raw)
  To: John Stultz; +Cc: Alan Stern, Linux PM list, mark gross, LKML, NeilBrown

On Monday, October 17, 2011, John Stultz wrote:
> On Mon, 2011-10-17 at 14:19 -0400, Alan Stern wrote:
> > On Mon, 17 Oct 2011, John Stultz wrote:
> > 
> > > So, the alarmtimer code is a bit simpler than what you describe
> > > above (alarmtimers are just like regular POSIX timers, except that they
> > > enable an RTC wakeup for the soonest event when the system suspends).
> > >
> > > However, such dual-timer style behavior seems like it could work for
> > > timer-driven wakeups (and has been suggested to me by others as well).
> > > Just to reiterate my understanding so that we're sure we're on the same
> > > wavelength:
> > > 
> > > For any timer-style wakeup event, you set another non-wakeup timer for
> > > some small period of time before the wakeup timer. Then when the
> > > non-wakeup timer fires, the application inhibits suspend and waits for
> > > the wakeup timer.  
> > > 
> > > Thus if the system is suspended, the system will stay asleep until the
> > > wakeup event, where we'll hold off suspend for a timeout length so the
> > > task can run. If the system is not suspended, the early timer inhibits
> > > suspend to block the possible race.
> > > 
> > > So yes, while not a very elegant solution in my mind (as it's still racy
> > > like any timeout-based solution), it would seem to be workable in
> > > practice, assuming wide error margins are used as the kernel does not
> > > guarantee that timers will fire at a specific time (only after the
> > > requested time). 
> > > 
> > > And this again assumes we'll see no timing issues as a result of system
> > > load or realtime task processing.
> 
> > It shouldn't have to be this complicated.  If a program wants the
> > system to be awake at a certain target time, it sets a wakeup timer for
> > that time.  Then it vetoes any suspend requests that occur too close to 
> > the target time, and continues to veto them until it has finished its 
> > job.
> 
> I agree that the dual-timer approach is not really a good solution, and
> doesn't help with similar races on non-timer based wakeups.
> 
> Though I also think proposed userland implementations that require
> communication with all wakeup consumers before suspending (which really,
> once you get aggressive about suspending when you can, means
> communicating with all wakeup consumers on every wakeup event) aren't
> really a good solution either.
> 
> 
> Though as I've been thinking about it, there may be a way to do a
> userland solution that uses the wakeup_count that isn't so inefficient.
> Basically, it's a variant of Mark's wakeup-device idea, but moved out to
> userland.
>
> There is a userland PM daemon. It's responsible for both suspending the
> system *and* handling all wakeup events.
> 
> Normal wakeup consumers open wakeup devices with a special library which
> passes the open request through the PM daemon. The PM daemon opens the
> device and provides a pipe fd back to the application, and basically
> acts as a middle-man.
> 
> The PM daemon then cycles, doing the following:
> 
> while(1) {
> 	wakeup_count = read_int(wakeup_count_fd); /* possibly blocking */
> 	if (wakeup_count != last_wakeup) {
> 		have_data = check_open_fds(fds);
> 		if (have_data)
> 			process_fds(fds);
> 		last_wakeup = wakeup_count;
> 	}
> 	write_int(wakeup_count_fd, wakeup_count);
> 	attempt_suspend();
> }
> 
> 
> Where check_open_fds() does a non-blocking select on all the fds that
> the PM daemon has opened on behalf of applications, and process_fds()
> basically writes any available data from the opened fds over to the
> application through the pipe set up earlier. The daemon's write to the
> pipe could be blocking, to ensure the application has read all of the
> necessary data before the daemon continues trying to suspend.
>
> Provided there is some suspend_inhibit/allow command that userspace can
> send to the PM daemon, this approach then provides a
> select/wakelock/read pattern similar to what Android uses. The only
> other feature we might want is Peter's suggestion that /sys/power/state
> be openable by only one application at a time, so that on systems which
> don't have the PM daemon running, applications like the firmware update
> tool can try opening /sys/power/state and block anyone else from
> suspending under them.
> 
> Thoughts?

Well, that's kind of how I thought it might work when I introduced
wakeup_count. :-)  So, I definitely don't think it's a bad approach.
If it addresses all your use cases, I'd say we can go for it, but I'd
like to explore the alternatives as far as we can to avoid going back
to them some time in the future.
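
For reference, the handshake the daemon loop relies on is: read the
count from /sys/power/wakeup_count, then write the same value back; the
write is documented to succeed only if no wakeup event has been
registered in between.  A minimal sketch of the two helpers follows.  The
names read_count()/write_count() and the path-based signatures are
assumptions for illustration (the loop above uses an already opened fd).

```c
#include <stdio.h>

/* Read an unsigned count from 'path'; 0 on success, -1 on failure. */
int read_count(const char *path, unsigned int *count)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%u", count) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Write 'count' to 'path'; 0 on success, -1 if the write is rejected. */
int write_count(const char *path, unsigned int count)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = (fprintf(f, "%u", count) > 0);
	/* stdio buffers the data, so a rejected write shows up at fclose(). */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

With path set to /sys/power/wakeup_count, a failing write_count() means
a wakeup event arrived after the read, so the daemon should skip the
suspend attempt and go around the loop again.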

As for single-opening /sys/power/state, I don't think it will be
sufficient, because of the hibernate user space interface that doesn't
work on the basis of /sys/power/state.  It would have to be something
like /sys/power/manage that Alan has suggested (which opens one more
possibility, but see my reply to Alan).

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-17 15:33           ` Alan Stern
@ 2011-10-17 21:10             ` Rafael J. Wysocki
  2011-10-17 21:27             ` Rafael J. Wysocki
  1 sibling, 0 replies; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-17 21:10 UTC (permalink / raw)
  To: Alan Stern; +Cc: NeilBrown, Linux PM list, mark gross, LKML, John Stultz

On Monday, October 17, 2011, Alan Stern wrote:
> On Sun, 16 Oct 2011, Rafael J. Wysocki wrote:
> 
> > On Sunday, October 16, 2011, Alan Stern wrote:
> > > On Sat, 15 Oct 2011, Alan Stern wrote:
> > > 
> > > > Basically, what we need is a reliable way to intercept the existing
> > > > mechanisms for suspend/hibernate and to redirect the requests to the PM
> > > > daemon.  When the daemon is started up in "legacy" mode, it assumes
> > > > there is a legacy client (representing the entire set of
> > > > non-wakeup-aware programs) that always forbids suspend _except_ when
> > > > one of the old mechanisms is invoked.
> > > 
> > > The more I think about this, the better it seems.  In essence, it 
> > > amounts to "virtualizing" the existing PM interface.
> > > 
> > > Let's add /sys/power/manage, and make it single-open.
> > 
> > I'm not sure how to do that in sysfs.
> 
> If we don't implement the virtualization in the kernel, as Neil
> suggests, then /sys/power/manage isn't necessary.  (And yes, I don't 
> know how to make sysfs files single-open either -- probably there's no 
> way to do it.)
> 
> > Also I'm not sure what the real difference between /sys/power/manage
> > and my /sys/power/sleep_mode is (I could make /sys/power/sleep_mode
> > single-open too, if I knew how to do that).
> 
> We really need to determine up front what userspace environments we
> want to support.  It seems reasonable to decide that wakeup-awareness
> will be available only on systems that use a centralized mechanism for
> initiating system sleeps.  Whether that mechanism is pm-utils or a
> vendor-specific program in an embedded system shouldn't matter too much
> -- the important thing is that it can easily be changed to send
> requests to a PM daemon instead of writing directly to /sys/power/state
> or /dev/snapshot.
> 
> (Sending requests to the daemon need not be difficult; we could write a 
> special program just for that purpose.)
> 
> If we do things this way, it leaves open the possibility of bypassing 
> all the wakeup-aware code.  That's not necessarily a bad thing.

OK, I'd like to focus on this a bit more.  I'll reply to the rest of your
message separately.

_If_ we are going to use an extra interface for switching "modes" (e.g. current
behavior vs something that uses wakeup events detection unconditionally), then
it may be /dev/sleepctl working as follows:

1. It may be open only *once* for writing.
2. It may be open multiple times in parallel for reading.
3. While open for writing, it will cause all writes to /sys/power/wakeup_count
   to fail (e.g. return -EACCES).
4. Writing to it will unconditionally store the written value as saved_count
   (this will allow the writer to effectively block all suspend/hibernate
   interfaces by writing a "known bad" number to it).
5. Reading from it will work like reading from /sys/power/wakeup_count.

This way, the writer using it (the power manager) won't have to open
/sys/power/wakeup_count in addition to opening /dev/sleepctl.

Next, we can add the SLEEPCTL_RELAX and SLEEPCTL_STAY_AWAKE ioctls to it in the
following way:

SLEEPCTL_STAY_AWAKE: cause all attempts to use /sys/power/state and the
  /dev/snapshot's ioctls to block until SLEEPCTL_RELAX is executed for
  the same file descriptor.

SLEEPCTL_RELAX(arg): if arg is 0, reverse the previous SLEEPCTL_STAY_AWAKE and
  return.  If arg is different from 0, reverse the previous SLEEPCTL_STAY_AWAKE
  and prepare the kernel to carry out an equivalent of SLEEPCTL_STAY_AWAKE on
  the given file descriptor and send the calling process the SIGPWR signal
  on the first wakeup event.
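
To pin down the intended semantics, here is a small user-space model of
one /dev/sleepctl file descriptor.  It is only a sketch of the behavior
described above: the struct, the function names and the choice that
SLEEPCTL_STAY_AWAKE also cancels a pending SLEEPCTL_RELAX(1) are
assumptions, not existing kernel code.

```c
#include <stdbool.h>

/* Model of one /dev/sleepctl descriptor; all names are hypothetical. */
struct sleepctl_fd {
	bool stay_awake; /* suspend/hibernate blocked on behalf of this fd */
	bool armed;      /* kernel re-takes stay_awake on next wakeup event */
};

/* SLEEPCTL_STAY_AWAKE: block the sleep interfaces. */
void model_stay_awake(struct sleepctl_fd *s)
{
	s->stay_awake = true;
	s->armed = false;	/* assumption: cancels a pending RELAX(1) */
}

/* SLEEPCTL_RELAX(arg): unblock; arm the automatic STAY_AWAKE if arg != 0. */
void model_relax(struct sleepctl_fd *s, int arg)
{
	s->stay_awake = false;
	s->armed = (arg != 0);
}

/* A wakeup event occurred; returns true if SIGPWR should be sent. */
bool model_wakeup_event(struct sleepctl_fd *s)
{
	if (!s->armed)
		return false;
	s->armed = false;	/* only the first wakeup event counts */
	s->stay_awake = true;	/* kernel-side equivalent of STAY_AWAKE */
	return true;
}
```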

Then John's backup scenario will look like this:

1. SLEEPCTL_STAY_AWAKE
2. Set up the wakealarm.
3. SLEEPCTL_RELAX(1)
4. Sleep with timeout to wake up at time T (slightly before the wakealarm).
5. (a) if the system doesn't suspend and there are no wakeup events,
       do SLEEPCTL_STAY_AWAKE (when woken up) and create a backup.
   (b) if there is a wakeup event, check if the current time is earlier than
       T and go to 3 if so.  Otherwise, create a backup (the kernel has already
       done the SLEEPCTL_STAY_AWAKE for us).
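
The decision in step 5 can be sketched as a small helper.  The enum and
function names are hypothetical, and "woken_by_event" stands for "the
process was woken by SIGPWR rather than by its own timeout"; this only
restates the steps above in code.

```c
#include <stdbool.h>
#include <time.h>

enum backup_step {
	STEP_STAY_AWAKE_THEN_BACKUP, /* 5(a): timeout expired, no event */
	STEP_RELAX_AGAIN,            /* 5(b): event before T, go to step 3 */
	STEP_BACKUP                  /* 5(b): kernel already did STAY_AWAKE */
};

/* Decide what the backup process does after waking up in step 4. */
enum backup_step next_backup_step(bool woken_by_event, time_t now, time_t t)
{
	if (!woken_by_event)
		return STEP_STAY_AWAKE_THEN_BACKUP;
	/* Woken by a wakeup event: only start the backup once T is reached. */
	return (now < t) ? STEP_RELAX_AGAIN : STEP_BACKUP;
}
```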

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode
  2011-10-17 16:48           ` John Stultz
  2011-10-17 18:19             ` Alan Stern
@ 2011-10-17 21:13             ` Rafael J. Wysocki
  1 sibling, 0 replies; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-17 21:13 UTC (permalink / raw)
  To: John Stultz; +Cc: Linux PM list, mark gross, LKML, Alan Stern, NeilBrown

On Monday, October 17, 2011, John Stultz wrote:
> On Sat, 2011-10-15 at 23:29 +0200, Rafael J. Wysocki wrote:
> > So I think (please correct me if I'm wrong) that you're worried about the
> > following situation:
> > 
> > - The process opens /dev/sleepctl and sets the timeout
> > - It sets up a wake alarm to trigger at time T.
> > - It goes to sleep and sets its wakeup time to time T too, e.g. using select()
> >   with a timeout.
> > - The system doesn't go to sleep in the meantime.
> > - The wake alarm triggers a bit earlier than the process is woken up and
> >   system suspend is started in between of the two events.
> > 
> > This particular race is avoidable if the process sets its wakeup time
> > to T - \Delta T, where \Delta T is enough for the process to be scheduled
> > and run ioctl(sleepfd, SLEEPCTL_STAY_AWAKE).  So the complete sequence may
> > look like this:
> > 
> > - The process opens /dev/sleepctl as sleepfd1 and sets the timeout to 0.
> > - The process opens /dev/sleepctl as sleepfd2 and sets the timeout to T_2.
> >   T_2 should be sufficient for the process to be able to call
> >   ioctl(sleepfd1, SLEEPCTL_STAY_AWAKE) when woken up.
> > - It sets up a wake alarm to trigger at time T.
> > - It goes to sleep and sets its wakeup time to time T - \Delta T, such that
> >   \Delta T is sufficient for the process to call
> >   ioctl(sleepfd2, SLEEPCTL_STAY_AWAKE).
> > 
> > Then, if system suspend happens before T - \Delta T, the process will be
> > woken up along with the wakealarm event at time T and it will be able to call
> > ioctl(sleepfd1, SLEEPCTL_STAY_AWAKE) before T_2 expires.  If system suspend
> > doesn't happen in that time frame, the process will wake up at T - \Delta T
> > and it will be able to call ioctl(sleepfd1, SLEEPCTL_STAY_AWAKE) (even if
> > system suspend triggers after the process has been woken up and before it's
> > able to run the ioctl, it doesn't matter, because the wakealarm wakeup will
> > trigger the sleepfd2's STAY_AWAKE anyway).
> 
> So, the alarmtimer code is a bit simpler than what you describe
> above (alarmtimers are just like regular POSIX timers, except that they
> enable an RTC wakeup for the soonest event when the system suspends).
>
> However, such dual-timer style behavior seems like it could work for
> timer-driven wakeups (and has been suggested to me by others as well).
> Just to reiterate my understanding so that we're sure we're on the same
> wavelength:
> 
> For any timer-style wakeup event, you set another non-wakeup timer for
> some small period of time before the wakeup timer. Then when the
> non-wakeup timer fires, the application inhibits suspend and waits for
> the wakeup timer.  
> 
> Thus if the system is suspended, the system will stay asleep until the
> wakeup event, where we'll hold off suspend for a timeout length so the
> task can run. If the system is not suspended, the early timer inhibits
> suspend to block the possible race.
> 
> So yes, while not a very elegant solution in my mind (as it's still racy
> like any timeout-based solution), it would seem to be workable in
> practice, assuming wide error margins are used as the kernel does not
> guarantee that timers will fire at a specific time (only after the
> requested time). 
> 
> And this again assumes we'll see no timing issues as a result of system
> load or realtime task processing.
> 
> 
> > Still, there appear to be similar races that aren't avoidable (for example,
> > if the time the wake alarm will trigger is not known to the process in
> > advance), so I have an idea how to address them.  Namely, suppose we have
> > one more ioctl, SLEEPCTL_WAIT_EVENT, that's equivalent to a combination
> > of _RELAX, wait and _STAY_AWAKE such that the process will be sent a signal
> > (say SIGPWR) on the first wakeup event and its _STAY_AWAKE will trigger
> > automatically.
> 
> So actually the first sentence above is key, so let me talk about that
> before I get into your new solution: As long as we know the timer is
> going to fire, we can set the pre-timer to inhibit suspend. But most
> wakeup events (network packets, keyboard presses, other buttons) are not
> timer based, and we don't know when they would arrive. Thus the same
> race could trigger between a wakeup-button press and a suspend call. 
> 
> 1) wakeup key press
> 2) suspend call
> 3) key-press task scheduled
> 
> That's why I suggested adding the timeout on any wake event, instead of
> on resume. This would block a suspend call between the wake event and
> the application processing it.
> 
> Really, the interaction is between the wakeup event and it being
> processed in userland. Resume, if it occurs, should really be
> transparent to that interaction. So that's why I think the
> resume-specific behavior in your original proposal doesn't make sense.
> 
> 
> > So in the scenario above:
> > 
> > - The process opens /dev/sleepctl, sets the timeout to 0 and calls
> >   ioctl(sleepfd, SLEEPCTL_STAY_AWAKE).
> > - It sets up a wake alarm to trigger at time T.
> > - It runs ioctl(sleepctl, SLEEPCTL_WAIT_EVENT) which "relaxes" its sleepfd
> >   and makes it go to sleep until the first wakeup event happens.
> > - The process' signal handler checks if the current time is >= T and makes
> >   the process go to the previous step if not.
> 
> 
> So I'm not sure I fully understand your suggestion. Is it that when you
> call SLEEPCTL_WAIT_EVENT, the ioctl does a SLEEPCTL_RELAX and then
> blocks?
>
> Then when the signal handler triggers, where exactly does the
> SLEEPCTL_STAY_AWAKE call get made? Is it in the signal handler (after
> the task has been scheduled)? Or is it done by the kernel on task
> wakeup?
>
> If it's the former, I don't see how it blocks the race.
>
> If it's the latter, then it seems this proposal starts to
> approximate my proposal (i.e. the kernel allows suspend while the task
> blocks on a specific device, then disables it on task wakeup).

It's the latter, but I think I have a better idea.

Please see my recent reply to Alan in this thread for details.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode
  2011-10-17 19:08               ` John Stultz
  2011-10-17 20:07                 ` Alan Stern
  2011-10-17 20:38                 ` Rafael J. Wysocki
@ 2011-10-17 21:19                 ` NeilBrown
  2011-10-17 21:43                   ` John Stultz
  2 siblings, 1 reply; 80+ messages in thread
From: NeilBrown @ 2011-10-17 21:19 UTC (permalink / raw)
  To: John Stultz
  Cc: Alan Stern, Rafael J. Wysocki, Linux PM list, mark gross, LKML

[-- Attachment #1: Type: text/plain, Size: 4236 bytes --]

On Mon, 17 Oct 2011 12:08:49 -0700 John Stultz <john.stultz@linaro.org> wrote:

> On Mon, 2011-10-17 at 14:19 -0400, Alan Stern wrote:
> > On Mon, 17 Oct 2011, John Stultz wrote:
> > 
> > > So, the alarmtimer code is a bit simpler than what you describe
> > > above (alarmtimers are just like regular POSIX timers, except that they
> > > enable an RTC wakeup for the soonest event when the system suspends).
> > >
> > > However, such dual-timer style behavior seems like it could work for
> > > timer-driven wakeups (and has been suggested to me by others as well).
> > > Just to reiterate my understanding so that we're sure we're on the same
> > > wavelength:
> > > 
> > > For any timer-style wakeup event, you set another non-wakeup timer for
> > > some small period of time before the wakeup timer. Then when the
> > > non-wakeup timer fires, the application inhibits suspend and waits for
> > > the wakeup timer.  
> > > 
> > > Thus if the system is suspended, the system will stay asleep until the
> > > wakeup event, where we'll hold off suspend for a timeout length so the
> > > task can run. If the system is not suspended, the early timer inhibits
> > > suspend to block the possible race.
> > > 
> > > So yes, while not a very elegant solution in my mind (as it's still racy
> > > like any timeout-based solution), it would seem to be workable in
> > > practice, assuming wide error margins are used as the kernel does not
> > > guarantee that timers will fire at a specific time (only after the
> > > requested time). 
> > > 
> > > And this again assumes we'll see no timing issues as a result of system
> > > load or realtime task processing.
> 
> > It shouldn't have to be this complicated.  If a program wants the
> > system to be awake at a certain target time, it sets a wakeup timer for
> > that time.  Then it vetoes any suspend requests that occur too close to 
> > the target time, and continues to veto them until it has finished its 
> > job.
> 
> I agree that the dual-timer approach is not really a good solution, and
> doesn't help with similar races on non-timer based wakeups.
> 
> Though I also think proposed userland implementations that require
> communication with all wakeup consumers before suspending (which really,
> once you get aggressive about suspending when you can, means
> communicating with all wakeup consumers on every wakeup event) isn't
> really a good solution either.

It would help me a lot if you could be more specific than "good".  Do you mean
"efficient" or "simple" or "secure" or ...

> 
> 
> Though as I've been thinking about it, there may be a way to do a
> userland solution that uses the wakeup_count that isn't so inefficient.
> Basically, it's a variant of Mark's wakeup-device idea, but moved out to
> userland.

Here I see you probably meant "efficient".  Can that be quantified?  Do you
have a target latency for getting into suspend, and measurements that show
you regularly missing this target?
I am reminded of what Donald Knuth reportedly said about premature
optimisation.


> 
> There is a userland PM daemon. It's responsible for both suspending the
> system, *and* handling all wakeup events.
> 
> Normal wakeup consumers open wakeup devices with a special library which
> passes the open request through the PM daemon. The PM daemon opens the
> device and provides a pipe fd back to the application, and basically
> acts as a middle-man.

There is certainly merit in the idea but I think the pipes just get in the
way.

How about having both the PM daemon and the application listening on the same
FD.  The app sends the FD to the PM daemon on the same Unix domain socket
which is used to request suspend/resume handshaking.

The PM daemon never reads from the FD.  It only passes it to
poll/select/whatever.

When poll says the FD is ready, the daemon initiates the handshake with the
app to make sure that it has consumed the event.  If none of the FDs are
ready for read and no process is blocking suspend, then the daemon is free to
enter suspend.


Of course if an App hasn't registered an FD, then it gets the handshake on
every suspend attempt.
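
A minimal sketch of the shared-FD idea above, written in Python for brevity; it is an illustration, not code from this thread. It assumes Python 3.9+ on a Unix system for socket.send_fds() and socket.recv_fds(). The helper names are hypothetical and a pipe stands in for a real wakeup device. The daemon side only polls the FD and never reads from it:

```python
import os
import select
import socket


def send_wakeup_fd(sock, fd):
    """App side: pass a reference to our wakeup FD over the Unix socket."""
    # One byte of ordinary data must accompany the SCM_RIGHTS payload.
    socket.send_fds(sock, [b"F"], [fd])


def recv_wakeup_fd(sock):
    """Daemon side: receive one FD sent by send_wakeup_fd()."""
    _data, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    return fds[0]


def fd_has_unread_data(fd):
    """Daemon side: test readiness with poll(); the data is left untouched."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    return bool(poller.poll(0))  # zero timeout: do not block


if __name__ == "__main__":
    app_sock, daemon_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    event_r, event_w = os.pipe()          # stand-in for a wakeup device
    send_wakeup_fd(app_sock, event_r)
    daemon_fd = recv_wakeup_fd(daemon_sock)
    os.write(event_w, b"x")               # a "wakeup event" arrives
    print("pending before app reads:", fd_has_unread_data(daemon_fd))
    os.read(event_r, 1)                   # the app consumes the event
    print("pending after app reads:", fd_has_unread_data(daemon_fd))
```

Because both descriptors refer to the same open file, the app's read() is what makes the daemon's poll go quiet again; the handshake described above is still needed to learn that the app has finished acting on the event.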

NeilBrown




* Re: [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode
  2011-10-17 20:38                 ` Rafael J. Wysocki
@ 2011-10-17 21:20                   ` John Stultz
  0 siblings, 0 replies; 80+ messages in thread
From: John Stultz @ 2011-10-17 21:20 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Alan Stern, Linux PM list, mark gross, LKML, NeilBrown

On Mon, 2011-10-17 at 22:38 +0200, Rafael J. Wysocki wrote:
> On Monday, October 17, 2011, John Stultz wrote:
> > Though as I've been thinking about it, there may be a way to do a
> > userland solution that uses the wakeup_count that isn't so inefficient.
> > > Basically, it's a variant of Mark's wakeup-device idea, but moved out to
> > userland.
> > 
> > > There is a userland PM daemon. It's responsible for both suspending the
> > > system, *and* handling all wakeup events.
> > 
> > Normal wakeup consumers open wakeup devices with a special library which
> > passes the open request through the PM daemon. The PM daemon opens the
> > device and provides a pipe fd back to the application, and basically
> > acts as a middle-man.
> > 
> > The PM daemon then cycles, doing the following:
> > 
> > while(1) {
> > 	wakeup_count = read_int(wakeup_count_fd) /*possibly blocking*/
> > 	if (wakeup_count != last_wakeup) {
> > 		have_data = check_open_fds(fds);
> > 		if (have_data)
> > 			process_fds(fds);
> > 		last_wakeup = wakeup_count;
> > 	}
> > 	write_int(wakeup_count_fd, wakeup_count);
> > 	attempt_suspend();
> > }
> > 
> > 
> > > Where check_open_fds() does a non-blocking select on all the fds that
> > > the PM daemon has opened on behalf of applications, and process_fds()
> > > basically writes any available data from the opened fds over to the
> > > application through the earlier setup pipe. The daemon's write to the
> > > pipe could be blocking, to ensure the application has read all of the
> > > necessary data before the daemon continues trying to suspend.
> > 
> > Provided there is some suspend_inhibit/allow command that userspace can
> > > make to the PM daemon, this approach then provides a
> > > select/wakelock/read pattern similar to what Android uses. The only other
> > > feature we might want is the suggestion from Peter that
> > > /sys/power/state be openable by only one application at a time, so
> > > that on systems which don't have the PM daemon running, applications
> > like the firmware update tool can try opening /sys/power/state and
> > blocking anyone from suspending under it.
> > 
> > Thoughts?
> 
> Well, that's kind of like I thought it might work when I introduced
> wakeup_count. :-)  So, I definitely don't think it's a bad approach.
> If it addresses all your use cases, I'd say we can go for it, but I'd
> like to explore the alternatives as far as we can to avoid going back
> to them some time in future.

Do forgive me for not seeing it earlier. But until I considered the idea
of the PM daemon handling the data as well as the wakeup_count, I didn't
really see a solution with a workable API (requiring full communication
before suspending didn't seem very usable in my mind, but I'm still
happy to entertain it if folks can get working prototypes going to demo
it).

I of course am still interested in considering other approaches. But
I'll probably see how far I can get in prototyping this current userland
approach to see what issues it runs up against.
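
A rough sketch of one pass of such a daemon loop, in Python for brevity; it is an illustration and not the prototype mentioned above. It assumes the documented /sys/power/wakeup_count behaviour: a read returns the current count, and writing back a stale count fails. The names suspend_once() and has_pending_data are hypothetical, and the paths are parameters so the logic can be tried against ordinary files:

```python
def suspend_once(count_path="/sys/power/wakeup_count",
                 state_path="/sys/power/state",
                 has_pending_data=lambda: False,
                 state="mem"):
    """Try one suspend cycle; return True if the sleep request was written."""
    with open(count_path) as f:
        count = f.read().strip()       # on sysfs this may block for a while

    # Stand-in for check_open_fds(): do not sleep while consumers have data.
    if has_pending_data():
        return False

    try:
        with open(count_path, "w") as f:
            f.write(count)             # rejected if an event changed the count
    except OSError:
        return False                   # raced with a wakeup event; caller retries

    try:
        with open(state_path, "w") as f:
            f.write(state)             # returns after resume on a real system
    except OSError:
        return False                   # suspend was refused or aborted
    return True
```

A caller would simply loop on suspend_once(), draining or forwarding pending data whenever it returns False.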

> As for single-opening /sys/power/state, I don't think it will be
> sufficient, because of the hibernate user space interface that doesn't
> work on the basis of /sys/power/state.  It would have to be something
> like /sys/power/manage that Alan has suggested (which opens one more
> possibility, but see my reply to Alan).

Right. I forgot about hibernate.

Although maybe if any of the sysfs devices are open, we could return
an error on open for all of them?

I'm just trying to think of what we can do without introducing a new
ABI.

thanks
-john



* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-17 15:33           ` Alan Stern
  2011-10-17 21:10             ` Rafael J. Wysocki
@ 2011-10-17 21:27             ` Rafael J. Wysocki
  2011-10-18 17:30               ` Alan Stern
  1 sibling, 1 reply; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-17 21:27 UTC (permalink / raw)
  To: Alan Stern; +Cc: NeilBrown, Linux PM list, mark gross, LKML, John Stultz

On Monday, October 17, 2011, Alan Stern wrote:
> On Sun, 16 Oct 2011, Rafael J. Wysocki wrote:
...
> > > The only important requirement is that processes can use poll system 
> > > calls to wait for wakeup events.  This may not always be true (consider 
> > > timer expirations, for example), but we ought to be able to make some 
> > > sort of accommodation.
> 
> This requirement remains somewhat tricky.  Can we guarantee it?  It 
> comes down to two things.  When an event occurs that will cause a 
> program to want to keep the system awake:
> 
>      A. The event must be capable of interrupting a poll system
> 	call.  I don't think it matters whether this interruption
> 	takes the form of a signal or of completing the system call.
> 
>      B. The program must be able to detect, in a non-blocking way, 
> 	whether the event has occurred.
> 
> Of course, any event that adds data to an input queue will be okay.  
> But I don't know what other sorts of things we will have to handle.

Well, wakealarms don't do that, for one example.  Similarly for WoL through
a magic packet AFAICS.  Similarly for "a cable has been plugged in"
type of events.

> > > The PM daemon will communicate with its clients over a Unix-domain
> > > socket.  The protocol can be extremely simple: The daemon sends a byte
> > > to the client when it wants to sleep, and the client sends the byte
> > > back when it is ready to allow the system to go to sleep.  There's
> > > never more than one byte outstanding at any time in either direction.
> > > 
> > > The clients would be structured like this:
> > > 
> > > 	Open a socket connection to the PM daemon.
> > > 
> > > 	Loop:
> > > 
> > > 		Poll on possible events and the PM socket.
> > > 
> > > 		If any events occurred, handle them.
> > > 
> > > 		Otherwise if a byte was received from the PM daemon,
> > > 		send it back.
> > > 
> > > In non-legacy mode, the PM daemon's main loop is also quite simple:
> > > 
> > > 	1. Read /sys/power/wakeup_count.
> > > 
> > > 	2. For each client socket:
> > > 
> > > 		If a response to the previous transmission is still
> > > 		pending, wait for it.
> > > 
> > > 		Send a byte (the data can be just a sequence number).
> > > 
> > > 		Wait for the byte to be echoed back.
> > > 
> > > 	3. Write /sys/power/wakeup_count.
> > > 
> > > 	4. Write a sleep command to /sys/power/manage.
> > > 
> > > A timeout can be added to step 2 if desired, but in this mode it isn't
> > > needed.
> > > 
> > > With legacy support enabled, we probably will want something like a 
> > > 1-second timeout for step 2.  We'll also need an extra step at the 
> > > beginning and one at the end:
> > > 
> > > 	0. Wait for somebody to write "standby" or "mem" to
> > > 	   /sys/power/state (received via the /sys/power/manage file).
> 
> This would be replaced by: Wait for a sleep request to be received over 
> the legacy interface.
> 
> > > 	5. Send the final status of the suspend command back to the
> > > 	   /sys/power/state writer.
> 
> I haven't received any comments on these designs so far.  They seem
> quite simple and adequate for what we want.  We may want to make the PM
> daemon also responsible for keeping track of RTC wakeup alarm requests,
> as Neil pointed out; that shouldn't be hard to add on.

Well, it's not a bad idea in principle and I think it will work, so long
as we can ensure that the PM daemon will be the only process using
suspend/hibernate interfaces.

Apart from this, steps 1.-3. represent a loop with quite a bit of socket
traffic if wakeup events occur relatively often (think someone typing on
a keyboard being a wakeup device or moving a mouse being a wakeup device).
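
For reference, the one-byte echo protocol quoted above can be sketched roughly as follows, in Python for brevity. This is an illustration under assumptions, not an implementation from this thread: the function names are hypothetical, and the daemon side sends to all clients before collecting echoes, which is a small variation on step 2.

```python
import select


def client_step(pm_sock, event_fds, handle_event, timeout=None):
    """One pass of the client loop: events take priority over the PM byte."""
    readable, _, _ = select.select([pm_sock, *event_fds], [], [], timeout)
    events = [fd for fd in readable if fd is not pm_sock]
    if events:
        for fd in events:
            handle_event(fd)           # the handler must consume the event
        return "handled"               # any PM byte stays queued for later
    if pm_sock in readable:
        pm_sock.sendall(pm_sock.recv(1))   # echo it back: "OK to sleep"
        return "echoed"
    return "idle"


def send_sleep_query(client_socks, seq):
    """Daemon side: send the same one-byte token to every client."""
    token = bytes([seq % 256])
    for sock in client_socks:
        sock.sendall(token)


def all_clients_agree(client_socks, seq, timeout=1.0):
    """Daemon side: True only if every client echoes the token in time."""
    token = bytes([seq % 256])
    for sock in client_socks:
        ready, _, _ = select.select([sock], [], [], timeout)
        if not ready or sock.recv(1) != token:
            return False
    return True
```

Each pass of steps 1.-3. costs one byte out and one byte back per client, which is the socket traffic referred to above.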

> > > Equivalent support for hibernation is left as an exercise for the 
> > > reader.
> > 
> > Hehe.  Quite a difficult one for that matter. :-)
> 
> That's another thing we need to think about more carefully.  How 
> extravagant do we want to make the wakeup/hibernation interaction?  My 
> own feeling is: as little as possible (whatever that amounts to).

I don't agree with that.  In my opinion all system sleep interfaces should
be handled.

> > > This really seems like it could work, and it wouldn't be tremendously 
> > > complicated.  The only changes needed in the kernel would be the 
> > > "virtualization" (or forwarding) mechanism for legacy support.
> > 
> > > Yes, it could be made to work, just like the hibernate user space interface,
> > but would it be really convenient to use?  I have some doubts.
> 
> In terms of integration with current systems (and without the
> virtualization), it should be very easy.  There will be a new daemon to
> run when the system starts up, and a new program that will communicate
> with that daemon (or will write to /sys/power/state if the daemon isn't
> available).  That's all.
> 
> In terms of writing wakeup-aware clients, it's a little hard to say in 
> the absence of any examples.  The client protocol described above 
> shouldn't be too hard to use, especially if a wakeup library can be 
> provided.
> 
> For something like a firmware update program, all the program has to do
> is open a connection to the PM daemon before writing the new firmware.  
> Nothing more -- if the program does not send any data over the socket
> then the PM daemon will not allow sleep requests to go through.
> 
> Of course, the Android people have the most experience with this sort
> of thing.  In an earlier discussion with Arve, he expressed some
> concerns about getting the PM daemon started early enough (obviously it
> needs to be running before any of its clients) and the fact that the
> daemon would have to be multi-threaded.  I got the feeling that he was
> complaining just for the sake of complaining, not because these things
> would present any serious problems.
> 
> Converting the programs that currently use Android's userspace
> wakelocks might be somewhat more difficult.  Simply releasing a
> wakelock would no longer be sufficient; a program would need to respond
> to polls from the PM daemon whenever it was willing to let the system
> go to sleep.

I honestly don't think it will be very practical to expect all of the
existing Android applications to be reworked this way ...

Thanks,
Rafael


* Re: [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode
  2011-10-17 21:19                 ` NeilBrown
@ 2011-10-17 21:43                   ` John Stultz
  2011-10-17 23:06                     ` NeilBrown
  2011-10-17 23:14                     ` NeilBrown
  0 siblings, 2 replies; 80+ messages in thread
From: John Stultz @ 2011-10-17 21:43 UTC (permalink / raw)
  To: NeilBrown; +Cc: Alan Stern, Rafael J. Wysocki, Linux PM list, mark gross, LKML

On Tue, 2011-10-18 at 08:19 +1100, NeilBrown wrote:
> On Mon, 17 Oct 2011 12:08:49 -0700 John Stultz <john.stultz@linaro.org> wrote:
> 
> > On Mon, 2011-10-17 at 14:19 -0400, Alan Stern wrote:
> > > On Mon, 17 Oct 2011, John Stultz wrote:
> > > 
> > > > > So, the alarmtimer code is a bit simpler than what you describe
> > > > above (alarmtimers are just like regular posix timers, only enable an
> > > > RTC wakeup for the soonest event when the system goes into suspend).
> > > > 
> > > > However, such a dual-timer style behavior seems like it could work for
> > > > > timer driven wakeups (and has been suggested to me by others as well).
> > > > Just to reiterate my understanding so that we're sure we're on the same
> > > > wavelength:
> > > > 
> > > > For any timer-style wakeup event, you set another non-wakeup timer for
> > > > some small period of time before the wakeup timer. Then when the
> > > > non-wakeup timer fires, the application inhibits suspend and waits for
> > > > the wakeup timer.  
> > > > 
> > > > > Thus if the system is suspended, the system will stay asleep until the
> > > > wakeup event, where we'll hold off suspend for a timeout length so the
> > > > task can run. If the system is not suspended, the early timer inhibits
> > > > suspend to block the possible race.
> > > > 
> > > > > So yes, while not a very elegant solution in my mind (as it's still racy
> > > > like any timeout based solution), it would seem to be workable in
> > > > practice, assuming wide error margins are used as the kernel does not
> > > > guarantee that timers will fire at a specific time (only after the
> > > > requested time). 
> > > > 
> > > > And this again assumes we'll see no timing issues as a result of system
> > > > load or realtime task processing.
> > 
> > > It shouldn't have to be this complicated.  If a program wants the
> > > system to be awake at a certain target time, it sets a wakeup timer for
> > > that time.  Then it vetoes any suspend requests that occur too close to 
> > > the target time, and continues to veto them until it has finished its 
> > > job.
> > 
> > I agree that the dual-timer approach is not really a good solution, and
> > doesn't help with similar races on non-timer based wakeups.
> > 
> > Though I also think proposed userland implementations that require
> > communication with all wakeup consumers before suspending (which really,
> > once you get aggressive about suspending when you can, means
> > communicating with all wakeup consumers on every wakeup event) isn't
> > really a good solution either.
> 
> It would help me a lot if you could be more specific than "good".  Do you mean
> "efficient" or "simple" or "secure" or ...

Sorry. Efficient is what I mean. Requiring every task that consumes wakeup
events to be scheduled seems like it would unnecessarily slow
the suspend process.

Although I also don't see what the "it's ok to suspend" handshake would
look like from the application's point of view. If the application is
blocking in the kernel on something, I don't think it could respond. So
this would require either signals from the PM daemon or the app to be
sure not to block. It just seems messy. I could just be not getting
something that makes it more elegant, so forgive me if that's the case.


> > Though as I've been thinking about it, there may be a way to do a
> > userland solution that uses the wakeup_count that isn't so inefficient.
> > Basically, it's a variant of Mark's wakeup-device idea, but moved out to
> > userland.
> 
> Here I see you probably meant "efficient".  Can that be quantified?  Do you
> have a target latency for getting into suspend, and measurements that show
> you regularly missing this target?
> I am reminded of what Donald Knuth reportedly said about premature
> optimisation.

That is a fair point. I think the Android guys have a better sense of the
specifics for the suspend latency that they use. But just to get a sense of
it, on one Android board I've used, the system resumes and suspends for
each keystroke over the serial line.

> > There is a userland PM daemon. It's responsible for both suspending the
> > system, *and* handling all wakeup events.
> > 
> > Normal wakeup consumers open wakeup devices with a special library which
> > passes the open request through the PM daemon. The PM daemon opens the
> > device and provides a pipe fd back to the application, and basically
> > acts as a middle-man.
> 
> There is certainly merit in the idea but I think the pipes just get in the
> way.
> 
> How about having both the PM daemon and the application listening on the same
> FD.  The app sends the FD to the PM daemon on the same Unix domain socket
> which is used to request suspend/resume handshaking.
> 
> The PM daemon never reads from the FD.  It only passes it to
> poll/select/whatever.
> 
> When poll says the FD is ready, the daemon initiates the handshake with the
> app to make sure that it has consumed the event.  If none of the FDs are
> ready for read and no process is blocking suspend, then the daemon is free to
> enter suspend.

So this is starting to sound pretty interesting!

I think you can drop the handshaking on suspend as well, because you can
consider the read() on the application side to mark that the event is
consumed. The application can flag to the pm daemon to inhibit suspend
after a select, but prior to reading.

Does that make sense to you?

This would both avoid the extra context switching to pass the event
over, and avoids the need to schedule everyone before suspending. 
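
To make the ordering concrete, here is a small sketch in Python; it is an illustration under assumptions, not code from this thread. SuspendArbiter is a hypothetical in-process stand-in for the PM daemon's bookkeeping (in practice the inhibit/allow calls would be messages over the daemon's socket), and a pipe can stand in for a wakeup device. The point is that the app inhibits suspend after select() returns but before read() makes the shared FD look idle:

```python
import os
import select


class SuspendArbiter:
    """Toy stand-in for the PM daemon's state (hypothetical name)."""

    def __init__(self):
        self.watched = []        # FDs shared with the daemon by applications
        self.inhibitors = set()  # clients currently blocking suspend

    def may_suspend(self):
        # Non-blocking check: suspend only if no watched FD has unread
        # data and no client holds an inhibit.
        readable, _, _ = select.select(self.watched, [], [], 0)
        return not readable and not self.inhibitors


def consume_event(arbiter, client_id, fd, process):
    """App side: select, inhibit, read, process, then allow again."""
    select.select([fd], [], [])            # wait for the wakeup event
    arbiter.inhibitors.add(client_id)      # inhibit *before* reading
    try:
        process(os.read(fd, 4096))         # the FD now looks idle to the daemon
    finally:
        arbiter.inhibitors.discard(client_id)   # allow suspend again
```

Between the event arriving and the inhibit being set, the unread data on the FD is what holds off suspend; after the read, the inhibit does.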

thanks
-john




* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-16 23:48     ` NeilBrown
  2011-10-17 15:43       ` Alan Stern
@ 2011-10-17 22:02       ` Rafael J. Wysocki
  2011-10-17 23:36         ` NeilBrown
  1 sibling, 1 reply; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-17 22:02 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux PM list, mark gross, LKML, John Stultz, Alan Stern

On Monday, October 17, 2011, NeilBrown wrote:
> On Sun, 16 Oct 2011 00:10:40 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
...
> > 
> > >  But I think it is very wrong to put some hack in the kernel like your
> > >    suspend_mode = disabled
> > 
> > Why is it wrong and why do you think it is a "hack"?
> 
> I think it is a "hack" because it is addressing a specific complaint rather
> than fixing a real problem.

I wonder why you think that there's no real problem here.

The problem I see is that multiple processes can use the suspend/hibernate
interfaces pretty much at the same time (not exactly in parallel, because
there's some locking in there, but there may very well be two different
processes operating /sys/power/state independently of each other), while
the /sys/power/wakeup_count interface was designed on the assumption that
there would be only one such process.

> Contrast that with your wakeup_events which are a carefully designed approach
> addressing a real problem and taking into account the big picture.
> 
> i.e. it seems to be addressing a symptom rather than addressing the cause.
> 
> (and it is wrong because "hacks" are almost always wrong - short-term gain,
> long term cost).

Where I'm not sure what's the symptom and what's the cause. :-)


> > >  just because the user-space community hasn't got its act together yet.
> > 
> > Is there any guarantee that it will get its act together in any foreseeable
> > time frame?
> > 
> > >  And if you really need a hammer to stop processes from suspending the system:
> > > 
> > >    cat /sys/power/state > /tmp/state
> > >    mount --bind /tmp/state /sys/power/state
> > > 
> > >  should do it.
> > 
> > Except that (1) it appears to be racy (what if system suspend happens between
> > the first and second line in your example - can you safely start to upgrade
> > your firmware in that case?) and (2) it won't prevent the hibernate interface
> > based on /dev/snapshot from being used.
> > 
> > Do you honestly think I'd propose something like patch [1/2] if I didn't
> > see any other _working_ approach?
> 
> I think there are other workable approaches  (maybe not actually _working_,
> but only because no-one has written the code).
> 
> I'm not saying we should definitely not add more functionality to the kernel,
> but I am saying we should not do it at all hastily.

That I agree with.

> If someone has tried to use the current functionality, has really understood
> it, has made an appropriate attempt to make use of it, and has found that
> something cannot be made to work reliably, or efficiently, or securely or
> whatever, then certainly consider ways to address the problems.
> 
> But I don't think we are there yet.  We are only just getting to the
> "understanding" stage (and I have found these conversations very helpful in
> refining my understanding).
> 
> When I get my GTA04 (phone motherboard) I hope to write some code that
> actually realises these ideas properly (I have code on my GTA02, but it is
> broken in various ways, and the kernel is too old to
> have /sys/power/wakeup_count anyway).
> 
> 
> > 
> > >  Your second patch has little to recommend it either.
> > >  In the first place it seems to be entrenching the notion that timeouts are a
> > >  good and valid way to think about suspend.
> > 
> > That's because I think they are unavoidable.  Even if we are able to eliminate
> > all timeouts in the handling of wakeup events by the kernel and passing them
> > to user space, which I don't think is a realistic expectation, the user will
> > still have only so much time to wait for things to happen.  For example, if
> > a phone user doesn't see the screen turn on 0.5 sec after the button was
> > pressed, the button is pretty much guaranteed to be pressed again.  This
> > observation applies to other wakeup events, more or less.  They are very much
> > like items with "suitability for consumption" timestamps: if they are not
> > consumed quickly enough, we can simply forget about them.
> 
> I hadn't thought of it like that - I do see your point I think.
> However things are usually consumed long before they expire - expiry times
> are longer than expected shelf life.
> I think it is important to think carefully about the correct expiry time for
> each event type as they aren't all the same.
> So I would probably go for a larger default which is always safe, but
> possibly wasteful.  But that is a small point.
> 
> > 
> > >  I certainly agree that there are plenty of cases where timeouts are
> > >  important and necessary.  But there are also plenty of cases where you will
> > >  know exactly when you can allow suspend again, and having a timeout there is
> > >  just confusing.
> > 
> > Please note that with patch [2/2] the timeout can always be overridden.
> > 
> > >  But worse - the mechanism you provide can be trivially implemented using
> > >  unix-domain sockets talking to a suspend-daemon.
> > > 
> > >  Instead of opening /dev/sleepctl, you connect to /var/run/suspend-daemon/sock
> > >  Instead of ioctl(SLEEPCTL_STAY_AWAKE), you write a number to the socket.
> > >  Instead of ioctl(SLEEPCTL_RELAX), you write zero to the socket.
> > > 
> > >  All the extra handling you do in the kernel, can easily be done by
> > >  user-space suspend-daemon.
> > 
> > I'm not exactly sure why it is "worse".  Doing it through sockets may require
> > the kernel to do more work and it won't be possible to implement the
> > SLEEPCTL_WAIT_EVENT ioctl I've just described to John this way.
> 
> "worse" because it appears to me that you are adding functionality to the
> kernel which is effectively already present.  When people do that to meet a
> specific need it is usually not as usable as the original.  i.e. "You have
> re-invented XXX - badly".  In this case XXX is IPC.
> 
> Yes - more CPU cycles may be expended in the user-space solution than a
> kernel space solution, but that is a trade-off we often make.  I don't think
> that suspend is a time-critical operation - is it?
> 
> And I think SLEEPCTL_WAIT_EVENT would work fine over sockets, particularly
> if, instead of a signal being sent, a simple short message were sent back over
> the socket.
> 
> 
> 
> 
> > 
> > >  I really wish I could work out why people find the current mechanism
> > >  "difficult to use".  What exactly is it that is difficult?
> > >  I have described previously how to build a race-free suspend system.  Which
> > >  bit of that is complicated or hard to achieve?  Or which bit of that cannot
> > >  work the way I claim?  Or which need is not met by my proposals?
> > > 
> > >  Isn't it much preferable to do this in userspace where people can
> > >  experiment and refine and improve without having to upgrade the kernel?
> > 
> > Well, I used to think that it's better to do things in user space.  Hence,
> > the hibernate user space interface that's used by many people.  And my
> > experience with that particular thing made me think that doing things in
> > the kernel may actually work better, even if they _can_ be done in user space.
> > 
> > Obviously, that doesn't apply to everything, but sometimes it simply is worth
> > discussing (if not trying).  If it doesn't work out, then fine, let's do it
> > differently, but I'm really not taking the "this should be done in user space"
> > argument at face value any more.  Sorry about that.
> 
> :-)  I have had similar mixed experiences.   Sometimes it can be a lot easier
> to get things working if it is all in the kernel.
> But I think that doing things in user-space leads to a lot more flexibility.
> Once you have the interfaces and designs worked out you can then start doing
> more interesting things and experimenting with ideas more easily.
> 
> In this case, I think the *only* barrier to a simple solution in user-space
> is the pre-existing software that uses the 'old' kernel interface.  It seems
> that interfacing with that is as easy as adding a script or two to pm-utils.

Well, assuming that we're only going to address the systems that use PM utils.

> With that problem solved, experimenting is much easier in user-space than in
> the kernel.

Somehow, I'm not exactly sure if we should throw all kernel-based solutions away
just yet.

Thanks,
Rafael


* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-17 14:45           ` Alan Stern
@ 2011-10-17 22:49             ` NeilBrown
  2011-10-17 23:47               ` John Stultz
  0 siblings, 1 reply; 80+ messages in thread
From: NeilBrown @ 2011-10-17 22:49 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Linux PM list, mark gross, LKML, John Stultz


On Mon, 17 Oct 2011 10:45:59 -0400 (EDT) Alan Stern
<stern@rowland.harvard.edu> wrote:

> On Mon, 17 Oct 2011, NeilBrown wrote:
> 
> > > The more I think about this, the better it seems.  In essence, it 
> > > amounts to "virtualizing" the existing PM interface.
> > 
> > While "virtualizing" does sound attractive in some way, I think it would be
> > the wrong thing to do.
> > In practice there is only one process at a time that is likely to suspend
> > the system.  I've just been exploring how that works.
> > 
> > gnome-power-manager talks to upowerd over dbus to ask for a suspend.
> > upowerd then runs /usr/sbin/pm-suspend.
> > > pm-suspend then runs all the scripts in /usr/lib/pm-utils/sleep.d/
> > > and then calls "do_suspend" which is defined in /usr/lib/pm-utils/pm-functions
> > 
> > Ugghh.. That is a very deep stack that is doing things the "wrong" way.
> > > i.e. it is structured around requests to suspend rather than requests to stay
> > awake.
> > 
> > Nonetheless, we only really need to worry about the bottom of the stack.
> > > Rather than virtualize /sys/power/state, just modify pm-functions, which
> > you can probably do by putting appropriate content
> > into /usr/lib/pm-utils/defaults.
> > Get that to define a do_suspend which interacts with the new suspend-daemon
> > to say "now would be a good time to suspend" - if nothing else is blocking
> > suspend, it does.
> > 
> > Put it another way:  power-management has always been "virtualized" via lots
> > of shell scripts in pm-utils (and various daemons stacked on top of that).
> > We just need to plug in to that virtualisation.
> > 
> > > This is all based on gnome.  kde might be different, but I suspect that is
> > > only at the top levels.  I would be surprised if kde and the other desktops
> > don't all end up going through pm-utils.
> 
> Okay, good; that allows us to avoid the virtualization issue.  The only
> reason for having it in the first place was to be certain of working
> with userspace environments that don't use a standard, structured
> method for initiating system sleeps.  If you don't care about those 
> environments then there's no need for it.
> 
> Do we agree about the best way to make this work?  I'm suggesting that
> when the PM daemon is started up with a "legacy" option, it should
> assume the existence of a predefined client that always wants to keep
> the system awake, except for brief periods whenever a sleep request is
> received from a new pm-utils program.  Maybe this new program could
> pass the PM daemon a time limit (such as 3000 ms), with the requirement
> that if the daemon can't put the system to sleep within that time limit
> then it should give up and fail the sleep request.
> 
> Alan Stern

We do seem to be approaching some sort of agreement ... well I am at least,
I cannot speak for others :-)

Yes it should start in a 'legacy' mode except that I think 'legacy' isn't
quite the right word as this is a mode that will always be needed to avoid
start-up races.

I don't see a real value in the 3000ms (though I don't really care one way or
the other).
I think there are - on a current desktop - two sorts of 'suspend now'
requests.
One comes from the desktop power manager (g-p-m?) and means "things have been
idle for a while, let's go to sleep". If the suspend daemon notices that
things aren't idle right *now*, it probably doesn't want to go to sleep.
The other comes from an explicit button press (whether a hard button or a
soft on-screen button) and means "Must go to sleep now - master putting us in
a padded bag and we must stay cool".
In that case we could possibly delay a couple of seconds, but really do want
to go to sleep and what is more, we don't want to wake up again except by a
button press/lid opening.
Such a request should possibly disable timers and make sure the wifi is off
and not responding to wake-on-wlan.  I haven't really thought that issue
through yet so it isn't included in the following.

However for the bits that I feel I do understand, this is what I (currently)
think it should (or could) look like.

1/ There is a suspend-management daemon that starts very early and is the only
   process that is allowed to initiate suspend or hibernate.  Any other
   process which tries to do this is a BUG.

2/ The daemon has two modes:
   A/ on-demand.  In this mode it will only enter suspend when requested to,
      and then only if there is nothing else blocking the suspend.
   B/ immediate.  In this mode it will enter suspend whenever nothing is
      blocking the suspend.  The daemon is free to add a small delay
      proportional to the resume latency if so configured.
   The daemon is in on-demand mode at start up.

3/ The daemon can handle 5 sorts of interactions with clients.

   i/ Change mode - a request to switch between on-demand and immediate mode.
  ii/ suspend now - a request to suspend which is only honoured if no client
      has blocked suspend, and if the kernel is not blocking suspend.
      Thus it is meaningless in immediate mode.
 iii/ be-awake-after - this request carries a timestamp and is stateful - it
      must be explicitly cancelled.  It requests that the system be fully
      active from that time onwards.
  iv/ notify - this establishes a 'session' between client and server.
      Server will call back and await a response before entering suspend and
      again after resuming (no response needed for resume).
      The client is explicitly permitted to make a be-awake-after request
      during the suspend call-back.
   v/ notify-fd.  This is a special form of 'notify' which carries a file
      descriptor.  The server is not required to (and not expected to)
      initiate the 'suspend' callback unless the fd is reporting POLLIN or
      POLLERR while preparing for suspend.

4/ The daemon manages the RTC alarm.  Any other process programming the alarm
   is a BUG.  Before entering suspend it will program the RTC to wake the
   system at (or slightly before) the time of the earliest active
   be-awake-after request.

5/ Possible implementation approaches for the client interactions:
   I/ A SOCK_STREAM unix domain socket which takes commands.
     On connect, server says "+READY".
     Client writes "MODE ON-DEMAND" or "MODE IMMEDIATE"
     Server replies "+MODE $MODE"

  II/ The same unix domain socket as I. 
     Client writes "SUSPEND"
     Server replies "+RESUMED" if the suspend happened, or
                    "-BUSY"  if it didn't.
     +RESUMED is no guarantee that any measurable time was spent in suspend,
     so maybe it isn't needed.

 III/ A separate Unix domain socket.
     On connect, server says "Awake" meaning that this connection is ensuring
     the system will be awake now.
     Client can write a seconds-since-epoch number, which the server will echo
     back when confirmed.  When that time arrives - which might be immediately
     - the server will write "Awake" again.
     When the client closes the connection, the suspend-block is removed.

  IV/ A third Unix domain socket.
     On connect, server writes a single character 'A' meaning 'system is
     awake'.
     When initiating suspend, server writes 'S' meaning 'suspend soon'.
     Client must reply to each 'S' with 'R' meaning 'ready'.  Server does not
     enter suspend until the 'R' is received.
     On resume, server will write 'A' meaning 'awake' again.  Many clients
     might ignore this.

   V/ Same socket as IV, with extra message from client to server.
     Client writes 'M' (monitor) in a message with SCM_RIGHTS containing one
     or more fds.  Server will now only send 'S' when one or more of those fds
     are readable, but the client cannot rely on that and must (as always)
     not assume that a read will succeed, or will not block.

6/ The daemon may impose access control on be-awake messages.  In the above
   protocol it could be based on SCM_CREDENTIALS messages which might be
   required.
   It may also impose a timeout on the 'R' reply to the 'S' request, or at
   least log clients which do not reply promptly.

7/ A client should not delay at all in replying to 'suspend
   soon' (S) with 'ready' (R).  It should only check if there is anything to
   do and should make a stay_awake request if there is something.  Then it
   must reply with 'R'.
   It should *not* use the fact that suspend is waiting for its reply to
   respond to an event as this misleads other clients as to the true state of
   the system.

8/ I haven't treated hibernate here.  My feeling is that it would be a
   different configuration for the daemon.
   If hibernate were possible and the soonest stay-awake time were longer
   than X in the future, then the daemon might configure the RTC alarm for X,
   and when that arrives, it pops out of suspend and goes into hibernate.
   But the details can wait for revision 2 of the spec.
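
As an illustration only (not code from this thread), the 'A'/'S'/'R'
exchange of 5.IV above could be sketched as follows.  The function names
are hypothetical, a socketpair stands in for the daemon's Unix domain
socket, and the stay-awake request that item 7 allows before the 'R' reply
would travel over a separate connection and is not shown.

```python
import socket
import threading


def daemon_try_suspend(conn, enter_suspend):
    """Daemon side: announce 'S', wait for 'R', suspend, then announce 'A'."""
    conn.sendall(b"S")
    if conn.recv(1) != b"R":
        return False          # peer closed or replied with something else
    enter_suspend()           # stand-in for the real suspend step
    conn.sendall(b"A")        # tell the client we are awake again
    return True


def client_loop(conn, log):
    """Client side: record every message and answer each 'S' with 'R'."""
    while True:
        msg = conn.recv(1)
        if not msg:           # daemon closed the connection
            return
        log.append(msg.decode())
        if msg == b"S":
            # Per item 7 the client must not delay here; it only replies.
            conn.sendall(b"R")


def demo():
    # socketpair() gives a connected AF_UNIX/SOCK_STREAM pair on Unix.
    daemon_end, client_end = socket.socketpair()
    log, events = [], []
    worker = threading.Thread(target=client_loop, args=(client_end, log))
    worker.start()
    daemon_end.sendall(b"A")  # greeting sent on connect
    ok = daemon_try_suspend(daemon_end, lambda: events.append("suspended"))
    daemon_end.close()
    worker.join()
    client_end.close()
    return ok, log, events
```

Calling demo() runs one complete suspend attempt; a real daemon would also
apply the timeout on the 'R' reply mentioned in item 6.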

NeilBrown


* Re: [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode
  2011-10-17 21:43                   ` John Stultz
@ 2011-10-17 23:06                     ` NeilBrown
  2011-10-17 23:14                     ` NeilBrown
  1 sibling, 0 replies; 80+ messages in thread
From: NeilBrown @ 2011-10-17 23:06 UTC (permalink / raw)
  To: John Stultz
  Cc: Alan Stern, Rafael J. Wysocki, Linux PM list, mark gross, LKML

On Mon, 17 Oct 2011 14:43:21 -0700 John Stultz <john.stultz@linaro.org> wrote:

> On Tue, 2011-10-18 at 08:19 +1100, NeilBrown wrote:
> > On Mon, 17 Oct 2011 12:08:49 -0700 John Stultz <john.stultz@linaro.org> wrote:

> > Here I see you probably meant "efficient".  Can that be quantified?  Do you
> > have a target latency for getting into suspend, and measurements that show
> > you regularly missing this target?
> > I am reminded of what Donald Knuth reportedly said about premature
> > optimisation.
> 
> That is a fair point. I think the Android guys have better sense of the
> specifics for suspend latency that they use. But just to get a sense of
> it, on one Android board I've used, the system resumes and suspends for
> each keystroke over the serial line.

Cool!

> 
> > > There is a userland PM daemon. It's responsible for both suspending the
> > > system, *and* handling all wakeup events.
> > > 
> > > Normal wakeup consumers open wakeup devices with a special library which
> > > passes the open request through the PM daemon. The PM daemon opens the
> > > device and provides a pipe fd back to the application, and basically
> > > acts as a middle-man.
> > 
> > There is certainly merit in the idea but I think the pipes just get in the
> > way.
> > 
> > How about having both the PM daemon and the application listening on the same
> > FD.  The app sends the FD to the PM daemon on the same Unix domain socket
> > which is used to request suspend/resume handshaking.
> > 
> > The PM daemon never reads from the FD.  It only passes it to
> > poll/select/whatever.
> > 
> > When poll says the FD is ready, the daemon initiates the handshake with the
> > app to make sure that it has consumed the event.  If none of the FDs are
> > ready for read and no process is blocking suspend, then the daemon is free to
> > enter suspend.
> 
> So this is starting to sound pretty interesting!
> 
> I think you can drop the handshaking on suspend as well, because you can
> consider the read() on the application side to mark that the event is
> consumed. The application can flag to the pm daemon to inhibit suspend
> after a select, but prior to reading.
> 
> Does that make sense to you?
> 
> This would both avoid the extra context switching to pass the event
> over, and avoids the need to schedule everyone before suspending. 

I'm not sure if you are saying something different to me or not.

In my proposal we do avoid the handshake and context switch unless the fd is
readable when the daemon tries to suspend - and that should be the uncommon
case.
Avoiding some handshake is not possible when the fd is readable.
You could still have an implicit handshake where the daemon knows the app
hasn't handled the event yet because it hasn't read, and the app gets a
stay-awake lock before reading.  However:
1/ there is still a handshake and probably a context switch - it is just
   less explicit
2/ the server needs to wait for "fd is not readable" and we don't have an
   interface for that.
   I guess the server could assume that if an fd is readable, then a
   stay-awake request will be made, so it waits for the stay-awake request.
   That feels a little bit fragile, but it might work and it could end up
   being a little more efficient - it would need careful analysis and probably
   some experimentation to be sure.

So yes - maybe it makes sense, but it needs a concrete implementation to
provide proper review.
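
For illustration only (not a proposal from this thread), the pre-suspend
check described above, where the daemon polls the client's fd but never
reads from it, might look like this.  The names readable_fds and
may_suspend are hypothetical, and any pollable fd (for example a pipe)
can stand in for a wakeup device.

```python
import os
import select


def readable_fds(fds):
    """Return the fds reporting POLLIN or POLLERR, without reading from them."""
    mask = select.POLLIN | select.POLLERR
    poller = select.poll()
    for fd in fds:
        poller.register(fd, mask)
    # Timeout 0: report the current state and return immediately.
    return [fd for fd, events in poller.poll(0) if events & mask]


def may_suspend(fds, blockers):
    """Decide whether the daemon is free to suspend right now.

    fds      -- wakeup fds handed over by clients (only ever polled)
    blockers -- set of clients currently holding a stay-awake request
    Returns (ok, pending); pending lists the fds whose owners still need
    the explicit handshake because an event has not been consumed yet.
    """
    pending = readable_fds(fds)
    return (not pending and not blockers), pending
```

With a pipe from os.pipe() as the monitored fd, may_suspend() reports the
read end as pending once something has been written to it and stops doing
so after the owner has read the data.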

NeilBrown


* Re: [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode
  2011-10-17 21:43                   ` John Stultz
  2011-10-17 23:06                     ` NeilBrown
@ 2011-10-17 23:14                     ` NeilBrown
  1 sibling, 0 replies; 80+ messages in thread
From: NeilBrown @ 2011-10-17 23:14 UTC (permalink / raw)
  To: John Stultz
  Cc: Alan Stern, Rafael J. Wysocki, Linux PM list, mark gross, LKML

On Mon, 17 Oct 2011 14:43:21 -0700 John Stultz <john.stultz@linaro.org> wrote:

> On Tue, 2011-10-18 at 08:19 +1100, NeilBrown wrote:
> > On Mon, 17 Oct 2011 12:08:49 -0700 John Stultz <john.stultz@linaro.org> wrote:

> > > Though I also think proposed userland implementations that require
> > > communication with all wakeup consumers before suspending (which really,
> > > once you get aggressive about suspending when you can, means
> > > communicating with all wakeup consumers on every wakeup event) isn't
> > > really a good solution either.
> > 
> > > It would help me a lot if you could be more specific than "good".  Do you mean
> > "efficient" or "simple" or "secure" or ...
> 
> Sorry. Efficient is what I mean. Having every task that consumes wakeup
> events to have to be scheduled seems like it would unnecessarily slow
> the suspend process.
> 
> Although I also don't see what the "it's ok to suspend" handshake would
> look like from the application's point of view. If the application is
> blocking in the kernel on something, I don't think it could respond. So
> this would require either signals from the PM daemon or the app to be
> sure not to block. It just seems messy. I could just be not getting
> something that makes it more elegant, so forgive me if that's the case.
> 
> 

Sorry - missed this bit in the previous reply.

Blocking in the kernel would be a problem.
But programs that need to respond to events tend to avoid blocking.
They usually use an event loop and non-blocking IO, or they use threads so
that some part is always ready to respond.

The same requirements would be imposed on a process that responds to wakeup
events - it just has to be able to respond to 'about to suspend' events too.

So I don't think it is any more messy than event handling always is (and if
you use libevent, most of that is hidden under the carpet anyway).


(and no:  not signals.  Never signals.  Just don't even think about signals.
I hate signals.  Use poll or equivalents - never signals (unless you cannot
avoid them))
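
As an illustration only, a client built around a single poll-style loop
can treat 'about to suspend' as just one more event source.  This sketch
assumes the one-character 'S'/'R' messages from the earlier proposal; the
function name is hypothetical, and the stay-awake request a real client
might make before replying is left out.

```python
import selectors
import socket


def run_until_idle(wakeup_sock, suspend_sock, on_wakeup):
    """Handle wakeup events and 'S' notifications from one loop, no signals.

    A real client would block in select(); timeout=0 is used here only so
    that the sketch returns once nothing is pending.
    """
    sel = selectors.DefaultSelector()
    sel.register(wakeup_sock, selectors.EVENT_READ, "wakeup")
    sel.register(suspend_sock, selectors.EVENT_READ, "suspend")
    try:
        while True:
            ready = sel.select(timeout=0)
            if not ready:
                return
            for key, _mask in ready:
                data = key.fileobj.recv(4096)
                if not data:                      # peer closed
                    sel.unregister(key.fileobj)
                elif key.data == "wakeup":
                    on_wakeup(data)               # consume the wakeup event
                else:
                    # Answer every 'S' (suspend soon) with one 'R' (ready).
                    key.fileobj.sendall(b"R" * data.count(b"S"))
    finally:
        sel.close()


def demo():
    wake_app, wake_src = socket.socketpair()
    susp_app, susp_daemon = socket.socketpair()
    for sock in (wake_app, susp_app):
        sock.setblocking(False)                   # never block in the loop
    wake_src.sendall(b"keypress")                 # a pending wakeup event
    susp_daemon.sendall(b"S")                     # daemon wants to suspend
    seen = []
    run_until_idle(wake_app, susp_app, seen.append)
    reply = susp_daemon.recv(1)
    for sock in (wake_app, wake_src, susp_app, susp_daemon):
        sock.close()
    return seen, reply
```

Both sources are served by the same loop, so the client stays responsive
to the daemon without any signal handlers.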

NeilBrown



* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-17 22:02       ` Rafael J. Wysocki
@ 2011-10-17 23:36         ` NeilBrown
  2011-10-22 22:07           ` Rafael J. Wysocki
  0 siblings, 1 reply; 80+ messages in thread
From: NeilBrown @ 2011-10-17 23:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM list, mark gross, LKML, John Stultz, Alan Stern

On Tue, 18 Oct 2011 00:02:30 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> On Monday, October 17, 2011, NeilBrown wrote:
> > On Sun, 16 Oct 2011 00:10:40 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> ...
> > > 
> > > >  But I think it is very wrong to put some hack in the kernel like your
> > > >    suspend_mode = disabled
> > > 
> > > Why is it wrong and why do you think it is a "hack"?
> > 
> > I think it is a "hack" because it is addressing a specific complaint rather
> > than fixing a real problem.
> 
> I wonder why you think that there's no real problem here.
> 
> The problem I see is that multiple processes can use the suspend/hibernate
> interfaces pretty much at the same time (not exactly in parallel, because
> there's some locking in there, but very well there may be two different
> processes operating /sys/power/state independently of each other), while
> the /sys/power/wakeup_count interface was designed with the assumption that
> there will be only one such process in mind.

Multiple processes can write to your mailbox at the same time.  But somehow
they don't.  This isn't because the kernel enforces anything, but because all
the relevant programs have an agreed protocol by which they arbitrate access.
Once upon a time this involved creating a lock file with O_CREAT|O_EXCL.
These days it is fcntl locking.  But it is still advisory.

In the same way - we stop multiple processes from suspending/hibernating at
the same time by having an agreed protocol by which they share access to the
resource.  The kernel does not need to be explicitly involved in this.
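
For illustration, the older of the two advisory schemes mentioned above (a
lock file created with O_CREAT|O_EXCL) can be sketched as follows.  The
function names and the lock file name are hypothetical; nothing stops a
process that ignores the convention, which is exactly what "advisory"
means here.

```python
import os
import tempfile


def try_acquire(lock_path):
    """Try to take the lock; return True on success, False if it is held.

    O_CREAT|O_EXCL makes creation atomic: only one caller can create the
    file, and every other caller gets FileExistsError.
    """
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(str(os.getpid()))   # record the owner for debugging
    return True


def release(lock_path):
    """Drop the lock by removing the file."""
    os.unlink(lock_path)


def demo():
    with tempfile.TemporaryDirectory() as directory:
        lock = os.path.join(directory, "suspend.lock")  # hypothetical name
        first = try_acquire(lock)    # nobody holds it yet
        second = try_acquire(lock)   # already held, so this is refused
        release(lock)
        third = try_acquire(lock)    # free again after release
        release(lock)
        return first, second, third
```

A well-behaved suspend initiator would call try_acquire() before touching
the suspend interface and back off when it returns False.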

...

> > > Well, I used to think that it's better to do things in user space.  Hence,
> > > the hibernate user space interface that's used by many people.  And my
> > > experience with that particular thing made me think that doing things in
> > > the kernel may actually work better, even if they _can_ be done in user space.
> > > 
> > > Obviously, that doesn't apply to everything, but sometimes it simply is worth
> > > discussing (if not trying).  If it doesn't work out, then fine, let's do it
> > > differently, but I'm really not taking the "this should be done in user space"
> > > argument at face value any more.  Sorry about that.
> > 
> > :-)  I have had similar mixed experiences.   Sometimes it can be a lot easier
> > to get things working if it is all in the kernel.
> > But I think that doing things in user-space leads to a lot more flexibility.
> > Once you have the interfaces and designs worked out you can then start doing
> > more interesting things and experimenting with ideas more easily.
> > 
> > In this case, I think the *only* barrier to a simple solution in user-space
> > is the pre-existing software that uses the 'old' kernel interface.  It seems
> > that interfacing with that is as easy as adding a script or two to pm-utils.
> 
> Well, assuming that we're only going to address the systems that use PM utils.

I suspect (and claim without proof :-) that any system will have some single
user-space thing that is responsible for initiating suspend.
Every time I look at one I see a whole host of things that need to be done
just before suspend, and other things just after resume.
They used to be in /etc/apm/event.d.  Now they are
in /usr/lib/pm-utils/sleep.d.  I think they were in /etc/acpid once.
I've seen one thing that uses shared-library modules instead of shell scripts
on the basis that it avoids forking and goes fast (and it probably does).
But I doubt there is any interesting system where writing to /sys/power/state
is the *only* thing you need to do for a clean suspend.
So all systems will have some user-space infrastructure to support suspend,
and we just need to hook in to that.

> 
> > With that problem solved, experimenting is much easier in user-space than in
> > the kernel.
> 
> Somehow, I'm not exactly sure if we should throw all kernel-based solutions away
> just yet.

My rule-of-thumb is that we should reserve kernel space for when
  a/ it cannot be done in user space
  b/ it cannot be done efficiently in user space
  c/ it cannot be done securely in user space

I don't think any of those have been demonstrated yet.  If/when they are it
would be good to get those kernel-based solutions out of the drawer (so yes:
keep them out of the rubbish bin).

So I'd respond with "I'm not at all sure that we should throw away an
all-userspace solution just yet".  Particularly because many of us seem to
still be working to understand what all the issues really are.

NeilBrown


* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-17 22:49             ` NeilBrown
@ 2011-10-17 23:47               ` John Stultz
  2011-10-18  2:13                 ` NeilBrown
  0 siblings, 1 reply; 80+ messages in thread
From: John Stultz @ 2011-10-17 23:47 UTC (permalink / raw)
  To: NeilBrown; +Cc: Alan Stern, Rafael J. Wysocki, Linux PM list, mark gross, LKML

On Tue, 2011-10-18 at 09:49 +1100, NeilBrown wrote:
> However for the bits that I feel I do understand, this is what I (currently)
> think it should (or could) look like.
> 
> 
> 1/ There is a suspend-management daemon that starts very early and is the only
>    process that is allowed to initiate suspend or hibernate.  Any other
>    process which tries to do this is a BUG.
> 
> 2/ The daemon has two modes:
>    A/ on-demand.  In this mode it will only enter suspend when requested to,
>       and then only if there is nothing else blocking the suspend.
>    B/ immediate.  In this mode it will enter suspend whenever nothing is
>       blocking the suspend.  The daemon is free to add a small delay
>       proportional to the resume latency if so configured.
>    The daemon is in on-demand mode at start up.
> 
> 3/ The daemon can handle 5 sorts of interactions with clients.
> 
>    i/ Change mode - a request to switch between on-demand and immediate mode.
>   ii/ suspend now - a request to suspend which is only honoured if no client
>       has blocked suspend, and if the kernel is not blocking suspend.
>       Thus it is meaningless in immediate mode.
>  iii/ be-awake-after - this request carries a timestamp and is stateful - it
>       must be explicitly cancelled.  It requests that the system be fully
>       active from that time onwards.

It initially wasn't clear to me why this is necessary. I see below that
it is trying to handle the non-fd timer method of keeping the system
awake.

Although does this also double as the suspend-inhibit/suspend-allow
call made by applications? Or was that interaction just skipped here?

>   iv/ notify - this establishes a 'session' between client and server.
>       Server will call-back and await respond before entering suspend and
>       again after resuming (no response needed for resume).
>       The client is explicitly permitted to make a be-awake-after request
>       during the suspend call-back.

With the notify-fd example included below, I'm curious what specific use
cases you see as requiring the notify interaction? 

>    v/ notify-fd.  This is a special form of 'notify' which carries a file
>       descriptor.  The server is not required to (and not expected to)
>       initiate the 'suspend' callback unless the fd is reporting POLL_IN or
>       POLL_ERR while preparing for suspend.

I'd think it would be "the server is not allowed to" instead of "not
required to".

> 4/ The daemon manages the RTC alarm.  Any other process programing the alarm
>    is a BUG.  Before entering suspend it will program the RTC to wake the
>    system at (or slightly before) the time of the earliest active
>    be-awake-after request.

So, this may need to be revised. My RTC virtualization and alarmtimer
rework gives us a lot more flexibility with RTC events. Given the array
of existing applications that use the RTC chardev, I think it's not
realistic to consider it a bug if someone else is using it.

That said, the posix alarmtimer interface allows us to trigger wakeup
events in the future, without disrupting the legacy chardev programming
(this is possible because the kernel now virtualizes the chardev).

I'd probably rather add alarmtimer functionality to the timerfd, in
order to allow the notify-fd method to work with timers. But it's not a
huge deal. I'd just like to avoid reimplementing a timer dispatch system
in userland.

> 5/ Possible implementation approaches for the client interactions:
>    I/ A SOCK_STREAM unix domain socket which takes commands.
>      On connect, server says "+READY".
>      Client writes "MODE ON-DEMAND" or "MODE IMMEDIATE"
>      Server replies "+MODE $MODE"
> 
>   II/ The same unix domain socket as I. 
>      Client writes "SUSPEND"
>      Server replies "+RESUMED" if the suspend happened, or
>                     "-BUSY"  if it didn't.
>      +RESUMED is no guarantee that an measurable time was in suspend, so
>      maybe it isn't needed.
> 
>  III/ A separate Unix domain socket.
>      On connect, server says "Awake" meaning that this connection is ensuring
>      the system will be awake now.
>      Client can write a seconds-since-epoch number, which the server will echo
>      back when confirmed.  When that time arrives - which might be immediately
>      - the server will write "Awake" again.
>      When the client closes the connection, the suspend-block is removed.

What is the seconds-since-epoch bit for? 

>   IV/ A third Unix domain socket.
>      On connect, server writes a single character 'A' meaning 'system is
>      awake'.
>      When initiating suspend, server writes 'S' meaning 'suspend soon'.
>      Client must reply to each 'S' with 'R' meaning 'ready'.  Server does not
>      enter resume until the 'R' is received.
>      On resume, server will write 'A' meaning 'awake' again.  Many clients
>      might ignore this.

Again, still not sure about this bit, but how do you handle aborted
suspends? If you have one blocked task that takes a really long time to
respond, what happens if you've had multiple attempts to suspend that
have aborted? Just want to make sure you don't end up getting a late
ack for an old suspend attempt (although I'm not really sure if that
matters).

>    V/ Same socket as IV, with extra message from client to server.
>      Client writes 'M' (monitor) in a message with SCM_RIGHTS containing one
>      or more fds.  Server will now only send 'S' when one or more of those fds
>      are readable, but the client cannot rely on that and must (as always)
>      not assume that a read will succeed, or will not block.

Err. Not following this. If this is the notify-fd bit, I'd expect the
client to provide the fds, and then that's it. Then the server will
check those fds before trying to suspend, and if any have data, it will
wait until that data is read. Why does the server send an S in this one?
Doesn't the task also see that there is data there?

> 6/ The daemon may impose access control on be-awake messages.  In the above
>    protocol it could be based on SCM_CREDENTIAL messages which might be
>    required.
>    It may also impose timeout on the 'R' reply from the 'S' request, or at
>    least log clients which do not reply promptly.

This again feels more complex than necessary, but I'll leave it be for
now.

> 7/ A client should not delay at all in replying to 'suspend
>    soon' (S) with 'ready' (R).  It should only check if there is anything to
>    do and should make a stay_awake request if there is something.  Then it
>    must reply with 'R'.
>    I should *not* use the fact that suspend is waiting for its reply to
>    respond to an event as this misleads other clients as to the true state of
>    the system.

Again, while I'm not sure about the notify method, this interleaving
seems right to me. 

> 8/ I haven't treated hibernate here.  My feeling is that it would be a
>    different configuration for the daemon.
>    If hibernate were possible and the soonest stay-awake time were longer
>    than X in the future, then the daemon might configure the RTCalarm for X,
>    and when that arrives, it pops out of suspend and goes into hibernate.
>    But the details can wait for revision 2 of the spec..

I'm not sure if hibernate is different in my mind, other than it taking
much longer. It just seems like it would be a subtlety of the type of
"suspend-now" request made to the PM daemon.

So while I'm excited to be making some headway on the userland approach,
I'm also concerned about how this approach might mesh with other dynamic
run-time power-saving methods that might be used in the future. For
instance, if some future scheduler does some form of rate limiting, and
avoids scheduling applications to keep the cpu in deep idle for longer,
would this keep the kernel from knowing enough to not freeze tasks that
might need to do something so that suspend can occur?   This in effect
would cause one power-saving strategy to block a potentially more
power-saving method from occurring. 

This is in part what I was trying to address with my original
SCHED_STAYAWAKE proposal, trying to find a mechanism that provides
adequate information for the kernel to make appropriate decisions. I
worry a little bit about having too narrow a view on these solutions. 

That of course won't keep me from trying to start work on this user-land
approach, but it is something I think we should keep in mind. It seems
with too many things (Dave Hansen's virtualization talk at Plumbers
covered some examples), we end up with 4-5 small solutions to smaller
problems that don't really work well together instead of stepping back
and seeing the broader picture.

thanks
-john

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-17 23:47               ` John Stultz
@ 2011-10-18  2:13                 ` NeilBrown
  2011-10-18 17:11                   ` Alan Stern
  0 siblings, 1 reply; 80+ messages in thread
From: NeilBrown @ 2011-10-18  2:13 UTC (permalink / raw)
  To: John Stultz
  Cc: Alan Stern, Rafael J. Wysocki, Linux PM list, mark gross, LKML

On Mon, 17 Oct 2011 16:47:04 -0700 John Stultz <john.stultz@linaro.org> wrote:

> On Tue, 2011-10-18 at 09:49 +1100, NeilBrown wrote:
> > However for the bits that I feel I do understand, this is what I (currently)
> > think it should (or could) look like.
> > 
> > 
> > 1/ There is a suspend-management daemon that starts very early and is the only
> >    process that is allowed to initiate suspend or hibernate.  Any other
> >    process which tries to do this is a BUG.
> > 
> > 2/ The daemon has two modes:
> >    A/ on-demand.  In this mode it will only enter suspend when requested to,
> >       and then only if there is nothing else blocking the suspend.
> >    B/ immediate.  In this mode it will enter suspend whenever nothing is
> >       blocking the suspend.  The daemon is free to add a small delay
> >       proportional to the resume latency if so configured.
> >    The daemon is in on-demand mode at start up.
> > 
> > 3/ The daemon can handle 5 sorts of interactions with clients.
> > 
> >    i/ Change mode - a request to switch between on-demand and immediate mode.
> >   ii/ suspend now - a request to suspend which is only honoured if no client
> >       has blocked suspend, and if the kernel is not blocking suspend.
> >       Thus it is meaningless in immediate mode.
> >  iii/ be-awake-after - this request carries a timestamp and is stateful - it
> >       must be explicitly cancelled.  It requests that the system be fully
> >       active from that time onwards.
> 
> This initially wasn't super clear to me why this is necessary. I see
> below it is trying to handle the non-fd timer method to keeping the
> system awake.
> 
> Although does this also duplex as the  suspend-inhibit/suspend-allow
> call made by applications? Or was that interaction just skipped here?

Yes, exactly.  This is primarily allowing an application to say "inhibit
suspend" (aka "be awake").  Being able to make the request for a future time
seemed a natural and simple extension.
If you can do timer wakeups like other wakeups and find it easier that way,
then we can leave the timestamp out of it.


> 
> >   iv/ notify - this establishes a 'session' between client and server.
> >       Server will call-back and await respond before entering suspend and
> >       again after resuming (no response needed for resume).
> >       The client is explicitly permitted to make a be-awake-after request
> >       during the suspend call-back.
> 
> With the notify-fd example included below, I'm curious what specific use
> cases you see as requiring the notify interaction? 

None specifically.  However while I'm convinced that all events must be
visible to user-space I am not convinced that they will be visible to a
poll.  You might occasionally require a read on a sysfs file, and then parse
the contents to see if the event happened.
We can do poll on sysfs files now so that can probably be avoided.
But I didn't want to close doors before I was sure no-one needed them.

And I think that with notify-fd you still need a hand-shake of some sort, and
this provides a simple starting point.

> 
> >    v/ notify-fd.  This is a special form of 'notify' which carries a file
> >       descriptor.  The server is not required to (and not expected to)
> >       initiate the 'suspend' callback unless the fd is reporting POLL_IN or
> >       POLL_ERR while preparing for suspend.
> 
> I'd think it would be "the server is not allowed to" instead of "not
> required to".

Maybe.  When specifying a protocol I am cautious of excluding things that are
merely inconvenient.  So "should not" but not "shall not" in RFC-speak.
However it might be easier on the client if it knew there would never be a
call-back so it might be best to make it "shall not".

> 
> > 4/ The daemon manages the RTC alarm.  Any other process programing the alarm
> >    is a BUG.  Before entering suspend it will program the RTC to wake the
> >    system at (or slightly before) the time of the earliest active
> >    be-awake-after request.
> 
> So, this may need to be revised. My RTC virtualization and alarmtimer
> rework gives us a lot more flexibility with RTC events. Given the array
> of existing applications that use the RTC chardev, I think its not
> realistic to consider it a bug if someone else is using it. 

If multiple applications think they can independently "own" the RTC alarm
then that sounds like it is already a bug quite apart from anything I add.

We must have some way to virtualise the rtc-alarm so that any app can be sure
there will be a wakeup at-or-before some time.  I suggested doing that via
the suspend daemon.  If there is a strong case for a more general
kernel-based virtualisation of the RTC alarm - then maybe that
is OK.
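
To make the "earliest active be-awake-after request" rule concrete, here is a
minimal sketch of the bookkeeping a suspend daemon might do.  The class name,
method names and the margin parameter are all hypothetical; this is only an
illustration of the idea, not code from any posted patch.

```python
class WakeRequests:
    """Hypothetical table of active be-awake-after requests."""

    def __init__(self, margin=2):
        self.margin = margin      # wake this many seconds early (assumed value)
        self._requests = {}       # client id -> seconds-since-epoch

    def add(self, client, when):
        self._requests[client] = when

    def cancel(self, client):
        # Requests are stateful: they stay active until explicitly cancelled.
        self._requests.pop(client, None)

    def may_suspend(self, now):
        # A request whose time has arrived (or is in the past) means
        # "be awake now", so it blocks suspend.
        return all(when - self.margin > now for when in self._requests.values())

    def rtc_alarm(self, now):
        """Time to program into the RTC before suspending, or None."""
        if not self._requests or not self.may_suspend(now):
            return None
        return min(self._requests.values()) - self.margin
```

Whether this table lives in the daemon or in a kernel-side virtualisation of
the alarm, the arithmetic is the same: one hardware alarm, set to the minimum
of all outstanding requests.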

> 
> That said, the posix alarmtimer interface allows us to trigger wakeup
> events in the future, without disrupting the legacy chardev programming
> (this is possible because the kernel now virtualizes the chardev).
> 
> I'd probably rather add alarmtimer functionality to the timerfd, in
> order to allow the notify-fd method to work with timers. But it's not a
> huge deal. I'd just like to avoid reimplementing a timer dispatch system
> in userland.

Yep.  Exactly which solution gets implemented isn't important as long as it
is clean and well defined.

> 
> 
> > 5/ Possible implementation approaches for the client interactions:
> >    I/ A SOCK_STREAM unix domain socket which takes commands.
> >      On connect, server says "+READY".
> >      Client writes "MODE ON-DEMAND" or "MODE IMMEDIATE"
> >      Server replies "+MODE $MODE"
> > 
> >   II/ The same unix domain socket as I. 
> >      Client writes "SUSPEND"
> >      Server replies "+RESUMED" if the suspend happened, or
> >                     "-BUSY"  if it didn't.
> >      +RESUMED is no guarantee that any measurable time was spent in suspend, so
> >      maybe it isn't needed.
> > 
> >  III/ A separate Unix domain socket.
> >      On connect, server says "Awake" meaning that this connection is ensuring
> >      the system will be awake now.
> >      Client can write a seconds-since-epoch number, which the server will echo
> >      back when confirmed.  When that time arrives - which might be immediately
> >      - the server will write "Awake" again.
> >      When the client closes the connection, the suspend-block is removed.
> 
> What is the seconds-since-epoch bit for? 

That is the time when the server will ensure the system is awake from.  i.e.
the wakeup timer.  If it is in the past, it means "be awake now".


> 
> >   IV/ A third Unix domain socket.
> >      On connect, server writes a single character 'A' meaning 'system is
> >      awake'.
> >      When initiating suspend, server writes 'S' meaning 'suspend soon'.
> >      Client must reply to each 'S' with 'R' meaning 'ready'.  Server does not
> >      enter suspend until the 'R' is received.
> >      On resume, server will write 'A' meaning 'awake' again.  Many clients
> >      might ignore this.
> 
> Again, still not sure about this bit, but how do you handle aborted
> suspends? If you have one blocked task that takes a really long time to
> respond, what happens if you've had multiple attempts to suspend that
> have aborted? Just want to make sure you don't end up getting a late
> ack for an old suspend attempt (although I'm not really sure if that
> matters).

The server just needs to ensure that on every connection that it sends an 'S',
it waits for an 'R', and subsequently sends an 'A'.
Whether a suspend actually happens between the R and the A, or whether it was
aborted, is irrelevant.
After a suspend, whether aborted or not, the server must send 'A' to all
clients that it sent 'S' to.  Then it must send S and wait for R before
trying to suspend again.

So a client that has been blocked for a while might see an 'A' and an 'S' but
that is all.  If it blocked for too long and the server was allowed to reject
it, it might see a closed connection.
There should be no confusion.
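
A rough sketch of the per-connection state this implies, in case it helps.
The class, the "stale" flag and the method names are my own invention for
illustration; only the 'A', 'S' and 'R' characters come from the proposal.

```python
class Connection:
    """Per-client state for the single-character A/S/R handshake."""

    def __init__(self):
        self.state = "awake"   # "awake", "asked" (S sent) or "ready" (R seen)
        self.stale = False     # True if the attempt behind a pending S was aborted
        self.sent = ["A"]      # the server greets with 'A' on connect

    def ask(self):
        """Start of a suspend attempt: send 'S' unless one is outstanding."""
        if self.state == "awake":
            self.sent.append("S")
            self.state = "asked"

    def on_message(self, msg):
        if msg != "R" or self.state != "asked":
            return
        if self.stale:
            # Late ack for an abandoned attempt: answer with 'A' and
            # require a fresh 'S' before this client counts as ready.
            self._wake()
        else:
            self.state = "ready"

    def finish(self):
        """Called after every suspend attempt, aborted or not."""
        if self.state == "ready":
            self._wake()
        elif self.state == "asked":
            self.stale = True

    def _wake(self):
        self.sent.append("A")
        self.state = "awake"
        self.stale = False


def all_ready(connections):
    # The server may only enter suspend once every client has replied 'R'.
    return all(c.state == "ready" for c in connections)
```

With this shape a slow client never has more than one 'S' outstanding, and an
'R' that arrives after an aborted attempt cannot be mistaken for an ack of a
later one.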


> 
> >    V/ Same socket as IV, with extra message from client to server.
> >      Client writes 'M' (monitor) in a message with SCM_RIGHTS containing one
> >      or more fds.  Server will now only send 'S' when one or more of those fds
> >      are readable, but the client cannot rely on that and must (as always)
> >      not assume that a read will succeed, or will not block.
> 
> Err. Not following this. If this is the notify-fd bit, I'd expect the
> client to provide the fds, and then that's it. Then the server will
> check those fds before trying to suspend, and if any have data, it will
> wait until that data is read. Why does the server send an S in this one?
> Doesn't the task also see that there is data there?

As I said in another email "wait until data has been read" is not an
operation that Linux supports directly.
The server sends the S so that it can then wait for the R.

But maybe it can wait for a separate "stay awake" request - that can be in
v0.2 of the protocol.
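
For what it is worth, the 'M' message itself is easy to sketch.  The following
assumes a Linux AF_UNIX socket and uses the standard SCM_RIGHTS ancillary
message; the function names are hypothetical and error handling is omitted.

```python
import array
import os
import select
import socket


def client_monitor(sock, fds):
    """Client side: send 'M' with the fds to monitor attached."""
    sock.sendmsg([b"M"],
                 [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", fds))])


def server_receive(sock, maxfds=8):
    """Server side: read one command byte plus any passed fds."""
    fds = array.array("i")
    msg, ancdata, _flags, _addr = sock.recvmsg(
        1, socket.CMSG_LEN(maxfds * fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing partial int.
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
    return msg, list(fds)


def any_readable(fds):
    """True if at least one monitored fd reports an event right now."""
    poller = select.poll()
    for fd in fds:
        poller.register(fd, select.POLLIN | select.POLLERR)
    return bool(poller.poll(0))   # timeout 0: do not block
```

The server would call any_readable() while preparing for suspend and only
send 'S' to this client if it returns true.  As noted above, the client still
cannot assume that a subsequent read will succeed.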


> 
> 
> > 6/ The daemon may impose access control on be-awake messages.  In the above
> >    protocol it could be based on SCM_CREDENTIAL messages which might be
> >    required.
> >    It may also impose timeout on the 'R' reply from the 'S' request, or at
> >    least log clients which do not reply promptly.
> 
> This again feels more complex than necessary, but I'll leave it be for
> now.
> 
> > 7/ A client should not delay at all in replying to 'suspend
> >    soon' (S) with 'ready' (R).  It should only check if there is anything to
> >    do and should make a stay_awake request if there is something.  Then it
> >    must reply with 'R'.
> >    It should *not* use the fact that suspend is waiting for its reply to
> >    respond to an event as this misleads other clients as to the true state of
> >    the system.
> 
> Again, while I'm not sure about the notify method, this interleaving
> seems right to me. 
> 
> > 8/ I haven't treated hibernate here.  My feeling is that it would be a
> >    different configuration for the daemon.
> >    If hibernate were possible and the soonest stay-awake time were longer
> >    than X in the future, then the daemon might configure the RTCalarm for X,
> >    and when that arrives, it pops out of suspend and goes into hibernate.
> >    But the details can wait for revision 2 of the spec..
> 
> I'm not sure if hibernate is different in my mind, other than it taking
> much longer. It just seems like it would be a subtlety of the type of
> "suspend-now" request made to the PM daemon.
> 
> 
> So while I'm excited to be making some headway on the userland approach,
> I'm also concerned about how this approach might mesh with other dynamic
> run-time power-saving methods that might be used in the future. For
> instance, if some future scheduler does some form of rate limiting, and
> avoids scheduling applications to keep the cpu in deep idle for longer,
> would this keep the kernel from knowing enough to not freeze tasks that
> might need to do something so that suspend can occur?   This in effect
> would cause one power-saving strategy to block a potentially more
> power-saving method from occurring. 

It is hard to guard against unknown future possibilities :-)

However I suspect that such a scheduler would make decisions based on policy
specified by the application.  An application that handled wakeup events
would need to request prompt scheduling, and would need to behave nicely and
only wake up when actually required.

 
> 
> This is in part what I was trying to address with my original
> SCHED_STAYAWAKE proposal, trying to find a mechanism that provides
> adequate information for the kernel to make appropriate decisions. I
> worry a little bit about having too narrow a view on these solutions. 

If suspend were just another C-state that only shuts down the CPU then I
would agree that a SCHED related approach was appropriate.  But then it would
be called a C-state and not an S-state.

When you suspend it shuts down the CPU and also some devices - at least that
is how I understand the distinction.

I think if you are shutting down an essentially arbitrary set of devices,
then you need to have user-space making the decision.
If you are only shutting down the processor and all interrupts will still
wake it up, then don't call it "suspend" aka S3 - call it C9 or something.

> 
> That of course won't keep me from trying to start work on this user-land
> approach, but it is something I think we should keep in mind. It seems
> with too many things (Dave Hansen's virtualization talk at Plumbers
> covered some examples), we end up with 4-5 small solutions to smaller
> problems that don't really work well together instead of stepping back
> and seeing the broader picture.

I emphatically agree with that last comment.  It is one of the reasons that I
advocate a user-space solution where possible.
Once something goes into the kernel it can be difficult to refine or replace
because of the no-regressions rule.  It is much better where possible to
prototype new ideas with as much control logic as possible in user-space,
where it is flexible and it is possible to re-architect it to address the
broader picture as that becomes clear.
Once you actually know what you are doing and see the big picture, then you
can make informed decisions about adding functionality to the kernel.

Thanks,
NeilBrown



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-18  2:13                 ` NeilBrown
@ 2011-10-18 17:11                   ` Alan Stern
  2011-10-18 22:55                     ` NeilBrown
  0 siblings, 1 reply; 80+ messages in thread
From: Alan Stern @ 2011-10-18 17:11 UTC (permalink / raw)
  To: NeilBrown; +Cc: John Stultz, Rafael J. Wysocki, Linux PM list, mark gross, LKML

On Tue, 18 Oct 2011, NeilBrown wrote:

> On Mon, 17 Oct 2011 16:47:04 -0700 John Stultz <john.stultz@linaro.org> wrote:
> 
> > On Tue, 2011-10-18 at 09:49 +1100, NeilBrown wrote:
> > > However for the bits that I feel I do understand, this is what I (currently)
> > > think it should (or could) look like.
> > > 
> > > 
> > > 1/ There is a suspend-management daemon that starts very early and is the only
> > >    process that is allowed to initiate suspend or hibernate.  Any other
> > >    process which tries to do this is a BUG.
> > > 
> > > 2/ The daemon has two modes:
> > >    A/ on-demand.  In this mode it will only enter suspend when requested to,
> > >       and then only if there is nothing else blocking the suspend.
> > >    B/ immediate.  In this mode it will enter suspend whenever nothing is
> > >       blocking the suspend.  The daemon is free to add a small delay
> > >       proportional to the resume latency if so configured.
> > >    The daemon is in on-demand mode at start up.

A minor point...  This distinction may not truly be necessary.  
On-demand mode is pretty much the same as immediate mode with an
implicit client that is almost never ready to suspend.

That business about "only if nothing else is blocking the suspend" in 
on-demand mode is troubling.  What happens if something else _is_ 
blocking the suspend?  Will the GNOME power manager go into a tight 
loop, asking over and over for suspends that all fail?

> > > 3/ The daemon can handle 5 sorts of interactions with clients.
> > > 
> > >    i/ Change mode - a request to switch between on-demand and immediate mode.

May or may not be needed, depending on what we decide about these 
modes.

> > >   ii/ suspend now - a request to suspend which is only honoured if no client
> > >       has blocked suspend, and if the kernel is not blocking suspend.
> > >       Thus it is meaningless in immediate mode.
> > >  iii/ be-awake-after - this request carries a timestamp and is stateful - it
> > >       must be explicitly cancelled.  It requests that the system be fully
> > >       active from that time onwards.
> > 
> > This initially wasn't super clear to me why this is necessary. I see
> > below it is trying to handle the non-fd timer method to keeping the
> > system awake.
> > 
> > Although does this also duplex as the  suspend-inhibit/suspend-allow
> > call made by applications? Or was that interaction just skipped here?
> 
> Yes, exactly.  This is primarily allowing an application to say "inhibit
> suspend" (aka "be awake").  Being able to make the request for a future time
> seemed a natural and simple extension.
> If you can do timer wakeups like other wakeups and find it easier that way,
> then we can leave the timestamp out of it.

There's another way to implement "inhibit suspend" -- via the notify 
mechanism.  If the client doesn't respond to a callback, the server 
won't suspend.  Hence if people use the fd-timer approach, 
be-awake-after isn't needed.

On the other hand, the notify-fd mechanism _does_ need a "stay awake"
call (it could be something as simple as a 'W' message in the
protocol).  Without it, you run the risk that the client might read the
fd data before the server sees it.  The server would think the client
was idle while it was busily processing the data.
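
The ordering that avoids this race can be sketched as follows.  This is only
an illustration: the function name is made up, and reusing 'R' to mean "idle
again" after the work is done is an assumption, not something the protocol
above specifies.

```python
import os
import socket


def handle_wakeup_event(daemon_sock, event_fd, process):
    # Announce first: once 'W' is queued the daemon can learn we are busy,
    # even if it never saw event_fd become readable.
    daemon_sock.sendall(b"W")
    data = os.read(event_fd, 4096)   # only now drain the event
    process(data)
    # Assumption: 'R' is reused here to say "idle again, suspend is fine".
    daemon_sock.sendall(b"R")
```

The important part is only that the 'W' is written before the read, so there
is no window in which the fd is empty and the server believes the client is
idle.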

> > >   iv/ notify - this establishes a 'session' between client and server.
> > >       Server will call back and await a response before entering suspend and
> > >       again after resuming (no response needed for resume).
> > >       The client is explicitly permitted to make a be-awake-after request
> > >       during the suspend call-back.
> > 
> > With the notify-fd example included below, I'm curious what specific use
> > cases you see as requiring the notify interaction? 
> 
> None specifically.  However while I'm convinced that all events must be
> visible to user-space I am not convinced that they will be visible to a
> poll.  You might occasionally require a read on a sysfs file, and then parse
> the contents to see if the event happened.
> We can do poll on sysfs files now so that can probably be avoided.
> But I didn't want to close doors before I was sure no-one needed them.

Agreed; a non-poll arrangement should not be ruled out.

> And I think that with notify-fd you still need a hand-shake of some sort, and
> this provides a simple starting point.
> 
> > 
> > >    v/ notify-fd.  This is a special form of 'notify' which carries a file
> > >       descriptor.  The server is not required to (and not expected to)
> > >       initiate the 'suspend' callback unless the fd is reporting POLL_IN or
> > >       POLL_ERR while preparing for suspend.
> > 
> > I'd think it would be "the server is not allowed to" instead of "not
> > required to".

That doesn't make sense.  The fd state could change between the time 
the server checks it and the time the suspend callback is sent.

> Maybe.  When specifying a protocol I am cautious of excluding things that are
> merely inconvenient.  So "should not" but not "shall not" in rfc-speak.
> However it might be easier on the client if it knew there would never be a
> call-back so it might be best to make it "shall not".

I'm not convinced that notify-fd is a good idea.  Compare the messages 
needed for notify vs. notify-fd:

	notify: The server queries clients and needs to receive a 
		response before each suspend.

	notify-fd: The server queries clients only when it knows they
		are likely to be busy, and the clients must notify the
		server every time they get a wakeup event.

It's not immediately obvious which involves more back-and-forth
messaging.  But then consider when those messages occur:

	With notify, clients send and receive messages only when they 
	are idle.

	With notify-fd, clients have to send a message before starting
	to process each wakeup event.

Sending more messages when you are idle seems better than sending fewer
when you have work to do.

> > > 4/ The daemon manages the RTC alarm.  Any other process programming the alarm
> > >    is a BUG.  Before entering suspend it will program the RTC to wake the
> > >    system at (or slightly before) the time of the earliest active
> > >    be-awake-after request.
> > 
> > So, this may need to be revised. My RTC virtualization and alarmtimer
> > rework gives us a lot more flexibility with RTC events. Given the array
> > of existing applications that use the RTC chardev, I think it's not
> > realistic to consider it a bug if someone else is using it. 
> 
> If multiple applications think they can independently "own" the RTC alarm
> then that sounds like it is already a bug quite apart from anything I add.
> 
> We must have some way to virtualise the rtc-alarm so that any app can be sure
> there will be a wakeup at-or-before some time.  I suggested doing that via
> the suspend daemon.  If there is a strong case for a more general
> kernel-based virtualisation of the RTC alarm - then maybe that
> is OK.
> 
> > 
> > That said, the posix alarmtimer interface allows us to trigger wakeup
> > events in the future, without disrupting the legacy chardev programming
> > (this is possible because the kernel now virtualizes the chardev).
> > 
> > I'd probably rather add alarmtimer functionality to the timerfd, in
> > order to allow the notify-fd method to work with timers. But it's not a
> > huge deal. I'd just like to avoid reimplementing a timer dispatch system
> > in userland.
> 
> Yep.  Exactly which solution gets implemented isn't important as long as it
> is clean and well defined.

Agreed.

> > > 5/ Possible implementation approaches for the client interactions:
> > >    I/ A SOCK_STREAM unix domain socket which takes commands.
> > >      On connect, server says "+READY".
> > >      Client writes "MODE ON-DEMAND" or "MODE IMMEDIATE"
> > >      Server replies "+MODE $MODE"
> > > 
> > >   II/ The same unix domain socket as I. 
> > >      Client writes "SUSPEND"
> > >      Server replies "+RESUMED" if the suspend happened, or
> > >                     "-BUSY"  if it didn't.
> > >      +RESUMED is no guarantee that any measurable time was spent in suspend, so
> > >      maybe it isn't needed.

I like the single-letter messages better than complete words.  Not a 
big deal either way...

> > >  III/ A separate Unix domain socket.
> > >      On connect, server says "Awake" meaning that this connection is ensuring
> > >      the system will be awake now.
> > >      Client can write a seconds-since-epoch number, which the server will echo
> > >      back when confirmed.  When that time arrives - which might be immediately
> > >      - the server will write "Awake" again.
> > >      When the client closes the connection, the suspend-block is removed.
> > 
> > What is the seconds-since-epoch bit for? 
> 
> That is the time when the server will ensure the system is awake from.  i.e.
> the wakeup timer.  If it is in the past, it means "be awake now".
> 
> 
> > 
> > >   IV/ A third Unix domain socket.
> > >      On connect, server writes a single character 'A' meaning 'system is
> > >      awake'.
> > >      When initiating suspend, server writes 'S' meaning 'suspend soon'.
> > >      Client must reply to each 'S' with 'R' meaning 'ready'.  Server does not
> > >      enter suspend until the 'R' is received.
> > >      On resume, server will write 'A' meaning 'awake' again.  Many clients
> > >      might ignore this.
> > 
> > Again, still not sure about this bit, but how do you handle aborted
> > suspends? If you have one blocked task that takes a really long time to
> > respond, what happens if you've had multiple attempts to suspend that
> > have aborted? Just want to make sure you don't end up getting a late
> > ack for an old suspend attempt (although I'm not really sure if that
> > matters).
> 
> The server just needs to ensure that on every connection that it sends an 'S',
> it waits for an 'R', and subsequently sends an 'A'.

It shouldn't send the 'A' unless the client asked it to.

> Whether a suspend actually happens between the R and the A, or whether it was
> aborted, is irrelevant.
> After a suspend, whether aborted or not, the server must send 'A' to all
> clients that it sent 'S' to.

No -- only to clients that responded with 'R' and that asked for the
'A'.  If 'S' was sent to a client, the server must not send anything
more to that client until an 'R' is received.

>  Then it must send S and wait for R before
> trying to suspend again.
> 
> So a client that has been blocked for a while might see an 'A' and an 'S' but
> that is all.  If it blocked for too long and the server was allowed to reject
> it, it might see a closed connection.
> There should be no confusion.

Is there any reason for the server ever to close a connection, other
than perhaps insufficient access rights?

> > >    V/ Same socket as IV, with extra message from client to server.
> > >      Client writes 'M' (monitor) in a message with SCM_RIGHTS containing one
> > >      or more fds.  Server will now only send 'S' when one or more of those fds
> > >      are readable, but the client cannot rely on that and must (as always)
> > >      not assume that a read will succeed, or will not block.
> > 
> > Err. Not following this. If this is the notify-fd bit, I'd expect the
> > client to provide the fds, and then that's it. Then the server will
> > check those fds before trying to suspend, and if any have data, it will
> > wait until that data is read. Why does the server send an S in this one?
> > Doesn't the task also see that there is data there?
> 
> As I said in another email "wait until data has been read" is not an
> operation that Linux supports directly.
> The server sends the S so that it can then wait for the R.

Right.  Besides, "wait until data has been read" is the wrong thing to 
do.  The client needs time to process the data after reading it.

> But maybe it can wait for a separate "stay awake" request - that can be in
> v0.2 of the protocol.

The client has to send a "stay awake" request to avoid races.  It 
should be sufficient for the server to wait until it gets either that 
or the 'R'.

> > > 6/ The daemon may impose access control on be-awake messages.  In the above
> > >    protocol it could be based on SCM_CREDENTIAL messages which might be
> > >    required.
> > >    It may also impose timeout on the 'R' reply from the 'S' request, or at
> > >    least log clients which do not reply promptly.
> > 
> > This again feels more complex than necessary, but I'll leave it be for
> > now.

We would be better off requiring proper access control at the start of
each connection.  Random processes should not be able to prevent the 
system from suspending.
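
A check at connect time could be as small as the following sketch.  It is
Linux-specific (SO_PEERCRED on an AF_UNIX socket), and the function names and
the allowed-uid policy are hypothetical.

```python
import os
import socket
import struct


def peer_credentials(sock):
    """Return (pid, uid, gid) of the process at the other end of sock."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize("3i"))
    return struct.unpack("3i", raw)


def may_block_suspend(sock, allowed_uids):
    # Hypothetical policy: only listed uids may register as suspend blockers.
    _pid, uid, _gid = peer_credentials(sock)
    return uid in allowed_uids
```

The server would call may_block_suspend() once, right after accept(), and
close the connection on failure, so no per-message credentials are needed.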

> > > 7/ A client should not delay at all in replying to 'suspend
> > >    soon' (S) with 'ready' (R).  It should only check if there is anything to
> > >    do and should make a stay_awake request if there is something.  Then it
> > >    must reply with 'R'.
> > >    It should *not* use the fact that suspend is waiting for its reply to
> > >    respond to an event as this misleads other clients as to the true state of
> > >    the system.
> > 
> > Again, while I'm not sure about the notify method, this interleaving
> > seems right to me. 
> > 
> > > 8/ I haven't treated hibernate here.  My feeling is that it would be a
> > >    different configuration for the daemon.
> > >    If hibernate were possible and the soonest stay-awake time were longer
> > >    than X in the future, then the daemon might configure the RTCalarm for X,
> > >    and when that arrives, it pops out of suspend and goes into hibernate.
> > >    But the details can wait for revision 2 of the spec..
> > 
> > I'm not sure if hibernate is different in my mind, other than it taking
> > much longer. It just seems like it would be a subtlety of the type of
> > "suspend-now" request made to the PM daemon.

That's my feeling too.

Alan Stern


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-17 21:27             ` Rafael J. Wysocki
@ 2011-10-18 17:30               ` Alan Stern
  0 siblings, 0 replies; 80+ messages in thread
From: Alan Stern @ 2011-10-18 17:30 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: NeilBrown, Linux PM list, mark gross, LKML, John Stultz

On Mon, 17 Oct 2011, Rafael J. Wysocki wrote:

> > This requirement remains somewhat tricky.  Can we guarantee it?  It 
> > comes down to two things.  When an event occurs that will cause a 
> > program to want to keep the system awake:
> > 
> >      A. The event must be capable of interrupting a poll system
> > 	call.  I don't think it matters whether this interruption
> > 	takes the form of a signal or of completing the system call.
> > 
> >      B. The program must be able to detect, in a non-blocking way, 
> > 	whether the event has occurred.
> > 
> > Of course, any event that adds data to an input queue will be okay.  
> > But I don't know what other sorts of things we will have to handle.
> 
> Well, wakealarms don't do that, for one example.  Similarly for WoL through
> a magic packet AFAICS.  Similarly for "a cable has been plugged in"
> type of events.

I think we already know how to handle alarms.  For WoL we'd need 
something else -- a process would have to be notified about each resume 
and it would have to prevent further suspends until it knew that no 
more work needed to be done.

I don't know about the "cable has been plugged in" thing.  Is that 
generally regarded as a wakeup event?  I suspect it usually isn't.

> Well, it's not a bad idea in principle and I think it will work, so long
> as we can ensure that the PM daemon will be the only process using
> suspend/hibernate interfaces.
> 
> Apart from this, steps 1.-3. represent a loop with quite a bit of socket
> traffic if wakeup events occur relatively often (think someone typing on
> a keyboard being a wakeup device or moving a mouse being a wakeup device).

The socket traffic is troubling in that it takes time which could be 
spent at a lower power level.  On the other hand, this traffic occurs 
only when the processes involved are otherwise idle.  The only 
alternative seems to involve informing the PM daemon (or the kernel) 
at every wakeup event.

> > That's another thing we need to think about more carefully.  How 
> > extravagant do we want to make the wakeup/hibernation interaction?  My 
> > own feeling is: as little as possible (whatever that amounts to).
> 
> I don't agree with that.  In my opinion all system sleep interfaces should
> be handled.

I didn't mean it shouldn't be handled.  Just that hibernation should be
treated, to the extent we can, simply as a rather slow suspend -- not
specially.

> > Converting the programs that currently use Android's userspace
> > wakelocks might be somewhat more difficult.  Simply releasing a
> > wakelock would no longer be sufficient; a program would need to respond
> > to polls from the PM daemon whenever it was willing to let the system
> > go to sleep.
> 
> I honestly don't think it will be very practical to expect all of the
> existing Android applications to be reworked this way ...

Android would probably use a different PM daemon design -- one that
could directly interact with their existing wakelock stuff.  Then all
they would need to do is change the implementation of userspace
wakelocks to communicate with this daemon instead of with the kernel.

Assuming they want to change their current design at all...

Alan Stern

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-18 17:11                   ` Alan Stern
@ 2011-10-18 22:55                     ` NeilBrown
  2011-10-19 16:19                       ` Alan Stern
  0 siblings, 1 reply; 80+ messages in thread
From: NeilBrown @ 2011-10-18 22:55 UTC (permalink / raw)
  To: Alan Stern
  Cc: John Stultz, Rafael J. Wysocki, Linux PM list, mark gross, LKML

On Tue, 18 Oct 2011 13:11:05 -0400 (EDT) Alan Stern
<stern@rowland.harvard.edu> wrote:

> On Tue, 18 Oct 2011, NeilBrown wrote:
> 
> > On Mon, 17 Oct 2011 16:47:04 -0700 John Stultz <john.stultz@linaro.org> wrote:
> > 
> > > On Tue, 2011-10-18 at 09:49 +1100, NeilBrown wrote:
> > > > However for the bits that I feel I do understand, this is what I (currently)
> > > > think it should (or could) look like.
> > > > 
> > > > 
> > > > 1/ There is a suspend-management daemon that starts very early and is the only
> > > >    process that is allowed to initiate suspend or hibernate.  Any other
> > > >    process which tries to do this is a BUG.
> > > > 
> > > > 2/ The daemon has two modes:
> > > >    A/ on-demand.  In this mode it will only enter suspend when requested to,
> > > >       and then only if there is nothing else blocking the suspend.
> > > >    B/ immediate.  In this mode it will enter suspend whenever nothing is
> > > >       blocking the suspend.  The daemon is free to add a small delay
> > > >       proportional to the resume latency if so configured.
> > > >    The daemon is in on-demand mode at start up.
> 
> A minor point...  This distinction may not truly be necessary.  
> On-demand mode is pretty much the same as immediate mode with an
> implicit client that is almost never ready to suspend.

"pretty much" but not "exactly".   The implicit client would need to be ready
for suspend when a "please suspend now" request arrived.
So maybe the "please suspend now" would get sent to the implicit client.
And the mode-change would tell the implicit client to stop blocking suspend.
Which makes the implicit client indistinguishable from an internal mode.

The on-demand mode is both for "legacy" situations and for start-up.
The daemon would be in on-demand mode and only enter immediate mode when some
program detected that "boot" had completed and any process that might want to
block suspend was doing so.  Then it would allow auto-suspend to start.
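
Seen as code, the mode really is just one more input to the "may we suspend?"
decision.  A minimal sketch, with hypothetical names throughout:

```python
class SuspendPolicy:
    """Hypothetical decision logic for the two daemon modes."""

    def __init__(self):
        self.mode = "on-demand"   # start-up default
        self.blockers = set()     # clients currently asking to stay awake

    def set_mode(self, mode):
        if mode not in ("on-demand", "immediate"):
            raise ValueError(mode)
        self.mode = mode

    def should_suspend(self, requested=False):
        # Nothing may be blocking suspend, in either mode.
        if self.blockers:
            return False
        # on-demand: only on an explicit "suspend now" request.
        # immediate: whenever nothing is blocking.
        return self.mode == "immediate" or requested
```

The "implicit client" view would put a permanent entry in blockers that is
lifted for each suspend request; the flag above has the same effect with less
machinery.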

> 
> That business about "only if nothing else is blocking the suspend" in 
> on-demand mode is troubling.  What happens if something else _is_ 
> blocking the suspend?  Will the GNOME power manager go into a tight 
> loop, asking over and over for suspends that all fail?

I doubt it.  You can quite easily create situations where suspend will fail.
I think having an NFS mount can do it - certainly an NFS mount to a dead
server. (I think this might be fixed in the near future).
If g-p-m spun on that, someone would have noticed.  I suspect that it resets
the idle timer when it attempts suspend - whether it succeeds or not.

> 
> > > > 3/ The daemon can handle 5 sorts of interactions with clients.
> > > > 
> > > >    i/ Change mode - a request to switch between on-demand and immediate mode.
> 
> May or may not be needed, depending on what we decide about these 
> modes.
> 
> > > >   ii/ suspend now - a request to suspend which is only honoured if no client
> > > >       has blocked suspend, and if the kernel is not blocking suspend.
> > > >       Thus it is meaningless in immediate mode.
> > > >  iii/ be-awake-after - this request carries a timestamp and is stateful - it
> > > >       must be explicitly cancelled.  It requests that the system be fully
> > > >       active from that time onwards.
> > > 
> > > This initially wasn't super clear to me why this is necessary. I see
> > > below it is trying to handle the non-fd timer method to keeping the
> > > system awake.
> > > 
> > > Although does this also duplex as the  suspend-inhibit/suspend-allow
> > > call made by applications? Or was that interaction just skipped here?
> > 
> > Yes, exactly.  This is primarily allowing an application to say "inhibit
> > suspend" (aka "be awake").  Being able to make the request for a future time
> > seemed a natural and simple extension.
> > If you can do timer wakeups like other wakeups and find it easier that way,
> > then we can leave the timestamp out of it.
> 
> There's another way to implement "inhibit suspend" -- via the notify 
> mechanism.  If the client doesn't respond to a callback, the server 
> won't suspend.  Hence if people use the fd-timer approach, 
> be-awake-after isn't needed.

I don't like that approach though.  It leaves the daemon thinking "we are on
the way towards suspend" and that is what it tells other clients.  But really
we are in state "someone doesn't want us to suspend now".
So some clients are assuming suspend is imminent and are waiting expectantly
for it to be over, but in reality the system is staying awake and only one
client knows about it.

> 
> On the other hand, the notify-fd mechanism _does_ need a "stay awake"
> call (it could be something as simple as a 'W' message in the
> protocol).  Without it, you run the risk that the client might read the
> fd data before the server sees it.  The server would think the client
> was idle while it was busily processing the data.

Yep.

> 
> > > >   iv/ notify - this establishes a 'session' between client and server.
> > > >       Server will call back and await a response before entering suspend and
> > > >       again after resuming (no response needed for resume).
> > > >       The client is explicitly permitted to make a be-awake-after request
> > > >       during the suspend call-back.
> > > 
> > > With the notify-fd example included below, I'm curious what specific use
> > > cases you see as requiring the notify interaction? 
> > 
> > None specifically.  However while I'm convinced that all events must be
> > visible to user-space I am not convinced that they will be visible to a
> > poll.  You might occasionally require a read on a sysfs file, and then parse
> > the contents to see if the event happened.
> > We can do poll on sysfs files now so that can probably be avoided.
> > But I didn't want to close doors before I was sure no-one needed them.
> 
> Agreed; a non-poll arrangement should not be ruled out.
> 
> > And I think that with notify-fd you still need a hand-shake of some sort, and
> > this provides a simple starting point.
> > 
> > > 
> > > >    v/ notify-fd.  This is a special form of 'notify' which carries a file
> > > >       descriptor.  The server is not required to (and not expected to)
> > > >       initiate the 'suspend' callback unless the fd is reporting POLL_IN or
> > > >       POLL_ERR while preparing for suspend.
> > > 
> > > I'd think it would be "the server is not allowed to" instead of "not
> > > required to".
> 
> That doesn't make sense.  The fd state could change between the time 
> the server checks it and the time the suspend callback is sent.
> 
> > Maybe.  When specifying a protocol I am cautious of excluding things that are
> > merely inconvenient.  So "should not" but not "shall not" in rfc-speak.
> > However it might be easier on the client if it knew there would never be a
> > call-back so it might be best to make it "shall not".
> 
> I'm not convinced that notify-fd is a good idea.  Compare the messages 
> needed for notify vs. notify-fd:
> 
> 	notify: The server queries clients and needs to receive a 
> 		response before each suspend.
> 
> 	notify-fd: The server queries clients only when it knows they
> 		are likely to be busy, and the clients must notify the
> 		server every time they get a wakeup event.
> 
> It's not immediately obvious which involves more back-and-forth
> messaging.  But then consider when those messages occur:
> 
> 	With notify, clients send and receive messages only when they 
> 	are idle.
> 
> 	With notify-fd, clients have to send a message before starting
> 	to process each wakeup event.
> 
> Sending more messages when you are idle seems better than sending fewer
> when you have work to do.

I hadn't looked at it like that, but I think you have a valid point.
Of course prototyping and measuring is what we really need. :-)

> 
> > > > 4/ The daemon manages the RTC alarm.  Any other process programming the alarm
> > > >    is a BUG.  Before entering suspend it will program the RTC to wake the
> > > >    system at (or slightly before) the time of the earliest active
> > > >    be-awake-after request.
> > > 
> > > So, this may need to be revised. My RTC virtualization and alarmtimer
> > > rework gives us a lot more flexibility with RTC events. Given the array
> > > of existing applications that use the RTC chardev, I think it's not
> > > realistic to consider it a bug if someone else is using it. 
> > 
> > If multiple applications think they can independently "own" the RTC alarm
> > then that sounds like it is already a bug quite apart from anything I add.
> > 
> > We must have some way to virtualise the rtc-alarm so that any app can be sure
> > there will be a wakeup at-or-before some time.  I suggested doing that via
> > the suspend daemon.  If there is a strong case for a more general
> > kernel-based virtualisation of the RTC alarm - then maybe that
> > is OK.
> > 
> > > 
> > > That said, the posix alarmtimer interface allows us to trigger wakeup
> > > events in the future, without disrupting the legacy chardev programming
> > > (this is possible because the kernel now virtualizes the chardev).
> > > 
> > > I'd probably rather add alarmtimer functionality to the timerfd, in
> > > order to allow the notify-fd method to work with timers. But it's not a
> > > huge deal. I'd just like to avoid reimplementing a timer dispatch system
> > > in userland.
> > 
> > Yep.  Exactly which solution gets implemented isn't important as long as it
> > is clean and well defined.
> 
> Agreed.
> 
> > > > 5/ Possible implementation approaches for the client interactions:
> > > >    I/ A SOCK_STREAM unix domain socket which takes commands.
> > > >      On connect, server says "+READY".
> > > >      Client writes "MODE ON-DEMAND" or "MODE IMMEDIATE"
> > > >      Server replies "+MODE $MODE"
> > > > 
> > > >   II/ The same unix domain socket as I. 
> > > >      Client writes "SUSPEND"
> > > >      Server replies "+RESUMED" if the suspend happened, or
> > > >                     "-BUSY"  if it didn't.
> > > >      +RESUMED is no guarantee that any measurable time was spent in suspend, so
> > > >      maybe it isn't needed.
> 
> I like the single-letter messages better than complete words.  Not a 
> big deal either way...
> 
> > > >  III/ A separate Unix domain socket.
> > > >      On connect, server says "Awake" meaning that this connection is ensuring
> > > >      the system will be awake now.
> > > >      Client can write a seconds-since-epoch number, which the server will echo
> > > >      back when confirmed.  When that time arrives - which might be immediately
> > > >      - the server will write "Awake" again.
> > > >      When the client closes the connection, the suspend-block is removed.
> > > 
> > > What is the seconds-since-epoch bit for? 
> > 
> > That is the time when the server will ensure the system is awake from.  i.e.
> > the wakeup timer.  If it is in the past, it means "be awake now".
> > 
> > 
> > > 
> > > >   IV/ A third Unix domain socket.
> > > >      On connect, server writes a single character 'A' meaning 'system is
> > > >      awake'.
> > > >      When initiating suspend, server writes 'S' meaning 'suspend soon'.
> > > >      Client must reply to each 'S' with 'R' meaning 'ready'.  Server does not
> > > >      enter suspend until the 'R' is received.
> > > >      On resume, server will write 'A' meaning 'awake' again.  Many clients
> > > >      might ignore this.
> > > 
> > > Again, still not sure about this bit, but how do you handle aborted
> > > suspends? If you have one blocked task that takes a really long time to
> > > respond, what happens if you've had multiple attempts to suspend that
> > > have aborted? Just want to make sure you don't end up getting a late
> > > ack for an old suspend attempt (although I'm not really sure if that
> > > matters).
> > 
> > The server just needs to ensure that on every connection that it sends an 'S',
> > it waits for an 'R', and subsequently sends an 'A'.
> 
> It shouldn't send the 'A' unless the client asked it to.
> 

That sounds like a valid optimisation.


> > Whether a suspend actually happens between the R and the A, or whether it was
> > aborted, is irrelevant.
> > After a suspend, whether aborted or not, the server must send 'A' to all
> > clients that it sent 'S' to.
> 
> No -- only to clients that responded with 'R' and that asked for the
> 'A'.  If 'S' was sent to a client, the server must not send anything
> more to that client until an 'R' is received.
> 
> >  Then it must send S and wait for R before
> > trying to suspend again.
> > 
> > So a client that has been blocked for a while might see an 'A' and an 'S' but
> > that is all.  If it blocked for too long and the server was allowed to reject
> > it, it might see a closed connection.
> > There should be no confusion.
> 
> Is there any reason for the server ever to close a connection, other
> than perhaps insufficient access rights?

I was wondering if the server might want to impose a (largeish) maximum delay
that it will wait for an 'R'.  If it didn't get one in time it would respond
by closing the connection.
That probably isn't really necessary - hard to know without experience.


> 
> > > >    V/ Same socket as IV, with extra message from client to server.
> > > >      Client writes 'M' (monitor) in a message with SCM_RIGHTS containing one
> > > >      or more fds.  Server will now only send 'S' when one or more of those fds
> > > >      are readable, but the client cannot rely on that and must (as always)
> > > >      not assume that a read will succeed, or will not block.
> > > 
> > > Err. Not following this. If this is the notify-fd bit, I'd expect the
> > > client to provide the fds, and then that's it. Then the server will
> > > check those fds before trying to suspend, and if any have data, it will
> > > wait until that data is read. Why does the server send an S in this one?
> > > Doesn't the task also see that there is data there?
> > 
> > As I said in another email "wait until data has been read" is not an
> > operation that Linux supports directly.
> > The server sends the S so that it can then wait for the R.
> 
> Right.  Besides, "wait until data has been read" is the wrong thing to 
> do.  The client needs time to process the data after reading it.
> 
> > But maybe it can wait for a separate "stay awake" request - that can be in
> > v0.2 of the protocol.
> 
> The client has to send a "stay awake" request to avoid races.  It 
> should be sufficient for the server to wait until it gets either that 
> or the 'R'.
> 
> > > > 6/ The daemon may impose access control on be-awake messages.  In the above
> > > >    protocol it could be based on SCM_CREDENTIALS messages which might be
> > > >    required.
> > > >    It may also impose timeout on the 'R' reply from the 'S' request, or at
> > > >    least log clients which do not reply promptly.
> > > 
> > > This again feels more complex than necessary, but I'll leave it be for
> > > now.
> 
> We would be better off requiring proper access control at the start of
> each connection.  Random processes should not be able to prevent the 
> system from suspending.
> 
> > > > 7/ A client should not delay at all in replying to 'suspend
> > > >    soon' (S) with 'ready' (R).  It should only check if there is anything to
> > > >    do and should make a stay_awake request if there is something.  Then it
> > > >    must reply with 'R'.
> > > >    It should *not* use the fact that suspend is waiting for its reply to
> > > >    respond to an event as this misleads other clients as to the true state of
> > > >    the system.
> > > 
> > > Again, while I'm not sure about the notify method, this interleaving
> > > seems right to me. 
> > > 
> > > > 8/ I haven't treated hibernate here.  My feeling is that it would be a
> > > >    different configuration for the daemon.
> > > >    If hibernate were possible and the soonest stay-awake time were longer
> > > >    than X in the future, then the daemon might configure the RTCalarm for X,
> > > >    and when that arrives, it pops out of suspend and goes into hibernate.
> > > >    But the details can wait for revision 2 of the spec..
> > > 
> > > I'm not sure if hibernate is different in my mind, other than it taking
> > > much longer. It just seems like it would be a subtlety of the type of
> > > "suspend-now" request made to the PM daemon.
> 
> That's my feeling too.
> 
> Alan Stern

Thanks,
NeilBrown



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-18 22:55                     ` NeilBrown
@ 2011-10-19 16:19                       ` Alan Stern
  2011-10-20  0:17                         ` NeilBrown
  0 siblings, 1 reply; 80+ messages in thread
From: Alan Stern @ 2011-10-19 16:19 UTC (permalink / raw)
  To: NeilBrown; +Cc: John Stultz, Rafael J. Wysocki, Linux PM list, mark gross, LKML

On Wed, 19 Oct 2011, NeilBrown wrote:

> > There's another way to implement "inhibit suspend" -- via the notify 
> > mechanism.  If the client doesn't respond to a callback, the server 
> > won't suspend.  Hence if people use the fd-timer approach, 
> > be-awake-after isn't needed.
> 
> I don't like that approach though.  It leaves the daemon thinking "we are on
> the way towards suspend" and that is what it tells other clients.  But really
> we are in state "someone doesn't want us to suspend now".
> So some clients are assuming suspend is imminent and are waiting expectantly
> for it to be over, but in reality the system is staying awake and only one
> client knows about it.

Clients should not make assumptions of that sort.  They have no need to
know exactly what the daemon is doing, and there's no reason for the
daemon to tell them.

All a client needs to know is whether or not _it_ is busy, so that it
can provide correct information to the daemon.  (Some clients may also
need to be notified each time the system resumes -- that's a separate
matter.)  As for the rest, a client may as well assume that the system
is perpetually on the verge of suspending, except when it has
personally told the daemon to stay awake.

On the daemon's side, I don't think there is a significant difference
between "we are on the way toward suspend and waiting for this client's
response" and "this client doesn't want us to suspend now".  Either
way, the daemon can't make any forward progress until it hears from
the client -- and the client might send an update at any time.

Alan Stern

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-19 16:19                       ` Alan Stern
@ 2011-10-20  0:17                         ` NeilBrown
  2011-10-20 14:29                           ` Alan Stern
  0 siblings, 1 reply; 80+ messages in thread
From: NeilBrown @ 2011-10-20  0:17 UTC (permalink / raw)
  To: Alan Stern
  Cc: John Stultz, Rafael J. Wysocki, Linux PM list, mark gross, LKML


On Wed, 19 Oct 2011 12:19:21 -0400 (EDT) Alan Stern
<stern@rowland.harvard.edu> wrote:

> On Wed, 19 Oct 2011, NeilBrown wrote:
> 
> > > There's another way to implement "inhibit suspend" -- via the notify 
> > > mechanism.  If the client doesn't respond to a callback, the server 
> > > won't suspend.  Hence if people use the fd-timer approach, 
> > > be-awake-after isn't needed.
> > 
> > I don't like that approach though.  It leaves the daemon thinking "we are on
> > the way towards suspend" and that is what it tells other clients.  But really
> > we are in state "someone doesn't want us to suspend now".
> > So some clients are assuming suspend is imminent and are waiting expectantly
> > for it to be over, but in reality the system is staying awake and only one
> > client knows about it.
> 
> Clients should not make assumptions of that sort.  They have no need to
> know exactly what the daemon is doing, and there's no reason for the
> daemon to tell them.
> 
> All a client needs to know is whether or not _it_ is busy, so that it
> can provide correct information to the daemon.  (Some clients may also
> need to be notified each time the system resumes -- that's a separate
> matter.)  As for the rest, a client may as well assume that the system
> is perpetually on the verge of suspending, except when it has
> personally told the daemon to stay awake.

I don't think it is always appropriate to assume the system is on the verge
of suspending except when explicitly asking for stay-awake.

Consider a daemon tasked with managing the "GSM" interface in a phone.
Any message from the GSM module will wake the phone if it is suspended.

When suspended, the daemon only wants to get "incoming call" and "incoming
SMS" events.
When not suspended, the daemon also wants "Active cell changed" events so
that it can make this information available to some display widget.

So when it is told that a suspend is imminent it quickly tells the GSM
module to be quieter and then says "OK".  If it had to assume it was always
on the verge, it could never allow active-cell-changed events.

You could argue that the GSM daemon should only be reporting CELL changes -
and so the GSM module should only be asked to report them - when the widget
(or some other client) is explicitly asking for them.  So when the screen
goes blank, the widget stops getting expose events, so it rescinds its request
for updates and the GSM daemon passes that on to the GSM module.  So when
suspend happens, the GSM module has already stopped reporting.

But I'm not convinced that complexity is always justified.

I could make the situation a little more complex.  There might be a daemon
which wants to monitor GSM cell locations and so is always asking.
The GSM daemon might have a policy that if anyone wants those updates, then
 - if system is awake for some other reason, report them as they arrive
 - if system is otherwise suspended, wake up every 10 minutes to poll and
   report.

In that case the suspend-client (the GSM daemon) really does care about the
difference between an explicit stay-awake and a late reply to a
suspend-imminent message.

So I'm still inclined to think that the two cases need to be treated
separately.

Thanks,
NeilBrown

> 
> On the daemon's side, I don't think there is a significant difference
> between "we are on the way toward suspend and waiting for this client's
> response" and "this client doesn't want us to suspend now".  Either
> way, the daemon can't make any forward progress until it hears from
> the client -- and the client might send an update at any time.
> 
> Alan Stern


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-20  0:17                         ` NeilBrown
@ 2011-10-20 14:29                           ` Alan Stern
  2011-10-21  5:05                             ` NeilBrown
  2011-10-21  5:23                             ` lsusd - The Linux SUSpend Daemon NeilBrown
  0 siblings, 2 replies; 80+ messages in thread
From: Alan Stern @ 2011-10-20 14:29 UTC (permalink / raw)
  To: NeilBrown; +Cc: John Stultz, Rafael J. Wysocki, Linux PM list, mark gross, LKML

On Thu, 20 Oct 2011, NeilBrown wrote:

> > All a client needs to know is whether or not _it_ is busy, so that it
> > can provide correct information to the daemon.  (Some clients may also
> > need to be notified each time the system resumes -- that's a separate
> > matter.)  As for the rest, a client may as well assume that the system
> > is perpetually on the verge of suspending, except when it has
> > personally told the daemon to stay awake.
> 
> I don't think it is always appropriate to assume the system is on the verge
> of suspending except when explicitly asking for stay-awake.

For some programs it may not be appropriate, but for wakeup clients I 
believe it is.

> Consider a daemon tasked with managing the "GSM" interface in a phone.
> Any message from the GSM module will wake the phone if it is suspended.
> 
> When suspended, the daemon only wants to get "incoming call" and "incoming
> SMS" events.
> When not suspended, the daemon also wants "Active cell changed" events so
> that it can make this information available to some display widget.
> 
> So when it is told that a suspend is imminent it quickly tells the GSM
> module to be quieter and then says "OK".  If it had to assume it was always
> on the verge, it could never allow active-cell-changed events.
> 
> You could argue that the GSM daemon should only be reporting CELL changes -
> and so the GSM module should only be asked to report them - when the widget
> (or some other client) is explicitly asking for them.  So when the screen
> goes blank, the widget stops getting expose events, so it rescinds its request
> for updates and the GSM daemon passes that on to the GSM module.  So when
> suspend happens, the GSM module has already stopped reporting.
> 
> But I'm not convinced that complexity is always justified.
> 
> I could make the situation a little more complex.  There might be a daemon
> which wants to monitor GSM cell locations and so is always asking.
> The GSM daemon might have a policy that if anyone wants those updates, then
>  - if system is awake for some other reason, report them as they arrive
>  - if system is otherwise suspended, wake up every 10 minutes to poll and
>    report.
> 
> In that case the suspend-client (the GSM daemon) really does care about the
> difference between an explicit stay-awake and a late reply to a
> suspend-imminent message.
> 
> So I'm still inclined to think that the two cases need to be treated
> separately.

The way I see it, your GSM daemon needs to know when the system is 
about to go into suspend.  That's a separate matter from communicating 
information about wakeup activity to/from the PM daemon.

What should happen is this: When the PM daemon is ready to start a
suspend (none of its clients need to keep the system awake), it should
broadcast the fact that a suspend is about to begin.  This broadcast
could take various forms, the simplest of which is to run a shell
script.

In fact, we may want to integrate the PM daemon into pm-utils at a 
level above where the various suspend scripts get run.

Alan Stern


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-20 14:29                           ` Alan Stern
@ 2011-10-21  5:05                             ` NeilBrown
  2011-10-21  5:23                             ` lsusd - The Linux SUSpend Daemon NeilBrown
  1 sibling, 0 replies; 80+ messages in thread
From: NeilBrown @ 2011-10-21  5:05 UTC (permalink / raw)
  To: Alan Stern
  Cc: John Stultz, Rafael J. Wysocki, Linux PM list, mark gross, LKML


On Thu, 20 Oct 2011 10:29:33 -0400 (EDT) Alan Stern
<stern@rowland.harvard.edu> wrote:

> On Thu, 20 Oct 2011, NeilBrown wrote:
> 
> > > All a client needs to know is whether or not _it_ is busy, so that it
> > > can provide correct information to the daemon.  (Some clients may also
> > > need to be notified each time the system resumes -- that's a separate
> > > matter.)  As for the rest, a client may as well assume that the system
> > > is perpetually on the verge of suspending, except when it has
> > > personally told the daemon to stay awake.
> > 
> > I don't think it is always appropriate to assume the system is on the verge
> > of suspending except when explicitly asking for stay-awake.
> 
> For some programs it may not be appropriate, but for wakeup clients I 
> believe it is.
> 
> > Consider a daemon tasked with managing the "GSM" interface in a phone.
> > Any message from the GSM module will wake the phone if it is suspended.
> > 
> > When suspended, the daemon only wants to get "incoming call" and "incoming
> > SMS" events.
> > When not suspended, the daemon also wants "Active cell changed" events so
> > that it can make this information available to some display widget.
> > 
> > So when it is told that a suspend is imminent it quickly tells the GSM
> > module to be quieter and then says "OK".  If it had to assume it was always
> > on the verge, it could never allow active-cell-changed events.
> > 
> > You could argue that the GSM daemon should only be reporting CELL changes -
> > and so the GSM module should only be asked to report them - when the widget
> > (or some other client) is explicitly asking for them.  So when the screen
> > goes blank, the widget stops getting expose events, so it rescinds its request
> > for updates and the GSM daemon passes that on to the GSM module.  So when
> > suspend happens, the GSM module has already stopped reporting.
> > 
> > But I'm not convinced that complexity is always justified.
> > 
> > I could make the situation a little more complex.  There might be a daemon
> > which wants to monitor GSM cell locations and so is always asking.
> > The GSM daemon might have a policy that if anyone wants those updates, then
> >  - if system is awake for some other reason, report them as they arrive
> >  - if system is otherwise suspended, wake up every 10 minutes to poll and
> >    report.
> > 
> > In that case the suspend-client (the GSM daemon) really does care about the
> > difference between an explicit stay-awake and a late reply to a
> > suspend-imminent message.
> > 
> > So I'm still inclined to think that the two cases need to be treated
> > separately.
> 
> The way I see it, your GSM daemon needs to know when the system is 
> about to go into suspend.  That's a separate matter from communicating 
> information about wakeup activity to/from the PM daemon.

I agree that it is conceptually distinct.
However I think that in practice it ends up looking very similar.
Maybe that is just the way I practice.

> 
> What should happen is this: When the PM daemon is ready to start a
> suspend (none of its clients need to keep the system awake), it should
> broadcast the fact that a suspend is about to begin.  This broadcast
> could take various forms, the simplest of which is to run a shell
> script.

I would call it 'publish' rather than 'broadcast' as I expect clients to
subscribe first, but it is a small matter.
Certainly having different protocols for different uses could be appropriate.


> 
> In fact, we may want to integrate the PM daemon into pm-utils at a 
> level above where the various suspend scripts get run.
> 

Probably - yes.

Thanks,
NeilBrown



^ permalink raw reply	[flat|nested] 80+ messages in thread

* lsusd - The Linux SUSpend Daemon
  2011-10-20 14:29                           ` Alan Stern
  2011-10-21  5:05                             ` NeilBrown
@ 2011-10-21  5:23                             ` NeilBrown
  2011-10-21 16:07                               ` Alan Stern
                                                 ` (2 more replies)
  1 sibling, 3 replies; 80+ messages in thread
From: NeilBrown @ 2011-10-21  5:23 UTC (permalink / raw)
  To: Alan Stern, John Stultz, Rafael J. Wysocki, mark gross
  Cc: Linux PM list, LKML



Hi,

I wasn't going to do this... but then I did.  I think that sometimes coding is
a bit like chocolate.

At:
    git://neil.brown.name/lsusd
or
    http://neil.brown.name/git/lsusd

you can find a bunch of proof-of-concept sample code that implements a
"Linux SUSpend Daemon" with client support library and test programs.

I haven't actually tested it as root and had it actually suspend and resume
and definitely haven't had it even close to a race condition, but the
various bits seem to work with each other properly when I run them under
strace and watch.

It didn't turn out quite the way I imagined, but then cold harsh reality has
a way of destroying our dreams, doesn't it :-)


Below is the README file.  Comments welcome as always.
I'm happy for patches too, but I'm equally happy for someone to re-write it
completely and make something really useful and maintainable.

NeilBrown

-----------------------------------------------------------------

This directory contains a prototype proof-of-concept system
for managing suspend in Linux.
Thus the Linux SUSpend Daemon.

It contains:

 lsusd:
    The main daemon.  It is written to run in a tight loop and block as
     required.  It obeys the wakeup_count protocol to get race-free
     suspend and allows clients to register to find out about
     suspend and to block it either briefly or longer term.
    It uses files in /var/run/suspend for all communication.

    Files are:

      disabled:  This file always exists.  If any process holds a
        shared flock(), suspend will not happen.
      immediate:  If this file exists, lsusd will try to suspend whenever
        possible.
      request:  If this is created, then lsusd will try to suspend
        once, and will remove the file when suspend completes or aborts.
      watching:  This is normally empty.  Any process wanting to know
        about suspend should take a shared flock and check the file is
        still empty, and should watch for modification.
        When suspend is imminent, lsusd creates 'watching-next', writes
         a byte to 'watching' and waits for an exclusive lock on 'watching'.
        Clients should move their lock to 'watching-next' when ready for
        suspend.
        When suspend completes, another byte (or 2) is written to
        'watching', and 'watching-next' is renamed over it.  Clients can
        use either of these to know that resume has happened.

      watching-next: The file that will be 'watching' in the next awake cycle.
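
    The 'disabled' rule above (a shared flock() held by any process
    keeps suspend from happening) can be sketched from the client side
    as follows.  This is only an illustration, not code taken from
    lsusd: the helper name block_suspend is hypothetical, and the
    default path is assumed from the description above.  The path is a
    parameter so the sketch can be tried against a scratch file.

```python
import contextlib
import fcntl
import os

# Assumed location, taken from the README text; not verified against lsusd.
DISABLED_FILE = "/var/run/suspend/disabled"


@contextlib.contextmanager
def block_suspend(path=DISABLED_FILE):
    """Hold a shared flock() on `path` for the duration of the with-block.

    Hypothetical helper: while the lock is held, a daemon that honours
    the 'disabled' file as described above would not suspend.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_SH)
        yield fd
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)
```

    A caller would wrap its critical work in "with block_suspend():".
    Several clients can hold the shared lock at once.  How lsusd
    detects the lock is not stated here; trying to take an exclusive
    lock on the same file is one plausible way.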

    lsusd does not try to be event-loop based because:
      - /sys/power/wakeup_count is not pollable.  This could probably be
        'fixed' but I want code to work with today's kernel.  It will probably
        only block 100msec at most, but that might be too long???
      - I cannot get an event notification when a lock is removed from a
        file. :-(  And I think locks are an excellent light-weight
        mechanism for blocking suspend.
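
    For readers unfamiliar with the wakeup_count protocol mentioned
    above, here is a rough sketch of a single suspend attempt.  It is
    not lsusd's code.  The function name try_suspend and the
    ready_check callback are hypothetical; /sys/power/wakeup_count and
    /sys/power/state are the kernel's existing interfaces.  The
    directory is a parameter so the logic can be exercised against
    ordinary files.

```python
import os


def try_suspend(power_dir="/sys/power", ready_check=lambda: True):
    """Make one suspend attempt; return True if the state write succeeded.

    Hypothetical sketch of the wakeup_count handshake.
    """
    count_path = os.path.join(power_dir, "wakeup_count")
    state_path = os.path.join(power_dir, "state")

    # Step 1: read the current wakeup count.  On a real system this read
    # may block while wakeup events are still being processed.
    with open(count_path) as f:
        count = f.read().strip()

    # Step 2: let user space veto the attempt (e.g. a client is busy).
    if not ready_check():
        return False

    try:
        # Step 3: write the count back.  The kernel rejects the write if
        # more wakeup events have arrived since step 1.
        with open(count_path, "w") as f:
            f.write(count)
        # Step 4: only now ask for suspend-to-RAM.
        with open(state_path, "w") as f:
            f.write("mem")
    except OSError:
        # A wakeup event raced with us, or the suspend was aborted.
        return False
    return True
```

    A daemon would call this in a loop and simply retry after a
    failed attempt.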

  lsused:
      This is an event-loop based daemon that can therefore easily handle
      socket connections and client protocols which need prompt
      response.  It communicates with lsusd and provides extra
      services to clients.

      lsused (which needs a better name) listens on the socket
            /var/run/suspend/registration
      A client may connect and send a list of file descriptors.
      When a suspend is imminent, if any file descriptor is readable,
      lsused will send a 'S' message to the client and await an 'R' reply
      (S == suspend, R == ready).  When all replies are in, lsused will
      allow the suspend to complete.  When it does (or aborts), it will send
      'A' (awake) to those clients to which it sent 'S'.

      This allows a client to get a chance to handle any wakeup events,
      but not to be woken unnecessarily on every suspend.
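
      The client side of the 'S' / 'R' / 'A' exchange described above
      might look roughly like this.  It is a sketch under
      assumptions: the function name serve_one_notice and the
      before_ready callback are hypothetical, messages are assumed to
      be single bytes on a stream socket, and the step of sending the
      file descriptors to lsused is not shown.

```python
import socket


def serve_one_notice(sock, before_ready=None):
    """Read one message from the daemon and answer 'S' with 'R'.

    Returns the byte that was read: b"S", b"A", or b"" if the daemon
    closed the connection.  Hypothetical sketch of the client side.
    """
    msg = sock.recv(1)
    if msg == b"S":
        if before_ready is not None:
            # e.g. check for pending events and take a suspend block
            # before telling the daemon we are ready
            before_ready()
        sock.sendall(b"R")
    # b"A" (awake) needs no reply; many clients can ignore it.
    return msg
```

      The reply should be sent promptly.  A client that still has work
      to do should block suspend by some other means (for example the
      'disabled' file) in before_ready, rather than delay its 'R'.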

   wakealarmd:
      This allows clients to register on the socket
             /var/run/suspend/wakealarm
      They write a timestamp in seconds since epoch, and will receive
      a 'Now' message when that time arrives.
      Between the time the connection is made and the time a "seconds"
      number is written, suspend will be blocked.
      Between the time that "Now" is sent and when the socket is
      closed, suspend is also blocked.
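
      A client of the wakealarm socket could be sketched as below.
      The wire format is an assumption: the text above only says a
      timestamp in seconds since epoch is written and a 'Now' message
      comes back, so sending ASCII decimal followed by a newline is a
      guess, and the function name request_wakeup is hypothetical.

```python
import socket


def request_wakeup(sock, when):
    """Send a wake-up time and block until the daemon reports 'Now'.

    `when` is seconds since the epoch.  The newline-terminated ASCII
    encoding is an assumption, not taken from wakealarmd.
    """
    sock.sendall(b"%d\n" % int(when))
    reply = b""
    while b"Now" not in reply:
        chunk = sock.recv(64)
        if not chunk:
            raise ConnectionError("daemon closed the connection")
        reply += chunk
    return reply
```

      The caller should keep the socket open while it handles the
      event and close it afterwards, since per the description above
      suspend stays blocked from the time 'Now' is sent until the
      socket is closed.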

   request_suspend:
      A simple tool to create the 'request' file and then wait for it
      to be removed.

   libsus.a:  A library of client-side interfaces.
      suspend_open, suspend_block, suspend_allow, suspend_close:
           easy interface to blocking suspend
      suspend_watch, suspend_unwatch:
           For use in libevent program to get notifications of
           suspend and resume via the 'watching' file.
      wake_set, wake_destroy:
           create a libevent event for an fd which is protected from
           suspend. Whenever it is readable, suspend will not be entered.
      wakealarm_set, wakealarm_destroy:
           create a libevent event for a particular time which will
           trigger even if the system is suspended, and will protect against
           suspend while event is happening.


   block_test watch_test event_test alarm_test
        simple test programs for the above interfaces.


    suspend.py  dnotify.py:
       Sample code for detecting suspend/resume from python
    block.sh test_block.sh:
       Sample code for disabling suspend from shell.

All code is available under GPLv2+.  However if you ask for a different
license I am unlikely to refuse (at least with the early prototype).

Patches and comments are welcome, but please also feel free to include
any of this in some more complete framework.


* Re: lsusd - The Linux SUSpend Daemon
  2011-10-21  5:23                             ` lsusd - The Linux SUSpend Daemon NeilBrown
@ 2011-10-21 16:07                               ` Alan Stern
  2011-10-21 22:34                                 ` NeilBrown
  2011-10-21 20:10                               ` david
  2011-10-26 14:31                               ` Jan Engelhardt
  2 siblings, 1 reply; 80+ messages in thread
From: Alan Stern @ 2011-10-21 16:07 UTC (permalink / raw)
  To: NeilBrown; +Cc: John Stultz, Rafael J. Wysocki, mark gross, Linux PM list, LKML

On Fri, 21 Oct 2011, NeilBrown wrote:

> Hi,
> 
> I wasn't going to do this... but then I did.  I think that sometimes coding is
> a bit like chocolate.

Getting started is always a big hurdle, for me anyway.

> At:
>     git://neil.brown.name/lsusd
> or
>     http://neil.brown.name/git/lsusd
> 
> you can find a bunch of proof-of-concept sample code that implements a
> "Linux SUSpend Daemon" with client support library and test programs.
> 
> I haven't actually tested it as root and had it actually suspend and resume
> and definitely haven't had it even close to a race condition, but the
> various bits seem to work with each other properly when I run them under
> strace and watch.
> 
> It didn't turn out quite the way I imagined, but then cold harsh reality has
> a way of destroying our dreams, doesn't it :-)
> 
> 
> Below is the README file.  Comment welcome as always.
> I'm happy for patches too, but I'm equally happy for someone to re-write it
> completely and make something really useful and maintainable.
> 
> NeilBrown
> 
> -----------------------------------------------------------------
> 
> This directory contains a prototype proof-of-concept system
> for managing suspend in Linux.
> Thus the Linux SUSpend Daemon.
> 
> It contains:
> 
>  lsusd:

This name is no good; it's too much like "lsusb".  In fact, anything 
starting with "ls" is going to be confusing.  Not that I have any 
better suggestions at the moment...

>     The main daemon.  It is written to run a tight loop and blocks as
>      required.  It obeys the wakeup_count protocol to get race-free
>      suspend and allows clients to register to find out about
>      suspend and to block it either briefly or longer term.
>     It uses files in /var/run/suspend for all communication.

I'm not so keen on using files for communication.  At best, they are
rather awkward for two-way messaging.  If you really want to use them,
then at least put them on a non-backed filesystem, like something under
/dev.

>     Files are:
> 
>       disabled:  This file always exists.  If any process holds a
>         shared flock(), suspend will not happen.
>       immediate:  If this file exists, lsusd will try to suspend whenever
>         possible.
>       request:  If this is created, then lsusd will try to suspend
>         once, and will remove the file when suspend completes or aborts.
>       watching:  This is normally empty.  Any process wanting to know
>         about suspend should take a shared flock and check the file is
>         still empty, and should watch for modification.
>         When suspend is imminent, lsusd creates 'watching-next', writes
>          a byte to 'watching' and waits for an exclusive lock on 'watching'.
>         Clients should move their lock to 'watching-next' when ready for
>         suspend.
>         When suspend completes, another byte (or 2) is written to
>         'watching', and 'watching-next' is renamed over it.  Clients can
>         use either of these to know that resume has happened.
> 
>       watching-next: The file that will be 'watching' in the next awake cycle.
> 
>     lsusd does not try to be event-loop based because:
>       - /sys/power/wakeup_count is not pollable.  This could probably be
>         'fixed' but I want code to work with today's kernel.  It will probably

Why does this matter?

>         only block 100msec at most, but that might be too long???

Too long for what?

>       - I cannot get an event notification when a lock is removed from a
>         file. :-(  And I think locks are an excellent light-weight
>         mechanism for blocking suspend.

Except for this one drawback.  Socket connections are superior in that 
regard.

>   lsused:
>       This is an event-loop based daemon that can therefore easily handle
>       socket connections and client protocols which need prompt
>       response.  It communicates with lsusd and provides extra
>       services to clients.
> 
>       lsused (which needs a better name) listens on the socket
>             /var/run/suspend/registration
>       A client may connect and send a list of file descriptors.

Including an empty list?

>       When a suspend is imminent, if any file descriptor is readable,

Or if no file descriptors were sent?

>       lsused will send a 'S' message to the client and await an 'R' reply
>       (S == suspend, R == ready).  When all replies are in, lsused will
>       allow the suspend to complete.  When it does (or aborts), it will send
>       'A' (awake) to those clients to which it sent 'S'.

But not to the client which failed to send an 'R'?

>       This allows a client to get a chance to handle any wakeup events,
>       but not to be woken unnecessarily on every suspend.

In practice, it may be best for clients that handle a large number of 
wakeup events to avoid using the fd mechanism.  Clients that handle 
only occasional wakeups may be better off using it.

You left out an important element: A client must be allowed to send
'A' at any time, indicating that it does not want to suspend now.  Of 
course, this will work reliably only if the client uses the fd 
mechanism.

I'm not sure it's such a good idea to separate this from the main 
daemon.  A crucial point of the protocol is that the daemon reads 
/sys/power/wakeup_count before sending all the 'S' messages, and waits 
for all the 'R' replies before writing wakeup_count.  The two-program 
approach would make this difficult.

>    wakealarmd:
>       This allows clients to register on the socket.
>              /var/run/suspend/wakealarm
>       They write a timestamp in seconds since epoch, and will receive
>       a 'Now' message when that time arrives.
>       Between the time the connection is made and the time a "seconds"
>       number is written, suspend will be blocked.
>       Also between the time that "Now" is sent and when the socket is
>       closed, suspend is also blocked.

In theory, this could be integrated with the previous program.

Alan Stern




* Re: lsusd - The Linux SUSpend Daemon
  2011-10-21  5:23                             ` lsusd - The Linux SUSpend Daemon NeilBrown
  2011-10-21 16:07                               ` Alan Stern
@ 2011-10-21 20:10                               ` david
  2011-10-21 22:09                                 ` NeilBrown
  2011-10-26 14:31                               ` Jan Engelhardt
  2 siblings, 1 reply; 80+ messages in thread
From: david @ 2011-10-21 20:10 UTC (permalink / raw)
  To: NeilBrown
  Cc: Alan Stern, John Stultz, Rafael J. Wysocki, mark gross,
	Linux PM list, LKML

On Fri, 21 Oct 2011, NeilBrown wrote:

> Hi,
>
> I wasn't going to do this... but then I did.  I think that sometimes coding is
> a bit like chocolate.
>
> At:
>    git://neil.brown.name/lsusd
> or
>    http://neil.brown.name/git/lsusd
>
> you can find a bunch of proof-of-concept sample code that implements a
> "Linux SUSpend Daemon" with client support library and test programs.
>
> I haven't actually tested it as root and had it actually suspend and resume
> and definitely haven't had it even close to a race condition, but the
> various bits seem to work with each other properly when I run them under
> strace and watch.
>
> It didn't turn out quite the way I imagined, but then cold harsh reality has
> a way of destroying our dreams, doesn't it :-)
>
>
> Below is the README file.  Comment welcome as always.
> I'm happy for patches too, but I'm equally happy for someone to re-write it
> completely and make something really useful and maintainable.

have you put any thought into the idea of extending this slightly to 
handle the userspace wakelock interface to potentially allow this to run 
android userspace on a stock kernel?

I realize that there are other things that would be needed as well, but 
the wakelock interface is a biggie.

David Lang


* Re: lsusd - The Linux SUSpend Daemon
  2011-10-21 20:10                               ` david
@ 2011-10-21 22:09                                 ` NeilBrown
  0 siblings, 0 replies; 80+ messages in thread
From: NeilBrown @ 2011-10-21 22:09 UTC (permalink / raw)
  To: david
  Cc: Alan Stern, John Stultz, Rafael J. Wysocki, mark gross,
	Linux PM list, LKML


On Fri, 21 Oct 2011 13:10:10 -0700 (PDT) david@lang.hm wrote:

> On Fri, 21 Oct 2011, NeilBrown wrote:
> 
> > Hi,
> >
> > I wasn't going to do this... but then I did.  I think that sometimes coding is
> > a bit like chocolate.
> >
> > At:
> >    git://neil.brown.name/lsusd
> > or
> >    http://neil.brown.name/git/lsusd
> >
> > you can find a bunch of proof-of-concept sample code that implements a
> > "Linux SUSpend Daemon" with client support library and test programs.
> >
> > I haven't actually tested it as root and had it actually suspend and resume
> > and definitely haven't had it even close to a race condition, but the
> > various bits seem to work with each other properly when I run them under
> > strace and watch.
> >
> > It didn't turn out quite the way I imagined, but then cold harsh reality has
> > a way of destroying our dreams, doesn't it :-)
> >
> >
> > Below is the README file.  Comment welcome as always.
> > I'm happy for patches too, but I'm equally happy for someone to re-write it
> > completely and make something really useful and maintainable.
> 
> have you put any thought into the idea of extending this slightly to 
> handle the userspace wakelock interface to potentially allow this to run 
> android userspace on a stock kernel?
> 
> I realize that there are other things that would be needed as well, but 
> the wakelock interface is a biggie.
> 
> David Lang

I have certainly thought of someone else doing it :-)
I only have a high-level understanding of Android interfaces and don't really
want to go any deeper than that.

Thanks,
NeilBrown


* Re: lsusd - The Linux SUSpend Daemon
  2011-10-21 16:07                               ` Alan Stern
@ 2011-10-21 22:34                                 ` NeilBrown
  2011-10-22  2:00                                   ` Alan Stern
  0 siblings, 1 reply; 80+ messages in thread
From: NeilBrown @ 2011-10-21 22:34 UTC (permalink / raw)
  To: Alan Stern
  Cc: John Stultz, Rafael J. Wysocki, mark gross, Linux PM list, LKML


On Fri, 21 Oct 2011 12:07:07 -0400 (EDT) Alan Stern
<stern@rowland.harvard.edu> wrote:

> On Fri, 21 Oct 2011, NeilBrown wrote:
> 
> > Hi,
> > 
> > I wasn't going to do this... but then I did.  I think that sometimes coding is
> > a bit like chocolate.
> 
> Getting started is always a big hurdle, for me anyway.
> 
> > At:
> >     git://neil.brown.name/lsusd
> > or
> >     http://neil.brown.name/git/lsusd
> > 
> > you can find a bunch of proof-of-concept sample code that implements a
> > "Linux SUSpend Daemon" with client support library and test programs.
> > 
> > I haven't actually tested it as root and had it actually suspend and resume
> > and definitely haven't had it even close to a race condition, but the
> > various bits seem to work with each other properly when I run them under
> > strace and watch.
> > 
> > It didn't turn out quite the way I imagined, but then cold harsh reality has
> > a way of destroying our dreams, doesn't it :-)
> > 
> > 
> > Below is the README file.  Comment welcome as always.
> > I'm happy for patches too, but I'm equally happy for someone to re-write it
> > completely and make something really useful and maintainable.
> > 
> > NeilBrown
> > 
> > -----------------------------------------------------------------
> > 
> > This directory contains a prototype proof-of-concept system
> > for managing suspend in Linux.
> > Thus the Linux SUSpend Daemon.
> > 
> > It contains:
> > 
> >  lsusd:
> 
> This name is no good; it's too much like "lsusb".  In fact, anything 
> starting with "ls" is going to be confusing.  Not that I have any 
> better suggestions at the moment...
> 
> >     The main daemon.  It is written to run a tight loop and blocks as
> >      required.  It obeys the wakeup_count protocol to get race-free
> >      suspend and allows clients to register to find out about
> >      suspend and to block it either briefly or longer term.
> >     It uses files in /var/run/suspend for all communication.
> 
> I'm not so keen on using files for communication.  At best, they are
> rather awkward for two-way messaging.  If you really want to use them,
> then at least put them on a non-backed filesystem, like something under
> /dev.

Isn't /var/run a tmpfs filesystem?  It should be.
Surely /run is, so in the new world order the files should probably go
there.   But that is just a detail.

I like files...  I particularly like 'flock' to block suspend.   The
rest.... whatever..
With files, you only need a context switch when there is real communication.
With sockets, every message sent must be read so there will be a context
switch.

Maybe we could do something with futexes...

> 
> >     Files are:
> > 
> >       disabled:  This file always exists.  If any process holds a
> >         shared flock(), suspend will not happen.
> >       immediate:  If this file exists, lsusd will try to suspend whenever
> >         possible.
> >       request:  If this is created, then lsusd will try to suspend
> >         once, and will remove the file when suspend completes or aborts.
> >       watching:  This is normally empty.  Any process wanting to know
> >         about suspend should take a shared flock and check the file is
> >         still empty, and should watch for modification.
> >         When suspend is imminent, lsusd creates 'watching-next', writes
> >          a byte to 'watching' and waits for an exclusive lock on 'watching'.
> >         Clients should move their lock to 'watching-next' when ready for
> >         suspend.
> >         When suspend completes, another byte (or 2) is written to
> >         'watching', and 'watching-next' is renamed over it.  Clients can
> >         use either of these to know that resume has happened.
> > 
> >       watching-next: The file that will be 'watching' in the next awake cycle.
> > 
> >     lsusd does not try to be event-loop based because:
> >       - /sys/power/wakeup_count is not pollable.  This could probably be
> >         'fixed' but I want code to work with today's kernel.  It will probably
> 
> Why does this matter?

In my mind an event based program should never block.  Every action should be
non-blocking and only taken when 'poll' says it can.
/sys/power/wakeup_count can be read non-blocking, but you cannot find
out when it is sensible to try to read it again.  So it doesn't fit.

> 
> >         only block 100msec at most, but that might be too long???
> 
> Too long for what?

For some other process to connect to some socket and have to wait for the
connection to be accepted.
(When reading from wakeup_count in the current code it will block for a
multiple of 100ms.  The multiplier might be 0 or 1, possibly more, though
that is unlikely).

> 
> >       - I cannot get an event notification when a lock is removed from a
> >         file. :-(  And I think locks are an excellent light-weight
> >         mechanism for blocking suspend.
> 
> Except for this one drawback.  Socket connections are superior in that 
> regard.

I'm very happy for someone else to write an all-socket based daemon.
Or just use my two daemons together.



> 
> >   lsused:
> >       This is an event-loop based daemon that can therefore easily handle
> >       socket connections and client protocols which need prompt
> >       response.  It communicates with lsusd and provides extra
> >       services to clients.
> > 
> >       lsused (which needs a better name) listens on the socket
> >             /var/run/suspend/registration
> >       A client may connect and send a list of file descriptors.
> 
> Including an empty list?

With current code an empty list will mean no callback ever so it would be
pointless.
This interface could probably be improved.  I just wanted something
that worked.

> 
> >       When a suspend is imminent, if any file descriptor is readable,
> 
> Or if no file descriptors were sent?

Not with current code, but that does fit the design we discussed previously.

> 
> >       lsused will send a 'S' message to the client and await an 'R' reply
> >       (S == suspend, R == ready).  When all replies are in, lsused will
> >       allow the suspend to complete.  When it does (or aborts), it will send
> >       'A' (awake) to those clients to which it sent 'S'.
> 
> But not to the client which failed to send an 'R'?

Every client must send an R before suspend can continue.  I don't currently
have any special handling for clients that misbehave.  I'm not even certain
that I correctly handle the case where the client dies and the socket closes.


> 
> >       This allows a client to get a chance to handle any wakeup events,
> >       but not to be woken unnecessarily on every suspend.
> 
> In practice, it may be best for clients that handle a large number of 
> wakeup events to avoid using the fd mechanism.  Clients that handle 
> only occasional wakeups may be better off using it.
> 
> You left out an important element: A client must be allowed to send
> 'A' at any time, indicating that it does not want to suspend now.  Of 
> course, this will work reliably only if the client uses the fd 
> mechanism.

I did leave that out because a client can always use "suspend_block()" to get a
lock on the lockfile which will block suspend.
But I have no objections to it going in.
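
For illustration, the flock() idea can be sketched like this.  It is not
the libsus implementation: the Python function names merely echo
suspend_block/suspend_allow, suspend_is_blocked is a hypothetical helper
for the daemon side, and the lock file is assumed to exist already (as the
README says 'disabled' always does).

```python
import fcntl
import os


def suspend_block(path):
    """Take a shared lock on the lock file; suspend is blocked while held."""
    fd = os.open(path, os.O_RDONLY)
    fcntl.flock(fd, fcntl.LOCK_SH)
    return fd


def suspend_allow(fd):
    """Drop the shared lock taken by suspend_block()."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


def suspend_is_blocked(path):
    """Daemon-side check: an exclusive lock fails while any client holds one."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return True
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

Because the lock belongs to the open file, it goes away if the client
dies, which is part of what makes it light-weight.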


> 
> I'm not sure it's such a good idea to separate this from the main 
> daemon.  A crucial point of the protocol is that the daemon reads 
> /sys/power/wakeup_count before sending all the 'S' messages, and waits 
> for all the 'R' replies before writing wakeup_count.  The two-program 
> approach would make this difficult.

I think it already works correctly with the two programs so it doesn't seem
that difficult.
The second daemon is a client to the first, and a server to other clients.
It is a multiplexer if you like - talking one (file-based) protocol to the
central server and another (socket-based) protocol to an arbitrary number of
clients.


> 
> >    wakealarmd:
> >       This allows clients to register on the socket.
> >              /var/run/suspend/wakealarm
> >       They write a timestamp in seconds since epoch, and will receive
> >       a 'Now' message when that time arrives.
> >       Between the time the connection is made and the time a "seconds"
> >       number is written, suspend will be blocked.
> >       Also between the time that "Now" is sent and when the socket is
> >       closed, suspend is also blocked.
> 
> In theory, this could be integrated with the previous program.

True.  Keeping it separate just reduced my cognitive load during development,
and provided more sample code of what a client would look like.

I'm not even sure it is entirely race-free.  It uses a 2-second margin to
ensure there is no race between suspending and the alarm-clock wakeup, but it
isn't really close enough to the suspend call to be certain that any
particular amount of time is enough.
Unless we get a counted wakeup_source for the RTC alarm, the RTC handling
will really need to be in the main daemon immediately before the write to
'state'.

Thanks for your review.

My original plan was to have a single daemon with a main loop and a bunch of
loadable modules that provided different protocols to clients: simple-socket,
file-based, dbus, "suspend.d" script directory etc.   That might still be fun
but it won't be a priority for a while.

NeilBrown


* Re: lsusd - The Linux SUSpend Daemon
  2011-10-21 22:34                                 ` NeilBrown
@ 2011-10-22  2:00                                   ` Alan Stern
  2011-10-22 16:31                                     ` Alan Stern
  2011-10-23  8:21                                     ` NeilBrown
  0 siblings, 2 replies; 80+ messages in thread
From: Alan Stern @ 2011-10-22  2:00 UTC (permalink / raw)
  To: NeilBrown; +Cc: John Stultz, Rafael J. Wysocki, mark gross, Linux PM list, LKML

On Sat, 22 Oct 2011, NeilBrown wrote:

> > >     It uses files in /var/run/suspend for all communication.
> > 
> > I'm not so keen on using files for communication.  At best, they are
> > rather awkward for two-way messaging.  If you really want to use them,
> > then at least put them on a non-backed filesystem, like something under
> > /dev.
> 
> Isn't /var/run a tmpfs filesystem?  It should be.
> Surely /run is, so in the new world order the files should probably go
> there.   But that is just a detail.

On my Fedora-14 systems there is no /run, and /var/run is a regular 
directory in a regular filesystem.

> I like files...  I particularly like 'flock' to block suspend.   The
> rest.... whatever..
> With files, you only need a context switch when there is real communication.
> With sockets, every message sent must be read so there will be a context
> switch.
> 
> Maybe we could do something with futexes...

Not easily -- as far as I can tell, futexes enjoy relatively little 
support.  In any case, they provide the same service as a mutex, which 
means you'd have to build a shared lock on top of them.

> > >     lsusd does not try to be event-loop based because:
> > >       - /sys/power/wakeup_count is not pollable.  This could probably be
> > >         'fixed' but I want code to work with today's kernel.  It will probably
> > 
> > Why does this matter?
> 
> In my mind an event based program should never block.  Every action should be
> non-blocking and only taken when 'poll' says it can.
> /sys/power/wakeup_count can be read non-blocking, but you cannot find
> out when it is sensible to try to read it again.  So it doesn't fit.

There shouldn't be any trouble about making wakeup_count pollable.  It
also would need to respect nonblocking reads, which it currently does 
not do.

At the worst, you could always have a separate thread to read 
wakeup_count.

> > >       - I cannot get an event notification when a lock is removed from a
> > >         file. :-(  And I think locks are an excellent light-weight
> > >         mechanism for blocking suspend.
> > 
> > Except for this one drawback.  Socket connections are superior in that 
> > regard.
> 
> I'm very happy for someone else to write an all-socket based daemon.

Hmmm...  Maybe I'll take you up on that.


> > >       lsused will send a 'S' message to the client and await an 'R' reply
> > >       (S == suspend, R == ready).  When all replies are in, lsused will
> > >       allow the suspend to complete.  When it does (or aborts), it will send
> > >       'A' (awake) to those clients to which it sent 'S'.
> > 
> > But not to the client which failed to send an 'R'?
> 
> Every client must send an R before suspend can continue.

I was referring to the case where you abort before receiving an 'R'.  
The current suspend attempt will fail, but then what happens during the
next attempt?

>  I don't currently
> have any special handling for clients that misbehave.  I'm not even certain
> that I correctly handle the case where the client dies and the socket closes.

Clients that misbehave will prevent the system from suspending.  It's 
probably not a good idea to try and second-guess them.

On the other hand, the daemon certainly should be able to handle 
socket closure at any time.

> My original plan was to have a single daemon with a main loop and a bunch of
> loadable modules that provided different protocols to clients: simple-socket,
> file-based, dbus, "suspend.d" script directory etc.   That might still be fun
> but it won't be a priority for a while.

Yeah.  I don't see much advantage over a single protocol plus a client 
library.

Alan Stern



* Re: lsusd - The Linux SUSpend Daemon
  2011-10-22  2:00                                   ` Alan Stern
@ 2011-10-22 16:31                                     ` Alan Stern
  2011-10-23  3:31                                       ` NeilBrown
  2011-10-23  8:21                                     ` NeilBrown
  1 sibling, 1 reply; 80+ messages in thread
From: Alan Stern @ 2011-10-22 16:31 UTC (permalink / raw)
  To: NeilBrown; +Cc: John Stultz, Rafael J. Wysocki, mark gross, Linux PM list, LKML

On Fri, 21 Oct 2011, Alan Stern wrote:

> > Maybe we could do something with futexes...
> 
> Not easily -- as far as I can tell, futexes enjoy relatively little 
> support.  In any case, they provide the same service as a mutex, which 
> means you'd have to build a shared lock on top of them.

It occurred to me that we could create a new type of special file, one
intended to help with interprocess synchronization.  It would support
locking (shared or exclusive, blocking or non-blocking) and the poll
system call -- the file would appear to be ready for reading whenever a
shared lock wouldn't block and ready for writing whenever an exclusive
lock wouldn't block.  Actual reads and writes wouldn't have to do 
anything, although maybe someone could suggest a use for them.

Alan Stern


* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-17 23:36         ` NeilBrown
@ 2011-10-22 22:07           ` Rafael J. Wysocki
  2011-10-23  2:57             ` NeilBrown
  2011-10-23 15:50             ` Alan Stern
  0 siblings, 2 replies; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-22 22:07 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux PM list, mark gross, LKML, John Stultz, Alan Stern

On Tuesday, October 18, 2011, NeilBrown wrote:
> On Tue, 18 Oct 2011 00:02:30 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > On Monday, October 17, 2011, NeilBrown wrote:
> > > On Sun, 16 Oct 2011 00:10:40 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > ...
> > > > 
> > > > >  But I think it is very wrong to put some hack in the kernel like your
> > > > >    suspend_mode = disabled
> > > > 
> > > > Why is it wrong and why do you think it is a "hack"?
> > > 
> > > I think it is a "hack" because it is addressing a specific complaint rather
> > > than fixing a real problem.
> > 
> > I wonder why you think that there's no real problem here.
> > 
> > The problem I see is that multiple processes can use the suspend/hibernate
> > interfaces pretty much at the same time (not exactly in parallel, because
> > there's some locking in there, but very well there may be two different
> > processes operating /sys/power/state independently of each other), while
> > the /sys/power/wakeup_count interface was designed with the assumption that
> > there will be only one such process in mind.
> 
> Multiple processes can write to your mail box at the same time.  But somehow
> they don't.  This isn't because the kernel enforces anything, but because all
> the relevant programs have an agreed protocol by which they arbitrate access.
> Once upon a time this involved creating a lock file with O_CREAT|O_EXCL.
> These days it is fcntl locking.  But it is still advisory.
> 
> In the same way - we stop multiple processes from suspending/hibernating at
> the same time by having an agreed protocol by which they share access to the
> resource.  The kernel does not need to be explicitly involved in this.

Not really.  The main difference is that such a protocol doesn't exist for
processes that may want to suspend/hibernate the system.

Moreover, the race is real, because if you have two processes trying to use
/sys/power/wakeup_count at the same time, you can get:

Process A		Process B
read from wakeup_count
talk to apps
write to wakeup_count
--------- wakeup event ----------
			read from wakeup_count
			talk to apps
			write to wakeup_count
try to suspend -> success (should be failure, because the wakeup event
may still be processed by applications at this point and Process A hasn't
checked that).
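
The interleaving above can be replayed against a toy model.  The model is
an assumption made for illustration: it supposes the kernel keeps a single
saved count shared by all writers and lets a suspend proceed when the
current count still equals that saved value.  FakeWakeupCount and
replay_race are hypothetical names, not kernel interfaces.

```python
class FakeWakeupCount:
    """Toy stand-in for /sys/power/wakeup_count plus /sys/power/state."""

    def __init__(self):
        self.count = 0       # number of wakeup events seen so far
        self.saved = None    # value stored by the last successful write

    def read(self):
        return self.count

    def write(self, value):
        # The write is accepted only if no event arrived since the read.
        if value != self.count:
            return False
        self.saved = value
        return True

    def wakeup_event(self):
        self.count += 1

    def suspend(self):
        # Suspend goes ahead only if nothing changed since the saved count.
        return self.saved is not None and self.saved == self.count


def replay_race():
    kernel = FakeWakeupCount()
    a = kernel.read()                 # Process A: read from wakeup_count
    a_ok = kernel.write(a)            # Process A: (talk to apps) then write
    kernel.wakeup_event()             # --------- wakeup event ----------
    b = kernel.read()                 # Process B: read from wakeup_count
    b_ok = kernel.write(b)            # Process B: (talk to apps) then write
    suspended = kernel.suspend()      # Process A: try to suspend
    return a_ok, b_ok, suspended
```

In this model Process B's write replaces the value Process A saved, so
A's suspend attempt no longer notices the event that arrived in between.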

Now, there are systems running two (or more) desktop environments each of
which has a power manager that may want to suspend on its own.  They both
will probably use pm-utils, but then I somehow doubt that pm-utils is well
prepared to handle such concurrency.

> 
> ...
> 
> > > > Well, I used to think that it's better to do things in user space.  Hence,
> > > > the hibernate user space interface that's used by many people.  And my
> > > > experience with that particular thing made me think that doing things in
> > > > the kernel may actually work better, even if they _can_ be done in user space.
> > > > 
> > > > Obviously, that doesn't apply to everything, but sometimes it simply is worth
> > > > discussing (if not trying).  If it doesn't work out, then fine, let's do it
> > > > differently, but I'm really not taking the "this should be done in user space"
> > > > argument at face value any more.  Sorry about that.
> > > 
> > > :-)  I have had similar mixed experiences.   Sometimes it can be a lot easier
> > > to get things working if it is all in the kernel.
> > > But I think that doing things in user-space leads to a lot more flexibility.
> > > Once you have the interfaces and designs worked out you can then start doing
> > > more interesting things and experimenting with ideas more easily.
> > > 
> > > In this case, I think the *only* barrier to a simple solution in user-space
> > > is the pre-existing software that uses the 'old' kernel interface.  It seems
> > > that interfacing with that is as easy as adding a script or two to pm-utils.
> > 
> > Well, assuming that we're only going to address the systems that use PM utils.
> 
> I suspect (and claim without proof :-) that any system will have some single
> user-space thing that is responsible for initiating suspend.

Well, see above.

> Every time I look at one I see a whole host of things that need to be done
> just before suspend, and other things just after resume.
> They used to be in /etc/apm/event.d.  Now there are
> in /usr/lib/pm-utils/sleep.d.

I know of systems that don't need those hooks, however.

> I think they were in /etc/acpid once.
> I've seen one thing that uses shared-library modules instead of shell scripts
> on the basis that it avoids forking and goes fast (and it probably does).
> But I doubt there is any interesting system where writing to /sys/power/state
> is the *only* thing you need to do for a clean suspend.

I have such a system on my desk. :-)

> So all systems will have some user-space infrastructure to support suspend,
> and we just need to hook in to that.
> 
> 
> > 
> > > With that problem solved, experimenting is much easier in user-space than in
> > > the kernel.
> > 
> > Somehow, I'm not exactly sure if we should throw all kernel-based solutions away
> > just yet.
> 
> My rule-of-thumb is that we should reserve kernel space for when
>   a/ it cannot be done in user space
>   b/ it cannot be done efficient in user space
>   c/ it cannot be done securely in user space
> 
> I don't think any of those have been demonstrated yet.  If/when they are it
> would be good to get those kernel-based solutions out of the draw (so yes:
> keep them out of the rubbish bin).

I have one more rule.  If my would-be user space solution has the following
properties:

* It is supposed to be used by all of the existing variants of user space
  (i.e. all existing variants of user space are expected to use the very same
  thing).

* It requires all of those user space variants to be modified to work with it
  correctly.

* It includes a daemon process having to be started on boot and run permanently.

then it likely is better to handle the problem in the kernel.

> So I'd respond with "I'm not at all sure that we should throw away an
> all-userspace solution just yet".  Particularly because many of us seem to
> still be working to understand what all the issues really are.

OK, so perhaps we should try to implement two concurrent solutions, one
kernel-based and one purely in user space and decide which one is better
afterwards?

Rafael


* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-22 22:07           ` Rafael J. Wysocki
@ 2011-10-23  2:57             ` NeilBrown
  2011-10-23 13:16               ` Rafael J. Wysocki
  2011-10-23 15:50             ` Alan Stern
  1 sibling, 1 reply; 80+ messages in thread
From: NeilBrown @ 2011-10-23  2:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM list, mark gross, LKML, John Stultz, Alan Stern

[-- Attachment #1: Type: text/plain, Size: 10096 bytes --]

On Sun, 23 Oct 2011 00:07:33 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> On Tuesday, October 18, 2011, NeilBrown wrote:
> > On Tue, 18 Oct 2011 00:02:30 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > 
> > > On Monday, October 17, 2011, NeilBrown wrote:
> > > > On Sun, 16 Oct 2011 00:10:40 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > > ...
> > > > > 
> > > > > >  But I think it is very wrong to put some hack in the kernel like your
> > > > > >    suspend_mode = disabled
> > > > > 
> > > > > Why is it wrong and why do you think it is a "hack"?
> > > > 
> > > > I think it is a "hack" because it is addressing a specific complaint rather
> > > > than fixing a real problem.
> > > 
> > > I wonder why you think that there's no real problem here.
> > > 
> > > The problem I see is that multiple processes can use the suspend/hibernate
> > > interfaces pretty much at the same time (not exactly in parallel, because
> > > there's some locking in there, but very well there may be two different
> > > processes operating /sys/power/state independently of each other), while
> > > the /sys/power/wakeup_count interface was designed with the assumption that
> > > there will be only one such process in mind.
> > 
> > Multiple processes can write to your mail box at the same time.  But somehow
> > they don't.  This isn't because the kernel enforces anything, but because all
> > the relevant programs have an agreed protocol by which they arbitrate access.
> > Once upon a time this involved creating a lock file with O_CREAT|O_EXCL.
> > These days it is fcntl locking.  But it is still advisory.
> > 
> > In the same way - we stop multiple processes from suspending/hibernating at
> > the same time by having an agreed protocol by which they share access to the
> > resource.  The kernel does not need to be explicitly involved in this.
> 
> Not really.  The main difference is that such a protocol doesn't exist for
> processes that may want to suspend/hibernate the system.
> 
> Moreover, the race is real, because if you have two processes trying to use
> /sys/power/wakeup_count at the same time, you can get:
> 
> Process A		Process B
> read from wakeup_count
> talk to apps
> write to wakeup_count
> --------- wakeup event ----------
> 			read from wakeup_count
> 			talk to apps
> 			write to wakeup_count
> try to suspend -> success (should be failure, because the wakeup event
> may still be processed by applications at this point and Process A hasn't
> checked that).
> 
> Now, there are systems running two (or more) desktop environments each of
> which has a power manager that may want to suspend on its own.  They both
> will probably use pm-utils, but then I somehow doubt that pm-utils is well
> prepared to handle such concurrency.

I think that "upowerd" is the current "solution" to this problem.  Different
desktops can communicate with it to negotiate when suspend will happen.

When upowerd decides to suspend, it calls the relevant pm-utils command.

So with modern desktops we would never expect two different processes to be
requesting pm-utils to suspend at the same time.  If we did, that would be a
problem, but we don't.  There is no race here to fix.

I'm not certain that upowerd provides good interfaces.  But its existence
shows that this sort of problem that you see is not that hard to solve.

Sure: people could still design systems which exhibited racy access to
suspend, but people have always been able to write buggy code - making up
new interfaces isn't going to stop them.



> 
> > 
> > ...
> > 
> > > > > Well, I used to think that it's better to do things in user space.  Hence,
> > > > > the hibernate user space interface that's used by many people.  And my
> > > > > experience with that particular thing made me think that doing things in
> > > > > the kernel may actually work better, even if they _can_ be done in user space.
> > > > > 
> > > > > Obviously, that doesn't apply to everything, but sometimes it simply is worth
> > > > > discussing (if not trying).  If it doesn't work out, then fine, let's do it
> > > > > differently, but I'm really not taking the "this should be done in user space"
> > > > > argument at face value any more.  Sorry about that.
> > > > 
> > > > :-)  I have had similar mixed experiences.   Sometimes it can be a lot easier
> > > > to get things working if it is all in the kernel.
> > > > But I think that doing things in user-space leads to a lot more flexibility.
> > > > Once you have the interfaces and designs worked out you can then start doing
> > > > more interesting things and experimenting with ideas more easily.
> > > > 
> > > > In this case, I think the *only* barrier to a simple solution in user-space
> > > > is the pre-existing software that uses the 'old' kernel interface.  It seems
> > > > that interfacing with that is as easy as adding a script or two to pm-utils.
> > > 
> > > Well, assuming that we're only going to address the systems that use PM utils.
> > 
> > I suspect (and claim without proof :-) that any system will have some single
> > user-space thing that is responsible for initiating suspend.
> 
> Well, see above.

See also upowerd.


> 
> > Every time I look at one I see a whole host of things that need to be done
> > just before suspend, and other things just after resume.
> > They used to be in /etc/apm/event.d.  Now there are
> > in /usr/lib/pm-utils/sleep.d.
> 
> I know of systems that don't need those hooks, however.
> 
> > I think they were in /etc/acpid once.
> > I've seen one thing that uses shared-library modules instead of shell scripts
> > on the basis that it avoids forking and goes fast (and it probably does).
> > But I doubt there is any interesting system where writing to /sys/power/state
> > is the *only* thing you need to do for a clean suspend.
> 
> I have such a system on my desk. :-)

:-)
I guess I would have to conclude that it is therefore not interesting :-)

Would you accept that it is more of an exception than the rule?

The real point though is that lots of systems do want pre/post scripts, so we
can expect that avoiding races between such scripts is a solved problem - and
this is what we find in e.g. upowerd.


> 
> > So all systems will have some user-space infrastructure to support suspend,
> > and we just need to hook in to that.
> > 
> > 
> > > 
> > > > With that problem solved, experimenting is much easier in user-space than in
> > > > the kernel.
> > > 
> > > Somehow, I'm not exactly sure if we should throw all kernel-based solutions away
> > > just yet.
> > 
> > My rule-of-thumb is that we should reserve kernel space for when
> >   a/ it cannot be done in user space
> >   b/ it cannot be done efficient in user space
> >   c/ it cannot be done securely in user space
> > 
> > I don't think any of those have been demonstrated yet.  If/when they are it
> > would be good to get those kernel-based solutions out of the draw (so yes:
> > keep them out of the rubbish bin).
> 
> I have one more rule.  If my would-be user space solution has the following
> properties:
> 
> * It is supposed to be used by all of the existing variants of user space
>   (i.e. all existing variants of user space are expected to use the very same
>   thing).
> 
> * It requires all of those user space variants to be modified to work with it
>   correctly.
> 
> * It includes a daemon process having to be started on boot and run permanently.
> 
> then it likely is better to handle the problem in the kernel.

By that set of rules, upowerd, dbus, pulse audio, bluez, and probably systemd
all need to go in the kernel.  My guess is that you might not find wide
acceptance for these rules.


> 
> > So I'd respond with "I'm not at all sure that we should throw away an
> > all-userspace solution just yet".  Particularly because many of us seem to
> > still be working to understand what all the issues really are.
> 
> OK, so perhaps we should try to implement two concurrent solutions, one
> kernel-based and one purely in user space and decide which one is better
> afterwards?

Absolutely.

My primary reason for entering this discussion is eloquently presented in
       http://xkcd.com/386/

Someone said "We need to change the kernel to get race-free suspend" and this
simply is not true.  I wanted to present a way to use the existing
functionality to provide race-free suspend - and now even have code to do it.

If someone else wants to write a different implementation, either in
userspace or kernel that is fine.

If they can then present it as "I know this can be implemented in userspace,
but I don't like that solution for reasons X, Y, Z and so here is my better
kernel-space implementation", then that is cool.  We can examine X, Y, Z and
the code and see if the argument holds up.  Maybe it will, maybe not.

So far the only arguments I've seen for putting the code in the kernel are:

 1/ it cannot be done in userspace - demonstrably wrong
 2/ it is more efficient in the kernel - not demonstrated or even
    convincingly argued
 3/ doing it in user-space is too confusing - we would need a clear
    demonstration that a kernel interface is less confusing - and still
    correct.  Also the best way to remove confusion is with clear
    documentation and sample code, not by making up new interfaces.
 4/ doing it in the kernel makes it more accessible to multiple desktops.
    The success of freedesktop.org seems to contradict that.

So if you can do it a "better" way, please do.  But also please make sure
you can quantify "better".   I claim that user-space solutions are "better"
because they are more flexible and easier to experiment with.  The "no
regressions" rule actively discourages experimentation in the kernel so
people should only do it if there is a clear benefit.  User-space solutions
are much easier to introduce and then deprecate.

Thanks,
NeilBrown




* Re: lsusd - The Linux SUSpend Daemon
  2011-10-22 16:31                                     ` Alan Stern
@ 2011-10-23  3:31                                       ` NeilBrown
  0 siblings, 0 replies; 80+ messages in thread
From: NeilBrown @ 2011-10-23  3:31 UTC (permalink / raw)
  To: Alan Stern
  Cc: John Stultz, Rafael J. Wysocki, mark gross, Linux PM list, LKML

[-- Attachment #1: Type: text/plain, Size: 1708 bytes --]

On Sat, 22 Oct 2011 12:31:58 -0400 (EDT) Alan Stern
<stern@rowland.harvard.edu> wrote:

> On Fri, 21 Oct 2011, Alan Stern wrote:
> 
> > > Maybe we could do something with futexes...
> > 
> > Not easily -- as far as I can tell, futexes enjoy relatively little 
> > support.  In any case, they provide the same service as a mutex, which 
> > means you'd have to build a shared lock on top of them.
> 
> It occurred to me that we could create a new type of special file, one
> intended to help with interprocess synchronization.  It would support
> locking (shared or exclusive, blocking or non-blocking) and the poll
> system call -- the file would appear to be ready for reading whenever a
> shared lock wouldn't block and ready for writing whenever an exclusive
> lock wouldn't block.  Actual reads and writes wouldn't have to do 
> anything, although maybe someone could suggest a use for them.
> 
> Alan Stern

Tempting...  We would need a good case to get something included, but not to
just experiment.

The approach that I would take would probably be to extend flock() with a new
flag, e.g. LOCK_POLL.
Then
  flock(fd, LOCK_EX | LOCK_POLL)

would try to get a non-blocking exclusive lock on 'fd'.  If that didn't
succeed it would insert a 'block' anyway [locks_insert_block()] and arrange
so that when the lock might succeed, fd gets marked to say that 'lock' might
succeed.
Then POLLPRI becomes enabled and select will trigger if the fd is listed in
the 'exceptfds'.
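
For contrast with the proposed LOCK_POLL flag (which does not exist), here is a minimal sketch of what flock() offers today: a non-blocking attempt that either takes the lock or fails with EWOULDBLOCK, with no way to be notified when a retry might succeed.  The helper names are made up for illustration; only flock(), LOCK_SH, LOCK_EX, LOCK_NB and LOCK_UN are real.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: a client blocks suspend by holding a shared lock. */
int take_shared_lock(int fd)
{
	assert(fd >= 0);
	return flock(fd, LOCK_SH) == 0 ? 0 : -1;
}

/* Hypothetical helper: drop whatever lock is held through this fd. */
int release_lock(int fd)
{
	assert(fd >= 0);
	return flock(fd, LOCK_UN) == 0 ? 0 : -1;
}

/*
 * Hypothetical helper for the suspend daemon: try to take the exclusive
 * lock without blocking.
 * Returns 1 if the lock was taken, 0 if someone else holds a conflicting
 * lock, and -1 on any other error.
 */
int try_exclusive_lock(int fd)
{
	assert(fd >= 0);
	if (flock(fd, LOCK_EX | LOCK_NB) == 0)
		return 1;
	return errno == EWOULDBLOCK ? 0 : -1;
}
```

A caller that gets 0 from try_exclusive_lock() can only retry later or fall back to a blocking flock(fd, LOCK_EX); that missing notification is the gap a LOCK_POLL-style flag would be meant to fill.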

I would then extend that so that lease breaking and dnotify notifications
could come through select/poll rather than as signals...

But this is probably going somewhat off-topic.

NeilBrown



* Re: lsusd - The Linux SUSpend Daemon
  2011-10-22  2:00                                   ` Alan Stern
  2011-10-22 16:31                                     ` Alan Stern
@ 2011-10-23  8:21                                     ` NeilBrown
  2011-10-23 12:48                                       ` Rafael J. Wysocki
  2011-10-23 16:17                                       ` Alan Stern
  1 sibling, 2 replies; 80+ messages in thread
From: NeilBrown @ 2011-10-23  8:21 UTC (permalink / raw)
  To: Alan Stern
  Cc: John Stultz, Rafael J. Wysocki, mark gross, Linux PM list, LKML

[-- Attachment #1: Type: text/plain, Size: 10182 bytes --]

On Fri, 21 Oct 2011 22:00:13 -0400 (EDT) Alan Stern
<stern@rowland.harvard.edu> wrote:

> On Sat, 22 Oct 2011, NeilBrown wrote:
> 
> > > >     It uses files in /var/run/suspend for all communication.
> > > 
> > > I'm not so keen on using files for communication.  At best, they are
> > > rather awkward for two-way messaging.  If you really want to use them,
> > > then at least put them on a non-backed filesystem, like something under
> > > /dev.
> > 
> > Isn't /var/run a tmpfs filesystem?  It should be.
> > Surely /run is, so in the new world order the files should probably go
> > there.   But that is just a detail.
> 
> On my Fedora-14 systems there is no /run, and /var/run is a regular 
> directory in a regular filesystem.
> 
> > I like files...  I particularly like 'flock' to block suspend.   The
> > rest.... whatever..
> > With files, you only need a context switch when there is real communication.
> > With sockets, every message sent must be read so there will be a context
> > switch.
> > 
> > Maybe we could do something with futexes...
> 
> Not easily -- as far as I can tell, futexes enjoy relatively little 
> support.  In any case, they provide the same service as a mutex, which 
> means you'd have to build a shared lock on top of them.
> 
> > > >     lsusd does not try to be event-loop based because:
> > > >       - /sys/power/wakeup_count is not pollable.  This could probably be
> > > >         'fixed' but I want code to work with today's kernel.  It will probably
> > > 
> > > Why does this matter?
> > 
> > In my mind an event based program should never block.  Every action should be
> > non-blocking and only taken when 'poll' says it can.
> > Reading /sys/power/wakeup_count can be read non-blocking, but you cannot find
> > out when it is sensible to try to read it again.  So it doesn't fit.
> 
> There shouldn't be any trouble about making wakeup_count pollable.  It
> also would need to respect nonblocking reads, which it currently does 
> not do.

Hmm.. you are correct.  I wonder why I thought it did support non-blocking
reads...
I guess it was the code for handling an interrupted system call.

I feel a bit uncomfortable with the idea of sysfs files that block but I
don't think I can convincingly argue against it.
A non-blocking flag could be passed in, but it would be a very messy change -
lots of function call signatures changing needlessly:  we would need a flag
to the 'show' method ... or add a 'show_nonblock' method which would also be
ugly.


But I think there is a need to block - if there is an in-progress event then
it must be possible to wait for it to complete as it may not be visible to
userspace until then.
We could easily enable 'poll' for wakeup_count and then make it always
non-blocking, but I'm not really sure I want to require programs to use poll,
only to allow them.  And without using poll there is no way to wait.

As wakeup_count really has to be single-access we could possibly fudge
something by remembering the last value read (like we remember the last value
written).

- if the current count is different from the last value read, then return
  it even if there are in-progress events.
- if the current count is the same as the last value read, then block until
  there are no in-progress events and return the new value.
- enable sysfs_poll on wakeup_count by calling sysfs_notify_dirent at the
  end of wakeup_source_deactivate .... or calling something in
  kernel/power/main.c which calls that.  However we would need to make
  sysfs_notify_dirent a lot lighter weight first.  Maybe I should do that.

Then a process that uses 'poll' could avoid reading wakeup_count except when
it has changed, and then it won't block.  And a process that doesn't use poll
can block by simply reading twice - either explicitly or by going around a
"read, then write, and it fails" loop a second time.
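
The three rules above can be sketched as a small user-space model.  This is only an illustration of the intended behaviour, not kernel code: the struct and function names are invented, and the actual waiting is left out (try_read_count() just reports that a real read would have to wait).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state behind wakeup_count. */
struct count_state {
	unsigned int cnt;       /* registered wakeup events */
	unsigned int inpr;      /* wakeup events still in progress */
	unsigned int last_read; /* value returned by the previous read */
};

/* A read has to wait only if events are in progress AND the count has
 * not changed since the last value that was read. */
bool read_would_block(const struct count_state *s)
{
	assert(s != NULL);
	return s->inpr != 0 && s->cnt == s->last_read;
}

/* Non-waiting part of a read: returns false where a real read would
 * block, otherwise stores the count and remembers it. */
bool try_read_count(struct count_state *s, unsigned int *count)
{
	assert(s != NULL && count != NULL);
	if (read_would_block(s))
		return false;
	s->last_read = s->cnt;
	*count = s->cnt;
	return true;
}

/* A wakeup event starts being processed. */
void event_start(struct count_state *s)
{
	assert(s != NULL);
	s->inpr++;
}

/* A wakeup event completes.  Returns true when this is the first change
 * since the last read, i.e. when pollers would be notified. */
bool event_end(struct count_state *s)
{
	assert(s != NULL && s->inpr > 0);
	s->inpr--;
	s->cnt++;
	return s->cnt == s->last_read + 1;
}
```

With this model, a reader that has seen the current count and reads again while an event is in progress is told to wait, whereas a reader whose last value is stale gets the new count immediately and would only wait on the read after that.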

I'm not sure I'm completely comfortable with that, but it is the best I could
come up with.


> 
> At the worst, you could always have a separate thread to read 
> wakeup_count.

Maybe.  I'm tempted simply not to worry about the short delay after all, but
the moment I do that, someone will use a wakeup_source to disable suspend
for a longer period of time (just like I suggested) and suddenly it won't be
a short delay after all.

> 
> > > >       - I cannot get an event notification when a lock is removed from a
> > > >         file. :-(  And I think locks are an excellent light-weight
> > > >         mechanism for blocking suspend.
> > > 
> > > Except for this one drawback.  Socket connections are superior in that 
> > > regard.
> > 
> > I'm very happy for someone else write an all-socket based daemon.
> 
> Hmmm...  Maybe I'll take you up on that.
> 
> 
> > > >       lsused will send a 'S' message to the client and await an 'R' reply
> > > >       (S == suspend, R == ready).  When all replies are in, lsused will
> > > >       allow the suspend to complete.  When it does (or aborts), it will send
> > > >       'A' (awake) to those clients to which it sent 'S'.
> > > 
> > > But not to the client which failed to send an 'R'?
> > 
> > Every client must send an R before suspend can continue.
> 
> I was referring to the case where you abort before receiving an 'R'.  
> The current suspend attempt will fail, but then what happens during the
> next attempt?

There is no situation in which I abort before receiving an 'R'.
I wait until all 'R's are in before I even check if an abort was requested.
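
As a rough sketch of that ordering (every client that was sent 'S' must answer 'R' before an abort request is even looked at, and all of them get 'A' afterwards), the following model may help.  It is not the lsusd code; the types and function names are invented for illustration and no real messaging is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-client bookkeeping in the daemon. */
struct client {
	char last_sent; /* 'S', 'A', or 0 if nothing was sent yet */
	bool ready;     /* client has answered 'R' to the last 'S' */
};

enum outcome { OUTCOME_WAITING, OUTCOME_ABORTED, OUTCOME_SUSPEND };

/* Send 'S' to every client and forget earlier replies. */
void request_suspend(struct client *c, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		c[i].last_sent = 'S';
		c[i].ready = false;
	}
}

/* Record an 'R' reply from one client. */
void client_ready(struct client *c)
{
	assert(c != NULL && c->last_sent == 'S');
	c->ready = true;
}

/* The abort flag is only consulted once every 'R' is in. */
enum outcome decide(const struct client *c, size_t n, bool abort_requested)
{
	for (size_t i = 0; i < n; i++)
		if (!c[i].ready)
			return OUTCOME_WAITING;
	return abort_requested ? OUTCOME_ABORTED : OUTCOME_SUSPEND;
}

/* After the suspend completes or aborts, send 'A' to everyone who got 'S'. */
void announce_awake(struct client *c, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (c[i].last_sent == 'S')
			c[i].last_sent = 'A';
}
```

Because decide() keeps returning OUTCOME_WAITING until all replies are in, no client is left holding an unanswered 'S' when the next suspend attempt starts.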


NeilBrown



Sample untested patch to allow non-blocking reads and poll on wakeup_count.

diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
index 84f7c7d..54543ba 100644
--- a/drivers/base/power/wakeup.c
+++ b/drivers/base/power/wakeup.c
@@ -44,6 +44,8 @@ static void split_counters(unsigned int *cnt, unsigned int *inpr)
 
 /* A preserved old value of the events counter. */
 static unsigned int saved_count;
+/* Record last value that was read from events counter. */
+static unsigned int last_read_count;
 
 static DEFINE_SPINLOCK(events_lock);
 
@@ -410,6 +412,7 @@ static void wakeup_source_deactivate(struct wakeup_source *ws)
 {
 	ktime_t duration;
 	ktime_t now;
+	unsigned int comb;
 
 	ws->relax_count++;
 	/*
@@ -440,7 +443,10 @@ static void wakeup_source_deactivate(struct wakeup_source *ws)
 	 * Increment the counter of registered wakeup events and decrement the
 	 * couter of wakeup events in progress simultaneously.
 	 */
-	atomic_add(MAX_IN_PROGRESS, &combined_event_count);
+	comb = atomic_add_return(MAX_IN_PROGRESS, &combined_event_count);
+
+	if ((comb >> IN_PROGRESS_BITS) == last_read_count + 1)
+		wakeup_count_changed();
 }
 
 /**
@@ -624,15 +630,24 @@ bool pm_get_wakeup_count(unsigned int *count)
 
 	for (;;) {
 		split_counters(&cnt, &inpr);
-		if (inpr == 0 || signal_pending(current))
+		if (inpr == 0 || cnt != last_read_count)
 			break;
+		if (signal_pending(current))
+			return false;
 		pm_wakeup_update_hit_counts();
 		schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT));
 	}
 
+	last_read_count = cnt;
+	/* If cnt has just changed, then last_read_count will be a bit
+	 * old, so we won't block on the next read, only the one after.
+	 * However this ensures wakeup_source_deactivate doesn't
+	 * miss out on calling wakeup_count_changed() on a change.
+	 */
+	smp_wmb();
 	split_counters(&cnt, &inpr);
 	*count = cnt;
-	return !inpr;
+	return true;
 }
 
 /**
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 1ad8c93..41361ba 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -444,18 +444,19 @@ static unsigned int sysfs_poll(struct file *filp, poll_table *wait)
 
 void sysfs_notify_dirent(struct sysfs_dirent *sd)
 {
-	struct sysfs_open_dirent *od;
-	unsigned long flags;
+	if (sd->s_attr.open) {
+		struct sysfs_open_dirent *od;
+		unsigned long flags;
+		spin_lock_irqsave(&sysfs_open_dirent_lock, flags);
 
-	spin_lock_irqsave(&sysfs_open_dirent_lock, flags);
+		od = sd->s_attr.open;
+		if (od) {
+			atomic_inc(&od->event);
+			wake_up_interruptible(&od->poll);
+		}
 
-	od = sd->s_attr.open;
-	if (od) {
-		atomic_inc(&od->event);
-		wake_up_interruptible(&od->poll);
+		spin_unlock_irqrestore(&sysfs_open_dirent_lock, flags);
 	}
-
-	spin_unlock_irqrestore(&sysfs_open_dirent_lock, flags);
 }
 EXPORT_SYMBOL_GPL(sysfs_notify_dirent);
 
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 6bbcef2..e42b7f9 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -296,6 +296,7 @@ extern bool events_check_enabled;
 extern bool pm_wakeup_pending(void);
 extern bool pm_get_wakeup_count(unsigned int *count);
 extern bool pm_save_wakeup_count(unsigned int count);
+extern void wakeup_count_changed(void);
 #else /* !CONFIG_PM_SLEEP */
 
 static inline int register_pm_notifier(struct notifier_block *nb)
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 6c601f8..6b3cd80 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -231,6 +231,8 @@ power_attr(state);
  * are any wakeup events detected after 'wakeup_count' was written to.
  */
 
+struct sysfs_dirent *wakeup_count_dirent;
+
 static ssize_t wakeup_count_show(struct kobject *kobj,
 				struct kobj_attribute *attr,
 				char *buf)
@@ -253,6 +255,12 @@ static ssize_t wakeup_count_store(struct kobject *kobj,
 	return -EINVAL;
 }
 
+void wakeup_count_changed(void)
+{
+	if (wakeup_count_dirent)
+		sysfs_notify_dirent(wakeup_count_dirent);
+}
+
 power_attr(wakeup_count);
 #endif /* CONFIG_PM_SLEEP */
 
@@ -342,7 +350,11 @@ static int __init pm_init(void)
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
-	return sysfs_create_group(power_kobj, &attr_group);
+	error = sysfs_create_group(power_kobj, &attr_group);
+#ifdef CONFIG_PM_SLEEP
+	wakeup_count_dirent = sysfs_get_dirent(power_kobj->sd, NULL, "wakeup_count");
+#endif
+	return error;
 }
 
 core_initcall(pm_init);






* Re: lsusd - The Linux SUSpend Daemon
  2011-10-23  8:21                                     ` NeilBrown
@ 2011-10-23 12:48                                       ` Rafael J. Wysocki
  2011-10-23 23:04                                         ` NeilBrown
  2011-10-23 16:17                                       ` Alan Stern
  1 sibling, 1 reply; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-23 12:48 UTC (permalink / raw)
  To: NeilBrown; +Cc: Alan Stern, John Stultz, mark gross, Linux PM list, LKML

On Sunday, October 23, 2011, NeilBrown wrote:
> On Fri, 21 Oct 2011 22:00:13 -0400 (EDT) Alan Stern
> <stern@rowland.harvard.edu> wrote:
> 
> > On Sat, 22 Oct 2011, NeilBrown wrote:
> > 
> > > > >     It uses files in /var/run/suspend for all communication.
> > > > 
> > > > I'm not so keen on using files for communication.  At best, they are
> > > > rather awkward for two-way messaging.  If you really want to use them,
> > > > then at least put them on a non-backed filesystem, like something under
> > > > /dev.
> > > 
> > > Isn't /var/run a tmpfs filesystem?  It should be.
> > > Surely /run is, so in the new world order the files should probably go
> > > there.   But that is just a detail.
> > 
> > On my Fedora-14 systems there is no /run, and /var/run is a regular 
> > directory in a regular filesystem.
> > 
> > > I like files...  I particularly like 'flock' to block suspend.   The
> > > rest.... whatever..
> > > With files, you only need a context switch when there is real communication.
> > > With sockets, every message sent must be read so there will be a context
> > > switch.
> > > 
> > > Maybe we could do something with futexes...
> > 
> > Not easily -- as far as I can tell, futexes enjoy relatively little 
> > support.  In any case, they provide the same service as a mutex, which 
> > means you'd have to build a shared lock on top of them.
> > 
> > > > >     lsusd does not try to be event-loop based because:
> > > > >       - /sys/power/wakeup_count is not pollable.  This could probably be
> > > > >         'fixed' but I want code to work with today's kernel.  It will probably
> > > > 
> > > > Why does this matter?
> > > 
> > > In my mind an event based program should never block.  Every action should be
> > > non-blocking and only taken when 'poll' says it can.
> > > Reading /sys/power/wakeup_count can be read non-blocking, but you cannot find
> > > out when it is sensible to try to read it again.  So it doesn't fit.
> > 
> > There shouldn't be any trouble about making wakeup_count pollable.  It
> > also would need to respect nonblocking reads, which it currently does 
> > not do.
> 
> Hmm.. you are correct.  I wonder why I thought it did support non-blocking
> reads...
> I guess it was the code for handling an interrupted system call.
> 
> I feel a bit uncomfortable with the idea of sysfs files that block but I
> don't think I can convincingly argue against it.
> A non-blocking flag could be passed in, but it would be a very messy change -
> lots of function call signatures changing needlessly:  we would need a flag
> to the 'show' method ... or add a 'show_nonblock' method which would also be
> ugly.
> 
> 
> But I think there is a need to block - if there is an in-progress event then
> it must be possible to wait for it to complete as it may not be visible to
> userspace until then.
> We could easily enable 'poll' for wakeup_count and then make it always
> non-blocking, but I'm not really sure I want to require programs to use poll,
> only to allow them.  And without using poll there is no way to wait.
> 
> As wakeup_count really has to be single-access we could possibly fudge
> something by remembering the last value read (like we remember the last value
> written).
> 
> - if the current count is different from the last value read, then return
>   it even if there are in-progress events.
> - if the current count is the same as the last value read, then block until
>   there are no in-progress events and return the new value.
> - enable sysfs_poll on wakeup_count by calling sysfs_notify_dirent at the
>   end of wakeup_source_deactivated .... or calling something in
>   kernel/power/main.c which calls that.  However we would need to make
>   sysfs_notify_dirent a lot lighter weight first.  Maybe I should do that.
> 
> Then a process that uses 'poll' could avoid reading wakeup_count except when
> it has changed, and then it won't block.  And a process that doesn't use poll
> can block by simply reading twice - either explicitly or by going around a 
>    read then write and it fails
> loop a second time.
> 
> I'm not sure I'm completely comfortable with that, but it is the best I could
> come up with.

Well, you're now considering doing more and more changes to the kernel
just to be able to implement something in user space to avoid making
some _other_ changes to the kernel.  That doesn't sound right to me.

Thanks,
Rafael


* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-23  2:57             ` NeilBrown
@ 2011-10-23 13:16               ` Rafael J. Wysocki
  2011-10-23 23:44                 ` NeilBrown
  0 siblings, 1 reply; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-23 13:16 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux PM list, mark gross, LKML, John Stultz, Alan Stern

On Sunday, October 23, 2011, NeilBrown wrote:
> On Sun, 23 Oct 2011 00:07:33 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > On Tuesday, October 18, 2011, NeilBrown wrote:
> > > On Tue, 18 Oct 2011 00:02:30 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > > 
> > > > On Monday, October 17, 2011, NeilBrown wrote:
> > > > > On Sun, 16 Oct 2011 00:10:40 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > > > ...
> > > > > > 
> > > > > > >  But I think it is very wrong to put some hack in the kernel like your
> > > > > > >    suspend_mode = disabled
> > > > > > 
> > > > > > Why is it wrong and why do you think it is a "hack"?
> > > > > 
> > > > > I think it is a "hack" because it is addressing a specific complaint rather
> > > > > than fixing a real problem.
> > > > 
> > > > I wonder why you think that there's no real problem here.
> > > > 
> > > > The problem I see is that multiple processes can use the suspend/hibernate
> > > > interfaces pretty much at the same time (not exactly in parallel, because
> > > > there's some locking in there, but very well there may be two different
> > > > processes operating /sys/power/state independently of each other), while
> > > > the /sys/power/wakeup_count interface was designed with the assumption that
> > > > there will be only one such process in mind.
> > > 
> > > Multiple processes can write to your mail box at the same time.  But somehow
> > > they don't.  This isn't because the kernel enforces anything, but because all
> > > the relevant programs have an agreed protocol by which they arbitrate access.
> > > Once upon a time this involved creating a lock file with O_CREAT|O_EXCL.
> > > These days it is fcntl locking.  But it is still advisory.
> > > 
> > > In the same way - we stop multiple processes from suspending/hibernating at
> > > the same time by having an agreed protocol by which they share access to the
> > > resource.  The kernel does not need to be explicitly involved in this.
> > 
> > Not really.  The main difference is that such a protocol doesn't exist for
> > processes that may want to suspend/hibernate the system.
> > 
> > Moreover, the race is real, because if you have two processes trying to use
> > /sys/power/wakeup_count at the same time, you can get:
> > 
> > Process A		Process B
> > read from wakeup_count
> > talk to apps
> > write to wakeup_count
> > --------- wakeup event ----------
> > 			read from wakeup_count
> > 			talk to apps
> > 			write to wakeup_count
> > try to suspend -> success (should be failure, because the wakeup event
> > may still be processed by applications at this point and Process A hasn't
> > checked that).
> > 
> > Now, there are systems running two (or more) desktop environments each of
> > which has a power manager that may want to suspend on its own.  They both
> > will probably use pm-utils, but then I somehow doubt that pm-utils is well
> > prepared to handle such concurrency.
> 
> I think that "upowerd" is the current "solution" to this problem.  Different
> desktops can communicate with it to negotiate when suspend will happen.
> 
> When upowerd decides to suspend, it calls the relevant pm_utils command.
> 
> So with modern desktops we would never expect two different processes to be
> requesting pm_utils to suspend at the same time.  If we did that would be a
> problem, but we don't.  There is no race here to fix.

I have a slightly different view on that.  Since there is no mechanism to
prevent the race from occurring, we need to assume the race is going to happen
in certain situations.
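
To make the interleaving from the diagram above concrete, here is a toy
Python model of the wakeup_count bookkeeping.  It is only a sketch under
simplifying assumptions: the class and method names are invented for
illustration, in-progress events are ignored, and the real kernel logic is
more involved.

```python
class WakeupCountModel:
    """Toy model of /sys/power/wakeup_count (illustrative, not kernel code)."""

    def __init__(self):
        self.count = 0      # wakeup events registered so far
        self.saved = None   # value stored by the last successful write

    def read(self):
        return self.count

    def write(self, value):
        # A write only succeeds if no event arrived since the matching read.
        if value != self.count:
            return False
        self.saved = value
        return True

    def wakeup_event(self):
        self.count += 1

    def suspend(self):
        # Suspend is aborted if events arrived after the last successful write.
        return self.saved is None or self.saved == self.count


def replay_race():
    """Replay the Process A / Process B interleaving; return A's suspend result."""
    kernel = WakeupCountModel()
    a = kernel.read()           # Process A reads
    assert kernel.write(a)      # Process A writes back, check is armed
    kernel.wakeup_event()       # wakeup event arrives
    b = kernel.read()           # Process B reads the new count
    assert kernel.write(b)      # Process B writes back, overwriting A's value
    return kernel.suspend()     # Process A suspends, and nothing stops it
```

In this model replay_race() returns True, i.e. the suspend goes ahead even
though Process A never learned about the wakeup event.  With Process A alone
the same event makes suspend() return False.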

> I'm not certain that upowerd provides good interfaces.  But its existence
> shows that this sort of problem that you see is not that hard to solve.
> 
> Sure: people could still design systems  which exhibited racy access to
> suspend, but people have always been able to write buggy code - making up
> new interfaces isn't going to stop them.

I'd say that depends on what the new interfaces are.  For one, I wouldn't
agree with the statement that we couldn't invent interfaces that were better
than what we had. :-)

> > 
> > > 
> > > ...
> > > 
> > > > > > Well, I used to think that it's better to do things in user space.  Hence,
> > > > > > the hibernate user space interface that's used by many people.  And my
> > > > > > experience with that particular thing made me think that doing things in
> > > > > > the kernel may actually work better, even if they _can_ be done in user space.
> > > > > > 
> > > > > > Obviously, that doesn't apply to everything, but sometimes it simply is worth
> > > > > > discussing (if not trying).  If it doesn't work out, then fine, let's do it
> > > > > > differently, but I'm really not taking the "this should be done in user space"
> > > > > > argument at face value any more.  Sorry about that.
> > > > > 
> > > > > :-)  I have had similar mixed experiences.   Sometimes it can be a lot easier
> > > > > to get things working if it is all in the kernel.
> > > > > But I think that doing things in user-space leads to a lot more flexibility.
> > > > > Once you have the interfaces and designs worked out you can then start doing
> > > > > more interesting things and experimenting with ideas more easily.
> > > > > 
> > > > > In this case, I think the *only* barrier to a simple solution in user-space
> > > > > is the pre-existing software that uses the 'old' kernel interface.  It seems
> > > > > that interfacing with that is as easy as adding a script or two to pm-utils.
> > > > 
> > > > Well, assuming that we're only going to address the systems that use PM utils.
> > > 
> > > I suspect (and claim without proof :-) that any system will have some single
> > > user-space thing that is responsible for initiating suspend.
> > 
> > Well, see above.
> 
> See also upowerd.
> 
> 
> > 
> > > Every time I look at one I see a whole host of things that need to be done
> > > just before suspend, and other things just after resume.
> > > They used to be in /etc/apm/event.d.  Now there are
> > > in /usr/lib/pm-utils/sleep.d.
> > 
> > I know of systems that don't need those hooks, however.
> > 
> > > I think they were in /etc/acpid once.
> > > I've seen one thing that uses shared-library modules instead of shell scripts
> > > on the basis that it avoids forking and goes fast (and it probably does).
> > > But I doubt there is any interesting system where writing to /sys/power/state
> > > is the *only* thing you need to do for a clean suspend.
> > 
> > I have such a system on my desk. :-)
> 
> :-)
> I guess I would have to conclude that it is therefore not interesting :-)
> 
> Would you accept that is more of an exception than the rule?

Not really.  For example, on systems that run Android it is not necessary to
do anything in user space before suspending and after resuming.  There is quite
a number of such systems around.

> The real point though is that lots of systems do want pre/post scripts, so we
> can expect that avoiding races between such scripts is a solved problem - and
> this is what we find in e.g. upowerd.

Well, I'm not entirely convinced that this is the case. :-)

> > > So all systems will have some user-space infrastructure to support suspend,
> > > and we just need to hook in to that.
> > > 
> > > 
> > > > 
> > > > > With that problem solved, experimenting is much easier in user-space than in
> > > > > the kernel.
> > > > 
> > > > Somehow, I'm not exactly sure if we should throw all kernel-based solutions away
> > > > just yet.
> > > 
> > > My rule-of-thumb is that we should reserve kernel space for when
> > >   a/ it cannot be done in user space
> > >   b/ it cannot be done efficiently in user space
> > >   c/ it cannot be done securely in user space
> > > 
> > > I don't think any of those have been demonstrated yet.  If/when they are it
> > > would be good to get those kernel-based solutions out of the drawer (so yes:
> > > keep them out of the rubbish bin).
> > 
> > I have one more rule.  If my would-be user space solution has the following
> > properties:
> > 
> > * It is supposed to be used by all of the existing variants of user space
> >   (i.e. all existing variants of user space are expected to use the very same
> >   thing).
> > 
> > * It requires all of those user space variants to be modified to work with it
> >   correctly.
> > 
> > * It includes a daemon process having to be started on boot and run permanently.
> > 
> > then it likely is better to handle the problem in the kernel.
> 
> By that set of rules, upowerd, dbus, pulse audio, bluez, and probably systemd
> all need to go in the kernel.  My guess is that you might not find wide
> acceptance for these rules.

Well, that's not what I thought.  Perhaps I didn't express that precisely
enough.  Take systemd, for example.  You still can design and use a Linux-based
system without systemd, so there's no requirement that _all_ variants of user
space use the given approach.  The choice of whether or not to use systemd
is not a choice between a working and non-working system.

However, this is not the case with the system daemon, because it's supposed
to handle problems that aren't possible to address without it.  So either you
use it, or you end up with a (slightly) broken system.
 
> > > So I'd respond with "I'm not at all sure that we should throw away an
> > > all-userspace solution just yet".  Particularly because many of us seem to
> > > still be working to understand what all the issues really are.
> > 
> > OK, so perhaps we should try to implement two concurrent solutions, one
> > kernel-based and one purely in user space and decide which one is better
> > afterwards?
> 
> Absolutely.
> 
> My primary reason for entering this discussion is eloquently presented in
>        http://xkcd.com/386/
> 
> Someone said "We need to change the kernel to get race-free suspend" and this
> simply is not true.  I wanted to present a way to use the existing
> functionality to provide race-free suspend - and now even have code to do it.
> 
> If someone else wants to write a different implementation, either in
> userspace or kernel that is fine.
> 
> They can then present it as "I know this can be implemented in userspace, but
> I don't like that solution for reasons X, Y, Z and so here is my better
> kernel-space implementation" then that is cool.  We can examine X, Y, Z and
> the code and see if the argument holds up.  Maybe it will, maybe not.
> 
> So far the only arguments I've seen for putting the code in the kernel are:
> 
>  1/ it cannot be done in userspace - demonstrably wrong

I'm not sure if that's correct.  If you meant "it can be done in user space
without _any_ kernel modifications", I probably wouldn't agree.

>  2/ it is more efficient in the kernel - not demonstrated or even
>     convincingly argued

I don't agree with that, but let's see.

>  3/ doing it in user-space is too confusing - we would need a clear
>     demonstration that a kernel interface is less confusing - and still
>     correct.  Also the best way to remove confusion is with clear
>     documentation and sample code, not by making up new interfaces.

The user space solution makes up new interfaces too, although they are
confined to user space.

To me, it all boils down to two factors: (1) the complexity and efficiency
of the code needed to implement the feature and (2) the complexity of the
resulting framework (be it in the kernel or in user space).

>  4/ doing it in the kernel makes it more accessible to multiple desktops.
>     The success of freedesktop.org seems to contradict that.

I don't agree here either.  Is Android a member of freedesktop.org?

> So if you can do it a "better" way, please do.  But also please make sure
> you can quantify "better".   I claim that user-space solutions are "better"
> because they are more flexible and easier to experiment with.  The "no
> regressions" rule actively discourages experimentation in the kernel so
> people should only do it if there is a clear benefit.

You seem to suppose that every kernel modification necessarily has a potential
to lead to some regressions.  I'm not exactly sure if that's correct
(e.g. adding a new driver usually doesn't affect people who don't need it).

> User-space solutions are much easier to introduce and then deprecate.

That's demonstrably incorrect and the counter example is the hibernation user
space interface.  The sheer amount of work needed to implement user
space-driven hibernation and maintain that code shows that it's not exactly
easy and it would be more difficult to deprecate than many existing kernel
interfaces at this point.

So, even if you have implemented something in user space, the "no regressions"
rule and deprecation difficulties will apply to it as well as to the kernel as
soon as you make a sufficient number of people use it.

Thanks,
Rafael


* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-22 22:07           ` Rafael J. Wysocki
  2011-10-23  2:57             ` NeilBrown
@ 2011-10-23 15:50             ` Alan Stern
  2011-10-27 21:06               ` Rafael J. Wysocki
  2011-10-28  0:02               ` NeilBrown
  1 sibling, 2 replies; 80+ messages in thread
From: Alan Stern @ 2011-10-23 15:50 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: NeilBrown, Linux PM list, mark gross, LKML, John Stultz

On Sun, 23 Oct 2011, Rafael J. Wysocki wrote:

> Moreover, the race is real, because if you have two processes trying to use
> /sys/power/wakeup_count at the same time, you can get:
> 
> Process A		Process B
> read from wakeup_count
> talk to apps
> write to wakeup_count
> --------- wakeup event ----------
> 			read from wakeup_count
> 			talk to apps
> 			write to wakeup_count
> try to suspend -> success (should be failure, because the wakeup event
> may still be processed by applications at this point and Process A hasn't
> checked that).
> 
> Now, there are systems running two (or more) desktop environments each of
> which has a power manager that may want to suspend on its own.  They both
> will probably use pm-utils, but then I somehow doubt that pm-utils is well
> prepared to handle such concurrency.

I have no objection to adding a kernel-based mechanism for restricting
the suspend interface to one process at a time.  However, that's just
part of your most recent proposal.  The other part involves
coordinating the requirements of all the processes that may want to
prevent the system from suspending, which is a harder job.

> I have one more rule.  If my would-be user space solution has the following
> properties:
> 
> * It is supposed to be used by all of the existing variants of user space
>   (i.e. all existing variants of user space are expected to use the very same
>   thing).
> 
> * It requires all of those user space variants to be modified to work with it
>   correctly.
> 
> * It includes a daemon process having to be started on boot and run permanently.
> 
> then it likely is better to handle the problem in the kernel.

This reasoning doesn't apply to the second problem of allowing
processes to block suspend.  Whether the solution is implemented in the
kernel or as a daemon, other programs will have to be modified to
accommodate it.

In fact, if it's done properly then these other programs should each
need only a single set of modifications; the differences involved in 
communicating with the kernel vs. a daemon could be encapsulated in a 
shared library.

Overall, I think the discussion is getting a little muddled because of
a significant problem that has not yet been addressed sufficiently.

There is a big difference between Android's kernel wakelocks and the
currently proposed use of wakeup_sources.  In Android, a kernel
wakelock associated with an input device isn't released until the
device's queue becomes empty, whereas we have been talking about
releasing the corresponding wakeup_source as soon as data added to
the queue becomes visible to userspace.

This is quite a significant difference.  It means there's a window of
time (from when the data is added to the queue to when it is removed)  
during which userspace is forced to cope with suspend races, instead of
letting the kernel handle things.  This is what leads to our problems
about sending fd's to the daemon process and sending a request to each
client before the daemon starts a suspend.
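
The window described above can be illustrated with a small Python model.
This is a sketch only: the class, the two policy names and the method names
are assumptions made up for the example and do not correspond to any kernel
or Android API.

```python
from collections import deque


class InputQueueModel:
    """Toy input queue with two suspend-blocking policies (illustrative)."""

    def __init__(self, policy):
        if policy not in ("hold_until_empty", "release_when_visible"):
            raise ValueError(policy)
        self.policy = policy
        self.queue = deque()
        self.delivering = False   # event seen by the kernel, not yet queued

    def begin_event(self):
        self.delivering = True

    def make_visible(self, data):
        # The data is now in the queue, where userspace can read it.
        self.queue.append(data)
        self.delivering = False

    def read(self):
        return self.queue.popleft() if self.queue else None

    def kernel_blocks_suspend(self):
        if self.delivering:
            return True
        if self.policy == "hold_until_empty":
            # Android-style: hold on until userspace has drained the queue.
            return bool(self.queue)
        # Release as soon as the data is visible to userspace.
        return False


def window_state(policy):
    """Return (blocked while data is queued, blocked after it was read)."""
    q = InputQueueModel(policy)
    q.begin_event()
    q.make_visible("key press")
    in_window = q.kernel_blocks_suspend()
    q.read()
    return in_window, q.kernel_blocks_suspend()
```

With "hold_until_empty" the kernel still blocks suspend while the data sits
in the queue.  With "release_when_visible" it does not, so userspace has to
cover that interval on its own.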

(Other aspects of this problem that haven't been mentioned before: What
happens when a client program using the notify-fd API wants to close
one of the wakeup-capable fd's?  It would have to tell the daemon to
close its copy of the fd as well.  And likewise, a client would have to 
inform the daemon whenever it opened a new wakeup-capable device file.)

Now, in the end, I think our approach makes more sense in a general 
setting.  The Android approach is okay for a restricted environment 
where you know beforehand exactly which devices will be wakeup-capable 
and which wakeup events will be monitored by userspace programs.  But 
for the whole range of Linux-based systems, the kernel can't rely on 
such information.

(If you think back to the original wakelock patches, for example,
you'll remember that the patch descriptions were expressed in terms of
what happens as the screen is turned on and off.  Obviously this is
meaningless for systems that, unlike an Android phone, don't have a
built-in screen.  I complained about this at the time, and the Android
people seemed to have a hard time understanding what I was objecting
to.)

So this is really our biggest problem.  If we can figure out a really
good way to solve it, I predict we'll find that the kernel-based and
daemon-based suspend solutions are extremely similar.

Alan Stern


* Re: lsusd - The Linux SUSpend Daemon
  2011-10-23  8:21                                     ` NeilBrown
  2011-10-23 12:48                                       ` Rafael J. Wysocki
@ 2011-10-23 16:17                                       ` Alan Stern
  1 sibling, 0 replies; 80+ messages in thread
From: Alan Stern @ 2011-10-23 16:17 UTC (permalink / raw)
  To: NeilBrown; +Cc: John Stultz, Rafael J. Wysocki, mark gross, Linux PM list, LKML

On Sun, 23 Oct 2011, NeilBrown wrote:

> > There shouldn't be any trouble about making wakeup_count pollable.  It
> > also would need to respect nonblocking reads, which it currently does 
> > not do.
> 
> Hmm.. you are correct.  I wonder why I thought it did support non-blocking
> reads...
> I guess it was the code for handling an interrupted system call.
> 
> I feel a bit uncomfortable with the idea of sysfs files that block but I
> don't think I can convincingly argue against it.
> A non-blocking flag could be passed in, but it would be a very messy change -
> lots of function call signatures changing needlessly:  we would need a flag
> to the 'show' method ... or add a 'show_nonblock' method which would also be
> ugly.

Right.  Sysfs is pretty inflexible.

> But I think there is a need to block - if there is an in-progress event then
> it must be possible to wait for it to complete as it may not be visible to
> userspace until then.
> We could easily enable 'poll' for wakeup_count and then make it always
> non-blocking, but I'm not really sure I want to require programs to use poll,
> only to allow them.  And without using poll there is no way to wait.
> 
> As wakeup_count really has to be single-access we could possibly fudge
> something by remembering the last value read (like we remember the last value
> written).

A simpler approach would be to add a nonblocking variant:  
/sys/power/wakeup_count_nb.  It would make sense to support poll for 
this file; poll isn't very useful for the wakeup_count file.

Alan Stern



* Re: lsusd - The Linux SUSpend Daemon
  2011-10-23 12:48                                       ` Rafael J. Wysocki
@ 2011-10-23 23:04                                         ` NeilBrown
  0 siblings, 0 replies; 80+ messages in thread
From: NeilBrown @ 2011-10-23 23:04 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, John Stultz, mark gross, Linux PM list, LKML

[-- Attachment #1: Type: text/plain, Size: 5777 bytes --]

On Sun, 23 Oct 2011 14:48:22 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> On Sunday, October 23, 2011, NeilBrown wrote:
> > On Fri, 21 Oct 2011 22:00:13 -0400 (EDT) Alan Stern
> > <stern@rowland.harvard.edu> wrote:
> > 
> > > On Sat, 22 Oct 2011, NeilBrown wrote:
> > > 
> > > > > >     It uses files in /var/run/suspend for all communication.
> > > > > 
> > > > > I'm not so keen on using files for communication.  At best, they are
> > > > > rather awkward for two-way messaging.  If you really want to use them,
> > > > > then at least put them on a non-backed filesystem, like something under
> > > > > /dev.
> > > > 
> > > > Isn't /var/run a tmpfs filesystem?  It should be.
> > > > Surely /run is, so in the new world order the files should probably go
> > > > there.   But that is just a detail.
> > > 
> > > On my Fedora-14 systems there is no /run, and /var/run is a regular 
> > > directory in a regular filesystem.
> > > 
> > > > I like files...  I particularly like 'flock' to block suspend.   The
> > > > rest.... whatever..
> > > > With files, you only need a context switch when there is real communication.
> > > > With sockets, every message sent must be read so there will be a context
> > > > switch.
> > > > 
> > > > Maybe we could do something with futexes...
> > > 
> > > Not easily -- as far as I can tell, futexes enjoy relatively little 
> > > support.  In any case, they provide the same service as a mutex, which 
> > > means you'd have to build a shared lock on top of them.
> > > 
> > > > > >     lsusd does not try to be event-loop based because:
> > > > > >       - /sys/power/wakeup_count is not pollable.  This could probably be
> > > > > >         'fixed' but I want code to work with today's kernel.  It will probably
> > > > > 
> > > > > Why does this matter?
> > > > 
> > > > In my mind an event based program should never block.  Every action should be
> > > > non-blocking and only taken when 'poll' says it can.
> > > > /sys/power/wakeup_count can be read non-blocking, but you cannot find
> > > > out when it is sensible to try to read it again.  So it doesn't fit.
> > > 
> > > There shouldn't be any trouble about making wakeup_count pollable.  It
> > > also would need to respect nonblocking reads, which it currently does 
> > > not do.
> > 
> > Hmm.. you are correct.  I wonder why I thought it did support non-blocking
> > reads...
> > I guess it was the code for handling an interrupted system call.
> > 
> > I feel a bit uncomfortable with the idea of sysfs files that block but I
> > don't think I can convincingly argue against it.
> > A non-blocking flag could be passed in, but it would be a very messy change -
> > lots of function call signatures changing needlessly:  we would need a flag
> > to the 'show' method ... or add a 'show_nonblock' method which would also be
> > ugly.
> > 
> > 
> > But I think there is a need to block - if there is an in-progress event then
> > it must be possible to wait for it to complete as it may not be visible to
> > userspace until then.
> > We could easily enable 'poll' for wakeup_count and then make it always
> > non-blocking, but I'm not really sure I want to require programs to use poll,
> > only to allow them.  And without using poll there is no way to wait.
> > 
> > As wakeup_count really has to be single-access we could possibly fudge
> > something by remembering the last value read (like we remember the last value
> > written).
> > 
> > - if the current count is different from the last value read, then return
> >   it even if there are in-progress events.
> > - if the current count is the same as the last value read, then block until
> >   there are no in-progress events and return the new value.
> > - enable sysfs_poll on wakeup_count by calling sysfs_notify_dirent at the
> >   end of wakeup_source_deactivated .... or calling something in
> >   kernel/power/main.c which calls that.  However we would need to make
> >   sysfs_notify_dirent a lot lighter weight first.  Maybe I should do that.
> > 
> > Then a process that uses 'poll' could avoid reading wakeup_count except when
> > it has changed, and then it won't block.  And a process that doesn't use poll
> > can block by simply reading twice - either explicitly or by going around a
> > "read then write and it fails" loop a second time.
> > 
> > I'm not sure I'm completely comfortable with that, but it is the best I could
> > come up with.
> 
> Well, you're now considering doing more and more changes to the kernel
> just to be able to implement something in user space to avoid making
> some _other_ changes to the kernel.  That doesn't sound right to me.

:-)   I thought I might get challenged on something like that.

I think the cases are different though.

I'm not presenting this code as a new feature.  I don't need new features -
I have user-space code which works correctly with the current kernel features.

However the precise usage of wakeup_count is a little unusual in that it
blocks when you read.  That doesn't mean that it cannot be used correctly,
but it might limit the options available to a user-space program which wants
to use it.   I was just looking at ways to generalise the existing interface
so that it matches the rest of the kernel better.  I see it much more as a
bug fix than as a new feature.
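
As a rough illustration of the read semantics proposed in the quoted text
(remember the last value read, and only block when it has not changed), here
is a toy Python model.  It is a sketch, not a patch: the class and method
names are invented, and a read that would block is represented by returning
None instead of actually sleeping.

```python
class ProposedWakeupCount:
    """Toy model of the proposed wakeup_count read rule (illustrative)."""

    def __init__(self):
        self.count = 0          # completed wakeup events
        self.in_progress = 0    # events still being processed
        self.last_read = None   # last value handed to the reader

    def start_event(self):
        self.in_progress += 1

    def finish_event(self):
        self.in_progress -= 1
        self.count += 1

    def read(self):
        """Return the count, or None where a real read would block."""
        if self.count == self.last_read and self.in_progress > 0:
            # Nothing new to report and an event is pending: would block.
            return None
        # Either the count changed or nothing is in progress: return now.
        self.last_read = self.count
        return self.count
```

A reader that sees a changed count gets it immediately even while another
event is in progress, and a second read of the same value is what waits.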

I'm not saying we need this patch, and I'm not even sure I like it.  I just
presented it as part of exploring exactly how the wakeup_count interface
really works.  It is an interface that I like and that does allow the
original suspend-race problem to be solved, but that does not mean it is
necessarily perfect.

Thanks,
NeilBrown




* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-23 13:16               ` Rafael J. Wysocki
@ 2011-10-23 23:44                 ` NeilBrown
  2011-10-24 10:23                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 80+ messages in thread
From: NeilBrown @ 2011-10-23 23:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM list, mark gross, LKML, John Stultz, Alan Stern

[-- Attachment #1: Type: text/plain, Size: 9478 bytes --]

On Sun, 23 Oct 2011 15:16:36 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> On Sunday, October 23, 2011, NeilBrown wrote:
> > On Sun, 23 Oct 2011 00:07:33 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > 
> > > On Tuesday, October 18, 2011, NeilBrown wrote:

> > > > > 
> > > > > > With that problem solved, experimenting is much easier in user-space than in
> > > > > > the kernel.
> > > > > 
> > > > > Somehow, I'm not exactly sure if we should throw all kernel-based solutions away
> > > > > just yet.
> > > > 
> > > > My rule-of-thumb is that we should reserve kernel space for when
> > > >   a/ it cannot be done in user space
> > > >   b/ it cannot be done efficiently in user space
> > > >   c/ it cannot be done securely in user space
> > > > 
> > > > I don't think any of those have been demonstrated yet.  If/when they are it
> > > > would be good to get those kernel-based solutions out of the drawer (so yes:
> > > > keep them out of the rubbish bin).
> > > 
> > > I have one more rule.  If my would-be user space solution has the following
> > > properties:
> > > 
> > > * It is supposed to be used by all of the existing variants of user space
> > >   (i.e. all existing variants of user space are expected to use the very same
> > >   thing).
> > > 
> > > * It requires all of those user space variants to be modified to work with it
> > >   correctly.
> > > 
> > > * It includes a daemon process having to be started on boot and run permanently.
> > > 
> > > then it likely is better to handle the problem in the kernel.
> > 
> > By that set of rules, upowerd, dbus, pulse audio, bluez, and probably systemd
> > all need to go in the kernel.  My guess is that you might not find wide
> > acceptance for these rules.
> 
> Well, that's not what I thought.  Perhaps I didn't express that precisely
> enough.  Take systemd, for example.  You still can design and use a Linux-based
> system without systemd, so there's no requirement that _all_ variants of user
> space use the given approach.  The choice of whether or not to use systemd
> is not a choice between a working and non-working system.
> 
> However, this is not the case with the system daemon, because it's supposed
> to handle problems that aren't possible to address without it.  So either you
> use it, or you end up with a (slightly) broken system.

I think you are seeing a distinction that isn't there.

Every system needs a process to run as 'init' - as PID == 1.
It might be systemd, it might be sysv-init, it might be /bin/sh, but there
are tasks that process must perform and there must be exactly one process
performing those tasks and the rest of the system needs to be able to work
with that task (or ignore it if it is wholly independent).

Similarly every system needs one process to manage suspend.  It can be my
daemon or your daemon or Alan's daemon but it cannot be 2 or more of them
running at the same time as that doesn't make any more sense than having
systemd and init running at the same time.
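
One way to enforce the "exactly one suspend manager" convention purely in
user space is an advisory lock, along the lines of the mailbox analogy used
earlier in the thread.  The Python sketch below is only an assumption about
how that could look: the function name and the lock path are hypothetical,
and it is not taken from any existing daemon.

```python
import fcntl
import os


def claim_suspend_manager(lock_path):
    """Try to become the single suspend manager.

    Returns an open file descriptor holding the lock, or None if the lock
    could not be taken (typically because another process holds it).
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Exclusive and non-blocking: fail at once instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd
```

The lock is dropped when the descriptor is closed or the process exits, so a
crashed manager does not leave a stale lock behind.  Like the mailbox case,
this is advisory: it only helps if every would-be manager takes the lock.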


>  
> > > > So I'd respond with "I'm not at all sure that we should throw away an
> > > > all-userspace solution just yet".  Particularly because many of us seem to
> > > > still be working to understand what all the issues really are.
> > > 
> > > OK, so perhaps we should try to implement two concurrent solutions, one
> > > kernel-based and one purely in user space and decide which one is better
> > > afterwards?
> > 
> > Absolutely.
> > 
> > My primary reason for entering this discussion is eloquently presented in
> >        http://xkcd.com/386/
> > 
> > Someone said "We need to change the kernel to get race-free suspend" and this
> > simply is not true.  I wanted to present a way to use the existing
> > functionality to provide race-free suspend - and now even have code to do it.
> > 
> > If someone else wants to write a different implementation, either in
> > userspace or kernel that is fine.
> > 
> > They can then present it as "I know this can be implemented in userspace, but
> > I don't like that solution for reasons X, Y, Z and so here is my better
> > kernel-space implementation" then that is cool.  We can examine X, Y, Z and
> > the code and see if the argument holds up.  Maybe it will, maybe not.
> > 
> > So far the only arguments I've seen for putting the code in the kernel are:
> > 
> >  1/ it cannot be done in userspace - demonstrably wrong
> 
> I'm not sure if that's correct.  If you meant "it can be done in user space
> without _any_ kernel modifications", I probably wouldn't agree.

I have code to do it correctly today with no kernel modifications.  It is
called "lsusd".   Proof by example.  Or can you show that lsusd doesn't work
correctly?
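
For readers following along, the existing wakeup_count handshake that a
single suspend manager relies on can be sketched roughly as below.  This is
not lsusd's code.  The function name, the ready_to_sleep callback and the
power_dir parameter are assumptions made for illustration; on a real system
power_dir would be /sys/power and the caller would need root.

```python
import os


def try_suspend(power_dir, ready_to_sleep, state="mem"):
    """Attempt one race-free suspend cycle; return True if it was started."""
    count_path = os.path.join(power_dir, "wakeup_count")
    state_path = os.path.join(power_dir, "state")

    # 1. Read the current wakeup count (on sysfs this may block while
    #    wakeup events are still in progress).
    with open(count_path) as f:
        count = f.read().strip()

    # 2. Ask the rest of user space whether it is done with earlier events.
    if not ready_to_sleep():
        return False

    try:
        # 3. Write the count back; the kernel rejects the write if a wakeup
        #    event arrived after step 1.
        with open(count_path, "w") as f:
            f.write(count)
        # 4. Request the sleep state; the kernel aborts the transition if
        #    an event arrives after step 3.
        with open(state_path, "w") as f:
            f.write(state)
    except OSError:
        return False
    return True
```

A manager would call this in a loop, retrying after user space has handled
whatever event caused a failure.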


> 
> >  2/ it is more efficient in the kernel - not demonstrated or even
> >     convincingly argued
> 
> I don't agree with that, but let's see.

If you don't agree, then you presumably have a demonstration or a convincing
argument.  Can you share it?

> 
> >  3/ doing it in user-space is too confusing - we would need a clear
> >     demonstration that a kernel interface is less confusing - and still
> >     correct.  Also the best way to remove confusion is with clear
> >     documentation and sample code, not by making up new interfaces.
> 
> The user space solution makes up new interfaces too, although they are
> confined to user space.
> 
> To me, it all boils down to two factors: (1) the complexity and efficiency
> of the code needed to implement the feature and (2) the complexity of the
> resulting framework (be it in the kernel or in user space).
> 
> >  4/ doing it in the kernel makes it more accessible to multiple desktops.
> >     The success of freedesktop.org seems to contradict that.
> 
> I don't agree here either.  Is Android a member of freedesktop.org?
>

This is completely irrelevant.

The "multiple desktops" issue that you brought up is (as I understand it)
multiple desktops running on the same computer, whether concurrently or
sequentially.
Android simply does not face that issue - it is the only "desktop" and is in
complete control of the machine it runs on.
So it doesn't need to solve the issue, so it doesn't need to be a member of
freedesktop.org.


> > So if you can do it a "better" way, please do.  But also please make sure
> > you can quantify "better".   I claim that user-space solutions are "better"
> > because they are more flexible and easier to experiment with.  The "no
> > regressions" rule actively discourages experimentation in the kernel so
> > people should only do it if there is a clear benefit.
> 
> You seem to suppose that every kernel modification necessarily has a potential
> to lead to some regressions.  I'm not exactly sure if that's correct
> (e.g. adding a new driver usually doesn't affect people who don't need it).

I think that experimenting in the kernel (or at least in the upstream kernel)
is likely to result in creating functionality that ultimately will
not get used - the whole point of experimenting is that you probably get it
wrong the first time.
If this happens we either:
  - remove the unwanted functionality, which could be considered a regression
    and so must be done very carefully
  - leave the unwanted functionality there thus creating clutter and a
    maintenance burden.

i.e. the point of the "no-regressions" reference is that it tends to make it
harder to remove mistakes.  Not impossible of course, but it requires a lot
more care and time.

So I am against adding code to the kernel until the problem is really well
understood.  From the sort of discussion that has been going on both in
this thread and elsewhere I'm not convinced the problem really is well
understood at all.
I think we are very much at the stage where people should be experimenting
with solutions, sharing the results, and learning.

So please feel free to publish sample code - whether for the kernel or for
user-space.  But it will only be credible if it is a fairly complete
proposal - e.g. with sample code demonstrating how the kernel features are
used.

(my lsusd really needs a 'plugin' for pm_utils to get it to communicate with
lsusd rather than writing to /sys/power/state ... I should probably add
that.  Then it would be complete and usable on current desktops).



> 
> > User-space solutions are much easier to introduce and then deprecate.
> 
> That's demonstrably incorrect and the counter example is the hibernation user
> space interface.  The sheer amount of work needed to implement user
> space-driven hibernation and maintain that code shows that it's not exactly
> easy and it would be more difficult to deprecate than many existing kernel
> interfaces at this point.
> 
> So, even if you have implemented something in user space, the "no regressions"
> rule and deprecation difficulties will apply to it as well as to the kernel as
> soon as you make a sufficient number of people use it.

Can we agree then that we shouldn't impose any part of a possible solution on
anyone until it has been sensibly tested and reviewed in a variety of
different use cases and found to be reliable and usable?

I think that addresses my main concern with kernel-space additions - I fear
that parts of them will end up unnecessary and unused but we will be stuck
with them.

Thanks,
NeilBrown


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-23 23:44                 ` NeilBrown
@ 2011-10-24 10:23                   ` Rafael J. Wysocki
  2011-10-25  2:52                     ` NeilBrown
  0 siblings, 1 reply; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-24 10:23 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux PM list, mark gross, LKML, John Stultz, Alan Stern

On Monday, October 24, 2011, NeilBrown wrote:
> On Sun, 23 Oct 2011 15:16:36 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > On Sunday, October 23, 2011, NeilBrown wrote:
> > > On Sun, 23 Oct 2011 00:07:33 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > > 
> > > > On Tuesday, October 18, 2011, NeilBrown wrote:
> 
> > > > > > 
> > > > > > > With that problem solved, experimenting is much easier in user-space than in
> > > > > > > the kernel.
> > > > > > 
> > > > > > Somehow, I'm not exactly sure if we should throw all kernel-based solutions away
> > > > > > just yet.
> > > > > 
> > > > > My rule-of-thumb is that we should reserve kernel space for when
> > > > >   a/ it cannot be done in user space
> > > > >   b/ it cannot be done efficiently in user space
> > > > >   c/ it cannot be done securely in user space
> > > > > 
> > > > > I don't think any of those have been demonstrated yet.  If/when they are it
> > > > > would be good to get those kernel-based solutions out of the drawer (so yes:
> > > > > keep them out of the rubbish bin).
> > > > 
> > > > I have one more rule.  If my would-be user space solution has the following
> > > > properties:
> > > > 
> > > > * It is supposed to be used by all of the existing variants of user space
> > > >   (i.e. all existing variants of user space are expected to use the very same
> > > >   thing).
> > > > 
> > > > * It requires all of those user space variants to be modified to work with it
> > > >   correctly.
> > > > 
> > > > * It includes a daemon process having to be started on boot and run permanently.
> > > > 
> > > > then it likely is better to handle the problem in the kernel.
> > > 
> > > By that set of rules, upowerd, dbus, pulse audio, bluez, and probably systemd
> > > all need to go in the kernel.  My guess is that you might not find wide
> > > acceptance for these rules.
> > 
> > Well, that's not what I thought.  Perhaps I didn't express that precisely
> > enough.  Take systemd, for example.  You still can design and use a Linux-based
> > system without systemd, so there's no requirement that _all_ variants of user
> > space use the given approach.  The choice of whether or not to use systemd
> > is not a choice between a working and non-working system.
> > 
> > However, this is not the case with the system daemon, because it's supposed
> > to handle problems that aren't possible to address without it.  So either you
> > use it, or you end up with a (slightly) broken system.
> 
> I think you are seeing a distinction that isn't there.
> 
> Every system needs a process to run as 'init' - as PID == 1.
> It might be systemd, it might be sysv-init, it might be /bin/sh, but there
> are tasks that process must perform and there must be exactly one process
> performing those tasks and the rest of the system needs to be able to work
> with that task (or ignore it if it is wholly independent).
> 
> Similarly every system needs one process to manage suspend.  It can be my
> daemon or your daemon or Alan's daemon but it cannot be 2 or more of them
> running at the same time as that doesn't make any more sense than having
> systemd and init running at the same time.

I agree that it doesn't make sense.  I don't agree that it implies people
won't try to do that.

> > > > > So I'd respond with "I'm not at all sure that we should throw away an
> > > > > all-userspace solution just yet".  Particularly because many of us seem to
> > > > > still be working to understand what all the issues really are.
> > > > 
> > > > OK, so perhaps we should try to implement two concurrent solutions, one
> > > > kernel-based and one purely in user space and decide which one is better
> > > > afterwards?
> > > 
> > > Absolutely.
> > > 
> > > My primary reason for entering this discussion is eloquently presented in
> > >        http://xkcd.com/386/
> > > 
> > > Someone said "We need to change the kernel to get race-free suspend" and this
> > > simply is not true.  I wanted to present a way to use the existing
> > > functionality to provide race-free suspend - and now even have code to do it.
> > > 
> > > If someone else wants to write a different implementation, either in
> > > userspace or kernel that is fine.
> > > 
> > > They can then present it as "I know this can be implemented in userspace, but
> > > I don't like that solution for reasons X, Y, Z and so here is my better
> > > kernel-space implementation" then that is cool.  We can examine X, Y, Z and
> > > the code and see if the argument holds up.  Maybe it will, maybe not.
> > > 
> > > So far the only arguments I've seen for putting the code in the kernel are:
> > > 
> > >  1/ it cannot be done in userspace - demonstrably wrong
> > 
> > I'm not sure if that's correct.  If you meant "it can be done in user space
> > without _any_ kernel modifications", I probably wouldn't agree.
> 
> I have code to do it correctly today with no kernel modifications.  It is
> called "lsusd".   Proof by example.  Or can you show that lsusd doesn't work
> correctly?

So why do you consider making changes to the kernel (described in the other
part of the thread)?  Are they completely cosmetic or are they needed for
functionality?

> > >  2/ it is more efficient in the kernel - not demonstrated or even
> > >     convincingly argued
> > 
> > I don't agree with that, but let's see.
> 
> If you don't agree, then you presumably have a demonstration or a convincing
> argument.  Can you share it?

I think I'll post a patch, but it'll take some time for me to develop it.

> > >  3/ doing it in user-space is too confusing - we would need a clear
> > >     demonstration that a kernel interface is less confusing - and still
> > >     correct.  Also the best way to remove confusion is with clear
> > >     documentation and sample code, not by making up new interfaces.
> > 
> > The user space solution makes up new interfaces too, although they are
> > confined to user space.
> > 
> > To me, it all boils down to two factors: (1) the complexity and efficiency
> > of the code needed to implement the feature and (2) the complexity of the
> > resulting framework (be it in the kernel or in user space).
> > 
> > >  4/ doing it in the kernel makes it more accessible to multiple desktops.
> > >     The success of freedesktop.org seems to contradict that.
> > 
> > I don't agree here either.  Is Android a member of freedesktop.org?
> >
> 
> This is completely irrelevant.
> 
> The "multiple desktops" issue that you brought up is (as I understand it)
> multiple desktops running on the same computer, whether concurrently or
> sequentially.
> Android simply does not face that issue - it is the only "desktop" and is in
> complete control of the machine it runs on.
> So it doesn't need to solve the issue, so it doesn't need to be a member of
> freedesktop.org.

I didn't understand what you meant by "multiple desktops", sorry about that.

> > > So if you can do it a "better" way, please do.  But also please make sure
> > > you can quantify "better".   I claim that user-space solutions are "better"
> > > because they are more flexible and easier to experiment with.  The "no
> > > regressions" rule actively discourages experimentation in the kernel so
> > > people should only do it if there is a clear benefit.
> > 
> > You seem to suppose that every kernel modification necessarily has a potential
> > to lead to some regressions.  I'm not exactly sure if that's correct
> > (e.g. adding a new driver usually doesn't affect people who don't need it).
> 
> I think that experimenting in the kernel (or at least in the upstream kernel)
> is likely to result in creating functionality that ultimately will
> not get used - the whole point of experimenting is that you probably get it
> wrong the first time.
> If this happens we either:
>   - remove the unwanted functionality, which could be considered a regression
>     and so must be done very carefully

Unless nobody uses it, that is. :-)

>   - leave the unwanted functionality there thus creating clutter and a
>     maintenance burden.

I don't see this as a big problem.  I can handle that at least. :-)

> i.e. the point of the "no-regressions" reference is that it tends to make it
> harder to remove mistakes.  Not impossible of course, but it requires a lot
> more care and time.
> 
> So I am against adding code to the kernel until the problem is really well
> understood.  From the sort of discussion that has been going on both in
> this thread and elsewhere I'm not convinced the problem really is well
> understood at all.
> I think we are very much at the stage where people should be experimenting
> with solutions, sharing the results, and learning.
> 
> So please feel free to publish sample code - whether for the kernel or for
> user-space.  But it will only be credible if it is a fairly complete
> proposal - e.g. with sample code demonstrating how the kernel features are
> used.
> 
> (my lsusd really needs a 'plugin' for pm_utils to get it to communicate with
> lsusd rather than writing to /sys/power/state ... I should probably add
> that.  Then it would be complete and usable on current desktops).

I'm actually glad that lsusd has been developed, that's something I've been
advocating for quite a while.  Still, I'm not sure how useful it will turn
out to be for distros etc.

> > > User-space solutions are much easier to introduce and then deprecate.
> > 
> > That's demonstrably incorrect and the counter example is the hibernation user
> > space interface.  The sheer amount of work needed to implement user
> > space-driven hibernation and maintain that code shows that it's not exactly
> > easy and it would be more difficult to deprecate than many existing kernel
> > interfaces at this point.
> > 
> > So, even if you have implemented something in user space, the "no regressions"
> > rule and deprecation difficulties will apply to it as well as to the kernel as
> > soon as you make a sufficient number of people use it.
> 
> Can we agree then that we shouldn't impose any part of a possible solution on
> anyone until it has been sensibly tested and reviewed in a variety of
> different use cases and found to be reliable and usable?

Yes, of course.  That's why my patches in this area have been given the RFC
label in the first place.

> I think that addresses my main concern with kernel-space additions - I fear
> that parts of them will end up unnecessary and unused but we will be stuck
> with them.

OK

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-24 10:23                   ` Rafael J. Wysocki
@ 2011-10-25  2:52                     ` NeilBrown
  2011-10-25  7:47                       ` Valdis.Kletnieks
  0 siblings, 1 reply; 80+ messages in thread
From: NeilBrown @ 2011-10-25  2:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM list, mark gross, LKML, John Stultz, Alan Stern

On Mon, 24 Oct 2011 12:23:43 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> On Monday, October 24, 2011, NeilBrown wrote:
> > On Sun, 23 Oct 2011 15:16:36 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> > Similarly every system needs one process to manage suspend.  It can be my
> > daemon or your daemon or Alan's daemon but it cannot be 2 or more of them
> > running at the same time as that doesn't make any more sense than having
> > systemd and init running at the same time.
> 
> I agree that it doesn't make sense.  I don't agree that it implies people
> won't try to do that.

Does that matter?  If they complain, tell them it isn't a supported
configuration.

> 
> > > > > > So I'd respond with "I'm not at all sure that we should throw away an
> > > > > > all-userspace solution just yet".  Particularly because many of us seem to
> > > > > > still be working to understand what all the issues really are.
> > > > > 
> > > > > OK, so perhaps we should try to implement two concurrent solutions, one
> > > > > kernel-based and one purely in user space and decide which one is better
> > > > > afterwards?
> > > > 
> > > > Absolutely.
> > > > 
> > > > My primary reason for entering this discussion is eloquently presented in
> > > >        http://xkcd.com/386/
> > > > 
> > > > Someone said "We need to change the kernel to get race-free suspend" and this
> > > > simply is not true.  I wanted to present a way to use the existing
> > > > functionality to provide race-free suspend - and now even have code to do it.
> > > > 
> > > > If someone else wants to write a different implementation, either in
> > > > userspace or kernel that is fine.
> > > > 
> > > > They can then present it as "I know this can be implemented in userspace, but
> > > > I don't like that solution for reasons X, Y, Z and so here is my better
> > > > kernel-space implementation" then that is cool.  We can examine X, Y, Z and
> > > > the code and see if the argument holds up.  Maybe it will, maybe not.
> > > > 
> > > > So far the only arguments I've seen for putting the code in the kernel are:
> > > > 
> > > >  1/ it cannot be done in userspace - demonstrably wrong
> > > 
> > > I'm not sure if that's correct.  If you meant "it can be done in user space
> > > without _any_ kernel modifications", I probably wouldn't agree.
> > 
> > I have code to do it correctly today with no kernel modifications.  It is
> > called "lsusd".   Proof by example.  Or can you show that lsusd doesn't work
> > correctly?
> 
> So why do you consider making changes to the kernel (described in the other
> part of the thread)?  Are they completely cosmetic or are they needed for
> functionality?

Not needed.  Maybe helpful.

I have suggested three kernel changes - with varying levels of seriousness.

1/ Changes to wakeup_count so that it can be read without blocking.
   This is currently just a "general cleanliness" issue.  It could become
   more of an issue if some kernel code activated a wakeup_source for a long
   time.
   It is not a problem at all for my current code, but if we wanted a single
   suspend daemon that didn't need threads or a helper process, then it might
   become an issue.

2/ Changes to flock locking so that a process can get notified when a
   lock attempt might succeed.  This is just me grumbling about incomplete
   locking semantics and has nothing to do with power management directly.

3/ Activating a wakeup_source when an RTC alarm fires.  This patch was
   proposed by John Stultz - I just supported it.
   It isn't strictly necessary as the suspend daemon can check the RTC
   just before suspending and refuse to suspend if the alarm will fire in the
   next 2 seconds.
   However this assumes that the suspend will then complete within 2 seconds.
   This seems likely but I don't know that it is guaranteed.  The 2 second
   window could be extended, but that isn't really ideal.
   So this is one kernel change that could be deemed to be "necessary".
   However it isn't really a change in design at all - it just acknowledges
   that the RTC alarm is a wakeup source, so registers a wakeup_source for
   it, so it is really just a bug fix.
   I'm still interested to know what you think of this patch.  While it isn't
   strictly needed I think it would be very helpful.

   (Without this, alarmtimers is racy too ... and  it doesn't even insert a
    2 second window .... I'm not really convinced alarmtimers is a good thing
    but it isn't clear that it is a bad thing either).
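
For illustration only, the check described in item 3 might look roughly like
the sketch below.  It is not taken from lsusd: the function names, the choice
of rtc0 and the 2-second default are assumptions made for the example.  The
since_epoch and wakealarm attributes are the standard RTC class sysfs files.

```python
from pathlib import Path
from typing import Optional

RTC_DIR = Path("/sys/class/rtc/rtc0")  # assumption: the wakeup RTC is rtc0


def alarm_too_close(now: int, alarm: Optional[int], window: int = 2) -> bool:
    """Return True if the alarm will fire within `window` seconds of `now`.

    Both values are seconds since the epoch as reported by the RTC;
    `alarm` is None when no alarm is set.
    """
    if alarm is None:
        return False
    # Only an alarm that is still pending and inside the window counts.
    return now <= alarm <= now + window


def read_rtc_int(name: str) -> Optional[int]:
    """Read an integer sysfs attribute; None if it is missing or empty."""
    try:
        text = (RTC_DIR / name).read_text().strip()
    except OSError:
        return None
    return int(text) if text else None


def safe_to_suspend(window: int = 2) -> bool:
    """Hypothetical pre-suspend check a daemon could run."""
    now = read_rtc_int("since_epoch")
    if now is None:
        return True  # no usable RTC, so there is nothing to compare against
    return not alarm_too_close(now, read_rtc_int("wakealarm"), window)
```

A daemon would call safe_to_suspend() immediately before writing to
/sys/power/state.  As noted above, this still assumes that the suspend itself
completes within the window, which is why a wakeup_source for the RTC alarm
would be the more robust fix.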

NeilBrown

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-25  2:52                     ` NeilBrown
@ 2011-10-25  7:47                       ` Valdis.Kletnieks
  2011-10-25  8:35                         ` Rafael J. Wysocki
  0 siblings, 1 reply; 80+ messages in thread
From: Valdis.Kletnieks @ 2011-10-25  7:47 UTC (permalink / raw)
  To: NeilBrown
  Cc: Rafael J. Wysocki, Linux PM list, mark gross, LKML, John Stultz,
	Alan Stern

On Tue, 25 Oct 2011 13:52:44 +1100, NeilBrown said:
> On Mon, 24 Oct 2011 12:23:43 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>
> > On Monday, October 24, 2011, NeilBrown wrote:
> > > On Sun, 23 Oct 2011 15:16:36 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > > Similarly every system needs one process to manage suspend.  It can be my
> > > daemon or your daemon or Alan's daemon but it cannot be 2 or more of them
> > > running at the same time as that doesn't make any more sense than having
> > > systemd and init running at the same time.
> >
> > I agree that it doesn't make sense.  I don't agree that it implies people
> > won't try to do that.
>
> Does that matter?  If they complain, tell them it isn't a supported
> configuration.

We however *should* design things in such a way that if a second one is started
up, it tosses a nice obvious -EIDIOT error of some sort.
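
One way to get that behaviour with existing interfaces is an exclusive,
non-blocking flock() on a well-known file.  A minimal sketch, assuming a lock
file whose location is agreed on by all suspend daemons (the function name and
the path argument are hypothetical, and a real errno would have to stand in
for -EIDIOT):

```python
import errno
import fcntl
import os


def claim_suspend_daemon_lock(path: str) -> int:
    """Take an exclusive, non-blocking flock on `path` and return the fd.

    Raises RuntimeError if another instance already holds the lock.
    The caller must keep the returned fd open for the daemon's lifetime;
    the kernel drops the lock when the fd is closed or the process exits.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EWOULDBLOCK, errno.EAGAIN):
            raise RuntimeError(
                f"a suspend daemon is already running ({path} is locked)"
            ) from exc
        raise
    return fd
```

A second daemon started by mistake would then fail loudly at startup instead
of silently competing for /sys/power/state.  This only helps if every daemon
agrees on the same lock file, which is a convention rather than something the
kernel enforces.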

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-25  7:47                       ` Valdis.Kletnieks
@ 2011-10-25  8:35                         ` Rafael J. Wysocki
  0 siblings, 0 replies; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-25  8:35 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: NeilBrown, Linux PM list, mark gross, LKML, John Stultz,
	Alan Stern

On Tuesday, October 25, 2011, Valdis.Kletnieks@vt.edu wrote:
> On Tue, 25 Oct 2011 13:52:44 +1100, NeilBrown said:
> > On Mon, 24 Oct 2011 12:23:43 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> >
> > > On Monday, October 24, 2011, NeilBrown wrote:
> > > > On Sun, 23 Oct 2011 15:16:36 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> > > > Similarly every system needs one process to manage suspend.  It can be my
> > > > daemon or your daemon or Alan's daemon but it cannot be 2 or more of them
> > > > running at the same time as that doesn't make any more sense than having
> > > > systemd and init running at the same time.
> > >
> > > I agree that it doesn't make sense.  I don't agree that it implies people
> > > won't try to do that.
> >
> > Does that matter?  If they complain, tell them it isn't a supported
> > configuration.
> 
> We however *should* design things in such a way that if a second one is started
> up, it tosses a nice obvious -EIDIOT error of some sort.

Well, that's exactly my point.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: lsusd - The Linux SUSpend Daemon
  2011-10-21  5:23                             ` lsusd - The Linux SUSpend Daemon NeilBrown
  2011-10-21 16:07                               ` Alan Stern
  2011-10-21 20:10                               ` david
@ 2011-10-26 14:31                               ` Jan Engelhardt
  2011-10-27  4:34                                 ` NeilBrown
  2 siblings, 1 reply; 80+ messages in thread
From: Jan Engelhardt @ 2011-10-26 14:31 UTC (permalink / raw)
  To: NeilBrown
  Cc: Alan Stern, John Stultz, Rafael J. Wysocki, mark gross,
	Linux PM list, LKML


On Friday 2011-10-21 07:23, NeilBrown wrote:
>At:
>    git://neil.brown.name/lsusd
>or
>    http://neil.brown.name/git/lsusd
>
>you can find a bunch of proof-of-concept sample code that implements a
>"Linux SUSpend Daemon" with client support library and test programs.
>
>  lsused:
>      lsused (which needs a better name) listens on the socket
>            /var/run/suspend/registration

(lx)suspd and (lx)susp_eventd?
NB: The short form of Linux is usually lx (not l). Also, yours would be
the first I know to explicitly contain l/lx, which I think is rather
redundant, given I'll never run a type of suspend other than the linux
one anyway.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: lsusd - The Linux SUSpend Daemon
  2011-10-26 14:31                               ` Jan Engelhardt
@ 2011-10-27  4:34                                 ` NeilBrown
  0 siblings, 0 replies; 80+ messages in thread
From: NeilBrown @ 2011-10-27  4:34 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Alan Stern, John Stultz, Rafael J. Wysocki, mark gross,
	Linux PM list, LKML

On Wed, 26 Oct 2011 16:31:06 +0200 (CEST) Jan Engelhardt <jengelh@medozas.de>
wrote:

> 
> On Friday 2011-10-21 07:23, NeilBrown wrote:
> >At:
> >    git://neil.brown.name/lsusd
> >or
> >    http://neil.brown.name/git/lsusd
> >
> >you can find a bunch of proof-of-concept sample code that implements a
> >"Linux SUSpend Daemon" with client support library and test programs.
> >
> >  lsused:
> >      lsused (which needs a better name) listens on the socket
> >            /var/run/suspend/registration
> 
> (lx)suspd and (lx)susp_eventd?
> NB: The short form of Linux is usually lx (not l). Also, yours would be
> the first I know to explicitly contain l/lx, which I think is rather
> redundant, given I'll never run a type of suspend other than the linux
> one anyway.

Good points - thanks.
I'll certainly consider them if/when I try to decide on a better name.

NeilBrown


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-23 15:50             ` Alan Stern
@ 2011-10-27 21:06               ` Rafael J. Wysocki
  2011-10-28  0:02               ` NeilBrown
  1 sibling, 0 replies; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-27 21:06 UTC (permalink / raw)
  To: Alan Stern
  Cc: NeilBrown, Linux PM list, mark gross, LKML, John Stultz,
	Linus Torvalds

On Sunday, October 23, 2011, Alan Stern wrote:
> On Sun, 23 Oct 2011, Rafael J. Wysocki wrote:
> 
> > Moreover, the race is real, because if you have two processes trying to use
> > /sys/power/wakeup_count at the same time, you can get:
> > 
> > Process A		Process B
> > read from wakeup_count
> > talk to apps
> > write to wakeup_count
> > --------- wakeup event ----------
> > 			read from wakeup_count
> > 			talk to apps
> > 			write to wakeup_count
> > try to suspend -> success (should be failure, because the wakeup event
> > may still be processed by applications at this point and Process A hasn't
> > checked that).
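
The interleaving in the diagram above can be replayed with a toy model.  This
is a sketch only: WakeupCountModel is a hypothetical in-memory stand-in for
/sys/power/wakeup_count, not kernel code, and it ignores the fact that a real
read can block while wakeup events are in progress.

```python
class WakeupCountModel:
    """Toy stand-in for /sys/power/wakeup_count (not kernel code)."""

    def __init__(self) -> None:
        self.count = 0     # wakeup events seen so far
        self.saved = None  # value stored by the last successful write

    def read(self) -> int:
        return self.count

    def write(self, value: int) -> bool:
        # The write is accepted only if no event arrived since `value` was read.
        if value != self.count:
            return False
        self.saved = value
        return True

    def wakeup_event(self) -> None:
        self.count += 1

    def try_suspend(self) -> bool:
        # Suspend goes ahead only if the count still matches the saved value.
        return self.saved is not None and self.saved == self.count


def replay_two_writer_race() -> bool:
    """Replay the diagram's interleaving; return Process A's suspend result."""
    wc = WakeupCountModel()
    a_seen = wc.read()        # Process A: read from wakeup_count
    assert wc.write(a_seen)   # Process A: write to wakeup_count
    wc.wakeup_event()         # --------- wakeup event ----------
    b_seen = wc.read()        # Process B: read from wakeup_count
    assert wc.write(b_seen)   # Process B: write to wakeup_count
    return wc.try_suspend()   # Process A: try to suspend


def replay_single_writer() -> bool:
    """Same event, but only one process uses the interface."""
    wc = WakeupCountModel()
    seen = wc.read()
    assert wc.write(seen)
    wc.wakeup_event()
    return wc.try_suspend()
```

In this model replay_two_writer_race() lets the suspend through, because
Process B's write replaced the value that Process A had stored, while
replay_single_writer() correctly refuses.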
> > 
> > Now, there are systems running two (or more) desktop environments each of
> > which has a power manager that may want to suspend on its own.  They both
> > will probably use pm-utils, but then I somehow doubt that pm-utils is well
> > prepared to handle such concurrency.
> 
> I have no objection to adding a kernel-based mechanism for restricting
> the suspend interface to one process at a time.  However, that's just
> part of your most recent proposal.  The other part involves
> coordinating the requirements of all the processes that may want to
> prevent the system from suspending, which is a harder job.
> 
> 
> > I have one more rule.  If my would-be user space solution has the following
> > properties:
> > 
> > * It is supposed to be used by all of the existing variants of user space
> >   (i.e. all existing variants of user space are expected to use the very same
> >   thing).
> > 
> > * It requires all of those user space variants to be modified to work with it
> >   correctly.
> > 
> > * It includes a daemon process having to be started on boot and run permanently.
> > 
> > then it likely is better to handle the problem in the kernel.
> 
> This reasoning doesn't apply to the second problem of allowing
> processes to block suspend.  Whether the solution is implemented in the
> kernel or as a daemon, other programs will have to be modified to
> accommodate it.

That's correct, except if they are Android applications and the new
interface is compatible with what they already use.

> In fact, if it's done properly then these other programs should each
> need only a single set of modifications; the differences involved in 
> communicating with the kernel vs. a daemon could be encapsulated in a 
> shared library.
> 
> 
> Overall, I think the discussion is getting a little muddled because of
> a significant problem that has not yet been addressed sufficiently.
> 
> There is a big difference between Android's kernel wakelocks and the
> currently proposed use of wakeup_sources.  In Android, a kernel
> wakelock associated with an input device isn't released until the
> device's queue becomes empty, whereas we have been talking about
> releasing the corresponding wakeup_source as soon as data added to
> the queue becomes visible to userspace.
> 
> This is quite a significant difference.  It means there's a window of
> time (from when the data is added to the queue to when it is removed)  
> during which userspace is forced to cope with suspend races, instead of
> letting the kernel handle things.  This is what leads to our problems
> about sending fd's to the daemon process and sending a request to each
> client before the daemon starts a suspend.
> 
> (Other aspects of this problem that haven't been mentioned before: What
> happens when a client program using the notify-fd API wants to close
> one of the wakeup-capable fd's?  It would have to tell the daemon to
> close its copy of the fd as well.  And likewise, a client would have to 
> inform the daemon whenever it opened a new wakeup-capable device file.)
> 
> Now, in the end, I think our approach makes more sense in a general 
> setting.  The Android approach is okay for a restricted environment 
> where you know beforehand exactly which devices will be wakeup-capable 
> and which wakeup events will be monitored by userspace programs.  But 
> for the whole range of Linux-based systems, the kernel can't rely on 
> such information.
> 
> (If you think back to the original wakelock patches, for example,
> you'll remember that the patch descriptions were expressed in terms of
> what happens as the screen is turned on and off.  Obviously this is
> meaningless for systems that, unlike an Android phone, don't have a
> built-in screen.  I complained about this at the time, and the Android
> people seemed to have a hard time understanding what I was objecting
> to.)
> 
> So this is really our biggest problem.  If we can figure out a really
> good way to solve it, I predict we'll find that the kernel-based and
> daemon-based suspend solutions are extremely similar.

I agree that they are similar.

As to solving this particular issue, the Android problem has just come up again
during the Kernel Summit in Prague and there was quite a strong statement
from Linus that we should just merge Android's wakelocks code.  I actually
agree with that, because I think that (1) we should really start to regard
Android as a legitimate Linux distribution and not something weird that just
happens to use the kernel in all of the wrong ways and (2) we should do our
best to support the Android user base.

Now, I'm not entirely sure that it's technically possible to merge it as is
without breaking the existing stuff that's become integrated with the
enabling/disabling of device wakeup, but at least we should be able to support
the user space interfaces for wakelocks used by Android, so that they work in
exactly the same way on top of slightly different code inside of the kernel.

I need to look at the most recent patches adding the wakelocks framework and
possibly find the least painful way of integrating it.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-23 15:50             ` Alan Stern
  2011-10-27 21:06               ` Rafael J. Wysocki
@ 2011-10-28  0:02               ` NeilBrown
  2011-10-28  8:27                 ` Rafael J. Wysocki
  1 sibling, 1 reply; 80+ messages in thread
From: NeilBrown @ 2011-10-28  0:02 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Linux PM list, mark gross, LKML, John Stultz,
	Brian Swetland

On Sun, 23 Oct 2011 11:50:40 -0400 (EDT) Alan Stern
<stern@rowland.harvard.edu> wrote:

> On Sun, 23 Oct 2011, Rafael J. Wysocki wrote:
> 
> > Moreover, the race is real, because if you have two processes trying to use
> > /sys/power/wakeup_count at the same time, you can get:
> > 
> > Process A		Process B
> > read from wakeup_count
> > talk to apps
> > write to wakeup_count
> > --------- wakeup event ----------
> > 			read from wakeup_count
> > 			talk to apps
> > 			write to wakeup_count
> > try to suspend -> success (should be failure, because the wakeup event
> > may still be processed by applications at this point and Process A hasn't
> > checked that).
> > 
> > Now, there are systems running two (or more) desktop environments each of
> > which has a power manager that may want to suspend on its own.  They both
> > will probably use pm-utils, but then I somehow doubt that pm-utils is well
> > prepared to handle such concurrency.
> 
> I have no objection to adding a kernel-based mechanism for restricting
> the suspend interface to one process at a time.  However, that's just
> part of your most recent proposal.  The other part involves
> coordinating the requirements of all the processes that may want to
> prevent the system from suspending, which is a harder job.
> 
> 
> > I have one more rule.  If my would-be user space solution has the following
> > properties:
> > 
> > * It is supposed to be used by all of the existing variants of user space
> >   (i.e. all existing variants of user space are expected to use the very same
> >   thing).
> > 
> > * It requires all of those user space variants to be modified to work with it
> >   correctly.
> > 
> > * It includes a daemon process having to be started on boot and run permanently.
> > 
> > then it likely is better to handle the problem in the kernel.
> 
> This reasoning doesn't apply to the second problem of allowing
> processes to block suspend.  Whether the solution is implemented in the
> kernel or as a daemon, other programs will have to be modified to
> accommodate it.
> 
> In fact, if it's done properly then these other programs should each
> need only a single set of modifications; the differences involved in 
> communicating with the kernel vs. a daemon could be encapsulated in a 
> shared library.
> 
> 
> Overall, I think the discussion is getting a little muddled because of
> a significant problem that has not yet been addressed sufficiently.
> 
> There is a big difference between Android's kernel wakelocks and the
> currently proposed use of wakeup_sources.  In Android, a kernel
> wakelock associated with an input device isn't released until the
> device's queue becomes empty, whereas we have been talking about
> releasing the corresponding wakeup_source as soon as data added to
> the queue becomes visible to userspace.
> 
> This is quite a significant difference.  It means there's a window of
> time (from when the data is added to the queue to when it is removed)  
> during which userspace is forced to cope with suspend races, instead of
> letting the kernel handle things.  This is what leads to our problems
> about sending fd's to the daemon process and sending a request to each
> client before the daemon starts a suspend.
> 
> (Other aspects of this problem that haven't been mentioned before: What
> happens when a client program using the notify-fd API wants to close
> one of the wakeup-capable fd's?  It would have to tell the daemon to
> close its copy of the fd as well.  And likewise, a client would have to 
> inform the daemon whenever it opened a new wakeup-capable device file.)

In my current code the client only associates a single event fd with each
socket to the server, and when the client closes that socket, the fd gets
closed (though there are rough edges I think).
Teaching the client to use multiple fds per socket would not be difficult.
The biggest challenge would be choosing labels to use to identify the fds so
it can ask the server to close them - and that isn't hard.
But I certainly agree that this needs to be properly thought through and
resolved.
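
To make the labelling idea concrete, here is a minimal sketch of the
daemon-side bookkeeping, assuming each client socket gets one session
object.  The class name ClientSession, its methods and the labels
"input0" and "rtc" are hypothetical and are not taken from the current
code; in a real daemon the fds would arrive over the socket rather than
from os.pipe().

```python
import os


class ClientSession:
    """Hypothetical daemon-side record of the fds one client has handed over."""

    def __init__(self):
        self._fds = {}  # label chosen by the client -> daemon's copy of the fd

    def register(self, label, fd):
        # Refuse duplicate labels so a later close request is unambiguous.
        if label in self._fds:
            raise KeyError("label already in use: %s" % label)
        self._fds[label] = fd

    def release(self, label):
        # The client closed (or lost interest in) one fd: drop our copy too.
        os.close(self._fds.pop(label))

    def close_all(self):
        # The client closed its socket: drop every fd it registered.
        for label in list(self._fds):
            self.release(label)

    def labels(self):
        return sorted(self._fds)


# Stand-ins for wakeup-capable device files (assumption: pipes, not devices).
r1, w1 = os.pipe()
r2, w2 = os.pipe()

session = ClientSession()
session.register("input0", r1)
session.register("rtc", r2)
session.release("input0")
remaining = session.labels()
session.close_all()
os.close(w1)
os.close(w2)
```

Keeping the label-to-fd map per socket means that closing the socket
still cleans up everything, which matches the single-fd behaviour
described above.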

> 
> Now, in the end, I think our approach makes more sense in a general 
> setting.  The Android approach is okay for a restricted environment 
> where you know beforehand exactly which devices will be wakeup-capable 
> and which wakeup events will be monitored by userspace programs.  But 
> for the whole range of Linux-based systems, the kernel can't rely on 
> such information.

I think that is exactly right.  The Android code is understandably written
to particularly suit the Android context and may not be generally applicable.
I think the Android folk understand this and don't insist on having exactly
that code merged.  They just want the same functionality with the same
efficiency without unnecessary change to user-space.


> 
> (If you think back to the original wakelock patches, for example,
> you'll remember that the patch descriptions were expressed in terms of
> what happens as the screen is turned on and off.  Obviously this is
> meaningless for systems that, unlike an Android phone, don't have a
> built-in screen.  I complained about this at the time, and the Android
> people seemed to have a hard time understanding what I was objecting
> to.)
> 
> So this is really our biggest problem.  If we can figure out a really
> good way to solve it, I predict we'll find that the kernel-based and
> daemon-based suspend solutions are extremely similar.

Actually I think our biggest problem is - and has always been - communication
and understanding :-)

There are probably a dozen or more ways to solve this problem, each of which
has some impact on the kernel and some impact on the Android user-space.

We need an effective dialogue (we have had plenty of ineffective dialogue)
between people who know and care about Android and people who know and care
about the kernel.

I think we are having a useful discussion, but I think it would be much more
useful if we had some inside perspective and engagement with Android.

So I have added a Cc to Brian Swetland, hoping - Brian - that you might be
able to provide some insight - or maybe tell us where this discussion is
already happening and already progressing (maybe I missed something).

I'm particularly interested in:
 - is it fair to say that all wakeup events are - or could be - available to
   user-space through an 'fd' which reports POLLIN when an event is pending?
   If not - could you list some of those other wakeup events?
 - does a process that is handling wakeup events always "know" they are (or
   could be) wakeup events and so could take some extra action?  (assume for
   the moment that the action is free, it just has to be done for fds
   receiving wakeup events, and not for other fds).
 - How performance-sensitive is the opportunistic suspend event?  i.e. I'm
   assuming there are a collection of user-space and kernel-space things that
   block and unblock suspend from time to time.  At some point the last block
   is removed and the system should then enter suspend.  What sort of latency
   is acceptable at that point (microseconds? milliseconds?) and what sort of
   frequency would we expect that to happen (100 Hz? 10 Hz? 1 Hz? 0.01 Hz?)

I think answers to those would help a lot to parameterise the problem space.
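
To illustrate what the first question assumes, here is a minimal sketch
of checking a set of fds for pending events with poll().  The function
names pending_wakeup_fds and may_suspend are hypothetical, and a pipe
stands in for a wakeup-capable device; this only shows the POLLIN
check, not a complete suspend protocol.

```python
import os
import select


def pending_wakeup_fds(fds, timeout_ms=0):
    """Return the fds that report POLLIN within timeout_ms milliseconds."""
    poller = select.poll()
    for fd in fds:
        poller.register(fd, select.POLLIN)
    return [fd for fd, events in poller.poll(timeout_ms)
            if events & select.POLLIN]


def may_suspend(fds):
    """Hypothetical policy: allow suspend only when no fd has a pending event."""
    return not pending_wakeup_fds(fds)


# A pipe stands in for a wakeup-capable device file (assumption).
r, w = os.pipe()
idle = may_suspend([r])      # nothing queued yet
os.write(w, b"event")
busy = may_suspend([r])      # an event is pending, so suspend is blocked
os.read(r, 5)                # the client consumes the event
drained = may_suspend([r])   # queue is empty again
os.close(r)
os.close(w)
```

Wakeup events that cannot be exposed this way would need some other
blocking mechanism, which is why the list of exceptions matters.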

Thanks,
NeilBrown



* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-28  0:02               ` NeilBrown
@ 2011-10-28  8:27                 ` Rafael J. Wysocki
  2011-10-28 15:08                   ` Alan Stern
  0 siblings, 1 reply; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-28  8:27 UTC (permalink / raw)
  To: NeilBrown
  Cc: Alan Stern, Linux PM list, mark gross, LKML, John Stultz,
	Brian Swetland, Greg KH

On Friday, October 28, 2011, NeilBrown wrote:
> On Sun, 23 Oct 2011 11:50:40 -0400 (EDT) Alan Stern
> <stern@rowland.harvard.edu> wrote:
> 
> > On Sun, 23 Oct 2011, Rafael J. Wysocki wrote:
> > 
> > > Moreover, the race is real, because if you have two processes trying to use
> > > /sys/power/wakeup_count at the same time, you can get:
> > > 
> > > Process A		Process B
> > > read from wakeup_count
> > > talk to apps
> > > write to wakeup_count
> > > --------- wakeup event ----------
> > > 			read from wakeup_count
> > > 			talk to apps
> > > 			write to wakeup_count
> > > try to suspend -> success (should be failure, because the wakeup event
> > > may still be processed by applications at this point and Process A hasn't
> > > checked that).
> > > 
> > > Now, there are systems running two (or more) desktop environments each of
> > > which has a power manager that may want to suspend on its own.  They both
> > > will probably use pm-utils, but then I somehow doubt that pm-utils is well
> > > prepared to handle such concurrency.
> > 
> > I have no objection to adding a kernel-based mechanism for restricting
> > the suspend interface to one process at a time.  However, that's just
> > part of your most recent proposal.  The other part involves
> > coordinating the requirements of all the processes that may want to
> > prevent the system from suspending, which is a harder job.
> > 
> > 
> > > I have one more rule.  If my would-be user space solution has the following
> > > properties:
> > > 
> > > * It is supposed to be used by all of the existing variants of user space
> > >   (i.e. all existing variants of user space are expected to use the very same
> > >   thing).
> > > 
> > > * It requires all of those user space variants to be modified to work with it
> > >   correctly.
> > > 
> > > * It includes a daemon process having to be started on boot and run permanently.
> > > 
> > > then it likely is better to handle the problem in the kernel.
> > 
> > This reasoning doesn't apply to the second problem of allowing
> > processes to block suspend.  Whether the solution is implemented in the
> > kernel or as a daemon, other programs will have to be modified to
> > accommodate it.
> > 
> > In fact, if it's done properly then these other programs should each
> > need only a single set of modifications; the differences involved in 
> > communicating with the kernel vs. a daemon could be encapsulated in a 
> > shared library.
> > 
> > 
> > Overall, I think the discussion is getting a little muddled because of
> > a significant problem that has not yet been addressed sufficiently.
> > 
> > There is a big difference between Android's kernel wakelocks and the
> > currently proposed use of wakeup_sources.  In Android, a kernel
> > wakelock associated with an input device isn't released until the
> > device's queue becomes empty, whereas we have been talking about
> > releasing the corresponding wakeup_source as soon as data added to
> > the queue becomes visible to userspace.
> > 
> > This is quite a significant difference.  It means there's a window of
> > time (from when the data is added to the queue to when it is removed)  
> > during which userspace is forced to cope with suspend races, instead of
> > letting the kernel handle things.  This is what leads to our problems
> > about sending fd's to the daemon process and sending a request to each
> > client before the daemon starts a suspend.
> > 
> > (Other aspects of this problem that haven't been mentioned before: What
> > happens when a client program using the notify-fd API wants to close
> > one of the wakeup-capable fd's?  It would have to tell the daemon to
> > close its copy of the fd as well.  And likewise, a client would have to 
> > inform the daemon whenever it opened a new wakeup-capable device file.)
> 
> In my current code the client only associates a single event fd with each
> socket to the server, and when the client closes that socket, the fd gets
> closed (though there are rough edges I think).
> Teaching the client to use multiple fds per socket would not be difficult.
> The biggest challenge would be choosing labels to use to identify the fds so
> it can ask the server to close them - and that isn't hard.
> But I certainly agree that this needs to be properly thought through and
> resolved.
> 
> > 
> > Now, in the end, I think our approach makes more sense in a general 
> > setting.  The Android approach is okay for a restricted environment 
> > where you know beforehand exactly which devices will be wakeup-capable 
> > and which wakeup events will be monitored by userspace programs.  But 
> > for the whole range of Linux-based systems, the kernel can't rely on 
> > such information.
> 
> I think that is exactly right.  The Android code is understandably written
> to particularly suit the Android context and may not be generally applicable.

I'm not sure why the heck this makes any difference.  For now, there doesn't
seem to be anyone else who needs that functionality.  If there were people
like that we'd see some concurrent approaches appearing, but for now it's only
us considering the alternatives _theoretically_.

Moreover, if somebody who needs similar functionality and for whom the Android
stuff is not sufficient appears in the future, I don't see why not to address
his needs _at_ _that_ _time_ instead of trying to anticipate them (which is
kind of useless anyway, because we have no idea what those needs may be).

> I think the Android folk understand this and don't insist on having exactly
> that code merged.  They just want the same functionality with the same
> efficiency without unnecessary change to user-space.

The whole problem is that the Android code is proven to work on lots and
lots of systems and whatever else we can come up with will not be.

> > 
> > (If you think back to the original wakelock patches, for example,
> > you'll remember that the patch descriptions were expressed in terms of
> > what happens as the screen is turned on and off.  Obviously this is
> > meaningless for systems that, unlike an Android phone, don't have a
> > built-in screen.  I complained about this at the time, and the Android
> > people seemed to have a hard time understanding what I was objecting
> > to.)
> > 
> > So this is really our biggest problem.  If we can figure out a really
> > good way to solve it, I predict we'll find that the kernel-based and
> > daemon-based suspend solutions are extremely similar.
> 
> Actually I think our biggest problem is - and has always been - communication
> and understanding :-)
> 
> There are probably a dozen or more ways to solve this problem, each of which
> has some impact on the kernel and some impact on the Android user-space.
> 
> We need an effective dialogue (we have had plenty of ineffective dialogue)
> between people who know and care about Android and people who know and care
> about the kernel.
> 
> I think we are having a useful discussion, but I think it would be much more
> useful if we had some inside perspective and engagement with Android.
> 
> So I have added a Cc to Brian Swetland, hoping - Brian - that you might be
> able to provide some insight - or maybe tell us where this discussion is
> already happening and already progressing (maybe I missed something).
> 
> I'm particularly interested in:
>  - is it fair to say that all wakeup events are - or could be - available to
>    user-space through an 'fd' which reports POLLIN when an event is pending?
>    If not - could you list some of those other wakeup events?
>  - does a process that is handling wakeup events always "know" they are (or
>    could be) wakeup events and so could take some extra action?  (assume for
>    the moment that the action is free, it just has to be done for fds
>    receiving wakeup events, and not for other fds).
>  - How performance-sensitive is the opportunistic suspend event?  i.e. I'm
>    assuming there are a collection of user-space and kernel-space things that
>    block and unblock suspend from time to time.  At some point the last block
>    is removed and the system should then enter suspend.  What sort of latency
>    is acceptable at that point (microseconds? milliseconds?) and what sort of
>    frequency would we expect that to happen (100 Hz? 10 Hz? 1 Hz? 0.01 Hz?)
> 
> I think answers to those would help a lot to parameterise the problem space.

I'm sure they would, but I also think this already has taken too much time -
and too much pain for people who have to support two different kernels, the
mainline and the Android one, at the same time.

Thanks,
Rafael


* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-28  8:27                 ` Rafael J. Wysocki
@ 2011-10-28 15:08                   ` Alan Stern
  2011-10-28 17:26                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 80+ messages in thread
From: Alan Stern @ 2011-10-28 15:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: NeilBrown, Linux PM list, mark gross, LKML, John Stultz,
	Brian Swetland, Greg KH

On Fri, 28 Oct 2011, Rafael J. Wysocki wrote:

> > > Now, in the end, I think our approach makes more sense in a general 
> > > setting.  The Android approach is okay for a restricted environment 
> > > where you know beforehand exactly which devices will be wakeup-capable 
> > > and which wakeup events will be monitored by userspace programs.  But 
> > > for the whole range of Linux-based systems, the kernel can't rely on 
> > > such information.
> > 
> > I think that is exactly right.  The Android code is understandably written
> > to particularly suit the Android context and may not be generally applicable.
> 
> I'm not sure why the heck this makes any difference.  For now, there doesn't
> seem to be anyone else who needs that functionality.  If there were people
> like that we'd see some concurrent approaches appearing, but for now it's only
> us considering the alternatives _theoretically_.
> 
> Moreover, if somebody who needs similar functionality and for whom the Android
> stuff is not sufficient appears in the future, I don't see why not to address
> his needs _at_ _that_ _time_ instead of trying to anticipate them (which is
> kind of useless anyway, because we have no idea what those needs may be).

You're missing the point.  There could easily be situations where the
Android kernel will block suspend but a more general system should
_not_ block it.  Behavior that is appropriate in an Android phone might
not be appropriate in, say, a desktop system.

If we duplicate the Android functionality then people (who may or may
not theoretically want it now) might find that they _don't_ want the
new behavior.

> > I think the Android folk understand this and don't insist on having exactly
> > that code merged.  They just want the same functionality with the same
> > efficiency without unnecessary change to user-space.
> 
> The whole problem is that the Android code is proven to work on lots and
> lots of systems and whatever else we can come up with will not be.

But it seems likely that the Android code, which has been tested on 
only one kind of system, will _not_ work correctly on other kinds of 
systems.

Assuming we do go ahead and merge some form of the Android code, we
must make sure that it won't have bad effects in situations where it's
not needed or wanted.  This means more than configuring it away with 
Kconfig.  When it is present, there has to be a way to control its 
behavior in some detail.

Alan Stern



* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-28 15:08                   ` Alan Stern
@ 2011-10-28 17:26                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 80+ messages in thread
From: Rafael J. Wysocki @ 2011-10-28 17:26 UTC (permalink / raw)
  To: Alan Stern
  Cc: NeilBrown, Linux PM list, mark gross, LKML, John Stultz,
	Brian Swetland, Greg KH

On Friday, October 28, 2011, Alan Stern wrote:
> On Fri, 28 Oct 2011, Rafael J. Wysocki wrote:
> 
> > > > Now, in the end, I think our approach makes more sense in a general 
> > > > setting.  The Android approach is okay for a restricted environment 
> > > > where you know beforehand exactly which devices will be wakeup-capable 
> > > > and which wakeup events will be monitored by userspace programs.  But 
> > > > for the whole range of Linux-based systems, the kernel can't rely on 
> > > > such information.
> > > 
> > > I think that is exactly right.  The Android code is understandably written
> > > to particularly suit the Android context and may not be generally applicable.
> > 
> > I'm not sure why the heck this makes any difference.  For now, there doesn't
> > seem to be anyone else who needs that functionality.  If there were people
> > like that we'd see some concurrent approaches appearing, but for now it's only
> > us considering the alternatives _theoretically_.
> > 
> > Moreover, if somebody who needs similar functionality and for whom the Android
> > stuff is not sufficient appears in the future, I don't see why not to address
> > his needs _at_ _that_ _time_ instead of trying to anticipate them (which is
> > kind of useless anyway, because we have no idea what those needs may be).
> 
> You're missing the point.  There could easily be situations where the
> Android kernel will block suspend but a more general system should
> _not_ block it.  Behavior that is appropriate in an Android phone might
> not be appropriate in, say, a desktop system.
> 
> If we duplicate the Android functionality then people (who may or may
> not theoretically want it now) might find that they _don't_ want the
> new behavior.

Where is it said that it has to be mandatory?

> > > I think the Android folk understand this and don't insist on having exactly
> > > that code merged.  They just want the same functionality with the same
> > > efficiency without unnecessary change to user-space.
> > 
> > The whole problem is that the Android code is proven to work on lots and
> > lots of systems and whatever else we can come up with will not be.
> 
> But it seems likely that the Android code, which has been tested on 
> only one kind of system, will _not_ work correctly on other kinds of 
> systems.
> 
> Assuming we do go ahead and merge some form of the Android code, we
> must make sure that it won't have bad effects in situations where it's
> not needed or wanted.

Yes, that clearly is what we must do.

> This means more than configuring it away with Kconfig.  When it is present,
> there has to be a way to control its behavior in some detail.

I entirely agree.

Thanks,
Rafael


* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-16 22:34         ` NeilBrown
  2011-10-17 14:45           ` Alan Stern
@ 2011-10-31 15:11           ` Richard Hughes
  1 sibling, 0 replies; 80+ messages in thread
From: Richard Hughes @ 2011-10-31 15:11 UTC (permalink / raw)
  To: linux-kernel

NeilBrown <neilb <at> suse.de> writes:
> gnome-power-manager talks to upowerd over dbus to ask for a suspend.

Quite a few other desktops talk to upower, including XFCE, KDE and LXDE.
It's basically the only way on a modern desktop a user can put the
machine to sleep without becoming root.

> upowerd then runs /usr/sbin/pm-suspend.
> pm-suspend then runs all the scripts in /usr/lib/pm-utils/sleep.d/
> and then calls "do_suspend" which is defined in
> /usr/lib/pm-utils/pm-functions
> 
> Ugghh.. That is a very deep stack that is doing things the "wrong"
> way.
> i.e. it is structured around requests to suspend rather than requests to
> stay awake.

Erm, that's what it was designed to do. UPower has never had any feature
requests to manage "stay-awake" functionality as upower is pretty much a
mechanism daemon, rather than a policy daemon. Certainly just writing to
/sys if /usr/lib/pm-utils/pm-functions isn't installed would be a very
sane patch to suggest, given that pm-suspend in Fedora 16 basically
doesn't do anything anymore.
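
As a rough illustration of that fallback, here is a minimal sketch
assuming the standard /sys/power/state interface, which lists the
supported states on read and starts a transition on write.  The
function name request_sleep and the state_path parameter are
hypothetical and are not part of upower or pm-utils; the demo points
the function at a scratch file so that nothing actually suspends.

```python
import os
import tempfile


def request_sleep(state="mem", state_path="/sys/power/state"):
    """Write a sleep state to state_path after checking it is advertised.

    On a real system this needs root and returns only after resume.
    """
    with open(state_path) as f:
        supported = f.read().split()
    if state not in supported:
        raise ValueError("state %r not in %r" % (state, supported))
    with open(state_path, "w") as f:
        f.write(state)


# Demo against a scratch file standing in for /sys/power/state (assumption).
with tempfile.TemporaryDirectory() as d:
    fake = os.path.join(d, "state")
    with open(fake, "w") as f:
        f.write("freeze standby mem disk\n")

    request_sleep("mem", state_path=fake)
    with open(fake) as f:
        written = f.read()

    try:
        request_sleep("bogus", state_path=fake)
        rejected = False
    except ValueError:
        rejected = True
```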

UPower does have signals that tell userspace when a suspend is about to
happen, and when the computer has been resumed, and this is done using
DBus.

I think it would be a shame to have Yet Another Daemon and Yet Another
Protocol just for managing this stuff, when upower already has DBus and
tons of client support. I think adding a tiny DBus interface on upower
to manage the stay-awake functionality would make things much less
complicated and give us a common desktop / embedded story.

Richard.

(please cc me in any replies, not subscribed)


* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-13 19:45 [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces Rafael J. Wysocki
                   ` (2 preceding siblings ...)
  2011-10-14  5:52 ` [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces NeilBrown
@ 2011-10-31 19:55 ` Ming Lei
  2011-10-31 21:15   ` NeilBrown
  3 siblings, 1 reply; 80+ messages in thread
From: Ming Lei @ 2011-10-31 19:55 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM list, mark gross, LKML, John Stultz, Alan Stern,
	NeilBrown

Hi,

On Fri, Oct 14, 2011 at 3:45 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:

> Second, to address the backup problem, we need to allow user space
> processes other than the suspend/hibernate process itself to prevent the
> system from being put into sleep states.  A mechanism for that is introduced
> by the second patch in the form of the /dev/sleepctl special device working
> kind of like user space wakelocks on Android (although in a simplified
> fashion).

I also have another similar example: write(fd, buffer, 100*4096).

Suppose only 80*4096 bytes are copied into the pages of the file, then someone
runs 'echo mem > /sys/power/state' to trigger system sleep, so only a
partial write is completed before system sleep and data inconsistency
may be caused for the file on the filesystem.

But I am not sure if this can happen in reality.

thanks,
-- 
Ming Lei


* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-31 19:55 ` Ming Lei
@ 2011-10-31 21:15   ` NeilBrown
  2011-10-31 21:23     ` Ming Lei
  0 siblings, 1 reply; 80+ messages in thread
From: NeilBrown @ 2011-10-31 21:15 UTC (permalink / raw)
  To: Ming Lei
  Cc: Rafael J. Wysocki, Linux PM list, mark gross, LKML, John Stultz,
	Alan Stern


On Tue, 1 Nov 2011 03:55:50 +0800 Ming Lei <tom.leiming@gmail.com> wrote:

> Hi,
> 
> On Fri, Oct 14, 2011 at 3:45 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > Second, to address the backup problem, we need to allow user space
> > processes other than the suspend/hibernate process itself to prevent the
> > system from being put into sleep states.  A mechanism for that is introduced
> > by the second patch in the form of the /dev/sleepctl special device working
> > kind of like user space wakelocks on Android (although in a simplified
> > fashion).
> 
> I also have another similar example: write(fd, buffer, 100*4096).
> 
> Suppose only 80*4096 bytes are copied into the pages of the file, then someone
> runs 'echo mem > /sys/power/state' to trigger system sleep, so only a
> partial write is completed before system sleep and data inconsistency
> may be caused for the file on the filesystem.
> 
> But I am not sure if this can happen in reality.
> 
> thanks,

I'm not sure if it is possible either, but even if it is, it isn't a new
problem.  A suspend is expected to leave all sorts of external things in
inconsistent states.  The contents of memory implicitly record all these
inconsistencies and allow them to be resolved in the normal course of things
after resume.
If you lose power to memory during suspend then you can certainly expect
filesystems to be corrupted in exactly the same sort of way that they can be
corrupted by a crash.  This has always been the case and I assume always will
be (until we get main-memory that preserves state without power).

We want to block suspend during backups not to avoid corruption but simply to
allow the backups to complete promptly.

NeilBrown



* Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces
  2011-10-31 21:15   ` NeilBrown
@ 2011-10-31 21:23     ` Ming Lei
  0 siblings, 0 replies; 80+ messages in thread
From: Ming Lei @ 2011-10-31 21:23 UTC (permalink / raw)
  To: NeilBrown
  Cc: Rafael J. Wysocki, Linux PM list, mark gross, LKML, John Stultz,
	Alan Stern

Hi,

On Tue, Nov 1, 2011 at 5:15 AM, NeilBrown <neilb@suse.de> wrote:
> On Tue, 1 Nov 2011 03:55:50 +0800 Ming Lei <tom.leiming@gmail.com> wrote:
>
>> Hi,
>>
>> On Fri, Oct 14, 2011 at 3:45 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>>
>> > Second, to address the backup problem, we need to allow user space
>> > processes other than the suspend/hibernate process itself to prevent the
>> > system from being put into sleep states.  A mechanism for that is introduced
>> > by the second patch in the form of the /dev/sleepctl special device working
>> > kind of like user space wakelocks on Android (although in a simplified
>> > fashion).
>>
>> I also have another similar example: write(fd, buffer, 100*4096).
>>
>> Suppose only 80*4096 bytes are copied into the pages of the file, then someone
>> runs 'echo mem > /sys/power/state' to trigger system sleep, so only a
>> partial write is completed before system sleep and data inconsistency
>> may be caused for the file on the filesystem.
>>
>> But I am not sure if this can happen in reality.
>>
>> thanks,
>
> I'm not sure if it is possible either, but even if it is it isn't a new
> problem.  A suspend is expected to leave all sorts of external things in
> inconsistent states. The contents of memory implicitly records all these
> inconsistencies and allows them to be resolved in the normal course of things
> after resume.
> If you lose power to memory during suspend then you can certainly expect
> filesystems to be corrupted in exactly the same sort of way that they can be

Also, if some external USB mass storage device is disconnected during suspend,
then the filesystem on the device may be corrupted too, but disconnecting it is
a very reasonable operation for a user.

> corrupted by a crash.  This has always been the case and I assume always will
> (until we get main-memory that preserves state without power)
>
> We want to block suspend during backups not to avoid corruption but simply to
> allow the backups to complete promptly.

Yes, it does make sense.

thanks,
-- 
Ming Lei


end of thread, other threads:[~2011-10-31 21:23 UTC | newest]

Thread overview: 80+ messages
2011-10-13 19:45 [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces Rafael J. Wysocki
2011-10-13 19:49 ` [RFC][PATCH 1/2] PM / Sleep: Add mechanism to disable suspend and hibernation Rafael J. Wysocki
2011-10-13 19:50 ` [RFC][PATCH 2/2] PM / Sleep: Introduce cooperative suspend/hibernate mode Rafael J. Wysocki
2011-10-13 22:58   ` John Stultz
2011-10-14 22:49     ` Rafael J. Wysocki
2011-10-15  0:04       ` John Stultz
2011-10-15 21:29         ` Rafael J. Wysocki
2011-10-17 16:48           ` John Stultz
2011-10-17 18:19             ` Alan Stern
2011-10-17 19:08               ` John Stultz
2011-10-17 20:07                 ` Alan Stern
2011-10-17 20:34                   ` John Stultz
2011-10-17 20:38                 ` Rafael J. Wysocki
2011-10-17 21:20                   ` John Stultz
2011-10-17 21:19                 ` NeilBrown
2011-10-17 21:43                   ` John Stultz
2011-10-17 23:06                     ` NeilBrown
2011-10-17 23:14                     ` NeilBrown
2011-10-17 21:13             ` Rafael J. Wysocki
2011-10-14  5:52 ` [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces NeilBrown
2011-10-14 16:00   ` Alan Stern
2011-10-14 21:07     ` NeilBrown
2011-10-15 18:34       ` Alan Stern
2011-10-15 21:43         ` NeilBrown
2011-10-15 22:10   ` Rafael J. Wysocki
2011-10-16  2:49     ` Alan Stern
2011-10-16 14:51       ` Alan Stern
2011-10-16 20:32         ` Rafael J. Wysocki
2011-10-17 15:33           ` Alan Stern
2011-10-17 21:10             ` Rafael J. Wysocki
2011-10-17 21:27             ` Rafael J. Wysocki
2011-10-18 17:30               ` Alan Stern
2011-10-16 22:34         ` NeilBrown
2011-10-17 14:45           ` Alan Stern
2011-10-17 22:49             ` NeilBrown
2011-10-17 23:47               ` John Stultz
2011-10-18  2:13                 ` NeilBrown
2011-10-18 17:11                   ` Alan Stern
2011-10-18 22:55                     ` NeilBrown
2011-10-19 16:19                       ` Alan Stern
2011-10-20  0:17                         ` NeilBrown
2011-10-20 14:29                           ` Alan Stern
2011-10-21  5:05                             ` NeilBrown
2011-10-21  5:23                             ` lsusd - The Linux SUSpend Daemon NeilBrown
2011-10-21 16:07                               ` Alan Stern
2011-10-21 22:34                                 ` NeilBrown
2011-10-22  2:00                                   ` Alan Stern
2011-10-22 16:31                                     ` Alan Stern
2011-10-23  3:31                                       ` NeilBrown
2011-10-23  8:21                                     ` NeilBrown
2011-10-23 12:48                                       ` Rafael J. Wysocki
2011-10-23 23:04                                         ` NeilBrown
2011-10-23 16:17                                       ` Alan Stern
2011-10-21 20:10                               ` david
2011-10-21 22:09                                 ` NeilBrown
2011-10-26 14:31                               ` Jan Engelhardt
2011-10-27  4:34                                 ` NeilBrown
2011-10-31 15:11           ` [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces Richard Hughes
2011-10-16 20:26       ` Rafael J. Wysocki
2011-10-16 23:48     ` NeilBrown
2011-10-17 15:43       ` Alan Stern
2011-10-17 22:02       ` Rafael J. Wysocki
2011-10-17 23:36         ` NeilBrown
2011-10-22 22:07           ` Rafael J. Wysocki
2011-10-23  2:57             ` NeilBrown
2011-10-23 13:16               ` Rafael J. Wysocki
2011-10-23 23:44                 ` NeilBrown
2011-10-24 10:23                   ` Rafael J. Wysocki
2011-10-25  2:52                     ` NeilBrown
2011-10-25  7:47                       ` Valdis.Kletnieks
2011-10-25  8:35                         ` Rafael J. Wysocki
2011-10-23 15:50             ` Alan Stern
2011-10-27 21:06               ` Rafael J. Wysocki
2011-10-28  0:02               ` NeilBrown
2011-10-28  8:27                 ` Rafael J. Wysocki
2011-10-28 15:08                   ` Alan Stern
2011-10-28 17:26                     ` Rafael J. Wysocki
2011-10-31 19:55 ` Ming Lei
2011-10-31 21:15   ` NeilBrown
2011-10-31 21:23     ` Ming Lei
