* [RFC][PATCH] PM: Avoid losing wakeup events during suspend
@ 2010-06-19 22:05 Rafael J. Wysocki
0 siblings, 0 replies; 50+ messages in thread
From: Rafael J. Wysocki @ 2010-06-19 22:05 UTC (permalink / raw)
To: linux-pm
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Arve
Hi,
One of the arguments during the suspend blockers discussion was that the
mainline kernel didn't contain any mechanisms allowing it to avoid losing
wakeup events during system suspend.
Generally, there are two problems in that area. First, if a wakeup event
occurs exactly at the same time when /sys/power/state is being written to,
the even may be delivered to user space right before the freezing of it,
in which case the user space consumer of the event may not be able to process
it before the system is suspended. Second, if a wakeup event occurs after user
space has been frozen and that event is not a wakeup interrupt, the kernel will
not react to it and the system will be suspended.
The following patch illustrates my idea of how these two problems may be
addressed. It introduces a new global sysfs attribute,
/sys/power/wakeup_count, associated with a running counter of wakeup events
and a helper function, pm_wakeup_event(), that may be used by kernel subsystems
to increment the wakeup events counter.
/sys/power/wakeup_count may be read from or written to by user space. Reads
will always succeed and return the current value of the wakeup events counter.
Writes, however, will only succeed if the written number is equal to the
current value of the wakeup events counter. If a write is successful, it will
cause the kernel to save the current value of the wakeup events counter and
to compare the saved number with the current value of the counter at certain
points of the subsequent suspend (or hibernate) sequence. If the two values
don't match, the suspend will be aborted just as though a wakeup interrupt
happened. Reading from /sys/power/wakeup_count again will turn that mechanism
off.
The assumption is that there's a user space power manager that will first
read from /sys/power/wakeup_count. Then it will check all user space consumers
of wakeup events known to it for unprocessed events. If there are any, it will
wait for them to be processed and repeat. In turn, if there are not any,
it will try to write to /sys/power/wakeup_count and if the write is successful,
it will write to /sys/power/state to start suspend, so if any wakeup events
accur past that point, they will be noticed by the kernel and will eventually
cause the suspend to be aborted.
In addition to the above, the patch adds a wakeup events counter to the
power member of struct device and makes these per-device wakeup event counters
available via sysfs, so that it's possible to check the activity of various
wakeup event sources within the kernel.
To illustrate how subsystems can use pm_wakeup_event(), I added it to the
PCI runtime PM wakeup-handling code.
At the moment the patch only contains code changes (ie. no documentation),
but I'm going to add comments etc. if people like the idea.
Please tell me what you think.
Rafael
---
drivers/base/power/Makefile | 2 -
drivers/base/power/main.c | 1
drivers/base/power/power.h | 3 +
drivers/base/power/sysfs.c | 9 ++++
drivers/base/power/wakeup.c | 74 ++++++++++++++++++++++++++++++++++++++++
drivers/pci/pci-acpi.c | 2 +
drivers/pci/pcie/pme/pcie_pme.c | 2 +
include/linux/pm.h | 6 +++
kernel/power/hibernate.c | 14 ++++---
kernel/power/main.c | 24 ++++++++++++
kernel/power/power.h | 6 +++
kernel/power/suspend.c | 2 -
12 files changed, 138 insertions(+), 7 deletions(-)
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -204,6 +204,28 @@ static ssize_t state_store(struct kobjec
power_attr(state);
+static ssize_t wakeup_count_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ return sprintf(buf, "%lu\n", pm_get_wakeup_count());
+}
+
+static ssize_t wakeup_count_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t n)
+{
+ unsigned long val;
+
+ if (sscanf(buf, "%lu", &val) == 1) {
+ if (pm_save_wakeup_count(val))
+ return n;
+ }
+ return -EINVAL;
+}
+
+power_attr(wakeup_count);
+
#ifdef CONFIG_PM_TRACE
int pm_trace_enabled;
@@ -236,6 +258,7 @@ static struct attribute * g[] = {
#endif
#ifdef CONFIG_PM_SLEEP
&pm_async_attr.attr,
+ &wakeup_count_attr.attr,
#ifdef CONFIG_PM_DEBUG
&pm_test_attr.attr,
#endif
@@ -266,6 +289,7 @@ static int __init pm_init(void)
int error = pm_start_workqueue();
if (error)
return error;
+ pm_wakeup_events_init();
power_kobj = kobject_create_and_add("power", NULL);
if (!power_kobj)
return -ENOMEM;
Index: linux-2.6/drivers/base/power/wakeup.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/base/power/wakeup.c
@@ -0,0 +1,74 @@
+
+#include <linux/device.h>
+#include <linux/pm.h>
+
+static unsigned long event_count;
+static unsigned long saved_event_count;
+static bool events_check_enabled;
+static spinlock_t events_lock;
+
+void pm_wakeup_events_init(void)
+{
+ spin_lock_init(&events_lock);
+}
+
+void pm_wakeup_event(struct device *dev)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&events_lock, flags);
+ event_count++;
+ if (dev)
+ dev->power.wakeup_count++;
+ spin_unlock_irqrestore(&events_lock, flags);
+}
+
+bool pm_check_wakeup_events(bool enable)
+{
+ unsigned long flags;
+ bool ret;
+
+ spin_lock_irqsave(&events_lock, flags);
+ ret = !events_check_enabled || (event_count == saved_event_count);
+ events_check_enabled = enable;
+ spin_unlock_irqrestore(&events_lock, flags);
+ return ret;
+}
+
+unsigned long pm_get_wakeup_count(void)
+{
+ unsigned long flags;
+ unsigned long count;
+
+ spin_lock_irqsave(&events_lock, flags);
+ events_check_enabled = false;
+ count = event_count;
+ spin_unlock_irqrestore(&events_lock, flags);
+ return count;
+}
+
+bool pm_save_wakeup_count(unsigned long count)
+{
+ unsigned long flags;
+ bool ret = false;
+
+ spin_lock_irqsave(&events_lock, flags);
+ if (count == event_count) {
+ saved_event_count = count;
+ events_check_enabled = true;
+ ret = true;
+ }
+ spin_unlock_irqrestore(&events_lock, flags);
+ return ret;
+}
+
+unsigned long pm_dev_wakeup_count(struct device *dev)
+{
+ unsigned long flags;
+ unsigned long count;
+
+ spin_lock_irqsave(&events_lock, flags);
+ count = dev->power.wakeup_count;
+ spin_unlock_irqrestore(&events_lock, flags);
+ return count;
+}
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -457,6 +457,7 @@ struct dev_pm_info {
#ifdef CONFIG_PM_SLEEP
struct list_head entry;
struct completion completion;
+ unsigned long wakeup_count;
#endif
#ifdef CONFIG_PM_RUNTIME
struct timer_list suspend_timer;
@@ -552,6 +553,9 @@ extern void __suspend_report_result(cons
} while (0)
extern void device_pm_wait_for_dev(struct device *sub, struct device *dev);
+
+/* drivers/base/power/wakeup.c */
+extern void pm_wakeup_event(struct device *dev);
#else /* !CONFIG_PM_SLEEP */
#define device_pm_lock() do {} while (0)
@@ -565,6 +569,8 @@ static inline int dpm_suspend_start(pm_m
#define suspend_report_result(fn, ret) do {} while (0)
static inline void device_pm_wait_for_dev(struct device *a, struct device *b) {}
+
+static inline void pm_wakeup_event(struct device *dev) {}
#endif /* !CONFIG_PM_SLEEP */
/* How to reorder dpm_list after device_move() */
Index: linux-2.6/drivers/base/power/Makefile
===================================================================
--- linux-2.6.orig/drivers/base/power/Makefile
+++ linux-2.6/drivers/base/power/Makefile
@@ -1,5 +1,5 @@
obj-$(CONFIG_PM) += sysfs.o
-obj-$(CONFIG_PM_SLEEP) += main.o
+obj-$(CONFIG_PM_SLEEP) += main.o wakeup.o
obj-$(CONFIG_PM_RUNTIME) += runtime.o
obj-$(CONFIG_PM_OPS) += generic_ops.o
obj-$(CONFIG_PM_TRACE_RTC) += trace.o
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -59,6 +59,7 @@ void device_pm_init(struct device *dev)
{
dev->power.status = DPM_ON;
init_completion(&dev->power.completion);
+ dev->power.wakeup_count = 0;
pm_runtime_init(dev);
}
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -184,6 +184,12 @@ static inline void suspend_test_finish(c
#ifdef CONFIG_PM_SLEEP
/* kernel/power/main.c */
extern int pm_notifier_call_chain(unsigned long val);
+
+/* drivers/base/power/wakeup.c */
+extern void pm_wakeup_events_init(void);
+extern bool pm_check_wakeup_events(bool enable);
+extern unsigned long pm_get_wakeup_count(void);
+extern bool pm_save_wakeup_count(unsigned long count);
#endif
#ifdef CONFIG_HIGHMEM
Index: linux-2.6/kernel/power/suspend.c
===================================================================
--- linux-2.6.orig/kernel/power/suspend.c
+++ linux-2.6/kernel/power/suspend.c
@@ -157,7 +157,7 @@ static int suspend_enter(suspend_state_t
error = sysdev_suspend(PMSG_SUSPEND);
if (!error) {
- if (!suspend_test(TEST_CORE))
+ if (!suspend_test(TEST_CORE) && pm_check_wakeup_events(false))
error = suspend_ops->enter(state);
sysdev_resume();
}
Index: linux-2.6/kernel/power/hibernate.c
===================================================================
--- linux-2.6.orig/kernel/power/hibernate.c
+++ linux-2.6/kernel/power/hibernate.c
@@ -277,7 +277,7 @@ static int create_image(int platform_mod
goto Enable_irqs;
}
- if (hibernation_test(TEST_CORE))
+ if (hibernation_test(TEST_CORE) || !pm_check_wakeup_events(true))
goto Power_up;
in_suspend = 1;
@@ -511,14 +511,18 @@ int hibernation_platform_enter(void)
local_irq_disable();
sysdev_suspend(PMSG_HIBERNATE);
+ if (!pm_check_wakeup_events(false))
+ goto Power_up;
+
hibernation_ops->enter();
/* We should never get here */
while (1);
- /*
- * We don't need to reenable the nonboot CPUs or resume consoles, since
- * the system is going to be halted anyway.
- */
+ Power_up:
+ sysdev_resume();
+ local_irq_enable();
+ enable_nonboot_cpus();
+
Platform_finish:
hibernation_ops->finish();
Index: linux-2.6/drivers/pci/pci-acpi.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-acpi.c
+++ linux-2.6/drivers/pci/pci-acpi.c
@@ -48,6 +48,8 @@ static void pci_acpi_wake_dev(acpi_handl
if (event == ACPI_NOTIFY_DEVICE_WAKE && pci_dev) {
pci_check_pme_status(pci_dev);
pm_runtime_resume(&pci_dev->dev);
+ if (device_may_wakeup(&pci_dev->dev))
+ pm_wakeup_event(&pci_dev->dev);
if (pci_dev->subordinate)
pci_pme_wakeup_bus(pci_dev->subordinate);
}
Index: linux-2.6/drivers/pci/pcie/pme/pcie_pme.c
===================================================================
--- linux-2.6.orig/drivers/pci/pcie/pme/pcie_pme.c
+++ linux-2.6/drivers/pci/pcie/pme/pcie_pme.c
@@ -147,6 +147,8 @@ static bool pcie_pme_walk_bus(struct pci
/* Skip PCIe devices in case we started from a root port. */
if (!pci_is_pcie(dev) && pci_check_pme_status(dev)) {
pm_request_resume(&dev->dev);
+ if (device_may_wakeup(&dev->dev))
+ pm_wakeup_event(&dev->dev);
ret = true;
}
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -30,6 +30,9 @@ extern void device_pm_move_before(struct
extern void device_pm_move_after(struct device *, struct device *);
extern void device_pm_move_last(struct device *);
+/* drivers/base/power/wakeup.c */
+extern unsigned long pm_dev_wakeup_count(struct device *dev);
+
#else /* !CONFIG_PM_SLEEP */
static inline void device_pm_init(struct device *dev)
Index: linux-2.6/drivers/base/power/sysfs.c
===================================================================
--- linux-2.6.orig/drivers/base/power/sysfs.c
+++ linux-2.6/drivers/base/power/sysfs.c
@@ -144,6 +144,14 @@ wake_store(struct device * dev, struct d
static DEVICE_ATTR(wakeup, 0644, wake_show, wake_store);
+static ssize_t wakeup_count_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%lu\n", pm_dev_wakeup_count(dev));
+}
+
+static DEVICE_ATTR(wakeup_count, 0444, wakeup_count_show, NULL);
+
#ifdef CONFIG_PM_ADVANCED_DEBUG
#ifdef CONFIG_PM_RUNTIME
@@ -230,6 +238,7 @@ static struct attribute * power_attrs[]
&dev_attr_control.attr,
#endif
&dev_attr_wakeup.attr,
+ &dev_attr_wakeup_count.attr,
#ifdef CONFIG_PM_ADVANCED_DEBUG
&dev_attr_async.attr,
#ifdef CONFIG_PM_RUNTIME
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] <201006200005.35771.rjw@sisk.pl>
@ 2010-06-20 5:52 ` mark gross
[not found] ` <20100620055254.GA21994@gvim.org>
` (2 subsequent siblings)
3 siblings, 0 replies; 50+ messages in thread
From: mark gross @ 2010-06-20 5:52 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, linux-pm
On Sun, Jun 20, 2010 at 12:05:35AM +0200, Rafael J. Wysocki wrote:
> Hi,
>
> One of the arguments during the suspend blockers discussion was that the
> mainline kernel didn't contain any mechanisms allowing it to avoid losing
> wakeup events during system suspend.
>
> Generally, there are two problems in that area. First, if a wakeup event
> occurs exactly at the same time when /sys/power/state is being written to,
> the even may be delivered to user space right before the freezing of it,
> in which case the user space consumer of the event may not be able to process
yes this is racy. souldn't the wakeup event handers/driver force a user
mode ACK before they stop failing suspend attempts?
> it before the system is suspended. Second, if a wakeup event occurs after user
> space has been frozen and that event is not a wakeup interrupt, the kernel will
> not react to it and the system will be suspended.
If its not a wakeup interrupt is it not fair to allow the suspend to
happen even if its handler's are "in flight" at suspend time?
>
> The following patch illustrates my idea of how these two problems may be
> addressed. It introduces a new global sysfs attribute,
> /sys/power/wakeup_count, associated with a running counter of wakeup events
> and a helper function, pm_wakeup_event(), that may be used by kernel subsystems
> to increment the wakeup events counter.
>
> /sys/power/wakeup_count may be read from or written to by user space. Reads
> will always succeed and return the current value of the wakeup events counter.
> Writes, however, will only succeed if the written number is equal to the
> current value of the wakeup events counter. If a write is successful, it will
> cause the kernel to save the current value of the wakeup events counter and
> to compare the saved number with the current value of the counter at certain
> points of the subsequent suspend (or hibernate) sequence. If the two values
> don't match, the suspend will be aborted just as though a wakeup interrupt
> happened. Reading from /sys/power/wakeup_count again will turn that mechanism
> off.
why would you want to turn it off?
>
> The assumption is that there's a user space power manager that will first
> read from /sys/power/wakeup_count. Then it will check all user space consumers
> of wakeup events known to it for unprocessed events. If there are any, it will
> wait for them to be processed and repeat. In turn, if there are not any,
> it will try to write to /sys/power/wakeup_count and if the write is successful,
> it will write to /sys/power/state to start suspend, so if any wakeup events
> accur past that point, they will be noticed by the kernel and will eventually
> cause the suspend to be aborted.
>
> In addition to the above, the patch adds a wakeup events counter to the
> power member of struct device and makes these per-device wakeup event counters
> available via sysfs, so that it's possible to check the activity of various
> wakeup event sources within the kernel.
>
> To illustrate how subsystems can use pm_wakeup_event(), I added it to the
> PCI runtime PM wakeup-handling code.
>
> At the moment the patch only contains code changes (ie. no documentation),
> but I'm going to add comments etc. if people like the idea.
>
> Please tell me what you think.
>
> Rafael
>
> ---
> drivers/base/power/Makefile | 2 -
> drivers/base/power/main.c | 1
> drivers/base/power/power.h | 3 +
> drivers/base/power/sysfs.c | 9 ++++
> drivers/base/power/wakeup.c | 74 ++++++++++++++++++++++++++++++++++++++++
> drivers/pci/pci-acpi.c | 2 +
> drivers/pci/pcie/pme/pcie_pme.c | 2 +
> include/linux/pm.h | 6 +++
> kernel/power/hibernate.c | 14 ++++---
> kernel/power/main.c | 24 ++++++++++++
> kernel/power/power.h | 6 +++
> kernel/power/suspend.c | 2 -
> 12 files changed, 138 insertions(+), 7 deletions(-)
>
> Index: linux-2.6/kernel/power/main.c
> ===================================================================
> --- linux-2.6.orig/kernel/power/main.c
> +++ linux-2.6/kernel/power/main.c
> @@ -204,6 +204,28 @@ static ssize_t state_store(struct kobjec
>
> power_attr(state);
>
> +static ssize_t wakeup_count_show(struct kobject *kobj,
> + struct kobj_attribute *attr,
> + char *buf)
> +{
> + return sprintf(buf, "%lu\n", pm_get_wakeup_count());
> +}
> +
> +static ssize_t wakeup_count_store(struct kobject *kobj,
> + struct kobj_attribute *attr,
> + const char *buf, size_t n)
> +{
> + unsigned long val;
> +
> + if (sscanf(buf, "%lu", &val) == 1) {
> + if (pm_save_wakeup_count(val))
> + return n;
> + }
> + return -EINVAL;
> +}
> +
> +power_attr(wakeup_count);
> +
> #ifdef CONFIG_PM_TRACE
> int pm_trace_enabled;
>
> @@ -236,6 +258,7 @@ static struct attribute * g[] = {
> #endif
> #ifdef CONFIG_PM_SLEEP
> &pm_async_attr.attr,
> + &wakeup_count_attr.attr,
> #ifdef CONFIG_PM_DEBUG
> &pm_test_attr.attr,
> #endif
> @@ -266,6 +289,7 @@ static int __init pm_init(void)
> int error = pm_start_workqueue();
> if (error)
> return error;
> + pm_wakeup_events_init();
> power_kobj = kobject_create_and_add("power", NULL);
> if (!power_kobj)
> return -ENOMEM;
> Index: linux-2.6/drivers/base/power/wakeup.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6/drivers/base/power/wakeup.c
> @@ -0,0 +1,74 @@
> +
> +#include <linux/device.h>
> +#include <linux/pm.h>
> +
> +static unsigned long event_count;
> +static unsigned long saved_event_count;
what about over flow issues?
> +static bool events_check_enabled;
> +static spinlock_t events_lock;
> +
> +void pm_wakeup_events_init(void)
> +{
> + spin_lock_init(&events_lock);
> +}
> +
> +void pm_wakeup_event(struct device *dev)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&events_lock, flags);
> + event_count++;
should event_count be an atomic type so we can not bother with taking
the evnets_lock?
> + if (dev)
> + dev->power.wakeup_count++;
> + spin_unlock_irqrestore(&events_lock, flags);
> +}
> +
> +bool pm_check_wakeup_events(bool enable)
> +{
> + unsigned long flags;
> + bool ret;
> +
> + spin_lock_irqsave(&events_lock, flags);
> + ret = !events_check_enabled || (event_count == saved_event_count);
I'm not getting the events_check_enbled flag yet.
> + events_check_enabled = enable;
I'm not sure if this is the right thing depending on all the different
ways the boolians are interacting with eachother.
Which is a red flag to me. This code is confusing.
I'll look at it some more when I'm fresh tomorrow.
--mgross
> + spin_unlock_irqrestore(&events_lock, flags);
> + return ret;
> +}
> +
> +unsigned long pm_get_wakeup_count(void)
> +{
> + unsigned long flags;
> + unsigned long count;
> +
> + spin_lock_irqsave(&events_lock, flags);
> + events_check_enabled = false;
> + count = event_count;
> + spin_unlock_irqrestore(&events_lock, flags);
> + return count;
> +}
> +
> +bool pm_save_wakeup_count(unsigned long count)
> +{
> + unsigned long flags;
> + bool ret = false;
> +
> + spin_lock_irqsave(&events_lock, flags);
> + if (count == event_count) {
> + saved_event_count = count;
> + events_check_enabled = true;
> + ret = true;
> + }
> + spin_unlock_irqrestore(&events_lock, flags);
> + return ret;
> +}
> +
> +unsigned long pm_dev_wakeup_count(struct device *dev)
> +{
> + unsigned long flags;
> + unsigned long count;
> +
> + spin_lock_irqsave(&events_lock, flags);
> + count = dev->power.wakeup_count;
> + spin_unlock_irqrestore(&events_lock, flags);
> + return count;
> +}
> Index: linux-2.6/include/linux/pm.h
> ===================================================================
> --- linux-2.6.orig/include/linux/pm.h
> +++ linux-2.6/include/linux/pm.h
> @@ -457,6 +457,7 @@ struct dev_pm_info {
> #ifdef CONFIG_PM_SLEEP
> struct list_head entry;
> struct completion completion;
> + unsigned long wakeup_count;
> #endif
> #ifdef CONFIG_PM_RUNTIME
> struct timer_list suspend_timer;
> @@ -552,6 +553,9 @@ extern void __suspend_report_result(cons
> } while (0)
>
> extern void device_pm_wait_for_dev(struct device *sub, struct device *dev);
> +
> +/* drivers/base/power/wakeup.c */
> +extern void pm_wakeup_event(struct device *dev);
> #else /* !CONFIG_PM_SLEEP */
>
> #define device_pm_lock() do {} while (0)
> @@ -565,6 +569,8 @@ static inline int dpm_suspend_start(pm_m
> #define suspend_report_result(fn, ret) do {} while (0)
>
> static inline void device_pm_wait_for_dev(struct device *a, struct device *b) {}
> +
> +static inline void pm_wakeup_event(struct device *dev) {}
> #endif /* !CONFIG_PM_SLEEP */
>
> /* How to reorder dpm_list after device_move() */
> Index: linux-2.6/drivers/base/power/Makefile
> ===================================================================
> --- linux-2.6.orig/drivers/base/power/Makefile
> +++ linux-2.6/drivers/base/power/Makefile
> @@ -1,5 +1,5 @@
> obj-$(CONFIG_PM) += sysfs.o
> -obj-$(CONFIG_PM_SLEEP) += main.o
> +obj-$(CONFIG_PM_SLEEP) += main.o wakeup.o
> obj-$(CONFIG_PM_RUNTIME) += runtime.o
> obj-$(CONFIG_PM_OPS) += generic_ops.o
> obj-$(CONFIG_PM_TRACE_RTC) += trace.o
> Index: linux-2.6/drivers/base/power/main.c
> ===================================================================
> --- linux-2.6.orig/drivers/base/power/main.c
> +++ linux-2.6/drivers/base/power/main.c
> @@ -59,6 +59,7 @@ void device_pm_init(struct device *dev)
> {
> dev->power.status = DPM_ON;
> init_completion(&dev->power.completion);
> + dev->power.wakeup_count = 0;
> pm_runtime_init(dev);
> }
>
> Index: linux-2.6/kernel/power/power.h
> ===================================================================
> --- linux-2.6.orig/kernel/power/power.h
> +++ linux-2.6/kernel/power/power.h
> @@ -184,6 +184,12 @@ static inline void suspend_test_finish(c
> #ifdef CONFIG_PM_SLEEP
> /* kernel/power/main.c */
> extern int pm_notifier_call_chain(unsigned long val);
> +
> +/* drivers/base/power/wakeup.c */
> +extern void pm_wakeup_events_init(void);
> +extern bool pm_check_wakeup_events(bool enable);
> +extern unsigned long pm_get_wakeup_count(void);
> +extern bool pm_save_wakeup_count(unsigned long count);
> #endif
>
> #ifdef CONFIG_HIGHMEM
> Index: linux-2.6/kernel/power/suspend.c
> ===================================================================
> --- linux-2.6.orig/kernel/power/suspend.c
> +++ linux-2.6/kernel/power/suspend.c
> @@ -157,7 +157,7 @@ static int suspend_enter(suspend_state_t
>
> error = sysdev_suspend(PMSG_SUSPEND);
> if (!error) {
> - if (!suspend_test(TEST_CORE))
> + if (!suspend_test(TEST_CORE) && pm_check_wakeup_events(false))
> error = suspend_ops->enter(state);
> sysdev_resume();
> }
> Index: linux-2.6/kernel/power/hibernate.c
> ===================================================================
> --- linux-2.6.orig/kernel/power/hibernate.c
> +++ linux-2.6/kernel/power/hibernate.c
> @@ -277,7 +277,7 @@ static int create_image(int platform_mod
> goto Enable_irqs;
> }
>
> - if (hibernation_test(TEST_CORE))
> + if (hibernation_test(TEST_CORE) || !pm_check_wakeup_events(true))
> goto Power_up;
>
> in_suspend = 1;
> @@ -511,14 +511,18 @@ int hibernation_platform_enter(void)
>
> local_irq_disable();
> sysdev_suspend(PMSG_HIBERNATE);
> + if (!pm_check_wakeup_events(false))
> + goto Power_up;
> +
> hibernation_ops->enter();
> /* We should never get here */
> while (1);
>
> - /*
> - * We don't need to reenable the nonboot CPUs or resume consoles, since
> - * the system is going to be halted anyway.
> - */
> + Power_up:
> + sysdev_resume();
> + local_irq_enable();
> + enable_nonboot_cpus();
> +
> Platform_finish:
> hibernation_ops->finish();
>
> Index: linux-2.6/drivers/pci/pci-acpi.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/pci-acpi.c
> +++ linux-2.6/drivers/pci/pci-acpi.c
> @@ -48,6 +48,8 @@ static void pci_acpi_wake_dev(acpi_handl
> if (event == ACPI_NOTIFY_DEVICE_WAKE && pci_dev) {
> pci_check_pme_status(pci_dev);
> pm_runtime_resume(&pci_dev->dev);
> + if (device_may_wakeup(&pci_dev->dev))
> + pm_wakeup_event(&pci_dev->dev);
> if (pci_dev->subordinate)
> pci_pme_wakeup_bus(pci_dev->subordinate);
> }
> Index: linux-2.6/drivers/pci/pcie/pme/pcie_pme.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/pcie/pme/pcie_pme.c
> +++ linux-2.6/drivers/pci/pcie/pme/pcie_pme.c
> @@ -147,6 +147,8 @@ static bool pcie_pme_walk_bus(struct pci
> /* Skip PCIe devices in case we started from a root port. */
> if (!pci_is_pcie(dev) && pci_check_pme_status(dev)) {
> pm_request_resume(&dev->dev);
> + if (device_may_wakeup(&dev->dev))
> + pm_wakeup_event(&dev->dev);
> ret = true;
> }
>
> Index: linux-2.6/drivers/base/power/power.h
> ===================================================================
> --- linux-2.6.orig/drivers/base/power/power.h
> +++ linux-2.6/drivers/base/power/power.h
> @@ -30,6 +30,9 @@ extern void device_pm_move_before(struct
> extern void device_pm_move_after(struct device *, struct device *);
> extern void device_pm_move_last(struct device *);
>
> +/* drivers/base/power/wakeup.c */
> +extern unsigned long pm_dev_wakeup_count(struct device *dev);
> +
> #else /* !CONFIG_PM_SLEEP */
>
> static inline void device_pm_init(struct device *dev)
> Index: linux-2.6/drivers/base/power/sysfs.c
> ===================================================================
> --- linux-2.6.orig/drivers/base/power/sysfs.c
> +++ linux-2.6/drivers/base/power/sysfs.c
> @@ -144,6 +144,14 @@ wake_store(struct device * dev, struct d
>
> static DEVICE_ATTR(wakeup, 0644, wake_show, wake_store);
>
> +static ssize_t wakeup_count_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + return sprintf(buf, "%lu\n", pm_dev_wakeup_count(dev));
> +}
> +
> +static DEVICE_ATTR(wakeup_count, 0444, wakeup_count_show, NULL);
> +
> #ifdef CONFIG_PM_ADVANCED_DEBUG
> #ifdef CONFIG_PM_RUNTIME
>
> @@ -230,6 +238,7 @@ static struct attribute * power_attrs[]
> &dev_attr_control.attr,
> #endif
> &dev_attr_wakeup.attr,
> + &dev_attr_wakeup_count.attr,
> #ifdef CONFIG_PM_ADVANCED_DEBUG
> &dev_attr_async.attr,
> #ifdef CONFIG_PM_RUNTIME
>
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <20100620055254.GA21994@gvim.org>
@ 2010-06-20 12:49 ` Rafael J. Wysocki
[not found] ` <201006201449.14754.rjw@sisk.pl>
1 sibling, 0 replies; 50+ messages in thread
From: Rafael J. Wysocki @ 2010-06-20 12:49 UTC (permalink / raw)
To: markgross
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Arve, linux-pm
On Sunday, June 20, 2010, mark gross wrote:
> On Sun, Jun 20, 2010 at 12:05:35AM +0200, Rafael J. Wysocki wrote:
> > Hi,
> >
> > One of the arguments during the suspend blockers discussion was that the
> > mainline kernel didn't contain any mechanisms allowing it to avoid losing
> > wakeup events during system suspend.
> >
> > Generally, there are two problems in that area. First, if a wakeup event
> > occurs exactly at the same time when /sys/power/state is being written to,
> > the even may be delivered to user space right before the freezing of it,
> > in which case the user space consumer of the event may not be able to process
> yes this is racy. souldn't the wakeup event handers/driver force a user
> mode ACK before they stop failing suspend attempts?
I'm not sure what you mean. There are wake-up events without any user space
consumers, like wake-on-LAN.
> > it before the system is suspended. Second, if a wakeup event occurs after user
> > space has been frozen and that event is not a wakeup interrupt, the kernel will
> > not react to it and the system will be suspended.
>
> If its not a wakeup interrupt is it not fair to allow the suspend to
> happen even if its handler's are "in flight" at suspend time?
A wakeup event need not be an interrupt, or at least there are interrupts that
have other meanings, like ACPI SCI.
> > The following patch illustrates my idea of how these two problems may be
> > addressed. It introduces a new global sysfs attribute,
> > /sys/power/wakeup_count, associated with a running counter of wakeup events
> > and a helper function, pm_wakeup_event(), that may be used by kernel subsystems
> > to increment the wakeup events counter.
> >
> > /sys/power/wakeup_count may be read from or written to by user space. Reads
> > will always succeed and return the current value of the wakeup events counter.
> > Writes, however, will only succeed if the written number is equal to the
> > current value of the wakeup events counter. If a write is successful, it will
> > cause the kernel to save the current value of the wakeup events counter and
> > to compare the saved number with the current value of the counter at certain
> > points of the subsequent suspend (or hibernate) sequence. If the two values
> > don't match, the suspend will be aborted just as though a wakeup interrupt
> > happened. Reading from /sys/power/wakeup_count again will turn that mechanism
> > off.
>
> why would you want to turn it off?
In some cases I may want to suspend/hibernate losing some wakeup events,
like for example in the case of emergency hibernation on critical battery.
> > The assumption is that there's a user space power manager that will first
> > read from /sys/power/wakeup_count. Then it will check all user space consumers
> > of wakeup events known to it for unprocessed events. If there are any, it will
> > wait for them to be processed and repeat. In turn, if there are not any,
> > it will try to write to /sys/power/wakeup_count and if the write is successful,
> > it will write to /sys/power/state to start suspend, so if any wakeup events
> > accur past that point, they will be noticed by the kernel and will eventually
> > cause the suspend to be aborted.
> >
> > In addition to the above, the patch adds a wakeup events counter to the
> > power member of struct device and makes these per-device wakeup event counters
> > available via sysfs, so that it's possible to check the activity of various
> > wakeup event sources within the kernel.
> >
> > To illustrate how subsystems can use pm_wakeup_event(), I added it to the
> > PCI runtime PM wakeup-handling code.
> >
> > At the moment the patch only contains code changes (ie. no documentation),
> > but I'm going to add comments etc. if people like the idea.
> >
> > Please tell me what you think.
> >
> > Rafael
> >
> > ---
> > drivers/base/power/Makefile | 2 -
> > drivers/base/power/main.c | 1
> > drivers/base/power/power.h | 3 +
> > drivers/base/power/sysfs.c | 9 ++++
> > drivers/base/power/wakeup.c | 74 ++++++++++++++++++++++++++++++++++++++++
> > drivers/pci/pci-acpi.c | 2 +
> > drivers/pci/pcie/pme/pcie_pme.c | 2 +
> > include/linux/pm.h | 6 +++
> > kernel/power/hibernate.c | 14 ++++---
> > kernel/power/main.c | 24 ++++++++++++
> > kernel/power/power.h | 6 +++
> > kernel/power/suspend.c | 2 -
> > 12 files changed, 138 insertions(+), 7 deletions(-)
> >
> > Index: linux-2.6/kernel/power/main.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/power/main.c
> > +++ linux-2.6/kernel/power/main.c
> > @@ -204,6 +204,28 @@ static ssize_t state_store(struct kobjec
> >
> > power_attr(state);
> >
> > +static ssize_t wakeup_count_show(struct kobject *kobj,
> > + struct kobj_attribute *attr,
> > + char *buf)
> > +{
> > + return sprintf(buf, "%lu\n", pm_get_wakeup_count());
> > +}
> > +
> > +static ssize_t wakeup_count_store(struct kobject *kobj,
> > + struct kobj_attribute *attr,
> > + const char *buf, size_t n)
> > +{
> > + unsigned long val;
> > +
> > + if (sscanf(buf, "%lu", &val) == 1) {
> > + if (pm_save_wakeup_count(val))
> > + return n;
> > + }
> > + return -EINVAL;
> > +}
> > +
> > +power_attr(wakeup_count);
> > +
> > #ifdef CONFIG_PM_TRACE
> > int pm_trace_enabled;
> >
> > @@ -236,6 +258,7 @@ static struct attribute * g[] = {
> > #endif
> > #ifdef CONFIG_PM_SLEEP
> > &pm_async_attr.attr,
> > + &wakeup_count_attr.attr,
> > #ifdef CONFIG_PM_DEBUG
> > &pm_test_attr.attr,
> > #endif
> > @@ -266,6 +289,7 @@ static int __init pm_init(void)
> > int error = pm_start_workqueue();
> > if (error)
> > return error;
> > + pm_wakeup_events_init();
> > power_kobj = kobject_create_and_add("power", NULL);
> > if (!power_kobj)
> > return -ENOMEM;
> > Index: linux-2.6/drivers/base/power/wakeup.c
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6/drivers/base/power/wakeup.c
> > @@ -0,0 +1,74 @@
> > +
> > +#include <linux/device.h>
> > +#include <linux/pm.h>
> > +
> > +static unsigned long event_count;
> > +static unsigned long saved_event_count;
>
> what about over flow issues?
There's no issue AFAICS. It only matters if the value is different from the
previous one after incrementation.
> > +static bool events_check_enabled;
> > +static spinlock_t events_lock;
> > +
> > +void pm_wakeup_events_init(void)
> > +{
> > + spin_lock_init(&events_lock);
> > +}
> > +
> > +void pm_wakeup_event(struct device *dev)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&events_lock, flags);
> > + event_count++;
> should event_count be an atomic type so we can not bother with taking
> the evnets_lock?
The lock is there, because two counters are incremented at the same time.
Also, in some other places the counter is accessed along with the enable
flag, so there are two operations that need to be done in the same critical
section.
> > + if (dev)
> > + dev->power.wakeup_count++;
> > + spin_unlock_irqrestore(&events_lock, flags);
> > +}
> > +
> > +bool pm_check_wakeup_events(bool enable)
> > +{
> > + unsigned long flags;
> > + bool ret;
> > +
> > + spin_lock_irqsave(&events_lock, flags);
> > + ret = !events_check_enabled || (event_count == saved_event_count);
> I'm not getting the events_check_enbled flag yet.
I tells the PM core whether or not to check wakeup events during suspend.
The checking is only enabled after a successful write to
/sys/power/wakeup_count, it is disabled by reads and by the final check right
before the system enters suspend. [That's because the "saved" value from
the previous suspend should not be used for checking wakeup events during
the next one.]
> > + events_check_enabled = enable;
> I'm not sure if this is the right thing depending on all the different
> ways the boolians are interacting with eachother.
>
> Which is a red flag to me. This code is confusing.
Well, I could use check_wakeup_events() and
check_and_disable_wakeup_events() (the latter is only necessary for the final
check), but I'm not sure if that's going to be better.
> I'll look at it some more when I'm fresh tomorrow.
Thanks for the comments.
Rafael
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] <201006200005.35771.rjw@sisk.pl>
2010-06-20 5:52 ` mark gross
[not found] ` <20100620055254.GA21994@gvim.org>
@ 2010-06-20 16:28 ` Alan Stern
[not found] ` <Pine.LNX.4.44L0.1006201153420.22272-100000@netrider.rowland.org>
3 siblings, 0 replies; 50+ messages in thread
From: Alan Stern @ 2010-06-20 16:28 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, linux-pm
On Sun, 20 Jun 2010, Rafael J. Wysocki wrote:
> Hi,
>
> One of the arguments during the suspend blockers discussion was that the
> mainline kernel didn't contain any mechanisms allowing it to avoid losing
> wakeup events during system suspend.
>
> Generally, there are two problems in that area. First, if a wakeup event
> occurs exactly at the same time when /sys/power/state is being written to,
> the even may be delivered to user space right before the freezing of it,
> in which case the user space consumer of the event may not be able to process
> it before the system is suspended.
Indeed, the same problem arises if the event isn't delivered to
userspace until after userspace is frozen. Of course, the underlying
issue here is that the kernel has no direct way to know when userspace
has finished processing an event. Userspace would have to tell it,
which generally would mean rewriting some large number of user
programs.
> Second, if a wakeup event occurs after user
> space has been frozen and that event is not a wakeup interrupt, the kernel will
> not react to it and the system will be suspended.
I don't quite understand what you mean here. "Reacting" to an event
involves more than one action. The kernel has to tell the hardware to
stop generating the wakeup signal, and it has to handle the event
somehow.
If the kernel doesn't tell the hardware to stop generating the wakeup
signal, the signal will continue to be active until the system goes to
sleep. At that point it will cause the system to wake up immediately,
so there won't be any problem.
The real problem arises when the hardware stops generating the wakeup
signal but the kernel suspends before it finishes handling the event.
For example, an interrupt handler might receive the event and start
processing it by calling pm_request_resume() -- but if the pm workqueue
thread is already frozen then the processing won't finish until
something else wakes the system up. (IMO this is a potential bug which
could be fixed without too much effort.)
> The following patch illustrates my idea of how these two problems may be
> addressed. It introduces a new global sysfs attribute,
> /sys/power/wakeup_count, associated with a running counter of wakeup events
> and a helper function, pm_wakeup_event(), that may be used by kernel subsystems
> to increment the wakeup events counter.
In what way is this better than suspend blockers?
> /sys/power/wakeup_count may be read from or written to by user space. Reads
> will always succeed and return the current value of the wakeup events counter.
> Writes, however, will only succeed if the written number is equal to the
> current value of the wakeup events counter. If a write is successful, it will
> cause the kernel to save the current value of the wakeup events counter and
> to compare the saved number with the current value of the counter at certain
> points of the subsequent suspend (or hibernate) sequence. If the two values
> don't match, the suspend will be aborted just as though a wakeup interrupt
> happened. Reading from /sys/power/wakeup_count again will turn that mechanism
> off.
>
> The assumption is that there's a user space power manager that will first
> read from /sys/power/wakeup_count. Then it will check all user space consumers
> of wakeup events known to it for unprocessed events.
What happens if an event arrives just before you read
/sys/power/wakeup_count, but the userspace consumer doesn't realize
there is a new unprocessed event until after the power manager checks
it? Your plan is missing a critical step: the "handoff" whereby
responsibility for handling an event passes from the kernel to
userspace.
With suspend blockers, this handoff occurs when an event queue is
emptied and its associate suspend blocker is deactivated. Or with some
kinds of events for which the Android people have not written an
explicit handoff, it occurs when a timer expires (timed suspend
blockers).
> If there are any, it will
> wait for them to be processed and repeat. In turn, if there are not any,
> it will try to write to /sys/power/wakeup_count and if the write is successful,
> it will write to /sys/power/state to start suspend, so if any wakeup events
> accur past that point, they will be noticed by the kernel and will eventually
> cause the suspend to be aborted.
This shares with the other alternatives posted recently the need for a
central power-manager process. And like in-kernel suspend blockers, it
requires changes to wakeup-capable drivers (the wakeup-events counter
has to be incremented).
One advantage of the suspend-blocker approach is that it essentially
uses a single tool to handle both kinds of races (event not fully
handled by the kernel, or event not fully handled by userspace).
Things aren't quite this simple, because of the need for a special API
to implement userspace suspend blockers, but this does avoid the need
for a power-manager process.
> In addition to the above, the patch adds a wakeup events counter to the
> power member of struct device and makes these per-device wakeup event counters
> available via sysfs, so that it's possible to check the activity of various
> wakeup event sources within the kernel.
>
> To illustrate how subsystems can use pm_wakeup_event(), I added it to the
> PCI runtime PM wakeup-handling code.
>
> At the moment the patch only contains code changes (ie. no documentation),
> but I'm going to add comments etc. if people like the idea.
>
> Please tell me what you think.
While this isn't a bad idea, I don't see how it is superior to the
other alternatives that have been proposed.
Alan Stern
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <Pine.LNX.4.44L0.1006201153420.22272-100000@netrider.rowland.org>
@ 2010-06-20 21:50 ` Rafael J. Wysocki
2010-06-20 22:58 ` mark gross
1 sibling, 0 replies; 50+ messages in thread
From: Rafael J. Wysocki @ 2010-06-20 21:50 UTC (permalink / raw)
To: Alan Stern
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Arve, linux-pm
On Sunday, June 20, 2010, Alan Stern wrote:
> On Sun, 20 Jun 2010, Rafael J. Wysocki wrote:
>
> > Hi,
> >
> > One of the arguments during the suspend blockers discussion was that the
> > mainline kernel didn't contain any mechanisms allowing it to avoid losing
> > wakeup events during system suspend.
> >
> > Generally, there are two problems in that area. First, if a wakeup event
> > occurs exactly at the same time when /sys/power/state is being written to,
> > the even may be delivered to user space right before the freezing of it,
> > in which case the user space consumer of the event may not be able to process
> > it before the system is suspended.
>
> Indeed, the same problem arises if the event isn't delivered to
> userspace until after userspace is frozen.
In that case the kernel should abort the suspend so that the event can be
delivered to the user space.
> Of course, the underlying issue here is that the kernel has no direct way
> to know when userspace has finished processing an event. Userspace would
> have to tell it, which generally would mean rewriting some large number of user
> programs.
I'm not sure of that. If the kernel doesn't initiate suspend, it doesn't
really need to know whether or not user space has already consumed the event.
> > Second, if a wakeup event occurs after user
> > space has been frozen and that event is not a wakeup interrupt, the kernel will
> > not react to it and the system will be suspended.
>
> I don't quite understand what you mean here. "Reacting" to an event
> involves more than one action. The kernel has to tell the hardware to
> stop generating the wakeup signal, and it has to handle the event
> somehow.
Yes. I meant that the event wouldn't cause the suspend to be aborted.
> If the kernel doesn't tell the hardware to stop generating the wakeup
> signal, the signal will continue to be active until the system goes to
> sleep. At that point it will cause the system to wake up immediately,
> so there won't be any problem.
>
> The real problem arises when the hardware stops generating the wakeup
> signal but the kernel suspends before it finishes handling the event.
> For example, an interrupt handler might receive the event and start
> processing it by calling pm_request_resume() -- but if the pm workqueue
> thread is already frozen then the processing won't finish until
> something else wakes the system up. (IMO this is a potential bug which
> could be fixed without too much effort.)
That's why I put pm_wakeup_event() into the PCI runtime wakeup code, which
doesn't run from the PM workqueue.
> > The following patch illustrates my idea of how these two problems may be
> > addressed. It introduces a new global sysfs attribute,
> > /sys/power/wakeup_count, associated with a running counter of wakeup events
> > and a helper function, pm_wakeup_event(), that may be used by kernel subsystems
> > to increment the wakeup events counter.
>
> In what way is this better than suspend blockers?
It doesn't add any new framework and it doesn't require the users of
pm_wakeup_event() to "unblock" suspend, so it is simpler. It also doesn't add
the user space interface that caused so much opposition to appear.
> > /sys/power/wakeup_count may be read from or written to by user space. Reads
> > will always succeed and return the current value of the wakeup events counter.
> > Writes, however, will only succeed if the written number is equal to the
> > current value of the wakeup events counter. If a write is successful, it will
> > cause the kernel to save the current value of the wakeup events counter and
> > to compare the saved number with the current value of the counter at certain
> > points of the subsequent suspend (or hibernate) sequence. If the two values
> > don't match, the suspend will be aborted just as though a wakeup interrupt
> > happened. Reading from /sys/power/wakeup_count again will turn that mechanism
> > off.
> >
> > The assumption is that there's a user space power manager that will first
> > read from /sys/power/wakeup_count. Then it will check all user space consumers
> > of wakeup events known to it for unprocessed events.
>
> What happens if an event arrives just before you read
> /sys/power/wakeup_count, but the userspace consumer doesn't realize
> there is a new unprocessed event until after the power manager checks
> it? Your plan is missing a critical step: the "handoff" whereby
> responsibility for handling an event passes from the kernel to
> userspace.
I think this is not the kernel's problem. In this approach the kernel makes it
possible for the user space to avoid the race. Whether or not the user space
will use this opportunity is a different matter.
> With suspend blockers, this handoff occurs when an event queue is
> emptied and its associate suspend blocker is deactivated. Or with some
> kinds of events for which the Android people have not written an
> explicit handoff, it occurs when a timer expires (timed suspend
> blockers).
Well, quite frankly, I don't see any difference here. In either case there is
a possibility for user space to mess up things and the kernel can't really help
that.
> > If there are any, it will
> > wait for them to be processed and repeat. In turn, if there are not any,
> > it will try to write to /sys/power/wakeup_count and if the write is successful,
> > it will write to /sys/power/state to start suspend, so if any wakeup events
> > accur past that point, they will be noticed by the kernel and will eventually
> > cause the suspend to be aborted.
>
> This shares with the other alternatives posted recently the need for a
> central power-manager process. And like in-kernel suspend blockers, it
> requires changes to wakeup-capable drivers (the wakeup-events counter
> has to be incremented).
It doesn't really require changes to drivers, but to code that knows of wakeup
events, like the PCI runtime wakeup code. Moreover, it doesn't require kernel
subsystems to know or even care when it is reasonable to allow suspend to
happen. The only thing they need to do is to call pm_wakeup_event() whenever
they see a wakeup event. I don't really think it is too much of a requirement
(and quite frnakly I can't imagine anything simpler than that).
> One advantage of the suspend-blocker approach is that it essentially
> uses a single tool to handle both kinds of races (event not fully
> handled by the kernel, or event not fully handled by userspace).
> Things aren't quite this simple, because of the need for a special API
> to implement userspace suspend blockers, but this does avoid the need
> for a power-manager process.
Yes, it does, but I have an idea about how to implement such a power manager
and I'm going to actually try it.
> > In addition to the above, the patch adds a wakeup events counter to the
> > power member of struct device and makes these per-device wakeup event counters
> > available via sysfs, so that it's possible to check the activity of various
> > wakeup event sources within the kernel.
> >
> > To illustrate how subsystems can use pm_wakeup_event(), I added it to the
> > PCI runtime PM wakeup-handling code.
> >
> > At the moment the patch only contains code changes (ie. no documentation),
> > but I'm going to add comments etc. if people like the idea.
> >
> > Please tell me what you think.
>
> While this isn't a bad idea, I don't see how it is superior to the
> other alternatives that have been proposed.
I don't think any of the approaches that don't use suspend blockers allows
one to avoid the race between the process that writes to /sys/power/state
and a wakeup event happening at the same time. They attempt to address another
issue, which is how to prevent untrusted user space processes from keeping the
system out of idle, but that is a different story.
My patch is all about the (system-wide) suspend mechanism, regardless of
whether or not it is used for opportunistic suspending.
Rafael
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <Pine.LNX.4.44L0.1006201153420.22272-100000@netrider.rowland.org>
2010-06-20 21:50 ` Rafael J. Wysocki
@ 2010-06-20 22:58 ` mark gross
1 sibling, 0 replies; 50+ messages in thread
From: mark gross @ 2010-06-20 22:58 UTC (permalink / raw)
To: Alan Stern
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, linux-pm
On Sun, Jun 20, 2010 at 12:28:29PM -0400, Alan Stern wrote:
> On Sun, 20 Jun 2010, Rafael J. Wysocki wrote:
>
> > Hi,
> >
> > One of the arguments during the suspend blockers discussion was that the
> > mainline kernel didn't contain any mechanisms allowing it to avoid losing
> > wakeup events during system suspend.
> >
> > Generally, there are two problems in that area. First, if a wakeup event
> > occurs exactly at the same time when /sys/power/state is being written to,
> > the even may be delivered to user space right before the freezing of it,
> > in which case the user space consumer of the event may not be able to process
> > it before the system is suspended.
>
> Indeed, the same problem arises if the event isn't delivered to
> userspace until after userspace is frozen. Of course, the underlying
> issue here is that the kernel has no direct way to know when userspace
> has finished processing an event. Userspace would have to tell it,
> which generally would mean rewriting some large number of user
> programs.
Suspend blockers don't solve this, and the large number of user programs
isn't a big number. /me thinks its 1 or 2 services.
> > Second, if a wakeup event occurs after user
> > space has been frozen and that event is not a wakeup interrupt, the kernel will
> > not react to it and the system will be suspended.
>
> I don't quite understand what you mean here. "Reacting" to an event
> involves more than one action. The kernel has to tell the hardware to
> stop generating the wakeup signal, and it has to handle the event
> somehow.
>
> If the kernel doesn't tell the hardware to stop generating the wakeup
> signal, the signal will continue to be active until the system goes to
> sleep. At that point it will cause the system to wake up immediately,
> so there won't be any problem.
>
> The real problem arises when the hardware stops generating the wakeup
> signal but the kernel suspends before it finishes handling the event.
> For example, an interrupt handler might receive the event and start
> processing it by calling pm_request_resume() -- but if the pm workqueue
> thread is already frozen then the processing won't finish until
> something else wakes the system up. (IMO this is a potential bug which
> could be fixed without too much effort.)
>
> > The following patch illustrates my idea of how these two problems may be
> > addressed. It introduces a new global sysfs attribute,
> > /sys/power/wakeup_count, associated with a running counter of wakeup events
> > and a helper function, pm_wakeup_event(), that may be used by kernel subsystems
> > to increment the wakeup events counter.
>
> In what way is this better than suspend blockers?
1) I don't think suspend blockers really solve or effectively articulate
the suspend wake event race problem.
2) the partial solution suspend blocker offer for the suspend race is
tightly bound to the suspend blocker implementation and not useful in
more general contexts.
> > /sys/power/wakeup_count may be read from or written to by user space. Reads
> > will always succeed and return the current value of the wakeup events counter.
> > Writes, however, will only succeed if the written number is equal to the
> > current value of the wakeup events counter. If a write is successful, it will
> > cause the kernel to save the current value of the wakeup events counter and
> > to compare the saved number with the current value of the counter at certain
> > points of the subsequent suspend (or hibernate) sequence. If the two values
> > don't match, the suspend will be aborted just as though a wakeup interrupt
> > happened. Reading from /sys/power/wakeup_count again will turn that mechanism
> > off.
> >
> > The assumption is that there's a user space power manager that will first
> > read from /sys/power/wakeup_count. Then it will check all user space consumers
> > of wakeup events known to it for unprocessed events.
>
> What happens if an event arrives just before you read
> /sys/power/wakeup_count, but the userspace consumer doesn't realize
> there is a new unprocessed event until after the power manager checks
> it? Your plan is missing a critical step: the "handoff" whereby
> responsibility for handling an event passes from the kernel to
> userspace.
>
> With suspend blockers, this handoff occurs when an event queue is
> emptied and its associate suspend blocker is deactivated. Or with some
> kinds of events for which the Android people have not written an
> explicit handoff, it occurs when a timer expires (timed suspend
> blockers).
The wakeup_count will need to be incremented in the same even queue the
suspend blocker handoff happens.
>
> > If there are any, it will
> > wait for them to be processed and repeat. In turn, if there are not any,
> > it will try to write to /sys/power/wakeup_count and if the write is successful,
> > it will write to /sys/power/state to start suspend, so if any wakeup events
> > accur past that point, they will be noticed by the kernel and will eventually
> > cause the suspend to be aborted.
>
> This shares with the other alternatives posted recently the need for a
> central power-manager process. And like in-kernel suspend blockers, it
> requires changes to wakeup-capable drivers (the wakeup-events counter
> has to be incremented).
true.
> One advantage of the suspend-blocker approach is that it essentially
> uses a single tool to handle both kinds of races (event not fully
> handled by the kernel, or event not fully handled by userspace).
> Things aren't quite this simple, because of the need for a special API
> to implement userspace suspend blockers, but this does avoid the need
> for a power-manager process.
I don't think suspend-blocker handles both kinds of races as well as you
seem to think. A single tool that conflates multiple kernel facilities
is not an advantage. It limits applications outside of the scope of
doing it the "android way".
Where do you get the idea that there isn't a need for a centralized PM
process in Android? Thats not true.
> > In addition to the above, the patch adds a wakeup events counter to the
> > power member of struct device and makes these per-device wakeup event counters
> > available via sysfs, so that it's possible to check the activity of various
> > wakeup event sources within the kernel.
> >
> > To illustrate how subsystems can use pm_wakeup_event(), I added it to the
> > PCI runtime PM wakeup-handling code.
> >
> > At the moment the patch only contains code changes (ie. no documentation),
> > but I'm going to add comments etc. if people like the idea.
> >
> > Please tell me what you think.
>
> While this isn't a bad idea, I don't see how it is superior to the
> other alternatives that have been proposed.
>
I like my PM-event framework better but I think this is ok too.
--mgross
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <201006201449.14754.rjw@sisk.pl>
@ 2010-06-20 23:13 ` mark gross
0 siblings, 0 replies; 50+ messages in thread
From: mark gross @ 2010-06-20 23:13 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: mark gross, markgross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, linux-pm
On Sun, Jun 20, 2010 at 02:49:14PM +0200, Rafael J. Wysocki wrote:
> On Sunday, June 20, 2010, mark gross wrote:
> > On Sun, Jun 20, 2010 at 12:05:35AM +0200, Rafael J. Wysocki wrote:
> > > Hi,
> > >
> > > One of the arguments during the suspend blockers discussion was that the
> > > mainline kernel didn't contain any mechanisms allowing it to avoid losing
> > > wakeup events during system suspend.
> > >
> > > Generally, there are two problems in that area. First, if a wakeup event
> > > occurs exactly at the same time when /sys/power/state is being written to,
> > > the even may be delivered to user space right before the freezing of it,
> > > in which case the user space consumer of the event may not be able to process
> > yes this is racy. souldn't the wakeup event handers/driver force a user
> > mode ACK before they stop failing suspend attempts?
>
> I'm not sure what you mean. There are wake-up events without any user space
> consumers, like wake-on-LAN.
I was thinking about the incoming call scenario. The 3G modem starts
ringing and the we don't want the suspend to happen again until user
mode ACKs the call (answer or ignore)
however; you are right about events without user mode consumers. I
forgot about those.
> > > it before the system is suspended. Second, if a wakeup event occurs after user
> > > space has been frozen and that event is not a wakeup interrupt, the kernel will
> > > not react to it and the system will be suspended.
> >
> > If its not a wakeup interrupt is it not fair to allow the suspend to
> > happen even if its handler's are "in flight" at suspend time?
>
> A wakeup event need not be an interrupt, or at least there are interrupts that
> have other meanings, like ACPI SCI.
true, but who desides if such a thing is a wakeup event? And how does
that polocy get generalized in a platform portable way?
(I don't think this is an issue for this current patch, its just a
nagging issue to me in general with all this PM stuff)
> > > The following patch illustrates my idea of how these two problems may be
> > > addressed. It introduces a new global sysfs attribute,
> > > /sys/power/wakeup_count, associated with a running counter of wakeup events
> > > and a helper function, pm_wakeup_event(), that may be used by kernel subsystems
> > > to increment the wakeup events counter.
> > >
> > > /sys/power/wakeup_count may be read from or written to by user space. Reads
> > > will always succeed and return the current value of the wakeup events counter.
> > > Writes, however, will only succeed if the written number is equal to the
> > > current value of the wakeup events counter. If a write is successful, it will
> > > cause the kernel to save the current value of the wakeup events counter and
> > > to compare the saved number with the current value of the counter at certain
> > > points of the subsequent suspend (or hibernate) sequence. If the two values
> > > don't match, the suspend will be aborted just as though a wakeup interrupt
> > > happened. Reading from /sys/power/wakeup_count again will turn that mechanism
> > > off.
> >
> > why would you want to turn it off?
>
> In some cases I may want to suspend/hibernate losing some wakeup events,
> like for example in the case of emergency hibernation on critical battery.
Or a user directed explicit suspend request. your right.
> > > The assumption is that there's a user space power manager that will first
> > > read from /sys/power/wakeup_count. Then it will check all user space consumers
> > > of wakeup events known to it for unprocessed events. If there are any, it will
> > > wait for them to be processed and repeat. In turn, if there are not any,
> > > it will try to write to /sys/power/wakeup_count and if the write is successful,
> > > it will write to /sys/power/state to start suspend, so if any wakeup events
> > > accur past that point, they will be noticed by the kernel and will eventually
> > > cause the suspend to be aborted.
> > >
> > > In addition to the above, the patch adds a wakeup events counter to the
> > > power member of struct device and makes these per-device wakeup event counters
> > > available via sysfs, so that it's possible to check the activity of various
> > > wakeup event sources within the kernel.
> > >
> > > To illustrate how subsystems can use pm_wakeup_event(), I added it to the
> > > PCI runtime PM wakeup-handling code.
> > >
> > > At the moment the patch only contains code changes (ie. no documentation),
> > > but I'm going to add comments etc. if people like the idea.
> > >
> > > Please tell me what you think.
> > >
> > > Rafael
> > >
> > > ---
> > > drivers/base/power/Makefile | 2 -
> > > drivers/base/power/main.c | 1
> > > drivers/base/power/power.h | 3 +
> > > drivers/base/power/sysfs.c | 9 ++++
> > > drivers/base/power/wakeup.c | 74 ++++++++++++++++++++++++++++++++++++++++
> > > drivers/pci/pci-acpi.c | 2 +
> > > drivers/pci/pcie/pme/pcie_pme.c | 2 +
> > > include/linux/pm.h | 6 +++
> > > kernel/power/hibernate.c | 14 ++++---
> > > kernel/power/main.c | 24 ++++++++++++
> > > kernel/power/power.h | 6 +++
> > > kernel/power/suspend.c | 2 -
> > > 12 files changed, 138 insertions(+), 7 deletions(-)
> > >
> > > Index: linux-2.6/kernel/power/main.c
> > > ===================================================================
> > > --- linux-2.6.orig/kernel/power/main.c
> > > +++ linux-2.6/kernel/power/main.c
> > > @@ -204,6 +204,28 @@ static ssize_t state_store(struct kobjec
> > >
> > > power_attr(state);
> > >
> > > +static ssize_t wakeup_count_show(struct kobject *kobj,
> > > + struct kobj_attribute *attr,
> > > + char *buf)
> > > +{
> > > + return sprintf(buf, "%lu\n", pm_get_wakeup_count());
> > > +}
> > > +
> > > +static ssize_t wakeup_count_store(struct kobject *kobj,
> > > + struct kobj_attribute *attr,
> > > + const char *buf, size_t n)
> > > +{
> > > + unsigned long val;
> > > +
> > > + if (sscanf(buf, "%lu", &val) == 1) {
> > > + if (pm_save_wakeup_count(val))
> > > + return n;
> > > + }
> > > + return -EINVAL;
> > > +}
> > > +
> > > +power_attr(wakeup_count);
> > > +
> > > #ifdef CONFIG_PM_TRACE
> > > int pm_trace_enabled;
> > >
> > > @@ -236,6 +258,7 @@ static struct attribute * g[] = {
> > > #endif
> > > #ifdef CONFIG_PM_SLEEP
> > > &pm_async_attr.attr,
> > > + &wakeup_count_attr.attr,
> > > #ifdef CONFIG_PM_DEBUG
> > > &pm_test_attr.attr,
> > > #endif
> > > @@ -266,6 +289,7 @@ static int __init pm_init(void)
> > > int error = pm_start_workqueue();
> > > if (error)
> > > return error;
> > > + pm_wakeup_events_init();
> > > power_kobj = kobject_create_and_add("power", NULL);
> > > if (!power_kobj)
> > > return -ENOMEM;
> > > Index: linux-2.6/drivers/base/power/wakeup.c
> > > ===================================================================
> > > --- /dev/null
> > > +++ linux-2.6/drivers/base/power/wakeup.c
> > > @@ -0,0 +1,74 @@
> > > +
> > > +#include <linux/device.h>
> > > +#include <linux/pm.h>
> > > +
> > > +static unsigned long event_count;
> > > +static unsigned long saved_event_count;
> >
> > what about over flow issues?
>
> There's no issue AFAICS. It only matters if the value is different from the
> previous one after incrementation.
>
> > > +static bool events_check_enabled;
> > > +static spinlock_t events_lock;
> > > +
> > > +void pm_wakeup_events_init(void)
> > > +{
> > > + spin_lock_init(&events_lock);
> > > +}
> > > +
> > > +void pm_wakeup_event(struct device *dev)
> > > +{
> > > + unsigned long flags;
> > > +
> > > + spin_lock_irqsave(&events_lock, flags);
> > > + event_count++;
> > should event_count be an atomic type so we can not bother with taking
> > the evnets_lock?
>
> The lock is there, because two counters are incremented at the same time.
> Also, in some other places the counter is accessed along with the enable
> flag, so there are two operations that need to be done in the same critical
> section.
ok, I was thinking there is only one counter that needs locking, and one
global value writen too from user mode that didn't need locking.
>
> > > + if (dev)
> > > + dev->power.wakeup_count++;
> > > + spin_unlock_irqrestore(&events_lock, flags);
> > > +}
> > > +
> > > +bool pm_check_wakeup_events(bool enable)
> > > +{
> > > + unsigned long flags;
> > > + bool ret;
> > > +
> > > + spin_lock_irqsave(&events_lock, flags);
> > > + ret = !events_check_enabled || (event_count == saved_event_count);
> > I'm not getting the events_check_enbled flag yet.
>
> I tells the PM core whether or not to check wakeup events during suspend.
> The checking is only enabled after a successful write to
> /sys/power/wakeup_count, it is disabled by reads and by the final check right
> before the system enters suspend. [That's because the "saved" value from
> the previous suspend should not be used for checking wakeup events during
> the next one.]
>
> > > + events_check_enabled = enable;
> > I'm not sure if this is the right thing depending on all the different
> > ways the boolians are interacting with eachother.
> >
> > Which is a red flag to me. This code is confusing.
>
> Well, I could use check_wakeup_events() and
> check_and_disable_wakeup_events() (the latter is only necessary for the final
> check), but I'm not sure if that's going to be better.
I'm not sure either what you have is growing on me anyway.
> > I'll look at it some more when I'm fresh tomorrow.
>
> Thanks for the comments.
I still need to look more at the patch.
--mgross
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] <201006202350.27143.rjw@sisk.pl>
@ 2010-06-21 2:23 ` Alan Stern
[not found] ` <Pine.LNX.4.44L0.1006202154260.29532-100000@netrider.rowland.org>
1 sibling, 0 replies; 50+ messages in thread
From: Alan Stern @ 2010-06-21 2:23 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Linux-pm mailing list
On Sun, 20 Jun 2010, Rafael J. Wysocki wrote:
> > > Generally, there are two problems in that area. First, if a wakeup event
> > > occurs exactly at the same time when /sys/power/state is being written to,
> > > the even may be delivered to user space right before the freezing of it,
> > > in which case the user space consumer of the event may not be able to process
> > > it before the system is suspended.
> >
> > Indeed, the same problem arises if the event isn't delivered to
> > userspace until after userspace is frozen.
>
> In that case the kernel should abort the suspend so that the event can be
> delivered to the user space.
Yes.
> > Of course, the underlying issue here is that the kernel has no direct way
> > to know when userspace has finished processing an event. Userspace would
> > have to tell it, which generally would mean rewriting some large number of user
> > programs.
>
> I'm not sure of that. If the kernel doesn't initiate suspend, it doesn't
> really need to know whether or not user space has already consumed the event.
That's true. But it only shifts the onus: When a userspace program has
finished processing an event, it has to tell the power-manager process.
Clearly this sort of thing is unavoidable, one way or another.
> > > The following patch illustrates my idea of how these two problems may be
> > > addressed. It introduces a new global sysfs attribute,
> > > /sys/power/wakeup_count, associated with a running counter of wakeup events
> > > and a helper function, pm_wakeup_event(), that may be used by kernel subsystems
> > > to increment the wakeup events counter.
> >
> > In what way is this better than suspend blockers?
>
> It doesn't add any new framework and it doesn't require the users of
> pm_wakeup_event() to "unblock" suspend, so it is simpler. It also doesn't add
> the user space interface that caused so much opposition to appear.
Okay. A quick comparison shows that in your proposal:
There's no need to register and unregister suspend blockers.
But instead you create the equivalent of a suspend blocker
inside every struct device.
Drivers (or subsystems) don't have to activate suspend
blockers. But instead they have to call pm_wakeup_event().
Drivers don't have to deactivate suspend blockers. You don't
have anything equivalent, and as a result your scheme is
subject to the race described below.
There are no userspace suspend blockers and no opportunistic
suspend. Instead a power-manager process takes care of
initiating or preventing suspends as needed.
In short, you have eliminated the userspace part of the suspend blocker
approach just as in some of the proposals posted earlier, and you have
replaced the in-kernel suspend blockers with new data in struct device
and a new PM API. On the whole, it doesn't seem very different from
the in-kernel part of suspend blockers. The most notable difference is
the name: pm_wake_event() vs. suspend_blocker_activate(), or whatever
it ended up being called.
This is the race I was talking about:
> > What happens if an event arrives just before you read
> > /sys/power/wakeup_count, but the userspace consumer doesn't realize
> > there is a new unprocessed event until after the power manager checks
> > it?
> I think this is not the kernel's problem. In this approach the kernel makes it
> possible for the user space to avoid the race. Whether or not the user space
> will use this opportunity is a different matter.
It is _not_ possible for userspace to avoid this race. Help from the
kernel is needed.
> > Your plan is missing a critical step: the "handoff" whereby
> > responsibility for handling an event passes from the kernel to
> > userspace.
> > With suspend blockers, this handoff occurs when an event queue is
> > emptied and its associate suspend blocker is deactivated. Or with some
> > kinds of events for which the Android people have not written an
> > explicit handoff, it occurs when a timer expires (timed suspend
> > blockers).
>
> Well, quite frankly, I don't see any difference here. In either case there is
> a possibility for user space to mess up things and the kernel can't really help
> that.
With suspend blockers, there is also the possibility for userspace to
handle races correctly. But with your scheme there isn't -- that's the
difference.
> > This shares with the other alternatives posted recently the need for a
> > central power-manager process. And like in-kernel suspend blockers, it
> > requires changes to wakeup-capable drivers (the wakeup-events counter
> > has to be incremented).
>
> It doesn't really require changes to drivers, but to code that knows of wakeup
> events, like the PCI runtime wakeup code.
Just like in-kernel suspend blockers.
> Moreover, it doesn't require kernel
> subsystems to know or even care when it is reasonable to allow suspend to
> happen. The only thing they need to do is to call pm_wakeup_event() whenever
> they see a wakeup event.
That's just semantics. Obviously a wakeup event should prevent suspend
from happening, so if subsystems know or care about one then they know
or care about the other.
> I don't really think it is too much of a requirement
> (and quite frnakly I can't imagine anything simpler than that).
This is because you have omitted the part about allowing suspends again
(or if you prefer, about notifying the PM core that a wakeup event has
been handed off to userspace). As a result of leaving this out, you
haven't eliminated all the races.
> Yes, it does, but I have an idea about how to implement such a power manager
> and I'm going to actually try it.
A logical design would be to use dbus for disseminating PM-related
information. Does your idea work that way?
> I don't think any of the approaches that don't use suspend blockers allows
> one to avoid the race between the process that writes to /sys/power/state
> and a wakeup event happening at the same time. They attempt to address another
> issue, which is how to prevent untrusted user space processes from keeping the
> system out of idle, but that is a different story.
Well, there was one approach that didn't use suspend blockers and did
solve the race: the original wakelocks proposal. Of course, that was
just suspend blockers under a different name. One could make a very
good case that your scheme is also suspend blockers under a different
name (and with an important part missing).
Alan Stern
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] <20100620225810.GB9735@gvim.org>
@ 2010-06-21 2:33 ` Alan Stern
[not found] ` <Pine.LNX.4.44L0.1006202223440.29532-100000@netrider.rowland.org>
1 sibling, 0 replies; 50+ messages in thread
From: Alan Stern @ 2010-06-21 2:33 UTC (permalink / raw)
To: markgross
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, linux-pm
On Sun, 20 Jun 2010, mark gross wrote:
> > Indeed, the same problem arises if the event isn't delivered to
> > userspace until after userspace is frozen. Of course, the underlying
> > issue here is that the kernel has no direct way to know when userspace
> > has finished processing an event. Userspace would have to tell it,
> > which generally would mean rewriting some large number of user
> > programs.
>
> Suspend blockers don't solve this, and the large number of user programs
> isn't a big number. /me thinks its 1 or 2 services.
On Android, perhaps. What about on other systems?
> > In what way is this better than suspend blockers?
>
> 1) I don't think suspend blockers really solve or effectively articulate
> the suspend wake event race problem.
Why not?
> 2) the partial solution suspend blocker offer for the suspend race is
> tightly bound to the suspend blocker implementation and not useful in
> more general contexts.
I don't understand. Can you explain further?
> > One advantage of the suspend-blocker approach is that it essentially
> > uses a single tool to handle both kinds of races (event not fully
> > handled by the kernel, or event not fully handled by userspace).
> > Things aren't quite this simple, because of the need for a special API
> > to implement userspace suspend blockers, but this does avoid the need
> > for a power-manager process.
>
> I don't think suspend-blocker handles both kinds of races as well as you
> seem to think.
Can you give any counterexamples?
> A single tool that conflates multiple kernel facilities
> is not an advantage. It limits applications outside of the scope of
> doing it the "android way".
I don't see that necessarily as a drawback. You might just as well say
that the entire Linux kernel limits applications to doing things the
"Unix" way. Often have fewer choices is an advantage.
> Where do you get the idea that there isn't a need for a centralized PM
> process in Android? Thats not true.
I didn't get that idea from anywhere. Sorry if I gave the wrong
impression -- I meant that non-Android systems would need to implement
a centralized power-manager process, along with all the necessary
changes to other programs.
(Incidentally, even on Android the centralized PM process does not
handle _all_ of the userspace/kernel wakelock interactions.)
Alan Stern
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <Pine.LNX.4.44L0.1006202223440.29532-100000@netrider.rowland.org>
@ 2010-06-21 4:04 ` David Brownell
2010-06-21 6:02 ` David Brownell
2010-06-21 15:06 ` Alan Stern
2010-06-21 5:55 ` mark gross
[not found] ` <20100621055522.GE9735@gvim.org>
2 siblings, 2 replies; 50+ messages in thread
From: David Brownell @ 2010-06-21 4:04 UTC (permalink / raw)
To: markgross, Alan Stern
Cc: Neil Brown, linux-pm, Dmitry Torokhov, Linux Kernel Mailing List,
mark gross
> > > Indeed, the same problem arises if the event
> isn't delivered to
> > > userspace until after userspace is frozen.
Can we put this more directly: the problem is
that the *SYSTEM ISN'T FULLY SUSPENDED* when the
hardware wake event triggers? (Where "*SYSTEM*
includes userspace not just kernel. In fact the
overall system is built from many subsystems,
some in the kernel and some in userspace.
At the risk of being prematurely general: I'd
point out that these subsystems probably have
sequencing requirements. kernel-then-user is
a degenerate case, and surely oversimplified.
There are other examples, e.g. between kernel
subsystems... Like needing to suspend a PMIC
before the bus it uses, where that bus uses
a task to manage request/response protocols.
(Think I2C or SPI.)
This is like the __init/__exit sequencing mess...
In terms of userspace event delivery, I'd say
it's a bug in the event mechanism if taking the
next step in suspension drops any event. It
should be queued, not lost... As a rule the
hardware queuing works (transparently)...
> Of course, the underlying
> > > issue here is that the kernel has no direct way
> to know when userspace
> > > has finished processing an event.
Again said more directly: there's no current
mechanism to coordinate subsystems. Userspace
can't communicate "I'm ready" to kernel, and
vice versa. (a few decades ago, APM could do
that ... we dropped such mechanisms though, and
I'm fairly sure APM's implementation was holey.)
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <Pine.LNX.4.44L0.1006202154260.29532-100000@netrider.rowland.org>
@ 2010-06-21 5:32 ` Florian Mickler
2010-06-21 6:13 ` mark gross
` (2 subsequent siblings)
3 siblings, 0 replies; 50+ messages in thread
From: Florian Mickler @ 2010-06-21 5:32 UTC (permalink / raw)
To: Alan Stern
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Linux-pm mailing list
On Sun, 20 Jun 2010 22:23:38 -0400 (EDT)
Alan Stern <stern@rowland.harvard.edu> wrote:
> On Sun, 20 Jun 2010, Rafael J. Wysocki wrote:
> > > In what way is this better than suspend blockers?
> >
> > It doesn't add any new framework and it doesn't require the users of
> > pm_wakeup_event() to "unblock" suspend, so it is simpler. It also doesn't add
> > the user space interface that caused so much opposition to appear.
>
> Okay. A quick comparison shows that in your proposal:
>
> There's no need to register and unregister suspend blockers.
> But instead you create the equivalent of a suspend blocker
> inside every struct device.
>
> Drivers (or subsystems) don't have to activate suspend
> blockers. But instead they have to call pm_wakeup_event().
>
> Drivers don't have to deactivate suspend blockers. You don't
> have anything equivalent, and as a result your scheme is
> subject to the race described below.
>
> There are no userspace suspend blockers and no opportunistic
> suspend. Instead a power-manager process takes care of
> initiating or preventing suspends as needed.
>
> In short, you have eliminated the userspace part of the suspend blocker
> approach just as in some of the proposals posted earlier, and you have
> replaced the in-kernel suspend blockers with new data in struct device
> and a new PM API. On the whole, it doesn't seem very different from
> the in-kernel part of suspend blockers. The most notable difference is
> the name: pm_wake_event() vs. suspend_blocker_activate(), or whatever
> it ended up being called.
>
> This is the race I was talking about:
>
> > > What happens if an event arrives just before you read
> > > /sys/power/wakeup_count, but the userspace consumer doesn't realize
> > > there is a new unprocessed event until after the power manager checks
> > > it?
>
> > I think this is not the kernel's problem. In this approach the kernel makes it
> > possible for the user space to avoid the race. Whether or not the user space
> > will use this opportunity is a different matter.
>
> It is _not_ possible for userspace to avoid this race. Help from the
> kernel is needed.
It is possible if every (relevant) userspace program implements a
callback for the powermanager to check if one of it's wakeup-sources
got activated.
That way the powermanager would read /sys/power/wakeup_count, then do
the roundtrip to all it's registered users and only then suspend.
This turns the suspend_blockers concept around. Instead of actively
signaling the suspend_blockers, the userspace programs only answer
"yes/no" when asked. (i.e. polling?)
You _can not_ implement userspace suspend blockers with this approach,
as it is vital for every userspace program to get scheduled and check
it's wakeup-source (if even possible) before you know that the right
parties have won the race.
Cheers,
Flo
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <Pine.LNX.4.44L0.1006202223440.29532-100000@netrider.rowland.org>
2010-06-21 4:04 ` David Brownell
@ 2010-06-21 5:55 ` mark gross
[not found] ` <20100621055522.GE9735@gvim.org>
2 siblings, 0 replies; 50+ messages in thread
From: mark gross @ 2010-06-21 5:55 UTC (permalink / raw)
To: Alan Stern
Cc: mark gross, markgross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, linux-pm
On Sun, Jun 20, 2010 at 10:33:01PM -0400, Alan Stern wrote:
> On Sun, 20 Jun 2010, mark gross wrote:
>
> > > Indeed, the same problem arises if the event isn't delivered to
> > > userspace until after userspace is frozen. Of course, the underlying
> > > issue here is that the kernel has no direct way to know when userspace
> > > has finished processing an event. Userspace would have to tell it,
> > > which generally would mean rewriting some large number of user
> > > programs.
> >
> > Suspend blockers don't solve this, and the large number of user programs
> > isn't a big number. /me thinks its 1 or 2 services.
>
> On Android, perhaps. What about on other systems?
This is just speculation but, I would expect other systems would have
the graphics, input, telephony, and any PM services wake_count / wake
event ware.
I can't imagine (at the moment) anything else that would care.
>
> > > In what way is this better than suspend blockers?
> >
> > 1) I don't think suspend blockers really solve or effectively articulate
> > the suspend wake event race problem.
>
> Why not?
For starters the suspend block patch is about suspend blocking, at least
at first before competing ideas starting coming out and this race issue
was offered up as an excuse to not consider the alternative ideas, then
suddenly suspend blocking became a wake event notification mechanism
implementation that was baked into the implementation. The arguments
are still a blur and irritating to recall / look up again.
But, the point I'm trying to make is that wake event serialization /
event handling suddenly became a big FUD-pie tightly bound to a specific
feature for suspend blocking (or was is auto suspending? Its a magical
solution to all the PM problems in the kernel. I'm being sarcastic.)
Its much better to call out the issue and provide a clean targeted
solution to it without binding it to some other solution to a different
problem.
>
> > 2) the partial solution suspend blocker offer for the suspend race is
> > tightly bound to the suspend blocker implementation and not useful in
> > more general contexts.
>
> I don't understand. Can you explain further?
>
> > > One advantage of the suspend-blocker approach is that it essentially
> > > uses a single tool to handle both kinds of races (event not fully
> > > handled by the kernel, or event not fully handled by userspace).
> > > Things aren't quite this simple, because of the need for a special API
> > > to implement userspace suspend blockers, but this does avoid the need
> > > for a power-manager process.
> >
> > I don't think suspend-blocker handles both kinds of races as well as you
> > seem to think.
>
> Can you give any counterexamples?
I knew you'd ask such a thing. Do you have any correctness proofs of it
working right?
Lets consider the RIL incoming call race:
1) user mode initiates an "opportunistic suspend" or whatever we call
this these days by writing to some sys interface.
2) a call comes in (multi-threaded cpu) just after the write.
3) the caif driver grabs a blocker, stopping the in flight suspend in the
nick of time. But, when should it release it without racing? How does
one implement that kernel code such that its portable to non-android user
mode stacks?
>
> > A single tool that conflates multiple kernel facilities
> > is not an advantage. It limits applications outside of the scope of
> > doing it the "android way".
>
> I don't see that necessarily as a drawback. You might just as well say
> that the entire Linux kernel limits applications to doing things the
> "Unix" way. Often have fewer choices is an advantage.
I disagree with your analogy. One problem one solution with minimal
coupling to other implementation details is nice. Two problems with one
solution dependent on the system wide architecture bound to a user mode
PM design is fragile and not portable.
> > Where do you get the idea that there isn't a need for a centralized PM
> > process in Android? Thats not true.
>
> I didn't get that idea from anywhere. Sorry if I gave the wrong
> impression -- I meant that non-Android systems would need to implement
> a centralized power-manager process, along with all the necessary
> changes to other programs.
>
> (Incidentally, even on Android the centralized PM process does not
> handle _all_ of the userspace/kernel wakelock interactions.)
>
Yeah, I've been told that too. I've grepped for where the wake_unlock
sysfs interfaces are hit from user mode android (eclair) and I would
like some help in identifying those guys. Maybe its in the FroYo code I
don't get to look at yet?
There is libhardware_legacy/power/power.c and an out of tree kernel
broadcom bcm4329 driver under hardware/broadcom but that doesn't count
as it looks like a kernel driver to me.
--mgross
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
2010-06-21 4:04 ` David Brownell
@ 2010-06-21 6:02 ` David Brownell
2010-06-21 15:06 ` Alan Stern
1 sibling, 0 replies; 50+ messages in thread
From: David Brownell @ 2010-06-21 6:02 UTC (permalink / raw)
To: markgross, Alan Stern
Cc: Neil Brown, linux-pm, Dmitry Torokhov, Linux Kernel Mailing List,
mark gross
--- On Sun, 6/20/10, David Brownell <david-b@pacbell.net> wrote:
... in a sort of "aren't we asking the
wrong questions??" manner ...
I suspect that
looking at the problem in terms of how to
coordinate subsystems (an abstraction which
is at best very ad-hoc today!) we would
end up with a cleaner model, which doesn't
bother so many folk the ay wakelocks or
even suspend blockers seem to bother them...
> From: David Brownell <david-b@pacbell.net>
> Subject: Re: [linux-pm] [RFC][PATCH] PM: Avoid losing wakeup events during suspend
> To: markgross@thegnar.org, "Alan Stern" <stern@rowland.harvard.edu>
> Cc: "Neil Brown" <neilb@suse.de>, linux-pm@lists.linux-foundation.org, "Dmitry Torokhov" <dmitry.torokhov@gmail.com>, "Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>, "mark gross" <640e9920@gmail.com>
> Date: Sunday, June 20, 2010, 9:04 PM
>
> > > > Indeed, the same problem arises if the
> event
> > isn't delivered to
> > > > userspace until after userspace is frozen.
>
> Can we put this more directly: the problem is
> that the *SYSTEM ISN'T FULLY SUSPENDED* when the
> hardware wake event triggers? (Where "*SYSTEM*
> includes userspace not just kernel. In fact the
> overall system is built from many subsystems,
> some in the kernel and some in userspace.
>
> At the risk of being prematurely general: I'd
> point out that these subsystems probably have
> sequencing requirements. kernel-then-user is
> a degenerate case, and surely oversimplified.
> There are other examples, e.g. between kernel
> subsystems... Like needing to suspend a PMIC
> before the bus it uses, where that bus uses
> a task to manage request/response protocols.
> (Think I2C or SPI.)
>
> This is like the __init/__exit sequencing mess...
>
> In terms of userspace event delivery, I'd say
> it's a bug in the event mechanism if taking the
> next step in suspension drops any event. It
> should be queued, not lost... As a rule the
> hardware queuing works (transparently)...
>
> > Of course, the underlying
> > > > issue here is that the kernel has no direct
> way
> > to know when userspace
> > > > has finished processing an event.
>
>
> Again said more directly: there's no current
> mechanism to coordinate subsystems. Userspace
> can't communicate "I'm ready" to kernel, and
> vice versa. (a few decades ago, APM could do
> that ... we dropped such mechanisms though, and
> I'm fairly sure APM's implementation was holey.)
>
>
>
>
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/linux-pm
>
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <Pine.LNX.4.44L0.1006202154260.29532-100000@netrider.rowland.org>
2010-06-21 5:32 ` Florian Mickler
@ 2010-06-21 6:13 ` mark gross
[not found] ` <20100621061345.GF9735@gvim.org>
2010-06-21 21:58 ` Rafael J. Wysocki
3 siblings, 0 replies; 50+ messages in thread
From: mark gross @ 2010-06-21 6:13 UTC (permalink / raw)
To: Alan Stern
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Linux-pm mailing list
On Sun, Jun 20, 2010 at 10:23:38PM -0400, Alan Stern wrote:
> On Sun, 20 Jun 2010, Rafael J. Wysocki wrote:
>
> > > > Generally, there are two problems in that area. First, if a wakeup event
> > > > occurs exactly at the same time when /sys/power/state is being written to,
> > > > the even may be delivered to user space right before the freezing of it,
> > > > in which case the user space consumer of the event may not be able to process
> > > > it before the system is suspended.
> > >
> > > Indeed, the same problem arises if the event isn't delivered to
> > > userspace until after userspace is frozen.
> >
> > In that case the kernel should abort the suspend so that the event can be
> > delivered to the user space.
>
> Yes.
>
> > > Of course, the underlying issue here is that the kernel has no direct way
> > > to know when userspace has finished processing an event. Userspace would
> > > have to tell it, which generally would mean rewriting some large number of user
> > > programs.
> >
> > I'm not sure of that. If the kernel doesn't initiate suspend, it doesn't
> > really need to know whether or not user space has already consumed the event.
>
> That's true. But it only shifts the onus: When a userspace program has
> finished processing an event, it has to tell the power-manager process.
> Clearly this sort of thing is unavoidable, one way or another.
>
> > > > The following patch illustrates my idea of how these two problems may be
> > > > addressed. It introduces a new global sysfs attribute,
> > > > /sys/power/wakeup_count, associated with a running counter of wakeup events
> > > > and a helper function, pm_wakeup_event(), that may be used by kernel subsystems
> > > > to increment the wakeup events counter.
> > >
> > > In what way is this better than suspend blockers?
> >
> > It doesn't add any new framework and it doesn't require the users of
> > pm_wakeup_event() to "unblock" suspend, so it is simpler. It also doesn't add
> > the user space interface that caused so much opposition to appear.
>
> Okay. A quick comparison shows that in your proposal:
>
> There's no need to register and unregister suspend blockers.
> But instead you create the equivalent of a suspend blocker
> inside every struct device.
>
> Drivers (or subsystems) don't have to activate suspend
> blockers. But instead they have to call pm_wakeup_event().
>
> Drivers don't have to deactivate suspend blockers. You don't
> have anything equivalent, and as a result your scheme is
> subject to the race described below.
>
> There are no userspace suspend blockers and no opportunistic
> suspend. Instead a power-manager process takes care of
> initiating or preventing suspends as needed.
>
> In short, you have eliminated the userspace part of the suspend blocker
> approach just as in some of the proposals posted earlier, and you have
> replaced the in-kernel suspend blockers with new data in struct device
> and a new PM API. On the whole, it doesn't seem very different from
> the in-kernel part of suspend blockers. The most notable difference is
> the name: pm_wake_event() vs. suspend_blocker_activate(), or whatever
> it ended up being called.
>
> This is the race I was talking about:
Your confused about what problem this patch attempts to solve. There is
a pm_qos patch in the works to address the suspend blocker
functionality.
http://lists.linux-foundation.org/pipermail/linux-pm/2010-June/026760.html
--mgross
>
> > > What happens if an event arrives just before you read
> > > /sys/power/wakeup_count, but the userspace consumer doesn't realize
> > > there is a new unprocessed event until after the power manager checks
> > > it?
>
> > I think this is not the kernel's problem. In this approach the kernel makes it
> > possible for the user space to avoid the race. Whether or not the user space
> > will use this opportunity is a different matter.
>
> It is _not_ possible for userspace to avoid this race. Help from the
> kernel is needed.
>
> > > Your plan is missing a critical step: the "handoff" whereby
> > > responsibility for handling an event passes from the kernel to
> > > userspace.
>
> > > With suspend blockers, this handoff occurs when an event queue is
> > > emptied and its associate suspend blocker is deactivated. Or with some
> > > kinds of events for which the Android people have not written an
> > > explicit handoff, it occurs when a timer expires (timed suspend
> > > blockers).
> >
> > Well, quite frankly, I don't see any difference here. In either case there is
> > a possibility for user space to mess up things and the kernel can't really help
> > that.
>
> With suspend blockers, there is also the possibility for userspace to
> handle races correctly. But with your scheme there isn't -- that's the
> difference.
>
> > > This shares with the other alternatives posted recently the need for a
> > > central power-manager process. And like in-kernel suspend blockers, it
> > > requires changes to wakeup-capable drivers (the wakeup-events counter
> > > has to be incremented).
> >
> > It doesn't really require changes to drivers, but to code that knows of wakeup
> > events, like the PCI runtime wakeup code.
>
> Just like in-kernel suspend blockers.
>
> > Moreover, it doesn't require kernel
> > subsystems to know or even care when it is reasonable to allow suspend to
> > happen. The only thing they need to do is to call pm_wakeup_event() whenever
> > they see a wakeup event.
>
> That's just semantics. Obviously a wakeup event should prevent suspend
> from happening, so if subsystems know or care about one then they know
> or care about the other.
>
> > I don't really think it is too much of a requirement
> > (and quite frnakly I can't imagine anything simpler than that).
>
> This is because you have omitted the part about allowing suspends again
> (or if you prefer, about notifying the PM core that a wakeup event has
> been handed off to userspace). As a result of leaving this out, you
> haven't eliminated all the races.
>
> > Yes, it does, but I have an idea about how to implement such a power manager
> > and I'm going to actually try it.
>
> A logical design would be to use dbus for disseminating PM-related
> information. Does your idea work that way?
>
> > I don't think any of the approaches that don't use suspend blockers allows
> > one to avoid the race between the process that writes to /sys/power/state
> > and a wakeup event happening at the same time. They attempt to address another
> > issue, which is how to prevent untrusted user space processes from keeping the
> > system out of idle, but that is a different story.
>
> Well, there was one approach that didn't use suspend blockers and did
> solve the race: the original wakelocks proposal. Of course, that was
> just suspend blockers under a different name. One could make a very
> good case that your scheme is also suspend blockers under a different
> name (and with an important part missing).
>
> Alan Stern
>
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <20100621061345.GF9735@gvim.org>
@ 2010-06-21 12:10 ` tytso
2010-06-21 16:01 ` Alan Stern
` (2 subsequent siblings)
3 siblings, 0 replies; 50+ messages in thread
From: tytso @ 2010-06-21 12:10 UTC (permalink / raw)
To: markgross
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Linux-pm mailing list
So where are we at this point?
Discussion had completely died down for a while, and it's picked up
again, but it's not clear to me that we're any closer to reaching
consensus.
There's been one proposal that we simply merge in a set of no-op
inline functions for suspend blockers, just so we can get let the
drivers go in (assuming that Greg K-H believes this is still a
problem), but with an automatical removal of N months (N to be
decided, say 9 or 12 or 18 months).
My concern is that if we do that, we will have simply kicked the ball
down the road for N months. Another approach is to simply merge in
no-op functions and not leave any kind of deprecation schedule.
That's sort of an implicit admission of the fact that we may not reach
consensus on this issue. Or we could simply ship the patches as-is to
Linus after he gets back from vacation and ask him for a thumbs up or
thumbs down vote, which might settle things once and for all.
How do we go forward from here?
- Ted
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <20100621121058.GF6199@thunk.org>
@ 2010-06-21 12:22 ` Alan Cox
2010-06-21 12:26 ` Florian Mickler
` (3 more replies)
2010-06-22 1:07 ` mark gross
1 sibling, 4 replies; 50+ messages in thread
From: Alan Cox @ 2010-06-21 12:22 UTC (permalink / raw)
To: tytso
Cc: mark gross, markgross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Linux-pm mailing list
> There's been one proposal that we simply merge in a set of no-op
> inline functions for suspend blockers, just so we can get let the
> drivers go in (assuming that Greg K-H believes this is still a
> problem), but with an automatical removal of N months (N to be
> decided, say 9 or 12 or 18 months).
If you automatically remove the suspend blocker wrappers in 12 months
then you can keep the drivers in tree.
The drivers are generally not a problem (ok some of them like the binder
are interesting little trips) its just those hooks, and we'd all benefit
somewhat even if the only bit google are patching back in are their hooks.
> down the road for N months. Another approach is to simply merge in
> no-op functions and not leave any kind of deprecation schedule.
> That's sort of an implicit admission of the fact that we may not reach
> consensus on this issue. Or we could simply ship the patches as-is to
Very bad precedent. What happens when every other vendor does the same ?
Keep the upstream code clean.
> Linus after he gets back from vacation and ask him for a thumbs up or
> thumbs down vote, which might settle things once and for all.
You seem desperate to just throw it at Linus - you have been all along
before the discussion, during it and now: but I don't understand why ?
It's as if you don't want progress to be made by other means ?
> How do we go forward from here?
PM QoS seems to be evolving nicely.
As for merging stuff - if its new code then it should get submitted with
the hooks left out anyway, just as all the other vendors are doing with
weird little custom hooks of their own. The hooks can still easily be
patched back in as a tiny easy to maintain vendor patch.
This works - the code still gets updated and seen by people, only the
little hook patch has to be maintained and generally it looks after
itself except for the odd .rej when something changes right up by the
merge point.
Alan
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
2010-06-21 12:22 ` Alan Cox
@ 2010-06-21 12:26 ` Florian Mickler
2010-06-21 12:26 ` Florian Mickler
` (2 subsequent siblings)
3 siblings, 0 replies; 50+ messages in thread
From: Florian Mickler @ 2010-06-21 12:26 UTC (permalink / raw)
To: Alan Cox
Cc: tytso, Linux-pm, mark gross, markgross, Neil Brown,
Dmitry Torokhov, Linux Kernel Mailing List, list
On Mon, 21 Jun 2010 13:22:34 +0100
Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > How do we go forward from here?
>
> PM QoS seems to be evolving nicely.
Still, it doesn't solve the userspace-part.
Cheers,
Flo
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
2010-06-21 12:22 ` Alan Cox
2010-06-21 12:26 ` Florian Mickler
@ 2010-06-21 12:26 ` Florian Mickler
2010-06-21 13:42 ` tytso
[not found] ` <20100621134249.GG6199@thunk.org>
3 siblings, 0 replies; 50+ messages in thread
From: Florian Mickler @ 2010-06-21 12:26 UTC (permalink / raw)
To: linux-pm; +Cc: linux-kernel
On Mon, 21 Jun 2010 13:22:34 +0100
Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > How do we go forward from here?
>
> PM QoS seems to be evolving nicely.
Still, it doesn't solve the userspace-part.
Cheers,
Flo
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <20100621055522.GE9735@gvim.org>
@ 2010-06-21 12:39 ` Florian Mickler
2010-06-21 15:57 ` Alan Stern
[not found] ` <Pine.LNX.4.44L0.1006211125470.1687-100000@iolanthe.rowland.org>
2 siblings, 0 replies; 50+ messages in thread
From: Florian Mickler @ 2010-06-21 12:39 UTC (permalink / raw)
To: markgross
Cc: 640e9920, Neil Brown, Dmitry Torokhov, Linux Kernel Mailing List,
Matthew, linux-pm
On Sun, 20 Jun 2010 22:55:22 -0700
mark gross <640e9920@gmail.com> wrote:
> On Sun, Jun 20, 2010 at 10:33:01PM -0400, Alan Stern wrote:
> > On Sun, 20 Jun 2010, mark gross wrote:
> > > > In what way is this better than suspend blockers?
> > >
> > > 1) I don't think suspend blockers really solve or effectively articulate
> > > the suspend wake event race problem.
> >
> > Why not?
>
> For starters the suspend block patch is about suspend blocking, at least
> at first before competing ideas starting coming out and this race issue
> was offered up as an excuse to not consider the alternative ideas, then
> suddenly suspend blocking became a wake event notification mechanism
> implementation that was baked into the implementation. The arguments
> are still a blur and irritating to recall / look up again.
I really can't follow you here. Userspace and kernelspace actively
blocking suspend when necessary guarantees that the race is won by the
right party. I.e. the race problem is solved.
You can view it as a wake event notification if you want. But that's
just another point of view.
>
> But, the point I'm trying to make is that wake event serialization /
> event handling suddenly became a big FUD-pie tightly bound to a specific
> feature for suspend blocking (or was is auto suspending? Its a magical
> solution to all the PM problems in the kernel. I'm being sarcastic.)
>
> Its much better to call out the issue and provide a clean targeted
> solution to it without binding it to some other solution to a different
> problem.
See my other mail in response to Alan Stern. In the case of a
userspace-suspend-manager, if we just use this wake-event-counting
mechanism, we have to make shure userspace get's scheduled and notifies
the suspend-manager with an 'all clear' before it suspends the machine.
Else it can happen for the userspace-manager to suspend before
userspace could notify the userspace-manager to not suspend.
Cheers,
Flo
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
2010-06-21 12:22 ` Alan Cox
2010-06-21 12:26 ` Florian Mickler
2010-06-21 12:26 ` Florian Mickler
@ 2010-06-21 13:42 ` tytso
[not found] ` <20100621134249.GG6199@thunk.org>
3 siblings, 0 replies; 50+ messages in thread
From: tytso @ 2010-06-21 13:42 UTC (permalink / raw)
To: Alan Cox
Cc: mark gross, markgross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Linux-pm mailing list
On Mon, Jun 21, 2010 at 01:22:34PM +0100, Alan Cox wrote:
>
> You seem desperate to just throw it at Linus - you have been all along
> before the discussion, during it and now: but I don't understand why ?
Why are you so afraid to let Linus make the call? He's the benevolent
dictator, isn't he?
> It's as if you don't want progress to be made by other means ?
It doesn't look like progress is being made here.
- Ted
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <20100621134249.GG6199@thunk.org>
@ 2010-06-21 14:01 ` Alan Cox
0 siblings, 0 replies; 50+ messages in thread
From: Alan Cox @ 2010-06-21 14:01 UTC (permalink / raw)
To: tytso
Cc: mark gross, markgross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Linux-pm mailing list
On Mon, 21 Jun 2010 09:42:49 -0400
tytso@mit.edu wrote:
> On Mon, Jun 21, 2010 at 01:22:34PM +0100, Alan Cox wrote:
> >
> > You seem desperate to just throw it at Linus - you have been all along
> > before the discussion, during it and now: but I don't understand why ?
>
> Why are you so afraid to let Linus make the call? He's the benevolent
> dictator, isn't he?
I don't mind him laughing at it. I'm just curious why you've tried to
short circuit the more general discussion repeatedly - even right at the
start.
> It doesn't look like progress is being made here.
On suspend blockers no - on some of the real problems yes.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
2010-06-21 4:04 ` David Brownell
2010-06-21 6:02 ` David Brownell
@ 2010-06-21 15:06 ` Alan Stern
1 sibling, 0 replies; 50+ messages in thread
From: Alan Stern @ 2010-06-21 15:06 UTC (permalink / raw)
To: David Brownell
Cc: mark gross, markgross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Florian Mickler, Linux-pm mailing list
On Sun, 20 Jun 2010, David Brownell wrote:
> Can we put this more directly: the problem is
> that the *SYSTEM ISN'T FULLY SUSPENDED* when the
> hardware wake event triggers? (Where "*SYSTEM*
> includes userspace not just kernel. In fact the
> overall system is built from many subsystems,
> some in the kernel and some in userspace.
Indeed, the system may not even be partially suspended when the wake
event triggers.
> At the risk of being prematurely general: I'd
> point out that these subsystems probably have
> sequencing requirements. kernel-then-user is
> a degenerate case, and surely oversimplified.
> There are other examples, e.g. between kernel
> subsystems... Like needing to suspend a PMIC
> before the bus it uses, where that bus uses
> a task to manage request/response protocols.
> (Think I2C or SPI.)
>
> This is like the __init/__exit sequencing mess...
>
> In terms of userspace event delivery, I'd say
> it's a bug in the event mechanism if taking the
> next step in suspension drops any event. It
> should be queued, not lost... As a rule the
> hardware queuing works (transparently)...
There may be a misunderstanding here... People talk about events
getting lost, but what they (usually) mean is that the event isn't
actually _dropped_ -- rather, it fails to trigger a wakeup or to
prevent a suspend. When something else causes the system to resume
later on, the event will be delivered normally.
This means that the problem is not one of sequencing. The problem is
twofold:
To recognize when a wakeup event has occurred and therefore
it is not now safe to allow the system to suspend;
And to recognize when a wakeup event has been completely
handled and therefore it is once again safe to allow the system
to suspend.
> > Of course, the underlying
> > > > issue here is that the kernel has no direct way
> > to know when userspace
> > > > has finished processing an event.
>
>
> Again said more directly: there's no current
> mechanism to coordinate subsystems. Userspace
> can't communicate "I'm ready" to kernel, and
> vice versa. (a few decades ago, APM could do
> that ... we dropped such mechanisms though, and
> I'm fairly sure APM's implementation was holey.)
Yes, that's a better way of putting it. And it's not just a matter of
"userspace communicating with the kernel", because userspace is not
monolithic. There has to be a way for one user process to communicate
this information to another (I like Florian's idea). Of course, the
kernel doesn't have to worry about those details.
If one accepts a scheme in which all the suspend initiations and
cancellations are carried out by a single process (a power-manager
process), then the difficulties of communication and coordination
between the kernel and userspace are minimized.
Alan Stern
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] <20100621073233.3f874ad0@schatten.dmk.lab>
@ 2010-06-21 15:23 ` Alan Stern
2010-06-21 16:54 ` Alan Stern
` (2 subsequent siblings)
3 siblings, 0 replies; 50+ messages in thread
From: Alan Stern @ 2010-06-21 15:23 UTC (permalink / raw)
To: Florian Mickler
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Linux-pm mailing list
On Mon, 21 Jun 2010, Florian Mickler wrote:
> On Sun, 20 Jun 2010 22:23:38 -0400 (EDT)
> Alan Stern <stern@rowland.harvard.edu> wrote:
> > This is the race I was talking about:
> >
> > > > What happens if an event arrives just before you read
> > > > /sys/power/wakeup_count, but the userspace consumer doesn't realize
> > > > there is a new unprocessed event until after the power manager checks
> > > > it?
> >
> > > I think this is not the kernel's problem. In this approach the kernel makes it
> > > possible for the user space to avoid the race. Whether or not the user space
> > > will use this opportunity is a different matter.
> >
> > It is _not_ possible for userspace to avoid this race. Help from the
> > kernel is needed.
>
> It is possible if every (relevant) userspace program implements a
> callback for the powermanager to check if one of it's wakeup-sources
> got activated.
>
> That way the powermanager would read /sys/power/wakeup_count, then do
> the roundtrip to all it's registered users and only then suspend.
>
> This turns the suspend_blockers concept around. Instead of actively
> signaling the suspend_blockers, the userspace programs only answer
> "yes/no" when asked. (i.e. polling?)
In the end you would want to have communication in both directions:
suspend blockers _and_ callbacks. Polling is bad if done too often.
But I think the idea is a good one.
In fact, you don't need a "yes/no" response. Programs merely need a
chance to activate a new suspend blocker if a wakeup source was
recently activated before they acknowledge the poll.
> You _can not_ implement userspace suspend blockers with this approach,
> as it is vital for every userspace program to get scheduled and check
> it's wakeup-source (if even possible) before you know that the right
> parties have won the race.
I'm not sure what you mean. Certainly you can take a userspace
suspend-blocker implementation of the sort discussed before (where
programs communicate their needs to a central power-manager process)
and add this callback mechanism on top.
There is still at least one loophole to be closed: Android's
timer-based wakelocks. These include cases where the Android
developers didn't add enough wakelocks to cover the entire path from
kernel-event to userspace-handler, so they punted and relied on a timer
to decide when the wakelock should be deactivated. (There may be other
cases too; I didn't follow the original discussion very closely.)
It's not clear whether these things can be handled already in Rafael's
scheme with your addition, or whether something new is needed.
Alan Stern
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <20100621055522.GE9735@gvim.org>
2010-06-21 12:39 ` Florian Mickler
@ 2010-06-21 15:57 ` Alan Stern
[not found] ` <Pine.LNX.4.44L0.1006211125470.1687-100000@iolanthe.rowland.org>
2 siblings, 0 replies; 50+ messages in thread
From: Alan Stern @ 2010-06-21 15:57 UTC (permalink / raw)
To: markgross
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, linux-pm
On Sun, 20 Jun 2010, mark gross wrote:
> > > 1) I don't think suspend blockers really solve or effectively articulate
> > > the suspend wake event race problem.
> >
> > Why not?
>
> For starters the suspend block patch is about suspend blocking, at least
> at first before competing ideas starting coming out and this race issue
> was offered up as an excuse to not consider the alternative ideas, then
> suddenly suspend blocking became a wake event notification mechanism
> implementation that was baked into the implementation. The arguments
> are still a blur and irritating to recall / look up again.
I get the feeling you didn't fully absorb the intent of the original
wakelock patches. Right from the start they were about fixing a race
in system suspend: The system may go to sleep even though a pending
wakeup event should have blocked or prevented the suspend. In
particular that means notifying the PM core about wakeup events, when
they occur, and when the system may once again be allowed to suspend.
The userspace parts of the original patches did cloud the issue,
admittedly. But the purpose of the in-kernel parts has always been
clear.
> But, the point I'm trying to make is that wake event serialization /
> event handling suddenly became a big FUD-pie tightly bound to a specific
> feature for suspend blocking (or was is auto suspending? Its a magical
> solution to all the PM problems in the kernel. I'm being sarcastic.)
>
> Its much better to call out the issue and provide a clean targeted
> solution to it without binding it to some other solution to a different
> problem.
That's exactly what the wakelock patches did: They called out the
issue of the system suspending even while there were pending wakeup
events, they provided a targeted solution, and it wasn't bound to
another solution to a different problem.
Indeed, if you go back through the (I agree, irritating) threads on
this topic, you'll see that the FUD and other issues were all
introduced by other kernel developers, mainly because they didn't like
the idea of using system suspend as a mechanism for dynamic power
management.
> > > I don't think suspend-blocker handles both kinds of races as well as you
> > > seem to think.
> >
> > Can you give any counterexamples?
>
> I knew you'd ask such a thing. Do you have any correctness proofs of it
> working right?
If you want, sure. But what you think "working right" means may not be
the same as what I think.
> Lets consider the RIL incoming call race:
"RIL"?
> 1) user mode initiates an "opportunistic suspend" or whatever we call
> this these days by writing to some sys interface.
Okay.
> 2) a call comes in (multi-threaded cpu) just after the write.
Right. Presumably we want the suspend to be delayed until the call
can be handled by a user programm, yes?
> 3) the caif driver grabs a blocker, stopping the in flight suspend in the
> nick of time. But, when should it release it without racing? How does
> one implement that kernel code such that its portable to non-android user
> mode stacks?
I don't know anything about the caif driver so I can't answer this
question in detail. Here's the general idea: Somehow the kernel has to
notify userspace that an incoming call has arrived. Whatever driver or
subsystem sends that notification should release the suspend blocker
once userspace indicates that the notification has been received (for
example, by reading all the data in an input event queue). That's true
for both Android and non-Android.
In some cases there may not be any mechanism for userspace to tell the
kernel when a notification has been received. For thoses cases, the
Android people used timer-based blockers. This is not necessarily the
best approach but they seem happy with it. Others might prefer to add
an explicit "notification received" mechanism.
Still others take the view that since suspends are initiated by
userspace anyway, it's not necessary to tell the kernel when suspend is
safe again. Just tell the user process responsible for initiating
suspends.
> > > A single tool that conflates multiple kernel facilities
> > > is not an advantage. It limits applications outside of the scope of
> > > doing it the "android way".
> >
> > I don't see that necessarily as a drawback. You might just as well say
> > that the entire Linux kernel limits applications to doing things the
> > "Unix" way. Often have fewer choices is an advantage.
>
> I disagree with your analogy. One problem one solution with minimal
> coupling to other implementation details is nice. Two problems with one
> solution dependent on the system wide architecture bound to a user mode
> PM design is fragile and not portable.
I assume the "two problems" you refer to are: telling the PM core that
the kernel has a wakeup event in progress, and telling the PM core that
userspace has a wakeup event in progress. To me these don't seem to be
vastly different problems, and having a single solution for both
doesn't seem out of line.
The fact that this is bound to a user-mode PM design was inevitable,
given the whole idea was to overcome weaknesses in the system suspend
implementation and that implementation already is driven by userspace.
> > (Incidentally, even on Android the centralized PM process does not
> > handle _all_ of the userspace/kernel wakelock interactions.)
> >
>
> Yeah, I've been told that too. I've grepped for where the wake_unlock
> sysfs interfaces are hit from user mode android (eclair) and I would
> like some help in identifying those guys. Maybe its in the FroYo code I
> don't get to look at yet?
>
> There is libhardware_legacy/power/power.c and an out of tree kernel
> broadcom bcm4329 driver under hardware/broadcom but that doesn't count
> as it looks like a kernel driver to me.
I don't know -- I have never looked at the Android userspace. No doubt
Arve can provide some examples.
Alan Stern
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <20100621061345.GF9735@gvim.org>
2010-06-21 12:10 ` tytso
@ 2010-06-21 16:01 ` Alan Stern
[not found] ` <20100621121058.GF6199@thunk.org>
[not found] ` <Pine.LNX.4.44L0.1006211158200.1687-100000@iolanthe.rowland.org>
3 siblings, 0 replies; 50+ messages in thread
From: Alan Stern @ 2010-06-21 16:01 UTC (permalink / raw)
To: markgross
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Linux-pm mailing list
On Sun, 20 Jun 2010, mark gross wrote:
> Your confused about what problem this patch attempts to solve.
I don't think so. Rafael's description was pretty clear.
> There is
> a pm_qos patch in the works to address the suspend blocker
> functionality.
> http://lists.linux-foundation.org/pipermail/linux-pm/2010-June/026760.html
No. That patch addresses something _similar_ to the suspend blocker
functionality. The fact remains, though, that pm_qos is not used
during system suspend (the /sys/power/state interface), hence changes
to pm_qos won't solve the system-suspend problems that suspend blockers
do solve.
Alan Stern
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] <20100621073233.3f874ad0@schatten.dmk.lab>
2010-06-21 15:23 ` Alan Stern
@ 2010-06-21 16:54 ` Alan Stern
[not found] ` <Pine.LNX.4.44L0.1006211111060.1687-100000@iolanthe.rowland.org>
[not found] ` <Pine.LNX.4.44L0.1006211244140.1687-100000@iolanthe.rowland.org>
3 siblings, 0 replies; 50+ messages in thread
From: Alan Stern @ 2010-06-21 16:54 UTC (permalink / raw)
To: Florian Mickler
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Linux-pm mailing list
On Mon, 21 Jun 2010, Florian Mickler wrote:
> > > > What happens if an event arrives just before you read
> > > > /sys/power/wakeup_count, but the userspace consumer doesn't realize
> > > > there is a new unprocessed event until after the power manager checks
> > > > it?
> >
> > > I think this is not the kernel's problem. In this approach the kernel makes it
> > > possible for the user space to avoid the race. Whether or not the user space
> > > will use this opportunity is a different matter.
> >
> > It is _not_ possible for userspace to avoid this race. Help from the
> > kernel is needed.
>
> It is possible if every (relevant) userspace program implements a
> callback for the powermanager to check if one of it's wakeup-sources
> got activated.
>
> That way the powermanager would read /sys/power/wakeup_count, then do
> the roundtrip to all it's registered users and only then suspend.
After further thought, there's still a race:
A wakeup event arrives.
The kernel increments /sys/power/wakeup_count and starts
processing the event.
The power-manager process reads /sys/power/wakeup_count.
The power-manager process polls the relevant programs and
they all say no events are pending.
The power-manager process successfully writes
/sys/power/wakeup_count.
The power-manager process initiates a suspend.
... Hours later ...
The system wakes up.
The kernel finishes its internal processing of the event and
sends a notification to a user program.
The problem here is that the power-manager process can't tell when the
kernel has finished processing an event. This is true both for events
that need to propagate to userspace and for events that are handled
entirely by the kernel.
This is a reflection of the two distinct pieces of information that we
need to keep track of:
A wakeup event has arrived, so it's no longer safe to suspend.
Wakeup events are no longer pending, so it's once again
safe to suspend.
The wakeup_count interface tracks the first, but in this scheme nothing
tracks the second.
Alan Stern
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <Pine.LNX.4.44L0.1006211111060.1687-100000@iolanthe.rowland.org>
@ 2010-06-21 20:38 ` Florian Mickler
0 siblings, 0 replies; 50+ messages in thread
From: Florian Mickler @ 2010-06-21 20:38 UTC (permalink / raw)
To: Alan Stern
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Linux-pm mailing list
On Mon, 21 Jun 2010 11:23:33 -0400 (EDT)
Alan Stern <stern@rowland.harvard.edu> wrote:
> On Mon, 21 Jun 2010, Florian Mickler wrote:
>
> > On Sun, 20 Jun 2010 22:23:38 -0400 (EDT)
> > Alan Stern <stern@rowland.harvard.edu> wrote:
>
> > > This is the race I was talking about:
> > >
> > > > > What happens if an event arrives just before you read
> > > > > /sys/power/wakeup_count, but the userspace consumer doesn't realize
> > > > > there is a new unprocessed event until after the power manager checks
> > > > > it?
> > >
> > > > I think this is not the kernel's problem. In this approach the kernel makes it
> > > > possible for the user space to avoid the race. Whether or not the user space
> > > > will use this opportunity is a different matter.
> > >
> > > It is _not_ possible for userspace to avoid this race. Help from the
> > > kernel is needed.
> >
> > It is possible if every (relevant) userspace program implements a
> > callback for the powermanager to check if one of it's wakeup-sources
> > got activated.
> >
> > That way the powermanager would read /sys/power/wakeup_count, then do
> > the roundtrip to all it's registered users and only then suspend.
> >
> > This turns the suspend_blockers concept around. Instead of actively
> > signaling the suspend_blockers, the userspace programs only answer
> > "yes/no" when asked. (i.e. polling?)
>
> In the end you would want to have communication in both directions:
> suspend blockers _and_ callbacks. Polling is bad if done too often.
> But I think the idea is a good one.
Actually, I'm not so shure.
1. you have to roundtrip whereas in the suspend_blocker scheme you have
active annotations (i.e. no further action needed)
2. it may not be possible for a user to determine if a wake-event is
in-flight. you would have to somehow pass the wake-event-number with
it, so that the userspace process could ack it properly without
confusion. Or... I don't know of anything else...
1. userspace-manager (UM) reads a number (42).
2. it questions userspace program X: is it ok to suspend?
[please fill in how userspace program X determines to block
suspend]
3a. UM's roundtrip ends and it proceeds to write "42" to the
kernel [suspending]
3b. UM's roundtrip ends and it aborts suspend, because a
(userspace-)suspend-blocker got activated
I'm not shure how the userspace program could determine that there is a
wake-event in flight. Perhaps by storing the number of last wake-event.
But then you need per-wake-event-counters... :|
> In fact, you don't need a "yes/no" response. Programs merely need a
> chance to activate a new suspend blocker if a wakeup source was
> recently activated before they acknowledge the poll.
true. (incorporated alreeady above)
>
> > You _can not_ implement userspace suspend blockers with this approach,
> > as it is vital for every userspace program to get scheduled and check
> > it's wakeup-source (if even possible) before you know that the right
> > parties have won the race.
>
> I'm not sure what you mean.
Sorry, that was not understandable. What I meant was that you "_can
not_" implement the suspend-blockers scheme, where you don't need to
roundtrip through all userspace (with all it's glory).
( => you need the roundtrip here)
>
> There is still at least one loophole to be closed: Android's
> timer-based wakelocks. These include cases where the Android
> developers didn't add enough wakelocks to cover the entire path from
> kernel-event to userspace-handler, so they punted and relied on a timer
> to decide when the wakelock should be deactivated. (There may be other
> cases too; I didn't follow the original discussion very closely.)
> It's not clear whether these things can be handled already in Rafael's
> scheme with your addition, or whether something new is needed.
>
> Alan Stern
Do you have some thoughts about the wake-event-in-flight detection?
Cheers,
Flo
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <Pine.LNX.4.44L0.1006211244140.1687-100000@iolanthe.rowland.org>
@ 2010-06-21 20:40 ` Florian Mickler
2010-06-21 21:18 ` Rafael J. Wysocki
1 sibling, 0 replies; 50+ messages in thread
From: Florian Mickler @ 2010-06-21 20:40 UTC (permalink / raw)
To: Alan Stern
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Linux-pm mailing list
On Mon, 21 Jun 2010 12:54:44 -0400 (EDT)
Alan Stern <stern@rowland.harvard.edu> wrote:
> On Mon, 21 Jun 2010, Florian Mickler wrote:
>
> > > > > What happens if an event arrives just before you read
> > > > > /sys/power/wakeup_count, but the userspace consumer doesn't realize
> > > > > there is a new unprocessed event until after the power manager checks
> > > > > it?
> > >
> > > > I think this is not the kernel's problem. In this approach the kernel makes it
> > > > possible for the user space to avoid the race. Whether or not the user space
> > > > will use this opportunity is a different matter.
> > >
> > > It is _not_ possible for userspace to avoid this race. Help from the
> > > kernel is needed.
> >
> > It is possible if every (relevant) userspace program implements a
> > callback for the powermanager to check if one of it's wakeup-sources
> > got activated.
> >
> > That way the powermanager would read /sys/power/wakeup_count, then do
> > the roundtrip to all it's registered users and only then suspend.
>
> After further thought, there's still a race:
Ah, yes. That's what I meant in my other mail with 'how to determine
that an wake event is in flight'.
Read this too late.
> Alan Stern
Cheers,
Flo
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <Pine.LNX.4.44L0.1006211244140.1687-100000@iolanthe.rowland.org>
2010-06-21 20:40 ` Florian Mickler
@ 2010-06-21 21:18 ` Rafael J. Wysocki
1 sibling, 0 replies; 50+ messages in thread
From: Rafael J. Wysocki @ 2010-06-21 21:18 UTC (permalink / raw)
To: Alan Stern
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Arve, Florian Mickler,
Linux-pm mailing list
On Monday, June 21, 2010, Alan Stern wrote:
> On Mon, 21 Jun 2010, Florian Mickler wrote:
>
> > > > > What happens if an event arrives just before you read
> > > > > /sys/power/wakeup_count, but the userspace consumer doesn't realize
> > > > > there is a new unprocessed event until after the power manager checks
> > > > > it?
> > >
> > > > I think this is not the kernel's problem. In this approach the kernel makes it
> > > > possible for the user space to avoid the race. Whether or not the user space
> > > > will use this opportunity is a different matter.
> > >
> > > It is _not_ possible for userspace to avoid this race. Help from the
> > > kernel is needed.
> >
> > It is possible if every (relevant) userspace program implements a
> > callback for the powermanager to check if one of it's wakeup-sources
> > got activated.
> >
> > That way the powermanager would read /sys/power/wakeup_count, then do
> > the roundtrip to all it's registered users and only then suspend.
>
> After further thought, there's still a race:
>
> A wakeup event arrives.
>
> The kernel increments /sys/power/wakeup_count and starts
> processing the event.
>
> The power-manager process reads /sys/power/wakeup_count.
You assume that these two steps will occur instantaneously one after the other,
while there may be (and in fact there should be) a delay between them.
I would make the power manager wait for certain time after reading
/sys/power/wakeup_count and if no wakeup events are reported to it within
that time, to write to /sys/power/wakeup_count.
The length of the time to wait would be system-dependent in general, but I'd
also allow the event consumers to influence it (like when an application knows
it will process things for 10 minutes going forward, so it tells the power
manager to wait for at least 10 minutes before attempting to suspend).
> The power-manager process polls the relevant programs and
> they all say no events are pending.
>
> The power-manager process successfully writes
> /sys/power/wakeup_count.
>
> The power-manager process initiates a suspend.
>
> ... Hours later ...
>
> The system wakes up.
>
> The kernel finishes its internal processing of the event and
> sends a notification to a user program.
>
> The problem here is that the power-manager process can't tell when the
> kernel has finished processing an event. This is true both for events
> that need to propagate to userspace and for events that are handled
> entirely by the kernel.
>
> This is a reflection of the two distinct pieces of information that we
> need to keep track of:
>
> A wakeup event has arrived, so it's no longer safe to suspend.
>
> Wakeup events are no longer pending, so it's once again
> safe to suspend.
>
> The wakeup_count interface tracks the first, but in this scheme nothing
> tracks the second.
Which I don't think is really necessary, because we'll need to use timeouts
anyway, at least for events that have no user space consumers.
Rafael
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <Pine.LNX.4.44L0.1006202154260.29532-100000@netrider.rowland.org>
` (2 preceding siblings ...)
[not found] ` <20100621061345.GF9735@gvim.org>
@ 2010-06-21 21:58 ` Rafael J. Wysocki
3 siblings, 0 replies; 50+ messages in thread
From: Rafael J. Wysocki @ 2010-06-21 21:58 UTC (permalink / raw)
To: Alan Stern
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Arve, Linux-pm mailing list
On Monday, June 21, 2010, Alan Stern wrote:
> On Sun, 20 Jun 2010, Rafael J. Wysocki wrote:
>
> > > > Generally, there are two problems in that area. First, if a wakeup event
> > > > occurs exactly at the same time when /sys/power/state is being written to,
> > > > the even may be delivered to user space right before the freezing of it,
> > > > in which case the user space consumer of the event may not be able to process
> > > > it before the system is suspended.
> > >
> > > Indeed, the same problem arises if the event isn't delivered to
> > > userspace until after userspace is frozen.
> >
> > In that case the kernel should abort the suspend so that the event can be
> > delivered to the user space.
>
> Yes.
>
> > > Of course, the underlying issue here is that the kernel has no direct way
> > > to know when userspace has finished processing an event. Userspace would
> > > have to tell it, which generally would mean rewriting some large number of user
> > > programs.
> >
> > I'm not sure of that. If the kernel doesn't initiate suspend, it doesn't
> > really need to know whether or not user space has already consumed the event.
>
> That's true. But it only shifts the onus: When a userspace program has
> finished processing an event, it has to tell the power-manager process.
> Clearly this sort of thing is unavoidable, one way or another.
That's correct, but there already are multiple ways to communicate things
between user space processes, while we would need new interfaces to
pass that information between user space and the kernel. That makes quite
some difference.
> > > > The following patch illustrates my idea of how these two problems may be
> > > > addressed. It introduces a new global sysfs attribute,
> > > > /sys/power/wakeup_count, associated with a running counter of wakeup events
> > > > and a helper function, pm_wakeup_event(), that may be used by kernel subsystems
> > > > to increment the wakeup events counter.
> > >
> > > In what way is this better than suspend blockers?
> >
> > It doesn't add any new framework and it doesn't require the users of
> > pm_wakeup_event() to "unblock" suspend, so it is simpler. It also doesn't add
> > the user space interface that caused so much opposition to appear.
>
> Okay. A quick comparison shows that in your proposal:
>
> There's no need to register and unregister suspend blockers.
> But instead you create the equivalent of a suspend blocker
> inside every struct device.
Not really. That thing is for statistics purposes only and it's not essential
from the functionality POV. But you know that. :-)
> Drivers (or subsystems) don't have to activate suspend
> blockers. But instead they have to call pm_wakeup_event().
>
> Drivers don't have to deactivate suspend blockers. You don't
> have anything equivalent, and as a result your scheme is
> subject to the race described below.
The original one without user space variant of suspend blockers is subject to
it as well, because the kernel doesn't know when the user space consumer is
going to finish processing the event.
Rafael
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] <20100621223841.1cf1942d@schatten.dmk.lab>
@ 2010-06-21 22:18 ` Alan Stern
0 siblings, 0 replies; 50+ messages in thread
From: Alan Stern @ 2010-06-21 22:18 UTC (permalink / raw)
To: Florian Mickler
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Linux-pm mailing list
On Mon, 21 Jun 2010, Florian Mickler wrote:
> > In the end you would want to have communication in both directions:
> > suspend blockers _and_ callbacks. Polling is bad if done too often.
> > But I think the idea is a good one.
>
> Actually, I'm not so shure.
>
> 1. you have to roundtrip whereas in the suspend_blocker scheme you have
> active annotations (i.e. no further action needed)
That's why it's best to use both. The normal case is that programs
activate and deactivate blockers by sending one-way messages to the PM
process. The exceptional case is when the PM process is about to
initiate a suspend; that's when it does the round-trip polling. Since
the only purpose of the polling is to avoid a race, 90% of the time it
will succeed.
> 2. it may not be possible for a user to determine if a wake-event is
> in-flight. you would have to somehow pass the wake-event-number with
> it, so that the userspace process could ack it properly without
> confusion. Or... I don't know of anything else...
>
> 1. userspace-manager (UM) reads a number (42).
>
> 2. it questions userspace program X: is it ok to suspend?
>
> [please fill in how userspace program X determines to block
> suspend]
>
> 3a. UM's roundtrip ends and it proceeds to write "42" to the
> kernel [suspending]
> 3b. UM's roundtrip ends and it aborts suspend, because a
> (userspace-)suspend-blocker got activated
>
> I'm not shure how the userspace program could determine that there is a
> wake-event in flight. Perhaps by storing the number of last wake-event.
> But then you need per-wake-event-counters... :|
Rafael seems to think timeouts will fix this. I'm not so sure.
> Do you have some thoughts about the wake-event-in-flight detection?
Not really, except for something like the original wakelock scheme in
which the kernel tells the PM core when an event is over.
Alan Stern
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] <201006212318.03001.rjw@sisk.pl>
@ 2010-06-21 22:27 ` Alan Stern
0 siblings, 0 replies; 50+ messages in thread
From: Alan Stern @ 2010-06-21 22:27 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Florian Mickler, Linux-pm mailing list
On Mon, 21 Jun 2010, Rafael J. Wysocki wrote:
> > After further thought, there's still a race:
> >
> > A wakeup event arrives.
> >
> > The kernel increments /sys/power/wakeup_count and starts
> > processing the event.
> >
> > The power-manager process reads /sys/power/wakeup_count.
>
> You assume that these two steps will occur instantaneously one after the other,
> while there may be (and in fact there should be) a delay between them.
No, I'm not assuming that.
> I would make the power manager wait for certain time after reading
> /sys/power/wakeup_count and if no wakeup events are reported to it within
> that time, to write to /sys/power/wakeup_count.
Why? That's just wasted time -- time during which the system could
have been suspended.
I can understand the power manager might reason as follows: If this
wakeup event hasn't been handed over to a userspace program in the next
5 seconds, I'm going to suspend anyway on the theory that something is
wrong. But why do that when you can get exact information?
> The length of the time to wait would be system-dependent in general, but I'd
> also allow the event consumers to influence it (like when an application knows
> it will process things for 10 minutes going forward, so it tells the power
> manager to wait for at least 10 minutes before attempting to suspend).
It tells the power manager to wait by activating a userspace suspend
blocker. While a blocker is active, the power manager doesn't have to
poll /sys/power/wakeup_count or anything; it just waits for all the
suspend blockers to be deactivated. And there's no guesswork involved;
the power manager knows immediately when it's time to try suspending
again.
> > The power-manager process polls the relevant programs and
> > they all say no events are pending.
> >
> > The power-manager process successfully writes
> > /sys/power/wakeup_count.
> >
> > The power-manager process initiates a suspend.
> >
> > ... Hours later ...
> >
> > The system wakes up.
> >
> > The kernel finishes its internal processing of the event and
> > sends a notification to a user program.
> >
> > The problem here is that the power-manager process can't tell when the
> > kernel has finished processing an event. This is true both for events
> > that need to propagate to userspace and for events that are handled
> > entirely by the kernel.
> >
> > This is a reflection of the two distinct pieces of information that we
> > need to keep track of:
> >
> > A wakeup event has arrived, so it's no longer safe to suspend.
> >
> > Wakeup events are no longer pending, so it's once again
> > safe to suspend.
> >
> > The wakeup_count interface tracks the first, but in this scheme nothing
> > tracks the second.
>
> Which I don't think is really necessary, because we'll need to use timeouts
> anyway, at least for events that have no user space consumers.
You wouldn't need to use timeouts for them either if the kernel had a
way to indicate when it was finished processing events.
Alan Stern
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] <Pine.LNX.4.44L0.1006211814370.1687-100000@iolanthe.rowland.org>
@ 2010-06-21 22:40 ` Rafael J. Wysocki
[not found] ` <201006220040.41524.rjw@sisk.pl>
1 sibling, 0 replies; 50+ messages in thread
From: Rafael J. Wysocki @ 2010-06-21 22:40 UTC (permalink / raw)
To: Alan Stern
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Arve, Florian Mickler,
Linux-pm mailing list
On Tuesday, June 22, 2010, Alan Stern wrote:
> On Mon, 21 Jun 2010, Florian Mickler wrote:
>
> > > In the end you would want to have communication in both directions:
> > > suspend blockers _and_ callbacks. Polling is bad if done too often.
> > > But I think the idea is a good one.
> >
> > Actually, I'm not so shure.
> >
> > 1. you have to roundtrip whereas in the suspend_blocker scheme you have
> > active annotations (i.e. no further action needed)
>
> That's why it's best to use both. The normal case is that programs
> activate and deactivate blockers by sending one-way messages to the PM
> process. The exceptional case is when the PM process is about to
> initiate a suspend; that's when it does the round-trip polling. Since
> the only purpose of the polling is to avoid a race, 90% of the time it
> will succeed.
>
> > 2. it may not be possible for a user to determine if a wake-event is
> > in-flight. you would have to somehow pass the wake-event-number with
> > it, so that the userspace process could ack it properly without
> > confusion. Or... I don't know of anything else...
> >
> > 1. userspace-manager (UM) reads a number (42).
> >
> > 2. it questions userspace program X: is it ok to suspend?
> >
> > [please fill in how userspace program X determines to block
> > suspend]
> >
> > 3a. UM's roundtrip ends and it proceeds to write "42" to the
> > kernel [suspending]
> > 3b. UM's roundtrip ends and it aborts suspend, because a
> > (userspace-)suspend-blocker got activated
> >
> > I'm not shure how the userspace program could determine that there is a
> > wake-event in flight. Perhaps by storing the number of last wake-event.
> > But then you need per-wake-event-counters... :|
>
> Rafael seems to think timeouts will fix this. I'm not so sure.
>
> > Do you have some thoughts about the wake-event-in-flight detection?
>
> Not really, except for something like the original wakelock scheme in
> which the kernel tells the PM core when an event is over.
But the kernel doesn't really know that, so it really can't tell the PM core
anything useful. What happens with suspend blockers is that a kernel suspend
cooperates with a user space consumer of the event to get the story straight.
However, that will only work if the user space is not buggy and doesn't crash,
for example, before releasing the suspend blocker it's holding.
Apart from this, there are those events withoug user space "handoff" that
use timeouts.
Also there are events like wake-on-LAN that can be regarded as instantaneous
from the power manager's point of view, so they don't really need all of the
"suspend blockers" machinery and for them we will need to use a cooldown
timeout anyway.
And if we need to use that cooldown timeout, I don't see why not to use
timeouts for avoiding the race you're worrying about.
Rafael
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <201006220040.41524.rjw@sisk.pl>
@ 2010-06-21 22:48 ` Rafael J. Wysocki
2010-06-22 0:50 ` Arve Hjønnevåg
` (2 subsequent siblings)
3 siblings, 0 replies; 50+ messages in thread
From: Rafael J. Wysocki @ 2010-06-21 22:48 UTC (permalink / raw)
To: Alan Stern
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Arve, Florian Mickler,
Linux-pm mailing list
On Tuesday, June 22, 2010, Rafael J. Wysocki wrote:
> On Tuesday, June 22, 2010, Alan Stern wrote:
> > On Mon, 21 Jun 2010, Florian Mickler wrote:
> >
> > > > In the end you would want to have communication in both directions:
> > > > suspend blockers _and_ callbacks. Polling is bad if done too often.
> > > > But I think the idea is a good one.
> > >
> > > Actually, I'm not so shure.
> > >
> > > 1. you have to roundtrip whereas in the suspend_blocker scheme you have
> > > active annotations (i.e. no further action needed)
> >
> > That's why it's best to use both. The normal case is that programs
> > activate and deactivate blockers by sending one-way messages to the PM
> > process. The exceptional case is when the PM process is about to
> > initiate a suspend; that's when it does the round-trip polling. Since
> > the only purpose of the polling is to avoid a race, 90% of the time it
> > will succeed.
> >
> > > 2. it may not be possible for a user to determine if a wake-event is
> > > in-flight. you would have to somehow pass the wake-event-number with
> > > it, so that the userspace process could ack it properly without
> > > confusion. Or... I don't know of anything else...
> > >
> > > 1. userspace-manager (UM) reads a number (42).
> > >
> > > 2. it questions userspace program X: is it ok to suspend?
> > >
> > > [please fill in how userspace program X determines to block
> > > suspend]
> > >
> > > 3a. UM's roundtrip ends and it proceeds to write "42" to the
> > > kernel [suspending]
> > > 3b. UM's roundtrip ends and it aborts suspend, because a
> > > (userspace-)suspend-blocker got activated
> > >
> > > I'm not shure how the userspace program could determine that there is a
> > > wake-event in flight. Perhaps by storing the number of last wake-event.
> > > But then you need per-wake-event-counters... :|
> >
> > Rafael seems to think timeouts will fix this. I'm not so sure.
> >
> > > Do you have some thoughts about the wake-event-in-flight detection?
> >
> > Not really, except for something like the original wakelock scheme in
> > which the kernel tells the PM core when an event is over.
>
> But the kernel doesn't really know that, so it really can't tell the PM core
> anything useful. What happens with suspend blockers is that a kernel suspend
s/suspend/subsyste/ (-ETOOLATE)
> cooperates with a user space consumer of the event to get the story straight.
>
> However, that will only work if the user space is not buggy and doesn't crash,
> for example, before releasing the suspend blocker it's holding.
>
> Apart from this, there are those events withoug user space "handoff" that
> use timeouts.
>
> Also there are events like wake-on-LAN that can be regarded as instantaneous
> from the power manager's point of view, so they don't really need all of the
> "suspend blockers" machinery and for them we will need to use a cooldown
> timeout anyway.
>
> And if we need to use that cooldown timeout, I don't see why not to use
> timeouts for avoiding the race you're worrying about.
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <201006220040.41524.rjw@sisk.pl>
2010-06-21 22:48 ` Rafael J. Wysocki
@ 2010-06-22 0:50 ` Arve Hjønnevåg
2010-06-22 10:21 ` Rafael J. Wysocki
[not found] ` <201006221221.53801.rjw@sisk.pl>
3 siblings, 0 replies; 50+ messages in thread
From: Arve Hjønnevåg @ 2010-06-22 0:50 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Florian Mickler, Linux-pm mailing list
On Mon, Jun 21, 2010 at 3:40 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Tuesday, June 22, 2010, Alan Stern wrote:
>> On Mon, 21 Jun 2010, Florian Mickler wrote:
>>
>> > > In the end you would want to have communication in both directions:
>> > > suspend blockers _and_ callbacks. Polling is bad if done too often.
>> > > But I think the idea is a good one.
>> >
>> > Actually, I'm not so shure.
>> >
>> > 1. you have to roundtrip whereas in the suspend_blocker scheme you have
>> > active annotations (i.e. no further action needed)
>>
>> That's why it's best to use both. The normal case is that programs
>> activate and deactivate blockers by sending one-way messages to the PM
>> process. The exceptional case is when the PM process is about to
>> initiate a suspend; that's when it does the round-trip polling. Since
>> the only purpose of the polling is to avoid a race, 90% of the time it
>> will succeed.
>>
>> > 2. it may not be possible for a user to determine if a wake-event is
>> > in-flight. you would have to somehow pass the wake-event-number with
>> > it, so that the userspace process could ack it properly without
>> > confusion. Or... I don't know of anything else...
>> >
>> > 1. userspace-manager (UM) reads a number (42).
>> >
>> > 2. it questions userspace program X: is it ok to suspend?
>> >
>> > [please fill in how userspace program X determines to block
>> > suspend]
>> >
>> > 3a. UM's roundtrip ends and it proceeds to write "42" to the
>> > kernel [suspending]
>> > 3b. UM's roundtrip ends and it aborts suspend, because a
>> > (userspace-)suspend-blocker got activated
>> >
>> > I'm not shure how the userspace program could determine that there is a
>> > wake-event in flight. Perhaps by storing the number of last wake-event.
>> > But then you need per-wake-event-counters... :|
>>
>> Rafael seems to think timeouts will fix this. I'm not so sure.
>>
>> > Do you have some thoughts about the wake-event-in-flight detection?
>>
>> Not really, except for something like the original wakelock scheme in
>> which the kernel tells the PM core when an event is over.
>
> But the kernel doesn't really know that, so it really can't tell the PM core
> anything useful. What happens with suspend blockers is that a kernel suspend
> cooperates with a user space consumer of the event to get the story straight.
>
> However, that will only work if the user space is not buggy and doesn't crash,
> for example, before releasing the suspend blocker it's holding.
>
> Apart from this, there are those events withoug user space "handoff" that
> use timeouts.
>
> Also there are events like wake-on-LAN that can be regarded as instantaneous
> from the power manager's point of view, so they don't really need all of the
> "suspend blockers" machinery and for them we will need to use a cooldown
> timeout anyway.
>
> And if we need to use that cooldown timeout, I don't see why not to use
> timeouts for avoiding the race you're worrying about.
>
Just because some events need to use timeouts, does not mean that all
events should use timeouts. You may even want different timeout for
different events. If I want a 5 minute timeout after a wake-on-lan
packet, I don't want the same timeout after every battery check alarm
(every 10 minutes on the Nexus One).
Also, I don't see how this patch helps with events that never make
reach user-space. Unless each driver blocks suspend by returning and
error from its suspend hook, the driver then becomes dependent on the
user-space power manager to choose a sufficiently long timeout. For
some drivers, like the gpio keypad matrix driver there, is no safe
value for the timeout. The keypad driver has to block suspend if any
keys are held down.
I won't have time to actively follow this discussion, but I don't
think this patch is a good solution.
--
Arve Hjønnevåg
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <20100621121058.GF6199@thunk.org>
2010-06-21 12:22 ` Alan Cox
@ 2010-06-22 1:07 ` mark gross
1 sibling, 0 replies; 50+ messages in thread
From: mark gross @ 2010-06-22 1:07 UTC (permalink / raw)
To: tytso, markgross, Alan Stern, Rafael J. Wysocki,
Linux-pm mailing list
On Mon, Jun 21, 2010 at 08:10:58AM -0400, tytso@mit.edu wrote:
> So where are we at this point?
The patches are in James Bottomly's tree.
> Discussion had completely died down for a while, and it's picked up
> again, but it's not clear to me that we're any closer to reaching
> consensus.
I thought we (linux community and Android) where ok with the plist /
pm-qos implementation of the building blocks needed to implement the
suspend blocker feature on top of a pm-qos request class (I think the
name was "interactive") pretty much the exact same symantecs as the
suspend blocker thing, just with pm-qos kernel api's.
> There's been one proposal that we simply merge in a set of no-op
> inline functions for suspend blockers, just so we can get let the
> drivers go in (assuming that Greg K-H believes this is still a
> problem), but with an automatical removal of N months (N to be
> decided, say 9 or 12 or 18 months).
I'd rather see the re-tooling of pmqos happen.
>
> My concern is that if we do that, we will have simply kicked the ball
> down the road for N months. Another approach is to simply merge in
> no-op functions and not leave any kind of deprecation schedule.
> That's sort of an implicit admission of the fact that we may not reach
> consensus on this issue. Or we could simply ship the patches as-is to
> Linus after he gets back from vacation and ask him for a thumbs up or
> thumbs down vote, which might settle things once and for all.
>
> How do we go forward from here?
>
put the pm_qos -plist update into linux-next?
--mgross
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <Pine.LNX.4.44L0.1006211158200.1687-100000@iolanthe.rowland.org>
@ 2010-06-22 1:25 ` mark gross
0 siblings, 0 replies; 50+ messages in thread
From: mark gross @ 2010-06-22 1:25 UTC (permalink / raw)
To: Alan Stern
Cc: mark gross, markgross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Linux-pm mailing list
On Mon, Jun 21, 2010 at 12:01:09PM -0400, Alan Stern wrote:
> On Sun, 20 Jun 2010, mark gross wrote:
>
> > Your confused about what problem this patch attempts to solve.
>
> I don't think so. Rafael's description was pretty clear.
Then how is it you don't understand the fact that Rafael's patch is to
solve the wake event notification suspend race and not block opertunistic
suspends or kernel critical sections where suspending should be disabled?
>
> > There is
> > a pm_qos patch in the works to address the suspend blocker
> > functionality.
> > http://lists.linux-foundation.org/pipermail/linux-pm/2010-June/026760.html
>
> No. That patch addresses something _similar_ to the suspend blocker
> functionality. The fact remains, though, that pm_qos is not used
> during system suspend (the /sys/power/state interface), hence changes
> to pm_qos won't solve the system-suspend problems that suspend blockers
> do solve.
You keep saying they solve something, I keep wondering what you are
talking aobut.
Lets see what problems it solves:
* implements oppertunistic suspending (this is a feature not a problem)
* enables kernel critical sections blocking suspending.
* requiers overlapping application specific critcal sections from ISR
into user mode to make implementation correct.
* exposes a user mode interface to set a critical section.
* reduces races between wake events (or suspend blocking events) but I'm
not convinced it solves them.
suspend blockers provide a way to block oppertunistic suspending, wich
I'll have you know, is a pain to get working right and the enabling from
device to device is not very portable and *that* doesn't say good things
about the scheme.
--mgross
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <Pine.LNX.4.44L0.1006211125470.1687-100000@iolanthe.rowland.org>
@ 2010-06-22 1:58 ` mark gross
[not found] ` <20100622015837.GC12795@gvim.org>
1 sibling, 0 replies; 50+ messages in thread
From: mark gross @ 2010-06-22 1:58 UTC (permalink / raw)
To: Alan Stern
Cc: mark gross, markgross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, linux-pm
On Mon, Jun 21, 2010 at 11:57:21AM -0400, Alan Stern wrote:
> On Sun, 20 Jun 2010, mark gross wrote:
>
> > > > 1) I don't think suspend blockers really solve or effectively articulate
> > > > the suspend wake event race problem.
> > >
> > > Why not?
> >
> > For starters the suspend block patch is about suspend blocking, at least
> > at first before competing ideas starting coming out and this race issue
> > was offered up as an excuse to not consider the alternative ideas, then
> > suddenly suspend blocking became a wake event notification mechanism
> > implementation that was baked into the implementation. The arguments
> > are still a blur and irritating to recall / look up again.
>
> I get the feeling you didn't fully absorb the intent of the original
> wakelock patches. Right from the start they were about fixing a race
> in system suspend: The system may go to sleep even though a pending
wrong. They are about 1) adding opportunistic suspend, and 2) adding
critical sections to block suspend.
No race issues where ever talked about until alternative solutions
where put up.
> wakeup event should have blocked or prevented the suspend. In
> particular that means notifying the PM core about wakeup events, when
> they occur, and when the system may once again be allowed to suspend.
>
> The userspace parts of the original patches did cloud the issue,
> admittedly. But the purpose of the in-kernel parts has always been
> clear.
>
> > But, the point I'm trying to make is that wake event serialization /
> > event handling suddenly became a big FUD-pie tightly bound to a specific
> > feature for suspend blocking (or was is auto suspending? Its a magical
> > solution to all the PM problems in the kernel. I'm being sarcastic.)
> >
> > Its much better to call out the issue and provide a clean targeted
> > solution to it without binding it to some other solution to a different
> > problem.
>
> That's exactly what the wakelock patches did: They called out the
> issue of the system suspending even while there were pending wakeup
> events, they provided a targeted solution, and it wasn't bound to
> another solution to a different problem.
>
> Indeed, if you go back through the (I agree, irritating) threads on
> this topic, you'll see that the FUD and other issues were all
> introduced by other kernel developers, mainly because they didn't like
> the idea of using system suspend as a mechanism for dynamic power
> management.
I am struck by how differently you are seeing things.
>
> > > > I don't think suspend-blocker handles both kinds of races as well as you
> > > > seem to think.
> > >
> > > Can you give any counterexamples?
> >
> > I knew you'd ask such a thing. Do you have any correctness proofs of it
> > working right?
>
> If you want, sure. But what you think "working right" means may not be
> the same as what I think.
>
> > Lets consider the RIL incoming call race:
>
> "RIL"?
You are not an android developer are you? RIL is the user mode Radio
Interface Layer.
> > 1) user mode initiates an "opportunistic suspend" or whatever we call
> > this these days by writing to some sys interface.
>
> Okay.
>
> > 2) a call comes in (multi-threaded cpu) just after the write.
>
> Right. Presumably we want the suspend to be delayed until the call
> can be handled by a user programm, yes?
>
> > 3) the caif driver grabs a blocker, stopping the in flight suspend in the
> > nick of time. But, when should it release it without racing? How does
> > one implement that kernel code such that its portable to non-android user
> > mode stacks?
>
> I don't know anything about the caif driver so I can't answer this
> question in detail. Here's the general idea: Somehow the kernel has to
> notify userspace that an incoming call has arrived. Whatever driver or
> subsystem sends that notification should release the suspend blocker
> once userspace indicates that the notification has been received (for
> example, by reading all the data in an input event queue). That's true
> for both Android and non-Android.
so tell me how user space indicates to the kernel object will ever know
when its ok to release the blocker?
then tell me how suspend blocker provide an elegant portable solution
to this that works for multiple user mode stacks.
>
> In some cases there may not be any mechanism for userspace to tell the
> kernel when a notification has been received. For thoses cases, the
> Android people used timer-based blockers. This is not necessarily the
> best approach but they seem happy with it. Others might prefer to add
> an explicit "notification received" mechanism.
the time-based blockers are not part of the latest patch set.
>
> Still others take the view that since suspends are initiated by
> userspace anyway, it's not necessary to tell the kernel when suspend is
> safe again. Just tell the user process responsible for initiating
> suspends.
>
> > > > A single tool that conflates multiple kernel facilities
> > > > is not an advantage. It limits applications outside of the scope of
> > > > doing it the "android way".
> > >
> > > I don't see that necessarily as a drawback. You might just as well say
> > > that the entire Linux kernel limits applications to doing things the
> > > "Unix" way. Often have fewer choices is an advantage.
> >
> > I disagree with your analogy. One problem one solution with minimal
> > coupling to other implementation details is nice. Two problems with one
> > solution dependent on the system wide architecture bound to a user mode
> > PM design is fragile and not portable.
>
> I assume the "two problems" you refer to are: telling the PM core that
> the kernel has a wakeup event in progress, and telling the PM core that
> userspace has a wakeup event in progress. To me these don't seem to be
> vastly different problems, and having a single solution for both
> doesn't seem out of line.
nope.
problem 1) opportunistic enable suspend deferral for critical sections
when suspending is "bad"
problem 2) race between entering pm_suspend call tree and wake events.
> The fact that this is bound to a user-mode PM design was inevitable,
> given the whole idea was to overcome weaknesses in the system suspend
> implementation and that implementation already is driven by userspace.
I think we can do better.
> > > (Incidentally, even on Android the centralized PM process does not
> > > handle _all_ of the userspace/kernel wakelock interactions.)
> > >
> >
> > Yeah, I've been told that too. I've grepped for where the wake_unlock
> > sysfs interfaces are hit from user mode android (eclair) and I would
> > like some help in identifying those guys. Maybe its in the FroYo code I
> > don't get to look at yet?
> >
> > There is libhardware_legacy/power/power.c and an out of tree kernel
> > broadcom bcm4329 driver under hardware/broadcom but that doesn't count
> > as it looks like a kernel driver to me.
>
> I don't know -- I have never looked at the Android userspace. No doubt
> Arve can provide some examples.
who are you? You don't know about the Android stack, and you keep
blowing about how suspend blockers solve all the PM problems? I'm
starting to think your just trolling.
--mgross
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] <20100622012519.GB12795@gvim.org>
@ 2010-06-22 2:24 ` Alan Stern
0 siblings, 0 replies; 50+ messages in thread
From: Alan Stern @ 2010-06-22 2:24 UTC (permalink / raw)
To: markgross
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Linux-pm mailing list
On Mon, 21 Jun 2010, mark gross wrote:
> On Mon, Jun 21, 2010 at 12:01:09PM -0400, Alan Stern wrote:
> > On Sun, 20 Jun 2010, mark gross wrote:
> >
> > > Your confused about what problem this patch attempts to solve.
> >
> > I don't think so. Rafael's description was pretty clear.
>
> Then how is it you don't understand the fact that Rafael's patch is to
> solve the wake event notification suspend race and not block opertunistic
> suspends or kernel critical sections where suspending should be disabled?
I don't know what gave you the idea that I think Rafael's patch is
meant to block kernel critical sections. I certainly don't think that.
However leaving that aside, the rest of the above is just two different
ways of saying the same thing:
Wakeup events should cause suspend to be disabled until the
events are processed.
Arrival of wakeup events races with initiation of system
suspend (whether opportunistic or not).
Therefore blocking suspends when they ought to be disabled requires us
to solve the wake event/suspend race. You can't do one without doing
the other.
> > > There is
> > > a pm_qos patch in the works to address the suspend blocker
> > > functionality.
> > > http://lists.linux-foundation.org/pipermail/linux-pm/2010-June/026760.html
> >
> > No. That patch addresses something _similar_ to the suspend blocker
> > functionality. The fact remains, though, that pm_qos is not used
> > during system suspend (the /sys/power/state interface), hence changes
> > to pm_qos won't solve the system-suspend problems that suspend blockers
> > do solve.
>
> You keep saying they solve something, I keep wondering what you are
> talking aobut.
> Lets see what problems it solves:
> * implements oppertunistic suspending (this is a feature not a problem)
> * enables kernel critical sections blocking suspending.
> * requiers overlapping application specific critcal sections from ISR
> into user mode to make implementation correct.
> * exposes a user mode interface to set a critical section.
> * reduces races between wake events (or suspend blocking events) but I'm
> not convinced it solves them.
The last item on your list is what I meant. The others are not
problems solved by suspend blockers. Well, maybe the second is, but I
don't know of any kernel critical sections that need to block suspend
other than those caused by wakeup events.
I agree with you that bundling opportunistic suspend and the user mode
interface together with suspend blockers made the situation more
difficult by mixing up the important issues.
> suspend blockers provide a way to block oppertunistic suspending, wich
> I'll have you know, is a pain to get working right and the enabling from
> device to device is not very portable and *that* doesn't say good things
> about the scheme.
The real problem with the scheme is to define which events should count
as "wakeup" events and to determine when they have been fully handled.
I can see that these might well vary from one platform to another.
(Maybe that's what you mean by saying the enabling is not very
portable.) But these issues are unavoidable; any scheme will have to
address them.
Alan Stern
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <20100622015837.GC12795@gvim.org>
@ 2010-06-22 2:46 ` Alan Stern
2010-06-22 6:18 ` Florian Mickler
` (2 subsequent siblings)
3 siblings, 0 replies; 50+ messages in thread
From: Alan Stern @ 2010-06-22 2:46 UTC (permalink / raw)
To: markgross
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, linux-pm
On Mon, 21 Jun 2010, mark gross wrote:
> > I get the feeling you didn't fully absorb the intent of the original
> > wakelock patches. Right from the start they were about fixing a race
> > in system suspend: The system may go to sleep even though a pending
>
> wrong. They are about 1) adding opportunistic suspend, and 2) adding
> critical sections to block suspend.
>
> No race issues where ever talked about until alternative solutions
> where put up.
That's not how I remember it. But this point isn't important enough to
be worth digging through the old emails to verify or disprove it.
> I am struck by how differently you are seeing things.
Those discussions were (and still are!) so long and annoying, it
wouldn't be surprising if _everybody_ saw them differently! :-) You
certainly must agree that a lot of differing personal viewpoints were
expressed.
> > > Lets consider the RIL incoming call race:
> >
> > "RIL"?
>
> You are not an android developer are you?
Nope.
> RIL is the user mode Radio
> Interface Layer.
>
> > > 1) user mode initiates an "opportunistic suspend" or whatever we call
> > > this these days by writing to some sys interface.
> >
> > Okay.
> >
> > > 2) a call comes in (multi-threaded cpu) just after the write.
> >
> > Right. Presumably we want the suspend to be delayed until the call
> > can be handled by a user programm, yes?
> >
> > > 3) the caif driver grabs a blocker, stopping the in flight suspend in the
> > > nick of time. But, when should it release it without racing? How does
> > > one implement that kernel code such that its portable to non-android user
> > > mode stacks?
> >
> > I don't know anything about the caif driver so I can't answer this
> > question in detail. Here's the general idea: Somehow the kernel has to
> > notify userspace that an incoming call has arrived. Whatever driver or
> > subsystem sends that notification should release the suspend blocker
> > once userspace indicates that the notification has been received (for
> > example, by reading all the data in an input event queue). That's true
> > for both Android and non-Android.
>
> so tell me how user space indicates to the kernel object will ever know
> when its ok to release the blocker?
First you tell me how the kernel indicates to userspace that an
incoming call has arrived. My answer will depend on that.
> then tell me how suspend blocker provide an elegant portable solution
> to this that works for multiple user mode stacks.
When did I -- or anyone -- ever say it was "elegant"?
As for multiple user mode stacks, I don't see the issue. If you grant
there is a mechanism whereby userspace indicates to the kernel that a
blocker may be released, then it seems obvious that this mechanism
could be used in multiple user mode stacks.
> problem 1) opportunistic enable suspend deferral for critical sections
> when suspending is "bad"
> problem 2) race between entering pm_suspend call tree and wake events.
Can you point out any examples of kernel critical sections where
suspending is "bad" that aren't started by arrival of a wakeup event?
Unless you can, I'm justified in saying that these "two" problems are
one and the same.
> > The fact that this is bound to a user-mode PM design was inevitable,
> > given the whole idea was to overcome weaknesses in the system suspend
> > implementation and that implementation already is driven by userspace.
>
> I think we can do better.
Maybe we can. I'm certainly not going to claim that suspend blockers
are the best possible solution. And I'm not really keen on seeing them
merged along with all the attendant opportunistic suspend and userspace
API stuff. But so far I haven't heard anything significantly better.
> > I don't know -- I have never looked at the Android userspace. No doubt
> > Arve can provide some examples.
> who are you?
You sound like a Vorlon. Should I ask: "What do you want?" :-)
> You don't know about the Android stack, and you keep
> blowing about how suspend blockers solve all the PM problems?
Not all of them. Just the race between wakeup events and suspend.
> I'm
> starting to think your just trolling.
You may think what you like. Perhaps there is a certain degree of
truth to the accusation -- goodness knows, I do tend to prolong
technical conversations when I think I'm right. (Ask anybody who has
corresponded with me for any length of time.) But this is probably
true of a lot of kernel developers...
Alan Stern
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <20100622015837.GC12795@gvim.org>
2010-06-22 2:46 ` Alan Stern
@ 2010-06-22 6:18 ` Florian Mickler
2010-06-22 9:29 ` Rafael J. Wysocki
[not found] ` <20100622081803.4d472542@schatten.dmk.lab>
3 siblings, 0 replies; 50+ messages in thread
From: Florian Mickler @ 2010-06-22 6:18 UTC (permalink / raw)
To: markgross
Cc: 640e9920, Neil Brown, Dmitry Torokhov, Linux Kernel Mailing List,
Matthew, linux-pm
On Mon, 21 Jun 2010 18:58:37 -0700
mark gross <640e9920@gmail.com> wrote:
> wrong. They are about 1) adding opportunistic suspend, and 2) adding
> critical sections to block suspend.
>
> No race issues where ever talked about until alternative solutions
> where put up.
The whole and only reason to even define the term "critical sections" is
when you need to define "a race". Or vice versa. A race is prevented by
defining critical sections and protecting these against concurrent
access.
[..snip..]
[some rant that alan is not familiar with android userspace..]
Are you suggesting that only android developers are supposed to talk
about this?
This is a pretty basic thing. It has only to do with system suspend.
(And using system suspend aggressively)
>
> --mgross
>
Cheers,
Flo
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] <Pine.LNX.4.44L0.1006212224550.19813-100000@netrider.rowland.org>
@ 2010-06-22 9:24 ` Rafael J. Wysocki
0 siblings, 0 replies; 50+ messages in thread
From: Rafael J. Wysocki @ 2010-06-22 9:24 UTC (permalink / raw)
To: Alan Stern
Cc: mark gross, markgross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Arve, linux-pm
On Tuesday, June 22, 2010, Alan Stern wrote:
> On Mon, 21 Jun 2010, mark gross wrote:
>
...
> > I'm starting to think your just trolling.
>
> You may think what you like. Perhaps there is a certain degree of
> truth to the accusation -- goodness knows, I do tend to prolong
> technical conversations when I think I'm right. (Ask anybody who has
> corresponded with me for any length of time.) But this is probably
> true of a lot of kernel developers...
I certainly don't mind discussing with you. :-)
Thanks,
Rafael
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <20100622015837.GC12795@gvim.org>
2010-06-22 2:46 ` Alan Stern
2010-06-22 6:18 ` Florian Mickler
@ 2010-06-22 9:29 ` Rafael J. Wysocki
[not found] ` <20100622081803.4d472542@schatten.dmk.lab>
3 siblings, 0 replies; 50+ messages in thread
From: Rafael J. Wysocki @ 2010-06-22 9:29 UTC (permalink / raw)
To: markgross
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Arve, linux-pm
On Tuesday, June 22, 2010, mark gross wrote:
> On Mon, Jun 21, 2010 at 11:57:21AM -0400, Alan Stern wrote:
> > On Sun, 20 Jun 2010, mark gross wrote:
...
> > I don't know -- I have never looked at the Android userspace. No doubt
> > Arve can provide some examples.
> who are you? You don't know about the Android stack, and you keep
> blowing about how suspend blockers solve all the PM problems? I'm
> starting to think your just trolling.
And that's just an excellent way of convincing people that you're right. Guess
how it worked on me.
For your information, Alan is one of the people most actively engaged in
power management development at the kernel level and his ideas have heavily
influenced the design of the I/O runtime PM framework, among other things.
I certainly appeciate Alan's opinions very much, because he usually is
technically right.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <201006220040.41524.rjw@sisk.pl>
2010-06-21 22:48 ` Rafael J. Wysocki
2010-06-22 0:50 ` Arve Hjønnevåg
@ 2010-06-22 10:21 ` Rafael J. Wysocki
[not found] ` <201006221221.53801.rjw@sisk.pl>
3 siblings, 0 replies; 50+ messages in thread
From: Rafael J. Wysocki @ 2010-06-22 10:21 UTC (permalink / raw)
To: Alan Stern
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Arve, Florian Mickler,
Linux-pm mailing list
On Tuesday, June 22, 2010, Rafael J. Wysocki wrote:
> On Tuesday, June 22, 2010, Alan Stern wrote:
> > On Mon, 21 Jun 2010, Florian Mickler wrote:
> >
> > > > In the end you would want to have communication in both directions:
> > > > suspend blockers _and_ callbacks. Polling is bad if done too often.
> > > > But I think the idea is a good one.
> > >
> > > Actually, I'm not so shure.
> > >
> > > 1. you have to roundtrip whereas in the suspend_blocker scheme you have
> > > active annotations (i.e. no further action needed)
> >
> > That's why it's best to use both. The normal case is that programs
> > activate and deactivate blockers by sending one-way messages to the PM
> > process. The exceptional case is when the PM process is about to
> > initiate a suspend; that's when it does the round-trip polling. Since
> > the only purpose of the polling is to avoid a race, 90% of the time it
> > will succeed.
> >
> > > 2. it may not be possible for a user to determine if a wake-event is
> > > in-flight. you would have to somehow pass the wake-event-number with
> > > it, so that the userspace process could ack it properly without
> > > confusion. Or... I don't know of anything else...
> > >
> > > 1. userspace-manager (UM) reads a number (42).
> > >
> > > 2. it questions userspace program X: is it ok to suspend?
> > >
> > > [please fill in how userspace program X determines to block
> > > suspend]
> > >
> > > 3a. UM's roundtrip ends and it proceeds to write "42" to the
> > > kernel [suspending]
> > > 3b. UM's roundtrip ends and it aborts suspend, because a
> > > (userspace-)suspend-blocker got activated
> > >
> > > I'm not shure how the userspace program could determine that there is a
> > > wake-event in flight. Perhaps by storing the number of last wake-event.
> > > But then you need per-wake-event-counters... :|
> >
> > Rafael seems to think timeouts will fix this. I'm not so sure.
> >
> > > Do you have some thoughts about the wake-event-in-flight detection?
> >
> > Not really, except for something like the original wakelock scheme in
> > which the kernel tells the PM core when an event is over.
>
> But the kernel doesn't really know that, so it really can't tell the PM core
> anything useful. What happens with suspend blockers is that a kernel subsystem
> cooperates with a user space consumer of the event to get the story straight.
>
> However, that will only work if the user space is not buggy and doesn't crash,
> for example, before releasing the suspend blocker it's holding.
Having reconsidered that I think there's more to it.
Take the PCI subsystem as an example, specifically pcie_pme_handle_request().
This is the place where wakeup events are started, but it has no idea about
how they are going to be handled. Thus in the suspend blocker scheme it would
need to activate a blocker, but it wouldn't be able to release it. So, it
seems, we would need to associate a suspend blocker with each PCIe device
that can generate wakeup events and require all drivers of such devices to
deal with a blocker activated by someone else (PCIe PME driver in this
particular case). That sounds cumbersome to say the least.
Moreover, even if we do that, it still doesn't solve the entire problem,
because the event may need to be delivered to user space and processed by it.
While a driver can check if user space has already read the event, it has
no way to detect whether or not it has finished processing it. In fact,
processing an event may involve an interaction with a (human) user and there's
no general way by which software can figure out when the user considers the
event as processed.
It looks like user space suspend blockers only help in some special cases
when the user space processing of a wakeup event is simple enough, but I don't
think they help in general. For an extreme example, a user may want to wake up
a system using wake-on-LAN to log into it, do some work and log out, so
effectively the initial wakeup event has not been processed entirely until the
user finally logs out of the system. Now, after the system wakeup (resulting
from the wake-on-LAN signal) we need to give the user some time to log in, but
if the user doesn't do that in certain time, it may be reasonable to suspend
and let the user wake up the system again.
Similar situation takes place when the system is woken up by a lid switch.
Evidently, the user has opened the laptop lid to do something, but we don't
even know what the user is going to do, so there's no way we can say when
the wakeup event is finally processed.
So, even if we can say when the kernel has finished processing the event
(although that would be complicated in the PCIe case above), I don't think
it's generally possible to ensure that the entire processing of a wakeup event
has been completed. This leads to the question whether or not it is worth
trying to detect the ending of the processing of a wakeup event.
Now, going back to the $subject patch, I didn't really think it would be
suitable for opportunistic suspend, so let's focus on the "forced" suspend
instead. It still has the problem that wakeup events occuring while
/sys/power/state is written to (or even slightly before) should cause the
system to cancel the suspend, but they generally won't. With the patch
applied that can be avoided by (a) reading from /sys/power/wakeup_count,
(b) waiting for certain time (such that if a suspend event is not entirely
processed within that time, it's worth suspending and waking up the
system again) and (c) writing to /sys/power/wakeup_count right before writing
to /sys/power/state (where the latter is only done if the former succeeds).
Rafael
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <201006221221.53801.rjw@sisk.pl>
@ 2010-06-22 14:35 ` Alan Stern
2010-06-22 23:00 ` mark gross
1 sibling, 0 replies; 50+ messages in thread
From: Alan Stern @ 2010-06-22 14:35 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Florian Mickler, Linux-pm mailing list
On Tue, 22 Jun 2010, Rafael J. Wysocki wrote:
> Having reconsidered that I think there's more to it.
>
> Take the PCI subsystem as an example, specifically pcie_pme_handle_request().
> This is the place where wakeup events are started, but it has no idea about
> how they are going to be handled. Thus in the suspend blocker scheme it would
> need to activate a blocker, but it wouldn't be able to release it. So, it
> seems, we would need to associate a suspend blocker with each PCIe device
> that can generate wakeup events and require all drivers of such devices to
> deal with a blocker activated by someone else (PCIe PME driver in this
> particular case). That sounds cumbersome to say the least.
Maybe this means pcie_pme_handle_request() isn't a good place to note
the arrival of a wakeup event. Doing it in the individual driver might
work out better.
Or look at this from a different point of view. Adopting Mark's
terminology, let's say that the kernel is in a "critical section" at
times when we don't want the system to suspend. Currently there's no
way to tell the PM core when the kernel enters or leaves a critical
section, so there's no way to prevent the system from suspending at the
wrong time.
Most wakeup events indicate the start of a critical section, in the
sense that you hardly ever say: "I want the computer to wake up when I
press this button, but I don't care what it does afterward -- it can go
right back to sleep without doing anything if it wants." Much more
common is that a wakeup event requires a certain amount of processing,
and you don't want the system to go back to sleep until the processing
is over. Of course, if the processing is simple enough that it can all
be done in an interrupt handler or a resume method, then nothing
extra is needed since obviously the system won't suspend while an
interrupt handler or a resume method is running. But for more
complicated cases, we need to do more.
The problem in your example is that pcie_pme_handle_request() has no
idea about the nature or extent of the critical section to follow.
Therefore it's not in a good position to mark the beginning of the
critical section, even though it is in an excellent position to mark
the receipt of a wakeup event.
> Moreover, even if we do that, it still doesn't solve the entire problem,
> because the event may need to be delivered to user space and processed by it.
> While a driver can check if user space has already read the event, it has
> no way to detect whether or not it has finished processing it. In fact,
> processing an event may involve an interaction with a (human) user and there's
> no general way by which software can figure out when the user considers the
> event as processed.
Is this the kernel's problem? Once userspace has read the event, we
can safely say that the kernel's critical section is over. Perhaps a
userspace critical section has begun, perhaps not; either way, it's no
longer the kernel's responsibility.
> It looks like user space suspend blockers only help in some special cases
> when the user space processing of a wakeup event is simple enough, but I don't
> think they help in general. For an extreme example, a user may want to wake up
> a system using wake-on-LAN to log into it, do some work and log out, so
> effectively the initial wakeup event has not been processed entirely until the
> user finally logs out of the system. Now, after the system wakeup (resulting
> from the wake-on-LAN signal) we need to give the user some time to log in, but
> if the user doesn't do that in certain time, it may be reasonable to suspend
> and let the user wake up the system again.
I agree. This is a case where there is no clear-cut end to the
kernel's critical section, because the event is not handed over to
userspace. A reasonable approach would be to use a timeout, perhaps
also with some heuristic like: End the critical section early if we
receive 100(?) more network packets before the timeout expires.
> Similar situation takes place when the system is woken up by a lid switch.
> Evidently, the user has opened the laptop lid to do something, but we don't
> even know what the user is going to do, so there's no way we can say when
> the wakeup event is finally processed.
In this case, the kernel could inform an appropriate user process (some
part of DeviceKit? or the power-manager process?) about the lid-switch
event. Once that information had been passed on, the kernel's critical
section would be over. The process could start its own critical
section or not, as it sees fit.
If there is no process to send the information to, the kernel could
again end the critical section after a reasonable timeout (1 minute?).
> So, even if we can say when the kernel has finished processing the event
> (although that would be complicated in the PCIe case above), I don't think
> it's generally possible to ensure that the entire processing of a wakeup event
> has been completed. This leads to the question whether or not it is worth
> trying to detect the ending of the processing of a wakeup event.
As Arve pointed out, in some cases it definitely is worthwhile (the
gpio keypad matrix example). In other cases there may be no reasonable
way to tell. That doesn't mean we have to give up entirely.
> Now, going back to the $subject patch, I didn't really think it would be
> suitable for opportunistic suspend, so let's focus on the "forced" suspend
> instead. It still has the problem that wakeup events occuring while
> /sys/power/state is written to (or even slightly before) should cause the
> system to cancel the suspend, but they generally won't. With the patch
> applied that can be avoided by (a) reading from /sys/power/wakeup_count,
> (b) waiting for certain time (such that if a suspend event is not entirely
> processed within that time, it's worth suspending and waking up the
> system again) and (c) writing to /sys/power/wakeup_count right before writing
> to /sys/power/state (where the latter is only done if the former succeeds).
In other words, you could detect if a critical section begins after the
user process has decided to initiate a suspend. Yes, that's so.
On the other hand, we should already be able to abort a suspend if a
wakeup event arrives after tasks are frozen (to pick a reasonable
boundary point). If we can't -- if some wakeup events are able to slip
through our fingers -- I would say it's a bug and the relevant drivers
need to be fixed. In the end this probably will require adding a
function to notify the PM core that a wakeup event occurred and
therefore a suspend-in-progress should be aborted -- almost exactly
what pm_wakeup_event() does.
So I'm not opposed to the new function. But it doesn't solve the
entire problem of avoiding suspends during critical sections.
Alan Stern
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] <Pine.LNX.4.44L0.1006220952290.1631-100000@iolanthe.rowland.org>
@ 2010-06-22 15:35 ` Rafael J. Wysocki
0 siblings, 0 replies; 50+ messages in thread
From: Rafael J. Wysocki @ 2010-06-22 15:35 UTC (permalink / raw)
To: Alan Stern
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Arve, Florian Mickler,
Linux-pm mailing list
On Tuesday, June 22, 2010, Alan Stern wrote:
> On Tue, 22 Jun 2010, Rafael J. Wysocki wrote:
>
> > Having reconsidered that I think there's more to it.
> >
> > Take the PCI subsystem as an example, specifically pcie_pme_handle_request().
> > This is the place where wakeup events are started, but it has no idea about
> > how they are going to be handled. Thus in the suspend blocker scheme it would
> > need to activate a blocker, but it wouldn't be able to release it. So, it
> > seems, we would need to associate a suspend blocker with each PCIe device
> > that can generate wakeup events and require all drivers of such devices to
> > deal with a blocker activated by someone else (PCIe PME driver in this
> > particular case). That sounds cumbersome to say the least.
>
> Maybe this means pcie_pme_handle_request() isn't a good place to note
> the arrival of a wakeup event. Doing it in the individual driver might
> work out better.
But it this case there will be an open window in which suspend may be started
after the wakeup event is signaled, but before the PM core is told about it.
> Or look at this from a different point of view. Adopting Mark's
> terminology, let's say that the kernel is in a "critical section" at
> times when we don't want the system to suspend. Currently there's no
> way to tell the PM core when the kernel enters or leaves a critical
> section, so there's no way to prevent the system from suspending at the
> wrong time.
>
> Most wakeup events indicate the start of a critical section, in the
> sense that you hardly ever say: "I want the computer to wake up when I
> press this button, but I don't care what it does afterward -- it can go
> right back to sleep without doing anything if it wants." Much more
> common is that a wakeup event requires a certain amount of processing,
> and you don't want the system to go back to sleep until the processing
> is over. Of course, if the processing is simple enough that it can all
> be done in an interrupt handler or a resume method, then nothing
> extra is needed since obviously the system won't suspend while an
> interrupt handler or a resume method is running. But for more
> complicated cases, we need to do more.
>
> The problem in your example is that pcie_pme_handle_request() has no
> idea about the nature or extent of the critical section to follow.
Exactly.
> Therefore it's not in a good position to mark the beginning of the
> critical section, even though it is in an excellent position to mark
> the receipt of a wakeup event.
I think we have no choice but to regard the detection of a wakeup event as the
beginning of a "suspend critical section".
> > Moreover, even if we do that, it still doesn't solve the entire problem,
> > because the event may need to be delivered to user space and processed by it.
> > While a driver can check if user space has already read the event, it has
> > no way to detect whether or not it has finished processing it. In fact,
> > processing an event may involve an interaction with a (human) user and there's
> > no general way by which software can figure out when the user considers the
> > event as processed.
>
> Is this the kernel's problem? Once userspace has read the event, we
> can safely say that the kernel's critical section is over. Perhaps a
> userspace critical section has begun, perhaps not; either way, it's no
> longer the kernel's responsibility.
Well, I agree here, but in the suspend blockers world it is the kernel
responsibility, because the kernel contains the power manager part.
> > It looks like user space suspend blockers only help in some special cases
> > when the user space processing of a wakeup event is simple enough, but I don't
> > think they help in general. For an extreme example, a user may want to wake up
> > a system using wake-on-LAN to log into it, do some work and log out, so
> > effectively the initial wakeup event has not been processed entirely until the
> > user finally logs out of the system. Now, after the system wakeup (resulting
> > from the wake-on-LAN signal) we need to give the user some time to log in, but
> > if the user doesn't do that in certain time, it may be reasonable to suspend
> > and let the user wake up the system again.
>
> I agree. This is a case where there is no clear-cut end to the
> kernel's critical section, because the event is not handed over to
> userspace. A reasonable approach would be to use a timeout, perhaps
> also with some heuristic like: End the critical section early if we
> receive 100(?) more network packets before the timeout expires.
Exactly.
> > Similar situation takes place when the system is woken up by a lid switch.
> > Evidently, the user has opened the laptop lid to do something, but we don't
> > even know what the user is going to do, so there's no way we can say when
> > the wakeup event is finally processed.
>
> In this case, the kernel could inform an appropriate user process (some
> part of DeviceKit? or the power-manager process?) about the lid-switch
> event. Once that information had been passed on, the kernel's critical
> section would be over. The process could start its own critical
> section or not, as it sees fit.
>
> If there is no process to send the information to, the kernel could
> again end the critical section after a reasonable timeout (1 minute?).
Agreed.
> > So, even if we can say when the kernel has finished processing the event
> > (although that would be complicated in the PCIe case above), I don't think
> > it's generally possible to ensure that the entire processing of a wakeup event
> > has been completed. This leads to the question whether or not it is worth
> > trying to detect the ending of the processing of a wakeup event.
>
> As Arve pointed out, in some cases it definitely is worthwhile (the
> gpio keypad matrix example). In other cases there may be no reasonable
> way to tell. That doesn't mean we have to give up entirely.
Well, I'm not sure, because that really depends on the hardware and bus in
question. The necessary condition seems to be that the event be detected
and handled entirely by the same functional unit (eg. a device driver) within
the kernel and such that it is able to detect whether or not user space has
acquired the event information. That doesn't seem to be a common case to me.
> > Now, going back to the $subject patch, I didn't really think it would be
> > suitable for opportunistic suspend, so let's focus on the "forced" suspend
> > instead. It still has the problem that wakeup events occuring while
> > /sys/power/state is written to (or even slightly before) should cause the
> > system to cancel the suspend, but they generally won't. With the patch
> > applied that can be avoided by (a) reading from /sys/power/wakeup_count,
> > (b) waiting for certain time (such that if a suspend event is not entirely
> > processed within that time, it's worth suspending and waking up the
> > system again) and (c) writing to /sys/power/wakeup_count right before writing
> > to /sys/power/state (where the latter is only done if the former succeeds).
>
> In other words, you could detect if a critical section begins after the
> user process has decided to initiate a suspend. Yes, that's so.
Generally yes, although I think it will also detect "critical sections"
starting exactly at the moment the suspend is started. Which in fact is the
purpose of the patch.
> On the other hand, we should already be able to abort a suspend if a
> wakeup event arrives after tasks are frozen (to pick a reasonable
> boundary point). If we can't -- if some wakeup events are able to slip
> through our fingers -- I would say it's a bug and the relevant drivers
> need to be fixed. In the end this probably will require adding a
> function to notify the PM core that a wakeup event occurred and
> therefore a suspend-in-progress should be aborted -- almost exactly
> what pm_wakeup_event() does.
That's correct.
> So I'm not opposed to the new function. But it doesn't solve the
> entire problem of avoiding suspends during critical sections.
Surely not and it isn't my goal at this point.
I think there are a few different issues that the suspend blockers (or
wakelocks) framework attempts to address in one bit hammer. To me, they are
at least (1) deciding when to suspend, (2) detecting events that should make
us avoid suspending (or abort suspend if already started), (3) preventing
"untrusted" processes from making the system use too much energy.
IMO it's better to treat them as different issues and try to address them
separately.
Rafael
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] <201006221735.10467.rjw@sisk.pl>
@ 2010-06-22 19:55 ` Alan Stern
0 siblings, 0 replies; 50+ messages in thread
From: Alan Stern @ 2010-06-22 19:55 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Florian Mickler, Linux-pm mailing list
On Tue, 22 Jun 2010, Rafael J. Wysocki wrote:
> On Tuesday, June 22, 2010, Alan Stern wrote:
> > On Tue, 22 Jun 2010, Rafael J. Wysocki wrote:
> >
> > > Having reconsidered that I think there's more to it.
> > >
> > > Take the PCI subsystem as an example, specifically pcie_pme_handle_request().
> > > This is the place where wakeup events are started, but it has no idea about
> > > how they are going to be handled. Thus in the suspend blocker scheme it would
> > > need to activate a blocker, but it wouldn't be able to release it. So, it
> > > seems, we would need to associate a suspend blocker with each PCIe device
> > > that can generate wakeup events and require all drivers of such devices to
> > > deal with a blocker activated by someone else (PCIe PME driver in this
> > > particular case). That sounds cumbersome to say the least.
> >
> > Maybe this means pcie_pme_handle_request() isn't a good place to note
> > the arrival of a wakeup event. Doing it in the individual driver might
> > work out better.
>
> But it this case there will be an open window in which suspend may be started
> after the wakeup event is signaled, but before the PM core is told about it.
That's true but it doesn't matter, assuming the suspend can't progress
during this window.
> > The problem in your example is that pcie_pme_handle_request() has no
> > idea about the nature or extent of the critical section to follow.
>
> Exactly.
>
> > Therefore it's not in a good position to mark the beginning of the
> > critical section, even though it is in an excellent position to mark
> > the receipt of a wakeup event.
>
> I think we have no choice but to regard the detection of a wakeup event as the
> beginning of a "suspend critical section".
Receipt of a wakeup event triggers a whole series of function calls,
including calls to the resume methods of every driver. The system
should be designed so that the next suspend can't begin until those
function calls complete. For example, the next suspend certainly can't
begin before the resume methods all complete. Given that premise, any
one of those functions could serve as the start of a suspend critical
section.
> > > Moreover, even if we do that, it still doesn't solve the entire problem,
> > > because the event may need to be delivered to user space and processed by it.
> > > While a driver can check if user space has already read the event, it has
> > > no way to detect whether or not it has finished processing it. In fact,
> > > processing an event may involve an interaction with a (human) user and there's
> > > no general way by which software can figure out when the user considers the
> > > event as processed.
> >
> > Is this the kernel's problem? Once userspace has read the event, we
> > can safely say that the kernel's critical section is over. Perhaps a
> > userspace critical section has begun, perhaps not; either way, it's no
> > longer the kernel's responsibility.
>
> Well, I agree here, but in the suspend blockers world it is the kernel
> responsibility, because the kernel contains the power manager part.
In the suspend blockers world (or at least, in Android's version of the
suspend blockers world), userspace would activate another suspend
blocker before reading the event. The kernel's critical section could
then end safely once the event was read.
> > > So, even if we can say when the kernel has finished processing the event
> > > (although that would be complicated in the PCIe case above), I don't think
> > > it's generally possible to ensure that the entire processing of a wakeup event
> > > has been completed. This leads to the question whether or not it is worth
> > > trying to detect the ending of the processing of a wakeup event.
> >
> > As Arve pointed out, in some cases it definitely is worthwhile (the
> > gpio keypad matrix example). In other cases there may be no reasonable
> > way to tell. That doesn't mean we have to give up entirely.
>
> Well, I'm not sure, because that really depends on the hardware and bus in
> question. The necessary condition seems to be that the event be detected
> and handled entirely by the same functional unit (eg. a device driver) within
> the kernel and such that it is able to detect whether or not user space has
> acquired the event information. That doesn't seem to be a common case to me.
It's hard to say how common this is without having a list of possible
wakeup sources. And of course, that list will differ from one platform
to another.
> > > Now, going back to the $subject patch, I didn't really think it would be
> > > suitable for opportunistic suspend, so let's focus on the "forced" suspend
> > > instead. It still has the problem that wakeup events occuring while
> > > /sys/power/state is written to (or even slightly before) should cause the
> > > system to cancel the suspend, but they generally won't. With the patch
> > > applied that can be avoided by (a) reading from /sys/power/wakeup_count,
> > > (b) waiting for certain time (such that if a suspend event is not entirely
> > > processed within that time, it's worth suspending and waking up the
> > > system again) and (c) writing to /sys/power/wakeup_count right before writing
> > > to /sys/power/state (where the latter is only done if the former succeeds).
> >
> > In other words, you could detect if a critical section begins after the
> > user process has decided to initiate a suspend. Yes, that's so.
>
> Generally yes, although I think it will also detect "critical sections"
> starting exactly at the moment the suspend is started. Which in fact is the
> purpose of the patch.
Well, the moment the suspend is started _does_ occur after the user
process decides to initiate a suspend. Hence critical sections
starting exactly at that moment would indeed be detected.
> > So I'm not opposed to the new function. But it doesn't solve the
> > entire problem of avoiding suspends during critical sections.
>
> Surely not and it isn't my goal at this point.
>
> I think there are a few different issues that the suspend blockers (or
> wakelocks) framework attempts to address in one bit hammer. To me, they are
> at least (1) deciding when to suspend, (2) detecting events that should make
> us avoid suspending (or abort suspend if already started), (3) preventing
> "untrusted" processes from making the system use too much energy.
>
> IMO it's better to treat them as different issues and try to address them
> separately.
Certainly (3) needs to be addressed separately. It should be handled
completely within userspace, if at all possible.
(1) and (2) are closely related. In fact, a reasonable criterion for
(1) might be: Suspend whenever it is allowed. Then (2) becomes: What
sorts of things should disallow suspend, and for how long?
Evidently part of the problem here is that for a very long time
(predating the existence of Linux), people have been using a bad
abstraction. We talk about "wakeup events", but an event occupies only
a single moment in time. If the computer happens to be asleep at that
moment then it wakes up, fine. But if it was already awake, then once
the moment is passed there's no reason not to suspend -- even if only
1 microsecond has elapsed. Likewise, if an event causes the computer
to wake up, then once the computer is awake the moment is over and
there's nothing to prevent the computer from immediately going back to
sleep.
Instead of talking about events, we should be talking about procedures
or sections: something that happens over a non-zero period of time.
But people have never thought in terms of wakeup procedures or suspend
critical sections, and so the kernel isn't designed to accomodate them.
We may know when they begin, but we often have only a cloudy idea of
when they end.
Historically, people have mostly had in mind that the answer to (1)
would be: Suspend whenever the user tells the computer to suspend. In
that kind of setting, (2) doesn't matter: When the user tells the
machine to suspend, it should obey.
But when we start to consider energy conservation and autonomous (or
opportunistic) suspend, things become more complex. This is why, for
example, the USB subsystem has a user-selectable autosuspend delay.
It's not an ideal solution, but it does prevent us from thrashing
between suspending and resuming a device over and over if it gets used
repeatedly during a short period of time.
We can design mechanisms until we are blue in the face (some of us may
be blue already!), but until we remedy this weakness in our thinking we
won't know how to apply them. Which means people won't be able to
agree on a single correct approach.
Alan Stern
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] <Pine.LNX.4.44L0.1006221508390.1322-100000@iolanthe.rowland.org>
@ 2010-06-22 20:58 ` Rafael J. Wysocki
0 siblings, 0 replies; 50+ messages in thread
From: Rafael J. Wysocki @ 2010-06-22 20:58 UTC (permalink / raw)
To: Alan Stern
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Arve, Florian Mickler,
Linux-pm mailing list
On Tuesday, June 22, 2010, Alan Stern wrote:
> On Tue, 22 Jun 2010, Rafael J. Wysocki wrote:
>
> > On Tuesday, June 22, 2010, Alan Stern wrote:
> > > On Tue, 22 Jun 2010, Rafael J. Wysocki wrote:
> > >
> > > > Having reconsidered that I think there's more to it.
> > > >
> > > > Take the PCI subsystem as an example, specifically pcie_pme_handle_request().
> > > > This is the place where wakeup events are started, but it has no idea about
> > > > how they are going to be handled. Thus in the suspend blocker scheme it would
> > > > need to activate a blocker, but it wouldn't be able to release it. So, it
> > > > seems, we would need to associate a suspend blocker with each PCIe device
> > > > that can generate wakeup events and require all drivers of such devices to
> > > > deal with a blocker activated by someone else (PCIe PME driver in this
> > > > particular case). That sounds cumbersome to say the least.
> > >
> > > Maybe this means pcie_pme_handle_request() isn't a good place to note
> > > the arrival of a wakeup event. Doing it in the individual driver might
> > > work out better.
> >
> > But it this case there will be an open window in which suspend may be started
> > after the wakeup event is signaled, but before the PM core is told about it.
>
> That's true but it doesn't matter, assuming the suspend can't progress
> during this window.
>
> > > The problem in your example is that pcie_pme_handle_request() has no
> > > idea about the nature or extent of the critical section to follow.
> >
> > Exactly.
> >
> > > Therefore it's not in a good position to mark the beginning of the
> > > critical section, even though it is in an excellent position to mark
> > > the receipt of a wakeup event.
> >
> > I think we have no choice but to regard the detection of a wakeup event as the
> > beginning of a "suspend critical section".
>
> Receipt of a wakeup event triggers a whole series of function calls,
> including calls to the resume methods of every driver. The system
> should be designed so that the next suspend can't begin until those
> function calls complete. For example, the next suspend certainly can't
> begin before the resume methods all complete. Given that premise, any
> one of those functions could serve as the start of a suspend critical
> section.
Well, consider pcie_pme_handle_request() again. It certainly can be called
during suspend (until the PME interrupt is disabled), but the PM workqueue
is frozen at this point, so the device driver's resume routine won't be called.
However, the wakeup signal from the device should be regarded as a wakeup
event in that case IMO.
[We have a check for that in dpm_prepare(), but I think it should be replaced
by the "proper" handling of wakeup events, once we have one.]
...
> > > > So, even if we can say when the kernel has finished processing the event
> > > > (although that would be complicated in the PCIe case above), I don't think
> > > > it's generally possible to ensure that the entire processing of a wakeup event
> > > > has been completed. This leads to the question whether or not it is worth
> > > > trying to detect the ending of the processing of a wakeup event.
> > >
> > > As Arve pointed out, in some cases it definitely is worthwhile (the
> > > gpio keypad matrix example). In other cases there may be no reasonable
> > > way to tell. That doesn't mean we have to give up entirely.
> >
> > Well, I'm not sure, because that really depends on the hardware and bus in
> > question. The necessary condition seems to be that the event be detected
> > and handled entirely by the same functional unit (eg. a device driver) within
> > the kernel and such that it is able to detect whether or not user space has
> > acquired the event information. That doesn't seem to be a common case to me.
>
> It's hard to say how common this is without having a list of possible
> wakeup sources. And of course, that list will differ from one platform
> to another.
Agreed.
...
> > I think there are a few different issues that the suspend blockers (or
> > wakelocks) framework attempts to address in one bit hammer. To me, they are
> > at least (1) deciding when to suspend, (2) detecting events that should make
> > us avoid suspending (or abort suspend if already started), (3) preventing
> > "untrusted" processes from making the system use too much energy.
> >
> > IMO it's better to treat them as different issues and try to address them
> > separately.
>
> Certainly (3) needs to be addressed separately. It should be handled
> completely within userspace, if at all possible.
>
> (1) and (2) are closely related. In fact, a reasonable criterion for
> (1) might be: Suspend whenever it is allowed. Then (2) becomes: What
> sorts of things should disallow suspend, and for how long?
The events I mean by (2) are the minimal subset of conditions used in (1),
because the user may add more restrictions that should be checked by user space.
For example, the user may request to suspend whenever there are no open SSH
connections to the machine (why not?), but even if that criterion is satisfied,
wake-on-LAN events should prevent suspend from happening.
> Evidently part of the problem here is that for a very long time
> (predating the existence of Linux), people have been using a bad
> abstraction. We talk about "wakeup events", but an event occupies only
> a single moment in time. If the computer happens to be asleep at that
> moment then it wakes up, fine. But if it was already awake, then once
> the moment is passed there's no reason not to suspend -- even if only
> 1 microsecond has elapsed. Likewise, if an event causes the computer
> to wake up, then once the computer is awake the moment is over and
> there's nothing to prevent the computer from immediately going back to
> sleep.
>
> Instead of talking about events, we should be talking about procedures
> or sections: something that happens over a non-zero period of time.
Agreed.
> But people have never thought in terms of wakeup procedures or suspend
> critical sections, and so the kernel isn't designed to accomodate them.
> We may know when they begin, but we often have only a cloudy idea of
> when they end.
Yeah.
> Historically, people have mostly had in mind that the answer to (1)
> would be: Suspend whenever the user tells the computer to suspend. In
> that kind of setting, (2) doesn't matter: When the user tells the
> machine to suspend, it should obey.
Well, not necessarily, because the user can change his mind while the machine
is suspending and try to generate a wakeup event to abort the suspend.
> But when we start to consider energy conservation and autonomous (or
> opportunistic) suspend, things become more complex. This is why, for
> example, the USB subsystem has a user-selectable autosuspend delay.
> It's not an ideal solution, but it does prevent us from thrashing
> between suspending and resuming a device over and over if it gets used
> repeatedly during a short period of time.
>
> We can design mechanisms until we are blue in the face (some of us may
> be blue already!), but until we remedy this weakness in our thinking we
> won't know how to apply them. Which means people won't be able to
> agree on a single correct approach.
I agree 100%.
Rafael
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <201006221221.53801.rjw@sisk.pl>
2010-06-22 14:35 ` Alan Stern
@ 2010-06-22 23:00 ` mark gross
1 sibling, 0 replies; 50+ messages in thread
From: mark gross @ 2010-06-22 23:00 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: mark gross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, Florian Mickler, Linux-pm mailing list
On Tue, Jun 22, 2010 at 12:21:53PM +0200, Rafael J. Wysocki wrote:
> On Tuesday, June 22, 2010, Rafael J. Wysocki wrote:
> > On Tuesday, June 22, 2010, Alan Stern wrote:
> > > On Mon, 21 Jun 2010, Florian Mickler wrote:
> > >
> > > > > In the end you would want to have communication in both directions:
> > > > > suspend blockers _and_ callbacks. Polling is bad if done too often.
> > > > > But I think the idea is a good one.
> > > >
> > > > Actually, I'm not so shure.
> > > >
> > > > 1. you have to roundtrip whereas in the suspend_blocker scheme you have
> > > > active annotations (i.e. no further action needed)
> > >
> > > That's why it's best to use both. The normal case is that programs
> > > activate and deactivate blockers by sending one-way messages to the PM
> > > process. The exceptional case is when the PM process is about to
> > > initiate a suspend; that's when it does the round-trip polling. Since
> > > the only purpose of the polling is to avoid a race, 90% of the time it
> > > will succeed.
> > >
> > > > 2. it may not be possible for a user to determine if a wake-event is
> > > > in-flight. you would have to somehow pass the wake-event-number with
> > > > it, so that the userspace process could ack it properly without
> > > > confusion. Or... I don't know of anything else...
> > > >
> > > > 1. userspace-manager (UM) reads a number (42).
> > > >
> > > > 2. it questions userspace program X: is it ok to suspend?
> > > >
> > > > [please fill in how userspace program X determines to block
> > > > suspend]
> > > >
> > > > 3a. UM's roundtrip ends and it proceeds to write "42" to the
> > > > kernel [suspending]
> > > > 3b. UM's roundtrip ends and it aborts suspend, because a
> > > > (userspace-)suspend-blocker got activated
> > > >
> > > > I'm not shure how the userspace program could determine that there is a
> > > > wake-event in flight. Perhaps by storing the number of last wake-event.
> > > > But then you need per-wake-event-counters... :|
> > >
> > > Rafael seems to think timeouts will fix this. I'm not so sure.
> > >
> > > > Do you have some thoughts about the wake-event-in-flight detection?
> > >
> > > Not really, except for something like the original wakelock scheme in
> > > which the kernel tells the PM core when an event is over.
> >
> > But the kernel doesn't really know that, so it really can't tell the PM core
> > anything useful. What happens with suspend blockers is that a kernel subsystem
> > cooperates with a user space consumer of the event to get the story straight.
> >
> > However, that will only work if the user space is not buggy and doesn't crash,
> > for example, before releasing the suspend blocker it's holding.
>
> Having reconsidered that I think there's more to it.
>
> Take the PCI subsystem as an example, specifically pcie_pme_handle_request().
> This is the place where wakeup events are started, but it has no idea about
> how they are going to be handled. Thus in the suspend blocker scheme it would
> need to activate a blocker, but it wouldn't be able to release it. So, it
> seems, we would need to associate a suspend blocker with each PCIe device
> that can generate wakeup events and require all drivers of such devices to
> deal with a blocker activated by someone else (PCIe PME driver in this
> particular case). That sounds cumbersome to say the least.
>
> Moreover, even if we do that, it still doesn't solve the entire problem,
> because the event may need to be delivered to user space and processed by it.
> While a driver can check if user space has already read the event, it has
> no way to detect whether or not it has finished processing it. In fact,
> processing an event may involve an interaction with a (human) user and there's
> no general way by which software can figure out when the user considers the
> event as processed.
>
> It looks like user space suspend blockers only help in some special cases
> when the user space processing of a wakeup event is simple enough, but I don't
> think they help in general. For an extreme example, a user may want to wake up
> a system using wake-on-LAN to log into it, do some work and log out, so
> effectively the initial wakeup event has not been processed entirely until the
> user finally logs out of the system. Now, after the system wakeup (resulting
> from the wake-on-LAN signal) we need to give the user some time to log in, but
> if the user doesn't do that in certain time, it may be reasonable to suspend
> and let the user wake up the system again.
>
> Similar situation takes place when the system is woken up by a lid switch.
> Evidently, the user has opened the laptop lid to do something, but we don't
> even know what the user is going to do, so there's no way we can say when
> the wakeup event is finally processed.
>
> So, even if we can say when the kernel has finished processing the event
> (although that would be complicated in the PCIe case above), I don't think
> it's generally possible to ensure that the entire processing of a wakeup event
> has been completed. This leads to the question whether or not it is worth
> trying to detect the ending of the processing of a wakeup event.
>
> Now, going back to the $subject patch, I didn't really think it would be
> suitable for opportunistic suspend, so let's focus on the "forced" suspend
> instead. It still has the problem that wakeup events occuring while
> /sys/power/state is written to (or even slightly before) should cause the
> system to cancel the suspend, but they generally won't. With the patch
> applied that can be avoided by (a) reading from /sys/power/wakeup_count,
> (b) waiting for certain time (such that if a suspend event is not entirely
> processed within that time, it's worth suspending and waking up the
> system again) and (c) writing to /sys/power/wakeup_count right before writing
> to /sys/power/state (where the latter is only done if the former succeeds).
>
This is what thought was the problem your idea as trying to deal with.
--mgross
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [RFC][PATCH] PM: Avoid losing wakeup events during suspend
[not found] ` <20100622081803.4d472542@schatten.dmk.lab>
@ 2010-06-22 23:22 ` mark gross
0 siblings, 0 replies; 50+ messages in thread
From: mark gross @ 2010-06-22 23:22 UTC (permalink / raw)
To: Florian Mickler
Cc: 640e9920, markgross, Neil Brown, Dmitry Torokhov,
Linux Kernel Mailing List, linux-pm
On Tue, Jun 22, 2010 at 08:18:03AM +0200, Florian Mickler wrote:
> On Mon, 21 Jun 2010 18:58:37 -0700
> mark gross <640e9920@gmail.com> wrote:
>
>
> > wrong. They are about 1) adding opportunistic suspend, and 2) adding
> > critical sections to block suspend.
> >
> > No race issues where ever talked about until alternative solutions
> > where put up.
>
> The whole and only reason to even define the term "critical sections" is
> when you need to define "a race". Or vice versa. A race is prevented by
> defining critical sections and protecting these against concurrent
> access.
>
> [..snip..]
>
> [some rant that alan is not familiar with android userspace..]
>
> Are you suggesting that only android developers are supposed to talk
> about this?
of course not. I'm just getting frustrated with having android-isms
tossed in my face as we try to discuss the merits of the ideas, only to
find that the are getting tossed around by someone not familiar with
Android.
Sorry about that. I was having a bad day.
--mgross
>
> This is a pretty basic thing. It has only to do with system suspend.
> (And using system suspend aggressively)
>
> >
> > --mgross
> >
>
> Cheers,
> Flo
^ permalink raw reply [flat|nested] 50+ messages in thread
end of thread, other threads:[~2010-06-22 23:22 UTC | newest]
Thread overview: 50+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <Pine.LNX.4.44L0.1006211814370.1687-100000@iolanthe.rowland.org>
2010-06-21 22:40 ` [RFC][PATCH] PM: Avoid losing wakeup events during suspend Rafael J. Wysocki
[not found] ` <201006220040.41524.rjw@sisk.pl>
2010-06-21 22:48 ` Rafael J. Wysocki
2010-06-22 0:50 ` Arve Hjønnevåg
2010-06-22 10:21 ` Rafael J. Wysocki
[not found] ` <201006221221.53801.rjw@sisk.pl>
2010-06-22 14:35 ` Alan Stern
2010-06-22 23:00 ` mark gross
[not found] <Pine.LNX.4.44L0.1006221508390.1322-100000@iolanthe.rowland.org>
2010-06-22 20:58 ` Rafael J. Wysocki
[not found] <201006221735.10467.rjw@sisk.pl>
2010-06-22 19:55 ` Alan Stern
[not found] <Pine.LNX.4.44L0.1006220952290.1631-100000@iolanthe.rowland.org>
2010-06-22 15:35 ` Rafael J. Wysocki
[not found] <Pine.LNX.4.44L0.1006212224550.19813-100000@netrider.rowland.org>
2010-06-22 9:24 ` Rafael J. Wysocki
[not found] <20100622012519.GB12795@gvim.org>
2010-06-22 2:24 ` Alan Stern
[not found] <201006212318.03001.rjw@sisk.pl>
2010-06-21 22:27 ` Alan Stern
[not found] <20100621223841.1cf1942d@schatten.dmk.lab>
2010-06-21 22:18 ` Alan Stern
[not found] <20100621073233.3f874ad0@schatten.dmk.lab>
2010-06-21 15:23 ` Alan Stern
2010-06-21 16:54 ` Alan Stern
[not found] ` <Pine.LNX.4.44L0.1006211111060.1687-100000@iolanthe.rowland.org>
2010-06-21 20:38 ` Florian Mickler
[not found] ` <Pine.LNX.4.44L0.1006211244140.1687-100000@iolanthe.rowland.org>
2010-06-21 20:40 ` Florian Mickler
2010-06-21 21:18 ` Rafael J. Wysocki
[not found] <20100620225810.GB9735@gvim.org>
2010-06-21 2:33 ` Alan Stern
[not found] ` <Pine.LNX.4.44L0.1006202223440.29532-100000@netrider.rowland.org>
2010-06-21 4:04 ` David Brownell
2010-06-21 6:02 ` David Brownell
2010-06-21 15:06 ` Alan Stern
2010-06-21 5:55 ` mark gross
[not found] ` <20100621055522.GE9735@gvim.org>
2010-06-21 12:39 ` Florian Mickler
2010-06-21 15:57 ` Alan Stern
[not found] ` <Pine.LNX.4.44L0.1006211125470.1687-100000@iolanthe.rowland.org>
2010-06-22 1:58 ` mark gross
[not found] ` <20100622015837.GC12795@gvim.org>
2010-06-22 2:46 ` Alan Stern
2010-06-22 6:18 ` Florian Mickler
2010-06-22 9:29 ` Rafael J. Wysocki
[not found] ` <20100622081803.4d472542@schatten.dmk.lab>
2010-06-22 23:22 ` mark gross
[not found] <201006202350.27143.rjw@sisk.pl>
2010-06-21 2:23 ` Alan Stern
[not found] ` <Pine.LNX.4.44L0.1006202154260.29532-100000@netrider.rowland.org>
2010-06-21 5:32 ` Florian Mickler
2010-06-21 6:13 ` mark gross
[not found] ` <20100621061345.GF9735@gvim.org>
2010-06-21 12:10 ` tytso
2010-06-21 16:01 ` Alan Stern
[not found] ` <20100621121058.GF6199@thunk.org>
2010-06-21 12:22 ` Alan Cox
2010-06-21 12:26 ` Florian Mickler
2010-06-21 12:26 ` Florian Mickler
2010-06-21 13:42 ` tytso
[not found] ` <20100621134249.GG6199@thunk.org>
2010-06-21 14:01 ` Alan Cox
2010-06-22 1:07 ` mark gross
[not found] ` <Pine.LNX.4.44L0.1006211158200.1687-100000@iolanthe.rowland.org>
2010-06-22 1:25 ` mark gross
2010-06-21 21:58 ` Rafael J. Wysocki
[not found] <201006200005.35771.rjw@sisk.pl>
2010-06-20 5:52 ` mark gross
[not found] ` <20100620055254.GA21994@gvim.org>
2010-06-20 12:49 ` Rafael J. Wysocki
[not found] ` <201006201449.14754.rjw@sisk.pl>
2010-06-20 23:13 ` mark gross
2010-06-20 16:28 ` Alan Stern
[not found] ` <Pine.LNX.4.44L0.1006201153420.22272-100000@netrider.rowland.org>
2010-06-20 21:50 ` Rafael J. Wysocki
2010-06-20 22:58 ` mark gross
2010-06-19 22:05 Rafael J. Wysocki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox