[PATCH 0/6] [RFC] Proposal for optimistic suspend idea.

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
@ 2011-09-26 19:13 John Stultz
  2011-09-26 19:13 ` [PATCH 1/6] [RFC] suspend: Block suspend when wakeups are in-progress John Stultz
                   ` (6 more replies)
  0 siblings, 7 replies; 28+ messages in thread
From: John Stultz @ 2011-09-26 19:13 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, peterz

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 9400 bytes --]

So this is a *VERY VERY* rough draft of patches in order to try
to illustrate an idea I've been thinking about since the wakelocks
and suspend-blockers debates happened.

While I doubt the idea is all that novel, I haven't really seen anything
similar discussed yet. Additionally this has very much been influenced by
discussions with Arve Hjønnevåg, Mark Gross, Rafael Wysocki, Alan Stern,
Kevin Hilman, Magnus Damm and others, so credits to all of them for 
clarifying the numerous points of views and requirements that exist,
and helping me better formulate the idea (even if they don't like it :).

>From the wakelock/suspend blockers discussion, as I understood it,
there were three main requirements needed for Android.

1) Method for the kernel to inhibit suspend during some critical section
2) Method for userland to inhibit suspend during some critical section
3) Method using the first two to inhibit suspend from the point that
a wakeup event occurs, until userland has consumed that event.

The third item is the most subtle, and seems to be often overlooked.

The wakelock/suspend_blockers proposal solved #3 by using the following
pattern in userland:

	wake_lock();
	do_stuff();
	wake_unlock();

	select();	//kernel will grab a wakelock when event occurs

	wake_lock();
	read();		//kernel will drop wakelock when data is read

	do_stuff()
	wake_unlock();	

This pattern allows the kernel to inhibit suspend from the wakeup event
until the new data from the event (such as a keyboard press) was read.
The select allows userland to wait until data has arrived, before grabbing
its own wakelock, and then reading the data (where the kernels' wakelock is
then dropped).

This provides proper overlapping critical sections from the wakeup event
all the way to userland consuming the event.

Recently, the pm_stay_awake/pm_relax and wakeup_source infrastructure has
been included into mainline. Some have claimed that this infrastructure
is sufficient to implement wakelock-like functionality, but because the
wakeup_count only provides a count of the number of wakeup events, it
does not provide a way to know if userland has consumed the event. So this
third requirement is not yet met.

In my mind, one disadvantage of the wakelock/suspend-blocker API is that
it requires userland to know what devices are wakeup devices, and must
follow the pattern above interlocking wake_lock calls between select and
read. This makes it a somewhat special case API, with very suspend-specific
rules.

Instead, it seems to me that we could look at this more from the schedulers
point of view. We could have some way for tasks to mark themselves as 
"important", and that would allow us to decide if suspend could occur or not.

In this (again very very rough) draft patch queue, I've used a new
SCHED_STAYAWAKE sched_setsched() flag to be used by applications to mark
themselves as important. This notifies the system that it should not suspend
until the task un-marks itself or dies.

Now, the problem with just some scheduler call is that it really isn't 
any different from the userland wakelock api.

However, here is where the idea is different. Instead of userland having
to know what devices are wakeup sources and dropping a wakelock before
blocking on them, and then having to do a dance to reacquire the wakelock
between the select and the read(), why not leave it all up to the kernel.

The application would mark itself as important, and do whatever it needs
to do.  Should it end up blocking on a device that is backed by a wakeup
source, the kernel could understand that, and de-boost the task's
importance when it blocks. Then when the wakeup event occurs, when the task
is woken up, the kernel would re-boost the task to its original importance.

This can be imagined as a sort of upside down priority inheritance.

Here is an example userland testcase: suspend-block-test.c:

/* Build: gcc -lrt suspend-block-test.c -o suspend-block-test */
#include <stdio.h>
#include <sched.h>

#define SCHED_STAYAWAKE 0x0f000000
#define CLOCK_BOOTTIME			7
#define CLOCK_BOOTTIME_ALARM		9

void main(void)
{
	struct sched_param param;
	int err;
	param.sched_priority = 0;
	sched_setscheduler(0, SCHED_STAYAWAKE, &param);
	while (1) {
		struct timespec ts1, ts2;
		char input[256];
		printf("Waiting for input:");
		scanf("%s", input);

		ts1.tv_sec = 10;
		ts1.tv_nsec = 0;
		printf("Sleeping for 10 seconds (BOOTTIME)\n");
		clock_nanosleep(CLOCK_BOOTTIME, 0, &ts1, &ts2);
		printf("wakeup!\n");

		printf("Sleeping for 10 seconds (BOOTTIME_ALARM)\n");
		clock_nanosleep(CLOCK_BOOTTIME_ALARM, 0, &ts1, &ts2);
		printf("wakeup!\n");
	}
	return;
}

As you can see, after marking itself as important, the application doesn't
need to do anything else special. The system will only allow suspends
to occur while the task is blocked in clock_nanosleep() on the RTC
backed BOOTTIME_ALARM clockid.

Now, this simplified API comes at the cost that it relies hevily on the
kernel to do the right thing. And the kernel must be able to know what
devices are backed by wakeup events or not, and also must protect the
wakeup event to task-wakeup path such that a suspend will not trigger
before the task is re-boosted. These paths can be long and varied, through
threaded-irqs, soft-irqs, workqueues and even timers. So protecting
those paths in the kernel may add quite a bit of complexity. Possibly so
much that this really becomes infeasible.

There are also other complicated cases, like selecting on multiple fds,
where some are backed by wakeup events and some aren't, where its not clear
if de-boosting or not de-boosting would be the right decision.

Further, its likely additional work will be needed refining the
wakeup_source code so we can better distinquish between wake up events
and in-kernel critical section where we want to block suspend. Currently
they are one and the same. But as those in-kernel critical sections are
added, its likely the process of suspending the system may run across code
paths that have such critical sections. And while those critical sections
need to complete before suspend finishes, we don't want them to cause
suspend to abort, as it would if a wakeup event occured.

None the less, because the userland API is a bit simpler, and entrusts the
kernel to do the right thing, it may also be a more flexible userland API
to live with for the comming decades. I could imagine it being useful for
non-suspend power related scheduling decisions, such as if the scheduler
should enforce idleness (via something like idle cycle injection) to save
power or not. Or maybe such a flag could be used to decide if a task's 
timers are important enough to schedule timer irqs in the near future or
not.

But that's all blue-sky talk. 

For now, I'd just be interested in what folks think about the concept with
regards to the wakelock discussions. Where it might not be sufficient? Or
what other disadvantages might it have? Are there any varients to this
idea that would be better?

Again, at this point I'm not trying to push this specific idea forward for
inclusion, but I'm hoping to just put it forward for review to see if it
can't help spur some constructive discussion about it or alternative
approaches.

The following patch queue implements:
* Changes to the suspend code to enforce that suspend will not occur
  while a wakeup is in progress. This is not strictly required, but
  seems useful to me.
* Changes to the scheduler to manage both the per-task importance and 
  global active count, as well as re-boosting on task wakeup.
* Changes to the RTC layers to inhibit suspend on the path from IRQ to
  the task wakeup call.
* Change to the alarmtimer nanosleep call to deboost before blocking

I'll admit there are a lot of problems with the current patches, but
hopefully they communicate the idea well enough to allow for discussion.

Again, big thanks to Mark Gross, Rafael Wysocki, and Alan Stern for
listening and helping me better formulate the idea, and letting me know
where they saw problems with it.

thanks
-john

CC: Rafael J. Wysocki <rjw@sisk.pl>
CC: arve@android.com
CC: markgross@thegnar.org
CC: Alan Stern <stern@rowland.harvard.edu>
CC: amit.kucheria@linaro.org
CC: farrowg@sg.ibm.com
CC: Dmitry Fink (Palm GBU) <Dmitry.Fink@palm.com>
CC: linux-pm@lists.linux-foundation.org
CC: khilman@ti.com
CC: Magnus Damm <damm@opensource.se>
CC: mjg@redhat.com 
CC: peterz@infradead.org 

John Stultz (6):
  [RFC] suspend: Block suspend when wakeups are in-progress
  [RFC] sched: Add support for SCHED_STAYAWAKE flag
  [RFC] rtc: rtc-cmos: Add pm_stay_awake/pm_relax calls around IRQ
  [RFC] rtc: interface: Add pm_stay_awake/pm_relax chaining rtc
    workqueue processing
  [RFC] alarmtimer: Add pm_stay_awake /pm_relax calls
  [RFC] alarmtimer: Deboost on nanosleep

 drivers/base/power/wakeup.c |   13 ++++++
 drivers/rtc/interface.c     |   16 ++++++-
 drivers/rtc/rtc-cmos.c      |   13 +++++-
 include/linux/sched.h       |   12 +++++
 include/linux/suspend.h     |    3 +
 kernel/exit.c               |    2 +
 kernel/fork.c               |    2 +
 kernel/power/hibernate.c    |    5 ++
 kernel/power/suspend.c      |    5 ++
 kernel/sched.c              |  101 +++++++++++++++++++++++++++++++++++++++++++
 kernel/time/alarmtimer.c    |   48 +++++++++++++++++++--
 11 files changed, 212 insertions(+), 8 deletions(-)

-- 
1.7.3.2.146.gca209

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/6] [RFC] suspend: Block suspend when wakeups are in-progress
  2011-09-26 19:13 [PATCH 0/6] [RFC] Proposal for optimistic suspend idea John Stultz
@ 2011-09-26 19:13 ` John Stultz
  2011-09-26 19:13 ` [PATCH 2/6] [RFC] sched: Add support for SCHED_STAYAWAKE flag John Stultz
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: John Stultz @ 2011-09-26 19:13 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, peterz

With the current pm_stay_awake/pm_relax api, reads to
/sys/power/wakeup_count will block when pm_stay_awake() has been
called. Then once pm_relax() returns, the read will unblock and return
the wakeup_count value. This value can be echo'ed back into wakeup_count,
and if no other wakeup events have occured, the system can be suspended
by calling "echo mem > /sys/power/state".

This method is somewhat advisory, as if a wakeup event has occured
between the reading of /sys/power/wakeup_count and the attempt to suspend,
that attempt to suspend will fail. However, if a second attmept to suspend
is tried, without checking /sys/power/wakeup_count, the suspend will
succeed.

Similarly, if pm_stay_awake() has been called, and then a suspend is
attepted wihtout checking /sys/power/wakeup_count, the suspend will
succeed, despite the pm_relax() call not having been made.

This patch tries to make the pm_stay_awake() call a bit more enforcing,
such that any attempt to suspend that occurs while a wakeup is in progress
will fail. Once the matching pm_relax() has been called, suspend will
succeed.

This does not change the blocking behavior of /sys/power/wakeup_count,
or the suspend failure if a stale wakeup count has been echo'ed into
the /sys/power/wakeup_count.

Also modified the hibernate path in the same way.

CC: Rafael J. Wysocki <rjw@sisk.pl>
CC: arve@android.com
CC: markgross@thegnar.org
CC: Alan Stern <stern@rowland.harvard.edu>
CC: amit.kucheria@linaro.org
CC: farrowg@sg.ibm.com
CC: Dmitry Fink (Palm GBU) <Dmitry.Fink@palm.com>
CC: linux-pm@lists.linux-foundation.org
CC: khilman@ti.com
CC: Magnus Damm <damm@opensource.se>
CC: mjg@redhat.com 
CC: peterz@infradead.org 
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/base/power/wakeup.c |   13 +++++++++++++
 include/linux/suspend.h     |    3 +++
 kernel/power/hibernate.c    |    5 +++++
 kernel/power/suspend.c      |    5 +++++
 4 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
index 84f7c7d..0722873 100644
--- a/drivers/base/power/wakeup.c
+++ b/drivers/base/power/wakeup.c
@@ -580,6 +580,19 @@ static void pm_wakeup_update_hit_counts(void)
 	rcu_read_unlock();
 }
 
+
+bool pm_wakeup_in_progress(void)
+{
+	unsigned int cnt, inpr;
+	unsigned long flags;
+
+	spin_lock_irqsave(&events_lock, flags);
+	split_counters(&cnt, &inpr);
+	spin_unlock_irqrestore(&events_lock, flags);
+
+	return (inpr != 0);
+}
+
 /**
  * pm_wakeup_pending - Check if power transition in progress should be aborted.
  *
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 6bbcef2..c9403a1 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -294,6 +294,7 @@ extern int unregister_pm_notifier(struct notifier_block *nb);
 extern bool events_check_enabled;
 
 extern bool pm_wakeup_pending(void);
+extern bool pm_wakeup_in_progress(void);
 extern bool pm_get_wakeup_count(unsigned int *count);
 extern bool pm_save_wakeup_count(unsigned int count);
 #else /* !CONFIG_PM_SLEEP */
@@ -311,6 +312,8 @@ static inline int unregister_pm_notifier(struct notifier_block *nb)
 #define pm_notifier(fn, pri)	do { (void)(fn); } while (0)
 
 static inline bool pm_wakeup_pending(void) { return false; }
+static inline bool pm_wakeup_in_progress(void) { return false; }
+
 #endif /* !CONFIG_PM_SLEEP */
 
 extern struct mutex pm_mutex;
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 8f7b1db..22dc22c 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -611,6 +611,11 @@ int hibernate(void)
 		goto Unlock;
 	}
 
+	if (pm_wakeup_in_progress()) {
+		error = -EBUSY;
+		goto Unlock;
+	}
+
 	pm_prepare_console();
 	error = pm_notifier_call_chain(PM_HIBERNATION_PREPARE);
 	if (error)
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index b6b71ad..b2dba52 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -281,6 +281,11 @@ int enter_state(suspend_state_t state)
 	if (!mutex_trylock(&pm_mutex))
 		return -EBUSY;
 
+	if (pm_wakeup_in_progress()) {
+		error = -EBUSY;
+		goto Unlock;
+	}
+
 	printk(KERN_INFO "PM: Syncing filesystems ... ");
 	sys_sync();
 	printk("done.\n");
-- 
1.7.3.2.146.gca209


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/6] [RFC] sched: Add support for SCHED_STAYAWAKE flag
  2011-09-26 19:13 [PATCH 0/6] [RFC] Proposal for optimistic suspend idea John Stultz
  2011-09-26 19:13 ` [PATCH 1/6] [RFC] suspend: Block suspend when wakeups are in-progress John Stultz
@ 2011-09-26 19:13 ` John Stultz
  2011-09-26 19:13 ` [PATCH 3/6] [RFC] rtc: rtc-cmos: Add pm_stay_awake/pm_relax calls around IRQ John Stultz
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: John Stultz @ 2011-09-26 19:13 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, peterz

This is a draft proof of concept on how a stayawake scheduler flag
could be used to inhibit suspend from userland.

I'm in no way married to this specific api, but this acts a a concrete
example of the following idea I'd like to propose:

First there is some method for a task to mark and unmark itself as
"important".

While there are any "important" tasks, no matter if they are runnable
or not, suspend could not occur (this is not unlike Android's userland
wakelocks).

Now, If an "important" task were to block on a device that the kernel
knows to be a wake-up source, the kerenl can choose to de-boost the
"important" task, so that while blocked, it would not be considered
"important".

Upon task wakeup, the kernel would re-boost the task back to its prior
level of importance.

One can sort of imagine this as an upside-down priority inheritance.

This patch provides the API for a task to mark and umark itself as
"important" and block suspend, as well as the hook on wakeup to
reboost any de-boosted tasks.

Now, for corrrectness, in order to avoid races with suspend attempts
that might occur after a wakeup event but before the "important" task
is reboosted on wakeup, there would need to be over-lapping pm_stay_awake
and pm_relax chaining, so the entire IRQ->task wakeup path prohibited
suspend.

CC: Rafael J. Wysocki <rjw@sisk.pl>
CC: arve@android.com
CC: markgross@thegnar.org
CC: Alan Stern <stern@rowland.harvard.edu>
CC: amit.kucheria@linaro.org
CC: farrowg@sg.ibm.com
CC: Dmitry Fink (Palm GBU) <Dmitry.Fink@palm.com>
CC: linux-pm@lists.linux-foundation.org
CC: khilman@ti.com
CC: Magnus Damm <damm@opensource.se>
CC: mjg@redhat.com 
CC: peterz@infradead.org 
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/sched.h |   12 ++++++
 kernel/exit.c         |    2 +
 kernel/fork.c         |    2 +
 kernel/sched.c        |  101 +++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 117 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4ac2c05..3557838 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -41,6 +41,8 @@
 #define SCHED_IDLE		5
 /* Can be ORed in to make sure the process is reverted back to SCHED_NORMAL on fork */
 #define SCHED_RESET_ON_FORK     0x40000000
+#define SCHED_STAYAWAKE		0x0f000000
+
 
 #ifdef __KERNEL__
 
@@ -1566,6 +1568,10 @@ struct task_struct {
 		unsigned long memsw_nr_pages; /* uncharged mem+swap usage */
 	} memcg_batch;
 #endif
+
+	int task_active_count;
+	int task_active_boosted;
+
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 	atomic_t ptrace_bp_refcnt;
 #endif
@@ -1753,6 +1759,12 @@ static inline void put_task_struct(struct task_struct *t)
 extern void task_times(struct task_struct *p, cputime_t *ut, cputime_t *st);
 extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *st);
 
+
+extern void sched_inc_active_count(void);
+extern void sched_dec_active_count(void);
+extern void sched_deboost_task_active_count(struct task_struct *p);
+extern void sched_boost_task_active_count(struct task_struct *p);
+
 /*
  * Per process flags
  */
diff --git a/kernel/exit.c b/kernel/exit.c
index 2913b35..71f7bd4 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -994,6 +994,8 @@ NORET_TYPE void do_exit(long code)
 	 */
 	perf_event_exit_task(tsk);
 
+	sched_deboost_task_active_count(tsk);
+
 	cgroup_exit(tsk, 1);
 
 	if (group_dead)
diff --git a/kernel/fork.c b/kernel/fork.c
index 8e6b6f4..c79c4fb 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1213,6 +1213,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	p->memcg_batch.do_batch = 0;
 	p->memcg_batch.memcg = NULL;
 #endif
+	p->task_active_count = 0;
+	p->task_active_boosted = 0;
 
 	/* Perform scheduler related setup. Assign this task to a CPU. */
 	sched_fork(p);
diff --git a/kernel/sched.c b/kernel/sched.c
index ec5f472..a134129 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -852,6 +852,96 @@ static inline u64 global_rt_runtime(void)
 # define finish_arch_switch(prev)	do { } while (0)
 #endif
 
+/* XXX This should be per-cpu or soemthing that scales */
+static int global_task_active_count = 0;
+static DEFINE_SPINLOCK(global_task_active_lock);
+static struct wakeup_source *wakelock;
+
+static int __init wakelock_init(void)
+{
+       wakelock = wakeup_source_register("wakelock");
+       return 0;
+}
+core_initcall(wakelock_init);
+
+
+static void __sched_inc_global_active_count(int count)
+{
+	if (!global_task_active_count && count)
+		__pm_stay_awake(wakelock);
+	global_task_active_count += count;
+}
+
+static void __sched_dec_global_active_count(int count)
+{
+	WARN_ON(count > global_task_active_count);
+	global_task_active_count -= count;
+	if (!global_task_active_count && count)
+		__pm_relax(wakelock);
+}
+
+void sched_inc_active_count(void)
+{
+	unsigned long flags;
+	spin_lock_irqsave(&global_task_active_lock, flags);
+
+	current->task_active_boosted = 1;
+	current->task_active_count++;
+	__sched_inc_global_active_count(1);
+
+	spin_unlock_irqrestore(&global_task_active_lock, flags);
+}
+
+void sched_dec_active_count(void)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&global_task_active_lock, flags);
+
+	WARN_ON(current->task_active_count == 0);
+
+	current->task_active_count--;
+	if (current->task_active_count == 0)
+		current->task_active_boosted = 0;
+	__sched_dec_global_active_count(1);
+
+	spin_unlock_irqrestore(&global_task_active_lock, flags);
+
+}
+
+void sched_deboost_task_active_count(struct task_struct *p)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&global_task_active_lock, flags);
+
+	if (p->task_active_boosted)
+		__sched_dec_global_active_count(p->task_active_count);
+	p->task_active_boosted = 0;
+
+	spin_unlock_irqrestore(&global_task_active_lock, flags);
+
+}
+
+void sched_boost_task_active_count(struct task_struct *p)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&global_task_active_lock, flags);
+	if (!p->task_active_boosted)
+		__sched_inc_global_active_count(p->task_active_count);
+	if (p->task_active_count)
+		p->task_active_boosted = 1;
+	spin_unlock_irqrestore(&global_task_active_lock, flags);
+}
+
+static inline int is_task_active(struct task_struct *p)
+{
+	return !!p->task_active_count;
+}
+
+
+
 static inline int task_current(struct rq *rq, struct task_struct *p)
 {
 	return rq->curr == p;
@@ -2727,6 +2817,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	p->sched_contributes_to_load = !!task_contributes_to_load(p);
 	p->state = TASK_WAKING;
 
+	sched_boost_task_active_count(p);
+
 	if (p->sched_class->task_waking)
 		p->sched_class->task_waking(p);
 
@@ -5113,6 +5205,7 @@ static int __sched_setscheduler(struct task_struct *p, int policy,
 	const struct sched_class *prev_class;
 	struct rq *rq;
 	int reset_on_fork;
+	int stayawake=0;
 
 	/* may grab non-irq protected spin_locks */
 	BUG_ON(in_interrupt());
@@ -5125,6 +5218,9 @@ recheck:
 		reset_on_fork = !!(policy & SCHED_RESET_ON_FORK);
 		policy &= ~SCHED_RESET_ON_FORK;
 
+		stayawake = !!(policy & SCHED_STAYAWAKE);
+		policy &= ~SCHED_STAYAWAKE;
+
 		if (policy != SCHED_FIFO && policy != SCHED_RR &&
 				policy != SCHED_NORMAL && policy != SCHED_BATCH &&
 				policy != SCHED_IDLE)
@@ -5202,6 +5298,11 @@ recheck:
 		return -EINVAL;
 	}
 
+	if (stayawake && !is_task_active(p))
+		sched_inc_active_count();
+	else if (!stayawake && is_task_active(p))
+		sched_dec_active_count();
+
 	/*
 	 * If not changing anything there's no need to proceed further:
 	 */
-- 
1.7.3.2.146.gca209


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/6] [RFC] rtc: rtc-cmos: Add pm_stay_awake/pm_relax calls around IRQ
  2011-09-26 19:13 [PATCH 0/6] [RFC] Proposal for optimistic suspend idea John Stultz
  2011-09-26 19:13 ` [PATCH 1/6] [RFC] suspend: Block suspend when wakeups are in-progress John Stultz
  2011-09-26 19:13 ` [PATCH 2/6] [RFC] sched: Add support for SCHED_STAYAWAKE flag John Stultz
@ 2011-09-26 19:13 ` John Stultz
  2011-10-01 21:31   ` NeilBrown
  2011-09-26 19:13 ` [PATCH 4/6] [RFC] rtc: interface: Add pm_stay_awake/pm_relax chaining rtc workqueue processing John Stultz
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: John Stultz @ 2011-09-26 19:13 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, peterz

Flag the rtc-cmos IRQ event as a wakeup event using pm_stay_awake
and pm_relax()

CC: Rafael J. Wysocki <rjw@sisk.pl>
CC: arve@android.com
CC: markgross@thegnar.org
CC: Alan Stern <stern@rowland.harvard.edu>
CC: amit.kucheria@linaro.org
CC: farrowg@sg.ibm.com
CC: Dmitry Fink (Palm GBU) <Dmitry.Fink@palm.com>
CC: linux-pm@lists.linux-foundation.org
CC: khilman@ti.com
CC: Magnus Damm <damm@opensource.se>
CC: mjg@redhat.com 
CC: peterz@infradead.org 
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/rtc/rtc-cmos.c |   13 +++++++++++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index 05beb6c..1cc4688 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -538,11 +538,15 @@ static struct bin_attribute nvram = {
 
 static struct cmos_rtc	cmos_rtc;
 
+static struct wakeup_source *rtc_cmos_wakelock;
+
 static irqreturn_t cmos_interrupt(int irq, void *p)
 {
 	u8		irqstat;
 	u8		rtc_control;
 
+	__pm_stay_awake(rtc_cmos_wakelock);
+
 	spin_lock(&rtc_lock);
 
 	/* When the HPET interrupt handler calls us, the interrupt
@@ -573,9 +577,12 @@ static irqreturn_t cmos_interrupt(int irq, void *p)
 
 	if (is_intr(irqstat)) {
 		rtc_update_irq(p, 1, irqstat);
+		__pm_relax(rtc_cmos_wakelock);
 		return IRQ_HANDLED;
-	} else
-		return IRQ_NONE;
+	}
+
+	__pm_relax(rtc_cmos_wakelock);
+	return IRQ_NONE;
 }
 
 #ifdef	CONFIG_PNP
@@ -1153,6 +1160,8 @@ static int __init cmos_init(void)
 {
 	int retval = 0;
 
+	rtc_cmos_wakelock = wakeup_source_register("rtc_cmos");
+
 #ifdef	CONFIG_PNP
 	retval = pnp_register_driver(&cmos_pnp_driver);
 	if (retval == 0)
-- 
1.7.3.2.146.gca209


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 4/6] [RFC] rtc: interface: Add pm_stay_awake/pm_relax chaining rtc workqueue processing
  2011-09-26 19:13 [PATCH 0/6] [RFC] Proposal for optimistic suspend idea John Stultz
                   ` (2 preceding siblings ...)
  2011-09-26 19:13 ` [PATCH 3/6] [RFC] rtc: rtc-cmos: Add pm_stay_awake/pm_relax calls around IRQ John Stultz
@ 2011-09-26 19:13 ` John Stultz
  2011-09-26 19:13 ` [PATCH 5/6] [RFC] alarmtimer: Add pm_stay_awake /pm_relax calls John Stultz
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: John Stultz @ 2011-09-26 19:13 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, peterz

To ensure suspend events don't trigger between the irq and the
rtc workqueue function running, use pm_stay_awake/pm_relax.

CC: Rafael J. Wysocki <rjw@sisk.pl>
CC: arve@android.com
CC: markgross@thegnar.org
CC: Alan Stern <stern@rowland.harvard.edu>
CC: amit.kucheria@linaro.org
CC: farrowg@sg.ibm.com
CC: Dmitry Fink (Palm GBU) <Dmitry.Fink@palm.com>
CC: linux-pm@lists.linux-foundation.org
CC: khilman@ti.com
CC: Magnus Damm <damm@opensource.se>
CC: mjg@redhat.com 
CC: peterz@infradead.org 
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/rtc/interface.c |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c
index 44e91e5..ba1c437 100644
--- a/drivers/rtc/interface.c
+++ b/drivers/rtc/interface.c
@@ -19,6 +19,15 @@
 static int rtc_timer_enqueue(struct rtc_device *rtc, struct rtc_timer *timer);
 static void rtc_timer_remove(struct rtc_device *rtc, struct rtc_timer *timer);
 
+static struct wakeup_source *rtc_interface_wakelock;
+static int __init rtc_interface_wakelock_init(void)
+{
+       rtc_interface_wakelock = wakeup_source_register("rtc_interface");
+       return 0;
+}
+core_initcall(rtc_interface_wakelock_init);
+
+
 static int __rtc_read_time(struct rtc_device *rtc, struct rtc_time *tm)
 {
 	int err;
@@ -563,6 +572,7 @@ enum hrtimer_restart rtc_pie_update_irq(struct hrtimer *timer)
 void rtc_update_irq(struct rtc_device *rtc,
 		unsigned long num, unsigned long events)
 {
+	__pm_stay_awake(rtc_interface_wakelock);
 	schedule_work(&rtc->irqwork);
 }
 EXPORT_SYMBOL_GPL(rtc_update_irq);
@@ -751,9 +761,10 @@ static int rtc_timer_enqueue(struct rtc_device *rtc, struct rtc_timer *timer)
 		alarm.time = rtc_ktime_to_tm(timer->node.expires);
 		alarm.enabled = 1;
 		err = __rtc_set_alarm(rtc, &alarm);
-		if (err == -ETIME)
+		if (err == -ETIME) {
+			__pm_stay_awake(rtc_interface_wakelock);
 			schedule_work(&rtc->irqwork);
-		else if (err) {
+		} else if (err) {
 			timerqueue_del(&rtc->timerqueue, &timer->node);
 			timer->enabled = 0;
 			return err;
@@ -849,6 +860,7 @@ again:
 	}
 
 	mutex_unlock(&rtc->ops_lock);
+	__pm_relax(rtc_interface_wakelock);
 }
 
 
-- 
1.7.3.2.146.gca209


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 5/6] [RFC] alarmtimer: Add pm_stay_awake /pm_relax calls
  2011-09-26 19:13 [PATCH 0/6] [RFC] Proposal for optimistic suspend idea John Stultz
                   ` (3 preceding siblings ...)
  2011-09-26 19:13 ` [PATCH 4/6] [RFC] rtc: interface: Add pm_stay_awake/pm_relax chaining rtc workqueue processing John Stultz
@ 2011-09-26 19:13 ` John Stultz
  2011-09-26 19:13 ` [PATCH 6/6] [RFC] alarmtimer: Deboost on nanosleep John Stultz
  2011-09-26 20:16 ` [PATCH 0/6] [RFC] Proposal for optimistic suspend idea Peter Zijlstra
  6 siblings, 0 replies; 28+ messages in thread
From: John Stultz @ 2011-09-26 19:13 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, peterz

This provides wakelock like chaining to  protects the
RTC wakeup path to the hrtimer firing of the alarmtimer.

CC: Rafael J. Wysocki <rjw@sisk.pl>
CC: arve@android.com
CC: markgross@thegnar.org
CC: Alan Stern <stern@rowland.harvard.edu>
CC: amit.kucheria@linaro.org
CC: farrowg@sg.ibm.com
CC: Dmitry Fink (Palm GBU) <Dmitry.Fink@palm.com>
CC: linux-pm@lists.linux-foundation.org
CC: khilman@ti.com
CC: Magnus Damm <damm@opensource.se>
CC: mjg@redhat.com 
CC: peterz@infradead.org 
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/alarmtimer.c |   44 ++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index ea5e1a9..00ee80f 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -46,12 +46,17 @@ static struct alarm_base {
 static ktime_t freezer_delta;
 static DEFINE_SPINLOCK(freezer_delta_lock);
 
+
 #ifdef CONFIG_RTC_CLASS
 /* rtc timer and device for setting alarm wakeups at suspend */
 static struct rtc_timer		rtctimer;
 static struct rtc_device	*rtcdev;
 static DEFINE_SPINLOCK(rtcdev_lock);
 
+static struct wakeup_source *alarmtimer_wakelock;
+static int stay_awake;
+static DEFINE_SPINLOCK(stay_awake_lock);
+
 /**
  * has_wakealarm - check rtc device has wakealarm ability
  * @dev: current device
@@ -73,6 +78,19 @@ static int has_wakealarm(struct device *dev, void *name_ptr)
 	return 1;
 }
 
+/* rtctimer function called by first rtc interrupt after suspend */
+void alarmtimer_resume_call(void* p)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&stay_awake_lock, flags);
+	if (stay_awake == 1) {
+		stay_awake = 2;
+		__pm_stay_awake(alarmtimer_wakelock);
+	}
+	spin_unlock_irqrestore(&stay_awake_lock, flags);
+}
+
 /**
  * alarmtimer_get_rtcdev - Return selected rtcdevice
  *
@@ -99,7 +117,7 @@ static struct rtc_device *alarmtimer_get_rtcdev(void)
 			 * rtc_open takes its own.
 			 */
 			put_device(dev);
-			rtc_timer_init(&rtctimer, NULL, NULL);
+			rtc_timer_init(&rtctimer, alarmtimer_resume_call, NULL);
 		}
 	}
 	ret = rtcdev;
@@ -158,6 +176,8 @@ static void alarmtimer_remove(struct alarm_base *base, struct alarm *alarm)
 }
 
 
+
+
 /**
  * alarmtimer_fired - Handles alarm hrtimer being fired.
  * @timer: pointer to hrtimer being run
@@ -175,6 +195,8 @@ static enum hrtimer_restart alarmtimer_fired(struct hrtimer *timer)
 	ktime_t now;
 	int ret = HRTIMER_NORESTART;
 
+
+	__pm_stay_awake(alarmtimer_wakelock);
 	spin_lock_irqsave(&base->lock, flags);
 	now = base->gettime();
 	while ((next = timerqueue_getnext(&base->timerqueue))) {
@@ -206,6 +228,13 @@ static enum hrtimer_restart alarmtimer_fired(struct hrtimer *timer)
 	}
 	spin_unlock_irqrestore(&base->lock, flags);
 
+	spin_lock_irqsave(&stay_awake_lock, flags);
+	if (stay_awake == 2)
+		__pm_relax(alarmtimer_wakelock);
+	stay_awake = 0;
+	spin_unlock_irqrestore(&stay_awake_lock, flags);
+	__pm_relax(alarmtimer_wakelock);
+
 	return ret;
 
 }
@@ -225,12 +254,14 @@ static int alarmtimer_suspend(struct device *dev)
 {
 	struct rtc_time tm;
 	ktime_t min, now;
+	int min_base; 
 	unsigned long flags;
 	struct rtc_device *rtc;
 	int i;
 
 	spin_lock_irqsave(&freezer_delta_lock, flags);
 	min = freezer_delta;
+	min_base = 0;
 	freezer_delta = ktime_set(0, 0);
 	spin_unlock_irqrestore(&freezer_delta_lock, flags);
 
@@ -251,8 +282,10 @@ static int alarmtimer_suspend(struct device *dev)
 		if (!next)
 			continue;
 		delta = ktime_sub(next->expires, base->gettime());
-		if (!min.tv64 || (delta.tv64 < min.tv64))
+		if (!min.tv64 || (delta.tv64 < min.tv64)) {
 			min = delta;
+			min_base = i;
+		}
 	}
 	if (min.tv64 == 0)
 		return 0;
@@ -266,6 +299,10 @@ static int alarmtimer_suspend(struct device *dev)
 	now = rtc_tm_to_ktime(tm);
 	now = ktime_add(now, min);
 
+	spin_lock_irqsave(&stay_awake_lock, flags);
+	stay_awake=1;
+	spin_unlock_irqrestore(&stay_awake_lock, flags);
+
 	rtc_timer_start(rtc, &rtctimer, now, ktime_set(0, 0));
 
 	return 0;
@@ -703,6 +740,9 @@ static int __init alarmtimer_init(void)
 		.nsleep		= alarm_timer_nsleep,
 	};
 
+	alarmtimer_wakelock = wakeup_source_register("alarmtimer");
+	stay_awake=0;
+
 	posix_timers_register_clock(CLOCK_REALTIME_ALARM, &alarm_clock);
 	posix_timers_register_clock(CLOCK_BOOTTIME_ALARM, &alarm_clock);
 
-- 
1.7.3.2.146.gca209


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 6/6] [RFC] alarmtimer: Deboost on nanosleep
  2011-09-26 19:13 [PATCH 0/6] [RFC] Proposal for optimistic suspend idea John Stultz
                   ` (4 preceding siblings ...)
  2011-09-26 19:13 ` [PATCH 5/6] [RFC] alarmtimer: Add pm_stay_awake /pm_relax calls John Stultz
@ 2011-09-26 19:13 ` John Stultz
  2011-09-26 20:16 ` [PATCH 0/6] [RFC] Proposal for optimistic suspend idea Peter Zijlstra
  6 siblings, 0 replies; 28+ messages in thread
From: John Stultz @ 2011-09-26 19:13 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, peterz

Example of deboosting tasks before blocking on a wakeup
source like alarmtimer based nanosleep.

With this final patch, the kernel will be able to suspend
when SCHED_STAYAWAKE flagged tasks are blocked on the alarmtimer.

CC: Rafael J. Wysocki <rjw@sisk.pl>
CC: arve@android.com
CC: markgross@thegnar.org
CC: Alan Stern <stern@rowland.harvard.edu>
CC: amit.kucheria@linaro.org
CC: farrowg@sg.ibm.com
CC: Dmitry Fink (Palm GBU) <Dmitry.Fink@palm.com>
CC: linux-pm@lists.linux-foundation.org
CC: khilman@ti.com
CC: Magnus Damm <damm@opensource.se>
CC: mjg@redhat.com 
CC: peterz@infradead.org 
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/alarmtimer.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 00ee80f..c855d40 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -568,9 +568,10 @@ static int alarmtimer_do_nsleep(struct alarm *alarm, ktime_t absexp)
 	do {
 		set_current_state(TASK_INTERRUPTIBLE);
 		alarm_start(alarm, absexp, ktime_set(0, 0));
-		if (likely(alarm->data))
+		if (likely(alarm->data)) {
+			sched_deboost_task_active_count(current);
 			schedule();
-
+		}
 		alarm_cancel(alarm);
 	} while (alarm->data && !signal_pending(current));
 
-- 
1.7.3.2.146.gca209


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-26 19:13 [PATCH 0/6] [RFC] Proposal for optimistic suspend idea John Stultz
                   ` (5 preceding siblings ...)
  2011-09-26 19:13 ` [PATCH 6/6] [RFC] alarmtimer: Deboost on nanosleep John Stultz
@ 2011-09-26 20:16 ` Peter Zijlstra
  2011-09-26 22:27   ` John Stultz
  6 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2011-09-26 20:16 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, Thomas Gleixner

On Mon, 2011-09-26 at 12:13 -0700, John Stultz wrote:
> 
> For now, I'd just be interested in what folks think about the concept with
> regards to the wakelock discussions. Where it might not be sufficient? Or
> what other disadvantages might it have? Are there any varients to this
> idea that would be better? 

I would like to know why people still think wakelocks are remotely sane?

>From where I'm sitting they're utter crap.. _WHY_ do you need to suspend
anything? What's wrong with regular idle?

So no, you've got a massive major NAK for anything touching the
scheduler for this utter braindamage.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-26 20:16 ` [PATCH 0/6] [RFC] Proposal for optimistic suspend idea Peter Zijlstra
@ 2011-09-26 22:27   ` John Stultz
  2011-09-27 10:37     ` Peter Zijlstra
  2011-09-28  0:09     ` Thomas Gleixner
  0 siblings, 2 replies; 28+ messages in thread
From: John Stultz @ 2011-09-26 22:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: lkml, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, Thomas Gleixner

On Mon, 2011-09-26 at 22:16 +0200, Peter Zijlstra wrote:
> On Mon, 2011-09-26 at 12:13 -0700, John Stultz wrote:
> > 
> > For now, I'd just be interested in what folks think about the concept with
> > regards to the wakelock discussions. Where it might not be sufficient? Or
> > what other disadvantages might it have? Are there any varients to this
> > idea that would be better? 
> 
> I would like to know why people still think wakelocks are remotely sane?
> 
> From where I'm sitting they're utter crap.. _WHY_ do you need to suspend
> anything? What's wrong with regular idle?

Well. Regular idle still takes up more power with my desktop then I
could save with suspend.

My personal use case: I do nightly backups with rdiff-backup. I'd like
to schedule those  backup using an alarm-timer, so I could suspend my
system when I'm not using it. So far, so good, that all works. 

However, if my system tries to suspend itself after 15 minutes of X
input idle, and my backup at 2am takes more then 15 minutes, then the
backup gets interrupted. Because rdiff-backup is more of a transactional
style backup, it then has to roll back any incomplete changes and try
again the next night, which will surely take more then 15 minutes, etc.

I could try to inhibit suspend by making requests to my desktop
environment, so the desktop-specific power management daemon won't
trigger suspend. But given recent behavior, I don't trust that not to
break when I upgrade my system, or if I get frustrated with one desktop
environment, that I won't have to use a different api for whatever other
environment I pick next.

Another use case I've heard about are systems that have firmware updates
that are remotely triggered. Should the system go into suspend while the
firmware update is going on, you end up with a brick.

Having to have multiple distro/release specific quirks to get the
power-management-daemon to inhibit suspend is annoying enough, but then
you also have to deal with custom changes by administrator, or remote
power management systems like power nap, which might also echo "mem"
into /sys/power/state when you're not expecting it. A kernel method to
really block suspend would be nice. While this doesn't necessarily need
to be conflated with wakelock style suspend, there is some need to allow
userland to block suspend at the kernel level, and once you have that, I
can't imagine folks not trying to stretch that into something like
wakelocks. So you might as well at least try to design it reasonably
well to start.

And again, this doesn't have to be suspend specific. As I mentioned, one
way of reducing power drain by increasing timer slack, or just not
scheduling processes for some chunk of time. However, there really isn't
any good scheduler primitives that allow userspace to communicate when
that is ok or not. 

I personally think there is a growing need for a more power-aware
scheduling class.  In talking with others, I've said I sometimes think
of my proposal as a form of "opportunistic scheduling", where the system
is only going to spend power to allow specific tasks to run. Since those
important tasks will do things that block for short amounts of time
(disk io, etc), less-important tasks can opportunistically use the idle
cycles of the active task. But when the active tasks are finished, we
stop scheduling anyone else. There are some folks looking at trying to
use cgroups for this sort of prioritizing, but that has issues with
priority inversion style issues when sharing resources across cgroups.

> So no, you've got a massive major NAK for anything touching the
> scheduler for this utter braindamage.

Fair enough, I didn't really expect a warm welcome from you on this. :)
So I'll take my lumps.

But while I understand you see this as crap, I'd be interested if you
think the approach is maybe an improvement or not over prior attempts?
While I'm not picky about the specific API being sched_setscheduler, I
see a conceptual benefit to this approach, as it provides information to
the scheduler that would allow the scheduler to make other informed
decisions.

Where as other attempts which really didn't involve the scheduler at
all, and just suspended based only on if there were any active critical
sections. Causing some to charge that it created a second-scheduler.

For my proposal, there could also be other cases that might parallel the
priority inheritance code, where a "important" task A is blocked waiting
on some resource held by a non-important task B which is blocked on a
device that is backed by a wakeup source. In that case, you could maybe
pass the "importance" from task A to task B, then allowing B to be
deboosted while blocked on the wakeup source, and allow suspend to
safely occur. Granted, this gets pretty complex, and isn't really
necessary, but I can imagine interested folks could hole up in academia
for awhile on similar approaches.

So with these sorts of parallels, it seems this sort of thing should be
connected in the scheduler in some way, no?

And again, I'm not sending these patches to push for inclusion, but only
just to communicate the idea and to get folks discussing the merits and
faults of this implementation compared with wakelocks or other
alternatives they might have. 

thanks
-john

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-26 22:27   ` John Stultz
@ 2011-09-27 10:37     ` Peter Zijlstra
  2011-09-27 22:56       ` John Stultz
  2011-09-28  0:09     ` Thomas Gleixner
  1 sibling, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2011-09-27 10:37 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, Thomas Gleixner

On Mon, 2011-09-26 at 15:27 -0700, John Stultz wrote:
> On Mon, 2011-09-26 at 22:16 +0200, Peter Zijlstra wrote:
> > On Mon, 2011-09-26 at 12:13 -0700, John Stultz wrote:
> > > 
> > > For now, I'd just be interested in what folks think about the concept with
> > > regards to the wakelock discussions. Where it might not be sufficient? Or
> > > what other disadvantages might it have? Are there any varients to this
> > > idea that would be better? 
> > 
> > I would like to know why people still think wakelocks are remotely sane?
> > 
> > From where I'm sitting they're utter crap.. _WHY_ do you need to suspend
> > anything? What's wrong with regular idle?
> 
> Well. Regular idle still takes up more power with my desktop then I
> could save with suspend.

Blame Intel ;-) Personally I loathe suspend because it kills all my
network links.

> My personal use case: I do nightly backups with rdiff-backup. I'd like
> to schedule those  backup using an alarm-timer, so I could suspend my
> system when I'm not using it. So far, so good, that all works. 
> 
> However, if my system tries to suspend itself after 15 minutes of X
> input idle, and my backup at 2am takes more then 15 minutes, then the
> backup gets interrupted. Because rdiff-backup is more of a transactional
> style backup, it then has to roll back any incomplete changes and try
> again the next night, which will surely take more then 15 minutes, etc.

So your fail is to tie suspend to the input inactivity instead of the
completion of your backup thingy.

> I could try to inhibit suspend by making requests to my desktop
> environment, so the desktop-specific power management daemon won't
> trigger suspend. But given recent behavior, I don't trust that not to
> break when I upgrade my system, or if I get frustrated with one desktop
> environment, that I won't have to use a different api for whatever other
> environment I pick next.

Kick the friggin Desktop folks already for messing up. I mean, because
userspace is incompetent this needs to go in the kernel? Ere long we'll
have a kernel based GUI if we go that route.

> Another use case I've heard about are systems that have firmware updates
> that are remotely triggered. Should the system go into suspend while the
> firmware update is going on, you end up with a brick.

Bricks are good for throwing at those Desktop folks ;-)

> Having to have multiple distro/release specific quirks to get the
> power-management-daemon to inhibit suspend is annoying enough, but then
> you also have to deal with custom changes by administrator, or remote
> power management systems like power nap, which might also echo "mem"
> into /sys/power/state when you're not expecting it. A kernel method to
> really block suspend would be nice. While this doesn't necessarily need
> to be conflated with wakelock style suspend, there is some need to allow
> userland to block suspend at the kernel level, and once you have that, I
> can't imagine folks not trying to stretch that into something like
> wakelocks. So you might as well at least try to design it reasonably
> well to start.

How about you create a daemon tasked with managing /sys/power/state and
change /sys/power/state such that it can be opened only once, then that
daemon can keep the fd open and everything else trying to poke at it
will get a fail.

Then fix up the fallout.

> And again, this doesn't have to be suspend specific. As I mentioned, one
> way of reducing power drain by increasing timer slack, or just not
> scheduling processes for some chunk of time. However, there really isn't
> any good scheduler primitives that allow userspace to communicate when
> that is ok or not. 

I'm probably stupid, but what?! Why would the scheduler want to care
about this nonsense?

What you should do (and what Android should have done) is change the
runtime so you mandate power aware apps, and anything violating the
runtime gets killed.

For Desktop apps this probably involves D-Bus or whatnot, where the
system tells the apps what state it is in. Apps should then respect this
state.

For instance anybody trying to draw to an X surface after they've been
told the screen is off should get kicked. (And before people go whinge
about d-bus having to wake all tasks to get the msgs across, which
wastes power; if you fix the runtime up far enough the attempt of
drawing could return this information.)

I'm not quite sure how timer-slack comes into this, because every app
receiving random wakeups (no matter what slack) after its been told it
should quiesce is a fail, with the exception of the wakeup for telling
it its good to go again (but that comes _after_ the system policy
change, so its fine).

> I personally think there is a growing need for a more power-aware
> scheduling class.  In talking with others, I've said I sometimes think
> of my proposal as a form of "opportunistic scheduling", where the system
> is only going to spend power to allow specific tasks to run. Since those
> important tasks will do things that block for short amounts of time
> (disk io, etc), less-important tasks can opportunistically use the idle
> cycles of the active task. But when the active tasks are finished, we
> stop scheduling anyone else. There are some folks looking at trying to
> use cgroups for this sort of prioritizing, but that has issues with
> priority inversion style issues when sharing resources across cgroups.

That's just insane.. why bother running anything but the 'important'
tasks. Idle is more power aware than running random crap tasks that have
no business running in the first place.

IOW you should stop tasks from being runnable in the first place, once
you're in a situation where you've got random runnable processes you've
failed.

Nothing the scheduler can do about that.

Also, this is a fucked up definition of power-aware scheduling. Normally
power-aware scheduling is about optimizing throughput/watt, and that's a
hard enough problem. No reason to conflate the issue with shitty
userspace that doesn't know what the fuck its doing.

> > So no, you've got a massive major NAK for anything touching the
> > scheduler for this utter braindamage.
> 
> Fair enough, I didn't really expect a warm welcome from you on this. :)
> So I'll take my lumps.
> 
> But while I understand you see this as crap, I'd be interested if you
> think the approach is maybe an improvement or not over prior attempts?

No its still wakelocks, its still trying to force a shitty bunch of
userspace that doesn't know shit into half-way behaving.

And from experience (having an Android phone) it simply doesn't work
worth shit.. there's plenty apps out there that suck battery like
nobodies business, so clearly all the wakelock crap in the world doesn't
help one whit.

So stop fucking about and start fixing the runtime.

> While I'm not picky about the specific API being sched_setscheduler, I
> see a conceptual benefit to this approach, as it provides information to
> the scheduler that would allow the scheduler to make other informed
> decisions.

Where I'm sitting, the moment you need to scheduler to interfere you've
already failed. Tasks that you don't want to run shouldn't be runnable,
full stop.

> Where as other attempts which really didn't involve the scheduler at
> all, and just suspended based only on if there were any active critical
> sections. Causing some to charge that it created a second-scheduler.

That only because they're shit, see above.

> For my proposal, there could also be other cases that might parallel the
> priority inheritance code, where a "important" task A is blocked waiting
> on some resource held by a non-important task B which is blocked on a
> device that is backed by a wakeup source. In that case, you could maybe
> pass the "importance" from task A to task B, then allowing B to be
> deboosted while blocked on the wakeup source, and allow suspend to
> safely occur. Granted, this gets pretty complex, and isn't really
> necessary, but I can imagine interested folks could hole up in academia
> for awhile on similar approaches.
> 
> So with these sorts of parallels, it seems this sort of thing should be
> connected in the scheduler in some way, no?

No, clearly B was runnable to begin with, someone forced its ass to
sleep, they fail. Never allow a task to go into indefinite sleep while
holding a resource.

Same for the kernel, we don't allow a return to userspace with a kernel
lock held, or a call into the freezer while holding a lock. Why would
you want to allow this for userspace.

> And again, I'm not sending these patches to push for inclusion, but only
> just to communicate the idea and to get folks discussing the merits and
> faults of this implementation compared with wakelocks or other
> alternatives they might have. 

I think I've made my position clear.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-27 10:37     ` Peter Zijlstra
@ 2011-09-27 22:56       ` John Stultz
  2011-09-28  7:51         ` Peter Zijlstra
                           ` (6 more replies)
  0 siblings, 7 replies; 28+ messages in thread
From: John Stultz @ 2011-09-27 22:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: lkml, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, Thomas Gleixner

On Tue, 2011-09-27 at 12:37 +0200, Peter Zijlstra wrote:
> On Mon, 2011-09-26 at 15:27 -0700, John Stultz wrote:
> > On Mon, 2011-09-26 at 22:16 +0200, Peter Zijlstra wrote:
> > > On Mon, 2011-09-26 at 12:13 -0700, John Stultz wrote:
> > > > 
> > > > For now, I'd just be interested in what folks think about the concept with
> > > > regards to the wakelock discussions. Where it might not be sufficient? Or
> > > > what other disadvantages might it have? Are there any varients to this
> > > > idea that would be better? 
> > > 
> > > I would like to know why people still think wakelocks are remotely sane?
> > > 
> > > From where I'm sitting they're utter crap.. _WHY_ do you need to suspend
> > > anything? What's wrong with regular idle?
> > 
> > Well. Regular idle still takes up more power with my desktop then I
> > could save with suspend.
> 
> Blame Intel ;-) Personally I loathe suspend because it kills all my
> network links.
> 
> > My personal use case: I do nightly backups with rdiff-backup. I'd like
> > to schedule those  backup using an alarm-timer, so I could suspend my
> > system when I'm not using it. So far, so good, that all works. 
> > 
> > However, if my system tries to suspend itself after 15 minutes of X
> > input idle, and my backup at 2am takes more then 15 minutes, then the
> > backup gets interrupted. Because rdiff-backup is more of a transactional
> > style backup, it then has to roll back any incomplete changes and try
> > again the next night, which will surely take more then 15 minutes, etc.
> 
> So your fail is to tie suspend to the input inactivity instead of the
> completion of your backup thingy.

Well, its both. If the backup runs very long, and I'm using the machine
in the morning, I don't want the end of my backup to suspend the system.

> > I could try to inhibit suspend by making requests to my desktop
> > environment, so the desktop-specific power management daemon won't
> > trigger suspend. But given recent behavior, I don't trust that not to
> > break when I upgrade my system, or if I get frustrated with one desktop
> > environment, that I won't have to use a different api for whatever other
> > environment I pick next.
> 
> Kick the friggin Desktop folks already for messing up. I mean, because
> userspace is incompetent this needs to go in the kernel? Ere long we'll
> have a kernel based GUI if we go that route.

Well, to be fair to the desktop guys, they have been working to try to
provide a DBUS api to handle this.

But even with a proper DBUS api, there's still the race when I walk away
from my computer 15 minutes before the backup starts. 

In that case, my backup application's alarm timer fires and schedules
the backup, but then before the backup application runs and sends its
DBUS message to block suspend, the suspend occurs. And yea, that's
probably not a problem for my use, but it limits any similar power
savings from an environment where reliability might actually matter. 

Read that last bit again, as it has seemingly been missed over and over
in these discussions. This ability to make sure wake up events are
consumed by userland before suspending again is key.

> > Another use case I've heard about are systems that have firmware updates
> > that are remotely triggered. Should the system go into suspend while the
> > firmware update is going on, you end up with a brick.
[snip]
> > Having to have multiple distro/release specific quirks to get the
> > power-management-daemon to inhibit suspend is annoying enough, but then
> > you also have to deal with custom changes by administrator, or remote
> > power management systems like power nap, which might also echo "mem"
> > into /sys/power/state when you're not expecting it. A kernel method to
> > really block suspend would be nice. While this doesn't necessarily need
> > to be conflated with wakelock style suspend, there is some need to allow
> > userland to block suspend at the kernel level, and once you have that, I
> > can't imagine folks not trying to stretch that into something like
> > wakelocks. So you might as well at least try to design it reasonably
> > well to start.
> 
> How about you create a daemon tasked with managing /sys/power/state and
> change /sys/power/state such that it can be opened only once, then that
> daemon can keep the fd open and everything else trying to poke at it
> will get a fail.

That's actually pretty interesting. It doesn't handle the race issues
between wakeup event and event consumption by userland, but not a bad
tool to have in the toolbox as we look at other approaches.

> > And again, this doesn't have to be suspend specific. As I mentioned, one
> > way of reducing power drain by increasing timer slack, or just not
> > scheduling processes for some chunk of time. However, there really isn't
> > any good scheduler primitives that allow userspace to communicate when
> > that is ok or not. 
> 
> I'm probably stupid, but what?! Why would the scheduler want to care
> about this nonsense?

For the same reason it cares about SCHED_FIFO.

> What you should do (and what Android should have done) is change the
> runtime so you mandate power aware apps, and anything violating the
> runtime gets killed.
> 
> For Desktop apps this probably involves D-Bus or whatnot, where the
> system tells the apps what state it is in. Apps should then respect this
> state.

Sure, and real-time tasks should coordinate with all of userland to just
make sure no one gets in the way! And those applications should respect
that! Why is all that *useless* code in the kernel!? :)

> For instance anybody trying to draw to an X surface after they've been
> told the screen is off should get kicked. (And before people go whinge
> about d-bus having to wake all tasks to get the msgs across, which
> wastes power; if you fix the runtime up far enough the attempt of
> drawing could return this information.)
> 
> I'm not quite sure how timer-slack comes into this, because every app
> receiving random wakeups (no matter what slack) after its been told it
> should quiesce is a fail, with the exception of the wakeup for telling
> it its good to go again (but that comes _after_ the system policy
> change, so its fine).

So you're suggesting we rewrite everything in the debian package
archives to use DBUS and abide by some sort of userland power policy?

I'll admit its a terrible straw-man, but this starts to sound like: "We
don't need protected memory! Just rewrite all the apps so they don't
accidentally overwrite kernel structures!"

Don't get me wrong, I do think that there are power-optimizations to be
had using more "runtime" regulated behavior. You're right, when the
screen is powered off, the clock applet shouldn't be trying to make X
calls to update itself every minute.

But I think once you get away from some theoretical system, containing
only properly behaving apps, having some tools in the kernel to enforce
a certain type of behavior (and having the kernel's ability to handle
more complex cases like importance inheritance) might be helpful.

> > I personally think there is a growing need for a more power-aware
> > scheduling class.  In talking with others, I've said I sometimes think
> > of my proposal as a form of "opportunistic scheduling", where the system
> > is only going to spend power to allow specific tasks to run. Since those
> > important tasks will do things that block for short amounts of time
> > (disk io, etc), less-important tasks can opportunistically use the idle
> > cycles of the active task. But when the active tasks are finished, we
> > stop scheduling anyone else. There are some folks looking at trying to
> > use cgroups for this sort of prioritizing, but that has issues with
> > priority inversion style issues when sharing resources across cgroups.
> 
> That's just insane.. why bother running anything but the 'important'
> tasks. Idle is more power aware than running random crap tasks that have
> no business running in the first place.

Its really not that different conceptually from aligning timers. Making
sure that when we fire, we expire as many timers as we can in one go,
and run all the tasks that need to run, so we can go back to idle for as
long as possible.

But instead of idling "until the next timer group", we split stuff we
don't care that much about (but needs to be there), and stuff we do care
about, and only schedule the hardware to fire for the events we do care
about.

> IOW you should stop tasks from being runnable in the first place, once
> you're in a situation where you've got random runnable processes you've
> failed.

Consider your desktop. Consider servers. Are really ontop of every task
and are sure its not inefficient, or doesn't have some edge case bug
where it just flips out and chews cpu (I'm looking at you flashplayer!).
The real world is filled with crap.

> Nothing the scheduler can do about that.

I disagree. Why are the inmates running the asylum? The scheduler
decides what runs when and where. We're not at the mercy of bad
applications, they're at the mercy of the scheduler.

> Also, this is a fucked up definition of power-aware scheduling. Normally
> power-aware scheduling is about optimizing throughput/watt, and that's a
> hard enough problem. No reason to conflate the issue with shitty
> userspace that doesn't know what the fuck its doing.

And I do get that its a hard enough problem, and have nothing but
respect for your work there. But I suspect its likely to get harder as
it gets more important. Once the throughput side is maxed out, being
able to further reduce the watt side of the equation is going to have
value. And being able to do that without rewriting all of userland is
compelling to folks. That is why suspend is being exploited in a number
of these cases (in more or less hackish ways, depending).

Its like using sleeping spinlocks instead of having to rewrite every
driver so that they didn't have any long held critical sections. One
works in practice and the other is better in theory.

> > But while I understand you see this as crap, I'd be interested if you
> > think the approach is maybe an improvement or not over prior attempts?
> 
> No its still wakelocks, its still trying to force a shitty bunch of
> userspace that doesn't know shit into half-way behaving.
> 
> And from experience (having an Android phone) it simply doesn't work
> worth shit.. there's plenty apps out there that suck battery like
> nobodies business, so clearly all the wakelock crap in the world doesn't
> help one whit.

One can configure a Linux system to run like crap, therefore everyone's
focus on performance didn't help one whit?   Come on, nothing is a
cure-all. But that's not really an argument against improving something
or providing necessary tools to allow for certain types of
optimizations.

And don't take me for an Android apologist. I think the poor battery
anecdotes that pervade are a big problem.  One issue I suspect with
Android's wakelocks is that many are timeout based, which keeps things
active for longer then necessary. Additionally, since all wakelock-izing
of drivers happens out of the tree, there's no sanity reviewing, so
extending those timeouts to avoid issues are quick fixes that get phones
out the door, but at the price of battery life.

I believe that one of the benefits of my proposal is that it avoids the
need for such timeouts.  But I realize at this point in the conversation
comparing apples vs oranges probably isn't productive if you're argument
is "round things suck". :)

> So stop fucking about and start fixing the runtime.
> 
> > While I'm not picky about the specific API being sched_setscheduler, I
> > see a conceptual benefit to this approach, as it provides information to
> > the scheduler that would allow the scheduler to make other informed
> > decisions.
> 
> Where I'm sitting, the moment you need to scheduler to interfere you've
> already failed. Tasks that you don't want to run shouldn't be runnable,
> full stop.

SIGSTOP the world? (and melt^H^H^H^Hfreeze with you!)

You want them to run, its just a matter of when. 

> > Where as other attempts which really didn't involve the scheduler at
> > all, and just suspended based only on if there were any active critical
> > sections. Causing some to charge that it created a second-scheduler.
> 
> That only because they're shit, see above.
> 
> > For my proposal, there could also be other cases that might parallel the
> > priority inheritance code, where a "important" task A is blocked waiting
> > on some resource held by a non-important task B which is blocked on a
> > device that is backed by a wakeup source. In that case, you could maybe
> > pass the "importance" from task A to task B, then allowing B to be
> > deboosted while blocked on the wakeup source, and allow suspend to
> > safely occur. Granted, this gets pretty complex, and isn't really
> > necessary, but I can imagine interested folks could hole up in academia
> > for awhile on similar approaches.
> > 
> > So with these sorts of parallels, it seems this sort of thing should be
> > connected in the scheduler in some way, no?
> 
> No, clearly B was runnable to begin with, someone forced its ass to
> sleep, they fail. Never allow a task to go into indefinite sleep while
> holding a resource.
>
> Same for the kernel, we don't allow a return to userspace with a kernel
> lock held, or a call into the freezer while holding a lock. Why would
> you want to allow this for userspace.

Maybe the word "resource" was too general. Consider circular buffer
semaphores used to sync producer/consumer with my earlier example above.
Tasks waiting on tasks waiting on input are not an uncommon pattern. 

Thanks again for the feedback and critique. I recognize you are very
passionate in your distaste for wakelocks and similar ideas, and I don't
mean to wind you up too much arguing with you. 

I'll happily admit that my proposal and the vision I see of how the
scheduler could function here may have real issues and might add
unnecessary complexity. 

But I also want to separate my specific solution from the problem at
large. I do think that there are issues that my proposal and wakelocks
address that the hand-wavy "just do it in userspace" rebuttals don't
deal with (again specifically: wakeup event consumption in userland
before the next suspend).

I do think my idea is neat, so we can spend/burn time debating it, but
if you have viable suggestions (and honestly, I think rewriting all of
userland isn't viable) on solving the larger problem, I'm all ears.

thanks again!
-john

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-26 22:27   ` John Stultz
  2011-09-27 10:37     ` Peter Zijlstra
@ 2011-09-28  0:09     ` Thomas Gleixner
  2011-09-28  1:19       ` John Stultz
  1 sibling, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2011-09-28  0:09 UTC (permalink / raw)
  To: John Stultz
  Cc: Peter Zijlstra, lkml, Rafael J. Wysocki, arve, markgross,
	Alan Stern, amit.kucheria, farrowg, Dmitry Fink (Palm GBU),
	linux-pm, khilman, Magnus Damm, mjg

On Mon, 26 Sep 2011, John Stultz wrote:
> Another use case I've heard about are systems that have firmware updates

Yes, I have heard about people wanting O_PONIES ...

> that are remotely triggered. Should the system go into suspend while the
> firmware update is going on, you end up with a brick.

If someone came up with a firmware update mechanism which is not
coping with unexpected interruption of any kind, then wakelocks are
not making any difference.

Please collect the resulting bricks and shove them back to those who
thought that remote firmware updates do not have to be engineered and
the resulting fallout can be blamed on the kernel.

We have proper mechanisms in place to handle such stuff, but they need
proper overall design and definitely a bit more brain usage than just
yelling "wakelock".

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-28  0:09     ` Thomas Gleixner
@ 2011-09-28  1:19       ` John Stultz
  2011-09-28  8:18         ` Thomas Gleixner
  0 siblings, 1 reply; 28+ messages in thread
From: John Stultz @ 2011-09-28  1:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, lkml, Rafael J. Wysocki, arve, markgross,
	Alan Stern, amit.kucheria, farrowg, Dmitry Fink (Palm GBU),
	linux-pm, khilman, Magnus Damm, mjg

On Wed, 2011-09-28 at 02:09 +0200, Thomas Gleixner wrote:
> On Mon, 26 Sep 2011, John Stultz wrote:
> > Another use case I've heard about are systems that have firmware updates
> 
> Yes, I have heard about people wanting O_PONIES ...

O_PONIES_WITH_HEADMOUNTED_WOODCUTTING_LASERS?

> > that are remotely triggered. Should the system go into suspend while the
> > firmware update is going on, you end up with a brick.
> 
> If someone came up with a firmware update mechanism which is not
> coping with unexpected interruption of any kind, then wakelocks are
> not making any difference.
> 
> Please collect the resulting bricks and shove them back to those who
> thought that remote firmware updates do not have to be engineered and
> the resulting fallout can be blamed on the kernel.
> 
> We have proper mechanisms in place to handle such stuff, but they need
> proper overall design and definitely a bit more brain usage than just
> yelling "wakelock".

And it would be great if some of that brain usage was spent to review
and critique what I'm actually proposing, rather then just yelling
"wakelock". :P

I apologize for being probably too verbose in my mails, but I did
originally admit that the firmware update issue is a simpler problem and
doesn't necessarily need the same solution as the races around my
nightly backups. But I do think that some thought should be put into the
different use cases that seem to desire similar things, so that an
appropriate design can be created, instead of a collection of short-term
hacks.

More brain usage, and proper design. At least with that, I think we
agree. :)

thanks
-john

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-27 22:56       ` John Stultz
@ 2011-09-28  7:51         ` Peter Zijlstra
  2011-09-28  7:57           ` Richard Cochran
  2011-09-28  8:19         ` Peter Zijlstra
                           ` (5 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2011-09-28  7:51 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, Thomas Gleixner

On Tue, 2011-09-27 at 15:56 -0700, John Stultz wrote:
> Sure, and real-time tasks should coordinate with all of userland to just
> make sure no one gets in the way! And those applications should respect
> that! Why is all that *useless* code in the kernel!? :) 

In fact they should, its impossible to develop two sets of real-time
applications and shove them on the same box without co-ordination.

SCHED_FIFO is an utter trainwreck, you're talking to the guy who gave a
conference talk on why SCHED_FIFO sucks.

The only reason we have SCHED_FIFO is because of POSIX, and its one of
those things where they fucked up worse than usual (mostly because
people often don't realize how broken it is).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-28  7:51         ` Peter Zijlstra
@ 2011-09-28  7:57           ` Richard Cochran
  2011-09-28  8:02             ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Richard Cochran @ 2011-09-28  7:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: John Stultz, lkml, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, Thomas Gleixner

On Wed, Sep 28, 2011 at 09:51:45AM +0200, Peter Zijlstra wrote:
> SCHED_FIFO is an utter trainwreck, you're talking to the guy who gave a
> conference talk on why SCHED_FIFO sucks.

That sounds interesting. Do you have a link to the paper and/or
presentation?

Thanks,
Richard


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-28  7:57           ` Richard Cochran
@ 2011-09-28  8:02             ` Peter Zijlstra
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2011-09-28  8:02 UTC (permalink / raw)
  To: Richard Cochran
  Cc: John Stultz, lkml, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, Thomas Gleixner

On Wed, 2011-09-28 at 09:57 +0200, Richard Cochran wrote:
> On Wed, Sep 28, 2011 at 09:51:45AM +0200, Peter Zijlstra wrote:
> > SCHED_FIFO is an utter trainwreck, you're talking to the guy who gave a
> > conference talk on why SCHED_FIFO sucks.
> 
> That sounds interesting. Do you have a link to the paper and/or
> presentation?

Not really, I'm also the guy who doesn't write papers and has minimal
slides (and has lost those he had).

But the gist is that SCHED_FIFO doesn't provide proper resource control:
its possible to overload the system, nor does it provide proper resource
isolation: (like already stated) its impossible to fold two properly
working RT systems onto one machine and still have them work correctly
(even when the combined utilization < 1).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-28  1:19       ` John Stultz
@ 2011-09-28  8:18         ` Thomas Gleixner
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Gleixner @ 2011-09-28  8:18 UTC (permalink / raw)
  To: John Stultz
  Cc: Peter Zijlstra, lkml, Rafael J. Wysocki, arve, markgross,
	Alan Stern, amit.kucheria, farrowg, Dmitry Fink (Palm GBU),
	linux-pm, khilman, Magnus Damm, mjg

On Tue, 27 Sep 2011, John Stultz wrote:
> On Wed, 2011-09-28 at 02:09 +0200, Thomas Gleixner wrote:
> > We have proper mechanisms in place to handle such stuff, but they need
> > proper overall design and definitely a bit more brain usage than just
> > yelling "wakelock".
> 
> And it would be great if some of that brain usage was spent to review
> and critique what I'm actually proposing, rather then just yelling
> "wakelock". :P

Working on it :)

> I apologize for being probably too verbose in my mails, but I did
> originally admit that the firmware update issue is a simpler problem and
> doesn't necessarily need the same solution as the races around my
> nightly backups. But I do think that some thought should be put into the
> different use cases that seem to desire similar things, so that an
> appropriate design can be created, instead of a collection of short-term
> hacks.

Yes, we want use cases, which can actually justify something like the
proposed.

Firmware update is _not_ one of them because it needs a proper design
to be completely failsafe and just preventing the box to suspend is
not helping that goal at all. You have to deal with broken network
connections, resets, power outage and more to make it failsafe. And
dealing with all of that covers the unintended suspend already. It
simply does not matter whether it happens or not.

And that's why I'm ranting about such arguments, as they will just
guide people into the delusion of solving hard problems like safe
firmware updates with the wrong mechanisms.

The whole wakelock discussion has been full of delusions from the very
beginning and we really need to eliminate the lunatic arguments so we
can look at the real remaining ones (if any), which might justify
them.

> More brain usage, and proper design. At least with that, I think we
> agree. :)

Right, and proper design does not exclude user space. It very much
starts there.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-27 22:56       ` John Stultz
  2011-09-28  7:51         ` Peter Zijlstra
@ 2011-09-28  8:19         ` Peter Zijlstra
  2011-09-29  3:07           ` John Stultz
  2011-09-28  8:19         ` Peter Zijlstra
                           ` (4 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2011-09-28  8:19 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, Thomas Gleixner

On Tue, 2011-09-27 at 15:56 -0700, John Stultz wrote:
> > For instance anybody trying to draw to an X surface after they've been
> > told the screen is off should get kicked. (And before people go whinge
> > about d-bus having to wake all tasks to get the msgs across, which
> > wastes power; if you fix the runtime up far enough the attempt of
> > drawing could return this information.)
> > 
> > I'm not quite sure how timer-slack comes into this, because every app
> > receiving random wakeups (no matter what slack) after its been told it
> > should quiesce is a fail, with the exception of the wakeup for telling
> > it its good to go again (but that comes _after_ the system policy
> > change, so its fine).
> 
> So you're suggesting we rewrite everything in the debian package
> archives to use DBUS and abide by some sort of userland power policy?
> 
> I'll admit its a terrible straw-man, but this starts to sound like: "We
> don't need protected memory! Just rewrite all the apps so they don't
> accidentally overwrite kernel structures!"
> 
> Don't get me wrong, I do think that there are power-optimizations to be
> had using more "runtime" regulated behavior. You're right, when the
> screen is powered off, the clock applet shouldn't be trying to make X
> calls to update itself every minute.
> 
> But I think once you get away from some theoretical system, containing
> only properly behaving apps, having some tools in the kernel to enforce
> a certain type of behavior (and having the kernel's ability to handle
> more complex cases like importance inheritance) might be helpful. 

So you want to add something that sends a SIGBUS or similar to an
application if it gets a wakeup when its not supposed to? That's what
happens with memory protection, you get killed.

And yes, when Arjan did his powertop thing people did go fix up
everything that made the top.

It really isn't that hard to fix most of these apps. But the problem
with Android is that the battery usage thing it has is utterly useless.
It mostly just tells you wireless or whatnot is sucking power, not what
app is causing that. (And of course the fact that its all fucking Java
crap, which I'll refuse to touch anyway).

This results in people (me) uninstalling everything they installed since
last it worked well and then cursing this crappy shit for wasting their
time.

I really think Android is a pile of privacy violating shit, I can't say
I actually enjoy using the phone. I'm at a point where I can call with
it (without it telling google whoem all I know and where I'm at) and
browse the interwebs and have decided to screw apps, they don't work
anyway.

The only reason I'm not back to the N900 is because the Android thing is
a lot faster at rendering web (better cpu etc..).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-27 22:56       ` John Stultz
  2011-09-28  7:51         ` Peter Zijlstra
  2011-09-28  8:19         ` Peter Zijlstra
@ 2011-09-28  8:19         ` Peter Zijlstra
  2011-09-29  3:27           ` John Stultz
  2011-09-28  8:40         ` Peter Zijlstra
                           ` (3 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2011-09-28  8:19 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, Thomas Gleixner

On Tue, 2011-09-27 at 15:56 -0700, John Stultz wrote:
> > That's just insane.. why bother running anything but the 'important'
> > tasks. Idle is more power aware than running random crap tasks that have
> > no business running in the first place.
> 
> Its really not that different conceptually from aligning timers. Making
> sure that when we fire, we expire as many timers as we can in one go,
> and run all the tasks that need to run, so we can go back to idle for as
> long as possible.
> 
> But instead of idling "until the next timer group", we split stuff we
> don't care that much about (but needs to be there), and stuff we do care
> about, and only schedule the hardware to fire for the events we do care
> about. 

But but but, my badly coded bouncing cows thing simply doesn't need to
run when we wake up to refill the sound buffers for the mp3 player while
the screen is still off!

Yet the wakelock thing will wake the system and lets us schedule
bouncing cows just fine..

I really don't get your argument. It just doesn't make any sense. What
I'm saying is, what about those apps we really don't care about, and
really don't need to be there.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-27 22:56       ` John Stultz
                           ` (2 preceding siblings ...)
  2011-09-28  8:19         ` Peter Zijlstra
@ 2011-09-28  8:40         ` Peter Zijlstra
  2011-09-28  8:59         ` Peter Zijlstra
                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2011-09-28  8:40 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, Thomas Gleixner

On Tue, 2011-09-27 at 15:56 -0700, John Stultz wrote:
> 
> > IOW you should stop tasks from being runnable in the first place, once
> > you're in a situation where you've got random runnable processes you've
> > failed.
> 
> Consider your desktop. Consider servers. Are really ontop of every task
> and are sure its not inefficient, or doesn't have some edge case bug
> where it just flips out and chews cpu (I'm looking at you flashplayer!).
> The real world is filled with crap.

Yes, this is why I loathe to update to a new distro, there's bound to
new and improved *kit-daemon crap about which does random wakeups even
though you know you're not using any of it, and when you're trying to
uninstall that junk it thinks it needs to uninstall the world :-(

On servers you mostly can, and I mostly have, wiped all that stuff.

As for the Desktop, yes it is an unforgivable pile of shit. Even without
flash, I've got 4G of memory and I can't read email and browse the web
at the same time. Something is broken.

About Flash, just say no, its one of the things Apple did get right ;-)

> > Nothing the scheduler can do about that.
> 
> I disagree. Why are the inmates running the asylum? The scheduler
> decides what runs when and where. We're not at the mercy of bad
> applications, they're at the mercy of the scheduler. 

How the fuck does the scheduler know your bouncing cows crap really
shouldn't be running because the screen is off? There is absolutely
_NOTHING_ the scheduler can do about that.

The user said he wanted this app to run, the app is runnable, we run it.
That simple.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-27 22:56       ` John Stultz
                           ` (3 preceding siblings ...)
  2011-09-28  8:40         ` Peter Zijlstra
@ 2011-09-28  8:59         ` Peter Zijlstra
  2011-09-29  3:45           ` John Stultz
  2011-09-28  9:16         ` Peter Zijlstra
  2011-09-28 21:02         ` Rafael J. Wysocki
  6 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2011-09-28  8:59 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, Thomas Gleixner

On Tue, 2011-09-27 at 15:56 -0700, John Stultz wrote:
> 
> But I also want to separate my specific solution from the problem at
> large. I do think that there are issues that my proposal and wakelocks
> address that the hand-wavy "just do it in userspace" rebuttals don't
> deal with (again specifically: wakeup event consumption in userland
> before the next suspend). 

You know, once you drop the whole suspend notion that race goes away.

Esp. on the mobile hardware there really isn't anything different
between a deep idle state and suspend.

So just make the thing idle and your suspend race goes away.

There's still things like the cgroup-freezer if you really want to force
stuff down, but really your core system should be sane and not actually
do anything unless asked.

Also, I'm all for aggressive enforcement, if it doesn't obey they rules,
kill it. Don't mess about with freezing it, that never does the users
any favour.

If they find their app dies a lot, they'll complain to the dev, if the
dev doesn't fix his stuff, the app gets a shitty rating and nobody will
bother installing it again, problem sorted.

We don't try to be friendly in the face of memory protection violations
either.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-27 22:56       ` John Stultz
                           ` (4 preceding siblings ...)
  2011-09-28  8:59         ` Peter Zijlstra
@ 2011-09-28  9:16         ` Peter Zijlstra
  2011-09-28 10:45           ` Borislav Petkov
  2011-09-28 21:02         ` Rafael J. Wysocki
  6 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2011-09-28  9:16 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, Thomas Gleixner

On Tue, 2011-09-27 at 15:56 -0700, John Stultz wrote:
> 
> 
> I do think my idea is neat, so we can spend/burn time debating it, but
> if you have viable suggestions (and honestly, I think rewriting all of
> userland isn't viable) on solving the larger problem, I'm all ears.
> 
And yet, rewriting all of userspace is exactly what Android did, they
had the perfect opportunity of doing it right, but they came up with the
worst solution ever.

As for my laptop, I really would like the firefox guys to add a
'feature' that stops JS on non-visible tabs and preferably also drop all
data after a little while.

firefox is the #1 wakeup thing on my machine and its mostly invisible
(minimized) and when its not, I can only stare at a single tab anyway.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-28  9:16         ` Peter Zijlstra
@ 2011-09-28 10:45           ` Borislav Petkov
  0 siblings, 0 replies; 28+ messages in thread
From: Borislav Petkov @ 2011-09-28 10:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: John Stultz, lkml, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, Thomas Gleixner

On Wed, Sep 28, 2011 at 11:16:39AM +0200, Peter Zijlstra wrote:
> On Tue, 2011-09-27 at 15:56 -0700, John Stultz wrote:
> > 
> > 
> > I do think my idea is neat, so we can spend/burn time debating it, but
> > if you have viable suggestions (and honestly, I think rewriting all of
> > userland isn't viable) on solving the larger problem, I'm all ears.
> > 
> And yet, rewriting all of userspace is exactly what Android did, they
> had the perfect opportunity of doing it right, but they came up with the
> worst solution ever.
> 
> As for my laptop, I really would like the firefox guys to add a
> 'feature' that stops JS on non-visible tabs and preferably also drop all
> data after a little while.
> 
> firefox is the #1 wakeup thing on my machine and its mostly invisible
> (minimized) and when its not, I can only stare at a single tab anyway.

Oh yeah, same here.

I even went so far as to do

kill -STOP $(pidof firefox-bin)

at times simply because the thing was getting out of control. And to be
honest, the moment a thinner browser comes out with authors who actually
do care about not bloating the crap out of it, I'm jumping on it.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-27 22:56       ` John Stultz
                           ` (5 preceding siblings ...)
  2011-09-28  9:16         ` Peter Zijlstra
@ 2011-09-28 21:02         ` Rafael J. Wysocki
  6 siblings, 0 replies; 28+ messages in thread
From: Rafael J. Wysocki @ 2011-09-28 21:02 UTC (permalink / raw)
  To: John Stultz, Peter Zijlstra
  Cc: lkml, arve, markgross, Alan Stern, amit.kucheria, farrowg,
	Dmitry Fink (Palm GBU), linux-pm, khilman, Magnus Damm, mjg,
	Thomas Gleixner

Hi John,
Hi Peter,

On Wednesday, September 28, 2011, John Stultz wrote:
> On Tue, 2011-09-27 at 12:37 +0200, Peter Zijlstra wrote:
> > On Mon, 2011-09-26 at 15:27 -0700, John Stultz wrote:
> > > On Mon, 2011-09-26 at 22:16 +0200, Peter Zijlstra wrote:
> > > > On Mon, 2011-09-26 at 12:13 -0700, John Stultz wrote:
> > > > > 
> > > > > For now, I'd just be interested in what folks think about the concept with
> > > > > regards to the wakelock discussions. Where it might not be sufficient? Or
> > > > > what other disadvantages might it have? Are there any varients to this
> > > > > idea that would be better? 
> > > > 
> > > > I would like to know why people still think wakelocks are remotely sane?
> > > > 
> > > > From where I'm sitting they're utter crap.. _WHY_ do you need to suspend
> > > > anything? What's wrong with regular idle?
> > > 
> > > Well. Regular idle still takes up more power with my desktop then I
> > > could save with suspend.
> > 
> > Blame Intel ;-)

That's not Intel, but all of the PC vendors who don't care about
anything but Windows compatibility.

> > Personally I loathe suspend because it kills all my network links.

Well, I only use suspend when I would disconnect from the network anyway.

> > > My personal use case: I do nightly backups with rdiff-backup. I'd like
> > > to schedule those  backup using an alarm-timer, so I could suspend my
> > > system when I'm not using it. So far, so good, that all works. 
> > > 
> > > However, if my system tries to suspend itself after 15 minutes of X
> > > input idle, and my backup at 2am takes more then 15 minutes, then the
> > > backup gets interrupted. Because rdiff-backup is more of a transactional
> > > style backup, it then has to roll back any incomplete changes and try
> > > again the next night, which will surely take more then 15 minutes, etc.
> > 
> > So your fail is to tie suspend to the input inactivity instead of the
> > completion of your backup thingy.
> 
> Well, its both. If the backup runs very long, and I'm using the machine
> in the morning, I don't want the end of my backup to suspend the system.

I think this is a variant of a more general issue (see below).

> > > I could try to inhibit suspend by making requests to my desktop
> > > environment, so the desktop-specific power management daemon won't
> > > trigger suspend. But given recent behavior, I don't trust that not to
> > > break when I upgrade my system, or if I get frustrated with one desktop
> > > environment, that I won't have to use a different api for whatever other
> > > environment I pick next.
> > 
> > Kick the friggin Desktop folks already for messing up. I mean, because
> > userspace is incompetent this needs to go in the kernel?

The problem is not outright incompetence, but the fact that this would require
creating a layer of user space between the apps and the kernel that would
have to be shared by all the existing variants of user space, or else
applications working with one variant of the user space "middleware" won't be
compatible with another one.

> > Ere long we'll have a kernel based GUI if we go that route.

I guess you'd use the same argument against cgroups, right?

> Well, to be fair to the desktop guys, they have been working to try to
> provide a DBUS api to handle this.
> 
> But even with a proper DBUS api, there's still the race when I walk away
> from my computer 15 minutes before the backup starts. 
> 
> In that case, my backup application's alarm timer fires and schedules
> the backup, but then before the backup application runs and sends its
> DBUS message to block suspend, the suspend occurs. And yea, that's
> probably not a problem for my use, but it limits any similar power
> savings from an environment where reliability might actually matter. 
> 
> Read that last bit again, as it has seemingly been missed over and over
> in these discussions. This ability to make sure wake up events are
> consumed by userland before suspending again is key.

This alone doesn't require any new kernel interfaces IMO, although it
requires some additional conditions to be met, which is not the case in
practice (see below).

> > > Another use case I've heard about are systems that have firmware updates
> > > that are remotely triggered. Should the system go into suspend while the
> > > firmware update is going on, you end up with a brick.
> [snip]
> > > Having to have multiple distro/release specific quirks to get the
> > > power-management-daemon to inhibit suspend is annoying enough, but then
> > > you also have to deal with custom changes by administrator, or remote
> > > power management systems like power nap, which might also echo "mem"
> > > into /sys/power/state when you're not expecting it. A kernel method to
> > > really block suspend would be nice. While this doesn't necessarily need
> > > to be conflated with wakelock style suspend, there is some need to allow
> > > userland to block suspend at the kernel level, and once you have that, I
> > > can't imagine folks not trying to stretch that into something like
> > > wakelocks. So you might as well at least try to design it reasonably
> > > well to start.
> > 
> > How about you create a daemon tasked with managing /sys/power/state and
> > change /sys/power/state such that it can be opened only once, then that
> > daemon can keep the fd open and everything else trying to poke at it
> > will get a fail.
> 
> That's actually pretty interesting. It doesn't handle the race issues
> between wakeup event and event consumption by userland, but not a bad
> tool to have in the toolbox as we look at other approaches.

_If_ there is a power management daemon, it doesn't have to do things like
this.  It may work just as described in Section 5 of this paper:

http://lwn.net/images/pdf/suspend_blockers.pdf

The problem is that no one has ever tried to implement the PM daemon.

I think there are two questions to ask here:

(1) If "opportunistic" suspend is not used (i.e. the only way to make the
    system suspend is to write "mem" to /sys/power/state), can we handle
    the race issues between wakeup events and event consumption by userland?

(2) If the answer to (1) is "yes" and "opportunistic" suspend _is_ going
    to be used, is the mechanism addressing (1) sufficient for it to work
    correctly?

While (2) is only relevant if one wants to use "opportunistic" suspend (like
Android), (1) is relevant in general.

Now, I believe that the answer to (1) is "yes" under the following conditions:
* There is only one user space process attempting to use /sys/power/state
  (ie. the power manager daemon).
* This process cooperates with all user space processes consuming wakeup
  events (as described in the article mentioned above).

Unfortunately, as I said, I'm not aware of any power management daemon
implementations like this used in real life.  [Well, I could create a
proof-of-concept one, but quite evidently people want me to spend time on
other things.]  This means that _in_ _practice_ the answer to (1) is "no",
which I think John is trying to address (perhaps among other things).

The practice seems to be that (a) there are multiple processes that can
try to write to /sys/power/state (and we have hibernation that can use
a different interface for that matter) and (b) those processes don't
cooperate with each other in any way close to sanity.

Moreover, user space consumers of wakeup events (or more precisely, consumers
of input data associated with wakeup events) have no way to talk to
the processes operating /sys/power/state, so they can't indicate whether
or not it really is a good idea to suspend at the moment.

Arguably, those issues can be addressed by modifying user space, but there
are a few things to take into consideration here:
* Is it realistic to expect that they're ever going to be addressed this way?
* If so, is the cost of addressing them at the user space level lower or
  higher than the cost of a solution involving a kernel interface?
* If it is higher, then perhaps we can do that at the kernel level after all?

Rafael

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-28  8:19         ` Peter Zijlstra
@ 2011-09-29  3:07           ` John Stultz
  0 siblings, 0 replies; 28+ messages in thread
From: John Stultz @ 2011-09-29  3:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: lkml, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, Thomas Gleixner

On Wed, 2011-09-28 at 10:19 +0200, Peter Zijlstra wrote:
> On Tue, 2011-09-27 at 15:56 -0700, John Stultz wrote:
> > But I think once you get away from some theoretical system, containing
> > only properly behaving apps, having some tools in the kernel to enforce
> > a certain type of behavior (and having the kernel's ability to handle
> > more complex cases like importance inheritance) might be helpful. 
> 
> So you want to add something that sends a SIGBUS or similar to an
> application if it gets a wakeup when its not supposed to? That's what
> happens with memory protection, you get killed.

Although that is more reactive then what I'm thinking.

I'm thinking more enforcement on the front end, where instead of
bunching timers around time intervals, we bunch timers around the events
of "important" processes.

So its not applications get wakeups when they aren't supposed to, its
that we don't schedule timer interrupts for their timers, and we don't
schedule them unless the system is "active" running an "important" task
(allowing them to exploit the small idle cycles of the "important" task,
which are likely too short to allow for any deep-idle state to trigger).

Basically we want the system to be in as deep an idle as possible for as
long as possible. Bunching both timers and task execution allows this.
But instead of pushing things out maybe a second, can we stall tasks
for minutes or hours? If we start doing that, we're going to need some
way for tasks to communicate that: "now is a really bad time for you to
freeze me".

So I'm trying to come up with an API that we can use for that issue as
well (since it starts to approximates the suspend case as those forced
idle times grow).

I'll see if I can't start playing with some patches to try to
demonstrate the idea in a non-suspend context.

> And yes, when Arjan did his powertop thing people did go fix up
> everything that made the top.

Yea, the really terrible offenders did improve and I think that was
great! But now how do we deal with the hundreds of normal applications,
that when combined still cause quite a bit of wakeup noise? How do we
really get to minutes of deep idle on a "normal" system?

> It really isn't that hard to fix most of these apps. But the problem
> with Android is that the battery usage thing it has is utterly useless.
> It mostly just tells you wireless or whatnot is sucking power, not what
> app is causing that. (And of course the fact that its all fucking Java
> crap, which I'll refuse to touch anyway).
> 
> This results in people (me) uninstalling everything they installed since
> last it worked well and then cursing this crappy shit for wasting their
> time.

Sure Android has issues, no argument there.

But realize that while part of my motivation is to try to mend the
Android fork, another part is that the need for extreme power efficiency
is not just about phones. I'm trying to find a solution that works for
both on servers with classic linux distros as well as the Android
environment (or better, is demonstratively superior to the wakelocks
solution - fixing all your complaints about your phones battery life!).

So arguments about how Android userland sucks aren't really relevant.

> I really think Android is a pile of privacy violating shit, I can't say
> I actually enjoy using the phone. I'm at a point where I can call with
> it (without it telling google whoem all I know and where I'm at) and
> browse the interwebs and have decided to screw apps, they don't work
> anyway.

I'm sorry my patch doesn't address the privacy concerns of using an
Android (or really any) cell phone. :)

thanks
-john

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-28  8:19         ` Peter Zijlstra
@ 2011-09-29  3:27           ` John Stultz
  0 siblings, 0 replies; 28+ messages in thread
From: John Stultz @ 2011-09-29  3:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: lkml, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, Thomas Gleixner

On Wed, 2011-09-28 at 10:19 +0200, Peter Zijlstra wrote:
> On Tue, 2011-09-27 at 15:56 -0700, John Stultz wrote:
> > > That's just insane.. why bother running anything but the 'important'
> > > tasks. Idle is more power aware than running random crap tasks that have
> > > no business running in the first place.
> > 
> > Its really not that different conceptually from aligning timers. Making
> > sure that when we fire, we expire as many timers as we can in one go,
> > and run all the tasks that need to run, so we can go back to idle for as
> > long as possible.
> > 
> > But instead of idling "until the next timer group", we split stuff we
> > don't care that much about (but needs to be there), and stuff we do care
> > about, and only schedule the hardware to fire for the events we do care
> > about. 
> 
> But but but, my badly coded bouncing cows thing simply doesn't need to
> run when we wake up to refill the sound buffers for the mp3 player while
> the screen is still off!
> 
> Yet the wakelock thing will wake the system and lets us schedule
> bouncing cows just fine..

You're right. While the sound buffers are being refilled, there might be
idle cycles that the wasteful bouncing cow app gets to run during.

And I'm also not disagreeing that blocking draw events from the
framework if the screen is off would be smart.

But I think there's a difference between wasteful bouncing cow app, and
the normal everyday background tasks that run on a standard linux
system.

> I really don't get your argument. It just doesn't make any sense. What
> I'm saying is, what about those apps we really don't care about, and
> really don't need to be there.

I just think that we need a way to block the background noise of normal
systems, so there should be some way for the kernel to distinguish
between background noise and not.

thanks
-john



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] [RFC] Proposal for optimistic suspend idea.
  2011-09-28  8:59         ` Peter Zijlstra
@ 2011-09-29  3:45           ` John Stultz
  0 siblings, 0 replies; 28+ messages in thread
From: John Stultz @ 2011-09-29  3:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: lkml, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, Thomas Gleixner

On Wed, 2011-09-28 at 10:59 +0200, Peter Zijlstra wrote:
> On Tue, 2011-09-27 at 15:56 -0700, John Stultz wrote:
> > 
> > But I also want to separate my specific solution from the problem at
> > large. I do think that there are issues that my proposal and wakelocks
> > address that the hand-wavy "just do it in userspace" rebuttals don't
> > deal with (again specifically: wakeup event consumption in userland
> > before the next suspend). 
> 
> You know, once you drop the whole suspend notion that race goes away.
> 
> Esp. on the mobile hardware there really isn't anything different
> between a deep idle state and suspend.

Well, except timer noise and device irq noise.

> So just make the thing idle and your suspend race goes away.

Maybe hardware vendors will surprise me, but I don't think the power
difference between suspend and idle will get close enough in a
reasonable amount of time on server hardware.

Even then, I doubt you'll see standard distros that get minutes of
un-interrupted deep-idle. 

How long until you realistically expect to leave your laptop overnight
off of AC without suspending it (and not have it be on fumes in the
morning)? 

> There's still things like the cgroup-freezer if you really want to force
> stuff down, but really your core system should be sane and not actually
> do anything unless asked.

I think the cgroup-freezer is closer to the lines I'm thinking of, but
with the potential to do "importance" inheritance so interactions
between tasks in different groups (be they cgroups or sched classes) can
work normally.

thanks
-john

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/6] [RFC] rtc: rtc-cmos: Add pm_stay_awake/pm_relax calls around IRQ
  2011-09-26 19:13 ` [PATCH 3/6] [RFC] rtc: rtc-cmos: Add pm_stay_awake/pm_relax calls around IRQ John Stultz
@ 2011-10-01 21:31   ` NeilBrown
  0 siblings, 0 replies; 28+ messages in thread
From: NeilBrown @ 2011-10-01 21:31 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Rafael J. Wysocki, arve, markgross, Alan Stern,
	amit.kucheria, farrowg, Dmitry Fink (Palm GBU), linux-pm, khilman,
	Magnus Damm, mjg, peterz

[-- Attachment #1: Type: text/plain, Size: 2489 bytes --]

On Mon, 26 Sep 2011 12:13:51 -0700 John Stultz <john.stultz@linaro.org> wrote:

> Flag the rtc-cmos IRQ event as a wakeup event using pm_stay_awake
> and pm_relax()

It seems that only the 0/6 patch in this sequence was given any attention.
However it appears to me that 3, 4, and 5 stand on their own and are clearly
bug-fixes - the bug being that the wake-up event generated by the CMOS rtc is
not being counted as a wakeup event by the power core.

I'm less clear on 5, but that is probably simply that I don't understand this
code very well yet.

So maybe resend 4,5,6 by themselves and get them accepted?  Or maybe Rafael
could just pick them up and review/apply them as they are ???

Thanks,
NeilBrown


> 
> CC: Rafael J. Wysocki <rjw@sisk.pl>
> CC: arve@android.com
> CC: markgross@thegnar.org
> CC: Alan Stern <stern@rowland.harvard.edu>
> CC: amit.kucheria@linaro.org
> CC: farrowg@sg.ibm.com
> CC: Dmitry Fink (Palm GBU) <Dmitry.Fink@palm.com>
> CC: linux-pm@lists.linux-foundation.org
> CC: khilman@ti.com
> CC: Magnus Damm <damm@opensource.se>
> CC: mjg@redhat.com 
> CC: peterz@infradead.org 
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  drivers/rtc/rtc-cmos.c |   13 +++++++++++--
>  1 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
> index 05beb6c..1cc4688 100644
> --- a/drivers/rtc/rtc-cmos.c
> +++ b/drivers/rtc/rtc-cmos.c
> @@ -538,11 +538,15 @@ static struct bin_attribute nvram = {
>  
>  static struct cmos_rtc	cmos_rtc;
>  
> +static struct wakeup_source *rtc_cmos_wakelock;
> +
>  static irqreturn_t cmos_interrupt(int irq, void *p)
>  {
>  	u8		irqstat;
>  	u8		rtc_control;
>  
> +	__pm_stay_awake(rtc_cmos_wakelock);
> +
>  	spin_lock(&rtc_lock);
>  
>  	/* When the HPET interrupt handler calls us, the interrupt
> @@ -573,9 +577,12 @@ static irqreturn_t cmos_interrupt(int irq, void *p)
>  
>  	if (is_intr(irqstat)) {
>  		rtc_update_irq(p, 1, irqstat);
> +		__pm_relax(rtc_cmos_wakelock);
>  		return IRQ_HANDLED;
> -	} else
> -		return IRQ_NONE;
> +	}
> +
> +	__pm_relax(rtc_cmos_wakelock);
> +	return IRQ_NONE;
>  }
>  
>  #ifdef	CONFIG_PNP
> @@ -1153,6 +1160,8 @@ static int __init cmos_init(void)
>  {
>  	int retval = 0;
>  
> +	rtc_cmos_wakelock = wakeup_source_register("rtc_cmos");
> +
>  #ifdef	CONFIG_PNP
>  	retval = pnp_register_driver(&cmos_pnp_driver);
>  	if (retval == 0)


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2011-10-01 21:32 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-09-26 19:13 [PATCH 0/6] [RFC] Proposal for optimistic suspend idea John Stultz
2011-09-26 19:13 ` [PATCH 1/6] [RFC] suspend: Block suspend when wakeups are in-progress John Stultz
2011-09-26 19:13 ` [PATCH 2/6] [RFC] sched: Add support for SCHED_STAYAWAKE flag John Stultz
2011-09-26 19:13 ` [PATCH 3/6] [RFC] rtc: rtc-cmos: Add pm_stay_awake/pm_relax calls around IRQ John Stultz
2011-10-01 21:31   ` NeilBrown
2011-09-26 19:13 ` [PATCH 4/6] [RFC] rtc: interface: Add pm_stay_awake/pm_relax chaining rtc workqueue processing John Stultz
2011-09-26 19:13 ` [PATCH 5/6] [RFC] alarmtimer: Add pm_stay_awake /pm_relax calls John Stultz
2011-09-26 19:13 ` [PATCH 6/6] [RFC] alarmtimer: Deboost on nanosleep John Stultz
2011-09-26 20:16 ` [PATCH 0/6] [RFC] Proposal for optimistic suspend idea Peter Zijlstra
2011-09-26 22:27   ` John Stultz
2011-09-27 10:37     ` Peter Zijlstra
2011-09-27 22:56       ` John Stultz
2011-09-28  7:51         ` Peter Zijlstra
2011-09-28  7:57           ` Richard Cochran
2011-09-28  8:02             ` Peter Zijlstra
2011-09-28  8:19         ` Peter Zijlstra
2011-09-29  3:07           ` John Stultz
2011-09-28  8:19         ` Peter Zijlstra
2011-09-29  3:27           ` John Stultz
2011-09-28  8:40         ` Peter Zijlstra
2011-09-28  8:59         ` Peter Zijlstra
2011-09-29  3:45           ` John Stultz
2011-09-28  9:16         ` Peter Zijlstra
2011-09-28 10:45           ` Borislav Petkov
2011-09-28 21:02         ` Rafael J. Wysocki
2011-09-28  0:09     ` Thomas Gleixner
2011-09-28  1:19       ` John Stultz
2011-09-28  8:18         ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).